HTTP interface for transient programs
This tutorial shows how to invoke any part of a Transient-universe program using HTTP requests. It is also very useful for understanding the mechanism of serialization and remote execution in distributed transient programs.
This is not the REST API that is also included in transient-universe. That API is shown in examples like api.hs, which is undocumented but, I hope, self-explanatory. Note: this is the api.hs version for the "new" branch, which is the one detailed here.
Transient is a library for the Haskell language that provides high-level effects such as parallelism, concurrency, asynchronicity, streaming and distributed computing, and manages them without special constructions.
To run the examples you need the "new" branch of both repositories. You also need ghc 8.x and cabal:
git clone http://github.com/transient-haskell/transient
cd transient
git checkout new
cabal install
cd ..
git clone http://github.com/transient-haskell/transient-universe
cd transient-universe
git checkout new
cabal install
With the new version of transient, any component of a transient program that uses the cloud monad can be invoked by an HTTP request.
As an example, this program:
main= keep $ initNode $ inputNodes <|> do
    ....
    ....
(NOTE: add complete examples with includes, compilation, etc)
That program initializes a server node and lets the user input the hosts and ports of other nodes, either on the command line or as interactive console input. It is the standard way to initialize a node:
> runghc program.hs -p start/localhost/8000/add/localhost/3000/n
That initializes the node at localhost:8000 (read by initNode) and adds the node localhost:3000 to the list of known nodes (the latter is processed by inputNodes).
In the interactive mode the parameters are entered as menu options with a guided dialog:
>runghc program.hs
...
Enter start to: re/start node
...
>start
option: start
hostname of this node. (Must be reachable)? > localhost
"localhost"
port to listen? > 8000
8000
Connected to port: 8000
Enter list to: list nodes
Enter add to: add a new node
option: add
Hostname of the node (none): > localhost
"localhost"
port? > 3000
3000
services? ([]) > n
connect to the node to interchange node lists? (n) "n"
Added node: ("localhost",3000,[])
Now, as a free feature that needs no additional code, it is possible to tell the program to add a node localhost:3001 by invoking it remotely with an HTTP request to this URL:
> curl 'http://localhost:8000/0/1/e/f/()/w/"add"/"localhost"/3001/[]/[]/"n"/' --globoff
(Note: the final form of this request may vary)
It returns the following message over HTTP:
SMore/1/100103000/("localhost",3001,[])/()/e/e/
And it prints the following in the terminal of the node:
Added node: ("localhost",3001,[])
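The value segments in the URL above ("add", "localhost", 3001, "n") look the same as what Haskell's show prints for strings and numbers. As a rough sketch under that assumption, the URL can be rebuilt from its parts. The helper names pathFrom and addNodeURL are hypothetical and not part of transient-universe; the prefix segments are copied verbatim from the request above, and the final form of the request may vary.

```haskell
import Data.List (intercalate)

-- Hypothetical helper (not part of transient-universe): joins segments that
-- are already rendered as text into a path that starts and ends with '/'.
pathFrom :: [String] -> String
pathFrom segs = "/" ++ intercalate "/" segs ++ "/"

-- Rebuilds the "add node" URL shown in the tutorial.
-- Assumption: value segments use the same text that 'show' produces.
addNodeURL :: String -> Int -> String -> Int -> String
addNodeURL server serverPort newHost newPort =
    "http://" ++ server ++ ":" ++ show serverPort ++ pathFrom (prefix ++ values)
  where
    -- Copied verbatim from the tutorial's request; not derived here.
    prefix = ["0", "1", "e", "f", "()", "w"]
    -- Answers to the console dialog: option, host, port, two empty lists, "n".
    values = [ show "add", show newHost, show newPort, "[]", "[]", show "n" ]
```

In GHCi, putStrLn (addNodeURL "localhost" 8000 "localhost" 3001) prints the same URL that is passed to curl above.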
Current limitations:
- No authentication
- An undetermined number of response messages can be received; for now there is no total length and no chunked encoding.
To disable HTTP requests in a branch of the computation or in the whole program, you can use noHTTP:
main= keep $ initNode $ this <|> that
that= do
    noHTTP
    ...
In this program, this executes until it finds an asynchronous operation such as abduce, async, react, spawn, parallel, waitEvents or empty. Then the second term, that, is executed.
All HTTP requests to execute the that branch will receive a 403 message.
This tutorial is about how to obtain such URLs and about how and why they work. It will also give a lot of insight into how closure serialization and remote execution work in transient-universe.
Suppose that I have this program:
import Transient.Base
import Transient.Move
import Transient.Move.Utils
main = keep $ initNode $
    localIO $ putStrLn "hello world"
This program initializes a server but is also a console application. When you run it, it produces:
>runghc program.hs -p start/localhost:8000
...
...
option: start
hostname of this node. (Must be reachable)? "localhost"
port to listen? 8000
Connected to port: 8000
hello world
But we can make the program begin execution at any place if it is invoked with an HTTP request with the appropriate URL. To discover the URL that does so, we insert showURL at the place that we want to call:
main = keep $ initNode $ do
    showURL
    localIO $ putStrLn "hello world"
Now the program will reveal the URL:
>program -p start/localhost:8000
... (additional output..)
option: start
hostname of this node. (Must be reachable)? "localhost"
port to listen? 8000
Connected to port: 8000
'http://localhost:8000/0/0/e/'
hello world
Before "hello world", the program prints a URL. If you invoke it using curl:
> curl http://localhost:8000/0/0/e/
you will see that the program executes again and prints in the terminal:
'http://localhost:8000/0/0/e/'
hello world
The effect is as if we had executed the continuation from the location of showURL onwards.
You will also see that curl receives no response and keeps waiting for something. With this program, a response will never arrive.
To receive something you need to tell the program to send it. Transient has teleport, which transports the closure that the program is executing to the node it is connected to; in this case, the program that invoked the URL.
If we add teleport:
main = keep $ initNode $ do
    showURL
    localIO $ putStrLn "hello world"
    teleport
If we invoke curl against this program we will see a response:
> curl http://localhost:8000/0/0/e/
SMore/0/10002000/e/()/e/
What is that? Like the URL entered, the response is the serialization of the closure which ran in the server. In that response, the program says the following:
- SMore: there may be more responses
- 0/: I send to destination 0 in the calling program
- 10002000/: identifier of the teleport which made the response, which is also a place in the program, a closure.
- e/: I executed what is inside of the first statement (which is initNode)
- showURL is executed but has no trace
- ()/: is the result of putStrLn "hello world"
- e/: I executed the next statement (the teleport)
In the same way, the calling URL contains /0/0/e/, which means: I call you from 0, I call closure 0 (that is, the beginning of the program), and I tell you e/, which means: execute what is inside of the first statement (initNode). And that is what the program executes.
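The layout of the response and of the request path described above can be made concrete with a small parser. This is only a sketch for reading the examples in this tutorial, not the parser that transient uses: the Response record, its field names, and the functions splitSlash, parseResponse and parseRequestPath are all hypothetical, and the splitting is naive (a '/' inside a serialized value would be cut too).

```haskell
-- Hypothetical record; the field names are illustrative, not transient's.
data Response = Response
  { moreToCome  :: Bool      -- True when the line starts with SMore
  , destination :: String    -- closure addressed in the calling program
  , closureId   :: String    -- identifier of the teleport that answered
  , logSegments :: [String]  -- e, (), 42 ... in execution order
  } deriving (Show, Eq)

-- Split on '/', dropping the empty pieces that come from leading or
-- trailing slashes. Naive: it does not understand quoted values.
splitSlash :: String -> [String]
splitSlash s = case break (== '/') s of
  (seg, [])       -> [seg | not (null seg)]
  (seg, _ : rest) -> [seg | not (null seg)] ++ splitSlash rest

-- "SMore/0/10002000/e/()/e/" -> tag, destination, closure id, log
parseResponse :: String -> Maybe Response
parseResponse line = case splitSlash line of
  (tag : dest : cid : segs) -> Just (Response (tag == "SMore") dest cid segs)
  _                         -> Nothing

-- "/0/0/e/" -> (calling closure, closure to invoke, log to replay)
parseRequestPath :: String -> Maybe (String, String, [String])
parseRequestPath path = case splitSlash path of
  (caller : target : segs) -> Just (caller, target, segs)
  _                        -> Nothing
```

For example, parseResponse "SMore/0/10002000/e/()/e/" yields destination "0", closure identifier "10002000" and the log ["e","()","e"], while parseRequestPath "/0/0/e/" yields ("0","0",["e"]).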
All lines which use local or localIO are serialized in the response. This program:
main = keep $ initNode $ do
    showURL
    localIO $ putStrLn "hello world"
    local $ return (42 :: Int)
    teleport
produces this response when invoked by curl:
> curl http://localhost:8000/0/0/e/
SMore/0/10002000/e/()/42/e/
Do you see the difference?
However, if I change the second 0 to any other number, for example 1:
>curl http://localhost:8000/0/1/e/
SMore/1/20002000/()/42/e/
the output includes only the segment of the path that has been generated by the server: ()/42/e/.
The reason is that the response is now sent to a closure 1 that is not 0 (the beginning of the calling program, since a transient program assumes that it is talking to another transient node). The node assumes that the caller needs only what is new. That avoids unnecessary repetition.
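One way to picture this "send only what is new" rule is as removing, from the full execution log, the prefix that the caller already supplied in the URL. The function below is a hypothetical model of that behaviour over lists of segments, not the way transient implements it.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- known: the segments the caller sent in the URL (for /0/1/e/ that is ["e"]).
-- full:  the complete execution log on the server.
-- Returns the part of the log the caller has not seen yet. If the known
-- segments are not a prefix of the log, the whole log is returned.
newSegments :: [String] -> [String] -> [String]
newSegments known full = fromMaybe full (stripPrefix known full)
```

With the example above, newSegments ["e"] ["e","()","42","e"] gives ["()","42","e"], which corresponds to the ()/42/e/ tail of the response.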
This program:
main = keep $ initNode $ do
    showURL
    localIO $ putStrLn "hello world"
    local $ choose [1..3]
    teleport
when executed, produces three responses:
>curl http://localhost:8000/0/1/e/
SMore/0/10002000/1/e/
SMore/0/10002000/2/e/
SMore/0/10002000/3/e/
corresponding to the three values returned by choose. It is possible to receive a finite or infinite stream.
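The "one response per value" behaviour can be approximated with the ordinary list monad, where the rest of a do block also runs once for each element. This is only an analogy: it uses plain lists instead of transient's choose, it runs sequentially, and the function name responses is hypothetical. The closure identifier in the string is copied from the output above.

```haskell
-- Each value drawn from the list runs the remainder of the block once,
-- so three values produce three response lines.
responses :: [Int] -> [String]
responses values = do
  n <- values   -- plays the role of: local $ choose values
  return ("SMore/0/10002000/" ++ show n ++ "/e/")
```

mapM_ putStrLn (responses [1..3]) prints the same three lines that curl received.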
A program can be invoked with different URLs for executing different things. This program:
main = keep $ initNode $ hi "hello" <|> hi "world"
  where
  hi text= do
      showURL
      localIO $ putStrLn text
      teleport
Produces this output in the console:
> program -p start/localhost:8000
...
Connected to port: 8000
'http://localhost:8000/0/0/e/'
hello
'http://localhost:8000/0/0/e/()/w/'
world
Now there are two URLs. That is because <|> is the alternative operator in Haskell: when the first term, hi "hello", returns nothing, the second is executed. Since teleport sends the data to the remote caller and stops, the other term is executed. The second URL is a bit longer since it contains the execution of two extra lines of code: () of the first print "hello" and w for the second teleport.
> curl http://localhost:8000/0/0/e/
SMore/0/10002000/e/()/e/
SMore/0/20102000/e/()/w/()/e/
Now there are two responses since the two teleports are executed. Notice that they have different closure identifiers. The program displays the same output in the console again.
If we execute the second URL:
> curl 'http://localhost:8000/0/0/e/()/w/'
SMore/0/20102000/e/()/w/()/e/
Only one response is produced, as expected, since we are addressing the second term. Now we need to quote the URL, since an unquoted '()' on the command line is interpreted by the shell before curl sees it.
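The fallback behaviour of <|> used in this example can be seen in isolation with the standard Alternative instances for Maybe and lists. This sketch uses only base, not transient, and the names firstTerm, secondTerm, fallback and both are illustrative.

```haskell
import Control.Applicative ((<|>), empty)

-- A term that "returns nothing".
firstTerm :: Maybe String
firstTerm = empty

secondTerm :: Maybe String
secondTerm = Just "world"

-- Because the first term is empty, the second one is used.
fallback :: Maybe String
fallback = firstTerm <|> secondTerm

-- With lists both alternatives contribute results, which is closer to the
-- example above, where both hi terms end up running.
both :: [String]
both = ["hello"] <|> ["world"]
```

Here fallback evaluates to Just "world" and both evaluates to ["hello","world"].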
To execute the first term, hi "hello", and not the second, we have to instruct teleport not to let alternative terms execute. This involves some tweaking that is not worth describing in this tutorial, since teleport is not normally used by the programmer. Usually, higher-level primitives that use teleport are used, like atRemote or runAt. They are a computation between two teleports: the first teleport transports the computation to the remote node and the second transports the results back to the caller.
We will see this shortly, after this example of managing console input:
main = keep $ initNode $ do
    local $ option "r" "run"
    showURL
    localIO $ putStrLn "hello world"
    local $ return (42 ::Int)
    teleport
> program -p start/localhost:8000
Connected to port: 8000
Enter r to: run
r
"r"
option: r
'http://localhost:8000/0/0/e/"r"/'
hello world
option waits for "r" in the console input. Once it is entered, since showURL comes after it, the URL that is displayed includes "r". By invoking it:
> curl 'http://localhost:8000/0/0/e/"r"/'
SMore/0/30002000/e/"r"/()/42/e/
The URL will skip over option and execute from showURL onwards.
main = keep $ initNode $ do
    local $ option "h" "hello"
    atRemote $ do
        showURL
        localIO $ putStrLn "hello world"
Once this program is started (as usual) and "h" is entered, atRemote executes locally, since the node is not connected to any other (it is connected to itself). showURL is within atRemote. If the printed URL is invoked, then:
> curl 'http://localhost:8000/0/0/e/"h"/e/()/e/'
SMore/0/30004000/e/"h"/e/()/e/()/e/
A response is produced. The reason is that now there is a remote connection and the final teleport within atRemote sends the response back.
main = keep $ initNode $ hi "hello" <|> hi "world"
  where
  hi text= atRemote $ do
      showURL
      localIO $ putStrLn text
produces:
> program -p start/localhost/8000
...
Connected to port: 8000
'http://localhost:8000/0/0/e/e/()/e/'
hello
'http://localhost:8000/0/0/e/w/e/()/e/'
world
If we invoke the first URL:
> curl 'http://localhost:8000/0/0/e/e/()/e/'
SMore/0/20004000/e/e/()/e/()/e/
The alternative term is not executed, since the tweak mentioned earlier for teleport is present in atRemote.
And now something more fun is coming. This program:
main = keep $ initNode $ inputNodes <|> hi
  where
  hi = do
      showURL
      localIO $ putStrLn "hello"
      let x= "hello "
      teleport
      showURL
      localIO $ print $ x ++ "world"
      teleport
prints the following when initialized:
> runghc program.hs -p start/localhost/8000
...
Connected to port: 8000
Enter list to: list nodes
Enter add to: add a new node
'http://localhost:8000/0/0/e/f/w/'
hello
'http://localhost:8000/0/0/e/f/w/()/()/'
"hello world"
It has two teleports, one after another. Locally, teleport transfers its closure to itself and continues executing. That is the reason why the whole program is executed.
If we remotely invoke the first URL:
> curl 'http://localhost:8000/0/1/e/f/w/'
SMore/1/20102000/()/e/
The program displays in the console:
'http://localhost:8000/0/1/e/f/w/'
hello
It stops at the first teleport since it is a remote invocation.
We can continue from that first teleport by calling with its teleport identifier, 20102000:
>curl 'http://localhost:8000/20102000/1/'
This will print in the console of the node:
'http://localhost:8000/0/0/()/()/'
"hello world"
The closure invoked includes the variable x already instantiated with "hello ". That closure, with its teleport, becomes a new endpoint.
This works for now as long as both requests share the same connection. If the connection terminates, the closure is garbage collected. Currently a transient node closes the connection after one to three minutes, depending on the version. In fact this example makes use of a bug in transient, which assumes that calling nodes reuse connections, whereas curl does not do so across different invocations. That is the reason why the second invocation receives no response.
Any program can be invoked via HTTP. This program uses the runAt primitive, which allows a transient program to call a copy of itself at another network address and return the result.
main = keep $ initNode $ inputNodes <|> do
    local $ option "r" "run"
    i <- atOtherNode $ do
        showURL
        localIO $ print "hello"
        i <- local $ threads 0 $ choose[1:: Int ..]
        localIO $ threadDelay 1000000
        return i
    localIO $ print i
  where
  atOtherNode doit= do
      node <- local $ do
          nodes <- getNodes
          guard $ length nodes > 1
          return $ nodes !! 1
      runAt node doit
nodes contains the list of known nodes. The first node is the local node. If there is a second node, the program calls it with runAt and returns a stream of numbers, produced by a single thread: the current one. Since choose applies as much parallelism as it can, with as many threads as it can, guess what would happen if we did not limit the threads available for this infinite stream.
Now we execute it in two different consoles:
The first node starts:
runghc program.hs -p start/8000
The second node is started and told about the existence of the first:
runghc program.hs -p start/3000/add/localhost/8000/n
Then, in the console of the second node, we enter the "r" option and the remote node returns a stream of increasing numbers. The first node displays the URL for the invocation, since showURL is included:
...
Connected to port: 8000
Enter list to: list nodes
Enter r to: run
Enter add to: add a new node
'http://localhost:8000/0/30104000/e/f/w/"r"/("localhost",8000,[])/e/e/e/e/'
"hello"
In the second, a stream of values is presented, one every second:
...
option: add
Hostname of the node (none): "localhost"
port? 8000
services? ([])
connect to the node to interchange node lists? (n) "n"
Added node: ("localhost",8000,[])
> r
option: r
1
2
3
4
5
6
7
...
If we invoke the URL, we will see the messages that the remote node at port 8000 sends to the calling node:
> curl 'http://localhost:8000/0/30104000/e/f/w/"r"/("localhost",8000,[])/e/e/e/e/' --globoff
SMore/30104000/60106000/()/1/()/e/
SMore/30104000/60106000/()/2/()/e/
SMore/30104000/60106000/()/3/()/e/
SMore/30104000/60106000/()/4/()/e/
SMore/30104000/60106000/()/5/()/e/
SMore/30104000/60106000/()/6/()/e/
SMore/30104000/60106000/()/7/()/e/
....
Serialization can be defined by the user, by serializing your data with the Loggable class in the module Transient.Logged. (To be detailed)
Some other remarks about things not developed or not documented:
- Serialization is defined by the user. The class is in Logged.hs. You can define it for your data, including JSON.
- It needs to make better use of closures created dynamically.
- showURL can allow the program to be self-documenting by displaying entry points when initialized.