This version of the IBP client API supports most of the existing calls as outlined in
http://loci.cs.utk.edu/ibp/documents/IBPClientAPI.pdf and additionally provides support for
asynchronous calls and set/get methods for IBP data structures. The following calls are not
supported: IBP_mcopy(), IBP_nfu_op(), IBP_datamover(), IBP_setAuthenAttribute(),
IBP_freeCapSet(), DM_Array2String, and IBP_setMaxOpenConn(). Only the first two of these,
IBP_mcopy() and IBP_nfu_op(), are mentioned in the LoCI documentation. Everything else should
work as normal, including the type definitions.

Datatypes and accessor methods
----------------------------------
I have added new type definitions for the standard IBP data structures that I find more
natural. Use whichever you prefer. Below is the list from ibp/ibp_types.h:

typedef struct ibp_attributes ibp_attributes_t;
typedef struct ibp_depot ibp_depot_t;
typedef struct ibp_dptinfo ibp_depotinfo_t;
typedef struct ibp_timer ibp_timer_t;
typedef struct ibp_capstatus ibp_capstatus_t;
typedef char ibp_cap_t;
typedef struct ibp_set_of_caps ibp_capset_t;

There is also a new type, "ibp_ridlist_t", for getting the list of resources from a depot.
One can then use the traditional IBP_status() calls to probe the individual resources.

There are also a large number of accessor functions to hide the internal struct definitions
from the user. The exception to this is the "ibp_depotinfo_t" datatype. I'm not sure what a
lot of the fields mean and so left it "as is". Listed below are all the available accessor
functions from ibp/ibp_types.h:

---------------- ibp_depot_t -----------------------
ibp_depot_t *new_ibp_depot();
void destroy_ibp_depot(ibp_depot_t *d);
void set_ibp_depot(ibp_depot_t *d, char *host, int port, rid_t rid);

-----------------ibp_attributes_t--------------------
ibp_attributes_t *new_ibp_attributes();
void destroy_ibp_attributes(ibp_attributes_t *attr);
void set_ibp_attributes(ibp_attributes_t *attr, time_t duration, int reliability, int type);
void get_ibp_attributes(ibp_attributes_t *attr, time_t *duration, int *reliability, int *type);

----------------ibp_timer_t-------------------------
ibp_timer_t *new_ibp_timer();
void destroy_ibp_timer(ibp_timer_t *t);
void set_ibp_timer(ibp_timer_t *t, int client_timeout, int server_timeout);
void get_ibp_timer(ibp_timer_t *t, int *client_timeout, int *server_timeout);

-----------------ibp_cap_t and ibp_capset_t----------
void destroy_ibp_cap(ibp_cap_t *cap);
ibp_cap_t *dup_ibp_cap(ibp_cap_t *src);
ibp_capset_t *new_ibp_capset();
void destroy_ibp_capset(ibp_capset_t *caps);
void copy_ibp_capset(ibp_capset_t *src, ibp_capset_t *dest);
ibp_cap_t *get_ibp_cap(ibp_capset_t *caps, int ctype);

-----------------ibp_capstatus_t (gleaned from IBP_manage calls)---------
ibp_capstatus_t *new_ibp_capstatus();
void destroy_ibp_capstatus(ibp_capstatus_t *cs);
void copy_ibp_capstatus(ibp_capstatus_t *src, ibp_capstatus_t *dest);
void get_ibp_capstatus(ibp_capstatus_t *cs, int *readcount, int *writecount, int *current_size,
       int *max_size, ibp_attributes_t *attrib);

-----------------RID list management and RID conversion functions---------
void ridlist_init(ibp_ridlist_t *rlist, int size);
void ridlist_destroy(ibp_ridlist_t *rlist);
int ridlist_get_size(ibp_ridlist_t *rlist);
rid_t ridlist_get_element(ibp_ridlist_t *rlist, int index);
char *ibp_rid2str(rid_t rid, char *buffer);
rid_t ibp_str2rid(char *rid_str);
void ibp_empty_rid(rid_t *rid);

Most of the above is self-explanatory if you are familiar with IBP.
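To make the accessor style concrete, here is a minimal sketch that fills in a depot
description, allocation attributes, and a timer using only the functions above. The host
name, port, RID string, and numeric values are placeholders, and the sketch assumes the IBP
client headers from this package are included.

--------------------------------------------------------------------------
#include <time.h>
//** ...plus the IBP client headers from this package (e.g. ibp/ibp_types.h)

//** Sketch: build a depot description and allocation attributes through the
//** accessor functions, without touching the struct internals directly.
void setup_example()
{
  ibp_depot_t      *depot = new_ibp_depot();
  ibp_attributes_t *attr  = new_ibp_attributes();
  ibp_timer_t      *timer = new_ibp_timer();
  rid_t rid = ibp_str2rid("0");   //** "0" is a placeholder resource id

  set_ibp_depot(depot, "depot.example.org", 6714, rid);                 //** hypothetical depot
  set_ibp_attributes(attr, time(NULL) + 3600, IBP_HARD, IBP_BYTEARRAY); //** 1 hour hard allocation
  set_ibp_timer(timer, 10, 10);                                         //** 10s client/server timeouts

  //** ... hand depot/attr/timer to the sync or async calls ...

  destroy_ibp_timer(timer);
  destroy_ibp_attributes(attr);
  destroy_ibp_depot(depot);
}
--------------------------------------------------------------------------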
I do want to highlight the RID management functions. My goal is to abstract what an RID is
from its use. With the routines above there is never a reason to probe into what an RID
actually is. It could be an integer (which is what it currently is), a character string, an
IP address, etc.

Asynchronous calls
---------------------------------------------------
The goal of the asynchronous interface is to minimize the effect of network latency, make
more efficient use of an individual network connection, and minimize the need for a
developer to understand pthreads programming. All the functionality of the traditional
synchronous calls is available in the async interface. I have taken the liberty to separate
out functionality and provide more descriptive names.

The best way to illustrate the programming differences is to give an example using the async
interface. Before the example, there are a few new datatypes defined for the async calls,
namely:

ibp_op_t  - Generic container for an async operation
oplist_t  - Contains a list of async operations

Below is a program that creates a collection of allocations:

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
//** ...plus the IBP client headers from this package

#define A_DURATION 3600       //** Example allocation lifetime in seconds

int ibp_timeout = 10;         //** Example client timeout in seconds

ibp_capset_t *create_allocs(int nallocs, int asize, ibp_depot_t *depot)
{
  int i, err;
  ibp_attributes_t attr;
  oplist_t *oplist;
  ibp_op_t *op;

  //** Create caps list which is returned **
  ibp_capset_t *caps = (ibp_capset_t *)malloc(sizeof(ibp_capset_t)*nallocs);

  //** Specify the allocation attributes **
  set_ibp_attributes(&attr, time(NULL) + A_DURATION, IBP_HARD, IBP_BYTEARRAY);

  oplist = new_ibp_oplist(NULL);    //** Create a new list of ops
  oplist_start_execution(oplist);   //** Go on and start executing tasks. This could be done any time

  //*** Main loop for creating the allocation ops ***
  for (i=0; i<nallocs; i++) {
     op = new_ibp_alloc_op(&(caps[i]), asize, depot, &attr, ibp_timeout, NULL);  //** This is the actual alloc op
     add_ibp_oplist(oplist, op);    //** Now add it to the list for execution
  }

  err = oplist_waitall(oplist);     //** Now wait for them all to complete
  if (err != IBP_OK) {
     printf("create_allocs: At least 1 error occurred! * ibp_errno=%d * nfailed=%d\n",
            err, oplist_nfailed(oplist));
  }

  free_oplist(oplist);              //** Free all the ops and oplist info

  return(caps);
}

int main(int argc, char *argv[])
{
  ibp_depot_t depot;
  rid_t rid;

  rid = ibp_str2rid("0");                                    //** Specify the Resource to use
  set_ibp_depot(&depot, "vudepot1.reddnet.org", 6714, rid);  //** Fill in the depot struct

  ibp_init();                       //** Initialize the IBP subsystem **REQUIRED**

  create_allocs(10, 1024, &depot);  //** Perform the allocations

  ibp_finalize();                   //** Shutdown the IBP subsystem ***REQUIRED***

  return(0);
}
--------------------------------------------------------------------------

The most important thing to notice in main() is the ibp_init() and ibp_finalize() calls.
These are required to start and shut down the IBP subsystem. Also notice the lack of anything
related to pthreads. All the multithreading takes place inside the IBP async layer.

You can mix and match IBP commands; they don't all have to be of the same type. Each
oplist_waitall() call waits until all currently submitted tasks have completed, which means
you can intersperse add_ibp_oplist() calls with ibp_waitany() or oplist_waitall() calls (a
short sketch of this pattern follows below). For a fuller example look at base_async_test()
in ibp_test.c.
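As a hedged sketch of that mix-and-match pattern (base_async_test() in ibp_test.c is the
authoritative example), the loop below drains completions one at a time with ibp_waitany()
and inspects each op individually. Whether ibp_waitany() returns NULL when nothing is
pending is an assumption here.

--------------------------------------------------------------------------
#include <stdio.h>
//** ...plus the IBP client headers from this package

//** Sketch: consume completions one at a time from an oplist that already has
//** ops added via add_ibp_oplist() and was started with oplist_start_execution().
void drain_results(oplist_t *oplist)
{
  ibp_op_t *op;

  while (oplist_tasks_left(oplist) > 0) {
     op = ibp_waitany(oplist);          //** Blocks until any single op finishes
     if (op == NULL) break;             //** Defensive; assumed behavior when nothing is pending

     if (ibp_op_status(op) != IBP_OK) {
        printf("op %d failed with error %d\n", ibp_op_id(op), ibp_op_status(op));
     }
     //** More ops could be added here with add_ibp_oplist() before waiting again
  }
}
--------------------------------------------------------------------------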
Internally each operation is assigned a "workload" and submitted to a global queue for an
individual depot. As the global depot queue fills up, the client launches individual depot
connections based on the backlog in the global depot queue. Each of the individual depot
connections maintains a local work queue and pulls ops from the depot's global queue as
tasks come in. Each of these connections is spawned as a separate execution thread and
performs its operations in parallel.

Each IBP operation can be broken up into 3 distinct phases: issue_command, send_phase, and
recv_phase. This breakdown is used to overcome latency by streaming operations to the depot
and having the depot stream the results back. To make this clearer, take a list of 4
commands, labeled cmd_1...cmd_4. Using the sync calls this would be processed as:

issue_command_1 (start of cmd_1)
send_phase_1
recv_phase_1 (wait for completion)
issue_command_2 (start of cmd_2)
send_phase_2
recv_phase_2 (wait for completion)
issue_command_3 (start of cmd_3)
send_phase_3
recv_phase_3 (wait for completion)
issue_command_4 (start of cmd_4)
send_phase_4
recv_phase_4 (wait for completion)

If the latency is large compared to the operation itself it becomes difficult to use a depot
connection effectively, since each issue_command and recv_phase incurs a network latency. As
a result one tends to make numerous connections to a depot and use just a fraction of the
bandwidth on each connection. This places a much higher load on the depot than is necessary.

If one uses the async calls, assuming a single depot connection, the operations are
reordered to minimize latency:

issue_command_1 (start of cmd_1)
send_phase_1
issue_command_2 (start of cmd_2 - no pause)
send_phase_2
issue_command_3 (start of cmd_3 - no pause)
send_phase_3
issue_command_4 (start of cmd_4 - no pause)
send_phase_4
recv_phase_1 (wait for completion)
recv_phase_2 (wait for completion)
recv_phase_3 (wait for completion)
recv_phase_4 (wait for completion)

In this approach a latency penalty is incurred for the initial issue_command and *possibly*
for the initial recv_phase. If there are enough commands in the global queue there is no
initial recv_phase latency, because the issue_command and send_phase ops are still being
processed and overlap the initial recv_phase. All of the issue_command and send_phase calls
are sent as fast as the network will transmit them; there is no waiting for completion. This
approach can eliminate much of the performance difference between local and remote depot
access for lightweight operations, like IBP_allocate() or IBP_manage() calls.

The async depot connection is actually 2 separate threads, each managing one side of the
connection: send or recv. As the workload varies between depots the client library
automatically adds and removes threads as needed. If a depot closes an existing connection,
any commands on the local queue are placed back on the depot's global queue and, if needed,
a new connection is spawned. There is also some self-tuning based on depot load. For
example, if a depot can only sustain 2 *stable* connections because of load, this is
automatically detected, which eliminates network churn and greatly improves performance.
After a preset time an attempt is made to increase the number of connections again. Several
parameters can be tweaked to tune this behavior; they are discussed in the IBP client
library configuration section later.

The synchronous commands are all constructed from the asynchronous calls, with additional
logic to support a depot connection per client thread. This way the traditional behavior is
preserved.
Asynchronous operations
----------------------------
All operations can use an application notification or callback structure through the
"oplist_app_notify_t" data type. This type is covered later in the section on native
oplist_t operations.

----Native operations on an ibp_op_t----
ibp_op_t *new_ibp_op();
void free_ibp_op(ibp_op_t *op);      -- Frees the internal variables for op and also op itself
void finalize_ibp_op(ibp_op_t *iop); -- Only frees the internal variables; the op remains intact
int ibp_op_status(ibp_op_t *op);     -- Get the op's result, i.e. the IBP_errno value
int ibp_op_id(ibp_op_t *op);         -- Get the op's id. Each op has a unique ID for tracking
                                        purposes in an oplist. The numbering always starts at 0.

----Read/Write ops that support offsets----------
Notice that there are 2 variants based on where the data comes from -- either a memory
buffer or a user supplied routine. Internally there is only the user version, since the
memory buffer versions just call the user version with an internally supplied routine. The
user specified versions allow you to perform scatter/gather operations into a coherent
stream without the overhead of multiple IBP calls. The user specified routine has the form:

int next_block(int pos, void *arg, int *nbytes, char **buffer);

It returns the next block of data to read/write, with the size stored in "nbytes" and a
pointer to the user supplied buffer in "buffer". The starting buffer position is passed in
"pos". The routine should return a valid IBP error code, so if everything goes fine IBP_OK
should be returned. The "arg" argument is the same pointer supplied to the read/write op and
is used to store private state information. Upon completion of a write operation an
additional call is made to the user routine with "buffer" set to NULL. This allows the write
call to perform any final processing on the last block of data. A short sketch of such a
routine is given below.
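For illustration only, here is a minimal sketch of a next_block routine suitable for the
user write ops listed in the next section, streaming a write from one contiguous in-memory
buffer. The state struct is hypothetical, and the handling of the final call (interpreted
here as buffer == NULL) follows the description above.

--------------------------------------------------------------------------
//** Hypothetical private state handed to the op through "arg"
typedef struct {
  char *data;     //** Base of the data to write
  int   total;    //** Total number of bytes to write
  int   chunk;    //** Preferred block size
} stream_state_t;

//** Sketch of a user routine for new_ibp_user_write_op()/set_ibp_user_write_op().
//** Fills "nbytes"/"buffer" with the next block starting at offset "pos".
int next_block(int pos, void *arg, int *nbytes, char **buffer)
{
  stream_state_t *s = (stream_state_t *)arg;

  if (buffer == NULL) {      //** Final call after the write completes (per the text above)
     return(IBP_OK);         //** Nothing to clean up in this sketch
  }

  if (pos >= s->total) {     //** Defensive: nothing left to send
     *nbytes = 0;
     return(IBP_OK);
  }

  *nbytes = (s->total - pos < s->chunk) ? (s->total - pos) : s->chunk;
  *buffer = s->data + pos;   //** Hand back a pointer into the caller's buffer
  return(IBP_OK);            //** A valid IBP error code is expected on failure
}
--------------------------------------------------------------------------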
void set_ibp_user_read_op(ibp_op_t *op, ibp_cap_t *cap, int offset, int size,
       int (*next_block)(int, void *, int *, char **), void *arg, int timeout, oplist_app_notify_t *an);
ibp_op_t *new_ibp_user_read_op(ibp_cap_t *cap, int offset, int size,
       int (*next_block)(int, void *, int *, char **), void *arg, int timeout, oplist_app_notify_t *an);
ibp_op_t *new_ibp_read_op(ibp_cap_t *cap, int offset, int size, char *buffer, int timeout, oplist_app_notify_t *an);
void set_ibp_read_op(ibp_op_t *op, ibp_cap_t *cap, int offset, int size, char *buffer, int timeout, oplist_app_notify_t *an);
void set_ibp_user_write_op(ibp_op_t *op, ibp_cap_t *cap, int offset, int size,
       int (*next_block)(int, void *, int *, char **), void *arg, int timeout, oplist_app_notify_t *an);
ibp_op_t *new_ibp_user_write_op(ibp_cap_t *cap, int offset, int size,
       int (*next_block)(int, void *, int *, char **), void *arg, int timeout, oplist_app_notify_t *an);
ibp_op_t *new_ibp_write_op(ibp_cap_t *cap, int offset, int size, char *buffer, int timeout, oplist_app_notify_t *an);
void set_ibp_write_op(ibp_op_t *op, ibp_cap_t *cap, int offset, int size, char *buffer, int timeout, oplist_app_notify_t *an);

---- Append operations - These just append data to an allocation -----
ibp_op_t *new_ibp_append_op(ibp_cap_t *cap, int size, char *buffer, int timeout, oplist_app_notify_t *an);
void set_ibp_append_op(ibp_op_t *op, ibp_cap_t *cap, int size, char *buffer, int timeout, oplist_app_notify_t *an);

-----------IBP allocate/remove operations-----------------------
I've made an explicit ibp_remove_op to decrement the appropriate allocation's ref count.
This is just a macro for an ibp_manage() call with an IBP_DECR command.

ibp_op_t *new_ibp_alloc_op(ibp_capset_t *caps, int size, ibp_depot_t *depot, ibp_attributes_t *attr,
       int timeout, oplist_app_notify_t *an);
void set_ibp_alloc_op(ibp_op_t *op, ibp_capset_t *caps, int size, ibp_depot_t *depot, ibp_attributes_t *attr,
       int timeout, oplist_app_notify_t *an);
ibp_op_t *new_ibp_remove_op(ibp_cap_t *cap, int timeout, oplist_app_notify_t *an);
void set_ibp_remove_op(ibp_op_t *op, ibp_cap_t *cap, int timeout, oplist_app_notify_t *an);

----------------Modify an allocation's reference count--------------------
ibp_op_t *new_ibp_modify_count_op(ibp_cap_t *cap, int mode, int captype, int timeout, oplist_app_notify_t *an);
void set_ibp_modify_count_op(ibp_op_t *op, ibp_cap_t *cap, int mode, int captype, int timeout, oplist_app_notify_t *an);
void set_ibp_modify_alloc_op(ibp_op_t *op, ibp_cap_t *cap, size_t size, time_t duration, int reliability,
       int timeout, oplist_app_notify_t *an);
ibp_op_t *new_ibp_modify_alloc_op(ibp_cap_t *cap, size_t size, time_t duration, int reliability,
       int timeout, oplist_app_notify_t *an);

--------------- Probe an allocation for details------------------
ibp_op_t *new_ibp_probe_op(ibp_cap_t *cap, ibp_capstatus_t *probe, int timeout, oplist_app_notify_t *an);
void set_ibp_probe_op(ibp_op_t *op, ibp_cap_t *cap, ibp_capstatus_t *probe, int timeout, oplist_app_notify_t *an);
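As a quick hedged sketch tying the probe op to the capstatus accessors from the datatypes
section: probe a capability through a one-op oplist and print the counts that come back
(error handling abbreviated).

--------------------------------------------------------------------------
#include <stdio.h>
//** ...plus the IBP client headers from this package

//** Sketch: probe an existing capability and print what came back.
//** "timeout" is a client timeout in seconds.
void probe_example(ibp_cap_t *cap, int timeout)
{
  int readcount, writecount, current_size, max_size;
  ibp_attributes_t *attr = new_ibp_attributes();
  ibp_capstatus_t  *cs   = new_ibp_capstatus();
  oplist_t *oplist = new_ibp_oplist(NULL);

  add_ibp_oplist(oplist, new_ibp_probe_op(cap, cs, timeout, NULL));
  oplist_start_execution(oplist);

  if (oplist_waitall(oplist) == IBP_OK) {
     get_ibp_capstatus(cs, &readcount, &writecount, &current_size, &max_size, attr);
     printf("refs: %d read / %d write, size: %d of %d bytes\n",
            readcount, writecount, current_size, max_size);
  }

  free_oplist(oplist);          //** Frees the probe op as well
  destroy_ibp_capstatus(cs);
  destroy_ibp_attributes(attr);
}
--------------------------------------------------------------------------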
----------------Depot-depot copy---------------------------------
Notice that the names are "copyappend" because this is what actually happens: the user
specifies the source cap's offset and length, and that data is *appended* to the destination
cap. As a result, once an allocation becomes full it is *impossible* to specify it as a
destination cap for future depot-depot copies. Ideally a new command could be added to
specify a destination offset, along with a command for truncating an allocation.

ibp_op_t *new_ibp_copyappend_op(ibp_cap_t *srccap, ibp_cap_t *destcap, int src_offset, int size,
       int src_timeout, int dest_timeout, int dest_client_timeout, oplist_app_notify_t *an);
void set_ibp_copyappend_op(ibp_op_t *op, ibp_cap_t *srccap, ibp_cap_t *destcap, int src_offset, int size,
       int src_timeout, int dest_timeout, int dest_client_timeout, oplist_app_notify_t *an);

--------------Modify a depot's global resources-----------------
void set_ibp_depot_modify_op(ibp_op_t *op, ibp_depot_t *depot, char *password, size_t hard, size_t soft,
       time_t duration, int timeout, oplist_app_notify_t *an);
ibp_op_t *new_ibp_depot_modify_op(ibp_depot_t *depot, char *password, size_t hard, size_t soft,
       time_t duration, int timeout, oplist_app_notify_t *an);

---------------Depot Inquiry calls aka IBP_status()----------------------
void set_ibp_depot_inq_op(ibp_op_t *op, ibp_depot_t *depot, char *password, ibp_depotinfo_t *di,
       int timeout, oplist_app_notify_t *an);
ibp_op_t *new_ibp_depot_inq_op(ibp_depot_t *depot, char *password, ibp_depotinfo_t *di,
       int timeout, oplist_app_notify_t *an);

---------------Depot version call-----------------------------------------
This is a new command added to the ACCRE depot. It returns a free-form character string,
terminated by "END\n" on a line by itself. This is similar to the "Help->About" widgets in
GUI apps.

void set_ibp_version_op(ibp_op_t *op, ibp_depot_t *depot, char *buffer, int buffer_size,
       int timeout, oplist_app_notify_t *an);
ibp_op_t *new_ibp_version_op(ibp_depot_t *depot, char *buffer, int buffer_size,
       int timeout, oplist_app_notify_t *an);

-----------------Request list of depot resources--------------------------
This command was added by Nevoa Networks to probe a depot for its resource list.

void set_ibp_query_resources_op(ibp_op_t *op, ibp_depot_t *depot, ibp_ridlist_t *rlist,
       int timeout, oplist_app_notify_t *an);
ibp_op_t *new_ibp_query_resources_op(ibp_depot_t *depot, ibp_ridlist_t *rlist,
       int timeout, oplist_app_notify_t *an);

oplist_t operations
----------------------------
-----------Application notification callback----------------
The "oplist_app_notify_t" type is used in all operations, oplists, and the actual oplist
implementation. The structure takes a user supplied notification routine, *notify* below,
and a user argument. Below are the routines needed to set and execute the structure (a short
callback sketch follows).

void app_notify_set(oplist_app_notify_t *an, void (*notify)(void *data), void *data);
void app_notify_execute(oplist_app_notify_t *an);
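A minimal hedged sketch of attaching a completion callback with app_notify_set(). The
progress counter type is hypothetical, and whether the callback may fire from a library
thread (and so needs locking around shared state) is an assumption worth checking.

--------------------------------------------------------------------------
#include <stdio.h>
//** ...plus the IBP client headers from this package

typedef struct {
  int completed;   //** Hypothetical user state handed to the callback
} progress_t;

//** Notification routine invoked by the library with the user data pointer
void my_notify(void *data)
{
  progress_t *p = (progress_t *)data;
  p->completed++;               //** Not locked; a real callback may need a mutex
  printf("ops completed so far: %d\n", p->completed);
}

//** Sketch: attach the callback to an oplist so it fires as each op completes
void submit_with_callback(ibp_op_t *op)
{
  progress_t progress = {0};
  oplist_app_notify_t an;
  oplist_t *oplist;

  app_notify_set(&an, my_notify, &progress);  //** Bind routine + user data
  oplist = new_ibp_oplist(&an);               //** Per the text: called each time an op completes

  add_ibp_oplist(oplist, op);
  oplist_start_execution(oplist);

  oplist_waitall(oplist);       //** By now my_notify() has fired for the completed op
  free_oplist(oplist);          //** Frees the op and the oplist bookkeeping
}
--------------------------------------------------------------------------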
----- Manipulate an oplist_t------------
Notice that both new_ibp_oplist() and init_oplist() take an "oplist_app_notify_t" parameter.
This provides callback or application notification functionality. The "oplist_app_notify_t"
type is defined above, and its notify routine is called each time an operation completes.

oplist_t *new_ibp_oplist(oplist_app_notify_t *an);
void init_oplist(oplist_t *iol, oplist_app_notify_t *an);
void free_oplist(oplist_t *oplist);                  -- Free the oplist internal data and the oplist itself
void finalize_oplist(oplist_t *oplist, int op_mode); -- Only free the oplist internal data; the oplist struct is not freed

------------------Valid op_mode values----------------
OPLIST_AUTO_NONE     0   //** User has to manually free the data and oplist
OPLIST_AUTO_FINALIZE 1   //** Auto "finalize" the oplist when finished
OPLIST_AUTO_FREE     2   //** Auto "free" the oplist when finished

-----------Retrieve failed operations-----------------
To get a failed op's status use ibp_op_status(), defined earlier. One can then probe the
op's id with ibp_op_id().

int oplist_nfailed(oplist_t *oplist);
ibp_op_t *ibp_get_failed_op(oplist_t *oplist);

-------------Determine the number of tasks remaining to be processed------------
int oplist_tasks_left(oplist_t *oplist);

-------Add an operation to a list---------------
int add_ibp_oplist(oplist_t *iolist, ibp_op_t *iop);

-----------Signal completed task submission-----------------
If using callbacks there may be no need to use the "wait" routines to block until an oplist
completes. In this case you can signal to the oplist system that you are finished submitting
tasks and let it automatically handle memory reclamation. This is done via the routine
below, where "free_mode" is one of the op_mode values defined above for finalize_oplist().

void oplist_finished_submission(oplist_t *oplist, int free_mode);

-----------Wait for operation completion----------------
int oplist_waitall(oplist_t *iolist);          -- All current oplist tasks must complete before returning
ibp_op_t *ibp_waitany(oplist_t *iolist);       -- Returns when any task in the list completes
void oplist_start_execution(oplist_t *oplist); -- Start executing the commands in the list
int ibp_sync_command(ibp_op_t *op);            -- Quick and dirty way to execute a command without all the
                                                  extra overhead of list manipulation. This is how all the
                                                  sync commands are created (a short sketch follows).
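As a hedged sketch of the ibp_sync_command() shortcut: a one-off blocking remove built from
an async op. Whether the caller still owns and frees the op afterwards is an assumption.

--------------------------------------------------------------------------
#include <stdio.h>
//** ...plus the IBP client headers from this package

//** Sketch: issue a single blocking remove without building an oplist.
//** "timeout" is a client timeout in seconds.
int remove_one(ibp_cap_t *cap, int timeout)
{
  int err;
  ibp_op_t *op = new_ibp_remove_op(cap, timeout, NULL);   //** No callback

  err = ibp_sync_command(op);   //** Executes the op and blocks until it finishes
  if (err != IBP_OK) {
     printf("remove failed: error %d\n", err);
  }

  free_ibp_op(op);              //** Assumption: the caller still owns and frees the op
  return(err);
}
--------------------------------------------------------------------------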
IBP client library configuration file
---------------------------------------------
The client library has several adjustable parameters that can be modified either from a
configuration file or through function calls. An example configuration file is given below:

[ibp_async]
min_depot_threads = 1
max_depot_threads = 4
max_connections = 128
command_weight = 10240
max_thread_workload = 10485760
#Swap out the line below for low latency networks
#max_thread_workload = 524288
wait_stable_time = 15
check_interval = 5
max_retry = 2

min_depot_threads/max_depot_threads - Min and max number of threads that are created to a
     specific depot. These parameters are ignored for synchronous calls.
max_connections - Max number of allowed connections for all sync and async calls. If this
     number is reached the client starts closing underutilized connections.
command_weight - Base weight to assign to a command. A R/W command adds the number of bytes
     read/written to this base weight.
max_thread_workload - Once the depot's global queue has this much work a new thread is created.
wait_stable_time - Amount of time to wait, in seconds, before trying to launch a new
     connection. This is only triggered if the depot has been closing connections.
check_interval - Max time, in seconds, to wait between workload checks.
max_retry - Max number of times to retry a command. Only used for dead connection failures.

Configuration routines
-----------------------------
--------IBP client generic routines-------
void ibp_init();            -- Initialize the IBP subsystem. Must be called before any sync or async commands
void ibp_finalize();        -- Shuts down the IBP subsystem
char *ibp_client_version(); -- Returns an arbitrary character string with version information

-----------Load config from file or store config-----------
int ibp_load_config(char *fname);
void set_ibp_config(ibp_config_t *cfg);
void default_ibp_config();

---------------Modify parameter routines-----------
void ibp_set_min_depot_threads(int n);    int ibp_get_min_depot_threads();
void ibp_set_max_depot_threads(int n);    int ibp_get_max_depot_threads();
void ibp_set_max_connections(int n);      int ibp_get_max_connections();
void ibp_set_command_weight(int n);       int ibp_get_command_weight();
void ibp_set_max_thread_workload(int n);  int ibp_get_max_thread_workload();
void ibp_set_wait_stable_time(int n);     int ibp_get_wait_stable_time();
void ibp_set_check_interval(int n);       int ibp_get_check_interval();
void ibp_set_max_retry(int n);            int ibp_get_max_retry();

(A short startup sketch using a few of these routines is given at the end of this README.)

Example Programs
---------------------------------------------
There are 3 programs included showing how to use both the sync and async calls:

ibp_perf     - Performs client-to-depot benchmarks
ibp_copyperf - Performs depot-to-depot benchmarks
ibp_test     - Checks basic functionality; it does not run load tests the way the other two
               programs do
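Finally, to tie the configuration sections above together, a minimal hedged sketch of
initializing the library and overriding a couple of parameters at startup. The choice to
call default_ibp_config() explicitly, the parameter values, and the "ibp.cfg" file name are
illustrative assumptions.

--------------------------------------------------------------------------
//** ...the IBP client headers from this package are assumed to be included

//** Sketch: start the subsystem, reset to the library defaults, and adjust two knobs.
//** ibp_load_config("ibp.cfg") could be used instead to read a file like the example
//** above ("ibp.cfg" is just a placeholder name).
void setup_ibp(void)
{
  ibp_init();                     //** Required before any sync or async command

  default_ibp_config();           //** Start from the built-in defaults
  ibp_set_max_depot_threads(2);   //** Example override for a lightly loaded depot
  ibp_set_max_retry(3);           //** Example: retry dead-connection failures once more
}
--------------------------------------------------------------------------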