Bulk Processing

Ross Scroggs edited this page Dec 12, 2019 · 38 revisions

Introduction

Batch and CSV file processing can improve performance by executing GAM commands in parallel. Three variables in gam.cfg control the parallelism: num_threads, num_tbatch_threads and auto_batch_min.

Batch files

There are two types of batch processing: one runs commands as processes and the other runs them as threads. Processes give higher performance, but a batch file run with processes cannot contain gam csv commands.

  • gam batch - gam commands are run as processes, gam csv commands are not allowed in the batch file
  • gam tbatch - gam commands are run as threads, gam csv commands are allowed in the batch file
<UserGoogleDoc> ::=
        <EmailAddress> <DriveFileIDEntity>|<DriveFileNameEntity>|(<TeamDriveNameEntity> <TeamDriveFileNameEntity>)

gam batch <FileName>|-|(gdoc <UserGoogleDoc>) [charset <Charset>] [showcmds [<Boolean>]]
gam tbatch <FileName>|-|(gdoc <UserGoogleDoc>) [charset <Charset>] [showcmds [<Boolean>]]
  • <FileName> - A flat file containing GAM commands
  • - - GAM commands read from stdin
  • gdoc <UserGoogleDoc> - A Google Doc containing GAM commands

Batch files can contain the following types of lines:

  • Blank lines - Ignored
  • # Comment line - Ignored
  • gam <GAMArgumentList> - Execute a GAM command
  • commit-batch - GAM waits for all currently running gam commands to complete before proceeding
  • print <String> - Print <String> on stderr

Tbatch files can also contain the following line:

  • execute <Program> <ArgumentList> - Execute an arbitrary command; use the full path to specify <Program>
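
The line types listed above can be told apart with a small classifier. The following Python sketch is only an illustration, not GAM's own parser: the function name classify_batch_line, the allow_execute flag, the case-insensitive keyword match and the use of shlex for argument splitting are all assumptions made for this example.

```python
import shlex

def classify_batch_line(line, allow_execute=False):
    """Return (kind, args) for one batch-file line.

    Hypothetical helper: kind is one of 'blank', 'comment', 'gam',
    'commit-batch', 'print', 'execute' or 'unknown'.
    """
    stripped = line.strip()
    if not stripped:
        return ("blank", [])
    if stripped.startswith("#"):
        return ("comment", [])
    parts = stripped.split(None, 1)
    keyword = parts[0].lower()  # assumption: keywords are matched case-insensitively
    rest = parts[1] if len(parts) > 1 else ""
    if keyword == "gam":
        return ("gam", shlex.split(rest))
    if keyword == "commit-batch":
        return ("commit-batch", [])
    if keyword == "print":
        return ("print", [rest.strip()])
    if keyword == "execute" and allow_execute:
        # Only tbatch files accept execute lines, hence the flag.
        return ("execute", shlex.split(rest))
    return ("unknown", [stripped])
```

Passing allow_execute=True models a tbatch file; with the default, an execute line is reported as unknown.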

Example

  • You need to create accounts for your new students and assign them to groups based on their graduation year.
  • You have a CSV file NewStudents.csv with columns: Email,First,Last,GradYear,Password
  • You have a batch file NewStudents.bat containing these commands:
gam csv NewStudents.csv gam create user ~Email firstname ~First lastname ~Last org "/Students/~~GradYear~~" password ~Password
commit-batch
gam update group seniors sync members ou /Students/2020
gam update group juniors sync members ou /Students/2021
gam update group sophomores sync members ou /Students/2022
gam update group highschool sync members ous "'/Students/2020','/Students/2021','/Students/2022'"
  • Execute the batch file
gam redirect stdout ./NewStudents.out redirect stderr ./NewStudents.err tbatch NewStudents.bat showcmds
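
In this example commit-batch acts as a barrier: the group updates must not start until every create user command has finished. The Python sketch below imitates that ordering with a thread pool. It does not run GAM and is not GAM's implementation; run_batch, fake_command and the shortened command strings are hypothetical stand-ins.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait

def run_batch(lines, run_command, num_threads=4):
    """Run lines in parallel; 'commit-batch' waits for everything submitted so far."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        pending = []
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # blank lines and comments are ignored
            if line == "commit-batch":
                wait(pending)  # barrier: block until all earlier commands finish
                pending = []
            else:
                pending.append(pool.submit(run_command, line))
        wait(pending)

finished = []
finished_lock = threading.Lock()

def fake_command(line):
    # Placeholder for a real command; "create" lines are made artificially slow.
    if " create " in line:
        time.sleep(0.05)
    with finished_lock:
        finished.append(line)

run_batch(
    [
        "gam create user a",
        "gam create user b",
        "commit-batch",
        "gam update group seniors",
    ],
    fake_command,
)
```

Without the commit-batch line, the fast group update could complete before the slower user creations, which is the situation the batch file above avoids.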

CSV files

<UserGoogleSheet> ::=
        <EmailAddress> <DriveFileIDEntity>|<DriveFileNameEntity> <SheetName>

gam csv <FileName>|-|(gsheet <UserGoogleSheet>) [charset <Charset>] [columndelimiter <Character>] [quotechar <Character>] [fields <FieldNameList>]
        (matchfield|skipfield <FieldName> <RegularExpression>)* gam <GAMArgumentList>

gam loop <FileName>|-|(gsheet <UserGoogleSheet>) [charset <Charset>] [columndelimiter <Character>] [quotechar <Character>] [fields <FieldNameList>]
        (matchfield|skipfield <FieldName> <RegularExpression>)* gam <GAMArgumentList>
  • gam csv - Use parallel processing
  • gam loop - Use serial processing
  • <FileName> - A CSV file with one or more columns of data
  • - - CSV data with one or more columns, read from stdin
  • gsheet <UserGoogleSheet> - A Google Sheet with one or more columns of data
  • columndelimiter <Character> - Columns are separated by <Character>; if not specified, the value of csv_input_column_delimiter from gam.cfg will be used
  • quotechar <Character> - The column quote character is <Character>; if not specified, the value of csv_input_quote_char from gam.cfg will be used
  • fields <FieldNameList> - Supplies the column headings for a CSV file that has no header row
  • (matchfield|skipfield <FieldName> <RegularExpression>)* - The criteria to select rows from the CSV file; can be used multiple times; if not specified, all rows are selected
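
Row selection with matchfield and skipfield can be pictured as a regular-expression filter over the CSV rows. The Python sketch below rests on assumptions that this page does not confirm: a row is kept only if every matchfield pattern is found in its field and no skipfield pattern is found in its field, patterns are applied with re.search, and a missing field is treated as empty. The name select_rows and the sample data are hypothetical.

```python
import csv
import io
import re

def select_rows(rows, match_fields=(), skip_fields=()):
    """Yield rows that satisfy all (field, regex) matches and none of the skips."""
    match = [(name, re.compile(rx)) for name, rx in match_fields]
    skip = [(name, re.compile(rx)) for name, rx in skip_fields]
    for row in rows:
        # Assumption: a missing or empty field is compared as the empty string.
        keep = all(rx.search(row.get(name) or "") for name, rx in match)
        drop = any(rx.search(row.get(name) or "") for name, rx in skip)
        if keep and not drop:
            yield row

# Hypothetical sample data for illustration only.
sample = io.StringIO(
    "Email,GradYear\n"
    "ann@example.com,2020\n"
    "test1@example.com,2020\n"
    "bob@example.com,2023\n"
)
selected = list(
    select_rows(
        csv.DictReader(sample),
        match_fields=[("GradYear", r"^202[0-2]$")],
        skip_fields=[("Email", r"^test")],
    )
)
```

Here only the row for ann@example.com is selected: the test1 row is skipped by the Email pattern and the bob row fails the GradYear match.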

You can make substitutions in <GAMArgumentList> with values from the CSV file.

  • An argument containing exactly ~xxx is replaced by the value of field xxx
  • An argument containing instances of ~~xxx~~ has ~~xxx~~ replaced by the value of field xxx
  • An argument containing instances of ~~xxx~!~pattern~!~replacement~~ has ~~xxx~!~pattern~!~replacement~~ replaced by re.sub(pattern, replacement, value of field xxx). See: https://docs.python.org/2/library/re.html
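
The three substitution rules above can be sketched in a few lines of Python. This is a simplified illustration rather than GAM's actual code: the function name substitute, the FIELD_REF pattern and the sample row are assumptions, and a reference to a field that is not in the row simply raises KeyError.

```python
import re

# Matches ~~...~~ non-greedily so several references in one argument are handled.
FIELD_REF = re.compile(r"~~(.+?)~~")

def substitute(arg, row):
    """Apply the ~xxx, ~~xxx~~ and ~~xxx~!~pattern~!~replacement~~ rules to one argument."""
    # Rule 1: the whole argument is ~xxx.
    if arg.startswith("~") and not arg.startswith("~~"):
        return row[arg[1:]]

    def expand(match):
        parts = match.group(1).split("~!~")
        if len(parts) == 3:
            # Rule 3: regular-expression rewrite of the field value.
            field, pattern, replacement = parts
            return re.sub(pattern, replacement, row[field])
        # Rule 2: plain field value.
        return row[match.group(1)]

    return FIELD_REF.sub(expand, arg)

# Hypothetical row for illustration only.
row = {
    "primaryEmail": "jane@example.com",
    "Street": "1 Main St",
    "City": "Springfield",
    "State": "IL",
    "ZIP": "62701",
}
address = substitute("~~Street~~, ~~City~~, ~~State~~ ~~ZIP~~", row)
note = substitute(r"~~primaryEmail~!~^(.+)@(.+)$~!~\1 AT \2~~", row)
```

With this row, address becomes "1 Main St, Springfield, IL 62701" and note becomes "jane AT example.com". Arguments without any ~ reference pass through unchanged.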

Example

  • You need to update the work addresses of a set of users
  • You want a note field that shows their email address as name AT domain.com
  • You have a CSV file Users.csv with columns: primaryEmail,Street,City,State,ZIP
gam csv Users.csv gam update user ~primaryEmail address type work unstructured "~~Street~~, ~~City~~, ~~State~~ ~~ZIP~~" primary note text_plain "~~primaryEmail~!~^(.+)@(.+)$~!~\1 AT \2~~"
  • You want to do the above using a Google Sheet
gam csv gsheet <user> <fileID> "<sheetName>" gam update user ~primaryEmail address type work unstructured "~~Street~~, ~~City~~, ~~State~~ ~~ZIP~~" primary note text_plain "~~primaryEmail~!~^(.+)@(.+)$~!~\1 AT \2~~"

CSV files with redirection and select

You should use the multiprocess option on any redirected files: csv, stdout, stderr.

gam redirect csv ./filelistperms.csv multiprocess csv Users.csv gam user ~primaryEmail print filelist fields id,title,permissions,owners.emailaddress
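
This page does not describe how the multiprocess option works internally. As a general illustration of why parallel commands need coordination when they share one output file, the Python sketch below funnels rows from several workers through a queue to a single writer, so that complete rows are written one at a time. All names (writer_loop, worker, the column names and the user list) are hypothetical, and threads are used only to keep the example short.

```python
import csv
import io
import queue
import threading

def writer_loop(row_queue, out, fieldnames):
    """Single writer: the only code that touches the output file."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    while True:
        row = row_queue.get()
        if row is None:  # sentinel: no more rows
            break
        writer.writerow(row)

def worker(user, row_queue):
    """Stand-in for one parallel command producing rows for one user."""
    for n in range(3):
        row_queue.put({"Owner": user, "id": f"{user}-file{n}"})

rows = queue.Queue()
out = io.StringIO()  # stands in for the redirected CSV file
writer_thread = threading.Thread(target=writer_loop, args=(rows, out, ["Owner", "id"]))
writer_thread.start()

workers = [threading.Thread(target=worker, args=(u, rows)) for u in ("alice", "bob", "carol")]
for t in workers:
    t.start()
for t in workers:
    t.join()

rows.put(None)  # tell the writer to finish
writer_thread.join()
lines = out.getvalue().splitlines()
```

The output has one header line followed by nine data rows. The order of the rows depends on scheduling, but no row is interleaved with another.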

If you want to select a gam.cfg section for the command, you can select the section at the outer gam and save it or select the section at the inner gam.

gam select <Section> save redirect csv ./filelistperms.csv multiprocess csv Users.csv gam user ~primaryEmail print filelist fields id,title,permissions,owners.emailaddress
gam redirect csv ./filelistperms.csv multiprocess csv Users.csv gam select <Section> user ~primaryEmail print filelist fields id,title,permissions,owners.emailaddress

Automatic batch processing

You can enable automatic batch (parallel) processing when issuing commands of the form gam <UserTypeEntity> .... In the following example, if the number of users in group [email protected] exceeds 1, then the print filelist command will be processed in parallel.

gam config auto_batch_min 1 redirect csv ./filelistperms.csv multiprocess group [email protected] print filelist fields id,title,permissions,owners.emailaddress

With automatic batch processing, you should use the multiprocess option on any redirected files: csv, stdout, stderr.

If you want to select a gam.cfg section for the command, you must select and save it for it to be processed correctly.

gam select <Section> save config auto_batch_min 1 redirect csv ./filelistperms.csv multiprocess group [email protected] print filelist fields id,title,permissions,owners.emailaddress
