Traffic Parser

Parser for *.csv.gz traffic files. Requires Node.js 12 or later.

Example usage:

$ git clone https://github.com/bbar/traffic-parser.git traffic-parser
$ cd traffic-parser
$ yarn install # (or npm install)
$ node parse.js \
    --sources="/some/path/a.csv.gz /some/path/b.csv.gz /some/path/c.csv.gz /some/path/d.csv.gz" \
    --destination="/some/path/data/parsed/intervals" \
    --weekdays="0,1,2,3,4,5,6" \
    --batch=625 \
    --sourceInterval=5 \
    --targetInterval=60

| Argument | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| sources | yes | String | | Files to parse |
| destination | yes | String | | Directory where parsed files are placed |
| weekdays | no | String | 0,1,2,3,4,5,6 | Weekdays to parse (0=Sunday) |
| batch | no | Int | 625 | Max lines written to any file at once |
| sourceInterval | no | Int | 5 | (Mins) Interval between traffic samples in the *.csv.gz file |
| targetInterval | no | Int | 5 | (Mins) Desired interval between traffic samples for parsed files |
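
To make the weekdays and interval arguments more concrete, here is a minimal sketch of how such values could be interpreted. The helper names `parseWeekdays` and `samplesPerTarget` are hypothetical and not taken from parse.js, and the requirement that targetInterval be a whole multiple of sourceInterval is an assumption of this sketch, not something the parser is known to enforce.

```javascript
// Hypothetical helpers for illustration; parse.js may handle these differently.

// Turn a string such as "0,1,2" into a Set of weekday numbers.
// 0 = Sunday, matching what Date.prototype.getDay() returns.
function parseWeekdays(value) {
  const days = new Set();
  for (const part of value.split(',')) {
    const token = part.trim();
    if (!/^[0-6]$/.test(token)) {
      throw new RangeError(`Invalid weekday: "${token}"`);
    }
    days.add(Number(token));
  }
  return days;
}

// Number of source samples that make up one target interval.
// Assumption: the target interval is a whole multiple of the source interval.
function samplesPerTarget(sourceInterval, targetInterval) {
  const valid = (n) => Number.isInteger(n) && n > 0;
  if (!valid(sourceInterval) || !valid(targetInterval)) {
    throw new RangeError('Intervals must be positive integers (minutes)');
  }
  if (targetInterval % sourceInterval !== 0) {
    throw new RangeError('targetInterval must be a multiple of sourceInterval');
  }
  return targetInterval / sourceInterval;
}
```

Under these assumptions, the example invocation above (sourceInterval=5, targetInterval=60) would combine 12 source samples into each parsed sample.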

A quick note about the batch argument: a batch size of 625 means the code parses 625 lines from a *.csv.gz file, writes them to disk, then parses the next 625 lines, writes those to disk, and so on until the file is done. If you set a batch size of 30000 or so and haven't raised Node's heap limit with --max-old-space-size, V8 will likely abort with an out-of-memory error. Surprisingly (to me, anyway), a larger batch size doesn't mean better performance. I started at 20000 and kept cutting it in half until I saw the best performance, which was around 625. So that's the default.
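
The parse-then-flush cycle described above can be sketched roughly as follows. This is an illustration of the batching idea only, not the actual code in parse.js: `createBatcher`, `parseInBatches`, and the `writeBatch` callback are hypothetical names, and the sketch assumes the callback writes synchronously.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const readline = require('readline');

// Collect lines and hand them to `flush` in groups of at most `batchSize`.
function createBatcher(batchSize, flush) {
  let batch = [];
  return {
    push(line) {
      batch.push(line);
      if (batch.length >= batchSize) {
        flush(batch);
        batch = []; // start a fresh array so the flushed lines can be freed
      }
    },
    // Flush whatever is left once the input is exhausted.
    end() {
      if (batch.length > 0) {
        flush(batch);
        batch = [];
      }
    },
  };
}

// Stream a gzipped file line by line, so that only one batch of lines
// is held in memory at a time.
async function parseInBatches(sourcePath, batchSize, writeBatch) {
  const lines = readline.createInterface({
    input: fs.createReadStream(sourcePath).pipe(zlib.createGunzip()),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  const batcher = createBatcher(batchSize, writeBatch);
  for await (const line of lines) {
    batcher.push(line);
  }
  batcher.end();
}
```

A caller might pass something like `(batch) => fs.appendFileSync(outPath, batch.join('\n') + '\n')` as `writeBatch`, where `outPath` is a placeholder. If a much larger batch is really needed, Node's heap limit can be raised when launching the script, for example `node --max-old-space-size=4096 parse.js ...`; the value 4096 (in MB) is only illustrative.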