bbar/traffic-parser

Traffic Parser

Parser for *.csv.gz traffic files. Requires Node.js 12 or later.

Example usage:

$ git clone https://github.com/bbar/traffic-parser.git traffic-parser
$ cd traffic-parser
$ yarn install # (or npm install)
$ node parse.js \
    --sources="/some/path/a.csv.gz /some/path/b.csv.gz /some/path/c.csv.gz /some/path/d.csv.gz" \
    --destination="/some/path/data/parsed/intervals" \
    --weekdays="0,1,2,3,4,5,6" \
    --batch=625 \
    --sourceInterval=5 \
    --targetInterval=60

| Argument | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| sources | yes | String | | Files to parse |
| destination | yes | String | | Directory where parsed files are placed |
| weekdays | no | String | 0,1,2,3,4,5,6 | Weekdays to parse (0=Sunday) |
| batch | no | Int | 625 | Max lines written to any file at once |
| sourceInterval | no | Int | 5 | Interval, in minutes, between traffic samples in the *.csv.gz file |
| targetInterval | no | Int | 5 | Desired interval, in minutes, between traffic samples for parsed files |
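
To make the interval and weekday arguments a little more concrete, here is a minimal sketch of how they could be interpreted. The helper names (`samplesPerTarget`, `parseWeekdays`, `isSelectedWeekday`) are hypothetical and not taken from parse.js. The sketch also assumes that the target interval is a whole multiple of the source interval and that weekdays are evaluated in UTC; the real parser may handle both differently.

```javascript
// Hypothetical helpers for illustration only; not the actual parse.js code.

// Number of consecutive source samples that make up one target sample.
// Assumption: targetInterval is a whole multiple of sourceInterval.
function samplesPerTarget(sourceInterval, targetInterval) {
  if (targetInterval % sourceInterval !== 0) {
    throw new Error('targetInterval must be a multiple of sourceInterval');
  }
  return targetInterval / sourceInterval;
}

// Turn a --weekdays string such as "1,2,3,4,5" into a Set of day numbers.
function parseWeekdays(value) {
  return new Set(value.split(',').map((day) => Number(day.trim())));
}

// True when the date falls on a selected weekday (0=Sunday).
// Assumption: the weekday is taken in UTC.
function isSelectedWeekday(date, weekdays) {
  return weekdays.has(date.getUTCDay());
}

// With --sourceInterval=5 and --targetInterval=60, twelve 5-minute
// samples would feed each 60-minute sample.
console.log(samplesPerTarget(5, 60)); // 12

// 2021-03-07 was a Sunday, so a Monday-to-Friday filter skips it.
const weekdays = parseWeekdays('1,2,3,4,5');
console.log(isSelectedWeekday(new Date('2021-03-07T12:00:00Z'), weekdays)); // false
```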

A quick note about the batch argument: a batch size of 625 means the code parses 625 lines from a *.csv.gz file, writes them to disk, then parses the next 625 lines, writes those to disk, and so on until the file is done. If you set a batch size of 30000 or so without raising Node's heap limit via --max-old-space-size, V8 will likely crash with an out-of-memory error. Surprisingly (to me, anyway), a larger batch size doesn't mean better performance. I started at 20000 and kept cutting it in half until I saw the best performance, which was at around 625. So that's the default.
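
The read-a-batch, write-a-batch loop described above can be sketched roughly as follows. This is not the code in parse.js, just an illustration using Node's built-in `fs`, `zlib`, and `readline` modules; `processInBatches`, `writeBatch`, and `demo` are hypothetical names, and the demo builds its own small *.csv.gz file in a temporary directory so it can run on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const readline = require('readline');
const zlib = require('zlib');

// Read a gzipped text file line by line and pass the lines to `writeBatch`
// in groups of at most `batchSize`, so only one batch is held in memory.
// Returns the number of batches written.
async function processInBatches(sourcePath, batchSize, writeBatch) {
  const lines = readline.createInterface({
    input: fs.createReadStream(sourcePath).pipe(zlib.createGunzip()),
    crlfDelay: Infinity,
  });
  let batch = [];
  let batches = 0;
  for await (const line of lines) {
    batch.push(line);
    if (batch.length >= batchSize) {
      await writeBatch(batch); // write this batch before reading more lines
      batches += 1;
      batch = [];
    }
  }
  if (batch.length > 0) {
    await writeBatch(batch); // flush the final, partial batch
    batches += 1;
  }
  return batches;
}

// Demo: gzip 1500 made-up rows, then copy them out in batches of 625.
async function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'batch-demo-'));
  const source = path.join(dir, 'sample.csv.gz');
  const target = path.join(dir, 'parsed.csv');
  const rows = Array.from({ length: 1500 }, (_, i) => `${i},${i * 2}`);
  fs.writeFileSync(source, zlib.gzipSync(rows.join('\n') + '\n'));

  const batches = await processInBatches(source, 625, (batch) =>
    fs.promises.appendFile(target, batch.join('\n') + '\n')
  );
  const written = fs.readFileSync(target, 'utf8').trim().split('\n').length;
  return { batches, written };
}

demo().then((result) => console.log(result)); // { batches: 3, written: 1500 }
```

In this sketch, memory use is bounded by the batch size rather than the file size, which is the trade-off the batch argument exposes.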
