- Add `end` parameter as a new stage to the crawl path. It calls an async function that can be used to clean up after crawling or to trigger processes needed to refresh data in the application.
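A sketch of how the new stage could look in `config.mjs`; the exact placement of the `end` property inside a path element, the path name, and the function body are assumptions based on the description above:

```javascript
// config.mjs (sketch) — placement of `end` is an assumption.
export default {
  path: [
    {
      name: "call-block-logs",
      // New crawl-path stage: an async function invoked after crawling,
      // e.g. for cleanup or to trigger a data refresh in the application.
      end: async () => {
        console.log("crawl finished; refreshing application data");
      },
    },
  ],
};
```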
- Update extraction-worker and eth-fun to new minor versions that allow getting a transaction by hash.
- `config.environment` (including overwritten environment variables) is now passed into `extractor.{init|update}`. We no longer recommend using `process.env` in strategies.
- For `coordinator.remote`, the input is now an object: `remote({environment, execute})`. Here, too, we no longer recommend using `process.env` directly.
- Previously, although stated in the docs, defining a value in `config.environment` did NOT take precedence over its `process.env` counterpart. It does now.
- Previously, only the first path element in the configuration file could define a "coordinator". We now allow all path elements to define coordinators; they're executed in parallel. However, coordinators are still not documented...
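A minimal sketch of the new precedence behavior in `config.mjs`; the variable name is illustrative, not part of the crawler's API:

```javascript
// config.mjs (sketch) — the variable name is illustrative.
export default {
  environment: {
    // Passed into `extractor.{init|update}`; a value here now takes
    // precedence over the same name defined in `.env`/`process.env`.
    RPC_HTTP_HOST: "https://example.com",
  },
  path: [/* ... */],
};
```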
- Add flag `path[0].coordinator.archive: Boolean` that allows deleting extraction and transformation files after a single coordinator run.
- Fix a crash in the coordinator that occurred when transformation or extraction files were not present (e.g. when there weren't any crawl results).
- @attestate/kiwistand was crashing on a small Digital Ocean instance because all MDB readers were used [issue]. We demonstrated that it can be fixed by increasing lmdb's `maxReaders` to 500. Hence, we have added a parameter to `database.open(path, maxReaders)`.
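A usage sketch of the added parameter; the import path and the use of `DATA_DIR` are assumptions, only the `database.open(path, maxReaders)` signature is stated above:

```javascript
// Sketch — the import path is an assumption.
import { database } from "@attestate/crawler";

// Raise lmdb's maxReaders to 500 so many concurrent readers
// don't exhaust the MDB reader slots.
const db = database.open(process.env.DATA_DIR, 500);
```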
- `function lifecycle.load()` now exits gracefully when the prior transform job has no processable outputs.
- (breaking) In `config.path[]` for transformer, extractor and loader, the properties `output.path` and `input.path` were renamed to `output.name` and `input.name` and no longer have to be paths. Instead, they are file names that are automatically resolved from within `env.DATA_DIR`.
- (breaking) `process.env` variables defined in the `.env` file can now also be defined (and overwritten) in the `config.mjs` file's `environment` property.
- (breaking) All lifecycle methods now have an updated interface as outlined below:
  - extractor `function init({ state, args, execute })`
  - extractor `function update({ message })`
  - transformer `function onLine({ state })` where `state.line` is the line; `args` can be matched too.
  - loader `function* order({ state })` where `state.line` is the line
  - loader `function* direct({ state })` where `state.line` is the line
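The updated interfaces above can be sketched as a minimal strategy module; the parameter signatures follow the list, while all function bodies and return shapes are illustrative assumptions:

```javascript
// Minimal strategy sketch matching the updated lifecycle signatures.
// Bodies and return shapes are illustrative assumptions.
const extractor = {
  init({ state, args, execute }) {
    // Seed the extraction with worker messages (shape is an assumption).
    return { messages: [], write: null };
  },
  update({ message }) {
    // Handle one completed worker message.
    return { messages: [], write: JSON.stringify(message) };
  },
};

const transformer = {
  onLine({ state }) {
    // `state.line` is the current input line; `args` can be matched too.
    return state.line.trim();
  },
};

const loader = {
  *order({ state }) {
    // Generator yielding values derived from `state.line`.
    yield state.line;
  },
  *direct({ state }) {
    yield state.line;
  },
};
```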
- There is a new component to a strategy called the "coordinator" that keeps track of state. The config.mjs file (the `path` property) features a new field called `coordinator` where a `module` and an `interval` can be defined. They're used to re-run the first path once all jobs have been completed, e.g. to stay in sync with a network like Ethereum.
- We forked the extraction-worker from the neume-network organization and added a feature to immediately execute a worker message. More details: https://github.com/attestate/extraction-worker/.
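A sketch of the new field; only `module` and `interval` are named above, so the value shapes, the interval unit, and the imported module are assumptions:

```javascript
// config.mjs (sketch) — value shapes and interval unit are assumptions.
import * as blockLogs from "@attestate/crawler-call-block-logs";

export default {
  path: [
    {
      name: "call-block-logs",
      coordinator: {
        module: blockLogs, // state-keeping coordinator module
        interval: 5000, // assumed milliseconds between re-runs of the first path
      },
    },
  ],
};
```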
- Reference docs for the extraction worker have been added.
- Reference docs for the crawler CLI have been added.
- Note: The `@attestate/crawler-call-block-logs` module at version 0.3.0 is compatible.
- (breaking) Create two separate LMDB tables for "order" and "load" data
- Add `crawler.mjs range` command
- Add Strategy Specification Sphinx page
- (breaking) Change `loader.handler` to two generator functions `order` and `direct` as a property called `module` (consistent with extractor and transformer)
- (breaking) `configuration.output.path` object is now required
- (breaking) `configuration.loader.module` object is now required
- Integrate with LMDB to persist loader data
- (breaking) Merge crawl path and configuration file into configuration file
- (breaking) Move `EXTRACTION_WORKER_CONCURRENCY` into the configuration file
[skipped by accident]
- Initial release