v1.0.2

@soldni released this 21 Mar 16:45
· 54 commits to main since this release
4e1d17f

What's Changed

  • Taggers for URL filtering by @soldni in #112
  • Updated CFF and Bibtex by @soldni in #118
  • Add preliminary Dolma v1.7 configurations, fix corner case in tokens. by @soldni in #120
  • Update CITATION.cff by @soldni in #126
  • Option to use ngram overlap to dedupe paragraphs by @rodneykinney in #122
  • Tagger modules import (fix for #128) by @soldni in #129
  • Added Support for JQ syntax in include/exclude mixer config by @soldni in #131
  • Added JQ syntax for replacements + added minimum score. by @soldni in #133
  • Bump the cargo group group with 1 update by @dependabot in #132
  • Improves tool to compute statistics; adds deduplication options. by @soldni in #135
  • use precompiled regex when loading url blocklists by @peterbjorgensen in #137

Full Changelog: v1.0.1...v1.0.2