compromise uses semver, and pushes to npm and github frequently
- Major is a breaking api change - method or response changes that can cause runtime errors.
- Minor is a behaviour change - Tagging or grammar changes.
- Patch is an obvious, non-controversial bugfix.
While all Major releases should be reviewed, our only large releases are v6 in 2016, v12 in 2019, and v14 in 2022. Others have been mostly incremental.
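As a rough illustration of the policy above, here is a minimal sketch that classifies the bump between two version strings and says what it implies. `bumpType` and `upgradeNote` are hypothetical helpers, not part of compromise, and the version numbers are only illustrative. The sketch assumes plain `x.y.z` versions and does not handle pre-release suffixes.

```javascript
// Hypothetical helper, not part of compromise: classify the bump between two
// plain x.y.z versions (pre-release suffixes like -rc1 are not handled).
function bumpType(from, to) {
  const parse = (v) => v.replace(/^v/, '').split('.').map(Number)
  const a = parse(from)
  const b = parse(to)
  if (b[0] !== a[0]) return 'major'
  if (b[1] !== a[1]) return 'minor'
  if (b[2] !== a[2]) return 'patch'
  return 'none'
}

// What each bump means, paraphrased from the policy above.
const meaning = {
  major: 'breaking api change - review before upgrading',
  minor: 'behaviour change - tagging or grammar results may differ',
  patch: 'non-controversial bugfix',
  none: 'same version',
}

function upgradeNote(from, to) {
  const type = bumpType(from, to)
  return `${from} -> ${to}: ${type} (${meaning[type]})`
}

console.log(upgradeNote('13.11.4', '14.0.0'))
```

A check like this only tells you which kind of release you are crossing. The entries below are still the place to see what actually changed.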
- [fix] - runtime error in punctuation replace #1150
- [update] - compromise-dates 3.7.0
- [fix] - runtime error in number parser #1145
- [update] - dependencies
- [new] - .slashes() and .slashes().split() methods #1100
- [fix] - multiple contraction issue #1128
- [fix] - toNumbers() return values #1113
- [fix] - (plugins/wikipedia) - fix hard-coded path for #1116
- [fix] - (plugins/dates) - limit values in `mm/dd` format
- [fix] - (plugins/dates) params mutation #1109
- [change] - split people names by commas #1111
- [change] - typescript export update #1104
- [update] - eslint config format
- [update] - github actions
- [update] - dependencies
- [new] - .compute('freeze')
- [new] - .debug('freeze')
- [change] - allow 3-slashes in a word
- [new] - .payload() plugin
- [new] - `.numbers().isUnit()` method #1089
- [change] - update github workflow (thanks FDawgs!)
- [fix] - README issues (thanks track0x1!)
- [fix] - .has() inconsistency
- [new] - support adding debug methods via plugins
- [change] - remove deprecated .debug(object) support
- [fix] - parentheses() match issue
- [fix] - tokenization issue #1085
- [new] - `dates().isBefore()`, `dates().isBefore()` methods
- [new] - `.debug('dates')` method
- [fix] - lazy join() issue
- [update] - dependencies
- [new] - support for frozen lex in plugin object #1080
- [fix] - toggling options in .json()
- [new] - .join() and .joinIf() methods
- [new] - support freeze in sweep
- [change] - internal typescript improvements
- [fix] - tagging issues
- [change] - @hasEllipses must be following the word
- [update] - dependencies
- [fix] - missing words in html output (thanks ryan!)
- [change] - better #Possessive tagging for #1074
- [change] - improved is/has contraction classifier #1074
- [change] - fixes to subordinate clause identification #1072
- [update] - dependencies
- [new] - tagging `.freeze()` and `.unfreeze()` feature
- [change] - stronger deferral to internal lexicon
- [change] - support any-length phrases in lexicon
- [fix] - prevent missed overlapping lexicon phrases
- [update] - dependencies
- [fix] - abbreviation checks for sentence-tokenizer #1061
- [change] - improve person tagger #1059
- [change] - add #FutureTense tag
- [fix] - .out() runtime error #1056
- [fix] - punctuation loss in .not() #1022
- [update] - dependencies
- [fix] - verb conjugation fixes
- [fix] - tagger fixes
- [change] - align package.json with ESM module #1023
- [fix] - .splitBefore() bugfix
- [fix] - typescript+docs fixes #1023
- [fix] - subtle changes to .text() and .isFull()
- [update] - dependencies
- [new] - .verbs().toPastParticiple() method
- [new] - `.normalize({ debullet: true })` #1004
- [change] - typescript path changes (thanks @rotemdan !)
- [fix] - suffix tagging issues
- [fix] - match syntax issue #997
- [change] - keep possessive in replace #1011
- [change] - major improvements to adj.toNoun() conjugator
- [fix] - parsematch bug #997
- [fix] - "there's been" contraction
- [new] - .conjugate() methods on Noun/Adverb/Adjective classes
- [new] - add Gerund and PastParticiple to .verbs().conjugate() results
- [new] - option to keep possessives in .replace() #1011
- [fix] - tagger fix #998
- [update] - dependencies
- [change] - #Actor tagging - in advance of #565
- [change] - .noun() lumping changes - in advance of #565
- [new] - support japanese full-stop
- [fix] - number tagging #992
- [update] - dependencies
- [fix] - tagging fixes
- [change] - allow #Plural acronyms
- [fix] - allow root matches in fastOr
- [fix] - more flexible PhrasalVerb tagging
- [fix] - tagging fixes
- [new] - add Person .presumedMale(), .presumedFemale() methods
- [new] - add Pronoun class, .refersTo()
- [new] - add Noun.references()
- [new] - .nouns('spencer') shorthand as an if-match
- [change] - "[do] you .." etc now #QuestionWord
- [new] - add #Hyphenated tag
- [fix] - improved Auxiliary verb tagging
- [update] - dependencies
- [fix] - concat fix
- [change] - tagging fixes
- [change] - `{word/tag/sense}` sense-match syntax
- [new] - match term id
- [change] - tag text by default on .concat('')
- [change] - allow modifying term prePunctuation
- [new] - .wrap() method
- [new] - .isFull() method
- [new] - support full `notIf` matches on sweep
- [fix] - text params for #953
- [fix] - nouns().isSingular() missing
- [change] - one-character w/ dash tokenization #977
- [change] - allow setting `model.one.prePunctuation` + `postPunctuation`
- [fix] - compromise-paragraphs plugin
- [change] - move internal conjugation methods
- [update] - github scripts
- [change] - fixes to .clauses() parser
- [change] - an asterisk is not a word
- [new] - @hasColon method
- [new] - @hasDash supports two dashes
- [new] - `#Passive` verb tag
- [new] - existential `#There` tag
- [new] - add tense info to sentence json
- [fix] - verb tokenization issues
- [fix] - .replace() issues
- [update] - dependencies
- [fix] - runtime error #965
- [fix] - misc possessive tagging issues
- [update] - dependencies
- [fix] - .remove() fixes
- [change] - support « angle quotes »
- [update] - dependencies
- [fix] - possible runtime error in setTag method
- [change] - make #Honorific always a #Person #951
- [new] - manually change conjugations/inflections from plugin #949
- [new] - `.adjectives().conjugate()` method
- [update] - dependencies
- [fix] - fix logic for greedy-negative matches - #936
- [fix] - fix tagging for 3-digit year iso dates - #868
- [update] - dependencies
- [fix] - support {root} matches without compromise/two
- [fix] - guard for toRoot methods in root match
- [update] - compromise-stats
- [fix] - hotfix for sentence tokenization issue #935
- [change] - improvements to negative-optional match logic - `!foo?`
- [change] - support short sentences embedded in quotes+parentheses
- [change] - faster sentence tokenizer
- [change] - ° symbol is not punctuation
- [new] - implement .swap() for comparative/superlative adjectives
- [fix] - sentence.toFuture() conjugation rules
- [update] - dependencies
- [change] - root matches like '{walk}' now work without doing .compute('root')
- [change] - split numbers+units '12km' as contraction - #919
- [new] - `.lazy(txt, match)` fast-scan method 1
- [fix] - support apostrophes in lexicon #932
- [fix] - support unTag property in sweep
- [change] - keep sentence caches, when still valid
- [change] - alias nlp.compile() to .buildTrie()
- [fix] - tagging fixes
- [update] - dependencies

plugin-releases: dates, speed, de-compromise

- [fix] - missed caches in .sweep()
- [new] - .out('hash') and `.json({hash:true})`
- [fix] - unwanted logging in compromise/one
- [fix] - dependency export path for react-native builds #928
- [change] - split hyphenated words in match syntax 'foo-bar'
- [change] - support 4-digit number-ranges (when not a phone number)

plugin-releases: dates

- [fix] - double-contraction issue #925
- [fix] - .not() memleak #926
- [fix] - speed improvements
- [fix] - bug with fast-or possessive matches
- [fix] - bug with slow-or end-matches
- [change] - no-longer attempt 's contractions in compromise/one
- [new] - flag novel tags in world.one.tagSet
- [new] - .sweep() and nlp.buildNet() methods
- [new] - some typescript support in plugins #918
- [fix] - better unicode support with Unicode property escapes
- [fix] - problems matching on cached documents
- [fix] - typescript fixes
- [fix] - suffix tagging issues
- [fix] - uncached matches missing in .sweep()
- [fix] - non-empty results when pointer is first repaired
- [fix] - nouns().toPlural() fix for #921
- [fix] - drop deprecated .subst() method internally
- [new] - some support for .numbers().units() again #919
- [new] - add .harden() .soften() undocumented methods
- [fix] - support pre-parsed matches in .has() .if() and .not()
- [fix] - contraction OR match issue
- [fix] - match-syntax min-max issue
- [fix] - normalized printout of abbreviations
- [update] - date plugin release
- [update] - dependencies
- [fix] - main property in package.json #911
- [fix] - client-side export format for plugins
- [new] - more adjective transformation methods
- [new] - emoji + emoticon tagger
- [new] - case-sensitive match option - `{caseSensitive:true}`
Major release - see Release Notes for full details
- [breaking] - remove `.parent()` and `.parents()` chain - (use `.all()` instead)
- [breaking] - remove `@titleCase` alias (use @isTitleCase)
- [breaking] - remove '.get()' alias - use '.eq()'
- [breaking] - remove `.json(0)` shorthand - use `.json()[0]`
- [breaking] - remove `.tagger()` - use .compute('tagger')
- [breaking] - remove `.export()` -> .load() - use .json() -> nlp(json)
- [breaking] - remove `nlp.clone()`
- [breaking] - remove `.join()` deprecated
- [breaking] - remove `.lists()` deprecated
- [breaking] - remove `.segment()` deprecated
- [breaking] - remove `.sentences().toParticiple()` & `.verbs().toParticiple()`
- [breaking] - remove `.nouns().toPossessive()` & `.nouns().hasPlural()`
- [breaking] - remove array support in match methods - (use `.match().match()` instead)
- [breaking] - refactor `.out('freq')` output format - (uses `.compute('freq').terms().unique().json()` instead)
- [breaking] - change `.json()` result format for subsets
- [change] merge re-used capture-group names in one match
- [change] drop support for undocumented empty '.split()' methods - which used to split the parent
- [change] subtle changes to `.text('fmt')` formats
- [change] @hasContraction is no-longer secretly-greedy. use `@hasContraction{2}`
- [change] `.and()` now does a set 'union' operation of results (no overlaps)
- [change] bestTag is now `.compute('tagRank')`
- [change] `.sort()` is no longer in-place (it's now immutable)
- [change] drop undocumented options param to `.replaceWith()` method
- [change] add match-group as 2nd param to split methods
- [change] remove #FutureTense tag - which is not really a thing in english
- [change] `.unique()` no-longer mutates parent
- [change] `.normalize()` inputs cleanup
- [change] drop agreement parameters in .numbers() methods
- [change] - less-magical money parsing - `nlp('50 cents').money().get()` is no-longer `0.5`
- [change] - .find() does not return undefined on an empty result anymore
- [change] - fuzzy matches must now be wrapped in tildes, like `~this~`
- [new] `.union()`, .intersection(), .difference() and .complement() methods
- [new] `.confidence()` method - approximate tagging confidence score for arbitrary selections
- [new] `.settle()` - remove overlaps in matches
- [new] `.isDoc()` - helper-method for comparing two views
- [new] `.none()` - helper-method for returning an empty view of the document
- [new] `.toView()` method - drop back to a normal Class instance
- [new] `.grow()` `.growLeft()` and `.growRight()` methods
- [new] add punctuation match support via pre/post params
- [new] add ambiguous empty .map() state as 2nd param
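
Several of the [breaking] entries in this major release name a direct replacement for the removed method. As a rough migration aid, the sketch below collects a few of those pairs and scans a source string for the removed spellings. The `removed` table is copied from the entries above. `findRemovedCalls` and the `sample` string are hypothetical, not something compromise ships, and a plain substring scan will produce false positives (for example, `.get(` on a Map).

```javascript
// Replacements copied from the [breaking] entries above. The scan itself is a
// hypothetical helper, not something compromise ships.
const removed = [
  { call: '.parent(', use: '.all()' },
  { call: '.parents(', use: '.all()' },
  { call: '@titleCase', use: '@isTitleCase' },
  { call: '.get(', use: '.eq()' },
  { call: '.json(0)', use: '.json()[0]' },
  { call: '.tagger(', use: ".compute('tagger')" },
]

// Naive substring scan: expect false positives (e.g. Map.prototype.get).
function findRemovedCalls(source) {
  return removed.filter((r) => source.includes(r.call))
}

// Illustrative input only.
const sample = "nlp(text).match('#Noun').parent().json(0)"
for (const hit of findRemovedCalls(sample)) {
  console.log(`${hit.call} was removed; use ${hit.use}`)
}
```

Entries that change behaviour without renaming anything, such as `.sort()` no longer being in-place, cannot be found this way and still need a manual read.
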
- [fix] - regex backtracing issue #847 (thanks @srubin)
- misc tagging fixes
- update deps
- [fix] - verbphrase conjugation fixes
- [fix] - verbphrase tagger fixes
- [fix] - url tagging regex improvements (thanks Axay!)
- update deps

plugin-releases: dates

- [fix] - obscure runtime error in capture-groups
- update deps

plugin-releases: typeahead

- [change] - use babel default build target (drop ie11 polyfill)
- [change] - dont compile esm build w/ babel anymore
- [fix] - sentence conjugation fixes
- [fix] - improvements to phrasal verbs
- [change] - keep tokenization for some more dashed suffixes like 'snail-like'

plugin-releases: dates, numbers, sentences

- [change] - tokenize '2 - 5' as NumberRange, like '2-5' is
- [fix] - edge-cases for URLs with numbers
- [fix] - some sentences.toPastTense() fixes
- [fix] - 'n weekends from now' math in plugin-date

plugin-releases: dates, sentences

- [fix] - support more time-ranges

plugin-releases: [email protected]

- [new] - support Time-range like '3pm-4pm'
- [change] - cleanup some unicode regexes

plugin-releases: dates

- [fix] - match syntax tokenization fix
- [change] - improved performance monitoring
- [fix] - support complicated regular-expressions in match syntax
- improved performance testing
- [fix] - support matching implicit terms in (or|blocks)
- [change] - add #Timezone tag (from date-plugin)
- [change] - add many more cities and regions
- [change] - #Date terms can still be a #Conjunction
- [new] - #Imperative tag and `.verbs().isImperative()` method
- [fix] - some tagger issues
- update deps

plugin-releases: dates

- [new] - #Fraction tag and improved fraction support (thanks Jakeii!)
- [fix] - edge-case match issues with `!` syntax
- [change] - update deps
- updates for [email protected], [email protected]
- [fix] - fix weird ordering issue with named exports #815
- [fix] - typescript issue
- [fix] - matches over a contraction
- [new] - add 'implicit' text output
- [new] - World.addConjugations() method
- [new] - World.addPlurals() method
- [new] - start compromise-penn-tags plugin
- [new] - add fuzzy option to match commands
- [new] - support multiple-word matches in OR matches (a|b|foo bar|c)
- [change] (internal) - rename 'oneOf' match syntax to 'fastOr'
- [change] - use new export maps format
- [fix] - conjugations fixes #800
- [fix] - tokenization fixes #801
- [change] improved support for fractions in numbers-plugin #793
- [change] remove zero-width characters in normalized output #759
- [change] improved Person tagging with particles #794
- [change] improved i18n Person names
- [change] tagger+tokenization fixes
- [change] remove empty results from .out('array') #795
- [change] `.tokenize()` runs any postProcess() scripts from plugins
- [change] improved support for lowercase acronyms
- [change] - support years like '97
- [change] - change tokenizer for '20-aug'
- [change] - update deps of all plugins
- [fix] - NumberRange tagging issue #795
- [fix] - improved support for ordinal number ranges
- [fix] - improved regex support in match-syntax
- [fix] - improved support for softmatch syntax #797
- [fix] - better handling of `{0,n}` match syntax
- [new] - new plugin `strict-match`
- [new] - set NounPhrase, VerbPhrase tags in nlp-sentences plugin
- [new] - `.phrases()` method in nlp-sentences plugin
- [new] - support `.append(doc)` and `.prepend(doc)`
- [new] - `values.normalize()` method
- [change] many misc tagging fixes
- 'if' is now a #Preposition
- possessive pronouns are #Pronoun and #Possessive
- more phrasal verbs
- make #Participle tag #PastTense
- favor #PastTense over #Participle interpretation in tagger
- [change] `@hasHyphen` returns false for sentence dashes
- a lot more testing
- [new] first-attempt at `verbs().subject()` method
- [change] avoid conjugating imperative tense - 'please close the door'
- [change] misc tagging fixes #786
- [change] .nouns() results split on quotations #783
- [change] NumberRange must be < 4 digits #735
- [change] reduction in #Person tag false-positives
- [new] add `.parseMatch()` method for pre-parsing match statements
- [change] stop including adverbs and some auxiliaries in `.conjugate()` results
- [change] .append() and .prepend() on an empty doc now creates a new doc
- [new] add `verbs().toParticiple()` method (add to observables/verb)
- [new] add `sentences().toParticiple()` method (add to observables/verb)
- [fix] some verb-tagging issues
- [fix] contractions issue in `.clone()`
- [fix] try harder to retain modal-verbs in conjugation - 'i should drive' no-longer becomes 'i will drive'
- fix for offset issue #771
- fix for `{min,max}` syntax #767
- typescript fixes
- update deps
- support unicode spaces for #759
- major improvements to `compromise-plugin-dates` (1.0.0)
- bugfixes (conjugation and tagging) 752, 737, 725, 751, 743, 748, 755, 758, 706, 761
- support tokenized array as input
- update deps
- bugfix updates to `plugin-sentences`, and `plugin-dates`
- deprecate `.money()` and favour overloaded method in compromise-numbers plugin
- add `.percentages()` and `.fractions()` to compromise-numbers plugin
- add `.hasAfter()` and `.hasBefore()` methods
- change handling of slashes
- add `.world()` method to constructor
- add more abbreviations
- fix regex backtracking #739
- tokenize build:
  - remove conjugation and inflection data
  - remove conjugation and inflection functions
- remove sourcemap from build process (too big)
- improvements to `.numbers().units()`
- fix for linked-list runtime error #744 with contractions
- fix `verbs.json()` runtime-error
- improve empty `.lists()` methods
- allow custom tag colors
- test new github action workflow
- significant (~30%) speed up of parsing
- change sensitivity of input in `.lookup()` for major speed improvements.
- improved typescript types
- subtle changes to internal caching
- adds 'oneOf' match syntax param
- fixes `[word?]` syntax parsing
major changes to `.export()` and `[capture] group` match-syntax.
- [breaking] move .export() and .load() methods to plugin (compromise-export)
  - change .export() format - this hasn't worked properly since v12. (mis-parsed contractions) see #669
- [breaking] split `compromise-output` into `compromise-html` and `compromise-hash` plugins
- [breaking] `.match('foo [bar]')` no-longer returns 'bar'. (use `.match('foo [bar]', 0)`)
- [breaking] capture groups are no longer merged. `.match('[foo] [bar]')` returns two groups accessible with the new `.groups()` function
- [breaking] change `.sentences()` method to return only full-sentences of matches (use `.all()` instead)
modifications:
- [fix] - nlp.clone() - hasn't worked properly, since v12. (@Drache93)
- [fix] - issues with greedy capture [*] and [.+] -(@Drache93) 💛
- add whitespace properties (pre+post) to default json output (suppress with `.json({ whitespace: false })`)
- `.lookup({ key: val })` with an object now returns an object back (`{val: Doc}`)
- add nlp constructor as a third param to `.extend()`
- support lexicon object param in tokenize - `.tokenize('my word', { word: 'tag' })`
- clean-up of scripts and tooling
- improved typescript types
- add support for some french contractions like `j'aime -> je aime`
- allow null results in `.map()` function
- better typescript support
- allow longer acronyms
- [fix] - offset length issue
- [new] - add new named-match syntax, with .groups() method (@Drache93)
- [new] - add `nlp.fromJSON()` method
- [new] - add a new `compromise-tokenize.js` build, without the tagger, or data included.
- prefer `@titleCase` instead of `#TitleCase` tag
- update dependencies
- fix case-sensitive paths
- fix greedy-start match condition regression #651
- fix single period sentence runtime error
- fix potentially-unsafe regexes
- improved tagging for '-ed' verbs (#616)
- improve support for auxiliary-pastTense ('was lifted') verb-phrases
- more robust number-tagging regexes
- setup typescript types for plugins #661 (thanks @Drache93!)
- verb conjugation and tagger bugfixes
- disambiguate between acronyms & yelling
- fix 'aint' contraction
- make Doc.world writable
- update deps
- more tests
- fix shared period with acronym at end of sentence
- fix some mis-classification of contraction
- fix over-active emoji regex
- tag 'cookin', 'hootin' as `Gerund`
- support unicode single-quote symbols in contractions
- improved splitting in .nouns()
- add `.nouns().adjectives()` method
- add `concat` param to `.pre()` and `.post()`
- allow ellipses at start of term "....so" in `@hasEllipses`
- fix matches with optional-end `foo?$` match syntax
- add typescript types for subsets
- add 'sideEffect:false' flag to build
- considerable speedup (20%) in tagger
- ensure trimming of whitespace for root/clean/reduced text formats
- fix client-side logging
- more flexible params to `replace()` and `replaceWith()`
- see Release Notes
- support singular units in `.value()`
- `.quotations()` no-longer return repeated results for nested quotes
- simplify quotation tagset
- `.out('normal')` no longer includes quotes or trailing-possessives
- improve `.debug()` on client-side
- better honorific support, add `honorifics` feature to .normalize()
- ellipses bugfixes
- replace unicode chars in `.normalize()` now by default
- `acronyms().stripPeriods()` and `acronyms().addPeriods()`
- tag professions as `#Actor`
- add more behaviours to `.normalize()`
- support match-results as inputs to .match() and .not()
- support some us-state abbreviations like 'Phoenix AZ'
- add `nouns().toPossessive()`
- ngrams now remove empty-terms in contractions - fixes counting issue #476
- expose internal `sentences().isQuestion()` method
- `.join()` as an alias for `.flatten()`
- slightly different behavior for wildcards in capture-groups pull/472
- `.possessives()` subset + `#Possessive` tagging fixes
- hide massive `world` output for console.log of a term
- improve quotations() method
- add .parentheses() method
- add 'nickname' support to .people()
- 'will be #Adjective' now tagged as Copula
- include adverbs in verb conjugation (more) consistently
- `sentences().toContinuous()` and `verbs().toGerund()`
- some more aliases for jquery-like methods api
- move `getPunctuation`, `setPunctuation` from .sentence to main Text method
- rename internal `endPunctuation` to `getPunctuation`
- more consistent `cardinal/ordinal` tagging for values
- add #Abbreviation tag
- add #ProperNoun tag
- fixes for noun inflection
- include old ending punctuation in a `.replace()` cmd
- almost-double the support for first-names
- changes to bestTag method
- rolls-back some aggressive JustesonKatz stuff
- better support for emdash numberRange
- 'can't' contraction bugfix
- fix for dates().toShortForm()
- add `#Multiple` Values tag, and changes to how invalid numbers like 'sixty fifteen hundred' are understood
- better em-dash/en-dash support
- better conjugate implicit verbs inside contractions - "i'm", "we've"
- nouns().articles() method
- neighborhoods as #Place
- support more complex noun-phrases with JustesonKatz in `.nouns()`
- support for persistent lexicon/tagset changes
  - `addTags, addWords, addRegs, addPlurals, addConjugations` methods to extend native data
  - `.plugin()` method to wrap all of these into one
  - (removal of `.packWords()` method)
- more `.organizations()` matches
- regex-support in .match() - `nlp('it is waaaay cool').match('/aaa/').out()//'waaaay'`
- improved apostrophe-s disambiguation
- support whitespace before sentence boundary
- improved QuestionWord tagging, some `.questions()` without a question-mark
- phrasalVerb conjugation
- new #Activity tag for Gerunds as nouns 'walking is fun'
- change ngram params to an object `{size:int, max:int}`
- implement '[]' capture-group syntax in .match()
- bring-back `map, filter, foreach and reduce` methods
- set `.words()` as alias for .terms()
- `people().firstNames()`, `people().lastNames()`
- split-out comma-separated adverbs
- fix for '.watch' reserved word in efrt
- improved `places()` parsing
- improved `{min,max}` match syntax
- new `.out('match')` method
- quiet addition of .pack() and .unpack() for owen
- move internal lexicon around, to support new format in v11
- added states & provinces as #Region
- added #Comparable tag for adjectives that conjugate
- add increment/decrement/add/subtract methods to .values()
- add units(), noUnits() methods to .values()
- 'uncountable' nouns are no longer assumed to be singular
- money tag is no longer always a value
- improved tagging of `VerbPhrase` and `Condition`
- fixes to contractions in sentence-changes - "i'm going -> i went"
- several verb conjugation fixes
- accept Terms & Result objects in .match() and .replace()
- new `Percent` tag
- lump more units in with `.values()`
- .trim() method
- adjective tagging fixes
- some new .out() methods
- fix return format of .isPlural(), so it acts like a match filter
- less-greedy date tagging & ambiguous month fixes
- cleanup & rename some `.value()` methods
- change lumping behaviour of lexicon terms with multiple words
- keep more former tags after a term replace method
- new `.random()` method
- new `.lessThan()`, `.greaterThan()`, `.equalTo()` methods
- new prefix/suffix/infix matches with `_ffix` syntax
- `tag()` supports a sequence of tags for a sequence of terms
- .match 'range' queries now use a real match - `#Adverb{2,4}`
- new `.before()` and `.after()` match methods
- removes `.lexicon()` method for many-lexicons concept
- changes params of `.replaceWith()` method to a 'keyTags' boolean
- improved .debug() and logging on client-side
- pretty-real filesize reduction by swapping es6 classes for es5 inheritance
- rename `Term.tag` object to `Term.tags` so the `.tag()` method can work throughout more-consistently
- fix 'Auxillary' tag typo to 'Auxiliary'
- optimisation of .match(), and tagset - significant speedup!
- adds `.tagger()` method and cleanup extra params
- adds `wordStart` and `wordEnd` offsets to `.out('offset')` for whitespace+punctuation
- new `.has()` method for faster lookups
- add `nlp.out('index')` method, 12 bugs
- add `nlp.tokenize()` method for disabling pos-tagging of input
- less-ambitious date-parsing of nl-date forms
- filesize reduction using efrt data structure (254k -> 214k)
- fix for IE9
- weee! big change! npm package rename
- builds now using browserify + derequire()
- re-written term-lumper logic
- new nlp.lexicon({word:'POS'}) flow
- be consistent with `text.normal()`, `term.all_forms()`, `text.word_count()`.
- `text.normal()` includes sentence-terminators, like periods etc.
- airport codes support, helper methods for specific POS
- newlines split sentences
- Text methods now return this, instead of array of sentences
- more-sensible responses for invalid, non-string inputs
- 14 PRs, with fixes for currencies, pluralization, conjugation
- Value.to_text() new method, fix "Posessive" POS typo
- return of the text.spot() method (Re:#107)
- more aggressive lumping of dates, like 'last week of february'
- whitespace reproduction in .text() methods
- move negate from sentence to verb & statement
- rename 'implicit' to 'expansion' for smarter contractions
- added readable-compression to adj, verbs (121kb -> 117kb)
- hyphenated words are normalized into spaces
- grammar-aware match & replace functions
- Statement & Question classes
- split ngram, locale, and syllables into plugins in separate repo
- es6 classes, babel building
- better test coverage
- ngram uses term tokenization, so that 'Tony Hawk' is one term, and not two
- more organized pos rules
- Pos tagging is done implicitly now once nlp.Text is run
- Entity spotting is split into .people(), .place(), .organisations()
- unicode normalisation is killed
- opaque two-letter tags are gone
- plugin support
- passive tense detection
- lexicon can be augmented third-party
- date parsing results are different
- smarter handling of ambiguous contractions ("he's" -> ["he is", "he has"])
- added name genders and beginning of co-reference resolution ('Tony' -> 'he') API.
- small breaking change on `Noun.is_plural` and `Noun.is_entity`, affording significant pos() speedup. Bumped Major version for these changes.
- Phrasal verbs ('step up'), firstnames and .people()
- Major file-size reduction through refactoring
- New NER choosing algorithm, better capitalisation logic, consolidated tests
- Sentence class methods, client-side demos