Merge pull request #284 from delph-in/develop
v1.2.3
goodmami authored Apr 8, 2020
2 parents 37d7f45 + 22be16c commit dee8e5f
Showing 12 changed files with 328 additions and 199 deletions.
318 changes: 171 additions & 147 deletions CHANGELOG.md

Large diffs are not rendered by default.

3 changes: 0 additions & 3 deletions CONTRIBUTING.md
@@ -125,9 +125,6 @@ Do the following tasks prior to releasing on GitHub and PyPI.
- [ ] Merge to master
- [ ] Test again
- [ ] push
- [ ] Create a source distribution: `setup.py sdist`
- [ ] Build a wheel distribution: `setup.py bdist_wheel`
- [ ] Upload to PyPI: `twine upload dist/*`
- [ ] [Make a new release](https://github.com/delph-in/pydelphin/releases/new)
- [ ] Announce

60 changes: 30 additions & 30 deletions README.md
@@ -15,9 +15,9 @@ interacting with tools in the DELPH-IN ecosystem. PyDelphin's goal is
to lower the barriers to making use of DELPH-IN resources to help
users quickly build applications or perform experiments, and it has
been successfully used for research into machine translation (e.g.,
-[Goodman, 2018][]), sentence chunking ([Muszyńska, 2016][]), neural
-semantic parsing ([Buys & Blunsom, 2017][]), natural language
-generation ([Hajdik et al., 2019][]), and more.
+[Goodman, 2018]), sentence chunking ([Muszyńska, 2016]), neural
+semantic parsing ([Buys & Blunsom, 2017]), natural language
+generation ([Hajdik et al., 2019]), and more.

[Goodman, 2018]: https://goodmami.org/static/goodman-dissertation.pdf
[Muszyńska, 2016]: https://www.aclweb.org/anthology/P16-3014
@@ -32,7 +32,7 @@ New to PyDelphin? Want to see examples? Try the

## Installation and Upgrading

-Get the latest release of PyDelphin from [PyPI][]:
+Get the latest release of PyDelphin from [PyPI]:

```bash
$ pip install pydelphin
@@ -54,44 +54,44 @@ issue](https://github.com/delph-in/pydelphin/issues).
PyDelphin contains the following modules:

Semantic Representations:
-- [`delphin.mrs`][]: [Minimal Recursion Semantics](http://moin.delph-in.net/MrsRfc)
-- [`delphin.eds`][]: [Elementary Dependency Structures](http://moin.delph-in.net/EdsTop)
-- [`delphin.dmrs`][]: [Dependency Minimal Recursion Semantics](http://moin.delph-in.net/RmrsDmrs)
+- [`delphin.mrs`]: [Minimal Recursion Semantics](http://moin.delph-in.net/MrsRfc)
+- [`delphin.eds`]: [Elementary Dependency Structures](http://moin.delph-in.net/EdsTop)
+- [`delphin.dmrs`]: [Dependency Minimal Recursion Semantics](http://moin.delph-in.net/RmrsDmrs)

Semantic Components and Interfaces:
-- [`delphin.semi`][]: [Semantic Interface](http://moin.delph-in.net/SemiRfc)
-- [`delphin.vpm`][]: [Variable Property Mapping](http://moin.delph-in.net/RmrsVpm)
-- [`delphin.variable`][]: MRS variables
-- [`delphin.predicate`][]: [Semantic Predicates](http://moin.delph-in.net/PredicateRfc)
-- [`delphin.scope`][]: Underspecified scope
-- [`delphin.sembase`][]: Basic semantic structures
-- [`delphin.codecs`][]: A wide variety of serialization codecs for MRS, EDS, and DMRS
+- [`delphin.semi`]: [Semantic Interface](http://moin.delph-in.net/SemiRfc)
+- [`delphin.vpm`]: [Variable Property Mapping](http://moin.delph-in.net/RmrsVpm)
+- [`delphin.variable`]: MRS variables
+- [`delphin.predicate`]: [Semantic Predicates](http://moin.delph-in.net/PredicateRfc)
+- [`delphin.scope`]: Underspecified scope
+- [`delphin.sembase`]: Basic semantic structures
+- [`delphin.codecs`]: A wide variety of serialization codecs for MRS, EDS, and DMRS

Grammar and Parse Inspection:
-- [`delphin.derivation`][]: [Derivation trees](http://moin.delph-in.net/ItsdbDerivations)
-- [`delphin.tdl`][]: [Type-Description Language](http://moin.delph-in.net/TdlRfc)
-- [`delphin.tfs`][]: Feature structures and type hierarchies
+- [`delphin.derivation`]: [Derivation trees](http://moin.delph-in.net/ItsdbDerivations)
+- [`delphin.tdl`]: [Type-Description Language](http://moin.delph-in.net/TdlRfc)
+- [`delphin.tfs`]: Feature structures and type hierarchies

Tokenization:
-- [`delphin.repp`][]: [Regular-Expression PreProcessor](http://moin.delph-in.net/ReppTop)
-- [`delphin.tokens`][]: [YY Token lattices](http://moin.delph-in.net/PetInput#YY_Input_Mode)
-- [`delphin.lnk`][]: Lnk surface alignments
+- [`delphin.repp`]: [Regular-Expression PreProcessor](http://moin.delph-in.net/ReppTop)
+- [`delphin.tokens`]: [YY Token lattices](http://moin.delph-in.net/PetInput#YY_Input_Mode)
+- [`delphin.lnk`]: Lnk surface alignments

Corpus Management and Processing:
-- [`delphin.itsdb`][]: [\[incr tsdb()\]](http://moin.delph-in.net/ItsdbTop) profiles
-- [`delphin.tsdb`][]: Low-level interface to test suite databases
-- [`delphin.tsql`][]: [TSQL](http://moin.delph-in.net/TsqlRfc) test suite queries
+- [`delphin.itsdb`]: [\[incr tsdb()\]](http://moin.delph-in.net/ItsdbTop) profiles
+- [`delphin.tsdb`]: Low-level interface to test suite databases
+- [`delphin.tsql`]: [TSQL](http://moin.delph-in.net/TsqlRfc) test suite queries

Interfaces with External Processors:
-- [`delphin.interface`][]: Structures for interacting with external processors
-- [`delphin.ace`][]: Python wrapper for common tasks using [ACE](http://sweaglesw.org/linguistics/ace/)
-- [`delphin.web`][]: Client for the [web API](http://moin.delph-in.net/ErgApi)
+- [`delphin.interface`]: Structures for interacting with external processors
+- [`delphin.ace`]: Python wrapper for common tasks using [ACE](http://sweaglesw.org/linguistics/ace/)
+- [`delphin.web`]: Client for the [web API](http://moin.delph-in.net/ErgApi)

Core Components and Command Line Interface:
-- [`delphin.commands`][]: Functional interface to common tasks
-- [`delphin.cli`][]: Command-line interface to functional commands
-- [`delphin.hierarchy`][]: Multiple-inheritance hierarchies
-- [`delphin.exceptions`][]: PyDelphin's basic exception classes
+- [`delphin.commands`]: Functional interface to common tasks
+- [`delphin.cli`]: Command-line interface to functional commands
+- [`delphin.hierarchy`]: Multiple-inheritance hierarchies
+- [`delphin.exceptions`]: PyDelphin's basic exception classes


[`delphin.cli`]: https://pydelphin.readthedocs.io/en/latest/api/delphin.cli.html
2 changes: 1 addition & 1 deletion delphin/__about__.py
@@ -3,7 +3,7 @@
# the warehouse project:
# https://github.com/pypa/warehouse/blob/master/warehouse/__about__.py

-__version__ = '1.2.2'
+__version__ = '1.2.3'
__version_info__ = __version__.replace('.', ' ').replace('-', ' ').split()

__title__ = 'PyDelphin'
16 changes: 10 additions & 6 deletions delphin/ace.py
@@ -151,6 +151,7 @@ def __exit__(self, exc_type, exc_value, traceback):
 
     def _result_lines(self, termini: List[Pattern[str]] = None) -> List[str]:
         poll = self._p.poll
+        assert self._p.stdout is not None, 'cannot receive output from ACE'
         next_line = self._p.stdout.readline
 
         if termini is None:
@@ -195,6 +196,7 @@ def send(self, datum: str) -> None:
         :meth:`interact` method for most data-processing tasks with
         ACE.
         """
+        assert self._p.stdin is not None, 'cannot send inputs to ACE'
         try:
             self._p.stdin.write((datum.rstrip() + '\n'))
             self._p.stdin.flush()
@@ -291,12 +293,14 @@ def close(self) -> int:
         Close the ACE process and return the process's exit code.
         """
         self.run_info['end'] = datetime.now()
-        self._p.stdin.close()
-        for line in self._p.stdout:
-            if line.startswith('NOTE: tsdb run:'):
-                self._read_run_info(line)
-            else:
-                logger.debug('ACE cleanup: %s', line.rstrip())
+        if self._p.stdin is not None:
+            self._p.stdin.close()
+        if self._p.stdout is not None:
+            for line in self._p.stdout:
+                if line.startswith('NOTE: tsdb run:'):
+                    self._read_run_info(line)
+                else:
+                    logger.debug('ACE cleanup: %s', line.rstrip())
         retval = self._p.wait()
         return retval

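The new guards exist because `subprocess.Popen.stdin` and `.stdout` are `None` unless the stream was opened with `PIPE`, so type checkers treat them as optional. A standalone sketch of the same close-and-drain pattern using only the standard library; the `close_process` helper and the child command standing in for ACE are illustrative, not PyDelphin code:

```python
import subprocess
import sys
from typing import List


def close_process(p: subprocess.Popen, notes: List[str]) -> int:
    """Close stdin, drain stdout, and return the exit code."""
    if p.stdin is not None:  # None when stdin was not opened with PIPE
        p.stdin.close()
    if p.stdout is not None:  # likewise for stdout
        for line in p.stdout:
            if line.startswith('NOTE: tsdb run:'):
                notes.append(line.rstrip())
    return p.wait()


# Hypothetical child process standing in for ACE: prints one NOTE line and exits
child = subprocess.Popen(
    [sys.executable, '-c', "print('NOTE: tsdb run: demo')"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
notes: List[str] = []
print(close_process(child, notes), notes)

# Without PIPE both streams are None, and the guards simply skip them
quiet = subprocess.Popen([sys.executable, '-c', 'pass'])
print(close_process(quiet, []))
```

Without the `is not None` checks, the second call would fail with `AttributeError` on `None.close()`.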
2 changes: 1 addition & 1 deletion delphin/commands.py
@@ -455,7 +455,7 @@ def _mkprof_from_database(destination, db, schema, where, full, gzip):
tsdb.write_schema(destination, schema)

to_copy = set(schema if full else tsdb.TSDB_CORE_FILES)
-where = '' if where is None else 'where ' + where
+where = '' if not where else 'where ' + where

for table in schema:
if table not in to_copy or _no_such_relation(db, table):
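The `where` change matters because an empty string is not `None`: under the old test, `where=''` produced the string `'where '`, a condition keyword with nothing after it. A minimal comparison of the two expressions (the function names are hypothetical):

```python
from typing import Optional


def old_where(where: Optional[str]) -> str:
    # Previous behavior: only None means "no condition"
    return '' if where is None else 'where ' + where


def new_where(where: Optional[str]) -> str:
    # New behavior: None and '' both mean "no condition"
    return '' if not where else 'where ' + where


for value in (None, '', 'i-id < 20'):
    print(repr(value), '->', repr(old_where(value)), repr(new_where(value)))
```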
57 changes: 47 additions & 10 deletions delphin/itsdb.py
@@ -5,7 +5,7 @@
"""

from typing import (
-Union, Iterable, Sequence, Tuple, List, Dict,
+Union, Iterable, Sequence, Tuple, List, Dict, Any,
Iterator, Optional, IO, overload, cast as typing_cast
)
from pathlib import Path
Expand All @@ -15,6 +15,8 @@
import collections
import itertools

+from progress.bar import Bar as ProgressBar
+
from delphin import util
from delphin import tsdb
from delphin import interface
@@ -53,6 +55,10 @@ class FieldMapper(object):
"""
A class for mapping between response objects and test suites.
+If *source* is given, it is the test suite providing the inputs
+used to create the responses, and it is used to provide some
+contextual information that may not be present in the response.
This class provides two methods for mapping responses to fields:
* :meth:`map` -- takes a response and returns a list of (table,
@@ -83,7 +89,7 @@ class FieldMapper(object):
affected_tables: list of tables that are affected by the
processing
"""
-def __init__(self):
+def __init__(self, source: tsdb.Database = None):
# the parse keys exclude some that are handled specially
self._parse_keys = '''
ninputs ntokens readings first total tcpu tgc treal words
@@ -102,14 +108,23 @@ def __init__(self):
user host os start end items status
'''.split()
self._parse_id = -1
-self._runs = {}
+self._runs: Dict[int, Dict[str, Any]] = {}
self._last_run_id = -1

self.affected_tables = '''
run parse result rule output edge tree decision preference
update fold score
'''.split()

+self._i_id_map: Dict[int, int] = {}
+if source:
+pairs = typing_cast(List[Tuple[int, int]],
+source.select_from(
+'parse',
+('parse-id', 'i-id'),
+cast=True))
+self._i_id_map.update(pairs)
+
def map(self, response: interface.Response) -> Transaction:
"""
Process *response* and return a list of (table, rowdata) tuples.
@@ -145,8 +160,16 @@ def map(self, response: interface.Response) -> Transaction:
def _map_parse(self, response: interface.Response) -> tsdb.ColumnMap:
patch: tsdb.ColumnMap = {}
# custom remapping, cleanup, and filling in holes
-patch['i-id'] = response.get('keys', {}).get('i-id', -1)
-self._parse_id = max(self._parse_id + 1, patch['i-id'])
+keys = response.get('keys', {})
+if 'i-id' in keys:
+patch['i-id'] = keys['i-id']
+elif 'parse-id' in keys and keys['parse-id'] in self._i_id_map:
+patch['i-id'] = self._i_id_map[keys['parse-id']]
+else:
+patch['i-id'] = -1
+i_id = patch['i-id']
+assert isinstance(i_id, int) # for type-checker's benefit, mainly
+self._parse_id = max(self._parse_id + 1, i_id)
patch['parse-id'] = self._parse_id
patch['run-id'] = response.get('run', {}).get('run-id', -1)
if 'tokens' in response:
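The new `_map_parse` code picks the `i-id` in priority order: an explicit `i-id` key in the response, then a lookup through the `parse-id` to `i-id` pairs that `FieldMapper.__init__` reads from the source's `parse` table, else `-1`. The lookup presumably matters when the input table is keyed by `parse-id` rather than `i-id` (for instance `result`, when generating). A standalone sketch of that logic with plain dicts; `build_i_id_map`, `resolve_i_id`, and the sample rows are illustrative, not PyDelphin API:

```python
from typing import Any, Dict, Iterable, Tuple


def build_i_id_map(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    # Mirrors `self._i_id_map.update(pairs)` in FieldMapper.__init__
    i_id_map: Dict[int, int] = {}
    i_id_map.update(pairs)
    return i_id_map


def resolve_i_id(keys: Dict[str, Any], i_id_map: Dict[int, int]) -> int:
    # Same priority order as the new _map_parse code
    if 'i-id' in keys:
        return keys['i-id']
    elif 'parse-id' in keys and keys['parse-id'] in i_id_map:
        return i_id_map[keys['parse-id']]
    return -1


# Hypothetical (parse-id, i-id) rows from a source profile's `parse` table
i_id_map = build_i_id_map([(0, 10), (1, 20)])

print(resolve_i_id({'i-id': 30}, i_id_map))      # 30: an explicit i-id wins
print(resolve_i_id({'parse-id': 1}, i_id_map))   # 20: found via parse-id
print(resolve_i_id({'parse-id': 99}, i_id_map))  # -1: unknown parse-id
```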
@@ -605,7 +628,7 @@ def select(self, *names: str, cast: bool = True) -> Iterator[tsdb.Record]:
Row(10)
>>> next(table.select('i-id', 'i-input'))
Row(10, 'It rained.')
->>> next(table.select('i-id', 'i-input'), cast=False)
+>>> next(table.select('i-id', 'i-input', cast=False))
('10', 'It rained.')
"""
indices = tuple(map(self._field_index.__getitem__, names))
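The docstring fix moves `cast=False` inside the `select()` call. As originally written, the keyword went to the built-in `next()`, which takes only an iterator and an optional default, so the example raised `TypeError` instead of showing uncast values. A quick check with a stand-in generator (this `select` is a hypothetical stand-in, not `Table.select`):

```python
from typing import Iterator, Tuple


def select(*names: str, cast: bool = True) -> Iterator[Tuple]:
    # Stand-in for Table.select: yields one row, cast or raw
    yield (10, 'It rained.') if cast else ('10', 'It rained.')


print(next(select('i-id', 'i-input', cast=False)))  # ('10', 'It rained.')

try:
    next(select('i-id', 'i-input'), cast=False)  # the old, incorrect form
except TypeError as exc:
    print('TypeError:', exc)
```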
@@ -809,9 +832,9 @@ def process(self,
selector: a pair of (table_name, column_name) that specify
the table and column used for processor input (e.g.,
`('item', 'i-input')`)
-source (:class:`TestSuite`, :class:`Table`): test suite or
-table from which inputs are taken; if `None`, use the
-current test suite
+source (:class:`~delphin.tsdb.Database`): test suite from
+which inputs are taken; if `None`, use the current
+test suite
fieldmapper (:class:`FieldMapper`): object for
mapping response fields to [incr tsdb()] fields; if
`None`, use a default mapper for the standard schema
@@ -836,7 +859,7 @@ def process(self,
if source is None:
source = self
if fieldmapper is None:
-fieldmapper = FieldMapper()
+fieldmapper = FieldMapper(source=source)
index = tsdb.make_field_index(source.schema[input_table])

affected = set(fieldmapper.affected_tables).intersection(self.schema)
@@ -845,21 +868,35 @@ def process(self,

key_names = [f.name for f in source.schema[input_table] if f.is_key]

bar = None
if not logger.isEnabledFor(logging.INFO):
with tsdb.open(source.path, input_table) as fh:
total = sum(1 for _ in fh)
if total > 0:
bar = ProgressBar('Processing', max=total)

for row in source[input_table]:
datum = row[index[input_column]]
keys = [row[index[name]] for name in key_names]
keys_dict = dict(zip(key_names, keys))
response = cpu.process_item(datum, keys=keys_dict)

logger.info(
'Processed item {:>16} {:>8} results'
.format(tsdb.join(keys), len(response['results']))
)
if bar:
bar.next()

for tablename, data in fieldmapper.map(response):
_add_row(self, tablename, data, buffer_size)

for tablename, data in fieldmapper.cleanup():
_add_row(self, tablename, data, buffer_size)

if bar:
bar.finish()

tsdb.write_database(self, self.path, gzip=gzip)


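The progress bar is only created when the logger is not enabled for INFO, presumably so that the bar and the per-item `Processed item ...` messages do not interleave on the terminal. A stdlib-only sketch of that gating; `TextBar` is a hypothetical stand-in that mimics just the `next()`/`finish()` calls the diff makes on `progress.bar.Bar`, and a list length stands in for the up-front row count the diff obtains by reading the table file once:

```python
import io
import logging
from typing import List, Optional

logger = logging.getLogger('progress_sketch')  # hypothetical logger name


class TextBar:
    """Minimal stand-in for progress.bar.Bar; supports only next() and finish()."""

    def __init__(self, label: str, total: int, out: io.StringIO) -> None:
        self.label = label
        self.total = total
        self.index = 0
        self.out = out

    def next(self) -> None:
        self.index += 1
        self.out.write(f'\r{self.label} {self.index}/{self.total}')

    def finish(self) -> None:
        self.out.write('\n')


def process_items(items: List[str], out: io.StringIO) -> int:
    """Return how many steps the bar advanced (0 if no bar was shown)."""
    bar: Optional[TextBar] = None
    # Draw a bar only when per-item INFO messages are suppressed
    if not logger.isEnabledFor(logging.INFO) and len(items) > 0:
        bar = TextBar('Processing', total=len(items), out=out)
    for item in items:
        logger.info('Processed item %s', item)
        if bar is not None:
            bar.next()
    if bar is not None:
        bar.finish()
    return bar.index if bar is not None else 0


logger.setLevel(logging.WARNING)  # INFO suppressed: the bar is shown
buf = io.StringIO()
print(process_items(['a', 'b', 'c'], buf))  # 3
logger.setLevel(logging.INFO)  # INFO enabled: log lines instead of a bar
print(process_items(['a', 'b', 'c'], io.StringIO()))  # 0
```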
18 changes: 17 additions & 1 deletion docs/guides/itsdb.rst
@@ -196,7 +196,7 @@ is to pass in a running :class:`~delphin.ace.ACEProcess` instance to
:meth:`TestSuite.process <delphin.itsdb.TestSuite.process>`\ ---the
:class:`~delphin.itsdb.TestSuite` class will determine if the
processor is for parsing, transfer, or generation (using the
-:attr:`ACEProcessor.task <delphin.ace.ACEProcessor.task>` attribute)
+:attr:`ACEProcessor.task <delphin.ace.ACEProcess.task>` attribute)
and select the appropriate inputs from the test suite.

>>> from delphin import ace
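The guide text in this hunk says `TestSuite.process` uses the processor's `task` to choose its inputs. That dispatch can be pictured as a lookup table keyed by task. In the sketch below, the specific (table, column) defaults are an assumption for illustration (item sentences for parsing, MRSs from `result` for transfer and generation), and `choose_selector` is a hypothetical name:

```python
from typing import Dict, Optional, Tuple

# Assumed defaults for this sketch; check TestSuite.process for the real ones
DEFAULT_SELECTORS: Dict[str, Tuple[str, str]] = {
    'parse': ('item', 'i-input'),
    'transfer': ('result', 'mrs'),
    'generate': ('result', 'mrs'),
}


def choose_selector(task: str,
                    selector: Optional[Tuple[str, str]] = None) -> Tuple[str, str]:
    # An explicit (table, column) selector overrides the task default
    if selector is not None:
        return selector
    try:
        return DEFAULT_SELECTORS[task]
    except KeyError:
        raise ValueError(f'no default inputs for task {task!r}') from None


print(choose_selector('parse'))     # ('item', 'i-input')
print(choose_selector('generate'))  # ('result', 'mrs')
```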
@@ -237,3 +237,19 @@ NOTE: 75 passive, 361 active edges in final generation chart; built 89 passives
NOTE: 35 passive, 210 active edges in final generation chart; built 37 passives total. [1 results]
[...]


Troubleshooting
---------------

``TSDBWarning: Invalid date field``

This warning occurs when PyDelphin tries to cast a value with the
``:date`` datatype when the raw value is not an acceptable date
format (see :func:`delphin.tsdb.cast` for an
explanation). Practically this means that the date will not be
usable for things like TSQL conditions, but also note that it can
cause data loss when writing a profile containing invalid dates to
disk as PyDelphin will not write invalid data. Low-level operations
that do not cast the value, such as from the :mod:`delphin.tsdb`
module, may be able to write the raw string without data loss, but
it is better to just fix the invalid dates.
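A minimal sketch of the failure mode the new Troubleshooting entry describes: once casting replaces an unparseable value with `None`, anything later written from the cast value has lost the original string, while reading the raw string (as with `cast=False` in the accompanying test) keeps it. The accepted formats and the `cast_date`/`DateWarning` names are assumptions for this sketch, not `delphin.tsdb.cast` itself:

```python
import warnings
from datetime import datetime
from typing import Optional


class DateWarning(UserWarning):
    """Hypothetical stand-in for tsdb.TSDBWarning."""


# Assumed formats for this sketch only; delphin.tsdb.cast documents the real ones
FORMATS = ('%Y-%m-%d', '%d-%b-%Y')


def cast_date(raw: str) -> Optional[datetime]:
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    warnings.warn(f'Invalid date field: {raw!r}', DateWarning)
    return None  # the raw string is gone; only None remains in memory


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    good = cast_date('1999-09-08')
    bad = cast_date('September 8, 1999')

print(good, bad, len(caught))
```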
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,6 +1,7 @@
delphin.highlight==1.0.0
falcon==2.0.0
penman==0.9.1
+progress==1.5
regex==2020.1.8
requests==2.22.0
setuptools>=38.6.0
1 change: 1 addition & 0 deletions setup.py
@@ -63,6 +63,7 @@
],
install_requires=[
'penman==0.9.1',
+'progress==1.5',
],
extras_require={
'docs': docs_require,
30 changes: 30 additions & 0 deletions tests/itsdb_test.py
@@ -272,3 +272,33 @@ def test_match_rows():
[],
[{'i-id': '30', 'i-input': 'd'}])
]
+
+
+def test_bad_date_issue_279b(tmp_path, empty_alt_testsuite):
+    tmp_ts = tmp_path.joinpath('test_bad_date_issue_279b')
+    tmp_ts.mkdir()
+    schema = tsdb.read_schema(empty_alt_testsuite)
+    fields = schema['item']
+    tsdb.write_schema(tmp_ts, schema)
+    tsdb.write(
+        tmp_ts, 'item', [(0, 'The cat meows.', 'September 8, 1999')], fields)
+    ts = itsdb.TestSuite(tmp_ts)
+    assert list(ts['item'].select('i-date', cast=False)) == [
+        ('September 8, 1999',)
+    ]
+    with pytest.warns(tsdb.TSDBWarning):
+        ts['item'][0]['i-date']
+
+    # Ideally the following would not raise an assertion error, but
+    # the invalid date gets stored as `None` in memory which then gets
+    # written to disk. Unfortunately the fix is not obvious at this
+    # time, so I'm going to sidestep the issue for now and just say
+    # that PyDelphin will not write profiles with invalid values.
+    #
+    # tsdb.write_database(ts, tmp_ts)
+    # ts.reload()
+    # assert list(ts['item'].select('i-date', cast=False)) == [
+    # ('September 8, 1999',)
+    # ]
+
+