-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding typing to tree branches #139
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,7 +4,9 @@ | |
|
||
from typing import Any, Dict, Iterator, List, Mapping, Optional, Set, Tuple | ||
|
||
from penman.types import Branch, Node, Variable | ||
from typing_extensions import TypeGuard | ||
|
||
from penman.types import Branch, Node, Symbol, Variable | ||
|
||
_Step = Tuple[Tuple[int, ...], Branch] # see Tree.walk() | ||
|
||
|
@@ -112,10 +114,10 @@ def _format(node: Node, level: int) -> str: | |
|
||
def _format_branch(branch: Branch, level: int) -> str: | ||
role, target = branch | ||
if is_atomic(target): | ||
target = repr(target) | ||
else: | ||
if is_tgt_node(target): | ||
target = _format(target, level) | ||
else: | ||
target = repr(target) | ||
return f'({role!r}, {target})' | ||
|
||
|
||
|
@@ -124,7 +126,7 @@ def _nodes(node: Node) -> List[Node]: | |
ns = [] if var is None else [node] | ||
for _, target in branches: | ||
# if target is not atomic, assume it's a valid tree node | ||
if not is_atomic(target): | ||
if is_tgt_node(target): | ||
ns.extend(_nodes(target)) | ||
return ns | ||
|
||
|
@@ -135,7 +137,7 @@ def _walk(node: Node, path: Tuple[int, ...]) -> Iterator[_Step]: | |
curpath = path + (i,) | ||
yield (curpath, branch) | ||
_, target = branch | ||
if not is_atomic(target): | ||
if is_tgt_node(target): | ||
yield from _walk(target, curpath) | ||
|
||
|
||
|
@@ -180,15 +182,32 @@ def _map_vars( | |
|
||
newbranches: List[Branch] = [] | ||
for role, tgt in branches: | ||
if not is_atomic(tgt): | ||
if is_tgt_node(tgt): | ||
tgt = _map_vars(tgt, varmap) | ||
elif role != '/' and tgt in varmap: | ||
# MyPy forgets that (Node ∨ Sym) ^ ¬Node → Sym | ||
elif is_tgt_symbol(tgt) and role != '/' and tgt in varmap: | ||
tgt = varmap[tgt] | ||
newbranches.append((role, tgt)) | ||
|
||
return (varmap[var], newbranches) | ||
|
||
|
||
def is_tgt_node(target: Symbol | Node) -> TypeGuard[Node]: | ||
""" | ||
Inverse of :func:`is_atomic`, only for Symbol | Node from branches. | ||
Automatically narrows the type to Node for better type inference | ||
""" | ||
return not is_atomic(target) | ||
|
||
|
||
def is_tgt_symbol(target: Symbol | Node) -> TypeGuard[Symbol]: | ||
""" | ||
Same as :func:`is_atomic`, only for Symbol | Node from branches. | ||
Automatically narrows the type to Symbol for better type inference | ||
""" | ||
return is_atomic(target) | ||
Comment on lines
+195
to
+208
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are a few issues here:
|
||
|
||
|
||
def is_atomic(x: Any) -> bool: | ||
""" | ||
Return ``True`` if *x* is a valid atomic value. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,14 +2,15 @@ | |
Basic types used by various Penman modules. | ||
""" | ||
|
||
from typing import Any, Iterable, List, Tuple, Union | ||
from typing import Iterable, List, Tuple, Union | ||
|
||
Variable = str | ||
Constant = Union[str, float, int, None] # None for missing values | ||
Role = str # '' for anonymous relations | ||
Symbol = str | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this the correct name for what's in branch targets? It seemed like the target must always be a string if it's not a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I believe this would be (a / A
:ARG1 (b / B) ; node target
:ARG2 b ; variable target (atomic)
:ARG3 "a b" ; string target (atomic)
:ARG4 abc ; symbol target (atomic)
) The only difference between a symbol and a variable is that a variable is used in a node (such as
That is correct. Penman does no interpretation of datatypes on parsing. It will accept a limited number of non-string types (ints, floats, etc.) during encoding, but they will be strings again when decoding.
While unconventional, missing targets are allowed and parse to >>> penman.parse('(a / A :ARG1)')
Missing target: (a / A :ARG1)
Tree(('a', [('/', 'A'), (':ARG1', None)])) Similarly, an empty node target means that the type annotation of >>> penman.parse('(a / A :ARG1 ())')
Tree(('a', [('/', 'A'), (':ARG1', (None, []))])) |
||
|
||
# Tree types | ||
Branch = Tuple[Role, Any] | ||
Branch = Tuple[Role, Union[Symbol, "Node"]] | ||
Node = Tuple[Variable, List[Branch]] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to have an AMR tree that consists of just a single There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably not for AMR, but it is a possible graph for this Penman library: >>> penman.parse('(a)') # tree node has empty branch list
Tree(('a', []))
>>> penman.decode('(a)').triples # graph has :instance None
[('a', ':instance', None)] |
||
|
||
# Graph types | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty awkward, and will technically reduce performance just to get typing to work. Annoyingly, Mypy Typeguard doesn't work the same as
isinstance()
(python/typing#1351). So, replacingis_tgt_symbol(target)
above withisinstance(target, str)
then allows Mypy to infer that it must aNode
here and aSymbol
above without needing anelif
here. Not sure which solution is best, since the method is more clear what's going on compared to a nakedisinstance
.Alternatively, we could just use
cast
to force Mypy to recognize the correct type 🤷♂️There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is a little awkward. In general, I'm not too concerned about reducing performance as long as it's correct.
I haven't looked closely at this code in a while so I'd need to think a bit more about a better solution, but in the meantime I want to point out that the change from
else
toelif
means that there is no moreelse
case. A reader of the code would have to know thatis_tgt_symbol()
andis_tgt_node()
are defined as opposites to determine that theelif
would catch all other cases. Otherwise it looks like a latent bug.