Adding typing to tree branches #139

Open
wants to merge 1 commit into
base: main
13 changes: 7 additions & 6 deletions penman/layout.py
@@ -59,7 +59,7 @@
from penman.graph import CONCEPT_ROLE, Graph
from penman.model import Model
from penman.surface import Alignment, RoleAlignment
from penman.tree import Tree, is_atomic
from penman.tree import Tree, is_atomic, is_tgt_node, is_tgt_symbol
from penman.types import BasicTriple, Branch, Node, Role, Variable

logger = logging.getLogger(__name__)
@@ -166,10 +166,10 @@ def _interpret_node(t: Node, variables: Set[Variable], model: Model):
has_concept |= role == CONCEPT_ROLE

# atomic targets
if is_atomic(target):
target, target_epis = _process_atomic(target)
if is_tgt_symbol(target):
tgt, target_epis = _process_atomic(target)
epis.extend(target_epis)
triple = (var, role, target)
triple = (var, role, tgt)
if model.is_role_inverted(role):
if target in variables:
triple = model.invert(triple)
@@ -178,7 +178,8 @@ def _interpret_node(t: Node, variables: Set[Variable], model: Model):
triples.append(triple)
epidata.append((triple, epis))
# nested nodes
else:
# mypy forgets that (Node ∨ Sym) ^ ¬Sym → Node
elif is_tgt_node(target):
@chanind (Author), Jan 2, 2024

This is pretty awkward, and will technically reduce performance just to get typing to work. Annoyingly, Mypy does not treat a TypeGuard the same way as isinstance() (python/typing#1351): a TypeGuard narrows the type only in the branch where it returns True. Replacing is_tgt_symbol(target) above with isinstance(target, str) would let Mypy infer that the target must be a Node here and a Symbol above, without needing an elif here. I'm not sure which solution is best, since the named function makes it clearer what's going on than a bare isinstance().

Alternatively, we could just use cast to force Mypy to recognize the correct type 🤷‍♂️
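
To make the trade-off concrete, here is a minimal sketch of the three options mentioned in this comment. The type aliases and the `is_node` helper are simplified, hypothetical stand-ins, not Penman's actual definitions.

```python
from typing import Any, List, Tuple, Union, cast

try:
    from typing import TypeGuard  # Python 3.10+
except ImportError:  # older Pythons need the backport package
    from typing_extensions import TypeGuard

# Simplified stand-ins for the aliases in penman/types.py (assumed shapes).
Symbol = str
Node = Tuple[str, List[Tuple[str, Any]]]
Target = Union[Symbol, Node]


def is_node(target: Target) -> TypeGuard[Node]:
    """Hypothetical guard: narrows to Node only where it returns True."""
    return not isinstance(target, str)


def with_typeguard(target: Target) -> str:
    if is_node(target):
        return "node " + target[0]  # narrowed to Node here
    # A TypeGuard does not narrow the negative branch, so a type checker
    # still sees Symbol | Node here; cast() is one way to assert the type.
    return "symbol " + cast(Symbol, target)


def with_isinstance(target: Target) -> str:
    if isinstance(target, str):
        return "symbol " + target  # narrowed to str
    return "node " + target[0]  # narrowed to Node, a plain else suffices


print(with_typeguard(("b", [("/", "B")])))
print(with_isinstance("abc"))
```

Both functions behave identically at runtime; the difference is only in what a type checker can infer in the second branch.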

Owner

Yeah this is a little awkward. In general, I'm not too concerned about reducing performance as long as it's correct.

I haven't looked closely at this code in a while so I'd need to think a bit more about a better solution, but in the meantime I want to point out that the change from else to elif means that there is no more else case. A reader of the code would have to know that is_tgt_symbol() and is_tgt_node() are defined as opposites to determine that the elif would catch all other cases. Otherwise it looks like a latent bug.
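
One way to address the missing else case, sketched here under assumptions rather than taken from the PR, is to keep the explicit checks and add a catch-all branch that fails loudly. The function name `describe_target` and the aliases are hypothetical.

```python
from typing import Any, List, Tuple, Union

# Simplified stand-ins for Penman's tree types (assumed shapes).
Symbol = str
Node = Tuple[str, List[Tuple[str, Any]]]


def describe_target(target: Union[Symbol, Node]) -> str:
    """Hypothetical dispatcher showing an explicit, exhaustive branch chain."""
    if isinstance(target, str):
        return "atomic"
    elif isinstance(target, tuple):
        return "node"
    else:
        # Explicit catch-all: a reader does not need to know that the two
        # checks above are complementary, and a bad value cannot be
        # silently skipped.
        raise TypeError(f"unexpected branch target: {target!r}")


print(describe_target("abc"))
print(describe_target(("b", [("/", "B")])))
```

With this shape, the elif no longer looks like a latent bug, because every input either matches a branch or raises.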

triple = model.deinvert((var, role, target[0]))
triples.append(triple)

@@ -566,7 +567,7 @@ def _rearrange(node: Node, key: Callable[[Branch], Any]) -> None:
first = []
rest = branches[:]
for _, target in rest:
if not is_atomic(target):
if is_tgt_node(target):
_rearrange(target, key=key)
branches[:] = first + sorted(rest, key=key)

4 changes: 2 additions & 2 deletions penman/transform.py
@@ -17,7 +17,7 @@
)
from penman.model import Model
from penman.surface import Alignment, RoleAlignment, alignments
from penman.tree import Tree, is_atomic
from penman.tree import Tree, is_tgt_node
from penman.types import BasicTriple, Node, Target, Variable

logger = logging.getLogger(__name__)
@@ -62,7 +62,7 @@ def _canonicalize_node(node: Node, model: Model) -> Node:
role, tgt = edge
# alignments aren't parsed off yet, so handle them superficially
role, tilde, alignment = role.partition('~')
if not is_atomic(tgt):
if is_tgt_node(tgt):
tgt = _canonicalize_node(tgt, model)
canonical_role = model.canonicalize_role(role) + tilde + alignment
canonical_edges.append((canonical_role, tgt))
35 changes: 27 additions & 8 deletions penman/tree.py
@@ -4,7 +4,9 @@

from typing import Any, Dict, Iterator, List, Mapping, Optional, Set, Tuple

from penman.types import Branch, Node, Variable
from typing_extensions import TypeGuard

from penman.types import Branch, Node, Symbol, Variable

_Step = Tuple[Tuple[int, ...], Branch] # see Tree.walk()

@@ -112,10 +114,10 @@ def _format(node: Node, level: int) -> str:

def _format_branch(branch: Branch, level: int) -> str:
role, target = branch
if is_atomic(target):
target = repr(target)
else:
if is_tgt_node(target):
target = _format(target, level)
else:
target = repr(target)
return f'({role!r}, {target})'


@@ -124,7 +126,7 @@ def _nodes(node: Node) -> List[Node]:
ns = [] if var is None else [node]
for _, target in branches:
# if target is not atomic, assume it's a valid tree node
if not is_atomic(target):
if is_tgt_node(target):
ns.extend(_nodes(target))
return ns

@@ -135,7 +137,7 @@ def _walk(node: Node, path: Tuple[int, ...]) -> Iterator[_Step]:
curpath = path + (i,)
yield (curpath, branch)
_, target = branch
if not is_atomic(target):
if is_tgt_node(target):
yield from _walk(target, curpath)


@@ -180,15 +182,32 @@ def _map_vars(

newbranches: List[Branch] = []
for role, tgt in branches:
if not is_atomic(tgt):
if is_tgt_node(tgt):
tgt = _map_vars(tgt, varmap)
elif role != '/' and tgt in varmap:
# MyPy forgets that (Node ∨ Sym) ^ ¬Node → Sym
elif is_tgt_symbol(tgt) and role != '/' and tgt in varmap:
tgt = varmap[tgt]
newbranches.append((role, tgt))

return (varmap[var], newbranches)


def is_tgt_node(target: Symbol | Node) -> TypeGuard[Node]:
"""
Inverse of :func:`is_atomic`, only for Symbol | Node from branches.
Automatically narrows the type to Node for better type inference
"""
return not is_atomic(target)


def is_tgt_symbol(target: Symbol | Node) -> TypeGuard[Symbol]:
"""
Same as :func:`is_atomic`, only for Symbol | Node from branches.
Automatically narrows the type to Symbol for better type inference
"""
return is_atomic(target)
Comment on lines +195 to +208
Owner

There are a few issues here:

  1. It's not great that these don't really do anything different from is_atomic() except for type checking. I think having them as part of the public API would confuse users.
  2. If they were to stay, I'd prefer to use non-abbreviated names in public API functions: is_target_symbol() or maybe is_symbol().
  3. TypeGuard was added to the typing module in Python 3.10, but Penman currently supports Python versions down to 3.8.



def is_atomic(x: Any) -> bool:
"""
Return ``True`` if *x* is a valid atomic value.
5 changes: 3 additions & 2 deletions penman/types.py
@@ -2,14 +2,15 @@
Basic types used by various Penman modules.
"""

from typing import Any, Iterable, List, Tuple, Union
from typing import Iterable, List, Tuple, Union

Variable = str
Constant = Union[str, float, int, None] # None for missing values
Role = str # '' for anonymous relations
Symbol = str
Author

Is this the correct name for what's in branch targets? It seemed like the target must always be a string if it's not a Node, with even number constants showing up as strings. Are there edge-cases of trees where this isn't correct?

Owner

> Is this the correct name for what's in branch targets?

I believe this would be Atom. Considering the grammar, atoms are variables or constants, and constants are either strings or symbols. Here's an example:

(a / A
   :ARG1 (b / B)  ; node target
   :ARG2 b        ; variable target (atomic)
   :ARG3 "a b"    ; string target   (atomic)
   :ARG4 abc      ; symbol target   (atomic)
)

The only difference between a symbol and a variable is that a variable is used in a node (such as a and b in the example above). The difference between strings and symbols, besides the quotes, is just that symbols cannot contain control characters like whitespace, parens, colons, etc. Quoted strings may not be used as variables (("a" / A) raises an error, and :ARG1 "a" does not reference a variable a).
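
The distinctions described above can be sketched in code. This is only an illustration: the tree literal is written by hand to mirror the example graph, it assumes quoted strings keep their quotes in the tree, and `collect_variables` and `classify_targets` are hypothetical helpers, not Penman functions.

```python
from typing import Any, Dict, List, Set, Tuple

Node = Tuple[Any, List[Tuple[str, Any]]]  # simplified stand-in


def collect_variables(node: Node) -> Set[str]:
    """Gather the variables of this node and all nested nodes."""
    var, branches = node
    found = set() if var is None else {var}
    for _, target in branches:
        if isinstance(target, tuple):
            found |= collect_variables(target)
    return found


def classify_targets(node: Node, variables: Set[str]) -> Dict[str, str]:
    """Label each top-level branch target, skipping the concept branch."""
    kinds: Dict[str, str] = {}
    _, branches = node
    for role, target in branches:
        if role == "/":
            continue
        if isinstance(target, tuple):
            kinds[role] = "node"
        elif target.startswith('"'):
            kinds[role] = "string"  # quoted strings never reference variables
        elif target in variables:
            kinds[role] = "variable"  # a symbol that is used in a node
        else:
            kinds[role] = "symbol"
    return kinds


# Hand-written tree assumed to correspond to the example graph above.
tree = ("a", [
    ("/", "A"),
    (":ARG1", ("b", [("/", "B")])),
    (":ARG2", "b"),
    (":ARG3", '"a b"'),
    (":ARG4", "abc"),
])
print(classify_targets(tree, collect_variables(tree)))
```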

> even number constants showing up as strings

That is correct. Penman does no interpretation of datatypes on parsing. It will accept a limited number of non-string types (ints, floats, etc.) during encoding, but they will be strings again when decoding.

> Are there edge-cases of trees where this isn't correct?

While unconventional, missing targets are allowed and parse to None (a warning may be issued as well):

>>> penman.parse('(a / A :ARG1)')
Missing target: (a / A :ARG1)
Tree(('a', [('/', 'A'), (':ARG1', None)]))

Similarly, an empty node target means that the type annotation of Node is not entirely correct, as mentioned in #129:

>>> penman.parse('(a / A :ARG1 ())')
Tree(('a', [('/', 'A'), (':ARG1', (None, []))]))
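
Taking the two edge cases above into account, one possible shape for the aliases is sketched below. The `Atom` alias and the `find_irregular_targets` helper are hypothetical, and this is not a claim about what Penman's types.py should contain; the tree literals are copied from the parse results shown above.

```python
from typing import List, Optional, Tuple, Union

Role = str
Variable = str
Atom = Optional[str]  # hypothetical alias; None stands for a missing target
Node = Tuple[Optional[Variable], List["Branch"]]  # variable is None for ()
Branch = Tuple[Role, Union[Atom, Node]]


def find_irregular_targets(node: Node) -> List[Tuple[Role, str]]:
    """Report branches whose target is missing or is an empty node."""
    issues: List[Tuple[Role, str]] = []
    _, branches = node
    for role, target in branches:
        if target is None:
            issues.append((role, "missing target"))
        elif isinstance(target, tuple):
            if target[0] is None:
                issues.append((role, "empty node"))
            issues.extend(find_irregular_targets(target))
    return issues


missing: Node = ("a", [("/", "A"), (":ARG1", None)])
empty: Node = ("a", [("/", "A"), (":ARG1", (None, []))])
print(find_irregular_targets(missing))
print(find_irregular_targets(empty))
```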


# Tree types
Branch = Tuple[Role, Any]
Branch = Tuple[Role, Union[Symbol, "Node"]]
Node = Tuple[Variable, List[Branch]]
Author

Is it possible to have an AMR tree that consists of just a single Variable with no branches?

Owner

Probably not for AMR, but it is a possible graph for this Penman library:

>>> penman.parse('(a)')  # tree node has empty branch list
Tree(('a', []))
>>> penman.decode('(a)').triples  # graph has :instance None
[('a', ':instance', None)]


# Graph types