Python 3 version of timex library #26
base: master
nltk_contrib/timex3.py
Outdated
'Monday': 0,
'Tuesday': 1,
'Wednesday': 2,
'Thursday': 3,
'Friday': 4,
'Saturday': 5,
'Sunday': 6}
These capitalized day and month names did not match in every scenario for me. Converting the input text to lowercase before matching resolved these issues.
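To illustrate the lowercase approach described in this comment, here is a minimal sketch. The names `DAY_OFFSETS` and `day_offset` are hypothetical and are not taken from the PR; the idea is only that the table is keyed by lowercase names and the input is lowercased before the lookup.

```python
# Hypothetical lookup table keyed by lowercase day names (not the PR's exact code).
DAY_OFFSETS = {
    'monday': 0, 'tuesday': 1, 'wednesday': 2, 'thursday': 3,
    'friday': 4, 'saturday': 5, 'sunday': 6,
}


def day_offset(word):
    """Return the weekday index for a day name in any capitalization, or None."""
    # Lowercasing the input means 'Monday', 'MONDAY' and 'monday' all hit the same key.
    return DAY_OFFSETS.get(word.lower())
```

With this shape, `day_offset('Monday')` and `day_offset('monday')` resolve to the same entry, so capitalization in the input text no longer matters.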
month = "(january|february|march|april|may|june|july|august|september| \
october|november|december)"
There was a bug in the original version here: the global variable month was overwritten, which made it unusable on subsequent calls to timex. Defining month inside the function solves the issue.
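The kind of bug described here can be reproduced in a few lines. This is a simplified reconstruction, not the code from timex: `MONTH_PATTERN`, `buggy_find`, and `fixed_find` are hypothetical names, and the pattern is shortened for the example.

```python
import re

MONTH_PATTERN = "(january|february|march)"  # shortened, hypothetical pattern

month = MONTH_PATTERN  # module-level regex fragment shared by all calls


def buggy_find(text):
    """Overwrites the module-level `month`, so later calls see the wrong pattern."""
    global month
    found = re.search(month, text.lower())
    # This assignment clobbers the regex fragment with the matched value.
    month = found.group(0) if found else None
    return month


def fixed_find(text):
    """Defines the pattern locally, so no state leaks between calls."""
    month = MONTH_PATTERN
    found = re.search(month, text.lower())
    return found.group(0) if found else None
```

The first call to `buggy_find` works, but it leaves `month` holding the matched value instead of the pattern, so the next call searches for the wrong thing. `fixed_find` rebinds the pattern on every call and is unaffected.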
@@ -175,9 +171,14 @@ def ground(tagged_text, base_date):
timex_found = timex_regex.findall(tagged_text)
timex_found = map(lambda timex:re.sub(r'</?TIMEX2.*?>', '', timex), \
timex_found)
timexList = []
This new variable is used to return timex values as a list in addition to the timex tagged format.
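As a rough sketch of returning both forms, assuming the tagged text wraps each expression in TIMEX2 tags: the helper `extract_timexes` and the pattern `TIMEX_TAG` are hypothetical and only show the shape of the return value, not how ground() builds timexList.

```python
import re

# Hypothetical pattern: captures the text between an opening and closing TIMEX2 tag.
TIMEX_TAG = re.compile(r'<TIMEX2.*?>(.*?)</TIMEX2>')


def extract_timexes(tagged_text):
    """Return the tagged text unchanged, plus a list of the tagged expressions."""
    timex_list = [match.strip() for match in TIMEX_TAG.findall(tagged_text)]
    return tagged_text, timex_list
```

Callers that only want the values can use the list directly instead of parsing the tagged string again.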
import re
import string
import os
import sys
from datetime import datetime, timedelta
Using timedelta from Python 3's standard datetime module allows us to remove the dependency on mx.DateTime.
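For reference, relative dates can be computed with the standard library alone. This is a minimal sketch, not the PR's code: `shift_date` and the sample base date are made up for illustration.

```python
from datetime import datetime, timedelta


def shift_date(base_date, days=0, weeks=0):
    """Return base_date moved by the given number of days and weeks."""
    # Negative values move backwards in time, e.g. weeks=-1 for "last week".
    return base_date + timedelta(days=days, weeks=weeks)


base = datetime(2020, 1, 15)
tomorrow = shift_date(base, days=1)
last_week = shift_date(base, weeks=-1)
```

Note that timedelta has no month or year arguments, so expressions such as "next month" need separate calendar arithmetic.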
Please see the second commit for line-by-line changes. A few things are fixed here beyond the simple conversion, and those should be reviewed. Thanks in advance for looking!
Appreciated! 👍
Created a Python 3 version of timex and kept the original version under its original file name. The new version uses the "timedelta" class from Python 3's datetime module, so it no longer relies on the eGenix datetime distribution that the original timex needs in order to function. All relative time expressions are working in the project I am using it in.