All notable changes to this project will be documented in this file.
- add parsing for iframe.u-*[src] (#116)
- bug fix: reduced implied urls (#117)
- bug fix: don't collapse whitespace between tags
- specify explicit versions for dependencies
- revert BeautifulSoup copying added in 1.1.1 due to bugs (eg #108)
- misc performance improvements
- streamline backcompat to use JSON only.
- fix multiple mf1 root rel-tag parsing
- correct url and photo for hreview.
- add rules for nested hreview. update backcompat to use multiple matches in old properties.
- fix `rel-tag` to `p-category` conversion so that other classes are not lost.
- use original authored html for `e-*` parsing in backcompat
- make classes and rels into unordered (alphabetically ordered) deduped arrays.
- only use class names for mf2 which follow the naming rules
- fix `parse` method to use default html parser.
- always use the first value for attributes for rels.
- correct AM/PM conversion in datetime value class pattern.
- add ordinal date parsing to datetimes value class pattern. ordinal date is normalised to YYYY-MM-DD
- remove hack for html tag classes since that is fixed in new BS
- better whitespace algorithm for `name` and `html.value` parsing
- experimental flag for including `alt` in `u-photo` parsing
- make a copy of the BeautifulSoup given by user to work on for parsing to prevent changes to original doc
- bump version to 1.1.1
- bump version to 1.1.0 since it is a "major" change
- added tests for new implied name rules
- modified earlier tests to accommodate new rules
- use space separator instead of "T"
- Don't add "00" seconds unless authored
- use TZ authored in separate `value` element
- only use first found `value` of a particular type `date`, `time`, or `timezone`.
- move backcompat rules into JSON files
- reorganise value class pattern parsing into new files
- add datetime_helpers to organise datetime parsing rules
- reorganise tests
- remove Heroku frontend, point to mf2py-web and python.microformats.io instead in README.
- remove Flask and gunicorn requirements
- add debug info with description, version, url and the html parser used
- strip leading/trailing white space for `e-*[html]`. update the corresponding tests
- blank values explicitly authored are allowed as property values
- include `alt` or `src` from `<img>` in parsing for `p-*` and `e-*[value]`
- parse `title` from `<link>` for `p-*`, resolves #84
- parse `poster` from `<video>` for `u-*`, resolves #76
- use `html5lib` as default parser
- use the final redirect URL, resolves #62
- update requirements to use BS4 v4.6.0 and html5lib v1.0.1
- drop support for Python 2.6 as html5lib dropped support
- Implied property checks now ignore alt="", treating it the same as if no alt value is defined.
- Support for using a custom dict implementation by setting mf2py.Parser.dict_class. collections.OrderedDict yields much nicer output for hosted parsers.
- Performance improvement changing simple calls to soup.find_all to a manual iteration over .contents.
- Performance improvement by limiting number of calls to soup.find_all in backcompat module. Should not be any functional changes.
- Backward compatibility parsing for rel=tag properties. These are now converted to p-category based on the last path segment of the tag URI as spec'd in http://microformats.org/wiki/h-entry#Parser_Compatibility
- Optional property html_parser to specify the html parser that BeautifulSoup should use (e.g., "lxml" or "html5lib")
- `u-*` properties are now parsed from `<link>` elements per the updated spec http://microformats.org/wiki/microformats2-parsing-issues#link_elements_and_u-_parsing
- Version number bumped to 1.0.0 following community discussion.
- Stricter checks that `Parser.__init__` params are actually None before ignoring them.
- Now produces unicode strings for every key and value, no more byte strings anywhere.
- Do not add 'T' between date and time when normalizing dates
- Unit tests for running the microformats test suite
- New top-level "rel-urls" entry, contains rich data parsed from rel links, organized by URL.
- convenience method `mf2py.parse` that takes the same arguments as Parser and returns a dict.
- nested h-* classes now parse their "value" based on the property they represent (`p-*`, `u-*`, `dt-*`), so for example "p-in-reply-to h-cite" would have a name as its value and "u-in-reply-to h-cite" would have a URL.
- Add rel=bookmark to backward compat parsing rules (translated to u-url in mf2)
- Parser constructor now takes explicit named arguments instead of **kwargs, for saner behavior when called with unnamed arguments.
- Bugfix: Empty href="" attributes are now properly interpreted as the current document's URL.
- Minor Py3 compatibility fix
- Correct typo `test_requires` -> `tests_require` in setup.py
- Started keeping a changelog!
- Use a better method for extracting HTML for an e-* property
- Correct BeautifulSoup4 dependency in setup.py to fix error with installation from PyPI.
- Buffed up docstrings for public methods.
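The value-class datetime entries in this changelog (AM/PM conversion, ordinal dates normalised to YYYY-MM-DD, a space separator instead of "T", no "00" seconds unless authored) can be illustrated with a small standalone sketch. It is not mf2py's implementation: the function names `normalize_ordinal_date`, `normalize_12h_time` and `join_date_time` are hypothetical, and the exact input formats accepted by the regular expressions are assumptions.

```python
# Hypothetical sketch of the value-class datetime normalisations listed in
# this changelog. Not mf2py's actual code; names and formats are assumptions.
import re
from calendar import isleap
from datetime import date, timedelta

# Ordinal date: four-digit year, hyphen, three-digit day of year.
ORDINAL_RE = re.compile(r"^(\d{4})-(\d{3})$")
# 12-hour time such as "5pm", "5:30 p.m." or "11:59:59pm".
TIME_12H_RE = re.compile(
    r"^(\d{1,2})(?::(\d{2}))?(?::(\d{2}))?\s*([ap])\.?m\.?$", re.IGNORECASE
)


def normalize_ordinal_date(value: str) -> str:
    """Convert 'YYYY-DDD' to 'YYYY-MM-DD'; return anything else unchanged."""
    m = ORDINAL_RE.match(value)
    if not m:
        return value
    year, day_of_year = int(m.group(1)), int(m.group(2))
    # Reject year 0 and day numbers outside the year (366 only in leap years).
    if year < 1 or not 1 <= day_of_year <= (366 if isleap(year) else 365):
        return value
    # Day 001 is 1 January, so add (day_of_year - 1) days to it.
    return (date(year, 1, 1) + timedelta(days=day_of_year - 1)).isoformat()


def normalize_12h_time(value: str) -> str:
    """Convert a 12-hour am/pm time to 24-hour 'HH:MM[:SS]'.

    Seconds appear in the output only if they were authored.
    """
    m = TIME_12H_RE.match(value.strip())
    if not m or not 1 <= int(m.group(1)) <= 12:
        return value
    hour = int(m.group(1)) % 12  # 12am -> 0; 12pm -> 0 here, then +12 below
    if m.group(4).lower() == "p":
        hour += 12
    result = f"{hour:02d}:{m.group(2) or '00'}"
    if m.group(3) is not None:  # don't add ":00" seconds unless authored
        result += f":{m.group(3)}"
    return result


def join_date_time(date_part: str, time_part: str) -> str:
    """Join date and time with a space rather than 'T'."""
    return f"{date_part} {time_part}"
```

For example, `join_date_time(normalize_ordinal_date("2018-032"), normalize_12h_time("5:30pm"))` returns `"2018-02-01 17:30"`.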
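The rel=tag backcompat entry derives a `p-category` value from the last path segment of the tag URI. A minimal standard-library sketch of that extraction is shown here; the function name `category_from_rel_tag` is hypothetical, and ignoring a trailing slash and percent-decoding the segment are assumptions rather than documented mf2py behaviour.

```python
# Hypothetical sketch of the rel=tag -> p-category backcompat rule: the
# category is the last path segment of the tag URI. Not mf2py's actual code.
from urllib.parse import unquote, urlparse


def category_from_rel_tag(href: str) -> str:
    """Return the last path segment of a rel=tag URL, percent-decoded."""
    # Ignore the query string and fragment; only the path matters.
    path = urlparse(href).path
    # Assumption: a trailing slash should not yield an empty segment.
    segment = path.rstrip("/").rsplit("/", 1)[-1]
    return unquote(segment)
```

For example, `category_from_rel_tag("http://example.com/tags/indieweb/")` returns `"indieweb"`.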