Possible bug parsing DOCTYPE of 4.01 Transitional HTML document #36

langdonr621 · 2024-02-21T03:48:54Z

hi there,

i'm trying to use your library (0.7.0) but am facing a problem when parsing, for example, https://www.unicode.org/reports/tr29/#Word_Boundaries. the error thrown is similar to:

running 1 test
Failed :(  --> 1:23
 |
1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
 |                       ^---
 |
 = expected attribute key

here's a test function showing the problem:

    #[test]
    fn test_html_parser() {
        const HTML: &str = r#"<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<html></html>"#;

        if let Err(x) = html_parser::Dom::parse(HTML) {
            println!("Failed :( {}", x)
        }
    }

this wikipedia page claims the directive's syntax is correct.

am i correct in assuming this is a bug in the grammar rules the parser uses?

if not, i'd appreciate it if you could suggest a work-around.
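for reference, here is a rough sketch of one possible work-around: rewrite the legacy declaration to the short `<!DOCTYPE html>` form before handing the text to the parser. `normalize_doctype` is a hypothetical helper, not part of html_parser, and it is only an assumption (not verified against 0.7.0) that the library accepts the short form. the sketch also assumes the declaration has no internal subset and no `>` inside its quoted identifiers.

```rust
/// Hypothetical helper (not part of html_parser): replaces a leading
/// DOCTYPE declaration with the short HTML5 form `<!DOCTYPE html>`.
/// Input without a leading DOCTYPE is returned unchanged.
fn normalize_doctype(html: &str) -> String {
    let trimmed = html.trim_start();
    // Case-insensitive prefix check; `get` returns None instead of
    // panicking if the input is too short or byte 9 is not a char boundary.
    let has_doctype = trimmed
        .get(..9)
        .map_or(false, |prefix| prefix.eq_ignore_ascii_case("<!DOCTYPE"));
    if has_doctype {
        // Assumption: the declaration ends at the first '>'.
        if let Some(end) = trimmed.find('>') {
            return format!("<!DOCTYPE html>{}", &trimmed[end + 1..]);
        }
    }
    html.to_string()
}
```

the result would then be passed on, e.g. `html_parser::Dom::parse(&normalize_doctype(HTML))`. this drops the public and system identifiers, which should not matter if only the element tree is of interest.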

TIA + cheers;
