Add caveats to section removal doc
Signed-off-by: Evan Lloyd New-Schmidt <[email protected]>
newsch committed Apr 24, 2024
1 parent 58864b4 commit 62a16d2
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions src/html.rs
@@ -214,6 +214,9 @@ fn remove_ids(document: &mut Html, ids: impl IntoIterator<Item = NodeId>) {
}

/// Remove sections with the specified `titles` and all trailing elements until next section.
///
/// `titles` are matched by case-sensitive simple byte comparison.
/// `titles` should be normalized to Unicode NFC to match Wikipedia's internal normalization: <https://mediawiki.org/wiki/Unicode_normalization_considerations>.
fn remove_sections(document: &mut Html, titles: &BTreeSet<&str>) {
let mut to_remove = Vec::new();

@@ -507,6 +510,7 @@ mod test {
/// text, which this does not handle.
///
/// See also:
/// - [super::remove_sections]
/// - Mediawiki discussion of normalization: https://mediawiki.org/wiki/Unicode_normalization_considerations
/// - Online conversion tool: https://util.unicode.org/UnicodeJsps/transform.jsp?a=Any-NFC
#[test]
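
The caveat documented in this commit can be illustrated with a minimal standalone sketch. It is not code from `src/html.rs`: the helper `title_matches` and the two constants are hypothetical names introduced here, and the sketch only assumes that matching reduces to a `BTreeSet<&str>` lookup, as the doc comment's "simple byte comparison" suggests. The two constants render identically but have different byte sequences, so only the NFC form is found.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of the commit): returns true when `heading`
/// is in `titles`. `BTreeSet<&str>::contains` compares strings by their
/// bytes, so the match is exact and case-sensitive, with no normalization.
fn title_matches(titles: &BTreeSet<&str>, heading: &str) -> bool {
    titles.contains(heading)
}

/// "Références" written with the precomposed U+00E9 (NFC form).
const NFC_TITLE: &str = "R\u{00E9}f\u{00E9}rences";

/// The same visible text written as "e" followed by the combining acute
/// accent U+0301 (NFD form). It is a different byte sequence.
const NFD_TITLE: &str = "Re\u{0301}fe\u{0301}rences";
```

With a set that contains only `NFC_TITLE`, `title_matches` returns `true` for `NFC_TITLE` and `false` for both `NFD_TITLE` and a lowercased variant. This is why the doc comment asks for `titles` to be normalized to NFC before they are passed in.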