Heading auto identifier #175

Open
wants to merge 35 commits into
base: master
35 commits
1da5a80
First step for header identifier like gfm
chowette Mar 30, 2022
340f15c
Implement identifier transformation like github
chowette Apr 3, 2022
5513c81
Fix : garbage can occur in large file.
chowette Apr 4, 2022
9215d87
no need to trim space when building identifiers.
chowette Apr 4, 2022
5351400
convert uppercase unicode identifier into lower case
chowette Apr 5, 2022
d253fc3
First proof of concept for duplicate identifier case
chowette Apr 7, 2022
620fd9d
better md_int16_to_str conversion function
chowette Apr 9, 2022
c8c4fea
replace bad O(n²) algorithm by a hashMap of identifier for numbering
chowette Apr 11, 2022
edf9bd9
heading with link correctly ignore url when generating heading identi…
chowette Apr 12, 2022
ef79138
make heading only when MD_FLAG_HEADINGAUTOID is set
chowette Apr 12, 2022
7812fdb
Emoji are treated as punctuation, unicode emoji are stripped
chowette Apr 23, 2022
3738b8a
fix -Wdeclaration-after-statement travis errors
chowette Apr 23, 2022
2f3ff6f
add more tests to improve coverage
chowette Apr 23, 2022
f572773
add more tests to improve coverage
chowette Apr 23, 2022
12ccd2f
fix use of wrong macro
chowette Apr 26, 2022
138a104
remove unused struct MD_POSTFIX_DEF_tag
chowette Apr 26, 2022
5d8a7e9
Change how struct `MD_REF_DEF` store dest:
chowette Apr 26, 2022
1ecb4b8
extract reference definition
chowette Apr 26, 2022
2d15e71
store identifier with the leading #
chowette Apr 26, 2022
ada2e65
store the heading
chowette Apr 26, 2022
74f3e4b
add flag MD_FLAG_HEADINGAUTOID doc
chowette Oct 14, 2022
6bfc91d
remember the heading as a reference definition
chowette Oct 14, 2022
1ec0845
rebuild identifier reference after a reallocation
chowette Oct 14, 2022
1eb9b06
store the heading level
chowette Oct 14, 2022
36bb1e9
Output TOC at start of document
chowette Oct 14, 2022
8738b1e
Add TOC option to the parser parameter struct
chowette Oct 16, 2022
bc98da4
add optional table of content place holder MARK
chowette Oct 19, 2022
3d4fc52
fix problem with table of content <ul> and </ul> generation
chowette Oct 19, 2022
643423b
add some test to the TOC option
chowette Oct 19, 2022
001494b
Table of content placement tests
chowette Oct 19, 2022
1ee979f
fix default TOC depth to properly handle case when no TOC is needed
chowette Oct 19, 2022
3fca919
add some more pathological tests cases
chowette Oct 19, 2022
d4f99b2
Fix declaration-after-statement build error in travis
chowette Oct 19, 2022
f2fab2e
Fix bug with empty heading found by @software-made-easy
chowette Oct 19, 2022
a41ab75
add more tests to improve coverage
chowette Oct 24, 2022
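
The commit list above describes the identifier scheme only in outline: a GitHub-like transformation, lower-casing of Unicode letters, stripping of punctuation and emoji, and a hash map used to number duplicates. The Python sketch below illustrates that idea. It is an approximation, not the PR's C implementation: the names `slugify` and `unique_ids` are hypothetical, and both the exact character classes and the `-1`, `-2` suffix format are assumptions modelled on GitHub's behaviour.

```python
import unicodedata


def slugify(heading):
    """Approximate a GitHub-style heading identifier (assumed rules)."""
    out = []
    for ch in heading.lower():
        if ch in "-_":
            out.append(ch)              # hyphen and underscore are kept
        elif ch.isspace():
            out.append("-")             # each whitespace character becomes a hyphen
        elif unicodedata.category(ch)[0] in "LNM":
            out.append(ch)              # letters, numbers and combining marks are kept
        # anything else (punctuation, symbols, emoji) is dropped
    return "".join(out)


def unique_ids(headings):
    """Number repeated identifiers with one dictionary lookup per heading."""
    seen = {}                           # base identifier -> times seen so far
    result = []
    for heading in headings:
        base = slugify(heading)
        count = seen.get(base, 0)
        seen[base] = count + 1
        result.append(base if count == 0 else "{}-{}".format(base, count))
    return result
```

Under these assumptions, `unique_ids(["Title", "Title"])` yields `title` and `title-1`. The dictionary keeps numbering linear in the number of headings, which is what replacing the O(n²) scan with a hash map buys. The sketch does not guard against a suffixed identifier colliding with a heading that literally slugifies to `title-1`.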
4 changes: 4 additions & 0 deletions README.md
@@ -128,6 +128,10 @@ extensions:
* With the flag `MD_FLAG_UNDERLINE`, underscore (`_`) denotes an underline
instead of an ordinary emphasis or strong emphasis.

* With the flag `MD_FLAG_HEADINGAUTOID`, unique identifiers are generated for
headings. The HTML renderer outputs each one as the `id` attribute of the
heading tag, for example `<h1 id="title">Title</h1>`.

Few features of CommonMark (those some people see as mis-features) may be
disabled with the following flags:
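
The README text describes the rendered form `<h1 id="title">Title</h1>`. As a rough illustration of that output rule, here is a small Python sketch: emit the `id` attribute only when an identifier exists, and escape it as an attribute value. The helper name `render_heading_open` is hypothetical; the actual renderer in this PR is the C function `render_header_block` in `src/md4c-html.c`.

```python
from html import escape


def render_heading_open(level, identifier=None):
    """Return the opening tag for a heading of the given level (1 to 6)."""
    if not 1 <= level <= 6:
        raise ValueError("heading level must be in the range 1-6")
    if identifier is None:
        return "<h{}>".format(level)    # no identifier: plain tag
    # Escape the identifier so quotes and ampersands stay inside the attribute.
    return '<h{} id="{}">'.format(level, escape(identifier, quote=True))
```

For instance, `render_heading_open(1, "title") + "Title</h1>"` gives the README's example string.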

36 changes: 35 additions & 1 deletion md2html/md2html.c
@@ -42,8 +42,10 @@ static unsigned parser_flags = 0;
#endif
static int want_fullhtml = 0;
static int want_xhtml = 0;
static int want_toc = 0;
static int want_stat = 0;

MD_TOC_OPTIONS toc_options = { 0, NULL};

/*********************************
*** Simple grow-able buffer ***
@@ -142,7 +144,7 @@ process_file(FILE* in, FILE* out)
t0 = clock();

ret = md_html(buf_in.data, (MD_SIZE)buf_in.size, process_output, (void*) &buf_out,
parser_flags, renderer_flags);
parser_flags, renderer_flags, &toc_options);

t1 = clock();
if(ret != 0) {
@@ -200,6 +202,9 @@ static const CMDLINE_OPTION cmdline_options[] = {
{ 'o', "output", 'o', CMDLINE_OPTFLAG_REQUIREDARG },
{ 'f', "full-html", 'f', 0 },
{ 'x', "xhtml", 'x', 0 },
{ 't', "table-of-content", 't', CMDLINE_OPTFLAG_OPTIONALARG },
{ 0, "toc", 't', CMDLINE_OPTFLAG_OPTIONALARG },
{ 0, "toc-depth", 'd', CMDLINE_OPTFLAG_REQUIREDARG },
{ 's', "stat", 's', 0 },
{ 'h', "help", 'h', 0 },
{ 'v', "version", 'v', 0 },
@@ -220,6 +225,7 @@ static const CMDLINE_OPTION cmdline_options[] = {
{ 0, "funderline", '_', 0 },
{ 0, "fverbatim-entities", 'E', 0 },
{ 0, "fwiki-links", 'K', 0 },
{ 0, "fheading-auto-id", '#', 0 },

{ 0, "fno-html-blocks", 'F', 0 },
{ 0, "fno-html-spans", 'G', 0 },
@@ -240,6 +246,11 @@ usage(void)
" -o --output=FILE Output file (default is standard output)\n"
" -f, --full-html Generate full HTML document, including header\n"
" -x, --xhtml Generate XHTML instead of HTML\n"
" -t, --table-of-content=MARK, --toc=MARK\n"
" Generate a table of content in place of MARK line\n"
" If no MARK is given, the toc is generated at start\n"
" --toc-depth=D Set the maximum level of heading in the table\n"
" of content. 1 to 6. Default is 3\n"
" -s, --stat Measure time of input parsing\n"
" -h, --help Display this help and exit\n"
" -v, --version Display version and exit\n"
@@ -269,6 +280,8 @@ usage(void)
" --ftasklists Enable task lists\n"
" --funderline Enable underline spans\n"
" --fwiki-links Enable wiki links\n"
" --fheading-auto-id\n"
" Enable heading auto identifier\n"
"\n"
"Markdown suppression options:\n"
" --fno-html-blocks\n"
@@ -295,6 +308,12 @@ version(void)
static const char* input_path = NULL;
static const char* output_path = NULL;

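static int parse_toc_depth(char const* value){
    /* Accept exactly one digit in the range 1-6; reject input such as "12" or "3x". */
    toc_options.depth = -1;
    if(value == NULL || value[0] < '1' || value[0] > '6' || value[1] != '\0')
        return 0;
    toc_options.depth = value[0] - '0';
    return 1;
}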

static int
cmdline_callback(int opt, char const* value, void* data)
{
@@ -311,6 +330,20 @@ cmdline_callback(int opt, char const* value, void* data)
case 'o': output_path = value; break;
case 'f': want_fullhtml = 1; break;
case 'x': want_xhtml = 1; renderer_flags |= MD_HTML_FLAG_XHTML; break;
case 't':
want_toc = 1;
parser_flags |= MD_FLAG_HEADINGAUTOID;
toc_options.toc_placeholder = value;
if(toc_options.depth == 0)
toc_options.depth = 3;
break;
case 'd':
if(!parse_toc_depth(value)){
fprintf(stderr, "Invalid toc-depth: %s\n", value);
fprintf(stderr, "Must be a number in the range 1-6\n");
exit(1);
}
break;
case 's': want_stat = 1; break;
case 'h': usage(); exit(0); break;
case 'v': version(); exit(0); break;
@@ -335,6 +368,7 @@ cmdline_callback(int opt, char const* value, void* data)
case 'K': parser_flags |= MD_FLAG_WIKILINKS; break;
case 'X': parser_flags |= MD_FLAG_TASKLISTS; break;
case '_': parser_flags |= MD_FLAG_UNDERLINE; break;
case '#': parser_flags |= MD_FLAG_HEADINGAUTOID; break;

default:
fprintf(stderr, "Illegal option: %s\n", value);
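
The option handling in this file boils down to three rules: `--toc` implies heading auto-identifiers, the depth defaults to 3 when a TOC is requested without `--toc-depth`, and the depth must lie in the range 1-6. The Python sketch below restates those rules in isolation. `resolve_toc_options` and the dictionary keys are hypothetical names for illustration. The sketch also rejects multi-character values such as `12`, which is a stricter choice than looking only at the first character.

```python
DEFAULT_TOC_DEPTH = 3


def parse_toc_depth(value):
    """Return the depth as an int; raise ValueError unless value is one digit 1-6."""
    if len(value) == 1 and value in "123456":
        return int(value)
    raise ValueError("Invalid toc-depth: {!r} (must be a number in the range 1-6)".format(value))


def resolve_toc_options(toc_requested, mark=None, depth=None):
    """Combine --toc[=MARK] and --toc-depth=D into one settings dictionary."""
    parsed = parse_toc_depth(depth) if depth is not None else 0
    if toc_requested and parsed == 0:
        parsed = DEFAULT_TOC_DEPTH      # same default as the help text
    return {
        "heading_auto_id": toc_requested,   # a TOC needs identifiers to link to
        "placeholder": mark,                # None means "place the TOC at the start"
        "depth": parsed,
    }
```

With these definitions, `resolve_toc_options(True)` selects depth 3 and no placeholder, matching the documented defaults.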
66 changes: 66 additions & 0 deletions scripts/build_symbol_map.py
@@ -0,0 +1,66 @@
#!/usr/bin/env python3

import os
import sys
import textwrap


self_path = os.path.dirname(os.path.realpath(__file__))
f = open(self_path + "/unicode/DerivedGeneralCategory.txt", "r")

codepoint_list = []
category_list = [ "Sm", "Sc", "Sk", "So" ]

# Filter codepoints falling in the right category:
for line in f:
    comment_off = line.find("#")
    if comment_off >= 0:
        line = line[:comment_off]
    line = line.strip()
    if not line:
        continue

    char_range, category = line.split(";")
    char_range = char_range.strip()
    category = category.strip()

    if category not in category_list:
        continue

    delim_off = char_range.find("..")
    if delim_off >= 0:
        codepoint0 = int(char_range[:delim_off], 16)
        codepoint1 = int(char_range[delim_off+2:], 16)
        for codepoint in range(codepoint0, codepoint1 + 1):
            codepoint_list.append(codepoint)
    else:
        codepoint = int(char_range, 16)
        codepoint_list.append(codepoint)
f.close()


codepoint_list.sort()


index0 = 0
count = len(codepoint_list)

records = list()
while index0 < count:
    index1 = index0 + 1
    while index1 < count and codepoint_list[index1] == codepoint_list[index1-1] + 1:
        index1 += 1

    if index1 - index0 > 1:
        # Range of codepoints
        records.append("R(0x{:04x},0x{:04x})".format(codepoint_list[index0], codepoint_list[index1-1]))
    else:
        # Single codepoint
        records.append("S(0x{:04x})".format(codepoint_list[index0]))

    index0 = index1

sys.stdout.write("static const unsigned SYMBOL_MAP[] = {\n")
sys.stdout.write("\n".join(textwrap.wrap(", ".join(records), 110,
                 initial_indent = "    ", subsequent_indent="    ")))
sys.stdout.write("\n};\n\n")
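
To see what the script emits without needing `DerivedGeneralCategory.txt`, the run-compression step can be pulled into a function and fed a handful of codepoints. This is a sketch of the same loop as above; the function name `compress` is hypothetical, and it additionally removes duplicates, which the script does not need to do because its input ranges do not overlap.

```python
def compress(codepoints):
    """Collapse runs of consecutive codepoints into R(lo,hi) and S(cp) records."""
    cps = sorted(set(codepoints))       # sort and drop duplicates (extra safety)
    records = []
    start = 0
    while start < len(cps):
        end = start + 1
        # Extend the run while each codepoint follows its predecessor by one.
        while end < len(cps) and cps[end] == cps[end - 1] + 1:
            end += 1
        if end - start > 1:
            records.append("R(0x{:04x},0x{:04x})".format(cps[start], cps[end - 1]))
        else:
            records.append("S(0x{:04x})".format(cps[start]))
        start = end
    return records
```

For the symbols `$`, `+`, `<`, `=`, `>` (U+0024, U+002B, U+003C to U+003E), the result is two single records followed by one range record.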
16 changes: 16 additions & 0 deletions scripts/run-tests.sh
@@ -70,6 +70,22 @@ echo
echo "Underline extension:"
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/underline.txt" -p "$PROGRAM --funderline"

echo
echo "Heading auto identifiers extension:"
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/heading-auto-identifier.txt" -p "$PROGRAM --fheading-auto-id"

echo
echo "Pathological input:"
$PYTHON "$TEST_DIR/pathological_tests.py" -p "$PROGRAM"

echo
echo "Heading auto identifiers pathological input:"
$PYTHON "$TEST_DIR/pathological_auto_ident_tests.py" -p "$PROGRAM --fheading-auto-id"

echo
echo "Table of content extension:"
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/toc.txt" -p "$PROGRAM --table-of-content"

echo
echo "Table of content placement extension:"
$PYTHON "$TEST_DIR/spec_tests.py" -s "$TEST_DIR/toc-mark.txt" -p "$PROGRAM --table-of-content=[[__TOC__]]"
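
The last two test runs exercise the two placement modes from the help text: without a MARK the table of contents goes at the start of the document, and with a MARK it replaces the MARK line. The Python sketch below shows that placement rule on a list of lines. It rests on assumptions this diff does not confirm: that the MARK must stand alone on its line, that only the first occurrence is replaced, and that a missing MARK leaves the document unchanged. `place_toc` is a hypothetical name.

```python
def place_toc(lines, toc_html, mark=None):
    """Insert toc_html at the start, or in place of the first line equal to mark."""
    if mark is None:
        return [toc_html] + list(lines)     # no MARK: TOC goes first
    out = []
    placed = False
    for line in lines:
        if not placed and line.strip() == mark:
            out.append(toc_html)            # the MARK line is replaced by the TOC
            placed = True
        else:
            out.append(line)
    return out
```

Called as `place_toc(doc_lines, toc, "[[__TOC__]]")`, it corresponds to the `--table-of-content=[[__TOC__]]` invocation used by the placement tests.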
25 changes: 21 additions & 4 deletions src/md4c-html.c
@@ -309,6 +309,20 @@ render_open_code_block(MD_HTML* r, const MD_BLOCK_CODE_DETAIL* det)
RENDER_VERBATIM(r, ">");
}

static void
render_header_block(MD_HTML* r, const MD_BLOCK_H_DETAIL* det)
{
static const MD_CHAR* head[6] = { "<h1", "<h2", "<h3", "<h4", "<h5", "<h6" };

RENDER_VERBATIM(r, head[det->level- 1]);
if(det->identifier.text != NULL) {
RENDER_VERBATIM(r, " id=\"");
render_attribute(r, &det->identifier, render_html_escaped);
RENDER_VERBATIM(r, "\"");
}
RENDER_VERBATIM(r, ">");
}

static void
render_open_td_block(MD_HTML* r, const MD_CHAR* cell_type, const MD_BLOCK_TD_DETAIL* det)
{
@@ -378,7 +392,6 @@ render_open_wikilink_span(MD_HTML* r, const MD_SPAN_WIKILINK_DETAIL* det)
static int
enter_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata)
{
static const MD_CHAR* head[6] = { "<h1>", "<h2>", "<h3>", "<h4>", "<h5>", "<h6>" };
MD_HTML* r = (MD_HTML*) userdata;

switch(type) {
@@ -388,7 +401,7 @@ enter_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata)
case MD_BLOCK_OL: render_open_ol_block(r, (const MD_BLOCK_OL_DETAIL*)detail); break;
case MD_BLOCK_LI: render_open_li_block(r, (const MD_BLOCK_LI_DETAIL*)detail); break;
case MD_BLOCK_HR: RENDER_VERBATIM(r, (r->flags & MD_HTML_FLAG_XHTML) ? "<hr />\n" : "<hr>\n"); break;
case MD_BLOCK_H: RENDER_VERBATIM(r, head[((MD_BLOCK_H_DETAIL*)detail)->level - 1]); break;
case MD_BLOCK_H: render_header_block(r, (const MD_BLOCK_H_DETAIL*)detail); break;
case MD_BLOCK_CODE: render_open_code_block(r, (const MD_BLOCK_CODE_DETAIL*) detail); break;
case MD_BLOCK_HTML: /* noop */ break;
case MD_BLOCK_P: RENDER_VERBATIM(r, "<p>"); break;
@@ -398,6 +411,7 @@ enter_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata)
case MD_BLOCK_TR: RENDER_VERBATIM(r, "<tr>\n"); break;
case MD_BLOCK_TH: render_open_td_block(r, "th", (MD_BLOCK_TD_DETAIL*)detail); break;
case MD_BLOCK_TD: render_open_td_block(r, "td", (MD_BLOCK_TD_DETAIL*)detail); break;
case MD_BLOCK_NAV: RENDER_VERBATIM(r, "<nav id=\"TOC\" role=\"doc-toc\">\n"); break;
}

return 0;
@@ -426,6 +440,7 @@ leave_block_callback(MD_BLOCKTYPE type, void* detail, void* userdata)
case MD_BLOCK_TR: RENDER_VERBATIM(r, "</tr>\n"); break;
case MD_BLOCK_TH: RENDER_VERBATIM(r, "</th>\n"); break;
case MD_BLOCK_TD: RENDER_VERBATIM(r, "</td>\n"); break;
case MD_BLOCK_NAV: RENDER_VERBATIM(r, "</nav>\n"); break;
}

return 0;
@@ -531,20 +546,22 @@ debug_log_callback(const char* msg, void* userdata)
int
md_html(const MD_CHAR* input, MD_SIZE input_size,
void (*process_output)(const MD_CHAR*, MD_SIZE, void*),
void* userdata, unsigned parser_flags, unsigned renderer_flags)
void* userdata, unsigned parser_flags, unsigned renderer_flags,
MD_TOC_OPTIONS* toc_options)
{
MD_HTML render = { process_output, userdata, renderer_flags, 0, { 0 } };
int i;

MD_PARSER parser = {
0,
1,
parser_flags,
enter_block_callback,
leave_block_callback,
enter_span_callback,
leave_span_callback,
text_callback,
debug_log_callback,
*toc_options,
NULL
};
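
The renderer hunks above show only the wrapper that the table of contents receives, `<nav id="TOC" role="doc-toc">` and `</nav>`. The list markup inside it is not visible in this diff. As a sketch of how headings could be turned into a depth-limited nested list, assuming nested `<ul>` elements placed inside the parent `<li>` and links of the form `#identifier`, consider the following Python. `build_toc` and its tuple layout are hypothetical.

```python
from html import escape


def build_toc(headings, depth=3):
    """Render (level, identifier, text) tuples as a nested list inside <nav>."""
    out = ['<nav id="TOC" role="doc-toc">']
    stack = []  # heading levels of the <ul> lists that are currently open
    for level, ident, text in headings:
        if level > depth:
            continue                        # deeper than the requested TOC depth
        # Close every open list that is deeper than this heading.
        while stack and level < stack[-1]:
            out += ["</li>", "</ul>"]
            stack.pop()
        if stack and level == stack[-1]:
            out.append("</li>")             # sibling: close the previous item
        else:
            out.append("<ul>")              # first item, or child of the open <li>
            stack.append(level)
        out.append('<li><a href="#{}">{}</a>'.format(escape(ident), escape(text)))
    while stack:                            # close whatever is still open
        out += ["</li>", "</ul>"]
        stack.pop()
    out.append("</nav>")
    return "\n".join(out)
```

Keeping a stack of open levels means every `<ul>` and `<li>` that is opened is also closed, including when heading levels skip (for example an `h1` followed directly by an `h3`).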

5 changes: 4 additions & 1 deletion src/md4c-html.h
@@ -52,13 +52,16 @@
* Param userdata is just propagated back to process_output() callback.
* Param parser_flags are flags from md4c.h propagated to md_parse().
* Param render_flags is bitmask of MD_HTML_FLAG_xxxx.
* Param toc_options is a pointer to toc options from md4c.h propagated to md_parse().
*
* Returns -1 on error (if md_parse() fails.)
* Returns 0 on success.
*/
int md_html(const MD_CHAR* input, MD_SIZE input_size,
void (*process_output)(const MD_CHAR*, MD_SIZE, void*),
void* userdata, unsigned parser_flags, unsigned renderer_flags);
void* userdata, unsigned parser_flags, unsigned renderer_flags,
MD_TOC_OPTIONS* toc_options
);


#ifdef __cplusplus