`SemanticMarkdown` is a library that allows marking parts of a Markdown document with XML tags.
The tagged parts are extracted and returned as a keyword list, along with the unmarked content.
For example:
<author>Alexander "exlee" Kaminski</author>
<date>2022-02-02</date>
<language>en_US</language>
# Hello World!
Every document has to start somewhere?!
<hint>It's possible to extend from World to Universe at some point</hint>
<mobile_content>
As _content_ on this *page* is very intensive you will not be able to see the images!
</mobile_content>
<update>2022-02-03 : Added hint</update>
<update>2022-02-04 : Set tags</update>
... is transformed into a friendly keyword list:
[
  author: ...,
  date: ...,
  language: ...,
  content: ...,
  mobile_content: ...,
  update: ...,
  update: ...
]
Such a list can then be used for conditional rendering or for any other rendering transformation based on the marked parts/attributes.
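As a rough illustration of the conditional rendering mentioned above, here is a minimal sketch that works on a hand-written keyword list. The `parsed` values and the `RenderSketch` module are assumptions made up for this example; the values the library actually returns may have a different shape. Only standard `Keyword` functions are used.

```elixir
defmodule RenderSketch do
  # Hypothetical helper: picks the body depending on the client type,
  # falling back to the regular content when no mobile variant exists.
  def body(parsed, :mobile) do
    Keyword.get(parsed, :mobile_content, Keyword.get(parsed, :content))
  end

  def body(parsed, _other), do: Keyword.get(parsed, :content)

  # Keyword lists may repeat a key, so every <update> entry is kept.
  def updates(parsed), do: Keyword.get_values(parsed, :update)
end

# Hand-written stand-in for a parsed document (assumed shape).
parsed = [
  author: "Alexander \"exlee\" Kaminski",
  date: "2022-02-02",
  language: "en_US",
  content: "<h1>Hello World!</h1>",
  mobile_content: "<p>Images are hidden on mobile.</p>",
  update: "2022-02-03 : Added hint",
  update: "2022-02-04 : Set tags"
]

mobile_body = RenderSketch.body(parsed, :mobile)
desktop_body = RenderSketch.body(parsed, :desktop)
updates = RenderSketch.updates(parsed)
```

Because a keyword list keeps duplicate keys in order, repeated tags such as `update` can be read back with `Keyword.get_values/2` without any extra modelling.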
Solution space:
- A local Markdown-based CMS
- Conditional content loading depending on various conditions, like locale or browser configuration (without a DB)
- Data point embedding for interactive components
- Asymmetrical documents (e.g. flashcards)
`SemanticMarkdown` can be installed by adding `semantic_markdown` to your list of dependencies in `mix.exs`:
def deps do
  [
    {:semantic_markdown, "~> 0.1.0"}
  ]
end
Documentation can be found at https://hexdocs.pm/semantic_markdown.
Markdown is a great format for both short and long-form writing. However, it is somewhat limited when it comes to creating structured content. The usual solution is to use either a CMS or a database directly to feed content. Database modelling takes time and might be overkill for small solutions.
Also, Markdown is very good for writing content, so for a small, text-driven solution one can use Markdown instead of trying to hammer in a back-office system just so that content can be provided.
The same semantic information can be obtained by using a database. For small solutions, modelling a database or even setting one up can be overkill compared to keeping flat local files. Semantic marking also allows, for example, loading Markdown into a local database (like SQLite3) for faster reads and incrementally extending the model as needed.
At the time of writing this library, `Earmark` hard-codes the `see footnotes` and `return to article` titles when parsing footnotes. `SemanticMarkdown` provides options to replace those during parsing, allowing the use of non-English titles (e.g. with `gettext`), which was another motivation.
And the actual one: I wanted to have a simple CMS for content generation and couldn't find one, so I made my own ;)
Some Markdown parsing solutions use a "header" part (front matter) to provide data with semantic value, e.g.:
date: 2022-02-02
language: en_US
author: Anonymous Writer
-----
# Title of the document
(...)
Here the front matter can be in any format (XML, TOML, YAML, etc.).
Such an approach works well when the provided data can be embedded in such a header. It doesn't allow marking parts of the document, and it's usually the developer's responsibility to make sure that the document is split properly.
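To make the trade-off concrete, this is roughly what the front-matter approach asks the developer to do by hand. It is a minimal sketch that only handles the plain `key: value` header shown above; `FrontMatterSketch` is a hypothetical module written for this example and is not part of `SemanticMarkdown` or any other library.

```elixir
defmodule FrontMatterSketch do
  # Splits the document on the first line consisting only of dashes
  # and reads the part above it as "key: value" pairs.
  def split(document) do
    case String.split(document, ~r/^-{3,}[ \t]*$/m, parts: 2) do
      [header, body] -> {parse_header(header), String.trim(body)}
      [body] -> {%{}, String.trim(body)}
    end
  end

  defp parse_header(header) do
    header
    |> String.split("\n", trim: true)
    |> Enum.map(&String.split(&1, ":", parts: 2))
    |> Enum.flat_map(fn
      [key, value] -> [{String.trim(key), String.trim(value)}]
      # Lines without a colon are silently skipped in this sketch.
      _ -> []
    end)
    |> Map.new()
  end
end

document = """
date: 2022-02-02
language: en_US
author: Anonymous Writer
-----
# Title of the document
"""

{meta, body} = FrontMatterSketch.split(document)
```

Keys are kept as strings rather than atoms so that arbitrary header names cannot create new atoms. Note that everything below the separator is still a single blob: nothing inside the body can be marked this way.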
XML parsing (with a library such as SweetXml) would probably be preferable.
<xml>
<title> ... </title>
<content>
...
</content>
<sources>
...
</sources>
</xml>
That would not only provide semantic tagging and formatting but also allow hardening the data with namespaces. However, if one decides to use it, they're on their own when it comes to implementing Markdown parsing for specific nodes.
Instead of toying with semantic tagging, one could use IAL extensions (see Earmark's) and then use other methods of hiding content (like CSS/JS).
It's also possible to split the `.md` files into multiple ones using a schema, but if there is a lot of information with semantic meaning, such a split would be very cumbersome to maintain.
- it should be possible to have inner transforms on a tag-by-tag or even node-by-node basis
- footnotes need to be in the same semantic node, making them somewhat useless
- since parsing is done using `Earmark`, it shares some caveats (like HTML Limitation)
- no performance tests were done, but most likely it's not very fast, so the input files should be pre-processed and cached
- it'd be nice to have tag transformers provided in the form of `(text) -> any` so that the output can be "smarter"
- nested semantic tags are not supported (this would probably require switching the parser entirely)
- More tests, especially with more complex documents
- Configurable transformers for tags
- Per-tag inner-parsing
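
The `(text) -> any` transformer idea from the lists above is not implemented; the following is only a sketch of how it could look if applied to an already extracted keyword list. `TransformSketch`, `apply_transformers/2` and the sample `parsed` list are hypothetical names and data invented for this illustration.

```elixir
defmodule TransformSketch do
  # Hypothetical: applies a per-tag `(text) -> any` function to each entry,
  # leaving entries without a registered transformer untouched.
  def apply_transformers(parsed, transformers) do
    Enum.map(parsed, fn {tag, text} ->
      transform = Map.get(transformers, tag, & &1)
      {tag, transform.(text)}
    end)
  end
end

# Hand-written stand-in for a parsed document (assumed shape).
parsed = [date: "2022-02-02", language: "en_US", content: "# Hello World!"]

transformers = %{
  date: &Date.from_iso8601!/1,
  language: &String.split(&1, "_")
}

result = TransformSketch.apply_transformers(parsed, transformers)
```

With this shape, `date` becomes a `Date` struct and `language` a two-element list, while `content` passes through unchanged.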