Elixir Library for parsing Markdown while retaining associated semantics

exlee/ex_semantic_markdown

SemanticMarkdown

Description

SemanticMarkdown is a library that allows marking parts of a Markdown document with XML tags. The tagged parts are extracted and returned as a keyword list, along with the unmarked content.

E.g.:

<author>Alexander "exlee" Kaminski</author>
<date>2022-02-02</date>
<language>en_US</language>

# Hello World!

Every document has to start somewhere?!

<hint>It's possible to extend from World to Universe at some point</hint>

<mobile_content>
As _content_ on this *page* is very intensive you will not be able to see the images!
</mobile_content>

<update>2022-02-03 : Added hint</update>
<update>2022-02-04 : Set tags</update>

... is transformed into a friendly keyword list:

[
  author: ...,
  date: ...,
  language: ...,
  content: ...,
  mobile_content: ...,
  update: ...,
  update: ...
]

Such a list can then be used for conditional rendering, or for any kind of rendering transformation driven by the marked parts/attributes.
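As a minimal sketch of conditional rendering, assume the parsed result is already bound to a keyword list shaped like the one above. The values, the `RenderSketch` module and the `mobile?` flag are made up for illustration, and no SemanticMarkdown function is called. Because keys such as `update` can repeat, `Keyword.get_values/2` collects every occurrence, whereas `Keyword.get/3` would return only the first.

```elixir
defmodule RenderSketch do
  # Pick the blocks to render. `mobile?` is an assumed flag supplied by the
  # caller (e.g. derived from the request); it is not part of the library.
  def blocks(parsed, mobile?) do
    extra = if mobile?, do: Keyword.get_values(parsed, :mobile_content), else: []
    Keyword.get_values(parsed, :content) ++ extra
  end
end

# Placeholder values; the real ones would come from the library.
parsed = [
  author: "Alexander \"exlee\" Kaminski",
  language: "en_US",
  content: "main content",
  mobile_content: "mobile-only note",
  update: "2022-02-03 : Added hint",
  update: "2022-02-04 : Set tags"
]

# Duplicate keys are kept, in document order.
updates = Keyword.get_values(parsed, :update)

desktop = RenderSketch.blocks(parsed, false)
mobile = RenderSketch.blocks(parsed, true)
```

Keeping the result as a plain keyword list means the standard `Keyword` and `Enum` functions are enough for this kind of selection.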

Solution space:

  • A local Markdown-based CMS
  • Conditional content loading depending on various conditions, like locale or browser configuration (without a DB)
  • Data point embedding for interactive components
  • Asymmetrical documents (e.g. flashcards)

Installation

SemanticMarkdown can be installed by adding semantic_markdown to your list of dependencies in mix.exs:

def deps do
  [
    {:semantic_markdown, "~> 0.1.0"}
  ]
end

Documentation can be found at https://hexdocs.pm/semantic_markdown.

Rationale

Markdown is a great format for both short and longer forms. However, it's somewhat limited when it comes to creating structured content. The usual solution is to feed content either from a CMS or directly from a database. Database modelling takes time and might be overkill for small solutions.

Also, Markdown is VERY good for writing content, so if the solution is small and text-driven, one can use Markdown instead of trying to hammer in a back-office system just so that content can be provided.

Database

The same semantic information can be obtained by using a database. For small solutions, modelling a database or even setting one up can be overkill compared to having flat local files. Semantic marking allows, for example, loading Markdown into a local database (like SQLite3) for faster reads and incrementally extending the model as needed.

Footnotes

At the time of writing this library, Earmark hard-codes the "see footnotes" and "return to article" titles when parsing footnotes. SemanticMarkdown provides options to replace them during parsing, allowing non-English titles (e.g. with gettext), which was another motivation and the actual

TL;DR

I wanted a simple CMS for content generation and couldn't find one, so I made my own ;)

Alternative solutions

Front-matter

Some Markdown parsing solutions use a "header" part to provide data with semantic value, e.g.:

date: 2022-02-02
language: en_US
author: Anonymous Writer

-----

# Title of the document
(...)

The front-matter can be in any format (XML, TOML, YAML, etc.).

Such an approach works well when the data can be embedded in such a header. It doesn't allow marking parts of the document, and it's usually the developer's responsibility to make sure that the document is split properly.
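To illustrate what that splitting responsibility looks like, here is a minimal sketch that separates a `key: value` header from the body at the first line consisting only of dashes. The `FrontMatterSketch` module is hypothetical and is not taken from any library. It assumes the header is plain `key: value` lines rather than full YAML or TOML.

```elixir
defmodule FrontMatterSketch do
  # Split on the first line made only of three or more dashes, then read
  # "key: value" pairs from the header part. Without such a line the whole
  # input is treated as body.
  def split(doc) do
    case String.split(doc, ~r/^-{3,}[ \t]*$/m, parts: 2) do
      [header, body] -> {parse_header(header), String.trim(body)}
      [body] -> {%{}, String.trim(body)}
    end
  end

  defp parse_header(header) do
    header
    |> String.split("\n", trim: true)
    |> Enum.map(&String.split(&1, ":", parts: 2))
    |> Enum.flat_map(fn
      [key, value] -> [{String.trim(key), String.trim(value)}]
      # Lines without a colon are silently skipped in this sketch.
      _ -> []
    end)
    |> Map.new()
  end
end

doc = """
date: 2022-02-02
language: en_US
author: Anonymous Writer

-----

# Title of the document
"""

{meta, body} = FrontMatterSketch.split(doc)
```

Note the fragility: a document with no header but with a dashed horizontal rule in its body would be split at that rule, which is exactly the kind of case the developer has to guard against.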

XML Parsing

XML parsing (with a library such as SweetXml) would probably be preferable:

<xml>
  <title> ... </title>
  <content>
    ...
  </content>
  <sources>
    ...
  </sources>
</xml>

Not only would that provide semantic tagging and formatting, it would also allow hardening the data with namespaces. However, anyone who decides to use it is on their own when it comes to implementing Markdown parsing for specific nodes.

Markdown classes

Instead of toying with semantic tagging, one could use IAL (Inline Attribute List) extensions (see Earmark's) and then use other methods of hiding content (like CSS/JS).

Document splitting

It's also possible to split the .md files into multiple ones using some schema, but if there is a lot of information with semantic meaning, such a split would be very cumbersome to maintain.

Missing features / known issues

  • it should be possible to have inner transforms on tag-by-tag or even node-by-node basis
  • footnotes need to be in the same semantic node making them somewhat useless
  • since parsing is done using Earmark it shares some caveats (like HTML Limitation)
  • no performance tests were done, but most likely it's not very fast so the input files should be pre-processed and cached
  • it'd be nice to have tag transformers provided in form of (text) -> any so that output can be "smarter"
  • nested semantic tags are not supported (this probably would require switching parser entirely)
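The caching suggested in the performance item above could look roughly like the following. This is only a sketch of one possible approach, not something the library provides: `ParseCacheSketch`, its ETS table name and the `parse_fun` argument are all hypothetical. The parse function is passed in so that the sketch makes no assumption about SemanticMarkdown's API.

```elixir
defmodule ParseCacheSketch do
  @table :semantic_markdown_cache_sketch

  # Create the named ETS table once, e.g. at application start.
  def init do
    :ets.new(@table, [:named_table, :set, :public])
    :ok
  end

  # Return the parsed form of the file at `path`, re-parsing only when the
  # file content has changed. `parse_fun` turns Markdown text into whatever
  # structure the caller wants cached.
  def fetch(path, parse_fun) when is_function(parse_fun, 1) do
    source = File.read!(path)
    digest = :erlang.md5(source)

    case :ets.lookup(@table, path) do
      [{^path, ^digest, parsed}] ->
        parsed

      _ ->
        parsed = parse_fun.(source)
        :ets.insert(@table, {path, digest, parsed})
        parsed
    end
  end
end

# Usage with a stand-in parse function instead of the real parser.
path = Path.join(System.tmp_dir!(), "parse_cache_sketch.md")
File.write!(path, "# Hello")
ParseCacheSketch.init()

stand_in_parse = fn text -> [content: text] end
first = ParseCacheSketch.fetch(path, stand_in_parse)
second = ParseCacheSketch.fetch(path, stand_in_parse)
```

The file is still read on every call so that edits are picked up; only the parsing step is skipped when the content digest matches.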

Next

  1. More tests, especially with more complex documents
  2. Configurable transformers for tags
  3. Per-tag inner-parsing
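The planned tag transformers could take a shape like the sketch below, where each tag maps to a `(text) -> any` function applied over the keyword list. This is purely an illustration of the idea: `TransformSketch`, the `transformers` map and the sample values are hypothetical, and the library does not currently offer this.

```elixir
defmodule TransformSketch do
  # Apply a per-tag function to each {tag, text} pair. Tags without an entry
  # pass through unchanged; order and duplicate keys are preserved.
  def apply_all(parsed, transformers) do
    Enum.map(parsed, fn {tag, text} ->
      fun = Map.get(transformers, tag, &Function.identity/1)
      {tag, fun.(text)}
    end)
  end
end

transformers = %{
  # Turn the date string into a Date struct.
  date: &Date.from_iso8601!/1,
  # Split "YYYY-MM-DD : note" into a {date, note} tuple.
  update: fn text ->
    [day, note] = String.split(text, " : ", parts: 2)
    {Date.from_iso8601!(day), note}
  end
}

# Placeholder input shaped like the library's keyword-list output.
parsed = [
  date: "2022-02-02",
  update: "2022-02-03 : Added hint",
  content: "# Hello World!"
]

result = TransformSketch.apply_all(parsed, transformers)
```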
