Maintaining Alignment and Indentation in HTML to DOCX Conversion #10010
Unanswered
vishalmudgalpw asked this question in Q&A
Replies: 2 comments
-
Hi @jgm @tarleb Thanks!
-
These are complicated questions, and I don't have enough free time right
now.
I'm available for hire for urgent requests; please reach out via email
if that's an option for you.
-
We are converting HTML to DOCX and need to maintain specific alignment and indentation in the resulting DOCX output. Our use case involves generating tests containing questions.
Command We Use for Conversion:
pandoc-filters.lua:
function Span(elem)
  if elem.attributes["custom-style"] == "tab" then
    return pandoc.RawInline('openxml', '<w:r><w:tab/></w:r>')
  elseif elem.attributes["custom-style"] == "tab_0_75" then
    return pandoc.RawInline('openxml', '<w:r><w:tab/></w:r>')
  end
end
Style Reference DOC:
html-to-docx-style-reference.docx
Example Input HTML:
We process this HTML to maintain the required alignment and indentation of dynamic data before rendering it on the UI. Here’s how we handle the transformation:
<p> elements to <span>: append <br> tags for line breaks in Word, but do not append <br> after the last paragraph element.
Processed HTML Template:
Custom styles for indentation, bold_char, and tab are defined in the style reference DOCX.
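For readers trying to reproduce the preprocessing step described above (turning <p> elements into <span> and appending <br> to all but the last one), here is a minimal Python sketch. The function name `paragraphs_to_spans` and the regex-based approach are assumptions for illustration only; the original post does not show the actual implementation or its language, and a regex is not a full HTML parser, so this only handles simple, non-nested <p> blocks.

```python
import re

# Matches a simple, non-nested <p ...>...</p> block (assumption: well-formed input).
P_BLOCK = re.compile(r"<p\b([^>]*)>(.*?)</p>", re.IGNORECASE | re.DOTALL)


def paragraphs_to_spans(html: str) -> str:
    """Rewrite each <p> as a <span>, adding <br> after every one except the last."""
    matches = list(P_BLOCK.finditer(html))
    if not matches:
        return html

    last_index = len(matches) - 1
    pieces = []
    cursor = 0
    for i, match in enumerate(matches):
        # Keep any text between paragraphs untouched.
        pieces.append(html[cursor:match.start()])
        attrs, body = match.group(1), match.group(2)
        pieces.append(f"<span{attrs}>{body}</span>")
        if i != last_index:
            # A line break separates this paragraph from the next one in Word.
            pieces.append("<br>")
        cursor = match.end()
    pieces.append(html[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    print(paragraphs_to_spans("<p>First line</p><p>Second line</p>"))
```

Attributes on the original <p> (for example a custom-style) are carried over to the <span> unchanged, which matters here because the style reference DOCX is keyed on those names.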
Output We Get:
nahsz1ms9amj9mfvtbpz4o6sd.docx
Issue: As seen in the output DOCX file, the first two lines contain large gaps between words wherever a <br> tag is appended. This happens because we want our content to be justified: the custom style applies justified alignment, and a justified line that ends in a line break is stretched to the full width, which produces the unwanted spaces.
Case 1: If we use <br> tags, we keep the desired alignment and indentation, but the justified content shows large gaps wherever our logic appends a <br> tag.
Case 2: If we remove the <br> tags and use <p> or <div> tags to get natural line breaks, the indentation and alignment break: the content wraps below the question index.
Solution we require:
How can we maintain the alignment and indentation while applying line breaks where necessary and keeping the content justified?
Which approach should we use to achieve this? Currently both cases have issues. Any suggestions on how to maintain alignment, indentation, and justified content without introducing unwanted spaces would be greatly appreciated.
Desired Output:
desired_output.docx
Thanks 🙏
@jgm