There are two ways to run the scripts. The Docker-based setup only requires Docker to be installed and manages all other dependencies automatically. The other option is to install the dependencies yourself and run the shell script that calls the Python files to perform the conversion.
Follow the steps in this order:
- Base Requirements for installation
- Pre-conversion Requirements
- Workflow Docker Option or Dependencies Option
- Post-conversion checks

Base requirements for installation:
- Clone the GitHub repo to your local machine: `cd` into the desired directory and clone using
git clone https://github.com/WeAreSeismica/tex2jats.git
Before running the TeX2JATS converter, you need to:
- Have produced final `.tex` galleys with updated metadata (proof accepted by authors)
- Check that the date format is correct (Month dd, YYYY)
- Check that you did not use the obsolete command `seistable` in the TeX file
- Convert every figure to PNG format if not already done. You can use the following bash command, which converts every PDF file whose name starts with fig to PNG. It requires ImageMagick, and you can adjust the density if needed:
mogrify -verbose -quality 00 -density 250 -format png ./fig*.pdf
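The date and `seistable` checks above can also be scripted. The following is a minimal sketch, not part of the tex2jats repository: the helper names `is_valid_date` and `find_seistable` are hypothetical, and the sketch assumes that "Month dd, YYYY" means a full English month name followed by a one- or two-digit day. How the date is stored in the galley is not specified here, so the date string has to be passed in by hand.

```python
import re

# Full English month names, as assumed for the "Month dd, YYYY" format.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_RE = re.compile(rf"^(?:{MONTHS}) (\d{{1,2}}), (\d{{4}})$")


def is_valid_date(text):
    """Return True if text looks like 'Month dd, YYYY', e.g. 'March 05, 2024'."""
    match = DATE_RE.match(text.strip())
    return bool(match) and 1 <= int(match.group(1)) <= 31


def find_seistable(tex_source):
    """Return 1-based line numbers that mention seistable outside TeX comments."""
    hits = []
    for number, line in enumerate(tex_source.splitlines(), start=1):
        # Keep only the part of the line before the first unescaped '%'.
        code = re.split(r"(?<!\\)%", line, maxsplit=1)[0]
        if "seistable" in code:
            hits.append(number)
    return hits
```

For example, `find_seistable(open("proof.tex").read())` would list the lines to fix before conversion, assuming the galley is named proof.tex.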
Docker option:
- Install Docker if you don't have it already.
- Start Docker. Usually you will have an application called "Docker" on your computer with a rudimentary graphical user interface (GUI). On macOS, you can also start it from the command-line interface (CLI):
open -a Docker
- Move the directory containing the LaTeX files to be converted into the git repo `tex2jats`.
- Open the file `env.set` and edit the parameters in the file as described there. Spaces in names are accepted.
- Open a shell and navigate to wherever the git repo is cloned, which now includes the LaTeX files directory:
cd path/to/git/repo/tex2jats
You can always run `pwd` to check whether you're in the right place.
- Run docker-compose by entering the command below in your favorite shell:
docker-compose up -d
- This should convert and create all the necessary files. Conversion may take up to a minute. You can check the status of the conversion with:
docker-compose logs
Verify that there are `xml` files in the LaTeX files directory.
- Shut down docker-compose by entering:
docker-compose down
For the dependencies option, the following dependencies are shared with the Seismica docx/odt parsing module:
- Python 3 (preferably 3.8+)
- numpy
- beautifulsoup (+ lxml parser on macOS)
- pandoc
- biblib
Other dependencies:
- python3 datetime (part of the standard library, so normally already installed)
- GNU sed, v4.8
- perl, v5 or later (earlier versions not tested)
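Before running the dependencies workflow, it can help to confirm that everything listed above is installed. The sketch below is an illustration rather than a script shipped with tex2jats. The helper names are hypothetical, and the import names are assumptions: beautifulsoup is normally imported as `bs4`, and biblib is assumed to import as `biblib`. It only checks presence, not versions (for example, it does not confirm that sed is GNU sed 4.8).

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed import names for the Python dependencies listed above.
    modules = ["numpy", "bs4", "lxml", "biblib"]
    tools = ["pandoc", "sed", "perl"]
    print("Missing Python modules:", missing_modules(modules) or "none")
    print("Missing command-line tools:", missing_tools(tools) or "none")
```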
- Copy tex2xml.sh, apa.csl, and the Python scripts (including cleanjats.py) to your current working directory (CWD, where the TeX galley is):
cd /cwd/
cp /path/to/tex2jats/tex2xml.sh ./
cp /path/to/tex2jats/*.py ./
cp /path/to/tex2jats/apa.csl ./
- In your CWD, use tex2xml.sh to convert the TeX to JATS XML:
./tex2xml.sh proof biblio
or, for the math mode:
./tex2xml.sh proof biblio math
with:
  - `proof`, the name of the TeX galley without the extension (which should be .tex)
  - `biblio`, the name of the corrected list of references, without the extension (which should be .bib)
  - `math`, to activate the math mode (use it if math does not print correctly, especially if the amsmath TeX package is in use)
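If you call the converter from a Python script, the argument order described above can be captured in a small helper. This is a sketch under assumptions: `build_command` and `run_conversion` are hypothetical names, not part of tex2jats, and the only behaviour relied on is the documented call `./tex2xml.sh proof biblio [math]` with extensions removed.

```python
import subprocess


def strip_ext(name, ext):
    """Drop a trailing extension such as '.tex' if it is present."""
    return name[: -len(ext)] if name.endswith(ext) else name


def build_command(proof, biblio, math=False, script="./tex2xml.sh"):
    """Build the argument list for tex2xml.sh as described in the step above."""
    command = [script, strip_ext(proof, ".tex"), strip_ext(biblio, ".bib")]
    if math:
        command.append("math")  # optional third argument enables the math mode
    return command


def run_conversion(proof, biblio, math=False, cwd="."):
    """Run the conversion in cwd and raise if the script exits with an error."""
    subprocess.run(build_command(proof, biblio, math), cwd=cwd, check=True)
```

Calling `build_command("proof.tex", "biblio.bib", math=True)` gives the same arguments as typing `./tex2xml.sh proof biblio math` in the shell.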
Either of the workflows above will output the following files within the LaTeX directory:
- `proof.xml`
- `proof_metadata.jats`
- `proof_credits.jats`
- `proof_galley.xml`
- `proof_tab1.tex` and `proof_tab1.xml` if you have one table (see the table step below), with a similar pair of files for each additional table

You will then work with the XML galley only (`proof_galley.xml`). The other files are only there for correction if needed.
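A quick way to confirm that the conversion produced the files listed above is to look for them by name. The sketch below assumes the galley stem is `proof`, as in the examples; `missing_outputs` is a hypothetical helper, and it deliberately ignores the per-table files because their number depends on the article.

```python
from pathlib import Path

# Suffixes appended to the galley stem, taken from the list of output files.
EXPECTED_SUFFIXES = (".xml", "_metadata.jats", "_credits.jats", "_galley.xml")


def missing_outputs(directory, stem="proof"):
    """Return the expected output file names that are absent from directory."""
    folder = Path(directory)
    return [
        stem + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (folder / (stem + suffix)).is_file()
    ]
```

An empty list from `missing_outputs("path/to/latex/dir")` means all four main files were written.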
- Open the galley with a web browser; it should not show any error. You can use the web browser to debug, since it shows the line of every error.
- Correct metadata if needed: authors' names, affiliations, DOI, title, and other metadata. With multiple affiliations, stray repeated dots often appear.
- Check that cross references for figures (xref) look OK. They should be printed with a format similar to:
<xref ref-type="fig" alt="1" rid="fig1">1</xref>
- Check the acknowledgements and references.
- If there are tables in your TeX galley, tex2xml.sh will export two files for each table: tabxx.tex and tabxx.xml, with xx ranging from 1 to the total number of tables present in the article.
-> If the table is simple and has been converted properly by pandoc + cleanjats.py:
- Check and correct the caption of the table in the XML file for references or formulas that have not been converted.
-> If the table is too complex and has not been properly converted:
- Correct any unwanted symbol in `tabxx.tex`
- Translate `tabxx.tex` to HTML with https://tableconvert.com/latex-to-html
- Copy the HTML code into the XML table file `tabxx.xml`, where indicated
- Replace the wrong table code in the XML galley `proof_galley.xml` with the updated `tabxx.xml`. In the XML galley, tables are under a `table-wrap` environment.
- Check and correct the caption of the table in the XML file for references or formulas that have not been converted.
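The browser check for XML errors and the look at figure cross references can be approximated with Python's standard XML parser. This is a minimal sketch, not the project's own tooling: `check_galley` is a hypothetical helper, and it assumes the galley uses un-namespaced `xref` elements with `ref-type` and `rid` attributes, as in the example above.

```python
import xml.etree.ElementTree as ET


def check_galley(xml_source):
    """Parse the galley and return (error, figure_rids).

    error is None when the XML is well formed; otherwise it is the parser
    message, which includes the line and column of the first problem.
    figure_rids lists the rid of every <xref ref-type="fig"> element.
    """
    try:
        root = ET.fromstring(xml_source)
    except ET.ParseError as exc:
        return str(exc), []
    rids = [
        xref.get("rid")
        for xref in root.iter("xref")
        if xref.get("ref-type") == "fig"
    ]
    return None, rids
```

To run it on a real galley, pass the file contents as bytes, for instance `check_galley(open("proof_galley.xml", "rb").read())`. A `None` entry in the returned list points to a figure cross reference without a `rid`.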
- Upload the XML galley to the OJS website.
- You can rename the XML galley
- Images need to be uploaded separately (no need to fill in the caption etc.)
- Use the preview tool to open the XML Lens Viewer
- Check that title, authors, and all metadata are printed correctly (in the main text page but also the `info` tab)
- Cross references to figures and tables and references are OK
- Tables are printed correctly
- Acknowledgements are printed correctly
- References (in the `References` tab) are printed correctly.
- References and math formulas in table captions are often not converted properly
- Often in metadata, credits, or acknowledgements, symbols are not properly converted or are still escaped when they shouldn't be: look for \&, \%, and multiple occurrences of dots and/or commas (`..`, `,.,`).
- Correctly parse math expressions and references within table captions
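The search for leftover escaped symbols and repeated punctuation (\&, \%, `..`, `,.,`) mentioned above is easy to automate. The sketch below is an assumption-laden helper, not part of tex2jats: `find_suspect_symbols` is a hypothetical name, and the pattern flags only the four cases named in this README. A legitimate ellipsis will also be reported, so treat the output as a list of places to look at rather than a list of errors.

```python
import re

# Escaped ampersand or percent sign, two or more dots in a row, or ",.,".
SUSPECT_RE = re.compile(r"\\[&%]|\.{2,}|,\.,")


def find_suspect_symbols(text):
    """Return (line_number, matched_text) pairs for each suspicious sequence."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in SUSPECT_RE.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

Running it over the metadata and credits files as well as the galley covers the places where these leftovers most often appear.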