Raw ==> Set ==> Visual ==> Model
where `raw`, `set`, and `visual` are tools and `model` is a neural network for recognizing LaTeX.
The model is based on the paper *Image to LaTeX via Neural Networks* by Avinash More (San Jose State University).
ITL detects each character separately and merges them into one sequence. Let `s` be the number of supported characters. ITL uses `s` clones of the same architecture. The `j`-th neural network recognises the `j`-th character using one-hot encoding. The project currently supports the following characters: `+`, `-`, `^`, `{`, `}`, `^`, `\cdot`, `a`, `x`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9`, `0`.
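As a minimal sketch of the one-hot encoding mentioned above (not the project's actual code): the `VOCAB` list copies the supported characters from the text, keeping the repeated `^` once, and the helper names `one_hot` and `encode_label` are hypothetical. Each token of a label maps to a vector with a single 1 at the token's index.

```python
# Supported tokens as listed in the text (the repeated "^" is kept once).
VOCAB = ["+", "-", "^", "{", "}", "\\cdot", "a", "x",
         "1", "2", "3", "4", "5", "6", "7", "8", "9", "0"]
INDEX = {token: i for i, token in enumerate(VOCAB)}


def one_hot(token):
    """Return a list of len(VOCAB) zeros with a single 1 at the token's index."""
    vector = [0] * len(VOCAB)
    vector[INDEX[token]] = 1
    return vector


def encode_label(tokens):
    """Encode a label, given as a list of tokens, position by position."""
    return [one_hot(token) for token in tokens]


encoded = encode_label(["x", "^", "{", "2", "}"])
print(len(encoded), len(encoded[0]))
```

Under this reading, the `j`-th network would be trained against the `j`-th vector of the encoded label; how the project actually tokenizes labels is an assumption here.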
Input `shape=(64, 64)`.
Format of input features: `eq_n_b_c`, where:
- `n` is the number of the label,
- `b` is the number of the background,
- `c` is the number of the effect pack applied to the given feature.

For given `n` and `b`, all features `eq_n_b_{0...effects number}` represent the same label.
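The naming scheme can be illustrated with a short sketch. It assumes that `n`, `b`, and `c` are decimal integers and that the name carries no file extension; the function names `parse_feature_name` and `group_by_label` are hypothetical and not part of the project.

```python
from collections import defaultdict


def parse_feature_name(name):
    """Split 'eq_n_b_c' into integers (label, background, effect_pack)."""
    prefix, n, b, c = name.split("_")
    if prefix != "eq":
        raise ValueError(f"unexpected feature name: {name!r}")
    return int(n), int(b), int(c)


def group_by_label(names):
    """Group feature names that share the same (n, b) pair."""
    groups = defaultdict(list)
    for name in names:
        n, b, _ = parse_feature_name(name)
        groups[(n, b)].append(name)
    return dict(groups)


names = ["eq_0_0_0", "eq_0_0_1", "eq_0_1_0", "eq_1_0_0"]
print(group_by_label(names))
```

All names in one group differ only in the effect pack `c`, so they can share a single label entry.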
A link to the dataset used during training will be available soon on mvxxx.github.io as `exp.tar.gz`.
The whole dataset was generated with `tool/{raw, set, visual}`, chained as a pipeline.
For a training dataset of length `7`, the mean accuracy was ~`0.70-0.95` (depending on the difficulty of the dataset) after `10` epochs of training.
A functional tool written in OCaml. It provides random LaTeX expression generators with various syntactic levels, along with concepts describing the exact behavior within each level. Its purpose is a set of generators capable of supplying the model with properly generated random LaTeX expressions that match strict expectations, for training purposes.
Performance:
Type | Images/sec |
---|---|
Complex | ~1.600.000 |
Standard | ~2.000.000 |
Basic | ~4.000.000 |
A script tool that takes several input files with raw LaTeX and converts them into basic `.png` expressions. It runs a worker for each input file (a kind of thread pooling). Run it via the bash script: `bash set.sh *.in`.
It produces all content inside temporary folders, then moves all images to the output folder. These images are the input for the `visual` part. The labels from all `*.in` inputs are concatenated and stacked into the `labels` file.
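The label-collection step described above could look roughly like the sketch below. It is not the project's `set.sh`: it assumes each `.in` file holds one expression per line, leaves out the rendering to `.png` entirely, and uses hypothetical names (`read_labels`, `collect_labels`).

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_labels(path):
    """Worker: return the non-empty lines of one input file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]


def collect_labels(input_paths, labels_path):
    """Process every input file in a pool and write all labels to one file."""
    with ThreadPoolExecutor() as pool:
        # map() returns results in the same order as input_paths.
        per_file = list(pool.map(read_labels, input_paths))
    labels = [label for chunk in per_file for label in chunk]
    Path(labels_path).write_text("\n".join(labels) + "\n", encoding="utf-8")
    return labels
```

Keeping the results in input order matters here, because the position of a label in the `labels` file has to stay aligned with the images produced from the same input.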
It takes raw text such as `3-\frac{100}{88}` and returns the rendered `.png` image of that expression.
The biggest tool, written in `C++` with `OpenCV`, capable of creating millions of math equations that look handwritten. It applies many different effects to make the math look as if it had been written by people. It can be configured using `config.hpp`. The final result is the base of the dataset for machine learning.
It takes a raw `.png` as produced by the `set` part and returns a modified copy (for example, with only the rotate effect applied).
This part has predefined effects such as:
Type | Brief |
---|---|
left rotate | rotates images by an angle in (-C, 0) |
right rotate | rotates images by an angle in (0, C) |
symmetric scaling upward | scales both x and y upward |
symmetric scaling downward | scales both x and y downward |
non-symmetric scaling upward | scales x and y upward independently |
non-symmetric scaling downward | scales x and y downward independently |
There are also effects applied outside the effect manager:
Type | Brief |
---|---|
position | changes the position of the sprite on the background |
background | changes the background |
perlin | applies a Perlin noise mask (in progress) |
Each effect is either applied or not. For each image, we apply all possible combinations of effects. Let's say that we have effects `e1`, `e2`, `e3` and an image `p`. Then the output will be:
p ---(!e1,!e2,!e3)---> p0
p ---(!e1,!e2,e3)----> p1
p ---(!e1,e2,!e3)----> p2
p ---(e1,!e2,!e3)----> p3
p ---(e1,e2,!e3)-----> p4
p ---(!e1,e2,e3)-----> p5
p ---(e1,!e2,e3)-----> p6
p ---(e1,e2,e3)------> p7
`!e` means that we don't apply `e`. So with `k` effects, the output for each image is `2^k` modified images.
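The enumeration of all `2^k` combinations can be sketched as follows. This is an illustration only, not the C++ code of the `visual` tool: the function names are hypothetical, and the combinations are numbered in binary counting order, which differs from the order of `p0` to `p7` in the listing above.

```python
from itertools import product


def effect_combinations(effects):
    """Yield every subset of effects as a tuple of (name, applied) pairs."""
    for flags in product([False, True], repeat=len(effects)):
        yield tuple(zip(effects, flags))


def describe(combination):
    """Format a combination in the notation used above, e.g. (!e1,e2,!e3)."""
    parts = [name if applied else "!" + name for name, applied in combination]
    return "(" + ",".join(parts) + ")"


combos = list(effect_combinations(["e1", "e2", "e3"]))
for index, combination in enumerate(combos):
    print(f"p ---{describe(combination)}---> p{index}")
```

Because every effect doubles the number of outputs, the dataset size grows exponentially with the number of enabled effects.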
Example of use:
$ /usr/bin/time -f %e ./visual -la ../data/ .png
[INFO] Initializing module: PN3itl5StateE
[INFO] Program uses multithreading. Threads number: 12
[INFO] Initializing module: PN3itl9TransformE
[INFO] Initializing module: PN3itl12ImageManagerE
[INFO] Initializing module: PN3itl13EffectManagerE
[INFO] Initializing module: PN3itl11PerlinNoiseE
[INFO] Work finished successfully
3.82
That is `3.82` seconds for `96.000` produced `64x64` images, so the speed is about `25.000` images/s on an AMD Ryzen 5 2600. As you can see, several flags are available:
constexpr char const* testing = "-t";
constexpr char const* printing_steps = "-p";
constexpr char const* log_erros = "-le";
constexpr char const* log_info = "-li";
constexpr char const* log_suggestions = "-ls";
constexpr char const* log_warnings = "-lw";
constexpr char const* log_all = "-la";
constexpr char const* log_time = "-lt";
If you don't want any logging, run the program without any flags.