Oxidize everything #459

Open
crlf0710 opened this issue Sep 7, 2019 · 34 comments

@crlf0710
Contributor

crlf0710 commented Sep 7, 2019

So, I have a branch that converts all the C code (C++ and Objective-C code not included yet) to Rust using the c2rust tool, at https://github.com/crlf0710/tectonic/tree/oxidize . However, the generated code is not fully portable, and all preprocessor macros are lost.

Current progress is here:

  • Linux x64. Fully runnable, but the HAVE_ZLIB sections of the code (mostly in dpx-pdfobj.c) are not there yet.
  • Windows x64. The Rust part compiles fine, but there are linker errors because vsprintf, __ctype_b_loc, etc. are not immediately usable from Rust on Windows.
  • Mac and x86 platforms. Not tested yet; I imagine there will be problems with pointer size.
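
On the Windows linker errors: since the preprocessor macros are lost in translation, glibc's ctype macros presumably end up as direct calls to glibc internals such as __ctype_b_loc, which the Windows C runtime does not provide. Here is a minimal sketch of one possible replacement, assuming the code only needs "C"-locale ASCII classification. The function names are hypothetical and not taken from the branch.

```rust
/// Portable stand-in for C's `isdigit` in the "C" locale.
/// Takes an `i32` like the C function, so EOF (-1) is simply "not a digit".
pub fn is_digit(c: i32) -> bool {
    (0..=255).contains(&c) && (c as u8).is_ascii_digit()
}

/// Portable stand-in for C's `isalpha` in the "C" locale.
pub fn is_alpha(c: i32) -> bool {
    (0..=255).contains(&c) && (c as u8).is_ascii_alphabetic()
}

/// Portable stand-in for C's `isspace` in the "C" locale.
/// Written by hand because `u8::is_ascii_whitespace` does not treat
/// vertical tab (0x0B) as whitespace, while C's `isspace` does.
pub fn is_space(c: i32) -> bool {
    matches!(c, 0x20 | 0x09..=0x0D)
}
```

Helpers like these depend on nothing platform-specific, so they behave the same on Linux, Windows, and macOS.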

I'll continue working on this, though it might be slow. If anyone finds the branch useful or wants to give a hand, just go ahead.

@ratmice
Contributor

ratmice commented Sep 7, 2019

One thing to note here is that (I believe) some of the C code being converted is actually generated via web2c, and rather than jumping from WEB -> C -> Rust, it'd eventually be nice to just go directly from WEB -> Rust.

As such, perhaps this can help guide us in deciding which sources should receive more attention than others. I don't know enough about the underlying engine yet to say which, but a web2rust is something I would fancy working on eventually (though not in the near term).

@crlf0710
Contributor Author

crlf0710 commented Sep 7, 2019

@ratmice Yup, I believe the WEB-converted code lives in xetex_xetex0.rs and xetex_ini.rs. pkgw has added nice prefixes to each filename, so it's easy to tell these files apart.

Another thing that some people may care about is the mixed state of the code licenses. I think the xetex, bibtex, and synctex parts of the code are MIT-compatible, but the dpx part (dvipdfmx, for PDF generation) and teckit (written in C++) are GPL-licensed. If they can be replaced by code under an MIT-compatible license, it will enable Tectonic to be embedded and used in more scenarios, including many commercial solutions.

EDIT: BibTeX is written in WEB too.

@pkgw
Collaborator

pkgw commented Sep 9, 2019

When it comes to Tectonic, yes, the bulk of the original code that we compile is C code that was generated through web2c. I have oriented my work around a very explicit decision that we are only going to base on the C code going forward. I will consult with the Web code to understand what's going on, and I'll look at XeTeX patches in Web-space, but that's it. I've done a lot of work to tidy up the C code since for me, it is the most important expression of the original engine code upon which Tectonic is based.

The basic motivation for this approach maps to this: I have a suspicion that a web2rust would be super hard to pull off. AFAICT, much of the TeX code (in its native Web) depends on aliasing tricks that would be unsafe Rust at best, and I wouldn't be surprised if some of the Web source wasn't even expressible as unsafe. One of the ways that I try to evolve the C code of Tectonic is to reduce the dirty tricks that it pulls — hopefully leading to a more natural Rust-ification of the C code someday. I'm very surprised that c2rust can handle what we've currently got!
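
To illustrate the kind of aliasing meant here: TeX keeps its data in "memory words" that the same code reads as an integer in one place and as a glue ratio in another. The sketch below is a simplified, hypothetical layout (not Tectonic's actual `memory_word`). It shows why a literal translation needs `unsafe`, and one possible safe alternative that stores raw bits and converts explicitly.

```rust
/// Simplified stand-in for a TeX-style memory word. The field layout is
/// illustrative only and does not match the real engine.
#[derive(Clone, Copy)]
#[repr(C)]
pub union MemoryWord {
    pub int: i64,
    pub gr: f64,
}

/// Reads a word that was stored as a glue ratio back as an integer.
/// Reading a union field is only possible in an `unsafe` block.
pub fn glue_bits_unsafe(w: MemoryWord) -> i64 {
    // SAFETY: both fields are 8 bytes wide and every bit pattern is a valid i64.
    unsafe { w.int }
}

/// One possible safe alternative: keep the raw bits and convert on access.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Word(pub u64);

impl Word {
    pub fn from_glue(gr: f64) -> Self {
        Word(gr.to_bits())
    }

    pub fn as_glue(self) -> f64 {
        f64::from_bits(self.0)
    }

    pub fn as_int(self) -> i64 {
        self.0 as i64
    }

    /// Upper and lower 32-bit halves, computed with shifts so the result
    /// does not depend on the host's byte order.
    pub fn halves(self) -> (u32, u32) {
        ((self.0 >> 32) as u32, self.0 as u32)
    }
}
```

The safe version trades the implicit reinterpretation for explicit accessor calls, which is roughly the kind of "dirty trick" reduction described above.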

@ratmice
Contributor

ratmice commented Sep 9, 2019

@pkgw Thanks for clarifying your position on this, and the word of caution. It is very helpful to know when my initial inclination is contrary to explicit decisions that have been made. Reflecting upon that, if rustweb or web2rust ever become things, they will certainly need to take into account everything learned from the manual conversion of the web2c output. As such, my comment on where to focus attention does seem misguided.

@pkgw
Collaborator

pkgw commented Sep 9, 2019

@ratmice Sorry, I didn't mean to come across as being that negative! The way that I envision Tectonic going right now, I don't think we'd use a web2rust tool for the core engine ... but, for instance, if there are other support tools that we'd like to integrate that are Web-based, I'd much rather go straight to Rust than convert to C! And of course, Tectonic is not the whole universe here.

And also, to be clear, the idea of Rustifying the C code is super interesting! Again, I totally didn't think it would even be possible given how many tricks the implementation pulls. My day job is keeping me ultra-busy these days but I hope to find a chance to see how c2rust makes everything work!

@ratmice
Contributor

ratmice commented Sep 9, 2019

@pkgw I didn't really take it negatively at all; hopefully some context will shed light on where I'm coming from.

a) My interest in TeX largely stems from it being a language implementation that does no dynamic allocation yet is portable to modern systems, and from investigating that style in modern systems languages.
I imagine much of the platform-specific, non-generated C code was written without this style in mind, whereas in the generated sources this aspect should still be relatively intact, to the extent that one might imagine being able to compile the rustified generated code in a no_std environment.

b) Being new to the engines of both XeTeX and Tectonic itself, I find going through the update process of staging a bit overwhelming when it comes to merging the modified upstream with the modified Tectonic downstream code base.

The fear is that with less reliance on the WEB portion, we remove the limitations of its original environment, the dividing line becomes much less clear, and the tricks in the implementation gain context. Perhaps the appropriate way to preserve (a) is to try compiling the rustified C code in its own no_std crate. If this is an aspect Tectonic wishes to preserve, I will give that a go.
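
As a rough illustration of the non-allocating style described in (a), here is a minimal sketch of a fixed-capacity node pool. All storage is reserved up front and nodes are handed out as indices, so nothing in it needs a heap. The type and method names are hypothetical and not taken from Tectonic or XeTeX.

```rust
/// Fixed-capacity pool of words. The capacity is a compile-time constant,
/// so the pool can live in a static or on the stack without any allocator.
pub struct Pool<const N: usize> {
    words: [i32; N],
    next_free: usize,
}

impl<const N: usize> Pool<N> {
    pub const fn new() -> Self {
        Pool { words: [0; N], next_free: 0 }
    }

    /// Reserves `len` consecutive words and returns the index of the first,
    /// or `None` if the pool is exhausted.
    pub fn get_node(&mut self, len: usize) -> Option<usize> {
        let end = self.next_free.checked_add(len)?;
        if end > N {
            return None;
        }
        let start = self.next_free;
        self.next_free = end;
        Some(start)
    }

    /// Writes a word, returning `None` for an out-of-range index.
    pub fn set(&mut self, idx: usize, value: i32) -> Option<()> {
        *self.words.get_mut(idx)? = value;
        Some(())
    }

    /// Reads a word, returning `None` for an out-of-range index.
    pub fn get(&self, idx: usize) -> Option<i32> {
        self.words.get(idx).copied()
    }
}
```

Everything used here (arrays, `Option`, `checked_add`) lives in `core`, so code in this style would in principle also build in a no_std crate.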

@lovasoa

lovasoa commented Sep 9, 2019

Here is the error I get when trying to compile this on macos:

error: failed to run custom build command for `tectonic_engine v0.0.1-dev (/private/tmp/tectonic/engine)`

Caused by:
  process didn't exit successfully: `/private/tmp/tectonic/target/debug/build/tectonic_engine-afd9796eb3d3d62f/build-script-build` (exit code: 101)
--- stdout
cargo:rerun-if-env-changed=TECTONIC_DEP_BACKEND

--- stderr
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Failure { command: "\"pkg-config\" \"--libs\" \"--cflags\" \"harfbuzz >= 1.4 harfbuzz-icu icu-uc freetype2 graphite2 libpng zlib\"", output: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "Package icu-uc was not found in the pkg-config search path.\nPerhaps you should add the directory containing `icu-uc.pc\'\nto the PKG_CONFIG_PATH environment variable\nPackage \'icu-uc\', required by \'harfbuzz-icu\', not found\n" } }', src/libcore/result.rs:999:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

EDIT

I installed ICU with

brew install icu4c
brew link icu4c
export PKG_CONFIG_PATH="/usr/local/opt/icu4c/lib/pkgconfig"

and now the error is

CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2,sse3,ssse3")
running: "cc" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "." "-I" "/usr/local/Cellar/icu4c/64.2/include" "-I" "/usr/local/Cellar/harfbuzz/2.6.1/include/harfbuzz" "-I" "/usr/local/Cellar/glib/2.60.6/include/glib-2.0" "-I" "/usr/local/Cellar/glib/2.60.6/lib/glib-2.0/include" "-I" "/usr/local/opt/gettext/include" "-I" "/usr/local/Cellar/pcre/8.43/include" "-I" "/usr/local/opt/freetype/include/freetype2" "-I" "/usr/local/Cellar/graphite2/1.3.13/include" "-I" "/usr/local/Cellar/libpng/1.6.37/include/libpng16" "-Wall" "-Wextra" "-Wall" "-Wcast-qual" "-Wdate-time" "-Wendif-labels" "-Wextra" "-Wextra-semi" "-Wformat=2" "-Winit-self" "-Wmissing-declarations" "-Wmissing-include-dirs" "-Wmissing-prototypes" "-Wmissing-variable-declarations" "-Wnested-externs" "-Wold-style-definition" "-Wpointer-arith" "-Wredundant-decls" "-Wstrict-prototypes" "-Wswitch-bool" "-Wundef" "-Wwrite-strings" "-Wno-unused-parameter" "-Wno-implicit-fallthrough" "-Wno-sign-compare" "-std=gnu11" "-DHAVE_ZLIB=1" "-DHAVE_ZLIB_COMPRESS2=1" "-DZLIB_CONST=1" "-DXETEX_MAC=1" "-o" "/private/tmp/tectonic/target/debug/build/tectonic_engine-f115cc9379b11c54/out/tectonic/xetex-macos.o" "-c" "tectonic/xetex-macos.c"
cargo:warning=clang: error: no such file or directory: 'tectonic/xetex-macos.c'
cargo:warning=clang: error: no input files
exit code: 1

--- stderr


error occurred: Command "cc" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "." "-I" "/usr/local/Cellar/icu4c/64.2/include" "-I" "/usr/local/Cellar/harfbuzz/2.6.1/include/harfbuzz" "-I" "/usr/local/Cellar/glib/2.60.6/include/glib-2.0" "-I" "/usr/local/Cellar/glib/2.60.6/lib/glib-2.0/include" "-I" "/usr/local/opt/gettext/include" "-I" "/usr/local/Cellar/pcre/8.43/include" "-I" "/usr/local/opt/freetype/include/freetype2" "-I" "/usr/local/Cellar/graphite2/1.3.13/include" "-I" "/usr/local/Cellar/libpng/1.6.37/include/libpng16" "-Wall" "-Wextra" "-Wall" "-Wcast-qual" "-Wdate-time" "-Wendif-labels" "-Wextra" "-Wextra-semi" "-Wformat=2" "-Winit-self" "-Wmissing-declarations" "-Wmissing-include-dirs" "-Wmissing-prototypes" "-Wmissing-variable-declarations" "-Wnested-externs" "-Wold-style-definition" "-Wpointer-arith" "-Wredundant-decls" "-Wstrict-prototypes" "-Wswitch-bool" "-Wundef" "-Wwrite-strings" "-Wno-unused-parameter" "-Wno-implicit-fallthrough" "-Wno-sign-compare" "-std=gnu11" "-DHAVE_ZLIB=1" "-DHAVE_ZLIB_COMPRESS2=1" "-DZLIB_CONST=1" "-DXETEX_MAC=1" "-o" "/private/tmp/tectonic/target/debug/build/tectonic_engine-f115cc9379b11c54/out/tectonic/xetex-macos.o" "-c" "tectonic/xetex-macos.c" with args "cc" did not execute successfully (status code exit code: 1).

Should I open an issue?
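
The first failure above is a raw `unwrap()` panic in the build script, which buries the actual cause (a missing icu-uc package) inside a long debug dump. Here is a minimal sketch of how a build script could turn pkg-config's stderr into a shorter hint. It assumes the message wording shown in the log above, and the helper names are hypothetical.

```rust
/// Extracts package names from pkg-config stderr lines of the form
/// "Package NAME was not found in the pkg-config search path."
pub fn missing_packages(stderr: &str) -> Vec<String> {
    const PREFIX: &str = "Package ";
    const SUFFIX: &str = " was not found in the pkg-config search path.";
    stderr
        .lines()
        .filter_map(|line| line.trim().strip_prefix(PREFIX)?.strip_suffix(SUFFIX))
        .map(str::to_owned)
        .collect()
}

/// Builds a one-line hint, or `None` if no missing package was recognised.
pub fn hint(stderr: &str) -> Option<String> {
    let missing = missing_packages(stderr);
    if missing.is_empty() {
        return None;
    }
    Some(format!(
        "missing pkg-config packages: {}; install them or extend PKG_CONFIG_PATH",
        missing.join(", ")
    ))
}
```

A build script could print such a hint and exit with an error instead of unwrapping the failed result.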

@pkgw
Collaborator

pkgw commented Sep 9, 2019

@crlf0710, is @lovasoa's problem potentially an oversight in the build scripts for your oxidize branch?

@lovasoa

lovasoa commented Sep 9, 2019

It looks like xetex-macos.c is included conditionally, so it was not translated to rust by c2rust.
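
For reference, the Rust counterpart of conditionally including a platform file is the `cfg` attribute. This is a minimal sketch, with a hypothetical function name and placeholder return values. It only shows the mechanism, not how the branch actually organises its macOS code.

```rust
/// Compiled only when targeting macOS, the role that the conditionally
/// included xetex-macos.c plays in the C build.
#[cfg(target_os = "macos")]
pub fn platform_backend() -> &'static str {
    "macos"
}

/// Fallback compiled on every other target.
#[cfg(not(target_os = "macos"))]
pub fn platform_backend() -> &'static str {
    "generic"
}
```

Because exactly one of the two definitions exists for any given target, callers can use `platform_backend()` without any platform checks of their own.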

@crlf0710
Contributor Author

crlf0710 commented Sep 9, 2019

@lovasoa I updated the branch and added the file. I believe there'll still be linking issues. For oxidization issues, I think you can open issues and/or PRs under https://github.com/crlf0710/tectonic/issues.

@lovasoa

lovasoa commented Sep 9, 2019

See crlf0710#4

@crlf0710
Contributor Author

crlf0710 commented Sep 9, 2019

I've invited @pkgw as a collaborator in the fork repo. I think when the oxidized code base is in a good enough state (might take a while), we can upstream the oxidized version back here.

@ThomasdenH

This sounds interesting, how can people contribute best?

@crlf0710
Contributor Author

I've opened a few issues under https://github.com/crlf0710/tectonic/issues. Leave a comment there if you want to try one of those~

@crlf0710
Contributor Author

Status update: Since then we've manually converted all the remaining XeTeX C++/Objective-C code into Rust, and the oxidized version of Tectonic now runs successfully on Windows, Linux, and macOS. Unfortunately, c2rust itself introduced some uses of nightly-only features. We're still working on removing them so that the crate builds on stable again.

We're also working on using existing crates to replace the manual dependency specification. Any help is appreciated here!

By the way, I'm also spending a little time investigating the original WEB tooling that TeX itself uses. I hope I can find something useful soon.

@pkgw
Collaborator

pkgw commented Sep 29, 2019

@crlf0710 Thanks for the update! I'm afraid that I still haven't looked over to see what the translated code looks like, but I'm very impressed that it's all working!

@burrbull

It would be great to add a notice about this work to the master repo README and/or the site, as well as an invitation to participate.

@pkgw
Collaborator

pkgw commented Nov 11, 2019

@burrbull Yes, I am happy to do so, but have been overwhelmed with my day job lately and haven't had the bandwidth to take the initiative on matters like this. Pull requests are more than welcome, as they say.

@burrbull

I've made progress in oxidizing:
crlf0710#273
But I need help with regression testing of the changes.
@Mrmaxmeier are you still interested in the project? Your site https://tt.ente.ninja/ is shut down.

@pkgw
Collaborator

pkgw commented Oct 20, 2020

Thanks for the update @burrbull . I have not been following the effort due to personal time constraints, but I would love to make sure that this repo and the oxidation effort stay well-aligned and that we have a vision for how to merge the work back into the mainline.

BTW, I am working towards updating Tectonic for TeXLive 2020.0 and it looks like I will need to update the C/C++ code for the first time in several years. There's always more to do ...

@Mrmaxmeier
Contributor

I've made progress in oxidizing:
crlf0710#273
But I need help with regression testing of the changes.
@Mrmaxmeier are you still interested in the project? Your site https://tt.ente.ninja/ is shut down.

Sorry about that. The setup is a bit awkward and needs a bunch of storage and compute. It previously lived on my NAS, but I'll resurrect it somewhere else.
I might reduce the number of samples, though, as each run currently takes ~10h of CPU time.

@Mrmaxmeier
Contributor

[..] but I would love to make sure that this repo and the oxidation effort stay well-aligned and that we have a vision for how to merge the work back into the mainline.

This concerns me as well. I guess the regression tests are nice to ensure correct refactorings, but it doesn't seem like there's a straight-forward path to merging things back into tectonic.

@XVilka

XVilka commented Oct 22, 2020

I can help with the server for that for a while. Send me an email.

@burrbull

I can help with the server for that for a while. Send me an email.

Mrmaxmeier/tectonic-on-arXiv#5

@XVilka

XVilka commented Oct 22, 2020

Not sure what you mean. I'm thinking of something like hosting a self-hosted GitHub CI worker, and probably a script that syncs with that arXiv S3 bucket.

@burrbull

burrbull commented Nov 1, 2020

Need help in porting #666 to oxidize.

@XVilka

XVilka commented Nov 27, 2020

Speaking of donations, it makes sense to move this branch under the https://github.com/tectonic-typesetting/ organization (or create a new one?) and add this option to https://tectonic-typesetting.github.io/en-US/contribute.html. cc @pkgw

crlf0710#257 (comment)

@XVilka

XVilka commented Feb 24, 2021

Now that the Rust Foundation has been announced, maybe it makes sense to cooperate with them on donations, etc.?
For example, registering GitHub Sponsors with them.

@ralismark
Contributor

ralismark commented Jul 23, 2021

Out of interest, is there a plan for when/how the oxidised fork gets merged back into master? Separately from this fork, I've been having a go at reimplementing parts of the XeTeX engine in Rust using cbindgen, referencing documentation generated from the XeTeX WEB source rather than just the generated C code.

@pkgw
Collaborator

pkgw commented Sep 10, 2021

@ralismark Sorry for the extremely late reply here. The honest answer is that there isn't a plan per se — I'd like to see it happen but my personal priority is to get the infrastructure for real HTML output in place (and I'm so busy with real life these days that even that project is going nowhere fast). My hope is that the split into crates will make it easier to incrementally migrate oxidized C code back into the codebase, but unfortunately it's true that the split introduced a bunch of changes that I'm sure are difficult to mirror into the fork.

Just to throw it out there, though, the bibtex engine is a single C file, so it would probably be by far the easiest place to start an incremental oxidization effort. It would also be great to migrate the xetex_layout crate, which is OS-dependent font code that would be great to get into a more modern, flexible Rust incarnation. And that code is actual human-written C++ and not hard-to-understand web2c output.

@CraftSpider
Contributor

CraftSpider commented May 12, 2023

I really like the work from oxidize, but have started work on a slightly different approach: in #1032, I've started a bottom-up hand conversion. The idea is to start with the code that depends on the least other code, or with statics that are common but relatively easy to replace with an API, and then to convert the things above it once all of their requirements are converted. The result is amenable to human review commit by commit (since every individual refactor results in a functional midpoint), and code can be converted into safe Rust as soon as possible, since each function's dependents are the next code to be converted.

I'm not sure how far I'll get with this strategy, but I'm hoping if I can convert bibtex it will act as a proof-of-concept for the quality of the resulting code.
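
The bottom-up ordering described above amounts to a topological sort of the dependency graph: convert whatever has no unconverted dependencies, then repeat. Here is a minimal sketch, assuming dependencies are known as a simple name-to-names map. The function name and the item names in any example input are hypothetical, not taken from #1032.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Orders items so that each one comes after everything it depends on.
/// Returns `None` if the dependencies contain a cycle.
pub fn conversion_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // Track the not-yet-converted dependencies of every item, including
    // items that only ever appear as someone else's dependency.
    let mut remaining: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for (item, ds) in deps {
        remaining.entry(*item).or_default().extend(ds.iter().copied());
        for d in ds {
            remaining.entry(*d).or_default();
        }
    }

    let mut order = Vec::new();
    while !remaining.is_empty() {
        // Items with no unconverted dependencies can be converted now.
        let ready: Vec<&'a str> = remaining
            .iter()
            .filter(|(_, ds)| ds.is_empty())
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            return None; // every remaining item waits on another: a cycle
        }
        for name in &ready {
            remaining.remove(name);
        }
        for ds in remaining.values_mut() {
            for name in &ready {
                ds.remove(name);
            }
        }
        order.extend(ready.iter().map(|s| s.to_string()));
    }
    Some(order)
}
```

Each round of the loop corresponds to a batch of conversions that leaves the code base in a working state, which is what makes the commit-by-commit review possible.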

@CraftSpider
Contributor

It's worth noting that my conversion makes no effort to preserve the properties of the original code - if anything it sprints in the opposite direction, attempting to use common Rust practices and the std as much as possible.

@burrbull

I'm glad to hear someone else started similar work.

@XVilka

XVilka commented Nov 21, 2023

@CraftSpider I saw it was merged. Awesome work! Good to see the effort continues.
