Oxidize everything #459
One thing to note here is that some of the C code being converted is, I believe, actually generated via web2c, and rather than jumping from WEB -> C -> Rust, it'd eventually be nice to go directly from WEB -> Rust. As such, perhaps this can help guide us in deciding which sources should receive more attention than others. I don't know enough about the underlying engine yet to say which, but a web2rust is something I would fancy working on eventually (though not in the near term). |
@ratmice Yup, I believe the WEB-converted code lives in xetex_xetex0.rs and xetex_ini.rs. pkgw has added nice prefixes to each filename, so it's easy to tell these files apart. Another thing that some people may care about is the mixed licensing of the code. I think the xetex, bibtex, and synctex parts of the code are MIT-compatible, but the dpx part (dvipdfmx, for PDF generation) and teckit (written in C++) are GPL-ed. If they can be replaced by MIT-compatible code, it will enable Tectonic to be embedded and used in more scenarios, including many commercial solutions. EDIT: BibTeX is written in WEB too. |
When it comes to Tectonic, yes, the bulk of the original code that we compile is C code that was generated through web2c. I have oriented my work around a very explicit decision that we are only going to base on the C code going forward. I will consult with the Web code to understand what's going on, and I'll look at XeTeX patches in Web-space, but that's it. I've done a lot of work to tidy up the C code since for me, it is the most important expression of the original engine code upon which Tectonic is based. The basic motivation for this approach maps to this: I have a suspicion that a |
@pkgw Thanks for clarifying your position on this, and the word of caution. It is very helpful to know when my initial inclination is contrary to explicit decisions that have been made. Reflecting upon that, if rustweb or web2rust ever become things, they will certainly need to take into account everything learned from the manual conversion of the web2c output. As such, my comment on where to focus attention does seem misguided. |
@ratmice Sorry, I didn't mean to come across as being that negative! The way that I envision Tectonic going right now, I don't think we'd use a And also, to be clear, the idea of Rustifying the C code is super interesting! Again, I totally didn't think it would even be possible given how many tricks the implementation pulls. My day job is keeping me ultra-busy these days but I hope to find a chance to see how |
@pkgw I didn't really take it negatively at all; hopefully some context might shine a light on where I'm coming from. a) My interest in TeX largely stems from it being a non-dynamically-allocating language implementation that is portable to modern systems. I'm investigating this in modern systems languages. b) Being new overall to the engines of both xetex and tectonic itself, I find going through the update process of staging a bit overwhelming when it comes to merging the modified upstream with the modified tectonic downstream code base. The fear is that with less reliance on the WEB portion, we remove the limitations of its original environment, the dividing line becomes much less clear, and the tricks in the implementation gain context. Perhaps the appropriate way to preserve (a) is to try compiling the rustified C code in its own no_std crate. If this is an aspect tectonic wishes to preserve, I will give that a go. |
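To illustrate the "non-dynamically allocating" property mentioned in (a), here is a minimal sketch of a fixed-capacity pool that hands out slots from one statically sized array, loosely in the spirit of TeX's memory array. The `Pool` type and its methods are hypothetical names invented for this example, not code from Tectonic or XeTeX; the block uses only items that are also available in `core`, so it should carry over to a `no_std` crate.

```rust
/// Hypothetical fixed-capacity pool: all storage is one array whose size is
/// chosen at compile time, so no heap allocation ever happens.
pub struct Pool<const N: usize> {
    words: [i32; N],
    used: usize,
}

impl<const N: usize> Pool<N> {
    /// Creates an empty pool with every word zeroed.
    pub const fn new() -> Self {
        Pool { words: [0; N], used: 0 }
    }

    /// Reserves `len` consecutive words and returns the index of the first
    /// one, or `None` when the pool does not have enough room left.
    pub fn alloc(&mut self, len: usize) -> Option<usize> {
        let end = self.used.checked_add(len)?;
        if end > N {
            return None;
        }
        let start = self.used;
        self.used = end;
        Some(start)
    }

    /// Stores `value` at `index` if that word has been reserved.
    pub fn set(&mut self, index: usize, value: i32) -> bool {
        if index < self.used {
            self.words[index] = value;
            true
        } else {
            false
        }
    }

    /// Reads the word at `index` if it has been reserved.
    pub fn get(&self, index: usize) -> Option<i32> {
        if index < self.used {
            Some(self.words[index])
        } else {
            None
        }
    }
}
```

Running out of space shows up as an ordinary `None` rather than an allocation failure, which is roughly how a fixed-size engine reports "capacity exceeded".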
Here is the error I get when trying to compile this on macOS:
EDIT: I installed icu with
and now the error is
Should I open an issue? |
It looks like |
@lovasoa I updated the branch and added the file. I believe there'll still be linking issues. For oxidization issues, I think you can open issues and/or PRs under https://github.com/crlf0710/tectonic/issues. |
See crlf0710#4 |
I've invited @pkgw as a collaborator in the fork repo. I think when the oxidized code base is in a good enough state (might take a while), we can upstream the oxidized version back here. |
This sounds interesting, how can people contribute best? |
I've opened a few issues under https://github.com/crlf0710/tectonic/issues. Leave a comment there if you want to try one of those~ |
Status update: Since then we've successfully converted all the remaining xetex C++/ObjC code into Rust by hand, and the oxidized version of tectonic now runs successfully on Windows/Linux/Mac. Unfortunately there are some nightly feature usages introduced by c2rust itself. We're still working on removing them to allow the crate to build on stable again. We're also working on using existing crates to replace the manual dependency specification. Any help is appreciated here! By the way, I'm also spending a little time investigating the original WEB tooling that TeX itself uses. Hope I can find something useful soon. |
@crlf0710 Thanks for the update! I'm afraid that I still haven't looked over to see what the translated code looks like, but I'm very impressed that it's all working! |
It would be great to add a notice about this work to the main repo README and/or the site, as well as an invitation to participate. |
@burrbull Yes, I am happy to do so, but have been overwhelmed with my day job lately and haven't had the bandwidth to take the initiative on matters like this. Pull requests are more than welcome, as they say. |
I have progress in oxidizing: |
Thanks for the update @burrbull . I have not been following the effort due to personal time constraints, but I would love to make sure that this repo and the oxidation effort stay well-aligned and that we have a vision for how to merge the work back into the mainline. BTW, I am working towards updating Tectonic for TeXLive 2020.0 and it looks like I will need to update the C/C++ code for the first time in several years. There's always more to do ... |
Sorry about that. The setup is a bit awkward and needs a bunch of storage and compute. It previously lived on my NAS but I'll resurrect it somewhere else. |
This concerns me as well. I guess the regression tests are nice to ensure correct refactorings, but it doesn't seem like there's a straight-forward path to merging things back into tectonic. |
I can help with the server for that for a while. Send me an email. |
Not sure what you mean. I am thinking about something like hosting a self-hosted GitHub CI worker and probably a script for syncing with that arXiv S3 bucket. |
Need help in porting #666 to |
Now that the Rust Foundation has been announced, maybe it makes sense to cooperate with them for donations, etc.? |
Out of interest, is there a plan for when/how the oxidised fork gets merged back into master? Separately from this fork, I've been having a go at reimplementing parts of the XeTeX engine in Rust using cbindgen, referencing documentation generated from the XeTeX WEB source rather than just the generated C code. |
@ralismark Sorry for the extremely late reply here. The honest answer is that there isn't a plan per se — I'd like to see it happen but my personal priority is to get the infrastructure for real HTML output in place (and I'm so busy with real life these days that even that project is going nowhere fast). My hope is that the split into crates will make it easier to incrementally migrate oxidized C code back into the codebase, but unfortunately it's true that the split introduced a bunch of changes that I'm sure are difficult to mirror into the fork. Just to throw it out there, though, the bibtex engine is a single C file, so it would probably be by far the easiest place to start an incremental oxidization effort. It would also be great to migrate the |
I really like the work from oxidize, but have started work on a slightly different approach: in #1032, I've started a bottom-up hand conversion. The idea is to start with the code that depends on the least other code, or with statics that are common but relatively easy to replace with an API, and then convert the things above it once all their requirements are converted. The result is amenable to human review commit-by-commit (since every individual refactor results in a functional midpoint), and it allows code to be converted into safe Rust as soon as possible, since the code converted next is always the code that depends on what was just converted. I'm not sure how far I'll get with this strategy, but I'm hoping that if I can convert bibtex, it will act as a proof-of-concept for the quality of the resulting code. |
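The bottom-up ordering described above amounts to a topological sort of the dependency graph, with leaves first. A minimal sketch, assuming the dependencies are already known as a map from each unit to the units it requires; the function name `conversion_order` and the unit names in the usage note are hypothetical and not taken from #1032:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns an order in which every unit appears only after all the units it
/// depends on. Returns `None` if there is a dependency cycle or a dependency
/// that is not itself a key of the map.
pub fn conversion_order<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Option<Vec<String>> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut order = Vec::new();

    while done.len() < deps.len() {
        // Every unconverted unit whose requirements are all converted is
        // ready to be converted in this round.
        let ready: Vec<&'a str> = deps
            .iter()
            .filter(|(name, reqs)| {
                !done.contains(*name) && reqs.iter().all(|r| done.contains(r))
            })
            .map(|(name, _)| *name)
            .collect();

        if ready.is_empty() {
            // Nothing can make progress: cycle or unknown dependency.
            return None;
        }
        for name in ready {
            done.insert(name);
            order.push(name.to_string());
        }
    }
    Some(order)
}
```

For example, with a hypothetical `parser` that needs `hash_table` and `string_pool`, and a `hash_table` that needs `string_pool`, the order comes out as `string_pool`, `hash_table`, `parser`. Each step in that list can land as its own reviewable commit.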
It's worth noting that my conversion makes no effort to preserve the properties of the original code - if anything it sprints in the opposite direction, attempting to use common Rust practices and the std as much as possible. |
I'm glad to hear someone else started similar work. |
@CraftSpider I saw it was merged. Awesome work! Good to see the effort continues. |
So, I have a branch that converts all the C code (C++ and ObjC code not included yet) to Rust using the c2rust tool, at https://github.com/crlf0710/tectonic/tree/oxidize. However, the generated code is not fully portable, and all preprocessor macros are lost.
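One way a lost `#ifdef` such as `HAVE_ZLIB` could be reintroduced by hand is with Rust's conditional compilation. The sketch below is only an illustration: the Cargo feature name `zlib`, the constant `STREAM_FILTER`, and the function `stream_dict_entry` are assumptions made for this example and do not come from the branch.

```rust
// With the (hypothetical) `zlib` feature enabled, streams would be written
// with the PDF Flate filter; without it, they are written uncompressed.
#[cfg(feature = "zlib")]
const STREAM_FILTER: Option<&str> = Some("FlateDecode");

#[cfg(not(feature = "zlib"))]
const STREAM_FILTER: Option<&str> = None;

/// Builds the filter entry for a stream dictionary, or an empty string when
/// no compression support was compiled in.
pub fn stream_dict_entry() -> String {
    match STREAM_FILTER {
        Some(name) => format!("/Filter /{}", name),
        None => String::new(),
    }
}
```

Unlike the C preprocessor, both variants here are ordinary items, so the rest of the code can call `stream_dict_entry` without knowing which configuration was built.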
Current progress is here:
- The HAVE_ZLIB section of code (mostly in dpx-pdfobj.c) is not there yet.
- vsprintf, __ctype_b_loc, etc. are not immediately usable from Rust on Windows.
I'll continue work on this; it might be slow. If anyone finds the branch useful or wants to give a hand, just go ahead.
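For the libc-specific symbols mentioned above, one portable option is to replace the translated calls with standard-library equivalents. The following is a hedged sketch rather than what the branch actually does: the helper names `c_isdigit`, `c_isspace`, and `format_obj_ref` are made up for this example, and the character helpers assume the "C" locale behaviour of the original `<ctype.h>` functions.

```rust
/// Stand-in for C `isdigit` in the "C" locale, with no `__ctype_b_loc` table.
pub fn c_isdigit(c: u8) -> bool {
    c.is_ascii_digit()
}

/// Stand-in for C `isspace` in the "C" locale. Rust's `is_ascii_whitespace`
/// does not treat vertical tab (0x0B) as whitespace, but C does, so it is
/// added back explicitly.
pub fn c_isspace(c: u8) -> bool {
    c.is_ascii_whitespace() || c == 0x0B
}

/// Example of replacing a `sprintf`-style call with `format!`: writes a PDF
/// indirect reference such as "12 0 R".
pub fn format_obj_ref(id: u32, generation: u16) -> String {
    format!("{} {} R", id, generation)
}
```

Because these use only the Rust standard library, they behave the same on Windows, Linux, and macOS.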