Document support for preprocessing code #108
It should be much easier to strip complex comments by overriding …
True, I didn't think about that. I found that TatSu does preprocessing itself by making the lines appear like comments, so it doesn't matter if the directives are left in. Nice shortcut on that one xD
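That trick — rewriting each directive line as a comment so the line count never changes — can be sketched like this. The directive pattern and comment marker below are made-up placeholders for illustration, not TatSu's actual syntax:

```python
import re

# Made-up directive syntax ("#word" at line start) and comment marker;
# the point is only that each rewritten line keeps its position.
DIRECTIVE = re.compile(r'^\s*#\w+')

def neutralize_directives(text, comment='//'):
    out = []
    for line in text.splitlines():
        if DIRECTIVE.match(line):
            # same line, same position, now ignorable as a comment
            line = f'{comment} {line}'
        out.append(line)
    return '\n'.join(out)
```

Because every line stays where it was, no line-number fix-up is needed afterwards.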
You can also override …
... fine, but what if I find something that requires actual preprocessing?
A preprocessor is at most a macro interpreter. For actual preprocessing, you chain PEG parsers...
And can I do that while also keeping the line numbers intact? |
It would take some work. The output of the first pass would not be plain text, but something that contains the line number information (which is something you mention in your original post). The current implementation of … Take a look at:

```python
lines[i:j] = preprocessed_lines  ## may be []
index[i:j] = LineIndexInfo.block_index('such changes, or filename', len(preprocessed_lines))
return lines, index
```

Preprocessing may be the least reviewed part of TatSu, as I wrote all of it in a hurry for COBOL and NaturalAG. There's probably room for improvement.
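The splice above can be fleshed out into a minimal, self-contained sketch of index-preserving preprocessing. `LineInfo` and `replace_block` are illustrative names, not TatSu's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineInfo:
    filename: str
    line: int  # line number in the original source

def replace_block(lines, index, i, j, new_lines, filename='<generated>'):
    """Splice new_lines over lines[i:j], updating the parallel index so
    every untouched line still maps back to its original position."""
    lines[i:j] = new_lines  # may be []
    # all replacement lines point at the start of the replaced block
    index[i:j] = [LineInfo(filename, i)] * len(new_lines)
    return lines, index
```

Diagnostics then report `index[n]` instead of the physical line number `n`, which is exactly the mapping the original post had to rebuild by hand.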
This is solved in my last comment. This is an example from the actual COBOL parser:

```python
def _preprocess_block(self, name, block, **kwargs):
    block = uncomment_exec_sql(block)
    (lines, index) = super()._preprocess_block(name, block, **kwargs)
    continuations.preprocess_lines(lines, index)
    return (lines, index)

def process_block(self, name, lines, index, **kwargs):
    lines = [self.normalize_cobol_line(i, c) for i, c in enumerate(lines)]
    n = 0
    while n < len(lines):
        if COPYRE.match(lines[n]):
            n = self.resolve_copy(n, lines, index, **kwargs)
        else:
            n += 1
    return lines, index
```
I'm changing the title to leave the issue open and make it a documentation request.
Just a PING to myself, because this is now a documentation request.
@Victorious3, at some point … As to writing the good documentation: couldn't your case be solved by a chain of …?
For your issue to be solved with a chain of … PEG originally doesn't use a tokenizer because it can drill down to comments and tokens, but the work with the Python PEG parser was much easier because there was a tokenizer. The introduction of …

Let's leave this open, and think more about it.
Handling comments inside my grammar has been such a performance drag that I decided to strip them in a preprocessing step (I have `/* nested /* comments */*/`). There's a method called `_preprocess` in `Buffer` which I ended up overriding for this purpose. Sadly, that completely messes with the line numbers. I found no provision in TatSu for this, so I ended up generating my own `LineCache` in a very similar way to TatSu and converting the "wrong" line numbers to my "real" line numbers before giving my diagnostics. This... worked, but it's obviously not ideal.

I have no idea how to generalize my solution, but I still think TatSu could support this in some way, so I leave it open for discussion.
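One way to strip the nested comments described above without disturbing positions at all is to blank them out rather than delete them: every non-newline character inside a comment becomes a space, so both line and column numbers survive with no index bookkeeping. A sketch (not TatSu code; the delimiters are parameters):

```python
def strip_nested_comments(text, open_tok='/*', close_tok='*/'):
    """Blank out (possibly nested) /* ... */ comments, replacing every
    non-newline character with a space so positions are preserved."""
    out = []
    depth = 0
    i = 0
    while i < len(text):
        if text.startswith(open_tok, i):
            depth += 1
            out.append(' ' * len(open_tok))
            i += len(open_tok)
        elif depth and text.startswith(close_tok, i):
            depth -= 1
            out.append(' ' * len(close_tok))
            i += len(close_tok)
        elif depth:
            # keep newlines so line numbers stay intact
            out.append(text[i] if text[i] == '\n' else ' ')
            i += 1
        else:
            out.append(text[i])
            i += 1
    return ''.join(out)
```

The output has exactly the same length and line structure as the input, so the grammar never sees a comment and diagnostics need no translation.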
Here are my thoughts on it: `#line` directives (see C). If those were supported by TatSu, it could make for a simple but powerful solution. (Such a feature would have to be customizable to avoid clashes.) Cons: I'd have to do math to figure out what directives to generate. Math is annoying.
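For reference, the consumer side of that idea is small: given preprocessed text containing C-style `#line N` directives, a diagnostic can translate a physical line back to the original one. A hypothetical sketch, assuming a bare `#line N` syntax:

```python
import re

# C-style line-control directive: "#line N" on its own line.
LINE_RE = re.compile(r'^#line\s+(\d+)\s*$')

def real_line(text, physical_line):
    """Map a 0-based physical line in preprocessed text back to the
    original line number implied by the most recent #line directive.
    Directive lines themselves are not mappable and are skipped."""
    current = 0  # original line number of the next non-directive line
    for n, line in enumerate(text.splitlines()):
        m = LINE_RE.match(line)
        if m:
            current = int(m.group(1))
            continue
        if n == physical_line:
            return current
        current += 1
    raise IndexError(physical_line)
```

The generator side would emit a directive after each block it inserts or removes, which is where the annoying math lives.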