Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed Parsing Method of Identifiers #138

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

hexofyore
Copy link
Contributor

No description provided.

@hexofyore
Copy link
Contributor Author

@ISibboI Sorry. I just checked the test i ran and forgot to run full test. I will look at it. Meanwhile, you can check the logic.

@hexofyore
Copy link
Contributor Author

@ISibboI Is there wrong with the new one?

@benwr
Copy link
Contributor

benwr commented Jun 8, 2023

(randomly happened to see this PR - wanted to chime in to say that I've actually benefitted a lot from the flexibility in identifier names, since it lets you define custom conventions that "look like" special syntax but are actually implemented by the Context)

@ISibboI ISibboI force-pushed the feature_literal_errors branch from 53d4507 to 8431650 Compare June 8, 2023 12:53
Copy link
Owner

@ISibboI ISibboI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is great.

Due to the comment of @benwr, I would propose to make the identifier parsing generic using a (zero-cost) strategy pattern. This implies quite a large amount of changes though. I would be happy if you would work on that, but otherwise I can also do it.

The Context trait should be extended with an associated type IdentifierValidatorType (or similar). This associated type should be bounded by a trait IdentifierValidator (or similar) with a single function validate_identifier(&str) -> EvalexprResult<()>.

Then, when parsing a literal, it should first attempt to parse boolean and all numeric variants (int, float, scientific notation), and then pass it to validate_identifier. This function can then return any error it wants to if it disagrees with the identifier.

There can be an "any" implementation, that reflects the current behaviour.
An implementation that reflects the Rust behaviour, like what you implemented now.
And also an implementation that allows e.g. only ASCII characters, underscores, numbers and colons, if you want to.

Then this requires a bunch of changes in context implementations that I cannot describe without trying...

So yeah, a bunch of work.

let contains_alphabet = literal.contains(|x: char| x.is_alphabetic());
if let Ok(boolean) = literal.parse::<bool>() {
Some(Token::Boolean(boolean))
} else if starts_with_alphabet_or_underscore
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decomposition of the conditions into separate variables is great!
Here we should first check only starts_with_alphabet_or_underscore, and if it does but any of the other three are false, then return a the error variant IllegalIdentifierSequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did boolean first because true and false would both be interpreted as identifiers.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that the branch after boolean should only check for starts_with_alphabet_or_underscore, and then should have an inner branch that checks the rest.

@@ -370,10 +384,16 @@ fn partial_tokens_to_tokens(mut tokens: &[PartialToken]) -> EvalexprResult<Vec<T
cutoff = 3;
Some(Token::Float(number))
} else {
Some(Token::Identifier(literal.to_string()))
return Err(EvalexprError::IllegalIdentifierSequence(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are parsing the literal as a number, so this should be a new error variant like IllegalNumericLiteral.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I will change this

}
},
_ => Some(Token::Identifier(literal.to_string())),
_ => {
return Err(EvalexprError::IllegalIdentifierSequence(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are parsing the literal as a number, so this should be a new error variant like IllegalNumericLiteral.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@hexofyore
Copy link
Contributor Author

Ok. I will probably do this after this weekend.

@ISibboI ISibboI force-pushed the main branch 6 times, most recently from e4a8571 to 6608b16 Compare October 11, 2024 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants