
ObiectumTokenizer

A small library for tokenizing strings, written in pure C.

Features

  • Line and column numbers for every token.
  • Custom special characters - each one is always emitted as a separate token (unless it appears inside a string).
  • Custom comment marks - both single-line and multi-line.
  • Strings - a quoted string is always a single, monolithic token (see the sketch after the example below).
  • UTF-8 is the only supported encoding.

Example

#include <stdio.h>
// Plus the ObiectumTokenizer headers providing the declarations used below.

obtokenizer_tokenizer_t tokenizer;
if (obtokenizer_init(&tokenizer, "abc /* comment №1 */ def,123 // №2") ||
    obtokenizer_add_spec_char(&tokenizer, ',')                   || // Count commas as separate tokens.
    obscanner_add_comment_mark(&tokenizer.scanner, false, "//")  || // Enable C-style single-line comments.
    obscanner_add_comment_mark(&tokenizer.scanner, true,  "/*")  || // Enable C-style multi-
    obscanner_add_comment_mark(&tokenizer.scanner, true,  "*/")     // line comments.
    ) {
    // Handle the error.
}

obtokenizer_token_t token;
while (!obtokenizer_get(&tokenizer, &token)) {
    if (token.str[0] == '\0') {
        // No more tokens.
        obtokenizer_free_token(&token);
        break;
    }

    printf("%d:%d: %s\n", token.pos.line, token.pos.col, token.str);

    obtokenizer_free_token(&token); // Must be called before each reuse of a token structure.
}

Output:

$ ./test
1:1: abc
1:22: def
1:25: ,
1:26: 123
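
The example above exercises special characters and comments but not strings. Below is a minimal sketch of string tokenization, assuming the same API and includes as above; the input text and the expected grouping are illustrative, and whether token.str keeps the surrounding quotes is not specified here.

obtokenizer_tokenizer_t tokenizer;
if (obtokenizer_init(&tokenizer, "name \"John Doe\" 42")) {
    // Handle the error.
}

obtokenizer_token_t token;
while (!obtokenizer_get(&tokenizer, &token)) {
    if (token.str[0] == '\0') {
        // No more tokens.
        obtokenizer_free_token(&token);
        break;
    }

    // Per the features list, "John Doe" should arrive as one token rather than two.
    printf("%d:%d: %s\n", token.pos.line, token.pos.col, token.str);

    obtokenizer_free_token(&token); // Must be called before each reuse of the token structure.
}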
