Parsing nested values in objects #95

Open
kkozlik opened this issue Apr 28, 2023 · 5 comments

@kkozlik

kkozlik commented Apr 28, 2023

Hello,
I am wondering whether it is possible to parse nested values in objects using JSON Machine. The manual describes parsing nested values in arrays using `-` in the pointer, but unfortunately this does not work for objects.

I have JSON in this format:

```json
{
    "results": {
        "fruits": {
            "apple": {
                "color": "red"
            },
            "pear": {
                "color": "yellow"
            }
        },
        "vegetable": {
            "carrot": {
                "color": "red"
            }
        }
    }
}
```

I do not know the categories (fruits, vegetable, ...) in advance. When I use a pointer like `/results`, the parser reads a whole category into memory, which could still be pretty big.

Is it possible to read only the category names and skip storing their content in memory? `PassThruDecoder` still loads the content into memory; it just does not decode it. Could the parser be instructed to parse only the keys and not read the content of the `{ }` into `$jsonBuffer`? Once I have the list of categories, I can parse the objects in a second round, using a pointer for each category.

Or maybe another solution could be to implement the hyphen pointer for objects as well. Then I could parse the file with a pointer like `/results/-` and read the category names using the `getCurrentJsonPointer()` function.
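The idea above is to iterate over the keys of `/results` and discard each value instead of buffering it. Here is a minimal Python sketch of that idea, written outside JSON Machine. It is not part of JSON Machine's API, and `iter_object_keys` is a hypothetical name. The scanner tracks nesting and string boundaries, decodes only the keys on the path to the target object, and skips everything else character by character. It assumes well-formed JSON and does no validation.

```python
import io
import json
from typing import Iterator, Sequence, TextIO


def _read_string(stream: TextIO, keep: bool) -> str:
    """Consume a JSON string whose opening quote was already read.

    Returns the raw (still escaped) body when keep is True; otherwise the
    characters are discarded as they are read and "" is returned.
    """
    buf = []
    while True:
        ch = stream.read(1)
        if not ch:
            raise ValueError("unterminated string")
        if ch == "\\":
            escaped = stream.read(1)  # \uXXXX hex digits are then read as plain chars
            if keep:
                buf.append(ch + escaped)
        elif ch == '"':
            return "".join(buf)
        elif keep:
            buf.append(ch)


def iter_object_keys(stream: TextIO, prefix: Sequence[str]) -> Iterator[str]:
    """Yield the keys of the object at `prefix` (e.g. ["results"]).

    Values are skipped instead of buffered, so memory grows with nesting
    depth rather than with the size of the values. Hypothetical helper.
    """
    prefix = list(prefix)
    target = len(prefix) + 1   # nesting level at which the wanted keys appear
    containers = []            # "{" or "[" for every currently open container
    keys = []                  # last key seen in each open object (None for arrays)
    expect_key = False         # True right after "{" or "," inside an object
    while ch := stream.read(1):
        if ch == '"':
            level = len(containers)
            if expect_key and level <= target:
                # Only keys on the way down to the target object are decoded.
                key = json.loads('"' + _read_string(stream, keep=True) + '"')
                keys[-1] = key
                if level == target and keys[:-1] == prefix:
                    yield key
            else:
                _read_string(stream, keep=False)  # a value, or a key deeper down
            expect_key = False
        elif ch in "{[":
            containers.append(ch)
            keys.append(None)
            expect_key = ch == "{"
        elif ch in "}]":
            containers.pop()
            keys.pop()
            expect_key = False
        elif ch == ",":
            expect_key = bool(containers) and containers[-1] == "{"
        # ":", whitespace and the characters of numbers/true/false/null need no handling
```

With a real file this would be used as `for category in iter_object_keys(open("data.json", encoding="utf-8"), ["results"])`. Reading one character at a time in pure Python is slow; the sketch only illustrates the memory behaviour. Note that it still reads every byte, as explained in the reply below.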

@halaxa
Owner

halaxa commented Apr 28, 2023

Hi, thanks for participating.

As you said, it is not possible to parse nested items in objects. JSON Pointer is too simple a language for that. We can't just start supporting "-" as a wildcard for object keys because what if a key in an object is "-"?
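To illustrate the ambiguity: under RFC 6901, a reference token is matched literally against object member names, so `/results/-` must select a member whose key is `-`. The minimal resolver below shows this. `resolve` is a hypothetical Python function, not JSON Machine code, and it skips the RFC's stricter validation of array indices.

```python
def resolve(document, pointer: str):
    """Evaluate an RFC 6901 JSON Pointer against an already-decoded document."""
    if pointer == "":
        return document          # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty pointer must start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # RFC 6901 decoding order: "~1" -> "/" first, then "~0" -> "~"
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            # "-" fails here: it names the nonexistent element after the last one
            node = node[int(token)]
        else:
            node = node[token]   # object members are matched literally, "-" included


    return node


doc = {"results": {"-": {"color": "green"}, "fruits": {"apple": {"color": "red"}}}}
print(resolve(doc, "/results/-"))  # the member literally named "-"
```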

The parser always has to read everything to get to the object keys. The main use case for the parser is to sequentially read all the items in a specified subtree. That's why it stores every item in memory. That's usually what a programmer wants.

> Once I have list of categories, I can parse the objects in second round, using pointers for each category.

This will only complicate things for you. If you're already there, decode the item and use it. The second round will do exactly the same work as the first. If you expect the second round to be somehow more efficient or faster, keep in mind that JSON pointers do not affect parsing time at all, only memory usage. The parser always has to read everything to get to the desired key; there is no direct access as in a hashmap.

`PassThruDecoder` is there for such situations. If a single item is too big, do the top-level parsing with it and then parse the produced string via `ExtJsonDecoder`, as shown in the README.

If you're really low on memory, try #36. The prototype should work and should be installable via:

```shell
composer require halaxa/json-machine:dev-recursive
```

It might nudge me to finish it :)

If none of this is of use to you, try, for example, salsify/jsonstreamingparser.

Does this answer your questions?

@kkozlik
Author

kkozlik commented Apr 28, 2023

> We can't just start supporting "-" as a wildcard for object keys because what if a key in an object is "-"?

Yep, true. But I am sure this could be solved with some kind of escaping, and maybe by using something other than a hyphen...

> Parser has to always read everything to get to the object keys.

I understand this perfectly. I just thought that the value does not need to be held in memory. If `/results/fruits` has a few thousand records of a few MB each, then even using `PassThruDecoder` means holding several GB in memory. However, if the parser did not keep that text in the `$jsonBuffer` variable and just iterated over the keys, throwing the data away, I would be able to iterate over the keys using very little memory, wouldn't I? Once I have the keys, I can construct pointers like `/results/fruits`, `/results/vegetable`, etc., iterate over the file once again, and hold only a single record in memory at a time. That was just my dumb idea...

#36 looks promising and is a much more elegant solution, of course. I will try it.

Thanks for pointing me to the salsify parser; I will have a look at it too.

@halaxa
Owner

halaxa commented May 7, 2023

> Yep, true. But I am sure this problem would be solvable with some kind of escaping. And maybe use something else than hyphen...

The only escaping-based option would be to add another escape sequence, for example `~2`, which could mean an asterisk (perhaps the equivalent of the regex `.*`). That would mean no longer being compatible with the JSON Pointer spec. I'm not sure I'm inclined to do this.

https://datatracker.ietf.org/doc/html/rfc6901#section-3
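Here is a minimal Python sketch of how the proposed `~2` escape could behave. Everything in it is hypothetical: `WILDCARD`, `compile_pointer` and `matches` are illustrative names, not JSON Machine API, and `~2` is not defined by RFC 6901. Each reference token becomes a regex in which `~2` acts like `.*`. Because the wildcard is an escape sequence, a plain `-` keeps its literal RFC 6901 meaning for object keys, which avoids the ambiguity raised earlier in the thread.

```python
from __future__ import annotations

import re

WILDCARD = "~2"  # HYPOTHETICAL escape sequence, not defined by RFC 6901


def _unescape(part: str) -> str:
    """Standard RFC 6901 decoding: "~1" -> "/" first, then "~0" -> "~"."""
    return part.replace("~1", "/").replace("~0", "~")


def compile_pointer(pointer: str) -> list[re.Pattern]:
    """Turn each reference token into a regex; WILDCARD behaves like ".*"."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError("a non-empty pointer must start with '/'")
    patterns = []
    for token in pointer[1:].split("/"):
        # Split on the wildcard before unescaping, so that a literal "~2"
        # (written "~02") is not mistaken for the wildcard.
        parts = [re.escape(_unescape(p)) for p in token.split(WILDCARD)]
        patterns.append(re.compile(".*".join(parts), re.DOTALL))
    return patterns


def matches(patterns: list[re.Pattern], path: list[str]) -> bool:
    """True if a path of object keys is selected by the compiled pointer."""
    return len(patterns) == len(path) and all(
        p.fullmatch(key) is not None for p, key in zip(patterns, path)
    )


print(matches(compile_pointer("/results/~2"), ["results", "fruits"]))
```

The key design choice is splitting on `~2` before applying the standard `~1`/`~0` decoding. This keeps `~02` (a key literally named `~2`) distinct from the wildcard. Any pointer without `~2` behaves exactly as in RFC 6901.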

@XedinUnknown
Contributor

@halaxa, hi! Thanks for the awesome lib! I really hope it will become a fully comprehensive solution one day, because even now it seems to be the best one out there 🙏

To the point, IMHO: if the spec doesn't say that you must not introduce other escape sequences, and that everything besides the two defined sequences must be treated literally, I believe you would technically still be compatible with the spec; you'd simply be a superset of it.

@halaxa
Owner

halaxa commented Nov 24, 2024

Recursive iteration #36 is now finished, merged, and released.
