CLI support? #73
Comments
On Linux this can already be done using jq: https://stedolan.github.io/jq/ |
Good point, I didn't know that jq supports stream parsing. The speed will be incomparable, that's clear, but jq's usage with stream parsing seems somewhat unintuitive. While looking at the usage of jq, another option came to my mind for
$ wget <big list of users to stdout> | jm --pointer=/results
{"key": 0, "value": {"name": "Frank Sinatra", ...}}
{"key": 1, "value": {"name": "Ray Charles", ...}}
...
It is extensible for other fields in the future such as |
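You're right, using
#!/usr/bin/env php
<?php
use JsonMachine\Items;

// Load the Composer autoloader, or fail with a hint on how to create it.
if ( ! is_file(dirname(__DIR__).'/vendor/autoload.php')) {
    throw new LogicException('Composer autoloader missing. Try running "composer install".');
}
require_once dirname(__DIR__).'/vendor/autoload.php';

// Print a usage line and exit with a non-zero status.
function usage()
{
    echo sprintf('usage: %s --pointer=""', __FILE__)."\n";
    exit(1);
}

// getopt() expects a string of short options; passing null is deprecated since PHP 8.1.
$options = getopt('', ['pointer:']);
if (!isset($options['pointer'])) {
    usage();
}

// Stream items from stdin and print each one as a single line of JSON.
$iterator = Items::fromFile('php://stdin', $options);
foreach ($iterator as $row) {
    echo json_encode($row)."\n";
} |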
Yes, something along those lines. Using
$ wget <big list of users to stdout> | jm --item-template="{{name}};{{born}}"
Frank Sinatra;1915
Ray Charles;1930
Combined with JSON Pointer it could be quite versatile. |
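A minimal PHP sketch of how such a template could be filled in for each item. Both the --item-template option and the renderItemTemplate helper are hypothetical (nothing in this thread says JSON Machine provides them), and the two items are hard-coded here instead of being streamed from stdin.

```php
<?php
/**
 * Fill "{{field}}" placeholders in $template from one decoded item.
 * Hypothetical helper for illustration; not part of JSON Machine.
 */
function renderItemTemplate(string $template, array $item): string
{
    return preg_replace_callback(
        '/\{\{(\w+)\}\}/',
        static function (array $match) use ($item): string {
            // Missing fields render as an empty string.
            $value = $item[$match[1]] ?? '';
            // Scalars are printed as-is; nested values are re-encoded as JSON.
            return is_scalar($value) ? (string) $value : json_encode($value);
        },
        $template
    );
}

// Stand-in for items that a streaming parser would yield one by one.
$items = [
    ['name' => 'Frank Sinatra', 'born' => 1915],
    ['name' => 'Ray Charles', 'born' => 1930],
];

foreach ($items as $item) {
    echo renderItemTemplate('{{name}};{{born}}', $item), "\n";
}
```

In a real CLI the foreach would run over the streamed items, so memory use would stay flat regardless of input size.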
For the uninitiated, jq's streaming parser is usually quite difficult to use, but worse, for the following two essential tasks (described here using standard jq syntax), it is typically very slow (many hours or days) for very big files:
To my knowledge, there is currently no CLI-tool for running these two jq queries conveniently, speedily, and losslessly against very large JSON arrays or objects, respectively. (By "lossless" I mean avoiding the loss of precision in handling JSON numbers.) Being able to use JSON Pointer to fine-tune the point of the "explosion" would be fantastic! Thank you! |
@fwolfsjaeger - Unfortunately your script does not preserve the JSON structure of the items at the specified point(s). Or at least, I tried it with 'pointer' => '/-' and with input: -- Incidentally, after running
(In fact, both the files ./vendor/autoload.php are present.) |
That's correct behavior. If you want to iterate over the top level, use the empty-string JSON Pointer (the default). Read more about it in the README to see exactly how a hyphen in a JSON Pointer works. By using |
@halaxa - Thank you for your explanation. Please understand that the difficulty I had was precisely because I read the README quite closely, the point being that in JSON, numbers are scalars, not iterables. That is, I would have expected that an attempt to iterate over a number would either result in an error, or nothing at all. (In jq, gojq, and jaq, it results in an error e.g. Part of my confusion arose from statements such as the following in an "Overview of JSON Pointer": (*)
When I tried using "/" as the JSON Pointer, I just got an error, so "/-" seemed like the next best bet. The fact that @fwolfsjaeger's script requires a pointer didn't help my understanding. Now that I understand how to iterate over an array, I would like to know how to avoid loss of numeric precision, e.g. 400000000000000000000000000000000000000000000000000000000123 => 4.0e+59 Thank you again. |
No problem :) This sentence
from here https://www.baeldung.com/json-pointer is incorrect. See https://www.rfc-editor.org/rfc/rfc6901#section-5. The official RFC is also linked from the JSON Machine README https://github.com/halaxa/json-machine#what-is-json-pointer-anyway.
|
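For reference, RFC 6901 treats the empty string as a pointer to the whole document, while "/" addresses a member whose name is the empty string. The sketch below illustrates that difference with plain PHP arrays. The resolvePointer function is a hypothetical helper, not JSON Machine's implementation, and it only covers standard reference tokens (JSON Machine's hyphen handling is library-specific and not modelled here).

```php
<?php
/**
 * Resolve an RFC 6901 JSON Pointer against data from json_decode($json, true).
 * Hypothetical helper for illustration; not part of JSON Machine.
 */
function resolvePointer(mixed $document, string $pointer): mixed
{
    if ($pointer === '') {
        return $document; // the empty pointer refers to the whole document
    }
    if ($pointer[0] !== '/') {
        throw new InvalidArgumentException('A non-empty JSON Pointer must start with "/".');
    }

    $current = $document;
    foreach (explode('/', substr($pointer, 1)) as $token) {
        // Unescape "~1" to "/" first, then "~0" to "~" (RFC 6901, section 4).
        $token = str_replace(['~1', '~0'], ['/', '~'], $token);
        if (!is_array($current) || !array_key_exists($token, $current)) {
            throw new OutOfBoundsException("Nothing found at reference token '$token'.");
        }
        $current = $current[$token];
    }

    return $current;
}

$doc = json_decode(
    '{"": "member with empty name", "results": [{"name": "Frank Sinatra"}, {"name": "Ray Charles"}]}',
    true
);

var_dump(resolvePointer($doc, '') === $doc);    // whole document
var_dump(resolvePointer($doc, '/'));            // the member named ""
var_dump(resolvePointer($doc, '/results/1/name'));
```

With this reading, "/" on a document that has no member named "" fails, which is consistent with the error reported earlier in the thread.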
I'll elaborate on the other two points of yours later. Hopefully tomorrow. |
@halaxa @fwolfsjaeger - My PHP was never good to begin with and is by now very rusty, but the following script has already proven useful to me and might provide a basis for further improvements. Suggestions would of course be welcome. [EDIT: The script has been moved to Issue#88 ] |
Can you move this last post to a new discussion, please? |
@halaxa - The last post is essentially a CLI script, so I thought this would be the best thread? By the way, many of my colleagues who might benefit from a script such as jm would probably be discouraged by the installation hurdles that currently exist, so I was wondering whether you could envision at some point making JSON Machine available via |
It sure is a CLI script. But I understand you want some suggestions. Discussions would be a better place for this. If you want to actually contribute code to this repository, please use a pull request. This thread is mainly for ideas and suggestions about how the CLI interface should work. As for other installation channels, I'll let someone else do it for now. It needs its own maintenance time, which I don't have. It's OSS, anyone can generate any package from any revision. But thank you for your suggestion. Please keep them coming ;) |
I understand the confusion. The idea is that you can specify either an iterable or a scalar and JSON Machine will always give it to you. Of course, you can run into confusion when using a wildcard JSON Pointer. I have an idea: what about having an option for enabling strict mode? Either you specify
Pass a custom |
@halaxa - I've created Issue#88 in accordance with your request. Should I now delete the script from the message in this thread? As mentioned in Issue#87, I'm not sure how JSON_BIGINT_AS_STRING helps, as it converts all "bigints" to strings, and doesn't even attempt to handle big or small decimals. As for |
I agree with deleting it.
Your example has an integer, so that's why I suggested it. |
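The JSON_BIGINT_AS_STRING behaviour discussed above can be checked with PHP's built-in json_decode alone. This is a small sketch, not a lossless solution: the big integer is the one from this thread, while the long decimal is a made-up value added to show that the flag leaves non-integers as floats.

```php
<?php
// Big integer from the thread, plus a hypothetical high-precision decimal.
$json = '[400000000000000000000000000000000000000000000000000000000123, 0.1000000000000000000000000001]';

// Default decoding: integers beyond PHP_INT_MAX become floats and lose digits.
$default = json_decode($json);

// With JSON_BIGINT_AS_STRING, oversized integers are kept as strings instead.
$bigintAsString = json_decode($json, false, 512, JSON_BIGINT_AS_STRING);

var_dump(is_float($default[0]));        // the integer was converted to a float
var_dump($bigintAsString[0]);           // all digits preserved, but as a string
var_dump(is_float($bigintAsString[1])); // decimals are unaffected by the flag

// Re-encoding emits the big integer as a quoted JSON string, not a number.
echo json_encode($bigintAsString), "\n";
```

So the flag preserves the digits of big integers at the cost of changing their JSON type on output, and it does nothing for big or small decimals, which matches the objection raised above.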
It's not clear to me how a CLI script can be implemented to handle a JSON file that contains more than one top-level JSON entity. There are tools for converting such files to JSON Lines format (one JSON entity per line), but it's inconvenient to have to place each of these in a separate file for the sake of JSON Machine. In addition, some of these tools lose numerical precision. Any suggestions would be appreciated. Thanks. |
Please let me know by reactions/voting or comments if a CLI version of JSON Machine would be useful to have. Thanks.
The jm command would take a JSON stream from stdin and send items one by one to stdout, wrapped in a single-item JSON object encoded as {key: value}.
Possible usage:
Another idea might be to wrap the item in a JSON list instead of an object, like so: