Add support for running the same template over a stream of JSON and output individual files #1248
Comments
Also, I've read your blog post https://blog.hairyhenderson.ca/post/one_template_many_outputs/, and although "technically" it can be a solution to what I'm proposing, it's not quite the same. For example, in my use-case, I just had to generate ~600K files from an input JSON file of ~22 GiB. I assume that loading that JSON into memory would consume considerably more than that, so applying your suggestion is just not feasible. Indeed, I could chunk the input file, but given that it is actually compressed, it would be so much easier to just be able to run over that stream.
Hi @cipriancraciun - thanks for filing this issue! Just so I'm understanding this correctly - is this essentially asking for JSONL support? This also seems similar (but not directly related) to #534. I've thought about this in the past and I've also encountered a few separate standards - NDJSON (which seems identical to JSONL), and However, what I had been considering was essentially supporting JSONL/multi-doc YAML streams as arrays - in other words, a JSONL datasource would be parsed as a whole first and then accessible in the template for looping or indexing. Certainly not very useful for your particular use, especially if you're talking about multi-GiB inputs! Just thinking very briefly about this I could imagine some sort of $ gomplate --stream -d stream=./stream.jsonl -f template.tmpl --output-map 'out/{{ .name }}' All this said, this sort of change in gomplate's behaviour would likely be quite complex - there are a bunch of assumptions made about how it processes datasources that would need to be totally re-worked. And, there's the matter of time as well - I don't have a lot of free time these days to work on gomplate, so this would likely take quite a while to implement...
@hairyhenderson, indeed I'm asking for support for one of those formats. Now regarding the exact format, I would list them in order by preference:
(In fact option one and two are useful in different cases, thus perhaps supporting both would be useful.)
Indeed, when one has sequences of JSON terms, most likely they are quite large, and couldn't have been provided as a single JSON array in the first place. (In fact if this would be the case, a simple |
Thanks @cipriancraciun, and my apologies for the slow response. This makes sense, and I'm tentatively interested in adding this to gomplate. However, just be aware that this will take some time, as the changes are complex and I don't work on gomplate full-time.
@hairyhenderson, I understand completely, take your time (if you decide to implement this). (I understand this is an open-source project, and if I were familiar with the code, I would have tried it myself.)
This issue is stale because it has been open for 60 days with no activity. Remove |
I am still interested in this feature (for future use of Granted, at the time I don't have the time to implement it myself, thus I'm OK with this feature request being closed with "won't implement". |
Thanks for the feedback @cipriancraciun! I'm going to close this issue for now, since I don't have the time to implement this, and nobody else seems to have any interest in implementing it.
(This feature request is somewhat related to #485 and #197 although not quite.)
For example there is a stream of JSON objects, perhaps obtained from jq, and one wants to run the same template over each of those JSON objects, and output each result into its distinct file.

I would expect to be able to run something like this:

Explaining the snippet above:
- jq ... would produce one JSON object per line (although ideally gomplate should read an entire JSON object regardless of whitespace, and then continue to the next one);
- gomplate sees the .json-sequence extension (or perhaps a different flag like --context-sequence or similar), and reads one such JSON object at a time, and then gomplate would run the template on that JSON object, and the output filename generator, and write the result to that particular file;
- gomplate would loop until it is done;

Additionally, if one doesn't specify --output-map, but only --out or nothing at all, then gomplate would execute the same template for each JSON object and just append everything together, either to the single output file or to stdout.

Additionally, although this might be harder to implement, it would be orthogonal to multiple template files: if one specifies multiple --in files or an --in-dir folder, and also --context-sequence (for example), then each template is executed for each JSON object; in this case --output-map is mandatory.