This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Add tuning-related parameters to the command line tools that implement the data format conversions #14

Open
hartig opened this issue Aug 31, 2018 · 4 comments

Comments

@hartig
Member

hartig commented Aug 31, 2018

TODO: what parameters do we have and which of them may be reasonable to expose?

@keski
Collaborator

keski commented Aug 31, 2018

The command line tools ConverterRDF2RDFStar and ConverterRDFStar2RDF do not support all of the flags listed in their respective usage instructions. Not all of these flags are strictly required, but we should clean this up to avoid confusion.

keski added a commit that referenced this issue Aug 31, 2018
keski added a commit that referenced this issue Aug 31, 2018
keski added a commit that referenced this issue Sep 3, 2018
@keski
Collaborator

keski commented Sep 3, 2018

The main performance issue seems to stem from the fact that the PipedRDFIterator and the PipedTriplesStream should not be run in the same thread. Initial tests indicate that fixing this leads to fairly consistent results for both the first and second pass (see below).

For ConverterRDFStar2RDF:

| # lines (nested) | First pass  | Second pass |
|------------------|-------------|-------------|
| 200,000          | 54k rows/s  | 29k rows/s  |
| 500,000          | 80k rows/s  | 31k rows/s  |
| 1,000,000        | 102k rows/s | 32k rows/s  |
| 2,000,000        | 132k rows/s | 32k rows/s  |

Similar performance improvements were also seen for ConverterRDF2RDFStar for larger files.
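
For reference, the threading pattern behind this fix can be sketched roughly as follows. This is only a stdlib analogue, not the actual converter code: a bounded `BlockingQueue` stands in for the buffer between the parser writing into the PipedTriplesStream (the producer) and the loop reading from the PipedRDFIterator (the consumer). The class name `PipedConversionSketch`, the `convert` method, the `END` sentinel, and the `bufferSize` parameter are all made up for illustration. The point is that the producer runs on its own thread, so a full buffer blocks only the producer and never the consumer that is supposed to drain it.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PipedConversionSketch {
    // Hypothetical end-of-stream marker; a real piped stream signals completion itself.
    private static final String END = "\u0000END";

    /** Feeds the lines through a bounded buffer and returns how many were consumed. */
    public static int convert(List<String> lines, int bufferSize) {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(bufferSize);
        ExecutorService producer = Executors.newSingleThreadExecutor();

        // Producer: runs on a separate thread, like the parser filling the piped stream.
        producer.submit(() -> {
            try {
                for (String line : lines) {
                    buffer.put(line); // blocks while the buffer is full
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumer: stays on the calling thread, like the loop over the iterator.
        int consumed = 0;
        try {
            for (String item = buffer.take(); !item.equals(END); item = buffer.take()) {
                consumed++; // the real tool would convert and write the triple here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        } finally {
            producer.shutdownNow();
        }
        return consumed;
    }

    public static void main(String[] args) {
        // Three items through a buffer of size two: works because the producer has its own thread.
        System.out.println(convert(List.of("a", "b", "c"), 2));
    }
}
```

If the producer loop ran on the calling thread instead, the third `put` would block forever, since nothing would be draining the buffer. A larger `bufferSize` only delays that point; it does not remove the need for the second thread.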

@keski
Collaborator

keski commented Sep 3, 2018

nested.ttls (150M records) now takes around 2 hours (125 minutes) to convert into the reification format on the server using the predefined buffer size (which should probably be set a bit higher).
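
As a rough sanity check, the overall throughput implied by these two figures can be worked out as below. The class and method names are made up for illustration, and the result is only an average over the whole run, so it is not necessarily comparable to the per-pass rows/s numbers reported earlier.

```java
public class ThroughputEstimate {
    /** Average records per second for a run of the given length in minutes. */
    public static double recordsPerSecond(long records, long minutes) {
        return records / (minutes * 60.0);
    }

    public static void main(String[] args) {
        // 150M records in 125 minutes, as reported above.
        System.out.println(recordsPerSecond(150_000_000L, 125)); // prints 20000.0
    }
}
```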

@keski
Collaborator

keski commented Sep 4, 2018

nested-reif.ttl takes about 1.5 hours (95 minutes) to convert back to TTLS. Comparing the result with the original nested.ttls, however, shows major discrepancies in file size. This seems to be the result of percent encoding (see issue #16).
