
QuiXDM

QuiXDM is a ubiquitous open-source data model for processing XML, JSON, YAML, CSV, RDF and HTML in a streaming fashion.

Getting Started

To install it

Why QuiXDM?

SAX, StAX, DOM, Jackson, Jena, CSVParser and HTMLParser already exist for processing data. Here is how the main ones compare with QuiXDM:

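| Feature \ API         | SAX           | StAX          | DOM           | Jackson        | QuiXDM           |
|-----------------------|---------------|---------------|---------------|----------------|------------------|
| In memory / streaming | streaming     | streaming     | in memory     | streaming      | streaming        |
| Push / pull           | push          | pull          | --            | pull           | pull             |
| Data model            | low-level XML | low-level XML | low-level XML | low-level JSON | XPath Data Model |
| Handles sequences     | no            | no            | no            | no             | yes              |
| Handles JSON/YAML     | no            | no            | no            | yes            | yes              |
| Handles RDF           | no            | no            | no            | no             | yes              |
| Handles CSV           | no            | no            | no            | no             | yes              |
| Handles HTML          | no            | no            | no            | no             | yes              |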

How does it work?

It uses a single, consistent data model to represent all of these kinds of content as a stream of events.

```
// Here is the grammar of events
sequence       := START_SEQUENCE, (document|json_yaml|table|semantic)*, END_SEQUENCE
document       := START_DOCUMENT, (PROCESSING-INSTRUCTION|COMMENT)*, element, (PROCESSING-INSTRUCTION|COMMENT)*, END_DOCUMENT
json_yaml      := START_JSON, object, END_JSON
table          := START_TABLE, header*, array_of_array, END_TABLE
semantic       := START_RDF, statement*, END_RDF
element        := START_ELEMENT, (NAMESPACE|ATTRIBUTE)*, (TEXT|element|PROCESSING-INSTRUCTION|COMMENT)*, END_ELEMENT
object         := START_OBJECT, (KEY_NAME, value)*, END_OBJECT
value          := object|array|flat_value
flat_value     := VALUE_FALSE|VALUE_TRUE|VALUE_NUMBER|VALUE_NULL|VALUE_STRING
array          := START_ARRAY, value*, END_ARRAY
array_of_array := START_ARRAY, flat_array+, END_ARRAY
flat_array     := START_ARRAY, flat_value*, END_ARRAY
statement      := START_PREDICATE, SUBJECT, OBJECT, GRAPH?, END_PREDICATE
```
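To make the `json_yaml`, `object`, `array` and `value` productions concrete, here is a minimal recursive-descent checker over a list of event names. It is only a sketch: the `Tok` enum is hypothetical and merely mirrors the token names used in the grammar (the real token set is defined in QuiXToken.java), and nothing here calls the QuiXDM API.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class JsonGrammarCheck {
    // Hypothetical token names copied from the grammar; NOT the real QuiXToken enum
    enum Tok {
        START_JSON, END_JSON, START_OBJECT, END_OBJECT, KEY_NAME, START_ARRAY, END_ARRAY,
        VALUE_FALSE, VALUE_TRUE, VALUE_NUMBER, VALUE_NULL, VALUE_STRING
    }

    // flat_value := VALUE_FALSE|VALUE_TRUE|VALUE_NUMBER|VALUE_NULL|VALUE_STRING
    private static final Set<Tok> FLAT = EnumSet.of(
            Tok.VALUE_FALSE, Tok.VALUE_TRUE, Tok.VALUE_NUMBER, Tok.VALUE_NULL, Tok.VALUE_STRING);

    private final List<Tok> toks;
    private int pos = 0;

    private JsonGrammarCheck(List<Tok> toks) { this.toks = toks; }

    // Next token, or null once the input is exhausted
    private Tok peek() { return pos < toks.size() ? toks.get(pos) : null; }

    private void expect(Tok t) {
        if (peek() != t) throw new IllegalStateException("expected " + t + " at " + pos + ", got " + peek());
        pos++;
    }

    // json_yaml := START_JSON, object, END_JSON
    private void jsonYaml() { expect(Tok.START_JSON); object(); expect(Tok.END_JSON); }

    // object := START_OBJECT, (KEY_NAME, value)*, END_OBJECT
    private void object() {
        expect(Tok.START_OBJECT);
        while (peek() == Tok.KEY_NAME) { pos++; value(); }
        expect(Tok.END_OBJECT);
    }

    // array := START_ARRAY, value*, END_ARRAY
    private void array() {
        expect(Tok.START_ARRAY);
        while (peek() != Tok.END_ARRAY) value(); // value() throws on an unexpected or missing token
        expect(Tok.END_ARRAY);
    }

    // value := object|array|flat_value
    private void value() {
        Tok t = peek();
        if (t == Tok.START_OBJECT) object();
        else if (t == Tok.START_ARRAY) array();
        else if (t != null && FLAT.contains(t)) pos++;
        else throw new IllegalStateException("unexpected " + t + " at " + pos);
    }

    // True if the whole list is exactly one json_yaml production
    static boolean isValid(List<Tok> toks) {
        JsonGrammarCheck c = new JsonGrammarCheck(toks);
        try { c.jsonYaml(); return c.pos == toks.size(); }
        catch (IllegalStateException e) { return false; }
    }

    public static void main(String[] args) {
        // Events for {"a": [1, true]}
        List<Tok> ok = Arrays.asList(Tok.START_JSON, Tok.START_OBJECT, Tok.KEY_NAME, Tok.START_ARRAY,
                Tok.VALUE_NUMBER, Tok.VALUE_TRUE, Tok.END_ARRAY, Tok.END_OBJECT, Tok.END_JSON);
        // A KEY_NAME with no value before END_OBJECT violates the object production
        List<Tok> bad = Arrays.asList(Tok.START_JSON, Tok.START_OBJECT, Tok.KEY_NAME,
                Tok.END_OBJECT, Tok.END_JSON);
        System.out.println(isValid(ok));  // true
        System.out.println(isValid(bad)); // false
    }
}
```

Because every production starts with a distinct START_* token, one token of lookahead is enough to pick the branch, which is what makes this kind of grammar well suited to pull-style streaming.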

For the full list of event types, see QuiXToken.java.

Use

With Object creation (à la javax.xml.stream.XMLEventReader)

The simplest way to use it is to instantiate innovimax.quixproc.datamodel.in.QuiXEventStreamReader:

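```java
import java.io.File;
import java.util.Arrays;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import innovimax.quixproc.datamodel.in.QuiXEventStreamReader;

// Assumption: the reader accepts javax.xml.transform.Source instances;
// check the constructor signature in QuiXEventStreamReader.java
Iterable<Source> sources = Arrays.<Source>asList(
        new StreamSource(new File("/tmp/file/file_aaa.xml")),
        new StreamSource(new File("/tmp/file/file_aab.json")),
        new StreamSource(new File("/tmp/file/file_aac.csv")),
        new StreamSource(new File("/tmp/file/file_aad.yml")),
        new StreamSource(new File("/tmp/file/file_aae.n3")));
// A single reader yields one event sequence covering all the sources
QuiXEventStreamReader qesr = new QuiXEventStreamReader(sources);
while (qesr.hasNext()) {
    System.out.println(qesr.next()); // print each event as it is pulled
}
```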

Lightweight iterator without Object creation (à la javax.xml.stream.XMLStreamReader)

TODO
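The cursor style this heading refers to is that of the JDK's own javax.xml.stream.XMLStreamReader: next() returns an int event code, and the details of the current event (here the element's local name) are read from the reader itself, so no event object is allocated per step. The example below is plain StAX, not QuiXDM code; the shape of QuiXDM's lightweight iterator is not specified here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CursorStyleDemo {
    // Collects element names with the StAX cursor API: next() returns an int event code,
    // and the current event's details are read from the reader instead of from an event object.
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(r.getLocalName());
                }
            }
        } finally {
            r.close(); // frees parser resources; does not close the underlying StringReader
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(elementNames("<a><b/><c>text</c></a>")); // [a, b, c]
    }
}
```

Compared with the XMLEventReader style used in the previous example, the caller only ever holds one reader object, which is the "without Object creation" property the heading describes.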

A streaming interface for XML should really stream everything, including character data. The problem is that Java has no true character-streaming interface:

  • String is not streamable and is limited to 2^31 - 1 characters (Integer.MAX_VALUE)
  • CharSequence, which could have been a candidate, is not streamable either, because its length() method requires the total length (as an int) to be known up front
  • CharIterator does not exist in the JDK (though implementations exist elsewhere)
  • CharSequence.chars() returns an IntStream, because the Java 8 designers chose not to add a CharStream
  • A Java 8 Stream<Character> boxes every char, which is highly inefficient

Given this context, QuiXCharStream and QuiXQName were introduced in order to:

  • address the TEXT recombination issue (in the XDM, two text() nodes can never be adjacent, so consecutive text chunks must be merged into one)
  • stream even corner-case XML:
    • huge strings
    • huge names
    • huge namespace URIs
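The two needs above (streaming text of any length without boxing, and merging adjacent TEXT chunks into one logical value) can be sketched with a hypothetical primitive char iterator. `CharIter`, `fromReader` and `concat` are illustrative names only; this is not the QuiXCharStream API, whose actual methods are in the QuiXDM sources.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.NoSuchElementException;

public class CharStreamSketch {
    // Hypothetical primitive char iterator: no length(), no boxing
    interface CharIter {
        boolean hasNext() throws IOException;
        char nextChar() throws IOException;
    }

    // Wraps a Reader so content of any size is pulled one char at a time
    static CharIter fromReader(Reader in) {
        return new CharIter() {
            private int buffered = -2; // -2: nothing read ahead yet, -1: end of input
            public boolean hasNext() throws IOException {
                if (buffered == -2) buffered = in.read();
                return buffered != -1;
            }
            public char nextChar() throws IOException {
                if (!hasNext()) throw new NoSuchElementException();
                char c = (char) buffered;
                buffered = -2;
                return c;
            }
        };
    }

    // Presents consecutive TEXT chunks as one logical text value without concatenating strings
    static CharIter concat(CharIter... parts) {
        Deque<CharIter> queue = new ArrayDeque<>(Arrays.asList(parts));
        return new CharIter() {
            public boolean hasNext() throws IOException {
                // Drop exhausted (or empty) chunks until one has data left
                while (!queue.isEmpty() && !queue.peekFirst().hasNext()) queue.pollFirst();
                return !queue.isEmpty();
            }
            public char nextChar() throws IOException {
                if (!hasNext()) throw new NoSuchElementException();
                return queue.peekFirst().nextChar();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        CharIter text = concat(fromReader(new StringReader("Hello, ")), fromReader(new StringReader("world")));
        StringBuilder sb = new StringBuilder(); // only to display the result; a streaming consumer would not buffer
        while (text.hasNext()) sb.append(text.nextChar());
        System.out.println(sb); // Hello, world
    }
}
```

The consumer never needs the total length, and each char travels as a primitive, which avoids both the CharSequence and the Stream<Character> pitfalls listed earlier.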

Contributors

Innovimax is contributing to this work

Related Projects

QuiXDM can be used standalone

This is the data model of QuiXPath and QuiXProc

It is part of two bigger projects: