Using ARC's RDF Store
An ARC Store is instantiated like any other component:
/* ARC2 static class inclusion */
include_once('path/to/arc/ARC2.php');
/* configuration */
$config = array(
  /* db */
  'db_host' => 'localhost', /* optional, default is localhost */
  'db_name' => 'my_db',
  'db_user' => 'user',
  'db_pwd' => 'secret',
  /* store name (= table prefix) */
  'store_name' => 'my_store',
);
/* instantiation */
$store = ARC2::getStore($config);
if (!$store->isSetUp()) {
  $store->setUp();
}
$q = 'SELECT ...';
$rs = $store->query($q);
if (!$store->getErrors()) {
  $rows = $rs['result']['rows'];
  ...
}
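The error check in the example above can be wrapped in a small helper. This is only a minimal sketch and not part of ARC: `selectRows()` is a hypothetical name, it relies solely on the `query()` and `getErrors()` calls shown above, and it assumes that `getErrors()` returns an array of error messages.

```php
<?php
/**
 * Hypothetical helper (not part of ARC): run a SELECT query and return its rows,
 * throwing an exception if the store reported errors.
 * Assumes $store offers query() and getErrors() as used in the example above,
 * and that getErrors() returns an array of messages.
 */
function selectRows($store, string $q): array
{
    $rs = $store->query($q);
    $errors = $store->getErrors();
    if ($errors) {
        throw new RuntimeException('Query failed: ' . implode('; ', (array) $errors));
    }
    // Default result format: the rows live under ['result']['rows'].
    return $rs['result']['rows'] ?? [];
}
```

With a store set up as above, `$rows = selectRows($store, $q);` would then replace the explicit `if (!$store->getErrors())` block.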
ARC supports standard SPARQL 1.0 queries as well as SPARQL+ for write operations.
The default query() method returns an associative array with two keys: "query_time" and "result". The former tells how long the SPARQL engine needed to process the query (excluding parse time), the latter contains query-dependent sub-structures. The query() method also accepts a second parameter to specify a result format. Examples are listed below:
query('SELECT ?fname ...')
// Results:
// $rs['query_time'] Duration
// $rs['result']['rows'] Rows
// $rs['result']['rows'][0] First row
// $rs['result']['rows'][1] Second row
// $rs['result']['rows'][1]['fname'] Second row result by "SPARQL variable" name
query('SELECT ?fname ...', 'rows')
// Results:
// $rs Rows
...
query('ASK ...')
// Results:
// $rs['query_time'] Duration
// $rs['result'] TRUE or FALSE
query('ASK ...', 'raw')
// Results:
// $rs TRUE or FALSE
query('DESCRIBE <http://example.com/> ...')
// Results:
// $rs['query_time'] Duration
// $rs['result'] Index
// $rs['result']['http://example.com/'] Index entry for the resource
The index format is described in Internal Structures.
query('DESCRIBE <http://example.com/> ...', 'raw')
// Results:
// $rs Index
...
query('CONSTRUCT ...') works analogously to DESCRIBE
query('LOAD ...')
// Results:
// $rs['query_time'] Duration
// $rs['result']['t_count'] Added triples
// $rs['result']['load_time'] Load time
// $rs['result']['index_update_time'] Index update time
query('LOAD ...', 'raw')
// Results:
// $rs['t_count'] Added triples
// $rs['load_time'] Load time
// $rs['index_update_time'] Index update time
query('INSERT ...') works analogously to LOAD
query('DELETE ...')
// Results:
// $rs['query_time'] Duration
// $rs['result']['t_count'] Removed triples
// $rs['result']['delete_time'] Delete time
// $rs['result']['index_update_time'] Index update time
query('DELETE ...', 'raw')
// Results:
// $rs['t_count'] Removed triples
// $rs['delete_time'] Delete time
// $rs['index_update_time'] Index update time
query('DUMP') creates (and outputs) a store backup (see the dump method below). The result format parameter has no effect.
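Because the structure returned by query() depends on whether a result format was passed, code that consumes SELECT results has to know which variant it received. The following is a minimal sketch based only on the structures listed above; `rowsFrom()` and `columnValues()` are hypothetical helper names and not part of ARC.

```php
<?php
/**
 * Hypothetical helpers (not part of ARC) for the SELECT result structures listed above.
 */

/** Accepts either the default query() result or the 'rows' format and returns the row list. */
function rowsFrom(array $rs): array
{
    // Default format: array with 'query_time' and 'result' => ['rows' => [...]].
    if (isset($rs['result']['rows']) && is_array($rs['result']['rows'])) {
        return $rs['result']['rows'];
    }
    // 'rows' format: $rs already is the list of rows.
    return $rs;
}

/** Collects the values bound to one SPARQL variable (e.g. 'fname') across all rows. */
function columnValues(array $rows, string $var): array
{
    $values = [];
    foreach ($rows as $row) {
        if (isset($row[$var])) {
            $values[] = $row[$var];
        }
    }
    return $values;
}
```

For example, `columnValues(rowsFrom($rs), 'fname')` yields the list of ?fname bindings regardless of which of the two SELECT variants produced `$rs`.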
Besides a query and result_format, the query() method accepts two other parameters: query_base and whether to keep_bnode_ids. "query_base" (parameter #3, default: empty) allows you to specify a base for the query (e.g. if the query contains relative paths, but no BASE).
"keep_bnode_ids" (parameter #4, default: false) is an advanced trigger that enables deletes and updates of blank nodes. ARC supports bnode identification for read operations, i.e. bnode IDs returned by a SELECT can be used in successive queries, if masked as URIs (e.g. <_:bn27>). Likewise, ARC can be told to write bnodes to the store without changing their IDs:
$q1 = 'DELETE FROM <...> { <_:methuselah> ex:age ?age . }';
$q2 = 'INSERT INTO <...> { <_:methuselah> ex:age 969 . }';
$store->query($q1, 'raw', '', true);
$store->query($q2, 'raw', '', true);
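The masking step described above (writing a bnode ID as a URI, e.g. <_:bn27>) can be sketched as a small string helper. Both function names below are hypothetical and not part of ARC; the graph placeholder <...> is left elided exactly as in the example above, and the `ex:` prefix is assumed to be declared elsewhere.

```php
<?php
/**
 * Hypothetical helper (not part of ARC): wrap a blank node ID in angle brackets
 * so it can be written like a URI in a follow-up query, e.g. "_:bn27" => "<_:bn27>".
 */
function maskBnode(string $id): string
{
    if (strpos($id, '_:') !== 0) {
        throw new InvalidArgumentException('Not a blank node ID: ' . $id);
    }
    return '<' . $id . '>';
}

/**
 * Builds a DELETE/INSERT pair like the one above for a given bnode ID.
 * Returns array(delete query, insert query); the target graph stays elided.
 */
function ageUpdateQueries(string $bnodeId, int $age): array
{
    $node = maskBnode($bnodeId);
    return [
        sprintf('DELETE FROM <...> { %s ex:age ?age . }', $node),
        sprintf('INSERT INTO <...> { %s ex:age %d . }', $node, $age),
    ];
}
```

Each returned query would then be passed to `$store->query($q, 'raw', '', true)` as in the example above, so that the bnode IDs are kept.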
reset() All tables are emptied.
drop() All tables are deleted.
insert($doc, $g, $keep_bnode_ids = 0) A convenience method. $doc can be an ARC structure, or an ARC-supported RDF format (including HTML), $g is the target graph URI, $keep_bnode_ids is explained in the paragraph above.
dump() Creates a SPOG document from all quads in the store. This method can be used for streamed store backups.
createBackup($path, $q = '') Saves a SPOG file that either contains a complete store dump, or triples/quads from a custom, SPO(G)-compliant SELECT query (via the $q parameter).
replicateTo($name) Creates a new store and replicates all tables and quads to it.
renameTo($name) Renames the store's underlying database tables.
optimizeTables($level = 2) /* 1: triple + g2t, 2: triple + *2val, 3: all tables */ Defragments the MySQL tables. This method is automatically called every ~50th LOAD or DELETE query. You can also call it explicitly, though, when queries are getting slower than they should due to store updates.
extendColumns() Changes the table column types from MEDIUMINT to INT for scaling beyond 16M triples. Called automatically by RDF loaders.
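The methods above can be combined into a simple maintenance routine. The sketch below is hypothetical and not part of ARC: it only calls `createBackup()` and `optimizeTables()` with the signatures listed above, makes no assumption about their return values, and uses a made-up function name and backup path.

```php
<?php
/**
 * Hypothetical maintenance routine (not part of ARC). It only calls the
 * createBackup() and optimizeTables() methods described above and does not
 * rely on what they return.
 */
function backupThenOptimize($store, string $backupPath, int $level = 2): void
{
    // Save a SPOG file first; no custom $q is passed, so this is a complete store dump.
    $store->createBackup($backupPath);
    // Then defragment: 1 = triple + g2t, 2 = triple + *2val, 3 = all tables.
    $store->optimizeTables($level);
}
```

A call such as `backupThenOptimize($store, '/path/to/backup.spog', 3);` could be run after a large batch of updates, in addition to the automatic optimization ARC already performs.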
store_indexes (default: array('sp (s,p)', 'os (o,s)', 'po (p,o)')) Custom MySQL triple table indexes.
store_write_buffer (default: 2500) This option lets you set the batch size of triples written to the MySQL tables via SQL.
store_engine_type (default: MyISAM) This option lets you set the MySQL engine type used by ARC, in case your application environment works better with InnoDB, or maybe even MEMORY.
store_strip_mb_comp_str (default: false) If you encounter UTF-8/multibyte-related MySQL errors on your system during INSERTs or LOADs, you can try setting this flag to "1". Multibyte comparisons may then return inaccurate results, but the errors should go away.
max_errors (default: 25) This option lets you set the maximum number of errors before ARC stops processing (e.g. during LOADs or streaming parsing).
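As an illustration, the options above might be set alongside the database settings from the first example. This sketch assumes that the store options are read from the same configuration array that is passed to ARC2::getStore(); the values are purely illustrative, not recommendations.

```php
<?php
/* Sketch: store options added to the configuration array from the first example. */
$config = array(
  /* db */
  'db_name' => 'my_db',
  'db_user' => 'user',
  'db_pwd' => 'secret',
  /* store name (= table prefix) */
  'store_name' => 'my_store',
  /* store options (illustrative values; assumed to live in the same array) */
  'store_indexes' => array('sp (s,p)', 'os (o,s)', 'po (p,o)'), /* the default */
  'store_write_buffer' => 5000,    /* default: 2500 */
  'store_engine_type' => 'InnoDB', /* default: MyISAM */
  'store_strip_mb_comp_str' => 1,  /* default: false */
  'max_errors' => 100,             /* default: 25 */
);
```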
ARC provides a dedicated "RemoteStore" component for running queries against Web-accessible SPARQL endpoints.