
Commandline Tools Refresh

rfecher edited this page Mar 3, 2016 · 13 revisions

We are migrating our commandline tools to JCommander and considering ways to make them more interactive and intuitive.

We will create a local directory of cached information ('.geowave directory') that can be reused across command-line runs.
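A minimal sketch of such a cache, assuming it is a single Java properties file at `~/.geowave/config.properties`. The file name, the `GeoWaveConfigCache` class, and the `store.<name>.<option>` key layout used in the other sketches on this page are all hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: persists cached CLI configuration as a properties file
// inside a ".geowave" directory so that later runs can reuse it.
final class GeoWaveConfigCache {
    private final Path file;

    GeoWaveConfigCache(Path baseDir) {
        this.file = baseDir.resolve(".geowave").resolve("config.properties");
    }

    // Assumed default location: the user's home directory.
    static GeoWaveConfigCache inUserHome() {
        return new GeoWaveConfigCache(Paths.get(System.getProperty("user.home")));
    }

    // Returns an empty Properties object on the first run (no file yet).
    Properties load() {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    // Creates the .geowave directory if needed and overwrites the cache file.
    void save(Properties props) {
        try {
            Files.createDirectories(file.getParent());
            try (OutputStream out = Files.newOutputStream(file)) {
                props.store(out, "GeoWave CLI cached configuration");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A plain properties file keeps the cache human-readable, which also suits the `set` and `list` subcommands below.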

config

Commands that affect local configuration only.

Subcommands:

  • addstore <name> <options>

Options include -t or --type to specify the store type; -d or --default, a flag that makes this the default configuration (its values become the defaults for stores configured in the future); and -<typename>.<option>, where <typename> could be, for example, accumulo, with one option for each of that type's individual settings, such as zookeeper, user, accumulo namespace, and instance. The <name> is the name of the store and takes the place of what "geowave namespace" (a table prefix) used to be: the concept of a geowave namespace is simply replaced by the store name. Any option not given explicitly on the command line will be prompted for, with a default provided. -f or --force will immediately error if required options aren't given, instead of prompting the user to fill in the missing options.
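The prompting rules above can be sketched as follows, assuming the dynamic `-<typename>.<option> <value>` pairs are collected into a map and each known option has an optional default. `OptionResolver`, its methods, and the `prompter` callback (standing in for a console prompt) are hypothetical names, not JCommander API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

final class OptionResolver {

    // Collects "-<typename>.<option> <value>" pairs, e.g. "-accumulo.zookeeper zk:2181".
    static Map<String, String> parseTypedOptions(String typeName, String[] args) {
        String prefix = "-" + typeName + ".";
        Map<String, String> options = new LinkedHashMap<>();
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].startsWith(prefix)) {
                options.put(args[i].substring(prefix.length()), args[i + 1]);
                i++; // skip the value just consumed
            }
        }
        return options;
    }

    // knownOptions maps option name -> default value (null means "no default").
    static Map<String, String> resolve(Map<String, String> given,
                                       Map<String, String> knownOptions,
                                       boolean force,
                                       BiFunction<String, String, String> prompter) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> option : knownOptions.entrySet()) {
            String name = option.getKey();
            String defaultValue = option.getValue();
            String value = given.get(name);
            if (value == null) {
                if (force) {
                    // --force: never prompt; use a default or fail fast.
                    if (defaultValue == null) {
                        throw new IllegalArgumentException("Missing required option: " + name);
                    }
                    value = defaultValue;
                } else {
                    // Interactive: show the default; an empty answer accepts it.
                    String answer = prompter.apply(name, defaultValue);
                    value = (answer == null || answer.isEmpty()) ? defaultValue : answer;
                    if (value == null) {
                        throw new IllegalArgumentException("No value given for option: " + name);
                    }
                }
            }
            resolved.put(name, value);
        }
        return resolved;
    }
}
```

The same resolver would apply unchanged to addindex, with the index type's options (period, bias, pointTimestampOnly) as the known options.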

  • rmstore <name> <options>

Prompts the user with "are you sure?" and then deletes all configuration associated with <name>. -f or --force skips the yes/no prompt and immediately errors if required options aren't given.
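A minimal sketch of the confirm-then-delete behavior, assuming the cached configuration is a `java.util.Properties` object whose store keys share a `store.<name>.` prefix (a hypothetical layout). `ConfigRemover` is an illustrative name, and the `confirm` supplier stands in for the interactive yes/no prompt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.function.Supplier;

final class ConfigRemover {
    // Removes every property starting with the prefix (e.g. "store.mystore.")
    // and returns how many were deleted. With force, the prompt is skipped.
    static int remove(Properties config, String prefix, boolean force, Supplier<Boolean> confirm) {
        if (!force && !confirm.get()) {
            return 0; // the user answered "no"
        }
        List<String> doomed = new ArrayList<>();
        for (String key : config.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                doomed.add(key);
            }
        }
        doomed.forEach(config::remove);
        return doomed.size();
    }
}
```

Keeping the trailing dot in the prefix prevents `store.a.` from also matching a store named `ab`; rmindex and rmindexgrp would only differ in the prefix passed in.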

  • cpstore <name> <newname> <options>

Copies the store info from <name> to <newname>; any option provided will override the corresponding property of <name> when it is copied to <newname>.
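The copy-with-overrides step can be sketched like this, assuming the same hypothetical `<kind>.<name>.<option>` key layout (with `kind` being `store` or `index`). `ConfigCopier` is an illustrative name only.

```java
import java.util.Map;
import java.util.Properties;

final class ConfigCopier {
    // Copies every "<kind>.<name>.*" property to "<kind>.<newName>.*",
    // then applies command-line overrides to the new copy only.
    static void copy(Properties config, String kind, String name, String newName,
                     Map<String, String> overrides) {
        String from = kind + "." + name + ".";
        String to = kind + "." + newName + ".";
        // stringPropertyNames() returns a snapshot, so adding keys while looping is safe.
        for (String key : config.stringPropertyNames()) {
            if (key.startsWith(from)) {
                config.setProperty(to + key.substring(from.length()), config.getProperty(key));
            }
        }
        overrides.forEach((opt, value) -> config.setProperty(to + opt, value));
    }
}
```

Under this layout, cpindex is the same call with `kind` set to `"index"`.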

  • addindex <name> <options>

Options include -t or --type to specify the index type; -d or --default, a flag that makes this the default configuration (its values become the defaults for indices configured in the future); -np or --numpartitions; -ps or --partitionstrategy; and -<typename>.<option>, where <typename> could be, for example, spatialtemporal, with one option for each of that type's individual settings, such as period, bias, and pointTimestampOnly. Any option not given explicitly on the command line will be prompted for, with a default provided. -f or --force will immediately error if required options aren't given, instead of prompting the user to fill in the missing options.

  • rmindex <name> <options>

Prompts the user with "are you sure?" and then deletes all configuration associated with <name>. -f or --force skips the yes/no prompt and immediately errors if required options aren't given.

  • cpindex <name> <newname> <options>

Copies the index info from <name> to <newname>; any option provided will override the corresponding property of <name> when it is copied to <newname>.

  • addindexgrp <grpname> <comma delimited index/group name list>

Groups multiple index configurations together under a single name. This is a convenience for reusing several indices together at ingest time.
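Since the member list may name other groups as well as indices, ingest has to expand a group into concrete index names. A minimal sketch, assuming each group is stored as a comma-delimited member string; `IndexGroupExpander` and the cycle check are illustrative additions, not something the page specifies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IndexGroupExpander {
    // groups: group name -> comma-delimited members (index names or group names).
    // Returns distinct index names in first-seen order.
    static List<String> expand(String list, Map<String, String> groups) {
        Set<String> out = new LinkedHashSet<>();
        for (String item : list.split(",")) {
            expandOne(item.trim(), groups, out, new ArrayDeque<>());
        }
        return new ArrayList<>(out);
    }

    private static void expandOne(String name, Map<String, String> groups,
                                  Set<String> out, Deque<String> path) {
        if (name.isEmpty()) {
            return;
        }
        String members = groups.get(name);
        if (members == null) {
            out.add(name); // not a group, so treat it as an index name
            return;
        }
        // path holds the groups currently being expanded; revisiting one is a cycle.
        if (path.contains(name)) {
            throw new IllegalArgumentException("Cyclic index group: " + name);
        }
        path.push(name);
        for (String member : members.split(",")) {
            expandOne(member.trim(), groups, out, path);
        }
        path.pop();
    }
}
```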

  • rmindexgrp <grpname> <options>

Prompts the user with "are you sure?" and then deletes all configuration associated with <grpname>. -f or --force skips the yes/no prompt and immediately errors if required options aren't given.

  • set <propertyname=value>

Sets the value of a valid property within the cache. This is useful for updating a particular property of an index or store.
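One reading of "valid property name" is a property that already exists in the cache, so that typos are rejected instead of silently creating new keys. A minimal sketch under that assumption; `ConfigSetter` is a hypothetical name.

```java
import java.util.Properties;

final class ConfigSetter {
    // Parses "<propertyname=value>" and updates the property only if it already exists.
    static void set(Properties config, String assignment) {
        int eq = assignment.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Expected <propertyname=value>: " + assignment);
        }
        String key = assignment.substring(0, eq).trim();
        String value = assignment.substring(eq + 1).trim(); // values may themselves contain '='
        if (config.getProperty(key) == null) {
            throw new IllegalArgumentException("Unknown property: " + key);
        }
        config.setProperty(key, value);
    }
}
```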

  • list <options>

Lists all properties in the local config. -f or --filter accepts a regex to filter the list by (useful regexes include 'store' or 'index', to isolate properties for one or the other, or a particular store or index name, to narrow the list further).
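A minimal sketch of the filter, assuming it matches the regex anywhere in the property name (`Matcher.find()` rather than `matches()`), so that a bare `store` or a store name works without wildcards. `ConfigLister` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class ConfigLister {
    // Returns "key=value" lines sorted by key; a null filter lists everything.
    static List<String> list(Properties config, String filterRegex) {
        Pattern filter = filterRegex == null ? null : Pattern.compile(filterRegex);
        List<String> lines = new ArrayList<>();
        for (String key : new TreeSet<>(config.stringPropertyNames())) {
            if (filter == null || filter.matcher(key).find()) {
                lines.add(key + "=" + config.getProperty(key));
            }
        }
        return lines;
    }
}
```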

ingest

Commands that ingest data directly into GeoWave or stage data to be ingested into GeoWave.

Subcommands:

  • localToGW <file or directory> <storename> <comma delimited index/group list> <options>

Options include -q or --quiet, which suppresses all output; -x or --extension, which limits the files parsed to particular extensions; and -f or --format, which limits the formats used when attempting to parse files.
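The -x filter can be sketched with `java.nio.file.Files.walk`, assuming extensions are compared case-insensitively and passed in lowercase without the leading dot. `IngestFileSelector` is a hypothetical name; the -f format filter would be applied afterwards, when each file is handed to candidate format plugins.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class IngestFileSelector {
    // Collects regular files under a file or directory. When extensions is
    // non-empty, keeps only files whose lowercase extension is in the set.
    static List<Path> select(Path root, Set<String> extensions) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> extensions.isEmpty() || extensions.contains(extensionOf(p)))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "roads.SHP" -> "shp"; names without a dot yield "".
    static String extensionOf(Path p) {
        String name = p.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```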

  • localToKafka <file or directory> <options>

Options include --kafkaprops, --metadataBrokerList, --requestRequiredAcks, --produceType, --serializerClass, and --retryBackoffMs; -q or --quiet suppresses all output; -x or --extension limits the files parsed to particular extensions; and -f or --format limits the formats used when attempting to parse files. This is useful for producing a stream of GeoWave-compliant (Avro-encoded) data that a consumer can then ingest directly into GeoWave.
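One way the Kafka flags could combine with --kafkaprops is to load the properties file first and let explicit flags win. The flag-to-key mapping below targets the Kafka 0.8-era producer settings (`metadata.broker.list`, `request.required.acks`, `producer.type`, `serializer.class`, `retry.backoff.ms`); that mapping, and the `KafkaProducerProps` class, are assumptions for illustration rather than GeoWave's actual wiring.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

final class KafkaProducerProps {
    // Assumed mapping from CLI flags to Kafka 0.8-era producer config keys.
    static final Map<String, String> FLAG_TO_KEY = new LinkedHashMap<>();
    static {
        FLAG_TO_KEY.put("--metadataBrokerList", "metadata.broker.list");
        FLAG_TO_KEY.put("--requestRequiredAcks", "request.required.acks");
        FLAG_TO_KEY.put("--produceType", "producer.type");
        FLAG_TO_KEY.put("--serializerClass", "serializer.class");
        FLAG_TO_KEY.put("--retryBackoffMs", "retry.backoff.ms");
    }

    // Starts from the --kafkaprops file (may be null), then applies explicit flags on top.
    static Properties build(Path kafkaPropsFile, Map<String, String> flags) {
        Properties props = new Properties();
        if (kafkaPropsFile != null) {
            try (InputStream in = Files.newInputStream(kafkaPropsFile)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        flags.forEach((flag, value) -> {
            String key = FLAG_TO_KEY.get(flag);
            if (key == null) {
                throw new IllegalArgumentException("Unknown flag: " + flag);
            }
            props.setProperty(key, value);
        });
        return props;
    }
}
```

The consumer-side flags of kafkaToGW could follow the same pattern with a different mapping table.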

  • localToHdfs <file or directory> <hdfs host:port/path to base directory to write to>

-x or --extension limits the files parsed to particular extensions, and -f or --format limits the formats used when attempting to parse files. This is useful for staging data in HDFS so that mapreduce can then parallelize the GeoWave ingest (the staged data can be reused for several mapreduce ingest runs if necessary). The data is written to HDFS Avro-encoded, so GeoWave mapreduce ingest operations can consume it directly.

  • kafkaToGW <store name> <comma delimited index/group list> <options>

Options include --kafkaprops, --groupId, --zookeeperConnect, --autoOffsetReset, --fetchMessageMaxBytes, and --consumerTimeoutMs; -q or --quiet suppresses all output; -x or --extension limits the extensions; and -f or --format limits the formats to try. This is useful in conjunction with localToKafka for consuming a stream of GeoWave-compliant (Avro-encoded) data to ingest into GeoWave.

  • mrToGW <hdfs host:port/path to base directory to read from> <store name> <comma delimited index/group list> <options>

Options include --kafkaprops, --groupId, --zookeeperConnect, --autoOffsetReset, --fetchMessageMaxBytes, and --consumerTimeoutMs; -q or --quiet suppresses all output; -x or --extension limits the extensions; -f or --format limits the formats to try; and -c or --clean deletes the data from HDFS when complete. This is useful in conjunction with localToHdfs: stage data in HDFS first, then parallelize ingest into GeoWave using mapreduce.

  • localToMrGW <file or directory> <hdfs host:port/path to base directory to write to> <store name> <comma delimited index/group list> <options>

Wraps the localToHdfs and mrToGW operations, running them serially; its options are the combination of both operations' options.
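A minimal sketch of such a composite command, assuming each wrapped operation declares the option names it understands and receives only those. `CompositeOperation` and the `Consumer`-based operations are hypothetical stand-ins; shared options such as -x and -f simply reach both steps.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

final class CompositeOperation {
    // Rejects options neither step knows, then runs the steps in order.
    // If the first step throws, the second never starts.
    static void runSerially(Map<String, String> combined,
                            Set<String> firstOptions, Consumer<Map<String, String>> first,
                            Set<String> secondOptions, Consumer<Map<String, String>> second) {
        for (String opt : combined.keySet()) {
            if (!firstOptions.contains(opt) && !secondOptions.contains(opt)) {
                throw new IllegalArgumentException("Unknown option: " + opt);
            }
        }
        first.accept(subset(combined, firstOptions));   // e.g. localToHdfs
        second.accept(subset(combined, secondOptions)); // e.g. mrToGW
    }

    // Keeps only the entries whose keys are in the given set, preserving order.
    static Map<String, String> subset(Map<String, String> all, Set<String> keys) {
        Map<String, String> out = new LinkedHashMap<>();
        all.forEach((k, v) -> {
            if (keys.contains(k)) {
                out.put(k, v);
            }
        });
        return out;
    }
}
```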

  • listformat

Prints the available formats, with descriptions of what each supports.

remote

Operations to manage a remote store.

Subcommands:

  • listindex <store>

Displays all indices in this remote store.

  • listadapter <store>

Displays all adapters in this remote store.

  • liststat <store>

Displays all statistics in this remote store.

  • rmindex <store> <indexId>

Removes an index, and all data associated with it, from the remote store.

  • rmadapter <store> <adapterId>

Removes an adapter, and all data associated with it, from the remote store.

  • rmstat <statId>

Removes a statistic from the remote store after an "are you sure?" prompt.

  • calcstat <statId> <adapterId>

Calculates a specific statistic in the remote store, given the adapter ID and statistic ID. If that statistic already exists, you will be prompted to confirm that you want to overwrite it.

  • recalcstats <store>

Recalculates all statistics in the remote store.

  • clear <store>

You will be prompted with "are you sure?". If you answer yes, all information in this store (data and metadata) is deleted.

analytic

Commands that run mapreduce or spark processing to enhance an existing GeoWave dataset.

Subcommands:

TODO: polish this. For now, we can try to mimic the existing behavior one-to-one, except that -store <storename> replaces the individual connection parameters everywhere they are currently used.

  • kde
  • kmeansparallel
  • nn
  • dbscan
  • kmeansjump
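Replacing the individual connection parameters with -store <storename> amounts to looking the store up in the local configuration. A minimal sketch, assuming addstore cached the settings under a hypothetical `store.<name>.<option>` layout; `StoreResolver` is an illustrative name.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class StoreResolver {
    // Turns "-store <storename>" into the connection parameters cached by addstore,
    // e.g. store.mystore.zookeeper=zk:2181 becomes zookeeper=zk:2181.
    static Map<String, String> connectionParams(Properties config, String storeName) {
        String prefix = "store." + storeName + ".";
        Map<String, String> params = new TreeMap<>();
        for (String key : config.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                params.put(key.substring(prefix.length()), config.getProperty(key));
            }
        }
        if (params.isEmpty()) {
            throw new IllegalArgumentException("No store named '" + storeName + "' in local config");
        }
        return params;
    }
}
```

Each analytic could then pass this map wherever it currently reads zookeeper, instance, user, and similar flags.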

Pluggable Interfaces

Store Factory Family (StoreFactoryFamilySpi)

Currently, each type of store has an Spi extension to register itself. The store factory family, as the primary plugin, must continue to exist; the other individual Spi extension points are nice to have but not necessary.
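Java's standard `java.util.ServiceLoader` is one way such a plugin family can be discovered: each implementation is listed in a `META-INF/services/<interface name>` file on the classpath. The sketch below assumes a simplified interface; `StoreFamilySpi` and `StoreFamilyRegistry` are hypothetical stand-ins, and the real StoreFactoryFamilySpi may look quite different.

```java
import java.util.Map;
import java.util.ServiceLoader;
import java.util.TreeMap;

// Hypothetical, simplified shape of a store-family plugin.
interface StoreFamilySpi {
    String getType();        // e.g. "accumulo"
    String getDescription();
}

final class StoreFamilyRegistry {
    private final Map<String, StoreFamilySpi> byType = new TreeMap<>();

    // Registers every implementation listed under META-INF/services on the classpath.
    static StoreFamilyRegistry discover() {
        StoreFamilyRegistry registry = new StoreFamilyRegistry();
        for (StoreFamilySpi spi : ServiceLoader.load(StoreFamilySpi.class)) {
            registry.register(spi);
        }
        return registry;
    }

    // Rejects two plugins that claim the same store type.
    void register(StoreFamilySpi spi) {
        if (byType.putIfAbsent(spi.getType(), spi) != null) {
            throw new IllegalStateException("Duplicate store type: " + spi.getType());
        }
    }

    StoreFamilySpi get(String type) {
        StoreFamilySpi spi = byType.get(type);
        if (spi == null) {
            throw new IllegalArgumentException("Unknown store type: " + type + "; known: " + byType.keySet());
        }
        return spi;
    }
}
```

With a registry like this, the -t/--type value given to addstore can be validated against whichever store families happen to be on the classpath.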

Dimensionality Type for Indexing (IngestDimensionalityTypeProviderSpi)

Our generic multi-dimensional indexing is fed specific dimensionality definitions through this mechanism. It must stay pluggable because we have many other use cases in mind for multi-dimensional indexing; spatial and spatial-temporal are just two special cases. This mechanism is also what initially provides the system with an Index.

Commandline Operations (CLIOperationProviderSpi)

This is fundamentally how our commandline tool is provided with commands.
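A minimal sketch of how command providers might be assembled into the two-level `<section> <subcommand>` structure described above (for example `config addstore`). `OperationProvider` is a hypothetical stand-in for CLIOperationProviderSpi, and representing each operation as a `Consumer` of its remaining arguments is an assumption made only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for CLIOperationProviderSpi: maps "section subcommand" to an operation.
interface OperationProvider {
    Map<String, Consumer<List<String>>> getOperations();
}

final class CommandDispatcher {
    private final Map<String, Consumer<List<String>>> commands = new HashMap<>();

    // Merges all providers' commands, rejecting two providers that claim the same path.
    CommandDispatcher(Iterable<? extends OperationProvider> providers) {
        for (OperationProvider provider : providers) {
            provider.getOperations().forEach((path, op) -> {
                if (commands.putIfAbsent(path, op) != null) {
                    throw new IllegalStateException("Duplicate command: " + path);
                }
            });
        }
    }

    // Resolves "<section> <subcommand>" and passes the remaining arguments along.
    void dispatch(String... args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: <section> <subcommand> [args...]");
        }
        String path = args[0] + " " + args[1];
        Consumer<List<String>> op = commands.get(path);
        if (op == null) {
            throw new IllegalArgumentException("Unknown command: " + path);
        }
        op.accept(Arrays.asList(args).subList(2, args.length));
    }
}
```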

Ingest Formats (IngestFormatPluginProviderSpi)

This is how new file formats can easily be added to the ingest tool. It is also what initially provides the system with a DataAdapter.