Add a SQL support documentation #522

Open
wants to merge 4 commits into
base: master
3 changes: 3 additions & 0 deletions README.md
@@ -284,6 +284,9 @@ marketstore connect --url <address>
```
and run commands through the sql session.

### SQL Support
@dakimura (Contributor), Oct 26, 2021

Could you add the following to README.ja.md ? 🙏

### SQL Support
[SQL support](./docs/sql-support/sql-support.md) を参照してください。

See [SQL support](./docs/sql-support/sql-support.md)

## Plugins
Go plugin architecture works best with Go1.10+ on linux. For more on plugins, see the [plugins package](./plugins/) Some featured plugins are covered here -

14 changes: 14 additions & 0 deletions docs/sql-support/insert-statement.md
@@ -0,0 +1,14 @@
## INSERT Statements
@dakimura (Contributor), Oct 27, 2021

Please update the doc for INSERT as well according to my review comments above 🙇


```
INSERT INTO data_location select_statement;
```

#### Where `data_location` is a sub-directory of the `rootDirectory` set in your configuration file
```sql
-- example
INSERT INTO `gdax_BTC-USD/1D/OHLCV` SELECT * FROM `binance_BTC-USDT/1D/OHLCV`;
```
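When INSERT statements are generated from code, the easy mistake is the quoting shown in the example above: a bucket name that contains dashes has to be wrapped in backticks. The following is a minimal Python sketch of that composition. The helpers `quote_bucket` and `build_insert` are hypothetical names introduced for illustration; they are not part of marketstore or pymarketstore.

```python
def quote_bucket(name: str) -> str:
    """Wrap a bucket name in backticks so that dashes are accepted.

    Hypothetical helper for illustration; not a marketstore API.
    """
    if not name or "`" in name:
        raise ValueError(f"invalid bucket name: {name!r}")
    return f"`{name}`"


def build_insert(dest: str, select_statement: str) -> str:
    """Compose `INSERT INTO dest select_statement;` from its two parts."""
    # Drop any trailing semicolon so the result ends with exactly one.
    select = select_statement.strip().rstrip(";")
    return f"INSERT INTO {quote_bucket(dest)} {select};"


stmt = build_insert(
    "gdax_BTC-USD/1D/OHLCV",
    f"SELECT * FROM {quote_bucket('binance_BTC-USDT/1D/OHLCV')}",
)
print(stmt)
```

The resulting string could then be sent through the sql session or a client of your choice; the sketch only covers building the text of the statement.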

### Aggregate functions supported:
- TICKCANDLER
21 changes: 21 additions & 0 deletions docs/sql-support/select-statement.md
@@ -0,0 +1,21 @@
## SELECT Statements

```
SELECT [ ALL | DISTINCT ] select_expr [, ...]
Contributor

hmm I'm not sure if [ALL | DISTINCT] is supported or not...

[ FROM data_directory [, ...] ]
@dakimura (Contributor), Oct 26, 2021

[imo] data_directory -> bucket_name might be better because normal marketstore users (=SQL users) don't want to care where the data reside in marketstore but they just specify a bucket name when they use SQL

[ WHERE condition ]
[ { LIMIT [ count ] } ]
```

#### Where `data_directory` is a sub-directory of the `rootDirectory` set in your configuration file

``` sql
@dakimura (Contributor), Oct 27, 2021

I think example SQL per each syntax is helpful for readers.

-- select all columns from a bucket
» SELECT * FROM `gdax_BTC-USD/1D/OHLCV`

-- select some columns from a bucket
» SELECT Epoch, Open, High FROM `gdax_BTC-USD/1D/OHLCV`

-- WHERE BETWEEN 
» SELECT * FROM `gdax_BTC-USD/1D/OHLCV` WHERE Epoch BETWEEN '2021-09-10-12:30' AND '2021-10-26-08:40';

-- LIMIT
» SELECT * FROM `gdax_BTC-USD/1D/OHLCV` LIMIT 3;

-- example
SELECT * FROM `gdax_BTC-USD/1D/OHLCV`; --escape dashes by wrapping it with backticks
```
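The SELECT grammar above (column list, bucket, optional `WHERE ... BETWEEN`, optional `LIMIT`) can also be assembled programmatically. The sketch below is one possible way to do that in Python. `build_select` is a hypothetical helper, not a marketstore or pymarketstore function. The timestamp layout is an assumption, modelled on the `'YYYY-MM-DD-HH:MM:SS'` style literals that appear in examples in this thread.

```python
from datetime import datetime
from typing import Optional, Sequence

# Assumed timestamp layout, modelled on literals such as '2017-01-01-00:30:00'.
TS_FORMAT = "%Y-%m-%d-%H:%M:%S"


def build_select(
    bucket: str,
    columns: Sequence[str] = ("*",),
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a SELECT statement; hypothetical helper for illustration."""
    if (start is None) != (end is None):
        raise ValueError("start and end must be given together")
    # Backticks let bucket names contain dashes.
    parts = [f"SELECT {', '.join(columns)} FROM `{bucket}`"]
    if start is not None:
        parts.append(
            f"WHERE Epoch BETWEEN '{start.strftime(TS_FORMAT)}' "
            f"AND '{end.strftime(TS_FORMAT)}'"
        )
    if limit is not None:
        parts.append(f"LIMIT {int(limit)}")
    return " ".join(parts) + ";"


print(build_select("gdax_BTC-USD/1D/OHLCV", limit=3))
```

Keeping the clauses in a list makes it easy to leave out the optional parts, which mirrors the bracketed optional clauses in the grammar.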

#### Aggregate functions supported are the following:
@dakimura (Contributor), Oct 27, 2021

Some explanation of the aggregate functions would be helpful! Please refer to this design doc:
https://github.com/alpacahq/marketstore/blob/6eb74814a8feb4141e164d57b240dec4e530491b/docs/design/executor_design.txt

- COUNT
@dakimura (Contributor), Oct 27, 2021

some explanation and example aggfunc usage might be helpful for readers.
I wrote examples for each aggfunc here just now !

##### CANDLE CANDLER
Candle Candler converts the time range of candle data.

-- converts 1-Min candle data to 4-Hour candle data
» SELECT candlecandler('4H',Open,High,Low,Close) FROM `TEST/1Min/OHLCV`;

##### TICK CANDLER
Tick Candler converts tick data into candlestick data and returns it.

-- converts tick data to 2-hour candlestick data
» SELECT tickcandler('2H', Ask) FROM `TEST/1Min/Tick`;

##### MAX
MAX retrieves only the maximum value of the specified column.

» SELECT MAX (Ask) FROM `TEST/1Min/Tick`; 

##### MIN
MIN retrieves only the minimum value of the specified column.

» SELECT MIN (Bid) FROM `TEST/1Min/Tick`;

##### GAP
Gap returns records only where the time difference between adjacent records exceeds the specified time range.

» SELECT Gap ('10Sec') FROM `TEST/1Min/Tick`;

##### COUNT
Count returns the number of records that match the condition.

» SELECT Count (*) FROM `TEST/1Min/Tick` WHERE Epoch BETWEEN '2017-01-01-00:30:00' AND '2017-01-01-02:30:00';


For more specific usages in code, please see our integration tests:
https://github.com/alpacahq/marketstore/blob/654151cb4e028a300f4f74c6d0fe9132531491da/tests/integ/tests/test_aggcandler.py
https://github.com/alpacahq/marketstore/blob/654151cb4e028a300f4f74c6d0fe9132531491da/tests/integ/tests/test_basic_aggfunc.py

- AVG
- MAX
- MIN
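
To make the idea of a candle-building aggregate concrete, here is a small conceptual sketch in plain Python of what converting ticks into candlesticks computes: ticks are grouped into fixed-width time windows, and each window yields an open, high, low and close. This is an illustration only and not marketstore's implementation. The function name `ticks_to_candles`, the `(epoch_seconds, price)` input shape, and the alignment of windows to multiples of the width are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def ticks_to_candles(
    ticks: Iterable[Tuple[int, float]], width_sec: int
) -> List[Dict[str, float]]:
    """Group (epoch_seconds, price) ticks into fixed-width OHLC candles.

    Conceptual illustration only; not marketstore's implementation.
    """
    candles: Dict[int, Dict[str, float]] = {}
    # Sort by time only; the sort is stable, so same-epoch ticks keep their order.
    for epoch, price in sorted(ticks, key=lambda t: t[0]):
        start = epoch - epoch % width_sec  # window start (assumed alignment)
        candle = candles.get(start)
        if candle is None:
            candles[start] = {
                "Epoch": start, "Open": price, "High": price,
                "Low": price, "Close": price,
            }
        else:
            candle["High"] = max(candle["High"], price)
            candle["Low"] = min(candle["Low"], price)
            candle["Close"] = price  # latest tick seen in this window
    return [candles[key] for key in sorted(candles)]


ticks = [(0, 10.0), (30, 12.0), (59, 11.0), (60, 9.0), (90, 9.5)]
for candle in ticks_to_candles(ticks, 60):
    print(candle)
```

With a 60-second width, the five sample ticks fall into two windows, one starting at epoch 0 and one at epoch 60.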
10 changes: 10 additions & 0 deletions docs/sql-support/sql-support.md
@@ -0,0 +1,10 @@
# SQL Support
@dakimura (Contributor), Oct 27, 2021

[imo] Probably readers of this document want to know what SQL support is and how to use it, so how about adding some basic explanation and usage first? and used technologies can be mentioned later.

[Usage]

  • CLI
$ marketstore connect --url localhost:5993
{"level":"info","timestamp":"2021-10-27T09:04:16.614+0900","msg":"Running single threaded"}
Connected to remote instance at: http://localhost:5993
Type `\help` to see command options
» SQL goes here
  • pymarketstore
>>> import numpy as np, pandas as pd, pymarketstore as pymkts
>>> cli = pymkts.Client()
>>> reply = cli.sql("SELECT * FROM `USDJPY/1Min/OHLCV` WHERE Epoch Between '2018-01-01' AND '2018-01-02';")
>>> reply.first().df()
                                 Open        High         Low       Close  Volume
Epoch                                                                            
2018-01-01 16:00:00+00:00  112.579002  112.589996  112.579002  112.585999      61
2018-01-01 16:01:00+00:00  112.586998  112.597000  112.585999  112.592003     109
2018-01-01 16:02:00+00:00  112.593002  112.597000  112.588997  112.589996      38
2018-01-01 16:03:00+00:00  112.592003  112.642998  112.589996  112.637001     124
2018-01-01 16:04:00+00:00  112.638000  112.657997  112.637001  112.653000     136
...                               ...         ...         ...         ...     ...
2018-01-01 23:55:00+00:00  112.666000  112.667999  112.660004  112.667999      32
2018-01-01 23:56:00+00:00  112.668999  112.681999  112.667000  112.672997      46
2018-01-01 23:57:00+00:00  112.675003  112.678001  112.663002  112.671997      60
2018-01-01 23:58:00+00:00  112.668999  112.677002  112.664001  112.667000      49
2018-01-01 23:59:00+00:00  112.666000  112.667000  112.657997  112.663002      48
[464 rows x 5 columns]


The SQL interpreter uses ANTLR4 in order to generate a parse tree and evaluate the given SQL statements within the CLI application.

To do this, ANTLR4 requires a lexer file and a grammar file. The lexer rules are in `sqlparser/parser/SQLLexerRules.g4` and the grammar rules are in `sqlparser/parser/SQLBase.g4`.

The SQL grammar is mainly based on that of Facebook's Presto DB. SQL support is currently limited to the following Data Manipulation Language (DML) statements, since market data is structured data and is likely to stay that way (contributions are welcome if you want to expand the current support):

- [SELECT statement](./select-statement.md)
- [INSERT statement](./insert-statement.md)