Releases: mshuaic/Blockchain_med
Baseline7 & 8
- Baseline8
Modified the AND query algorithm.
Now the AND query queries only one stream, then sorts the result in memory.
How to pick the stream?
Priority: the stream with the smaller result set has higher priority.
For example: the user queries (timestamp AND id). The Timestamp stream
normally returns only one result and the ID stream returns four results,
so the Timestamp stream has higher priority than the ID stream. We should query the
Timestamp stream, get the result, and sort it in memory (see the sketch below).
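A minimal sketch of this stream-selection idea, assuming hypothetical helpers count_items(stream, key) and query_stream(stream, key) that wrap the node's stream-query RPC calls; the record field names are also illustrative.

```python
# Sketch only: count_items and query_stream are hypothetical wrappers around
# the node's stream-query RPC calls; record field names are illustrative.
def and_query(conditions):
    """conditions: dict mapping stream name -> key, e.g. {"timestamp": "1530200000", "id": "42"}."""
    # Query the stream expected to return the smallest result set.
    best_stream, best_key = min(conditions.items(),
                                key=lambda kv: count_items(kv[0], kv[1]))
    candidates = query_stream(best_stream, best_key)

    # Complete the AND locally: keep records matching the remaining conditions,
    # then sort the final result in memory (here: by the timestamp field).
    result = [rec for rec in candidates
              if all(rec.get(stream) == key
                     for stream, key in conditions.items()
                     if stream != best_stream)]
    return sorted(result, key=lambda rec: rec["timestamp"])
```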
- Baseline7
Baseline6 + database normalization
Baseline 5 & 6
- Baseline6
New feature found
In a JSON-RPC call, we can batch multiple calls into an array and send it to the server once.
Now, instead of sending multiple calls, which bottlenecks performance (latency
= # of calls * network latency), we batch multiple calls and send them only
once. The server deserializes the calls locally, so the former bottleneck
is no longer a problem. Based on this idea, we reconstructed the transaction structure:
- Before: We write the same data into multiple streams in one transaction.
Every vout of the transaction contains the same data, so the data is duplicated
many times in that transaction.
- After: Write the actual data into one stream, and write empty data to the
other streams in the same transaction. All those stream items share the
same txid, so we can use getstreamitem to retrieve the item from the first
stream after finding that txid by listing the other stream items. The result
often contains multiple txids, and we use a batch call to query the actual data.
Note: this approach is similar to the unique-ID approach. See 06/28/2018.
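As an illustration of the batching idea, here is a minimal sketch that sends several getstreamitem calls in one JSON-RPC batch request; the node URL, credentials, stream name, and txids are placeholders, not the project's actual configuration.

```python
import requests

# Placeholder node address and credentials (assumptions, not the project's actual config).
RPC_URL = "http://127.0.0.1:8570"
RPC_AUTH = ("rpcuser", "rpcpassword")

def batch_getstreamitem(stream, txids):
    """Fetch many stream items in a single HTTP round trip using a JSON-RPC batch."""
    batch = [{"jsonrpc": "1.0",
              "id": i,
              "method": "getstreamitem",
              "params": [stream, txid]}
             for i, txid in enumerate(txids)]
    resp = requests.post(RPC_URL, json=batch, auth=RPC_AUTH)
    resp.raise_for_status()
    # The server returns one response object per request; keep only the results.
    return [r["result"] for r in resp.json()]

# Example usage with hypothetical txids found by listing the other streams:
# items = batch_getstreamitem("data_stream", ["txid1...", "txid2..."])
```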
- Baseline5
Build a multi-level indexing structure for timestamps on the blockchain:
- 1-level: timestamp gap = 10000. At this level, one key stores about ten values.
- 2-level: timestamp gap = 1-level gap * 100. At this level, one key stores 100 1-level key-value records.
- n-level: timestamp gap = (n-1)-level gap * 100. One key stores 100 (n-1)-level key-value records.
The number of levels is determined by the range of the timestamps. (NEED TO DO MORE TESTS LATER)
Now, a range query gets a batch of records at once (see the sketch below).
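A small sketch of how one timestamp maps to a bucket key at each level, following the gap sizes above; the key format "L&lt;level&gt;-&lt;bucket start&gt;" is an assumption for illustration.

```python
BASE_GAP = 10000   # 1-level gap from the description above
FANOUT = 100       # each higher level groups 100 lower-level buckets

def index_keys(timestamp, levels):
    """Return the bucket key for each level of the multi-level timestamp index."""
    keys = []
    gap = BASE_GAP
    for level in range(1, levels + 1):
        bucket_start = (timestamp // gap) * gap
        # Hypothetical key format: "L<level>-<bucket start>"
        keys.append(f"L{level}-{bucket_start}")
        gap *= FANOUT
    return keys

# Example: index_keys(1530212345, levels=3)
# -> ['L1-1530210000', 'L2-1530000000', 'L3-1500000000']
```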
baseline4
- Baseline4 (database normalization)
Most lines' Ref-ID refers back to the same original ID, which means those
lines' User and Resource are the same. For this reason, we can exclude User
and Resource from the transaction. When querying by User or Resource, Baseline4
first gets the original Node+ID, and then uses Node+ID to query the additional
results and unions them (see the sketch below).
- In-memory solution: First query Node, then extract the lines that
have a matching ID.
- On-blockchain solution: Create an additional Node+Ref-ID stream whose
key is Node+Ref-ID and value is the log record. We can query Node+ID =
Node+Ref-ID to get the additional result.
Obviously, the former solution uses more memory (we do not yet have the ability to benchmark memory), and the latter solution requires more storage.
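A minimal sketch of the on-blockchain union step, assuming a hypothetical query_stream(stream, key) wrapper and illustrative stream names node_id and node_refid.

```python
# Sketch of the on-blockchain solution; query_stream is a hypothetical wrapper
# around the node's stream-query RPC, and the stream names are illustrative.
def query_by_node_and_id(node, record_id):
    key = f"{node}+{record_id}"
    original = query_stream("node_id", key)      # lines carrying User/Resource
    referring = query_stream("node_refid", key)  # lines whose Ref-ID points back to it
    # Combine both result sets (the union of original and referring lines).
    return original + referring
```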
Baseline3
- (Done) Baseline3
- Using multiple streams to insert data into the blockchain:
One line in the log data -> one transaction, and the transaction adds data to
multiple streams (tables) atomically. It reduces the number of transactions to 1/7
(7 is the number of attributes). As a result, the insertion time and storage are
both decreased dramatically. See the figure below; a sketch of the idea follows.
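A minimal sketch of writing one log line to several streams in a single transaction, assuming MultiChain's createrawsendfrom with a data array; the node address, credentials, from-address, and stream layout are placeholders, and the exact data-item fields may vary by MultiChain version.

```python
import json
import requests

RPC_URL = "http://127.0.0.1:8570"      # placeholder node address (assumption)
RPC_AUTH = ("rpcuser", "rpcpassword")  # placeholder credentials (assumption)

def rpc(method, params):
    payload = {"jsonrpc": "1.0", "id": 0, "method": method, "params": params}
    resp = requests.post(RPC_URL, json=payload, auth=RPC_AUTH)
    resp.raise_for_status()
    return resp.json()["result"]

def insert_line(from_address, line, attributes):
    """Publish one log line to one stream per attribute, atomically in one transaction."""
    data_items = [{"for": stream,                            # one stream item per attribute
                   "key": str(value),                        # attribute value as the stream key
                   "data": json.dumps(line).encode().hex()}  # the whole line as hex payload
                  for stream, value in attributes.items()]
    # "send" asks the node to sign and broadcast the raw transaction in one step.
    return rpc("createrawsendfrom", [from_address, {}, data_items, "send"])
```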
- Build an indexing solution for timestamp:
- Retrieve all timestamps from Blockchain
- Build a sorted list for the timestamps
Now we are able to do fast range queries; however, this solution builds an index
table in memory, which is prohibited by the competition. We just use it as
an experiment for now (see the sketch below).
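A sketch of this in-memory experiment, assuming each retrieved item carries a numeric timestamp; bisect on the sorted list gives the fast range query.

```python
import bisect

class TimestampIndex:
    """In-memory index: a sorted list of (timestamp, record) pairs built from the chain."""

    def __init__(self, items):
        # items: iterable of (timestamp, record) pairs retrieved from the blockchain
        self.entries = sorted(items, key=lambda e: e[0])
        self.timestamps = [e[0] for e in self.entries]

    def range_query(self, start, end):
        """Return all records with start <= timestamp <= end."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return [record for _, record in self.entries[lo:hi]]
```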
- (Done) Sorting function
We are able to sort the result of a query. The test only used 400 records, so
it does not show any significant difference in time. We will increase the number
of records later.
v0.2.1
Baseline Implementation v0.2
- added hash pointer solution
baseline implementation v0.1.1
- cleaned up the code
- restructured the file structure
baseline implementation
Baseline implementation version 0.1
- Insertion: insert n copies (n = number of attributes) into the blockchain, using each attribute as the key and the entire line as the value
- Range query: query from the start time and increase the timestamp by 1 each time until the end time. The total number of queries needed is (end - start)
- AND operation: query using a single attribute and do the AND operation locally (see the sketch below)
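A minimal sketch of the v0.1 query logic, assuming a hypothetical query_by_key(value) wrapper that returns the log lines stored under a given key on the blockchain.

```python
# Sketch of the v0.1 approach; query_by_key is a hypothetical wrapper that
# returns the lines stored under a given key on the blockchain.
def range_query_v01(start, end):
    """Query every timestamp from start to end, one call per timestamp."""
    results = []
    for ts in range(start, end + 1):
        results.extend(query_by_key(str(ts)))
    return results

def and_query_v01(value_a, value_b):
    """Query each attribute separately and intersect the results locally."""
    lines_a = set(query_by_key(value_a))
    lines_b = set(query_by_key(value_b))
    return lines_a & lines_b
```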