What the heck is "isvariablelength" for? Nanoseconds wat? #325
Comments
Maybe also a small demonstration of the client behaviors:
[nav] In [32]: client.write(np.array([(pd.Timestamp("2016-01-01 10:00:00").value/10**9, 3, 140000)], dtype=[('Epoch', 'i8'), ('Bid', 'f4'), ('Nanoseconds', 'i4')]),'Monkey/1Sec/TICK', isvariablelength=False),
Out[32]: ({'responses': None},)
[nav] In [33]: client.query(pymarketstore.Params('Monkey', '1Sec', 'TICK')).first().df()
Out[33]:
Bid Nanoseconds
Epoch
2016-01-01 10:00:00+00:00 3.0 140000
[nav] In [34]: client.write(np.array([(pd.Timestamp("2016-01-01 10:00:00").value/10**9, 3, 140001)], dtype=[('Epoch', 'i8'), ('Bid', 'f4'), ('Nanoseconds', 'i4')]),'Monkey/1Sec/TICK', isvariablelength=False),
Out[34]: ({'responses': None},)
[nav] In [35]: client.query(pymarketstore.Params('Monkey', '1Sec', 'TICK')).first().df()
Out[35]:
Bid Nanoseconds
Epoch
2016-01-01 10:00:00+00:00 3.0 140001
So if you never wrote
[ins] In [45]: client.write(np.array([(pd.Timestamp("2016-01-01 10:00:00").value/10**9, 3)], dtype=[('Epoch', 'i8'), ('Bid', 'f4'),]),'Monkey_NO_NANO/1Sec/TICK', isvariablelength=False),
Out[45]: ({'responses': None},)
[ins] In [46]: client.query(pymarketstore.Params('Monkey_NO_NANO', '1Sec', 'TICK')).first().df()
Out[46]:
Bid
Epoch
2016-01-01 10:00:00+00:00 3.0
[nav] In [37]: client.write(np.array([(pd.Timestamp("2016-01-01 10:00:00").value/10**9, 3, 140000)], dtype=[('Epoch', 'i8'), ('Bid', 'f4'), ('Nanoseconds', 'i4')]),'APPEND/1Sec/TICK', isvariablelength=True)
Out[37]: {'responses': None}
[ins] In [39]: client.write(np.array([(pd.Timestamp("2016-01-01 10:00:00").value/10**9, 3, 140001)], dtype=[('Epoch', 'i8'), ('Bid', 'f4'), ('Nanoseconds', 'i4')]),'APPEND/1Sec/TICK', isvariablelength=True),
Out[39]: ({'responses': None},)
[ins] In [40]: client.query(pymarketstore.Params('APPEND', '1Sec', 'TICK')).first().df()
Out[40]:
Bid Nanoseconds
Epoch
2016-01-01 10:00:00+00:00 3.0 140000
2016-01-01 10:00:00+00:00 3.0 140001
But wait let's continue with that and find our magic
[nav] In [41]: client.write(np.array([(pd.Timestamp("2016-01-01 10:00:00").value/10**9, 3)], dtype=[('Epoch', 'i8'), ('Bid', 'f4'),]),'APPEND/1Sec/TICK', isvariablelength=True),
Out[41]: ({'responses': None},)
[nav] In [42]: client.query(pymarketstore.Params('APPEND', '1Sec', 'TICK')).first().df()
Out[42]:
Bid Nanoseconds
Epoch
2016-01-01 10:00:00+00:00 3.0 0
2016-01-01 10:00:00+00:00 3.0 140000
2016-01-01 10:00:00+00:00 3.0 140001
^ That doesn't happen if you use
[ins] In [44]: client.write(np.array([(pd.Timestamp("2016-01-01 10:00:00").value/10**9, 3)], dtype=[('Epoch', 'i8'), ('Bid', 'f4'),]),'Monkey/1Sec/TICK', isvariablelength=True),
Out[44]:
({'responses': [{'error': 'unable to match data columns ([{Epoch INT64} {Bid FLOAT32}]) to bucket columns ([{Epoch INT64} {Bid FLOAT32} {Nanoseconds INT32}])',
'version': '34352c9738c9164d7c65264a532d99341c57fae2'}]},)
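To keep the four cases above straight, here is a minimal toy model of what the session shows. `ToyBucket` is a hypothetical in-memory stand-in, not marketstore code, and it only encodes the behaviour observed in the outputs: a fixed-length bucket replaces the record at an existing `Epoch`, a variable-length bucket appends, a missing `Nanoseconds` reads back as `0` in the variable-length bucket, and a column mismatch is rejected.

```python
class ToyBucket:
    """In-memory stand-in for one bucket such as 'Monkey/1Sec/TICK'.

    Hypothetical model of the behaviour observed above, not marketstore code.
    """

    def __init__(self, columns, variable_length):
        self.columns = list(columns)
        self.variable_length = variable_length
        self.rows = []

    def write(self, row):
        row = dict(row)
        if self.variable_length and "Nanoseconds" in self.columns:
            # Observed on APPEND: a record written without Nanoseconds
            # is accepted and read back with Nanoseconds == 0.
            row.setdefault("Nanoseconds", 0)
        if set(row) != set(self.columns):
            raise ValueError("unable to match data columns to bucket columns")
        if not self.variable_length:
            # Observed on Monkey: a second write at the same Epoch
            # replaces the earlier record.
            self.rows = [r for r in self.rows if r["Epoch"] != row["Epoch"]]
        self.rows.append(row)
        self.rows.sort(key=lambda r: (r["Epoch"], r.get("Nanoseconds", 0)))


EPOCH = 1451642400  # 2016-01-01 10:00:00 UTC in seconds

monkey = ToyBucket(["Epoch", "Bid", "Nanoseconds"], variable_length=False)
appended = ToyBucket(["Epoch", "Bid", "Nanoseconds"], variable_length=True)
for bucket in (monkey, appended):
    bucket.write({"Epoch": EPOCH, "Bid": 3.0, "Nanoseconds": 140000})
    bucket.write({"Epoch": EPOCH, "Bid": 3.0, "Nanoseconds": 140001})
appended.write({"Epoch": EPOCH, "Bid": 3.0})

print([r["Nanoseconds"] for r in monkey.rows])    # [140001]
print([r["Nanoseconds"] for r in appended.rows])  # [0, 140000, 140001]
```

Writing `{"Epoch": EPOCH, "Bid": 3.0}` to `monkey` raises in this model, which mirrors the "unable to match data columns" response in the last call.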
This was referenced Jun 9, 2020
I'm looking at the relevant server code sections:
You know what's handy, putting the name `Append` somewhere in the func name 😉
Secondly,
Also it's probably worth mentioning that there's some kind of relationship with the `Nanoseconds` column. I got real confused when using the client to do stuff and got weird different numbers back depending on whether I wrote the `Nanoseconds` field and used `isvariablelength` (look, the unit tests and me are the same 🥰).
That is, if `isvariablelength` is set:
`ColumnSeries` to disk, the `'Nanoseconds'` is removed before row conversion, which appears to be to avoid a mismatch error when comparing column field names of the previous `tbi` (time bucket info) 🤔
`RowSeries`:
marketstore/utils/io/columnseries.go
Line 235 in 7537537
`rowType` isn't passed through, so the `DataShape{"Nanoseconds", INT32})` is never appended to the `dataShapes` array which makes one wonder: "why aren't we passing it through because we need it when we read the
`NewRowSeries` comment:
`ColumnSeries.GetTime()` is called just before all this and later passed to `Writer.WriteRecords()`..
Ok so let's stop and think here.
We're removing `Nanoseconds` because, before writing to disk, we convert `ColumnSeries` -> `RowSeries` without passing through the `rowType` flag, which would make `NewRowSeries` add the `'Nanoseconds'` `DataShape` which we apparently need:
But really it's because we already got the `Nanoseconds` out and are passing it as a `[]time.Time` to `Writer.WriteRecords()`?
Uh, ok so I guess because the read() means when `GetTime()` gets called, or?
Again, comment says we need this `Nanoseconds` field for "reading" and `GetTime()` seems to need it for generating a `[]time.Time` output, if there is a `Nanoseconds` column. Well that's good because (as mentioned in last bullet ^) we are calling it then handing it to `Writer.WriteRecords()`.
Let's note that the `ColumnSeriesMap.FilterColumns()` method requires `Nanoseconds` as part of the index.
Ok where are we again?
`ColumnSeriesMap` needs `Nanoseconds` for the index but `ColumnSeries` doesn't
`ColumnSeries` in the `ColumnSeriesMap` and remove the `Nanoseconds` fields because when converting to a `RowSeries` we don't pass through the `rowType` flag which would add that `DataShape` for `Nanoseconds`
`times := cs.GetTime()` and eventually pass that to the writer routine whilst documenting in that method that we need `Nanoseconds` in for reading
🤯. So this all seems pretty circular.
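One way to picture why the times can be pulled out before the `Nanoseconds` column is dropped: a full timestamp is just `Epoch` seconds plus a nanosecond remainder. The sketch below is a plain-Python illustration of that split and recombination. `split_timestamp` and `get_times` are hypothetical names, and this is an assumption about what `GetTime()` conceptually does, not a port of the Go code.

```python
def split_timestamp(ns_since_epoch):
    """Split integer nanoseconds since the Unix epoch into (Epoch, Nanoseconds)."""
    return divmod(ns_since_epoch, 10**9)


def get_times(epochs, nanoseconds=None):
    """Recombine Epoch seconds with an optional Nanoseconds column.

    Returns integer nanoseconds since the Unix epoch. When there is no
    Nanoseconds column, every remainder is assumed to be 0.
    """
    if nanoseconds is None:
        nanoseconds = [0] * len(epochs)
    return [e * 10**9 + n for e, n in zip(epochs, nanoseconds)]


epoch, nanos = split_timestamp(1451642400000140001)
print(epoch, nanos)                  # 1451642400 140001
print(get_times([epoch], [nanos]))   # [1451642400000140001]
print(get_times([epoch]))            # [1451642400000000000]
```

Once the combined times exist, the separate `Nanoseconds` column carries no extra information, which would be consistent with removing it before the row conversion.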
Alright let's go back to what we were doing. Right, `WriteCSM()`, we're writing our `ColumnSeriesMap` to disk!
marketstore/executor/writer.go
Line 282 in 88008f2
Ok so if a `tbi` don't exist and `isVariableLength` is set, we're gonna pass `recordType=io.VARIABLE` to `io.NewTimeBucketInfo()`.
So we have a `ColumnSeries` with no `Nanoseconds` `DataShape` (if `isVariableLength` is set) and we're making a new `TimeBucketInfo` with a "variable length" meaning this stuff gets set:
marketstore/utils/io/metadata.go
Lines 85 to 87 in 4811cc6
Cool, let's go back to `WriteCSM()`...
marketstore/executor/writer.go
Lines 314 to 335 in 88008f2
`GetDataShapesWithEpoch()` calls `TimeBucketInfo.GetDataShapes()` which builds a `DataShapeVector` that we're gonna compare against the same returned from the `ColumnSeries` that we just converted to a `RowSeries` which we're actually going to write to disk.
So if the columns in the `TimeBucketInfo` and the `ColumnSeries` match, we're golden and ready to write to disk the new `RowSeries` we just rendered.
Ok so now `Writer.Write()` gets called with the `RowSeries` and sends a command to another channel to write the data to disk.
So everything should be fine?
`Nanoseconds` is written to disk when `isVariableLength` is set but that's because it always is even if `isVariableLength` is false?
That seems to fit with the testing comments minus some mysterious precision problem.
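The column comparison described above (the bucket's shapes versus the shapes of the series being written) can be sketched in Python as follows. This is a hedged illustration only: `format_shapes` and `check_columns` are hypothetical names, and the exact, order-sensitive comparison is an assumption. The message format is copied from the error response in the client demonstration.

```python
def format_shapes(shapes):
    """Render [(name, type), ...] in the style of the server's error message."""
    return "[" + " ".join("{%s %s}" % (name, typ) for name, typ in shapes) + "]"


def check_columns(data_shapes, bucket_shapes):
    """Raise if the data columns differ from the bucket's columns.

    Assumption: an exact, order-sensitive comparison. The real server
    check may be more lenient.
    """
    if list(data_shapes) != list(bucket_shapes):
        raise ValueError(
            "unable to match data columns (%s) to bucket columns (%s)"
            % (format_shapes(data_shapes), format_shapes(bucket_shapes))
        )


bucket = [("Epoch", "INT64"), ("Bid", "FLOAT32"), ("Nanoseconds", "INT32")]
data = [("Epoch", "INT64"), ("Bid", "FLOAT32")]

check_columns(bucket, bucket)  # matching columns: no error
try:
    check_columns(data, bucket)
except ValueError as exc:
    message = str(exc)
    print(message)
```

With these inputs the printed message has the same text as the `'error'` field returned when writing a record without `Nanoseconds` to `Monkey/1Sec/TICK`.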
But then I found this rewritebuffer.go and started getting worried:
marketstore/executor/rewritebuffer.go
Lines 12 to 27 in 88008f2
Oh man there's more `isVariableLength` stuff 😿
It turns out that's used when reading back data for queries...that explains that test that doesn't work.
So as far as I can tell (which is really really questionable) it looks like `Nanoseconds` written by the client are always written by `marketstore` to disk regardless of `isvariablelength` (still unclear why that is), and when you read back those same records, the re-write buffer is calculating its own `Nanoseconds` (if it needs to?), but iff `isvariablelength=True` do you always read back a `Nanoseconds` field regardless of whether you wrote one in the first place?
Summary
`isvariablelength` should be documented as an append operation and then maybe even make that a separate `Client.append()` method?
`Nanoseconds` are always written as a field if you use `isvariablelength=True` (despite the comments and server code making it super confusing.. 😢)
`Nanoseconds` values are written by some client tests storing tick data
`Nanoseconds` for orders tracking
PS
Sorry about the long write up but I tend to want to get to know the projects I'm eyeing up seriously for production use 👍
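For the `Client.append()` idea in the summary, a rough sketch of what a thin wrapper could look like. The `append` helper is hypothetical and does not exist in pymarketstore. `RecordingClient` is a stub that only mirrors the `write(data, tbk, isvariablelength=...)` call shape used in the session, so the example runs without a server.

```python
class RecordingClient:
    """Stub standing in for pymarketstore.Client; it only records calls."""

    def __init__(self):
        self.calls = []

    def write(self, data, tbk, isvariablelength=False):
        self.calls.append((data, tbk, isvariablelength))
        return {"responses": None}


def append(client, data, tbk):
    """Hypothetical helper: write `data` to bucket `tbk` as an append."""
    return client.write(data, tbk, isvariablelength=True)


client = RecordingClient()
result = append(client, [(1451642400, 3.0, 140000)], "APPEND/1Sec/TICK")
print(result)
print(client.calls[0][2])
```

Naming the operation after what it does would keep the `isvariablelength` flag out of calling code entirely.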