-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix row numbers in the left column + add tests #26
Conversation
src/HighTable.tsx
Outdated
@@ -49,15 +49,15 @@ type State = { | |||
columnWidths: Array<number | undefined> // width of each column | |||
invalidate: boolean // true if the data must be fetched again | |||
hasCompleteRow: boolean // true if at least one row is fully resolved (all of its cells) | |||
rows: AsyncRow[] // slice of the virtual table rows (sorted rows) to render as HTML | |||
rows: (Row | undefined)[] // slice of the virtual table rows (sorted rows) to render as HTML - undefined means pending |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this change from AsyncRow to Row?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[if I understand well...]
In master, when we set rows with SET_ROWS
, it's an array of Row
objects, not of AsyncRow
objects, see
Line 186 in da40bff
const resolved: Row[] = [] |
Line 200 in da40bff
resolved.push(resolvedRow) |
I think something is slightly confusing in master: the local rows
variable in useEffect
is effectively AsyncRow[]
while the state field rows
is Row[]
. That's why I renamed
Line 183 in da40bff
const rows = asyncRows(unwrapped, end - start, data.header) |
to
Line 183 in 727168c
const rowsChunk = asyncRows(unwrapped, end - start, data.header) |
Also: when we use the state rows
in the JSX code, resolved Row
objects are expected:
Lines 390 to 401 in da40bff
return <tr key={tableIndex} title={rowError(row, dataIndex)} className={isSelected({ selection, index: tableIndex }) ? 'selected' : ''}> | |
<td style={cornerStyle} onClick={event => onRowNumberClick({ useAnchor: event.shiftKey, tableIndex })}> | |
<span>{ | |
/// TODO(SL): we might want to show two columns: one for the tableIndex (for selection) and one for the dataIndex (to refer to the original data ids) | |
rowLabel(dataIndex) | |
}</span> | |
<input type='checkbox' checked={isSelected({ selection, index: tableIndex })} /> | |
</td> | |
{data.header.map((col, colIndex) => | |
Cell(row[col], colIndex, dataIndex) | |
)} | |
</tr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still thinking about this.
First of all the behavior now is basically: use the index if present, but otherwise just give a normal 1,2,3 row ordering, even if there is an orderBy. But if the __index__
does exists, then use it. Absolutely requiring the index feels like it might be impossible in some situations (can imagine a table provider that does SQL queries against a backend, or other variations). Could be hacked around by annotating with 1,2,3 indexes but that felt like it was adding more work to the simple-case, and it would be more flexible to let the table component handle indexed-or-not. The example in the README would need to change to add the __index__
column being returned if this change is merged.
This gave me an idea: If we do add the __index__
to AsyncRow
(as done in this PR), and change DataFrame.rows()
to always return type AsyncRow[]
and not | Promise<Row[]>
. This would simplify many code branches. It would make it slightly harder to write a DataFrame implementation, but we could fix that by providing helpers (for example see the asyncRows
helper function) that would also annotate with indexes if needed? So the simple case would be like:
const df = {
header, numRows,
rows(start, end, orderBy): AsyncRow[] {
const rows = [{}, {}, {}, ...]
return asyncRows(rows, start)
},
}
But for advanced use cases like parquet, we can return our own AsyncRow[]
with full control over when cells are loaded (and the __index__
).
Will continue thinking about best approach.
Yes, it seems like a good simplification. Thinking out loud: looking at the code in master (and in hyperparam-cli), we always resolve all the columns when we process a given row. I understand that AsyncRow gives the ability to update only some cells in a row, but do we want to provide this level of flexibility? If not, and iterating on your idea above, we could have
with
|
what should we do for the double click on a cell in that case? Disable it? I think that for this PR, I'll apply the alternative #2:
so that we can still ship the tests, and a11y improvements, and then work on the index in another PR |
if __index__ is not present in sorted rows, we: - don't show the number in the left column - cancel the callbacks on mouse down click and double click on a cell
This reverts commit 9769e68.
if (!('__index__' in wrapped) && '__index__' in row && typeof row.__index__ === 'number') { | ||
wrapped[i].__index__ = resolvablePromise<number>() | ||
wrapped[i].__index__.resolve(row.__index__) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part fixes a bug: __index__
was ignored if using asyncRows()
Requesting a review again @platypii :) It breaks less things and is acceptable too, I think |
Ability to fill in individual cells is required behavior. The whole point is to allow cells to fill in when they are ready. For parquet, since it is column-oriented, small columns might fill in before large text columns and I don't want to have to wait for the entire row to resolve (not implemented in cli yet, but that is planned). Also some columns might be derived, and take time to compute, like with the llm transform demo. This would look really bad if every row was blank until every column was ready: column.loading.mp4Which is why, if anything, I think we should make the DataFrame always return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look good now, while we debate other refactors :-)
Excellent, thanks for the details. Indeed we want cell level resolution. |
Yea this is a good point which I don't have a great answer to. I could be convinced on requiring the |
Add
__index__: number
as a mandatory field of the rows. It's a breaking change, but when data is sorted, we need to be sure (and the types should reflect it) that the row index is provided to:Two alternatives:
__index__
in the rows only when orderBy is not undefined, because otherwise we already have the index. It's what is implemented in the helper functionsortableDataFrame
, and we could simply add type with function overloads. It would be a breaking change anyway since we don't force anything at the moment, but possibly less impactful.Also in this PR:
<td>
to<th>
as it's semantically better, I think. But we can revert if it's better to avoid breaking the styles.