Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix row numbers in the left column + add tests #26

Merged
merged 15 commits into from
Jan 15, 2025
Merged

Fix row numbers in the left column + add tests #26

merged 15 commits into from
Jan 15, 2025

Conversation

severo
Copy link
Contributor

@severo severo commented Jan 14, 2025

Add __index__: number as a mandatory field of the rows. It's a breaking change, but when data is sorted, we need to be sure (and the types should reflect it) that the row index is provided to:

  • show the row number in the left column
  • be able to show the cell content when double clicking

Two alternatives:

  • force __index__ in the rows only when orderBy is not undefined, because otherwise we already have the index. It's what is implemented in the helper function sortableDataFrame, and we could simply add type with function overloads. It would be a breaking change anyway since we don't force anything at the moment, but possibly less impactful.
  • don't break things, and simply remove features (no row number displayed in the left column, and no double click/mouse down callbacks) if data is sorted and no index is provided

Also in this PR:

  • I added some aria attributes as testing-library primarily relies on roles and accessible information (by opposition to the specific organization of the DOM), and as adopting its way of thinking helps build a more accessible component
  • I added tests on column sort and mouse double-click, to be sure the correct row index is sent.
  • [Breaking (the CSS)] I changed the left cells (row numbers) from <td> to <th> as it's semantically better, I think. But we can revert if it's better to avoid breaking the styles.

@severo severo changed the title add aria attributes, change left cells from td to th, add test on col… add aria attributes, change left cells from td to th, add test on column sort Jan 14, 2025
@severo severo marked this pull request as draft January 14, 2025 08:49
@@ -49,15 +49,15 @@ type State = {
columnWidths: Array<number | undefined> // width of each column
invalidate: boolean // true if the data must be fetched again
hasCompleteRow: boolean // true if at least one row is fully resolved (all of its cells)
rows: AsyncRow[] // slice of the virtual table rows (sorted rows) to render as HTML
rows: (Row | undefined)[] // slice of the virtual table rows (sorted rows) to render as HTML - undefined means pending
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change from AsyncRow to Row?

Copy link
Contributor Author

@severo severo Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[if I understand well...]

In master, when we set rows with SET_ROWS, it's an array of Row objects, not of AsyncRow objects, see

const resolved: Row[] = []

resolved.push(resolvedRow)

I think something is slightly confusing in master: the local rows variable in useEffect is effectively AsyncRow[] while the state field rows is Row[]. That's why I renamed

const rows = asyncRows(unwrapped, end - start, data.header)

to

const rowsChunk = asyncRows(unwrapped, end - start, data.header)

Also: when we use the state rows in the JSX code, resolved Row objects are expected:

hightable/src/HighTable.tsx

Lines 390 to 401 in da40bff

return <tr key={tableIndex} title={rowError(row, dataIndex)} className={isSelected({ selection, index: tableIndex }) ? 'selected' : ''}>
<td style={cornerStyle} onClick={event => onRowNumberClick({ useAnchor: event.shiftKey, tableIndex })}>
<span>{
/// TODO(SL): we might want to show two columns: one for the tableIndex (for selection) and one for the dataIndex (to refer to the original data ids)
rowLabel(dataIndex)
}</span>
<input type='checkbox' checked={isSelected({ selection, index: tableIndex })} />
</td>
{data.header.map((col, colIndex) =>
Cell(row[col], colIndex, dataIndex)
)}
</tr>

src/HighTable.tsx Outdated Show resolved Hide resolved
@severo severo requested a review from platypii January 14, 2025 22:38
@severo severo changed the title add aria attributes, change left cells from td to th, add test on column sort Fix row numbers in the left column + add tests Jan 14, 2025
@severo severo marked this pull request as ready for review January 14, 2025 22:39
Copy link
Contributor

@platypii platypii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still thinking about this.

First of all the behavior now is basically: use the index if present, but otherwise just give a normal 1,2,3 row ordering, even if there is an orderBy. But if the __index__ does exists, then use it. Absolutely requiring the index feels like it might be impossible in some situations (can imagine a table provider that does SQL queries against a backend, or other variations). Could be hacked around by annotating with 1,2,3 indexes but that felt like it was adding more work to the simple-case, and it would be more flexible to let the table component handle indexed-or-not. The example in the README would need to change to add the __index__ column being returned if this change is merged.

This gave me an idea: If we do add the __index__ to AsyncRow (as done in this PR), and change DataFrame.rows() to always return type AsyncRow[] and not | Promise<Row[]>. This would simplify many code branches. It would make it slightly harder to write a DataFrame implementation, but we could fix that by providing helpers (for example see the asyncRows helper function) that would also annotate with indexes if needed? So the simple case would be like:

const df = {
  header, numRows,
  rows(start, end, orderBy): AsyncRow[] {
    const rows = [{}, {}, {}, ...]
    return asyncRows(rows, start)
  },
}

But for advanced use cases like parquet, we can return our own AsyncRow[] with full control over when cells are loaded (and the __index__).

Will continue thinking about best approach.

@severo
Copy link
Contributor Author

severo commented Jan 15, 2025

Yes, it seems like a good simplification.


Thinking out loud: looking at the code in master (and in hyperparam-cli), we always resolve all the columns when we process a given row. I understand that AsyncRow gives the ability to update only some cells in a row, but do we want to provide this level of flexibility? If not, and iterating on your idea above, we could have

const df = {
  header, numRows,
  rows(start, end, orderBy): AsyncRow[] {
    const rows = [{}, {}, {}, ...]
    return asyncRows(rows, start)
  },
}

with

AsyncRow = WrappedPromise<Row>

@severo
Copy link
Contributor Author

severo commented Jan 15, 2025

Absolutely requiring the index feels like it might be impossible in some situations (can imagine a table provider that does SQL queries against a backend, or other variations).

what should we do for the double click on a cell in that case? Disable it?

I think that for this PR, I'll apply the alternative #2:

don't break things, and simply remove features (no row number displayed in the left column, and no double click/mouse down callbacks) if data is sorted and no index is provided

so that we can still ship the tests, and a11y improvements, and then work on the index in another PR

if (!('__index__' in wrapped) && '__index__' in row && typeof row.__index__ === 'number') {
wrapped[i].__index__ = resolvablePromise<number>()
wrapped[i].__index__.resolve(row.__index__)
}
Copy link
Contributor Author

@severo severo Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this part fixes a bug: __index__ was ignored if using asyncRows()

@severo severo requested a review from platypii January 15, 2025 17:05
@severo
Copy link
Contributor Author

severo commented Jan 15, 2025

Requesting a review again @platypii :) It breaks less things and is acceptable too, I think

@platypii
Copy link
Contributor

I understand that AsyncRow gives the ability to update only some cells in a row, but do we want to provide this level of flexibility?

Ability to fill in individual cells is required behavior. The whole point is to allow cells to fill in when they are ready. For parquet, since it is column-oriented, small columns might fill in before large text columns and I don't want to have to wait for the entire row to resolve (not implemented in cli yet, but that is planned).

Also some columns might be derived, and take time to compute, like with the llm transform demo. This would look really bad if every row was blank until every column was ready:

column.loading.mp4

Which is why, if anything, I think we should make the DataFrame always return AsyncRow[].

Copy link
Contributor

@platypii platypii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good now, while we debate other refactors :-)

@severo
Copy link
Contributor Author

severo commented Jan 15, 2025

Excellent, thanks for the details. Indeed we want cell level resolution.

@severo severo merged commit 9d2f932 into master Jan 15, 2025
8 checks passed
@severo severo deleted the add-tests branch January 15, 2025 17:26
@platypii
Copy link
Contributor

what should we do for the double click on a cell in that case? Disable it?

Yea this is a good point which I don't have a great answer to. I could be convinced on requiring the __index__. But for now just disabling the clicks when sorted+noindex makes sense to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants