Add aggregator for frequency metadata rows #32

Open
wants to merge 14 commits into
base: main
from

Conversation

lbschanno
Collaborator

Create the aggregator FrequencyMetadataAggregator. This aggregator will support compaction of the "f", "i", and "ri" columns in the metadata table, collapsing the per-date counts into a single entry for each unique row, data type, and column visibility grouping.
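
To illustrate the collapsing described above, here is a minimal sketch in plain Java. It is not the FrequencyMetadataAggregator from this PR: the `Entry` class, `groupKey`, and `collapse` are hypothetical stand-ins, and it assumes each ingest-time entry carries a single date and count. It only shows the grouping idea: one date-to-count map per unique (row, data type, visibility).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: not the aggregator implemented in this PR. */
class FrequencyCollapseSketch {

    /** Hypothetical, simplified view of one per-date metadata entry. */
    static final class Entry {
        final String row;        // field name, e.g. "EVENT_DATE"
        final String dataType;   // ingest data type
        final String date;       // assumed yyyyMMdd
        final String visibility; // column visibility expression
        final long count;

        Entry(String row, String dataType, String date, String visibility, long count) {
            this.row = row;
            this.dataType = dataType;
            this.date = date;
            this.visibility = visibility;
            this.count = count;
        }
    }

    /** Builds the grouping key: row, data type, and visibility joined by NUL. */
    static String groupKey(String row, String dataType, String visibility) {
        return row + '\u0000' + dataType + '\u0000' + visibility;
    }

    /** Collapses per-date entries into one date-to-count map per group, summing duplicate dates. */
    static Map<String, Map<String, Long>> collapse(List<Entry> entries) {
        Map<String, Map<String, Long>> grouped = new TreeMap<>();
        for (Entry e : entries) {
            grouped.computeIfAbsent(groupKey(e.row, e.dataType, e.visibility), k -> new TreeMap<>())
                   .merge(e.date, e.count, Long::sum);
        }
        return grouped;
    }
}
```

Entries that differ only by date end up in the same group, so many per-date rows become one map per grouping.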

Update MetadataHelper and AllFieldsMetadataHelper so that methods scanning the "f", "i", and/or "ri" columns can handle entries in either the original format created on ingest or the aggregated format generated by the aggregator.
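
As a rough sketch of what "handle either format" could look like on the read side, the following plain-Java example normalizes both shapes into one date-to-count map. The encodings here are assumptions made purely for illustration, not the formats used by this PR: the original entry is assumed to carry `datatype\0yyyyMMdd` in the column qualifier with a single count as its value, and the aggregated entry is assumed to carry only the data type in the qualifier with a hypothetical `date=count,date=count` value.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: both encodings below are assumptions, not the PR's formats. */
class DualFormatReaderSketch {

    private static final char NULL_SEPARATOR = '\u0000';

    /** Returns date -> count for an entry in either assumed format. */
    static Map<String, Long> countsByDate(String qualifier, String value) {
        Map<String, Long> counts = new TreeMap<>();
        int sep = qualifier.indexOf(NULL_SEPARATOR);
        if (sep >= 0) {
            // Assumed original format: the date follows the data type in the qualifier.
            counts.put(qualifier.substring(sep + 1), Long.parseLong(value));
        } else if (!value.isEmpty()) {
            // Assumed aggregated format: every date and count is packed into the value.
            for (String pair : value.split(",")) {
                int eq = pair.indexOf('=');
                counts.merge(pair.substring(0, eq), Long.parseLong(pair.substring(eq + 1)), Long::sum);
            }
        }
        return counts;
    }
}
```

Callers then work with the same map regardless of whether the table has been compacted yet.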

Required for datawave/issues/716.

@ivakegg
Collaborator

ivakegg commented Apr 12, 2024

So I loaded this and set it as a scan iterator on the datawave.metadata table using datawave-quickstart:

ashell
setiter -t datawave.metadata -scan -class datawave.iterators.FrequencyMetadataAggregator -p 13

Then I scanned the various column families. It works if I keep the scan down to the i, ri, or f columns, although the i and ri values appear to be all zeros:
scan -c f
scan -c i
scan -c ri

However, if I scan a field like EVENT_DATE, then we bomb:
scan -r EVENT_DATE

We need to be able to configure which column families are aggregated. As an example, see the Combiners that we currently configure on the datawave.metadata table.
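
One possible shape for that configuration, sketched in plain Java without any Accumulo dependency: the iterator reads a comma-separated list of column families from its options and only aggregates entries in those families, passing everything else through untouched. The class name, the `columns` option key, and `shouldAggregate` are hypothetical choices for this sketch (Accumulo's Combiner takes a comparable `columns` option); this is not the option this PR ends up adding.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a hypothetical option parser, not the PR's implementation. */
class ColumnFamilyFilterSketch {

    /** Hypothetical option key, modeled on the option Accumulo Combiners accept. */
    static final String COLUMNS_OPTION = "columns";

    private final Set<String> families;

    ColumnFamilyFilterSketch(Map<String, String> options) {
        String raw = options.get(COLUMNS_OPTION);
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalArgumentException("The '" + COLUMNS_OPTION + "' option must list at least one column family");
        }
        Set<String> parsed = new HashSet<>();
        for (String family : raw.split(",")) {
            String trimmed = family.trim();
            if (!trimmed.isEmpty()) {
                parsed.add(trimmed);
            }
        }
        this.families = Collections.unmodifiableSet(parsed);
    }

    /** True only for column families named in the option; all other entries should pass through unchanged. */
    boolean shouldAggregate(String columnFamily) {
        return families.contains(columnFamily);
    }
}
```

With options such as `columns=f,i,ri`, a scan over a whole row like EVENT_DATE would only hand the "f", "i", and "ri" entries to the aggregation logic.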

@lbschanno
Collaborator Author

@ivakegg understood, I'll add an option for configuring the column families.

@lbschanno lbschanno requested a review from hlgp as a code owner September 3, 2024 09:13
Successfully merging this pull request may close these issues.

Handle the cases were a field is both indexed and not indexed within a time range