Add aggregator for frequency metadata rows #32
base: main
Create the aggregator FrequencyMetadataAggregator. This aggregator supports compaction of the "f", "i", and "ri" columns in the metadata table, collapsing the per-date counts into a single entry for each unique grouping of row, data type, and column visibility.

Update MetadataHelper and AllFieldsMetadataHelper so that methods scanning the "f", "i", and/or "ri" columns can handle entries in either the original format created at ingest or the aggregated format generated by the aggregator.

Required for datawave/issues/716.
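To illustrate the collapsing step described above, here is a minimal plain-Java sketch. It is not the code in this PR: the `Entry` class, the `collapse` method, and the flattened field layout are hypothetical, and it assumes each ingest-format entry carries one date and one count. The real aggregator operates on Accumulo keys and values, whose encoding is not shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FrequencyCollapseSketch {

    /** Hypothetical flattened view of one per-date metadata entry (assumed layout). */
    public static final class Entry {
        public final String row;
        public final String columnFamily;
        public final String dataType;
        public final String visibility;
        public final String date;
        public final long count;

        public Entry(String row, String columnFamily, String dataType,
                     String visibility, String date, long count) {
            this.row = row;
            this.columnFamily = columnFamily;
            this.dataType = dataType;
            this.visibility = visibility;
            this.date = date;
            this.count = count;
        }
    }

    /**
     * Collapses per-date entries into one date-to-count map per unique
     * (row, column family, data type, visibility) grouping. Counts that share
     * a grouping and a date are summed.
     */
    public static Map<List<String>, TreeMap<String, Long>> collapse(List<Entry> entries) {
        Map<List<String>, TreeMap<String, Long>> result = new LinkedHashMap<>();
        for (Entry e : entries) {
            // A List has value-based equals/hashCode, so it works as a composite key.
            List<String> group = Arrays.asList(e.row, e.columnFamily, e.dataType, e.visibility);
            result.computeIfAbsent(group, k -> new TreeMap<>())
                  .merge(e.date, e.count, Long::sum);
        }
        return result;
    }
}
```

Keeping the dates in a sorted map preserves the per-date breakdown inside the single aggregated entry, so a reader that understands both formats could still answer date-range questions after compaction.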
src/main/java/datawave/iterators/FrequencyMetadataAggregator.java
So I loaded this and set it as an iterator on the datawave.metadata table using the datawave-quickstart ashell, then scanned the various column families. It works if I keep the scan down to the i, ri, or f columns, although the i and ri values appear to be all zeros. However, if I scan a field like EVENT_DATE, then we bomb. We need to be able to configure the column families to aggregate. As an example, see the Combiners that we currently configure on the datawave.metadata table.
@ivakegg understood, I'll add an option for configuring the column families.
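As a rough sketch of what such an option could look like, the plain-Java class below parses a comma-separated list of column families from an iterator-style options map and reports whether a given family should be aggregated. The class name, the `columns` option key, and the `shouldAggregate` method are assumptions made for illustration; they are not taken from this PR or from the Accumulo API.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class ColumnFamilyFilterSketch {

    /** Hypothetical option key holding a comma-separated list of column families. */
    public static final String COLUMNS_OPTION = "columns";

    private final Set<String> families;

    public ColumnFamilyFilterSketch(Map<String, String> options) {
        String raw = options.get(COLUMNS_OPTION);
        if (raw == null) {
            throw new IllegalArgumentException("missing required option: " + COLUMNS_OPTION);
        }
        Set<String> parsed = new LinkedHashSet<>();
        for (String part : raw.split(",")) {
            String family = part.trim();
            if (!family.isEmpty()) {
                parsed.add(family);
            }
        }
        if (parsed.isEmpty()) {
            throw new IllegalArgumentException("no column families configured in: " + COLUMNS_OPTION);
        }
        this.families = Collections.unmodifiableSet(parsed);
    }

    /** Returns true only for column families that were explicitly configured. */
    public boolean shouldAggregate(String columnFamily) {
        return families.contains(columnFamily);
    }
}
```

With the option set to `f,i,ri`, entries in any other column family would pass through untouched, which is the behavior the review comment asks for when a scan covers a whole field row.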