Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

null filling and coalescing for mv agg query #6088

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

null filling and coalescing for mv agg query #6088

wants to merge 13 commits into from

Conversation

pjain1
Copy link
Member

@pjain1 pjain1 commented Nov 13, 2024

These PR adds supports for two features-

  1. fill_missing - This is a metrics view aggregation query level config and when a computed time dimension is used, it fills in the missing time buckets in the response.
  2. treat_nulls_as - This is a measure level config used to configure what value to fill in for missing time buckets. This also works generally as COALESCING over non empty time buckets.

Closes https://github.com/rilldata/rill-private-issues/issues/786

@pjain1 pjain1 marked this pull request as draft November 13, 2024 11:11
@nishantmonu51 nishantmonu51 added blocker A release blocker issue that should be resolved before a new release and removed blocker A release blocker issue that should be resolved before a new release labels Nov 15, 2024
@pjain1 pjain1 changed the title null filling for mv agg query on duckdb null filling and coalescing for mv agg query Nov 25, 2024
@pjain1 pjain1 marked this pull request as ready for review November 25, 2024 07:06
@@ -205,6 +205,7 @@ message MetricsViewSpec {
string format_d3 = 7;
google.protobuf.Struct format_d3_locale = 13;
bool valid_percent_of_total = 6;
string treat_nulls_as = 14; // TODO what should the type, using string values will not work when coalescing numeric cols
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just consider this to be a SQL expression to be templated into the query literally. For example:

treat_nulls_as: 0
treat_nulls_as: CAST(0 AS HUGEINT)
treat_nulls_as: "'Not available'"

runtime/metricsview/ast.go Outdated Show resolved Hide resolved
@@ -689,7 +699,87 @@ func (a *AST) buildSpineSelect(alias string, spine *Spine, tr *TimeRange) (*Sele
}

if spine.TimeRange != nil {
return nil, errors.New("time_range not yet supported in spine")
if a.dialect == drivers.DialectDruid {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nowhere else in ast.go or astsql.go is there a hard-coded reference to a dialect. It would be very nice to avoid adding that now. Look at the various places that a.dialect are used – it always pushes the dialect-specific handling into the dialect implementation. I think something similar should be possible here.

Copy link
Member Author

@pjain1 pjain1 Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving this to drivers/olap.go causing cyclic dependency because of metricsview.TimeGrain, metricsview package need drivers package

Comment on lines 1085 to 1104
for _, cjs := range cpy.CrossJoinSelects {
for _, f := range cjs.DimFields {
s.DimFields = append(s.DimFields, FieldNode{
Name: f.Name,
DisplayName: f.DisplayName,
Expr: a.sqlForMember(cpy.Alias, f.Name),
})
}

if len(cjs.UnionAllSelects) > 0 {
// All dimensions will be same across UNION ALL SELECTS so we can just pick the first one
for _, f := range cjs.UnionAllSelects[0].DimFields {
s.DimFields = append(s.DimFields, FieldNode{
Name: f.Name,
DisplayName: f.DisplayName,
Expr: a.sqlForMember(cpy.Alias, f.Name),
})
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this behavior different than the handling of LeftJoinSelects and JoinComparisonSelect? I think this might relate to the previously mentioned issue of not setting FromSelect?

Ideally after the wrap, these can just be set to nil since they are handled in the nested select.

runtime/metricsview/executor.go Outdated Show resolved Hide resolved
@@ -124,6 +124,7 @@ type TimeSpine struct {
Start time.Time `mapstructure:"start"`
End time.Time `mapstructure:"end"`
Grain TimeGrain `mapstructure:"grain"`
Alias string `mapstructure:"alias"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? Ideally it should apply only to the time dimension if it's requested, and in that case, the time dimension already has an alias provided in Dimension

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used while constructing the inline select statement for time ranges and for also removing time dim from dim select node which will be cross joined with range select. Yes its present in the query dimension however to use it here it will need to passed down separately as in this context only ast dimensions are present.

Comment on lines +322 to +324
if q.TimeRange == nil || q.TimeRange.Start == nil || q.TimeRange.End == nil {
return nil, fmt.Errorf("time range is required for null fill")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might still work if there's a relative time range (like q.TimeRange.IsoDuration), and this might be needed e.g. for alerts/reports that use a time spine.

It would be resolved to a fixed time range before the query is turned into an AST. It happens here:

func (e *Executor) rewriteQueryTimeRanges(ctx context.Context, qry *Query, executionTime *time.Time) error {

Start: s,
End: e,
Grain: timeDim.Compute.TimeFloor.Grain,
Alias: timeDim.Name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be handled inside the metricsview package by applying it to any dimension that has a TimeFloor applied (and erroring if zero or multiple dimensions have a TimeFloor)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WhereSpine is created in the query package so did similar. You mean to create this inside NewAST ?

Copy link
Contributor

@begelundmuller begelundmuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a separate comment, can we add some tests for this feature for all of DuckDB, Druid and ClickHouse in runtime/resolvers/testdata?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants