Creates partiql-tpc tool to generate TPC-DS in Ion, CSV, and Parquet #1170

Closed
wants to merge 3 commits into from

RCHowell (Member) commented Aug 1, 2023

Description

This leverages Trino's Java port of TPC-DS to generate TPC benchmarking data. This PR includes a CSV and Ion Text writer.

Usage: partiql-tpc [-hV] -d=<benchmark> --format=<format> --output=<output>
                   [--part=<part>] [--partitions=<partitions>]
                   [--scale=<scale>] [--table=<table>]
Writes a TPC dataset
  -d, -dataset=<benchmark>   Dataset type; valid values: TPCDS, TPCH
      --format=<format>      Output format; valid values: ION, CSV, PARQUET
  -h, --help                 Show this help message and exit.
      --output=<output>
      --part=<part>
      --partitions=<partitions>

      --scale=<scale>        Scale factor is 1GB
      --table=<table>        Table to generate; if not specified, all tables
                               are generated. https://www.tpc.
                               org/tpc_documents_current_versions/pdf/tpc-ds_v2.
                               6.0.pdf
  -V, --version              Print version information and exit.
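
As a hedged illustration of the usage text above, the following Python sketch assembles a command line for the tool from the documented options. It is not part of the PR; the helper name `build_args`, the output path, and the table name in the usage note are assumptions made only for this example. The option names and the valid values are the ones listed in the help output.

```python
# Sketch only: builds an argument list for partiql-tpc from the options
# documented in the help text. It does not run the tool.
VALID_DATASETS = {"TPCDS", "TPCH"}
VALID_FORMATS = {"ION", "CSV", "PARQUET"}


def build_args(dataset, fmt, output, scale=None, table=None):
    """Return an argv list; `build_args` is a hypothetical helper name."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    args = ["partiql-tpc", f"-d={dataset}", f"--format={fmt}", f"--output={output}"]
    if scale is not None:
        # The help text describes the scale factor unit as 1GB.
        args.append(f"--scale={scale}")
    if table is not None:
        # If omitted, the help text says all tables are generated.
        args.append(f"--table={table}")
    return args
```

For example, `build_args("TPCDS", "CSV", "./out", scale=1, table="call_center")` yields an argument list that could be passed to `subprocess.run`, assuming the tool is on the `PATH` and accepts that table name.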

Additionally, I've added a Parquet writer, which I've preliminarily verified using https://crates.io/crates/pqrs.
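
To make the difference between the CSV and Ion text outputs concrete, here is a minimal Python sketch that serializes one row both ways. It is an illustration only and not the writers from this PR (which run on the JVM); the function names, the sample values, and the choice to emit each row as an Ion struct are assumptions. The column names are taken from the TPC-DS `call_center` table and are assumed to be identifier-like, so they can appear as unquoted Ion field names.

```python
import csv
import io
import json


def to_csv_line(row):
    """Serialize a row's values as one CSV line; None becomes an empty field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["" if v is None else v for v in row.values()])
    return buf.getvalue()


def ion_value(value):
    """Render a value as Ion text (sketch: handles None, bool, int, and strings)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    # JSON string escapes are also valid in Ion text strings.
    return json.dumps(str(value))


def to_ion_text(row):
    """Render a row as an Ion text struct, one field per column."""
    fields = ", ".join(f"{name}: {ion_value(v)}" for name, v in row.items())
    return "{" + fields + "}"


# Made-up sample row for illustration.
row = {"cc_call_center_sk": 1, "cc_name": "NY Metro", "cc_closed_date_sk": None}
print(to_csv_line(row), end="")  # 1,NY Metro,
print(to_ion_text(row))  # {cc_call_center_sk: 1, cc_name: "NY Metro", cc_closed_date_sk: null}
```

The CSV line carries only values and relies on column order, while the Ion struct repeats the field names in every row and keeps nulls explicit.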

Once this is merged, we will need to open issues for the places where the partitioning and file naming aren't quite right.

Relevant Issues

Performance-related issues would benefit from this work, as we can use it to collect metrics.

Other Information

  • Updated Unreleased Section in CHANGELOG: [YES/NO]
    No

  • Any backward-incompatible changes? [YES/NO]
    No

  • Any new external dependencies? [YES/NO]
    Yes
    • io.trino.tpcds:tpcds (Apache-2.0)
    • org.apache.parquet:parquet (Apache-2.0)

  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES/NO]
    Yes

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

RCHowell requested a review from jpschorr on August 1, 2023, 22:14

github-actions bot commented Aug 1, 2023

Conformance comparison report

               Base (c580925)   e82dd65   +/-
% Passing      92.40%           92.40%    0.00%
✅ Passing     5376             5376      0
❌ Failing     442              442       0
🔶 Ignored     0                0         0
Total Tests    5818             5818      0

Number passing in both: 5376

Number failing in both: 442

Number passing in Base (c580925) but now fail: 0

Number failing in Base (c580925) but now pass: 0
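
The percentages in the report follow directly from the counts. A minimal Python sketch of that arithmetic, using the numbers above (the function name is illustrative and not taken from the conformance tooling):

```python
def passing_rate(passing, failing, ignored=0):
    """Return the share of passing tests as a percentage string."""
    total = passing + failing + ignored
    return f"{100 * passing / total:.2f}%"


# Counts from the report; base and head are identical in this PR.
base = passing_rate(5376, 442)
head = passing_rate(5376, 442)
print(base, head)  # 92.40% 92.40%
```

Since both sides have the same counts, the reported difference is 0.00%.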


codecov-commenter commented Aug 1, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (c580925) 73.18% compared to head (9844380) 73.18%.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #1170   +/-   ##
=========================================
  Coverage     73.18%   73.18%           
  Complexity     2358     2358           
=========================================
  Files           224      224           
  Lines         17398    17398           
  Branches       3202     3202           
=========================================
  Hits          12733    12733           
  Misses         3680     3680           
  Partials        985      985           
Flag Coverage Δ
CLI 14.28% <ø> (ø)
EXAMPLES 80.28% <ø> (ø)
LANG 79.03% <ø> (ø)

RCHowell changed the title from "Partiql tpc" to "Creates partiql-tpc tool to generate TPC-DS in Ion, CSV, and Parquet" on Aug 2, 2023
RCHowell closed this on Aug 23, 2023
RCHowell deleted the partiql-tpc branch on August 23, 2023, 18:31