Regression with accessing parquet files on a public s3 bucket #522

Closed
simonharrer opened this issue Nov 25, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@simonharrer
Contributor

With the latest development version, the following command no longer works:

datacontract test https://datacontract.com/examples/orders-latest/datacontract.yaml

It shows:

(venv) ➜  datacontract-demo datacontract test https://datacontract.com/examples/orders-latest/datacontract.yaml

Testing https://datacontract.com/examples/orders-latest/datacontract.yaml
ERROR:root:Exception occurred
Traceback (most recent call last):
  File "/Users/simonharrer/Projects/datacontract-cli/datacontract/data_contract.py", line 202, in test
    check_soda_execute(run, data_contract, server, self._spark, tmp_dir)
  File "/Users/simonharrer/Projects/datacontract-cli/datacontract/engines/soda/check_soda_execute.py", line 28, in check_soda_execute
    con = get_duckdb_connection(data_contract, server, run)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/simonharrer/Projects/datacontract-cli/datacontract/engines/soda/connections/duckdb.py", line 39, in get_duckdb_connection
    con.sql(query)
duckdb.duckdb.HTTPException: HTTP Error: HTTP GET error on '/?encoding-type=url&list-type=2&prefix=v2%2Forders%2F' (HTTP 400)
ERROR:root:HTTP Error: HTTP GET error on '/?encoding-type=url&list-type=2&prefix=v2%2Forders%2F' (HTTP 400)
╭────────┬──────────────────────────────────┬────────────┬────────────────────────────────────────────────────╮
│ Result │ Check                            │ Field      │ Details                                            │
├────────┼──────────────────────────────────┼────────────┼────────────────────────────────────────────────────┤
│ passed │ Check that JSON has valid schema │ orders     │ All JSON entries are valid.                        │
│ passed │ Check that JSON has valid schema │ line_items │ All JSON entries are valid.                        │
│ error  │ Test Data Contract               │            │ HTTP Error: HTTP GET error on                      │
│        │                                  │            │ '/?encoding-type=url&list-type=2&prefix=v2%2Forde… │
│        │                                  │            │ (HTTP 400)                                         │
╰────────┴──────────────────────────────────┴────────────┴────────────────────────────────────────────────────╯
🔴 data contract is invalid, found the following errors:
1) HTTP Error: HTTP GET error on '/?encoding-type=url&list-type=2&prefix=v2%2Forders%2F' (HTTP 400)
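
The request path in the exception can be decoded to see what DuckDB asked S3 for. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of datacontract-cli. `list-type=2` is the query parameter of the S3 ListObjectsV2 call, which suggests the HTTP 400 comes from listing keys under the prefix rather than from fetching a single parquet object.

```python
from urllib.parse import parse_qs, urlsplit

# Request path copied verbatim from the HTTPException in the traceback above.
failing_request = "/?encoding-type=url&list-type=2&prefix=v2%2Forders%2F"

# parse_qs percent-decodes the values, so "%2F" becomes "/".
query = urlsplit(failing_request).query
params = {key: values[0] for key, values in parse_qs(query).items()}

for key, value in params.items():
    print(f"{key} = {value}")
```

The decoded prefix is `v2/orders/`, so the listing targets the folder that holds the orders files.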
simonharrer added the bug label on Nov 25, 2024
@simonharrer
Contributor Author

duckdb/duckdb#7970 seems related

@cornzyblack
Contributor

cornzyblack commented Dec 22, 2024

The issue has been resolved with the new version of datacontract-cli. However, after attempting to run the same command, I encountered a new quality-check error related to DuckDB SQL syntax.

Here is a screenshot showing the cause of the failing datacontract.yaml file:
[screenshot]

I also attempted to run the exact soda check in DuckDB:
[screenshot]

To fix the SQL for the soda quality check and ensure it passes the test, you can use the following query:

SELECT MAX(duration) AS max_duration
FROM (
    SELECT EXTRACT(EPOCH FROM (order_timestamp - LAG(order_timestamp) OVER (ORDER BY order_timestamp))) AS duration
    FROM orders
) subquery;
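
To make explicit what this query measures, here is a rough pure-Python equivalent: it sorts the timestamps, takes the difference between each row and its predecessor (the `LAG` step), and returns the largest difference in seconds. The function name and the sample data are hypothetical and only serve as illustration; this is a sketch of the logic, not code from datacontract-cli.

```python
from datetime import datetime


def max_gap_seconds(timestamps):
    """Largest gap in seconds between consecutive timestamps, or None.

    Mirrors MAX(EXTRACT(EPOCH FROM ts - LAG(ts) OVER (ORDER BY ts))):
    the first row has no predecessor, so fewer than two rows yield None,
    just as the SQL yields NULL.
    """
    ordered = sorted(timestamps)
    gaps = [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return max(gaps) if gaps else None


# Hypothetical order timestamps, deliberately unsorted.
sample = [
    datetime(2024, 1, 1, 12, 30),
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 30),
]

print(max_gap_seconds(sample))  # 7200.0 (the two hours between 10:30 and 12:30)
```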

Lastly, here is a screenshot of me executing the new quality check in the terminal locally:
[screenshot]

I opened issue #560 with the fix.

@jochenchrist
Contributor

Seems to be obsolete / fixed in another ticket.
