From a0c246b8c1300d5302b1797ecfae707640751344 Mon Sep 17 00:00:00 2001
From: qianheng
Date: Tue, 12 Nov 2024 03:51:12 +0800
Subject: [PATCH] Add sanity test script (#878)

* Add sanity test script

Signed-off-by: Heng Qian

* Add header

Signed-off-by: Heng Qian

* Minor fix

Signed-off-by: Heng Qian

* Minor fix

Signed-off-by: Heng Qian

* Minor fix

Signed-off-by: Heng Qian

* Support checking expected_status if that column is present in the input file.

Signed-off-by: Heng Qian

* Add README.md

Signed-off-by: Heng Qian

* Minor fix

Signed-off-by: Heng Qian

* Support set log-level

Signed-off-by: Heng Qian

---------

Signed-off-by: Heng Qian
---
 integ-test/script/README.md      | 158 +++++++++
 integ-test/script/SanityTest.py  | 291 ++++++++++++++++
 integ-test/script/test_cases.csv | 567 +++++++++++++++++++++++++++++++
 3 files changed, 1016 insertions(+)
 create mode 100644 integ-test/script/README.md
 create mode 100644 integ-test/script/SanityTest.py
 create mode 100644 integ-test/script/test_cases.csv

diff --git a/integ-test/script/README.md b/integ-test/script/README.md
new file mode 100644
index 000000000..79b188158
--- /dev/null
+++ b/integ-test/script/README.md
@@ -0,0 +1,158 @@
+# Sanity Test Script
+
+### Description
+This Python script executes test queries from a CSV file using an asynchronous query API and generates comprehensive test reports.
+
+The script produces two report types:
+1. An Excel report with detailed test information for each query
+2. A JSON report containing both a test result overview and query-specific details
+
+Beyond these basic features, it also provides some advanced functionality:
+1. Concurrent query execution (note: the async query service limits the number of sessions, so keep the number of worker threads moderate even though the script reuses session IDs)
+2. Configurable query timeout, with periodic status checks and automatic cancellation when the timeout is exceeded
+3. Flexible row selection, by specifying the start and end rows of the input CSV file
+4. Expected status validation when an `expected_status` column is present in the CSV
+5. Ability to generate partial reports if testing is interrupted
+
+### Usage
+To use this script, you need Python **3.6** or higher installed. It also requires the following Python libraries:
+```shell
+pip install requests pandas
+```
+
+After installing the required libraries, you can run the script with the following command line parameters in your shell:
+```shell
+python SanityTest.py --base-url ${URL_ADDRESS} --username *** --password *** --datasource ${DATASOURCE_NAME} --input-csv test_cases.csv --output-file test_report --max-workers 2 --check-interval 10 --timeout 600
+```
+Replace the placeholders URL_ADDRESS and DATASOURCE_NAME with your actual values, and USERNAME/PASSWORD with the credentials used to authenticate against your endpoint.
+
+For more details on the command line parameters, see the help output:
+```shell
+python SanityTest.py --help
+
+usage: SanityTest.py [-h] --base-url BASE_URL --username USERNAME --password PASSWORD --datasource DATASOURCE --input-csv INPUT_CSV
+                     --output-file OUTPUT_FILE [--max-workers MAX_WORKERS] [--check-interval CHECK_INTERVAL] [--timeout TIMEOUT]
+                     [--start-row START_ROW] [--end-row END_ROW] [--log-level LOG_LEVEL]
+
+Run tests from a CSV file and generate a report.
+
+options:
+  -h, --help            show this help message and exit
+  --base-url BASE_URL   Base URL of the service
+  --username USERNAME   Username for authentication
+  --password PASSWORD   Password for authentication
+  --datasource DATASOURCE
+                        Datasource name
+  --input-csv INPUT_CSV
+                        Path to the CSV file containing test queries
+  --output-file OUTPUT_FILE
+                        Path to the output report file
+  --max-workers MAX_WORKERS
+                        optional, Maximum number of worker threads (default: 2)
+  --check-interval CHECK_INTERVAL
+                        optional, Check interval in seconds (default: 10)
+  --timeout TIMEOUT     optional, Timeout in seconds (default: 600)
+  --start-row START_ROW
+                        optional, The start row of the query to run, start from 1
+  --end-row END_ROW     optional, The end row of the query to run, not included
+  --log-level LOG_LEVEL
+                        optional, Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL, default: INFO)
+```
+
+### Input CSV File
+As described above, the input CSV file must contain at least a `query` column to run the tests. It also supports an optional `expected_status` column: the script checks the actual status against the expected status and adds a `check_status` column with the result -- TRUE means the status check passed; FALSE means it failed.
+
+We also provide a sample input CSV file `test_cases.csv` for reference. It includes all sanity test cases we currently have for Flint.
+
+**TODO**: document the prerequisite data of the test cases and the ingestion process
+
+### Report Explanation
+The generated report consists of two files:
+
+#### Excel Report
+The Excel report provides the test result details for each query, including the query name (currently the sequence number in the input CSV file), the query itself, the expected status, the actual status, and whether the actual status satisfies the expected status.
+
+If the query execution failed, it provides an error message; otherwise it provides the query execution result and leaves the error column empty.
+
+It also provides the query_id, session_id and start/end time for each query, which can be used to debug the query execution in Flint.
+
+An example of the Excel report:
+
+| query_name | query | expected_status | status | check_status | error | result | Duration (s) | query_id | session_id | Start Time | End Time |
+|------------|-------|-----------------|--------|--------------|-------|--------|--------------|----------|------------|------------|----------|
+| 1 | describe myglue_test.default.http_logs | SUCCESS | SUCCESS | TRUE | | {'status': 'SUCCESS', 'schema': [{...}, ...], 'datarows': [[...], ...], 'total': 31, 'size': 31} | 37.51 | SHFEVWxDNnZjem15Z2x1ZV90ZXN0 | RkgzZm0xNlA5MG15Z2x1ZV90ZXN0 | 2024-11-07 13:34:10 | 2024-11-07 13:34:47 |
+| 2 | source = myglue_test.default.http_logs \| dedup status CONSECUTIVE=true | SUCCESS | FAILED | FALSE | {"Message":"Fail to run query. Cause: Consecutive deduplication is not supported"} | | 39.53 | dVNlaVVxOFZrZW15Z2x1ZV90ZXN0 | ZGU2MllVYmI4dG15Z2x1ZV90ZXN0 | 2024-11-07 13:34:10 | 2024-11-07 13:34:49 |
+| 3 | source = myglue_test.default.http_logs \| eval res = json_keys(json('{"account_number":1,"balance":39225,"age":32,"gender":"M"}')) \| head 1 \| fields res | SUCCESS | SUCCESS | TRUE | | {'status': 'SUCCESS', 'schema': [{'name': 'res', 'type': 'array'}], 'datarows': [[['account_number', 'balance', 'age', 'gender']]], 'total': 1, 'size': 1} | 12.77 | WHQxaXlVSGtGUm15Z2x1ZV90ZXN0 | RkgzZm0xNlA5MG15Z2x1ZV90ZXN0 | 2024-11-07 13:34:47 | 2024-11-07 13:38:45 |
+| ... | ... | ... | ... | ... | | | ... | ... | ... | ... | ... |
+
+#### JSON Report
+The JSON report provides the same information as the Excel report, but in JSON format. Additionally, it includes a statistical summary of the test results at the beginning of the report.
+
+An example of the JSON report:
+```json
+{
+  "summary": {
+    "total_queries": 115,
+    "successful_queries": 110,
+    "failed_queries": 3,
+    "submit_failed_queries": 0,
+    "timeout_queries": 2,
+    "execution_time": 16793.223807
+  },
+  "detailed_results": [
+    {
+      "query_name": 1,
+      "query": "source = myglue_test.default.http_logs | stats avg(size)",
+      "query_id": "eFZmTlpTa3EyTW15Z2x1ZV90ZXN0",
+      "session_id": "bFJDMWxzb2NVUm15Z2x1ZV90ZXN0",
+      "status": "SUCCESS",
+      "error": "",
+      "result": {
+        "status": "SUCCESS",
+        "schema": [
+          {
+            "name": "avg(size)",
+            "type": "double"
+          }
+        ],
+        "datarows": [
+          [
+            4654.305710913499
+          ]
+        ],
+        "total": 1,
+        "size": 1
+      },
+      "duration": 170.621145,
+      "start_time": "2024-11-07 14:56:13.869226",
+      "end_time": "2024-11-07 14:59:04.490371"
+    },
+    {
+      "query_name": 2,
+      "query": "source = myglue_test.default.http_logs | eval res = json_keys(json(\u2018{\"teacher\":\"Alice\",\"student\":[{\"name\":\"Bob\",\"rank\":1},{\"name\":\"Charlie\",\"rank\":2}]}')) | head 1 | fields res",
+      "query_id": "bjF4Y1VnbXdFYm15Z2x1ZV90ZXN0",
+      "session_id": "c3pvU1V6OW8xM215Z2x1ZV90ZXN0",
+      "status": "FAILED",
+      "error": "{\"Message\":\"Syntax error: \\n[PARSE_SYNTAX_ERROR] Syntax error at or near 'source'.(line 1, pos 0)\\n\\n== SQL ==\\nsource = myglue_test.default.http_logs | eval res = json_keys(json(\u2018{\\\"teacher\\\":\\\"Alice\\\",\\\"student\\\":[{\\\"name\\\":\\\"Bob\\\",\\\"rank\\\":1},{\\\"name\\\":\\\"Charlie\\\",\\\"rank\\\":2}]}')) | head 1 | fields res\\n^^^\\n\"}",
+      "result": null,
+      "duration": 14.051738,
+      "start_time": "2024-11-07 14:59:18.699335",
+      "end_time": "2024-11-07 14:59:32.751073"
+    },
+    {
+      "query_name": 3,
+      "query": "source = myglue_test.default.http_logs | eval col1 = size, col2 = clientip | stats avg(col1) by col2",
+      "query_id": "azVyMFFORnBFRW15Z2x1ZV90ZXN0",
+      "session_id": "VWF0SEtrNWM3bm15Z2x1ZV90ZXN0",
+      "status": "TIMEOUT",
+      "error": "Query execution exceeded 600 seconds with last status: running",
+      "result": null,
+      "duration": 673.710946,
+      "start_time": "2024-11-07 14:45:00.157875",
+      "end_time": "2024-11-07 14:56:13.868821"
+    },
+    ...
+ ] +} +``` diff --git a/integ-test/script/SanityTest.py b/integ-test/script/SanityTest.py new file mode 100644 index 000000000..1c51d4d20 --- /dev/null +++ b/integ-test/script/SanityTest.py @@ -0,0 +1,291 @@ +""" +Copyright OpenSearch Contributors +SPDX-License-Identifier: Apache-2.0 +""" + +import signal +import sys +import requests +import json +import csv +import time +import logging +from datetime import datetime +import pandas as pd +import argparse +from requests.auth import HTTPBasicAuth +from concurrent.futures import ThreadPoolExecutor, as_completed +import threading + +""" +Environment: python3 + +Example to use this script: + +python SanityTest.py --base-url ${URL_ADDRESS} --username *** --password *** --datasource ${DATASOURCE_NAME} --input-csv test_queries.csv --output-file test_report --max-workers 2 --check-interval 10 --timeout 600 + +The input file test_queries.csv should contain column: `query` + +For more details, please use command: + +python SanityTest.py --help + +""" + +class FlintTester: + def __init__(self, base_url, username, password, datasource, max_workers, check_interval, timeout, output_file, start_row, end_row, log_level): + self.base_url = base_url + self.auth = HTTPBasicAuth(username, password) + self.datasource = datasource + self.headers = { 'Content-Type': 'application/json' } + self.max_workers = max_workers + self.check_interval = check_interval + self.timeout = timeout + self.output_file = output_file + self.start = start_row - 1 if start_row else None + self.end = end_row - 1 if end_row else None + self.log_level = log_level + self.max_attempts = (int)(timeout / check_interval) + self.logger = self._setup_logger() + self.executor = ThreadPoolExecutor(max_workers=self.max_workers) + self.thread_local = threading.local() + self.test_results = [] + + def _setup_logger(self): + logger = logging.getLogger('FlintTester') + logger.setLevel(self.log_level) + + fh = logging.FileHandler('flint_test.log') + fh.setLevel(self.log_level) + + ch = logging.StreamHandler() + ch.setLevel(self.log_level) + + formatter = logging.Formatter( + '%(asctime)s - %(threadName)s - %(levelname)s - %(message)s' + ) + fh.setFormatter(formatter) + ch.setFormatter(formatter) + + logger.addHandler(fh) + logger.addHandler(ch) + + return logger + + + def get_session_id(self): + if not hasattr(self.thread_local, 'session_id'): + self.thread_local.session_id = "empty_session_id" + self.logger.debug(f"get session id {self.thread_local.session_id}") + return self.thread_local.session_id + + def set_session_id(self, session_id): + """Reuse the session id for the same thread""" + self.logger.debug(f"set session id {session_id}") + self.thread_local.session_id = session_id + + # Call submit API to submit the query + def submit_query(self, query, session_id="Empty"): + url = f"{self.base_url}/_plugins/_async_query" + payload = { + "datasource": self.datasource, + "lang": "ppl", + "query": query, + "sessionId": session_id + } + self.logger.debug(f"Submit query with payload: {payload}") + response_json = None + try: + response = requests.post(url, auth=self.auth, json=payload, headers=self.headers) + response_json = response.json() + response.raise_for_status() + return response_json + except Exception as e: + return {"error": str(e), "response": response_json} + + # Call get API to check the query status + def get_query_result(self, query_id): + url = f"{self.base_url}/_plugins/_async_query/{query_id}" + response_json = None + try: + response = requests.get(url, auth=self.auth) + 
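+            # Descriptive note: the body is parsed before raise_for_status() so that, if the
+            # request returns an HTTP error, the parsed error payload is still captured and
+            # returned by the exception handler below.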
response_json = response.json() + response.raise_for_status() + return response_json + except Exception as e: + return {"status": "FAILED", "error": str(e), "response": response_json} + + # Call delete API to cancel the query + def cancel_query(self, query_id): + url = f"{self.base_url}/_plugins/_async_query/{query_id}" + response_json = None + try: + response = requests.delete(url, auth=self.auth) + response_json = response.json() + response.raise_for_status() + self.logger.info(f"Cancelled query [{query_id}] with info {response.json()}") + return response_json + except Exception as e: + self.logger.warning(f"Cancel query [{query_id}] error: {str(e)}, got response {response_json}") + + # Run the test and return the result + def run_test(self, query, seq_id, expected_status): + self.logger.info(f"Starting test: {seq_id}, {query}") + start_time = datetime.now() + pre_session_id = self.get_session_id() + submit_result = self.submit_query(query, pre_session_id) + if "error" in submit_result: + self.logger.warning(f"Submit error: {submit_result}") + return { + "query_name": seq_id, + "query": query, + "expected_status": expected_status, + "status": "SUBMIT_FAILED", + "check_status": "SUBMIT_FAILED" == expected_status if expected_status else None, + "error": submit_result["error"], + "duration": 0, + "start_time": start_time, + "end_time": datetime.now() + } + + query_id = submit_result["queryId"] + session_id = submit_result["sessionId"] + self.logger.info(f"Submit return: {submit_result}") + if (session_id != pre_session_id): + self.logger.info(f"Update session id from {pre_session_id} to {session_id}") + self.set_session_id(session_id) + + test_result = self.check_query_status(query_id) + end_time = datetime.now() + duration = (end_time - start_time).total_seconds() + + return { + "query_name": seq_id, + "query": query, + "query_id": query_id, + "session_id": session_id, + "expected_status": expected_status, + "status": test_result["status"], + "check_status": test_result["status"] == expected_status if expected_status else None, + "error": test_result.get("error", ""), + "result": test_result if test_result["status"] == "SUCCESS" else None, + "duration": duration, + "start_time": start_time, + "end_time": end_time + } + + # Check the status of the query periodically until it is completed or failed or exceeded the timeout + def check_query_status(self, query_id): + query_id = query_id + + for attempt in range(self.max_attempts): + time.sleep(self.check_interval) + result = self.get_query_result(query_id) + + if result["status"] == "FAILED" or result["status"] == "SUCCESS": + return result + + # Cancel the query if it exceeds the timeout + self.cancel_query(query_id) + return { + "status": "TIMEOUT", + "error": "Query execution exceeded " + str(self.timeout) + " seconds with last status: " + result["status"], + } + + def run_tests_from_csv(self, csv_file): + with open(csv_file, 'r') as f: + reader = csv.DictReader(f) + queries = [(row['query'], i, row.get('expected_status', None)) for i, row in enumerate(reader, start=1) if row['query'].strip()] + + # Filtering queries based on start and end + queries = queries[self.start:self.end] + + # Parallel execution + futures = [self.executor.submit(self.run_test, query, seq_id, expected_status) for query, seq_id, expected_status in queries] + for future in as_completed(futures): + result = future.result() + self.test_results.append(result) + + def generate_report(self): + self.logger.info("Generating report...") + total_queries = 
len(self.test_results)
+        successful_queries = sum(1 for r in self.test_results if r['status'] == 'SUCCESS')
+        failed_queries = sum(1 for r in self.test_results if r['status'] == 'FAILED')
+        submit_failed_queries = sum(1 for r in self.test_results if r['status'] == 'SUBMIT_FAILED')
+        timeout_queries = sum(1 for r in self.test_results if r['status'] == 'TIMEOUT')
+
+        # Create report
+        report = {
+            "summary": {
+                "total_queries": total_queries,
+                "successful_queries": successful_queries,
+                "failed_queries": failed_queries,
+                "submit_failed_queries": submit_failed_queries,
+                "timeout_queries": timeout_queries,
+                "execution_time": sum(r['duration'] for r in self.test_results)
+            },
+            "detailed_results": self.test_results
+        }
+
+        # Save report to JSON file
+        with open(f"{self.output_file}.json", 'w') as f:
+            json.dump(report, f, indent=2, default=str)
+
+        # Save results to Excel file
+        df = pd.DataFrame(self.test_results)
+        df.to_excel(f"{self.output_file}.xlsx", index=False)
+
+        self.logger.info(f"Generated report in {self.output_file}.xlsx and {self.output_file}.json")
+
+def signal_handler(sig, frame, tester):
+    print(f"Signal {sig} received, generating report...")
+    try:
+        tester.executor.shutdown(wait=False, cancel_futures=True)
+        tester.generate_report()
+    finally:
+        sys.exit(0)
+
+def main():
+    # Parse command line arguments
+    parser = argparse.ArgumentParser(description="Run tests from a CSV file and generate a report.")
+    parser.add_argument("--base-url", required=True, help="Base URL of the service")
+    parser.add_argument("--username", required=True, help="Username for authentication")
+    parser.add_argument("--password", required=True, help="Password for authentication")
+    parser.add_argument("--datasource", required=True, help="Datasource name")
+    parser.add_argument("--input-csv", required=True, help="Path to the CSV file containing test queries")
+    parser.add_argument("--output-file", required=True, help="Path to the output report file")
+    parser.add_argument("--max-workers", type=int, default=2, help="optional, Maximum number of worker threads (default: 2)")
+    parser.add_argument("--check-interval", type=int, default=5, help="optional, Check interval in seconds (default: 5)")
+    parser.add_argument("--timeout", type=int, default=600, help="optional, Timeout in seconds (default: 600)")
+    parser.add_argument("--start-row", type=int, default=None, help="optional, The start row of the query to run, start from 1")
+    parser.add_argument("--end-row", type=int, default=None, help="optional, The end row of the query to run, not included")
+    parser.add_argument("--log-level", default="INFO", help="optional, Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL, default: INFO)")
+
+    args = parser.parse_args()
+
+    tester = FlintTester(
+        base_url=args.base_url,
+        username=args.username,
+        password=args.password,
+        datasource=args.datasource,
+        max_workers=args.max_workers,
+        check_interval=args.check_interval,
+        timeout=args.timeout,
+        output_file=args.output_file,
+        start_row=args.start_row,
+        end_row=args.end_row,
+        log_level=args.log_level,
+    )
+
+    # Register signal handlers to generate report on interrupt
+    signal.signal(signal.SIGINT, lambda sig, frame: signal_handler(sig, frame, tester))
+    signal.signal(signal.SIGTERM, lambda sig, frame: signal_handler(sig, frame, tester))
+
+    # Running tests
+    tester.run_tests_from_csv(args.input_csv)
+
+    # Generate report
+    tester.generate_report()
+
+if __name__ == "__main__":
+    main()
diff --git a/integ-test/script/test_cases.csv
b/integ-test/script/test_cases.csv new file mode 100644 index 000000000..7df05f5a3 --- /dev/null +++ b/integ-test/script/test_cases.csv @@ -0,0 +1,567 @@ +query,expected_status +describe myglue_test.default.http_logs,FAILED +describe `myglue_test`.`default`.`http_logs`,FAILED +"source = myglue_test.default.http_logs | dedup 1 status | fields @timestamp, clientip, status, size | head 10",SUCCESS +"source = myglue_test.default.http_logs | dedup status, size | head 10",SUCCESS +source = myglue_test.default.http_logs | dedup 1 status keepempty=true | head 10,SUCCESS +"source = myglue_test.default.http_logs | dedup status, size keepempty=true | head 10",SUCCESS +source = myglue_test.default.http_logs | dedup 2 status | head 10,SUCCESS +"source = myglue_test.default.http_logs | dedup 2 status, size | head 10",SUCCESS +"source = myglue_test.default.http_logs | dedup 2 status, size keepempty=true | head 10",SUCCESS +source = myglue_test.default.http_logs | dedup status CONSECUTIVE=true | fields status,FAILED +"source = myglue_test.default.http_logs | dedup 2 status, size CONSECUTIVE=true | fields status",FAILED +"source = myglue_test.default.http_logs | sort stat | fields @timestamp, clientip, status | head 10",SUCCESS +"source = myglue_test.default.http_logs | fields @timestamp, notexisted | head 10",FAILED +"source = myglue_test.default.nested | fields int_col, struct_col.field1, struct_col2.field1 | head 10",FAILED +"source = myglue_test.default.nested | where struct_col2.field1.subfield > 'valueA' | sort int_col | fields int_col, struct_col.field1.subfield, struct_col2.field1.subfield",FAILED +"source = myglue_test.default.http_logs | fields - @timestamp, clientip, status | head 10",SUCCESS +"source = myglue_test.default.http_logs | eval new_time = @timestamp, new_clientip = clientip | fields - new_time, new_clientip, status | head 10",SUCCESS +source = myglue_test.default.http_logs | eval new_clientip = lower(clientip) | fields - new_clientip | head 10,SUCCESS +"source = myglue_test.default.http_logs | fields + @timestamp, clientip, status | fields - clientip, status | head 10",SUCCESS +"source = myglue_test.default.http_logs | fields - clientip, status | fields + @timestamp, clientip, status| head 10",SUCCESS +source = myglue_test.default.http_logs | where status = 200 | head 10,SUCCESS +source = myglue_test.default.http_logs | where status != 200 | head 10,SUCCESS +source = myglue_test.default.http_logs | where size > 0 | head 10,SUCCESS +source = myglue_test.default.http_logs | where size <= 0 | head 10,SUCCESS +source = myglue_test.default.http_logs | where clientip = '236.14.2.0' | head 10,SUCCESS +source = myglue_test.default.http_logs | where size > 0 AND status = 200 OR clientip = '236.14.2.0' | head 100,SUCCESS +"source = myglue_test.default.http_logs | where size <= 0 AND like(request, 'GET%') | head 10",SUCCESS +source = myglue_test.default.http_logs status = 200 | head 10,SUCCESS +source = myglue_test.default.http_logs size > 0 AND status = 200 OR clientip = '236.14.2.0' | head 100,SUCCESS +"source = myglue_test.default.http_logs size <= 0 AND like(request, 'GET%') | head 10",SUCCESS +"source = myglue_test.default.http_logs substring(clientip, 5, 2) = ""12"" | head 10",SUCCESS +source = myglue_test.default.http_logs | where isempty(size),FAILED +source = myglue_test.default.http_logs | where ispresent(size),FAILED +source = myglue_test.default.http_logs | where isnull(size) | head 10,SUCCESS +source = myglue_test.default.http_logs | where isnotnull(size) | head 10,SUCCESS +"source 
= myglue_test.default.http_logs | where isnotnull(coalesce(size, status)) | head 10",FAILED +"source = myglue_test.default.http_logs | where like(request, 'GET%') | head 10",SUCCESS +"source = myglue_test.default.http_logs | where like(request, '%bordeaux%') | head 10",SUCCESS +"source = myglue_test.default.http_logs | where substring(clientip, 5, 2) = ""12"" | head 10",SUCCESS +"source = myglue_test.default.http_logs | where lower(request) = ""get /images/backnews.gif http/1.0"" | head 10",SUCCESS +source = myglue_test.default.http_logs | where length(request) = 38 | head 10,SUCCESS +"source = myglue_test.default.http_logs | where case(status = 200, 'success' else 'failed') = 'success' | head 10",FAILED +"source = myglue_test.default.http_logs | eval h = ""Hello"", w = ""World"" | head 10",SUCCESS +"source = myglue_test.default.http_logs | eval @h = ""Hello"" | eval @w = ""World"" | fields @timestamp, @h, @w",SUCCESS +source = myglue_test.default.http_logs | eval newF = clientip | head 10,SUCCESS +"source = myglue_test.default.http_logs | eval newF = clientip | fields clientip, newF | head 10",SUCCESS +"source = myglue_test.default.http_logs | eval f = size | where f > 1 | sort f | fields size, clientip, status | head 10",SUCCESS +"source = myglue_test.default.http_logs | eval f = status * 2 | eval h = f * 2 | fields status, f, h | head 10",SUCCESS +"source = myglue_test.default.http_logs | eval f = size * 2, h = status | stats sum(f) by h",SUCCESS +"source = myglue_test.default.http_logs | eval f = UPPER(request) | eval h = 40 | fields f, h | head 10",SUCCESS +"source = myglue_test.default.http_logs | eval request = ""test"" | fields request | head 10",FAILED +source = myglue_test.default.http_logs | eval size = abs(size) | where size < 500,FAILED +"source = myglue_test.default.http_logs | eval status_string = case(status = 200, 'success' else 'failed') | head 10",FAILED +"source = myglue_test.default.http_logs | eval n = now() | eval t = unix_timestamp(@timestamp) | fields n, t | head 10",SUCCESS +source = myglue_test.default.http_logs | eval e = isempty(size) | eval p = ispresent(size) | head 10,FAILED +"source = myglue_test.default.http_logs | eval c = coalesce(size, status) | head 10",FAILED +source = myglue_test.default.http_logs | eval c = coalesce(request) | head 10,FAILED +source = myglue_test.default.http_logs | eval col1 = ln(size) | eval col2 = unix_timestamp(@timestamp) | sort - col1 | head 10,SUCCESS +"source = myglue_test.default.http_logs | eval col1 = 1 | sort col1 | head 4 | eval col2 = 2 | sort - col2 | sort - size | head 2 | fields @timestamp, clientip, col2",SUCCESS +"source = myglue_test.default.mini_http_logs | eval stat = status | where stat > 300 | sort stat | fields @timestamp,clientip,status | head 5",SUCCESS +"source = myglue_test.default.http_logs | eval col1 = size, col2 = clientip | stats avg(col1) by col2",SUCCESS +source = myglue_test.default.http_logs | stats avg(size) by clientip,SUCCESS +"source = myglue_test.default.http_logs | eval new_request = upper(request) | eval compound_field = concat('Hello ', if(like(new_request, '%bordeaux%'), 'World', clientip)) | fields new_request, compound_field | head 10",SUCCESS +source = myglue_test.default.http_logs | stats avg(size),SUCCESS +source = myglue_test.default.nested | stats max(int_col) by struct_col.field2,SUCCESS +source = myglue_test.default.nested | stats distinct_count(int_col),SUCCESS +source = myglue_test.default.nested | stats stddev_samp(int_col),SUCCESS +source = myglue_test.default.nested | 
stats stddev_pop(int_col),SUCCESS +source = myglue_test.default.nested | stats percentile(int_col),SUCCESS +source = myglue_test.default.nested | stats percentile_approx(int_col),SUCCESS +source = myglue_test.default.mini_http_logs | stats stddev_samp(status),SUCCESS +"source = myglue_test.default.mini_http_logs | where stats > 200 | stats percentile_approx(status, 99)",SUCCESS +"source = myglue_test.default.nested | stats count(int_col) by span(struct_col.field2, 10) as a_span",SUCCESS +"source = myglue_test.default.nested | stats avg(int_col) by span(struct_col.field2, 10) as a_span, struct_col2.field2",SUCCESS +"source = myglue_test.default.http_logs | stats sum(size) by span(@timestamp, 1d) as age_size_per_day | sort - age_size_per_day | head 10",SUCCESS +"source = myglue_test.default.http_logs | stats distinct_count(clientip) by span(@timestamp, 1d) as age_size_per_day | sort - age_size_per_day | head 10",SUCCESS +"source = myglue_test.default.http_logs | stats avg(size) as avg_size by status, year | stats avg(avg_size) as avg_avg_size by year",SUCCESS +"source = myglue_test.default.http_logs | stats avg(size) as avg_size by status, year, month | stats avg(avg_size) as avg_avg_size by year, month | stats avg(avg_avg_size) as avg_avg_avg_size by year",SUCCESS +"source = myglue_test.default.nested | stats avg(int_col) as avg_int by struct_col.field2, struct_col2.field2 | stats avg(avg_int) as avg_avg_int by struct_col2.field2",FAILED +"source = myglue_test.default.nested | stats avg(int_col) as avg_int by struct_col.field2, struct_col2.field2 | eval new_col = avg_int | stats avg(avg_int) as avg_avg_int by new_col",SUCCESS +source = myglue_test.default.nested | rare int_col,SUCCESS +source = myglue_test.default.nested | rare int_col by struct_col.field2,SUCCESS +source = myglue_test.default.http_logs | rare request,SUCCESS +source = myglue_test.default.http_logs | where status > 300 | rare request by status,SUCCESS +source = myglue_test.default.http_logs | rare clientip,SUCCESS +source = myglue_test.default.http_logs | where status > 300 | rare clientip,SUCCESS +source = myglue_test.default.http_logs | where status > 300 | rare clientip by day,SUCCESS +source = myglue_test.default.nested | top int_col by struct_col.field2,SUCCESS +source = myglue_test.default.nested | top 1 int_col by struct_col.field2,SUCCESS +source = myglue_test.default.nested | top 2 int_col by struct_col.field2,SUCCESS +source = myglue_test.default.nested | top int_col,SUCCESS +source = myglue_test.default.http_logs | inner join left=l right=r on l.status = r.int_col myglue_test.default.nested | head 10,FAILED +"source = myglue_test.default.http_logs | parse request 'GET /(?[a-zA-Z]+)/.*' | fields request, domain | head 10",SUCCESS +source = myglue_test.default.http_logs | parse request 'GET /(?[a-zA-Z]+)/.*' | top 1 domain,SUCCESS +source = myglue_test.default.http_logs | parse request 'GET /(?[a-zA-Z]+)/.*' | stats count() by domain,SUCCESS +"source = myglue_test.default.http_logs | parse request 'GET /(?[a-zA-Z]+)/.*' | eval a = 1 | fields a, domain | head 10",SUCCESS +"source = myglue_test.default.http_logs | parse request 'GET /(?[a-zA-Z]+)/.*' | where size > 0 | sort - size | fields size, domain | head 10",SUCCESS +"source = myglue_test.default.http_logs | parse request 'GET /(?[a-zA-Z]+)/(?[a-zA-Z]+)/.*' | where domain = 'english' | sort - picName | fields domain, picName | head 10",SUCCESS +source = myglue_test.default.http_logs | patterns request | fields patterns_field | head 10,SUCCESS +source = 
myglue_test.default.http_logs | patterns request | where size > 0 | fields patterns_field | head 10,SUCCESS +"source = myglue_test.default.http_logs | patterns new_field='no_letter' pattern='[a-zA-Z]' request | fields request, no_letter | head 10",SUCCESS +source = myglue_test.default.http_logs | patterns new_field='no_letter' pattern='[a-zA-Z]' request | stats count() by no_letter,SUCCESS +"source = myglue_test.default.http_logs | patterns new_field='status' pattern='[a-zA-Z]' request | fields request, status | head 10",FAILED +source = myglue_test.default.http_logs | rename @timestamp as timestamp | head 10,FAILED +source = myglue_test.default.http_logs | sort size | head 10,SUCCESS +source = myglue_test.default.http_logs | sort + size | head 10,SUCCESS +source = myglue_test.default.http_logs | sort - size | head 10,SUCCESS +"source = myglue_test.default.http_logs | sort + size, + @timestamp | head 10",SUCCESS +"source = myglue_test.default.http_logs | sort - size, - @timestamp | head 10",SUCCESS +"source = myglue_test.default.http_logs | sort - size, @timestamp | head 10",SUCCESS +"source = myglue_test.default.http_logs | eval c1 = upper(request) | eval c2 = concat('Hello ', if(like(c1, '%bordeaux%'), 'World', clientip)) | eval c3 = length(request) | eval c4 = ltrim(request) | eval c5 = rtrim(request) | eval c6 = substring(clientip, 5, 2) | eval c7 = trim(request) | eval c8 = upper(request) | eval c9 = position('bordeaux' IN request) | eval c10 = replace(request, 'GET', 'GGG') | fields c1, c2, c3, c4, c5, c6, c7, c8, c9, c10 | head 10",SUCCESS +"source = myglue_test.default.http_logs | eval c1 = unix_timestamp(@timestamp) | eval c2 = now() | eval c3 = +DAY_OF_WEEK(@timestamp) | eval c4 = +DAY_OF_MONTH(@timestamp) | eval c5 = +DAY_OF_YEAR(@timestamp) | eval c6 = +WEEK_OF_YEAR(@timestamp) | eval c7 = +WEEK(@timestamp) | eval c8 = +MONTH_OF_YEAR(@timestamp) | eval c9 = +HOUR_OF_DAY(@timestamp) | eval c10 = +MINUTE_OF_HOUR(@timestamp) | eval c11 = +SECOND_OF_MINUTE(@timestamp) | eval c12 = +LOCALTIME() | fields c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12 | head 10",SUCCESS +"source=myglue_test.default.people | eval c1 = adddate(@timestamp, 1) | fields c1 | head 10",SUCCESS +"source=myglue_test.default.people | eval c2 = subdate(@timestamp, 1) | fields c2 | head 10",SUCCESS +source=myglue_test.default.people | eval c1 = date_add(@timestamp INTERVAL 1 DAY) | fields c1 | head 10,SUCCESS +source=myglue_test.default.people | eval c1 = date_sub(@timestamp INTERVAL 1 DAY) | fields c1 | head 10,SUCCESS +source=myglue_test.default.people | eval `CURDATE()` = CURDATE() | fields `CURDATE()`,SUCCESS +source=myglue_test.default.people | eval `CURRENT_DATE()` = CURRENT_DATE() | fields `CURRENT_DATE()`,SUCCESS +source=myglue_test.default.people | eval `CURRENT_TIMESTAMP()` = CURRENT_TIMESTAMP() | fields `CURRENT_TIMESTAMP()`,SUCCESS +source=myglue_test.default.people | eval `DATE('2020-08-26')` = DATE('2020-08-26') | fields `DATE('2020-08-26')`,SUCCESS +source=myglue_test.default.people | eval `DATE(TIMESTAMP('2020-08-26 13:49:00'))` = DATE(TIMESTAMP('2020-08-26 13:49:00')) | fields `DATE(TIMESTAMP('2020-08-26 13:49:00'))`,SUCCESS +source=myglue_test.default.people | eval `DATE('2020-08-26 13:49')` = DATE('2020-08-26 13:49') | fields `DATE('2020-08-26 13:49')`,SUCCESS +"source=myglue_test.default.people | eval `DATE_FORMAT('1998-01-31 13:14:15.012345', 'HH:mm:ss.SSSSSS')` = DATE_FORMAT('1998-01-31 13:14:15.012345', 'HH:mm:ss.SSSSSS'), `DATE_FORMAT(TIMESTAMP('1998-01-31 13:14:15.012345'), 
'yyyy-MMM-dd hh:mm:ss a')` = DATE_FORMAT(TIMESTAMP('1998-01-31 13:14:15.012345'), 'yyyy-MMM-dd hh:mm:ss a') | fields `DATE_FORMAT('1998-01-31 13:14:15.012345', 'HH:mm:ss.SSSSSS')`, `DATE_FORMAT(TIMESTAMP('1998-01-31 13:14:15.012345'), 'yyyy-MMM-dd hh:mm:ss a')`",SUCCESS +"source=myglue_test.default.people | eval `'2000-01-02' - '2000-01-01'` = DATEDIFF(TIMESTAMP('2000-01-02 00:00:00'), TIMESTAMP('2000-01-01 23:59:59')), `'2001-02-01' - '2004-01-01'` = DATEDIFF(DATE('2001-02-01'), TIMESTAMP('2004-01-01 00:00:00')) | fields `'2000-01-02' - '2000-01-01'`, `'2001-02-01' - '2004-01-01'`", +source=myglue_test.default.people | eval `DAY(DATE('2020-08-26'))` = DAY(DATE('2020-08-26')) | fields `DAY(DATE('2020-08-26'))`, +source=myglue_test.default.people | eval `DAYNAME(DATE('2020-08-26'))` = DAYNAME(DATE('2020-08-26')) | fields `DAYNAME(DATE('2020-08-26'))`,FAILED +source=myglue_test.default.people | eval `CURRENT_TIMEZONE()` = CURRENT_TIMEZONE() | fields `CURRENT_TIMEZONE()`,SUCCESS +source=myglue_test.default.people | eval `UTC_TIMESTAMP()` = UTC_TIMESTAMP() | fields `UTC_TIMESTAMP()`,SUCCESS +"source=myglue_test.default.people | eval `TIMESTAMPDIFF(YEAR, '1997-01-01 00:00:00', '2001-03-06 00:00:00')` = TIMESTAMPDIFF(YEAR, '1997-01-01 00:00:00', '2001-03-06 00:00:00') | eval `TIMESTAMPDIFF(SECOND, timestamp('1997-01-01 00:00:23'), timestamp('1997-01-01 00:00:00'))` = TIMESTAMPDIFF(SECOND, timestamp('1997-01-01 00:00:23'), timestamp('1997-01-01 00:00:00')) | fields `TIMESTAMPDIFF(YEAR, '1997-01-01 00:00:00', '2001-03-06 00:00:00')`, `TIMESTAMPDIFF(SECOND, timestamp('1997-01-01 00:00:23'), timestamp('1997-01-01 00:00:00'))`",SUCCESS +"source=myglue_test.default.people | eval `TIMESTAMPADD(DAY, 17, '2000-01-01 00:00:00')` = TIMESTAMPADD(DAY, 17, '2000-01-01 00:00:00') | eval `TIMESTAMPADD(QUARTER, -1, '2000-01-01 00:00:00')` = TIMESTAMPADD(QUARTER, -1, '2000-01-01 00:00:00') | fields `TIMESTAMPADD(DAY, 17, '2000-01-01 00:00:00')`, `TIMESTAMPADD(QUARTER, -1, '2000-01-01 00:00:00')`",SUCCESS + source = myglue_test.default.http_logs | stats count(),SUCCESS +"source = myglue_test.default.http_logs | stats avg(size) as c1, max(size) as c2, min(size) as c3, sum(size) as c4, percentile(size, 50) as c5, stddev_pop(size) as c6, stddev_samp(size) as c7, distinct_count(size) as c8",SUCCESS +"source = myglue_test.default.http_logs | eval c1 = abs(size) | eval c2 = ceil(size) | eval c3 = floor(size) | eval c4 = sqrt(size) | eval c5 = ln(size) | eval c6 = pow(size, 2) | eval c7 = mod(size, 2) | fields c1, c2, c3, c4, c5, c6, c7 | head 10",SUCCESS +"source = myglue_test.default.http_logs | eval c1 = isnull(request) | eval c2 = isnotnull(request) | eval c3 = ifnull(request, +""Unknown"") | eval c4 = nullif(request, +""Unknown"") | eval c5 = isnull(size) | eval c6 = if(like(request, '%bordeaux%'), 'hello', 'world') | fields c1, c2, c3, c4, c5, c6 | head 10",SUCCESS +/* this is block comment */ source = myglue_test.tpch_csv.orders | head 1 // this is line comment,SUCCESS +"/* test in tpch q16, q18, q20 */ source = myglue_test.tpch_csv.orders | head 1 // add source=xx to avoid failure in automation",SUCCESS +"/* test in tpch q4, q21, q22 */ source = myglue_test.tpch_csv.orders | head 1",SUCCESS +"/* test in tpch q2, q11, q15, q17, q20, q22 */ source = myglue_test.tpch_csv.orders | head 1",SUCCESS +"/* test in tpch q7, q8, q9, q13, q15, q22 */ source = myglue_test.tpch_csv.orders | head 1",SUCCESS +/* lots of inner join tests in tpch */ source = myglue_test.tpch_csv.orders | head 1,SUCCESS +/* left join test in tpch 
q13 */ source = myglue_test.tpch_csv.orders | head 1,SUCCESS +"source = myglue_test.tpch_csv.orders + | right outer join ON c_custkey = o_custkey AND not like(o_comment, '%special%requests%') + myglue_test.tpch_csv.customer +| stats count(o_orderkey) as c_count by c_custkey +| sort - c_count",SUCCESS +"source = myglue_test.tpch_csv.orders + | full outer join ON c_custkey = o_custkey AND not like(o_comment, '%special%requests%') + myglue_test.tpch_csv.customer +| stats count(o_orderkey) as c_count by c_custkey +| sort - c_count",SUCCESS +"source = myglue_test.tpch_csv.customer +| semi join ON c_custkey = o_custkey myglue_test.tpch_csv.orders +| where c_mktsegment = 'BUILDING' + | sort - c_custkey +| head 10",SUCCESS +"source = myglue_test.tpch_csv.customer +| anti join ON c_custkey = o_custkey myglue_test.tpch_csv.orders +| where c_mktsegment = 'BUILDING' + | sort - c_custkey +| head 10",SUCCESS +"source = myglue_test.tpch_csv.supplier +| where like(s_comment, '%Customer%Complaints%') +| join ON s_nationkey > n_nationkey [ source = myglue_test.tpch_csv.nation | where n_name = 'SAUDI ARABIA' ] +| sort - s_name +| head 10",SUCCESS +"source = myglue_test.tpch_csv.supplier +| where like(s_comment, '%Customer%Complaints%') +| join [ source = myglue_test.tpch_csv.nation | where n_name = 'SAUDI ARABIA' ] +| sort - s_name +| head 10",SUCCESS +source=myglue_test.default.people | LOOKUP myglue_test.default.work_info uid AS id REPLACE department | stats distinct_count(department),SUCCESS +source = myglue_test.default.people| LOOKUP myglue_test.default.work_info uid AS id APPEND department | stats distinct_count(department),SUCCESS +source = myglue_test.default.people| LOOKUP myglue_test.default.work_info uid AS id REPLACE department AS country | stats distinct_count(country),SUCCESS +source = myglue_test.default.people| LOOKUP myglue_test.default.work_info uid AS id APPEND department AS country | stats distinct_count(country),SUCCESS +"source = myglue_test.default.people| LOOKUP myglue_test.default.work_info uID AS id, name REPLACE department | stats distinct_count(department)",SUCCESS +"source = myglue_test.default.people| LOOKUP myglue_test.default.work_info uid AS ID, name APPEND department | stats distinct_count(department)",SUCCESS +"source = myglue_test.default.people| LOOKUP myglue_test.default.work_info uID AS id, name | head 10",SUCCESS +"source = myglue_test.default.people | eval major = occupation | fields id, name, major, country, salary | LOOKUP myglue_test.default.work_info name REPLACE occupation AS major | stats distinct_count(major)",SUCCESS +"source = myglue_test.default.people | eval major = occupation | fields id, name, major, country, salary | LOOKUP myglue_test.default.work_info name APPEND occupation AS major | stats distinct_count(major)",SUCCESS +"source = myglue_test.default.http_logs | eval res = json('{""account_number"":1,""balance"":39225,""age"":32,""gender"":""M""}') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json('{""f1"":""abc"",""f2"":{""f3"":""a"",""f4"":""b""}}') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json('[1,2,3,{""f1"":1,""f2"":[5,6]},4]') | head 1 | fields res",SUCCESS +source = myglue_test.default.http_logs | eval res = json('[]') | head 1 | fields res,SUCCESS +"source = myglue_test.default.http_logs | eval res = json(‘{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}') | head 1 | fields res",SUCCESS +"source = 
myglue_test.default.http_logs | eval res = json('{""invalid"": ""json""') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json('[1,2,3]') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json(‘[1,2') | head 1 | fields res",SUCCESS +source = myglue_test.default.http_logs | eval res = json('[invalid json]') | head 1 | fields res,SUCCESS +source = myglue_test.default.http_logs | eval res = json('invalid json') | head 1 | fields res,SUCCESS +source = myglue_test.default.http_logs | eval res = json(null) | head 1 | fields res,SUCCESS +"source = myglue_test.default.http_logs | eval res = json_array('this', 'is', 'a', 'string', 'array') | head 1 | fields res",SUCCESS +source = myglue_test.default.http_logs | eval res = json_array() | head 1 | fields res,SUCCESS +"source = myglue_test.default.http_logs | eval res = json_array(1, 2, 0, -1, 1.1, -0.11) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_array('this', 'is', 1.1, -0.11, true, false) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = to_json_string(json_array(1,2,0,-1,1.1,-0.11)) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = array_length(json_array(1,2,0,-1,1.1,-0.11)) | head 1 | fields res",SUCCESS +source = myglue_test.default.http_logs | eval res = array_length(json_array()) | head 1 | fields res,SUCCESS +source = myglue_test.default.http_logs | eval res = json_array_length('[]') | head 1 | fields res,SUCCESS +"source = myglue_test.default.http_logs | eval res = json_array_length('[1,2,3,{""f1"":1,""f2"":[5,6]},4]') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_array_length('{\""key\"": 1}') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_array_length('[1,2') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = to_json_string(json_object('key', 'string_value')) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = to_json_string(json_object('key', 123.45)) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = to_json_string(json_object('key', true)) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = to_json_string(json_object(""a"", 1, ""b"", 2, ""c"", 3)) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = to_json_string(json_object('key', array())) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = to_json_string(json_object('key', array(1, 2, 3))) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = to_json_string(json_object('outer', json_object('inner', 123.45))) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = to_json_string(json_object(""array"", json_array(1,2,0,-1,1.1,-0.11))) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | where json_valid(('{""account_number"":1,""balance"":39225,""age"":32,""gender"":""M""}') | head 1",SUCCESS +"source = myglue_test.default.http_logs | where not json_valid(('{""account_number"":1,""balance"":39225,""age"":32,""gender"":""M""}') | head 1",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_keys(json('{""account_number"":1,""balance"":39225,""age"":32,""gender"":""M""}')) | head 1 | fields res",SUCCESS +"source = 
myglue_test.default.http_logs | eval res = json_keys(json('{""f1"":""abc"",""f2"":{""f3"":""a"",""f4"":""b""}}')) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_keys(json('[1,2,3,{""f1"":1,""f2"":[5,6]},4]')) | head 1 | fields res",SUCCESS +source = myglue_test.default.http_logs | eval res = json_keys(json('[]')) | head 1 | fields res,SUCCESS +"source = myglue_test.default.http_logs | eval res = json_keys(json(‘{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}')) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_keys(json('{""invalid"": ""json""')) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_keys(json('[1,2,3]')) | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_keys(json('[1,2')) | head 1 | fields res",SUCCESS +source = myglue_test.default.http_logs | eval res = json_keys(json('[invalid json]')) | head 1 | fields res,SUCCESS +source = myglue_test.default.http_logs | eval res = json_keys(json('invalid json')) | head 1 | fields res,SUCCESS +source = myglue_test.default.http_logs | eval res = json_keys(json(null)) | head 1 | fields res,SUCCESS +"source = myglue_test.default.http_logs | eval res = json_extract('{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}', '$') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_extract('{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}', '$.teacher') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_extract('{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}', '$.student') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_extract('{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}', '$.student[*]') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_extract('{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}', '$.student[0]') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_extract('{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}', '$.student[*].name') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_extract('{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}', '$.student[1].name') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_extract('{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}', '$.student[0].not_exist_key') | head 1 | fields res",SUCCESS +"source = myglue_test.default.http_logs | eval res = json_extract('{""teacher"":""Alice"",""student"":[{""name"":""Bob"",""rank"":1},{""name"":""Charlie"",""rank"":2}]}', '$.student[10]') | head 1 | fields res",SUCCESS +"source = myglue_test.default.people | eval array = json_array(1,2,0,-1,1.1,-0.11), result = forall(array, x -> x > 0) | head 1 | fields result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(1,2,0,-1,1.1,-0.11), result = forall(array, x -> x > -10) | head 1 | fields 
result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(json_object(""a"",1,""b"",-1),json_object(""a"",-1,""b"",-1)), result = forall(array, x -> x.a > 0) | head 1 | fields result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(json_object(""a"",1,""b"",-1),json_object(""a"",-1,""b"",-1)), result = exists(array, x -> x.b < 0) | head 1 | fields result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(1,2,0,-1,1.1,-0.11), result = exists(array, x -> x > 0) | head 1 | fields result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(1,2,0,-1,1.1,-0.11), result = exists(array, x -> x > 10) | head 1 | fields result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(1,2,0,-1,1.1,-0.11), result = filter(array, x -> x > 0) | head 1 | fields result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(1,2,0,-1,1.1,-0.11), result = filter(array, x -> x > 10) | head 1 | fields result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(1,2,3), result = transform(array, x -> x + 1) | head 1 | fields result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(1,2,3), result = transform(array, (x, y) -> x + y) | head 1 | fields result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(1,2,3), result = reduce(array, 0, (acc, x) -> acc + x) | head 1 | fields result",SUCCESS +"source = myglue_test.default.people | eval array = json_array(1,2,3), result = reduce(array, 0, (acc, x) -> acc + x, acc -> acc * 10) | head 1 | fields result",SUCCESS +source=myglue_test.default.people | eval age = salary | eventstats avg(age) | sort id | head 10,SUCCESS +"source=myglue_test.default.people | eval age = salary | eventstats avg(age) as avg_age, max(age) as max_age, min(age) as min_age, count(age) as count | sort id | head 10",SUCCESS +source=myglue_test.default.people | eventstats avg(salary) by country | sort id | head 10,SUCCESS +"source=myglue_test.default.people | eval age = salary | eventstats avg(age) as avg_age, max(age) as max_age, min(age) as min_age, count(age) as count by country | sort id | head 10",SUCCESS +"source=myglue_test.default.people | eval age = salary | eventstats avg(age) as avg_age, max(age) as max_age, min(age) as min_age, count(age) as count +by span(age, 10) | sort id | head 10",SUCCESS +"source=myglue_test.default.people | eval age = salary | eventstats avg(age) as avg_age, max(age) as max_age, min(age) as min_age, count(age) as count by span(age, 10) as age_span, country | sort id | head 10",SUCCESS +"source=myglue_test.default.people | where country != 'USA' | eventstats stddev_samp(salary), stddev_pop(salary), percentile_approx(salary, 60) by span(salary, 1000) as salary_span | sort id | head 10",SUCCESS +"source=myglue_test.default.people | eval age = salary | eventstats avg(age) as avg_age by occupation, country | eventstats avg(avg_age) as avg_state_age by country | sort id | head 10",SUCCESS +"source=myglue_test.default.people | eventstats distinct_count(salary) by span(salary, 1000) as age_span",FAILED +"source = myglue_test.tpch_csv.lineitem +| where l_shipdate <= subdate(date('1998-12-01'), 90) +| stats sum(l_quantity) as sum_qty, + sum(l_extendedprice) as sum_base_price, + sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, + sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, + avg(l_quantity) as avg_qty, + avg(l_extendedprice) as avg_price, + avg(l_discount) 
as avg_disc,
+ count() as count_order
+ by l_returnflag, l_linestatus
+| sort l_returnflag, l_linestatus",SUCCESS
+"source = myglue_test.tpch_csv.part
+| join ON p_partkey = ps_partkey myglue_test.tpch_csv.partsupp
+| join ON s_suppkey = ps_suppkey myglue_test.tpch_csv.supplier
+| join ON s_nationkey = n_nationkey myglue_test.tpch_csv.nation
+| join ON n_regionkey = r_regionkey myglue_test.tpch_csv.region
+| where p_size = 15 AND like(p_type, '%BRASS') AND r_name = 'EUROPE' AND ps_supplycost = [
+ source = myglue_test.tpch_csv.partsupp
+ | join ON s_suppkey = ps_suppkey myglue_test.tpch_csv.supplier
+ | join ON s_nationkey = n_nationkey myglue_test.tpch_csv.nation
+ | join ON n_regionkey = r_regionkey myglue_test.tpch_csv.region
+ | where r_name = 'EUROPE'
+ | stats MIN(ps_supplycost)
+ ]
+| sort - s_acctbal, n_name, s_name, p_partkey
+| head 100",SUCCESS
+"source = myglue_test.tpch_csv.customer
+| join ON c_custkey = o_custkey myglue_test.tpch_csv.orders
+| join ON l_orderkey = o_orderkey myglue_test.tpch_csv.lineitem
+| where c_mktsegment = 'BUILDING' AND o_orderdate < date('1995-03-15') AND l_shipdate > date('1995-03-15')
+| stats sum(l_extendedprice * (1 - l_discount)) as revenue by l_orderkey, o_orderdate, o_shippriority
+ | sort - revenue, o_orderdate
+| head 10",SUCCESS
+"source = myglue_test.tpch_csv.orders
+| where o_orderdate >= date('1993-07-01')
+ and o_orderdate < date_add(date('1993-07-01'), interval 3 month)
+ and exists [
+ source = myglue_test.tpch_csv.lineitem
+ | where l_orderkey = o_orderkey and l_commitdate < l_receiptdate
+ ]
+| stats count() as order_count by o_orderpriority
+| sort o_orderpriority",SUCCESS
+"source = myglue_test.tpch_csv.customer
+| join ON c_custkey = o_custkey myglue_test.tpch_csv.orders
+| join ON l_orderkey = o_orderkey myglue_test.tpch_csv.lineitem
+| join ON l_suppkey = s_suppkey AND c_nationkey = s_nationkey myglue_test.tpch_csv.supplier
+| join ON s_nationkey = n_nationkey myglue_test.tpch_csv.nation
+| join ON n_regionkey = r_regionkey myglue_test.tpch_csv.region
+| where r_name = 'ASIA' AND o_orderdate >= date('1994-01-01') AND o_orderdate < date_add(date('1994-01-01'), interval 1 year)
+| stats sum(l_extendedprice * (1 - l_discount)) as revenue by n_name
+| sort - revenue",SUCCESS
+"source = myglue_test.tpch_csv.lineitem
+| where l_shipdate >= date('1994-01-01')
+ and l_shipdate < adddate(date('1994-01-01'), 365)
+ and l_discount between .06 - 0.01 and .06 + 0.01
+ and l_quantity < 24
+| stats sum(l_extendedprice * l_discount) as revenue",SUCCESS
+"source = [
+ source = myglue_test.tpch_csv.supplier
+ | join ON s_suppkey = l_suppkey myglue_test.tpch_csv.lineitem
+ | join ON o_orderkey = l_orderkey myglue_test.tpch_csv.orders
+ | join ON c_custkey = o_custkey myglue_test.tpch_csv.customer
+ | join ON s_nationkey = n1.n_nationkey myglue_test.tpch_csv.nation as n1
+ | join ON c_nationkey = n2.n_nationkey myglue_test.tpch_csv.nation as n2
+ | where l_shipdate between date('1995-01-01') and date('1996-12-31')
+ and n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY' or n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE'
+ | eval supp_nation = n1.n_name, cust_nation = n2.n_name, l_year = year(l_shipdate), volume = l_extendedprice * (1 - l_discount)
+ | fields supp_nation, cust_nation, l_year, volume
+ ] as shipping
+| stats sum(volume) as revenue by supp_nation, cust_nation, l_year
+| sort supp_nation, cust_nation, l_year",SUCCESS
+"source = [
+ source = myglue_test.tpch_csv.part
+ | join ON p_partkey = l_partkey myglue_test.tpch_csv.lineitem
+ | join ON s_suppkey = l_suppkey myglue_test.tpch_csv.supplier
+ | join ON l_orderkey = o_orderkey myglue_test.tpch_csv.orders
+ | join ON o_custkey = c_custkey myglue_test.tpch_csv.customer
+ | join ON c_nationkey = n1.n_nationkey myglue_test.tpch_csv.nation as n1
+ | join ON s_nationkey = n2.n_nationkey myglue_test.tpch_csv.nation as n2
+ | join ON n1.n_regionkey = r_regionkey myglue_test.tpch_csv.region
+ | where r_name = 'AMERICA' AND p_type = 'ECONOMY ANODIZED STEEL'
+ and o_orderdate between date('1995-01-01') and date('1996-12-31')
+ | eval o_year = year(o_orderdate)
+ | eval volume = l_extendedprice * (1 - l_discount)
+ | eval nation = n2.n_name
+ | fields o_year, volume, nation
+ ] as all_nations
+| stats sum(case(nation = 'BRAZIL', volume else 0)) as sum_case, sum(volume) as sum_volume by o_year
+| eval mkt_share = sum_case / sum_volume
+| fields mkt_share, o_year
+| sort o_year",SUCCESS
+"source = [
+ source = myglue_test.tpch_csv.part
+ | join ON p_partkey = l_partkey myglue_test.tpch_csv.lineitem
+ | join ON s_suppkey = l_suppkey myglue_test.tpch_csv.supplier
+ | join ON ps_partkey = l_partkey and ps_suppkey = l_suppkey myglue_test.tpch_csv.partsupp
+ | join ON o_orderkey = l_orderkey myglue_test.tpch_csv.orders
+ | join ON s_nationkey = n_nationkey myglue_test.tpch_csv.nation
+ | where like(p_name, '%green%')
+ | eval nation = n_name
+ | eval o_year = year(o_orderdate)
+ | eval amount = l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity
+ | fields nation, o_year, amount
+ ] as profit
+| stats sum(amount) as sum_profit by nation, o_year
+| sort nation, - o_year",SUCCESS
+"source = myglue_test.tpch_csv.customer
+| join ON c_custkey = o_custkey myglue_test.tpch_csv.orders
+| join ON l_orderkey = o_orderkey myglue_test.tpch_csv.lineitem
+| join ON c_nationkey = n_nationkey myglue_test.tpch_csv.nation
+| where o_orderdate >= date('1993-10-01')
+ AND o_orderdate < date_add(date('1993-10-01'), interval 3 month)
+ AND l_returnflag = 'R'
+| stats sum(l_extendedprice * (1 - l_discount)) as revenue by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment
+| sort - revenue
+| head 20",SUCCESS
+"source = myglue_test.tpch_csv.partsupp
+| join ON ps_suppkey = s_suppkey myglue_test.tpch_csv.supplier
+| join ON s_nationkey = n_nationkey myglue_test.tpch_csv.nation
+| where n_name = 'GERMANY'
+| stats sum(ps_supplycost * ps_availqty) as value by ps_partkey
+| where value > [
+ source = myglue_test.tpch_csv.partsupp
+ | join ON ps_suppkey = s_suppkey myglue_test.tpch_csv.supplier
+ | join ON s_nationkey = n_nationkey myglue_test.tpch_csv.nation
+ | where n_name = 'GERMANY'
+ | stats sum(ps_supplycost * ps_availqty) as check
+ | eval threshold = check * 0.0001000000
+ | fields threshold
+ ]
+| sort - value",SUCCESS
+"source = myglue_test.tpch_csv.orders
+| join ON o_orderkey = l_orderkey myglue_test.tpch_csv.lineitem
+| where l_commitdate < l_receiptdate
+ and l_shipdate < l_commitdate
+ and l_shipmode in ('MAIL', 'SHIP')
+ and l_receiptdate >= date('1994-01-01')
+ and l_receiptdate < date_add(date('1994-01-01'), interval 1 year)
+| stats sum(case(o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH', 1 else 0)) as high_line_count,
+ sum(case(o_orderpriority != '1-URGENT' and o_orderpriority != '2-HIGH', 1 else 0)) as low_line_count
+ by l_shipmode
+| sort l_shipmode",SUCCESS
+"source = [
+ source = myglue_test.tpch_csv.customer
+ | left outer join ON c_custkey = o_custkey AND not like(o_comment, '%special%requests%')
+ myglue_test.tpch_csv.orders
+ | stats count(o_orderkey) as c_count by c_custkey
+ ] as c_orders
+| stats count() as custdist by c_count
+| sort - custdist, - c_count",SUCCESS
+"source = myglue_test.tpch_csv.lineitem
+| join ON l_partkey = p_partkey
+ AND l_shipdate >= date('1995-09-01')
+ AND l_shipdate < date_add(date('1995-09-01'), interval 1 month)
+ myglue_test.tpch_csv.part
+| stats sum(case(like(p_type, 'PROMO%'), l_extendedprice * (1 - l_discount) else 0)) as sum1,
+ sum(l_extendedprice * (1 - l_discount)) as sum2
+| eval promo_revenue = 100.00 * sum1 / sum2 // Stats and Eval commands can combine when issues/819 resolved
+| fields promo_revenue",SUCCESS
+"source = myglue_test.tpch_csv.supplier
+| join right = revenue0 ON s_suppkey = supplier_no [
+ source = myglue_test.tpch_csv.lineitem
+ | where l_shipdate >= date('1996-01-01') AND l_shipdate < date_add(date('1996-01-01'), interval 3 month)
+ | eval supplier_no = l_suppkey
+ | stats sum(l_extendedprice * (1 - l_discount)) as total_revenue by supplier_no
+ ]
+| where total_revenue = [
+ source = [
+ source = myglue_test.tpch_csv.lineitem
+ | where l_shipdate >= date('1996-01-01') AND l_shipdate < date_add(date('1996-01-01'), interval 3 month)
+ | eval supplier_no = l_suppkey
+ | stats sum(l_extendedprice * (1 - l_discount)) as total_revenue by supplier_no
+ ]
+ | stats max(total_revenue)
+ ]
+| sort s_suppkey
+| fields s_suppkey, s_name, s_address, s_phone, total_revenue",SUCCESS
+"source = myglue_test.tpch_csv.partsupp
+| join ON p_partkey = ps_partkey myglue_test.tpch_csv.part
+| where p_brand != 'Brand#45'
+ and not like(p_type, 'MEDIUM POLISHED%')
+ and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
+ and ps_suppkey not in [
+ source = myglue_test.tpch_csv.supplier
+ | where like(s_comment, '%Customer%Complaints%')
+ | fields s_suppkey
+ ]
+| stats distinct_count(ps_suppkey) as supplier_cnt by p_brand, p_type, p_size
+| sort - supplier_cnt, p_brand, p_type, p_size",SUCCESS
+"source = myglue_test.tpch_csv.lineitem
+| join ON p_partkey = l_partkey myglue_test.tpch_csv.part
+| where p_brand = 'Brand#23'
+ and p_container = 'MED BOX'
+ and l_quantity < [
+ source = myglue_test.tpch_csv.lineitem
+ | where l_partkey = p_partkey
+ | stats avg(l_quantity) as avg
+ | eval `0.2 * avg` = 0.2 * avg
+ | fields `0.2 * avg`
+ ]
+| stats sum(l_extendedprice) as sum
+| eval avg_yearly = sum / 7.0
+| fields avg_yearly",SUCCESS
+"source = myglue_test.tpch_csv.customer
+| join ON c_custkey = o_custkey myglue_test.tpch_csv.orders
+| join ON o_orderkey = l_orderkey myglue_test.tpch_csv.lineitem
+| where o_orderkey in [
+ source = myglue_test.tpch_csv.lineitem
+ | stats sum(l_quantity) as sum by l_orderkey
+ | where sum > 300
+ | fields l_orderkey
+ ]
+| stats sum(l_quantity) by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
+| sort - o_totalprice, o_orderdate
+| head 100",SUCCESS
+"source = myglue_test.tpch_csv.lineitem
+| join ON p_partkey = l_partkey
+ and p_brand = 'Brand#12'
+ and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
+ and l_quantity >= 1 and l_quantity <= 1 + 10
+ and p_size between 1 and 5
+ and l_shipmode in ('AIR', 'AIR REG')
+ and l_shipinstruct = 'DELIVER IN PERSON'
+ OR p_partkey = l_partkey
+ and p_brand = 'Brand#23'
+ and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
+ and l_quantity >= 10 and l_quantity <= 10 + 10
+ and p_size between 1 and 10
+ and l_shipmode in ('AIR', 'AIR REG')
+ and l_shipinstruct = 'DELIVER IN PERSON'
+ OR p_partkey = l_partkey
+ and p_brand = 'Brand#34'
+ and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
+ and l_quantity >= 20 and l_quantity <= 20 + 10
+ and p_size between 1 and 15
+ and l_shipmode in ('AIR', 'AIR REG')
+ and l_shipinstruct = 'DELIVER IN PERSON'
+ myglue_test.tpch_csv.part",SUCCESS
+"source = myglue_test.tpch_csv.supplier
+| join ON s_nationkey = n_nationkey myglue_test.tpch_csv.nation
+| where n_name = 'CANADA'
+ and s_suppkey in [
+ source = myglue_test.tpch_csv.partsupp
+ | where ps_partkey in [
+ source = myglue_test.tpch_csv.part
+ | where like(p_name, 'forest%')
+ | fields p_partkey
+ ]
+ and ps_availqty > [
+ source = myglue_test.tpch_csv.lineitem
+ | where l_partkey = ps_partkey
+ and l_suppkey = ps_suppkey
+ and l_shipdate >= date('1994-01-01')
+ and l_shipdate < date_add(date('1994-01-01'), interval 1 year)
+ | stats sum(l_quantity) as sum_l_quantity
+ | eval half_sum_l_quantity = 0.5 * sum_l_quantity
+ | fields half_sum_l_quantity
+ ]
+ | fields ps_suppkey
+ ]",SUCCESS
+"source = myglue_test.tpch_csv.supplier
+| join ON s_suppkey = l1.l_suppkey myglue_test.tpch_csv.lineitem as l1
+| join ON o_orderkey = l1.l_orderkey myglue_test.tpch_csv.orders
+| join ON s_nationkey = n_nationkey myglue_test.tpch_csv.nation
+| where o_orderstatus = 'F'
+ and l1.l_receiptdate > l1.l_commitdate
+ and exists [
+ source = myglue_test.tpch_csv.lineitem as l2
+ | where l2.l_orderkey = l1.l_orderkey
+ and l2.l_suppkey != l1.l_suppkey
+ ]
+ and not exists [
+ source = myglue_test.tpch_csv.lineitem as l3
+ | where l3.l_orderkey = l1.l_orderkey
+ and l3.l_suppkey != l1.l_suppkey
+ and l3.l_receiptdate > l3.l_commitdate
+ ]
+ and n_name = 'SAUDI ARABIA'
+| stats count() as numwait by s_name
+| sort - numwait, s_name
+| head 100",SUCCESS
+"source = [
+ source = myglue_test.tpch_csv.customer
+ | where substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
+ and c_acctbal > [
+ source = myglue_test.tpch_csv.customer
+ | where c_acctbal > 0.00
+ and substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17')
+ | stats avg(c_acctbal)
+ ]
+ and not exists [
+ source = myglue_test.tpch_csv.orders
+ | where o_custkey = c_custkey
+ ]
+ | eval cntrycode = substring(c_phone, 1, 2)
+ | fields cntrycode, c_acctbal
+ ] as custsale
+| stats count() as numcust, sum(c_acctbal) as totacctbal by cntrycode
+| sort cntrycode",SUCCESS