Generated on 2023-02-08
#7275 | [FEA] Support SaveIntoDataSourceCommand for Delta Lake |
#5225 | [FEA] Support array_remove |
#6781 | [FEA] Create demo notebook on Databricks for qualification tool usage |
#6782 | [FEA] Create demo notebook on Databricks for profiler tool usage |
#6024 | [FEA] Add support for Spark 3.2.3 SNAPSHOT |
#6887 | [FEA] support expressions parameter in substr function |
#7078 | [FEA] Add shims for Spark 3.2.3 |
#3037 | [FEA] Support ZSTD compression with Parquet and Orc |
#6916 | [FEA] Support Coalesce on map column(s) |
#6902 | [FEA] Add shims for Spark 3.3.2 |
#6896 | [FEA] Support Apache Spark 3.3.1 |
#6884 | [FEA] Support instr |
#6313 | [FEA] Support mapInArrow introduced by pyspark 3.3.0+ |
#6064 | [FEA] Qualification tool support parsing expressions (part 2) |
#6645 | [FEA] Qualification Tool: Print timestamp related functions. |
#6794 | Investigate other compression codecs and other serializers. |
#6528 | [FEA] Identify additional opportunities for using tiered projections |
#6430 | [FEA] look into using the new CUDF like operator |
#7020 | Fallback to CPU for Delta lake delta_log parquet checkpoint files |
#6254 | [FEA] Support z-ordering acceleration |
#6524 | [FEA] Improve tiered project by eliminating eclipsed columns in each tier |
#6130 | [FEA] More efficient bound check for GpuCast |
#6455 | [BUG] Rapids tools test on Databricks fail |
#6890 | [BUG] RUN_DIR change fail some CI pipelines |
#7085 | [BUG] GPU Hive Text reader fails to read floating point input as integral types |
#7271 | [BUG] failed to build in Databricks runtime due to alluxio utils |
#6636 | [BUG] casting to string and list, and concat can cause overflow issues |
#7234 | [BUG] Integration test script failed on: '/tmp/20221204/python/lib': No such file or directory |
#7198 | [BUG] RapidsShuffleManager fails to unregister UCX-mode shuffle |
#7168 | [BUG] mismatch cpu and gpu result in test_aqe_join_reused_exchange_inequality_condition failed |
#7066 | [SPARK-39432][BUG] The test test_array_element_at_zero_index_fail fails on Spark 3.4 |
#7179 | [BUG] Executors killed for out of memory with multithreaded RapidsShuffleManager |
#7054 | [BUG] Some tests in the AdaptiveQueryExecSuite fail on Spark 340 |
#7037 | [BUG] AQE on Databricks failed the query with error "UnsupportedOperationException: ColumnarToRow does not implement doExecuteBroadcast" |
#7150 | [BUG] Spark 3.4 build fails |
#7092 | [BUG] java gateway crashed due to hash_aggregate_test case intermittently |
#7140 | [BUG] failed to echo PROJECT_VERSION in premerge CI |
#7111 | [BUG] Multithreaded shuffle keeps files around after RDDs are GCed |
#7059 | [BUG] Qualification - Incorrect parsing of conditional expressions |
#6983 | [BUG] query95 @ 30TB negative allocation from BaseHashJoinIterator.countGroups with default 200 partitions |
#7036 | [BUG] 30TB query95 fails on the join with illegal memory access with 200 partitions |
#7065 | [SPARK-38976][SPARK-40066][BUG] Some tests in the array_test.py fail on Spark 3.4 because the conf strictIndexOperator has been removed |
#7044 | [BUG] Qualification tool skips applications due to failure in expression parsing |
#7026 | [BUG] AnsiCastOpSuite 340 failures |
#7039 | [BUG] nz timestamp (MILLIS AND MICROS) fails on Spark 3.4 |
#7033 | [BUG] GPU and CPU substring output different rows when pos + len < 0 && len >= 0 |
#7041 | [BUG] regexp_test and many other test failures |
#6425 | [BUG] Host column leak detected in ParquetCachedBatchSerializer tests |
#6906 | [FEAT] Add tests for parquet reader code for all possible types |
#6963 | [BUG] Dynamic partition writer prevents GPU memory from being freed during write |
#7014 | [BUG] The unit test avg literals bools fail fails in Spark 340 |
#7003 | [BUG] Alluxio config pathsToReplace does not overwrite automount config. |
#6779 | [BUG] Always read old data from alluxio regardless of S3 changes when using CONVERT_TIME replacement algorithm |
#7010 | [BUG] Parquet multi-threaded reader bufferTime is wrong |
#6949 | [BUG] Negative allocation error while stress testing with NDSv2 Query 9 |
#6995 | [BUG] HostToGpuCoalesceIterator can sometimes close input batches |
#4884 | [BUG] Split by regular expressions with ? and * repetition are not consistent with Spark |
#6452 | [BUG] GPU writes more records than maxRecordsPerFile limit while CPU performs well |
#6951 | [BUG] cast_test.py::test_cast_float_to_timestamp_ansi_for_nan_inf failed in spark 3.3.0+ |
#6880 | [BUG] Regular expressions should support escaped forward slash \/ (and any other "invalid" escape chars) |
#6537 | [BUG] per-sql unit-tests need to be added to the test generator |
#6933 | [BUG] Tools run with filter arguments should handle corrupted log that doesn't have SparkListenerApplicationStart event |
#3143 | [BUG] DPP is not working in Databricks env |
#6895 | [BUG] Profile tool fails in getMaxTaskInputSizeBytes |
#6871 | [BUG] Parquet reader - Found no metadata for schema index |
#6883 | [BUG] integration test fail in CDH env due to us trying to change permissions on /tmp/hive |
#6752 | [BUG] StringOperatorsSuite failed when building with JDK17 |
#6671 | [Audit][BUG] Handle updated messageParameters for any thrown Spark exceptions in Spark 3.4.x |
#6865 | [BUG] parquet_write_test is failing when reading on the CPU parquet that was written on the GPU |
#6856 | [BUG] Can not switch Alluxio auto-mount option on the fly |
#6869 | [BUG] Building databricks failed |
#6848 | [BUG] github workflow actions use deprecated API "to be removed soon" |
#6825 | [BUG] pytests should configure hive.scratch.dir under RUN_DIR |
#6818 | [BUG] RapidsShuffleThreadedReader is not found when building the plugin with Spark 340 |
#6718 | [BUG] test_iceberg_parquet_read_round_trip FAILED "TypeError: object of type 'NoneType' has no len()" |
#6762 | [BUG] The concurrent writer throws a class casting error when enabling AQE. |
#6146 | [BUG] intermittent orc test_read_round_trip failed due to /tmp/hive location |
#2654 | [BUG] --help at the end does not print out help for tools |
#7337 | Update 22.12 changelog to latest [skip ci] |
#7316 | Update jni version 22.12.0 |
#7237 | [Doc]update download docs for v22.12 release[skip ci] |
#7330 | xfail all delta-write fallback cases [skip ci] |
#7288 | Add support for SaveIntoDataSource for Delta Lake 2.x |
#7306 | Cherry pick #7293 to 22.12 [skip ci] |
#7270 | Update 22.12 changelog to latest [skip ci] |
#7264 | Update columnar stats tracker API to pass file path for new batches |
#7273 | Fix AlluxioUtilsSuite build on Databricks for 22.12 |
#7250 | Change tools hadoop version to 3.3.4 |
#7172 | Add a document for how to view Alluxio metrics on UI [skip ci] |
#7238 | Add branch-specific premerge jenkinsfile |
#7243 | [Doc]fix broken links[skip ci] |
#7155 | Add unit tests for alluxio utils |
#7080 | [Doc] Document Alluxio does not sync metadata from S3 by default [skip ci] |
#7235 | Create tmp path to make python path explicit [skip ci] |
#7084 | [Doc]Update databricks doc for 22.12[skip ci] |
#7166 | Sync up spark2 explain code |
#6903 | Support projectV2 for changelog tooling [skip ci] |
#7203 | [Doc]add a Contact Us page at the top-level menu[skip ci] |
#7174 | Fix dependencies in jenkins-test script to support DB11.3 |
#7034 | Read directly from S3 instead of reading from Alluxio caches if files are large and disk is slow |
#7199 | Fixes unregisterShuffle bugs in the driver and a missed match for the GpuResolver |
#7156 | Add scripts to run integration test on Databricks by leveraging Jenkins parallelism [skip ci] |
#7195 | Fix non-deterministic query in test_aqe_join_reused_exchange_inequality_condition |
#7176 | Copying common ThreadFactoryBuilder to tools to remove dependency |
#7188 | Remove "SNAPSHOT" for 323 shim |
#7189 | specify shim versions to build [skip ci] |
#7165 | Try/catch cudf file scan exceptions and re-throw with file metadata in message |
#7164 | Search for CudaFatalException in causes of failureReason in function onTaskFailed |
#7169 | Remove snapshot shims build in premerge script |
#7180 | multithreaded RapidsShuffleManager change when we release memory |
#7115 | Support array_remove operator |
#7171 | Add tests for 331 and 332 |
#7099 | Update AQE tests to support Spark 3.4 |
#7110 | Add GpuBroadcastToRowExec to handle columnar broadcast in cpu broadcast join with AQE enabled |
#7153 | Add SchemaUtilsShims |
#7142 | Restore hash aggregate tests after cub segmented sort fix |
#7141 | Get PROJECT_VERSION from version-def.sh [skip ci] |
#7123 | Reduce the duplication of RegExpShim and getFileScanRDD |
#7145 | Remove inaccurate warnings about fallbacks when using multithreaded shuffle |
#7135 | Revert "Suffix artifactId with amd64/arm64 for the dist jars [skip ci]] (#7070)(#7120)" |
#7103 | Add support to DB 11.3 ML LTS in databricks build script |
#7125 | Add missing cleanup of shuffle data when using multi-threaded shuffle |
#6934 | Add support for chunked parquet reading |
#7120 | Build noSnapshots without cdh shims on arm CPU [skip ci] |
#7013 | Hive delimited textfile read support |
#7077 | Add shims for Spark 3.2.3 |
#7070 | Suffix artifactId with amd64/arm64 for the dist jars [skip ci] |
#7088 | Fix ConditionalExpr parser in Qualification tool |
#7097 | Use Databricks instance Spark version as default |
#7107 | Skip test_hash_groupby_collect_with_single_distinct [skip ci] |
#7102 | Skip test_hash_groupby_collect_partial_replace_with_distinct_fallback for #7092 |
#7051 | Support non literal position and length for substring |
#7067 | Update the tests in array_test.py to adapt the removal of strictIndexOperator in Spark 3.4 |
#7071 | Exception in SQLParser should not cause Qualification tool to skip app |
#7025 | Spark-3.4 - Fix cast unit tests |
#7045 | Fix parquet test for nztimestamp on spark 3.4.0 |
#7048 | Enable tiered projections for GpuProjectExec |
#7055 | [Doc]update a typo for iceberg readme[skip ci] |
#7049 | add parenthesis around delta_log check to short circuit |
#7052 | Enable automerge 22.12 to 23.02 [skip ci] |
#7040 | Fix a substring issue for a corner case |
#6960 | Use cudf like operator in GpuLike operator |
#7027 | Include unit tests and integration tests in mvn-verify-check |
#7022 | Fallback to CPU when reading Delta delta_log parquet checkpoint files |
#7031 | Add skip test options [skip ci] |
#7002 | ParquetCachedBatchSerializer: Close the hostBatch in ColumnBatchToCachedBatchIterator when the iterator has exhausted |
#6914 | Add in tests to verify corner cases in parquet |
#7016 | Parse out positive and negative lookahead explicitly to fallback to GPU |
#6999 | Enable snapshot builds as optional PR checks |
#6977 | Close the batch in the writeBatch function of GpuDynamicPartitionDataSingleWriter |
#7015 | Fix the test failure of avg literals bools fail on Spark 3.4.0 |
#6362 | [FEA] Add support for using nvcomp ZSTD compression |
#7004 | Alluxio pathsToReplace should have higher priority
#6806 | Fix read old data from alluxio regardless of S3 changes when using CONVERT_TIME replacement algorithm |
#7006 | Revert "Fix a minor potential issue when rebatching for GpuArrowEvalP… |
#7011 | Fix buffertime for multi-threaded reader |
#6950 | Throw when onAllocFailure is invoked with invalid arguments |
#7009 | Work around column vectors reporting incorrect data type |
#6996 | Fix HostToGpuCoalesceIterator sometimes closing input batches |
#6998 | Make shim revision check opt-out |
#6976 | Update the docs of write and writebatch of ColumnOutputWriter |
#7000 | Spark-3.4: Update DecimalArithmeticOverrides to object |
#6937 | Removed PromotePrecision for Spark 3.4 |
#6959 | Allow * , ? , and {0,...} variants in StringSplit in non-empty match situations |
#6972 | Add regular expression support for \d inside character classes on the GPU |
#6922 | Fix CastBase issues not related to PromotePrecision and CheckOverflow |
#6966 | Extract pre/post projections from columnar transitions |
#6974 | Add doc for mapInArrow [skip ci] |
#6931 | mergeSort late batch materialization and free already merged batches eagerly |
#6971 | Spark-3.4 : Fix build error in DataSourceV2ScanExec |
#6901 | Add JDK11 to mvn-verify-check |
#6801 | Enable the config MaxRecordsPerFile on the GpuDynamicDirectoryConcurrentWriter |
#6952 | Fix the failOnError not found error when building Spark 3.4.0 |
#6962 | Stop using deprecated JDK API javax.xml.bind |
#6957 | Fix leak in GpuBroadcastExchangeExec |
#6924 | Shim for shaded protobuf orc-core |
#6943 | Mechanism to reduce redundancy in Maven profiles for shims |
#6956 | Throw SparkDateTimeException for invalid cast in Spark3.3+ versions |
#6948 | Pass through escaped punctuation in Regular Expression Transpiler |
#6953 | Remove unsupported format when converting dates/timestamps to strings [skip ci] |
#6944 | Update to a valid cuda docker image for k8s run [skip ci] |
#6938 | Spark-3.4 - Fix build errors in DataSourceStrategy and SparkDateTimeException |
#6925 | Only warn when hive scratch creation fails |
#6923 | [BUG] Fix qualification-test-result generators and update csv files |
#6939 | Support Coalesce on map column |
#6936 | Fixing exception when appStartInfo isn't available due to incomplete event log |
#6824 | Use alluxio Java API to mount instead of cmd |
#6918 | Added shim for Spark 3.3.2 |
#6919 | Enable DPP and DPP+AQE on |
#6920 | Support Spark 3.3.1 |
#6905 | Fix Spark 340 build error related to checkForNumericExpr |
#6899 | Add ApplicationSummaryInfo wrapper to allow mock tests |
#6910 | [FEA] Support string Instr function |
#6913 | [BUG] GpuPartitioning should close CVs before releasing semaphore |
#6833 | Flatten simple 4+ nesting of withResource |
#6757 | Add startupOnly tag to configs |
#6893 | Add different codepoint for unicode 13.0 |
#6892 | Fix the Spark340 build error related to mapKeyNotExistError |
#6897 | Avoid coalescing files with mismatched schemas |
#6889 | Create target folder before attempting to add unique RUN_DIR |
#6891 | Remove invalid members from allow list [skip ci] |
#6827 | Follow on from recent regexp fixes to reject patterns that cuDF no longer rejects |
#6876 | Fix Spark 3.4 build issues |
#6866 | Use a unique run directory for each run when testing in run_pyspark_from_build |
#6877 | Plugin fixes after cuDF removed INT8 for binary columns in parquet writer |
#6873 | Add in support for zorder operators on databricks |
#6857 | Fix bug that can not switch Alluxio auto-mount option on the fly |
#6860 | Adjust to cudf removal of checks in scatter and repeat |
#6823 | Support columnar processing for mapInArrow |
#6813 | Move _databricks_internal check to shim layer |
#6796 | Qualification tool: Parse expressions in Join execs |
#6861 | Add check for is_spark_330cdh and update orc test to skip zstd for cdh |
#6849 | Cuda.deviceSynchronize as a last resort if we cannot spill enough |
#6859 | Reduce memory usage in aggregate.scala |
#6870 | Update the db hadoop jars version to 0007 for 10.4 |
#6867 | Temporarily disable the failing tests of parquet writing. |
#6855 | Add the FileIndexOptions shims for Spark340 |
#6847 | Fix integration builds failing with current directory not found |
#6854 | Fix setup-java step of blossom-ci [skip ci] |
#6852 | Fix deprecated Github actions API [skip ci] |
#6700 | Support zorder for deltalake and improve perf of range partitioning |
#6826 | Place hive scratch files under pytest $RUN_DIR |
#6819 | Move the RapidsShuffleThreadedReader from 330~340 shims to 330+ shims |
#6810 | Dump stack traces for tasks with the semaphore held when OOM goes unhandled |
#6815 | Update castPartValue function to fix ClassCastException |
#6766 | Adding timestamp functions into potential problems for qual tool |
#6809 | Relocate Scala files placed in the java/ directory |
#6804 | Fix auto merge conflict 6802 [skip ci] |
#6751 | Support columnar processing for FlatMapCoGroupInPandas |
#6783 | Revert "Temporarily xfail failing test_iceberg_parquet_read_round_trip test" |
#6780 | Fix auto merge conflict 6776 |
#6763 | Fix a class casting error in concurrent writer when enabling AQE |
#6760 | Clean run directory before running tests in run_pyspark_from_build |
#6716 | Improve tiered project by eliminating eclipsed columns in each tier |
#6764 | Add supervisor(like systemd stuff) to auto restart Alluxio processes … [skip ci] |
#6726 | Provision hive scratch dir before test execution |
#6730 | Fix an unchecked conversion warning |
#6756 | Temporarily xfail failing test_iceberg_parquet_read_round_trip test |
#6743 | Add spark-rapids pulls to GitHub project [skip ci] |
#6681 | Fixes for more efficient bound checks for GpuCast |
#6742 | Rework for adding event log info for profiler output |
#6717 | Qualification tool: Parse expressions in Expand, Generate and TakeOrderedAndProject Execs |
#6741 | Reverse normalizing nan in the GpuSortArray |
#6728 | Disable maven-compiler-plugin |
#6644 | Simplify how we transpile negated character classes and add more tests |
#6706 | Adding new profiler output to map app with event log path |
#6704 | Removing --help tools tests that trigger System.exit() |
#6675 | Adding error handling to print help out when at end of command |
#6667 | Retain all heap dumps per JVM lifecycle |
#6583 | Update the GpuSingleDirectoryDataWriter and GpuDynamicDirectorySingleDataWriter to split ColumnarBatch when writing to match the maxRecordsPerFile |
#6649 | Update CUDF_VER to 22.12 for CI |
#6613 | Update project version to 22.12.0-SNAPSHOT |
#6323 | [FEA] AutoTuner Profiling Tool |
#6544 | [FEA] Update spark2 explain api code for 22.10 |
#6322 | [FEA] Integrate AutoTuner into DataProc Rapids environment |
#6401 | [FEA] Support cast string to decimal(38,2) |
#6170 | [FEA] Qualification tool support plugin for running application |
#6067 | [FEA] Qualification Tool: For Databricks eventlog capture more information in output csv file |
#6632 | [FEA] Profiling tool: Suggest parameters to tune |
#5305 | [FEA] Qualification tool: Operator mapping, check if execs/expressions off by default |
#5589 | [FEA] GpuGlobalLimitExec and GpuCollectLimitExec support offset |
#6264 | [FEA] Qualification tool print unsupported execs and expressions |
#5409 | [FEA] Binary Data Write support for Parquet |
#6400 | [FEA] Windowing with decimal in orderBy. |
#6529 | [FEA] Update qualification speedup factors for CSP environments |
#5096 | [FEA] Support GroupBy Array[INT] |
#6496 | Allow filtering blocks to be done multithreaded in the parquet coalescing reader |
#6392 | [FEA] Support OptimizedCreateHiveTableAsSelectCommand (Hive CTAS with parquet) |
#6395 | [FEA] Remove the hasNans config from GpuCollectSet |
#5416 | [FEA] Support reading binary data types from Parquet as binary (not strings) |
#4656 | [FEA] Support Group-By on Array[String] |
#5942 | [FEA] Support multithreaded and coalescing read strategies for Apache Iceberg |
#3974 | [FEA] Fully implement multiply and divide for decimal128 |
#6164 | [FEA] Add Nan handling in the GpuMin |
#6142 | [FEA] GpuAverage cannot guarantee proper overflow checks for a precision larger than 23 |
#6144 | [FEA] Support FromUTCTimestamp |
#5559 | [FEA] Add GpuMapConcat support for nested (array, struct, map) types. |
#6143 | [FEA] Avoid CPU fallback due to intermediate precision overflow when handling decimal |
#4061 | [FEA] Validate the size/complexity of regular expressions |
#6145 | [FEA] Avoid CPU fallback due to date_format:Failed to convert Unsupported word: SSS null. |
#6300 | [FEA] Profiling Tool supports recommendations for tuning |
#6267 | [FEA] Support ShuffleExchangeExec with BinaryType as input and output |
#6708 | [BUG] Regression in NDSv2 of 4% because of spillable broadcast |
#5999 | [FEA] [improvement] Investigate DynamicPartitionDataConcurrentWriter to avoid full sort when writing partitioned data |
#6061 | [FEA] PoC shuffle read/decompress performance |
#4713 | [FEA] Running window optimization for percent rank |
#5085 | Could we evaluate once the child expressions of GpuExtractChunk32 |
#6209 | revisit locality wait = 0 setting |
#5320 | [FEA] fix issues so we can remove hasNans config |
#6219 | [FEA] Do not read the real data when readDataSchema is empty in Avro multi-threaded reading. |
#6727 | [BUG] On SPARK-3.2.1 : java.lang.ClassCastException |
#6748 | [BUG] Casting strings CudfException: strings column has no children |
#6614 | [BUG] test_iceberg_read_parquet_compression_codec CPU and GPU output mismatched in PASCAL GPU |
#6723 | [BUG] null pointer exception selecting single column from iceberg table |
#6693 | [BUG] test_cast_string_to_negative_scale_decimal failed in nightly |
#6692 | [BUG] compile error deprecated method w/ jdk11 |
#6431 | [BUG] Like does not work how we would like it to. |
#6659 | [BUG] Potential memory leaks in regexp_extract on the GPU |
#6515 | [BUG] RapidsShuffleThreadedWriterSuite failed to delete intermittent failure |
#6621 | [BUG] setting multi-threaded writer threads to 0 leads to divide-by-zero exception |
#6508 | [BUG] delta lake deletes/updates on Databricks can fail when using alluxio |
#6637 | [BUG] Qualification tool application time calculation can count stages twice if in separate sql queries |
#6578 | [BUG] Autotuner does not load worker-info from remote storage |
#6592 | [BUG] Delta Lake Deletes on Databricks broken with PERFILE parquet reader |
#6593 | [BUG] Avro tests using packages feature needs to enable snapshot repositories |
#6539 | Delta Lake and AQE on Databricks 10.4 workaround |
#3328 | [BUG] Segfault when partitioning empty batch |
#6572 | [BUG] UCX smoke tests can fail with OOM when initializing UCX |
#6312 | [BUG] Timestamp from GPU ORC reading is different from CPU ORC reading |
#6270 | [BUG] UPDATE on a Databricks (10.4) DELTA table leads to JVM crash |
#6404 | [BUG] DMLC XGBoost train FAILED against rapids-4-spark 22.10.0-SNAPSHOT FAILED |
#6531 | [BUG] window function of window function queries fail on Databricks 10.4 |
#6559 | [BUG]EmptyHashedRelation$ cannot be cast to org.apache.spark.sql.rapids.execution.SerializeConcatHostBuffersDeserializeBatch |
#6501 | [BUG]cgroup directory permission get reverted on reboot |
#6558 | [BUG] orc_write_test.py::test_write_ cases failed |
#6519 | [BUG] Windowing skew caused GPU run OOM |
#135 | [BUG] mergeSchema on ORC reads does not work |
#6302 | [BUG] spark.sql.parquet.outputTimestampType is not considered during read/write parquet for nested types containing timestamp |
#1059 | [BUG] adaptive query executor and delta optimized table writes don't work on databricks |
#6416 | [BUG] Example Jupyter notebook fails to parse and contains errors |
#5657 | [BUG] Documented deployment of spark-avro is not tested |
#6520 | [BUG] NoClassDefFoundError: com/nvidia/spark/rapids/shims/PlanShims in UCX tests |
#6397 | [BUG] GpuBringBackToHost doExecute needs columnar conversion |
#6460 | [BUG] test_hash_grpby_sum_full_decimal fails |
#6465 | [BUG] orc_cast_test fails on CDH |
#6478 | [BUG] test_cast_float_to_timestamp_side_effect intermittently fails |
#6372 | [BUG] Decimal average excessively checks for overflow |
#6467 | [BUG] Fix DOP calculations for xdist |
#6428 | [BUG] IntervalDivisionSuite has memory leak |
#6438 | [BUG] GpuSortArray doesn't match the behavior of Spark when handling NaNs |
#6442 | [BUG] java.lang.ClassNotFoundException: org.apache.spark.sql.rapids.execution.SerializeConcatHostBuffersDeserializeBatch |
#6417 | [BUG] CDH integration tests ClassNotFoundException: com.nvidia.spark.rapids.spark321cdh.RapidsShuffleManager |
#6471 | [BUG] Encrypted Parquet writes are not falling back if configs are set in configuration |
#6433 | [BUG] dist module "install" should install reduced pom |
#6240 | [BUG] shuffle file can not be deleted correctly when using RapidsShuffleManager. |
#6446 | [BUG] test_casting_from_integer[timestamp] fails on databricks321 |
#6426 | [BUG] GpuShuffledHashJoinExecSuite has leaks |
#6447 | [BUG] Python UDF triggered java.lang.NullPointerException |
#6406 | [BUG] integration tests arithmetic_ops_test.test_day_time_interval_multiply_number failing |
#6340 | [BUG] test_hash_grpby_sum_full_decimal can fail with negative numbers |
#6368 | [BUG] It's confusing that BASE_SPARK_VERSION in jenkins/databricks/build.sh, but BASE_SPARK_VER in databricks/test.sh |
#6351 | [BUG] Implement escape characters for spark property encoding in PYSP_TEST env variables |
#6284 | [BUG] date_format cannot output with subsecond |
#6341 | [BUG] test_decimal_multiplication_mixed_no_overflow_guarantees fails for some negative values |
#6303 | [BUG] Coalescing readers don't include filterblock time in scan time metric |
#6363 | [BUG] missing zip utility on CI |
#6073 | [SPARK-39806][SQL] Accessing _metadata on partitioned table can crash a query |
#6330 | [BUG] withPsNote on ArrayMin does not appear in generated docs |
#6332 | [BUG] array_min does not fall back to CPU when hasNan = true |
#6352 | [BUG] Reading Binary Type in Iceberg table fallback to CPU |
#6347 | [BUG] test_delta_metadata_query_fallback failed in spark32X |
#6359 | [BUG] test_from_json_map failed |
#5619 | [BUG] Mixing parquet input files with different schemas results in crashes |
#6344 | [BUG] Iceberg tests fail due to duplication of spark.jars conf via PYSP_TEST and on the command line |
#3851 | [BUG] ShimLoader.updateSparkClassLoader fails with openjdk Java11 |
#5714 | [BUG] discrepancy in the plugin jar deployment in run_pyspark_from_build.sh depending on TEST_PARALLEL |
#6294 | [BUG] Incorrect result when casting timestamp to string |
#6165 | [BUG] AnsiCastOpSuite fail in spark331 shim |
#6308 | [BUG] Integration tests failing on Spark 3.2 due to BinaryType |
#6243 | [BUG] AST fuzz test regexp find, replace fail |
#6236 | [BUG] integration tests corrupt executorEnv names containing underscore |
#5706 | [BUG] buildall --generate-bloop creates projects that Metals/Bloop does not recognize in VS code |
#6907 | [Doc]a hot fix for download links versions[skip ci] |
#6803 | Updated 22.10 changelog to latest [skip ci] |
#6799 | Update JNI version to released 22.10.0 |
#6755 | [doc] Add diagnostic tool section to GCP Dataproc getting started page [skip ci] |
#6734 | Init 22.10 changelog [skip ci] |
#6770 | Revert "Docker container for ease of deployment to Databricks [skip ci]" |
#6754 | [Doc] update getting started guide for emr 6.8.0 release[skip ci] |
#6772 | [Doc]remove group on array in 22.10, target in 22.12[skip ci] |
#6767 | Avoid any issues with scalar values returned by evalColumnar |
#6765 | [DOC] Add gcp dataproc gpu limit [skip ci] |
#6703 | Docker container for ease of deployment to Databricks [skip ci] |
#6750 | Enabling decimal 38,2 casting |
#6729 | Fix NullPointerException in iceberg schema parsing code when selecting single column |
#6724 | Qualification tool: Read SQL function names for parsing expressions |
#6695 | [Doc] Adding Dataproc quick start steps to use new user tools package [skip ci] |
#6719 | Document that we test on JDK8 and JDK11, other versions are untested [skip ci] |
#6721 | Fix a couple of markdown links that are now permanently moved [skip ci] |
#6701 | Add AutoTuner documentation [skip ci] |
#6709 | Take semaphore after first stream batch is materialized (broadcast) |
#6697 | Fix AutoTuner yaml error handling and discovery script rounding |
#6705 | Suppress warning for jdk11 Finalize method deprecation |
#6691 | Fix validity checks for large decimal window bounds |
#6670 | [Doc]Add 22.10 download page[skip ci] |
#6690 | Update spark2 code for Revert "Add support for arrays in hashaggregate" |
#6689 | Fix the maxPartitionBytes recommendation by AutoTuner to use the max task input bytes |
#6652 | Revise AutoTuner to match the BootStrap tool |
#6616 | String to decimal casting custom kernel |
#6679 | Revert "Add support for arrays in hashaggregate (#6066)" |
#6631 | Fixes split estimation in explode/explode_outer |
#6604 | Make broadcast tables spillable |
#6666 | Fix resource leaks in regexp_extract_all |
#6657 | Add Qualification tool support for running application - per sql output |
#6662 | update spark2 code |
#6643 | Workflow to add new issues to Github global project [skip ci] |
#6648 | Update iceberg doc for split size options [docs] |
#6640 | Avoid failing test on cleanup when filesystem has issues |
#6641 | Fix case where number of shuffle writer threads is set to 0 |
#6638 | Qualification tool: Print cluster usage tags to csv and log file |
#6651 | Changing toList to toIterator to improve memory optimization and runt… |
#6601 | delta lake deletes/updates on Databricks fail when using alluxio |
#6642 | Qualification tool application time calculation can count stages twice if in separate sql queries |
#6606 | Print nvidia-smi output when a task fails due to a cuda fatal exception. |
#6630 | Allow AutoTuner to accept remote path for WorkerInfo
#6627 | Move spark331 back to list of snapshot shims |
#6617 | Fix a Delta Lake Deletes issue |
#6612 | Tolerate event log folder existence when to create it to avoid raisin… |
#6610 | Disable 22.10 snapshot builds |
#6607 | Enable tests that were missed when binary support was extended |
#6584 | Fix spark2-sql-plugin |
#6506 | Add alluxio reliability doc |
#6609 | Enable automerge from 22.10 to 22.12 [skip ci] |
#6569 | Add dynamic partition concurrent writer to avoid full sort |
#6602 | Fix version-def script to correctly set list of shims |
#6599 | Add in support for casting binary to string |
#6432 | [Doc]Add archived release page[skip ci] |
#6594 | Add Apache snapshot repository when running Avro tests |
#6574 | Add shim layer for Cloudera CDS 3.3 |
#6412 | Qualification tool: Print unsupported Execs and expressions |
#6590 | Parallelize tests using spark packages feature |
#6589 | Update doc to indicate ORC and Parquet zstd read support [skip ci] |
#6437 | Use dist/pom file as source of truth for spark versions |
#6587 | Delta Lake and AQE on Databricks 10.4 workaround |
#6573 | Update UCX to 1.13.1 in CI and sets UCX_TLS=^posix |
#6586 | Adds link to spark supporting shuffle classes and fix copyright |
#6545 | Allow ORC tests to run with wider range of timestamp input |
#6511 | Multi-threaded shuffle reader for RapidsShuffleManager |
#6576 | Bump snakeyaml version to 1.32 |
#6577 | Work around multiprocess issues with updating Ivy cache |
#6579 | Disable UCX smoke test temporarily |
#6564 | Fix the check of empty batches for partitioning |
#6534 | Add GpuColumnVectorUtils to access GpuColumnVector |
#6575 | Fix maxPartitionBytes bounds checking in AutoTuner |
#6553 | Update handling for projectList based WindowExecs to handle window function of window function |
#6562 | Handle EmptyRelation in GpuSubqueryBroadcastExec |
#6504 | [DOC] Add notes for cgroup permission reverted[skip ci] |
#6554 | Support Decimal ordering column for RANGE window functions |
#6550 | Allow percent_rank to not need an entire group in memory |
#6557 | Mitigate non-test failure and remove 21.xx premerge support |
#6566 | Fix map gen for orc_write_test.py |
#6563 | Add missing closing ``` for a code block [skip ci] |
#6512 | Remove the hasNans config and update the doc |
#6542 | [Doc]Doc update for databricks single node cluster[skip ci] |
#6555 | Document a safe unshimming algorithm [skip ci] |
#6549 | Update SnakeYaml version for bug fixes |
#6523 | ORC reading supports mergeSchema |
#6522 | Nightly spark-tests script to follow PYSP_TEST pattern [skip ci] |
#6548 | Fixes for recent cuDF regexp changes |
#6541 | Add another alluxio path replacement algorithm |
#6547 | Append new authorized user to blossom-ci whitelist [skip ci] |
#6429 | Fix up buffer time for multi-file readers |
#6473 | Fix parquet write when the input column is nested type containing timestamp |
#6461 | Enabling AQE on |
#6436 | Switch to gpu string to integer casts |
#6538 | Updating qual tool speedup factors from latest CSP benchmarks |
#6421 | Fix notebook and getting started examples [skip ci] |
#6505 | Include avro test by using '--packages' option [skip ci] |
#6525 | Fix typo in file name |
#6527 | Use ShimLoader to access PlanShims |
#6466 | Use tiered projections for hash aggregates |
#6510 | Revert "Added in very specific support for from_json to a Map<String,String> (#6211)" |
#6319 | Support float/double castings for ORC reading |
#6498 | Allow filtering blocks to be done multithreaded in the Parquet coalescing reader |
#6507 | Perform columnar-to-row transition in GpuBringBackToHost.doExecute |
#6491 | [DOC] Change recommend setting of spark.locality.wait to 3s [skip ci] |
#6476 | Add GPU acceleration for OptimizedCreateHiveTableAsSelect |
#6499 | Fix non-deterministic overflows in test_hash_grpby_sum_full_decimal |
#6490 | Fix: orc_cast_test fails on CDH |
#6486 | Remove the hasNans config from GpuCollectSet |
#6484 | Fixes excessive ShuffleBlockId object creation due to missing map index bounds |
#6492 | Fix intermittent failure on test_cast_float_to_timestamp_side_effect |
#6483 | Fix DOP calculation for xdist |
#6479 | Remove KnownFloatingPointNormalized from allow_non_gpu |
#6482 | Fix leak in interval divide |
#6451 | Normalize nans in GpuSortArray |
#6066 | Add support for arrays in hashaggregate |
#6475 | Change GpuKryoRegistrator to load the classes we want to register with the ShimLoader |
#6472 | Check more places for Parquet encryption configs |
#6468 | Use non-capture groups in LIKE regexp pattern |
#6434 | Install reduced pom for dist module |
#6462 | Increase stability of pytest run with PVC storage |
#6454 | Support bool/int8/16/32/64 castings for ORC reading |
#6422 | Iceberg supports coalescing reading for Parquet |
#6450 | Add new github ID to blossom-ci allow list [skip ci] |
#6458 | Change some Alluxio log messages to be debug |
#6457 | Reading delta log Table Checkpoint files should fallback the entire plan |
#6439 | Fix leaks in GpuShuffledHashJoinExecSuite |
#6251 | Add Nan handling in the GpuMin |
#6449 | Remove caching of needles in GpuInSet |
#6414 | Add support for full 128-bit decimal divide |
#6448 | Revert patch that caused failing test on databricks 321 |
#6441 | Skip decimal gens that overflow on Spark 3.3.0+ |
#6273 | Support bool/int8/int16/int32/int64 castings for ORC reading. |
#6370 | Support simple pass-through for FromUTCTimestamp |
#6290 | Add GpuMapConcat support for nested type keys. |
#6405 | Support more timestamp format when casting string to timestamp |
#6418 | Fix tests for DateTimeInterval that were overflowing on CPU |
#6410 | Fix handling of older array encodings in Parquet |
#6398 | Fix DecimalGen to generate full range and fix failing test cases |
#6396 | Make the variable "BASE_SPARK_VERSION" consistent |
#6409 | Fix test_dpp_from_swizzled_hash_keys on CDH |
#6407 | Remove empty unreferenced file unshimmed-spark311.txt |
#6379 | Rebalance time of parallel stages for pre-merge CI |
#6358 | Support _ in spark conf of integration tests |
#6387 | Use new custom kernel for large decimal multiply |
#6355 | Include filterblock time in scan time metric for Coalescing readers |
#6393 | Add zip&unzip in pre-merge dockerfile |
#6374 | Remove anthony-chang [skip ci] |
#6349 | Add Nan handling in GpuArrayMin |
#6371 | Fix datetime name collision in cast_test |
#6361 | Binary type support in Iceberg read |
#6306 | Struct null aware equality comparator <=> support |
#6350 | Allow writing Binary data in Parquet |
#6365 | Honor delta_lake marker for pytest |
#6271 | Add format SSS for date_format function |
#6338 | Adding AutoTuner to Profiling Tool |
#6356 | Fix auto merge conflict 6353 [skip ci] |
#6342 | Avoid passing duplicate conf to spark_init_internal |
#6286 | Change TimestampGen unit in integration test from millisecond to microsecond |
#6335 | Add missing subnet option to dataproc cluster example [skip ci] |
#6307 | Add more information in FileSourceScanExec log when timezone is not UTC |
#5981 | Run Delta Lake tests with Spark 3.2.x |
#5646 | Use Spark's Utils.getContextOrSparkClassLoader to load Shims |
#6333 | Make run_pyspark to report fail and error as default |
#6044 | [BUG] Fix IT discrepancy which depends on TEST_PARALLEL |
#6311 | Re-implement cast timestamp to string and add more tests |
#6316 | Add Nan handling for GpuArrayMax |
#6256 | [Bug] Add Expr OverflowInTableInsert to fix AnsiCastOpSuite |
#6314 | Increase robustness of mvn commands in nightly scripts |
#6318 | [BugFix]Change the RapidsDiskBlockManager in ShuffleBufferCatalog to guarantee the shuffle files can be cleaned successfully |
#6006 | Estimate and validate regular expression complexities |
#6305 | Increase robustness of MVN commands in pre-merge scripts |
#6309 | Add BinaryType to some shimmed expressions |
#6062 | Nested struct binary comparison operator support |
#6298 | Add BinaryType support to operations that already support arrays |
#6297 | Fix merge conflict with branch-22.08 |
#6241 | Read metadata only when read schema is empty in Avro multi-threaded reading |
#5989 | Add NaN handling in GpuMax |
#6203 | Add config option to log all query transformations |
#6246 | Fix merge conflict with 22.08 |
#6247 | regexp: Catch "nothing to repeat" errors nested in groups |
#6237 | Preserve underscore in executorEnv in integration tests |
#6235 | Fix merge conflict with branch-22.08 |
#6110 | Iceberg Parquet supports multi-threaded reading. |
#6227 | Configurable task failures in integration tests |
#6194 | Make dist jar compression opt-out optional |
#6211 | Added in very specific support for from_json to a Map<String,String> |
#6218 | Disable overflow tableInsert tests for 331+ |
#6210 | Fix merge conflict with branch-22.08 |
#6152 | Improve coverage in mvn verify check github workflow |
#6156 | Fix Bloop project generation in buildall [skip ci] |
#5946 | GpuGlobalLimitExec and GpuCollectLimitExec support offset |
#6162 | Remove hard-coded versions from buildall [skip ci] |
#6055 | Add tests for .count() in the file readers |
#6129 | Init 22.10.0-SNAPSHOT |
#6081 | [FEA] Update spark2 code for 22.08 |
#5508 | [FEA] collect_set on struct[Array] |
#5222 | [FEA] Support function array_except |
#5228 | [FEA] Support array_union |
#5188 | [FEA] Support arrays_overlap |
#4932 | [FEA] Support ArrayIntersect on at least Arrays of String |
#4005 | [FEA] Support First() in windowing context with Integer type |
#5061 | [FEA] Support last in windowing context for Integer type. |
#6059 | [FEA] Add SQL table to Qualification's app-details view |
#5617 | [FEA] Qualification tool support parsing expressions (part 1) |
#4719 | [FEA] GpuStringSplit: Add support for line and string anchors in regular expressions |
#5502 | [FEA] Qualification tool should use SQL ID of each Application ID like profiling tool |
#5524 | [FEA] Automatically adjust spark.rapids.sql.format.parquet.multiThreadedRead.numThreads to the same as spark.executor.cores |
#4817 | [FEA] Support Iceberg batch reads |
#5510 | [FEA] Support Iceberg for data INSERT, DELETE operations |
#5890 | [FEA] Mount the alluxio buckets/paths on the fly when the query is being executed |
#6018 | [FEA] Support Spark 3.2.2 |
#5417 | [FEA] Fully support reading parquet binary as string |
#4283 | [FEA] Implement regexp_extract_all on GPU for idx > 0 |
#4353 | [FEA] Implement regexp_extract_all on GPU for idx = 0 |
#5813 | [FEA] Set sql.json.read.double.enabled and sql.csv.read.double.enabled to true by default |
#4720 | [FEA] GpuStringSplit: Add support for limit = 0 and limit = 1 |
#5953 | [FEA] Support Rocky Linux release |
#5204 | [FEA] Support Key vectors for GetMapValue and ElementAt for maps. |
#4323 | [FEA] Profiling tool add option to filter based on filesystem date |
#5846 | [FEA] Support null characters in regular expressions |
#5904 | [FEA] Add support for negated POSIX character classes in regular expressions |
#5702 | [FEA] Set spark.rapids.sql.explain=NOT_ON_GPU by default |
#5867 | [FEA] Add shim for Spark 3.3.1 |
#5628 | [FEA] Enable Application detailed view in Qualification UI |
#5831 | [FEA] Update default speedup factors used for qualification tool |
#4519 | [FEA] Add regular expression support for Form Feed, Alert, and Escape control characters |
#4040 | [FEA] Support spark.sql.parquet.binaryAsString=true |
#5797 | [FEA] Support RoundCeil and RoundFloor when scale is zero |
#4468 | [FEA] Support repetition quantifiers ? and * with regexp_replace |
#5679 | [FEA] Support MMyyyy date/timestamp format |
#4413 | [FEA] Add support for POSIX characters in regular expressions |
#4289 | [FEA] Regexp: Add support for word and non-word boundaries in regexp pattern |
#4517 | [FEA] Add support for word boundaries \b and \B in regular expressions |
#6060 | [FEA] Add experimental multi-threaded BypassMergeSortShuffleWriter |
#5453 | [FEA] Support runtime filters for BatchScanExec |
#5075 | Performance can be very slow when reading just a few columns out of many on parquet |
#5624 | [FEA] Let CPU handle Delta table's metadata related queries |
#4837 | [FEA] Optimize JSON reading of floating-point values |
#6112 | [BUG] UCX ubuntu dockerfile build failed |
#6281 | [BUG] Reading binary columns from nested types does not work. |
#6282 | [BUG] Missing CPU fallback for GetMapValue on scalar map, vector key |
#6208 | [BUG] test_array_intersect failed in databricks 10.4 runtime and Spark 3.3+ |
#6249 | [BUG] test_array_union_before_spark313 failed in UCX job |
#6232 | [BUG] Query failed with java.lang.NullPointerException when doing GpuSubqueryBroadcastExec |
#6230 | [BUG] AQE does not respect entirePlanWillNotWork |
#6131 | [BUG] count() in avro failed when reader_types is coalescing |
#6220 | [BUG] Host buffer leak occurred when executing count with Avro multi-threaded reader |
#6160 | [BUG] When Hive table's actual data has varchar, but the DDL is string, then query fails to do varchar to string conversion |
#6183 | [BUG] Qualification UI uses single precision floating point |
#6005 | [BUG] When old Hive partition has different schema than new partition & Hive Schema, read old partition fails with "Found no metadata for schema index" |
#6158 | [BUG] AQE being used on Databricks even when its disabled |
#6179 | [BUG] Qualification tool per sql output --num-output-rows option broken |
#6157 | [BUG] Pandas UDF hang in Databricks |
#6167 | [BUG] iceberg_test failed in nightly |
#6128 | [BUG] Can not ansi cast decimal type to long type while fetching decimal column from data table |
#6029 | [BUG] Query failed if reading a Hive partition table with partition key column is a Boolean data type, and if spark.rapids.alluxio.pathsToReplace is set |
#6054 | [BUG] Test Parquet nested unsigned int: uint8, uint16, uint32 FAILED in spark 320+ |
#6086 | [BUG] checkValue does not work in RapidsConf |
#6127 | [BUG] regex_test failed in nightly |
#6026 | [BUG] Failed to cast value false to BooleanType for partition column k1 |
#5984 | [BUG] DATABRICKS: NullPointerException: format is null in 22.08 (works fine with 22.06) |
#6089 | [BUG] orc_test is failing on Spark 3.2+ |
#5892 | [BUG] When using Alluxio+Spark RAPIDS, if the S3 bucket is not mounted, then query will return nothing |
#6056 | [BUG] zstd integration tests failed for orc on Cloudera |
#5957 | [BUG] Exception calling collect() when partitioning using with arrays with null values using array_union(...) |
#6017 | [BUG] test_parquet_read_round_trip hanging forever in spark 32x standalone mode |
#6035 | [BUG] cache tests throws ClassCastException on Databricks |
#6032 | [BUG] Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec failure |
#6028 | [BUG] regexp_test is failing in nightly tests |
#3677 | [BUG] PCBS does not fully follow the pattern for public classes |
#6022 | [BUG] test_iceberg_fallback_not_unsafe_row failed in databricks 10.4 runtime |
#109 | [BUG] GPU degrees function does not overflow |
#5959 | [BUG] test_parquet_read_encryption fails |
#5493 | [BUG] test_parquet_read_merge_schema failed w/ TITAN V |
#5521 | [BUG] Investigate regexp failures with unicode input |
#5629 | [BUG] regexp unicode tests require LANG=en_US.UTF-8 to pass |
#5448 | [BUG] partitioned writes require single batches and sorting, causing gpu OOM in some cases |
#6003 | [BUG] join_test failed in integration tests |
#5979 | [BUG] executors shutdown intermittently during integrations test parallel run |
#5948 | [BUG] GPU ORC reading fails when positional schema is enabled and more columns are required. |
#5909 | [BUG] Null characters do not work in regular expression character classes |
#5956 | [BUG] Warnings in build for GpuRegExpUtils with group_index |
#4676 | [BUG] Research associating MemoryCleaner to Spark's ShutdownHookManager |
#5854 | [BUG] Memory leaked in some test cases |
#5937 | [BUG] test_get_map_value_string_col_keys_ansi_fail in databricks321 runtime |
#5891 | [BUG] GpuShuffleCoalesce op time metric doesn't include concat batch time |
#5896 | [BUG] Profiling tool taking a really long time for integration tests |
#5939 | [BUG] Qualification tool UI. Read Schema column is broken |
#5711 | [BUG] regexp: Build fails on CI when more characters added to fuzzer but not locally |
#5929 | [BUG] test_sorted_groupby_first_last failed in nightly tests |
#5914 | [BUG] test_parquet_compress_read_round_trip tests failed in spark320+ |
#5859 | [BUG] Qualification tools csv order is not in sync |
#5648 | [BUG] compile-time references to classes potentially unavailable at run time |
#5838 | [BUG] Qualification ui output goes to wrong folder |
#5855 | [BUG] MortgageSparkSuite.scala set spark.rapids.sql.explain as true, which is invalid |
#5630 | [BUG] Qualification UI cannot render long strings |
#5732 | [BUG] fix estimated speed-up for not-applicable apps in Qualification results |
#5788 | [BUG] Qualification UI Sanitize template content |
#5836 | [BUG] string_test.py::test_re_replace_repetition failed IT |
#5837 | [BUG] test_parquet_read_round_trip_binary_as_string failures on YARN and Dataproc |
#5726 | [BUG] CastChecks.sparkIntegralSig has BINARY in it twice |
#5775 | [BUG] TimestampSuite is run on Spark 3.3.0 only |
#5678 | [BUG] Inconsistency between the time zone in the fallback reason and the actual time zone checked in RapidsMeta.checkTimeZoneId |
#5688 | [BUG] AnsiCast is merged into Cast in Spark 340, failing the 340 build |
#5480 | [BUG] Some arithmetic tests are failing on Spark 3.4.0 |
#5777 | [BUG] repeated runs of mvn package without clean lead to missing spark-rapids-jni-version-info.properties in dist jar |
#5456 | [BUG] Handle regexp_replace inconsistency from https://issues.apache.org/jira/browse/SPARK-39107 |
#5683 | [BUG] test_cast_neg_to_decimal_err failed in recent 22.08 tests |
#5525 | [BUG] Investigate more edge cases in regexp support |
#5744 | [BUG] Compile failure with Spark 3.2.2 |
#5707 | [BUG] Fix shim-related bugs |
#6376 | Update 22.08 changelog to latest |
#6367 | Revert "Enable Strings as a supported type for GpuColumnarToRow transitions" |
#6354 | Update 22.08 changelog to latest [skip ci] |
#6348 | Update plugin jni version to released 22.08.0 |
#6234 | [Doc] Add 22.08 docs' links [skip ci] |
#6288 | CPU fallback for Map scalars with key vectors |
#6292 | Fix parquet binary reads to do the transformation in the plugin |
#6257 | Fallback to CPU for Parquet reads with _databricks_internal columns |
#6274 | Use schema instead of row field count during columnar conversion |
#6268 | Apply BroadcastMode key projections before interpreting key expressions in subqueries |
#6250 | Fix bug where AQE does not respect entirePlanWillNotWork |
#6248 | Fix some issues with reading binary from parquet |
#6239 | Add rocky Dockerfiles and refine docker documentation |
#6079 | Add support for nested types to collect_set(...) on the GPU |
#6215 | Update Spark2 Explain API code for 22.08 |
#6161 | Added binary read support for Parquet [Databricks] |
#6222 | Init 22.08 changelog [skip ci] |
#6225 | Fix count() in avro failed when reader_types is coalescing |
#6216 | [Doc] Update 22.08 documentation |
#6223 | Temporary fix for test_array_intersect failures on Spark 3.3.0 |
#6221 | Release host buffers when Avro read schema is empty |
#6132 | [DOC] Update out-of-date mortgage notebooks and update docs for xgboost161 jar [skip ci] |
#6188 | Allow ORC conversion from VARCHAR to STRING |
#6013 | Add fixed issues to regex fuzzer |
#5958 | Add set based operations for arrays: array_intersect, array_union, array_except, and arrays_overlap for running on GPU |
#6189 | Qualification UI change floating precision [skip ci] |
#6063 | Fix Parquet schema evolution when missing column is in a nested type |
#6159 | Workaround for Databricks using AQE even when disabled |
#6181 | Fix the qualification tool per sql number output rows option |
#6166 | Update the configs used to choose the Python runner for flat-map Pandas UDF |
#6169 | Fix IcebergProvider classname in unshim exceptions |
#6103 | Fix crash when casting decimals to long |
#6071 | Update test_add_overflow_with_ansi_enabled and test_subtraction_overflow_with_ansi_enabled to check the exception type for Integral case. |
#6136 | Fix Alluxio inferring partitions for BooleanType with Hive |
#6027 | Re-enable "transpile complex regex 2" scala test |
#6140 | Update profile names in unit tests docs [skip ci] |
#6141 | Fixes threaded shuffle writer test mocks for spark 3.3.0+ |
#6147 | Revert "Temporarily disable Parquet unsigned int test in ParquetScanS… |
#6133 | [DOC]update getting started guide doc for aws-emr670 release[skip ci] |
#6007 | Add doc for parsing expressions in qualification tool [skip ci] |
#6125 | Add SQL table to Qualification's app-details view [skip ci] |
#6116 | Fix: check validity before setting the default value |
#6120 | Qualification Tool add test for SQL Description escaping commas for csv |
#6106 | Qualification tool: Parse expressions in WindowExec |
#6040 | Enable anchors in regexp string split |
#6052 | Multi-threaded shuffle writer for RapidsShuffleManager |
#5998 | Enable Strings as a supported type for GpuColumnarToRow transitions |
#6092 | Qualification tool output recommendations on a per sql query basis |
#6104 | Revert to only supporting Apache Iceberg 0.13.x |
#6111 | Fix missed gnupg2 in ucx example dockerfiles [skip ci] |
#6107 | Disable snapshot shims build in 22.08 |
#6016 | Automatically adjust spark.rapids.sql.multiThreadedRead.numThreads to the same as spark.executor.cores |
#6098 | Support Apache Iceberg 0.14.0 |
#6097 | Fix 3.3 shim to include castTo handling AnyTimestampType and minor spacing |
#6057 | Tag GpuWindow child expressions for GPU execution |
#6090 | Add missing is_spark_321cdh import in orc_test |
#6048 | Port whole parsePartitions method from Spark3.3 to Gpu side |
#5941 | GPU accelerate Apache Iceberg reads |
#5925 | Add Alluxio auto mount feature |
#6004 | Check the existence of alluxio path |
#6082 | Enable auto-merge from branch-22.08 to branch-22.10 [skip ci] |
#6058 | Disable zstd orc tests in cdh |
#6078 | Temporarily disable Parquet unsigned int test in ParquetScanSuite |
#6049 | Fix test hang caused by parquet hadoop test jar log4j file |
#6042 | Qualification tool: Parse expressions in Aggregates and Sort execs. |
#6041 | Improve check for UTF-8 in integration tests by testing from the JVM |
#5970 | Address feedback in "Improve regular expression error messages" PR |
#6000 | Support nth_value, first and last in window context |
#6031 | Update spark322shim dependency to released lib |
#6033 | Refactor: Fix PCBS does not fully follow the pattern for public classes |
#6019 | Update the interval division to throw same type exceptions as Spark |
#6030 | Cleans up some of the redundant code in proxy/internal RAPIDS Shuffle Managers |
#5988 | [FEA] Add a progress bar in Qualification tool when it is running |
#6020 | Unify test modes in databricks test script |
#6025 | Skip Iceberg tests on Databricks |
#5983 | Adding AUTO native parquet support and legacy tests |
#6010 | Update docs to better explain limitations of Dataset support |
#5996 | Fix GPU degrees function does not overflow |
#5994 | Skip Parquet encryption read tests if Parquet version is less than 1.12 |
#5776 | Enable regular expression support based on whether UTF-8 is in the current locale |
#6009 | Fix issue where spark-tests was producing an unintended error code |
#5903 | Avoid requiring single batch when using out-of-core sort |
#6008 | Rename test modes in spark-tests.sh [skip ci] |
#5991 | Enable zstd integration tests for parquet and orc |
#5997 | support testing parquet encryption |
#5968 | Add support for regexp_extract_all on GPU |
#5995 | Fix a minor potential issue when rebatching for GpuArrowEvalPythonExec |
#5960 | Set up the framework of type casting for ORC reading |
#5987 | Document how to check if finalized plan on GPU from user code / REPLs [skip ci] |
#5982 | Use the new native parquet footer API instead of the old one |
#5972 | [DOC] add app-details to qualification tools doc [skip ci] |
#5976 | Enable null in regex character classes |
#5974 | Remove scaladoc warning |
#5912 | Fall back to CPU for Delta Lake metadata queries |
#5955 | Fix fake memory leaks in some test cases |
#5915 | Make the error message of changing decimal type the same as Spark's |
#5971 | Append new authorized user to blossom-ci whitelist [skip ci] |
#5967 | [Doc]In Databricks doc, disable DPP config[skip ci] |
#5871 | Improve regular expression error messages |
#5952 | Qualification tool: Parse expressions in ProjectExec |
#5961 | Don't set spark.sql.ansi.strictIndexOperator to false for array subscript test |
#5935 | Enable reading double values on GPU when reading CSV and JSON |
#5950 | Fix GpuShuffleCoalesce op time metric doesn't include concat batch time |
#5932 | Add string split support for limit = 0 and limit = 1 |
#5951 | Fix issue with Profiling tool taking a long time due to finding stage ids that maps to sql nodes |
#5954 | Add IT dockerfile for rockylinux8 [skip ci] |
#5949 | Update GpuAdd and GpuSubtract to throw same type exception as Spark |
#5878 | Fix misleading documentation for approx_percentile and some other functions |
#5913 | Update gcp cluster init option [skip ci] |
#5940 | Qualification tool UI. fix Read-Schema column broken [skip ci] |
#5938 | Fix leaks in the test cases of CachedBatchWriterSuite |
#5934 | Add underscore to regexp fuzzer |
#5936 | [BUG] Fix databricks test report location |
#5883 | Add support for element_at and GetMapValue |
#5918 | Filter profiling tool based on start time. |
#5926 | Collect databricks test report |
#5924 | Changes made to the Audit process for prioritizing the commits [skip-ci] |
#5834 | Add support for null characters in regular expressions |
#5930 | Make first/last test for sorted deterministic |
#5917 | Improve sort removal heuristic for sort aggregate |
#5916 | Revert "Enable testing zstd for spark releases 3.2.0 and later (#5898)" |
#5686 | Add GpuMapConcat support for nested-type values |
#5905 | Add support for negated POSIX character classes \P |
#5898 | Enable testing parquet with zstd for spark releases 3.2.0 and later |
#5900 | Optimize some common if/else cases |
#5869 | Qualification: fix sorting and add unit-tests script |
#5819 | Modify the default value of spark.rapids.sql.explain as NOT_ON_GPU |
#5723 | Dynamically load hive and avro using reflection to avoid potential class not found exception |
#5886 | Avoid serializing plan in GpuCoalesceBatches, GpuHashAggregateExec, and GpuTopN |
#5897 | GpuBatchScanExec partitions should be marked transient |
#5894 | [Doc]fix a typo with double "("[skip ci] |
#5880 | Qualification tool: Parse expressions in FilterExec |
#5885 | [Doc] Fix alluxio doc link issue[skip ci] |
#5879 | Avoid duplicate sanitization step when reading JSON floats |
#5877 | Add Apache Spark 3.3.1-SNAPSHOT Shims |
#5783 | assertMinValueOverflow should throw same type of exception as Spark |
#5875 | Qualification ui output goes to wrong folder |
#5870 | Use a common thread pool across formats for multithreaded reads |
#5868 | Profiling tool add wholestagecodegen to execs mapping, sql to stage info and job end time |
#5873 | Correct the value of spark.rapids.sql.explain |
#5695 | Verify DPP over LIKE ANY/ALL expression |
#5856 | Update unit test doc |
#5866 | Fix CsvScanForIntervalSuite leak issues |
#5810 | Qualification UI - add application details view |
#5860 | [Doc]Add Spark3.3 support in doc[skip ci] |
#5858 | Remove SNAPSHOT support from Spark 3.3.0 shim |
#5857 | Remove user sperlingxx[skip ci] |
#5841 | Enable regexp empty string short circuit on shim version 3.1.3 |
#5853 | Fix auto merge conflict 5850 |
#5845 | Update Parquet binaryAsString integration to use a static parquet file |
#5842 | Update default speedup factors for qualification tool |
#5829 | Add regexp support for Alert, and Escape control characters |
#5833 | Add test for GpuCast canonicalization with timezone |
#5822 | Configure log4j version 2.x for test cases |
#5830 | Enable the spark.sql.parquet.binaryAsString=true configuration option on the GPU |
#5805 | [Issue 5726] Removing duplicate BINARY keyword |
#5828 | Update tools module to latest Hadoop version |
#5809 | Disable Spark 3.4.0 premerge for 22.08 and enable for 22.10 |
#5767 | Fix the time zone check issue |
#5814 | Fix auto merge conflict 5812 [skip ci] |
#5804 | Support RoundCeil and RoundFloor when scale is zero |
#5696 | Support Parquet field IDs |
#5749 | Add shims for AnsiCast |
#5780 | Append new authorized user to blossom-ci whitelist [skip ci] |
#5350 | Halt Spark executor when encountering unrecoverable CUDA errors |
#5779 | Fix repeated runs of mvn package without clean leading to missing spark-rapids-jni-version-info.properties in dist jar |
#5800 | Fix auto merge conflict 5799 |
#5794 | Fix auto merge conflict 5789 |
#5740 | Handle regexp_replace inconsistency with empty strings and zero-repetition patterns |
#5790 | Fix auto merge conflict 5789 |
#5690 | Update the error checking of test_cast_neg_to_decimal_err |
#5774 | Fix merge conflict with branch-22.06 |
#5768 | Support MMyyyy date/timestamp format |
#5692 | Add support for POSIX predefined character classes |
#5762 | Fix auto merge conflict 5759 |
#5754 | Fix auto merge conflict 5752 |
#5450 | Handle ?, *, {0,} and {0,n} based repetitions in regexp_replace on the GPU |
#5479 | Add support for word boundaries \b and \B |
#5745 | Move RapidsErrorUtils to org.apache.spark.sql.shims package |
#5610 | Fall back to CPU for unsupported regular expression edge cases with end of line/string anchors and newlines |
#5725 | Fix auto merge conflict 5724 |
#5687 | Minor: Clean up GpuConcat |
#5710 | Fix auto merge conflict 5709 |
#5708 | Fix shim-related bugs |
#5700 | Fix auto merge conflict 5699 |
#5675 | Update the error messages for the failing arithmetic tests. |
#5689 | Disable 340 for premerge and nightly |
#5603 | Skip unshim and dedup of external spark-rapids-jni and jucx |
#5472 | Add shims for Spark 3.4.0 |
#5647 | Init version 22.08.0-SNAPSHOT |
#5451 | [FEA] Update Spark2 explain code for 22.06 |
#5261 | [FEA] Create MIG with Cgroups on YARN Dataproc scripts |
#5476 | [FEA] extend concat on arrays to all nested types. |
#5113 | [FEA] ANSI mode: Support CAST between types |
#5112 | [FEA] ANSI mode: allow casting between numeric type and timestamp type |
#5323 | [FEA] Enable floating point by default |
#4518 | [FEA] Add support for escaped unicode hex in regular expressions |
#5405 | [FEA] Support map_concat function |
#5547 | [FEA] Regexp: Can we transpile \W and \D to Java's definition so we can support on GPU? |
#5512 | [FEA] Qualification tool, hook up final output and output execs table |
#5507 | [FEA] Support GpuRaiseError |
#5325 | [FEA] Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys |
#3682 | [FEA] Use conventional jar layout in dist jar if there is only one input shim |
#1556 | [FEA] Implement ANSI mode tests for string to timestamp functions |
#4425 | [FEA] Support line anchor $ and string anchors \z and \Z in regexp_replace |
#5176 | [FEA] Qualification tool UI |
#5111 | [FEA] ANSI mode: CAST between ANSI intervals and IntegralType |
#4605 | [FEA] Add regular expression support for new character classes introduced in Java 8 |
#5273 | [FEA] Support map_filter |
#1557 | [FEA] Enable ANSI mode for CAST string to date |
#5446 | [FEA] Remove hasNans check for array_contains |
#5445 | [FEA] Support reading Int as Byte/Short/Date from parquet |
#5449 | [FEA] QualificationTool. Add speedup information to AppSummaryInfo |
#5322 | [FEA] remove hasNans for Pivot |
#4800 | [FEA] Enable support for more regular expressions with \A and \Z |
#5404 | [FEA] Add Shim for the Spark version shipped with Cloudera CDH 7.1.7 |
#5226 | [FEA] Support array_repeat |
#5229 | [FEA] Support arrays_zip |
#5119 | [FEA] Support ANSI mode for SQL functions/operators |
#4532 | [FEA] Re-enable support for \Z in regular expressions |
#3985 | [FEA] UDF-Compiler: Translation of simple predicate UDF should allow predicate pushdown |
#5034 | [FEA] Implement ExistenceJoin for BroadcastNestedLoopJoin Exec |
#4533 | [FEA] Re-enable support for $ in regular expressions |
#5263 | [FEA] Write out operator mapping from plugin to CSV file for use in qualification tool |
#5095 | [FEA] Support collect_set on struct in reduction context |
#4811 | [FEA] Support ANSI intervals for Cast and Sample |
#2062 | [FEA] support collect aggregations |
#5060 | [FEA] Support Count on Struct of [ Struct of [String, Map(String,String)], Array(String), Map(String,String) ] |
#4528 | [FEA] Add support for regular expressions containing \s and \S |
#4557 | [FEA] Add support for regexp_replace with back-references |
#5148 | Add the MULTI-THREADED reading support for avro |
#5304 | [FEA] Optimize remote Avro reading for a PartitionFile |
#5257 | [FEA][Audit] - [SPARK-34863][SQL] Support complex types for Parquet vectorized reader |
#5149 | Add the COALESCING reading support for avro |
#5769 | [BUG] arithmetic ops tests failing on Spark 3.3.0 |
#5785 | [BUG] Tests module build failed in OrcEncryptionSuite for 321cdh |
#5765 | [BUG] Container decimal overflow when casting float/double to decimal |
#5246 | Verify Parquet columnar encryption is handled safely |
#5770 | [BUG] test_buckets failed |
#5733 | [BUG] Integration test test_orc_write_encryption_fallback fail |
#5719 | [BUG] test_cast_float_to_timestamp_ansi_for_nan_inf failed in spark330 |
#5739 | [BUG] Spark 3.3 build failure - QueryExecutionErrors package scope changed |
#5670 | [BUG] Job failed when parsing "java.lang.reflect.InvocationTargetException: org.apache.spark.sql.catalyst.parser.ParseException:" |
#4860 | [BUG] GPU writing ORC columns statistics |
#5717 | [BUG] div_by_zero test is failing on Spark 330 on 22.06 |
#5632 | [BUG] udf_cudf tests failed: EOFException DataInputStream.readInt(DataInputStream.java:392) |
#5672 | [BUG] Read exception occurs when clipped schema is empty |
#5694 | [BUG] Inconsistent behavior with Spark when reading a non-existent column from Parquet |
#5562 | [BUG] read ORC file with various file schemas |
#5654 | [BUG] Transpiler produces regex pattern that cuDF cannot compile |
#5655 | [BUG] Regular expression pattern [&&1] produces incorrect results on GPU |
#4862 | [FEA] Add support for regular expressions containing octal digits inside character classes, e.g. [\0177] |
#5615 | [BUG] GpuBatchScanExec only reports output row metrics |
#4505 | [BUG] RegExp parse fails to parse character ranges containing escaped characters |
#4865 | [BUG] Add support for regular expressions containing hexadecimal digits inside character classes, e.g. [\x7f] |
#5513 | [BUG] NoClassDefFoundError with caller classloader off in GpuShuffleCoalesceIterator in local-cluster |
#5530 | [BUG] regexp: \d, \w inconsistencies with non-latin unicode input |
#5594 | [BUG] 3.3 test_div_overflow_exception_when_ansi test failures |
#5596 | [BUG] Shim service provider failure when using jar built with -DallowConventionalDistJar |
#5582 | [BUG] Nightly CI failed with : 'dist/target/rapids-4-spark_2.12-22.06.0-SNAPSHOT.jar' not exists |
#5577 | [BUG] test_cast_neg_to_decimal_err failing in databricks |
#5557 | [BUG] dist jar does not contain reduced pom, creates an unnecessary jar |
#5474 | [BUG] Spark 3.2.1 arithmetic_ops_test failures |
#5497 | [BUG] 3 tests in IntervalSuite are failing on 330 |
#5544 | [BUG] GpuCreateMap needs to set hasSideEffects in some cases |
#5469 | [BUG] NPE during serialization for shuffle in array-aggregation-with-limit query |
#5496 | [BUG] avg literals bools is failing on 330 |
#5511 | [BUG] orc_test failures on 321cdh |
#5439 | [BUG] Encrypted Parquet writes are being replaced with a GPU unencrypted write |
#5108 | [BUG] GpuArrayExists encounters a CudfException on an input partition consisting of just empty lists |
#5492 | [BUG] com.nvidia.spark.rapids.RegexCharacterClass cannot be cast to com.nvidia.spark.rapids.RegexCharacterClassComponent |
#4818 | [BUG] ASYNC: the spill store needs to synchronize on spills against the allocating stream |
#5481 | [BUG] test_parquet_check_schema_compatibility failed in databricks runtimes |
#5482 | [BUG] test_cast_string_date_invalid_ansi_before_320 failed in databricks runtime |
#5457 | [BUG] 330 AnsiCastOpSuite Unit tests failed 22 cases |
#5098 | [BUG] Harden calls to RapidsBuffer.free |
#5464 | [BUG] Query failure with java.lang.AssertionError when using partitioned Iceberg tables |
#4746 | [FEA] Add support for regular expressions containing octal digits in range \200 to 377 |
#5200 | [BUG] More detailed logs to show which parquet file and which data type has mismatch. |
#4866 | [BUG] Add support for regular expressions containing hexadecimal digits greater than 0x7f |
#5140 | [BUG] NPE on array_max of transformed empty array |
#5444 | [BUG] build failed on Databricks |
#5357 | [BUG] Spark 3.3 cache_test test_passing_gpuExpr_as_Expr failures |
#5429 | [BUG] test_cache_expand_exec fails on Spark 3.3 |
#5312 | [BUG] The coalesced AVRO file may contain different sync markers if the sync marker varies in the avro files being coalesced. |
#5415 | [BUG] Regular Expressions: matching the dot . doesn't fully exclude all unicode line terminator characters |
#5413 | [BUG] Databricks 321 build fails - not found: type OrcShims320untilAllBase |
#5286 | [BUG] assert failed test_struct_self_join and test_computation_in_grpby_columns |
#5351 | [BUG] Build fails for Spark 3.3 due to extra arguments to mapKeyNotExistError |
#5260 | [BUG] map_test failures on Spark 3.3.0 |
#5189 | [BUG] Reading from iceberg table will fail. |
#5130 | [BUG] string_split does not respect spark.rapids.sql.regexp.enabled config |
#5267 | [BUG] markdown link check failed issue |
#5295 | [BUG] Build fails for Spark 3.3 due to extra arguments to mapKeyNotExistError |
#5264 | [BUG] Delete unused generic type. |
#5275 | [BUG] rlike cannot run on GPU because invalid or unsupported escape character ']' near index 14 |
#5278 | [BUG] build 311cdh failed: unable to find valid certification path to requested target |
#5211 | [BUG] csv_test:test_basic_csv_read FAILED |
#5244 | [BUG] Spark 3.3 integration test failures logic_test.py::test_logical_with_side_effect |
#5041 | [BUG] Implement hasSideEffects for all expressions that have side-effects |
#4980 | [BUG] window_function_test FAILED on PASCAL GPU |
#5240 | [BUG] EGX integration test_collect_list_reductions failures |
#5242 | [BUG] Executor falls back to cudaMalloc if the pool can't be initialized |
#5215 | [BUG] Coalescing reading is not working for v2 parquet/orc datasource |
#5104 | [BUG] Unconditional warning in UDF Plugin "The compiler is disabled by default" |
#5099 | [BUG] Profiling tool should not sum gettingResultTime |
#5182 | [BUG] Spark 3.3 integration tests arithmetic_ops_test.py::test_div_overflow_exception_when_ansi failures |
#5147 | [BUG] object LZ4Compressor is not a member of package ai.rapids.cudf.nvcomp |
#4695 | [BUG] Segfault with UCX and ASYNC allocator |
#5138 | [BUG] xgboost job failed if we enable PCBS |
#5135 | [BUG] GpuRegExExtract is not aligned with RegExExtract |
#5084 | [BUG] GpuWriteTaskStatsTracker complains for all writes in local mode |
#5123 | [BUG] Compile error for Spark330 because of VectorizedColumnReader constructor added a new parameter. |
#5133 | [BUG] Compile error for Spark330 because of Spark changed the method signature: QueryExecutionErrors.mapKeyNotExistError |
#4959 | [BUG] Test case in OpcodeSuite failed on Spark 3.3.0 |
#5863 | Update 22.06 changelog to include new commits [skip ci] |
#5861 | [Doc]Add Spark3.3 support in doc for 22.06 branch[skip ci] |
#5851 | Update 22.06 changelog to include new commits [skip ci] |
#5848 | Update spark330shim to use released lib |
#5840 | [DOC] Updated RapidsConf to reflect the default value of spark.rapids.sql.improvedFloatOps.enabled [skip ci] |
#5816 | Update 22.06.0 changelog to latest [skip ci] |
#5795 | Update FAQ to include local jar deployment via extraClassPath [skip ci] |
#5802 | Update spark-rapids-jni.version to release 22.06.0 |
#5798 | Fall back to CPU for RoundCeil and RoundFloor expressions |
#5791 | Remove ORC encryption test from 321cdh |
#5766 | Fix the overflow of container type when casting floats to decimal |
#5786 | Fix rounds over decimal in Spark 330+ |
#5761 | Throw an exception when attempting to read columnar encrypted Parquet files on the GPU |
#5784 | Update the error string for test_cast_neg_to_decimal_err on 330 |
#5781 | Correct the exception string for test_mod_pmod_by_zero on Spark 3.3.0 |
#5764 | Add test for encrypted ORC write |
#5760 | Enable avrotest in nightly tests [skip ci] |
#5746 | Init 22.06 changelog [skip ci] |
#5716 | Disable Avro support when spark-avro classes not loadable by Shim classloader |
#5737 | Remove the ORC encryption tests |
#5753 | [DOC] Update regexp compatibility for 22.06 [skip ci] |
#5738 | Update Spark2 explain code for 22.06 |
#5731 | Throw SparkDateTimeException for InvalidInput while casting in ANSI mode |
#5742 | Spark-3.3 build fix - Move QueryExecutionErrors to sql package |
#5641 | [Doc]Update 22.06 documentation[skip ci] |
#5701 | Update docs for qualification tool to reflect recommendations and UI [skip ci] |
#5283 | Add documentation for MIG on Dataproc [skip ci] |
#5728 | Qualification tool: Add test for stage failures |
#5681 | Branch 22.06 nvcomp notice binary [skip ci] |
#5713 | Fix GpuCast losing the timezoneId during canonicalization |
#5715 | Update GPU ORC statistics write support |
#5718 | Update the error message for div_by_zero test |
#5604 | ORC encrypted write should fallback to CPU |
#5674 | Fix reading ORC/PARQUET over empty clipped schema |
#5676 | Fix ORC reading over different schemas |
#5693 | Temporarily allow 3.3.1 for 3.3.0 shims. |
#5591 | Enable regular expressions by default |
#5664 | Fix edge case where one side of regexp choice ends in duplicate string anchors |
#5542 | Support arrays of arrays and structs for concat on arrays |
#5677 | Qualification tool Enable UI by default |
#5575 | Regexp: Transpile \D , \W to Java's definitions |
#5668 | Add user as CI owner [skip ci] |
#5627 | Install locales and generate en_US.UTF-8 |
#5514 | ANSI mode: allow casting between numeric type and timestamp type |
#5600 | Qualification tool UI cosmetics and CSV output changes |
#5658 | Fallback to CPU when && found in character class |
#5644 | Qualification tool: Enable UDF reporting in potential problems |
#5645 | Add support for octal digits in character classes |
#5643 | Fix missing GpuBatchScanExec metrics in SQL UI |
#5441 | Enable optional float confs and update docs mentioning them |
#5532 | Support hex digits in character classes and escaped characters in character class ranges |
#5625 | [DOC]update links for 2206 release[skip ci] |
#5623 | Handle duplicates in negated character classes |
#5533 | Support GpuMapConcat |
#5614 | Move HostConcatResultUtil out of unshimmed classes |
#5612 | Qualification tool: update SQL Df value used and look at jobs in SQL |
#5526 | Fix whitespace \s and \S tests |
#5541 | Regexp: Transpile \d , \w to Java's definitions |
#5598 | Qualification tool: Update RunningQualificationApp tests |
#5601 | Update test_div_overflow_exception_when_ansi test for Spark-3.3 |
#5588 | Update Databricks build scripts |
#5599 | Move ShimServiceProvider file re-init/truncate |
#5531 | Filter rows with null keys when coalescing due to reaching cuDF row limits |
#5550 | Qualification tool hook up final output based on per exec analysis |
#5540 | Support RaiseError |
#5505 | Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys |
#5583 | Disable spark snapshot shims build for pre-merge |
#5584 | Enable automerge from branch-22.06 to 22.08 [skip ci] |
#5581 | nightly CI to install and deploy cuda11 classifier dist jar [skip ci] |
#5579 | Update test_cast_neg_to_decimal_err to work with Databricks 10.4 where exception is different |
#5578 | Fix unfiltered partitions being used to create GpuBatchScanExec RDD |
#5560 | Minor: Clean up the tests of concat_list |
#5528 | Enable build and test with JDK11 |
#5571 | Update array_min and array_max to use new cudf operations |
#5558 | Fix target file for update from extra-resources in dist module |
#5556 | Move FsInput creation into AvroFileReader |
#5483 | Don't distinguish between types of ArithmeticException for Spark 3.2.x |
#5539 | Fix IntervalSuite cases failure |
#5421 | Support multi-threaded reading for avro |
#5538 | Add tests for string to timestamp functions in ANSI mode |
#5546 | Set hasSideEffects correctly for GpuCreateMap |
#5529 | Fix failing bool agg test in Spark 3.3 |
#5500 | Fallback parquet reading with merged schema and native footer reader |
#5534 | MVN_OPT to last, as it is empty in most cases |
#5523 | Enable forcePositionEvolution for 321cdh |
#5501 | Build against specified spark-rapids-jni snapshot jar [skip ci] |
#5489 | Fallback to the CPU if Parquet encryption keys are set |
#5527 | Fix bug with character class immediately following a string anchor |
#5506 | Fix ClassCastException in regular expression transpiler |
#5519 | Address feedback in "string anchors regexp replace" PR |
#5520 | [DOC] Remove Spark from our naming of Tools [skip ci] |
#5491 | Enables $ , \z , and \Z in REGEXP_REPLACE on the GPU |
#5470 | Qualification tool support UI code generation |
#5353 | Supports casting between ANSI interval types and integral types |
#5487 | Add limited support for captured vars and athrow |
#5499 | [DOC]update doc for emr6.6[skip ci] |
#5485 | Add cudaStreamSynchronize when a new device buffer is added to the spill framework |
#5477 | Add support for \h , \H , \v , \V , and \R character classes |
#5490 | Qualification tool: Update speedup factor for few operators |
#5494 | Fix Databricks shim to support ANSI mode when casting from string to date |
#5498 | Enable 330 unit tests for nightly |
#5504 | Fix printing of split information when dumping debug data |
#5486 | Fix regression in AnsiCastOpSuite with Spark 3.3.0 |
#5436 | Support map_filter operator |
#5471 | Add implicit safeFree for RapidsBuffer |
#5465 | Fix query planning issue when Iceberg is used with DPP and AQE |
#5459 | Add test cases for casting string to date in ANSI mode |
#5443 | Add support for regular expressions containing octal digits greater than \200 |
#5468 | Qualification tool: Add support for join, pandas, aggregate execs |
#5473 | Remove hasNan check over array_contains |
#5434 | Check schema compatibility when building parquet readers |
#5442 | Add support for regular expressions containing hexadecimal digits greater than 0x7f |
#5466 | [Doc] Change the picture of the query plan to text format. [skip ci] |
#5310 | Use C++ to parse and filter parquet footers. |
#5454 | QualificationTool. Add speedup information to AppSummaryInfo |
#5455 | Moved ShimCurrentBatchIterator so it's visible to db312 and db321 |
#5354 | Plugin should throw same arithmetic exceptions as Spark part1 |
#5440 | Qualification tool support for read and write execs and more, add mapping stage times to sql execs |
#5431 | [DOC] Update the ubuntu repo key [skip ci] |
#5425 | Handle readBatch changes for Spark 3.3.0 |
#5438 | Add tests for all-null data for array_max |
#5428 | Make the sync marker uniform for the Avro coalescing reader |
#5432 | Test case insensitive reading for Parquet and CSV |
#5433 | [DOC] Removed mention of 30x from shims.md [skip ci] |
#5424 | Exclude all unicode line terminator characters from matching dot |
#5426 | Qualification tool: Parsing Execs to get the ExecInfo #2 |
#5427 | Workaround to fix cuda repo key rotation in ubuntu images [skip ci] |
#5419 | Append my id to blossom-ci whitelist [skip ci] |
#5422 | xfail tests for spark 3.3.0 due to changes in readBatch |
#5420 | Qualification tool: Parsing Execs to get the ExecInfo #1 |
#5418 | Add GpuEqualToNoNans and update GpuPivotFirst to use to handle PivotFirst with NaN support enabled on GPU |
#5306 | Support coalescing reading for avro |
#5410 | Update docs for removal of 311cdh |
#5414 | Add 320+-noncdh to Databricks to fix 321db build |
#5349 | Enable some repetitions for \A and \Z |
#5346 | ADD 321cdh shim to rapids and remove 311cdh shim |
#5408 | [DOC] Add rebase mode notes for databricks doc [skip ci] |
#5348 | Qualification tool: Skip GPU event logs |
#5400 | Restore test_computation_in_grpby_columns and test_struct_self_join |
#5399 | Update New Issue template to recommend a Discussion or Question [skip ci] |
#5293 | Support array_repeat |
#5359 | Qualification tool base plan parsing infrastructure |
#5360 | Revert "skip failing tests for Spark 3.3.0 (#5313)" |
#5326 | Update GCP doc and scripts [skip ci] |
#5352 | Fix spark330 build due to mapKeyNotExistError changed |
#5317 | Support arrays_zip |
#5316 | Support ANSI mode for ToUnixTimestamp, UnixTimestamp, GetTimestamp, DateAddInterval |
#5319 | Re-enable support for \Z in regular expressions on the GPU |
#5315 | Simplify conditional catalyst expressions generated by udf-compiler |
#5301 | Support existence join type for broadcast nested loop join |
#5313 | skip failing tests for Spark 3.3.0 |
#5311 | Add information about the discussion board to the README and FAQ [skip ci] |
#5308 | Remove unused ColumnViewUtil |
#5289 | Re-enable dollar ($) line anchor in regular expressions in find mode |
#5274 | Perform explicit UnsafeRow projection in ColumnarToRow transition |
#5297 | GpuStringSplit now honors the spark.rapids.sql.regexp.enabled configuration option |
#5307 | Remove compatibility guide reference to issue #4060 |
#5298 | Qualification tool: Operator mapping from plugin to CSV file |
#5266 | Update Outdated GCP getting started guide[skip ci] |
#5300 | Fix DIST_JAR PATH in coverage-report [skip ci] |
#5290 | Add documentation about reporting security issues [skip ci] |
#5277 | Support multiple datatypes in TypeSig.withPsNote() |
#5296 | Fix spark330 build due to removal of isElementAt parameter from mapKeyNotExistError |
#5291 | fix dead links in shims.md [skip ci] |
#5276 | fix markdown check issue[skip ci] |
#5270 | Include dependency of common jar in tools jar |
#5265 | Remove unused generic types |
#5288 | Temporarily xfail tests to restore premerge builds |
#5287 | Fix nightly scripts to deploy w/ classifier correctly [skip ci] |
#5134 | Support division on ANSI interval types |
#5279 | Add test case for ANSI pmod and ANSI Remainder |
#5284 | Enable support for escaping the right square bracket |
#5280 | [BUG] Fix incorrect plugin nightly deployment and release [skip ci] |
#5249 | Use a bundled spark-rapids-jni dependency instead of external cudf dependency |
#5268 | [BUG] When ASYNC is enabled GDS needs to handle cudaMalloced bounce buffers |
#5230 | Update csv float tests to reflect changes in precision in cuDF |
#5001 | Add fuzzing test for JSON reader |
#5155 | Support casting between day-time interval and string |
#5247 | Fix test failure caused by change in Spark 3.3 exception |
#5254 | Fix the integration test of collect_list_reduction |
#5243 | Throw again after logging that RMM could not initialize |
#5105 | Support multiplication on ANSI interval types |
#5171 | Fix the bug COALESCING reading does not work for v2 parquet/orc datasource |
#5157 | Update the log warning of UDF compiler |
#5213 | Support sample on ANSI interval types |
#5218 | XFAIL tests that are failing due to issue 5211 |
#5202 | Profiling tool: Remove gettingResultTime from stages & jobs aggregation |
#5201 | Fix merge conflict from branch-22.04 |
#5195 | Refactor Spark33XShims to avoid code duplication |
#5185 | Fix test failure with Spark 3.3 by looking for less specific error message |
#4992 | Support Collect-like Reduction Aggregations |
#5193 | Fix auto merge conflict 5192 [skip ci] |
#5020 | Support arithmetic operators on ANSI interval types |
#5174 | Fix auto merge conflict 5173 [skip ci] |
#5168 | Fix auto merge conflict 5166 |
#5151 | Remove NvcompLZ4CompressionCodec single-buffer APIs |
#5132 | Add count support for all types |
#5141 | Upgrade to UCX 1.12.1 for 22.06 |
#5143 | Fix merge conflict with branch-22.04 |
#5144 | Adapt to storage-partitioned join additions in SPARK-37377 |
#5139 | Make mvn-verify check name more descriptive [skip ci] |
#5136 | Fix GpuRegExExtract inconsistency with Spark |
#5107 | Fix GpuFileFormatDataWriter failing to stat file after commit |
#5124 | Fix ShimVectorizedColumnReader construction for recent Spark 3.3.0 changes |
#5047 | Change Cast.toString as "cast" instead of "ansi_cast" under ANSI mode |
#5089 | Enable regular expressions containing \s and \S |
#5087 | Add support for regexp_replace with back-references |
#5110 | Appending my id (mattahrens) to the blossom-ci whitelist [skip ci] |
#5090 | Add nvtx ranges around pre, agg, and post steps in hash aggregate |
#5092 | Remove single-buffer compression codec APIs |
#5093 | Fix leak when GDS buffer store closes |
#5067 | Premerge databricks CI autotrigger [skip ci] |
#5083 | Remove EMRShimVersion |
#5076 | Unshim cache serializer and other 311+-all code |
#5074 | Make ASYNC the default allocator for 22.06 |
#5073 | Add in nvtx ranges for parquet filterBlocks |
#5077 | Change Scala style continuation indentation to be 2 spaces to match guide [skip ci] |
#5070 | Fix merge from 22.04 to 22.06 |
#5046 | Init 22.06.0-SNAPSHOT |
#5059 | Fix merge from 22.04 to 22.06 |
#5036 | Unshim many expressions |
#4993 | PCBS and Parquet support ANSI year month interval type |
#5031 | Unshim many SparkShim interfaces |
#5027 | Fix merge of branch-22.04 to branch-22.06 |
#5022 | Unshim many Pandas execs |
#5013 | Unshim GpuRowBasedScalaUDF |
#5012 | Unshim GpuOrcScan and GpuParquetScan |
#5010 | Unshim GpuSumDefaults |
#5007 | Remove schema utils, case class copying, file partition, and legacy statistical aggregate shims |
#4999 | Enable automerge from branch-22.04 to branch-22.06 [skip ci] |
#4734 | [FEA] Support approx_percentile in reduction context |
#1922 | [FEA] Support ORC forced positional evolution |
#123 | [FEA] add in support for dayfirst formats in the CSV parser |
#4863 | [FEA] Improve timestamp support in JSON and CSV readers |
#4935 | [FEA] Support reading Avro: primitive types |
#4915 | [FEA] Drop support for Spark 3.0.1, 3.0.2, 3.0.3, Databricks 7.3 ML LTS |
#4815 | [FEA] Support org.apache.spark.sql.catalyst.expressions.ArrayExists |
#3245 | [FEA] GpuGetMapValue should support all valid value data types and non-complex key types |
#4914 | [FEA] Support for Databricks 10.4 ML LTS |
#4945 | [FEA] Support filter and comparisons on ANSI day time interval type |
#4004 | [FEA] Add support for percent_rank |
#1111 | [FEA] support spark.sql.legacy.timeParserPolicy when parsing CSV files |
#4849 | [FEA] Support parsing dates in JSON reader |
#4789 | [FEA] Add Spark 3.1.4 shim |
#4646 | [FEA] Make JSON parsing of NaN and Infinity values fully compatible with Spark |
#4824 | [FEA] Support reading decimals from JSON and CSV |
#4814 | [FEA] Support element_at with non-literal index |
#4816 | [FEA] Support org.apache.spark.sql.catalyst.expressions.GetArrayStructFields |
#3542 | [FEA] Support str_to_map function |
#4721 | [FEA] Support regular expression delimiters for str_to_map |
#4791 | Update Spark 3.1.3 to be released |
#4712 | [FEA] Allow to partition on Decimal 128 when running on the GPU |
#4762 | [FEA] Improve support for reading JSON integer types |
#4696 | [FEA] Support casting map to string |
#1572 | [FEA] Add in decimal support for pmod, remainder and divide |
#4763 | [FEA] Improve support for reading JSON boolean types |
#4003 | [FEA] Add regular expression support to GPU implementation of StringSplit |
#4626 | [FEA] cannot run on GPU because unsupported data types in 'partitionSpec' |
#33 | [FEA] hypot SQL function |
#4515 | [FEA] Set RMM async allocator as default |
#3026 | [FEA] [Audit]: Set the list of read columns in the task configuration to reduce reading of ORC data |
#4895 | Add support for structs in GpuScalarSubquery |
#4393 | [BUG] Columnar to Columnar transfers are very slow |
#589 | [FEA] Support ExistenceJoin |
#4784 | [FEA] Improve copying decimal data from CPU columnar data |
#4685 | [FEA] Avoid regexp cost in string_split for escaped characters |
#4777 | Remove input upcast in GpuExtractChunk32 |
#4722 | Optimize DECIMAL128 average aggregations |
#4645 | [FEA] Investigate ASYNC allocator performance with additional queries |
#4539 | [FEA] semaphore optimization in shuffled hash join |
#2441 | [FEA] Use AST for filter in join APIs |
#5233 | [BUG] rapids-tools v22.04.0 release jar reports maven dependency issue : rapids-4-spark-common_2.12:jar:22.04.0 NOT FOUND |
#5183 | [BUG] UCX EGX integration test array_test.py::test_array_exists failures |
#5180 | [BUG] create_map failed with java.lang.IllegalStateException: This is not supported yet |
#5181 | [BUG] Dataproc tests failing when trying to detect for accelerated row conversions |
#5154 | [BUG] build failed in databricks 10.4 runtime (updated recently) |
#5159 | [BUG] Approx percentile query fails with UnsupportedOperationException |
#5164 | [BUG] Databricks 9.1ML failed with "java.lang.NoSuchMethodError: org.apache.spark.sql.execution.metric.SQLMetrics$.createSizeMetric" |
#5125 | [BUG] GpuCast.hasSideEffects does not check if child expression has side effects |
#5091 | [BUG] Profiling tool fails process custom task accumulators of type CollectionAccumulator |
#5050 | [BUG] Release build of v22.04.0 FAILED on "Execution attach-javadoc failed: NullPointerException" with maven option '-P source-javadoc' |
#5035 | [BUG] Different CSV parsing behavior between 22.04 and 22.02 |
#5065 | [BUG] spark330+ build error due to SPARK-37463 |
#5019 | [BUG] udf compiler failed to translate UDF in spark-shell |
#5048 | [BUG] OOM for q18 of TPC-DS benchmark testing on Spark2a |
#5038 | [BUG] When spark.rapids.sql.regexp.enabled is on in 22.04 snapshot jars, Reading a Delta table in Databricks may cause driver error |
#5023 | [BUG] When+sequence could trigger "Illegal sequence boundaries" error |
#5021 | [BUG] test_cache_reverse_order failed |
#5003 | [BUG] Cloudera 3.1.1 tests fail due to ClouderaShimVersion |
#4960 | [BUG] Spark 3.3 IT cache_test:test_passing_gpuExpr_as_Expr failure |
#4913 | [BUG] Fall back to the CPU if we see a scale on Ceil or Floor |
#4806 | [BUG] When running xgboost training, if PCBS is enabled, it fails with java.lang.AssertionError |
#4542 | [BUG] test_write_round_trip failed Maximum pool size exceeded |
#4911 | [BUG][Audit] [SPARK-38314] - Fail to read parquet files after writing the hidden file metadata |
#4936 | [BUG] databricks nightly window_function_test failures |
#4931 | [BUG] Spark 3.3 IT test cache_test.py::test_passing_gpuExpr_as_Expr fails with IllegalArgumentException |
#4710 | [BUG] cudaErrorIllegalAddress for q95 (3TB) on GCP with ASYNC allocator |
#4918 | [BUG] databricks nightly build failed |
#4826 | [BUG] cache_test failures when testing with 128-bit decimal |
#4855 | [BUG] Shim tests in sql-plugin module are not running |
#4487 | [BUG] regexp_find hangs with some patterns |
#4486 | [BUG] Regular expressions with hex digits not working as expected |
#4879 | [BUG] [SPARK-38237][SQL] ClusteredDistribution clustering keys break build with wrong arguments |
#4883 | [BUG] row-based_udf_test.py::test_hive_empty_* fail nightly tests |
#4876 | [BUG] Nightly build failed on Databricks with "pip: No such file or directory" |
#4739 | [BUG] Plugin will crash with query > 100 columns on pascal GPU |
#4840 | [BUG] test_dpp_via_aggregate_subquery_aqe_off failed with table already exists |
#4841 | [BUG] test_compress_write_round_trip failed on Spark 3.3 |
#4668 | [FEA][Audit] - [SPARK-37750][SQL] ANSI mode: optionally return null result if element not exists in array/map |
#3971 | [BUG] udf-examples dependencies are incorrect |
#4022 | [BUG] Ensure shims.v2.ParquetCachedBatchSerializer and similar classes are at most package-private |
#4526 | [BUG] Short circuit AND/OR in ANSI mode |
#4787 | [BUG] Dataproc notebook IT test failure - NoSuchMethodError: org.apache.spark.network.util.ByteUnit.toBytes |
#4704 | [BUG] Update the premerge and nightly tests after moving the UDF example to external repository |
#4795 | [BUG] Read ORC does not ignoreCorruptFiles |
#4802 | [BUG] GPU CSV read does not honor ignoreCorruptFiles or ignoreMissingFiles |
#4803 | [BUG] GPU JSON read does not honor ignoreCorruptFiles or ignoreMissingFiles |
#1986 | [BUG] CSV reading null inconsistent between spark.rapids.sql.format.csv.enabled=true&false |
#126 | [BUG] CSV parsing large number values overflow |
#4759 | [BUG] Profiling tool can miss datasources when they are GPU reads |
#4798 | [BUG] Integration test builds failing with worker_id not found |
#4727 | [BUG] Read Parquet does not ignoreCorruptFiles |
#4744 | [BUG] test_groupby_std_variance_partial_replace_fallback failed |
#4761 | [BUG] test_simple_partitioned_read failed on Spark 3.3 |
#2071 | [BUG] parsing invalid boolean CSV values return true instead of null |
#4749 | [BUG] test_write_empty_parquet_round_trip failed |
#4730 | [BUG] python UDF tests are leaking |
#4290 | [BUG] Investigate q32 and q67 for decimals potential regression |
#4409 | [BUG] Possible race condition in regular expression support for octal digits |
#4728 | [BUG] test_mixed_compress_read orc_test.py failures |
#4736 | [BUG] buildall --profile=321 fails on missing spark301 rapids-4-spark-sql dependency |
#4702 | [BUG] cache_test.py failed w/ cache.serializer in spark 3.3.0 |
#4031 | [BUG] Spark 3.3.0 test failure: NoSuchMethodError org.apache.orc.TypeDescription.getAttributeValue |
#4664 | [BUG] MortgageAdaptiveSparkSuite failed with duplicate buffer exception |
#4564 | [BUG] map_test ansi failed in spark330 |
#119 | [BUG] LIKE does not work if null chars are in the string |
#124 | [BUG] CSV/JSON Parsing some float values results in overflow |
#4045 | [BUG] q93 failed in this week's NDS runs |
#4488 | [BUG] isCastingStringToNegDecimalScaleSupported seems set wrong for some Spark versions |
#5251 | Update 22.04 changelog to latest [skip ci] |
#5232 | Fix issue in GpuArrayExists where a parent view outlived the child |
#5239 | Fix tools depending on the common jar |
#5205 | Update 22.04 changelog to latest [skip ci] |
#5190 | Fix column->row conversion GPU check: |
#5184 | Fix CPU fallback for Map lookup |
#5191 | Update version-def to use released cudfjni 22.04.0 [skip ci] |
#5167 | Update cudfjni version to released 22.04.0 |
#5169 | Terminate test earlier if pytest ENV issue [skip ci] |
#5160 | Fix approximate percentile reduction UnsupportedOperationException |
#5165 | Update Databricks 10.4 for changes to the QueryStageExec and ClusteredDistribution |
#4997 | Update docs for the 22.04 release[skip ci] |
#5146 | Support env var INTEGRATION_TEST_VERSION to override shim version |
#5103 | Init 22.04 changelog [skip ci] |
#5122 | Disable GPU accelerated row-column transpose for Pascal GPUs: |
#5127 | GpuCast.hasSideEffects now checks to see if the child expression has side-effects |
#5118 | On task failure catch some CUDA exceptions and kill executor |
#5069 | Update for the public release [skip ci] |
#5097 | Implement hasSideEffects for GpuGetArrayItem, GpuElementAt, GpuGetMapValue, GpuUnaryMinus, and GpuAbs |
#5079 | Disable spark snapshot shims pre-merge build in 22.04 |
#5094 | Fix profiling tool reading collectionAccumulator |
#5078 | Disable JSON and CSV floating-point reads by default |
#4961 | Support approx_percentile in reduction context |
#5062 | Update Spark 2.x explain API with changes in 22.04 |
#5066 | Add getOrcSchemaString for OrcShims |
#5030 | Fix regression from 21.12 where udfs defined in repl no longer worked |
#5051 | Revert "Replace ParquetFileReader.readFooter with open() and getFooter " |
#5052 | Work around incompatibility between Databricks Delta loads and GpuRegExpExtract |
#4972 | Add support for ORC forced positional evolution |
#5042 | Implement hasSideEffects for GpuSequence |
#5040 | Fix missing imports for 321db shim |
#5033 | Removed limit from the test |
#4938 | Improve compatibility when reading timestamps from JSON and CSV sources |
#5026 | Update RoCE doc URL [skip ci] |
#4976 | Replace ParquetFileReader.readFooter with open() and getFooter |
#4989 | Use conf.useCompression config to decide if we should be compressing the cache |
#4956 | Add avro reader support |
#5009 | Remove references of shims folder in docs [skip ci] |
#5004 | Add ClouderaShimVersion to unshimmed files |
#4971 | Fall back to the CPU for non-zero scale on Ceil or Floor functions |
#4996 | Fix collect_set on struct type |
#4998 | Added the id back for struct children to make them unique |
#4995 | Include 321db shim in distribution build [skip ci] |
#4981 | Update doc for CSV reading interval |
#4973 | Implement support for ArrayExists expression |
#4988 | Remove support for Spark 3.0.x |
#4955 | Add UDT support to ParquetCachedBatchSerializer (CPU) |
#4994 | Add databricks 10.4 build in pre-merge |
#4990 | Remove 30X premerge support for version 22.04 and above [skip ci] |
#4958 | Add independent mvn verify check [skip ci] |
#4933 | Set OrcConf.INCLUDE_COLUMNS for ORC reading |
#4944 | Support for non-string key-types for GetMapValue and element_at() |
#4974 | Add shim for Databricks 10.4 |
#4907 | Add markdown check action |
#4977 | Add missing 314 to buildall script |
#4927 | Support reading ANSI day time interval type from CSV source |
#4965 | Documentation: add example python api call for ExplainPlan.explainPotentialGpuPlan [skip ci] |
#4957 | Document agg pushdown on ORC file limitation [skip ci] |
#4946 | Support predictors on ANSI day time interval type |
#4952 | Have a fixed GPU memory size for integration tests |
#4954 | Fix of failing to read parquet files after writing the hidden file metadata in |
#4953 | Add Decimal 128 as a supported type in partition by for databricks running window |
#4941 | Use new list reduction API to improve performance |
#4926 | Support DayTimeIntervalType in ParquetCachedBatchSerializer |
#4947 | Fallback to ARENA if ASYNC configured and driver < 11.5.0 |
#4934 | Replace MetadataAttribute with FileSourceMetadataAttribute to follow the update in Spark for 3.3.0+ |
#4942 | Fix window rank integration tests on |
#4928 | Disable regular expressions on GPU by default |
#4923 | Support GpuScalarSubquery on nested types |
#4924 | Implement percent_rank() on GPU |
#4853 | Improve date support in JSON and CSV readers |
#4930 | Add in support for sorting arrays with structs in sort_array |
#4861 | Add Apache Spark 3.1.4-SNAPSHOT Shims |
#4925 | Remove unused Spark322PlusShims |
#4921 | Add DatabricksShimVersion to unshimmed class list |
#4917 | Default some configs to protect against cluster settings in integration tests |
#4922 | Add support for decimal 128 for db and spark 320+ |
#4919 | Case-insensitive PR title check [skip ci] |
#4796 | Implement ExistenceJoin Iterator using an auxiliary left semijoin |
#4857 | Transition to v2 shims [Databricks] |
#4899 | Fixed Decimal 128 bug in ParquetCachedBatchSerializer |
#4810 | Support ANSI intervals to/from Parquet |
#4909 | Make ARENA the default allocator for 22.04 |
#4856 | Enable shim tests in sql-plugin module |
#4880 | Bump hadoop-client dependency to 3.1.4 |
#4825 | Initial support for reading decimal types from JSON and CSV |
#4859 | Fallback to CPU when Spark pushes down Aggregates (Min/Max/Count) for ORC |
#4872 | Speed up copying decimal column from parquet buffer to GPU buffer |
#4904 | Relocate Hive UDF Classes |
#4871 | Minor changes to print revision differences when building shims |
#4882 | Disable write/read Parquet when Parquet field IDs are used |
#4858 | Support non-literal index for GpuElementAt and GpuGetArrayItem |
#4875 | Support running GetArrayStructFields on GPU |
#4885 | Enable fuzz testing for Regular Expression repetitions and move remaining edge cases to CPU |
#4869 | Support for hexadecimal digits in regular expressions on the GPU |
#4854 | Avoid regexp_cost with stringSplit on the GPU using transpilation |
#4888 | Clean up leak detection code |
#4901 | Fix a broken link in CONTRIBUTING.md [skip ci] |
#4891 | Update getting started doc because aws-emr 6.5.0 released [skip ci] |
#4881 | Fix compilation error caused by ClusteredDistribution parameters |
#4890 | Integration-test tests jar for hive UDF tests |
#4878 | Set conda/mamba default Python version to 3.8 [skip ci] |
#4874 | Fix spark-tests syntax issue [skip ci] |
#4850 | Also check cuda runtime version when using the ASYNC allocator |
#4851 | Add worker ID to temporary table names in tests |
#4847 | Fix test_compress_write_round_trip failure on Spark 3.3 |
#4848 | Profile tool: fix printing of task failed reason |
#4636 | Support str_to_map |
#4835 | Trim parquet_write_test to reduce integration test runtime |
#4819 | Throw exception if casting from double to datetime |
#4838 | Trim cache tests to improve integration test time |
#4839 | Optionally return null if element does not exist in map/array |
#4822 | Push decimal workarounds to cuDF |
#4619 | Move the udf-examples module to the external repository spark-rapids-examples |
#4844 | Update spark313 dep to released one |
#4827 | Make InternalExclusiveModeGpuDiscoveryPlugin and ExplainPlanImpl protected classes |
#4836 | Support WindowExec partitioning by Decimal 128 on the GPU |
#4760 | Short circuit AND/OR in ANSI mode |
#4829 | Make bloopInstall version configurable in buildall |
#4823 | Reduce redundancy of decimal testing |
#4715 | Patterns such as (3?)+ should now fall back to CPU |
#4809 | Add ignoreCorruptFiles for ORC readers |
#4790 | Improve JSON and CSV parsing of integer values |
#4812 | Default integration test configs to allow negative decimal scale |
#4805 | Avoid output cast by using unsigned type output for GpuExtractChunk32 |
#4804 | Profiling tool can miss datasources when they are GPU reads |
#4797 | Do not check for metadata during schema comparison |
#4785 | Support casting Map to String |
#4794 | Decimal-128 support for mod and pmod |
#4799 | Fix failure to generate worker_id when xdist is not present |
#4742 | Add ignoreCorruptFiles feature for Parquet reader |
#4792 | Ensure GpuM2 merge aggregation does not produce a null mean or m2 |
#4770 | Improve columnarCopy for HostColumnarToGpu |
#4776 | Improve aggregation performance of average on DECIMAL128 columns |
#4786 | Add shims to compare ORC TypeDescription |
#4780 | Improve JSON and CSV support for boolean values |
#4778 | Decrease chance of random collisions in test temporary paths |
#4782 | Check in host leak detection code |
#4781 | Add Spark properties table to profiling tool output |
#4714 | Add regular expression support to string_split |
#4754 | Close SpillableBatch to avoid leaks |
#4758 | Fix merge conflict with branch-22.02 [skip ci] |
#4694 | Add clarifications and details to integration-tests README [skip ci] |
#4740 | Enable regular expressions on GPU by default |
#4735 | Re-enables partial regex support for octal digits on the GPU |
#4737 | Check for a null compression codec when creating ORC OutStream |
#4738 | Change resume-from to aggregator in buildall [skip ci] |
#4698 | Add tests for a few JSON options |
#4731 | Trim join tests to improve runtime of tests |
#4732 | Fix failing serializer tests on Spark 3.3.0 |
#4709 | Update centos 8 dockerfile to handle EOL issue [skip ci] |
#4724 | Debug dump to Parquet support for DECIMAL128 columns |
#4688 | Optimize DECIMAL128 sum aggregations |
#4692 | Add FAQ entry to discuss executor task concurrency configuration [skip ci] |
#4588 | Optimize semaphore acquisition in GpuShuffledHashJoinExec |
#4697 | Add preliminary test and test framework changes for ExistenceJoin |
#4716 | GpuStringSplit should return an array of not-null elements |
#4611 | Support BitLength and OctetLength |
#4408 | Use the ORC version that corresponds to the Spark version |
#4686 | Fall back to CPU for queries referencing hidden metadata columns |
#4669 | Prevent deadlock between RapidsBufferStore and RapidsBufferBase on close |
#4707 | Fix auto merge conflict 4705 [skip ci] |
#4690 | Fix map_test ANSI failure in Spark 3.3.0 |
#4681 | Reimplement check for non-regexp strings using RegexParser |
#4683 | Fix documentation link, clarify documentation [skip ci] |
#4677 | Make Collect, first and last as deterministic aggregate functions for Spark-3.3 |
#4682 | Enable test for LIKE with embedded null character |
#4673 | Allow GpuWindowExec to partition on structs |
#4637 | Improve support for reading CSV and JSON floating-point values |
#4629 | Remove shims module |
#4648 | Append new authorized user to blossom-ci safelist |
#4623 | Fallback to CPU when aggregate push down used for parquet |
#4606 | Set default RMM pool to ASYNC for cuda 11.2+ |
#4531 | Use libcudf mixed joins for conditional hash semi and anti joins |
#4624 | Enable integration test results report on Jenkins [skip ci] |
#4597 | Update plugin version to 22.04.0-SNAPSHOT |
#4592 | Adds SQL function HYPOT using the GPU |
#4504 | Implement AST-based regular expression fuzz tests |
#4560 | Make shims.v2.ParquetCachedBatchSerializer as protected |
#4305 | [FEA] write nvidia tool wrappers to allow old YARN versions to work with MIG |
#4410 | [FEA] ReplicateRows - Support ReplicateRows for decimal 128 type |
#4360 | [FEA] Add explain api for Spark 2.X |
#3541 | [FEA] Support max on single-level struct in aggregation context |
#4238 | [FEA] Add a Spark 3.X Explain only mode to the plugin |
#3952 | [Audit] [FEA][SPARK-32986][SQL] Add bucketed scan info in query plan of data source v1 |
#4412 | [FEA] Improve support for \A, \Z, and \z in regular expressions |
#3979 | [FEA] Improvements for CPU(Row) based UDF |
#4467 | [FEA] Add support for regular expression with repeated digits (\d+ , \d* , \d? ) |
#4439 | [FEA] Enable GPU broadcast exchange reuse for DPP when AQE enabled |
#3512 | [FEA] Support org.apache.spark.sql.catalyst.expressions.Sequence |
#3475 | [FEA] Spark 3.2.0 reads Parquet unsigned int64(UINT64) as Decimal(20,0) but CUDF does not support it |
#4091 | [FEA] regexp_replace: Improve support for ^ and $ |
#4104 | [FEA] Support org.apache.spark.sql.catalyst.expressions.ReplicateRows |
#4027 | [FEA] Support SubqueryBroadcast on GPU to enable exchange reuse during DPP |
#4284 | [FEA] Support idx = 0 in GpuRegExpExtract |
#4002 | [FEA] Implement regexp_extract on GPU |
#3221 | [FEA] Support GpuFirst and GpuLast on nested types under reduction aggregations |
#3944 | [FEA] Full support for sum with overflow on Decimal 128 |
#4028 | [FEA] support GpuCast from non-nested ArrayType to StringType |
#3250 | [FEA] Make CreateMap duplicate key handling compatible with Spark and enable CreateMap by default |
#4170 | [FEA] Make regular expression behavior with $ and \r consistent with CPU |
#4001 | [FEA] Add regexp support to regexp_replace |
#3962 | [FEA] Support null characters in regular expressions in RLIKE |
#3797 | [FEA] Make RLike support consistent with Apache Spark |
#4392 | [FEA] could the parquet scan code avoid acquiring the semaphore for an empty batch? |
#679 | [FEA] move some deserialization code out of the scope of the gpu-semaphore to increase cpu concurrent |
#4350 | [FEA] Optimize the all-true and all-false cases in GPU If and CaseWhen |
#4309 | [FEA] Leverage cudf conditional nested loop join to implement semi/anti hash join with condition |
#4395 | [FEA] acquire the semaphore after concatToHost in GpuShuffleCoalesceIterator |
#4134 | [FEA] Allow EliminateJoinToEmptyRelation in GpuBroadcastExchangeExec |
#4189 | [FEA] understand why between is so expensive |
#4316 | [BUG] Exception: Unable to find py4j, your SPARK_HOME may not be configured correctly intermittently |
#4725 | [DOC] Broken links in guide doc |
#4675 | [BUG] Jenkins integration build timed out at 10 hours |
#4665 | [BUG] Spark321Shims.getParquetFilters failed with NoSuchMethodError |
#4635 | [BUG] nvidia-smi wrapper script ignores ENABLE_NON_MIG_GPUS=1 on a heterogeneous multi-GPU machine |
#4500 | [BUG] Build failures against Spark 3.2.1 rc1 and make 3.2.1 non snapshot |
#4631 | [BUG] Release build with mvn option -P source-javadoc FAILED |
#4625 | [BUG] NDS query 5 fails with AdaptiveSparkPlanExec assertion |
#4632 | [BUG] Build failing for Spark 3.3.0 due to deprecated method warnings |
#4599 | [BUG] test_group_apply_udf and test_group_apply_udf_more_types hangs on Databricks 9.1 |
#4600 | [BUG] crash if we have a decimal128 in a struct in an array |
#4581 | [BUG] Build error "GpuOverrides.scala:924: wrong number of arguments" on DB9.1.x spark-3.1.2 |
#4593 | [BUG] dup GpuHashJoin.diff case-folding issue |
#4559 | [BUG] regexp_replace with replacement string containing \ can produce incorrect results |
#4503 | [BUG] regexp_replace with back references produces incorrect results on GPU |
#4567 | [BUG] Profile tool hangs in compare mode |
#4315 | [BUG] test_hash_reduction_decimal_overflow_sum[30] failed OOM in integration tests |
#4551 | [BUG] protobuf-java version changed to 3.x |
#4499 | [BUG] GpuSequence blows up when nulls exist in any of the inputs (start, stop, step) |
#4454 | [BUG] Shade warnings when building the tools artifact |
#4541 | [BUG] Column vector leak in conditionals_test.py |
#4514 | [BUG] test_hash_reduction_pivot_without_nans failed |
#4521 | [BUG] Inconsistencies in handling of newline characters and string and line anchors |
#4548 | [BUG] ai.rapids.cudf.CudaException: an illegal instruction was encountered in databricks 9.1 |
#4475 | [BUG] \D and \W match newline in Spark but not in cuDF |
#1866 | [BUG] GpuFileFormatWriter does not close the data writer |
#4524 | [BUG] RegExp transpiler fails to detect some choice expressions that cuDF cannot compile |
#3226 | [BUG] OOM happened when doing cube operations |
#2504 | [BUG] OOM when running NDS queries with UCX and GDS |
#4273 | [BUG] Rounding past the size that can be stored in a type produces incorrect results |
#4060 | [BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed intermittently |
#4039 | [BUG] Spark 3.3.0 IT Array test failures |
#3849 | [BUG] In ANSI mode we can fail in cases Spark would not due to conditionals |
#4445 | [BUG] mvn clean prints an error message on a clean dir |
#4421 | [BUG] the driver is trying to load CUDA with latest 22.02 |
#4455 | [BUG] join_test.py::test_struct_self_join[IGNORE_ORDER({'local': True})] failed in spark330 |
#4442 | [BUG] mvn build FAILED with option -P noSnapshotsWithDatabricks |
#4281 | [BUG] q9 regression between 21.10 and 21.12 |
#4280 | [BUG] q88 regression between 21.10 and 21.12 |
#4422 | [BUG] Host column vectors are being leaked during tests |
#4446 | [BUG] GpuCast crashes when casting from Array with unsupportable child type |
#4432 | [BUG] nightly build 3.3.0 failed: HashClusteredDistribution is not a member of org.apache.spark.sql.catalyst.plans.physical |
#4443 | [BUG] SPARK-37705 breaks parquet filters from Spark 3.3.0 and Spark 3.2.2 onwards |
#4378 | [BUG] udf_test udf_cudf_test failed require_minimum_pandas_version check in spark 320+ |
#4423 | [BUG] Build is failing due to FileScanRDD changes in Spark 3.3.0-SNAPSHOT |
#4401 | [BUG] array_test.py::test_array_contains failures |
#4403 | [BUG] NDS query 72 logs codegen fallback exception and produces incorrect results |
#4386 | [BUG] conditionals_test.py FAILED with side_effects_cast[Integer/Long] on Databricks 9.1 Runtime |
#3934 | [BUG] Dependencies of published integration tests jar are missing |
#4341 | [BUG] GpuCast.scala:nnn warning: discarding unmoored doc comment |
#4356 | [BUG] nightly spark303 deploy pulling spark301 aggregator |
#4347 | [BUG] Dist jar pom lists aggregator jar as dependency |
#4176 | [BUG] ParseDateTimeSuite UT failed |
#4292 | [BUG] no meaningful message is surfaced to maven when binary-dedupe fails |
#4351 | [BUG] Tests FAILED On SPARK-3.2.0, com.nvidia.spark.rapids.SerializedTableColumn cannot be cast to com.nvidia.spark.rapids.GpuColumnVector |
#4346 | [BUG] q73 decimal was twice as slow in weekly results |
#4334 | [BUG] GpuColumnarToRowExec will always be tagged False for exportColumnarRdd after Spark311 |
#4339 | The parameter dataType is not necessary in resolveColumnVector method. |
#4275 | [BUG] Row-based Hive UDF will fail if arguments contain a foldable expression. |
#4229 | [BUG] regexp_replace [^a] has different behavior between CPU and GPU for multiline strings |
#4294 | [BUG] parquet_write_test.py::test_ts_write_fails_datetime_exception failed in spark 3.1.1 and 3.1.2 |
#4205 | [BUG] Get different results when casting from timestamp to string |
#4277 | [BUG] cudf_udf nightly cudf import rmm failed |
#4246 | [BUG] Regression in CastOpSuite due to cuDF change in parsing NaN |
#4243 | [BUG] test_regexp_replace_null_pattern_fallback[ALLOW_NON_GPU(ProjectExec,RegExpReplace)] failed in databricks |
#4244 | [BUG] Cast from string to float using hand-picked values failed |
#4227 | [BUG] RAPIDS Shuffle Manager doesn't fallback given encryption settings |
#3374 | [BUG] minor deprecation warnings in a 3.2 shim build |
#3613 | [BUG] release312db profile pulls in 311until320-apache |
#4213 | [BUG] unused method with a misleading outdated comment in ShimLoader |
#3609 | [BUG] GpuShuffleExchangeExec in v2 shims has inconsistent packaging |
#4127 | [BUG] CUDF 22.02 nightly test failure |
#4773 | Update 22.02 changelog to latest [skip ci] |
#4771 | Revert cudf api links from legacy to stable [skip ci] |
#4767 | Update 22.02 changelog to latest [skip ci] |
#4750 | Updated doc for decimal support |
#4757 | Update qualification tool to remove DECIMAL 128 as potential problem |
#4755 | Fix databricks doc for limitations [skip ci] |
#4751 | Fix broken hyperlinks in documentation [skip ci] |
#4706 | Update 22.02 changelog to latest [skip ci] |
#4700 | Update cudfjni version to released 22.02.0 |
#4701 | Decrease nightly tests upper limitation to 7 [skip ci] |
#4639 | Update changelog for 22.02 and archive info of some older releases [skip ci] |
#4572 | Add download page for 22.02 [skip ci] |
#4672 | Revert "Disable 311cdh build due to missing dependency (#4659)" |
#4662 | Update the deploy script [skip ci] |
#4657 | Upmerge spark2 directory to the latest 22.02 changes |
#4659 | Disable 311cdh build by default because of a missing dependency |
#4508 | Fix Spark 3.2.1 build failures and make it non-snapshot |
#4652 | Remove non-deterministic test order in nightly [skip ci] |
#4643 | Add profile release301 when mvn help:evaluate |
#4630 | Fix the incomplete capture of SubqueryBroadcast |
#4633 | Suppress newTaskTempFile method warnings for Spark 3.3.0 build |
#4618 | [DB31x] Pick the correct Python runner for flatmap-group Pandas UDF |
#4622 | Fallback to CPU when encoding is not supported for JSON reader |
#4470 | Add in HashPartitioning support for decimal 128 |
#4535 | Revert "Disable orc write by default because of https://issues.apache.org/jira/browse/ORC-1075 (#4471)" |
#4583 | Avoid unapply on PromotePrecision |
#4573 | Correct version from 21.12 to 22.02 [skip ci] |
#4575 | Correct and update links in UDF doc [skip ci] |
#4501 | Switch and/or to use new cudf binops to improve performance |
#4594 | Resolve case-folding issue [skip ci] |
#4585 | Spark2 module upmerge, deploy script, and updates for Jenkins |
#4589 | Increase premerge databricks IDLE_TIMEOUT to 4 hours [skip ci] |
#4485 | Add json reader support |
#4556 | regexp_replace with back-references should fall back to CPU |
#4569 | Fix infinite loop with Profiling tool compare mode and app with no sql ids |
#4529 | Add support for Spark 2.x Explain Api |
#4577 | Revert "Fix CVE-2021-22569 (#4545)" |
#4520 | GpuSequence refactor |
#4570 | A few quick fixes to try to reduce max memory usage in the tests |
#4477 | Use libcudf mixed joins for conditional hash joins |
#4566 | remove scala-library from combined tools jar |
#4552 | Fix resource leak in GpuCaseWhen |
#4553 | Reenable test_hash_reduction_pivot_without_nans |
#4530 | Fix correctness issues in regexp and add \r and \n to fuzz tests |
#4549 | Fix typos in integration tests README [skip ci] |
#4545 | Fix CVE-2021-22569 |
#4543 | Enable auto-merge from branch-22.02 to branch-22.04 [skip ci] |
#4540 | Remove user kuhushukla |
#4434 | Support max on single-level struct in aggregation context |
#4534 | Temporarily disable integration test - test_hash_reduction_pivot_without_nans |
#4322 | Add an explain only mode to the plugin |
#4497 | Make better use of pinned memory pool |
#4512 | Remove hadoop version requirement [skip ci] |
#4527 | Fall back to CPU for regular expressions containing \D or \W |
#4525 | Properly close data writer in GpuFileFormatWriter |
#4502 | Removed the redundant test for element_at and fixed the failing one |
#4523 | Add more integration tests for decimal 128 |
#3762 | Call the right method to convert table from row major <=> col major |
#4482 | Simplified the construction of zero scalar in GpuUnaryMinus |
#4510 | Update copyright in NOTICE [skip ci] |
#4484 | Update GpuFileFormatWriter to stay in sync with recent Spark changes, but still not support writing Hive bucketed table on GPU. |
#4492 | Fall back to CPU for regular expressions containing hex digits |
#4495 | Enable approx_percentile by default |
#4420 | Fix up incorrect results of rounding past the max digits of data type |
#4483 | Update test case of reading nested unsigned parquet file |
#4490 | Remove warning about RMM default allocator |
#4461 | [Audit] Add bucketed scan info in query plan of data source v1 |
#4489 | Add arrays of decimal128 to join tests |
#4476 | Don't acquire the semaphore for empty input while scanning |
#4424 | Improve support for regular expression string anchors \A , \Z , and \z |
#4491 | Skip the test for spark versions 3.1.1, 3.1.2 and 3.2.0 only |
#4459 | Use merge sort for struct types in non-key columns |
#4494 | Append new authorized user to blossom-ci whitelist [skip ci] |
#4400 | Enable approx percentile tests |
#4471 | Disable orc write by default because of https://issues.apache.org/jira/browse/ORC-1075 |
#4462 | Rename DECIMAL_128_FULL and rework usage of TypeSig.gpuNumeric |
#4479 | Change signoff check image to slim-buster [skip ci] |
#4464 | Throw SparkArrayIndexOutOfBoundsException for Spark 3.3.0+ |
#4469 | Support repetition of \d and \D in regexp functions |
#4472 | Modify docs for 22.02 to address issue-4319 [skip ci] |
#4440 | Enable GPU broadcast exchange reuse for DPP when AQE enabled |
#4376 | Add sequence support |
#4460 | Abstract the text based PartitionReader |
#4383 | Fix correctness issue with CASE WHEN with expressions that have side-effects |
#4465 | Refactor for shims 320+ |
#4463 | Avoid replacing a hash join if build side is unsupported by the join type |
#4456 | Fix build issues: 1 clean non-existent target dirs; 2 remove duplicated plugin |
#4416 | Unshim join execs |
#4172 | Support String to Decimal 128 |
#4458 | Exclude some metadata operators when checking GPU replacement |
#4451 | Some metrics improvements and timeline reporting |
#4435 | Disable add profile src execution by default to make the build log clean |
#4436 | Print error log to stderr output |
#4155 | Add partial support for line begin and end anchors in regexp_replace |
#4428 | Exhaustively iterate ColumnarToRow iterator to avoid leaks |
#4430 | Update pca example link in ml-integration.md [skip ci] |
#4452 | Limit parallelism of nightly tests [skip ci] |
#4449 | Add recursive type checking and fallback tests for casting array with unsupported element types to string |
#4437 | Change logInfo to logWarning |
#4447 | Fix 330 build error and add 322 shims layer |
#4417 | Fix an Intellij debug issue |
#4431 | Add DateType support for AST expressions |
#4433 | Import the right pandas from conda [skip ci] |
#4419 | Import the right pandas from conda |
#4427 | Update getFileScanRDD shim for recent changes in Spark 3.3.0 |
#4397 | Ignore cufile.log |
#4388 | Add support for ReplicateRows |
#4399 | Update docs for Profiling and Qualification tool to change wording |
#4407 | Fix GpuSubqueryBroadcast on multi-fields relation |
#4396 | GpuShuffleCoalesceIterator acquire semaphore after host concat |
#4361 | Accommodate altered semantics of cudf::lists::contains() |
#4394 | Use correct column name in GpuIf test |
#4385 | Add missing GpuSubqueryBroadcast replacement rule for spark31x |
#4387 | Fix auto merge conflict 4384 [skip ci] |
#4374 | Fix the IT module depends on the tests module |
#4365 | Not publishing integration_tests jar to Maven Central [skip ci] |
#4358 | Update GpuIf to support expressions with side effects |
#4382 | Remove unused scallop dependency from integration_tests |
#4364 | Replace Scala document with Scala comment for inner functions |
#4373 | Add pytest tags for nightly test parallel run [skip ci] |
#4150 | Support GpuSubqueryBroadcast for DPP |
#4372 | Move casting to string tests from array_test.py and struct_test.py to cast_test.py |
#4371 | Fix typo in skipTestsFor330 calculation [skip ci] |
#4355 | Dedicated deploy-file with reduced pom in nightly build [skip ci] |
#4352 | Revert "Ignore failing string to timestamp tests temporarily (#4197)" |
#4359 | Audit - SPARK-37268 - Remove unused variable in GpuFileScanRDD [Databricks] |
#4327 | Print meaningful message when calling scripts in maven |
#4354 | Fix regression in AQE optimizations |
#4343 | Fix issue with binding to hash agg columns with computation |
#4285 | Add support for regexp_extract on the GPU |
#4349 | Fix PYTHONPATH in pre-merge |
#4269 | The option for the nightly script not deploying jars [skip ci] |
#4335 | Fix the issue of exporting Column RDD |
#4336 | Split expensive pytest files in cases level [skip ci] |
#4328 | Change the explanation of why the operator will not work on GPU |
#4338 | Use scala Int.box instead of Integer constructors |
#4340 | Remove the unnecessary parameter dataType in resolveColumnVector method |
#4256 | Allow returning an EmptyHashedRelation when a broadcast result is empty |
#4333 | Add tests about writing empty table to ORC/Parquet |
#4337 | Support GpuFirst and GpuLast on nested types under reduction aggregations |
#4331 | Fix parquet options builder calls |
#4310 | Fix typo in shim class name |
#4326 | Fix 4315 decrease concurrentGpuTasks to avoid sum test OOM |
#4266 | Check revisions for all shim jars while build all |
#4282 | Use data type to create an inspector for a foldable GPU expression. |
#3144 | Optimize AQE with Spark 3.2+ to avoid redundant transitions |
#4317 | [BUG] Update nightly test script to dynamically set mem_fraction [skip ci] |
#4206 | Porting GpuRowToColumnar converters to InternalColumnarRDDConverter |
#4272 | Full support for SUM overflow detection on decimal |
#4255 | Make regexp pattern [^a] consistent with Spark for multiline strings |
#4306 | Revert commonizing the int96ParquetRebase* functions |
#4299 | Fix auto merge conflict 4298 [skip ci] |
#4159 | Optimize sample perf |
#4235 | Commonize v2 shim |
#4274 | Add tests for timestamps that overflowed before. |
#4271 | Skip test_regexp_replace_null_pattern_fallback on Spark 3.1.1 and later |
#4278 | Use mamba for cudf conda install [skip ci] |
#4270 | Document exponent differences when casting floating point to string [skip ci] |
#4268 | Fix merge conflict with branch-21.12 |
#4093 | Add tests for regexp() and regexp_like() |
#4259 | Fix regression in cast from string to float that caused signed NaN to be considered valid |
#4241 | Fix bug in parsing regex character classes that start with ^ and contain an unescaped ] |
#4224 | Support row-based Hive UDFs |
#4221 | GpuCast from ArrayType to StringType |
#4007 | Implement duplicate key handling for GpuCreateMap |
#4251 | Skip test_regexp_replace_null_pattern_fallback on Databricks |
#4247 | Disable failing CastOpSuite test |
#4239 | Make EOL anchor behavior match CPU for strings ending with newline |
#4153 | Regexp: Only transpile once per expression rather than once per batch |
#4230 | Change to build tools module with all the versions by default |
#4223 | Fixes a minor deprecation warning |
#4215 | Rebalance testing load |
#4214 | Fix pre_merge ci_2 [skip ci] |
#4212 | Remove an unused method with its outdated comment |
#4211 | Update test_floor_ceil_overflow to be more lenient on exception type |
#4203 | Move all the GpuShuffleExchangeExec shim v2 classes to org.apache.spark |
#4193 | Rename 311until320-apache to 311until320-noncdh |
#4197 | Ignore failing string to timestamp tests temporarily |
#4160 | Fix merge issues for branch 22.02 |
#4081 | Convert String to DecimalType without casting to FloatType |
#4132 | Fix auto merge conflict 4131 [skip ci] |
#4099 | [REVIEW] Init version 22.02.0 |
#4113 | Fix pre-merge CI 2 conditions [skip ci] |
Changelog of older releases can be found at docs/archives