Generated on 2022-01-28
#938 | [FEA] Have hashed shuffle match spark |
#1604 | [FEA] Support casting structs to strings |
#1920 | [FEA] Support murmur3 hashing of structs |
#2018 | [FEA] A way for user to find out the plugin version and cudf version in REPL |
#77 | [FEA] Support ArrayContains |
#1721 | [FEA] build cudf jars with NVTX enabled |
#1782 | [FEA] Shim layers to support spark versions |
#1625 | [FEA] Support Decimal Casts to String and String to Decimal |
#166 | [FEA] Support get_json_object |
#1698 | [FEA] Support casting structs to string |
#1912 | [FEA] Let Scalar Pandas UDF support array of struct type. |
#1136 | [FEA] Audit: Script to list commits between different Spark versions/tags |
#1921 | [FEA] cudf version check should be lenient on later patch version |
#19 | [FEA] Out of core sorts |
#2090 | [FEA] Make row count estimates available to the cost-based optimizer |
#1341 | Optimize unnecessary columnar->row->columnar transitions with AQE |
#1558 | [FEA] Initialize UCX early |
#1633 | [FEA] Implement a cost-based optimizer |
#1727 | [FEA] Put RangePartitioner data path on the GPU |
#2279 | [BUG] Hash Partitioning can fail for very small batches |
#2314 | [BUG] v0.5.0 pre-release pytests join_test.py::test_hash_join_array FAILED on SPARK-EGX Yarn Cluster |
#2317 | [BUG] GpuColumnarToRowIterator can stop after receiving an empty batch |
#2244 | [BUG] Executors hanging when running NDS benchmarks |
#2278 | [BUG] FullOuter join can produce too many results |
#2220 | [BUG] csv_test.py::test_csv_fallback FAILED on the EMR Cluster |
#2225 | [BUG] GpuSort fails on tables containing arrays. |
#2232 | [BUG] hash_aggregate_test.py::test_hash_grpby_pivot FAILED on the Databricks Cluster |
#2231 | [BUG] string_test.py::test_re_replace FAILED on the Dataproc Cluster |
#2042 | [BUG] NDS q14a fails with "GpuColumnarToRow does not implement doExecuteBroadcast" |
#2203 | [BUG] Spark nightly cache tests fail with --master flag |
#2230 | [BUG] qa_nightly_select_test.py::test_select FAILED on the Dataproc Cluster |
#1711 | [BUG] find a way to stop allocating from RMM on the shuffle-client thread |
#2109 | [BUG] Fix high priority violations detected by code analysis tools |
#2217 | [BUG] qa_nightly_select_test failure in test_select |
#2127 | [BUG] Parsing with two-digit year should fall back to CPU |
#2078 | [BUG] java.lang.ArithmeticException: divide by zero when spark.sql.ansi.enabled=true |
#2048 | [BUG] split function + repartition results in "ai.rapids.cudf.CudaException: device-side assert triggered" |
#2036 | [BUG] Stackoverflow when writing wide parquet files. |
#1973 | [BUG] generate_expr_test FAILED on Dataproc Cluster |
#2079 | [BUG] koalas.sql fails with java.lang.ArrayIndexOutOfBoundsException |
#217 | [BUG] CudaUtil should be removed |
#1550 | [BUG] The ORC output data of a query is not readable |
#2074 | [BUG] Intermittent NPE in RapidsBufferCatalog when running test suite |
#2027 | [BUG] udf_cudf_test.py integration tests fail |
#1899 | [BUG] Some queries fail when cost-based optimizations are enabled |
#1914 | [BUG] Add in float, double, timestamp, and date support to murmur3 |
#2014 | [BUG] earlyStart option added in 0.5 can cause errors when starting UCX |
#1984 | [BUG] NDS q58 Decimal scale (59) cannot be greater than precision (38). |
#2001 | [BUG] RapidsShuffleManager didn't pass dirs to getBlockData from a wrapped ShuffleBlockResolver |
#1797 | [BUG] occasional crashes in CI |
#1861 | Encountered column data outside the range of input buffer |
#1905 | [BUG] Large concat task time in GpuShuffleCoalesce with pinned memory pool |
#1638 | [BUG] Tests test_window_aggs_for_rows_collect_list fails when there are null values in columns. |
#1864 | [BUG] HostColumnarToGPU inefficient when only doing count() |
#1862 | [BUG] spark 3.2.0-snapshot integration test failed due to conf change |
#1844 | [BUG] branch-0.5 nightly IT FAILED on the mortgage ETL test "Could not read footer for file: file:/xxx/xxx.snappy.parquet" |
#1627 | [BUG] GDS exception when restoring spilled buffer |
#1802 | [BUG] Many decimal integration test failures for 0.5 |
#2326 | Update changelog for 0.5.0 release |
#2316 | Update doc to note that single quoted json strings are not ok |
#2319 | Disable hash partitioning on arrays |
#2318 | Fix ColumnarToRowIterator handling of empty batches |
#2304 | Update CHANGELOG.md |
#2301 | Update doc to reflect nanosleep problem with 460.32.03 |
#2298 | Update changelog for v0.5.0 release [skip ci] |
#2293 | update cudf version to 0.19.2 |
#2289 | Update docs to warn against 450.80.02 driver with 10.x toolkit |
#2285 | Require single batch for full outer join streaming |
#2281 | Remove download section for unreleased 0.4.2 |
#2264 | Add spark312 and spark320 versions of cache serializer |
#2254 | updated gcp docs with custom dataproc image instructions |
#2247 | Allow specifying a superclass for non-GPU execs |
#2235 | Fix distributed cache to read requested schema |
#2261 | Make CBO row count test more robust |
#2237 | update cudf version to 0.19.1 |
#2240 | Get the correct 'PIPESTATUS' in bash [skip ci] |
#2242 | Add shuffle doc section on the periodicGC configuration |
#2251 | Fix issue when out of core sorting nested data types |
#2204 | Run nightly tests for ParquetCachedBatchSerializer |
#2245 | Fix pivot bug for decimalType |
#2093 | Initial implementation of row count estimates in cost-based optimizer |
#2188 | Support GPU broadcast exchange reuse to feed CPU BHJ when AQE is enabled |
#2227 | ParquetCachedBatchSerializer broadcast AllConfs instead of SQLConf to fix distributed mode |
#2223 | Adds subquery aggregate tests from SPARK-31620 |
#2222 | Remove groupId already specified in parent pom |
#2209 | Fixed a few issues with out of core sort |
#2218 | Fix incorrect RegExpReplace children handling on Spark 3.1+ |
#2207 | fix batch size default values in the tuning guide |
#2208 | Revert "add nightly cache tests (#2083)" |
#2206 | Fix shim301db build |
#2192 | Fix index-based access to the head elements |
#2210 | Avoid redundant collection conversions |
#2190 | JNI fixes for StringWordCount native UDF example |
#2086 | Updating documentation for data format support |
#2172 | Remove easy unused symbols |
#2089 | Update PandasUDF doc |
#2195 | fix cudf 0.19.0 download link [skip ci] |
#2175 | Branch 0.5 doc update |
#2168 | Simplify GpuExpressions w/ withResourceIfAllowed |
#2055 | Support PivotFirst |
#2183 | GpuParquetScan#readBufferToTable remove dead code |
#2129 | Fall back to CPU when parsing two-digit years |
#2083 | add nightly cache tests |
#2151 | add corresponding close call for HostMemoryOutputStream |
#2169 | Work around bug in Spark for integration test |
#2130 | Fix divide-by-zero in GpuAverage with ansi mode |
#2149 | Auto generate the supported types for the file formats |
#2072 | Disable CSV parsing by default and update tests to better show what is left |
#2157 | fix merge conflict for 0.4.2 [skip ci] |
#2144 | Allow array and struct types to pass thru when doing join |
#2145 | Avoid GPU shuffle for round-robin of unsortable types |
#2021 | Add in support for murmur3 hashing of structs |
#2128 | Add in Partition type check support |
#2116 | Add dynamic Spark configuration for Databricks |
#2132 | Log plugin and cudf versions on startup |
#2135 | Disable Spark 3.2 shim by default |
#2125 | enable auto-merge from 0.5 to 0.6 [skip ci] |
#2120 | Materialize Stream before serialization |
#2119 | Add more comprehensive documentation on supported date formats |
#1717 | Decimal32 support |
#2114 | Modified the Download page for 0.4.1 and updated doc to point to K8s guide |
#2106 | Fix some buffer leaks |
#2097 | fix the bound row project empty issue in row frame |
#2099 | Remove verbose log prints to make the build/test log clean |
#2105 | Cleanup prior Spark sessions in tests consistently |
#2104 | Clone apache spark source code to parse the git commit IDs |
#2095 | fix refcount when materializing device buffer from GDS |
#2100 | [BUG] add wget for fetching conda [skip ci] |
#2096 | Adjust images for integration tests |
#2094 | Changed name of parquet files for Mortgage ETL Integration test |
#2035 | Accelerate data transfer for map Pandas UDF plan |
#2050 | stream shuffle buffers from GDS to UCX |
#2084 | Enable ORC write by default |
#2088 | Upgrade ScalaTest plugin to respect JAVA_HOME |
#1932 | Create a getting started on K8s page |
#2080 | Improve error message after failed RMM shutdown |
#2064 | Optimize unnecessary columnar->row->columnar transitions with AQE |
#2025 | Update the doc for pandas udf on databricks |
#2059 | Add the flag 'TEST_TYPE' to avoid integration tests silently skipping some test cases |
#2075 | Remove debug println from CBO test |
#2046 | support casting Decimal to String |
#1812 | allow spilled buffers to be unspilled |
#2061 | Run the pandas udf using cudf on Databricks |
#1893 | Plug-in support for get_json_object |
#2044 | Use partition for GPU hash partitioning |
#1954 | Fix CBO bug where incompatible plans were produced with AQE on |
#2049 | Remove incompatible int overflow checking |
#2056 | Remove Spark 3.2 from premerge and nightly CI run |
#1814 | Struct to string casting functionality |
#2037 | Fix warnings from use of deprecated cudf methods |
#2033 | Bump up pre-merge OS from ubuntu 16 to ubuntu 18 [skip ci] |
#1883 | Enable sort for single-level nesting struct columns on GPU |
#2016 | Refactor logic for parallel testing |
#2022 | Update order by to not load native libraries when sorting |
#2017 | Add in murmur3 support for float, double, date and timestamp |
#1981 | Fix GpuSize |
#1999 | support casting string to decimal |
#2006 | Enable windowed collect_list by default |
#2000 | Use Spark's HybridRowQueue to avoid MemoryConsumer API shim |
#2015 | Fix bug where rkey buffer is getting advanced after the first handshake |
#2007 | Fix unknown column name error when filtering ORC file with no names |
#2005 | Update to new is_before_spark_311 function name |
#1944 | Support running scalar pandas UDF with array type. |
#1991 | Fixes creation of invalid DecimalType in GpuDivide.tagExprForGpu |
#1958 | Support legacy behavior of parameterless count |
#1919 | Add support for Structs for UnionExec |
#2002 | Pass dirs to getBlockData for a wrapped shuffle resolver |
#1983 | document building against different CUDA Toolkit versions |
#1994 | Merge 0.4 to 0.5 [skip ci] |
#1982 | Update ORC pushdown filter building to latest Spark logic |
#1978 | Add audit script to list commits from Spark |
#1976 | Temp fix for parquet write changes |
#1970 | add maven profiles for supported CUDA versions |
#1951 | Branch 0.5 doc remove numpartitions |
#1967 | Update FAQ for Dataset API and format supported versions |
#1972 | support GpuSize |
#1966 | add xml report for codecov |
#1955 | Fix typo in Arrow optimization config |
#1956 | Fix NPE in plugin shutdown |
#1930 | Relax cudf version check for patch-level versions |
#1787 | support distributed file path in cloud environment |
#1961 | change premerge GPU_TYPE from secret to global env [skip ci] |
#1957 | Update Spark 3.1.2 shim for float upcast behavior |
#1889 | Decimal DIV changes |
#1947 | Move doc of Pandas UDF to additional-functionality |
#1938 | Add spark.executor.resource.gpu.amount=1 to YARN and K8s docs |
#1937 | Fix merge conflict with branch-0.4 |
#1878 | spillable cache for GpuCartesianRDD |
#1843 | Refactor GpuGenerateExec and Explode |
#1933 | Split DB scripts to make them common for the build and IT pipeline |
#1935 | Update Alias SQL quoting and float-to-timestamp casting to match Spark 3.2 |
#1926 | Consolidate RAT settings in parent pom |
#1918 | Minor code cleanup in dateTImeExpressions |
#1906 | Remove get call on timeZoneId |
#1908 | Remove the Scala version of Mortgage ETL tests from nightly test |
#1894 | Modified Download Page to re-order the items and change the format of download links |
#1909 | Avoid pinned memory for shuffle host buffers |
#1891 | Connect UCX endpoints early during app startup |
#1877 | remove docker build in pre-merge [skip ci] |
#1830 | Enable the tests for collect over window. |
#1882 | GpuArrowColumnarBatchBuilder retains the references of ArrowBuf until HostToGpuCoalesceIterator put them into device |
#1868 | Increase row limit when doing count() for HostColumnarToGpu |
#1855 | Expose row count statistics in GpuShuffleExchangeExec |
#1875 | Fix merge conflict with branch-0.4 |
#1841 | Add in support for DateAddInterval |
#1869 | Fix tests for Spark 3.2.0 shim |
#1858 | fix shuffle manager doc on ucx library path |
#1836 | Add shim for Spark 3.1.2 |
#1852 | Fix Part Suite Tests |
#1616 | Cost-based optimizer |
#1834 | Add shim for Spark 3.0.3 |
#1839 | Refactor join code to reduce duplicated code |
#1848 | Fix merge conflict with branch-0.4 |
#1796 | Have most of range partitioning run on the GPU |
#1845 | Fix fails on the mortgage ETL test |
#1829 | Cleanup unused Jenkins files and scripts |
#1704 | Create a shim for Spark 3.2.0 development |
#1838 | Make databricks build.sh more convenient for dev |
#1835 | Fix merge conflict with branch-0.4 |
#1808 | Update mortgage tests to support reading multiple dataset formats |
#1822 | Fix conflict 0.4 to 0.5 |
#1807 | Fix merge conflict between branch-0.4 and branch-0.5 |
#1788 | Spill metrics everywhere |
#1719 | Add in out of core sort |
#1728 | Skip RAPIDS accelerated Java UDF tests if UDF fails to load |
#1689 | Update docs for plugin 0.5.0-SNAPSHOT and cudf 0.19-SNAPSHOT |
#1682 | init CI/CD dependencies branch-0.5 |
#1985 | [BUG] broadcast exchange can fail on 0.4 |
#1995 | update changelog 0.4.1 [skip ci] |
#1990 | Prepare for v0.4.1 release |
#1988 | broadcast exchange can fail when job group set |
#1773 | [FEA] Spark 3.0.2 release support |
#80 | [FEA] Support the struct SQL function |
#76 | [FEA] Support CreateArray |
#1635 | [FEA] RAPIDS accelerated Java UDF |
#1333 | [FEA] Support window operations on Decimal |
#1419 | [FEA] Support GPU accelerated UDF alternative for higher order function "aggregate" over window |
#1580 | [FEA] Support Decimal for ParquetCachedBatchSerializer |
#1600 | [FEA] Support ScalarSubquery |
#1072 | [FEA] Support for a custom DataSource V2 which supplies Arrow data |
#906 | [FEA] Clarify query explanation to directly state what will run on GPU |
#1335 | [FEA] Support CollectLimitExec for decimal |
#1485 | [FEA] Decimal Support for Parquet Write |
#1329 | [FEA] Decimal support for multiply int div, add, subtract and null safe equals |
#1351 | [FEA] Execute UDFs that provide a RAPIDS execution path |
#1330 | [FEA] Support Decimal Casts |
#1353 | [FEA] Example of RAPIDS UDF using custom GPU code |
#1487 | [FEA] Change spark 3.1.0 to 3.1.1 |
#1334 | [FEA] Add support for count aggregate on decimal |
#1325 | [FEA] Add in join support for decimal |
#1326 | [FEA] Add in Broadcast support for decimal values |
#37 | [FEA] round and bround SQL functions |
#78 | [FEA] Support CreateNamedStruct function |
#1331 | [FEA] UnionExec and ExpandExec support for decimal |
#1332 | [FEA] Support CaseWhen, Coalesce and IfElse for decimal |
#937 | [FEA] have murmur3 hash function that matches exactly with spark |
#1324 | [FEA] Support Parquet Read of Decimal FIXED_LENGTH_BYTE_ARRAY |
#1428 | [FEA] Add support for unary decimal operations abs, floor, ceil, unary - and unary + |
#1375 | [FEA] Add log statement for what the concurrentGpuTasks tasks is set to on executor startup |
#1352 | [FEA] Example of RAPIDS UDF using cudf Java APIs |
#1328 | [FEA] Support sorting and shuffle of decimal |
#1316 | [FEA] Support simple DECIMAL aggregates |
#1435 | [FEA] Improve the file reading by using local file caching |
#1738 | [FEA] Reduce regex usage in CAST string to date/timestamp |
#987 | [FEA] Optimize CAST from string to temporal types by using cuDF is_timestamp function |
#1594 | [FEA] RAPIDS accelerated ScalaUDF |
#103 | [FEA] GPU version of TakeOrderedAndProject |
#1024 | Cleanup RAPIDS transport calls to receive |
#1366 | Seeing performance differences of multi-threaded/coalesce/perfile Parquet reader type for a single file |
#1200 | [FEA] Accelerate the scan speed for coalescing parquet reader when reading files from multiple partitioned folders |
#1885 | [BUG] natural join on string key results in a data frame with spurious NULLs |
#1785 | [BUG] Rapids pytest integration tests FAILED on Yarn cluster with unrecognized arguments: --std_input_path=src/test/resources/ |
#999 | [BUG] test_multi_types_window_aggs_for_rows_lead_lag fails against Spark 3.1.0 |
#1818 | [BUG] unmoored doc comment warnings in GpuCast |
#1817 | [BUG] Developer build with local modifications fails during verify phase |
#1644 | [BUG] test_window_aggregate_udf_array_from_python fails on databricks |
#1771 | [BUG] Databricks AWS CI/CD failing to create cluster |
#1157 | [BUG] Fix regression supporting to_date on GPU with Spark 3.1.0 |
#716 | [BUG] Cast String to TimeStamp issues |
#1117 | [BUG] CAST string to date returns wrong values for dates with out-of-range values |
#1670 | [BUG] Some TPC-DS queries fail with AQE when decimal types enabled |
#1730 | [BUG] Range Partitioning can crash when processing is in the order-by |
#1726 | [BUG] java url decode test failing on databricks, emr, and dataproc |
#1651 | [BUG] GDS exception when writing shuffle file |
#1702 | [BUG] check all tests marked xfail for Spark 3.1.1 |
#575 | [BUG] Spark 3.1 FAILED join_test.py::test_broadcast_join_mixed[FullOuter][IGNORE_ORDER] failed |
#577 | [BUG] Spark 3.1 log arithmetic functions fail |
#1541 | [BUG] Tests fail in integration in distributed mode after allowing nested types through in sort and shuffle |
#1626 | [BUG] TPC-DS-like query 77 at scale=3TB fails with maxResultSize exceeded error |
#1576 | [BUG] loading SPARK-32639 example parquet file triggers a JVM crash |
#1643 | [BUG] TPC-DS-Like q10, q35, and q69 - slow or hanging at leftSemiJoin |
#1650 | [BUG] BenchmarkRunner does not include query name in JSON summary filename when running multiple queries |
#1654 | [BUG] TPC-DS-like query 59 at scale=3TB with AQE fails with join mismatch |
#1274 | [BUG] OutOfMemoryError - Maximum pool size exceeded while running 24 day criteo ETL Transform stage |
#1497 | [BUG] Spark-rapids v0.3.0 pytest integration tests with UCX on FAILED on Yarn cluster |
#1534 | [BUG] Spark 3.1.1 test failure in writing due to removal of InMemoryFileIndex.shouldFilterOut |
#1155 | [BUG] on shutdown don't print Socket closed exception when shutting down UCX.scala |
#1510 | [BUG] IllegalArgumentException during shuffle |
#1513 | [BUG] executor not fully initialized may get calls from Spark, in the process setting the catalog incorrectly |
#1466 | [BUG] Databricks build must run before the rapids nightly |
#1456 | [BUG] Databricks 0.4 parquet integration tests fail |
#1400 | [BUG] Regressions in spark-shell usage of benchmark utilities |
#1119 | [BUG] inner join fails with Column size cannot be negative |
#1079 | [BUG] The Scala UDF function cannot invoke the UDF compiler when it's passed to "explode" |
#1298 | TPCxBB query16 failed at UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary |
#1271 | [BUG] CastOpSuite and AnsiCastOpSuite failing with ArithmeticException on Spark 3.1 |
#84 | [BUG] sort does not match spark for -0.0 and 0.0 |
#578 | [BUG] Spark 3.1 qa_nightly_select_test.py Full join test failures |
#586 | [BUG] Spark3.1 tpch failures |
#837 | [BUG] Distinct count of floating point values differs with regular spark |
#953 | [BUG] 3.1.0 pos_explode tests are failing |
#127 | [BUG] String CSV parsing does not respect nullValues |
#1203 | [BUG] tpcds query 51 fails with join error on Spark 3.1.0 |
#750 | [BUG] udf_cudf_test::test_with_column fails with IPC error |
#1348 | [BUG] Host columnar decimal conversions are failing |
#1270 | [BUG] Benchmark runner fails to produce report if benchmark fails due to an invalid query plan |
#1179 | [BUG] SerializeConcatHostBuffersDeserializeBatch may have thread issues |
#1115 | [BUG] Unchecked type warning in SparkQueryCompareTestSuite |
#1963 | Update changelog 0.4 [skip ci] |
#1960 | Replace sonatype staging link with maven central link |
#1945 | Update changelog 0.4 [skip ci] |
#1910 | Make hash partitioning match CPU |
#1927 | Change cuDF dependency to 0.18.1 |
#1934 | Update documentation to use cudf version 0.18.1 |
#1871 | Disable coalesce batch spilling to avoid cudf contiguous_split bug |
#1849 | Update changelog for 0.4 |
#1744 | Fix NullPointerException on null partition insert |
#1842 | Update to note support for 3.0.2 |
#1832 | Spark 3.1.1 shim no longer a snapshot shim |
#1831 | Spark 3.0.2 shim no longer a snapshot shim |
#1826 | Remove benchmarks |
#1828 | Update cudf dependency to 0.18 |
#1813 | Fix LEAD/LAG failures in Spark 3.1.1 |
#1819 | Fix scaladoc warning in GpuCast |
#1820 | [BUG] make modified check pre-merge only |
#1780 | Remove SNAPSHOT from test and integration_test READMEs |
#1809 | check if modified files after update_config/supported |
#1804 | Update UCX documentation for RX_QUEUE_LEN and Docker |
#1810 | Pandas UDF: Sort the data before computing the sum. |
#1751 | Exclude foldable expressions from GPU if constant folding is disabled |
#1798 | Add documentation about explain not on GPU when AQE is on |
#1766 | Branch 0.4 release docs |
#1794 | Build python output schema from udf expressions |
#1783 | Fix the collect_list over window tests failures on db |
#1781 | Better float/double cases for casting tests |
#1790 | Record row counts in benchmark runs that call collect |
#1779 | Add support of DateType and TimestampType for GetTimestamp expression |
#1768 | Updating getting started Databricks docs |
#1742 | Fix regression supporting to_date with Spark-3.1 |
#1775 | Fix ambiguous ordering for some tests |
#1760 | Update GpuDataSourceScanExec and GpuBroadcastExchangeExec to fix audit issues |
#1750 | Detect task failures in benchmarks |
#1767 | Consistent Spark version for test and production |
#1741 | Reduce regex use in CAST |
#1756 | Skip RAPIDS accelerated Java UDF tests if UDF fails to load |
#1716 | Update RapidsShuffleManager documentation for branch 0.4 |
#1740 | Disable ORC writes until bug can be fixed |
#1747 | Fix resource leaks in unit tests |
#1725 | Branch 0.4 FAQ reorg |
#1718 | CAST string to temporal type now calls isTimestamp |
#1734 | Disable range partitioning if computation is needed |
#1723 | Removed StructTypes support for ParquetCachedBatchSerializer as cudf doesn't support it yet |
#1714 | Add support for RAPIDS accelerated Java UDFs |
#1713 | Call GpuDeviceManager.shutdown when the executor plugin is shutting down |
#1596 | Added in Decimal support to ParquetCachedBatchSerializer |
#1706 | cleanup unused is_before_spark_310 |
#1685 | Fix CustomShuffleReader replacement when decimal types enabled |
#1699 | Add docs about Spark 3.1 in standalone modes not needing extra class path |
#1701 | remove xfail for orc test_input_meta for spark 3.1.0 |
#1703 | Remove xfail for spark 3.1.0 test_broadcast_join_mixed FullOuter |
#1676 | BenchmarkRunner option to generate query plan diagrams in DOT format |
#1695 | support alternate jar paths |
#1694 | increase mem and limit parallelism for pre-merge |
#1691 | add validate_execs_in_gpu_plan to pytest.ini |
#1692 | Add the integration test resources to the test tarball |
#1677 | When PTDS is enabled, print warning if the allocator is not ARENA |
#1683 | update changelog to verify automerge 0.5 setup [skip ci] |
#1673 | support auto-merge for branch 0.5 [skip ci] |
#1681 | Xfail the collect_list tests for databricks |
#1678 | Fix array/struct checks in Sort and HashAggregate and sorting tests in distributed mode |
#1671 | Allow metrics to be configurable by level |
#1675 | add run_pyspark_from_build.sh to the pytest distribution tarball |
#1548 | Support executing collect_list on GPU with windowing. |
#1593 | Avoid unnecessary Table instances after contiguous split |
#1592 | Add in support for Decimal divide |
#1668 | Implement way for python integration tests to validate Exec is in GPU plan |
#1669 | Add FAQ entries for executor-per-GPU questions |
#1661 | Enable Parquet test for file containing map struct key |
#1664 | Filter nulls for left semi and left anti join to work around cudf |
#1665 | Add better automated tests for Arrow columnar copy in HostColumnarToGpu |
#1614 | add alluxio getting started document |
#1639 | support GpuScalarSubquery |
#1656 | Move UDF to Catalyst Expressions to its own document |
#1663 | BenchmarkRunner - Include query name in JSON summary filename |
#1655 | Fix extraneous shuffles added by AQE |
#1652 | Fix typo in arrow optimized config name - spark.rapids.arrowCopyOptimizationEnabled |
#1645 | Run Databricks IT with python-xdist parallel, includes test fixes and xfail |
#1649 | Move building from source docs to contributing guide |
#1637 | Fail DivModLike on zero divisor in ANSI mode |
#1646 | Update links in rapids-udfs.md after moving to subfolder |
#1641 | Xfail struct and array order by tests on Dataproc |
#1565 | Add GPU accelerated array_contains operator |
#1617 | Enable nightly test checks for Apache Spark |
#1636 | RAPIDS accelerated Spark Scala UDF support |
#1634 | Fix databricks build since Arrow code added |
#1599 | Add division by zero tests for Spark 3.1 behavior |
#1619 | Update GpuFileSourceScanExec to be in sync with DataSourceScanExec |
#1631 | Explicitly add maven-jar-plugin version to improve incremental build time. |
#1624 | Update explain format to show what will and will not run on the GPU |
#1622 | Support faster copy for a custom DataSource V2 which supplies Arrow data |
#1621 | Additional functionality docs |
#1618 | update blossom-ci for security updates [skip ci] |
#1562 | add alluxio support |
#1597 | Documentation for Parquet serializer |
#1611 | Add in flag for integration tests to not skip required tests |
#1609 | Disable float round/bround by default |
#1615 | Add in window support for average |
#1610 | Limit length of spark app name in BenchmarkRunner |
#1579 | Support TakeOrderedAndProject |
#1581 | Support Decimal type for CollectLimitExec |
#1591 | Add support for running multiple queries in BenchmarkRunner |
#1595 | Fix Github documentation issue template |
#1577 | rename directory from spark310 to spark311 |
#1578 | Test to track RAPIDS-side issues re SPARK-32639 |
#1583 | fix request-action issue [skip ci] |
#1555 | Enable ANSI mode for CAST string to timestamp |
#1531 | Decimal Support for writing Parquet |
#1545 | Support comparing ORC data |
#1570 | Branch 0.4 doc cleanup |
#1569 | Add shim method shouldIgnorePath |
#1564 | Add in support for Decimal Multiply and DIV |
#1561 | Decimal support for add and subtract |
#1560 | support sum in window aggregation for decimal |
#1546 | Cleanup shutdown logging for UCX shuffle |
#1551 | RAPIDS-accelerated Hive UDFs support all types |
#1543 | Shuffle/transport enabled by default |
#1552 | Disable blackduck signature check |
#1540 | Handle ShuffleManager api calls when plugin is not fully initialized |
#1547 | Cleanup shuffle transport receive calls |
#1512 | Support window operations on Decimal |
#1532 | Support casting from decimal to decimal |
#1542 | Change the number of partitions to zero when a range is empty |
#1506 | Add --use-decimals flag to TPC-DS ConvertFiles |
#1511 | Remove unused Jenkinsfiles [skip ci] |
#1505 | Add least, greatest and eqNullSafe support for DecimalType |
#1484 | add doc for nsight systems bundled with cuda toolkit |
#1478 | Documentation for RAPIDS-accelerated Hive UDFs |
#1477 | Allow structs and arrays to pass through for Shuffle and Sort |
#1489 | Adds in some support for the array sql function |
#1438 | Cast from numeric types to decimal type |
#1493 | Moved ParquetRecordMaterializer to the shim package to follow convention |
#1495 | Fix merge conflict, merge branch 0.3 to branch 0.4 [skip ci] |
#1472 | Add an example RAPIDS-accelerated Hive UDF using native code |
#1488 | Rename Spark 3.1.0 shim to Spark 3.1.1 to match community |
#1474 | Fix link |
#1476 | DecimalType support for Aggregate Count |
#1475 | Join support for DecimalType |
#1244 | Support round and bround SQL functions |
#1458 | Add in support for struct and named_struct |
#1465 | DecimalType support for UnionExec and ExpandExec |
#1450 | Add dynamic configs for the spark-rapids IT pipelines |
#1207 | Spark SQL hash function using murmur3 |
#1457 | Support reading decimal columns from parquet files on Databricks |
#1455 | Upgrade Scala Maven Plugin to 4.3.0 |
#1453 | DecimalType support for IfElse and Coalesce |
#1452 | Support DecimalType for CaseWhen |
#1444 | Improve UX when running benchmarks from Spark shell |
#1294 | Support reading decimal columns from parquet files |
#1153 | Scala UDF will compile children expressions in Project |
#1416 | Optimize mvn dependency download scripts |
#1430 | Add project for testing code that requires Spark 3.1.0 or later |
#1425 | Add in Decimal support for abs, floor, ceil, unary - and unary + |
#1427 | Revert "Make the multi-threaded parquet reader the default" |
#1420 | Add udf jar to nightly integration tests |
#1422 | Log the number of concurrent gpu tasks allowed on Executor startup |
#1401 | Accelerate the coalescing parquet reader when reading files from multiple partitioned folders |
#1413 | Add config for cast float to integral types |
#1313 | Support spilling to disk directly via cuFile/GDS |
#1411 | Add udf-examples jar to databricks build |
#1412 | Fix a lot of tests marked with xfail for Spark 3.1.0 that no longer fail |
#1414 | Build merged code of HEAD and BASE branch for pre-merge [skip ci] |
#1409 | Add option to use decimals in tpc-ds csv to parquet conversion |
#1410 | Add Decimal support for In, InSet, AtLeastNNonNulls, GetArrayItem, GetStructField, and GenerateExec |
#1408 | Support RAPIDS-accelerated HiveGenericUDF |
#1407 | Update docs and tests for null CSV support |
#1393 | Support RAPIDS-accelerated HiveSimpleUDF |
#1392 | Turn on hash partitioning for decimal support |
#1402 | Better GPU Cast type checks |
#1404 | Fix branch 0.4 merge conflict |
#1323 | More advanced type checking and documentation |
#1391 | Remove extra null join filtering because cudf is fast for this now. |
#1395 | Fix branch-0.3 -> branch-0.4 automerge |
#1382 | Handle "MM[/-]dd" and "dd[/-]MM" datetime formats in UnixTimeExprMeta |
#1390 | Accelerated columnar to row/row to columnar for decimal |
#1380 | Adds in basic support for decimal sort, sum, and some shuffle |
#1367 | Reuse gpu expression conversion rules when checking sort order |
#1349 | Add canonicalization tests |
#1368 | Move to cudf 0.18-SNAPSHOT |
#1361 | Use the correct precision when reading spark columnar data. |
#1273 | Update docs and scripts to 0.4.0-SNAPSHOT |
#1321 | Refactor to stop inheriting from HashJoin |
#1311 | ParquetCachedBatchSerializer code cleanup |
#1303 | Add explicit outputOrdering for BHJ and SHJ in spark310 shim |
#1299 | Benchmark runner improved error handling |
#1002 | [FEA] RapidsHostColumnVectorCore should verify cudf data with respect to the expected spark type |
#444 | [FEA] Pluggable Cache |
#1158 | [FEA] Better documentation on type support |
#57 | [FEA] Support INT96 for parquet reads and writes |
#1003 | [FEA] Reduce overlap between RapidsHostColumnVector and RapidsHostColumnVectorCore |
#913 | [FEA] In Pluggable Cache Support CalendarInterval while creating CachedBatches |
#1092 | [FEA] In Pluggable Cache handle nested types having CalendarIntervalType and NullType |
#670 | [FEA] Support NullType |
#50 | [FEA] support spark.sql.legacy.timeParserPolicy |
#1144 | [FEA] Remove Databricks 3.0.0 shim layer |
#1096 | [FEA] Implement parquet CreateDataSourceTableAsSelectCommand |
#688 | [FEA] udf compiler should be auto-appended to spark.sql.extensions |
#502 | [FEA] Support Databricks 7.3 LTS Runtime |
#764 | [FEA] Sanity checks for cudf jar mismatch |
#1018 | [FEA] Log details related to GPU memory fragmentation on GPU OOM |
#619 | [FEA] log whether libcudf and libcudfjni were built for PTDS |
#905 | [FEA] create AWS EMR 3.0.1 shim |
#838 | [FEA] Support window count for a column |
#864 | [FEA] config option to enable RMM arena memory resource |
#430 | [FEA] Audit: Parquet Writer support for TIMESTAMP_MILLIS |
#818 | [FEA] Create shim layer for AWS EMR |
#608 | [FEA] Parquet small file optimization improve handle merge schema |
#446 | [FEA] Test jucx in 1.9.x branch |
#1038 | [FEA] Accelerate the data transfer for plan WindowInPandasExec |
#533 | [FEA] Improve PTDS performance |
#849 | [FEA] Have GpuColumnarBatchSerializer return GpuColumnVectorFromBuffer instances |
#784 | [FEA] Allow Host Spilling to be more dynamic |
#627 | [FEA] Further parquet reading small file improvements |
#5 | [FEA] Support Adaptive Execution |
#1423 | [BUG] Mortgage ETL sample failed with spark.sql.adaptive enabled on AWS EMR 6.2 |
#1369 | [BUG] TPC-DS Query Failing on EMR 6.2 with AQE |
#1344 | [BUG] Spark-rapids Pytests failed on Databricks cluster spark standalone mode |
#1279 | [BUG] TPC-DS query 2 failing with NPE |
#1280 | [BUG] TPC-DS query 93 failing with UnsupportedOperationException |
#1308 | [BUG] TPC-DS query 14a runs much slower on 0.3 |
#1284 | [BUG] TPC-DS query 77 at scale=1TB fails with maxResultSize exceeded error |
#1061 | [BUG] orc_test.py is failing |
#1197 | [BUG] java.lang.NullPointerException when exporting delta table |
#685 | [BUG] In ParquetCachedBatchSerializer, serializing parquet buffers might blow up in certain cases |
#1269 | [BUG] GpuSubstring is not expected to be a part of a SortOrder |
#1246 | [BUG] Many TPC-DS benchmarks fail when writing to Parquet |
#961 | [BUG] ORC predicate pushdown should work with case-insensitive analysis |
#962 | [BUG] Loading columns from an ORC file without column names returns no data |
#1245 | [BUG] Code adding buffers to the spillable store should synchronize |
#570 | [BUG] Continue debugging OOM after ensuring device store is empty |
#972 | [BUG] total time metric is redundant with scan time |
#1039 | [BUG] UNBOUNDED window ranges on null timestamp columns produces incorrect results. |
#1195 | [BUG] AcceleratedColumnarToRowIterator queue empty |
#1177 | [BUG] leaks possible in the rapids shuffle if batches are received after the task completes |
#1216 | [BUG] Failure to recognize ORC file format when loaded via Hive |
#898 | [BUG] count reductions are failing on databricks because of lack of Complete support |
#1184 | [BUG] test_window_aggregate_udf_array_from_python fails on databricks 3.0.1 |
#1151 | [BUG] Add databricks 3.0.1 shim layer for GpuWindowInPandasExec. |
#1199 | [BUG] No data size in Input column in Stages page from Spark UI when using Parquet as file source |
#1031 | [BUG] dependency info properties file contains error messages |
#1149 | [BUG] Scaladoc warnings in GpuDataSource |
#1185 | [BUG] test_hash_multiple_mode_query failing |
#724 | [BUG] PySpark test_broadcast_nested_loop_join_special_case intermittent failure |
#1164 | [BUG] ansi_cast tests are failing in 3.1.0 |
#1110 | [BUG] Special date "now" has wrong value on GPU |
#1139 | [BUG] Host columnar to GPU can be very slow |
#1094 | [BUG] unix_timestamp on GPU returns invalid data for special dates |
#1098 | [BUG] unix_timestamp on GPU returns invalid data for bad input |
#1082 | [BUG] string to timestamp conversion fails with split |
#1140 | [BUG] ConcurrentModificationException error after scala test suite completes |
#1073 | [BUG] java.lang.RuntimeException: BinaryExpressions must override either eval or nullSafeEval |
#975 | [BUG] BroadcastExchangeExec fails to fall back to CPU on driver node on GCP Dataproc |
#773 | [BUG] Investigate high task deserialization |
#1035 | [BUG] TPC-DS query 90 with AQE enabled fails with doExecuteBroadcast exception |
#825 | [BUG] test_window_aggs_for_ranges intermittently fails |
#1008 | [BUG] limit function is producing inconsistent result when type is Byte, Long, Boolean and Timestamp |
#996 | [BUG] TPC-DS benchmark via spark-submit does not provide option to disable appending .dat to path |
#1006 | [BUG] Spark3.1.0 changed BasicWriteTaskStats breaks BasicColumnarWriteTaskStatsTracker |
#985 | [BUG] missing metric dataSize |
#881 | [BUG] cannot disable Sort by itself |
#812 | [BUG] Test failures for 0.2 when run with multiple executors |
#925 | [BUG] Range window-functions with non-timestamp order-by expressions not falling back to CPU |
#852 | [BUG] BenchUtils.compareResults cannot compare partitioned files when ignoreOrdering=false |
#868 | [BUG] Rounding error when casting timestamp to string for timestamps before 1970 |
#880 | [BUG] doing a window operation with an orderby for a single constant crashes |
#776 | [BUG] Integration test fails on spark 3.1.0-SNAPSHOT |
#874 | [BUG] RapidsConf.scala has some inconsistency for spark.rapids.sql.format.parquet.multiThreadedRead |
#860 | [BUG] we need to mark columns from received shuffle buffers as GpuColumnVectorFromBuffer |
#122 | [BUG] CSV Timestamp parsing is broken for TS < 1902 and TS > 2038 |
#810 | [BUG] UDF Integration tests fail if pandas is not installed |
#746 | [BUG] cudf_udf_test.py is flaky |
#811 | [BUG] 0.3 nightly is timing out |
#574 | [BUG] Fix GpuTimeSub for Spark 3.1.0 |
#1496 | Update changelog for v0.3.0 release [skip ci] |
#1473 | Update documentation for 0.3 release |
#1371 | Start Guide for RAPIDS on AWS EMR 6.2 |
#1446 | Update changelog for 0.3.0 release [skip ci] |
#1439 | when AQE enabled we fail to fix up exchanges properly and EMR |
#1433 | fix pandas 1.2 compatible issue |
#1424 | Make the multi-threaded parquet reader the default since coalescing doesn't handle partitioned files well |
#1389 | Update project version to 0.3.0 |
#1387 | Update cudf version to 0.17 |
#1370 | [REVIEW] init changelog 0.3 [skip ci] |
#1376 | MetaUtils.getBatchFromMeta should return batches with GpuColumnVectorFromBuffer |
#1358 | auto-merge: instant merge after creation [skip ci] |
#1359 | Use SortOrder from shims. |
#1343 | Do not run UDFs when the partition is empty. |
#1342 | Fix and edit docs for standalone mode |
#1350 | fix GpuRangePartitioning canonicalization |
#1281 | Documentation added for testing |
#1336 | Fix missing post-shuffle coalesce with AQE |
#1318 | Fix copying GpuFileSourceScanExec node |
#1337 | Use UTC instead of GMT |
#1307 | Fallback to cpu when reading Delta log files for stats |
#1310 | Fix canonicalization of GpuFileSourceScanExec, GpuShuffleCoalesceExec |
#1302 | Add GpuSubstring handling to SortOrder canonicalization |
#1265 | Chunking input before writing a ParquetCachedBatch |
#1278 | Add a config to disable decimal types by default |
#1272 | Add Alias to shims |
#1268 | Adds in support docs for 0.3 release |
#1235 | Trigger reading and handling control data. |
#1266 | Updating Databricks getting started for 0.3 release |
#1291 | Increase pre-merge resource requests [skip ci] |
#1275 | Temporarily disable more CAST tests for Spark 3.1.0 |
#1264 | Fix race condition in batch creation |
#1260 | Update UCX license info in NOTICE-binary for 1.9 and RAPIDS plugin copyright dates |
#1247 | Ensure column names are valid when writing benchmark query results to file |
#1240 | Fix loading from ORC file with no column names |
#1242 | Remove compatibility documentation about unsupported INT96 |
#1192 | [REVIEW] Support GpuFilter and GpuCoalesceBatches for decimal data |
#1170 | Add nested type support to MetaUtils |
#1194 | Drop redundant total time metric from scan |
#1248 | At BatchedTableCompressor.finish synchronize to allow for "right-size… |
#1169 | Use CUDF's "UNBOUNDED" window boundaries for time-range queries. |
#1204 | Avoid empty batches on columnar to row conversion |
#1133 | Refactor batch coalesce to be based solely on batch data size |
#1237 | In transport, limit pending transfer requests to fit within a bounce |
#1232 | Move SortOrder creation to shims |
#1068 | Write int96 to parquet |
#1193 | Verify shuffle of decimal columns |
#1180 | Remove batches if they are received after the iterator detects that t… |
#1173 | Support relational operators for decimal type |
#1220 | Support replacing ORC format when Hive is configured |
#1219 | Upgrade to jucx 1.9.0 |
#1081 | Add option to upload benchmark summary JSON file |
#1217 | Aggregate reductions in Complete mode should use updateExpressions |
#1218 | Remove obsolete HiveStringType usage |
#1214 | changelog update 2020-11-30. Trigger automerge check [skip ci] |
#1210 | Support auto-merge for branch-0.4 [skip ci] |
#1202 | Fix a bug with the support for java.lang.StringBuilder.append. |
#1213 | Skip casting StringType to TimestampType for Spark 310 |
#1201 | Replace only window expressions on databricks. |
#1208 | [BUG] Fix GHSL2020-239 [skip ci] |
#1205 | Fix missing input bytes read metric for Parquet |
#1206 | Update Spark 3.1 shim for ShuffleOrigin shuffle parameter |
#1196 | Rename ShuffleCoalesceExec to GpuShuffleCoalesceExec |
#1191 | Skip window array tests for databricks. |
#1183 | Support for CalendarIntervalType and NullType |
#1150 | udf spec |
#1188 | Add in tests for parquet nested pruning support |
#1189 | Enable NullType for First and Last in 3.0.1+ |
#1181 | Fix resource leaks in unit tests |
#1186 | Fix compilation and scaladoc warnings |
#1187 | Updated documentation for distinct count compatibility |
#1182 | Close buffer catalog on device manager shutdown |
#1137 | Let GpuWindowInPandas declare ArrayType supported. |
#1176 | Add in support for null type |
#1174 | Fix race condition in SerializeConcatHostBuffersDeserializeBatch |
#1175 | Fix leaks seen in shuffle tests |
#1138 | [REVIEW] Support decimal type for GpuProjectExec |
#1162 | Set job descriptions in benchmark runner |
#1172 | Revert "Fix race condition (#1165)" |
#1060 | Show partition metrics for custom shuffler reader |
#1152 | Add spark301db shim layer for WindowInPandas. |
#1167 | Nulls out the dataframe if --gc-between-runs is set |
#1165 | Fix race condition in SerializeConcatHostBuffersDeserializeBatch |
#1163 | Add in support for GetStructField |
#1166 | Fix the cast tests for 3.1.0+ |
#1159 | fix bug where 'now' had same value as 'today' for timestamps |
#1161 | Fix nightly build pipeline failure. |
#1160 | Fix some performance problems with columnar to columnar conversion |
#1105 | [REVIEW] Change ColumnViewAccess usage to work with ColumnView |
#1148 | Add in tests for Maps and extend map support where possible |
#1154 | Mark test as xfail until we can get a fix in |
#1113 | Support unix_timestamp on GPU for subset of formats |
#1156 | Fix warning introduced in iterator suite |
#1095 | Dependency info |
#1145 | Remove support for databricks 7.0 runtime - shim spark300db |
#1147 | Change the assert to require for handling TIMESTAMP_MILLIS in isDateTimeRebaseNeeded |
#1132 | Add in basic support to read structs from parquet |
#1121 | Shuffle/better error handling |
#1134 | Support saveAsTable for writing orc and parquet |
#1124 | Add shim layers for GpuWindowInPandasExec. |
#1131 | Add in some basic support for Structs |
#1127 | Add in basic support for reading lists from parquet |
#1129 | Fix resource leaks with new shuffle optimization |
#1116 | Optimize normal shuffle by coalescing smaller batches on host |
#1102 | Auto-register UDF extension when main plugin is set |
#1108 | Remove integration test pipelines on NGCC |
#1123 | Mark Pandas udf over window tests as xfail on databricks until they can be fixed |
#1120 | Add in support for filtering ArrayType |
#1080 | Support for CalendarIntervalType and NullType for ParquetCachedSerializer |
#994 | Packs bounce buffers for highly partitioned shuffles |
#1112 | Remove bad config from pytest setup |
#1107 | closeOnExcept -> withResources in MetaUtils |
#1104 | Support lists to/from the GPU |
#1106 | Improve mechanism for expected exceptions in tests |
#1069 | Accelerate the data transfer between JVM and Python for the plan 'GpuWindowInPandasExec' |
#1099 | Update how we deal with type checking |
#1077 | Improve AQE transitions for shuffle and coalesce batches |
#1097 | Cleanup some instances of excess closure serialization |
#1090 | Fix the integration build |
#1086 | Speed up test performance using pytest-xdist |
#1084 | Avoid issues where more scalars than expected show up in an expression |
#1076 | [FEA] Support Databricks 7.3 LTS Runtime |
#1083 | Revert "Get cudf/spark dependency from the correct .m2 dir" |
#1062 | Get cudf/spark dependency from the correct .m2 dir |
#1078 | Another round of fixes for mapping of DataType to DType |
#1066 | More fixes for conversion to ColumnarBatch |
#1029 | BenchmarkRunner should produce JSON summary file even when queries fail |
#1055 | Fix build warnings |
#1064 | Use array instead of List for from(Table, DataType) |
#1057 | Fix empty table broadcast requiring a GPU on driver node |
#1047 | Sanity checks for cudf jar mismatch |
#1044 | Accelerated row to columnar and columnar to row transitions |
#1056 | Add query number to Spark app name when running benchmarks |
#1054 | Log total RMM allocated on GPU OOM |
#1053 | Remove isGpuBroadcastNestedLoopJoin from shims |
#1052 | Allow for GPUCoalesceBatch to deal with Map |
#1051 | Add simple retry for URM dependencies [skip ci] |
#1046 | Fix broken links |
#1017 | Log whether PTDS is enabled |
#1040 | Update to cudf 0.17-SNAPSHOT and fix tests |
#1042 | Fix inconsistencies in AQE support for broadcast joins |
#1037 | Add in support for the SQL functions Least and Greatest |
#1036 | Increase number of retries when waiting for databricks cluster |
#1034 | [BUG] To honor spark.rapids.memory.gpu.pool=NONE |
#854 | Arbitrary function call in UDF |
#1028 | Update to cudf-0.16 |
#1023 | Add --gc-between-run flag for TPC* benchmarks. |
#1001 | ColumnarBatch to CachedBatch and back |
#990 | Parquet coalesce file reader for local filesystems |
#1014 | Add --append-dat flag for TPC-DS benchmark |
#991 | Updated GCP Dataproc Mortgage-ETL-GPU.ipynb |
#886 | Spark BinaryType and cast to BinaryType |
#1016 | Change Hash Aggregate to allow pass-through on MapType |
#984 | Add support for MapType in selected operators |
#1012 | Update for new position parameter in Spark 3.1.0 RegExpReplace |
#995 | Add shim for EMR 3.0.1 and EMR 3.0.1-SNAPSHOT |
#998 | Update benchmark automation script |
#1000 | Always use RAPIDS shuffle when running TPCH and Mortgage tests |
#981 | Change databricks build to dynamically create a cluster |
#986 | Fix missing dataSize metric when using RAPIDS shuffle |
#914 | Write InternalRow to CachedBatch |
#934 | Iterator to make it easier to work with a window of blocks in the RAPIDS shuffle |
#992 | Skip post-clean if aborted before the image build stage in pre-merge [skip ci] |
#988 | Change in Spark caused the 3.1.0 CI to fail |
#983 | clean jenkins file for premerge on NGCC |
#964 | Refactor TPC benchmarks to reduce duplicate code |
#978 | Enable scalastyle checks for udf-compiler module |
#949 | Fix GpuWindowExec to work with a CPU SortExec |
#973 | Stop reporting totalTime metric for GpuShuffleExchangeExec |
#968 | XFail pos_explode tests until final fix can be put in |
#970 | Add legacy config to clear active Spark 3.1.0 session in tests |
#918 | Benchmark runner script |
#915 | Add option to control number of partitions when converting from CSV to Parquet |
#944 | Fix some issues with non-determinism |
#935 | Add in support/tests for a window count on a column |
#940 | Fix closeOnExcept suppressed exception handling |
#942 | fix github action env setup [skip ci] |
#933 | Update first/last tests to avoid non-determinism and ordering differences |
#931 | Fix checking for nullable columns in window range query |
#924 | Benchmark guide update for command-line interface / spark-submit |
#926 | Move pandas_udf functions into the tests functions |
#929 | Pick a default tableId to use that is non 0 so that flatbuffers allow… |
#928 | Fix RapidsBufferStore NPE when no spillable buffers are available |
#820 | Benchmarking guide |
#859 | Compare partitioned files in order |
#916 | create new sparkContext explicitly in CPU notebook |
#917 | create new SparkContext in GPU notebook explicitly. |
#919 | Add label benchmark to performance subsection in changelog |
#850 | Add in basic support for lead/lag |
#843 | [REVIEW] Cache plugin to handle reading CachedBatch to an InternalRow |
#904 | Add command-line argument for benchmark result filename |
#909 | GCP preview version image name update |
#903 | update getting-started-gcp.md with new component list |
#900 | Turn off CollectLimitExec replacement by default |
#907 | remove configs from databricks that shouldn't be used by default |
#893 | Fix rounding error when casting timestamp to string for timestamps before 1970 |
#899 | Mark reduction corner case tests as xfail on databricks until they can be fixed |
#894 | Replace whole-buffer slicing with direct refcounting |
#891 | Add config to dump heap on GPU OOM |
#890 | Clean up CoalesceBatch to use withResource |
#892 | Only manifest the current batch in cached block shuffle read iterator |
#871 | Add support for using the arena allocator |
#889 | Fix crash on scalar only orderby |
#879 | Update SpillableColumnarBatch to remove buffer from catalog on close |
#888 | Shrink detect scope to compile only [skip ci] |
#885 | [BUG] fix IT dockerfile arguments [skip ci] |
#883 | [BUG] fix IT dockerfile args ordering [skip ci] |
#875 | Fix the inconsistency for spark.rapids.sql.format.parquet.multiThreadedRead in RapidsConf.scala |
#862 | Migrate nightly&integration pipelines to blossom [skip ci] |
#872 | Ensure that receive-side batches use GpuColumnVectorFromBuffer to avoid |
#833 | Add nvcomp LZ4 codec support |
#870 | Cleaned up tests and documentation for csv timestamp parsing |
#823 | Add command-line interface for TPC-* for use with spark-submit |
#856 | Move GpuWindowInPandasExec in shims layers |
#756 | Add stream-time metric |
#832 | Skip pandas tests if pandas cannot be found |
#841 | Fix a hanging issue when processing empty data. |
#840 | [REVIEW] Fixed failing cache tests |
#848 | Update task memory and disk spill metrics when buffer store spills |
#851 | Use contiguous table when deserializing columnar batch |
#857 | fix pvc scheduling issue |
#853 | Remove nodeAffinity from premerge pipeline |
#796 | Record spark plan SQL metrics to JSON when running benchmarks |
#781 | Add AQE unit tests |
#824 | Skip cudf_udf test by default |
#839 | First/Last reduction and cleanup of agg APIs |
#827 | Add Spark 3.0 EMR Shim layer |
#816 | [BUG] fix nightly is timing out |
#782 | Benchmark utility to perform diff of output from benchmark runs, allowing for precision differences |
#813 | Revert "Enable tests in udf_cudf_test.py" |
#788 | [FEA] Persist workspace data on PVC for premerge |
#805 | [FEA] nightly build trigger both IT on spark 300 and 301 |
#797 | Allow host spill store to fit a buffer larger than configured max size |
#807 | Deploy integration-tests javadoc and sources |
#777 | Enable tests in udf_cudf_test.py |
#790 | CI: Update cudf python to 0.16 nightly |
#772 | Add support for empty array construction. |
#783 | Improved GpuArrowEvalPythonExec |
#771 | Various improvements to benchmarks |
#763 | [REVIEW] Allow CoalesceBatch to spill data that is not in active use |
#727 | Update cudf dependency to 0.16-SNAPSHOT |
#726 | parquet writer support for TIMESTAMP_MILLIS |
#674 | Unit test for GPU exchange re-use with AQE |
#723 | Update code coverage to find source files in new places |
#766 | Update the integration Dockerfile to reduce the image size |
#762 | Fixing conflicts in branch-0.3 |
#738 | [auto-merge] branch-0.2 to branch-0.3 - resolve conflict |
#722 | Initial code changes to support spilling outside of shuffle |
#693 | Update jenkins files for 0.3 |
#692 | Merge shims dependency to spark-3.0.1 into branch-0.3 |
#690 | Update the version to 0.3.0-SNAPSHOT |
#696 | [FEA] run integration tests against SPARK-3.0.1 |
#455 | [FEA] Support UCX shuffle with optimized AQE |
#510 | [FEA] Investigate libcudf features needed to support struct schema pruning during loads |
#541 | [FEA] Scala UDF:Support for null Value operands |
#542 | [FEA] Scala UDF: Support for Date and Time |
#499 | [FEA] disable any kind of warnings about ExecutedCommandExec not being on the GPU |
#540 | [FEA] Scala UDF: Support for String replaceFirst() |
#340 | [FEA] widen the rendered Jekyll pages |
#602 | [FEA] don't release with any -SNAPSHOT dependencies |
#579 | [FEA] Auto-merge between branches |
#515 | [FEA] Write tests for AQE skewed join optimization |
#452 | [FEA] Update HashSortOptimizerSuite to work with AQE |
#454 | [FEA] Update GpuCoalesceBatchesSuite to work with AQE enabled |
#354 | [FEA] Spark 3.1 FileSourceScanExec adds parameter optionalNumCoalescedBuckets |
#566 | [FEA] Add support for StringSplit with an array index. |
#524 | [FEA] Add GPU specific metrics to GpuFileSourceScanExec |
#494 | [FEA] Add some AQE-specific tests to the PySpark test suite |
#146 | [FEA] Python tests should support running with Adaptive Query Execution enabled |
#465 | [FEA] Audit: Update script to audit multiple versions of Spark |
#488 | [FEA] Ability to limit total GPU memory used |
#70 | [FEA] Support StringSplit |
#403 | [FEA] Add in support for GetArrayItem |
#493 | [FEA] Implement shuffle optimization when AQE is enabled |
#500 | [FEA] Add maven profiles for testing with AQE on or off |
#471 | [FEA] create a formal process for updating the github-pages branch |
#233 | [FEA] Audit DataWritingCommandExec |
#240 | [FEA] Audit Api validation script follow on - Optimize StringToTypeTag |
#388 | [FEA] Audit WindowExec |
#425 | [FEA] Add tests for configs in BatchScan Readers |
#453 | [FEA] Update HashAggregatesSuite to work with AQE |
#184 | [FEA] Enable NoScalaDoc scalastyle rule |
#438 | [FEA] Enable StringLPad |
#232 | [FEA] Audit SortExec |
#236 | [FEA] Audit ShuffleExchangeExec |
#355 | [FEA] Support Multiple Spark versions in the same jar |
#385 | [FEA] Support RangeExec on the GPU |
#317 | [FEA] Write test wrapper to run SQL queries via pyspark |
#235 | [FEA] Audit BroadcastExchangeExec |
#234 | [FEA] Audit BatchScanExec |
#238 | [FEA] Audit ShuffledHashJoinExec |
#237 | [FEA] Audit BroadcastHashJoinExec |
#316 | [FEA] Add some basic Dataframe tests for CoalesceExec |
#145 | [FEA] Scala tests should support running with Adaptive Query Execution enabled |
#231 | [FEA] Audit ProjectExec |
#229 | [FEA] Audit FileSourceScanExec |
#326 | [DISCUSS] Shuffle read-side error handling |
#601 | [FEA] Optimize unnecessary sorts when replacing SortAggregate |
#333 | [FEA] Better handling of reading lots of small Parquet files |
#511 | [FEA] Connect shuffle table compression to shuffle exec metrics |
#15 | [FEA] Multiple threads sharing the same GPU |
#272 | [DOC] Getting started guide for UCX shuffle |
#780 | [BUG] Inner Join dropping data with bucketed Table input |
#569 | [BUG] left_semi_join operation is abnormal and seriously time-consuming |
#744 | [BUG] TPC-DS query 6 now produces incorrect results. |
#718 | [BUG] GpuBroadcastHashJoinExec ArrayIndexOutOfBoundsException |
#698 | [BUG] batch coalesce can fail to appear between columnar shuffle and subsequent columnar operation |
#658 | [BUG] GpuCoalesceBatches collectTime metric can be underreported |
#59 | [BUG] enable tests for string literals in a select |
#486 | [BUG] GpuWindowExec does not implement requiredChildOrdering |
#631 | [BUG] Rows are dropped when AQE is enabled in some cases |
#671 | [BUG] Databricks hash_aggregate_test fails trying to canonicalize a WrappedAggFunction |
#218 | [BUG] Window function COUNT(x) includes null-values, when it shouldn't |
#153 | [BUG] Incorrect output from partial-only hash aggregates with multiple distincts and non-distinct functions |
#656 | [BUG] integration tests produce hive metadata files |
#607 | [BUG] Fix misleading "cannot run on GPU" warnings when AQE is enabled |
#630 | [BUG] GpuCustomShuffleReader metrics always show zero rows/batches output |
#643 | [BUG] race condition while registering a buffer and spilling at the same time |
#606 | [BUG] Multiple scans for same data source with TPC-DS query59 with delta format |
#626 | [BUG] parquet_test showing leaked memory buffer |
#155 | [BUG] Incorrect output from averages with filters in partial only mode |
#277 | [BUG] HashAggregateSuite failure when AQE is enabled |
#276 | [BUG] GpuCoalesceBatchSuite failure when AQE is enabled |
#598 | [BUG] Non-deterministic output from MapOutputTracker.getStatistics() with AQE on GPU |
#192 | [BUG] test_read_merge_schema fails on Databricks |
#341 | [BUG] Document compression formats for readers/writers |
#587 | [BUG] Spark3.1 changed FileScan which means our GpuScans need to be added to shim layer |
#362 | [BUG] Implement getReaderForRange in the RapidsShuffleManager |
#528 | [BUG] HashAggregateSuite "Avg Distinct with filter" no longer valid when testing against Spark 3.1.0 |
#416 | [BUG] Fix Spark 3.1.0 integration tests |
#556 | [BUG] NPE when removing shuffle |
#553 | [BUG] GpuColumnVector build warnings from raw type access |
#492 | [BUG] Re-enable AQE integration tests |
#275 | [BUG] TpchLike query 2 fails when AQE is enabled |
#508 | [BUG] GpuUnion publishes metrics on the UI that are all 0 |
#269 | Needed to add --conf spark.driver.extraClassPath= |
#473 | [BUG] PartMerge:countDistinct:sum fails sporadically |
#531 | [BUG] Temporary RMM workaround needs to be removed |
#532 | [BUG] NPE when enabling shuffle manager |
#525 | [BUG] GpuFilterExec reports incorrect nullability of output in some cases |
#483 | [BUG] Multiple scans for the same parquet data source |
#382 | [BUG] Spark3.1 StringFallbackSuite regexp_replace null cpu fall back test fails. |
#489 | [FEA] Fix Spark 3.1 GpuHashJoin since it now requires CodegenSupport |
#441 | [BUG] test_broadcast_nested_loop_join_special_case fails on databricks |
#347 | [BUG] Failed to read Parquet file generated by GPU-enabled Spark. |
#433 | InSet operator produces an error for Strings |
#144 | [BUG] spark.sql.legacy.parquet.datetimeRebaseModeInWrite is ignored |
#323 | [BUG] GpuBroadcastNestedLoopJoinExec can fail if there are no columns |
#356 | [BUG] Integration cache test for BroadcastNestedLoopJoin failure |
#280 | [BUG] Full Outer Join does not work on nullable keys |
#149 | [BUG] Spark driver fails to load native libs when running on node without CUDA |
#826 | Fix link to cudf-0.15-cuda11.jar |
#815 | Update documentation for Scala UDFs in 0.2 since you need two things |
#802 | Update 0.2 CHANGELOG |
#793 | Update Jenkins scripts for release |
#798 | Fix shims provider override config not being seen by executors |
#785 | Make shuffle run on CPU if we do a join where we read from bucketed table |
#765 | Add config to override shims provider class |
#759 | Add CHANGELOG for release 0.2 |
#758 | Skip the udf test that fails periodically. |
#752 | Fix snapshot plugin jar version in docs |
#751 | Correct the channel for cudf installation |
#754 | Filter nulls from joins where possible to improve performance |
#732 | Add a timeout for RapidsShuffleIterator to prevent jobs to hang infin… |
#637 | Documentation changes for 0.2 release |
#747 | Disable udf tests that fail periodically |
#745 | Revert Null Join Filter |
#741 | Fix issue with parquet partitioned reads |
#733 | Remove GPU Types from github |
#720 | Stop removing GpuCoalesceBatches from non-AQE queries when AQE is enabled |
#729 | Fix collect time metric in CoalesceBatches |
#640 | Support running Pandas UDFs on GPUs in Python processes. |
#721 | Add some more checks to databricks build scripts |
#714 | Move spark 3.0.1-shims out of snapshot-shims |
#711 | fix blossom checkout repo |
#709 | [BUG] fix unexpected indentation issue in blossom yml |
#642 | Init workflow for blossom-ci |
#705 | Enable configuration check for cast string to timestamp |
#702 | Update slack channel for Jenkins builds |
#701 | fix checkout-ref for automerge |
#695 | Fix spark-3.0.1 shim to be released |
#668 | refactor automerge to support merge for protected branch |
#687 | Include the UDF compiler in the dist jar |
#689 | Change shims dependency to spark-3.0.1 |
#677 | Use multi-threaded parquet read with small files |
#638 | Add Parquet-based cache serializer |
#613 | Enable UCX + AQE |
#684 | Enable test for literal string values in a select |
#686 | Remove sorts when replacing sort aggregate if possible |
#675 | Added TimeAdd |
#645 | [window] Add GpuWindowExec requiredChildOrdering |
#676 | fixUpJoinConsistency rule now works when AQE is enabled |
#683 | Fix issues with canonicalization of WrappedAggFunction |
#682 | Fix path to start-slave.sh script in docs |
#673 | Increase build timeouts on nightly and premerge builds |
#648 | add signoff-check use github actions |
#593 | Add support for isNaN and datetime related instructions in UDF compiler |
#666 | [window] Disable GPU for COUNT(exp) queries |
#655 | Implement AQE unit test for InsertAdaptiveSparkPlan |
#614 | Fix for aggregation with multiple distinct and non distinct functions |
#657 | Fix verify build after integration tests are run |
#660 | Add in neverReplaceExec and several rules for it |
#639 | BooleanType test shouldn't xfail |
#652 | Mark UVM config as internal until supported |
#653 | Move to the cudf-0.15 release |
#647 | Improve warnings about AQE nodes not supported on GPU |
#646 | Stop reporting zero metrics for GpuCustomShuffleReader |
#644 | Small fix for race in catalog where a buffer could get spilled while … |
#623 | Fix issues with canonicalization |
#599 | [FEA] changelog generator |
#563 | cudf and spark version info in artifacts |
#633 | Fix leak if RebaseHelper throws during Parquet read |
#632 | Copy function isSearchableType from Spark because signature changed in 3.0.1 |
#583 | Add udf compiler unit tests |
#617 | Documentation updates for branch 0.2 |
#616 | Add config to reserve GPU memory |
#612 | [REVIEW] Fix incorrect output from averages with filters in partial only mode |
#609 | fix minor issues with instructions for building ucx |
#611 | Added in profile to enable shims for SNAPSHOT releases |
#595 | Parquet small file reading optimization |
#582 | fix #579 Auto-merge between branches |
#536 | Add test for skewed join optimization when AQE is enabled |
#603 | Fix data size metric always 0 when using RAPIDS shuffle |
#600 | Fix calculation of string data for compressed batches |
#597 | Remove the xfail for parquet test_read_merge_schema on Databricks |
#591 | Add ucx license in NOTICE-binary |
#596 | Add Spark 3.0.2 to Shim layer |
#594 | Filter nulls from joins where possible to improve performance. |
#590 | Move GpuParquetScan/GpuOrcScan into Shim |
#588 | xfail the tpch spark 3.1.0 tests that fail |
#572 | Update buffer store to return compressed batches directly, add compression NVTX ranges |
#558 | Fix unit tests when AQE is enabled |
#580 | xfail the Spark 3.1.0 integration tests that fail |
#565 | Minor improvements to TPC-DS benchmarking code |
#567 | Explicitly disable AQE in one test |
#571 | Fix Databricks shim layer for GpuFileSourceScanExec and GpuBroadcastExchangeExec |
#564 | Add GPU decode time metric to scans |
#562 | getCatalog can be called from the driver, and can return null |
#555 | Fix build warnings for ColumnViewAccess |
#560 | Fix databricks build for AQE support |
#557 | Fix tests failing on Spark 3.1 |
#547 | Add GPU metrics to GpuFileSourceScanExec |
#462 | Implement optimized AQE support so that exchanges run on GPU where possible |
#550 | Document Parquet and ORC compression support |
#539 | Update script to audit multiple Spark versions |
#543 | Add metrics to GpuUnion operator |
#549 | Move spark shim properties to top level pom |
#497 | Add UDF compiler implementations |
#487 | Add framework for batch compression of shuffle partitions |
#544 | Add in driverExtraClassPath for standalone mode docs |
#546 | Fix Spark 3.1.0 shim build error in GpuHashJoin |
#537 | Use fresh SparkSession when capturing to avoid late capture of previous query |
#538 | Revert "Temporary workaround for RMM initial pool size bug (#530)" |
#517 | Add config to limit maximum RMM pool size |
#527 | Add support for split and getArrayIndex |
#534 | Fixes bugs around GpuShuffleEnv initialization |
#529 | [BUG] Degenerate table metas were not getting copied to the heap |
#530 | Temporary workaround for RMM initial pool size bug |
#526 | Fix bug with nullability reporting in GpuFilterExec |
#521 | Fix typo with databricks shim classname SparkShimServiceProvider |
#522 | Use SQLConf instead of SparkConf when looking up SQL configs |
#518 | Fix init order issue in GpuShuffleEnv when RAPIDS shuffle configured |
#514 | Added clarification of RegExpReplace, DateDiff, made descriptive text consistent |
#506 | Add in basic support for running tpcds like queries |
#504 | Add ability to ignore tests depending on spark shim version |
#503 | Remove unused async buffer spill support |
#501 | disable codegen in 3.1 shim for hash join |
#466 | Optimize and fix Api validation script |
#481 | Codeowners |
#439 | Check a PR has been committed using git signoff |
#319 | Update partitioning logic in ShuffledBatchRDD |
#491 | Temporarily ignore AQE integration tests |
#490 | Fix Spark 3.1.0 build for HashJoin changes |
#482 | Prevent bad practice in python tests |
#485 | Show plan in assertion message if test fails |
#480 | Fix link from README to getting-started.md |
#448 | Preliminary support for keeping broadcast exchanges on GPU when AQE is enabled |
#478 | Fall back to CPU for binary as string in parquet |
#477 | Fix special case joins in broadcast nested loop join |
#469 | Update HashAggregateSuite to work with AQE |
#475 | Udf compiler pom followup |
#434 | Add UDF compiler skeleton |
#474 | Re-enable noscaladoc check |
#461 | Fix comments style to pass scala style check |
#468 | fix broken link |
#456 | Add closeOnExcept to clean up code that closes resources only on exceptions |
#464 | Turn off noscaladoc rule until codebase is fixed |
#449 | Enforce NoScalaDoc rule in scalastyle checks |
#450 | Enable scalastyle for shuffle plugin |
#451 | Databricks remove unneeded files and fix build to not fail on rm when file missing |
#442 | Shim layer support for Spark 3.0.0 Databricks |
#447 | Add scalastyle plugin to shim module |
#426 | Update BufferMeta to support multiple codec buffers per table |
#440 | Run mortgage test both with AQE on and off |
#445 | Added in StringRPad and StringLPad |
#422 | Documentation updates |
#437 | Fix bug with InSet and Strings |
#435 | Add in checks for Parquet LEGACY date/time rebase |
#432 | Fix batch use-after-close in partitioning, shuffle env init |
#423 | Fix duplicate includes in assembly jar |
#418 | CI Add unit tests running for Spark 3.0.1 |
#421 | Make it easier to run TPCxBB benchmarks from spark shell |
#413 | Fix download link |
#414 | Shim Layer to support multiple Spark versions |
#406 | Update cast handling to deal with new libcudf casting limitations |
#405 | Change slave->worker |
#395 | Databricks doc updates |
#401 | Extended the FAQ |
#398 | Add tests for GpuPartition |
#352 | Change spark tgz package name |
#397 | Fix small bug in ShuffleBufferCatalog.hasActiveShuffle |
#286 | [REVIEW] Updated join tests for cache |
#393 | Contributor license agreement |
#389 | Added in support for RangeExec |
#390 | Ucx getting started |
#391 | Hide slack channel in Jenkins scripts |
#387 | Remove the term whitelist |
#365 | [REVIEW] Timesub tests |
#383 | Test utility to compare SQL query results between CPU and GPU |
#380 | Fix databricks notebook link |
#378 | Added in FAQ and fixed spelling |
#377 | Update heading in configs.md |
#373 | Modifying branch name to conform with rapidsai branch name change |
#376 | Add our session extension correctly if there are other extensions configured |
#374 | Fix rat issue for notebooks |
#364 | Update Databricks patch for changes to GpuSortMergeJoin |
#371 | fix typo and use regional bucket per GCP's update |
#359 | Karthik changes |
#353 | Fix broadcast nested loop join for the no column case |
#313 | Additional tests for broadcast hash join |
#342 | Implement build-side rules for shuffle hash join |
#349 | Updated join code to treat null equality properly |
#335 | Integration tests on spark 3.0.1-SNAPSHOT & 3.1.0-SNAPSHOT |
#346 | Update the Title Header for Fine Tuning |
#344 | Fix small typo in readme |
#331 | Adds iterator and client unit tests, and prepares for more fetch failure handling |
#337 | Fix Scala compile phase to allow Java classes referencing Scala classes |
#332 | Match GPU overwritten functions with SQL functions from FunctionRegistry |
#339 | Fix databricks build |
#338 | Move GpuPartitioning to a separate file |
#310 | Update release Jenkinsfile for Databricks |
#330 | Hide private info in Jenkins scripts |
#324 | Add in basic support for GpuCartesianProductExec |
#328 | Enable slack notification for Databricks build |
#321 | update databricks patch for GpuBroadcastNestedLoopJoinExec |
#322 | Add oss.sonatype.org to download the cudf jar |
#320 | Don't mount passwd/group to the container |
#258 | Enable running TPCH tests with AQE enabled |
#318 | Build docker image with Dockerfile |
#309 | Update databricks patch to latest changes |
#312 | Trigger branch-0.2 integration test |
#307 | [Jenkins] Update the release script and Jenkinsfile |
#304 | [DOC][Minor] Fix typo in spark config name. |
#303 | Update compatibility doc for -0.0 issues |
#301 | Add info about branches in README.md |
#296 | Added in basic support for broadcast nested loop join |
#297 | Databricks CI improvements and support runtime env parameter to xfail certain tests |
#292 | Move artifacts version in version-def.sh |
#254 | Cleanup QA tests |
#289 | Clean up GpuCollectLimitMeta and add in metrics |
#287 | Add in support for right join and fix issues with build right |
#273 | Added releases to the README.md |
#285 | modify run_pyspark_from_build.sh to be bash 3 friendly |
#281 | Add in support for Full Outer Join on non-null keys |
#274 | Add RapidsDiskStore tests |
#259 | Add RapidsHostMemoryStore tests |
#282 | Update Databricks patch for 0.2 branch |
#261 | Add conditional xfail test for DISTINCT aggregates with NaN |
#263 | More time ops |
#256 | Remove special cases for contains, startsWith, and endsWith |
#253 | Remove GpuAttributeReference and GpuSortOrder |
#271 | Update the versions for 0.2.0 properly for the databricks build |
#162 | Integration tests for corner cases in window functions. |
#264 | Add a local mvn repo for nightly pipeline |
#262 | Refer to branch-0.2 |
#255 | Revert change to make dependencies of shaded jar optional |
#257 | Fix link to RAPIDS cudf in index.md |
#252 | Update to 0.2.0-SNAPSHOT and cudf-0.15-SNAPSHOT |
#74 | [FEA] Support ToUnixTimestamp |
#21 | [FEA] NormalizeNansAndZeros |
#105 | [FEA] integration tests for equi-joins |
#116 | [BUG] calling replace with a NULL throws an exception |
#168 | [BUG] GpuUnitTests Date tests leak column vectors |
#209 | [BUG] Developers section in pom need to be updated |
#204 | [BUG] Code coverage docs are out of date |
#154 | [BUG] Incorrect output from partial-only averages with nulls |
#61 | [BUG] Cannot disable Parquet, ORC, CSV reading when using FileSourceScanExec |
#249 | Compatability -> Compatibility |
#247 | Add index.md for default doc page, fix table formatting for configs |
#241 | Let default branch to master per the release rule |
#177 | Fixed leaks in unit test and use ColumnarBatch for testing |
#243 | Jenkins file for Databricks release |
#225 | Make internal project dependencies optional for shaded artifact |
#242 | Add site pages |
#221 | Databricks Build Support |
#215 | Remove CudfColumnVector |
#213 | Add RapidsDeviceMemoryStore tests |
#214 | [REVIEW] Test failure to pass Attribute as GpuAttribute |
#211 | Add project leads to pom developer list |
#210 | Updated coverage docs |
#195 | Support public release for plugin jar |
#208 | Remove unneeded comment from pom.xml |
#191 | WindowExec handle different spark distributions |
#181 | Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized |
#196 | Update Spark dependency to the released 3.0.0 artifacts |
#206 | Change groupID to 'com.nvidia' in IT scripts |
#202 | Fixed issue for contains when searching for an empty string |
#201 | Fix name of scan |
#200 | Fix issue with GpuAttributeReference not overriding references |
#197 | Fix metrics for writes |
#186 | Fixed issue with nullability on concat |
#193 | Add RapidsBufferCatalog tests |
#188 | rebrand to com.nvidia instead of ai.rapids |
#189 | Handle AggregateExpression having resultIds parameter instead of a single resultId |
#190 | FileSourceScanExec can have logicalRelation parameter on some distributions |
#185 | Update type of parameter of GpuExpandExec to make it consistent |
#172 | Merge qa test to integration test |
#180 | Add MetaUtils unit tests |
#171 | Cleanup scaladoc warnings about missing links |
#176 | Updated join tests to cover more data. |
#169 | Remove dependency on shaded Spark artifact |
#174 | Added in fallback tests |
#165 | Move input metadata tests to pyspark |
#173 | Fix setting local mode for tests |
#160 | Integration tests for normalizing NaN/zeroes. |
#163 | Ignore the order locally for repartition tests |
#157 | Add partial and final only hash aggregate tests and fix nulls corner case for Average |
#159 | Add integration tests for joins |
#158 | Orc merge schema fallback and FileScan format configs |
#164 | Fix compiler warnings |
#152 | Moved cudf to 0.14 for CI |
#151 | Switch CICD pipelines to Github |
Changelog of older releases can be found at docs/archives