Releases: NVIDIA/DALI

DALI v0.22.0

09 Jun 08:57

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • DALI now supports CUDA 11:
    • DALI builds for CUDA 11 are now available.
    • CUDA 9 support has been deprecated.
    • DALI 0.22.0 is the final release that provides a CUDA 9 build.
  • Support is now available for the Ampere Hardware JPEG decoder.
  • The following new operators are now available:
    • NumpyReader, which allows you to read standard .npy (NumPy) files (#1858).
    • CoordFlip for CPU and GPU (#1894, #1895).
  • Readers can be set to read files directly instead of using mmap, which improves performance on network filesystems (#1909).
  • DALI can be built as a CMake subproject (#1924).
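
The difference between reading a file directly and going through mmap can be illustrated with Python's standard library. This is only a sketch of the general technique, not DALI's reader code; the helper names `read_direct` and `read_mmap` are hypothetical.

```python
import mmap
import os
import tempfile


def read_direct(path):
    """Copy the whole file into process memory with an ordinary read() call."""
    with open(path, "rb") as f:
        return f.read()


def read_mmap(path):
    """Map the file into the address space; pages are loaded when touched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Slice to copy the bytes out so the result outlives the mapping.
            return view[:]


# Write a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sample payload" * 4)
    sample_path = tmp.name

direct_bytes = read_direct(sample_path)
mapped_bytes = read_mmap(sample_path)
os.remove(sample_path)
```

Both helpers return the same bytes; they differ only in how the data reaches the process. With mmap, each page is fetched on first access, which can be a poor fit when every fetch goes over the network.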

Bug fixes

  • Fix TL1_tensorflow-dali_test (#1869)
  • Hotfix of external_source.py (#1878)
  • Build fix for aarch64 (incorrect cmake dependency) (#1883)
  • Fix TL1_ssd_training test by freezing apex version (#1898)
  • Fix support for dynamic per-sample shape in Warp operators (#1911)
  • Remove Optical flow test bug (#1902)
  • Fix jitter operator illegal memory access (#1914)
  • Fix setup_packages.py after pip update to 20.1 version (#1916)
  • Fix TL1_python-nvjpeg_test test dependency (#1926)
  • L1 test fix for Xavier (#1936)
  • Fix tensorflow_dataset test to run on any power of 2 number of GPUs (#1935)
  • Fix a race condition in ExternalSourceTest test (#1943)

Improvements

  • Add support for array and cuda_array interface for DALI tensor (#1857)
  • Add collapse_dim and collapse_dims for TensorListShape. (#1862)
  • Add support for TensorFlow 2.2.0rc2 (#1860)
  • Add ExternalSource to "C API" (#1865)
  • Numpy reader (#1858)
  • Add TensorGPU and TensorListGPU constructors based on CUDA array interface (#1868)
  • Bump up OpenCV version to 4.3.0, libturbo-jpeg to 2.0.4, libtiff to 4.1.0, FFmpeg to 4.2.2 (#1783)
  • Add "no exec check" to SmallVector to prevent warnings in host-only functions. (#1870)
  • Allow for a separate dali_tf_plugin pip wheel step (#1856)
  • QA tests: Fix nvidia-dali-tf-plugin to uninstall weekly and nightly packages (#1877)
  • Add a make install target for installing DALI on the system where it is built (#1854)
  • Allow RandomBBoxCrop thresholds to refer to relative overlap alternatively to IoU (#1874)
  • Add a link to release notes in the docs (#1881)
  • Operator diagnostics mechanism (#1880)
  • Reductions: position-dependent preprocessing, kernels for unhandled edge cases (#1884)
  • Update Horovod in Tensorflow test (#1887)
  • Add the ability to strip debug symbols from the DALI whl binary (#1897)
  • Extend conda testing (#1784)
  • Copy out core* files if the test_body fails (#1890)
  • Make volume return 1 for 0-dim shape. (#1906)
  • Update DALI PyTorch RN50 example to the latest AMP version (#1888)
  • Add a specialized TF dataset for conda (#1910)
  • Deserialize pipeline in python API (#1912)
  • Add CoordFlip CPU operator (#1894)
  • Restore an ability to use direct read of files instead of mmap (#1909)
  • Use only ImportError in setup_packages (#1922)
  • Collect exit code from test_body (#1923)
  • Coordinate Flip GPU operator (#1895)
  • DALI as a git submodule (#1924)
  • Add Erase GPU Kernel (#1903)
  • C API ExternalSource for GPU input (#1892)
  • Fix warning condition in ExternalSource (#1934)
  • Reduce GPU - kernel frontend (#1882)
  • Add checking alignment argument for 0 in the pad operator (#1937)
  • Move from http://xiph.org to GitHub for libflac, libvorbis and libogg (#1938)
  • C API function: inherit parameters from serialized pipeline (#1932)
  • Use LinearTransformation kernel in ColorTwist GPU Op (#1918)
  • Adjust test sizes for Erase GPU Kernel (#1939)
  • Use user stream by default in copy_to_external/feed_ndarray (#1921)
  • Move to TensorFlow 2.2.0 from 2.2.0-RC2 (#1946)
  • Add support for random_shuffle argument in test_RN50_data_pipeline (#1945)
  • Proper DALI initialization in process & daliInitialize function (#1929)
  • Update clang version to 8.0.1 in deps image (#1949)
  • Add support for nvjpeg HW decoder, including rework to accommodate different decoding methods in one batch
  • Fix "hw_decoder_load" handling for slice/cropImageDecoder for nvJPEG
  • Move HW decoding to separate stream
  • Fix linter in nvjpeg HW decoder
  • Deprecate CUDA 9
  • Add CUDA 11 to the installation guide and build.sh
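
The RandomBBoxCrop change above (#1874) distinguishes IoU from relative overlap. The sketch below shows how the two measures can differ for the same crop window and bounding box. It is plain Python, not DALI code, and the definition of relative overlap used here (the fraction of the bounding box area that falls inside the crop) is an assumption; boxes are given as hypothetical (left, top, right, bottom) tuples.

```python
def intersection_area(a, b):
    """Area shared by two boxes given as (left, top, right, bottom)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def area(box):
    """Area of a single box."""
    return (box[2] - box[0]) * (box[3] - box[1])


def iou(crop, bbox):
    """Intersection over union of the crop window and the bounding box."""
    inter = intersection_area(crop, bbox)
    union = area(crop) + area(bbox) - inter
    return inter / union if union > 0 else 0.0


def relative_overlap(crop, bbox):
    """Assumed definition: fraction of the bounding box covered by the crop."""
    bbox_area = area(bbox)
    return intersection_area(crop, bbox) / bbox_area if bbox_area > 0 else 0.0


crop = (0.0, 0.0, 0.5, 0.5)
bbox = (0.25, 0.25, 0.75, 0.75)
iou_value = iou(crop, bbox)                   # about 0.143
overlap_value = relative_overlap(crop, bbox)  # 0.25
```

For a fixed threshold, the two measures accept different crops: IoU also penalizes crop area that lies outside the box, while relative overlap, as defined here, only asks how much of the box survives.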

Breaking API changes

None

Deprecated feature

  • CUDA 9 support is deprecated. DALI 0.22.0 is the last release that provides a CUDA 9 build.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version for which DALI ships no prebuilt plugin binary, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.22.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.22.0
or for CUDA 11:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/11.0 nvidia-dali==0.22.0

Or use direct download links (CUDA 9.0):
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.22.0-1313462-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.22.0-1313462-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.22.0-1313462-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.22.0-1313462-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.22.0.tar.gz

Or use direct download links (CUDA 10.0):

https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.22.0-1313464-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.22.0-1313464-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.22.0-1313464-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.22.0-1313464-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.22.0.tar.gz

Or use direct download links (CUDA 11.0):

https://developer.download.nvidia.com/compute/redist/cuda/11.0/nvidia-dali/nvidia_dali-0.22.0-1313465-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/11.0/nvidia-dali/nvidia_dali-0.22.0-1313465-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/11.0/nvidia-dali/nvidia_dali-0.22.0-1313465-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/11.0/nvidia-dali/nvidia_dali-0.22.0-1313465-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/11.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.22.0.tar.gz

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v0.21.0

28 Apr 16:39
Pre-release

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Introduced experimental Functional API (#1598):
    • Operators can be used directly with a single call, no need to create an instance with a constructor
    • DALI pipeline can be used in Context Manager
    • There is no need to subclass Pipeline
  • Simplified usage of ExternalSource (#1598, #1832) - it accepts callbacks or generators as a parameter.
  • Added Python 3.8 build and support (#1782)
  • Allowed seed to be set for serialized pipeline (#1844)
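
To illustrate what accepting either a callback or a generator as a data source can look like, here is a minimal sketch in plain Python. It is not DALI's implementation: `iterate_source` is a hypothetical helper, and using StopIteration as the callback's end-of-data signal is an assumption made for this example.

```python
import inspect
import itertools


def iterate_source(source):
    """Yield batches from a generator function, a callback, or an iterable."""
    if inspect.isgeneratorfunction(source):
        # Generator function: call it once to obtain the generator.
        yield from source()
    elif callable(source):
        # Plain callback: call it repeatedly until it signals the end.
        while True:
            try:
                batch = source()
            except StopIteration:
                return
            yield batch
    else:
        # Already an iterable (list, generator object, ...).
        yield from source


def batches_from_generator():
    for i in range(3):
        yield [i, i + 1]


counter = itertools.count()


def batch_from_callback():
    i = next(counter)
    if i >= 3:
        raise StopIteration
    return [i, i + 1]


from_generator = list(iterate_source(batches_from_generator))
from_callback = list(iterate_source(batch_from_callback))
```

Both sources produce the same three batches here, which is the point of the simplification: the consumer does not need to care which form the user supplied.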

New operators:

  • ToDecibels GPU operator (#1837)
  • One hot encoding CPU operator (#1807)
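
The math behind these two operators is compact enough to sketch in plain Python. This is an illustration of the general techniques only, not the DALI kernels: the function names, parameter names, default values, and the out-of-range behaviour are all assumptions.

```python
import math


def one_hot(label, num_classes, on_value=1.0, off_value=0.0):
    """Encode an integer class label as a vector with a single on_value."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [on_value if i == label else off_value for i in range(num_classes)]


def to_decibels(magnitudes, multiplier=10.0, reference=1.0, cutoff_db=-200.0):
    """Compute multiplier * log10(x / reference), floored at cutoff_db."""
    # Smallest ratio that still maps to cutoff_db; avoids log10(0).
    min_ratio = 10.0 ** (cutoff_db / multiplier)
    return [multiplier * math.log10(max(x / reference, min_ratio))
            for x in magnitudes]


encoded = one_hot(2, 4)
decibels = to_decibels([1.0, 10.0, 100.0, 0.0])
```

A multiplier of 10 suits power values, while 20 is the usual choice for amplitudes.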

Bug fixes

  • Fix positional argument propagation in TF Dataset (#1798)
  • Fix parameter name in data_node._check. (#1816)
  • Fix Transpose bugs - degenerate dims and non-uniform GPU (#1817)
  • Fix sharding.png image link in multigpu example (#1821)
  • Fix collecting vector arguments in rotate_params. (#1841)
  • Fix a leak of the last created DALI pipeline instance (#1845)
  • Remove usage of internal Sphinx _MockImporter method (#1861)
  • Make SSDRandomCrop calculate crop window in double precision (#1848)

Improvements

  • Move RNNT test to Torch specific tests (#1805)
  • Propagate layout in cast operator (#1801)
  • Add proper type info for optional arguments in schema (#1769)
  • Add missing new line for section anchor in rst (#1808)
  • Add missing #include <cstdint> to util and math_util. (#1810)
  • Update file_list argument description in FileReader (#1779)
  • Functional API + improved ExternalSource + improved Pipeline (#1598)
  • GPU reduction kernels part 1 - non-directional batched and global reductions (#1806)
  • Enable NVTX profiling information for CUDA 10 by default (#1793)
  • Make read function provided to FFmpeg return AVERROR_EOF for EOF (#1814)
  • Make DALI buildable for Python 3.8 (#1782)
  • Allow empty arrays in MXNet iterator (#1815)
  • Ignore VS Code settings directory in Git (#1826)
  • Rework setup_packages script (#1820)
  • Add one hot encoding operator (CPU backend) (#1807)
  • New page layout of Supported Operations & "How to verify DALI build" description in compilation tutorial (#1722)
  • Generator support in ExternalSource (#1832)
  • 3d RandomBboxCrop (#1785)
  • Update TF RN50 performance test threshold to make it pass on dgx1v32GB (#1838)
  • ToDecibels GPU kernel (#1836)
  • Add ReduceAllGPU kernel (#1839)
  • Directional reduction CUDA kernels (#1840)
  • Rename CPU reductions; separate reduction functors from kernels. (#1846)
  • ToDecibels GPU operator (#1837)
  • Allow seed to be set for serialized pipeline (#1844)
  • Change StrictVersion to LooseVersion in TensorFlow plugin (#1851)
  • Make reader respect shard_id pipeline argument in tf.data.Dataset with multiple GPUs example (#1850)

Breaking API changes

None

Deprecated feature

  • CUDA 9 support will end in several releases (#1684)

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version for which DALI ships no prebuilt plugin binary, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.21.0
or for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.21.0

Or use direct download links (CUDA 9.0):
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.21.0-1239037-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.21.0-1239037-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.21.0-1239037-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali/nvidia_dali-0.21.0-1239037-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/9.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.21.0.tar.gz

Or use direct download links (CUDA 10.0):

https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.21.0-1239036-cp35-cp35m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.21.0-1239036-cp36-cp36m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.21.0-1239036-cp37-cp37m-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali/nvidia_dali-0.21.0-1239036-cp38-cp38-manylinux1_x86_64.whl
https://developer.download.nvidia.com/compute/redist/cuda/10.0/nvidia-dali-tf-plugin/nvidia-dali-tf-plugin-0.21.0.tar.gz

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v0.20.0

27 Mar 16:56
Pre-release

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

Added operators:

  • Spectrogram for GPU (#1786)
  • MelFilterBank for GPU (#1796)

Other enhancements:

  • Allow align-only behavior in Pad operator by treating shape argument as minimum shape (#1764)
  • Added data_ptr method to Tensor and TensorList (#1773) - it enables __array_interface__ and __cuda_array_interface__ support.
  • Extended shape support in DALI Dataset for TensorFlow (#1723)
  • Documentation improvements: layouts, Python API.
  • Added Gluon iterator plugin (#1683)
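
The role a raw data pointer plays in `__array_interface__` can be shown with a small standard-library sketch. `HostBuffer` and `read_back` are hypothetical stand-ins, not DALI classes; only the dictionary layout (version 3 of the NumPy array interface) is taken from the protocol itself.

```python
import ctypes
import sys


class HostBuffer:
    """Hypothetical stand-in for a CPU tensor that owns a float32 buffer."""

    def __init__(self, values):
        self._storage = (ctypes.c_float * len(values))(*values)

    def data_ptr(self):
        """Address of the first element, as an integer."""
        return ctypes.addressof(self._storage)

    @property
    def __array_interface__(self):
        byte_order = "<" if sys.byteorder == "little" else ">"
        return {
            "shape": (len(self._storage),),
            "typestr": byte_order + "f4",      # 32-bit float, native order
            "data": (self.data_ptr(), False),  # (address, read_only)
            "version": 3,
        }


def read_back(interface):
    """Consume the interface the way an array library would: via the pointer."""
    address, _read_only = interface["data"]
    count = interface["shape"][0]
    return list((ctypes.c_float * count).from_address(address))


buffer = HostBuffer([1.0, 2.5, -4.0])
values = read_back(buffer.__array_interface__)
```

`__cuda_array_interface__` uses the same dictionary keys, with the address referring to device memory instead of host memory, so a consumer can wrap the data without copying it.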

Bug fixes

  • Fix bug in TransposeCPU & ToDecibels operators (#1729)
  • Fix BBFlip issues (#1738)
  • Fix build without NVJPEG (#1739)
  • Fix precision loss in CropWindowGenerator (#1735)
  • Fix warnings reported by static analysis tool (#1734)
  • Fixed the test failure on Power and x86 (#1752)
  • Fix out of range detection in get_item for TensorList (#1758)
  • Fix a race condition in AsyncPipelinedExecutor destructor and WorkerThread (#1757)
  • Fix bug in the COCOReader with masks (#1724)
  • Fix test_plugin_manager (#1749)
  • Fix typo in TensorListGPU docs, show getitem docs (#1746)
  • Fix SSD type mismatch (#1767)
  • Fix TF dataset build (#1792)
  • Fix DALI TF plugin build (#1789)
  • Fix positional argument propagation in TF Dataset (#1798)

Improvements

  • Add Gluon iterator plugin (#1683)
  • Adjust mxnet DALIClassificationIterator doc (#1718)
  • Change default value in ToDecibels, add one test (#1720)
  • Add error handling when trying to serialize Python Operators (#1730)
  • Use CMake's CUDA language support (#1733)
  • Allow 1 and 2 dimensional input for Slice (#1741)
  • Specialize mul arithm op for bool (#1737)
  • Optical flow test against ground truth. (#1753)
  • Add /usr/local/cuda/bin to PATH in the main Dockerfile (#1756)
  • Add an ability to read noncontinuous RecordIO and TFRecord files (#1747)
  • Allow align-only behavior in Pad operator by treating shape argument as minimum shape (#1764)
  • Enable XLA for TensorFlow RN50 tests and use passthrough ImageNet for MXNet (#1760)
  • Add Reinterpret operator as a flavor of Reshape (#1768)
  • Short-time Fourier transform for GPU (#1721)
  • Add data_ptr method to Tensor and TensorList (#1773)
  • Correct COCOReader mask doc (#1772)
  • Add GPU variant of Spectrogram operator (#1786)
  • Extend shape support in DALI Dataset for TF (#1723)
  • MelFilterBank GPU kernel (#1787)
  • MelFilterBank GPU operator (#1796)
  • Test for RNNT data pipeline (CPU) (#1745)
  • Add data layout documentation and input layout expectations in operator's documentation (#1766)
  • Move RNNT test to Torch specific tests (#1805)
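
The align-only Pad behavior listed above (#1764) can be pictured with a short calculation. This is a sketch under the assumption that the requested shape acts as a lower bound and that the result is then rounded up to a multiple of the alignment; `padded_extent` and its parameters are hypothetical names, not the operator's API.

```python
def padded_extent(extent, min_extent=0, align=1):
    """Return the padded size of one dimension."""
    if align <= 0:
        raise ValueError("align must be positive")
    # The requested shape is only a minimum: larger inputs are not cropped.
    target = max(extent, min_extent)
    # Round up to the nearest multiple of align (ceiling division).
    return -(-target // align) * align


# Align-only use: no minimum shape, just round each extent up to 8.
aligned = [padded_extent(e, align=8) for e in (5, 8, 13)]
# Minimum shape and alignment combined.
at_least = padded_extent(5, min_extent=7, align=4)
```

With no minimum extent given, the only padding applied is whatever the alignment requires, which is what "align-only" refers to.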

Breaking API changes

None

Deprecated feature

  • CUDA 9 support will end in several releases (#1684)

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version for which DALI ships no prebuilt plugin binary, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.20.0
or for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.20.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v0.19.0

02 Mar 17:02
Pre-release

Bug fixes

  • Update examples with COCO data set and fix reader behavior for padding (#1557)
  • Fix TensorFlow dataset test (#1641)
  • Fix typo in QNX cmake files (#1648)
  • Remove allocation-dependent test assert (#1650)
  • Fix several explicit "something is implicitly deleted" warnings (#1652)
  • Fix formatting of the example in the FW iterators docs (#1649)
  • Fix hang in decoder benchmark (#1672)
  • Fix error message (#1680)
  • Fix torch stream initialization in TorchPythonFunction (#1681)
  • Fix multi-channel fill value check in Erase operator (#1675)
  • Tests fix after examples refactor (#1687)
  • Fix Reshape docstring typo (#1691)
  • Add synchronization to read/write operations in image decoder cache (#1702)
  • Fix Buffer linkage and Reshape bug (#1714)
  • Fix TL1 tests (#1710)
  • Fix Pad operator bug (#1713)

Improvements

  • Allow Crop and CropMirrorNormalize to crop sequences as if they were volumetric images (#1605)
  • Erase CPU operator (#1609)
  • Improved Reshape (#1634)
  • Add GetDimIndices utility to tensor_layout.h (#1640)
  • Add example with booleans, comparisons, bitwise and muxing (#1631)
  • Remove unimplemented scale parameter in ops.VideoReader. (#1658)
  • Change ambiguous here in docs developer version (#1657)
  • Docs layout and navigation changes (#1635)
  • GPU PythonFunction operator (#1655)
  • Rename Tensor to TensorList in Supported Ops doc (#1661)
  • Add Pad CPU operator (including aligned padded shape support) (#1642)
  • Remove the ColorTwist deprecation message (#1646)
  • Change PipelineAPIType to Enum (#1636)
  • Directional reductions (for CPU) - mean standard deviation, sum, mean square; with tree reduction. (#1653)
  • Add support to UINT8 data type in SequenceWrapper (#1643)
  • Moving operators around. (#1667)
  • Normalize CPU vol 2 (#1666)
  • GPU PyTorch operator (#1662)
  • Proposing new structure of DALI examples (#1540)
  • VideoReader example (#1612)
  • MovingMeanSquared kernel (#1668)
  • Allow extra dimensions with extent 1 in Spectrogram operator & AudioDecoder changes (#1679)
  • Make DataIter a base class for MXNet DALIGenericIterator (#1669)
  • Add Transpose CPU Operator (#1677)
  • Remove not supported python versions from manylinux build (#1694)
  • Add deprecation message about CUDA 9 (#1684)
  • Mitigate the OS file-max limit in the VideoReader (#1659)
  • Add support for StopIteration raised inside framework iterators (#1625)
  • Enable FFTS builds for ARM (Xavier, QNX) (#1686)
  • Normalize operator for CPU backend (#1670)
  • Python operator notebook (#1685)
  • Change backend_impl at to getitem - return TensorXPU (#1682)
  • Normalize tutorial (#1697)
  • Adjust setup_packages.py to the latest pip version (#1698)
  • Remove gif as supported extension (#1700)
  • Making "Supported backend" title in docs appear correctly
  • Update supported TF versions, update setup_packages.py (#1693)
  • Add pass-through info to OpSchema to add shared data to stage outputs. (#1707)
  • Nonsilence operator (#1701)
  • Constant operator and Python wrapper. (#1699)
  • Add support in CropMirrorNormalize for uneven sizes of mean and std (#1708)
  • Shrink host buffers (#1712)
  • Move pipeline ownership from Dataset to Iterator (#1704)
  • Align Rn50 data processing pipeline for TensorFlow with upstream examples (#1706)
  • Add a note how to set DALI_EXTRA_PATH to run jupyter examples (#1703)
  • Gpu python operator notebook (#1715)
  • Update Memory consumption and Custom operator docs sections (#1719)
  • Use prebuild cupy for TL0_jupyter test (#1728)
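
The tree reduction mentioned for the CPU reductions (#1653) refers to combining values pairwise, level by level, rather than accumulating them one after another. Below is a minimal one-dimensional sketch in plain Python; it is not the DALI kernel (which reduces along chosen axes), and the function names are hypothetical.

```python
import math


def tree_sum(values):
    """Sum by repeatedly adding neighbouring pairs (pairwise reduction)."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # odd element carries to the next level
        level = paired
    return level[0]


def mean_and_std(values):
    """Mean and population standard deviation built on tree_sum."""
    n = len(values)
    mean = tree_sum(values) / n
    variance = tree_sum([(v - mean) ** 2 for v in values]) / n
    return mean, math.sqrt(variance)


samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, std = mean_and_std(samples)
```

Pairwise summation keeps partial sums at similar magnitudes, so floating-point rounding error grows more slowly than with a single running total.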

Breaking API changes

None

Deprecated feature

  • CUDA 9 support will end in several releases (#1684)
  • Access to Tensors of TensorListCPU and TensorListGPU with the at method was replaced by the array subscript operator. (#1682)

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version for which DALI ships no prebuilt plugin binary, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.19.0
or for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.19.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v0.18.0

17 Jan 16:54
Pre-release

Bug fixes

  • Fix setup_packages.py for CUDA versions that are not listed explicitly (#1554)
  • Fix problem with TensorFlow and cupy tests (#1568)
  • Fix ToContiguousXXX for more than 2 inputs. (#1572)
  • Use prebuild cupy in tests (#1570)
  • Fix a race condition in GetGPUAllocator (#1575)
  • Use different stream base for different videos. (#1592)
  • Fixing numpy version to 1.17.0 to avoid error in pycocotools/cocoeval due to implicit conversion from float64 to integer (#1618)
  • Formatting fix. (#1597)
  • Fix Transpose operator for batch size 1 as well as 1 channel images (#1624)
  • Fix static analysis problems (#1559)
  • Fix check if resampling is needed in audio decoder. (#1630)
  • Temporary fix due to missing PILLOW_VERSION symbol when using torchvision (#1626)

Improvements

  • Add support for Unary Ops: + and - (#1392)
  • Improve support for labels in VideoReader. (#1500)
  • Bump up Protobuf version to the latest one (#1543)
  • Add comparison operators and bool handling in arithmetic ops (#1541)
  • Cleanup formatting of Supported Operations (#1578)
  • Bump up protobuf and libturbo-jpeg version in aarch64-linux and qnx build, fix libsnd dependency (#1573)
  • Update PR template (#1571)
  • Add the ability to return duplicated outputs from the DALI pipeline (#1556)
  • Add explicit call docstring, fix Supported backends (#1547)
  • Add DCT 1D CPU kernel (#1569)
  • Bump protobuf version in docs (#1586)
  • Add interdoc link to define_graph, fix note (#1590)
  • Split Expression Factory into separate translation units (#1587)
  • Add bitwise operators: &, |, ^ (#1594)
  • Resampling decoder (#1582)
  • Extract windows GPU (#1538)
  • Remove old PythonFunction implementation (#1585)
  • Mock imports when building docs where possible (#1593)
  • Load libnvcuvid before we test if cuvidReconfigureDecoder symbol exists (#1591)
  • Bump protobuf version in conda build (#1606)
  • Update VideoReader testcase, use nvmlSystemGetDriverVersion (#1617)
  • Name the dataloader shuffling seed (#1621)
  • Add docs for arithmetic expressions (#1600)
  • Add data source info to error message in TFRecord and Caffe parsers (#1620)
  • Remove the need to have GPU available when DALI is just imported (#1601)
  • MFCC CPU operator (#1577)
  • Update CUDA version detection for Conda (#1629)
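
The unary, comparison, and bitwise operators listed above (#1392, #1541, #1594) rely on Python operator overloading: applying an operator to a graph node records an operation instead of computing a value. The sketch below shows that mechanism with a hypothetical `Expr` class; it is not DALI's node type and only builds a textual description.

```python
class Expr:
    """Hypothetical expression node that records operations as text."""

    def __init__(self, text):
        self.text = text

    def _binary(self, symbol, other):
        right = other.text if isinstance(other, Expr) else repr(other)
        return Expr(f"({self.text} {symbol} {right})")

    # Unary operators
    def __neg__(self):
        return Expr(f"(-{self.text})")

    def __pos__(self):
        return Expr(f"(+{self.text})")

    # Comparisons
    def __lt__(self, other):
        return self._binary("<", other)

    def __gt__(self, other):
        return self._binary(">", other)

    # Bitwise operators
    def __and__(self, other):
        return self._binary("&", other)

    def __or__(self, other):
        return self._binary("|", other)

    def __xor__(self, other):
        return self._binary("^", other)


a, b, c = Expr("a"), Expr("b"), Expr("c")
mask = (a < b) & (-c > 0)
unparenthesized = a < b & c  # parsed by Python as a < (b & c)
```

The last line is a reminder that `&`, `|`, and `^` bind more tightly than comparisons in Python, so comparisons combined with bitwise operators need explicit parentheses.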

Breaking API changes

  • Python 2.7 is no longer available. To stay up-to-date with DALI, upgrade to Python 3.5 or later.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version for which DALI ships no prebuilt plugin binary, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.18.0
or for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.18.0

Or use direct download links (CUDA 9.0):

Or use direct download links (CUDA 10.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v0.17.0

27 Dec 23:20
Pre-release

Bug fixes

  • Fix scalar batch handling in arithmetic ops (#1449)
  • Coverity fixes (#1408)
  • Fix removal of device_id initialization in OF (#1459)
  • Static analysis fixes (#1469)
  • Fix start index function (#1482)
  • Add missing dependencies to conda recipe (#1483)
  • Fix for bundle-wheel.sh (#1499)
  • More of static analysis fixes (#1496)
  • Fix race between consecutive invocations of stage, reduce number of events (#1493)
  • Fix ExternalSource for the GPU (#1452)
  • Fix pip package discovery (#1534)
  • Wait for thread pool to finish work in BrightnessContrast (#1549)
  • Fix doc string (#1546)
  • Fix color operators. (#1555)
  • Fix color operators even more (#1558)
  • Fix stream usage in HSV and BrightnessContrast. (#1566)
  • Fix problem with TensorFlow and cupy tests (#1568)

Improvements

  • Add favicon to docs (#1453)
  • Resampling ND - ground work (#1366)
  • Warp 3D (#1442)
  • Add sequence and 3D support in flip operator (#1439)
  • Make thread pinning optional in the mixed ImageDecoder (#1465)
  • Improve accuracy of 3D rotation (#1466)
  • Add ability to read LMDB without any labels stored inside (#1440)
  • AudioDecoder for WAV format (#1447)
  • Add support for PaddlePaddle (#1371)
  • Update docs for fill_last_batch parameter to match the real behavior (#1479)
  • Remove used requirement from paddle SSD demo docs (#1486)
  • FFT CPU 1D implementation (based on ffts) (#1446)
  • Utilize libcudart.so version to detect the CUDA toolkit version (#1477)
  • Allow for more verbose Pipeline's graph logging (#1487)
  • CMake switch for audio support (#1480)
  • Add polygons mask support to COCOReader (#1455)
  • Change TF versions supported by dataset (#1492)
  • Additional deps for AudioDecoder (#1485)
  • Add ExtractWindows CPU kernel (#1461)
  • Add MNIST TensorFlow test (#1467)
  • Remove deprecated edge.py (#1498)
  • Add PowerSpectrum CPU operator (#1460)
  • Add Spectrogram CPU Operator (#1468)
  • Add MNIST examples (#1491)
  • Add notebooks with example usage of arithmetic ops (#1438)
  • Add ToDecibels CPU kernel (#1516)
  • Adding librosa dependency to qa/TL1_jupyter_plugins/test.sh (#1517)
  • Fix Keras GPU example (#1520)
  • Preemphasis operator (#1515)
  • Fix for WaitForWork in Preemphasis (#1523)
  • AudioDecoder operator (#1481)
  • Lower the accuracy threshold for paddle RN50 test (limited to 25 epochs only) (#1528)
  • Remove cache options from fused ImageDecoder documentation (#1495)
  • Add ToDecibels CPU operator (#1518)
  • Add deprecation warning for Python 2.7 (#1521)
  • Split tests per framework if possible (#1519)
  • Add zlib dependency warning to libtiff build step (#1530)
  • Rephrase supported backends documentation (#1497)
  • Extend supported ops doc to include info about volumetric data. (#1531)
  • Disable clamping when converting from bool (#1536)
  • Add adobe analytics tracking script into docs (#1539)
  • ColorTwist operator cleanup (#1532)
  • NormalDistribution operator (#1529)
  • Hide the docs for internal operators (#1542)
  • MelFilterBank CPU kernel (#1522)
  • Disable cupy test for python 2.7 (#1544)
  • Boundary condition handling (#1552)
  • Add spaces in Python 2.7 end of life warning (#1553)
  • Add MelFilterBank CPU operator (#1535)
  • Add more formats to FileReader (#1561)
  • Make the presence of unique visitor script counting optional in docs (#1560)
  • Adjust color ops; make contrast-neutral gray configurable (#1562)
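
Among the audio additions above, the Preemphasis operator (#1515) applies a simple first-order filter. A minimal sketch of the usual formula, y[n] = x[n] - coeff * x[n-1], is shown here in plain Python. The default coefficient and the choice to pass the first sample through unchanged are assumptions, not taken from the DALI source.

```python
def preemphasis(samples, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1]; the first sample is kept as is."""
    if not samples:
        return []
    out = [samples[0]]
    for previous, current in zip(samples, samples[1:]):
        out.append(current - coeff * previous)
    return out


filtered = preemphasis([1.0, 1.0, 1.0, 2.0], coeff=0.5)
```

Constant stretches of the signal are attenuated while changes between neighbouring samples are kept, which is why the filter is commonly used to boost high frequencies before computing spectrograms.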

Breaking API changes

  • DALI 0.17 is the last official release for Python 2.7, which reaches the end of life on January 1st, 2020. To stay up to date with DALI, please upgrade to Python 3.5 or later.
  • The asCPU method is no longer available and has been replaced with as_cpu.
  • The ColorTwist operator was deprecated and replaced by the BrightnessContrast and HSV operators (#1532)

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version for which DALI ships no prebuilt plugin binary, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
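
The key-frame requirement in the first item above can be checked before a file is handed to the video loader. This is a minimal sketch that assumes the key-frame indices have already been extracted with some other tool; `max_keyframe_gap`, `keyframes_ok`, and the default limit of 10 (the conservative end of the 10 to 15 range) are illustrative choices, not part of DALI.

```python
def max_keyframe_gap(keyframe_indices, total_frames):
    """Return the longest stretch of frames that starts at a key frame.

    The stretch from the last key frame to the end of the stream is included.
    Frames before the first key frame are not counted. With no key frames at
    all, the whole stream is reported as one gap.
    """
    points = sorted(set(keyframe_indices)) + [total_frames]
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    return max(gaps) if gaps else total_frames


def keyframes_ok(keyframe_indices, total_frames, limit=10):
    """True when no gap between key frames exceeds `limit` frames."""
    return max_keyframe_gap(keyframe_indices, total_frames) <= limit


if __name__ == "__main__":
    print(keyframes_ok([0, 10, 20], total_frames=25))
    print(keyframes_ok([0, 30], total_frames=40))
```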

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.17.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.17.0
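
For scripted installs, the two commands above differ only in the CUDA version segment of the index URL and in the pinned package version. A minimal sketch of a helper that assembles the command line; `dali_pip_command` is a hypothetical name, and the set of accepted CUDA versions is limited to the two shown in these notes.

```python
INDEX_URL = "https://developer.download.nvidia.com/compute/redist/cuda/{cuda}"
SUPPORTED_CUDA = ("9.0", "10.0")  # only the versions listed in these notes


def dali_pip_command(dali_version, cuda_version):
    """Build the pip argument list for a given DALI and CUDA version."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(f"unsupported CUDA version: {cuda_version!r}")
    return [
        "pip", "install",
        "--extra-index-url", INDEX_URL.format(cuda=cuda_version),
        f"nvidia-dali=={dali_version}",
    ]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run() to execute it.
    print(" ".join(dali_pip_command("0.17.0", "10.0")))
```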

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v0.16.0

03 Dec 00:21
Pre-release

Bug fixes

  • Fix DALI TF plugin CXX11 ABI issue (#1361)
  • Fix DALI TF installation for TF 2.0 (#1386)
  • Fix Pad op default fill_value and axes (#1410)
  • Fix Tensorflow examples for TF 2.0 (#1420)
  • Fix input tiling in arithmetic ops (#1426)
  • Fix link error in debug mode. (#1429)
  • Fix RN50 MXNet TL3 test (#1424)
  • Fix scalar batch handling in arithmetic ops (#1449)

Improvements

  • Rearrange docker images (#1333)
  • GTest naming in STYLE_GUIDE (#1330)
  • Add 3D case to shape layout verification in CropAttr (#1344)
  • Add fallback to host when nvjpegJpegStreamParse fails (#1335)
  • Surface2D -> ND generalization (#1348)
  • Add multichannel (C>3) pipeline tests (#1219)
  • Improve last_batch_padded and Running DALI pipeline docs (#1351)
  • Undo pytorch download changes (#1353)
  • Provide prebuilt plugins for manylinux2010 based pip packages (#1346)
  • Clean include file dependencies (#1362)
  • Add warning if avformat_open_input fails (#1363)
  • Workaround for a segfault in NVCC 9 with (#1365)
  • HSV manipulation operator for GPU & CPU (#1338)
  • Backend implementation for binary arithmetic Operator (#1322)
  • Add skip_vfr_check option to VideoReader (#1367)
  • Support float16 in Cast GPU operator (#1368)
  • Add implementation of BmpImage::PeekShapeImpl, including number of channels (#1332)
  • Add Vp9 codec support (#1331)
  • Add torch dependency to TL1_separate_executor (#1373)
  • Add TF Dataset GPU (#1354)
  • Add ability to cross compile lmdb (#1374)
  • Move Tensor(List)Shape, Tensor(List)View to dali/core (#1341)
  • Relax check for libnvidia-opticalflow in test script. (#1381)
  • Disable Vp9 tests temporarily (#1383)
  • Make it possible to build DALI with any CUDA version (#1345)
  • Add multigpu TF dataset test (#1382)
  • Generalize helper code to unary inputs (#1379)
  • Force inline and affine transformation (#1389)
  • GPU dltensor operator (#1261)
  • Enhance Slice API to specify axes represented in the arguments (#1336)
  • Allow default compiler build if TF compiler version is unknown (#1396)
  • NewWarpAffine -> WarpAffine; optimize CPU warp for affine mapping. (#1387)
  • Allow building DALI for different architectures as well (#1397)
  • Remove PyTorch iterator double buffering (#1399)
  • Improve wording for PREBUILD_TF_PLUGINS option (#1407)
  • Move builtin operators to dali/pipeline. (#1406)
  • Enhance CaffeReader and Caffe2Reader to support multiple LMDB files (#1360)
  • Expose arithm ops in Python (#1355)
  • Add Pad operator (#1180)
  • Enable CUDA 10 compatibility layer for Conda build (#1339)
  • Enforce crop argument minimum size (#1401)
  • Rotate operator using Warp kernel (#1403)
  • Allow empty lists in arguments (#1413)
  • Add missing license in python tests (#1412)
  • Support TF 1.15 and 2.0 in tests (#1400)
  • Fix DALIDataType enum in Python (#1419)
  • BrightnessContrast operator example (#1414)
  • Add additional_decode_surfaces parameter to VideoReader (#1393)
  • CPU argument input (#1423)
  • Add support for Constant inputs and type-erased tiles (#1391)
  • Support TF v2.0 in jupyter examples (#1425)
  • Limit number of Input/Output type combinations in Slice kernel family (#1418)
  • Add TF 1.15 and 2.0 support for TF dataset (#1395)
  • New warp example + minor fixes (#1158)
  • Add initial support for constants in python API (#1421)

Breaking API changes

  • DALI 0.17 is the last official release for Python 2.7, which reaches the end of life on January 1st, 2020. To stay up to date with DALI, please upgrade to Python 3.5 or later.
  • Removed the following deprecated operators:
    • NormalizePermute; use CropMirrorNormalize instead (#1402)
    • HostDecoder and nvJPEGDecoder (#1398)
  • The possible output types of the Crop, CropMirrorNormalize, and Slice operators are limited to uint8_t, int16_t, uint16_t, int32_t, float, and float16, or to passing through the input type (#1418).
  • Move dali/pipeline/operators to dali/operators (#1380)
  • DALI library modularization (#1384)
  • CPU argument input (#1423)
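
The output-type restriction on Crop, CropMirrorNormalize, and Slice listed above can be checked in user code before a pipeline is built. A minimal sketch: the type names are the C++ names from the list, `check_output_type` is a hypothetical helper rather than a DALI function, and `None` is used here to stand for "pass through the input type".

```python
ALLOWED_OUTPUT_TYPES = frozenset(
    {"uint8_t", "int16_t", "uint16_t", "int32_t", "float", "float16"}
)


def check_output_type(requested, input_type):
    """Return the effective output type or raise if it is not allowed."""
    if requested is None:
        # No explicit request: the operator passes the input type through.
        return input_type
    if requested not in ALLOWED_OUTPUT_TYPES:
        allowed = ", ".join(sorted(ALLOWED_OUTPUT_TYPES))
        raise ValueError(f"{requested!r} is not one of: {allowed}")
    return requested


if __name__ == "__main__":
    print(check_output_type(None, "uint8_t"))
    print(check_output_type("float", "uint8_t"))
```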

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
  • Due to known issues with Meltdown/Spectre mitigations, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
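
For bare Docker, the two options in the last item above can be assembled programmatically. A minimal sketch, assuming a launcher script written in Python; `docker_run_args` and the image name `my-dali-image` are hypothetical, while `--rm`, `--privileged`, and `--security-opt seccomp=unconfined` are standard `docker run` options.

```python
def docker_run_args(image, command=(), full_privileges=False):
    """Build a `docker run` argument list with escalated privileges.

    full_privileges=True uses --privileged; otherwise only the seccomp
    profile is relaxed, which is the narrower of the two options.
    """
    args = ["docker", "run", "--rm"]
    if full_privileges:
        args.append("--privileged")
    else:
        args += ["--security-opt", "seccomp=unconfined"]
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run() to execute it.
    print(" ".join(docker_run_args("my-dali-image", ["python", "train.py"])))
```

Relaxing only the seccomp profile is the default in this sketch because it grants less than `--privileged`.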

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.16.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.16.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

DALI v0.15.0

28 Oct 15:31
Pre-release

Bug fixes

  • Fix Transpose operator for data shapes with a dimension of size 1 (#1244)
  • Fix DALI_Extra clone (#1276)
  • Fix conda check in DALI TF installation script (#1284)
  • Fix problems with seeking when stream start_time is != 0. (#1287)
  • Fix TypeTable initialization (#1321)
  • Fix CropMirrorNormalize compilation with GCC 8 (#1320)
  • Suppress warning when FileReader encounters dot and dot-dot entries (#1318)
  • Fix the wrong usage of find_library when searching for FFmpeg libs (#1317)
  • Fix last_batch_padded docs (#1314)
  • Fix pytorch download url (#1334)
  • Undo pytorch download changes (#1353)
  • Fix DALI TF plugin CXX11 ABI issue (#1361)
  • Add torch dependency to TL1_separate_executor (#1373)
  • Fix DALI TF installation for TF 2.0 (#1386)
  • Relax check for libnvidia-opticalflow in test script. (#1381)

Improvements

  • Replace std::pair alias with actual type (#1248)
  • Add support for volumetric (i.e. 3D) crop (depth, height and width) (#1210)
  • Refactor storage type specialization for operator arguments (#1245)
  • CPU DLTensor Operator (#1233)
  • Change Outputs and ShareOutputs return type to tuple (#1243)
  • Add non_blocking option to CopyToExternalTensor (#1254)
  • Improve heuristic for variable frame rate detection (#1242)
  • Add pipeline validation (#1267)
  • Add lookup table operator (#1251)
  • make_string for arguments, which have operator<< (#1174)
  • Tensor layout (#1237)
  • Rework Support Ops to use TensorList (#1259)
  • Improve logic in DALI TF plugin installation (support conda installation use case) (#1271)
  • size_t -> int for vec, mat, box etc... (#1277)
  • ImageDecoder libtiff implementation (#1264)
  • Add check for OF support (#1278)
  • ImageDecoder libtiff implementation (types.ANY_DATA, YCbCr, ImageDims to TensorShape) (#1280)
  • Handle nchannels>3 in ImageDecoder (#1285)
  • Use alternative compiler (e.g. g++-5.4) when available (#1290)
  • Add support for UCF-101 dataset and upgrade ffmpeg version from 3.4.2 to 4.2 (#1241)
  • Add info about libtiff dependency in the documentation (#1294)
  • Check whether random row access is allowed in libtiff based decoder implementation (#1295)
  • Make cspan (#1298)
  • BrightnessContrast operator (#1188)
  • Parse number of channels in PNGImage::PeekShape (#1288)
  • Add support for decoding multiple resolution videos in the same pipeline. (#1144)
  • Conda recipe: point to local git repository for build source, relax version dependencies, and use conda-forge for some dependencies (#1303)
  • TiffImage::PeekShapeImpl parse and return number of channels (#1304)
  • Introduce byte_io.h including byte sequence reading utils (ReadValueBE and ReadValueLE) (#1310)
  • Add parsing of number of channels in JpegImage::PeekShapeImpl (#1306)
  • Layout refactor (#1250)
  • Add CMake VERBOSE_LOGS switch (#1319)
  • Add BMP tests (#1316)
  • Make DALI_extra repo path settable from the env (#1323)
  • Linear transformation GPU kernel (#1262)
  • Use DALI_extra images in more tests (#1177)
  • Reshape op (#1327)
  • Add tf dataset (#1299)
  • Adjust QA scripts: remove installing pip whl from direct links, as pip disregards the "-f" option in that case (#1328)
  • Add CropMirrorNormalize 3D support (#1326)
  • Add layout handling to Transpose operator (#1329)
  • Add shape layout input to crop window generator signature (#1340)
  • Linear Transformation kernel for CPU (#1300)
  • Rearrange docker images (#1333)
  • Provide prebuilt plugins for manylinux2010 based pip packages (#1346)
  • Add 3D case to shape layout verification in CropAttr (#1344)

Breaking API changes

  • Change Outputs and ShareOutputs return type to tuple (#1243)

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.

  • The DALI TensorFlow plugin may not be compatible with TensorFlow versions 1.15.0 and later. To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, the gcc compiler that matches the one used to build TensorFlow (gcc 4.8.4, 4.8.5, or 5.4, depending on the particular version) must be present on the system.
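
The compiler requirement in the TensorFlow plugin item above can be checked before attempting a plugin build. A minimal sketch, assuming the installed version string has been obtained separately (for example from `gcc -dumpversion`) and that "matches" means agreement on every component of the required version; `parse_version` and `gcc_matches` are hypothetical helpers, not part of DALI.

```python
def parse_version(text):
    """Turn a dotted version string such as '5.4.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def gcc_matches(installed, required):
    """True when `installed` agrees with every component of `required`.

    '5.4.0' matches '5.4', but '4.8.5' does not match '4.8.4'.
    """
    have, want = parse_version(installed), parse_version(required)
    return have[: len(want)] == want


if __name__ == "__main__":
    for required in ("4.8.4", "4.8.5", "5.4"):
        print(required, gcc_matches("5.4.0", required))
```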

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.15.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.15.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

DALI v0.14.0

30 Sep 17:37
Pre-release

Bug fixes

  • Fix fp16 bug from #1129 and add fp16 test case (#1160)
  • Fix framework iterators behavior when iter_setup raises StopIteration (#1136)
  • Fix nvjpeg legacy API (#1179)
  • Attempt different driver urls in setup_test_common.sh (#1193)
  • Fix nightly bug in video reader (#1194)
  • Fix conversions to int64 / uint64. (#1205)
  • Attempt to fix issue with tf plugin install and gcc 4.8 (#1214)
  • Fix PyTorch spelling (#1230)

Improvements

  • BrightnessContrast CUDA kernels (#1142)
  • Adjust Operator::Run to take reference instead of pointer (#1168)
  • Add a STYLE_GUIDE for DALI, adjust Kernel example (#1167)
  • Extend external source operator capacity (#1127)
  • Make Deallocate public API (#1182)
  • Remove .cpu function (#1181)
  • Allow stream() to be called for every Workspace (#1178)
  • Improve error messages for file_list arg problems in FileReader (#1184)
  • Add multi gpu python notebook (#1186)
  • HSV Kernel for CPU (#1187)
  • Adjust CropMirrorNormalize to Setup API (#1140)
  • Expose tensor as dlpack (#1154)
  • Add const noexcept qualifiers to IsContiguous. (#1211)
  • ROI utils (#1189)
  • Add qa test for multi gpu example (#1202)
  • Add support for 3d shapes in crop window (#1207)
  • DALI for aarch64-QNX platform (#522)
  • Unified naming for float16 type. (#1212)
  • Add types to DALIDataType that were missing (#1213)
  • CPU warp, with tests. (#1159)
  • Conda Recipe for DALI (#1156)
  • Update file reader doc (#1222)
  • Track DALI_extra version in DALI (#1229)
  • Add Shapes operator returning sample shapes. (#1223)
  • New Warp operator (#1153)

Breaking API changes

  • Remove .cpu function (#1181)
  • Adjust Operator::Run to take reference instead of pointer (#1168)
  • Extend external source operator capacity (#1127) - it now requires input to be set for every iteration
  • Unified naming for float16 type. (#1212)

Known issues:

  • The new VideoReader operator requires NVIDIA Video Codec SDK support on the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
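
When containers are launched from a script, the ‘video’ capability from the first item above can be appended to whatever capability list is already in use. A minimal sketch; `with_capability` and `docker_env_flag` are hypothetical helpers, and only the variable name `NVIDIA_DRIVER_CAPABILITIES` and the values shown come from these notes.

```python
def with_capability(current, extra="video"):
    """Return a comma-separated capability list that includes `extra`."""
    caps = [cap for cap in current.split(",") if cap]
    if extra not in caps:
        caps.append(extra)
    return ",".join(caps)


def docker_env_flag(capabilities):
    """Build the `-e` option that passes the capability list to the container."""
    return ["-e", f"NVIDIA_DRIVER_CAPABILITIES={capabilities}"]


if __name__ == "__main__":
    print(" ".join(docker_env_flag(with_capability("compute,utility"))))
```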

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.14.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.14.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

DALI v0.13.0

29 Aug 15:34
Pre-release

Bug fixes

  • Upgrade PyTorch to 1.2, TorchVision to 0.4 (#1155)
  • Add use_batched_decode argument to nvJPEGDecoder API (only for legacy nvJPEGDecoder implementation) (#1151)
  • Make loading of the versioned libnvidia-opticalflow.so the primary path (#1147)
  • Fix tests that are not using prolog/epilog functions (#1143)
  • Provide default initialization for scratch sizes in KernelRequirements. (#1141)
  • Fix coco loader (#1135)
  • Fix GET_PROC_EX macro (#1128)
  • Fix typo in installation doc (#1126)
  • Fix capitalization in docs for docker dir (#1122)
  • Fix pipeline serialization/deserialization for logical_id (#1121)
  • Use the right PyTorch capitalization everywhere (#1119)
  • Fix Gluon example that mixes simple and iterator DALI API (#1117)
  • Fix lint in ../dali/pipeline/operators/reader/loader/loader.h (#1113)
  • Fix float16 support in DALI TensorFlow plugin (#1086)
  • Fix python operator with side effects. (#1105)
  • Fix warning (#1061)
  • Fix test header inclusion (#1100)
  • Make dali_kernel_test_lib respect BUILD_TEST (#1101)
  • Fix a race condition in async pipeline executor (#1103)
  • Typo fixed in getting started notebook (#1091)
  • Reduced batch size to avoid out of memory condition in 19.07 container. (#1089)
  • Fix error of indexing shape in Optical Flow (#1087)
  • Disable video_reader_op test when we disable NVDEC (#1077)
  • Add video error message (#1067)
  • Fix sampling of chroma in the VideoReader op (#1054)
  • Fix detection pipeline example (#1055)
  • Fix fp16 bug from #1129 and add fp16 test case (#1160)

Improvements

  • Adjust customdummy plugin in Docs to new API (#1150)
  • Add view overload to get TensorListView from TensorVector. (#1152)
  • Warp kernels (#1063)
  • Add Setup API to Operator (#1045)
  • Input & output TYPED_TEST (#1133)
  • Refactor SliceFlipNormalizePermutePad (super)kernel (#1129)
  • Add virtual env and conda test case for DALI TF plugin (#1107)
  • Add test for water operator (#1075)
  • BrightnessContrast kernel first implementation (#1060)
  • Add default_cuda_stream_priority documentation (#1131)
  • Fast coco reader (#1098)
  • Optimize docker images building (#1053)
  • Remove explicit Multiple Input Sets handling from C++ Backend (#1088)
  • Document pre-built WML CE packages in Installation docs (#1124)
  • Upgrade VideoCodecSDK to 9.0.20 (#1120)
  • UniformRandomFill for unified storage (#1070)
  • Calculation layout setup for GPU kernels. (#1106)
  • Rework multiple input sets API (#1104)
  • Use per-sample RNG in SSDRandomCrop and RandomBBoxCrop (#1109)
  • Add compile-time mapping for DALIDataType. For use in TYPE_SWITCH. (#1108)
  • Rework how the reader picks samples from the shuffling buffer (#1005)
  • Add a check that the Python API is not mixed between simple, scheduled, and iterator modes (#1074)
  • Enable OpticalFlow test on CI (#1096)
  • Make protobuf linking mode configurable (#1102)
  • Kernel manager (#1079)
  • Add JIRA Task placeholder in PR template (#1090)
  • Replace vector<shared_ptr> with TensorVector (#1040)
  • Deprecate NormalizePermute in favor of CropMirrorNormalize (#982)
  • Adjust TensorFlow ResNet50 example to 1.14 version API (#1081)
  • Update DALI TF plugin docs to be aligned with the current functionality (#1066)
  • Adds BUILD_TF_PLUGIN flag to one-click build script (#1051)
  • Enforce shares_data_ in Buffer (#1057)
  • Improved sampler (#1071)
  • Change test prefix from L*_ to TL*_ (#1069)
  • Rounding Convert and ConvertSat added. (#1068)
  • Copy multiple collections to scratchpad. (#1044)
  • Use DALI_extra in loader test (#1064)
  • Add filename to LMDB reader errors (#1059)
  • Add make check target that runs basic tests (#1019)
  • Bounding box representation (#1052)
  • Add option to enable fast IDCT in libjpeg-turbo (#1031)
  • Adjust Tests to use DALI_EXTRA (#1056)
  • Basic geometric transform functions. (#1047)
  • Add TorchPythonFunction operator (#1033)
  • Add support for reading video files with labels using file_list argument (#1029)
  • Add TensorFlow 1.14 (#1037)
  • Enable sink operators. (#1004)
  • Update PR template (#1043)

Breaking API changes

  • Added Setup API to Operator with pure virtual SetupImpl
  • Multiple Input Sets handling was removed from the backend and is now only Python-level syntactic sugar
  • Reader sampling from the shuffling buffer was adjusted
  • Replace vector<shared_ptr> with TensorVector as input and output of CPU Operators, allowing for contiguous outputs from CPU Ops
  • Deprecate NormalizePermute in favor of CropMirrorNormalize (#982)
  • Enforce shares_data_ in Buffer: shared data cannot be implicitly reallocated and must match the allocation size

Known issues:

  • The new VideoReader operator requires NVIDIA Video Codec SDK support on the platform. NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration prior to 19.01. To enable it, run the container with the ‘video’ capability enabled, i.e.:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.

Binary builds

Install via pip for CUDA 9:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/9.0 nvidia-dali==0.13.0
or for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/cuda/10.0 nvidia-dali==0.13.0

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here