Transform

  1. Introduction

  2. Transform support list

    2.1 TensorFlow

    2.2 PyTorch

    2.3 MXNet

    2.4 ONNXRT

Introduction

Neural Compressor provides built-in preprocessing transforms for each supported framework backend. Refer to the HelloWorld example for how to configure a transform in a dataloader.
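
As a rough illustration of what attaching a transform to a dataloader means, the sketch below wraps a list of (image, label) samples and applies a preprocessing callable to each sample when it is read. `TransformedDataset` and `scale_to_unit` are hypothetical names invented for this example; they are not part of the Neural Compressor API.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (flattened image pixels, label)


class TransformedDataset:
    """Hypothetical wrapper: applies `transform` to each sample on access."""

    def __init__(self, samples: Sequence[Sample],
                 transform: Callable[[Sample], Sample]):
        self.samples = samples
        self.transform = transform

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Sample:
        # The raw sample is left untouched; the transformed copy is returned.
        return self.transform(self.samples[index])


def scale_to_unit(sample: Sample) -> Sample:
    """Hypothetical transform: map 8-bit pixel values to [0, 1]."""
    image, label = sample
    return [pixel / 255.0 for pixel in image], label


dataset = TransformedDataset([([0, 255, 51], 7)], transform=scale_to_unit)
image, label = dataset[0]
```

Every transform in the tables below follows the same shape in this sketch: it takes a sample (image, label) and returns a new sample.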

Transform Support List

TensorFlow

Transform Parameters Comments Usage(In yaml file)
Resize(size, interpolation) size (list or int): Size of the result
interpolation (str, default='bilinear'): Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'
Resize the input image to the given size Resize:
   size: 256
   interpolation: bilinear
CenterCrop(size) size (list or int): Size of the result Crop the given image at the center to the given size CenterCrop:
   size: [10, 10] # or size: 10
RandomResizedCrop(size, scale, ratio, interpolation) size (list or int): Size of the result
scale (tuple or list, default=(0.08, 1.0)): range of the size of the origin size cropped
ratio (tuple or list, default=(3. / 4., 4. / 3.)): range of aspect ratio of the origin aspect ratio cropped
interpolation (str, default='bilinear'): Desired interpolation type, support 'bilinear', 'nearest'
Crop the given image to random size and aspect ratio RandomResizedCrop:
   size: [10, 10] # or size: 10
   scale: [0.08, 1.0]
   ratio: [3. / 4., 4. / 3.]
   interpolation: bilinear
Normalize(mean, std) mean (list, default=[0.0]): Mean for each channel. If len(mean)=1, the mean is broadcast to every channel; otherwise its length should match the length of the image shape
std (list, default=[1.0]): Standard deviation for each channel. If len(std)=1, the std is broadcast to every channel; otherwise its length should match the length of the image shape
Normalize an image with mean and standard deviation Normalize:
   mean: [0.0, 0.0, 0.0]
   std: [1.0, 1.0, 1.0]
RandomCrop(size) size (list or int): Size of the result Crop the image at a random location to the given size RandomCrop:
   size: [10, 10] # or size: 10
Compose(transform_list) transform_list (list of Transform objects): list of transforms to compose Compose several transforms together If the transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them.
In user code:
from neural_compressor.experimental.data import TRANSFORMS
# framework: name of the backend in use; resize_args / normalize_args: dicts of
# keyword arguments for each transform (placeholders, define them before use)
preprocess = TRANSFORMS(framework, 'preprocess')
resize = preprocess["Resize"](**resize_args)
normalize = preprocess["Normalize"](**normalize_args)
compose = preprocess["Compose"]([resize, normalize])
sample = compose(sample)
# sample: (image, label)
CropResize(x, y, width, height, size, interpolation) x (int): Left boundary of the cropping area
y (int): Top boundary of the cropping area
width (int): Width of the cropping area
height (int): Height of the cropping area
size (list or int): resize to new size after cropping
interpolation (str, default='bilinear'): Desired interpolation type, support 'bilinear', 'nearest' and 'bicubic'
Crop the input image with given location and resize it CropResize:
   x: 0
   y: 5
   width: 224
   height: 224
   size: [100, 100] # or size: 100
   interpolation: bilinear
RandomHorizontalFlip() None Horizontally flip the given image randomly RandomHorizontalFlip: {}
RandomVerticalFlip() None Vertically flip the given image randomly RandomVerticalFlip: {}
DecodeImage() None Decode a JPEG-encoded image to a uint8 tensor DecodeImage: {}
EncodeJpeg() None Encode image to a Tensor of type string EncodeJpeg: {}
Transpose(perm) perm (list): A permutation of the dimensions of the input image Transpose the image according to perm Transpose:
   perm: [1, 2, 0]
ResizeWithRatio(min_dim, max_dim, padding) min_dim (int, default=800): Resizes the image such that its smaller dimension == min_dim
max_dim (int, default=1365): Ensures that the image longest side does not exceed this value
padding (bool, default=False): If true, pads image with zeros so its size is max_dim x max_dim
Resize the image while preserving its aspect ratio and optionally pad it to the max shape. If the image is padded, the label is processed at the same time. The input image should be an np.array or tf.Tensor. ResizeWithRatio:
   min_dim: 800
   max_dim: 1365
   padding: True
CropToBoundingBox(offset_height, offset_width, target_height, target_width) offset_height (int): Vertical coordinate of the top-left corner of the result in the input
offset_width (int): Horizontal coordinate of the top-left corner of the result in the input
target_height (int): Height of the result
target_width (int): Width of the result
Crop an image to a specified bounding box CropToBoundingBox:
   offset_height: 10
   offset_width: 10
   target_height: 224
   target_width: 224
Cast(dtype) dtype (str, default='float32'): A dtype to convert image to Convert image to given dtype Cast:
   dtype: float32
ToArray() None Convert PIL Image to numpy array ToArray: {}
Rescale() None Scale the values of image to [0,1] Rescale: {}
AlignImageChannel(dim) dim (int): Number of channels in the result image Align the image channels; currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported.
This transform will be deprecated.
AlignImageChannel:
   dim: 3
ParseDecodeImagenet() None Parse features in Example proto ParseDecodeImagenet: {}
ResizeCropImagenet(height, width, random_crop, resize_side, random_flip_left_right, mean_value, scale) height (int): Height of the result
width (int): Width of the result
random_crop (bool, default=False): whether to crop at a random location
resize_side (int, default=256): desired shape after resize operation
random_flip_left_right (bool, default=False): whether to randomly flip left and right
mean_value (list, default=[0.0,0.0,0.0]): means for each channel
scale (float, default=1.0): std value
Combination of a series of transforms which is applicable to images in Imagenet ResizeCropImagenet:
   height: 224
   width: 224
   random_crop: False
   resize_side: 256
   random_flip_left_right: False
   mean_value: [123.68, 116.78, 103.94]
   scale: 0.017
QuantizedInput(dtype, scale) dtype (str): desired image dtype, supports 'uint8' and 'int8'
scale (float, default=None): scaling ratio of each point in the image
Convert the dtype of the input to quantize it QuantizedInput:
   dtype: 'uint8'
LabelShift(label_shift) label_shift (int, default=0): amount to shift the label by Convert label to label - label_shift LabelShift:
   label_shift: 0
BilinearImagenet(height, width, central_fraction, mean_value, scale) height (int): Height of the result
width (int): Width of the result
central_fraction (float, default=0.875): fraction of size to crop
mean_value (list, default=[0.0,0.0,0.0]): means for each channel
scale (float, default=1.0): std value
Combination of a series of transforms which is applicable to images in Imagenet BilinearImagenet:
   height: 224
   width: 224
   central_fraction: 0.875
   mean_value: [0.0,0.0,0.0]
   scale: 1.0
SquadV1(label_file, vocab_file, n_best_size, max_seq_length, max_query_length, max_answer_length, do_lower_case, doc_stride) label_file (str): path of label file
vocab_file (str): path of vocabulary file
n_best_size (int, default=20): The total number of n-best predictions to generate in the nbest_predictions.json output file
max_seq_length (int, default=384): The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded
max_query_length (int, default=64): The maximum number of tokens for the question. Questions longer than this will be truncated to this length
max_answer_length (int, default=30): The maximum length of an answer that can be generated. This is needed because the start and end predictions are not conditioned on one another
do_lower_case (bool, default=True): Whether to lower case the input text. Should be True for uncased models and False for cased models
doc_stride (int, default=128): When splitting up a long document into chunks, how much stride to take between chunks
Postprocess the predictions of BERT on SQuAD SquadV1:
   label_file: /path/to/label_file
   vocab_file: /path/to/vocab_file
   n_best_size: 20
   max_seq_length: 384
   max_query_length: 64
   max_answer_length: 30
   do_lower_case: True
   doc_stride: 128
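
To make the Normalize, LabelShift and Compose rows above more concrete, here is a minimal pure-Python sketch of their described behaviour. It is an illustration based only on the table descriptions, not the library's implementation; the function names `normalize`, `label_shift` and `compose` are made up for this example, and an image is modelled as a non-empty list of per-pixel channel lists.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = List[float]                # one value per channel
Sample = Tuple[List[Pixel], int]   # (pixels, label)
Transform = Callable[[Sample], Sample]


def normalize(mean: Sequence[float], std: Sequence[float]) -> Transform:
    """Sketch of Normalize: (value - mean) / std per channel."""
    def apply(sample: Sample) -> Sample:
        image, label = sample
        channels = len(image[0])
        # A single-element mean/std is broadcast to every channel.
        m = list(mean) * channels if len(mean) == 1 else list(mean)
        s = list(std) * channels if len(std) == 1 else list(std)
        out = [[(v - m[c]) / s[c] for c, v in enumerate(pixel)]
               for pixel in image]
        return out, label
    return apply


def label_shift(shift: int = 0) -> Transform:
    """Sketch of LabelShift: label becomes label - shift."""
    def apply(sample: Sample) -> Sample:
        image, label = sample
        return image, label - shift
    return apply


def compose(transforms: Sequence[Transform]) -> Transform:
    """Sketch of Compose: apply the transforms in list order."""
    def apply(sample: Sample) -> Sample:
        for transform in transforms:
            sample = transform(sample)
        return sample
    return apply


pipeline = compose([normalize(mean=[10.0], std=[2.0]), label_shift(1)])
image, label = pipeline(([[12.0, 14.0, 10.0]], 5))
```

Here the single-element mean and std are applied to all three channels, and the label 5 is shifted to 4.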

PyTorch

Transform Parameters Comments Usage(In yaml file)
Resize(size, interpolation) size (list or int): Size of the result
interpolation (str, default='bilinear'): Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'
Resize the input image to the given size Resize:
   size: 256
   interpolation: bilinear
CenterCrop(size) size (list or int): Size of the result Crop the given image at the center to the given size CenterCrop:
   size: [10, 10] # or size: 10
RandomResizedCrop(size, scale, ratio, interpolation) size (list or int): Size of the result
scale (tuple or list, default=(0.08, 1.0)): range of size of the origin size cropped
ratio (tuple or list, default=(3. / 4., 4. / 3.)): range of aspect ratio of the origin aspect ratio cropped
interpolation (str, default='bilinear'): Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'
Crop the given image to random size and aspect ratio RandomResizedCrop:
   size: [10, 10] # or size: 10
   scale: [0.08, 1.0]
   ratio: [3. / 4., 4. / 3.]
   interpolation: bilinear
Normalize(mean, std) mean (list, default=[0.0]): Mean for each channel. If len(mean)=1, the mean is broadcast to every channel; otherwise its length should match the length of the image shape
std (list, default=[1.0]): Standard deviation for each channel. If len(std)=1, the std is broadcast to every channel; otherwise its length should match the length of the image shape
Normalize an image with mean and standard deviation Normalize:
   mean: [0.0, 0.0, 0.0]
   std: [1.0, 1.0, 1.0]
RandomCrop(size) size (list or int): Size of the result Crop the image at a random location to the given size RandomCrop:
   size: [10, 10] # or size: 10
Compose(transform_list) transform_list (list of Transform objects): list of transforms to compose Compose several transforms together If the transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them.
In user code:
from neural_compressor.experimental.data import TRANSFORMS
# framework: name of the backend in use; resize_args / normalize_args: dicts of
# keyword arguments for each transform (placeholders, define them before use)
preprocess = TRANSFORMS(framework, 'preprocess')
resize = preprocess["Resize"](**resize_args)
normalize = preprocess["Normalize"](**normalize_args)
compose = preprocess["Compose"]([resize, normalize])
sample = compose(sample)
# sample: (image, label)
RandomHorizontalFlip() None Horizontally flip the given image randomly RandomHorizontalFlip: {}
RandomVerticalFlip() None Vertically flip the given image randomly RandomVerticalFlip: {}
Transpose(perm) perm (list): A permutation of the dimensions of the input image Transpose the image according to perm Transpose:
   perm: [1, 2, 0]
CropToBoundingBox(offset_height, offset_width, target_height, target_width) offset_height (int): Vertical coordinate of the top-left corner of the result in the input
offset_width (int): Horizontal coordinate of the top-left corner of the result in the input
target_height (int): Height of the result
target_width (int): Width of the result
Crop an image to a specified bounding box CropToBoundingBox:
   offset_height: 10
   offset_width: 10
   target_height: 224
   target_width: 224
ToTensor() None Convert a PIL Image or numpy.ndarray to tensor ToTensor: {}
ToPILImage() None Convert a tensor or an ndarray to PIL Image ToPILImage: {}
Pad(padding, fill, padding_mode) padding (int or tuple or list): Padding on each border
fill (int or str or tuple): Pixel fill value for constant fill. Default is 0
padding_mode (str): Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant
Pad the given image on all sides with the given “pad” value Pad:
   padding: 0
   fill: 0
   padding_mode: constant
ColorJitter(brightness, contrast, saturation, hue) brightness (float or tuple of float (min, max)): How much to jitter brightness. Default is 0
contrast (float or tuple of float (min, max)): How much to jitter contrast. Default is 0
saturation (float or tuple of float (min, max)): How much to jitter saturation. Default is 0
hue (float or tuple of float (min, max)): How much to jitter hue. Default is 0
Randomly change the brightness, contrast, saturation and hue of an image ColorJitter:
   brightness: 0
   contrast: 0
   saturation: 0
   hue: 0
ToArray() None Convert PIL Image to numpy array ToArray: {}
CropResize(x, y, width, height, size, interpolation) x (int): Left boundary of the cropping area
y (int): Top boundary of the cropping area
width (int): Width of the cropping area
height (int): Height of the cropping area
size (list or int): resize to new size after cropping
interpolation (str, default='bilinear'): Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'
Crop the input image with given location and resize it CropResize:
   x: 0
   y: 5
   width: 224
   height: 224
   size: [100, 100] # or size: 100
   interpolation: bilinear
Cast(dtype) dtype (str, default ='float32'): The target data type Convert image to given dtype Cast:
   dtype: float32
AlignImageChannel(dim) dim (int): Number of channels in the result image Align the image channels; currently only [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported, and the input image must be a PIL Image.
This transform will be deprecated.
AlignImageChannel:
   dim: 3
ResizeWithRatio(min_dim, max_dim, padding) min_dim (int, default=800): Resizes the image such that its smaller dimension == min_dim
max_dim (int, default=1365): Ensures that the image longest side does not exceed this value
padding (bool, default=False): If true, pads image with zeros so its size is max_dim x max_dim
Resize the image while preserving its aspect ratio and optionally pad it to the max shape. If the image is padded, the label is processed at the same time. The input image should be an np.array. ResizeWithRatio:
   min_dim: 800
   max_dim: 1365
   padding: True
LabelShift(label_shift) label_shift (int, default=0): amount to shift the label by Convert label to label - label_shift LabelShift:
   label_shift: 0
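
The cropping and padding rows above (CropToBoundingBox, CenterCrop and Pad with constant fill) can be sketched on a plain nested list, indexed as image[row][col]. This is an illustrative sketch of the geometry only, under the assumption that the center crop rounds the top-left offset down; the helper names are hypothetical and the real transforms operate on PIL Images, tensors or arrays.

```python
from typing import List, Sequence, Union

Image = List[List[int]]  # rows of pixel values, indexed as image[row][col]


def crop_to_bounding_box(image: Image, offset_height: int, offset_width: int,
                         target_height: int, target_width: int) -> Image:
    """Keep target_height rows and target_width columns from the offsets."""
    rows = image[offset_height:offset_height + target_height]
    return [row[offset_width:offset_width + target_width] for row in rows]


def center_crop(image: Image, size: Union[int, Sequence[int]]) -> Image:
    """Crop around the center; an int size means a square crop."""
    target_h, target_w = (size, size) if isinstance(size, int) else size
    height, width = len(image), len(image[0])
    top = (height - target_h) // 2    # assumption: offsets round down
    left = (width - target_w) // 2
    return crop_to_bounding_box(image, top, left, target_h, target_w)


def pad_constant(image: Image, padding: int, fill: int = 0) -> Image:
    """Pad all four sides by `padding` pixels using a constant fill value."""
    width = len(image[0]) + 2 * padding
    top = [[fill] * width for _ in range(padding)]
    bottom = [[fill] * width for _ in range(padding)]
    middle = [[fill] * padding + list(row) + [fill] * padding for row in image]
    return top + middle + bottom


grid = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 test image
center = center_crop(grid, 2)
box = crop_to_bounding_box(grid, 0, 1, 2, 3)
padded = pad_constant([[1]], padding=1)
```

Note that `center_crop` is expressed through `crop_to_bounding_box`, which reflects how the two rows differ only in how the top-left corner is chosen.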

MXNet

Transform Parameters Comments Usage(In yaml file)
Resize(size, interpolation) size (list or int): Size of the result
interpolation (str, default='bilinear'):Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'
Resize the input image to the given size Resize:
   size: 256
   interpolation: bilinear
CenterCrop(size) size (list or int): Size of the result Crop the given image at the center to the given size CenterCrop:
   size: [10, 10] # or size: 10
RandomResizedCrop(size, scale, ratio, interpolation) size (list or int): Size of the result
scale (tuple or list, default=(0.08, 1.0)):range of size of the origin size cropped
ratio (tuple or list, default=(3. / 4., 4. / 3.)): range of aspect ratio of the origin aspect ratio cropped
interpolation (str, default='bilinear'):Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'
Crop the given image to random size and aspect ratio RandomResizedCrop:
   size: [10, 10] # or size: 10
   scale: [0.08, 1.0]
   ratio: [3. / 4., 4. / 3.]
   interpolation: bilinear
Normalize(mean, std) mean (list, default=[0.0]): Mean for each channel. If len(mean)=1, the mean is broadcast to every channel; otherwise its length should match the length of the image shape
std (list, default=[1.0]): Standard deviation for each channel. If len(std)=1, the std is broadcast to every channel; otherwise its length should match the length of the image shape
Normalize an image with mean and standard deviation Normalize:
   mean: [0.0, 0.0, 0.0]
   std: [1.0, 1.0, 1.0]
RandomCrop(size) size (list or int): Size of the result Crop the image at a random location to the given size RandomCrop:
   size: [10, 10] # or size: 10
Compose(transform_list) transform_list (list of Transform objects): list of transforms to compose Compose several transforms together If the transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them.
In user code:
from neural_compressor.experimental.data import TRANSFORMS
# framework: name of the backend in use; resize_args / normalize_args: dicts of
# keyword arguments for each transform (placeholders, define them before use)
preprocess = TRANSFORMS(framework, 'preprocess')
resize = preprocess["Resize"](**resize_args)
normalize = preprocess["Normalize"](**normalize_args)
compose = preprocess["Compose"]([resize, normalize])
sample = compose(sample)
# sample: (image, label)
CropResize(x, y, width, height, size, interpolation) x (int): Left boundary of the cropping area
y (int): Top boundary of the cropping area
width (int): Width of the cropping area
height (int): Height of the cropping area
size (list or int): resize to new size after cropping
interpolation (str, default='bilinear'): Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'
Crop the input image with given location and resize it CropResize:
   x: 0
   y: 5
   width: 224
   height: 224
   size: [100, 100] # or size: 100
   interpolation: bilinear
RandomHorizontalFlip() None Horizontally flip the given image randomly RandomHorizontalFlip: {}
RandomVerticalFlip() None Vertically flip the given image randomly RandomVerticalFlip: {}
CropToBoundingBox(offset_height, offset_width, target_height, target_width) offset_height (int): Vertical coordinate of the top-left corner of the result in the input
offset_width (int): Horizontal coordinate of the top-left corner of the result in the input
target_height (int): Height of the result
target_width (int): Width of the result
Crop an image to a specified bounding box CropToBoundingBox:
   offset_height: 10
   offset_width: 10
   target_height: 224
   target_width: 224
ToArray() None Convert NDArray to numpy array ToArray: {}
ToTensor() None Convert an image NDArray or batch of image NDArray to a tensor NDArray ToTensor: {}
Cast(dtype) dtype (str, default ='float32'): The target data type Convert image to given dtype Cast:
   dtype: float32
Transpose(perm) perm (list): A permutation of the dimensions of the input image Transpose the image according to perm Transpose:
   perm: [1, 2, 0]
AlignImageChannel(dim) dim (int): Number of channels in the result image Align the image channels; currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported.
This transform will be deprecated.
AlignImageChannel:
   dim: 3
ToNDArray() None Convert np.array to NDArray ToNDArray: {}
ResizeWithRatio(min_dim, max_dim, padding) min_dim (int, default=800): Resizes the image such that its smaller dimension == min_dim
max_dim (int, default=1365): Ensures that the image longest side does not exceed this value
padding (bool, default=False): If true, pads image with zeros so its size is max_dim x max_dim
Resize the image while preserving its aspect ratio and optionally pad it to the max shape. If the image is padded, the label is processed at the same time. The input image should be an np.array. ResizeWithRatio:
   min_dim: 800
   max_dim: 1365
   padding: True
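
The interaction of min_dim, max_dim and padding in ResizeWithRatio is easier to see with numbers. The sketch below computes only the output height and width, following one plausible reading of the parameter descriptions above: scale so the shorter side equals min_dim, then shrink the scale if the longer side would exceed max_dim. The function name `resize_with_ratio_shape` is hypothetical, and the library may differ in details such as rounding or whether small images are upscaled.

```python
from typing import Tuple


def resize_with_ratio_shape(height: int, width: int, min_dim: int = 800,
                            max_dim: int = 1365,
                            padding: bool = False) -> Tuple[int, int]:
    """Return the (height, width) after an aspect-preserving resize."""
    # Step 1: scale so that the shorter side becomes min_dim.
    scale = min_dim / min(height, width)
    # Step 2: if the longer side would exceed max_dim, cap it instead.
    if max(height, width) * scale > max_dim:
        scale = max_dim / max(height, width)
    new_height = int(round(height * scale))
    new_width = int(round(width * scale))
    # Step 3: with padding, the resized image is placed on a square canvas.
    if padding:
        return max_dim, max_dim
    return new_height, new_width


fits = resize_with_ratio_shape(400, 500)                  # shorter side -> 800
capped = resize_with_ratio_shape(400, 800, max_dim=1000)  # longer side capped
square = resize_with_ratio_shape(400, 800, max_dim=1000, padding=True)
```

In the capped case the shorter side ends up at 500 rather than min_dim, because in this sketch the max_dim limit takes priority.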

ONNXRT

Transform Parameters Comments Usage(In yaml file)
Resize(size, interpolation) size (list or int): Size of the result
interpolation (str, default='bilinear'): Desired interpolation type, support 'bilinear', 'nearest', 'bicubic'
Resize the input image to the given size Resize:
   size: 256
   interpolation: bilinear
CenterCrop(size) size (list or int): Size of the result Crop the given image at the center to the given size CenterCrop:
   size: [10, 10] # or size: 10
RandomResizedCrop(size, scale, ratio, interpolation) size (list or int): Size of the result
scale (tuple or list, default=(0.08, 1.0)): range of size of the origin size cropped
ratio (tuple or list, default=(3. / 4., 4. / 3.)): range of aspect ratio of the origin aspect ratio cropped
interpolation (str, default='bilinear'): Desired interpolation type, support 'bilinear', 'nearest'
Crop the given image to random size and aspect ratio RandomResizedCrop:
   size: [10, 10] # or size: 10
   scale: [0.08, 1.0]
   ratio: [3. / 4., 4. / 3.]
   interpolation: bilinear
Normalize(mean, std) mean (list, default=[0.0]): Mean for each channel. If len(mean)=1, the mean is broadcast to every channel; otherwise its length should match the length of the image shape
std (list, default=[1.0]): Standard deviation for each channel. If len(std)=1, the std is broadcast to every channel; otherwise its length should match the length of the image shape
Normalize an image with mean and standard deviation Normalize:
   mean: [0.0, 0.0, 0.0]
   std: [1.0, 1.0, 1.0]
RandomCrop(size) size (list or int): Size of the result Crop the image at a random location to the given size RandomCrop:
   size: [10, 10] # or size: 10
Compose(transform_list) transform_list (list of Transform objects): list of transforms to compose Compose several transforms together If the transforms are configured in a yaml file, Neural Compressor automatically calls Compose to group them.
In user code:
from neural_compressor.experimental.data import TRANSFORMS
# framework: name of the backend in use; resize_args / normalize_args: dicts of
# keyword arguments for each transform (placeholders, define them before use)
preprocess = TRANSFORMS(framework, 'preprocess')
resize = preprocess["Resize"](**resize_args)
normalize = preprocess["Normalize"](**normalize_args)
compose = preprocess["Compose"]([resize, normalize])
sample = compose(sample)
# sample: (image, label)
CropResize(x, y, width, height, size, interpolation) x (int): Left boundary of the cropping area
y (int): Top boundary of the cropping area
width (int): Width of the cropping area
height (int): Height of the cropping area
size (list or int): resize to new size after cropping
interpolation (str, default='bilinear'): Desired interpolation type, support 'bilinear', 'nearest'
Crop the input image with given location and resize it CropResize:
   x: 0
   y: 5
   width: 224
   height: 224
   size: [100, 100] # or size: 100
   interpolation: bilinear
RandomHorizontalFlip() None Horizontally flip the given image randomly RandomHorizontalFlip: {}
RandomVerticalFlip() None Vertically flip the given image randomly RandomVerticalFlip: {}
CropToBoundingBox(offset_height, offset_width, target_height, target_width) offset_height (int): Vertical coordinate of the top-left corner of the result in the input
offset_width (int): Horizontal coordinate of the top-left corner of the result in the input
target_height (int): Height of the result
target_width (int): Width of the result
Crop an image to a specified bounding box CropToBoundingBox:
   offset_height: 10
   offset_width: 10
   target_height: 224
   target_width: 224
ToArray() None Convert PIL Image to numpy array ToArray: {}
Rescale() None Scale the values of image to [0,1] Rescale: {}
AlignImageChannel(dim) dim (int): Number of channels in the result image Align the image channels; currently only [H,W]->[H,W,dim], [H,W,4]->[H,W,3] and [H,W,3]->[H,W] are supported.
This transform will be deprecated.
AlignImageChannel:
   dim: 3
ResizeCropImagenet(height, width, random_crop, resize_side, random_flip_left_right, mean_value, scale) height (int): Height of the result
width (int): Width of the result
random_crop (bool, default=False): whether to random crop
resize_side (int, default=256): desired shape after resize operation
random_flip_left_right (bool, default=False): whether to random flip left and right
mean_value (list, default=[0.0,0.0,0.0]): mean for each channel
scale (float, default=1.0): std value
Combination of a series of transforms which is applicable to images in Imagenet ResizeCropImagenet:
   height: 224
   width: 224
   random_crop: False
   resize_side: 256
   random_flip_left_right: False
   mean_value: [123.68, 116.78, 103.94]
   scale: 0.017
Cast(dtype) dtype (str, default ='float32'): The target data type Convert image to given dtype Cast:
   dtype: float32
ResizeWithRatio(min_dim, max_dim, padding) min_dim (int, default=800): Resizes the image such that its smaller dimension == min_dim
max_dim (int, default=1365): Ensures that the image longest side does not exceed this value
padding (bool, default=False): If true, pads image with zeros so its size is max_dim x max_dim
Resize the image while preserving its aspect ratio and optionally pad it to the max shape. If the image is padded, the label is processed at the same time. The input image should be an np.array. ResizeWithRatio:
   min_dim: 800
   max_dim: 1365
   padding: True
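
The value-level steps in this table, Rescale and the mean_value / scale parameters of ResizeCropImagenet, can be sketched per pixel as follows. Two assumptions are made here and are not confirmed by the table: Rescale is taken to divide 8-bit values by 255, and scale is taken to be a multiplier applied after mean subtraction (the table only calls it "std value", but the example value 0.017 is close to 1/58.8, which suggests a multiplier). Both function names are hypothetical.

```python
from typing import List, Sequence


def rescale(pixel: Sequence[float]) -> List[float]:
    """Sketch of Rescale: map values in [0, 255] to [0, 1] (assumed divisor)."""
    return [value / 255.0 for value in pixel]


def subtract_mean_and_scale(pixel: Sequence[float],
                            mean_value: Sequence[float],
                            scale: float = 1.0) -> List[float]:
    """Sketch of the mean_value / scale step: (value - mean) * scale.

    Assumption: scale multiplies the centered value; the library may
    define it differently.
    """
    return [(value - mean) * scale for value, mean in zip(pixel, mean_value)]


unit = rescale([0, 255])
centered = subtract_mean_and_scale([110.0, 50.0, 20.0],
                                   mean_value=[100.0, 50.0, 0.0],
                                   scale=0.5)
```

With the defaults from the table (mean_value of zeros and scale of 1.0), the second step leaves the pixel values unchanged.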