
Several Problems when using Keras Image Augmentation Layers to Augment Images and Masks for a Semantic Segmentation Dataset #20857

RabJon opened this issue Feb 4, 2025 · 3 comments

RabJon commented Feb 4, 2025

In the following, I will tell you a bit of my “story” about how I read through the various layers and (partly outdated) tutorials on image augmentation for semantic segmentation, what problems and bugs I encountered, and how I was able to solve them, at least for my use case. I hope that I am reporting these issues in the right place and that my “story” helps the responsible developers.

I am currently using the latest versions of Keras (3.8) and TensorFlow (2.18.0) on an Ubuntu 23.10 machine, but I have also observed the issues with other setups.

My goal was to use some of the Image Augmentation Layers from Keras to augment the images AND masks in my semantic segmentation dataset. To do this I followed the segmentation tutorial from TensorFlow, where a custom Augment class is used for augmentation (sketched below).
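For reference, the Augment class from that tutorial looks roughly like this (paraphrased: two layer instances that share the same seed, so they apply the same random flip to images and labels):

import tensorflow as tf

class Augment(tf.keras.layers.Layer):
    def __init__(self, seed=42):
        super().__init__()
        # Both layers use the same seed, so they make the same random changes.
        self.augment_inputs = tf.keras.layers.RandomFlip(mode="horizontal", seed=seed)
        self.augment_labels = tf.keras.layers.RandomFlip(mode="horizontal", seed=seed)

    def call(self, inputs, labels):
        return self.augment_inputs(inputs), self.augment_labels(labels)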

This code works fine because the Augment class uses only a single augmentation layer. But for my custom code I wanted to use more augmentations. Therefore, I "combined" several of the augmentation layers, trying both the normal Pipeline layer and my preferred RandomAugmentationPipeline layer from keras_cv (see the sketch below).
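In essence, the combined augmenters looked roughly like this (the concrete layers and parameters here are illustrative, not my exact configuration):

import keras
import keras_cv

SEED = 42

# Variant 1: apply all layers in sequence.
pipeline_augmenter = keras.layers.Pipeline([
    keras.layers.RandomFlip("horizontal", seed=SEED),
    keras.layers.RandomRotation(0.1, seed=SEED),
])

# Variant 2: apply a random subset of the layers to each image.
random_pipeline_augmenter = keras_cv.layers.RandomAugmentationPipeline(
    layers=[
        keras_cv.layers.RandomFlip(mode="horizontal", seed=SEED),
        keras_cv.layers.RandomRotation(factor=0.1, seed=SEED),
    ],
    augmentations_per_image=1,
    seed=SEED,
)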

I provide an example implementation in this gist. The gist is a modification of the aforementioned image segmentation tutorial from TensorFlow, in which I intended only to change the Augment class slightly to fit my use case and to remove all the unnecessary parts. Unfortunately, I also had to touch the code that loads the dataset, because the dataset version used in the notebook was no longer supported. But this is another issue.

Anyway, at the bottom of the notebook you can see some example images and masks from the data pipeline, and if you inspect them in more detail you can see that for some examples the images and masks do not match because they were augmented differently, which is obviously not the expected behaviour. You can also see that these mismatches already happen from the first epoch on. On my system, with my custom use case (where I was using the RandomAugmentationPipeline layer), it happened only from the second epoch onward, which made debugging much more difficult.

I first assumed that one specific augmentation layer caused the problems, but after trying them one by one I found that it is the combination of layers that causes them. So I started to think about possible solutions and also tried to combine the layers with the Sequential model from Keras instead of the Pipeline layer, but the result remained the same.

I found that the random seed is the only parameter that can be used to "control and sync" the augmentation of images and masks, but I had already ensured that I always used the same seed. So I started digging into the source code of the different augmentation layers and found that most of the ones I was using implement a transform_segmentation_masks() method, which is, however, not used if the layers are called as described in the tutorial. To enforce the call of this method, images and masks must be passed to the same augmentation layer object as a dictionary with the keys “images” and “segmentation_masks”, instead of being passed to two different augmentation layer objects. However, I had not seen this kind of dictionary call in any tutorial, neither for TensorFlow nor for Keras. Nevertheless, I decided to change my code to make use of the transform_segmentation_masks() method, as I hoped that if such a method already existed, it would also process the images and masks correctly and thus avoid the mismatches (see the sketch below).
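For illustration, the dictionary-based call looks roughly like this (augmenter stands for one of the combined pipelines above; train_dataset is a placeholder for a tf.data.Dataset yielding (image, mask) pairs):

import tensorflow as tf

def augment_pair(image, mask):
    # Passing both tensors through ONE layer object as a dictionary routes
    # the masks through transform_segmentation_masks() instead of the
    # image code path.
    outputs = augmenter({"images": image, "segmentation_masks": mask})
    return outputs["images"], outputs["segmentation_masks"]

train_batches = train_dataset.map(augment_pair, num_parallel_calls=tf.data.AUTOTUNE)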
Unfortunately, this was not directly the case, because some of the augmentation layers changed the masks to float32 data types although the input was uint8. This happened even with layers such as “RandomFlip”, which should not change or interpolate the data at all, but only mirror it. So I had to wrap all layers again in a custom layer, which casts the data back to the input data type before returning it:

import keras
import keras_cv
from keras_cv.layers import BaseImageAugmentationLayer


class ApplyAndCastAugmenter(BaseImageAugmentationLayer):
    """Wraps another augmentation layer and casts its outputs back to the
    input dtypes, so that e.g. uint8 masks do not come back as float32."""

    def __init__(self, other_augmenter, **kwargs):
        super().__init__(**kwargs)
        self.other_augmenter = other_augmenter

    def call(self, inputs):
        # Record the dtype(s) of the inputs before augmenting.
        is_dict = isinstance(inputs, dict)
        if is_dict:
            output_dtypes = {key: value.dtype for key, value in inputs.items()}
        else:
            output_dtypes = inputs.dtype

        outputs = self.other_augmenter(inputs)

        # Cast every output back to the dtype of the corresponding input.
        if is_dict:
            for key in outputs.keys():
                outputs[key] = keras.ops.cast(outputs[key], output_dtypes[key])
        else:
            outputs = keras.ops.cast(outputs, output_dtypes)

        return outputs
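Wrapped this way, the layers can be combined as before, building on the class above; for example (layer choices again illustrative):

SEED = 42
wrapped_layers = [
    ApplyAndCastAugmenter(keras_cv.layers.RandomFlip(mode="horizontal", seed=SEED)),
    ApplyAndCastAugmenter(keras_cv.layers.RandomRotation(factor=0.1, seed=SEED)),
]
augmenter = keras_cv.layers.RandomAugmentationPipeline(
    layers=wrapped_layers, augmentations_per_image=1, seed=SEED
)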

With this (admittedly unpleasant) workaround, I was able to fix all the remaining problems. Images and masks finally matched perfectly after augmentation, which immediately showed as a significant performance improvement when training my segmentation model.

For the future, I would like the wrapping with my custom ApplyAndCastAugmenter to become unnecessary, with the casting handled directly by the transform_segmentation_masks() method. It would also be good to have a proper tutorial on image augmentation for semantic segmentation, or to update the old ones.

fchollet (Collaborator) commented Feb 4, 2025

Thanks for the report. @shashaka do you think we should add this cast op in the base image augmentation layer?

shashaka (Contributor) commented Feb 4, 2025

@fchollet I'm not sure whether dtype casting is necessary for most users. However, if needed, it would be better to make it optional through a configuration flag in the base layer, allowing users to enable it only when required.

RabJon (Author) commented Feb 4, 2025

Sorry to barge into your discussion, but it reminded me of another behavior I observed. I've added it to the end of the gist under the heading “Showcase of tf.switch_case error due to incompatible dtypes”.

The point is that the tf.data.Dataset.map() method immediately throws an exception if the dtype of the (possibly augmented) segmentation masks is not uniform. This was the original reason why I created my ApplyAndCastAugmenter layer. The behavior can only be observed if the keras_cv layers BaseImageAugmentationLayer and RandomAugmentationPipeline are used; a minimal sketch of the failure mode follows.
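As a minimal standalone sketch (not the exact code from the gist): RandomAugmentationPipeline apparently dispatches to one of its layers via tf.switch_case, and tf.switch_case requires all branches to return tensors with the same dtype, so a pipeline whose layers disagree on the output dtype fails:

import tensorflow as tf

x = tf.constant([[1, 2], [3, 4]], dtype=tf.uint8)

branches = [
    lambda: tf.cast(x, tf.float32),  # like a layer that silently casts to float32
    lambda: x,                       # like a layer that preserves uint8
]

# Raises an error because the branch outputs have incompatible dtypes,
# which is the same failure that surfaces inside tf.data.Dataset.map().
y = tf.switch_case(tf.constant(0), branch_fns=branches)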

However, if keras.layers.Layer and keras.layers.Pipeline are used, as in the commented-out code above, the exceptions do not occur, and the augmentations of images and masks also seem to match, as I have just found out.
