MSc Dissertation at the University of Edinburgh

Compression

snappy-c

Snappy-c is third-party implementation of the snappy algorithm providing C-language bindings
Compiles on Morello purecap without issues, build using gmake. Resulting binaries are executable including a provided verifying script, command line interface and/or object files to be included in other projects.
Running on Morello provides runtime safety, for example providing wrongly sized buffers results in a runtime error. Similarly, testing unaligned sizes with provided buffers, uninitialised memory and similar. Question is whether the code should check for the missmatch or just fail on Morello? I assume on traditional architecture it will fail as well with an alternative error, but the code is then open to certain vulnerabilities, presumably.
Running verify (their testing file) shows no issues. I have written short bash script to handle running on compression-corpus, which contains several files of different formatting, file extensions and length to be tested.
snappy-c on GitHub, snappy-c fork, snappy-c API, Main project website, snappy on GitHub

zip

Library had a buffer overflow bug, which was discovered through their provided tests.

Running tests...
/usr/local64/bin/ctest --force-new-ctest-process 
Test project /usr/home/s1765150/dissertation/zip/build
    Start 1: test_write.out
1/6 Test #1: test_write.out ...................   Passed    0.00 sec
    Start 2: test_append.out
2/6 Test #2: test_append.out ..................   Passed    0.00 sec
    Start 3: test_read.out
3/6 Test #3: test_read.out ....................   Passed    0.00 sec
    Start 4: test_extract.out
4/6 Test #4: test_extract.out .................   Passed    0.00 sec
    Start 5: test_entry.out
5/6 Test #5: test_entry.out ...................Signal 34***Exception:   0.20 sec
    Start 6: test_permissions.out
6/6 Test #6: test_permissions.out .............   Passed    0.01 sec

83% tests passed, 1 tests failed out of 6

Total Test time (real) =   0.22 sec

The following tests FAILED:
          5 - test_entry.out (Signal 34)
Errors while running CTest
Output from these tests are in: /usr/home/s1765150/dissertation/zip/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
gmake: *** [Makefile:74: test] Error 8

In summary, the code contained a check for an array index. The check is tested each loop iteration, but only after accessing the array already. Ultimately, resulting in buffer over-read of the last index, which is out of the bounds of the buffer.

Running the failing test individually results in a In-address space security exception (core dumped) error, which are caused by a flipped && condition of an array index. More detail is provided in the issue filed in the original repository description here issue.

Ruize and I found the bug through different means, we updated each other on the issue and collaborated on finding a fix. The fix has now been upstreamed to the original repository. Testing the new release on Morello all tests now pass successfully.

zip, "This is done by hacking awesome miniz library and layering functions on top of the miniz v2.2.0 API."

FastLZ

Found a bug, Ruize found a fix, which makes all the tests pass. A buffer overflow caused a In-address space security exception (core dumped).
Code alteration can be found in the submodule, which is a fork of the original repository. Bug fix included and added defined(__aarch64__) || defined(_M_ARM64) to recognise ARM as a 64-bit system.
FastLZ, FastLZ fork

zlib

Compiles with no issues, all tests pass
Main project website, zlib on GitHub

QuickLZ

Seems to cause a memory fault In-address space security exception (core dumped), this happens only when decompressing with QLZ_STREAMING_BUFFER 0 and QLZ_COMPRESSION_LEVEL 1
The code contains a large amount of sections, which are either omitted or included in compilation based on several defines

Decompression investigation and bugs

An investigation shows a cast of (const unsigned char *)(size_t) from a const unsigned char * type, which causes the store pointer to not be dereferencable. A pointer variable is cast to an integer type and than back to a pointer, which under CHERI results in an invalid capability due to missing bounds as they cannot be infered.

typedef struct 
{
#if QLZ_COMPRESSION_LEVEL == 1
 const unsigned char *offset;
#else
 const unsigned char *offset[QLZ_POINTERS];
#endif
} qlz_hash_decompress;

A decompression struct is created to contain an offset for compression tracking, which under QLZ_COMPRESSION_LEVEL 1 is only ever of a pointer type. The snippet below showcases the codeline, which performs the undesirable casting.

offset2 = (const unsigned char *)(size_t)state->hash[hash].offset;

The offset is cast and later dereferenced in the code causing a capability fault at runtime. The cast is removed such that the offset can be stored as a dereferencable pointer and a valid capability.

Compression investigation

In addition, there is also a 64-bit flag, which enables some additional code segments and provides a macro for casting. The code specifies it enables 64-bit optimization.

#if QLZ_COMPRESSION_LEVEL == 1 && defined QLZ_PTR_64 && QLZ_STREAMING_BUFFER == 0
 #define OFFSET_BASE source
 #define CAST (ui32)(size_t)
#else
 #define OFFSET_BASE 0
 #define CAST
#endif

An example of such usage is shown below:

const unsigned char *o;
ui32 hash, cached;

hash = hash_func(fetch);
cached = fetch ^ state->hash[hash].cache;
state->hash[hash].cache = fetch;

o = state->hash[hash].offset + OFFSET_BASE;
state->hash[hash].offset = CAST(src - OFFSET_BASE);

This specific code fragment stores an offset into the o pointer variable. The offset is defined as follows:

typedef struct 
{
#if QLZ_COMPRESSION_LEVEL == 1
 ui32 cache;
#if defined QLZ_PTR_64 && QLZ_STREAMING_BUFFER == 0
 unsigned int offset;
#else
 const unsigned char *offset;
#endif
#else
 const unsigned char *offset[QLZ_POINTERS];
#endif
} qlz_hash_compress;

As a result it can be either an integer or pointer type under QLZ_COMPRESSION_LEVEL 1. This can cause a similar danger to previous mentioned scenarios under decompression. The evaluation suggests that the capability is always correctly constructed. The OFFSET_BASE is created based on the source buffer for compression and when incremented with the offset from the struct a valid pointer is constructed. The result is then stored and and a chunk of the buffer is compressed. The updated value also uses the OFFSET_BASE and integer casting to appropriatly traverse it during the compression iteration.

QuickLZ fork

lzbench

large compilation project pulling multiple compression libraries into one big repository
Abandoned for now
lzbench

minilzo

Minilzo is a subset of the lzo library intended to be included in other projects
Can't build it, missing defined variable names
minilzo, lzo, project include

miniz

Having a hard time running the build script, multiple failures caused by the linker
Abandoned for now
miniz

liblzf

Abandoned

ImageMagick

Running instructions

First run the configure script, which automatically sets up a building Makefile with the default installation location:

./configure --build=aarch64-unknown-freebsd14.0

The --build=aarch64-unknown-freebsd14.0 flag is used to recognise the custom aarch64c achitecture.

Optionally, if you later wish to install the tool set a custom install directory with the --prefix=PREFIX flag, e.g. --prefix=$HOME/im to install in the user home directory under im.

I run my configuration as:

./configure --build=aarch64-unknown-freebsd14.0 --prefix=$HOME/im

Other flags can also be provided to run the building script. The building including feature toggles can be fully customised. To view all the options visit Install from Source, Advanced Linux Installation or run ./configure -h.

In order to compile the source code on Morello run gmake inside the ImageMagick directory.

Now the tool can be run by:

./utilities/magick

The tool can be optionally install the binaries with gmake install either to the default /usr/local or the --prefix= defined directory. Included tests can be run with gmake check. To remove all files produced by installation run gmake uninstall. In addition, gmake clean and gmake distclean are also available to remove files produced by compilation using gmake and running the configure script, respectively.

Porting

Several compilation warnings show up during build: warning: cast from provenance-free integer type to pointer type will give pointer that can not be dereferenced [-Wcheri-capability-misuse]
The issues appear in MagickWand\wand.c and MagickCore\distribute-cache.c.
Community discussion here.

The below example is taken from wand.c lines 74 to 94 before ImageMagick community changes were made.

WandExport size_t AcquireWandId(void)
{
  static size_t
    id = 0;

  size_t
    wand_id;

  if (wand_semaphore == (SemaphoreInfo *) NULL)
    ActivateSemaphoreInfo(&wand_semaphore);
  LockSemaphoreInfo(wand_semaphore);
  if (wand_ids == (SplayTreeInfo *) NULL)
    wand_ids=NewSplayTree((int (*)(const void *,const void *)) NULL,
      (void *(*)(void *)) NULL,(void *(*)(void *)) NULL);
  wand_id=id++;
  (void) AddValueToSplayTree(wand_ids,(const void *) wand_id,
    (const void *) wand_id);
  instantiate_wand=MagickTrue;
  UnlockSemaphoreInfo(wand_semaphore);
  return(wand_id);
}

The compiler warning aries from the fact that an integer type of wand_id variable is being cast to a void * pointer. As a result a pointer is created from an integer value, which is not dereferencable. Capability tag is 0 and bounds cannot be set. ImageMagick developers argue that it is permitted under the C standard and expect the size_t to be capable of holding a pointer. Such assumption does not always hold true. A great dicussion on this topic is provided here.

The SplayTree structure is designed to hold a wide variety of types including integers, strings, blobs, etc. The ImageMagick codebase is constrained by keeping the API consistent, for example changes to update function signatures to use the uintptr_t type is not viable.

As a result the ImageMagick developers created a solution, which does remove the warnings. A proposed solution uses a NULL pointer, which is incremented and passed as the function argument. Such solution has several problems, which are explained in more detail in the ImageMagick discussion.

A first update to the ImageMagick code changes the wand_id to a pointer type as shown below:

WandExport size_t AcquireWandId(void)
{
  size_t
    id;

  static size_t
    *wand_id = NULL;

  if (wand_semaphore == (SemaphoreInfo *) NULL)
    ActivateSemaphoreInfo(&wand_semaphore);
  LockSemaphoreInfo(wand_semaphore);
if (wand_ids == (SplayTreeInfo*) NULL)
    wand_ids=NewSplayTree((int (*)(const void*,const void *)) NULL,
(void*(*)(void*)) NULL,(void *(*)(void *)) NULL);
  wand_id++;
id=(size_t) (wand_id-(size_t*) NULL);
  (void) AddValueToSplayTree(wand_ids,wand_id,wand_id);
  instantiate_wand=MagickTrue;
  UnlockSemaphoreInfo(wand_semaphore);
  return(id);
}

This implementation has been very quickly changed to the solution below:

WandExport size_t AcquireWandId(void)
{
  const size_t
    *wand_id = (const size_t *) NULL;

  size_t
    id;

  static size_t
    isn = 0;

  if (wand_semaphore == (SemaphoreInfo *) NULL)
    ActivateSemaphoreInfo(&wand_semaphore);
  LockSemaphoreInfo(wand_semaphore);
  if (wand_ids == (SplayTreeInfo *) NULL)
    wand_ids=NewSplayTree((int (*)(const void *,const void *)) NULL,
      (void *(*)(void *)) NULL,(void *(*)(void *)) NULL);
  id=isn++;
  (void) AddValueToSplayTree(wand_ids,wand_id+id,wand_id+id);
  instantiate_wand=MagickTrue;
  UnlockSemaphoreInfo(wand_semaphore);
  return(id);
}

Semantically, it is pretty much the same implementation and exhibits similar issues.

Firstly, the proposed solution of pointer arithmetic increases the wand_id pointer value by the size of the type being pointed to. Essentially, the ids are not incremented by 1 anymore. The operation of wand_id+id results in increments of 8 in my environment.

Secondly, I think that the code can be fairly complicated to understand as the intention might not be clear to developers. This could potentially cause issues in future changes to the software. The value is created from a NULL pointer, which is then still treated as an integer. The definition of incrementing a NULL pointer doesn't seem to be well defined in the C language either. In fact, the UndefinedBehaviorSanitizer from LLVM says it's undefined behaviour.

As a result a suggestion was proposed to use the uintptr_t. The usage of uintptr_t, which is an integer type designed to hold values that might be valid pointers if cast back to a pointer type (see 4.2.1 in CHERI C/C++ Programming Guide). Such changes would show the intention of storing an integer value as a pointer.

Some concern have been raised with the uintptr_t being optional part of the C11 standard (they prefer the c++ compiler, C++ as the better C).

An alternative solution would be to allocate the id, of type size_t, on the heap as that guarantees a proper pointer to secure in the splay-tree.

A new solution taking the uintptr_t into account is shown below.

WandExport size_t AcquireWandId(void)
{
  MagickAddressType
    wand_id;

  static size_t
    id = 0;

  if (wand_semaphore == (SemaphoreInfo *) NULL)
    ActivateSemaphoreInfo(&wand_semaphore);
  LockSemaphoreInfo(wand_semaphore);
  if (wand_ids == (SplayTreeInfo *) NULL)
    wand_ids=NewSplayTree((int (*)(const void *,const void *)) NULL,
      (void *(*)(void *)) NULL,(void *(*)(void *)) NULL);
  wand_id=id++;
  (void) AddValueToSplayTree(wand_ids,(const void *) wand_id,(const void *)
    wand_id);
  instantiate_wand=MagickTrue;
  UnlockSemaphoreInfo(wand_semaphore);
  return((size_t) wand_id);
}

The optionality of uintptr_t has been resolved with a macro below, which has already been part of the code-base previously.

#if MAGICKCORE_HAVE_UINTPTR_T || defined(uintptr_t)
typedef uintptr_t MagickAddressType;
#else
/* Hope for the best, I guess. */
typedef size_t MagickAddressType;
#endif

Issues

Toggling features and packages in configuration using --with-something, --without-something for packages and --enable-something, --disable-something for features
- --disable-docs disable documentation flag? It seems to build fine with it
- --without-x disable X11? It seems to build fine with it
- --disable-installed is it needed? creates a distributable portable installation

Testing results

============================================================================
Testsuite summary for ImageMagick 7.1.0-44
============================================================================
# TOTAL: 86
# PASS:  80
# SKIP:  0
# XFAIL: 0
# FAIL:  3
# XPASS: 0
# ERROR: 3

Some tests seem to be failing due to not being able to access fonts or other dependencies which are not included in the configuration.

Notes and links

ImageMagick, ImageMagick on GitHub

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
FastLZ @ 730fbec		FastLZ @ 730fbec
ImageMagick @ f2398de		ImageMagick @ f2398de
QuickLZ @ 008b9cd		QuickLZ @ 008b9cd
examples		examples
snappy-c @ d71c35b		snappy-c @ d71c35b
zip @ 114d9d4		zip @ 114d9d4
zlib @ 21767c6		zlib @ 21767c6
.gitmodules		.gitmodules
README.md		README.md
fastlz.sh		fastlz.sh
snappy-c.sh		snappy-c.sh
zip.sh		zip.sh
zlib.sh		zlib.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MSc Dissertation at the University of Edinburgh

Table of contents

Compression

snappy-c

zip

FastLZ

zlib

QuickLZ

Decompression investigation and bugs

Compression investigation

lzbench

minilzo

miniz

liblzf

ImageMagick

Running instructions

Porting

Issues

Testing results

Notes and links

Miscellaneous porting

About

Releases

Packages

Languages

ka5p3rr/MScDissertation

Folders and files

Latest commit

History

Repository files navigation

MSc Dissertation at the University of Edinburgh

Table of contents

Compression

snappy-c

zip

FastLZ

zlib

QuickLZ

Decompression investigation and bugs

Compression investigation

lzbench

minilzo

miniz

liblzf

ImageMagick

Running instructions

Porting

Issues

Testing results

Notes and links

Miscellaneous porting

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages