Code and Compression

Let's Play

Let's play the Compression Game

Working with Sequences

  • Alphabet: A = {a,b,c, ... } (Set of tokens)
  • Sequence: S = (t1, t2, t3, ...) (String of tokens)
  • Size of Alphabet: |A|
  • Encodings (ASCII, Unicode, Binary ...)
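As a small illustration of these terms, the following sketch derives the alphabet and its size from a sequence. The variable names and the choice of a digit string as the sequence are assumptions made for this example only.

```python
import math

sequence = "11111222333333333333"        # S: a string of tokens
alphabet = sorted(set(sequence))          # A: the distinct tokens that occur in S
alphabet_size = len(alphabet)             # |A|

# Smallest fixed number of bits that can tell |A| tokens apart (at least 1).
bits_per_token = max(1, math.ceil(math.log2(alphabet_size)))

# ASCII stores every token of this sequence in one byte (8 bits).
ascii_bytes = sequence.encode("ascii")

print(alphabet, alphabet_size, bits_per_token, len(ascii_bytes))
```

The gap between `bits_per_token` and the 8 bits ASCII spends per token is a first hint of how much room a compressor has to work with.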

Recap: Turtle Graphics

  • Mapping Token Sequences to Turtle Graphics
    • Absolute direction corresponding to a token
    • Relative direction corresponding to a token
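The difference between the two mappings can be sketched without drawing anything, by computing the points a turtle would visit. The token meanings used here (0 to 3 as east, north, west, south, or as a number of quarter turns to the left) and the function names are assumptions for illustration, not the mapping used in the course.

```python
# Unit steps for the four headings: east, north, west, south.
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def path_absolute(tokens):
    """Each token 0..3 names the heading of the next step directly."""
    x, y = 0, 0
    points = [(x, y)]
    for t in tokens:
        dx, dy = HEADINGS[int(t) % 4]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def path_relative(tokens):
    """Each token 0..3 is a number of quarter turns to the left before stepping."""
    x, y, heading = 0, 0, 0
    points = [(x, y)]
    for t in tokens:
        heading = (heading + int(t)) % 4
        dx, dy = HEADINGS[heading]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

print(path_absolute("0123"))  # a closed square, one token per side
print(path_relative("1111"))  # also a closed square, but every token is the same
```

With relative directions a square becomes a run of identical tokens, which is exactly the kind of regularity the encodings in the following sections exploit.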

Run Length Encoding (RLE)

Visual Example I

The original image is on the left, the run-length code in the center, and the decoded image on the right.
Each pixel corresponds to one token (or one byte).

Visual Example II

The original image is on the left, the glitched run-length code in the center, and the decoded image on the right.

Approach

Identify repeating Tokens.

  • Scale Level: Tokens

Encoding Example

  • rle(11111111111111111111, 1) → 91 + 91 + 21 = 919121
  • rle(11111222333333333333, 1) → 51 + 32 + 93 + 33 = 51329333
  • rle(12345678912345678912, 1) → 11 + 12 + 13 + ... + 11 + 12 = 1112131415161718191112131415161718191112
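A minimal sketch of such an encoder, assuming single-character tokens, a count written in front of each token, and a maximum run of 9 so that every count fits into one digit. The name `rle_encode` and the parameter `max_run` are hypothetical.

```python
def rle_encode(seq, max_run=9):
    """Encode seq as (count, token) pairs, splitting runs longer than max_run."""
    out = []
    i = 0
    while i < len(seq):
        run = 1
        # Extend the run while the next token repeats and the count still fits.
        while i + run < len(seq) and seq[i + run] == seq[i] and run < max_run:
            run += 1
        out.append(f"{run}{seq[i]}")
        i += run
    return "".join(out)

print(rle_encode("11111111111111111111"))
print(rle_encode("11111222333333333333"))
print(rle_encode("12345678912345678912"))
```

The third input shows the worst case: without any repetition every token gains a count of 1, so the code is twice as long as the original.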

Decoding Example

  • rle(123456789123456789, -1) → 12 + 34 + 56 + 78 + 91 + 23 + 45 + 67 + 89 = 244466666888888811111111133555577777799999999
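Decoding reads the code two digits at a time, the first as a count and the second as the token to repeat. The sketch below assumes that fixed two-digit layout; `rle_decode` is a hypothetical name.

```python
def rle_decode(code):
    """Expand a string of (count, token) digit pairs back into the token sequence."""
    if len(code) % 2:
        raise ValueError("expected an even number of digits (count, token pairs)")
    return "".join(code[i + 1] * int(code[i]) for i in range(0, len(code), 2))

print(rle_decode("51329333"))
print(rle_decode("123456789123456789"))
```

Any string with an even number of digits decodes to something, which is why a glitched run-length code still produces an image, only a different one.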

Problems

  • How to save the RLE as a sequence of tokens?

    1. Fixed Token Size + Maximum Repeat
    2. Fixed Token Size + Escape Codes (See also: UTF8)
    3. Separator Token
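The third option can be sketched as follows, assuming single-character tokens and a separator `;` that is not part of the token alphabet. Counts may then have any number of digits, because the last character of each chunk is always the token. The names `SEP`, `rle_encode_sep` and `rle_decode_sep` are hypothetical.

```python
from itertools import groupby

SEP = ";"  # assumed separator; it must not occur in the token alphabet

def rle_encode_sep(seq):
    """Write each run as <count><token> and join the runs with SEP."""
    return SEP.join(f"{len(list(group))}{token}" for token, group in groupby(seq))

def rle_decode_sep(code):
    """Split on SEP; in each chunk the last character is the token, the rest the count."""
    if not code:
        return ""
    return "".join(chunk[-1] * int(chunk[:-1]) for chunk in code.split(SEP))

print(rle_encode_sep("11111111111111111111"))
print(rle_encode_sep("11111222333333333333"))
```

Compared with a fixed maximum repeat, long runs no longer need to be split, at the price of one extra symbol in the alphabet and one separator per run.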

Delta Encoding

Identify Differences between successive Tokens

  • Scale Level: Ordered Tokens, Circular Ordered Tokens

Encoding Example

  • delta(11111111111111111111, 1) → 10000000000000000000
  • delta(11111222333333333333, 1) → 10000100100000000000
  • delta(12345678912345678912, 1) → 11111111121111111121
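These examples are consistent with taking differences modulo 10 and treating the token before the first one as 0, so the step from 9 back to 1 is written as 2. A sketch under that assumption, with the hypothetical name `delta_encode`:

```python
def delta_encode(seq, modulus=10):
    """Replace every digit by its difference to the previous digit, modulo `modulus`."""
    out = []
    previous = 0  # assumed start value before the first token
    for ch in seq:
        value = int(ch)
        out.append(str((value - previous) % modulus))
        previous = value
    return "".join(out)

print(delta_encode("11111111111111111111"))
print(delta_encode("11111222333333333333"))
print(delta_encode("12345678912345678912"))
```

Delta encoding does not shorten the sequence by itself. It turns steady values and steady slopes into long runs of the same token, which a run-length encoder can then compress.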

Decoding Example

  • delta(00000000000000000000, -1) → 00000000000000000000
  • delta(11111111111111111111, -1) → 12345678901234567890
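Decoding is a running sum over the differences. The sketch assumes digits as tokens, a start value of 0 and a circular order modulo 10; `delta_decode` is a hypothetical name.

```python
def delta_decode(code, modulus=10):
    """Rebuild the digits as a running sum of the differences, modulo `modulus`."""
    out = []
    total = 0  # assumed start value before the first token
    for ch in code:
        total = (total + int(ch)) % modulus
        out.append(str(total))
    return "".join(out)

print(delta_decode("00000000000000000000"))
print(delta_decode("11111111111111111111"))
```

Because every decoded token depends on all differences before it, a single wrong digit in the code shifts every token that follows.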

Index Encoding

  • Identify Sequences of Tokens encountered before

Encoding Example

  • idx(123123123123, 1) → 1 + 2 + 3 + mem(-3, 3) + mem(-6, 6) → 0102033366
  • idx(12345678912345678912, 1) → 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + mem(-9, 9) + mem(-9, 2) → 0102030405060708099992
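One way to produce such a code is a greedy search: at every position, look for the longest stretch of upcoming tokens that already occurs in the output so far. The sketch assumes the two-digit layout suggested by the examples: `0t` stands for a literal token `t`, and any other pair stands for a back reference with offset and length (each 1 to 9). The name `idx_encode`, the limits, the restriction to references that do not overlap the current position, and the tie-breaking are all assumptions.

```python
def idx_encode(seq, max_offset=9, max_length=9):
    """Greedy index encoding: literals as '0t', back references as '<offset><length>'."""
    out = []
    pos = 0
    while pos < len(seq):
        best_offset, best_length = 0, 0
        for offset in range(1, min(max_offset, pos) + 1):
            # A reference may only cover tokens that were already emitted.
            limit = min(max_length, offset, len(seq) - pos)
            length = 0
            while length < limit and seq[pos - offset + length] == seq[pos + length]:
                length += 1
            if length > best_length:  # on ties the smaller offset is kept
                best_offset, best_length = offset, length
        if best_length >= 2:  # a reference costs two digits, like a literal
            out.append(f"{best_offset}{best_length}")
            pos += best_length
        else:
            out.append(f"0{seq[pos]}")
            pos += 1
    return "".join(out)

print(idx_encode("123123123123"))
print(idx_encode("12345678912345678912"))
```

Other valid encodings of the same input exist, for example one that refers back three tokens several times. The greedy choice simply prefers the longest match available.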

Decoding Example

  • idx(0102033366) → 123 + idx(3366) → 123123 + idx(66) → 123123123123
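The decoding steps can be sketched as a loop over digit pairs, assuming the layout used in the example: a pair starting with `0` carries a literal token, any other pair is an offset and a length pointing back into the tokens decoded so far. `idx_decode` is a hypothetical name.

```python
def idx_decode(code):
    """Decode '0t' literals and '<offset><length>' back references."""
    if len(code) % 2:
        raise ValueError("expected an even number of digits")
    out = []
    for i in range(0, len(code), 2):
        first, second = code[i], code[i + 1]
        if first == "0":
            out.append(second)  # literal token
            continue
        offset, length = int(first), int(second)
        start = len(out) - offset
        if start < 0:
            raise ValueError("back reference points before the start of the output")
        # Copy token by token so that a reference may run into its own output.
        for k in range(length):
            out.append(out[start + k])
    return "".join(out)

print(idx_decode("0102033366"))
```

The decoder needs no search at all, which is typical for this family of methods: encoding is expensive, decoding is a sequence of copies.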

Grammar Encoding

  • Replace frequently used tokens by new tokens (i.e. Frequency Encoding)
  • Rewrite results as a Grammar

Encoding Example

  • (1212121212121212121212) → (33333333333, 312)
  • (1122112211221122112211) → (34343434343, 311, 422)
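In these examples a rule such as 312 can be read as "token 3 stands for 12". A minimal sketch of the replacement step, assuming that the most frequent pair of neighbouring tokens is replaced in each round, that ties go to the lexicographically smallest pair, and that the new single-digit tokens (starting at 3) do not occur in the input. The names `grammar_encode` and `grammar_decode` and the `rounds` parameter are hypothetical.

```python
def grammar_encode(seq, rounds, first_new_token=3):
    """Replace the most frequent adjacent pair by a new token, once per round."""
    rules = {}
    new_token = first_new_token
    for _ in range(rounds):
        pairs = {seq[i:i + 2] for i in range(len(seq) - 1)}
        if not pairs:
            break
        # Highest non-overlapping count wins; ties go to the smallest pair.
        best = min(pairs, key=lambda pair: (-seq.count(pair), pair))
        if seq.count(best) < 2:
            break  # nothing repeats, so a new rule would not pay off
        symbol = str(new_token)
        rules[symbol] = best
        seq = seq.replace(best, symbol)
        new_token += 1
    return seq, rules

def grammar_decode(seq, rules):
    """Expand the rules in reverse order of their creation."""
    for symbol, body in reversed(list(rules.items())):
        seq = seq.replace(symbol, body)
    return seq

print(grammar_encode("1212121212121212121212", rounds=1))
print(grammar_encode("1122112211221122112211", rounds=2))
```

Running more rounds would keep pairing the new tokens with each other and build a hierarchy of rules, which is the idea that Sequitur develops further.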

LZW Encoding

  • Lempel-Ziv-Welch Encoding (LZW) = Frequency + Index Encoding
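  • Used in GIF and TIFF images and in Unix compress; ZIP files typically use the related DEFLATE method (LZ77 plus Huffman coding)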
  • Lempel-Ziv-Welch (LZW) on Wikipedia
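A compact sketch of the textbook form of LZW, assuming string tokens and integer output codes. The table starts with one entry per token of the alphabet and grows by one entry for every code that is written. The names `lzw_encode` and `lzw_decode` are hypothetical, and real file formats add details such as variable code widths and table resets that are left out here.

```python
def lzw_encode(seq, alphabet):
    """Return a list of table indices; the table grows while encoding."""
    table = {token: code for code, token in enumerate(alphabet)}
    out = []
    current = ""
    for token in seq:
        candidate = current + token
        if candidate in table:
            current = candidate  # keep extending the known phrase
        else:
            out.append(table[current])
            table[candidate] = len(table)  # remember the new phrase
            current = token
    if current:
        out.append(table[current])
    return out

def lzw_decode(codes, alphabet):
    """Rebuild the same table on the fly and expand the codes."""
    table = {code: token for code, token in enumerate(alphabet)}
    if not codes:
        return ""
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The phrase being referenced is the one just being defined.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)

codes = lzw_encode("1212121212121212121212", "12")
print(codes)
print(lzw_decode(codes, "12"))
```

The table itself is never stored: the decoder reconstructs it from the codes, which is what distinguishes LZW from the explicit rule lists of the grammar examples.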

Sequitur

Upwrite Predictor

  • Upwrite Predictor Project Site on Archive.org
  • The UpWrite Predictor: A General Grammatical Inference Engine for Symbolic Time Series, with Applications in Natural Language Acquisition and Data Compression (PhD thesis of Jason Hutchens, PDF)

Exercises

Exercises can be found here.

Solutions

Solutions can be found here.

Papers