This is a work-in-progress reverse-engineered specification of ESRI .style files. This document tends to be out of date, with the best documentation of the specification being the Python parser code itself.
ESRI .style files are actually renamed Microsoft Access databases. Renaming the .style file to .mdb allows it to be opened directly within Access.
The database contains a number of tables for the different style types, eg "Line Symbols", "Marker Symbols", "Color Ramps". While these tables contain fields for the symbol name, category, and tags, unfortunately the symbol itself is stored inside a binary data column. Note that earlier versions of style files (pre ArcGIS 10?) do not have the tags field in these tables.
The included slyr/tools/style_to_bin.py script can bulk export these symbol binary blobs from a .style file. (Note that all command line tools in slyr require mdbtools to be available on the current path.)
We'll refer to a binary blob representing a single symbol as a "symbol section".
While .lyr files include much more than just a layer's symbology, they do appear to utilise the same symbol section format as .style files. It may be possible to automatically detect the symbol sections within lyr files and automatically extract them, avoiding the need for the intermediate .style file.
The style binary blobs themselves are serialized COM objects. Some details on how ArcGIS handles persistence are available at http://edndoc.esri.com/arcobjects/9.2/NET/f39eb22f-fa51-4fb6-a413-55d33adddcac.htm . Basically, the binary blob is a stream, where individual COM objects can be read directly from the stream. Object types are identified by unique class IDs.
First, the class ID is read from the stream. Binary representations of CLSIDs are not a direct encoding of the CLSID string. Rather, the parts of the CLSID are written in a different byte order. This appears to be an internal detail of how Windows WriteClassStm operates.
The CLSID string of {7914e603-c892-11d0-8bb6-080009ee4e41} is written as the binary representation 03e6147992c8d0118bb6080009ee4e41.
See e.g. https://github.com/wine-mirror/wine/blob/6d801377055911d914226a3c6af8d8637a63fa13/dlls/compobj.dll16/compobj.c#L380 for how WINE re-implements this encoding.
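The reordering above matches the mixed-endian GUID layout that Python's standard uuid module exposes as bytes_le, so a minimal sketch of the round trip could look like the following. The helper names are made up for illustration and are not taken from slyr.

```python
import uuid


def clsid_to_stream_bytes(clsid: str) -> bytes:
    """Encode a CLSID string the way it appears inside a symbol blob.

    The first three fields are byte-swapped (little endian), while the
    final eight bytes are kept in their written order.
    """
    return uuid.UUID(clsid).bytes_le


def stream_bytes_to_clsid(raw: bytes) -> str:
    """Decode 16 bytes read from the stream back into a CLSID string."""
    return "{" + str(uuid.UUID(bytes_le=raw)) + "}"


encoded = clsid_to_stream_bytes("{7914e603-c892-11d0-8bb6-080009ee4e41}")
print(encoded.hex())  # 03e6147992c8d0118bb6080009ee4e41
print(stream_bytes_to_clsid(encoded))
```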
Following an object CLSID, ESRI objects then write a two byte short value representing the object's version. Different object versions will be encoded differently, and may require special handling.
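As a rough sketch, reading an object header from a stream could be done as below. The function name is hypothetical, and the byte order of the version short is an assumption (little endian, like the other integers described in this document).

```python
import io
import struct
import uuid


def read_object_header(stream):
    """Read a 16-byte CLSID followed by a 2-byte version number.

    Assumption: the version is a little endian unsigned short.
    """
    raw_clsid = stream.read(16)
    if len(raw_clsid) != 16:
        raise ValueError("stream ended inside the CLSID")
    raw_version = stream.read(2)
    if len(raw_version) != 2:
        raise ValueError("stream ended inside the version")
    (version,) = struct.unpack("<H", raw_version)
    return uuid.UUID(bytes_le=raw_clsid), version


# Synthetic sample: the CLSID from the example above plus a made-up version 1.
sample = bytes.fromhex("03e6147992c8d0118bb6080009ee4e41") + bytes.fromhex("0100")
clsid, version = read_object_header(io.BytesIO(sample))
print(clsid, version)
```

The caller can then dispatch on the CLSID and version to pick the right decoder for the bytes that follow.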
Colors are encoded inside a symbol using the CIELAB color model. They are represented as a sequence of 3 8-byte, little endian doubles, corresponding to the L, A and B components of the color.
For instance, #ff0000 is represented as
63 29 92 AF 04 46 4C 40
09 17 68 51 90 39 53 40
82 A2 B8 D4 9E 06 51 40
This corresponds to
- L: 56.547018
- A: 76.899433
- B: 68.103444
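The example bytes above can be decoded with Python's struct module. This is only a small sketch, and read_lab is an illustrative name rather than part of the slyr parser.

```python
import struct

# The 24 bytes shown above for #ff0000.
RED_LAB_BYTES = bytes.fromhex(
    "632992AF04464C40"  # L
    "0917685190395340"  # A
    "82A2B8D49E065140"  # B
)


def read_lab(raw, offset=0):
    """Return (L, A, B) from three consecutive little endian doubles."""
    return struct.unpack_from("<3d", raw, offset)


lab_l, lab_a, lab_b = read_lab(RED_LAB_BYTES)
print(f"L={lab_l:.6f} A={lab_a:.6f} B={lab_b:.6f}")
```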
It seems that ESRI uses the Apple RGB model as its RGB working space, with a D65 white point and a gamma of 1.8. Using these settings, we get RGB values very similar to those shown in ESRI products. Only for RGB values very close to zero do we sometimes see small differences, probably due to some approximation.
The conversion rules and values can be found on Bruce Lindbloom's site.
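A hedged sketch of that conversion follows. It assumes the standard CIE Lab to XYZ formulas, the D65 reference white, and the XYZ to Apple RGB (D65) matrix as published in Lindbloom's tables; the coefficients should be checked against that source, and the function name is illustrative. This is not confirmed to be what ESRI does internally, it merely reproduces the #ff0000 example above.

```python
# Assumed constants: D65 reference white and the XYZ -> Apple RGB matrix.
WHITE_D65 = (0.95047, 1.0, 1.08883)
XYZ_TO_APPLE_RGB = (
    (2.9515373, -1.2894116, -0.4738445),
    (-1.0851093, 1.9908566, 0.0372026),
    (0.0854934, -0.2694964, 1.0912975),
)
GAMMA = 1.8
EPSILON = 216 / 24389
KAPPA = 24389 / 27


def lab_to_apple_rgb(lab_l, lab_a, lab_b):
    """Convert CIELAB to 8-bit Apple RGB, clamping out-of-range values."""
    fy = (lab_l + 16) / 116
    fx = fy + lab_a / 500
    fz = fy - lab_b / 200

    def inverse_f(t):
        cubed = t ** 3
        return cubed if cubed > EPSILON else (116 * t - 16) / KAPPA

    xr = inverse_f(fx)
    yr = fy ** 3 if lab_l > KAPPA * EPSILON else lab_l / KAPPA
    zr = inverse_f(fz)
    xyz = (xr * WHITE_D65[0], yr * WHITE_D65[1], zr * WHITE_D65[2])

    rgb = []
    for row in XYZ_TO_APPLE_RGB:
        linear = sum(m * c for m, c in zip(row, xyz))
        linear = min(max(linear, 0.0), 1.0)  # clamp before applying gamma
        rgb.append(round(255 * linear ** (1 / GAMMA)))
    return tuple(rgb)


print(lab_to_apple_rgb(56.547018, 76.899433, 68.103444))
```

Because of the 1.8 gamma, tiny errors in near-zero linear values are amplified, which may explain the small differences seen for very dark channels.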
Immediately following the 8-byte color components, the next byte indicates whether the color has the "use windows dithering" option enabled (!) Not sure who really cares about this option anymore, but the following values are possible for this byte:
- 00: no dithering
- 01: dither
The next byte indicates whether the color is a null color. A value of 00 indicates not null, and FF indicates a null (transparent) color.
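Putting the three doubles and the two flag bytes together gives a 26-byte color block. A minimal sketch of decoding it is shown here; the LabColor type and read_color name are invented for this example.

```python
import struct
from typing import NamedTuple


class LabColor(NamedTuple):
    lab_l: float
    lab_a: float
    lab_b: float
    dither: bool
    is_null: bool


def read_color(raw, offset=0):
    """Decode 3 little endian doubles, a dither byte and a null byte."""
    lab_l, lab_a, lab_b, dither_flag, null_flag = struct.unpack_from(
        "<3dBB", raw, offset
    )
    return LabColor(lab_l, lab_a, lab_b, dither_flag == 0x01, null_flag == 0xFF)


# The #ff0000 example from above, not dithered and not null.
sample = bytes.fromhex(
    "632992AF04464C40" "0917685190395340" "82A2B8D49E065140" "00" "00"
)
color = read_color(sample)
print(color)
```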
Color models are represented by a byte, where
- 92: HSV
- 96: RGB
- 97: CMYK
Internally both RGB and HSV colors are stored identically (as described above). CMYK colors seem to be stored differently, although more investigation is needed.
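A small lookup sketch for the color model byte is given below. It assumes the values listed above are hexadecimal, like the other byte values in this document, and the names are illustrative only.

```python
# Assumption: 92/96/97 are hexadecimal byte values.
COLOR_MODELS = {0x92: "HSV", 0x96: "RGB", 0x97: "CMYK"}


def read_color_model(raw, offset=0):
    """Return the color model name for the byte at the given offset."""
    code = raw[offset]
    if code not in COLOR_MODELS:
        raise ValueError(f"unknown color model byte 0x{code:02X}")
    return COLOR_MODELS[code]


print(read_color_model(bytes.fromhex("96")))  # RGB
```

Raising on unknown values rather than guessing makes it easier to spot styles that use a model not yet documented here.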
The first two bytes indicate the symbol type, with known values:
- 04 E6: Fill symbol
- FF E5: Marker symbol
- FA E5: Line symbol
Then the following section, of unknown purpose but it's always the same in all styles encountered so far:
- 14 79 92 C8 D0 11 8B B6 08 00 09 EE 4E 41: unknown sequence (1)
- 02: unknown meaning (B)
- 00: probably padding
- 0D: likely 'end of section' flag
- 7 x 00: padding
Structure of line symbol styles:
Starts with a 4-byte, little endian int representing the number of levels in the symbol.
The next section is repeated for each level in the symbol. Levels are in reverse z-order, ie the bottom most level comes first.
First, a 2 byte unsigned integer representing the line symbol type enum. Known values are:
- F9 E5: Simple line
- FD E5: Marker line
- FB E5: Cartographic line
- FC E5: Hash line
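As an illustration, the level count and the type of the first level could be read like this. The function names are hypothetical, the count is assumed to be unsigned, and the type is matched on its two raw bytes exactly as listed above so that no byte order has to be assumed.

```python
import struct

# Keys are the two raw bytes in the order they appear in the blob.
LINE_LAYER_TYPES = {
    bytes.fromhex("F9E5"): "simple line",
    bytes.fromhex("FDE5"): "marker line",
    bytes.fromhex("FBE5"): "cartographic line",
    bytes.fromhex("FCE5"): "hash line",
}


def read_level_count(raw, offset=0):
    """Read the 4-byte little endian level count (assumed unsigned)."""
    (count,) = struct.unpack_from("<I", raw, offset)
    return count


def read_line_layer_type(raw, offset):
    """Look up the line symbol type from its two raw bytes."""
    return LINE_LAYER_TYPES.get(bytes(raw[offset:offset + 2]), "unknown")


# Synthetic sample: two levels, the first being a cartographic line.
sample = struct.pack("<I", 2) + bytes.fromhex("FBE5")
print(read_level_count(sample), read_line_layer_type(sample, 4))
```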
The following is the structure for "simple line" types:
- 14 79 92 C8 D0 11 8B B6 08 00 09 EE 4E 41: unknown sequence (1)
- 01: unknown, but probably relates to the 02 (B) byte from the beginning of the symbol block
- 00: padding
- color model byte (see above)
- C4 E9 7E 23 D1 D0 11 83 83 08 00 09 B9 96 CC 01 00 01: unknown sequence (2)
- 00 x 2: padding
- 26 bytes: L/A/B color components as 3 8-byte doubles + 2 bytes for the dithered/null flags (as described above)
- 8 bytes: outline width as a little endian double
- little endian unsigned int, for line type (see below)
- 0D byte: again, likely an "end of section" flag
- 00 x 7: padding
Line types are encoded using a single byte unsigned integer, with the following values:
- 0: Solid
- 1: Dashed
- 2: Dotted
- 3: Dash dot
- 4: Dash dot dot
- 5: Null
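The simple line layout can be sketched as a fixed-offset parser for the bytes that follow the 2-byte type enum. All names here are illustrative, and the line type is assumed to occupy 4 bytes (a little endian unsigned int), which is one of the two readings given in the text.

```python
import struct

LINE_TYPES = ("solid", "dashed", "dotted", "dash dot", "dash dot dot", "null")

# sequence (1) + 01 + 00 + color model + sequence (2) + 2 padding bytes
COLOR_OFFSET = 14 + 1 + 1 + 1 + 18 + 2
# color block + width + line type (assumed 4 bytes) + 0D + 7 padding bytes
BODY_LENGTH = COLOR_OFFSET + 26 + 8 + 4 + 1 + 7


def parse_simple_line_body(raw, offset=0):
    """Decode the fields of a simple line layer from fixed offsets."""
    body = raw[offset:offset + BODY_LENGTH]
    if len(body) != BODY_LENGTH:
        raise ValueError("not enough bytes for a simple line layer")
    lab_l, lab_a, lab_b, dither, null, width, line_type = struct.unpack_from(
        "<3dBBdI", body, COLOR_OFFSET
    )
    return {
        "color_model": body[16],
        "lab": (lab_l, lab_a, lab_b),
        "dither": dither == 0x01,
        "is_null": null == 0xFF,
        "width": width,
        "line_type": LINE_TYPES[line_type] if line_type < len(LINE_TYPES) else "unknown",
    }


# Synthetic sample: a black, 1.5 wide dashed line built from the layout above.
SEQUENCE_1 = bytes.fromhex("147992C8D0118BB6080009EE4E41")
SEQUENCE_2 = bytes.fromhex("C4E97E23D1D0118383080009B996CC010001")
sample = (
    SEQUENCE_1 + b"\x01\x00" + b"\x96" + SEQUENCE_2 + b"\x00\x00"
    + struct.pack("<3dBBdI", 0.0, 0.0, 0.0, 0, 0, 1.5, 1)
    + b"\x0d" + bytes(7)
)
layer = parse_simple_line_body(sample)
print(layer)
```

The unknown sequences are skipped rather than validated here; a stricter parser might compare them against the expected bytes to detect unfamiliar variants.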
After this has been repeated for each symbol layer, there are two more repeating array sections. Each has one entry per symbol layer (in reverse z-order, as above).
The first consists of little endian unsigned ints, with values 1 for enabled symbol layers or 0 for disabled layers. There will be as many sequential unsigned ints here as there are symbol layers.
The second consists of little endian unsigned ints, with values 1 for locked symbol layers or 0 for unlocked layers. There will be as many sequential unsigned ints here as there are symbol layers.
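A minimal sketch of reading these two arrays, given a layer count that has already been parsed, is shown below. The function name is made up, and 4-byte unsigned ints are assumed.

```python
import struct


def read_layer_flags(raw, offset, layer_count):
    """Return (enabled, locked) lists, one boolean per symbol layer.

    Assumption: each flag is a 4-byte little endian unsigned int.
    """
    fmt = f"<{layer_count}I"
    enabled = struct.unpack_from(fmt, raw, offset)
    locked = struct.unpack_from(fmt, raw, offset + 4 * layer_count)
    return [bool(v) for v in enabled], [bool(v) for v in locked]


# Synthetic sample for two layers: enabled = 1, 0 and locked = 0, 1.
sample = struct.pack("<4I", 1, 0, 0, 1)
enabled, locked = read_layer_flags(sample, 0, 2)
print(enabled, locked)  # [True, False] [False, True]
```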
Following this is a terminator of unknown meaning (02), then a bunch of 00 padding bytes (repeated a varying number of times).
Following the symbol header we have a section which follows the same format as the outline and fill (described below), but of unknown purpose. It always seems to hold the same values in all generated styles:
- 96: RGB color model flag
- C4 E9 7E 23 D1 D0 11 83 83 08 00 09 B9 96 CC 01 00 01: unknown sequence (2)
- 3 x LAB color doubles (see above), but always black in reference files
- 00: likely the 'no dither' flag for some default black color
- 00: likely a 'not null' flag for this default black color
- 01: number of levels in symbol, which seems to take 4 bytes, little endian
- 2 byte integer: fill symbol type enum
Fill symbol types:
- 03 E6: Simple fill
- 06 E6: Line pattern fill
- 08 E6: Marker fill
- 09 E6: Gradient fill
The following is the structure for "simple fill" types:
A block of unknown purpose:
- 14 79 92 C8 D0 11 8B B6 08 00 09 EE 4E 41: repetition of unknown sequence (1) from above
- 01: unknown, but probably relates to the 02 (B) byte from the beginning of the symbol block
- 00 F9 E5 14 79 92 C8 D0 11 8B B6 08 00 09 EE 4E 41 01 00: unknown sequence (3)
The interesting part of a fill symbol section follows this. It's unknown what the bytes before this offset represent, but they were identical across the selection of simple fill styles used as reference.
The next 67 bytes seem to represent the outline style:
- 1 byte: The color model (92/96/97)
- 20 bytes: C4 E9 7E 23 D1 D0 11 83 83 08 00 09 B9 96 CC 01 00 01 00 00, repetition of unknown sequence (2)
- 26 bytes: L/A/B color components as 3 8-byte doubles + 2 bytes for the dithered/null flags (as described above)
- 8 bytes: outline width as a little endian double
There's then 4 empty bytes, followed by a 0D byte. This 0D byte is likely a flag to indicate the start of the fill style. Then 7 more empty bytes.
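The 67-byte outline block (55 bytes of fields, then 4 empty bytes, the 0D byte and 7 more empty bytes) could be decoded with fixed offsets as sketched here. The function and key names are hypothetical, and the unknown sequence is skipped without being checked.

```python
import struct

# color model + sequence (2) with padding + color block + width
# + 4 empty bytes + 0D + 7 empty bytes
OUTLINE_LENGTH = 1 + 20 + 26 + 8 + 4 + 1 + 7
COLOR_OFFSET = 1 + 20
MARKER_OFFSET = COLOR_OFFSET + 26 + 8 + 4


def parse_fill_outline(raw, offset=0):
    """Decode the outline block of a simple fill symbol."""
    block = raw[offset:offset + OUTLINE_LENGTH]
    if len(block) != OUTLINE_LENGTH:
        raise ValueError("not enough bytes for an outline block")
    lab_l, lab_a, lab_b, dither, null, width = struct.unpack_from(
        "<3dBBd", block, COLOR_OFFSET
    )
    return {
        "color_model": block[0],
        "lab": (lab_l, lab_a, lab_b),
        "dither": dither == 0x01,
        "is_null": null == 0xFF,
        "width": width,
        "has_0d_marker": block[MARKER_OFFSET] == 0x0D,
    }


# Synthetic sample: a black, 0.4 wide outline built from the layout above.
sample = (
    b"\x96"
    + bytes.fromhex("C4E97E23D1D0118383080009B996CC0100010000")
    + struct.pack("<3dBBd", 0.0, 0.0, 0.0, 0, 0, 0.4)
    + bytes(4) + b"\x0d" + bytes(7)
)
outline = parse_fill_outline(sample)
print(outline)
```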
Following this begins the fill style section:
- 1 byte: The color model.
- 20 bytes: C4 E9 7E 23 D1 D0 11 83 83 08 00 09 B9 96 CC 01 00 01 00 00, repetition of unknown sequence (2)
- 26 bytes: L/A/B color components as 3 8-byte doubles + 2 bytes for the dithered/null flags (as described above)
- 0D byte: again, likely an "end of section" flag
- 00 x 11: padding
- little endian unsigned int, with a value 1 for enabled symbol layers or 0 for disabled layers
- little endian unsigned int, with a value 1 for locked symbol layers or 0 for unlocked layers
- 02: unknown meaning