Titus the Fox / Moktar SQZ file format by Jesses (mail at ttf dot mine dot nu)
Visit https://ttf.mine.nu for more TTF/Moktar stuff

SQZ decoding explanation, v1.0.
Version history:
  1.0 - initial version

This file describes how to decompress the SQZ files. One of two compression
algorithms is used: either LZW or Huffman+RLE. A byte at offset 1 determines
which one of them is actually in use.


File format:
Offset	Size	What
0	1	bits 0..3: high nibble of uncompressed size, bits 4..7 unused
1	1	0x10: LZW is used, something else (0): Huffman+RLE is used
		CDRUN.COM's unpacker considers values above 0x10 invalid
2	2	low word of uncompressed size

If LZW is used:
Offset	Size	What
4	rest	LZW bitstream

If Huffman is used:
Offset	Size	What
4	2	HTS: Huffman tree size in bytes
6	HTS	HT: Huffman tree as array of 16-bit words
6+HTS	rest	Huffman bitstream


A perl implementation is given in the file "unpack.pl".


How to decode LZW bitstream:

The bitstream consists of variable-length codewords; from here on, their
word size is named 'nbit'. Initially, nbit == 9.

LZW makes use of a dictionary; the codewords are used as indices to perform
lookups. This dictionary is not specified explicitly in the compressed stream
though, instead it is derived during decompression. Initially the dictionary
contains 258 entries. For entries 0..255, the value of each entry is simply
the ASCII binary form of its index: dict[i] = chr(i). The remaining two
entries, 0x100 and 0x101, have a special purpose and aren't used for lookups.
The dictionary has a maximum size of 0x1000 entries.

The bitstream codewords can be divided in two categories. The first category
is formed by the two special values mentioned above, CLEAR_CODE (0x100) and
END_CODE (0x101). Whenever CLEAR_CODE is encountered, the decoder should
reset its state to the initial condition, that is, set nbit to 9 and reset
the dictionary to its initial 258 entries. END_CODE, as expected, marks the
end of the stream.

The second category is formed by all other codeword values. For each
codeword, first a new entry is optionally added to the dictionary. No entry
is added immediately after a CLEAR_CODE, and neither when the dictionary is
full. Next, the codeword's value is used as an index in the dictionary: the
corresponding entry is output.

The value of the new dictionary entry is the output generated by the previous
codeword, plus one more byte. This byte is normally the first byte of the
current codeword's output. However, sometimes the current codeword points to
a not-yet-existing dictionary entry. In this case, the extra byte's value is
the first byte of the previous output. So, if the previous codeword generated
an output of 'ABC', and the current codeword points beyond the dictionary,
the new dictionary entry would be 'ABCA'.

For this to work correctly, the input stream must fulfill a few conditions,
which all apply only when the current codeword points to a not-yet-existing
entry. First, the previous codeword can't be a CLEAR_CODE, as there is no
previous output in that case. Secondly, the dictionary cannot be full, as
otherwise there is no room to add a new entry, which is needed to lookup the
output. Finally, the current codeword must point to the to-be-added entry,
for the same reason. It is the responsibility of the compressor to make sure
these conditions hold, and they do for all LZW-encoded SQZ files in both
Moktar and Titus the Fox.

If after adding an entry the number of elements in the dictionary is equal to
2^nbit, the value of nbit is incremented; unless nbit is equal to 12.

The number of unused input bits following END_CODE is at least one and at
most eight, so sometimes an entirely unused input byte is present at the end.

Regarding bit-order: consider the first three input bytes of a file, and
label the bits like this, with most-significant bits on the left in each byte:
	76543210 FEDCBA98 NMLKJIHG
then, the first two codewords (most-significant bit left) are:
	76543210F EDCBA98NM

Constants:
	CLEAR_CODE	= 0x100 (0x101 for CDRUN.COM)
	END_CODE	= 0x101 (0x100 for CDRUN.COM)
	MAX_TABLE	= 0x1000

Pseudocode:
	int nbit
	stringarray dict

	int prev := CLEAR_CODE
	while (prev != END_CODE) {
		if (prev == CLEAR_CODE) {
			nbit := 9
			dict := [chr(0)..chr(255), ?, ?] // 258 entries, ?=unused
		}
		int cw := next nbit-sized codeword
		if (cw != CLEAR_CODE && cw != END_CODE) {
			int newbyte
			if (cw < num_elem(dict)) {
				newbyte := first_byte(dict[cw])
			} else {
				// Assumption 1: prev != CLEAR_CODE
				// Assumption 2: num_elem(dict) < MAX_TABLE
				// Assumption 3: cw == num_elem(dict)
				newbyte := first_byte(dict[prev])
			}
			if ((prev != CLEAR_CODE) && (num_elem(dict) < MAX_TABLE)) {
				append dict[prev] ++ newbyte to dict
				if (num_elem(dict) == 2**nbit && nbit < 12) {
					nbit++
				}
			}
			output dict[cw]
		}
		prev := cw
	}

Implementation note:
	The above pseudocode focuses on clarity, not efficiency. It is
	possible to apply some optimizations. A simple optimization is to not
	explicitly keep the first 258 entries of the dictionary in memory,
	since the first 256 are trivially computed and the remaining two are
	unused. This gains some memory bytes, at the cost of a few extra
	conditional statements and subtractions of 0x102.

	More interesting optimizations are possible. In particular, the use
	of string data types can be completely avoided. Any addition to the
	dictionary is formed by a prefix and a single byte, where the prefix
	already exists in the dictionary at index 'prev'. Instead of storing
	this prefix literally, it can also be stored as a pointer to the
	index where it occurs. This, together with the maximum dictionary
	size, allows for the array 'dict' to be of constant bytesize.

	The perl implementation in "unpack.pl" uses these two optimizations.

Example output:
	LEVEL1.SQZ is identical for both Moktar and Titus the Fox. The first
	twelve codewords, which will yield 38 decompressed bytes, are:
		01C 045 053 104 105 106 107 105 097 108 109 10B
	After processing these codewords, the dictionary contains literally:
		(1C 45, 45 53, 53[2x], 53[3x], 53[4x], 53[5x], 53[6x],
		 53[3x] 97, 97 53, 53[7x], 53[3x] 97 53)
	The same dictionary, shown in pointer+byte format as explained above:
		(01C+45, 045+53, 053+53, 104+53, 105+53, 106+53, 107+53,
		 105+97, 097+53, 108+53, 109+53)
	The first bytes of the decompressed output at offset 0x00000 are:
		1C 45 53[18x] 97 53[9x] 97 53[10x] 97 53[6x] 96 53[3x] 97
	At offset 0x00321 in the output, nbit changes its value for the first
	time. Surrounding output bytes, starting at offset 0x318, are:
		96 53[3x] 97 53 96 53[3x] 97 96 53[3x] 97 53 96 53[3x] 97
	At offset 0x012EC in the output, a CLEAR_CODE occurs in the input.
	The surrounding output bytes, starting at offset 0x012E0, are:
		81 BD B6[10x] EB EC B7 B6[9x] EB EC B7 B6[4x] EB 1C 45 53[4x]
	Finally, the last 20 bytes are:
		68 47 CA 47 FF 23 03 23 03 0B 00 08 CA 00[3x] B0 0D 40 02
	CRC-32 of the uncompressed LEVEL1.SQZ is 0x44800FE5.


How to decode Huffman+RLE bitstream:

In Huffman coding a fixed binary tree structure is used, with output
codewords stored at its leaves. The Huffman bitstream sequentially addresses
these leaf nodes. The tree is navigated node for node, starting at the root.
One bit is then read from the stream. This bit determines the next node to
visit: a bit value of 0 means visit the first (left) child, where a value of
1 means take the second (right) child. Whenever a leaf node is reached, its
value is output and the current node position is reset to the root.

In most Huffman implementations the tree is stored efficiently in what is
known as "canonical" form. In the SQZ files however, this is not the case and
the tree is stored in a very straightforward fashion, making it easy to use.
The binary Huffman tree HT is stored as array of 16-bit words, where each word
represents a node; each node is either an internal node or a leaf node. The
value stored in the array for a node has two parts: bit 15 is set if it is a
leaf node, bits 0..14 contain the node's value. For leaf nodes this value is
a codeword, for internal nodes it is 2*(index of first child), or put another
way: the byte offset in the array of the first child. The second child of an
internal node is stored immediately after its first child.

Bit order is such that for each byte, the most significant bit is processed
first.

Pseudocode for reading codewords (Huffman decoding):
	int node := 0
	while (bool b := next bit) {
		if (b) {
			node++
		}
		if (HT[node] < 0x8000) {
			// internal node
			node := HT[node] / 2
		} else {
			// leaf node
			int cw := HT[node] & 0x7FFF
			process codeword cw
			node := 0
		}
	}

This results in a sequence of RLE codewords. Each codeword 'cw' is processed
in the following way:

Let L == cw mod 256
Let H == cw div 256

Codeword	What to do
H==0		last := L; output 'last'
H!=0 && L==0	read next codeword; output cw times 'last'
H!=0 && L==1	read next codeword; count := L*256;
		read next codeword; count += L;
		output count times 'last'
H!=0 && L>=2	output L times 'last'

This is easily implemented using a state machine: the following pseudocode
can be directly inserted in the loop above.

The pseudocode uses three pre-existing variables:
	int state := 0
	byte last
	int count

Pseudocode for processing a codeword cw (RLE decoding):
	byte L := cw & 255
	byte H := cw >> 8
	if (state == 0) {
		if (H == 0) {
			last := L
			output 'last'
		} else if (L == 0) {
			state := 1
		} else if (L == 1) {
			state := 2
		} else {
			output L times 'last'
		}
	} else if (state == 1) { // cw == repeat count
		output cw times 'last'
		state := 0
	} else if (state == 2) { // L == high byte of 'count'
		count := L*256
		state := 3
	} else if (state == 3) { // L == low byte of 'count'
		count += L
		output count times 'last'
		state := 0
	}

Example output:
	The games only have a few Huffman-encoded files. Moktar's only file
	is SPRITES.SQZ, Titus the Fox has SPREXP.SQZ. In SPRITES.SQZ the
	first 9 input bit groups corresponding to codewords are:
		11 1011 10100 0000101110 11 10010 0001110001 10011 10100
	The first 12 intermediate codewords, after Huffman but before RLE:
		0000 0100 0003 007A 0000 0001 0098 0080 0003 0065 00C0 0000
	The first 16 bytes of decompressed output for SPRITES.SQZ are:
		00 00 00 00 7A 00 01 98 80 03 65 C0 00 98 40 02
	Now for SPREXP.SQZ, the first 12 input bit groups:
		11 11 011010 11 11 0100101 001100000 11 11 0100011 011101 11
	The first 12 intermediate codewords:
		0000 0000 0008 0000 0000 0030 0028 0000 0000 0018 0040 0000
	And the first 16 output bytes:
		00 00 08 00 00 30 28 00 00 18 40 00 00 3C 08 00
	In both files, there is only one codeword where H!=0 && L>=1. The
	surrounding codewords, for SPRITES.SQZ starting at output offset
	0x2B131 and for SPREXP.SQZ at 0x2AE29, are:
		0000 0100 0009 000C 0000 0101 0001 0008 0020 0000 0100 0008
	The output generated by these codewords is:
		00 00[9x] 0C 00 00[256x] 00[8x] 20 00 00[8x]
	Finally, the last 16 bytes of both files are:
		00 00 0C 60 0E E0 07 C0 07 80 01 00 00 00 00 00
	CRC-32 of the uncompressed SPRITES.SQZ is 0xAA81B13A; for SPREXP.SQZ
	it is 0xEB0338C9.