Information theory - elementary

Information: the ability to distinguish among possible alternatives.
How to measure information: Shannon defines the amount of information contained in a message from a source as the entropy H:
H = log2 (M)
M : number of possible messages.
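
A quick numerical sketch in Python (the 8-message source below is just an assumed example):

    from math import log2

    M = 8                 # assumed example: a source with 8 equally likely messages
    H = log2(M)           # 3.0 bits of information per message
    print(H)

    # 3 bits are enough to give each of the 8 messages its own fixed-length codeword:
    codewords = [format(i, "03b") for i in range(M)]
    print(codewords)      # ['000', '001', '010', ..., '111']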

Comments:
- The entropy formula uses log base 2, which means the unit of measurement is the "bit".
- H is interpreted as the amount of information we lack before we receive a message.
- H of a source equals the minimum number of bits required to represent (encode) a message.
- The formula above assumes that each possible message is equally likely (same probability).

Notes on probability:
- The observed frequency of an outcome approaches its probability when the experiment is repeated many times (law of large numbers).
- p(x|y) = p(x & y) / p(y); notice that p(y) = sigma(p(x & y)) over all x.
- If F is a numeric function of x, then Average(F) = sigma(p(x) * F(x)) over all x (if the probabilities are all equal, this average is exactly the arithmetic mean).
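
A tiny numerical sketch of these identities (the joint distribution below is made up purely for illustration):

    # Joint distribution p(x, y) over x in {0, 1}, y in {0, 1} (made-up numbers).
    p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

    # Marginal: p(y) = sigma(p(x & y)) over all x  ->  p(y=0) = 0.3, p(y=1) = 0.7
    p_y = {y: sum(p for (x, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

    # Conditional: p(x|y) = p(x & y) / p(y), here for y = 1
    p_x_given_y1 = {x: p_xy[(x, 1)] / p_y[1] for x in (0, 1)}
    print(p_y, p_x_given_y1)

    # Average of a numeric function F: sigma(p(x) * F(x)) over all x
    p_x = {x: sum(p for (xx, y), p in p_xy.items() if xx == x) for x in (0, 1)}
    F = {0: 10.0, 1: 20.0}
    print(sum(p_x[x] * F[x] for x in p_x))   # 0.4*10 + 0.6*20 = 16.0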


Generalisation:
- Given a message x, the surprise of x is: s(x) = log2(1/p(x))
- A more surprising message tells us more (carries more information).
- General formula of entropy: if S is the surprise function of x, then
H = Average(S) = sigma(p(x) * log2(1/p(x))) over all x
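
A minimal sketch of the general formula (the distributions are made-up examples; note that it reduces to log2(M) when all messages are equally likely):

    from math import log2

    def entropy(probs):
        """H = sigma(p(x) * log2(1/p(x))), i.e. the average surprise."""
        return sum(p * log2(1.0 / p) for p in probs if p > 0)

    # Equally likely messages: reduces to log2(M), here log2(4) = 2 bits.
    print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0

    # A skewed source is less surprising on average, so its entropy is lower.
    print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75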

Communication:
- The codeword is the form in which a message (letter, word, ...) is represented (encoded).
- Efficient communication: the more common a message is, the shorter its codeword should be.
- Encoding-decoding bijection: the encoding should satisfy the prefix-free condition = no codeword can be a prefix of another codeword.
- An encoding is prefix-free => the Kraft inequality is satisfied.
- Kraft inequality: sigma(1/2^l) <= 1 over all codewords (l is the codeword's length). The inequality states that there cannot be too many short codewords in an encoding.
- Given a prefix-free encoding of X (messages from a source), with L the length function of the corresponding codewords: Average(L) >= H(X).
This inequality states that:
  • The length of the codewords, on average, cannot be shorter than the entropy of the information source.
  • For encoding efficiency, the average length should be as close as possible (or equal) to the entropy (e.g. Huffman coding is the most efficient prefix-free encoding; see the sketch below).
  • Application: data compression is the process of re-encoding data more efficiently (bringing the average length closer to H). Most of the time, this is achieved by exploiting regularities in the data (text patterns, ...).
Shannon's first theorem (source coding theorem): H = the minimum of {Average(L), where L is the length function of a possible encoding} (this minimum is approached by encoding long blocks of messages).
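
A small sketch tying these statements together: build a Huffman code for a made-up distribution (using Python's heapq), then check the Kraft inequality and the bound Average(L) >= H:

    import heapq
    from math import log2

    def huffman_code(probs):
        """Build a prefix-free code for {symbol: probability} with Huffman's algorithm."""
        # Heap of (probability, tie-breaker, {symbol: partial codeword}) entries.
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, code1 = heapq.heappop(heap)   # two least likely groups...
            p2, _, code2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in code1.items()}
            merged.update({s: "1" + c for s, c in code2.items()})
            heapq.heappush(heap, (p1 + p2, count, merged))  # ...are merged
            count += 1
        return heap[0][2]

    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # made-up source
    code = huffman_code(probs)
    print(code)  # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'} (labels may differ)

    # Kraft inequality: sigma(1/2^l) <= 1 for any prefix-free code.
    print(sum(2 ** -len(c) for c in code.values()))          # 1.0

    # Average(L) >= H; for this dyadic source Huffman reaches equality (1.75 bits).
    avg_len = sum(probs[s] * len(c) for s, c in code.items())
    H = sum(p * log2(1 / p) for p in probs.values())
    print(avg_len, H)                                        # 1.75 1.75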

Data compression: run-length encoding (for simple images, e.g. black and white).
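
A minimal run-length encoder/decoder for one row of a black-and-white image (the pixel row is an arbitrary example):

    from itertools import groupby

    def rle_encode(pixels):
        """Encode a sequence like '000111100' as [(symbol, run_length), ...]."""
        return [(sym, len(list(run))) for sym, run in groupby(pixels)]

    def rle_decode(runs):
        """Inverse of rle_encode."""
        return "".join(sym * length for sym, length in runs)

    row = "0" * 8 + "1" * 10 + "0" * 4    # one row of a black-and-white image
    encoded = rle_encode(row)
    print(encoded)                        # [('0', 8), ('1', 10), ('0', 4)]
    assert rle_decode(encoded) == row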

