Information theory - elementary

Information: the ability to distinguish among possible alternatives.
How to measure information: Shannon defines the amount of information contained in a message from a source as the entropy H:
H = log2 (M)
M : number of possible messages.
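
A quick numerical sketch in Python (the 8-message source below is just an assumed example):

    from math import log2

    M = 8                 # assumed example: a source with 8 equally likely messages
    H = log2(M)           # 3.0 bits of information per message
    print(H)

    # 3 bits are enough to give each of the 8 messages its own fixed-length codeword:
    codewords = [format(i, "03b") for i in range(M)]
    print(codewords)      # ['000', '001', '010', ..., '111']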

Comments:
- The entropy formula uses log base 2, which means the unit of measurement is the "bit".
- H is interpreted as the amount of information we lack before we receive a message.
- H of a source equals the minimum number of bits required to represent (encode) a message.
- The formula above assumes that each possible message is equally likely (same probability).

Notes on probability:
- The observed frequency of an outcome approaches its probability when the experiment is repeated many times (law of large numbers).
- p(x|y) = p(x & y) / p(y); notice that p(y) = sigma(p(x & y)) over all x.
- If F is a numeric function of x, then Average(F) = sigma(p(x) * F(x)) over all x (if the probabilities are all equal, this average is exactly the arithmetic mean).
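
A tiny numerical sketch of these identities (the joint distribution below is made up purely for illustration):

    # Joint distribution p(x, y) over x in {0, 1}, y in {0, 1} (made-up numbers).
    p_xy = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

    # Marginal: p(y) = sigma(p(x & y)) over all x  ->  p(y=0) = 0.3, p(y=1) = 0.7
    p_y = {y: sum(p for (x, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

    # Conditional: p(x|y) = p(x & y) / p(y), here for y = 1
    p_x_given_y1 = {x: p_xy[(x, 1)] / p_y[1] for x in (0, 1)}
    print(p_y, p_x_given_y1)

    # Average of a numeric function F: sigma(p(x) * F(x)) over all x
    p_x = {x: sum(p for (xx, y), p in p_xy.items() if xx == x) for x in (0, 1)}
    F = {0: 10.0, 1: 20.0}
    print(sum(p_x[x] * F[x] for x in p_x))   # 0.4*10 + 0.6*20 = 16.0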


Generalisation:
- Given a message x, the surprise of x is: s(x) = log2(1/p(x))
- A more surprising message tells us more (carries more information).
- General formula of entropy: if S is the surprise function of x, then
H = Average(S) = sigma(p(x) * log2(1/p(x))) over all x
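
A minimal sketch of the general formula (the distributions are made-up examples; note that it reduces to log2(M) when all messages are equally likely):

    from math import log2

    def entropy(probs):
        """H = sigma(p(x) * log2(1/p(x))), i.e. the average surprise."""
        return sum(p * log2(1.0 / p) for p in probs if p > 0)

    # Equally likely messages: reduces to log2(M), here log2(4) = 2 bits.
    print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0

    # A skewed source is less surprising on average, so its entropy is lower.
    print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75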

Communication:
- The codeword is the form in which a message (letter, word, ...) is represented (encoded).
- Efficient communication: the more common a message is, the shorter its codeword should be.
- Encoding-decoding bijection: the encoding should satisfy the prefix-free condition = no codeword can be a prefix of another codeword.
- An encoding is prefix-free => the Kraft inequality is satisfied.
- Kraft inequality: sigma(1/2^l) <= 1 over all codewords (l is the codeword's length). The inequality states that there cannot be too many short codewords in an encoding.
- Given a prefix-free encoding of X (messages from a source), with L the length function of the corresponding codewords: Average(L) >= H(X).
This inequality states that:
  • The length of the codewords, on average, cannot be shorter than the entropy of the information source.
  • For encoding efficiency, the average length should be as close as possible (or equal) to the entropy (e.g. Huffman coding is the most efficient prefix-free encoding; see the sketch below).
  • Application: data compression is the process of re-encoding data more efficiently (bringing the average length closer to H). Most of the time, this is achieved by exploiting regularities in the data (text patterns, ...).
Shannon's first theorem (source coding theorem): H = the minimum of {Average(L), where L is the length function of a possible encoding} (this minimum is approached by encoding long blocks of messages).
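
A small sketch tying these statements together: build a Huffman code for a made-up distribution (using Python's heapq), then check the Kraft inequality and the bound Average(L) >= H:

    import heapq
    from math import log2

    def huffman_code(probs):
        """Build a prefix-free code for {symbol: probability} with Huffman's algorithm."""
        # Heap of (probability, tie-breaker, {symbol: partial codeword}) entries.
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, code1 = heapq.heappop(heap)   # two least likely groups...
            p2, _, code2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in code1.items()}
            merged.update({s: "1" + c for s, c in code2.items()})
            heapq.heappush(heap, (p1 + p2, count, merged))  # ...are merged
            count += 1
        return heap[0][2]

    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # made-up source
    code = huffman_code(probs)
    print(code)  # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'} (labels may differ)

    # Kraft inequality: sigma(1/2^l) <= 1 for any prefix-free code.
    print(sum(2 ** -len(c) for c in code.values()))          # 1.0

    # Average(L) >= H; for this dyadic source Huffman reaches equality (1.75 bits).
    avg_len = sum(probs[s] * len(c) for s, c in code.items())
    H = sum(p * log2(1 / p) for p in probs.values())
    print(avg_len, H)                                        # 1.75 1.75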

Data compression: run-length encoding (for simple images, e.g. black and white).
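
A minimal run-length encoder/decoder for one row of a black-and-white image (the pixel row is an arbitrary example):

    from itertools import groupby

    def rle_encode(pixels):
        """Encode a sequence like '000111100' as [(symbol, run_length), ...]."""
        return [(sym, len(list(run))) for sym, run in groupby(pixels)]

    def rle_decode(runs):
        """Inverse of rle_encode."""
        return "".join(sym * length for sym, length in runs)

    row = "0" * 8 + "1" * 10 + "0" * 4    # one row of a black-and-white image
    encoded = rle_encode(row)
    print(encoded)                        # [('0', 8), ('1', 10), ('0', 4)]
    assert rle_decode(encoded) == row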

