Parnas mentors

Everard M. Wiliams
Wiliams taught me that:
- We should not use the title "Engineer" to describe anyone who builds things, but only to refer to those whose education and experience prepared them to base their practice on sound scientific and mathematical principles.

Alan J. Perlis
He was the first to show me that the intellectual discipline and mathematical rigor that I enjoyed in engineering , could also applied to software.
From him I learned to look for methods that would get the job done reliably, not mechanisms (he is referring to AI) that were modeled on the way that people or animals appear to solve a problem. .. Imitating human behavior is not the path to reliable software.

Leo Aldo Finzi
Finzi taught us that we were designing in a multidimentional space.. and that no single metric (or figure of merit) could capture what was meant by "good design"... He also showed us it was always possible to design a product that had a good value of a metric, but was still a lousy design.

Harlan D. Mills
One of Mills's most important lessons had to do with the difference between management and engineering. He did not try to to tell us how to manage unmanageable projects;.he showed us how to make projects manageable. He understood that only well-designed and well-documemted software could be managed properly. He taught us how to design products in ways that made management easier. Today, when so many papers discuss software engineering as if the problem was simply project management, I miss Harlan Mills.


A taste of BlockChain

Definitions

Blockchain was first proposed in 2008 and implemented in 2009  (https://bitcoin.org/bitcoin.pdf)

Blockchain = a public ledger + transactions are stored in a list of blocks
Security is implemented using Asymmetric cryptography
Ledger consistency is implemented using distributed consensus algorithms (proof of work, or, proof of stake)

The typical digital signature algorithm used in blockchains is the elliptic curve digital signature algorithm

A cryptocurrency wallet, comparable to a bank account, contains a pair of public and private cryptographic keys. A public key allows for other wallets to make payments to the wallet's address, whereas a private key enables the spending of cryptocurrency from that address.

The cryptocurrency itself is not in the wallet. In case of bitcoin and cryptocurrencies derived from it, the cryptocurrency is decentrally stored and maintained in a publicly available ledger called the blockchain

Taxonomy of blockchain systems 



Consensus


In Blockchain, there is no central authority that ensures ledgers on distributed nodes are all the same. Consensus algorithms are used to ensure ledgers in different nodes are consistent:

  • Proof of work: if a node wants to publish a block of transactions, a lot of work has to be done first:
    • Each node of the network is calculating a hash value of the block header
    • The consensus requires that the calculated hash must be equal to or smaller than a certain value called "Target"
    • The block header contains a "Nonce" and miners would change the nonce frequently until getting a valid hash value.
    • Once node reaches the target value, it broadcast the block to other nodes to confirm the correctness of the hash value and append this new block to their own blockchains 
  • Proof of stake: only "reputed" nodes can create bocks: Reputation is often the coin amount owned by a miner.
    • Miners have to prove the ownership of the amount of currency (stake size).
    • Using only PoS leads to unfairness between miners (i.e. richest miner endup being monopoly)
    • To avoid unfairness, PoS is combined with other strategies 
    • Such strategies are: Lowest hash value. Coin age based selection (older and larger sets of coins have a greater probability of mining next block).
       
  • Byzantine fault tolerant: Consortium/private ledgers, voting consensus.
    • Every node has to be known by all the network
    • A block is created in a round.
    • A node is candidate for a round if it has received votes from over 2/3 of all nodes.
  • Ripple: Aristocratic approach.
    • Only predefined nodes (called servers) participate on consensus
    • A new block is added to the ledger if  >80% of servers agreed on it.

Consensus Algorithms Comparison



Block mining scenario






Foundations

Peer 2 Peer Network

Public-Private keys cryptography

Consensus protocols

Hash function properties:
  • Collision free
  • Hiding (irreversibly)
  • compact (fixed-size)
Hash pointers used to guarantee that no data is tempered on a data structure (e.g linked list, binary tree).

Digital signature:
  • Only you can sign
  • Anyone can verify
  • A signature cannot be forged (e.g cut-pasted from one document to another) 

Ledger (i.e. BlockChain)

A mathematical structure for storing data in a way that is nearly impossible to fake. It can be used for all kinds of valuable data.

It’s a new way of answering an old question: how can we create enough trust between one another to peacefully exchange something of value?
Application:
  • Account-to-Account digital cash transfer: Alice sends an amount to Bob (e.g. Bitcoin)
  • Smart contract digital cash transfer: someone stores a program - called smart contract - in Blockchain. That program triggers to transfer to a target account when conditions met) 


Blockchain implementation

BitCoin (2009, Value store, C++)
Ethereum (2015. Open source. Smart contract) (Supported by Microsft)
Hyperledger (Open source, Java golang)(Supported by IBM, Linux foundation)
Corda (Open sourde, Haskell)


In Bitcoin jargon, BTC called cryptocurrency
In Ethereum jargon, ETH called Token
No cryptocurrency in Hyperledger, only smart contract?


Smart contract (SC): make an agreement between a seller and a buyer to exchange any commodity with value,  more general than money transfer
Issues with SC :
  • Oracle problem: when it comes to sell physical Objects via SC, a trusted authority (Oracle) is needed to verify the Object life cycle in both world: Real (physical) and Virtual (Blockchain). In general, no external data from the outside world can be brought into the decentralized systems without trusting a third party as an oracle


Blochian scalability

Issue regarding public Blockchain:  transaction processing scalability.
Possible solutio:
- Off-chain state channels : a mechanism by which Blockchain work get conducted off of the Blockchain.
- Sharding: transactions are directed to different nodes depending on which shards they affect. Each shard only processes a small part of the state.
- Plasma: a series of contracts that run on top of a root blockchain. The root blockchain enforces the validity of the state in the Plasma chains using “fraud proofs”

 

Wallet generation


Here is how Bitcoin, for example, wallet is created:
  • generate ECC key pair (check here: https://github.com/lzbair/security/tree/master/ecc) 
  • Create a WIF (Wallet Interchange Format) by encoding the private key using a Base58. This is your secret key.
  • Create your public address: 
    • Take public key and compress it. public key compression means keeping only the x-coordinate, y-coordinate is removed since  it's calculable from the ellipiic curve equation.
    • Apply successive hash functions: SHA256 and RIPEMD160.
    • Create a checksum for the c value.
    • Concatenate hash  value with it's checksum, and encode all in base58.


Questions 

  1. Background: What's money? Currency? Exchange?
  2. Who can create a transaction? --> the holder of private key (wallet)?
  3. How a transaction is validated? for example a fake private key, or a wallet without credit
  4. Does transaction validation occurs during the process of block creation?
  5. How invalid blocks are detected and removed from the blockchain?
  6. Centralized vs Decentralized Exchange?
  7. How it works : Cloud Blockchain, Mining as a Cloud Service
  8. How to develop distributed application (dApps) using Ethereum / Hyperledger?

Bibliography

https://www.springer.com/gp/book/9783030030346

https://www.researchgate.net/profile/Hong-Ning_Dai/publication/318131748_An_Overview_of_Blockchain_Technology_Architecture_Consensus_and_Future_Trends/links/59d71faa458515db19c915a1/An-Overview-of-Blockchain-Technology-Architecture-Consensus-and-Future-Trends.pdf

https://www.amazon.com/Computer-Internet-Security-Hands-Approach/dp/1733003924

Physics and Philosophy - Notes on Heisenberg's course

- The central lesson of the theory of relativity is that space and time are not merely the arena in which the drama of  the universe is acted out but part of the cast (part of the physical universe as matter).

- Heisenberg's uncertainty: all physical quantities that can be observed are subject to unpredictable fluctuations, so that their values are not precisely defined.

- Consider, for example, the position x and the momentum p (p=mv) of a quantum particle such as an electron. The experimenter is free to measure either of these quantities to arbitrary precision, but they cannot possess precise values  simultaneously (in one observation or mesurement).

- Quantum mechanics is a statistical theory. Where it differs from other statistical theories is that the chance element is inherent in the nature of the quantum system and not merely imposed by our limited grasp of all the variables that affect the system.

- In a classical world our observations do not create reality: they uncover it. Thus atoms and particles continue to exist with well-defined attributes even when we do not observe them.

- By contrast, in quantum mechanics one cannot meaningfully talk about what an electron is doing between observations because it is the observations alone that create the reality of the electron. Thus a measurement of an electron's position creates an electron-with-a-position; a measurement of its momentum creates an electron-with-a-momentum. But neither entity can be considered already to be in existence prior to the measurement being made.

- A particle is an abstract encodement of a set of potentialities or possible outcomes of measurements. But the reality is in the observations, not in the electron.

- 'What actually happens inside a piece of measuring apparatus when a measurement of a quantum particle is made?' 
=> The combined system of apparatus + particle adopts a state that still represents a range of potential possibilities. These potential possibilities should be mesured (observed) themselves (circular raisoning).
Deails: "Measurement problem" by John von Neumann

- Radiation: releasing energy (in form of waves or particles).
- Atom with more or less neutrons (than the proton's number) is called "unstable" or "radioactive".
- Unstable atom ==release_energy==> Stable atom.
Released energy  is either alpha, beta or gamma

- Break: wave simulation https://phet.colorado.edu/

- Standing/Stationary Waves: superposition of a wave and its reflection
- Golden rule of a wave: Speed = WaveLength* Frequency
- Richard Feynman on Energy :
http://www.feynmanlectures.caltech.edu/I_04.html

Information theory - elementary

Information: the ability to distinguish among possible alternatives.
How to measure Information: Shannon says: the amount of information contained in a message from a source is the entropy H:
H = log2 (M)
M : number of possible messages.

Comments:
- Entropy formula uses log base 2, it means the measurement unit is "bit".
- H is interpreted as the amount of information we lack before we receive a message.
- H of a source equals the minimum number of bits required to represent (encode) a message.
- The formula above assumes that each possible message is equally likely (same probability).

Notes on probability:
 - Frequency and probability are  very likely to be nearly equal if we make many experiments.
p(x|y) = p(x & y) / p(y) ;  Notice that p(y) = sigma( p(x & y) ) for all x
- F is a numeric function of x. Average(F) = sigma(p(x) * F(x)) for all (if probabilities are equal, then the average is exactly the  arithmetic mean).


Generalisation:
- Given a message x, the  surprise of x is : s(x) = log2(1/p(x))
- The more surprising message tells us more (information)
- General formula of entropy: S is the surprise function of x, then
H = Average(S) = sigma(p(x) * log2(1/p(x)))

Communication:
- The codeword is the format that a message (letter, word ...) is represented (encoded) by.
- Efficient communication: the more common a message is, the shorter should be the corresponding codeword.
- Encoding-Decoding bijection: the encoding should satisfy the prefix-free condition = No codeword can be a prefix of an other codeword.
-  An encoding is free-prefix => the Kraft inequality is satisfied.
- Kraft inequality:  sigma(1/2^l)<= 1  over all codewords (l is codeword's length). The inequality states that they should not be so many short codewords in an encoding.
- Given a free-prefix encoding of X (messages from a source); and L is length's function of the corresponding codewords, then:  Average(L) >= H(X).
This inequality states that:
  • The length of codewords, on average, can not be shorter than the entropy of information source.
  • For encoding efficiency, the average of lengths should be closest (or equal) to the entropy (e.g Huffman code is the best efficient free-prefix encoding)
  • Application: data compression is the process of re-encoding data in a more efficient way (closest to H). Most of the time, this is achieved by analysing data regularities (text pattern ....) 
Shannon's first theorem: H = the minimum of {Average(L), where  L is length's function of a possible encoding}

Data compression: run length encoding (for simple image, e.g black and white).


Large data - Technology stack

Hadoop

Architecture:
  • Master node: metadata / descriptor (e.g which block is on which slave)
  • Data nodes (slaves): holding blocks of data
Basic-idea: Push processing from Master->Salves instead of Pulling data from Slaves -> Master



- Technology stack:
  • Transformers: MapReduce (google), Pig (Yahoo), Hive (Facebook). 
  • Real time stream processing: Storm (clojure), Spark (n memory).
  • Data stores (nosql-haddop friendly): Hbase, Accumulo 
  • High volume message brokers:  Kafka (Producer->Queue->Consumer)
  • Others: HCatalog, Oozie, Mahout,