Large data - Technology stack

Hadoop

Architecture:
  • Master node: holds metadata / descriptors (e.g. which block lives on which slave)
  • Data nodes (slaves): hold the actual blocks of data
Basic idea: push the processing Master -> Slaves instead of pulling data Slaves -> Master.
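A minimal sketch of that idea in MapReduce terms: the classic Hadoop word-count job, where the map code is shipped to the data nodes and runs next to the blocks, and only the small aggregated results travel back. Input/output paths are illustrative; this assumes the standard org.apache.hadoop.mapreduce API.

    // WordCount.java - map runs on the slaves, next to the data blocks it reads
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: executed on the data nodes, emits (word, 1) for each token
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // Reducer: aggregates the per-word counts after the shuffle phase
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input dir (example)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output dir (example)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }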



- Technology stack:
  • Transformers: MapReduce (Google), Pig (Yahoo), Hive (Facebook).
  • Real-time stream processing: Storm (Clojure), Spark (in-memory).
  • Data stores (NoSQL, Hadoop-friendly): HBase, Accumulo.
  • High-volume message brokers: Kafka (Producer -> Queue -> Consumer; see the producer sketch after this list).
  • Others: HCatalog, Oozie, Mahout.
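
To make the Producer -> Queue -> Consumer flow concrete, here is a minimal Kafka producer sketch using the standard Java client. The broker address (localhost:9092) and topic name ("events") are assumptions for illustration; a consumer subscribed to the same topic would pull these messages off the queue at its own pace.

    // SimpleProducer.java - publishes a few string messages to a Kafka topic
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimpleProducer {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // broker address (assumed)
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer and flushes pending sends
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
          for (int i = 0; i < 10; i++) {
            // key = message number, value = payload; topic "events" is illustrative
            producer.send(new ProducerRecord<>("events",
                Integer.toString(i), "message-" + i));
          }
        }
      }
    }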
