Key Big Data Terms You Should Know


Below is a list of key Big Data terms you should know, each with a very brief explanation in simple language. I hope you find it useful.

1. Hadoop: System for processing very large data sets
2. HDFS or Hadoop Distributed File System: For storage of large volumes of data (key elements: NameNode and DataNodes)
3. MapReduce: Think of it as the assembly-level language of distributed computing. It is the programming model used for computation in Hadoop
4. Pig: Developed by Yahoo. It is a higher-level language than MapReduce
5. Hive: Higher-level language developed by Facebook with SQL-like syntax
6. Apache HBase: For real-time access to Hadoop data
7. Accumulo: Similar to HBase, with additional features such as cell-level security
8. Avro: Data serialization format (an alternative to Protocol Buffers and similar formats)
9. Apache ZooKeeper: Distributed coordination system
10. HCatalog: Exposes the Hive metastore so that other tools, such as Pig, can share the same table metadata
11. Oozie: Scheduling system developed by Yahoo
12. Flume: Log aggregation system
13. Whirr: For automating the deployment of Hadoop clusters on cloud services
14. Sqoop: For transferring structured data to Hadoop
15. Mahout: Machine learning on top of MapReduce
16. Bigtop: Integrates multiple Hadoop sub-systems into one that works as a whole
17. Crunch: Java API that runs on top of MapReduce and simplifies tedious tasks such as joins and data aggregation
18. Giraph: Used for large scale distributed graph processing
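
To make the MapReduce entry a little more concrete, here is a minimal sketch of the map, shuffle and reduce steps as a word count in plain Python. It runs on a single machine and does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are made up for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample input, chosen only to show the data flow.
counts = word_count(["big data big clusters", "data pipelines"])
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on many machines and the framework would handle the grouping step, which is why higher-level tools such as Pig, Hive and Crunch exist to hide this plumbing.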

Also, embedded below is an excellent TechTalk by Jakob Homan of LinkedIn on the subject explaining these tech terms.

Republished with author's permission from original post.
