Showing posts with label NoSQL. Show all posts

Wednesday, 30 March 2016

Book: Apache Spark Graph Processing

This book guided me in using Apache Spark GraphX for graph processing in my project. It offers only basic implementations and an introduction to GraphX features such as visualization, aggregateMessages, and Pregel, but I gained some useful insights after re-implementing most of the code in the book (knowledge of Scala is required).
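To remind myself of the idea behind aggregateMessages, here is a minimal sketch in plain Python rather than Scala. It is not the GraphX API: the function name `aggregate_messages`, its parameters, and the sample edge list are all made up for illustration. It only imitates the pattern of sending a message along each edge and merging the messages that arrive at the same vertex, here used to count in-degrees.

```python
def aggregate_messages(edges, send_msg, merge_msg):
    """Toy imitation of the aggregateMessages pattern (hypothetical helper).

    edges     -- iterable of (src, dst) vertex id pairs
    send_msg  -- function (src, dst) -> list of (target_vertex, message)
    merge_msg -- function (a, b) -> combined message for one vertex
    """
    inbox = {}
    for src, dst in edges:
        for target, msg in send_msg(src, dst):
            if target in inbox:
                # Combine with the messages already received by this vertex.
                inbox[target] = merge_msg(inbox[target], msg)
            else:
                inbox[target] = msg
    return inbox


# Made-up sample graph: each tuple is a directed edge (src, dst).
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]

# Every edge sends the message 1 to its destination; messages are summed,
# so each vertex ends up with its in-degree.
in_degree = aggregate_messages(
    edges,
    send_msg=lambda src, dst: [(dst, 1)],
    merge_msg=lambda a, b: a + b,
)
print(in_degree)
```

Vertices that receive no message simply do not appear in the result, which is also something to keep in mind when joining the result back to the full vertex set.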

Link: https://www.packtpub.com/big-data-and-business-intelligence/apache-spark-graph-processing

Book: Hadoop MapReduce v2 Cookbook

I used the material in this book to build the MapReduce algorithms and HDFS infrastructure for my project. It is a very good introduction to Hadoop MapReduce and to setting up HDFS and YARN in a virtual environment.

Link: https://www.packtpub.com/big-data-and-business-intelligence/hadoop-mapreduce-v2-cookbook-second-edition


Friday, 8 January 2016

Online Course: SQL on Hadoop - Analyzing Big Data with Hive

Completed this course. The pace is a bit fast, but it is packed with useful material on executing SQL on top of Hadoop using Hive. Features such as DISTRIBUTE BY, SORT BY, GROUPING SETS, multiple inserts, UDAFs, and UDTFs are really cool. I will need to revisit the course materials when I start using Hive in development.
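As a note for when I revisit this, here is a rough sketch in plain Python of what distributing by one column and sorting by another does. It is not Hive code: the function `distribute_and_sort`, the column names, and the sample rows are hypothetical, and the CRC32 hash is just a stand-in for whatever partitioning Hive actually applies. The point it tries to show is that rows with the same distribution key land on the same reducer, and that the sort applies within each reducer rather than across the whole result.

```python
import zlib


def distribute_and_sort(rows, distribute_key, sort_key, num_reducers=2):
    """Imitate distribute-then-sort semantics on a list of dict rows."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        # Deterministic stand-in hash: same key value -> same reducer.
        bucket = zlib.crc32(str(row[distribute_key]).encode("utf-8")) % num_reducers
        reducers[bucket].append(row)
    # Sorting happens per reducer, so there is no global order.
    return [sorted(part, key=lambda r: r[sort_key]) for part in reducers]


# Made-up sample data.
rows = [
    {"user": "alice", "ts": 30},
    {"user": "bob", "ts": 10},
    {"user": "alice", "ts": 5},
    {"user": "carol", "ts": 20},
    {"user": "bob", "ts": 40},
]

partitions = distribute_and_sort(rows, distribute_key="user", sort_key="ts")
for i, part in enumerate(partitions):
    print(i, part)
```

Getting a single, globally ordered result would need a different approach (in Hive, ORDER BY), which is why the per-reducer behaviour is worth remembering.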

Link: https://app.pluralsight.com/library/courses/sql-hadoop-analyzing-big-data-hive/table-of-contents