Showing posts with label Big Data. Show all posts
Wednesday, 30 March 2016
Book: Data Algorithms Recipes for Scaling Up with Hadoop and Spark
A very nice book that teaches how to implement machine learning and data mining techniques such as the naive Bayes classifier (NBC), recommenders, and clustering. Implemented in Java, the book provides code for both Hadoop MapReduce and Apache Spark in a clean, easy-to-understand manner. I have re-coded most of the algorithms in the book, except for the chapters dealing with bioinformatics topics, which I am not particularly interested in at the moment.
Book: Apache Spark Graph Processing
This book guided me on how to use Apache Spark GraphX for graph processing in my project. While the book provides only basic implementations and an introduction to GraphX features such as visualization, aggregateMessages, and Pregel, I gained some useful insights after re-implementing most of the code in the book (knowledge of Scala is required).
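To give a feel for the Pregel model mentioned above, here is a minimal sketch in plain Python (not GraphX, and not code from the book). It computes single-source shortest paths in supersteps: each vertex keeps the smallest distance it has received, then sends updated distances along its out-edges. The function name and the edge-list format are assumptions made for illustration, and the sketch assumes non-negative edge weights.

```python
import math


def pregel_shortest_paths(edges, source):
    """Single-source shortest paths in Pregel-style supersteps.

    edges: list of (src, dst, weight) tuples with non-negative weights.
    Returns a dict mapping each vertex to its distance from source.
    """
    vertices = {v for s, d, _ in edges for v in (s, d)} | {source}
    dist = {v: math.inf for v in vertices}

    # Initial message: only the source hears anything in the first superstep.
    inbox = {source: 0.0}
    while inbox:
        # Vertex program: keep the smaller of the stored value and the message.
        changed = set()
        for vertex, message in inbox.items():
            if message < dist[vertex]:
                dist[vertex] = message
                changed.add(vertex)

        # Send phase: changed vertices offer a new distance to their neighbours.
        # Messages to the same target are merged with min().
        outbox = {}
        for src, dst, weight in edges:
            if src in changed:
                candidate = dist[src] + weight
                if candidate < dist[dst]:
                    outbox[dst] = min(outbox.get(dst, math.inf), candidate)
        inbox = outbox
    return dist
```

The loop stops when no messages are in flight, which is the same halting idea Pregel uses: a vertex that receives nothing new stays inactive.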
Link: https://www.packtpub.com/big-data-and-business-intelligence/apache-spark-graph-processing
Book: Hadoop MapReduce v2 Cookbook
I used the material in this book to build the MapReduce algorithms and HDFS infrastructure for my project. It is a very good introduction to Hadoop MapReduce and to setting up HDFS and YARN in a virtual environment.
Link: https://www.packtpub.com/big-data-and-business-intelligence/hadoop-mapreduce-v2-cookbook-second-edition
Wednesday, 3 February 2016
Online Course: Administering an ElasticSearch Cluster
Glad to have completed this course. Wonderful material on things like rolling restarts for ES upgrades, rolling indices, snapshots via Curator, the head plugin, and ES health check tips. I learned a lot about how to run an ES cluster properly.
Link: https://app.pluralsight.com/library/courses/administering-elasticsearch-cluster/table-of-contents
Tuesday, 19 January 2016
Online Course: Patterns for Building Distributed Systems for The Enterprise
Completed this course. Quite a good mind opener: I gained some new perspectives on distributed computing, particularly CQRS via MQ and ESB, as well as append-only models (especially event streaming and the historical model). This course gave me some good ideas on how to proceed with refactoring a big data project I am working on.
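As a rough illustration of the append-only idea combined with CQRS, here is a minimal Python sketch (my own, not taken from the course). The bank-account domain, the event names, and all class and function names are hypothetical. Commands only ever append events; queries derive state by replaying the history, which also makes "what was the state back then" questions cheap to answer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrawn" (hypothetical event names)
    amount: int


@dataclass
class EventStore:
    """Append-only log: events are added, never updated or deleted."""
    _log: list = field(default_factory=list)

    def append(self, event):
        self._log.append(event)

    def replay(self, upto=None):
        # upto=None returns the full history; upto=n returns the first n events.
        return tuple(self._log[:upto])


def balance(events):
    """Query side: a read model obtained by folding over the event history."""
    total = 0
    for event in events:
        total += event.amount if event.kind == "deposited" else -event.amount
    return total


def handle_withdraw(store, amount):
    """Command side: validate against current state, then append an event."""
    if amount > balance(store.replay()):
        raise ValueError("insufficient funds")
    store.append(Event("withdrawn", amount))
```

In a real system the command and query sides would typically be separate services connected by a message queue; here they are plain functions so the flow is easy to follow.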
Link: https://app.pluralsight.com/library/courses/cqrs-theory-practice/table-of-contents
Friday, 8 January 2016
Online Course: SQL on Hadoop - Analyzing Big Data with Hive
Completed this course. While the pace is a bit fast, it is packed with useful material on executing SQL on top of Hadoop using Hive; features such as DISTRIBUTE BY, SORT BY, GROUPING SETS, multiple insert, UDAFs, and UDTFs are really cool. I will need to revisit the course materials when I start to use Hive in development.
Link: https://app.pluralsight.com/library/courses/sql-hadoop-analyzing-big-data-hive/table-of-contents
Tuesday, 22 December 2015
Online Course: Introduction to Graph Databases and Neo4j
My first course on graph databases, using Neo4j. Quite a nice introduction to storing data as a graph.
Link: https://app.pluralsight.com/library/courses/graph-databases-neo4j-introduction/table-of-contents
Saturday, 19 December 2015
Online Course: Cassandra for Developers
Completed this course, and realized that I have been using Cassandra without a proper understanding of partition keys, indexes, and clustering keys.
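For my own reference, here is a toy Python model (not Cassandra code, and not from the course) of the difference between the two key types: the partition key decides which node holds a row, and the clustering key decides the sort order of rows inside that partition. The class name, the node count, and the CRC32 hash are assumptions for illustration; a real cluster uses the Murmur3 partitioner by default, and a real table would overwrite rows that share a full primary key, which this sketch does not model.

```python
import bisect
import zlib
from collections import defaultdict


class ToyTable:
    """Toy model of PRIMARY KEY ((partition_key), clustering_key)."""

    def __init__(self, node_count=3):
        self.node_count = node_count
        # One dict per node: partition key -> rows kept sorted by clustering key.
        self.nodes = [defaultdict(list) for _ in range(node_count)]

    def _node_for(self, partition_key):
        # Stand-in for the partitioner: hash the partition key to pick a node.
        return zlib.crc32(partition_key.encode("utf-8")) % self.node_count

    def insert(self, partition_key, clustering_key, value):
        rows = self.nodes[self._node_for(partition_key)][partition_key]
        bisect.insort(rows, (clustering_key, value))

    def range_query(self, partition_key, start, end):
        """Rows of one partition with start <= clustering_key < end."""
        rows = self.nodes[self._node_for(partition_key)].get(partition_key, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]
```

The point of the sketch is that a range query touches exactly one node and reads a contiguous slice, which is why queries that fix the partition key and range over the clustering key are the cheap ones.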
Link: https://app.pluralsight.com/library/courses/cassandra-developers/table-of-contents
Tuesday, 1 December 2015
Online Course: Stateful Reactive Concurrent SPAs with SignalR and Akka.NET
Completed this course to get a sense of how to interact with Akka.NET from a web client.
Link: https://app.pluralsight.com/library/courses/akkadotnet-signalr-stateful-reactive-concurrent-spas/table-of-contents
Online Course: Improving Message Throughput in Akka.NET
Completed this course to understand how to improve the performance of Akka.NET using components like routers, stash, and PipeTo, as well as some related design patterns.
Link: https://app.pluralsight.com/library/courses/akka-dotnet-improving-messaging-throughput/table-of-contents
Sunday, 29 November 2015
Online Course: Implementing Logging and Dependency Injection in Akka.NET
Completed this course. Good for learning logging and DI in Akka.NET (useful for Java developers using Akka as well).
Saturday, 28 November 2015
Online Course: Building Concurrent Applications with the Actor Model in Akka.NET
Have just completed this course, which is a gentle introduction to Akka.NET.
Link: https://app.pluralsight.com/library/courses/akka-dotnet-actor-model-building-concurrent-applications/table-of-contents
Monday, 9 November 2015
Online Course: Apache Spark Fundamentals
A great course for anyone interested in learning Apache Spark.
Link: https://app.pluralsight.com/library/courses/apache-spark-fundamentals/table-of-contents
Wednesday, 16 September 2015
Online Course: Understanding NoSQL
A very nice course that gives an overview of NoSQL databases and compares them.
Link: http://www.pluralsight.com/courses/understanding-nosql
Sunday, 13 September 2015
Online Course: Big Data, The Big Picture
A nice course that provides a non-technical overview of big data technologies and how they compare.
Link: http://www.pluralsight.com/courses/bigdata-bigpicture
Friday, 12 December 2014
Storm Blueprints: Patterns for Distributed Real-time Computation
A very useful book for anyone interested in Storm for real-time processing. I particularly benefited from some of the practical use cases of Storm, especially Storm Trident. Compared to other books on Storm that I have read so far, this one offers very good tutorials on several aspects of Trident, which answered some of my questions about Trident, such as:
- How to implement a Trident State
- How to implement a Trident State Factory
- How to implement a Trident State Updater
- How to effectively use combiners, reducers, and aggregators
- How to implement a Trident non-transactional topology, a repeat-transactional topology, and an opaque Trident map state
- How to implement a Trident spout, coordinator, emitter
- How to implement recursive functions in Trident
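To remind myself why combiners matter, here is a small Python analogy (not Storm code, and not from the book). It follows the three-method shape of a Trident combiner aggregator (init, combine, zero), but the class and helper names below are hypothetical. Because combine is associative, each partition can be reduced to one partial value locally, and only those partials need to cross the network.

```python
from functools import reduce


class CountCombiner:
    """Counts tuples using the init / combine / zero shape of a combiner."""

    def init(self, tup):
        # Value contributed by a single tuple.
        return 1

    def combine(self, left, right):
        # Associative merge of two partial results.
        return left + right

    def zero(self):
        # Result for an empty partition.
        return 0


def aggregate_partition(combiner, tuples):
    """Reduce one partition to a single partial value, locally."""
    return reduce(combiner.combine, (combiner.init(t) for t in tuples), combiner.zero())


def aggregate_stream(combiner, partitions):
    """Merge the per-partition partials into the final value."""
    partials = [aggregate_partition(combiner, p) for p in partitions]
    return reduce(combiner.combine, partials, combiner.zero())
```

A reducer-style aggregator, by contrast, folds tuples into a running value one at a time, so it cannot pre-aggregate per partition in the same way.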
While the later chapters involving Druid and Hadoop are a bit difficult for me to assimilate at this stage due to time constraints, I would definitely like to read those chapters again.
https://www.packtpub.com/big-data-and-business-intelligence/storm-blueprints-patterns-distributed-real-time-computation
Wednesday, 26 November 2014
Elastic Search Cookbook
The book is nice and easy for someone who is just starting to learn ElasticSearch. Interested readers can get the book from:
http://www.amazon.com/ElasticSearch-Cookbook-Alberto-Paro/dp/1782166629
Some parts feel a bit verbose to me, such as the explanations of installation and the different ways of doing search, query, mapping, etc. (which will be useful for reference anyway), as I only need to understand a subset of these commands and can explore the rest in the wild (e.g., http://java.dzone.com/articles/elasticsearch-java-api). While I did read through most of the material, the book seems a bit weak at presenting things in a short and concise way that gets someone immediately up to speed with the various features of ElasticSearch. However, coupled with a few online resources, it is actually quite easy to get started and play with ElasticSearch.
Tuesday, 25 November 2014
Learning Storm
Just completed the book "Learning Storm". A very nice read; interested readers can go to the following link to buy it:
https://www.packtpub.com/big-data-and-business-intelligence/learning-storm
The book covers a lot of ground, showing in an easy-to-understand way how other technologies work with Storm, including ZooKeeper, Kafka, Hadoop, YARN, Ganglia, JMX, HBase, Redis, MySQL, etc. I especially like the way they teach Trident, which makes the concept much easier to grasp, and the last chapter on machine learning is extremely useful.
Most readers can read the book chapter by chapter for a slow and full exposure. For someone like me, who always likes to delve directly into practice, the best approach is actually to read the book three times, each time skipping some chapters.
The first time, the reader should go through chapters 1 to 4, skipping the Thrift library introduction in chapter 3, and then jump directly to chapter 8, which gives an example of log processing in Storm. The reader will build a level of confidence after practicing the simple cases in these chapters.
The second time, the reader should go through chapters 5 and 9 to get a good idea of what Trident is and how it works, as well as how to do machine learning using Trident.
The third time, the reader can optionally go through the Thrift library in chapter 3, then go to chapter 7, which shows a rich set of tools for interacting with Storm, such as JMX and Ganglia. Finally, if there is a need for integration with Hadoop, go to chapter 6 and some other parts of chapter 7.
Wednesday, 12 November 2014
Getting Started with Storm
Just completed reading the "Getting Started with Storm" book. I would say this is one of the easiest-to-follow books I have read, yet it provides a good basic understanding of working with Storm, a distributed system for processing streaming data. A good read, totally recommended for anyone interested in learning Storm.
http://www.amazon.com/Getting-Started-Storm-Jonathan-Leibiusky/dp/1449324010
Sunday, 7 September 2014
Online Course: Intro to Hadoop and MapReduce
Udacity Link: Intro to Hadoop and MapReduce
A very easy-to-learn course (you should be able to finish the tutorial and course in around 1.5 hours). Basically, you will learn how to use the command line to interact with Hadoop DFS and how to write simple mapper and reducer Python scripts to process files in Hadoop DFS.
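For anyone curious what such scripts look like, here is a minimal word-count sketch in the usual streaming style (my own example, not the course's exercise; the function names and the single-file layout with a `map`/`reduce` argument are assumptions). The mapper emits tab-separated `word<TAB>1` lines, the framework sorts them by key, and the reducer sums the counts for each run of identical words.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum counts per word; input must already be sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    # Run as "script.py map" or "script.py reduce", reading stdin, writing stdout.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The two stages can be tried locally without Hadoop by piping the mapper output through `sort` into the reducer, which mimics the shuffle step.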