Showing posts with label Data Analytics. Show all posts

Wednesday, 30 March 2016

Book: Data Algorithms Recipes for Scaling Up with Hadoop and Spark

A very nice book that teaches how to implement machine learning and data mining techniques such as NBC, recommenders, clustering, etc. Implemented in Java, the book provides code for both Hadoop MapReduce and Apache Spark in a clean, simple-to-understand manner. I have re-coded most of the algorithms in the book, except for the chapters dealing with bio-related material, which I am not particularly interested in at the moment.

Book: Apache Spark Graph Processing

This book guided me on how to use Apache Spark GraphX for graph processing in my project. While the book provides only basic implementations and an introduction to GraphX features such as visualization, aggregateMessages, and Pregel, I gained some useful insights after re-implementing most of the code in the book (knowledge of Scala is required).

Link: https://www.packtpub.com/big-data-and-business-intelligence/apache-spark-graph-processing

Book: Hadoop MapReduce v2 Cookbook

I used the material in this book to build the MapReduce algorithms and HDFS infrastructure for my project. It is a very good introduction to Hadoop MapReduce and to setting up HDFS and YARN in a virtual environment.

Link: https://www.packtpub.com/big-data-and-business-intelligence/hadoop-mapreduce-v2-cookbook-second-edition


Sunday, 26 April 2015

Online Course: Text Retrieval and Search Engines

A very nice course for learning how to implement a search engine. Thanks to the course, I was able to build a basic search engine in C# with features introduced in the course, such as:

Ranking Functions:
  -- BM25, Pivoted Length Normalization for Vector Space Model (IDF, TF transformation, doc length normalization)
  -- JM and Dirichlet Prior Smoothing for Language model based query likelihood
  -- PageRank and HITS from Link Analysis

Feedback Functions:
  -- Rocchio algorithm with implicit feedback for VSM
  -- KL divergence model for LM

Inverted Indexing and Map-reduce

Web-crawler:
  -- breadth-first search, parallel, focused search
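
As a rough illustration of how two of the features listed above fit together, here is a minimal Python sketch of an inverted index scored with BM25. It is not the C# code from my search engine: the function names, the whitespace tokenizer, the toy documents, and the parameter defaults (k1 = 1.2, b = 0.75) are all assumptions chosen for the example, and the IDF uses one common smoothed variant.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}; also return document lengths."""
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25_scores(query, index, doc_len, k1=1.2, b=0.75):
    """Rank documents for a query with BM25; returns [(doc_id, score), ...]."""
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        df = len(postings)
        # Smoothed IDF, kept non-negative by the "+ 1.0" inside the log.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for doc_id, tf in postings.items():
            # Document length normalization controlled by b.
            norm = 1 - b + b * doc_len[doc_id] / avg_len
            # Saturating TF transformation controlled by k1.
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy collection, made up for this example.
docs = {
    "d1": "text retrieval and search engines",
    "d2": "search engines rank documents",
    "d3": "cooking with fresh herbs",
}
index, doc_len = build_inverted_index(docs)
ranked = bm25_scores("search engines", index, doc_len)
```

In this toy collection both d1 and d2 contain each query term once, so the shorter document d2 ranks first because of the length normalization; d3 matches no query term and is not returned.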

Link: https://www.coursera.org/course/textretrieval


Thursday, 9 April 2015

Online Course: R Programming

A very easy-to-follow and concise course on R programming. It took me around a day to view all the lecture videos and practice in R while following along. I liked Dr. Roger Peng's pace as well as the selective inclusion of R features in the course materials; most of the books on R out there try to cover too much and end up reading more like a desktop reference or instruction manual for R. I particularly liked the parts explaining the usage of lapply, sapply, apply, mapply, tapply and split, as well as the parts covering the subsetting of vectors, lists, matrices, and data frames, which I used to find a bit confusing.

https://class.coursera.org/rprog-013

Tuesday, 7 April 2015

Online Course: Data Analysis and Statistical Inference

A very nice introductory statistics course. Dr. Mine Çetinkaya-Rundel gives very easy-to-understand and concise explanations of many basic concepts such as probability trees, Bayes' rule, the CLT, confidence intervals, hypothesis testing, chi-square independence and GOF testing, distributions such as the t distribution (used when the CLT does not hold for small samples), F statistics and ANOVA. The course is still running, but I could not wait for the videos of the last two weeks to become available and ended up reading the companion book "OpenIntro Statistics" (the last two chapters cover linear/logistic regression and related statistics such as predictor correlation, confidence intervals for predictor coefficients, R^2, and residuals, as well as useful tools such as backward elimination and forward selection using p-values and R^2).

https://class.coursera.org/statistics-003

OpenIntro Statistics

A very nice book for starting to learn inferential statistics; it is very concise and contains a lot of examples. The book is freely downloadable from:

https://www.openintro.org/stat/textbook.php?stat_book=os

You will get clear explanations of concepts such as the central limit theorem, confidence intervals, standard error, hypothesis testing (for continuous and categorical variables), chi-square GOF and independence tests, normal/t/F statistics, bootstrapping, ANOVA, multiple comparisons, and regression model selection (forward and backward selection).
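
To make a few of these concepts concrete, here is a small Python sketch (not taken from the book) that computes a confidence interval for a mean from the standard error and a normal critical value, which is what the central limit theorem justifies for reasonably large samples. The function name and the sample data are made up for the example; for small samples a t critical value would be used instead of the normal one.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    point_estimate = mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin


# Hypothetical sample: the integers 1..100.
sample = list(range(1, 101))
low, high = mean_confidence_interval(sample)
```

The interval is centred on the sample mean (50.5 here) and narrows as the sample grows, because the standard error shrinks with the square root of n.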

Sunday, 11 August 2013

Data Analysis Using SQL and Excel

Although I used this book primarily for learning some statistics, some of its SQL query examples can be quite useful. Overall, I think the book is quite easy to master, and it may provide some insight into CRM data analytics.

http://www.amazon.com/dp/0470099518

Reading Status: Completed

Sunday, 4 August 2013

Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management

Although, in my personal opinion, the book is not very well written, it provides quite a bit of insight into how mining methods such as logistic regression can be used for customer segmentation and profiling, and to create various targeting and propensity models for marketing, risk, and customer relationship management. The code examples are in SAS, but the code is not completely listed in the book. It is not a beginner's book for SAS, but a seasoned programmer who has not previously programmed in SAS should not have difficulty picking up the code and implementation in the book. By the way, the book is outdated.

http://www.amazon.com/Data-Mining-Cookbook-Relationship-Management/dp/0471385646

Reading Status: Not Completed.


Thursday, 11 July 2013

Supply Chain Optimization Design and Management Advances and Intelligent Methods.

A very nice book about recent supply chain optimization methods as well as computational intelligence approaches for solving problems such as customer uncertainty, etc.
Reading Status: Not Completed

Wednesday, 10 July 2013

Learning OpenCV Computer Vision with the OpenCV library

This is a very easy-to-understand book on computer vision that uses the OpenCV library to develop computer vision and learning applications.
Reading Status: Not Completed

Friday, 5 July 2013

Mining the Social Web

A very useful book on using Python tools to mine the web.

Link: http://www.amazon.com/Mining-Social-Web-Matthew-Russell/dp/1449367615/ref=sr_1_2?s=books&ie=UTF8&qid=1373020345&sr=1-2&keywords=Mining+the+Social+Web
Reading Status: Not Completed

Mining Graph Data

The book I read while researching the application of graph mining to vehicle routing problems during my PhD.

Link: http://www.amazon.com/Mining-Graph-Data-Diane-Cook/dp/0471731900
Reading Status: Completed

Constructing intelligent agents using JAVA

The first book in which I came across computational intelligence concepts related to data mining and machine learning.

Link: http://books.google.com.sg/books/about/Constructing_intelligent_agents_using_JA.html?id=ZbBQAAAAMAAJ&redir_esc=y
Reading Status: Completed


Exploring Everyday Things with R and Ruby

If you’re curious about how things work, this fun and intriguing guide will help you find real answers to everyday problems. By using fundamental math and doing simple programming with the Ruby and R languages, you’ll learn how to model a problem and work toward a solution.

All you need is a basic understanding of programming. After a quick introduction to Ruby and R, you’ll explore a wide range of questions by learning how to assemble, process, simulate, and analyze the available data. You’ll learn to see everyday things in a different perspective through simple programs and common sense logic. Once you finish this book, you can begin your own journey of exploration and discovery.

Reading Status: Not Completed

Introduction to Scientific Programming and Simulation Using R


An Introduction to Scientific Programming and Simulation Using R teaches the skills needed to perform scientific programming while also introducing stochastic modelling. Stochastic modelling in particular, and mathematical modelling in general, are intimately linked to scientific programming because the numerical techniques of scientific programming enable the practical application of mathematical models to real-world problems.

Reading Status: Not Completed