Showing posts with label Data Analytics.
Wednesday, 30 March 2016
Book: Data Algorithms: Recipes for Scaling Up with Hadoop and Spark
A very nice book that teaches how to implement machine learning and data mining techniques such as the naive Bayes classifier (NBC), recommenders, and clustering. Implemented in Java, the book provides code for both Hadoop MapReduce and Apache Spark in a clean, simple-to-understand manner. I have re-coded most of the algorithms in the book, except for the chapters dealing with biology-related topics, which I am not particularly interested in at the moment.
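The book's own implementations are in Java; as a rough illustration of the idea (not the book's code), here is a minimal pure-Python sketch of how naive Bayes training splits into map, shuffle, and reduce phases: the mappers emit counts of 1 for each class and each (class, feature, value) triple, and the reducers sum them. The toy records, the function names, and the use of Laplace smoothing are all assumptions for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy records: (feature_0, feature_1, label)
RECORDS = [
    ("sunny", "hot", "no"),
    ("sunny", "mild", "no"),
    ("rainy", "mild", "yes"),
    ("overcast", "hot", "yes"),
    ("rainy", "cool", "yes"),
]


def map_record(record):
    """Map: emit 1 for the record's class and 1 for each (class, feature index, value)."""
    *features, label = record
    yield (label,), 1
    for i, value in enumerate(features):
        yield (label, i, value), 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce: sum the 1s for each key."""
    return {key: sum(values) for key, values in groups.items()}


def train(records):
    return reduce_counts(shuffle(kv for r in records for kv in map_record(r)))


def classify(counts, features, alpha=1.0):
    """Return the label with the highest log posterior, using add-alpha (Laplace) smoothing."""
    labels = [key[0] for key in counts if len(key) == 1]
    total = sum(counts[(c,)] for c in labels)
    best_label, best_score = None, -math.inf
    for c in labels:
        score = math.log(counts[(c,)] / total)  # log prior
        for i, value in enumerate(features):
            # Number of distinct values observed for feature i across all classes.
            n_values = len({k[2] for k in counts if len(k) == 3 and k[1] == i})
            count = counts.get((c, i, value), 0)
            score += math.log((count + alpha) / (counts[(c,)] + alpha * n_values))
        if score > best_score:
            best_label, best_score = c, score
    return best_label


counts = train(RECORDS)
print(classify(counts, ("rainy", "mild")))  # -> yes
print(classify(counts, ("sunny", "hot")))   # -> no
```

In a real Hadoop or Spark job the map and reduce functions run on different machines and the shuffle is done by the framework; here they are plain function calls so the data flow is easy to follow.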
Book: Apache Spark Graph Processing
This book guided me on how to use Apache Spark GraphX for graph processing in my project. While the book provides only basic implementations and an introduction to GraphX features such as visualization, aggregateMessages, and Pregel, I gained some useful insights after re-implementing most of the code in the book (knowledge of Scala is required).
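GraphX's Pregel operator runs a vertex program in supersteps: each vertex that receives a message updates its state, then sends messages along its edges, and the computation stops when no messages remain. The sketch below is a minimal pure-Python simulation of that model (it is not the GraphX API, which is in Scala) for single-source shortest paths on a hypothetical weighted graph; the comments point out which part corresponds to Pregel's vprog, sendMsg, and mergeMsg functions.

```python
import math

# Hypothetical weighted, directed graph: (src, dst, weight)
EDGES = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0), ("c", "d", 1.0)]


def pregel_sssp(edges, source, max_supersteps=100):
    """Shortest distance from source to every vertex, computed superstep by superstep."""
    vertices = {v for s, d, _ in edges for v in (s, d)} | {source}
    dist = {v: math.inf for v in vertices}
    messages = {source: 0.0}  # initial message is delivered to the source only
    for _ in range(max_supersteps):
        if not messages:  # no messages in flight: the computation has converged
            break
        # Vertex program (cf. vprog): keep the smaller of current and incoming distance.
        changed = set()
        for v, incoming in messages.items():
            if incoming < dist[v]:
                dist[v] = incoming
                changed.add(v)
        # Send phase (cf. sendMsg): only vertices that changed propose dist + weight;
        # combining proposals per destination with min plays the role of mergeMsg.
        next_messages = {}
        for s, d, w in edges:
            if s in changed and dist[s] + w < dist[d]:
                next_messages[d] = min(next_messages.get(d, math.inf), dist[s] + w)
        messages = next_messages
    return dist


print(sorted(pregel_sssp(EDGES, "a").items()))
# -> [('a', 0.0), ('b', 1.0), ('c', 3.0), ('d', 4.0)]
```

Only vertices whose distance improved send messages in the next superstep, which is what keeps Pregel-style algorithms from touching the whole graph on every iteration.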
Link: https://www.packtpub.com/big-data-and-business-intelligence/apache-spark-graph-processing
Book: Hadoop MapReduce v2 Cookbook
I used the material in this book to build the MapReduce algorithms and HDFS infrastructure for my project. It is a very good introduction to Hadoop MapReduce and to setting up HDFS and YARN in a virtual environment.
Link: https://www.packtpub.com/big-data-and-business-intelligence/hadoop-mapreduce-v2-cookbook-second-edition
Sunday, 26 April 2015
Online Course: Text Retrieval and Search Engines
A very nice course for learning how to implement a search engine. Thanks to the course, I was able to build a basic search engine in C# with features introduced in the course, such as:
Ranking Functions:
-- BM25, Pivoted Length Normalization for the Vector Space Model (IDF, TF transformation, document length normalization)
-- Jelinek-Mercer (JM) and Dirichlet Prior Smoothing for language-model-based query likelihood
-- PageRank and HITS from Link Analysis
Feedback Functions:
-- Rocchio algorithm with implicit feedback for VSM
-- KL divergence model for LM
Inverted Indexing and MapReduce
Web-crawler:
-- breadth-first, parallel, and focused crawling
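To make the first two items concrete, here is a minimal Python sketch (not the course's reference code, and not my C# implementation) of an in-memory inverted index scored with BM25. It assumes whitespace tokenization, the commonly used defaults k1 = 1.2 and b = 0.75, and the Lucene-style non-negative IDF log((N - df + 0.5)/(df + 0.5) + 1); the IDF variant taught in the course may differ. The document IDs and texts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}, plus document lengths."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25(query, index, lengths, k1=1.2, b=0.75):
    """Score only the documents in the query terms' posting lists; best first."""
    n_docs = len(lengths)
    avgdl = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        df = len(postings)
        if df == 0:
            continue
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        for doc_id, tf in postings.items():
            # Pivoted document-length normalization inside the TF saturation term.
            norm = k1 * (1 - b + b * lengths[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


DOCS = {
    "d1": "search engine ranking with bm25",
    "d2": "cooking pasta at home",
    "d3": "search search engine index",
}
index, lengths = build_index(DOCS)
print(bm25("search engine", index, lengths))  # d3 ranks above d1; d2 is not scored
```

Walking only the posting lists of the query terms is the reason for building the inverted index in the first place: documents that share no term with the query are never touched.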
Link: https://www.coursera.org/course/textretrieval
Thursday, 9 April 2015
Online Course: R Programming
A very easy-to-follow and concise course on R programming. It took me around a day to watch all the lecture videos and practice in R while following along. I liked Dr. Roger Peng's pace as well as the selective inclusion of R features in the course materials; most books on R try to cover too much and only end up reading like a desktop reference or instruction manual for R. I particularly liked the parts explaining the usage of lapply, sapply, apply, mapply, and split, as well as the parts covering the subsetting of vectors, lists, matrices, and data frames, which I had found a bit confusing in the past.
https://class.coursera.org/rprog-013
Tuesday, 7 April 2015
Online Course: Data Analysis and Statistical Inference
A very nice introductory statistics course. Dr. Mine Çetinkaya-Rundel gives very easy-to-understand and concise explanations of many basic concepts such as probability trees, Bayes' rule, the CLT, confidence intervals, hypothesis testing, chi-square tests of independence and goodness of fit (GOF), distributions such as the t distribution (used for small samples, where the CLT-based normal approximation does not hold), the F statistic, and ANOVA. The course is still running, but I could not wait for the videos of the last two weeks to become available and ended up reading the companion book, "OpenIntro Statistics" (its last two chapters cover linear and logistic regression, along with related tools such as predictor correlation, confidence intervals for predictor coefficients, R^2, residuals, and backward elimination and forward selection of models using p-values and R^2).
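As a small worked illustration of two of these ideas (not material from the course itself), here is a Python sketch of a CLT-based confidence interval for a mean, x̄ ± z*·s/√n, and a two-sided z-test p-value for H0: μ = μ0. It assumes Python 3.8+ for statistics.NormalDist and uses a hypothetical sample; for a sample this small the course would use the t distribution instead of z.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation (CLT) interval: x_bar +/- z_star * s / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error of the mean
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return x_bar - z_star * se, x_bar + z_star * se


def z_test_p_value(sample, mu0):
    """Two-sided p-value for H0: mu == mu0, using the same normal approximation."""
    se = stdev(sample) / math.sqrt(len(sample))
    z = (mean(sample) - mu0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


SAMPLE = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, sample mean 5
print(mean_confidence_interval(SAMPLE))
print(z_test_p_value(SAMPLE, 5))  # -> 1.0 (observed mean equals mu0)
```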
https://class.coursera.org/statistics-003
OpenIntro Statistics
A very nice book for starting to learn inferential statistics; it is concise and contains a lot of examples. The book is freely downloadable from:
https://www.openintro.org/stat/textbook.php?stat_book=os
It gives clear explanations of concepts such as the central limit theorem, confidence intervals, standard error, hypothesis testing (for continuous and categorical variables), chi-square GOF and independence tests, normal/t/F statistics, bootstrapping, ANOVA, multiple comparisons, and regression model selection (forward and backward selection).
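Bootstrapping is perhaps easiest to grasp in code: resample the data with replacement many times, recompute the statistic on each resample, and read a confidence interval off the percentiles of those estimates. Below is a minimal Python sketch of the percentile bootstrap, assuming a hypothetical sample and a fixed seed for reproducibility; the percentile method is only one of several bootstrap interval methods.

```python
import random
from statistics import mean


def bootstrap_ci(sample, stat=mean, n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Recompute the statistic on n_resamples resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    alpha = 1 - confidence
    lo_idx = max(0, round(alpha / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return estimates[lo_idx], estimates[hi_idx]


SAMPLE = [12.1, 9.8, 11.4, 10.2, 13.5, 8.9, 10.7, 12.8, 9.5, 11.9]  # hypothetical
print(bootstrap_ci(SAMPLE))
```

Passing a different function as stat (for example statistics.median) gives an interval for that statistic without any new formula, which is much of the appeal of the method.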
Sunday, 11 August 2013
Data Analysis Using SQL and Excel
Although I use this book primarily for learning some statistics, some of its SQL query examples can be quite useful. Overall, I think the book is quite easy to master, and it may provide some insight into CRM (customer relationship management) data analytics.
http://www.amazon.com/dp/0470099518
Reading Status: Completed
Sunday, 4 August 2013
Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management
Although, in my personal opinion, the book is not very well written, it provides quite a few insights into how data mining methods such as logistic regression can be used for customer segmentation and profiling, and for building various targeting and propensity models for marketing, risk, and customer relationship management. The illustrative programming language is SAS, but the code is not completely listed in the book. It is not a beginner's book for SAS, but a seasoned programmer who has not previously programmed in SAS should have no difficulty picking up the code and implementations in the book. By the way, the book is outdated.
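The book's examples are in SAS; as a rough, hypothetical illustration of the idea in Python (not the book's code or method details), here is a minimal logistic regression propensity model fitted by batch gradient descent on the log-loss. The single feature (say, number of past purchases), the toy labels (whether the customer responded to a campaign), the learning rate, and the epoch count are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the average log-loss; X is a list of feature lists."""
    n_features = len(X[0])
    w, b, m = [0.0] * n_features, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def propensity(w, b, x):
    """Estimated probability that a customer with features x responds."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical data: [past purchases] -> responded (1) or not (0)
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(X, y)
print(w, b, propensity(w, b, [5]), propensity(w, b, [0]))
```

Sorting customers by their propensity score and targeting the top of the list is the typical way such a model feeds into a marketing campaign.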
http://www.amazon.com/Data-Mining-Cookbook-Relationship-Management/dp/0471385646
Reading Status: Not Completed.
Thursday, 25 July 2013
Programming Collective Intelligence: Building Smart Web 2.0 Applications
A very easy-to-understand book on how to apply machine learning and computational intelligence methods to web data.
http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325
Reading Status: Not Completed
Thursday, 11 July 2013
Supply Chain Optimization, Design, and Management: Advances and Intelligent Methods
A very nice book about recent supply chain optimization methods, as well as computational intelligence approaches for solving problems such as customer uncertainty.
Reading Status: Not Completed
Wednesday, 10 July 2013
Learning OpenCV: Computer Vision with the OpenCV Library
This is a very easy-to-understand book on computer vision that uses the OpenCV library to develop computer vision and machine learning applications.
Reading Status: Not Completed
Friday, 5 July 2013
Mining the Social Web
A very useful book on using Python tools to mine the web.
Link: http://www.amazon.com/Mining-Social-Web-Matthew-Russell/dp/1449367615/ref=sr_1_2?s=books&ie=UTF8&qid=1373020345&sr=1-2&keywords=Mining+the+Social+Web
Reading Status: Not Completed
Mining Graph Data
The book I read while researching the application of graph mining to vehicle routing problems during my PhD.
Link: http://www.amazon.com/Mining-Graph-Data-Diane-Cook/dp/0471731900
Reading Status: Completed
Constructing Intelligent Agents Using Java
The first book in which I came across computational intelligence concepts related to data mining and machine learning.
Link: http://books.google.com.sg/books/about/Constructing_intelligent_agents_using_JA.html?id=ZbBQAAAAMAAJ&redir_esc=y
Reading Status: Completed
Exploring Everyday Things with R and Ruby
If you’re curious about how things work, this fun and intriguing guide will help you find real answers to everyday problems. By using fundamental math and doing simple programming with the Ruby and R languages, you’ll learn how to model a problem and work toward a solution.
All you need is a basic understanding of programming. After a quick introduction to Ruby and R, you’ll explore a wide range of questions by learning how to assemble, process, simulate, and analyze the available data. You’ll learn to see everyday things in a different perspective through simple programs and common sense logic. Once you finish this book, you can begin your own journey of exploration and discovery.
Reading Status: Not Completed
Introduction to Scientific Programming and Simulation Using R
An Introduction to Scientific Programming and Simulation Using R teaches the skills needed to perform scientific programming while also introducing stochastic modelling. Stochastic modelling in particular, and mathematical modelling in general, are intimately linked to scientific programming because the numerical techniques of scientific programming enable the practical application of mathematical models to real-world problems.
Reading Status: Not Completed