Thursday, August 18, 2016

Exploratory Data Analysis


I finished the Johns Hopkins University Exploratory Data Analysis course on Coursera earlier this year. It was a great course for getting into data and using R to get a good sense of what you are looking at. Here are some of the main subjects it covers, with a few quick thoughts on each.

Exploratory Graphs - Use these for a quick and dirty look at the data you have, to see what it might tell you. Once you have a direction, you can put together a more polished set of charts.
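The course does this in R with quick plots such as boxplots and histograms. As a rough, dependency-free illustration of the same idea (plain Python, not course material), the sketch below computes a five-number summary (the numbers a boxplot draws) and prints a text histogram. The function names `five_number_summary` and `text_histogram` are hypothetical, and the sample values are made up.

```python
import statistics


def five_number_summary(values):
    """Min, lower quartile, median, upper quartile and max: the numbers a boxplot draws."""
    xs = sorted(values)
    # method="inclusive" interpolates linearly between the sorted data points
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return xs[0], q1, median, q3, xs[-1]


def text_histogram(values, bins=5, width=30):
    """Count values into equal-width bins and draw each count as a row of '#'."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid a zero step when all values are equal
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / step), bins - 1)  # the maximum lands in the last bin
        counts[i] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        bar = "#" * round(width * count / peak)
        lines.append(f"{lo + i * step:8.2f} | {bar} {count}")
    return lines


data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 12]  # made-up sample values
summary = five_number_summary(data)
print(summary)
for line in text_histogram(data):
    print(line)
```

Even this crude view shows most values sitting between about 3 and 6, with the 12 off on its own: the kind of lead you would then follow up with proper plots.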

Plotting Systems - The course covers three:

  1. base - The basic plotting system. You create a plot and then annotate and add to it as you go. Easy to use, but the finished plot is just a series of commands, so recreating or adjusting it means rerunning the whole set.
  2. lattice - The whole plot is created with a single function call, and there is no way to add to it afterward.
  3. ggplot2 - A combination of the two. Sensible defaults take care of a lot of the basics up front, and you can still add to the plot after it is created.

Clustering

  1. Hierarchical or k-means - Use a distance metric (Euclidean or Manhattan) to group similar data points into clusters for analysis. Hierarchical clustering can be drawn as a dendrogram, which shows the order in which points and clusters were merged.
  2. Dimension Reduction - Use principal component analysis (PCA) or singular value decomposition (SVD) to reduce the data to fewer dimensions and find meaningful relationships.
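In R these steps map to functions like `dist()`, `hclust()` and `kmeans()`. The sketch below is a plain-Python illustration of the same ideas, not the course's code. It shows the two distance metrics, k-means (Lloyd's algorithm, here with Euclidean distance only), and agglomerative hierarchical clustering with complete linkage, which is also the default method of R's `hclust()`. All function names are hypothetical, and the points are made up.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """'City block' distance: the sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm with Euclidean distance: assign each point to its
    nearest centroid, move each centroid to the mean of its points, repeat."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centres
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def hierarchical(points, dist=euclidean):
    """Agglomerative clustering with complete linkage: start with every point
    in its own cluster and repeatedly merge the two closest clusters, where
    cluster distance is the largest pairwise point distance between them.
    Returns (cluster_a, cluster_b, height) merges, the data a dendrogram draws;
    merged clusters get new ids starting at len(points)."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                a, b = ids[i], ids[j]
                d = max(dist(points[p], points[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8), (8, 9)]  # two made-up groups
labels, centroids = kmeans(points, k=2)
merges = hierarchical(points, dist=manhattan)
print(labels)
for merge in merges:
    print(merge)
```

Both methods should put the first three points in one cluster and the last three in the other. In the hierarchical result, the final merge happens at a much larger height than the others, which is where you would cut the dendrogram.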
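For dimension reduction, R offers `prcomp()` and `svd()`. The sketch below instead finds the first principal component in plain Python using power iteration on the covariance matrix. This is a swapped-in technique chosen only to stay dependency-free: `prcomp()` works from the SVD of the centred data, which gives the same component directions. The function names are hypothetical, and the data is made up.

```python
import math


def center(rows):
    """Subtract each column's mean so the data is centred on the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centred):
    """Sample covariance matrix of already-centred rows (divides by n - 1)."""
    n, p = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def first_component(rows, iters=500):
    """Power iteration: repeatedly multiply a vector by the covariance matrix
    and renormalise. It converges to the eigenvector with the largest
    eigenvalue, i.e. the direction of greatest variance (the first principal
    component), provided the starting vector is not orthogonal to it."""
    c = covariance(center(rows))
    v = [1.0] * len(c)
    for _ in range(iters):
        w = mat_vec(c, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("power iteration collapsed to the zero vector")
        v = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(v, mat_vec(c, v)))  # v^T C v, the eigenvalue
    return v, variance


def project(rows, v):
    """Principal component scores: each centred row's coordinate along v."""
    return [sum(x * y for x, y in zip(row, v)) for row in center(rows)]


rows = [(1, 2), (2, 4), (3, 6), (4, 8)]  # made-up points lying on the line y = 2x
v, variance = first_component(rows)
scores = project(rows, v)
print(v, variance)
print(scores)
```

Because these points all lie on y = 2x, the component comes out along (1, 2) / sqrt(5) with variance 25/3. The single column of scores captures all of the variation, so two columns reduce to one without losing information.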
