Introduction

In this post, we’ll use publicly available data to dig deeper into exploratory data analysis and machine learning techniques. The data comes from Jonathan McDowell’s Catalog site, which has data about man-made objects launched into space and the debris we have created in near-Earth orbits. We’ll try to load the file directly from the site so that we always work with the most recent data.

Continue reading

Generative Art

library(tidyverse)
library(ambient)
library(randomcoloR)
library(ggforce)

disturbance = expand.grid(c = 1:25, r = 1:49) %>%
  mutate(
    c = ifelse(r %% 2 == 0, c + 0.5, c),
    a = 180 * gen_cubic(c, r, frequency = 0.1, seed = 1964)
  ) %>%
  filter(c <= 50)

ggplot(disturbance) +
  geom_text(aes(c, r, label = "0", angle = a, color = a), family = "Times", size = 16, show.legend = F) +
  coord_fixed(ratio = 0.5, expand = TRUE) +
  scale_color_gradient2(low = "orange", high = "red", mid = "tomato") +
  theme_void()

set.

Continue reading

Introduction

In this post, we are going to look at some publicly available data to dig deeper into exploratory data analysis and machine learning techniques. I am going to start by fetching some data from the web; it is available at the FuelEconomy.gov site. The file has fuel economy data for all cars sold in the United States over several years. Let’s start by loading the libraries we need:

Continue reading

NPS analysis

NPS - Comment analysis

In a [previous post](https://nitinahuja.github.io/2017/nps-exploratory-analysis-in-r/) we performed some EDA on our NPS data. Recall that the question about the likelihood of recommending a service or business comes with an optional text response explaining why the respondent picked that score. Let’s see what those responses are about. We have already performed some sentiment analysis on this text; now we are going to attempt to classify it into topics.

Continue reading

Heat maps in R

Heatmaps

Heat maps are invaluable for displaying a large amount of continuous data contained in a 2D matrix. This post shows one way to create a print-worthy heat map in R. Let’s start by loading the required packages.

suppressPackageStartupMessages({
  library(ggplot2)
  library(ggthemes)
  library(viridis)
  library(scales)
  library(tidyr)
})

Data

Our data is from a business that receives sales calls 24x7. Let’s read it and see what it looks like. We have observations (count of calls) for each day of the week and each hour of the day.
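As a rough sketch of the kind of plot this data lends itself to, the snippet below draws a day-by-hour heat map with ggplot2. The call counts are synthetic, generated only for illustration, and the column names (`day`, `hour`, `count`) are assumptions rather than the names used in the actual data. It also uses ggplot2's built-in viridis scale instead of the viridis package, just to keep the example to a single dependency.

```r
library(ggplot2)

# Synthetic call counts, for illustration only: one row per day/hour cell
set.seed(42)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
calls <- expand.grid(hour = 0:23, day = factor(days, levels = days))
calls$count <- rpois(nrow(calls), lambda = 20)

# One tile per (hour, day) cell, filled by the number of calls
p <- ggplot(calls, aes(x = hour, y = day, fill = count)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(name = "Calls") +
  scale_x_continuous(breaks = seq(0, 23, by = 3), expand = c(0, 0)) +
  labs(x = "Hour of day", y = NULL) +
  theme_minimal()

print(p)
```

Keeping the data in long form (one row per cell) is what lets `geom_tile()` map each observation directly to a tile.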

Continue reading

NPS analysis

What is Net Promoter Score (NPS)?

Net Promoter Score, or NPS, is a customer loyalty metric developed by Fred Reichheld. It asks respondents a single question: how likely are you to recommend this product? Respondents answer with a score between 0 and 10, where 10 means “most likely” to recommend and 0 means “least likely”. An optional follow-up question asks why they picked this score, and the response is usually a text comment.
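The excerpt above does not spell out how the score itself is computed, so here is a minimal sketch using the standard convention: respondents scoring 9 or 10 are promoters, those scoring 0 to 6 are detractors, and NPS is the percentage of promoters minus the percentage of detractors. The function name `nps_score` and the sample ratings are made up for illustration.

```r
# Net Promoter Score from a vector of 0-10 ratings.
# Standard convention: promoters score 9-10, detractors 0-6, passives 7-8.
nps_score <- function(scores) {
  scores <- scores[!is.na(scores)]          # ignore missing responses
  stopifnot(length(scores) > 0, all(scores >= 0 & scores <= 10))
  promoters  <- sum(scores >= 9)
  detractors <- sum(scores <= 6)
  100 * (promoters - detractors) / length(scores)
}

# Made-up ratings, for illustration only:
# 5 promoters and 3 detractors out of 10 responses
ratings <- c(10, 9, 9, 8, 7, 6, 3, 10, 0, 9)
nps_score(ratings)  # 20
```

The result ranges from -100 (all detractors) to 100 (all promoters).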

Continue reading

Exploring crime in Philadelphia

This is a large and interesting dataset with data points stretching back over 10 years. Several explorations have pointed out that crime seems to be seasonal, and I wanted to explore this with a time series. Assuming that seasonal trends might repeat themselves, I explore this using the forecast package, with linear regression to predict trends.

suppressPackageStartupMessages({
  library(data.table)
  library(forecast)
  library(knitr)
})

Data size and structure.

Continue reading


Nitin Ahuja

A programmer’s viewpoint

Forever learning

California