Assignment:

1. Using the diamonds dataset (part of the ggplot2 package), create a visualization to explore the relationship between carat and price conditioned on (diamond) color.

  • carat: number (continuous)
  • price: integer (discrete)
  • color: ordinal factor (discrete)
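A quick way to confirm the variable types listed above is to inspect the columns directly. This is a minimal sketch that assumes only that ggplot2 is installed, since `diamonds` ships with it.

```r
library(ggplot2)

# Show the storage type of the three variables used in question 1
str(diamonds[, c("carat", "price", "color")])

# color is an ordered factor whose levels run from D (best) to J (worst)
is.ordered(diamonds$color)
levels(diamonds$color)
```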
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.2
ggplot(diamonds, aes(price, carat, color = color)) +
  geom_point()

ggplot(diamonds, aes(carat)) +
  geom_histogram(binwidth = 0.1)

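# With stat = "identity", the bars stack the prices of all diamonds at each
# carat value, so bar height reflects how many diamonds share a carat as well
# as their prices. xlim(0, 3) turns values outside the limits into NA, and
# those rows are dropped, hence the warnings below.
ggplot(diamonds, aes(carat, price, fill = color)) +
  xlim(0, 3) +
  geom_bar(stat = "identity")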
## Warning: Removed 32 rows containing missing values (position_stack).
## Warning: Removed 8 rows containing missing values (geom_bar).
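Because stacked bars add prices together, a typical (median) price per carat bin is easier to compare across color grades. This is a sketch, not the assignment's required method: rounding carat to 0.1 is an illustrative bin choice, and `binned`, `med_price`, and `p_median` are hypothetical names.

```r
library(ggplot2)

# Round carat to the nearest 0.1 to form bins (an illustrative choice),
# then take the median price of each carat bin within each color grade
binned <- transform(as.data.frame(diamonds), carat_bin = round(carat, 1))
med_price <- aggregate(price ~ carat_bin + color, data = binned, FUN = median)

# One line per color grade: median price versus binned carat
p_median <- ggplot(med_price, aes(carat_bin, price, color = color)) +
  geom_line() +
  labs(x = "carat (rounded to 0.1)", y = "median price (USD)")
print(p_median)
```

The median is used instead of the mean because price is right-skewed, so a few very expensive stones pull the mean up.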

ggplot(diamonds, aes(carat, price, color = color)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
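Faceting makes "conditioned on color" explicit by giving each color grade its own panel, and it avoids drawing 53,940 overlapping points in one panel. This sketch assumes log10 axes are acceptable for the report; because ggplot2 applies scale transformations before statistics, the `method = "lm"` line is a linear fit of log10(price) on log10(carat) within each panel. `p_facet` is a hypothetical name.

```r
library(ggplot2)

# One panel per color grade; alpha reduces overplotting,
# and log10 axes spread out the dense region of small, cheap diamonds
p_facet <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.05) +
  geom_smooth(method = "lm", color = "red") +  # fit on the log10 scales
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap(~ color)
print(p_facet)
```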

2. Create a visualization to explore the distribution of price conditioned on (diamond) cut.

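# stat = "identity" stacks every price within a cut, so each bar is the
# total price of that cut (driven partly by how many diamonds it has),
# not a view of its price distribution
ggplot(diamonds, aes(cut, price)) +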
  geom_bar(stat = "identity")
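A numeric summary of each cut's price distribution is a useful cross-check for the plots in this section. This is a minimal base-R sketch; `median_by_cut` is a hypothetical name.

```r
library(ggplot2)

# Number of diamonds per cut; summed-price bars depend on these counts
table(diamonds$cut)

# 25th, 50th and 75th percentiles of price within each cut
tapply(diamonds$price, diamonds$cut, quantile, probs = c(0.25, 0.5, 0.75))

# Median price per cut as a named vector
median_by_cut <- tapply(diamonds$price, diamonds$cut, median)
median_by_cut
```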

# fill (not color) sets the interior of the stacked bar segments
ggplot(diamonds, aes(cut, price, fill = color)) +
  geom_bar(stat = "identity")

ggplot(diamonds, aes(cut, price, color = color)) +
  geom_violin()

ggplot(diamonds, aes(cut, price, color = color)) +
  geom_boxplot()
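Price is strongly right-skewed, so on a linear axis the long upper tail compresses the boxes. This sketch assumes a log10 y-axis is acceptable; note that the box statistics are then computed on the log scale. `varwidth = TRUE` draws box widths proportional to the square root of each group's size, and `p_box` is a hypothetical name.

```r
library(ggplot2)

# Box plot of price per cut on a log10 axis,
# with box width reflecting how many diamonds each cut has
p_box <- ggplot(diamonds, aes(cut, price)) +
  geom_boxplot(varwidth = TRUE) +
  scale_y_log10()
print(p_box)
```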

You should explore various visualization options and make, select, and present the visualization that you think is most effective.

ggplot(diamonds, aes(price)) + 
  geom_density(na.rm = TRUE) 

# alpha makes the overlapping filled density curves visible
ggplot(diamonds, aes(price, fill = cut, color = cut)) + 
  geom_density(alpha = 0.2, na.rm = TRUE)

ggplot(diamonds, aes(carat)) + 
  geom_density(na.rm = TRUE) 

ggplot(diamonds, aes(carat, fill = cut, color = cut)) + 
  geom_density(alpha = 0.2, na.rm = TRUE)

ggplot(diamonds, aes(price)) +
  geom_freqpoly(aes(color = cut), binwidth = 20, na.rm = TRUE) 
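The cut groups differ greatly in size, so raw counts in a frequency polygon mostly show which cut is most common. Mapping y to the computed `density` scales each cut's curve to unit area, which makes the shapes comparable. This sketch assumes ggplot2 3.3.0 or later for `after_stat()` (older versions used `..density..`); `binwidth = 500` is an illustrative choice that gives a smoother line than 20, and `p_freq` is a hypothetical name.

```r
library(ggplot2)

# Per-cut frequency polygons scaled to unit area instead of raw counts
p_freq <- ggplot(diamonds, aes(price, after_stat(density))) +
  geom_freqpoly(aes(color = cut), binwidth = 500)
print(p_freq)
```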

ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(fill = cut), binwidth = 0.1, position = "fill",
                 na.rm = TRUE)
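The same `position = "fill"` idea can be applied to price, the variable in question 2: each bin is rescaled to height 1, so the bar shows the share of each cut within that price range rather than the raw count. This is a sketch; `binwidth = 1000` is an illustrative choice and `p_share` is a hypothetical name.

```r
library(ggplot2)

# Share of each cut within each $1,000 price bin
p_share <- ggplot(diamonds, aes(price)) +
  geom_histogram(aes(fill = cut), binwidth = 1000, position = "fill") +
  labs(y = "proportion of diamonds in bin")
print(p_share)
```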