How to assign names to the density curves? - r

Attached to the post is a density plot. I have been asked to compute the volume of bike rentals for each weather level, stored in weathersit (legend on the right). I also need to assign names to the levels, say Bad, Very_Bad, God_Awful and I_Wont_Even_Bother. Preferably the names should appear in or near the blue circle shown in the attached image.
library(ggplot2)
library(hrbrthemes)
library(dplyr)
library(tidyr)
library(viridis)
#plotting the bike density
ggplot(Bikes_Washington, aes(x=cnt, group=weathersit, fill = weathersit)) +
geom_density(color = "darkblue", alpha=0.2)
Our plot

The day data contains only three categories, so I recode 1, 2, and 3 as Bad, Very_Bad, and God_Awful. It is also recommended to convert a categorical variable to a factor using factor or as.factor.
Bikes_Washington <- read.csv("D:/Prac/day.csv")
Bikes_Washington <- Bikes_Washington %>%
mutate(weathersit = as.factor(weathersit)) %>%
mutate(weathersit = recode_factor(weathersit, "1" = "Bad", "2" = "Very_Bad", "3" = "God_Awful"))
sp <- split(Bikes_Washington$cnt, Bikes_Washington$weathersit)
a <- lapply(seq_along(sp), function(i){
d <- density(sp[[i]])
k <- which.max(d$y)
data.frame(weathersit = names(sp)[i], xmax = d$x[k], ymax = d$y[k])
})
a <- do.call(rbind, a)
Bikes_Washington %>%
mutate(weathersit = as.factor(weathersit)) %>%
ggplot(aes(x=cnt, group=weathersit, fill = weathersit)) +
geom_density(color = "darkblue", alpha=0.2) +
geom_text(data = a,
aes(x = xmax, y = ymax,
label = weathersit, vjust = -.5))
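The question also asks for the rental volume per weather level, which the plot alone does not give. A minimal sketch, assuming the column names from the question (cnt, weathersit) and using a small made-up data frame (toy_bikes is hypothetical) in place of day.csv:

```r
# Made-up stand-in for day.csv; only the column names come from the question
toy_bikes <- data.frame(
  cnt        = c(1200, 3400, 2500, 800, 4100, 600),
  weathersit = factor(c("Bad", "Bad", "Very_Bad", "God_Awful", "Bad", "God_Awful"),
                      levels = c("Bad", "Very_Bad", "God_Awful"))
)

# Total rentals (sum of cnt) for each weather level
volume <- aggregate(cnt ~ weathersit, data = toy_bikes, FUN = sum)
print(volume)
```

With the real data, the same aggregate call on Bikes_Washington (after the recode step) should give one total per named level.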

Related

How to fill density plot within an interval with ggplot?

The code below fills each of the two densities with color under the area of the curve:
library(ggplot2)
#fake data
dat <- data.frame(dens = c(rnorm(100), rnorm(100, 2, 0.5))
, group = rep(c("C", "P"), each = 100))
#fill the area under the curve
ggplot(dat, aes(x = dens, fill = group)) + geom_density(alpha = 0.75)
How can I achieve the following two goals?
1) Only fill each curve within a specified interval. For example, interval [-1.5, 2.0] for group 'C' and [0.5, 2.8] for group 'P'.
2) Add a vertical segment (from x-axis to the curve) for each density. For example, at x=0.2 for group 'C' and at x=1.9 for group 'P'.
To get you started, here's an answer to your first question:
library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)
as.data.frame.density <- function(x) data.frame(x = x$x, y = x$y)
densities <- dat %>%
group_nest(group) %>%
mutate(dens = map(data, ~as.data.frame(density(.$dens)))) %>%
unnest(dens)
ggplot(densities, aes(x = x, y = y, group = group)) +
geom_density(stat = 'identity') +
geom_density(
aes(fill = group),
. %>% filter((group == "C" & between(x, -1.5, 2.0)) | (group == "P" & between(x, 0.5, 2.8))),
stat = 'identity',
alpha = 0.75
)
There are other ways of calculating the densities per group; using dplyr is just one of them. It is probably a good idea to use the same bandwidth for both density estimates.
Adding the segments is similar to this approach, you just need to find the correct values in the densities data.frame.
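For the second goal, one possible way to "find the correct values" is to interpolate each curve at the requested x position. The following is only a sketch: it rebuilds the fake data with an arbitrary seed, computes the densities with base R instead of purrr, and uses approx() for linear interpolation; the names seg and densities are illustrative.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the fake data is reproducible
dat <- data.frame(dens  = c(rnorm(100), rnorm(100, 2, 0.5)),
                  group = rep(c("C", "P"), each = 100),
                  stringsAsFactors = FALSE)

# Per-group density curves as a plain data frame
densities <- do.call(rbind, lapply(split(dat, dat$group), function(g) {
  d <- density(g$dens)
  data.frame(group = g$group[1], x = d$x, y = d$y, stringsAsFactors = FALSE)
}))

# Where each vertical segment should sit (values taken from the question)
seg <- data.frame(group = c("C", "P"), x = c(0.2, 1.9), stringsAsFactors = FALSE)

# Interpolate each group's curve height at its x position
seg$y <- mapply(function(g, x0) {
  d <- densities[densities$group == g, ]
  approx(d$x, d$y, xout = x0)$y
}, as.character(seg$group), seg$x)

# Draw the curves plus one segment per group from the x-axis up to the curve
p <- ggplot(densities, aes(x = x, y = y, group = group)) +
  geom_line() +
  geom_segment(data = seg, aes(x = x, xend = x, y = 0, yend = y, colour = group))
print(p)
```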

Assigning fixed ggplot2 shapes based on dynamic set of values

I am working on a Power BI project and am making some R visuals. One of the visuals is a scatterplot. Each point has a categorical value ("Yes", "No"). I am choosing to represent these categories via shape. Sometimes the user will query the data in such a way that there will be only one category per point. I want my "Yes" shape to remain unique only for "Yes" values, and my "No" shape to remain unique only for "No" values.
My code:
ggplot(data = dataset) +
geom_point(data = dataset, aes(x = value1, y = value2, shape = YesNo_column)) +
scale_shape_manual(name="", values = c(20, 0))
I want "Yes" points always to have shape "20", and "No" points always to have shape "0". If the user queries the data to only show "No" points, then my code will assign shape "20" to those points, instead of shape "0".
EDIT: I have just made some sample data to show my issue:
query1 <- mtcars[mtcars$cyl == 4 | mtcars$cyl == 6,]
query1$YesNo_column <- "blah"
query1[query1$cyl==4,]$YesNo_column <- "Yes"
query1[query1$cyl==6,]$YesNo_column <- "No"
ggplot(query1, aes(x=mpg, y=hp, shape=YesNo_column)) +
geom_point()+
scale_shape_manual(name="",values = c(20, 0))
query2 <- query1[query1$YesNo_column == "Yes",]
ggplot(query2, aes(x=mpg, y=hp, shape=YesNo_column)) +
geom_point()+
scale_shape_manual(name="",values = c(20, 0))
As you can see, the shapes are not fixed to the values.
Make sure your yes and no values are encoded as a factor and simply use drop = FALSE.
library(tidyverse)
mydata <- tibble(
x = rnorm(20, 1.0, 0.3),
y = rnorm(20, 1.5, 0.4),
response = sample(c("yes", "no"), replace = TRUE, size = 20)
) %>%
mutate(response = as.factor(response))
ggplot(mydata, aes(x = x, y = y, shape = response)) +
geom_point() +
scale_shape_manual(name="", values = c(20, 0), drop = FALSE)
mydata %>%
filter(response == "yes") %>%
ggplot(aes(x = x, y = y, shape = response)) +
geom_point() +
scale_shape_manual(name="", values = c(20, 0), drop = FALSE)
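Another option, which does not depend on factor levels or their order, is to pass a named vector to scale_shape_manual so that each shape is matched to its label by name. A small sketch using the mtcars-based sample from the question, reduced to "No" rows only:

```r
library(ggplot2)

# Subset that contains only one of the two categories
query <- mtcars[mtcars$cyl == 6, ]
query$YesNo_column <- "No"

# Named values: each shape is matched to its label, not to its position
p <- ggplot(query, aes(x = mpg, y = hp, shape = YesNo_column)) +
  geom_point() +
  scale_shape_manual(name = "", values = c(Yes = 20, No = 0))

# Shapes actually used by the point layer
used_shapes <- unique(layer_data(p, 1)$shape)
print(used_shapes)
```

Unlike drop = FALSE, this variant only shows the categories present in the data in the legend.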

Create shaded polygons around points with ggplot2

I saw yesterday this beautiful map of McDonalds restaurants in USA. I wanted to replicate it for France (I found some data that can be downloaded here).
I have no problem plotting the dots:
library(readxl)
library(ggplot2)
library(raster)
#open data
mac_do_FR <- read_excel("./mcdo_france.xlsx")
mac_do_FR_df <- as.data.frame(mac_do_FR)
#get a map of France
mapaFR <- getData("GADM", country="France", level=0)
#plot dots on the map
ggplot() +
geom_polygon(data = mapaFR, aes(x = long, y = lat, group = group),
fill = "transparent", size = 0.1, color="black") +
geom_point(data = mac_do_FR_df, aes(x = lon, y = lat),
colour = "orange", size = 1)
I tried several methods (Thiessen polygons, heat maps, buffers), but the results I get are very poor. I can't figure out how the shaded polygons were plotted on the American map. Any pointers?
Here's my result, but it did take some manual data wrangling.
Step 1: Get geospatial data.
library(sp)
# generate a map of France, along with a fortified dataframe version for ease of
# referencing lat / long ranges
mapaFR <- raster::getData("GADM", country="France", level=0)
map.FR <- fortify(mapaFR)
# generate a spatial point version of the same map, defining your own grid size
# (a smaller size yields a higher resolution heatmap in the final product, but will
# take longer to calculate)
grid.size = 0.01
points.FR <- expand.grid(
x = seq(min(map.FR$long), max(map.FR$long), by = grid.size),
y = seq(min(map.FR$lat), max(map.FR$lat), by = grid.size)
)
points.FR <- SpatialPoints(coords = points.FR, proj4string = mapaFR@proj4string)
Step 2: Generate a voronoi diagram based on store locations, & obtain the corresponding polygons as a SpatialPolygonsDataFrame object.
library(deldir)
library(dplyr)
voronoi.tiles <- deldir(mac_do_FR_df$lon, mac_do_FR_df$lat,
rw = c(min(map.FR$long), max(map.FR$long),
min(map.FR$lat), max(map.FR$lat)))
voronoi.tiles <- tile.list(voronoi.tiles)
voronoi.center <- lapply(voronoi.tiles,
function(l) data.frame(x.center = l$pt[1],
y.center = l$pt[2],
ptNum = l$ptNum)) %>%
data.table::rbindlist()
voronoi.polygons <- lapply(voronoi.tiles,
function(l) Polygon(coords = matrix(c(l$x, l$y),
ncol = 2),
hole = FALSE) %>%
list() %>%
Polygons(ID = l$ptNum)) %>%
SpatialPolygons(proj4string = mapaFR@proj4string) %>%
SpatialPolygonsDataFrame(data = voronoi.center,
match.ID = "ptNum")
rm(voronoi.tiles, voronoi.center)
Step 3. Check which voronoi polygon each point on the map overlaps with, & calculate its distance to the corresponding nearest store.
which.voronoi <- over(points.FR, voronoi.polygons)
points.FR <- cbind(as.data.frame(points.FR), which.voronoi)
rm(which.voronoi)
points.FR <- points.FR %>%
rowwise() %>%
mutate(dist = geosphere::distm(x = c(x, y), y = c(x.center, y.center))) %>%
ungroup() %>%
mutate(dist = ifelse(is.na(dist), max(dist, na.rm = TRUE), dist)) %>%
mutate(dist = dist / 1000) # convert from m to km for easier reading
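Because every grid point inside a Voronoi tile is closest to that tile's store, a similar distance surface can be approximated by taking the minimum distance to any store directly. This is only a rough sketch under assumptions: the store coordinates are made up, haversine_km is a hypothetical helper that assumes a spherical Earth of radius 6371 km, and its values may differ slightly from those of geosphere.

```r
# Great-circle distance in km between points given in degrees (vectorised)
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up store locations and a coarse grid (stand-ins for the real data)
stores <- data.frame(lon = c(2.35, 5.37, -1.55), lat = c(48.86, 43.30, 47.22))
grid <- expand.grid(x = seq(-2, 6, by = 1), y = seq(43, 49, by = 1))

# Distance from each grid point to its nearest store
grid$dist <- vapply(seq_len(nrow(grid)), function(i) {
  min(haversine_km(grid$x[i], grid$y[i], stores$lon, stores$lat))
}, numeric(1))

head(grid)
```

This brute-force version scales with grid size times number of stores, so the Voronoi lookup is likely the better choice for a fine grid.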
Step 4. Plot, adjusting the fill gradient parameters as needed. I felt the result of a square root transformation looks quite good for emphasizing distances close to a store, while a log transformation is rather too exaggerated, but your mileage may vary.
ggplot() +
geom_raster(data = points.FR %>%
mutate(dist = pmin(dist, 100)),
aes(x = x, y = y, fill = dist)) +
# optional. shows outline of France for reference
geom_polygon(data = map.FR,
aes(x = long, y = lat, group = group),
fill = NA, colour = "white") +
# define colour range, mid point, & transformation (if desired) for fill
scale_fill_gradient2(low = "yellow", mid = "red", high = "black",
midpoint = 4, trans = "sqrt") +
labs(x = "longitude",
y = "latitude",
fill = "Distance in km") +
coord_quickmap()

R ggplot2::geom_density with a constant variable

I recently came across a problem with ggplot2::geom_density that I am not able to solve. I am trying to visualise the density of some variable and compare it to a constant. To plot the density, I am using ggplot2::geom_density. The variable for which I am plotting the density, however, happens to be a constant (this time):
df <- data.frame(matrix(1,ncol = 1, nrow = 100))
colnames(df) <- "dummy"
dfV <- data.frame(matrix(5,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.2, position = "identity") +
geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)
This is OK and something I would expect. But, when I shift this distribution to the far right, I get a plot like this:
df <- data.frame(matrix(71,ncol = 1, nrow = 100))
colnames(df) <- "dummy"
dfV <- data.frame(matrix(75,ncol = 1, nrow = 1))
colnames(dfV) <- "latent"
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.2, position = "identity") +
geom_vline(data = dfV, aes(xintercept = latent, color = 'ls'), size = 2)
which probably means that the kernel estimation is still taking 0 as the centre of the distribution (right?).
Is there any way to circumvent this? I would like to see a plot like the one above, only with the centre of the kernel density at 71 and the vline at 75.
Thanks
Well I am not sure what the code does, but I suspect the geom_density primitive was not designed for a case where the values are all the same, and it is making some assumptions about the distribution that are not what you expect. Here is some code and a plot that sheds some light:
# Generate 10 data sets with 100 constant values from 0 to 90
# and then merge them into a single dataframe
dfs <- list()
for (i in 1:10){
v <- 10*(i-1)
dfs[[i]] <- data.frame(dummy=rep(v,100),facet=v)
}
df <- do.call(rbind,dfs)
# facet plot them
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.5, position = "identity") +
facet_wrap( ~ facet,ncol=5 )
Yielding:
So it is not doing what you thought it was, but it is also probably not doing what you want. You could of course make it "translation-invariant" (almost) by adding some noise like this for example:
set.seed(1234)
noise <- rnorm(100,0,1e-3)
dfs <- list()
for (i in 1:10){
v <- 10*(i-1)
dfs[[i]] <- data.frame(dummy=rep(v,100)+noise,facet=v)
}
df <- do.call(rbind,dfs)
ggplot() +
geom_density(data = df, aes(x = dummy, colour = 's'),
fill = '#FF6666', alpha = 0.5, position = "identity") +
facet_wrap( ~ facet,ncol=5 )
Yielding:
Note that there is apparently a random component to the geom_density function, and I can't see how to set the seed before each instance, so the estimated density is a bit different each time.
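One likely explanation lies in the default bandwidth selector, stats::bw.nrd0. When the standard deviation and the IQR are both zero, it falls back to abs(x[1]), so for constant data the bandwidth grows with the value itself. The sketch below checks this and then passes a fixed bw to geom_density; reusing the bandwidth from the constant-1 case is an arbitrary choice made for illustration.

```r
library(ggplot2)

n <- 100
# For constant data sd and IQR are both 0, so bw.nrd0 falls back to abs(x[1])
bw_1  <- bw.nrd0(rep(1, n))
bw_71 <- bw.nrd0(rep(71, n))
print(c(bw_1, bw_71))

# Fixing bw gives the curve the same width wherever the constant sits
df  <- data.frame(dummy = rep(71, n))
dfV <- data.frame(latent = 75)
p <- ggplot() +
  geom_density(data = df, aes(x = dummy, colour = "s"),
               fill = "#FF6666", alpha = 0.2, bw = bw_1) +
  # linewidth needs ggplot2 >= 3.4; older versions use size = 2 instead
  geom_vline(data = dfV, aes(xintercept = latent, colour = "ls"), linewidth = 2)
print(p)
```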

How can I make my first 10 lines red and the remaining lines blue in ggplot2, based on an example? (R, ggplot2)

Here is some example code for R using the ggplot2 library:
theme_set(theme_bw())
dat = data.frame(value = rnorm(100,sd=2.5))
dat = within(dat, {
value_scaled = scale(value, scale = sd(value))
obs_idx = 1:length(value)
})
ggplot(aes(x = obs_idx, y = value_scaled), data = dat) +
geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
geom_line() + geom_point()
The question is: how can I make the first 10 lines red and the rest blue in ggplot2, based on this example? I tried to use some kind of layer syntax, but it doesn't work.
First, add another column to your data frame dat. It has value 0 for the first 10 rows and 1 for the rest.
dat$group <- factor(rep.int(c(0, 1), c(10, nrow(dat)-10)))
Generate the plot:
library(ggplot2)
ggplot(aes(x = obs_idx, y = value_scaled), data = dat) +
geom_ribbon(ymin = -1, ymax = 1, alpha = 0.1) +
geom_line(aes(colour = group), show.legend = FALSE) +
scale_colour_manual(values = c("red", "blue")) +
geom_point()
The parameter show.legend = FALSE suppresses the legend for the red and blue lines. (In ggplot2 versions before 2.0.0 this argument was called show_guide.)
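Mapping colour to a factor splits the line into two separate groups, which leaves a gap between observations 10 and 11. A possible variation, sketched here with an arbitrary seed and the same column names as above, sets group = 1 so the path stays continuous while the colour still changes along it:

```r
library(ggplot2)

set.seed(42)  # arbitrary seed for reproducible fake data
dat <- data.frame(value = rnorm(100, sd = 2.5))
dat$value_scaled <- as.numeric(scale(dat$value))
dat$obs_idx <- seq_len(nrow(dat))
dat$group <- factor(rep.int(c(0, 1), c(10, nrow(dat) - 10)))

# group = 1 keeps a single path; colour still varies along it
p <- ggplot(dat, aes(x = obs_idx, y = value_scaled)) +
  geom_line(aes(colour = group, group = 1), show.legend = FALSE) +
  scale_colour_manual(values = c("red", "blue")) +
  geom_point()
print(p)
```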
OK, I could manage layers, the code is (not elegant, but works):
require(ggplot2)
value=round(rnorm(50,200,50),0)
nmbrs<-length(value) ## length of vector
obrv<-1:length(value) ## list of observations
#create data frame from the values
data_lj<-data.frame(obrv,value)
data_lj20<-data.frame(data_lj[1:20,1:2])
data_lj21v<-data.frame(data_lj[20:nmbrs,1:2])
#plot with ggplot
rr<-ggplot()+
layer(mapping=aes(obrv,value),geom="line",stat="identity",position="identity",data=data_lj20,params=list(colour="red"))+
layer(mapping=aes(obrv,value),geom="line",stat="identity",position="identity",data=data_lj21v,params=list(colour="blue"))
print(rr)
