How to plot a heatmap with 3 continuous variables in R ggplot2?

Sample dataset is as below:
count is a discrete variable; temperature and relative_humidity_percent are continuous variables.
The code to generate sample dataset:
templ = data.frame(count = c(200,225,610,233,250,210,290,255,279,250),
temperature = c(12.2,11.6,12,8.5,4,8.2,9.2,10.6,10.8,10.9),
relative_humidity_percent = c(74,78,72,65,77,84,83,74,73,75))
count  temperature  relative_humidity_percent
200    12.2         74
225    11.6         78
610    12           72
233    8.5          65
250    4            77
210    8.2          84
290    9.2          83
255    10.6         74
279    10.8         73
250    10.9         75
I tried to plot a heatmap with ggplot2::stat_contour,
plot2 <- ggplot(templ, aes(x = temperature, y = relative_humidity_percent, z = count)) +
stat_contour(geom = 'contour') +
geom_tile(aes(fill = n)) +
stat_contour(bins = 15) +
guides(fill = guide_colorbar(title = 'count'))
plot2
The result is:
I also tried ggplot2::stat_density_2d,
> ggplot(templ, aes(temperature, relative_humidity_percent, z = count)) +
+ stat_density_2d(aes(fill = count))
Warning messages:
1: In stat_density_2d(aes(fill = count)) :
Ignoring unknown aesthetics: fill
2: The following aesthetics were dropped during statistical transformation: fill, z
ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?
> geom_density_2d() +
+ geom_contour() +
+ metR::geom_contour_fill(na.fill=TRUE) +
+ theme_classic()
Error in `+.gg`:
! Cannot add <ggproto> objects together
ℹ Did you forget to add this object to a <ggplot> object?
Run `rlang::last_error()` to see where the error occurred.
The result:
which was not filled with colour.
What I want is:
I want to replace level with count in the graph. However, since the count variable is not a factor, I cannot plot the heatmap with ggplot2::geom_contour...

I understand from your comment that you want to "fill the entire graph". This gives a less truthful representation of your three-dimensional data, which would be shown more accurately as a scatter plot with the third variable encoded at each point. I take it that you intend to interpolate the observation density between the measured locations.
You can of course use geom_density_2d for this. Just use the same trick as in my other answer and uncount your data first.
NB: this bins the densities into discrete levels. A visualisation based on iso-density lines cannot work otherwise.
ggplot(tidyr::uncount(templ, count)) +
geom_density_2d_filled(aes(temperature, relative_humidity_percent))
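To see what the uncount step does before the density is estimated, here is a minimal sketch using the sample data from the question. tidyr::uncount repeats each row as many times as its count, so the 10 aggregated rows become one row per observation. The name expanded is only illustrative.

```r
library(tidyr)

templ <- data.frame(
  count = c(200, 225, 610, 233, 250, 210, 290, 255, 279, 250),
  temperature = c(12.2, 11.6, 12, 8.5, 4, 8.2, 9.2, 10.6, 10.8, 10.9),
  relative_humidity_percent = c(74, 78, 72, 65, 77, 84, 83, 74, 73, 75)
)

# Repeat every row `count` times; the count column is dropped by default.
expanded <- uncount(templ, count)

nrow(expanded)   # equals sum(templ$count)
names(expanded)  # only temperature and relative_humidity_percent remain
```

The density is then estimated from many points that sit at only 10 distinct locations, so the filled contours describe how observations cluster rather than an interpolated surface of count.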

Just use geom_point and color according to your count. You can of course make your points square.
Or, if your count is not yet actually an aggregate measure and you want to show the density of neighbouring observations, you could use ggpointdensity::geom_pointdensity for this. (in your example, I have to uncount first).
library(ggplot2)
library(dplyr)
library(tidyr)
templ = data.frame(count = c(200,225,610,233,250,210,290,255,279,250),
temperature = c(12.2,11.6,12,8.5,4,8.2,9.2,10.6,10.8,10.9),
relative_humidity_percent = c(74,78,72,65,77,84,83,74,73,75))
ggplot(templ) +
geom_point(aes(temperature, relative_humidity_percent, color = count), shape = 15, size = 5)
## first uncount
templ %>%
uncount(count) %>%
ggplot() +
ggpointdensity::geom_pointdensity(aes(temperature, relative_humidity_percent))

Related

Coloring a specific line in ggplot

I use ggplot to plot hundreds of simulated paths. The data has been organized by pivot_longer to look like this (200 simulated paths, each having 2520 periods; simulation 1 first, then simulation 2 etc., with ind showing the simulated values for each period):
sim  period  ind
1    0       100.0
1    1       99.66
...
1    2520    103.11
2    0       100.0
...
200  0       100.0
...
200  2520    195.11
I am not sure whether using pivot_longer is optimal, but at least the following ggplot looks fine:
p<-ggplot(simdata, aes(x=period, y=ind,color=sim, group=sim))+geom_line()
producing a nice graph with paths in different shades of blue.
What I would like to do is to color the mean, median and quartile paths with different colours (e.g. red and green). Median, mean and quartile paths are defined by the last period's value. I already know the sim number for those. E.g. let's assume that median path is the one where sim = 160.
I have tried the following approaches.
Add a new geom_line specifying the number (sim) of the median path:
p + geom_line(aes(y = simdata[sim == 160,], color ="red")
This fails since the additional geom_line is not of the same length (200*2520) as the simdata - even if the graph's x-axis only has 2520 periods.
Stat_summary
p + stat_summary(aes(group=sim),fun=median, geom="line",colour="red")
The outcome was that all lines became red, including the simulated ones. I also rejected this approach because it takes a lot more time to have ggplot find the mean, median etc. values than to find them before the graphics part.
gghighlight
I experimented with this package but could not figure out if you can specify the path numbers to color.
Maybe try your first solution, but pass it to the data argument of geom_line instead:
p + geom_line(data = simdata[simdata$sim == 160,], color ="red")
As a quick example with some simulated data:
library(ggplot2)
df <- data.frame(a = rep(1:20, each = 100),
b = rep(1:100, times = 20),
c = rnorm(2000))
ggplot(df, aes(b, c, group = a)) +
geom_line(colour = "grey") +
geom_line(data = df[df$a==20,], colour = "red")
You can also pass a condition as an aesthetic in aes, which draws the matching line in a colour specified by scale_colour_manual (tidier, and adds a legend with labels that can be edited):
ggplot(df, aes(b, c, group = a, colour = a == 20)) +
geom_line() +
scale_colour_manual(values = c("TRUE" = "red", "FALSE" = "grey"))
Created on 2021-12-07 by the reprex package (v2.0.1)
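The same idea can be extended to several special paths at once (mean, median, quartiles), as the question asks. The sketch below is one possible way, not the only one: it uses toy data in place of simdata, and the sim ids in special are hypothetical placeholders for the ids computed before plotting.

```r
library(ggplot2)

# Toy data standing in for simdata: 20 paths of 100 periods each.
set.seed(1)
simdata <- data.frame(
  sim    = rep(1:20, each = 100),
  period = rep(1:100, times = 20),
  ind    = rnorm(2000)
)

# Hypothetical ids of the paths to highlight (found before plotting).
special <- c(median = 12, mean = 7, q1 = 3, q3 = 18)

# Keep only the highlighted paths and label each one with its role.
highlight <- simdata[simdata$sim %in% special, ]
highlight$role <- names(special)[match(highlight$sim, special)]

# Grey background paths first, then the labelled paths drawn on top.
p <- ggplot(simdata, aes(period, ind, group = sim)) +
  geom_line(colour = "grey80") +
  geom_line(data = highlight, aes(colour = role)) +
  scale_colour_manual(values = c(median = "red", mean = "green",
                                 q1 = "blue", q3 = "purple"))
```

Drawing the highlighted subset as a second layer keeps it on top of the grey paths and gives a legend with one entry per role.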

ggplot2 - How to plot length of time using geom_bar?

I am trying to show different growing season lengths by displaying crop planting and harvest dates at multiple regions.
My final goal is a graph that looks like this:
which was taken from an answer to this question. Note that the dates are in julian days (day of year).
My first attempt to reproduce a similar plot is:
library(data.table)
library(ggplot2)
mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"
# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header=T))
# melt data table
m <- melt(mydat, id.vars=c("Region","Crop"), variable.name="Period", value.name="value")
# plot stacked bars
ggplot(m, aes(x=Crop, y=value, fill=Period, colour=Period)) +
geom_bar(stat="identity") +
facet_wrap(~Region, nrow=3) +
coord_flip() +
theme_bw(base_size=18) +
scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
"Harvest.Begin" = "black", "Harvest.End" = "black"), guide = "none")
However, there are a few issues with this plot:
Because the bars are stacked, the values on the x-axis are aggregated and end up too high - out of the 1-365 scale that represents day of year.
I need to combine Planting.Begin and Planting.End in the same color, and do the same to Harvest.Begin and Harvest.End.
Also, a "void" (or a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).
Any hints on how to create such graph?
I don't think this is something you can do with geom_bar or geom_col. A more general approach is to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit:
library(dplyr)  # provides the %>% pipe used below
plotdata <- mydat %>%
  dplyr::mutate(Crop = factor(Crop)) %>%
  tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to="period") %>%
  tidyr::separate(period, c("Type","Event")) %>%  # splits e.g. "Planting.Begin" at the dot
  tidyr::pivot_wider(names_from=Event, values_from=value)
# Region Crop Type Begin End
# <chr> <fct> <chr> <int> <int>
# 1 Center-West Soybean Planting 245 275
# 2 Center-West Soybean Harvest 1 92
# 3 Center-West Corn Planting 245 336
# 4 Center-West Corn Harvest 32 153
# 5 South Soybean Planting 245 1
# ...
We've used tidyr to reshape the data so that we have one row per rectangle we want to draw, and we've also made Crop a factor. We can then plot it like this:
ggplot(plotdata) +
aes(ymin=as.numeric(Crop)-.45, ymax=as.numeric(Crop)+.45, xmin=Begin, xmax=End, fill=Type) +
geom_rect(color="black") +
facet_wrap(~Region, nrow=3) +
theme_bw(base_size=18) +
scale_y_continuous(breaks=seq_along(levels(plotdata$Crop)), labels=levels(plotdata$Crop))
The part that's a bit messy here is that we want a discrete scale for y, but geom_rect prefers numeric values. Since Crop is now a factor, we use the numeric codes of its levels to create the ymin and ymax positions. Then we need to relabel the y axis with the names of the factor levels.
If you also want month names on the x axis, you could do something like
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"),by="month")
# then add this to your plot
... +
scale_x_continuous(breaks=lubridate::yday(dateticks),
labels=lubridate::month(dateticks, label=TRUE, abbr=TRUE))
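One caveat with the sample data: the South/Soybean planting window runs from day 245 to day 1, so it wraps past the end of the year, and geom_rect would draw it backwards from 245 to 1. A possible workaround, sketched below in base R, is to split any interval whose End is smaller than its Begin into two rectangles. The helper name split_wrapped and the 365-day year are assumptions made for illustration.

```r
# Split intervals that wrap past the end of the year into two pieces.
# `split_wrapped` is a hypothetical helper; it assumes a 365-day year.
split_wrapped <- function(d, year_end = 365) {
  wraps  <- d$End < d$Begin
  first  <- d[wraps, ]
  first$End <- rep(year_end, nrow(first))   # Begin .. end of year
  second <- d[wraps, ]
  second$Begin <- rep(1, nrow(second))      # start of year .. End
  rbind(d[!wraps, ], first, second)
}

# Two rows in the shape of plotdata, one of which wraps around.
example <- data.frame(
  Region = c("South", "South"),
  Crop   = c("Soybean", "Soybean"),
  Type   = c("Planting", "Harvest"),
  Begin  = c(245, 1),
  End    = c(1, 122)
)

fixed <- split_wrapped(example)
```

The result has the same columns as plotdata, so it could be passed to the same ggplot call in its place.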

How to change bar position without affecting the results?

I'm new to R and I'm trying to change the position of a bar on a bar chart, but my results have changed too. Here is the chart: Chart of age
when I use the code :
positions <- c("Moins de 18 ans","18 a 22 ans", "23 a 27 ans", "33 a 37 ans","38 ans et plus")
p + theme_classic() + scale_x_discrete(limits = positions)
These are the results I get:
Chart of age 2
and the message :
Warning messages:
1: Removed 86 rows containing non-finite values (stat_count).
2: Removed 86 rows containing non-finite values (stat_count).
I don't know what to do with this. Someone help me please!
Since you haven't provided the data, I can show how to rearrange bars using dummy data. To reorder the bars, you essentially need to set the order of the variable used on the x-axis of the plot.
library(ggplot2)
library(dplyr) # provides the %>% pipe used below
vec = c(rep("a", 30), rep("b", 20), rep("c", 10))
df = as.data.frame(table(vec)) # Create dummy data frame
Dataframe df looks like this -
vec Freq
1 a 30
2 b 20
3 c 10
The plot will be -
df %>%
ggplot(aes(x = vec, y = Freq)) +
geom_bar(stat = "identity") # default plot
Now, I want the bars in the order b, a, c. All I need to do is set the factor levels in that order -
df$vec = factor(df$vec, levels = c("b", "a", "c")) # assign levels in the order you want to see in the bar plot
df = df[order(df$vec),] # optional: sort the rows too; the bar order itself comes from the factor levels
df %>%
ggplot(aes(x = vec, y = Freq)) +
geom_bar(stat = "identity") # barplot will be sorted on levels of factor now
The output of above code is -
I haven't done the rest of the formatting, but judging from your graphs you already have that covered. If you follow these steps, your data shouldn't change when you reorder the bars. If you can share your data, I can better judge whether this solves your problem.
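As for the warning in the question: a discrete scale drops every value that is not listed in limits, which is one way to end up with "Removed ... rows" messages. The positions vector in the question has no entry between "23 a 27 ans" and "33 a 37 ans", so a missing age group may be the cause. That is an assumption, since the data was not shared. A minimal sketch with dummy data:

```r
library(ggplot2)

ages <- data.frame(grp = c(rep("a", 30), rep("b", 20), rep("c", 10)))

# "b" is missing from limits, so its rows are dropped with a warning
# when the plot is drawn.
incomplete <- ggplot(ages, aes(x = grp)) +
  geom_bar() +
  scale_x_discrete(limits = c("c", "a"))

# Listing every level reorders the bars without losing any rows.
complete <- ggplot(ages, aes(x = grp)) +
  geom_bar() +
  scale_x_discrete(limits = c("c", "a", "b"))

# Number of rows that fall outside the incomplete limits.
dropped <- sum(!ages$grp %in% c("c", "a"))
```

Comparing unique values of the x variable against the limits vector (for example with setdiff) is a quick way to check whether a category was left out.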

ggplot2 density-plot with discrete data

I want to create a density plot with the following data:
interval fr mi ab
0x 9765 3631 12985
1x 2125 2656 601
2x 1299 2493 191
3x 493 2234 78
4x 141 1559 20
5x and more 75 1325 23
On the X-Axis I want to have the Intervals and on the Y-Axis I want to have the density of "fr", "mi" and "ab" in different colors.
I imagined something like this graph.
My problem is that I don't know how to get the density on the Y-Axis. I tried it with geom_density, but it didn't work. The best result I accomplished was using the following code:
DS29 <-as.data.frame(DS29)
DS29$interval <- factor(DS29$interval, levels = DS29$interval)
DS29 <- melt (DS29,id=c("interval"))
output$DS51<- renderPlot({
plot_tab6 <- ggplot(DS29, aes(x= interval,y = value, fill=variable, group = variable)) +
geom_col()+
geom_line()
return(plot_tab6)
})
This gives me the following plot, which is not the result I want to have. Do you have an idea how I could get to my wanted result? Thank you very much.
Looking at your sample data, I am not sure that geom_density is what you want. If you type ?geom_density, you will see some example code. Taking one example from the help page shows what your data is missing.
ggplot(diamonds, aes(depth, fill = cut, colour = cut)) +
geom_density(alpha = 0.1) +
xlim(55, 70)
On the x-axis, depth is a continuous variable, not a categorical one, whereas your current data has a categorical variable on the x-axis. geom_density estimates the density of something at each value on the x-axis. The example code above shows that diamonds classified as "Ideal" have high density around 61.5-62, suggesting that the largest proportion of "Ideal" diamonds have a depth value around 61.5-62. Indeed, the mean depth of "Ideal" diamonds is 61.71. This means that you need multiple data points to calculate a density. Your data has only one data point per interval for each group (e.g., ab, fr, mi), so I do not think your data is ready for calculating density.
If you want to draw a graphic similar to what you suggested in your question using the current data, I think you need to 1) convert interval to a numeric variable, 2) transform the data into long format, and 3) use stat_smooth.
library(tidyverse)
mydf %>%
mutate(interval = as.numeric(sub(x = as.character(interval), pattern = "x.*", replacement = ""))) %>% # "x.*" also strips " and more", so "5x and more" becomes 5 instead of NA
gather(key = group, value = value, - interval) -> temp
ggplot(temp, aes(x = interval, y = value, fill = group)) +
stat_smooth(geom = "area", span = 0.4, method = "loess", alpha = 0.4)
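If "density" here means the share of each group's total that falls into each interval, the counts can be turned into proportions directly, with no smoothing. This is a sketch of that alternative reading, and it may not match the target graph in the question. The data frame name mydf follows the code above, the values are those from the question, and long and share are illustrative names.

```r
library(ggplot2)

mydf <- data.frame(
  interval = c("0x", "1x", "2x", "3x", "4x", "5x and more"),
  fr = c(9765, 2125, 1299, 493, 141, 75),
  mi = c(3631, 2656, 2493, 2234, 1559, 1325),
  ab = c(12985, 601, 191, 78, 20, 23)
)

# Long format in base R: one row per interval and group.
long <- data.frame(
  interval = factor(rep(mydf$interval, times = 3), levels = mydf$interval),
  group    = rep(c("fr", "mi", "ab"), each = nrow(mydf)),
  value    = c(mydf$fr, mydf$mi, mydf$ab)
)

# Share of each group's total that falls into each interval.
long$share <- long$value / ave(long$value, long$group, FUN = sum)

# One line per group across the ordered intervals.
p <- ggplot(long, aes(interval, share, colour = group, group = group)) +
  geom_line() +
  geom_point()
```

Within each group the shares sum to 1, which makes the three groups comparable even though their totals differ.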

Plots in R (ggplot2) for time series with multiple values per time?

Let's say I have data consisting of the time I leave the house and the number of minutes it takes me to get to work. I'll have some repeated values:
08:00, 20
08:04, 25
08:30, 40
08:20, 23
08:04, 22
Some times will repeat (like 08:04). What I want to do is run a scatter plot that is correctly scaled on the x-axis but allows these multiple values per entry, so that I can view the trend.
Is a time-series even what I want to be using? I've been able to plot a time series graph that has one value per time, and I've gotten multiple values plotted but without the time-series scaling. Can anyone suggest a good approach? Preference for ggplot2 but I'll take standard R plotting if it's easier.
First, let's prepare some more data
set.seed(123)
df <- data.frame(Time = paste0("08:", sample(35:55, 40, replace = TRUE)),
Length = sample(20:50, 40, replace = TRUE),
stringsAsFactors = FALSE)
df <- df[order(df$Time), ]
df$Attempt <- unlist(sapply(rle(df$Time)$lengths, function(i) 1:i))
df$Time <- as.POSIXct(df$Time, format = "%H:%M") # Fixing y axis
head(df)
Time Length Attempt
6 08:35 24 1
18 08:35 43 2
35 08:35 34 3
15 08:37 37 1
30 08:38 33 1
38 08:39 38 1
As I understand it, you want to preserve the order of observations that share the same departure time. At first I ignored that and got a scatter plot like this:
ggplot(data = df, aes(x = Length, y = Time)) +
geom_point(aes(size = Length, colour = Length)) +
geom_path(aes(group = Time, colour = Length), alpha = I(1/3)) +
scale_size(range = c(2, 7)) + theme(legend.position = 'none')
However, with three dimensions (Time, Length and Attempt), a scatter plot can no longer show all the information. I hope I understood you correctly and that this is what you are looking for:
ggplot(data = df, aes(y = Time, x = Attempt)) + geom_tile(aes(fill = Length))
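For the simpler reading of the question (a scatter plot with a properly scaled time axis and repeated departure times), a minimal sketch with the five observations from the question could look like this. The names commute, left and minutes are made up for illustration.

```r
library(ggplot2)

commute <- data.frame(
  left    = c("08:00", "08:04", "08:30", "08:20", "08:04"),
  minutes = c(20, 25, 40, 23, 22)
)

# Parse the clock times; the date part defaults to today and is not shown.
commute$left <- as.POSIXct(commute$left, format = "%H:%M", tz = "UTC")

# A continuous datetime axis spaces 08:04 and 08:30 by their real gap,
# and repeated times simply appear as several points at the same x.
p <- ggplot(commute, aes(left, minutes)) +
  geom_point() +
  scale_x_datetime(date_labels = "%H:%M")
```

Adding geom_smooth() to p is one way to look at the trend once there are more observations.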
