Hello, I am working on a data set that looks like this:
raw_data =
week v1 v3 v4 v5 v6
1 17 20.983819 7.799831 16.0600278 113.018687
2 34 22.651678 8.090671 16.4898951 120.824817
3 15 24.197048 6.892516 16.9805836 128.105372
4 14 26.016688 5.272781 17.471264 140.15794
5 26 27.572317 10.767018 17.8686156 154.886518
6 37 29.018684 21.280104 19.8096452 165.244061
7 27 30.395094 32.140543 22.937902 176.453934
8 24 31.832068 44.008145 28.714597 184.7598
9 16 33.383742 45.704626 39.2958153 193.461108
10 28 34.877819 39.355206 45.9069661 201.305558
What I am trying to achieve is to plot variables v3 to v6 as a stacked area plot, with variable v1 as a line plot on the same graph, across the weeks.
I have tried the following code, which draws the stacked area plot but not the line plot.
mdf <- melt(raw_data, id="Week") # convert to long format
p <- ggplot(mdf, aes(x=Week, y=value)) + geom_area(aes(fill= mdf$variable), position = 'stack') + theme_classic()
p + ggplot(raw_data, aes(x=Week, y=v1)) +geom_line()
and I get the following error
Error: Don't know how to add e2 to a plot
I tried the method suggested in the question "How to overlay geom_bar and geom_line plots with different number of elements using ggplot2?" and used the code below:
mdf <- melt(raw_data, id="Week") # convert to long format
p <- ggplot(mdf, aes(x=Week, y=value)) + geom_area(aes(colour =
mdf$variable, fill= mdf$variable), position = 'stack') + theme_classic()
p + geom_line(aes(x=Week, y=mdf$variable=="v1"))
but then I got the below error
Error: Discrete value supplied to continuous scale
I tried to convert the v1 variable with the code below, following the question "How do I get discrete factor levels to be treated as continuous?", but it did not resolve the error.
raw_data$v1 <- as.numeric(as.character(raw_data$v1))
Please help me resolve this issue. Also, how do I draw a black border line around each area in my stacked graph, so that the areas are easy to tell apart?
Thanks a lot for the help in advance!!
Your melt command does not work for me, so I'm using gather instead.
All you need to do is add geom_line and specify its data and mapping:
mdf <- tidyr::gather(raw_data, variable, value, -week, -v1)
ggplot(mdf, aes(week, value)) +
geom_area(aes(fill = variable), position = 'stack', color = 'black') +
geom_line(aes(y = v1), raw_data, lty = 2)
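Note: don't use $ inside aes, ever! aes() looks column names up in the data given to the plot or layer, so refer to them by bare name (fill = variable, not fill = mdf$variable).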
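For reference, here is a self-contained sketch of the approach above. It assumes the data frame from the question with a lowercase week column, and it swaps gather for tidyr::pivot_longer, which the tidyr documentation recommends for new code. The names variable and value are kept only to match the answer.

```r
library(ggplot2)
library(tidyr)

# Data copied from the question (column name `week` in lowercase is an assumption)
raw_data <- data.frame(
  week = 1:10,
  v1 = c(17, 34, 15, 14, 26, 37, 27, 24, 16, 28),
  v3 = c(20.983819, 22.651678, 24.197048, 26.016688, 27.572317,
         29.018684, 30.395094, 31.832068, 33.383742, 34.877819),
  v4 = c(7.799831, 8.090671, 6.892516, 5.272781, 10.767018,
         21.280104, 32.140543, 44.008145, 45.704626, 39.355206),
  v5 = c(16.0600278, 16.4898951, 16.9805836, 17.471264, 17.8686156,
         19.8096452, 22.937902, 28.714597, 39.2958153, 45.9069661),
  v6 = c(113.018687, 120.824817, 128.105372, 140.15794, 154.886518,
         165.244061, 176.453934, 184.7598, 193.461108, 201.305558)
)

# Long format for the stacked areas only; week and v1 stay as ordinary columns
mdf <- pivot_longer(raw_data, cols = v3:v6,
                    names_to = "variable", values_to = "value")

p <- ggplot(mdf, aes(week, value)) +
  # colour = "black" outlines each area so the bands are easy to tell apart
  geom_area(aes(fill = variable), position = "stack", colour = "black") +
  # the line layer gets its own data (wide format) and its own y mapping
  geom_line(aes(y = v1), data = raw_data, linetype = 2)

p
```

The line layer inherits x = week from the main ggplot() call, so only y needs to be overridden.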
The sample dataset is shown below.
count is a discrete variable; temperature and relative_humidity_percent are continuous variables.
The code to generate sample dataset:
templ = data.frame(count = c(200,225,610,233,250,210,290,255,279,250),
temperature = c(12.2,11.6,12,8.5,4,8.2,9.2,10.6,10.8,10.9),
relative_humidity_percent = c(74,78,72,65,77,84,83,74,73,75))
count  temperature  relative_humidity_percent
200    12.2         74
225    11.6         78
610    12           72
233    8.5          65
250    4            77
210    8.2          84
290    9.2          83
255    10.6         74
279    10.8         73
250    10.9         75
I tried to plot a heatmap with ggplot2::stat_contour,
plot2 <- ggplot(templ, aes(x = temperature, y = relative_humidity_percent, z = count)) +
stat_contour(geom = 'contour') +
geom_tile(aes(fill = n)) +
stat_contour(bins = 15) +
guides(fill = guide_colorbar(title = 'count'))
plot2
The result is:
Also, I tried to use ggplot2::stat_density_2d:
> ggplot(templ, aes(temperature, relative_humidity_percent, z = count)) +
+ stat_density_2d(aes(fill = count))
Warning messages:
1: In stat_density_2d(aes(fill = count)) :
Ignoring unknown aesthetics: fill
2: The following aesthetics were dropped during statistical transformation: fill, z
ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?
> geom_density_2d() +
+ geom_contour() +
+ metR::geom_contour_fill(na.fill=TRUE) +
+ theme_classic()
Error in `+.gg`:
! Cannot add <ggproto> objects together
ℹ Did you forget to add this object to a <ggplot> object?
Run `rlang::last_error()` to see where the error occurred.
The result:
which was not filled with colour.
What I want is:
I want to replace level with count in the graph. However, since the count variable is not a factor, I cannot plot the heatmap using ggplot2::geom_contour...
I understand from your comment that you want to "fill the entire graph", thus having a less truthful representation of your three dimensional data, which would be more accurately represented as a scatter plot and local coding of your third variable. I understand that you intend to interpolate the observation density between the measured locations.
You can of course use geom_density_2d for this. Just do the same trick as in my other answer and uncount your data first.
NB: this of course creates bins of densities; otherwise this type of visualisation with iso-density lines does not work.
ggplot(tidyr::uncount(templ, count)) +
geom_density_2d_filled(aes(temperature, relative_humidity_percent))
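To see what uncount does here, a small sketch using the templ data from the question (the object name expanded is only for illustration): each row is repeated count times and the count column is dropped, so the density estimate sees one row per observation.

```r
library(tidyr)

templ <- data.frame(
  count = c(200, 225, 610, 233, 250, 210, 290, 255, 279, 250),
  temperature = c(12.2, 11.6, 12, 8.5, 4, 8.2, 9.2, 10.6, 10.8, 10.9),
  relative_humidity_percent = c(74, 78, 72, 65, 77, 84, 83, 74, 73, 75)
)

# Repeat each row `count` times; the weights column is removed by default
expanded <- uncount(templ, count)

nrow(expanded)    # one row per counted observation, i.e. sum(templ$count)
names(expanded)   # only temperature and relative_humidity_percent remain
```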
Just use geom_point and color according to your count. You can of course make your points square.
Or, if your count is not yet actually an aggregate measure and you want to show the density of neighbouring observations, you could use ggpointdensity::geom_pointdensity for this. (in your example, I have to uncount first).
library(ggplot2)
library(dplyr)
library(tidyr)
templ = data.frame(count = c(200,225,610,233,250,210,290,255,279,250),
temperature = c(12.2,11.6,12,8.5,4,8.2,9.2,10.6,10.8,10.9),
relative_humidity_percent = c(74,78,72,65,77,84,83,74,73,75))
ggplot(templ) +
geom_point(aes(temperature, relative_humidity_percent, color = count), shape = 15, size = 5)
## first uncount
templ %>%
uncount(count) %>%
ggplot() +
ggpointdensity::geom_pointdensity(aes(temperature, relative_humidity_percent))
I am trying to show different growing season lengths by displaying crop planting and harvest dates at multiple regions.
My final goal is a graph that looks like this:
which was taken from an answer to this question. Note that the dates are in julian days (day of year).
My first attempt to reproduce a similar plot is:
library(data.table)
library(ggplot2)
mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"
# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header=T))
# melt data table
m <- melt(mydat, id.vars=c("Region","Crop"), variable.name="Period", value.name="value")
# plot stacked bars
ggplot(m, aes(x=Crop, y=value, fill=Period, colour=Period)) +
geom_bar(stat="identity") +
facet_wrap(~Region, nrow=3) +
coord_flip() +
theme_bw(base_size=18) +
scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
"Harvest.Begin" = "black", "Harvest.End" = "black"), guide = "none")
However, there are a few issues with this plot:
Because the bars are stacked, the values on the x-axis are aggregated and end up too high - out of the 1-365 scale that represents day of year.
I need to combine Planting.Begin and Planting.End in the same color, and do the same to Harvest.Begin and Harvest.End.
Also, a "void" (or a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).
Any hints on how to create such graph?
I don't think this is something you can do with geom_bar or geom_col. A more general approach would be to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit:
library(dplyr)  # provides the %>% pipe
plotdata <- mydat %>%
  dplyr::mutate(Crop = factor(Crop)) %>%
  tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to = "period") %>%
  tidyr::separate(period, c("Type", "Event")) %>%
  tidyr::pivot_wider(names_from = Event, values_from = value)
# Region Crop Type Begin End
# <chr> <fct> <chr> <int> <int>
# 1 Center-West Soybean Planting 245 275
# 2 Center-West Soybean Harvest 1 92
# 3 Center-West Corn Planting 245 336
# 4 Center-West Corn Harvest 32 153
# 5 South Soybean Planting 245 1
# ...
We've used tidyr to reshape the data so that we have one row per rectangle we want to draw, and we've also made Crop a factor. We can then plot it like this:
ggplot(plotdata) +
aes(ymin=as.numeric(Crop)-.45, ymax=as.numeric(Crop)+.45, xmin=Begin, xmax=End, fill=Type) +
geom_rect(color="black") +
facet_wrap(~Region, nrow=3) +
theme_bw(base_size=18) +
scale_y_continuous(breaks=seq_along(levels(plotdata$Crop)), labels=levels(plotdata$Crop))
The part that's a bit messy here is that we want a discrete scale for y, but geom_rect prefers numeric values. Since Crop is a factor now, we use the numeric codes of its levels to create the ymin and ymax positions. Then we need to relabel the y axis with the names of the factor levels.
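A minimal base R sketch of that factor-to-position step, using a made-up three-element crop vector rather than the full data:

```r
# Hypothetical small example; the real data has one Crop value per rectangle
crop <- factor(c("Soybean", "Corn", "Soybean"))

levels(crop)              # levels are sorted alphabetically by default
pos <- as.numeric(crop)   # integer code of each value's level

# Rectangle edges: a band of height 0.9 centred on each level's position
ymin <- pos - 0.45
ymax <- pos + 0.45

# Axis breaks sit at the level positions and are labelled with the level names
breaks <- seq_along(levels(crop))
labels <- levels(crop)
```

Because the positions come from the level order, reordering the factor levels (for example with factor(x, levels = ...)) reorders the bars.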
If you also wanted to get the month names on the x axis you could do something like
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"),by="month")
# then add this to your plot
... +
scale_x_continuous(breaks=lubridate::yday(dateticks),
labels=lubridate::month(dateticks, label=TRUE, abbr=TRUE))
Trying to make some plots with ggplot2 and cannot figure out how colour works as defined in aes. Struggling with errors of aesthetic length.
I've tried defining colours in either main ggplot call aes to give legend, but also in geom_line aes.
# Define dataset:
number<-rnorm(8,mean=10,sd=3)
species<-rep(c("rose","daisy","sunflower","iris"),2)
year<-c("1995","1995","1995","1995","1996","1996","1996","1996")
d.flowers<-cbind(number,species,year)
d.flowers<-as.data.frame(d.flowers)
#Plot with no colours:
ggplot(data=d.flowers,aes(x=year,y=number))+
geom_line(group=species) # Works fine
#Adding colour:
#Defining aes in main ggplot call:
ggplot(data=d.flowers,aes(x=year,y=number,colour=factor(species)))+
geom_line(group=species)
# Doesn't work with data size 8, asks for data of size 4
ggplot(data=d.flowers,aes(x=year,y=number,colour=unique(species)))+
geom_line(group=species)
# doesn't work with data size 4, now asking for data size 8
The first plot gives
Error: Aesthetics must be either length 1 or the same as the data (4): group
The second gives
Error: Aesthetics must be either length 1 or the same as the data (8): x, y, colour
So I'm confused - when given aes of length either 4 or 8 it's not happy!
How could I think about this more clearly?
Here are @kath's comments written up as a solution. It's subtle at first, but what goes inside or outside aes() is key. There is more information in the question "When does the aesthetic go inside or outside aes()?", and there are plenty of searchable pages on ggplot aesthetics with examples to cut, paste and try.
library(ggplot2)
number <- rnorm(8,mean=10,sd=3)
species <- rep(c("rose","daisy","sunflower","iris"),2)
year <- c("1995","1995","1995","1995","1996","1996","1996","1996")
d.flowers <- data.frame(number, species, year)
head(d.flowers)
#number species year
#1 8.957372 rose 1995
#2 7.145144 daisy 1995
#3 9.864917 sunflower 1995
#4 7.645287 iris 1995
#5 4.996174 rose 1996
#6 8.859320 daisy 1996
ggplot(data = d.flowers, aes(x = year,y = number,
group = species,
colour = species)) + geom_line()
#note geom_point() doesn't need to be grouped - try:
ggplot(data = d.flowers, aes(x = year,y = number, colour = species)) + geom_point()
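One detail that is easy to miss: the version above builds the data with data.frame() directly, whereas the question went through cbind(), which returns a character matrix and so turns number into text. A minimal base R sketch of the difference (the object names via_cbind and direct are only for illustration):

```r
set.seed(1)  # only so the example is reproducible
number  <- rnorm(8, mean = 10, sd = 3)
species <- rep(c("rose", "daisy", "sunflower", "iris"), 2)
year    <- rep(c("1995", "1996"), each = 4)

# cbind() on mixed vectors yields a character matrix, so every column becomes text
via_cbind <- as.data.frame(cbind(number, species, year))

# data.frame() keeps each vector's own type
direct <- data.frame(number, species, year)

class(via_cbind$number)  # not numeric: character (or factor in R < 4.0)
class(direct$number)     # numeric
```

With a non-numeric y, ggplot2 treats number as discrete, which is a separate problem from where the aesthetics are mapped.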
Aim
I am trying to change the shape of the geom_point into a cross (so not a "plus/addition" sign, but a 'death' cross).
Attempt
Let's say I have the following data:
library(tidyverse)
df <- read.table(text="x y
1 3
2 4
3 6
4 7 ", header=TRUE)
I am able to change the shape using the shape parameter in geom_point into different shapes, like this:
ggplot(data = df, aes(x =x, y=y)) +
geom_point(shape=2) # change shape
However, there is no option to change the shape into a cross.
Question
How do I change the shape of a value into a cross using ggplot in R?
Shape can be set to a Unicode character. The code below uses the skull and crossbones, but you can look up a more suitable symbol.
Note that the final result will depend on the font used to generate the plot.
ggplot(data = df, aes(x =x, y=y)) +
geom_point(shape="\u2620", size = 10)
I have a chart which I want to colour the bin density (as below). But I want to have single bins (value=1) as black and higher values either as a single other colour, or better, as a gradient.
I have only been able to have a single black->red gradient, or completely discrete colours which is too confusing. I haven't been able to successfully map manual colours to the 'count' variable of the bin2d function. Can anyone suggest a fix?
My code:
ggplot(x, aes(x=as.factor(V4), y=V2)) +
geom_bin2d(binwidth = c(1,100)) +
scale_fill_continuous(low="black", high="red") +
facet_wrap(~V1, nrow = 1)
Zoomed version, showing how difficult it is to differentiate 2s
EDIT: I've realised a better way to represent this. What I want is a scale that looks like this:
My data (x) looks like this:
V1 V2 V3 V4
5 5831 30 A
5 20451 38 A
5 23151 34 B
5 30061 39 A
5 34191 32 B
5 41641 30 A
So, V2 is position of the row up the y axis, V1 is the facets and V4 is the vertical columns. Existence of the row (previously determined by V3 but not relevant here) contributes to the bin2d count.
I have managed to work this out. I found that you can map to the bin count using "..count..", so the code now reads:
ggplot(x, aes(x = as.factor(V4), y = V2)) +
  # ggplot2 >= 3.4.0 prefers after_stat(count) over ..count..
  geom_bin2d(binwidth = c(1, 100), aes(fill = as.factor(..count..))) +
  scale_fill_manual(values = c("#000000", "#FF9900", "#FF6600", "#FF3300")) +
  # pretty_breaks() comes from the scales package
  scale_y_continuous(breaks = scales::pretty_breaks(12)) +
  facet_wrap(~V1, nrow = 1)