I am using multiple datasets in ggplot2 to create a time series of event occurrences. The plan is to plot the mean lines (mean being average date of occurrence) of two datasets over time, and use geom_ribbon() to depict the range between +1 and -1 standard deviation above and below the mean (listed below in columns sdv_pos and sdv_neg representing +1 and -1 respectively).
I am able to plot the two mean lines. However, when I insert geom_ribbon I get the following error:
Error in as.POSIXct.numeric(value) : 'origin' must be supplied
I've tried converting the columns used in the geom_ribbon() line using as.POSIXct() with the origin, but it has not worked. I only get this error with geom_ribbon(), not geom_line().
Here are the two datasets:
Data1:
sdv_pos sdv_neg year data1_mean
1976-03-20 1976-03-14 1997 1976-03-17
1976-02-18 1976-01-18 1998 1976-02-03
1976-02-12 1976-01-06 1999 1976-01-24
1976-03-02 1976-01-07 2000 1976-02-04
1976-01-10 1976-01-10 2001 1976-01-10
1976-04-21 1976-02-19 2002 1976-03-21
Data2:
sdv_pos sdv_neg year data2_mean
1976-04-24 1976-03-10 1997 1976-04-02
1976-04-21 1976-01-27 1998 1976-03-10
1976-04-21 1976-01-20 1999 1976-03-07
1976-03-23 1976-01-04 2000 1976-02-12
1976-05-05 1976-02-08 2001 1976-03-23
1976-05-01 1976-01-29 2002 1976-03-16
Here is the code I'm using for this. Note that when I remove geom_ribbon() the plot works. However when I include geom_ribbon() I get the error.
graph1<- ggplot()+
geom_line(data = Data1, aes(x = year, y = data1_mean), color = "blue") +
geom_ribbon(data = Data1, aes(x=data1_mean, ymax=sdv_pos, ymin=sdv_neg), fill="pink", alpha=.5)+
geom_line(data = Data2, aes(x = year, y=data2_mean), color = "red") +
geom_ribbon(data = Data2, aes(x=data2_mean, ymax=sdv_pos, ymin=sdv_neg), fill="yellow", alpha=.5)
Note that the year on the x-axis and the year in the data values are not the same. I use 1976 for every date just to keep the mean line on the same day/month scale; otherwise the y-axis would extend to include all the years in the study.
I found the answer by changing the command to
geom_ribbon(data = Data1, aes(x=year, ymax=sdv_pos, ymin=sdv_neg), fill="pink", alpha=.5)+
The difference is the x value. I thought I had to supply the mean as a centerline for the ribbon, but geom_ribbon() simply shades the space between the two lines (sdv_pos, sdv_neg) and needs the same x as the x-axis so it can shade the area as it goes.
It seems obvious, but I wanted to post an answer here in case anyone runs into the same problem.
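For reference, here is a minimal sketch of the corrected plot for Data1. It assumes ggplot2 is installed and that the columns are of class Date (the original columns may have been POSIXct); the data frame is rebuilt from the first three rows of the table above, and scale_y_date() is added only to format the labels.

```r
library(ggplot2)

# First three rows of Data1 from the question, stored as Date (an assumption)
Data1 <- data.frame(
  year       = c(1997, 1998, 1999),
  sdv_pos    = as.Date(c("1976-03-20", "1976-02-18", "1976-02-12")),
  sdv_neg    = as.Date(c("1976-03-14", "1976-01-18", "1976-01-06")),
  data1_mean = as.Date(c("1976-03-17", "1976-02-03", "1976-01-24"))
)

# Both layers share x = year; the ribbon shades between sdv_neg and sdv_pos
graph1 <- ggplot(Data1, aes(x = year)) +
  geom_ribbon(aes(ymin = sdv_neg, ymax = sdv_pos), fill = "pink", alpha = 0.5) +
  geom_line(aes(y = data1_mean), color = "blue") +
  scale_y_date(date_labels = "%b %d")

graph1
```

The second dataset can be added the same way, with its own geom_ribbon() and geom_line() layers that take data = Data2.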
Related
I have a data df with the format
State Date Ratio
AL 2019-01 10.1
AL 2019-02 12.1
... ... ...
NY 2019-01 15.1
... ... ...
And I would like to draw a time series with the geofacet package. I am having troubles with the Date format I guess.
ggplot(df,aes(Date, Ratio)) + geom_line() + facet_geo(~ State, grid = "us_state_grid2") + ylab("Rate (%)")
The following error is shown: geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
How can I adjust it?
Your date is structured 'yyyy-mm', so I'm guessing it's a character vector rather than a date object. You should convert it to class Date with as.Date() and then it should work as expected. (You'll need to paste on the day of the month.)
You get a grouping error because when your x-axis is a character vector, geom_line will group by values of the character vector x-axis. Lines are drawn instead between the various y values at each x value. Here's an example using the geofacet package's own state_ranks dataset.
library(ggplot2)
library(dplyr)
library(geofacet)
data(state_ranks)
# The lines are not connected across a character x-axis.
ggplot(state_ranks) +
geom_line(aes(x = variable, y = rank))
# Throws error: geom_path: Each group consists of only one observation. Do
# you need to adjust the group aesthetic?
ggplot(state_ranks) +
geom_line(aes(x = variable, y = rank)) +
facet_geo(~ state)
If you group by state, you get the expected result (with an alphabetically ordered x-axis).
# Works, x-axis is alphabetized and lines are connected
ggplot(state_ranks) +
geom_line(aes(x = variable, y = rank, group = state)) +
facet_geo(~ state)
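The Date conversion suggested at the top of this answer could look like the sketch below. The data frame df is a stand-in with the question's column names; the second NY value is invented for illustration so that each state has two points. facet_wrap() is used here only to keep the example free of extra packages; the question's facet_geo(~ State, grid = "us_state_grid2") should be usable in its place.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data (last Ratio value is made up)
df <- data.frame(
  State = c("AL", "AL", "NY", "NY"),
  Date  = c("2019-01", "2019-02", "2019-01", "2019-02"),
  Ratio = c(10.1, 12.1, 15.1, 14.0),
  stringsAsFactors = FALSE
)

# Paste on a day of the month so the string is a complete date, then parse it
df$Date <- as.Date(paste0(df$Date, "-01"))

# With a Date x-axis, geom_line() connects the points within each facet
p <- ggplot(df, aes(Date, Ratio)) +
  geom_line() +
  facet_wrap(~ State) +
  ylab("Rate (%)")

p
```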
I am trying to show different growing season lengths by displaying crop planting and harvest dates at multiple regions.
My final goal is a graph that looks like this:
which was taken from an answer to this question. Note that the dates are in julian days (day of year).
My first attempt to reproduce a similar plot is:
library(data.table)
library(ggplot2)
mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"
# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header=T))
# melt data table
m <- melt(mydat, id.vars=c("Region","Crop"), variable.name="Period", value.name="value")
# plot stacked bars
ggplot(m, aes(x=Crop, y=value, fill=Period, colour=Period)) +
geom_bar(stat="identity") +
facet_wrap(~Region, nrow=3) +
coord_flip() +
theme_bw(base_size=18) +
scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
"Harvest.Begin" = "black", "Harvest.End" = "black"), guide = "none")
However, there's a few issues with this plot:
Because the bars are stacked, the values on the x-axis are aggregated and end up too high - out of the 1-365 scale that represents day of year.
I need to combine Planting.Begin and Planting.End in the same color, and do the same to Harvest.Begin and Harvest.End.
Also, a "void" (or a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).
Any hints on how to create such graph?
I don't think this is something you can do with geom_bar or geom_col. A more general approach would be to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit:
library(dplyr) # provides the %>% pipe used below
plotdata <- mydat %>%
dplyr::mutate(Crop = factor(Crop)) %>%
tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to="period") %>%
tidyr::separate(period, c("Type","Event")) %>%
tidyr::pivot_wider(names_from=Event, values_from=value)
# Region Crop Type Begin End
# <chr> <fct> <chr> <int> <int>
# 1 Center-West Soybean Planting 245 275
# 2 Center-West Soybean Harvest 1 92
# 3 Center-West Corn Planting 245 336
# 4 Center-West Corn Harvest 32 153
# 5 South Soybean Planting 245 1
# ...
We've used tidyr to reshape the data so that we have one row per rectangle that we want to draw, and we've also made Crop a factor. We can then plot it like this:
ggplot(plotdata) +
aes(ymin=as.numeric(Crop)-.45, ymax=as.numeric(Crop)+.45, xmin=Begin, xmax=End, fill=Type) +
geom_rect(color="black") +
facet_wrap(~Region, nrow=3) +
theme_bw(base_size=18) +
scale_y_continuous(breaks=seq_along(levels(plotdata$Crop)), labels=levels(plotdata$Crop))
The part that's a bit messy here is that we want a discrete scale for y, but geom_rect expects numeric values. Since Crop is now a factor, we use the factor's underlying integer codes to create the ymin and ymax positions. Then we need to relabel the y-axis with the names of the factor levels.
If you also wanted to get the month names on the x axis you could do something like
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"),by="month")
# then add this to your plot
... +
scale_x_continuous(breaks=lubridate::yday(dateticks),
labels=lubridate::month(dateticks, label=TRUE, abbr=TRUE))
Trying to make some plots with ggplot2 and cannot figure out how colour works as defined in aes. Struggling with errors of aesthetic length.
I've tried defining colours in either main ggplot call aes to give legend, but also in geom_line aes.
# Define dataset:
number<-rnorm(8,mean=10,sd=3)
species<-rep(c("rose","daisy","sunflower","iris"),2)
year<-c("1995","1995","1995","1995","1996","1996","1996","1996")
d.flowers<-cbind(number,species,year)
d.flowers<-as.data.frame(d.flowers)
#Plot with no colours:
ggplot(data=d.flowers,aes(x=year,y=number))+
geom_line(group=species) # Works fine
#Adding colour:
#Defining aes in main ggplot call:
ggplot(data=d.flowers,aes(x=year,y=number,colour=factor(species)))+
geom_line(group=species)
# Doesn't work with data size 8, asks for data of size 4
ggplot(data=d.flowers,aes(x=year,y=number,colour=unique(species)))+
geom_line(group=species)
# doesn't work with data size 4, now asking for data size 8
The first plot gives
Error: Aesthetics must be either length 1 or the same as the data (4): group
The second gives
Error: Aesthetics must be either length 1 or the same as the data (8): x, y, colour
So I'm confused - when given aes of length either 4 or 8 it's not happy!
How could I think about this more clearly?
Here are @kath's comments as a solution. It's subtle to learn at first, but what goes inside or outside aes() is key. There is some more info here - When does the aesthetic go inside or outside aes()? - and there are lots of good googleable "ggplot aesthetic" pages with examples to cut, paste, and try.
library(ggplot2)
number <- rnorm(8,mean=10,sd=3)
species <- rep(c("rose","daisy","sunflower","iris"),2)
year <- c("1995","1995","1995","1995","1996","1996","1996","1996")
d.flowers <- data.frame(number, species, year)
head(d.flowers)
#number species year
#1 8.957372 rose 1995
#2 7.145144 daisy 1995
#3 9.864917 sunflower 1995
#4 7.645287 iris 1995
#5 4.996174 rose 1996
#6 8.859320 daisy 1996
ggplot(data = d.flowers, aes(x = year,y = number,
group = species,
colour = species)) + geom_line()
#note geom_point() doesn't need to be grouped - try:
ggplot(data = d.flowers, aes(x = year,y = number, colour = species)) + geom_point()
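As a rough illustration of the inside/outside distinction: values inside aes() are looked up in the data, one per row, while values outside aes() are constants applied to the whole layer. The sketch below uses fixed placeholder numbers instead of rnorm() so it is reproducible; the names flowers, p_mapped, and p_fixed are made up for this example.

```r
library(ggplot2)

# Placeholder data with the same shape as d.flowers (numbers are illustrative)
flowers <- data.frame(
  number  = c(9, 7, 10, 8, 5, 9, 11, 6),
  species = rep(c("rose", "daisy", "sunflower", "iris"), 2),
  year    = rep(c("1995", "1996"), each = 4)
)

# Inside aes(): colour is mapped from the species column, which yields a legend
p_mapped <- ggplot(flowers, aes(x = year, y = number, group = species)) +
  geom_line(aes(colour = species))

# Outside aes(): colour is one constant for the whole layer, so no legend
p_fixed <- ggplot(flowers, aes(x = year, y = number, group = species)) +
  geom_line(colour = "steelblue")
```

In the question, group=species sat outside aes(), so it was most likely taken from the free-standing species vector in the workspace rather than from the data frame column, which would explain the length complaints.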
I want to create a density plot with the following data:
interval fr mi ab
0x 9765 3631 12985
1x 2125 2656 601
2x 1299 2493 191
3x 493 2234 78
4x 141 1559 20
5x and more 75 1325 23
On the X-Axis I want to have the Intervals and on the Y-Axis I want to have the density of "fr", "mi" and "ab" in different colors.
My imagination was something like this graph.
My problem is that I don't know how to get the density on the Y-Axis. I tried it with geom_density, but it didn't work. The best result I accomplished was using the following code:
DS29 <-as.data.frame(DS29)
DS29$interval <- factor(DS29$interval, levels = DS29$interval)
DS29 <- melt (DS29,id=c("interval"))
output$DS51<- renderPlot({
plot_tab6 <- ggplot(DS29, aes(x= interval,y = value, fill=variable, group = variable)) +
geom_col()+
geom_line()
return(plot_tab6)
})
This gives me the following plot, which is not the result I want to have. Do you have an idea how I could get to my wanted result? Thank you very much.
Seeing your sample data, I am not sure geom_density is what you want. If you type ?geom_density, you will see some example code. Taking one example from the help page shows what your data is missing.
ggplot(diamonds, aes(depth, fill = cut, colour = cut)) +
geom_density(alpha = 0.1) +
xlim(55, 70)
For the x-axis, depth is a continuous variable, not a categorical one. Your current data has a categorical variable on the x-axis. With geom_density, you are looking at the density of something at each value on the x-axis. The example code above shows that diamonds classified as "Ideal" have high density around 61.5-62, suggesting that the largest proportion of "Ideal" diamonds have a depth value around 61.5-62. Indeed, the mean depth of "Ideal" diamonds is 61.71. This means that you need multiple data points to calculate density. Your data has only one data point per interval for each group (e.g., ab, fr, mi). So I do not think your data is ready for calculating density.
If you want to draw a graphic similar to what you suggested in your question using the current data, I think you need to 1) convert interval to a numeric variable, 2) transform the data into long format, and 3) use stat_smooth.
library(tidyverse)
mydf %>%
# "x.*" also strips the trailing " and more", so "5x and more" becomes 5 instead of NA
mutate(interval = as.numeric(sub(x = as.character(interval), pattern = "x.*", replacement = ""))) %>%
gather(key = group, value = value, - interval) -> temp
ggplot(temp, aes(x = interval, y = value, fill = group)) +
stat_smooth(geom = "area", span = 0.4, method = "loess", alpha = 0.4)
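If "density" is taken to mean each group's share of its own total rather than a kernel density (an assumption about the question's intent), the counts can be normalised directly. The sketch below rebuilds the table from the question in base R; the names counts, long, and share are made up for this example.

```r
library(ggplot2)

# Counts copied from the table in the question
counts <- data.frame(
  interval = c("0x", "1x", "2x", "3x", "4x", "5x and more"),
  fr = c(9765, 2125, 1299, 493, 141, 75),
  mi = c(3631, 2656, 2493, 2234, 1559, 1325),
  ab = c(12985, 601, 191, 78, 20, 23)
)
# Keep the intervals in table order on the x-axis
counts$interval <- factor(counts$interval, levels = counts$interval)

# Long format: one row per interval and group
long <- data.frame(
  interval = rep(counts$interval, times = 3),
  group    = rep(c("fr", "mi", "ab"), each = nrow(counts)),
  value    = c(counts$fr, counts$mi, counts$ab)
)

# Share of each group's total; shares sum to 1 within a group
long$share <- long$value / ave(long$value, long$group, FUN = sum)

p_share <- ggplot(long, aes(x = interval, y = share, colour = group, group = group)) +
  geom_line() +
  ylab("Share of group total")

p_share
```

This keeps interval categorical, so no smoothing is involved; the lines only join the six observed shares per group.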
I have some data (AllPCA) that is divided by site. I have used qplot (PC1, PC2, data=AllPCA, colour=Population, facets=~Population) + scale_colour_manual (values=cbbPalette) to facet a scatterplot of two variables by site.
Example AllPCA:
ID PC1 PC2 Population
Syd1 0.0185 0.0426 Sydney
Was1 0.0167 0.0415 Washington
Rea1 0.0182 0.0431 Reading
Aar1 0.0183 0.0427 Aarhus
This works fine, but only gives the data from each site in each of the windows.
I would like to create the same plot, but keeping the rest of the data in each facetted plot, just greyed out. Can you help?
One way would be to use two geom_point() calls. In the first, I use data=AllPCA[,-4], which is your data without the Population column, and set color="grey". Because that layer has no faceting variable, all points are plotted in grey in every facet. Then I add a second geom_point() with the full data and color=Population. This adds, in each facet, only the points belonging to that Population, in a separate color (when facet_wrap() is used).
ggplot()+
geom_point(data=AllPCA[,-4],aes(PC1,PC2),color="grey")+
geom_point(data=AllPCA,aes(PC1,PC2,color=Population))+
facet_wrap(~Population)
Duplicate your data several times:
n <- length(unique(AllPCA[["Population"]]))
dat <- do.call(rbind, rep(list(AllPCA), n))
Create new columns for (a) facetting and (b) colour:
# assign each full copy of the data to one population (one facet per copy)
dat[["Population2"]] <- rep(unique(AllPCA[["Population"]]), each = nrow(AllPCA))
dat[["PopulationMatch"]] <- with(dat, Population == Population2)
Plot:
library(ggplot2)
qplot(PC1, PC2, data = dat, colour = PopulationMatch, facets = ~ Population2) +
scale_colour_manual(values = c("grey", "black"))