When ggplot makes a line plot with polar coordinates, it leaves a gap between the highest and lowest x-values (Dec and Jan below) instead of wrapping around into a spiral. How can I continue the line and close that gap?
In particular, I want to use months as my x-axis, but plot multiple years of data in one looping line.
Reprex:
library(ggplot2)
# three years of monthly data
df <- expand.grid(month = month.abb, year = 2014:2016)
df$value <- seq_along(df$year)
head(df)
## month year value
## 1 Jan 2014 1
## 2 Feb 2014 2
## 3 Mar 2014 3
## 4 Apr 2014 4
## 5 May 2014 5
## 6 Jun 2014 6
ggplot(df, aes(month, value, group = year)) +
geom_line() +
coord_polar()
Here's a somewhat-hacky option:
# make a data.frame of each year's January value, which the previous year's line should continue to
bridges <- df[df$month == 'Jan',]
bridges$year <- bridges$year - 1 # adjust index to align with previous group
bridges$month <- NA # set x value to any new value
# combine extra points with original
ggplot(rbind(df, bridges), aes(month, value, group = year)) +
geom_line() +
# close gap by removing expansion; redefine breaks to get rid of "NA/Jan" label
scale_x_discrete(expand = c(0,0), breaks = month.abb) +
coord_polar()
Obviously adding extra data points is not ideal, though, so maybe a more elegant answer exists.
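A variant of the same idea, sketched here under the assumption that a continuous x-axis is acceptable: map the months to positions 1 to 12 and put each bridge point at position 13, which coord_polar() draws at the same angle as position 1. The month_num column and the spiral data frame are names introduced for this illustration only.

```r
library(ggplot2)

# same three years of monthly data as in the question
df <- expand.grid(month = month.abb, year = 2014:2016)
df$value <- seq_along(df$year)

# numeric month position: 1 = Jan, ..., 12 = Dec (hypothetical helper column)
df$month_num <- match(as.character(df$month), month.abb)

# bridge points: next year's January value, placed at position 13 of the previous year
bridges <- df[df$month == "Jan", ]
bridges$year <- bridges$year - 1
bridges$month_num <- 13
# drop the bridge that would belong to a year with no data (2013)
bridges <- bridges[bridges$year >= min(df$year), ]

spiral <- rbind(df, bridges)

p <- ggplot(spiral, aes(month_num, value, group = year)) +
  geom_line() +
  # limits 1..13 with no expansion, so 13 lands exactly on top of 1
  scale_x_continuous(limits = c(1, 13), breaks = 1:12,
                     labels = month.abb, expand = c(0, 0)) +
  coord_polar()
p
```

This still adds extra rows, but it avoids the NA factor level and the break relabelling.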
I have two data sets, one for 2020 and one for 2019. Each is divided into 5 groups, and each group has a value for every month.
I want to create a graph that compares, for each group and each month, the 2020 figure with the 2019 figure.
The data for 2020 looked like this:
(image not available)
The data for 2019 had the same layout.
I combined the two data sets into this:
(image not available)
The problem is that all the graphs I found online have either a single column of values or no division into months.
How can I create one graph that compares each month between 2019 and 2020?
library(tidyverse)
library(ggplot2)
# bring table in long format
longerTable <- tibble(month = 1:12, value_2020 = rnorm(12), value_2019=rnorm(12)) %>%
pivot_longer(cols=starts_with("value"), names_to="year", values_to="value")
# plot with ggplot.
ggplot(longerTable, aes(x=month, y=value, fill=year)) +
# stat = identity -> plot numbers as they are
# position = dodge -> show bars next to each other
geom_bar(stat="identity", position = "dodge")
Created on 2020-10-01 by the reprex package (v0.3.0)
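Since the question mentions 5 groups, the same approach can be extended with one facet per group. The following is only a sketch: the original tables were posted as images, so the column names (month, group, value_2019, value_2020) and the random values are assumptions.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# made-up wide table: one row per month and group, one value column per year
wide <- expand.grid(month = 1:12, group = paste0("G", 1:5))
wide$value_2019 <- runif(nrow(wide), 10, 100)
wide$value_2020 <- runif(nrow(wide), 10, 100)

# long format: one row per month, group and year
long <- pivot_longer(wide, cols = starts_with("value_"),
                     names_to = "year", names_prefix = "value_",
                     values_to = "value")

# ordered month labels, so the axis reads Jan..Dec instead of 1..12
long$month <- factor(month.abb[long$month], levels = month.abb)

p <- ggplot(long, aes(month, value, fill = year)) +
  geom_col(position = "dodge") +   # same as geom_bar(stat = "identity")
  facet_wrap(~group, ncol = 1)
p
```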
I am trying to show different growing season lengths by displaying crop planting and harvest dates at multiple regions.
My final goal is a graph that looks like this:
which was taken from an answer to this question. Note that the dates are in julian days (day of year).
My first attempt to reproduce a similar plot is:
library(data.table)
library(ggplot2)
mydat <- "Region\tCrop\tPlanting.Begin\tPlanting.End\tHarvest.Begin\tHarvest.End\nCenter-West\tSoybean\t245\t275\t1\t92\nCenter-West\tCorn\t245\t336\t32\t153\nSouth\tSoybean\t245\t1\t1\t122\nSouth\tCorn\t183\t336\t1\t153\nSoutheast\tSoybean\t275\t336\t1\t122\nSoutheast\tCorn\t214\t336\t32\t122"
# read data as data table
mydat <- setDT(read.table(textConnection(mydat), sep = "\t", header=T))
# melt data table
m <- melt(mydat, id.vars=c("Region","Crop"), variable.name="Period", value.name="value")
# plot stacked bars
ggplot(m, aes(x=Crop, y=value, fill=Period, colour=Period)) +
geom_bar(stat="identity") +
facet_wrap(~Region, nrow=3) +
coord_flip() +
theme_bw(base_size=18) +
scale_colour_manual(values = c("Planting.Begin" = "black", "Planting.End" = "black",
"Harvest.Begin" = "black", "Harvest.End" = "black"), guide = "none")
However, there are a few issues with this plot:
Because the bars are stacked, the values on the value axis are summed and end up too high, outside the 1-365 range that represents day of year.
I need to combine Planting.Begin and Planting.End in the same color, and do the same to Harvest.Begin and Harvest.End.
Also, a "void" (or a completely uncolored bar) needs to be created between Planting.Begin and Harvest.End.
Perhaps the graph could be achieved with geom_rect or geom_segment, but I really want to stick to geom_bar since it's more customizable (for example, it accepts scale_colour_manual in order to add black borders to the bars).
Any hints on how to create such a graph?
I don't think this is something you can do with geom_bar or geom_col. A more general approach is to use geom_rect to draw rectangles. To do this, we need to reshape the data a bit:
library(dplyr) # provides %>% and mutate()
plotdata <- mydat %>%
dplyr::mutate(Crop = factor(Crop)) %>%
tidyr::pivot_longer(Planting.Begin:Harvest.End, names_to="period") %>%
tidyr::separate(period, c("Type","Event")) %>%
tidyr::pivot_wider(names_from=Event, values_from=value)
# Region Crop Type Begin End
# <chr> <fct> <chr> <int> <int>
# 1 Center-West Soybean Planting 245 275
# 2 Center-West Soybean Harvest 1 92
# 3 Center-West Corn Planting 245 336
# 4 Center-West Corn Harvest 32 153
# 5 South Soybean Planting 245 1
# ...
We've used tidyr to reshape the data so that we have one row per rectangle we want to draw, and we've also made Crop a factor. We can then plot it like this:
ggplot(plotdata) +
aes(ymin=as.numeric(Crop)-.45, ymax=as.numeric(Crop)+.45, xmin=Begin, xmax=End, fill=Type) +
geom_rect(color="black") +
facet_wrap(~Region, nrow=3) +
theme_bw(base_size=18) +
scale_y_continuous(breaks=seq_along(levels(plotdata$Crop)), labels=levels(plotdata$Crop))
The part that's a bit messy here is that we want a discrete scale for y, but geom_rect prefers numeric values. Since Crop is a factor now, we use the factor's underlying integer codes to compute the ymin and ymax positions. Then we need to relabel the y axis with the names of the factor levels.
If you also wanted to get the month names on the x axis you could do something like
dateticks <- seq.Date(as.Date("2020-01-01"), as.Date("2020-12-01"),by="month")
# then add this to your plot
... +
scale_x_continuous(breaks=lubridate::yday(dateticks),
labels=lubridate::month(dateticks, label=TRUE, abbr=TRUE))
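One detail in the reshaped data: the South/Soybean planting period has Begin = 245 and End = 1, so it appears to run past the end of the year, and geom_rect would draw it backwards from 245 to 1. A possible way to handle this, assuming that End < Begin always means the period wraps around day 365, is to split such rows in two. split_wrapped() is a hypothetical helper written for this sketch, not part of any package.

```r
# Hypothetical helper: split periods that wrap past the end of the year
# into two pieces, Begin..year_end and 1..End
split_wrapped <- function(d, year_end = 365) {
  wraps <- d$End < d$Begin
  first <- d[wraps, ]
  first$End <- year_end    # piece running to the end of the year
  second <- d[wraps, ]
  second$Begin <- 1        # piece starting at the beginning of the year
  rbind(d[!wraps, ], first, second)
}

# two rows in the same shape as plotdata above
example <- data.frame(
  Region = c("South", "South"),
  Crop   = c("Soybean", "Corn"),
  Type   = c("Planting", "Planting"),
  Begin  = c(245, 183),
  End    = c(1, 336)
)

fixed <- split_wrapped(example)
fixed
```

Applying it to plotdata before plotting would leave the ggplot call itself unchanged.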
This is my first question here, so please excuse me if I don't provide all the information right away.
I'm trying to build a line graph with two lines:
y1 <- c(1000,1500,1000,1500,2000,3000,4000)
y2 <- c(1100,1400,900,1500,2000,2500,3500)
x <- c(49,50,51,1,2,49,50)
df <- data.frame(y1,y2,x)
Imagine x being calendar weeks; I skipped weeks 3 to 48 of the second year.
Now I want to build a line graph that displays the x-axis values (a time series) in this order.
First I tried a really simple approach:
p <- ggplot()
p <- p + geom_line(data=df,aes(x=x,y=y1))
p <- p + geom_line(data=df,aes(x=x,y=y2), color = "red")
p
Problem: R sorts the x values and also plots repeated week numbers at the same x position.
I then tried to change the x values to make them unique, e.g. 49/19, 50/19, but R still changes the order. The same happens if I use geom_path instead of geom_line.
I then tried to change x to a factor and use scale_x_discrete, but I couldn't figure out how to do it; either the lines or the x labels were always missing.
I hope you can give me some kind of advice.
Many thanks,
Andre
You can add a prefix of the year to your x value, and we pad it using str_pad() from stringr with a zero, so that they will be sorted from 01 all the way to 52:
library(tidyr)
library(stringr)
library(ggplot2)
df$week = paste(rep(c("2019","2020"),c(3,4)),
str_pad(df$x,2,pad="0"),sep="_")
Pivot this to long format, so that we get a legend:
pivot_longer(df[,c("week","y1","y2")],-week)
# A tibble: 14 x 3
week name value
<chr> <chr> <dbl>
1 2019_49 y1 1000
2 2019_49 y2 1100
3 2019_50 y1 1500
4 2019_50 y2 1400
5 2019_51 y1 1000
6 2019_51 y2 900
7 2020_01 y1 1500
8 2020_01 y2 1500
9 2020_02 y1 2000
Then use this directly in ggplot
ggplot(pivot_longer(df[,c("week","y1","y2")],-week),
aes(x=week,y=value,group=name,col=name)) +
geom_line() + scale_color_manual(values=c("black","red"))
One approach is to replace x with a sequence of integers and then apply the x-axis labels afterwards.
library(ggplot2)
ggplot(data = df, aes(x = seq(1,nrow(df)))) +
geom_line(aes(y=y1)) +
geom_line(aes(y=y2), color = "red") +
scale_x_continuous(breaks = seq(1,nrow(df)),
labels = as.character(df$x)) +
labs(x = "Week")
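The factor route mentioned in the question can also work, provided the factor levels are set in order of appearance rather than left to sort alphabetically. This is a sketch; the year assignment (first three rows 2019, the rest 2020) is an assumption taken from the answer above, and the year and label columns are introduced here for illustration.

```r
library(ggplot2)

df <- data.frame(
  y1 = c(1000, 1500, 1000, 1500, 2000, 3000, 4000),
  y2 = c(1100, 1400, 900, 1500, 2000, 2500, 3500),
  x  = c(49, 50, 51, 1, 2, 49, 50)
)

# assumed year for each row
df$year <- rep(c(2019, 2020), c(3, 4))

# unique labels such as "49/19", kept in the order they appear in the data
df$label <- sprintf("%02d/%d", df$x, df$year %% 100)
df$label <- factor(df$label, levels = unique(df$label))

# group = 1 tells ggplot to connect points across the discrete x values
p <- ggplot(df, aes(x = label, group = 1)) +
  geom_line(aes(y = y1)) +
  geom_line(aes(y = y2), colour = "red") +
  labs(x = "Week", y = "Value")
p
```

Without group = 1, each discrete x value forms its own group and no line is drawn, which may be why the lines went missing in the earlier attempt.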
I have data like this:
year catch group
2011 22 1
2012 45 1
2013 34 1
2011 11 2
2012 22 2
2013 32 2
I would like to have the number of the group (1 and 2) to appear above the line in the plot.
Any suggestion?
My real data has 8 groups in total with 8 lines which makes it hard to see because the lines cross one another and the colors of the legend are similar.
I tried this:
library(ggplot2)
ggplot(aes(x=as.factor(year), y=catch, group=as.factor(group),
col=as.factor(group)), data=df) +
geom_line() +
geom_point() +
xlab("year") +
labs(color="group")
Firstly, distinguishing 8 different colours is very difficult. That's why your 8 groups seem to have similar colors.
What you want in this case is not a legend (which usually is an off-chart summary), but rather "annotation".
You can directly add the groups with
ggplot(...) +
geom_text(aes(x=as.factor(year), y=catch, label=group)) +
...
and then try to tweak the position of the text with nudge_x and nudge_y. But if you wanted only 1 label per group, you would have to prepare a data frame with it:
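library(dplyr)
# one row per group: the earliest year (use top_n(1, year) to label the last point instead)
labels <- df %>% group_by(group) %>% top_n(1, -year)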
ggplot(...) +
geom_text(data=labels, aes(x=as.factor(year), y=catch, label=group)) +
...
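Put together with the sample data from the question, a complete version might look like the sketch below. It labels the last point of each line rather than the first, and uses slice_max() (available in dplyr 1.0.0 and later) in place of top_n(); the nudge value is an arbitrary choice to tune by eye.

```r
library(ggplot2)
library(dplyr)

df <- data.frame(
  year  = rep(2011:2013, times = 2),
  catch = c(22, 45, 34, 11, 22, 32),
  group = rep(1:2, each = 3)
)

# one label row per group: the latest year of each line
labels <- df %>%
  group_by(group) %>%
  slice_max(year, n = 1) %>%
  ungroup()

p <- ggplot(df, aes(x = factor(year), y = catch,
                    group = factor(group), colour = factor(group))) +
  geom_line() +
  geom_point() +
  # shift the text slightly to the right of the last point
  geom_text(data = labels, aes(label = group),
            nudge_x = 0.15, show.legend = FALSE) +
  labs(x = "year", colour = "group")
p
```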
I have two data frames z (1 million observations) and b (500k observations).
z= Tracer time treatment
15 0 S
20 0 S
25 0 X
04 0 X
55 15 S
16 15 S
15 15 X
20 15 X
b= Tracer time treatment
2 0 S
35 0 S
10 0 X
04 0 X
20 15 S
11 15 S
12 15 X
25 15 X
I'd like to create grouped boxplots using time as a factor and treatment as colour. Essentially I need to bind them together and then differentiate between them but not sure how. One way I tried was using:
zz <- factor(rep("Z", nrow(z)))
bb <- factor(rep("B", nrow(b)))
dumZ <- merge(z, zz) # this won't work because it says it's too big
dumB <- merge(b, bb)
total <- rbind(dumB, dumZ)
But z and zz merge won't work because it says it's 10G in size (which can't be right)
The end plot might be similar to this example: Boxplot with two levels and multiple data.frames
Any thoughts?
Cheers,
EDIT: Added boxplot
I would approach it as follows:
# create a list of your data.frames
l <- list(z,b)
# assign names to the dataframes in the list
names(l) <- c("z","b")
# bind the dataframes together with rbindlist from data.table
# the idcol parameter will create a variable with the names of the dataframes
# you could also use 'bind_rows(l, .id="id")' from 'dplyr' for this
library(data.table)
library(ggplot2)
zb <- rbindlist(l, idcol="id")
# create the plot
ggplot(zb, aes(x=factor(time), y=Tracer, color=treatment)) +
geom_boxplot() +
facet_wrap(~id) +
theme_bw()
which gives:
Other alternatives for creating your plot:
# facet by 'time'
ggplot(zb, aes(x=id, y=Tracer, color=treatment)) +
geom_boxplot() +
facet_wrap(~time) +
theme_bw()
# facet by 'time' & color by 'id' instead of 'treatment'
ggplot(zb, aes(x=treatment, y=Tracer, color=id)) +
geom_boxplot() +
facet_wrap(~time) +
theme_bw()
In response to your last comment: to get everything in one plot, you can use interaction to distinguish between the different groupings as follows:
ggplot(zb, aes(x=treatment, y=Tracer, color=interaction(id, time))) +
geom_boxplot(width = 0.7, position = position_dodge(width = 0.7)) +
theme_bw()
which gives:
The key is that you do not need to perform a merge, which is computationally expensive on large tables. Instead, add a new variable to each data frame that identifies where it came from (source, with values "b" and "z", in my code below) and then rbind them. After that it becomes straightforward. My solution is very similar to @Jaap's, just with different faceting.
library(ggplot2)
#Create some mock data
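t<-seq(1,55,by=2)
# rep(..., each = 5) makes sure both treatments occur at both times
z<-data.frame(tracer=sample(t,size = 10,replace = TRUE), time=rep(c(0,15), each = 5), treatment=c("S","X"))
b<-data.frame(tracer=sample(t,size = 10,replace = TRUE), time=rep(c(0,15), each = 5), treatment=c("S","X"))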
#Add a variable to each table to id itself
b$source<-"b"
z$source<-"z"
#concatenate the tables together
all<-rbind(b,z)
ggplot(all, aes(source, tracer, group=interaction(treatment,source), fill=treatment)) +
geom_boxplot() + facet_grid(~time)
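As for why the merge in the question grew so large: when the two inputs share no column names, merge() returns every pairing of their rows (a Cartesian product), so the row count is multiplied rather than preserved. A small illustration with a four-row stand-in for z (the values are taken from the question, the size is not):

```r
# four-row stand-in for the real table
z <- data.frame(
  Tracer    = c(15, 20, 25, 4),
  time      = 0,
  treatment = c("S", "S", "X", "X")
)
zz <- factor(rep("Z", nrow(z)))

# no common columns, so every row of z is paired with every element of zz
crossed <- merge(z, zz)
nrow(crossed)   # 16, i.e. nrow(z) squared

# adding a column keeps the number of rows unchanged
z$source <- "z"
nrow(z)         # 4
```

With a million rows the same squaring applies, which is consistent with the size complaint in the question.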