I'm trying to create a grouped bar chart of monthly data, aggregated from daily data, over multiple years. I used faceting to get the x-axis I wanted, as a way to apply a secondary sort (year, then month). Now that I've faceted by year, however, ggplot shows all months in every panel, even when there's no data. This wastes space: my actual data set spans several years and I want to add labels, so space is an issue.
How can I accomplish this without the wasted space? Is there a way to add the secondary sort (year, month) on the x-axis without faceting?
# create data set
date = seq(as.Date("2014-05-01"),as.Date("2015-05-10"), "day")
revenue = runif(375, min = 0, max = 200)
cost = runif(375, min = 0, max = 100)
df = data.frame(date,revenue,cost)
head(df)
# adding month and year columns, then aggregating to monthly revenue and cost
library(lubridate)  # provides month() and year()
library(plyr)
df$month <- month(df$date, label=TRUE)
df$year <- year(df$date)
# sum only the value columns; numcolwise(sum) would also sum the numeric year column
df <- ddply(df, .(month, year), summarise, revenue = sum(revenue), cost = sum(cost))
# melting the data for a 'grouped chart' in ggplot
library(reshape)
df <- melt(df, id.vars = c("month","year"))
#create chart
library(ggplot2)
g <- ggplot(df, aes(x=month, y=value, fill=variable))
g + geom_bar(stat="identity", position="dodge") + facet_wrap(~ year)
I feel certain that there's a more elegant way to do this from within ggplot. Am I right?
The key is to free the x scale in facet_wrap(). The argument is scales (scale only works through partial matching), and scales = "free_x" gives each panel only the months that have data while keeping a common y-axis, so the years stay comparable ("free" would also give each panel its own y-axis). Following your code, with a few revisions:
library(lubridate)
library(plyr)
library(reshape)
library(ggplot2)
set.seed(222)
date = seq(as.Date("2014-05-01"),as.Date("2015-05-10"), "day")
revenue = runif(375, min = 0, max = 200)
cost = runif(375, min = 0, max = 100)
mydf = data.frame(date,revenue,cost)
mydf$month <- month(mydf$date, label=TRUE)
mydf$year <- year(mydf$date)
mydf2 <- ddply(mydf, .(month, year), summarise, revenue = sum(revenue), cost = sum(cost))
mydf3 <- melt(mydf2, id.vars = c("month","year"))
ggplot(mydf3, aes(x=month, y=value, fill=variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ year, scales = "free_x")
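Two further options, sketched here rather than taken from the answer above. First, facet_grid() accepts space = "free_x", which makes each panel's width proportional to the number of months it shows, so a year with 5 months no longer gets as much room as a year with 8. Second, a single "YYYY-MM" text key sorts chronologically on its own, which gives a year-then-month axis without any faceting. The sketch rebuilds the data so it runs standalone and swaps plyr/reshape for base aggregate() and tidyr::pivot_longer(); that substitution and the helper names (ym, monthly, long) are my own choices.

```r
library(ggplot2)
library(tidyr)

set.seed(222)
date <- seq(as.Date("2014-05-01"), as.Date("2015-05-10"), "day")
mydf <- data.frame(date,
                   revenue = runif(length(date), min = 0, max = 200),
                   cost    = runif(length(date), min = 0, max = 100))

# Year, month (English abbreviations via month.abb) and a sortable "YYYY-MM" key
mydf$year  <- format(mydf$date, "%Y")
mydf$month <- factor(format(mydf$date, "%m"), levels = sprintf("%02d", 1:12),
                     labels = month.abb)
mydf$ym    <- format(mydf$date, "%Y-%m")

# Monthly totals, then long format with one row per month and variable
monthly <- aggregate(cbind(revenue, cost) ~ year + month + ym, data = mydf, FUN = sum)
long <- pivot_longer(monthly, c(revenue, cost),
                     names_to = "variable", values_to = "value")

# Option 1: free x scales plus free panel widths, so each year is only as wide as its months
p1 <- ggplot(long, aes(x = month, y = value, fill = variable)) +
  geom_col(position = "dodge") +
  facet_grid(~ year, scales = "free_x", space = "free_x")

# Option 2: no facets; the "YYYY-MM" text key already sorts chronologically
p2 <- ggplot(long, aes(x = ym, y = value, fill = variable)) +
  geom_col(position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
```

Option 1 keeps the month labels short, which leaves room for bar labels. Option 2 avoids facets entirely at the cost of longer tick labels.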
Using R, I created some time series data (daily measurements over a period of 20 years), aggregated it to monthly totals and then produced a graph:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
day <- format(as.Date(day), "%Y/%m/%d")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
ggplot(y.mon, aes(x = d, y=amount))+
geom_line(aes(group=1))
Right now, the x-axis is completely unreadable. Is there a way to declutter it, perhaps by slanting the dates or by showing a label only every 4 months? I could delete the x-axis entirely, but ideally I would like to keep it for reference.
At the end of the graph there is a huge downward spike. I think this is because the data is aggregated by month, and the last available day is 2020-01-01, so January 2020 contains a single measurement. Is it possible to filter the y.mon object so that the graph only runs until the last complete month? The spike is misleading: someone looking at the graph might think a big anomaly happened in January 2020, when in fact there is only one measurement for that month.
You can also try:
library(ggplot2)
library(xts)
library(scales)
set.seed(123)
#Data
day = seq(as.Date("2000/1/1"), as.Date("2020/1/1"),by="day")
amount <- rnorm(7306 ,100,10)
data <- data.frame(day, amount)
#Aggregate
y.mon<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data, FUN=sum)
#Count days
y.mon2<-aggregate(amount~format(as.Date(day),
format="%Y/%m"),data=data,
FUN=function(x) length(x))
names(y.mon2)[2]<-'N'
#Format and merge to add N
y.mon$d = y.mon$`format(as.Date(day), format = "%Y/%m")`
mmon <- merge(y.mon,y.mon2)
#Add a dummy date
mmon$d <- as.Date(paste0(mmon$d,'/01'),'%Y/%m/%d')
#Plot
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '4 month',date_labels = '%Y-%m',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
Update: using the previous code and changing only the breaks and labels (one per year):
#Plot Update
ggplot(subset(mmon,N!=1), aes(x = d, y=amount))+
geom_line(aes(group=1))+
scale_x_date(date_breaks = '12 month',date_labels = '%Y',
expand = c(0,0))+
theme(axis.text.x = element_text(angle = 90))
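Dropping months with N == 1 fixes this particular data set, because only the last month has a single day. A slightly more general sketch, assuming "complete" means every calendar day of the month is present, compares each month's day count with the month's length. It uses lubridate's floor_date() and days_in_month(), which are my substitution rather than part of the answer, and builds the monthly key as a Date directly instead of going through a "%Y/%m" string.

```r
library(ggplot2)
library(lubridate)

set.seed(123)
day  <- seq(as.Date("2000/1/1"), as.Date("2020/1/1"), by = "day")
data <- data.frame(day, amount = rnorm(length(day), 100, 10))

# First day of each month as a real Date key, so scale_x_date() can place the breaks
data$month <- floor_date(data$day, "month")

# Monthly sum and number of observed days (both aggregates are sorted by month)
mon   <- aggregate(amount ~ month, data = data, FUN = sum)
mon$N <- aggregate(amount ~ month, data = data, FUN = length)$amount

# Keep only months in which every calendar day was observed (leap years included)
mon_complete <- subset(mon, N == days_in_month(month))

p <- ggplot(mon_complete, aes(x = month, y = amount)) +
  geom_line() +
  scale_x_date(date_breaks = "12 months", date_labels = "%Y", expand = c(0, 0)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
```

The same filter also catches a partial first month or gaps inside the series, which N != 1 would not.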
I have a table with many variables. One of them contains the year, from 1999 to 2010.
I need to run the same analysis for each year, for instance plotting a graph, a histogram, etc.
Currently I subset the data so that each year goes into its own data frame (data.table), and I run the analysis for each year in turn. This is very inefficient:
library(data.table)
library(ggplot2)
dates <- sample(seq(as.Date('1999/01/01'), as.Date('2010/01/01'), by="day"), 50, replace = TRUE)
dt <- data.table(YEAR = format(dates, "%Y"),
  Var1 = sample(0:100, 50, replace = TRUE),
  Var2 = sample(0:500, 50, replace = TRUE)
)
year_1999 <- dt[YEAR=="1999"]
plot_1999 <- ggplot(year_1999, aes(x=Var1)) +
  geom_line(aes(y=Var2), size=1, color="blue") +
  labs(y="V2", x="V1", title="Year 1999")
plot_1999
How can I write this more compactly? I suppose I need a function, but I have no idea how to write one.
Instead of repeating the code for each year, you can facet by YEAR with facet_wrap():
library(ggplot2)
ggplot(dt, aes(x = Var1, y = Var2)) +
  # constants go outside aes(); inside aes() "blue" would be treated as a data value
  geom_line(linewidth = 1, color = "blue") +  # use size = 1 on ggplot2 < 3.4.0
  labs(y = "V2", x = "V1") +
  facet_wrap(~ YEAR)
Try this if you want a separate plot object for each unique year in dt$YEAR:
for (i in unique(dt$YEAR)) {
  dt_year <- dt[YEAR == i]
  p <- ggplot(dt_year, aes(x = Var1)) +
    geom_line(aes(y = Var2), linewidth = 1, color = "blue") +  # size = 1 on ggplot2 < 3.4.0
    labs(y = "V2", x = "V1", title = paste("Year", i))  # title follows the loop year
  assign(paste0("plot", i), p)  # creates plot1999, plot2000, ...
}
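Instead of creating plot1999, plot2000, and so on with assign(), the per-year plots can be kept in a named list, which is easier to loop over, print or save. A sketch, assuming a dt shaped like the one in the question (rebuilt here with a seed); plot_year() is a hypothetical helper name, and a histogram or any other per-year analysis would follow the same pattern. It uses data.table's split() with by = and sorted =.

```r
library(data.table)
library(ggplot2)

set.seed(1)
dates <- sample(seq(as.Date("1999/01/01"), as.Date("2010/01/01"), by = "day"), 50, replace = TRUE)
dt <- data.table(YEAR = format(dates, "%Y"),
                 Var1 = sample(0:100, 50, replace = TRUE),
                 Var2 = sample(0:500, 50, replace = TRUE))

# Hypothetical helper: one line plot for the rows of a single year
plot_year <- function(d, yr) {
  ggplot(d, aes(x = Var1, y = Var2)) +
    geom_line(linewidth = 1, color = "blue") +  # size = 1 on ggplot2 < 3.4.0
    labs(x = "V1", y = "V2", title = paste("Year", yr))
}

# split() returns one data.table per year, named by YEAR; Map() applies the helper to each
by_year <- split(dt, by = "YEAR", sorted = TRUE)
plots   <- Map(plot_year, by_year, names(by_year))

# print(plots[[1]]) shows the first year; lapply(plots, print) shows them all
```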
When faceting bar plots in ggplot2, the x-axis of every panel includes all factor levels, but not all levels may be present in each group. In addition, zero values may occur, so from the bar plot alone it is impossible to distinguish x-axis values with no data from those with a zero y-value. Consider the following example:
library(tidyverse)
set.seed(43)
site <- c("A","B","C","D","E") %>% sample(20, replace = TRUE) %>% sort()
year <- c("2010","2011","2012","2013","2014","2010","2011","2012","2013","2014","2010","2012","2013","2014","2010","2011","2012","2014","2012","2014")
isZero <- rbinom(n = 20, size = 1, prob = 0.40)
value <- ifelse(isZero == 1, 0, rnorm(20, 10, 3)) %>% round(0)
df <- data.frame(site,year,value)
ggplot(df, aes(x=year, y=value)) +
geom_bar(stat="identity") +
facet_wrap(~site)
This is fish census data: not all sites were fished in all years, and sometimes no fish were caught, hence the need to distinguish the two situations. For example, there was no catch at site C in 2010 and site C was not fished in 2011, but the reader cannot tell the difference. I would like to add something like "no data" to the plot for 2011. Maybe I could fill in the rows where data is missing, generate another column with the desired text and then add it via geom_text()?
So here is an example of your proposed method:
# Tabulate sites vs. year and find the zero entries (site/year pairs with no data)
tab <- table(df$site, df$year)
idx <- which(tab == 0, arr.ind = TRUE)
# Build a new data.frame (named missing_df to avoid masking base::missing())
missing_df <- data.frame(site = rownames(tab)[idx[, "row"]],
  year = colnames(tab)[idx[, "col"]],
  value = 1,
  label = "N.D.") # For 'no data'
ggplot(df, aes(year, value)) +
  geom_col() +
  geom_text(data = missing_df, aes(label = label)) +
  facet_wrap(~site)
Alternatively, you could also let the facets omit unused x-axis values:
ggplot(df, aes(x=year, y=value)) +
geom_bar(stat="identity") +
facet_wrap(~site, scales = "free_x")
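The question's own idea (fill in the missing rows, then label them with geom_text()) can also be sketched with tidyr::complete(), which adds a row with value = NA for every site/year combination that does not occur in the data. This is an alternative to the table() approach, not a change to it; the "N.D." text, the y = 0 anchor and the vjust nudge are arbitrary choices.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same example data as in the question
set.seed(43)
site   <- c("A","B","C","D","E") %>% sample(20, replace = TRUE) %>% sort()
year   <- c("2010","2011","2012","2013","2014","2010","2011","2012","2013","2014",
            "2010","2012","2013","2014","2010","2011","2012","2014","2012","2014")
isZero <- rbinom(n = 20, size = 1, prob = 0.40)
value  <- ifelse(isZero == 1, 0, rnorm(20, 10, 3)) %>% round(0)
df <- data.frame(site, year, value)

# complete() adds value = NA rows for unsampled site/year pairs; label only those rows
df_full <- df %>%
  complete(site, year) %>%
  mutate(label = ifelse(is.na(value), "N.D.", NA_character_))

p <- ggplot(df_full, aes(x = year, y = value)) +
  geom_col(na.rm = TRUE) +
  # "N.D." sits just above the baseline for unsampled years;
  # true zero catches remain zero-height bars without a label
  geom_text(aes(y = 0, label = label), vjust = -0.5, na.rm = TRUE) +
  facet_wrap(~site)
```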
I have a time series of monthly data for 10 years:
myts <- ts(rnorm(12*10), frequency = 12, start = 2001)
Now I'd like to plot the data with the x-axis restricted to a generic year, with ticks from Jan to Dec. The whole time series should therefore be broken into ten lines, each starting in Jan and ending in Dec, overplotted on each other so that I can visually compare the years. Is there a straightforward command to do that in R?
So far I have come up with the following solution using matplot(), which might not be the most sophisticated one:
mydf <- as.data.frame(matrix(myts, 12))
matplot(mydf,type="l")
Even better would be a way to calculate the average and the corresponding CI or standard deviation for each month, and then plot the Jan to Dec average as a line with the CI/standard deviation as a band around it.
Consider using ggplot2. It's useful to start by reshaping the ts object into a long data frame:
library(ggplot2)
library(ggfortify)
d <- fortify(myts)  # columns Index and Data
d$year <- format(d$Index, "%Y")
d$month <- format(d$Index, "%m")
Given this data frame, it's straightforward to create the plots you have in mind:
ggplot(d, aes(x = month, y = Data, group = year, colour = year)) +
geom_line()
ggplot(d, aes(x = month, y = Data, group = month)) +
stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96))
You can also summarise the data yourself, then plot it:
d_sum <- do.call(rbind, lapply(split(d$Data, d$month), mean_se, mult = 1.96))
d_sum$month <- rownames(d_sum)
ggplot(d_sum, aes(x = month, y = y, ymin = ymin, ymax = ymax)) +
  geom_errorbar() +
  geom_point() +
  geom_line(aes(group = 1))  # one line across the discrete months
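The average-with-band version asked about in the question can also be sketched without ggfortify. For a monthly ts, cycle() returns each observation's position within the year, which is all that is needed to group by calendar month. The band is an assumption: mean +/- 1.96 standard errors, a normal-approximation 95% interval; use sd() alone in place of the se column for a +/- 1 SD band instead. The names by_month and stats are illustrative.

```r
library(ggplot2)

set.seed(1)
myts <- ts(rnorm(12 * 10), frequency = 12, start = 2001)

# cycle() gives each observation's position within the year (1 = Jan, ..., 12 = Dec)
by_month <- split(as.numeric(myts), as.integer(cycle(myts)))

# Per-month mean and standard error, then a mean +/- 1.96 * se band
stats <- data.frame(
  month = as.integer(names(by_month)),
  mean  = vapply(by_month, mean, numeric(1)),
  se    = vapply(by_month, function(x) sd(x) / sqrt(length(x)), numeric(1))
)
stats$lower <- stats$mean - 1.96 * stats$se
stats$upper <- stats$mean + 1.96 * stats$se

p <- ggplot(stats, aes(x = month, y = mean)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey80") +
  geom_line() +
  scale_x_continuous(breaks = 1:12, labels = month.abb)
```

Keeping month numeric lets geom_ribbon() and geom_line() draw a continuous band, while the breaks and labels still show Jan to Dec.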
I want to plot two stacked histograms that share a common x-axis, with the second one plotted upside down (pointing downward) below the first. I found this post that shows how to plot stacked histograms (How to plot multiple stacked histograms together in R?). For simplicity, let's say I just want to plot that same histogram on the same x-axis, but facing in the negative y direction.
You could count the cases and then multiply the count by -1 for one category. Here is an example with data.table and ggplot2:
library(data.table)
library(ggplot2)
# fake data
set.seed(123)
dat <- data.table(value = factor(sample(1:5, 200, replace = TRUE)),
  category = sample(c('a', 'b'), 200, replace = TRUE))
# count by val/category; cat b as negative
plot_dat <-
dat[, .(N = .N * ifelse(category=='a', 1, -1)),
by=.(value, category)]
# plot
ggplot(plot_dat, aes(x=value, y=N, fill=category)) +
geom_bar(stat='identity', position='identity') +
theme_classic()
You can try something like this:
library(ggplot2)
ggplot() +
  stat_bin(data = diamonds, aes(x = depth)) +
  stat_bin(data = diamonds, aes(x = depth, y = -after_stat(count)))  # after_stat() replaces the deprecated ..count..
In response to a follow-up comment:
library(dplyr)
library(tidyr)
d1 <- diamonds %>%
  select(depth, table) %>%
  pivot_longer(c(depth, table), names_to = "grp", values_to = "val")
ggplot() +
  stat_bin(data = d1, aes(x = val, fill = grp)) +
  stat_bin(data = diamonds, aes(x = price, y = -after_stat(count)))
Visually this is a poor example, because the variables are on very different scales, but it shows the general idea.
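A sketch of the original request with two groups of the diamonds data, one histogram above the axis and one below, using geom_histogram() with after_stat(count) negated for the lower layer. Passing the same breaks vector to both layers fixes the bin edges deliberately instead of relying on the default 30 bins; the 0.5-unit width and the "Ideal" versus other cuts split are my own illustrative choices. scale_y_continuous(labels = abs) shows the downward counts as positive numbers.

```r
library(ggplot2)

# Shared bin edges (an assumption: 0.5-unit bins spanning the range of depth)
brks <- seq(floor(min(diamonds$depth)), ceiling(max(diamonds$depth)), by = 0.5)

p <- ggplot(diamonds, aes(x = depth)) +
  # Upward histogram: diamonds with an "Ideal" cut
  geom_histogram(data = subset(diamonds, cut == "Ideal"),
                 breaks = brks, fill = "steelblue") +
  # Downward histogram: all other cuts, with the counts negated via after_stat()
  geom_histogram(data = subset(diamonds, cut != "Ideal"),
                 aes(y = -after_stat(count)), breaks = brks, fill = "tomato") +
  geom_hline(yintercept = 0) +
  scale_y_continuous(labels = abs) +
  labs(y = "count (Ideal above, other cuts below)")
```

To stack several groups in either direction, map fill to a grouping column inside each layer instead of using a fixed colour.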