Facing an issue with ggplot - r

I have a very simple data frame, shown below.
cat_group total abort_rate cancel_rate success_rate
100 1804 18.8 45.1 31.8
200 4118 17.7 30.0 48.3
500 14041 19.2 16.9 60.0
I am trying to put this data on a plot such that cat_group is on the x-axis, with a line for each of the other variables: total, abort_rate, cancel_rate and success_rate. My idea is to show how each of these variables varies with the value of cat_group. I need four lines in total, one for each variable, each in a different colour.
But when I use the below plot function in R, I am seeing the error: geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
ggplot(my_data_frame, aes(category)) +
geom_line(aes(y = abort_rate, colour = "abort_rate")) +
geom_line(aes(y = success_rate, colour = "success_rate"))+
geom_line(aes(y = success_rate, colour = "total"))+
geom_line(aes(y = success_rate, colour = "cancel_rate"))
Any suggestions on how to resolve this issue?

Assuming that cat_group is of factor type (that's the only way I can reproduce your error) you could do it like this:
my_data_frame$cat_group <- as.factor(my_data_frame$cat_group)
library(ggplot2)
ggplot(my_data_frame, aes(cat_group)) +
geom_line(aes(y = abort_rate, colour = "abort_rate", group=1)) +
geom_line(aes(y = success_rate, colour = "success_rate", group=1))+
geom_line(aes(y = total, colour = "total", group=1))+
geom_line(aes(y = cancel_rate, colour = "cancel_rate", group=1))
i.e. by specifying one group per geom_line. Note that each layer has to map its own column to y; the code in the question maps success_rate in three of the four layers, so those three lines would be drawn on top of each other. Even with that fixed, the plot is hard to read on a linear axis: total is in the thousands while the rates are below 100, so the three rate lines end up squashed together at the bottom.
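If cat_group ended up as a factor by accident (for example on import), another option is to convert it back to a number, so that ggplot2 treats x as continuous and no longer puts each level in its own group. A small sketch of the conversion, using a made-up vector that mirrors the question's cat_group column; note that as.numeric() on a factor returns the internal level codes, so the conversion has to go through as.character() first:

```r
# Hypothetical factor column mirroring cat_group from the question
cat_group <- factor(c(100, 200, 500))

codes  <- as.numeric(cat_group)                # level codes 1 2 3, not the labels
values <- as.numeric(as.character(cat_group))  # the original numbers 100 200 500
```

With a numeric cat_group the x positions are spaced proportionally (100, 200, 500) rather than evenly, which may or may not be what you want.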
The typical way of working with such data is to melt the data.frame and then plot it like this:
library(reshape2)
dfm <- melt(my_data_frame, id.vars='cat_group')
ggplot(dfm, aes(x=cat_group, y=value, colour=variable, group=variable)) + geom_line() +
scale_y_log10()
Notice the scale_y_log10, which is what makes all 4 lines visible. You probably need a log scale, since otherwise total, which is much larger than the rates, stretches the axis so far that the other three lines collapse onto each other near zero.

One easy way to do this is to use autoplot.zoo:
library(ggplot2)
library(zoo)
z <- read.zoo(my_df)
autoplot(z, facet = NULL) + scale_y_log10()
or for separate panels without a log scale:
autoplot(z) + facet_free()
Note: Here is the input data in reproducible form:
Lines <- "cat_group total abort_rate cancel_rate success_rate
100 1804 18.8 45.1 31.8
200 4118 17.7 30.0 48.3
500 14041 19.2 16.9 60.0"
my_df <- read.table(text = Lines, header = TRUE)

The best way to solve this is to reshape your data so that you have one column for the x axis, one for the y axis, and one saying which type of data the row contains. To do this you can use the tidyr package.
library(tidyr)
library(ggplot2)
plottingData <- my_data_frame %>% gather(type,value,-cat_group)
ggplot(plottingData,aes(x=cat_group,y=value,color=type)) + geom_line()
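In current tidyr (1.0.0 and later) gather() is superseded by pivot_longer(). A sketch of the same reshaping with the newer function, assuming the data from the question and a numeric cat_group; the names my_df, long_df and p are only illustrative:

```r
library(tidyr)
library(ggplot2)

# Input data from the question in reproducible form
my_df <- read.table(text = "cat_group total abort_rate cancel_rate success_rate
100 1804 18.8 45.1 31.8
200 4118 17.7 30.0 48.3
500 14041 19.2 16.9 60.0", header = TRUE)

# One row per (cat_group, type) pair: 3 groups x 4 variables = 12 rows
long_df <- pivot_longer(my_df, cols = -cat_group,
                        names_to = "type", values_to = "value")

# One line per variable; the log scale keeps the small rates visible next to total
p <- ggplot(long_df, aes(x = cat_group, y = value, colour = type)) +
  geom_line() +
  scale_y_log10()
```

Printing p draws the plot. Dropping scale_y_log10() gives the linear version, where total dominates the axis.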

Related

Control time-series axis label and color in R stacked bar

I'm trying to plot a stacked bar chart. It works, but there are still some functions I don't understand, e.g. what does xts do? Am I using all the libraries I've loaded? Replacing the axis labels works with the original data, but not with the melted data (the data was melted to produce the stacked bar, because I couldn't find any other way to produce a stacked bar from a data.frame). I also want to use a monochrome colour scheme for this stacked bar. I tried replacing 'fill = variable' with 'fill = c("orange", "blue", "green")' just as a test, but it doesn't work. Kindly help. Thank you.
library(ggplot2)
library(xts)
library(reshape2)
library(lubridate)
library(zoo)
setwd("C:/Users/Hp/Documents/yr")
data1 <- read.csv("yr1983.csv", head = TRUE, stringsAsFactors = FALSE)
data1$Date <- dmy(data1$Date)
#data1 <- xts(x = data1[,-1], order.by = data1[,1])
head(data1)
Date Inland middle coastal
1 1983-11-01 0.0 0.0 0.0
2 1983-11-02 0.0 0.0 0.0
3 1983-11-03 90.5 19.5 60.0
4 1983-11-04 88.5 28.5 53.8
5 1983-11-05 80.5 73.0 122.0
6 1983-11-06 179.5 102.0 141.3
#plot stacked bar
data.m <- melt(data1,id.vars = "Date")
p1 <- ggplot(data.m, aes(x = Date, y = value,fill=variable)) +
geom_bar(stat='identity')
p1
#try to rename the axis - error
Rainfall_Intensity <- data1$value
month <- data1$Date
ggplot(data.m, aes(x = month, y = Rainfall_Intensity,fill= variable)) +
geom_bar(stat='identity')
*Error: Aesthetics must be either length 1 or the same as the data (276): x, y, fill
ggplot(data1, aes(month, y = Rainfall_Intensity,fill= variable)) + geom_bar(stat='identity')
*Error in eval(expr, envir, enclos) : object 'Date' not found
Look at these two lines:
Rainfall_Intensity <- data1$value
month <- data1$Date
The variables Rainfall_Intensity and month are not columns of data.m (and data1 has no value column at all, so Rainfall_Intensity is NULL). That is why ggplot produces the errors shown above. You must rename the columns of data.m instead, for example with rename from dplyr:
library(dplyr)
data.m <- rename(data.m, Rainfall_Intensity = value, month = Date)
After this, run your ggplot2 code.
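fill = variable inside aes maps the column variable to the fill colour, which is what splits each bar into stacked segments. To change the colours of those segments, keep that mapping and set the palette with scale_fill_manual. Passing a three-colour vector as fill directly to geom_bar does not work, because a fixed aesthetic must have length 1 or one value per row.
ggplot(data.m, aes(x = Date, y = value, fill = variable)) +
geom_bar(stat = 'identity') +
scale_fill_manual(values = c("orange", "blue", "green"))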
You can refer to - http://sape.inf.usi.ch/quick-reference/ggplot2/colour - for choosing colours.
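For the monochrome look the question asks for, one option is ggplot2's scale_fill_grey(), which assigns shades of grey to the levels of the fill variable. A minimal sketch, assuming only the six rows shown by head(data1) in the question (the real file has more rows); the object name p_grey is illustrative:

```r
library(ggplot2)
library(reshape2)

# The six rows printed in the question, rebuilt by hand
data1 <- data.frame(
  Date    = as.Date("1983-11-01") + 0:5,
  Inland  = c(0, 0, 90.5, 88.5, 80.5, 179.5),
  middle  = c(0, 0, 19.5, 28.5, 73.0, 102.0),
  coastal = c(0, 0, 60.0, 53.8, 122.0, 141.3)
)

# Long format: one row per date and location
data.m <- melt(data1, id.vars = "Date")

# Stacked bars filled with greys instead of the default hues
p_grey <- ggplot(data.m, aes(x = Date, y = value, fill = variable)) +
  geom_col() +
  scale_fill_grey(start = 0.2, end = 0.8)
```

start and end run from 0 (black) to 1 (white), so narrowing the range keeps the segments distinguishable without going to pure black or white.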
ggplot2 operates on entire data frames, so it expects that whatever names you use to map to aesthetics in aes are the bare column names from the data frame supplied to either the data param of the initial ggplot call, or a data param for a specific geom. Therefore, if you have a global variable called date and you call ggplot(data, aes(x = date, y = value)), it will be looking for a column in data called date, and will throw an error if one isn't found.
If you need to rename columns in your data frame, you can do that lots of different ways, such as names(data.m) <- c(...) or setNames(data.m, c(...)).
But if all you need to do is change the axis labels, you can do that as part of building the plot. Either assign labels using labs, or assign a single label within the corresponding scale function.
Changing several labels at once with labs (I just guessed based on the data sample):
library(tidyverse)
...
ggplot(data.m, aes(x = Date, y = value, fill = variable)) +
geom_col() +
labs(x = "Month", y = "Rainfall intensity", fill = "Location",
title = "Rainfall intensity by location",
subtitle = "November 1983")
Changing just the x-axis label within a call to scale_x_date:
ggplot(data.m, aes(x = Date, y = value, fill = variable)) +
geom_col() +
scale_x_date(name = "Month")
Created on 2018-06-29 by the reprex package (v0.2.0).

ggplot: adding new lines from a subset of years

I have about 90 years of daily data and I want to plot the long term mean, plus the individual lines for each year of my survey period (2014-2018). The data looks like this:
> head(dischg)
date ddmm year cfs daymo
1 1-Jan-27 01-Jan 1927 715 2018-01-01
2 2-Jan-27 02-Jan 1927 697 2018-01-02
3 3-Jan-27 03-Jan 1927 715 2018-01-03
4 4-Jan-27 04-Jan 1927 796 2018-01-04
5 5-Jan-27 05-Jan 1927 825 2018-01-05
6 6-Jan-27 06-Jan 1927 865 2018-01-06
I have been able to plot the long term mean easily enough:
p1 <- ggplot(dischg, aes(x=daymo, y=cfs)) +
stat_summary(fun.data = "mean_cl_boot", geom = "smooth", colour = "blue")
... but I need some help plotting the subset of years. I tried using "subset"
p2 <- p1 +
ggplot (subset(dischg, year %in% c(2014:2018)), aes(x=daymo, y=cfs, linetype=year)) +
geom_line() +
scale_colour_brewer(palette="Set1")
but I received this error:
Error: Don't know how to add o to a plot
Would it be smarter to just add one year at a time? That seems a bit cumbersome when there are five years of data to plot.
Thank you for providing sample data. Unfortunately, I cannot get the ggplot code to run with the sample you provided, so I will use a built-in R dataset. The concepts are the same, though.
The issue is that you are trying to add ggplot to an object that is already of class ggplot. Once you have initialized your object as a ggplot object, you don't need to call ggplot each time you want to add a layer. For example, I get the same error you do if I try:
p1 <- ggplot(mtcars, aes(x=hp,y=cyl)) + geom_point()
p2 <- p1 + ggplot(mtcars[mtcars$am == 1, ], aes(x = hp, y = cyl)) + geom_line()
As mentioned in my comment, if you want to add another layer with separate data (in your case the geom_line) you can do this by putting the data directly into the geom_ call. In your case you would do something like:
p1 <- ggplot(mtcars, aes(x=hp,y=cyl)) + geom_point()
p2 <- p1 + geom_line(data = mtcars[mtcars$am == 1, ])
p2
With thanks to feedback from @MikeH., I figured it out:
p1 <- ggplot(dischg, aes(x=daymo, y=cfs)) +
stat_summary(fun.data = "mean_cl_boot", geom = "smooth", colour = "blue") +
geom_line(data=subset(dischg, year %in% c(2014:2018)),
aes(colour=year)) +
scale_colour_brewer(palette="Set1")
(Also, I had to make sure the 'year' was a factor rather than a continuous variable.)
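The factor conversion matters because scale_colour_brewer is a discrete scale, so mapping a numeric year to colour fails with a "Continuous value supplied to discrete scale" error. A self-contained sketch of the pattern, using a made-up stand-in for dischg (toy, survey and the random cfs values are all hypothetical) and a plain mean line instead of mean_cl_boot so that Hmisc is not needed; the fun argument assumes ggplot2 3.3.0 or later:

```r
library(ggplot2)

# Hypothetical stand-in for dischg: three years, ten days each
set.seed(1)
toy <- data.frame(
  daymo = rep(as.Date("2018-01-01") + 0:9, times = 3),
  year  = rep(c(2013, 2014, 2015), each = 10),
  cfs   = runif(30, 600, 900)
)

# Keep only the survey years and make year discrete for the colour scale
survey <- subset(toy, year %in% 2014:2015)
survey$year <- factor(survey$year)

# Long-term mean from all years, plus one coloured line per survey year
p <- ggplot(toy, aes(x = daymo, y = cfs)) +
  stat_summary(fun = mean, geom = "line", colour = "blue") +
  geom_line(data = survey, aes(colour = year)) +
  scale_colour_brewer(palette = "Set1")
```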

Add a line linking through top points of each bar to show the trend in R

I have already plotted a bar graph and now I'd like to add a curve going through the top point of each bar, so that the trend of change can be shown more clearly.
The data frame is in a format like:
v1 v2
a 10
b 6
c 7
...
Here is the code I plot the bar:
ggplot(date_count, aes(V1,V2)) + geom_bar(stat = "identity")+ theme(axis.text.x = element_text(angle=45, hjust = 1,vjust = 1)) +xlab("date") + ylab("Number of activity")
I have tried +geom_line() and geom_smooth() but both failed. Do you have any idea? Thanks in advance.
It is assumed you mean tops of bars rather than bottoms since the bottoms are all zero. We make the X axis continuous rather than discrete and in order to be able to see the added lines we make the bars white.
# input data in reproducible form
Lines <- "V1 V2
a 10
b 6
c 7"
date_count <- read.table(text = Lines, header = TRUE)
library(ggplot2)
n <- nrow(date_count)
ggplot(date_count, aes(x = 1:n, y = V2)) +
geom_bar(stat = "identity", fill = "white") +
theme(axis.text.x = element_text(angle=45, hjust = 1, vjust = 1)) +
xlab("date") +
ylab("Number of activity") +
scale_x_continuous(breaks = 1:n, labels = date_count$V1) +
geom_line() +
geom_smooth(lty = 2)
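If the x axis should stay discrete, a plain + geom_line() draws nothing useful because each category becomes its own group with a single point. Putting all rows into one group with group = 1 lets the line connect the bar tops. A minimal sketch under that assumption, reusing the three sample rows; the grey fill is just a choice to keep the line visible:

```r
library(ggplot2)

date_count <- data.frame(V1 = c("a", "b", "c"), V2 = c(10, 6, 7))

p <- ggplot(date_count, aes(V1, V2)) +
  geom_col(fill = "grey80") +   # same as geom_bar(stat = "identity")
  geom_line(aes(group = 1)) +   # one group, so the points are joined across categories
  geom_point()
```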
I'm a little confused by your "bottom point". I'm assuming that you mean the minimal point of each group.
It would be easier to reproduce with a larger sample of data. Hence, I'm using mtcars.
I interpret the "bottom" as the minimal points, which are:
aggregate(mpg ~ cyl , mtcars, function(x)min(x))
cyl mpg
1 4 21.4
2 6 17.8
3 8 10.4
You can generate the plot in the following way:
data(mtcars)
ggplot(mtcars, aes(x=cyl,y=mpg))+
geom_bar(stat="identity")+
stat_summary(fun.y=min ,geom="line",color="red")+
stat_summary(fun.y=sum ,geom="line",color="blue")
The red line is plotted using stat_summary at the minimum value of each group - as you wrote bottom. The blue line is the top (sum) of each group.

Error when adding errorbars to ggplot

Dear Stackoverflow users,
I would like to draw a grouped barplot with three independent variables with error bars. I based my graph on an example on Stack Overflow (stacked bars within grouped bars), using ggplot with geom_bar. When I add geom_errorbar following the examples in the help pages, I get the following error:
Error in if (empty(data)) { : missing value where TRUE/FALSE needed
This is the script I use:
treatment<-rep(c(rep(c(1),8),rep(c(2),8)),2)
origin<-rep(c("A","B"),16)
time<-c(rep(c(5),16),rep(c(10),16))
sulfide<-c(0,10,5,8,9,6,16,18,20,25,50,46,17,58,39,43,20,25,50,46,17,58,39,43,100,120,103,104,150,160,200,180)
Reed<-data.frame(treatment,origin,time,sulfide)
# specify factor types
Reed$treatment<-as.factor(Reed$treatment)
Reed$origin<-as.character(Reed$origin)
Reed$time<-as.factor(Reed$time)
library(ggplot2)
library(scales)
#draw plot
ggplot() +geom_bar(data=Reed, aes(y = sulfide, x = treatment, fill=origin), stat="identity",position="dodge") +theme_bw() + facet_grid( ~ time)+xlab("treatment") +ylab("Sulfide")+ggtitle("Time")
This is how I added error bars:
ErrorBars <- function(x, y, upper, lower = upper, length = 0.03, ...) {
if (length(x) != length(y) | length(y) != length(lower) | length(lower) != length(upper))
stop("vectors must be same length")
arrows(x, y + upper, x, y - lower, angle = 90, code = 3, length = length, ...)
} # function for errorbars
SE<- function(x) sqrt(var(x,na.rm=TRUE)/length(na.omit(x))) #function for SE
Reed$trt<- paste(Reed$treatment,Reed$origin,sep="")#combine treatment and origin to a column
mean_Reed<-data.frame(tapply(Reed$sulfide,list(Reed$trt,Reed$time),mean,na.rm=TRUE)) #mean
SE_Reed<-data.frame(tapply(Reed$sulfide,list(Reed$trt, Reed$time),SE)) # SE
limits <- aes(ymax = mean_Reed + SE_Reed, ymin=mean_Reed - SE_Reed)# Define the top and bottom of the errorbars
#plot with error bars:
ggplot() +geom_bar(data=Reed, aes(y = sulfide, x = treatment, fill=origin), stat="identity",position="dodge") +theme_bw() + facet_grid( ~ time)+xlab("treatment") +ylab("Sulfide")+ggtitle("Time")+ geom_errorbar(limits, width=.2,position="dodge")
I really can't find what I'm doing wrong.
I hope you can help me:)
Leaving aside the issue of error bars for the moment, there's a much more serious problem with your plot. You have 2 values each of treatment, time, and origin, for a total of 8 combinations, but 32 values of sulfide - so there are 4 values of sulfide for each combination. When you plot this using, e.g.,
ggplot(data=Reed) +
geom_bar(aes(y = sulfide, x = treatment, fill=origin), stat="identity",position="dodge") +
facet_grid( ~ time)+xlab("treatment") +ylab("Sulfide")
you are plotting bars for all four sulfide values on top of each other all in the same color. This has the effect of displaying only the maximum value. It's a little hard to believe this is what you intended, and even if you did there's a better way to do that. For instance, if you want to plot the mean value of sulfide for each combination of factors, you can do it this way.
ggp <- ggplot(data=Reed, aes(y = sulfide, x = as.factor(treatment), group=origin)) +
geom_bar(aes(fill=origin), stat="summary", fun.y=mean, position="dodge") +
theme_bw() +
facet_grid( ~ time)+xlab("treatment") +ylab("Sulfide")+ggtitle("Time")
ggp
This uses stat="summary" to automatically summarize the result using the aggregating function mean (fun.y=mean).
A similar approach can be used to add the error bars very simply:
se <- function(y) sd(y)/sqrt(length(y)) # to calculate standard error in the mean
ggp+stat_summary(geom="errorbar",position=position_dodge(width=0.85),
fun.data=function(y)c(ymin=mean(y)-se(y),ymax=mean(y)+se(y)), width=0.1)
Notice that there is no need to aggregate the data externally - ggplot does it for you.
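As a quick check of the standard-error formula, the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The sketch below applies it to the four sulfide values that the question's script assigns to treatment 1, time 5, origin A (0, 5, 9, 16); the result agrees with the 7.50 and 3.378856 reported for that group in the dplyr summary elsewhere in this thread. The names se_check and grp are only illustrative:

```r
# Standard error of the mean: sd / sqrt(n)
se_check <- function(y) sd(y) / sqrt(length(y))

# Sulfide values for treatment 1, time 5, origin A in the question's data
grp <- c(0, 5, 9, 16)

grp_mean <- mean(grp)      # 7.5
grp_se   <- se_check(grp)  # about 3.378856
```

Dividing by length(y) instead of sqrt(length(y)) would give about 1.69 here, half the correct value, and the gap grows with larger groups.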
Finally, this approach lends itself to the use of many built-in functions for generating confidence limits with more statistical rigor.
ggp+stat_summary(fun.data=mean_cl_normal, conf.int=0.95,
geom="errorbar",position=position_dodge(width=0.85), width=0.1)
So here we use the ggplot built-in function mean_cl_normal to calculate 95% confidence limits on the mean assuming the data follows a normal distribution (and that, hence, the means will follow a t-distribution). We use the argument conf.int=... to specify the desired confidence interval, but the default is 0.95 so it really wasn't necessary in this example.
There are several other functions of this type: see the documentation and links therein for an explanation.
If you want to build your error bars by making a summary dataset, you just need to get that dataset in the correct format. There are lots of options for this; I will use dplyr. Notice I keep all the grouping variables from the plot in this dataset in a "tidy" format, with each variable in a separate column.
library(dplyr)
meandat = Reed %>%
group_by(treatment, time, origin) %>%
summarise(mean = mean(sulfide, na.rm = TRUE), se = SE(sulfide))
Source: local data frame [8 x 5]
Groups: treatment, time [?]
treatment time origin mean se
(fctr) (fctr) (chr) (dbl) (dbl)
1 1 5 A 7.50 3.378856
2 1 5 B 10.50 2.629956
3 1 10 A 31.50 7.858117
4 1 10 B 43.00 6.819091
5 2 5 A 31.50 7.858117
6 2 5 B 43.00 6.819091
7 2 10 A 138.25 23.552689
8 2 10 B 141.00 17.540429
Now error bars can be added via geom_errorbar. You'll see I set the aesthetics globally within ggplot to save myself having to re-type some of these, but you can change this as you want. I use position_dodge to get the error bars placed correctly over each bar.
ggplot(data = Reed, aes(y = sulfide, x = treatment, fill=origin)) +
geom_bar(stat="identity", position="dodge") +
theme_bw() +
facet_grid( ~ time)+
xlab("treatment") +
ylab("Sulfide")+
ggtitle("Time")+
geom_errorbar(data = meandat, aes(ymin = mean - se, ymax = mean + se, y = mean),
position = position_dodge(width = .9))
You can actually do all of this via stat_summary, rather than calculating the summary statistics "by hand". An example is here. The code would look like so, and gives the same plot as above.
ggplot(data = Reed, aes(y = sulfide, x = treatment, fill=origin)) +
geom_bar(stat="identity",position="dodge") +
theme_bw() +
facet_grid( ~ time) +
xlab("treatment") +
ylab("Sulfide") +
ggtitle("Time") +
stat_summary(geom = "errorbar", fun.data = mean_cl_normal, mult = 1,
position = position_dodge(width = .9))
I've been using the development version of ggplot2, ggplot2_1.0.1.9003, and found that I needed to add stat_summary function arguments via fun.args. This would look like fun.args = list(mult = 1) to get error bars of 1 standard error.
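A small sketch of the fun.args form, kept independent of Hmisc by using ggplot2's own mean_se (which also takes a mult argument) on the built-in mtcars data rather than Reed. The fun argument assumes ggplot2 3.3.0 or later; on older versions the equivalent argument is fun.y:

```r
library(ggplot2)

# Bars at the group mean, error bars at mean +/- 1 standard error
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  stat_summary(geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_se,
               fun.args = list(mult = 1), width = 0.2)
```

Changing mult to 2 widens the bars to two standard errors; everything in fun.args is passed on to the summary function.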

ggplot: Add a theoretical range on a geom_violin()

Let's say we observed two species of beetles. We want to compare their size using geom_violin() as done below:
df = data.frame(species=rep(c('species_a','species_b'),3), size=c(1,1.5,1.2,1.8,1.1,1.9))
ggplot(df, aes(x=species, y=size)) + geom_violin()
Knowing that the expected size range is [0.8,1.8] for species_a and [1.2, 1.8] for species_b...
ranges = list(species_a=c(0.8,1.8), species_b=c(1.2,1.8))
How can we easily add this range (with a grey shape for example) on the graph?
Put the ranges in a separate data frame with the species names and the minimal/maximal values:
ranges = data.frame(species=c('species_a','species_b'),
rmin=c(0.8,1.2),rmax=c(1.8,1.8))
ranges
species rmin rmax
1 species_a 0.8 1.8
2 species_b 1.2 1.8
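Then use the new data frame in geom_rect() to draw an area that is placed under the geom_violin(). The geom_blank() is used to set up the x axis according to the original data frame. Wrapping species in factor() keeps as.numeric() working when the column is stored as character rather than factor (the default for data.frame() since R 4.0.0).
ggplot(df, aes(x=species, y=size)) + geom_blank() +
geom_rect(data=ranges,aes(xmin=as.numeric(factor(species))-0.45,
xmax=as.numeric(factor(species))+0.45,
ymin=rmin,ymax=rmax),inherit.aes=FALSE)+
geom_violin()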
You may try this:
# first, create data frame from list 'ranges'
df2 <- setNames(object = do.call(rbind.data.frame, ranges), nm = c("min_size", "max_size"))
df2$species <- rownames(df2)
# plot violins with 'df', and ranges with 'df2'.
# Set colour and size according to your own "data-ink ratio" preferences.
ggplot(data = df, aes(x = species)) +
geom_violin(aes(y = size)) +
geom_linerange(data = df2, aes(ymax = max_size, ymin = min_size), colour = "grey", size = 3)

Resources