Boxplot with geom_boxplot for many years - r

I want to make a nice boxplot with ggplot2.
NB: the airquality dataset (from R's built-in datasets package, not ggplot2) illustrates what I want to do, but my own data has an additional column for year (1900:2000).
I made a simple boxplot with these commands:
data <- airquality  # or my own data
tapply(data$Temp, substr(data$Month, 1,3),na.rm=TRUE, summary)
boxplot(Temp~Month, data=data, na.action = NULL, main="1900-2000")
This gives the following graphic:
But when I try ggplot2 with this command:
ggplot(data, aes(Month, Temp),facet= Month~.) + geom_boxplot()
I get this graphic:
In a single plot, I want to see a separate boxplot of the values for each month, as in the first graphic.

Because Month is a numeric (continuous) variable, you need to convert it to a factor to get separate boxplots:
ggplot(airquality, aes(factor(Month), Temp)) + geom_boxplot()
Alternatively, you can use the group aesthetic:
ggplot(airquality, aes(Month, Temp, group = Month)) + geom_boxplot()
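To check that both variants really produce one box per month, a minimal sketch can inspect the statistics ggplot2 computes, using `layer_data()` from ggplot2. The object names `p`, `p_group` and `box_stats` are only illustrative.

```r
library(ggplot2)

# In airquality, Month is stored as an integer (5 to 9), hence factor()
p <- ggplot(airquality, aes(factor(Month), Temp)) + geom_boxplot()

# layer_data() returns the statistics ggplot2 computed: one row per box
box_stats <- layer_data(p)
box_stats[, c("x", "lower", "middle", "upper")]

# The group-aesthetic version also yields five boxes
p_group <- ggplot(airquality, aes(Month, Temp, group = Month)) + geom_boxplot()
nrow(layer_data(p_group))
```

The `middle` column should match the monthly medians you would get from `tapply(airquality$Temp, airquality$Month, median)`.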

Related

plot multiple variables by group [r]

I want to make several plots, each showing the observations of one variable for different years as a function of distance.
A short example of my df:
year <- c("2018","2018","2018","2018","2019","2019","2019","2019")
polutatnt <- c("NO2","NO2","SO2","SO2","NO2","NO2","SO2","SO2")
radius <- c("500m", "1000m","500m", "1000m","500m", "1000m","500m", "1000m")
value <- c(0.5,0.8,0.1,-0.2,0.3,-0.6,0.2,-0.2)
df <- data.frame(year,polutatnt,radius,value)
I would like one plot per pollutant, with one line per year as a function of distance. I tried this code, but I get a warning and empty plots:
ggplot(df, aes(radius, value, col = year)) +
geom_line() + facet_grid(polutatnt ~.)
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
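Based on the requirements you describe, this is what you want. Because radius is a character variable, the x-axis is discrete and ggplot2 treats each point as its own group, so the lines need an explicit group = year:
[EDIT] Now all of the blue points and all of the red points are linked: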
ggplot(df, aes(radius, value, color = year, group=polutatnt, shape=year)) +
geom_point(size=3) + geom_line(aes(group = year)) + facet_grid(polutatnt ~.)
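If distance should be a true numeric axis (so 500 m is plotted before 1000 m and spaced proportionally), one option is to strip the unit and convert radius to a number. This is a sketch, not part of the original answer; the `distance_m` column name is a hypothetical choice.

```r
library(ggplot2)

year <- c("2018","2018","2018","2018","2019","2019","2019","2019")
polutatnt <- c("NO2","NO2","SO2","SO2","NO2","NO2","SO2","SO2")
radius <- c("500m", "1000m","500m", "1000m","500m", "1000m","500m", "1000m")
value <- c(0.5,0.8,0.1,-0.2,0.3,-0.6,0.2,-0.2)
df <- data.frame(year, polutatnt, radius, value)

# Hypothetical numeric column: drop the trailing "m" so 500 < 1000 numerically
df$distance_m <- as.numeric(sub("m$", "", df$radius))

p <- ggplot(df, aes(distance_m, value, colour = year, group = year)) +
  geom_point(size = 3) +
  geom_line() +
  facet_grid(polutatnt ~ .) +
  labs(x = "Distance (m)")

# Layer 2 (geom_line): each year's line in each panel has two points
line_data <- layer_data(p, 2)
```

With the character radius, the discrete axis sorts "1000m" before "500m" alphabetically; the numeric version avoids that.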

Overlay density plot to each existing facet wrapped density plot in ggplot2?

I have a data frame with ~37,000 rows that contains a 'name' column (character) and a 'UTCDateTime' column (POSIXct). I use it to produce a facet-wrapped density plot of time, with one facet per name.
I also have a separate density plot of POSIXct datetime data from an entirely different data frame.
I want to overlay this second density plot on each individual facet of the first density plot. Is there a way to do that? More generally, if I have a facet-wrapped plot of any kind and another plot of the same type but with different data, how do I overlay the second plot on each facet?
In principle, this is as simple as not having the column you're faceting by in the second data frame: a layer whose data lacks the faceting variable is repeated in every panel. Example below:
library(ggplot2)
ggplot(iris, aes(Sepal.Width)) +
geom_density(aes(fill = Species)) +
geom_density(data = faithful,
aes(x = eruptions)) +
facet_wrap(~ Species)
Created on 2020-08-12 by the reprex package (v0.3.0)
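Applied to the setup in the question, a minimal sketch could look like this. The `events` and `reference` data frames are made-up stand-ins for the asker's two data frames (randomly generated times within one day), and the names "alpha", "beta", "gamma" are hypothetical.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-ins for the two data frames in the question
events <- data.frame(
  name = rep(c("alpha", "beta", "gamma"), each = 100),
  UTCDateTime = as.POSIXct("2020-01-01", tz = "UTC") + runif(300, 0, 86400)
)
reference <- data.frame(
  UTCDateTime = as.POSIXct("2020-01-01", tz = "UTC") + rnorm(200, 43200, 10000)
)

p <- ggplot(events, aes(UTCDateTime)) +
  geom_density(aes(fill = name), alpha = 0.5) +
  # reference has no 'name' column, so this layer is drawn in every facet
  geom_density(data = reference, colour = "black") +
  facet_wrap(~ name)

# Layer 2 is the overlay; it should appear once in each of the 3 panels
overlay <- layer_data(p, 2)
table(overlay$PANEL)
```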
EDIT: To put the densities of the two data sets on the same scale, you can map y to the computed variable scaled using after_stat()*:
ggplot(iris, aes(Sepal.Width)) +
geom_density(aes(y = after_stat(scaled),
fill = Species)) +
geom_density(data = faithful,
aes(x = eruptions,
y = after_stat(scaled))) +
facet_wrap(~ Species)
* Before ggplot2 v3.3.0, use stat(variable) or ..variable.. instead.
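To confirm what `after_stat(scaled)` does here, a short sketch can inspect the computed data of the overlay layer with `layer_data()`. It assumes ggplot2 >= 3.3.0; `overlay` and `peak_per_panel` are illustrative names.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Width)) +
  geom_density(aes(y = after_stat(scaled), fill = Species)) +
  geom_density(data = faithful, aes(x = eruptions, y = after_stat(scaled))) +
  facet_wrap(~ Species)

overlay <- layer_data(p, 2)
# faithful has no Species column, so its density is repeated in all 3 panels
table(overlay$PANEL)
# 'scaled' is the density divided by its maximum, so each curve peaks at 1
peak_per_panel <- tapply(overlay$y, overlay$PANEL, max)
peak_per_panel
```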

Arranging data for two facet R line plot

I am trying to make a two-facet line plot like this example. My problem is arranging the data so that the desired variables appear on the x-axis. Here is the small data set I want to use:
Study,Cat,Dim1,Dim2,Dim3,Dim4
Study1,PK,-3.00,0.99,-0.86,0.46
Study1,US,-4.67,0.76,1.01,0.45
Study2,FL,-2.856,4.15,1.554,0.765
Study2,FL,-8.668,5.907,3.795,4.754
I tried the following code to draw a line graph from this data frame:
plot1 <- ggplot(data = dims, aes(x = Cat, y = Dim1, group = Study)) +
geom_line() +
geom_point() +
facet_wrap(~Study)
With this layout, I can only use one value column to draw lines. I want Dim1, Dim2, Dim3 and Dim4 on the x-axis, which this arrangement of the data does not allow. (I tried c(Dim1, Dim2, Dim3, Dim4) with no luck.)
The solution is probably to transpose the table, but then I cannot reproduce the grouping for the facets (Study in the table above) and the colour (Cat in the table above). Any ideas how to solve this?
You can try this:
library(tidyr)
library(dplyr)
library(ggplot2)
gather(dims, variable, value, -Study, -Cat) %>%
ggplot(aes(x=variable, y=value, group=Cat, col=Cat)) +
geom_point() + geom_line() + facet_wrap(~Study)
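In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. A sketch of the same reshaping with it follows; the data is the asker's table read from inline text, and `long` is an illustrative name.

```r
library(tidyr)
library(ggplot2)

# The small data set from the question, read from inline CSV text
dims <- read.csv(text = "Study,Cat,Dim1,Dim2,Dim3,Dim4
Study1,PK,-3.00,0.99,-0.86,0.46
Study1,US,-4.67,0.76,1.01,0.45
Study2,FL,-2.856,4.15,1.554,0.765
Study2,FL,-8.668,5.907,3.795,4.754")

# Every Dim* column becomes a (Dim, Value) pair; Study and Cat are kept
long <- pivot_longer(dims, cols = starts_with("Dim"),
                     names_to = "Dim", values_to = "Value")

p <- ggplot(long, aes(x = Dim, y = Value, colour = Cat, group = Cat)) +
  geom_line() + geom_point() + facet_wrap(~Study)
```

Note that both Study2 rows have Cat = FL, so group = Cat joins their eight points into a single line. If they are separate series, you would need an extra identifier column (not present in the original data) to use as the group.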
The solution was quite easy. After some thought, I rearranged the data into long format, which looks like this:
Study,Cat,Dim,Value
Study1,PK,Dim1,-3
Study1,PK,Dim2,0.99
Study1,PK,Dim3,-0.86
Study1,PK,Dim4,0.46
Study1,US,Dim1,-4.67
Study1,US,Dim2,0.76
Study1,US,Dim3,1.01
Study1,US,Dim4,0.45
Study2,FL,Dim1,-2.856
Study2,FL,Dim2,4.15
Study2,FL,Dim3,1.554
Study2,FL,Dim4,0.765
Study2,FL,Dim1,-8.668
Study2,FL,Dim2,5.907
Study2,FL,Dim3,3.795
Study2,FL,Dim4,4.754
With that layout, this code produced the desired result:
plot1 <- ggplot(data=dims, aes(x=Dim, y=Value, colour=Cat, group=Cat)) + geom_line()+ geom_point() + facet_wrap(~Study)

Plotting means as a line plot onto a scatter plot with ggplot

I have this simple data frame holding three replicates (value) for each level of the factor CT. I would like to plot the replicates with geom_point and then the mean of each level with geom_line.
gene <- c("Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5")
value <- c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071, 0.87170, 0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016, 0.86656)
CT <- c("ET","ET","ET", "HP","HP","HP","HT","HT","HT", "LT","LT","LT","P","P","P")
# cbind() would turn every column into character; build the data frame directly
df <- data.frame(gene, value, CT = factor(CT))
I can make the scatter plot:
ggplot(df, aes(x=CT, y=value)) + geom_point()
How do I get a geom_line representing the mean of each factor level? I have tried stat_summary:
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(y = value,group = CT), fun.y=mean, colour="red", geom="line")
But it does not work.
"geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?"
But each group has three observations. What is wrong?
P.S. I am also interested in a smooth line.
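stat_summary() reduces each CT group to a single mean, so with group = CT every group ends up with one point and there is nothing to connect. Set the group aesthetic to 1 so that all the means belong to one group and are joined by a single line:
# ggplot2 >= 3.3.0 uses fun = (older versions use fun.y =)
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(group=1), fun=mean, colour="red", geom="line")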
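A quick way to see that `group = 1` works is to inspect the summarised layer with `layer_data()`. This sketch rebuilds the question's data (gene omitted, CT as a factor); `means_layer` is an illustrative name, and it assumes ggplot2 >= 3.3.0 for `fun =`.

```r
library(ggplot2)

value <- c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071,
           0.87170, 0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016, 0.86656)
CT <- factor(rep(c("ET", "HP", "HT", "LT", "P"), each = 3))
df <- data.frame(value, CT)

p <- ggplot(df, aes(x = CT, y = value)) + geom_point() +
  stat_summary(aes(group = 1), fun = mean, colour = "red", geom = "line")

# Layer 2 holds the summary: one mean per CT level, all in the same group
means_layer <- layer_data(p, 2)
means_layer[, c("x", "y", "group")]
```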
Alternatively, you can use the dplyr package to compute the mean of each factor level:
library(dplyr)
group_means <- df %>%
group_by(CT) %>%
summarise(mean = mean(value))
Then convert the factor to numeric so that you can draw a short horizontal line at each mean with geom_segment(); scale_x_continuous() then puts the CT labels back on the x-axis.
# as.numeric(CT) gives the level positions 1..5, so CT must be a factor
ggplot(df, aes(x=as.numeric(CT), y=value)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("CT", labels=levels(df$CT), breaks=seq_along(levels(df$CT)))
Following on from hrbrmstr's comment, you can add a smooth line like this:
ggplot(df, aes(x=as.numeric(CT), y=value, group=1)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("CT", labels=levels(df$CT), breaks=seq_along(levels(df$CT))) +
geom_smooth()

converting boxplots to densities in ggplot2 in R

I have the following ggplot2 plot:
ggplot(iris) + geom_boxplot(aes(x=Species, y=Petal.Length, fill=Species)) + coord_flip()
Instead, I would like to plot this as horizontal density plots or histograms, i.e. a density line or a histogram for each species in place of each boxplot. This does not do the trick:
> ggplot(iris) + geom_density(aes(x=Species, y=Petal.Length, fill=Species)) + coord_flip()
Error in eval(expr, envir, enclos) : object 'y' not found
For simplicity I used Species both as the x variable and as the fill, but in my actual data the x-axis represents one set of conditions and the fill represents another, though that should not matter for plotting purposes. I'm trying to make the x-axis represent different conditions, with the y values for each condition plotted as a density or histogram instead of a boxplot.
Edit: this is better illustrated with a dataset that has two factor-like variables. In the mpg dataset, I want a density plot for each manufacturer showing the distribution of displ for each cyl value. The x-axis (vertical, because of the flipped coordinates) represents the manufacturers, and the variable being histogrammed is displ, but for each manufacturer I want as many histograms as there are cyl values for that manufacturer. I hope this is clearer. I know that this doesn't work because y= expects counts:
ggplot(mpg, aes(x=manufacturer, fill=cyl, y=displ)) +
geom_density(position="identity") + coord_flip()
The closest I get is:
ggplot(mpg, aes(x=displ, fill=cyl)) +
geom_density(position="identity") + facet_grid(manufacturer ~ .)
But I don't want separate panels; I'd like them to be different entries in the same plot, as in the histogram case.
Something like this? For both histograms and density plots, y is computed from x: the count per bin for a histogram and the estimated density for a density plot. So map x = Petal.Length, and its frequency (for the given bin width) or density is plotted on the y-axis. Add fill=Species to colour by species.
For histogram:
ggplot(iris, aes(x=Petal.Length, fill=Species)) +
geom_histogram(position="identity") + coord_flip()
For density:
ggplot(iris, aes(x=Petal.Length, fill=Species)) +
geom_density(position="identity") + coord_flip()
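To see that the histogram's y values are computed counts rather than a mapped variable, a small sketch can sum the `count` column per species from `layer_data()`. The explicit `bins = 30` and `alpha = 0.5` are choices made here, not part of the original answer.

```r
library(ggplot2)

p_hist <- ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_histogram(position = "identity", bins = 30, alpha = 0.5) +
  coord_flip()

h <- layer_data(p_hist)
# The computed count column sums to the number of flowers in each species
counts_per_species <- tapply(h$count, h$group, sum)
counts_per_species
```

Each iris species has 50 flowers, so every group's counts should add up to 50.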
Edit: Maybe you're looking for faceting?
ggplot(mpg, aes(x=displ, fill=factor(cyl))) +
geom_density(position="identity") +
facet_wrap( ~ manufacturer, ncol=3)
Gives:
Edit: Since you don't want faceting, the only other way I can think of is to create a separate group by pasting manufacturer and cyl together:
dd <- mpg
dd$grp <- factor(paste(dd$manufacturer, dd$cyl))
ggplot(dd, aes(x=displ)) +
geom_density(aes(fill=grp), position="identity")
gives:
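Base R's `interaction()` is an alternative to `paste()` for building the combined group; with `drop = TRUE` it keeps only the manufacturer/cyl pairs that actually occur. This is a sketch of that swap, not the original answer's code. Some pairs have very few rows, and geom_density may drop groups with fewer than two points (with a warning) in recent ggplot2 versions.

```r
library(ggplot2)

dd <- mpg
# One factor level per observed manufacturer/cyl pair, e.g. "audi 4"
dd$grp <- interaction(dd$manufacturer, dd$cyl, sep = " ", drop = TRUE)

p <- ggplot(dd, aes(x = displ)) +
  geom_density(aes(fill = grp), position = "identity", alpha = 0.3)

nlevels(dd$grp)
```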
