Histogram with different colours using the abline function in R

I would like to plot a histogram with different colours and a legend.
Assuming the following data:
df1<- rnorm(300,60,5)
I have used the following code to draw the histogram and add vertical lines with the abline function:
df1 <- data.frame(M = df1)  # name the column M so that it can be referred to below
hist(df1$M, breaks = seq(0, 100, 2))
abline(v=80, col="blue")
abline(v=77, col="red")
abline(v=71, col="red")
abline(v=68, col="blue")
abline(v=63, col="blue")
abline(v=58, col="blue")
abline(v=54, col="blue")
abline(v=51, col="blue")
abline(v=457, col="blue")
Now I would like a plot in which the bars themselves are coloured by range instead of being marked with vertical lines. I tried to remove the lines but was unable to, and I do not need them in the final plot.

Here's one way of doing that with ggplot2, dplyr and tidyr.
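First you need to set the colors. I do that with mutate() and case_when(). For the plot itself, keep in mind that if the histogram bins are not aligned with your color cut-offs, a single bar can contain observations of two colors. To avoid this, use binwidth = 1 together with boundary = 0, so that the bin edges fall on whole numbers such as the cut-offs at 60 and 63.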
library(ggplot2)
library(dplyr)
library(tidyr)
df1 <- data.frame(data1=rnorm(300,60,5))
df1 <- df1 %>%
  mutate(color_name=case_when(data1<60 ~ "red",
                              data1>=60 & data1 <63 ~ "blue",
                              TRUE ~ "cyan"))
ggplot(df1,aes(x=data1, fill=color_name)) +
  geom_histogram(binwidth = 1, boundary = 0, position="dodge") +
  scale_fill_identity(guide = "legend")
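With scale_fill_identity() the legend entries are the color names themselves ("blue", "cyan", "red"). A sketch of a variation, assuming you would rather see the ranges in the legend: bin the values with cut() using the same cut-offs as the case_when() call, then assign colors by name in scale_fill_manual(). The range labels and the set.seed() value are illustrative choices, and closed = "left" is added so each bin [k, k + 1) matches the right = FALSE intervals from cut().

```r
library(ggplot2)

set.seed(1)  # illustrative seed so the example is reproducible
df1 <- data.frame(data1 = rnorm(300, 60, 5))

# Same cut-offs as the case_when() call: [-Inf, 60), [60, 63), [63, Inf)
df1$range <- cut(df1$data1,
                 breaks = c(-Inf, 60, 63, Inf),
                 right = FALSE,
                 labels = c("below 60", "60 to 63", "63 and above"))

p <- ggplot(df1, aes(x = data1, fill = range)) +
  # closed = "left" makes each bin [k, k + 1), matching right = FALSE above
  geom_histogram(binwidth = 1, boundary = 0, closed = "left") +
  scale_fill_manual(values = c("below 60" = "red",
                               "60 to 63" = "blue",
                               "63 and above" = "cyan"))
p
```

Because the legend now comes from the factor levels of range, its order and text are controlled by the labels argument of cut().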
Additional request in the comments:
Using case_when with four colors:
df1 <- data.frame(data1=rnorm(300,60,5))
df1 <- df1 %>%
  mutate(color_name=case_when(data1<60 ~ "red",
                              data1>=60 & data1 <63 ~ "blue",
                              data1>=63 & data1 <65 ~ "orange",
                              TRUE ~ "cyan"))
ggplot(df1,aes(x=data1, fill=color_name)) +
  geom_histogram(binwidth = 1, boundary = 0, position="dodge") +
  scale_fill_identity(guide = "legend")
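For comparison, a base-R sketch that stays closer to the question's hist() code: compute the histogram with plot = FALSE, choose a color for each bar from its midpoint (h$mids), and pass that vector to plot(). The 1-unit breaks, right = FALSE and the color ranges mirror the ggplot2 answer; they are assumptions, not something the question specifies.

```r
set.seed(1)  # illustrative seed
x <- rnorm(300, 60, 5)

# Compute the histogram without drawing it, so the bin midpoints are available.
# right = FALSE gives bins [k, k + 1), matching the "< 60" / "< 63" cut-offs.
h <- hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = 1),
          right = FALSE, plot = FALSE)

# One color per bar, chosen from the bar's midpoint
bar_cols <- ifelse(h$mids < 60, "red",
                   ifelse(h$mids < 63, "blue", "cyan"))

plot(h, col = bar_cols, main = "Histogram of x", xlab = "x")
legend("topright", fill = c("red", "blue", "cyan"),
       legend = c("below 60", "60 to 63", "63 and above"), bty = "n")
```

No abline() calls are needed, since the color information lives in the bars themselves.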

Related

Annotate ggplot boxplot facets with number of observations per bar/group

I already looked through other questions like these (Annotate ggplot2 facets with number of observations per facet), but didn't find how to annotate the individual boxes of a faceted boxplot.
Here's my sample code for creating the boxplot:
require(ggplot2)
require(plyr)
mms <- data.frame(deliciousness = rnorm(100),
                  type = sample(as.factor(c("peanut", "regular")),
                                100, replace = TRUE),
                  color = sample(as.factor(c("red", "green", "yellow", "brown")),
                                 100, replace = TRUE))
ggplot(mms, aes(x = type, y = deliciousness, fill = type)) +
  geom_boxplot(notch = TRUE) +
  facet_wrap(~ color, nrow = 3, scales = "free") +
  xlab("") +
  scale_fill_manual(values = c("coral1", "lightcyan1", "olivedrab1")) +
  theme(legend.position = "none")
And here is the corresponding plot:
Now I want to annotate each color facet with the number of observations per group (peanut/regular), as shown in my drawing:
What I have done so far is summarize the number of observations per color and per group (peanut/regular) with plyr:
mms.cor <- ddply(.data = mms,
                 .(type, color),
                 summarize,
                 n = paste("n =", length(deliciousness)))
However, I do not know how to add this summary of the data to the ggplot. How can this be done?
Try this approach using dplyr and ggplot2. You can count the observations per group with mutate() and then keep the label only on the row with the maximum deliciousness in each group (all other rows get NA), so that each box receives exactly one label, placed at its highest point. geom_text() then draws those labels. Here is the code:
library(dplyr)
library(ggplot2)
# Data
mms <- data.frame(deliciousness = rnorm(100),
                  type = sample(as.factor(c("peanut", "regular")),
                                100, replace = TRUE),
                  color = sample(as.factor(c("red", "green", "yellow", "brown")),
                                 100, replace = TRUE))
# Plot
mms %>%
  group_by(color, type) %>%
  mutate(N = n()) %>%
  # keep the label only on the row with the highest value in each group
  mutate(N = ifelse(deliciousness == max(deliciousness, na.rm = TRUE), paste0("n=", N), NA)) %>%
  ggplot(aes(x = type, y = deliciousness, fill = type, label = N)) +
  geom_boxplot(notch = TRUE) +
  geom_text(fontface = "bold", na.rm = TRUE) +  # na.rm silences warnings about the NA labels
  facet_wrap(~ color, nrow = 3, scales = "free") +
  xlab("") +
  scale_fill_manual(values = c("coral1", "lightcyan1", "olivedrab1")) +
  theme(legend.position = "none")
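If you would rather reuse a summary table, like the ddply() attempt in the question, a sketch along these lines should also work: compute one row per color/type with the count and the group maximum, then pass that table to geom_text() through its data argument. The column names n, y and label, the seed and the vjust offset are illustrative choices.

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # illustrative seed
mms <- data.frame(deliciousness = rnorm(100),
                  type = sample(factor(c("peanut", "regular")), 100, replace = TRUE),
                  color = sample(factor(c("red", "green", "yellow", "brown")), 100, replace = TRUE))

# One row per facet and box: the count plus a y position at the group maximum
counts <- mms %>%
  group_by(color, type) %>%
  summarise(n = n(), y = max(deliciousness), .groups = "drop") %>%
  mutate(label = paste("n =", n))

p <- ggplot(mms, aes(x = type, y = deliciousness, fill = type)) +
  geom_boxplot(notch = TRUE) +
  # data = counts replaces the plot data for this layer; x and fill are still
  # inherited (type exists in counts), y and label come from the summary table
  geom_text(data = counts, aes(y = y, label = label), vjust = -0.5, fontface = "bold") +
  facet_wrap(~ color, nrow = 3, scales = "free") +
  xlab("") +
  scale_fill_manual(values = c("coral1", "lightcyan1")) +
  theme(legend.position = "none")
p
```

Because counts contains the color column, facet_wrap() places each label in the matching facet automatically.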
Output:

Can you translate this into ggplot?

So basically, I would like to use ggplot's geom_line() + geom_point() to create the same plots, but with nicer graphics.
> a
         V1        V2        V3
1 0.8224887 0.7882316 0.7596440
2 0.7892779 0.7604186 0.7409430
3 0.8254516 0.8257800 0.8014778
4 0.8268519 0.7887464 0.7887322
5 0.8226651 0.7981079 0.7934783
plot(6:10, a$V1, type="l", xlab="Folds", ylab="Accuracy", col="Blue",ylim=c(0.7,0.9))
par(new=TRUE)
plot(6:10, a$V2, type="l", xlab="Folds", ylab="Accuracy", col="Orange",ylim=c(0.7,0.9))
par(new=TRUE)
plot(6:10, a$V3, type="l", xlab="Folds", ylab="Accuracy", col="Green",ylim=c(0.7,0.9))
My main goal is to get a legend that helps to distinguish each variable.
I tried to plot just the first line:
ggplot(data = a)+
theme_classic()+
geom_line(aes(x=6:10, y = a$V1, color = "blue"))
The problem is that I don't even get the color I want.
Thanks for reading and helping!
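The color problem comes from putting color = "blue" inside aes(): anything inside aes() is treated as a data value to be mapped through a scale, so "blue" becomes a category label and is drawn in ggplot's first default hue. Setting the color outside aes() uses it literally. A minimal sketch contrasting the two forms (moving the x values into the data frame as Folds is an assumption made for illustration):

```r
library(ggplot2)

a <- data.frame(
  V1 = c(0.8224887, 0.7892779, 0.8254516, 0.8268519, 0.8226651),
  Folds = 6:10  # x values stored in the data instead of passed as a bare vector
)

# Mapped: "blue" is treated as a data value, so ggplot picks a default hue for it
p_mapped <- ggplot(a, aes(x = Folds, y = V1, colour = "blue")) + geom_line()

# Set: a colour given outside aes() is used as-is
p_set <- ggplot(a, aes(x = Folds, y = V1)) + geom_line(colour = "blue")
```

For several series with a legend, the usual route is to reshape to long format and map colour to the series name, as the answers do.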
library(dplyr)
library(ggplot2)
a <- data.frame(
  V1 = rnorm(5),
  V2 = rnorm(5),
  V3 = rnorm(5),
  Folds = 6:10)  # make some example data
a %>%
  tidyr::gather(key, value, -Folds) %>%  # get data in long format for ggplot
  ggplot(., aes(x = Folds, y = value, col = key)) +
  geom_line() +   # add lines
  geom_point() +  # add points
  scale_color_manual("My Variables", values = c("blue", "orange", "green")) +  # change colours
  theme_classic()
library(tidyverse)
originalData <- tibble(
  V1 = c(0.8224887, 0.7892779, 0.8254516, 0.8268519, 0.8226651),
  V2 = c(0.7882316, 0.7604186, 0.8257800, 0.7887464, 0.7981079),
  V3 = c(0.7596440, 0.7409430, 0.8014778, 0.7887322, 0.7934783)
)
# ggplot works best if your data is 'tidy'
tidyData <- originalData %>%
  pivot_longer(cols = c(V1, V2, V3), names_to = "Variable") %>%
  add_column(X = rep(6:10, each = 3))
tidyData
tidyData %>%
  ggplot(aes(x = X, y = value, colour = Variable)) +
  geom_line() +
  theme_classic()
Giving
You can customise your plot from here as you like.
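A slightly more robust variant of the reshaping step, sketched under the assumption that the five rows correspond to folds 6 to 10 as in the question: add the fold column before pivoting, so the x values travel with their rows instead of depending on the row order that pivot_longer() produces. It also adds the points and the blue/orange/green colors from the original base-R plot.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

originalData <- data.frame(
  V1 = c(0.8224887, 0.7892779, 0.8254516, 0.8268519, 0.8226651),
  V2 = c(0.7882316, 0.7604186, 0.8257800, 0.7887464, 0.7981079),
  V3 = c(0.7596440, 0.7409430, 0.8014778, 0.7887322, 0.7934783)
)

tidyData <- originalData %>%
  mutate(Folds = 6:10) %>%  # assumption: row i is fold i + 5, as in plot(6:10, ...)
  pivot_longer(cols = c(V1, V2, V3), names_to = "Variable", values_to = "Accuracy")

p <- ggplot(tidyData, aes(x = Folds, y = Accuracy, colour = Variable)) +
  geom_line() +
  geom_point() +
  scale_colour_manual(values = c(V1 = "blue", V2 = "orange", V3 = "green")) +
  coord_cartesian(ylim = c(0.7, 0.9)) +  # same y range as the base-R plots
  theme_classic()
p
```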

Sorting a subset and/or maintaining file order within ggplot facets

I'm creating a forest plot of effect sizes (denoted as or), faceted by their type (X or Y), using ggplot2.
I would like for the first line of each facet to be the summary effect size (cite=='Summary') for that facet, followed by a row per study, sorted by effect size (I don't particularly care if it's ascending or descending). Although I can easily create a dataframe corresponding to this, I can't seem to get it to plot in order in both facets without sorting the summary effect size too.
Please assume that there are too many data points to specify the order manually; below is a minimally representative subsample.
cite <- c("A","B","C","B","A")  # character, so that "Summary" can be added as a new value later
or <- c(8.132075,3.475255,5.727273,4.334704,4.009901)
lowerCI <- c(4.6841118,1.5059889,-0.5582456,2.3612416,-2.6439191)
upperCI <- c(11.580039,5.444521,12.012791,6.308167,10.663721)
type <- as.factor(c("X","X","X","Y","Y"))
df <- data.frame(cite, or, lowerCI, upperCI, type)
df <- df[order(df$type, -xtfrm(df$or)), ] # Sorting within type by or
Adding rows for summaries to the end of the dataset so they're unsorted:
X.row <- list(cite="Summary",or=3.506705,lowerCI=1.5375528,upperCI=5.475857,type="X")
df[nrow(df) + 1, names(X.row)] <- X.row
Y.row <- list(cite="Summary",or=4.332824,lowerCI=2.3594369,upperCI=6.306212,type="Y")
df[nrow(df) + 1, names(Y.row)] <- Y.row
Using code to maintain file order based on this answer:
df <- transform(df,cite=factor(cite,levels=unique(cite)))
Plotting attempt:
plot<-ggplot(data=df, aes(y=cite, x=or, xmin=lowerCI, xmax=upperCI, shape = type)) +
geom_point(color = 'black', size=2)+
geom_errorbarh(height=.1)+
geom_point(data=subset(df,cite=='Summary'), color='black', size=5)+
facet_grid(type~., scales= 'free', space='free')+
scale_y_discrete(breaks=levels(df$cite),
labels=c(levels(df$cite)[1:3], expression(italic("Summary Effect"))))
Results:
The problem is that the first facet, but not the second facet, is appropriately sorted. So I tried an alternative suggested here:
Xdat<-subset(df, type=="X")
Ydat<-subset(df, type=="Y")
Xdat <- transform(Xdat,cite=factor(cite,levels=unique(cite)))
Ydat <- transform(Ydat,cite=factor(cite,levels=unique(cite)))
plot2<-ggplot(mapping=aes(y=cite, x=or, xmin=lowerCI, xmax=upperCI, shape = type)) +
geom_point(data=Xdat,color = 'black', size=2)+
geom_point(data=subset(Xdat,cite=='Summary'), color='black', size=7)+
geom_errorbarh(data=Xdat,height=.1)+
geom_point(data=Ydat,color = 'black', size=2)+
geom_point(data=subset(Ydat,cite=='Summary'), color='black', size=7)+
geom_errorbarh(data=Ydat,height=.1)+
facet_grid(type~., scales= 'free', space='free')+
scale_y_discrete(breaks=levels(df$cite),
labels=c(levels(df$cite)[1:3], expression(italic("Summary Effect"))))
This yields the same result. Any ideas?
ETA: I tried the solution posted here as recommended, and it maintains the sorting, but I'm now unable to get y-axis labels to display properly.
Creating the unique cite variable:
df$type <- factor(df$type, levels = c("X","Y"))
df$cite.type <- with(df, paste(cite, type, sep = "_"))
df$cite.type <- as.factor(df$cite.type)
df <- transform(df,cite.type=factor(cite.type,levels=unique(cite.type)))
df <- transform(df,cite=factor(cite,levels=unique(cite)))
Plotting (note that I cannot keep the y=reorder(cite.type,or) part, because that would move the summary effects):
ggplot(data=df, aes(y=cite.type, x=or, xmin=lowerCI, xmax=upperCI, shape = type)) +
geom_point(color = 'black', size=2)+
geom_errorbarh(height=.1)+
geom_point(data=subset(df,cite=='Summary'), color='black', size=5)+
facet_grid(type~., scales= 'free', space='free')+
scale_y_discrete(breaks=levels(df$cite.type),
labels=c(levels(df$cite)[1:3], expression(italic("Summary Effect"))))
And here is the result:
Note that it is now sorted appropriately, but the y-axis labels are only printed once per cite.
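One possible fix, sketched here as an assumption rather than a confirmed answer: keep the unique cite/type key on the y-axis so that each facet keeps its own order, and pass a function to the labels argument of scale_y_discrete() that strips the type suffix from each key. Since the labels are then computed from each break, every facet gets its own labels. The helper strip_key() is hypothetical, and the rows are ordered so that the summary row is the last level within each facet, which a discrete y-axis draws at the top.

```r
library(ggplot2)

df <- data.frame(
  cite    = c("A", "B", "C", "B", "A", "Summary", "Summary"),
  or      = c(8.132075, 3.475255, 5.727273, 4.334704, 4.009901, 3.506705, 4.332824),
  lowerCI = c(4.6841118, 1.5059889, -0.5582456, 2.3612416, -2.6439191, 1.5375528, 2.3594369),
  upperCI = c(11.580039, 5.444521, 12.012791, 6.308167, 10.663721, 5.475857, 6.306212),
  type    = c("X", "X", "X", "Y", "Y", "X", "Y")
)

# Within each type: studies by ascending or, then the summary row.
# A discrete y-axis draws the first level at the bottom, so Summary ends up on top.
df <- df[order(df$type, df$cite == "Summary", df$or), ]

# One key per (cite, type) pair, so a study can sit in a different place per facet
keys <- paste(df$cite, df$type, sep = "_")
df$key <- factor(keys, levels = unique(keys))

# Hypothetical helper: drop the "_X"/"_Y" suffix and rename the summary row
strip_key <- function(k) {
  lab <- sub("_[^_]*$", "", k)
  ifelse(lab == "Summary", "Summary Effect", lab)
}

p <- ggplot(df, aes(y = key, x = or, xmin = lowerCI, xmax = upperCI)) +
  geom_errorbarh(height = 0.1) +
  geom_point(size = 2) +
  geom_point(data = subset(df, cite == "Summary"), size = 5) +
  facet_grid(type ~ ., scales = "free", space = "free") +
  scale_y_discrete(labels = strip_key)
p
```

This sketch returns plain-text labels; the italic "Summary Effect" from the question would need an expression-based labeller on top of it.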

How to plot lines for the count data in R?

I have data frame like this:
frame <- data.frame("AGE" = seq(18,44,1),
"GROUP1"= c(83,101,159,185,212,276,330,293,330,356,370,325,264,274,214,229,227,154,132,121,83,69,57,32,16,17,8),
"GROUP2"= c(144,210,259,329,391,421,453,358,338,318,270,258,207,186,173,135,106,92,74,56,41,31,25,13,16,5,8))
I want to plot AGE on the x-axis and the values of GROUP1 and GROUP2 on the y-axis in the same plot, with different colors. The values should be joined by a smoothed line.
As a first step, I melted the data frame and plotted it:
library(reshape2)
meltdf <- melt(frame, id.vars = "AGE")
meltdf <- meltdf[order(meltdf$AGE), ]
plot(meltdf$AGE, meltdf$value)
Here is an alternative solution using the dplyr and tidyr packages (plus ggplot2 for the plot).
library(dplyr)
library(tidyr)
library(ggplot2)
newframe <- frame %>% gather("variable", "value", -AGE)
ggplot(newframe, aes(x = AGE, y = value, color = variable)) +
  geom_point() +
  geom_smooth()
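You could use geom_line() to join the points with straight segments, but geom_smooth() fits the request for a smoothed line better. geom_area() gives you a shaded area under the lines; for that, map variable to fill instead of color. Note that geom_area() stacks the groups by default (position = "stack"), so add position = "identity" if you want the two areas overlaid rather than stacked.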
ggplot(newframe, aes(x=AGE, y=value, fill=variable)) + geom_area()
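If you want more control over the smoothing, here is a sketch, assuming tidyr's pivot_longer() as a replacement for gather(): set method = "loess" explicitly, use span to control how closely the curves follow the data (smaller values are wigglier), and se = FALSE to drop the confidence ribbon. The value span = 0.5 is an arbitrary illustrative choice.

```r
library(ggplot2)
library(tidyr)

frame <- data.frame("AGE" = seq(18, 44, 1),
                    "GROUP1" = c(83,101,159,185,212,276,330,293,330,356,370,325,264,274,214,229,227,154,132,121,83,69,57,32,16,17,8),
                    "GROUP2" = c(144,210,259,329,391,421,453,358,338,318,270,258,207,186,173,135,106,92,74,56,41,31,25,13,16,5,8))

# Long format: one row per AGE and group
long <- pivot_longer(frame, cols = c(GROUP1, GROUP2),
                     names_to = "variable", values_to = "value")

p <- ggplot(long, aes(x = AGE, y = value, colour = variable)) +
  geom_point() +
  # Explicit loess fit; span = 0.5 is illustrative, se = FALSE drops the ribbon
  geom_smooth(method = "loess", formula = y ~ x, span = 0.5, se = FALSE)
p
```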
We can use base R's matplot(). Pass AGE explicitly as x; with a single matrix argument, matplot() plots against the row index 1, 2, ..., not the row names.
matplot(frame$AGE, as.matrix(frame[-1]), type = "l", lty = 1,
        xlab = "AGE", ylab = "value", col = c("red", "blue"))
legend("topright", inset = .05, legend = c("GROUP1", "GROUP2"),
       lty = 1, col = c("red", "blue"), horiz = TRUE)
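Try this, where meltdf is the melted data frame from the question (melt(frame, id.vars = "AGE") from reshape2):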
library(ggplot2)
ggplot(meltdf,aes(x=AGE,y=value,colour=variable,group=variable)) + geom_line()

Plot multiple columns against a single column in one plot in R

I have a dataframe with 9 columns, including a column for elapsed time (Days):
Probes <- data.frame(Days=seq(0.01, 4.91, 0.01), DJ1=-45:445, SJ2=-50:440, SJ3=10:500, SJ4=-200:290, DJ2=-150:340, DJ9=100:590, SJ21=-300:190, SJ32=-100:390)
I would like to create a line graph with multiple series on one plot. I would be plotting EACH of the remaining columns in the data frame against the first column (Days).
I know the basic code to do this. However, I am wondering if there is a simpler option for when you want to plot ALL of the columns in the data frame against one column, as I am doing now. It seems redundant to type a line of code for every single column name, particularly because I have to do this on over 50 data frames.
For my particular dataset, I would like to color the columns with "S" in the header as "brown" and the columns with "D" in the header as orange.
The long and unwieldy code I have for this is written below so that you can see the output I am going for. Is there a simpler way to do this (perhaps in ggplot)?
png("ProbesPlot.png", height = 1000, width = 1400, units = "px", res = 200)
par(xpd=T, mar=par()$mar+c(0,0,0,6))
plot(Probes$Days, Probes$DJ1, type="l", lwd=2, lty=1, col="orange", main="Depth in Sediments- 4.5 cm", xlab="Elapsed Time (days)", ylab="Eh (mV)", xlim=c(0, 8), ylim=c(-400,600), bty="n")
lines(Probes$Days, Probes$SJ2, type="l", lwd=2, lty=1, col="brown")
lines(Probes$Days, Probes$SJ3, type="l", lwd=2, lty=1, col="brown")
lines(Probes$Days, Probes$SJ4, type="l", lwd=2, lty=1, col="brown")
lines(Probes$Days, Probes$DJ2, type="l", lwd=2, lty=1, col="orange")
lines(Probes$Days, Probes$DJ9, type="l", lwd=2, lty=1, col="orange")
lines(Probes$Days, Probes$SJ21, type="l", lwd=2, lty=1, col="brown")
lines(Probes$Days, Probes$SJ32, type="l", lwd=2, lty=1, col="brown")
legend(8.5,400, c("4 cm", "1 cm"), lty=c(1,1), lwd=c(2,2), col=c("orange", "brown"), bty="n", title="Depth")
dev.off()
Edit (3/1), for @eipi10: I have an additional data frame with 29 columns, including a column for elapsed time (Days). I would like to create the same type of plot, but instead of 2 color categories (D and S) I now have 3 (D, M and S). This is an example of the structure of my new data frame:
Probes <- data.frame(Days=seq(0.01, 4.91, 0.01), B1D.J1=-45:445, B1S.J2=-50:440, B1M.J3=10:500, B1S.J4=-200:290, B1D.J25=-150:340, B1D.J9=100:590, B1S.J21=-300:190, B1S.J32=-100:390, B1D.J18=-15:475, B1M.J22=-70:420, B1M.J31=5:495, B1S.J43=-200:290, B1D.J27=-150:340, B1D.J99=100:590, B1S.J87=-300:190, B1S.J65=-100:390, B1S.J44=-300:190, B1M.J90=-100:390, B1D.J18=-15:475, B1M.J26=-70:420, B1M.J66=5:495, B1M.J43=-200:290, B1D.J52=-150:340, B1D.J96=-50:440, B1M.J53=-300:190, B1M.J74=-100:390)
Also, the only legend I really need is a small one differentiating the colors: Orange = Deep, Brown = Mid, Red = Shallow (D, M and S).
Following @MikeWise's suggestion, here's how you can make this much easier with ggplot2:
library(ggplot2)
library(reshape2)
library(dplyr)
Probes <- data.frame(Days=seq(0.01, 4.91, 0.01), DJ1=-45:445, SJ2=-50:440, SJ3=10:500, SJ4=-200:290, DJ2=-150:340, DJ9=100:590, SJ21=-300:190, SJ32=-100:390)
# Convert data to long format
Probes.m <- melt(Probes, id.vars = "Days")
# Reorder factor levels so the "D" and "S" columns are grouped together
Probes.m$variable <- factor(Probes.m$variable,
                            levels = sort(unique(as.character(Probes.m$variable))))
ggplot(Probes.m, aes(Days, value, colour = variable)) +
  geom_line() +
  scale_colour_manual(values = rep(c("orange", "brown"), c(3, 5))) +
  guides(colour = "none") +  # "none" replaces the deprecated FALSE
  geom_text(data = Probes.m %>% group_by(variable) %>% filter(value == max(value)),
            aes(label = variable, x = Days + 0.1, y = value), size = 3, hjust = 0) +
  labs(x = "x-label", y = "y-label") +
  ggtitle("My Title")
UPDATE: Based on your comment and the updated data you provided, the issue is that many of your lines exactly overlap, so they plot one on top of the other. You can see this by inspecting your sample data. Also, using your new data and using geom_text_repel instead of geom_text you can see there are multiple labels for the "same" line, which is really just multiple copies of the same line with different names.
library(ggrepel)  # provides geom_text_repel()
ggplot(Probes.m, aes(Days, value, colour = variable)) +
  geom_line() +
  scale_colour_manual(values = rep(c("orange", "brown", "red"), c(9, 9, 8))) +
  guides(colour = "none") +
  geom_text_repel(data = Probes.m %>% group_by(variable) %>% filter(value == max(value)),
                  aes(label = variable, x = Days + 0.1, y = value), size = 3) +
  labs(x = "x-label", y = "y-label") +
  ggtitle("My Title")
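For the three-category data in the edit, a sketch that avoids counting columns per category (the rep(..., c(9, 9, 8)) vector only works while those counts and the level order stay the same): derive the depth code from each column name and map colour to that code, which also gives the small Deep/Mid/Shallow legend asked for. This assumes every probe name follows the B1<code>.J<number> pattern, so the third character is D, M or S; the Probes data frame below is a stand-in with only a few of the columns.

```r
library(ggplot2)
library(tidyr)

# Stand-in for the wide data from the edit (only a few of the columns)
Probes <- data.frame(Days = seq(0.01, 4.91, 0.01),
                     B1D.J1 = -45:445, B1S.J2 = -50:440, B1M.J3 = 10:500,
                     B1S.J4 = -200:290, B1D.J25 = -150:340, B1M.J22 = -70:420)

long <- pivot_longer(Probes, cols = -Days, names_to = "probe", values_to = "Eh")
# Assumes names look like B1D.J1: the third character is the depth code
long$depth <- substr(long$probe, 3, 3)

p <- ggplot(long, aes(x = Days, y = Eh, group = probe, colour = depth)) +
  geom_line() +
  scale_colour_manual(name = "Depth",
                      values = c(D = "orange", M = "brown", S = "red"),
                      breaks = c("D", "M", "S"),
                      labels = c("Deep", "Mid", "Shallow")) +
  labs(x = "Elapsed Time (days)", y = "Eh (mV)")
p
```

Here group = probe keeps one line per column, while colour only carries the three depth codes, so the legend has exactly three entries.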
Building on the answer from @eipi10: to cope with overlapping labels you can use the ggrepel package, which provides a new geom, geom_text_repel(), for ggplot2. [I started answering this earlier using tidyr instead of reshape2 and don't have enough reputation to comment yet, so I'm posting this as a new answer.]
library(ggplot2)
library(tidyr)
library(dplyr)
library(ggrepel)
Probes <- data.frame(Days=seq(0.01, 4.91, 0.01), DJ1=-45:445, SJ2=-50:440, SJ3=10:500, SJ4=-200:290, DJ2=-150:340, DJ9=100:590, SJ21=-300:190, SJ32=-100:390)
tidiedProbes <- Probes %>%
  tidyr::gather(key = probeRef, value = `Eh (mV)`, -Days) %>%
  dplyr::mutate(lineColour = ifelse(substring(probeRef, 1, 1) == "D", "D", "S"))
dataForLabels <- tidiedProbes %>%
  filter(Days == max(Days))
newPlot <- tidiedProbes %>%
  ggplot() +
  geom_line(aes(x = Days, y = `Eh (mV)`, colour = lineColour, group = probeRef)) +
  scale_color_manual(values = c("orange", "brown")) +
  geom_text_repel(data = dataForLabels,
                  aes(x = Days, y = `Eh (mV)`, label = probeRef)) +
  ggtitle("Depth in Sediments - 4.5 cm") +
  xlab("Elapsed Time (days)") +
  theme(legend.title = element_blank())
newPlot
I don't have enough reputation to post images yet, but hopefully the plot looks as expected.
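For completeness, a base-R sketch that replaces the column-by-column lines() calls from the question with a single matplot() call and picks each line's color from the first letter of its column name. Wrapping it in a function (plot_probes() is a hypothetical name) makes it reusable across many data frames, e.g. with lapply(). It assumes every column other than Days is a probe whose name starts with D or S; a letter missing from the lookup vector gives an NA color and that line is not drawn.

```r
# Example data from the question
Probes <- data.frame(Days=seq(0.01, 4.91, 0.01), DJ1=-45:445, SJ2=-50:440, SJ3=10:500, SJ4=-200:290, DJ2=-150:340, DJ9=100:590, SJ21=-300:190, SJ32=-100:390)

# Hypothetical helper: draw every probe column of df against df$Days
plot_probes <- function(df, colours = c(D = "orange", S = "brown"),
                        main = "Depth in Sediments- 4.5 cm") {
  probe_cols <- setdiff(names(df), "Days")
  # Look up each line's color by the first letter of its column name
  line_cols <- unname(colours[substr(probe_cols, 1, 1)])
  matplot(df$Days, as.matrix(df[probe_cols]), type = "l", lty = 1, lwd = 2,
          col = line_cols, main = main, xlab = "Elapsed Time (days)",
          ylab = "Eh (mV)", bty = "n")
  # Legend text copied from the question: orange = 4 cm, brown = 1 cm
  legend("topleft", legend = c("4 cm", "1 cm"), col = unname(colours),
         lty = 1, lwd = 2, bty = "n", title = "Depth")
  invisible(line_cols)
}

cols <- plot_probes(Probes)
```

For many data frames, something like lapply(list_of_frames, plot_probes) would draw one plot per frame, where list_of_frames is a placeholder for your own list.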
