Plot multiple columns against a single column in one plot in R

I have a dataframe with 9 columns, including a column for elapsed time (Days):
Probes <- data.frame(Days=seq(0.01, 4.91, 0.01), DJ1=-45:445, SJ2=-50:440, SJ3=10:500, SJ4=-200:290, DJ2=-150:340, DJ9=100:590, SJ21=-300:190, SJ32=-100:390)
I would like to create a line graph with multiple series on one plot. I would be plotting EACH of the remaining columns in the data frame against the first column (Days).
I know the basic code to do this. However, I am wondering whether there is a simpler option for plotting ALL of the columns in the data frame against one column, as I am doing now. It seems redundant to type a line of code for every single column name, particularly because I have to do this for over 50 data frames.
For my particular dataset, I would like to color the columns with "S" in the header as "brown" and the columns with "D" in the header as orange.
The long and unwieldy code I have for this is written below so that you can see the output I am going for. Is there a simpler way to do this (perhaps in ggplot)?
png("ProbesPlot.png", height = 1000, width = 1400, units = "px", res = 200)
par(xpd=TRUE, mar=par()$mar+c(0,0,0,6))
plot(Probes$Days, Probes$DJ1, type="l", lwd=2, lty=1, col="orange", main="Depth in Sediments- 4.5 cm", xlab="Elapsed Time (days)", ylab="Eh (mV)", xlim=c(0, 8), ylim=c(-400,600), bty="n")
lines(Probes$Days, Probes$SJ2, type="l", lwd=2, lty=1, col="brown")
lines(Probes$Days, Probes$SJ3, type="l", lwd=2, lty=1, col="brown")
lines(Probes$Days, Probes$SJ4, type="l", lwd=2, lty=1, col="brown")
lines(Probes$Days, Probes$DJ2, type="l", lwd=2, lty=1, col="orange")
lines(Probes$Days, Probes$DJ9, type="l", lwd=2, lty=1, col="orange")
lines(Probes$Days, Probes$SJ21, type="l", lwd=2, lty=1, col="brown")
lines(Probes$Days, Probes$SJ32, type="l", lwd=2, lty=1, col="brown")
legend(8.5,400, c("4 cm", "1 cm"), lty=c(1,1), lwd=c(2,2), col=c("orange", "brown"), bty="n", title="Depth")
dev.off()
3/1 edit to question, @eipi10: I have an additional dataframe with 29 columns, including a column for elapsed time (Days). I would like to create the same type of plot, but instead of 2 color categories (D and S) I now have 3 (D, M, and S). This is an example of the structure of my new dataframe:
Probes <- data.frame(Days=seq(0.01, 4.91, 0.01), B1D.J1=-45:445, B1S.J2=-50:440, B1M.J3=10:500, B1S.J4=-200:290, B1D.J25=-150:340, B1D.J9=100:590, B1S.J21=-300:190, B1S.J32=-100:390, B1D.J18=-15:475, B1M.J22=-70:420, B1M.J31=5:495, B1S.J43=-200:290, B1D.J27=-150:340, B1D.J99=100:590, B1S.J87=-300:190, B1S.J65=-100:390, B1S.J44=-300:190, B1M.J90=-100:390, B1D.J18=-15:475, B1M.J26=-70:420, B1M.J66=5:495, B1M.J43=-200:290, B1D.J52=-150:340, B1D.J96=-50:440, B1M.J53=-300:190, B1M.J74=-100:390)
Also, the only legend I would essentially need is a small one differentiating the colors: Orange = Deep, Brown = Mid, Red = Shallow (D, M, and S).

Following @MikeWise's suggestion, here's how you can make this much easier with ggplot2:
library(ggplot2)
library(reshape2)
library(dplyr)
Probes <- data.frame(Days=seq(0.01, 4.91, 0.01), DJ1=-45:445, SJ2=-50:440, SJ3=10:500, SJ4=-200:290, DJ2=-150:340, DJ9=100:590, SJ21=-300:190, SJ32=-100:390)
# Convert data to long format
Probes.m = melt(Probes, id.vars="Days")
# Reorder factor levels so "D's" and "S's" are grouped together
Probes.m$variable = factor(Probes.m$variable,
                           levels=sort(unique(as.character(Probes.m$variable))))
ggplot(Probes.m, aes(Days, value, colour=variable)) +
  geom_line() +
  scale_colour_manual(values=rep(c("orange","brown"), c(3,5))) +
  guides(colour="none") +  # colour=FALSE is deprecated in recent ggplot2
  geom_text(data=Probes.m %>% group_by(variable) %>% filter(value==max(value)),
            aes(label=variable, x=Days + 0.1, y=value), size=3, hjust=0) +
  labs(x="x-label", y="y-label") +
  ggtitle("My Title")
UPDATE: Based on your comment and the updated data you provided, the issue is that many of your lines exactly overlap, so they plot one on top of the other. You can see this by inspecting your sample data. Also, using your new data and using geom_text_repel instead of geom_text you can see there are multiple labels for the "same" line, which is really just multiple copies of the same line with different names.
library(ggrepel)  # provides geom_text_repel
# Probes.m is the updated data, melted and releveled as above
ggplot(Probes.m, aes(Days, value, colour=variable)) +
  geom_line() +
  scale_colour_manual(values=rep(c("orange","brown","red"), c(9,9,8))) +
  guides(colour="none") +
  geom_text_repel(data=Probes.m %>% group_by(variable) %>% filter(value==max(value)),
                  aes(label=variable, x=Days + 0.1, y=value), size=3) +
  labs(x="x-label", y="y-label") +
  ggtitle("My Title")
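The overlap can also be checked without plotting, by looking for columns whose values are identical. The following is only a sketch in base R: it uses a five-column subset of the updated sample data (values copied from the question), and the names `series`, `dup`, `key`, and `overlapping` are made up for this illustration.

```r
# Subset of the updated sample data from the question (values copied as-is)
Probes <- data.frame(Days = seq(0.01, 4.91, 0.01),
                     B1D.J1 = -45:445, B1S.J2 = -50:440, B1S.J4 = -200:290,
                     B1S.J43 = -200:290, B1D.J96 = -50:440)

series <- Probes[-1]   # every column except Days

# TRUE for a column that exactly repeats an earlier column
dup <- duplicated(as.list(series))
names(series)[dup]

# Group column names by content, so each set of overlapping lines is visible
key <- vapply(series, paste, character(1), collapse = ",")
groups <- split(names(series), key)
overlapping <- unname(groups[lengths(groups) > 1])
overlapping
```

Each element of `overlapping` is a set of columns that would draw as a single line on the plot.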

Building on the answer from @eipi10: to cope with overlapping text labels, you could use the ggrepel package, which provides a new geom for ggplot2. [I started answering this earlier using tidyr instead of reshape2 and don't have the reputation to comment yet, so I'm posting this as a new answer.]
library(ggplot2)
library(tidyr)
library(dplyr)
library(ggrepel)
Probes <- data.frame(Days=seq(0.01, 4.91, 0.01), DJ1=-45:445, SJ2=-50:440, SJ3=10:500, SJ4=-200:290, DJ2=-150:340, DJ9=100:590, SJ21=-300:190, SJ32=-100:390)
tidiedProbes <-
  Probes %>%
  tidyr::gather(key = probeRef, value = `Eh (mV)`, -Days) %>%
  dplyr::mutate(lineColour = ifelse(substring(probeRef, 1, 1) == "D", "D", "S"))
dataForLabels <-
  tidiedProbes %>%
  filter(Days == max(Days))
newPlot <-
  tidiedProbes %>%
  ggplot() +
  geom_line(aes(x = Days, y = `Eh (mV)`, colour = lineColour, group = probeRef)) +
  scale_color_manual(values = c("orange", "brown")) +
  geom_text_repel(
    data = dataForLabels,
    aes(x = Days, y = `Eh (mV)`, label = probeRef)) +
  ggtitle("Depth in Sediments - 4.5 cm") +
  xlab("Elapsed Time (days)") +
  theme(legend.title = element_blank())
newPlot
Don't have reputation for posting images yet but hopefully plot looks as expected
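For comparison with the ggplot2 answers, here is a rough base R sketch of the same idea using matplot, which draws every column of a matrix in one call. The function name `plot_probes` and its `palette` argument are hypothetical, and the sketch assumes the depth code is the first letter of each column name, as in the first sample data frame.

```r
# Sample data from the question
Probes <- data.frame(Days = seq(0.01, 4.91, 0.01), DJ1 = -45:445, SJ2 = -50:440,
                     SJ3 = 10:500, SJ4 = -200:290, DJ2 = -150:340, DJ9 = 100:590,
                     SJ21 = -300:190, SJ32 = -100:390)

# Hypothetical helper: plot every column except Days against Days,
# choosing the line colour from the first letter of the column name
plot_probes <- function(df, palette = c(D = "orange", S = "brown")) {
  series <- df[setdiff(names(df), "Days")]
  depth <- substr(names(series), 1, 1)        # "D" or "S"
  line_cols <- unname(palette[depth])         # one colour per column
  matplot(df$Days, as.matrix(series), type = "l", lty = 1, lwd = 2,
          col = line_cols, xlab = "Elapsed Time (days)", ylab = "Eh (mV)",
          bty = "n")
  legend("topleft", legend = names(palette), col = palette, lty = 1, lwd = 2,
         bty = "n", title = "Depth")
  invisible(line_cols)
}

line_cols <- plot_probes(Probes)
```

With many data frames held in a list, something like `lapply(list_of_frames, plot_probes)` would repeat the plot for each one (`list_of_frames` is likewise a placeholder name).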

Related

How to write the abscissa of the maximum under x'x

I'm plotting time series data, say 'data1', using plot.ts(data1),
then I mark the maximum with abline(v=which.max(data1)).
Now I want to add the abscissa of the maximum point, say x = 19, to the x-axis, but sometimes it overlaps the tick labels that already exist there.
My question: how can I write the abscissa of the maximum below the numbers that already exist on x'x (the x-axis)?
s=c(1,1.5,2,4,1,1,5,3,5,2,3,5,2,5,2,2,4,2,7,5,2,3,5,2,3,5,2,3,5,2,3,5)
plot.ts(s)
abline(v=which.max(s), col= "red", lty=2, lwd=1)
axis(1,which.max(s))
Does this work for you Salman?
# axis(1,which.max(s))
library(glue)
label <- glue("Max is {max(s)}")
text(which.max(s), 0.2*max(s), label)
Ah, OK, like this? Not really sure what x'x means.
(I swapped to tidyverse from base R, which is much easier to use, and gives beautiful plots)
library(glue)
library(tibble)
library(ggplot2)
s <- c(1,1.5,2,4,1,1,5,3,5,2,3,5,2,5,2,2,4,2,7,5,2,3,5,2,3,5,2,3,5,2,3,5)
s <- tibble(x = 1:32, y = s)
label <- glue("Max is {max(s$y)}")
ref_line <- which.max(s$y)
ggplot(s, aes(x, y)) +
  geom_line() +
  labs(tag = label) +
  theme(plot.tag.position = c(.65, 0.02)) +
  geom_vline(xintercept = ref_line, col = "red")
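If the goal is to keep base R and simply place the value below the existing tick labels, one possible sketch is to write it into the bottom margin with mtext. This assumes the default margins, where tick labels sit on margin line 1 and the axis title on line 3, so line 2 is free; `x_max` is just an illustrative name.

```r
s <- c(1,1.5,2,4,1,1,5,3,5,2,3,5,2,5,2,2,4,2,7,5,2,3,5,2,3,5,2,3,5,2,3,5)
x_max <- which.max(s)   # index of the (first) maximum

plot.ts(s)
abline(v = x_max, col = "red", lty = 2, lwd = 1)

# Write the abscissa in the bottom margin, one line below the tick labels
mtext(x_max, side = 1, line = 2, at = x_max, col = "red")
```

Because the label sits on its own margin line, it cannot collide with the regular axis numbers.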

Histogram with different colours using the abline function in R

I would like to plot a histogram with different colours and legend.
Assuming the following data:
df1<- rnorm(300,60,5)
I have used the following codes to get the histogram plot and the lines using the abline function:
df1 <- data.frame(M = df1)
hist(df1$M, breaks = seq(0, 100, 2))
abline(v=80, col="blue")
abline(v=77, col="red")
abline(v=71, col="red")
abline(v=68, col="blue")
abline(v=63, col="blue")
abline(v=58, col="blue")
abline(v=54, col="blue")
abline(v=51, col="blue")
abline(v=457, col="blue")
Now I want to get the following plot. I wanted to remove the lines, but I was unable to do it. So I do not need to have the lines.
Here's one way of doing that with ggplot2, dplyr and tidyr.
First you need to set the colors. I do that with mutate and case_when. For the plot itself, it's important to remember that if histogram bins are not aligned, you can get different colors on the same bar. To avoid this, you can use binwidth=1.
library(ggplot2)
library(dplyr)
library(tidyr)
df1 <- data.frame(data1=rnorm(300,60,5))
df1 <- df1 %>%
  mutate(color_name=case_when(data1<60 ~ "red",
                              data1>=60 & data1 <63 ~ "blue",
                              TRUE ~ "cyan"))
ggplot(df1,aes(x=data1, fill=color_name)) +
  geom_histogram(binwidth = 1, boundary = 0, position="dodge") +
  scale_fill_identity(guide = "legend")
Additional request in comment
Using case_when with four colors:
df1 <- data.frame(data1=rnorm(300,60,5))
df1 <- df1 %>%
  mutate(color_name=case_when(data1<60 ~ "red",
                              data1>=60 & data1 <63 ~ "blue",
                              data1>=63 & data1 <65 ~ "orange",
                              TRUE ~ "cyan"))
ggplot(df1,aes(x=data1, fill=color_name)) +
  geom_histogram(binwidth = 1, boundary = 0, position="dodge") +
  scale_fill_identity(guide = "legend")
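The point about bin alignment can also be illustrated in base R, where hist returns the bins and plot accepts one colour per bar. This is only a sketch: the seed is an assumption added for reproducibility, the variable names are made up, and the cut-offs are the ones from the three-colour example.

```r
set.seed(42)                 # assumed seed, only for reproducibility
x <- rnorm(300, 60, 5)

# Integer-aligned breaks, so the cut-offs 60 and 63 coincide with bin edges;
# right = FALSE makes bins [a, b), matching "data1 >= 60" in case_when
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 1)
h <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)

# One colour per bin, decided by the bin midpoint
bar_cols <- ifelse(h$mids < 60, "red",
                   ifelse(h$mids < 63, "blue", "cyan"))

plot(h, col = bar_cols, main = "", xlab = "data1")
legend("topright", legend = c("< 60", "60 to 63", ">= 63"),
       fill = c("red", "blue", "cyan"), bty = "n")
```

Since every bin lies entirely on one side of each cut-off, no bar needs two colours.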

How to tailor R plot axis?

I have the following data stored in a CSV file.
NodeID,pageRank
0,0.0327814931593
1,0.384378430034
2,0.342932804288
3,0.0390870921
4,0.0808856932345
5,0.0390870921
I have read the CSV file in R and ordered pageRank values in descending order.
data <- read.csv("pagerank.csv")
data <- data[order(-data$pageRank),]
After ordering, the data look like this:
1 0.38437843
2 0.34293280
4 0.08088569
3 0.03908709
5 0.03908709
0 0.03278149
In the above example, the first column represents NodeID (not sequentially ordered) and the second column represents pageRank (in descending order). Next I used the following command to plot the data.
plot(data$pageRank, type="o", col="red", xlab="Node Rank", ylab="PageRank Value")
The plot shows the Y-axis (pageRank values) properly. However, the X-axis shows sequential numbers (0,1,2,3,4,5). Instead of sequential numbers, how can I show the NodeID values (1,2,4,3,5,0) from the data on the X-axis while maintaining the descending order of pageRank? I have tried the following, but it does not maintain pageRank's descending order.
plot(data$NodeID, data$pageRank, type="o", col="red", xlab="NodeID", ylab="PageRank Value")
In ggplot, we assemble the plot from layers connected with the plus operator +.
So we can start by defining the dataset to plot (data), then use the aes function to specify which variables to use for the x and y axes. Finally we tell it to plot both points and a line using this data.
library(ggplot2)
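# `data` is already sorted by descending pageRank; fixing the factor levels
# to that order keeps the NodeIDs in rank order on the x-axis
ggplot(data = data,
       aes(x = factor(NodeID, levels = NodeID), y = pageRank, group = 1)) +
  geom_point() +
  geom_line() +
  xlab('Node Rank') + ylab('PageRank Value')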
Simple! I highly recommend using ggplot whenever possible over the limited and obtusely designed base R graphics.
You can use fct_reorder from the forcats package to do the ordering for you.
library(magrittr)
library(tidyverse)
library(forcats)
txt <- "NodeID,pageRank
0,0.0327814931593
1,0.384378430034
2,0.342932804288
3,0.0390870921
4,0.0808856932345
5,0.0390870921"
df <- read_csv(txt)
# Convert NodeID column to factor first
df %<>%
  mutate(NodeID = factor(NodeID))
# Plot
ggplot(df, aes(pageRank, fct_reorder(NodeID, pageRank))) +
  geom_point() +
  geom_segment(aes(y = NodeID, yend = NodeID, x = 0, xend = pageRank), color = "red") +
  scale_x_continuous(expand = c(0, 0)) +
  ylab("Node Rank") +
  xlab("PageRank Value") +
  theme_classic(base_size = 16)
ggplot(df, aes(y = pageRank, x = fct_reorder(NodeID, -pageRank))) +
  geom_line(group = 1, color = "red") +
  scale_y_continuous(expand = c(0, 0)) +
  xlab("Node Rank") +
  ylab("PageRank Value") +
  theme_classic(base_size = 16)
Using base R plot
df$NodeID <- factor(df$NodeID, levels = c("1", "2", "4", "3", "5", "0"))
plot(df$pageRank ~ df$NodeID, xlab = "Node Rank", ylab = "PageRank Value")
Created on 2018-08-08 by the reprex package (v0.2.0.9000).
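To keep the original line plot from the question in base R, another option is to plot against the rank position and relabel the axis with the NodeID values. A minimal sketch, reading the question's data from an inline string rather than from pagerank.csv:

```r
data <- read.csv(text = "NodeID,pageRank
0,0.0327814931593
1,0.384378430034
2,0.342932804288
3,0.0390870921
4,0.0808856932345
5,0.0390870921")
data <- data[order(-data$pageRank), ]

# Plot against rank position and suppress the default x-axis ...
rank_pos <- seq_len(nrow(data))
plot(rank_pos, data$pageRank, type = "o", col = "red",
     xaxt = "n", xlab = "NodeID", ylab = "PageRank Value")
# ... then label each position with the corresponding NodeID
axis(1, at = rank_pos, labels = data$NodeID)
```

The points stay in descending pageRank order while the tick labels read 1, 2, 4, 3, 5, 0.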

Sorting a subset and/or maintaining file order within ggplot facets

I'm creating a forest plot of effect sizes (denoted as or), faceted by their type (X or Y), using ggplot2.
I would like for the first line of each facet to be the summary effect size (cite=='Summary') for that facet, followed by a row per study, sorted by effect size (I don't particularly care if it's ascending or descending). Although I can easily create a dataframe corresponding to this, I can't seem to get it to plot in order in both facets without sorting the summary effect size too.
Please assume that there are too many datapoints to manually specify the order in which they should appear- below is a minimally representative subsample.
cite <- c("A","B","C","B","A")  # kept as character so "Summary" rows can be appended later
or <- c(8.132075,3.475255,5.727273,4.334704,4.009901)
lowerCI <- c(4.6841118,1.5059889,-0.5582456,2.3612416,-2.6439191)
upperCI <- c(11.580039,5.444521,12.012791,6.308167,10.663721)
type <- as.factor(c("X","X","X","Y","Y"))
df <- data.frame(cite, or, lowerCI, upperCI, type, stringsAsFactors = FALSE)
df <- df[order(df$type, -xtfrm(df$or)), ] # Sorting within type by or
Adding rows for summaries to the end of the dataset so they're unsorted:
X.row <- list(cite="Summary",or=3.506705,lowerCI=1.5375528,upperCI=5.475857,type="X")
df[nrow(df) + 1, names(X.row)] <- X.row
Y.row <- list(cite="Summary",or=4.332824,lowerCI=2.3594369,upperCI=6.306212,type="Y")
df[nrow(df) + 1, names(Y.row)] <- Y.row
Using code to maintain file order based on this answer:
df <- transform(df,cite=factor(cite,levels=unique(cite)))
Plotting attempt:
plot<-ggplot(data=df, aes(y=cite, x=or, xmin=lowerCI, xmax=upperCI, shape = type)) +
  geom_point(color = 'black', size=2)+
  geom_errorbarh(height=.1)+
  geom_point(data=subset(df,cite=='Summary'), color='black', size=5)+
  facet_grid(type~., scales= 'free', space='free')+
  scale_y_discrete(breaks=levels(df$cite),
                   labels=c(levels(df$cite)[1:3], expression(italic("Summary Effect"))))
Results:
The problem is that the first facet, but not the second facet, is appropriately sorted. So I tried an alternative suggested here:
Xdat<-subset(df, type=="X")
Ydat<-subset(df, type=="Y")
Xdat <- transform(Xdat,cite=factor(cite,levels=unique(cite)))
Ydat <- transform(Ydat,cite=factor(cite,levels=unique(cite)))
plot2<-ggplot(mapping=aes(y=cite, x=or, xmin=lowerCI, xmax=upperCI, shape = type)) +
  geom_point(data=Xdat,color = 'black', size=2)+
  geom_point(data=subset(Xdat,cite=='Summary'), color='black', size=7)+
  geom_errorbarh(data=Xdat,height=.1)+
  geom_point(data=Ydat,color = 'black', size=2)+
  geom_point(data=subset(Ydat,cite=='Summary'), color='black', size=7)+
  geom_errorbarh(data=Ydat,height=.1)+
  facet_grid(type~., scales= 'free', space='free')+
  scale_y_discrete(breaks=levels(df$cite),
                   labels=c(levels(df$cite)[1:3], expression(italic("Summary Effect"))))
This yields the same result. Any ideas?
ETA: I tried the solution posted here as recommended, and it maintains the sorting, but I'm now unable to get y-axis labels to display properly.
Creating the unique cite variable:
df$type <- factor(df$type, levels = c("X","Y"))
df$cite.type <- with(df, paste(cite, type, sep = "_"))
df$cite.type <- as.factor(df$cite.type)
df <- transform(df,cite.type=factor(cite.type,levels=unique(cite.type)))
df <- transform(df,cite=factor(cite,levels=unique(cite)))
Plotting (note that I cannot retain the y=reorder(cite.type,or) part because that would shift the summary effects):
ggplot(data=df, aes(y=cite.type, x=or, xmin=lowerCI, xmax=upperCI, shape = type)) +
  geom_point(color = 'black', size=2)+
  geom_errorbarh(height=.1)+
  geom_point(data=subset(df,cite=='Summary'), color='black', size=5)+
  facet_grid(type~., scales= 'free', space='free')+
  scale_y_discrete(breaks=levels(df$cite.type),
                   labels=c(levels(df$cite)[1:3], expression(italic("Summary Effect"))))
And here is the result:
Note that it's now sorted appropriately but the y axis labels are only printed once per cite.
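One possible way around the label problem, offered only as a sketch and not as a confirmed solution, is to keep the unique cite.type values as the y variable and pass scale_y_discrete a labelling function that strips the type suffix, so every break gets its own label. The helper `strip_type` is a made-up name, the data frame is rebuilt here with the summary rows included, and the plot is only drawn if ggplot2 is installed.

```r
df <- data.frame(
  cite    = c("A", "B", "C", "B", "A", "Summary", "Summary"),
  or      = c(8.132075, 3.475255, 5.727273, 4.334704, 4.009901, 3.506705, 4.332824),
  lowerCI = c(4.6841118, 1.5059889, -0.5582456, 2.3612416, -2.6439191, 1.5375528, 2.3594369),
  upperCI = c(11.580039, 5.444521, 12.012791, 6.308167, 10.663721, 5.475857, 6.306212),
  type    = c("X", "X", "X", "Y", "Y", "X", "Y"),
  stringsAsFactors = FALSE
)

# Within each type: summary row first, then studies by descending effect size
df <- df[order(df$type, df$cite != "Summary", -df$or), ]

# Unique y value per row; levels reversed because the first level of a
# discrete y scale is drawn at the bottom
df$cite.type <- paste(df$cite, df$type, sep = "_")
df$cite.type <- factor(df$cite.type, levels = rev(unique(df$cite.type)))

# Hypothetical helper: drop the "_X" / "_Y" suffix for display
strip_type <- function(x) sub("_[^_]*$", "", x)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(y = cite.type, x = or, xmin = lowerCI, xmax = upperCI,
                      shape = type)) +
    geom_point(size = 2) +
    geom_errorbarh(height = 0.1) +
    geom_point(data = subset(df, cite == "Summary"), size = 5) +
    facet_grid(type ~ ., scales = "free", space = "free") +
    scale_y_discrete(labels = strip_type)
  print(p)
}
```

The sort order lives in the factor levels, while the labelling function only changes what is printed, so the two concerns no longer interfere.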

How to plot lines for the count data in R?

I have data frame like this:
frame <- data.frame("AGE" = seq(18,44,1),
"GROUP1"= c(83,101,159,185,212,276,330,293,330,356,370,325,264,274,214,229,227,154,132,121,83,69,57,32,16,17,8),
"GROUP2"= c(144,210,259,329,391,421,453,358,338,318,270,258,207,186,173,135,106,92,74,56,41,31,25,13,16,5,8))
I want to plot AGE in X-axis and value of GROUP1 and GROUP2 in the Y-axis in the same plot with different colors. And the values should be joined by a smoothened line.
As a first part, I melted the data frame and plotted:
melt <- melt(frame, id.vars = "AGE")
melt <- melt[order(melt$AGE),]
plot(melt$AGE, melt$value)
Here is an alternative solution using dplyr and tidyr packages.
library(dplyr)
library(tidyr)
library(ggplot2)
newframe <- frame %>% gather("variable","value",-AGE)
ggplot(newframe, aes(x=AGE, y=value, color=variable)) +
  geom_point() +
  geom_smooth()
You could use geom_line() to get lines between the points, but it feels better to use geom_smooth() here. geom_area gives you a shaded area under the lines, but we need to change color to fill.
ggplot(newframe, aes(x=AGE, y=value, fill=variable)) + geom_area()
We can use matplot
# Pass AGE as x explicitly; with a matrix alone, matplot uses the row index
matplot(frame$AGE, as.matrix(frame[-1]), type = "l", lty = 1,
        xlab = "AGE", ylab = "value", col = c("red", "blue"))
legend("topright", inset = .05, legend = c("GROUP1", "GROUP2"),
       lty = 1, col = c("red", "blue"), horiz = TRUE)
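Try:
library(ggplot2)
library(reshape2)
# meltdf is the melted (long-format) version of frame from the question
meltdf <- melt(frame, id.vars = "AGE")
ggplot(meltdf,aes(x=AGE,y=value,colour=variable,group=variable)) + geom_line()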
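For a smoothed line without extra packages, a base R sketch could use lowess on each group and draw the result over the raw points. The span `f = 0.3` is an assumed value chosen for illustration, not something taken from the question.

```r
frame <- data.frame(
  AGE = seq(18, 44, 1),
  GROUP1 = c(83,101,159,185,212,276,330,293,330,356,370,325,264,274,214,229,
             227,154,132,121,83,69,57,32,16,17,8),
  GROUP2 = c(144,210,259,329,391,421,453,358,338,318,270,258,207,186,173,135,
             106,92,74,56,41,31,25,13,16,5,8))

# Locally weighted smoothing of each group (f is the smoother span)
sm1 <- lowess(frame$AGE, frame$GROUP1, f = 0.3)
sm2 <- lowess(frame$AGE, frame$GROUP2, f = 0.3)

plot(frame$AGE, frame$GROUP1, col = "red", pch = 1,
     xlab = "AGE", ylab = "value",
     ylim = range(frame$GROUP1, frame$GROUP2))
points(frame$AGE, frame$GROUP2, col = "blue", pch = 1)
lines(sm1, col = "red", lwd = 2)
lines(sm2, col = "blue", lwd = 2)
legend("topright", legend = c("GROUP1", "GROUP2"), col = c("red", "blue"),
       lty = 1, pch = 1, bty = "n")
```

A larger `f` gives a smoother curve; a smaller one follows the points more closely.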
