graphing multiple data series in R ggplot


I am trying to plot (on the same graph) two sets of data versus date from two different data frames. Both data frames have the same exact dates for each of the two measurements. I would like to plot these two sets of data on the same graph, with different colors. However, I can't get them on the same graph at all. R is already reading the date as date. I tried this:
qplot( date , NO3, data=qual.arn)
+ qplot( qual.arn$date , qual.arn$DIS.O2, "O2(aq)" , add=T)
and received this error.
Error in add_ggplot(e1, e2, e2name) :
argument "e2" is missing, with no default
I tried using the ggplot function instead of qplot, but I couldn't even plot one graph this way.
ggplot(date=qual.no3.s, aes(date,NO3))
Error: ggplot2 doesn't know how to deal with data of class uneval
PLEASE HELP. Thank you!

Since you didn't provide any data (please do so in future), here's a made-up dataset to demonstrate a solution. There are (at least) two ways to do this: the right way and the wrong way. Both yield equivalent results in this very simple case.
# set up minimum reproducible example
set.seed(1) # for reproducible example
dates <- seq(as.Date("2015-01-01"),as.Date("2015-06-01"), by=1)
df1 <- data.frame(date=dates, NO3=rpois(length(dates),25))
df2 <- data.frame(date=dates, DIS.O2=rnorm(length(dates),50,10))
ggplot is designed to use data in "long" format. This means that all the y-values (the concentrations) are in a single column, and there is a separate column which identifies the corresponding category ("NO3" or "DIS.O2" in your case). So first we merge the two data sets based on date, then use melt(...) to convert from "wide" (categories in separate columns) to "long" format. Then we let ggplot worry about legends, colors, etc.
library(ggplot2)
library(reshape2) # for melt(...)
# The right way: combine the data-sets, then plot
df.mrg <- merge(df1,df2, by="date", all=TRUE)
gg.df <- melt(df.mrg, id="date", variable.name="Component", value.name="Concentration")
ggplot(gg.df, aes(x=date, y=Concentration, color=Component)) +
geom_point() + labs(x=NULL)
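To make the "long" shape concrete, here is a minimal base-R sketch that stacks the two measurements by hand instead of using merge(...) and melt(...). The data are made up, and the column names Component and Concentration simply mirror the ones chosen above; it assumes both data frames cover the same dates.

```r
# Made-up data in the same shape as the example above (illustrative values only)
set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-01-05"), by = 1)
df1 <- data.frame(date = dates, NO3 = rpois(length(dates), 25))
df2 <- data.frame(date = dates, DIS.O2 = rnorm(length(dates), 50, 10))

# Long format: one row per (date, Component) pair, all y-values in one column
long <- rbind(
  data.frame(date = df1$date, Component = "NO3",    Concentration = df1$NO3),
  data.frame(date = df2$date, Component = "DIS.O2", Concentration = df2$DIS.O2)
)
str(long)
```

The resulting data frame has twice as many rows as either input and can be passed to ggplot with color=Component in the same way as the melted version.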
The "wrong" way to do this is by making separate calls to geom_point(...) for each layer. In your particular case this might be simpler, but in the long run it's better to use the other method.
# The wrong way: plot two sets of points
ggplot() +
geom_point(data=df1, aes(x=date, y=NO3, color="NO3")) +
geom_point(data=df2, aes(x=date, y=DIS.O2, color="DIS.O2")) +
scale_color_manual(name="Component",values=c("red", "blue")) +
labs(x=NULL, y="Concentration")
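One thing to watch with the layered approach: unnamed values in scale_color_manual are assigned in the order of the sorted labels, so it is easy to end up with the colors swapped. A sketch of the same plot with named values, which pins each color to its label (made-up data again, and the color choices are arbitrary):

```r
library(ggplot2)

# Made-up data mirroring the example above (illustrative values only)
set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2015-01-10"), by = 1)
df1 <- data.frame(date = dates, NO3 = rpois(length(dates), 25))
df2 <- data.frame(date = dates, DIS.O2 = rnorm(length(dates), 50, 10))

p <- ggplot() +
  geom_point(data = df1, aes(x = date, y = NO3, color = "NO3")) +
  geom_point(data = df2, aes(x = date, y = DIS.O2, color = "DIS.O2")) +
  # Named values tie each color to its legend label regardless of sort order
  scale_color_manual(name = "Component",
                     values = c("NO3" = "red", "DIS.O2" = "blue")) +
  labs(x = NULL, y = "Concentration")
p
```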

Related

geom_area doesn't show data, supposedly because of x-axis data

I want to create a stacked area plot based on a data frame.
Time <- c("W37/19","W38/19","W39/19","W40/19","W41/19")
Basis <- c(20.07,20.07,20.07,20.07,20.07)
AdStock <- c(5.88,5.60,5.34,5.09,4.86)
TV <- c(0,0,0.54,0.93,1.14)
Display <- c(0.07,0.21,0.33,0.35,0.36)
df_graph <- data.frame(Time, Basis, AdStock, TV, Display)
The data are time series data; "Time" holds German calendar weeks and should stay in this order.
The first thing I do is transform the data into long format.
library(tidyr)
df_graph <- pivot_longer(df_graph[,c("Time","Basis","AdStock","TV","Display")],-Time)
Second I convert df_graph$name to a factor and reverse the order, because I want to keep the original order for the stacking.
library(forcats)
df_graph$name <-factor(df_graph$name, levels = c("Basis","AdStock","TV","Display"))
df_graph$name <- fct_rev(df_graph$name)
Then I want to plot my data.
library(ggplot2)
p <- ggplot(df_graph, aes(x=Time, y=value, fill=name))
p <- p + geom_area()
p
The plot shows both axes as well as the legend but no data.
If I replace the calendar weeks in "Time" by just an ascending series of numbers
df_graph$Time <- seq(1:5)
it works, but not with my X-Axis values.
Also, I don't think that the conversion of "name" to a factor is the problem, because I still don't get data in my plot even if I remove these two lines.
I tried different methods for the Long-Format (e.g. gather) and also tried using the ascending series of numbers(1:5) as x-values and then replacing it with scale_x_discrete but my areas always disappear.
What am I missing?
Many thanks in advance.
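A likely cause, offered as a sketch rather than a confirmed answer: with a character (discrete) x-axis, ggplot groups the data by every combination of the discrete variables, here Time and name, so each group contains a single point and no area can be drawn. Setting the group aesthetic explicitly to name should bring the areas back. The object name df_long is only used here to keep the wide and long data apart.

```r
library(ggplot2)
library(tidyr)

Time <- c("W37/19", "W38/19", "W39/19", "W40/19", "W41/19")
Basis <- c(20.07, 20.07, 20.07, 20.07, 20.07)
AdStock <- c(5.88, 5.60, 5.34, 5.09, 4.86)
TV <- c(0, 0, 0.54, 0.93, 1.14)
Display <- c(0.07, 0.21, 0.33, 0.35, 0.36)
df_graph <- data.frame(Time, Basis, AdStock, TV, Display)

# Wide -> long, then fix the stacking order via the factor levels
df_long <- pivot_longer(df_graph, -Time)
df_long$name <- factor(df_long$name,
                       levels = rev(c("Basis", "AdStock", "TV", "Display")))

# group = name gives one group per series instead of one per (Time, name) pair
p <- ggplot(df_long, aes(x = Time, y = value, fill = name, group = name)) +
  geom_area()
p
```

Because the week labels here happen to sort alphabetically in chronological order, no extra factor conversion is needed for Time in this sample; for labels spanning a year boundary it probably would be.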

How do I put multiple boxplots in the same graph in R?

Sorry I don't have example code for this question.
All I want to know is if it is possible to create multiple side-by-side boxplots in R representing different columns/variables within my data frame. Each boxplot would also only represent a single variable--I would like to set the y-scale to a range of (0,6).
If this isn't possible, how can I use something like the panel option in ggplot2 if I only want to create a boxplot using a single variable? Thanks!
Ideally, I want something like the image below but without factor grouping like in ggplot2. Again, each boxplot would represent completely separate and single columns.

ggplot2 requires that the data to be plotted on the y-axis all be in one column.
Here is an example:
set.seed(1)
df <- data.frame(
value = runif(810,0,6),
group = 1:9
)
df
library(ggplot2)
ggplot(df, aes(factor(group), value)) + geom_boxplot() + coord_cartesian(ylim = c(0,6))
The coord_cartesian(ylim = c(0,6)) call sets the y-axis to run from 0 to 6.
If your data are in separate columns, you can get them into long form using melt from reshape2 or gather from tidyr (other methods are also available).

You can do this if you reshape your data into long format
## Some sample data
dat <- data.frame(a=rnorm(100), b=rnorm(100), c=rnorm(100))
## Reshape data wide -> long
library(reshape2)
long <- melt(dat)
plot(value ~ variable, data=long)
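If ggplot2 is not a requirement, base R's boxplot() also accepts a data frame directly and draws one box per column, so no reshaping is needed for the simple case. A minimal sketch with made-up data in the 0 to 6 range the question mentions:

```r
# Made-up data: three unrelated columns with values between 0 and 6
set.seed(1)
dat <- data.frame(a = runif(100, 0, 6),
                  b = runif(100, 0, 6),
                  c = runif(100, 0, 6))

# One box per column, with the y-axis fixed to the range 0..6
bp <- boxplot(dat, ylim = c(0, 6), ylab = "value")
```

The returned list (bp here) holds the computed statistics and the column names, which can be handy for checking what was drawn.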

Create Lollipop-like plot with R

I have a .csv file that looks like this:
Pos,ReadsME_016,ReadsME_017,ReadsME_018,ReadsME_019,ReadsME_020,ReadsME_021
95952794,62.36,62.06,55.56,51,60.35,44.27
95952795,100,100,100,100,100,100
95952833,0,0,-,0,-,-
95952846,0,0,-,0,0,-
95952876,0,-,0,0,0,0
95952877,38.89,28.98,25.67,36.99,37.91,16.86
95952878,100,100,100,100,100,100
95952884,0,-,0,-,-,0
95952897,18.7,20.52,20.94,16.43,22.68,12.55
95952898,100,100,75,80,-,100
95952902,10.88,8.93,10.22,10.63,13.51,6.06
95952903,100,100,100,75,-,100
95952915,10.75,8.7,7.91,8.35,15.12,8.88
What I want is to create a plot that is similar to this one:
http://www.scfbm.org/content/9/1/11/figure/F2
However, all my attempts failed. Unfortunately, the tool is not yet available and I cannot read the source code.
I've thought of ggplot and melt, but I do not come close to this graph. How can I get all read samples (ReadsME_016, ReadsME_017, ...) listed on the x-axis and the positions listed on the y-axis? I don’t know how to deal with both the x- and y-axes being categorical while the plotted values should show percentages.
dataset <- melt(dataset, id.vars="Pos")
ggplot(dataset, aes(x=value, y=Pos, colour=variable)) + geom_point()
Here is the complete .csv file:
Pos,ReadsME_016,ReadsME_017,ReadsME_018,ReadsME_019,ReadsME_020,ReadsME_021,ReadsME_022,ReadsME_023,ReadsME_024,ReadsME_025,ReadsME_026,ReadsME_027,ReadsME_028,ReadsME_030,ReadsME_031,ReadsME_032
95952794,62.36,62.06,55.56,51.0,60.35,44.27,53.73,61.69,57.04,64.16,61.48,59.42,66.93,49.71,55.23,66.67
95952795,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-
95952833,0.0,0.0,-,0.0,-,-,100.0,-,-,-,-,0.0,-,-,0.0,-
95952846,0.0,0.0,-,0.0,0.0,-,0.0,0.0,-,-,-,0.0,-,-,-,-
95952876,0.0,-,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-
95952877,38.89,28.98,25.67,36.99,37.91,16.86,29.65,35.38,35.43,36.87,34.04,33.91,35.04,19.09,38.35,0.0
95952878,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-
95952884,0.0,-,0.0,-,-,0.0,-,-,100.0,-,-,0.0,-,-,-,-
95952897,18.7,20.52,20.94,16.43,22.68,12.55,18.3,22.28,21.05,22.55,24.81,20.63,22.05,13.06,22.8,0.0
95952898,100.0,100.0,75.0,80.0,-,100.0,80.0,100.0,100.0,-,-,-,100.0,-,100.0,-
95952902,10.88,8.93,10.22,10.63,13.51,6.06,9.62,15.73,14.08,18.65,13.28,16.44,15.02,8.92,11.11,100.0
95952903,100.0,100.0,100.0,75.0,-,100.0,100.0,100.0,100.0,-,-,100.0,100.0,100.0,100.0,-
95952915,10.75,8.7,7.91,8.35,15.12,8.88,7.32,9.76,11.45,8.99,10.57,14.07,10.36,6.35,10.04,0.0
95952916,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-,-,100.0,100.0,-,100.0,-
95952925,10.39,8.33,8.59,10.51,14.19,10.99,6.98,11.56,13.93,15.0,14.29,16.26,9.76,5.86,12.96,0.0
95952926,100.0,100.0,100.0,100.0,-,100.0,100.0,100.0,100.0,-,-,-,100.0,-,100.0,-
95952937,19.53,14.97,11.97,14.43,19.26,17.18,19.48,12.31,21.17,21.57,23.08,26.24,16.38,13.47,21.82,0.0
95952938,100.0,100.0,100.0,100.0,-,100.0,100.0,-,-,-,-,-,-,-,100.0,-
95952825,-,0.0,-,-,-,-,-,-,-,-,0.0,-,-,0.0,0.0,-
95952975,-,0.0,-,-,-,-,-,-,0.0,-,-,-,-,-,-,-
95952669,-,-,0.0,-,-,0.0,0.0,-,-,-,-,-,-,-,0.0,-
95952718,-,-,0.0,0.0,0.0,-,0.0,-,-,-,0.0,-,-,0.0,0.0,-
95952868,-,-,0.0,-,0.0,-,-,0.0,-,-,0.0,-,-,-,-,-
95952957,-,-,0.0,-,-,-,-,0.0,0.0,0.0,-,0.0,-,-,-,-
95952976,-,-,0.0,-,0.0,0.0,0.0,100.0,-,0.0,-,-,-,-,0.0,-
95952681,-,-,-,0.0,-,0.0,-,0.0,-,-,-,-,-,0.0,-,-
95952779,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-,-,-
95952811,-,-,-,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-,-,-,0.0,-
95952821,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-,-,-
95952823,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-,-,-
95952859,-,-,-,0.0,0.0,-,-,0.0,0.0,-,0.0,-,-,0.0,0.0,-
95952882,-,-,-,0.0,-,-,-,-,-,-,0.0,-,-,-,-,-
95953023,-,-,-,0.0,-,0.0,-,-,-,-,-,-,-,-,-,-
95953058,-,-,-,0.0,-,0.0,-,-,-,-,-,-,-,-,-,-
95952664,-,-,-,-,-,0.0,0.0,-,-,0.0,-,-,-,-,0.0,-
95952801,-,-,-,-,-,0.0,-,-,-,-,-,-,-,-,-,-
95952968,-,-,-,-,-,-,0.0,-,-,0.0,-,-,-,-,-,-
95952797,-,-,-,-,-,-,-,-,0.0,-,-,-,-,-,-,-
95952851,-,-,-,-,-,-,-,-,-,-,0.0,-,-,-,-,-
95952894,-,-,-,-,-,-,-,-,-,-,0.0,-,-,-,-,-
95952807,-,-,-,-,-,-,-,-,-,-,-,-,-,0.0,-,-
95952712,-,-,-,-,-,-,-,-,-,-,-,-,-,-,0.0,-

First, you want to make sure you are reading in your data properly. You have non-numeric values (specifically "-") mixed in with numeric values. I'm assuming those are missing values. Make sure you let R know that with na.strings="-". Then, to get something more consistent with the example plot, I swapped your variables around:
library(reshape2) # for melt()
library(ggplot2) # for ggplot()
dataset <- read.table("file.txt", header=TRUE, sep=",", na.strings="-")
ggplot(melt(dataset, id.vars="Pos"),
aes(x=Pos, y=variable, colour=cut(value, breaks=5))) +
geom_point()
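To see what na.strings="-" changes, here is a small sketch that reads two rows of the sample data from an inline string (read.csv with the text argument) instead of a file. The object names are only for illustration.

```r
# Two rows taken from the sample data, first three sample columns only
csv_text <- paste(
  "Pos,ReadsME_016,ReadsME_017,ReadsME_018",
  "95952794,62.36,62.06,55.56",
  "95952833,0,0,-",
  sep = "\n"
)

# Without na.strings the "-" forces the whole column to be non-numeric
raw <- read.csv(text = csv_text)

# With na.strings = "-" the column stays numeric and "-" becomes NA
dataset <- read.csv(text = csv_text, na.strings = "-")
str(dataset)
```

Only with the second form can value be binned by cut(...) or mapped to a continuous color scale.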

R: Plot multiple box plots using columns from data frame

I would like to plot an INDIVIDUAL box plot for each unrelated column in a data frame. I thought I was on the right track with boxplot.matrix from the sfsmisc package, but it seems to do the same as boxplot(as.matrix(plotdata)), which is to plot everything in a shared boxplot with a shared scale on the axis. I want (say) 5 individual plots.
I could do this by hand like:
par(mfrow=c(2,2))
boxplot(data$var1)
boxplot(data$var2)
boxplot(data$var3)
boxplot(data$var4)
But there must be a way to use the data frame columns?
EDIT: I used iterations, see my answer.

You could use the reshape package to simplify things
data <- data.frame(v1=rnorm(100),v2=rnorm(100),v3=rnorm(100), v4=rnorm(100))
library(reshape)
meltData <- melt(data)
boxplot(data=meltData, value~variable)
or even use the ggplot2 package to make things nicer
library(ggplot2)
p <- ggplot(meltData, aes(factor(variable), value))
p + geom_boxplot() + facet_wrap(~variable, scales="free")

From ?boxplot we see that we have the option to pass multiple vectors of data as elements of a list, and we will get multiple boxplots, one for each vector in our list.
So all we need to do is convert the columns of our matrix to a list:
m <- matrix(1:25,5,5)
boxplot(x = as.list(as.data.frame(m)))
If you really want separate panels each with a single boxplot (although, frankly, I don't see why you would want to do that), I would instead turn to ggplot and faceting:
library(reshape2) # for melt()
m1 <- melt(as.data.frame(m))
library(ggplot2)
ggplot(m1,aes(x = variable,y = value)) + facet_wrap(~variable) + geom_boxplot()

I used iteration to do this. I think perhaps I wasn't clear in the original question. Thanks for the responses nonetheless.
par(mfrow=c(2,5))
for (i in seq_along(plotdata)) {
boxplot(plotdata[,i], main=names(plotdata)[i], type="l")
}
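A variation on the loop, as a sketch: rather than hard-coding the panel layout, n2mfrow() from grDevices can suggest a grid for the number of columns. The data frame below is made up and stands in for plotdata.

```r
# Made-up stand-in for plotdata: five unrelated numeric columns
set.seed(1)
plotdata <- data.frame(v1 = rnorm(50), v2 = rnorm(50), v3 = rnorm(50),
                       v4 = rnorm(50), v5 = rnorm(50))

# Let R pick a rows x columns layout that fits one panel per column
grid <- n2mfrow(ncol(plotdata))
old_par <- par(mfrow = grid)

# One separate boxplot per column, each with its own axis scale
for (nm in names(plotdata)) {
  boxplot(plotdata[[nm]], main = nm)
}

par(old_par)  # restore the previous layout
```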

Simple analog for plotting a line from a table object in ggplot2

I have been unable to find a simple analog for plotting a line graph from a table object in ggplot2. Given the elegance and utility of the package, I feel I must be missing something quite obvious. As an illustration consider a data frame with yearly observations:
dat <- data.frame(year=sample(2001:2010, 1000, replace=TRUE))
And a quick time series plot in base R:
plot(table(dat$year), type="l")
Switching to qplot returns the error "attempt to apply a non-function":
qplot(table(dat$year), geom="line")
ggplot2 requires a data frame. Fair enough. But this returns the same error.
qplot(year, data=dat, geom="line")
After some searching and fiddling, I abandoned qplot, and came up with the following approach which involves specifying a line geometry, binning the counts, and dropping final values to avoid plotting zeros.
ggplot(dat, aes(year) ) + geom_line(stat = "bin", binwidth=1, drop=TRUE)
It seems like rather a long walk around the block. And it is still not entirely satisfactory, since the bins don't align precisely with the mid-year values on the x-axis. Where have I gone wrong?

Maybe still more complicated than you want, but:
qplot(Var1,Freq,data=as.data.frame(table(dat$year)),geom="line",group=1)
(the group=1 is necessary because the Year variable (Var1) is returned as a factor ...)
If you didn't need it as a one-liner you could use ytab <- as.data.frame(table(dat$year)) first to extract the table and convert it to a data frame ...
Following Brian Diggs's answer, if you're willing to construct a bit more fortify machinery you can condense this a bit more:
A utility function that converts a factor to numeric if possible:
conv2num <- function(x) {
xn <- suppressWarnings(as.numeric(as.character(x)))
if (!all(is.na(xn))) xn else x
}
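A quick illustration of how conv2num behaves on a few made-up inputs. Note the edge case: a factor that is only partly numeric is still converted, with NA in place of the non-numeric entries.

```r
conv2num <- function(x) {
  xn <- suppressWarnings(as.numeric(as.character(x)))
  if (!all(is.na(xn))) xn else x
}

# Fully numeric labels are converted to numbers
years <- conv2num(factor(c("2001", "2002", "2003")))

# Labels with no numeric content are returned unchanged as a factor
labels <- conv2num(factor(c("low", "high")))

# Mixed labels: converted, but the non-numeric entry becomes NA
mixed <- conv2num(factor(c("2001", "n/a")))
```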
And a fortify method that turns the table into a data frame and then tries to make the columns numeric:
fortify.table <- function(x,...) {
z <- as.data.frame(x)
facs <- sapply(z,is.factor)
z[facs] <- lapply(z[facs],conv2num)
z
}
Now this works almost as you would like it to:
qplot(Var1,Freq,data=table(dat$year),geom="line")
(It would be nice/easier if there were a table option to preserve the numeric nature of cross-classifying factors ...)

Expanding on Ben's answer, the "standard" approach would be to create the data frame from the table, at which point you can convert the years back into numbers.
ytab <- as.data.frame(table(dat$year))
ytab$Var1 <- as.numeric(as.character(ytab$Var1))
Then either of the following will work:
ggplot(ytab, aes(Var1, Freq)) + geom_line()
qplot(Var1, Freq, data=ytab, geom="line")
The other approach is to create a fortify function which will transform the table into a data frame, and use that.
fortify.table <- as.data.frame.table
Then you can pass the table directly instead of a data frame. But Var1 is now still a factor and so you need group=1 to connect the line across years.
ggplot(table(dat$year), aes(Var1, Freq)) + geom_line(aes(group=1))
qplot(Var1, Freq, data=table(dat$year), geom="line", group=1)
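Another option that skips the table step altogether, sketched here under the assumption that year is stored as a number: let ggplot count the rows per year itself with stat = "count" and draw the result as a line.

```r
library(ggplot2)

# Made-up yearly observations, as in the question
set.seed(1)
dat <- data.frame(year = sample(2001:2010, 1000, replace = TRUE))

# stat = "count" tallies the rows at each distinct year; geom_line connects them
p <- ggplot(dat, aes(year)) + geom_line(stat = "count")
p
```

Since the counts are computed at the actual year values, the points sit on the years rather than on bin midpoints. Adding scale_x_continuous(breaks = 2001:2010) would label every year.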