R boxplot several variables at once - r

I have the following data set that I would like to make a boxplot from:
July<-c("Closed","Open")
Cistus<-c(10.8, 18.9)
CS<-c(2.004, 3.9)
Oak<-c(7.4, 12.4)
OS<-c(0.9,2.1)
df<-data.frame(July, Cistus, CS, Oak, OS)
I would like my boxplot to have Cistus and Oak on the x-axis, each with two boxes (Open and Closed), so 4 boxes in total.
I am epically failing at this... Please can you help me? I'm sorry for the basic question.

Here is a modification of Vincent's code but with the subsetting to the desired categories:
library(reshape2)
#reshape into long format
dfnew<-melt(df, "July")
#subset down to just Cistus and Oak
dfnew<-droplevels(dfnew[dfnew$variable %in% c("Cistus", "Oak"),])
#plot
boxplot(value ~ July+variable, data=dfnew, las=2, col=c("grey10", "grey50"))

I would use reshape2 to rearrange your data.frame into long format. Then you can use a formula in boxplot:
library(reshape2)
# response on the left of ~, grouping factors on the right
boxplot(value ~ July + variable, melt(df, id.vars = "July"))
With more than one value per group and some color:
df2 <- data.frame(July=rep(c("Closed", "Open"), each=5),
                  Cistus=runif(10),
                  CS=runif(10),
                  Oak=runif(10),
                  OS=runif(10))
boxplot(value ~ July + variable, melt(df2), col=c("grey10", "grey50"))
Is this what you're looking for?
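For comparison, here is a base R sketch of the same grouped boxplot without reshape2, using stack() to go from wide to long. The data frame `sim` and its random values are made up for illustration; the original df has only one value per group, so each of its boxes would collapse to a single line.

```r
# Hypothetical data: 5 observations per July level (random values, illustration only)
set.seed(1)
sim <- data.frame(July   = rep(c("Closed", "Open"), each = 5),
                  Cistus = runif(10, 5, 20),
                  Oak    = runif(10, 5, 15))

# stack() returns a long data frame with columns `values` and `ind`
# (`ind` holds the name of the column each value came from)
long <- cbind(July = sim$July, stack(sim[c("Cistus", "Oak")]))

# One box per July x species combination: 2 x 2 = 4 boxes
bp <- boxplot(values ~ July + ind, data = long, las = 2,
              col = c("grey10", "grey50"))
```

The formula has the same shape as in the answers above; only the column names differ, because stack() names them `values` and `ind`.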

Related

Graphing different variables in the same graph R- ggplot2

I have several datasets, and my end goal is a graph in which each line represents the yearly variation of one item. I have joined and combined my data (originally in a per-month structure) into a table that contains only the yearly means for each item I want to graph: one column for the year, and further columns with the yearly values for 4 different elements.
I have one factor, the year, and 4 different variables holding yearly values, and I would like to graph them in the same plot. My idea was to combine the 4 columns into one, keyed by year (one observation per row, with the year alongside it), but I seem unable to do that. My thought is that this would give structure to my y-axis. I would like some advice, and to know whether my approach to the problem is sound. I am trying ggplot2, but it does not seem to work without a defined (or predefined) y-axis range. Thanks
I would suggest the following approach. Reshape your data from wide to long, as in the example below, so that all variables can be mapped to a single y aesthetic. As no data was provided, this solution is sketched using dummy data. You can also swap the lines for another geom, such as points:
library(tidyverse)
set.seed(123)
#Data
df <- data.frame(year=1990:2000,
                 v1=rnorm(11,2,1),
                 v2=rnorm(11,3,2),
                 v3=rnorm(11,4,1),
                 v4=rnorm(11,5,2))
#Plot
df %>% pivot_longer(-year) %>%
  ggplot(aes(x=factor(year),y=value,group=name,color=name))+
  geom_line()+
  theme_bw()
We could use melt from reshape2 without loading multiple other packages
library(reshape2)
library(ggplot2)
ggplot(melt(df, id.var = 'year'), aes(x = factor(year), y = value,
group = variable, color = variable)) +
geom_line()
Or with matplot from base R
matplot(as.matrix(df[-1]), type = 'l', xaxt = 'n')
axis(1, at = seq_len(nrow(df)), labels = df$year) # label the suppressed x-axis with the years
data
set.seed(123)
df <- data.frame(year=1990:2000,
v1=rnorm(11,2,1),
v2=rnorm(11,3,2),
v3=rnorm(11,4,1),
v4=rnorm(11,5,2))

plotting multiple plot in R for different calendar date

I have about 20 years of daily data in a time series. It has columns Date, rainfall and other data.
I am trying to plot rainfall vs. time. I want 20 line plots in different colours in one graph, with a legend showing the years. I tried the following code, but it does not give me the desired result. Any suggestion to fix my issue would be most welcome.
library(ggplot2)
library(seas)
data(mscdata)
p<-ggplot(data=mscdata,aes(x=date,y=precip,group=year,color=year))
p+geom_line()+scale_x_date(labels=date_format("%m"),breaks=date_breaks("1 months"))
It doesn't look great, but here's a method. We first coerce the data into dates in the same year:
mscdata$dayofyear <- as.Date(format(mscdata$date, "%j"), format = "%j")
Then we plot:
library(ggplot2)
library(scales)
p <- ggplot(data = mscdata, aes(x = dayofyear, y = precip, group = year, color = year))
p + geom_line() +
scale_x_date(labels = date_format("%m"), breaks = date_breaks("1 months"))
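One detail of the "%j" round trip is worth checking: when only the day of year is parsed, R fills in the current year, so the result depends on when the code runs, and day 366 of a leap year may not map to a valid date in a non-leap year. A minimal sketch of an alternative that pins every date to a fixed leap year (2000 here, an arbitrary choice); the `dates` vector is made-up example input:

```r
# Hypothetical example dates, including a leap day and a leap-year Dec 31
dates <- as.Date(c("1999-03-01", "2000-02-29", "2000-12-31"))

# Keep month and day, replace the year with a fixed leap year
same_year <- as.Date(paste0("2000-", format(dates, "%m-%d")))
same_year
```

The same expression could be applied to mscdata$date to build the dayofyear column used above.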
While I agree with @Jaap that this may not be the best way to depict these data, try the following:
mscdata$doy <- as.numeric(strftime(mscdata$date, format="%j"))
ggplot(data=mscdata,aes(x=doy,y=precip,group=year)) +
geom_line(aes(color=year))
Although the given answers are good answers to your question as it stands, I don't think they will solve your problem. I think you should be looking at a different way to present the data. @Jaap already suggested using facets. Take for example this approach:
#first add a month column to your dataframe
mscdata$month <- format(mscdata$date, "%m")
#then plot it using boxplot with year on the X-axis and month as facet.
p1 <- ggplot(data = mscdata, aes(x = year, y = precip, group=year))
p1 + geom_boxplot(outlier.shape = 3) + facet_wrap(~month)
This will give you one graph per month, showing the rainfall for each year side by side. Because I use a boxplot, the peaks in rainfall show up as individual points ('normal' rain events fall inside the box).
Another possible approach would be to use stat_summary.
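A rough sketch of what the stat_summary route might look like, assuming the goal is one line per year showing the mean precipitation in each month. The `rain` data frame is simulated as a stand-in for mscdata, so its column names and values are assumptions; `fun =` requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Simulated stand-in for mscdata: three years of daily precipitation (made-up values)
set.seed(42)
dates <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
rain <- data.frame(
  date   = dates,
  year   = factor(format(dates, "%Y")),
  month  = as.integer(format(dates, "%m")),
  precip = rexp(length(dates), rate = 0.5)
)

# stat_summary() collapses the daily values to one mean per month within each year
p <- ggplot(rain, aes(x = month, y = precip, group = year, colour = year)) +
  stat_summary(fun = mean, geom = "line") +
  scale_x_continuous(breaks = 1:12)
print(p)
```

Summarising first keeps the number of plotted points small (12 per year), which tends to be easier to read than 20 overlaid daily series.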

How do I put multiple boxplots in the same graph in R?

Sorry I don't have example code for this question.
All I want to know is if it is possible to create multiple side-by-side boxplots in R representing different columns/variables within my data frame. Each boxplot would also only represent a single variable--I would like to set the y-scale to a range of (0,6).
If this isn't possible, how can I use something like the panel option in ggplot2 if I only want to create a boxplot using a single variable? Thanks!
Ideally, I want something like the image below but without factor grouping like in ggplot2. Again, each boxplot would represent completely separate and single columns.
ggplot2 requires that your data to be plotted on the y-axis are all in one column.
Here is an example:
set.seed(1)
df <- data.frame(
value = runif(810,0,6),
group = 1:9
)
df
library(ggplot2)
ggplot(df, aes(factor(group), value)) + geom_boxplot() + coord_cartesian(ylim = c(0,6))
coord_cartesian(ylim = c(0,6)) sets the visible y-axis range to 0 through 6 without dropping any data.
If your data are in columns, you can get them into the longform using melt from reshape2 or gather from tidyr. (other methods also available).
You can do this if you reshape your data into long format
## Some sample data
dat <- data.frame(a=rnorm(100), b=rnorm(100), c=rnorm(100))
## Reshape data wide -> long
library(reshape2)
long <- melt(dat)
plot(value ~ variable, data=long)
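If ggplot2 is not required, base R's boxplot() also accepts a data frame directly and draws one box per column, so no reshaping is needed. A minimal sketch with made-up data in the 0 to 6 range mentioned in the question:

```r
# Hypothetical data: three independent numeric columns
set.seed(1)
dat <- data.frame(a = runif(100, 0, 6),
                  b = runif(100, 0, 6),
                  c = runif(100, 0, 6))

# One box per column, y-axis fixed to 0..6
bp <- boxplot(dat, ylim = c(0, 6), ylab = "value")
```

The returned list (`bp` here) holds the box statistics, which can be handy for checking what was drawn.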

Adding a trend line to a scatterplot using R

I have a data set with number of people at a certain age (ranging from 0-105+), recorded in the period 1846-2014, and I am making a scatterplot of the summed amount of people by year; there's one data set for males and one for females. After that, I am going to add a trend line, but I am having problems figuring out how.
This is what I've got so far:
B <- as.matrix(read.table("clipboard"))
head(B)
age <- 0:105
y <- 1846:2014
plot(c(1846:2014), c(colSums(B)), col=3, xlab="Year", ylab="Summed age", main="Summed people")
This gives me the plot, but I am not sure how to add the trend line. Please help.
Plot looks like this: https://www.dropbox.com/s/5dono5bjrmqylcp/Plot.png?dl=0
Data available here:
https://www.ssb.no/statistikkbanken/SelectVarVal/Define.asp?subjectcode=01&ProductId=01&MainTable=FolkemEttAarig&SubTable=1&PLanguage=1&nvl=True&Qid=0&gruppe1=Hele&gruppe2=Hele&gruppe3=Hele&VS1=AlleAldre00B&VS2=Kjonn3&VS3=&mt=0&KortNavnWeb=folkemengde&CMSSubjectArea=befolkning&StatVariant=&checked=true
I downloaded your data file and posted it somewhere accessible.
urlsrc <- "http://www.math.mcmaster.ca/bolker/misc"
urlfn <- "201512516853914205393FolkemEttAarig.tsv"
d <- read.delim(url(paste(urlsrc,urlfn,sep="/")),header=TRUE,
check.names=FALSE)
dm <- d[,3:171]
y <- as.numeric(names(dm))
Now make the plot:
plot(y, colSums(dm),
col=3, xlab="Year", ylab="Summed age", main="Summed people")
abline(lm(colSums(dm) ~ y))
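The same lm() plus abline() pattern can be tried on simulated data if the download is unavailable. In this sketch the `pop` series is invented (a linear trend plus noise), so the numbers mean nothing beyond illustrating how the fitted slope is read off:

```r
# Simulated yearly totals: linear growth of 20000 per year plus noise (made-up values)
set.seed(1)
y   <- 1846:2014
pop <- 1.5e6 + 20000 * (y - 1846) + rnorm(length(y), sd = 50000)

fit <- lm(pop ~ y)        # ordinary least-squares trend
plot(y, pop, col = 3, xlab = "Year", ylab = "Summed people")
abline(fit)               # abline() reads intercept and slope from the fit
coef(fit)[["y"]]          # estimated change per year
```

Storing the fit in a variable rather than passing lm() straight to abline() makes it easy to inspect with summary(fit) afterwards.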
You can also do it like this:
library("tidyr")
library("ggplot2"); theme_set(theme_bw())
library("dplyr")
d2 <- gather(dm,year,pop,convert=TRUE)
d3 <- d2 %>% group_by(year) %>% summarise(pop=mean(pop))
ggplot(d3,aes(year,pop)) + geom_point() +
geom_smooth(method="lm")
There is a confidence interval around this trend line, but it's so narrow that it's hard to see.
Update: I accidentally used the mean instead of the sum in the second plot; replacing mean(pop) with sum(pop) in the summarise() call makes it match the first plot.

Visualize summary-statistics with R

My dataset looks similar to the one below (I have more variables, i.e. columns, and more observations):
dat=cbind(var1=c(100,20,33,400),var2=c(1,0,1,1),var3=c(0,1,0,0))
Now I want to create a bar graph with R where the x-axis shows the names of all the variables and the y-axis shows the mean of each variable.
As a second task, it would be great to show not only the mean but also the standard deviation in the same plot.
It would be nice to solve this with ggplot or qplot.
Thanks
Using base R:
dat <- cbind(var1=c(1,0.20,0.33,4),var2=c(1,0,1,1),var3=c(0,1,0,0))
dat <- as.data.frame(dat) # get this into a data frame as early as possible
barplot(sapply(dat,mean))
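The base R version can be extended to show the standard deviation as well. This is a sketch of one common way to do it, drawing error bars with arrows(); the choice of mean plus or minus one SD is an assumption about what the question wants.

```r
dat <- data.frame(var1 = c(1, 0.20, 0.33, 4),
                  var2 = c(1, 0, 1, 1),
                  var3 = c(0, 1, 0, 0))

means <- sapply(dat, mean)
sds   <- sapply(dat, sd)

# barplot() returns the x positions of the bar centres
mids <- barplot(means,
                ylim = c(min(0, means - sds), max(means + sds)),
                ylab = "mean +/- 1 SD")

# Flat-capped arrows in both directions act as error bars
arrows(mids, means - sds, mids, means + sds,
       angle = 90, code = 3, length = 0.05)
```

The ylim is widened by hand because barplot() only scales the axis to the bar heights, not to the error bars.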
Using ggplot
library(ggplot2)
library(reshape2) # for melt(...)
df <- melt(dat)
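ggplot(df, aes(x=variable,y=value)) +
  # fun.y was deprecated in ggplot2 3.3.0; use fun instead
  stat_summary(fun=mean,geom="bar",color="grey20",fill="lightgreen")+
  # mean_sdl needs the Hmisc package; mult=1 gives mean +/- 1 SD
  stat_summary(fun.data="mean_sdl",fun.args=list(mult=1))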
