I'm a beginner in R and I've been trying to work out how to plot this graphic.
I have 4 variables (station, % of gravel, % of sand, % of silt) for five places. I'm trying to plot the percentages of these 3 types of sediment (y) for each station (x), so there are five groups on the x axis and 3 bars per group.
Station % gravel % sand % silt
1 PRA1 28.430000 70.06000 1.507000
2 PRA3 19.515000 78.07667 2.406000
3 PRA4 19.771000 78.63333 1.598333
4 PRB1 7.010667 91.38333 1.607333
5 PRB2 18.613333 79.62000 1.762000
I tried plotting a grouped barchart with
grao <- read_excel("~/Desktop/Masters/Data/grao.xlsx")
colors <- c('#999999','#E69F00','#56B4E9','#94A813','#718200')
barplot(table(grao$Station, grao$`% gravel`, grao$`% sand`, grao$`% silt`), beside = TRUE, col = colors)
But this error message keeps happening:
'height' must be a vector or matrix
I also tried
ggplot(grao, aes(Station, color=as.factor(`% gravel`), shape=as.factor(`% sand`))) +
geom_bar() + scale_color_manual(values=c('#999999','#E69F00','#56B4E9','#94A813','#718200')) + theme(legend.position="top")
But it's creating a crazy graphic.
Could someone help me, please? I've been stuck for weeks now in this one.
Cheers
I think this may be what you are looking for:
#install.packages("tidyverse")
library(tidyverse)
df <- data.frame(
station = c("PRA1", "PRA3", "PRA4", "PRB1", "PRB2"),
gravel = c(28.4, 19.5, 19.7, 7.01, 18.6),
sand = c(70.06, 78.07, 78.63, 91, 79),
silt = c(1.5, 2.4, 1.6, 1.7, 1.66)
)
df2 <- df %>%
pivot_longer(cols = c("gravel", "sand", "silt"), names_to = "Sediment_Type", values_to = "Percentage")
ggplot(df2) +
geom_bar(aes(x = station, y = Percentage, fill = Sediment_Type ), stat = "identity", position = "dodge") +
theme_minimal() # theme_minimal() ships with ggplot2, which library(tidyverse) loads
You need to "pivot" your data set "longer". Part of the tidy way is ensuring each column represents a single variable. You will notice in your initial dataframe that the column names are really values of one variable ("Sediment_Type") and the cells of those columns are all values of another ("Percentage"). The function pivot_longer() takes a dataset and gathers the columns you select into just two: one holding the old column names (the identity) and one holding the values.
Once you've done this, ggplot will allow you to specify your x axis, and then a grouping variable via "fill". You can swap these two. If you end up with lots of data and grouping variables, faceting is also an option worth looking into!
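A minimal sketch of the faceting option mentioned above, assuming the same rounded values as in the df example. The object names long and p are only illustrative. Each station gets its own panel, with one bar per sediment type.

```r
library(tidyr)
library(ggplot2)

# Same illustrative values as in the data frame above
df <- data.frame(
  station = c("PRA1", "PRA3", "PRA4", "PRB1", "PRB2"),
  gravel  = c(28.4, 19.5, 19.7, 7.01, 18.6),
  sand    = c(70.06, 78.07, 78.63, 91, 79),
  silt    = c(1.5, 2.4, 1.6, 1.7, 1.66)
)

# One row per station/sediment combination: 5 stations x 3 types = 15 rows
long <- pivot_longer(df, cols = c(gravel, sand, silt),
                     names_to = "Sediment_Type", values_to = "Percentage")

# One panel per station instead of dodged bars
p <- ggplot(long, aes(x = Sediment_Type, y = Percentage, fill = Sediment_Type)) +
  geom_col() +
  facet_wrap(~ station) +
  theme_minimal()
print(p)
```

Whether dodged bars or facets read better probably depends on how many stations you end up with.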
Hope this helps,
Brennan
barplot wants a "matrix", ideally with both dimension names. You could transform your data like this (remove first column while using it for row names):
dat <- `rownames<-`(as.matrix(grao[,-1]), grao[,1])
You will see that barplot already does the tabulation for you. However, you could also use xtabs (table might not be the right function for your approach).
# dat <- xtabs(cbind(X..gravel, X..sand, X..silt) ~ Station, grao) ## alternatively
I would advise you to use proper variable names, since special characters are not the best idea.
colnames(dat) <- c("gravel", "sand", "silt")
dat
# gravel sand silt
# PRA1 28.430000 70.06000 1.507000
# PRA3 19.515000 78.07667 2.406000
# PRA4 19.771000 78.63333 1.598333
# PRB1 7.010667 91.38333 1.607333
# PRB2 18.613333 79.62000 1.762000
Then barplot knows what's going on.
.col <- c('#E69F00','#56B4E9','#94A813') ## pre-define colors
barplot(t(dat), beside=T, col=.col, ylim=c(0, 100), ## barplot
main="Here could be your title", xlab="sample", ylab="perc.")
legend("topleft", colnames(dat), pch=15, col=.col, cex=.9, horiz=T, bty="n") ## legend
box() ## put it in a box
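To see why the original call failed, here is a small sketch that assumes the columns have already been renamed to gravel, sand and silt. The names tab, n_dims and heights are made up for illustration. table() counts combinations of values, so with four arguments it returns a four-dimensional array, which barplot rejects.

```r
grao <- data.frame(
  Station = c("PRA1", "PRA3", "PRA4", "PRB1", "PRB2"),
  gravel  = c(28.430000, 19.515000, 19.771000, 7.010667, 18.613333),
  sand    = c(70.06000, 78.07667, 78.63333, 91.38333, 79.62000),
  silt    = c(1.507000, 2.406000, 1.598333, 1.607333, 1.762000),
  stringsAsFactors = FALSE
)

# table() cross-tabulates counts; four arguments give a 4-dimensional array
tab <- table(grao$Station, grao$gravel, grao$sand, grao$silt)
n_dims <- length(dim(tab))

# What barplot() can use: a numeric matrix with one row per sediment type
dat <- as.matrix(grao[, -1])
rownames(dat) <- grao$Station
heights <- t(dat)
dim(heights)
```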
Data:
grao <- read.table(text=" Station '% gravel' '% sand' '% silt'
1 PRA1 28.430000 70.06000 1.507000
2 PRA3 19.515000 78.07667 2.406000
3 PRA4 19.771000 78.63333 1.598333
4 PRB1 7.010667 91.38333 1.607333
5 PRB2 18.613333 79.62000 1.762000 ", header=TRUE)
I have two lists of 37 items each (I put 3 here as an example):
vacancy.locations <- c("Amsterdam", "Zuid Holland", "Utrecht")
count.locations <- c("11", "9", "40")
I bound these two lists together with locations <- cbind(vacancy.locations, count.locations) so that I could sort in descending order with sortedlocations <- locations[order(-count.locations),] and not lose the fact that 11 belonged to Amsterdam and 40 to Utrecht.
However, now I want to only keep the 10 locations with the highest count. Can anyone help me do that?
After this I want to plot the top 10 locations in a barplot. Currently I'm trying that with the sortedlocations, however I only get 1 bar in the chart with all locations combined.
barplotLocations <- barplot(height=sortedlocations, las=2, main="locations in vacancies", xlab="locations", ylab="number", cex.axis = .5, cex.names = .75)
Help? :)
Here is an easy way of doing it with ggplot2:
vacancy.locations <- letters
count.locations <- sample(1:1000, length(letters))
location = cbind.data.frame(vacancy.locations,count.locations)
location_sorted = location[order(-count.locations),]
top_location = location_sorted[1:10,]
top_location[,1] = factor(top_location[,1], levels = top_location[,1][order(top_location[,2])])
library(ggplot2)
ggplot(data=top_location, aes(x=vacancy.locations, y=count.locations)) +
geom_bar(stat="identity")
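For comparison, a base R sketch using the three example values from the question. The names locations, top and mids are only illustrative. cbind() on two vectors gives a character matrix here, so this version keeps the pairs in a data frame and converts the counts to numbers before sorting.

```r
vacancy.locations <- c("Amsterdam", "Zuid Holland", "Utrecht")
count.locations   <- c("11", "9", "40")

# Counts stored as text must be converted before sorting numerically
locations <- data.frame(location = vacancy.locations,
                        count = as.numeric(count.locations),
                        stringsAsFactors = FALSE)

# Sort by descending count and keep at most the first 10 rows
top <- head(locations[order(-locations$count), ], 10)

# height must be the numeric counts; the location names go in names.arg
mids <- barplot(top$count, names.arg = top$location, las = 2,
                main = "locations in vacancies", ylab = "number",
                cex.axis = .5, cex.names = .75)
```

head(..., 10) simply returns all rows when there are fewer than ten, so the same code should work on the full list of 37.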
First time question asker here. I wasn't able to find an answer to this question in other posts (love stackexchange, btw).
Anyway...
I'm creating a rarefaction curve via the vegan package and I'm getting a very messy plot that has a very thick black bar at the bottom of the plot which is obscuring some low diversity sample lines.
Ideally, I would like to generate a plot with all of my lines (169; I could reduce this to 144) but make a composite graph, coloring by Sample Year and making different types of lines for each Pond (i.e: 2 sample years: 2016, 2017 and 3 ponds: 1,2,5). I've used phyloseq to create an object with all my data, then separated my OTU abundance table from my metadata into distinct objects (jt = OTU table and sampledata = metadata). My current code:
jt <- as.data.frame(t(j)) # transpose it to make it compatible with the following commands
rarecurve(jt
, step = 100
, sample = 6000
, main = "Alpha Rarefaction Curve"
, cex = 0.2
, color = sampledata$PondYear)
# A very small subset of the sample metadata
Pond Year
F16.5.d.1.1.R2 5 2016
F17.1.D.6.1.R1 1 2017
F16.1.D15.1.R3 1 2016
F17.2.D00.1.R2 2 2017
Here is an example of how to plot a rarefaction curve with ggplot. I used data available in the phyloseq package available from bioconductor.
to install phyloseq (current Bioconductor releases use BiocManager; the older biocLite script is no longer supported):
install.packages("BiocManager")
BiocManager::install("phyloseq")
library(phyloseq)
other libraries needed
library(tidyverse)
library(vegan)
data:
mothlist <- system.file("extdata", "esophagus.fn.list.gz", package = "phyloseq")
mothgroup <- system.file("extdata", "esophagus.good.groups.gz", package = "phyloseq")
mothtree <- system.file("extdata", "esophagus.tree.gz", package = "phyloseq")
cutoff <- "0.10"
esophman <- import_mothur(mothlist, mothgroup, mothtree, cutoff)
extract OTU table, transpose and convert to data frame
otu <- otu_table(esophman)
otu <- as.data.frame(t(otu))
sample_names <- rownames(otu)
out <- rarecurve(otu, step = 5, sample = 6000, label = T)
Now you have a list in which each element corresponds to one sample.
Clean the list up a bit:
rare <- lapply(out, function(x){
b <- as.data.frame(x)
b <- data.frame(OTU = b[,1], raw.read = rownames(b))
b$raw.read <- as.numeric(gsub("N", "", b$raw.read))
return(b)
})
label list
names(rare) <- sample_names
convert to data frame:
rare <- map_dfr(rare, function(x){
z <- data.frame(x)
return(z)
}, .id = "sample")
Let's see how it looks:
head(rare)
sample OTU raw.read
1 B 1.000000 1
2 B 5.977595 6
3 B 10.919090 11
4 B 15.826125 16
5 B 20.700279 21
6 B 25.543070 26
plot with ggplot2
ggplot(data = rare)+
geom_line(aes(x = raw.read, y = OTU, color = sample))+
scale_x_continuous(labels = scales::scientific_format())
vegan plot:
rarecurve(otu, step = 5, sample = 6000, label = T) #low step size because of low abundance
One can add another column of groupings and color according to that.
Here is an example of how to add another grouping. Let's assume you have a table of the form:
groupings <- data.frame(sample = c("B", "C", "D"),
location = c("one", "one", "two"), stringsAsFactors = F)
groupings
sample location
1 B one
2 C one
3 D two
where samples are grouped according to another feature. You could use lapply or map_dfr to go over groupings$sample and add the matching label to rare as a new column, loc.
rare <- map_dfr(groupings$sample, function(x){ #loop over samples
z <- rare[rare$sample == x,] #subset rare according to sample
loc <- groupings$location[groupings$sample == x] #subset groupings according to sample, if more than one grouping repeat for all
z <- data.frame(z, loc) #make a new data frame with the subsets
return(z)
})
head(rare)
sample OTU raw.read loc
1 B 1.000000 1 one
2 B 5.977595 6 one
3 B 10.919090 11 one
4 B 15.826125 16 one
5 B 20.700279 21 one
6 B 25.543070 26 one
Let's make a decent plot out of this:
ggplot(data = rare)+
geom_line(aes(x = raw.read, y = OTU, group = sample, color = loc))+
geom_text(data = rare %>% #here we need coordinates of the labels
group_by(sample) %>% #first group by samples
summarise(max_OTU = max(OTU), #find max OTU
max_raw = max(raw.read)), #find max raw read
aes(x = max_raw, y = max_OTU, label = sample), check_overlap = T, hjust = 0)+
scale_x_continuous(labels = scales::scientific_format())+
theme_bw()
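The per-sample loop used above to attach the grouping can probably be replaced by a single join. This is a sketch with invented toy numbers standing in for the real rarefaction table, and rare_loc is a hypothetical name. Note that the new column keeps the name location rather than loc.

```r
# Toy stand-in for the long rarefaction table (values invented for illustration)
rare <- data.frame(
  sample   = rep(c("B", "C", "D"), each = 2),
  OTU      = c(1, 5.98, 1, 5.5, 1, 5.9),
  raw.read = rep(c(1, 6), times = 3),
  stringsAsFactors = FALSE
)
groupings <- data.frame(sample = c("B", "C", "D"),
                        location = c("one", "one", "two"),
                        stringsAsFactors = FALSE)

# One join replaces the loop; every row of rare picks up its sample's location
rare_loc <- merge(rare, groupings, by = "sample", all.x = TRUE)
head(rare_loc)
```

With several grouping columns (say Pond and Year), the same join brings them all in at once, and they can then be mapped to color and linetype in geom_line.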
I know this is an older question but I originally came here for the same reason and along the way found out that in a recent (2021) update vegan has made this a LOT easier.
This is an absolutely bare-bones example.
Ultimately we're going to be plotting the final result in ggplot so you'll have full customization options, and this is a tidyverse solution with dplyr.
library(vegan)
library(dplyr)
library(ggplot2)
I'm going to use the dune data within vegan and generate a column of random metadata for the site.
data(dune)
metadata <- data.frame("Site" = as.factor(1:20),
"Vegetation" = rep(c("Cactus", "None")))
Now we will run rarecurve, but provide the argument tidy = TRUE which will export a dataframe rather than a plot.
One thing to note here is that I have also used the step argument. The default step is 1, which means you get one row per individual per sample in your dataset, and that can make the resulting dataframe huge. Step = 1 for dune gave me over 600 rows. Increasing the step too much will make your curves blocky, so it will be a balance between dataframe size and resolution for a nice plot.
Then I piped the rarecurve output straight into a left join with the metadata.
dune_rare <- rarecurve(dune,
step = 2,
tidy = TRUE) %>%
left_join(metadata)
Now it will be plottable in ggplot, with a color/colour call to whatever metadata you attached.
From here you can customize other aspects of the plot as well.
ggplot(dune_rare) +
geom_line(aes(x = Sample, y = Species, group = Site, colour = Vegetation)) +
theme_bw()
I need to build a barplot of my data, showing bacterial relative abundance in different samples (each column should sum to 1 in the complete dataset).
A subset of my data:
> mydata
Taxon CD6 CD1 CD12
Actinomycetaceae;g__Actinomyces 0.031960309 0.066683743 0.045638509
Coriobacteriaceae;g__Atopobium 0.018691589 0.003244536 0.00447774
Corynebacteriaceae;g__Corynebacterium 0.001846083 0.006403689 0.000516662
Micrococcaceae;g__Rothia 0.001730703 0.000426913 0.001894429
Porphyromonadaceae;g__Porphyromonas 0.073497173 0.065915301 0.175406872
What I'd like to have is a bar for each sample (CD6, CD1, CD12), where the y values are the relative abundance of bacterial species (the Taxon column).
I think (but I'm not sure) my data format is not right to do the plot, since I don't have a variable to group by like in the examples I found...
ggplot(data) + geom_bar(aes(x=revision, y=added), stat="identity", fill="white", colour="black")
Is there a way to order my data making them right as input to this code?
Or how can I modify it?
Thanks!
Do you want something like this?
# sample data
df <- read.table(header=T, sep=" ", text="
Taxon CD6 CD1 CD12
Actinomycetaceae;g__Actinomyces 0.031960309 0.066683743 0.045638509
Coriobacteriaceae;g__Atopobium 0.018691589 0.003244536 0.00447774
Corynebacteriaceae;g__Corynebacterium 0.001846083 0.006403689 0.000516662
Micrococcaceae;g__Rothia 0.001730703 0.000426913 0.001894429
Porphyromonadaceae;g__Porphyromonas 0.073497173 0.065915301 0.175406872")
# convert wide data format to long format
require(reshape2)
df.long <- melt(df, id.vars="Taxon",
measure.vars=grep("CD\\d+", names(df), val=T),
variable.name="sample",
value.name="value")
# calculate proportions
require(plyr)
df.long <- ddply(df.long, .(sample), transform, value=value/sum(value))
# order samples by id
df.long$sample <- reorder(df.long$sample, as.numeric(sub("CD", "", df.long$sample)))
# plot using ggplot
require(ggplot2)
ggplot(df.long, aes(x=sample, y=value, fill=Taxon)) +
geom_bar(stat="identity") +
scale_fill_manual(values=scales::hue_pal(h = c(0, 360) + 15, # add manual colors
c = 100,
l = 65,
h.start = 0,
direction = 1)(length(unique(df$Taxon)))) # unique() also works when Taxon is read as character (the default since R 4.0)
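A base R sketch of the same stacked plot, assuming the abundances sit in a numeric matrix with taxa as rows. The names abund, prop and cols are made up for illustration. As in the code above, the rescaling only matters for this subset, since the complete dataset should already sum to 1 per column.

```r
abund <- matrix(
  c(0.031960309, 0.066683743, 0.045638509,
    0.018691589, 0.003244536, 0.00447774,
    0.001846083, 0.006403689, 0.000516662,
    0.001730703, 0.000426913, 0.001894429,
    0.073497173, 0.065915301, 0.175406872),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("Actinomycetaceae;g__Actinomyces", "Coriobacteriaceae;g__Atopobium",
      "Corynebacteriaceae;g__Corynebacterium", "Micrococcaceae;g__Rothia",
      "Porphyromonadaceae;g__Porphyromonas"),
    c("CD6", "CD1", "CD12"))
)

# Rescale each column (sample) so that it sums to 1
prop <- prop.table(abund, margin = 2)

# A matrix gives one stacked bar per column, one segment per row
cols <- rainbow(nrow(prop))
barplot(prop, col = cols, ylab = "relative abundance",
        legend.text = rownames(prop),
        args.legend = list(x = "topright", cex = 0.6, bty = "n"))
```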
My question is certainly a repeat but I can't figure out how to achieve what I aim to make.
Here is my data:
v1=c(46.55172, 13.79310, 29.31034, 1.72414, 5.17241, 3.44828, 0.00000, 0.60241, 24.09639, 59.63855, 4.81928, 6.02410, 0.00000, 4.81928, 14.58333, 22.91667, 58.33333, 0.00000, 2.08333, 2.08333, 0.00000, 20.96774, 20.96774, 47.58065, 5.64516, 3.22581, 0.80645, 0.80645)
names(v1) = c('Simul','SE','Obs','CG','Double','LR','RM','Simul','SE','Obs','CG','Double','LR','RM','Simul','SE','Obs', 'CG','Double','LR','RM','Simul','SE','Obs','CG', 'Double','LR','RM')
The first 7 numbers correspond to a first "journal", the numbers from 8 to 14 correspond to a second "journal", etc.
The seven numbers of each journal have the names Simul, SE, Obs, CG, Double, LR, RM. I want these numbers to be the heights of seven bars in a barplot, and I want the 4 journals to be in the same window. My current script does so.
par(mfrow=c(2,2))
for (journal in 0:3){
if (journal == 0) { journal.name = 'American Naturalist'}
if (journal == 1) { journal.name = 'Animal Behaviour'}
if (journal == 2) { journal.name = 'Ecology Letters'}
if (journal == 3) { journal.name = 'Evolution'}
barplot(v1[((journal*7)+1):((journal*7)+7)],ylim=c(0,60),main=journal.name)
}
mtext('Frequency',padj=2,side=2,outer=T)
mtext('Articles Type',padj=-2,side=1,outer=T)
I now want to...
1) legend
... add a box (and the space for this box) on the right side in order to add a legend explaining the abbreviations (Simul, SE, Obs, etc.)
2) text angle
... write the abbreviations (Simul, SE, Obs, etc.) at an angle of 45°.
I guess the best way to achieve these things is to use ggplot but any answer types are welcome !
Thanks a lot !
For starters, I would recommend reshaping your current data (v1) to fit ggplot2
df = do.call("rbind",lapply(unique(names(v1)),function(x){v1[names(v1)==x]}))
rownames(df) = unique(names(v1))
colnames(df) = c("American Naturalist","Animal Behaviour","Ecology Letters","Evolution")
head(df)
American Naturalist Animal Behaviour Ecology Letters Evolution
Simul 46.55172 0.60241 14.58333 20.96774
SE 13.79310 24.09639 22.91667 20.96774
Obs 29.31034 59.63855 58.33333 47.58065
CG 1.72414 4.81928 0.00000 5.64516
Double 5.17241 6.02410 2.08333 3.22581
LR 3.44828 0.00000 2.08333 0.80645
Now, using reshape2 (load it first with library(reshape2)):
head(melt(df))
Var1 Var2 value
1 Simul American Naturalist 46.55172
2 SE American Naturalist 13.79310
3 Obs American Naturalist 29.31034
4 CG American Naturalist 1.72414
5 Double American Naturalist 5.17241
6 LR American Naturalist 3.44828
Next, a basic ggplot2 bar plot (after library(ggplot2)):
p = ggplot(melt(df)) + geom_bar(aes(x=Var1,y=value, fill=Var1), stat="identity") + facet_wrap(~Var2)
The angle of axis labels:
p <- p + theme(axis.text.x = element_text(angle = 45, hjust = 1)) # hjust = 1 anchors the label ends at the ticks
I guess you can build on this by looking at labs for adding explanations for the axis labels.
As @Aaron said, it might be better to flip the plot around:
p + coord_flip()
OK, for one thing, let's put your data in a matrix. Too hard to keep track in just a vector!
v2 <- matrix(v1, nrow=7)
rownames(v2) <- c('Simul','SE','Obs','CG','Double','LR','RM')
colnames(v2) <- c('American Naturalist','Animal Behaviour','Ecology Letters','Evolution')
v2
# American Naturalist Animal Behaviour Ecology Letters Evolution
# Simul 46.55172 0.60241 14.58333 20.96774
# SE 13.79310 24.09639 22.91667 20.96774
# Obs 29.31034 59.63855 58.33333 47.58065
# CG 1.72414 4.81928 0.00000 5.64516
# Double 5.17241 6.02410 2.08333 3.22581
# LR 3.44828 0.00000 2.08333 0.80645
# RM 0.00000 4.81928 0.00000 0.80645
You're probably right that ggplot or lattice are going to be preferred solutions; here's a lattice one.
library(lattice)
library(reshape2)
v3 <- melt(v2)
names(v3) <- c("Variable", "Journal", "Frequency")
barchart(Variable~Frequency|Journal, data=v3, as.table=TRUE)
Note that I've made the bars horizontal and that this way the labels for each bar can be easily read. This is preferred to putting them at an angle and giving your audience a pain in the neck. This also makes it possible to just use the full name of whatever those things are instead of just the abbreviations, rather than putting it in a legend and giving your audience whiplash.
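If you would still rather stay with base graphics and angled labels, as in the original question, one possible sketch is below. It relies on barplot returning the bar midpoints, suppresses the default names, and draws rotated text in the margin. The y offset of -2 is an eyeballed assumption, and types, journals and mids are illustrative names.

```r
v1 <- c(46.55172, 13.79310, 29.31034, 1.72414, 5.17241, 3.44828, 0.00000,
        0.60241, 24.09639, 59.63855, 4.81928, 6.02410, 0.00000, 4.81928,
        14.58333, 22.91667, 58.33333, 0.00000, 2.08333, 2.08333, 0.00000,
        20.96774, 20.96774, 47.58065, 5.64516, 3.22581, 0.80645, 0.80645)
types    <- c('Simul', 'SE', 'Obs', 'CG', 'Double', 'LR', 'RM')
journals <- c('American Naturalist', 'Animal Behaviour', 'Ecology Letters', 'Evolution')
v2 <- matrix(v1, nrow = 7, dimnames = list(types, journals))

op <- par(mfrow = c(2, 2), mar = c(5, 4, 3, 1))
for (j in journals) {
  # axisnames = FALSE suppresses the default horizontal bar labels
  mids <- barplot(v2[, j], axisnames = FALSE, ylim = c(0, 60),
                  main = j, ylab = "Frequency")
  # Rotated labels just below the axis; xpd = TRUE allows drawing in the margin
  text(x = mids, y = -2, labels = types, srt = 45, adj = 1, xpd = TRUE)
}
par(op)
```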
After successfully performing a cast (using the reshape package) on a small data set I obtain the following data frame (e_disp), which is what I am looking for.
Date Code 200g
1 2010/06/01 cg4j 0.519880141
2 2010/09/19 7gv2 0.158999682
3 2011/04/14 zl94 0.294174203
4 2011/05/27 a13t 0.140232549
I wish to create a barplot in which the values in the 200g column are plotted as bars, with the date on the x-axis and each bar labelled with its code. (The code could also go on the x-axis, above or below the date.)
When I try, I get the following error:
"Error in barplot.default(e_disp) : 'height' must be a vector or a matrix"
So my questions are:
1) Can what I am trying to do be done after using 'cast'?
2) If so, any suggestions as to how to accomplish this?
Any help would be appreciated
This is quite easily done with ggplot2. Here is an example
# generate dummy data
mydf = data.frame(date = 1:5, code = letters[1:5], value = rpois(5, 40))
# plot it using ggplot2
library(ggplot2)
pl = ggplot(mydf, aes(x = date, y = value)) +
geom_bar(stat = 'identity') +
geom_text(aes(label = code), vjust = -1)
print(pl)
Is this what you are after:
dat <- read.table(textConnection("Date Code x200g
1 2010/06/01 cg4j 0.519880141
2 2010/09/19 7gv2 0.158999682
3 2011/04/14 zl94 0.294174203
4 2011/05/27 a13t 0.14023254"), header=TRUE, as.is=TRUE)
dat$Date <- as.Date(dat$Date)
Pasting the Date and Code columns separated by a linefeed ("\n") to make labels:
barplot(dat$x200g, names.arg=paste(dat$Date,"\n", dat$Code, sep=""), ylab=" ")
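To connect this back to the error in the question, here is a small sketch that rebuilds e_disp by hand, assuming the column really is named 200g as printed. The names msg and mids are illustrative. Passing the whole data frame fails because it is neither a vector nor a matrix, while passing the single numeric column works.

```r
e_disp <- data.frame(
  Date = c("2010/06/01", "2010/09/19", "2011/04/14", "2011/05/27"),
  Code = c("cg4j", "7gv2", "zl94", "a13t"),
  `200g` = c(0.519880141, 0.158999682, 0.294174203, 0.140232549),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Passing the whole data frame reproduces the error; capture its message
msg <- tryCatch({ barplot(e_disp); "no error" },
                error = function(e) conditionMessage(e))

# Passing only the numeric column works; backticks are needed for `200g`
mids <- barplot(e_disp$`200g`,
                names.arg = paste(e_disp$Date, e_disp$Code, sep = "\n"),
                ylab = "200g")
```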