Multiple series barplot - r

I have a matrix that looks like this (except with four numeric variables):
GeneId<- c("x","y","z")
Var1<- c(0,1,3)
Var2<- c(1,2,1)
df<-cbind(GeneId, Var1,Var2)
What I want to plot is a bar graph where each gene has one bar per variable, grouped together (i.e. x would have bar1 = height 0, bar2 = height 1).
I can do individual graphs by writing a loop and plotting each row:
for (i in seq_len(nrow(df))) {
  # df is a character matrix, so drop the GeneId column and convert to numeric
  barplot(as.numeric(df[i, -1]), main = df[i, "GeneId"])
}
But I would like to have all the bars on the same graph. Any ideas? I thought of using either ggplot2 or lattice, but from what I have seen they can only arrange the plots together in a grid, with axes that are independent of each other.

The simplest answer would be to use
barplot(rbind(Var1,Var2),col=c("darkblue","red"),beside = TRUE)
I recommend reading the documentation (?barplot) and experimenting with its arguments.
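As a minimal sketch of that kind of experimenting, the same call can be given a named matrix so that the groups and the legend are labelled. The object names counts and mids are only illustrative; the data are the ones from the question.

```r
# One row per variable, one column per gene (data from the question).
GeneId <- c("x", "y", "z")
counts <- rbind(Var1 = c(0, 1, 3), Var2 = c(1, 2, 1))
colnames(counts) <- GeneId

# beside = TRUE draws the rows side by side within each column group.
# barplot() invisibly returns the bar midpoints, one per bar.
mids <- barplot(counts, beside = TRUE,
                col = c("darkblue", "red"),
                legend.text = rownames(counts),
                args.legend = list(x = "topleft"),
                xlab = "GeneId", ylab = "value")
```

The returned midpoints are handy for adding text or error bars on top of individual bars later.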

Try this:
df=data.frame(GeneId=c("x","y","z"), Var1=c(0,1,3),Var2=c(1,2,1))
library(reshape2)
library(ggplot2)
df_ = melt(df, id.vars=c("GeneId"))
ggplot(df_, aes(GeneId, value, fill=variable)) +
  geom_bar(stat='identity', position=position_dodge())
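Since reshape2 is no longer actively developed, the same reshaping can probably also be done with tidyr. The following is a sketch under the assumption that tidyr and ggplot2 are installed; the name long is just illustrative, and geom_col() is used as shorthand for geom_bar(stat = "identity").

```r
library(tidyr)
library(ggplot2)

df <- data.frame(GeneId = c("x", "y", "z"),
                 Var1 = c(0, 1, 3),
                 Var2 = c(1, 2, 1))

# Wide -> long: one row per (GeneId, variable) pair.
long <- pivot_longer(df, cols = c(Var1, Var2),
                     names_to = "variable", values_to = "value")

# Dodged bars: one bar per variable within each gene.
p <- ggplot(long, aes(GeneId, value, fill = variable)) +
  geom_col(position = "dodge")
print(p)
```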

Related

How do I make my row names appear on my x axis? And the numbers from my variables appear on the y axis?

I created a data frame with countries as row names and percentages as the observations of the variables, but when I make a histogram the percentages end up on the x axis and the country names aren't there at all. How do I put the countries' names on the x axis and the variables' values on the y axis?
Country <- c('Albania','Armenia','Austria','Belarus','Belgium','Bosnia and Herzegovina','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Moldova','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Turkey','Ukraine','United Kingdom')
Anxiety.Disorders <- c(3.38,2.73,5.22,3.03,4.92,3.70,3.84,3.74,5.61,3.59,5.18,3.01,3.59,6.37,2.46,6.37,5.58,3.69,5.15,5.66,5.57,3.04,3.06,5.19,5.14,2.77,3.55,6.43,7.33,3.68,5.52,3.41,3.02,3.60,3.61,3.60,5.14,5.16,5.28,3.85,3.09,4.43)
Depressive.Disorders <- c(2.42,3.16,3.66,4.84,4.35,2.88,3.30,3.60,3.88,3.25,3.62,4.78,5.08,4.55,2.98,4.42,4.56,3.53,3.55,4.37,3.94,4.44,5.20,3.95,3.69,3.77,2.96,4.34,3.95,2.72,5.27,2.88,4.36,3.15,2.87,3.58,3.91,4.84,4.17,3.76,5.02,4.35)
Bipolar.Disorder <- c(0.72,0.77,0.95,0.73,0.91,0.79,0.67,0.77,1.04,0.75,0.99,0.71,0.99,0.93,0.67,0.79,0.93,0.74,0.97,0.80,0.95,0.71,0.73,0.95,0.97,0.67,0.74,0.94,0.85,0.76,0.97,0.78,0.70,0.74,0.76,0.75,0.97,1.04,0.98,0.85,0.73,1.05)
G08 <- data.frame(Country, Anxiety.Disorders, Depressive.Disorders, Bipolar.Disorder)
row.names(G08) <- G08$Country
G08[1] <- NULL
hist(G08$Anxiety.Disorders)
I use the melt() call to create one observation per row. Then, I use ggplot to produce the bar plot.
library(ggplot2)
library(reshape2)
Country <- c('Albania','Armenia','Austria','Belarus','Belgium','Bosnia-Herzegovina','Bulgaria','Croatia','Cyprus','Czechia','Denmark','Estonia','Finland','France','Georgia','Germany','Greece','Hungary','Iceland','Ireland','Italy','Latvia','Lithuania','Luxembourg','Malta','Moldova','Montenegro','Netherlands','Norway','Poland','Portugal','Romania','Russia','Serbia','Slovakia','Slovenia','Spain','Sweden','Switzerland','Turkey','Ukraine','United Kingdom')
Anxiety.Disorders <- c(3.38,2.73,5.22,3.03,4.92,3.70,3.84,3.74,5.61,3.59,5.18,3.01,3.59,6.37,2.46,6.37,5.58,3.69,5.15,5.66,5.57,3.04,3.06,5.19,5.14,2.77,3.55,6.43,7.33,3.68,5.52,3.41,3.02,3.60,3.61,3.60,5.14,5.16,5.28,3.85,3.09,4.43)
Depressive.Disorders <- c(2.42,3.16,3.66,4.84,4.35,2.88,3.30,3.60,3.88,3.25,3.62,4.78,5.08,4.55,2.98,4.42,4.56,3.53,3.55,4.37,3.94,4.44,5.20,3.95,3.69,3.77,2.96,4.34,3.95,2.72,5.27,2.88,4.36,3.15,2.87,3.58,3.91,4.84,4.17,3.76,5.02,4.35)
Bipolar.Disorder <- c(0.72,0.77,0.95,0.73,0.91,0.79,0.67,0.77,1.04,0.75,0.99,0.71,0.99,0.93,0.67,0.79,0.93,0.74,0.97,0.80,0.95,0.71,0.73,0.95,0.97,0.67,0.74,0.94,0.85,0.76,0.97,0.78,0.70,0.74,0.76,0.75,0.97,1.04,0.98,0.85,0.73,1.05)
G08 <- data.frame(Country, Anxiety.Disorders, Depressive.Disorders, Bipolar.Disorder)
G08melt <- melt(G08, "Country")
G08.bar <- ggplot(G08melt, aes(x = Country, y=value)) +
geom_bar(aes(fill=variable),stat="identity", position ="dodge") +
theme_bw()+
theme(axis.text.x = element_text(angle=-40, hjust=.1))
G08.bar
Looking at your question, I think you want a grouped bar chart rather than a histogram. You can draw it directly with the barplot function from the graphics package. barplot needs a numeric matrix rather than a data frame, so I first removed the first column (Country) from G08:
mat<-G08[,-1]
Now call barplot on the transpose of mat, and use the names parameter (short for names.arg) to write the country names on the x-axis:
barplot(t(mat),beside=TRUE,col=c('red','blue','gold'),border=NA,names=G08$Country,cex.names=0.45,las=2)
# legend() draws on the existing plot, so par(new=TRUE) is not needed here
legend('topright',c("Anxiety","Depressive","Bipolar"),fill=c("red","blue","gold"),cex=0.5,title='Disorder types')
Suggestion:
For a little more 'fresh air' in the graph, you can set beside=FALSE in barplot and get a stacked column diagram.
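A minimal sketch of the stacked variant, using only the first three countries from the data above to keep it short. The names small, bp and totals are illustrative, not part of the original answer.

```r
# First three countries only; rows are disorders, columns are countries.
small <- rbind(Anxiety    = c(3.38, 2.73, 5.22),
               Depressive = c(2.42, 3.16, 3.66),
               Bipolar    = c(0.72, 0.77, 0.95))
colnames(small) <- c("Albania", "Armenia", "Austria")

# beside = FALSE stacks the rows on top of each other in one bar per column.
bp <- barplot(small, beside = FALSE,
              col = c("red", "blue", "gold"), border = NA,
              las = 2, legend.text = TRUE)

# The total height of each stacked bar is the column sum.
totals <- colSums(small)
```

Note that a stacked bar shows the sum of the three percentages, which may or may not be a meaningful quantity for this data.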

How to Plot Bar Charts for a Categorical Variable Against an Analytical Variable in R

I'm struggling with how to do something with R that comes very easily to me in Excel: so I'm sure this is something quite basic but I'm just not aware of the equivalent method in R.
In essence, I have two variables in my dataset: a categorical variable which holds a list of names, and an analytical variable which holds the frequency corresponding to each observation.
Something like this:
Name Freq
==== =========
X 100
Y 200
and so on.
I would like to plot a bar chart with the names listed on the X-Axis (X, Y and so on) and bars of height corresponding to the relevant value of the Freq. variable for that observation.
This is something very trivial with Excel; I can just select the relevant cells and create a bar chart.
However, in R I just can't seem to figure out how to do this! Bar charts in R seem to be univariate only and don't behave the way I want them to. Trying to plot the two variables results in a scatter plot, which is not what I'm going for.
Is there something very basic I'm missing here, or is R just not capable of performing this task?
Any pointers would be very helpful.
Edited to Add:
I was primarily trying to use base R's plot function to get the job done.
Using plot(dataset1$Name, dataset1$Freq) does not produce a bar graph but a scatter plot instead.
First the data.
dat <- data.frame(Name = c("X", "Y"), Freq = c(100, 200))
With base R.
barplot(dat$Freq, names.arg = dat$Name)
If you want to display a long list of names.arg, maybe the best way is to customize your horizontal axis with function staxlab from package plotrix. Here are two example plots.
One, with the axis labels rotated 45 degrees.
set.seed(3)
Name <- paste0("Name_", LETTERS[1:10])
dat2 <- data.frame(Name = Name, Freq = sample(100:200, 10))
bp <- barplot(dat2$Freq)
plotrix::staxlab(1, at = bp, labels = dat2$Name, srt = 45)
Another, with the labels spread over 3 lines.
bp <- barplot(dat2$Freq)
plotrix::staxlab(1, at = bp, labels = dat2$Name, nlines = 3)
Add colors with argument col. See help("par").
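If plotrix is not available, rotated labels can also be drawn with base graphics alone. This is a sketch of a common idiom, not part of the original answer: draw the bars, then place the labels with text(). The offset of 5 units below the axis and the colour are arbitrary choices.

```r
set.seed(3)
Name <- paste0("Name_", LETTERS[1:10])
dat2 <- data.frame(Name = Name, Freq = sample(100:200, 10))

# Draw the bars without labels; bp holds the bar midpoints.
bp <- barplot(dat2$Freq, col = "steelblue")

# Place rotated labels just below the bottom of the plot region.
# xpd = TRUE allows drawing outside the plot region; adj = 1 right-aligns.
text(x = bp, y = par("usr")[3] - 5, labels = dat2$Name,
     srt = 45, adj = 1, xpd = TRUE, cex = 0.8)
```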
With ggplot2.
library(ggplot2)
ggplot(dat, aes(Name, Freq)) +
geom_bar(stat = "identity")
To add colors you have the aesthetics colour (for the contour of the bars) and fill (for the interior of the bars).
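A short sketch of those two aesthetics set to fixed values. The colours are arbitrary, and geom_col() is used here as shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

dat <- data.frame(Name = c("X", "Y"), Freq = c(100, 200))

# colour sets the bar outline, fill sets the bar interior.
p <- ggplot(dat, aes(Name, Freq)) +
  geom_col(colour = "black", fill = "steelblue")
print(p)
```

To colour the bars by a variable instead, map it inside aes(), for example aes(Name, Freq, fill = Name).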

Creating boxplot in same graph and scale with two different length of dataset

I have two sets of data of different lengths.
Sample data:
A=c(423,430,500,460,457,300,325,498,450,453,486,459)
B=c(300,325,356345,378,391,367)
I want to create box plots for them in the same graph and on the same scale. I tried ggplot2 in R, and I also tried the default boxplot in R:
boxplot (A~B)
but it gave an error. I would like to use ggplot2 in R.
You have to create a dataset with those 2 vectors and then plot.
library(ggplot2)
A=c(423,430,500,460,457,300,325,498,450,453,486,459)
B=c(300,325,356345,378,391,367)
# create a dataset for each vector
df_A = data.frame(value=A, id="A")
df_B = data.frame(value=B, id="B")
# combine datasets
df = rbind(df_A, df_B)
# create the box plot
ggplot(df, aes(id, value)) + geom_boxplot()
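For comparison, base R can draw the same picture without building a combined data frame. This is a sketch, not what the asker requested (they asked for ggplot2): boxplot(A~B) fails because a formula needs vectors of equal length, whereas a named list accepts vectors of any length. The values are copied from the question as given.

```r
A <- c(423, 430, 500, 460, 457, 300, 325, 498, 450, 453, 486, 459)
B <- c(300, 325, 356345, 378, 391, 367)

# A named list gives one box per element, all on a shared y axis.
# boxplot() invisibly returns the summary statistics it plotted.
stats <- boxplot(list(A = A, B = B), ylab = "value")
```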

Align X axis of scatterplot and boxplot

I'm superimposing two images in R. One image is a boxplot (using boxplot()), the other a scatterplot (using scatterplot()). I noticed a discrepancy in the scale along the x-axis. (A) is the boxplot scale. (B) is for the scatterplot.
What I've been trying to do is re-scale (B) to suit (A). I noticed that scatterplot has an argument called xlim. I tried it, but it didn't work. I also noticed that this question came up as I was typing mine: "Change Axis Label - R scatterplot".
I tried that too, and it didn't work either.
How can I modify the x-axis to change the scale from 1.0, 1.5, 2.0, 2.5, 3.0 to simply 1, 2, 3?
In Stata, I'm aware you can specify the x-axis range, and then indicate the step-ups between. For example, the range may be 0-100, and each measurable point would be set to 10. So you'd end up with 10, 20,....,100.
My R code, as it stands, looks something like this:
library(car)
boxplot(a,b,c)
par(new=T)
scatterplot(x, y, smooth=TRUE, boxplots=FALSE)
I've tried modifying scatterplot as such without any success:
scatterplot(x, y, smooth=TRUE, boxplots=FALSE, xlim=c(1,3))
As mentioned in the comments, use as.factor; then the x-axes should align. Here is a ggplot solution:
#dummy data
dat1 <- data.frame(group=as.factor(rep(1:3,4)),
var=c(runif(12)))
dat2 <- data.frame(x=as.factor(1:3),y=runif(3))
library(ggplot2)
library(grid)
library(gridExtra)
#plot points on top of boxplot
ggplot(dat1,aes(group,var)) +
geom_boxplot() +
geom_point(aes(x,y),dat2)
Or plot them as separate plots:
gg_boxplot <-
ggplot(dat1,aes(group,var)) +
geom_boxplot()
gg_point <-
ggplot(dat2,aes(x,y)) +
geom_point()
grid.arrange(gg_boxplot,gg_point,
ncol=1,
top="Plotting is easier with ggplot")
EDIT
Using xlim as suggested by @RuthgerRighart
#dummy data - no factors
dat1 <- data.frame(group=rep(1:3,4),
var=c(runif(12)))
dat2 <- data.frame(x=1:3,y=runif(3))
par(mfrow=c(2,1))
boxplot(var~group,dat1,xlim=c(1,3))
plot(dat2$x,dat2$y,xlim=c(1,3))
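To get tick marks at exactly 1, 2, 3 in base graphics, one option is to suppress the default x axis and draw it by hand. This is a sketch with dummy data; the xlim of c(0.5, 3.5) is an assumption, chosen to match the default range boxplot uses for three groups.

```r
set.seed(1)
dat2 <- data.frame(x = 1:3, y = runif(3))

# xaxt = "n" suppresses the default x axis (1.0, 1.5, 2.0, ...).
plot(dat2$x, dat2$y, xlim = c(0.5, 3.5), xaxt = "n",
     xlab = "group", ylab = "y")

# Draw ticks only at the integer positions.
ticks <- 1:3
axis(side = 1, at = ticks, labels = ticks)

# The plotted x range, for checking against the boxplot panel.
usr <- par("usr")
```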

Plotting each column of a dataframe as one line using ggplot

The whole dataset describes a module (or cluster if you prefer).
In order to reproduce the example, the dataset is available at:
https://www.dropbox.com/s/y1905suwnlib510/example_dataset.txt?dl=0
(54kb file)
You can read as:
test_example <- read.table(file='example_dataset.txt')
What I would like to have in my plot is this
On the plot, the x-axis is my Timepoints column, and the y-axis are the columns on the dataset, except for the last 3 columns. Then I used facet_wrap() to group by the ConditionID column.
This is exactly what I want, but the way I achieved this was with the following code:
plot <- ggplot(dataset, aes(x=Timepoints))
plot <- plot + geom_line(aes(y=dataset[,1],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,2],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,3],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,4],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,5],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,6],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,7],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,8],colour = dataset$InModule))
...
As you can see it is not very automated. I thought about putting in a loop, like
columns <- dim(dataset)[2] - 3
for (i in seq(1:columns))
{
plot <- plot + geom_line(aes(y=dataset[,i],colour = dataset$InModule))
}
(plot <- plot + facet_wrap( ~ ConditionID, ncol=6) )
That doesn't work.
I found this topic, "Use for loop to plot multiple lines in single plot with ggplot2", which corresponds to my problem.
I tried the solution given with the melt() function.
The problem is that when I use melt on my dataset, I lose the information in the Timepoints column that I need for my x-axis. This is what I did:
data_melted <- dataset
as.character(data_melted$Timepoints)
dataset_melted <- melt(data_melted)
I tried using aggregate
aggdata <-aggregate(dataset, by=list(dataset$ConditionID), FUN=length)
Now with aggdata at least I have the information on how many Timepoints for each ConditionID I have, but I don't know how to proceed from here and combine this on ggplot.
Can anyone suggest an approach?
I know I could use the ugly solution of creating new datasets in a loop with rbind (also given in that link), but I don't want to do that, as it sounds really inefficient. I want to learn the right way.
Thanks
You have to specify id.vars in your call to melt.data.frame to keep all information you need. In the call to ggplot you then need to specify the correct grouping variable to get the same result as before. Here's a possible solution:
library(reshape2)
library(ggplot2)
melted <- melt(dataset, id.vars=c("Timepoints", "InModule", "ConditionID"))
p <- ggplot(melted, aes(Timepoints, value, color = InModule)) +
  geom_line(aes(group=paste0(variable, InModule)))
p
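Because the linked dataset may not be at hand, here is a self-contained sketch of the same idea on a tiny made-up data frame. The columns s1 and s2 and all values are hypothetical stand-ins for the real series columns; only the three id column names come from the question. The facet_wrap() call mirrors the one in the question.

```r
library(reshape2)
library(ggplot2)

# Hypothetical stand-in: two series columns plus the three id columns.
dataset <- data.frame(
  s1 = c(1, 2, 3, 2, 3, 4),
  s2 = c(2, 1, 2, 3, 2, 3),
  Timepoints  = rep(1:3, times = 2),
  InModule    = rep(c("in", "out"), each = 3),
  ConditionID = rep(c("c1", "c2"), each = 3)
)

# id.vars are kept as columns; every other column is stacked into
# the 'variable' (column name) and 'value' (cell content) columns.
melted <- melt(dataset,
               id.vars = c("Timepoints", "InModule", "ConditionID"))

# One line per original column, one panel per condition.
p <- ggplot(melted, aes(Timepoints, value, colour = InModule)) +
  geom_line(aes(group = paste0(variable, InModule))) +
  facet_wrap(~ ConditionID)
print(p)
```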

Resources