Vioplot R: How to set axis labels - r

I have a dataframe mdata which looks like:
>head(mdata)
ID variable value
SJ5444_MAXGT coding 4.241920
SJ5426_MAXGT coding 4.254331
HR1383_MAXGT coding 4.244994
HR5522_MAXGT missense 4.250347
CH30041_MAXGT missense 4.303174
SJ5438_MAXGT utr.3 4.242218
and I am trying to plot a violin plot like this:
x1<- mdata$value[mdata$variable=='coding']
x2<- mdata$value[mdata$variable=='missense']
x3<- mdata$value[mdata$variable=='utr.3']
vioplot(x1, x2, x3, names=as.character(unique(mdata$variable)), col="red")
title("Violin Plot: Log10 values")
But I have another dataframe ndata which looks like:
>head(ndata)
ID variable value
SJ5444_MAXGT coding 17455
SJ5426_MAXGT coding 17961
HR1383_MAXGT coding 17579
HR5522_MAXGT missense 17797
CH30041_MAXGT missense 20099
SJ5438_MAXGT utr.3 17467
Basically mdata$value is:
mdata$value = log10(ndata$value)
So I can make the violin plot all right, but I need the Y-axis labels to show ndata$value rather than mdata$value: I am plotting mdata$value but want the labels taken from ndata$value. Just FYI, this is a subset of the actual data; the min and max values in the actual data are 12 and 36937. I know how to do this on a boxplot using:
axis(side=2,labels=round(10^(seq(log10(min(ndata$value)),log10(max(ndata$value)),len=5))),at=seq(log10(min(ndata$value)),log10(max(ndata$value)),len=5))
But I cannot plot the Y-axis labels to match ndata$value in the Violin plot. Any suggestions?
P.S. I could not find a tag vioplot or violinplot so I couldn't tag it.

vioplot isn't very flexible -- it doesn't allow you to turn off the axis labels or modify them -- but you can create your own empty plot first, then add the violin plot to it with vioplot(...,add=TRUE), then add the labels manually, as follows:
## make up data
set.seed(101)
x1 <- rlnorm(1000,meanlog=3,sdlog=1)
x2 <- rlnorm(1000,meanlog=3,sdlog=2)
x3 <- rlnorm(1000,meanlog=2,sdlog=2)
Now create the plot:
library(vioplot)
par(las=1,bty="l") ## my preferred setting
## set up empty plot
plot(0:1,0:1,type="n",xlim=c(0.5,3.5),ylim=range(log10(c(x1,x2,x3))),
axes=FALSE,ann=FALSE)
vioplot(log10(x1),log10(x2),log10(x3),add=TRUE)
axis(side=1,at=1:3,labels=c("first","second","third"))
axis(side=2,at=-2:4,labels=10^(-2:4))
Alternately, you could use ggplot2::geom_violin() along with scale_y_log10() (I think).
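The scale_y_log10() idea can be sketched as follows. This is only a sketch on made-up data: the data frame toy and its values are assumptions standing in for the raw (un-logged) ndata. The scale transforms the values before the density is estimated, so the violins have the same shape as a plot of log10(value), while the axis is labelled in the original units.

```r
library(ggplot2)

## made-up stand-in for ndata (raw, un-logged values); not the real data
set.seed(101)
toy <- data.frame(
  variable = rep(c("coding", "missense", "utr.3"), each = 200),
  value    = c(rlnorm(200, meanlog = 9, sdlog = 1),
               rlnorm(200, meanlog = 9, sdlog = 0.5),
               rlnorm(200, meanlog = 8, sdlog = 1))
)

## plot the raw values; the log10 scale takes care of the transformation
p <- ggplot(toy, aes(variable, value)) +
  geom_violin(colour = "black", fill = "red") +
  scale_y_log10()
print(p)
```

With this approach there is no need to compute breaks and labels by hand, although the default breaks may be sparse if the data span only a narrow range.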

Based on Ben Bolker's suggestion, I used ggplot2::geom_violin() and got what I wanted: the plot shows log10(value), but the Y-axis is labelled with the original values:
library(ggplot2)
## use the raw values here (ndata); mdata$value is already log10-transformed
brks <- seq(log10(min(ndata$value)), log10(max(ndata$value)), length.out=5)
## each "+" must end a line; a line that starts with "+" is not attached to the plot
ggplot(ndata, aes(variable, log10(value))) +
geom_violin(colour="black", fill="red") +
scale_y_continuous(breaks = brks, labels = round(10^brks))

Related

How to shade an area of the curve, with values from a different column in a data table that is not part of x and y?

I have a question about shading an area of a plot in R. I have a data table that looks like this:
Now, I want to plot x and y which I can do using plot(x,y, type ='l'). But the question is: How can I shade the area of the plot (from 0 to infinity in y axis) whenever my 'z' value in the data table is 1?
I would really appreciate any help.
Thank you so much.
I would suggest the following ggplot2 approach. Create a reference variable that holds x where z==1 (and NA elsewhere), so that it identifies the x-axis coordinates to shade, and then use geom_ribbon() as follows:
library(ggplot2)
#Data
df <- data.frame(x=2:9,
y=(2:9)^2,
z=c(rep(1,5),rep(0,3)))
#Create reference
df$Ref <- ifelse(df$z==1,df$x,NA)
#Plot
ggplot(df,aes(x=x,y=y))+
geom_line()+
geom_ribbon(aes(x=Ref,ymin=0,ymax=Inf),fill='blue',alpha=0.4)
Other options would be:
#Create reference
df$Ref <- ifelse(df$z==1,df$x,NA)
df$Ref1 <- ifelse(df$z==1,1,NA)
#Option 1
ggplot(df,aes(x=x,y=y))+
geom_line()+
geom_ribbon(aes(x=Ref,ymin=y,ymax=Inf),fill='blue',alpha=0.4)
#Option 2
ggplot(df,aes(x=x,y=y))+
geom_line()+
geom_ribbon(aes(ymin=Ref1,ymax=Inf),fill='blue',alpha=0.4)
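Since the question starts from plot(x, y, type='l'), a base-graphics version may also be useful. The following is a minimal sketch, not a tested general solution: it reuses the made-up df from above, assumes the rows are sorted by x, and uses rle() to find each run of z == 1, then draws one rectangle per run from the bottom of the plot region (or 0, whichever is higher) to the top.

```r
## same made-up data as in the ggplot2 example
df <- data.frame(x = 2:9,
                 y = (2:9)^2,
                 z = c(rep(1, 5), rep(0, 3)))

plot(df$x, df$y, type = "l")

## locate runs of consecutive z == 1 (assumes rows are ordered by x)
r      <- rle(df$z)
ends   <- cumsum(r$lengths)
starts <- ends - r$lengths + 1
on     <- r$values == 1

## shade each run up to the top of the plot region
usr <- par("usr")  # c(xmin, xmax, ymin, ymax) of the current plot
rect(df$x[starts[on]], max(0, usr[3]), df$x[ends[on]], usr[4],
     col = adjustcolor("blue", alpha.f = 0.4), border = NA)

## redraw the curve on top of the shading
lines(df$x, df$y)
```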

How to Plot Bar Charts for a Categorical Variable Against an Analytical Variable in R

I'm struggling with how to do something with R that comes very easily to me in Excel: so I'm sure this is something quite basic but I'm just not aware of the equivalent method in R.
In essence, I have two variables in my dataset: a categorical variable which has a list of names, and an analytical variable that has the frequency corresponding to that particular observation.
Something like this:
Name Freq
==== =========
X 100
Y 200
and so on.
I would like to plot a bar chart with the names listed on the X-Axis (X, Y and so on) and bars of height corresponding to the relevant value of the Freq. variable for that observation.
This is something very trivial with Excel; I can just select the relevant cells and create a bar chart.
However, in R I just can't seem to figure out how to do this! The bar charts in R seem to be univariate only and don't behave the way I want them to. Trying to plot the two variables results in a scatter plot, which is not what I'm going for.
Is there something very basic I'm missing here, or is R just not capable of performing this task?
Any pointers would be very helpful.
Edited to Add:
I was primarily trying to use base R's plot function to get the job done.
Using plot(dataset1$Name, dataset1$Freq) does not produce a bar graph but a scatter plot instead.
First the data.
dat <- data.frame(Name = c("X", "Y"), Freq = c(100, 200))
With base R.
barplot(dat$Freq, names.arg = dat$Name)
If you want to display a long list of names.arg, maybe the best way is to customize your horizontal axis with function staxlab from package plotrix. Here are two example plots.
One, with the axis labels rotated 45 degrees.
set.seed(3)
Name <- paste0("Name_", LETTERS[1:10])
dat2 <- data.frame(Name = Name, Freq = sample(100:200, 10))
bp <- barplot(dat2$Freq)
plotrix::staxlab(1, at = bp, labels = dat2$Name, srt = 45)
Another, with the labels spread over 3 lines.
bp <- barplot(dat2$Freq)
plotrix::staxlab(1, at = bp, labels = dat2$Name, nlines = 3)
Add colors with argument col. See help("par").
With ggplot2.
library(ggplot2)
ggplot(dat, aes(Name, Freq)) +
geom_bar(stat = "identity")
To add colors you have the aesthetics colour (for the contour of the bars) and fill (for the interior of the bars).
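As a small sketch of the colour options mentioned above, using the same two-row dat: the particular colours are arbitrary choices, and geom_col() is used as the shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

dat <- data.frame(Name = c("X", "Y"), Freq = c(100, 200))

## base R: col fills the bars, border draws the outline;
## barplot() returns the bar midpoints, handy for adding labels later
bp <- barplot(dat$Freq, names.arg = dat$Name,
              col = "steelblue", border = "black")

## ggplot2: fill is mapped to Name (one colour per bar),
## colour is set to a constant for the outline
p <- ggplot(dat, aes(Name, Freq)) +
  geom_col(aes(fill = Name), colour = "black")
print(p)
```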

Plotting lines in ggplot2 with given slope and intercept with legend

I have a dataframe containing columns for slope and intercept. I want to plot lines corresponding to each intercept and slope for a given range and assign them color using a 3rd column. As I don't know how to plot geom_abline without assigning a plot to ggplot first, I used other columns x and y to get a plot and adjust the axis scale to see the abline lines and hide x y plot. So far so good. However, when I try to use color for abline, I can see the lines in different colors but I don't see the legend labels. Here is my sample code:
slope <- c(rep(.28,5),rep(.26,5),rep(.27,5),rep(.28,5),rep(.24,5))
intercept <- c(rep(-2.5,5),rep(-1.7,5),rep(-1.63,5),rep(-1.5,5),rep(-1.58,5))
wf <- c(1,1,1,1,1,5,5,5,5,5,8,8,8,8,8,18,18,18,18,18,22,22,22,22,22)
x <- c(seq(1:5),seq(1:5),seq(1:5),seq(1:5),seq(1:5))
y <- c(seq(1:25))
df <- data.frame(cbind(slope,intercept,wf,x,y))
ggplot(data=df,
aes(x,y))+
geom_point()+
theme_bw() +
scale_y_continuous(limits=c(-3,0))+
geom_abline(data=df,aes(slope=slope,intercept=intercept,color=factor(wf)))
Here is the plot I get using the above code
I also tried to use
scale_color_manual(labels=c("1","5","8","18","22"))
to manually add the labels but it does not work.
Please help me to add labels or better how to plot lines with given slope and intercept to ggplot without having to fool around with other data.
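One possible approach, offered as a sketch only and assuming a reasonably recent ggplot2: start from an empty ggplot(), give geom_abline() a data frame with one row per line, and set the axis ranges explicitly with xlim() and ylim(), since the lines alone do not define them. The data frame lines_df below is an assumption; it just collects the distinct slope/intercept/wf combinations from the question. Mapping colour inside aes() to a factor is what should produce the legend.

```r
library(ggplot2)

## one row per line, using the distinct values from the question
lines_df <- data.frame(
  slope     = c(0.28, 0.26, 0.27, 0.28, 0.24),
  intercept = c(-2.5, -1.7, -1.63, -1.5, -1.58),
  wf        = factor(c(1, 5, 8, 18, 22))
)

p <- ggplot() +
  geom_abline(data = lines_df,
              aes(slope = slope, intercept = intercept, colour = wf),
              show.legend = TRUE) +
  xlim(0, 5) + ylim(-3, 0) +   # axis ranges must be given by hand
  labs(colour = "wf") +
  theme_bw()
print(p)
```

This avoids the dummy x and y columns and the hidden point layer.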

Align X axis of scatterplot and boxplot

I'm superimposing two images in R. One image is a boxplot (using boxplot()), the other a scatterplot (using scatterplot()). I noticed a discrepancy in the scale along the x-axis. (A) is the boxplot scale. (B) is for the scatterplot.
What I've been trying to do is re-scale (B) to match (A). I note there is an argument called xlim in scatterplot. Tried it, didn't work. I've also noted this example, which came up as I was typing out the question: Change Axis Label - R scatterplot.
Tried it, didn't work.
How can I modify the x-axis to change the scale from 1.0, 1.5, 2.0, 2.5, 3.0 to simply 1, 2, 3?
In Stata, I'm aware you can specify the x-axis range, and then indicate the step-ups between. For example, the range may be 0-100, and each measurable point would be set to 10. So you'd end up with 10, 20,....,100.
My R code, as it stands, looks something like this:
library(car)
boxplot(a,b,c)
par(new=T)
scatterplot(x, y, smooth=TRUE, boxplots=FALSE)
I've tried modifying scatterplot as such without any success:
scatterplot(x, y, smooth=TRUE, boxplots=FALSE, xlim=c(1,3))
As mentioned in the comments, use as.factor; then the x-axes should align. Here is a ggplot solution:
#dummy data
dat1 <- data.frame(group=as.factor(rep(1:3,4)),
var=c(runif(12)))
dat2 <- data.frame(x=as.factor(1:3),y=runif(3))
library(ggplot2)
library(grid)
library(gridExtra)
#plot points on top of boxplot
ggplot(dat1,aes(group,var)) +
geom_boxplot() +
geom_point(aes(x,y),dat2)
Plot as separate plots
gg_boxplot <-
ggplot(dat1,aes(group,var)) +
geom_boxplot()
gg_point <-
ggplot(dat2,aes(x,y)) +
geom_point()
grid.arrange(gg_boxplot,gg_point,
ncol=1,
top="Plotting is easier with ggplot") ## 'top' replaces 'main' in gridExtra >= 2.0
EDIT
Using xlim as suggested by @RuthgerRighart
#dummy data - no factors
dat1 <- data.frame(group=rep(1:3,4),
var=c(runif(12)))
dat2 <- data.frame(x=1:3,y=runif(3))
par(mfrow=c(2,1))
boxplot(var~group,dat1,xlim=c(1,3))
plot(dat2$x,dat2$y,xlim=c(1,3))
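If the goal is to superimpose the points on the boxplot in base graphics, a simpler route may be to add them to the existing plot with points() rather than opening a second plot with par(new=TRUE). This is a sketch on made-up data; it relies on boxplot() drawing the boxes at x = 1, 2, 3 by default, so numeric x values of 1 to 3 land on the matching boxes.

```r
## made-up data, no factors
set.seed(1)
dat1 <- data.frame(group = rep(1:3, 4), var = runif(12))
dat2 <- data.frame(x = 1:3, y = runif(3))

## boxes are drawn at x = 1, 2, 3
b <- boxplot(var ~ group, data = dat1, ylim = c(0, 1))

## add the points in the same coordinate system
points(dat2$x, dat2$y, pch = 19, col = "red")
```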

Plotting each column of a dataframe as one line using ggplot

The whole dataset describes a module (or cluster if you prefer).
In order to reproduce the example, the dataset is available at:
https://www.dropbox.com/s/y1905suwnlib510/example_dataset.txt?dl=0
(54kb file)
You can read as:
test_example <- read.table(file='example_dataset.txt')
What I would like to have in my plot is this
On the plot, the x-axis is my Timepoints column, and the y-axis are the columns on the dataset, except for the last 3 columns. Then I used facet_wrap() to group by the ConditionID column.
This is exactly what I want, but the way I achieved this was with the following code:
plot <- ggplot(dataset, aes(x=Timepoints))
plot <- plot + geom_line(aes(y=dataset[,1],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,2],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,3],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,4],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,5],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,6],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,7],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,8],colour = dataset$InModule))
...
As you can see it is not very automated. I thought about putting in a loop, like
columns <- dim(dataset)[2] - 3
for (i in seq(1:columns))
{
plot <- plot + geom_line(aes(y=dataset[,i],colour = dataset$InModule))
}
(plot <- plot + facet_wrap( ~ ConditionID, ncol=6) )
That doesn't work.
I found this topic
Use for loop to plot multiple lines in single plot with ggplot2 which corresponds to my problem.
I tried the solution given with the melt() function.
The problem is that when I use melt on my dataset, I lose information of the Timepoints column to plot as my x-axis. This is how I did:
data_melted <- dataset
as.character(data_melted$Timepoints)
dataset_melted <- melt(data_melted)
I tried using aggregate
aggdata <-aggregate(dataset, by=list(dataset$ConditionID), FUN=length)
Now with aggdata at least I have the information on how many Timepoints for each ConditionID I have, but I don't know how to proceed from here and combine this on ggplot.
Can anyone suggest an approach?
I know I could use the ugly solution of creating new datasets in a loop with rbind (also given in that link), but I don't want to do that, as it sounds really inefficient. I want to learn the right way.
Thanks
You have to specify id.vars in your call to melt.data.frame to keep all information you need. In the call to ggplot you then need to specify the correct grouping variable to get the same result as before. Here's a possible solution:
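library(reshape2) ## provides melt()
library(ggplot2)
## keep the identifier columns; every other column is stacked into variable/value
melted <- melt(dataset, id.vars=c("Timepoints", "InModule", "ConditionID"))
## one line per original column (and InModule level), coloured by InModule
p <- ggplot(melted, aes(Timepoints, value, color = InModule)) +
geom_line(aes(group=paste0(variable, InModule)))
p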
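To see what id.vars does without downloading the file, here is a self-contained sketch on a tiny made-up data frame. The measurement columns g1 and g2 and all values are hypothetical; only the three identifier column names come from the question. The facet_wrap() call is added on the assumption that the faceting from the original plot is still wanted.

```r
library(reshape2)
library(ggplot2)

## hypothetical stand-in for the real dataset
toy <- data.frame(
  g1 = c(1, 2, 3, 2, 3, 4),
  g2 = c(2, 1, 2, 3, 2, 3),
  Timepoints  = rep(1:3, 2),
  InModule    = rep(c("yes", "no"), each = 3),
  ConditionID = rep(c("A", "B"), each = 3)
)

## identifier columns are kept; g1 and g2 are stacked into variable/value
melted <- melt(toy, id.vars = c("Timepoints", "InModule", "ConditionID"))
head(melted)

p <- ggplot(melted, aes(Timepoints, value, colour = InModule)) +
  geom_line(aes(group = paste0(variable, InModule))) +
  facet_wrap(~ ConditionID)
print(p)
```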
