Plotting lines in ggplot2 with given slope and intercept with legend - r

I have a dataframe containing columns for slope and intercept. I want to plot lines corresponding to each intercept and slope for a given range and assign them color using a 3rd column. As I don't know how to plot geom_abline without assigning a plot to ggplot first, I used other columns x and y to get a plot and adjust the axis scale to see the abline lines and hide x y plot. So far so good. However, when I try to use color for abline, I can see the lines in different colors but I don't see the legend labels. Here is my sample code:
slope <- c(rep(.28,5),rep(.26,5),rep(.27,5),rep(.28,5),rep(.24,5))
intercept <- c(rep(-2.5,5),rep(-1.7,5),rep(-1.63,5),rep(-1.5,5),rep(-1.58,5))
wf <- c(1,1,1,1,1,5,5,5,5,5,8,8,8,8,8,18,18,18,18,18,22,22,22,22,22)
x <- c(seq(1:5),seq(1:5),seq(1:5),seq(1:5),seq(1:5))
y <- c(seq(1:25))
df <- data.frame(cbind(slope,intercept,wf,x,y))
ggplot(data=df,
aes(x,y))+
geom_point()+
theme_bw() +
scale_y_continuous(limits=c(-3,0))+
geom_abline(data=df,aes(slope=slope,intercept=intercept,color=factor(wf)))
Here is the plot I get using the above code
I also tried to use
scale_color_manual(labels=c("1","5","8","18","22"))
to manually add the labels but it does not work.
Please help me to add labels or better how to plot lines with given slope and intercept to ggplot without having to fool around with other data.

Related

Geom_area plot doesn't fill the area between the lines

I want to make an area plot with ggplot(mpg, aes(x=year,y=hwy, fill=manufacturer)) + geom_area(), but I get this:
I'm realy new in R world, can anyone explain why it does not fill the area between the lines? Thanks!
First of all, there's nothing wrong with your code. It's working as intended and you are correct in the syntax required to do what you are looking to do.
Why don't you get the area geom to plot correctly, then? Simple answer is that you don't have enough points to draw a proper line between your x values for all of the aesthetics (manufacturers). Try the geom_point plot and you'll see what I mean:
ggplot(mpg, aes(x=year,y=hwy)) + geom_point(aes(color=manufacturer))
You need a different dataset. Here's a dummy one that is simply two lines with different slopes. It works as expected because each of the aesthetics has y values which span the x labels:
# dummy dataset
df <- data.frame(
x=rep(1:10,2),
y=c(seq(1,10,length.out=10), seq(1,5,length.out=10)),
z=c(rep('A',10), rep('B', 10))
)
# plot
ggplot(df, aes(x,y)) + geom_area(aes(fill=z))

Adding group mean lines to geom_bar plot and including in legend

I want to be able to create a bar graph which shows also shows the mean value for bars in each group. AND shows the mean bar in the legend.
I have been able to get this graph Bar chart with means using the code below, which is fine, but I would like to be able to see the mean lines in the legend.
##The data to be graphed is the proportion of persons receiving a treatment
## (num=numerator) in each population (denom=demoninator). The population is
##grouped by two age groups and (Age) and further divided by a categorical
##variable V1
###SET UP DATAFRAME###
require(ggplot2)
df <- data.frame(V1 = c(rep(c("S1","S2","S3","S4","S5"),2)),
Age= c(rep(70,5),rep(80,5)),
num=c(5280,6570,5307,4894,4119,3377,4244,2999,2971,2322),
denom=c(9984,12600,9425,8206,7227,7290,8808,6386,6206,5227))
df$prop<-df$num/df$denom*100
PopMean<-sum(df$num)/sum(df$denom)*100
df70<-df[df$Age==70,]
group70mean<-sum(df70$num)/sum(df70$denom)*100
df80<-df[df$Age==80,]
group80mean<-sum(df80$num)/sum(df80$denom)*100
df$PopMean<-c(rep(PopMean,10))
df$groupmeans<-c(rep(group70mean,5),rep(group80mean,5))
I want the plot to look like this, but want the lines in the legend too, to be labelled as 'mean of group' or similar.
#basic plot
P<-ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity")
P
####add mean lines
P+geom_errorbar(aes(y=df$groupmeans, ymax=df$groupmeans,
ymin=df$groupmeans), col="red", lwd=2)
Adding show.legend=TRUE overlays the error bars onto the factor legend, rather than separately. If there is a way of showing geom_errorbar separately in the legend this is probably the simplest solution.
I have also tried various things with geom_line
The syntax below produces a line for the population mean value, but running from the centre of each point rather than covering the width of the bars
This produces a line for the population mean and it does produce a legend but one showing a bar of colour rather than a line.
P+geom_line(aes(y=df$PopMean, group=df$PopMean, color=df$PopMean),lwd=1)
If i try to do lines for group means the lines are not visible (because they are only single points).
P+geom_line(aes(y=df$groupmeans, group=df$groupmeans, color=df$groupmeans))
I also tried to get round this with facet plot, although this requires me to pretend my categorical variable is numeric to get it to work.
###set up new df
df2<-df
df2$V1<-c(rep(c(1,2,3,4,5),2))
P<-ggplot(df2, aes(x=factor(V1), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(),
colour='black',stat="identity",width=1)
P+facet_grid(.~factor(df2$Age))
P+facet_grid(.~factor(df2$Age))+geom_line(aes(y=df$groupmeans,
group=df$groupmeans, color=df$groupmeans))
Facetplot
This allows me to show the mean lines, using geom_line, so a legend does appear (although it doesn't look right, showing a colour gradient rather than coloured lines!). However, the lines still do not go the full width of the bars. Also my x-axis now needs relabelling to show S1, S2 etc rather than numeric 1,2,3
To sum up - is there a way of showing error bar lines separately in the legend?
If not, then, if i use facetting, how do I correct the legend appearance and relabel axes with my categorical variables and is is possible to get the line to go the full width of the plot?
Or is there an alternate solution that I am missing!?
Thanks
To get the legend for the geom_error you need to pass the colour argument in the aes.
As you want only one category (here red), I've create a dummy variable first
df$mean <- "Mean"
ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity") +
geom_errorbar(aes (ymax=groupmeans,
ymin=groupmeans, colour=mean), lwd=2) +
scale_colour_manual(name="",values = "#ff0000")

Align X axis of scatterplot and boxplot

I'm superimposing two images in R. One image is a boxplot (using boxplot()), the other a scatterplot (using scatterplot()). I noticed a discrepancy in the scale along the x-axis. (A) is the boxplot scale. (B) is for the scatterplot.
What I've been trying to do is re-scale (B) to suit (A). I note there is a condition called xlim in scatterplot. Tried it, didn't work. I've also noted this example came up as I was typing out the question: Change Axis Label - R scatterplot.
Tried it, didn't work.
How can I modify the x-axis to change the scale from 1.0, 1.5, 2.0, 2.5, 3.0 to simply 1,2,3.
In Stata, I'm aware you can specify the x-axis range, and then indicate the step-ups between. For example, the range may be 0-100, and each measurable point would be set to 10. So you'd end up with 10, 20,....,100.
My R code, as it stands, looks something like this:
library(car)
boxplot(a,b,c)
par(new=T)
scatterplot(x, y, smooth=TRUE, boxplots=FALSE)
I've tried modifying scatterplot as such without any success:
scatterplot(x, y, smooth=TRUE, boxplots=FALSE, xlim=c(1,3))
As mentioned in comments use as.factor, then xaxis should align. Here is ggplot solution:
#dummy data
dat1 <- data.frame(group=as.factor(rep(1:3,4)),
var=c(runif(12)))
dat2 <- data.frame(x=as.factor(1:3),y=runif(3))
library(ggplot2)
library(grid)
library(gridExtra)
#plot points on top of boxplot
ggplot(dat1,aes(group,var)) +
geom_boxplot() +
geom_point(aes(x,y),dat2)
Plot as separate plots
gg_boxplot <-
ggplot(dat1,aes(group,var)) +
geom_boxplot()
gg_point <-
ggplot(dat2,aes(x,y)) +
geom_point()
grid.arrange(gg_boxplot,gg_point,
ncol=1,
main="Plotting is easier with ggplot")
EDIT
Using xlim as suggested by #RuthgerRighart
#dummy data - no factors
dat1 <- data.frame(group=rep(1:3,4),
var=c(runif(12)))
dat2 <- data.frame(x=1:3,y=runif(3))
par(mfrow=c(2,1))
boxplot(var~group,dat1,xlim=c(1,3))
plot(dat2$x,dat2$y,xlim=c(1,3))

Vioplot R: How to set axis labels

I have a dataframe mdata which looks like:
>head(mdata)
ID variable value
SJ5444_MAXGT coding 4.241920
SJ5426_MAXGT coding 4.254331
HR1383_MAXGT coding 4.244994
HR5522_MAXGT missense 4.250347
CH30041_MAXGT missense 4.303174
SJ5438_MAXGT utr.3 4.242218
and I am trying to plot a violin plot like this:
x1<- mdata$value[mdata$variable=='coding']
x2<- mdata$value[mdata$variable=='missense']
x3<- mdata$value[mdata$variable=='utr.3']
vioplot(x1, x2, x3, names=as.character(unique(mdata$variable)), col="red")
title("Violin Plot: Log10 values")
But I have another dataframe ndata which looks like:
>head(ndata)
ID variable value
SJ5444_MAXGT coding 17455
SJ5426_MAXGT coding 17961
HR1383_MAXGT coding 17579
HR5522_MAXGT missense 17797
CH30041_MAXGT missense 20099
SJ5438_MAXGT utr.3 17467
Basically mdata$value is:
mdata$value = log10(ndata$value)
So I can make the Violin plot alright. But I need to change the Y-axis labels to match ndata$value and not mdata$value. I am plotting mdata$value but want the Y-axis labels taken from ndata$value. Just FYI, this is a subset of the actual data & min and max values in actual data are 12 & 36937 and I know how to plot it on a boxplot using:
axis(side=2,labels=round(10^(seq(log10(min(ndata$value)),log10(max(ndata$value)),len=5))),at=seq(log10(min(ndata$value)),log10(max(ndata$value)),len=5))
But I cannot plot the Y-axis labels to match ndata$value in the Violin plot. Any suggestions?
P.S. I could not find a tag vioplot or violinplot so I couldn't tag it.
vioplot isn't very flexible -- it doesn't allow you to turn off the axis labels or modify them -- but you can create your own empty plot first, then add the violin plot to it with vioplot(...,add=TRUE), then add the labels manually, as follows:
## make up data
set.seed(101)
x1 <- rlnorm(1000,meanlog=3,sdlog=1)
x2 <- rlnorm(1000,meanlog=3,sdlog=2)
x3 <- rlnorm(1000,meanlog=2,sdlog=2)
Now create the plot:
library(vioplot)
par(las=1,bty="l") ## my preferred setting
## set up empty plot
plot(0:1,0:1,type="n",xlim=c(0.5,3.5),ylim=range(log10(c(x1,x2,x3))),
axes=FALSE,ann=FALSE)
vioplot(log10(x1),log10(x2),log10(x3),add=TRUE)
axis(side=1,at=1:3,labels=c("first","second","third"))
axis(side=2,at=-2:4,labels=10^(-2:4))
Alternately, you could use ggplot2::geom_violin() along with scale_y_log10() (I think).
Based on Ben Bolker's suggestion, I used ggplot2::geom_violin() and achieved what I wanted, plotting log10(value) but labeling 'value' as such on the Y-axis using:
ggplot(mdata, aes(variable, log10(value))) + geom_violin(colour="black",fill="red")
+ scale_y_continuous(
breaks = seq(log10(min(mdata$value)),log10(max(mdata$value)),len=5),
labels = round(10^(seq(log10(min(mdata$value)),log10(max(mdata$value)),len=5)))
)

Problems making a graphic in ggplot

I an working with ggplot. I want to desine a graphic with ggplot. This graphics is with two continuous variables but I would like to get a graphic like this:
Where x and y are the continuous variables. My problem is I can't get it to show circles in the line of the plot. I would like the plot to have circles for each pair of observations from the continuous variables. For example in the attached graphic, it has a circle for pairs (1,1), (2,2) and (3,3). It is possible to get it? (The colour of the line doesn't matter.)
# dummy data
dat <- data.frame(x = 1:5, y = 1:5)
ggplot(dat, aes(x,y,color=x)) +
geom_line(size=3) +
geom_point(size=10) +
scale_colour_continuous(low="blue",high="red")
Playing with low/high will change the colours.
In general, to remove the legend, use + theme(legend.position="none")

Resources