How to change dots in forest plot? - r

I have an Excel table with the odds ratios (ORs) of different diseases for my study. I want to make a forest plot with the R package ggplot2, and I have used this script:
library(ggplot2)
df <- readxl::read_excel("excel.xlsx") # read the sheet (requires the readxl package)
fp <- ggplot(data=df, aes(x=Disease, y=OR, ymin=Lower, ymax=Upper)) +
geom_pointrange() +
geom_hline(yintercept=1, lty=2) + # add a dashed reference line at OR = 1 (vertical after the flip)
coord_flip() + # flip coordinates (puts the disease labels on the y axis)
xlab("Disease") + ylab("OR (95% CI)") +
theme_bw() # use a white background
print(fp)
This draws round black points for all diseases. I would like to change the shape of the points to squares (or another shape), but only for some diseases: the points corresponding to rows 6, 8, 14 and 16. The remaining points should stay as they are now.
Thank you in advance.
I have tried this script, but it only produces black points.

The example code is not reproducible as written, but I think you just need to map shape inside aes().
This question includes a complete example with multiple shapes
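A minimal sketch of that idea, assuming the sheet has the columns Disease, OR, Lower and Upper used in the question. The data frame below is made-up placeholder data, and the helper column marker is a hypothetical name introduced here. Shape 19 is the filled circle that geom_pointrange() uses by default and shape 15 is a filled square; scale_shape_manual() assigns them to the two marker values.

```r
library(ggplot2)

# Made-up placeholder data standing in for the Excel sheet (16 diseases)
set.seed(123)
or <- exp(rnorm(16, mean = 0, sd = 0.3))
df <- data.frame(
  # keep the diseases in row order rather than alphabetical order
  Disease = factor(paste("Disease", 1:16), levels = paste("Disease", 1:16)),
  OR      = or,
  Lower   = or * 0.7,
  Upper   = or * 1.4
)

# Hypothetical helper column: "square" for rows 6, 8, 14 and 16, "circle" otherwise
df$marker <- ifelse(seq_len(nrow(df)) %in% c(6, 8, 14, 16), "square", "circle")

fp <- ggplot(df, aes(x = Disease, y = OR, ymin = Lower, ymax = Upper)) +
  geom_pointrange(aes(shape = marker)) +
  # 19 = filled circle (the default), 15 = filled square; no legend needed
  scale_shape_manual(values = c(circle = 19, square = 15), guide = "none") +
  geom_hline(yintercept = 1, lty = 2) +
  coord_flip() +
  xlab("Disease") + ylab("OR (95% CI)") +
  theme_bw()
```

Calling print(fp) draws the plot; changing the numbers in values picks other shapes (for example 17 for a filled triangle).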

Related

How do I add intensity legend of colors after I plot using grid.raster()?

I am doing k-means clustering on a PNG image and have been plotting it using grid::grid.raster(image). I would like to add a legend that shows the intensity as a colour bar (from blue to red) marked with values, indicating the intensity in the image. (image is an array whose third dimension has length 3, holding the red, green and blue channels.)
I thought of using grid.legend() but couldn't figure it out, so I am hoping the community can help. After performing k-means clustering on the image I have been using, I want a legend beside it that displays intensity on a continuous colour bar.
I also tried ggplot2 and could plot the image, but still couldn't add the legend. Below is the ggplot code I use to plot the image. I can also extract the RGB channels separately with ggplot2, so a solution that works per channel would also help.
colassign <- rgb(Kmeans2@centers[clusters(Kmeans2),]) # cluster-centre colour for each pixel (flexclust kcca object)
library(ggplot2)
ggplot(data = imgVEC, aes(x = x, y = y)) +
geom_point(colour = colassign) +
labs(title = paste("k-Means Clustering of", kClusters, "Colours")) +
xlab("x") +
ylab("y")
I did not find a way to use grid.raster() properly, but I found a way to do it with ggplot2 by plotting the RGB channels separately. Note: this only works when plotting the channels as separate panels, but that is what I needed. The following code is for the green channel.
# RGB channels are stored in columns 1, 2 and 3 respectively.
# x-axis and y-axis values are stored in columns 4 and 5 (named x and y).
# original_img is an n x 5 matrix, so convert it to a data frame for ggplot().
ggplot(as.data.frame(original_img[,c(3,4,5)]), aes(x, y)) +
geom_point(aes(colour = segmented_img[,3])) +
# scale_color_distiller(palette="RdYlBu") can be used instead of scale_color_gradient2() to choose the colours via the palette argument.
scale_color_gradient2()
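If the colour-bar legend is the main goal, one alternative to grid.raster() (not from the original thread) is to draw the pixels with geom_raster() and a continuous fill scale; ggplot2 then draws a colour bar with values automatically. A sketch, assuming a single-channel intensity matrix; the matrix here is random placeholder data, and pixels is a hypothetical name.

```r
library(ggplot2)

# Made-up 40 x 60 intensity matrix standing in for one channel of the image
set.seed(1)
intensity <- matrix(runif(40 * 60), nrow = 40, ncol = 60)

# Long format, one row per pixel. as.vector() is column-major, which matches
# expand.grid(), where the first variable (row) varies fastest.
pixels <- expand.grid(row = seq_len(nrow(intensity)), col = seq_len(ncol(intensity)))
pixels$intensity <- as.vector(intensity)
pixels$x <- pixels$col
pixels$y <- nrow(intensity) + 1 - pixels$row  # row 1 at the top, as grid.raster() draws it

p <- ggplot(pixels, aes(x = x, y = y, fill = intensity)) +
  geom_raster() +
  # continuous fill scale: the legend is drawn as a colour bar with values
  scale_fill_gradient(low = "blue", high = "red", name = "Intensity") +
  coord_fixed()
```

For the clustered image, the intensity column would hold whatever per-pixel value the legend should describe (for example one channel of the cluster-centre colours).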

Legends and labelling smooth fitted lines + additional lines using ggplot2

I am working on visualising some patterns in network data and have some issues labelling lines, where I have multiple classes of lines:
1. loess lines for each factor (network)
2. a baseline at y=4000
3. a gam line that acts on all of the data (not factored)
Stack Overflow has helped me get to this point (thanks!), but I have run into a brick wall with what I still need to do:
A. provide a legend entry for line #3
B. label each line on the graph (as per #1, #2 and #3, so 8 lines in total)
Here is the code that I have so far:
library(ggplot2)
library(scales) # for date_breaks() and date_format()
p <- ggplot(network_data, aes(x=timeofday, y=dspeed, colour=factor(network))) +
  stat_smooth(method="loess", formula=y~x, se=FALSE)
p <- p + stat_function(fun=function(x) 4000, geom="line", linetype="dashed", aes(colour="Baseline"))
p <- p + xlab("Time of Day (hr)") + ylab("Download Speed (ms)")
p <- p + theme(axis.line=element_line(colour="black"))
# add the gam line over all of the data, colouring it purple for now
# (layer()'s stat_params/geom_params arguments no longer exist in current ggplot2;
# inherit.aes=FALSE stops the line from being split by network)
q <- stat_smooth(data=network_data, mapping=aes(x=timeofday, y=dspeed), method="gam", formula=y~s(x),
  se=FALSE, colour="purple", inherit.aes=FALSE)
graph <- p + q # add the layer
# legend
graph <- graph + scale_colour_discrete(name="network")
# set up the origin correctly and axes etc
graph2 <- graph +
  scale_y_continuous(limits=c(0,6500), expand=c(0,0), breaks=seq(0, 6000, by=1000)) +
  scale_x_datetime(limits=as.POSIXct(c("2015-04-13 00:00:01","2015-04-13 23:59:59")), expand=c(0,0),
    breaks=date_breaks("1 hour"), labels=date_format("%H"))
Happy to consider other packages, but ggplot2 seems to be the best so far.
Is there any way to do this automatically (programmatically), as I am trying to automate the generation of these graphs?
I have made the data available here as a .Rda file:
https://dl.dropboxusercontent.com/u/5268020/network_data.Rda
And here is an image of the current plot:
For question B, try annotate() and manually specify the position and text of the label for each line. That said, the labels seem unnecessary given the legend.
http://docs.ggplot2.org/current/annotate.html
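Because the question asks for this to be automatic, here is a sketch that combines annotate() for the fixed baseline label with geom_text() on a small data frame of computed label positions, one per network curve. It runs on made-up data (numeric hours rather than the POSIXct times in network_data.Rda), and label_df is a hypothetical helper name. The labels sit at the end of each curve because stats::loess() with its defaults matches the fit drawn by stat_smooth(method = "loess").

```r
library(ggplot2)

# Made-up stand-in for network_data: numeric hour instead of POSIXct timestamps
set.seed(42)
hours <- seq(0, 23, by = 0.5)
network_data <- data.frame(
  timeofday = rep(hours, times = 3),
  network   = rep(c("A", "B", "C"), each = length(hours))
)
network_data$dspeed <- 3000 + 800 * sin(2 * pi * network_data$timeofday / 24) +
  400 * (network_data$network == "B") + rnorm(nrow(network_data), sd = 150)

# One label per network, positioned at the end of that network's loess curve
label_df <- do.call(rbind, lapply(split(network_data, network_data$network), function(d) {
  fit <- loess(dspeed ~ timeofday, data = d)
  x_end <- max(d$timeofday)
  data.frame(network = d$network[1], timeofday = x_end,
             dspeed = unname(predict(fit, newdata = data.frame(timeofday = x_end))))
}))

p <- ggplot(network_data, aes(x = timeofday, y = dspeed, colour = network)) +
  stat_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  geom_hline(yintercept = 4000, linetype = "dashed") +
  # hand-placed label, as the answer suggests
  annotate("text", x = 0, y = 4000, label = "Baseline", hjust = 0, vjust = -0.5) +
  # labels computed from the data, so they follow the curves without manual placement
  geom_text(data = label_df, aes(label = network), hjust = -0.2, show.legend = FALSE) +
  expand_limits(x = 25)  # leave room for the labels on the right
```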

5 dimensional plot in r

I am trying to make a 5-dimensional plot in R. I am currently using the rgl package to plot my data in 4 dimensions, with 3 variables as the x, y and z coordinates and another variable as the color. I am wondering whether I can add a fifth variable with this package, for example as the size or the shape of the points. Here is an example of my data and my current code:
set.seed(1)
df <- data.frame(replicate(4,sample(1:200,1000,replace=TRUE)))
addme <- data.frame(replicate(1,sample(0:1,1000,replace=TRUE)))
df <- cbind(df,addme)
colnames(df) <- c("var1","var2","var3","var4","var5")
require(rgl)
plot3d(df$var1, df$var2, df$var3, col=as.numeric(df$var4), size=0.5, type='s',xlab="var1",ylab="var2",zlab="var3")
I hope it is possible to add the fifth dimension.
Many thanks,
Here is a ggplot2 option. I usually shy away from 3D plots as they are hard to interpret properly. I also almost never put in 5 continuous variables in the same plot as I have here...
ggplot(df, aes(x=var1, y=var2, fill=var3, color=var4, size=var5^2)) +
geom_point(shape=21) +
scale_color_gradient(low="red", high="green") +
scale_size_continuous(range=c(1,12))
While this is a bit messy, you can actually reasonably read all 5 dimensions for most points.
A better approach to multi-dimensional plotting opens up if some of your variables are categorical. If all your variables are continuous, you can turn some of them into categorical variables with cut() and then use facet_wrap() or facet_grid() to plot them.
For example, here I break var3 and var4 into quintiles and use facet_grid() on them. Note that I keep the color aesthetics as well, to highlight that turning a continuous variable into a categorical one is usually good enough in high-dimensional plots to get the key points across (notice that the fill and border colors are fairly uniform within any given grid cell):
df$var4.cat <- cut(df$var4, quantile(df$var4, (0:5)/5), include.lowest=TRUE)
df$var3.cat <- cut(df$var3, quantile(df$var3, (0:5)/5), include.lowest=TRUE)
ggplot(df, aes(x=var1, y=var2, fill=var3, color=var4, size=var5^2)) +
geom_point(shape=21) +
scale_color_gradient(low="red", high="green") +
scale_size_continuous(range=c(1,12)) +
facet_grid(var3.cat ~ var4.cat)
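The question also mentions shape as a possible fifth channel. In the example data var5 only takes the values 0 and 1, so mapping it to shape rather than size is an option; the sketch below does that. The choice of shapes 21 (circle) and 24 (triangle), which both have a fill and a border so that fill can still show var3, is an assumption of this sketch.

```r
library(ggplot2)

# Same example data as in the question
set.seed(1)
df <- data.frame(replicate(4, sample(1:200, 1000, replace = TRUE)))
addme <- data.frame(replicate(1, sample(0:1, 1000, replace = TRUE)))
df <- cbind(df, addme)
colnames(df) <- c("var1", "var2", "var3", "var4", "var5")

# var5 is binary, so map it to a discrete shape instead of size
p <- ggplot(df, aes(x = var1, y = var2, fill = var3, colour = var4,
                    shape = factor(var5))) +
  geom_point(size = 3) +
  scale_shape_manual(values = c("0" = 21, "1" = 24), name = "var5") +
  scale_colour_gradient(low = "red", high = "green")
```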

Can I add a third variable to graph with geom_rug?

I have a sports dataset that shows a team's result (win, draw or loss), the cumulative number of games played and the league standing. A simple plot of position by games played is produced thus:
library(ggplot2)
df<- data.frame(played=c(1:5),
result=c("W","L","D","D","L"),
position=c(1,3,4,4,5))
ggplot() +
geom_line(data=df,aes(x=played,y=position)) +
scale_y_reverse()
I would like to add a rug on the x axis with a different colour for each result, say W green, L red and D blue, but I cannot seem to solve it using geom_rug() or by adding a geom_bar().
This should do the trick:
## The data frame df is now inherited by
## the other geoms
ggplot(data=df,aes(x=played,y=position)) +
geom_line() +
scale_y_reverse() +
geom_rug(sides="b", aes(colour=result))
In the geom_rug() call, we specify that we only want a rug on the bottom (sides="b") and that the lines should be coloured according to the result. To change the colours, look at the scale_colour_* functions. For your particular colours, try:
+ scale_colour_manual(values=c(D="blue", L="red", W="green"))
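Putting the answer's code and the colour scale together gives the complete sketch below. The named values vector gives the same result as a positional one here (the levels sort as D, L, W), but it keeps each colour attached to its result even if the levels change.

```r
library(ggplot2)

df <- data.frame(played = 1:5,
                 result = c("W", "L", "D", "D", "L"),
                 position = c(1, 3, 4, 4, 5))

p <- ggplot(data = df, aes(x = played, y = position)) +
  geom_line() +
  scale_y_reverse() +
  # rug along the bottom only, coloured by result
  geom_rug(sides = "b", aes(colour = result)) +
  # named vector: each colour is tied to its result, independent of level order
  scale_colour_manual(values = c(W = "green", L = "red", D = "blue"))
```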

Make multiple ggplot have the same point colours in r

I need to show 3 ggplot scatterplots and one dendrogram on one page. How can I make the point colours the same in each scatter plot (i.e. I need the points for group two to be the same colour for all 3 graphs).
require(graphics)
require(ggplot2)
require(ggdendro)
#Scatter plots
df1<-data.frame(x=c(3,4,5),y=c(15,20,25),grp=c(1,2,2))
df1$grp =factor(df1$grp)
colnames(df1)[3]="Group"
p<-ggplot(df1,aes(x,y))
p<-p+ geom_point(aes(colour=factor(Group)),size=4)
p1<-p + coord_fixed()
df2<-data.frame(x=c(3,4,5,6),y=c(15,20,25,30),grp=c(1,2,2,3))
df2$grp =factor(df2$grp)
colnames(df2)[3]="Group"
p<-ggplot(df2,aes(x,y))
p<-p+ geom_point(aes(colour=factor(Group)),size=4)
p2<-p + coord_fixed()
df3<-data.frame(x=c(3,4,5,6,7),y=c(15,20,25,30,35),grp=c(1,2,2,3,4))
df3$grp =factor(df3$grp)
colnames(df3)[3]="Group"
p<-ggplot(df3,aes(x,y))
p<-p+ geom_point(aes(colour=factor(Group)),size=4)
p3<-p + coord_fixed()
#Dendrogram
dis <- hclust(dist(USArrests), "ave")
d<-as.dendrogram(dis)
ddata<-dendro_data(d,type="rectangle")
dp<-ggplot(segment(ddata)) + geom_segment(aes(x=x,y=y,xend=xend,yend=yend))
dp<-dp+geom_hline(aes(yintercept=50),colour="red")
I tried using the multiplot function:
multiplot(p1,p2,p3,dp,cols=2)
and got:
Bonus: the graphs all have a fixed aspect ratio, so the scatterplots are different sizes, which is fine, but I don't need the scatterplots to take up so much space. How can I control how much space each figure is given in the final figure?
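The thread has no answer here. One common approach (a sketch, not taken from the thread) is to give every scatterplot the same manual colour scale with a named values vector, so each group level maps to a fixed colour no matter which levels a particular data frame contains. group_colours and make_scatter are hypothetical names, the hex colours are arbitrary, and the data frames use a Group column directly, as the question's data have after renaming grp.

```r
library(ggplot2)

# Hypothetical fixed colour for every group that appears in any of the plots
group_colours <- c("1" = "#1B9E77", "2" = "#D95F02", "3" = "#7570B3", "4" = "#E7298A")

# Hypothetical helper: the question's scatterplot recipe plus a manual colour scale
make_scatter <- function(d) {
  ggplot(d, aes(x, y)) +
    geom_point(aes(colour = factor(Group)), size = 4) +
    scale_colour_manual(values = group_colours, name = "Group") +
    coord_fixed()
}

df1 <- data.frame(x = c(3, 4, 5), y = c(15, 20, 25), Group = factor(c(1, 2, 2)))
df2 <- data.frame(x = c(3, 4, 5, 6), y = c(15, 20, 25, 30), Group = factor(c(1, 2, 2, 3)))
df3 <- data.frame(x = c(3, 4, 5, 6, 7), y = c(15, 20, 25, 30, 35),
                  Group = factor(c(1, 2, 2, 3, 4)))

p1 <- make_scatter(df1)
p2 <- make_scatter(df2)
p3 <- make_scatter(df3)
```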
