how to make the biplot name more clear using ggbiplot - r

I have a dataset that can be downloaded from here:
https://gist.github.com/anonymous/5f1135e4f750a39b0255
I am trying to plot a PCA with ggbiplot using the following code:
data <- read.delim("path to the data.txt")
data.pca <- prcomp (data, center = TRUE, scale =TRUE)
library(ggbiplot)
g <- ggbiplot(data.pca, obs.scale =1, var.scale=1, ellipse = TRUE, circle=TRUE)
g <- g + scale_color_discrete(name='')
g <- g + theme(legend.direction = 'horizontal', legend.position = 'top')
print(g)
However, it is very difficult to read the names on the biplot arrows.
Is there any way to make them clearer or display them better?

I think one way to make it clearer is to adjust the size and position of the labels using the varname.size and varname.adjust arguments. However, with a lot of variables it still looks crowded. Increasing the length of the arrows (similar to stats::biplot()) makes it look somewhat better (imo).
# install ggbiplot
#require(devtools)
#install_github("vqv/ggbiplot")
library(httr)
library(ggbiplot)
# read data
url <- "https://gist.githubusercontent.com/anonymous/5f1135e4f750a39b0255/raw/data.txt"
dat <- read.table(text=content(GET(url), as="text"), header=TRUE)
# pca
data.pca <- prcomp (dat, center = TRUE, scale =TRUE)
# original plot + increase labels size and space from line
p <- ggbiplot(data.pca, obs.scale=1,
var.scale=1, circle=F,
varname.size=4, varname.adjust=2)
p
# use coord_equal() to change size ratio of plot (excludes use of circle)
p <- p + coord_equal(1.5) + theme_classic()
p
To extend the arrows, the x and y coordinates of their end points need to be recalculated. You can then use these to edit the relevant grobs and change any other parameter (colour, size, rotation, etc.). You could take the full ggplotGrob(p) approach, but the code below just uses grid.edit() on the plot currently drawn in the graphics window.
# function to rescale the x & y positions of the lines and labels
f <- function(a0, a1, M = 1) # M is the stretch factor; M = 1 leaves the end point unchanged
{
l <- lapply(as.list(environment()), as.numeric)
out <- M* (l$a1 - l$a0) + l$a0
grid::unit(out, "native")
}
# get list of grobs in current graphics window
grobs <- grid.ls(print=FALSE)
# find segments grob for the arrows
s_id <- grobs$name[grep("segments", grobs$name)]
# edit length and colour of lines
seg <- grid.get(gPath(s_id[2]))
grid.edit(gPath(s_id[2]),
x1=f(seg$x0, seg$x1, 2),
y1=f(seg$y0, seg$y1, 2),
gp=gpar(col="red"))
# find text grob for the arrow labels
lab_id <- grobs$name[grep("text", grobs$name)]
# edit position of text, and rotate and colour labels
seg2 <- grid.get(gPath(lab_id))
grid.edit(gPath(lab_id),
x=f(seg$x0, seg2$x, 2),
y=f(seg$y0, seg2$y, 2),
rot=0,
gp=gpar(col="red"))
Subjective if this makes it better, and perhaps it is easier just to use biplot() or even define a new function
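As a rough illustration of the "define a new function" route, here is a minimal sketch that builds a biplot directly from a prcomp object with ggplot2 and stretches the arrows by a multiplier. The helper name simple_biplot and its arguments arrow_mult and label_size are hypothetical (they are not part of ggbiplot), and the built-in USArrests data stands in for the data in the question.

```r
library(ggplot2)

# Hypothetical helper (not part of ggbiplot): plots PC1 vs PC2 and draws the
# variable arrows stretched by `arrow_mult` so the labels spread out.
simple_biplot <- function(pca, arrow_mult = 2, label_size = 4) {
  scores <- as.data.frame(pca$x[, 1:2])
  names(scores) <- c("PC1", "PC2")

  # Loadings scaled by the component standard deviations, then stretched
  loads <- sweep(pca$rotation[, 1:2], 2, pca$sdev[1:2], `*`) * arrow_mult
  loads <- as.data.frame(loads)
  names(loads) <- c("PC1", "PC2")
  loads$var <- rownames(pca$rotation)

  ggplot(scores, aes(PC1, PC2)) +
    geom_point(alpha = 0.5) +
    geom_segment(data = loads,
                 aes(x = 0, y = 0, xend = PC1, yend = PC2),
                 arrow = arrow(length = unit(0.2, "cm")), colour = "red") +
    # Place each label slightly beyond its arrow tip
    geom_text(data = loads,
              aes(x = PC1 * 1.1, y = PC2 * 1.1, label = var),
              colour = "red", size = label_size) +
    coord_equal() +
    theme_classic()
}

pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
p <- simple_biplot(pca, arrow_mult = 2)
print(p)
```

Because the arrows are ordinary layers here, their length, colour and label placement can be changed without editing grobs after the fact.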

Related

Multiple plots lay out as upper triangle matrix and formatted as scatter plots

I am doing Bayesian modelling with 8 variables. Each variable has a correlation parameter, and each pair of variables has a correlation parameter. All of those parameters have a posterior density plot. I would like to arrange those plots in an upper triangle layout, and I used the result from Upper triangle layout.
However, it would be better if I could title/label the plots in the same fashion as a scatter plot matrix does, i.e., I would only have titles (variable names) along the top and on the right, and by checking the corresponding titles at the top and right of each sub-figure, people would know which correlation parameter it shows.
Here is a minimal example of what I have achieved so far, where I only used 3 variables for illustration.
require(ggplot2)
corr_1 = rnorm(100)
corr_2 = rnorm(100)
corr_12 = rnorm(100)
corr_list = list(corr_1, corr_2, corr_12)
ttls = c('variance within variable 1',
'correlation within variable 1 & 2',
'variance within variable 2')
plots = list()
for(i in 1:3){
temp_df = data.frame(x=corr_list[[i]])
temp = ggplot(data=temp_df, aes(x=x)) +
geom_density()+
ggtitle(ttls[i])
plots[[i]] = temp
}
library(gridExtra) ## for grid.arrange()
library(grid)
ng <- nullGrob()
grid.arrange(plots[[1]], plots[[2]],
ng, plots[[3]])
So instead of explicitly stating what each correlation means, I want labels around the plot. I should have the titles "variable1" and "variable2" along the top, and the titles "variable1" and "variable2" written vertically on the right of the plot, just like a scatter plot matrix does.
The final layout I'd like to have is similar to this one:
However, the difference is that my plot requires the off-diagonal panels to be density plots as well, and all the density plots are independent, i.e., the data do not depend on other variables. As in my minimal example, I have independent plots stored in a list (whereas in a pairwise scatter plot each subplot uses one variable as x and one as y).
I'm assuming that you have your plots appropriately arranged, and that all you need is to add the variable labels. I've made a couple of changes to the plot function to remove titles and axis labels.
arrangeGrob returns a grob which is also a gtable. Thus, gtable functions can be applied to add the labels. I've added some comments below.
library(ggplot2)
library(gridExtra)
library(grid)
library(gtable)
corr_1 = rnorm(100)
corr_2 = rnorm(100)
corr_12 = rnorm(100)
corr_list = list(corr_1, corr_2, corr_12)
ttls = c('variance within variable 1',
'correlation within variable 1 & 2',
'variance within variable 2')
plots = list()
for(i in 1:3){
temp_df = data.frame(x=corr_list[[i]])
temp = ggplot(data=temp_df, aes(x=x)) +
geom_density() +
theme(axis.title = element_blank()) #+
# ggtitle(ttls[i])
plots[[i]] = temp
}
ng <- nullGrob()
gp <- arrangeGrob(plots[[1]], plots[[2]],
ng, plots[[3]])
# The gp object is a gtable;
# thus gtable functions can be applied to add the necessary labels
# A list of text grobs - the labels
vars <- list(textGrob("Variable 1"), textGrob("Variable 2"))
# So that there is space for the labels,
# add a row to the top of the gtable,
# and a column to the left of the gtable.
gp <- gtable_add_cols(gp, unit(1.5, "lines"), 0)
gp <- gtable_add_rows(gp, unit(1.5, "lines"), 0)
# Add the label grobs.
# The labels on the left should be rotated; hence the edit.
# t and l refer to cells in the gtable layout.
# gtable_show_layout(gp) shows the layout.
gp <- gtable_add_grob(gp, lapply(vars, editGrob, rot = 90), t = 2:3, l = 1)
gp <- gtable_add_grob(gp, vars, t = 1, l = 2:3)
# Draw it
grid.newpage()
grid.draw(gp)
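The question asked for the labels along the top and on the right, whereas the code above puts the second set on the left. A minimal sketch of the right-hand variant follows; it assumes placeholder rectGrob() panels in place of the density plots and uses a negative position to append the extra column, so the names panels, labs and gt are illustrative only.

```r
library(grid)
library(gtable)
library(gridExtra)

# Placeholder panels stand in for the density plots (upper-triangle layout)
panels <- list(rectGrob(), rectGrob(), nullGrob(), rectGrob())
gt <- arrangeGrob(grobs = panels, ncol = 2)

labs <- list(textGrob("Variable 1"), textGrob("Variable 2"))

# pos = 0 adds a row above everything; pos = -1 appends a column on the right
gt <- gtable_add_rows(gt, unit(1.5, "lines"), pos = 0)
gt <- gtable_add_cols(gt, unit(1.5, "lines"), pos = -1)

# After the additions the panels occupy rows 2:3 and columns 1:2
gt <- gtable_add_grob(gt, labs, t = 1, l = 1:2)
gt <- gtable_add_grob(gt, lapply(labs, editGrob, rot = -90), t = 2:3, l = 3)

grid.newpage()
grid.draw(gt)
```

gtable_show_layout(gt) can be used to check which cells the labels ended up in.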

incorporate standalone legend in ggpairs (take 2)

tl;dr can't get a standalone legend (describing common colours across the whole plot) in ggpairs to my satisfaction.
Sorry for length.
I'm trying to draw a (lower-triangular) pairs plot using GGally::ggpairs (an extension package for drawing various kinds of plot matrices with ggplot2). This is essentially the same question as How to add an external legend to ggpairs()? , but I'm not satisfied with the answer to that question aesthetically, so I'm posting this as an extension (if suggested/recommended by commenters, I will delete this question and offer a bounty on that question instead). In particular, I would like the legend to appear outside the sub-plot frame, either putting it within one virtual subplot but allowing additional width to hold it, or (ideally) putting it in a separate (empty) subplot. As I show below, both of my partial solutions have problems.
Fake data:
set.seed(101)
dd <- data.frame(x=rnorm(100),
y=rnorm(100),
z=rnorm(100),
f=sample(c("a","b"),size=100,replace=TRUE))
library(GGally)
Base plot function:
ggfun <- function(...) {
ggpairs(dd,mapping = ggplot2::aes(color = f),
columns=1:3,
lower=list(continuous="points"),
diag=list(continuous="blankDiag"),
upper=list(continuous="blank"),
...)
}
Function to trim top/right column:
trim_gg <- function(gg) {
n <- gg$nrow
gg$nrow <- gg$ncol <- n-1
v <- 1:n^2
gg$plots <- gg$plots[v>n & v%%n!=0]
gg$xAxisLabels <- gg$xAxisLabels[-n]
gg$yAxisLabels <- gg$yAxisLabels[-1]
return(gg)
}
gg0 <- trim_gg(ggfun(legends=TRUE))
Get rid of legends in left column (as in the linked question above):
library(ggplot2) ## for theme()
for (i in 1:2) {
inner <- getPlot(gg0,i,1)
inner <- inner + theme(legend.position="none")
gg0 <- putPlot(gg0,inner,i,1)
}
inner <- getPlot(gg0,2,2)
inner <- inner + theme(legend.position="right")
gg0 <- putPlot(gg0,inner,2,2)
Problems:
the blank panel behind the legend is actually masking some points; I don't know why it's not outside the panel as usual, I assume that's something that ggpairs is doing
if it were outside the panel (on top or to the right), I would want to make sure to leave some extra space so the panels themselves were all the same size. However, ggmatrix/ggpairs looks very inflexible about this.
The only alternative I've been able to try so far is following ggplot separate legend and plot by extracting the legend and using gridExtra::grid.arrange():
g_legend <- function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)
}
library(gridExtra)
grid.arrange(getPlot(gg0,1,1),
g_legend(getPlot(gg0,2,2)),
getPlot(gg0,2,1),
getPlot(gg0,2,2)+theme(legend.position="none"),
nrow=2)
Problems:
the axes and labels suppressed by ggpairs are back ...
I also considered creating a panel with a special plot that contained only the legend (i.e. trying to use theme(SOMETHING=element_blank()) to suppress the plot itself), but couldn't figure out how to do it.
As a last resort, I could trim the axes where appropriate myself, but this is practically reinventing what ggpairs is doing in the first place ...
With some slight modification to solution 1: First, draw the matrix of plots without their legends (but still with the colour mapping). Second, use your trim_gg function to remove the diagonal spaces. Third, for the plot in the top left position, draw its legend but position it into the empty space to the right.
data(state)
dd <- data.frame(state.x77,
State = state.name,
Abbrev = state.abb,
Region = state.region,
Division = state.division)
columns <- c(3, 5, 6, 7)
colour <- "Region"
library(GGally)
library(ggplot2) ## for theme()
# Base plot
ggfun <- function(data = NULL, columns = NULL, colour = NULL, legends = FALSE) {
ggpairs(data,
columns = columns,
mapping = ggplot2::aes_string(colour = colour),
lower = list(continuous = "points"),
diag = list(continuous = "blankDiag"),
upper = list(continuous = "blank"),
legends = legends)
}
# Remove the diagonal elements
trim_gg <- function(gg) {
n <- gg$nrow
gg$nrow <- gg$ncol <- n-1
v <- 1:n^2
gg$plots <- gg$plots[v > n & v%%n != 0]
gg$xAxisLabels <- gg$xAxisLabels[-n]
gg$yAxisLabels <- gg$yAxisLabels[-1]
return(gg)
}
# Get the plot
gg0 <- trim_gg(ggfun(dd, columns, colour))
# For plot in position (1,1), draw its legend in the empty panels to the right
inner <- getPlot(gg0, 1, 1)
inner <- inner +
theme(legend.position = c(1.01, 0.5),
legend.direction = "horizontal",
legend.justification = "left") +
guides(colour = guide_legend(title.position = "top"))
gg0 <- putPlot(gg0, inner, 1, 1)
gg0

Absolute positioning of rasterGrobs in gtable cells

I have been attempting to specify absolute positions for rasterGrobs in gtable cells without success. I would like to be able to have the extents of an image align to values on the y axis. The script aligns drill-core images alongside multi-sensor data plotted in ggplot2 facets. For example, a particular radiograph core image needs to have its top at 192 mm, and bottom at 1482 mm, but I want the scale to go from 0 to 1523 mm. Please see the included link for an example of what I am doing, but for simplicity I have only posted an MWE here. Is it possible to specify an absolute position for a rasterGrob inside a gtable cell?
sample of intended output
In terms of the MWE below, my only solution thus far has been to move Rlogo.png around using relative positions set when using rasterGrob(). Using "native" coordinates does not appear to be what I need either. Similarly, I can't make sense of the position parameters called in gtable_add_grob().
library(png)
library(grid) # for rasterGrob() and grid.draw()
library(ggplot2)
library(gtable)
# read Image
img <- readPNG(system.file("img", "Rlogo.png", package = "png"))
# convert to rastergrob
g <- rasterGrob(img, y = unit(0.5, "npc"), x = unit(0.5, "npc"))
# create plot
tp <- qplot(1:5, 1:5, geom="blank") + scale_y_reverse()
# convert plot to gtable
tt <- ggplot_gtable(ggplot_build(tp))
# add column to gtable to hold image
tt <- gtable_add_cols(tt, tt$width[[.5*4]], 3)
# add grob to cell 3, 4
tt <- gtable_add_grob(tt,g,3,4)
# render
grid.draw(tt)
Did a lot of searching before coming up with this solution of using rasterGrob to add images to panels in a ggplot. Perhaps though there is a more elegant solution someone can suggest?
The grob can set its position within a cell, as illustrated below
library(gridExtra)
library(grid)
library(gtable)
# quick shortcut to create a 2x2 gtable filled with 4 rectGrobs
tg <- arrangeGrob(grobs=replicate(4, rectGrob(), FALSE))
# red rect of fixed size with default position (0.5, 0.5) npc
rg1 <- rasterGrob("red", width=unit(1,"cm"), height=unit(1,"cm"))
# blue rect with specific x position (0) npc, left-justified
rg2 <- rasterGrob("blue", width=unit(1,"cm"), height=unit(1,"cm"),
x = 0, hjust=0)
# green rect at x = 1cm left-justified, y=-0.5cm from middle, top-justified
rg3 <- rasterGrob("green", width=unit(1,"cm"), height=unit(1,"cm"),
x = unit(1,"cm"), y=unit(0.5, "npc") - unit(0.5, "cm"),
hjust=0, vjust=1)
# place those on top
tg <- gtable_add_grob(tg, rg1, 1, 2, z = Inf, name = "default")
tg <- gtable_add_grob(tg, rg2, 1, 2, z = Inf, name = "left")
tg <- gtable_add_grob(tg, rg3, 1, 2, z = Inf, name = "custom")
grid.newpage()
grid.draw(tg)
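To tie this back to the depths in the question (image top at 192 mm, bottom at 1482 mm, axis from 0 to 1523 mm), one possible approach is to convert the depths to npc units and anchor the top edge of the raster. This is only a sketch: depth_to_npc is a hypothetical helper, the two-cell matrix stands in for the core image, and the mapping is exact only if the panel spans exactly 0 to 1523 (i.e. the y scale has no expansion).

```r
library(grid)

# Hypothetical helper: map a depth (mm) onto npc units for a reversed axis
# that runs from lim[1] at the top of the cell to lim[2] at the bottom.
depth_to_npc <- function(depth, lim = c(0, 1523)) {
  1 - (depth - lim[1]) / diff(lim)
}

top_npc    <- depth_to_npc(192)
bottom_npc <- depth_to_npc(1482)

# Stand-in for the core image: a 2 x 1 raster of grey cells
img <- matrix(c("grey20", "grey80"), nrow = 2)

core <- rasterGrob(img,
                   y = unit(top_npc, "npc"), vjust = 1,         # anchor the top edge
                   height = unit(top_npc - bottom_npc, "npc"),  # span 192 to 1482 mm
                   width = unit(1, "npc"),
                   interpolate = FALSE)

grid.newpage()
grid.draw(core)
```

A grob built this way can then be placed in the gtable cell with gtable_add_grob() as shown above.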

Rescale axis text clearly after arranging multiple plots with arrangeGrob and resizing the full plot in R

I have the following piece of code, which generates three plots; I then arrange them using arrangeGrob and store the result in a variable g. To save the output I use ggsave(file="filename",g). All three plots are saved correctly in the file, but some of the axis labels get cramped. Is there any way to rescale them automatically (to preserve a clear output) in the saved file? One option would be to adjust the width and height of the file, but I need to keep these dimensions and the decimal format of the numbers. Can someone suggest a method for this? Thanks.
Also, I don't understand why the colour in geom_point(aes()) is not applied to the plots correctly.
x <- c(1:10)
y <- x^3
z <- y-20
s <- z/3
t <- s*6
q <- s*y
x1 <- cbind(x,y,z,s,t,q)
x1 <- data.frame(x1)
a <- ggplot(x1,aes(x=x1[,1],y=x1[,2]))+geom_point(aes(color = 'blue'),size=4)+theme(legend.position='none',plot.margin=unit(c(0,3,0,0),"mm"))
ggsave("a.png")
b <- ggplot(x1,aes(x=x1[,3],y=x1[,4]))+geom_point(aes(color = 'blue'),size=4)+theme(legend.position='none',plot.margin=unit(c(0,3,0,0),"mm"))
ggsave("b.png")
c <- ggplot(x1,aes(x=x1[,5],y=x1[,6]))+geom_point(aes(color = 'violet'),size=4)+theme(legend.position='none',plot.margin=unit(c(0,3,0,0),"mm"))
ggsave("c.png")
gA <- ggplotGrob(a)
gB <- ggplotGrob(b)
gC <- ggplotGrob(c)
maxWidth = grid::unit.pmax(gA$widths[2:5], gB$widths[2:5],gC$widths[2:5])
gA$widths[2:5] <- as.list(maxWidth)
gB$widths[2:5] <- as.list(maxWidth)
gC$widths[2:5] <- as.list(maxWidth)
g <- arrangeGrob(gA, gB,gC, ncol=2)
ggsave(file='fname.png',g, width=10,height=8,units=c("cm"), dpi=600)
An illustration of the resulting plot is shown below,
Add:
+ theme(text = element_text(size = 10))
You can change the text size to fit your needs.
To change the colour of the points set the colour flag outside of the aesthetics:
geom_point(colour = "blue", size = 4)
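A small sketch of the difference, using made-up data and column names rather than the x1[,1] indexing from the question:

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = (1:10)^3)

# Inside aes(): "blue" is treated as a one-level data value and is mapped
# through the default discrete colour scale, so the points are not blue
p_mapped <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = "blue"), size = 4)

# Outside aes(): "blue" is taken as a literal colour
p_fixed <- ggplot(df, aes(x, y)) +
  geom_point(colour = "blue", size = 4) +
  theme(text = element_text(size = 10))

# Colours actually used by each layer
mapped_cols <- unique(layer_data(p_mapped)$colour)
fixed_cols  <- unique(layer_data(p_fixed)$colour)
```

Inspecting mapped_cols and fixed_cols shows which colour each version really draws with.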

scatter plot specifying color and labelling axis in r

I have following data and plot:
pos <- rep(1:2000, 20)
xv =c(rep(1:20, each = 2000))
# colrs <- unique(xv)
colrs <- xv # edits
yv =rnorm(2000*20, 0.5, 0.1)
xv = lapply(unique(xv), function(x) pos[xv==x])
to.add = cumsum(sapply(xv, max) + 1000)
bp <- c(xv[[1]], unlist(lapply(2:length(xv), function(x) xv[[x]] + to.add[x-1])))
plot (bp,yv, pch = "*", col = colrs)
I have few issues in this plot I could not figure out.
(1) I want to use a different colour for each group, or two alternating colours for the groups (i.e. xv), but when I tried the colour functions the result turned out to be a beautiful mixture. I also need to highlight some points (for example, bp 4000 to 4500 in blue).
(2) Instead of bp positions I want to put a tick mark and a label for each group.
Thank you, I appreciate your help.
Edits: with the help of the following answer (using a slightly different approach so that it also works with an unbalanced number of points in each group) I could get a similar plot. The remaining question about colours is: what if I want to use two alternating colours for alternate groups?
You can solve your colour issue by repeating each colour index once for every point plotted in its group (this assumes colrs <- unique(xv), i.e. one entry per group, as in the original version of the question), like so:
plot (bp,yv, pch = "*", col = rep(colrs,each=2000))
The default colour palette (see ?palette or palette() ) will wrap around itself and you might want to specify your own to get 20 distinct colours.
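For example, one way to get 20 distinct colours, or two alternating colours with a highlighted range as asked in the question's edit, is to build the colour vector from a per-point group index. This is a sketch under the question's layout (20 groups of 2000 points, 1000 positions of gap between groups); the names group, col_by_group and col_alt and the chosen colours are illustrative.

```r
set.seed(1)

group <- rep(1:20, each = 2000)        # group index of every point
pos   <- rep(1:2000, 20)
bp    <- pos + (group - 1) * 3000      # 2000 positions per group plus a 1000 gap
yv    <- rnorm(length(bp), 0.5, 0.1)

# Option 1: one distinct colour per group
col_by_group <- rainbow(20)[group]

# Option 2: two alternating colours, with bp 4000 to 4500 highlighted in blue
col_alt <- c("grey30", "orange")[(group %% 2) + 1]
col_alt[bp >= 4000 & bp <= 4500] <- "blue"

plot(bp, yv, pch = "*", col = col_alt)
```

Indexing the palette by group also works when the groups have different sizes, because no each= count is needed.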
To relabel the x axis, try plotting without the axis and then specifying the points and labels manually.
plot (bp,yv, pch = "*", col = rep(colrs,each=2000),xaxt="n")
axis(1,at=seq(1000,58000,3000),labels=1:20)
If you are trying to squeeze a lot of labels in there, you might have to shrink the text (cex.axis) or spin the labels 90 degrees (las=2).
plot (bp,yv, pch = "*", col = rep(colrs,each=2000),xaxt="n")
axis(1,at=seq(1000,58000,3000),labels=1:20,cex.axis=0.7,las=2)
Result:
One way is you could use a nested ifelse.
I'm still learning R, but one way it could be done would look something like:
plot(whatev$x, whatev$y, col=ifelse(xv<2000,"red",ifelse(2000<xv & xv<4000,"yellow","blue"))) # colour names must be quoted strings
You could nest as many of these as you want to have specificity on the colors and the intervals. The ifelse command is of form ifelse(TEST, True, False).
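When the intervals are contiguous, cut() can replace a chain of nested ifelse() calls. A minimal sketch with made-up positions and the same break points as above (note that cut() closes intervals on the right by default, so 2000 itself falls in the first bin):

```r
# Made-up positions, one in each interval
bp <- c(500, 2500, 4500)

# Bin the positions and use the bin labels as colour names
cols <- as.character(cut(bp,
                         breaks = c(-Inf, 2000, 4000, Inf),
                         labels = c("red", "yellow", "blue")))

plot(bp, c(1, 1, 1), col = cols, pch = 19)
```

Adding another colour band only requires one more break and one more label.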
A simpler way would be to use the unique groups in xv to assign rainbow colors.
colrs=rainbow(length(unique(xv))) #Or colrs=rainbow(length(xv)) if xv is unique.
plot(whatev$x, whatev$y, col=colrs)
I hope I got all that right. I'm still learning R myself.
I'm going to go out on a limb and guess that your real data are something like 2000 values of things from 20 different groups. For instance, heights of 2000 plants of 20 different species. In such a case, you might want to look at the dotplot() function (or as illustrated below, dotplot.table()) in the lattice package.
Generate matrix of hypothetical values:
set.seed(1)
myY <- sapply( seq_len(20), function(x) rnorm(2000, x^(1/3)))
Transpose matrix to get groups as rows
myY <- t(myY)
Provide names of groups to matrix:
dimnames(myY)[[1]]<-paste("group", seq_len(nrow(myY)))
Load lattice package
library(lattice)
Generate dotplot
dotplot(myY, horizontal = FALSE, panel = function(x, y, horizontal, ...) {
panel.dotplot(x = x, y = y, horizontal = horizontal, jitter.x = TRUE,
col = seq_len(20)[x], pch = "*", cex = 1.5)
}, scales = list(x = list(rot = 90))
)
Which looks like (with unfortunate y-axis labeling):
Seeing that @JohnCLK is requesting a way of colouring by values on the x axis, I tried these demos in ggplot2. Each uses a dummy variable that is coded based on the values or ranges to be highlighted in the other variables.
So, first set up the data, as in the question:
pos <- rep(1:2000, 20)
xv <- c(rep(1:20, each = 2000))
yv <- rnorm(2000*20, 0.5, 0.1)
xv <- lapply(unique(xv), function(x) pos[xv==x])
to.add <- cumsum(sapply(xv, max) + 1000)
bp <- c(xv[[1]], unlist(lapply(2:length(xv), function(x) xv[[x]] + to.add[x-1])))
Then load ggplot2, prepare a couple of utility functions, and set the default theme:
library("ggplot2")
make.png <- function(p, fName) {
png(fName, width=640, height=480, units="px")
print(p)
dev.off()
}
make.plot <- function(df) {
p <- ggplot(df,
aes(x = bp,
y = yv,
colour = highlight))
p <- p + geom_point()
p <- p + theme(legend.position = "none") # opts() has been removed from ggplot2
return(p)
}
theme_set( theme_bw() )
Draw a plot which highlights values in a defined range on the vertical axis:
# highlight a horizontal band
df <- data.frame(cbind(bp, yv))
df$highlight <- 0
df$highlight[ df$yv >= 0.4 & df$yv < 0.45 ] <- 1
p <- make.plot(df)
print(p)
make.png(p, "demo_horizontal.png")
Next draw a plot which highlights values in a defined range on the x axis, a vertical band:
# highlight a vertical band
df$highlight <- 0
df$highlight[ df$bp >= 38000 & df$bp < 42000 ] <- 1
p <- make.plot(df)
print(p)
make.png(p, "demo_vertical.png")
And finally draw a plot which highlights alternating vertical bands, by x value:
# highlight alternating bands
library("gtools")
alt.band.width <- 2000
df$highlight <- as.integer(df$bp / alt.band.width)
df$highlight <- ifelse(odd(df$highlight), 1, 0)
p <- make.plot(df)
print(p)
make.png(p, "demo_alternating.png")
Hope this helps; it was good practice anyway.
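The alternating-band demo can also be written without gtools by using integer division and the modulo operator, with the highlight stored as a factor so that two explicit colours can be assigned. This is a sketch with made-up data; band_width and the colour choices are assumptions.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(bp = seq(1, 60000, by = 30),
                 yv = rnorm(2000, 0.5, 0.1))

# Band index (0, 1, 2, ...) reduced to 0/1 so neighbouring bands alternate
band_width <- 2000
df$highlight <- factor((df$bp %/% band_width) %% 2)

p <- ggplot(df, aes(x = bp, y = yv, colour = highlight)) +
  geom_point() +
  scale_colour_manual(values = c("0" = "grey40", "1" = "steelblue")) +
  theme_bw() +
  theme(legend.position = "none")

print(p)
```

Using a factor rather than a numeric 0/1 gives a discrete colour scale, so the two band colours can be set directly.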

Resources