I would like to combine a ggplot2 with a lattice plot object. Since both packages are based on grid I was wondering whether this is possible? Ideally, I would do everything in ggplot2 but I cannot plot a 3d scatter.
So assume I have the following data:
set.seed(1)
mdat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100),
cluster = factor(sample(5, 100, TRUE)))
First, I want to create a scatterplot matrix in ggplot2:
library(ggplot2)
library(gtools)
library(plyr)
cols <- c("x", "y", "z")
allS <- adply(combinations(3, 2, cols), 1, function(r)
data.frame(cluster = mdat$cluster,
var.x = r[1],
x = mdat[[r[1]]],
var.y = r[2],
y = mdat[[r[2]]]))
sc <- ggplot(allS, aes(x = x, y = y, color = cluster)) + geom_point() +
facet_grid(var.x ~ var.y)
So far so good. Now I want to create a lattice 3d scatterplot with all the variables together:
library(lattice)
sc3d <- cloud(z ~ x + y, data = mdat, groups = cluster)
Now I would like to combine sc and sc3d in one single plot. How can I achieve that? Maybe with the help of grid or gridExtra (pushViewport, arrangeGrob)? Or can I produce a 3d scatterplot in ggplot? Ideally, I would like to see the 3d plot in the empty panel of the ggplot, but I guess that's asking too much, so for starters I would be very happy to learn how to arrange these two plots side by side.
library(gridExtra); library(lattice); library(ggplot2)
grid.arrange(xyplot(1~1), qplot(1,1))
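Applied to the data from the question, the same idea can be sketched as follows. This is a minimal sketch, not a definitive layout: the names p2d, p3d and combined are made up for illustration, and a plain scatterplot stands in for the full scatterplot matrix so that the example runs on its own.

```r
library(ggplot2)
library(lattice)
library(gridExtra)
library(grid)

# Same simulated data as in the question
set.seed(1)
mdat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100),
                   cluster = factor(sample(5, 100, TRUE)))

# A simple ggplot2 panel (stand-in for the scatterplot matrix `sc`)
p2d <- ggplot(mdat, aes(x = x, y = y, colour = cluster)) + geom_point()

# The lattice 3d scatterplot from the question
p3d <- cloud(z ~ x + y, data = mdat, groups = cluster)

# arrangeGrob() converts both ggplot and trellis objects to grobs
combined <- arrangeGrob(p2d, p3d, ncol = 2)

grid.newpage()
grid.draw(combined)
```

arrangeGrob() builds the layout without drawing it, which makes it easy to pass the result on to ggsave() or another layout; grid.arrange() does the same and draws immediately.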
You can replace the empty panel by the lattice grob within the gtable, but it doesn't look very good due to the axes etc.
g <- ggplotGrob(sc)
lg <- gridExtra:::latticeGrob(sc3d)
ids <- grep("^panel", g$layout$name) # current ggplot2 names panels "panel-1-1", ...; older versions used plain "panel"
remove <- ids[2]
g$grobs[[remove]] <- lg
grid.newpage()
grid.draw(g)
I'm having trouble displaying multiple graphs on the same page. I have a data frame with 18 numerical columns. For each column, I need to show its histogram and boxplot on the same page in a 4*9 grid. Below is what I tried. It produces the histograms, but I need to show the boxplots alongside them, using a for loop if possible. Can someone please help me do this?
library(gridExtra)
library(ggplot2)
p <- list()
for(i in 1:18){
x <- my_data[,i]
p[[i]] <- ggplot(gather(x), aes(value)) +
geom_histogram(bins = 10) +
facet_wrap(~key, scales = 'free_x')
}
do.call(grid.arrange,p)
I received the following graph.
When I try the following, I get the graphs on separate pages:
library(dplyr)
dat2 <- my_data %>% mutate_all(scale)
# Boxplot from the R trees dataset
boxplot(dat2, col = rainbow(ncol(dat2)))
par(mfrow = c(2, 2)) # Set up a 2 x 2 plotting space
# Create the loop.vector (all the columns)
loop.vector <- 1:4
p <- list()
for (i in loop.vector) { # Loop over loop.vector
# store data in column.i as x
x <- my_data[,i]
# Plot histogram of x
p[[i]] <-hist(x,
main = paste("Question", i),
xlab = "Scores",
xlim = c(0, 100))
plot_grid(p, label_size = 12)
}
You can assemble the base R boxplot and the ggplot object generated with facet_wrap together using the R package patchwork:
library(ggplot2)
library(patchwork)
p <- ggplot(mtcars, aes(x = mpg)) +
geom_histogram() +
facet_wrap(~gear)
wrap_elements(~boxplot(split(mtcars$mpg, mtcars$gear))) / p
ggsave('test.png', width = 6, height = 8, units = 'in')
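For the layout asked about in the question (one histogram and one boxplot per column, all on one page), a loop-based sketch could look like the following. It assumes that drawing the boxplots with ggplot2 instead of base R is acceptable; my_data here is a stand-in built from four mtcars columns, and the names plots and combined are illustrative only.

```r
library(ggplot2)
library(patchwork)

# Stand-in for the asker's 18-column data frame
my_data <- mtcars[, 1:4]

plots <- list()
for (nm in names(my_data)) {
  # Histogram of the current column
  plots[[length(plots) + 1]] <- ggplot(my_data, aes(x = .data[[nm]])) +
    geom_histogram(bins = 10) +
    labs(x = nm)
  # Boxplot of the same column, placed right after its histogram
  plots[[length(plots) + 1]] <- ggplot(my_data, aes(y = .data[[nm]])) +
    geom_boxplot() +
    labs(y = nm)
}

# Four plots per row, i.e. two histogram/boxplot pairs per row
combined <- wrap_plots(plots, ncol = 4)
combined
```

With 18 columns the same loop yields 36 plots, which wrap_plots(plots, ncol = 4) lays out in 9 rows.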
I am trying to use ggplot to plot quarter circles to visualize contour plots, but I get a distorted plot when using geom_area (following this tutorial on stacked area charts with ggplot2).
The code I tried reads
library(ggplot2)
library(dplyr)
N <- 1E2
r <- rev(c(1,2,4,7))
maxXY = max(r)+.25*max(r)
grupp <- c("0","0.25","0.5","0.75")
datalist = list()
plot(0,0,xlim=c(0,maxXY),ylim=c(0,maxXY))
for (i in 1:length(r)) {
quadX <- seq(from = 0,to = r[i],length.out = N) # calculate x coords
quadY <- sqrt(r[i]^2 - quadX^2) # calculate y coords
lines(quadX,quadY)
# data for ggplot
dat <- data.frame(X = quadX, Y = quadY)
dat$group <- grupp[i]
datalist[[i]] <- dat # add it to your list
}
DF = do.call(rbind, datalist)
# stacked area chart
p1 <- ggplot(DF, aes(x=X, y=Y, fill=group)) +
geom_area(alpha=0.6 , size=1, colour="black")
plot(p1)
and I get quarter circles plotted correctly with basic plot
but a weird one with geom_area
Any help would be very appreciated. MJS
EDIT: using Z.Lin's suggestions I get the correct plot, thanks!
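The suggestion referred to is not quoted in this thread. One plausible fix, assuming the distortion comes from geom_area() stacking the groups on top of each other (its default position is "stack"), is to draw each area from zero with position = "identity". A minimal sketch:

```r
library(ggplot2)

N <- 100
r <- c(7, 4, 2, 1)                     # radii, largest first so it is drawn at the back
grupp <- c("0", "0.25", "0.5", "0.75")

# One data frame per quarter circle, bound together
DF <- do.call(rbind, lapply(seq_along(r), function(i) {
  x <- seq(from = 0, to = r[i], length.out = N)
  data.frame(X = x,
             Y = sqrt(pmax(0, r[i]^2 - x^2)),  # pmax guards against tiny negative values
             group = grupp[i])
}))

# position = "identity" draws each area from y = 0 instead of stacking them
p1 <- ggplot(DF, aes(x = X, y = Y, fill = group)) +
  geom_area(position = "identity", alpha = 0.6, colour = "black") +
  coord_fixed()
p1
```

coord_fixed() is optional; it keeps the x and y scales equal so the arcs look circular.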
Hi there: I need to plot a factor with 81 different categories, each with a different frequency count. Each factor name is a 4-letter category. It looks like this. As you can see, it is pretty tough to read the factor labels. I'd like to stagger the y-axis labels according to this suggestion. However, this issue on GitHub suggests that something has changed in ggplot2 and that the hjust and vjust options no longer accept a vector of values. Does anyone have any suggestions to make this plot look better, in particular to make the factor levels readable?
#libraries
# install.packages('stringi')
library(ggplot2)
library(stringi)
library(magrittr) # provides the %>% pipe used below
#fake data
var<-stri_rand_strings(81, 4, pattern='[HrhEgeIdiFtf]')
var1<-rnorm(81, mean=175, sd=75)
#data frame
out<-data.frame(var, var1)
#set levels for plotting
out$var<-factor(out$var, levels=out$var[order(out$var1, decreasing=FALSE)])
#PLot
out.plot<-out %>%
ggplot(., aes(x=var, y=var1))+geom_point()+coord_flip()
#Add staggered axis option
out.plot+theme(axis.text.y = element_text(hjust = grid::unit(c(-2, 0, 2), "points")))
To stagger the labels, you could add spaces to the labels in the dataframe.
# Libraries
library(ggplot2)
library(stringi)
# fake data
set.seed(12345)
var <- stri_rand_strings(81, 4, pattern = '[HrhEgeIdiFtf]')
var1 <- rnorm(81, mean = 175, sd = 75)
out <- data.frame(var, var1)
# Add spacing, and set levels for plotting
out = out[order(out$var1), ]
# Pad with 0, 6 or 12 trailing spaces in turn, so that consecutive labels are offset
out$var = paste0(out$var, c("", "      ", "            "))
out$var <- factor(out$var, levels = out$var[order(out$var1, decreasing = FALSE)])
# Plot
out.plot <- ggplot(out, aes(x = var, y = var1)) +
geom_point() + coord_flip()
out.plot
Alternatively, draw the original plot, then edit. Here, I use the grid function, editGrob() to do the editing.
# Libraries
library(ggplot2)
library(gtable)
library(grid)
library(stringi)
# fake data
set.seed(12345)
var <- stri_rand_strings(81, 4, pattern = '[HrhEgeIdiFtf]')
var1 <- rnorm(81, mean = 175, sd = 75)
out <- data.frame(var, var1)
# Set levels for plotting
out$var <- factor(out$var, levels = out$var[order(out$var1, decreasing = FALSE)])
# Plot
out.plot <- ggplot(out, aes(x = var, y = var1)) +
geom_point() + coord_flip()
# Get the ggplot grob
g = ggplotGrob(out.plot)
# Get a hierarchical list of component grobs
grid.ls(grid.force(g))
Look through the list to find the section referring to the left axis. The relevant bit is:
axis-l.6-3-6-3
axis.line.y..zeroGrob.232
axis
axis.1-1-1-1
GRID.text.229
axis.1-2-1-2
You will need to set up a path from 'axis-l', through 'axis', through 'axis', through to 'GRID.text'.
# make the relevant column a little wider
g$widths[3] = unit(2.5, "cm")
# The edit
g = editGrob(grid.force(g),
gPath("axis-l", "axis", "axis", "GRID.text"),
x = unit(c(-1, 0, 1), "npc"),
grep = TRUE)
# Draw the plot
grid.newpage()
grid.draw(g)
Another option is to find your way through the structure to the relevant grob to make the edit.
# Get the grob
g <- ggplotGrob(out.plot)
# Get the y axis
index <- which(g$layout$name == "axis-l") # Which grob
yaxis <- g$grobs[[index]]
# Get the ticks (labels and marks)
ticks <- yaxis$children[[2]]
# Get the labels
ticksL <- ticks$grobs[[1]]
# Make the edit
ticksL$children[[1]]$x <- rep(unit.c(unit(c(1,0,-1),"npc")), 27)
# Put the edited labels back into the plot
ticks$grobs[[1]] <- ticksL
yaxis$children[[2]] <- ticks
g$grobs[[index]] <- yaxis
# Make the relevant column a little wider
g$widths[3] <- unit(2.5, "cm")
# Draw the plot
grid.newpage()
grid.draw(g)
Sandy mentions adding spaces to the labels.
With a discrete axis, you can also simply add line breaks to alternate cases. In my case I wanted to stagger alternate ones:
scale_x_discrete(labels=paste0(c("","\n"),net_change$TZ_t))
Where net_change$TZ_t is my ordered factor. It extends to 'triple' levels easily with c("","\n","\n\n").
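A self-contained sketch of the line-break approach, using made-up data (the data frame df and the vector stagger are illustrative names, not part of the original answer):

```r
library(ggplot2)

# Hypothetical data: 12 categories with random values
set.seed(1)
df <- data.frame(id = sprintf("c%02d", 1:12), value = rnorm(12))
df$id <- factor(df$id, levels = df$id)

# c("", "\n") is recycled along the levels, so every second label
# starts with a line break and is drawn one line lower
stagger <- paste0(c("", "\n"), levels(df$id))

p <- ggplot(df, aes(x = id, y = value)) +
  geom_point() +
  scale_x_discrete(labels = stagger)
p
```

The unnamed label vector is matched to the axis breaks by position, so it must follow the order of the factor levels.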
I'm struggling with the following problem:
I use pam to cluster my dataset v in 7 clusters:
x <- pam(v,7)
I know that x contains a vector clustering which holds the cluster number assigned to each observation.
I would like to get a subset of x which only contains cluster 1.
Is this possible?
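For the subsetting part, the clustering vector can be used as a logical index on the original data. A minimal sketch, using iris as stand-in data (cluster1 is an illustrative name):

```r
library(cluster)

v <- iris[-5]        # drop the Species column
x <- pam(v, 3)

# x$clustering holds one cluster number per row of v,
# so it can index the rows that belong to cluster 1
cluster1 <- v[x$clustering == 1, ]
head(cluster1)
```

This returns the observations assigned to cluster 1 rather than a reduced pam object; the rows can then be plotted or summarised like any other data frame.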
Edit:
Here is an example. Cluster iris in three clusters and plot them.
library(ggfortify)
library(cluster)
v <- iris[-5]
x <- pam(v,3)
autoplot(x, frame = TRUE, frame.type = 'norm')
The question: How can I plot only the first cluster? It should look like the first plot without cluster 2 and 3.
Edit: I think I found a solution. I no longer use autoplot; instead I calculate the convex hull of every cluster and plot it.
library(cluster)
library(plyr)
library(ggplot2)
library(ggrepel)
find_hull <- function(df) df[chull(df$x, df$y),]
v<-iris[-5]
pp <- pam(v,3)
n<-princomp(pp$data, scores = TRUE, cor = ncol(pp$data) != 2)$scores
df<-data.frame(n[,1],n[,2],pp$clustering)
colnames(df)<-c("x","y","z")
hulls <- ddply(df, "z", find_hull)
p<-qplot(x,y,data=df,color=as.factor(z))+
geom_polygon(data=hulls, alpha=1, fill=NA)+
geom_text_repel(aes(label = rownames(df)),arrow = arrow(length = unit(0.00, 'inches'), angle = 0.00),size=5.5,colour="grey55")+
theme_classic(base_size = 16)+
theme(axis.line=element_blank(),axis.text.x=element_blank(),axis.text.y=element_blank(),axis.ticks=element_blank(),
axis.title.x=element_blank(),axis.title.y=element_blank(),legend.position="none",
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
p
df2<-df[df$z==1,]
hulls <- ddply(df2, "z", find_hull)
p1<-qplot(x,y,data=df2,color=as.factor(z))+
geom_polygon(data=hulls, alpha=0.8, fill=NA)+
geom_text_repel(aes(label = rownames(df2)),arrow = arrow(length = unit(0.00, 'inches'), angle = 0.00),size=5.5,colour="grey25")+
theme_classic(base_size = 16)+
theme(axis.line=element_blank(),axis.text.x=element_blank(),axis.text.y=element_blank(),axis.ticks=element_blank(),
axis.title.x=element_blank(),axis.title.y=element_blank(),legend.position="none",
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
p1
Now I want to plot both figures in one device. I have already tried the multiplot from cookbook-r but it gives the error
Error: Aesthetics must be either length 1 or the same as the data (26): label, x, y
It must be because of the labels I guess.
I also tried
grid.arrange(p,p1, ncol=1)
from the gridExtra package but it gives the same error.
Is there any other option to arrange multiple figures with labels in one figure?
I'm running an R script that generates PCA plots using FactoMineR.
I'd like to output the coordinates for the generated PCA plots, but I'm having trouble finding the right ones. I found result1$ind$coord and result1$var$coord, but neither looks like the default plot.
I found
http://www.statistik.tuwien.ac.at/public/filz/students/seminar/ws1011/hoffmann_ausarbeitung.pdf
and
http://factominer.free.fr/classical-methods/principal-components-analysis.html
but neither describes the contents of the object returned by PCA.
library(FactoMineR)
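args <- commandArgs(trailingOnly = TRUE) # assumption: the input file path is passed on the command line
data1 <- read.table(file=args[1], sep='\t', header=T, row.names=1)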
result1 <- PCA(data1,ncp = 4, graph=TRUE) # graphs generated automatically
plot(result1)
I found that $ind$coord[,1] and $ind$coord[,2] are the first two pca coords in the PCA object. Here's a worked example that includes a few other things you might want to do with the PCA output...
# Plotting the output of FactoMineR's PCA using ggplot2
#
# load libraries
library(FactoMineR)
library(ggplot2)
library(scales)
library(grid)
library(plyr)
library(gridExtra)
#
# start with a clean slate
rm(list=ls(all=TRUE))
#
# load example data
data(decathlon)
#
# compute PCA
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup=13, graph = FALSE)
#
# extract some parts for plotting
PC1 <- res.pca$ind$coord[,1]
PC2 <- res.pca$ind$coord[,2]
labs <- rownames(res.pca$ind$coord)
PCs <- data.frame(cbind(PC1,PC2))
rownames(PCs) <- labs
#
# Just showing the individual samples...
ggplot(PCs, aes(PC1,PC2, label=rownames(PCs))) +
geom_text()
# Now get supplementary categorical variables
cPC1 <- res.pca$quali.sup$coord[,1]
cPC2 <- res.pca$quali.sup$coord[,2]
clabs <- rownames(res.pca$quali.sup$coord)
cPCs <- data.frame(cbind(cPC1,cPC2))
rownames(cPCs) <- clabs
colnames(cPCs) <- colnames(PCs)
#
# Put samples and categorical variables (ie. grouping
# of samples) all together
p <- ggplot() + theme_bw(base_size = 20) + theme(aspect.ratio=1) # complete theme first, otherwise it resets aspect.ratio
# no data so there's nothing to plot...
# add on data
p <- p + geom_text(data=PCs, aes(x=PC1,y=PC2,label=rownames(PCs)), size=4)
p <- p + geom_text(data=cPCs, aes(x=cPC1,y=cPC2,label=rownames(cPCs)),size=10)
p # show plot with both layers
# Now extract the variables
#
vPC1 <- res.pca$var$coord[,1]
vPC2 <- res.pca$var$coord[,2]
vlabs <- rownames(res.pca$var$coord)
vPCs <- data.frame(cbind(vPC1,vPC2))
rownames(vPCs) <- vlabs
colnames(vPCs) <- colnames(PCs)
#
# and plot them
#
pv <- ggplot() + theme_bw(base_size = 20) + theme(aspect.ratio=1) # complete theme first, otherwise it resets aspect.ratio
# no data so there's nothing to plot
# put a faint circle there, as is customary
angle <- seq(-pi, pi, length = 50)
df <- data.frame(x = sin(angle), y = cos(angle))
pv <- pv + geom_path(aes(x, y), data = df, colour="grey70")
#
# add on arrows and variable labels
pv <- pv + geom_text(data=vPCs, aes(x=vPC1,y=vPC2,label=rownames(vPCs)), size=4) + xlab("PC1") + ylab("PC2")
pv <- pv + geom_segment(data=vPCs, aes(x = 0, y = 0, xend = vPC1*0.9, yend = vPC2*0.9), arrow = arrow(length = unit(1/2, 'picas')), color = "grey30")
pv # show plot
# Now put them side by side in a single image
#
grid.arrange(p,pv,nrow=1)
#
# Now they can be saved or exported...
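To write the coordinates themselves to a file, which is what the question asks for, something like the following sketch should work. The file name and the object names ind_coords, var_coords and out_file are assumptions made for illustration.

```r
library(FactoMineR)

data(decathlon)
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup = 13, graph = FALSE)

# Coordinates of the individuals and of the variables on the first two dimensions
ind_coords <- as.data.frame(res.pca$ind$coord[, 1:2])
var_coords <- as.data.frame(res.pca$var$coord[, 1:2])

# Hypothetical output location; row names (athletes) are kept as the first column
out_file <- file.path(tempdir(), "pca_individuals.csv")
write.csv(ind_coords, out_file)
```

The individuals and the variables are drawn in two separate default plots, so each set of coordinates corresponds to a different figure.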
Adding something extra to Ben's answer. You'll note in the first chart in Ben's response that the labels overlap somewhat. The pointLabel() function in the maptools package attempts to find locations for the labels without overlap. It's not perfect, but you can adjust the positions in the new data frame (see below) to fine-tune them if you want. (Also, when you load maptools you get a note about gpclibPermit(). You can ignore it if you're concerned about the restricted licence.) The first part of the script below is Ben's script.
# load libraries
library(FactoMineR)
library(ggplot2)
library(scales)
library(grid)
library(plyr)
library(gridExtra)
#
# start with a clean slate
# rm(list=ls(all=TRUE))
#
# load example data
data(decathlon)
#
# compute PCA
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup=13, graph = FALSE)
#
# extract some parts for plotting
PC1 <- res.pca$ind$coord[,1]
PC2 <- res.pca$ind$coord[,2]
labs <- rownames(res.pca$ind$coord)
PCs <- data.frame(cbind(PC1,PC2))
rownames(PCs) <- labs
#
# Now, the code to produce Ben's first chart but with less overlap of the labels.
library(maptools)
PCs$label=rownames(PCs)
# Base plot first for pointLabels() to get locations
plot(PCs$PC1, PCs$PC2, pch = 20, col = "red")
new = pointLabel(PCs$PC1, PCs$PC2, PCs$label, cex = .7)
new = as.data.frame(new)
new$label = PCs$label
# Then plot using ggplot2
(p = ggplot(data = PCs) +
geom_hline(yintercept = 0, linetype = 3, colour = "grey20") +
geom_vline(xintercept = 0, linetype = 3, colour = "grey20") +
geom_point(aes(PC1, PC2), shape = 20, col = "red") +
theme_bw())
(p = p + geom_text(data = new, aes(x, y, label = label), size = 3))
The result is:
An alternative is to use the biplot function from core R or biplot.psych from the psych package. This will put the components and the data onto the same figure.
For the decathlon data set, use principal and biplot from the psych package:
library(FactoMineR) #needed to get the example data
library(psych) #needed for principal
data(decathlon) #the data set
pc2 <- principal(decathlon[1:10],2) #just the first 10 columns
biplot(pc2,labels = rownames(decathlon),cex=.5, main="Biplot of Decathlon results")
#this is a call to biplot.psych which in turn calls biplot.
#adjust the cex parameter to change the type size of the labels.
This looks like:
A biplot: http://personality-project.org/r/images/olympic.biplot.pdf
Bill