faceting by unique pairs - r

I would like to create some plots with ggplot using faceting. I'm relatively new to ggplot, so I'm struggling to set up the plot. The actual data is huge, so I first want to play around with a toy case. Here is the toy data:
m1 <- matrix(rep(c("Skin","Human"),100),ncol = 2,byrow = T)
m2 <- matrix(rep(c("Head","Animal"),200),ncol = 2, byrow=T)
m3 <- matrix(rep(c("Skin","Animal"),250), ncol = 2, byrow=T)
y <- rnorm(550,0,1)
x1 <- rnorm(100,0,1)
x2 <- rnorm(200,0,1)
x3 <- rnorm(250,0,1)
m1 <- as.data.frame(cbind(x1,m1))
m2 <- as.data.frame(cbind(x2,m2))
m3 <- as.data.frame(cbind(x3,m3))
colnames(m1) <- c("x1","type","class")
colnames(m2) <- c("x1","type","class")
colnames(m3) <- c("x1","type","class")
data <- as.data.frame(cbind(y,rbind(m1,m2,m3)))
data <- cbind(data,rnorm(550,0,1))
colnames(data) <- c("y","x1","type","class","x2")
data <- data[,c("y","x1","x2","type","class")]
plot(sort(data[1:100,"y"]),sort(data[1:100,"x1"]),col="red")
points(sort(data[1:100,"y"]),sort(data[1:100,"x2"]),col="blue")
I would like one plot for each unique pair of c("type","class"), where each plot shows two scatterplots: x1 against y and x2 against y. I thought faceting was the right approach, but I'm struggling to achieve the desired result.

Based on the plots that your sample code generates, it seems you want to plot two sets of points, (x1, y) and (x2, y), on the same plot, which ggplot handles well. However, ggplot works best with long tables rather than wide ones.
Here is one way to get there. The following steps can be run after your chunk of code.
Melt your table from wide to long to make use of ggplot's built-in functionality. Note that mapping color to variable automatically plots x1 and x2 in different colors.
library(reshape2) # Used to melt the table
library(ggplot2) # Used to plot
data <- melt(data, id.vars = c('type','class','y'), measure.vars = c('x1','x2'))
head(data)
# type class y variable value
# 1 Skin Human 1.3170057 x1 -1.09101346133313
# 2 Skin Human 1.2805021 x1 -0.883308758331181
# 3 Skin Human -0.7620298 x1 0.0800447346341697
# 4 Skin Human 0.2766297 x1 0.589741587886533
# 5 Skin Human -1.8504755 x1 -0.178520217862402
# 6 Skin Human 0.6474738 x1 0.1039386636512
p1 <- ggplot(data, aes(x = as.numeric(value), y = y, color = variable)) +
geom_point() # add the point layer; without a geom the plot is empty
print(p1)
Then use facet_wrap to facet by the unique combinations of type and class:
faceted <- p1 + facet_wrap(~type + class)
print(faceted)
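For reference, the whole pipeline can be sketched in one self-contained piece. This is only a sketch under an assumption about intent: the toy data is built with data.frame() instead of cbind() on a character matrix, so x1 stays numeric and no as.numeric() conversion is needed. The names toy, long and p are made up for this example.

```r
library(reshape2) # melt()
library(ggplot2)

set.seed(42)
n <- c(100, 200, 250) # group sizes, as in the question

# Build the toy data directly as a data frame so x1 and x2 stay numeric
toy <- data.frame(
  y     = rnorm(sum(n)),
  x1    = rnorm(sum(n)),
  x2    = rnorm(sum(n)),
  type  = rep(c("Skin", "Head", "Skin"), times = n),
  class = rep(c("Human", "Animal", "Animal"), times = n)
)

# Wide to long: one row per (observation, x-variable)
long <- melt(toy, id.vars = c("type", "class", "y"),
             measure.vars = c("x1", "x2"))

# One panel per unique (type, class) pair, x1 and x2 in different colors
p <- ggplot(long, aes(x = value, y = y, color = variable)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ type + class)
print(p)
```

Because value is numeric here, the x axis is continuous in every panel.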

Related

In ggplot2 is there a relatively simple way of using different geoms for different groups in the data?

I have a set of data with multiple groups. I'd like to plot them on the same graph but with, say, a smooth line for one group and the data points for the other. Or with smooth lines for both, but data points for only one of them. An example:
library(reshape)
library(ggplot2)
set.seed(123)
x <- 1:1000
y <- 5 + rnorm(1000)
z <- 5 + 0.005*x + rnorm(1000)
df <- as.data.frame(cbind(x,y,z))
df <- melt(df,id=c("x"))
ggplot(df,aes(x=x,y=value,color=variable)) +
geom_point() + #here I want only the y variable graphed
geom_smooth() #here I want only the z variable graphed
They are both graphed against the x variable, and are on the same scale. Is there a relatively easy way to accomplish this?
Pass a filtered subset of the data to the data argument of each geom:
library(ggplot2)
library(reshape)
set.seed(123)
x <- 1:1000
y <- 5 + rnorm(1000)
z <- 5 + 0.005*x + rnorm(1000)
df <- as.data.frame(cbind(x,y,z))
df <- reshape::melt(df,id=c("x"))
df
ggplot(df,aes(x=x,y=value,color=variable)) +
geom_point(data=df[df$variable=="y",]) + #here I want only the y variable graphed
geom_smooth(data=df[df$variable=="z",]) #here I want only the z variable graphed
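A variation on the same idea, sketched here without the reshape package: a layer's data argument in ggplot2 can also be a function, which is then called with the plot's data. The helper only() is a hypothetical name introduced for this example, and the loess smoother is chosen only to keep the sketch explicit.

```r
library(ggplot2)

set.seed(123)
x <- 1:1000
# Long format built by hand: one block of rows per variable
df <- data.frame(
  x        = rep(x, 2),
  variable = rep(c("y", "z"), each = length(x)),
  value    = c(5 + rnorm(1000), 5 + 0.005 * x + rnorm(1000))
)

# Hypothetical helper: returns a function that keeps one variable's rows
only <- function(v) function(d) d[d$variable == v, ]

p <- ggplot(df, aes(x = x, y = value, color = variable)) +
  geom_point(data = only("y")) +              # points for y only
  geom_smooth(data = only("z"),               # smooth line for z only
              method = "loess", formula = y ~ x)
print(p)
```

The filter is applied when the plot is built, so the same layers can be reused on another data frame with the same columns.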

How to find out if two cells in a dataframe belong to the same pre-specified factor-level

I really don't know how to title this question better, so please bear with me.
library(reshape)
library(ggplot2)
library(dplyr)
dist1 <- matrix(runif(16),4,4)
dist2 <- matrix(runif(16),4,4)
rownames(dist1) <- colnames(dist1) <- paste0("A",1:4)
rownames(dist2) <- colnames(dist2) <- paste0("A",1:4)
m1 <- melt(dist1)
m2 <- melt(dist2)
final <- full_join(m1,m2, by=c("Var1","Var2"))
ggplot(final, aes(value.x,value.y)) + geom_point()
To illustrate my problem: I have a matrix of ecological distances (m1) and one of genetic distances (m2) for a number of biological species. I have merged both matrices and want to plot the two distances against each other.
Here is the twist:
The biological species belong to certain groups, which are given in the data frame species. I want to check whether an x,y pair (as in final$Var1, final$Var2) belongs to the same group of species (here "cat" or "dog") and then color it accordingly.
So I need an R translation for:
species <- data.frame(spcs=as.character(paste0("A",1:4)),
grps=as.factor(c(rep("cat",2),(rep("dog",2)))))
final$group <- If (final$Var1,final$Var2) belongs to the same group as specified
in species, then assign the species group here, else do nothing or assign NA
so I can proceed with
ggplot(final, aes(value.x,value.y, col=group)) + geom_point()
Thank you very much!
Here's one approach that works. I made some changes to the code you provided; a full working example is given below.
library(reshape)
library(ggplot2)
library(dplyr)
dist1 <- matrix(runif(16), 4, 4)
dist2 <- matrix(runif(16), 4, 4)
rownames(dist1) <- colnames(dist1) <- paste0("A", 1:4)
rownames(dist2) <- colnames(dist2) <- paste0("A", 1:4)
m1 <- melt(dist1)
m2 <- melt(dist2)
# I changed the by= argument here: reshape::melt names the index columns X1 and X2
final <- full_join(m1, m2, by=c("X1", "X2"))
# I made some changes to keep spcs character and grps factor
species <- data.frame(spcs=paste0("A", 1:4),
grps=as.factor(c(rep("cat", 2), (rep("dog", 2)))), stringsAsFactors=FALSE)
# define new variables for final indicating group membership
final$g1 <- species$grps[match(final$X1, species$spcs)]
final$g2 <- species$grps[match(final$X2, species$spcs)]
final$group <- as.factor(with(final, ifelse(g1==g2, as.character(g1), "dif")))
# plot just the rows with matching groups
ggplot(final[final$group!="dif", ], aes(value.x, value.y, col=group)) +
geom_point()
# plot all the rows
ggplot(final, aes(value.x, value.y, col=group)) + geom_point()
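The lookup step can be isolated in base R. The sketch below is not the answer's exact code: it builds all species pairs with expand.grid() instead of melting distance matrices, and it assigns NA (rather than "dif") when the two species are in different groups, which is what the question asked for. The name pairs is made up for this example.

```r
# Species-to-group lookup table, kept as plain character columns
species <- data.frame(spcs = paste0("A", 1:4),
                      grps = c("cat", "cat", "dog", "dog"),
                      stringsAsFactors = FALSE)

# All ordered pairs of species (stands in for final$X1, final$X2)
pairs <- expand.grid(X1 = species$spcs, X2 = species$spcs,
                     stringsAsFactors = FALSE)

# Look up each side's group, then keep it only where both sides agree
g1 <- species$grps[match(pairs$X1, species$spcs)]
g2 <- species$grps[match(pairs$X2, species$spcs)]
pairs$group <- ifelse(g1 == g2, g1, NA_character_)

pairs[pairs$X1 == "A1", ]
```

When such a column is mapped to colour in ggplot, rows with NA are drawn in the NA colour (grey by default).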
Another way to do this is to set up the species data frame with two columns that correspond to X1 and X2 in final, then merge on those two columns:
species <- data.frame(X1=paste0("A",1:4),
X2=paste0("A",1:4),
grps=as.factor(c(rep("cat",2),(rep("dog",2)))))
final = merge(final, species, by=c("X1","X2"), all.x=TRUE)
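Now you can plot the data using grps as the colour aesthetic. Note that this species table only pairs each species with itself, so only rows where X1 and X2 are the same species get a group; every other row gets NA: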
ggplot(final, aes(value.x,value.y, colour=grps)) + geom_point()

How to put ggplot2 legend in two columns for an area plot

I would like to put a long legend into two columns and I am not having any success. Here's the code that I'm using with the solution found elsewhere which does not work for geom='area', though it works for my other plots. The plot that I do get from the code below looks like:
So how do I plot Q1 with the legend in two columns please?
NVER <- 10
NGRID <- 20
MAT <- matrix(NA, nrow=NVER, ncol=NGRID)
gsd <- 0.1 # standard deviation of the Gaussians
verlocs <- seq(from=0, to=1, length.out=NVER)
thegrid <- seq(from=0, to=1, length.out=NGRID)
# create a mixture of Gaussians with modes spaced evenly on 0 to 1
# i.e. the first mode is at 0 and the last mode is at 1
for (i in 1:NVER) {
# add the shape of gaussian i
MAT[i,] <- dnorm(thegrid, verlocs[[i]], sd=gsd)
}
M2 <- MAT/rowSums(MAT)
colnames(M2) <- as.character(thegrid)
# rownames(M2) <- as.character(verlocs)
library(reshape2)
D2 <- melt(M2)
# head(D2)
# str(D2)
D2$Var1 <- ordered(D2$Var1)
library(ggplot2)
Q1 <- qplot(Var2, value, data=D2, order=Var1, fill=Var1, geom='area')
Q1
# ggsave('sillyrainbow.png')
# now try the stackoverflow guide() solution
Q1 + guides(col=guide_legend(ncol=2)) # try but fail to put the legend in two columns!
Note that the solution from "creating columns within a legend list while using ggplot in R" is incorporated above, and unfortunately it does not work!
You are referring to the wrong guide. The plot maps Var1 to fill, not to colour, so the legend to adjust is the fill guide:
Q1 + guides(fill=guide_legend(ncol=2))
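The same fix can be sketched with ggplot() and geom_area() instead of qplot(). This is a simplified stand-in for the question's data: the densities are not normalised per row, and the names d, comp and centers are made up for this example.

```r
library(ggplot2)

n_comp  <- 10
grid    <- seq(0, 1, length.out = 20)
centers <- seq(0, 1, length.out = n_comp) # Gaussian modes on [0, 1]

# Long format: one row per (grid point, component)
d <- expand.grid(x = grid, comp = seq_len(n_comp))
d$value <- dnorm(d$x, mean = centers[d$comp], sd = 0.1)
d$comp  <- factor(d$comp)

p <- ggplot(d, aes(x = x, y = value, fill = comp)) +
  geom_area() +                         # stacked areas, one per component
  guides(fill = guide_legend(ncol = 2)) # legend follows the fill aesthetic
print(p)
```

The general rule is that the argument name in guides() must match the aesthetic that produced the legend.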

Universal scale bar for paneled levelplots

I would like to have multiple heatmaps/levelplots in a single plot, with a universal scale bar. I have the plots arranged, and I think I'm close to the answer, but I want to make sure I don't mess the scale up.
#Fake data
library(lattice) # provides levelplot
library(gridExtra) # provides grid.arrange
fill = rnorm(100,4)
matA = matrix(fill, ncol=10)
matB = matrix(fill * 2, ncol=10)
# Plotting
a=levelplot(matA, colorkey=FALSE)
b=levelplot(matB, colorkey=list(col=rainbow(1000), at=seq(0,6, length.out=1000)))
grid.arrange(a,b,ncol=2)
Thanks for any help!
Instead of using grid.arrange, you can rearrange your data so that you can pass a formula as the x argument of levelplot. This lets you easily create a plot with one panel per level of a grouping variable g, all sharing a common scale. Here g ('L1') identifies the different matrices.
library(reshape2)
library(lattice)
# put your matrices in a list and melt them into one data frame
l <- list(matA, matB)
df <- melt(l)
# plot
levelplot(value ~ Var1 * Var2 | L1, data = df,
col.regions = rainbow(100))
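A base R sketch of the same approach, without reshape2, that also makes the shared scale explicit through the at argument of levelplot. The column names i, j and mat and the object names mats and breaks are made up for this example.

```r
library(lattice)

set.seed(1)
fill <- rnorm(100, 4)
mats <- list(A = matrix(fill, ncol = 10),
             B = matrix(fill * 2, ncol = 10))

# Stack the matrices into one long data frame with a grouping column
df <- do.call(rbind, lapply(names(mats), function(nm) {
  m <- mats[[nm]]
  data.frame(i = as.vector(row(m)), j = as.vector(col(m)),
             value = as.vector(m), mat = nm)
}))
df$mat <- factor(df$mat)

# One set of breaks covering both matrices, so both panels share the key
breaks <- seq(min(df$value), max(df$value), length.out = 101)

p <- levelplot(value ~ i * j | mat, data = df,
               at = breaks, col.regions = rainbow(100))
print(p)
```

Since the breaks are computed from the combined values, a given colour means the same value in both panels.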

R: Loop pairs of columns in a dataframe

Is it possible to plot pairs of columns in a single plot with a loop? For example, if I have a data frame of time series with 10 columns (x1, x2.. x10), I would like to create 5 plots: 1st plot will display x1 and x2, the 2nd plot would display x3 and x4 and so on.
Any plotting method would be useful, (zoo, lattice, ggplot2).
I got stuck at creating a loop to plot a single variable:
set.seed(1)
x<- data.frame(replicate(10,rnorm(10, mean = 0, sd = 1)))
cols <- seq(1,10)
library(zoo)
z <- read.zoo(x)
for (i in cols) {
plot(z[,i], screen = 1)
}
Thanks in advance.
How about this with ggplot2 and reshape2:
require(reshape2)
require(ggplot2)
m<-melt(matrix(z,10))
m$facet<-cut(m$Var2,c(0,2,4,6,8,10))
ggplot(m)+geom_line(aes(x=Var1,y=value,group=Var2,color=factor(Var2)))+facet_wrap(~ facet)
It can be done in a single line without a loop, as shown here; the col argument makes the odd series black and the even series red. Note that z in the question has only 9 columns (since the first column in x is used as the time index), so a 10-column z is used below instead, which is likely what was intended.
library(zoo)
# test data
set.seed(123); z <- zoo(matrix(rnorm(250), 25)); colnames(z) <- make.names(1:10)
plot(z, screen = rep(colnames(z)[c(TRUE, FALSE)], each = 2), col = 1:2)
To produce a single column, add the argument nc = 1; to produce a lattice plot, replace plot with xyplot.
Like this? I'm not sure how you want to plot it, though.
par(mfrow=c(1,5))
for (i in seq(1,10,by=2)){
plot(x[,i],x[,i+1])
}
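If the goal is to overlay each pair of columns as lines in one panel rather than plot one column against the other, a base R loop along these lines may work. It is only a sketch: pair_id is a made-up name, and the 2-by-3 panel layout is an arbitrary choice.

```r
set.seed(1)
x <- data.frame(replicate(10, rnorm(10)))

# Map columns 1,2 -> pair 1; 3,4 -> pair 2; ...; 9,10 -> pair 5
pair_id <- (seq_along(x) + 1) %/% 2

op <- par(mfrow = c(2, 3)) # room for 5 panels
for (p in unique(pair_id)) {
  cols <- which(pair_id == p)
  # Draw both series of the pair as lines in the same panel
  matplot(as.matrix(x[, cols]), type = "l", lty = 1,
          col = c("black", "red"), ylab = "value",
          main = paste(names(x)[cols], collapse = " & "))
}
par(op) # restore the previous layout
```

The integer division keeps the pairing logic in one place, so switching to triples only means changing the divisor and offset.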
