Plot data based on three objects as z-scores


I'm using the package "networktools" in R (https://cran.r-project.org/web/packages/networktools/networktools.pdf).
I've created three "bridge"-objects: DataT5_SDQ_network_b, DataT6_SDQ_network_b, and DataT7_SDQ_network_b.
The three bridge-objects are downloadable here: https://drive.google.com/file/d/12Hgq78RjuXXRLplIXJw6SNU4NoZbULcc/view?usp=sharing.
The code which generates a bridge-object (using networktools):
DataT5_SDQ_network_b <- bridge(DataT5_SDQ_network,
                               communities = SDQ_communitiesSG,
                               directed = FALSE,
                               nodes = DataT5_SDQ_list$names)
I have successfully plotted the three "bridge"-objects in the same plot (with legend) using this code:
p <- lapply(list(DataT5_SDQ_network_b,
DataT6_SDQ_network_b,
DataT7_SDQ_network_b), function(x) suppressWarnings(plot(x)))
p <- Map(function(a, b) { a$data$Class <- b; a}, a = p, b = c("T5", "T6", "T7"))
p[[1]]$data <- do.call(rbind, lapply(p, function(x) x$data))
p <- p[[1]] + aes(color = Class, group = Class)
p
The result:
Questions:
How can I plot the data as z-scores?
How can I get the plot only showing Bridge Expected Influence (1-step)?

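I found a solution:
First, create each plot separately, restricting it to Bridge Expected Influence (1-step) with include and requesting z-scores with zscore=TRUE: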
gg1 <- plot(DataT5_SDQ_network_b, include=c("Bridge Expected Influence (1-step)"), theme_bw=FALSE, zscore=TRUE)
gg2 <- plot(DataT6_SDQ_network_b, include=c("Bridge Expected Influence (1-step)"), theme_bw=FALSE, zscore=TRUE)
gg3 <- plot(DataT7_SDQ_network_b, include=c("Bridge Expected Influence (1-step)"), theme_bw=FALSE, zscore=TRUE)
Then combine the plots:
p <- list(gg1, gg2, gg3)
p <- Map(function(a, b) { a$data$Class <- b; a}, a = p, b = c("T5", "T6", "T7"))
p[[1]]$data <- do.call(rbind, lapply(p, function(x) x$data))
p <- p[[1]] + aes(color = Class, group = Class)
p
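For readers who want to see what the standardization amounts to, here is a minimal base-R sketch. It assumes that zscore=TRUE standardizes each measure within one bridge object as (x - mean) / sd; the data frame bei and its values are hypothetical and are not taken from the linked objects.

```r
# Hypothetical bridge expected influence values for three nodes at two time points
bei <- data.frame(
  node  = rep(c("n1", "n2", "n3"), times = 2),
  Class = rep(c("T5", "T6"), each = 3),
  value = c(0.10, 0.30, 0.50, 0.20, 0.20, 0.80)
)

# z-score within each time point: (x - mean) / sd
bei$z <- ave(bei$value, bei$Class, FUN = function(x) (x - mean(x)) / sd(x))
bei
```

Because each time point is standardized on its own, the z-scores compare nodes within a time point rather than raw magnitudes across time points.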

Related

Adding layers to ggplots works but adding in a loop does not

I want to add to an existing ggplot in a loop. It works fine as shown in a minimal example below when I add points to a list not using a loop (left plot). When I do the same in a loop, the final result only contains the last point that I added (right plot).
library(ggplot2)
p <- list(); pl <- list()
x0 <- c(-1,1); y0 <- c(-3,3); q <- c(-5,5)
#no loop
p[[1]] <- ggplot() + coord_fixed(xlim = q, ylim = q)
p[[2]] <- p[[1]] +geom_point(aes(x=x0[1], y=y0[1]))
p[[3]] <- p[[2]] + geom_point(aes(x=x0[2], y=y0[2]))
#loop
pl[[1]] <- ggplot() + coord_fixed(xlim = q, ylim = q)
for (i in 1:2)
{
pl[[i+1]] <- pl[[i]] + geom_point(aes(x=x0[i], y=y0[i]))
}
p[[3]]
pl[[3]]

This is due to what is called "lazy evaluation", which is explained in several other posts.
You don't need to store the plots in lists; you can simply overwrite the plot object and get the same result.
As for the loop, you need to put your data into a data.frame and pass it to
geom_point():
p <- list(); pl <- list()
x0 <- c(-1,1); y0 <- c(-3,3); q <- c(-5,5)
#no loop
p <- ggplot() + coord_fixed(xlim = q, ylim = q)
p <- p +geom_point(aes(x=x0[1], y=y0[1]))
p <- p + geom_point(aes(x=x0[2], y=y0[2]))
#loop
pl <- ggplot() + coord_fixed(xlim = q, ylim = q)
for (i in 1:2) {
  data <- cbind.data.frame(x = x0[i], y = y0[i])
  pl <- pl + geom_point(data = data, aes(x = x, y = y))
}
p
pl

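You're a victim of lazy evaluation. aes() stores the expressions x0[i] and y0[i] and only evaluates them when the plot is drawn, by which time the single i used by the for loop holds its last value. lapply calls the function once per element, so each call has its own i. So,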
p <- ggplot() + coord_fixed(xlim = q, ylim = q)
lapply(
1:2,
function(i) p <<- p + geom_point(aes(x=x0[i], y=y0[i]))
)
gives you what you want.
Note the use of <<- as a quick and dirty fix.
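The same effect can be reproduced without ggplot2. The sketch below is only an illustration of the mechanism (the names fs and gs are made up): functions created in a for loop all look up the one shared i, while functions created inside lapply each get their own.

```r
# for loop: every function refers to the same variable i,
# which holds its final value once the loop has finished
fs <- list()
for (i in 1:2) fs[[i]] <- function() i
loop_vals <- sapply(fs, function(f) f())

# lapply: each call of the outer function has its own i
gs <- lapply(1:2, function(i) function() i)
lapply_vals <- sapply(gs, function(g) g())

loop_vals    # both entries are 2
lapply_vals  # 1 and 2
```

This is also why passing a per-iteration data.frame to geom_point() works: the values are stored in the layer's data instead of being looked up later.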

Jitter dots without overlap

My data:
a <- sample(1:5, 100, replace = TRUE)
b <- sample(1:5, 100, replace = TRUE)
c <- sample(1:10, 100, replace = TRUE)
d <- sample(1:40, 100, replace = TRUE)
df <- data.frame(a, b, c, d)
Using ggplot2, I have created a scatterplot of x = a against y = b, with two further dimensions mapped to colour (c) and size (d). Note that x and y are intentionally restricted to 1:5.
Obviously, the points of different sizes and colors therefore overlap, so I tried jitter to avoid overlapping:
ggplot(df, aes(a, b, colour = c, size = d)) +
geom_point(position = position_jitter())
Now I would like the dots clustering closer together, so I tried several
combinations of height and width for the jitter function, such as
ggplot(df, aes(a, b, colour = c, size = d)) +
geom_point(position = position_jitter(width = 0.2, height = 0.2))
With jitter the dots still overlap, and they are distributed too randomly over the given area.
Is there a way to have the dots not overlapping at all, yet clustered as close together as possible, maybe even touching and also not "side by side" or stacked? (In a way, creating kind of bubbles with smaller dots)?
Thanks!

Following @Tjebo's suggestions, I have arranged the dots in "heaps".
set.seed(1234)
n <- 100
a <- sample(1:5,n,rep=TRUE)
b <- sample(1:5,n,rep=TRUE)
c <- sample(1:10,n,rep=TRUE)
d <- sample(1:40,n,rep=TRUE)
df0 <- data.frame(a,b,c,d)
# These parameters need careful tuning
minr <- 0.05
maxr <- 0.2
# Order circles by dimension
ord <- FALSE
df1 <- df0
df1$d <- minr+(maxr-minr)*(df1$d-min(df1$d))/(max(df1$d)-min(df1$d))
avals <- unique(df1$a)
bvals <- unique(df1$b)
for (k1 in seq_along(avals)) {
  for (k2 in seq_along(bvals)) {
    print(paste(k1,k2))
    subk <- (df1$a==avals[k1] & df1$b==bvals[k2])
    if (sum(subk)>1) {
      subdfk <- df1[subk,]
      if (ord) {
        idx <- order(subdfk$d)
        subdfk <- subdfk[idx,]
      }
      subdfk.mod <- subdfk
      posmx <- which.max(subdfk$d)
      subdfk1 <- subdfk[posmx,]
      subdfk2 <- subdfk[-posmx,]
      angsk <- seq(0,2*pi,length.out=nrow(subdfk2)+1)
      subdfk2$a <- subdfk2$a+cos(angsk[-length(angsk)])*(subdfk1$d+subdfk2$d)/2
      subdfk2$b <- subdfk2$b+sin(angsk[-length(angsk)])*(subdfk1$d+subdfk2$d)/2
      subdfk.mod[posmx,] <- subdfk1
      subdfk.mod[-posmx,] <- subdfk2
      df1[subk,] <- subdfk.mod
    }
  }
}
library(ggplot2)
library(ggforce)
ggplot(df1, aes()) +
geom_circle(aes(x0=a, y0=b, r=d/2, fill=c), alpha=0.7)+ coord_fixed()

An interesting visualization tool is the beeswarm plot.
In R the beeswarm and the ggbeeswarm packages implement this kind of plot.
Here is an example with ggbeeswarm:
set.seed(1234)
a <- sample(1:5,100,rep=TRUE)
b <- sample(1:5,100,rep=TRUE)
c <- sample(1:10,100,rep=TRUE)
d <- sample(1:40,100,rep=TRUE)
df <- data.frame(a,b,c,d)
library(ggbeeswarm)
ggplot(aes(x=a, y=b, col=c, size=d), data = df)+
geom_beeswarm(priority='random',cex=3.5, groupOnX=T)+coord_flip()
I hope this can help you.

Here is another possible solution to @Tjebo's jittering problem.
The parameter dst needs some tuning.
set.seed(1234)
a <- sample(1:5,100,rep=TRUE)
b <- sample(1:5,100,rep=TRUE)
c <- sample(1:10,100,rep=TRUE)
d <- sample(1:40,100,rep=TRUE)
df <- data.frame(a,b,c,d)
dst <- .2
df.mod <- df
avals <- unique(df$a)
bvals <- unique(df$b)
for (k1 in seq_along(avals)) {
  for (k2 in seq_along(bvals)) {
    subk <- (df$a==avals[k1] & df$b==bvals[k2])
    if (sum(subk)>1) {
      subdf <- df[subk,]
      angsk <- seq(0,2*pi,length.out=nrow(subdf)+1)
      ak <- subdf$a+cos(angsk[-1])*dst
      bk <- subdf$b+sin(angsk[-1])*dst
      df.mod[subk,c("a","b")] <- cbind(ak,bk)
    }
  }
}
library(ggplot2)
ggplot(df.mod, aes(a, b, colour = c, size = d)) + geom_point()
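The core of the loop above is placing n points evenly on a circle of radius dst around a grid position. A small sketch of that step in isolation, using a hypothetical helper named ring_offsets (not part of any package), may make the tuning of dst easier to reason about:

```r
# Hypothetical helper: offsets of n points spread evenly on a circle of radius dst
ring_offsets <- function(n, dst) {
  # n + 1 angles from 0 to 2*pi; drop the first so that 0 and 2*pi are not both used
  ang <- seq(0, 2 * pi, length.out = n + 1)[-1]
  cbind(dx = cos(ang) * dst, dy = sin(ang) * dst)
}

off <- ring_offsets(4, 0.2)
off
# Every point lies at distance dst from the centre
sqrt(rowSums(off^2))
```

With many points in one cell, neighbouring points on the ring get closer together, so dst and the point sizes have to be balanced against each other.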

geom_raster faceted plot with ggplot2: control row height

In the example below I have a dataset containing two experiments, F1 and F2. A classification is performed based on the F1 signal, and both F1 and F2 values are ordered accordingly. In this diagram, each facet has the same dimensions although the number of rows is not the same (e.g. class #7 contains only a few elements compared to the other classes). I would like to modify the code to force the row height to be the same across facets (some facets would thus have blank space below). Any hints would be greatly appreciated.
Thank you
library(ggplot2)
library(reshape2)
set.seed(123)
# let's create a fake dataset
nb.experiment <- 4
n.row <- 200
n.col <- 5
nb.class <- 7
d <- matrix(round(runif(n.row * n.col),2), nc=n.col)
colnames(d) <- sprintf("%02d", 1:5)
# These strings will be the row names of each heatmap
# in the subsequent facet plot
elements <- sample(replicate(n.row/2, rawToChar(as.raw(sample(65:90, 6, replace=T)))))
# let's create a data.frame d
d <- data.frame(d,
experiment = sort(rep(c("F1","F2"), n.row/2)),
elements= elements)
# Now we split the dataset by experiments
d.split <- split(d, d$experiment)
# Now we create classes (here using hierarchical clustering )
# based on F1 experiment
dist.mat <- as.dist(1-cor(t(d.split$F1[,1:5]), method="pearson"))
hc <- hclust(dist.mat)
cuts <- cutree(hc, nb.class)
levels(cuts) <- sprintf("Class %02d", 1:nb.experiment)
# We split F1 and F2 based on classification result
for(s in names(d.split)){
d.split[[s]] <- split(d.split[[s]], cuts)
}
# Data are melted (there is perhaps a better solution...)
# in order to use the ggplot function
dm <- melt(do.call('rbind',lapply(d.split, melt)), id.var=c( "experiment", "elements", "variable", "L1"))
dm <- dm[, -5]
colnames(dm) <- c("experiment","elements", "pos", "class", "exprs")
dm$class <- as.factor(dm$class)
levels(dm$class) <- paste("Class", levels(dm$class))
# Now we plot the data
p <- ggplot(dm, aes(x = pos, y = elements, fill = exprs))
p <- p + geom_raster()
p <- p + facet_wrap(~class +experiment , scales = "free", ncol = 2)
p <- p + theme_bw()
p <- p + theme(text = element_text(size=4))
p <- p + theme(text = element_text(family = "mono", face = "bold"))
print(p)

Use facet_grid instead of facet_wrap and set the space argument (the class column is named class in the data above):
ggplot(dm, aes(x = pos, y = elements, fill = exprs)) +
geom_raster() +
facet_grid(class ~ experiment , scales = "free", space = "free_y") +
theme_bw()
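To see the effect of space = "free_y" on a smaller scale, here is a minimal sketch with hypothetical toy data (the data frame toy is made up for illustration). Class "A" has two elements and class "B" has six, so with free space the "B" panel should be about three times as tall and every row gets the same height.

```r
library(ggplot2)

# Hypothetical toy data: class "A" has 2 elements, class "B" has 6
toy <- data.frame(
  pos      = rep(c("01", "02"), times = 8),
  elements = rep(paste0("e", 1:8), each = 2),
  class    = rep(c("A", "B"), times = c(4, 12)),
  exprs    = seq(0, 1, length.out = 16)
)

# space = "free_y" only has an effect when the y scale is free as well
p <- ggplot(toy, aes(x = pos, y = elements, fill = exprs)) +
  geom_raster() +
  facet_grid(class ~ ., scales = "free_y", space = "free_y")
# print(p) draws the plot
```

Note that facet_wrap has no space argument, which is why the switch to facet_grid is needed.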

Order heatmap rows in ggplot2 facet plot

I'm having a problem with faceted heatmap rendering in ggplot2. The idea is that I have several elements (these are genes in real life) and several experiments (F1 and F2 in the example below). Using the F1 experiment, I'm able to create classes of elements/genes based on their mean expression (high, ..., moderate, ..., low). In the heatmap produced by the example below, I would like to order the elements within each class (01, 02, 03, 04) by their mean expression value in F1. Unfortunately, the elements appear in alphabetical order. I would be very happy to get some hints...
Best
library(ggplot2)
library(reshape2)
set.seed(123)
# let's create a fake dataset
nb.experiment <- 4
n.row <- 200
n.col <- 5
d <- matrix(round(runif(n.row * n.col),2), nc=n.col)
colnames(d) <- sprintf("%02d", 1:5)
# These strings will be the row names of each heatmap
# in the subsequent facet plot
elements <- sample(replicate(n.row/2, rawToChar(as.raw(sample(65:90, 6, replace=T)))))
# let's create a data.frame d
d <- data.frame(d,
experiment = sort(rep(c("F1","F2"), n.row/2)),
elements= elements)
# For elements related to experiment F1
# we artificially produce a gradient of values that will
# create elements with increasing row means
d[d$experiment =="F1",1:5] <- round(sweep(d[d$experiment =="F1",1:5],
1,
seq(from=1, 10, length.out = 100),
"+"), 2)
# For elements related to experiment F2
# we artificially produce a gradient of values that will
# create elements with decreasing row means
d[d$experiment =="F2",1:5] <- round(sweep(d[d$experiment =="F2",1:5],
1,
seq(from=10, 1, length.out = 100),
"+"), 2)
#print(d[d$experiment =="F1",1:5])
# Now we split the dataset by experiments
d.split <- split(d, d$experiment)
# For all experiments, we order elements based on the mean expression signal in
# F1.
row.means.F1 <- rowMeans(d.split$F1[,1:5])
pos <- order(row.means.F1)
for(s in names(d.split)){
d.split[[s]] <- d.split[[s]][pos,]
}
# We create several classes of elements based on their
# mean expression signal in F1.
cuts <- cut(1:nrow(d.split$F1), nb.experiment)
levels(cuts) <- sprintf("%02d", 1:nb.experiment)
for(s in names(d.split)){
d.split[[s]] <- split(d.split[[s]], cuts)
}
# Data are melted (there is perhaps a better solution...)
# in order to use the ggplot function
dm <- melt(do.call('rbind',lapply(d.split, melt)), id.var=c( "experiment", "elements", "variable", "L1"))
dm <- dm[, -5]
colnames(dm) <- c("experiment","elements", "pos", "rowMeanClass", "exprs")
# Now we plot the data
p <- ggplot(dm, aes(x = pos, y = elements, fill = exprs))
p <- p + geom_raster()
p <- p + facet_wrap(~rowMeanClass +experiment , scales = "free", ncol = 2)
p <- p + theme_bw()
p <- p + theme(text = element_text(size=4))
p <- p + theme(text = element_text(family = "mono", face = "bold"))
ggsave("RPlot_test.jpeg", p)

Using your advice I was able to find a solution, which requires explicitly specifying the order of the levels of the 'elements' factor. Thank you @hrbrmstr (and all others).
NB: I only added a few lines compared to the original code; they are marked below with 'Added: begin' and 'Added: end' flags.
library(ggplot2)
library(reshape2)
set.seed(123)
# let's create a fake dataset
nb.experiment <- 4
n.row <- 200
n.col <- 5
d <- matrix(round(runif(n.row * n.col),2), nc=n.col)
colnames(d) <- sprintf("%02d", 1:5)
# These strings will be the row names of each heatmap
# in the subsequent facet plot
elements <- sample(replicate(n.row/2, rawToChar(as.raw(sample(65:90, 6, replace=T)))))
# let's create a data.frame d
d <- data.frame(d,
experiment = sort(rep(c("F1","F2"), n.row/2)),
elements= elements)
# For elements related to experiment F1
# we artificially produce a gradient of values that will
# create elements with increasing row means
d[d$experiment =="F1",1:5] <- round(sweep(d[d$experiment =="F1",1:5],
1,
seq(from=1, 10, length.out = 100),
"+"), 2)
# For elements related to experiment F2
# we artificially produce a gradient of values that will
# create elements with decreasing row means
d[d$experiment =="F2",1:5] <- round(sweep(d[d$experiment =="F2",1:5],
1,
seq(from=10, 1, length.out = 100),
"+"), 2)
#print(d[d$experiment =="F1",1:5])
# Now we split the dataset by experiments
d.split <- split(d, d$experiment)
# For all experiments, we order elements based on the mean expression signal in
# F1.
row.means.F1 <- rowMeans(d.split$F1[,1:5])
pos <- order(row.means.F1)
for(s in names(d.split)){
d.split[[s]] <- d.split[[s]][pos,]
}
## Added: begin ###
#Get the list of elements in proper order (based on row mean)
mean.order <- as.character(d.split$F1$elements)
## Added: end###
# We create several classes of elements based on their
# mean expression signal in F1.
cuts <- cut(1:nrow(d.split$F1), nb.experiment)
levels(cuts) <- sprintf("%02d", 1:nb.experiment)
for(s in names(d.split)){
d.split[[s]] <- split(d.split[[s]], cuts)
}
# Data are melted (there is perhaps a better solution...)
# in order to use the ggplot function
dm <- melt(do.call('rbind',lapply(d.split, melt)), id.var=c( "experiment", "elements", "variable", "L1"))
dm <- dm[, -5]
colnames(dm) <- c("experiment","elements", "pos", "rowMeanClass", "exprs")
## Added: begin###
#Ensure that dm$elements is an ordered factor with levels
# ordered as expected
dm$elements <- factor(dm$elements, levels = mean.order, ordered = TRUE)
## Added: end###
# Now we plot the data
p <- ggplot(dm, aes(x = pos, y = elements, fill = exprs))
p <- p + geom_raster()
p <- p + facet_wrap(~rowMeanClass +experiment , scales = "free", ncol = 2)
p <- p + theme_bw()
p <- p + theme(text = element_text(size=4))
p <- p + theme(text = element_text(family = "mono", face = "bold"))
ggsave("RPlot_test.jpeg", p)
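The key point is that ggplot2 places a discrete axis in the order of the factor levels, not in the order of the rows. A stripped-down sketch of the level-ordering step, with hypothetical element names and row means:

```r
# Hypothetical row means for three elements
row_means <- c(GENEB = 3, GENEA = 1, GENEC = 2)

# Default factor(): levels are sorted alphabetically
f_default <- factor(names(row_means))
levels(f_default)

# Explicit levels: sorted by increasing row mean instead
f_ordered <- factor(names(row_means), levels = names(sort(row_means)))
levels(f_ordered)
```

In the full example above, mean.order plays the role of names(sort(row_means)).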

Matrix of density plots with each plot overlaying two distributions

I have a data.frame with 5 columns and I'd like to generate a matrix of density plots, such that each panel is an overlay of two density plots. (This is akin to plotmatrix, except that in my case the number of non-NA values differs from column to column and I want overlaid distributions rather than scatter plots.)
My first attempt, which didn't work, is given below:
library(ggplot2)
library(reshape)
tmp1 <- data.frame(do.call(cbind, lapply(1:5, function(x) {
r <- rnorm(100)
r[sample(1:100, 20)] <- NA
return(r)
})))
ggplot( melt(tmp1), aes(x=value, fill=variable))+
geom_density(alpha=0.2, position="identity")+theme(legend.position = "none")+
facet_grid(variable ~ variable)
My second approach got me nearly there, but instead of 5 different colors, I only want to use two colors across all the plots. And, I'm sure there is a more elegant way to construct this expanded matrix:
tmp2 <- do.call(rbind, lapply(1:5, function(i) {
do.call(rbind, lapply(1:5, function(j) {
r <- rbind(data.frame(var=sprintf('X%d', i), val=tmp1[,i]),
data.frame(var=sprintf('X%d', j), val=tmp1[,j]))
r <- data.frame(xx=sprintf('X%d', i), yy=sprintf('X%d', j), r)
return(r)
}))
}))
ggplot(tmp2, aes(x=val, fill=var))+
geom_density(alpha=0.2, position="identity")+theme(legend.position = "none")+
facet_grid(xx ~ yy)
My solution was to manually loop through the pairs of columns and generate the overlaid density plots by hand, saving them to a list. I then arranged them in a grid using grid.arrange.
But is it possible to achieve this using facet_grid instead?

The easiest way is to reshape your data with all permutations (5 * 5 = 25 of them).
require(gtools)  # provides permutations(); the old gregmisc bundle is no longer available
perm <- permutations(5, 2, paste0("X", 1:5), repeats.allowed=TRUE)
# instead of gtools + permutations, you can use expand.grid from base as:
# perm <- expand.grid(paste0("X", 1:5), paste0("X", 1:5))
o <- apply(perm, 1, function(idx) {
t <- tmp1[idx]
names(t) <- c("A", "B")
t$id1 <- idx[1]
t$id2 <- idx[2]
t
})
require(ggplot2)
require(reshape2)
o <- do.call(rbind, o)
o.m <- melt(o, c("id1", "id2"))
o.m$id1 <- factor(o.m$id1)
o.m$id2 <- factor(o.m$id2)
p <- ggplot(o.m, aes(x = value))
p <- p + geom_density(alpha = 0.2, position = "identity", aes(fill = variable))
p <- p + theme(legend.position = "none")
p <- p + facet_grid(id1 ~ id2)
p
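As a rough base-R sketch of the same reshaping idea (no gtools or reshape2 needed), the snippet below builds the long table directly. The names pairs_df and long are made up for illustration, and a smaller 3-column tmp1 is used so the result is easy to check: 3 * 3 pairs, each contributing two columns of 10 values.

```r
set.seed(1)
# Hypothetical small input: 3 columns of 10 values each
tmp1 <- data.frame(X1 = rnorm(10), X2 = rnorm(10), X3 = rnorm(10))

# All ordered pairs of column names, including a column paired with itself
pairs_df <- expand.grid(id1 = names(tmp1), id2 = names(tmp1),
                        stringsAsFactors = FALSE)

# For each pair, stack the two columns and label them "A" and "B"
long <- do.call(rbind, lapply(seq_len(nrow(pairs_df)), function(k) {
  i <- pairs_df$id1[k]
  j <- pairs_df$id2[k]
  rbind(
    data.frame(id1 = i, id2 = j, variable = "A", value = tmp1[[i]]),
    data.frame(id1 = i, id2 = j, variable = "B", value = tmp1[[j]])
  )
}))

dim(long)
```

The resulting long table has the same columns (id1, id2, variable, value) as o.m above, so it can be passed to the same ggplot call; since variable only takes the values "A" and "B", only two fill colours are used across all panels.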
