Change point plotting order in ggnet - r

I am working with a large network and wish to highlight certain nodes. I would like these nodes to plot on top of a dense network. They are currently identified by a certain color. Here is some simple example code.
library(network)
library(GGally)
# make a random network
x <- c(0,1,0,1,1,1,0,1,0,1,0,1)
seed <- c(10,25,40,34,1,35,6,3,14,5,23,3)
net <- data.frame(matrix(nrow = 12, ncol = 12))
for (i in 1:12) {
set.seed(seed[i])
net[i] <- sample(x)
}
#plot it with two colors
plot = as.network(net,
directed = FALSE,
ignore.eval = FALSE,
names.eval = 'R_val')
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
final <- ggnet2(net,size = 25,color = color,label = TRUE)
I have really exaggerated the dot size here to make them overlap. Is there a way I can get the "yes" points to always plot on top of the "no" points?
EDIT: Added "labels" for clarity.

Yes, there is! Your color vector lists the "yes" entries first and then the "no" entries, which seems to determine the plotting order. Assuming you have more than just "yes" and "no", you could try converting the color vector to a factor and setting its levels. Then you can sort the order of your "yes"s and "no"s:
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
factor_color <- sort(factor(color, levels = c("no", "yes")))
ggnet2(net, size = 100, color = factor_color)
EDIT 1
As per your comment, I cannot think of a (more) elegant solution, but this works for me:
#plot it with two colors
plot = as.network(net,
directed = FALSE,
ignore.eval = FALSE,
names.eval = 'R_val')
color <- c("yes","yes","no","no","no","no","no","no","no","no","no","no")
final <- ggnet2(net,size = 100, color = color, label = TRUE)
final_build <- ggplot2::ggplot_build(final)
# Extract the geom_point data and find which elements have 'yes'
yes_index <- which(color == "yes")
label_data <- final_build$data[[2]]
yes_coordinates_label <- cbind(label_data[yes_index,], label = names(net)[yes_index])
final +
geom_point(data = yes_coordinates_label, aes(x = x, y = y),
size = 100, color = dplyr::first(yes_coordinates_label$colour)) +
geom_text(data = yes_coordinates_label, aes(x = x, y = y, label = label))
The idea is to plot the dots with geom_point() again but only for the dots which are "yes".
EDIT 2
I couldn't help but think of another solution without plotting the points again. It is possible to retrieve the plot information using ggplot_build() and then to reorder the hierarchy of the points drawn; the datapoints which come first are drawn first. Hence doing the following will work:
library(tidyverse)
# Find the index of the GeomPoint layer
geom_types <- final$layers %>% map("geom") %>% map(class)
GeomPoint_ind <- which(sapply(geom_types, function(x) "GeomPoint" %in% x))
# Retrieve plot information
final_build <- ggplot2::ggplot_build(final)
df <- final_build$data[[GeomPoint_ind]]
# Set the indices which you would like to have on top and modify the ggplot_build object.
yes_index <- which(color == "yes")
final_build$data[[GeomPoint_ind]] <- rbind(df[-yes_index,], df[yes_index,])
# Convert to plot object and plot
new_final <- ggplot_gtable(final_build)
plot(new_final)
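The reordering idea can also be tried outside of ggnet2. The following is only a minimal sketch on hypothetical toy data (the data frame `pts` and its `highlight` column are made up for illustration): it moves the "yes" rows to the end of the data frame, so that geom_point() draws them last and therefore on top.

```r
library(ggplot2)

# Hypothetical toy data: 12 points, the first two flagged "yes"
set.seed(1)
pts <- data.frame(
  x = runif(12),
  y = runif(12),
  highlight = c("yes", "yes", rep("no", 10))
)

# order() on a logical vector puts FALSE before TRUE,
# so the "yes" rows are moved to the end of the data frame
pts_ordered <- pts[order(pts$highlight == "yes"), ]

# geom_point() draws rows in data order, so the last rows end up on top
p <- ggplot(pts_ordered, aes(x = x, y = y, colour = highlight)) +
  geom_point(size = 25)
print(p)
```

Whether this is simpler than editing the ggplot_build() object depends on whether you control the data frame that goes into the plot; with ggnet2 you do not, which is why the answer modifies the built plot instead.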

Related

How to specify groups with colors in qqplot()?

I have created a qqplot (with quantiles of beta distribution) from a dataset including two groups. To visualize, which points belong to which group, I would like to color them. I have tried the following:
res <- beta.mle(data$values) #estimate parameters of beta distribution
qqplot(qbeta(ppoints(500),res$param[1], res$param[2]),data$values,
col = data$group,
ylab = "Quantiles of data",
xlab = "Quantiles of Beta Distribution")
the result is shown here:
I have seen solutions specifying a "col" vector for qqnorm; however, this does not seem to work with qqplot, as simply half of the points are colored in either color, regardless of group. Is there a way to fix this?
I simulated some data just to show how to add color in ggplot
Libraries
library(tidyverse)
# install.packages("Rfast")
Data
#Simulating data from beta distribution
x <- rbeta(n = 1000,shape1 = .5,shape2 = .5)
#Estimating parameters
res <- Rfast::beta.mle(x)
data <-
tibble(
simulated_data = sort(x),
quantile_data = qbeta(ppoints(length(x)),res$param[1], res$param[2])
) %>%
#Creating a group variable using quartiles
mutate(group = cut(x = simulated_data,
quantile(simulated_data,seq(0,1,.25)),
include.lowest = T))
Code
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = simulated_data, col = group))+
geom_point()
Output
For those who are wondering, how to work with pre-defined groups, this is the code that worked for me:
library(tidyverse)
library(Rfast)
res <- beta.mle(x)
# make sure groups are not numerical
# (else color scale might turn out continuous)
g <- plyr::mapvalues(g, c("1", "2"), c("Group1", "Group2"))
data <-
tibble(
my_data = sort(x),
quantile_data = qbeta(ppoints(length(x)),res$param[1], res$param[2]),
group = g[order(x)]
)
data %>%
# Adding group variable as color
ggplot(aes( x = quantile_data, y = my_data, col = group))+
geom_point()
result
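The same idea also works in base graphics. qqplot() sorts both samples internally but passes `col` through unchanged, which is why the colors no longer line up with the groups. The sketch below uses simulated data and assumes known shape parameters (2 and 5) instead of estimating them with beta.mle(); the names `values`, `group`, `ord` and `theo` are made up for illustration.

```r
set.seed(42)
values <- rbeta(200, shape1 = 2, shape2 = 5)
group  <- factor(sample(c("Group1", "Group2"), 200, replace = TRUE))

# Sort the data ourselves and reorder the groups with the same permutation
ord  <- order(values)
theo <- qbeta(ppoints(length(values)), shape1 = 2, shape2 = 5)

plot(theo, values[ord],
     col = as.integer(group[ord]), pch = 19,
     xlab = "Quantiles of Beta Distribution",
     ylab = "Quantiles of data")
legend("topleft", legend = levels(group),
       col = seq_along(levels(group)), pch = 19)
```

The key step is indexing both the data and the group vector with the same `ord`, which corresponds to `group = g[order(x)]` in the code above.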

R control jitter function - avoid overplotting / non-random jitter

My problem seems simple: I am using ggplot2 with geom_jitter() to plot a variable (take my picture as an example).
Jitter now adds some random noise to the variable (the variable is just called "1" in this example) to prevent overplotting. So I have now random noise in the y-direction and clearly what otherwise would be completely overplotted is now better visible.
But here is my question:
As you can see, there are still some points that overplot each other. In my example, this could easily be prevented if the noise in the y-direction were not random but consisted of more strategically placed offsets.
Can I somehow alter the geom_jitter() behavior or is there a similar function in ggplot2 that does exactly this?
Not really a minimal example, but also not too long:
library("imputeTS")
library("ggplot2")
data <- tsAirgap
# 2.1 Create required data
# Get all indices of the data that comes directly before and after an NA
na_indx_after <- which(is.na(data[1:(length(data) - 1)])) + 1
# starting from index 2 moves all indexes one in front, so no -1 needed for before
na_indx_before <- which(is.na(data[2:length(data)]))
# Get the actual values to the indices and put them in a data frame with a label
before <- data.frame(id = "1", type = "before", input = na_remove(data[na_indx_before]))
after <- data.frame(id = "1", type = "after", input = na_remove(data[na_indx_after]))
all <- data.frame(id = "1", type = "source", input = na_remove(data))
# Get n values for the plot labels
n_before <- length(before$input)
n_all <- length(all$input)
n_after <- length(after$input)
# 2.4 Create dataframe for ggplot2
# join the data together in one dataframe
df <- rbind(before, after, all)
# Create the plot
gg <- ggplot(data = df) +
geom_jitter(mapping = aes(x = id, y = input, color = type, alpha = type), width = 0.5 , height = 0.5)
gg <- gg + ggplot2::scale_color_manual(
values = c("before" = "skyblue1", "after" = "yellowgreen","source" = "gray66")
)
gg <- gg + ggplot2::scale_alpha_manual(
values = c("before" = 1, "after" = 1,"source" = 0.3)
)
gg + ggplot2::theme_linedraw() + theme(aspect.ratio = 0.5) + ggplot2::coord_flip()
So many good suggestions... here is what Ben's suggestion would look like for my example:
I changed parts of my code to:
gg <- ggplot(data = df, aes(x = input, color = type, fill = type, alpha = type)) +
geom_dotplot(binwidth = 15)
This also works as intended for me. ggbeeswarm, as suggested by Jon, also worked great for my purpose.
I thought of a hack I really like, using ggrepel. It's normally used for labels, but nothing prevents you from making the label into a point.
df <- data.frame(x = rnorm(200),
col = sample(LETTERS[1:3], 200, replace = TRUE),
y = 1)
ggplot(df, aes(x, y, label = "●", color = col)) + # using unicode black circle
ggrepel::geom_text_repel(segment.color = NA,
box.padding = 0.01, key_glyph = "point")
A downside of this method is that ggrepel can take a lot of time for a large number of points, and will recalculate differently each time you change the plot size. A faster alternative would be to use ggbeeswarm::geom_quasirandom, which uses a deterministic process to define jitter that looks random.
ggplot(df, aes(x,y, color = col)) +
ggbeeswarm::geom_quasirandom(groupOnX = FALSE)
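To illustrate what "strategically placed offsets" can mean, here is a hand-rolled sketch in base R. It is not how ggrepel or ggbeeswarm work internally; `stack_offsets()` is a hypothetical helper that bins the values and stacks the points within each bin symmetrically around zero, similar in spirit to a dot plot.

```r
# Hypothetical helper: deterministic offsets for points sharing a bin
stack_offsets <- function(x, binwidth) {
  bin <- floor(x / binwidth)
  # position of each point within its bin: 1, 2, 3, ...
  pos <- ave(x, bin, FUN = seq_along)
  # number of points in each bin, repeated for every member
  n <- ave(x, bin, FUN = length)
  # centre each stack on zero
  pos - (n + 1) / 2
}

x <- c(1.0, 1.1, 1.2, 5.0, 9.3, 9.4)
off <- stack_offsets(x, binwidth = 1)

# Plot the values with the offsets on the second axis instead of random jitter
plot(x, off, pch = 19, ylim = c(-2, 2), ylab = "offset")
```

The binwidth controls how close two values must be before they are pushed apart; it would need tuning to the point size of the actual plot.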

Adding dummy values on axis in ggplot2 to add asymmetric distance between ticks

How to add dummy values on x-axis in ggplot2
I have 0,2,4,6,12,14,18,22,26 in my data, and I have plotted these on the x-axis. Is there a way to add the remaining even numbers for which there is no data in the table? This would create the appropriate gaps on the x-axis.
After this, the x-axis should show 0,2,4,6,8,10,12,14,16,18,20,22,24,26.
I have already tried using rbind.fill to add dummy data, but when I convert them to a factor, the added values (8, 10, etc.) come last.
Thanks
Hope this makes sense:
library(ggplot2)
gvals <- factor(letters[1:3])
xvals <- factor(c(0,2,4,6,12,14,18,22,26), levels = seq(0, 26, by = 2))
yvals <- rnorm(10000, mean = 2)
df <- data.frame(x = sample(xvals, size = length(yvals), replace = TRUE),
y = yvals,
group = sample(gvals, size = length(yvals), replace = TRUE))
ggplot(df, aes(x = x, y = y)) + geom_boxplot(aes(fill = group)) +
scale_x_discrete(drop = FALSE)
The tricks are to make the x-variable with all levels you need and to specify drop = FALSE in scale.
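A quick way to check the first half of the trick without plotting is to inspect the factor itself. This is a small sketch using only the values from the question; `x_raw` and `x_fac` are names chosen for illustration.

```r
x_raw <- c(0, 2, 4, 6, 12, 14, 18, 22, 26)

# Declare every even number from 0 to 26 as a level, in numeric order,
# even though some of them never occur in the data
x_fac <- factor(x_raw, levels = seq(0, 26, by = 2))

levels(x_fac)  # all 14 levels, including the unused ones such as "8" and "10"
table(x_fac)   # unused levels are kept with a count of 0
```

Because the unused levels are part of the factor, scale_x_discrete(drop = FALSE) has something to show for them; without drop = FALSE ggplot2 would leave them off the axis.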

how to get top 100 count number for each cell in ggplot2 with geom_bin2d

Before asking, I have read this post, but mine is more specific.
library(ggplot2)
library(scales)
set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
I replace my real data with dat. With this random seed, the domain of x and y is [-4,4], and I partition the area into 256 (16*16) cells, each with a side length of 0.5. For each cell, I want to get the count.
Yeah, it's quite easy, geom_bin2d can solve it.
# plot
p <- ggplot(dat, aes(x = x, y = y)) + geom_bin2d()
# Get data - this includes counts and x,y coordinates
newdat <- ggplot_build(p)$data[[1]]
# add in text labels
p + geom_text(data=newdat, aes((xmin + xmax)/2, (ymin + ymax)/2,
label=count), col="white")
So far so good, but I only want to get the top 100 counts and plot them, as in the picture below.
According to ?geom_bin2d, drop = TRUE only removes cells with 0 counts, whereas I am interested in the top 100 counts. What should I do? This is question 1.
Please take another look at the legend of the second picture: the counts are small and close together. What if they were 10,000, 20,000, 30,000?
One approach is to use trans in scale_fill_gradient. The built-in transformations include exp, log, sqrt, and so on, but I want to divide by 1,000. I found trans_new() in the scales package and tried it, but without success.
sci_trans <- function(){ trans_new('sci', function(x) x/1000, function(x) x*1000)}
p + scale_fill_gradient(trans='sci')
And this is question 2. I have googled a lot but cannot find a way to solve it. Thanks a lot to anyone who can help!
Apparently you can't get the output bins or counts from stat_bin2d or stat_summary_2d; see the related question "How to use stat_bin2d() to compute counts labels in ggplot2?", where @MrFlick's comment quotes Hadley from 2010: "he basically says you can't use stat_bin2d, you'll have to do the summarization yourself".
So, the workaround: create the coordinate bins manually yourself, get the 2D counts, then take top-n. For example, using dplyr:
library(dplyr)
# some_fn() is a placeholder for whatever binning function you choose
dat %>% mutate(x_binned=some_fn(x), y_binned=some_fn(y)) %>%
group_by(x_binned,y_binned) %>%
summarize(count = n(), .groups = "drop") %>% # NOTE: no need to sort() or order()
top_n(100, count)
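As a concrete sketch of this workaround, the binning can be done with cut(). The breaks below are an assumption taken from the question (0.5-wide cells on [-4, 4]); they are not necessarily the bins that geom_bin2d() would choose by default, and `top_cells` is just an illustrative name.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Assumed binning: fixed 0.5-wide cells on [-4, 4], as described in the question
breaks <- seq(-4, 4, by = 0.5)

top_cells <- dat %>%
  mutate(x_binned = cut(x, breaks, include.lowest = TRUE),
         y_binned = cut(y, breaks, include.lowest = TRUE)) %>%
  # count() groups, tallies and ungroups in one step
  count(x_binned, y_binned, name = "count") %>%
  # keep at most 100 cells with the largest counts
  slice_max(count, n = 100, with_ties = FALSE)

head(top_cells)
```

Here slice_max() is used in place of top_n(); with_ties = FALSE caps the result at exactly 100 rows when several cells share the cut-off count, which may or may not be what you want.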
You might have to poke into stat_bin2d in order to copy (or call) their exact coordinate-binning code. UPDATE: here's the source for stat-bin2d.r
StatBin2d <- ggproto("StatBin2d", Stat,
default_aes = aes(fill = ..count..),
required_aes = c("x", "y"),
compute_group = function(data, scales, binwidth = NULL, bins = 30,
breaks = NULL, origin = NULL, drop = TRUE) {
origin <- dual_param(origin, list(NULL, NULL))
binwidth <- dual_param(binwidth, list(NULL, NULL))
breaks <- dual_param(breaks, list(NULL, NULL))
bins <- dual_param(bins, list(x = 30, y = 30))
xbreaks <- bin2d_breaks(scales$x, breaks$x, origin$x, binwidth$x, bins$x)
ybreaks <- bin2d_breaks(scales$y, breaks$y, origin$y, binwidth$y, bins$y)
xbin <- cut(data$x, xbreaks, include.lowest = TRUE, labels = FALSE)
ybin <- cut(data$y, ybreaks, include.lowest = TRUE, labels = FALSE)
...
}
bin2d_breaks <- function(scale, breaks = NULL, origin = NULL, binwidth = NULL,
bins = 30, right = TRUE) {
...
(But this seems a worthy enhancement request for ggplot2, if it hasn't already been filed.)

How to add inbetween space in nested boxplots ggplot2

I would like to add a marginal space between groups of box plots while using the stat_summary method.
Here is a small example of my problem
library(ggplot2)
library(reshape2)
data1 <- (lapply(letters[1:5], function(l1) return(matrix(rt(5*3, 1), nrow = 5, ncol = 3, dimnames = list(cat2=letters[6:10], cat3=letters[11:13])))))
names(data1) <- letters[1:5]
data2 <- melt(data1)
customstats <- function(x) {
xs <- sort(x)
return(c(ymin=min(x), lower= mean(xs[xs < mean(x)]), middle = mean(x) , upper = mean(xs[xs > mean(x)]), ymax=max(x)))
}
ggplot(data2, aes(x=cat2, y=value, fill=cat3), width=2) +
stat_summary(fun.data = customstats, geom = "boxplot",
alpha = 0.5, position = position_dodge(1), mapping = aes(fill=cat3))
The result is the following picture.
I would like to achieve a visual separation for each "cat2" and add a "space" between the groups of boxplots (I'm restricted to using stat_summary since I have a custom statistic). How can I do it?
I have fixed a similar problem in an ugly (but effective for me) way by creating a dataframe with the same plotting variables as my original data, but with x (or y) positioned or factored so that it fits between the two points I want to separate, and with missing values for y (or x). For your problem, I added the following code and got an image with spatial separation of clusters.
library(plyr)
empties <- data.frame(cat2_orig=unique(data2$cat2)[-length(unique(data2$cat2))])
#no extra space needed between last cluster and edge of plot
empties$cat2 <- paste0(empties$cat2_orig,empties$cat2_orig)
empties$value <- NA
data2_space <- rbind.fill(data2,empties)
ggplot(data2_space, aes(x=cat2, y=value, fill=cat3), width=2) +
stat_summary(fun.data = customstats, geom = "boxplot",
alpha = 0.5, position = position_dodge(1), mapping = aes(fill=cat3)) +
#remove tickmarks for non-interesting points on x-axis
scale_x_discrete(breaks=unique(data2$cat2))
Before & after
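The reason the doubled names land in the right place is that a discrete x-axis follows the sorted order of the category names, and "ff" sorts between "f" and "g". The short sketch below checks that ordering in isolation; `cats`, `spacers` and `lv` are illustrative names, and the category labels are the ones from the example data.

```r
cats <- letters[6:10]  # "f" "g" "h" "i" "j", the cat2 values in the example

# One spacer per gap: no spacer is needed after the last category
spacers <- paste0(head(cats, -1), head(cats, -1))

# factor() sorts its levels by default, which interleaves the spacers
lv <- levels(factor(c(cats, spacers)))
lv
```

With category names that do not sort this conveniently, one option would be to set the factor levels explicitly in the desired interleaved order instead of relying on alphabetical sorting.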
