Below is a minimal working example.
library(ggplot2)
set.seed(926)
df <- data.frame(x. = rnorm(100),
y. = rnorm(100),
color. = rnorm(100))
p <- ggplot(df, aes(x = x., y = y., color = color.)) +
geom_point() +
viridis::scale_color_viridis(option = "C")
p
p_build <- ggplot_build(p)
# The desired vector is below; I feel there must be an easier way to get it
p_build[["data"]][[1]][["colour"]]
df$color_converted <- p_build[["data"]][[1]][["colour"]]
Specifically, I would like to use the viridis::viridis(option = "C") color scheme. Could anyone help with this? Thanks.
*Modify*
Sorry, my question wasn't clear enough. Let me put it this way: in my specific project I can't use the ggplot2 package and have to use the plain plot() function that comes with base R.
My goal is to reproduce the above plot with base R graphics.
plot(df$x., df$y., col = df$color_converted)
If possible, could anyone also direct me on how to customize a gradient legend that is similar to ggplot2, with base legend()?
First of all, you can store the colors in a vector called "color2" and pass them to scale_colour_gradientn. The problem is that the colors come out in data order rather than gradient order, so you have to sort them first; the code below does this by converting them to Lab space and ordering by lightness (L), which only needs base R. In the output below you can see that you can recreate the plot without using scale_color_viridis:
set.seed(926)
df <- data.frame(x. = rnorm(100),
y. = rnorm(100),
color. = rnorm(100))
library(ggplot2)
p <- ggplot(df, aes(x = x., y = y., color = color.)) +
geom_point() +
viridis::scale_color_viridis(option = "C")
p
p_build <- ggplot_build(p)
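# Colors that ggplot2 assigned to each point
color2 <- p_build[["data"]][[1]][["colour"]]
# col2rgb() returns values in 0-255, but convertColor() expects 0-1
rgb_mat <- col2rgb(color2) / 255
lab <- convertColor(t(rgb_mat), from = 'sRGB', to = 'Lab')
# Sort from dark to light, which is the direction of this palette
ordered_cols2 <- color2[order(lab[, 'L'])]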
ggplot(df, aes(x = x., y = y.)) +
geom_point(aes(colour = color.)) +
scale_colour_gradientn(colours = ordered_cols2, guide = "colourbar")
#viridis::scale_color_viridis(option = "C")
Created on 2022-08-17 with reprex v2.0.2
Base R
You can use the following code:
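color2 <- p_build[["data"]][[1]][["colour"]]
# col2rgb() returns values in 0-255, but convertColor() expects 0-1
rgb_mat <- col2rgb(color2) / 255
lab <- convertColor(t(rgb_mat), from = 'sRGB', to = 'Lab')
ordered_cols2 <- color2[order(lab[, 'L'])]
# Two panels side by side: scatter plot (wide) and legend (narrow)
layout(matrix(1:2, ncol = 2), widths = c(2, 1), heights = c(1, 1))
plot(df$x., df$y., col = color2)
# rasterImage() draws the first row at the top, so reverse to put low values at the bottom
legend_image <- as.raster(matrix(rev(ordered_cols2), ncol = 1))
plot(c(0, 2), c(0, 1), type = 'n', axes = FALSE, xlab = '', ylab = '', main = 'legend title')
# Label the bar with the actual data range instead of hard-coded limits
text(x = 1.5, y = seq(0, 1, length.out = 5), labels = round(seq(min(df$color.), max(df$color.), length.out = 5), 1))
rasterImage(legend_image, 0, 0, 1, 1)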
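Since the project cannot depend on ggplot2 at all, the colors can also be computed directly from the values, which avoids ggplot_build() entirely. The following is only a sketch under some assumptions: it uses hcl.colors(256, "Plasma") from base R (R >= 3.6) as a stand-in for viridis option "C" (viridisLite::plasma(256) can be swapped in if that package is available), and it bins the values linearly into 256 colors, so the result should be close to, but not necessarily identical to, what ggplot2 interpolates. The names pal, idx, bar_vals and color_direct are made up for this illustration.

```r
set.seed(926)
df <- data.frame(x. = rnorm(100), y. = rnorm(100), color. = rnorm(100))

# Assumption: base R's "Plasma" palette approximates viridis option "C".
# Replace with viridisLite::plasma(256) if that package is installed.
pal <- hcl.colors(256, "Plasma")

# Rescale the values to [0, 1], then to palette indices 1..256
rng <- range(df$color.)
idx <- 1 + round((df$color. - rng[1]) / diff(rng) * (length(pal) - 1))
df$color_direct <- pal[idx]

# Scatter plot (wide panel) plus a color bar (narrow panel)
op <- par(no.readonly = TRUE)
layout(matrix(1:2, ncol = 2), widths = c(4, 1))
plot(df$x., df$y., col = df$color_direct, pch = 19, xlab = "x.", ylab = "y.")

# Color bar: image() draws increasing y from bottom to top,
# so low values end up at the bottom of the bar
par(mar = c(5, 1, 4, 4))
bar_vals <- seq(rng[1], rng[2], length.out = length(pal))
image(x = c(0, 1), y = bar_vals, z = matrix(bar_vals, nrow = 1),
      col = pal, axes = FALSE, xlab = "", ylab = "", main = "color.")
axis(4, at = pretty(rng), las = 1)
box()

# Restore the previous graphics settings
par(op)
layout(1)
```

Because the bar is drawn on the data scale, the axis ticks are linear in color., which the sorted-color raster above does not guarantee.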
I have a data frame representing a benchmark and I would like to produce all possible comparison plots. Here is a small example data frame that represents my problem.
df = data.frame("A"=c(1,2,3,1,2,3,1,2,3,1,2,3), "B"=c(1,1,1,2,2,2,1,1,1,2,2,2), "C"=c(1,1,1,1,1,1,2,2,2,2,2,2), "D"=c(4,5,6,7,8,9,10,11,12,13,14,15))
I want to produce the following plots.
D as a function of A, with B and C fixed. This would produce four (4) different lines, one for each pair (B, C).
D as a function of B, with A and C fixed. This would produce six (6) different lines.
D as a function of C, with A and B fixed. Again, six (6) different lines.
Is there a simple way to do this in R?
For now, I don't mind whether they end up in separate plots or not. Any representation would be OK at this point. I only need all the plots to be produced, since I don't yet know how we want to display our results.
Edit
I forgot to specify in my example that the columns of the data frame do not have the same factor levels. Here is a more complete example.
df = data.frame("A"=c(1,2,3,1,2,3,1,2,3,1,2,3),
"B"=c("[0,1]","[0,1]","[0,1]","[1,3]","[1,3]","[1,3]","[0,1]","[0,1]","[0,1]","[1,3]","[1,3]","[1,3]"),
"C"=c(1,1,1,1,1,1,2,2,2,2,2,2),
"D"=c(4,5,6,7,8,9,10,11,12,13,14,15))
Using @mattek's solution, I have the following plots.
This is great. If I could remove the extra values from the x-axis and keep only the corresponding factors for each column, that would be perfect.
library(ggplot2)
library(reshape2)
First, we melt your table:
df.plot = melt(df,
measure.vars = c('A', 'B', 'C'),
id.vars = 'D',
variable.name = 'var.name',
value.name = 'val.abc')
Then, we add a grouping column:
df.plot$grouping = rep(1:4, 3, each = 3)
And we are ready to plot:
ggplot(df.plot, aes(x = val.abc, y = D, group = as.factor(grouping))) +
facet_wrap(~ var.name) +
geom_line(aes(colour = var.name)) +
geom_point(aes(colour = var.name))
Using facet_wrap(~ var.name, scales = "free_x") instead would get rid of the non-existent factor levels in each facet.
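The grouping column above relies on the row order of the example data. A more general variant, shown here only as a sketch, builds the groups from the columns that are held fixed, so that each line corresponds to one combination of the other two variables. The helper name make_long and the column names x_var, x_val and fixed are made up for this illustration, and the plotting part assumes ggplot2 is installed.

```r
df <- data.frame(A = c(1,2,3,1,2,3,1,2,3,1,2,3),
                 B = c(1,1,1,2,2,2,1,1,1,2,2,2),
                 C = c(1,1,1,1,1,1,2,2,2,2,2,2),
                 D = c(4,5,6,7,8,9,10,11,12,13,14,15))

# Hypothetical helper: one block of rows per x variable, with a group label
# built from the values of the remaining (fixed) variables
make_long <- function(data, x_vars, y_var) {
  blocks <- lapply(x_vars, function(v) {
    others <- setdiff(x_vars, v)
    data.frame(
      x_var = v,
      x_val = as.character(data[[v]]),
      y     = data[[y_var]],
      fixed = paste(v, interaction(data[others], sep = "/", drop = TRUE), sep = ": ")
    )
  })
  do.call(rbind, blocks)
}

long <- make_long(df, x_vars = c("A", "B", "C"), y_var = "D")

# Number of lines per facet: 4 for A, 6 for B, 6 for C
lines_per_facet <- table(unique(long[c("x_var", "fixed")])$x_var)
print(lines_per_facet)

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = x_val, y = y, group = fixed)) +
    geom_line() +
    geom_point() +
    facet_wrap(~ x_var, scales = "free_x")
  print(p)
}
```

Storing x_val as character makes the x axis discrete, so this also works when the columns do not share the same levels, as in the edited question.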
Possible answer for exploratory analysis that will show correlation between variables and also a smoothing line:
df = data.frame("A"=c(1,2,3,1,2,3,1,2,3,1,2,3), "B"=c(1,1,1,2,2,2,1,1,1,2,2,2), "C"=c(1,1,1,1,1,1,2,2,2,2,2,2), "D"=c(4,5,6,7,8,9,10,11,12,13,14,15))
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr = usr))
par(usr = c(0, 1, 0, 1))
r <- cor(x, y)
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste0(prefix, txt)
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
# scale the text size by |r| so that negative correlations do not give a negative cex
text(0.5, 0.5, txt, cex = cex.cor * abs(r))
}
pairs(df, lower.panel = panel.smooth, upper.panel = panel.cor)
Another option uses ggplot2 through the GGally package:
library(ggplot2)
library(GGally)
This helps a lot if some of your data is a factor. Using your data, let's assume that A is a factor variable:
df = data.frame("A"=as.factor(c(1,2,3,1,2,3,1,2,3,1,2,3)), "B"=c(1,1,1,2,2,2,1,1,1,2,2,2), "C"=c(1,1,1,1,1,1,2,2,2,2,2,2), "D"=c(4,5,6,7,8,9,10,11,12,13,14,15))
ggpairs then draws box plots instead of scatter plots for the factor; the panel types can be changed through its upper, lower and diag arguments:
ggpairs(df)
Here's what I would do: create three new variables that capture the combinations of the two variables held fixed:
library(dplyr)
library(ggplot2)
dat <- data.frame("A"=c(1,2,3,1,2,3,1,2,3,1,2,3),
"B"=c(1,1,1,2,2,2,1,1,1,2,2,2),
"C"=c(1,1,1,1,1,1,2,2,2,2,2,2),
"D"=c(4,5,6,7,8,9,10,11,12,13,14,15))
# add variables for A-B, A-C, B-C
dat <- dat %>%
mutate('A - B' = paste(A, '-', B),
'A - C' = paste(A, '-', C),
'B - C' = paste(B, '-', C))
Then we make the plots:
ggplot(dat, aes(y = D))+
geom_line(aes(x = C, colour = `A - B`))
ggplot(dat, aes(y = D))+
geom_line(aes(x = B, colour = `A - C`))
ggplot(dat, aes(y = D))+
geom_line(aes(x = A, colour = `B - C`))
Given a numeric dataset {(x_i, y_i, z_i)} with N points, one can create a scatterplot by drawing a point P_i=(x_i,y_i) for each i=1,...,N and color each point with an intensity depending on the value of z_i.
library(ggplot2)
N <- 1000
dfA <- data.frame(x = runif(N), y = runif(N), z = runif(N))
dfB <- data.frame(x = runif(N), y = runif(N), z = runif(N))
PlotA <- ggplot(data = dfA, aes(x = x, y = y)) + geom_point(aes(colour = z))
PlotB <- ggplot(data = dfB, aes(x = x, y = y)) + geom_point(aes(colour = z))
Assume I have created these scatterplots. What I would like to do for each dataset is to divide the plane with a grid (rectangular, hexagonal, triangular, ... doesn't matter) and color each cell of the grid with the average intensity of all the points that fall within the cell.
Additionally, suppose I have created two such plots PlotA and PlotB (as above) for two different datasets dfA and dfB. Let c_i^k be the i-th cell of plot k. I want to create a third plot such that c_i^3 = c_i^1 * c_i^2 for every i.
Thank you.
EDIT: Minimum example
Dividing the plane and calculating summaries for rectangles is pretty straightforward with the stat_summary_2d function (called stat_summary2d in older ggplot2 releases). First, I'm going to create explicit breaks rather than letting ggplot2 choose them, so that they are exactly the same for both plots:
bb<-seq(0,1,length.out=10+1)
breaks<-list(x=bb, y=bb)
p1 <- ggplot(data = dfA, aes(x = x, y = y, z = z)) +
stat_summary_2d(fun = mean, breaks = breaks) + ggtitle("A")
p2 <- ggplot(data = dfB, aes(x = x, y = y, z = z)) +
stat_summary_2d(fun = mean, breaks = breaks) + ggtitle("B")
Getting the difference is a bit messier, but we can extract the data from the plots we've already created and combine them:
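# get the binned data; select columns by name, since their positions can differ between ggplot2 versions
d1 <- ggplot_build(p1)$data[[1]][, c("xbin", "ybin", "value")]
d2 <- ggplot_build(p2)$data[[1]][, c("xbin", "ybin", "value")]
mm <- merge(d1, d2, by=c("xbin","ybin"))
# bin midpoints, used to turn the bin indices back into numeric coordinates
mids <- diff(bb)/2+bb[-length(bb)]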
#plot difference
ggplot(mm, aes(x=mids[xbin], y=mids[ybin], fill=value.x-value.y)) +
geom_tile() + scale_fill_gradient2(name="diff") + labs(x="x",y="y")
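The question asked for the cell-wise product c_i^3 = c_i^1 * c_i^2, whereas the plot above shows the difference; with the merged data, using value.x * value.y in the fill aesthetic should give the product instead. The per-cell means can also be computed without ggplot_build(). The following base R sketch bins both datasets with the same breaks via cut() and tapply() and multiplies the resulting matrices. The helper name cell_means and the seed are made up for this illustration, and cells that are empty in either dataset come out as NA.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
N <- 1000
dfA <- data.frame(x = runif(N), y = runif(N), z = runif(N))
dfB <- data.frame(x = runif(N), y = runif(N), z = runif(N))

bb <- seq(0, 1, length.out = 10 + 1)    # same breaks for both datasets
mids <- bb[-length(bb)] + diff(bb) / 2  # bin midpoints

# Hypothetical helper: matrix of mean z per (x bin, y bin)
cell_means <- function(d, breaks) {
  xbin <- cut(d$x, breaks, include.lowest = TRUE)
  ybin <- cut(d$y, breaks, include.lowest = TRUE)
  tapply(d$z, list(xbin, ybin), mean)
}

mA <- cell_means(dfA, bb)
mB <- cell_means(dfB, bb)
prod_AB <- mA * mB  # cell-wise product of the two grids

# Rows of the matrix are x bins and columns are y bins, as image() expects
image(mids, mids, prod_AB, xlab = "x", ylab = "y", main = "A * B")
```

Keeping the grids as plain matrices makes other cell-wise combinations (difference, ratio) a one-line change.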