Let's say I have two data frames:
df1 = data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
df2 = data.frame(d = rnorm(10), e = rnorm(10))
I would like to look at all pairwise scatter plots between the two data frames,
i.e. the six scatter plots: a vs d, a vs e, b vs d, b vs e, c vs d, c vs e.
How could I achieve this? I notice that pairs() does this for a single data.frame.
Use cbind() to combine the two data frames and then call plot():
df1 = data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
df2 = data.frame(d = rnorm(10), e = rnorm(10))
df <- cbind(df1, df2)
plot(df)
If you want to create only plots between the two data.frames (no self comparison), you can loop them:
par(mfrow = c(ncol(df1), ncol(df2)))
for(i in 1:ncol(df1)){
for(j in 1:ncol(df2)){
plot(df1[,i], df2[,j], main = paste(names(df1)[i], "vs", names(df2)[j]),
ylab = names(df2)[j],
xlab = names(df1)[i])
}
}
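A variation on the loop above, as a minimal sketch: expand.grid() enumerates the column pairs up front, and the previous par() settings are restored afterwards. The name pairs_idx and the seed are illustrative assumptions, not part of the original answer.

```r
set.seed(42)
df1 <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
df2 <- data.frame(d = rnorm(10), e = rnorm(10))

# Every (df1 column, df2 column) combination, one row per plot
pairs_idx <- expand.grid(x = names(df1), y = names(df2),
                         stringsAsFactors = FALSE)

# One row of panels per df2 column, one column of panels per df1 column
op <- par(mfrow = c(ncol(df2), ncol(df1)))
for (k in seq_len(nrow(pairs_idx))) {
  xn <- pairs_idx$x[k]
  yn <- pairs_idx$y[k]
  plot(df1[[xn]], df2[[yn]], xlab = xn, ylab = yn,
       main = paste(xn, "vs", yn))
}
par(op)  # restore the previous graphics settings
```

Restoring par() keeps later plots in the same session from inheriting the multi-panel layout.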
A pretty (unnecessarily complicated?) tidyverse/ggplot2 solution.
Reorganize data:
library(dplyr)
library(tidyr)
mfun <- function(x,label="df1") {
x %>%
mutate(obs=seq(n())) %>% ## add obs numbers
gather(key=var,value=value,-obs) ## reshape
}
## combine
df12 <- mfun(df1) %>% full_join(mfun(df2),by="obs")
Plot:
library(ggplot2); theme_set(theme_bw())
ggplot(df12,aes(value.x,value.y)) +
geom_point()+
facet_grid(var.x~var.y)+
theme(panel.spacing=grid::unit(0,"lines")) ## squash panels together (panel.margin is deprecated in current ggplot2)
I need to draw scatter plots of Observed vs. Predicted data for each Variable using ggplot's facet_wrap. I might be close, but I am not there yet. I used a suggestion from an answer to my previous question to gather the data and automate the plotting. Here is my code so far. I know that the aes() in my ggplot call is wrong; I wrote it that way on purpose to make my goal clear. I would also like to add geom_smooth to show the confidence interval.
library(tidyverse)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), C = runif(12, 3,12), D = runif(12, 4,8))
DF1$df <- "Observed"
DF2$df <- "Predicted"
DF = rbind(DF1,DF2)
DF_long = gather(DF, key = "Variable", value = "Value", -df)
ggplot(DF_long, aes(x = Observed, y = Predicted))+
geom_point() + facet_wrap(Variable~.)+ geom_smooth()
I should see a plot like below, comparing Observed Vs Predicted for each Variable.
Each data frame needs to be converted to long format separately and then combined with cbind, so that Observed becomes x and Predicted becomes y; then facet by variable. See this example:
library(ggplot2)
library(tidyr) # provides gather()
# reproducible data with seed
set.seed(1)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10), C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14), C = runif(12, 3,12), D = runif(12, 4,8))
DF1_long <- gather(DF1, key = "group", "Observed")
DF2_long <- gather(DF2, key = "group", "Predicted")
plotDat <- cbind(DF1_long, DF2_long[, -1, drop = FALSE])
head(plotDat)
# group Observed Predicted
# 1 A 3.389578 10.590824
# 2 A 4.349115 10.234584
# 3 A 6.155680 8.298577
# 4 A 9.173870 11.750885
# 5 A 2.815137 7.942874
# 6 A 9.085507 6.203175
ggplot(plotDat, aes(x = Observed, y = Predicted))+
geom_point() +
facet_wrap(group~.) +
geom_smooth()
We can use ggpubr to add P and R values to the plot; see the answers in this post:
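If you only need the numbers rather than plot annotations, the per-variable R and P values can also be computed directly. This is a minimal base R sketch using cor.test() (Pearson by default) on the same simulated data; the object name stats is an assumption for illustration, and this is not necessarily how ggpubr computes its labels.

```r
set.seed(1)
DF1 <- data.frame(A = runif(12, 1, 10), B = runif(12, 5, 10),
                  C = runif(12, 3, 9),  D = runif(12, 1, 12))
DF2 <- data.frame(A = runif(12, 4, 13), B = runif(12, 6, 14),
                  C = runif(12, 3, 12), D = runif(12, 4, 8))

# Correlation between Observed (DF1) and Predicted (DF2) for each variable,
# together with the p-value of the test
stats <- t(sapply(names(DF1), function(v) {
  ct <- cor.test(DF1[[v]], DF2[[v]])
  c(R = unname(ct$estimate), P = ct$p.value)
}))
stats  # one row per variable, columns R and P
```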
Similarly, consider merge on reshaped data frames using base R's reshape (avoiding any tidyr dependencies in case you are a package author). Below lapply + Reduce dynamically merges to bypass helper objects, DF1_long and DF2_long, in global environment:
Data
set.seed(10312019)
DF1 = data.frame(A = runif(12, 1,10), B = runif(12,5,10),
C = runif(12, 3,9), D = runif(12, 1,12))
DF2 = data.frame(A = runif(12, 4,13), B = runif(12,6,14),
C = runif(12, 3,12), D = runif(12, 4,8))
Plot
library(ggplot2) # ONLY IMPORTED PACKAGE
DF1$df <- "Observed"
DF2$df <- "Predicted"
DF = rbind(DF1, DF2)
DF_long <- Reduce(function(x,y) merge(x, y, by=c("Variable", "id")),
lapply(list(DF1, DF2), function(df)
reshape(df, varying=names(DF)[1:(length(names(DF))-1)],
times=names(DF)[1:(length(names(DF))-1)],
v.names=df$df[1], timevar="Variable", drop="df",
new.row.names=1:1E5, direction="long")
)
)
head(DF_long)
# Variable id Observed Predicted
# 1 A 1 6.437720 11.338586
# 2 A 10 4.690934 9.861456
# 3 A 11 6.116200 9.020343
# 4 A 12 6.499371 5.904779
# 5 A 2 6.779087 5.901970
# 6 A 3 6.499652 8.557102
ggplot(DF_long, aes(x = Observed, y = Predicted)) +
geom_point() + geom_smooth() + facet_wrap(Variable~.)
I have a data frame with a nested vector in one column. Any ideas how to ggplot a geom_density using the values from the nested vector?
If I use pivot_longer on the entire data frame, I get 25 million rows, so I'd prefer to avoid that if possible.
library(ggplot2)
df = data.frame(a = rep(letters[1:5],length.out = 100), b = sample(LETTERS, 100, replace = T))
df[["c"]] = purrr::map(1:100, function(x) rnorm(100))
# works but too heavy for the actual implementation
ggplot(tidyr::unnest(df, c), aes(c, group = a)) + geom_density() + facet_wrap(vars(b))
# doesn't work
ggplot(df, aes(c, group = a)) + geom_density() + facet_wrap(vars(b))
Different solution: Prepare each plot separately and rearrange your plots afterwards using gridExtra package.
library(ggplot2)
df = data.frame(a = rep(letters[1:5],length.out = 100), b = sample(LETTERS, 100, replace = T))
df[["c"]] = purrr::map(1:100, function(x) rnorm(100))
lst_plot <- lapply(sort(unique(df$b)), function(x){
data <- df[df$b == x, ]
data <- purrr::map_dfr(seq(length(data$a)), ~ data.frame(a = data$a[.x], c = data$c[.x][[1]]))
gg <- ggplot(data) +
geom_density(aes(c, group = a)) +
ylab(NULL)
return(gg)
})
gridExtra::grid.arrange(grobs = lst_plot, ncol = 6, left = "density")
To be honest, I'm not sure how well this works with your massive dataset...
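Another possibility, sketched here under assumptions: estimate each density yourself with stats::density() and hand ggplot only the resulting curves, so the plotted data frame has (number of groups) x (grid points) rows instead of one row per raw value. The grid size n_grid = 64 and the names grp and dens are illustrative choices; the curves should be close to, but not necessarily identical to, what geom_density draws, and the savings only matter when each group holds many values.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(a = rep(letters[1:5], length.out = 100),
                 b = sample(LETTERS, 100, replace = TRUE))
df[["c"]] <- lapply(1:100, function(i) rnorm(100))

# One density estimate per (a, b) combination that actually occurs
grp <- interaction(df$a, df$b, drop = TRUE)
n_grid <- 64  # points kept per curve (assumed resolution)

dens <- do.call(rbind, lapply(split(seq_len(nrow(df)), grp), function(idx) {
  d <- density(unlist(df$c[idx]), n = n_grid)  # pool the nested vectors
  data.frame(a = df$a[idx[1]], b = df$b[idx[1]], x = d$x, y = d$y)
}))

p <- ggplot(dens, aes(x, y, group = a)) +
  geom_line() +
  facet_wrap(~ b) +
  labs(x = "c", y = "density")
print(p)
```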
I have a list of data.frames:
samplelist = list(a = data.frame(x = c(1:10), y = rnorm(10)),
b = data.frame(x = c(5:10), y = rnorm(6)),
c = data.frame(x = c(2:12), y = rnorm(11)))
I'd like to structure a ggplot of the following format:
ggplot()+
geom_line(data=samplelist[[1]], aes(x,y))+
geom_line(data=samplelist[[2]], aes(x,y))+
geom_line(data=samplelist[[3]], aes(x,y))
But that isn't super automated. Does anyone have a suggestion for how to address this?
Thanks!
ggplot works most efficiently with data in "long" format. In this case, that means stacking your three data frames into a single data frame with an extra column added to identify the source data frame. In that format, you need only one call to geom_line, while the new column identifying the source data frame can be used as a colour aesthetic, resulting in a different line for each source data frame. The dplyr function bind_rows allows you to stack the data frames on the fly, within the call to ggplot.
library(dplyr)
library(ggplot2)
samplelist = list(a = data.frame(x=c(1:10), y=rnorm(10)),
b = data.frame(x=c(5:10), y=rnorm(6)),
c = data.frame(x=c(2:12), y=rnorm(11)))
ggplot(bind_rows(samplelist, .id="df"), aes(x, y, colour=df)) +
geom_line()
I assumed above that you would want each line to be a different color and for there to be a legend showing the color mapping. However, if, for some reason, you just want three black lines and no legend, just change colour=df to group=df.
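For readers without dplyr, the same stacking step can be sketched in base R: Map() tags each data frame with its list name and do.call(rbind, ...) combines them. The object name stacked and the seed are assumptions for illustration; the lines are drawn with base graphics here to keep the sketch dependency-free.

```r
set.seed(1)
samplelist <- list(a = data.frame(x = 1:10, y = rnorm(10)),
                   b = data.frame(x = 5:10, y = rnorm(6)),
                   c = data.frame(x = 2:12, y = rnorm(11)))

# Add a 'df' column holding the list name, then stack everything
stacked <- do.call(rbind, Map(function(d, nm) cbind(df = nm, d),
                              samplelist, names(samplelist)))

# Empty plot spanning all data, then one line per source data frame
plot(range(stacked$x), range(stacked$y), type = "n", xlab = "x", ylab = "y")
for (k in seq_along(samplelist)) {
  lines(samplelist[[k]]$x, samplelist[[k]]$y, col = k)
}
legend("topright", legend = names(samplelist),
       col = seq_along(samplelist), lty = 1)
```

The stacked data frame has the same df/x/y layout as the bind_rows() result, so it could equally be passed to ggplot.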
Or you could use lapply.
library(ggplot2)
samplelist = list(a = data.frame(x = c(1:10), y=rnorm(10)),
b= data.frame(x=c(5:10), y = rnorm(6)),
c = data.frame(x=c(2:12), y=rnorm(11)))
p <- ggplot()
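add_line <- function(df){ # named add_line so that base plot() is not masked
p <<- p + geom_line(data=df, aes(x,y))
}
invisible(lapply(samplelist, add_line)) # called for its side effect on p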
p
This will work:
library(ggplot2)
samplelist <- list(a = data.frame(x = c(1:10), y=rnorm(10)),
b = data.frame(x=c(5:10), y = rnorm(6)),
c = data.frame(x=c(2:12), y=rnorm(11)))
p <- ggplot()
for (i in seq_along(samplelist)) p <- p + geom_line(data=samplelist[[i]], aes(x,y))
p
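Reduce is another option for adding layers iteratively. With accumulate = TRUE it returns every intermediate plot (starting with the empty ggplot()), which grid.arrange then lays out side by side: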
library(ggplot2)
samplelist = list(a = data.frame(x = c(1:10), y=rnorm(10)),
b= data.frame(x=c(5:10), y = rnorm(6)),
c = data.frame(x=c(2:12), y=rnorm(11)))
pl <- Reduce(f = function(p, d) p + geom_line(data=d, aes(x,y)),
x = samplelist, init = ggplot(), accumulate = TRUE)
gridExtra::grid.arrange(grobs = pl)
I have three different data frames with the same format, and I cannot combine them because each one represents a different data source. I would like to show the percentage of one variable across the data frames in a single bar chart.
I can get a bar chart for column 1 of one data frame by using:
ggplot(baseline, aes(x = c1)) +
geom_bar(aes(y = (..count..)/sum(..count..)),fill="blue",colour="blue") +
geom_text(aes(y = ((..count..)/sum(..count..)), label=scales::percent((..count..)/sum(..count..))), stat = "count")
I want output similar to this plot (except that I am showing the percentage of each category), where race would be the names of the different data frames and factor would be the values of column 1.
I do not use ggplot2 but here is an illustration of how to accomplish what you want. It will be easiest to add a column to your data.frames indicating the source of each data.frame. Then calculate whatever metric you want, by source, then plot. Alternatively, you could calculate the metrics first, then combine the data.frames.
library(RColorBrewer)
library(data.table)
set.seed(1234)
make_data <- function() {
n <- sample(5:10, 1)
data.frame(id = rep(c("A", "B", "C"), each = n),
vals = c(rnorm(n, 5, 1), rnorm(n, 10, 1), rnorm(n, 15, 1)))
}
df1 <- make_data()
df2 <- make_data()
df3 <- make_data()
df4 <- make_data()
df1$src <- "source1"
df2$src <- "source2"
df3$src <- "source3"
df4$src <- "source4"
dat <- do.call(rbind, list(df1, df2, df3, df4))
dat <- as.data.table(dat)
res <- dat[ , mean(vals), by = list(id, src)][order(id)]
barplot(height = res$V1, col = rep(brewer.pal(4, "Set1"), 3))
EDIT
Here is the ggplot2 code provided by Sumedh:
library(ggplot2)
ggplot(res, aes(x = id, y = V1, fill = src)) +
geom_bar(stat = "identity", position = "dodge")
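The example above plots means; the question asked for the percentage of each category within each data frame. A minimal base R sketch of that variant follows. The data frames baseline, midline and endline, the column c1 with categories "x", "y", "z", and the seed are all hypothetical stand-ins for the asker's data.

```r
set.seed(1234)
lv <- c("x", "y", "z")  # hypothetical category levels of column c1
baseline <- data.frame(c1 = sample(lv, 50, replace = TRUE))
midline  <- data.frame(c1 = sample(lv, 80, replace = TRUE))
endline  <- data.frame(c1 = sample(lv, 65, replace = TRUE))

sources <- list(baseline = baseline, midline = midline, endline = endline)

# Count each category per source; fixing the levels keeps rows aligned
counts <- sapply(sources, function(d) table(factor(d$c1, levels = lv)))

# Convert counts to percentages within each source (each column sums to 100)
pct <- 100 * prop.table(counts, margin = 2)

# One group of bars per data frame, one bar per category
barplot(pct, beside = TRUE, legend.text = rownames(pct), ylab = "Percent")
```

Because the percentages are computed per source before plotting, the data frames never need to be merged row by row.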
I'm new to plotting in R so I ask for your help. Say I have the following matrix.
mat1 <- matrix(seq(1:6), 3)
dimnames(mat1)[[2]] <- c("x", "y")
dimnames(mat1)[[1]] <- c("a", "b", "c")
mat1
x y
a 1 4
b 2 5
c 3 6
I want to plot this, where the x-axis contains each rowname (a, b, c) and the y-axis is the value of each rowname (a = 1 and 4, b = 2 and 5, c = 3 and 6). Any help would be appreciated!
| o
| o x
| o x
| x
|_______
a b c
Here's one way using base graphics:
plot(c(1,3),range(mat1),type = "n",xaxt ="n")
points(1:3,mat1[,2])
points(1:3,mat1[,1],pch = "x")
axis(1,at = 1:3,labels = rownames(mat1))
Edited to include different plotting symbol
matplot() was designed for data in just this format:
matplot(y = mat1, pch = c(4,1), col = "black", xaxt ="n",
xlab = "x-axis", ylab = "y-axis")
axis(1, at = 1:nrow(mat1), labels = rownames(mat1)) ## Thanks, Joran
And finally, a lattice solution
library(lattice)
dfmat <- as.data.frame(mat1)
xyplot( x + y ~ factor(rownames(dfmat)), data=dfmat, pch=c(4,1), cex=2)
You could do it in base graphics, but if you're going to use R for much more than this I think it is worth getting to know the ggplot2 package. Note that ggplot2 only takes data frames - but then, it is often more useful to keep your data in data frames rather than matrices.
library(reshape2) # provides melt()
library(ggplot2)
d <- as.data.frame(mat1) # convert to a data frame
d$cat <- rownames(d) # add the 'cat' column
dm <- melt(d, id.vars = "cat")
dm # look at dm to get an idea of what melt is doing
# define the data the plot will use and the 'aesthetics' (i.e., how the data are mapped to visible space),
# then represent the data with points; the + must end the line so R knows the expression continues
ggplot(dm, aes(x=cat, y=value, shape=variable)) +
geom_point()
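If you would rather not depend on a reshaping package, a matrix can also be put into long format with base R by going through as.table(). This is a sketch under that assumption; the column names cat, variable and value are chosen here to match the aesthetics used in the ggplot call.

```r
library(ggplot2)

mat1 <- matrix(1:6, 3, dimnames = list(c("a", "b", "c"), c("x", "y")))

# as.table() + as.data.frame() yields one row per matrix cell
dm <- as.data.frame(as.table(mat1), responseName = "value")
names(dm)[1:2] <- c("cat", "variable")  # row names, column names
dm

p <- ggplot(dm, aes(x = cat, y = value, shape = variable)) +
  geom_point()
print(p)
```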