R: how to combine ggplots based on data with different column names

Both ggplots below work separately, but I would like to combine them into one plot where the Group variable of the lines in df2 appears in the legend.
library(ggplot2)
df1 <- data.frame(x = 1:10, y = 1:10)
df2 <- data.frame(x = rep(1:10, 2),
                  y = c(seq(1, 2, length.out = 10),
                        seq(5, 6, length.out = 10)),
                  Group = c(rep("A", 10), rep("B", 10))
)
p1 <- ggplot(data = df1, aes(x = x, y = y)) +
  geom_point()
p2 <- ggplot(data = df2, aes(x = x, y = y,
                             group = Group, color = Group)) +
  geom_line()
The problem is that the two data frames have different column names, so I cannot generate two plots and add them like p1 + p2, as is done in other solutions published here before.

First, in ggplot2 each layer has its own local data argument, i.e. you can pass a different dataset to each layer. If you don't set the data argument for a layer, it simply inherits the global dataset set in ggplot() (if any). So you could combine your plots into one by adding + geom_line(data = df2, aes(group = Group, color = Group)) to your first plot.
Second, if your datasets have different column names, that's not a big deal either. As with the data argument, each layer has its own (local) set of aesthetics. If they are not set, the layer inherits the global aesthetics set in ggplot(). With different column names you simply have to tell each layer which columns to map to which aesthetics.
I slightly altered your example data so that the column names differ:
library(ggplot2)
df1 <- data.frame(x1 = 1:10, y1 = 1:10)
df2 <- data.frame(x2 = rep(1:10, 2),
                  y2 = c(seq(1, 2, length.out = 10),
                         seq(5, 6, length.out = 10)),
                  Group = c(rep("A", 10), rep("B", 10))
)
ggplot(data = df1, aes(x = x1, y = y1)) +
  geom_point() +
  geom_line(data = df2, aes(x = x2, y = y2, group = Group, color = Group))
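A variant of the same idea, as a minimal sketch: leave ggplot() empty and give every layer its own data and aesthetics, then use layer_data() to check what each layer actually received. The object name p is only illustrative.

```r
library(ggplot2)

df1 <- data.frame(x1 = 1:10, y1 = 1:10)
df2 <- data.frame(x2 = rep(1:10, 2),
                  y2 = c(seq(1, 2, length.out = 10),
                         seq(5, 6, length.out = 10)),
                  Group = rep(c("A", "B"), each = 10))

# No global data or aesthetics: each layer states its own
p <- ggplot() +
  geom_point(data = df1, aes(x = x1, y = y1)) +
  geom_line(data = df2, aes(x = x2, y = y2, colour = Group))

# Inspect the computed data of each layer
points_data <- layer_data(p, 1)  # built from df1
lines_data  <- layer_data(p, 2)  # built from df2

nrow(points_data)
nrow(lines_data)
length(unique(lines_data$colour))  # one colour per Group level
```

Because only the line layer maps colour, only Group shows up in the legend.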

Related

How to change the colour for missing values in geom_miss_point (with two different color scales)

I'm struggling to modify the colour/shape/... of the points based on whether a value is missing or not.
library(ggplot2)
library(naniar)
ggplot(data = airquality,
       aes(x = Ozone,
           y = Solar.R)) +
  geom_miss_point()
What I have
airquality_no_na <- airquality[!(is.na(airquality$Ozone) | is.na(airquality$Solar.R)), ]
airquality_na <- airquality[(is.na(airquality$Ozone) | is.na(airquality$Solar.R)), ]
ggplot() +
  geom_point(data = airquality_no_na,
             aes(x = Ozone,
                 y = Solar.R, colour = "NoMissing")) +
  geom_miss_point(data = airquality_na,
                  aes(x = Ozone,
                      y = Solar.R, colour = "Missing")) +
  scale_colour_manual(name = 'Legende',
                      values = c('NoMissing' = 'green',
                                 'Missing' = 'blue'))
What I would like to have
I don't know how to make the missing values green and the non-missing values blue without splitting the data into two data frames.
EDIT:
My issue is a bit more complex. I want to be able to choose the colours for the first data set (missing in blue, not missing in green) and for the second data set (missing in red, not missing in yellow).
# Create data frames
df1 <- as.data.frame(matrix(data = runif(n = 200, 0, 1), ncol = 2))
df2 <- as.data.frame(matrix(data = runif(n = 100, 0, 1), ncol = 2))
# Add missing values
df1[rbinom(n = 100, size = 1, prob = 0.1) == 1, 1] <- NA
df1[rbinom(n = 100, size = 1, prob = 0.1) == 1, 2] <- NA
df2[rbinom(n = 50, size = 1, prob = 0.1) == 1, 1] <- NA
df2[rbinom(n = 50, size = 1, prob = 0.1) == 1, 2] <- NA
# This doesn't work. It only prints in blue (missing) and green (not missing)
ggplot() +
  geom_miss_point(data = df1,
                  aes(x = V1,
                      y = V2)) +
  geom_miss_point(data = df2,
                  aes(x = V1,
                      y = V2)) +
  scale_colour_manual(values = c("blue", "green", "yellow", "red"))
I am not sure if this is a good idea, but for the sake of showing how to do this in theory: from what I understand from a quick look into the naniar package, the colour aesthetic is mapped to ..missing.. by default. You would need to dig quite deep into the actual geom to change that behaviour, but there is a simple workaround.
Create a second colour scale with ggnewscale.
You will not get around subsetting your data first, but this is not a bad thing. Don't be afraid to subset your data; that's a very normal thing to do.
library(tidyverse)
library(naniar)
library(ggnewscale)
ggplot() +
  geom_miss_point(data = df1, aes(V1, V2)) +
  scale_colour_manual(name = "df1", values = c("blue", "green")) +
  new_scale_color() +
  geom_miss_point(data = df2, aes(V1, V2)) +
  scale_colour_manual(name = "df2", values = c("yellow", "red"))
With some trial and error I came up with a solution using the group aesthetic:
Row-bind your datasets and add an identifier.
Map the dataset identifier to group.
Map the interaction of ..group.. and naniar's ..missing.. to colour. (I first tried using dataset directly, but that did not work.)
library(ggplot2)
library(naniar)
set.seed(42)
# Create data frames
df1 <- as.data.frame(matrix(data = runif(n = 200, 0, 1), ncol = 2))
df2 <- as.data.frame(matrix(data = runif(n = 100, 0, 1), ncol = 2))
# Add missing values
df1[rbinom(n = 100, size = 1, prob = 0.1) == 1, 1] <- NA
df1[rbinom(n = 100, size = 1, prob = 0.1) == 1, 2] <- NA
df2[rbinom(n = 50, size = 1, prob = 0.1) == 1, 1] <- NA
df2[rbinom(n = 50, size = 1, prob = 0.1) == 1, 2] <- NA
dplyr::bind_rows(df1, df2, .id = "dataset") %>%
  ggplot() +
  geom_miss_point(aes(x = V1,
                      y = V2,
                      group = dataset,
                      colour = interaction(..group.., ..missing..))) +
  scale_colour_manual(values = c("blue", "red", "green", "yellow"))
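For comparison, here is a rough sketch of the same four-colour idea using plain ggplot2 instead of geom_miss_point(). It is a swapped-in technique, not what naniar does internally: shift_na() is a hypothetical helper that places missing values a fixed distance below the observed minimum so they can be drawn, and the data and colour labels are made up for illustration.

```r
library(ggplot2)

set.seed(42)
df1 <- data.frame(V1 = runif(100), V2 = runif(100))
df2 <- data.frame(V1 = runif(50),  V2 = runif(50))
df1$V1[sample(100, 10)] <- NA
df2$V2[sample(50, 5)]   <- NA

# Stack the two data sets and remember where each row came from
df1$dataset <- "df1"
df2$dataset <- "df2"
both <- rbind(df1, df2)

# Record missingness BEFORE replacing the NAs
both$status <- ifelse(is.na(both$V1) | is.na(both$V2), "Missing", "Not missing")

# Hypothetical helper: move NAs to 10% of the range below the minimum
shift_na <- function(v) {
  v[is.na(v)] <- min(v, na.rm = TRUE) - 0.1 * diff(range(v, na.rm = TRUE))
  v
}
both$V1 <- shift_na(both$V1)
both$V2 <- shift_na(both$V2)

# One colour key per dataset/status combination
both$key <- interaction(both$dataset, both$status, sep = " / ")

p <- ggplot(both, aes(V1, V2, colour = key)) +
  geom_point() +
  scale_colour_manual(values = c("df1 / Missing"     = "blue",
                                 "df1 / Not missing" = "green",
                                 "df2 / Missing"     = "red",
                                 "df2 / Not missing" = "yellow"))
p
```

Naming the entries of values ties each colour to a specific combination, so the result does not depend on the order of the factor levels.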

ggplot : Plot two bars and one line?

I need to plot two bars and one line. I have 3 data frames like this:
require(ggplot2)
df.0 <- data.frame(x = c(1:5), y = rnorm(5))
df.1 <- data.frame(x = c(1:5), y = rnorm(5))
df.2 <- data.frame(x = c(1:5), y = runif(5))
ggplot(df.0, aes(x = x, y = y)) +
  geom_line(aes(x = x, y = y)) +
  geom_bar(data = df.1, aes(x = x, y = y), stat = "identity", position = "dodge") +
  geom_bar(data = df.2, aes(x = x, y = y), stat = "identity", position = "dodge")
I can't manage to plot the bars and the line correctly. It should look like the image below.
I'm not familiar with ggplot2. I've read a lot of links, and I can't find a post similar to my question.
Thanks for your time and interest.
Combine the data frames - at least the two for the bar plot. Dodging is done within a single geom_bar layer, not between two separate ones.
df_bar = rbind(df.1, df.2)
df_bar$id = rep(c("df.1", "df.2"), times = c(nrow(df.1), nrow(df.2)))
ggplot(df.0, aes(x = x, y = y)) +
  geom_line() +
  geom_col(data = df_bar, aes(fill = id), position = "dodge")
Other changes: there is no need to repeat aes(x = x, y = y) in every layer. If it's in the original ggplot() call, it will be inherited. Also, geom_col() is a convenient shorthand for geom_bar(stat = "identity").
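To see why dodging needs a single layer, here is a small sketch that compares the bar positions ggplot2 computes in both setups. The plot names p_sep and p_one are illustrative, and the random data is only a stand-in for the data frames in the question.

```r
library(ggplot2)

set.seed(1)
df.1 <- data.frame(x = 1:5, y = rnorm(5))
df.2 <- data.frame(x = 1:5, y = runif(5))

# Two separate layers: each layer dodges only against itself
p_sep <- ggplot() +
  geom_col(data = df.1, aes(x = x, y = y), position = "dodge") +
  geom_col(data = df.2, aes(x = x, y = y), position = "dodge")

# One layer with a grouping variable: bars are dodged against each other
df_bar <- rbind(cbind(df.1, id = "df.1"), cbind(df.2, id = "df.2"))
p_one <- ggplot(df_bar, aes(x = x, y = y, fill = id)) +
  geom_col(position = "dodge")

# Left edges of the bars in each case
sep_1 <- layer_data(p_sep, 1)$xmin
sep_2 <- layer_data(p_sep, 2)$xmin
one   <- layer_data(p_one, 1)$xmin

all.equal(sep_1, sep_2)  # same positions, so the bars overlap
length(unique(one))      # a separate position for every bar
```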

Stored ggplot graphs overwrite each other

Here is some code which reproduces my issue:
x <- as.factor(1:20)
y <- 1:20
id <- as.factor(c(rep(0,19),1))
g1 <- ggplot() + geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) + ggtitle("g1")
g1 # First print
y <- 20:1
g2 <- ggplot() + geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) + ggtitle("g2")
g2
g1 # Second print
As you can see when running the code above, the first time you print g1 you get a bar plot starting at (factor 1, y = 1) and ending at (factor 20, y = 20).
After g2 has been created, printing g1 again gives a plot that looks the same as g2, except for the title, which is unchanged.
I'm really puzzled; any help would be much appreciated!
ggplot works best when you pull data from a data.frame rather than the global environment. If you did
x <- as.factor(1:20)
y <- 1:20
id <- as.factor(c(rep(0,19),1))
g1 <- ggplot(data.frame(x, y, id)) +
geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) +
ggtitle("g1")
y <- 20:1
g2 <- ggplot(data.frame(x, y, id)) +
geom_bar(stat = "identity", aes(x = x, y = y, color = id, fill = id), width = 0.5) +
ggtitle("g2")
everything would work fine.
The "problem" is that ggplot doesn't actually "build" the plot until you print it. When you refer to variable names in aes(), it only tracks the variable name, not the value, so it uses whatever the current value is when the plot prints. When we "trap" the data inside a data.frame, we capture the current value of the variable so that it can be used later.
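The difference can be checked without printing anything, as a small sketch: layer_data() builds a plot on demand, so it shows which y values each stored plot would use at that moment. The names g_env, g_df and the d_* objects are illustrative.

```r
library(ggplot2)

x <- as.factor(1:20)
y <- 1:20

# Variables looked up in the environment when the plot is built
g_env <- ggplot() + geom_col(aes(x = x, y = y))

# Values captured in a data frame when the plot is created
g_df <- ggplot(data.frame(x, y)) + geom_col(aes(x = x, y = y))

# Change y after both plots have been stored
y <- 20:1

d_env <- layer_data(g_env)
d_df  <- layer_data(g_df)

# Height of the first bar in each stored plot
d_env$y[d_env$x == 1]  # follows the new value of y
d_df$y[d_df$x == 1]    # keeps the value captured in the data frame
```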

Facetting in ggplot2

I have this dataset: https://dl.dropboxusercontent.com/u/73950/data.csv
The dataset contains 3 variables.
Here's how I visualize the data right now:
library(ggplot2)
library(reshape2)
library(RColorBrewer)
dat = read.csv("data.csv", header = FALSE)
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
sc <- scale_colour_gradientn(colours = myPalette(100))
ggplot(dat, aes(x=V1, y=V3, colour = V2))+ geom_point(alpha = .2,size = 3) + sc
Instead of just one figure, I'd like to facet the figure to display three different ways of assigning the variables to the axes and the colour, like this:
x = V1, y = V2, color = V3
x = V1, y = V3, color = V2
x = V2, y = V3, color = V1
How can I do this kind of thing with ggplot2's faceting?
You can get this by putting the data in the format ggplot likes: in this case, with a column that can be used to split the data into facets (called var below). To do that, I just repeated the data three times, choosing the appropriate x and y variables for each 2-way combination and using the variable left out of each combination as the colouring variable.
## Rearrange the data by 2-way combinations, the coloring is the remaining column
res <- do.call(rbind, combn(1:3, 2, function(ii)
  cbind(setNames(dat[, c(ii, setdiff(1:3, ii))], c("x", "y", "color")),
        var = paste(ii, collapse = ".")), simplify = FALSE))
ggplot(res, aes(x = x, y = y, color = color)) +
  geom_point(alpha = .2, size = 3) +
  facet_wrap(~ var, scales = "free") + sc
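Since data.csv is not available here, the reshaping step can be tried on made-up data. This is a more explicit sketch of the same idea, assuming three numeric columns V1, V2 and V3; the list combos and the facet labels are illustrative choices.

```r
library(ggplot2)

set.seed(1)
# Stand-in for data.csv: three numeric columns
dat <- data.frame(V1 = runif(30), V2 = runif(30), V3 = runif(30))

# Each entry names the columns used for x, y and colour in one facet
combos <- list(c("V1", "V2", "V3"),
               c("V1", "V3", "V2"),
               c("V2", "V3", "V1"))

# Repeat the data once per combination, with a label column for faceting
res <- do.call(rbind, lapply(combos, function(v) {
  data.frame(x = dat[[v[1]]],
             y = dat[[v[2]]],
             color = dat[[v[3]]],
             var = paste0("x = ", v[1], ", y = ", v[2], ", colour = ", v[3]))
}))

p <- ggplot(res, aes(x = x, y = y, colour = color)) +
  geom_point(alpha = .5, size = 3) +
  facet_wrap(~ var, scales = "free")
p
```

All three facets share a single colour scale, which works best when the three variables are on comparable ranges.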

Plots with a common x axis

I have a data.frame df with columns T, V1, V2, V3, V4.
I would like to make a ggplot containing two plots with T as the common x axis:
The first plot contains V1.
The second plot contains V2, V3, V4.
I tried:
m1 <- melt(df, id = "T")
chart1<-qplot(T, value, data = m1, geom = "line", group = variable) +
stat_smooth() +
facet_grid(variable ~ ., scale = "free_y")
But this gives me four plots, whereas I just want two.
Is there a way to do this?
library(ggplot2)
library("reshape")
df <- data.frame(T, V1, V2, V3, V4)
m1 <- melt(df, id = "T")
# Flag the rows that belong to V1; faceting on this flag gives two panels
m1$sepfac <- (m1$variable == "V1")
chart1 <- qplot(T, value, data = m1, geom = "line", group = variable) +
  stat_smooth() +
  facet_grid(sepfac ~ ., scales = "free_y")
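The same two-panel layout can be sketched with ggplot() and base R reshaping instead of qplot() and reshape. The data below is made up, stack() stands in for melt(), and the panel labels are illustrative.

```r
library(ggplot2)

set.seed(1)
# Made-up data with the column names from the question
df <- data.frame(T = 1:20, V1 = rnorm(20), V2 = rnorm(20),
                 V3 = rnorm(20), V4 = rnorm(20))

# Long format: stack() returns the columns `values` and `ind`
long <- data.frame(T = rep(df$T, 4), stack(df[c("V1", "V2", "V3", "V4")]))

# One panel for V1, one shared panel for the other three series
long$panel <- ifelse(long$ind == "V1", "V1", "V2, V3, V4")

p <- ggplot(long, aes(x = T, y = values, colour = ind)) +
  geom_line() +
  facet_grid(panel ~ ., scales = "free_y")
p
```

Mapping colour to ind keeps the three series in the second panel distinguishable.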
