ggplot2 facet_wrap() 4 scatter plot - r

I have a dataset (from R):
head(anscombe)
x1 x2 x3 x4 y1 y2 y3 y4
1 10 10 10 8 8.04 9.14 7.46 6.58
2 8 8 8 8 6.95 8.14 6.77 5.76
Now I would like to draw scatter plots of (x1, y1), (x2, y2), (x3, y3) and (x4, y4) in a grid using ggplot2. Each subplot should also have the title "1", "2", "3" or "4", respectively. The result should be similar to what par(mfrow=c(2,2)) gives. I looked into the facet_wrap documentation, but the examples do not seem to cover this simple case. How can I achieve this in ggplot2?

Here's one way to do it, if hard-coding the dataset numbers 1-4 is acceptable:
library(dplyr)
library(ggplot2)
data(anscombe)
list(
transmute(anscombe, x=x1, y=y1, dataset=1),
transmute(anscombe, x=x2, y=y2, dataset=2),
transmute(anscombe, x=x3, y=y3, dataset=3),
transmute(anscombe, x=x4, y=y4, dataset=4)
) %>%
bind_rows() %>%
ggplot(aes(x, y)) +
geom_point() +
facet_wrap(~ dataset)
The main thing is that you need all the x-coordinate values (x1 to x4) in one variable, and all y-coordinates (y1 to y4) in another.
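The same long layout can probably be produced in a single reshaping call. This is a minimal sketch, assuming tidyr 1.0 or later is installed; pivot_longer() with the special ".value" name is not part of the answer above, and the column name dataset is simply reused from it.

```r
library(tidyr)
library(ggplot2)

# Split each column name into a variable part ("x" or "y") and a dataset number.
# ".value" tells pivot_longer() to use the first part as the output column name.
long <- pivot_longer(
  anscombe,
  cols = everything(),
  names_to = c(".value", "dataset"),
  names_pattern = "([xy])([1-4])"
)

# One panel per dataset; the strip label acts as the subplot title.
p <- ggplot(long, aes(x, y)) +
  geom_point() +
  facet_wrap(~ dataset, ncol = 2)
p
```

This avoids hard-coding the four transmute() calls, at the cost of relying on the column naming scheme (one letter followed by one digit).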

You could try without facet_wrap too:
library(ggplot2)
library(gridExtra)
# anscombe is the built-in dataset from the question
grid.arrange(ggplot(anscombe, aes(x1, y1))+geom_point(size=2),
ggplot(anscombe, aes(x2, y2))+geom_point(size=2),
ggplot(anscombe, aes(x3, y3))+geom_point(size=2),
ggplot(anscombe, aes(x4, y4))+geom_point(size=2))
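The gridExtra approach can also carry the subplot titles "1" to "4" that the question asks for. The following is a sketch rather than the answerer's code: it assumes ggplot2 3.0 or later (for the .data pronoun) and builds the four plots in a loop. The variable name plots is made up for illustration.

```r
library(ggplot2)
library(gridExtra)

# Build one titled scatter plot per x/y pair of the built-in anscombe data.
plots <- lapply(1:4, function(i) {
  ggplot(anscombe, aes(.data[[paste0("x", i)]], .data[[paste0("y", i)]])) +
    geom_point(size = 2) +
    labs(title = as.character(i), x = "x", y = "y")
})

# Arrange the four plots in a 2 x 2 grid, similar to par(mfrow = c(2, 2)).
grid.arrange(grobs = plots, ncol = 2)
```

Unlike facet_wrap(), each panel here gets its own axis ranges, which may or may not be what you want when comparing the four datasets.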

It's possible that not all of this is required, but it worked for me. To see what it is doing, run the pipeline one line at a time and look at the intermediate results.
library(dplyr)
library(tidyr)
library(ggplot2)
mutate(anscombe, i = row_number()) %>% # anscombe is the dataset from the question
gather(key, val, -i) %>%
mutate(pane = gsub("[a-z]", "", key),
key = gsub("[^a-z]", "", key)) %>%
spread(key, val) %>%
ggplot(aes(x=x,y=y)) +
geom_point() +
facet_wrap(~pane)

Related

Simulation with Tidyverse -- putting data into tibble format

I am trying to run simulations in R using the tidyverse. This code works, but doesn't scale well to more than a few variables.
Any thoughts on how to improve this? I've tried purrr but I didn't find any success.
The example below draws 5 values from a normal distribution and repeats this 3 times. How could I repeat it n times instead of 3?
n = 5
x=1:n
y1 = rnorm(n)
y2 = rnorm(n)
y3 = rnorm(n)
# put data into tibble
df <- tibble(x=x, y1=y1, y2=y2, y3=y3)
# Tidy data -- go from wide to long
df <- pivot_longer(df, cols=starts_with('y'))
# Make plot
ggplot(df, aes(x=x, y=value, group=name, color=name))+
geom_line()
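To repeat the draw rpl times instead of writing out y1, y2, y3 by hand, generate the columns with replicate():
library(dplyr)
library(tidyr)
library(stringr)
library(purrr) # provides set_names()
library(ggplot2)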
n <- 5
rpl <- 3
replicate(rpl, rnorm(n), simplify = FALSE) %>%
set_names(str_c('y', seq_along(.))) %>%
as_tibble %>%
mutate(x = row_number()) %>%
pivot_longer(cols = starts_with('y')) %>%
ggplot(aes(x=x, y=value, group=name, color=name))+
geom_line()
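If the wide table is not needed for anything else, the simulated values can also be written straight into long format. This is a minimal sketch under that assumption; the name sim is made up, and the column names name and value are chosen to match what pivot_longer() produces by default.

```r
library(tibble)
library(ggplot2)

n <- 5    # values per draw
rpl <- 3  # number of repetitions

# Long format from the start: one row per (series, x) pair.
sim <- tibble(
  name  = rep(paste0("y", seq_len(rpl)), each = n),
  x     = rep(seq_len(n), times = rpl),
  value = rnorm(n * rpl)
)

p <- ggplot(sim, aes(x = x, y = value, group = name, color = name)) +
  geom_line()
p
```

Changing rpl is then the only edit needed to scale the simulation up.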

How to plot percentage labels in a bar plot per each legend?

I'm trying to draw a bar plot with geom_bar() in ggplot2, with one variable on the x-axis, its count on the y-axis, and a third variable as the legend. Inside each legend group of a bar I would like to show that group's percentage of the bar's count.
I'm using the built-in mtcars dataset as a reprex.
library(ggplot2)
ggplot(data = mtcars, aes(x=cyl, y= ..count..)) +
geom_bar(aes(fill = factor(gear)))
I would like to have these percentages inside each legend, in a decent way if possible.
prop.table(table(eight$gear))
3 5
0.8571429 0.1428571
prop.table(table(six$gear))
3 4 5
0.2857143 0.5714286 0.1428571
prop.table(table(four$gear))
3 4 5
0.09090909 0.72727273 0.18181818
The documentation for this is listed here
But I reached the point where I can't use y for the labels, since it comes from a special ggplot stat variable (..count..).
I'm sorry I can't upload images, since I'm a newbie having no sufficient reputation to post one, but running that mtcars code will generate what I'm asking for.
I'd probably format the % labels separately and then add them as a second data frame to the chart.
Formatting using dplyr:
library(dplyr)
legendLabels <- mtcars %>%
group_by(cyl, gear) %>%
summarise(count = n()) %>%
arrange(cyl, desc(gear)) %>%
mutate(percent = round(count/sum(count)*100,2),
yPos = cumsum(count))
And the updated chart:
ggplot(data = mtcars, aes(x=cyl, y= ..count..)) +
geom_bar(aes(fill = factor(gear))) +
geom_text(data = legendLabels, aes(x=cyl, y=yPos-0.5, label=paste0(percent,"%")))
Giving this chart:
Hopefully this is what you're looking for.
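A variation that lets ggplot2 place the labels is sketched below. It is not the answer's method: it assumes the counts are computed up front, swaps geom_bar() for geom_col(), and uses position_stack(vjust = 0.5) to centre each label in its segment. The name label_df is made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Count each cyl/gear combination and express it as a share of its cyl group.
label_df <- mtcars %>%
  count(cyl, gear, name = "count") %>%
  group_by(cyl) %>%
  mutate(percent = round(100 * count / sum(count), 1)) %>%
  ungroup()

# geom_col() draws the precomputed counts; the text layer inherits the fill
# grouping, so position_stack() stacks the labels the same way as the bars.
p <- ggplot(label_df, aes(x = factor(cyl), y = count, fill = factor(gear))) +
  geom_col() +
  geom_text(aes(label = paste0(percent, "%")),
            position = position_stack(vjust = 0.5))
p
```

Because the labels are stacked by the same position adjustment as the bars, no manual cumsum() or vertical offset is needed.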

ggplot2 automatically removes missing values but does not rescale axes

The code below shows that ggplot2 automatically removes the 2nd observation, yet still keeps the y-axis range from 1 to 1000. How can I make ggplot2 scale the axis appropriately without hard-coding the range myself?
df <- data.frame(x = c(1, NA),
y = c(1, 1000))
ggplot(df) + geom_point(aes(x, y))
How about removing rows with missing values in x before plotting?
library(dplyr)
df %>%
filter(!is.na(x)) %>%
ggplot() +
geom_point(aes(x, y))
Or use na.omit(), which drops every row that has a missing value in any column:
df %>%
na.omit() %>%
ggplot() +
geom_point(aes(x, y))
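The two approaches differ once the data frame has other columns with missing values. The sketch below uses a made-up three-row data frame with a hypothetical note column to show the difference, and restricts the check to the plotted columns with base R's complete.cases().

```r
library(ggplot2)

df <- data.frame(x = c(1, NA, 3),
                 y = c(1, 1000, 2),
                 note = c(NA, "a", "b"))  # hypothetical extra column

# na.omit() drops a row if ANY column is missing, so row 1 is lost as well.
all_complete <- na.omit(df)

# Checking only the plotted columns keeps rows 1 and 3.
plot_df <- df[complete.cases(df[, c("x", "y")]), ]

p <- ggplot(plot_df) +
  geom_point(aes(x, y))
p
```

Filtering on the plotted columns only is usually the safer default when the data frame carries unrelated columns.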

Scatter plot with ggplot

I want to do a scatter (xy) plot of variables in a melted data frame as shown below.
df
class var mean
0 x 4.25
0 y 6.25
1 x 2.00
1 y 11.00
I have tried this, but it plots 4 points. How can I plot x against y?
library(ggplot2)
ggplot(df, aes(x=mean, y=mean, group=var, colour=class)) +
geom_point( size=5, shape=21, fill="white")
As Heroka pointed out, the data needs to be in wide format, with x and y in separate columns. If the data was read in like this, you can use the following to convert it.
## you don't need this since you already have df
text = "class var mean
0 x 4.25
0 y 6.25
1 x 2.00
1 y 11.00"
df = read.delim(textConnection(text),header=TRUE,strip.white=TRUE,
stringsAsFactors = FALSE, sep = " ")
## use this library to switch from long to wide
library(reshape2)
df2 = dcast(df, class ~ var, value.var = "mean")
library(ggplot2)
ggplot(df2, aes(x=x, y=y, colour=class)) +
geom_point( size=5, shape=21, fill="white")
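The same long-to-wide step can be sketched with tidyr instead of reshape2. This assumes tidyr 1.0 or later; the name wide is made up, and class is wrapped in factor() here so the colour scale is discrete, which is a choice of this sketch rather than something the answer does.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(class = c(0, 0, 1, 1),
                 var   = c("x", "y", "x", "y"),
                 mean  = c(4.25, 6.25, 2.00, 11.00))

# One row per class, with the x and y means side by side.
wide <- pivot_wider(df, names_from = var, values_from = mean)

p <- ggplot(wide, aes(x = x, y = y, colour = factor(class))) +
  geom_point(size = 5, shape = 21, fill = "white")
p
```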

Vary scale of geom_point size by facet

I'm using ggplot with facet_wrap to generate 3 side-by-side plots with linear models. In addition, I have another dimension (let's call it "z") I'd like to visualize by varying the size of the points on the plots.
Currently, the plots I generate keep the size of the points on the same scale across all 3 facets. I would instead like to scale the point sizes by facet - that way, one can quickly tell which point contains the highest "z" value for each facet.
Is there any way to do this without creating 3 separate plots? I've included a sample of my data and the code I used below:
x <- c(0.03,1.32,2.61,3.90,5.20,6.48,7.77,0.75,2.04,3.33,4.62,5.91,7.20,8.49,0.41,1.70,3.00,4.28,5.57,6.86,8.15)
y <- c(650,526,382,110,72,209,60,559,296,76,48,64,20,22,50,102,176,21,20,25,5)
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")
df <- data.frame(x,y,z,facet)
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=z)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
The method below reassigns z to its z-score within its facet:
require(dplyr)
require(ggplot2)
require(magrittr)
require(scales)
x <- c(0.03,1.32,2.61,3.90,5.20,6.48,7.77,0.75,2.04,3.33,4.62,5.91,7.20,8.49,0.41,1.70,3.00,4.28,5.57,6.86,8.15)
y <- c(650,526,382,110,72,209,60,559,296,76,48,64,20,22,50,102,176,21,20,25,5)
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")
df <- data.frame(x,y,z,facet)
df %<>%
group_by(facet) %>%
mutate(z = as.numeric(scale(z))) # z-score within each facet; scale() returns a matrix, so convert it to a plain vector
ggplot(df, aes(x=x, y=y, group = facet)) +
geom_point(aes(size=z)) +
geom_smooth(method="lm") +
facet_wrap(~facet )
Try to rescale size for each facet to take values in (0,1]:
df %>%
group_by(facet) %>%
mutate(newz = z/max(z)) %>%
ggplot(., aes(x=x, y=y)) +
geom_point(aes(size=newz)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
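A related option is to map each facet's smallest z to 0 and its largest to 1. The sketch below assumes the scales package is installed and uses a small made-up data frame (toy) whose columns mirror the question's; z_rel is a hypothetical column name.

```r
library(dplyr)
library(ggplot2)

# Small made-up data set; the columns mirror the question's x, y, z, facet.
toy <- data.frame(
  x = c(1, 2, 3, 1, 2, 3),
  y = c(5, 3, 1, 50, 30, 10),
  z = c(100, 200, 400, 1, 2, 4),
  facet = rep(c("A", "B"), each = 3)
)

# Rescale z separately within each facet: min -> 0, max -> 1.
toy <- toy %>%
  group_by(facet) %>%
  mutate(z_rel = scales::rescale(z, to = c(0, 1))) %>%
  ungroup()

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(size = z_rel)) +
  geom_smooth(method = "lm") +
  facet_wrap(~ facet)
```

With this scaling the largest point in every facet has the same size, so the legend shows relative rather than absolute z values.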
I would just take the mean of df$z within each df$facet and size each point by its difference from that mean:
AverageFacet <- df %>% group_by(facet) %>% summarize(meanwithinfacet= mean(z, na.rm=TRUE))
df <- merge(df, AverageFacet)
df$pointsize<- df$z - df$meanwithinfacet
Now each point's size depends on the mean of its facet:
> head(df,10)
  facet    x   y      z meanwithinfacet   pointsize
1     A 0.03 650 391174       229089.57  162084.429
2     A 1.32 526 244856       229089.57   15766.429
3     A 2.61 382 836435       229089.57  607345.429
4     A 3.90 110  46282       229089.57 -182807.571
5     A 5.20  72  40351       229089.57 -188738.571
6     A 6.48 209  27118       229089.57 -201971.571
7     A 7.77  60  17411       229089.57 -211678.571
8     B 0.75 559  26232        17079.57    9152.429
9     B 2.04 296  59162        17079.57   42082.429
and plot
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=pointsize)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
Looks like this, not sure about the legend though.
Instead of the absolute difference from the mean, you could also use the number of standard deviations by which a given z differs from its facet's mean:
AverageFacet <- df %>% group_by(facet) %>% summarize(meanwithinfacet= mean(z, na.rm=TRUE), sdwithinfacet= sd(z, na.rm=TRUE))
df <- merge(df, AverageFacet)
df$absoluteDiff<- df$z - df$meanwithinfacet
df$SDfromMean <- df$absoluteDiff / df$sdwithinfacet
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=SDfromMean)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
