Vary scale of geom_point size by facet - r

I'm using ggplot with facet_wrap to generate 3 side-by-side plots with linear models. In addition, I have another dimension (let's call it "z") I'd like to visualize by varying the size of the points on the plots.
Currently, the plots I generate keep the size of the points on the same scale across all 3 facets. I would instead like to scale the point sizes by facet - that way, one can quickly tell which point contains the highest "z" value for each facet.
Is there any way to do this without creating 3 separate plots? I've included a sample of my data and the code I used below:
x <- c(0.03,1.32,2.61,3.90,5.20,6.48,7.77,0.75,2.04,3.33,4.62,5.91,7.20,8.49,0.41,1.70,3.00,4.28,5.57,6.86,8.15)
y <- c(650,526,382,110,72,209,60,559,296,76,48,64,20,22,50,102,176,21,20,25,5)
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")
df <- data.frame(x,y,z,facet)
library(ggplot2)
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=z)) +
geom_smooth(method="lm") +
facet_wrap(~facet)

The method below replaces z with its z-score within its facet:
library(dplyr)
library(ggplot2)
library(magrittr) # provides the %<>% assignment pipe
# scale() is base R, so the scales package is not needed here
df %<>%
group_by(facet) %>%
mutate(z = as.numeric(scale(z))) # z-score within each facet; as.numeric() drops the matrix that scale() returns
ggplot(df, aes(x=x, y=y, group = facet)) +
geom_point(aes(size=z)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
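If you would rather not depend on dplyr and magrittr, the same within-facet z-score can be computed in base R with ave(), which applies a function to each group and returns the results in the original row order. This is a sketch that swaps ave() in for group_by()/mutate(); the name z_within is just illustrative. The check at the end confirms that every facet ends up with mean 0 and standard deviation 1.

```r
# Data copied from the question
x <- c(0.03,1.32,2.61,3.90,5.20,6.48,7.77,0.75,2.04,3.33,4.62,5.91,7.20,8.49,0.41,1.70,3.00,4.28,5.57,6.86,8.15)
y <- c(650,526,382,110,72,209,60,559,296,76,48,64,20,22,50,102,176,21,20,25,5)
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")

# ave() runs FUN once per facet and keeps the original row order;
# as.numeric() drops the one-column matrix that scale() returns
z_within <- ave(z, facet, FUN = function(v) as.numeric(scale(v)))

# Within every facet the scores now have mean 0 and sd 1
print(round(tapply(z_within, facet, mean), 8))
print(tapply(z_within, facet, sd))

# Optional plot, only built if ggplot2 is installed; print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(x, y, z_within, facet), aes(x, y)) +
    geom_point(aes(size = z_within)) +
    geom_smooth(method = "lm", formula = y ~ x) +
    facet_wrap(~facet)
}
```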

Try to rescale size for each facet to take values in (0,1]:
df %>%
group_by(facet) %>%
mutate(newz = z/max(z)) %>%
ggplot(., aes(x=x, y=y)) +
geom_point(aes(size=newz)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
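The same per-facet rescaling can be checked in base R. This sketch uses ave() instead of group_by()/mutate() and assumes, as in the sample data, that every z is positive (otherwise z/max(z) would not land in (0,1]). It confirms that the largest point in each facet maps to exactly 1.

```r
# Data copied from the question
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")

# Divide every z by the largest z in its own facet (assumes all z > 0)
newz <- ave(z, facet, FUN = function(v) v / max(v))

print(tapply(newz, facet, max))  # the biggest point in each facet becomes 1
print(range(newz))               # all values lie in (0, 1]
```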

I would just take the mean of df$z within each df$facet:
AverageFacet <- df %>% group_by(facet) %>% summarize(meanwithinfacet= mean(z, na.rm=TRUE))
df <- merge(df, AverageFacet)
df$pointsize<- df$z - df$meanwithinfacet
Now each point size depends on the mean of its facet:
> head(df,10)
facet x y z meanwithinfacet pointsize
1 A 0.03 650 391174 229089.57 162084.429
2 A 1.32 526 244856 229089.57 15766.429
3 A 2.61 382 836435 229089.57 607345.429
4 A 3.90 110 46282 229089.57 -182807.571
5 A 5.20 72 40351 229089.57 -188738.571
6 A 6.48 209 27118 229089.57 -201971.571
7 A 7.77 60 17411 229089.57 -211678.571
8 B 0.75 559 26232 17079.57 9152.429
9 B 2.04 296 59162 17079.57 42082.429
Then plot:
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=pointsize)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
I'm not sure about the legend, though, since the sizes are now deviations from the facet mean and many of them are negative.
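Deviations from a mean always sum to zero, so every facet is guaranteed to contain negative "sizes", which is what makes the size legend awkward to read. A base-R sketch (ave() swapped in for the summarize()/merge() steps) reproduces the pointsize column and counts the negative values.

```r
# Data copied from the question
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")

# Difference between each z and the mean z of its own facet
pointsize <- ave(z, facet, FUN = function(v) v - mean(v))

print(pointsize[1])                   # about 162084.43, as in the first row of head(df)
print(tapply(pointsize, facet, sum))  # numerically zero in every facet
print(sum(pointsize < 0))             # 14 of the 21 sizes are negative
```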
Instead of the absolute difference from the mean, you could also use how many standard deviations a given z lies from its facet mean:
AverageFacet <- df %>% group_by(facet) %>% summarize(meanwithinfacet= mean(z, na.rm=TRUE), sdwithinfacet= sd(z, na.rm=TRUE))
df <- merge(df[, c("facet", "x", "y", "z")], AverageFacet, by = "facet") # drop the columns added in the previous step before merging
df$absoluteDiff<- df$z - df$meanwithinfacet
df$SDfromMean <- df$absoluteDiff / df$sdwithinfacet
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=SDfromMean)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
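The SDfromMean column is the same quantity as the within-facet z-score from the scale() answer. A quick base-R check (tapply() lookups swapped in for summarize() and merge(); the variable names are illustrative) confirms the two agree.

```r
# Data copied from the question
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")

# Per-facet mean and sd, looked up for each row by facet name
m <- tapply(z, facet, mean)
s <- tapply(z, facet, sd)
sd_from_mean <- as.numeric((z - m[facet]) / s[facet])

# The same quantity computed with scale() inside each facet
z_score <- ave(z, facet, FUN = function(v) as.numeric(scale(v)))

print(all.equal(sd_from_mean, z_score))
```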


How can I line up graphs with a common x axis made using 2 different data frames when the x axis values are not identical?

I have 2 datasets, both with the column depth. However, df1 has ~400 rows and df2 has 7000 rows. The depth values, which I want as my common x axis, range from 48 to 120 in df1 and from 48 to 133 in df2. When I make my plots, this difference in range stops them from lining up.
df1 sample data
depth L F P Depo
67.48 1.003 1.063 1.066 Turb
67.63 1.004 1.020 1.024 Dri
67.73 1.011 1.017 1.028 Dri
67.83 1.006 1.007 1.014 Turb
67.92 1.003 1.029 1.032 Pro
68.06 1.004 1.007 1.011 Pro
df2 sample data
depth Ca Ti
67.41 378 241
67.91 422 253
67.94 402 262
67.95 412 264
67.98 377 266
68.01 386 263
68.02 326 266
68.08 338 219
I tried making individual plots and then using grid.draw but this doesn't work.
creating plots from DF1
Lin <- ggplot(DF1, aes(x=depth, y=L)) + geom_line() + geom_point(data = DF1, aes(x=depth, y=L, color = Depo))
Fab <- ggplot(DF1, aes(x=depth, y=P)) + geom_path() + geom_point(data = DF1, aes(x=depth, y=P, color = Depo))
Fol <- ggplot(DF1, aes(x=depth, y=F)) + geom_path() + geom_point(data = DF1, aes(x=depth, y=F, color = Depo))
Combining plots with grid.draw works for the df1 graphs
grid.draw(rbind(ggplotGrob(Fol), ggplotGrob(Lin), ggplotGrob(Fab), size = "last"))
Creating plot from DF2
Ca1 <- ggplot(DF2, aes(x=depth, y=Ca)) + geom_path()
When I try to combine the plots from the two data frames, it throws an error that x and y must have the same number of columns.
grid.draw(rbind(ggplotGrob(Fol), ggplotGrob(Lin), ggplotGrob(Fab), ggplotGrob(Ca1), size = "last"))
Cowplot works but the depths don't line up for my df2 graph (Ca1)
plot_grid(Fol, Lin, Fab, Ca1, align="h", axis="b", nrow = 4, rel_widths = c(1,2))
I tried some other ways of lining up the graphs, but they all seem to align the plot panels, not the actual values on the x axis. I also tried facet_wrap but couldn't work out how to combine the two data frames. While searching for a solution I keep seeing advice to combine the two data frames, but I can't see how this would work with my data.
Does anyone know how I can line these graphs up? I have so many variables I need to compare from both datasets.
To integrate the two data sets, which only have "depth" in common, you can gather the remaining numeric columns into "long" format, where we label the type of data in one column ('col' here) and the value in another ('val' here).
Once the data is combined, we can use facet_wrap(~col, scales = "free_y") to make facets for each variable, but with a common x axis.
library(tidyverse)
df_combo <-
bind_rows(
df1 %>% gather(col, val, L:P),
df2 %>% gather(col, val, Ca:Ti)
)
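# map colour only on the points: mapping it globally would split each path by Depo,
# and only the df1 rows have a Depo value
ggplot(df_combo, aes(depth, val)) +
geom_path() +
geom_point(data = function(d) filter(d, !is.na(Depo)), aes(color = Depo)) +
facet_wrap(~col, scales = "free_y", ncol = 1)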
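Here is a base-R sketch of the same long-format idea, built from the sample rows shown in the question. The helper to_long() is hypothetical, a stand-in for gather() that stacks the chosen columns into col/val and fills Depo with NA for the data frame that lacks it. Because both data sets end up in one data frame, facet_wrap() with scales = "free_y" frees only the y axes, so every panel shares one depth axis.

```r
# Sample rows copied from the question
df1 <- data.frame(
  depth = c(67.48, 67.63, 67.73, 67.83, 67.92, 68.06),
  L = c(1.003, 1.004, 1.011, 1.006, 1.003, 1.004),
  F = c(1.063, 1.020, 1.017, 1.007, 1.029, 1.007),
  P = c(1.066, 1.024, 1.028, 1.014, 1.032, 1.011),
  Depo = c("Turb", "Dri", "Dri", "Turb", "Pro", "Pro")
)
df2 <- data.frame(
  depth = c(67.41, 67.91, 67.94, 67.95, 67.98, 68.01, 68.02, 68.08),
  Ca = c(378, 422, 402, 412, 377, 386, 326, 338),
  Ti = c(241, 253, 262, 264, 266, 263, 266, 219)
)

# Hypothetical helper: stack the chosen columns into long format
to_long <- function(d, cols) {
  data.frame(
    depth = rep(d$depth, times = length(cols)),
    col   = rep(cols, each = nrow(d)),
    val   = unlist(d[cols], use.names = FALSE),
    Depo  = if ("Depo" %in% names(d)) rep(d$Depo, times = length(cols)) else NA_character_
  )
}

combo <- rbind(to_long(df1, c("L", "F", "P")), to_long(df2, c("Ca", "Ti")))
print(range(combo$depth))  # one depth range covering both data sets

# Optional plot, only built if ggplot2 is installed; print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(combo, aes(depth, val)) +
    geom_path() +
    geom_point(data = function(d) d[!is.na(d$Depo), ], aes(colour = Depo)) +
    facet_wrap(~col, scales = "free_y", ncol = 1)
}
```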

Reordering columns by y-value in R?

I have a dataframe structured like this:
> head(df)
Zip Crimes Population CPC
1 78701 2103 6841 0.3074
2 78719 186 1764 0.1054
3 78702 1668 21334 0.0782
4 78723 2124 28330 0.0750
5 78753 3472 49301 0.0704
6 78741 2973 44935 0.0662
And I'm plotting it using this function:
p = ggplot(df, aes(x=Zip, y=CPC)) + geom_col() + theme(axis.text.x = element_text(angle = 90))
And this is the graph I get:
How can I order the plot by CPC, so that the Zip codes with the highest CPC are on the left?
Convert Zip to a factor ordered by negative CPC. E.g., try df$Zip <- reorder(df$Zip, -df$CPC) before plotting. Here's a small example:
d <- data.frame(
x = c('a', 'b', 'c'),
y = c(5, 15, 10)
)
library(ggplot2)
# Without reordering
ggplot(d, aes(x, y)) + geom_col()
# With reordering
d$x <- reorder(d$x, -d$y)
ggplot(d, aes(x, y)) + geom_col()
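What reorder() actually changes is the order of the factor levels, which is what ggplot2 uses for a discrete axis; the rows of the data frame stay where they were. This small check on the same toy data makes that visible.

```r
d <- data.frame(
  x = c('a', 'b', 'c'),
  y = c(5, 15, 10)
)

# Levels sorted by -y, i.e. by y in decreasing order
d$x <- reorder(d$x, -d$y)

print(levels(d$x))        # the order ggplot2 uses on the x axis
print(as.character(d$x))  # the rows themselves are unchanged
```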
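Sort your data frame in descending order, then fix the factor levels to that order before plotting. ggplot2 orders a discrete axis by factor levels, not by row order, so sorting alone is not enough:
library(dplyr)
library(ggplot2)
df <- arrange(df, desc(CPC))
df$Zip <- factor(df$Zip, levels = unique(df$Zip)) # lock the levels to the sorted order
ggplot(df, aes(x=Zip, y=CPC)) + geom_col() + theme(axis.text.x = element_text(angle = 90))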
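The sample rows in the question illustrate why the level step matters: they are already in descending CPC order, yet converting Zip to a factor with default levels sorts by Zip value and puts 78702 second. This base-R sketch (order() swapped in for arrange()) compares the default levels with levels fixed to the sorted row order.

```r
# Sample rows copied from the question
df <- data.frame(
  Zip = c(78701, 78719, 78702, 78723, 78753, 78741),
  CPC = c(0.3074, 0.1054, 0.0782, 0.0750, 0.0704, 0.0662)
)

df <- df[order(-df$CPC), ]            # sort rows by CPC, highest first

default_levels <- levels(factor(df$Zip))
print(default_levels)                 # sorted by Zip value, not by CPC

df$Zip <- factor(df$Zip, levels = unique(df$Zip))
print(levels(df$Zip))                 # now follows the descending CPC order
```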

ggplot2 facet_wrap() 4 scatter plot

I have a dataset (from R):
head(anscombe)
x1 x2 x3 x4 y1 y2 y3 y4
1 10 10 10 8 8.04 9.14 7.46 6.58
2 8 8 8 8 6.95 8.14 6.77 5.76
And now I would like to make scatter plots of (x1,y1), (x2,y2), (x3,y3) and (x4,y4) in a grid using ggplot2. Each subplot should also have the title "1", "2", "3", "4" respectively, similar to what par(mfrow=c(2,2)) gives in base graphics. I looked at the facet_wrap documentation, but the examples don't seem to cover this simple case. How can I achieve it in ggplot2?
Here's one way to do it, if hard-coding the dataset numbers 1-4 is acceptable:
library(dplyr)
library(ggplot2)
data(anscombe)
list(
transmute(anscombe, x=x1, y=y1, dataset=1),
transmute(anscombe, x=x2, y=y2, dataset=2),
transmute(anscombe, x=x3, y=y3, dataset=3),
transmute(anscombe, x=x4, y=y4, dataset=4)
) %>%
bind_rows() %>%
ggplot(aes(x, y)) +
geom_point() +
facet_wrap(~ dataset)
The main thing is that you need all the x-coordinate values (x1 to x4) in one variable, and all y-coordinates (y1 to y4) in another.
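The same reshaping can be done in base R. This sketch builds one small data frame per dataset number with lapply() and stacks them with do.call(rbind, ...), swapped in for transmute() and bind_rows(); the result has a single x column, a single y column and a dataset label that facet_wrap() can use as the panel title.

```r
# anscombe ships with R's datasets package
long <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(x = anscombe[[paste0("x", i)]],
             y = anscombe[[paste0("y", i)]],
             dataset = i)
}))

print(table(long$dataset))  # 11 points per dataset

# Optional plot, only built if ggplot2 is installed; print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x, y)) +
    geom_point() +
    facet_wrap(~ dataset)  # panel titles "1" to "4"
}
```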
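You could also do it without facet_wrap, giving each plot its title with ggtitle():
library(ggplot2)
library(gridExtra)
df <- anscombe
grid.arrange(ggplot(df, aes(x1, y1)) + geom_point(size=2) + ggtitle("1"),
ggplot(df, aes(x2, y2)) + geom_point(size=2) + ggtitle("2"),
ggplot(df, aes(x3, y3)) + geom_point(size=2) + ggtitle("3"),
ggplot(df, aes(x4, y4)) + geom_point(size=2) + ggtitle("4"),
ncol = 2)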
It's possible not all of this is required, but it worked for me. To see what it is doing, step through it line by line and look at the intermediate results.
library(dplyr)
library(tidyr)
library(ggplot2)
mutate(anscombe, i = row_number()) %>%
gather(key, val, -i) %>%
mutate(pane = gsub("[a-z]", "", key),
key = gsub("[^a-z]", "", key)) %>%
spread(key, val) %>%
ggplot(aes(x=x,y=y)) +
geom_point() +
facet_wrap(~pane)
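The two gsub() calls are what split each column name into a panel number and an axis letter. Running them on the anscombe column names on their own shows the intermediate step; the variable names pane and coord are illustrative.

```r
keys  <- names(anscombe)            # "x1" ... "x4" "y1" ... "y4"
pane  <- gsub("[a-z]", "", keys)    # remove letters, keep the digit
coord <- gsub("[^a-z]", "", keys)   # remove everything except letters

print(data.frame(keys, pane, coord))
```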

Scatter plot with ggplot

I want to do a scatter (xy) plot of variables in a melted data frame as shown below.
df
class var mean
0 x 4.25
0 y 6.25
1 x 2.00
1 y 11.00
I have tried this, but it plots 4 points. How can I plot x against y?
library(ggplot2)
ggplot(df, aes(x=mean, y=mean, group=var, colour=class)) +
geom_point( size=5, shape=21, fill="white")
As Heroka pointed out, you need the data in wide format, with one column for x and one for y. If your data were read in like this, you can use the following to convert it.
## you don't need this since you already have df
text = "class var mean
0 x 4.25
0 y 6.25
1 x 2.00
1 y 11.00"
df = read.delim(textConnection(text),header=TRUE,strip.white=TRUE,
stringsAsFactors = FALSE, sep = " ")
## use this library to switch from long-wide
library(reshape2)
df2 = dcast(df, class ~ var, value.var = "mean")
library(ggplot2)
# class is numeric (0/1); factor() gives it discrete colours
ggplot(df2, aes(x=x, y=y, colour=factor(class))) +
geom_point( size=5, shape=21, fill="white")
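If you would rather not add reshape2, base R's reshape() can do the same long-to-wide conversion. In this sketch the new columns are named mean.x and mean.y (reshape() pastes v.names and the time value together), so the aes() call uses those names.

```r
text <- "class var mean
0 x 4.25
0 y 6.25
1 x 2.00
1 y 11.00"
df <- read.table(text = text, header = TRUE, stringsAsFactors = FALSE)

# One row per class, one column per value of var
wide <- reshape(df, idvar = "class", timevar = "var",
                v.names = "mean", direction = "wide")
print(wide)

# Optional plot, only built if ggplot2 is installed; print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(wide, aes(x = mean.x, y = mean.y, colour = factor(class))) +
    geom_point(size = 5, shape = 21, fill = "white")
}
```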

2x1 faceting with ggplot2

I'm trying to make a simple facet with histograms in ggplot2
data <- read.csv("/hist_distances.csv", check.names = FALSE, sep = ",")
mdata <- melt(data)
m <- ggplot(data, aes(x=Distance))
m + geom_histogram()
head(data)
Gives:
Times Distance
1 3.093060 260.8840
2 2.557780 187.4960
3 0.263611 10.6584
4 2.880000 184.5970
5 5.035000 281.3490
6 6.952780 251.4730
head(mdata)
gives:
variable value
1 Times 3.093060
2 Times 2.557780
3 Times 0.263611
4 Times 2.880000
5 Times 5.035000
6 Times 6.952780
and
tail(mdata)
gives:
variable value
1739 Distance 1.103670
1740 Distance 1.695610
1741 Distance 3.795020
1742 Distance 6.651960
1743 Distance 0.719843
1744 Distance 6.504050
This produces this graphic:
I have tried:
m <- ggplot(mdata, aes(x=value)) +
geom_histogram() +
m + facet_wrap(~ variable)
With no success.
How can I produce a facetted graph instead, with a histogram of variable "times" at the top and a histogram of variable "distances" at the bottom?
Use facet_grid(variable ~ .). The facet_grid() formula is rows ~ columns, so this puts each variable in its own row:
library(reshape2)
library(ggplot2)
df <- data.frame(Time = rnorm(100),
Distance = rnorm(100)
)
dfm <- melt(df)
ggplot(dfm, aes(x=value)) + geom_histogram() + facet_grid(variable ~ .)
Edit for follow-up comment:
If your data are on different scales, use facet_grid(variable ~ ., scales = "free").
See help(facet_grid) for options.
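The melt() step can also be done with base R's stack(), which returns a values column and an ind factor naming the source column. This sketch swaps stack() in for reshape2::melt() and facets on ind instead of variable; the random data only stand in for the real Times/Distance columns.

```r
set.seed(1)
df <- data.frame(Time = rnorm(100), Distance = rnorm(100))

long <- stack(df)  # columns: values, ind
str(long)

# Optional plot, only built if ggplot2 is installed; print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = values)) +
    geom_histogram(bins = 30) +
    facet_grid(ind ~ ., scales = "free")  # one row per variable
}
```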
