How to do an association plot in ggplot2?


I have a table with two categorical variables and I want to visualise their association, that is, the number of times that they are found together in the same row.
For instance, let's take this data frame:
d <-data.frame(cbind(sample(1:5,100,replace=T), sample(1:10,100,replace=T)))
How can I generate a heatmap where the colour of each square represents the number of times that X1 and X2 are found in a given combination?
It would be even better to know how to plot this as a dot plot instead, where the size of each dot represents the number of times the combination of X1 and X2 occurs.
If you can show me how to do this in ggplot2 or any other way in R, it would be really helpful.
Thanks!

Here's how I would do it:
library(ggplot2)
library(dplyr)
set.seed(123)
d <- data.frame(x = sample(1:5, 100, replace = TRUE), y = sample(1:10, 100, replace = TRUE))
d_sum <- d %>%
  group_by(x, y) %>%
  summarise(count = n())
For the heatmap:
ggplot(d_sum, aes(x, y)) +
  geom_tile(aes(fill = count))
For the dotplot:
ggplot(d_sum, aes(x, y)) +
  geom_point(aes(size = count))
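As a variation on the dot plot above, here is a minimal sketch that skips the separate counting step: ggplot2's geom_count() counts the overlapping points itself and maps the count to point size. The base-R table() step is only there to show the same counts as a data frame (including combinations that never occur); the names d_tab and p are just illustrative.

```r
library(ggplot2)

set.seed(123)
d <- data.frame(x = sample(1:5, 100, replace = TRUE),
                y = sample(1:10, 100, replace = TRUE))

# Base-R cross-tabulation: one row per (x, y) combination, zero counts included
d_tab <- as.data.frame(table(x = d$x, y = d$y), responseName = "count")

# geom_count() counts identical (x, y) pairs and maps the count to size
p <- ggplot(d, aes(x, y)) +
  geom_count()
print(p)
```

Unlike the summarised data frame, d_tab keeps the empty combinations, which can be handy if you want the heatmap to show zero-count tiles.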

library(ggplot2)
library(dplyr)
set.seed(123)
d <- data.frame(x = sample(1:20, 1000, replace = TRUE), y = sample(1:20, 1000, replace = TRUE))
d %>%
  count(x, y) %>%
  ggplot(aes(x, y, fill = n)) +
  geom_tile() +
  scale_x_continuous(breaks = 1:20) +
  scale_y_continuous(breaks = 1:20) +
  # midpoint defaults to 0; set midpoint to centre the palette on your counts
  scale_fill_gradient2(low = 'white', mid = 'steelblue', high = 'red') +
  guides(fill = guide_legend("Count")) +
  # apply the complete theme first, otherwise it resets the axis text settings
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Related

How to remove low frequency bins in histogram

Let's say I have a data frame containing a vector of numbers that I want to visualise in a histogram. I want to show only the bins containing more than, say, 50 observations.
Step 1
library(ggplot2)
library(dplyr)
set.seed(10)
x <- data.frame(x = rnorm(1000, 50, 2))
p <- x %>%
  ggplot(aes(x)) +
  geom_histogram()
p
Step 2
pg <- ggplot_build(p)
pg$data[[1]]
As a check, when I print pg$data[[1]] I'd like to see only rows where count >= 50.
Thank you

library(ggplot2)
# after_stat() replaces the older ..count.. notation
ggplot(x, aes(x = x, y = after_stat(ifelse(count > 50, count, 0)))) +
  geom_histogram(bins = 30)
With this code you can also see the counts of the removed bins:
library(ggplot2)
ggplot(x, aes(x = x, y = after_stat(ifelse(count > 50, count, 0)))) +
  geom_histogram(bins = 30, fill = "green", color = "grey") +
  # use the same bins as the histogram so that the labels line up with the bars
  stat_bin(aes(label = after_stat(count)), bins = 30, geom = "text", vjust = -0.7)

You could do something like this. You will probably not like the factor labels on the x-axis; to get a numeric axis, you can split each label into its two endpoints and plot the bin at their average.
x %>%
  mutate(bin = cut(x, breaks = 30)) %>%
  group_by(bin) %>%
  mutate(count = n()) %>%
  filter(count > 50) %>%
  ggplot(aes(bin)) +
  geom_bar()  # geom_bar() counts the rows in each remaining bin
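To get a numeric x-axis without parsing the factor labels, here is a rough sketch of a different route: base R's hist() does the binning and returns the bin midpoints directly, and the filtered bins are then drawn with geom_col(). Note that hist() treats breaks = 30 as a suggestion, so the bins will not necessarily match those produced by cut() or geom_histogram(); the names h, bins and bins_kept are illustrative.

```r
library(ggplot2)

set.seed(10)
x <- data.frame(x = rnorm(1000, 50, 2))

# hist() is used only for binning; plot = FALSE suppresses the base graphics plot
h <- hist(x$x, breaks = 30, plot = FALSE)
bins <- data.frame(mid = h$mids, count = h$counts)

# Keep only the bins with more than 50 observations
bins_kept <- subset(bins, count > 50)

# Draw the surviving bins at their midpoints, one bar per bin
p <- ggplot(bins_kept, aes(mid, count)) +
  geom_col(width = diff(h$breaks)[1], colour = "grey50")
print(p)
```

Printing bins_kept gives the check asked for in Step 2 of the question: only rows above the threshold remain.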

How can you plot `geom_point()` with `facet_wrap()` using per-group row number as x?

Is there a way to plot geom_point() so that it implicitly uses the row number as x in a facet? Just like plot(y) but also for multiple facets.
The following fails with Error: geom_point requires the following missing aesthetics: x:
df = data.frame(y = rnorm(60), group = rep(c("A", "B", "C"), 20))
ggplot(df, aes(y = y)) +
geom_point() +
facet_wrap(~group)
Naturally, you can do it using something like the following, but it is quite cumbersome.
df = df %>%
group_by(group) %>%
mutate(row = row_number())
ggplot(df, aes(x = row, y = y)) +
geom_point() +
facet_wrap(~group)

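You can try this:
ggplot(df, aes(x = seq_along(y), y = y)) + geom_point() + facet_wrap(~group)
This avoids creating an index variable, as you mentioned. Note, however, that the index is computed over the whole data frame, so x is the overall row number (1 to 60 here) rather than the row number within each facet.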
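If the x position should restart at 1 inside each facet, one option that still avoids a separate index column is base R's ave(), which applies a function within each group. This is only a sketch: the data frame mirrors the one in the question, and idx is computed separately just to show what ends up on the x-axis.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(y = rnorm(60), group = rep(c("A", "B", "C"), 20))

# ave() applies seq_along() within each group, giving a per-group row number
idx <- ave(seq_along(df$y), df$group, FUN = seq_along)

# The same expression can be used directly inside aes()
p <- ggplot(df, aes(x = ave(seq_along(y), group, FUN = seq_along), y = y)) +
  geom_point() +
  facet_wrap(~group) +
  labs(x = "row within group")
print(p)
```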

Boxplot for several variables with different Y scale

I have 4 variables (A, B, C, D) with a similar pattern at 3 locations. I would like to draw a box plot (variable values as dots on the Y-axis, locations on the X-axis), but the variables have values of different orders of magnitude. Is there a way of scaling the Y-axis so that all variables can be plotted as boxplots, perhaps distinguished by colour?
Location = c("Washington","Washington","Washington","Washington","Washington","Washington", "Maine","Maine","Maine","Maine","Maine", "Florida","Florida","Florida","Florida","Florida","Florida")
A = c(0.000693156, 0.000677354, 0.000727863, 0.000650822, 0.000908343, 0.001126689, 0.001316292, 0.000975274, 0.00109082, 0.001057585, 0.000927826, 0.000552769, 0.000532546, 0.000559781, 0.000771569, 0.000563436, 0.000551136)
B = c(0.001915388, 0.001936627, 0.001476521, 0.001573681, 0.002584282, 0.00738909, 0.008089839, 0.006616564, 0.00495211, 0.004515925, 0.003791596, 0.000653847, 0.000350701, 0.000559781, 0.001920087, 0.000738206, 0.001077627)
C = c(0.000138966, 0.000104745, 0.000145573, 0.000103305, 5.08255E-05, 0.000361988, 0.000264876, 0.000454172, 0.000277471, 0.000117919, 8.9214E-05, 0.000173727, 0.000108241, 8.54628E-05, 2.35593E-05, 3.1302E-05, 1.12019E-05)
D = c(0.000108829, 0.000135005, 0.000120617, 9.29746E-05, 0.000105561, 9.27596E-05, 0.000121317, 0.000131471, 0.000152503, 0.000128974, 0.000196271, 0.000142141, 0.000147208, 0.00013674, 0.000147246, 0.000185204, 0.000103058)
df = data.frame(Location, A, B, C, D)
And this is what I have tried for two variables as individual graphs
library(ggplot2)
a <- ggplot(df, aes(x=Location, y=A)) +
geom_boxplot()
a + geom_dotplot(binaxis='y', stackdir='center', dotsize=1, fill="red")
b <- ggplot(df, aes(x=Location, y=B)) +
geom_boxplot()
b + geom_dotplot(binaxis='y', stackdir='center', dotsize=1, fill="blue")
Can I merge all 4 variables in 1 graph with a scaled Y-axis?
Can I add a legend only showing "A" and "D"?

If you reshape your data to "long" format, faceting is one option. Note that you must set scales = 'free' in facet_wrap().
library(tidyverse)
df.long <- df %>%
pivot_longer(A:D, names_to = 'variable', values_to = 'value')
g <- ggplot(data = df.long, aes(x = Location, y = value)) +
geom_boxplot() +
facet_wrap(facets = ~variable, scales = 'free')
print(g)
If you wanted to get everything on one plot, you'd have to rescale the data per group. Here I've normalized each data point to between 0 and 1, relative to its original scale.
df.long <- df %>%
pivot_longer(A:D, names_to = 'variable', values_to = 'value') %>%
group_by(variable) %>%
mutate(value_norm = value - min(value),
value_norm = value_norm / max(value_norm)
)
g.norm <- ggplot(data = df.long, aes(x = Location, y = value_norm, fill = variable)) +
geom_boxplot()
print(g.norm)
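To make the per-group rescaling concrete, here is a small worked sketch on made-up numbers (the toy data frame and the helper rescale01 are hypothetical, not part of the data above). Each value is mapped to (value - min) / (max - min) within its own variable, so every variable ends up spanning 0 to 1 regardless of its original magnitude.

```r
# Hypothetical toy data: two variables on very different scales
toy <- data.frame(
  variable = rep(c("A", "B"), each = 3),
  value    = c(1, 2, 3, 100, 150, 300)
)

# Min-max rescaling of a numeric vector to the range [0, 1]
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# ave() applies the rescaling separately within each variable
toy$value_norm <- ave(toy$value, toy$variable, FUN = rescale01)
print(toy)
```

Keep in mind that after this rescaling the y-axis only shows relative position within each variable, so the original units are lost.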

Try this, using scale_y_log10. It is not the most beautiful plot, but it puts all four variables on one axis.
library(ggplot2)
library(tidyr)
library(dplyr)
df %>%
pivot_longer(-Location) %>%
ggplot(aes(x=Location, y=value, color = name)) +
geom_boxplot() +
geom_dotplot(aes(fill = name), color = "black", binaxis='y', dotsize=.5) +
scale_y_log10()
#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2020-04-14 by the reprex package (v0.3.0)

Annotate with ggplot2 when axis is of class 'date'

I'm dealing a lot with geom_line plots these days. What is the easiest way to annotate on a plot with an axis of class date? Other than to convert the date variable to a different class?
Here's my code:
china_trades %>%
filter(type %in% c("Imports")) %>%
ggplot() +
geom_line(aes(x = month, y = dollars, group = 1)) +
theme_minimal()
I would like to annotate the last data point which is at 2017-10 and 48.
Here's my plot:

Maybe somebody can chime in with a pure ggplot2 way of doing this, but the directlabels package has this functionality:
library(directlabels)
china_trades %>%
  filter(type %in% c("Imports")) %>%
  # x and y are mapped in ggplot() so that geom_dl() inherits them as well
  ggplot(aes(x = month, y = dollars)) +
  geom_line(aes(group = 1)) +
  theme_minimal() +
  geom_dl(aes(label = month), method = list(dl.combine("last.points")))
Edit: Here's a ggplot2 way using annotate:
x <- as.Date(c('2016-1-1','2016-1-2','2016-1-3','2016-1-4'))
y <- c(4, 1, 2, 3)
df <- data.frame(x, y)
lastDate <- max(df$x)
lastDateY <- df$y[df$x == lastDate]
ggplot(df) +
  geom_line(aes(x = x, y = y)) +
  annotate(geom = 'text', x = lastDate, y = lastDateY, vjust = -2, label = "China")

Grouping data outside limits in histogram using ggplot2

I am trying to draw a histogram zoomed in on part of the data. My problem is that I would like to group everything that is outside the range into a last category, "10+". Is it possible to do this using ggplot2?
Sample code:
library(ggplot2)
library(scales)  # for percent
x <- data.frame(runif(10000, 0, 15))
ggplot(x, aes(runif.10000..0..15.)) +
  geom_histogram(aes(y = (..count..)/sum(..count..)), colour = "grey50", binwidth = 1) +
  scale_y_continuous(labels = percent) +
  coord_cartesian(xlim = c(0, 10)) +
  scale_x_continuous(breaks = 0:10)
Here is how the histogram looks now:
[Figure: how the histogram looks now]
And here is how I would like it to look:
[Figure: how the histogram should look]
It is probably possible to do this by nesting ifelse() calls, but since my real problem has more cases, is there a way for ggplot to do it?

You could use forcats and dplyr to categorize the values, collapse the last levels into one and then compute the percentages before plotting. Something like this should work:
library(forcats)
library(dplyr)
library(ggplot2)
library(scales)  # for percent
x <- data.frame(x = runif(10000, 0, 15))
x2 <- x %>%
  mutate(x_grp = cut(x, breaks = seq(0, 15, 1))) %>%
  # levels 11 to 15 are the bins above 10: (10,11], ..., (14,15]
  mutate(x_grp = fct_collapse(x_grp, "10+" = levels(x_grp)[11:15])) %>%
  group_by(x_grp) %>%
  summarize(count = n())
ggplot(x2, aes(x = x_grp, y = count / sum(count))) +
  geom_bar(stat = "identity", colour = "grey50") +
  scale_y_continuous(labels = percent)
The resulting graph looks quite different from your example, but I think it is correct: the data come from a uniform distribution, so the pooled last category is much taller than the one-unit bins.
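If you would rather keep geom_histogram() and a numeric axis, another possible approach is to cap the values before plotting, so that everything above the limit falls into one final bin. This is a sketch under assumptions: the cap of 10.5 (the centre of the bin from 10 to 11) and the column name x_capped are illustrative choices, and the tick at 10 is simply relabelled "10+".

```r
library(ggplot2)

set.seed(1)
x <- data.frame(x = runif(10000, 0, 15))

# Cap the values: everything above 10.5 is moved into the bin from 10 to 11
x$x_capped <- pmin(x$x, 10.5)

p <- ggplot(x, aes(x_capped)) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 binwidth = 1, boundary = 0, colour = "grey50") +
  # Relabel the last tick so the final bin reads as "10+"
  scale_x_continuous(breaks = 0:10, labels = c(0:9, "10+")) +
  scale_y_continuous(labels = scales::percent)
print(p)
```

With more cut-off cases, the same idea extends by combining pmin() and pmax() rather than nesting ifelse() calls.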
