How to make this graph that compares ranks in R?

I'm trying to make a graph in R like the one in the picture. I tried the code below, but the result doesn't look the same: I want it to be symmetrical, just like the one in the picture.
My data.frame looks like this:
Group Ranking1 Ranking2 Pop
a 1 1 12345
b 2 4 127868
c 3 2 123477
d 4 3 9485
e 5 7 132588
f 6 5 38741
g 7 9 8372
h 8 11 53423
i 9 6 238419
j 10 16 31314
And the code I used was:
ggparcoord(data,
columns = 2:3, groupColumn = 1,
scale="globalminmax",
showPoints = TRUE,
title = "Ranking",
alphaLines = 0.3
) + scale_color_viridis(discrete=TRUE) + theme_ipsum()+ theme_void()
But I can't make it look like this one:

If I understand correctly what you mean with "symmetrical": You won't be able to reproduce a graph like this if the Rankings in the two columns don't match. In Ranking1 you have c(1:10), in Ranking2 you have c(1:7, 9, 11, 16).
Here's a minimal example to get closer to your goal:
Data
# Data with corrected rankings (1:10)
data <- read.table(text="
Group Ranking1 Ranking2 Pop
a 1 1 12345
b 2 4 127868
c 3 2 123477
d 4 3 9485
e 5 7 132588
f 6 5 38741
g 7 9 8372
h 8 8 53423
i 9 6 238419
j 10 10 31314
", header = TRUE)
Code
# Build plot
GGally::ggparcoord(data,
columns = 2:3, groupColumn = 1,
scale="globalminmax",
showPoints = TRUE,
title = "Ranking"
) +
# Reversed y axis with custom breaks to recreate 1:10 rankings
scale_y_reverse(breaks = 1:10)
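The mismatch described above can also be checked and repaired in code rather than by hand. This is only a sketch, assuming it is acceptable to re-rank Ranking2 with base R's rank(); the helper is_valid_ranking() is a hypothetical name introduced here. Re-ranking keeps the original relative order, so it gives g = 8 and h = 9, whereas the hand-corrected table above uses g = 9 and h = 8.

```r
# Original rankings from the question
data <- read.table(text = "
Group Ranking1 Ranking2 Pop
a 1 1 12345
b 2 4 127868
c 3 2 123477
d 4 3 9485
e 5 7 132588
f 6 5 38741
g 7 9 8372
h 8 11 53423
i 9 6 238419
j 10 16 31314
", header = TRUE)

# Hypothetical helper: a column is a valid ranking of n items
# if it contains exactly the values 1..n
is_valid_ranking <- function(x) setequal(x, seq_along(x))

# Ranking2 skips some values (8, 10) and overshoots (11, 16)
was_valid <- is_valid_ranking(data$Ranking2)

# Re-rank so the values become a permutation of 1:10,
# preserving the original relative order
data$Ranking2 <- rank(data$Ranking2, ties.method = "first")
```

After this step both ranking columns cover 1:10, so the two axes of the parallel-coordinates plot line up.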
Edit: Making it pretty
If you want to add some pizzazz (as you were trying to do), you can do the following (no need to use theme_void()):
GGally::ggparcoord(data,
columns = 2:3, groupColumn = 1,
scale="globalminmax",
showPoints = TRUE,
title = "Ranking"
) +
# Reverses scale, adds pretty breaks
scale_y_reverse(breaks = 1:10) +
# Prettifies typography etc.
hrbrthemes::theme_ipsum() +
# Removes gridlines
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()
) +
# Removes axis labels
labs(
y = NULL,
x = NULL
)

Related

Normalize/proportionalize count of individual bins 2d histogram

I am trying to normalize the counts of the individual bins in a 2d histogram. Group 3 has a substantially higher number of inputs, but I want to compare bins across groups. So I am trying to show proportional y values, so that the bin values for each class add up to, e.g., 100.
I reckon that this has to be done on the data frame beforehand. I have managed to normalize the values per group, but I haven't managed to rescale the counts so that I can visualize them this way with the 2d histogram function.
perClassNormalized <- Variables %>%
group_by(Class) %>%
mutate(Nor = procntStad/(max(procntStad)))
Variables is a data frame with about 10 variables (columns), each with x entries for each of 5 classes. The current total counts per class are: 1 = 639, 2 = 247, 3 = 9881, 4 = 1084, 5 = 823. So the number of inputs for class 3 is substantially higher than for the others.
Class variable1 variable2
1 3 7
1 2 3
2 2 6
2 5 8
3 3 9
3 2 1
3 2 3
3 8 4
4 9 5
5 10 2
Example of what image I currently have
my_breaks = c(2, 10, 50, 100, 5000)
##
procentStadVSKlasse <- ggplot(perClassNormalized , aes(x = Class, y = (Nor))) + geom_bin2d(bins = 10) +
ylab("Percentage bebouwd oppervlak") + xlab("Norm klasse regionale kering") +
labs(title = "Bebouwd oppervlak") +
scale_fill_gradient(name = "count", trans = "log", breaks = my_breaks, labels = my_breaks,
low = '#55C667FF', high = '#FDE725FF') +
theme_bw() +
scale_x_discrete(limits = c(1,2,3,4,5)) +
theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title.x = element_text(size=14),
axis.text.x = element_text(size=12),
axis.title.y = element_text(size=14))
The new image should likely be similar, however, the visualization is likely to be improved and distinctions are hopefully more easily spotted.
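One possible way to get per-class proportions is to bin the values yourself, count per class and bin, and convert each class's counts to percentages before plotting. The sketch below uses made-up toy data in place of the real Variables data frame; the names toy, pct and pct_df, the 10 equal-width bins on [0, 1], and the choice of geom_tile() are all assumptions, not part of the question.

```r
set.seed(1)

# Toy stand-in for the real data: class 3 has far more rows than the others
toy <- data.frame(
  Class = rep(1:3, times = c(20, 5, 200)),
  Nor   = runif(225)
)

# Assign each value to one of 10 equal-width bins on [0, 1]
toy$bin <- cut(toy$Nor, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# Count rows per (Class, bin), then turn each class's counts into percentages
counts <- table(Class = toy$Class, bin = toy$bin)
pct    <- 100 * prop.table(counts, margin = 1)  # each row (class) sums to 100

# Long format for plotting: one row per (Class, bin) with its percentage
pct_df <- as.data.frame(pct, responseName = "pct")

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pct_df, aes(x = Class, y = bin, fill = pct)) +
    geom_tile() +
    labs(fill = "% of class")
}
```

Because the fill now encodes the share within each class, a large class such as 3 no longer dominates the colour scale.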

R: geom_signif in a ggplot2 plot with facets; why can't I specify comparisons that do not include the first group?

I am trying to do something that I have done in the past successfully, but now I am not sure if I am missing something or something has changed in newer versions, but for the life of me I cannot make it work...
I just want to make a faceted ggplot2 plot like the one below, using geom_signif on top of it to actually include the percentage change from each value plotted, compared to all the rest.
To make things simple, however, I am just adding "foo" to the geom_signif brackets (no percentage change values yet).
So I have a data frame like this one:
mydata <- data.frame(Rep=rep(paste0('rep',1:5), 4),
Population=rep(paste0(LETTERS[1:4],'cells'), each=5),
Value=c(runif(15,1,5), runif(5,30,40)))
Rep Population Value
1 rep1 Acells 3.906863
2 rep2 Acells 2.391534
3 rep3 Acells 2.417360
4 rep4 Acells 4.956607
5 rep5 Acells 1.018905
6 rep1 Bcells 3.348250
7 rep2 Bcells 1.979448
8 rep3 Bcells 3.499493
9 rep4 Bcells 4.161168
10 rep5 Bcells 4.705278
11 rep1 Ccells 1.854068
12 rep2 Ccells 4.514578
13 rep3 Ccells 3.430654
14 rep4 Ccells 4.418377
15 rep5 Ccells 1.228447
16 rep1 Dcells 39.763432
17 rep2 Dcells 36.528565
18 rep3 Dcells 31.575392
19 rep4 Dcells 34.956205
20 rep5 Dcells 39.882848
I just try to make the plot the following way; note that I want to save the plot in P and access it afterwards to include my actual values, so let's try to keep it that way.
As you can see, though, the problem I am encountering is that I can only include comparisons that involve rep1 for some reason... the moment I try to include the comparison rep3 vs rep4, for example, it is ignored! Why is this happening?
P <- ggplot2::ggplot(mydata, ggplot2::aes(x=Rep, y=Value, group=1)) +
ggplot2::geom_line(color="black") +
ggplot2::geom_point(color="red", shape=16, size=5) +
ggplot2::facet_wrap(.~Population, scales="free", ncol=2) +
ggplot2::theme_light() +
ggsignif::geom_signif(comparisons=list(c("rep1","rep2"),c("rep1","rep3"),c("rep3","rep4")),
annotation="foo",
textsize=5,
size=1)
# build the plot (get a list of data frames out of it)
P2 <- ggplot2::ggplot_build(P)
# in list 3 we have access to each annotation
head(P2$data[[3]], 20)
# plot
grDevices::pdf.options(reset = TRUE, onefile = FALSE)
grDevices::pdf(file="test.pdf", height=10, width=10)
print(#or ggsave()
graphics::plot(ggplot2::ggplot_gtable(P2))
)
grDevices::dev.off()
You can see how the rep3 vs rep4 comparison was totally ignored in head(P2$data[[3]], 20), see panel 1:
x xend y yend annotation group PANEL shape colour textsize angle hjust vjust alpha family fontface
1 1 1 5.035361 5.153492 foo rep1-rep2-1 1 19 black 5 0 0.5 0 NA 1
2 1 2 5.153492 5.153492 foo rep1-rep2-1 1 19 black 5 0 0.5 0 NA 1
3 2 2 5.153492 5.035361 foo rep1-rep2-1 1 19 black 5 0 0.5 0 NA 1
4 1 1 5.035361 5.153492 foo rep1-rep3-2 1 19 black 5 0 0.5 0 NA 1
5 1 3 5.153492 5.153492 foo rep1-rep3-2 1 19 black 5 0 0.5 0 NA 1
6 3 3 5.153492 5.035361 foo rep1-rep3-2 1 19 black 5 0 0.5 0 NA 1
And of course, the final plot doesn't show the bracket for that comparison (only for the rep1 comparisons):
Any idea why this is happening?
As an additional question: how would I specify the y_position for all the final brackets in all panels? I know how to do it if it's the same for all panels, but note Dcells Population has a different range of values, so I would like to keep the "free" scales.
Many thanks!
It looks like you need to move 'group = 1' out of the global aes() and into geom_line(), so that the grouping is not inherited by geom_signif(), e.g.:
library(tidyverse)
library(ggsignif)
mydata <- data.frame(Rep=rep(paste0('rep',1:5), 4),
Population=rep(paste0(LETTERS[1:4],'cells'), each=5),
Value=c(runif(15,1,5), runif(5, 30,40)))
P <- ggplot(mydata, aes(x = Rep, y = Value)) +
geom_point(color = "red", shape = 16, size = 5) +
geom_line(aes(group = 1)) +
facet_wrap(. ~Population, scales = "free_y", ncol = 2) +
theme_light() +
geom_signif(comparisons=list(c("rep1", "rep2"),
c("rep1", "rep3"),
c("rep3", "rep4")),
annotations = c("foo 1v2", "foo 1v3", "foo 3v4"),
textsize=4,
size=1,
step_increase = 0.1)
# build the plot (get a list of data frames out of it)
P2 <- ggplot2::ggplot_build(P)
# in list 3 we have access to each annotation
head(P2$data[[3]], 20)
# plot
grDevices::pdf.options(reset = TRUE, onefile = FALSE)
grDevices::pdf(file="test.pdf", height=10, width=10)
print(#or ggsave()
graphics::plot(ggplot2::ggplot_gtable(P2))
)
grDevices::dev.off()
You might want to consider dropping the free y scale too, depending on your application, e.g.:
P <- ggplot(mydata, aes(x = Rep, y = Value)) +
geom_point(color = "red", shape = 16, size = 5) +
geom_line(aes(group = 1)) +
facet_wrap(. ~Population, ncol = 2) +
theme_light() +
ylim(c(0,60)) +
geom_signif(comparisons=list(c("rep1", "rep2"),
c("rep1", "rep3"),
c("rep3", "rep4")),
annotations = c("foo 1v2", "foo 1v3", "foo 3v4"),
textsize=4,
size=1,
step_increase = 0.1)
# build the plot (get a list of data frames out of it)
P2 <- ggplot2::ggplot_build(P)
# in list 3 we have access to each annotation
head(P2$data[[3]], 20)
# plot
grDevices::pdf.options(reset = TRUE, onefile = FALSE)
grDevices::pdf(file="test.pdf", height=10, width=10)
print(#or ggsave()
graphics::plot(ggplot2::ggplot_gtable(P2))
)
grDevices::dev.off()
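For the follow-up question about a different y_position in each panel, one option that may work is ggsignif's manual mode, where the brackets come from a data frame that includes the facetting column. The sketch below rests on several assumptions: that geom_signif(manual = TRUE) accepts the xmin, xmax, annotations and y_position aesthetics as shown in the ggsignif README, that an 8% step above each panel's maximum is a sensible height, and that some extra headroom on the y axis is needed. The names comparisons, panel_max, annotation_df and ypos are made up for this example.

```r
set.seed(42)
mydata <- data.frame(
  Rep        = rep(paste0("rep", 1:5), 4),
  Population = rep(paste0(LETTERS[1:4], "cells"), each = 5),
  Value      = c(runif(15, 1, 5), runif(5, 30, 40))
)

# One row per bracket; `step` controls how high each bracket is stacked
comparisons <- data.frame(
  start = c("rep1", "rep1", "rep3"),
  end   = c("rep2", "rep3", "rep4"),
  label = c("foo 1v2", "foo 1v3", "foo 3v4"),
  step  = 1:3
)

# Highest value in each panel
panel_max <- aggregate(Value ~ Population, data = mydata, FUN = max)
names(panel_max)[2] <- "panel_max"

# Cross join: every comparison in every panel (4 panels x 3 brackets)
annotation_df <- merge(panel_max, comparisons, by = NULL)

# Bracket height relative to each panel's own maximum (assumed 8% per step)
annotation_df$ypos <- annotation_df$panel_max * (1 + 0.08 * annotation_df$step)

# Build the plot only if both packages are installed
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggsignif", quietly = TRUE)) {
  library(ggplot2)
  P <- ggplot(mydata, aes(x = Rep, y = Value)) +
    geom_point(color = "red", shape = 16, size = 5) +
    geom_line(aes(group = 1)) +
    facet_wrap(~ Population, scales = "free_y", ncol = 2) +
    theme_light() +
    ggsignif::geom_signif(
      data = annotation_df,
      aes(xmin = start, xmax = end, annotations = label, y_position = ypos),
      manual = TRUE, textsize = 4
    ) +
    # Extra room at the top so the brackets and labels are not clipped
    scale_y_continuous(expand = expansion(mult = c(0.05, 0.2)))
}
```

ggplot2 will likely warn about unknown aesthetics when the layer is created; the ggsignif README shows the same pattern, but it is worth checking the result against your installed versions.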

Need to make a 2x2 ggplot in R

I have the following data:
unigrams Freq
1 the 236133
2 to 154296
3 and 128165
4 a 127434
5 i 124599
6 of 103380
7 in 81985
8 you 69504
9 is 65243
10 for 62425
11 it 60298
12 that 58605
13 on 45935
14 my 45424
15 with 38270
16 this 34799
17 was 33009
18 be 32725
19 have 31728
20 at 30255
and this set of data:
bigrams Freq
1 of the 20707
2 in the 19443
3 for the 11090
4 to the 10939
5 on the 10280
6 to be 9555
7 at the 7184
8 i have 6408
9 and the 6387
10 i was 6143
11 is a 6114
12 and i 5993
13 i am 5843
14 in a 5770
15 it was 5644
16 for a 5343
17 if you 5326
18 it is 5196
19 with the 5092
20 have a 4936
I would like to place two qplots together side-by-side, ncol = 2. I tried the gridExtra library, but it is generating errors that I can't seem to figure out how to correct. Any ideas on how to do this, please?
library(gridExtra)
# The 20 most frequent unigrams in the dataset
ugrams <- as.data.frame(unigrams)
graph.data <- ugrams[order(ugrams$Freq, decreasing = T), ]
graph.data <- graph.data[1:20, ]
p1 <- qplot(unigrams,Freq, data=graph.data,fill=unigrams,geom=c("histogram"))
# The 20 most frequent bigrams in the dataset
bgrams <- as.data.frame(bigrams)
graph.data <- bgrams[order(bgrams$Freq, decreasing = T), ]
graph.data <- graph.data[1:20, ]
p2 <- qplot(bigrams,Freq, data=graph.data,fill=bigrams,geom=c("histogram"))
grid.arrange(p1,p2,ncol=2)
This is the error that is generated:
<error/rlang_error>
stat_bin() can only have an x or y aesthetic.
Backtrace:
1. (function (x, ...) ...
2. ggplot2:::print.ggplot(x)
4. ggplot2:::ggplot_build.ggplot(x)
5. ggplot2:::by_layer(function(l, d) l$compute_statistic(d, layout))
6. ggplot2:::f(l = layers[[i]], d = data[[i]])
7. l$compute_statistic(d, layout)
8. ggplot2:::f(..., self = self)
9. self$stat$setup_params(data, self$stat_params)
10. ggplot2:::f(...)
I would like to have the graphs resemble this one:
Which was accomplished by the following code:
# The 20 most frequent quadgrams in the dataset
qgrams <- as.data.frame(quadgrams)
graph.data <- qgrams[order(qgrams$Freq, decreasing = T), ]
graph.data <- graph.data[1:20, ]
ggplot(data=graph.data, aes(x=quadgrams, y=Freq, fill=quadgrams)) + geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle = 40, hjust = 1))
Is that possible?
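Edited for your shift from histograms to bar plots. The error arises because geom = "histogram" uses stat_bin(), which accepts only one of the x or y aesthetics; with precomputed frequencies, use geom_bar(stat = "identity") as in your quadgrams code. Assuming that graph.data is actually your ugrams dataset, that code also gives the working single plot.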
Putting them side-by-side can be done with facets:
library(dplyr)
library(ggplot2)
# Stack both data frames; .id records which one each row came from
bind_rows(
unigrams = select(ugrams, grams = unigrams, Freq),
bigrams = select(bgrams, grams = bigrams, Freq),
.id = "id") %>%
# Sort by descending frequency so the factor levels follow that order
arrange(-Freq) %>%
mutate(
id = factor(id, levels = c("unigrams", "bigrams")),
grams = factor(grams, levels = grams)
) %>%
ggplot(aes(x = grams, y = Freq, fill = grams)) +
facet_wrap(~ id, ncol = 2, scales = "free_x") +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 40, hjust = 1))
(Obviously, these are "too small" to hold all of the legend, but that depends on where you are using it. I wonder if the legend shouldn't be included, since it is somewhat redundant with the x-axis labels.)
The bigram panel is harder to read because its bars are dwarfed by the much larger unigram counts on the shared y-axis. While it does bias the plot (it might be natural to compare the vertical levels of one panel with those of the other), you can alleviate that by freeing both the "x" (already free) and "y" axes with scales = "free" in facet_wrap().
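A minimal self-contained sketch of the free-scales variant follows. It uses only the first five rows of each table from the question, builds the combined data frame in base R, and drops the legend; the object name both and the use of geom_col() (equivalent to geom_bar(stat = "identity")) are choices made for this example.

```r
# First five rows of each table from the question
ugrams <- data.frame(
  unigrams = c("the", "to", "and", "a", "i"),
  Freq     = c(236133, 154296, 128165, 127434, 124599)
)
bgrams <- data.frame(
  bigrams = c("of the", "in the", "for the", "to the", "on the"),
  Freq    = c(20707, 19443, 11090, 10939, 10280)
)

# Stack the two tables with a column recording the source
both <- rbind(
  data.frame(id = "unigrams", grams = ugrams$unigrams, Freq = ugrams$Freq),
  data.frame(id = "bigrams",  grams = bgrams$bigrams,  Freq = bgrams$Freq)
)

# Order bars by descending frequency and fix the panel order
both <- both[order(-both$Freq), ]
both$id    <- factor(both$id, levels = c("unigrams", "bigrams"))
both$grams <- factor(both$grams, levels = both$grams)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(both, aes(x = grams, y = Freq, fill = grams)) +
    geom_col(show.legend = FALSE) +
    # "free" frees both axes, so each panel gets its own y range
    facet_wrap(~ id, ncol = 2, scales = "free") +
    theme(axis.text.x = element_text(angle = 40, hjust = 1))
}
```

With independent y axes the bigram bars fill their panel, at the cost of the two panels no longer being directly comparable in height.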

From basic plot to ggplot, how to convert cex for symbol size

A friend of mine helped me write code to generate a scatter plot using the basic plot function. Now I would like to make the same plot using ggplot, but I do not know how to convert the cex parameter of the basic plot function into the corresponding ggplot option.
This is an example of my data
df=read.table(text="
A B C D E F G H I J
1 2 3 4 5 1 2 3 4 5
2 3 4 5 6 2 3 4 5 6
3 4 5 6 7 3 4 5 6 7
4 5 6 7 8 4 5 6 7 8
5 6 7 8 9 5 6 7 8 9
6 7 8 9 10 6 7 8 9 10
7 8 9 10 11 7 8 9 10 11
8 9 10 11 12 8 9 10 11 12
9 10 11 12 13 9 10 11 12 13",header=T)
And this is the code I use for the basic plot function
temp <- as.matrix(df)
x <- ncol(temp)/2
y <- nrow(temp)
maxtemp <- max(temp [ , 1:x], na.rm = T)
plot(rep(1, y) ~ temp [, x + 1], type = "p", pch = 1, cex = 5*temp [, 1]/maxtemp , xlim = c(0, 15), ylim = c(0,6))
for(i in 1:(x-1)){
points(rep(1 + i, y) ~ temp [, x + 1 + i], type = "p", pch = 1, cex = 5*temp [, 1 + i]/maxtemp)
next}
To make the same plot in ggplot I wrote the code below. However, I do not get the same picture: the dots are not the same size. I guess this is because plot uses cex while ggplot uses size, and I do not know how to deal with this...
temp <- data.table(df)
colnames(temp) <- c(paste("I",c(1:5), sep=""), c(1:5))
x <- ncol(temp)/2
maxtemp <- max(temp [ , 1:x], na.rm = T)
for(t in 1:5){
temp[, t] <- 5*temp[, t, with = FALSE]/maxtemp
next}
#I do this to create the 'cex' values, as cex does not exist in ggplot
ggplot(gather(temp[, 6:10]),aes(x = value, y = as.numeric(key))) + geom_point(aes(size = gather(temp[, 1:5])$value), shape = 1) + xlim(0, 15) + ylim(0, 6) + theme_bw() + theme(legend.position = "none", panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + xlab("") + ylab("")
What am I doing wrong?
Ok, I think I found a way to do what I wanted.
I was looking for something else and I came across the following sentence in the reference manual of ggplot2
# To set aesthetics, wrap in I()
qplot(mpg, wt, data = mtcars, colour = I("red"))
So I gave it a try, and after a few attempts I ended up with the following code
ggplot(gather(temp[, 6:10]),aes(x = value, y = as.numeric(key))) + geom_point(aes(size = I(gather(temp[, 1:5])$value*2)), shape = 1) + xlim(0, 15) + ylim(0, 6) + theme_bw() + theme(legend.position = "none", panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + xlab("") + ylab("")
Using aes(size = I(gather(temp[, 1:5])$value*2)) I get almost exactly the same plot as if I use the plot() function (first block of code).
I do not fully understand the relationship between cex and aes(I()), but it does what I was looking for. Maybe someone can comment on this.
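On the I() part: wrapping a value in I() inside aes() tells ggplot2 to use it as is, so the numbers are taken directly as point sizes (ggplot2 measures point size in mm) instead of being rescaled by the size scale. scale_size_identity() does the same thing explicitly. The sketch below rebuilds the plot that way without tidyr or data.table; the factor of 2 is simply the value found by trial and error above, not a derived cex-to-size conversion, and the name long is made up for this example.

```r
df <- read.table(text = "
A B C D E F G H I J
1 2 3 4 5 1 2 3 4 5
2 3 4 5 6 2 3 4 5 6
3 4 5 6 7 3 4 5 6 7
4 5 6 7 8 4 5 6 7 8
5 6 7 8 9 5 6 7 8 9
6 7 8 9 10 6 7 8 9 10
7 8 9 10 11 7 8 9 10 11
8 9 10 11 12 8 9 10 11 12
9 10 11 12 13 9 10 11 12 13", header = TRUE)

half  <- ncol(df) / 2
sizes <- as.matrix(df[, 1:half])                # first half: symbol sizes
xs    <- as.matrix(df[, (half + 1):ncol(df)])   # second half: x positions

# One row per point: x value, column index as y, and the base-plot cex value
long <- data.frame(
  x   = as.vector(xs),
  y   = as.vector(col(xs)),
  cex = 5 * as.vector(sizes) / max(sizes, na.rm = TRUE)
)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = x, y = y)) +
    # size values are used unchanged thanks to the identity scale;
    # the factor 2 is the empirical value from the answer above
    geom_point(aes(size = cex * 2), shape = 1) +
    scale_size_identity() +
    xlim(0, 15) + ylim(0, 6) +
    theme_bw() +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank()) +
    xlab("") + ylab("")
}
```

Without I() or an identity scale, ggplot2 maps the data range onto its default size range, which is why the first attempt did not match the base plot.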

ggplot2 facets: Different annotation text for each plot

I have the following generated data frame called Raw_Data:
Time Velocity Type
1 10 1 a
2 20 2 a
3 30 3 a
4 40 4 a
5 50 5 a
6 10 2 b
7 20 4 b
8 30 6 b
9 40 8 b
10 50 9 b
11 10 3 c
12 20 6 c
13 30 9 c
14 40 11 c
15 50 13 c
I plotted this data with ggplot2:
ggplot(Raw_Data, aes(x=Time, y=Velocity))+geom_point() + facet_grid(Type ~.)
I have the objects: Regression_a, Regression_b, Regression_c. These are the linear regression equations for each plot. Each plot should display the corresponding equation.
Using annotate displays the same equation on every plot:
annotate("text", x = 1.78, y = 5, label = Regression_a, color="black", size = 5, parse=FALSE)
I tried to overcome the issue with the following code:
Regression_a_eq <- data.frame(x = 1.78, y = 1,label = Regression_a,
Type = "a")
p <- x + geom_text(data = Raw_Data,label = Regression_a)
This did not solve the problem. Each plot still showed Regression_a, rather than just plot a.
You can put the expressions as character values in a new dataframe with the same unique Type's as in your data-dataframe and add them with geom_text:
regrDF <- data.frame(Type = c('a','b','c'), lbl = c('Regression_a', 'Regression_b', 'Regression_c'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type ~.)
which gives:
You can replace the text values in regrDF$lbl with the appropriate expressions.
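As a sketch of that last step, the labels can be generated from per-facet fits instead of typed by hand. The question does not show how Regression_a etc. were built, so this assumes a simple lm(Velocity ~ Time) per Type and a plain-text "y = a + b x" label; the label position (x = 10, y = 12) is arbitrary.

```r
# Data from the question
Raw_Data <- data.frame(
  Time     = rep(seq(10, 50, by = 10), 3),
  Velocity = c(1, 2, 3, 4, 5,  2, 4, 6, 8, 9,  3, 6, 9, 11, 13),
  Type     = rep(c("a", "b", "c"), each = 5)
)

# Fit one linear model per Type and format its coefficients as a label
regrDF <- do.call(rbind, lapply(split(Raw_Data, Raw_Data$Type), function(d) {
  fit <- lm(Velocity ~ Time, data = d)
  data.frame(
    Type = d$Type[1],
    lbl  = sprintf("y = %.2f + %.2f x", coef(fit)[1], coef(fit)[2])
  )
}))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
    geom_point() +
    # regrDF has a Type column, so each label lands in its own facet
    geom_text(data = regrDF, aes(x = 10, y = 12, label = lbl), hjust = 0) +
    facet_grid(Type ~ .)
}
```

The key point is unchanged: the label data frame must contain the facetting variable so that ggplot2 can route each row to the right panel.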
Just a supplement to the accepted answer, for the case where there are facets in both the horizontal and vertical directions.
regrDF <- data.frame(Type1 = c('a','a','b','b'),
Type2 = c('c','d','c','d'),
lbl = c('Regression_ac', 'Regression_ad', 'Regression_bc', 'Regression_bd'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type1 ~ Type2)
The answer is good but still imperfect as I do not know how to incorporate math expressions and newline simultaneously (Adding a newline in a substitute() expression).

Resources