How to visualize only specific geographic area with ggplot2? - r

I'm working with ggplot2 and I'm creating a geographic representation of my country.
This is the dataset and the script I'm using (prov2022 is the shapefile for the map):
#dataset
COD_REG COD_PROV Wage
1 91 530
1 92 520
1 93 510
2 97 500
2 98 505
2 99 501
13 102 700
13 103 800
13 159 900
18 162 740
18 123 590
18 119 420
19 162 340
19 123 290
19 119 120
#script
right_join(prov2022, dataset, by = "COD_PROV") %>%
ggplot(aes(fill = `Wage`)) +
geom_sf() +
theme_void() +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'))
It works fine, but now I'm interested in visualizing only a specific area.
If I add a filter to select the regions where COD_REG >= 13, I get the area I was looking for, but the color gradient changes.
right_join(prov2022, dataset, by = "COD_PROV") %>%
filter(COD_REG >= 13 ) %>%
ggplot(aes(fill = `Wage`)) +
geom_sf() +
theme_void() +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'))
With the filter, the color gradient is different because the colors are scaled using only the values of the selected areas and no longer those of the whole country.
As a consequence, these areas no longer have the colors they had at the beginning (that is, in the full map produced by the first script).
I need to keep the color gradient of the whole country, but have ggplot2 draw only some specific areas, without changing anything else.
How do I solve this?

Try this. I'll use fake data on the state map from package maps.
library(ggplot2)
library(maps)
usa <- sf::st_as_sf(map('state', plot = FALSE, fill = TRUE))
set.seed(42)
usa$val <- runif(length(usa$ID))
ggplot(usa, aes(fill = val)) +
geom_sf() +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'))
If we naively just filter the states we want to see, the colors change:
ggplot(usa, aes(fill = val)) +
geom_sf(data = ~ subset(., val > 0.5)) +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black'))
If we add geom_blank, though, we can normalize the range of values from which the scale is determined. Since it still uses all of the original data, and it does nothing with it (except to update scales and limits), it "costs" nothing as far as drawing (e.g.) transparent or super-small things in order to get its way. From ?geom_blank:
The blank geom draws nothing, but can be a useful way of ensuring
common scales between different plots. See 'expand_limits()' for
more details.
Code:
ggplot(usa, aes(fill = val)) +
geom_sf(data = ~ subset(., val > 0.5)) +
scale_fill_gradientn(colors = c( 'white', 'yellow', 'red', 'black')) +
geom_blank()
Notice that I'm using inline ~ rlang-style functions for subsetting the data; this is my convention but is not required.
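Another option, sketched here under stated assumptions, is to pin the fill limits to the range of the full data through the limits argument of scale_fill_gradientn. The wages data frame below is a made-up stand-in for the joined shapefile data, and geom_tile stands in for geom_sf so that the sketch runs without a shapefile; the same limits argument should carry over to the map version.

```r
library(ggplot2)

# Hypothetical stand-in for right_join(prov2022, dataset, by = "COD_PROV")
wages <- data.frame(
  COD_REG  = c(1, 1, 2, 13, 13, 18, 19),
  COD_PROV = c(91, 92, 97, 102, 103, 162, 119),
  Wage     = c(530, 520, 500, 700, 800, 740, 600)
)

pal <- c("white", "yellow", "red", "black")

# Range of the FULL data, computed before any filtering
full_range <- range(wages$Wage, na.rm = TRUE)

# Plot only the subset, but keep the colour scale anchored to the full range
p_sub <- ggplot(subset(wages, COD_REG >= 13),
                aes(x = factor(COD_PROV), y = 1, fill = Wage)) +
  geom_tile() +
  scale_fill_gradientn(colors = pal, limits = full_range)
```

Compared with geom_blank, this keeps the full data out of the plot entirely, at the cost of computing the range yourself.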

Related

Is it possible to add a few more details like rich factor to the bar graph along with the p-value?

Pathway                    #Proteins  Pvalue    Richfactor
Peptide chain elongation   90         1.11E-16  0.5
Translation elongation     79         1.11E-16  0.7
P53 pathway                50         1.11E-16  0.2
cGAS sting pathway         20         1.11E-16  0.4
The data is given above. Using this data I tried to generate a bar graph with the p-value and the number of proteins, but I want to add additional details to the graph, such as the rich factor given in the data above.
library(ggplot2)
library(viridis)
top_fun <- read.delim(file="Pathways.txt",header = TRUE)
topfun <- as.data.frame(top_fun)
#Turn your 'Name' column into a character vector
topfun$Pathway <- as.character(topfun$Pathway)
#Then turn it back into a factor with the levels in the correct order
topfun$Pathway<- factor(topfun$Pathway, levels=unique(topfun$Pathway))
ggplot(topfun,aes(x=Group,y=topfun$Proteins,fill=topfun$Pvalue)) +
geom_col(position="dodge",width=0.4) +
coord_flip() + scale_fill_viridis(option="mako")+
facet_grid(Pathway~.)+
theme(strip.text.y = element_text(angle = 0))
Using the above code I generated this graph.
I want to add additional details like rich factor to the graph. Thanks for the help!
The obvious thing to do is to map Richfactor to the fill variable. You can add the p-values directly as text, since they don't seem to be very helpful mapped to the fill scale, at least in this example:
ggplot(topfun,aes(x = 'WT', y = Proteins, fill = Richfactor)) +
geom_col(position = "dodge", width = 0.4, color = 'gray50') +
geom_text(aes(y = 1, label = paste('p =', Pvalue), color = Pathway),
hjust = 0) +
coord_flip() +
scale_fill_viridis_c(option = "mako") +
facet_grid(Pathway ~ .) +
theme(strip.text.y = element_text(angle = 0)) +
scale_color_manual(values = c('black', 'black', 'white', 'black'),
guide = 'none')

Stack Barplot (ggplot): Is there a way to use 'repeatable' values of filling in different order?

I am creating a stacked barplot showing different types of treatment for ovarian cancer. Each 'bar' represents a different treatment. Some patients are treated with the same combination therapy, but not necessarily in consecutive lines.
I've looked at this answer (#2), but it doesn't cut it.
I've attached a sample patient
record_id line treatment value
134 47 1 Carboplatin og Docetaxel 1
135 47 2 Carboplatin og Caelyx 1
136 47 3 Carboplatin og Caelyx 1
137 47 4 AVANOVA, arm 2 - Bevacizumab og NIraparib 1
138 47 5 Carboplatin og Caelyx 1
Using the following ggplot for the patients generates
library(tidyverse)
library(ggplot2)
stack %>%
ggplot(aes(x = record_id, y = value, fill = interaction(treatment,-line))) +
geom_bar(stat = "identity", position = "stack", data = stack %>% filter(record_id == 47)) +
guides(fill = guide_legend("ordering"))
I have also tried using the fill = reorder - same code as above. The result is
I was hoping to get a result looking like the first picture (with fill = interaction), but where the colors appear the same for the same treatment (in this example 'Carboplatin og Caelyx').
It sounds like you want the bars stacked in the first order, but with their fill solely based on treatment. I think this can be done by using group and fill together:
library(tidyverse)
stack %>%
ggplot(aes(x = record_id, y = value,
group = interaction(treatment,-line),
fill = treatment)) +
geom_bar(stat = "identity", position = "stack",
data = stack %>% filter(record_id == 47),
color = "white") +
guides(fill = guide_legend("ordering"))
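As a self-contained check of the idea, the sketch below rebuilds the sample patient from the question as a data frame (the name stack follows the question; geom_col is used as shorthand for geom_bar(stat = "identity"), and record_id is turned into a factor, which is an assumption about the desired axis). It then inspects the built layer to show that there are five stacked segments but only three fill colours.

```r
library(ggplot2)

# Sample patient from the question, retyped by hand
stack <- data.frame(
  record_id = 47,
  line      = 1:5,
  treatment = c("Carboplatin og Docetaxel",
                "Carboplatin og Caelyx",
                "Carboplatin og Caelyx",
                "AVANOVA, arm 2 - Bevacizumab og NIraparib",
                "Carboplatin og Caelyx"),
  value     = 1
)

# group controls the stacking order, fill controls only the colour
p <- ggplot(stack, aes(x = factor(record_id), y = value,
                       group = interaction(treatment, -line),
                       fill = treatment)) +
  geom_col(position = "stack", color = "white")

built <- layer_data(p)
n_segments <- nrow(built)                 # one segment per treatment line
n_colours  <- length(unique(built$fill))  # one colour per distinct treatment
```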

Show statistically significant difference in a graph

I have carried out an experiment with six treatments, and each treatment was performed in light and in darkness. I used ggplot2 to make a bar plot. I would like to add significance letters (e.g. an LSD result) to the graph to show the difference between light and darkness for each treatment, but it gives me an error.
Any suggestions?
data <- read.table(header = TRUE, text =
'T0 T1 T2 T3 T4 T5 LVD
40 62 50 45 45 58 Light
30 60 44 40 30 58 Light
30 68 42 35 32 59 Light
47 75 58 55 50 70 Dark
45 75 52 54 42 78 Dark
50 75 68 48 56 75 Dark
')
library(reshape2) # provides melt()
library(ggplot2)
gla <- melt(data, id = "LVD")
ggplot(gla, aes(x=variable, y=value, fill=as.factor(LVD))) +
stat_summary(fun.y=mean,
geom="bar",position=position_dodge(),colour="black",width=.7,size=.7) +
stat_summary(fun.ymin=min,fun.ymax=max,geom="errorbar",
color="black",position=position_dodge(.7), width=.2) +
scale_fill_manual("Legend", values = c("Light" = "white", "Dark" ="gray46")) +
xlab("Treatments")+
ylab("Germination % ") +
theme(panel.background = element_rect(fill = 'white', colour = 'black'))
Up to here it works perfectly, but when I add geom_text it gives an error:
+ geom_text(aes(label=c("a","b","a","a","a","a, a","b","a","b","a","b")))
The error is:
Error: Aesthetics must be either length 1 or the same as the data (36): label, x, y, fill
The problem is that you have 36 data points, which you summarize to 12. ggplot will only allow mapping to 36 data points in geom_text (which the error tells you). In order to use the summarized 12 points, you do need to use stat_summary once again.
The basic rule is that statistical transformations (like summaries) do *not* transfer between layers (i.e. geoms and stats). So geom_text has no idea what the y values computed by the original stat_summary actually are.
Then you also need to fix the typo in your letters.
We end up with:
ggplot(gla, aes(x=variable, y=value, fill=as.factor(LVD))) +
stat_summary(fun.y=mean,
geom="bar",position=position_dodge(),colour="black",width=.7,size=.7) +
stat_summary(fun.ymin=min,fun.ymax=max,geom="errorbar",
color="black",position=position_dodge(.7), width=.2) +
stat_summary(geom = 'text', fun.y = max, position = position_dodge(.7),
label = c("a","b","a","a","a","a", "a","b","a","b","a","b"), vjust = -0.5) +
scale_fill_manual("Legend", values = c("Light" = "white", "Dark" ="gray46")) +
xlab("Treatments") +
ylab("Germination % ") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 85)) +
theme_bw()
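An alternative that avoids a second stat_summary is to summarise the data yourself and hand geom_text its own 12-row data frame. The following is only a sketch: it reshapes with base R instead of reshape2, assumes ggplot2 3.3.0 or later (where the argument is called fun rather than fun.y), and uses placeholder letters that are not real LSD results.

```r
library(ggplot2)

data <- read.table(header = TRUE, text = "
T0 T1 T2 T3 T4 T5 LVD
40 62 50 45 45 58 Light
30 60 44 40 30 58 Light
30 68 42 35 32 59 Light
47 75 58 55 50 70 Dark
45 75 52 54 42 78 Dark
50 75 68 48 56 75 Dark
")

# Long format with base R (same shape as the melt() result in the question)
gla <- data.frame(
  LVD      = rep(data$LVD, times = 6),
  variable = rep(names(data)[1:6], each = nrow(data)),
  value    = unlist(data[1:6], use.names = FALSE)
)

# One row per bar: the maximum per treatment and light condition
labs_df <- aggregate(value ~ variable + LVD, data = gla, FUN = max)
# Placeholder letters only; replace with the letters from your own test
labs_df$label <- ifelse(labs_df$LVD == "Dark", "a", "b")

p <- ggplot(gla, aes(x = variable, y = value, fill = LVD)) +
  stat_summary(fun = mean, geom = "bar", position = position_dodge(0.7),
               colour = "black", width = 0.7) +
  # geom_text gets its own 12-row data, so label lengths match
  geom_text(data = labs_df, aes(label = label),
            position = position_dodge(0.7), vjust = -0.5)
```

Keeping the letters in a column next to variable and LVD also makes it explicit which letter belongs to which bar, instead of relying on the order of a bare vector.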
I don't like dynamite plots, so here's my version:
let <- c("a","b","a","a","a","a", "a","b","a","b","a","b")
stars <- ifelse(let[c(TRUE, FALSE)] == let[c(FALSE, TRUE)], '', '*')
ggplot(gla, aes(x = variable, y = value)) +
stat_summary(aes(col = as.factor(LVD)),
fun.y=mean, fun.ymin = min, fun.ymax = max,
position = position_dodge(.3), size = .7) +
stat_summary(geom = 'text', fun.y = max, position = position_dodge(.3),
label = stars, vjust = 0, size = 6) +
scale_color_manual("Legend", values = c("Light" = "black", "Dark" ="gray46")) +
xlab("Treatments") +
ylab("Germination % ") +
scale_y_continuous(expand = c(0.1, 0)) +
theme_bw()
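The stars line above relies on logical index recycling: c(TRUE, FALSE) picks the odd-numbered letters and c(FALSE, TRUE) the even-numbered ones, so each treatment's pair of letters is compared element by element. A small base R sketch of just that step (the names first and second are introduced here for illustration):

```r
let <- c("a","b","a","a","a","a", "a","b","a","b","a","b")

first  <- let[c(TRUE, FALSE)]  # elements 1, 3, 5, ... (one per treatment)
second <- let[c(FALSE, TRUE)]  # elements 2, 4, 6, ... (the paired letter)

# A star wherever the two letters of a pair differ
stars <- ifelse(first == second, "", "*")
stars
# "*" ""  ""  "*" "*" "*"
```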
I found this to be the simplest way to show statistical significance with asterisks and lines: add geom_text for the asterisks and annotate for the bracket (fig2 is the existing plot).
fig2 + geom_text(x = 1.5, y = 89, label = "***") +
annotate("segment", x = c(1, 1, 2), xend = c(1, 2, 2), y = c(84, 86, 86), yend = c(86, 86, 84), size = 1)

Modifying the Legend in ggplot2

I've got a problem interacting with the labels in ggplot2.
I have two data sets (Temperature vs. Time) from two experiments but recorded at different timesteps. I've managed to merge the data frames and put them in a long fashion to plot them in the same graph, using the melt function from the reshape2 library.
So, the initial data frames look something like this:
> d1
step Temp
1 512.5 301.16
2 525.0 299.89
3 537.5 299.39
4 550.0 300.58
5 562.5 300.20
6 575.0 300.17
7 587.5 300.62
8 600.0 300.51
9 612.5 300.96
10 625.0 300.21
> d2
step Temp
1 520 299.19
2 540 300.39
3 560 299.67
4 580 299.43
5 600 299.78
6 620 300.74
7 640 301.03
8 660 300.39
9 680 300.54
10 700 300.25
I combine it like this:
> mrgd <- merge(d1, d2, by = "step", all = T)
step Temp.x Temp.y
1 512.5 301.16 NA
2 520.0 NA 299.19
...
And put it into long format for ggplot2 with this:
> melt1 <- melt(mrgd, id = "step")
> melt1
step variable value
1 512.5 Temp.x 301.16
2 520.0 Temp.x NA
...
Now, I want to for example do a histogram of the distribution of values. I do it like this:
p <- ggplot(data = melt1, aes(x = value, color = variable, fill = variable)) + geom_histogram(alpha = 0.4)
My problem is when I try to modify the Legend of this graph, I don't know how to! I've followed what is suggested in the R Graphics Cookbook book, but I've had no luck.
I've tried to do this, for example (to change the labels of the Legend):
> p + scale_fill_discrete(labels = c("d1", "d2"))
But I just create a "new" Legend box, like so
Or even removing the Legend completely
> p + scale_fill_discrete(guide = F)
I just get this
Finally, doing this also doesn't help
> p + scale_fill_discrete("")
Again, it just adds a new Legend box
Does anyone know what's happening here? It looks as if I'm actually modifying another label object, if that makes any sense. I've looked into other related questions on this site, but I haven't found anyone having the same problem as me.
Get rid of the aes(color = variable...) to remove the scale that belongs to aes(color = ...).
ggplot(data = melt1, aes(x = value, fill = variable)) +
geom_histogram(alpha = 0.4) +
scale_fill_discrete(labels = c("d1", "d2")) # Change the labels for the `fill` scale
This second plot keeps aes(color = variable...). Color in this case draws colored outlines around the histogram bins. You can turn off the guide for the color scale so that you only have one legend, the one created from fill:
ggplot(data = melt1, aes(x = value, color = variable, fill = variable)) +
geom_histogram(alpha = 0.4) +
scale_fill_discrete(labels = c("d1", "d2")) +
scale_color_discrete(guide = "none") # Turn off the color (outline) legend
The most straightforward thing to do would be to not use reshape2 or merge at all, but instead to rbind your data frames:
dfNew <- rbind(data.frame(d1, Group = "d1"),
data.frame(d2, Group = "d2"))
ggplot(dfNew, aes(x = Temp, color = Group, fill = Group)) +
geom_histogram(alpha = 0.4) +
labs(fill = "", color = "")
If you wanted to vary alpha by group:
ggplot(dfNew, aes(x = Temp, color = Group, fill = Group, alpha = Group)) +
geom_histogram() +
labs(fill = "", color = "") +
scale_alpha_manual("", values = c(d1 = 0.4, d2 = 0.8))
Note also that the default position for geom_histogram is "stack". The bars won't overlap unless you use geom_histogram(position = "identity").
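To see the difference between the two positions, here is a small sketch using the d1 and d2 values from the question (retyped by hand; bins = 10 is an arbitrary choice made for the illustration). With the default stacking, one group's bars start where the other group's end; with position = "identity" every bar starts at zero, so the groups overlap.

```r
library(ggplot2)

d1 <- data.frame(step = seq(512.5, 625, by = 12.5),
                 Temp = c(301.16, 299.89, 299.39, 300.58, 300.20,
                          300.17, 300.62, 300.51, 300.96, 300.21))
d2 <- data.frame(step = seq(520, 700, by = 20),
                 Temp = c(299.19, 300.39, 299.67, 299.43, 299.78,
                          300.74, 301.03, 300.39, 300.54, 300.25))
dfNew <- rbind(data.frame(d1, Group = "d1"),
               data.frame(d2, Group = "d2"))

# Default: position = "stack"
p_stack <- ggplot(dfNew, aes(x = Temp, fill = Group)) +
  geom_histogram(bins = 10, alpha = 0.4)

# Overlapping bars: every bar is drawn from the baseline
p_identity <- ggplot(dfNew, aes(x = Temp, fill = Group)) +
  geom_histogram(bins = 10, alpha = 0.4, position = "identity")

# Lower edge of each bar in the built plots
ymin_stack    <- layer_data(p_stack)$ymin
ymin_identity <- layer_data(p_identity)$ymin
```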

Errorbars look like pointrange (ggplot2)

I have the following data frame:
> df <- read.table("throughputOverallSummary.txt", header = TRUE)
> df
ExperimentID clients connections msgSize Mean Deviation Error
1 77 100 50 1999 142.56427 8.368127 0.4710121
2 78 200 50 1999 284.22705 13.575943 0.3832827
3 79 400 50 1999 477.48997 44.820831 0.7538666
4 80 600 50 1999 486.87102 49.916391 0.8240869
5 81 800 50 1999 488.84899 51.422070 0.8462216
6 82 10 50 1999 15.23667 1.995150 1.0498722
7 83 50 50 1999 71.94000 5.197893 0.5793057
and some code that processes the dataframe df above:
msg_1999 = subset(df, df$msgSize == 1999)
if (nrow(msg_1999) > 0) {
limits = aes(ymax = msg_1999$Mean + msg_1999$Deviation, ymin = msg_1999$Mean -
msg_1999$Deviation)
ggplot(data = msg_1999, aes(clients, Mean, color = as.factor(connections), group =
as.factor(connections))) +
geom_point() + geom_line() +
geom_errorbar(limits, width = 0.25) +
xlab("Number of Clients") +
ylab("Throughput (in messages/second)") +
labs(title = "Message size 1999 bytes", color = "Connections")
ggsave(file = "throughputMessageSize1999.png")
}
My problem is that the error bars in the plot look like pointrange. The horizontal bars at the upper and lower end of the error bars are missing.
Ideally, the error bars should have looked something like this:
Why do errorbars from my code look different?
The width parameter is on the same scale as x. You have given width = 0.25, whereas the x axis ranges from 0 to 800, so a bar of width 0.25 is not going to be visible on this graph. If you don't set the width value, then something reasonably sensible is guessed.
ggplot(data = df, aes(clients, Mean, color = as.factor(connections), group =
as.factor(connections))) +
geom_point() + geom_line() +
geom_errorbar(aes(ymax = Mean + Deviation, ymin=Mean-Deviation)) +
xlab("Number of Clients") +
ylab("Throughput (in messages/second)") +
labs(title = "Message size 1999 bytes", color = "Connections")
Note that if you want to predefine your mapping argument, you should still specify the variables as you would within a call to geom_xxxx. aes (and ggplot) does some fancy footwork to ensure that this will be evaluated within the correct environment at the time of plotting.
Thus the following will work
limits <- aes(ymax = Mean + Deviation, ymin=Mean-Deviation)
ggplot(data = df, aes(clients, Mean, color = as.factor(connections), group =
as.factor(connections))) +
geom_point() + geom_line() +
geom_errorbar(limits) +
xlab("Number of Clients") +
ylab("Throughput (in messages/second)") +
labs(title = "Message size 1999 bytes", color = "Connections")
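If you would rather set the cap width explicitly than rely on the default, one option is to express it as a fraction of the x range. This is a sketch only: the data frame retypes the relevant columns from the question, cap_width is a name introduced here, and the 5% factor is an arbitrary choice.

```r
library(ggplot2)

# Relevant columns of the question's data, retyped by hand
df <- data.frame(
  clients     = c(100, 200, 400, 600, 800, 10, 50),
  connections = 50,
  Mean        = c(142.56427, 284.22705, 477.48997, 486.87102,
                  488.84899, 15.23667, 71.94000),
  Deviation   = c(8.368127, 13.575943, 44.820831, 49.916391,
                  51.422070, 1.995150, 5.197893)
)

# Width is in x-axis units, so scale it to the data (5% of the x range here)
cap_width <- 0.05 * diff(range(df$clients))

p <- ggplot(df, aes(clients, Mean, color = factor(connections),
                    group = factor(connections))) +
  geom_point() + geom_line() +
  geom_errorbar(aes(ymin = Mean - Deviation, ymax = Mean + Deviation),
                width = cap_width)

# The third layer holds the error bars; xmax - xmin is the drawn cap width
eb <- layer_data(p, 3)
```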

Resources