Error in Plotting Longitude Latitude with Fill Values in ggplot2 - r

I have data with longitude, latitude, and a value at each grid cell. A grid cell may have more than one value, so I set alpha to visualize multiple values. My aim is to fill the grid cells using three different value ranges. If the value is zero, the cell should stay empty.
library(maps)
library(ggplot2)
data <- read.csv("G:/mydata.csv")
g1 <- ggplot(aes(x=x, y=y, fill= A), data=data) +
geom_tile(data=subset(data, A > 1970 & A < 1980),fill = "black", alpha = 0.5)+
geom_tile(data=subset(data, B > 1970 & B < 1980),fill = "black", alpha = 0.5)+
geom_tile(data=subset(data, C > 1970 & C < 1980),fill = "black", alpha = 0.5)+
geom_tile(data=subset(data, A > 1979 & A < 1990),fill = "blue", alpha = 0.5)+
geom_tile(data=subset(data, B> 1979 & B < 1990), fill = "blue", alpha = 0.5)+
geom_tile(data=subset(data, C > 1979 & C < 1990),fill = "blue", alpha = 0.5)+
geom_tile(data=subset(data, A > 1989),fill = "red", alpha = 0.5)+
geom_tile(data=subset(data, B > 1989),fill = "red", alpha = 0.5)+
geom_tile(data=subset(data, C > 1989),fill = "red", alpha = 0.5)+
theme_classic()
The output is wrong: the blue grid cells are bigger than the others. I could not find the mistake. I followed the link but could not make it work. I guess I am missing something trivial. My data can be accessed here. Many thanks in advance.

Sorry, can't do it the way you envisioned it. Not enough flexibility that I could see. But one can do this:
library(maps)
library(ggplot2)
ddf <- read.csv("mydata.csv")
setz <- function(dddf,zvek,lev=0,fillclr){
dddf$z <- as.numeric(zvek)
dddf$lev <- lev
dddf$color <- "white"
dddf$fill <- ifelse(zvek,fillclr,"gray")
return(dddf)
}
df1<-setz(ddf,ddf$A>1970 & ddf$A<1980,"A>1970 & A<1980","black")
df2<-setz(ddf,ddf$B>1970 & ddf$B<1980,"B>1970 & B<1980","black")
df3<-setz(ddf,ddf$C>1970 & ddf$C<1980,"C>1970 & C<1980","black")
df4<-setz(ddf,ddf$A>1979 & ddf$A<1990,"A>1979 & A<1990","blue")
df5<-setz(ddf,ddf$B>1979 & ddf$B<1990,"B>1979 & B<1990","blue")
df6<-setz(ddf,ddf$C>1979 & ddf$C<1990,"C>1979 & C<1990","blue")
df7<-setz(ddf,ddf$A>1989,"A>1989","red")
df8<-setz(ddf,ddf$B>1989,"B>1989","red")
df9<-setz(ddf,ddf$C>1989,"C>1989","red")
ddg <- rbind( df1,df2,df3, df4,df5,df6, df7,df8,df9 )
g1 <- ggplot(data=ddg,aes(x=x, y=y,fill=fill,color=color)) +
geom_tile() +
scale_color_identity() +
scale_fill_identity() +
facet_wrap(~lev) +
theme_classic()
print(g1)
Which yields this:
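One likely cause of the unequal tile sizes in the question is that geom_tile() derives its default tile width and height from the spacing of the x and y values in the data passed to each layer, so sparse subsets can end up with larger tiles. A minimal sketch of an alternative follows, assuming integer years and a regular 1-degree grid: stack the three columns into one long data frame, bin the years with cut(), and fix the tile size explicitly. The data frame grid, its values, and the column name period are made up for illustration and stand in for mydata.csv.

```r
library(ggplot2)

# Hypothetical stand-in for mydata.csv: a 1-degree grid with three year columns
set.seed(1)
grid <- expand.grid(x = 70:79, y = 10:19)
grid$A <- sample(c(0, 1971:2000), nrow(grid), replace = TRUE)
grid$B <- sample(c(0, 1971:2000), nrow(grid), replace = TRUE)
grid$C <- sample(c(0, 1971:2000), nrow(grid), replace = TRUE)

# Stack A, B and C into one long data frame (one row per cell and column)
long <- data.frame(
  x = rep(grid$x, 3),
  y = rep(grid$y, 3),
  year = c(grid$A, grid$B, grid$C)
)

# Drop zeros (empty cells) and bin the remaining years into the three ranges
long <- long[long$year > 1970, ]
long$period <- cut(long$year,
                   breaks = c(1970, 1979, 1989, Inf),
                   labels = c("1971-1979", "1980-1989", "1990+"))

# One layer with an explicit tile size, so every tile has the same extent
p <- ggplot(long, aes(x = x, y = y, fill = period)) +
  geom_tile(width = 1, height = 1, alpha = 0.5) +
  scale_fill_manual(values = c("1971-1979" = "black",
                               "1980-1989" = "blue",
                               "1990+" = "red")) +
  theme_classic()
```

Calling print(p) draws the map. The width and height values should match the real grid spacing of the data, which is assumed to be 1 here.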

Related

R ggplot2: different color and alpha styling for different geom_point depending on y value

I want to create the earthquake graph with the look based on this reference Earthquake viz reference
Data
The data looks like this:
head(df)
|Date |Time.UTC|Latitude|Longitude|Depth|Depth.Type|Magnitude.Type|Magnitude|Region.Name |Last.Update |Eqid |X |color |
|----------|--------|--------|---------|-----|----------|--------------|---------|---------------------------|----------------|-------|---|-------|
|2023-02-10|06:11:56|-0.13 |123.12 |137 | | M |2.5 |SULAWESI, INDONESIA |2023-02-10 06:20|1221400| |#C4C4C4|
|2023-02-10|06:00:14|-1.79 |100.42 |27 | | M |2.9 |SOUTHERN SUMATRA, INDONESIA|2023-02-10 06:10|1221398| |#C4C4C4|
|2023-02-10|05:59:27|-1.31 |120.44 |10 | | M |2.7 |SULAWESI, INDONESIA |2023-02-10 06:05|1221396| |#C4C4C4|
|2023-02-10|05:26:25|-6.14 |104.72 |35 | | M |3.9 |SUNDA STRAIT, INDONESIA |2023-02-10 05:35|1221388| |#C4C4C4|
|2023-02-10|05:10:06|-8.08 |117.78 |18 | | M |2.8 |SUMBAWA REGION, INDONESIA |2023-02-10 05:15|1221377| |#C4C4C4|
|2023-02-10|04:55:01|0.99 |98.06 |25 | | M |3.3 |NIAS REGION, INDONESIA |2023-02-10 05:05|1221370| |#C4C4C4|
Code used
ggplot(df,aes(Date, Magnitude)) +
geom_point(data = df %>% filter(Magnitude > 5),
alpha = 0.9, size = 1.7, shape = 16, stroke = 0,
color = "#264653")+
geom_point(aes(color = Magnitude),
data = df %>% filter(Magnitude <= 5),
alpha = 1/20, size = 1.7, shape = 16, stroke = 0)
Output
Expectation
I expect the points with a magnitude less than or equal to 5 to be grey and to use a transparency gradient (the higher the magnitude, the higher the alpha, and vice versa),
while the points with a magnitude above 5 should have a plain solid color.
Other test
I tried to add
scale_color_gradient(low=alpha("#BFBFBF",0),high = alpha("#6B6B6B",0.9))
But the result has no transparency gradient, contrary to what I expected.
Here is another option.
The ggplot has one geom_point with data filtered to Magnitude > 5. This layer has color = Magnitude in aes, which makes it possible to style the color of those points with scale_colour_gradient2, as done below.
The other data are also filtered (Magnitude <= 5) and Magnitude is renamed to mag. This makes it possible to style those points independently of the others. You can use geom_point, but this may lead to some overplotting and give the impression that the alpha is not applied.
I have used geom_jitter to avoid overplotting.
I have also taken the liberty of using different sizes for the different subsets and adding some color.
library(tidyverse)
ggplot() +
geom_point(data = data |> filter(Magnitude > 5), aes(
Date, Magnitude,
color = Magnitude
), size = 3, show.legend = FALSE) +
# here you can also use geom_point
geom_jitter(data = data |>
filter(Magnitude <= 5) |> rename(mag = Magnitude), aes(
Date, mag,
alpha = mag
), color = "grey", size = 1, show.legend = FALSE) +
scale_colour_gradient2(
low = "green",
mid = "orange",
high = "red",
midpoint = 6,
guide = "none",
aesthetics = "colour"
)
You can fake your alpha with segments. This gives you much better control than using scale_alpha (in my opinion).
I have adjusted the look to come a bit closer to your linked example graph. The trick is to assign alpha values to your Magnitude values. In the data frame which does that, you can influence where your alpha will have which value.
library(ggplot2)
df <- data.frame(Date = rep(seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day"), 10),
Magnitude = runif(3650, max = 5))
## generate your alpha segments
df_seg <- data.frame(y = seq(min(df$Magnitude), max(df$Magnitude), len = 1000),
alpha = seq(1, 0.3, len = 1000))
ggplot(df, aes(Date, Magnitude)) +
## use alpha as aesthetic
## color for the same look as in your linked example graph
geom_point(aes(color = Magnitude),
size = 1.7, shape = 16, stroke = 0
) +
geom_segment(
data = df_seg, aes(
y = y, yend = y,
x = as.Date("2021-01-01"),
xend = as.Date("2021-12-31"),
alpha = alpha
),
## you need to play around with the line width a bit
linewidth = .2,
color = "white",
## to remove the alpha legend
show.legend = F
) +
## add some random divergent colors
scale_color_gradientn(values = c(0, .4, 1), colors = c("darkgreen", "darkgreen", "red")) +
theme_classic() +
coord_cartesian(expand = F)
I was saying I find it easier to add segments to fake an alpha. As user AllanCameron rightly pointed out, this is arguable. Adding the alpha directly to the colors in scale_color_gradientn will also give a nice visual and is indeed much shorter.
ggplot(df, aes(Date, Magnitude)) +
geom_point(aes(color = Magnitude),
size = 1.7, shape = 16, stroke = 0
) +
scale_color_gradientn(values = c(0, .4, 1),
colors =c(alpha("darkgreen", 0.1), alpha("darkgreen", 0.5), "red")) +
theme_classic() +
coord_cartesian(expand = F)
You can adjust the transparency gradient with the range argument of scale_alpha(), provided alpha is mapped to Magnitude inside aes() rather than set to a constant. To make the lower-magnitude earthquakes less faint, increase the lower bound of range, for example:
library(ggplot2)
library(dplyr)
ggplot(df,aes(Date, Magnitude)) +
geom_point(data = df %>% filter(Magnitude > 5),
alpha = 0.9, size = 1.7, shape = 16, stroke = 0,
color = "#264653")+
# alpha is mapped to Magnitude so that scale_alpha() has something to scale
geom_point(aes(alpha = Magnitude),
data = df %>% filter(Magnitude <= 5),
color = "#6B6B6B", size = 1.7, shape = 16, stroke = 0) +
scale_alpha(limits = c(0, 5), range = c(0.1, 0.5))
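A quick way to check whether a scale_alpha() setting takes effect is to look at the alpha values ggplot2 assigns to the points. The sketch below assumes a made-up three-row data frame (quakes_demo is hypothetical, not the earthquake data) and maps alpha inside aes(), since a constant alpha set outside aes() is not touched by the scale. With limits = c(0, 5) and range = c(0.1, 0.5), magnitudes are mapped linearly between the two range bounds.

```r
library(ggplot2)

# Hypothetical three-point data set standing in for the earthquake data
quakes_demo <- data.frame(
  Date = as.Date("2023-02-10") + 0:2,
  Magnitude = c(0, 2.5, 5)
)

p <- ggplot(quakes_demo, aes(Date, Magnitude)) +
  # alpha must be an aesthetic mapping for scale_alpha() to apply
  geom_point(aes(alpha = Magnitude), color = "#6B6B6B") +
  scale_alpha(limits = c(0, 5), range = c(0.1, 0.5))

# Alpha values ggplot2 assigns to the three points of layer 1
mapped_alpha <- layer_data(p, 1)$alpha
print(mapped_alpha)
```

If the printed values are all identical, alpha is probably being set as a constant somewhere instead of being mapped.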

volcano plot error (using ggplot2): drawn without data

I'm here again with another problem.
I'm currently making a volcano plot of DEG data using ggplot2.
The problem is that the plot is drawn without any data points, which is weird.
For a more accurate diagnosis: my data (volcano) consists of 948 DEGs (|logFC| > 1, FDR < 0.05).
library(ggplot2)
volcano["group"] <- "NotSignificant"
volcano[which(volcano['FDR'] < 0.01 & abs(volcano['logFC']) > 2 ),"group"] <- "Increased"
volcano[which(volcano['FDR'] < 0.01 & abs(volcano['logFC']) < -2 ),"group"] <- "Decreased"
# creating color palette
cols <- c("red" = "red", "orange" = "orange", "NotSignificant" = "darkgrey",
"Increased" = "#00B2FF", "Decreased" = "#00B2FF")
##I didn't even get to use those beautiful colors.
FDR_threshold <- 0.01
logFC_threshold <- 2
deseq.threshold <- as.factor(abs(volcano$logFC) >= logFC_threshold &
volcano$FDR < FDR_threshold)
xi <- which(deseq.threshold == TRUE)
deseq.threshold <- as.factor(abs(volcano$logFC) > 2 & volcano$FDR < 0.05)
# Make a basic ggplot2 object
vol <- ggplot(volcano, aes(x = logFC, y =-log10(FDR), colour=deseq.threshold))
# inserting manual colours as per colour palette and more
vol +
scale_colour_manual(values = cols) +
ggtitle(label = "Volcano Plot", subtitle = "colon specific volcano plot") +
geom_point(size = 2.5, alpha = 1, na.rm = T) +
theme_bw(base_size = 14) +
theme(legend.position = "none") +
xlab(expression(log[2]("logFC"))) +
ylab(expression(-log[10]("FDR"))) +
geom_hline(yintercept = 1, colour="#990000", linetype="dashed") +
geom_vline(xintercept = 0.586, colour="#990000", linetype="dashed") +
geom_vline(xintercept = -0.586, colour="#990000", linetype="dashed")+
scale_y_continuous(trans = "log1p")
Here is a small sample of my dataset, volcano:
genes logFC FDR group
1 INHBA 6.271879 2.070000e-30 Increased
2 COL10A1 7.634386 1.820000e-23 Increased
3 WNT2 9.485133 6.470000e-20 Increased
4 COL8A1 3.974965 6.470000e-20 Increased
5 THBS2 4.104176 2.510000e-19 Increased
6 BGN 3.524484 5.930000e-18 Increased
7 COMP 11.916956 2.740000e-17 Increased
9 SULF1 3.540374 1.290000e-15 Increased
10 CTHRC1 3.937028 4.620000e-14 Increased
11 TRIM29 3.827088 1.460000e-11 Increased
12 SLC6A20 5.060538 5.820000e-11 Increased
13 SFRP4 5.924330 8.010000e-11 Increased
14 CDH3 5.330732 8.940000e-11 Increased
15 ESM1 6.491496 3.380000e-10 Increased
614 TDP2 -1.801368 0.002722461 NotSignificant
615 EPHX2 -1.721039 0.002722461 NotSignificant
616 RAVER2 -1.581812 0.002749728 NotSignificant
617 BMP6 -2.702780 0.002775460 Increased
619 SCNN1G -4.012111 0.002870500 Increased
620 SLC52A3 -1.868920 0.002931197 NotSignificant
621 VIPR1 -1.556238 0.002945578 NotSignificant
622 SUCLG2 -1.720993 0.003059717 NotSignificant
I think your issue comes from using deseq.threshold as the colour in aes. Instead, I think you should use the group column to set the colour.
BTW, the thresholds that define your significant genes contain a mistake: for "Decreased" you test whether the absolute value of logFC is less than -2, which is never true.
Here, I used an example of DEG output:
library(data.table)
volcano = fread("https://gist.githubusercontent.com/stephenturner/806e31fce55a8b7175af/raw/1a507c4c3f9f1baaa3a69187223ff3d3050628d4/results.txt", header = TRUE)
colnames(volcano) <- c("Gene","logFC","pvalue","FDR")
# Adding group to decipher if the gene is significant or not:
volcano <- data.frame(volcano)
volcano["group"] <- "NotSignificant"
volcano[which(volcano['FDR'] < 0.01 & volcano['logFC'] > 1 ),"group"] <- "Increased"
volcano[which(volcano['FDR'] < 0.01 & volcano['logFC'] < -1 ),"group"] <- "Decreased"
So, my example dataframe looks like this (I changed the thresholds you are using slightly to get more significant genes):
> head(volcano)
Gene logFC pvalue FDR group
1 DOK6 0.5100 1.861e-08 0.0003053 NotSignificant
2 TBX5 -2.1290 5.655e-08 0.0004191 Decreased
3 SLC32A1 0.9003 7.664e-08 0.0004191 NotSignificant
4 IFITM1 -1.6870 3.735e-06 0.0068090 Decreased
5 NUP93 0.3659 3.373e-06 0.0068090 NotSignificant
6 EMILIN2 1.5340 2.976e-06 0.0068090 Increased
Now, you can plot:
library(ggplot2)
ggplot(volcano, aes(x = logFC, y = -log10(FDR), color = group))+
scale_colour_manual(values = cols) +
ggtitle(label = "Volcano Plot", subtitle = "colon specific volcano plot") +
geom_point(size = 2.5, alpha = 1, na.rm = T) +
theme_bw(base_size = 14) +
theme(legend.position = "none") +
xlab(expression(log[2]("logFC"))) +
ylab(expression(-log[10]("FDR"))) +
geom_hline(yintercept = 1, colour="#990000", linetype="dashed") +
geom_vline(xintercept = 0.586, colour="#990000", linetype="dashed") +
geom_vline(xintercept = -0.586, colour="#990000", linetype="dashed")+
scale_y_continuous(trans = "log1p")
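The two points made in this answer can be checked on a few rows without plotting. The sketch below is illustrative only: volcano_demo is a hypothetical three-row data frame built from rows of the sample in the question, and the thresholds are the question's (FDR < 0.01, |logFC| > 2). It first shows that the factor levels "TRUE" and "FALSE" have no entry in the named palette, which is a plausible reason the points were not drawn, and then assigns groups with signed logFC thresholds.

```r
# Palette from the question (only the entries relevant here)
cols <- c("NotSignificant" = "darkgrey",
          "Increased" = "#00B2FF",
          "Decreased" = "#00B2FF")

# deseq.threshold has the levels "FALSE" and "TRUE";
# looking those names up in the palette finds nothing
lookup <- unname(cols[c("FALSE", "TRUE")])
print(lookup)

# Three rows taken from the sample data in the question
volcano_demo <- data.frame(
  genes = c("INHBA", "BMP6", "TDP2"),
  logFC = c(6.271879, -2.702780, -1.801368),
  FDR   = c(2.07e-30, 0.002775460, 0.002722461)
)

# Signed thresholds: use logFC itself, not abs(logFC), to get the direction
volcano_demo$group <- "NotSignificant"
volcano_demo$group[volcano_demo$FDR < 0.01 & volcano_demo$logFC >  2] <- "Increased"
volcano_demo$group[volcano_demo$FDR < 0.01 & volcano_demo$logFC < -2] <- "Decreased"

# Every group label should have a matching colour in the palette
print(all(volcano_demo$group %in% names(cols)))
print(volcano_demo)
```

With the signed test, BMP6 (logFC about -2.7) is labelled "Decreased" instead of "Increased".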

How to highlight bin of observation in ggplot?

How can I highlight the entire bar into which the observations obs.A and obs.B, respectively, fall using ggplot? The exact same thing has been done for the regular hist() function, but what is the ggplot way?
Below is some code to illustrate:
library(ggplot2)
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(100,10,2),0))
obs.A<-8
obs.B<-10
ggplot(data)+
geom_histogram(aes(x=Value))+
facet_grid(Var ~ .)
Edit: It needs to work for large and small sample sizes and highlight only one bar, and all of that bar.
One ggplot way is to build it into the dataframe used for plotting:
library(ggplot2)
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(100,10,2),0))
obs.A<-8
obs.B<-10
data$color <- ifelse(data$Var == "A" & data$Value == obs.A, T, F)
data$color <- ifelse(data$Var == "B" & data$Value == obs.B, T, data$color)
ggplot(data)+
geom_histogram(aes(x=Value, fill = color))+
facet_grid(Var ~ .)
Note this works easily for your test case because the range for data$Value is 5-16 and the default for geom_histogram() is bins = 30. If you wanted to make it more transferable you would want to set geom_histogram(binwidth = 1) or set data$color based on bins, something like this:
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(100,10,10),0)) # bigger sd
obs.A<-8
obs.B<-10
data$cuts <- cut(data$Value, 30, labels = F)
A_colored_cuts <- unique(data$cuts[data$Value == obs.A])
data$color <- ifelse(data$Var == "A" & data$cuts == A_colored_cuts, T, F)
B_colored_cuts <- unique(data$cuts[data$Value == obs.B])
data$color <- ifelse(data$Var == "B" & data$cuts == B_colored_cuts, T, data$color)
ggplot(data)+
geom_histogram(aes(x=Value, fill = color))+
facet_grid(Var ~ .)
EDIT: For larger sample sizes, we would want to use the second option outlined above and specify geom_histogram(boundary = .5), since we want the bin breaks on integers.
set.seed(1)
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(10000,10,10),0))
#use code chunk 2 above
ggplot(data)+
geom_histogram(aes(x=Value, fill = color), boundary = .5)+
facet_grid(Var ~ .)
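The bin-based idea above can also be written so that the flag is computed from the same binwidth and boundary that are passed to geom_histogram(). This is a minimal sketch under stated assumptions: bin_index() is a hypothetical helper (not a ggplot2 function), and it relies on the values being integers so that none of them sits exactly on a bin edge when boundary = 0.5.

```r
library(ggplot2)

set.seed(1)
data <- data.frame(Var = rep(c("A", "B"), each = 5000),
                   Value = round(rnorm(10000, 10, 10), 0))
obs <- c(A = 8, B = 10)   # observation to highlight in each facet

# Hypothetical helper: index of the bin a value falls into for a given
# binwidth and boundary (integer values never sit on a bin edge here)
bin_index <- function(value, binwidth = 1, boundary = 0.5) {
  floor((value - boundary) / binwidth)
}

# Flag every row that shares a bin with the observation of its own facet
target <- unname(obs[as.character(data$Var)])
data$highlight <- bin_index(data$Value) == bin_index(target)

p <- ggplot(data, aes(x = Value, fill = highlight)) +
  geom_histogram(binwidth = 1, boundary = 0.5) +
  facet_grid(Var ~ .) +
  scale_fill_manual(values = c("FALSE" = "grey45", "TRUE" = "red")) +
  theme(legend.position = "none")
```

Because every row in the highlighted bin carries the flag, the whole bar is coloured rather than a slice of it. For a different binwidth, pass the same value to both bin_index() and geom_histogram().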
We can add the conditions inside geom_histogram for the fill aesthetic, and remove the oversized legend with theme(legend.position = "none"):
# Example 1
set.seed(12345)
data <- data.frame(Var = c(rep("A", 50), rep("B", 50)), Value = round(rnorm(100, 10, 2), 0))
ggplot(data) +
geom_histogram(aes(x = Value,
fill = Value == 8 & Var == "A" | Value == 10 & Var == "B"), binwidth=0.5) +
facet_grid(Var ~ .) +
theme(legend.position = "none")
# Example 2
set.seed(12345)
data <- data.frame(Var = c(rep("A", 50), rep("B", 50)), Value = round(rnorm(10000, 10, 10), 0))
ggplot(data) +
geom_histogram(aes(x = Value,
fill = Value == 8 & Var == "A" | Value == 10 & Var == "B"), binwidth = 0.5) +
facet_grid(Var ~ .) +
theme(legend.position = "none")
If we would like to assign different colours, we use scale_fill_manual:
# Example3
set.seed(12345)
data <- data.frame(Var = c(rep("A", 50), rep("B", 50)), Value = round(rnorm(100, 10, 2), 0))
ggplot(data) +
geom_histogram(aes(x = Value,
fill = Value == 8 & Var == "A" | Value == 10 & Var == "B"), binwidth=0.5) +
facet_grid(Var ~ .) +
scale_fill_manual(values = c("grey45", "red"))+
theme(legend.position = "none")
We can try this to get the desired highlighting for any size of data (we may need to adjust the binwidth to the data size for the bar chart to look prettier and be more informative):
library(ggplot2)
set.seed(12345)
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(100,10,2),0))
obs.A<-8
obs.B<-10
cond <- (data$Var=='A' & data$Value == obs.A)|(data$Var=='B' & data$Value == obs.B)
binwidth <- 0.25
ggplot(data)+
geom_histogram(data=data[!cond,], aes(x=Value), binwidth=binwidth) +
geom_histogram(data=data[cond,], aes(x=Value), fill='red', binwidth=binwidth) +
facet_grid(Var ~ .)
set.seed(12345)
data <- data.frame(Var=c(rep("A",50),rep("B",50)), Value=round(rnorm(10000,10,10),0))
obs.A<-8
obs.B<-10
cond <- (data$Var=='A' & data$Value == obs.A)|(data$Var=='B' & data$Value == obs.B)
binwidth <- 0.5
ggplot(data)+
geom_histogram(data=data[!cond,], aes(x=Value), binwidth=binwidth) +
geom_histogram(data=data[cond,], aes(x=Value), fill='red', binwidth=binwidth) +
facet_grid(Var ~ .)

ggplot2 multi-histogram graph plotting only single histogram

ggplot(d,aes(x= `Log Number`)) +
geom_histogram(data=subset(d,state == 'c'),fill = "red", alpha = 0.2) +
geom_histogram(data=subset(d,state == 'l'),fill = "blue", alpha = 0.2) +
geom_histogram(data=subset(d,state == 't'),fill = "green", alpha = 0.2)
d is a dataset that contains only two columns: Log Number, which is a long list of numbers, and state, which is a factor with 3 levels (c, l, t).
I tried to use this to plot overlapping histograms, but it just returns a single one. Thanks
You want to fill by state:
ggplot(d, aes(x = `Log Number`, fill = state)) + geom_histogram()
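Note that geom_histogram() stacks the groups by default when fill is mapped. For overlapping, semi-transparent histograms as in the question, position = "identity" can be added. The sketch below uses simulated data in place of d, so the values are assumptions, and the colours are the ones from the question.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for d: a numeric column and a three-level state column
d <- data.frame(`Log Number` = log(rnorm(1000, exp(6))),
                state = sample(c("c", "l", "t"), 1000, replace = TRUE),
                check.names = FALSE)

p <- ggplot(d, aes(x = `Log Number`, fill = state)) +
  # position = "identity" overlays the groups instead of stacking them
  geom_histogram(position = "identity", alpha = 0.2, bins = 30) +
  scale_fill_manual(values = c(c = "red", l = "blue", t = "green"))
```

Every bar then starts at zero, so the three distributions can be compared directly.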
Hmm, I don't know, I think your data is wrong. Worked for me:
lon <- log( rnorm(1000,exp(6) ))
state <- sample(c("c","l","t"),1000,replace=T)
d <- data.frame(lon,state)
names(d) <- c("Log Number","state")
head(d)
yields the following data:
Log Number state
1 5.999955 t
2 5.997907 c
3 6.002452 l
4 5.994471 l
5 5.997306 l
6 6.000798 t
And then the plot:
ggplot(d,aes(x= `Log Number`)) +
geom_histogram(data=subset(d,state == 'c'),fill = "red", alpha = 0.2) +
geom_histogram(data=subset(d,state == 'l'),fill = "blue", alpha = 0.2) +
geom_histogram(data=subset(d,state == 't'),fill = "green", alpha = 0.2)
looks like this:

multi-faceted heat map with ggplot for selected portion of X with additional text labels on it

I have the following data:
Id = paste ("ID-", 1:5, sep = "")
position <- rep(seq (1, 100,10), each = 5)
group = rep (rep(rep (1:5, each = length (Id)), each = length(position)))
yvar <- rnorm (length(position), 0.5, 0.1)
ycat <- c(sample (c("A", "B"), length(yvar), replace = TRUE))
namevar <- rep(Id, length(group)/length(Id))
mydf <- data.frame (namevar, group, position, yvar, ycat)
group is a faceting variable, and position is a continuous x variable. yvar is used for the fill color of the tiles. ycat is a text label for each tile. I want to create a plot with empty space for all values, except certain tiles that I select to plot with a fill color and labels.
Here is what I have so far:
ggplot(mydf,aes(y=Id,x=position)) +
facet_wrap(~group) +
geom_tile(aes(fill = yvar),colour = "black") +
geom_text(aes(label = ycat)) +
labs(x = NULL,y = NULL)
I'd like the plot to look like this except have blank space everywhere except, for instance, group 1 between 30-50 and group 5 between 20-60, sort of like this:
This will produce your last plot, but only shade selected regions:
ggplot(mydf,aes(y=Id,x=position)) +
facet_wrap(~group) +
geom_blank() +
geom_tile(data = subset(mydf,(group == 1 & position >= 30 & position <= 50) |
(group == 5 & position >= 20 & position <= 60)),aes(fill = yvar),colour = "black") +
geom_text(data = subset(mydf,(group == 1 & position >= 30 & position <= 50) |
(group == 5 & position >= 20 & position <= 60)),aes(label = ycat),size = 3) +
labs(x = NULL,y = NULL)
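Since the same condition is used for both the tiles and the labels, it can be computed once and reused. The following is a sketch under stated assumptions: mydf is rebuilt here with expand.grid() as a simplified stand-in for the question's data, the data frame name sel is made up, and the y aesthetic uses the namevar column instead of the global Id vector.

```r
library(ggplot2)

set.seed(1)
# Simplified stand-in for mydf: one row per ID, position and group
mydf <- expand.grid(namevar = paste0("ID-", 1:5),
                    position = seq(1, 100, 10),
                    group = 1:5)
mydf$yvar <- rnorm(nrow(mydf), 0.5, 0.1)
mydf$ycat <- sample(c("A", "B"), nrow(mydf), replace = TRUE)

# Select the tiles to draw once, then reuse the subset in both layers
sel <- subset(mydf, (group == 1 & position >= 30 & position <= 50) |
                    (group == 5 & position >= 20 & position <= 60))

p <- ggplot(mydf, aes(y = namevar, x = position)) +
  facet_wrap(~group) +
  geom_blank() +                      # keeps the full axes and all facets
  geom_tile(data = sel, aes(fill = yvar), colour = "black") +
  geom_text(data = sel, aes(label = ycat), size = 3) +
  labs(x = NULL, y = NULL)
```

Changing which regions are shaded then only requires editing the subset() call.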

Resources