Define each label in heatmap clearly in ggplot in R

I have the following data frame:
ID position hum_chr_pos CHROM a1 a2 a3 a4 ID_rn
rs1 197_V 897738 1 0.343442666 0.074361225 1 0.028854932 1
rs3 1582_N 2114271 2 0.015863115 1 0.003432604 0.840242328 2
rs6 2266_I 79522907 3 0.177445544 0.090282782 1 0.038199399 3
rs8 521_D 86959173 4 0.542804846 0.088721027 1 0.047758851 4
rs98 1368_G 92252015 5 0.02861059 0.979995611 0.007545923 1 5
rs23 540_A 96162102 5 0.343781806 0.062643599 1 0.024992095 6
rs43 2358_S 147351955 6 0.042592955 0.862087128 0.013001476 1 7
rs65 577_E 168572720 6 0.517111734 0.080471431 1 0.034521778 8
rs602 1932_T 169483561 6 0.043270585 1 0.009731403 0.988762282 9
rs601 1932_T 169511878 6 0.042963813 0.911392392 0.010562154 1 10
rs603 1932_T 169513583 6 0.04096538 0.956129216 0.010983517 1 11
rs606 1936_T 169513573 7 0.04838 0.0126129216 0.090983517 1 12
rs609 1935_T 169513574 7 0.056 0.045 0.086 1 13
I created a heatmap with the values a1, a2, a3, a4:
For this I used this code:
library(reshape2)
library(dplyr)
library(ggplot2)
df_melt <- melt(dummy, id.vars = c("ID", "position", "hum_chr_pos", "CHROM", "ID_rn"))
pos <- df_melt %>%
  group_by(CHROM) %>%
  summarize(avg = round(mean(ID_rn))) %>%
  pull(avg)
ggplot(df_melt, aes(x = variable, y = ID_rn)) +
  geom_tile(aes(fill = value)) +
  scale_fill_gradient2(low = "lightblue", mid = "white", high = "darkblue",
                       midpoint = 0.5, limits = range(df_melt$value)) +
  theme_classic() +
  labs(title = "graph", x = "a", fill = "value") +
  ylab("CHROM") +
  scale_y_discrete(limits = pos, labels = unique(df_melt$CHROM))
I would like to find a way to show the separation between the factors on the y axis more clearly. At the moment it is not clear which rows belong to which label on the y axis. I would like something like this:
It is also odd that the labels are sometimes not centered on their factor. For example, 5 and 7 on the y axis are not centered.
I have searched for how to do this but couldn't find anything.

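You could use geom_hline to draw a separator between the CHROM groups, and place each label at the unrounded mean row of its group: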
ggplot(df_melt, aes(x = variable, y = ID_rn)) +
  geom_tile(aes(fill = value)) +
  scale_fill_gradient2(low = "lightblue", mid = "white", high = "darkblue",
                       midpoint = 0.5, limits = range(df_melt$value)) +
  theme_classic() +
  labs(title = "graph", x = "a", fill = "value", y = "CHROM") +
  # one label per CHROM, at the mean ID_rn of its rows (not rounded)
  scale_y_discrete(limits = c(1, 2, 3, 4, 5.5, 9, 12.5),
                   labels = unique(df_melt$CHROM)) +
  # red separator half a row above the last ID_rn of each CHROM
  geom_hline(yintercept = c(1, 2, 3, 4, 6, 11, 13) + 0.5, color = 'red')
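The hard-coded label and separator positions can also be computed from the data. Below is a minimal base-R sketch that assumes only the CHROM and ID_rn columns of the question's data, copied by hand. It also shows why the original labels drifted: R's round() rounds halves to even, so round(5.5) gives 6 and round(12.5) gives 12, while the CHROM 5 and CHROM 7 groups are centred at 5.5 and 12.5.

```r
# Only the two columns that matter for the y axis, copied from the question's data
dummy <- data.frame(
  CHROM = c(1, 2, 3, 4, 5, 5, 6, 6, 6, 6, 6, 7, 7),
  ID_rn = 1:13
)

# Label position: the exact mean row of each CHROM (no round(), which
# would send 5.5 to 6 and 12.5 to 12 because R rounds halves to even)
pos <- as.vector(tapply(dummy$ID_rn, dummy$CHROM, mean))

# Separator position: half a row above the last row of each CHROM
bounds <- as.vector(tapply(dummy$ID_rn, dummy$CHROM, max)) + 0.5

pos
bounds
```

The two vectors could then replace the literals in the answer, for example scale_y_discrete(limits = pos, labels = unique(df_melt$CHROM)) and geom_hline(yintercept = bounds, color = 'red'). The melted data repeats each ID_rn once per variable, which does not change the means.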

Related

Change legend labels and position dodge

I created an interaction plot with ggplot and added outliers from a different data frame to the same plot. I want to change the legend labels (yes and no), but instead of the existing labels being changed, a new legend is added. Here is the code:
The theme I'm using:
theme_apa(
legend.pos = "right",
legend.use.title = FALSE,
legend.font.size = 12,
x.font.size = 12,
y.font.size = 12,
facet.title.size = 12,
remove.y.gridlines = TRUE,
remove.x.gridlines = TRUE
)
The plot:
InteractionWithOutliers <- ggplot() +
  geom_line(data = data2, aes(x = Messzeitpunkt, y = Sum_PCLMean,
                              group = TB2, linetype = TB2)) +
  scale_color_manual(labels = c("test", "test"), values = c('#000000', '#000000')) +
  geom_point(data = outliersDF, aes(Messzeitpunkt, Sum_PCL,
                                    shape = TB2, color = TB2, size = TB2)) +
  geom_point(data = data2, aes(Messzeitpunkt, Sum_PCLMean,
                               shape = TB2, color = TB2, size = TB2)) +
  scale_shape_manual(values = c(15, 17)) +
  scale_size_manual(values = c(2, 2)) +
  ylim(0, 60) +
  scale_x_continuous(breaks = seq(0, 2)) +
  geom_errorbar(data = data2, aes(x = Messzeitpunkt, ymin = Sum_PCLMean - Sum_PCLSD,
                                  ymax = Sum_PCLMean + Sum_PCLSD), width = .2)
InteractionWithOutliers + theme_apa() +
  labs(x = "Measurement Period", y = "PTSS mean scores")
Image of the Graph:
Furthermore, when I try to use position_dodge to separate the interaction plot from the outliers, not everything moves the same way.
Code:
InteractionWithOutliers <- ggplot() +
  geom_line(data = data2, aes(x = Messzeitpunkt, y = Sum_PCLMean,
                              group = TB2, linetype = TB2),
            position = position_dodge(width = 0.4)) +
  scale_color_manual(labels = c("test", "test"), values = c('#000000', '#000000')) +
  geom_point(data = outliersDF, aes(Messzeitpunkt, Sum_PCL,
                                    shape = TB2, color = TB2, size = TB2),
             position = position_dodge(width = 0.4)) +
  geom_point(data = data2, aes(Messzeitpunkt, Sum_PCLMean,
                               shape = TB2, color = TB2, size = TB2),
             position = position_dodge(width = 0.4)) +
  scale_shape_manual(values = c(15, 17)) +
  scale_size_manual(values = c(2, 2)) +
  ylim(0, 60) +
  scale_x_continuous(breaks = seq(0, 2)) +
  geom_errorbar(data = data2, aes(x = Messzeitpunkt, ymin = Sum_PCLMean - Sum_PCLSD,
                                  ymax = Sum_PCLMean + Sum_PCLSD),
                width = .2, position = position_dodge(width = 0.4))
InteractionWithOutliers + theme_apa() +
  labs(x = "Measurement Period", y = "PTSS mean scores")
Thank you for your help!
Edit: Data for the Outliers:
Messzeitpunkt Sum_PCL TB2
0 38 no
0 37 yes
0 40 yes
0 41 yes
0 38 yes
1 56 no
1 33 no
2 39 no
2 33 no
Data for the interaction plots:
Messzeitpunkt Sum_PCLMean TB2 Sum_PCLSD
0 9 no 11
0 12 yes 11
1 9 no 15
1 18 yes 16
2 8 no 12
2 14 yes 12
Merging legends can sometimes be painful. If your variables are already labelled (as in your example), you don't need to specify breaks or labels at all (see the first example). Legends are only merged when every scale mapped to the same variable has the same title and the same labels, so setting labels in scale_color_manual alone is what split off a separate color legend.
A good rule, however, is: don't map an aesthetic unless you really need it. Size and color are constant in your case, so you could (and should) set them as constants outside of aes().
P.S. I have slightly changed the plot to make the essentials more visible. I prefer to keep my plots in the order geoms -> scales -> coordinates -> labels -> theme, which helps me keep an overview of the layers.
library(ggplot2)
## interaction data (group means and SDs), as in the question
data2 <- read.table(text = "Messzeitpunkt Sum_PCLMean TB2 Sum_PCLSD
0 9 no 11
0 12 yes 11
1 9 no 15
1 18 yes 16
2 8 no 12
2 14 yes 12", header = TRUE)
## outliers, as in the question
outliersDF <- read.table(text = "Messzeitpunkt Sum_PCL TB2
0 38 no
0 37 yes
0 40 yes
0 41 yes
0 38 yes
1 56 no
1 33 no
2 39 no
2 33 no", header = TRUE)
ggplot() +
  geom_line(data = data2, aes(
    x = Messzeitpunkt,
    y = Sum_PCLMean, group = TB2, linetype = TB2
  )) +
  geom_point(data = outliersDF, aes(Messzeitpunkt, Sum_PCL,
    shape = TB2, color = TB2, size = TB2
  )) +
  geom_point(data = data2, aes(Messzeitpunkt, Sum_PCLMean,
    shape = TB2, color = TB2, size = TB2
  )) +
  ## if your variable is labelled, no need to specify breaks or labels
  scale_color_manual(values = c("#000000", "#000000")) +
  scale_shape_manual(values = c(15, 17)) +
  scale_size_manual(values = c(2, 2))
## Better, if you have constant aesthetics, not to use aes(), but
## add the values as constants instead
ggplot() +
  geom_line(data = data2, aes(
    x = Messzeitpunkt,
    y = Sum_PCLMean, group = TB2, linetype = TB2
  )) +
  geom_point(data = outliersDF, aes(Messzeitpunkt, Sum_PCL,
    shape = TB2
  ), size = 2) +
  geom_point(data = data2, aes(Messzeitpunkt, Sum_PCLMean,
    shape = TB2
    ## black color is default, this is just for demonstration
  ), color = "black", size = 2) +
  scale_shape_manual(values = c(15, 17))
Created on 2022-07-15 by the reprex package (v2.0.1)
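The answer does not cover the position_dodge part of the question. Judging from the posted code, one likely cause is that geom_errorbar has no aesthetic mapped to TB2. It therefore forms a single group and is not dodged, while the other layers are. The sketch below uses only the question's interaction data. It puts group = TB2 in the base aes(), reuses one position_dodge object for every layer, and relabels the factor once so that all legends mapped to TB2 show the new labels. The labels "Group: no" and "Group: yes" are placeholders, and theme_apa() is left out so the block only needs ggplot2.

```r
library(ggplot2)

# Interaction data from the question (group means and SDs)
data2 <- read.table(text = "Messzeitpunkt Sum_PCLMean TB2 Sum_PCLSD
0 9 no 11
0 12 yes 11
1 9 no 15
1 18 yes 16
2 8 no 12
2 14 yes 12", header = TRUE)

# Relabel the factor once; every scale mapped to TB2 then uses these labels,
# so the legends can merge. The label texts are placeholders.
data2$TB2 <- factor(data2$TB2, levels = c("no", "yes"),
                    labels = c("Group: no", "Group: yes"))

# One shared dodge, with group = TB2 in the base aes() so that every layer,
# including geom_errorbar, is split by TB2 and shifted by the same amount
pd <- position_dodge(width = 0.4)

p <- ggplot(data2, aes(Messzeitpunkt, Sum_PCLMean, group = TB2)) +
  geom_line(aes(linetype = TB2), position = pd) +
  geom_point(aes(shape = TB2), size = 2, position = pd) +
  geom_errorbar(aes(ymin = Sum_PCLMean - Sum_PCLSD,
                    ymax = Sum_PCLMean + Sum_PCLSD),
                width = 0.2, position = pd) +
  scale_shape_manual(values = c(15, 17)) +
  scale_x_continuous(breaks = 0:2) +
  labs(x = "Measurement Period", y = "PTSS mean scores")
```

An outlier layer added with geom_point(data = outliersDF, ..., position = pd) would be dodged the same way, as long as it also maps something to TB2.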

How do I join points within a ggplot in R properly?

I used the code below to create my plot. Is there a way to adapt it so that I don't get the long red line joining the two periods of non-peak hours?
library(data.table)  # provides between() and as.ITime()
library(ggplot2)
Day_2 <- non_cumul[(non_cumul$Day.No == 'Day 2'), ]
Day_2$time_test <- between(as.ITime(Day_2$date_time),
                           as.ITime("09:00:00"),
                           as.ITime("17:00:00"))
Day2plot <- ggplot(Day_2,
                   aes(date_time, non_cumul_measurement, color = time_test)) +
  geom_point() +
  geom_line() +
  theme(plot.title = element_text(hjust = 0.5)) +
  ggtitle('Water Meter Averages (Thurs 4th Of Jan 2018)',
          'Generally greater water usage between peak hours compared to non peak hours') +
  xlab('Date_Times') +
  ylab('Measurement in Cubic Feet') +
  scale_color_discrete(name = "Peak Hours?")
Day2plot +
  theme(axis.title.x = element_text(face = "bold", colour = "black", size = 10),
        axis.text.x = element_text(angle = 90, vjust = 0.5, size = 10))
From the sound of it, your plot consists of one observation for each position on the x axis, and you want consecutive observations of the same color to be joined by a line.
Here's a simple example that reproduces this:
set.seed(5)
df = data.frame(
x = seq(1, 20),
y = rnorm(20),
color = c(rep("A", 5), rep("B", 9), rep("A", 6))
)
ggplot(df,
aes(x = x, y = y, color = color)) +
geom_line() +
geom_point()
The following code creates a new column "group", which takes on a different value for each collection of consecutive points with the same color. "prev.color" and "change.color" are intermediary columns, included here for clarity:
library(dplyr)
df2 <- df %>%
arrange(x) %>%
mutate(prev.color = lag(color)) %>%
mutate(change.color = is.na(prev.color) | color != prev.color) %>%
mutate(group = cumsum(change.color))
> head(df2, 10)
x y color prev.color change.color group
1 1 -0.84085548 A <NA> TRUE 1
2 2 1.38435934 A A FALSE 1
3 3 -1.25549186 A A FALSE 1
4 4 0.07014277 A A FALSE 1
5 5 1.71144087 A A FALSE 1
6 6 -0.60290798 B A TRUE 2
7 7 -0.47216639 B B FALSE 2
8 8 -0.63537131 B B FALSE 2
9 9 -0.28577363 B B FALSE 2
10 10 0.13810822 B B FALSE 2
ggplot(df2,
       aes(x = x, y = y, color = color, group = group)) +
  geom_line() +
  geom_point()
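The same run id can also be built with base R's rle(), which returns the length of each run of identical consecutive values. This is an alternative to the lag()/cumsum() approach above, not a different result. The sketch reuses the example data from this answer.

```r
library(ggplot2)

set.seed(5)
df <- data.frame(
  x = seq(1, 20),
  y = rnorm(20),
  color = c(rep("A", 5), rep("B", 9), rep("A", 6))
)

# Runs only make sense in x order
df <- df[order(df$x), ]

# rle() gives the length of each run of identical consecutive colors;
# repeating each run's index by its length yields one id per run
runs <- rle(as.character(df$color))
df$group <- rep(seq_along(runs$lengths), runs$lengths)

p <- ggplot(df, aes(x = x, y = y, color = color, group = group)) +
  geom_line() +
  geom_point()
```

For the question's data, the equivalent would presumably be rle(Day_2$time_test) after ordering Day_2 by date_time, with the resulting id mapped to group.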

How to colour a funnel plot in ggplot in R

I have drawn the attached funnel plot in ggplot, but I have two questions:
Is there a way to make the coloured green dot bigger (only that one)?
Is there a way to colour the regions above and below the confidence limits?
This is what I am able to make so far:
Thank you!
The data set I am working on:
df <-
read.table(text = "
school_id year sdq_emotional
1060 7 4
1060 7 5
1060 7 7
1060 7 6
1060 7 4
1060 7 7
1060 7 8
1115 7 5
1115 7 9
1115 7 3
1136 7 1
1136 7 8
1136 7 5
1136 7 9
1135 7 4
1139 7 7
1139 7 3
2371 7 6
2371 7 3
2372 7 4
2372 7 1
2378 7 6
2378 7 7
2378 7 5", header=TRUE)
My code is as follows:
library(ggplot2)
# Format the data
df1 <- plyr::count(df, c('school_id'))
df2 <- merge(df, df1, by = c("school_id"))
df <- df2
# nona() is not defined in the post; assumed here to count non-missing values
nona <- function(x) sum(!is.na(x))
M3 <- aggregate(df$sdq_emotional[df$freq > 10], by = list(df$school_id[df$freq > 10]), mean, na.rm = TRUE)
S3 <- aggregate(df$sdq_emotional[df$freq > 10], by = list(df$school_id[df$freq > 10]), nona)
CG_PLOT1 <- merge(M3, S3, by = "Group.1")
names(CG_PLOT1) <- c("School", "Mean", "Size")
LINE3 <- data.frame(M3 = rep(mean(df$sdq_emotional, na.rm = TRUE), max(CG_PLOT1$Size) + 25),
                    SD3 = rep(sd(df$sdq_emotional, na.rm = TRUE), max(CG_PLOT1$Size) + 25),
                    N3 = sqrt(1:(max(CG_PLOT1$Size) + 25)))
ID <- 1060
filling3 <- rep("white", nrow(CG_PLOT1))
filling3[CG_PLOT1$School == ID] <- "green"
# Build the graph
ggplot(data = CG_PLOT1) +
  geom_line(data = LINE3, aes(x = 1:(max(CG_PLOT1$Size) + 25),
                              y = M3 + qnorm(0.975) * SD3 / N3),
            size = 1, colour = "steelblue2", linetype = 5) +
  geom_line(data = LINE3, aes(x = 1:(max(CG_PLOT1$Size) + 25),
                              y = M3 - qnorm(0.975) * SD3 / N3),
            size = 1, colour = "steelblue2", linetype = 5) +
  # centre line at the overall mean: all four positions go inside aes(),
  # size and colour are set as constants outside it
  geom_segment(aes(x = 1, y = mean(LINE3$M3, na.rm = TRUE),
                   xend = max(CG_PLOT1$Size) + 25, yend = mean(LINE3$M3, na.rm = TRUE)),
               size = 1, colour = "steelblue2") +
  geom_point(data = CG_PLOT1, aes(x = Size, y = Mean), size = 2,
             colour = "black", shape = 21, fill = filling3) +
  ylim(0, 8)
Thank you very much!
As you didn't provide a reproducible example, I have used this question as a template for your problem:
Creating a dataset here:
library(ggplot2)
set.seed(101)
x <- runif(100, min=1, max=10)
y <- rnorm(length(x), mean=5, sd=0.1*x)
df <- data.frame(x=x*70, y=y)
m <- lm(y ~ x, data=df)
fit95 <- predict(m, interval="conf", level=.95)
fit99 <- predict(m, interval="conf", level=.999)
df <- cbind.data.frame(df,
lwr95=fit95[,"lwr"], upr95=fit95[,"upr"],
lwr99=fit99[,"lwr"], upr99=fit99[,"upr"])
To add a coloured background to the funnel plot, we can use geom_ribbon, which fills the area between a ymin and a ymax. Here we use the same columns that were used to construct the lines:
ggplot(df, aes(x, y)) +
  # Add background bands between the outer (level = .999) limits and the panel edges
  geom_ribbon(aes(ymin = upr99, ymax = Inf), fill = "#e2a49a", alpha = 0.5) +
  geom_ribbon(aes(ymin = lwr99, ymax = upr99), fill = "#e0ba9d", alpha = 0.5) +
  geom_ribbon(aes(ymin = 0, ymax = lwr99), fill = "#8fd6c9", alpha = 0.5) +
  # Overlay points and lines
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, colour = "black", lwd = 1.1, se = FALSE) +
  geom_line(aes(y = upr95), color = "black", linetype = 2) +
  geom_line(aes(y = lwr95), color = "black", linetype = 2) +
  geom_line(aes(y = upr99), color = "red", linetype = 3) +
  geom_line(aes(y = lwr99), color = "red", linetype = 3) +
  labs(x = "No. admissions...", y = "Percentage of patients...")
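The same ribbon idea can be applied to the question's own funnel, whose limits (LINE3) are the overall mean plus or minus qnorm(0.975) * SD / sqrt(n). The sketch below computes those limits from the sdq_emotional values in the question's data frame. The x range of 1 to 30 is an arbitrary choice for illustration, and the school points are omitted.

```r
library(ggplot2)

# sdq_emotional values from the question's data frame
sdq <- c(4, 5, 7, 6, 4, 7, 8, 5, 9, 3, 1, 8, 5, 9, 4, 7, 3, 6, 3, 4, 1, 6, 7, 5)

# Funnel limits as in the question's LINE3: mean +/- z * SD / sqrt(n)
n_max <- 30  # arbitrary x range for this sketch
funnel <- data.frame(n = 1:n_max)
funnel$upr <- mean(sdq) + qnorm(0.975) * sd(sdq) / sqrt(funnel$n)
funnel$lwr <- mean(sdq) - qnorm(0.975) * sd(sdq) / sqrt(funnel$n)

p <- ggplot(funnel, aes(x = n)) +
  # shaded regions above, inside and below the limits
  geom_ribbon(aes(ymin = upr, ymax = Inf), fill = "#e2a49a", alpha = 0.5) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "#e0ba9d", alpha = 0.5) +
  geom_ribbon(aes(ymin = -Inf, ymax = lwr), fill = "#8fd6c9", alpha = 0.5) +
  geom_line(aes(y = upr), colour = "steelblue2", linetype = 5) +
  geom_line(aes(y = lwr), colour = "steelblue2", linetype = 5) +
  geom_hline(yintercept = mean(sdq), colour = "steelblue2")
```

If the question's ylim(0, 8) is kept, rows whose limits fall outside that range are dropped from the ribbons and lines. coord_cartesian(ylim = c(0, 8)) zooms in without removing data.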
As for changing the size of one point, you can check out the answer here. I would recommend subsetting the data to extract that one point, adding another geom_point layer for it, and setting the size and colour arguments of that new layer.
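A minimal sketch of that subsetting approach follows. The four rows are the means and counts for schools 1060, 1115, 1136 and 2378, computed from the question's data without its freq > 10 filter. Any CG_PLOT1 with School, Mean and Size columns would work the same way.

```r
library(ggplot2)

# School summaries shaped like the question's CG_PLOT1
# (means computed from the question's sdq_emotional values)
CG_PLOT1 <- data.frame(School = c(1060, 1115, 1136, 2378),
                       Mean = c(41 / 7, 17 / 3, 23 / 4, 6),
                       Size = c(7, 3, 4, 3))
ID <- 1060

# Extract the one school to highlight
highlight <- CG_PLOT1[CG_PLOT1$School == ID, ]

p <- ggplot(CG_PLOT1, aes(x = Size, y = Mean)) +
  # all schools, small white points
  geom_point(size = 2, colour = "black", shape = 21, fill = "white") +
  # the highlighted school, drawn on top with its own size and fill
  geom_point(data = highlight, size = 4, colour = "black", shape = 21, fill = "green")
```

Because the highlight is a separate layer, the filling3 vector from the question is no longer needed.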

ggplot2 facets: Different annotation text for each plot

I have the following generated data frame called Raw_Data:
Time Velocity Type
1 10 1 a
2 20 2 a
3 30 3 a
4 40 4 a
5 50 5 a
6 10 2 b
7 20 4 b
8 30 6 b
9 40 8 b
10 50 9 b
11 10 3 c
12 20 6 c
13 30 9 c
14 40 11 c
15 50 13 c
I plotted this data with ggplot2:
ggplot(Raw_Data, aes(x=Time, y=Velocity))+geom_point() + facet_grid(Type ~.)
I have the objects: Regression_a, Regression_b, Regression_c. These are the linear regression equations for each plot. Each plot should display the corresponding equation.
Using annotate displays the same equation on every panel:
annotate("text", x = 1.78, y = 5, label = Regression_a, color="black", size = 5, parse=FALSE)
I tried to overcome the issue with the following code:
Regression_a_eq <- data.frame(x = 1.78, y = 1,label = Regression_a,
Type = "a")
p <- x + geom_text(data = Raw_Data,label = Regression_a)
This did not solve the problem. Each plot still showed Regression_a, not just plot a.
You can put the expressions as character values in a new data frame with the same unique Type values as your data and add them with geom_text:
regrDF <- data.frame(Type = c('a','b','c'), lbl = c('Regression_a', 'Regression_b', 'Regression_c'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type ~.)
which gives:
You can replace the text values in regrDF$lbl with the appropriate expressions.
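One way to fill regrDF$lbl without typing the equations by hand is to fit one lm() per Type and format the coefficients. The sketch below rebuilds Raw_Data from the table in the question. The label format "y = a + b x" is just one possible choice.

```r
library(ggplot2)

Raw_Data <- data.frame(
  Time = rep(c(10, 20, 30, 40, 50), times = 3),
  Velocity = c(1, 2, 3, 4, 5, 2, 4, 6, 8, 9, 3, 6, 9, 11, 13),
  Type = rep(c("a", "b", "c"), each = 5)
)

# Fit one linear model per Type and keep its coefficients
fits <- lapply(split(Raw_Data, Raw_Data$Type),
               function(d) coef(lm(Velocity ~ Time, data = d)))

# One row per facet level; Type must match the values of the facet variable
regrDF <- data.frame(
  Type = names(fits),
  lbl = vapply(fits, function(cf) sprintf("y = %.2f + %.3f x", cf[[1]], cf[[2]]),
               character(1))
)

p <- ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
  geom_point() +
  geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
  facet_grid(Type ~ .)
```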
A supplement to the accepted answer, for when there are facets in both the horizontal and vertical directions:
regrDF <- data.frame(Type1 = c('a','a','b','b'),
Type2 = c('c','d','c','d'),
lbl = c('Regression_ac', 'Regression_ad', 'Regression_bc', 'Regression_bd'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type1 ~ Type2)
The answer is good but still imperfect, as I do not know how to combine math expressions and a newline (see "Adding a newline in a substitute() expression").
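A common workaround is plotmath's atop(a, b), which stacks two expressions and so acts as a two-line math label when geom_text() is called with parse = TRUE. atop may render the lines slightly smaller than normal text. The sketch below assumes the single-facet Raw_Data from the question and builds one label string per Type. For a simple linear regression, R^2 is the squared correlation, which is used here.

```r
library(ggplot2)

Raw_Data <- data.frame(
  Time = rep(c(10, 20, 30, 40, 50), times = 3),
  Velocity = c(1, 2, 3, 4, 5, 2, 4, 6, 8, 9, 3, 6, 9, 11, 13),
  Type = rep(c("a", "b", "c"), each = 5)
)

# Build a plotmath string per facet: atop(a, b) stacks two expressions
make_label <- function(d) {
  cf <- coef(lm(Velocity ~ Time, data = d))
  r2 <- cor(d$Time, d$Velocity)^2  # R^2 of a simple linear regression
  sprintf("atop(y == %.2f + %.3f %%*%% x, R^2 == %.3f)", cf[[1]], cf[[2]], r2)
}

regrDF <- data.frame(
  Type = c("a", "b", "c"),
  lbl = vapply(split(Raw_Data, Raw_Data$Type), make_label, character(1))
)

p <- ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
  geom_point() +
  # parse = TRUE makes geom_text interpret lbl as a plotmath expression
  geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl),
            hjust = 0, parse = TRUE) +
  facet_grid(Type ~ .)
```

The same label column works unchanged with the two-way facet_grid(Type1 ~ Type2) version, as long as regrDF carries both facet columns.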

NA value breaks ggplot2 heatmap?

I'm using ggplot2 to generate a heatmap, but NA values cause the heatmap to be all one color.
Example dataframe:
id<-as.factor(c(1:5))
year<-as.factor(c("Y13", "Y14", "Y15"))
freq<-c(26, 137, 166, 194, 126, 8, 4, 76, 20, 92, 4, NA, 6, 6, 17)
test<-data.frame(id, year, freq)
test
id year freq
1 Y13 26
2 Y14 137
3 Y15 166
4 Y13 194
5 Y14 126
1 Y15 8
2 Y13 4
3 Y14 76
4 Y15 20
5 Y13 92
1 Y14 4
2 Y15 NA
3 Y13 6
4 Y14 6
5 Y15 17
I used the following for the heatmap:
library(ggplot2)
library(RColorBrewer)
# set color palette
jBuPuFun <- colorRampPalette(brewer.pal(n = 9, "RdBu"))
paletteSize <- 256
jBuPuPalette <- jBuPuFun(paletteSize)
# heatmap
ggplot(test, aes(x = year, y = id, fill = freq)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  geom_tile() +
  scale_fill_gradient2(low = jBuPuPalette[1],
                       mid = jBuPuPalette[paletteSize/2],
                       high = jBuPuPalette[paletteSize],
                       midpoint = (max(test$freq) + min(test$freq)) / 2,
                       name = "Number of Violations")
The result is a gray color over the entire heatmap.
When I removed the "NA" from the dataframe, the heatmap renders correctly.
I've experimented by explicitly assigning a color to the NA values, for example:
scale_fill_gradient2(low = jBuPuPalette[1],
mid = jBuPuPalette[paletteSize/2],
high = jBuPuPalette[paletteSize],
na.value="yellow",
midpoint = (max(test$freq) + min(test$freq)) / 2,
name = "Number of Violations")
However, that just made the entire heatmap yellow.
Am I missing something obvious? Any suggestions are appreciated.
Thanks.
Comment to answer:
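ggplot deals with NAs just fine, but by default min() and max() return NA if the vector contains any NA. The midpoint therefore becomes NA, and every tile is drawn with na.value (gray by default, yellow in your second attempt). You just need to set na.rm = TRUE when you define the midpoint of your scale: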
midpoint = (max(test$freq, na.rm = TRUE ) + min(test$freq, na.rm = TRUE)) / 2,
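A self-contained version of the corrected heatmap, using the example data from the question, follows. Only the tile whose freq is missing should be drawn with na.value. The "grey50" used here is simply spelled out explicitly.

```r
library(ggplot2)
library(RColorBrewer)

# Example data from the question
id <- as.factor(1:5)
year <- as.factor(c("Y13", "Y14", "Y15"))
freq <- c(26, 137, 166, 194, 126, 8, 4, 76, 20, 92, 4, NA, 6, 6, 17)
test <- data.frame(id, year, freq)

# Without na.rm = TRUE, max() and min() return NA here and so would midpoint
mid <- (max(test$freq, na.rm = TRUE) + min(test$freq, na.rm = TRUE)) / 2

pal <- colorRampPalette(brewer.pal(n = 9, "RdBu"))(256)

p <- ggplot(test, aes(x = year, y = id, fill = freq)) +
  geom_tile() +
  scale_fill_gradient2(low = pal[1], mid = pal[128], high = pal[256],
                       midpoint = mid, na.value = "grey50",
                       name = "Number of Violations")

# Fill colours actually assigned to each tile
fills <- layer_data(p)$fill
```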
