I have this data set, I want to make Hierarchical Cluster Heatmap in R. Please help me
structure(list(Location = c("Karnaphuli River", "Sangu River", "Kutubdia Channel", "Moheshkhali Channel", "Bakkhali River", "Naf River", "St. Martin's Island", "Mean "), Cr = c(114.92, 2.75, 18.88, 27.6, 39.5, 12.8, 17.45, 33.41), Pb = c(31.29, 26.42, 52.3, 59.45, 34.65, 12.8, 9.5, 32.34), Cu = c(9.48, 54.39, 52.4, 73.28, 76.26, 19.48, 8.94, 42.03), Zn = c(66.2, 71.17, 98.7, 95.3, 127.84, 27.76, 21.78, 72.67), As = c(89.67, 9.85, 8.82, 18.54, 15.38, 7.55, 16.45, 23.75), Cd = c(1.06, 0, 0.96, 2.78, 3.12, 0.79, 0.45, 1.53)), class = "data.frame", row.names = c(NA, -8L))
library(tidyverse)
library(gplots)
#>
#> Attaching package: 'gplots'
#> The following object is masked from 'package:stats':
#>
#> lowess
dat <- structure(list(Location = c("Karnaphuli River", "Sangu River", "Kutubdia Channel", "Moheshkhali Channel", "Bakkhali River", "Naf River", "St. Martin's Island", "Mean "), Cr = c(114.92, 2.75, 18.88, 27.6, 39.5, 12.8, 17.45, 33.41), Pb = c(31.29, 26.42, 52.3, 59.45, 34.65, 12.8, 9.5, 32.34), Cu = c(9.48, 54.39, 52.4, 73.28, 76.26, 19.48, 8.94, 42.03), Zn = c(66.2, 71.17, 98.7, 95.3, 127.84, 27.76, 21.78, 72.67), As = c(89.67, 9.85, 8.82, 18.54, 15.38, 7.55, 16.45, 23.75), Cd = c(1.06, 0, 0.96, 2.78, 3.12, 0.79, 0.45, 1.53)), class = "data.frame", row.names = c(NA, -8L))
dat %>%
column_to_rownames(var = "Location") %>%
as.matrix() %>%
heatmap.2(., # source data
scale = "none", # set scaling by column, row or none
Rowv = T, # toggles clustering of rows
Colv = T, # toggles clustering of columns
trace = "none", # turn off trace in each column of heatmap
margin = c(3, 10), # set margins around plot
col = colorRampPalette(c("white", "red"))(11), # set color scheme
symkey = F, # set color scale to be asymetric
symbreaks = F, # set color scale to be asymetric
main = "Heatmap Title", # set main plot title
cexRow = 1, # set font size for rows
cexCol = 1, # set font size for columns
tracecol = "black", # set color of histogram on key
key.xlab = "Value", # set title for legend
lhei = c(1,3), # set key height as proportion to total plot height
lwid = c(1,3), # set key width as proportion to total plot width
keysize = 2 # set overall key size
)
Created on 2021-04-21 by the reprex package (v1.0.0)
There are many options to customize the details of the heatmap - check the documentation for more. I've shown a few of the ones I commonly use here.
Related
I have date for 8 years. Sample of my data:
structure(list(Data = c("1/1/2015", "1/2/2015", "1/3/2015", "1/4/2015",
"1/5/2015", "1/6/2015", "1/7/2015", "1/8/2015", "1/9/2015", "1/10/2015",
"1/11/2015", "1/12/2015", "1/13/2015", "1/14/2015", "1/15/2015",
"1/16/2015", "1/17/2015", "1/18/2015", "1/19/2015", "1/20/2015",
"1/21/2015", "1/22/2015", "1/23/2015", "1/24/2015", "1/25/2015",
"1/26/2015", "1/27/2015", "1/28/2015", "1/29/2015", "1/30/2015",
"1/31/2015"), no2 = c(3.56, 11.13, 11.84, 4.88, 6.16, 12.56,
18.99, 24.74, 10.81, 12.7, 6.08, 7.34, 16.88, 16.65, 15.81, 20.78,
15.03, 11.82, 15.18, 17, 15.21, 13.86, 10.28, 8.34, 11.89, 7.22,
15.44, 10.55, 8.19, 5.04, 14.65), ws = c(10.84, 3.71, 2.08, 4.59,
6.18, 2.97, 2.13, 1.22, 1.92, 2.07, 3.09, 4.75, 2.12, 1.8, 1.9,
1.79, 1.58, 1.86, 1.58, 1.47, 1.7, 2.6, 2.67, 3.21, 1.78, 4.58,
1.79, 3.1, 3.49, 6.15, 2.59), wd = c(90, 112.5, 112.5, 270, 90,
135, 112.5, 112.5, 270, 315, 270, 112.5, 112.5, 135, 135, 112.5,
292.5, 135, 270, 135, 112.5, 112.5, 270, 112.5, 112.5, 112.5,
112.5, 112.5, 270, 270, 270)), class = "data.frame", row.names = c(NA,
-31L))
library(openair)
windRose(nitrogen,
key = list(header="Wind Rose Acri", footer="wind speed",
plot.style = c("ticks", "border"),
fit = "all", height = 1,
space = "top"))
pollutionRose(nitrogen, pollutant = "no2")
I want to show how the wind rose varies by month. The same problem (Wind rose with ggplot (R)?) but tried realised by function from Openair package.
You could convert your Data column to a name called date with date format and specify type argument with "month". type according to documenation:
type determines how the data are split i.e. conditioned, and then
plotted. The default is will produce a single plot using the entire
data. Type can be one of the built-in types as detailed in cutData
e.g. “season”, “year”, “weekday” and so on. For example, type =
"season" will produce four plots --- one for each season.
It is also possible to choose type as another variable in the data
frame. If that variable is numeric, then the data will be split into
four quantiles (if possible) and labelled accordingly. If type is an
existing character or factor variable, then those categories/levels
will be used directly. This offers great flexibility for understanding
the variation of different variables and how they depend on one
another.
Type can be up length two e.g. type = c("season", "weekday") will
produce a 2x2 plot split by season and day of the week. Note, when two
types are provided the first forms the columns and the second the
rows.
Please note you only provided one month:
library(openair)
# add month column
nitrogen$date <- as.POSIXct(nitrogen$Data, format = '%m/%d/%Y')
windRose(nitrogen,
key = list(header="Wind Rose Acri", footer="wind speed",
plot.style = c("ticks", "border"),
fit = "all", height = 1,
space = "top"),
type = 'month')
Created on 2022-12-13 with reprex v2.0.2
Here is an example with build-in data with type = 'month':
library(openair)
windRose(mydata, type = "month")
Created on 2022-12-13 with reprex v2.0.2
I have a graph in Excel that I'd like to replicate in R if it's even possible. I am new to R, so any guidance will be appreciated.
So my data looks like this
I can include the file if anyone wants it.
Then I have this graph:
I'd like to plot the same graph as in the Excel file but using R. So, is there a way to have a kind of subset for the x-axis values that belong to the main value?
I looked through the ggplot documentation and How to plot side-by-side with sub labels and without space between subplots, but to no avail.
You can use either geom_bar(position = "dodge") or facet_wrap() to achieve your desired results. Please note that you'll need to name all your variables before plotting as it looks like the first two columns of your dataframe do not have names.
library(tidyverse)
data(mtcars)
# make a nested dataframe for example purposes
df <- mtcars %>%
rownames_to_column(var = "rowname") %>%
select(c(1:5)) %>%
pivot_longer(cols = -c(rowname)) %>%
head(n = 20)
ggplot(df, aes(x = name, y = value, fill = name)) +
geom_bar(stat = "identity") +
facet_wrap(~rowname, nrow = 1) # use facet_wrap to display nestedness
ggplot(df, aes(x = rowname, y = value, fill = name)) +
geom_bar(position = "dodge", stat = "identity")
This can be helpful
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(ACI:SB) %>%
mutate(across(where(is.character), as.factor)) %>%
ggplot(aes(x = R, y = value, fill=name))+
geom_bar(stat="identity", position = "dodge", width=0.75)+
facet_wrap(~A, nrow=1, strip.position="bottom") +
theme(legend.position = "bottom") +
labs(fill="", y="", x="")
Produces:
If you want "to speak R with Excel accent" and convert this nice plot into a default excel plot, then you can add at the end of the plot theme_excel_new() from ggtheme package
library(ggthemes)
... +
theme_excel_new()
It'll give the following plot
Sample data:
structure(list(A = c(25, 25, 25, 50, 50, 50, 100, 100, 100, 250,
250, 250), R = c("R1", "R2", "R3", "R1", "R2", "R3", "R1", "R2",
"R3", "R1", "R2", "R3"), ACI = c(2.94, 1.91, 8.86, 5.03, 8.77,
1.89, 7.58, 7.24, 9.44, 5.48, 7.12, 3.89), PB = c(1.01, 9.27,
2.83, 5.91, 1.1, 8.41, 3.18, 7.83, 2.68, 2.19, 5.17, 2.69), NB = c(1.81,
5.19, 5.63, 1.29, 2.56, 7.18, 9.61, 1, 7.63, 9.48, 8.19, 3.08
), Bca = c(6.5, 9.53, 9.54, 3.4, 2.62, 1.65, 3.22, 5.1, 9.24,
5.11, 2.58, 0.46), SB = c(4.18, 8.54, 3.47, 1.31, 3.74, 6.31,
3.9, 6.9, 6.89, 5.55, 4.3, 4.53), `round(2)` = c(2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2)), class = "data.frame", row.names = c(NA,
-12L))
I have use this code to make hierarchical cluster heatmap but no color is coming
library(tidyverse)
Mydata <- structure(list(Location = c("Karnaphuli River", "Sangu River", "Kutubdia Channel", "Moheshkhali Channel", "Bakkhali River", "Naf River", "St. Martin's Island", "Mean "), Cr = c(114.92, 2.75, 18.88, 27.6, 39.5, 12.8, 17.45, 33.41), Pb = c(31.29, 26.42, 52.3, 59.45, 34.65, 12.8, 9.5, 32.34), Cu = c(9.48, 54.39, 52.4, 73.28, 76.26, 19.48, 8.94, 42.03), Zn = c(66.2, 71.17, 98.7, 95.3, 127.84, 27.76, 21.78, 72.67), As = c(89.67, 9.85, 8.82, 18.54, 15.38, 7.55, 16.45, 23.75), Cd = c(1.06, 0, 0.96, 2.78, 3.12, 0.79, 0.45, 1.53)), class = "data.frame", row.names = c(NA, -8L))
library(pheatmap)
Mydata %>% column_to_rownames(var = "Location") %>%
as.matrix() %>% pheatmap(Mydata, cutree_cols = 6)
You don't need to pass data again when using pipes. Try :
library(pheatmap)
Mydata %>%
column_to_rownames(var = "Location") %>%
as.matrix() %>% pheatmap(cutree_cols = 6)
I'm creating a custom chart to visualize a variable's distribution using geom_density. I added 3 vertical lines for a custom value, the 5th percentile and the 95th percentile.
How do I add labels for those lines?
I tried using geom_text but i don't know how to parameter the x and y variables
library(ggplot2)
ggplot(dataset, aes(x = dataset$`Estimated percent body fat`)) +
geom_density() +
geom_vline(aes(xintercept = dataset$`Estimated percent body fat`[12]),
color = "red", size = 1) +
geom_vline(aes(xintercept = quantile(dataset$`Estimated percent body fat`,
0.05, na.rm = TRUE)),
color = "grey", size = 0.5) +
geom_vline(aes(xintercept = quantile(dataset$`Estimated percent body fat`,
0.95, na.rm = TRUE)),
color="grey", size=0.5) +
geom_text(aes(x = dataset$`Estimated percent body fat`[12],
label = "Custom", y = 0),
colour = "red", angle = 0)
I'd like to obtain the following:
for the custom value, I'd like to add the label at the top of the chart, just to the right of the line
for the percentiles label, I'd like to add them in the middle of the chart; at the left of the line for the 5th percentile and right of the line for 95th percentile
Here is what I was able to obtain https://i.imgur.com/thSQwyg.png
And these are the first 50 lines of my dataset:
structure(list(`Respondent sequence number` = c(21029L, 21034L,
21043L, 21056L, 21067L, 21085L, 21087L, 21105L, 21107L, 21109L,
21110L, 21125L, 21129L, 21138L, 21141L, 21154L, 21193L, 21195L,
21206L, 21215L, 21219L, 21221L, 21232L, 21239L, 21242L, 21247L,
21256L, 21258L, 21287L, 21310L, 21325L, 21367L, 21380L, 21385L,
21413L, 21418L, 21420L, 21423L, 21427L, 21432L, 21437L, 21441L,
21444L, 21453L, 21466L, 21467L, 21477L, 21491L, 21494L, 21495L
), `Estimated percent body fat` = c(NA, 7.2, NA, NA, 24.1, 25.1,
30.2, 23.6, 24.3, 31.4, NA, 14.1, 20.5, NA, 23.1, 30.6, 21, 20.9,
NA, 24, 26.7, 16.6, NA, 26.9, 16.9, 21.3, 15.9, 27.4, 13.9, NA,
20, NA, 12.8, NA, 33.8, 18.1, NA, NA, 28.4, 10.9, 38.1, 33, 39.3,
15.9, 32.7, NA, 20.4, 16.8, NA, 29)), row.names = c(NA, 50L), class =
"data.frame")
First I recommend clean column names.
dat <- dataset
names(dat) <- tolower(gsub("\\s", "\\.", names(dat)))
Whith base R plots you could do the following. The clou is, that you can store the quantiles and custom positions to use them as coordinates later which gives you a dynamic positioning. I'm not sure if/how this is possible with ggplot.
plot(density(dat$estimated.percent.body.fat, na.rm=TRUE), ylim=c(0, .05),
main="Density curve")
abline(v=c1 <- dat$estimated.percent.body.fat[12], col="red")
abline(v=q1 <- quantile(dat$estimated.percent.body.fat, .05, na.rm=TRUE), col="grey")
abline(v=q2 <- quantile(dat$estimated.percent.body.fat, .95, na.rm=TRUE), col="grey")
text(c1 + 4, .05, c(expression("" %<-% "custom")), cex=.8)
text(q1 - 5.5, .025, c(expression("5% percentile" %->% "")), cex=.8)
text(q2 + 5.5, .025, c(expression("" %<-% "95% percentile")), cex=.8)
Note: Case you don't like the arrows just do e.g. "5% percentile" instead of c(expression("5% percentile" %->% "")).
Or in ggplot you could use annotate.
library(ggplot2)
ggplot(dataset, aes(x = dataset$`Estimated percent body fat`)) +
geom_density() +
geom_vline(aes(xintercept = dataset$`Estimated percent body fat`[12]),
color = "red", size = 1) +
geom_vline(aes(xintercept = quantile(dataset$`Estimated percent body fat`,
0.05, na.rm = TRUE)),
color = "grey", size = 0.5) +
geom_vline(aes(xintercept = quantile(dataset$`Estimated percent body fat`,
0.95, na.rm = TRUE)),
color="grey", size=0.5) +
annotate("text", x=16, y=.05, label="custom") +
annotate("text", x=9.5, y=.025, label="5% percentile") +
annotate("text", x=38, y=.025, label="95% percentile")
Note, that in either solution the result (i.e. exact label positions) depends on your export size. To learn how to control this, take e.g. a look into How to save a plot as image on the disk?.
Data
dataset <- structure(list(`Respondent sequence number` = c(21029L, 21034L,
21043L, 21056L, 21067L, 21085L, 21087L, 21105L, 21107L, 21109L,
21110L, 21125L, 21129L, 21138L, 21141L, 21154L, 21193L, 21195L,
21206L, 21215L, 21219L, 21221L, 21232L, 21239L, 21242L, 21247L,
21256L, 21258L, 21287L, 21310L, 21325L, 21367L, 21380L, 21385L,
21413L, 21418L, 21420L, 21423L, 21427L, 21432L, 21437L, 21441L,
21444L, 21453L, 21466L, 21467L, 21477L, 21491L, 21494L, 21495L
), `Estimated percent body fat` = c(NA, 7.2, NA, NA, 24.1, 25.1,
30.2, 23.6, 24.3, 31.4, NA, 14.1, 20.5, NA, 23.1, 30.6, 21, 20.9,
NA, 24, 26.7, 16.6, NA, 26.9, 16.9, 21.3, 15.9, 27.4, 13.9, NA,
20, NA, 12.8, NA, 33.8, 18.1, NA, NA, 28.4, 10.9, 38.1, 33, 39.3,
15.9, 32.7, NA, 20.4, 16.8, NA, 29)), row.names = c(NA, 50L), class =
"data.frame")
I made a barplot with error bars and labels written on the bars.
My problem is: I want the labels to appear on the bars and also next to the error bars. That is, I don't want labels and error bars to overlap.
An example with my code:
i <- data.frame(
nbr =c(15.18 ,11.53 ,13.37 ,9.2, 10.9, 12.23 ,9.53, 9.81, 7.86, 12.79,
22.03 ,17.64 ,18.1, 16.78 ,17.53 ,16.97 ,17.76 ,18.35 ,12.82 ,20.91,
22.09 ,19.18 ,17.54 ,18.45 ,19.83 ,16.99 ,19.69 ,19.45 ,13.07 ,21.41,
12.13 ,9.76, 10.79 ,10.74 ,12.43 ,9.65, 12.18 ,11.63 ,6.74, 12.31,
17.5, 14.75 ,15.2, 13.89 ,15.24 ,17.43 ,15.22 ,14.04,9.49, 15.86,
8.09, 5.86, 6.68, 7.34, 8.01, 6.35, 8.4, 7.4, 3.88, 6.92 ),
SD = c(4.46, 4.19, 2.27, 2.19, 5.10, 7.25, 8.42, 6.47, 6.04, 7.48, 6.38, 6.05, 3.58, 3.85,
6.94, 6.87, 6.32, 4.28, 4.10, 7.34, 7.46, 6.62, 4.28, 5.24, 8.00, 8.10, 7.73, 5.18,
5.53, 7.96, 7.46, 7.05, 4.47, 4.73, 8.15, 6.95, 5.88, 3.20, 4.01, 7.34, 7.24, 6.98,
5.98, 4.53, 4.22, 7.21, 4.02, 4.30, 1.96, 2.11, 4.98, 7.16, 8.45, 6.39, 6.20, 7.03,
6.10, 6.42, 3.77, 3.53),
x2=rep(c("a", "b", "c", "d", "e", "f", "g",
"h", "i", "j"),6),
s = c(rep(c(rep(c("3"),10),
rep(c("4"),10),
rep(c("5"),10),
rep(c("6"),10),
rep(c("7"),10),
rep(c("8"),10)),1)))
ii <- i[order(i$s, i$nbr ), ]
sn <- factor(x = 1:60, labels = ii$x2)
ii$sn <- sn
scale_x_reordered <- function(..., sep = "___") {
reg <- paste0(sep, ".+$")
ggplot2::scale_x_discrete(labels = function(x) gsub(reg, "", x), ...)
}
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
new_x <- paste(x, within, sep = sep)
stats::reorder(new_x, by, FUN = fun)
}
dummy2 <- data.frame(s = levels(i$s)[-1], Z = c( 4,16,16,8,4))
dummy2$s <- factor(dummy2$s)
ggplot(ii, aes(reorder_within(sn, nbr, s), nbr,
label =x2)) +
geom_bar(stat = 'identity') +
geom_text(aes(y = 0,fontface=2), angle = 90, hjust = -.05, size = 4)+
scale_x_reordered() +
facet_wrap(.~ s, scales = "free_x", ncol=2)+
#geom_text(aes(label=nbr), vjust=1.6, color="white", size=3.5)+
theme(axis.text.x = element_blank(),
axis.title=element_text(size=16),
axis.text=element_text(face = "bold"),
strip.text.x = element_text(size = 14,face="bold")
)+ geom_errorbar(aes(reorder_within(sn, nbr, s),ymin=nbr-SD, ymax=nbr+SD), width=.2, position=position_dodge(.9))
Example of expected parcel:
I want all the labels to be written next to the error bars on the bars.
Thanks for your help !
I found this solution and wanted to share it with you:
geom_text(aes(y = 0,fontface=2), angle = 90, vjust = -1, hjust = -.05, size = 4)