Legends not showing up properly in heatmap with ggplot2 - r

I am trying to make a heatmap of normalized read abundance values with geom_tile in ggplot2, based on the example code here. My current code produces a heatmap for the desired ranges, but for some reason only 4 of the 7 ranges are shown in the heatmap legend, and I cannot figure out what the issue is. When I followed the example in the original link it worked fine, so I must have changed something incorrectly in my code. Can anyone help me identify the error in my code that is causing this?
I want to have the following color scheme:
-Inf < value <= 0 -> white
0 < value <= 1 -> yellow
1 < value <= 10 -> orange
10 < value <= 100 -> darkorange2
100 < value <= 1000 -> red
1000 < value <= 10000 -> red3
10000 < value <= 32000 -> red4
Here is my code:
library(ggplot2)
# re-order the labels by first appearance in the data frame (unique() avoids duplicated factor levels)
df$label <- factor(df$X1, levels = unique(df$X1))
# make the cuts (the lowest break is -Inf, matching the "(-Inf,0]" level in the output below)
df$value1 <- cut(df$value, breaks = c(-Inf, 0, 1, 10, 100, 1000, 10000, 32000), right = TRUE)
ggplot(data = df, aes(x = label, y = X2)) + geom_tile(aes(fill = value1), colour = "black") +
  scale_fill_manual(breaks = c("(-Inf,0]", "(0,1]", "(1,10]", "(10,100]", "(100,1000]", "(1000,10000]", "(10000,32000]"),
                    values = c("white", "yellow", "orange", "darkorange2", "red", "red3", "red4"))
Here is a preview of my data (the actual data has 228 rows of reads-per-million values for 38 IDs in 6 different experiments):
head(df)
X1 X2 value label value1
1 merged_read_17785-997_aka_156_aka_21 RPM.MT1 91.783028 merged_read_17785-997_aka_156_aka_21 (10,100]
2 merged_read_133362-79_aka_156_aka_21 RPM.MT1 6.403467 merged_read_133362-79_aka_156_aka_21 (1,10]
3 merged_read_147828-69_aka_156_aka_20 RPM.MT1 4.268978 merged_read_147828-69_aka_156_aka_20 (1,10]
4 merged_read_162443-60_aka_156_aka_21 RPM.MT1 0.000000 merged_read_162443-60_aka_156_aka_21 (-Inf,0]
5 merged_read_262156-32_aka_156_aka_21 RPM.MT1 5.691971 merged_read_262156-32_aka_156_aka_21 (1,10]
6 merged_read_22905-759_aka_159_aka_21 RPM.MT1 140.164780 merged_read_22905-759_aka_159_aka_21 (100,1e+03]
And here is the plot that I get from the above data:

I think I figured this out: if I remove the breaks argument from scale_fill_manual, all legend entries are shown:
ggplot(data = df, aes(x = label, y = X2)) + geom_tile(aes(fill = value1), colour = "black") +
  scale_fill_manual(values = c("white", "yellow", "orange", "darkorange2", "red", "red3", "red4"))
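The labels in the head() output above suggest why removing breaks helps: with its default dig.lab = 3, cut() prints 1000, 10000 and 32000 as "1e+03", "1e+04" and "3.2e+04", so levels such as "(100,1e+03]" never match the strings "(100,1000]", "(1000,10000]" and "(10000,32000]" given to breaks. That would account for exactly three missing legend entries. A minimal sketch of a fix that keeps the labels consistent sets dig.lab = 5 and names the colours by level. The toy data frame is hypothetical, and drop = FALSE is an optional addition that keeps empty bins in the legend.

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's df (values made up)
df <- data.frame(
  label = c("id1", "id2", "id3", "id4"),
  X2 = "RPM.MT1",
  value = c(0, 6.4, 140.2, 20000)
)
breaks <- c(-Inf, 0, 1, 10, 100, 1000, 10000, 32000)

# Default dig.lab = 3: 1000 is printed as "1e+03", so "(100,1e+03]"
# never matches the string "(100,1000]"
print(levels(cut(df$value, breaks = breaks)))

# dig.lab = 5 keeps every break in plain notation
df$value1 <- cut(df$value, breaks = breaks, right = TRUE, dig.lab = 5)

# Name the colours by level so each bin always gets its intended colour
fill_cols <- setNames(
  c("white", "yellow", "orange", "darkorange2", "red", "red3", "red4"),
  levels(df$value1)
)

p <- ggplot(df, aes(x = label, y = X2)) +
  geom_tile(aes(fill = value1), colour = "black") +
  # drop = FALSE also lists bins that have no tiles in the legend
  scale_fill_manual(values = fill_cols, drop = FALSE)
```

Because the colours are matched by name rather than by position, the mapping stays correct even if the bins are reordered or some of them are empty.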

Related

geom_vline doesn't work after scale_x_discrete in R

I am a newbie here, sorry for not writing the question properly :p
1. The aim is to plot the mean NDVI value of my study site (named RB1) over a time period (8 dates chosen between 2019-05 and 2019-10), and to draw vertical lines marking the dates of grass-cutting events.
2. I have calculated the NDVI values for these 8 dates and saved them in a CSV file.
(PS: "cutting" means that the grassland on the study site was cut on that date, so the corresponding dates should be shown as vertical lines, using geom_vline.)
library(readr)  # read_csv()
infor <- read_csv("plotting information.csv")
infor
# A tibble: 142 x 3
date NDVI cutting
<date> <dbl> <lgl>
1 2019-05-12 NA NA
2 2019-05-13 NA NA
3 2019-05-14 NA NA
4 2019-05-15 NA NA
5 2019-05-16 NA NA
6 2019-05-17 0.787 TRUE
# ... with 132 more rows
3. The problem is this. First, I want the x-axis to span the whole time period (2019-05 to 2019-10) without labelling every date in between, since that would put far too many dates on the x-axis. So I use scale_x_discrete(breaks =, labels =) to show only the specific dates that have NDVI values.
Second, I also want to mark the dates on which the grass was cut with geom_vline.
BUT it seems that scale_x_discrete requires my date to be a factor, while geom_vline requires the date to stay numeric.
These two requirements seem to contradict each other.
y1 <- ggplot(infor, aes(factor(date), NDVI, group = 1)) +
geom_point() +
geom_line(data=infor[!is.na(infor$NDVI),]) +
scale_x_discrete(breaks = c("2019-05-17", "2019-06-18", "2019-06-26", "2019-06-28","2019-07-23","2019-07-28", "2019-08-27","2019-08-30", "2019-09-21"),
labels = c("0517","0618","0626","0628","0723","0728", "0827","0830","0921"))
y2 <- ggplot(infor, aes(date, NDVI, group = 1)) +
geom_point() +
geom_line(data=infor[!is.na(infor$NDVI),])
When I add geom_vline to y1, the vertical lines do not show on my plot.
When I add it to y2, the vertical lines are shown, but the dates on the x-axis look odd (not like in y1, because y2 has no scale_x_discrete call).
The geom_vline layer I add is:
y1 +
geom_vline(data=filter(infor,cutting == "TRUE"), aes(xintercept = as.numeric(date)), color = "red", linetype ="dashed")
I would appreciate any help!
Thanks in advance! :D
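Why the y1 attempt misplaces the lines can be sketched in base R. A discrete x scale on factor(date) places the levels at positions 1, 2, 3, ..., whereas as.numeric(date) returns days since 1970-01-01 (over 18000 for dates in 2019), so the intercepts do not correspond to any level position. If the factor axis is kept, one option is to convert each cutting date into its level index. The miniature infor below is hypothetical.

```r
# Hypothetical miniature version of `infor` (sorted dates, one row per day)
infor <- data.frame(
  date = as.Date(c("2019-05-15", "2019-05-16", "2019-05-17", "2019-05-18")),
  NDVI = c(NA, NA, 0.787, NA),
  cutting = c(NA, NA, TRUE, NA)
)

# factor(date) gives one level per date; a discrete axis puts them at 1, 2, 3, ...
date_levels <- levels(factor(infor$date))

# Days since 1970-01-01: far outside the range 1..length(date_levels)
print(as.numeric(infor$date[3]))

# `%in% TRUE` treats NA as "not cut"
cut_dates <- infor$date[infor$cutting %in% TRUE]

# Position of each cutting date on the discrete axis
cut_pos <- match(as.character(cut_dates), date_levels)
print(cut_pos)

# cut_pos could then be used as, e.g.:
# y1 + geom_vline(xintercept = cut_pos, colour = "red", linetype = "dashed")
```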
I agree with the comment about leaving dates as dates. In this case, you can specify the x-intercept of geom_vline as a date.
Given basic data:
library(tibble)   # tribble()
library(dplyr)    # mutate(), filter(), pull(), %>%
library(ggplot2)
df <- tribble(
~Date, ~Volume, ~Cut,
'1-1-2010', 123456, 'FALSE',
'5-1-2010', 789012, 'TRUE',
'9-1-2010', 5858585, 'TRUE',
'12-31-2010', 2543425, 'FALSE'
)
I parse the dates and then pull the dates where Cut == 'TRUE' into a new object:
df <- mutate(df, Date = lubridate::mdy(Date))
d2 <- filter(df, Cut == 'TRUE') %>% pull(Date)
And finally use the object to specify intercepts:
df %>%
ggplot(aes(x = Date, y = Volume)) +
geom_vline(xintercept = d2) +
geom_line()
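To keep a Date axis while reproducing the y1 labels, this answer's approach can be combined with scale_x_date(). In the sketch below, infor is a made-up stand-in for the question's CSV (daily rows, NDVI on a few days); the breaks are placed only at the measured dates, and date_labels = "%m%d" formats them like "0517".

```r
library(ggplot2)

# Made-up stand-in for the question's `infor`: one row per day,
# NDVI only on a few dates, cutting = TRUE on the cutting dates
infor <- data.frame(date = seq(as.Date("2019-05-12"), as.Date("2019-09-30"), by = "day"))
ndvi_dates <- as.Date(c("2019-05-17", "2019-06-18", "2019-07-23", "2019-08-27", "2019-09-21"))
infor$NDVI <- NA_real_
infor$NDVI[match(ndvi_dates, infor$date)] <- c(0.787, 0.70, 0.65, 0.72, 0.68)
infor$cutting <- infor$date %in% as.Date(c("2019-06-18", "2019-08-27"))

measured <- infor[!is.na(infor$NDVI), ]
cut_dates <- infor$date[infor$cutting]

p <- ggplot(measured, aes(x = date, y = NDVI)) +
  geom_point() +
  geom_line() +
  # Whole period on a Date axis, but labels only at the measured dates
  scale_x_date(breaks = measured$date, date_labels = "%m%d",
               limits = range(infor$date)) +
  # Date values can be used directly as intercepts on a Date scale
  geom_vline(xintercept = cut_dates, colour = "red", linetype = "dashed")
```

Because date stays a Date throughout, no conversion between factor positions and numeric days is needed.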

R Plot Bar graph transposed dataframe

I'm trying to plot the following data frame as a bar plot, where the values for the filteredprovince column are listed in a separate column (n).
Usually, ggplot and other plotting functions work on a horizontal data frame, and after several searches I have not been able to find a way to plot this "transposed" version of the data frame.
The bars should be grouped by cluster, and within each cluster I would like to plot one bar per filteredprovince, based on the value of the n column.
Thank you for the support.
d <- read.table(text=
" cluster PROVINCIA n filteredprovince
1 1 08 765 08
2 1 28 665 28
3 1 41 440 41
4 1 11 437 11
5 1 46 276 46
6 1 18 229 18
7 1 35 181 other
8 1 29 170 other
9 1 33 165 other
10 1 38 153 other ", header=TRUE,stringsAsFactors = FALSE)
UPDATE
Thanks to the suggestion in the comments, I have almost achieved the desired format:
ggplot(tab_s, aes(x = cluster, y = n, fill = factor(filteredprovince))) + geom_col()
Is there any way to show percentages on the y-axis instead of frequencies?
If I understand correctly, you're trying to use geom_bar(), which gives you problems because it wants to build a histogram-like count, whereas your data are already summarised.
(If you had provided the code you have tried so far, I would not have to guess.)
In that case you can use geom_col() instead.
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) + geom_col()
Alternatively, you can change the default stat of geom_bar() from "count" to "identity"
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) +
geom_bar(stat = "identity")
See this SO question for what a stat is
EDIT: Update in response to OP's update:
To display percentages, you will have to modify the data itself.
Just divide n by the sum of all n and multiply by 100.
d$percentage <- d$n / sum(d$n) * 100
ggplot(d, aes(x = cluster, y = percentage, fill = factor(filteredprovince))) + geom_col()
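The formula above gives each bar's share of the grand total. If the percentages should instead be relative to each cluster (so that each stacked cluster sums to 100%), a base-R sketch uses ave() to compute per-cluster totals. The toy data are hypothetical; a second cluster is invented only for illustration.

```r
library(ggplot2)

# Hypothetical toy data with two clusters (not the question's real numbers)
d <- data.frame(
  cluster = c(1, 1, 2, 2),
  filteredprovince = c("08", "other", "08", "other"),
  n = c(765, 235, 300, 700)
)

# ave() returns, for every row, the sum of n within that row's cluster
d$pct_in_cluster <- 100 * d$n / ave(d$n, d$cluster, FUN = sum)

p <- ggplot(d, aes(x = factor(cluster), y = pct_in_cluster, fill = filteredprovince)) +
  geom_col() +
  labs(x = "cluster", y = "% of cluster")
```

ggplot2 can also do this normalisation itself: geom_col(position = "fill") on the raw n stacks each cluster to 1, and scale_y_continuous(labels = scales::percent) then labels that axis as percentages.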
I'm not sure I fully understand, but if the problem is the orientation of your data frame, you can transpose it with t(data), where data is your data frame (note that t() returns a matrix, not a data frame).

ggplot2 adding custom legend when plotting two lines from subset of columns

I've looked all over Stack Overflow and other sites to fix my code but can't see what's wrong. I am trying to plot 2 lines on the same ggplot graph, each taken from a portion of 2 different columns. For example, I have a column of length 8 in which the first four rows are M (male) and the last four rows are F (female). I have two columns of data and one column for condition (factor).
ModelMF <- data.frame(ProbGender, ProbCond, ProbMF, Act_pct)
where:
ProbGender ProbCond ProbMF Act_pct
M 0 .75 .71
M 10 .67 .69
M 20 .61 .54
M 30 .81 .77
F 0 .88 .82
F 10 .73 .71
F 20 .67 .71
F 30 .60 .63
I have tried the following but I keep getting errors (see below):
ggplot(data = ModelMF, aes(x = ProbCond)) +
  geom_line(data = ModelMF[ModelMF$ProbGender == "M", ], aes(y = ProbMF), color = 'col1') +
  geom_point(data = ModelMF[ModelMF$ProbGender == "M", ], aes(y = ProbMF)) +
  geom_line(data = ModelMF[ModelMF$ProbGender == "M", ], aes(y = Act_pct), color = 'col2') +
  geom_point(data = ModelMF[ModelMF$ProbGender == "M", ], aes(y = Act_pct)) +
  scale_color_manual(values = c('col1' = 'darkblue', 'col2' = 'lightblue'))
Preferably I would like to be able to create a custom legend that lets me map the colors as I've attempted to do using scale_color_manual, but I get the following error:
Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'col1'
I'm not sure whether this is due to subsetting the data within each layer or something else I'm missing. Also, if I add the female lines, can I simply follow the same procedure?
Thanks in advance.
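The error arises because color = 'col1' outside aes() is passed as a literal colour, and 'col1' is not a valid R colour name. A commonly used fix, sketched here with the values from the table above, is to move the colour inside aes(), so that 'col1' and 'col2' become keys of the colour scale that scale_color_manual() then maps. The legend labels chosen below are illustrative.

```r
library(ggplot2)

ModelMF <- data.frame(
  ProbGender = rep(c("M", "F"), each = 4),
  ProbCond = rep(c(0, 10, 20, 30), 2),
  ProbMF = c(.75, .67, .61, .81, .88, .73, .67, .60),
  Act_pct = c(.71, .69, .54, .77, .82, .71, .71, .63)
)
male <- ModelMF[ModelMF$ProbGender == "M", ]

p <- ggplot(male, aes(x = ProbCond)) +
  # colour inside aes(): "col1"/"col2" are legend keys, not colour names
  geom_line(aes(y = ProbMF, colour = "col1")) +
  geom_point(aes(y = ProbMF, colour = "col1")) +
  geom_line(aes(y = Act_pct, colour = "col2")) +
  geom_point(aes(y = Act_pct, colour = "col2")) +
  # The manual scale maps each key to a real colour and a legend label
  scale_color_manual(values = c(col1 = "darkblue", col2 = "lightblue"),
                     breaks = c("col1", "col2"),
                     labels = c("ProbMF", "Act_pct"),
                     name = NULL)
```

For the female rows, one option is to add two further keys (e.g. 'col3' and 'col4') in the same way; reshaping to long format and mapping colour to a variable is another, which tends to scale better.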

Include space for missing factor level used in fill aesthetics in geom_boxplot

I am trying to draw a box-and-whisker plot in R. My code is below. At the moment, because I only have data for two months at one of the two sites, the boxes are wider for that site (because the third level of Month is dropped).
Instead, I would like the same pattern of boxes for site A as for site B (i.e. with space for an empty box on the right-hand side). I can easily do this with drop=TRUE when I only have one factor, but I do not seem to be able to do it with the "fill" factor.
library(ggplot2)
Month=rep(c(rep(c("Jan","Feb"),2),"Mar"),10)
Site=rep(c(rep(c("A","B"),each=2),"B"),10)
Month=factor(Month)
Site=factor(Site)
set.seed(1114)
Height=rnorm(50)
Data=data.frame(Month,Site,Height)
plot = ggplot(Data, aes(Site, Height)) +
geom_boxplot(aes(fill=Month, drop=TRUE), na.rm=FALSE)
plot
Here is a solution, which is based on creating fake data:
Firstly, a new row is added to the data frame. It contains a data point for the non-existing combination of factor levels (Mar and A). The value of Height has to be outside the range of the real Height data.
Data2 <- rbind(Data, data.frame(Month = "Mar", Site = "A", Height = 5))
Then, the plot can be generated. Since the fake data should not be visible, the y axis limits have to be modified with coord_cartesian and the range of the original Height data.
library(ggplot2)
ggplot(Data2, aes(Site, Height)) +
geom_boxplot(aes(fill = Month)) +
coord_cartesian(ylim = range(Data$Height) + c(-.25, .25))
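The fake row above is written by hand for the single missing combination (Mar, A). With more sites or months, a base-R sketch like the following can find every missing Month × Site combination and pad each one with an out-of-range Height; the offset of 10 above the data maximum is an arbitrary choice.

```r
Month <- rep(c(rep(c("Jan", "Feb"), 2), "Mar"), 10)
Site <- rep(c(rep(c("A", "B"), each = 2), "B"), 10)
set.seed(1114)
Data <- data.frame(Month, Site, Height = rnorm(50))

# Every Month x Site combination that could appear
all_combos <- expand.grid(Month = unique(Data$Month), Site = unique(Data$Site),
                          stringsAsFactors = FALSE)

# Keep only the combinations with no observed data
observed <- paste(Data$Month, Data$Site)
missing_combos <- all_combos[!paste(all_combos$Month, all_combos$Site) %in% observed, ]

# Pad with a Height far outside the real data, to be hidden by coord_cartesian()
missing_combos$Height <- max(Data$Height) + 10
Data2 <- rbind(Data, missing_combos)
```

Data2 can then be plotted with the same coord_cartesian() call as above.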
One way to achieve the desired look is to change the data produced while plotting.
First, save the plot as an object and then use ggplot_build() to store all the computed plot data as an object.
p<-ggplot(Data, aes(Site, Height,fill=Month)) + geom_boxplot()
dd<-ggplot_build(p)
The list element data contains all the information used for plotting.
dd$data
[[1]]
fill ymin lower middle upper ymax outliers notchupper notchlower x PANEL
1 #F8766D -1.136265 -0.2639268 0.1978071 0.5318349 0.9815675 0.5954014 -0.1997872 0.75 1
2 #00BA38 -1.264659 -0.6113666 0.3190873 0.7915052 1.0778202 1.0200180 -0.3818434 1.00 1
3 #F8766D -1.329028 -0.4334205 0.3047065 1.0743448 1.5257798 1.0580462 -0.4486332 1.75 1
4 #00BA38 -1.137494 -0.7034188 -0.4466927 -0.1989093 0.1859752 -1.759846 -0.1946196 -0.6987658 2.00 1
5 #619CFF -2.344163 -1.2108919 -0.5457815 0.8047203 2.3773189 0.4612987 -1.5528617 2.25 1
group weight ymin_final ymax_final xmin xmax
1 1 1 -1.136265 0.9815675 0.625 0.875
2 2 1 -1.264659 1.0778202 0.875 1.125
3 3 1 -1.329028 1.5257798 1.625 1.875
4 4 1 -1.759846 0.1859752 1.875 2.125
5 5 1 -2.344163 2.3773189 2.125 2.375
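The relevant values are x, xmin and xmax. The first two rows correspond to site A, and those values should be changed so that site A's two boxes take the positions and widths of the first two boxes in a group of three, as at site B.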
dd$data[[1]]$x[1:2]<-c(0.75,1)
dd$data[[1]]$xmax[1:2]<-c(0.875,1.125)
dd$data[[1]]$xmin[1:2]<-c(0.625,0.875)
Now use ggplot_gtable() and grid.draw() to plot the changed data.
library(grid)
grid.draw(ggplot_gtable(dd))
There is now an easy way to do this using the preserve argument of the position adjustment (see here). For the plot above, this would be:
library(ggplot2)
Month = rep(c(rep(c("Jan", "Feb"), 2), "Mar"), 10)
Site = rep(c(rep(c("A", "B"), each = 2), "B"), 10)
Month = factor(Month)
Site = factor(Site)
set.seed(1114)
Height = rnorm(50)
Data = data.frame(Month, Site, Height)
plot = ggplot(Data, aes(Site, Height)) +
geom_boxplot(
aes(fill = Month),
na.rm = FALSE,
## Note: preserve = 'single' keeps the width of a single box the same at every site
position = position_dodge(preserve = 'single')
)
plot
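The effect of preserve = 'single' can be checked with ggplot_build(), in the same way as in the earlier answer. This sketch inspects the computed box positions; with this option, every box should have the same width, whether it belongs to site A (two months) or site B (three months).

```r
library(ggplot2)

Month <- rep(c(rep(c("Jan", "Feb"), 2), "Mar"), 10)
Site <- rep(c(rep(c("A", "B"), each = 2), "B"), 10)
set.seed(1114)
Data <- data.frame(Month, Site, Height = rnorm(50))

p <- ggplot(Data, aes(Site, Height, fill = Month)) +
  geom_boxplot(position = position_dodge(preserve = "single"))

# Computed box positions: one row per box (2 for site A, 3 for site B)
box_data <- ggplot_build(p)$data[[1]]
print(box_data[, c("x", "xmin", "xmax")])

box_widths <- box_data$xmax - box_data$xmin
```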

add custom legend in ggplot2

I'm working with ggplot2 to generate some geom_line plots, which I've already generated from another data frame (not important to show here) that contains the same id values as the following data frame.
I have this data frame called df:
id X Y total
1 3214 6786 10000
2 4530 5470 10000
3 2567 7433 10000
4 1267 8733 10000
5 2456 7544 10000
6 6532 6532 10000
7 5642 4358 10000
What I want to do is create a custom legend that shows, for a specific id, the percentages of X and Y on the geom_line plot with the same id. So basically, for each geom_line plot (e.g. id = 1), draw the percentages for that id on the plot.
I've tried to use geom_text, but the problem is that it prints all the labels on top of each other in one place, so none of them can be read.
How can this be done?
EDIT
The olddf data frame looks something like this:
id pos X Y Z
1
1.....
1
2
3
4
3 ......
.
.
This is the code that I've tried:
library(ggplot2)
library(reshape2)  # melt()
for(i in df$id)
{
test = subset(olddf, id==i)
mdata <- melt(test, id=c("pos","id"))
pl = ggplot() + geom_line(data=mdata, aes(x=pos, y=value, color=variable)) + geom_text(data=df, aes(x=6000, y=0.1, label=(X*total)/100), size=5)
}
The answer (as discussed in chat) is quite straightforward:
Change geom_text(data = df, ...) to geom_text(data = df[df$id == i, ], ...)
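A sketch of the full loop with that change, under a few assumptions: olddf below is a made-up stand-in with columns id, pos, X and Y; the long-format reshaping is done in base R instead of reshape2::melt(); the plots are stored in a list, because pl is overwritten on every iteration and ggplot objects are not drawn inside a for loop unless print() is called; and the label is computed as 100 * X / total, assuming a percentage is wanted (the question's (X*total)/100 gives 321400 for id 1).

```r
library(ggplot2)

# Made-up stand-in for olddf: a few positions per id
olddf <- data.frame(
  id  = rep(1:2, each = 3),
  pos = rep(c(1000, 3000, 5000), 2),
  X   = c(0.2, 0.4, 0.3, 0.5, 0.35, 0.45),
  Y   = c(0.6, 0.5, 0.7, 0.4, 0.55, 0.5)
)
# First two rows of the question's df
df <- data.frame(id = 1:2, X = c(3214, 4530), Y = c(6786, 5470), total = 10000)

plots <- list()
for (i in df$id) {
  test <- subset(olddf, id == i)
  # Long format (one row per pos/variable pair), similar to reshape2::melt()
  mdata <- data.frame(
    pos = rep(test$pos, 2),
    variable = rep(c("X", "Y"), each = nrow(test)),
    value = c(test$X, test$Y)
  )
  # Only the row of df that belongs to the current id
  lab <- df[df$id == i, ]
  plots[[as.character(i)]] <- ggplot(mdata, aes(x = pos, y = value, colour = variable)) +
    geom_line() +
    geom_text(
      data = lab,
      aes(x = 3000, y = 0.1,
          label = sprintf("X: %.1f%%, Y: %.1f%%", 100 * X / total, 100 * Y / total)),
      size = 5, inherit.aes = FALSE  # lab has no `variable` column
    )
}
```

Each plot can then be drawn with print(plots[["1"]]) and so on, or saved in a second loop.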
