ggplot2 heatmap with varying axis - r

I want to draw a heatmap, but the size of the units on the x (and y) axis should vary. Here is some example code:
users = rep(1:3,3)
Inst = c(rep("A",3),rep("B",3),rep("C",3))
dens = rnorm(9)
n_inst = c(3,3,3,2,2,2,1,1,1)
df <- data.frame( users, Inst, dens, n_inst )
  users Inst       dens n_inst
1     1    A  1.2521487      3
2     2    A -0.1013088      3
3     3    A  1.5770535      3
4     1    B  1.1093957      2
5     2    B  1.1059166      2
6     3    B  0.6884662      2
7     1    C -0.3864710      1
8     2    C -1.0216373      1
9     3    C  0.4500778      1
z <- ggplot(df, aes(Inst, users)) + geom_tile(aes(fill = dens))
z + scale_x_discrete(breaks = n_inst)
This draws a heatmap, but all units of Inst have the same size. I want A to be three times the width of C, and B twice the width of C, so n_inst should give the width of each unit.
I tried scale_x_discrete, but that doesn't do it.
Thank you in advance.

You can try this:
ggplot(df, aes(Inst, users)) + geom_tile(aes(fill = dens, width=n_inst))
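Note that on a discrete x axis the tiles stay centred on positions 1, 2 and 3, so tiles wider than 1 will overlap their neighbours. A minimal sketch of one possible workaround, assuming there is exactly one width per Inst level: place each column at the midpoint of its cumulative width on a continuous axis, then relabel the breaks. The names w, right, centre and x are illustrative and not part of the original post.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  users  = rep(1:3, 3),
  Inst   = rep(c("A", "B", "C"), each = 3),
  dens   = rnorm(9),
  n_inst = rep(c(3, 2, 1), each = 3)
)

# One width per Inst level (assumption: n_inst is constant within a level)
w      <- sapply(split(df$n_inst, df$Inst), unique)
right  <- cumsum(w)        # right edge of each column
centre <- right - w / 2    # midpoint of each column

# Continuous x position for every row
df$x <- unname(centre[as.character(df$Inst)])

p <- ggplot(df, aes(x = x, y = users)) +
  geom_tile(aes(fill = dens, width = n_inst)) +
  scale_x_continuous(breaks = unname(centre), labels = names(centre))
# print(p) draws the plot
```

With this layout the columns sit side by side, and A takes three times the horizontal space of C.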

Related

Changing fill color in geom_density beyond threshold in facet_grid plot

I want to accentuate the area in the faceted density plots above the measure threshold of 2 (e.g., red shading for x >= 2). The solution shown below works well for a single facet factor, but I have two factors. How do I specify the levels for the two factors when using ggplot_build? Or do I need to use a different approach?
Here's a bit of the dataframe (the dataframe is 750 rows):
mode.f task.f mgds
1 1 A 1.1413636
2 1 A 0.9105000
3 2 A 1.0320000
4 2 A 1.1811429
14 1 C 1.4646000
15 1 C 1.7505000
16 2 C 1.3968000
17 1 D 1.0668333
18 1 D 1.0084000
19 1 D 1.1622500
20 2 D 1.3452500
21 2 D 1.0132000
22 3 C 0.6960000
23 3 C 0.9180000
24 3 D 1.0128000
25 3 D 0.6670000
26 2 E 2.9190000
27 2 E 1.3755000
28 2 E 1.4080000
29 1 E 1.3878000
30 1 E 1.4816667
Here's the code that works for a single facet factor:
mp <- ggplot(df,aes(x=mgds))+
geom_density(color=NA,fill="gray30",alpha=.4)+
facet_wrap(~mode.f)+
theme_bw()+
theme(strip.background =
element_rect(fill="gray95",color="gray60"),
strip.text = element_text(colour="black",size=10),
panel.border = element_rect(color="gray60"))+
labs(x="MGD (s)",y="Density")
to_fill <- data_frame(
x = ggplot_build(mp)$data[[1]]$x,
y = ggplot_build(mp)$data[[1]]$y,
mode.f = factor(ggplot_build(mp)$data[[1]]$PANEL, levels =
c(1,2,3), labels = c("1","2","3")))
mp + geom_area(data = to_fill[to_fill$x >= 2, ],
aes(x=x, y=y), fill = "red")
Here's the code for the facet_grid plots in which I want the area beyond the x = 2 threshold to be a different color:
ggplot(df,aes(x=mgds))+
geom_density(color=NA,fill="gray30",alpha=.4)+
facet_grid(~mode.f~task.f)+
theme_bw()+
theme(strip.background = element_rect(fill="gray95",color="gray60"),
strip.text = element_text(colour="black",size=10),
panel.border = element_rect(color="gray60"))+
geom_vline(xintercept=2,linetype="longdash",color="gray50")+
labs(x="Measure",y="Density")
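One possible alternative to ggplot_build, sketched here under assumptions: estimate the density for each mode.f by task.f cell yourself with stats::density(), then draw the full curve and the part beyond the threshold as two geom_area layers. The data below is a randomly generated stand-in for the 750-row data frame, and the curves may differ slightly from geom_density, which evaluates the density over the data range only.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in data with the same column names as in the question
df <- expand.grid(mode.f = factor(1:3),
                  task.f = factor(c("A", "C", "D", "E")),
                  rep    = 1:30)
df$mgds <- rlnorm(nrow(df), meanlog = 0.2, sdlog = 0.4)

# One density estimate per mode.f x task.f cell
cells <- split(df, list(df$mode.f, df$task.f), drop = TRUE)
dens <- do.call(rbind, lapply(cells, function(d) {
  est <- density(d$mgds)
  data.frame(mode.f = d$mode.f[1], task.f = d$task.f[1],
             x = est$x, y = est$y)
}))

p <- ggplot(dens, aes(x, y)) +
  geom_area(fill = "gray30", alpha = 0.4) +               # whole curve
  geom_area(data = dens[dens$x >= 2, ], fill = "red") +   # part beyond 2
  facet_grid(mode.f ~ task.f) +
  labs(x = "Measure", y = "Density")
# print(p) draws the plot
```

Because both facet variables are ordinary columns of dens, no panel-to-level mapping is needed.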

How to get percentages on the y axes in an alluvial or sankey plot?

I made this graph using ggplot2 and I'd like to change the y axis to percentages, from 0% to 100% with breaks every 10%.
I know I can use:
+ scale_y_continuous(label=percent, breaks = seq(0,1,.1))
but I still have a problem: when converting to percentages, R interprets 30000 as 30000%, so if I limit the axis to 100% nothing shows up in my graph.
How can I manage it?
I have a dataset like this:
ID time value
1 1 B with G available
2 1 Generic
3 1 B with G available
4 1 Generic
5 1 B with G available
6 1 Generic
7 1 Generic
8 1 Generic
9 1 B with G available
10 1 B with G available
11 1 Generic
12 1 B with G available
13 1 B with G available
14 1 B with G available
15 1 Generic
16 1 B with G available
17 1 B with G available
18 1 B with G available
19 1 B with G available
20 1 B with G available
1 2 B with G available
2 2 Generic
3 2 B with G available
4 2 Generic
5 2 B with G available
6 2 Generic
7 2 Generic
8 2 Generic
9 2 B with G available
10 2 B with G available
11 2 Generic
12 2 B with G available
13 2 B with G available
14 2 B with G available
15 2 Generic
16 2 B with G available
17 2 switch
18 2 B with G available
19 2 B with G available
20 2 switch
which is reproducible with this code:
PIPPO <- data.frame("ID"=rep(c(1:20),2), "time"=c(rep(1,20),rep(2,20)), "value"=c("B","G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G",rep("B",6),"G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G","B","switch",rep("B",2),"switch"))
so I don't have a y variable that I can manipulate.
Here is my code and the plot I obtained:
ggplot(PIPPO,
aes(x = time, stratum = value, alluvium = ID,
fill = value, label = value)) +
scale_fill_brewer(type = "qual" , palette = "Set3") +
geom_flow(stat = "flow", knot.pos = 1/4, aes.flow = "forward",
color = "gray") +
geom_stratum() +
theme(legend.position = "bottom")
Could anyone help me?
What I get on real data using
scale_y_continuous(label = scales::percent_format(scale = 100 / n_id))
is this:
with 84% as the maximum value (and not 100%). How can I get the y axis to run up to 100% with breaks every 10%?
Here is what I get with
scale_y_continuous(breaks = scales::pretty_breaks(10), label = scales::percent_format(scale = 100 / n_id))
I get these odd values every 14%.
Using the scale argument in percent_format this can be achieved like so:
PIPPO <- data.frame("ID"=rep(c(1:20),2), "time"=c(rep(1,20),rep(2,20)), "value"=c("B","G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G",rep("B",6),"G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G","B","switch",rep("B",2),"switch"))
library(ggplot2)
library(ggalluvial)
n_id <- length(unique(PIPPO$ID))
ggplot(PIPPO,
aes(x = time, stratum = value, alluvium = ID,
fill = value, label = value)) +
scale_fill_brewer(type = "qual" , palette = "Set3") +
scale_y_continuous(label = scales::percent_format(scale = 100 / n_id)) +
geom_flow(stat = "flow", knot.pos = 1/4, aes.flow = "forward", color = "gray") +
geom_stratum() +
theme(legend.position = "bottom")
Created on 2020-05-19 by the reprex package (v0.3.0)
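For breaks every 10% up to 100%, one option is to set the breaks and limits in count units and let the label function convert them. This is a sketch, assuming the stacked strata never exceed n_id; the names brks, lab and y_scale are illustrative. If the real data tops out at 84%, some IDs are presumably missing at some time points, which this does not address.

```r
library(ggplot2)
library(scales)

n_id <- 20                                  # number of unique IDs, as in the example data
brks <- seq(0, n_id, length.out = 11)       # 0, 2, 4, ..., 20 in count units
lab  <- percent_format(scale = 100 / n_id)  # converts a count to a percentage of n_id

y_scale <- scale_y_continuous(breaks = brks,
                              limits = c(0, n_id),
                              labels = lab)
# Add `+ y_scale` to the alluvial plot in place of the scale_y_continuous() line
```

Setting the breaks explicitly avoids relying on pretty_breaks(), which picks round numbers in count units rather than in percent.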
I assume you will need to create a new column of percentages: take the total number of rows, then divide each "value" in your column by that total to get the percentage it represents.
Simply normalising your y-values seems to do the trick:
library(ggplot2)
ggplot(mtcars, aes(x = cyl, y = mpg/max(mpg))) +
geom_point() +
scale_y_continuous(label = scales::label_percent())
Created on 2020-05-19 by the reprex package (v0.3.0)

R histogram x axis not evenly distributed

I would like to display a histogram of the distribution of school grades.
The dataframe looks like:
> print(xls)
# A tibble: 103 x 2
X__1 X__2
<dbl> <chr>
1 3 w
2 1 m
3 2 m
4 1 m
5 1 w
6 0 m
7 3 m
8 1 w
9 0 m
10 5 m
I create the histogram with:
hist(xls$X__1, main='Notenverteilung', xlab='Note (0 = keine Beurteilung)', ylab='Anzahl')
It looks like:
Why are there spaces between 1,2,3 but not between 0 & 1?
Thanks, BR Bernd
Use ggplot2 for that, and your bars will be aligned:
library(ggplot2)
ggplot(xls, aes(x = X__1)) + geom_histogram(binwidth = 1)
You can try
barplot(table(xls$X__1))
or try
h <- hist(xls$X__1, xaxt = "n", breaks = seq(min(xls$X__1), max(xls$X__1)))
axis(side=1, at=h$mids, labels=seq(min(xls$X__1), max(xls$X__1))[-1])
and using ggplot
ggplot(xls, aes(X__1)) +
geom_histogram(binwidth = 1, color=2) +
scale_x_continuous(breaks = seq(min(xls$X__1), max(xls$X__1)))
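The uneven spacing comes from where hist() puts the bin edges: with integer breaks the first bin is closed on both sides, so the values 0 and 1 share a bin. A minimal base R sketch, using only the first ten rows shown in the question, that centres each bin on an integer by placing the breaks at half-integers:

```r
# First ten grades from the printed tibble in the question
x <- c(3, 1, 2, 1, 1, 0, 3, 1, 0, 5)

# Bin edges at -0.5, 0.5, ..., 5.5 so every integer gets its own bin
brks <- seq(min(x) - 0.5, max(x) + 0.5, by = 1)
h <- hist(x, breaks = brks, plot = FALSE)

h$mids    # bin centres: one per integer grade
h$counts  # one count per grade, matching table(x) plus empty grades

plot(h, main = "Notenverteilung",
     xlab = "Note (0 = keine Beurteilung)", ylab = "Anzahl")
```

The bars then have equal width and sit directly above the grade they count.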

How do I adjust the scale of a geom_tile in ggplot2?

I am trying to adjust the colour scale of a geom_tile plot.
A short version of my data (in data.frame format) is:
mydat <-
Sc K n minC
A 2 1 NA
A 2 2 37.453023
A 2 3 23.768316
A 2 4 17.628376
A 3 1 NA
A 3 2 12.693124
A 3 3 8.884226
A 3 4 7.436250
A 10 1 2.128121
A 10 2 2.116539
A 10 3 2.737923
A 10 4 3.509773
A 20 1 1.104592
A 20 2 1.840195
A 20 3 2.717198
A 20 4 3.616501
B 2 1 NA
B 2 2 25.090085
B 2 3 15.924186
B 2 4 11.811022
B 3 1 NA
B 3 2 8.827183
B 3 3 6.179484
B 3 4 5.175331
B 10 1 2.096934
B 10 2 2.064984
B 10 3 2.662373
B 10 4 3.407246
B 20 1 1.096871
B 20 2 1.802418
B 20 3 2.649153
B 20 4 3.517776
My code to prepare the data to plot is the following:
mydat$Sc <- factor(mydat$Sc, levels =c("A", "B"))
mydat$K <- factor(mydat$K, levels =c("2", "3","10","20"))
mydat.m <- melt(mydat,id.vars=c("Sc","K","n"), measure.vars=c("minC"))
I want to plot with geom_tile the value of minC with K and n as axis and different facets for Sc with the following:
mydat.m.p <- ggplot(mydat.m, aes(x=n, y=K))
mydat.m.p +
geom_tile(data=mydat.m, aes(fill=value)) +
scale_fill_gradient(low="palegreen", high="lightcoral") +
facet_wrap(~ Sc, ncol=2)
This gives me a plot for each Sc factor. However, the colour scale does not reflect what I want to portray, because a few high values make all the low values look the same.
I want to adjust it to a relevant scale with 4 bins, i.e., 1-2, 2-3, 3-5, >5.
Looking at other questions there was a suggestion to use the cut function and scale fill manual as:
mydat.m$value1 <- cut(mydat.m$value, breaks = c(1:5, Inf), right = FALSE)
Then use the following in geom_tile:
scale_fill_manual(breaks = c("[1,2)", "[2,3)", "[3,5)", "[5,Inf)"),
values = c("darkgreen", "palegreen", "lightcoral", "red"))
However, I am not sure how this can be applied to a data.frame with other factors and in long format.
You're almost there. Simply use cut before melting:
mydat$minC.cut <- cut(mydat$minC, breaks = c(1:3, 5, Inf), right = FALSE)
mydat.cut <- melt(mydat, id.vars=c("Sc", "K", "n"), measure.vars=c("minC.cut"))
Now, you don't need to specify breaks since we took care of that already.
ggplot(mydat.cut, aes(x=n, y=K)) +
geom_tile(aes(fill=value)) +
facet_wrap(~ Sc, ncol=2) +
scale_fill_manual(values = c("darkgreen", "palegreen", "lightcoral", "red"))
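To see what cut() produces here, the sketch below bins a few minC values from the question and names the colours by bin. The names binned and fill_cols are illustrative and not from the original answer.

```r
# A few minC values taken from the question's data, including an NA
minC <- c(NA, 37.453023, 2.128121, 1.104592, 3.509773)

# Left-closed bins: [1,2), [2,3), [3,5), [5,Inf)
binned <- cut(minC, breaks = c(1:3, 5, Inf), right = FALSE)
levels(binned)

# Name each colour by its bin so the mapping does not depend on level order
fill_cols <- setNames(c("darkgreen", "palegreen", "lightcoral", "red"),
                      levels(binned))
fill_cols
```

Passing a named vector such as fill_cols as the values argument of scale_fill_manual matches colours to bins by name, and NA cells keep the scale's na.value.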

ggplot2 issue with y axis

I have the following means table:
Sex Trait Average
1 1 -N 9.042735
2 2 -N 3.529577
3 1 E 8.111111
4 2 E 9.447887
5 1 O 17.196580
6 2 O 16.311800
7 1 A 12.213680
8 2 A 13.449440
9 1 C 12.025640
10 2 C 14.529580
From where I run the following graph:
library(ggplot2)
plot <- ggplot(meansMatrix, aes(Trait, Average, colour= Sex,group= Sex)) +
geom_line(aes(linetype=Sex),size=1) +
geom_point(size=3,fill="white") +
scale_color_manual(values = c("black", "grey50")) +
scale_y_discrete(limits=c(0,18),breaks=seq(2,18,2.5),labels=seq(2,18,2.5)) +
scale_x_discrete(limits=c("-N","E","O","A","C")); plot
There is a visible problem with the y axis. After setting the variable Average as numeric, I have tried different combinations of the arguments (limits, breaks and labels) with no success. This graph is the only result I get other than error messages.
Any input of how to re-locate the plot and show the corresponding breaks will be highly appreciated!
Use scale_y_continuous:
meansMatrix <- read.table(text=" Sex Trait Average
1 1 -N 9.042735
2 2 -N 3.529577
3 1 E 8.111111
4 2 E 9.447887
5 1 O 17.196580
6 2 O 16.311800
7 1 A 12.213680
8 2 A 13.449440
9 1 C 12.025640
10 2 C 14.529580", header=TRUE)
meansMatrix$Sex <- factor(meansMatrix$Sex)
library(ggplot2)
p <- ggplot(meansMatrix, aes(Trait, Average, colour= Sex,group= Sex)) +
geom_line(aes(linetype=Sex),size=1) +
geom_point(size=3,fill="white") +
scale_color_manual(values = c("black", "grey50")) +
scale_y_continuous(limits=c(0,18),breaks=seq(2,18,2.5),labels=seq(2,18,2.5)) +
scale_x_discrete(limits=c("-N","E","O","A","C"))
print(p)
