R Histograms X axis not equal distributed - r

I would like to display a histogram showing the distribution of school grades.
The dataframe looks like:
> print(xls)
# A tibble: 103 x 2
    X__1 X__2
   <dbl> <chr>
 1     3 w
 2     1 m
 3     2 m
 4     1 m
 5     1 w
 6     0 m
 7     3 m
 8     1 w
 9     0 m
10     5 m
I create the histogram with:
hist(xls$X__1, main='Notenverteilung', xlab='Note (0 = keine Beurteilung)', ylab='Anzahl')
It looks like:
Why are there spaces between 1,2,3 but not between 0 & 1?
Thanks, BR Bernd

Use ggplot2 for that, and your bars will be aligned:
library(ggplot2)
ggplot(xls, aes(x = X__1)) + geom_histogram(binwidth = 1)

You can try
barplot(table(xls$X__1))
or try
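# Centre each bin on an integer grade so that 0 and 1 do not share the first bin
h <- hist(xls$X__1, xaxt = "n", breaks = seq(min(xls$X__1) - 0.5, max(xls$X__1) + 0.5, by = 1))
axis(side = 1, at = h$mids, labels = seq(min(xls$X__1), max(xls$X__1)))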
and using ggplot
ggplot(xls, aes(X__1)) +
geom_histogram(binwidth = 1, color=2) +
scale_x_continuous(breaks = seq(min(xls$X__1), max(xls$X__1)))
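As for why the gaps appear: hist() uses right-closed bins and puts the lowest value into the first bin. If the automatically chosen breaks are 0.5 wide (an assumption about the asker's data, since the default break selection depends on the data), 0 lands in [0, 0.5] and 1 in (0.5, 1], so those two bars touch, while (1, 1.5] stays empty and leaves a gap before 2. A minimal sketch with made-up grades, not the original data:

```r
# Made-up grades for illustration only (not the asker's data)
grades <- c(0, 0, 1, 1, 1, 2, 2, 3, 3, 5)

# Breaks every 0.5, written out explicitly to mimic what hist() presumably chose
half <- hist(grades, breaks = seq(0, 5, by = 0.5), plot = FALSE)
half$counts
# The bins (1, 1.5] and (2, 2.5] are empty, which shows up as gaps between 1, 2 and 3

# Breaks centred on the integers give one bin per grade and no artificial gaps
centred <- hist(grades, breaks = seq(-0.5, 5.5, by = 1), plot = FALSE)
centred$counts
centred$mids
```

With centred breaks every grade gets its own bin, and the only empty bin left is a grade that really has no observations.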


How to get percentages on the y axes in an alluvial or sankey plot?

I created this graph using ggplot2 and I'd like to change the y axis to percentages, from 0% to 100% with breaks every 10%.
I know I can use:
+ scale_y_continuous(label=percent, breaks = seq(0,1,.1))
but that causes a problem: when converting to percentages, R interprets 30000 as 30000%, so if I limit the axis to 100% nothing shows up in my graph.
How can I handle this?
I have a dataset like this:
ID time value
1 1 B with G available
2 1 Generic
3 1 B with G available
4 1 Generic
5 1 B with G available
6 1 Generic
7 1 Generic
8 1 Generic
9 1 B with G available
10 1 B with G available
11 1 Generic
12 1 B with G available
13 1 B with G available
14 1 B with G available
15 1 Generic
16 1 B with G available
17 1 B with G available
18 1 B with G available
19 1 B with G available
20 1 B with G available
1 2 B with G available
2 2 Generic
3 2 B with G available
4 2 Generic
5 2 B with G available
6 2 Generic
7 2 Generic
8 2 Generic
9 2 B with G available
10 2 B with G available
11 2 Generic
12 2 B with G available
13 2 B with G available
14 2 B with G available
15 2 Generic
16 2 B with G available
17 2 switch
18 2 B with G available
19 2 B with G available
20 2 switch
which is reproducible with this code:
PIPPO <- data.frame("ID"=rep(c(1:20),2), "time"=c(rep(1,20),rep(2,20)), "value"=c("B","G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G",rep("B",6),"G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G","B","switch",rep("B",2),"switch"))
so I don't have a y variable that I can rescale.
Here is my code and the plot I obtained:
ggplot(PIPPO,
aes(x = time, stratum = value, alluvium = ID,
fill = value, label = value)) +
scale_fill_brewer(type = "qual" , palette = "Set3") +
geom_flow(stat = "flow", knot.pos = 1/4, aes.flow = "forward",
color = "gray") +
geom_stratum() +
theme(legend.position = "bottom")
Could anyone help me?
What I get on real data using
scale_y_continuous(label = scales::percent_format(scale = 100 / n_id))
is this:
with 84% as the maximum value (and not 100%). How can I get the y axis to go up to 100% with breaks every 10%?
Here is what I get with
scale_y_continuous(breaks = scales::pretty_breaks(10), label = scales::percent_format(scale = 100 / n_id))
I get these odd breaks every 14%.
Using the scale argument in percent_format this can be achieved like so:
PIPPO <- data.frame("ID"=rep(c(1:20),2), "time"=c(rep(1,20),rep(2,20)), "value"=c("B","G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G",rep("B",6),"G","B","G","B",rep("G",3),rep("B",2),"G",rep("B",3),"G","B","switch",rep("B",2),"switch"))
library(ggplot2)
library(ggalluvial)
n_id <- length(unique(PIPPO$ID))
ggplot(PIPPO,
aes(x = time, stratum = value, alluvium = ID,
fill = value, label = value)) +
scale_fill_brewer(type = "qual" , palette = "Set3") +
scale_y_continuous(label = scales::percent_format(scale = 100 / n_id)) +
geom_flow(stat = "flow", knot.pos = 1/4, aes.flow = "forward", color = "gray") +
geom_stratum() +
theme(legend.position = "bottom")
Created on 2020-05-19 by the reprex package (v0.3.0)
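For the follow-up about breaks every 10% up to 100%, one option is to place the breaks at tenths of n_id yourself instead of relying on pretty_breaks. This is only a sketch: it assumes the stacked counts at each time point add up to n_id, and the names pct_breaks, pct_labels and y_scale are made up for illustration.

```r
library(ggplot2)
library(scales)

n_id <- 20  # number of distinct IDs, as in the PIPPO example

# Breaks at every tenth of the total count: 0, 2, 4, ..., 20
pct_breaks <- seq(0, n_id, length.out = 11)

# Relabel counts as a share of n_id, rounded to whole percent
pct_labels <- percent_format(scale = 100 / n_id, accuracy = 1)

# Scale object with fixed breaks and an axis that runs up to n_id (100%)
y_scale <- scale_y_continuous(
  breaks = pct_breaks,
  labels = pct_labels,
  limits = c(0, n_id)
)

# Labels that will appear on the axis
pct_labels(pct_breaks)
```

Adding `+ y_scale` to the alluvial plot in place of the existing scale_y_continuous() call should then give an axis from 0% to 100% in steps of 10%.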
I assume you will need to create a new column of percentages: take the total number of rows, then divide each value in your column by that total to get the percentage it represents.
Simply normalising your y-values seems to do the trick:
library(ggplot2)
ggplot(mtcars, aes(x = cyl, y = mpg/max(mpg))) +
geom_point() +
scale_y_continuous(label = scales::label_percent())
Created on 2020-05-19 by the reprex package (v0.3.0)

Join values when creating boxplot

I have a table of 983 obs. of 27 variables; the data can be provided if need be, but I do not believe there is a need for it, as the following crosstable should summarise it well enough:
Kjønn  Antall  <NA>   e   f    g   s   ug
Sex    Count   <NA>   w   d    m   s   um
k         282     2  26   5   41      208
m         701    11  56   4  148   2  480
Abbreviations (with English translation):
e[nkemann], f[raskilt], g[ift], s[eparert], ug[ift]
w[idow(er)], d[ivorced], m[arried], s[eparated], u[n]m[arried]
I would like to create a variable-width boxplot showing the distribution of these individuals, but as can be seen from the table, the NAs, the divorced and the separated would be such small groups that the plot would be hardly legible (and pointless). How can I join these groups to create a boxplot showing e, f+s, g, and ug?
My current code:
# The basis for the boxplot
dBox_SexAge <- ggplot(data = tblHoved) +
geom_boxplot(
mapping = aes(colour = KJONN, x = KJONN, y = 1875-FAAR),
notch = TRUE,
lwd = .5, fatten = .125,
varwidth = TRUE
)
# Create the final boxplot
dBox_SexAgeMStat <- dBox_SexAge +
facet_grid(SIVST ~ .) +
coord_flip()
# Run it
dBox_SexAgeMStat
Current plot, from which I would like to group f and s:
Create a sample data frame
tblHoved <- data.frame(FAAR = rnorm(10),
SIVST = rep(c("e", "f", "g", "s", "ug"),2),
stringsAsFactors = FALSE)
tblHoved
# FAAR SIVST
# 1 0.22499630 e
# 2 1.10236362 f
# 3 0.02220001 g
# 4 0.19062022 s
# 5 0.05103136 ug
# 6 0.09280887 e
# 7 -0.70574835 f
# 8 0.39331232 g
# 9 0.24817094 s
# 10 0.66631994 ug
Merge f and s
tblHoved$SIVST[tblHoved$SIVST %in% c("f","s")] <- "f+s"
tblHoved
# FAAR SIVST
# 1 0.22499630 e
# 2 1.10236362 f+s
# 3 0.02220001 g
# 4 0.19062022 f+s
# 5 0.05103136 ug
# 6 0.09280887 e
# 7 -0.70574835 f+s
# 8 0.39331232 g
# 9 0.24817094 f+s
# 10 0.66631994 ug
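The merged column can then go straight into the faceted boxplot from the question. The sketch below uses made-up data (random birth years and sexes, not the real 983 observations), so only the column names are taken from the question:

```r
library(ggplot2)

set.seed(1)
# Made-up stand-in for tblHoved; values are random, column names follow the question
tblHoved <- data.frame(
  FAAR  = sample(1800:1870, 40, replace = TRUE),
  KJONN = rep(c("k", "m"), 20),
  SIVST = rep(c("e", "f", "g", "s", "ug"), 8),
  stringsAsFactors = FALSE
)

# Merge f and s into one group before plotting
tblHoved$SIVST[tblHoved$SIVST %in% c("f", "s")] <- "f+s"

# Same plot structure as in the question, now with four facets instead of five
dBox_SexAgeMStat <- ggplot(data = tblHoved) +
  geom_boxplot(
    mapping = aes(colour = KJONN, x = KJONN, y = 1875 - FAAR),
    varwidth = TRUE
  ) +
  facet_grid(SIVST ~ .) +
  coord_flip()

dBox_SexAgeMStat
```

Rows with a missing SIVST are untouched by the merge; filtering them out beforehand, for example with `tblHoved[!is.na(tblHoved$SIVST), ]`, would be one way to drop the NA facet as well.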

ggplot2 heatmap with varying axis

I want to draw a heatmap, but the size of the units on the x (and y) axis should vary. Here is some example code:
users = rep(1:3,3)
Inst = c(rep("A",3),rep("B",3),rep("C",3))
dens = rnorm(9)
n_inst = c(3,3,3,2,2,2,1,1,1)
df <- data.frame( users, Inst, dens, n_inst )
1 1 A 1.2521487 3
2 2 A -0.1013088 3
3 3 A 1.5770535 3
4 1 B 1.1093957 2
5 2 B 1.1059166 2
6 3 B 0.6884662 2
7 1 C -0.3864710 1
8 2 C -1.0216373 1
9 3 C 0.4500778 1
z <- ggplot(df, aes(Inst, users)) + geom_tile(aes(fill = dens))
z + scale_x_discrete(breaks = n_inst)
So this draws a heatmap, but all units of Inst have the same size. I want A to be 3 times the width of C and B two times the width of C. So I want n_inst to give the width of units.
I tried scale_x_discrete, but that doesn't do it.
Thank you in advance.
You can try this:
ggplot(df, aes(Inst, users)) + geom_tile(aes(fill = dens, width=n_inst))
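Note that on a discrete x axis the tiles stay centred at positions 1, 2 and 3, so wider tiles may overlap their neighbours. A possible workaround, sketched here under the assumption that n_inst is constant within each Inst, is to compute a continuous centre for each column from the cumulative widths (the helper names widths, w and centre are made up):

```r
library(ggplot2)

set.seed(42)
df <- data.frame(
  users  = rep(1:3, 3),
  Inst   = rep(c("A", "B", "C"), each = 3),
  dens   = rnorm(9),
  n_inst = rep(c(3, 2, 1), each = 3),
  stringsAsFactors = FALSE
)

# One width per institution, in order of appearance
widths <- unique(df[, c("Inst", "n_inst")])
w <- setNames(widths$n_inst, as.character(widths$Inst))

# Centre of each column: right edge minus half the width
centre <- cumsum(w) - w / 2

# Look up the centre for every row by institution name
df$x <- unname(centre[as.character(df$Inst)])

ggplot(df, aes(x, users)) +
  geom_tile(aes(fill = dens, width = n_inst)) +
  scale_x_continuous(breaks = unname(centre), labels = names(centre))
```

Here A spans 0 to 3, B spans 3 to 5 and C spans 5 to 6, so the columns sit side by side with widths in the ratio 3:2:1.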

How do I adjust the scale of a geom_tile in ggplot2?

I am trying to adjust the colour scale of a geom_tile plot.
A short version of my data (in data.frame format) is:
mydat <-
Sc K n minC
A 2 1 NA
A 2 2 37.453023
A 2 3 23.768316
A 2 4 17.628376
A 3 1 NA
A 3 2 12.693124
A 3 3 8.884226
A 3 4 7.436250
A 10 1 2.128121
A 10 2 2.116539
A 10 3 2.737923
A 10 4 3.509773
A 20 1 1.104592
A 20 2 1.840195
A 20 3 2.717198
A 20 4 3.616501
B 2 1 NA
B 2 2 25.090085
B 2 3 15.924186
B 2 4 11.811022
B 3 1 NA
B 3 2 8.827183
B 3 3 6.179484
B 3 4 5.175331
B 10 1 2.096934
B 10 2 2.064984
B 10 3 2.662373
B 10 4 3.407246
B 20 1 1.096871
B 20 2 1.802418
B 20 3 2.649153
B 20 4 3.517776
My code to prepare the data to plot is the following:
mydat$Sc <- factor(mydat$Sc, levels =c("A", "B"))
mydat$K <- factor(mydat$K, levels =c("2", "3","10","20"))
library(reshape2)  # assumption: melt() comes from reshape2
mydat.m <- melt(mydat, id.vars=c("Sc","K","n"), measure.vars=c("minC"))
I want to plot with geom_tile the value of minC with K and n as axis and different facets for Sc with the following:
mydat.m.p <- ggplot(mydat.m, aes(x=n, y=K))
mydat.m.p +
geom_tile(data=mydat.m, aes(fill=value)) +
scale_fill_gradient(low="palegreen", high="lightcoral") +
facet_wrap(~ Sc, ncol=2)
This gives me a plot for each level of Sc. However, the colour scale does not show what I want to portray, because a few high values make all the low values look the same.
I want to adjust to a relevant scale in 4 breaks, i.e., 1-2, 2-3, 3-5, >5.
Looking at other questions there was a suggestion to use the cut function and scale fill manual as:
mydat.m$value1 <- cut(mydat.m$value, breaks = c(1:5, Inf), right = FALSE)
Then use the following in geom_tile:
scale_fill_manual(breaks = c("[1,2)", "[2,3)", "[3,5)", "[5,Inf)"),
values = c("darkgreen", "palegreen", "lightcoral", "red"))
However, I am not sure how this can be applied to a data.frame with other factors and in long format.
You're almost there. Simply use cut before melting:
mydat$minC.cut <- cut(mydat$minC, breaks = c(1:3, 5, Inf), right = FALSE)
mydat.cut <- melt(mydat, id.vars=c("Sc", "K", "n"), measure.vars=c("minC.cut"))
Now, you don't need to specify breaks since we took care of that already.
ggplot(mydat.cut, aes(x=n, y=K)) +
geom_tile(aes(fill=value)) +
facet_wrap(~ Sc, ncol=2) +
scale_fill_manual(values = c("darkgreen", "palegreen", "lightcoral", "red"))
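To see which labels cut produces, and to tie each colour to a specific interval rather than relying on level order, a named vector can be passed to scale_fill_manual. A small sketch on a six-row subset of the data (the names toy and fill_values, the grey used for NA, and drop = FALSE are choices made here, not part of the original answer):

```r
library(ggplot2)

# Six rows taken from scenario A of the question's data
toy <- data.frame(
  n    = rep(1:2, 3),
  K    = factor(rep(c("2", "10", "20"), each = 2), levels = c("2", "10", "20")),
  minC = c(NA, 37.453023, 2.128121, 2.116539, 1.104592, 1.840195)
)

# Left-closed intervals 1-2, 2-3, 3-5 and 5 or more
toy$minC.cut <- cut(toy$minC, breaks = c(1:3, 5, Inf), right = FALSE)
levels(toy$minC.cut)

# Name each colour by the interval it belongs to
fill_values <- c("[1,2)" = "darkgreen", "[2,3)" = "palegreen",
                 "[3,5)" = "lightcoral", "[5,Inf)" = "red")

ggplot(toy, aes(x = n, y = K)) +
  geom_tile(aes(fill = minC.cut)) +
  scale_fill_manual(values = fill_values, drop = FALSE, na.value = "grey80")
```

With drop = FALSE the legend keeps the [3,5) class even though no tile in this subset falls into it.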

ggplot2 issue with y axis

I have the following means table:
Sex Trait Average
1 1 -N 9.042735
2 2 -N 3.529577
3 1 E 8.111111
4 2 E 9.447887
5 1 O 17.196580
6 2 O 16.311800
7 1 A 12.213680
8 2 A 13.449440
9 1 C 12.025640
10 2 C 14.529580
From where I run the following graph:
library(ggplot2)
plot <- ggplot(meansMatrix, aes(Trait, Average, colour= Sex,group= Sex)) +
geom_line(aes(linetype=Sex),size=1) +
geom_point(size=3,fill="white") +
scale_color_manual(values = c("black", "grey50")) +
scale_y_discrete(limits=c(0,18),breaks=seq(2,18,2.5),labels=seq(2,18,2.5)) +
scale_x_discrete(limits=c("-N","E","O","A","C")); plot
There is a visible problem with the y axis. After setting the variable Average as numeric, I tried different combinations of the arguments (limits, breaks and labels) with no success. This graph is the only one that renders at all; everything else gives error messages.
Any input of how to re-locate the plot and show the corresponding breaks will be highly appreciated!
Average is numeric, so use scale_y_continuous instead of scale_y_discrete:
meansMatrix <- read.table(text=" Sex Trait Average
1 1 -N 9.042735
2 2 -N 3.529577
3 1 E 8.111111
4 2 E 9.447887
5 1 O 17.196580
6 2 O 16.311800
7 1 A 12.213680
8 2 A 13.449440
9 1 C 12.025640
10 2 C 14.529580", header=TRUE)
meansMatrix$Sex <- factor(meansMatrix$Sex)
library(ggplot2)
p <- ggplot(meansMatrix, aes(Trait, Average, colour= Sex,group= Sex)) +
geom_line(aes(linetype=Sex),size=1) +
geom_point(size=3,fill="white") +
scale_color_manual(values = c("black", "grey50")) +
scale_y_continuous(limits=c(0,18),breaks=seq(2,18,2.5),labels=seq(2,18,2.5)) +
scale_x_discrete(limits=c("-N","E","O","A","C"))
print(p)
