Showing a subgroup or subdivision within a histogram bar

I have data as follows:
thevalues <- structure(c(9, 7, 9, 9, 9, 8, 9, 6, 4, 7, 9, 9, 9, 8, 7, 7, 9,
8, 8, 9, 5, 5, 8, 7, 5, 9, 9, 7, 7, 9, 8, 7, 8, 9, 4, 7, 9, 8,
6, 7, 7, 4, 8, 6, 9, 9, 8, 1, 9, 9, 9, 8, 9, 9, 6, 7, 4, 7, 9,
6, 6, 9, 9, 8, 6, 8, 7, 7, 7, 5, 9, 5, 7, 9, 8, 4, 9, 8, 8, 8,
5, 8, 1, 7, 7, 5, 6, 9, 5, 9, 6, 9, 6, 9, 9, 9, 8, 9, 9, 9, 9,
4, 6, 4, 8, 6, 8, 8, 7, 4, 6, 7, 4, 8, 8, 8, 7, 9, 3, 8, 8, 6,
9, 8, 8, 6, 5, 8, 3, 8, 6, 8, 7, 7, 6, 9, 5, 9, 8, 7, 9, 7, 9,
9, 8, 9, 6, 8, 9, 8, 6, 8, 9, 9, 9, 4, 8, 8, 5, 8, 7, 8, 8, 9,
9, 6, 8, 5, 9, 8, 7, 9, 9, 7, 6, 8, 7, 7, 8, 9, 6, 7, 8, 9, 7,
6, 6, 9, 7, 7, 8, 7, 7, 2, 4, 9, 9, 7, 7, 9, 7, 6, 9, 9, 8, 5,
5), label = NA_character_, class = c("labelled", "numeric"))
mistakes <- structure(c(0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1,
0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1,
0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,
0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0), label = NA_character_, class = c("labelled", "numeric"))
I want to create a histogram of thevalues like so:
library(ggplot2)
df <- data.frame(value = c(thevalues),
variable = rep("thevalues", each = length(thevalues)))
ggplot(df, aes(value, fill = variable)) +
geom_density(aes(y = ..count..), size = 0.7, alpha = 0.1) +
geom_bar(position = "dodge") +
scale_fill_brewer(palette = "Set1") +
scale_x_continuous(breaks = c(1:9), labels = c(1:9)) +
theme(legend.title = element_blank(), legend.position = c(0.1, 0.85))
However, I would like to see the mistakes as part of these bars:
table(thevalues, mistakes)
mistakes
thevalues 0 1
1 1 1
2 1 0
3 1 1
4 9 2
5 10 4
6 17 8 # The total height of the bar is 25, 8 have a different colour.
7 24 16 # The total height of the bar is 40, 16 have a different colour.
8 33 16 # The total height of the bar is 49, 16 have a different colour.
9 49 14 # The total height of the bar is 63, 14 have a different colour.
Something like this:
EDIT:
The solution works perfectly, but I would really like to do this when there are two variables in the histogram:
thevalues_II <- structure(c(9, 9, 9, 8, 8, 9, 6, 9, 8, 8, 6, 9, 9, 9, 6, 7, 9,
7, 8, 9, 7, 9, 9, 8, 7, 9, 8, 7, 8, 9, 8, 9, 9, 9, 9, 7, 9, 7,
8, 9, 7, 7, 8, 4, 6, 9, 7, 7, 9, 9, 9, 8, 9, 8, 9, 9, 4, 8, 9,
8, 7, 9, 9, 8, 7, 8, 9, 8, 2, 7, 8, 8, 8, 8, 8, 6, 4, 9, 9, 8,
3, 7, 3, 8, 8, 9, 7, 9, 5, 6, 7, 8, 9, 8, 9, 9, 9, 9, 9, 9, 9,
7, 3, 7, 9, 7, 7, 7, 8, 8, 9, 9, 8, 8, 9, 6, 9, 9, 6, 7, 8, 7,
8, 9, 9, 7, 6, 8, 7, 9, 6, 5, 8, 8, 7, 9, 8, 9, 9, 7, 9, 7, 9,
8, 7, 9, 4, 8, 7, 7, 9, 9, 9, 9, 9, 4, 9, 9, 6, 7, 6, 7, 8, 9,
8, 9, 5, 9, 8, 8, 8, 9, 9, 6, 8, 8, 8, 8, 8, 8, 7, 8, 9, 9, 9,
7, 4, 8, 7, 7, 9, 8, 8, 7, 5, 8, 9, 8, 8, 9, 8, 5, 8, 9, 8, 9,
7), label = NA_character_, class = c("labelled", "numeric"))
df <- data.frame(value = c(thevalues, thevalues_II),
variable = rep(c("tax", "truth"), each = length(thevalues)))
ggplot(df, aes(value, fill = variable)) +
geom_density(aes(y = ..count..), size = 0.7, alpha = 0.3) +
geom_bar(position = "dodge") +
scale_fill_brewer(palette = "Set1") +
theme(legend.title = element_blank(), legend.position = c(0.1, 0.85))
I tried:
library(tidyverse)
mydf <- data.frame(thevalues, mistakes)
mycount <- count(mydf, thevalues, thevalues_II, mistakes)
ggplot() +
geom_col(data = mycount, aes(thevalues, thevalues_II, n, fill = as.character(mistakes))) +
geom_density(data = mydf, aes(thevalues, thevalues_II, y = ..count..), size = 0.7, alpha = 0.1) +
scale_fill_brewer(palette = "Set1") +
theme(legend.title = element_blank(), legend.position = c(0.1, 0.85))
But that does not work.

Try summarising the counts first, then plot them with geom_col(). Apologies again for the lack of an image; I am using an online console with reduced facilities.
library(tidyverse)
mydf <- data.frame(thevalues, mistakes)
mycount <- count(mydf, thevalues, mistakes)
ggplot() +
geom_col(data = mycount, aes(thevalues, n, fill = as.character(mistakes))) +
geom_density(data = mydf, aes(thevalues, y = ..count..), size = 0.7, alpha = 0.1) +
scale_fill_brewer(palette = "Set1") +
theme(legend.title = element_blank(), legend.position = c(0.1, 0.85))
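
For the two-variable case in the EDIT, one option is to count per variable as well and put each variable in its own facet, since as far as I know ggplot2 has no built-in way to stack and dodge bars in the same layer. This is only a sketch: the toy data frame, its column names and the random values are made up for illustration, and it assumes every row carries its own 0/1 mistake flag.

```r
library(ggplot2)

# Toy data (hypothetical): two score variables and a 0/1 mistake flag per row
set.seed(1)
n <- 50
toy <- data.frame(
  value    = c(sample(1:9, n, replace = TRUE), sample(1:9, n, replace = TRUE)),
  variable = rep(c("tax", "truth"), each = n),
  mistakes = rbinom(2 * n, 1, 0.3)
)

# One row per (variable, value, mistakes) combination with its frequency
counts <- as.data.frame(
  table(variable = toy$variable, value = toy$value, mistakes = toy$mistakes),
  stringsAsFactors = FALSE
)
counts$value <- as.numeric(counts$value)

p <- ggplot(counts, aes(value, Freq, fill = mistakes)) +
  geom_col() +                  # bars stack by mistakes within each value
  facet_wrap(~ variable) +      # one panel per variable instead of dodging
  scale_fill_brewer(palette = "Set1") +
  scale_x_continuous(breaks = 1:9)
print(p)
```

Each panel then shows the total bar height per value, with the mistakes as a differently coloured part of the bar.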

Related

Density plot of a vector shows tails before and after its minimum and maximum

I have the following vector:
v<-c(1, 1, 8, 3, 1, 9, 4, 21, 13, 13, 1, 1, 3, 10, 1, 13, 22, 1,
1, 4, 2, 1, 13, 1, 5, 1, 2, 1, 1, 2, 12, 10, 26, 15, 2, 9, 6,
5, 1, 3, 18, 2, 10, 2, 8, 9, 4, 1, 11, 4, 2, 12, 3, 14, 2, 1,
27, 3, 6, 2, 1, 1, 3, 16, 3, 36, 13, 9, 11, 10, 24, 2, 27, 4,
4, 2, 9, 1, 3, 13, 3, 1, 8, 5, 5, 15, 1, 1, 3, 1, 4, 14, 8, 1,
1, 2, 20, 1, 9, 3, 1, 2, 5, 14, 5, 11, 1, 3, 2, 9, 10, 21, 9,
1, 20, 5, 11, 23, 2, 1, 1, 2, 1, 7, 2, 9, 1, 19, 9, 9, 2, 15,
17, 8, 11, 17, 2, 14, 2, 8, 13, 1, 2, 9, 15, 25, 3, 8, 32, 4,
11, 1, 1, 2)
I would like to estimate its density in R with the density() command. With a few lines of code:
d<-density(v)
df<-data.frame(x=d$x,y=d$y,stringsAsFactors = FALSE)
plot(df)
I obtained the following picture:
But the resulting plot doesn't add up: max(v) is 36 and min(v) is 1, yet the graph shows tails extending below 0 and beyond 40.
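
The tails come from how density() picks its evaluation grid: by default it extends the range by cut = 3 bandwidths on each side of the data, because the kernel smooths mass past the extremes. A minimal sketch with a short, made-up stand-in for v shows the default range and one way to restrict it with the from and to arguments:

```r
# Short hypothetical stand-in for the vector v above
v_toy <- c(1, 1, 8, 3, 1, 9, 4, 21, 13, 13, 2, 5, 36)

d_default <- density(v_toy)                                  # grid runs past min and max
d_clipped <- density(v_toy, from = min(v_toy), to = max(v_toy))

range(d_default$x)   # wider than range(v_toy) by 3 bandwidths on each side
range(d_clipped$x)   # matches range(v_toy)
d_default$bw         # the bandwidth that determines how far the tails reach
```

Note that from and to only change where the curve is evaluated. The estimate itself is not renormalised, so the clipped curve no longer integrates to exactly 1.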

R - Making code more professional/efficient

This is a part of my code:
a <- data.frame(X1 = c(9, 9, 9, 8, 9, 9, 8, 9, 8, 7),
X2 = c(8, 8, 6, 8, 6, 8, 9, 8, 8, 8),
X3= c(-3, -3, -3, -3, -3, -3, -2, -1, -3, -3),
X4= c(-5, -7, -5, -7, -7, -7, -7, -7, -7, -5),
X5= c(1, 1, -1, 1, 1, 1, 1, 1, 1, 1),
X6= c(9, 11, 11, 11, 11, 11, 9, 11, 11, 10),
X7= c(7, 8, 8, 7, 8, 8, 8, 8, 8, 6),
X8= c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0),
X9= c(25, 25, 25, 24, 25, 25, 24, 25, 25, 24))
cov=cov(a)
cov[5,1:5]<-0
cov[1:5,5]<-0
cov[5,5]<-1
cov
I am trying to write this part of the code in a more professional way:
cov[5,1:5]<-0
cov[1:5,5]<-0
cov[5,5]<-1
I tried to do something like:
cov[5,1:5] & cov[1:5,5]<-0
but it does not work.

cov[5,1:4] <- cov[1:4,5] <-0
cov[5,5] <- 1

cov[5,1:5] <- cov[1:5,5] <- 0
cov[5,5] <- 1
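
To see that the chained assignment does the same as the original three lines, here is a small check on a made-up 9 x 9 covariance matrix (m, ref and alt are illustrative names, not from the question):

```r
set.seed(42)
m <- cov(matrix(rnorm(90), ncol = 9))   # hypothetical 9 x 9 covariance matrix

# Original three-line version
ref <- m
ref[5, 1:5] <- 0
ref[1:5, 5] <- 0
ref[5, 5] <- 1

# Chained version: the inner assignment runs first and returns 0,
# which the outer assignment then writes into row 5
alt <- m
alt[5, 1:5] <- alt[1:5, 5] <- 0
alt[5, 5] <- 1

identical(ref, alt)
```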

The ROC curve is below the oblique line, how to correct it?

I use ggplot and plotROC packages to draw ROC curves, but one of the drawn curves is in the opposite direction. How can I modify them to keep the two curves in the same direction?
My code is as follows:
library("plotROC")
Response <- c(0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0,
0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0,
0, 0, 1, 1, 0, 0)
len <- c(4, 7, 8, 10, 4, 10, 10, 10, 10, 10, 9, 8, 7, 7, 5, 4, 4, 4, 3, 3, 2,
2, 9, 11, 0.5, 10, 8, 5, 4, 10, 10, 9, 8, 8, 7, 5, 1, 12, 10, 11, 9,
10, 7, 10, 7, 12, 10, 11, 10, 4, 12, 7, 12, 14, 10, 9, 9, 7, 10, 2,
12, 12, 10, 16, 10, 9, 15, 10, 9, 5, 12, 12, 11, 6, 9.5, 9, 11, 3)
gc <- c(15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 13,
13, 12, 12, 11, 10, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7,
7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 4, 3, 3, 3,3)
d1 <- data.frame(Response = Response, Predictor = len, group = "len")
d2 <- data.frame(Response = Response, Predictor = gc, group = "gc")
mydata <- rbind(d1, d2)
ggplot(mydata, aes(d = Response, m = Predictor, color = group, linetype = group, shape = group)) +
geom_roc(n.cut = 0, show.legend = TRUE, labels=FALSE, size = 0.6)+
geom_abline(size = 0.7, color = "grey", linetype = "dashed")+
xlab("1 - Specificity") +
ylab("Sensitivity")
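
A curve below the diagonal usually means the predictor points the other way: here lower gc values tend to go with Response = 1. Negating such a predictor mirrors the curve, and the AUC becomes 1 minus the old AUC. The sketch below illustrates that with a hand-written rank-based AUC on made-up numbers; auc(), resp and score are illustrative names and not part of plotROC.

```r
# Rank-based AUC (Mann-Whitney form); ties get average ranks
auc <- function(response, predictor) {
  r  <- rank(predictor)
  n1 <- sum(response == 1)
  n0 <- sum(response == 0)
  (sum(r[response == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical data where low scores go with response = 1
resp  <- c(0, 0, 0, 1, 0, 1, 1, 1)
score <- c(15, 14, 13, 12, 9, 8, 5, 3)

auc(resp, score)    # below 0.5: curve lies under the diagonal
auc(resp, -score)   # above 0.5: same information, flipped direction
```

In the code above that would mean building d2 with Predictor = -gc, so that both curves run in the same direction.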

How can I change the grid line spacing on a ggplot2 dotplot?

I'm analyzing data from the result of pulling 10 numbered balls from a jar with replacement, repeated 70 times. Here's my code (data included):
numbers <- c(8, 3, 9, 5, 1, 9, 10, 8, 8, 1, 9, 9, 8, 5, 1, 10, 5, 9, 6, 4, 10, 3,
10, 9, 8, 4, 8, 8, 9, 9, 1, 5, 9, 8, 4, 1, 8, 6, 7, 8, 2, 9, 5, 6,
10, 9, 1, 1, 5, 6, 2, 8, 6, 5, 2, 5, 4, 10, 10, 2, 2, 4, 9, 6, 9,
9, 6, 10, 9, 10)
num_frame <- data.frame(numbers)
ggplot(num_frame) +
geom_dotplot(aes(numbers), binwidth = 1, dotsize = 0.4) +
theme_bw() +
xlab("Numbers") +
ylab("Frequency")
The resulting plot is nice, except it labels gridlines at 0, 2.5, 5, 7.5, and 10, which is obviously not what I want. The scale is fine, but I would like the gridlines to be at integer values 1 through 10 (0 is fine too if necessary). How can I do this? I'd also like the y-axis to adjust likewise so that the grid is still square. Thanks!

Just add:
scale_x_continuous(breaks=1:10, minor_breaks=NULL)
minor_breaks=NULL suppresses the minor gridlines, i.e. the lines that aren't at the breaks.

how to define fill colours in ggplot histogram?

I have the following simple data
data <- structure(list(status = c(9, 5, 9, 10, 11, 10, 8, 6, 6, 7, 10,
10, 7, 11, 11, 7, NA, 9, 11, 9, 10, 8, 9, 10, 7, 11, 9, 10, 9,
9, 8, 9, 11, 9, 11, 7, 8, 6, 11, 10, 9, 11, 11, 10, 11, 10, 9,
11, 7, 8, 8, 9, 4, 11, 11, 8, 7, 7, 11, 11, 11, 6, 7, 11, 6,
10, 10, 9, 10, 10, 8, 8, 10, 4, 8, 5, 8, 7), statusgruppe = c(0,
0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, NA, 0, 1, 0, 1,
0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1,
1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0,
1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0)), .Names = c("status",
"statusgruppe"), class = "data.frame", row.names = c(NA, -78L
))
from that I'd like to make a histogram:
ggplot(data, aes(status))+
geom_histogram(aes(y=..density..),
binwidth=1, colour = "black",
fill="white")+
theme_bw()+
scale_x_continuous("Status", breaks=c(min(data$status,na.rm=T), median(data$status, na.rm=T), max(data$status, na.rm=T)),labels=c("Low", "Middle", "High"))+
scale_y_continuous("Percent", labels=scales::percent)
Now I'd like the bins to take a colour according to their value, e.g. bins with value > 9 get dark grey and everything else is light grey.
I have tried fill=statusgruppe, scale_fill_grey(breaks=9) etc., but I can't get it to work. Any ideas?

Hopefully this should get you started:
ggplot(data, aes(status, fill = ..x..))+
geom_histogram(binwidth = 1) +
scale_fill_gradient(low = "black", high = "white")
ggplot(data, aes(status, fill = ..x.. > 9))+
geom_histogram(binwidth = 1) +
scale_fill_grey()

How about using fill=..count.. or fill=I(..count..>9) right after y=..density..? You have to tinker with the legend title and labels a bit, but it gets the coloring right.
EDIT:
It seems I misunderstood your question a bit. If you want to define color based on the x-coordinate, you can use the ..x.. automatic variable similarly.

What about scale_fill_manual()? It is documented on Hadley's ggplot2 site. I've used this function to set an appropriate fill colour for a boxplot. Not sure if it'll work with a histogram, though...
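
Putting the suggestions above together, here is a sketch for recent ggplot2 versions (3.3.0 or later), where after_stat(x) replaces the older ..x.. notation. The toy data and the grey shades are made up for illustration:

```r
library(ggplot2)

# Hypothetical stand-in for the status column
toy <- data.frame(status = c(4, 5, 6, 7, 7, 8, 8, 9, 9, 9, 10, 10, 11, 11, 11))

p <- ggplot(toy, aes(status, fill = after_stat(x > 9))) +
  geom_histogram(binwidth = 1, colour = "black") +
  # Map the logical x > 9 to two fixed greys
  scale_fill_manual(values = c("FALSE" = "grey80", "TRUE" = "grey30"),
                    name = "status > 9")
print(p)
```

The fill is computed from the bin position after the histogram statistic has run, so no extra grouping column is needed in the data.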
