Plot summary of unique observations with ggplot - r

Is it possible to count unique observations directly in a ggplot call? For instance, can the result below be achieved without the intermediate step in the middle line? My efforts so far, e.g. using geom_histogram with stat='bin', have failed.
set.seed(1)
d = data.frame(year = sample(2005:2009, 50, prob = 1:5, rep=T),
group = sample(letters, 50, prob = 1:26, rep=T))
d2 = plyr::count(unique(d)$year)
ggplot(d2, aes(x, freq)) + geom_bar(stat='identity') + labs(x='year', y='count of groups')

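geom_bar() on the de-duplicated data will do the trick. (Older ggplot2 versions accepted stat_bin() here, but it now requires a continuous x.)
ggplot(unique(d), aes(x = as.factor(year))) +
geom_bar() +
labs(x='year', y='count of groups')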
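As a quick sanity check on the bar heights, the same counts can be computed without plotting. This is only a sketch, assuming the data frame d from the question; the name groups_per_year is made up for illustration.

```r
set.seed(1)
d <- data.frame(year  = sample(2005:2009, 50, prob = 1:5, replace = TRUE),
                group = sample(letters, 50, prob = 1:26, replace = TRUE))

# Drop duplicated (year, group) pairs, then count the remaining rows per year.
groups_per_year <- table(unique(d)$year)
groups_per_year

# Equivalent: count the distinct groups within each year directly.
tapply(d$group, d$year, function(g) length(unique(g)))
```

Both calls should report the same number of distinct groups for each year, which is what the bars display.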

Related

How to group variable by quality number?

Here's a made-up dataset that demonstrates the general idea of what I'm working with.
Quality <- sample(1:4, 300, replace = TRUE)
reader_ID <- rep(1:3, each = 100)
df <- data.frame(Quality, reader_ID)
df
quality_percentage <- ggplot(df, aes(x = reader_ID, y = Quality, fill = Quality)) +
geom_bar(position="fill", stat="identity")
quality_percentage
Here is the graph it produced. I'm trying to have each quality grouped together instead of having them all separate.
You can simply sort your data frame by Quality before plotting:
ggplot(df[order(df$Quality),],
aes(x = reader_ID, y = Quality, fill = Quality)) +
geom_col(position = "fill")
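Note that with y = Quality and position = "fill", each segment's height is weighted by the Quality value itself. If the goal is instead the share of observations at each quality level per reader, one possible variant (an assumption about the intent, not part of the answer above) treats Quality as a factor and lets geom_bar() count the rows. The seed and axis titles are placeholders.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(Quality   = sample(1:4, 300, replace = TRUE),
                 reader_ID = rep(1:3, each = 100))

# geom_bar() counts rows per reader and quality level;
# position = "fill" rescales each reader's stack so it sums to 1.
p <- ggplot(df, aes(x = factor(reader_ID), fill = factor(Quality))) +
  geom_bar(position = "fill") +
  labs(x = "reader_ID", y = "share of observations", fill = "Quality")
p
```

Because the fill variable is discrete here, the segments are stacked by level automatically and no prior sorting is needed.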

ggplot: How to add a certain percentage to the top of the pillars of a histogram

I need to replicate a certain format of a histogram/bar chart. I have already made some modifications with ggplot to group the categorical x-variable and specify the colors with HEX codes.
Here is what I try to plot/replicate:
Here is a MWE for my data structure:
sex <- sample(0:1, 100, replace=TRUE)
group <- sample(2:5, 100, replace=TRUE)
data <- data.frame(sex, group)
library(ggplot2)
ggplot(data, aes(x = group, group=sex, fill=factor(sex) )) +
geom_histogram(position="dodge", binwidth=0.45) +
theme(axis.title.x=element_blank(), axis.title.y=element_blank()) +
guides(fill=guide_legend(title="sex")) +
scale_y_continuous(labels = scales::percent_format()) +
scale_fill_manual(values=c("#b6181f", "#f6b8bb"))
I get:
Small things I can't handle are:
replacing the factor labels on the x-axis (this may be a problem with my histogram approach, but I found no practical way with a bar chart either)
rounding the percentages to whole numbers, with no decimals
But most important, I don't know how to add a percentage value for each group and sex to the top of each bar.
I am looking forward to some advice :)
First of all, I would treat your x-axis data as factors and plot them as bars. For placing percentage labels on top of the bars, see this question: Show % instead of counts in charts of categorical variables.
Furthermore, the y-axis values aren't a rounding problem: they are raw counts, not percentages. Mapping y = ..prop.. solves that.
Is this what you are looking for (I put everything together)?
sex <- sample(0:1, 100, replace=TRUE)
group <- sample(2:5, 100, replace=TRUE)
data <- data.frame(sex, group)
labs <- c("Score < 7", "Score\n7 bis < 12", "Score\n12 bis < 15",
"Score\n15 bis < 20","Score >= 20")
ggplot(data, aes(x = as.factor(group), y = ..prop.., group = sex, fill = factor(sex) )) +
geom_bar(position = "dodge") +
geom_text(aes(label = scales::percent(..prop..)),
position = position_dodge(width = 0.9), stat = "count", vjust = 2) +
labs(x = NULL, y = NULL) +
guides(fill = guide_legend(title = "sex")) +
scale_y_continuous(labels = scales::percent_format()) +
scale_fill_manual(values=c("#b6181f", "#f6b8bb")) +
scale_x_discrete(labels = labs)
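In ggplot2 3.3.0 and later, the ..prop.. notation is superseded by after_stat(prop). Below is a sketch of the same plot in that style, assuming a recent ggplot2. The seed, the name x_labels, and the four axis labels (one per group value 2 to 5) are placeholders chosen for illustration; accuracy = 1 rounds the percentages to whole numbers, and a negative vjust moves the text above the bar.

```r
library(ggplot2)

set.seed(123)  # arbitrary seed, only for reproducibility
data <- data.frame(sex   = sample(0:1, 100, replace = TRUE),
                   group = sample(2:5, 100, replace = TRUE))
x_labels <- c("Group 2", "Group 3", "Group 4", "Group 5")  # hypothetical labels

# group = sex makes prop the share of each group value within one sex.
p <- ggplot(data, aes(x = factor(group), y = after_stat(prop),
                      group = sex, fill = factor(sex))) +
  geom_bar(position = "dodge") +
  geom_text(aes(label = scales::percent(after_stat(prop), accuracy = 1)),
            stat = "count", position = position_dodge(width = 0.9),
            vjust = -0.5) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_fill_manual(values = c("#b6181f", "#f6b8bb")) +
  scale_x_discrete(labels = x_labels) +
  labs(x = NULL, y = NULL, fill = "sex")
p
```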

Grouping data outside limits in histogram using ggplot2

I am trying to draw a histogram zoomed in on part of the data. My problem is that I would like to group everything that is outside the range into a last category "10+". Is it possible to do this using ggplot2?
Sample code:
x <- data.frame(runif(10000, 0, 15))
ggplot(x, aes(runif.10000..0..15.)) +
geom_histogram(aes(y = (..count..)/sum(..count..)), colour = "grey50", binwidth = 1) +
scale_y_continuous(labels = scales::percent) +
coord_cartesian(xlim=c(0, 10)) +
scale_x_continuous(breaks = 0:10)
Here is how the histogram looks now:
And here is how I would like it to look:
It is probably possible to do this by nesting ifelse calls, but since my real problem has more cases, is there a way for ggplot to do it?
You could use forcats and dplyr to efficiently categorize the values, aggregate the last "levels" and then compute the percentages before the plot. Something like this should work:
library(forcats)
library(dplyr)
library(ggplot2)
x <- data.frame(x = runif(10000, 0, 15))
x2 <- x %>%
mutate(x_grp = cut(x, breaks = c(seq(0,15,1)))) %>%
mutate(x_grp = fct_collapse(x_grp, `10+` = levels(x_grp)[11:15])) %>%  # levels 11 to 15 cover (10, 15]
group_by(x_grp) %>%
dplyr::summarize(count = n())
ggplot(x2, aes(x = x_grp, y = count/10000)) +
geom_bar(stat = "identity", colour = "grey50") +
scale_y_continuous(labels = scales::percent)
The resulting graph looks quite different from your example, but I think it is correct, since the data are drawn from a uniform distribution.
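An alternative sketch that avoids collapsing factor levels afterwards: pass an open-ended final break to cut(), so everything above 10 lands in a single bin, and let geom_bar() compute the shares. The bin labels and the seed are made up for illustration, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
x <- data.frame(x = runif(10000, 0, 15))

# Eleven bins: ten unit-width bins on (0, 10] plus one open-ended "10+" bin.
x$x_grp <- cut(x$x, breaks = c(0:10, Inf),
               labels = c(paste0(0:9, "-", 1:10), "10+"))

# count / sum(count) turns the per-bin counts into shares of all rows.
p <- ggplot(x, aes(x = x_grp)) +
  geom_bar(aes(y = after_stat(count / sum(count))), colour = "grey50") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = NULL)
p
```

With uniform data on 0 to 15, roughly a third of the values fall into the "10+" bin, so that bar is much taller than the others.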

Format axis and label for line graph using ggplot2

Here is my sample data:
Singer <- c("A","B","C","A","B","C")
Rank <- c(1,2,3,3,2,1)
Episode <- c(1,1,1,2,2,2)
Votes <- c(0.3,0.28,0.11,0.14,0.29,0.38)
data <- data.frame(Episode,Singer,Rank,Votes)
data$Episode <- as.character(data$Episode)
I would like to make a line graph to show the performance of each singer.
I tried to use ggplot2 like below:
ggplot(data,aes(x=Episode,y=Votes,group = Singer)) + geom_line()
I have two questions:
How can I format the y-axis as percentage?
How can I label each dot in this line graph as the values of "Rank", which allows me to show rank and votes in the same graph?
To label each point use:
geom_label(aes(label = Rank))
# or
geom_text(aes(label = Rank), nudge_y = .01, nudge_x = 0)
To format the axis labels use:
scale_y_continuous(labels = scales::percent_format())
# or without package(scales):
scale_y_continuous(breaks = (seq(0, .4, .2)), labels = sprintf("%1.f%%", 100 * seq(0, .4, .2)), limits = c(0,.4))
Complete code:
library(ggplot2)
library(scales)
ggplot(data, aes(x = factor(Episode), y = Votes, group = Singer)) +
geom_line() +
geom_label(aes(label = Rank)) +
scale_y_continuous(labels = scales::percent_format())
Data:
Singer <- c("A","B","C","A","B","C")
Rank <- c(1,2,3,3,2,1)
Episode <- c(1,1,1,2,2,2)
Votes <- c(0.3,0.28,0.11,0.14,0.29,0.38)
data <- data.frame(Episode,Singer,Rank,Votes)
# no need to convert Episode to character because we use factor(Episode) in aes(x = ...)

ggplot multiple lines colored as gradient

I'm currently struggling to wrap my head around the following objective:
a 2x2 facet grid
in each facet a couple of lines
each line colored according to some continuous variable
I can't even get the simple example working. So far I have:
df <- data.frame(xval = rep(1:5, 8),
yval = runif(40),
pval = rep(c(rep(1,5), rep(2, 5)),4),
plt = rep(c(rep("mag", 10), rep("ph", 10)), 2),
p = c(rep("p1", 20), rep("p2", 20))
)
ggplot(df, aes(xval, yval)) +
geom_line(aes(colour = pval)) +
facet_grid(plt~p)
Would very much appreciate your help.
Since pval is not a factor variable you need to specify the grouping explicitly.
ggplot(df, aes(xval, yval)) +
geom_line(aes(colour = pval, group = pval)) +
facet_grid(plt~p)
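Grouping by pval works here because each facet has only one line per pval value. If two lines in the same facet could share a pval, a safer sketch is to group by a per-line identifier instead. The column line_id and the colour endpoints below are hypothetical additions, not part of the original data.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(xval = rep(1:5, 8),
                 yval = runif(40),
                 pval = rep(c(rep(1, 5), rep(2, 5)), 4),
                 plt  = rep(c(rep("mag", 10), rep("ph", 10)), 2),
                 p    = c(rep("p1", 20), rep("p2", 20)))

# line_id is a hypothetical helper column: one id per run of five points,
# so lines stay separate even if two of them share the same pval.
df$line_id <- rep(seq_len(8), each = 5)

p <- ggplot(df, aes(xval, yval)) +
  geom_line(aes(colour = pval, group = line_id)) +
  scale_colour_gradient(low = "blue", high = "red") +
  facet_grid(plt ~ p)
p
```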
