Frequencies of bargraph as independent list - r

Suppose I have the following dataset
set.seed(85)
a <- data.frame(replicate(10,sample(0:3,5,rep=TRUE)))
and I plot it in the following way:
library(ggplot2)
ggplot(stack(a), aes(x = values)) +
geom_bar()
From the graph I can read that there are a little less than 1250 occurrences of '3' in the dataset, but is there a way to output the frequency of each x-axis value as an independent list (i.e. not as numbers on the barplot)? I am looking for a list of how many occurrences of '3' there are in the dataset (and likewise for the values 0, 1, and 2).
output:
0: 1249
1: 1200
2: ...
3: ...
Any help is much appreciated

We can convert to 'long' format and then do the count
library(dplyr)
library(tidyr)
a %>%
pivot_longer(everything()) %>%
count(value)
To get the barplot
library(ggplot2)
a %>%
pivot_longer(everything()) %>%
count(value) %>%
ggplot(aes(x = value, y = n)) +
geom_bar(stat = 'identity')
In base R, unlist and get the table
table(unlist(a))
or for plotting
barplot(table(unlist(a)))
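If the counts are needed in the "value: count" layout shown in the question, the table can be formatted directly. This is a minimal base R sketch under the question's toy data; `freq` and `freq_list` are illustrative names, not part of the original answer.

```r
# Toy data from the question (5 draws per column, 10 columns)
set.seed(85)
a <- data.frame(replicate(10, sample(0:3, 5, replace = TRUE)))

# Count every value across all columns
freq <- table(unlist(a))

# Print one "value: count" line per distinct value
cat(sprintf("%s: %d", names(freq), as.integer(freq)), sep = "\n")

# Or keep the counts as a named list for later use
freq_list <- as.list(setNames(as.integer(freq), names(freq)))
```

`freq_list[["3"]]` then returns the count for the value 3, provided that value occurs at least once in the data.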

Related

Break ggplot2 into Windows/facets with labels in alphabetical order

I want to generate a point plot that shows an equal number of rows in each frame (facets are fine if nothing else works), in alphabetical order (A1, A2, B1, B2, etc.), because the plot is too tall to read the axis labels clearly. I want to break the plot into 4 windows with the same number of rows, i.e. 13 each (preferably with tidyverse and without hard-coding the number of rows).
library(tidyverse)
df <- data.frame(names=c(paste0(LETTERS,1),paste0(LETTERS,2)),value=1:52)
df %>%
arrange(desc(names)) %>%
ggplot(aes(y=names,x=value))+
geom_point()+
scale_y_discrete(limits=rev)
We can create a grouping column with gl and use facet_wrap
library(dplyr)
library(ggplot2)
df %>%
arrange(desc(names)) %>%
mutate(grp = as.integer(gl(n(), ceiling(n()/4), n()))) %>%
ggplot(aes(y=names,x=value))+
geom_point() +
facet_wrap(~ grp, scales = 'free_y')
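The grouping step can be checked on its own before plotting. The sketch below uses base R and integer division instead of gl, and sorts ascending rather than descending; `n_panels`, `rows_per_panel`, and `sorted` are hypothetical names introduced only for illustration.

```r
df <- data.frame(names = c(paste0(LETTERS, 1), paste0(LETTERS, 2)), value = 1:52)

n_panels <- 4                                   # number of facets wanted
rows_per_panel <- ceiling(nrow(df) / n_panels)  # derived, not hard-coded

# Sort alphabetically, then assign consecutive blocks of rows to panels
sorted <- df[order(df$names), ]
sorted$grp <- ceiling(seq_len(nrow(sorted)) / rows_per_panel)

# Each of the 4 groups should hold 13 rows
table(sorted$grp)
```

The resulting `grp` column can be passed to facet_wrap in the same way as the gl-based column above.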

Fill geom_tile by mode of a factor variable or other ways to create a heat map in R

I am trying to create a heat map in R using three factors. I would like to be able to fill the colour using the modal category of one of the factors but I have not been able to find out how to do this.
When I try ggplot with geom_tile, it does produce the heatmap, however, I am not sure how it chooses the value of the fill variable. It certainly isn't the mode because I've checked this.
For instance, using the inbuilt dataset ChickWeight, I would like the fill to be based on the modal (most frequent) category of a variable "weight_group" I created.
library(dplyr)
library(ggplot2)
data(ChickWeight)
glimpse(ChickWeight)
ChickWeight$Time <- ifelse(ChickWeight$Time >= 10,1,0)
ChickWeight <- ChickWeight %>% mutate(weight_group = ntile(weight, 3))
ChickWeight$Diet <- as.factor(ChickWeight$Diet)
ChickWeight$Time <- as.factor(ChickWeight$Time)
ChickWeight$weight_group <- as.factor(ChickWeight$weight_group)
table(ChickWeight$Diet, ChickWeight$Time, ChickWeight$weight_group)
ggplot(data = ChickWeight, aes(x=Time, y=Diet, fill=weight_group)) +
geom_tile()
Based on the three-way table, the bottom-right tile should be pink (weight_group == 1) rather than green, because the modal category of weight_group when Diet == 1 and Time == 1 is weight_group == 1 (11 counts).
Any help on this would be greatly appreciated.
Thank you!
You can define a function getMode that calculates the mode of a vector using plyr's count function to create a data frame of the counts for each class. Then sort the data frame and get the top value.
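# Call plyr::count() by namespace instead of attaching plyr:
# library(plyr) after dplyr masks dplyr's summarize() and arrange(),
# which would make the grouped summarize() below ignore the groups.
getMode <- function(vec){
  df <- plyr::count(vec) %>%
    dplyr::arrange(-freq)
  return(df[1,"x"])
}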
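If adding plyr as a dependency is not desirable, the same idea can be sketched in base R with table and which.max. This is an illustrative alternative, not the answer's own function; `get_mode` is a hypothetical name, it returns the mode as a character string, and on ties the first level wins.

```r
# Most frequent value of a vector; ties resolve to the first level in table()
get_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

get_mode(c("a", "b", "b", "c"))
get_mode(factor(c(1, 3, 3, 2, 3)))
```

Because the result is a character string, wrap it in factor() or as.integer() if the fill aesthetic needs a specific type.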
From here, group by Time and Diet to find the mode for each combination of the two, then use that as the fill for ggplot.
ChickWeight %>%
group_by(Time, Diet) %>%
summarize(modeWeightGroup = getMode(weight_group)) %>%
ggplot(aes(x=Time, y=Diet, fill= modeWeightGroup)) +
geom_tile()
I also don't think the bottom-right square should be weight_group 1. It looks like the three-way table is split by weight_group, so that cell is saying that, among chicks in weight_group 1, the modal (Time, Diet) combination is (1, 1).
Using dplyr to count the most frequent category of weight_group for each combination of Time and Diet:
ChickWeight %>%
group_by(Time, Diet) %>%
count(weight_group) %>%
filter(n == max(n)) %>%
ggplot(
aes(x = Time,
y = Diet,
fill = weight_group)
) +
geom_tile()
By the way, since you already use dplyr::mutate, note that all of the pre-processing here can be done inside a single mutate call.
That means instead of:
ChickWeight$Time <- ifelse(ChickWeight$Time >= 10,1,0)
ChickWeight <- ChickWeight %>% mutate(weight_group = ntile(weight, 3))
ChickWeight$Diet <- as.factor(ChickWeight$Diet)
ChickWeight$Time <- as.factor(ChickWeight$Time)
ChickWeight$weight_group <- as.factor(ChickWeight$weight_group)
you can simply type:
ChickWeight <-
ChickWeight %>%
mutate(
Time = as.factor(ifelse(Time>=10, 1 ,0)),
Diet = as.factor(Diet),
weight_group = as.factor(ntile(weight, 3))
)
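One caveat with the filter(n == max(n)) approach above: when two categories tie within a Time and Diet combination, both rows are kept and the tiles overlap. A hedged alternative, assuming dplyr 1.0.0 or later, is slice_max with with_ties = FALSE, which keeps exactly one row per group. The name `modal` is illustrative, and as.data.frame is used here only to drop ChickWeight's extra grouped-data classes.

```r
library(dplyr)

modal <- as.data.frame(ChickWeight) %>%
  mutate(
    Time = as.factor(ifelse(Time >= 10, 1, 0)),
    weight_group = as.factor(ntile(weight, 3))
  ) %>%
  count(Time, Diet, weight_group) %>%        # one row per observed combination
  group_by(Time, Diet) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>% # keep a single modal row per tile
  ungroup()

nrow(modal)
```

`modal` has one row per Time and Diet combination and can be piped into ggplot with fill = weight_group as before.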

Stacked bar chart with multiple categorical variables in ggplot2 with facet_grid

I am trying to create a stacked bar chart in ggplot2 to display the percentage of values corresponding to each categorical variable. Here's an example of the data that I am trying to work with.
sampledf <- data.frame("Death" = rep(0:1, each = 5),
"HabitA" = rep(0:1, c(3, 7)),
"HabitB" = rep(1:2, c(4, 6)),
"HabitC" = rep(0:1, c(6, 4)))
The habit columns are the ones I am using to build the stacked bar chart, and I want to use the Death column in facet_grid. I'm looking to show the percentage of values for each habit in the bar chart.
The data I think I need for the chart should read as follows: under Death = 0, 60% of the HabitA values are 0 and 40% are 1, while under Death = 1, 100% of the HabitA values are 1.
I have produced charts like this using ggplot and group_by, summarise for only one attribute, but I am not sure how this works with multiple categorical attributes in the data.
sampledf %>%
group_by(Death, HabitA) %>%
summarise(count=n()) %>%
mutate(perc=count/sum(count))
This produces what I want for just one variable, but when I include another attribute in the group_by call, it returns counts and percentages for the combination of all 3 attributes, which is not what I am looking for. I tried using summarise_at/mutate_at, but it doesn't seem to work.
sampledf %>%
group_by(Death) %>%
mutate_at(c("HabitA", "HabitB"), Counts = n())
Is there a straightforward way to do this in R, and use the resulting data as input for ggplot2?
Edit:
I tried reshaping the data and using the long form to build my plot. Here's what I have.
library(reshape2)  # assumption: melt() here comes from reshape2
long <- melt(sampledf, id.vars = c("Death"))
The resulting data is in this format.
  Death variable value
1     0   HabitA     0
2     0   HabitA     0
3     0   HabitA     0
4     0   HabitA     1
5     0   HabitA     1
6     1   HabitA     1
7     1   HabitA     1
I'm not sure how to use the value attribute to build the plot, because the ggplot I am currently trying to build is counting the total number of times each level occurs in the variable column.
ggplot(long, aes(x = variable, fill = variable)) +
geom_bar(stat = "count", position = "dodge") + facet_grid(~ Death)
Try this. It may not be the most straightforward approach, but it works. It reshapes the data with gather, as @aosmith suggested, counts the observations in each Death, habitat, and count group, converts those counts to a percentage within each Death and habitat combination, and finally summarizes to keep one row per unique combination.
sampledf_edited <- sampledf %>%
tidyr::gather("habitat", "count", 2:4) %>%
group_by(Death, habitat, count) %>%
mutate(observation = n()) %>%
ungroup() %>%
group_by(Death, habitat) %>%
mutate(percent = observation/n()) %>%
ungroup() %>%
group_by(Death, habitat, count, percent) %>%
summarize()
It is necessary to convert count to a factor.
sampledf_edited$count <- as.factor(sampledf_edited$count)
Plot with ggplot.
ggplot(sampledf_edited, aes(habitat, percent, fill = count)) +
geom_bar(stat = "identity") +
facet_grid(~ Death)
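The percentages can also be cross-checked in base R before plotting. The sketch below stacks the habit columns, builds a three-way count table, and uses prop.table to turn counts into shares within each Death and habit combination. The names `habits`, `long`, `counts`, and `shares` are introduced here for illustration only.

```r
sampledf <- data.frame(Death  = rep(0:1, each = 5),
                       HabitA = rep(0:1, c(3, 7)),
                       HabitB = rep(1:2, c(4, 6)),
                       HabitC = rep(0:1, c(6, 4)))

habits <- c("HabitA", "HabitB", "HabitC")

# stack() returns columns `values` and `ind` (the source column name)
long <- stack(sampledf[habits])
long$Death <- rep(sampledf$Death, times = length(habits))

# Counts by Death x habit x value, then shares within each Death x habit cell
counts <- table(Death = long$Death, habit = long$ind, value = long$values)
shares <- prop.table(counts, margin = c(1, 2))

# Shares of each value for HabitA under Death = 0
shares["0", "HabitA", ]
```

For HabitA under Death = 0 this gives 0.6 for value 0 and 0.4 for value 1, matching the 60%/40% split described in the question.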

Barplot with groups and subgroups

I need to make a barplot with groups and variables using the data below:
df<-as.data.frame(cbind(c(1,2,3),
c(0.4,-0.11,-0.07),
c(0.31,0.07,0),
c(0.45,-0.23,0.02)))
names(df)<-c('cat','var1','var2','var3')
I need a barplot with cat on the x-axis and the measurement of each variable on the y-axis.
For example, for cat = 1, I need three bars at that x position showing the values of var1, var2, and var3.
library(tidyverse)
df <- df %>%
gather(var, val, -cat)
ggplot(df, aes(cat, val, fill=var)) +
geom_col(position="dodge")

Plotting Average/Median of each column in data frame grouped by factors

I am trying to make a grouped barplot and I am running into trouble. For example, using the mtcars dataset, I want to group everything by the 'vs' column (col #8), find the average of all remaining columns, and then plot them by group.
Below is a very poor example of what I am trying to do and I know it is incorrect.
Ideally, mpg for vs=1 & vs=0 would be side by side, followed by cyl's means side by side, etc. I don't care if aggregate is swapped for dplyr, if ggplot is used, or even if the aggregate step is dropped entirely. I'm just looking for a way to do this since it is driving me crazy.
df = mtcars
agg = aggregate(df[,-8], by=list(df$vs), FUN=mean)
agg
barplot(t(agg), beside=TRUE, col=df$vs)
Try
library(ggplot2)
library(dplyr)
library(tidyr)
df %>%
group_by(vs=factor(vs)) %>%
# summarise_each() and funs() are deprecated; across() is the current equivalent
summarise(across(everything(), mean)) %>%
gather(Var, Val, -vs) %>%
ggplot(., aes(x=Var, y=Val, fill=vs))+
geom_bar(stat='identity', position='dodge')
Or using base R
m1 <- as.matrix(agg[-1])
row.names(m1) <- agg[,1]
barplot(m1, beside=TRUE, col=c('red', 'blue'), legend=row.names(m1))
