Two-variable side-by-side bar plot of categorical data in ggplot - r

demo <- read.table(header = TRUE,
text ="var1 var2
Good Excellent
Subpar Good
Excellent Decent
Good Good
Subpar Subpar")
How would I create a side-by-side bar plot of var1 and var2 where the y-axis is the count of each distinct value?
For instance, under "Good", a pair of bars comparing the number of "Good" values in var1 with the number in var2.

The tidyverse is perfect for that:
library(tidyverse)
demo %>%
gather(key, value) %>%
mutate(value_ordered = factor(value, levels=c("Decent","Good", "Subpar", "Excellent"))) %>%
ggplot(aes(value_ordered, fill=key)) +
geom_bar(position="dodge")
Or, with bars of the same width (complete() adds the combinations that are missing from one of the variables, so every category gets a slot for both var1 and var2):
as_tibble(demo) %>%
gather(key, value) %>%
group_by(key, value) %>% # group by variable and value
count() %>% # count the frequency
ungroup() %>% # ungroup
complete(key, value, fill = list(n = 0)) %>% # complete missing combinations with a count of 0
mutate(value_ordered = factor(value, levels=c("Decent","Good", "Subpar", "Excellent"))) %>%
ggplot(aes(value_ordered, n, fill=key)) +
geom_col(position = "dodge") # it is recommended to use geom_col directly instead of stat="identity"
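In current tidyr, gather() is superseded by pivot_longer(). Below is a minimal sketch of the same-width version written with pivot_longer(), count() and complete(). It assumes R 4.x-style character columns (hence stringsAsFactors = FALSE) and the same demo data; the names counts and p are just illustrative.

```r
# Sketch: same-width dodged bars using pivot_longer() instead of the superseded gather()
library(dplyr)
library(tidyr)
library(ggplot2)

demo <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "var1 var2
Good Excellent
Subpar Good
Excellent Decent
Good Good
Subpar Subpar")

counts <- demo %>%
  pivot_longer(everything(), names_to = "key", values_to = "value") %>% # one row per cell
  count(key, value) %>%                                                 # frequency of each value per variable
  complete(key, value, fill = list(n = 0L))                             # explicit zero rows keep bars equal width

p <- ggplot(counts, aes(x = value, y = n, fill = key)) +
  geom_col(position = "dodge")
```

Printing p draws the plot. Because the zero-count rows are real rows rather than NA, geom_col() does not drop them with a warning.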

Alternatively, with the reshape package:
library(reshape)
library(ggplot2)
#sample data
demo <- read.table(header = TRUE,
text ="var1 var2
Good Excellent
Subpar Good
Excellent Decent
Good Good
Subpar Subpar")
#pre-processing
df <- merge(melt(table(demo$var1)), melt(table(demo$var2)), by='Var.1', all=TRUE)
colnames(df) <- c("Words", "Var1", "Var2")
df[is.na(df)] <- 0
df <- melt(df, id=c("Words"))
colnames(df) <- c("Words", "Column_Name", "Count")
#plot
ggplot(df, aes(x=Words, y=Count, fill=Column_Name)) +
geom_bar(position="dodge", stat="identity")
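For comparison, here is a base-R sketch that needs neither reshape nor the merge/NA step. stack() turns the two columns into one long data frame, and table() cross-tabulates it, so combinations that never occur automatically get a count of 0. It assumes character columns (stringsAsFactors = FALSE), because stack() ignores factor columns.

```r
# Sketch: base-R side-by-side bars, no reshaping packages
demo <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "var1 var2
Good Excellent
Subpar Good
Excellent Decent
Good Good
Subpar Subpar")

# stack() gives columns 'values' and 'ind'; table() counts every values x ind pair (0 if absent)
tab <- table(stack(demo))

# transpose so each word is a group and var1/var2 are the bars inside it
barplot(t(tab), beside = TRUE, legend.text = TRUE)
```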

Related

R geom_point(): number of points reflects the value in a column

Say I have mydf, a data frame that looks like this:
Name Value
Mark 101
Joe 121
Bill 131
How would I go about creating a scatterplot in ggplot that takes the number in the Value column (e.g., 101) and draws that many points on the chart? Is there a stat = option I am unfamiliar with, or would I have to structure the data so that Mark, for example, has 101 rows, Joe has 121, etc.?
Update: As suggested by Ben Bolker (many thanks), we can set the width of geom_jitter, and we can additionally add a colour aesthetic:
df %>%
group_by(Name) %>%
complete(Value = 1:Value) %>%
ggplot(aes(x=Name, y=Value, colour=Name))+
geom_jitter(width = 0.1)
Or, more compactly, as suggested by Henrik (many thanks), using uncount():
ggplot(uncount(df, Value, .id = "y"), aes(x = Name, y = y)) + ...
First answer:
Something like this?
library(dplyr)
library(ggplot2)
library(tidyr) # complete
df %>%
group_by(Name) %>%
complete(Value = 1:Value) %>%
ggplot(aes(x=Name, y=Value))+
geom_jitter()
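A self-contained sketch of the uncount() variant: tidyr::uncount() repeats each row Value times and, with .id = "y", numbers the copies 1..Value. The data frame is rebuilt from the question under the name df (as the answers use), and the jitter width of 0.1 follows Ben Bolker's suggestion above.

```r
# Sketch: one point per unit of Value via tidyr::uncount()
library(tidyr)
library(ggplot2)

df <- data.frame(Name = c("Mark", "Joe", "Bill"), Value = c(101, 121, 131))

# each row is repeated Value times; the new column y counts 1, 2, ..., Value within each row
expanded <- uncount(df, Value, .id = "y")

p <- ggplot(expanded, aes(x = Name, y = y, colour = Name)) +
  geom_jitter(width = 0.1)
```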

Plot percentages in R as blocks

I have the table to the left:
table <- cbind(c("x1","x2", "x3"), c("0.4173","0.9211","0.0109"))
and am trying to make the plot to the right.
Are there any packages in R that can do what I'm trying to achieve?
A base R option would be to use barplot on a named vector (v1 is defined under data below):
barplot(v1)
Or convert it to a two-column data.frame with stack() and use the formula method:
barplot(values ~ ind, stack(v1))
Or we can use the tidyverse with ggplot2:
library(dplyr)
library(ggplot2)
library(tidyr)
library(tibble)
enframe(v1, name = "id", value = 'block') %>%
mutate(non_block = 1 - block) %>%
pivot_longer(cols = -id) %>%
ggplot(aes(x = id, y = value, fill = name)) +
geom_col() +
coord_flip() +
theme_bw()
data
v1 <- setNames(c(0.4173, 0.9211, 0.0109), paste0("x", 1:3))
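The question's table is a character matrix, while the answer starts from a numeric named vector. A minimal sketch of the conversion, assuming the matrix from the question (renamed tab here so it does not mask base::table()):

```r
# Sketch: turn the question's character matrix into the named numeric vector v1
tab <- cbind(c("x1", "x2", "x3"), c("0.4173", "0.9211", "0.0109"))

# column 2 holds the numbers as strings, column 1 holds the names
v1 <- setNames(as.numeric(tab[, 2]), tab[, 1])
```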

bicolor heatmap with factor levels

I have this dataframe:
library(dplyr)
set.seed(0)
df <- data.frame(id = factor(sample(1:100, 10000, replace=TRUE), levels=1:100),
year = factor(sample(1950:2019, 10000, replace=TRUE), levels=1950:2019)) %>% unique() %>% arrange(id, year)
I'm looking to plot a heatmap where the ids are on the x-axis and the years on the y-axis, and the color is blue when the data point exists and red when it doesn't. I'm almost there, but I can't figure out how to change the fill argument to get the two colors:
ggplot(df, aes(id, year, fill= year)) +
geom_tile()
The reason for making both variables factors is so that every level is plotted even when a year doesn't have any id (that whole row should then be red).
EDIT:
Two things I forgot to add (I hope it's not too late):
How do I add alpha transparency to geom_tile() without messing it up?
I need to sort the ids from the most missing values to the fewest.
The complete() function from the tidyr package is useful for filling in missing combinations. First set a flag variable that indicates whether the data is present, then expand the data frame with the missing combinations and fill the new flag variable with FALSE:
df <- df %>%
mutate(flag = TRUE) %>%
complete(id, year, fill = list(flag = FALSE))
ggplot(df, aes(id, year, fill = flag)) +
geom_tile() +
scale_fill_manual(values = c("FALSE" = "red", "TRUE" = "blue")) # red = missing, blue = present
EDIT1: To add transparency, set alpha inside geom_tile(), e.g. geom_tile(alpha = 0.5). alpha ranges from 0 to 1; the lower the value, the more transparent the tiles.
EDIT2: To sort the ids from most to fewest missing values, add the following code before the ggplot code (fct_reorder() comes from forcats, which library(tidyverse) loads):
# Determine the order of the IDs: fewest present years (= most missing) first
df_order <- df %>%
group_by(id) %>%
summarize(n_present = sum(flag)) %>%
arrange(n_present) %>%
mutate(order = row_number()) %>%
select(id, order)
# Set the IDs in order on the chart
df <- df %>%
left_join(df_order, by = "id") %>%
mutate(id = fct_reorder(id, order))
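The flag, the two colours, the transparency and the sorting can also be combined in one pipeline. This sketch swaps the helper table for fct_reorder(id, flag, .fun = sum), which orders the ids by their number of present years, ascending, so the ids with the most missing years come first as the question asks. The alpha of 0.7 is an arbitrary illustrative value.

```r
# Sketch: bicolor heatmap with all id/year combinations, ids sorted by missingness
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)

set.seed(0)
df <- data.frame(id = factor(sample(1:100, 10000, replace = TRUE), levels = 1:100),
                 year = factor(sample(1950:2019, 10000, replace = TRUE), levels = 1950:2019)) %>%
  unique() %>%
  mutate(flag = TRUE) %>%                              # observed combinations
  complete(id, year, fill = list(flag = FALSE)) %>%   # every factor level pair; new rows are missing
  mutate(id = fct_reorder(id, flag, .fun = sum))      # fewest present years (most missing) first

p <- ggplot(df, aes(id, year, fill = flag)) +
  geom_tile(alpha = 0.7) +
  scale_fill_manual(values = c("FALSE" = "red", "TRUE" = "blue"))
```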
I think you need to do some pre-processing before plotting. Create a temporary variable (data_exist) that marks that data is present for that id and year, then use complete() to fill in the missing years for each id, and plot it.
library(tidyverse)
df %>%
mutate_all(~as.integer(as.character(.))) %>%
mutate(data_exist = 1) %>%
complete(id, year = min(year):max(year), fill = list(data_exist = 0)) %>%
mutate(data_exist = factor(data_exist)) %>%
ggplot() + aes(id, year, fill= data_exist) + geom_tile()
With expand.grid you can create a data frame with all combinations of ids and years, then left join df (with a present marker added) onto these combinations to see which ones occur in df:
all <- expand.grid(id=levels(df$id),year=levels(df$year)) %>% # every id/year combination
left_join(mutate(df, present = 1), by = c("id", "year")) %>% # present is NA where df has no row
mutate(present=ifelse(is.na(present),'0','1'))
ggplot(all, aes(id, year, fill= present)) + # plot the factors directly so the y-axis shows the actual years
geom_tile() +
scale_fill_manual(values=c('0'='red','1'='blue')) + # change default colors
theme(legend.position="none") # hide legend

Adding character values of a column in R

I have two columns, square_id and Smart_Nsmart, as given below.
I want to count the N's and S's for each square_id and plot the result with ggplot, i.e. square_id vs. Smart_Nsmart.
square_id Smart_Nsmart
1 S
1 N
2 N
2 N
2 S
2 S
3 N
3 S
3 S
3 S
We can use count() and then plot the frequencies with ggplot. Here we plot them with geom_bar, since the OP's post doesn't say which kind of plot is wanted.
library(dplyr)
library(ggplot2)
df %>%
count(square_id, Smart_Nsmart) %>%
ggplot(., aes(x= square_id, y = n, fill = Smart_Nsmart)) +
geom_bar(stat = 'identity')
The answer above works well. Alternatively, instead of count(), you can use group_by() and summarise(), which makes it easy to apply other summary functions later.
library(dplyr)
library(ggplot2)
dff <- data.frame(a=c(1,1,1,1,2,1,2),b=c("C","C","N","N","N","C","N"))
dff %>%
group_by(a,b) %>%
summarise(n = length(b) ) %>%
ggplot(., aes(x= a, y = n, fill = b)) +
geom_bar(stat = 'identity')
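To try the count() approach end to end, here is a sketch that rebuilds the question's data, assuming the two flattened columns pair up row by row in the order shown. square_id is wrapped in factor() so each id gets its own discrete bar position, and geom_col() is used as the shorthand for geom_bar(stat = 'identity').

```r
# Sketch: rebuild the question's data and count N/S per square_id
library(dplyr)
library(ggplot2)

df <- data.frame(square_id = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
                 Smart_Nsmart = c("S", "N", "N", "N", "S", "S", "N", "S", "S", "S"))

# one row per square_id / Smart_Nsmart pair, with the frequency in column n
counts <- count(df, square_id, Smart_Nsmart)

p <- ggplot(counts, aes(x = factor(square_id), y = n, fill = Smart_Nsmart)) +
  geom_col()
```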

Grouping bars by factor level in ggplot2

I've got a data frame with four three-level categorical variables: before_weight, after_weight, before_pain, and after_pain.
I'd like to make a bar plot showing the proportion for each level of the variables. My current code achieves that.
The problem is the presentation of the data. I'd like the respective before and after bars to be grouped together, so that the bar representing the people who answered 1 in the before_weight variable sits next to the bar representing the people who answered 1 in the after_weight variable, and so forth for both the weight and pain variables.
I've been trying to use dplyr's mutate() with numerous ifelse() statements to make a new variable pairing up the groups in question, but can't seem to get it to work.
Any help would be much appreciated.
starting point (df):
df <- data.frame(before_weight=c(1,2,3,2,1),before_pain=c(2,2,1,3,1),after_weight=c(1,3,3,2,3),after_pain=c(1,1,2,3,1))
current code:
library(tidyr)
dflong <- gather(df, varname, score, before_weight:after_pain, factor_key=TRUE)
dflong$score <- as.factor(dflong$score)
library(ggplot2)
library(dplyr)
dflong %>%
group_by(varname) %>%
count(score) %>%
mutate(prop = 100*(n / sum(n))) %>%
ggplot(aes(x = varname, y = prop, fill = factor(score))) + scale_fill_brewer() + geom_col(position = 'dodge', colour = 'black')
UPDATE:
I'd like proportions rather than counts, so I've attempted to tweak Nate's code. Since I'm using the question variable to group the data to get the proportions, I can't seem to use gsub() to change the content of that variable. Instead I added question2 and passed it to facet_wrap(). It seems to work:
df %>% gather("question", "val") %>%
count(question, val) %>%
group_by(question) %>%
mutate(percent = 100*(n / sum(n))) %>%
mutate(time= factor(ifelse(grepl("before", question), "before", "after"), c("before", "after"))) %>%
mutate(question2= ifelse(grepl("weight", question), "weight", "pain")) %>%
ggplot(aes(x=val, y=percent, fill = time)) + geom_col(position = "dodge") + facet_wrap(~question2)
Does this code make the visual comparisons you are after? One ifelse() and one gsub() create the variables we can use for faceting and fill in ggplot.
df %>% gather("question", "val") %>% # go long
mutate(time = factor(ifelse(grepl("before", question), "before", "after"),
c("before", "after")), # use factor with levels to control order
question = gsub(".*_", "", question)) %>% # clean for facets
ggplot(aes(x = val, fill = time)) + # use fill not color for whole bar
geom_bar(position = "dodge") + # stacking is the default option
facet_wrap(~question) # two panels
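Proportions and the cleaned-up facet variable can also be combined, because the question column only needs to stay intact until the percentages are computed. This sketch splits it afterwards with tidyr::separate() (a swap for the gsub()/ifelse() pair, assuming every column name has the form time_measure) and uses pivot_longer() instead of gather().

```r
# Sketch: percentages per question, then split "before_weight" into time and measure
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(before_weight = c(1, 2, 3, 2, 1), before_pain = c(2, 2, 1, 3, 1),
                 after_weight = c(1, 3, 3, 2, 3), after_pain = c(1, 1, 2, 3, 1))

props <- df %>%
  pivot_longer(everything(), names_to = "question", values_to = "val") %>%
  count(question, val) %>%
  group_by(question) %>%
  mutate(percent = 100 * n / sum(n)) %>%                       # share of answers within each question
  ungroup() %>%
  separate(question, into = c("time", "measure"), sep = "_") %>%
  mutate(time = factor(time, levels = c("before", "after")))  # before bar drawn first

p <- ggplot(props, aes(x = factor(val), y = percent, fill = time)) +
  geom_col(position = "dodge") +
  facet_wrap(~measure)
```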
