Proportional stacked barplot with multiple variables in R

I have a table in R that looks like this. The columns are male and female. The rows are 4 variables, each with a no row and a yes row. The values are proportions, so within a column the no and yes rows of each variable sum to 1. For example, in column 1 the values in rows 1 and 2 sum to 1, because they are the proportions of no and yes for variable 1.
propvars
      prop_sum_male prop_sum_female
1_no     0.90123457      0.96296296
1_yes    0.09876543      0.03703704
2_no     0.88750000      0.96296296
2_yes    0.11250000      0.03703704
3_no     0.88750000      1.00000000
3_yes    0.11250000      0.00000000
4_no     0.44444444      0.40740741
4_yes    0.55555556      0.59259259
I want to create a stacked barplot for those 4 variables.
I used
barplot(propvars)
which gives me this:
As you can see, the distinction between male and female is correct, but it stacks all variables on top of each other. I need 4 separate bars next to each other, one per variable, with yes/no stacked within each bar. So the y-axis should run from 0 to 1 instead of from 0 to 4 as it does now.
Any hints on how to do this?

This may be helpful. I rearranged your data so it can be plotted: I added the row names as a column, then converted the data to long format.
DATA & CODE
mydf <- structure(list(prop_sum_male = c(0.90123457, 0.09876543, 0.8875,
0.1125, 0.8875, 0.1125, 0.44444444, 0.55555556), prop_sum_female = c(0.96296296,
0.03703704, 0.96296296, 0.03703704, 1, 0, 0.40740741, 0.59259259
)), .Names = c("prop_sum_male", "prop_sum_female"), class = "data.frame", row.names = c("1_no",
"1_yes", "2_no", "2_yes", "3_no", "3_yes", "4_no", "4_yes"))
library(qdap)
library(dplyr)
library(tidyr)
library(ggplot2)
mydf$category <- rownames(mydf)
df <- mydf %>%
  gather(Gender, Proportion, -category) %>%
  mutate(Gender = char2end(Gender, "_", 2)) %>%
  separate(category, c("category", "Response"))
ggplot(data = df, aes(x = category, y = Proportion, fill = Response)) +
  geom_bar(stat = "identity", position = "stack") +
  facet_grid(. ~ Gender)
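If base graphics are preferred, the same layout can be sketched with barplot() alone. This is only a minimal illustration for the male column, assuming the proportions from the question; the object name male_mat and the labels var1 to var4 are made up for the example. barplot() stacks the rows of a matrix within each column, so putting no/yes in the rows and the four variables in the columns gives four bars that each reach 1.

```r
# Proportions copied from the question (male column only)
male <- c(0.90123457, 0.09876543, 0.8875, 0.1125,
          0.8875, 0.1125, 0.44444444, 0.55555556)

# matrix() fills column-wise, so each column holds one variable
# and the two rows are its "no" and "yes" proportions.
# "var1".."var4" are hypothetical labels, not names from the question.
male_mat <- matrix(male, nrow = 2,
                   dimnames = list(c("no", "yes"), paste0("var", 1:4)))

# One stacked bar per column; the y-axis runs from 0 to 1
barplot(male_mat, ylim = c(0, 1), legend.text = TRUE, main = "male")
```

The female column can be handled the same way, for example by drawing both plots side by side with par(mfrow = c(1, 2)).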

Related

How do I construct a crosswalk table with multiple crosswalks with the ggplot2 package?

I want to present my crosswalk results for 5 different crosswalks in a combined table with the ggplot2 package.
I've created a data.frame with all results that need to be displayed:
crosswalk <- data.frame(subset(fsdiscDET,, (1:2)),(subset(fsdiscDIS,, 2)),
(subset(fsdiscANT,, 2)), (subset(fsdiscPSY,, 2)),(subset(fsdiscANPICD,, 2)),
(subset(fsdiscANPID5,,2)))
# Define column names for the data frame named "crosswalk"
colnames(crosswalk) <- c("SumScore", "ThetaDET", "ThetaDIS","ThetaANT", "ThetaPSY",
"ThetaANPiCD", "ThetaPID5BF+M")
The table is constructed like this:
SumScores ThetaDET ThetaDIS ThetaANT ThetaPSY ThetaANPiCD ThetaPID5BF+M
0 -2 ... ....
1
2
3
4
5
6
7
8
Unfortunately, I can't show my real results but the table is filled with scores, that should be displayed as a crosswalk from the sum scores, so here is some example data: (first row)
> dput(head(crosswalk, 1))
structure(list(SumScore = 0L, ThetaDET = -0.880871248855981,
ThetaDIS = -0.594351208632866, ThetaANT = -0.463495518249115,
ThetaPSY = -0.471562212797643, ThetaANPiCD =
-0.850865132469677,
`ThetaPID5BF+M` = -0.91391979254119), row.names = 0L, class =
"data.frame")
Here is an example of what I want to create: example
In my case, the different "columns" of the example would be the sum scores (0 to 8) of the table I have created above. The crosswalk would then place each sum score on the y-axis (Theta) at the position of its corresponding Theta score. So the different columns like ThetaDET and ThetaDIS are all filled with values from -3 to 4 and should be represented on the left y-axis of the graphic.
Does someone have an idea how to do that?
Here's an example with the mtcars dataset. We can reshape long, then scale within each variable, and plot:
library(tidyverse)
mtcars %>%
  rownames_to_column() %>%
  pivot_longer(-rowname) %>%
  group_by(name) %>%
  mutate(scaled = as.numeric(scale(value))) %>%
  ungroup() %>%
  ggplot(aes(name, scaled, label = value, color = name)) +
  geom_point(shape = "-", size = 7) +
  geom_text(hjust = -0.5, size = 2, alpha = 0.7, check_overlap = TRUE) +
  scale_x_discrete(position = "top", name = NULL) +
  guides(color = "none") +
  theme_minimal()

Group differences barplot

I'm still pretty new to R, I'm sorry if the question is a stupid one!
For some descriptives I created a barplot to visualize group differences in my sample. I have two groups of people - suicide attempters and non-attempters. They differ regarding their diagnoses and so far I have a plot showing me how many people per group I have with a certain diagnosis, but I'd like to have a bar representing those people per group who do not have this diagnosis.
So I'd have a bar representing the number of people with MDD in the attempters group, a bar for those without in the attempters group, a bar for those with MDD in the non-attempters and a bar for those without MDD in the non-attempters.
Regarding my data: Everything is coded as 0 or 1, except for the attempters or not.
My old data frame looks something like this:
code  MDD  Anxiety  PTBS  attempters
01    0    1        1     1
02    1    1        0     0
03    0    0        1     0
04    0    1        0     0
At first I changed my data from wide to long and recoded the grouping variable attempters to a factor:
# create data frame for attempters
data_attempters <- data_gesamt %>%
  pivot_longer(cols = c(MDD, Anxiety, PTBS),
               names_to = "predictors", values_to = "value") %>%
  filter(value == 1) %>%
  # convert "attempters" to factor
  mutate(attempter = as.factor(attempters)) %>%
  # rename factor levels
  mutate(attempter = recode_factor(attempter, "yes" = "0", "no" = "1")) %>%
  group_by(predictors, attempter) %>%
  summarize(severity = n(), .groups = "drop")
which got me a data frame as follows:
predictors  attempters  severity
MDD         0           1
Anxiety     1           1
Anxiety     0           2
PTBS        1           1
PTBS        0           1
and then used the following to plot:
plot_attempers <- data_attempters %>%
  ggplot(aes(x = attempter, y = severity,
             fill = attempter, group = attempter)) +
  geom_bar(stat = "identity",
           # position_dodge to avoid bars stacked on each other
           position = position_dodge()) +
  scale_fill_manual(labels = labs, values = c("0" = "#999999", "1" = "#CC79A7")) +
  facet_grid(. ~ predictors) +
  scale_y_continuous(limits = c(0, 12), breaks = seq(0, 12, by = 1)) +
  theme(legend.position = "bottom",
        axis.text.x = element_blank()) +
  ylab("Frequency")
plot_attempers
Did I add something in the code where I converted the data which made me lose the data about those who do not have a certain diagnosis which is why it is not shown in my plot? Or what do I need to add to get the non-diagnosis-people in the plot as well? Because as I can see in the new data-frame, I did lose those who do not have a diagnosis ...
My plot looks like this so far (please ignore the diagnoses I did not mention in my explanation here. I did not include them in this post so it is a smaller sample as well):
I would want four bars per diagnosis (two per group, one of them representing people with the diagnosis and one representing the people without)
You have to summarise your data first. Here I create a small example that simulates your data:
library(dplyr)
library(ggplot2)
library(reshape2)
df <- data.frame(code = 1:100,
                 MDD = sample(0:1, 100, replace = TRUE, prob = c(0.3, 0.7)),
                 anxiety = sample(0:1, 100, replace = TRUE, prob = c(0.4, 0.6)),
                 PTBS = sample(0:1, 100, replace = TRUE),
                 attempters = sample(0:1, 100, replace = TRUE, prob = c(0.2, 0.8)))
# long format: one row per person and diagnosis
x <- reshape2::melt(df[, -1], id.vars = "attempters", variable.name = "diagnosis")
# count people with (sick) and without (healthy) each diagnosis per group
t <- x %>% group_by(diagnosis, attempters) %>%
  summarise(sick = sum(value == 1), healthy = sum(value == 0), .groups = "drop")
t <- reshape2::melt(as.data.frame(t), id.vars = c("diagnosis", "attempters"))
# make the grouping variable discrete but keep the counts numeric
tt <- transform(t, attempters = factor(attempters))
ggplot(tt, aes(x = attempters, y = value)) +
  geom_bar(stat = "identity", aes(fill = variable), position = "dodge") +
  facet_wrap(~diagnosis) +
  scale_fill_manual(values = c("#CC79A7", "#999999"))
and this is the resulting plot
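To connect this back to the question: the people without a diagnosis are dropped at the filter(value == 1) step of the original pipeline. The following is a minimal sketch, assuming only the four example rows shown in the question (the real data_gesamt is not available), that keeps both outcomes by counting every combination instead of filtering. The object name counts and the column name n are chosen for this illustration.

```r
library(dplyr)
library(tidyr)

# The four example rows from the question
data_gesamt <- data.frame(
  code       = c("01", "02", "03", "04"),
  MDD        = c(0, 1, 0, 0),
  Anxiety    = c(1, 1, 0, 1),
  PTBS       = c(1, 0, 1, 0),
  attempters = c(1, 0, 0, 0)
)

counts <- data_gesamt %>%
  pivot_longer(cols = c(MDD, Anxiety, PTBS),
               names_to = "predictors", values_to = "value") %>%
  # no filter(value == 1) here, so rows with value 0 are kept
  count(predictors, attempters, value, name = "n") %>%
  # add combinations that never occur, with a count of 0
  complete(predictors, attempters, value, fill = list(n = 0))

print(counts)
```

Each diagnosis then has four rows (attempter or not, with or without the diagnosis), which is the structure needed for four bars per facet.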

How can I create a stacked bar chart with mean and scale variables (1-5) on the x axis?

I have a data frame whose variables are rated on a 1-5 scale. The table below shows, for each variable, the total count of each scale value.
df<-data.frame(
speed=c(2,3,3,2,2),
race=c(5,5,4,5,5),
cake=c(5,5,5,4,4),
lama=c(2,1,1,1,2))
library(data.table)
dcast(melt(df), variable~value)
# variable 1 2 3 4 5
#1 speed 0 3 2 0 0
#2 race 0 0 0 1 4
#3 cake 0 0 0 2 3
#4 lama 3 2 0 0 0
I want to make a stacked bar chart with the mean and the scale values 1-5 on the x axis, with one bar per variable in the first column (speed, race, cake, lama).
I tried the solution from Stacked Bar Plot in R, but that is not what I am looking for.
I had to try a few things and use some workarounds to get something very close to what you are looking for (assuming I understood the problem correctly):
library(dplyr)
library(ggplot2)
library(tidyr)
df<-data.frame(
speed=c(2,3,3,2,2),
race=c(5,5,4,5,5),
cake=c(5,5,5,4,4),
lama=c(2,1,1,1,2))
# get the data into the right shape for ggplot2
dfp <- df %>%
  # a column that uniquely identifies the rows is needed ("name of data row")
  dplyr::mutate(ID = as.factor(dplyr::row_number())) %>%
  # reshape the data into "tidy" (long) format, similar to an Excel pivot
  tidyr::pivot_longer(-ID) %>%
  # order by name and ID
  dplyr::arrange(name, ID) %>%
  # group by name
  dplyr::group_by(name) %>%
  # calculate proportion and cumulative sum to get the label position (p2)
  dplyr::mutate(p = value / sum(value),
                c = cumsum(p),
                p2 = c - p / 2,
                # the groups or x-axis values have to be recoded to numeric type
                name = dplyr::recode(name, "cake" = 1, "lama" = 2, "race" = 3, "speed" = 4))
# calculate the mean value per group (or label) as you want them in the plot
sec_labels <- dfp %>%
  dplyr::summarise(m = mean(value)) %>%
  dplyr::pull(m)
dfp %>%
  # build the base plot, filling by the row ID
  ggplot2::ggplot(aes(x = name, y = value, fill = ID)) +
  # make it a stacked bar chart of proportions; reverse = TRUE stacks ID 1 first,
  # so the segments follow the same order as the cumulative label positions (p2)
  ggplot2::geom_bar(stat = "identity",
                    position = ggplot2::position_fill(reverse = TRUE)) +
  # relabel the x axis and add a secondary x axis showing the means
  ggplot2::scale_x_continuous(breaks = 1:4,
                              labels = c("cake", "lama", "race", "speed"),
                              sec.axis = sec_axis(~ .,
                                                  breaks = 1:4,
                                                  labels = sec_labels)) +
  # flip the chart onto its side
  ggplot2::coord_flip() +
  # format the y axis (horizontal after flipping) as percent
  ggplot2::scale_y_continuous(labels = scales::percent) +
  # add a layer with labels positioned according to p2
  ggplot2::geom_text(aes(label = value, y = p2)) +
  # give the plot a title
  ggplot2::ggtitle("meaningful plot name") +
  # put the legend on top
  ggplot2::theme(legend.position = "top")
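A possibly simpler way to place the labels, offered as a sketch rather than as this answer's method: ggplot2 can compute the label positions itself when geom_text() uses position_fill(vjust = 0.5), which centres each label in its segment and avoids the manual cumulative-sum columns. The secondary axis with the means is left out here to keep the example short; the object names long and p are made up for the illustration.

```r
library(ggplot2)
library(tidyr)

df <- data.frame(
  speed = c(2, 3, 3, 2, 2),
  race  = c(5, 5, 4, 5, 5),
  cake  = c(5, 5, 5, 4, 4),
  lama  = c(2, 1, 1, 1, 2))

# add a row identifier, then reshape to long format (columns: ID, name, value)
long <- pivot_longer(transform(df, ID = factor(seq_len(nrow(df)))), -ID)

p <- ggplot(long, aes(x = name, y = value, fill = ID)) +
  # bars scaled to proportions of each variable's total
  geom_col(position = "fill") +
  # same position adjustment for the text, centred within each segment
  geom_text(aes(label = value), position = position_fill(vjust = 0.5)) +
  coord_flip() +
  scale_y_continuous(labels = scales::percent)

print(p)
```

Because the bars and the labels share the same position adjustment, they stay aligned regardless of the stacking order.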

R ggplot2: unify fill for grouped samples with binary data in geom_tile

I'm trying to display an "absence/presence" heatmap with geom_tile in R. I would like to have a fill for "1" or "present" if a feature (here: OTU) can be found in at least one of the samples within a group. So below is the example code, where I grouped the samples by site:
library(reshape2)
library(ggplot2)
df <- data.frame(
OTU = c("OTU001", "OTU002", "OTU003", "OTU004", "OTU005"),
Sample1 = c(0,0,1,1,0),
Sample2 = c(1,0,0,1,0),
Sample3 = c(1,1,0,1,0),
Sample4 = c(1,1,1,1,0))
molten_df <- melt(df)
# add group data
sites <- data.frame(
site = c(rep("site_A", 10), rep("site_B", 10)))
molten_df2 <- cbind(molten_df, sites)
# plot heatmap based on group variable sites
ggplot(molten_df2, aes(x = site, y = OTU, fill = value)) +
geom_tile()
The tile (site_A, OTU003) consists of the values Sample1 = 1 and Sample2 = 0, and the outcome is 0. On the other hand, the tile (site_B, OTU003) has Sample3 = 0 and Sample4 = 1, but it turns out as 1. Maybe it uses the last value for the fill? I would like to display 1 if an OTU appears in any of the grouped samples, regardless of the order. Does anyone know how to do this within ggplot2?
The other way I thought of (but failed coding) is to write a function that sets the remaining values of a given tile to 1, if at least one 1 appears.
With the dplyr package, you can create a new variable indicating whether an OTU at a given site is present in at least one sample:
library(dplyr)
tmp <- molten_df2 %>%
  group_by(OTU, site) %>%
  summarise(PA = as.factor(ifelse(sum(value) > 0, 1, 0)), .groups = "drop")
Then plot :
ggplot(tmp, aes(x = site, y = OTU, fill = PA)) +
geom_tile()
Or directly inside the ggplot function :
ggplot(group_by(molten_df2, OTU, site) %>%
         summarise(PA = factor(ifelse(sum(value) > 0, 1, 0))),
       aes(x = site, y = OTU, fill = PA)) +
  geom_tile()
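The same "present in at least one sample" rule can also be checked without dplyr. The sketch below assumes the example data from the question, rebuilt directly in long format; the object names long and presence are made up for the illustration. For 0/1 data, the group maximum is 1 exactly when at least one sample in the group is 1, so aggregate() with max gives the same presence/absence value as the sum-based test.

```r
# Example data from the question, built directly in long format
otu <- c("OTU001", "OTU002", "OTU003", "OTU004", "OTU005")
long <- data.frame(
  OTU   = rep(otu, times = 4),
  site  = rep(c("site_A", "site_B"), each = 10),
  value = c(0, 0, 1, 1, 0,   # Sample1 (site_A)
            1, 0, 0, 1, 0,   # Sample2 (site_A)
            1, 1, 0, 1, 0,   # Sample3 (site_B)
            1, 1, 1, 1, 0)   # Sample4 (site_B)
)

# One row per OTU and site; value is 1 if any sample in the group is 1
presence <- aggregate(value ~ OTU + site, data = long, FUN = max)
print(presence)
```

With one row per tile, geom_tile() no longer has to draw several overlapping tiles at the same position, which is presumably why the original plot appeared to show only the last value.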

How to draw stacked barplot on the summed data

For data called df that reads:
car suv pickup
1 2 1
2 3 4
4 1 2
5 4 2
3 1 1
total = apply(df,1,sum)
barplot(total,col= rainbow(5))
What I have done so far is draw a barplot of the total number of vehicles, which is the sum of each row. What I want now is to present it as a stacked barplot of that sum.
At the moment it just shows "total", without any divisions indicating that, for example, 1 car, 2 suv and 1 pickup add up to a total of 4.
Note: this is different from barplot(matrix(df)), because that just divides the data by car, suv and pickup and disregards the total.
You can achieve this easily using ggplot2 and reshape2.
You will need an ID column to track the rows, so I have added that in. I melt the data to long type so that the different groups can be managed and plotted accordingly.
Then plot using geom_bar, specifying the row ids as the x axis and the groupings (fill and colour) for the stack plot and legend.
library(reshape2)
library(ggplot2)
df <- data.frame("ID" = c(1,2,3,4,5), "car" = c(1,2,4,5,3), "suv" = c(2,3,1,4,1), "pickup" = c(1, 4, 2, 2, 1))
# melt() is called directly because neither reshape2 nor ggplot2 provides %>%
long_df <- melt(df, id.vars = "ID", value.name = "Number", variable.name = "Type")
ggplot(data = long_df, aes(x = ID, y = Number)) +
  geom_bar(aes(fill = Type, colour = Type),
           stat = "identity",
           position = "stack")
With base R
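# barplot() stacks the rows of a numeric matrix within each column,
# so transpose: vehicle types become rows, the original rows become bars
counts <- t(as.matrix(df[, c("car", "suv", "pickup")]))
barplot(counts, names.arg = df$ID, legend.text = TRUE)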
Are you after something like this?
library(tidyverse)
df %>%
  rowid_to_column("row") %>%
  gather(k, v, -row) %>%
  ggplot(aes(row, v, fill = k)) +
  geom_col()
We use a stacked barplot here, so there is no need to manually calculate the sum. The key here is to transform data from wide to long and keep track of the row.
Sample data
df <- read.table(text =
"car suv pickup
1 2 1
2 3 4
4 1 2
5 4 2
3 1 1", header = TRUE)
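The point that the sum does not need to be calculated by hand can be checked directly. This is a small base R sketch, assuming the sample data above; the names long, stacked_height and total are chosen for the illustration. It reshapes the data to long format and confirms that the height of each stacked bar (the sum of its segments) equals the row total from the question.

```r
# Sample data from the question
df <- read.table(text =
"car suv pickup
1 2 1
2 3 4
4 1 2
5 4 2
3 1 1", header = TRUE)

# Long format: one row per (row id, vehicle type) pair.
# stack() returns the columns one after another, so the row ids repeat per column.
long <- data.frame(row = rep(seq_len(nrow(df)), times = ncol(df)), stack(df))

# Height of each stacked bar = sum of its segments
stacked_height <- as.vector(tapply(long$values, long$row, sum))

# Row totals as computed in the question
total <- unname(rowSums(df))

print(rbind(stacked_height, total))
```

Both rows of the printed matrix are identical, so the stacked plot carries the same totals as barplot(total) while also showing the breakdown.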
