How to stack marginal increase/decrease? Stacked bar plot - R

This question seems fairly simple, but I wasn't able to find another post that answers it (apologies if I've missed it though).
I have a factor variable with three levels (one value per month). The data look like this:
id variable value
AZ Feb-20 1085
AZ Mar-20 1
AZ Apr-20 61
CO Feb-20 6
CO Mar-20 192
FO Apr-20 2
I want to stack the data so that each bar shows February at the bottom, with the marginal increases for March and April stacked on top.
Right now, the raw values for each month are simply stacked on top of each other.
ggplot(df3, aes(x = id, y = value, fill=factor(variable, levels=c("Apr-20","Mar-20", "Feb-20")))) + geom_bar(stat = "identity")
How do I stack the increases from February? Is there a way to modify a stacked bar plot or do I need to try another method?
EDIT
After thinking on this, I believe the best solution is overlapping bars. But taller bars cover shorter bars, and changing the transparency isn't very useful with three levels. Maybe there is a way to reorder the bars so that the shorter ones are in front?
ggplot(df2) +
geom_bar(aes(x = id, y = `Feb-20`), position = "identity", stat = "identity", fill = 'green') +
geom_bar(aes(x = id, y = `Mar-20`), position = "identity", stat = "identity", fill = 'navy') +
geom_bar(aes(x = id, y = `Apr-20`), position = "identity", stat = "identity", fill = 'red')
EDIT 2
Apologies, this is my fault for being unclear. Previously, I wanted to avoid the cumulative summing that occurs with stacked bar plots, and asked that each additional month be added as a marginal increase.
Now, though, with overlapping bar plots, the bars don't need to display change. The problem with overlapping bar charts is just that shorter bars are hidden behind taller ones.
This plot still shows marginal change. I'm looking for something like plot 2 in this post, but where all of the data is visible.
EDIT 3
Maybe this is a better way to explain:
Take the example of 'WA' in the first plot. For Feb, the data point was 338, for March, the data point was 318, and for April, the data point was 2270. A stacked bar plot adds these on top of each other, cumulatively.
However, the bar that I want for 'WA' should show 338 for February, then a drop of 20 for March, and finally an addition of 1952 for April.
This is why I used the language of a marginal increase/decrease for a stacked bar plot. I also tried an overlapping bar plot, but not all of the data is visible, because taller bars cover shorter ones.
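The bar described in EDIT 3 needs the month-over-month change rather than the raw monthly value. A minimal base-R sketch of that arithmetic, assuming the data stay in the long id/variable/value layout shown at the top and that rows are already in chronological order within each id (the names `wa` and `change` are illustrative only):

```r
# WA values from EDIT 3, in the long id/variable/value layout
wa <- data.frame(
  id = "WA",
  variable = c("Feb-20", "Mar-20", "Apr-20"),
  value = c(338, 318, 2270)
)

# Marginal change within each id (rows assumed sorted Feb -> Apr):
# the first month keeps its own value, later months get the difference
# from the previous month.
wa$change <- ave(wa$value, wa$id, FUN = function(v) c(v[1], diff(v)))

print(wa)
# change column: 338, -20, 1952
```

Summing `change` cumulatively within each id gives back `value`. Be aware that ggplot2's `position_stack()` stacks negative values separately below zero, so feeding `change` straight into a stacked `geom_col()` would draw WA's -20 under the axis rather than as a dip inside the bar.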

We can sort the rows by converting the variable column to the yearmon class (from zoo), then convert it to a factor whose levels are its unique values in that sorted order (or do a match and convert):
library(dplyr)
library(zoo)
library(ggplot2)
df1 %>%
arrange(id, as.yearmon(variable, '%b-%y')) %>%
mutate(variable = factor(variable, levels = unique(variable))) %>%
ggplot(aes(x = id, y = value, fill = variable)) +
geom_bar(stat = "identity")
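The "match and convert" alternative mentioned above can be done in base R without zoo. A sketch, assuming every label has the form Mon-YY with an English three-letter month abbreviation (so it can be looked up in the built-in `month.abb`); the helper `month_key` is hypothetical:

```r
# Month labels as in the question, in arbitrary row order
variable <- c("Mar-20", "Feb-20", "Apr-20", "Feb-20", "Mar-20", "Apr-20")

# Hypothetical helper: turn "Mon-YY" into a sortable number (year * 12 + month)
month_key <- function(x) {
  mon <- match(substr(x, 1, 3), month.abb)  # 1..12, NA if unrecognised
  yr  <- as.integer(substr(x, 5, 6))        # two-digit year
  yr * 12 + mon
}

# Unique labels in chronological order become the factor levels
lvls <- unique(variable[order(month_key(variable))])
variable_f <- factor(variable, levels = lvls)

levels(variable_f)
# "Feb-20" "Mar-20" "Apr-20"
```

With ggplot2 2.2.0 or later, the first factor level is drawn at the top of each stack, so if February should sit at the bottom, use `factor(variable, levels = rev(lvls))` or `geom_col(position = position_stack(reverse = TRUE))`.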
To automate the second (overlapping-bar) plot:
library(tidyr)
df2 <- df1 %>%
pivot_wider(names_from = variable, values_from = value)
p <- ggplot(df2)
colrs <- c('green', 'navy', 'red')
nm1 <- names(df2)[-1]
# add one layer per month column, each with its own fill colour
for(i in seq_along(nm1)) p <- p +
geom_bar(aes(x = id, y = !! rlang::sym(nm1[i])),
position = 'identity', stat = 'identity', fill = colrs[i])
p
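The loop above draws each month as a full-width layer, so whichever month is tallest can hide the others (the problem raised in the first EDIT). One workaround, swapped in here and not part of the answer, is to make each later layer narrower: later layers are always drawn over earlier ones, so every month stays visible regardless of height. A sketch using the six rows from the question; the widths 0.9/0.6/0.3 are an arbitrary choice, and the plotting step is skipped if ggplot2 is not installed:

```r
# The rows shown in the question, in long format
df1 <- data.frame(
  id = c("AZ", "AZ", "AZ", "CO", "CO", "FO"),
  variable = c("Feb-20", "Mar-20", "Apr-20", "Feb-20", "Mar-20", "Apr-20"),
  value = c(1085, 1, 61, 6, 192, 2)
)

months <- c("Feb-20", "Mar-20", "Apr-20")  # drawing order: first layer is at the back
widths <- c(0.9, 0.6, 0.3)                 # arbitrary; each later layer is narrower
df1$variable <- factor(df1$variable, levels = months)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df1, aes(x = id, y = value, fill = variable))
  # One layer per month; the narrow April bars are drawn last, on top
  for (i in seq_along(months)) {
    p <- p + geom_col(data = df1[df1$variable == months[i], ],
                      width = widths[i], position = "identity")
  }
  print(p)
}
```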

Related

Make x-axis appear in a particular order in ggplot [duplicate]

This question already has answers here:
How do you specifically order ggplot2 x axis instead of alphabetical order? [duplicate]
(2 answers)
Order discrete x scale by frequency/value
(7 answers)
Closed 17 days ago.
I have a dataset:
data <- c('real','real','real','real','real','pred','pred','pred','pred','pred','real','real','real','real','pred','pred','pred','pred')
threshold <- c('>=1','>=2','>=3','>=4','>=101','>=1','>=2','>=3','>=4','>=101','>=1','>=2','>=3','>=4','>=1','>=2','>=3','>=4')
accuracy <- c(63.4,64.4,65.1,64.3,65.4,62.1,63.6,64.1,65.4,64.8,62.2,63.3,64.4,65.6,63.1,63.8,64.6,65.1)
types<-c('morning','morning','morning','morning','morning','morning','morning','morning','morning','morning','evening','evening','evening','evening','evening','evening','evening','evening')
df <- data.frame(data,threshold,accuracy,types)
I want to plot the 'data' column as a grouped (dodged) bar plot, separately for morning and evening, so I use facet_wrap. My code for plotting is:
ggplot(df, aes(x = threshold, y = accuracy)) + geom_bar(aes(fill = data), stat = "identity", color = "white",position = position_dodge(0.9))+
facet_wrap(~types) +
fill_palette("jco")
And the plot I get looks like:
However, as you can see the order of threshold got messed up. I want the order for morning to look like:
'>=1','>=2','>=3','>=4','>=101'
And the order for evening should be:
'>=1','>=2','>=3','>=4'
So I have three questions:
1. How can I enforce the order using my code?
2. For evening I shouldn't be getting '>=101', so how can I remove it from the plot?
3. Is there a way to make the background white but keep the grid?
And on a slightly unrelated note, can you point me to a graph type that might look better than this? I am new at visualisation so I am still learning.
Insights will be appreciated.
You can set the order of threshold by converting it to a factor with explicit levels; this reorders the x-axis.
Then add scales = 'free' in facet_wrap to drop >=101 from the evening panel,
and add theme_bw() to get a white background that keeps the grid.
library(dplyr)
library(ggplot2)
library(ggpubr)  # provides fill_palette()
df %>%
mutate(threshold = factor(threshold, levels = c('>=1','>=2','>=3','>=4','>=101'))) %>%
ggplot(aes(x = threshold, y = accuracy)) + geom_bar(aes(fill = data), stat = "identity", color = "white",position = position_dodge(0.9))+
facet_wrap(~types, scales = 'free') +
theme_bw() +
fill_palette("jco")
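Why the order "got messed up": threshold is a character column, and a discrete axis built from characters is sorted as strings, so '>=101' lands between '>=1' and '>=2'. A base-R sketch that derives the levels from the number after '>=' instead of typing them out, assuming every label is '>=' followed by a number:

```r
threshold <- c('>=1','>=2','>=3','>=4','>=101','>=1','>=2','>=3','>=4','>=101',
               '>=1','>=2','>=3','>=4','>=1','>=2','>=3','>=4')
types <- rep(c('morning', 'evening'), times = c(10, 8))

# Default string sorting puts '>=101' second
levels(factor(threshold))

# Order the labels by their numeric part instead
num <- as.numeric(sub(">=", "", threshold, fixed = TRUE))
threshold_f <- factor(threshold, levels = unique(threshold[order(num)]))
levels(threshold_f)

# droplevels() shows which levels remain for the evening panel alone
levels(droplevels(threshold_f[types == "evening"]))
```

Note that scales = 'free' frees the y-axis too, so the two panels can get different accuracy ranges; scales = 'free_x' should still drop the unused '>=101' level from the evening panel while keeping a shared y-axis for comparison.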

How Can I Plot Percentage Change for 3 Vectors in Same DataFrame?

I have three vectors and a list of crimes. Each crime represents a row. On each row, each vector identifies the percentage change in the number of incidents of each type from the prior year.
Below is the reproducible example. Unfortunately, the df takes the first value and repeats it down the columns (this is my first sorta reproducible example).
crime_vec = c('\tSTRONGARM - NO WEAPON', '$500 AND UNDER', 'ABUSE/NEGLECT: CARE FACILITY', 'AGG CRIM')
change15to16vec = as.double(825, -1.56, -66.67, -19.13)
change16to17vec = as.double(8.11, .96, 50, 4.84)
change17to18vec = as.double(-57.50, 1.29, 83.33, 28.72)
df = data.frame(crime_vec, change15to16vec, change16to17vec, change17to18vec)
df
I need a graph that will take the correct data frame, show the crimes down the y axis and ALL 3 percentage change vectors on the x-axis in a dodged bar. The examples I've seen plot only two vectors. I've tried plot(), geom_bar, geom_col, but can only get one column to graph (occasionally).
Any suggestions for a remedy would help.
Not sure if this is what you are looking for:
library(tidyr)
library(ggplot2)
df %>%
pivot_longer(-crime_vec) %>%
ggplot(aes(x = value, y = crime_vec, fill = as.factor(name))) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
xlab("Percentage Change") +
ylab("Crime") +
labs(fill = "Change from")
To use ggplot2 here, you first need to bring your data into long format; geom_bar with position = "dodge" then creates the desired plot.
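The repeated values in the question come from as.double(): it converts only its first argument and ignores the rest, so each vector has length 1 and data.frame() recycles it down the rows. Building the vectors with c() fixes that. As a base-R alternative to pivot_longer(), stack() gives the same long layout; a sketch (the column names crime, values and ind are what this code produces, not names from the answer):

```r
crime_vec <- c('\tSTRONGARM - NO WEAPON', '$500 AND UNDER',
               'ABUSE/NEGLECT: CARE FACILITY', 'AGG CRIM')

# as.double() keeps only its first argument, hence a length-1 vector
length(as.double(825, -1.56, -66.67, -19.13))

# c() keeps all four values
change15to16vec <- c(825, -1.56, -66.67, -19.13)
change16to17vec <- c(8.11, .96, 50, 4.84)
change17to18vec <- c(-57.50, 1.29, 83.33, 28.72)
df <- data.frame(crime_vec, change15to16vec, change16to17vec, change17to18vec)

# Long format: stack() concatenates the numeric columns in order,
# so the crime labels are repeated once per column
long <- data.frame(crime = rep(df$crime_vec, times = 3), stack(df[, -1]))
head(long)
```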

Grouped bar plot in ggplot with y values based on combination of 2 categorical variables?

I am trying to create a grouped bar plot in ggplot with 4 bars per x value. Here is a subset of my data (the actual data is about 4x longer):
Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent
What I want is to plot Frame as the x values and proportion_type as the y values, but with the bars based on both Verb_Type and speaker. So for each x value (Frame), there would be 4 bars grouped together - a bar each for the proportion_type value corresponding to mental~child, mental~parent, perception~child, perception~parent. I need for the fill color to be based on Verb_Type, and the fill "texture" (saturation or something) based on speaker. I do not want stacked bars, as it would not accurately represent the data.
I don't want to use facet grids because I find it visually difficult to compare all 4 bars when they're separated into 2 groups. I want to group all the bars together so that the visualization is easier. But I can't figure out how to make the appropriate groupings. Is this something I can do in ggplot, or do I need to manipulate the data before plotting? I tried using melt to reshape the data, but either I was doing it wrong, or that's not what I actually should be doing.
I think you are looking for interaction() (i.e., all unique pairings) between df$Verb_Type and df$speaker to get the column groupings you are after. You can pass this directly to ggplot or make a new variable ahead of time:
ggplot(df, aes(x = Frame, y = proportion_type,
group = interaction(Verb_Type, speaker), fill = Verb_Type, alpha = speaker)) +
geom_bar(stat = "identity", position = "dodge") +
scale_alpha_manual(values = c(.5, 1))
Or:
df$grouper <- interaction(df$Verb_Type, df$speaker)
ggplot(df, aes(x = Frame, y = proportion_type,
group = grouper, fill = Verb_Type, alpha = speaker)) +
geom_bar(stat = "identity", position = "dodge") +
scale_alpha_manual(values = c(.5, 1))
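interaction() combines the levels of its arguments into one grouping factor, and with position = "dodge" the bars within each Frame should follow the order of those group levels. A base-R sketch, reading the subset from the question, that shows which four groups are created and how lex.order changes their order:

```r
df <- read.csv(text = "Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent")

# Default: the first factor varies fastest
df$grouper <- interaction(df$Verb_Type, df$speaker)
levels(df$grouper)

# lex.order = TRUE varies the second factor fastest, which would place
# each verb type's Child and Parent bars next to each other
levels(interaction(df$Verb_Type, df$speaker, lex.order = TRUE))
```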

R ggplot how to overlay partial graph on the full graph

Is there a way to overlay a partial graph on top of a full graph using ggplot? I have one line graph with a time span of, say, 100 days on the x-axis, and I need to add a second line, in a different color, that only spans the last 20 days. I don't want to plot the second line as having zero values for the first 80 days; I need to plot it only for the last 20 days. What is the best way to do that?
Sure, just use two geoms with different subsets of your data.frame (for simplicity I use the full df and only one subset):
library(ggplot2)
df <- data.frame(Index = 1:1000, Value = cumsum(rnorm(1000)))
ggplot() + geom_line(data = df, aes(x = Index, y = Value)) +
geom_line(data = df[500:700,], aes(x = Index, y = Value), col="red")
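The answer reuses a slice of the same data frame. If the second line is a different series that only exists for the last 20 of 100 days, as in the question, it can live in its own data frame and be passed to its own geom_line(); no zero-padding is needed. A sketch with simulated values (the names full and recent and the day-81 cutoff are illustrative), with the plotting step skipped if ggplot2 is not installed:

```r
set.seed(1)  # simulated data only, for reproducibility
full <- data.frame(day = 1:100, value = cumsum(rnorm(100)))

# Second series: exists only for days 81-100 (simulated here)
recent <- data.frame(day = 81:100, value = tail(full$value, 20) + rnorm(20))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Each geom_line() gets its own data, so the red line starts at day 81
  p <- ggplot() +
    geom_line(data = full, aes(x = day, y = value)) +
    geom_line(data = recent, aes(x = day, y = value), colour = "red")
  print(p)
}
```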

ggplot reorder stacked bar plot based on values in data frame

I am making stacked bar plots with ggplot2 in R with specific bar ordering about the y-axis.
# create reproducible data
library(ggplot2)
d <- read.csv(text='Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1', header = TRUE)
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = Location), stat = "identity")
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = rev(Location)), stat = "identity")
The first ggplot plot shows the data in order of Location, with Location=1 nearest the x-axis and data for each increasing value of Location stacked upon the next.
The second ggplot plot shows the data in a different order, but not the one I expected based on an earlier post: for the first bar column, I expected the highest Location value nearest the x-axis, the next-highest Location second from the x-axis, and so on.
This next snippet does show the data in the desired way, but I think this is an artifact of the simple and small example data set. Stacking order hasn't been specified, so I think ggplot is stacking based on values for Amount.
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount), stat = "identity")
What I want is to force ggplot to stack the data in order of decreasing Location values (Location=4 nearest the x-axis, Location=3 next, ... , and Location=1 at the very top of the bar column) by calling the order = or some equivalent argument. Any thoughts or suggestions?
It seems like it should be easy because I am only dealing with numbers. It shouldn't be so hard to ask ggplot to stack the data in a way that corresponds to a column of decreasing (as you move away from the x-axis) numbers, should it?
Try:
ggplot(d, aes(x = Day, y = Length)) +
geom_bar(aes(fill = Amount, order = -Location), stat = "identity")
Notice how I swapped rev with -. Using rev does something very different: it stacks by the value for each row you happen to get if you reverse the order of values in the column Location, which could be just about anything.
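Note that the order aesthetic was deprecated in ggplot2 2.0.0 and current versions ignore it, so order = -Location no longer changes anything; the usual replacement is the group aesthetic. Since ggplot2 2.2.0, position_stack() stacks in reverse group order, so the first group ends up on top. A sketch, assuming ggplot2 2.2.0 or later, that maps group = Location to put Location 4 nearest the x-axis and Location 1 at the top, with position_stack(reverse = TRUE) to flip it:

```r
d <- read.csv(text = 'Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1', header = TRUE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # group = Location sets the stacking order: highest Location at the bottom
  p <- ggplot(d, aes(x = Day, y = Length)) +
    geom_col(aes(fill = Amount, group = Location))
  print(p)

  # Same bars with the stack reversed (Location 1 nearest the axis)
  p_rev <- ggplot(d, aes(x = Day, y = Length)) +
    geom_col(aes(fill = Amount, group = Location),
             position = position_stack(reverse = TRUE))
  print(p_rev)
}
```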
