Grouped bar plot column width uneven due to no data

I am trying to display a grouped bar plot for my dataset. However, because some months have no data (no income) for some states, the column widths come out unequal. I would like every column to have the same width, regardless of whether some states have no income. Notice how the bars are grouped for January: I would like the same grouping across all months, with the bars spaced out where some states do not have any income. Any help will be much appreciated, thanks.
library(ggplot2)
plot = ggplot(Checkouts, aes(fill=Checkouts$State, x=Checkouts$Month, y=Checkouts$Income)) +
geom_bar(colour = "black", stat = "identity")
My Bar Plot
Checkouts table/data

There are two ways that this can be done.
If you are using a recent version of ggplot2 (3.0.0 or later, I believe), position_dodge() has a parameter called preserve. Dodging keeps the vertical position of each bar and adjusts only the horizontal position; preserve = 'single' additionally keeps the width of a single bar fixed, instead of dividing the total width among whichever bars are present. Here is the code for it.
Code:
library(ggplot2)
plot = ggplot(Checkouts, aes(fill = State, x = Month, y = Income)) +
geom_bar(colour = "black", stat = "identity", position = position_dodge(preserve = 'single'))
Another way is to precompute and add dummy rows for each missing Month/State combination. Using table() to find the missing combinations is the simplest approach.
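The dummy-row approach could look roughly like the sketch below. The Checkouts values are made up for illustration (the original data was only posted as an image), and filling the missing incomes with 0 is an assumption about what an absent row means.

```r
# Made-up stand-in for the Checkouts data: Feb and Mar are missing some states
Checkouts <- data.frame(
  Month  = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  State  = c("NSW", "VIC", "QLD", "NSW", "NSW", "VIC"),
  Income = c(10, 20, 15, 12, 8, 30)
)

# table() shows which Month/State combinations have no rows (count 0)
combo_counts <- table(Checkouts$Month, Checkouts$State)

# Build every Month/State combination, then left-join the real data onto it
all_combos <- expand.grid(
  Month = unique(Checkouts$Month),
  State = unique(Checkouts$State),
  stringsAsFactors = FALSE
)
filled <- merge(all_combos, Checkouts, all.x = TRUE)

# Assumption: a missing combination means zero income
filled$Income[is.na(filled$Income)] <- 0
```

Passing filled instead of Checkouts to the original ggplot() call with position = "dodge" should then give every month the same number of bar slots.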

You are looking for position_dodge2(preserve = "single") (https://ggplot2.tidyverse.org/reference/position_dodge.html).
library(ggplot2)
plot = ggplot(Checkouts, aes(fill = State, x = Month, y = Income)) +
geom_bar(colour = "black", stat = "identity",
position = position_dodge2(preserve = "single"))
Also, you don't need to prefix the column names with the data frame and $ inside ggplot(). For example, Checkouts$State can be replaced with State.
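To check the effect without the original data, here is a small self-contained sketch. The Checkouts values are invented, and layer_data() is used only to inspect the bar geometry that ggplot2 computes.

```r
library(ggplot2)

# Made-up data: Feb has one state, Mar has two, Jan has all three
Checkouts <- data.frame(
  Month  = c("Jan", "Jan", "Jan", "Feb", "Mar", "Mar"),
  State  = c("NSW", "VIC", "QLD", "NSW", "NSW", "VIC"),
  Income = c(10, 20, 15, 12, 8, 30)
)

p <- ggplot(Checkouts, aes(x = Month, y = Income, fill = State)) +
  geom_col(colour = "black",
           position = position_dodge2(preserve = "single"))

# Inspect the computed rectangles: with preserve = "single",
# every bar should span the same width on the x axis
bars <- layer_data(p, 1)
bar_widths <- as.numeric(bars$xmax) - as.numeric(bars$xmin)
```

With the default preserve = "total", the single February bar would instead stretch to fill the whole slot.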

Related

ggplot bar graph by percentages

I am trying to make a bar graph showing ages of first alcohol use by county by percent. I am not quite sure where the mistake is and would appreciate another set of eyes.
Data is publicly available here: https://www.datafiles.samhsa.gov/dataset/national-survey-drug-use-and-health-2020-nsduh-2020-ds0001 although I have cleaned it on my computer.
The percentages are definitely not out of 100 and the numbers are not adjusting for population. They are the same as my chart showing raw numbers.
palc.age.ct<-data1.cleaned%>%
mutate(ALCTRY= na_if(x=ALCTRY, y="Never Used"))%>%
drop_na(ALCTRY)%>%
ggplot(aes(x=ALCTRY, fill=COUTYP4))+
geom_bar (position = "dodge") +
geom_bar(aes(y = (..count..)/sum(..count..)))+
scale_y_continuous(labels=scales::percent)+
theme_minimal()+
labs(title = "First Alcohol Use by Age and Locality",
x="Age Initiated", y="Number Initiated")+
scale_color_viridis(option = "D")

I'm not recreating everything you did, such as labelling the bins, but based on the data you can do something like the code below. Note that position = "dodge" must go in the same geom_bar() call that calculates the percentage. Otherwise the percentage is calculated in a different geom from the one that draws the grouped bars, which is the reason for your issue.
library(dplyr)
library(ggplot2)
NSDUH_2020 %>%
select(alctry, COUTYP4) %>%
mutate(alctry = if_else(alctry > 66, NA_integer_, alctry),
COUTYP4 = forcats::as_factor(COUTYP4)) %>%
filter(!is.na(alctry)) %>%
ggplot(aes(x = alctry, fill = COUTYP4)) +
# after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
geom_bar(aes(y = after_stat(count / sum(count))), position = "dodge") +
scale_y_continuous(labels = scales::label_percent(accuracy = .1)) +
scale_x_binned()

what are these gray lines inside the bars of my ggplot bargraph?

I wanted to create a graph to show which continent has the highest co2 emissions total per capita over the years,
so I coded this:
barchart <- ggplot(data = worldContinents,
aes(x = Continent,
y = `Per.Capita`,
colour = Continent)) +
geom_bar(stat = "identity")
barchart
This is my dataframe:
Geometry is just for some geographical mapping later.
I checked which(is.na(worldContinents$Per.Capita)) to see whether there were NA values but it returned nothing
What's causing these gray lines?
How do I get rid of them?
These are the gray lines inside the bar graph
Thank you

You have a couple of issues here. First of all, I'm guessing you want the fill to be mapped to the continent, not color, which only controls the color of the bars' outlines.
Secondly, there are multiple values for each continent in your data, so they are simply stacking on top of each other. This is the reason for the lines in your bars, and is probably not what you want. If you want the average value per capita in each continent, you either need to summarise your data beforehand or use stat_summary like this:
barchart <- ggplot(data = worldContinents,
aes(x = Continent,
y = `Per.Capita`,
fill = Continent)) +
stat_summary(geom = "col", fun = mean, width = 0.7,
col = "gray50") +
theme_light(base_size = 20) +
scale_fill_brewer(palette = "Spectral")
barchart
Data used
Obviously, we don't have your data, so I used a modified version of the gapminder data set to match your own data structure:
worldContinents <- gapminder::gapminder
worldContinents$Per.Capita <- worldContinents$gdpPercap
worldContinents$Continent <- worldContinents$continent
worldContinents <- worldContinents[worldContinents$year == 2007,]
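The other option mentioned above, summarising the data beforehand, might look like the following sketch. It uses a tiny made-up worldContinents data frame, and continentMeans is just an illustrative name.

```r
library(ggplot2)

# Made-up stand-in with several rows per continent
worldContinents <- data.frame(
  Continent  = c("Asia", "Asia", "Europe", "Europe", "Africa"),
  Per.Capita = c(2, 4, 6, 8, 1)
)

# One row per continent holding the mean per-capita value
continentMeans <- aggregate(Per.Capita ~ Continent,
                            data = worldContinents, FUN = mean)

# With one row per bar there is nothing left to stack,
# so no internal lines appear
barchart <- ggplot(continentMeans,
                   aes(x = Continent, y = Per.Capita, fill = Continent)) +
  geom_col()
```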

ggplot - retaining axis label coloring with reordered data

I'm making a horizontal bar chart where each observation has a numeric count variable associated with it. I want to show the bars for each variable ordered by (descending) count, which is no problem. However I also want to highlight the variable name based on a third dichotomous variable. I found how to do the latter in another post on here, but I have been unable to combine the two. Here's an example of what I mean:
library(ggplot2)
testdata<-data.frame("var"=c('V1','V2','V3','V4'),"cat"=c('Y','N','Y','N'),
"count"=c(1,5,2,10))
ggplot(testdata, aes(var,count))+
geom_bar(stat='identity',colour='blue',fill='blue',width=0.3)+
coord_flip(ylim=c(0,10))+
theme(axis.text.y=
element_text(colour=ifelse(testdata$cat=="N","darkgreen","darkred"),
size=15))
That's the horizontal bar chart with highlighting, which works fine - V1/V3 are red and V2/V4 are green.
However when I try to sort it doesn't keep the groups:
ggplot(testdata, aes(reorder(var,count),count))+
geom_bar(stat='identity',colour='blue',fill='blue',width=0.3)+
coord_flip(ylim=c(0,10))+theme_classic()+
theme(axis.ticks.y=element_blank())+
theme(axis.text.y=
element_text(colour=ifelse(testdata$cat=="N","darkgreen","darkred"),
size=15))
In this second graph, V2 and V3 are the wrong color.
I also tried sorting the data by count first, and then using the first ggplot statement, however it still plots the data by variable name instead of count (and even if it did work, I would have to resolve tied count values). Any ideas? What I really need is for the dataframe in the "ifelse" colour to match the dataframe in the aes statement. I tried using the data frame that was sorted by descending count in the colour statement, but that also did not work.
Thanks
edit: more code
testdata$var = with(testdata, reorder(var, count))
ggplot(testdata, aes(var,count))+
geom_bar(stat='identity',colour='blue',fill='blue',width=0.3)+
coord_flip(ylim=c(0,10))+theme_classic()+
theme(axis.ticks.y=element_blank())+
theme(axis.text.y=
element_text(colour=ifelse(testdata$cat=="N","darkgreen","darkred"),
size=15))

My comment was partially incorrect. The order of the levels is the only thing that matters for the order of the axis, but when we do ifelse(testdata$cat == "N", "darkgreen", "darkred") of course it goes in the order of the data! So we need the order of the levels and the order of the data to be the same:
testdata$var = with(testdata, reorder(var, count))
testdata = testdata[order(testdata$var), ]
ggplot(testdata, aes(var, count)) +
geom_bar(
stat = 'identity',
colour = 'blue',
fill = 'blue',
width = 0.3
) +
coord_flip(ylim = c(0, 10)) + theme_classic() +
theme(axis.ticks.y = element_blank()) +
theme(axis.text.y =
element_text(
colour = ifelse(testdata$cat == "N", "darkgreen", "darkred"),
size = 15
))
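An alternative sketch, not from the answer above: instead of re-sorting the data frame, the colour vector can be built in the order of the factor levels with match(). The name label_cols is illustrative. Note that recent ggplot2 versions may warn that vectorised input to element_text() is not officially supported.

```r
library(ggplot2)

testdata <- data.frame(var   = c("V1", "V2", "V3", "V4"),
                       cat   = c("Y", "N", "Y", "N"),
                       count = c(1, 5, 2, 10))

# Order the factor levels by count; the rows themselves stay where they are
testdata$var <- with(testdata, reorder(var, count))

# Look up cat for each level, in level order (the order the axis uses)
level_rows <- match(levels(testdata$var), testdata$var)
label_cols <- ifelse(testdata$cat[level_rows] == "N", "darkgreen", "darkred")

p <- ggplot(testdata, aes(var, count)) +
  geom_bar(stat = "identity", colour = "blue", fill = "blue", width = 0.3) +
  coord_flip(ylim = c(0, 10)) +
  theme_classic() +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_text(colour = label_cols, size = 15))
```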

Grouped bar plot in ggplot with y values based on combination of 2 categorical variables?

I am trying to create a grouped bar plot in ggplot, in which there should be 4 bars for each x value. Here is a subset of my data (the actual data is about 4x longer):
Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent
What I want is to plot Frame as the x values and proportion_type as the y values, but with the bars based on both Verb_Type and speaker. So for each x value (Frame), there would be 4 bars grouped together - a bar each for the proportion_type value corresponding to mental~child, mental~parent, perception~child, perception~parent. I need for the fill color to be based on Verb_Type, and the fill "texture" (saturation or something) based on speaker. I do not want stacked bars, as it would not accurately represent the data.
I don't want to use facet grids because I find it visually difficult to compare all 4 bars when they're separated into 2 groups. I want to group all the bars together so that the visualization is easier. But I can't figure out how to make the appropriate groupings. Is this something I can do in ggplot, or do I need to manipulate the data before plotting? I tried using melt to reshape the data, but either I was doing it wrong, or that's not what I actually should be doing.

I think you are looking for the interaction() (i.e. get all unique pairings) between df$Verb_Type and df$speaker to get the column groupings you are after. You can pass this directly to ggplot or make a new variable ahead of time:
ggplot(df, aes(x = Frame, y = proportion_type,
group = interaction(Verb_Type, speaker), fill = Verb_Type, alpha = speaker)) +
geom_bar(stat = "identity", position = "dodge") +
scale_alpha_manual(values = c(.5, 1))
Or:
df$grouper <- interaction(df$Verb_Type, df$speaker)
ggplot(df, aes(x = Frame, y = proportion_type,
group = grouper, fill = Verb_Type, alpha = speaker)) +
geom_bar(stat = "identity", position = "dodge") +
scale_alpha_manual(values = c(.5, 1))
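For a version that runs on its own, the subset of data from the question can be read in with read.csv(text = ...). This is only a sketch using those eight rows; the object names match the code above.

```r
library(ggplot2)

# The eight rows posted in the question
df <- read.csv(text = "Verb_Type,Frame,proportion_type,speaker
mental,V CP,0.209513024,Child
mental,V NP,0.138731597,Child
perception,V CP,0.017167382,Child
perception,V NP,0.387528402,Child
mental,V CP,0.437998087,Parent
mental,V NP,0.144086707,Parent
perception,V CP,0.042695836,Parent
perception,V NP,0.398376853,Parent")

# One level per Verb_Type/speaker pairing, so 4 bars per Frame
df$grouper <- interaction(df$Verb_Type, df$speaker)

p <- ggplot(df, aes(x = Frame, y = proportion_type,
                    group = grouper, fill = Verb_Type, alpha = speaker)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_alpha_manual(values = c(.5, 1))
```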

Overlay raw data onto geom_bar

I have a data-frame arranged as follows:
condition,treatment,value
A , one , 2
A , one , 1
A , two , 4
A , two , 2
...
D , two , 3
I have used ggplot2 to make a grouped bar plot that looks like this:
The bars are grouped by "condition" and the colours indicate "treatment." The bar heights are the mean of the values for each condition/treatment pair. I achieved this by creating a new data frame containing the mean and standard error (for the error bars) for all the points that will make up each group.
What I would like to do is superimpose the raw jittered data to produce a bar-chart version of this box plot: http://docs.ggplot2.org/0.9.3.1/geom_boxplot-6.png [I realise that a box plot would probably be better, but my hands are tied because the client is pathologically attached to bar charts]
I have tried adding a geom_point object to my plot and feeding it the raw data (rather than the aggregated means which were used to make the bars). This sort of works, but it plots the raw values at the wrong x axis locations. They appear at the points at which the red and grey bars join, rather than at the centres of the appropriate bar. So my plot looks like this:
I can not figure out how to shift the points by a fixed amount and then jitter them in order to get them centered over the correct bar. Anyone know? Is there, perhaps, a better way of achieving what I'm trying to do?
What follows is a minimal example that shows the problem I have:
#Make some fake data
ex=data.frame(cond=rep(c('a','b','c','d'),each=8),
treat=rep(rep(c('one','two'),4),each=4),
value=rnorm(32) + rep(c(3,1,4,2),each=4) )
#Calculate the mean and SD of each condition/treatment pair
agg=aggregate(value~cond*treat, data=ex, FUN="mean") #mean
agg$sd=aggregate(value~cond*treat, data=ex, FUN="sd")$value #add the SD
dodge <- position_dodge(width=0.9)
limits <- aes(ymax=value+sd, ymin=value-sd) #Set up the error bars
p <- ggplot(agg, aes(fill=treat, y=value, x=cond))
#Plot, attempting to overlay the raw data
print(
p + geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25) +
geom_point(data= ex[ex$treat=='one',], colour="green", size=3) +
geom_point(data= ex[ex$treat=='two',], colour="pink", size=3)
)

I found it is unnecessary to create separate dataframes. The plot can be created by providing ggplot with the raw data.
ex <- data.frame(cond=rep(c('a','b','c','d'),each=8),
treat=rep(rep(c('one','two'),4),each=4),
value=rnorm(32) + rep(c(3,1,4,2),each=4) )
p <- ggplot(ex, aes(cond,value,fill = treat))
# fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
p + geom_bar(position = 'dodge', stat = 'summary', fun = 'mean') +
geom_errorbar(stat = 'summary', position = 'dodge', width = 0.9) +
geom_point(aes(x = cond), shape = 21, position = position_dodge(width = 0.9))

You need just one call to geom_point(), where you use data frame ex and set the x values to cond, the y values to value, and color = treat (inside aes()). Then add position = dodge to ensure that the points are dodged. With scale_color_manual() and its values argument you can set the colors you need.
p+geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25)+
geom_point(data=ex,aes(cond,value,color=treat),position=dodge)+
scale_color_manual(values=c("green","pink"))
UPDATE - jittering of points
You can't directly use the dodge and jitter positions together, but there are some workarounds. If you save the whole plot as an object, ggplot_build() shows the x positions of the bars: in this case they are 0.775, 1.225, 1.775... Those positions correspond to the combinations of the factors cond and treat. Since data frame ex has 4 values for each combination, add a new column that contains those x positions, each repeated 4 times.
ex$xcord<-rep(c(0.775,1.225,1.775,2.225,2.775,3.225,3.775,4.225),each=4)
Now in geom_point() use this new column as x values and set position to jitter.
p+geom_bar(position=dodge, stat="identity") +
geom_errorbar(limits, position=dodge, width=0.25)+
geom_point(data=ex,aes(xcord,value,color=treat),position=position_jitter(width =.15))+
scale_color_manual(values=c("green","pink"))
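The bar positions do not have to be typed in by hand. As a sketch, they can be read from the built plot with layer_data(), which is a thin wrapper around ggplot_build(). The data is regenerated from the question with a fixed seed, and bar_x is an illustrative name.

```r
library(ggplot2)

set.seed(1)  # only so the fake data is reproducible
ex <- data.frame(cond  = rep(c("a", "b", "c", "d"), each = 8),
                 treat = rep(rep(c("one", "two"), 4), each = 4),
                 value = rnorm(32) + rep(c(3, 1, 4, 2), each = 4))
agg <- aggregate(value ~ cond * treat, data = ex, FUN = mean)

p <- ggplot(agg, aes(x = cond, y = value, fill = treat)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9))

# x centres of the dodged bars in the first layer, sorted left to right
bar_x <- sort(as.numeric(layer_data(p, 1)$x))
```

Because ex is ordered by cond and then treat, rep(bar_x, each = 4) should give the same xcord column as the hard-coded vector above.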

As illustrated by holmrenser above, referencing a single dataframe and updating the stat instruction to "summary" in the geom_bar function is more efficient than creating additional dataframes and retaining the stat instruction as "identity" in the code.
To both jitter and dodge the data points with the bar charts per the OP's original question, this can also be accomplished by updating the position instruction in the code with position_jitterdodge. This positioning scheme allows widths for jitter and dodge terms to be customized independently, as follows:
p <- ggplot(ex, aes(cond,value,fill = treat))
p + geom_bar(position = 'dodge', stat = 'summary', fun = 'mean') +
geom_errorbar(stat = 'summary', position = 'dodge', width = 0.9) +
geom_point(aes(x = cond), shape = 21, position =
position_jitterdodge(jitter.width = 0.5, jitter.height=0.4,
dodge.width=0.9))
