geom_bar with R (Beginner)

Good morning all,
I am working with data that I would like to show as a bar graph, split in two according to my two departments. I generated a data frame that looks like this:
test <- data.frame(
  type_transport = sample(c("ON FOOT", "CAR", "PUBLIC TRANSPORT"), 5000, replace = TRUE),
  type_route = sample(c("N", "D", "A", "VC"), 5000, replace = TRUE),
  department = sample(c("department1", "department2"), 5000, replace = TRUE),
  troncon_km = sample(0:17, 5000, replace = TRUE)
)
Running this code gives me a (stacked) bar graph:
library(ggplot2)
ggplot(test, aes(x = type_route, y = troncon_km, fill = department)) + geom_bar(stat = "identity")
Now I would like to split each bar in two, so that the data for my two departments is shown side by side. For this, I use position = "dodge":
ggplot(test, aes(x = type_route, y = troncon_km, fill = department)) + geom_bar(stat = "identity", position = "dodge")
But there is a problem: the y scale is far too small compared to reality (it goes from several thousand on the first graph to 15 on the second). I have obviously missed something...
I do not understand.
Thank you.

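The reason all bars have the same height is that geom_bar(stat = "identity") draws one bar per observation, and the height of each bar equals the value of that observation. In your first plot these bars are stacked, so each total equals the sum of all observations. With position = "dodge", the bars of one department are drawn on top of each other at the same position, so only the tallest one is visible. Since every road type in both departments has at least one observation of 17, all bars show that value.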
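The explanation above can be checked without plotting. This base-R sketch recreates data shaped like the question's (the fixed seed is an assumption, only for reproducibility) and computes the sum per road type, which is what the stacked bars show, and the maximum per road type and department, which is what remains visible when the dodged bars overlap.

```r
# Data shaped like the question's; set.seed() is an assumption for reproducibility
set.seed(1)
test <- data.frame(
  type_route = sample(c("N", "D", "A", "VC"), 5000, replace = TRUE),
  department = sample(c("department1", "department2"), 5000, replace = TRUE),
  troncon_km = sample(0:17, 5000, replace = TRUE)
)

# Stacked bars: every row is piled up, so the bar height is the sum per road type
stacked_height <- tapply(test$troncon_km, test$type_route, sum)

# Dodged bars: rows of the same department overlap, so only the maximum is visible
dodged_height <- tapply(test$troncon_km,
                        list(test$type_route, test$department), max)

print(stacked_height)  # several thousand per road type
print(dodged_height)   # 17 in every cell
```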
There are several ways to move forward:
1.
library(ggplot2)
ggplot(test, aes(type_route, troncon_km, fill = department)) +
  stat_summary(geom = "bar", position = "dodge", fun = sum)
The fun argument (called fun.y in ggplot2 versions before 3.3.0) can be any other summary function, e.g. mean or median.
2.
library("tidyverse")
total_km <- test %>%
group_by(department, type_route) %>%
summarise(total_km = sum(troncon_km))
ggplot(total_km, aes(type_route, total_km, fill = department)) +
geom_bar(stat = "identity", position = "dodge")
Again, you can replace sum() inside summarise() with any other summary function.
Using the same total_km data frame, the plot is a little shorter with geom_col(), which uses stat = "identity" by default:
ggplot(total_km, aes(type_route, total_km, fill = department)) +
geom_col(position = "dodge")
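For reference, the total_km table can also be built without the tidyverse. This is a minimal base-R sketch using aggregate(); it regenerates only the columns needed here, with a fixed seed as an assumption for reproducibility. The result can be passed to the same ggplot() + geom_col(position = "dodge") call.

```r
# Data shaped like the question's; set.seed() is an assumption for reproducibility
set.seed(1)
test <- data.frame(
  type_route = sample(c("N", "D", "A", "VC"), 5000, replace = TRUE),
  department = sample(c("department1", "department2"), 5000, replace = TRUE),
  troncon_km = sample(0:17, 5000, replace = TRUE)
)

# Base-R counterpart of group_by(department, type_route) %>% summarise(sum(troncon_km))
total_km <- aggregate(troncon_km ~ department + type_route, data = test, FUN = sum)

# Rename the summed column so it matches the tidyverse version
names(total_km)[names(total_km) == "troncon_km"] <- "total_km"

print(total_km)
```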
Hope this helps.

Related

Stack bars by a numeric ordering variable in ggplot2

I am trying to create a swimlane plot of different subjects' doses over time. When I run my code, the bars are stacked by dose amount. My issue is that a subject's doses vary: they could have 5, then 10, then 5 again, but in my plot the two 5s are stacked together. I want the doses represented in the order in which they happened. My dataset contains the amount of time each patient was on each dose, ordered by when they received it. I want the bars stacked by an ordering variable called "p", which is numeric (1, 2, 3, 4, 5, 6, etc.) and gives the visit at which the subject had that dose.
ggplot(dataset,aes(x=diff+1, y=subject)) +
geom_bar(stat="identity", aes(fill=as.factor(EXDOSE))) +
scale_fill_manual(values = dosecol, name="Actual Dose in mg")
I want the bars stacked by my variable "p", not by fill.
I tried forcats, but that does not work, and I am unsure how to go about this. The data in the dataset is already arranged by p for each subject.
Example data:
dataset <- data.frame(subject = c("1002", "1002", "1002", "1002", "1034","1034","1034","1034"),
exdose = c(5,10,20,5,5,10,20,20),
p= c(1,2,3,4,1,2,3,4),
diff = c(3,3,9,7,3,3,4,5)
)
ggplot(dataset,aes(x=diff+1, y=subject)) +
geom_bar(stat="identity", aes(fill=as.factor(exdose)),position ="stack") +
scale_fill_manual(values = dosecol, name="Actual Dose in mg")
If you want to order your stacked bar chart by p, you have to tell ggplot2 to do so by mapping p to the group aesthetic. Otherwise ggplot2 makes a guess, which by default is based on the categorical variables mapped to any aesthetic, i.e. in your case the fill aesthetic:
Note: I dropped scale_fill_manual() because you did not provide the dosecol vector of colours, but that is not important for this issue.
library(ggplot2)
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)))
EDIT: To get the right order, we also have to reverse the stacking order, which can be achieved with position_stack(reverse = TRUE):
Note: To check that we have the right order I added a geom_text showing the p value.
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)), position = position_stack(reverse = TRUE)) +
geom_text(aes(label = p), position = position_stack(reverse = TRUE))
A second option is to convert p to a factor with its levels in reverse order:
ggplot(dataset, aes(x = diff + 1, y = subject, group = factor(p, rev(sort(unique(p)))))) +
geom_col(aes(fill = as.factor(exdose))) +
geom_text(aes(label = p), position = "stack")
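To see what stacking by p amounts to, here is a base-R sketch that lays each subject's segments end to end in increasing p, the order the answer obtains with group = p and position_stack(reverse = TRUE). It uses the example data; the columns len, xmin and xmax are illustrative names, not ggplot2 output.

```r
dataset <- data.frame(
  subject = c("1002", "1002", "1002", "1002", "1034", "1034", "1034", "1034"),
  exdose  = c(5, 10, 20, 5, 5, 10, 20, 20),
  p       = c(1, 2, 3, 4, 1, 2, 3, 4),
  diff    = c(3, 3, 9, 7, 3, 3, 4, 5)
)

# Sort by subject, then by visit order p
dataset <- dataset[order(dataset$subject, dataset$p), ]

# Segment length is diff + 1, as in the plotting code
dataset$len <- dataset$diff + 1

# End of each segment = running total within the subject; start = end - length
dataset$xmax <- ave(dataset$len, dataset$subject, FUN = cumsum)
dataset$xmin <- dataset$xmax - dataset$len

print(dataset[, c("subject", "p", "exdose", "xmin", "xmax")])
```

The two dose-5 segments of subject 1002 (p = 1 and p = 4) end up at opposite ends of the bar instead of next to each other, which is the behaviour the question asks for.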

ggplot2 and jitter/dodge points by a group

I have 'elevation' as my y-axis and I want it as a discrete variable (in other words, I want the space between elevations to be equal and not relative to the numerical differences). My x-axis is 'time' (Julian date).
mydata2<- data.frame(
"Elevation" = c(rep(c(1200),10),rep(c(1325.5),10),rep(c(1350.75),10), rep(c(1550.66),10)),
"Sex" = c(rep(c("F","M"),20)),
"Type" = c(rep(c("emerge","emerge","endhet","endhet","immerge","immerge","melt","melt", "storpor","storpor"),4)),
"mean" = c(rep(c(104,100,102,80,185,210,84,84,188,208,104,87,101,82, 183,188,83,83,190,189),2))
"se"=c(rep(c(.1,.01,.2,.02,.03),4)))
mydata2$Sex<-factor(mydata2$Sex))
mydata2$Type<-factor(mydata2$Type))
mydata2$Elevation<-factor(mydata2$Elevation))
at<-ggplot(mydata2, aes(y = mean, x = Elevation,color=Type, group=Sex)) +
geom_pointrange(aes(ymin = mean-se, ymax = mean+se),
position=position_jitter(width=0.2,height=.1),
linetype='solid') +
facet_grid(Sex~season,scales = "free")+
coord_flip()
at
Ideally, I would like each 'type' to be separated vertically. When I jitter or dodge, only the points that are close together get separated, and not evenly. Is there a way to force each 'type' to be shifted slightly so that each one is on its own line? I tried to force it by giving each type a slightly different 'elevation', but then I end up with a messy y-axis (I can't figure out how to keep the points while not displaying all the tick marks with a discrete scale).
Thank you for your help.
If you want to use a numerical value as a discrete value, you should use as.factor(). In your example, try x = as.factor(Elevation).
I would also suggest using position = position_dodge() so that points from different conditions at the same elevation are plotted side by side:
ggplot(mydata2, aes(y = mean, x = as.factor(Elevation),color=Type, group=Sex)) +
geom_pointrange(aes(ymin = mean-se, ymax = mean+se),
position=position_dodge(),
linetype='solid') +
facet_grid(Sex~season,scales = "free")+
coord_flip()
EDIT: with the example data provided by the OP
With your dataset, I could not see the ranges around the points (the se values are tiny compared with the means), so I created two variables, Lower and Upper, with the dplyr package, scaling se by 100 so that the ranges are visible.
Then, instead of the factor(...) commands from your question, I used as.factor(Elevation) and position_dodge(0.9) for the plot:
library(tidyverse)
mydata2 %>% mutate(Lower = mean-se*100, Upper = mean+se*100) %>%
ggplot(., aes( x = as.factor(Elevation), y = mean, color = Type))+
geom_pointrange(aes(ymin = Lower, ymax = Upper), linetype = "solid", position = position_dodge(0.9))+
facet_grid(Sex~., scales = "free")+
coord_flip()
Does this look like what you are looking for?
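Why this puts each Type on its own line: on a discrete axis, the factor levels sit at positions 1, 2, 3, ... regardless of the numeric gaps between elevations, and dodging then shifts each group by a fixed, evenly spaced offset. The base-R sketch below computes those positions; dodge_offsets() is a hypothetical helper, and the offset rule width * ((i - 0.5) / n - 0.5) is my reading of how ggplot2 spreads n groups over the dodge width, stated here as an assumption.

```r
# The four distinct elevations from the question's data
elevation <- c(1200, 1325.5, 1350.75, 1550.66)

# A factor axis places its levels at 1, 2, 3, ... whatever the numeric gaps
base_pos <- as.integer(factor(elevation))

# Hypothetical helper (assumed rule): centre offset of the i-th of n dodged
# groups when the groups share a total width
dodge_offsets <- function(n, width) {
  i <- seq_len(n)
  width * ((i - 0.5) / n - 0.5)
}

# Five Type levels (emerge, endhet, immerge, melt, storpor) with position_dodge(0.9)
offsets <- dodge_offsets(5, 0.9)

# Every elevation gets the same five evenly spaced slots, one per Type
positions <- outer(base_pos, offsets, `+`)
print(round(positions, 2))
```

Because every elevation has all five Types within each facet, each Type always lands in the same slot, which after coord_flip() reads as its own horizontal line.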
Data
The dataset you provided contains a few errors (extra parentheses and a missing comma), so here is a corrected version:
mydata2<- data.frame(
"Elevation" = c(rep(c(1200),10),rep(c(1325.5),10),rep(c(1350.75),10), rep(c(1550.66),10)),
"Sex" = rep(c("F","M"),20),
"Type" = rep(c("emerge","emerge","endhet","endhet","immerge","immerge","melt","melt", "storpor","storpor"),4),
"mean" = rep(c(104,100,102,80,185,210,84,84,188,208,104,87,101,82, 183,188,83,83,190,189),2),
"se"=rep(c(.1,.1,.2,.05,.03),4))

position_dodge() does not seem to work with stat_summary() and x variables with different fill groups

Using stat_summary(geom = "bar") + stat_summary(geom = "errorbar") does not seem to work with position_dodge() when the x values have varying numbers of condition groups.
I am trying to make what should be a straightforward bar plot with ggplot2. My data has a number of different samples (the x variable); some of these samples also have a fill (condition) variable ("Scr" or "shRNA"), while others don't (condition = NA). When I plot these data with the stat_summary() wrappers to make bar plots with error bars, dodging the error bars only works for samples that do not have different fill groups. The stat_summary(geom = "bar") layer seems to work, because the separate bars do show up, but their error bars are not aligned.
library(ggplot2)
test <- data.frame(Sample = c(rep("A",6),rep("B",3)),
Target = c(rep("GENE1",9)),
val = c(1.1,1.2,1.15,.5,.6,.7,.95,1,1.05),
condition = c(rep("Scr",3),rep("shRNA",3),rep(NA,3)))
g <- ggplot(data=test,aes(x=Sample,y=val,fill=condition)) +
stat_summary(geom = "bar", fun = mean,position = position_dodge2(width=.5,preserve = "single"),color="black",width=.8) +
stat_summary(geom = "errorbar", fun.data = mean_se, position = position_dodge2(width=.2,preserve = "single"),width=.2) +
scale_y_continuous(expand = expansion(mult = c(0,.2)))
#scale_fill_discrete(guide=guide_legend(title="",nrow=2))
g
I expected the position_dodge2() settings in both stat_summary() calls to align the error bars with the correct x position, regardless of whether a particular sample has one or two fill groups.
I'm a bit confused about what you're trying to do. Why not use geom_col/geom_bar instead of stat_summary? I always prefer keeping data manipulation/summarisation and plotting separate.
This is what I'd do:
library(tidyverse)
test %>%
group_by(Sample, condition) %>%
summarise(val.mean = mean(val), val.sd = sd(val)) %>%
ggplot(aes(Sample, val.mean, fill = condition)) +
geom_col(position = position_dodge(width = 0.8)) +
geom_errorbar(
aes(ymin = val.mean - val.sd, ymax = val.mean + val.sd),
position = position_dodge(width = 0.8),
width = 0.2)
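If you would rather build this summary table without dplyr, here is a base-R sketch with aggregate(). One catch: aggregate() drops rows whose grouping value is NA, so sample B's missing condition is first given an explicit label ("none" is a hypothetical placeholder). The resulting summary_tbl plays the same role as the piped summary above, with columns val.mean and val.sd.

```r
test <- data.frame(Sample = c(rep("A", 6), rep("B", 3)),
                   Target = rep("GENE1", 9),
                   val = c(1.1, 1.2, 1.15, .5, .6, .7, .95, 1, 1.05),
                   condition = c(rep("Scr", 3), rep("shRNA", 3), rep(NA, 3)),
                   stringsAsFactors = FALSE)

# aggregate() omits rows with NA in a grouping variable, so label them explicitly
test$condition[is.na(test$condition)] <- "none"

# Mean and standard deviation of val per Sample/condition
means <- aggregate(val ~ Sample + condition, data = test, FUN = mean)
sds   <- aggregate(val ~ Sample + condition, data = test, FUN = sd)

# Combine them; suffixes turn the two "val" columns into val.mean and val.sd
summary_tbl <- merge(means, sds, by = c("Sample", "condition"),
                     suffixes = c(".mean", ".sd"))
print(summary_tbl)
```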

ggplot2 multiple time-series plots

I'm just learning ggplot, so my apologies if this is a really basic question. I have data that has been aggregated by year, with a few different qualities to slice on (the code below generates sample data). I'm trying to show a few different charts: one that shows the overall total for a given metric, then a couple that show the same metric split across the qualities, but it's not coming out right. Ideally, I want to make the plot once, then add the geom layer for each of the individual charts. The code also includes examples of how I want the charts to look.
I'm starting to think this is a data structure issue, but really can't figure it out.
Secondary question: my years are formatted as integers. Is that the best approach here, or should I convert them to dates?
library(data.table)
library(ggplot2)
#Generate Sample Data - Yearly summarized data
BaseData <- data.table(expand.grid(dataYear = rep(2010:2017),
Program = c("A","B","C"),
Indicator = c("0","1")))
set.seed(123)
BaseData$Metric1 <- runif(nrow(BaseData),min = 10000,100000)
BaseData$Metric2 <- runif(nrow(BaseData),min = 10000,100000)
BaseData$Metric3 <- runif(nrow(BaseData),min = 10000,100000)
BP <- ggplot(BaseData, aes(dataYear,Metric1))
BP + geom_area() #overall Aggregate
BP + geom_area(position = "stack", aes(fill = Program)) #Stacked by Program
BP + geom_area(position = "stack", aes(fill = Indicator)) #stacked by Indicator
#How I want them to look
##overall Aggregate
BP.Agg <- BaseData[,.(Metric1 = sum(Metric1)),
by = dataYear]
ggplot(BP.Agg,aes(dataYear, Metric1))+geom_area()
##Stacked by Program
BP.Pro <- BaseData[,.(Metric1 = sum(Metric1)),
by = .(dataYear,
Program)]
ggplot(BP.Pro,aes(dataYear, Metric1, fill = Program))+geom_area(position = "stack")
##stacked by Indicator
BP.Ind <- BaseData[,.(Metric1 = sum(Metric1)),
by = .(dataYear,
Indicator)]
ggplot(BP.Ind,aes(dataYear, Metric1, fill = Indicator))+geom_area(position = "stack")
I was right, it was an easy fix: I should have used stat_summary() instead of geom_area(). Here are the correct layers to add (the fun argument was called fun.y before ggplot2 3.3.0):
BP + stat_summary(fun = sum, geom = "area")
BP + stat_summary(fun = sum, geom = "area", position = "stack", aes(fill = Program, group = Program))
BP + stat_summary(fun = sum, geom = "area", position = "stack", aes(fill = Indicator, group = Indicator))
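A quick sanity check of why this works: stat_summary(fun = sum) sums Metric1 over all rows that share the same dataYear (and group), which is what the data.table aggregations in the question compute. This base-R sketch builds the sample data with expand.grid() alone (dropping data.table is an assumption for brevity) and confirms that the per-Program yearly sums, stacked on top of each other, add up to the overall yearly total.

```r
set.seed(123)
BaseData <- expand.grid(dataYear = 2010:2017,
                        Program = c("A", "B", "C"),
                        Indicator = c("0", "1"))
BaseData$Metric1 <- runif(nrow(BaseData), min = 10000, max = 100000)

# Overall area: one sum per year
overall <- tapply(BaseData$Metric1, BaseData$dataYear, sum)

# Stacked-by-Program areas: one sum per (Program, year) cell
by_program <- tapply(BaseData$Metric1,
                     list(BaseData$Program, BaseData$dataYear), sum)

# Summing the stacked layers per year gives back the overall total
print(all.equal(unname(colSums(by_program)), as.vector(overall)))
```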

Changing colour schemes between facets

I have a data.frame, something like the following:
set.seed(100)
df <- data.frame(year = rep(2011:2014, 3),
class = rep(c("high", "middle", "low"), each = 4),
age_group = rep(1:3, each = 4),
value = sample(1:2, 12, rep = TRUE))
and I would like to produce, by faceting on the variable age_group, three plots that look similar to the one produced by the following code:
library(ggplot2)
blue <- c("#bdc9e1", "#74a9cf", "#0570b0")
ggplot(df) + geom_bar(aes(x = year, y = value,
fill = factor(class, levels = c("high", "middle", "low"))),
stat = "identity") +
scale_fill_manual(values = c(blue)) +
guides(fill = "none")
However, each facet should have a different colour scheme, with all the colours specified by me.
What I want seems to be a more specific version of this question: ggplot2: Change color for each facet in bar chart
So, using the data I have provided, I would like to get three faceted plots, split by age_group, where the fill in each plot is given by the level of class, and all colours (9 in total) are specified manually by me.
Edit: For clarification, the facet that I would like to end up with is indeed provided by the following code:
ggplot(df) + geom_bar(aes(x = year, y = value,
fill = factor(class, levels = c("high", "middle", "low"))),
stat = "identity") +
scale_fill_manual(values = c(blue)) +
guides(fill = "none") +
facet_wrap(~ age_group)
but with the added control of setting the colours by the class variable separately within each facet.
I'm not entirely sure why you want to do this, so it is a little hard to know whether or not what I came up with addresses your actual use case.
First, I generated a different data set that actually has each class in each age_group:
set.seed(100)
df <- data.frame(year = rep(2011:2014, 3),
class = rep(c("high", "middle", "low"), each = 12),
age_group = rep(1:3, each = 4),
value = sample(1:2, 36, rep = TRUE))
If you are looking for a similar dark-to-light gradient within each age_group, you can get it directly with alpha, without adding extra data columns:
ggplot(df) +
geom_bar(aes(x = year, y = value,
fill = factor(age_group)
, alpha = class ),
stat = "identity") +
facet_wrap(~age_group) +
scale_alpha_discrete(range = c(0.4,1)) +
scale_fill_brewer(palette = "Set1"
, name = "age_group")
Here, I set the range of alpha to give reasonably visible colours, and just chose a default palette from RColorBrewer to show the idea.
It also gives a relatively usable legend as a starting point, though you could modify it further (here is a similar legend answer I gave to a different question: https://stackoverflow.com/a/39046977/2966222 )
Alternatively, if you really, really want to specify the colors yourself, you can add a column to the data, and base the color off of that:
df$forColor <-
factor(paste0(df$class, " (", df$age_group , ")")
, levels = paste0(rep(c("high", "middle", "low"), times = 3)
, " ("
, rep(1:3, each = 3)
, ")") )
Then use that as your fill. Note that I am using brewer.pal() from RColorBrewer to pick the colours. I find that the first colour of each palette is too light to show up well for bars like this, so I excluded it.
library(RColorBrewer)
ggplot(df) +
geom_bar(aes(x = year, y = value,
fill = forColor),
stat = "identity") +
scale_fill_manual(values = c(brewer.pal(4, "Blues")[-1]
, brewer.pal(4, "Reds")[-1]
, brewer.pal(4, "Purples")[-1]
)
, name = "Class (age_group)") +
facet_wrap(~age_group)
The legend is rather busy, but it could be modified similarly to the other answer I linked to. This approach lets you set whichever 9 colours (or more, for other use cases) you want.
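If you want to pick all nine colours by hand rather than from RColorBrewer, one option is a named vector: when the values passed to scale_fill_manual() are named, ggplot2 matches them to the fill levels by name, so their order no longer matters. The sketch below builds such a vector with names in the same "class (age_group)" format as forColor. The first palette is the asker's blue; the other two are arbitrary placeholder colours.

```r
# Hypothetical palettes, one per age_group: the first is the question's `blue`,
# the other two are placeholders to replace with your own choices
palettes <- list(
  "1" = c("#bdc9e1", "#74a9cf", "#0570b0"),
  "2" = c("#fcbba1", "#fb6a4a", "#cb181d"),
  "3" = c("#c7e9c0", "#74c476", "#238b45")
)
classes <- c("high", "middle", "low")

# Name each colour "class (age_group)" so it matches the forColor levels
fill_values <- unlist(lapply(names(palettes), function(g) {
  setNames(palettes[[g]], paste0(classes, " (", g, ")"))
}))
print(fill_values)
```

The vector would then replace the brewer.pal() calls, e.g. scale_fill_manual(values = fill_values, name = "Class (age_group)").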
