stack bars by an ordering variable which is numeric ggplot - r

I am trying to create a swimlane plot of different subjects doses over time. When I run my code the bars are stacked by amount of dose. My issue is that subjects doses vary they could have 5, 10 , 5 in my plot the 5's are stacked together. But I want the represented as they happen over time. In my data set I have the amount of time each patient was on a dose for ordered by when they had the dose. I want by bars stacked by ordering variable called "p" which is numeric is goes 1,2,3,4,5,6 etc which what visit the subject had that dose.
ggplot(dataset,aes(x=diff+1, y=subject)) +
geom_bar(stat="identity", aes(fill=as.factor(EXDOSE))) +
scale_fill_manual(values = dosecol, name="Actual Dose in mg")
I want the bars stacked by my variable "p" not by fill
I tried forcats but that does not work. Unsure how to go about this the data in the dataset is arranged by p for each subject
example data
dataset <- data.frame(subject = c("1002", "1002", "1002", "1002", "1034","1034","1034","1034"),
exdose = c(5,10,20,5,5,10,20,20),
p= c(1,2,3,4,1,2,3,4),
diff = c(3,3,9,7,3,3,4,5)
)
ggplot(dataset,aes(x=diff+1, y=subject)) +
geom_bar(stat="identity", aes(fill=as.factor(exdose)),position ="stack") +
scale_fill_manual(values = dosecol, name="Actual Dose in mg")

If you want to order your stacked bar chart by p you have to tell ggplot2 to do so by mapping p on the group aesthetic. Otherwise ggplot2 will make a guess which by default is based on the categorical variables mapped on any aesthetic, i.e. in your case the fill aes:
Note: I dropped the scale_fill_manual as you did not provide the vector of colors. But that's not important for the issue.
library(ggplot2)
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)))
EDIT And to get the right order we have to reverse the order of the stack which could be achieved using position_stack(reverse = TRUE):
Note: To check that we have the right order I added a geom_text showing the p value.
ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
geom_col(aes(fill = as.factor(exdose)), position = position_stack(reverse = TRUE)) +
geom_text(aes(label = p), position = position_stack(reverse = TRUE))
Second option would be to convert p to a factor which the order of levels set in the reverse order:
ggplot(dataset, aes(x = diff + 1, y = subject, group = factor(p, rev(sort(unique(p)))))) +
geom_col(aes(fill = as.factor(exdose))) +
geom_text(aes(label = p), position = "stack")

Related

How to figure out sequence problem in bar plot?

data3 <- data.frame(
Count=c(61,80,47,54,41,92,14,37,33,17,5,53),
category=c('M','M','M','M','M','M','M','M','F','F','F','F'),
Agecate =c('<=5', '6-14', '15-19','above 20',
'<=5', '6-14', '15-19','above 20',
'<=5', '6-14', '15-19','above 20'))
I want to plot count vs category using a bar plot as
ggplot(data3, aes(x = reorder(category,Count), y = Count, fill = Agecate)) +
geom_bar(stat = "identity",position = "dodge") + theme_bw(base_size = 30)+
geom_text(aes(label=Count), size=5, position=position_dodge(width=0.9), vjust=-0.25)+
theme(axis.text.x = element_text(size = 25))+theme(axis.text = element_text(face="bold"))+
xlab('Woredas')+ylab('Number of hh members') +labs(fill='Age group')
How can I improve this figure on the age group sequence? category 6-14 is coming at the end but must come next to <5. And also some counts labeled twice in the bar plot?
Just reorder your variable levels before launching plot, with factor function
data3$Agecate <- factor(data3$Agecate,
levels=c("<=5", "6-14", "15-19", "above 20"))
Concerning double label issue, you have twice each information for 'M' category, only keep one or sum them so you only display labels once or create a third group to discriminate them.
The plot only took the first ones into account for bar height.

dodge columns in ggplot2

I am trying to create a picture that summarises my data. Data is about prevalence of drug use obtained from different practices form different countries. Each practice has contributed with a different amount of data and I want to show all of this in my picture.
Here is a subset of the data to work on:
gr<-data.frame(matrix(0,36))
gr$drug<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b")
gr$practice<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
gr$country<-c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3")
gr$prevalence<-c(9.14,5.53,16.74,1.93,8.51,14.96,18.90,11.18,15.00,20.10,24.56,22.29,19.41,20.25,25.01,25.87,29.33,20.76,18.94,24.60,26.51,13.37,23.84,21.82,23.69,20.56,30.53,16.66,28.71,23.83,21.16,24.66,26.42,27.38,32.46,25.34)
gr$prop<-c(0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406,0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406)
gr$low.CI<-c(8.27,4.80,12.35,1.83,7.22,14.53,18.25,10.56,14.28,18.76,24.25,21.72,18.62,19.83,24.36,25.22,28.80,20.20,17.73,23.15,21.06,13.12,21.79,21.32,22.99,19.76,29.60,15.41,28.39,23.25,20.34,24.20,25.76,26.72,31.92,24.73)
gr$high.CI<-c(10.10,6.37,22.31,2.04,10.00,15.40,19.56,11.83,15.74,21.52,24.87,22.86,20.23,20.68,25.67,26.53,29.86,21.34,20.21,26.10,32.79,13.63,26.02,22.33,24.41,21.39,31.48,17.98,29.04,24.43,22.01,25.12,27.09,28.05,33.01,25.95)
The code I wrote is this
p<-ggplot(data=gr, aes(x=factor(drug), y=as.numeric(gr$prevalence), ymax=max(high.CI),position="dodge",fill=practice,width=prop))
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
p + theme_bw()+
geom_bar(stat="identity",position = position_dodge(0.9)) +
labs(x="Drug",y="Prevalence") +
geom_errorbar(ymax=gr$high.CI,ymin=gr$low.CI,position=position_dodge(0.9),width=0.25,size=0.25,colour="black",aes(x=factor(drug), y=as.numeric(gr$prevalence), fill=practice)) +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill=F)
The figure I obtain is this one where bars are all on top of each other while I want them "dodge".
I also obtain the following warning:
ymax not defined: adjusting position using y instead
Warning message:
position_dodge requires non-overlapping x intervals
Ideally I would get each bar near one another, with their error bars in the middle of its bar, all organised by country.
Also should I be concerned about the warning (which I clearly do not fully understand)?
I hope this makes sense. I hope I am close enough, but I don't seem to be going anywhere, some help would be greatly appreciated.
Thank you
ggplot's geom_bar() accepts the width parameter, but doesn't line them up neatly against one another in dodged position by default. The following workaround references the solution here:
library(dplyr)
# calculate x-axis position for bars of varying width
gr <- gr %>%
group_by(drug) %>%
arrange(practice) %>%
mutate(pos = 0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)])))) %>%
ungroup()
x.labels <- gr$practice[gr$drug == "a"]
x.pos <- gr$pos[gr$drug == "a"]
ggplot(gr,
aes(x = pos, y = prevalence,
fill = country, width = prop,
ymin = low.CI, ymax = high.CI)) +
geom_col(col = "black") +
geom_errorbar(size = 0.25, colour = "black") +
facet_wrap(~drug) +
scale_fill_manual(values = c("c1" = "gray79",
"c2" = "gray60",
"c3" = "gray39"),
guide = F) +
scale_x_continuous(name = "Drug",
labels = x.labels,
breaks = x.pos) +
labs(title = "Drug usage by country and practice", y = "Prevalence") +
theme_classic()
There is a lot of information you are trying to convey here - to contrast drug A and drug B across countries using the barplots and accounting for proportions, you might use the facet_grid function. Try this:
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
gr$drug <- paste("Drug", gr$drug)
p<-ggplot(data=gr, aes(x=factor(practice), y=as.numeric(prevalence),
ymax=high.CI,ymin = low.CI,
position="dodge",fill=practice, width=prop))
p + theme_bw()+ facet_grid(drug~country, scales="free") +
geom_bar(stat="identity") +
labs(x="Practice",y="Prevalence") +
geom_errorbar(position=position_dodge(0.9), width=0.25,size=0.25,colour="black") +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill=F)
The width is too small in the C1 country and as you indicated the one clinic is quite influential.
Also, you can specify your aesthetics with the ggplot(aes(...)) and not have to reset it and it is not needed to include the dataframe objects name in the aes function within the ggplot call.

ggplot bar_plot with 3 levels [duplicate]

I'm hoping to use ggplot2 to generate a set of stacked bars in pairs, much like this:
With the following example data:
df <- expand.grid(name = c("oak","birch","cedar"),
sample = c("one","two"),
type = c("sapling","adult","dead"))
df$count <- sample(5:200, size = nrow(df), replace = T)
I would want the x-axis to represent the name of the tree, with two bars per tree species: one bar for sample one and one bar for sample two. Then the colors of each bar should be determined by type.
The following code generates the stacked bar with colors by type:
ggplot(df, aes(x = name, y = count, fill = type)) + geom_bar(stat = "identity")
And the following code generates the dodged bars by sample:
ggplot(df, aes(x = name, y = count, group = sample)) + geom_bar(stat = "identity", position = "dodge")
But I can't get it to dodge one of the groupings (sample) and stack the other grouping (type):
ggplot(df, aes(x = name, y = count, fill = type, group = sample)) + geom_bar(stat = "identity", position = "dodge")
One workaround would be to put interaction of sample and name on x axis and then adjust the labels for the x axis. Problem is that bars are not put close to each other.
ggplot(df, aes(x = as.numeric(interaction(sample,name)), y = count, fill = type)) +
geom_bar(stat = "identity",color="white") +
scale_x_continuous(breaks=c(1.5,3.5,5.5),labels=c("oak","birch","cedar"))
Another solution is to use facets for name and sample as x values.
ggplot(df,aes(x=sample,y=count,fill=type))+
geom_bar(stat = "identity",color="white")+
facet_wrap(~name,nrow=1)

Sum percentages for each facet - respect "fill"

I am trying to create a faceted barplot, with percentages adding up to 100 for each facet. The solution to this seems to be a combination of group and ..density... How ever - it seems to me that groupis conflicting with fill.
Data:
test <- data.frame(
test1 = sample(letters[1:2], 100, replace = TRUE),
test2 = sample(letters[3:8], 100, replace = TRUE)
)
This gets the percentages right:
ggplot(test, aes(test2)) +
geom_bar(aes(y = ..density.., fill=test2,group=test1)) +
facet_grid(~test1)
Bus as you can see, fillis overwritten:
However, the code below respects fill but gives me the wrong percentages (sums to 100 for the whole chart)(using ..density..):
ggplot(test, aes(test2)) +
geom_bar(aes(y = ..count../sum(..count..), fill=test2)) +
facet_grid(~test1)
Related: This old question of mine: percentage on y lab in a faceted ggplot barchart?.
And yes - I could create additional data, but I feel this belongs in the presentation layer. Actually this feels like a bug?
This is a bit of a hack, but you can reference ..x.. within the geom_bar call. The only problem is that ggplot considers this numeric and so I have coerced to factor and given nice labels within a call to scale_fill_brewer
ggplot(test, aes(x= test2, group = test1)) +
geom_bar(aes(y = ..density.., fill = factor(..x..))) +
facet_grid(~test1) +
scale_fill_brewer(name = 'test2', breaks = 1:6,
labels = levels(test$test2), palette = 'Set3')
compare with not coercing ..x.. to a factor
ggplot(test, aes(x= test2, group = test1)) +
geom_bar(aes(y = ..density.., fill = ..x..)) +
facet_grid(~test1)

How to order bars in faceted ggplot2 bar chart

If I want to order the bars in a ggplot2 barchart from largest to smallest, then I'd usually update the factor levels of the bar category, like so
one_group <- data.frame(
height = runif(5),
category = gl(5, 1)
)
o <- order(one_group$height, decreasing = TRUE)
one_group$category <- factor(one_group$category, levels = one_group$category[o])
p_one_group <- ggplot(one_group, aes(category, height)) +
geom_bar(stat = "identity")
p_one_group
If have have several groups of barcharts that I'd like in different facets, with each facet having bars ordered from largest to smallest (and different x-axes) then the technique breaks down.
Given some sample data
two_groups <- data.frame(
height = runif(10),
category = gl(5, 2),
group = gl(2, 1, 10, labels = letters[1:2])
)
and the plotting code
p_two_groups <- ggplot(two_groups, aes(category, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x")
p_two_groups
what do I need to do to get the bar ordering right?
If it helps, an equivalent problem to solve is: how do I update factor levels after I've done the faceting?
here is a hack:
two_groups <- transform(two_groups, category2 = factor(paste(group, category)))
two_groups <- transform(two_groups, category2 = reorder(category2, rank(height)))
ggplot(two_groups, aes(category2, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
scale_x_discrete(labels=two_groups$category, breaks=two_groups$category2)
make UNIQUE factor variable for all entries (category2)
reorder the variable based on the height
plot on the variable: aes(x=category2)
re-label the axis using original value (category) for the variable (category2) in scale_x_discrete.
Here is a hack to achieve what you want. I was unable to figure out how to get the category values below the tick marks. So if someone can help fix that, it would be wonderful. Let me know if this works
# add a height rank variable to the data frame
two_groups = ddply(two_groups, .(group), transform, hrank = rank(height));
# plot the graph
p_two_groups <- ggplot(two_groups, aes(-hrank, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
opts(axis.text.x = theme_blank()) +
geom_text(aes(y = 0, label = category, vjust = 1.5))

Resources