Ordering bars in ggplot2 stacked barplot via levels() but output looks different - r

I'm struggling with my ggplot2 stacked barplot. I want to define the order of the bars manually. So I do that normally by transforming the variable into a factor and defining the levels in my desired order.
data <- transform(data, variable = factor(variable, levels = c("A4 Da/De/Du", "A2 London", "A3 Berlin", "A1 Paris", "A5 Rome")))
When I check my variable levels I can see that the levels are in my desired order to plot
head(data$variable)
When I plot my data everything looks as desired, but somehow, and I have no clue why, one variable (for example "A4 Da/De/Du") is not in my defined variable order...
Does anyone have an idea what the problem could be?
-It's the only variable with special characters (e.g. "/") in it
-It's the only variable that contains zero values (e.g. c(20,40,0,0,40))
-My ggplot code is quite complex: I use the "reorder()" function and the "forcats" package in it. Could that be a problem?
Thanks very much for any help or ideas!
EDIT (some example data)
library(reshape2)
library(ggplot2)
library(dplyr)
df <- data.frame(cbind(a=c(20,40,20,10,10),b=c(10,30,50,5,5), c=c(60,10,10,15,5), d=c(80,20,0,0,0), e=c(50,10,10,15,15)))
colnames(df) <- c("D1 Paris", "D2 London", "D3 Berlin", "D4 Da/De/Du", "D5 Rome")
df$category <- c("C1", "C2", "C3", "C4", "C5")
data <- melt(df, id.vars = "category")
data <- data %>% group_by(variable) %>% arrange(variable)
data$percent <- data$value/100
data <- transform(data, variable = factor(variable,
levels = c("D4 Da/De/Du", "D2 London", "D3 Berlin", "D1 Paris", "D5 Rome")))
And the short version of the ggplot2 code:
ggplot(data, aes(x=reorder((variable), percent), y=percent, fill=category)) +
coord_flip()+
geom_bar(stat="identity", width = .4, colour="black", lwd=0.1)
SOLUTION
I finally solved my problem :)
Gregor was right: once the levels of the variable have been set in the desired order, the reorder() call in the ggplot2 code is no longer necessary. Worse, it overwrites the levels defined earlier, which is what produced my error.
Thanks Gregor!
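The effect described in the solution can be reproduced without ggplot2. The following is a minimal base R sketch with made-up values (the vectors `variable` and `percent` here are illustrative, not the asker's data): `reorder()` recomputes the level order from the mean of the second argument, so any manually defined order is discarded.

```r
# Manually ordered factor: levels are C, A, B
variable <- factor(c("A", "B", "C", "A", "B", "C"),
                   levels = c("C", "A", "B"))
percent <- c(0.5, 0.1, 0.3, 0.5, 0.1, 0.3)

manual_levels <- levels(variable)

# reorder() sorts the levels by the mean of `percent` per level
# (ascending), ignoring the manual order set above
reordered <- reorder(variable, percent)
reordered_levels <- levels(reordered)

print(manual_levels)     # "C" "A" "B"
print(reordered_levels)  # "B" "C" "A"
```

Dropping `reorder()` from `aes()` and mapping `x = variable` directly keeps the manually defined order.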

Related

How to change order of factor levels in ggplot facet wrap

I have a factor with levels Xa, aXa and aX. Since R treats the factors in alphabetical order, the default facet wraps are coming as aX, aXa and Xa. I want the Xa to be the first graph in the wrap. I tried the following code:
data_small<- read.csv("agg_cond_subj_123.csv")
data_small<-fct_relevel(data_small$pos, "Xa", "aXa", "aX")
And then fed it to ggplot:
data_small %>%
ggplot(aes(lg, prop))+
geom_boxplot()+
facet_wrap(~pos)+
labs(x="Language group",
y="Accuracy (%)")
Xa was still placed last. I tried piping it directly through fct_reorder()
data_small %>%
fct_reorder(pos, "Xa", "aXa", "aX")
ggplot(aes(lg, prop))+
geom_boxplot()+
facet_wrap(~pos)+
labs(x="Language group",
y="Accuracy (%)")
but it gave an error: Error: f must be a factor (or character vector).
I looked at some related solutions on the platform already but they were not fulfilling my purpose.
Use functions that relevel the factor variable inside mutate:
data_small %>%
mutate(pos = factor(pos, levels = c("Xa", "aXa", "aX"))) %>%
ggplot(aes(lg, prop))+
geom_boxplot()+
facet_wrap(~pos)+
labs(x="Language group",
y="Accuracy (%)")
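To see why the first attempt in the question had no effect, it may help to compare the two assignments in base R. This is a sketch with invented values for `prop`; only the column name `pos` and its three levels come from the question. Assigning the releveled factor to `data_small` itself replaces the whole data frame with a single vector, whereas assigning it to the column keeps the data frame intact.

```r
# Illustrative data; the prop values are made up
data_small <- data.frame(
  pos  = c("aX", "aXa", "Xa", "aX", "Xa", "aXa"),
  prop = c(0.61, 0.72, 0.85, 0.58, 0.90, 0.70)
)

# Mistake from the question: the result is a bare factor, not a data frame
overwritten <- factor(data_small$pos, levels = c("Xa", "aXa", "aX"))
print(is.data.frame(overwritten))  # FALSE

# Fix: assign the factor back to the column
data_small$pos <- factor(data_small$pos, levels = c("Xa", "aXa", "aX"))
print(is.data.frame(data_small))   # TRUE
print(levels(data_small$pos))      # "Xa" "aXa" "aX"
```

facet_wrap(~pos) follows the level order of the column, so the Xa panel would then come first.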

Sorting histogram plots within facet_wrap by skew

I have about 1K observations for each country and I have used facet_wrap to display each country's geom_bar but the output is by alphabetical order. I would want to cluster or order them by skew (so the most positive-skew are together and moving towards the normal-distribution countries, then the negative-skew countries ending with the most negative-skewed) without eyeballing what countries are more similar to each other. I was thinking maybe psych::describe() might be useful since it calculates skew, but I am having a hard time figuring out how I would implement adding that information to a similar question.
Any suggestions would be helpful
I can't go into too much detail without a reproducible example but this would be my general approach. Use psych::describe() to create a vector of countries that are sorted from most positive skew to least positive skew: country_order . Next, factor the country column in your dataset with country = factor(country, levels = country_order). When you use facet_wrap the plots will be displayed in the same order as country_order.
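The general approach can be sketched in base R. The data below are simulated, the country names are placeholders, and the `skewness()` helper is a simple moment-based estimate written for this illustration, so it stands in for psych::describe() rather than reproducing it.

```r
set.seed(1)

# Simulated data: three hypothetical countries with different skew
df_demo <- data.frame(
  country = rep(c("Austria", "Brazil", "Chile"), each = 200),
  dv = c(-rexp(200),   # negative skew
         rnorm(200),   # roughly symmetric
         rexp(200))    # positive skew
)

# Simple moment-based skewness (an assumption for this sketch)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Skew per country, then country names sorted from most positive to most negative
skew_by_country <- tapply(df_demo$dv, df_demo$country, skewness)
country_order <- names(sort(skew_by_country, decreasing = TRUE))

# Re-level the factor so facets follow the skew order
df_demo$country <- factor(df_demo$country, levels = country_order)
print(levels(df_demo$country))
```

With the factor re-leveled this way, facet_wrap(~ country) lays out the panels in `country_order` instead of alphabetically.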
After some troubleshooting, I found (what I think is) an efficient way of doing it:
skews <- psych::describeBy(df$DV, df$Country, mat = TRUE) # describeBy with mat = TRUE produces a matrix that you can merge into your df easily
skews %<>% select(group1, mean, skew) %>% sjlabelled::as_factor(., group1) # Turn group1 into a factor; I also kept the country means
combined <- sort(union(levels(df$Country), levels(skews$group1))) # I was getting an error that my levels were inconsistent even though they were the same (since group1 came from df$Country). I think this was because Country had the reference category Germany, which threw off the alphabetical sort of group1, so I used dfrankow's answer
df <- left_join(mutate(df, Country=factor(Country, levels=combined)),
mutate(skews, Country=factor(group1, levels=combined))) %>% rename(`Country skew` = "skew", `Country mean` = "mean") %>% select(-group1)
df$`Country skew` <- round(df$`Country skew`, 2)
ggplot(df) +
geom_bar(aes(x = DV, y=(..prop..)))+
xlab("Scale axis text") + ylab("Proportion") +
scale_x_continuous()+
scale_y_continuous(labels = scales::percent_format(accuracy = 1))+
ggtitle("DV distribution by country mean")+
facet_wrap(~ Country %>% fct_reorder(.,mean), nrow = 2) #this way the reorder that was important for my lm can remain intact

Not sure why this subset is not working in ggplot

I have sub-setted my data set so that only three sites are included, as I only want to plot three sites and the following code does not seem to work with ggplot. Anyone have any idea why?
rm(list=ls())
require(ggplot2)
require(reshape2)
require(magrittr)
require(dplyr)
require(tidyr)
setwd("~/Documents/Results")
mydata <- read.csv("Metals sheet R.csv")
L <- subset(mydata, Site =="B1"| Site == "B2"| Site == "B3", select = c(Site,Date,Al))
L$Date <- as.Date(L$Date, "%d/%m/%Y")
ggplot(data=L, aes(x=Date, y=Al, xaxt="n", colour=Site)) +
geom_point() +
labs(title = "Total Al in the Barlwyd and Bowydd
19/03/2015.", x = "Site",
y = "Total concentration (mg/L)") +
scale_x_date(date_breaks = "1 month", labels = date_format("%m"))
It seems to falter after the ggplot line. Thanks in advance. I have double checked it but can't see anything wrong? I might possibly need a way to only plot three of my 21 sites.
The head of my subsetted L data set looks something like this (x58 reps)
Date Site Al
12/08/2015 B1 22.3
12/08/2015 B2 23.4
12/08/2015 B3 203
Thank you in advance.
I think xaxt = "n" is wrong. The ggplot aes function is only for matching variables in your data to plot elements. To remove the x-axis text in ggplot, use the theme function, as described in the question "ggplot2 plot without axes, legends, etc."
On a separate note, the %in% operator provides a quicker way of selecting a subset of values from a column:
subset(mydata, Site %in% c("B1", "B2", "B3"))
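As a quick check that the two forms select the same rows, here is a small base R sketch. The data frame `mydata_demo` is invented for the illustration; only the column names Site and Al and the site labels B1 to B3 come from the question.

```r
# Made-up data with five sites; only B1-B3 should be kept
mydata_demo <- data.frame(
  Site = c("B1", "B2", "B3", "B4", "B5"),
  Al   = c(22.3, 23.4, 203, 5.1, 7.8)
)

# Chained comparisons, as in the question
L_or <- subset(mydata_demo, Site == "B1" | Site == "B2" | Site == "B3")

# Equivalent and shorter with %in%
L_in <- subset(mydata_demo, Site %in% c("B1", "B2", "B3"))

print(identical(L_or, L_in))  # TRUE
print(L_in$Site)              # "B1" "B2" "B3"
```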

R boxplot several variables at once

I have the following data set that I would like to make a boxplot from:
July<-c("Closed","Open")
Cistus<-c(10.8, 18.9)
CS<-c(2.004, 3.9)
Oak<-c(7.4, 12.4)
OS<-c(0.9,2.1)
df<-data.frame(July, Cistus, CS, Oak, OS)
I would like my boxplot to have Cistus and Oak at the x-axis, each with two boxes (opened and closed). So in total 4 boxes....
I am epically failing at this... Please can you help me? I'm sorry for the basic question.
Here is a modification of Vincent's code but with the subsetting to the desired categories:
library(reshape2)
#reshape into long format
dfnew<-melt(df, "July")
#subset down to just Cistus and Oak
dfnew<-droplevels(dfnew[dfnew$variable %in% c("Cistus", "Oak"),])
#plot
boxplot(value ~ July+variable, data=dfnew, las=2, col=c("grey10", "grey50"))
I would do it using reshape2 to rearrange your data.frame. Then you can use a formula in boxplot:
library(reshape2)
boxplot(value ~ July + variable, melt(df))
With more than one value per group and some color:
df2 <- data.frame(July=rep(c("Closed", "Open"), each=5),
Cistus=runif(10),
CS=runif(10),
Oak=runif(10),
OS=runif(10))
boxplot(value ~ July + variable, melt(df2), col=c("grey10", "grey50"))
Is this what you're looking for?
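If reshape2 is not available, the same long format can be built with base R's stack(). This is only a sketch of an alternative, not the answerers' method; the column names `values` and `ind` are the ones stack() produces, and plot = FALSE is used so that only the grouping is inspected.

```r
# Data from the question, restricted to the two species of interest
df <- data.frame(July   = c("Closed", "Open"),
                 Cistus = c(10.8, 18.9),
                 Oak    = c(7.4, 12.4))

# stack() puts the measurements in `values` and the source column in `ind`
long <- data.frame(July = rep(df$July, times = 2),
                   stack(df[c("Cistus", "Oak")]))
print(long)

# One box per July x species combination (four groups in total)
b <- boxplot(values ~ July + ind, data = long, plot = FALSE)
print(length(b$names))  # 4
```

With a single observation per group, as in the question's data, each box collapses to a line; the df2 example above with five values per group gives proper boxes.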

How can I plot multiple variables side-by-side in a dotplot in R?

I'm still pretty new to R, and have come up against a plotting problem I can't find an answer to.
I've got a data frame that looks like this (though a lot bigger):
df <- data.frame(Treatment= rep(c("A", "B", "C"), each = 6),
LocA=sample(1:100, 18),
LocB=sample(1:100, 18),
LocC=sample(1:100, 18))
And I want dot plots that look like this one produced in Excel. It's exactly the formatting I want: a dotplot for each of the treatments side-by-side for each location, with data for multiple locations together on one graph. (Profuse apologies for not being able to post the image here; posting images requires a 10 reputation.)
It's no problem to make a plot for each location, with the dots color-coded, and so on:
ggplot(data = df, aes(x=Treatment, y=LocA, color = Treatment)) + geom_point()
but I can't figure out how to add locations B and C to the same graph.
Any advice would be much appreciated!
As a couple of people have mentioned, you need to "melt" the data, getting it into a "long" form.
library(reshape2)
df_melted <- melt(df, id.vars=c("Treatment"))
colnames(df_melted)[2] <- "Location"
In ggplot jargon, having different groups like treatment side-by-side is achieved through "dodging". Usually for things like barplots you can just say position="dodge" but geom_point seems to require a bit more manual specification:
ggplot(data=df_melted, aes(x=Location, y=value, color=Treatment)) +
geom_point(position=position_dodge(width=0.3))
You need to reshape the data. Here is an example using reshape2:
library(reshape2)
dat.m <- melt(dat, id.vars='Treatment')
library(ggplot2)
ggplot(data = dat.m,
aes(x=Treatment, y=value,shape = Treatment,color=Treatment)) +
geom_point()+facet_grid(~variable)
Since you want a dotplot, I propose also a lattice solution. I think it is more suitable in this case.
library(lattice)
dotplot(value~Treatment|variable,
groups = Treatment, data=dat.m,
pch=c(25,19),
par.strip.text=list(cex=3),
cex=2)

Resources