ggplot2 position=dodge giving erroneous labels

I am trying to plot a simple bar chart with labels in ggplot2. However, when I use position="dodge", the wrong labels appear in the resulting graphic, e.g. 17.6% instead of 77.7% for Trucks. My data and code are below.
library(ggplot2)
mode <- factor(c("Truck", "Rail","Water","Air","Other"), levels=c("Truck", "Rail","Water","Air","Other"))
Year <- factor(c("2011","2011","2011","2011","2011","2040","2040","2040","2040","2040"))
share <- c(0.709946085, 0.175582806, 0.11392987, 0.000534132, 0.00000710797, 0.777162621, 0.133121584, 0.088818658, 0.000880041, 0.000017097)
modeshares <- data.frame(Year, mode, share)
theme_set(theme_grey(base_size = 18))
modeshares$lab <- as.character(round(100 * share,1))
modeshares$lab <- paste(modeshares$lab,"%",sep="")
ggplot(data=modeshares, aes(x=mode, y=share*100, fill=Year, ymax=(share*100))) + geom_bar(stat="identity", position="dodge") + labs(y="Percent",x="Mode") +geom_text(label=modeshares$lab,position=position_dodge(width=1),vjust=-0.5)
The resulting graph is shown below.
Any insights into how to ensure that the correct label values are displayed would be much appreciated.
Thanks!
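A possible fix, offered as a sketch rather than a confirmed diagnosis: the labels are passed to geom_text() as a plain vector outside aes(), so they are attached by row position and need not line up with the dodged bars. Mapping the label column inside aes() lets ggplot2 carry each label along with its own row. The dodge width of 0.9 is an assumption chosen to match the default bar width, and geom_col() is used here as the shorthand for geom_bar(stat="identity").

```r
library(ggplot2)

# Same data as in the question, with mode repeated explicitly for both years
mode <- factor(rep(c("Truck", "Rail", "Water", "Air", "Other"), times = 2),
               levels = c("Truck", "Rail", "Water", "Air", "Other"))
Year <- factor(rep(c("2011", "2040"), each = 5))
share <- c(0.709946085, 0.175582806, 0.11392987, 0.000534132, 0.00000710797,
           0.777162621, 0.133121584, 0.088818658, 0.000880041, 0.000017097)
modeshares <- data.frame(Year, mode, share)
modeshares$lab <- paste0(round(100 * modeshares$share, 1), "%")

# Use one dodge object for bars and labels so they are shifted identically
dodge <- position_dodge(width = 0.9)

p <- ggplot(modeshares, aes(x = mode, y = share * 100, fill = Year)) +
  geom_col(position = dodge) +
  # label is mapped inside aes(), so it stays attached to its row
  geom_text(aes(label = lab), position = dodge, vjust = -0.5) +
  labs(y = "Percent", x = "Mode")

print(p)
```

With this mapping, the text layer inherits fill=Year from the plot, which is what tells position_dodge() how to split the labels within each mode.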

Related

Why does changing the label mess up my plot?

I have recently been playing around with various plot types using fictitious data to get my head around how I could display various pieces of information. One plot type that is gaining popularity is the so-called individual differences dot plot, which shows the change in each subject's score pre-post. The plot is fairly easy to produce, but my issue is that when I change the labels using either the labs or the xlab/ylab functions in ggplot, the plot itself becomes messed up. Below I have attached the fictitious data, the code used and the results.
Data
df<- data.frame(Participant<- c(rep(1:10,2)), Score<- c(rnorm(20,100,5)), Session<- c(1,1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2,2,2))
colnames(df) <- c("Participant", "Score", "Session")
Code for plot
p<- ggplot(df, aes(x=df$Session, y=df$Score, colour=df$Participant))+ geom_point()+
geom_line(group=df$Participant)+
theme_classic()
Plot
Individual difference plot
My dilemma is that anytime I try to change the label names, the plot messes up as per below.
Problem
p + xlab("Session") + ylab("Score")
Plot after relabelling
The same thing happens if I try the labs function, i.e. p + labs(x= "Session", y= "Score"). You can see that the labels themselves do actually change, but for some reason this messes up the actual plot. Does anyone have any ideas as to what could be going wrong here?
The issue appears to be that the grouping is undone when the label functions are called. Instead, supply the grouping as an aesthetic mapping and refer to the columns by name rather than with df$:
library(dplyr); library(ggplot2)
df %>% mutate(across(c(Session,Participant),factor)) -> df
p <- ggplot(df, aes(x=Session, y=Score, colour=Participant))+ geom_point()+
geom_line(aes(group=Participant))+
theme_classic()
p + xlab("Session") + ylab("Score")
I suspect this is probably a bug.

R geom_line not plotting as expected

I am using the following code to plot a stacked area graph and I get the expected plot.
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) + #ggplot initial parameters
geom_ribbon(position='fill', aes(ymin=0, ymax=1))
But when I add lines that read the same data source, I get misaligned results towards the right side of the graph:
P + geom_line(position='fill', aes(group=model, ymax=1))
Does anyone know why this may be? Both layers read the same data source, so I can't figure out what the problem is.
Actually, if all you wanted to do was draw an outline around the areas, then you could do the same using the colour aesthetic.
ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(position='fill', aes(ymin=0, ymax=1), colour = "black")
I have an answer that I hope works for you. It looks good, but very different from your original graph:
library(ggplot2)
DATA2 <- read.csv("C:/Users/corcoranbarriosd/Downloads/porsche model volumes.csv", header = TRUE, stringsAsFactors = FALSE)
In my experience you want X to be a numeric variable, and you have it as a string. If that is not what you want I can change it, but this will transform your bucket into a numeric vector:
# Split each bucket label on runs of non-digit characters
bucket.list <- strsplit(DATA2$bucket, "[^0-9]+")
# Keep the second piece of each split and convert it to numeric
DATA2$bucket <- as.numeric(vapply(bucket.list, function(parts) parts[2], character(1)))
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(aes(ymin=0, ymax=volume))+ geom_line(aes(group=model, ymax=volume))
It gives me the area and the line tracking each other, hope that's what you needed
If you switch to using geom_path in place of geom_line, it all seems to work as expected. I don't think the ordering of geom_line is behaving the same as geom_ribbon (and suspect that geom_line -- like geom_area -- assumes a zero base y value)
ggplot(DATA2, aes(x=bucket, y=volume, ymin=0, ymax=1,
group=model, fill=model, label=volume)) +
geom_ribbon(position='fill') +
geom_path(position='fill')
This should give you the expected result.
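The ordering difference mentioned above can be seen on a toy data set. geom_line() connects observations in order of x, whereas geom_path() connects them in the order they appear in the data. The data frame d below is hypothetical, and whether this difference is what causes the misalignment in the original plot is not established here.

```r
library(ggplot2)

# Hypothetical data, deliberately not sorted by x
d <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20))

p_line <- ggplot(d, aes(x, y)) + geom_line()
p_path <- ggplot(d, aes(x, y)) + geom_path()

# Inspect the data each layer actually draws
line_x <- layer_data(p_line)$x  # reordered by x
path_x <- layer_data(p_path)$x  # kept in data order

print(line_x)
print(path_x)
```

If the rows of the real data are not sorted by bucket within each model, the two geoms will therefore trace the points in different sequences.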

Plot results from dist_tab() function from qdap library

I am interested in plotting the results from the following code which produces a frequency distribution table. I would like to graph the Freq column as a bar with the cum.Freq as a line both sharing the interval column as the x-axis.
library("qdap")
x <- c(1,2,3,2,4,2,5,4,6,7,8,9)
dist_tab(x)
I have been able to get the bar chart built using ggplot, but I want to take it further with the cum.Freq added as a secondary axis. I also want to add the percent and cum.percent values added as data labels. Any help is appreciated.
library("ggplot2")
ggplot(dist_tab(x), aes(x=interval)) + geom_bar(aes(y=Freq))
Not sure if I understand your question. Is this what you are looking for?
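library(reshape2) # for melt()
df <- dist_tab(x)
# Reshape to long format so Freq and cum.Freq can share one y aesthetic
df.melt <- melt(df, id.vars="interval", measure.vars=c("Freq", "cum.Freq"))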
ggplot(df.melt, aes(x=interval, y=value, fill=variable)) +
geom_bar(stat="identity", position="dodge")
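For the bar-plus-line layout the question asks about, a minimal sketch follows. To stay self-contained it does not call dist_tab(); instead it builds a small frequency table with base R and names the columns interval, Freq and cum.Freq by assumption, so the intervals may differ from what qdap produces. The secondary axis uses an identity transform, since both series are counts.

```r
library(ggplot2)

x <- c(1, 2, 3, 2, 4, 2, 5, 4, 6, 7, 8, 9)

# Stand-in for dist_tab(x): one row per distinct value (column names assumed)
tab <- as.data.frame(table(interval = x), responseName = "Freq")
tab$cum.Freq <- cumsum(tab$Freq)

p <- ggplot(tab, aes(x = interval)) +
  geom_col(aes(y = Freq)) +
  # interval is a factor, so group = 1 is needed to join the points with a line
  geom_line(aes(y = cum.Freq, group = 1)) +
  geom_point(aes(y = cum.Freq)) +
  scale_y_continuous(name = "Frequency",
                     sec.axis = sec_axis(~ ., name = "Cumulative frequency"))

print(p)
```

Percent labels could be added with a geom_text() layer whose label is mapped inside aes(), in the same way as the frequency columns are mapped here.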

Gradient Fill in Bar Graph

I'm looking at behavior of different groups of people (called Clusters in this data set) and their preference for the type of browser they use. I want to create a bar graph that shows the percentage of each cluster that is using each type of browser.
Here is some code to generate a similar dataset (please ignore that the percentages for each cluster will not add up to 1):
browserNames <- c("microsoft","mozilla","google")
clusterNames <- c("Cluster 1","Cluster 2","Cluster 3")
percentages <- runif(n=length(browserNames)*length(clusterNames),min=0,max=1)
myData<-as.data.frame(list(browserNames=rep(browserNames,3),
clusterNames=rep(clusterNames,each=3),
percentages=percentages))
Here's the code I've been able to come up with so far to get the graph I desire:
ggplot(myData, aes(x=browserNames, y=percentages, fill=factor(clusterNames))) +
geom_bar(stat="identity",position="dodge") +
scale_y_continuous(name="Percent Weight", labels=percent)
I want the fill for each cluster to be a gradient fill with high and low values that I determine. So, in this example, I would like to be able to set 3 high and low values for each cluster that is represented.
I've had trouble with the different scale_fill commands, and I'm new enough to ggplot that I am pretty sure I'm probably just doing it wrong. Any ideas?
Edit: Here is a picture of what I'm looking for:
(Original image available at https://www.dropbox.com/s/py6hifejqz7k54v/gradientExample.bmp)
Is this close to what you had in mind?
# color set depends on browser
library(RColorBrewer) # for brewer.pal(...)
gg <- with(myData, myData[order(browserNames,percentages),])
gg$colors <- 1:9
colors <- c(brewer.pal(3,"Reds"),brewer.pal(3,"Greens"),brewer.pal(3,"Blues"))
ggplot(gg, aes(x=browserNames, y=percentages,
fill=factor(colors), group=factor(clusterNames))) +
geom_bar(stat="identity",position="dodge", color="grey70") +
scale_fill_manual("Cluster", values=colors,
breaks=c(3,6,9), labels=c("Google","Microsoft","Mozilla"))
# color set depends on cluster
library(RColorBrewer) # for brewer.pal(...)
gg <- with(myData, myData[order(clusterNames,percentages),])
gg$colors <- 1:9
col <- c(brewer.pal(3,"Reds"),brewer.pal(3,"Greens"),brewer.pal(3,"Blues"))
ggplot(gg, aes(x=browserNames, y=percentages,
fill=factor(colors), group=factor(clusterNames))) +
geom_bar(stat="identity",position="dodge", color="grey70") +
scale_fill_manual("Cluster", values=col,
breaks=c(3,6,9), labels=c("Cluster1","Cluster2","Cluster3"))

How can I change the colors in a ggplot2 density plot?

Summary: I want to choose the colors for a ggplot2() density distribution plot without losing the automatically generated legend.
Details: I have a dataframe created with the following code (I realize it is not elegant but I am only learning R):
cands<-scan("human.i.cands.degnums")
non<-scan("human.i.non.degnums")
df<-data.frame(grp=factor(c(rep("1. Candidates", each=length(cands)),
rep("2. NonCands",each=length(non)))), val=c(cands,non))
I then plot their density distribution like so:
library(ggplot2)
ggplot(df, aes(x=val,color=grp)) + geom_density()
This produces the following output:
I would like to choose the colors the lines appear in and cannot for the life of me figure out how. I have read various other posts on the site but to no avail. The most relevant are:
Changing color of density plots in ggplot2
Overlapped density plots in ggplot2
After searching around for a while I have tried:
## This one gives an error
ggplot(df, aes(x=val,colour=c("red","blue"))) + geom_density()
Error: Aesthetics must either be length one, or the same length as the data
Problems: c("red", "blue")
## This one produces a single, black line
ggplot(df, aes(x=val),colour=c("red","green")) + geom_density()
The best I've come up with is this:
ggplot() + geom_density(aes(x=cands),colour="blue") + geom_density(aes(x=non),colour="red")
As you can see in the image above, that last command correctly changes the colors of the lines but it removes the legend. I like ggplot2's legend system. It is nice and simple, I don't want to have to fiddle about with recreating something that ggplot is clearly capable of doing. On top of which, the syntax is very very ugly. My actual data frame consists of 7 different groups of data. I cannot believe that writing + geom_density(aes(x=FOO),colour="BAR") 7 times is the most elegant way of coding this.
So, if all else fails, I will accept an answer that tells me how to get the legend back onto the second plot. However, if someone can tell me how to do it properly I will be very happy.
set.seed(45)
df <- data.frame(x=c(rnorm(100), rnorm(100, mean=2, sd=2)), grp=rep(1:2, each=100))
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set1")
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set3")
This gives me the same plots with different sets of colors.
Provide a vector of colours to the values argument of scale_color_manual() to map the discrete group values to manually chosen colours:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("red", "blue"))
To choose any colour you wish, enter the hex code for it instead:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("#f5d142", "#2bd63f")) # yellow/green
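When there are several groups, as in the question's seven-group case, a named vector makes the mapping explicit and independent of factor level order. The sketch below uses simulated data in place of the files read with scan(), so the values are assumptions; the group labels are the ones from the question.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the question's data frame (values are made up)
df <- data.frame(
  grp = factor(rep(c("1. Candidates", "2. NonCands"), each = 100)),
  val = c(rnorm(100), rnorm(100, mean = 2))
)

# Names are the factor levels, values are the colours to use for them
group_colours <- c("1. Candidates" = "red", "2. NonCands" = "blue")

p <- ggplot(df, aes(x = val, colour = grp)) +
  geom_density() +
  scale_colour_manual(values = group_colours)

print(p)
```

The legend is kept because colour is still mapped inside aes(); only the scale that translates groups to colours is replaced.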
