I would like to create a stacked bar graph; however, my output shows overlaid bars instead of stacked ones. How can I fix this?
#Create data
date <- as.Date(rep(c("1/1/2016", "2/1/2016", "3/1/2016", "4/1/2016", "5/1/2016"),2))
sales <- c(23,52,73,82,12,67,34,23,45,43)*1000
geo <- c(rep("Western Territory",5), rep("Eastern Territory",5))
data <- data.frame(date, sales, geo)
#Plot
library(ggplot2)
ggplot(data=data, aes(x=date, y=sales, fill=geo))+
stat_summary(fun.y=sum, geom="bar") +
ggtitle("TITLE")
Plot output: (image not included)
The summary table below confirms that the bars are not stacked:
> # Verify plot is correct
> library(plyr)  # provides ddply() and summarize()
> ddply(data, c("date"), summarize, total=sum(sales))
        date  total
1 0001-01-20  90000
2 0002-01-20  86000
3 0003-01-20  96000
4 0004-01-20 127000
5 0005-01-20  55000
Thanks!
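You have to include position="stack" in your stat_summary call (in ggplot2 3.3.0 and later, fun.y is deprecated in favour of fun):
stat_summary(position="stack", fun=sum, geom="bar")  # use fun.y=sum on ggplot2 older than 3.3.0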
Alternatively, since your data are already summarized, you could use geom_col (shorthand for geom_bar(stat = "identity")):
ggplot(data=data, aes(x=date, y=sales, fill=geo))+
geom_col() +
scale_x_date(date_labels = "%b-%d")
Produces: (image not included)
Note that I changed the date parsing (by adding format = "%m/%d/%Y" to the as.Date call) and explicitly set the axis label formatting.
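To illustrate why the format argument matters, here is a minimal sketch using date strings from the question. It assumes the strings are meant as month/day/year, which is what the format string above implies.

```r
# Date strings as written in the question (assumed to be month/day/year)
raw <- c("1/1/2016", "2/1/2016", "3/1/2016")

# Without a format, as.Date() tries "%Y-%m-%d" and then "%Y/%m/%d",
# so "1/1/2016" is read as year 1, month 1, day 20
bad <- as.Date(raw)

# With an explicit format the strings parse as intended
good <- as.Date(raw, format = "%m/%d/%Y")

print(bad)
print(good)
```

The mis-parsed values are where the odd dates in the question's summary table come from.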
If your actual data have more than one entry per period, you can always summarise first, then pass that into ggplot instead of the raw data.
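A rough sketch of that summarise-first approach, using base R's aggregate(). The raw data frame here is made up for illustration (several records per date and territory); only the column names follow the question.

```r
# Hypothetical raw data: several sales records per date and territory
raw <- data.frame(
  date  = as.Date(c("1/1/2016", "1/1/2016", "1/1/2016", "2/1/2016", "2/1/2016"),
                  format = "%m/%d/%Y"),
  geo   = c("Western Territory", "Western Territory", "Eastern Territory",
            "Western Territory", "Eastern Territory"),
  sales = c(10000, 13000, 67000, 52000, 34000)
)

# One row per date/territory combination, with sales summed
totals <- aggregate(sales ~ date + geo, data = raw, FUN = sum)
print(totals)

# Plot the summarised table; geom_col() stacks the fill groups by default.
# The plot is only built if ggplot2 is installed; print(p) would draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(totals, aes(x = date, y = sales, fill = geo)) +
    geom_col() +
    scale_x_date(date_labels = "%b-%d")
}
```

Other summarising tools (plyr, dplyr, data.table) would work just as well; aggregate() is used here only to keep the sketch dependency-free.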
Related
I have a ggplot graph and I want to draw two lines on it (from different columns, but for the same date). What I get are two lines that are stacked on each other, but I want to have the same y-axis, ordered correctly, with the lines overlapping each other.
This is the data I'm trying to plot:
final_table:
Month a b
1 2018-04 758519.397875 2404429.258675
2 2018-05 964792.603725 1995902.14473
3 2018-06 703170.240575 1294997.84319
This is my code:
bla3 <- melt(final_table, id='Month')
ggplot(data=bla3, aes(x=Month, y=value, colour= variable, group=variable)) +
geom_line()
And the output I get (notice the y-axis is totally wrong and unordered).
I guess that your date variable is not in the right format. E.g. if you run
class(final_table$Month)
it should return "Date". If it does not, you need to convert the column. Here's an example with your numbers.
Month <- as.character(c("2018-04", "2018-05", "2018-06")) #or convert it to character after
a <- c(758519.397875, 964792.603725, 703170.240575)
b <- c(2404429.258675, 1995902.14473, 1294997.84319)
final_table <- data.frame(Month, a, b)
#your Month variable is messed up, you actually need the day!
final_table$Month <- as.Date(paste(final_table$Month,"-01",sep=""))
library(reshape) #need to load that for melt
bla3 <- melt(final_table, id='Month')
ggplot(data=bla3, aes(x=Month, y=value, colour= variable, group=variable)) +
geom_line()
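As a small check of why the day has to be appended, the sketch below tries to parse a year-month string on its own and then with "-01" added. It uses only base R; the variable names are illustrative.

```r
# A year-month string has no day, so as.Date() cannot parse it;
# tryCatch() captures the error message instead of stopping
res <- tryCatch(as.Date("2018-04"), error = function(e) conditionMessage(e))
print(res)

# Appending a day gives a complete date that parses with the default format
month_start <- as.Date(paste0("2018-04", "-01"))
print(month_start)
print(class(month_start))
```

Once the column is a Date, ggplot2 treats the x-axis as continuous time rather than as unordered text.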
I'm sure this is a very simple question for most of you, but I'm new and can't figure it out. How do you create a side by side box plot grouped by time? For example, I have 24 months of data. I want to make one box plot for the first 12 months, and another for the second 12 months. My data can be seen below.
Month,Revenue
1,94000
2,81000
3,117000
4,105000
5,117000
6,89000
7,101000
8,118000
9,105000
10,123000
11,109000
12,89000
13,106000
14,159000
15,121000
16,135000
17,116000
18,133000
19,144000
20,130000
21,142000
22,124000
23,140000
24,104000
Since your data has a time ordering, it might be illuminating to plot line plots by month for each year separately. Here is code for both a line plot and a boxplot. I just made up the year values in the code below, but you can make those whatever is appropriate:
library(ggplot2)
# Assuming your data frame is called "dat"
dat$Month.abb = month.abb[rep(1:12,2)]
dat$Month.abb = factor(dat$Month.abb, levels=month.abb)
dat$Year = rep(2014:2015, each=12)
ggplot(dat, aes(Month.abb, Revenue, colour=factor(Year))) +
geom_line(aes(group=Year)) + geom_point() +
scale_y_continuous(limits=c(0,max(dat$Revenue))) +
theme_bw() +
labs(colour="Year", x="Month")
ggplot(dat, aes(factor(Year), Revenue)) +
geom_boxplot() +
scale_y_continuous(limits=c(0,max(dat$Revenue))) +
theme_bw() +
labs(x="Year")
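If you would rather not invent year values, one possible variation is to group directly on the month index. This is a sketch under the assumption that the data frame has the Month and Revenue columns shown in the question; the Period labels are made up.

```r
# Revenue values from the question, months 1 to 24
dat <- data.frame(
  Month = 1:24,
  Revenue = c(94000, 81000, 117000, 105000, 117000, 89000, 101000, 118000,
              105000, 123000, 109000, 89000, 106000, 159000, 121000, 135000,
              116000, 133000, 144000, 130000, 142000, 124000, 140000, 104000)
)

# Label each row by the 12-month block it falls in (labels are illustrative)
dat$Period <- factor(ifelse(dat$Month <= 12, "Months 1-12", "Months 13-24"))

# Quick numeric check of the two groups
print(tapply(dat$Revenue, dat$Period, median))

# Side-by-side boxplots, one per period; built only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(Period, Revenue)) +
    geom_boxplot() +
    theme_bw()
}
```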
I am trying to plot a trip length distribution (for every 10-mile increase in distance, I want to find the percent of trips in that bin for a given year). When I plot it in ggplot2, my x-axis tick labels are ordered alphabetically rather than by increasing distance. I have tried the various tricks suggested (Change the order of a discrete x scale) but am not getting anywhere. My code is below and the dataset is here (http://goo.gl/W1jjfL).
library(ggplot2)
library(reshape2)
nwpt <- subset(nonwork, select=c(Distance, PersonTrips1995, PersonTrips2001, PersonTrips2009))
nwpt <- melt(nwpt, id.vars="Distance")
ggplot(data=nwpt, aes(x=Distance, y=value, group=variable, colour=variable)) + scale_x_discrete(name="Distance") + geom_line(size=0.5) + ggtitle("Non Work Person Trips") + ylab("Percent")
I checked to see if the Distance variable is a factor and it is as shown below:
> is.factor(nwpt$Distance)
[1] TRUE
However, the output is not what I want. Instead of "Under 10 Miles" being the first category, "10-14 miles" the next, and so on, I get the plot shown below (PDF here: http://goo.gl/V7yvxT).
Any help is appreciated.
TIA
Krishnan
Here's one way:
library(ggplot2)
library(reshape2)
nwpt <- subset(nonwork,
select=c(DID,Distance,PersonTrips1995,PersonTrips2001,PersonTrips2009))
nwpt <- melt(nwpt, id.vars=c("DID","Distance"))
ggplot(data=nwpt, aes(x=DID, y=value, colour=variable)) +
geom_line(size=0.5) +
labs(title="Non Work Person Trips", y="Percent") +
scale_x_discrete(name="Distance", labels=nwpt$Distance) +
theme(axis.text.x=element_text(angle=90))
Produces this with your dataset: (image not included)
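Another option, sketched here, is to set the factor levels explicitly, since ggplot2 orders a discrete axis by factor levels and factor() sorts levels alphabetically by default. The bin labels below are hypothetical stand-ins, and the sketch assumes the rows of the data already appear in increasing-distance order.

```r
# Hypothetical distance bins, listed in the order they should appear
bins <- c("Under 10 Miles", "10-14 Miles", "15-19 Miles", "20-24 Miles")

# Default factor(): levels are sorted alphabetically
alpha <- factor(bins)
print(levels(alpha))

# Explicit levels: keep the order of first appearance in the data
ordered_bins <- factor(bins, levels = unique(bins))
print(levels(ordered_bins))
```

Applied to the real data, this would amount to something like nonwork$Distance <- factor(nonwork$Distance, levels = unique(nonwork$Distance)) before melting, provided the row order assumption holds.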
I have data that I am trying to plot. I have several variables that range from the years 1880-2012. I have one observation per year. But sometimes a variable does not have an observation for a number of years. For example, it may have an observation from 1880-1888, but then not from 1889-1955 and then from 1956-2012. I would like ggplot2 + geom_line to not have anything in the missing years (1889-1955). But it connects 1888 and 1956 with a straight line. Is there anything I can do to remove this line? I am using the ggplot function.
Unrelated question, but is there a way to get ggplot to not sort my variable names in the legend alphabetically? I have code like this:
ggplot(dataFrame, aes(Year, value, colour=Name)) + geom_line()
Or to add numbers in front of the variable names (Name1, ..., Name10) to the legend. For example,
1. Name1
2. Name2
...
10. Name10
Here's some sample data to answer your questions. I've added geom_point() to make it easier to see which values are in the data:
library(ggplot2)
set.seed(1234)
dat <- data.frame(Year=rep(2000:2013,5),
value=rep(1:5,each=14)+rnorm(5*14,0,.5),
Name=rep(c("Name1","End","First","Name2","Name 3"),each=14))
dat2 <- dat
dat2$value[sample.int(5*14,12)]=NA
dat3 is probably close to what your data looks like, except that I'm treating Year as an integer.
dat3 <- dat2[!is.na(dat2$value),]
# POINTS ARE CONNECTED WITH NO DATA IN BETWEEN #
ggplot(dat3, aes(Year, value, colour=Name)) +
geom_line() + geom_point()
However, if you add rows to your data for the missing years and set their value to NA, the plot will show gaps where the data are missing.
# POINTS ARE NOT CONNECTED #
ggplot(dat2, aes(Year, value, colour=Name)) +
geom_line() + geom_point()
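If your data simply lack rows for the missing years (like dat3), one way to add the NA rows is to build every Year/Name combination and left-join the observations onto it. This is a base R sketch with made-up data; the names obs, full_grid and filled are illustrative.

```r
# Hypothetical data with a gap: series "A" has no rows for 2003-2005
obs <- data.frame(
  Year  = c(2000:2002, 2006:2007, 2000:2007),
  Name  = c(rep("A", 5), rep("B", 8)),
  value = c(1.0, 1.2, 1.1, 1.5, 1.4, 2 + (0:7) / 10),
  stringsAsFactors = FALSE
)

# Every Year/Name combination that should exist
full_grid <- expand.grid(Year = 2000:2007, Name = unique(obs$Name),
                         stringsAsFactors = FALSE)

# Left join: combinations without an observation get value = NA
filled <- merge(full_grid, obs, by = c("Year", "Name"), all.x = TRUE)
filled <- filled[order(filled$Name, filled$Year), ]
print(filled)
```

Plotting filled instead of obs with geom_line() should then leave the 2003-2005 stretch of series "A" unconnected.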
And finally, to answer your last question this is how you change the order and labels of Name in the legend:
# CHANGE THE ORDER AND LABELS IN THE LEGEND #
ggplot(dat2, aes(Year, value, colour=Name)) +
geom_line() + geom_point() +
scale_colour_discrete(labels=c("Beginning","Name 1","Name 2","Name 3","End"),
breaks=c("First","Name1","Name2","Name 3","End"))
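For the numbered labels asked about in the question, the label vector can be generated rather than typed out. A small sketch, assuming the legend order used above:

```r
# Names in the order they should appear in the legend (from the example data)
ord <- c("First", "Name1", "Name2", "Name 3", "End")

# Build labels such as "1. First", "2. Name1", ...
numbered <- paste0(seq_along(ord), ". ", ord)
print(numbered)

# These vectors could then be passed to the colour scale, e.g.
# scale_colour_discrete(breaks = ord, labels = numbered)
```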
Using the variables (day, month, count) in the data frame d, I use ggplot below:
day <- c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Week","Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Week")
day <- factor(day, levels=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Week"))
month<-c("Jan","Jan","Jan","Jan","Jan","Jan","Jan","Jan","Feb","Feb","Feb","Feb","Feb","Feb","Feb","Feb")
month<-factor(month,levels=c("Jan","Feb"))
count <- c(4,5,6,8,3,4,9,5.57,2,4,3,7,1,9,3,4.14)
d <- data.frame(day=day,count=count,month=month)
d
The line graph below correctly orders the days:
ggplot()+geom_line(data=d[d$day!="Week",],aes(x=day, y=count, group=month, colour=month))
The bar graph below correctly displays the two counts:
ggplot()+geom_bar(data=d[d$day=="Week",],aes(x=day, y=count, fill=month),position="dodge")
However, the order of days is incorrect in the combined graph:
ggplot()+geom_line(data=d[d$day!="Week",],aes(x=day, y=count, group=month, colour=month))+geom_bar(data=d[d$day=="Week",],aes(x=day, y=count, fill=month),position="dodge")
How can I correctly display the order of days on the x-axis?
You can use scale_x_discrete() with the limits= argument to set the order of the breaks. Since the factor levels in your original data frame are already in the right order, you can just use limits=levels(d$day).
ggplot()+
  geom_line(data=d[d$day!="Week",],
            aes(x=day, y=count, group=month, colour=month))+
  # stat="identity" is required on current ggplot2 versions because y is mapped
  geom_bar(data=d[d$day=="Week",],
           aes(x=day, y=count, fill=month), stat="identity", position="dodge")+
  scale_x_discrete(limits=levels(d$day))
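The reason levels(d$day) is a convenient value for limits= is that subsetting a data frame keeps the full set of factor levels, in their original order. A minimal sketch with a shortened, made-up version of the data:

```r
# Shortened stand-in for the question's data frame
day <- factor(c("Mon", "Tues", "Week"), levels = c("Mon", "Tues", "Week"))
d <- data.frame(day = day, count = c(4, 5, 5.57))

# Subsetting rows does not drop unused levels
week_only <- d[d$day == "Week", ]
print(levels(week_only$day))

# The values actually present in the subset are a different matter
print(unique(as.character(week_only$day)))
```

So levels(d$day) always yields the complete ordered vector, regardless of which rows each layer uses.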