Plot two lines on the same y-axis; ggplot, R

I have a ggplot graph and I want to draw two lines on it (from different columns, but for the same date). What I get are two lines that are stacked on each other, but I want to have the same y-axis, ordered correctly, with the lines overlapping each other.
This is the data I'm trying to plot:
final_table:
    Month             a              b
1 2018-04 758519.397875 2404429.258675
2 2018-05 964792.603725  1995902.14473
3 2018-06 703170.240575  1294997.84319
This is my code:
bla3 <- melt(final_table, id='Month')
ggplot(data=bla3, aes(x=Month, y=value, colour= variable, group=variable)) +
geom_line()
And the output I get (notice the y-axis is totally wrong and unordered).

I guess that your date variable is not in the right format. E.g. if you run
class(final_table$Month)
this should yield "Date". If it does not, you need to convert it to the right format. Here's an example with your numbers.
Month <- as.character(c("2018-04", "2018-05", "2018-06")) #or convert it to character after
a <- c(758519.397875, 964792.603725, 703170.240575)
b <- c(2404429.258675, 1995902.14473, 1294997.84319)
final_table <- data.frame(Month, a, b)
#your Month variable is messed up, you actually need the day!
final_table$Month <- as.Date(paste(final_table$Month,"-01",sep=""))
library(reshape) #need to load that for melt
library(ggplot2) #need to load that for ggplot
bla3 <- melt(final_table, id='Month')
ggplot(data=bla3, aes(x=Month, y=value, colour= variable, group=variable)) +
geom_line()
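
As a quick sanity check of the conversion step above, here is a minimal base R sketch (the vector names are only for illustration). It assumes Month arrives as "YYYY-MM" text, which as.Date() cannot parse on its own because the day is missing.

```r
# Month values as they appear in the question ("YYYY-MM", no day)
month_chr <- c("2018-04", "2018-05", "2018-06")

# Append a dummy first-of-month day so as.Date() can parse the strings
month_date <- as.Date(paste0(month_chr, "-01"))

class(month_chr)    # "character": ggplot treats this as a discrete axis
class(month_date)   # "Date": ggplot treats this as a continuous date axis
```

If class() on your own column returns "factor" or "character", that is a hint the same conversion is probably needed before plotting.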

Related

Plotting columns as series

I have a dataframe with 12 columns (one for each month of the year) and an id. Each record corresponds to the transaction amounts (in dollars) a customer has made over the last twelve months. I want to plot these columns as a series, for every customer in the dataframe. The x-axis will be the month index and the y-axis the dollar value. So basically I need one line per customer, all on the same graph.
Code for generating random data
a <- data.frame(id = seq(1,1000,1))
b <- data.frame(replicate(12,sample(1000:100000,1000,rep=TRUE)))
df <- cbind(a,b)
This is what I tried, but it's not what I want:
library(reshape2)
library(ggplot2)
df_lg <- melt(df, id = 'id') # convert from wide to tall
ggplot(data=df_lg,
aes(x=variable, y=value, colour=variable)) +
geom_line()
Any ideas how to do this?
Just add group to your aesthetics: both colour and group should be the id variable, since that is what you want in the legend.
ggplot(data=df_lg,
aes(x=variable, y=value, colour=id, group = id)) +
geom_line()

ggplot why are bars not stacked?

I would like to create a stacked bar graph however my output shows overlaid bars instead of stacked. How can I rectify this?
#Create data
date <- as.Date(rep(c("1/1/2016", "2/1/2016", "3/1/2016", "4/1/2016", "5/1/2016"),2))
sales <- c(23,52,73,82,12,67,34,23,45,43)*1000
geo <- c(rep("Western Territory",5), rep("Eastern Territory",5))
data <- data.frame(date, sales, geo)
#Plot
library(ggplot2)
ggplot(data=data, aes(x=date, y=sales, fill=geo))+
stat_summary(fun.y=sum, geom="bar") +
ggtitle("TITLE")
Plot output:
As you can see from the summarized table below, it confirms the bars are not stacked:
>#Verify plot is correct
>ddply(data, c("date"), summarize, total=sum(sales))
date total
1 0001-01-20 90000
2 0002-01-20 86000
3 0003-01-20 96000
4 0004-01-20 127000
5 0005-01-20 55000
Thanks!
You have to include position="stack" in your stat_summary (note that in ggplot2 3.3.0 and later, fun.y is deprecated in favour of fun):
stat_summary(position="stack",fun.y=sum, geom="bar")
Alternatively, since your data are already summarized, you could use geom_col (the shorthand for geom_bar(stat = "identity")):
ggplot(data=data, aes(x=date, y=sales, fill=geo))+
geom_col() +
scale_x_date(date_labels = "%b-%d")
Produces:
Note that I changed the date formatting (by adding format = "%m/%d/%Y" to the as.Date call) and explicitly set the axis label formatting.
If your actual data have more than one entry per period, you can always summarise first, then pass that into ggplot instead of the raw data.
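
A minimal sketch of that "summarise first" route, assuming made-up raw data with two sales records per date and territory (the raw and totals names and the numbers are hypothetical, not from the question). It uses base R aggregate() for the summing step; the plot itself is only built if ggplot2 is installed.

```r
# Hypothetical raw data: two records per date/territory combination
raw <- data.frame(
  date  = as.Date(rep(c("1/1/2016", "2/1/2016"), each = 4), format = "%m/%d/%Y"),
  geo   = rep(c("Western Territory", "Eastern Territory"), times = 4),
  sales = c(10, 20, 30, 40, 50, 60, 70, 80) * 1000
)

# Collapse to one row per date and territory before plotting
totals <- aggregate(sales ~ date + geo, data = raw, FUN = sum)

# geom_col() stacks the fill groups by default
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(totals, aes(x = date, y = sales, fill = geo)) +
    geom_col()
}
```

Summarising up front also makes it easy to inspect the totals table directly and compare it against the bar heights.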

How to rearrange string X axis tick labels in ggplot2

I am trying to plot a trip length distribution (for every 10-mile increase in distance, I want to find the percent of trips in that bin for that specific year). When I plot it in ggplot2, my x-axis tick labels are ordered alphabetically rather than in order of increasing distance. I have tried the various tricks suggested (Change the order of a discrete x scale) but am not getting anywhere. My code is below and the dataset is here (http://goo.gl/W1jjfL).
library(ggplot2)
library(reshape2)
nwpt <- subset(nonwork, select=c(Distance, PersonTrips1995, PersonTrips2001, PersonTrips2009))
nwpt <- melt(nwpt, id.vars="Distance")
ggplot(data=nwpt, aes(x=Distance, y=value, group=variable, colour=variable)) + scale_x_discrete(name="Distance") + geom_line(size=0.5) + ggtitle("Non Work Person Trips") + ylab("Percent")
I checked to see if the Distance variable is a factor and it is as shown below:
is.factor(nwpt$Distance)
[1] TRUE
However, the output I am getting is not as I desire. Instead of Under 10 Miles being the first category, 10-14 miles being next etc. I get the plot like shown below (PDF here: http://goo.gl/V7yvxT).
Any help is appreciated.
TIA
Krishnan
Here's one way:
library(ggplot2)
library(reshape2)
nwpt <- subset(nonwork,
select=c(DID,Distance,PersonTrips1995,PersonTrips2001,PersonTrips2009))
nwpt <- melt(nwpt, id.vars=c("DID","Distance"))
ggplot(data=nwpt, aes(x=DID, y=value, colour=variable)) +
geom_line(size=0.5) +
labs(title="Non Work Person Trips", y="Percent") +
scale_x_discrete(name="Distance", labels=nwpt$Distance) +
theme(axis.text.x=element_text(angle=90))
Produces this with your dataset:
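
Another common approach, sketched here under assumptions, is to set the factor levels explicitly so that ggplot2 follows that order on a discrete axis. The bin labels below are hypothetical stand-ins, since the actual values of Distance in the dataset are not shown in full.

```r
# Hypothetical distance bins, listed in the intended plotting order
bins <- c("Under 10 Miles", "10-14 Miles", "15-19 Miles", "20 Miles or More")

# A factor built without explicit levels sorts its levels alphabetically
distance <- factor(c("15-19 Miles", "Under 10 Miles",
                     "20 Miles or More", "10-14 Miles"))
levels(distance)

# Rebuild the factor with the levels in the intended order
distance_ordered <- factor(distance, levels = bins)
levels(distance_ordered)
```

With the real data, the same idea would be applied to nwpt$Distance before calling ggplot, using the full list of bin labels in increasing order.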

R - ggplot2 - geom_line - Get rid of straight line for missing values

I have data that I am trying to plot. I have several variables that range from the years 1880-2012. I have one observation per year. But sometimes a variable does not have an observation for a number of years. For example, it may have an observation from 1880-1888, but then not from 1889-1955 and then from 1956-2012. I would like ggplot2 + geom_line to not have anything in the missing years (1889-1955). But it connects 1888 and 1956 with a straight line. Is there anything I can do to remove this line? I am using the ggplot function.
Unrelated question, but is there a way to get ggplot to not sort my variable names in the legend alphabetically? I have code like this:
ggplot(dataFrame, aes(Year, value, colour=Name)) + geom_line()
Or to add numbers in front of the variable names (Name1, ..., Name10) to the legend. For example,
1. Name1
2. Name2
...
10. Name10
Here's some sample data to answer your questions. I've added the geom_point() function to make it easier to see which values are in the data:
library(ggplot2)
set.seed(1234) # make the random sample data reproducible
dat <- data.frame(Year=rep(2000:2013,5),
value=rep(1:5,each=14)+rnorm(5*14,0,.5),
Name=rep(c("Name1","End","First","Name2","Name 3"),each=14))
dat2 <- dat
dat2$value[sample.int(5*14,12)] <- NA
dat3 is probably what your data looks like, except that I'm treating Year as an integer.
dat3 <- dat2[!is.na(dat2$value),]
# POINTS ARE CONNECTED WITH NO DATA IN BETWEEN #
ggplot(dat3, aes(Year, value, colour=Name)) +
geom_line() + geom_point()
However, if you add rows to your data for the years that are missing and set their value to NA, then you'll get the gaps when you plot the data.
# POINTS ARE NOT CONNECTED #
ggplot(dat2, aes(Year, value, colour=Name)) +
geom_line() + geom_point()
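
If your data simply has no rows for the missing years, one way to add them is sketched below with base R. The obs data frame is a made-up example (one series observed 1886-1888 and 1956-1958), and the names full and padded are only for illustration.

```r
# Hypothetical data with a gap: no rows at all for 1889-1955
obs <- data.frame(Year  = c(1886:1888, 1956:1958),
                  Name  = "Name1",
                  value = c(1, 2, 3, 4, 5, 6))

# Every Year/Name combination that should exist
full <- expand.grid(Year = min(obs$Year):max(obs$Year),
                    Name = unique(obs$Name),
                    stringsAsFactors = FALSE)

# Left join: years without an observation get value = NA
padded <- merge(full, obs, by = c("Year", "Name"), all.x = TRUE)
```

Plotting padded instead of obs with geom_line() should then leave the 1889-1955 stretch empty rather than joining 1888 to 1956.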
And finally, to answer your last question this is how you change the order and labels of Name in the legend:
# CHANGE THE ORDER AND LABELS IN THE LEGEND #
ggplot(dat2, aes(Year, value, colour=Name)) +
geom_line() + geom_point() +
scale_colour_discrete(labels=c("Beginning","Name 1","Name 2","Name 3","End"),
breaks=c("First","Name1","Name2","Name 3","End"))

Can the x-axis labels be ordered when using a combined bar & line graph in R?

Using the variables (day, month, count) in the data frame d, I use ggplot below:
day <- c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Week","Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Week")
day <- factor(day, level=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Week"))
month<-c("Jan","Jan","Jan","Jan","Jan","Jan","Jan","Jan","Feb","Feb","Feb","Feb","Feb","Feb","Feb","Feb")
month<-factor(month,level=c("Jan","Feb"))
count <- c(4,5,6,8,3,4,9,5.57,2,4,3,7,1,9,3,4.14)
d <- data.frame(day=day,count=count,month=month)
d
The line graph below correctly orders the days:
ggplot()+geom_line(data=d[d$day!="Week",],aes(x=day, y=count, group=month, colour=month))
The bar graph below correctly displays the two counts:
ggplot()+geom_bar(data=d[d$day=="Week",],aes(x=day, y=count, fill=month),position="dodge")
However, the order of days is incorrect in the combined graph:
ggplot()+geom_line(data=d[d$day!="Week",],aes(x=day, y=count, group=month, colour=month))+geom_bar(data=d[d$day=="Week",],aes(x=day, y=count, fill=month),position="dodge")
How can I correctly display the order of days on the x-axis?
You can use scale_x_discrete() with the limits= argument to change the order of the breaks. Since the factor levels in your original data frame are already in the right order, you can just use limits=levels(d$day).
ggplot()+
geom_line(data=d[d$day!="Week",],
aes(x=day, y=count, group=month, colour=month))+
# stat="identity" is required by current ggplot2 versions when y is mapped
geom_bar(data=d[d$day=="Week",],
aes(x=day, y=count, fill=month),stat="identity",position="dodge")+
scale_x_discrete(limits=levels(d$day))
