How to rearrange string X axis tick labels in ggplot2 - r

I am trying to plot a trip length distribution (for every 10-mile increase in distance, I want to find the percent of trips in that bin for that specific year). When I plot it in ggplot2, my X-axis tick labels are ordered alphabetically rather than in order of increasing distance. I have tried the various tricks suggested (Change the order of a discrete x scale) but am not getting anywhere. My code is below and the dataset is here (http://goo.gl/W1jjfL).
library(ggplot2)
library(reshape2)
nwpt <- subset(nonwork, select=c(Distance, PersonTrips1995, PersonTrips2001, PersonTrips2009))
nwpt <- melt(nwpt, id.vars="Distance")
ggplot(data=nwpt, aes(x=Distance, y=value, group=variable, colour=variable)) + scale_x_discrete(name="Distance") + geom_line(size=0.5) + ggtitle("Non Work Person Trips") + ylab("Percent")
I checked to see if the Distance variable is a factor and it is as shown below:
is.factor(nwpt$Distance)
[1] TRUE
However, the output is not what I want. Instead of Under 10 Miles being the first category, 10-14 miles the next, and so on, I get the plot shown below (PDF here: http://goo.gl/V7yvxT).
Any help is appreciated.
TIA
Krishnan

Here's one way:
library(ggplot2)
library(reshape2)
nwpt <- subset(nonwork,
select=c(DID,Distance,PersonTrips1995,PersonTrips2001,PersonTrips2009))
nwpt <- melt(nwpt, id.vars=c("DID","Distance"))
ggplot(data=nwpt, aes(x=DID, y=value, colour=variable)) +
geom_line(size=0.5) +
labs(title="Non Work Person Trips", y="Percent") +
scale_x_discrete(name="Distance", labels=nwpt$Distance) +
theme(axis.text.x=element_text(angle=90))
Produces this with your dataset:
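Another common route, sketched here with made-up numbers rather than the asker's dataset, is to set the factor levels of Distance explicitly so that ggplot2 follows that order instead of the alphabetical default. The bin labels and values below are assumptions for illustration only.

```r
library(ggplot2)

# Made-up data standing in for the asker's dataset (not the real values)
bins <- c("Under 10 Miles", "10-14 miles", "15-19 miles", "20-24 miles")
nwpt <- data.frame(
  Distance = rep(bins, times = 2),
  variable = rep(c("PersonTrips1995", "PersonTrips2001"), each = length(bins)),
  value    = c(40, 25, 20, 15, 38, 27, 21, 14)
)

# Fix the level order explicitly; by default factor() sorts levels alphabetically
nwpt$Distance <- factor(nwpt$Distance, levels = bins)

# The discrete x axis now follows the level order given above
p <- ggplot(nwpt, aes(x = Distance, y = value, group = variable, colour = variable)) +
  geom_line() +
  labs(title = "Non Work Person Trips", y = "Percent")
```

If the rows of the real data are already in the desired order, levels = unique(as.character(nonwork$Distance)) should give the same effect without typing the labels by hand.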

Related

r - Add a 2nd x axis to a ggplot

I ran a simulation for some populations. Now I want to plot the change of particular characteristics of these populations over time as a line plot. The common x axis shows the number of generations.
Below is a minimum working example for my R code so far (dummy data):
require(ggplot2)
set.seed(3)
x <- 99:0
y <- 0.5+cumsum(rnorm(100, 0, 0.01))
xy <- data.frame(x,y)
ggplot(data=xy, aes(x=x, y=y)) +
geom_line() +
xlab("Generation number") +
ylab("Character")
However, now I'd like to add a second x axis which gives the number of years before present (BP), assuming that the average generation time is 22.5 years. Thus, the lowest generation number will have the highest value on the 2nd axis and vice versa. Any idea how I could achieve this?
Thanks a lot in advance for your suggestions and help!
If you just want to add a second x axis, then use sec.axis in scale_x_continuous ... you could also add some calculations there ...
ggplot(data=xy, aes(x=x, y=y)) +
geom_line() +
scale_x_continuous(sec.axis=(~.+5)) +
xlab("Generation number") +
ylab("Character")
Ok, thanks to @sambold. Here's my solution based on her/his suggestion:
ggplot(data=xy, aes(x=x, y=y)) +
geom_line() +
scale_x_continuous(sec.axis=(~.*-22.5+2250)) +
xlab("Generation number") +
ylab("Character")
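The same conversion can also be written with sec_axis(), which lets the second axis carry its own title. This is only a sketch: the helper name to_years_bp and the axis title "Years BP" are made up here, and the zero point at generation 100 simply follows from the 2250 offset used above.

```r
library(ggplot2)

set.seed(3)
xy <- data.frame(x = 99:0, y = 0.5 + cumsum(rnorm(100, 0, 0.01)))

gen_time <- 22.5  # years per generation, as stated in the question

# Hypothetical helper: same arithmetic as ~ . * -22.5 + 2250
to_years_bp <- function(gen) (100 - gen) * gen_time

p <- ggplot(xy, aes(x = x, y = y)) +
  geom_line() +
  scale_x_continuous(
    name = "Generation number",
    sec.axis = sec_axis(~ to_years_bp(.), name = "Years BP")
  ) +
  ylab("Character")
```

Keeping the conversion in a named function makes it easy to check a few values by hand before plotting.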

ggplot2: Time-series plot by continuous variable, color/fill by group

I have searched considerably for what I want to accomplish, but I haven't run across examples or plots that are specifically what I'm looking for, so I am reaching out to the community.
What I have (data downloadable here):
Time-series data (each record 2 hours apart and spanning nearly a year) with associated elevation and property ownership.
library(ggplot2)
data <- read.csv("dataex.csv")
data$timestamp <- as.POSIXct(as.character(data$timestamp), format="%m/%d/%Y %H:%M", tz="GMT")
What I want (via ggplot):
A line or bar plot showing elevation (y-axis) across time (x-axis) for each data record, colored by ownership (for a line plot, filling the area under the line; for a bar plot, filling the bar). I've tried iterations of geom_line, geom_bar, and geom_area (geom_bar, below, is the closest I have come). I'd like at least one of the following options to work!
Option A - The closest I have come to achieving this (plotting per data record) is with the following code:
ggplot(data, aes(x=timestamp, y=elev, fill=OWNER)) + geom_bar(stat="identity")
However, I'd like the bars to touch each other, but if I adjust the width in geom_bar(), everything disappears. (Also, if I run the above code on other batches of similar data, it shows only a fraction of the bars, likely because they have more data records.) It seems like it's just too much data to plot. So I tried another route...
Option B - Plotting by day, which turns out to be more informative, showing each day the variability in ownership.
ggplot(data, aes(x=as.Date(Date, format='%Y-%m-%d'), y=elev, fill=OWNER)) + geom_bar(stat="identity", width=1)
However, this sums the y-axis, so the elevation is not interpretable. I could divide the y-axis by 12 (the typical number of records per day) but there are occasional days with fewer than 12 records, which causes the y-axis to be incorrect. Is there a function or a way to divide the y-axis by the respective number of records per day that is being represented in the plot? Or does someone have advice for a better solution?
Something like:
library(readr)
library(dplyr)
library(ggplot2)
library(ggalt)
readr::read_csv("~/Desktop/dataex.csv") %>%
mutate(timestamp=lubridate::mdy_hm(timestamp)) %>%
select(timestamp, elev, Owner=OWNER) -> df
ggplot(df, aes(timestamp, elev, group=Owner, color=Owner)) +
geom_segment(aes(xend=timestamp, yend=0), size=0.1) +
scale_x_datetime(expand=c(0,0), date_breaks="2 months") +
scale_y_continuous(expand=c(0,0), limits=c(0,2250), label=scales::comma) +
ggthemes::scale_color_tableau() +
hrbrmisc::theme_hrbrmstr(grid="Y") +
labs(x=NULL, y="Elevation") +
theme(legend.position="bottom") +
theme(axis.title.y=element_text(angle=0, margin=margin(r=-20)))
?
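For the Option B question about dividing by the number of records per day, one possibility is to rescale each elevation by its day's record count before plotting, so that each stacked bar adds up to that day's mean elevation. The sketch below uses made-up records in place of dataex.csv; the column name elev_share and the owner names are assumptions.

```r
library(ggplot2)

# Made-up records standing in for dataex.csv (column names follow the question)
df <- data.frame(
  Date  = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01",
                    "2015-01-02", "2015-01-02")),
  elev  = c(1200, 1500, 900, 1000, 2000),
  OWNER = c("Private", "State", "Private", "State", "State")
)

# Divide each elevation by the number of records on its day, so the
# stacked bar for a day sums to that day's mean elevation
df$elev_share <- ave(df$elev, format(df$Date), FUN = function(v) v / length(v))

p <- ggplot(df, aes(x = Date, y = elev_share, fill = OWNER)) +
  geom_col(width = 1) +
  labs(y = "Mean elevation per day")
```

Because the divisor is computed per day, days with fewer than 12 records are handled without any special casing.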

ggplot why are bars not stacked?

I would like to create a stacked bar graph however my output shows overlaid bars instead of stacked. How can I rectify this?
#Create data
date <- as.Date(rep(c("1/1/2016", "2/1/2016", "3/1/2016", "4/1/2016", "5/1/2016"),2))
sales <- c(23,52,73,82,12,67,34,23,45,43)*1000
geo <- c(rep("Western Territory",5), rep("Eastern Territory",5))
data <- data.frame(date, sales, geo)
#Plot
library(ggplot2)
ggplot(data=data, aes(x=date, y=sales, fill=geo))+
stat_summary(fun.y=sum, geom="bar") +
ggtitle("TITLE")
Plot output:
As you can see from the summarized table below, it confirms the bars are not stacked:
> # Verify plot is correct
> ddply(data, c("date"), summarize, total=sum(sales))
        date  total
1 0001-01-20  90000
2 0002-01-20  86000
3 0003-01-20  96000
4 0004-01-20 127000
5 0005-01-20  55000
Thanks!
You have to include position="stack" in your stat_summary call (in ggplot2 3.3.0 and later, fun.y is deprecated in favour of fun):
stat_summary(position="stack", fun.y=sum, geom="bar")
Alternatively, since your data are already summarized, you could use geom_col (the shorthand for geom_bar(stat = "identity")):
ggplot(data=data, aes(x=date, y=sales, fill=geo))+
geom_col() +
scale_x_date(date_labels = "%b-%d")
Produces:
Note that I changed the date formatting (by adding format = "%m/%d/%Y" to the as.Date call) and explicitly set the axis label formatting.
If your actual data have more than one entry per period, you can always summarise first, then pass that into ggplot instead of the raw data.
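A minimal sketch of that summarise-first step, using base R's aggregate() on made-up raw data with two entries per date and territory (the data frame names raw and totals are placeholders):

```r
library(ggplot2)

# Made-up raw data: two entries per date and territory
raw <- data.frame(
  date  = as.Date(rep(c("1/1/2016", "2/1/2016"), each = 4), format = "%m/%d/%Y"),
  geo   = rep(c("Western Territory", "Eastern Territory"), times = 4),
  sales = c(10, 20, 13, 47, 30, 14, 22, 20) * 1000
)

# One row per date and territory, with sales summed within each group
totals <- aggregate(sales ~ date + geo, data = raw, FUN = sum)

# geom_col stacks the per-territory totals for each date
p <- ggplot(totals, aes(x = date, y = sales, fill = geo)) +
  geom_col() +
  scale_x_date(date_labels = "%b-%d")
```

The same step could be done with dplyr or plyr; aggregate() is used here only to keep the example free of extra packages.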

R Side-by-Side Boxplot

I'm sure this is a very simple question for most of you, but I'm new and can't figure it out. How do you create a side by side box plot grouped by time? For example, I have 24 months of data. I want to make one box plot for the first 12 months, and another for the second 12 months. My data can be seen below.
Month,Revenue
1,94000
2,81000
3,117000
4,105000
5,117000
6,89000
7,101000
8,118000
9,105000
10,123000
11,109000
12,89000
13,106000
14,159000
15,121000
16,135000
17,116000
18,133000
19,144000
20,130000
21,142000
22,124000
23,140000
24,104000
Since your data has a time ordering, it might be illuminating to plot line plots by month for each year separately. Here is code for both a line plot and a boxplot. I just made up the year values in the code below, but you can make those whatever is appropriate:
library(ggplot2)
# Assuming your data frame is called "dat"
dat$Month.abb = month.abb[rep(1:12,2)]
dat$Month.abb = factor(dat$Month.abb, levels=month.abb)
dat$Year = rep(2014:2015, each=12)
ggplot(dat, aes(Month.abb, Revenue, colour=factor(Year))) +
geom_line(aes(group=Year)) + geom_point() +
scale_y_continuous(limits=c(0,max(dat$Revenue))) +
theme_bw() +
labs(colour="Year", x="Month")
ggplot(dat, aes(factor(Year), Revenue)) +
geom_boxplot() +
scale_y_continuous(limits=c(0,max(dat$Revenue))) +
theme_bw() +
labs(x="Year")
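If you would rather not invent year values, the two groups can also be derived from the Month column itself. This is a sketch; the column name Period and its labels are made up for illustration.

```r
library(ggplot2)

# Revenue values copied from the question
dat <- data.frame(
  Month = 1:24,
  Revenue = c(94000, 81000, 117000, 105000, 117000, 89000, 101000, 118000,
              105000, 123000, 109000, 89000, 106000, 159000, 121000, 135000,
              116000, 133000, 144000, 130000, 142000, 124000, 140000, 104000)
)

# Months 1-12 fall in the first interval (0, 12], months 13-24 in (12, 24]
dat$Period <- cut(dat$Month, breaks = c(0, 12, 24),
                  labels = c("Months 1-12", "Months 13-24"))

# One box per 12-month period, drawn side by side
p <- ggplot(dat, aes(Period, Revenue)) +
  geom_boxplot() +
  theme_bw()
```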

facet_wrap: How to add y axis to every individual graph when scales="free_x"?

The following code
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
ggplot(m, aes(value)) +
facet_wrap(~variable,ncol=2,scales="free_x") +
geom_histogram()
produces 4 graphs with fixed y axis (which is what I want). However, by default, the y axis is only displayed on the left side of the faceted graph (i.e. on the side of 1st and 3rd graph).
What do I do to make the y axis show itself on all 4 graphs? Thanks!
EDIT: As suggested by @Roland, one could set scales="free" and use ylim(c(0,30)), but I would prefer not to have to set the limits manually every time.
@Roland also suggested using hist and ddply outside of ggplot to get the maximum count. Isn't there any ggplot2-based solution?
EDIT: There is a very elegant solution from @baptiste. However, when the binwidth is changed, it starts to behave oddly (at least for me). Check this example with the default binwidth (range/30). The values on the y axis are between 0 and 30,000.
library(ggplot2)
library(reshape2)
m=melt(data=diamonds[,c("x","y","z")])
ggplot(m,aes(x=value)) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram() +
geom_blank(aes(y=max(..count..)), stat="bin")
And now this one.
ggplot(m,aes(x=value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin")
The binwidth is now set to 0.5, so the highest frequency should change (decrease, in fact, as tighter bins hold fewer observations). However, nothing happened to the y axis; it still covers the same range of values, creating a huge empty space in each graph.
[The problem is solved... see @baptiste's edited answer.]
Is this what you're after?
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin", binwidth=0.5)
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
ylim(c(0,30)) +
geom_histogram()
Didzis Elferts in https://stackoverflow.com/a/14584567/2416535 suggested using ggplot_build() to get the values of the bins used in geom_histogram (ggplot_build() provides data used by ggplot2 to plot the graph). Once you have your graph stored in an object, you can find the values for all the bins in the column count:
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
plot = ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value))
ggplot_build(plot)$data[[1]]$count
Therefore, I tried to replace the max y limit by this:
max(ggplot_build(plot)$data[[1]]$count)
and managed to get a working example:
m=melt(data=diamonds[,c("x","y","z")])
bin=0.5 # you can use this to try out different bin widths to see the results
plot=
ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value),binwidth=bin)
ggplot(m) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram(aes(x=value),binwidth=bin) +
ylim(c(0,max(ggplot_build(plot)$data[[1]]$count)))
It does the job, albeit clumsily. It would be nice if someone improved upon that to eliminate the need to create 2 graphs, or rather the same graph twice.
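One way to avoid writing the graph twice, sketched below, is to add the limit to the stored plot object rather than repeating the layers. The example uses iris and base R's stack() as a stand-in for melt(); the names max_count and plot_fixed are placeholders.

```r
library(ggplot2)

# Long format built with base R so the sketch does not need reshape2
m <- stack(iris[, 1:4])
names(m) <- c("value", "variable")
bin <- 0.5

plot <- ggplot(m) +
  facet_wrap(~variable, ncol = 2, scales = "free") +
  geom_histogram(aes(x = value), binwidth = bin)

# Tallest bar over all facets, read from the built plot
max_count <- max(ggplot_build(plot)$data[[1]]$count)

# Add the shared limit to the stored object instead of repeating the layers
plot_fixed <- plot + ylim(c(0, max_count))
```

The plot is still built twice internally (once to read the counts, once to draw), but the plotting code itself is written only once.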
