I ran a simulation for some populations. Now I want to plot the change of particular characteristics of these populations over time as a line plot. The common x axis shows the number of generations.
Below is a minimal working example of my R code so far (dummy data):
require(ggplot2)
set.seed(3)
x <- 99:0
y <- 0.5+cumsum(rnorm(100, 0, 0.01))
xy <- data.frame(x,y)
ggplot(data=xy, aes(x=x, y=y)) +
geom_line() +
xlab("Generation number") +
ylab("Character")
However, now I'd like to add a second x axis which gives the number of years before present (BP), assuming that the average generation time is 22.5 years. Thus, the lowest generation number will have the highest value on the second axis and vice versa. Any idea how I could achieve this?
Thanks a lot in advance for your suggestions and help!
If you just want to add a second x axis, use the sec.axis argument of scale_x_continuous. You can also put a calculation there:
ggplot(data=xy, aes(x=x, y=y)) +
geom_line() +
scale_x_continuous(sec.axis=(~.+5)) +
xlab("Generation number") +
ylab("Character")
OK, thanks to @sambold. Here's my solution based on her/his suggestion:
ggplot(data=xy, aes(x=x, y=y)) +
geom_line() +
scale_x_continuous(sec.axis=(~.*-22.5+2250)) +
xlab("Generation number") +
ylab("Character")
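For reference, the same idea can be written with an explicit sec_axis() call, which also lets you label the secondary axis. This is only a sketch under the assumptions of the question (100 generations, 22.5 years per generation); the helper gen_to_bp and the axis title "Years BP" are names I made up for illustration:

```r
library(ggplot2)

set.seed(3)
xy <- data.frame(x = 99:0, y = 0.5 + cumsum(rnorm(100, 0, 0.01)))

# Hypothetical helper: generation number -> years before present,
# using the same transformation as above (2250 - 22.5 * generation)
gen_to_bp <- function(gen) 2250 - 22.5 * gen

p <- ggplot(xy, aes(x = x, y = y)) +
  geom_line() +
  scale_x_continuous(
    name = "Generation number",
    sec.axis = sec_axis(~ gen_to_bp(.), name = "Years BP")
  ) +
  ylab("Character")
# print(p) draws the plot
```

Keeping the conversion in a named function makes it easy to check the end points (generation 0 maps to 2250 years BP, generation 99 to 22.5) before plotting.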
I have noticed an odd behavior in geom_path() in ggplot2. I am not sure whether I am doing something wrong or whether it's a bug.
Here's my data set:
x <- abs(rnorm(10))
y <- abs(rnorm(10)/10)
categs <- c("a","b","c","d","e","f","g","h","i","j")
df <- data.frame(x,y,categs)
I make a plot with points and I join them using geom_path. Works well:
ggplot(df, aes(categs, x, group=1)) + geom_point() + geom_errorbar(aes(ymin=x-y, ymax=x+y)) + geom_path()
However, if I reorder my levels, for instance like this:
df$categs <- factor(df$categs, levels = c("f","i","c","g","e","a","d","h","b","j"))
then geom_path still keeps the original order (although the order of the factor levels has been updated on the x axis).
Any guesses at what I am doing wrong? Thanks.
Order the rows of df by df$categs; geom_path connects the points row by row:
ggplot(df[ order(df$categs), ], aes(categs, x, group=1)) +
geom_point() +
geom_errorbar(aes(ymin=x-y, ymax=x+y)) +
geom_path()
From ?geom_path manual:
geom_path() connects the observations in the order in which they appear in the data.
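To make the difference concrete, here is a small self-contained sketch (the seed and the names lvls, df_sorted, p_path and p_line are my own additions). It sorts the rows by the reordered factor before plotting. It also shows geom_line() as an alternative, since geom_line() connects observations in order of the x variable rather than in row order:

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(
  x = abs(rnorm(10)),
  y = abs(rnorm(10) / 10),
  categs = letters[1:10]
)
lvls <- c("f", "i", "c", "g", "e", "a", "d", "h", "b", "j")
df$categs <- factor(df$categs, levels = lvls)

# order() on a factor sorts by level position, so rows follow the axis order
df_sorted <- df[order(df$categs), ]

p_path <- ggplot(df_sorted, aes(categs, x, group = 1)) +
  geom_point() +
  geom_errorbar(aes(ymin = x - y, ymax = x + y)) +
  geom_path()

# Alternative without sorting: geom_line() orders by x itself
p_line <- ggplot(df, aes(categs, x, group = 1)) +
  geom_point() +
  geom_errorbar(aes(ymin = x - y, ymax = x + y)) +
  geom_line()
```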
I'm sure this is a very simple question for most of you, but I'm new and can't figure it out. How do you create a side by side box plot grouped by time? For example, I have 24 months of data. I want to make one box plot for the first 12 months, and another for the second 12 months. My data can be seen below.
Month,Revenue
1,94000
2,81000
3,117000
4,105000
5,117000
6,89000
7,101000
8,118000
9,105000
10,123000
11,109000
12,89000
13,106000
14,159000
15,121000
16,135000
17,116000
18,133000
19,144000
20,130000
21,142000
22,124000
23,140000
24,104000
Since your data has a time ordering, it might be illuminating to plot line plots by month for each year separately. Here is code for both a line plot and a boxplot. I just made up the year values in the code below, but you can make those whatever is appropriate:
library(ggplot2)
# Assuming your data frame is called "dat"
dat$Month.abb = month.abb[rep(1:12,2)]
dat$Month.abb = factor(dat$Month.abb, levels=month.abb)
dat$Year = rep(2014:2015, each=12)
ggplot(dat, aes(Month.abb, Revenue, colour=factor(Year))) +
geom_line(aes(group=Year)) + geom_point() +
scale_y_continuous(limits=c(0,max(dat$Revenue))) +
theme_bw() +
labs(colour="Year", x="Month")
ggplot(dat, aes(factor(Year), Revenue)) +
geom_boxplot() +
scale_y_continuous(limits=c(0,max(dat$Revenue))) +
theme_bw() +
labs(x="Year")
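If you only need the two side-by-side boxes, the grouping variable can also be derived directly from the month number. A minimal sketch, assuming the data frame is called dat as above; the column name Period and the labels "Months 1-12" and "Months 13-24" are placeholders of mine:

```r
library(ggplot2)

dat <- data.frame(
  Month = 1:24,
  Revenue = c(94000, 81000, 117000, 105000, 117000, 89000,
              101000, 118000, 105000, 123000, 109000, 89000,
              106000, 159000, 121000, 135000, 116000, 133000,
              144000, 130000, 142000, 124000, 140000, 104000)
)

# cut() bins the months into (0, 12] and (12, 24]
dat$Period <- cut(dat$Month, breaks = c(0, 12, 24),
                  labels = c("Months 1-12", "Months 13-24"))

# One box per period, drawn side by side
p <- ggplot(dat, aes(Period, Revenue)) +
  geom_boxplot() +
  theme_bw()
```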
I want to add an average line to the existing plot.
library(ggplot2)
A <- c(1:10)
B <- c(1,1,2,2,3,3,4,4,5,5)
donnees <- data.frame(A,B)
datetime<-donnees[,2]
Indcatotvalue<-donnees[,1]
df<-donnees
mn<-tapply(donnees[,1],donnees[,2],mean)
moyenne <- data.frame(template=names(mn),mean=mn)
ggplot(data=df,
aes_q(x=datetime,
y=Indcatotvalue)) + geom_line()
I have tried to add :
geom_line(aes(y = moyenne[,2], colour = "blue"))
or :
lines(moyenne[,1],moyenne[,2],col="blue")
but nothing happens. In particular, I don't understand why the function "lines" has no effect.
When you say average line I'm assuming you want to plot a line that represents the average value of Y (Indcatotvalue). For that you want to use geom_hline() which plots horizontal lines on your graph:
ggplot(data=df,aes_q(x=datetime,y=Indcatotvalue)) +
geom_line() +
geom_hline(yintercept = mean(Indcatotvalue), color="blue")
Which, with the example numbers you gave, will give you a plot that looks like this:
The function stat_summary is perfect here.
I found the answer in a Google Groups post by Brian Diggs:
p + stat_summary(aes(group=bucket), fun.y=mean, geom="line", colour="green")
You need to set the group to the faceting variable explicitly since
otherwise it will be type and bucket (which looks like type since type
is nested in bucket).
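Applied to the example data from the question, this might look like the sketch below. Two assumptions on my part: stat_summary() is called with fun = mean, which I believe requires ggplot2 3.3.0 or later (older versions use fun.y, as in the quote), and the columns are mapped by name with aes() rather than aes_q(). Note also that lines() belongs to base graphics and cannot add to a ggplot.

```r
library(ggplot2)

donnees <- data.frame(A = 1:10, B = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5))

# Mean of A for each value of B, for comparison with the summary line
mn <- tapply(donnees$A, donnees$B, mean)

p <- ggplot(donnees, aes(x = B, y = A)) +
  geom_line() +
  # one mean per x value, joined by a blue line
  stat_summary(fun = mean, geom = "line", colour = "blue")
```

Because the summary is computed inside the layer, there is no need to build a separate data frame of means just for plotting.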
I am trying to plot a trip length distribution (for every 10-mile increase in distance, I want to find the percent of trips in that bin for that specific year). When I plot it in ggplot2, my x-axis tick labels are ordered alphabetically rather than in order of increasing distance. I have tried the various tricks suggested (Change the order of a discrete x scale) but am not getting anywhere. My code is below and the dataset is here (http://goo.gl/W1jjfL).
library(ggplot2)
library(reshape2)
nwpt <- subset(nonwork, select=c(Distance, PersonTrips1995, PersonTrips2001, PersonTrips2009))
nwpt <- melt(nwpt, id.vars="Distance")
ggplot(data=nwpt, aes(x=Distance, y=value, group=variable, colour=variable)) + scale_x_discrete(name="Distance") + geom_line(size=0.5) + ggtitle("Non Work Person Trips") + ylab("Percent")
I checked to see if the Distance variable is a factor and it is as shown below:
is.factor(nwpt$Distance)
[1] TRUE
However, the output is not what I want. Instead of Under 10 Miles being the first category, 10-14 miles the next, and so on, I get the plot shown below (PDF here: http://goo.gl/V7yvxT).
Any help is appreciated.
TIA
Krishnan
Here's one way:
library(ggplot2)
library(reshape2)
nwpt <- subset(nonwork,
select=c(DID,Distance,PersonTrips1995,PersonTrips2001,PersonTrips2009))
nwpt <- melt(nwpt, id.vars=c("DID","Distance"))
ggplot(data=nwpt, aes(x=DID, y=value, colour=variable)) +
geom_line(size=0.5) +
labs(title="Non Work Person Trips", y="Percent") +
scale_x_discrete(name="Distance", labels=nwpt$Distance) +
theme(axis.text.x=element_text(angle=90))
Produces this with your dataset:
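Another common approach is to fix the order of the factor levels yourself, because a discrete axis follows the factor levels and factor() sorts them alphabetically by default. The sketch below uses made-up data, since the linked dataset is not reproduced here: the data frame trips, its values, and every distance label other than "Under 10 Miles" and "10-14 miles" are hypothetical. It assumes the rows already appear in the desired order.

```r
library(ggplot2)

# Hypothetical stand-in for the real dataset
trips <- data.frame(
  Distance = c("Under 10 Miles", "10-14 miles", "15-19 miles", "20-24 miles"),
  Percent = c(40, 30, 20, 10),
  stringsAsFactors = FALSE
)

# Default factor() sorts the levels alphabetically
default_levels <- levels(factor(trips$Distance))

# levels = unique(...) keeps the order of first appearance instead
trips$Distance <- factor(trips$Distance, levels = unique(trips$Distance))

p <- ggplot(trips, aes(x = Distance, y = Percent, group = 1)) +
  geom_line() +
  theme(axis.text.x = element_text(angle = 90))
```

With the levels set this way, no helper column such as DID is needed to get the axis in the right order.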
The following code
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
ggplot(m, aes(value)) +
facet_wrap(~variable,ncol=2,scales="free_x") +
geom_histogram()
produces 4 graphs with fixed y axis (which is what I want). However, by default, the y axis is only displayed on the left side of the faceted graph (i.e. on the side of 1st and 3rd graph).
What do I do to make the y axis show itself on all 4 graphs? Thanks!
EDIT: As suggested by @Roland, one could set scales="free" and use ylim(c(0,30)), but I would prefer not to have to set the limits manually every time.
@Roland also suggested using hist and ddply outside of ggplot to get the maximum count. Isn't there a ggplot2-based solution?
EDIT: There is a very elegant solution from @baptiste. However, when changing binwidth, it starts to behave oddly (at least for me). Check this example with the default binwidth (range/30). The values on the y axis are between 0 and 30,000.
library(ggplot2)
library(reshape2)
m=melt(data=diamonds[,c("x","y","z")])
ggplot(m,aes(x=value)) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram() +
geom_blank(aes(y=max(..count..)), stat="bin")
And now this one.
ggplot(m,aes(x=value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin")
The binwidth is now set to 0.5, so the highest frequency should change (decrease, in fact, since narrower bins contain fewer observations). However, nothing happened to the y axis; it still covers the same range of values, creating a huge empty space in each graph.
[The problem is solved... see @baptiste's edited answer.]
Is this what you're after?
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin", binwidth=0.5)
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
ylim(c(0,30)) +
geom_histogram()
Didzis Elferts in https://stackoverflow.com/a/14584567/2416535 suggested using ggplot_build() to get the values of the bins used in geom_histogram (ggplot_build() provides data used by ggplot2 to plot the graph). Once you have your graph stored in an object, you can find the values for all the bins in the column count:
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
plot = ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value))
ggplot_build(plot)$data[[1]]$count
Therefore, I tried to replace the max y limit by this:
max(ggplot_build(plot)$data[[1]]$count)
and managed to get a working example:
m=melt(data=diamonds[,c("x","y","z")])
bin=0.5 # you can use this to try out different bin widths to see the results
plot=
ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value),binwidth=bin)
ggplot(m) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram(aes(x=value),binwidth=bin) +
ylim(c(0,max(ggplot_build(plot)$data[[1]]$count)))
It does the job, albeit clumsily. It would be nice if someone improved upon that to eliminate the need to create 2 graphs, or rather the same graph twice.
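One way to avoid writing the plot code twice might be to store the plot once and add the limit to the stored object. This is a sketch under a few assumptions of mine: it uses iris reshaped with base R's stack() instead of melt(), a bin width of 0.5, and the names p, max_count and p_fixed are my own. The plot is still built twice internally (once by ggplot_build(), once when drawn), but the code is written only once:

```r
library(ggplot2)

# stack() returns columns "values" and "ind" (the original column name)
m <- stack(iris[, 1:4])

p <- ggplot(m, aes(x = values)) +
  facet_wrap(~ind, ncol = 2, scales = "free") +
  geom_histogram(binwidth = 0.5)

# Highest bin count across all panels
max_count <- max(ggplot_build(p)$data[[1]]$count)

# Reuse the stored plot and only add the common y limit
p_fixed <- p + ylim(c(0, max_count))
```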