I have noticed an odd behavior in geom_path() in ggplot2. I am not sure whether I am doing something wrong or whether it's a bug.
Here's my data set:
x <- abs(rnorm(10))
y <- abs(rnorm(10)/10)
categs <- c("a","b","c","d","e","f","g","h","i","j")
df <- data.frame(x,y,categs)
I make a plot with points and I join them using geom_path. Works well:
ggplot(df, aes(categs, x, group=1)) + geom_point() + geom_errorbar(aes(ymin=x-y, ymax=x+y)) + geom_path()
However, if I reorder my levels, for instance like this:
df$categs <- factor(df$categs, levels = c("f","i","c","g","e","a","d","h","b","j"))
then geom_path still connects the points in the original order (although the order of the factor levels has been updated on the x axis).
Any guesses at what I am doing wrong? Thanks.
Order the rows of df by df$categs. geom_path connects the points row by row, in the order they appear in the data:
ggplot(df[ order(df$categs), ], aes(categs, x, group=1)) +
geom_point() +
geom_errorbar(aes(ymin=x-y, ymax=x+y)) +
geom_path()
From ?geom_path manual:
geom_path() connects the observations in the order in which they appear in the data.
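A minimal sketch of the fix above, assuming a fixed seed (not in the original) so the result is reproducible. It checks that order() on a factor follows the level order rather than alphabetical order, which is why sorting the rows makes geom_path() agree with the x axis.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only to make the sketch reproducible
df <- data.frame(
  x = abs(rnorm(10)),
  y = abs(rnorm(10) / 10),
  categs = factor(letters[1:10],
                  levels = c("f","i","c","g","e","a","d","h","b","j"))
)

# order() on a factor sorts by level position, not alphabetically
df_sorted <- df[order(df$categs), ]

# The rows now appear in the same order as the x axis, so the path runs left to right
p <- ggplot(df_sorted, aes(categs, x, group = 1)) +
  geom_point() +
  geom_errorbar(aes(ymin = x - y, ymax = x + y)) +
  geom_path()
```

The same help page says that geom_line() connects observations in order of the variable on the x axis, so it may be worth trying as an alternative to sorting the rows.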
I ran a simulation for some populations. Now I want to plot the change of particular characteristics of these populations over time as a line plot. The common x axis shows the generation number.
Below is a minimum working example for my R code so far (dummy data):
library(ggplot2)
set.seed(3)
x <- 99:0
y <- 0.5+cumsum(rnorm(100, 0, 0.01))
xy <- data.frame(x,y)
ggplot(data=xy, aes(x=x, y=y)) +
geom_line() +
xlab("Generation number") +
ylab("Character")
However, now I'd like to add a second x axis which gives the number of years before present (BP), assuming that the average generation time is 22.5 years. Thus, the lowest generation number will have the highest value on the second axis and vice versa. Any idea how I could achieve this?
Thanks a lot in advance for your suggestions and help!
If you just want to add a second x axis, then use sec.axis in scale_x_continuous ... you could also add some calculations there ...
ggplot(data=xy, aes(x=x, y=y)) +
geom_line() +
scale_x_continuous(sec.axis=(~.+5)) +
xlab("Generation number") +
ylab("Character")
OK, thanks to @sambold. Here's my solution based on her/his suggestion:
ggplot(data=xy, aes(x=x, y=y)) +
geom_line() +
scale_x_continuous(sec.axis=(~.*-22.5+2250)) +
xlab("Generation number") +
ylab("Character")
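A possible variant of the solution above that labels the second axis, sketched with sec_axis(). The helper name gen_to_years_bp and the axis title are illustrative assumptions, not part of the original answer. The conversion itself is the same as in the formula above.

```r
library(ggplot2)

set.seed(3)
xy <- data.frame(x = 99:0, y = 0.5 + cumsum(rnorm(100, 0, 0.01)))

# Hypothetical helper: same conversion as ~ . * -22.5 + 2250
gen_to_years_bp <- function(gen) 2250 - 22.5 * gen

p <- ggplot(data = xy, aes(x = x, y = y)) +
  geom_line() +
  # sec_axis() takes the transformation first and an optional axis title
  scale_x_continuous(sec.axis = sec_axis(~ gen_to_years_bp(.),
                                         name = "Years before present (BP)")) +
  xlab("Generation number") +
  ylab("Character")
```

With this conversion, generation 0 maps to 2250 years BP and generation 99 maps to 22.5 years BP.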
I have prepared a dataset that I wish to display as a histogram.
I believe I get the X axis right, but can't seem to get totmis1 on the Y axis... Just an unclear histogram:
ggplot(data = brfss2013a, aes(x = totmis)) +
geom_histogram(binwidth = 3)
tl;dr use geom_bar(stat="identity") instead of geom_histogram()
I think the terminology you are looking for is a bar chart (technically, a histogram is the result of counting/binning a continuous distribution of data; it's not clear whether you've already computed these values by binning, or whether the data mean something else, but I don't think it matters).
dd <- data.frame(totmis=1:11,
totmis1=c(5786,5086,3187,2594,1591,1318,
847,754,512,511,383))
library(ggplot2)
ggplot(dd, aes(totmis,totmis1))+
geom_bar(stat="identity")
You need stat="identity" because geom_bar() tries to count occurrences by default ...
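As a side note, recent versions of ggplot2 (2.2.0 and later) provide geom_col(), which uses the y values as bar heights directly. A short sketch with the same data as above:

```r
library(ggplot2)

dd <- data.frame(totmis = 1:11,
                 totmis1 = c(5786, 5086, 3187, 2594, 1591, 1318,
                             847, 754, 512, 511, 383))

# geom_col() draws one bar per row with height taken from y,
# so no stat = "identity" is needed
p <- ggplot(dd, aes(totmis, totmis1)) +
  geom_col()
```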
I am using the following code to plot a stacked area graph and I get the expected plot.
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) + #ggplot initial parameters
geom_ribbon(position='fill', aes(ymin=0, ymax=1))
But when I add lines that read the same data source, they are misaligned towards the right side of the graph:
P + geom_line(position='fill', aes(group=model, ymax=1))
Does anyone know why this happens? Both layers read the same data source, so I can't figure out what the problem is.
Actually, if all you wanted to do was draw an outline around the areas, then you could do the same using the colour aesthetic.
ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(position='fill', aes(ymin=0, ymax=1), colour = "black")
I have an answer that I hope works for you. It looks good, but very different from your original graph:
library(ggplot2)
DATA2 <- read.csv("C:/Users/corcoranbarriosd/Downloads/porsche model volumes.csv", header = TRUE, stringsAsFactors = FALSE)
In my experience you want x to be a numeric variable, and you have it as a string. If that is not what you want, I can change it, but this will transform your bucket into a numeric vector:
bucket.list <- strsplit(unlist(DATA2$bucket), "[^0-9]+")
x=numeric()
for (i in 1:length(bucket.list)) {
x[i] <- bucket.list[[i]][2]
}
DATA2$bucket <- as.numeric(x)
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(aes(ymin=0, ymax=volume))+ geom_line(aes(group=model, ymax=volume))
It gives me the area and the line tracking each other. I hope that's what you needed.
If you switch to using geom_path in place of geom_line, it all seems to work as expected. I don't think the ordering of geom_line is behaving the same as geom_ribbon (and suspect that geom_line -- like geom_area -- assumes a zero base y value)
ggplot(DATA2, aes(x=bucket, y=volume, ymin=0, ymax=1,
group=model, fill=model, label=volume)) +
geom_ribbon(position='fill') +
geom_path(position='fill')
This should give you lines that follow the edges of the ribbons.
I have what seems to be a very basic problem, but I cannot solve it, as I have barely used ggplot2. I just want the plot on the left to use the colors in the variable color1 and the plot on the right to use the colors in the variable color2. This is a MWE:
library(reshape2)
library(ggplot2)
a.df <- data.frame(
id=c("a","b","c","d","e","f","g","h"),
var1=c(1,2,3,4,5,6,7,8), var2=c(21,22,23,24,25,26,27,28),
var3=c(56,57,58,59,60,61,62,63),
color1=c(1,2,"NONE","NONE",1,2,2,1),
color2=c(1,"NONE",1,1,2,2,"NONE",2)
)
a.dfm <- melt(a.df, measure.vars=c("var2","var3"))
ggplot(a.dfm, aes(x=value, y=var1, color=color1)) +
geom_point(shape=1) +
facet_grid(. ~ variable)
Thanks a lot!
I think the easiest approach with your data is to create an additional column which has the color defined appropriately based on the value of variable. Since there are just two possible values that variable can take on, this isn't that hard.
a.dfm2 <- transform(a.dfm,
color.use = ifelse(variable=="var2",
as.character(color1),
as.character(color2)))
ggplot(a.dfm2, aes(x=value, y=var1, color=color.use)) +
geom_point(shape=1) +
facet_grid(. ~ variable)
I have data that I am trying to plot. I have several variables that range from the years 1880-2012. I have one observation per year. But sometimes a variable does not have an observation for a number of years. For example, it may have an observation from 1880-1888, but then not from 1889-1955 and then from 1956-2012. I would like ggplot2 + geom_line to not have anything in the missing years (1889-1955). But it connects 1888 and 1956 with a straight line. Is there anything I can do to remove this line? I am using the ggplot function.
Unrelated question, but is there a way to get ggplot to not sort my variable names in the legend alphabetically? I have code like this:
ggplot(dataFrame, aes(Year, value, colour=Name)) + geom_line()
Or to add numbers in front of the variable names (Name1, ..., Name10) to the legend. For example,
1. Name1
2. Name2
...
10. Name10
Here's some sample data to answer your questions. I've added geom_point() to make it easier to see which values are in the data:
library(ggplot2)
set.seed(1234)
dat <- data.frame(Year=rep(2000:2013,5),
value=rep(1:5,each=14)+rnorm(5*14,0,.5),
Name=rep(c("Name1","End","First","Name2","Name 3"),each=14))
dat2 <- dat
dat2$value[sample.int(5*14,12)]=NA
dat3 is probably close to what your data looks like, except that I'm treating Year as an integer.
dat3 <- dat2[!is.na(dat2$value),]
# POINTS ARE CONNECTED WITH NO DATA IN BETWEEN #
ggplot(dat3, aes(Year, value, colour=Name)) +
geom_line() + geom_point()
However, if you add rows to your data for the missing years and set their value to NA, the plot will show gaps.
# POINTS ARE NOT CONNECTED #
ggplot(dat2, aes(Year, value, colour=Name)) +
geom_line() + geom_point()
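If the missing years are absent from the data rather than stored as NA, one way to add them is a left join against a complete grid of Year and Name. This is a base R sketch under that assumption. The data frame dat_gap and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical data with a gap: no rows at all for 1889-1955
dat_gap <- data.frame(
  Year  = c(1880:1888, 1956:1960),
  value = c(1:9, 10:14),
  Name  = "Name1",
  stringsAsFactors = FALSE
)

# Every Year/Name combination that should exist
full_grid <- expand.grid(
  Year = min(dat_gap$Year):max(dat_gap$Year),
  Name = unique(dat_gap$Name),
  stringsAsFactors = FALSE
)

# Left join: combinations without a match get value = NA
dat_full <- merge(full_grid, dat_gap, by = c("Year", "Name"), all.x = TRUE)
dat_full <- dat_full[order(dat_full$Name, dat_full$Year), ]

# The NA rows break the line, so 1888 and 1956 are no longer connected
p <- ggplot(dat_full, aes(Year, value, colour = Name)) +
  geom_line() + geom_point()
```

ggplot2 will warn about removed rows when it draws the points. That is expected here, since the NA rows exist only to break the line.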
And finally, to answer your last question this is how you change the order and labels of Name in the legend:
# CHANGE THE ORDER AND LABELS IN THE LEGEND #
ggplot(dat2, aes(Year, value, colour=Name)) +
geom_line() + geom_point() +
scale_colour_discrete(labels=c("Beginning","Name 1","Name 2","Name 3","End"),
breaks=c("First","Name1","Name2","Name 3","End"))
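Another option for the legend order, offered as a sketch rather than part of the answer above: ggplot2 orders a discrete legend by factor levels, so converting Name to a factor with explicit levels stops the alphabetical sorting. The level order below (order of first appearance) is just one possible choice.

```r
library(ggplot2)

set.seed(1234)
dat <- data.frame(Year = rep(2000:2013, 5),
                  value = rep(1:5, each = 14) + rnorm(5 * 14, 0, .5),
                  Name = rep(c("Name1", "End", "First", "Name2", "Name 3"), each = 14),
                  stringsAsFactors = FALSE)

# Keep the order in which the names first appear in the data
dat$Name <- factor(dat$Name, levels = unique(dat$Name))

# The legend now follows the factor levels instead of alphabetical order
p <- ggplot(dat, aes(Year, value, colour = Name)) +
  geom_line() + geom_point()
```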