Is there a way to overlay a partial graph on top of a full graph using ggplot? I have one line graph spanning, say, 100 days on the x axis, and I need to add a second line, in a different color, that spans only the last 20 days. I don't want to plot the second line with zero values for the first 80 days; it should appear only for the last 20 days. What is the best way to do that?
Sure, just use two geoms with different subsets of your data.frame (for simplicity I use the full df and only one subset):
library(ggplot2)
df <- data.frame(Index = 1:1000, Value = cumsum(rnorm(1000)))
ggplot() + geom_line(data = df, aes(x = Index, y = Value)) +
geom_line(data = df[500:700,], aes(x = Index, y = Value), col="red")
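For the 100-day case described in the question, a minimal sketch might look like the following. The column names Day and Value, the simulated data, and the second series are all made up for illustration; the only point is that the second geom_line receives a data frame holding just the last 20 days, so nothing is drawn for the first 80.

```r
library(ggplot2)

set.seed(1)
# Hypothetical full series: 100 days of simulated values
full <- data.frame(Day = 1:100, Value = cumsum(rnorm(100)))

# Hypothetical second series, defined only for days 81 to 100
partial <- data.frame(Day = 81:100, Value = full$Value[81:100] + rnorm(20))

# Shared aesthetics go in ggplot(); each layer brings its own data
p <- ggplot(mapping = aes(x = Day, y = Value)) +
  geom_line(data = full) +
  geom_line(data = partial, colour = "red")
p
```

Because each layer carries its own data, the two series do not need to live in the same data frame or have the same length.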
I'm generating violin plots in ggplot2 for a time series, year_1 to year_32. The years in my df are stored as numerical values. From the examples I've seen, it seems that I must convert these numerical year values to factors to plot one violin per year; in fact, if I run the code without as.factor, I get one big fat violin. I would like to understand why geom_violin can't have numeric values on the x axis or, if I'm wrong about that, how to use them.
So:
my_data$year <- as.factor(my_data$year)
p <- ggplot(data = my_data, aes(x = year, y = continuous_var)+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label")
p +my_theme()
works fine, but if I skip
my_data$year <- as.factor(my_data$year)
it doesn't work, I get one big fat violin for all years. Why?
TIA
You are missing a ) at the end of this line: p <- ggplot(data = my_data, aes(x = year, y = continuous_var)
I have constructed a reproducible example with the ToothGrowth dataset.
This should work now:
library(ggplot2)
my_data <- ToothGrowth
my_data$dose <- as.factor(my_data$dose)
p <- ggplot(data = my_data, aes(x = dose, y = len))+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label") +
theme_bw()
p
PS: this discussion would better fit Cross Validated, as it's more of a statistics question than a coding question.
I'm not 100% sure, but here's my explanation: a violin plot shows the density of a set of data, and you can divide your data into groups to plot one violin per group. But if the variable you use to divide the groups (the x axis) is continuous, there are infinitely many possible groupings (one group for the values at 0, one for 0.1, one for 0.01, etc.), so in the end the data can't be divided, and ggplot probably ignores the x variable for grouping and makes one violin for all your data.
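One way to keep the x variable numeric, as far as I know, is to state the grouping explicitly with the group aesthetic. The sketch below reuses the ToothGrowth example and leaves dose as a number; group = dose is the only addition, and it is my suggestion rather than something taken from the answers above.

```r
library(ggplot2)

# dose stays numeric; group = dose tells ggplot which rows form each violin
p <- ggplot(ToothGrowth, aes(x = dose, y = len, group = dose)) +
  geom_violin(fill = "#FF0000", color = "#000000") +
  labs(x = "x_label", y = "y_label") +
  theme_bw()
p
```

With this mapping the violins sit at their actual numeric positions on a continuous axis, so unevenly spaced values (0.5, 1, 2 here) are unevenly spaced in the plot, unlike the factor version.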
I'm plotting a time series with gaps from one source filled in from a second. To indicate the source in the plot, I map the source to the color aesthetic, but ggplot adds connecting lines tying the gaps together.
Is there a clean way to remove these connecting lines? Ideally, I would like the separate groups to be connected, since I am using it as one data set.
library(ggplot2)
set.seed(914)
df=data.frame(x=c(1:100),y=runif(100), z = rep(c("a", "b"), each = 25, times = 2))
ggplot(df, aes(x=x, y = y, color = z))+
geom_line()
Removing connetion lines between missing Dates in ggplot
suggests creating a group aesthetic, or making the gaps explicit using NA values.
I don't have any clear grouping aesthetic in my real data, like the year in the referenced example, and with many irregularly spaced gaps, it isn't immediately clear how to insert NA's in every gap.
As long as your colored "groups" are sequential in your data frame (as in your example) you can do:
ggplot(df, aes(x=x, y = y, color = z,
group = factor(c(0, cumsum(abs(head(z, -1) != tail(z, -1))))))) +
geom_line()
Or, for brevity, use data.table::rleid:
ggplot(df, aes(x=x, y = y, color = z, group = data.table::rleid(z))) +
geom_line()
which gives the same result
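If you would rather avoid the data.table dependency, the same run numbering can be sketched in base R with rle. The helper name run_id is made up for this example; it just gives every run of consecutive identical values its own id, which is then used as the group.

```r
library(ggplot2)

# Hypothetical helper: number consecutive runs of identical values 1, 2, 3, ...
run_id <- function(v) {
  r <- rle(as.character(v))
  rep(seq_along(r$lengths), times = r$lengths)
}

set.seed(914)
df <- data.frame(x = 1:100, y = runif(100),
                 z = rep(c("a", "b"), each = 25, times = 2))
df$run <- run_id(df$z)

# Colour by source, but only connect points within the same run
p <- ggplot(df, aes(x = x, y = y, color = z, group = run)) +
  geom_line()
p
```

Storing the run id as a column also makes it easy to inspect where each run starts and ends before plotting.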
I have a time series in which each point has a time, a value, and a group it belongs to. I am trying to plot it with time on the x axis and value on the y axis, with the line changing color depending on the group.
I tried using geom_path and geom_line, but they end up connecting points only to other points within the same group. I found out that when I use a continuous variable for the groups, I get a normal line; however, when I use a factor or a categorical variable, I get the linking problem.
Here is a reproducible example that is what I would like:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c(1,2,2,2,1,1,2,2,2,2))
ggplot(df, aes(time, value, color = group)) + geom_line()
And here is a reproducible example that is what I have:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c("apple","pear","pear","pear","apple","apple","pear","pear","pear","pear"))
ggplot(df, aes(time, value, color = group)) + geom_line()
So the first example works well, but (1) it takes a few extra lines to change the legend to the labels I want, and (2) out of curiosity, I would like to know if I missed something.
Is there any option in ggplot I could use to have the behavior I expect, or is it an internal constraint?
As pointed out by Richard Telford and Carles Sans Fuentes, adding group = 1 within the ggplot aesthetic does the job. So the normal code should be:
ggplot(df, aes(time, value, color = group, group = 1)) + geom_line()
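To see why this works, it may help to look at the groups ggplot actually builds. The sketch below compares the two versions with layer_data(); the names p_default and p_single are just labels for this illustration.

```r
library(ggplot2)

df <- data.frame(
  time  = 1:10,
  value = c(5, 4, 9, 3, 8, 2, 5, 8, 7, 1),
  group = c("apple", "pear", "pear", "pear", "apple",
            "apple", "pear", "pear", "pear", "pear")
)

# Discrete colour mapping: ggplot derives one group per category
p_default <- ggplot(df, aes(time, value, color = group)) + geom_line()

# Explicit group = 1: every point belongs to the same line
p_single <- ggplot(df, aes(time, value, color = group, group = 1)) + geom_line()

# Number of separate lines each plot draws
length(unique(layer_data(p_default)$group))
length(unique(layer_data(p_single)$group))
```

A continuous colour variable does not define groups, which is why the numeric version in the question already produced a single line.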
I am making stacked bar plots with ggplot2 in R, with a specific ordering of the bars along the y-axis.
# create reproducible data
library(ggplot2)
d <- read.csv(text='Day,Location,Length,Amount
1,4,3,1.1
1,3,1,2
1,2,3,4
1,1,3,5
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1',header=T)
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = Location), stat = "identity")
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = rev(Location)), stat = "identity")
The first ggplot plot shows the data in order of Location, with Location=1 nearest the x-axis and data for each increasing value of Location stacked upon the next.
The second ggplot plot shows the data in a different order, but not the one I expected based on an earlier post: in the first bar column, the highest Location value is not stacked nearest the x-axis, with the next-highest Location stacked directly above it.
This next snippet does show the data in the desired way, but I think this is an artifact of the simple and small example data set. Stacking order hasn't been specified, so I think ggplot is stacking based on values for Amount.
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount), stat = "identity")
What I want is to force ggplot to stack the data in order of decreasing Location values (Location=4 nearest the x-axis, Location=3 next, ... , and Location=1 at the very top of the bar column) by calling the order = or some equivalent argument. Any thoughts or suggestions?
It seems like it should be easy because I am only dealing with numbers. It shouldn't be so hard to ask ggplot to stack the data in a way that corresponds to a column of decreasing (as you move away from the x-axis) numbers, should it?
Try:
ggplot(d, aes(x = Day, y = Length)) +
geom_bar(aes(fill = Amount, order = -Location), stat = "identity")
Notice how I swapped rev with -. Using rev does something very different: it stacks by the value for each row you happen to get if you reverse the order of values in the column Location, which could be just about anything.
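A caveat: as far as I know, the order aesthetic was dropped in ggplot2 2.0.0, so on current versions the code above will likely just warn about an unknown aesthetic. In newer releases the stacking order follows the group aesthetic instead. The sketch below is my adaptation under that assumption, not part of the original answer; it rebuilds the same data with data.frame() and uses geom_col(), which is shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Same data as in the question, written out as a data frame
d <- data.frame(
  Day      = c(1, 1, 1, 1, 2, 3, 3, 3),
  Location = c(4, 3, 2, 1, 0, 3, 2, 1),
  Length   = c(3, 1, 3, 3, 0, 3, 1, 3),
  Amount   = c(1.1, 2, 4, 5, 0, 1.8, 3.54, 1.1)
)

# Stacking follows the group aesthetic; by default the highest group
# ends up nearest the x-axis
p <- ggplot(d, aes(x = Day, y = Length, fill = Amount, group = Location)) +
  geom_col()
p

# For the opposite order (Location = 1 nearest the x-axis), one option is
# geom_col(position = position_stack(reverse = TRUE))
```

The computed bar positions can be checked with layer_data(p), whose ymin and ymax columns show where each segment was stacked.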
I have a data frame that contains x and y coordinates for a random walk that moves in discrete steps (1 step up, down, left, or right). I'd like to plot the path, that is, the points connected by a line. This is easy, of course. The difficulty is that the path crosses over itself and becomes difficult to interpret. I add jitter to the points to avoid overplotting, but it doesn't help distinguish the ordering of the walk.
I'd like to connect the points using a line that changes color over "time" (steps) according to a thermometer-like color scale.
My random walk is stored in its own class and I'm writing a specific plot method for it, so if you have suggestions for how I can do this using plot, that would be great. Thanks!
This is pretty easy to do in ggplot2:
so <- data.frame(x = 1:10,y = 1:10,col = 1:10)
ggplot(so,aes(x = x, y = y)) +
geom_line(aes(group = 1,colour = col))
If you prefer not to use ggplot, then ?segments will do what you want. I'm assuming here that x and y are both functions of time, as implied in your example.
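A minimal base-graphics sketch of the segments approach, assuming a simulated walk (the variable names and the blue-to-red palette are my choices, not the asker's): each step is drawn as its own segment, coloured by its position in the walk.

```r
set.seed(42)
n <- 200

# Simulated walk: each step moves one unit right, left, up, or down
dir <- sample(1:4, n, replace = TRUE)
dx  <- c(1, -1, 0, 0)[dir]
dy  <- c(0, 0, 1, -1)[dir]
x   <- c(0, cumsum(dx))
y   <- c(0, cumsum(dy))

# One colour per step, running from "cold" (blue) to "hot" (red)
cols <- colorRampPalette(c("blue", "red"))(n)

# Empty plot with equal axis scaling, then one coloured segment per step
plot(x, y, type = "n", asp = 1)
segments(x[-(n + 1)], y[-(n + 1)], x[-1], y[-1], col = cols)
```

Since this uses only plot and segments, it should drop into a custom plot method for the walk class without extra dependencies.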
If you use ggplot, you can set the colour aesthetic:
library(ggplot2)
walk <- cumsum(rnorm(n = 100, mean = 0))
dat <- data.frame(x = seq_along(walk), y = walk)
ggplot(dat, aes(x,y, colour = x)) + geom_line()
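Note that geom_line connects observations in order of x, which suits the example above because x is the step index. For a two-dimensional walk that doubles back on itself, geom_path, which connects points in the order they appear in the data, is probably the better fit. The sketch below assumes a simulated walk with a hypothetical step column recording that order.

```r
library(ggplot2)

set.seed(42)
n <- 200

# Simulated 2D walk: one unit right, left, up, or down per step
moves <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
steps <- moves[sample(4, n, replace = TRUE), ]
walk  <- data.frame(
  x    = c(0, cumsum(steps[, 1])),
  y    = c(0, cumsum(steps[, 2])),
  step = 0:n   # hypothetical column recording the order of the walk
)

# geom_path keeps the row order; colour runs from blue (start) to red (end)
p <- ggplot(walk, aes(x, y, colour = step)) +
  geom_path() +
  scale_colour_gradient(low = "blue", high = "red") +
  coord_equal()
p
```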