Draw line between points with groups in ggplot - r

I have a time-series, with each point having a time, a value and a group he's part of. I am trying to plot it with time on x axis and value on y axes with the line appearing a different color depending on the group.
I tried using geom_path and geom_line, but they end up linking points to points within groups. I found out that when I use a continuous variable for the groups, I have a normal line; however when I use a factor or a categorical variable, I have the link problem.
Here is a reproducible example that is what I would like:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c(1,2,2,2,1,1,2,2,2,2))
ggplot(df, aes(time, value, color = group)) + geom_line()
And here is a reproducible example that is what I have:
df = data.frame(time = c(1,2,3,4,5,6,7,8,9,10), value = c(5,4,9,3,8,2,5,8,7,1), group = c("apple","pear","pear","pear","apple","apple","pear","pear","pear","pear"))
ggplot(df, aes(time, value, color = group)) + geom_line()
So the first example works well, but 1/ it adds a few lines to change the legend to have the labels I want, 2/ out of curiosity I would like to know if I missed something.
Is there any option in ggplot I could use to have the behavior I expect, or is it an internal constraint?

As pointed by Richard Telford and Carles Sans Fuentes, adding group = 1 within the ggplot aesthetic makes the job. So the normal code should be:
ggplot(df, aes(time, value, color = group, group = 1)) + geom_line()

Related

R code of scatter plot for three variables

Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what is your desidered output. Maybe, if I well understood your question I Think that you want somthing like this.
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
#Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
The reason why your code is not working is that you mix the layer call and or do not really specify (and even mix) what is the scatter and line visualisation you want.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting antisocal data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antiscoial data set using the scatter, i.e. geom_point() layer.
Note that I put Race as a factor to have a categorical colour scheme otherwise you might end up with a continous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I save showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot-layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend to keep the data= and aes() call in the geom_xxxx. This avoids confustion.
You may want to explore with geom_jitter() instead of geom_point() to get a bit of a better presentation of your dataset. The "few" points plotted are the result of many datapoints in the same position (and overplotted).
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and linetype. You can play around with the colours, etc. to give you the aesthetics you aim for.
Please note, that for the grouped boxplot, you also have to treat your integer variable YOI, I coerced into a categorical factor. Boxplot works with fill for the body (colour sets only the outer line). In this setup, you also need to supply a group value to geom_line() (I just assigned it to 1, but that is arbitrary - in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
Hope this gets you going!

remove connecting lines when using color aesthetic ggplot in r

I'm plotting a time series with gaps from one source filled in from a second. Plotting this to indicate the source, I specify the source as a color aesthetic, but ggplot adds connecting lines tieing the gaps together.
Is there a clean way to remove these connecting lines? Ideally, I would like the separate groups to be connected, since I am using it as one data set.
library(ggplot)
set.seed(914)
df=data.frame(x=c(1:100),y=runif(100), z = rep(c("a", "b"), each = 25, times = 2))
ggplot(df, aes(x=x, y = y, color = z))+
geom_line()
Removing connetion lines between missing Dates in ggplot
suggests creating a group aesthetic, or making the gaps explicit using NA values.
I don't have any clear grouping aesthetic in my real data, like the year in the referenced example, and with many irregularly spaced gaps, it isn't immediately clear how to insert NA's in every gap.
As long as your colored "groups" are sequential in your data frame (as in your example) you can do:
ggplot(df, aes(x=x, y = y, color = z,
group = factor(c(0, cumsum(abs(head(z, -1) != tail(z, -1))))))) +
geom_line()
Or, for brevity, use data.table::rleid:
ggplot(df, aes(x=x, y = y, color = z, group = data.table::rleid(z))) +
geom_line()
which gives the same result

How to make a circled bubble plot using ggplot2 coord_polar()?

I have an example data, which does not have x- and y-axis information. I would like to make a bubble plot using R package ggplot2, and arrange the bubbles in a circled manner.
data <- data.frame(group = paste("Group", letters[1:11]),
value = sample(seq(1,100),11))
Thanks a lot.
You can just put a dummy value for y and make group your x values in aes.
ggplot(data, aes(x = group, y = 0, size = value)) +
coord_polar() +
geom_point()

ggplot2 - group aesthetic not working as expected am I missing something?

I am trying to plot a line graph with multiple lines (grouped by a categorical value - factor) and based on what I have done in the past and what I can find online here the easiest way to do this is by assigning the categorical value to the group aesthetic - but this isn't working for me I am only getting one line on the line graph. I am 100% sure I am doing something super silly but I can't for the life of me work it out. Thanks in advance :)
#dummy data for example
test <- data.frame(x = sample(seq(as.Date('2015/01/01'), as.Date('2020/01/01'), by="day"), 20),
y = sample(10:300, 10),
Origin_Station = as.factor(rep(1, 10)),
Neighbour_station = as.factor(rep(1:5, each = 20)))
#plot - what I want to see is a line for each of the 5 Neighbour_station categories (1:5) but what I get is just one line
ggplot(test, aes(x=x, y=y, group = Neighbour_station))+
geom_line()
I have also tried this:
ggplot(test, aes(x=x, y=y, group = factor(Neighbour_station), colour = Neighbour_station))+
geom_line()
Hi Rhetta also from Aus here, big ups Australian useRs:
library(ggplot2)
ggplot(test, aes(x = x, y = y, group = Neighbour_station, colour = Neighbour_station))+
geom_line()
Note the reason you can't see the distinct lines is because your data is exactly the same for each factor level (Neighbour_station 1:5).

R ggplot how to overlay partial graph on the full graph

Is there a way to overlay partial graph on top of full graph using ggplot? I have one line graph with time span of say 100 days on X axis and need to add second line that only spans last 20 days, with different color; I don't want to plot second line as having zero values for first 80 days - need to only plot it for last 20 days- using different color. What is the best way to do that?
Sure, just use two geoms with different subsets of your data.frame (for simplicity I use the full df and only one subset):
library(ggplot2)
df <- data.frame(Index = 1:1000, Value = cumsum(rnorm(1000)))
ggplot() + geom_line(data = df, aes(x = Index, y = Value)) +
geom_line(data = df[500:700,], aes(x = Index, y = Value), col="red")

Resources