Remove a line linked from first to last data, by geom_path() - r

I learned from Stack Overflow that geom_path() can avoid joining separate parts of the data that share a colour. Within the overall red line there are segments in other colours; without it, the last point of one blue segment is linked to the first point of the next blue segment. The code and image are below:
p6 <- ggplot(data = M1.m, mapping = aes(x = as.Date(M1_Date, format='%d/%m/%Y'),
y = M1_Value, color = factor(NewGroup))) + geom_path(aes(group = 1)) + geom_point(size = 0.5)
p6 <- p6 + labs(x = "Date", y = "Value") + labs(color = "Value Type")
p6
With this code the interval problem seems to be solved, but there is a weird line linking the first data point and the last data point. Can you please tell me how to remove it?
The data set is too large to paste here, sorry about that; here is the link:
Data
Thank you!

geom_path() draws the line in the order in which the rows appear in the data frame. In my question, column M1_Value contains two types of data covering the same period, so the last row of type 1 sits next to the first row of type 2, and that is the reason for the weird line. The solution is to add a new column (say, type) to the data frame and put group = type in aes() so that the two series are no longer joined. (BTW, this is similar to recording monthly temperatures: the temperature on the first day of a new month gets linked to the temperature on the last day of the previous month. The key for this kind of problem is to define the classification clearly and group by it, which makes it much easier.)
Answer:
Change previous code to:
p6 <- ggplot(data = M1.m, mapping = aes(x = as.Date(M1_Date, format='%d/%m/%Y'), y = M1_Value,
color = NewGroup, group = M1_Type)) + geom_path() + geom_point(size = 0.5)
p6 <- p6 + labs(x = "Date", y = "Value") + labs(color = "Value Type")
p6
and the plot changes to:
So the weird line is removed. Done!
Thanks for the help from the following link:
R geom_path lines "closing", sometimes. How to keep them "open"?.
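The same effect can be reproduced on a small made-up data set. This is only a sketch: the data frame `toy` and its columns `date`, `value`, and `type` are hypothetical stand-ins for M1.m and its columns. With group = 1 all rows form one path, so the end of one series is joined to the start of the next; mapping group to the type column gives one path per series.

```r
library(ggplot2)

# Hypothetical data: two series covering the same three dates, stacked one
# after the other in the data frame, as described in the question.
toy <- data.frame(
  date  = rep(as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")), times = 2),
  value = c(1, 2, 3, 6, 5, 4),
  type  = rep(c("A", "B"), each = 3)
)

# One path through all six rows: the end of A is joined to the start of B.
p_single <- ggplot(toy, aes(x = date, y = value, colour = type)) +
  geom_path(aes(group = 1))

# One path per type: no connecting segment between the two series.
p_grouped <- ggplot(toy, aes(x = date, y = value, colour = type, group = type)) +
  geom_path()

# layer_data() returns the data ggplot2 actually draws; its group column
# shows how many separate paths each plot contains.
n_paths_single  <- length(unique(layer_data(p_single)$group))
n_paths_grouped <- length(unique(layer_data(p_grouped)$group))
```

Printing p_single and p_grouped side by side should show the extra connecting segment only in the first plot.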

For me, a simpler solution was to arrange the data frame by the variable plotted on the x-axis. In the example above, this would result in:
library(dplyr)
# Sort by the parsed date; sorting the raw day/month/year text would not be chronological
p6 <- ggplot(data = M1.m %>% arrange(as.Date(M1_Date, format='%d/%m/%Y')), mapping = aes(x = as.Date(M1_Date, format='%d/%m/%Y'),
y = M1_Value, color = factor(NewGroup))) + geom_path(aes(group = 1)) + geom_point(size = 0.5)
p6 <- p6 + labs(x = "Date", y = "Value") + labs(color = "Value Type")
p6
I haven't tested this on the data above. The solution given by Ericshaw did not work for me because I could not add a group aesthetic when I was already using linetype as an aesthetic.
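One thing to watch when sorting by the x variable: if the date column is stored as text in day/month/year form (an assumption about M1_Date, suggested by the format argument used in as.Date()), sorting the text does not give chronological order. A small sketch with made-up values:

```r
# Hypothetical day/month/year strings, in the same layout as M1_Date.
dates_chr <- c("15/01/2020", "02/03/2020", "10/02/2020")

# Sorting the text compares characters from the left, so the day decides.
by_text <- dates_chr[order(dates_chr)]

# Sorting on the parsed dates gives chronological order.
by_date <- dates_chr[order(as.Date(dates_chr, format = "%d/%m/%Y"))]

print(by_text)
print(by_date)
```

With dplyr, the chronological version corresponds to arrange(as.Date(M1_Date, format = '%d/%m/%Y')).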

Related

Can't make a ggplot with multiple lines, geom_line()

I'm trying to plot two lines using flight data I gathered. My problem is that after trying different formulas, R is still only showing one line. I've separated my data according to regions (see image below). Can someone help me out with my formula?
If you need any additional information don't hesitate to ask, this is my first time posting on this channel.
ggplot(ica.vs.total, aes(x = Year, y = flights)) +
geom_line(aes(color = region, group = region), size = 1) +
theme_minimal()
When I enter :
library(ggplot2)
ica.vs.total = data.frame(flights=c(215947,197757,185782,201023,279218,261045,213343,205609),
region=c('TotalFlights','TotalFlights','TotalFlights','TotalFlights',
'TotalFlightsICA','TotalFlightsICA','TotalFlightsICA','TotalFlightsICA'),
Year=c(2008,2009,2010,2011,2000,2001,2002,2003))
g = ggplot(ica.vs.total, aes(x = Year, y = flights)) +
geom_line(aes(color = region, group = region), size = 1)+
theme_minimal()
print(g)
I get the expected result :
Double check your code.

How do I correctly connect data points ggplot

I am making a stratigraphic plot but somehow, my data points don't connect correctly.
The purpose of this plot is that the values on the x-axis are connected so you get an overview of the change in d18O throughout time (age, ma).
I've used the following script:
library(readxl)
R_pliocene_tot <- read_excel("Desktop/R_d18o.xlsx")
View(R_pliocene_tot)
install.packages("analogue")
install.packages("gridExtra")
library(tidyverse)
R_pliocene_Rtot <- R_pliocene_tot %>%
gather(key=param, value=value, -age_ma)
R_pliocene_Rtot
R_pliocene_Rtot %>%
ggplot(aes(x=value, y=age_ma)) +
geom_path() +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
which leads to the following figure:
Something is wrong with the geom_path function, I guess, but I can't figure out what it is.
Though the comment seems to have solved the problem, I don't think the question that was asked has been answered. So here is a short introduction to how geom_path works in the ggplot2 library.
library(dplyr)
library(ggplot2)
# This dataset contains two groups with random values for y, and x running from 1 to 20
# The param column is only there to replicate the param variable from the question.
df <- tibble(x = rep(seq(1, 20, by = 1), 2),
y = runif(40, min = 1, max = 100),
group = c(rep("group 1", 20), rep("group 2", 20)),
param = rep("a param", 40))
df %>%
ggplot(aes(x = x, y = y)) +
# geom_path has a group aesthetic which tells the function
# which data points belong to which path.
# Points in the same group are connected to each other.
# Here I also map color to the group to make the paths easier to tell apart.
geom_path(aes(group = group, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
In your data, which works well with group = 1, I guess all data points belong to one group and you just want to draw a line connecting all of them. If you take my example data above and draw it with the aesthetic group = 1, you get two lines similar to the example above, but now the end point of group 1 is connected to the starting point of group 2.
So all data points are now on one path, and the order in which they are drawn depends on the order in which they appear in the data. (I keep the color just to make this a bit clearer.)
df %>%
ggplot(aes(x = x, y = y)) +
geom_path(aes(group = 1, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
I hope this gives you a better understanding of ggplot2::geom_path.
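Because geom_path() follows row order, one way to get a single line running down each panel is to sort the rows by age before plotting. The following is only a sketch on made-up data: the data frame `strat` and its values are hypothetical, and only the column names (age_ma, param, value) follow the question.

```r
library(ggplot2)

# Hypothetical long-format data with the ages deliberately out of order.
strat <- data.frame(
  age_ma = c(3.2, 3.0, 3.4, 3.1, 3.3),
  param  = "d18O",
  value  = c(3.1, 2.9, 3.4, 3.0, 3.2)
)

# Sort by parameter, then by age, so each path runs from young to old.
strat_sorted <- strat[order(strat$param, strat$age_ma), ]

p_strat <- ggplot(strat_sorted, aes(x = value, y = age_ma)) +
  geom_path() +
  geom_point() +
  facet_wrap(~param, scales = "free_x") +
  scale_y_reverse() +
  labs(x = NULL, y = "Age (ma)")
```

Plotting the unsorted `strat` with the same code would join the points in the order 3.2, 3.0, 3.4, 3.1, 3.3 and produce a zigzag instead.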

How to stop ggplot line plot adding fill

I am producing a ggplot which looks at a curve in a dataset. When I build the plot, ggplot is automatically adding fill to data which is on the negative side of the x axis. Script and plot shown below.
ggplot(df, aes(x = Var1, y = Var2)) +
geom_line() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = Var2[1])
Using base R, I am able to get the plot shown below which is how it should look.
plot(x = df$Var1, y = df$Var2, type = "l",
xlab = "Var1", ylab = "Var2")
abline(v = 0)
abline(h = df$Var2[1])
If anyone could help identify why I might be getting the automatic fill and how I could make it stop, I would be very appreciative. I would like to make this work in ggplot so I can later animate the line as it is a time series that can be used to compare between other datasets from the same source.
Can add data if necessary. Data set is 1561 obs long however. Thanks in advance.
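I guess you should try
ggplot(df, aes(x = Var1, y = Var2)) +
geom_path() +
geom_vline(xintercept = 0) +
# Var2 is a column of df, so refer to it as df$Var2 outside aes()
geom_hline(yintercept = df$Var2[1])
instead. The geom_line() function connects the points in order of the variable on the x-axis, whereas geom_path() connects them in the order in which they appear in the data.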
Take a look at this example
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_line()
The two points with x-coordinate -pi/2 are connected first, creating a vertical black line. Next, the points at x = -pi/2 + 0.001 are processed, and so on: the x values are always processed in increasing order, so the line keeps jumping between the two curves and the area between them looks filled.
Therefore you should use geom_path() to get the desired result:
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_path()
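The difference in ordering can also be inspected without looking at the plot. This is a sketch on made-up data (the data frame `pts` is hypothetical); it uses layer_data() from ggplot2 to look at the rows each geom will actually draw.

```r
library(ggplot2)

# Hypothetical data whose rows are not in increasing x order.
pts <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20))

p_line <- ggplot(pts, aes(x, y)) + geom_line()
p_path <- ggplot(pts, aes(x, y)) + geom_path()

# Order in which each geom will join the points.
x_order_line <- layer_data(p_line)$x
x_order_path <- layer_data(p_path)$x

print(x_order_line)
print(x_order_path)
```

For geom_line() the x values should come back sorted, while for geom_path() they should stay in row order.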

Time series data using ggplot: how use different color for each time point and also connect with lines data belonging to each subject?

I have data from several cells which I tested in several conditions: a few times before and also a few times after treatment. In ggplot, I use color to indicate different times of testing.
Additionally, I would like to connect with lines all data points which belong to the same cell. Is that possible?...
Here is my example data (https://www.dropbox.com/s/eqvgm4yu6epijgm/df.csv?dl=0) and a simplified code for the plot:
df$condition = as.factor(df$condition)
df$cell = as.factor(df$cell)
df$condition <- factor(df$condition, levels = c("before1", "before2", "after1", "after2", "after3"))
windows(width=8,height=5)
ggplot(df, aes(x=condition, y=test_variable, color=condition)) +
labs(title="", x = "Condition", y = "test_variable", color="Condition") +
geom_point(aes(color=condition),size=2,shape=17, position = position_jitter(w = 0.1, h = 0))
I think your code is heading in the wrong direction; you should instead group the points by the column Cell (and keep the colour mapped to condition). Then, if I understand correctly, you want to see how the variable evolves for each cell before and after a treatment, so you can order the x variable using scale_x_discrete.
Altogether, you can do something like this:
library(ggplot2)
ggplot(df, aes(x = condition, y = variable, group = Cell)) +
geom_point(aes(color = condition))+
geom_line(aes(color = condition))+
scale_x_discrete(limits = c("before1","before2","after1","after2","after3"))
Does it look like what you are expecting?
Data
df = data.frame(Cell = c(rep("13a",5),rep("1b",5)),
condition = rep(c("before1","before2","after1","after2","after3"),2),
variable = c(58,55,36,29,53,57,53,54,52,52))

Why are the colors wrong on this ggplot? [duplicate]

This question already has an answer here:
ggplot wrong color assignment
I am new to ggplot2 so please have mercy on me.
My first attempt produces a strange result (at least it's strange to me). My reproducible R code is:
library(ggplot2)
iterations = 7
variables = 14
data <- matrix(ncol=variables, nrow=iterations)
data[1,] = c(0,0,0,0,0,0,0,0,10134,10234,10234,10634,12395,12395)
data[2,] = c(18596,18596,18596,18596,19265,19265,19390,19962,19962,19962,19962,20856,20856,21756)
data[3,] = c(7912,11502,12141,12531,12718,12968,13386,17998,19996,20226,20388,20583,20879,21367)
data[4,] = c(0,0,0,0,0,0,0,43300,43500,44700,45100,45100,45200,45200)
data[5,] = c(11909,11909,12802,12802,12802,13202,13307,13808,21508,21508,21508,22008,22008,22608)
data[6,] = c(11622,11622,11622,13802,14002,15203,15437,15437,15437,15437,15554,15554,15755,16955)
data[7,] = c(8626,8626,8626,9158,9158,9158,9458,9458,9458,9458,9458,9458,9558,11438)
df <- data.frame(data)
n_data_rows = nrow(df)
previous_volumes = df[1:(n_data_rows-1),]/1000
todays_volume = df[n_data_rows,]/1000
time = seq(ncol(df))/6
min_y = min(previous_volumes, todays_volume)
max_y = max(previous_volumes, todays_volume)
ylimit = c(min_y, max_y)
x = seq(nrow(previous_volumes))
# This gives a plot with 6 gray lines and one red line, but no legend
p = ggplot()
for (row in x) {
y1 = as.integer(previous_volumes[row,])
dd = data.frame(time, y1)
p = p + geom_line(data=dd, aes(x=time, y=y1, group="1"), color="gray")
}
p
This code produces a correct plot... but no legend. The plot looks like:
If I move "color" inside "aes", I now get a legend... but the colors are wrong.
For example, the code:
p = ggplot()
for (row in x) {
y1 = as.integer(previous_volumes[row,])
dd = data.frame(time, y1)
p = p + geom_line(data=dd, aes(x=time, y=y1, group="1", color="gray"))
}
y2 = as.integer(todays_volume[1,])
dd = data.frame(time, y2)
p = p + geom_line(data=dd, aes(x=time, y=y2, group="2", colour="red"))
p
produces:
Why are the line colors wrong?
Charles
Colours can be controlled on an individual layer basis (i.e. with the colour = XYZ argument); however, these will not appear in any legend. Legends are produced when you have an aesthetic (in this case the colour aesthetic) mapped to a variable in your data, in which case you need to instruct ggplot2 how to represent that specific mapping. If you do not specify this explicitly, ggplot2 will make a best guess (say, choosing between a discrete and a continuous mapping for factor data vs numeric data). There are many options available here, including (but not limited to): scale_colour_continuous, scale_colour_discrete, scale_colour_brewer, scale_colour_manual.
By the sounds of it, scale_colour_manual is probably what you are after. Note that in the code below I have mapped the 'variable' column in the data to the colour aesthetic, and the 'variable' column contains the discrete values [PREV-A to PREV-F, Today], so now we need to specify which actual colour each of 'PREV-A', 'PREV-B', ..., 'PREV-F' and 'Today' represents.
Alternatively, if the variable column contains 'actual' colours (i.e. hex '#FF0000' or name 'red') then you can use scale_colour_identity. We can also create another column of categories ('Previous', 'Today') to make things a little easier, in which case be sure to add the 'group' aesthetic mapping so that different series sharing the same colour are not joined into one continuous line.
First prepare the data, then go through some different methods to assign colours.
# Put data as points 1 per row, series as columns, start with
# previous days
df.new = as.data.frame(t(previous_volumes))
#Rename the series, for colour mapping
colnames(df.new) = sprintf("PREV-%s",LETTERS[1:ncol(df.new)])
#Add the times for each point.
df.new$Times = seq(0,1,length.out = nrow(df.new))
#Add the Todays Volume
df.new$Today = as.numeric(todays_volume)
#Put in long format, to enable mapping of the 'variable' to colour.
df.new.melt = reshape2::melt(df.new,'Times')
#Create some colour mappings for use later
df.new.melt$color_group = sapply(as.character(df.new.melt$variable),
function(x)switch(x,'Today'='Today','Previous'))
df.new.melt$color_identity = sapply(as.character(df.new.melt$variable),
function(x)switch(x,'Today'='red','grey'))
And here are a few different ways of manipulating the colours:
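# Assumed definition of the shared base plot (not shown in the original answer),
# reconstructed from how `base` is used below
base = ggplot(df.new.melt, aes(x = Times, y = value))
#1. Base plot + color mapped to variable
plot1 = base + geom_path(aes(color=variable)) +
ggtitle("Plot #1")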
#2. Base plot + color mapped to variable, Manual scale for Each of the previous days and today
colors = setNames(c(rep('gray',nrow(previous_volumes)),'red'),
unique(df.new.melt$variable))
plot2 = plot1 + scale_color_manual(values = colors) +
ggtitle("Plot #2")
#3. Base plot + color mapped to color group
plot3 = base + geom_path(aes(color = color_group,group=variable)) +
ggtitle("Plot #3")
#4. Base plot + color mapped to color group, Manual scale for each of the groups
plot4 = plot3 + scale_color_manual(values = c('Previous'='gray','Today'='red')) +
ggtitle("Plot #4")
#5. Base plot + color mapped to color identity
plot5 = base + geom_path(aes(color = color_identity,group=variable))
plot5a = plot5 + scale_color_identity() + #Identity not usually in legend
ggtitle("Plot #5a")
plot5b = plot5 + scale_color_identity(guide='legend') + #Identity forced into legend
ggtitle("Plot #5b")
gridExtra::grid.arrange(plot1,plot2,plot3,plot4,
plot5a,plot5b,ncol=2,
top="Various Outputs")
So given your question, #2 or #4 is probably what you are after. Using #2, we can add another layer to render the value of the last point in each series:
#Additionally, add label of the last point in each series.
df.new.melt.labs = plyr::ddply(df.new.melt,'variable',function(df){
df = tail(df,1) #Last Point
df$label = sprintf("%.2f",df$value)
df
})
baseWithLabels = base +
geom_path(aes(color=variable)) +
geom_label(data = df.new.melt.labs,aes(label=label,color=variable),
position = position_nudge(y=1.5),size=3,show.legend = FALSE) +
scale_color_manual(values=colors)
print(baseWithLabels)
If you want to be able to distinguish between the various 'PREV-X' lines, then you can also map linetype to this variable and/or make the label geometry more descriptive. The code below demonstrates both modifications:
#Add labels of the last point in each series, include series info:
df.new.melt.labs2 = plyr::ddply(df.new.melt,'variable',function(df){
df = tail(df,1) #Last Point
df$label = sprintf("%s: %.2f",df$variable,df$value)
df
})
baseWithLabelsAndLines = base +
geom_path(aes(color=variable,linetype=variable)) +
geom_label(data = df.new.melt.labs2,aes(label=label,color=variable),
position = position_nudge(y=1.5),hjust=1,size=3,show.legend = FALSE) +
scale_color_manual(values=colors) +
labs(linetype = 'Series')
print(baseWithLabelsAndLines)
My solution, which I found elsewhere, is to add scale_colour_identity() to your ggplot object:
p = p + geom_line(data=dd, aes(x=time, y=y2, group="2", colour="red"))
p = p + scale_colour_identity()
p
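To see why the colours come out "wrong" without the identity scale, it can help to look at the colour ggplot2 actually assigns. This is a sketch on made-up data (the data frame `d` is hypothetical): inside aes(), the string "gray" is treated as a data value, so the default colour scale chooses the colour; scale_colour_identity() makes ggplot2 use the value itself.

```r
library(ggplot2)

# Hypothetical data for a single line.
d <- data.frame(time = 1:3, y = c(2, 4, 3))

# Inside aes(), "gray" is just a category label, not a colour specification.
p_mapped <- ggplot(d, aes(time, y)) +
  geom_line(aes(colour = "gray"))

# The identity scale uses the mapped value directly as the colour.
p_identity <- p_mapped + scale_colour_identity()

# Colours that will actually be drawn in each plot.
colour_mapped   <- unique(layer_data(p_mapped)$colour)
colour_identity <- unique(layer_data(p_identity)$colour)

print(colour_mapped)
print(colour_identity)
```

Note that scale_colour_identity() does not draw a legend unless guide = "legend" is passed, as in plot #5b above.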

Resources