I'm trying to plot two lines using flight data I gathered. My problem is that after trying different formulas, R is still only showing one line. I've separated my data according to regions (see image below). Can someone help me out with my formula?
If you need any additional information, don't hesitate to ask. This is my first time posting on this channel.
ggplot(ica.vs.total, aes(x = Year, y = flights)) +
geom_line(aes(color = region, group = region), size = 1) +
theme_minimal()
When I enter:
library(ggplot2)
ica.vs.total = data.frame(flights=c(215947,197757,185782,201023,279218,261045,213343,205609),
region=c('TotalFlights','TotalFlights','TotalFlights','TotalFlights',
'TotalFlightsICA','TotalFlightsICA','TotalFlightsICA','TotalFlightsICA'),
Year=c(2008,2009,2010,2011,2000,2001,2002,2003))
g = ggplot(ica.vs.total, aes(x = Year, y = flights)) +
geom_line(aes(color = region, group = region), size = 1)+
theme_minimal()
print(g)
I get the expected result:
Double check your code.
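One way to double-check, sketched on the toy data above rather than on your real data frame: count the rows per region and ask ggplot2 how many groups it builds for the line layer. If your real `ica.vs.total` holds only one distinct value in `region`, only one line can be drawn. The names `rows_per_region` and `n_lines` are just illustrative.

```r
library(ggplot2)

# Toy data copied from the answer above; substitute your real data frame.
ica.vs.total <- data.frame(
  flights = c(215947, 197757, 185782, 201023, 279218, 261045, 213343, 205609),
  region  = rep(c("TotalFlights", "TotalFlightsICA"), each = 4),
  Year    = c(2008, 2009, 2010, 2011, 2000, 2001, 2002, 2003)
)

# One line is drawn per distinct value of the grouping variable.
rows_per_region <- table(ica.vs.total$region)
print(rows_per_region)

# Count the groups ggplot2 actually builds for the line layer.
g <- ggplot(ica.vs.total, aes(x = Year, y = flights)) +
  geom_line(aes(color = region, group = region))
n_lines <- length(unique(layer_data(g)$group))
print(n_lines)
```

If `n_lines` comes back as 1 on your own data, the problem is in the data frame (for example the `region` column), not in the plotting call.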
I am making a stratigraphic plot but somehow, my data points don't connect correctly.
The purpose of this plot is that the values on the x-axis are connected so you get an overview of the change in d18O throughout time (age, ma).
I've used the following script:
library(readxl)
R_pliocene_tot <- read_excel("Desktop/R_d18o.xlsx")
View(R_pliocene_tot)
install.packages("analogue")
install.packages("gridExtra")
library(tidyverse)
R_pliocene_Rtot <- R_pliocene_tot %>%
gather(key=param, value=value, -age_ma)
R_pliocene_Rtot
R_pliocene_Rtot %>%
ggplot(aes(x=value, y=age_ma)) +
geom_path() +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
which leads to the following figure:
Something is wrong with the geom_path function, I guess, but I can't figure out what it is.
Though the comment seems to solve the problem, I don't think the question asked was actually answered. So here is a short introduction to how geom_path works in the ggplot2 library.
library(dplyr)
library(ggplot2)
# This dataset contains two groups with random y values and x running from 1 to 20.
# The param column just replicates the param variable from the question.
df <- tibble(x = rep(seq(1, 20, by = 1), 2),
y = runif(40, min = 1, max = 100),
group = c(rep("group 1", 20), rep("group 2", 20)),
param = rep("a param", 40))
df %>%
ggplot(aes(x = x, y = y)) +
# geom_path has a group aesthetic that tells the function
# which data points belong to which path.
# Points in the same group are connected together.
# Here I use colour to help distinguish the paths a bit more.
geom_path(aes(group = group, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
In your data, which works well with group = 1, I guess all data points belong to one group and you just want a single line connecting all of them. If you take my example data above and draw it with group = 1, you get two lines similar to the example above, but the end point of group 1 is now connected to the starting point of group 2.
So all data points are now on one path, and the order in which they are drawn depends on the order in which they appear in the data. (I keep the colour just to make this a bit clearer.)
df %>%
ggplot(aes(x = x, y = y)) +
geom_path(aes(group = 1, color = group)) +
geom_point() +
facet_wrap(~param, scales = "free_x") +
scale_y_reverse() +
labs(x = NULL, y = "Age (ma)")
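Because geom_path follows row order, one possible fix for the stratigraphic plot is to sort the rows by parameter and age before plotting. This is only a sketch: the column names age_ma, param and value mirror the question, but the data frame `strat` and its numbers are invented for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for R_pliocene_Rtot: rows deliberately out of age order.
strat <- data.frame(
  age_ma = c(3.0, 2.6, 3.2, 2.8, 3.4),
  param  = "d18O",
  value  = c(3.1, 2.9, 3.3, 3.0, 3.6)
)

# Sort by parameter, then by age, so the path runs monotonically through time.
strat_sorted <- strat[order(strat$param, strat$age_ma), ]

p <- ggplot(strat_sorted, aes(x = value, y = age_ma)) +
  geom_path() +
  geom_point() +
  facet_wrap(~param, scales = "free_x") +
  scale_y_reverse() +
  labs(x = NULL, y = "Age (ma)")
# print(p) draws the figure.
```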
Hope this gives you a better understanding of ggplot2::geom_path.
I am producing a ggplot which looks at a curve in a dataset. When I build the plot, ggplot is automatically adding fill to data which is on the negative side of the x axis. Script and plot shown below.
ggplot(df, aes(x = Var1, y = Var2)) +
geom_line() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = df$Var2[1])
Using base R, I am able to get the plot shown below which is how it should look.
plot(x = df$Var1, y = df$Var2, type = "l",
xlab = "Var1", ylab = "Var2")
abline(v = 0)
abline(h = df$Var2[1])
If anyone could help identify why I might be getting the automatic fill and how I could make it stop, I would be very appreciative. I would like to make this work in ggplot so I can later animate the line as it is a time series that can be used to compare between other datasets from the same source.
Can add data if necessary. Data set is 1561 obs long however. Thanks in advance.
I guess you should try
ggplot(df, aes(x = Var1, y = Var2)) +
geom_path() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = df$Var2[1])
instead. The geom_line() function connects the points in order of the variable on the x-axis, whereas geom_path() connects them in the order in which they appear in the data.
Take a look at this example
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_line()
The two points with x-coordinate -pi/2 will be connected first, creating a vertical black line. Next x = -pi/2 + 0.001 will be processed and so on. The x values will be processed in order.
Therefore you should use geom_path() to get the desired result
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_path()
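To see the difference without drawing anything, here is a small sketch that inspects the data each layer would draw, using ggplot2's layer_data(). The three points in `d` are invented, and `x_line` and `x_path` are just illustrative names.

```r
library(ggplot2)

# Three points stored in a non-monotonic x order (invented values).
d <- data.frame(x = c(3, 1, 2), y = c(10, 20, 30))

p_line <- ggplot(d, aes(x, y)) + geom_line()
p_path <- ggplot(d, aes(x, y)) + geom_path()

# layer_data() returns the data as the layer will draw it.
x_line <- layer_data(p_line)$x  # reordered by x
x_path <- layer_data(p_path)$x  # original row order kept
print(x_line)
print(x_path)
```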
I have data from several cells which I tested in several conditions: a few times before and also a few times after treatment. In ggplot, I use color to indicate different times of testing.
Additionally, I would like to connect with lines all data points which belong to the same cell. Is that possible?...
Here is my example data (https://www.dropbox.com/s/eqvgm4yu6epijgm/df.csv?dl=0) and a simplified code for the plot:
df$condition = as.factor(df$condition)
df$cell = as.factor(df$cell)
df$condition <- factor(df$condition, levels = c("before1", "before2", "after1", "after2", "after3"))
windows(width=8,height=5)
ggplot(df, aes(x=condition, y=test_variable, color=condition)) +
labs(title="", x = "Condition", y = "test_variable", color="Condition") +
geom_point(aes(color=condition),size=2,shape=17, position = position_jitter(w = 0.1, h = 0))
I think your code is heading in the wrong direction: you should instead group the points (and could colour them) based on the column Cell. Then, if I'm right, you want to see how the variable evolves for each cell before and after a treatment, so you can order the x variable using scale_x_discrete.
Altogether, you can do something like this:
library(ggplot2)
ggplot(df, aes(x = condition, y = variable, group = Cell)) +
geom_point(aes(color = condition))+
geom_line(aes(color = condition))+
scale_x_discrete(limits = c("before1","before2","after1","after2","after3"))
Does it look like what you are expecting?
Data
df = data.frame(Cell = c(rep("13a",5),rep("1b",5)),
condition = rep(c("before1","before2","after1","after2","after3"),2),
variable = c(58,55,36,29,53,57,53,54,52,52))
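If you would rather keep the point colours for the condition and use neutral lines only to link each cell, a possible variant is sketched below. It reuses the example data above; the grey line colour and fixing the x order through factor levels (instead of scale_x_discrete) are my own choices, not something from the question.

```r
library(ggplot2)

df <- data.frame(
  Cell      = rep(c("13a", "1b"), each = 5),
  condition = rep(c("before1", "before2", "after1", "after2", "after3"), 2),
  variable  = c(58, 55, 36, 29, 53, 57, 53, 54, 52, 52)
)

# Fix the order on the x axis through the factor levels.
df$condition <- factor(
  df$condition,
  levels = c("before1", "before2", "after1", "after2", "after3")
)

p <- ggplot(df, aes(x = condition, y = variable)) +
  geom_line(aes(group = Cell), colour = "grey50") +  # one line per cell
  geom_point(aes(colour = condition), size = 2, shape = 17)

# Number of separate lines ggplot2 builds for the first (line) layer.
n_lines <- length(unique(layer_data(p, 1)$group))
# print(p) draws the figure.
```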
I am new to ggplot2 so please have mercy on me.
My first attempt produces a strange result (at least it's strange to me). My reproducible R code is:
library(ggplot2)
iterations = 7
variables = 14
data <- matrix(ncol=variables, nrow=iterations)
data[1,] = c(0,0,0,0,0,0,0,0,10134,10234,10234,10634,12395,12395)
data[2,] = c(18596,18596,18596,18596,19265,19265,19390,19962,19962,19962,19962,20856,20856,21756)
data[3,] = c(7912,11502,12141,12531,12718,12968,13386,17998,19996,20226,20388,20583,20879,21367)
data[4,] = c(0,0,0,0,0,0,0,43300,43500,44700,45100,45100,45200,45200)
data[5,] = c(11909,11909,12802,12802,12802,13202,13307,13808,21508,21508,21508,22008,22008,22608)
data[6,] = c(11622,11622,11622,13802,14002,15203,15437,15437,15437,15437,15554,15554,15755,16955)
data[7,] = c(8626,8626,8626,9158,9158,9158,9458,9458,9458,9458,9458,9458,9558,11438)
df <- data.frame(data)
n_data_rows = nrow(df)
previous_volumes = df[1:(n_data_rows-1),]/1000
todays_volume = df[n_data_rows,]/1000
time = seq(ncol(df))/6
min_y = min(previous_volumes, todays_volume)
max_y = max(previous_volumes, todays_volume)
ylimit = c(min_y, max_y)
x = seq(nrow(previous_volumes))
# This gives a plot with 6 gray lines and one red line, but no legend
p = ggplot()
for (row in x) {
y1 = as.integer(previous_volumes[row,])
dd = data.frame(time, y1)
p = p + geom_line(data=dd, aes(x=time, y=y1, group="1"), color="gray")
}
p
This code produces a correct plot... but no legend. The plot looks like:
If I move "color" inside "aes", I now get a legend... but the colors are wrong.
For example, the code:
p = ggplot()
for (row in x) {
y1 = as.integer(previous_volumes[row,])
dd = data.frame(time, y1)
p = p + geom_line(data=dd, aes(x=time, y=y1, group="1", color="gray"))
}
y2 = as.integer(todays_volume[1,])
dd = data.frame(time, y2)
p = p + geom_line(data=dd, aes(x=time, y=y2, group="2", colour="red"))
p
produces:
Why are the line colors wrong?
Charles
Colours can be set on an individual layer (i.e. colour = XYZ outside aes()), but colours set this way do not appear in any legend. Legends are produced when an aesthetic (here, colour) is mapped to a variable in your data, in which case you need to tell ggplot2 how to represent that mapping. If you do not specify a scale explicitly, ggplot2 makes a best guess (for example, a discrete scale for factor data and a continuous scale for numeric data). There are many options available, including (but not limited to): scale_colour_continuous, scale_colour_discrete, scale_colour_brewer, scale_colour_manual.
By the sounds of it, scale_colour_manual is probably what you are after. Note that below I have mapped the 'variable' column in the data to the colour aesthetic. That column holds the discrete values [PREV-A to PREV-F, Today], so we need to say which actual colour each of 'PREV-A', 'PREV-B', ..., 'PREV-F' and 'Today' represents.
Alternatively, if the variable column contains 'actual' colours (i.e. hex '#FF0000' or name 'red'), you can use scale_colour_identity. We can also create another column of categories ('Previous', 'Today') to make things a little easier; in that case, be sure to add the 'group' aesthetic mapping so that different series sharing the same colour are not joined into one continuous line.
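To make the mapping point concrete, here is a minimal sketch on invented data (the data frame `d` and the names `col_mapped` and `col_identity` are illustrative only). It compares the colour ggplot2 ends up using when "gray" is placed inside aes(), with and without scale_colour_identity().

```r
library(ggplot2)

d <- data.frame(time = 1:3, y1 = c(1, 3, 2))

# Inside aes(), "gray" is a data value; the default discrete scale picks the colour.
p_mapped <- ggplot(d, aes(time, y1)) + geom_line(aes(colour = "gray"))

# scale_colour_identity() tells ggplot2 to use the value itself as the colour.
p_identity <- p_mapped + scale_colour_identity()

col_mapped   <- unique(layer_data(p_mapped)$colour)
col_identity <- unique(layer_data(p_identity)$colour)
print(col_mapped)
print(col_identity)
```

The first value is a colour chosen by the default scale, not gray, which is why the lines in the question come out in unexpected colours.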
First prepare the data, then go through some different methods to assign colours.
# Put data as points 1 per row, series as columns, start with
# previous days
df.new = as.data.frame(t(previous_volumes))
#Rename the series, for colour mapping
colnames(df.new) = sprintf("PREV-%s",LETTERS[1:ncol(df.new)])
#Add the times for each point.
df.new$Times = seq(0,1,length.out = nrow(df.new))
#Add the Todays Volume
df.new$Today = as.numeric(todays_volume)
#Put in long format, to enable mapping of the 'variable' to colour.
df.new.melt = reshape2::melt(df.new,'Times')
#Create some colour mappings for use later
df.new.melt$color_group = sapply(as.character(df.new.melt$variable),
function(x)switch(x,'Today'='Today','Previous'))
df.new.melt$color_identity = sapply(as.character(df.new.melt$variable),
function(x)switch(x,'Today'='red','grey'))
And here are a few different ways of manipulating the colours:
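# Shared base plot. Note: `base` is never defined in the code as posted;
# this definition is an assumption inferred from the layers added below.
base = ggplot(df.new.melt, aes(x = Times, y = value))
#1. Base plot + color mapped to variable
plot1 = base + geom_path(aes(color=variable)) +
ggtitle("Plot #1")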
#2. Base plot + color mapped to variable, Manual scale for Each of the previous days and today
colors = setNames(c(rep('gray',nrow(previous_volumes)),'red'),
unique(df.new.melt$variable))
plot2 = plot1 + scale_color_manual(values = colors) +
ggtitle("Plot #2")
#3. Base plot + color mapped to color group
plot3 = base + geom_path(aes(color = color_group,group=variable)) +
ggtitle("Plot #3")
#4. Base plot + color mapped to color group, Manual scale for each of the groups
plot4 = plot3 + scale_color_manual(values = c('Previous'='gray','Today'='red')) +
ggtitle("Plot #4")
#5. Base plot + color mapped to color identity
plot5 = base + geom_path(aes(color = color_identity,group=variable))
plot5a = plot5 + scale_color_identity() + #Identity not usually in legend
ggtitle("Plot #5a")
plot5b = plot5 + scale_color_identity(guide='legend') + #Identity forced into legend
ggtitle("Plot #5b")
gridExtra::grid.arrange(plot1,plot2,plot3,plot4,
plot5a,plot5b,ncol=2,
top="Various Outputs")
So given your question, #2 or #4 is probably what you are after. Using #2, we can add another layer to show the value of the last point in each series:
#Additionally, add label of the last point in each series.
df.new.melt.labs = plyr::ddply(df.new.melt,'variable',function(df){
df = tail(df,1) #Last Point
df$label = sprintf("%.2f",df$value)
df
})
baseWithLabels = base +
geom_path(aes(color=variable)) +
geom_label(data = df.new.melt.labs,aes(label=label,color=variable),
position = position_nudge(y=1.5),size=3,show.legend = FALSE) +
scale_color_manual(values=colors)
print(baseWithLabels)
If you want to be able to distinguish between the various 'PREV-X' lines, you can also map linetype to this variable and/or make the label geometry more descriptive. The code below demonstrates both modifications:
#Add labels of the last point in each series, include series info:
df.new.melt.labs2 = plyr::ddply(df.new.melt,'variable',function(df){
df = tail(df,1) #Last Point
df$label = sprintf("%s: %.2f",df$variable,df$value)
df
})
baseWithLabelsAndLines = base +
geom_path(aes(color=variable,linetype=variable)) +
geom_label(data = df.new.melt.labs2,aes(label=label,color=variable),
position = position_nudge(y=1.5),hjust=1,size=3,show.legend = FALSE) +
scale_color_manual(values=colors) +
labs(linetype = 'Series')
print(baseWithLabelsAndLines)
My solution, which I got from here, is to add scale_colour_identity() to your ggplot object:
p = p + geom_line(data=dd, aes(x=time, y=y2, group="2", colour="red"))
p = p + scale_colour_identity()
p