Doing path & point plot using qplot() - r

My data consists of three variables: volume, occ and state. I want a volume-occ path and point plot, with the paths and points coloured according to the state.
Here is my code:
qplot(occ,volume,data = data,geom=c('path','point'),color=factor(state))+scale_colour_manual(values=c("blue", "orange", "red"))
The outcome is like this:
It seems that qplot() does not connect the points in their original order: the red points and paths are not connected to the others, and the same is true of the other two colours.
I guess qplot() reordered my data according to the variable 'state', and then plot the path within each state separately.
I also tried the code without the colour argument:
qplot(occ,volume,data = data,geom=c('path','point'))
The outcome is like this:
This outcome does show the initial path order that I want.
What I want is every point being connected continuously in the initial order just like what outcome2 shows and marked by different colours according to the state variable.
What should I do with my code?

If you map the colour to a variable, the data are also grouped by that variable, so a separate path is drawn for each group. To prevent this, set the group aesthetic manually to a constant. Here is an example:
df <- data.frame(x = 1:20,
y = c(rnorm(10, 5, 2), rnorm(10, 5, 2)),
group = c(rep("a", 10), rep("b", 10) ))
ggplot(df, aes(x = x, y = y, group = 1, col = group))+ geom_path()
The same with qplot:
qplot(x, y, data = df, geom = c('path', 'point'), color = factor(group), group = 1)
So just add group = 1 to your code and it will work the way you expect it.
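To see what group = 1 changes, one can inspect the group ids that ggplot2 computes for the layer. The following is a minimal sketch, assuming ggplot2 is installed; the data frame toy and the object names are illustrative and not taken from the question.

```r
library(ggplot2)

# Illustrative data: three states interleaved along the path
toy <- data.frame(
  occ    = c(1, 2, 3, 4, 5, 6),
  volume = c(10, 30, 20, 50, 40, 60),
  state  = c(1, 2, 3, 1, 2, 3)
)

# Colour mapped to a discrete variable, no explicit group:
# ggplot2 groups by state, so each colour gets its own path
p_split <- ggplot(toy, aes(occ, volume, colour = factor(state))) +
  geom_path()

# Same mapping, but group forced to a constant: one path in row order
p_joined <- ggplot(toy, aes(occ, volume, colour = factor(state), group = 1)) +
  geom_path()

# layer_data() returns the data as the layer sees it, including `group`
split_groups  <- unique(layer_data(p_split)$group)   # one id per state
joined_groups <- unique(layer_data(p_joined)$group)  # a single id
```

With a single group id the path follows the row order of the data, while the colour is still taken from state.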

Related

How do I plot a graph in R, with the first values in one colour and the next values in another colour?

I've been trying this out but I cannot find a solution. The best I can do is plot the first 15846 values in one colour and then use the lines() function to add the remaining 841 points. But these then appear at the start of the graph and do not continue from the 15846th data point.
str(as.numeric(sigma.in.fr))
num [1:15846] 0.000408 0.000242 0.000536 0.000274 0.000476 ...
str(as.numeric(sigma.out.fr))
num [1:841] 0.002558 0.000428 0.000255 0.000549 0.00028 ...
plot(as.numeric(sigma.in.fr),type="l",col=c("tomato4"))
lines(as.numeric(sigma.out.fr), type="l",col="tomato1")
This returns the plot below:
Let's make some dummy data to demonstrate:
sigma.ins.fr = sin((1:800)/20) + rnorm(800)
sigma.outs.fr = sin((801:1000)/20) + rnorm(200)
Now, put all the data together into a single sequence
sigma.all = c(sigma.ins.fr, sigma.outs.fr)
And create an x vector which simply counts along the data. We'll need this in the segments call below.
x = seq_along(sigma.all)
Now create a vector of colors for the trace. It is the same length as the full data, with a color for each segment.
cols = c(rep("tomato4", length(sigma.ins.fr)), rep("blue", length(sigma.outs.fr)))
Now create a blank canvas on which to draw the data.
plot(sigma.all, type="l", col=NA)
At last, we can plot the data. Unfortunately, lines does not allow a separate color for different segments, so we use segments instead (it takes no type argument, and the color vector is applied segment by segment):
segments(head(x,-1), head(sigma.all,-1), x[-1], sigma.all[-1], col=cols)
Or, if you really prefer to use two separate traces using lines, then we can achieve this by adding the x coordinates to each call:
plot(sigma.all, type="l", col=NA)
lines(seq_along(sigma.ins.fr), sigma.ins.fr, col=c("tomato4"))
lines(seq_along(sigma.outs.fr) + length(sigma.ins.fr), sigma.outs.fr, col="tomato1")
Please provide a reproducible example. Using the packages ggplot2 and dplyr you can do something like:
library(ggplot2)
library(dplyr) # provides tibble(), filter() and %>%
df <- tibble(x = seq(1,1000, 1), y = seq(1, 500.5, 0.5))
ggplot() +
geom_line(data = df %>% filter(x < 800),
aes(x = x, y = y), color = "red", size =2) +
geom_line(data = df %>% filter(x >= 800),
aes(x = x, y = y),
color = "black", size = 2)
Note that I put the cut-off at 800 (as I only created 1000 points), but you can easily change that.
I pass the data to each geom_line() call rather than to ggplot(), which also works if you want to plot different data frames (with overlapping x and y) in the same graph. Here I filter the same data frame at different points, so that each geom_line() draws a separate line.
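An alternative that keeps the trace continuous across the cut-off is to put both series in one data frame and map the colour to a label column while forcing a single group. This is a sketch under assumptions: ggplot2 is installed, and sigma_in, sigma_out, series and part are illustrative stand-ins rather than the asker's objects.

```r
library(ggplot2)

set.seed(1)
sigma_in  <- rnorm(800)  # stand-in for the first series
sigma_out <- rnorm(200)  # stand-in for the series that should follow it

# One row per observation, with a running index and a label for the colour
series <- data.frame(
  index = seq_len(length(sigma_in) + length(sigma_out)),
  value = c(sigma_in, sigma_out),
  part  = rep(c("in", "out"), times = c(length(sigma_in), length(sigma_out)))
)

# group = 1 keeps a single connected line; colour still follows `part`
p <- ggplot(series, aes(index, value, colour = part, group = 1)) +
  geom_line() +
  scale_colour_manual(values = c("in" = "tomato4", "out" = "tomato1"))

built <- layer_data(p)  # the data as seen by the line layer
```

Because there is only one group, the segment between the last "in" point and the first "out" point is drawn as well, which two separate geom_line() layers would leave out.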

Behavior of "fill" argument in geom_polygon in R

I am trying to understand the behavior of the "fill" argument in geom_polygon for ggplot.
I have a dataframe with multiple values of a measure of interest, obtained in different counties for each state. I have merged my database with the coordinates from the "maps" package and then I call the plot via ggplot. I don't understand how ggplot chooses what color to show for a state, considering that different numbers are provided in the fill variable (mean? median? interpolation?).
Reproducing a piece of my dataframe to explain what I mean:
state=rep("Alabama",3)
counties=c("Russell","Clay","Montgomery")
long=c(-87.46201,-87.48493,-87.52503)
lat=c(30.38968,30.37249,30.33239)
group=rep(1,3)
measure=c(22,28,17)
df=data.frame(state, counties, long,lat,group,measure)
Call for ggplot
p <- ggplot()
p <- p + geom_polygon(data=df, aes(x=long, y=lat, group=group, fill=measure), colour="black")
print(p)
Using the full dataframe, I have hundreds of rows with iterations of 17 counties and all the set of coordinates for the Alabama polygon. How is it that ggplot provides the state fill with a single color?
Again, I would assume it is somehow interpolating the fill values provided at each set of coordinates, but I am not sure about it.
Thanks everyone for the help.
Through trial and error, it looks like the first value of the fill mapping is used for the fill of the polygon, while the range of the fill scale takes all values into account. This makes sense because the documentation doesn't mention any aggregation. An aggregate function would also be reasonable, but I would expect it to be set via an argument if that were the implementation.
Instead, the documentation shows an example (and recommends) starting with two data frames, one of which has coordinates for each vertex, and one which has a single row (and single fill value) per polygon, and joining them based on an ID column.
Here's a demonstration:
long=c(1, 1, 2)
lat=c(1, 2, 2)
group=rep(1,3)
df=data.frame(long,lat,group,
m1 = c(1, 1, 1),
m2 = c(1, 2, 3),
m3 = c(3, 1, 2),
m4 = c(1, 10, 11),
m5 = c(1, 5, 11),
m6 = c(11, 1, 10))
library(ggplot2)
plots = lapply(paste0("m", 1:6), function(f)
ggplot(df, aes(x = long, y = lat, group = group)) +
geom_polygon(aes_string(fill = f)) +
labs(title = sprintf("%s: %s", f, toString(df[[f]])))
)
do.call(gridExtra::grid.arrange, plots)
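The two-table layout mentioned above can be sketched as follows: collapse the county values to one number per polygon with an aggregate you choose yourself, then attach that number to every vertex. This is only an illustration; the tables vertices and measures, the ids "A" and "B", and the choice of mean() are assumptions, not part of the question.

```r
# Vertex table: one row per polygon corner (coordinates only)
vertices <- data.frame(
  id   = rep(c("A", "B"), each = 3),
  long = c(1, 1, 2, 3, 3, 4),
  lat  = c(1, 2, 2, 1, 2, 2)
)

# Measurement table: several county-level values per polygon
measures <- data.frame(
  id      = c("A", "A", "A", "B", "B"),
  measure = c(22, 28, 17, 10, 30)
)

# Collapse to a single value per polygon with an explicit aggregate
per_polygon <- aggregate(measure ~ id, data = measures, FUN = mean)

# Attach that value to every vertex; match() keeps the vertex order intact
plot_df <- vertices
plot_df$measure <- per_polygon$measure[match(plot_df$id, per_polygon$id)]

# Plot only if ggplot2 is available; every vertex of a polygon now
# carries the same fill value, so no implicit choice is left to ggplot2
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_df, aes(long, lat, group = id, fill = measure)) +
    geom_polygon(colour = "black")
}
```

Swapping mean for median or max in the aggregate() call changes the summary without touching the plotting code.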

Changing colour under particular threshold for geom_line [duplicate]

I have the following dataframe that I would like to plot. I was wondering if it is possible to color portions of the lines connecting my outcome variable (stackOne$y) differently, depending on whether the value is below a certain threshold. For example, I would like portions of the lines falling below 2.2 to be red.
set.seed(123)
stackOne = data.frame(id = rep(c(1, 2, 3), each = 3),
y = rnorm(9, 2, 1),
x = rep(c(1, 2, 3), 3))
ggplot(stackOne, aes(x = x, y = y)) +
geom_point() +
geom_line(aes(group = id))
Thanks!
You have at least a couple of options here. The first is quite simple, general (in that it's not limited to straight-line segments) and precise, but uses base plot rather than ggplot. The second uses ggplot, but is slightly more complicated, and colour transition will not be 100% precise (but near enough, as long as you specify an appropriate resolution... read on).
base:
If you're willing to use base plotting functions rather than ggplot, you could clip the plotting region to above the threshold (2.2), then plot the segments in your preferred colour, and subsequently clip to the region below the threshold, and plot again in red. While the first clip is strictly unnecessary, it prevents overplotting different colours, which can look a bit dud.
threshold <- 2.2
set.seed(123)
stackOne=data.frame(id=rep(c(1,2,3),each=3),
y=rnorm(9,2,1),
x=rep(c(1,2,3),3))
# create a second df to hold segment data
d <- stackOne
d$y2 <- c(d$y[-1], NA)
d$x2 <- c(d$x[-1], NA)
d <- d[-findInterval(unique(d$id), d$id), ] # remove last row for each group
plot(stackOne[, 3:2], pch=20)
# clip to region above the threshold
clip(min(stackOne$x), max(stackOne$x), threshold, max(stackOne$y))
segments(d$x, d$y, d$x2, d$y2, lwd=2)
# clip to region below the threshold
clip(min(stackOne$x), max(stackOne$x), min(stackOne$y), threshold)
segments(d$x, d$y, d$x2, d$y2, lwd=2, col='red')
points(stackOne[, 3:2], pch=20) # plot points again so they lie over lines
ggplot:
If you want or need to use ggplot, you can consider the following...
One solution is to use geom_line(aes(group=id, color = y < 2.2)), however this will assign colours based on the y-value of the point at the beginning of each segment. I believe you want to have the colour change not just at the nodes, but wherever a line crosses your given threshold of 2.2. I'm not all that familiar with ggplot, but one way to achieve this is to make a higher-resolution version of your data by creating new points along the lines that connect your existing points, and then use the color = y < 2.2 argument to achieve the desired effect.
For example:
threshold <- 2.2 # set colour-transition threshold
yres <- 0.01 # y-resolution (accuracy of colour change location)
d <- stackOne # for code simplification
# new cols for point coordinates of line end
d$y2 <- c(d$y[-1], NA)
d$x2 <- c(d$x[-1], NA)
d <- d[-findInterval(unique(d$id), d$id), ] # remove last row for each group
# new high-resolution y coordinates between each pair within each group
y.new <- apply(d, 1, function(x) {
seq(x['y'], x['y2'], yres*sign(x['y2'] - x['y']))
})
d$len <- sapply(y.new, length) # length of each series of points
# new high-resolution x coordinates corresponding with new y-coords
x.new <- apply(d, 1, function(x) {
seq(x['x'], x['x2'], length.out=x['len'])
})
id <- rep(seq_along(y.new), d$len) # new group id vector
y.new <- unlist(y.new)
x.new <- unlist(x.new)
d.new <- data.frame(id=id, x=x.new, y=y.new)
p <- ggplot(d.new, aes(x=x,y=y)) +
geom_line(aes(group=id, color=y < threshold))+
geom_point(data=stackOne)+
scale_color_discrete(sprintf('Below %s', threshold))
p
There may well be a way to do this through ggplot functions, but in the meantime I hope this helps. I couldn't work out how to draw a ggplotGrob into a clipped viewport (rather it seems to just scale the plot). If you want colour to be conditional on some x-value threshold instead, this would obviously need some tweaking.
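If an exact transition is preferred over a resolution parameter, the crossing point of a straight segment can be computed directly by linear interpolation: for end points (x1, y1) and (x2, y2) on opposite sides of the threshold t, the crossing is at x1 + (t - y1) / (y2 - y1) * (x2 - x1). The sketch below uses base R only; insert_crossings is a hypothetical helper written for this illustration, and it handles a single line (apply it per id for grouped data).

```r
# Hypothetical helper: insert the exact threshold crossings into one line
insert_crossings <- function(x, y, threshold) {
  out_x <- x[1]
  out_y <- y[1]
  for (i in seq_len(length(x) - 1)) {
    y1 <- y[i]
    y2 <- y[i + 1]
    # End points on opposite sides of the threshold: a crossing lies inside
    if ((y1 - threshold) * (y2 - threshold) < 0) {
      frac  <- (threshold - y1) / (y2 - y1)
      out_x <- c(out_x, x[i] + frac * (x[i + 1] - x[i]))
      out_y <- c(out_y, threshold)
    }
    out_x <- c(out_x, x[i + 1])
    out_y <- c(out_y, y2)
  }
  data.frame(x = out_x, y = out_y)
}

res <- insert_crossings(x = c(1, 2, 3), y = c(1.2, 3.2, 2.7), threshold = 2.2)

# Every piece now lies on one side of the threshold, so its midpoint
# decides the colour of the whole piece
below <- (head(res$y, -1) + res$y[-1]) / 2 < 2.2
```

The pieces can then be drawn with segments() in base graphics or geom_segment() in ggplot2, using below to pick the colour.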
Encouraged by people in my answer to a newer but related question, I'll also share an easier-to-use approximation to the problem here.
Instead of interpolating the correct values exactly, one can use ggforce::geom_link2() to interpolate lines and use after_stat() to assign the correct colours after interpolation. If you want more precision you can increase the n of that function.
library(ggplot2)
library(ggforce)
#> Warning: package 'ggforce' was built under R version 4.0.3
set.seed(123)
stackOne = data.frame(id = rep(c(1, 2, 3), each = 3),
y = rnorm(9, 2, 1),
x = rep(c(1, 2, 3), 3))
ggplot(stackOne, aes(x = x, y = y)) +
geom_point() +
geom_link2(
aes(group = id,
colour = after_stat(y < 2.2))
) +
scale_colour_manual(
values = c("black", "red")
)
Created on 2021-03-26 by the reprex package (v1.0.0)

Grouped line plots in Plotly R: how to control line color?

I have a bunch of 'paired' observations from a study for the same subject, and I am trying to build a spaghetti plot to visualize these observations as follows:
library(plotly)
df <- data.frame(id = rep(1:10, 2),
type = c(rep('a', 10), rep('b', 10)),
state = rep(c(0, 1), 10),
values = c(rnorm(10, 2, 0.5), rnorm(10, -2, 0.5)))
df <- df[order(df$id), ]
plot_ly(df, x = type, y = values, group = id, type = 'line') %>%
layout(showlegend = FALSE)
It produces the plot I am after. However, the code draws each grouped line in its own color, which is really annoying and distracting. I can't seem to find a way to get rid of the colors.
Bonus question: I actually want to use color = state and actually color the sloped lines by that variable instead.
Any approaches / thoughts?
You can set the lines to the same colour like this
plot_ly(df, x = type, y = values, group = id, type = 'scatter', mode = 'lines+markers',
line=list(color='#000000'), showlegend = FALSE)
For the 'bonus' two-for-the-price-of-one question 'how to color by a different variable to the one used for grouping':
If you were only plotting markers, and no lines, this would be simple, as you can simply provide a vector of colours to marker.color. Unfortunately, however, line.color only takes a single value, not a vector, so we need to work around this limitation.
Provided the data are not too numerous (in which case this method becomes slow, and a faster method is given below), you can set colours of each line individually by adding them as separate traces one by one in a loop (looping over id)
p <- plot_ly()
for (id in unique(df$id)) { # loop over each id once, not once per row
col <- c('#AA0000','#0000AA')[df[which(df$id==id),3][1]+1] # calculate color for this line based on the 3rd column of df (df$state).
p <- add_trace(data=df[which(df$id==id),], x=type, y=values, type='scatter', mode='markers+lines',
marker=list(color=col),
line=list(color=col),
showlegend = FALSE,
evaluate=TRUE)
}
p
Although this one-trace-per-line approach is probably the simplest way conceptually, it does become very (impractically) slow if applied to hundreds or thousands of line segments. In this case there is a faster method, which is to plot only one line per colour, but to split this line up into multiple segments by inserting NA's between the separate segments and using the connectgaps=FALSE option to break the line into segments where there are missing data.
Begin by using dplyr to insert missing values between line segments (i.e. for each unique id we add a row containing NA in the columns that provide x and y coordinates).
library(dplyr)
library(magrittr) # provides the %<>% assignment pipe
df %<>% distinct(id, .keep_all = TRUE) %>% # .keep_all keeps the other columns in current dplyr
`[<-`(,c(2,4),NA) %>%
rbind(df) %>%
arrange(id)
and plot, using connectgaps=FALSE:
plot_ly(df, x = type, y = values, group = state, type = 'scatter', mode = 'lines+markers',
showlegend = FALSE,
connectgaps=FALSE)
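The NA-gap idea does not depend on plotly. As a rough illustration in base R, the sketch below stacks all lines of one colour into a single trace with an NA row after each id; lines() breaks the trace at every NA. The data frame pairs and the helper with_gaps are hypothetical names introduced for this example.

```r
# Toy paired data: each id is one two-point line, state picks the colour
pairs <- data.frame(
  id     = rep(1:4, each = 2),
  x      = rep(c(1, 2), times = 4),
  values = c(2.1, -1.9, 1.8, -2.2, 2.4, -1.7, 1.6, -2.0),
  state  = rep(c(0, 1, 0, 1), each = 2)
)

# Hypothetical helper: stack the lines of one state, adding an NA row per id
with_gaps <- function(d) {
  pieces <- lapply(split(d, d$id), function(g) {
    rbind(g[, c("x", "values")], data.frame(x = NA, values = NA))
  })
  do.call(rbind, pieces)
}

by_state <- lapply(split(pairs, pairs$state), with_gaps)

# One lines() call per colour instead of one per id
pdf(NULL)  # null device, so the sketch also runs non-interactively
plot(pairs$x, pairs$values, type = "n", xlab = "type", ylab = "values")
cols <- c("0" = "#AA0000", "1" = "#0000AA")
for (s in names(by_state)) {
  lines(by_state[[s]]$x, by_state[[s]]$values, col = cols[[s]])
}
invisible(dev.off())
```

The number of drawing calls then scales with the number of colours rather than the number of subjects, which is the same reason the single-trace plotly version is faster.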

Is there an easy way to reverse the ordering of a scale inside an aes() inside a geom()?

I've found many examples describing the assignment of alpha when in a ggplot2 line like so:
scale_alpha( variable, trans = reverse)
ref
However, is there a method to simply invert the scale in aes() inside the geom_*()?
Something like:
geom_point(aes(colour=variableA, alpha=REVERSE(variableB)))
(This is a very old question, but I had the same issue and couldn't find an answer. The previous solution by hugh-allan is, as indicated in the Edit note, producing an incorrect legend.)
The settings of the scale should really be in the scale_alpha* parameter. That's where you manage this. The geoms are used for adding the data or setting a style for all points, not tuning a specific scale (otherwise, it would need to be inside the aes() mapping).
To be clear, there are two options in current versions of ggplot2 (using version 3.3.5):
library(ggplot2)
library(dplyr) # tibble() and %>%
library(scales) # reverse_trans()
tibble(x = 1:10, y = 1) %>%
ggplot(aes(x, y, alpha = x)) +
geom_point(size = 5) +
scale_alpha(trans = reverse_trans())
or, probably more in line with current ggplot documentation:
scale_alpha(range = c(1, 0.1))
i.e., reversing the range of the alpha scale (the default is range = c(0.1, 1)).
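To make the effect of the range argument concrete, here is a small base R sketch of a linear mapping from data values to alpha. It only mimics the idea of a continuous scale (rescale to [0, 1], then map onto the range) and is not ggplot2's internal code; alpha_for is a hypothetical helper.

```r
# Hypothetical helper: rescale x to [0, 1], then map onto range[1]..range[2]
alpha_for <- function(x, range) {
  unit <- (x - min(x)) / (max(x) - min(x))
  range[1] + unit * (range[2] - range[1])
}

x <- 1:10
forward  <- alpha_for(x, range = c(0.1, 1))  # small x faint, large x opaque
reversed <- alpha_for(x, range = c(1, 0.1))  # small x opaque, large x faint
```

Swapping the two ends of the range mirrors the mapping, and because the change is made on the scale itself, the legend stays consistent with the points.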
If I understand the question correctly, you want to reverse the scale by which alpha is assigned inside a geom...?
For example, by default lower values of x will have lower values of alpha, and appear lighter:
# sample data
tibble(
x = 1:10,
y = 1,
) %>%
ggplot(aes(x, y, alpha = x))+
geom_point(size = 5)
You can reverse it so lower values of x are darker, by using sort() inside aes():
tibble(
x = 1:10,
y = 1,
) %>%
ggplot(aes(x, y, alpha = sort(x, decreasing = TRUE)))+
geom_point(size = 5)
Edit: just realised the legend is incorrect. I guess it's ok if you don't include the legend.
