ggplot creates artefacts in plot with geom_line() - r

I have two vectors x,y and want to create a line plot for y over x.
y = c(-0.0400785, -0.0304795, -0.0208800, -0.0112805, -0.0016810, 0.0079185, 0.0175180, 0.0271175,
0.0367170, 0.0463160, 0.0559155, 0.0655150, 0.0751145, 0.0847140, 0.0943135, 0.1039130,
0.1135125, 0.1231120, 0.1327110, 0.1423105, 0.1519100)
x = c(-5.867304, -5.879089, -5.987021, -6.309500, -6.770748, -7.189354, -7.455675, -7.545589, -7.463138,
-7.371971, -7.407384, -7.461245, -7.398057, -7.192540, -7.010408, -6.961792, -6.994748, -6.971052,
-6.779542, -6.536575, -6.301766)
If I use the base plot function everything is fine.
plot(x,y, type = "l")
If I create a line plot using ggplot
library(ggplot2)
df = data.frame("y" = y, "x" = x)
ggplot(data = df) + geom_line(aes(x = x, y = y))
It introduces strange artefacts, as if the x values were paired with the wrong y values. Has anyone had the same experience or, even better, a solution to this problem?

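You have to use geom_path. geom_line() connects the observations in order of the x variable, whereas geom_path() connects them in the order in which they appear in the data, which is what base plot(x, y, type = "l") does: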
ggplot(data = df) +
geom_path(aes(x = x, y = y))
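To see where the artefacts come from, the sketch below reuses the vectors from the question and sorts the rows by x before drawing them with geom_path(). Under the assumption that geom_line() orders observations along x, this should reproduce the zig-zag plot. The object names (df_sorted, p_path and so on) are made up for illustration, and the orientation = "y" variant assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

y <- c(-0.0400785, -0.0304795, -0.0208800, -0.0112805, -0.0016810, 0.0079185,
       0.0175180, 0.0271175, 0.0367170, 0.0463160, 0.0559155, 0.0655150,
       0.0751145, 0.0847140, 0.0943135, 0.1039130, 0.1135125, 0.1231120,
       0.1327110, 0.1423105, 0.1519100)
x <- c(-5.867304, -5.879089, -5.987021, -6.309500, -6.770748, -7.189354,
       -7.455675, -7.545589, -7.463138, -7.371971, -7.407384, -7.461245,
       -7.398057, -7.192540, -7.010408, -6.961792, -6.994748, -6.971052,
       -6.779542, -6.536575, -6.301766)
df <- data.frame(x = x, y = y)

# Rows in data order: this is what plot(x, y, type = "l") draws
p_path <- ggplot(df) + geom_path(aes(x = x, y = y))

# Rows sorted by x: drawing these in order mimics what geom_line() does
df_sorted <- df[order(df$x), ]
p_sorted <- ggplot(df_sorted) + geom_path(aes(x = x, y = y))

# Alternative sketch: keep geom_line() but order along y instead of x
p_line_y <- ggplot(df) + geom_line(aes(x = x, y = y), orientation = "y")
```

Because y is strictly increasing here, ordering along y gives the same path as the data order, so either p_path or p_line_y should match the base plot.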

Related

How to return either a vector or string based on condition in ifelse statement?

I am trying to write a function that creates a scatterplot - of which the points may need to be colored based on a variable or not.
I tried the following approach, but it doesn't color the points by group, although the code runs fine without the ifelse statement.
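library(ggplot2)
library(tidyr)
# x and y1 are created first: columns cannot refer to each other inside data.frame()
x <- rnorm(100, sd = 2)
y1 <- x * 0.5 + rnorm(100, sd = 1)
data <- data.frame(x = x,
y1 = y1,
y2 = fitted(lm(y1 ~ x))) %>%
pivot_longer(cols = -x,
names_to = "Group",
values_to = "yy")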
group <- "Group"
ygroups <- 2
defaultcol = "black"
ggplot(data = data, mapping = aes(x = x , y = yy,
color = ifelse(ygroups > 1, get(group), defaultcol))) +
geom_point()
# runs fine
ggplot(data = data, mapping = aes(x = x , y = yy, color = get(group))) +
geom_point()
You don't want to use ifelse in this case, because ifelse returns a result of the same length as its test (here, length one), while you need to return a vector of a different length than your input. Just use a regular if/else:
ggplot(data = data) +
aes(x = x , y = yy, color = if(ygroups > 1) get(group) else defaultcol) +
geom_point() +
labs(color="Color")
But you can't set specific default colors inside aes(color = ): the color name is treated as data and remapped through your color scale. If you just want to add the color mapping conditionally, then do
ggplot(data = data) +
aes(x = x , y = yy) +
{if( ygroups > 1) aes(color=.data[[group]])} +
geom_point()
(using .data[[ ]] is recommended over using get())
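The length issue can be checked without any plotting. The following base R sketch uses a made-up group_values vector in place of the Group column; only ygroups and defaultcol come from the question.

```r
# Hypothetical stand-in for the Group column
group_values <- c("y1", "y2", "y1", "y2")
ygroups <- 2
defaultcol <- "black"

# ifelse() is vectorised over its test, which has length one here,
# so only the first element of group_values is returned
res_ifelse <- ifelse(ygroups > 1, group_values, defaultcol)

# if/else returns whichever branch is selected, at its full length
res_if <- if (ygroups > 1) group_values else defaultcol

length(res_ifelse)  # 1
length(res_if)      # 4
```

This is why the ifelse version colors every point the same: the single returned value is recycled across all rows.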

How to make scatter plot points into numbers?

I am creating a scatter plot using ggplot/geom_point. Here is my code for building the function in ggplot.
AddPoints <- function(x) {
list(geom_point(data = dat , mapping = aes(x = x, y = y) , shape = 1 , size = 1.5 ,
color = "blue"))
}
I am wondering if it would be possible to replace the standard points on the plot with numbers. That is, instead of seeing a dot on the plot, you would see a number on the plot to represent each observation. I would like that number to correspond to a column for that given observation (column name 'RP'). Thanks in advance.
Sample data.
Data <- data.frame(
X = sample(1:10),
Y = sample(3:12),
RP = sample(c(4,8,9,12,3,1,1,2,7,7)))
Use geom_text() and map the RP column to the label aesthetic.
ggplot(Data, aes(x = X, y = Y, label = RP)) +
geom_text()
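If the layer should live in a helper like the AddPoints function from the question, a rough equivalent might look like the sketch below. AddNumbers is a hypothetical name, and the size and color values are arbitrary choices, not something the answer specifies.

```r
library(ggplot2)

Data <- data.frame(
  X = sample(1:10),
  Y = sample(3:12),
  RP = sample(c(4, 8, 9, 12, 3, 1, 1, 2, 7, 7)))

# Hypothetical helper: returns a list of layers that draws each
# observation as its RP value instead of a point
AddNumbers <- function(dat) {
  list(geom_text(data = dat,
                 mapping = aes(x = X, y = Y, label = RP),
                 size = 4, color = "blue"))
}

p <- ggplot() + AddNumbers(Data)
```

geom_label() could be swapped in for geom_text() if a box around each number is wanted.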

Create a graph from a binary column in a dataframe - R

I need to create a point graph using the "ggplot" library based on a binary column of a dataframe.
df <- c(1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1)
I need a point to be created every time the value "1" appears in the column, and all points are on the same graph. Thanks.
If the binary column you are talking about is associated with some other variables, then I think this might work:
(I've just created some random x and y which are the same length as the binary 0, 1s you provided)
x <- rnorm(22)
y <- x^2 + rnorm(22, sd = 0.3)
df <- data.frame("x" = x, "y" = y,
"binary" = c(1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1))
library(ggplot2)
# this is the plot with all the points
ggplot(data = df, mapping = aes(x = x, y = y)) + geom_point()
# this is the plot with only the points for which the "binary" variable is 1
ggplot(data = subset(df, binary == 1), mapping = aes(x = x, y = y)) + geom_point()
# this is the plot with all points where they are coloured by whether "binary" is 0 or 1
ggplot(data = df, mapping = aes(x = x, y = y, colour = as.factor(binary))) + geom_point()
Something like this?
library(ggplot2)
y <- df
is.na(y) <- y == 0
ggplot(data = data.frame(x = seq_along(y), y), mapping = aes(x, y)) +
geom_point() +
scale_y_continuous(breaks = c(0, 1),
labels = c("0" = "0", "1" = "1"),
limits = c(0, 1))
It only plots points where df == 1, not the zeros. If you also want those, don't run the code line starting is.na(y).
Not sure exactly what you are asking, but here are a few options. Since your data structure is not a data frame, I've renamed it test. First, dotplot with ggplot:
library(ggplot2)
ggplot(as.data.frame(test), aes(x=test)) + geom_dotplot()
Or you could do the same thing as a bar:
qplot(test, geom="bar")
Or, a primitive base R quick look:
plot(test, pch=16, cex=3)

How plot new point in ggplot with older color data?

I know similar questions have been asked before, but my question is different. Consider data points data1 whose colors depend on their x and y coordinates, which I plot with ggplot:
x = 1:100
y = 1:100
d = expand.grid(x,y)
data1 <- data.frame(
xval = d$Var1,
yval = d$Var2,
col = d$Var1+d$Var2)
data2 <- data.frame(
xnew = c(1.5, 90.5),
ynew = c(95.5, 4))
ggplot(data1, aes(xval, yval, colour = col)) + geom_point()
But I don't want the last line to plot the data1 points. Instead, I want to plot the data2 points using the colors of data1. For example, I sketched what I want to plot for data2:
I changed the last line to:
ggplot(data1, aes(xval, yval, colour = col)) +
geom_point(data = data2, aes(x = xnew, y = ynew))
Now I expect ggplot to draw just the 2 points of data2, but I get an error:
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Error: Column colour must be a 1d atomic vector or a list
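The problem is that there is no mapping between col in data1 and your data2: data2 has no col column, so the inherited colour = col resolves to the base R function col(), hence the "object of type function" message.
Please try the following:
# use a colour scale (not a fill scale), because geom_point() is coloured via the colour aesthetic
ggplot(data2, aes(x = xnew, y = ynew, colour = xnew)) + geom_point() +
scale_colour_gradientn(colours=c(2,1),
values = range(data1$xval),
rescaler = function(x,...) x,
oob = identity)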
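Another way to read the question is that the new points should get the colors they would have had in the data1 plot. A minimal sketch of that idea follows, assuming the color rule is the same x + y sum used to build data1$col and that the default continuous color scale is acceptable. The col column added to data2 is an addition for illustration.

```r
library(ggplot2)

d <- expand.grid(1:100, 1:100)
data1 <- data.frame(xval = d$Var1, yval = d$Var2, col = d$Var1 + d$Var2)
data2 <- data.frame(xnew = c(1.5, 90.5), ynew = c(95.5, 4))

# Assumption: new points follow the same rule as data1 (col = x + y)
data2$col <- data2$xnew + data2$ynew

p <- ggplot(data2, aes(x = xnew, y = ynew, colour = col)) +
  geom_point() +
  # fix the scale limits to the range of data1 so the colours are comparable
  scale_colour_gradient(limits = range(data1$col)) +
  # keep the same axis extent as the data1 plot
  coord_cartesian(xlim = range(data1$xval), ylim = range(data1$yval))
```

Fixing the limits to range(data1$col) is what ties the two points to the original color scale; without it, the scale would stretch over just the two new values.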

How to merge color, line style and shape legends in ggplot

Suppose I have the following plot in ggplot:
It was generated using the code below:
x <- seq(0, 10, by = 0.2)
y1 <- sin(x)
y2 <- cos(x)
y3 <- cos(x + pi / 4)
y4 <- sin(x + pi / 4)
df1 <- data.frame(x, y = y1, Type = as.factor("sin"), Method = as.factor("method1"))
df2 <- data.frame(x, y = y2, Type = as.factor("cos"), Method = as.factor("method1"))
df3 <- data.frame(x, y = y3, Type = as.factor("cos"), Method = as.factor("method2"))
df4 <- data.frame(x, y = y4, Type = as.factor("sin"), Method = as.factor("method2"))
df.merged <- rbind(df1, df2, df3, df4)
ggplot(df.merged, aes(x, y, colour = interaction(Type, Method), linetype = Method, shape = Type)) + geom_line() + geom_point()
I would like to have only one legend that correctly displays the shapes, the colors and the line types (the interaction(Type, Method) legends is the closest to what I would like, but it does not have the correct shapes/line types).
I know that if I use scale_xxx_manual and specify the same labels for all legends they will be merged, but I don't want to have to set the labels manually: if there are new Methods or Types, I don't want to have to modify my code. I want something generic.
Edit
As pointed out in the answers below, there are several ways to get the job done in this particular case. All proposed solutions require manually setting the legend line types and shapes, either by using scale_xxx_manual functions or with the guides function.
However, the proposed solutions still don't work in the general case. For instance, if I add a new data frame to the data set with a new "method3" Method, they no longer work, and we have to manually add the new legend shapes and line types:
y5 <- sin(x - pi / 4)
df5 <- data.frame(x, y = y5, Type = as.factor("sin"), Method = as.factor("method3"))
df.merged <- rbind(df1, df2, df3, df4, df5)
override.shape <- c(16, 17, 16, 17, 16)
override.linetype <- c(1, 1, 3, 3, 4)
g <- ggplot(df.merged, aes(x, y, colour = interaction(Type, Method), linetype = Method, shape = Type)) + geom_line() + geom_point()
g <- g + guides(colour = guide_legend(override.aes = list(shape = override.shape, linetype = override.linetype)))
# guide = "none" replaces the deprecated guide = FALSE
g <- g + scale_shape(guide = "none")
g <- g + scale_linetype(guide = "none")
print(g)
This gives:
Now the question is: how to automatically generate the override.shape and override.linetype vectors?
Note that the vector size is 5 because we have 5 curves, while the interaction(Type, Method) factor has size 6 (I don't have data for the cos/method3 combination)
Use labs() and set the same value for all aesthetics defining the appearance of geoms.
library('ggplot2')
ggplot(iris) +
aes(x = Sepal.Length, y = Sepal.Width,
color = Species, linetype = Species, shape = Species) +
geom_line() +
geom_point() +
labs(color = "Guide name", linetype = "Guide name", shape = "Guide name")
The R Cookbook section on Legends explains:
If you use both colour and shape, they both need to be given scale
specifications. Otherwise there will be two separate legends.
In your case you need specifications for shape and linetype.
Edit
It is important that the same variable drives the shapes, colors and line types, so I replaced your interaction() call by defining the combined column directly. Instead of scale_linetype_discrete, I used scale_linetype_manual to specify the values, since by default the line types would take on four different values.
If you would like a detailed layout of all possible shapes and line types, check this R Graphics site to see all of the number identifiers:
df.merged$int <- paste(df.merged$Type, df.merged$Method, sep=".")
ggplot(df.merged, aes(x, y, colour = int, linetype=int, shape=int)) +
geom_line() +
geom_point() +
scale_colour_discrete("") +
scale_linetype_manual("", values=c(1,2,1,2)) +
scale_shape_manual("", values=c(17,17,16,16))
Here is the solution in the general case:
# Create the data frames
x <- seq(0, 10, by = 0.2)
y1 <- sin(x)
y2 <- cos(x)
y3 <- cos(x + pi / 4)
y4 <- sin(x + pi / 4)
y5 <- sin(x - pi / 4)
df1 <- data.frame(x, y = y1, Type = as.factor("sin"), Method = as.factor("method1"))
df2 <- data.frame(x, y = y2, Type = as.factor("cos"), Method = as.factor("method1"))
df3 <- data.frame(x, y = y3, Type = as.factor("cos"), Method = as.factor("method2"))
df4 <- data.frame(x, y = y4, Type = as.factor("sin"), Method = as.factor("method2"))
df5 <- data.frame(x, y = y5, Type = as.factor("sin"), Method = as.factor("method3"))
# Merge the data frames
df.merged <- rbind(df1, df2, df3, df4, df5)
# Create the interaction
type.method.interaction <- interaction(df.merged$Type, df.merged$Method)
# Compute the number of types and methods
nb.types <- nlevels(df.merged$Type)
nb.methods <- nlevels(df.merged$Method)
# Set the legend title
legend.title <- "My title"
# Initialize the plot
g <- ggplot(df.merged, aes(x,
y,
colour = type.method.interaction,
linetype = type.method.interaction,
shape = type.method.interaction)) + geom_line() + geom_point()
# Here is the magic
g <- g + scale_color_discrete(legend.title)
g <- g + scale_linetype_manual(legend.title,
values = rep(1:nb.types, nb.methods))
g <- g + scale_shape_manual(legend.title,
values = 15 + rep(1:nb.methods, each = nb.types))
# Display the plot
print(g)
The result is the following:
Sine curves are drawn as solid lines and cosine curves as dashed lines.
"method1" data use filled circles for the shape.
"method2" data use filled triangles for the shape.
"method3" data use filled diamonds for the shape.
The legend matches the curves.
To summarize, the tricks are:
Use the Type/Method interaction for all data representations (colour, shape, linetype, etc.).
Then manually set both the curve styles and the legend styles with scale_xxx_manual.
scale_xxx_manual allows you to provide a values vector that is longer than the actual number of curves, so it's easy to compute the style vector values from the sizes of the Type and Method factors.
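The positional values above line up because the missing cos/method3 combination happens to be the last level. As a variation, the style vectors can be named by interaction level so that they match by name rather than by position. This is only a sketch under the same conventions (line type from Type, shape from Method). The names int, combos, linetype.values and shape.values are introduced here for illustration.

```r
library(ggplot2)

x <- seq(0, 10, by = 0.2)
df.merged <- rbind(
  data.frame(x, y = sin(x),          Type = "sin", Method = "method1"),
  data.frame(x, y = cos(x),          Type = "cos", Method = "method1"),
  data.frame(x, y = cos(x + pi / 4), Type = "cos", Method = "method2"),
  data.frame(x, y = sin(x + pi / 4), Type = "sin", Method = "method2"),
  data.frame(x, y = sin(x - pi / 4), Type = "sin", Method = "method3")
)
df.merged$Type <- factor(df.merged$Type, levels = c("sin", "cos"))
df.merged$Method <- factor(df.merged$Method)
# drop = TRUE removes combinations without data (here cos.method3)
df.merged$int <- interaction(df.merged$Type, df.merged$Method, drop = TRUE)

# One row per Type/Method combination that actually occurs
combos <- unique(df.merged[, c("Type", "Method", "int")])

# Named style vectors: line type follows Type, shape follows Method
linetype.values <- setNames(as.integer(combos$Type), as.character(combos$int))
shape.values <- setNames(15L + as.integer(combos$Method), as.character(combos$int))

legend.title <- "My title"
g <- ggplot(df.merged, aes(x, y, colour = int, linetype = int, shape = int)) +
  geom_line() +
  geom_point() +
  scale_colour_discrete(legend.title) +
  scale_linetype_manual(legend.title, values = linetype.values) +
  scale_shape_manual(legend.title, values = shape.values)
```

With named vectors, adding a new Method or Type only changes the data, and a missing combination anywhere in the level order should no longer shift the styles.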
One just needs to give both guides the same name. For example:
g+ scale_linetype_manual(name="Guide1",values= c('solid', 'solid', 'dotdash'))+
scale_colour_manual(name="Guide1", values = c("blue", "green","red"))
The code below results in the desired legend, if I understand your question, but I'm not sure I understand the label issue, so let me know if this isn't what you were looking for.
p = ggplot(df.merged, aes(x, y, colour=interaction(Type, Method),
linetype=interaction(Type, Method),
shape=interaction(Type, Method))) +
geom_line() +
geom_point()
p + scale_shape_manual(values=rep(16:17, 2)) +
scale_linetype_manual(values=rep(c(1,3),each=2))
