I am very new to R and I am trying to add a third variable to a plot using ggplot2. I have searched for an answer but could not find anything similar (or I didn't know the right words to search for).
I have three columns of data which will be my x, y and z variable.
I want a graph that can show the values for x and y axis (as in the first and second column variables). However, I want the "points" (as a scatter plot) in the graph to be the values shown in variable z. Is there a way of doing that?
Everything that I have tried plots x against y.
Thanks for any help!
I believe this is what you are asking: map two variables (x, y) to the axes and display a third variable as text at each point.
Let's use this data frame. We'll try to "write" X1 and X3 as the point labels.
df <- data.frame(X1 = 1:5, X2 = 2 * 1:5, X3 = rnorm(5))
With base graphics, pch can draw only a single character per point:
plot(df$X1, df$X2, pch = paste(df$X1))
plot(df$X1, df$X2, pch = paste(df$X3))
The first call works, but the second, whose labels are longer than one character, doesn't work well.
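If you want to stay in base graphics, one option is to draw an empty plot and then place the labels with text(), which is not limited to a single character. This is only a sketch: the fixed seed and the rounding of X3 to two decimals are my own choices for readability, not part of the answer above.

```r
# Sketch: label points in base graphics with text() instead of pch.
set.seed(42)                                   # assumption: fixed seed
df <- data.frame(X1 = 1:5, X2 = 2 * 1:5, X3 = rnorm(5))

labels <- format(round(df$X3, 2), nsmall = 2)  # assumption: two decimals

plot(df$X1, df$X2, type = "n")                 # set up the axes, draw no points
text(df$X1, df$X2, labels = labels)            # write one label per (X1, X2) pair
```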
Using ggplot2:
library(ggplot2)
ggplot(df, aes(x = X1, y = X2)) + geom_text(aes(label = X1))
ggplot(df, aes(x = X1, y = X2)) + geom_text(aes(label = X3))
A fancier alternative is to add colour in aes():
ggplot(df, aes(x = X1, y = X2, colour = X3)) + geom_text(aes(label = X3))
I want the "points" (as a scatter plot) in the graph to be the values shown in variable z. Is there a way of doing that?
Definitely. The bit that you need to think about is how to present the data in your z variable. By that I mean do you want the information in z to be shown by the points' colour, size or area? There are some great examples of how to do this at the R cookbook.
If you have a data frame called my.data, which has columns x, y, and z, you need to set up your plot like this:
my.plot <- ggplot(data = my.data,
aes(x = x,
y = y))
The example above says "plot the data in my.data using my.data$x to set the x location and my.data$y to set the y location". If your x variable was grid.x and y was grid.y you would have
my.plot <- ggplot(data = my.data,
aes(x = grid.x,
y = grid.y))
Then you need to add your points. This time we'll assume that the information in z is going to be used to set the colour of the points, which in this case is the colour aesthetic:
my.plot <- my.plot + geom_point(aes(colour = z))
print(my.plot)
And that should be that. You don't need to tell geom_point() what x and y are, because you already did that when you set up the plot.
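Putting the pieces together, a self-contained sketch might look like the following. The column names x, y and z follow the answer; the data values are made up for illustration, and the second plot (z mapped to size) is my own addition to show one of the other options mentioned above.

```r
library(ggplot2)

# Hypothetical data: the names x, y, z follow the answer above,
# the values are random and only there so the example runs.
set.seed(1)
my.data <- data.frame(x = runif(20), y = runif(20), z = rnorm(20))

# z drives the colour of each point
colour.plot <- ggplot(data = my.data, aes(x = x, y = y)) +
  geom_point(aes(colour = z))

# z drives the size of each point instead
size.plot <- ggplot(data = my.data, aes(x = x, y = y)) +
  geom_point(aes(size = z))

print(colour.plot)
print(size.plot)
```

In both cases x and y are inherited from the ggplot() call, so geom_point() only needs the extra aesthetic.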
I can simply plot a vector in R language using plot, like this:
vec <- sqrt(1:100)
plot(vec, type = "l")
I want to plot this vector using ggplot2 instead, because its plots look better, but I'm struggling with it. Any help would be appreciated.
Try this:
ggplot(as.data.frame(vec)) +
geom_line(aes(seq_along(vec), vec))
It works, but I would advise you to create a data frame before making the plot.
Let's say you want a plot with lines and/or points. One way to have control over what you do is:
1. Create a dataframe with an x column and a y column
df <- data.frame(x = 1:100, y = 1:100)
2. Pass the dataframe to ggplot()
ggplot(df)
3. Add the geom you want by defining x and y in aes
ggplot(df) +
  geom_point(aes(x = x, y = y)) +
  geom_line(aes(x = x, y = y))
4. Customize your plot
Note 1: in step 3 you can define aes also in ggplot to not repeat the code:
ggplot(df, aes(x = x, y = y)) +
geom_line() +
geom_point()
Note 2: in aes, the x and y on the left of = are the names of the parameters, while the x and y on the right of = are the names of the columns of the dataframe. The parameter names can be omitted, leaving only the column names, because x and y are the first two arguments of aes.
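The "customize your plot" step has no example above, so here is a minimal sketch of what it could look like. The colour, point size, titles and theme are arbitrary choices of mine, not recommendations from the answer.

```r
library(ggplot2)

df <- data.frame(x = 1:100, y = 1:100)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_line(colour = "steelblue") +     # fixed colour, set outside aes()
  geom_point(size = 0.8) +              # smaller points than the default
  labs(title = "y against x",           # assumed title and axis labels
       x = "x value",
       y = "y value") +
  theme_minimal()                       # one of the built-in themes

print(p)
```

Settings that apply to the whole layer (like colour = "steelblue") go outside aes(); only mappings from data columns go inside it.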
Thanks to the good answer of Leonardo, the short answer to my question (to graph a vector vec via ggplot) would be this:
d <- data.frame(x = seq_along(vec), y = vec)
ggplot(d, aes(x, y)) + geom_line()
I am struggling to plot a quadratic fit in ggplot: where x-values overlap, the line jumps back and forth between the upper and lower sides of the curve.
However, the same plot works in base graphics, which makes me think I am overlooking something (possibly really stupid) in ggplot. Could anybody show me how to get a proper line in ggplot?
Unfortunately I don't know how to reproduce the exact problem, so here is code for a similarly shaped "curve":
library(ggplot2)
x1 <- log(c(1:100, 99:1))
y1 <- log(seq(0.22, 0.2, length.out = 199))
dat <- data.frame(x = x1, y = y1)
ggplot(data = dat, aes(x = x, y = y)) + geom_line()
plot(y1 ~ x1, type = "l")
Thanks a lot in advance!
Try geom_path() instead.
library(ggplot2)
x1 <- log(c(1:100, 99:1))
y1 <- log(seq(0.22, 0.2, length.out = 199))
dat <- data.frame(x = x1, y = y1)
ggplot(data = dat, aes(x = x, y = y)) + geom_path()
plot(y1 ~ x1, type = "l")
geom_path() connects the observations in the order in which they appear in the data. geom_line() connects them in order of the variable on the x axis.
Documentation.
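One way to see the difference between the two geoms is to inspect the data each layer will actually draw. The sketch below uses layer_data() from ggplot2 for that; the variable names p_line, p_path, line_x and path_x are mine.

```r
library(ggplot2)

x1 <- log(c(1:100, 99:1))
y1 <- log(seq(0.22, 0.2, length.out = 199))
dat <- data.frame(x = x1, y = y1)

p_line <- ggplot(dat, aes(x = x, y = y)) + geom_line()
p_path <- ggplot(dat, aes(x = x, y = y)) + geom_path()

# layer_data() returns the data as each geom will draw it
line_x <- layer_data(p_line)$x
path_x <- layer_data(p_path)$x

is.unsorted(line_x)  # checks whether geom_line() left the rows out of x order
is.unsorted(path_x)  # same check for geom_path(), which keeps the row order
```

Because x rises and then falls in this data, sorting by x interleaves the two halves of the curve, which is what produces the zig-zag with geom_line().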
Consider this minimum working example:
library(ggplot2)
x <- c(1,2,3,4,5,6)
y <- c(3,2,5,1,3,1)
data <- data.frame(x,y)
pClass <- c(0,1,1,2,2,0)
plottedGraph <- ggplot(data, aes(x = x, y = y, colour = factor(pClass))) + geom_line()
print(plottedGraph)
I have a time series y = f(x) where x is a timestep. Each timestep should have a color which depends on the category of the timestep, recorded in pClass.
This is the result it gives:
It doesn't make any kind of sense to me why ggplot would connect points with the same color together and not points that follow each other (which is what geom_line should do according to the documentation).
How do I make it plot the following:
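You should use group = 1 inside aes() to tell ggplot that the different colours in fact belong to the same line (i.e. the same group). By default, ggplot forms one group per level of any discrete variable in the mapping, which is why each colour was drawn as its own line.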
ggplot(data, aes(x = x, y = y, colour = factor(pClass), group = 1)) +
geom_line()
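To check the grouping for yourself, you can count the groups ggplot assigns with and without group = 1. This is a sketch using the question's data; storing pClass inside the data frame (here called dat) is my own tidying and not required.

```r
library(ggplot2)

# Same values as the question, with pClass kept in the data frame
dat <- data.frame(x = 1:6,
                  y = c(3, 2, 5, 1, 3, 1),
                  pClass = factor(c(0, 1, 1, 2, 2, 0)))

p_default <- ggplot(dat, aes(x = x, y = y, colour = pClass)) +
  geom_line()
p_single <- ggplot(dat, aes(x = x, y = y, colour = pClass, group = 1)) +
  geom_line()

# Number of separate lines ggplot will draw in each case
n_default <- length(unique(layer_data(p_default)$group))
n_single  <- length(unique(layer_data(p_single)$group))
n_default
n_single
```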
How can I fill a geom_violin plot in ggplot2 with different colors based on a fixed cutoff?
For instance, given the setup:
library(ggplot2)
set.seed(123)
dat <- data.frame(x = rep(1:3,each = 100),
y = c(rnorm(100,-1),rnorm(100,0),rnorm(100,1)))
dat$f <- with(dat,ifelse(y >= 0,'Above','Below'))
I'd like to take this basic plot:
ggplot() +
geom_violin(data = dat,aes(x = factor(x),y = y))
and simply have each violin colored differently above and below zero. The naive thing to try, mapping the fill aesthetic, splits and dodges the violin plots:
ggplot() +
geom_violin(data = dat,aes(x = factor(x),y = y, fill = f))
which is not what I want. I'd like a single violin plot at each x value, but with the interior filled with different colors above and below zero.
Here's one way to do this.
library(ggplot2)
library(plyr)
#Data setup
set.seed(123)
dat <- data.frame(x = rep(1:3,each = 100),
y = c(rnorm(100,-1),rnorm(100,0),rnorm(100,1)))
First we'll use ggplot2::ggplot_build to capture all the calculated variables that go into plotting the violin plot:
p <- ggplot() +
geom_violin(data = dat,aes(x = factor(x),y = y))
p_build <- ggplot2::ggplot_build(p)$data[[1]]
Next, if we take a look at the source code for geom_violin we see that it does some specific transformations of this computed data frame before handing it off to geom_polygon to draw the actual outlines of the violin regions.
So we'll mimic that process and simply draw the filled polygons manually:
#This comes directly from the source of geom_violin
p_build <- transform(p_build,
xminv = x - violinwidth * (x - xmin),
xmaxv = x + violinwidth * (xmax - x))
p_build <- rbind(plyr::arrange(transform(p_build, x = xminv), y),
plyr::arrange(transform(p_build, x = xmaxv), -y))
I'm omitting a small detail from the source code about duplicating the first row in order to ensure that the polygon is closed.
Now we do two final modifications:
#Add our fill variable
p_build$fill_group <- ifelse(p_build$y >= 0,'Above','Below')
#This is necessary to ensure that instead of trying to draw
# 3 polygons, we're telling ggplot to draw six polygons
p_build$group1 <- with(p_build,interaction(factor(group),factor(fill_group)))
And finally plot:
#Note the use of the group aesthetic here with our computed version,
# group1
p_fill <- ggplot() +
geom_polygon(data = p_build,
aes(x = x,y = y,group = group1,fill = fill_group))
p_fill
Note that in general, this will clobber nice handling of any categorical x axis labels. So you will often need to do the plot using a continuous x axis and then if you need categorical labels, add them manually.
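As a sketch of adding categorical labels manually on a continuous x axis, scale_x_continuous() accepts explicit breaks and labels. The polygon data below is a hypothetical stand-in for p_build (three diamonds centred at x = 1, 2, 3), and the label text is an assumption.

```r
library(ggplot2)

# Hypothetical polygon data standing in for p_build:
# three diamonds centred at x = 1, 2, 3.
centres <- 1:3
poly <- do.call(rbind, lapply(centres, function(cx) {
  data.frame(x = cx + c(0, 0.3, 0, -0.3),
             y = c(-1, 0, 1, 0),
             group = cx)
}))

p_labelled <- ggplot(poly, aes(x = x, y = y, group = group)) +
  geom_polygon(fill = "grey70") +
  # One tick per centre, each with a categorical-looking label
  scale_x_continuous(breaks = centres,
                     labels = c("Group 1", "Group 2", "Group 3"))

print(p_labelled)
```

The same scale_x_continuous() call could be added to p_fill, with breaks at the violin centres.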
I have a large matrix mdat (1000 rows and 16 columns) whose first column is the x variable and whose other columns are y variables. I want to make scatter plots in R with 15 figures in the same window. For example:
mdat <- matrix(c(1:50), nrow = 10, ncol=5)
In the above matrix, I have 10 rows and 5 columns. Is it possible to use the first column as the variable on the x axis and the other columns as variables on the y axis, so that I get four different scatter plots in the same window? Note that I would prefer not to use par(mfrow=, because then I have to run each graph separately and arrange them in the same window. What I need is a package to which I just give the data and the x and y variables, and which puts the graphs in the same window.
Is there some package available that can do this? I cannot find one.
Perhaps the simplest base R way is mfrow (or mfcol)
par(mfrow = c(2, 2)) ## the window will have 2 rows and 2 columns of plots
for (i in 2:ncol(mdat)) plot(mdat[, 1], mdat[, i])
See ?par for everything you might want to know about further adjustments.
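For the full 16-column case you may not want to work out the grid by hand. A possible sketch uses grDevices::n2mfrow() to suggest a layout; the stand-in matrix, the reduced margins and the pch = "." plotting symbol are my own assumptions.

```r
# Assumption: a stand-in matrix with the shape described in the question
# (first column x, remaining 15 columns y); the values are random.
set.seed(1)
mdat <- cbind(seq_len(1000), matrix(rnorm(1000 * 15), nrow = 1000))

n_panels <- ncol(mdat) - 1
dims <- grDevices::n2mfrow(n_panels)    # suggested rows x columns grid

old_par <- par(mfrow = dims, mar = c(2, 2, 1, 1))  # smaller margins (my choice)
for (i in 2:ncol(mdat)) plot(mdat[, 1], mdat[, i], pch = ".")
par(old_par)                             # restore the previous settings
```

The loop draws all panels in one go, so nothing has to be run graph by graph.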
Another good option in base R is layout (the help has some nice examples). To be fancy and pretty, you could use the ggplot2 package, but you'll need to reshape your data into a long format.
require(ggplot2)
require(reshape2)
molten <- melt(as.data.frame(mdat), id = "V1")
ggplot(molten, aes(x = V1, y = value)) +
facet_wrap(~ variable, nrow = 2) +
geom_point()
Alternatively with colors instead of facets:
ggplot(molten, aes(x = V1, y = value, color = variable)) +
geom_point()
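If "long format" is unfamiliar, it can help to look at what melt() returns before plotting it. A small sketch on the question's example matrix (the intermediate name wide is mine):

```r
library(reshape2)

mdat <- matrix(1:50, nrow = 10, ncol = 5)

wide <- as.data.frame(mdat)             # columns V1 ... V5
molten <- melt(wide, id.vars = "V1")    # V1 is kept, V2 ... V5 are stacked

str(molten)    # columns: V1, variable, value
nrow(molten)   # one row per (x value, y column) pair
```

Each original y column becomes a level of variable, which is what facet_wrap(~ variable) and color = variable rely on.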
@user4299 You can rewrite shujaa's ggplot command using qplot ("quick plot"), which is easier when starting out. The first command below reproduces the faceted plot from shujaa's answer. The second uses variable to drive the colour instead of the facets, which gives you all the points on one plot with different colours and a legend.
qplot(data = molten, x = V1, y = value, facets = . ~ variable, geom = "point")
qplot(data = molten, x = V1, y = value, color = variable, geom = "point")
Maybe
library(lattice)
x = mdat[,1]; y = mdat[,-1]
df = data.frame(X = x, Y = as.vector(y),
Grp = factor(rep(seq_len(ncol(y)), each=length(x))))
xyplot(Y ~ X | Grp, df)