I have a plot(x, y) and I want to add a vertical line at x = 2 ONLY from y = 1 to 4. I want to use the lines() function but I'm having trouble limiting the y-range.
What's an easy way to do this?
Here's a simple example of using plot and lines. To draw a line from (2, 1) to (2, 4), you have to provide the x coordinates and y coordinates as (2, 2) and (1, 4):
plot(1:5)
lines(c(2, 2), c(1, 4))
ggplot2 offers a very simple solution, too!
library(ggplot2)
library(magrittr) # provides %>%, which ggplot2 itself does not export
set.seed(1)
# Create some dummy data
data.frame(X = rpois(n = 10, lambda = 3),
Y = rpois(n = 10, lambda = 2)) %>%
# Pipe to ggplot
ggplot(aes(X, Y)) +
geom_point() +
geom_segment(aes(x = 1, xend = 1, y = 1, yend = 4), color = "red")
Within the aesthetics call to geom_segment() you can select the start and end points for your x and y parameters. You can then easily add multiple segments by simply adding + geom_segment(aes(...)) to the end of the code above.
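As an alternative to chaining several geom_segment() calls, the segment endpoints can be kept in their own data frame and drawn by a single layer. This is only a sketch: the segs data frame and its values are made up for illustration. Passing the endpoints as data also avoids a side effect of putting constants inside aes(), where the same segment is drawn once for every row of the plot data.

```r
library(ggplot2)

set.seed(1)
pts <- data.frame(X = rpois(n = 10, lambda = 3),
                  Y = rpois(n = 10, lambda = 2))

# Hypothetical segment endpoints, one row per vertical segment
segs <- data.frame(x = c(1, 3), y = c(1, 0), yend = c(4, 2))

p <- ggplot(pts, aes(X, Y)) +
  geom_point() +
  # inherit.aes = FALSE: this layer ignores the X/Y mapping of the points
  geom_segment(data = segs,
               aes(x = x, xend = x, y = y, yend = yend),
               color = "red", inherit.aes = FALSE)
print(p)
```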
For completeness, there is also a base graphics function in R that will do this: segments(x0,y0,x1,y1):
plot(1:5)
segments(2,1,2,4)
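segments() is vectorized, so several vertical lines with different y-ranges can be drawn in one call. A minimal sketch, with made-up positions stored in a hypothetical segs data frame:

```r
# Hypothetical positions: three vertical segments with different y-ranges
segs <- data.frame(x = c(2, 3, 4), y0 = c(1, 2, 1), y1 = c(4, 5, 3))

plot(1:5)
# x0 == x1 makes each segment vertical; one call draws all three rows
segments(x0 = segs$x, y0 = segs$y0, x1 = segs$x, y1 = segs$y1, col = "red")
```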
Related
I made this image in powerpoint to illustrate what I am trying to do:
I am trying to make a series of circles (all the same size) that "move" along the x-axis at consistent intervals; for instance, the center of each consecutive circle would be 2 points away from that of the previous circle.
I have tried several things, including the DrawCircle function from the DescTools package, but can't produce this. For example, here I am trying to draw 20 circles, where the center of each circle is 2 points away from the previous one and each circle has a radius of 2 (which doesn't work):
library(DescTools)
plotdat <- data.frame(xcords = seq(1,50, by = 2.5), ycords = rep(4,20))
Canvas()
DrawCircle(x=plotdat$xcords, y=plotdat$ycords, radius = 2)
How can this be done in R?
This is basically @Peter's answer but with modifications. Your approach was fine but there is no radius= argument in DrawCircle. See the manual page ?DrawCircle for the arguments:
dev.new(width=12, height=4)
Canvas(xlim = c(0,50), ylim=c(2, 6), asp=1, xpd=TRUE)
DrawCircle(x=plotdat$xcords, y=plotdat$ycords, r.out = 2)
But your example has axes:
plot(NA, xlim = c(0,50), ylim=c(2, 6), xlab="", ylab="", yaxt="n", asp=1, xpd=TRUE)
DrawCircle(x=plotdat$xcords, y=plotdat$ycords, r.out = 2)
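If you would rather not depend on DescTools, base graphics can draw the same picture with symbols(). This is a sketch under the same assumptions as the question (20 centres, radius 2); it is not taken from any of the answers here.

```r
# Same centres as in the question: 20 circles on a horizontal line
plotdat <- data.frame(xcords = seq(1, 50, by = 2.5), ycords = rep(4, 20))

# asp = 1 keeps one x unit as long as one y unit, so circles stay round
plot(NA, xlim = c(0, 50), ylim = c(2, 6), xlab = "", ylab = "", asp = 1)
# inches = FALSE: radii are taken in axis units instead of being rescaled
symbols(plotdat$xcords, plotdat$ycords,
        circles = rep(2, nrow(plotdat)),
        inches = FALSE, add = TRUE)
```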
My solution requires the creation of some auxiliary functions
library(tidyverse)
##First function: create circle with a predefined radius, and a x-shift and y-shift
create_circle <- function(radius,x_shift, y_shift){
p <- tibble(
x = radius*cos(seq(0,2*pi, length.out = 1000)) + x_shift ,
y = radius*sin(seq(0,2*pi, length.out = 1000))+ y_shift
)
return(p)
}
##Use lapply to create circles with multiple x shifts:
##Group is only necessary for plotting
l <- lapply(seq(0,40, by = 2), function(i){
create_circle(2,i,0) %>%
mutate(group = i)
})
##Bind rows and plot
bind_rows(l) %>%
ggplot(aes(x = x, y = y, group =group)) +
geom_path()
Does this do the trick?
library(DescTools)
plotdat <- data.frame(xcords = seq(1, 5, length.out = 20), ycords = rep(4,20))
Canvas(xlim = c(0, 5), xpd=TRUE)
DrawCircle(x=plotdat$xcords, y=plotdat$ycords, r.out = 2)
I've assumed when you say circle centres are 2 points apart you mean 0.2 units apart.
You may have to experiment with the values to get what you need.
I'm trying to plot multiple circles of different sizes on a plot using ggplot2's geom_point inside of a for loop. Every time I run it though, it plots all the circles, but all in the location of the last circle instead of in their respective locations as given by the data frame. Below is an example of the code I am running. I'm wondering how I would fix this or if there's a better way to get at what I'm trying to do here.
data <- data.frame("x" = c(0, 500, 1000, 1500, 2000),
"y" = c(1500, 500, 2000, 0, 1000),
"size" = c(3, 5, 1.5, 4.2, 2.6)
)
g <- ggplot(data = data, aes(x = x, y = y)) + xlim(0,2000) + ylim(0,2000)
for(i in 1:5) {
g <- g + geom_point(aes(x=data$x[i],y=data$y[i]), size = data$size[i], pch = 1)
}
print(g)
It's pretty rare to need a for-loop for a plot -- ggplot2 will take the whole dataframe and process it all without you needing to manage each row.
ggplot(data = data, aes(x = x, y = y, size = size)) +
geom_point(pch = 1)
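As for why the loop puts everything in one place: aes() does not evaluate data$x[i] when the layer is added, only when the plot is drawn, and by then i is 5 for every layer. If a loop really is needed, a sketch like the following gives each layer its own one-row data frame, which is subset immediately:

```r
library(ggplot2)

data <- data.frame(x = c(0, 500, 1000, 1500, 2000),
                   y = c(1500, 500, 2000, 0, 1000),
                   size = c(3, 5, 1.5, 4.2, 2.6))

g <- ggplot(mapping = aes(x = x, y = y)) + xlim(0, 2000) + ylim(0, 2000)
for (i in seq_len(nrow(data))) {
  # data[i, ] and data$size[i] are evaluated right here, not at print time
  g <- g + geom_point(data = data[i, ], size = data$size[i], shape = 1)
}
print(g)
```

Note that in the loop-free version, size inside aes() is passed through a size scale; adding scale_size_identity() should make ggplot2 use the values as literal point sizes, as the loop does.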
I have an existing plotting function (perhaps written by someone else) that uses mfrow to plot multiple figures on the same graphics device. I want to edit figures that have already been plotted (e.g. perhaps add a reference line to figure 1)
par(mfrow = c(1, 2))
plot(1:10)
hist(1:10)
# Oh no! I want to add abline(a = 0, b = 1) to the first plot!
Assume this code is nested in another plotting function
PlotABunchOfStuff(1:10) that I can't modify.
I don't want to modify PlotABunchOfStuff because someone else owns it, or I'm just debugging and won't need the extra details once the bug is found.
Use par(mfg).
For example:
par(mfrow = c(2, 3))
for (i in 1:6) {
plot(i, xlim = c(0,7), ylim = c(0, 7))
}
par(mfg = c(2, 2))
points(3,3,col= "red")
par(mfg = c(1, 1))
points(3,3,col= "blue")
If you are willing to use ggplot, I think the code below does what you want:
library(ggplot2)
library(gridExtra) # provides grid.arrange()
df <- data.frame(x = 1:10, y = 1:10)
g1 <- ggplot(df, aes(x = x, y = y)) +
geom_point()
g2 <- ggplot(df, aes(x = x, y = y)) +
geom_line()
grid.arrange(g1, g2)
g1 <- g1 + geom_smooth(method='lm',formula=y~x) # it could be anything else !
grid.arrange(g1, g2)
Edit 1
Create a graphics device on Windows that will be destroyed after dev.off() if filename = "":
win.metafile(filename = "")
By default the device uses "inhibit", which does not allow the plot to be recorded, so we switch to "enable":
dev.control('enable')
plot(1:10)
p <- recordPlot()
dev.off()
replayPlot(p)
p
abline(a = 1, b = 1, col = "red")
p <- recordPlot()
dev.off()
replayPlot(p)
My inspirations on Stack Overflow:
R plot without showing the graphic window
Save a plot in an object
My inspirations on R :
https://www.rdocumentation.org/packages/grDevices/versions/3.6.0/topics/dev
https://www.rdocumentation.org/packages/grDevices/versions/3.6.0/topics/recordPlot
I hope it helps you ! Good luck.
Be careful with scaling!
For the example in the original question, this will not have the result I think is desired.
par(mfrow = c(1, 2))
plot(1:10)
hist(1:10)
par(mfg = c(1, 1))
abline(a = 0, b = 1)
But this will have the result I think is desired.
par(mfrow = c(1, 2))
plot(1:10)
hist(1:10)
par(mfg = c(1, 1))
plot.window(xlim = c(1, 10), ylim = c(1, 10))
abline(a = 0, b = 1)
The mfg graphics parameter will allow you to jump to any panel, but the window scaling may need to be adjusted to be appropriate for the scale used when the original plot was created in that panel.
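The scaling issue can be made visible by inspecting par("usr"), the user coordinates of the current plot. The sketch below assumes, for illustration only, that you can query par("usr") between the two plotting calls (which is not possible when they are hidden inside a function you cannot modify); the variable names are made up.

```r
par(mfrow = c(1, 2))
plot(1:10)
usr_scatter <- par("usr")   # user coordinates of panel 1, saved right away
hist(1:10)
usr_hist <- par("usr")      # user coordinates of the histogram

par(mfg = c(1, 1))          # jump back to panel 1
# Check whether the coordinate system is still the histogram's
still_hist <- isTRUE(all.equal(par("usr"), usr_hist))

plot.window(xlim = c(1, 10), ylim = c(1, 10))  # re-establish panel 1's scale
restored <- isTRUE(all.equal(par("usr"), usr_scatter))

abline(a = 0, b = 1)
c(still_hist = still_hist, restored = restored)
```

When the limits of the original panel are not known, they have to be guessed from the data that the plotting function received, which is why some experimentation may be needed.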
Why doesn't this code make lines between data at the same values of y?
main <- data_frame(x=rep(c(-1, 1), each=2), y = c(c(1, 1), c(2, 2)), z = c(1, 2, 3, 4))
qplot(data = main, x = x, y = z, geom="line", group=factor(y))
Here is what I get:
But I want only the points at the same level of y to be connected.
The issue is with how you defined your y variable. Change it to y = c(c(1,2), c(1,2)) and things should work.
Also, if you're going to use data_frame be sure to add the calls to library to make your code reproducible (i.e., library(dplyr) and library(ggplot2)).
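To see why the grouping matters, it can help to list which x positions end up in each group, since each group becomes one line. A small base R sketch using the data from the question:

```r
# Corrected data: y alternates, so each y level spans both x values
main <- data.frame(x = rep(c(-1, 1), each = 2),
                   y = c(1, 2, 1, 2),
                   z = c(1, 2, 3, 4))

# x positions that each line passes through, per group
x_per_group <- split(main$x, main$y)
x_per_group

# With the original y = c(1, 1, 2, 2), both points of a group share one x,
# so each "line" is vertical
x_per_group_old <- split(main$x, c(1, 1, 2, 2))
x_per_group_old
```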
In R, with ecdf I can plot an empirical cumulative distribution function
plot(ecdf(mydata))
and with hist I can plot a histogram of my data
hist(mydata)
How can I plot the histogram and the ecdf in the same plot?
EDIT
I am trying to make something like this:
https://mathematica.stackexchange.com/questions/18723/how-do-i-overlay-a-histogram-with-a-plot-of-cdf
Also a bit late, here's another solution that extends @Christoph's solution with a second y-axis.
par(mar = c(5,5,2,5))
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
par(new = T)
ec <- ecdf(dt)
plot(x = h$mids, y=ec(h$mids)*max(h$counts), xlim = c(0,100), ylim = c(0, max(h$counts)), col = rgb(0,0,0,alpha=0), axes=F, xlab=NA, ylab=NA) # same limits as the histogram so both layers line up
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
axis(4, at=seq(from = 0, to = max(h$counts), length.out = 11), labels=seq(0, 1, 0.1), col = 'red', col.axis = 'red')
mtext(side = 4, line = 3, 'Cumulative Density', col = 'red')
The trick is the following: you don't add a line to the existing plot; you draw a second plot on top of it, which is why we need par(new = T). The second plot is drawn with axes = F and its y-axis is added afterwards on the right with axis(4, ...); otherwise it would be drawn over the y-axis on the left.
Credits go here (@tim_yates's answer) and there.
There are two ways to go about this. One is to ignore the different scales and use relative frequency in your histogram. This results in a harder to read histogram. The second way is to alter the scale of one or the other element.
I suspect this question will soon become interesting to you, particularly @hadley's answer.
ggplot2 single scale
Here is a solution in ggplot2. I am not sure you will be satisfied with the outcome though because the CDF and histograms (count or relative) are on quite different visual scales. Note this solution has the data in a dataframe called mydata with the desired variable in x.
library(ggplot2)
set.seed(27272)
mydata <- data.frame(x= rexp(333, rate=4) + rnorm(333))
ggplot(mydata, aes(x)) +
stat_ecdf(color="red") +
geom_histogram(aes(y = after_stat(count / sum(count)))) # bins x; geom_bar() would count each distinct value separately
base R multi scale
Here I will rescale the empirical CDF so that instead of a maximum value of 1, its maximum is the height of the tallest histogram bar (the largest bin density, since freq=F).
h <- hist(mydata$x, freq=F)
ec <- ecdf(mydata$x)
lines(x = knots(ec),
y=(1:length(mydata$x))/length(mydata$x) * max(h$density),
col ='red')
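One caveat with the base R version: knots(ec) returns only the distinct data values, so with tied observations it is shorter than 1:length(mydata$x) and the two vectors no longer match. A sketch that sidesteps this by evaluating the ECDF at its own knots; the rounded data here is made up purely to force ties:

```r
set.seed(27272)
# Hypothetical data with ties: rounding creates repeated values
x <- round(rexp(333, rate = 4) + rnorm(333), 1)

h  <- hist(x, freq = FALSE)
ec <- ecdf(x)
k  <- knots(ec)   # distinct sorted values, fewer than length(x) here

# ec(k) is the ECDF height at each knot; rescale it to the tallest bar
lines(x = k, y = ec(k) * max(h$density), col = "red")
```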
You can try a ggplot approach with a second axis:
library(tidyverse) # tibble(), %>%, bind_cols(), mutate(), ggplot()
set.seed(15)
a <- rnorm(500, 50, 10)
# calculate ecdf with binsize 30
binsize=30
df <- tibble(x=seq(min(a), max(a), diff(range(a))/binsize)) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(a))
# plot
ggplot() +
geom_histogram(aes(a), bins = binsize) +
geom_line(data = df, aes(x=x, y=Ecdf_scaled), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(a), name = "Ecdf"))
Edit
Since the scaling was wrong, I added a second solution that calculates everything in advance:
binsize=30
a_range= floor(range(a)) +c(0,1)
b <- seq(a_range[1], a_range[2], round(diff(a_range)/binsize)) %>% floor()
df_hist <- tibble(a) %>%
mutate(gr = cut(a,b, labels = floor(b[-1]), include.lowest = T, right = T)) %>%
count(gr) %>%
mutate(gr = as.character(gr) %>% as.numeric())
# calculate ecdf with binsize 30
df <- tibble(x=b) %>%
bind_cols(Ecdf=with(.,ecdf(a)(x))) %>%
mutate(Ecdf_scaled=Ecdf*max(df_hist$n))
ggplot(df_hist, aes(gr, n)) +
geom_col(width = 2, color = "white") +
geom_line(data = df, aes(x=x, y=Ecdf*max(df_hist$n)), color=2, size = 2) +
scale_y_continuous(name = "Density",sec.axis = sec_axis(trans = ~./max(df_hist$n), name = "Ecdf"))
As already pointed out, this is problematic because the plots you want to merge have such different y-scales. You can try
set.seed(15)
mydata<-runif(50)
hist(mydata, freq=F)
lines(ecdf(mydata))
to get
Although a bit late... Another version which is working with preset bins:
set.seed(15)
dt <- rnorm(500, 50, 10)
h <- hist(
dt,
breaks = seq(0, 100, 1),
xlim = c(0,100))
ec <- ecdf(dt)
lines(x = h$mids, y=ec(h$mids)*max(h$counts), col ='red')
lines(x = c(0,100), y=c(1,1)*max(h$counts), col ='red', lty = 3) # indicates 100%
lines(x = rep(h$mids[which.min(abs(ec(h$mids) - 0.9))], 2), # indicates where 90% is reached (x value of the bin, not its index)
y = c(0, max(h$counts)), col ='black', lty = 3)
(Only the second y-axis is not working yet...)
In addition to the previous answers, I wanted to have ggplot do the tedious calculation (in contrast to @Roman's solution, which was kindly updated upon my request), i.e., calculate and draw the histogram and calculate and overlay the ECDF. I came up with the following (pseudo code):
# 1. Prepare the plot
plot <- ggplot() + geom_histogram(...)
# 2. Get the max value of Y axis as calculated in the previous step
maxPlotY <- max(ggplot_build(plot)$data[[1]]$y)
# 3. Overlay scaled ECDF and add secondary axis
plot +
stat_ecdf(aes(y=..y..*maxPlotY)) +
scale_y_continuous(name = "Density", sec.axis = sec_axis(trans = ~./maxPlotY, name = "ECDF"))
This way you don't need to calculate everything beforehand and feed the results to ggplot. Just sit back and let it do everything for you!
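A runnable sketch of the pseudo code above. The example data, the bins = 30 setting, the "Count" axis label and the use of after_stat() (the newer spelling of the ..y.. notation) are assumptions added for illustration, not part of the original answer.

```r
library(ggplot2)

set.seed(15)
df <- data.frame(a = rnorm(500, 50, 10))   # assumed example data

# 1. Prepare the histogram (bins = 30 is an arbitrary choice)
p <- ggplot(df, aes(a)) + geom_histogram(bins = 30)

# 2. Get the height of the tallest bar as computed by ggplot
maxPlotY <- max(ggplot_build(p)$data[[1]]$y)

# 3. Overlay the ECDF scaled to that height and add a secondary axis
#    that maps the counts back to the 0..1 range
p2 <- p +
  stat_ecdf(aes(y = after_stat(y) * maxPlotY), colour = "red") +
  scale_y_continuous(name = "Count",
                     sec.axis = sec_axis(~ . / maxPlotY, name = "ECDF"))
print(p2)
```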