I want to compare two datasets that share the same x and y variables. However, not all x values are present in both. As a toy example, say this is what I have:
position.x <- c(1,2,3)
score.x <- c(450,220,330)
x <- data.frame(position.x, score.x)
position.y <- c(2,3,5)
score.y <- c(333,423,988)
y<- data.frame(position.y,score.y)
par(mfrow = c(2,1))
plot(x, pch = 19)
plot(y, pch = 19)
The x axes are not comparable. I found some posts explaining how to do this in ggplot using facet_wrap, but I would like to do it using base graphics.
Thank you in advance.
You could specify the range of the x and y axes with xlim and ylim:
position.x <- c(1,2,3)
score.x <- c(450,220,330)
x <- data.frame(position.x, score.x)
position.y <- c(2,3,5)
score.y <- c(333,423,988)
y<- data.frame(position.y,score.y)
par(mfrow = c(2,1))
plot(x, pch = 19, xlim=c(1,5))
plot(y, pch = 19, xlim=c(1,5))
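Rather than hard-coding c(1, 5), the shared limits can be computed from both data sets with range(). This is a minimal sketch under the assumption that the toy vectors above are the data; the name shared_xlim is introduced here purely for illustration, and pdf(NULL) is only used so the snippet runs without opening a window or writing a file.

```r
position.x <- c(1, 2, 3)
score.x <- c(450, 220, 330)
position.y <- c(2, 3, 5)
score.y <- c(333, 423, 988)

# Limits that cover the x values of both data sets
shared_xlim <- range(c(position.x, position.y))

pdf(NULL)  # null PDF device: draw without creating a file
par(mfrow = c(2, 1))
plot(position.x, score.x, pch = 19, xlim = shared_xlim)
plot(position.y, score.y, pch = 19, xlim = shared_xlim)
invisible(dev.off())
```

The same idea applies to ylim if the y axes should also match.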
If you are going to repeat this, you might as well write a function (one of the benefits of ggplot is that it takes care of all the set-up for you):
## data needs to be in a long format
dat <- data.frame(position = c(1,2,3,2,3,5),
score = c(450,220,330,333,423,988),
z = c('x','x','x','y','y','y'))
facet_wrap <- function(data, x, y, z, horiz = TRUE, ...) {
  ## save current par settings and return after finished
  op <- par(no.readonly = TRUE)
  on.exit(par(op))
  zz <- unique(data[, z])
  ## sets up the layout to cascade horizontally or vertically
  ## and sets xlim and ylim appropriately
  if (horiz) {
    par(mfrow = c(1, length(zz)), ...)
    ylim <- range(data[, y])
    xlim <- NULL
  } else {
    par(mfrow = c(length(zz), 1), ...)
    xlim <- range(data[, x])
    ylim <- NULL
  }
  ## make a subset of data for each unique by variable
  ## and draw a basic plot for each one
  for (ii in zz) {
    tmp <- data[data[, z] %in% ii, ]
    plot(tmp[, x], tmp[, y], xlim = xlim, ylim = ylim)
  }
}
facet_wrap(dat, 'position', 'score', 'z', mar = c(5,4,2,2))
facet_wrap(dat, 'position', 'score', 'z', mar = c(5,4,1,2), horiz = FALSE)
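If the data start out as two separate data frames, as in the question, they first have to be stacked into the long format this function expects. A possible sketch, assuming both data frames use the same column names (position and score are renamed here for that purpose; the grouping column z matches the example above):

```r
x <- data.frame(position = c(1, 2, 3), score = c(450, 220, 330))
y <- data.frame(position = c(2, 3, 5), score = c(333, 423, 988))

# Stack the two data frames and record which one each row came from
dat <- rbind(cbind(x, z = "x"), cbind(y, z = "y"))
```

The result has the same rows as the hand-written dat above, so it can be passed to the function unchanged.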
The type of plot I am trying to achieve in R is known variously as a moving distribution plot, a joy plot, or a ridgeline plot:
There is already a question on Stack Overflow whose accepted answer explains how to do it using ggplot: How to reproduce this moving distribution plot with R?
However, for learning purposes, I am trying to achieve the same using only base R plots (no lattice, no ggplot, no plotting package of any kind).
In order to get started, I generated the following fake data to play with:
set.seed(2020)
shapes <- c(0.1, 0.5, 1, 2, 4, 5, 6)
dat <- lapply(shapes, function(x) rbeta(1000, x, x))
names(dat) <- letters[1:length(shapes)]
Then using mfrow I can achieve this:
par(mfrow=c(length(shapes), 1))
par(mar=c(1, 5, 1, 1))
for(i in 1:length(shapes))
{
values <- density(dat[[names(dat)[i]]])
plot(NA,
xlim=c(min(values$x), max(values$x)),
ylim=c(min(values$y), max(values$y)),
axes=FALSE,
main="",
xlab="",
ylab=letters[i])
polygon(values, col="light blue")
}
The result I get is:
Clearly, using mfrow (or even layout) here is not flexible enough, and it also does not allow for overlaps between the distributions.
Then, the question: how can I reproduce that type of plot using only base R plotting functions?
Here's a base R solution. First, we calculate all the density values and then manually offset them along the y axis:
vals <- Map(function(x, g, i) {
with(density(x), data.frame(x,y=y+(i-1), g))
}, dat, names(dat), seq_along(dat))
Then, to plot, we calculate the overall range, draw an empty plot, and then draw the densities (in reverse, so they stack):
xrange <- range(unlist(lapply(vals, function(d) range(d$x))))
yrange <- range(unlist(lapply(vals, function(d) range(d$y))))
plot(0,0, type="n", xlim=xrange, ylim=yrange, yaxt="n", ylab="", xlab="Value")
for(d in rev(vals)) {
with(d, polygon(x, y, col="light blue"))
}
axis(2, at=seq_along(dat)-1, names(dat))
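With a fixed offset of 1 per group, the amount of overlap depends on how tall the densities happen to be. One way to control it is to rescale every density by the tallest peak before offsetting. The sketch below is a variation on the code above, not part of the original answer; the names dens, peak, and overlap are introduced here for illustration, and pdf(NULL) just keeps the snippet from opening a window.

```r
set.seed(2020)
shapes <- c(0.1, 0.5, 1, 2, 4, 5, 6)
dat <- lapply(shapes, function(s) rbeta(1000, s, s))
names(dat) <- letters[seq_along(shapes)]

dens <- lapply(dat, density)
peak <- max(sapply(dens, function(d) max(d$y)))  # tallest density overall
overlap <- 1.5  # hypothetical knob: > 1 lets ridges overlap, <= 1 keeps them apart

pdf(NULL)
plot(0, 0, type = "n",
     xlim = range(sapply(dens, function(d) range(d$x))),
     ylim = c(0, length(dens) - 1 + overlap),
     yaxt = "n", xlab = "Value", ylab = "")
for (i in rev(seq_along(dens))) {
  d <- dens[[i]]
  # rescale so the tallest ridge is `overlap` units high, then offset by row
  polygon(d$x, d$y / peak * overlap + (i - 1), col = "light blue")
}
axis(2, at = seq_along(dens) - 1, labels = names(dens))
invisible(dev.off())
```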
d = lapply(dat, function(x){
tmp = density(x)
data.frame(x = tmp$x, y = tmp$y)
})
d = lapply(seq_along(d), function(i){
tmp = d[[i]]
tmp$grp = names(d)[i]
tmp
})
d = do.call(rbind, d)
grp = unique(d$grp)
n = length(grp)
spcx = 5
spcy = 3
rx = range(d$x)
ry = range(d$y)
rx[2] = rx[2] + n/spcx
ry[2] = ry[2] + n/spcy
graphics.off()
plot(1, type = "n", xlim = rx, ylim = ry, axes = FALSE, ann = FALSE)
lapply(seq_along(grp), function(i){
x = grp[i]
abline(h = (n - i)/spcy, col = "grey")
axis(2, at = (n - i)/spcy, labels = grp[i])
polygon(d$x[d$grp == x] + (n - i)/spcx,
d$y[d$grp == x] + (n - i)/spcy,
col = rgb(0.5, 0.5, 0.5, 0.5))
})
Background
I have a function called TPN. When you run this function, it produces two plots (see picture below). The bottom-row plot samples from the top-row plot.
Question
I'm wondering how I could fix the ylim of the bottom-row plot to be always (i.e., regardless of the input values) the same as ylim of the top-row plot?
R code is provided below the picture (Run the entire block of code).
############## Input Values #################
TPN = function( each.sub.pop.n = 150,
sub.pop.means = 20:10,
predict.range = 10:0,
sub.pop.sd = .75,
n.sample = 2 ) {
#############################################
par( mar = c(2, 4.1, 2.1, 2.1) )
m = matrix( c(1, 2), nrow = 2, ncol = 1 ); layout(m)
set.seed(2460986)
Vec.rnorm <- Vectorize(function(n, mean, sd) rnorm(n, mean, sd), 'mean')
y <- c( Vec.rnorm(each.sub.pop.n, sub.pop.means, sub.pop.sd) )
set.seed(NULL)
x <- rep(predict.range, each = each.sub.pop.n)
plot(x, y) ## Plot #1
sample <- lapply(split(y, x), function(z) sample(z, n.sample, replace = TRUE))
sample <- data.frame(y = unlist(sample),
x = as.numeric(rep(names(sample), each = n.sample)))
plot(sample$x, sample$y) ## Plot # 2
}
## TEST HERE:
TPN()
You can approximate the first plot's ylim with par("yaxp")[1:2], which returns the positions of its extreme y-axis tick marks. So, you can change the second plot's code to use those values as its ylim:
plot(sample$x, sample$y, ylim = par("yaxp")[1:2]) ## Plot # 2
Or, as mentioned in the comments, you can simply compute the range of both datasets and pass it as ylim to both plots:
ylim = range(c(y, sample$y))
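Note that par("yaxp") reports tick positions, while par("usr") reports the actual extent of the plotting region, so the latter can be used when the two panels must match exactly. The following is a sketch on made-up data (x, y, and keep are illustrative stand-ins, not the values from TPN); the axis style "i" tells plot() to use the limits as given instead of padding them by 4%.

```r
set.seed(1)
x <- rep(0:10, each = 20)
y <- rnorm(length(x), mean = 20 - x, sd = 0.75)
keep <- sample(seq_along(x), 22)  # arbitrary subsample for the second panel

pdf(NULL)  # null PDF device: draw without creating a file
layout(matrix(1:2, nrow = 2))
par(mar = c(2, 4.1, 2.1, 2.1))
plot(x, y)
usr1 <- par("usr")  # c(x1, x2, y1, y2) of the first panel
plot(x[keep], y[keep],
     xlim = usr1[1:2], ylim = usr1[3:4],
     xaxs = "i", yaxs = "i")  # "i": use the limits as given, no padding
usr2 <- par("usr")
invisible(dev.off())
```

Here usr2 should equal usr1, which means both panels cover the same coordinate range.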
Another option: Produce the same plot again but with type = "n" and then filling the points with points(). For example, change your plot 2 to
plot(x, y, type = "n")
points(sample$x, sample$y)
A benefit of this approach is that everything in the plot will be exactly the same, not just the y-axis (which may or may not matter for your function).
I have a chart created inside a couple of loops and I want to automatically write the chart to a file at the end of the outer loop. Here is a toy example:
filename <- "mychart"
for(i in 1:5) {
x <- 1:5
fun1 <- sample(1:10, 5, replace = TRUE)
xlim <- c(1, 5)
ylim <- c(0, 10)
plot(x, fun1, xlim = xlim, ylim = ylim, type = "l")
for(j in 1:3) {
fun2 <- 2:6 + j
lines(x, fun2, type = "l", col = "red")
}
out.filename <- paste(filename, i, sep = "")
## want to save this plot out to disk here!
}
I would also like to create the plot on the console so I can watch the program’s progress. Most answers to a similar question seem to deal with a plot that is created with a single “plot” statement, or do not enable the console plot window. Any suggestions much appreciated.
I think this does what you're after:
plotit <- function(i) {
x = 1:5
fun1 = sample(1:10, 5, replace=TRUE)
plot(x, fun1, xlim=c(1,5), ylim=c(0,10), type="l")
for(j in 1:3) {
fun2 = 2:6 + j
lines(x, fun2, type = "l", col = "red")
}
savePlot(paste0("mychart", i, ".png"), type="png")
}
Then:
for(i in seq(5)) plotit(i)
The typical way to save base graphics plots is with individual device functions such as pdf(), png(), etc. You open a plot device with the appropriate filename, create your plot, then close the device with dev.off(). It doesn't matter if your plot is created in a for loop or not. See lots of devices (and examples at the bottom) in ?png.
For your code, it would go something like this:
filename <- "mychart"
for(i in 1:5) {
out.filename <- paste(filename, i, ".png", sep = "")
## Open the device before you start plotting
png(file = out.filename)
# you can set the height and width (and other parameters) or use defaults
x <- 1:5
fun1 <- sample(1:10, 5, replace = TRUE)
xlim <- c(1, 5)
ylim <- c(0, 10)
plot(x, fun1, xlim = xlim, ylim = ylim, type = "l")
for(j in 1:3) {
fun2 <- 2:6 + j
lines(x, fun2, type = "l", col = "red")
}
## Close the device when you are done plotting.
dev.off()
}
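If the plotting code fails part-way through, the device stays open and later plots end up in the wrong file. One way to guard against that is to wrap the open/draw/close sequence in a small function that closes the device with on.exit(). This is only a sketch: save_plot and draw_chart are hypothetical helper names, and pdf() with a temporary directory is used here so the example runs anywhere; png() can be substituted in the same place.

```r
# Hypothetical helper: open a PDF device, run the drawing code,
# and always close the device, even if drawing stops with an error
save_plot <- function(file, draw) {
  pdf(file)
  on.exit(dev.off())
  draw()
  invisible(file)
}

# The chart from the question, wrapped in a function
draw_chart <- function() {
  x <- 1:5
  plot(x, sample(1:10, 5, replace = TRUE),
       xlim = c(1, 5), ylim = c(0, 10), type = "l")
  for (j in 1:3) lines(x, 2:6 + j, col = "red")
}

out_files <- file.path(tempdir(), paste0("mychart", 1:5, ".pdf"))
for (f in out_files) save_plot(f, draw_chart)
```

Because the drawing code lives in its own function, draw_chart() can also be called directly to show the chart on screen.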
I would like to add to a clusplot plot the variables used for pca as arrows. I am not sure that a way has been implemented (I can't find anything in the documentation).
I have produced a clusplot that looks like this:
With the function princomp I can independently plot the observations in an analogous space of representation, with the variables (columns) as arrows:
Is there a way to do the two things at the same time, by showing the clusters and the variables of pca on the same diagram?
I wanted to do the same thing as the OP today and ended up putting pieces from clusplot and biplot together. This is the result, which may be useful if you want to do the same thing:
clusplot2 <- function(dat, clustering, ...) {
clusplot(dat, clustering, ...)
## this is from clusplot.default
pca <- princomp(dat, scores = TRUE, cor = (ncol(dat) != 2))
## this is (adapted) from biplot.princomp
directions <- t(t(pca$loadings[, 1:2]) * pca$sdev[1:2]) * sqrt(pca$n.obs)
## all below is (adapted) from biplot.default
unsigned.range <- function(x) c(-abs(min(x, na.rm = TRUE)),
abs(max(x, na.rm = TRUE)))
x <- predict(pca)[, 1:2]
y <- directions
rangx1 <- unsigned.range(x[, 1L])
rangx2 <- unsigned.range(x[, 2L])
rangy1 <- unsigned.range(y[, 1L])
rangy2 <- unsigned.range(y[, 2L])
xlim <- ylim <- rangx1 <- rangx2 <- range(rangx1, rangx2)
ratio <- max(rangy1/rangx1, rangy2/rangx2)
par(new = TRUE)
col <- par("col")
if (!is.numeric(col))
col <- match(col, palette(), nomatch = 1L)
col <- c(col, col + 1L)
cex <- rep(par("cex"), 2)
plot(y, axes = FALSE, type = "n", xlim = xlim * ratio, ylim = ylim *
ratio, xlab = "", ylab = "", col = col[1L])
axis(3, col = col[2L])
axis(4, col = col[2L])
box(col = col[1L])
text(y, labels = names(dat), cex = cex[2L], col = col[2L])
arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L],
length = 0.1)
}
############################################################
library(cluster)
dat <- iris[, 1:4]
clus <- pam(dat, k = 3)
clusplot2(dat, clus$clustering, main = "Test")
Of course there is much room for improvement (as this is just copied together) but I think anyone can easily adapt it if needed.
If you wonder why the arrows (loadings * sdev) are scaled with 0.8 * sqrt(n): I have absolutely no idea. I would have plotted loadings * sdev which should resemble the correlation between the principal components and the variables but this is how biplot does it.
Anyway, this should produce the same arrows as biplot.princomp and use the same pca as clusplot which was the primary goal for me.
I am trying to visualize a curve for pollination distribution. I am very new to R so please don't be upset by my stupidity.
llim <- 0
ulim <- 6.29
f <- function(x,y) {(.156812/((2*pi)*(.000005^2)*(gamma(2/.156812)))*exp(-((sqrt(x^2+y^2))/.000005)^.156812))}
integrate(function(y) {
sapply(y, function(y) {
integrate(function(x) f(x,y), llim, ulim)$value
})
}, llim, ulim)
fv <- Vectorize(f)
curve(fv, from=0, to=1000)
And I get:
Error in y^2 : 'y' is missing
I'm not quite sure what you're asking to plot. But I know you want to visualise your scalar function of two arguments.
Here are some approaches. First we define your function.
llim <- 0
ulim <- 6.29
f <- function(x,y) {
(.156812/((2*pi)*(.000005^2)*(gamma(2/.156812)))*exp(-((sqrt(x^2+y^2))/.000005)^.156812))
}
From your title I thought of the following. The function intf defined below integrates your function over the square [0,ul] x [0,ul] and returns the value. We then vectorise it and plot the integral over the square as a function of the length of the side of the square.
intf <- function(ul) {
integrate(function(y) {
sapply(y, function(y) {
integrate(function(x) f(x,y), 0, ul)$value
})
}, 0, ul)$value
}
fv <- Vectorize(intf)
curve(fv, from=0, to=1000)
If f is a probability density, this curve has a (somewhat) nice probability interpretation (e.g. roughly a 20 % probability of pollination(?) within the 200 by 200 meter square).
However, you can also do a contour plot (of the log-transformed values), which illustrates the function we are integrating above:
logf <- function(x, y) log(f(x, y))
x <- y <- seq(llim, ulim, length.out = 100)
contour(x, y, outer(x, y, logf), lwd = 2, drawlabels = FALSE)
You can also plot some profiles of the surface:
plot(1, xlim = c(llim, ulim), ylim = c(0, 0.005), xlab = "x", ylab = "f")
y <- seq(llim, ulim, length.out = 6)
for (i in seq_along(y)) {
tmp <- function(x) f(x, y = y[i])
curve(tmp, llim, ulim, add = TRUE, col = i)
}
legend("topright", lty = 1, col = seq_along(y),
legend = as.expression(paste("y = ",y)))
They need to be modified a bit to make them publication worthy, but you get the idea. Lastly, you can do some 3d plots as others have suggested.
EDIT
As per your comments, you can also do something like this:
# Define the function times radius (this time with general a and b)
# The default of a and b is as before
g <- function(z, a = 5e-6, b = .156812) {
z * (b/(2*pi*a^2*gamma(2/b)))*exp(-(z/a)^b)
}
# A function that integrates g from 0 to Z and rotates
# As g is not dependent on the angle we just multiply by 2pi
intg <- function(Z, ...) {
2*pi*integrate(g, 0, Z, ...)$value
}
# Vectorize the Z argument of intg
gv <- Vectorize(intg, "Z")
# Plot
Z <- seq(0, 1000, length.out = 100)
plot(Z, gv(Z), type = "l", lwd = 2)
lines(Z, gv(Z, a = 5e-5), col = "blue", lwd = 2)
lines(Z, gv(Z, b = .150), col = "red", lwd = 2)
lines(Z, gv(Z, a = 1e-4, b = .2), col = "orange", lwd = 2)
You can then plot the curves for the a and b you want. If either is not specified, the default is used.
Disclaimer: my calculus is rusty and I just did this off the top of my head. You should verify that I've done the rotation of the function around the axis properly.
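One way to check the rotation: in polar coordinates dx dy = r dr dθ, so integrating a radially symmetric function over a disc of radius Z gives 2π times the integral of r f(r) from 0 to Z, which is what intg computes. Substituting u = (z/a)^b then turns that integral into a regularised incomplete gamma function, which R provides as pgamma(). The sketch below compares the two; the closed form is my own derivation rather than part of the answer above, intg_closed is a name introduced here, and the parameters a = 1, b = 2 are chosen only because they are numerically well behaved (they are not the pollination values).

```r
g <- function(z, a, b) {
  z * (b / (2 * pi * a^2 * gamma(2 / b))) * exp(-(z / a)^b)
}

# Numerical version, following the same approach as intg above
intg <- function(Z, a, b) 2 * pi * integrate(g, 0, Z, a = a, b = b)$value

# Closed form obtained via the substitution u = (z/a)^b
# (an assumption to verify, not taken from the original answer)
intg_closed <- function(Z, a, b) pgamma((Z / a)^b, shape = 2 / b)

# Illustrative, well-behaved parameters
num <- intg(1.5, a = 1, b = 2)
cf  <- intg_closed(1.5, a = 1, b = 2)
```

If the two values agree, the curve can be read as the probability mass within distance Z of the origin, and it tends to 1 as Z grows.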
Several functions can help you draw 3-dimensional plots, including wireframe() from the lattice package and persp() from base graphics. If you prefer not to use a 3d plot, you can create a contour plot using contour().
Note: I don't know if this is intentional, but your data produces a very large spike in one corner of the plot. The result is a plot that is for all intents and purposes flat, with a barely noticeable spike in one corner. This is particularly problematic with the contour plot below.
library(lattice)
x <- seq(0, 1000, length.out = 50)
y <- seq(0, 1000, length.out = 50)
First the wire frame plot:
df <- expand.grid(x=x, y=y)
df$z <- with(df, f(x, y))
wireframe(z ~ x * y, data = df)
Next the perspective plot:
dm <- outer(x, y, FUN=f)
persp(x, y, dm)
The contour plot:
contour(x, y, dm)