Is there a way to make custom point symbols in R? I am familiar with the pch argument, which offers many choices, but what if I need to plot, for example, tree silhouettes?
For example, if I draw a symbol and save it as an .eps (or similar) file, can I use it in R? A raster-based solution is not good for complicated objects (e.g. trees).
You can do this with the grImport package. I drew a spiral in Inkscape and saved it as drawing.ps. Following the steps outlined in the grImport vignette, we trace the file and read it as a sort of polygon.
setwd('~/R/')
library(grImport)
library(lattice)
PostScriptTrace("drawing.ps") # creates .xml in the working directory
spiral <- readPicture("drawing.ps.xml")
The vignette uses lattice to plot the symbols. You can also use base graphics, although the point positions must first be converted from user (plot) coordinates to normalised device coordinates.
# generate random data
x = runif(n = 10, min = 1, max = 10)
y = runif(n = 10, min = 1, max = 10)
# lattice (as in the vignette)
x11()
xyplot(y~x,
xlab = "x", ylab = "y",
panel = function(x, y) {
grid.symbols(spiral, x, y, units = "native", size = unit(10, "mm"))
})
# base graphics
x11()
plot(x, y, pty = 's', type = 'n', xlim = c(0, 10), ylim = c(0, 10))
xx = grconvertX(x = x, from = 'user', to = 'ndc')
yy = grconvertY(y = y, from = 'user', to = 'ndc')
grid.symbols(spiral, x = xx, y = yy, size = 0.05)
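The coordinate conversion in the base-graphics branch can be checked on its own. The following is a minimal sketch, not part of the original answer: it assumes a null PDF device (pdf(NULL)) so that it runs without a display, and the names x_user, x_ndc and x_back are made up for illustration.

```r
# Open a PDF device that writes no file, so no display is needed
pdf(NULL)
plot(c(0, 10), c(0, 10), type = "n")

# Positions in user (plot) coordinates
x_user <- c(0, 5, 10)

# Convert to normalised device coordinates (fractions of the device width)
x_ndc <- grconvertX(x_user, from = "user", to = "ndc")

# Converting back should recover the original positions
x_back <- grconvertX(x_ndc, from = "ndc", to = "user")

dev.off()
print(x_ndc)
```

The ndc values are fractions of the whole device, which is why they can be passed to grid.symbols without a "native" unit.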
I am using the R programming language. I am using a computer that does not have a USB port or an internet connection - I only have R with a few preloaded libraries (e.g. ggplot2, reshape2, dplyr, base R).
Is it possible to make "parallel coordinate" plots (e.g. below) using only the "ggplot2" library and not "ggally"?
#load libraries (I do not have GGally)
library(GGally)
#load data (I have MASS)
data(crabs, package = "MASS")
#make 2 different parallel coordinate plots
ggparcoord(crabs)
ggparcoord(crabs, columns = 4:8, groupColumn = "sex")
Thanks
Source: https://homepage.divms.uiowa.edu/~luke/classes/STAT4580-2020/parcor.html
In fact, you do not even need ggplot! This is just a plot of standardised values (minus mean divided by SD), so you can implement this logic with any plotting function capable of doing so. The cleanest and easiest way to do it is in steps in base R:
# Standardising the variables of interest
data(crabs, package = "MASS")
crabs[, 4:8] <- apply(crabs[, 4:8], 2, scale)
# This colour solution works in great generality, although RColorBrewer has better distinct schemes
mycolours <- rainbow(length(unique(crabs$sex)), end = 0.6)
# png("gally.png", 500, 400, type = "cairo", pointsize = 14)
par(mar = c(4, 4, 0.5, 0.75))
plot(NULL, NULL, xlim = c(1, 5), ylim = range(crabs[, 4:8]) + c(-0.2, 0.2),
bty = "n", xaxt = "n", xlab = "Variable", ylab = "Standardised value")
axis(1, 1:5, labels = colnames(crabs)[4:8])
abline(v = 1:5, col = "#00000033", lwd = 2)
abline(h = seq(-2.5, 2.5, 0.5), col = "#00000022", lty = 2)
for (i in 1:nrow(crabs)) lines(as.numeric(crabs[i, 4:8]), col = mycolours[as.numeric(crabs$sex[i])])
legend("topright", c("Female", "Male"), lwd = 2, col = mycolours, bty = "n")
# dev.off()
You can apply this logic (x axis with integer values, y axis with standardised variable lines) in any package that can conveniently draw multiple lines (as in time series), but this solution has no extra dependencies and will not become unavailable due to an orphaned package with 3 functions getting purged from CRAN.
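Since the question asked about ggplot2 specifically, the same logic might be sketched there as well. This is only an illustration under assumptions: it uses the built-in iris data rather than crabs so that nothing beyond ggplot2 is required, the reshape to long format is done in base R, and the names vars, std and long are hypothetical. The plot is skipped if ggplot2 is not installed.

```r
# Standardise the numeric columns (mean 0, SD 1), as in the base R answer
vars <- names(iris)[1:4]
std <- as.data.frame(scale(iris[, vars]))
std$id <- seq_len(nrow(std))   # one line per observation
std$group <- iris$Species      # grouping column used for colour

# Reshape to long format with base R: one row per (observation, variable)
long <- data.frame(
  id       = rep(std$id, times = length(vars)),
  group    = rep(std$group, times = length(vars)),
  variable = factor(rep(vars, each = nrow(std)), levels = vars),
  value    = unlist(std[vars], use.names = FALSE)
)

# Draw one polyline per observation across the variables
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = variable, y = value,
                                          group = id, colour = group)) +
    ggplot2::geom_line(alpha = 0.5) +
    ggplot2::labs(x = "Variable", y = "Standardised value")
  print(p)
}
```

For the crabs data, the analogous choice would be columns 4:8 for vars and crabs$sex for the grouping column.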
The closest thing I found to this without the "GGally" was the built in function using the "MASS" library:
#source: https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/parcoord.html
library(MASS)
parcoord(state.x77[, c(7, 4, 6, 2, 5, 3)])
ir <- rbind(iris3[,,1], iris3[,,2], iris3[,,3])
parcoord(log(ir)[, c(3, 4, 2, 1)], col = 1 + (0:149)%/%50)
I'm trying to create a very simple 3D plot using the rgl package: I have a function that just maps x values into y values. For a given z (in my example: z = 1), I can plot this function in a 3D plot:
library(rgl)
mycurve <- function(x) { return (1/x)}
myx <- seq(1, 10, by = 0.1)
plot3d(x = NA, xlim = c(0, 10), ylim = c(0, 10), zlim = c(0, 5),
xlab = "x", ylab = "y", zlab = "height")
lines3d(x = myx, y = mycurve(myx), z = 1)
However, even after hours of trying to understand the documentation of ?persp3d and ?surface3d, I still have no idea how to add a surface to my plot that "connects" my line to the x-y plane – like this:
(To generate this image, I cheated by plotting many lines: for (i in seq(0, 1, by = 0.01)) { lines3d(x = myx, y = mycurve(myx), z = i) }.)
I suppose that I need to supply the correct values to surface3d somehow. From ?surface3d:
The surface is defined by the matrix of height values in z, with rows corresponding to the values in x and columns corresponding to the values in y.
Given that my space curve is "vertical", each value of x corresponds to only 1 value of y. Still, I need to specify two z values for each xy pair, which is why I do not know how to proceed.
How can I plot a space curve as shown in the second image?
In persp3d, all 3 arguments can be matrices, so you can plot arbitrary surfaces. For your needs, this works:
mycurve <- function(x) { return (1/x)}
myx <- seq(1, 10, by = 0.1)
xmat <- matrix(NA, 2, length(myx))
ymat <- matrix(NA, 2, length(myx))
zmat <- matrix(NA, 2, length(myx))
for (i in 0:1) {
xmat[i+1,] <- myx
ymat[i+1,] <- mycurve(myx)
zmat[i+1,] <- i
}
library(rgl)
persp3d(x = xmat, y = ymat, z = zmat, xlim = c(0, 10), ylim = c(0, 10), zlim = c(0, 5),
xlab = "x", ylab = "y", zlab = "height", col = "gray")
The image produced looks like this:
If you want z to depend on x or y, you'll likely want a smaller step size, but this works for the surface you're after.
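As a rough sketch of that variation, the three matrices can also be built with rbind instead of a loop, with the top edge of the ribbon following a height profile. The profile used here (top) is a hypothetical choice made only for illustration, and the rgl call is guarded so the matrix construction can be run without rgl or a display.

```r
mycurve <- function(x) 1 / x
myx <- seq(1, 10, by = 0.1)

# Hypothetical height of the ribbon's top edge along the curve
top <- 1 + 0.2 * myx

# Row 1 is the bottom edge (z = 0), row 2 is the top edge
xmat <- rbind(myx, myx)
ymat <- rbind(mycurve(myx), mycurve(myx))
zmat <- rbind(rep(0, length(myx)), top)

# Draw only in an interactive session with rgl installed
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::persp3d(x = xmat, y = ymat, z = zmat, col = "gray",
               xlab = "x", ylab = "y", zlab = "height")
}
```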
To use the persp3d function one needs to create a matrix for z to correspond to all of the x and y values in the desired range.
I revised your function to take both the x and y parameters and return the desired z value. The outer function evaluates it on every combination of x and y (in a single vectorised call) to fill the matrix. Then plot with the defined x and y axes and the z matrix returned by outer.
library(rgl)
mycurve <- function(x, y) { return (1/x)}
myx <- seq(1, 10, by = 0.4)
myy <-seq(1, 10, by =0.4)
#create matrix
data<-outer(myx, myy, mycurve)
#plot points
persp3d(x=myx, y=myy, z=data,
xlab = "x", ylab = "y", zlab = "height")
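To see how outer lays out the matrix that persp3d expects, a tiny example may help. This is an illustrative sketch with made-up values (xs, ys and f are not from the answer above); the point is that rows follow x and columns follow y, and that the function must be vectorised.

```r
# A vectorised function of x and y; the "+ 0 * y" keeps the result
# the same length as the inputs even though y has no effect
f <- function(x, y) 1 / x + 0 * y

xs <- c(1, 2, 4)
ys <- c(10, 20)

# z[i, j] is f(xs[i], ys[j]): rows correspond to x, columns to y
z <- outer(xs, ys, f)
print(z)
```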
The first figure in link here shows a very nice example of how to visualise standard error and I would like to replicate that in R.
I'm getting there with the following
set.seed(1)
pop<-rnorm(1000,175,10)
mean(pop)
hist(pop)
#-------------------------------------------
# Plotting Standard Error for small Samples
#-------------------------------------------
smallSample <- replicate(10,sample(pop,3,replace=TRUE)) ; smallSample
smallMeans<-colMeans(smallSample)
par(mfrow=c(1,2))
x<-c(1:10)
plot(x,smallMeans,ylab="",xlab = "",pch=16,ylim = c(150,200))
abline(h=mean(pop))
#-------------------------------------------
# Plotting Standard Error for Large Samples
#-------------------------------------------
largeSample <- replicate(10,sample(pop,20,replace=TRUE))
largeMeans<-colMeans(largeSample)
x<-c(1:10)
plot(x,largeMeans,ylab="",xlab = "",pch=16,ylim = c(150,200))
abline(h=mean(pop))
But I'm not sure how to plot the raw data as they have with the X symbols. Thanks.
Using base plotting, you need to use the arrows function.
In base R there is no built-in function (AFAIK) that computes the standard error of the mean, so try this:
sem <- function(x){
sd(x) / sqrt(length(x))
}
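As a quick check of this helper, it can be applied to each column of the sample matrices from the question, giving one standard error per sample. This is a sketch under the question's own setup (same seed and sample sizes); the names se_small and se_large are made up for illustration.

```r
sem <- function(x) {
  sd(x) / sqrt(length(x))
}

set.seed(1)
pop <- rnorm(1000, 175, 10)

# Ten samples of size 3 and ten of size 20, one sample per column
smallSample <- replicate(10, sample(pop, 3, replace = TRUE))
largeSample <- replicate(10, sample(pop, 20, replace = TRUE))

# One standard error per sample (per column)
se_small <- apply(smallSample, 2, sem)
se_large <- apply(largeSample, 2, sem)

print(summary(se_small))
print(summary(se_large))
```

Calling sem on the whole matrix instead would pool all values into a single vector and return one number, not one per sample.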
Plot (using pch = 4 for the x symbols)
plot(x, largeMeans, ylab = "", xlab = "", pch = 4, ylim = c(150,200))
abline(h = mean(pop))
# standard error of each sample (one per column), not of the pooled matrix
se <- apply(largeSample, 2, sem)
arrows(x0 = 1:10, y0 = largeMeans - se * 5, x1 = 1:10, y1 = largeMeans + se * 5, code = 0)
Note: the SEs from the data you provided were quite small, so I multiplied them by 5 to make them more obvious.
Edit
Ahh, to plot all the points, then perhaps ?matplot, and ?matpoints would be helpful? Something like:
matplot(t(largeSample), ylab = "", xlab = "", pch = 4, cex = 0.6, col = 1)
abline(h = mean(pop))
points(largeMeans, pch = 19, col = 2)
Is this more the effect you're after?
I would like to overlay 2 density plots on the same device with R. How can I do that? I searched the web but I didn't find any obvious solution.
My idea would be to read data from a text file (columns) and then use
plot(density(MyData$Column1))
plot(density(MyData$Column2), add=T)
Or something in this spirit.
Use lines for the second one:
plot(density(MyData$Column1))
lines(density(MyData$Column2))
Make sure the limits of the first plot are suitable, though.
ggplot2 is another graphics package that handles things like the range issue Gavin mentions in a pretty slick way. It also handles auto generating appropriate legends and just generally has a more polished feel in my opinion out of the box with less manual manipulation.
library(ggplot2)
#Sample data
dat <- data.frame(dens = c(rnorm(100), rnorm(100, 10, 5))
, lines = rep(c("a", "b"), each = 100))
#Plot.
ggplot(dat, aes(x = dens, fill = lines)) + geom_density(alpha = 0.5)
Adding a base graphics version that takes care of the y-axis limits, adds colors, and works for any number of columns:
If we have a data set:
myData <- data.frame(std.normal=rnorm(1000, mean=0, sd=1),
wide.normal=rnorm(1000, mean=0, sd=2),
exponent=rexp(1000, rate=1),
uniform=runif(1000, min=-3, max=3)
)
Then to plot the densities:
dens <- apply(myData, 2, density)
plot(NA, xlim=range(sapply(dens, "[", "x")), ylim=range(sapply(dens, "[", "y")))
mapply(lines, dens, col=1:length(dens))
legend("topright", legend=names(dens), fill=1:length(dens))
Which gives:
Just to provide a complete set, here's a version of Chase's answer using lattice:
library(lattice)
dat <- data.frame(dens = c(rnorm(100), rnorm(100, 10, 5))
, lines = rep(c("a", "b"), each = 100))
densityplot(~dens,data=dat,groups = lines,
plot.points = FALSE, ref = TRUE,
auto.key = list(space = "right"))
which produces a plot like this:
That's how I do it in base (it's actually mentioned in the comments on the first answer, but I'll show the full code here, including the legend, as I cannot comment yet...)
First you need to get the info on the max values for the y axis from the density plots. So you need to actually compute the densities separately first
dta_A <- density(VarA, na.rm = TRUE)
dta_B <- density(VarB, na.rm = TRUE)
Then plot them according to the first answer and define min and max values for the y axis that you just got. (I set the min value to 0)
plot(dta_A, col = "blue", main = "2 densities on one plot",
ylim = c(0, max(dta_A$y, dta_B$y)))
lines(dta_B, col = "red")
Then add a legend to the top right corner
legend("topright", c("VarA","VarB"), lty = c(1,1), col = c("blue","red"))
I took the above lattice example and made a nifty function. There is probably a better way to do this with reshape via melt/cast. (Comment or edit if you see an improvement.)
multi.density.plot=function(data,main=paste(names(data),collapse = ' vs '),...){
##combines multiple density plots together when given a list
df=data.frame();
for(n in names(data)){
idf=data.frame(x=data[[n]],label=rep(n,length(data[[n]])))
df=rbind(df,idf)
}
densityplot(~x, data = df, groups = label, plot.points = FALSE, ref = TRUE, auto.key = list(space = "right"), main = main, ...)
}
Example usage:
multi.density.plot(list(BN1=bn1$V1,BN2=bn2$V1),main='BN1 vs BN2')
multi.density.plot(list(BN1=bn1$V1,BN2=bn2$V1))
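One possible simplification, offered only as a sketch: base R's stack() turns a named list of vectors into a long data frame (columns values and ind) in one call, which could replace the rbind loop. The data below (samples) is made up for illustration, and the lattice call is guarded in case lattice is not installed.

```r
# Hypothetical input: a named list of numeric vectors of different lengths
set.seed(42)
samples <- list(BN1 = rnorm(50), BN2 = rnorm(80, mean = 2))

# stack() returns one row per value, with the list name in the factor 'ind'
long <- stack(samples)

# Same grouped density plot as above, drawn from the stacked data
if (requireNamespace("lattice", quietly = TRUE)) {
  print(lattice::densityplot(~values, data = long, groups = ind,
                             plot.points = FALSE, ref = TRUE,
                             auto.key = list(space = "right"),
                             main = paste(names(samples), collapse = " vs ")))
}
```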
You can use the ggjoy package (since superseded by ggridges, which offers the same kind of plot through geom_density_ridges). Let's say that we have three different beta distributions such as:
set.seed(5)
b1<-data.frame(Variant= "Variant 1", Values = rbeta(1000, 101, 1001))
b2<-data.frame(Variant= "Variant 2", Values = rbeta(1000, 111, 1011))
b3<-data.frame(Variant= "Variant 3", Values = rbeta(1000, 11, 101))
df<-rbind(b1,b2,b3)
You can get the three different distributions as follows:
library(tidyverse)
library(ggjoy)
ggplot(df, aes(x=Values, y=Variant))+
geom_joy(scale = 2, alpha=0.5) +
scale_y_discrete(expand=c(0.01, 0)) +
scale_x_continuous(expand=c(0.01, 0)) +
theme_joy()
Whenever there are issues of mismatched axis limits, the right tool in base graphics is to use matplot. The key is to leverage the from and to arguments to density.default. It's a bit hackish, but fairly straightforward to roll yourself:
set.seed(102349)
x1 = rnorm(1000, mean = 5, sd = 3)
x2 = rnorm(5000, mean = 2, sd = 8)
xrng = range(x1, x2)
#force the x values at which density is
# evaluated to be the same between 'density'
# calls by specifying 'from' and 'to'
# (and possibly 'n', if you'd like)
kde1 = density(x1, from = xrng[1L], to = xrng[2L])
kde2 = density(x2, from = xrng[1L], to = xrng[2L])
matplot(kde1$x, cbind(kde1$y, kde2$y))
Add bells and whistles as desired (matplot accepts all the standard plot/par arguments, e.g. lty, type, col, lwd, ...).
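For instance, one way this might look with lines, colours and a legend is sketched below. The styling choices, the n = 256 grid size and the same_grid check are illustrative assumptions, and a null PDF device is used so the sketch runs without a display.

```r
set.seed(102349)
x1 <- rnorm(1000, mean = 5, sd = 3)
x2 <- rnorm(5000, mean = 2, sd = 8)
xrng <- range(x1, x2)

# Same 'from', 'to' and 'n' for both calls, so both use one x grid
kde1 <- density(x1, from = xrng[1L], to = xrng[2L], n = 256)
kde2 <- density(x2, from = xrng[1L], to = xrng[2L], n = 256)

# Check that the evaluation points really coincide
same_grid <- identical(kde1$x, kde2$x)

pdf(NULL)  # device that writes no file
matplot(kde1$x, cbind(kde1$y, kde2$y), type = "l", lty = 1, lwd = 2,
        col = c("blue", "red"), xlab = "x", ylab = "Density")
legend("topright", legend = c("x1", "x2"), lty = 1, lwd = 2,
       col = c("blue", "red"), bty = "n")
dev.off()
```

Because both curves share one x vector, matplot picks a y range that fits both without any manual ylim.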