I have a dataset with the length and width of a prolate spheroid:
df <- data.frame(nr = c(1, 2, 3), length = c(4, 5, 3), width = c(2, 2, 1))
df
Now I want to make an extra column with the volume. I've used the formula V = 4/3*pi*a²b, with a and b equal to half the length and half the width, respectively:
df$volume <- (4/3)*pi*(df$length/2)^2*(df$width/2)
This works, but I want to know if there is maybe an inbuilt formula for this in R?
I'm not aware of such a function, and I'd be a bit surprised to see it implemented somewhere, just because it's a somewhat esoteric geometry thing. (Every esoteric stats concept has been implemented somewhere in R or in a package, but esoteric geometry concepts are much rarer since R is not a tool designed with geometric applications specifically in mind.)
That said: whether or not such a thing has already been implemented somewhere, why not just write a custom function using the code you gave?
spheroid_vol <- function(length, width){
  4/3 * pi * (length/2)^2 * (width/2)
}
> spheroid_vol(df$length, df$width)
# 16.755161 26.179939 4.712389
You could also make this a better function by doing things like checking to make sure the inputs are nonempty, etc. -- but that may or may not be worth the effort depending on what you want to do with it.
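For instance, a minimal sketch of such checks might look like this (the particular checks are illustrative, not prescriptive):
spheroid_vol <- function(length, width){
  # illustrative input checks: adjust to what you actually need
  if (!is.numeric(length) || !is.numeric(width))
    stop("length and width must be numeric")
  if (length(length) == 0L || length(width) == 0L)   # base length() is still found here
    stop("length and width must be non-empty")
  if (any(c(length, width) <= 0, na.rm = TRUE))
    warning("non-positive dimensions give meaningless volumes")
  4/3 * pi * (length/2)^2 * (width/2)
}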
I have a list of polygons (and multi-polygons) named p_1, p_2, ..., p_n. And I would like to obtain the area in which they all intersect. As st_intersection() does not accept lists as arguments, I tried the following three approaches. None of them provides a satisfactory solution, which is why I am looking for alternative, more efficient techniques.
(i) I could loop through the list
for(i in P) p_1 <- st_intersection(p_1, i)
where P is a list containing polygons p_2 to p_n. But that is rather slow.
(ii) A do.call() approach, i.e.
p <- do.call(st_intersection, P)
where P is a list of polygons p_1 to p_n, only computes the intersection between the first two polygons in the list.
(iii) I could combine the polygons into one sf object and then run st_intersection():
p <- do.call(c, P) %>%
st_sf() %>%
st_intersection()
It works, but it is slow, presumably because it also derives a lot of other polygons apart from the common intersection of all polygons in P.
Looping through a hierarchy of pairwise comparisons in a parallelized framework might be faster, but I assume there is a simpler and more efficient solution than that.
Any comments and suggestions are welcome.
A note to the person who closed this question yesterday: do not close this question. Comment or send me a private message if you personally have a problem with it. But do not close it.
I don't think the overhead of iterating through a list is a problem here: finding the intersection of multiple polygons is just computationally expensive. However, the method of sequentially applying a function to the members of a list (effectively what you were trying to do with do.call) is easily managed using purrr::accumulate, as shown below.
You don't have a reproducible example for folks here to test possible solutions, and creating sf polygons from scratch involves some work, so that may have been why your previous question was closed - I don't know.
Anyway, let's create three overlapping squares in a list and draw them:
library(sf)
library(purrr)
# create a square as a closed ring of vertex coordinates
s1 <- rbind(c(1, 1), c(10, 1), c(10, 10), c(1, 10), c(1, 1))
# three overlapping squares: the original plus two shifted copies
p <- list(s1 = s1, s2 = s1 + 4, s3 = s1 - 4)
# convert each coordinate matrix into an sfc polygon
p <- lapply(p, function(x) st_sfc(st_polygon(list(x))))
plot(p[[1]], xlim = c(-5, 15), ylim = c(-5, 15))
plot(p[[2]], add = TRUE)
plot(p[[3]], add = TRUE)
Our goal is to find the intersection of all three squares, which of course is the tiny square in the center. Using purrr, this is as easy as:
intersection <- accumulate(p, st_intersection)$s3
So when we add our result, coloured red, we get:
plot(intersection, col = "red", add = TRUE)
In terms of performance, accumulate is only about 10% faster than the raw loop, so you may need to parallelize if performance is a big problem. Also, if there is a possibility that the polygons have no common intersection, you can find your smallest polygon and use st_intersects to check that all the other polygons actually intersect it. This is a much quicker calculation, and it is worth doing provided there is a fair chance of there being no common intersection.
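A rough sketch of that pre-check, reusing the list p from above (here p[[1]] simply stands in for the smallest polygon; with real data you would pick the element with the smallest st_area()):
# pre-check: does every other polygon at least intersect the reference polygon?
others <- do.call(c, p[-1])
hits <- st_intersects(p[[1]], others)[[1]]
if (length(hits) < length(others)) {
  message("at least one polygon misses the reference, so there is no common intersection")
}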
I have some data with a special structure that requires me to write my own k-means function. Not far into this, I have already noticed the extremely high computation time when calculating the distance of a center to all data points. Because my data is going to be about 60 times larger in the future and I'll need to do runs with many different cluster sizes, I am very concerned with speed.
I have attached a minimal example of calculating the distance from one randomly sampled center to each data point. I am not yet experienced using C++ or parallel computation in R, but I am unsure which of these solutions is the best approach to my problem (here and there some people claim you should parallelize whenever possible, some claim it is almost never necessary, and some advise for or against using Rcpp). As with most things in life, I'm sure there are cases where each of these answers is correct. However, what are the general circumstances for going with which approach?
(I have profiled this code and couldn't find anything I could improve just within the R code for speed. If you have any suggestions however, please let me know as well!)
x <- matrix(runif(15000*34),nrow = 15000, ncol = 34)
w <- matrix(runif(15000*17),nrow = 15000, ncol = 17)
k <- 3
i <- 1
centers <- x[sample.int(nrow(x), size = k),]
weighted_matching <- function(point, center, weight){
  # reshape the length-34 vectors into 17 x 2 matrices (17 two-dimensional points)
  point <- matrix(point, ncol = 2, nrow = 17, byrow = TRUE)
  center <- matrix(center, ncol = 2, nrow = 17, byrow = TRUE)
  # weighted combination of the per-point values, scaled by the total weight
  1/sum(weight) *
    sum(weight * apply(point, 1, function(x, y) sqrt(sum((x - y)^2)), y = center))
}
system.time(
apply(x, 1, weighted_matching, weight = w, center = centers[i,])
)
There are two cases in which I use C++ as a replacement for R:
when I need to iterate over lots of elements (large for-loops)
when I want to reduce the memory footprint
In your case, you are already using vectorized code instead of loops, so the first point does not apply.
The second point, however, could be relevant: you're computing (x - y)^2, which creates two new temporary vectors.
Rewriting this in C++ would use less memory and might give a 2-3 fold improvement in computation time.
But when I hear "computing distances", my first instinct is to try to express the calculation as matrix computations (linear algebra), as sketched below.
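For what it's worth, here is a rough sketch of that matrix-based idea in plain R. Two assumptions are mine, not the question's: each row of w is taken as the 17 weights belonging to the corresponding row of x, and the target quantity is taken to be the weighted mean of the Euclidean distances between the 17 two-dimensional points of a row and the corresponding points of the center (with byrow = TRUE those points sit in the column pairs (1,2), (3,4), ..., (33,34)). It is only meant to show the shape of a vectorized, linear-algebra style formulation, so the numbers will not necessarily match the posted apply() version:
# columns holding the first and second coordinate of the 17 points per row
odd <- seq(1, 33, by = 2)
even <- seq(2, 34, by = 2)

weighted_matching_mat <- function(x, center, w) {
  # coordinate differences to the center, computed for all rows at once
  dx <- sweep(x[, odd, drop = FALSE], 2, center[odd])
  dy <- sweep(x[, even, drop = FALSE], 2, center[even])
  d <- sqrt(dx^2 + dy^2)        # 15000 x 17 matrix of Euclidean distances
  rowSums(w * d) / rowSums(w)   # weighted mean distance per row
}

system.time(res <- weighted_matching_mat(x, centers[i, ], w))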
I have a set of one-dimensional data points (locations on a segment), and I would like to test for Complete Spatial Randomness. I was planning to run the Gest (nearest neighbor), Fest (empty space) and Kest (pairwise distances) functions on it.
I am not sure how I should import my data set though. I can use ppp by setting a second dimension to 0, e.g.:
myDistTEST<- data.frame(
col1= sample(x = 1:100, size = 50, replace = FALSE),
col2= paste('Event', 1:50, sep = ''), stringsAsFactors = FALSE)
myDistTEST<- myDistTEST[order(myDistTEST$col1),]
myPPPTest<- ppp(x = myDistTEST[,1], y = replicate(n = 50, expr = 0),
c(1,120), c(0,0))
But I am not sure this is the proper way to format my data. I have also tried to use lpp, but I am not sure how to construct the linnet object. What would be the correct way to import my data?
Thank you for your kind attention.
It would be wrong to simply set y = 0 for all your points and then proceed as if you had a point pattern in two dimensions. Your suggestion of using lpp is good. Regarding how to define the linnet and lpp objects, have a look at my answer here.
I have considered making a small package to handle one-dimensional patterns more easily in spatstat, but so far I have only started the package with a single function to make the definition of the appropriate lpp easier. If you feel adventurous, you can install it from the GitHub repo via the remotes package:
remotes::install_github("rubak/spatstat.1d")
The single function you can use is called lpp1. It basically just wraps up the few steps described in the linked answer.
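If you prefer to stay with plain spatstat, a rough sketch of those steps for the example above might look like the following (the segment endpoints mirror the window used in the question; the y-range of the owin is only there to give the vertex pattern a valid two-dimensional window, and the details should be treated as illustrative rather than canonical):
library(spatstat)

# a linear network consisting of a single segment from x = 1 to x = 120
verts <- ppp(x = c(1, 120), y = c(0, 0), window = owin(c(1, 120), c(-1, 1)))
L <- linnet(vertices = verts, edges = matrix(c(1, 2), ncol = 2))

# place the observed locations on the network (y = 0 puts them on the segment)
X <- lpp(data.frame(x = myDistTEST$col1, y = 0), L)
plot(X)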
Somewhat inexplicably, the length parameter in arrows is specified in inches (from ?arrows):
length: length of the edges of the arrow head (in inches).
The R source even goes so far as to note explicitly in a comment that this measurement is in inches, highlighting how peculiar this design is.
That means the relative size of the arrows depends on dev.size(). What's not clear is how to translate inches into axis units (which are infinitely more useful in the first place). Here's a simplified version:
h = c(1, 2, 3)
xs = barplot(h, space = 0, ylim = c(0, 4))
arrows(xs, h - .5, xs, h + .5,
length = .5*mean(diff(xs)))
How this displays will depend on the device. E.g. here is the output on this device:
png('test.png', width = 5, height = 5, units = 'in', res = 96)  # 5 x 5 inch device
And here it is on another:
png('test.png', width = 8, height = 8, units = 'in', res = 96)  # 8 x 8 inch device
It's a bit of an optical illusion to tell on sight, but the arrows are indeed the same width in the two plots. How can I control this so that both plots (which convey the same data) display identically? More specifically, how can I make sure that the arrows are exactly .5 plot units in width?
I spent far too much time in the rabbit hole on this, but here goes. I'll document a bit of my journey first, to give others who happen upon this a sense of the nooks and crannies worth searching when trying to pull yourself up by your bootstraps.
I started looking in the source of arrows, but to no avail, since it quickly dives into internal code. So I searched the R source for "C_arrows" to find what's happening; luckily, it's not too esoteric, as far as R internal code goes. Poking around, it seems the workhorse is actually GArrow, but this was a dead end: the length parameter isn't really transformed there (IIUC, the conversion to inches is done for the other coordinates and length is left untouched). But I happened to notice some GConvert calls that looked closer to what I want, and hoped to find some user-facing function that calls these directly.
This led me to go back to R and to simply run through the gamut of functions in the same package as arrows looking for anything that could be useful:
ls(envir = as.environment('package:grDevices'))
ls(envir = as.environment('package:graphics'))
Finally I found three functions in graphics: xinch, yinch, and xyinch (all documented on ?xinch). They do the opposite of my goal here: they take inches and convert them into device units (in the x, y, and x&y directions, respectively). Luckily, these functions are all very simple; e.g., the workhorse of xinch is the conversion factor:
diff(par("usr")[1:2])/par("pin")[1L]
Examining ?par (for the 1,000,000th time), pin and usr are indeed exactly the graphical parameters we need (pin is new to me, usr comes up here and there):
pin: The current plot dimensions, (width, height), in inches.
usr: A vector of the form c(x1, x2, y1, y2) giving the extremes of the user coordinates of the plotting region.
Hence, we can convert from plot units to inches by inverting this function:
xinch_inv = function(dev_unit) {
dev_unit * par("pin")[1L]/diff(par("usr")[1:2])
}
h = c(1, 2, 3)
xs = barplot(h, space = 0, ylim = c(0, 4))
arrows(xs, h - .5, xs, h + .5,
# just convert plot units to inches
length = xinch_inv(.5*mean(diff(xs))))
Resulting in (5x5):
And (8x8):
One further note: it appears length is the length of each side of the arrow head. Using length = xinch_inv(.5), code = 3, angle = 90 results in segments as wide as the bars (i.e., 1), as demonstrated below.
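A quick way to see that, assuming the barplot from the chunk above is still the active device:
# with angle = 90 the arrow head is drawn as a flat crossbar; each of its two
# sides is xinch_inv(.5) long, so the full crossbar spans 1 plot unit,
# exactly the width of a bar
arrows(xs, h - .5, xs, h + .5, code = 3, angle = 90,
       length = xinch_inv(.5))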
On the off chance you're interested, I've packaged these in my package as xdev2in, etc.; GitHub only for now.
I'm using psych::principal in another function, with various rotate functions passed to principal.
(principal offers many rotation options and passes them on to various other functions.)
I need to get the rotation matrix found and applied by whichever rotation procedure was used.
All of the downstream rotation procedures offer this, but it appears not to be return()ed by principal.
For example:
randomcor <- cor(matrix(data = rnorm(n = 100), nrow = 10))
library(psych)
principalres <- principal(r = randomcor, nfactors = 3, rotate = "none")
unrot.loa <- unclass(principalres$loadings)
principalrot <- principal(r = randomcor, nfactors = 3, rotate = "varimax") # there is no way to retrieve the rot.mat from principal
# but this CAN be done from the underlying varimax!
varimaxres <- varimax(x = unrot.loa)
varimaxres$rotmat # see, THIS is what I want!
I am loath to re-implement all of the rotation procedures from principal.
(Don't repeat yourself, or someone else, as they say.)
Does anyone have an idea how:
I could elegantly, somehow, magically retrieve rotmat from principal(), even though it appears not to return it?
I could, alternatively, impute whichever rotmat must have "happened", because I know the rotated and unrotated loadings?
As promised by William Revelle, as of version 1.5.8 psych also returns the rotation matrices for factor analyses and principal components analysis.
This solves the problem, continuing the above example:
principalrot$rot.mat == varimaxres$rotmat # it works!
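As a side note on the second option raised in the question (useful for older psych versions): since stats::varimax returns loadings equal to the unrotated loadings multiplied by the rotation matrix, the rotation matrix can also be imputed from the two loading matrices by a least-squares solve. A rough sketch, continuing the example above (illustrative only, and it assumes no column reordering or sign flipping between the two sets of loadings):
# impute the rotation matrix from the unrotated and rotated loadings,
# assuming rotated = unrotated %*% rotmat
rotmat_imputed <- qr.solve(unrot.loa, unclass(varimaxres$loadings))
all.equal(unname(rotmat_imputed), unname(varimaxres$rotmat))
# should be TRUE up to numerical tolerance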