Looping in function using R - r

require(fracdiff)
#load your data, n is the sample size
x<-matrix(scan("data.txt", n = ), n, 1, byrow = TRUE)
x<-log(x)
x<-x-mean(x)
n<-length(x)
#select the truncation
m<-round(n^(0.7))
perdx<-px[2:n]
fn<-function(h)
{
lambda<-freq[1:m]^(2*h-1)
Gh=mean(perdx[1:m]*lambda)
Rh=log(Gh)-(2*h-1)*mean(log(freq[1:m]))
return(Rh)
}
est<-optimize(fn,c(0,1.5),tol = 0.00001)
hhat<-est$minimum
b <- hhat-0.5
b
I have this function written in R and I want to do loop for m, where m<-round(n^(0.7)) where the power of n running from 0.3-0.8(the default is 0.7, so I have different value of m supply to the function), and ultimately get the string of b (every b for the power of n running from 0.3-0.8), but so far I have been unsuccessful. Furthermore, I want to plot the different values of b with respect to m. I'm really hoping anyone can suggest me how get the result. Any suggestion is highly appreciated. thank you.

First some dummy data:
set.seed(1983)
freq <- rnorm(100, mean=10)
perdx <- rnorm(100, mean=100, sd=10)
Then your function (slightly shortened and modified: you need to add m as a parameter since it will be changing at each iteration), vector m and an empty vector b (you do want to allocate upfront the length of the vector):
m <- 1:100
b <- vector(length=length(m))
fn <- function(h,m){
lambda <- freq[1:m]^(2*h-1)
Gh <- mean(perdx[1:m]*lambda)
log(Gh)-(2*h-1)*mean(log(freq[1:m]))
}
And finally your loop:
for(i in seq_along(m)){b[i] <- optimize(fn,c(0,1.5),tol = 0.00001, m=m[i])$minimum - 0.5}
b
[1] 0.370809995 0.143004969 0.295652288 0.341975500 0.155323568 -0.004270843 -0.004463482 -0.005151013 -0.019702428 -0.066622938 -0.071558276 -0.051269281 -0.010162819 -0.011613268
[15] -0.043173232 -0.023702358 -0.017404588 -0.041314701 -0.041849379 -0.039659543 -0.042926431 -0.041149212 -0.050584172 -0.051101425 -0.051999900 -0.053473729 -0.007245899 -0.023556513
[29] -0.026109458 -0.035935971 -0.063366257 -0.048185532 -0.051862241 -0.051659993 -0.078318886 -0.080683266 -0.082146068 -0.088776082 -0.095815094 -0.097276217 -0.099827675 -0.090215664
[43] -0.091023273 -0.090649640 -0.091877778 -0.091318363 -0.083812376 -0.091700899 -0.086337626 -0.105456723 -0.105972890 -0.101094946 -0.101748039 -0.101323129 -0.070511638 -0.081105305
[57] -0.072667430 -0.072361640 -0.069692202 -0.067384208 -0.072985712 -0.063617816 -0.064122242 -0.067135980 -0.070663150 -0.069359528 -0.069691113 -0.084422380 -0.073379583 -0.072209507
[71] -0.069132825 -0.067681419 -0.063782326 -0.057532656 -0.063031479 -0.054001810 -0.053523184 -0.051783114 -0.053388449 -0.055742505 -0.052429781 -0.058399275 -0.059529803 -0.059389065
[85] -0.058834476 -0.043061836 -0.045186752 -0.048336234 -0.055597368 -0.065307991 -0.060903775 -0.062518358 -0.062898386 -0.059452595 -0.051983381 -0.049742105 -0.050124722 -0.049212744
[99] -0.041458672 -0.043251041

Related

fill NA raster cells using focal defined by boundary

I have a raster and a shapefile. The raster contains NA and I am filling the NAs using the focal function
library(terra)
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
r[45:60, 45:60] <- NA
r_fill <- terra::focal(r, 5, mean, na.policy="only", na.rm=TRUE)
However, there are some NA still left. So I do this:
na_count <- terra::freq(r_fill, value = NA)
while(na_count$count != 0){
r_fill <- terra::focal(r_fill, 5, mean, na.policy="only", na.rm=TRUE)
na_count <- terra::freq(r_fill, value = NA)
}
Once all NA's are filled, I clip the raster again using the shapefile
r_fill <- terra::crop(r_fill, v, mask = T, touches = T)
This is what my before and after looks like:
I wondered if the while loop is an efficient way to fill the NAs or basically determine how many times I have to run focal to fill all the NAs in the raster.
Perhaps we can, or want to, dispense with the while( altogether by making a better estimate of focal('s w= arg in a world where r, as ground truth, isn't available. Were it available, we could readily derive direct value of w
r <- rast(system.file("ex/elev.tif", package="terra"))
# and it's variants
r2 <- r
r2[45:60, 45:60] <- NA
freq(r2, value=NA) - freq(r, value=NA)
layer value count
1 0 NA 256
sqrt((freq(r2, value=NA) - freq(r, value=NA))$count)
[1] 16
which might be a good value for w=, and introducing another variant
r3 <- r
r3[40:47, 40:47] <- NA
r3[60:67, 60:67] <- NA
r3[30:37, 30:37] <- NA
r3[70:77, 40:47] <- NA
rm(r)
We no longer have our ground truth. How might we estimate an edge of w=? Turning to boundaries( default values (inner)
r2_bi <- boundaries(r2)
r3_bi <- boundaries(r3)
# examining some properties of r2_bi, r3_bi
freq(r2_bi, value=1)$count
[1] 503
freq(r3_bi, value=1)$count
[1] 579
freq(r2_bi, value=1)$count/freq(r2_bi, value = 0)$count
[1] 0.1306833
freq(r3_bi, value=1)$count/freq(r3_bi, value = 0)$count
[1] 0.1534588
sum(freq(r2_bi, value=1)$count,freq(r2_bi, value = 0)$count)
[1] 4352
sum(freq(r3_bi, value=1)$count,freq(r3_bi, value = 0)$count)
[1] 4352
Taken in reverse order, sum[s] and freq[s] suggest that while the total area of (let's call them holes) are the same, they differ in number and r2 is generally larger than r3. This is also clear from the first pair of freq[s].
Now we drift into some voodoo, hocus pocus in pursuit of a better edge estimate
sum(freq(r2)$count) - sum(freq(r2, value = NA)$count)
[1] 154
sum(freq(r3)$count) - sum(freq(r3, value = NA)$count)
[1] 154
(sum(freq(r3)$count) - sum(freq(r3, value = NA)$count))
[1] 12.40967
freq(r2_bi, value=1)$count/freq(r2_bi, value = 0)$count
[1] 0.1306833
freq(r2_bi, value=0)$count/freq(r2_bi, value = 1)$count
[1] 7.652087
freq(r3_bi, value=1)$count/freq(r3_bi, value = 0)$count
[1] 0.1534588
taking the larger, i.e. freq(r2_bi 7.052087
7.652087/0.1306833
[1] 58.55444
154+58
[1] 212
sqrt(212)
[1] 14.56022
round(sqrt(212)+1)
[1] 16
Well, except for that +1 part, maybe still a decent estimate for w=, to be used on both r2 and r3 if called upon to find a better w, and perhaps obviate the need for while(.
Another approach to looking for squares and their edges:
wtf3 <- values(r3_bi$elevation)
wtf2 <- values(r2_bi$elevation)
wtf2_tbl_df2 <- as.data.frame(table(rle(as.vector(is.na(wtf2)))$lengths))
wtf3_tbl_df2 <- as.data.frame(table(rle(as.vector(is.na(wtf3)))$lengths))
names(wtf2_tbl_df2)
[1] "Var1" "Freq"
wtf2_tbl_df2[which(wtf2_tbl_df2$Var1 == wtf2_tbl_df2$Freq), ]
Var1 Freq
14 16 16
wtf3_tbl_df2[which(wtf3_tbl_df2$Freq == max(wtf3_tbl_df2$Freq)), ]
Var1 Freq
7 8 35
35/8
[1] 4.375 # 4 squares of 8 with 3 8 length vectors
bringing in v finally and filling
v <- vect(system.file("ex/lux.shp", package="terra"))
r2_fill_17 <- focal(r2, 16 + 1 , mean, na.policy='only', na.rm = TRUE)
r3_fill_9 <- focal(r3, 8 + 1 , mean, na.policy='only', na.rm = TRUE)
r2_fill_17_cropv <- crop(r2_fill_17, v, mask = TRUE, touches = TRUE)
r3_fill_9_cropv <- crop(r3_fill_9, v, mask = TRUE, touches = TRUE)
And I now appreciate your while( approach as your r2 looks better, more naturally transitioned, though the r3 looks fine. In my few, brief experiments with smaller than 'hole', i.e. focal(r2, 9, I got the sense it would take 2 passes to fill, that suggests focal(r2, 5 would take 4.
I guess further determining the proportion of fill:hole:rast for when to deploy a while would be worthwhile.

Combinatorial optimization with discrete options in R

I have a function with five variables that I want to maximize using only an specific set of parameters for each variable.
Are there any methods in R that can do this, other than by brutal force? (e.g. Particle Swarm Optimization, Genetic Algorithm, Greedy, etc.). I have read a few packages but they seem to create their own set of parameters from within a given range. I am only interested in optimizing the set of options provided.
Here is a simplified version of the problem:
#Example of 5 variable function to optimize
Fn<-function(x){
a=x[1]
b=x[2]
c=x[3]
d=x[4]
e=x[5]
SUM=a+b+c+d+e
return(SUM)
}
#Parameters for variables to optimize
Vars=list(
As=c(seq(1.5,3, by = 0.3)), #float
Bs=c(1,2), #Binary
Cs=c(seq(1,60, by=10)), #Integer
Ds=c(seq(60,-60, length.out=5)), #Negtive
Es=c(1,2,3)
)
#Full combination
FullCombn= expand.grid(Vars)
Results=data.frame(I=as.numeric(), Sum=as.numeric())
for (i in 1:nrow(FullCombn)){
ParsI=FullCombn[i,]
ResultI=Fn(ParsI)
Results=rbind(Results,c(I=i,Sum=ResultI))
}
#Best iteration (Largest result)
Best=Results[Results[, 2] == max(Results[, 2]),]
#Best parameters
FullCombn[Best$I,]
Two more possibilities. Both minimize by default, so I flip the sign in your objective function (i.e. return -SUM).
#Example of 5 variable function to optimize
Fn<-function(x, ...){
a=x[1]
b=x[2]
c=x[3]
d=x[4]
e=x[5]
SUM=a+b+c+d+e
return(-SUM)
}
#Parameters for variables to optimize
Vars=list(
As=c(seq(1.5,3, by = 0.3)), #float
Bs=c(1,2), #Binary
Cs=c(seq(1,60, by=10)), #Integer
Ds=c(seq(60,-60, length.out=5)), #Negtive
Es=c(1,2,3)
)
First, a grid search. Exactly what you did, just convenient. And the implementation allows you to distribute the evaluations of the objective function.
library("NMOF")
gridSearch(fun = Fn,
levels = Vars)[c("minfun", "minlevels")]
## 5 variables with 6, 2, 6, 5, ... levels: 1080 function evaluations required.
## $minfun
## [1] -119
##
## $minlevels
## [1] 3 2 51 60 3
An alternative: a simple Local Search. You start with a valid initial guess, and then move randomly through possible feasible solutions. The key ingredient is the neighbourhood function. It picks one element randomly and then, again randomly, sets this element to one allowed value.
nb <- function(x, levels, ...) {
i <- sample(length(levels), 1)
x[i] <- sample(levels[[i]], 1)
x
}
(There would be better algorithms for neighbourhood functions; but this one is simple and so demonstrates the idea well.)
LSopt(Fn, list(x0 = c(1.8, 2, 11, 30, 2), ## a feasible initial solution
neighbour = nb,
nI = 200 ## iterations
),
levels = Vars)$xbest
## Local Search.
## ##...
## Best solution overall: -119
## [1] 3 2 51 60 3
(Disclosure: I am the maintainer of package NMOF, which provides functions gridSearch and LSopt.)
In response to the comment, a few remarks on Local Search and the neighbourhood function above (nb). Local Search, as implemented in
LSopt, will start with an arbitrary solution, and
then change that solution slightly. This new solution,
called a neighbour, will be compared (by its
objective-function value) to the old solution. If the new solution is
better, it becomes the current solution; otherwise it
is rejected and the old solution remains the current one.
Then the algorithm repeats, for a number of iterations.
So, in short, Local Search is not random sampling, but
a guided random-walk through the search space. It's
guided because only better solutions get accepted, worse one's get rejected. In this sense, LSopt will narrow down on good parameter values.
The implementation of the neighbourhood is not ideal
for two reasons. The first is that a solution may not
be changed at all, since I sample from feasible
values. But for a small set of possible values as here,
it might often happen that the same element is selected
again. However, for larger search spaces, this
inefficiency is typically negligible, since the
probability of sampling the same value becomes
smaller. Often so small, that the additional code for
testing if the solution has changed becomes more
expensive that the occasionally-wasted iteration.
A second thing could be improved, albeit through a more
complicated function. And again, for this small problem it does not matter. In the current neighbourhood, an
element is picked and then set to any feasible value.
But that means that changes from one solution to the
next might be large. Instead of picking any feasible values of the As,
in realistic problems it will often be better to pick a
value close to the current value. For example, when you are at 2.1, either move to 1.8 or 2.4, but not to 3.0. (This reasoning is only relevant, of course, if the variable in question is on a numeric or at least ordinal scale.)
Ultimately, what implementation works well can be
tested only empirically. Many more details are in this tutorial.
Here is one alternative implementation. A solution is now a vector of positions for the original values, e.g. if x[1] is 2, it "points" to 1.8, if x[2] is 2, it points to 1, and so on.
## precompute lengths of vectors in Vars
lens <- lengths(Vars)
nb2 <- function(x, lens, ...) {
i <- sample(length(lens), 1)
if (x[i] == 1L) {
x[i] <- 2
} else if (x[i] == lens[i]) {
x[i] <- lens[i] - 1
} else
x[i] <- x[i] + sample(c(1, -1), 1)
x
}
## the objective function now needs to map the
## indices in x back to the levels in Vars
Fn2 <- function(x, levels, ...){
y <- mapply(`[`, levels, x)
## => same as
## y <- numeric(length(x))
## y[1] <- Vars[[1]][x[1]]
## y[2] <- Vars[[2]][x[2]]
## ....
SUM <- sum(y)
return(-SUM)
}
xbest <- LSopt(Fn2,
list(x0 = c(1, 1, 1, 1, 1), ## an initial solution
neighbour = nb2,
nI = 200 ## iterations
),
levels = Vars,
lens = lens)$xbest
## Local Search.
## ....
## Best solution overall: -119
## map the solution back to the values
mapply(`[`, Vars, xbest)
## As Bs Cs Ds Es
## 3 2 51 60 3
Here is a genetic algorithm solution with package GA.
The key is to write a function decode enforcing the constraints, see the package vignette.
library(GA)
#> Loading required package: foreach
#> Loading required package: iterators
#> Package 'GA' version 3.2.2
#> Type 'citation("GA")' for citing this R package in publications.
#>
#> Attaching package: 'GA'
#> The following object is masked from 'package:utils':
#>
#> de
decode <- function(x) {
As <- Vars$As
Bs <- Vars$Bs
Cs <- Vars$Cs
Ds <- rev(Vars$Ds)
# fix real variable As
i <- findInterval(x[1], As)
if(x[1L] - As[i] < As[i + 1L] - x[1L])
x[1L] <- As[i]
else x[1L] <- As[i + 1L]
# fix binary variable Bs
if(x[2L] - Bs[1L] < Bs[2L] - x[2L])
x[2L] <- Bs[1L]
else x[2L] <- Bs[2L]
# fix integer variable Cs
i <- findInterval(x[3L], Cs)
if(x[3L] - Cs[i] < Cs[i + 1L] - x[3L])
x[3L] <- Cs[i]
else x[3L] <- Cs[i + 1L]
# fix integer variable Ds
i <- findInterval(x[4L], Ds)
if(x[4L] - Ds[i] < Ds[i + 1L] - x[4L])
x[4L] <- Ds[i]
else x[4L] <- Ds[i + 1L]
# fix the other, integer variable
x[5L] <- round(x[5L])
setNames(x , c("As", "Bs", "Cs", "Ds", "Es"))
}
Fn <- function(x){
x <- decode(x)
# a <- x[1]
# b <- x[2]
# c <- x[3]
# d <- x[4]
# e <- x[5]
# SUM <- a + b + c + d + e
SUM <- sum(x, na.rm = TRUE)
return(SUM)
}
#Parameters for variables to optimize
Vars <- list(
As = seq(1.5, 3, by = 0.3), # Float
Bs = c(1, 2), # Binary
Cs = seq(1, 60, by = 10), # Integer
Ds = seq(60, -60, length.out = 5), # Negative
Es = c(1, 2, 3)
)
res <- ga(type = "real-valued",
fitness = Fn,
lower = c(1.5, 1, 1, -60, 1),
upper = c(3, 2, 51, 60, 3),
popSize = 1000,
seed = 123)
summary(res)
#> ── Genetic Algorithm ───────────────────
#>
#> GA settings:
#> Type = real-valued
#> Population size = 1000
#> Number of generations = 100
#> Elitism = 50
#> Crossover probability = 0.8
#> Mutation probability = 0.1
#> Search domain =
#> x1 x2 x3 x4 x5
#> lower 1.5 1 1 -60 1
#> upper 3.0 2 51 60 3
#>
#> GA results:
#> Iterations = 100
#> Fitness function value = 119
#> Solutions =
#> x1 x2 x3 x4 x5
#> [1,] 2.854089 1.556080 46.11389 49.31045 2.532682
#> [2,] 2.869408 1.638266 46.12966 48.71106 2.559620
#> [3,] 2.865254 1.665405 46.21684 49.04667 2.528606
#> [4,] 2.866494 1.630416 46.12736 48.78017 2.530454
#> [5,] 2.860940 1.650015 46.31773 48.92642 2.521276
#> [6,] 2.851644 1.660358 46.09504 48.81425 2.525504
#> [7,] 2.855078 1.611837 46.13855 48.62022 2.575492
#> [8,] 2.857066 1.588893 46.15918 48.60505 2.588992
#> [9,] 2.862644 1.637806 46.20663 48.92781 2.579260
#> [10,] 2.861573 1.630762 46.23494 48.90927 2.555612
#> ...
#> [59,] 2.853788 1.640810 46.35649 48.87381 2.536682
#> [60,] 2.859090 1.658127 46.15508 48.85404 2.590679
apply(res#solution, 1, decode) |> t() |> unique()
#> As Bs Cs Ds Es
#> [1,] 3 2 51 60 3
Created on 2022-10-24 with reprex v2.0.2

3D with value interpolation in R (X, Y, Z, V)

Is there an R package that does X, Y, Z, V interpolation? I see that Akima does X, Y, V but I need one more dimension.
Basically I have X,Y,Z coordinates plus the value (V) that I want to interpolate. This is all GIS data but my GIS does not do voxel interpolation
So if I have a point cloud of XYZ coordinates with a value of V, how can I interpolate what V would be at XYZ coordinate (15,15,-12) ? Some test data would look like this:
X <-rbind(10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50)
Y <- rbind(10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50)
Z <- rbind(-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29)
V <- rbind(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,25,35,75,25,50,0,0,0,0,0,10,12,17,22,27,32,37,25,13,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50,125,130,105,110,115,165,180,120,100,80,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
I had the same question and was hoping for an answer in R.
My question was: How do I perform 3D (trilinear) interpolation using regular gridded coordinate/value data (x,y,z,v)? For example, CT images, where each image has pixel centers (x, y) and greyscale value (v) and there are multiple image "slices" (z) along the thing being imaged (e.g., head, torso, leg, ...).
There is a slight problem with the given example data.
# original example data (reformatted)
X <- rep( rep( seq(10, 50, by=10), each=25), 3)
Y <- rep( rep( seq(10, 50, by=10), each=5), 15)
Z <- rep(c(-5, -17, -29), each=125)
V <- rbind(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,25,35,75,25,50,0,0,0,0,0,10,12,17,22,27,32,37,25,13,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50,125,130,105,110,115,165,180,120,100,80,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
# the dimensions of the 3D grid described do not match the number of values
(length(unique(X))*length(unique(Y))*length(unique(Z))) == length(V)
## [1] FALSE
## which makes sense since 75 != 375
# visualize this:
library(rgl)
plot3d(x=X, y=Y, z=Z, col=terrain.colors(181)[V])
# examine the example data real quick...
df <- data.frame(x=X,y=Y,z=Z,v=V);
head(df);
table(df$x, df$y, df$z);
# there are 5 V values at each X,Y,Z coordinate... duplicates!
# redefine Z so there are 15 unique values
# making 375 unique coordinate points
# and matching the length of the given value vector, V
df$z <- seq(-5, -29, length.out=15)
head(df)
table(df$x, df$y, df$z);
# there is now 1 V value at each X,Y,Z coordinate
# that was for testing, now actually redefine the Z vector.
Z <- rep(seq(-5,-29, length.out = 15), 25)
# plot it.
library(rgl)
plot3d(x=X, y=Y, z=Z, col=terrain.colors(181)[V])
I couldn't find any 4D interpolation functions in the usual R packages, so I wrote a quick and dirty one. The following implements (without ANY error checking... caveat emptor!) the technique described at: https://en.wikipedia.org/wiki/Trilinear_interpolation
# convenience function #1:
# define a function that takes a vector of lookup values and a value to lookup
# and returns the two lookup values that the value falls between
between = function(vec, value) {
# extract list of unique lookup values
u = unique(vec)
# difference vector
dvec = u - value
vals = c(u[dvec==max(dvec[dvec<0])], u[dvec==min(dvec[dvec>0])])
return(vals)
}
# convenience function #2:
# return the value (v) from a grid data.frame for given point (x, y, z)
get_value = function(df, xi, yi, zi) {
# assumes df is data.frame with column names: x, y, z, v
subset(df, x==xi & y==yi & z==zi)$v
}
# inputs df (x,y,z,v), points to look up (x, y, z)
interp3 = function(dfin, xin, yin, zin) {
# TODO: check if all(xin, yin, zin) equals a grid point, if so just return the point value
# TODO: check if any(xin, yin, zin) equals a grid point, if so then do bilinear or linear interp
cube_x <- between(dfin$x, xin)
cube_y <- between(dfin$y, yin)
cube_z <- between(dfin$z, zin)
# find the two values in each dimension that the lookup value falls within
# and extract the cube of 8 points
tmp <- subset(dfin, x %in% cube_x &
y %in% cube_y &
z %in% cube_z)
stopifnot(nrow(tmp)==8)
# define points in a periodic and cubic lattice
x0 = min(cube_x); x1 = max(cube_x);
y0 = min(cube_y); y1 = max(cube_y);
z0 = min(cube_z); z1 = max(cube_z);
# define differences in each dimension
xd = (xin-x0)/(x1-x0); # 0.5
yd = (yin-y0)/(y1-y0); # 0.5
zd = (zin-z0)/(z1-z0); # 0.9166666
# interpolate along x:
v00 = get_value(tmp, x0, y0, z0)*(1-xd) + get_value(tmp,x1,y0,z0)*xd # 2.5
v01 = get_value(tmp, x0, y0, z1)*(1-xd) + get_value(tmp,x1,y0,z1)*xd # 0
v10 = get_value(tmp, x0, y1, z0)*(1-xd) + get_value(tmp,x1,y1,z0)*xd # 0
v11 = get_value(tmp, x0, y1, z1)*(1-xd) + get_value(tmp,x1,y1,z1)*xd # 65
# interpolate along y:
v0 = v00*(1-yd) + v10*yd # 1.25
v1 = v01*(1-yd) + v11*yd # 32.5
# interpolate along z:
return(v0*(1-zd) + v1*zd) # 29.89583 (~91.7% between v0 and v1)
}
> interp3(df, 15, 15, -12)
[1] 29.89583
Testing that same source's assertion that trilinear is simply linear(bilinear(), bilinear()), we can use the base R linear interpolation function, approx(), and the akima package's bilinear interpolation function, interp(), as follows:
library(akima)
approx(x=c(-11.857143,-13.571429),
y=c(interp(x=df[round(df$z,1)==-11.9,"x"], y=df[round(df$z,1)==-11.9,"y"], z=df[round(df$z,1)==-11.9,"v"], xo=15, yo=15)$z,
interp(x=df[round(df$z,1)==-13.6,"x"], y=df[round(df$z,1)==-13.6,"y"], z=df[round(df$z,1)==-13.6,"v"], xo=15, yo=15)$z),
xout=-12)$y
# [1] 0.2083331
Checked another package to triangulate:
library(oce)
Vmat <- array(data = V, dim = c(length(unique(X)), length(unique(Y)), length(unique(Z))))
approx3d(x=unique(X), y=unique(Y), z=unique(Z), f=Vmat, xout=15, yout=15, zout=-12)
[1] 1.666667
So 'oce', 'akima' and my function all give pretty different answers. This is either a mistake in my code somewhere, or due to differences in the underlying Fortran code in the akima interp(), and whatever is in the oce 'approx3d' function that we'll leave for another day.
Not sure what the correct answer is because the MWE is not exactly "minimum" or simple. But I tested the functions with some really simple grids and it seems to give 'correct' answers. Here's one simple 2x2x2 example:
# really, really simple example:
# answer is always the z-coordinate value
sdf <- expand.grid(x=seq(0,1),y=seq(0,1),z=seq(0,1))
sdf$v <- rep(seq(0,1), each=4)
> interp3(sdf,0.25,0.25,.99)
[1] 0.99
> interp3(sdf,0.25,0.25,.4)
[1] 0.4
Trying akima on the simple example, we get the same answer (phew!):
library(akima)
approx(x=unique(sdf$z),
y=c(interp(x=sdf[sdf$z==0,"x"], y=sdf[sdf$z==0,"y"], z=sdf[sdf$z==0,"v"], xo=.25, yo=.25)$z,
interp(x=sdf[sdf$z==1,"x"], y=sdf[sdf$z==1,"y"], z=sdf[sdf$z==1,"v"], xo=.25, yo=.25)$z),
xout=.4)$y
# [1] 0.4
The new example data in the OP's own, accepted answer was not possible to interpolate with my simple interp3() function above because:
(a) the grid coordinates are not regularly spaced, and
(b) the coordinates to lookup (x1, y1, z1) lie outside of the grid.
# for completeness, here's the attempt:
options(scipen = 999)
XCoor=c(78121.6235,78121.6235,78121.6235,78121.6235,78136.723,78136.723,78136.723,78136.8969,78136.8969,78136.8969,78137.4595,78137.4595,78137.4595,78125.061,78125.061,78125.061,78092.4696,78092.4696,78092.4696,78092.7683,78092.7683,78092.7683,78092.7683,78075.1171,78075.1171,78064.7462,78064.7462,78064.7462,78052.771,78052.771,78052.771,78032.1179,78032.1179,78032.1179)
YCoor=c(5213642.173,523642.173,523642.173,523642.173,523594.495,523594.495,523594.495,523547.475,523547.475,523547.475,523503.462,523503.462,523503.462,523426.33,523426.33,523426.33,523656.953,523656.953,523656.953,523607.157,523607.157,523607.157,523607.157,523514.671,523514.671,523656.81,523656.81,523656.81,523585.232,523585.232,523585.232,523657.091,523657.091,523657.091)
ZCoor=c(-3.0,-5.0,-10.0,-13.0,-3.5,-6.5,-10.5,-3.5,-6.5,-9.5,-3.5,-5.5,-10.5,-3.5,-5.5,-7.5,-3.5,-6.5,-11.5,-3.0,-5.0,-9.0,-12.0,-6.5,-10.5,-2.5,-3.5,-8.0,-3.5,-6.5,-9.5,-2.5,-6.5,-8.5)
V=c(2.4000,30.0,620.0,590.0,61.0,480.0,0.3700,0.0,0.3800,0.1600,0.1600,0.9000,0.4100,0.0,0.0,0.0061,6.0,52.0,0.3400,33.0,235.0,350.0,9300.0,31.0,2100.0,0.0,0.0,10.5000,3.8000,0.9000,310.0,0.2800,8.3000,18.0)
adf = data.frame(x=XCoor, y=YCoor, z=ZCoor, v=V)
# the first y value looks like a typo?
> head(adf)
x y z v
1 78121.62 5213642.2 -3.0 2.4
2 78121.62 523642.2 -5.0 30.0
3 78121.62 523642.2 -10.0 620.0
4 78121.62 523642.2 -13.0 590.0
5 78136.72 523594.5 -3.5 61.0
6 78136.72 523594.5 -6.5 480.0
x1=198130.000
y1=1913590.000
z1=-8
> interp3(adf, x1,y1,z1)
numeric(0)
Warning message:
In min(dvec[dvec > 0]) : no non-missing arguments to min; returning Inf
Whether the test data did or not make sense, I still needed an algorithm. Test data is just that, something to fiddle with and as a test data it was fine.
I wound up programming it in python and the following code takes XYZ V and does a 3D Inverse Distance Weighted (IDW) interpolation where you can set the number of points used in the interpolation. This python recipe only interpolates to one point (x1, y1, z1) but it is easy enough to extend.
import numpy as np
import math
#34 points
XCoor=np.array([78121.6235,78121.6235,78121.6235,78121.6235,78136.723,78136.723,78136.723,78136.8969,78136.8969,78136.8969,78137.4595,78137.4595,78137.4595,78125.061,78125.061,78125.061,78092.4696,78092.4696,78092.4696,78092.7683,78092.7683,78092.7683,78092.7683,78075.1171,78075.1171,78064.7462,78064.7462,78064.7462,78052.771,78052.771,78052.771,78032.1179,78032.1179,78032.1179])
YCoor=np.array([5213642.173,523642.173,523642.173,523642.173,523594.495,523594.495,523594.495,523547.475,523547.475,523547.475,523503.462,523503.462,523503.462,523426.33,523426.33,523426.33,523656.953,523656.953,523656.953,523607.157,523607.157,523607.157,523607.157,523514.671,523514.671,523656.81,523656.81,523656.81,523585.232,523585.232,523585.232,523657.091,523657.091,523657.091])
ZCoor=np.array([-3.0,-5.0,-10.0,-13.0,-3.5,-6.5,-10.5,-3.5,-6.5,-9.5,-3.5,-5.5,-10.5,-3.5,-5.5,-7.5,-3.5,-6.5,-11.5,-3.0,-5.0,-9.0,-12.0,-6.5,-10.5,-2.5,-3.5,-8.0,-3.5,-6.5,-9.5,-2.5,-6.5,-8.5])
V=np.array([2.4000,30.0,620.0,590.0,61.0,480.0,0.3700,0.0,0.3800,0.1600,0.1600,0.9000,0.4100,0.0,0.0,0.0061,6.0,52.0,0.3400,33.0,235.0,350.0,9300.0,31.0,2100.0,0.0,0.0,10.5000,3.8000,0.9000,310.0,0.2800,8.3000,18.0])
def Distance(x1,y1,z1, Npoints):
i=0
d=[]
while i < 33:
d.append(math.sqrt((x1-XCoor[i])*(x1-XCoor[i]) + (y1-YCoor[i])*(y1-YCoor[i]) + (z1-ZCoor[i])*(z1-ZCoor[i]) ))
i = i + 1
distance=np.array(d)
myIndex=distance.argsort()[:Npoints]
weightedNum=0
weightedDen=0
for i in myIndex:
weightedNum=weightedNum + (V[i]/(distance[i]*distance[i]))
weightedDen=weightedDen + (1/(distance[i]*distance[i]))
InterpValue=weightedNum/weightedDen
return InterpValue
x1=198130.000
y1=1913590.000
z1=-8
print(Distance(x1,y1,z1, 12))

How to obtain the density estimation of a specific value in stats::density?

Suppose I have data like the following:
val <- .65
set.seed(1)
distr <- replicate(1000, jitter(.5, amount = .2))
d <- density(distr)
Since stats::density uses a specific bw, it does not include all possible values in the interval (becuase they're infinite):
d$x[ d$x > .64 & d$x < .66 ]
[1] 0.6400439 0.6411318 0.6422197 0.6433076 0.6443955 0.6454834 0.6465713 0.6476592 0.6487471
[10] 0.6498350 0.6509229 0.6520108 0.6530987 0.6541866 0.6552745 0.6563624 0.6574503 0.6585382
[19] 0.6596261
I would like to find a way to provide val to the density function, so that it will return its d$y estimate (I will then use it to color areas of the density plot).
I can't guess how silly this question is, but I can't find a fast solution.
I thought of obtaining it by a linear interpolation of the d$y corresponding to the two values of d$x that are closer to val. Is there a faster way?
This illustrates the use of approxfun:
> Af <- approxfun(d$x, d$y)
> Af(val)
[1] 2.348879
> plot(d(
+
> plot(d)
> points(val,Af(val) )
> png();plot(d); points(val,Af(val) ); dev.off()

R: How to generate a sequence seq() given a condition?

I just discovered R and I am trying to work with it.
Here is what I am trying to achieve:
I have a vector of numbers, x, between 50 and 100 and with a size of 250 observations.
x = sample(seq(50, 100), 250, repeat = T)
Now, I want to generate another vector of numbers, y, between 0 and 100, which is the same size as vector x such that each element in y is less than or equal to its equivalent in x.
That is to say that if x[1] is 76, for example, the highest value y[1] could attain when generated is 76. But it could definitely be any other value below 76. In other words and more generally, I want vector y to be generated in such a way that y[i] <= x[i].
I hope I have made my request clearer.
Thank you very much!
y <- x -1 # ...........................
y <- sapply( x, function(x) runif(n=1, max=x))
y
[1] 7.2713788 30.0008063 42.5205775 0.9271717 10.7114456 39.5199145 7.4109775
[8] 28.3464373 28.5840101 34.0654033 15.0675028 50.2836294 45.9031794 13.5931005
[15] 43.2751738 17.0560824 3.1507491 25.7619129 12.3391448 22.6203684 51.3334810
[22] 37.0481703 33.4733277 37.1304850 26.7984406 66.3844126 40.2775918 47.6379024
[29] 16.2480595 66.8358384 33.3513161 60.2673874 65.6204462 45.6951960 1.5729434
[36] 20.4850357 0.1345737 84.5334203 19.7997451 53.8025623 48.5528486 8.8992123
[43] 90.9651742 28.3584167 41.7728159 46.4790641 17.8129578 83.1906415 37.5114353
[50] 89.5685501 85.2499600

Resources