My variogram code result different from variog() result - r

I am writing code for producing a variogram. For validating my result, I checked with geoR::variog() but both variograms are different.
I tried to understand the code of variog() to see what happens under the hood but there are so many things happening that I can't seem to understand it. I, in my code, am using the parameters X-coordinate, Y-coordiante, data value, number of lags, minimum lag value, lag interval, azimuth (angle in degrees; 90 corresponds to vertical direction), angle tolerance (in degrees) and maximum bandwidth.
variogram = function(xcor, ycor, data, nlag, minlag, laginv, azm, atol, maxbandw){
dl <- length(data)
lowangle <- azm - atol
upangle <- azm + atol
gamlag <- integer(nlag)
n <- integer(nlag)
dist <- pairdist(xcor, ycor)
maxd <- max(dist)
llag <- seq(minlag, minlag + (nlag-1) * laginv, by = laginv)
hlag <- llag + laginv
for(i in 1:dl){
for(j in i:dl){
if(i != j){
if(xcor[j]- xcor[i] == 0)
theta <- 90
else
theta <- 180/pi * atan((ycor[j] - ycor[i])/(xcor[j] - xcor[i]))
for(k in 1:nlag){
d <- dist[j, i]
b <- abs(d * sin(theta - azm))
if((llag[k] <= d & d < hlag[k]) & (lowangle <= theta & theta < upangle) & (b <= maxbandw)){
gamlag[k] <- gamlag[k] + (data[i] - data[j])^2;
n[k] <- n[k] + 1
}
}
}
}
}
gamlag <- ifelse(n == 0, NA, gamlag/(2*n))
tmp <- data.frame("lag" = llag, "gamma" = gamlag)
return(tmp)
}
function call for the above code
ideal_variogram_2 <- variogram(data3[,1], data3[,2], data3[,3], 18, 0, 0.025, 90, 45, 1000000)
ideal_variogram_2 <- na.omit(ideal_variogram_2)
plot(ideal_variogram_2$lag, ideal_variogram_2$gamma, main = "Using my code")
function call for variog()
geodata1 <- as.geodata(data3, coords.col = 1:2, data.col = 3)
ideal_variogram_1 <- variog(geodata1, coords = geodata1$coords, data = geodata1$data, option = "bin", uvec = seq(0, 0.45, by = 0.025), direction = pi/2, tolerance = pi/4)
df <- data.frame(u = ideal_variogram_1$u, v = ideal_variogram_1$v)
plot(df$u, df$v, main = "Using variog()")
The 2 variograms that I got are at the following link:
Variogram

Related

Solving for an input value of an R function

In my attached R function, I was wondering how to solve for mdes (suppose it is unknown) which is currently one of the input values IF everything else is known?
Is it also possible to solve for mdes and power (both currently input values) IF everything else is known?
foo <- function(A = 200, As = 15, B = 100,Bs = 10,iccmax = 0.15,mdes = .25,SD = 1.2,power = 80)
{
tail <- 2
alpha <- 5
inv_d <- function(mdes) {
c(mean_dif = 1, Vmax = 2/mdes^2)
}
SDr <- 1/SD
pars <- inv_d(mdes)
mean_dif <- pars[[1]]
Vmax <- pars[[2]]
zbeta <- qnorm((power/100))
zalpha <- qnorm(1-(alpha/(100*tail)))
maxvarmean_difhat <- (mean_dif / (zbeta + zalpha))**2
ntreat <- sqrt((A/As)*((1-iccmax)/iccmax))
ncont <- sqrt((B/Bs)*((1-iccmax)/iccmax))
costpertreatcluster <- A + (As*ntreat)
costperconcluster <- B + (Bs*ncont)
gtreat <- (sqrt(A*iccmax) + sqrt(As*(1-iccmax)))**2
gcon <- (sqrt(B*iccmax) + sqrt(Bs*(1-iccmax)))**2
pratio <- sqrt(gtreat/gcon)
budgetratio <- 99999
budgetratio <- ifelse( ((pratio <= SD) & (pratio >= SDr)), pratio**2, ifelse((pratio > SD), pratio*SD, pratio*SDr))
fraction <- budgetratio/(1 + budgetratio)
mmvnumer <- 99999
mmvnumer <- ifelse( ((pratio <= SD) & (pratio >= SDr)),
gcon*Vmax*(1+(pratio**2)),
ifelse((pratio > SD),
gcon*Vmax*(((pratio*SD)+1)**2/((SD**2)+1)),
gcon*Vmax*(((pratio*SDr)+1)**2/((SDr**2) + 1))) )
budget <- mmvnumer/maxvarmean_difhat
treatbudget <- fraction*budget
conbudget <- (1-fraction)*budget
ktreat <- treatbudget/costpertreatcluster
kcont <- conbudget/costperconcluster
ktreatrup <- ceiling(ktreat)
kcontrup <- ceiling(kcont)
ktreatplus <- ifelse(pmin(ktreatrup,kcontrup) < 8, ktreatrup + 3, ktreatrup + 2)
kcontplus <- ifelse(pmin(ktreatrup,kcontrup) < 8, kcontrup + 3, kcontrup + 2)
budgetplus <- (ktreatplus*costpertreatcluster) + (kcontplus*costperconcluster)
return(c(ncont = ncont, kcont = kcontplus,
ntreat = ntreat, ktreat = ktreatplus, budget = budgetplus))
}
#--------------------------------------------------------------------------------
# EXAMPLE OF USE:
foo()
ncont kcont ntreat ktreat budget
7.527727 73.000000 8.692270 62.000000 33279.051347
Define a function of one variable as
p0 = foo()
fn1 = function(x) sum((foo(mdes=x) - p0)^2)
and find a minimum that should be 0, and which corresponds to your mdes = 0.25 input!
optimize(fn1, c(0.0, 1.0))
## $minimum
## [1] 0.2497695
## $objective
## [1] 0
For two variables, this is more difficult, as the function has many local minima and is ill-defined outside certain regions. Applying optim() you will need well-chosen starting points.

Why is my Monte Carlo Integration wrong by a factor of 2?

I am trying to integrate the following function using a Monte Carlo Integration. The interval I want to integrate is x <- seq(0, 1, by = 0.01) and y <- seq(0, 1, by = 0.01).
my.f <- function(x, y){
result = x^2 + sin(x) + exp(cos(y))
return(result)
}
I calculated the integral using the cubature package.
library(cubature)
library(plotly)
# Rewriting the function, so it can be integrated
cub.function <- function(x){
result = x[1]^2 + sin(x[1]) + exp(cos(x[2]))
return(result)
}
cub.integral <- adaptIntegrate(f = cub.function, lowerLimit = c(0,0), upperLimit = c(1,1))
The result is 3.134606. But when I use my Monte Carlo Integration Code, see below, my result is about 1.396652. My code is wrong by more than a factor of 2!
What I did:
Since I need a volume to conduct a Monte Carlo Integration, I calculated the function values on the mentioned interval. This will give me an estimation of the maximum and minimum of the function.
# My data range
x <- seq(0, 1, by = 0.01)
y <- seq(0, 1, by = 0.01)
# The matrix, where I save the results
my.f.values <- matrix(0, nrow = length(x), ncol = length(y))
# Calculation of the function values
for(i in 1:length(x)){
for(j in 1:length(y)){
my.f.values[i,j] <- my.f(x = x[i], y = y[j])
}
}
# The maximum and minimum of the function values
max(my.f.values)
min(my.f.values)
# Plotting the surface, but this is not necessary
plot_ly(y = x, x = y, z = my.f.values) %>% add_surface()
So, the volume that we need is simply the maximum of the function values, since 1 * 1 * 4.559753 is simply 4.559753.
# Now, the Monte Carlo Integration
# I found the code online and modified it a bit.
monte = function(x){
tests = rep(0,x)
hits = 0
for(i in 1:x){
y = c(runif(2, min = 0, max = 1), # y[1] is y; y[2] is y
runif(1, min = 0, max = max(my.f.values))) # y[3] is z
if(y[3] < y[1]**2+sin(y[1])*exp(cos(y[2]))){
hits = hits + 1
}
prop = hits / i
est = prop * max(my.f.values)
tests[i] = est
}
return(tests)
}
size = 10000
res = monte(size)
plot(res, type = "l")
lines(x = 1:size, y = rep(cub.integral$integral, size), col = "red")
So, the result is completely wrong. But if I change the function a bit, suddenly is works.
monte = function(x){
tests = rep(0,x)
hits = 0
for(i in 1:x){
x = runif(1)
y = runif(1)
z = runif(1, min = 0, max = max(my.f.values))
if(z < my.f(x = x, y = y)){
hits = hits + 1
}
prop = hits / i
est = prop * max(my.f.values)
tests[i] = est
}
return(tests)
}
size = 10000
res = monte(size)
plot(res, type = "l")
lines(x = 1:size, y = rep(cub.integral$integral, size), col = "red")
Can somebody explain why the result suddenly changes? To me, both functions seem to do the exact same thing.
In your (first) code for monte, this line is in error:
y[3] < y[1]**2+sin(y[1])*exp(cos(y[2]))
Given your definition of my.f, it should surely be
y[3] < y[1]**2 + sin(y[1]) + exp(cos(y[2]))
Or..., given that you shouldn't be repeating yourself unnecessarily:
y[3] < my.f(y[1], y[2])

Non-linear fitting with nls() is giving me singular gradient matrix at initial parameter estimates. Why?

This is my first attempt at fitting a non-linear model in R, so please bear with me.
Problem
I am trying to understand why nls() is giving me this error:
Error in nlsModel(formula, mf, start, wts): singular gradient matrix at initial parameter estimates
Hypotheses
From what I've read from other questions here at SO it could either be because:
my model is discontinuous, or
my model is over-determined, or
bad choice of starting parameter values
So I am calling for help on how to overcome this error. Can I change the model and still use nls(), or do I need to use nls.lm from the minpack.lm package, as I have read elsewhere?
My approach
Here are some details about the model:
the model is a discontinuous function, a kind of staircase type of function (see plot below)
in general, the number of steps in the model can be variable yet they are fixed for a specific fitting event
MWE that shows the problem
Brief explanation of the MWE code
step_fn(x, min = 0, max = 1): function that returns 1 within the interval (min, max] and 0 otherwise; sorry about the name, I realize now it is not really a step function... interval_fn() would be more appropriate I guess.
staircase(x, dx, dy): a summation of step_fn() functions. dx is a vector of widths for the steps, i.e. max - min, and dy is the increment in y for each step.
staircase_formula(n = 1L): generates a formula object that represents the model modeled by the function staircase() (to be used with the nls() function).
please do note that I use the purrr and glue packages in the example below.
Code
step_fn <- function(x, min = 0, max = 1) {
y <- x
y[x > min & x <= max] <- 1
y[x <= min] <- 0
y[x > max] <- 0
return(y)
}
staircase <- function(x, dx, dy) {
max <- cumsum(dx)
min <- c(0, max[1:(length(dx)-1)])
step <- cumsum(dy)
purrr::reduce(purrr::pmap(list(min, max, step), ~ ..3 * step_fn(x, min = ..1, max = ..2)), `+`)
}
staircase_formula <- function(n = 1L) {
i <- seq_len(n)
dx <- sprintf("dx%d", i)
min <-
c('0', purrr::accumulate(dx[-n], .f = ~ paste(.x, .y, sep = " + ")))
max <- purrr::accumulate(dx, .f = ~ paste(.x, .y, sep = " + "))
lhs <- "y"
rhs <-
paste(glue::glue('dy{i} * step_fn(x, min = {min}, max = {max})'),
collapse = " + ")
sc_form <- as.formula(glue::glue("{lhs} ~ {rhs}"))
return(sc_form)
}
x <- seq(0, 10, by = 0.01)
y <- staircase(x, c(1,2,2,5), c(2,5,2,1)) + rnorm(length(x), mean = 0, sd = 0.2)
plot(x = x, y = y)
lines(x = x, y = staircase(x, dx = c(1,2,2,5), dy = c(2,5,2,1)), col="red")
my_data <- data.frame(x = x, y = y)
my_model <- staircase_formula(4)
params <- list(dx1 = 1, dx2 = 2, dx3 = 2, dx4 = 5,
dy1 = 2, dy2 = 5, dy3 = 2, dy4 = 1)
m <- nls(formula = my_model, start = params, data = my_data)
#> Error in nlsModel(formula, mf, start, wts): singular gradient matrix at initial parameter estimates
Any help is greatly appreciated.
I assume you are given a vector of observations of length len as the ones plotted in your example, and you wish to identify k jumps and k jump sizes. (Or maybe I misunderstood you; but you have not really said what you want to achieve.)
Below I will sketch a solution using Local Search. I start with your example data:
x <- seq(0, 10, by = 0.01)
y <- staircase(x,
c(1,2,2,5),
c(2,5,2,1)) + rnorm(length(x), mean = 0, sd = 0.2)
A solution is a list of positions and sizes of the jumps. Note that I use vectors to store these data, as it will become cumbersome to define variables when you have 20 jumps, say.
An example (random) solution:
k <- 5 ## number of jumps
len <- length(x)
sol <- list(position = sample(len, size = k),
size = runif(k))
## $position
## [1] 89 236 859 885 730
##
## $size
## [1] 0.2377453 0.2108495 0.3404345 0.4626004 0.6944078
We need an objective function to compute the quality of the solution. I also define a simple helper function stairs, which is used by the objective function.
The objective function abs_diff computes the average absolute difference between the fitted series (as defined by the solution) and y.
stairs <- function(len, position, size) {
ans <- numeric(len)
ans[position] <- size
cumsum(ans)
}
abs_diff <- function(sol, y, stairs, ...) {
yy <- stairs(length(y), sol$position, sol$size)
sum(abs(y - yy))/length(y)
}
Now comes the key component for a Local Search: the neighbourhood function that is used to evolve the solution. The neighbourhood function takes a solution and changes it slightly. Here, it will either pick a position or a size and modify it slightly.
neighbour <- function(sol, len, ...) {
p <- sol$position
s <- sol$size
if (runif(1) > 0.5) {
## either move one of the positions ...
i <- sample.int(length(p), size = 1)
p[i] <- p[i] + sample(-25:25, size = 1)
p[i] <- min(max(1, p[i]), len)
} else {
## ... or change a jump size
i <- sample.int(length(s), size = 1)
s[i] <- s[i] + runif(1, min = -s[i], max = 1)
}
list(position = p, size = s)
}
An example call: here the new solution has its first jump size changed.
## > sol
## $position
## [1] 89 236 859 885 730
##
## $size
## [1] 0.2377453 0.2108495 0.3404345 0.4626004 0.6944078
##
## > neighbour(sol, len)
## $position
## [1] 89 236 859 885 730
##
## $size
## [1] 0.2127044 0.2108495 0.3404345 0.4626004 0.6944078
I remains to run the Local Search.
library("NMOF")
sol.ls <- LSopt(abs_diff,
list(x0 = sol, nI = 50000, neighbour = neighbour),
stairs = stairs,
len = len,
y = y)
We can plot the solution: the fitted line is shown in blue.
plot(x, y)
lines(x, stairs(len, sol.ls$xbest$position, sol.ls$xbest$size),
col = "blue", type = "S")
Try DE instead:
library(NMOF)
yf= function(params,x){
dx1 = params[1]; dx2 = params[2]; dx3 = params[3]; dx4 = params[4];
dy1 = params[5]; dy2 = params[6]; dy3 = params[7]; dy4 = params[8]
dy1 * step_fn(x, min = 0, max = dx1) + dy2 * step_fn(x, min = dx1,
max = dx1 + dx2) + dy3 * step_fn(x, min = dx1 + dx2, max = dx1 +
dx2 + dx3) + dy4 * step_fn(x, min = dx1 + dx2 + dx3, max = dx1 +
dx2 + dx3 + dx4)
}
algo1 <- list(printBar = FALSE,
nP = 200L,
nG = 1000L,
F = 0.50,
CR = 0.99,
min = c(0,1,1,4,1,4,1,0),
max = c(2,3,3,6,3,6,3,2))
OF2 <- function(Param, data) { #Param=paramsj data=data2
x <- data$x
y <- data$y
ye <- data$model(Param,x)
aux <- y - ye; aux <- sum(aux^2)
if (is.na(aux)) aux <- 1e10
aux
}
data5 <- list(x = x, y = y, model = yf, ww = 1)
system.time(sol5 <- DEopt(OF = OF2, algo = algo1, data = data5))
sol5$xbest
OF2(sol5$xbest,data5)
plot(x,y)
lines(data5$x,data5$model(sol5$xbest, data5$x),col=7,lwd=2)
#> sol5$xbest
#[1] 1.106396 12.719182 -9.574088 18.017527 3.366852 8.721374 -19.879474 1.090023
#> OF2(sol5$xbest,data5)
#[1] 1000.424

Sequence labeller

I am analysing a time series signal. I set a threshold to separate the noise from the baseline noise. In order to identify the properties of each signal sequence (duration, amplitude, maximum signal...), I built a function to aggregate all the signal points that are continuous as different "peaks". Despite this function does what I want, I was wondering if anyone can help me to make it more efficient -e. g. vectorization, because I aim to run the function on a data.table of more than 1M rows. Here is a sample data with the function:
# Generate dummy data
x <- sin(seq(from = 0, to = 20, length.out = 200)) + rnorm(200, 0,0.1)
x <- zoo(x)
plot(x)
# Label each point as signal (== )1) or noise (0)
y <- ifelse(x > 0.5, 1, 0)
# Function to label each peak
peak_labeler <- function(x) {
tmp <- NULL
for (i in seq_along(x)) {
if (x[i] == 0) { tmp[i] <- 0 } # If baseline, mark as 0
if (x[i] == 1) {
# If x[n] belongs to a peak
if (i == 1) {tmp[i] <- 1} # Label as 1 at t0
else{
if (!exists("Peak")) {Peak <- 0}
if (x[i - 1] == 0) {
# if previous point is no peak, add as peak
Peak <- Peak + 1
tmp[i] <- Peak
}
if (x[i - 1] == 1) {
tmp[i] <- Peak
}
}
}
}
return(tmp)
rm(tmp, Peak, i) # Garbage collection
}
# Label peaks
dummy <- data.frame(t = 1:200, x,y,tmp = peak_labeler(y))
# Show data
ggplot(dummy, aes(x = t, y = x)) +
geom_point(aes(col = as.factor(tmp), group = 1))
Here's an approach using dplyr.
The test in the cross_threshold line works by evaluating whether y is on a different side of 0.5 than the prior y. If so, the sign of the two terms y - threshold and lag(y) - threshold will be different, leading to a TRUE, which is multiplied by 1 to become 1. If they're on the same side of 0.5, you'll get a FALSE and a 0. The default = 0 part deals with the first line, where lag(y) is undefined. Then we add up how many cumulative crosses there have been to define the tmp group.
library(dplyr)
threshold = 0.5
dummy <- data.frame(t = 1:200, x, y) %>%
mutate(cross_threshold = 1 * (sign(y - threshold) != sign(lag(y, default = 0) - threshold)),
# Line above now optional, just if we want to label all crossings
up = 1 * ((y > threshold) & (lag(y) < threshold)),
tmp = if_else(y > threshold, cumsum(up), 0))
ggplot(dummy, aes(x = t, y = x)) +
geom_point(aes(col = as.factor(tmp), group = 1)) +
geom_point(data = filter(dummy, cross_threshold == 1), shape = 21, size = 5)

Loop inside another loop in R

I have a problem with results of loop in loop function. It counts inside loop only once and choose the best solution for the first raw and then stop.
I would like to remember the best solution for every row of the matrix zmienne. What am I doing wrong?
schaffer <- function(xx)
{x1 <- xx[1]
x2 <- xx[2]
fact1 <- (sin(x1^2-x2^2))^2 - 0.5
fact2 <- (1 + 0.001*(x1^2+x2^2))^2
y <- 0.5 + fact1/fact2
return(y)
}
gradient_descent <- function(func, step, niter) {
N <- 3 #N- number of random points
zmienne <- matrix(runif(N*2, min = -100, max = 100), N, 2)
print(zmienne)
h = 0.001;
iter_count = 0;
for (i in 1:N) {
x_0 <- zmienne[i,]
x_n = x_0;
for (j in 1:niter) {
func_grad = (func(x_n+h) - func(x_n))/h;
if (abs(func_grad) < 0.0001) { break; }
x_n = x_n - step * func_grad;
print(x_n)
iter_count = iter_count + 1
}
}
return(list(iterations = niter, best_value = func_grad, best_state = x_n, x0=x_0))
}
solution_m1 <- gradient_descent(schaffer, 0.1, 20)
solution_m1
I think this is what you want:
gradient_descent <- function(func, step, niter) {
N <- 3 #N- number of random points
zmienne <- matrix(runif(N*2, min = -100, max = 100), N, 2)
print(zmienne)
h = 0.001;
iter_count = 0;
best.vals <- NULL
for (i in 1:N) {
x_0 <- zmienne[i,]
x_n = x_0;
for (j in 1:niter) {
func_grad = (func(x_n+h) - func(x_n))/h;
if (abs(func_grad) < 0.0001) { break; }
x_n = x_n - step * func_grad;
print(x_n)
iter_count = iter_count + 1
}
best.vals <- c(best.vals, func_grad)
}
return(list(iterations = iter_count, best_value = best.vals, best_state = x_n, x0=x_0))
}
solution_m1 <- gradient_descent(schaffer, 0.1, 20)
solution_m1
The return should not be inside the inside loop but at then end of the function.

Resources