How to save and load spline interpolation functions in R?

I need to create thousands and thousands of interpolation splines, each based on 5 pairs of (x, y) values. I would like to save them in a database (or csv file).
How can I export / import them, say in a text format or as an array of real parameters to rebuild each function when I need them?

If you are using the splinefun function from R base package stats, it is very easy to export its construction information.
set.seed(0)
xk <- c(0, 1, 2)
yk <- round(runif(3), 2)
f <- splinefun(xk, yk, "natural") ## natural cubic spline
construction_info <- environment(f)$z
str(construction_info)
# $ method: int 2
# $ n : int 3
# $ x : num [1:3] 0 1 2
# $ y : num [1:3] 0.9 0.27 0.37
# $ b : num [1:3] -0.812 -0.265 0.282
# $ c : num [1:3] 0 0.547 0
# $ d : num [1:3] 0.182 -0.182 0
The following illustrates what they mean and how we may re-construct the spline manually.
There are n = 3 points, (x[i], y[i]), hence two pieces.
info <- construction_info
## plot the interpolation spline in gray
curve(f(x, 0), from = info$x[1], to = info$x[info$n], lwd = 10, col = 8)
## highlight knots
points(info$x, info$y, pch = 19)
## piecewise re-construction: a cubic in (x - xi) on the piece that starts at knot xi
piece_cubic <- function (x, xi, yi, bi, ci, di) {
  yi + bi * (x - xi) + ci * (x - xi) ^ 2 + di * (x - xi) ^ 3
}
## loop through pieces; the knots are read from info$x rather than x,
## because inside curve() the name x refers to the plotting grid
for (i in 1:(info$n - 1)) {
  curve(piece_cubic(x, info$x[i], info$y[i], info$b[i], info$c[i], info$d[i]),
        from = info$x[i], to = info$x[i + 1], add = TRUE, col = i + 1)
}
We see that our manual re-construction is correct.
Exporting construction information allows us to move away from R and use it elsewhere.
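As a minimal sketch of the export / import round trip asked about in the question, the coefficients can be written to a CSV file, read back, and turned into a function again. The helper name spline_from_coefs and the temporary file are illustrative choices, not part of the stats package, and the rebuilt function is only meant to be used between the first and last knot.

```r
set.seed(0)
xk <- c(0, 1, 2)
yk <- round(runif(3), 2)
f <- splinefun(xk, yk, "natural")

## one row per knot: knot location, value, and cubic coefficients b, c, d
z <- environment(f)$z
coef_df <- data.frame(x = z$x, y = z$y, b = z$b, c = z$c, d = z$d)

## round trip through a CSV file (a temporary file is used here)
path <- tempfile(fileext = ".csv")
write.csv(coef_df, path, row.names = FALSE)
p <- read.csv(path)

## hypothetical helper: rebuild an evaluator from the stored coefficients;
## valid only inside the knot range [min(p$x), max(p$x)]
spline_from_coefs <- function(p) {
  function(x) {
    ## index of the piece that contains each x (clamped to 1 .. n - 1)
    i <- findInterval(x, p$x, rightmost.closed = TRUE, all.inside = TRUE)
    dx <- x - p$x[i]
    p$y[i] + p$b[i] * dx + p$c[i] * dx^2 + p$d[i] * dx^3
  }
}

g <- spline_from_coefs(p)
xs <- seq(0, 2, by = 0.25)
max_abs_diff <- max(abs(g(xs) - f(xs)))
print(max_abs_diff)
```

For thousands of splines, one option would be to add an identifier column to coef_df and stack all the tables into a single file or database table.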

Related

Solve a linear programming (LP) problem in R

The function takes in the coefficients of the objective function, the constraints matrix, the right-hand side values for the constraints, the direction of the constraints, and the type of the LP problem (minimization or maximization). It then uses the lpSolveAPI package to create an LP problem, set the problem type, decision variable types, and constraints, and then solves the LP problem. The function returns a list containing the optimal solution and the objective function value, which can be accessed by the user. The function is then called with specific inputs and the optimal solution and objective function value are printed
Everything seems correct, but I have some issues when it comes to the results.
Here is my function:
solve_lp <- function(objective_coefs, constraints_matrix, constraints_rhs, constraints_dir, problem_type) {
  # Load the lpSolveAPI package
  library(lpSolveAPI)
  # Set the number of rows (constraints) and columns (decision variables)
  nrow <- nrow(constraints_matrix)
  ncol <- ncol(constraints_matrix)
  # Create an LP problem with nrow constraints and ncol decision variables
  lprec <- make.lp(nrow = nrow, ncol = ncol)
  # Set the type of problem to minimize or maximize the objective function based on the problem_type argument
  lp.control(lprec, sense=problem_type)
  # Set the type of decision variables to integer
  set.type(lprec, 1:ncol, type=c("integer"))
  # Set the objective function coefficients
  set.objfn(lprec, objective_coefs)
  # Add the constraints to the LP problem
  for (i in 1:nrow) {
    add.constraint(lprec, constraints_matrix[i, ], constraints_dir[i], constraints_rhs[i])
  }
  # Solve the LP problem
  solve(lprec)
  # If the problem has a feasible solution, get the decision variables values and the value of the objective function
  solution <- get.variables(lprec)
  obj_value <- get.objective(lprec)
  # Return the optimal solution and objective function value
  return(list(solution = solution, obj_value = obj_value))
}
objective_coefs <- c(15, 3, -6)
constraints_matrix <- matrix(c(1, 1, 1,
2, -1, -2,
2, 3, -5), nrow=3, byrow=TRUE)
constraints_rhs <- c(36, 8, 10)
constraints_dir <- c("<=", ">=", "=")
problem_type <- "min"
# Solve the LP problem using the solve_lp function
result <- solve_lp(objective_coefs = objective_coefs, constraints_matrix = constraints_matrix, constraints_rhs = constraints_rhs, constraints_dir = constraints_dir, problem_type= problem_type)
# Extract the optimal solution and objective function value
optimal_solution <- result$solution
obj_value <- result$obj_value
# Print the results
print(paste("Optimal solution:", optimal_solution))
print(paste("Objective function value:", obj_value))
min z = 15x1 + 3x2 − 6x3
subject to
x1 + x2 + x3 ≤ 36
2x1 − x2 − 2x3 ≥ 8
2x1 + 3x2 − 5x3 = 10
x1, x2, x3 ≥ 0
The program returned this output:
"Optimal solution: 5" "Optimal solution: 0" "Optimal solution: 0"
I tested the program on a problem that we solved by hand in class, but we got a different result:
7/14 0.5 0
My question is: which solution is the right one?
It would be easier to use lpSolve. res$solution gives the solution. res$status of 0 means it succeeded.
library(lpSolve)
res <- lp("min", objective_coefs, constraints_matrix,
constraints_dir, constraints_rhs)
str(res)
giving:
List of 28
$ direction : int 0
$ x.count : int 3
$ objective : num [1:3] 15 3 -6
$ const.count : int 3
$ constraints : num [1:5, 1:3] 1 1 1 1 36 2 -1 -2 2 8 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:5] "" "" "" "const.dir.num" ...
.. ..$ : NULL
$ int.count : int 0
$ int.vec : int 0
$ bin.count : int 0
$ binary.vec : int 0
$ num.bin.solns : int 1
$ objval : num 65.2
$ solution : num [1:3] 4.25 0.5 0
$ presolve : int 0
$ compute.sens : int 0
$ sens.coef.from : num 0
$ sens.coef.to : num 0
$ duals : num 0
$ duals.from : num 0
$ duals.to : num 0
$ scale : int 196
$ use.dense : int 0
$ dense.col : int 0
$ dense.val : num 0
$ dense.const.nrow: int 0
$ dense.ctr : num 0
$ use.rw : int 0
$ tmp : chr "Nobody will ever look at this"
$ status : int 0
- attr(*, "class")= chr "lp"
Note
We used these inputs:
objective_coefs <- c(15, 3, -6)
constraints_matrix <- matrix(c(1, 1, 1,
2, -1, -2,
2, 3, -5), nrow=3, byrow=TRUE)
constraints_rhs <- c(36, 8, 10)
constraints_dir <- c("<=", ">=", "=")
problem_type <- "min"
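To see how the two reported answers relate, here is a small base-R check, offered as a sketch rather than a definitive diagnosis. It evaluates the constraints and the objective at (5, 0, 0), the output of solve_lp(), and at (4.25, 0.5, 0), the output of lp(). The helper name check_point is made up for this illustration.

```r
obj <- c(15, 3, -6)
A <- matrix(c(1,  1,  1,
              2, -1, -2,
              2,  3, -5), nrow = 3, byrow = TRUE)

## hypothetical helper: feasibility and objective value of a candidate point
check_point <- function(x, tol = 1e-9) {
  lhs <- as.vector(A %*% x)
  feasible <- lhs[1] <= 36 + tol &&    # x1 + x2 + x3 <= 36
    lhs[2] >= 8 - tol &&               # 2x1 - x2 - 2x3 >= 8
    abs(lhs[3] - 10) <= tol &&         # 2x1 + 3x2 - 5x3 = 10
    all(x >= -tol)                     # non-negativity
  list(feasible = feasible, objective = sum(obj * x))
}

int_sol  <- check_point(c(5, 0, 0))       # reported by solve_lp()
cont_sol <- check_point(c(4.25, 0.5, 0))  # reported by lpSolve::lp()
str(int_sol)
str(cont_sol)
```

Both points are feasible, but the second has the smaller objective value. This is consistent with solve_lp() solving an integer program, since it calls set.type(lprec, 1:ncol, type = "integer"), while lp() solves the continuous problem by default. Assuming the class exercise has no integrality requirement, removing the set.type() call should make the two approaches agree.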

Is there an R function to figure out which data type is the proper one here?

I want to run a loop over 26 matrices: 13 with numbers (e.g. 1,1,2,2,2,3) and the other 13 with letters simulating different parameters (e.g. U1, U2, U3, etc.). My problem comes when I try to run them in a loop, pairing the first matrix of each set so that they go into the function together. The error that appears is the following:
Errors were caught in checkModelList The model value for U is not
allowed. Check ?MARSS.form Error: Stopped in checkModelList() due to
specification problem(s).
I include the code and the data structure below:
str(Y)
num [1:43, 1:24] NA NA NA 0.158 -1.172 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:43] "WGR_D_l.s" "WGR_D_l.m" "WGR_D_l.l" "WGR_Sh_l.s" ...
..$ : NULL
str(Z1)
num [1:43, 1] 1 1 1 1 1 1 1 1 1 1 ...
str(U1)
chr [1, 1] "U1"
z = c("Z1_","Z2_",'Z3_', 'Z4_', 'Z5_', "Z6_","Z7_",'Z8_', 'Z9_', 'Z10_','Z11_', 'Z12_', 'Z13_')
u = c("U1_","U2_",'U3_', 'U4_', 'U5_', "U6_","U7_",'U8_', 'U9_', 'U10_','U11_', 'U12_', 'U13_')
Q = c("unconstrained", "diagonal and unequal", "diagonal and equal")
q = c('Qun1','Qdu1','Qde1')
for(g in 1:length(U)){
  model01$U = U[g]
  for(i in 1:length(Z)){
    model01$Z = Z[[i]]
    for(j in 1:length(Q)){
      model01$Q = Q[j]
      print(paste(q[j], sep=""))
      m1 = MARSS(Y, model=model01,
                 control=list(maxit = 5000,trace = -1, conv.test.slope.tol=100),
                 silent=2, method="kem")
      model.name.txt = paste("C:/Users/ubeda/OneDrive/Desktop/Resultados post-TFM/1. RUN_Nocovariates_2xU/TXT/",q[j],'.txt', sep='')
      capture.output(m1,file=model.name.txt)
      print(model.name.txt)
      model.name.rds = paste("C:/Users/ubeda/OneDrive/Desktop/Resultados post-TFM/1. RUN_Nocovariates_2xU/RDS/",q[j], '.rds', sep='')
      print(model.name.rds)
      saveRDS(m1,model.name.rds)
    }#j
  }#i
}#end g
However, if I run them one by one, the code runs smoothly. Running the code 13 times by hand would take forever, so I thought a loop was the best option. Here is the code in which the model runs without problems:
Q = c("unconstrained", "diagonal and unequal", "diagonal and equal")
q = c('Qun1','Qdu1','Qde1')
h = "catches raw"
for(j in 1:length(Q)){
  model01$Q = Q[j]
  print(paste(q[j], sep=""))
  m1 = MARSS(Y, model=model01,
             control=list(maxit = 5000,trace = -1, conv.test.slope.tol=100),
             silent=2, method="kem")
  model.name.txt = paste("C:/Users/ubeda/OneDrive/Desktop/Resultados post-TFM/RUN_NAO_catches_raw_2xU/TXT/",q[j],h[1],'.txt', sep='')
  capture.output(m1,file=model.name.txt)
  print(model.name.txt)
  model.name.rds = paste("C:/Users/ubeda/OneDrive/Desktop/Resultados post-TFM/RUN_NAO_catches_raw_2xU/RDS/",q[j],h[1], '.rds', sep='')
  print(model.name.rds)
  saveRDS(m1,model.name.rds)
}#j
Does anyone have any idea what my problem is?
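One thing that may be worth checking, offered only as a hedged sketch since the definitions of U and Z are not shown: if U and Z are lists of matrices, U[g] returns a one-element list rather than the matrix itself, whereas U[[g]] returns the matrix. The example below uses small stand-in matrices with made-up values, collects them with mget(), and pairs U1 with Z1, U2 with Z2, and so on. The MARSS() call itself is left as a comment, and the output file names are illustrative.

```r
## toy stand-ins for the real matrices (hypothetical values)
for (k in 1:3) {
  assign(paste0("U", k), matrix(paste0("U", k), 1, 1))
  assign(paste0("Z", k), matrix(k, 4, 1))
}

## collect the separately named objects into named lists
U_list <- mget(paste0("U", 1:3))
Z_list <- mget(paste0("Z", 1:3))

Q <- c("unconstrained", "diagonal and unequal", "diagonal and equal")
q <- c("Qun1", "Qdu1", "Qde1")

out_names <- character(0)
for (g in seq_along(U_list)) {
  for (j in seq_along(Q)) {
    ## [[g]] extracts the matrix; [g] would give a list of length one
    model01 <- list(U = U_list[[g]], Z = Z_list[[g]], Q = Q[j])
    ## m1 <- MARSS(Y, model = model01, ...) would be called here
    ## include the U name in the file name so runs do not overwrite each other
    out_names <- c(out_names, paste0(names(U_list)[g], "_", q[j], ".txt"))
  }
}
print(out_names)
```

Note that in the loop from the question the output files are named from q[j] alone, so each pass over g and i would overwrite the files written by the previous pass.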

Drawing a dendrogram in R using agglomerative hierarchical clustering (AHC) with the complete-link method

I have calculated the distance matrix with the complete-link method, as shown in the image below:
The pairwise distances between the clusters are
{0.5,1.12,1.5,3.61}
But when implementing the same matrix in R with the code below:
Matrix
x1,x2,x3,x4,x5
0,0.5,2.24,3.35,3
0.5,0,2.5,3.61,3.04
2.24,2.5,0,1.12,1.41
3.35,3.61,1.12,0,1.5
3,3.04,1.41,1.5,0
Implementation:
library(cluster)
dt<-read.csv("cluster.csv")
df<-scale(dt[-1])
dc<-dist(df,method = "euclidean")
hc1 <- hclust(dc, method = "complete" )
plot(hc1, labels = c("x1", "x2","x3","x4","x5"),
hang = 0.1,
main = "Cluster dendrogram", sub = NULL,
xlab = NULL, ylab = "Height")
abline(h = hc1$height, lty = 2, col = "lightgrey")
str(hc1)
List of 7
$ merge : int [1:4, 1:2] -1 -3 -5 1 -2 -4 2 3
$ height : num [1:4] 0.444 1.516 1.851 3.753
$ order : int [1:5] 1 2 5 3 4
$ labels : NULL
$ method : chr "complete"
$ call : language hclust(d = dc, method = "complete")
$ dist.method: chr "euclidean"
- attr(*, "class")= chr "hclust"
I got the heights 0.444 1.516 1.851 3.753,
which means the dendrogram will be different in the two cases. Why is that? Maybe I have done something wrong in one of the two approaches?
Since the provided matrix is already the Euclidean distance matrix, I don't need to calculate the distances again; rather, I should convert the data frame to a matrix and pass it through as.dist(m).
The code below gives exactly the result obtained from the paper calculation:
library(reshape)
dt<-read.csv("C:/Users/Aakash/Desktop/cluster.csv")
m <- as.matrix(dt)
hc1 <- hclust(as.dist(m), method = "complete" )
plot(hc1, labels = c("x1", "x2","x3","x4","x5"),
hang = 0.1,
main = "Complete Method Dendogram", sub = NULL,
xlab = "Items", ylab = "Height")
abline(h = hc1$height, lty = 2, col = "lightgrey")
str(hc1)
height : num [1:4] 0.5 1.12 1.5 3.61
Obtained dendrogram:
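For a check that does not depend on cluster.csv, the same distance matrix can be typed in directly. This is a minimal sketch using only base R. The point is that dist() treats each row as the coordinates of an observation, so calling it on a matrix that already holds distances (and scaling it first) produces different heights, whereas as.dist() reuses the entries as they are.

```r
## the distance matrix from the question, entered by hand
m <- matrix(c(0,    0.5,  2.24, 3.35, 3,
              0.5,  0,    2.5,  3.61, 3.04,
              2.24, 2.5,  0,    1.12, 1.41,
              3.35, 3.61, 1.12, 0,    1.5,
              3,    3.04, 1.41, 1.5,  0),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("x", 1:5), paste0("x", 1:5)))

## as.dist() keeps the entries as distances instead of recomputing them
hc <- hclust(as.dist(m), method = "complete")
print(hc$height)
```

The merge heights agree with the hand calculation: 0.5, 1.12, 1.5 and 3.61.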

Extracting the value of the local truncation error (LTE) at each integration step when using the ode() function from the deSolve R-package

The default numerical scheme used by the ode() function in the deSolve R package is the lsoda method, which switches automatically between the BDF (stiff) and implicit Adams (non-stiff) linear multistep schemes to solve the ODE system. The integration is carried out with a variable step size that is controlled by estimating the local truncation error (LTE) at each step. In practice, the step size is adapted so that the estimate of the LTE stays below a threshold set by the tolerances rtol and atol.
The default values in the ode method are:
ode(rtol = 1e-6, atol = 1e-6)
And can be adapted.
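To make the role of rtol and atol concrete, here is a small base-R sketch of the kind of acceptance test used by the ODEPACK family of solvers, as I understand it: each component of the local error estimate is scaled by a weight rtol * |y| + atol, and the step is accepted when the root-mean-square norm of the scaled errors is at most 1. The error estimate err_est below is made up for illustration; it is not something deSolve returns.

```r
rtol <- 1e-6
atol <- 1e-6

## current state (same values as the initial state used in this question)
y <- c(Prey = 1, Predator = 2)

## hypothetical local error estimate for one step (made-up numbers)
err_est <- c(Prey = 2e-7, Predator = -1.5e-6)

## error weights: mixed relative / absolute tolerance per component
ewt <- rtol * abs(y) + atol

## weighted root-mean-square norm of the scaled error
err_norm <- sqrt(mean((err_est / ewt)^2))

## the step would be accepted when the norm does not exceed 1
step_accepted <- err_norm <= 1
print(err_norm)
print(step_accepted)
```

Under this scheme the tolerance is applied to a scaled norm over all state variables, not to each raw error separately.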
However, the actual evaluated LTE is always something above these values. Is there a way to extract this value from the solver?
An example system is given below:
rm(list = ls())
install.packages('deSolve')
library('deSolve')
# Example ODE system for the Lotka-Volterra predator-prey model
LVmod <- function(Time, State, Pars) {
  with(as.list(c(State, Pars)), {
    Ingestion <- rIng * Prey * Predator
    GrowthPrey <- rGrow * Prey * (1 - Prey/K)
    MortPredator <- rMort * Predator
    dPrey <- GrowthPrey - Ingestion
    dPredator <- Ingestion * assEff - MortPredator
    return(list(c(dPrey, dPredator)))
  })
}
# values of the parameters
pars <- c(rIng = 0.2, # /day, rate of ingestion
rGrow = 1.0, # /day, growth rate of prey
rMort = 0.2 , # /day, mortality rate of predator
assEff = 0.5, # -, assimilation efficiency
K = 10) # mmol/m3, carrying capacity
#initial values for the predator prey state variables
yini <- c(Prey = 1, Predator = 2)
#times at which the ode return state output
times <- seq(0, 200, by = 1)
# the solver using the default method (reported as "lsoda" in the output below)
out <- ode(yini, times, LVmod, pars)
summary(out)
str(out)
deSolve [1:201, 1:3] 0 1 2 3 4 5 6 7 8 9 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "time" "Prey" "Predator"
- attr(*, "istate")= int [1:21] 2 282 517 NA 1 1 0 52 22 NA ...
- attr(*, "rstate")= num [1:5] 1 1 201 0 143
- attr(*, "lengthvar")= int 2
- attr(*, "type")= chr "lsoda"

Plot from package "lomb" in ggplot2

I am using the package "lomb" to calculate Lomb-Scargle Periodograms, a method for analysing biological time series data. The package does create a plot if you tell it to do so. However, the plots are not too nice (compared to ggplot2 plots). Therefore, I would like to plot the results with ggplot. However, I do not know how to access the function for the curve plotted...
This is a sample code for a plot:
TempDiff <- runif(4033, 3.0, 18) # just generate random numbers
Time2 <- seq(1,4033) # Time vector
Rand.LombScargle <- randlsp(repeats=10, TempDiff, times = Time2, from = 12, to = 36,
type = c("period"), ofac = 10, alpha = 0.01, plot = T,
trace = T, xlab="period", main = "Lomb-Scargle Periodogram")
I have also tried to find out something about the function looking into the function randlsp itself, but could not really find anything that seemed useful to me there...
getAnywhere(randlsp)
A single object matching ‘randlsp’ was found
It was found in the following places
package:lomb
namespace:lomb
with value
function (repeats = 1000, x, times = NULL, from = NULL, to = NULL,
type = c("frequency", "period"), ofac = 1, alpha = 0.01,
plot = TRUE, trace = TRUE, ...)
{
if (is.ts(x)) {
x = as.vector(x)
}
if (!is.vector(x)) {
times <- x[, 1]
x <- x[, 2]
}
if (plot == TRUE) {
op <- par(mfrow = c(2, 1))
}
realres <- lsp(x, times, from, to, type, ofac, alpha, plot = plot,
...)
realpeak <- realres$peak
pks <- NULL
if (trace == TRUE)
cat("Repeats: ")
for (i in 1:repeats) {
randx <- sample(x, length(x))
randres <- lsp(randx, times, from, to, type, ofac, alpha,
plot = F)
pks <- c(pks, randres$peak)
if (trace == TRUE) {
if (i/10 == floor(i/10))
cat(i, " ")
}
}
if (trace == TRUE)
cat("\n")
prop <- length(which(pks >= realpeak))
p.value <- prop/repeats
if (plot == TRUE) {
mx = max(c(pks, realpeak)) * 1.25
hist(pks, xlab = "Peak Amplitude", xlim = c(0, mx), main = paste("P-value: ",
p.value))
abline(v = realpeak)
par(op)
}
res = realres[-(8:9)]
res = res[-length(res)]
res$random.peaks = pks
res$repeats = repeats
res$p.value = p.value
class(res) = "randlsp"
return(invisible(res))
}
Any idea will be appreciated!
Best,
Christine
PS: Here an example of the plot with real data.
The key to getting ggplot graphs out of any returned object is to convert the data that you need into some sort of data.frame. To do this, look at what kind of object the returned value is and see what data you can extract directly into a data.frame:
str(Rand.LombScargle) # get the data type and structure of the returned value
List of 12
$ scanned : num [1:2241] 12 12 12 12 12 ...
$ power : num [1:2241] 0.759 0.645 0.498 0.341 0.198 ...
$ data : chr [1:2] "times" "x"
$ n : int 4033
$ type : chr "period"
$ ofac : num 10
$ n.out : int 2241
$ peak : num 7.25
$ peak.at : num [1:2] 24.6908 0.0405
$ random.peaks: num [1:10] 4.99 9.82 7.03 7.41 5.91 ...
$ repeats : num 10
$ p.value : num 0.3
- attr(*, "class")= chr "randlsp"
In the case of randlsp, it's a list, which is what statistical functions usually return. Most of this information can also be obtained from ?randlsp.
It looks as if Rand.LombScargle$scanned and Rand.LombScargle$power contain most of what is needed for the first graph.
There is also a horizontal line on the periodogram, but it doesn't correspond to anything returned by randlsp. Looking at the source code that you provided, the periodogram is actually generated by lsp().
LombScargle <- lsp( TempDiff, times = Time2, from = 12, to = 36,
type = c("period"), ofac = 10, alpha = 0.01, plot = F)
str(LombScargle)
List of 12
$ scanned : num [1:2241] 12 12 12 12 12 ...
$ power : num [1:2241] 0.759 0.645 0.498 0.341 0.198 ...
$ data : chr [1:2] "Time2" "TempDiff"
$ n : int 4033
$ type : chr "period"
$ ofac : num 10
$ n.out : int 2241
$ alpha : num 0.01
$ sig.level: num 10.7
$ peak : num 7.25
$ peak.at : num [1:2] 24.6908 0.0405
$ p.value : num 0.274
- attr(*, "class")= chr "lsp"
Based on this data, I am guessing that the line indicates the significance level, LombScargle$sig.level.
Putting this together, we can create our data to pass to ggplot from lsp:
library(ggplot2)
lomb.df <- data.frame(period=LombScargle$scanned, power=LombScargle$power)
# use the data frame to set up the line plot
g <- ggplot(lomb.df, aes(period, power)) + geom_line() +
labs(y="normalised power", title="Lomb-Scargle Periodogram")
# add the sig.level horizontal line
g + geom_hline(yintercept=LombScargle$sig.level, linetype="dashed")
For the histogram, it looks like this is based on the vector Rand.LombScargle$random.peaks from randlsp:
rpeaks.df <- data.frame(peaks=Rand.LombScargle$random.peaks)
ggplot(rpeaks.df, aes(peaks)) +
geom_histogram(binwidth=1, fill="white", colour="black") +
geom_vline(xintercept=Rand.LombScargle$peak, linetype="dashed") +
xlim(c(0,12)) +
labs(title=paste0("P-value: ", Rand.LombScargle$p.value),
x="Peak Amplitude",
y="Frequency")
Play around with these graphs to get them looking to your taste.

Resources