Questions about using nls.lm in the minpack.lm package

I am trying to use the minpack.lm package in R, specifically the nls.lm function. I'm poring through the manual and help files, but the requirements for setting it up are a little beyond my current capabilities. Any guidance is greatly appreciated. My code and the error message I'm getting are below.
R Code:
# Thomas P. Taggart
# ERE445/645
# Spring 2013 - Calibration Presentation
# Lumped parameter rainfall-runoff model for the Susquehanna River at Conklin, NY.
# Outlined in Haith's (1987) GWLF model. The model uses the SCS curve
# number runoff technique to determine runoff, with snowpack, unsaturated zone, and
# saturated zone mass balances. Evapotranspiration is to be determined using Hamon’s
# method with average monthly values for daylight hours.
# In this model we assume the following constants, which are determined through calibration:
# Baseflow Recession Coefficient, Kb
# Field capacity, FCAP
# Curve number for average moisture conditions, CN2
# Initial antecedent moisture conditions, iAMC
# Initial snow accumulation, iSNt
# Initial saturated zone storage, iSATt
# No deep groundwater seepage
# including needed functions
source("Functions.R")
source("distributionFunctions.R")
source("GWLF_Model.R")
require(ggplot2)
require(reshape)
library(minpack.lm)
library(scales)
###############################################################################################
# USGS Discharge data for Conklin, NY - Gage on the Susquehanna
# Reading in the input file
dischargeInput <- read.csv("USGS_DailyDischarge_ConklinNY_01503000_A.csv", header=TRUE)
###############################################################################################
# Weather Data
# Read in input file
weatherInput = read.csv("Conklin_NY_WeatherData_Edit.csv")
###############################################################################################
# Setting up the model inputs - initial run
# Baseflow Recession, Kb
Kb <- 0.90
# Initial unsaturated storage is at field capacity, FCAP (cm)
FCAP <- 10
# Curve number for average moisture conditions, CN
CN <- 65.7
# Initial antecedent moisture conditions, AMC
AMC <- 1.5
# Initial saturated zone storage, SATt
iSATt <- 0.45
# Snowmelt constant, K
K <- 0.45
parameters <- c(Kb, FCAP,CN, AMC, iSATt, K)
# Calling the Model - 1st time to see the initial outputs
# GWLF(parameters, dischargeInput, weatherInput)
###############################################################################################
# Calibrating the model
guess <- c("Kb"=0.1, "FCAP"=1,"CN"=50, "AMC"=0, "iSATt"=0, "K"=0.5)
out <- nls.lm(par = guess, fn = GWLF(parameters, dischargeInput, weatherInput))
Here is the error message:
Error in function (par) : could not find function "fn"
How do I need to set up par, or the first argument of the function I'm calling within nls.lm?
The GWLF function is passed 6 parameters that are used as constants inside the function. These are the 6 parameters I hope to calibrate.
Thanks,
Tom

From reading ?nls.lm: you need to pass the function itself, not a call to the function.
out <- nls.lm(par = guess, fn = GWLF, dischargeInput, weatherInput)
Note that the extra arguments (which I assume are the data) are passed on to fn through the ... argument.
It would be safer to name these arguments using whatever argument names they have within GWLF.
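Note also that nls.lm expects fn to return a vector of residuals and passes par as its first argument. If GWLF returns simulated discharge rather than residuals, a thin wrapper along these lines would be needed (a sketch only: the column name Discharge and the assumption that GWLF returns a vector of simulated discharge are mine, not from the posted code):
# hypothetical wrapper: GWLF() is assumed to return simulated daily discharge,
# and dischargeInput$Discharge to hold the observed values
residFun <- function(par, dischargeInput, weatherInput) {
  simulated <- GWLF(par, dischargeInput, weatherInput)
  dischargeInput$Discharge - simulated   # residuals = observed - simulated
}
out <- nls.lm(par = guess, fn = residFun,
              dischargeInput = dischargeInput, weatherInput = weatherInput)
summary(out)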

Related

siber.ellipses -> Error in rmultireg(Y, X, Bbar, A, nu, V) : not a matrix

I am very new to R and am currently trying to create SIBER ellipses.
I watched the podcast Using ellipses to compare community members (http://www.tcd.ie/Zoology/research/research/theoretical/Rpodcasts.php#siber) and got along just fine in the beginning. When I get to the siber.ellipses function, I get an error:
Error in rmultireg(Y, X, Bbar, A, nu, V) : not a matrix
I cannot figure out why. I get it with my own data as well as with the example data in the zip file provided along with the script.
I have researched the error message online but could not come up with an answer.
It can hardly be an error in the script or the data, since I used both exactly as provided. My R version is 3.3.2.
Does it have something to do with some kind of settings? What could be the reason?
Can someone please help me? :)
Thanks
# this demo generates some random data for M consumers based on N samples and
# constructs a standard ellipse for each based on SEAc and SEA_B
rm(list = ls())
library(siar)
# -----------------------------------------------------------------------------
# ANDREW - REMOVE THESE LINES WHICH SHOULD BE REDUNDANT
# change this line
setwd("C:/Users/elisabeth/Desktop/R/demo")
# -----------------------------------------------------------------------------
# now close all currently open windows
graphics.off()
# read in some data
# NB the column names have to be exactly, "group", "x", "y"
mydata <- read.table("example_ellipse_data.txt",sep="\t",header=T)
# make the column names available for direct calling
attach(mydata)
# now loop through the data and calculate the ellipses
ngroups <- length(unique(group))
# split the isotope data based on group
spx <- split(x,group)
spy <- split(y,group)
# create some empty vectors for recording our metrics
SEA <- numeric(ngroups)
SEAc <- numeric(ngroups)
TA <- numeric(ngroups)
dev.new()
plot(x,y,col=group,type="p")
legend("topright",legend=as.character(paste("Group ",unique(group))),
pch=19,col=1:length(unique(group)))
for (j in unique(group)){
# Fit a standard ellipse to the data
SE <- standard.ellipse(spx[[j]],spy[[j]],steps=1)
# Extract the estimated SEA and SEAc from this object
SEA[j] <- SE$SEA
SEAc[j] <- SE$SEAc
# plot the standard ellipse with d.f. = 2 (i.e. SEAc)
# These are plotted here as thick solid lines
lines(SE$xSEAc,SE$ySEAc,col=j,lty=1,lwd=3)
# Also, for comparison we can fit and plot the convex hull
# the convex hull is plotted as dotted thin lines
#
# Calculate the convex hull for the jth group's isotope values
# held in the objects created using split() called spx and spy
CH <- convexhull(spx[[j]],spy[[j]])
# Extract the area of the convex hull from this object
TA[j] <- CH$TA
# Plot the convex hull
lines(CH$xcoords,CH$ycoords,lwd=1,lty=3)
}
# print the area metrics to screen for comparison
# NB if you are working with real data rather than simulated then you won't be
# able to calculate the population SEA (pop.SEA)
# If you do this enough times or for enough groups you will easily see the
# bias in SEA as an estimate of pop.SEA as compared to SEAc which is unbiased.
# Both measures are equally variable.
print(cbind(SEA,SEAc,TA))
# So far we have fitted the standard ellipses based on frequentist methods
# and calculated the relevant metrics (SEA and SEAc). Now we turn our attention
# to producing a Bayesian estimate of the standard ellipse and its area SEA_B
reps <- 10^4 # the number of posterior draws to make
# Generate the Bayesian estimates for the SEA for each group using the
# utility function siber.ellipses
SEA.B <- siber.ellipses(x,y,group,R=reps)
Error in rmultireg(Y, X, Bbar, A, nu, V) : not a matrix
Reinstall the SIAR package from Jackson's GitHub site:
library(devtools)
install_github("andrewljackson/siar@v4.2.2", build_vignettes = TRUE)
library(siar)
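After reinstalling, it may help to confirm which version actually loaded (packageVersion is in base R's utils; 4.2.2 is the tag used above):
packageVersion("siar")   # should report 4.2.2 if the tagged release installed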

Parameter and initial conditions fitting ODE models with nls.lm

I am currently trying to fit ODE functional responses using the Levenberg-Marquardt routine (nls.lm) from the minpack.lm package, following the tutorial here (http://www.r-bloggers.com/learning-r-parameter-fitting-for-models-involving-differential-equations/).
In the example, the author fits the data by first setting up a rate function rxnrate, which I modified as shown below:
library(ggplot2) #library for plotting
library(reshape2) # library for reshaping data (tall-narrow <-> short-wide)
library(deSolve) # library for solving differential equations
library(minpack.lm) # library for least squares fit using levenberg-marquart algorithm
# prediction of concentration
# rate function
rxnrate=function(t,c,parms){
# rate constant passed through a list called parms
k1=parms$k1
k2=parms$k2
k3=parms$k3
# c is the concentration of species
# derivatives dc/dt are computed below
r=rep(0,length(c))
r[1]=-k1*c["A"] #dcA/dt
r[2]=k1*c["A"]-k2*c["B"]+k3*c["C"] #dcB/dt
r[3]=k2*c["B"]-k3*c["C"] #dcC/dt
# the computed derivatives are returned as a list
# order of derivatives needs to be the same as the order of species in c
return(list(r))
}
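As a quick standalone check of the rate function (my addition, with an arbitrary parameter set rather than values from the tutorial), it can be integrated directly with deSolve's ode:
# integrate the rate function for a fixed, arbitrary parameter set
cinit <- c(A = 1, B = 0, C = 0)
times <- seq(0, 5, 0.1)
out   <- ode(y = cinit, times = times, func = rxnrate,
             parms = list(k1 = 2, k2 = 1, k3 = 0.3))
head(out)   # columns: time, A, B, C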
My problem is that the initial condition of each state can also be treated as an estimated parameter. However, it does not work properly at the moment.
Below is my code:
# function that calculates residual sum of squares
ssq=function(myparms){
# initial concentration
cinit=c(A=myparms[4],B=0,C=0)
# time points for which conc is reported
# include the points where data is available
t=c(seq(0,5,0.1),df$time)
t=sort(unique(t))
# parms from the parameter estimation routine
k1=myparms[1]
k2=myparms[2]
k3=myparms[3]
# solve ODE for a given set of parameters
out=ode(y=cinit,times=t,func=rxnrate,parms=list(k1=k1,k2=k2,k3=k3))
# Filter data that contains time points where data is available
outdf=data.frame(out)
outdf=outdf[outdf$time %in% df$time,]
# Evaluate predicted vs experimental residual
preddf=melt(outdf,id.var="time",variable.name="species",value.name="conc")
expdf=melt(df,id.var="time",variable.name="species",value.name="conc")
ssqres=preddf$conc-expdf$conc
# return predicted vs experimental residual
return(ssqres)
}
# parameter fitting using levenberg marquart algorithm
# initial guess for parameters
myparms=c(k1=0.5,k2=0.5,k3=0.5,A=1)
# fitting
fitval=nls.lm(par=myparms,fn=ssq)
Once I run this, I get the following error:
Error in chol.default(object$hessian) :
the leading minor of order 1 is not positive definite
The problem with your code is the following:
In the line cinit=c(A=myparms[4],B=0,C=0) you gave A both the value of myparms[4] AND the name of myparms[4]. Let's see:
myparms=c(k1=0.5,k2=0.5,k3=0.5,A=1)
cinit=c(A=myparms[4],B=0,C=0)
print(cinit)
A.A B C
1 0 0
To solve this problem, you can do this:
myparms=c(k1=0.5,k2=0.5,k3=0.5,A=1)
cinit=c(A=unname(myparms[4]),B=0,C=0)
print(cinit)
A B C
1 0 0
or this:
myparms=c(k1=0.5,k2=0.5,k3=0.5,1)
cinit=c(A=unname(myparms[4]),B=0,C=0)
print(cinit)
A B C
1 0 0
Then your code will work!
Best regards,
J_F
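Putting the fix into the residual function from the question, a minimal sketch (assuming df holds the experimental concentrations, as in the r-bloggers tutorial) looks like this:
ssq <- function(myparms){
  # unname() keeps the state names plain A, B, C so ode() and melt() line up
  cinit <- c(A = unname(myparms["A"]), B = 0, C = 0)
  t <- sort(unique(c(seq(0, 5, 0.1), df$time)))
  out <- ode(y = cinit, times = t, func = rxnrate,
             parms = list(k1 = myparms["k1"], k2 = myparms["k2"], k3 = myparms["k3"]))
  outdf <- data.frame(out)
  outdf <- outdf[outdf$time %in% df$time, ]
  preddf <- melt(outdf, id.var = "time", variable.name = "species", value.name = "conc")
  expdf  <- melt(df, id.var = "time", variable.name = "species", value.name = "conc")
  preddf$conc - expdf$conc
}
myparms <- c(k1 = 0.5, k2 = 0.5, k3 = 0.5, A = 1)
fitval  <- nls.lm(par = myparms, fn = ssq)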

Combining binary classification algorithms

I have several algorithms which solve a binary classification problem (with response 0 or 1) by assigning to each observation a probability of the target value being equal to 1. All the algorithms try to minimize the log loss function
logloss = -(1/N) * sum_{i=1..N} [ y_i*log(p_i) + (1 - y_i)*log(1 - p_i) ]
where N is the number of observations, y_i is the actual target value and p_i is the probability of 1 predicted by the algorithm. Here is some R code with sample data:
actual.response = c(1,0,0,0,1)
prediction.df = data.frame(
method1 = c(0.5080349,0.5155535,0.5338271,0.4434838,0.5002529),
method2 = c(0.5229466,0.5298336,0.5360780,0.4217748,0.4998602),
method3 = c(0.5175378,0.5157711,0.5133765,0.4372109,0.5215695),
method4 = c(0.5155535,0.5094510,0.5201827,0.4351625,0.5069823)
)
log.loss = colSums(-1/length(actual.response)*(actual.response*log(prediction.df)+(1-actual.response)*log(1-prediction.df)))
The sample code gives the log loss for each algorithm:
  method1   method2   method3   method4
0.6887705 0.6824404 0.6659796 0.6719181
Now I want to combine these algorithms so I can minimize the log loss even further. Is there any R package which can do this for me? I would appreciate references to any algorithms, articles, books or research papers which solve this kind of problem. Note that as a final result I want to have the predicted probabilities of each class, not plain 0/1 responses.
This is called ensemble learning (Wikipedia).
Check out this article: "An Intro to Ensemble Learning in R".
Here is an example I did using the Cornell movie review data, which can be downloaded by clicking the link. I used the data set with 1000 positive and 1000 negative reviews. Once you get the data into R:
library(RTextTools)
library(tm)
library(glmnet)
library(ipred)
library(randomForest)
library(data.table)
## create a column of sentiment score. 0 for negative and 1 for
## positive.
text_neg$pos_neg<-rep(0,1000)
text_pos$pos_neg<-rep(1,1000)
## Combine into 1 data.table and rename.
text_all<-rbind(text_neg, text_pos)
## don't forget to shuffle
set.seed(26)
text2<-text_all[sample(nrow(text_all)),]
## turn the data.frame into a document term matrix. This uses the handy
##RTextTools wrappers and functions.
doc_matrix <- create_matrix(text2$V1, language="english",
removeNumbers=TRUE, stemWords=TRUE, removeSparseTerms=.98)
ncol(data.frame(as.matrix(doc_matrix)))
## 2200 variables at .98 sparsity. runs pretty slow...
## create a container with the very nice RTextTools package
container <- create_container(doc_matrix, text2$pos_neg,
trainSize=1:1700, testSize=1701:2000, virgin=FALSE)
## train the data
time_glm<-system.time(GLMNET <- train_model(container,"GLMNET"));
time_glm #1.19
time_slda<-system.time(SLDA <- train_model(container,"SLDA"));
time_slda #45.03
time_bag<-system.time(BAGGING <- train_model(container,"BAGGING"));
time_bag #59.24
time_rf<-system.time(RF <- train_model(container,"RF")); time_rf #69.59
## classify with the models
GLMNET_CLASSIFY <- classify_model(container, GLMNET)
SLDA_CLASSIFY <- classify_model(container, SLDA)
BAGGING_CLASSIFY <- classify_model(container, BAGGING)
RF_CLASSIFY <- classify_model(container, RF)
## summarize results
analytics <- create_analytics(container,cbind( SLDA_CLASSIFY,
BAGGING_CLASSIFY,RF_CLASSIFY, GLMNET_CLASSIFY))
summary(analytics)
This ran an ensemble classifier using the 4 different methods (random forests, GLMNET, SLDA and bagging). The ensemble summary at the end shows:
# ENSEMBLE SUMMARY
#
# n-ENSEMBLE COVERAGE n-ENSEMBLE RECALL
# n >= 1 1.00 0.86
# n >= 2 1.00 0.86
# n >= 3 0.89 0.89
# n >= 4 0.63 0.96
This means that when all 4 methods agreed on whether the review was positive or negative, the ensemble had a 96% recall rate. But be careful: with a binary outcome (2 choices) and 4 different algorithms, there is bound to be a lot of agreement.
See the RTextTools documentation for more explanation. They also do an almost identical example with U.S. Congress data that I more or less mimicked in the example above.
Hope this was helpful.
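Since the question asks for combined probabilities rather than hard votes, here is a separate minimal sketch (my own illustration, not part of the answer above): a weighted average of the four methods' probabilities, with the weights chosen to minimize log loss on the sample data from the question. On real data the weights should be fit on a held-out set rather than on the same observations.
# log loss for a vector of predicted probabilities p against 0/1 outcomes y
logloss <- function(p, y) -mean(y * log(p) + (1 - y) * log(1 - p))
# weights go through a softmax so they stay positive and sum to 1
obj <- function(w) {
  w <- exp(w) / sum(exp(w))
  p <- as.matrix(prediction.df) %*% w
  logloss(p, actual.response)
}
opt      <- optim(rep(0, ncol(prediction.df)), obj)
weights  <- exp(opt$par) / sum(exp(opt$par))
combined <- as.vector(as.matrix(prediction.df) %*% weights)
logloss(combined, actual.response)   # compare with the per-method values above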

R: Calculate sill, range and nugget from a raster object

I need to calculate the sill, range and nugget from a raster layer. I have explored the gstat and usdm packages, where one can create a variogram, but I couldn't find a function that estimates these parameters given a raster layer. In most functions these parameters have to be supplied instead, e.g. for kriging.
I have raster data layers for different heights (example rasters are shown in the original post).
I would like to get the sill, nugget and range from the parameters of a semivariogram fitted to these data layers, to create a plot similar to the one shown in the original post.
The original data layers are available here as a multiband tiff. A figure from this paper further illustrates the concept.
Using gstat, here is an example:
library(raster)
library(gstat)
demo(meuse, ask = FALSE, echo = FALSE)
set.seed(131) # make random numbers reproducible
# add some noise with .1 variance
meuse.grid$dist = meuse.grid$dist + rnorm(nrow(meuse.grid), sd=sqrt(.1))
r = raster(meuse.grid["dist"])
v = variogram(dist~1, as(r, "SpatialPixelsDataFrame"))
(f = fit.variogram(v, vgm("Sph")))
# model psill range
# 1 Nug 0.09035948 0.000
# 2 Sph 0.06709838 1216.737
f$psill[2] # sill
# [1] 0.06709838
f$range[2] # range
# [1] 1216.737
f$psill[1] # nugget
# [1] 0.09035948
Plug in your own raster for r, and it should work. Change Sph to fit another variogram model, and try plot(v, f) to verify the fit.
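To get one set of parameters per height, the same fit can be looped over the bands of the multiband tiff (a sketch under the assumption that each band fits a nugget-plus-spherical model as in the example above; the file name is the one used in the answer below):
library(raster)
library(gstat)
s <- stack("old_gap_.tif")   # the multiband tiff from the question
pars <- t(sapply(1:nlayers(s), function(i) {
  sp <- as(s[[i]], "SpatialPixelsDataFrame")
  names(sp@data) <- "z"
  v <- variogram(z ~ 1, sp)
  f <- fit.variogram(v, vgm("Sph"))
  c(nugget = f$psill[1], sill = f$psill[2], range = f$range[2])
}))
pars   # one row of nugget, sill and range per band (i.e. per height)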
This is just a guess. This is how I estimate semivariance:
sem = (1/n) * (m - r)^2
where n is the number of layers whose mean is less than the total mean, m is the total mean across all the layers, and r is the mean of each layer that fell below the total mean.
s <- stack("old_gap_.tif")
m <- cellStats(mean(s), stat="mean", na.rm=TRUE)       # total mean across all layers, 0.5620522
layer_means <- cellStats(s, stat="mean", na.rm=TRUE)   # mean of each layer
r <- layer_means[layer_means < m]                      # layer means below the total mean
n <- length(r)                                         # 53 layers in this example
sem <- 1/n * (m - r)^2
plot(r, sem)

Using KDD cup 99 dataset and machine learning with R

I am trying to use the KDD cup 99 dataset with R but unfortunately I get very bad results. Basically, the predictor is guessing (~50% error on the cross-validation set). There is probably a bug in my code but I can't find it.
The KDD cup 99 dataset is composed of around 4 million examples, which are divided into 4 different classes of attack plus the class "normal". First, I split the dataset into 5 files (one for each class plus one for the class "normal") and convert the non-numerical features into numerical ones. For the moment, I am working on the class "Remote to Local" (r2l). I select some features according to the results of a paper on the topic. Afterwards, I sample a number of "normal" instances equal to the number of r2l instances to avoid the problem of skewed classes. I also replace all the labels for the different types of r2l attacks with the label "attack" so I can train a two-class classifier. Then I join the sample to the r2l instances in one new dataset. Finally, I apply 10-fold cross-validation to assess my model, which is built using an SVM, and I get the worst results in the history of machine learning... :(
Here is my code:
r2l <- read.table("kddcup_r2l.data",sep=",",header=T)
#u2r <- read.table("kddcup_u2r.data",sep=",",header=T)
#probe_original <- read.table("kddcup_probe.data",sep=",",header=T)
#dos <- read.table("kddcup_dos.data",sep=",",header=T)
normal <- read.table("kddcup_normal.data",sep=",",header=T)
#probe <- probe_original[sample(1:dim(probe_original)[1],10000),]
# Features selected by the three algorithms svm, lgp and mars
# for the different classes of attack
########################################################################
features.r2l.svm <- c("srv_count","service","duration","count","dst_host_count")
features.r2l.lgp <- c("is_guest_login","num_access_files","dst_bytes","num_failed_logins","logged_in")
features.r2l.mars <- c("srv_count","service","dst_host_srv_count","count","logged_in")
features.r2l.combined <- unique(c(features.r2l.svm,features.r2l.lgp,features.r2l.mars))
# Sample the training set containing the normal labels
# for each class of attack in order to have the same number
# of training data belonging to the "normal" class and the
# "attack" class
#######################################################################
normal_sample.r2l <- normal[sample(1:dim(normal)[1],dim(r2l)[1]),]
# This part was useful before the separation normal/attack because
# attack was composed of different types for each class
######################################################################
normal.r2l.Y <- matrix(normal_sample.r2l[,c("label")])
#######################################################################
# Class of attack Remote to Local (r2l)
#######################################################################
# Select the features according to the algorithms(svm,lgp and mars)
# for this particular type of attack. Combined contains the
# combination of the features selected by the 3 algorithms
#######################################################################
#features.r2l.svm <- c(features.r2l.svm,"label")
r2l_svm <- r2l[,features.r2l.svm]
r2l_lgp <- r2l[,features.r2l.lgp]
r2l_mars <- r2l[,features.r2l.mars]
r2l_combined <- r2l[,features.r2l.combined]
r2l_ALL <- r2l[,colnames(r2l) != "label"]
r2l.Y <- matrix(r2l[,c("label")])
r2l.Y[,1] = "attack"
# Merge the "normal" instances and the "r2l" instances and shuffle the result
###############################################################################
r2l_svm.tr <- rbind(normal_sample.r2l[,features.r2l.svm],r2l_svm)
r2l_svm.tr <- r2l_svm.tr[sample(1:nrow(r2l_svm.tr),replace=F),]
r2l_lgp.tr <- rbind(normal_sample.r2l[,features.r2l.lgp],r2l_lgp)
r2l_lgp.tr <- r2l_lgp.tr[sample(1:nrow(r2l_lgp.tr),replace=F),]
r2l_mars.tr <- rbind(normal_sample.r2l[,features.r2l.mars],r2l_mars)
r2l_mars.tr <- r2l_mars.tr[sample(1:nrow(r2l_mars.tr),replace=F),]
r2l_ALL.tr <- rbind(normal_sample.r2l[,colnames(normal_sample.r2l) != "label"],r2l_ALL)
r2l_ALL.tr <- r2l_ALL.tr[sample(1:nrow(r2l_ALL.tr),replace=F),]
r2l.Y.tr <- rbind(normal.r2l.Y,r2l.Y)
r2l.Y.tr <- matrix(r2l.Y.tr[sample(1:nrow(r2l.Y.tr),replace=F),])
#######################################################################
#
# 10-fold CROSS-VALIDATION to assess the models accuracy
#
#######################################################################
# CV for Remote to Local
########################
cv(r2l_svm.tr, r2l_lgp.tr, r2l_mars.tr, r2l_ALL.tr, r2l.Y.tr)
And the cross-validation function:
cv <- function(svm.tr, lgp.tr, mars.tr, ALL.tr, Y.tr){
  Jcv.svm_mean <- NULL
  # Compute the size of the cross validation
  # =======================================
  index <- sample(1:dim(svm.tr)[1])
  size.CV <- floor(dim(svm.tr)[1]/10)
  Jcv.svm <- NULL
  # Start 10-fold cross validation
  # =============================
  for (i in 1:10) {
    # if m is the size of the training set
    # (nr of rows in svm.tr for example)
    # take n observations for test and (m-n) for training
    # with n << m (here n = m/10)
    # ===================================================
    i.ts <- (((i-1)*size.CV+1):(i*size.CV))
    i.tr <- setdiff(index, i.ts)
    Y.tr.tr <- as.factor(Y.tr[i.tr])
    Y.tr.ts <- as.factor(matrix(Y.tr[i.ts], ncol=1))
    svm.tr.tr <- svm.tr[i.tr,]
    svm.tr.ts <- svm.tr[i.ts,]
    # Get the model for the algorithms
    # ==============================================
    model.svm <- svm(Y.tr.tr~., svm.tr.tr, type="C-classification")
    # Compute the prediction
    # ==============================================
    Y.hat.ts.svm <- predict(model.svm, svm.tr.ts)
    # Compute the error
    # ==============================================
    h.svm <- NULL
    h.svm <- matrix(Y.hat.ts.svm, ncol=1)
    Jcv.svm <- c(Jcv.svm, sum(!(h.svm == Y.tr.ts))/size.CV)
    print(table(h.svm, Y.tr.ts))
  }
  Jcv.svm_mean <- c(Jcv.svm_mean, mean(Jcv.svm))
  d <- 10
  print(paste("Jcv.svm_mean: ", round(Jcv.svm_mean, digits=d)))
}
I obtain very strange results. It seems that the algorithm doesn't really see any difference between the instances; it looks more like guessing than prediction. I also tried with the attack class "Probe" but obtained the same results. The paper I mentioned earlier reported 30% accuracy on the r2l class and 60-98% (depending on the polynomial degree) on probe.
Here is the prediction for one of the 10 fold of the cross-validation:
h.svm(attack) & Y.tr.ts(attack) --> 42 instances
h.svm(attack) & Y.tr.ts(normal.) --> 44 instances
h.svm(normal.) & Y.tr.ts(attack) --> 71 instances
h.svm(normal.) & Y.tr.ts(normal.) --> 68 instances
I would be really grateful if somebody could tell me what is wrong with my code.
Thank you in advance
I couldn't be sure if this is your problem, but there are known problems with that data set:
http://www.bruggerink.com/~zow/GradSchool/KDDCup99Harmful.html
Sorry I couldn't help with the code, I don't know R :/
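One additional observation of my own (not part of the answer above): in the posted code the feature tables and the label vector are shuffled with separate sample() calls, so rows and labels no longer correspond, which by itself would produce chance-level accuracy. A minimal sketch of shuffling everything with one shared permutation instead:
# use a single permutation so the features and the labels stay aligned
set.seed(26)
perm <- sample(nrow(r2l_svm.tr))
r2l_svm.tr  <- r2l_svm.tr[perm, ]
r2l_lgp.tr  <- r2l_lgp.tr[perm, ]
r2l_mars.tr <- r2l_mars.tr[perm, ]
r2l_ALL.tr  <- r2l_ALL.tr[perm, ]
r2l.Y.tr    <- matrix(r2l.Y.tr[perm, ])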
