Error in nls model - singular gradient - R

I am trying to fit an nls model to pavement data. I have different sections and I need to fit the same model to each section (with different variable values for each section, of course).
The following is my code:
data <- read.csv(file.choose(), header = TRUE)
SecNum <- data[, 1]
t <- c()
PCI <- c()
for (i in 1:length(SecNum)) {
  if (i < length(SecNum) && SecNum[i] == SecNum[i + 1]) {
    # still inside the same section: accumulate observations
    tt <- data[i, 2]
    PCIt <- data[i, 3]
    t <- c(t, tt)
    PCI <- c(PCI, PCIt)
  } else {
    # last row of a section: add it, fit the model, then reset
    tt <- data[i, 2]
    PCIt <- data[i, 3]
    t <- c(t, tt)
    PCI <- c(PCI, PCIt)
    fit <- nls(PCI ~ alpha - beta * exp((-theta * (t^gama))),
               start = c(alpha = min(c(PCI, PCIt)), beta = 1, theta = 1, gama = 1))
    fitted(fit)
    resid(fit)
    print(t)
    print(PCI)
    t <- c()
    PCI <- c()
  }
}
After running my model I received an error like this:
Error in nls(PCI ~ alpha - beta * exp((-theta * (t^gama))), start = c(alpha = min(c(PCI, :
singular gradient
My professor told me to use nlsList instead, and that this might solve the problem.
Since I am new to R, I don't know how to do that. I would be really thankful if anyone could advise me how to do it.
Here is a sample of my data:
SecNum t PCI AADT ESAL
1 962 1 90.46 131333 3028352
2 962 2 90.01 139682 3213995
3 962 3 86.88 137353 2205859
4 962 4 86.36 137353 2205859
5 962 5 84.56 137353 2205859
6 962 6 85.11 137353 2205859
7 963 1 91.33 91600 3726288
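For reference, nlsList from the nlme package fits the same nonlinear model separately to each group, which is exactly what the loop above does by hand. A minimal sketch, assuming the data frame has the SecNum, t and PCI columns shown above (the start values here are placeholders to tune):

library(nlme)
# "| SecNum" is the grouping factor: one fit per pavement section
fits <- nlsList(PCI ~ alpha - beta * exp(-theta * t^gama) | SecNum,
                data = data,
                start = c(alpha = 90, beta = 1, theta = 1, gama = 1))
coef(fits)  # one row of fitted coefficients per section

Sections whose individual fit still fails (e.g. with a singular gradient) should simply come back with NA coefficients instead of stopping the whole run.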

Related

How to use genetic algorithm for prediction correctly

I'm trying to use a genetic algorithm for a classification problem. However, I didn't succeed in getting a summary of the model or a prediction for a new data frame. How can I get the summary and the prediction for the new dataset?
Here is my toy example:
library(genalg)
dat <- read.table(text = " cats birds wolfs snakes
0 3 9 7
1 3 8 7
1 1 2 3
0 1 2 3
0 1 2 3
1 6 1 1
0 6 1 1
1 6 1 1 ", header = TRUE)
evalFunc <- function(x) {
  # note: dat$cats is a whole column, so if() only looks at its first element
  if (dat$cats < 1)
    return(0) else return(1)
}
iter <- 100
GAmodel <- rbga.bin(size = 7, popSize = 200, iters = iter, mutationChance = 0.01,
                    elitism = TRUE, evalFunc = evalFunc)
########## summary try ##########
cat(summary.rbga(GAmodel))
# Error in cat(summary.rbga(GAmodel)) :
#   could not find function "summary.rbga"
########## prediction try ##########
dat$pred <- predict(GAmodel, newdata = dat)
# Error in UseMethod("predict") :
#   no applicable method for 'predict' applied to an object of class "rbga"
Update:
After reading the answer given and reading this link:
Pattern prediction using Genetic Algorithm
I wonder how I can programmatically use the GA as part of a prediction mechanism. According to the link's text, one can use the GA to optimize a regression or a neural network and then use the predict function they provide.
Genetic algorithms are for optimization, not for classification; therefore, there is no prediction method. Your summary statement was close to working:
cat(summary(GAmodel))
GA Settings
Type = binary chromosome
Population size = 200
Number of Generations = 100
Elitism = TRUE
Mutation Chance = 0.01
Search Domain
Var 1 = [,]
Var 0 = [,]
GA Results
Best Solution : 1 1 0 0 0 0 1
Some additional information is available from Imperial College London
Update in response to the updated question:
I see from the paper you mentioned how this makes sense. The idea is to use the genetic algorithm to optimize the weights of a neural network, then use the neural network for classification. That would be a big task, too big to respond to here.
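To make the optimization-versus-classification point concrete, here is a toy sketch (my own illustration, not the paper's full GA-plus-neural-network setup): the float-chromosome GA from genalg fits the two weights of a straight line by minimizing squared error, and "prediction" is then just applying the best weights found.

library(genalg)
set.seed(1)
x <- 1:20
y <- 3 * x + 5 + rnorm(20)
# rbga minimizes the evaluation function, so return the sum of squared errors
evalFunc <- function(w) sum((y - (w[1] * x + w[2]))^2)
ga <- rbga(stringMin = c(-10, -10), stringMax = c(10, 10),
           popSize = 100, iters = 200, evalFunc = evalFunc)
best <- ga$population[which.min(ga$evaluations), ]  # best weight pair found
pred <- best[1] * x + best[2]                       # the "prediction" step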

Error in if (!n.cat[i]) { : argument is of length zero

I have just begun learning to code in R, and I tried to do a classification with C5.0. But I encountered some problems I don't understand, and I would be grateful for help. Below is code I learned from someone and tried to run on my own data:
require(C50)
data.resultc50 <- c()
prematrixc50 <- c()
for(i in 3863:3993){
  # note: the model does not depend on i, so it is refit identically
  # on needdata[1:3612, ] in every iteration
  needdata$class <- as.factor(needdata$class)
  trainc50 <- C5.0(class ~ ., needdata[1:3612, ], trials = 5,
                   control = C5.0Control(noGlobalPruning = TRUE, CF = 0.25))
  predc50 <- predict(trainc50, newdata = testdata[i, -1], trials = 5, type = "class")
  data.resultc50[i - 3862] <- sum(predc50 == testdata$class[i]) / length(predc50)
  prematrixc50[i - 3862] <- as.character.factor(predc50)
}
Below are the two objects, needdata and testdata, that I used in the code above, with part of their heads respectively:
class Volume MA20 MA10 MA120 MA40 MA340 MA24 BIAS10
1 1 2800 8032.00 8190.9 7801.867 7902.325 7367.976 1751 7.96
2 1 2854 8071.40 8290.3 7812.225 7936.550 7373.624 1766 6.27
3 0 2501 8117.45 8389.3 7824.350 7973.250 7379.444 1811 5.49
4 1 2409 8165.40 8488.1 7835.600 8007.900 7385.294 1825 4.02
# the above is "needdata" and actually has 15 variables with 3862 obs.
class Volume MA20 MA10 MA120 MA40 MA340 MA24 BIAS10
1 1 2800 8032.00 8190.9 7801.867 7902.325 7367.976 1751 7.96
2 1 2854 8071.40 8290.3 7812.225 7936.550 7373.624 1766 6.27
3 0 2501 8117.45 8389.3 7824.350 7973.250 7379.444 1811 5.49
4 1 2409 8165.40 8488.1 7835.600 8007.900 7385.294 1825 4.02
# the above is "testdata" and has 15 variables with 4112 obs.
The data above contain the factor class with values 0 and 1. After running the code I got the warning below:
In predict.C5.0(trainc50, newdata = testdata[i, -1], trials = 5, ... : 'trials' should be <= 1 for this object. Predictions generated
using 1 trials
And when I look at the trainc50 object just created, I notice that the number of boosting iterations is 1 due to early stopping, as shown below:
# trainc50
Call:
C5.0.formula(formula = class ~ ., data = needdata[1:3612, ],
trials = 5, control = C5.0Control(noGlobalPruning = TRUE,
CF = 0.25), earlyStopping = FALSE)
Classification Tree
Number of samples: 3612
Number of predictors: 15
Number of boosting iterations: 5 requested; 1 used due to early stopping
Non-standard options: attempt to group attributes, no global pruning
I also tried to plot the decision tree, and I got the error below:
plot(trainc50)
Error in if (!n.cat[i]) { : argument is of length zero
In addition: Warning message:
In 1:which(out == "Decision tree:") : numerical expression has 2 elements: only the first used
Does this mean my code is too broken for C5.0 to perform further trials? What is wrong? Can someone please explain why I encounter early stopping, what the error and warning messages mean, and how I can fix them? I would be very thankful for any help.
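On the early-stopping part, one thing worth checking: in the C50 package, earlyStopping is an argument of C5.0Control(), not of C5.0() itself, so in the printed call above it sits where it will not be picked up as a control option. A sketch of passing it inside the control object (this addresses the early stopping; the plot error is a separate issue):

library(C50)
# disable early stopping of the boosting iterations via the control object
ctrl <- C5.0Control(noGlobalPruning = TRUE, CF = 0.25, earlyStopping = FALSE)
trainc50 <- C5.0(class ~ ., data = needdata[1:3612, ], trials = 5, control = ctrl)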
One workaround for plotting is the C5.0.graphviz function from
http://r-project-thanos.blogspot.tw/2014/09/plot-c50-decision-trees-in-r.html
using the call:
C5.0.graphviz(firandomf, "a.txt",
              fontname = 'Arial',
              col.draw = 'black',
              col.font = 'blue',
              col.conclusion = 'lightpink',
              col.question = 'grey78',
              shape.conclusion = 'box3d',
              shape.question = 'diamond',
              bool.substitute = c('None', 'yesno', 'truefalse', 'TF'),
              prefix = FALSE,
              vertical = TRUE)
and then on the command line:
pip install graphviz
dot -Tpng ~/plot/a.txt > ~/plot/a.png

Neural net accuracy is too low

I have just written my first attempt at a neural network for household classification using energy-consumption features. So far I can make it run, but the output seems questionable.
As you can see, I'm using 18 features (maybe too many?) to predict whether a household is single or non-single.
I have 3488 rows like this:
c_day c_weekend c_evening c_morning c_night c_noon c_max c_min r_mean_max r_min_mean r_night_day r_morning_noon
12 14 1826 9 765 3 447 2 878 0 7338 4
r_evening_noon t_above_1kw t_above_2kw t_above_mean t_daily_max single
3424 1 695 0 174319075712881 1
My neural network uses these parameters:
net.nn <- neuralnet(single
~ c_day
+ c_weekend
+ c_weekday
+ c_evening
+ c_morning
+ c_night
+ c_noon
+ c_max
+ c_min
+ r_mean_max
+ r_min_mean
+ r_night_day
+ r_morning_noon
+ r_evening_noon
+ t_above_1kw
+ t_above_2kw
+ t_above_mean
+ t_daily_max
, train, hidden = 15, threshold = 0.01, linear.output = FALSE)
1 repetition was calculated.
Error Reached Threshold Steps
1 126.3425379 0.009899229932 4091
I normalized the data beforehand using the min-max normalization formula:
for(i in names(full_data)){
  x <- as.numeric(full_data[, i])
  full_data[, i] <- (x - min(x)) / (max(x) - min(x))  # rescale to [0, 1]
}
I got 3488 rows of data and split them into a training and a test set:
half <- nrow(full_data)/2
train <- full_data[1:half, ]
test <- full_data[half:3488, ]
net.results <- compute(net.nn, test)
net.results$net.result
I used the prediction and bound it to the actual "single [y/n]" column to compare the results:
predict <- net.results$net.result
cleanoutput <- cbind(predict, full_data$single[half:3488])
colnames(cleanoutput) <- c("predicted", "actual")
So when I print it, this is my classification result for the first 10 rows:
predicted actual
1701 0.1661093405 0
1702 0.1317067578 0
1703 0.1677147708 1
1704 0.2051188618 1
1705 0.2013035634 0
1706 0.2088726723 0
1707 0.2683753128 1
1708 0.1661093405 0
1709 0.2385537285 1
1710 0.1257108821 0
If I understand it right, when I round the predicted output it should be either 0 or 1, but it always ends up being 0!
Am I using the wrong parameters? Is my data simply not suitable for NN prediction? Is the normalization wrong?
It means your model's performance is not good yet. Once you have reached good model performance through tuning, you should get the expected behavior. Neural-network techniques are very susceptible to scale differences between columns, so standardizing the data [mean = 0, sd = 1] is good practice. As the OP points out, scale() does the job.
Using scale(full_data) on the entire data set did the trick. The data is now standardized to zero mean and unit standard deviation, and the output seems much more reliable.
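A minimal sketch of that standardization step, assuming single is the 0/1 target and all remaining columns are numeric features:

# standardize every feature column to mean 0, sd 1; leave the target alone
feature_cols <- setdiff(names(full_data), "single")
full_data[feature_cols] <- scale(full_data[feature_cols])
# then re-split into train and test and re-train as before
half <- nrow(full_data) / 2
train <- full_data[1:half, ]
test <- full_data[(half + 1):nrow(full_data), ]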

Rjags error message: Dimension mismatch

I'm studying Bayesian analysis based on the book "Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan" (2015).
The book contains worked examples, and I'm trying to replicate one of them in R.
However, I get an error message.
To be specific, this is the example's data:
data
y s
1 1 Reginald
2 0 Reginald
3 1 Reginald
4 1 Reginald
5 1 Reginald
6 1 Reginald
7 1 Reginald
8 0 Reginald
9 0 Tony
10 0 Tony
11 1 Tony
12 0 Tony
13 0 Tony
14 1 Tony
15 0 Tony
y <- data$y
s <- as.numeric(data$s)
Ntotal <- length(y)
Nsubj <- length(unique(s))
dataList <- list(y = y, s = s, Ntotal = Ntotal, Nsubj = Nsubj)
Also, this is my model.
modelString = "
model{
  for(i in 1:Ntotal){
    y[i] ~ dbern(theta[s[i]])
  }
  for(s in 1:Nsubj){
    theta[s] ~ dbeta(2, 2)
  }
}
"
writeLines(modelString, con = "TEMPmodel.txt")
library(rjags)
library(runjags)
jagsModel = jags.model(file = "TEMPmodel.txt", data = dataList)
In this case, I got an error message.
Error in jags.model(file = "TEMPmodel.txt", data = dataList) :
RUNTIME ERROR:
Cannot insert node into theta[1...2]. Dimension mismatch
I don't know what mistake I have made in this code.
Please give me some advice.
Thanks in advance.
As suggested by @nicola, the problem is that you're passing s as data to your model but also using s as the counter iterating over 1:Nsubj. As JAGS indicates, this causes confusion about the dimension of theta: does it have length 15, or 2?
The following works:
model{
  for(i in 1:Ntotal){
    y[i] ~ dbern(theta[s[i]])
  }
  for(j in 1:Nsubj){
    theta[j] ~ dbeta(2, 2)
  }
}
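For completeness, a short sketch of the remaining rjags calls once the corrected model string is written out (the burn-in length and sample count here are arbitrary choices):

writeLines(modelString, con = "TEMPmodel.txt")
jagsModel <- jags.model(file = "TEMPmodel.txt", data = dataList)
update(jagsModel, n.iter = 500)   # burn-in
codaSamples <- coda.samples(jagsModel, variable.names = "theta", n.iter = 5000)
summary(codaSamples)              # posterior summaries for theta[1], theta[2]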

Vectorising the application of mle2 models

I have written a model that I am fitting to data by maximum likelihood using mle2 from the bbmle package. However, I have a large data frame of samples, and I would like to fit the model to each replicate and then collect all of the model coefficients in a data frame.
I have tried to use the ddply function from the plyr package with no success.
I get the following error message when I try:
Error in output[[var]][rng] <- df[[var]] :
incompatible types (from S4 to logical) in subassignment type fix
Any thoughts?
Here is an example of what I am doing.
This is my data frame. I have measurements in ponds 5...n on days 1...n. Measurements consist of 143 fluxes (flux.cor), which is the variable I am modelling.
Pond Obs Date Time Temp DO pH U day month PAR
932 5 932 2011-06-16 17:31:00 17:31:00 294.05 334.3750 8.47 2 1 1 685.08
933 5 933 2011-06-16 17:41:00 17:41:00 294.05 339.0625 8.47 2 1 1 808.44
934 5 934 2011-06-16 17:51:00 17:51:00 294.02 340.6250 8.46 2 1 1 752.78
935 5 935 2011-06-16 18:01:00 18:01:00 294.00 340.6250 8.45 2 1 1 684.14
936 5 936 2011-06-16 18:11:00 18:11:00 293.94 340.9375 8.50 2 1 1 625.86
937 5 937 2011-06-16 18:21:00 18:21:00 293.88 341.5625 8.48 2 1 1 597.06
day.night Treat H pOH OH DO.cor sd.DO av.DO DO.sat
932 1 A 3.388442e-09 5.53 2.951209e-06 342.1406 2.63078 342.1406 274.0811
933 1 A 3.388442e-09 5.53 2.951209e-06 339.0625 2.63078 342.1406 274.0811
934 1 A 3.467369e-09 5.54 2.884032e-06 340.6250 2.63078 342.1406 274.2432
935 1 A 3.548134e-09 5.55 2.818383e-06 340.6250 2.63078 342.1406 274.3513
936 1 A 3.162278e-09 5.50 3.162278e-06 340.9375 2.63078 342.1406 274.6763
937 1 A 3.311311e-09 5.52 3.019952e-06 341.5625 2.63078 342.1406 275.0020
DO_flux NEP.hr flux.cor sd.flux av.flux
932 -3.078125 -3.09222602 -3.078125 2.104482 -0.1070312
933 1.562500 1.54903673 1.562500 2.104482 -0.1070312
934 0.000000 -0.01375489 0.000000 2.104482 -0.1070312
935 0.312500 0.29876654 0.312500 2.104482 -0.1070312
936 0.625000 0.61126617 0.625000 2.104482 -0.1070312
Here is my model:
# function that generates predictions of O2 flux given GPP, R and gas exchange
flux.pred <- function(GPP24, PAR, R24, Temp, U, DO, DOsat){
  # Schmidt number for O2 as a cubic in water temperature (after Wanninkhof 1992)
  Sc <- function(Temp){
    -0.0476*Temp^3 + 3.7818*Temp^2 - 120.1*Temp + 1800.6
  }
  # piston velocity k600 (m h-1) from wind speed at 10 m (m s-1)
  k600 <- function(U){
    (2.07 + 0.215*U^1.7)/100
  }
  # temperature-corrected piston velocity k (m h-1)
  k <- function(Temp, U){
    k600(U)*((Sc(Temp)/600)^-0.5)
  }
  # physical gas flux (mg O2 m-2 10 mins-1)
  D <- function(Temp, U, DO, DOsat){
    (k(Temp, U)/6)*(DO - DOsat)
  }
  # main calculation that generates the predictions
  flux <- (GPP24/sum(YSI$PAR[YSI$PAR > 40]))*(ifelse(YSI$PAR > 40, YSI$PAR, 0)) -
    (R24/144) + D(YSI$Temp, YSI$U, YSI$DO, YSI$DO.sat)
  return(flux)
}
which returns predictions for the fluxes.
I then build my likelihood function:
# likelihood function
ll<-function(GPP24, PAR, R24, Temp, U, DO.cor, DO.sat){
pred = (flux.pred(GPP24, PAR, R24, Temp, U, DO.cor, DOsat))
pred = pred[-144]
obs = YSI$flux.cor[-144]
return(-sum(dnorm(obs, mean=pred, sd=sqrt(var(obs-pred)))))
}
and apply it:
ll.fit <- mle2(ll, start = list(GPP24 = 100, R24 = 100))
It works beautifully for one pond on one day, but what I want is to apply it to all ponds on all days automatically.
I tried ddply (as stated above):
metabolism <- ddply(YSI, .(Pond, Treat, day, month), summarise,
                    mle = mle2(ll, start = list(GPP24 = 100, R24 = 100)))
but had no success. I also tried just extracting the coefficients using a for loop, but this did not work either:
for(i in 1:length(unique(YSI$day))){
  GPP <- numeric(length = length(unique(YSI$day)))
  GPP[i] <- mle2(ll, start = list(GPP24 = 100, R24 = 100))
}
Any help would be gratefully received.
There's at least one problem with your functions: nowhere in flux.pred or ll do you have an argument that lets you specify which data to use; you hardcoded it. So how is any *ply function supposed to guess that it needs to change YSI$... into a subset?
Next to that, as @hadley points out, ddply will not suit you. dlply might, or you might just use the classic approach of either by() or lapply(split()).
So imagine you make a function
flux.pred <- function(data, GPP24, R24){
  # calculates Schmidt coefficient from water temperature
  Sc <- function(Temp){
    S <- -0.0476*Temp^3 ...
  }
  ...
}
and a function
ll <- function(GPP24, R24, data){
  pred <- flux.pred(data, GPP24, R24)
  pred <- pred[-144]          # check this
  obs <- data$flux.cor[-144]  # check this
  return(-sum(dnorm(obs, mean = pred, sd = sqrt(var(obs - pred)))))
}
You should then be able to do, e.g.:
dlply(data, .(Pond, Treat, day, month), .fun = function(i){
  mle2(ll, start = list(GPP24 = 100, R24 = 100), data = list(data = i))
})
Note that the data frame goes in mle2's own data argument, not in start (start should contain only the parameters being optimized); mle2 hands the elements of that data list on to ll. The optimization itself still uses the default optimizer, optim; see ?optim for more details.
What I can't check is how optim behaves. It might even be that your function doesn't really work as you intend. Normally, for optim itself, you would have a function ll like:
ll <- function(par, data){
  GPP24 <- par[1]
  R24 <- par[2]
  ...
}
But if you say it works as you wrote it, I believe you. Make sure, though, that it actually does. I am not convinced...
On a side note: neither by() / lapply(split()) nor dlply() is the same as vectorizing. On the contrary, all of these constructs are intrinsically loops. On why you would use them anyway, read: Is R's apply family more than syntactic sugar?
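For illustration, the same idea via the base-R lapply(split()) route, including collecting the coefficients into one table (a sketch, assuming ll takes a data argument as above):

# split YSI by the same grouping variables, fit each subset, gather coefficients
groups <- split(YSI, list(YSI$Pond, YSI$Treat, YSI$day, YSI$month), drop = TRUE)
fits <- lapply(groups, function(d)
  mle2(ll, start = list(GPP24 = 100, R24 = 100), data = list(data = d)))
coefs <- t(sapply(fits, coef))  # one row of (GPP24, R24) estimates per group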
