I am testing the example here: https://fluxml.ai/Flux.jl/stable/models/overview/
using Flux
actual(x) = 4x + 2
x_train, x_test= hcat(0:5...), hcat(6:10...)
y_train, y_test = actual.(x_train), actual.(x_test)
predict = Dense(1 => 1)
predict(x_train)
loss(x,y) = Flux.Losses.mse(predict(x),y)
loss(x_train,y_train)
using Flux:train!
opt = Descent(0.1)
data = [(x_train, y_train)]
parameters = Flux.params(predict)
predict.weight in parameters, predict.bias in parameters
train!(loss, parameters, data, opt)
loss(x_train, y_train)
for epoch in 1:1000
    train!(loss, parameters, data, opt)
end
loss(x_train, y_train)
predict(x_test)
y_test
As you can see, it is just a very simple model, actual(x) = 4x + 2. If you run this code you get an almost perfect prediction:
1×5 Matrix{Float32}:
26.0001 30.0001 34.0001 38.0001 42.0001
1×5 Matrix{Int64}:
26 30 34 38 42
But if I make a minor change and feed the model one more data point, like this:
x_train, x_test= hcat(0:6...), hcat(6:10...)
So I didn't change anything except the line defining x_train and x_test: I just changed the 5 to a 6.
Then the prediction result becomes NaN:
1×5 Matrix{Float32}:
NaN NaN NaN NaN NaN
1×5 Matrix{Int64}:
26 30 34 38 42
But why?
I think this is simply a case of a learning rate that is too high. I can reproduce the same NaN behaviour with Descent(0.1). If you print the loss during training, it goes to Inf before turning into NaN - a classic sign of divergence caused by a too-large learning rate. With x_train = hcat(0:6...) the inputs, and therefore the gradients, are larger, so the 0.1-sized steps overshoot the minimum and the optimisation blows up. Dropping the learning rate to Descent(0.01) works just fine and gives the expected answer: the smaller steps let gradient descent settle into the minimum as expected.
I am training a multinomial regression model using cv.glmnet, and the number of features and the number of classes I use have been increasing. On previous versions of my training set, where I had fewer features and classes, my model converged for all lambdas after increasing the value of maxit.
However, with the training data I am using now, I get the following errors even when I increase maxit to 10^7.
Warning messages:
1: from glmnet C++ code (error code -13); Convergence for 13th lambda
value not reached after maxit=100000 iterations; solutions for larger
lambdas returned
2: from glmnet C++ code (error code -14); Convergence for 14th lambda
value not reached after maxit=100000 iterations; solutions for larger
lambdas returned
3: from glmnet C++ code (error code -13);
Convergence for 13th lambda value not reached after maxit=100000
iterations; solutions for larger lambdas returned
.
.
.
Here is code that recreates these warnings:
load(url("https://github.com/DylanDijk/RepoA/blob/main/reprod_features.rda?raw=true"))
load(url("https://github.com/DylanDijk/RepoA/blob/main/reprod_response.rda?raw=true"))
# Training the model:
model_multinom_cv = glmnet::cv.glmnet(x = reprod_features, y = reprod_response,
family = "multinomial", alpha = 1)
I was wondering if anyone had any advice on getting the model to converge for all lambda values in the path.
Some options I have been thinking of trying (a rough sketch of the first two is given after this list):
- Change some of the internal parameters listed in the glmnet vignette.
- Select a lambda sequence myself and then increase maxit further. I have tried maxit = 10^8 without defining a lambda sequence, but this did not finish training after multiple hours.
- Choose a subset of the features. I have trained the model with a small subset of the features and it converged for more lambda values, but I would rather use all of the features, so I want to explore whether there are other options first.
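For reference, here is a rough sketch (not from the original post) of what the first two options could look like; every specific value below is illustrative rather than a recommendation. glmnet.control() adjusts glmnet's internal parameters for the current session, and cv.glmnet() passes lambda, maxit and thresh straight through to glmnet().
library(glmnet)

# Option 1: internal parameters from the vignette, e.g. don't stop the lambda
# path early when the deviance improves only slightly.
glmnet.control(fdev = 0)

# Option 2: an explicit lambda sequence plus a larger maxit and a looser
# convergence threshold (illustrative values only).
lambda_path <- exp(seq(log(0.1773), log(0.01), length.out = 50))
model_multinom_cv <- cv.glmnet(x = reprod_features, y = reprod_response,
                               family = "multinomial", alpha = 1,
                               lambda = lambda_path,
                               maxit = 1e6,     # coordinate-descent iteration cap
                               thresh = 1e-5)   # looser than the 1e-7 default

glmnet.control(factory = TRUE)  # restore the package defaults afterwards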
Lambda path returned
Below is the lambda path returned after training my model:
> model_multinom_cv$glmnet.fit
Call: glmnet(x = train_sparse, y = train_res, trace.it = 1,
family = "multinomial", alpha = 1)
Df %Dev Lambda
1 0 0.00 0.17730
2 1 1.10 0.16150
3 2 1.88 0.14720
4 5 4.72 0.13410
5 8 8.52 0.12220
6 14 13.49 0.11130
7 21 19.90 0.10150
8 27 25.83 0.09244
9 31 30.63 0.08423
10 36 34.56 0.07674
11 41 38.61 0.06993
12 45 41.89 0.06371
Please bear with me, as this is my first post in my first month of starting with R. I have some biphasic decay data, an example of which is included below:
 N       Time    Signal
 1  0.0001101  2.462455
 2  0.0002230  2.362082
 3  0.0003505  2.265309
 4  0.0004946  2.180061
 5  0.0006573  2.136348
 6  0.0008411  2.071639
 7  0.0010487  2.087519
 8  0.0012832  1.971550
 9  0.0015481  2.005190
10  0.0018473  1.969274
11  0.0021852  1.915299
12  0.0025669  1.893703
13  0.0029981  1.905901
14  0.0034851  1.839294
15  0.0040352  1.819827
16  0.0046565  1.756207
17  0.0053583  1.704472
18  0.0061510  1.630652
19  0.0070464  1.584315
20  0.0080578  1.574424
21  0.0092002  1.493813
22  0.0104905  1.349054
23  0.0119480  1.318979
24  0.0135942  1.242094
25  0.0154536  1.115491
26  0.0175539  1.065381
27  0.0199262  0.968143
28  0.0226057  0.846351
29  0.0256323  0.765699
30  0.0290509  0.736105
31  0.0329122  0.588751
32  0.0372736  0.539969
33  0.0421999  0.467340
34  0.0477642  0.389153
35  0.0540492  0.308323
36  0.0611482  0.250392
37  0.0691666  0.247006
38  0.0782235  0.177039
39  0.0884534  0.174750
40  0.1000082  0.191918
I have multiple curves to fit with a double falling exponential: some fraction of particle A decays quickly (described by k1) and the remaining fraction decays slowly (described by k2). Here A is the particle fraction, k1 is the fast rate, k2 is the slow rate, and T is time. I believe this should be entered in R as
DFE <- y ~ (A*exp(-c*t)) + ((A-b)*exp(-d*t))
I would like to create a selfStart model to apply to over 40 sets of data without having to guess the start values each time. I found some R documentation for this, but can't figure out where to go from here.
The problem is that I am very new to R (and programming in general) and really don't know how to do this. I have had success (meaning convergence was achieved) with
nls(Signal~ SSasymp(Time, yf, y0, log_alpha), data = DecayData)
which is a close estimate but not a truly good model. I was hoping I could somehow alter the SSasymp code to work with my equation, but I think that I am perhaps too naive to know even where to begin.
I would like to compare the asymptotic model with my double falling exponential, but the double falling exponential model never seems to reach convergence despite many trials and permutations. At this point, I am not even sure whether I have entered the formula correctly anymore. So I am wondering how to write a selfStart model that would ideally give me extractable coefficients/half-times.
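(For reference, a minimal sketch of one possible route, assuming the DecayData data frame and column names from the SSasymp call above: base R's stats package already provides a self-starting biexponential model, SSbiexp(), which fits A1*exp(-exp(lrc1)*Time) + A2*exp(-exp(lrc2)*Time), i.e. a sum of two falling exponentials with its own starting-value routine.)
# Sketch only: SSbiexp() generates its own starting values; lrc1 and lrc2 are
# the log rate constants of the fast and slow phases.
DFEfit <- nls(Signal ~ SSbiexp(Time, A1, lrc1, A2, lrc2), data = DecayData)
summary(DFEfit)
k <- exp(coef(DFEfit)[c("lrc1", "lrc2")])  # rate constants on the original scale
log(2) / k                                 # corresponding half-times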
Thanks so much!
Edit:
As per Chris's suggestion in the comments, I have tried to insert the formula itself into the nls() command like so:
DFEm = nls("Signal" ~ (A*exp(-c*Time)) + ((A-b)*exp(-d*Time)), data = "Signal", trace= TRUE)
which returns
"Error in nls(Signal ~ (A * exp(-c * Time)) + ((A - b) * exp(-d * : 'data' must be a list or an environment"
So I am unsure of how to proceed, as I've checked spelling and capitalization. Is there something silly that I am missing?
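(The error itself comes from data = "Signal": nls() expects the data frame, not a quoted column name, and the response in the formula should be unquoted as well. Below is a sketch of what the corrected call might look like, again assuming a DecayData data frame; the start values are rough eyeballed guesses and may well need tuning.)
# Sketch only: unquoted Signal, data = the data frame itself, and explicit
# (rough) starting values for the four parameters.
DFEm <- nls(Signal ~ A*exp(-c*Time) + (A - b)*exp(-d*Time),
            data = DecayData, trace = TRUE,
            start = list(A = 1.5, b = 0.5, c = 200, d = 15))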
Thanks in advance!
I've used metaMDS for small fishery datasets before, found it illuminating, and would like to apply the same analysis to a very large dataset - fish catch records for 40 stations sampled six times annually since 1970. No matter what I do, I can't seem to get metaMDS to converge. The data has 76 species, 5773 rows, which I've reduced to the most common species and split by decade, so that each community matrix is 52 species and roughly 1100 rows. To resolve the issue of rows that are all zeroes, I added a phony "NoFish" species that is only 'caught' when catch for all other species is zero.
I feel like it should converge; it just doesn't. I've got plenty of data, the stress values look reasonable (0.2 for k=2 and 0.16 for k=3), I've upped trymax to 200, and I've used previous.best. I looked online and added noshare, which at least seemed to help it run a little faster. My computer (Windows 10) takes over an hour to run 200 iterations, so I haven't raised trymax beyond that. I've looked at my run results and nothing looks obviously out of place. I don't get any warnings at the end, but I have looked up the sratmax and sfgrmin messages online; unfortunately, all the results were gobbledygook to me.
library(vegan)  # for metaMDS()
stn <- read.csv("STN_CPUE_1970-2019.csv")
temp <- stn[, 13:88]                  # species columns (assumed layout: 12 metadata columns + 76 species)
temp <- temp[, colSums(temp) > 100]   # adjusted for CPUE, removes 24 of 76 species
stn.reduced <- cbind(stn[, 1:12], temp)
stn70r <- stn.reduced[1:1072, ]       # 1970s rows
stn3.nMDS <- metaMDS(stn70r[, 13:64], k = 3, trymax = 200, noshare = 0.1,
                     previous.best = stn3.nMDS)
OUTPUT:
Square root transformation
Wisconsin double standardization
Using step-across dissimilarities:
Too long or NA distances: 61161 out of 574056 (10.7%)
Stepping across 574056 dissimilarities...
Connectivity of distance matrix with threshold dissimilarity 1
Data are connected
Starting from 3-dimensional configuration
Run 0 stress 0.1590091
.
.
Run 103 stress 0.1585801
... New best solution
... Procrustes: rmse 0.005185737 max resid 0.06388682
.
.
Run 200 stress 0.1625358
*** No convergence -- monoMDS stopping criteria:
40: no. of iterations >= maxit
156: stress ratio > sratmax
4: scale factor of the gradient < sfgrmin
Notes: All runs have a stress of around 0.16. When it does spit out a Procrustes result, it's similar to the one above: rmse of 0.005-0.006 and max resid of 0.06-0.07. I've run it multiple times with the same result, and I get the same three messages when I run it without noshare and without the third axis:
stn.nMDS = metaMDS(stn70r[,13:64], trymax = 200, previous.best = stn.nMDS)
So I am not sure that increasing the number of axes to three and adding noshare actually helps. Any advice is greatly appreciated. Thanks!
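(One thing that may be worth trying, not from the original post: metaMDS() passes extra arguments on to monoMDS(), so the three stopping criteria reported in the messages above can be adjusted to let each random start run longer. The specific values below are illustrative only.)
# Sketch only: maxit, sratmax and sfgrmin are monoMDS() arguments forwarded
# through metaMDS(); stricter values make each run iterate longer.
stn3.nMDS <- metaMDS(stn70r[, 13:64], k = 3, trymax = 200, noshare = 0.1,
                     maxit = 1000, sratmax = 0.9999999, sfgrmin = 1e-8,
                     previous.best = stn3.nMDS)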
I'm trying to use R for global sensitivity analysis of a function. I'm completely new to R, so I'm having a hard time understanding the documentation correctly.
I want to use the fast99 method from the sensitivity package, but it returns NaN for 2 of my 4 factors.
I'm using R Studio and the sensitivity package.
My function is
Mtb <- function(Input) {
  alpha <- Input[, 1]
  beta  <- Input[, 2]
  gamma <- Input[, 3]
  nu    <- Input[, 4]
  root        <- 4*beta + alpha^2*gamma + 2*alpha*beta*gamma*nu + beta^2*gamma*nu^2
  denominator <- 2*beta*gamma
  summand     <- alpha*gamma - beta*gamma*nu
  result      <- (summand + sqrt(gamma)*sqrt(root)) / denominator
  return(result)
}
And then I call
library(sensitivity)
factors<-c("alpha","beta", "gamma", "nu")
x<-fast99(Mtb, factors=factors, n=1000, q.arg=list(min=0, max=1))
print(x)
I expect the result to be some number for each factor but it returns
Call:
fast99(model = Mtb, factors = factors, n = 1000, q.arg = list(min = 0, max = 1))
Model runs: 4000
Estimations of the indices:
first order total order
alpha NaN NaN
beta 0.23928895 0.8855446
gamma 0.03075694 0.5991250
nu NaN NaN
This can't be right, since alpha should be important.
I found the problem: if I set the minimum value to 0.001, it works fine (see the sketch below). There seems to be a division-by-zero issue when a parameter is exactly 0, which puzzled me at first because only beta and gamma appear in the denominator.
The problem was having 0 as the minimum value of the parameter ranges.
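(A sketch of the corrected call described above; the only change from the original is the lower bound of the uniform ranges.)
library(sensitivity)
factors <- c("alpha", "beta", "gamma", "nu")
# min = 0.001 instead of min = 0, so beta and gamma can never be exactly zero
x <- fast99(Mtb, factors = factors, n = 1000, q.arg = list(min = 0.001, max = 1))
print(x)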
I have some actual data that I am afraid is somewhat nasty.
It's essentially a Positive Negative Binomial distribution (without any zero counts). However, there are some outliers that seem to cause some bad calculations to occur (maybe underflow or NaNs?) The first 8 or so entries are reasonable, but I'm guessing the last few are causing some problems with the fitting.
Here's the data:
> df
counts t
1 1968 1
2 217 2
3 55 3
4 26 4
5 11 5
6 5 6
7 8 7
8 3 8
9 1 10
10 1 11
11 1 12
12 1 13
13 1 15
14 1 18
15 1 26
16 1 59
This command runs for a while and then spits out the error message
> vglm(counts ~ t, data=df, family = posnegbinomial)
Error in if (take.half.step) { : missing value where TRUE/FALSE needed
BUT, if I rerun this cutting off the outliers, I get a solution for posnegbinomial
> vglm(counts ~ t, data=df[1:9,], family = posnegbinomial)
Call:
vglm(formula = counts ~ t, family = posnegbinomial, data = df[1:9,])
Coefficients:
(Intercept):1 (Intercept):2 t
7.7487404 0.7983811 -0.9427189
Degrees of Freedom: 18 Total; 15 Residual
Log-likelihood: -36.21064
If I try the family pospoisson (Positive Poisson: no zero values), I get a similar error "argument is not interpretable as logical".
I do notice that there are a number of similar questions on Stack Overflow about missing values where TRUE/FALSE is needed, but involving other R packages. This suggests to me that package authors may need to better anticipate that calculations can fail.
I think your proximal problem is that the predicted means for the negative binomial for your extreme values are so close to zero that they are underflowing to zero, in a way that was not anticipated/protected against by the package authors. (One thing to realize about nonlinear optimization/fitting is that it is always possible to break a fitting method by giving it extreme data ...)
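(A quick back-of-the-envelope check, not from the original answer, using the mean-model coefficients from the df[1:9,] fit above: even those coefficients put the fitted mean at the largest t vanishingly close to zero, and the full fit will explore still steeper slopes.)
# Sketch only: intercept and slope for log(munb) taken from the truncated
# vglm() fit shown above; t = 59 is the largest observed t.
eta <- 7.7487404 - 0.9427189 * 59   # linear predictor for log(munb)
exp(eta)                            # roughly 1.6e-21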
I couldn't get this to work in VGAM, but I'll offer a couple of other suggestions. Start by plotting the data on a log scale (dd here is the data frame called df in the question):
plot(log(counts) ~ t, data = dd)
And eyeballing the data to get an initial estimate of parameter values (at least for the mean model):
m0 <- lm(log(counts)~t,data=subset(dd,t<10))
I thought I might be able to get vglm() to work by setting starting values, but that didn't actually pan out, even when I have fairly good values from other platforms (see below).
glmmADMB
The glmmADMB package can handle positive NB, via family="truncnbinom":
library(glmmADMB)
m1 <- glmmadmb(counts~t, data=dd, family="truncnbinom")
(there are some warning messages ...)
bbmle::mle2()
This requires a little bit more work: it failed with the standard model, but works if I set a floor on the predicted mean ...
library(VGAM) ## for dposnegbin
library(bbmle)
m2 <- mle2(counts~dposnegbin(size=exp(logk),
munb=pmax(exp(logeta),1e-7)),
parameters=list(logeta~t),
data=dd,
start=list(logk=0,logeta=0))
Again warning messages.
Compare glmmADMB, mle2, simple truncated lm fit ...
cc <- cbind(coef(m2),
c(log(m1$alpha),coef(m1)),
c(NA,coef(m0)))
dimnames(cc) <- list(c("log_k","log_int","slope"),
c("mle2","glmmADMB","lm"))
## mle2 glmmADMB lm
## log_k 0.8094678 0.8094625 NA
## log_int 7.7670604 7.7670637 7.1747551
## slope -0.9491796 -0.9491778 -0.8328487
This is in principle also possible with glmmTMB, but it runs into the same kinds of problems as vglm() ...