R: SVM using a custom (user-defined) kernel is not working in kernlab

I'm trying to use a user-defined kernel. I know that kernlab offers user-defined (custom) kernel functions in R. I used the spam data set included in the kernlab package
(number of variables = 57, number of examples = 4061).
I defined the kernel as follows:
kp <- function(d, e) {
  # v is taken from the enclosing environment: a scalar or per-variable scaling vector
  as <- v * d
  bs <- v * e
  cs <- as.matrix(as - bs)
  exp(-(norm(cs, "F")^2) / 2)
}
class(kp) <- "kernel"
It is a transformed Gaussian kernel, where v is a vector of continuously varying values, the inverse of the standard deviation of each variable, for example:
v = (0.1666667, ........, 0.1666667)
The training set is 60% of the spam data (preserving the proportions of the different classes), and the class label is coded as 1 when an observation's type is spam.
m <- ksvm(xtrain, ytrain, type = "C-svc", kernel = kp, C = 10)
But this step never finishes; it just keeps running without returning a result. Why is that? Is it because the number of examples is too big? Is there any other R package that can train SVMs with a user-defined kernel?

First, your kernel looks like a classic RBF (Gaussian) kernel, with v = 1/sigma, so why define it by hand? You can use the built-in RBF kernel and simply set the sigma parameter. In particular, instead of using the Frobenius norm on matrices you could use the ordinary Euclidean norm on the vectorized data.
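For instance, a minimal sketch of that suggestion, assuming the xtrain, ytrain and v from the question; the correspondence sigma = v^2/2 follows from equating exp(-||v*(d-e)||^2/2) with kernlab's rbfdot form exp(-sigma*||d-e||^2):
library(kernlab)
# Scalar v: the custom kernel equals the built-in RBF kernel with sigma = v^2 / 2
rbf <- rbfdot(sigma = v^2 / 2)
m <- ksvm(xtrain, ytrain, type = "C-svc", kernel = rbf, C = 10)
# Per-variable v: scale the columns of the data first, then use sigma = 1/2
xs <- sweep(xtrain, 2, v, `*`)
m2 <- ksvm(xs, ytrain, type = "C-svc", kernel = rbfdot(sigma = 1/2), C = 10)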
Second, the code itself works just fine:
> xtrain = as.matrix( c(1,2,3,4) )
> ytrain = as.factor( c(0,0,1,1) )
> v= 0.01
> m=ksvm(xtrain,ytrain,type="C-svc",kernel=kp,C=10)
> m
Support Vector Machine object of class "ksvm"
SV type: C-svc (classification)
parameter : cost C = 10
Number of Support Vectors : 4
Objective Function Value : -39.952
Training error : 0
There are at least two reasons why you are still waiting for results:
RBF kernels induce one of the hardest optimization problems for an SVM (especially for large C).
User-defined kernels are far less efficient than the built-in ones.
As I am not sure whether ksvm actually optimizes the user-defined kernel computation (in fact I'm pretty sure it does not), you could build the kernel matrix yourself (K[i,j] = K(x_i, x_j), where x_i is the i-th training vector) and provide ksvm with it. You can achieve this by
K <- kernelMatrix(kp, xtrain)
m <- ksvm(K, ytrain, type = "C-svc", kernel = "matrix", C = 10)
Precomputing the kernel matrix can take quite a long time, but the optimization itself will then be much faster, so this is a good method if you want to test many different C values (which you certainly should do). Unfortunately it requires O(n^2) memory, so with more than 100 000 vectors you will need a very large amount of RAM.
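As a rough sketch of that workflow, assuming the kp, xtrain and ytrain from the question (cross() is kernlab's accessor for the cross-validation error of a fitted model):
library(kernlab)
K <- kernelMatrix(kp, xtrain)   # pay the O(n^2) cost once
for (C in c(0.1, 1, 10, 100)) {
  m <- ksvm(K, ytrain, type = "C-svc", kernel = "matrix", C = C, cross = 5)
  cat("C =", C, " 5-fold CV error =", cross(m), "\n")
}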

Related

How can OpenMDAO be used to solve a linear system of equations without inverting the A matrix?

I have a system of equations that is in the form:
Ax = b
Where A and b are a mixture of known states and state rates derived from earlier components, and x is a vector of four as-yet-unknown state rates. I've used Matlab to linearise the problem; all I need to do now is create some components to find x. However, the inverse of A is large in terms of the number of variables in each index, so I can't just turn this into a straightforward linear equation. Could someone suggest a route to take?
I don't fully understand what you mean by "the inverse of A is large in terms of the number of variables in each index"; however, I think you mean that the inverse of A is too large and dense to compute and store in memory.
OpenMDAO or not, when you run into this situation you are forced to use an iterative linear solver such as GMRES. So that is broadly the approach that is needed here too.
OpenMDAO does have a LinearSystemComponent that you can use as a rough blueprint here. However, it computes and stores a factorization, which is not what you want. Regardless, it gives you the blueprint for how to represent a linear system as an implicit component in OpenMDAO.
Broadly, you have to think of defining a linear residual:
R = Ax-b = 0
Your component will have two inputs, A and b, and one output, x.
The two key methods here are apply_nonlinear and solve_nonlinear. I realize that the word nonlinear in the method names is confusing. OpenMDAO assumes that the analysis is nonlinear. In your case it happens to be linear, but you use the nonlinear methods all the same.
I will assume that, although you can't compute/store [A] inverse you can compute/store A (perhaps in a sparse format). In that case you might pass the sparse data array of [A] as the input and fill the sparse matrix as needed from that.
The apply_nonlinear method would look like this:
def apply_nonlinear(self, inputs, outputs, residuals):
    """
    R = Ax - b.

    Parameters
    ----------
    inputs : Vector
        unscaled, dimensional input variables read via inputs[key]
    outputs : Vector
        unscaled, dimensional output variables read via outputs[key]
    residuals : Vector
        unscaled, dimensional residuals written to via residuals[key]
    """
    residuals['x'] = inputs['A'].dot(outputs['x']) - inputs['b']
The key to your question is really the solve_nonlinear method. It would look something like this (using scipy's gmres):
from scipy.sparse.linalg import gmres

def solve_nonlinear(self, inputs, outputs):
    """
    Use scipy's gmres to iteratively solve Ax = b for x.

    Parameters
    ----------
    inputs : Vector
        unscaled, dimensional input variables read via inputs[key]
    outputs : Vector
        unscaled, dimensional output variables read via outputs[key]
    """
    x, exitCode = gmres(inputs['A'], inputs['b'])
    outputs['x'] = x

Parallel processing or optimization of latent class analysis

I am using the poLCA package to run a latent class analysis (LCA) on data with 450,000 observations and 114 variables. As with most latent class analyses, I will need to run this multiple rounds for different numbers of classes. Each run takes about 12-20 hours depending on the number of classes selected.
Is there a way for me to use parallel processing to run this more efficiently? Otherwise, are there other ways to optimize this?
#Converting binary variables to 1 and 2
lca_dat1=lca_dat1+1
#Formula for LCA
f<-cbind(Abdominal_hernia,Abdominal_pain,
Acute_and_unspecified_renal_failure,Acute_cerebrovascular_disease,
Acute_myocardial_infarction,Administrative_social_admission,
Allergic_reactions,Anal_and_rectal_conditions,
Anxiety_disorders,Appendicitis_and_other_appendiceal_conditions,
Asthma,Bacterial_infection_unspecified_site,
Biliary_tract_disease,Calculus_of_urinary_tract,
Cancer_of_breast,Cardiac_dysrhythmias,
Cataract,Chronic_obstructive_pulmonary_disease_and_bronchiectasis,
Chronic_renal_failure,Chronic_ulcer_of_skin,
Coagulation_and_hemorrhagic_disorders,Coma_stupor_and_brain_damage,
Complication_of_device_implant_or_graft,Complications_of_surgical_procedures_or_medical_care,
Conditions_associated_with_dizziness_or_vertigo,Congestive_heart_failure_nonhypertensive,
Coronary_atherosclerosis_and_other_heart_disease,Crushing_injury_or_internal_injury,
Deficiency_and_other_anemia,Delirium_dementia_and_amnestic_and_other_cognitive_disorders,
Disorders_of_lipid_metabolism,Disorders_of_teeth_and_jaw,
Diverticulosis_and_diverticulitis,E_Codes_Adverse_effects_of_medical_care,
E_Codes_Adverse_effects_of_medical_drugs,E_Codes_Fall,
Epilepsy_convulsions,Esophageal_disorders,
Essential_hypertension,Fever_of_unknown_origin,
Fluid_and_electrolyte_disorders,Fracture_of_lower_limb,
Fracture_of_upper_limb,Gastritis_and_duodenitis,
Gastroduodenal_ulcer_except_hemorrhage,Gastrointestinal_hemorrhage,
Genitourinary_symptoms_and_illdefined_conditions,Gout_and_other_crystal_arthropathies,
Headache_including_migraine,Heart_valve_disorders,
Hemorrhoids,Hepatitis,Hyperplasia_of_prostate,
Immunizations_and_screening_for_infectious_disease,
Inflammation_infection_of_eye_except_that_caused_by_tuberculosis_or_sexually_transmitteddisease,Inflammatory_diseases_of_female_pelvic_organs,
Intestinal_infection,Intracranial_injury,
Joint_disorders_and_dislocations_traumarelated,Late_effects_of_cerebrovascular_disease,
Medical_examination_evaluation,Menstrual_disorders,
Mood_disorders,Nausea_and_vomiting,
Neoplasms_of_unspecified_nature_or_uncertain_behavior,Nephritis_nephrosis_renal_sclerosis,
Noninfectious_gastroenteritis,Nonspecific_chest_pain,
Nutritional_deficiencies,Open_wounds_of_extremities,
Open_wounds_of_head_neck_and_trunk,Osteoarthritis,
Other_aftercare,Other_and_unspecified_benign_neoplasm,
Other_circulatory_disease,
Other_connective_tissue_disease,
Other_diseases_of_bladder_and_urethra,Other_diseases_of_kidney_and_ureters,
Other_disorders_of_stomach_and_duodenum,Other_ear_and_sense_organ_disorders,
Other_endocrine_disorders,Other_eye_disorders,
Other_female_genital_disorders,Other_fractures,
Other_gastrointestinal_disorders,Other_infections_including_parasitic,
Other_injuries_and_conditions_due_to_external_causes,Other_liver_diseases,
Other_lower_respiratory_disease,Other_nervous_system_disorders,
Other_nontraumatic_joint_disorders,Other_nutritional_endocrine_and_metabolic_disorders,
Other_screening_for_suspected_conditions_not_mental_disorders_or_infectious_disease,
Other_skin_disorders,Other_upper_respiratory_disease,
Other_upper_respiratory_infections,Paralysis,
Pleurisy_pneumothorax_pulmonary_collapse,Pneumonia_except_that_caused_by_tuberculosis_or_sexually_transmitted_disease,
Poisoning_by_other_medications_and_drugs,Respiratory_failure_insufficiency_arrest_adult,
Retinal_detachments_defects_vascular_occlusion_and_retinopathy,Screening_and_history_of_mental_health_and_substance_abuse_codes,
Secondary_malignancies,Septicemia_except_in_labor,
Skin_and_subcutaneous_tissue_infections,Spondylosis_intervertebral_disc_disorders_other_back_problems,
Sprains_and_strains,Superficial_injury_contusion,
Syncope,Thyroid_disorders,Urinary_tract_infections)~1
#LCA for 1 class
lca1<-poLCA(f,lca_dat1,nclass=1,maxiter=3000,tol=1e-7,graph=F,nrep=5)
#LCA for 2 classes
lca2<-poLCA(f,lca_dat1,nclass=2,maxiter=3000,tol=1e-7,graph=T,nrep=5)
##Extract maximum posterior probability
posterior_lca2 <- as.data.frame(lca2$posterior)
posterior_lca2$max_pos <- apply(posterior_lca2, 1, max)
##Check how many maximum posterior probabilities fall above 0.7
table(posterior_lca2$max_pos > 0.7)
#LCA for 3 classes
lca3<-poLCA(f,lca_dat1,nclass=3,maxiter=3000,tol=1e-7,graph=T,nrep=5)
##Extract maximum posterior probability
posterior_lca3 <- as.data.frame(lca3$posterior)
posterior_lca3$max_pos <- apply(posterior_lca3, 1, max)
##Check how many maximum posterior probabilities fall above 0.7
table(posterior_lca3$max_pos > 0.7)
...
You can create a list with the different configurations you want to use. Then use either one of the *apply functions from the parallel package or %dopar% from foreach. Which parallel backend you can/should use depends on your OS.
Here is an example with foreach:
library(foreach)
library(doParallel)

cl <- makeCluster(detectCores() - 1)  # the proper backend depends on the OS
registerDoParallel(cl)

foreach(nclass = 1:10) %dopar% {
  # do something with nclass
  sqrt(nclass)
}

stopCluster(cl)
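And a hedged sketch of how the actual poLCA runs could be farmed out, one number of classes per task (f and lca_dat1 are assumed to be defined as in the question; .packages = "poLCA" loads the package on each worker):
library(foreach)
library(doParallel)

cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

# Fit poLCA for 2-6 classes in parallel; foreach exports f and lca_dat1 to the workers
lca_models <- foreach(nclass = 2:6, .packages = "poLCA") %dopar% {
  poLCA(f, lca_dat1, nclass = nclass, maxiter = 3000, tol = 1e-7,
        graphs = FALSE, nrep = 5, verbose = FALSE)
}
stopCluster(cl)

# Compare the fits, e.g. by BIC
sapply(lca_models, function(m) m$bic)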
Here are my (not too brief or compact) thoughts on this; they are less than exact. I have never used anywhere near so many manifest variables with poLCA, and I think you may be breaking some interesting computational ground by doing so. I use poLCA to predict electoral outcomes per voter (red, blue, purple); I can be wrong about that and not suffer a malpractice suit. I really don't know about the risks of using LCA in health analysis. I think of LCA as more of a social-sciences tool, though I could be wrong about that as well. Anyway:
(1) I believe you want to look for the most "parsimonious" manifest variables that produce a latent class, and limit them to a reduced subset that proves the most useful for all your data. That will help with CPU optimization. I have found personally that using manifests that are exceptionally "monotonic" is not necessarily a good thing by default, although experimenting with factors that are more or less "monotonic" does tell you something about your model.
I have found it more "machine learning" friendly/responsible to use the most widespread manifests and "sample split" the data into groups, recombining the posteriors after the LCA runs. This assumes that the most widespread factors affect different subgroups quantitatively but with variance across sample groups (e.g. red, blue, purple). I don't know that anyone else does this, but I gave up trying to build the "one LCA model that rules them all" from voter-database information. That didn't work.
(2) The poLCA library (like most latent class analysis) depends upon matrix multiplication. I have found poLCA more CPU-bound than memory-bound, but with 114 manifests you may experience bottlenecks at every nook and cranny of your motherboard. Whatever you can do to increase matrix-multiplication efficiency helps. I believe I have found that Microsoft R Open's use of Intel's MKL is more efficient than the default CRAN numeric library. Sorry, I haven't completely tested that, nor do I understand why some numeric libraries might be more efficient than others for matrix multiplication. I only know that Microsoft R Open brags about this some, and it appears to me they have a point with MKL.
(3) Reworking your LCA code around Matt Dowle's data.table library shows me efficiencies across the board in all my work. I create 'dat' as a data.table and iterate to find the best-optimized data.table usage for the poLCA runs and posteriors. Combining data.table efficiency with some of Hadley Wickham's improved *ply functions (plyr library), which put the LCA runs into lists, works well for me:
rbindlist(plyr::llply(1:10, check_lc_pc))  # check_lc_pc is a poLCA wrapper function
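For concreteness, a minimal sketch of what such a check_lc_pc wrapper might look like; the body below is an assumed reconstruction (the original helper is not shown), using the formula f and a data.table dat as above:
library(data.table)
library(poLCA)

# Assumed reconstruction: fit one poLCA model for a given number of classes and
# return its fit statistics as a one-row data.table
check_lc_pc <- function(nclass) {
  lc <- poLCA(f, dat, nclass = nclass, nrep = 1, verbose = FALSE)
  data.table(nclass = nclass, llik = lc$llik, aic = lc$aic, bic = lc$bic)
}

rbindlist(plyr::llply(1:10, check_lc_pc))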
(4) This is a simple tip (maybe even condescending), but you don't need to print all the standard-error output once you are satisfied with your model, so set verbose = FALSE. Also, by making regular test runs, I can determine the best-optimized starting probabilities ('probs.start') for my model and reuse them:
lc <- poLCA(f,dat,nrep=1,probs.start=probs.start.new,verbose=FALSE)
poLCA produces a lot of output to the screen by default. Creating a poLCA wrapper function with verbose = FALSE, together with a byte-compiled R (3.5+), cuts that overhead.
(5) I use Windows 10, and because of a fast SSD, fast DDR, and Microsoft "memory compression" I think I notice that the Windows 10 OS adapts to LCA runs with lots of "memory compression". I assume that it is holding the same matrices in compressed memory because I am calling them repeatedly over time. I really like the Kaby Lake processors that "self over-clock"; I see my 7700HQ taking advantage of that during LCA runs. (It would seem that LCA runs would benefit from overclocking. I don't like to overclock my processor on my own; that's too much risk for me.) I think it is useful to monitor the memory use of your LCA runs from another R console with system calls to PowerShell and cmd memory-management functions. The one below lists the hidden "Memory Compression" process(!!):
ps_f <- function() { system("powershell -ExecutionPolicy Bypass -command $t1 = ps | where {$_.Name -EQ 'RGui' -or $_.Name -EQ 'Memory Compression'};
$t2 = $t1 | Select {
$_.Id;
[math]::Round($_.WorkingSet64/1MB);
[math]::Round($_.PrivateMemorySize64/1MB);
[math]::Round($_.VirtualMemorySize64/1MB) };
$t2 | ft * "); }
ps_all <- function() { ps(); ps_e(); ps_f() }  # ps() and ps_e() are other helpers from my own setup
I also have this memory-management function for the session used for the LCA runs, though of course it runs before or after them:
memory <- function() {
  as.matrix(list(
    paste0(shell('systeminfo | findstr "Memory"')),  # Windows
    paste0("R Memory size (malloc) available: ", memory.size(TRUE), " MB"),
    paste0("R Memory size (malloc) in use: ", memory.size(), " MB"),
    paste0("R Memory limit (total alloc): ", memory.limit(), " MB")
  ))
}
There is work on optimization methods for latent class analysis. I post a link here, but I don't think it helps us today as users of poLCA or LCA: http://www.mat.univie.ac.at/~neum/ms/fuchs-coap11.pdf. Still, the discussion is good background. There is nothing simple about poLCA. This document by the developers is worth reading at least twice: http://www.sscnet.ucla.edu/polisci/faculty/lewis/pdf/poLCA-JSS-final.pdf
If anyone else has any thoughts on poLCA or LCA optimization, I would appreciate further discussion as well. Once I started predicting voter outcomes for an entire state, as opposed to my county, I had to think about optimization and the limits of poLCA and LCA/LCR.
Nowadays there is a parallelized C++-based implementation of poLCA, named poLCAParallel, at https://github.com/QMUL/poLCAParallel. For me it was much, much faster than the base package.

Equation of rbfKernel in kernlab is different from the standard?

I have observed that kernlab computes its rbf kernel as
rbf(x,y) = exp(-sigma * euclideanNorm(x-y)^2)
but according to this wiki link, the rbf kernel should be of the form
rbf(x,y) = exp(-euclideanNorm(x-y)^2/(2*sigma^2))
which is also more intuitive since two close samples with a large kernel sigma value will lead to a higher similarity matching.
I am not sure what e1071's svm uses (the native libsvm code?).
I hope someone can enlighten me on why there is a difference. I caught this because I was initially using e1071, switched to ksvm, and saw inconsistent results between the two.
A small example for comparison:
library(kernlab)
set.seed(123)
x <- rnorm(3)
y <- rnorm(3)
sigma <- 100
rbf <- rbfdot(sigma = sigma)
rbf(x, y)
exp(-sum((x - y)^2) / (2 * sigma^2))
I would expect the kernel value to be close to 1 (since x,y come from sigma=1, while kernel sigma=100). This is observed only in the second case.
I came across that discrepancy too, and I wound up digging into the source to figure out whether there was a typo in the documentation or what exactly was going on, since sigma in the context of Gaussians traditionally appears as the standard deviation in the denominator, right?
Here's the relevant source, kernlab\R\kernels.R:
## Define the kernel objects,
## functions with an additional slot for the kernel parameter list.
## kernel functions take two vector arguments and return a scalar (dot product)
rbfdot <- function(sigma = 1)
{
  rval <- function(x, y = NULL)
  {
    if(!is(x,"vector")) stop("x must be a vector")
    if(!is(y,"vector") && !is.null(y)) stop("y must a vector")
    if (is(x,"vector") && is.null(y)){
      return(1)
    }
    if (is(x,"vector") && is(y,"vector")){
      if (!length(x)==length(y))
        stop("number of dimension must be the same on both data points")
      return(exp(sigma*(2*crossprod(x,y) - crossprod(x) - crossprod(y))))
      # sigma/2 or sigma ??
    }
  }
  return(new("rbfkernel",.Data=rval,kpar=list(sigma=sigma)))
}
You can see from their comment "sigma/2 or sigma ??" that they may have been a bit unsure about which convention to adopt; the presence of /2 would be consistent with the standard-deviation form exp(-||x-y||^2/(2*sigma^2)), but I had to speculate about this.
Now another corroborating piece of evidence is in the help page for ? rbfdot which reads...
sigma The inverse kernel width used by the Gaussian the Laplacian,
the Bessel and the ANOVA kernel
And that is consistent with the form they use, with sigma in the numerator, since in the denominator it would scale proportionally with the width of the Gaussian. So it indeed looks like they settled on the convention that is described in the Wikipedia article as the gamma form, where they say
An equivalent, but simpler, definition involves a parameter gamma =
-1/(2*sigma^2)
So the difference just seems to be a matter of adopting different but equivalent conventions. One motivation for this particular convention (which someone may confirm in a comment) may be code reuse and consistency: as you can see, the same parameter is used by three other kernel forms whose parameters may more traditionally be set in the numerator. I'm not sure on that point, however, since I've never used those alternate kernels and am unfamiliar with them.
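A quick numeric check of the two conventions (a sketch only; the conversion sigma_kernlab = 1/(2*sd^2) comes from equating exp(-sigma*||x-y||^2) with exp(-||x-y||^2/(2*sd^2))):
library(kernlab)
set.seed(123)
x <- rnorm(3)
y <- rnorm(3)

sd_gauss <- 100                    # "standard deviation" convention
sigma_kl <- 1 / (2 * sd_gauss^2)   # equivalent kernlab inverse-width parameter

rbf <- rbfdot(sigma = sigma_kl)
rbf(x, y)                          # same value as the textbook form below
exp(-sum((x - y)^2) / (2 * sd_gauss^2))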

Kernel for classification of variable length sequences of factors in kernlab

What is the best approach for defining a suitable kernel for classification of variable-length sequences of factors? I'm using kernlab with R.
Thanks!
There is no generally good way. Variable-length factor sequences mean that there is no dimension-to-dimension relation, so the suitable kernel function is fully data- (problem-) dependent.
However, the most basic approach, assuming that your factors are just elements of some big set, is to use a Jaccard-inspired (intersection) kernel,
K(A, B) = |A ∩ B|
which simply measures the size of the intersection. It is easy to prove that it is a valid kernel: think of the kernel projection phi(A) which encodes the set A as a bit vector with a 1 in the i-th dimension iff the i-th element of the universe (from which A is sampled) is contained in A. K then defines a regular scalar product of such vectors.
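A hedged sketch of how such an intersection kernel could be precomputed and passed to ksvm; seqs is assumed to be a list of factor/character vectors and y their class labels (neither comes from the question):
library(kernlab)

# Intersection kernel K(A, B) = |A ∩ B| over a list of variable-length sequences
intersection_kernel_matrix <- function(seqs) {
  n <- length(seqs)
  K <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in i:n) {
      K[i, j] <- K[j, i] <- length(intersect(unique(seqs[[i]]), unique(seqs[[j]])))
    }
  }
  as.kernelMatrix(K)
}

K <- intersection_kernel_matrix(seqs)
m <- ksvm(K, y, type = "C-svc", kernel = "matrix", C = 1)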
You should also read about:
Dynamic Time Warping (DTW) inspired kernels (with positive semi-definiteness constraints, such as global alignment kernels).
String kernels, usually used for DNA sequence analysis (see the spectrum kernel, mismatch kernel, ...).

How to handle boundary constraints when using `nls.lm` in R

I asked this question a while ago. I am not sure whether I should post this as an answer or a new question. I do not have an answer, but I "solved" the problem by applying the Levenberg-Marquardt algorithm using nls.lm in R, and when the solution is at the boundary, I run the trust-region-reflective algorithm (TRR, implemented in R) to step away from it. Now I have new questions.
From my experience, done this way the program reaches the optimum and is not so sensitive to the starting values. But this is only a practical workaround for the issues I encountered using nls.lm and other optimization functions in R. I would like to know why nls.lm behaves this way for optimization problems with boundary constraints, and how to handle boundary constraints with nls.lm in practice.
Below I give an example illustrating the two issues with nls.lm:
It is sensitive to starting values.
It stops when some parameter reaches the boundary.
A Reproducible Example: FOCUS Dataset D
library(devtools)
install_github("KineticEval","zhenglei-gao")
library(KineticEval)
data(FOCUS2006D)
km <- mkinmod.full(
  parent = list(type = "SFO",
                M0 = list(ini = 0.1, fixed = 0, lower = 0.0, upper = Inf),
                to = "m1"),
  m1 = list(type = "SFO"),
  data = FOCUS2006D)
system.time(Fit.TRR <- KinEval(km,evalMethod = 'NLLS',optimMethod = 'TRR'))
system.time(Fit.LM <- KinEval(km,evalMethod = 'NLLS',optimMethod = 'LM',ctr=kingui.control(runTRR=FALSE)))
compare_multi_kinmod(km,rbind(Fit.TRR$par,Fit.LM$par))
dev.print(jpeg,"LMvsTRR.jpeg",width=480)
The differential equations that describes the model/system is:
"d_parent = - k_parent * parent"
"d_m1 = - k_m1 * m1 + k_parent * f_parent_to_m1 * parent"
In the graph, on the left is the model with initial values, in the middle is the fitted model using "TRR" (similar to the algorithm in Matlab's lsqnonlin function), and on the right is the fitted model using "LM" with nls.lm. Looking at the fitted parameters (Fit.LM$par) you will find that one fitted parameter (f_parent_to_m1) is at the boundary, 1. If I change the starting value of the parameter M0_parent from 0.1 to 100, then I get the same results from nls.lm and lsqnonlin. I have many cases like this one.
newpars <- rbind(Fit.TRR$par,Fit.LM$par)
rownames(newpars)<- c("TRR(lsqnonlin)","LM(nls.lm)")
newpars
M0_parent k_parent k_m1 f_parent_to_m1
TRR(lsqnonlin) 99.59848 0.09869773 0.005260654 0.514476
LM(nls.lm) 84.79150 0.06352110 0.014783294 1.000000
Apart from the above problems, it often happens that the Hessian returned by nls.lm is not invertible (especially when some parameters are on the boundary), so I cannot get an estimate of the covariance matrix. On the other hand, the "TRR" algorithm (in Matlab) almost always gives an estimate by calculating the Jacobian at the solution point. I think this is useful, but I am also sure that the R optimization algorithms (the ones I have tried) do not do this for a reason. I would like to know whether I am wrong to use the Matlab way of calculating the covariance matrix to get standard errors for the parameter estimates.
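(For reference, a sketch of the lsqnonlin-style covariance estimate referred to here, cov ≈ s^2 * solve(t(J) %*% J) evaluated at the solution; res is assumed to be an nls.lm fit and jac_fn a user-supplied function returning the Jacobian of the residual vector - neither name comes from the original code.)
library(minpack.lm)

# res: result of nls.lm(); jac_fn(par): Jacobian of the residuals at par (assumed helpers)
J <- jac_fn(res$par)
dof <- length(res$fvec) - length(res$par)   # residual degrees of freedom
s2 <- res$deviance / dof                    # residual variance estimate
covar <- s2 * solve(crossprod(J))           # fails when t(J) %*% J is singular, e.g. at a boundary
std_err <- sqrt(diag(covar))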
One last note: I claimed in my previous post that Matlab's lsqnonlin outperforms R's optimization functions in almost all cases. I was wrong. The "Trust-Region-Reflective" algorithm used in Matlab is in fact slower (sometimes much slower) when implemented in R, as you can see from the above example. However, it is still more stable and reaches a better solution than R's basic optimization algorithms.
First off, I am not an expert on Matlab and Optimisation and have never used R.
I am not sure I see what your actual question is, but maybe I can shed some light into your puzzlement:
LM is a slightly enhanced Gauß-Newton approach - for problems with several local minima it is very sensitive to initial states. Including boundaries typically generates more of those minima.
TRR is akin to LM, but more robust. It has better capabilities for "jumping out of" bad local minima. It is quite feasible that it will behave better, but perform worse, than LM. Actually explaining why is very hard; you would need to study the algorithms in detail and look at how they behave in this situation.
I cannot explain the difference between Matlab's and R's implementation, but there are several extensions to TRR that maybe Matlab uses and R does not.
Does your approach of using LM and TRR alternatingly converge better than TRR alone?
Using the mkin package, you can find the parameters using the "Port" algorithm (which is also a kind of TRR algorithm, as far as I can tell from its documentation) or the "Marq" algorithm, which uses nls.lm in the background. Then you can use "normal" starting values or "bad" starting values.
library(mkin)
packageVersion("mkin")
Recent mkin versions can speed up the process considerably, as they compile the models from automatically generated C code if a compiler is available on your system (e.g. you have r-base-dev installed on Debian/Ubuntu, or Rtools on Windows).
This defines the model:
m <- mkinmod(parent = mkinsub("SFO", "m1"),
m1 = mkinsub("SFO"),
use_of_ff = "max")
You can check that the differential equations are correct:
cat(m$diffs, sep = "\n")
Then we fit in four variants, Port and LM, with or without M0 fixed to 0.1:
f.Port = mkinfit(m, FOCUS_2006_D)
f.Port.M0 = mkinfit(m, FOCUS_2006_D, state.ini = c(parent = 0.1, m1 = 0))
f.LM = mkinfit(m, FOCUS_2006_D, method.modFit = "Marq")
f.LM.M0 = mkinfit(m, FOCUS_2006_D, state.ini = c(parent = 0.1, m1 = 0),
                  method.modFit = "Marq")
Then we look at the results:
results <- sapply(list(Port = f.Port, Port.M0 = f.Port.M0, LM = f.LM, LM.M0 = f.LM.M0),
                  function(x) round(summary(x)$bpar[, "Estimate"], 5))
which are
Port Port.M0 LM LM.M0
parent_0 99.59848 99.59848 99.59848 39.52278
k_parent 0.09870 0.09870 0.09870 0.00000
k_m1 0.00526 0.00526 0.00526 0.00000
f_parent_to_m1 0.51448 0.51448 0.51448 1.00000
So we can see that the Port algorithm finds the best solution (to the best of my knowledge) even with bad starting values. The speed issue that one may have with more complicated models is alleviated by the automatic generation of C code.
