Optimizing parameter values using non-linear least squares in R (with integrals)

Obviously an R (and math) amateur here. I've been working 10+ hours on trying to get this to work, so I thought I'd give posting here a shot.
I have data collected from an experiment with two variables: Iq and q. These data are linear when plotted in log-log space. I am trying to solve for two other variables, por and r, in the following equation:
Iq = SLD^2 * (por/Vra) * integral{Rmin to Rmax}( Vr^2 * f(r) * F dr )
Where:
SLD = known constant
por = unknown
Vra = integral{0 to Inf}( Vr * f(r) dr )
Vr = (4/3)*pi*r^3
Rmin and Rmax = known constants
f(r) = r^(-(1+fd)) / ((Rmin^(-fd) - Rmax^(-fd)) / fd)
r = unknown
fd = known constant
F = (3*(sin(q*r) - q*r*cos(q*r)) / (q*r)^3)^2
I've made many attempts at this, but can't seem to translate the nested definitions into code. This problem used to be solved with an Excel Solver routine that optimized the parameter values by non-linear least squares, but it only works on (imo) Windows 95 Excel, and we're trying to adapt it into a more user-friendly data-processing method. But I'm a geochemist, so basically useless. Any help would be much appreciated! I can include more details if some kind soul out there is willing to help out.
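Not the original Excel routine, just a minimal sketch of one way to structure this in R: each nested definition becomes its own function, integrate() evaluates the integrals numerically, and optim() minimizes the sum of squared log-residuals (since the data are linear in log-log space). Here por is treated as the only free parameter (r above is the integration variable), and the constants are placeholders to replace with your known values:

# Placeholders -- substitute your known constants
SLD  <- 1e-6
fd   <- 2.5
Rmin <- 10
Rmax <- 1000

Vr <- function(r) (4/3) * pi * r^3
f  <- function(r) r^(-(1 + fd)) / ((Rmin^(-fd) - Rmax^(-fd)) / fd)
FF <- function(q, r) (3 * (sin(q*r) - q*r*cos(q*r)) / (q*r)^3)^2

# Vra: the post writes integral{0 to Inf}, but f(r) is only normalized on
# (Rmin, Rmax), so the integral is taken over that range here (an assumption)
Vra <- integrate(function(r) Vr(r) * f(r), Rmin, Rmax)$value

# model intensity at a single q value
Iq_model <- function(q, por) {
  SLD^2 * (por / Vra) *
    integrate(function(r) Vr(r)^2 * f(r) * FF(q, r), Rmin, Rmax)$value
}

# sum of squared residuals in log space
sse <- function(por, q, Iq) {
  pred <- vapply(q, Iq_model, numeric(1), por = por)
  sum((log(Iq) - log(pred))^2)
}

# With the measured vectors q and Iq:
# fit <- optim(0.1, sse, q = q, Iq = Iq, method = "Brent", lower = 0, upper = 1)
# fit$par   # estimated porosity

Further free parameters (the post also lists r as unknown, but r is the integration dummy, so perhaps fd or Rmax is meant) can be added by extending sse() to take a parameter vector and switching to an unconstrained optim() method.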

Related

Using permanova in r to analyse the effect of 3 independent variables on reef systems

I am trying to understand how to run PERMANOVA using adonis2 in R to analyse some data that I have collected. I have been looking online, but as often happens, the explanations are a bit convoluted, so I am asking for your help. I have got some fish and coral groups as columns, as well as 3 independent variables (reef age, depth, and material). [Snapshot of my dataset structure]
I think I have understood that p-values are not the only important bit of the output, and that the R2 values indicate how much each variable contributes to the model. Is there something wrong, or something I am missing here? Also, I think I understood that I should check for homogeneity of variance, but I have not understood whether I should check for it on each variable independently, or include them all in the same bit of code (which does not seem to work). Here is the bit of code that I am using to run the PERMANOVA (1), and the one that I am trying to use to assess homogeneity of variance, which does not work (2).
(1) adonis2(species ~ Age + Material + Depth, data = data.var, by = "margin")
'species' is the subset of the dataset including all the species counts, while 'data.var' is the subset including the 3 independent variables. Also, what is the difference between using '+' and '*' in the formula? When I use '*' it gives me 'Error in qr.X(object$CCA$QR) : need larger value of 'ncol' as pivoting occurred'. What does this mean?
(2) variance.check <- betadisper(species.distance, data.var, type = "centroid", bias.adjust = FALSE)
'species.distance' is a distance matrix calculated through 'vegdist' using the Bray-Curtis method. I used 'data.var' to check variance across all 3 independent variables, but it does not work, while it works if I check them independently (3). Why is that?
(3) variance.check <- betadisper(species.distance, data$Depth, type = "centroid", bias.adjust = FALSE)
Thank you in advance for your responses and your help. It will really help me get my head around this (and sorry for the many questions).
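For the betadisper part specifically, here is a hedged sketch (assuming the 'species' and 'data.var' objects above): betadisper() expects a single grouping factor rather than a data frame of several variables, which would explain why (2) fails while (3) works. One common workaround is to build a combined factor with interaction():

library(vegan)
species.distance <- vegdist(species, method = "bray")   # Bray-Curtis distances

# homogeneity of dispersion for one factor at a time, as in (3) ...
anova(betadisper(species.distance, data.var$Depth, type = "centroid"))

# ... or for the combination of all three factors:
groups <- interaction(data.var$Age, data.var$Material, data.var$Depth)
anova(betadisper(species.distance, groups, type = "centroid"))

As for '+' versus '*' in the adonis2 formula: '+' fits main effects only, while '*' also adds all the interaction terms (Age + Material + Age:Material, and so on). With many interaction cells and few replicates, the model matrix can become rank deficient, which is one plausible reading of the 'pivoting occurred' error.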

R - replicate weight survey

Currently I'm interested in learning how to obtain information from the American Community Survey PUMS files. I have read some of the ACS documentation and found that to use the replicate weights I must use the following formula:
SE(X) = sqrt( (4/80) * sum{r=1 to 80}( (X_r - X)^2 ) )
And thanks to Google I also found that there's the 'survey' package and the svrepdesign function to help me get this done:
https://www.rdocumentation.org/packages/survey/versions/3.33-2/topics/svrepdesign
Now, even though I'm getting into R and learning statistics and have a SQL background, there are two BIG problems:
1 - I have no idea what that formula means, and I would really like to understand it before going any further.
2 - I don't understand how the svrepdesign function works, nor how to use it.
I'm not looking for someone to solve my life/problems, but I would really appreciate it if someone pointed me in the right direction and gave me a jump start.
Thank you for your time.
When you use svrepdesign, you are specifying that the design has replicate weights, and it uses the formula you provided to calculate the standard errors.
The American Community Survey has 80 replicate weights, so it first calculates the statistic you are interested in with the full-sample weights (X), then it calculates the same statistic with each of the 80 replicate weights (X_r).
You should read this: https://usa.ipums.org/usa/repwt.shtml
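For what it's worth, a minimal sketch of the svrepdesign() call for ACS PUMS person records, assuming a data frame pums with the standard column names PWGTP (full-sample weight) and PWGTP1-PWGTP80 (replicate weights):

library(survey)

des <- svrepdesign(
  weights    = ~PWGTP,
  repweights = "PWGTP[0-9]+",   # regular expression matching the 80 replicate columns
  type       = "JK1",
  scale      = 4 / 80,          # the 4/80 factor from the formula above
  rscales    = rep(1, 80),
  mse        = TRUE,            # take deviations from the full-sample estimate X
  data       = pums
)

svymean(~AGEP, des)   # e.g., mean age with replicate-weight standard errors

With mse = TRUE and scale = 4/80, the standard errors come out exactly as in the formula: square the deviations (X_r - X), sum over the 80 replicates, multiply by 4/80, and take the square root.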

Multiple regression lines to define a set of data

I am trying to use a regression model to establish a relationship between two parameters, A and B (more specifically, runtime and workload), so that I can recommend what an optimal workload might be, or say how strongly one affects the other, etc. I am using rlm (robust linear model) for this purpose, since it saves me the trouble of dealing with outliers beforehand.
However, rather than output one single regression model, I would like to determine a band that can confidently explain most of the points. Here is an image I took from the web; those additional red lines are what I want to determine.
This is what I had in mind:
1. Find the mean of the residuals of all the points lying above the line, then shift the original regression line by some multiple of it (mean + k*sigma). The same can be done for the points below the line.
2. In SVM, in order to find the support vectors, we draw parallel lines (essentially shifting the middle line until we find support vectors on either side). I had something like that in mind: play around with the intercepts a little and count the number of points that can be explained by the band, keeping a threshold so you can stop somewhere.
The problem is, I am unable to implement this in R. For that matter, I am not sure these approaches would even work. I would like to know what you would suggest. Also, is there a classic way to do this using one of the many R packages?
Thanks a lot for helping. Appreciate it.
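One way to implement the shifted-band idea, as a sketch (assuming a data frame df with columns workload and runtime):

library(MASS)

fit <- rlm(runtime ~ workload, data = df)   # robust central fit

res  <- resid(fit)
band <- quantile(res, c(0.05, 0.95))   # residual quantiles: band covering ~90% of points

plot(df$workload, df$runtime)
abline(fit)                                                 # central line
abline(coef(fit)[1] + band[1], coef(fit)[2], col = "red")   # lower edge
abline(coef(fit)[1] + band[2], coef(fit)[2], col = "red")   # upper edge

As for a classic packaged approach: quantile regression does essentially this; quantreg::rq(runtime ~ workload, tau = c(0.05, 0.95)) fits the two red lines directly (and they need not be parallel to the central fit).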

R equivalent to matlab griddata, scatteredInterpolant, and/or TriScatteredInterp

We do a lot of full-field 3D numerical simulations (CFD, FEA, etc.). The solutions take a long time to run. We often interpolate from existing solutions rather than rerun every case. We also interpolate between multiple solutions, which leads to even higher-dimensional interpolation (like adding time, so x,y,z,t,v).
Matlab does a great job of reading data V at an irregular grid of X,Y,Z coordinates and interpolating from V using griddata, scatteredInterpolant, and/or TriScatteredInterp. For a variety of reasons, I've switched to R. This remains one key area where I've not been able to find an equally good R equivalent: 'akima' only does x,y,V (not x,y,z,V, much less even higher dimensions like x,y,z,t,v).
The next best thing I've found has been kriging. But kriging behaves more like model fitting and projection, and often does not behave well between irregular grid points, so it's not nearly as robust as simple direct linear interpolation.
Matlab has had griddata for several decades. It's hard to believe R doesn't have an equivalent out there. Any suggestions? Or is there at least a way to use kriging to yield effectively the same result as direct linear interpolation?
Jonathan
You might start by looking at the package "tripack" to do Delaunay triangulation, which gives you the first step in duplicating scatteredInterpolant().
interpp() (from the 'akima' package) is the R equivalent of MATLAB's scatteredInterpolant() for scattered bivariate data.
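A small sketch of interpp() on scattered bivariate data (as noted in the question, akima covers x,y,V only; for higher dimensions, linear interpolation on a Delaunay triangulation, e.g. via the 'geometry' package, is one option):

library(akima)

# scattered sample of a known surface
set.seed(1)
x <- runif(200)
y <- runif(200)
V <- sin(pi * x) * cos(pi * y)

# pointwise linear interpolation at new (xi, yi) locations
xi <- c(0.2, 0.5, 0.8)
yi <- c(0.3, 0.7, 0.1)
interpp(x, y, V, xo = xi, yo = yi, linear = TRUE)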

Why is there a non semi positive definite leading minor when applying functional pca on curves where the period=2 in the fourier basis?

I am using the fda package for R to generate a random sample of curves. More specifically, I am using a Fourier series with a change in the period to represent the specific structure I need. Defining the sample works fine, but I encounter a problem when the number of basis functions is sufficiently large and I want to apply pca.fd to the sample. The error is:
"leading minor of order [... e.g. 24] is not semi positive definite."
I am wondering why this happens, and if there is a way to circumvent it. Obviously, it's a numerical or statistical issue rather than a pure coding problem. But all my assigned coefficients are iid, and the Fourier basis provides orthogonal functions. Moreover, everything works totally fine when the period is set to its default level. So what goes wrong with period=2?
I am happy for any hint on the issue. Thank you very much in advance!
Here is some code to reproduce the error:
library(fda)
nc <- 40   # number of curves
nb <- 101  # number of basis functions
coefm <- matrix(rnorm(nb * nc), nrow = nb, ncol = nc)  # random coefficient matrix
# basis function object with "normal" period:
mybase <- create.fourier.basis(rangeval = c(0, 1), nbasis = nb, period = 1)
# generate the sample of curves:
fdobj <- fd(coefm, mybase)
# principal component analysis:
pca.fd(fdobj)  # should work, even though the number of basis functions is large
# now change the period of the fourier basis object:
mybase <- create.fourier.basis(rangeval = c(0, 1), nbasis = nb, period = 2)
fdobj <- fd(coefm, mybase)
pca.fd(fdobj)  # here is the error; however, this does not happen with nb < 20
Unfortunately, there is no easy, straightforward answer. I contacted the maintainers of the fda package, and after investigating the issue they let me know that what I was doing was simply very bad in terms of what happens to the curves I generated. Apparently, the curves are so similar on some parts of their range that the PCA runs into numeric difficulties. So, to close this issue: I don't expect a better answer anymore.
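For what it's worth, one plausible numerical explanation (an inference consistent with the maintainers' description, not their confirmed diagnosis): with rangeval = c(0, 1) and period = 2, only half a period is observed, so the Fourier functions are no longer orthogonal over the range, and the basis Gram matrix becomes nearly singular as nbasis grows. A quick check:

library(fda)

nb <- 101
base1 <- create.fourier.basis(rangeval = c(0, 1), nbasis = nb, period = 1)
base2 <- create.fourier.basis(rangeval = c(0, 1), nbasis = nb, period = 2)

kappa(inprod(base1, base1))  # full period observed: small condition number
kappa(inprod(base2, base2))  # half period observed: huge condition number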
