I have the x and y values below; as you can see, x is mostly negative, so I essentially only have the left side of the PDF of my observed data.
I need to fit a Student's t distribution to it and estimate the degrees of freedom and the scale parameter.
The problem is that the estimated distribution will have a very small variance (i.e. a small scale parameter), so when I use the method below to fit the distribution, nls fails to converge no matter what starting values I set.
I use an extra parameter c in the code below because I rescale the distribution as dt(x/a, df); to conserve probability I therefore have to multiply the output by a constant. I suspect this extra parameter is what causes the poor convergence, but I have no idea how to fit the distribution in a better way.
I have looked at distribution-fitting packages, but they require a complete distribution, while I only have the left side of it.
x y
1 -0.0050 0.000000
2 -0.0045 26.723019
3 -0.0040 28.557704
4 -0.0035 41.085068
5 -0.0030 66.258445
6 -0.0025 81.129807
7 -0.0020 83.751611
8 -0.0015 130.378353
9 -0.0010 157.806018
10 -0.0005 201.505657
11 0.0000 949.650354
12 0.0005 193.721270
dat <- data.frame(x = x, y = y)
res <- nls(y ~ dt(x / a, df) * c, data = dat,
           start = list(a = 0.000201, df = 0.9, c = 2104), trace = TRUE)
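One thing worth trying (a sketch only, not a tested fix) is to estimate the scale and the normalizing constant on the log scale, so both stay positive and the optimizer works with better-conditioned numbers. The parameter names loga and logc below are purely illustrative, and convergence still depends on the data and the starting values:
x <- seq(-0.0050, 0.0005, by = 0.0005)            # x values from the table above
y <- c(0.000000, 26.723019, 28.557704, 41.085068, 66.258445, 81.129807,
       83.751611, 130.378353, 157.806018, 201.505657, 949.650354, 193.721270)
dat <- data.frame(x = x, y = y)

# same model as above, but with a = exp(loga) and c = exp(logc)
res <- nls(y ~ exp(logc) * dt(x / exp(loga), df), data = dat,
           start = list(loga = log(2e-4), df = 1, logc = log(2000)),
           control = nls.control(maxiter = 200), trace = TRUE)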
Given two simple sets of data:
head(training_set)
x y
1 1 2.167512
2 2 4.684017
3 3 3.702477
4 4 9.417312
5 5 9.424831
6 6 13.090983
head(test_set)
x y
1 1 2.068663
2 2 4.162103
3 3 5.080583
4 4 8.366680
5 5 8.344651
I want to fit a linear regression line on the training data and use that line (or its coefficients) to calculate the "test MSE", i.e. the mean squared error of the residuals when the fitted line is applied to the test data.
model = lm(y~x,data=training_set)
train_MSE = mean(model$residuals^2)
test_MSE = ?
In this case, it is more precise to call it MSPE (mean squared prediction error):
mean((test_set$y - predict.lm(model, test_set)) ^ 2)
This is a more useful measure, since all models ultimately aim at prediction: we want the model with the smallest MSPE.
In practice, if we have a spare test data set, we can compute the MSPE directly, as above. Very often, however, we don't have spare data, and leave-one-out cross-validation then gives an estimate of the MSPE from the training data alone.
There are also several other statistics for assessing prediction error, such as Mallows's Cp and AIC.
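As a rough illustration (assuming training_set has the x and y columns shown above), leave-one-out CV can be computed explicitly, and for lm there is also a closed form based on the leverage values:
# explicit leave-one-out estimate of the MSPE from the training data
loocv_mspe <- mean(sapply(seq_len(nrow(training_set)), function(i) {
  fit <- lm(y ~ x, data = training_set[-i, ])               # refit without row i
  (training_set$y[i] - predict(fit, training_set[i, ]))^2   # squared prediction error for row i
}))

# equivalent shortcut for lm, using the hat (leverage) values
mean((residuals(model) / (1 - hatvalues(model)))^2)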
I ran a CCA analysis using vegan with 7500 sites, 9 species, and 5 constraining variables. The results are:
Call: cca(sitspe = Yp, sitenv = Xp)
            Inertia Proportion Rank
Total
Constrained  0.5051               6
Inertia is mean squared contingency coefficient
Eigenvalues for constrained axes:
[1] 0.3317 0.1301 0.0328 0.0089 0.0014 0.0003
I don't understand why there is no unconstrained or total inertia.
Probably your constrained axes explained everything, and no unconstrained inertia was left. How many axes did you get with unconstrained ordination (CCA without constraints)?
Your data matrix is extremely non-square: its dimensions are 7500 × 9. There are only nine species, and if they are dependent or otherwise redundant, your constraints may be able to explain everything.
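As a quick check (a sketch, assuming Yp and Xp are the matrices from your call), you can compare the unconstrained ordination with the constrained one; with only nine species the analysis has at most eight non-trivial axes, so six constrained axes may well absorb everything:
library(vegan)

ca0 <- cca(Yp)          # unconstrained correspondence analysis: total inertia and axes
ca0

# constrained model via the formula interface, for comparison
cca(Yp ~ ., data = as.data.frame(Xp))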
I would be glad if somebody could help me solve this problem. I have data from a repeated-measures design in which we tested the reaction of birds (time.dep) before and after infection (exper). We also have FL (fuel load, % of lean body mass), fat score, and group (experimental vs. control) as explanatory variables. I decided to use an LME because the distribution of residuals does not deviate from normality, but there is a problem with homogeneity of the residuals: the variances of the "before" and "after" groups, and also between fat levels, differ significantly (Fligner-Killeen test, p = 0.038 and p = 0.01 respectively).
ring group fat time.dep FL exper
1 XZ13125 E 4 0.36 16.295 before
2 XZ13125 E 3 0.32 12.547 after
3 XZ13126 E 3 0.28 7.721 before
4 XZ13127 C 3 0.32 9.157 before
5 XZ13127 C 3 0.40 -1.902 after
6 XZ13129 C 4 0.40 10.382 before
After selecting the random part of the model, a random intercept (~1 | ring), I applied the weights argument for both "fat" and "exper": weights = varComb(varIdent(form = ~1 | fat), varIdent(form = ~1 | exper)). The plot of standardized residuals vs. fitted values now looks better, but I still get a violation of homogeneity for these variables (the same p-values in the Fligner test). What am I doing wrong?
A common trap in lme is that the default is to give raw residuals, i.e. not adjusted for any of the heteroscedasticity (weights) or correlation (correlation) sub-models that may have been used. From ?residuals.lme:
type: an optional character string specifying the type of residuals
to be used. If ‘"response"’, as by default, the “raw”
residuals (observed - fitted) are used; else, if ‘"pearson"’,
the standardized residuals (raw residuals divided by the
corresponding standard errors) are used; else, if
‘"normalized"’, the normalized residuals (standardized
residuals pre-multiplied by the inverse square-root factor of
the estimated error correlation matrix) are used. Partial
matching of arguments is used, so only the first character
needs to be provided.
Thus if you want your residuals to be corrected for heteroscedasticity (as included in the model) you need type="pearson"; if you want them to be corrected for correlation, you need type="normalized".
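A minimal sketch of what that looks like in practice (the fixed-effects formula and the data-frame name birds are placeholders, not taken from the original post):
library(nlme)

fit <- lme(time.dep ~ exper * group + FL + fat, random = ~ 1 | ring,
           weights = varComb(varIdent(form = ~ 1 | fat),
                             varIdent(form = ~ 1 | exper)),
           data = birds)

r_raw     <- residuals(fit)                    # raw residuals: ignore the variance model
r_pearson <- residuals(fit, type = "pearson")  # standardized by the fitted variance structure

# heteroscedasticity checks should use the Pearson residuals
plot(fitted(fit), r_pearson)
fligner.test(r_pearson, factor(birds$exper))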
I have run a SIMPLS regression in R, but I am not sure how to interpret the results. This is my function call:
yarn.simpls <- mvr(Pcubes ~ X1 + X2 + X3, data = dtj, validation = "CV", method = "simpls")
and these are the results from
summary(yarn.simpls)
X dimension: 33471 3
Y dimension: 33471 1
Fit method: simpls
Number of components considered: 3
VALIDATION: RMSEP
Cross-validated using 10 random segments.
(Intercept) 1 comps 2 comps 3 comps
CV 0.5729 0.4449 0.4263 0.4175
adjCV 0.5729 0.4449 0.4263 0.4175
TRAINING: % variance explained
1 comps 2 comps 3 comps
X 86.77 97.67 100
Pcubes 39.74 44.72 47
What I would like to know is: what are my coefficients? Are they the adjCV row under VALIDATION: RMSEP? And is TRAINING: % variance explained something like the significance of the variables? I just want to make sure I interpret the results correctly.
The % variance explained describes how much of the variation in the X variables, and then in the response variable, each number of components captures, so it can be thought of as the relative ability of each ncomp to capture the information in your data.
CV and adjCV are values of the root mean squared error of prediction (RMSEP), which tells you how well the model with each number of components predicts the outcome variable. In your case the first component gives by far the largest drop in RMSEP; adding further components improves it only marginally.
If you want coefficients for the underlying variables, use coef(yarn.simpls); with the ncomp argument you can get the coefficients for any given number of components.
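A short sketch of both (assuming the pls package is loaded and yarn.simpls is the fitted object from your call):
library(pls)

coef(yarn.simpls, ncomp = 2)     # regression coefficients for X1, X2, X3 using 2 components
coef(yarn.simpls, ncomp = 1:3)   # coefficients for each number of components
plot(RMSEP(yarn.simpls))         # cross-validated RMSEP against the number of components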