What is the difference between fitdistr and fitdist for the t-distribution?

More specifically, I have a set of data x, e.g.:
x <- scan(textConnection(
"10.210126 10.277015 10.402625 10.208137 9.831884 9.672501
9.815476 9.980124 9.710509 9.148997 9.406072
9.991224 10.324005 10.402747 10.449439 10.051304 9.886748 9.771041
9.761175 10.049102 9.457981 9.114380 9.461333 9.804220 10.395986
10.192419 10.202962 9.984330 9.765604 9.473166 9.966462 10.120895
9.631744"))
If I use
fitdistr(x, "t")
I get the following results:
          m            s           df
  10.63855874   0.50766169  37.08954639
( 0.02217171) ( 0.01736012) (23.12225558)
Compared to
fitdist(x,"t",start=list(length(x)-1,mean(x)),lower=c(0))
(starting parameters were found in an answer on this site) which results in:
      estimate Std. Error
1 103129.28018         NA
2     10.63869         NA
Why is there a difference? Am I perhaps using the wrong starting points for fitdist()?
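A likely source of the difference, before even considering starting values: MASS::fitdistr() has a built-in "t" template that fits a location-scale t-distribution (parameters m, s, df), while fitdistrplus::fitdist(x, "t") falls back on stats::dt(), which only has a df parameter, so it fits a standard t unless you define a location-scale density yourself. A minimal sketch, assuming both packages are installed:
library(MASS)
library(fitdistrplus)
# Location-scale t: estimates m, s and df
fitdistr(x, "t")
# Standard t via stats::dt(): the only free parameter is df,
# so the start list needs a named df entry
fitdist(x, "t", start = list(df = 10))
# One way to fit a location-scale t with fitdist(): supply scaled
# density and distribution functions, then name the distribution "t_ls"
dt_ls <- function(x, df, m, s) dt((x - m) / s, df) / s
pt_ls <- function(q, df, m, s) pt((q - m) / s, df)
fitdist(x, "t_ls", start = list(df = 10, m = mean(x), s = sd(x)))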

Related

Incorrect dimensions error with the function MRM in the package ecodist

When using the MRM function in the package ecodist, I get the following error:
Error in xj[i, , drop = FALSE] : incorrect number of dimensions
I get this error no matter what I do, I even get it with the example code in the documentation:
data(graze)
# Abundance of this grass is related to forest cover but not location
MRM(dist(LOAR10) ~ dist(sitelocation) + dist(forestpct), data=graze, nperm=10)
I don't know what's going on. I have tried other computers and get the same error, so it's not even confined to my machine (Windows 10, fully updated).
Best,
Joe
Thanks to Torsten Biemann for pointing me at this. I don't check Stack Overflow regularly, but you are always welcome to email me at the ecodist maintainer address or open an issue at https://github.com/phiala/ecodist
As pointed out above, the example works correctly in a clean R session, but fails if spdep is loaded. I haven't figured out the conflict yet, but the problem is in the implicit coercion of a distance object to a vector within the mechanics of using a formula. If you do that explicitly, the command works properly. I'll work on a patch, which will appear first at the GitHub repository above and then be sent to CRAN after testing.
# R --vanilla --no-save
library(ecodist)
data(graze)
# Works
set.seed(1234)
MRM(dist(LOAR10) ~ dist(sitelocation) + dist(forestpct), data=graze, nperm=10)
$coef
                   dist(LOAR10) pval
Int                   6.9372046  1.0
dist(sitelocation)   -0.4840631  0.6
dist(forestpct)       0.1456083  0.1

$r.squared
        R2       pval
0.04927212 0.10000000

$F.test
       F  F.pval
31.66549 0.10000
library(spdep)
# Fails
MRM(dist(LOAR10) ~ dist(sitelocation) + dist(forestpct), data=graze, nperm=10)
Error in xj[i, , drop = FALSE] : incorrect number of dimensions
# Explicit conversion to vector
graze.d <- with(graze, data.frame(LOAR10 = as.vector(dist(LOAR10)),
                                  sitelocation = as.vector(dist(sitelocation)),
                                  forestpct = as.vector(dist(forestpct))))
# Works
set.seed(1234)
MRM(LOAR10 ~ sitelocation + forestpct, data=graze.d, nperm=10)
$coef
                 LOAR10 pval
Int           6.9372046  1.0
sitelocation -0.4840631  0.6
forestpct     0.1456083  0.1

$r.squared
        R2       pval
0.04927212 0.10000000

$F.test
       F  F.pval
31.66549 0.10000
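A possible interim workaround until the patch lands (my suggestion, not from the maintainer; unloading a package may not undo everything it registers): detach spdep before calling MRM.
# Hypothetical workaround: unload spdep, then call MRM as before
detach("package:spdep", unload = TRUE)
MRM(dist(LOAR10) ~ dist(sitelocation) + dist(forestpct), data = graze, nperm = 10)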

How to reproduce results of predict function in R

Let's say I train a model in R.
model <- lm(as.formula(paste((model_Data)[2], "~",
  paste((model_Data)[c(4,5,6,7,8,9,10,11,12,13,15,16,17,18,20,21,22,63,79,90,
                       91,109,125,132,155,175,197,202,210,251,252,279,287,292,
                       300,313,318)],
        collapse="+"), sep="")), data=model_Data)
I then use the model to predict an unknown.
prediction <- predict(model,unknown[1,])
       1
8.037219
Instead of using predict lets pull out the coefficients and do it manually.
model$coefficients
9.250265284
0.054054202
0.052738367
-0.55119556
0.019686046
0.392728331
0.794558094
0.200555755
-0.63218309
0.050404541
0.089660195
-0.04889444
-0.24645514
0.225817891
-0.10411162
0.108317865
0.004281512
0.219695437
0.037514904
-0.00914805
0.077885231
0.656321472
-0.05436867
0.033296525
0.072551915
-0.11498145
-0.03414029
0.081145352
0.11187141
0.690106624
NA
-0.11112986
-0.18002883
0.006238802
0.058387332
-0.04469568
-0.02520228
0.121577926
Looks like the model couldn't find a coefficient for one of the variables.
Here are the independent variables for our unknown.
2.048475484
1.747222331
-1.240658767
-1.26971135
-0.61858754
-1.186401425
-1.196781456
-0.437969964
-1.37330171
-1.392555895
-0.147275619
0.315190159
0.544014105
-1.137999082
0.464498153
-1.825631473
-1.824991143
0.61730876
-1.311527708
-0.457725059
-0.455920549
-0.196326975
0.636723746
0.128123676
-0.0064055
-0.788435688
-0.493452602
-0.563353694
-0.441559371
-1.083489708
-0.882784077
-0.567873188
1.068504735
1.364721122
0.294178454
2.302875604
-0.998685333
If I multiply each independent variable by its coefficient and add on the intercept, the predicted value for the unknown is 8.450137349.
The predict function gave us 8.037219 and the manual calculation gave 8.450137349. What is happening within the predict function that is causing it to predict a different value than the manual calculation? What has to be done to make the values match?
I get a lot closer to the predict answer when using the code below:
b <- c(9.250265284, 0.054054202, 0.052738367, -0.55119556, 0.019686046,
       0.392728331, 0.794558094, 0.200555755, -0.63218309, 0.050404541,
       0.089660195, -0.04889444, -0.24645514, 0.225817891, -0.10411162,
       0.108317865, 0.004281512, 0.219695437, 0.037514904, -0.00914805,
       0.077885231, 0.656321472, -0.05436867, 0.033296525, 0.072551915,
       -0.11498145, -0.03414029, 0.081145352, 0.11187141, 0.690106624,
       NA, -0.11112986, -0.18002883, 0.006238802, 0.058387332,
       -0.04469568, -0.02520228, 0.121577926)
x <- c(1, 2.048475484, 1.747222331, -1.240658767, -1.26971135, -0.61858754,
       -1.186401425, -1.196781456, -0.437969964, -1.37330171, -1.392555895,
       -0.147275619, 0.315190159, 0.544014105, -1.137999082, 0.464498153,
       -1.825631473, -1.824991143, 0.61730876, -1.311527708, -0.457725059,
       -0.455920549, -0.196326975, 0.636723746, 0.128123676, -0.0064055,
       -0.788435688, -0.493452602, -0.563353694, -0.441559371, -1.083489708,
       -0.882784077, -0.567873188, 1.068504735, 1.364721122, 0.294178454,
       2.302875604, -0.998685333)
# remove the missing value in `b` and the corresponding value in `x`
x <- x[-31]
b <- b[-31]
x %*% b
# [,1]
# [1,] 8.036963
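More generally, you can reproduce predict() for an lm fit without hand-pairing values: let R build the design matrix for the new observation from the fitted terms, then drop aliased (NA) coefficients before multiplying. A sketch, assuming model and unknown as above:
# Design matrix for the new row, built from the model's terms
tt <- delete.response(terms(model))
mm <- model.matrix(tt, data = unknown[1, , drop = FALSE])
# Drop NA (aliased) coefficients and their columns, then multiply
b    <- coef(model)
keep <- !is.na(b)
drop(mm[, keep, drop = FALSE] %*% b[keep])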

Optimization function gives incorrect results for 2 similar data sets

I have 2 datasets not very different from each other. Each dataset has 27 rows of actual and forecast values. When tested against Solver in Excel for minimization of the absolute error (abs(actual - par * forecast)), they both give nearly equal values for the parameter 'par'. However, when each of these data sets is passed to the same optimization function that I have written, it only works for one of them. For the other data set, the objective always evaluates to zero (0), with 'par' assigned the upper-bound value.
This is definitely incorrect. What I am not able to understand is why R is doing so.
Here are the 2 data sets:
test
dateperiod,usage,fittedlevelusage
2019-04-13,16187.24,17257.02
2019-04-14,16410.18,17347.49
2019-04-15,18453.52,17246.88
2019-04-16,18113.1,17929.24
2019-04-17,17712.54,17476.67
2019-04-18,15098.13,17266.89
2019-04-19,13026.76,15298.11
2019-04-20,13689.49,13728.9
2019-04-21,11907.81,14122.88
2019-04-22,13078.29,13291.25
2019-04-23,15823.23,14465.34
2019-04-24,14602.43,15690.12
2019-04-25,12628.7,13806.44
2019-04-26,15064.37,12247.59
2019-04-27,17163.32,16335.43
2019-04-28,17277.18,16967.72
2019-04-29,20093.13,17418.99
2019-04-30,18820.68,18978.9
2019-05-01,18799.63,17610.66
2019-05-02,17783.24,17000.12
2019-05-03,17965.56,17818.84
2019-05-04,16891.25,18002.03
2019-05-05,18665.49,18298.02
2019-05-06,21043.86,19157.41
2019-05-07,22188.93,21092.36
2019-05-08,22358.08,21232.56
2019-05-09,22797.46,22229.69
Optimization result from R
$minimum
[1] 1.018188
$objective
[1] 28031.49
test1
dateperiod,Usage,fittedlevelusage
2019-04-13,16187.24,17248.29
2019-04-14,16410.18,17337.86
2019-04-15,18453.52,17196.25
2019-04-16,18113.10,17896.74
2019-04-17,17712.54,17464.45
2019-04-18,15098.13,17285.82
2019-04-19,13026.76,15277.10
2019-04-20,13689.49,13733.90
2019-04-21,11907.81,14152.27
2019-04-22,13078.29,13337.53
2019-04-23,15823.23,14512.41
2019-04-24,14602.43,15688.68
2019-04-25,12628.70,13808.58
2019-04-26,15064.37,12244.91
2019-04-27,17163.32,16304.28
2019-04-28,17277.18,16956.91
2019-04-29,20093.13,17441.80
2019-04-30,18820.68,18928.29
2019-05-01,18794.10,17573.40
2019-05-02,17779.00,16969.20
2019-05-03,17960.16,17764.47
2019-05-04,16884.77,17952.23
2019-05-05,18658.16,18313.66
2019-05-06,21036.49,19149.12
2019-05-07,22182.11,21103.37
2019-05-08,22335.57,21196.23
2019-05-09,22797.46,22180.51
Optimization result from R
$minimum
[1] 1.499934
$objective
[1] 0
The optimization function used is shown below:
optfn <- function(x) {
  act  <- x$usage
  fcst <- x$fittedlevelusage
  fn <- function(par) {
    sum(abs(act - (fcst * par)))
  }
  adjfac <- optimize(fn, c(0.5, 1.5))
  return(adjfac)
}
adjfacresults <- optfn(test)
adjfacresults <- optfn(test1)
Optimization result from R
adjfacresults <- optfn(test)
$minimum
[1] 1.018188
$objective
[1] 28031.49
Optimization result from R
adjfacresults <- optfn(test1)
$minimum
[1] 1.499934
$objective
[1] 0
Can anyone help identify why R is not performing the same process on the 2 data sets and outputting the correct result in both cases?
The corresponding results using Excel Solver for the 2 datasets are as follows:
For 'test' data set
par value = 1.018236659
objective function value (min): 28031
For 'test1' data set
par value = 1.01881062927878
objective function value (min): 28010
Best regards
Deepak
That's because the second column of test1 is named Usage, not usage. Therefore act <- x$usage is NULL, and fn returns sum(abs(NULL - fcst * par)) = sum(numeric(0)) = 0 for every value of par, so optimize() just reports a point in the search interval. You have to rename this column to usage.
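A minimal sketch of the fix, keeping optfn() as written: normalize the column names first, and optionally make the function fail loudly instead of optimizing over an empty vector.
# Rename so test1 exposes a lowercase `usage` column, then rerun
names(test1) <- tolower(names(test1))
adjfacresults <- optfn(test1)
# Defensive variant: stop if the expected columns are missing
optfn <- function(x) {
  stopifnot(all(c("usage", "fittedlevelusage") %in% names(x)))
  fn <- function(par) sum(abs(x$usage - x$fittedlevelusage * par))
  optimize(fn, c(0.5, 1.5))
}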

Specify CFA with disturbances being the sum of exogenous correlations

I am trying to specify a curious model in lavaan in R.
The model looks like this: [model diagram not shown]
My specification attempt is shown below. What I find difficult to achieve is to fix the unique error of the observed variables to be the sum of two correlations of unique items.
For instance, item y*1,2 covaries with y*1,3 and y*2,3, and its error is supposed to be cov(y*1,2, y*1,3) + cov(y*1,2, y*2,3).
How can I explicitly fix the item error to equal the sum of these covariances in the lavaan syntax below?
cfa_model_spesification<-'
C=~ #C4_12*i10i11+C4_13*i10i12+
#C5_12*i13i14+C5_13*i13i15+
#C6_12*i17i18+C6_13*i17i19+
C1_12*i1i2+C1_13*i1i3+
C2_12*i4i5+C2_13*i4i6+
C3_12*i7i8+C3_13*i7i9
R=~ #R4_23*i10i11+R4_12*i11i12+
#R5_23*i13i14+R5_12*i14i16+
#R6_23*i17i18+R6_12*i18i19+
R1_12*i1i2+R1_23*i2i3+
R2_12*i4i5+R2_23*i5i6+
R3_12*i7i8+R3_23*i8i9
O=~ #O4_13*i10i12+O4_23*i11i12+
#O5_13*i13i15+O5_23*i14i16+
#O6_13*i17i19+O6_23*i18i19+
O1_13*i1i3+O1_23*i2i3+
O2_13*i4i6+O2_23*i5i6+
O3_13*i7i9+O3_23*i8i9
O~~1*O
C~~1*C
R~~1*R
O~~C+R
R~~C
R1_23==-R1_12
R2_23==-R2_12
R3_23==-R3_12
R1_23>0
R2_23>0
R3_23>0
# R1_12<0
# R2_12<0
# R3_12<0
O1_13<0
O1_23<0
O2_13<0
O2_23<0
O3_13<0
O3_23<0
i1i2~~i1i3
i1i2~~i2i3
i1i3~~i2i3
i4i5~~i4i6
i4i5~~i5i6
i4i6~~i5i6
i7i8~~i7i9
i7i8~~i8i9
i7i9~~i8i9
i1i2~~1*i1i2
i4i5~~1*i4i5
i7i8~~1*i7i8
# i1i3~~equal("i1i3~~i1i2+i1i3~~i2i3")*i1i3
# i2i3~~equal("i2i3~~i1i2+i2i3~~i1i3")*i2i3
# i4i6~~equal("i4i6~~i4i5+i4i6~~i5i6")*i4i6
# i5i6~~equal("i5i6~~i4i5+i5i6~~i4i6")*i5i6
# i7i9~~equal("i7i9~~i7i8+i7i9~~i8i9")*i7i9
# i8i9~~equal("i8i9~~i7i8+i8i9~~i7i9")*i8i9
'
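For the core question, the lavaan pattern is the same one the Mplus MODEL CONSTRAINT block below uses: give each covariance and each error variance a label, then tie them together with ==. A minimal sketch with hypothetical observed variables y12, y13, y23:
mini_spec <- '
  y12 ~~ a*y13      # cov(y12, y13), labeled a
  y12 ~~ b*y23      # cov(y12, y23), labeled b
  y12 ~~ v12*y12    # error variance of y12, labeled v12
  v12 == a + b      # fix the error to the sum of the covariances
'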
The syntax for this in Mplus looks like this:
TITLE:
Example
DATA:
FILE IS triplets.dat;
VARIABLE:
NAMES=i1i2 i1i3 i2i3 i4i5 i4i6 i5i6 i7i8 i7i9 i8i9 i10i11 i10i12 i11i12;
CATEGORICAL=i1i2-i11i12;
ANALYSIS:
ESTIMATOR=ulsmv;
PARAMETERIZATION=THETA;
MODEL:
Trait1 BY
i1i2*1 i1i3*1 (L1)
i4i5*-1 i4i6*-1 (L4)
i7i8*1 i7i9*1 (L7)
i10i11*1 i10i12*1 (L10);
Trait2 BY
i1i2*-1 (L2_n)
i2i3*1 (L2)
i4i5*-1 (L5_n)
i5i6*1 (L5)
i7i8*-1 (L8_n)
i8i9*1 (L8)
i10i11*1 (L11_n)
i11i12*-1 (L11);
Trait3 BY
i1i3*-1 i2i3*-1 (L3_n)
i4i6*-1 i5i6*-1 (L6_n)
i7i9*1 i8i9*1 (L9_n)
i10i12*-1 i11i12*-1 (L12_n);
Trait1-Trait3#1
Trait1 WITH Trait2*-0.4 Trait3*0;
Trait2 WITH Trait3*0.3;
i1i2*2 (e1e2);
i1i3*2 (e1e3);
i2i3*2 (e2e3);
i4i5*2 (e4e5);
i4i6*2 (e4e6);
i5i6*2 (e5e6);
i7i8*2 (e7e8);
i7i9*2 (e7e9);
i8i9*2 (e8e9);
i10i11*2 (e10e11);
i10i12*2 (e10e12);
i11i12*2 (e11e12);
i1i2 WITH i1i3*1 (e1);
i1i2 WITH i2i3*-1 (e2_n);
i1i3 WITH i2i3*1 (e3);
i4i5 WITH i4i6*1 (e4);
i4i5 WITH i5i6*-1 (e5_n);
i4i6 WITH i5i6*1 (e6);
i7i8 WITH i7i9*1 (e7);
i7i8 WITH i8i9*-1 (e8_n);
i7i9 WITH i8i9*1 (e9);
i10i11 WITH i10i12*1 (e10);
i10i11 WITH i11i12*-1 (e11_n);
i10i12 WITH i11i12*1 (e12);
MODEL CONSTRAINT:
L2_n=-L2;
L5_n=-L5;
L8_n=-L8;
L11_n=-L11;
e1e2=e1-e2_n;
e1e3=e1+e3;
e2e3=-e2_n+e3;
e4e5=e4-e5_n;
e4e6=e4+e6;
e5e6=-e5_n+e6;
e7e8=e7-e8_n;
e7e9=e7+e9;
e8e9=-e8_n+e9;
e10e11=e10-e11_n;
e10e12=e10+e12;
e11e12=-e11_n+e12;
e1=1;
e4=1;
e7=1;
e10=1;
This is the lavaan model specification I came up with for a TIRT (Thurstonian IRT) model:
lavaan_6_model_spesification<-'# factor loadings (lambda)
trait1=~start(1)*L1*i1i2+start(1)*L1*i1i3+start(1)*L4*i4i5+start(1)*L4*i4i6+start(1)*L7*i7i8+start(1)*L7*i7i9+start(1)*L10*i10i11+start(1)*L10*i10i12+start(1)*L13*i13i14+start(1)*L13*i13i15+start(1)*L16*i16i17+start(1)*L16*i16i18
trait2=~start(-1)*L2n*i1i2+start(1)*L2*i2i3+start(-1)*L5n*i4i5+start(1)*L5*i5i6+start(-1)*L8n*i7i8+start(1)*L8*i8i9+start(-1)*L11n*i10i11+start(1)*L11*i11i12+start(-1)*L14n*i13i14+start(1)*L14*i14i15+start(-1)*L17n*i16i17+start(1)*L17*i17i18
trait3=~start(-1)*L3n*i1i3+start(-1)*L3n*i2i3+start(-1)*L6n*i4i6+start(-1)*L6n*i5i6+start(-1)*L9n*i7i9+start(-1)*L9n*i8i9+start(-1)*L12n*i10i12+start(-1)*L12n*i11i12+start(-1)*L15n*i13i15+start(-1)*L15n*i14i15+start(-1)*L18n*i16i18+start(-1)*L18n*i17i18
# fix factor variances to 1
trait1~~1*trait1
trait2~~1*trait2
trait3~~1*trait3
# factor correlations
trait1~~trait2+trait3
trait2~~trait3
# fix factor loadings of the same item to the same value
L2==-L2n
L5==-L5n
L8==-L8n
L11==-L11n
L14==-L14n
L17==-L17n
# declare uniquenesses (psi)
i1i2~~P1P2*i1i2
i1i3~~P1P3*i1i3
i2i3~~P2P3*i2i3
i4i5~~P4P5*i4i5
i4i6~~P4P6*i4i6
i5i6~~P5P6*i5i6
i7i8~~P7P8*i7i8
i7i9~~P7P9*i7i9
i8i9~~P8P9*i8i9
i10i11~~P10P11*i10i11
i10i12~~P10P12*i10i12
i11i12~~P11P12*i11i12
i13i14~~P13P14*i13i14
i13i15~~P13P15*i13i15
i14i15~~P14P15*i14i15
i16i17~~P16P17*i16i17
i16i18~~P16P18*i16i18
i17i18~~P17P18*i17i18
# correlated uniquenesses
i1i2~~start(1)*P1*i1i3
i1i2~~start(-1)*P2n*i2i3
i1i3~~start(1)*P3*i2i3
i4i5~~start(1)*P4*i4i6
i4i5~~start(-1)*P5n*i5i6
i4i6~~start(1)*P6*i5i6
i7i8~~start(1)*P7*i7i9
i7i8~~start(-1)*P8n*i8i9
i7i9~~start(1)*P9*i8i9
i10i11~~start(1)*P10*i10i12
i10i11~~start(-1)*P11n*i11i12
i10i12~~start(1)*P12*i11i12
i13i14~~start(1)*P13*i13i15
i13i14~~start(-1)*P14n*i14i15
i13i15~~start(1)*P15*i14i15
i16i17~~start(1)*P16*i16i18
i16i17~~start(-1)*P17n*i17i18
i16i18~~start(1)*P18*i17i18
# pair uniqueness is equal to the sum of 2 utility uniquenesses
P1P2==P1-P2n
P1P3==P1+P3
P2P3==-P2n+P3
P4P5==P4-P5n
P4P6==P4+P6
P5P6==-P5n+P6
P7P8==P7-P8n
P7P9==P7+P9
P8P9==-P8n+P9
P10P11==P10-P11n
P10P12==P10+P12
P11P12==-P11n+P12
P13P14==P13-P14n
P13P15==P13+P15
P14P15==-P14n+P15
P16P17==P16-P17n
P16P18==P16+P18
P17P18==-P17n+P18
# fix one uniqueness per block for identification
P1==1
P4==1
P7==1
P10==1
P13==1
P16==1
# force item parameters of the same item to be equal
'

Error when using cv.tree

Hi, I tried using the function cv.tree from the package tree. I have a binary categorical response (called Label) and 30 predictors. I fit a tree object using all predictors.
I got the following error message that I don't understand:
Error in as.data.frame.default(data, optional = TRUE) :
  cannot coerce class '"function"' to a data.frame
The data is the file 'training' taken from this site.
This is what I did:
x <- read.csv("training.csv")
attach(x)
library(tree)
Tree <- tree(Label~., x, subset=sample(1:nrow(x), nrow(x)/2))
CV <- cv.tree(Tree,FUN=prune.misclass)
The error occurs once cv.tree calls model.frame. The 'call' element of the tree object must contain a reference to a data frame whose name is also not the name of a loaded function.
Thus, not only will subsetting in the call to tree generate the error when cv.tree later uses the 'call' element of the tree object; using a data frame with a name like "df" would give an error as well, because model.frame will take this to be the name of an existing function (i.e., the density of the F distribution from the stats package).
I think the problem is in the dependent variable list. The following works, but I think you need to read the problem description more carefully. First, set up the formula without Weight.
x <- read.csv("training.csv")
vars<-setdiff(names(x),c("EventId","Label","Weight"))
fmla <- paste("Label", "~", vars[1], "+",
              paste(vars[-c(1)], collapse=" + "))
Here's what you've been running (subsetting inside the call, which later breaks cv.tree):
Tree <- tree(fmla, x, subset=sample(1:nrow(x), nrow(x)/2))
plot(Tree)
Instead, subset the data before the call, so the stored call refers to a plain data frame:
urows <- sample(1:nrow(x), nrow(x)/2)
x_sub <- x[urows,]
Tree <- tree(fmla, x_sub)
plot(Tree)
CV <- cv.tree(Tree, FUN=prune.misclass)
CV
$size
[1] 6 5 4 3 1

$dev
[1] 25859 25859 27510 30075 42725

$k
[1]   -Inf    0.0 1929.0 2791.0 6188.5

$method
[1] "misclass"

attr(,"class")
[1] "prune"         "tree.sequence"

You may also want to consider the rpart package:
library(rpart)
tr <- rpart(fmla, data=x_sub, method="class")
printcp(tr)
Classification tree:
rpart(formula = fmla, data = x_sub, method = "class")

Variables actually used in tree construction:
[1] DER_mass_MMC                DER_mass_transverse_met_lep
[3] DER_mass_vis

Root node error: 42616/125000 = 0.34093

n= 125000

        CP nsplit rel error  xerror      xstd
1 0.153733      0   1.00000 1.00000 0.0039326
2 0.059274      2   0.69253 0.69479 0.0035273
3 0.020016      3   0.63326 0.63582 0.0034184
4 0.010000      5   0.59323 0.59651 0.0033393
If you include Weight, then that is the only split:
vars <- setdiff(names(x), c("EventId","Label"))
