I have the following generated dataset:
library(MASS)
df <- data.frame(product   = sample(x = c("toyota", "honda", "nissan", "bmw"), size = 1000, replace = TRUE),
                 parameter = sample(x = c("X", "Y", "A"), size = 1000, replace = TRUE),
                 value     = rgamma(1000, shape = 5, rate = 0.1))
I want to fit a lognormal distribution to the column "value", so I use the following code:
dist_par <- fitdistr(unlist(df["value"]), "lognormal")
The result is something like this:
meanlog sdlog
3.8416 0.4292
(0.0458) (0.0324)
I have two questions:
First, after reading the help page, I guess that the meanlog and sdlog estimates are shown in the first row:
meanlog sdlog
3.8416 0.4292
but the second row of numbers (in parentheses) is confusing. What are they?
meanlog sdlog
.... ....
(0.0458) (0.0324)
Second, I know the result of fitdistr is a list, but I don't know how to access those four values. For instance, how can I get 3.8416?
If I run
dist_par[1]
then I get
meanlog sdlog
3.842 0.429
and if I run:
dist_par[1,1]
then I get the following error:
Error in dist_par[1, 1] : incorrect number of dimensions
According to the ?fitdistr documentation, the value is:
An object of class "fitdistr", a list with four components,
estimate - the parameter estimates,
sd - the estimated standard errors,
vcov - the estimated variance-covariance matrix, and
loglik - the log-likelihood.
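A minimal sketch that checks what each component holds, assuming simulated data like the question's (the set.seed() call is an addition for reproducibility). For the lognormal family fitdistr uses closed-form maximum-likelihood estimates, so estimate can be reproduced from log(value); sd is the square root of the diagonal of vcov, and these are the numbers printed in parentheses.

```r
library(MASS)

set.seed(42)  # arbitrary seed (an addition; the question's data are unseeded)
value <- rgamma(1000, shape = 5, rate = 0.1)
fit <- fitdistr(value, "lognormal")

# estimate: for the lognormal, the MLEs are the mean and the
# divisor-n (not n - 1) standard deviation of log(value)
lx  <- log(value)
n   <- length(lx)
mle <- c(meanlog = mean(lx), sdlog = sqrt(mean((lx - mean(lx))^2)))
print(all.equal(fit$estimate, mle))

# sd: the standard errors printed in parentheses, i.e. sqrt(diag(vcov))
print(all.equal(fit$sd, sqrt(diag(fit$vcov))))

# loglik: the log-likelihood evaluated at the estimates
print(all.equal(fit$loglik,
                sum(dlnorm(value, meanlog = fit$estimate[["meanlog"]],
                           sdlog = fit$estimate[["sdlog"]], log = TRUE))))
```

Each all.equal() call should print TRUE, which shows that the parenthesised numbers are standard errors of the estimates, not extra parameters.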
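This is evident if we check the structure of the result (out here is a fitdistr() fit to a different random sample, so the numbers differ from the question's; the str() output below is truncated after vcov):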
str(out)
List of 5
$ estimate: Named num [1:2] 3.801 0.455
..- attr(*, "names")= chr [1:2] "meanlog" "sdlog"
$ sd : Named num [1:2] 0.0144 0.0102
..- attr(*, "names")= chr [1:2] "meanlog" "sdlog"
$ vcov : num [1:2, 1:2] 0.000207 0 0 0.000103
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "meanlog" "sdlog"
.. ..$ : chr [1:2] "meanlog" "sdlog"
i.e. the print method shows the estimate, with the sd (the estimated standard errors) in parentheses below it. Because the result is a list, [1,1] doesn't work; we need the standard extraction operators, $ or [[:
> out
meanlog sdlog
3.80075311 0.45468543
(0.01437842) (0.01016708)
> out$estimate
meanlog sdlog
3.8007531 0.4546854
> out$estimate[["meanlog"]]
[1] 3.800753
> out$sd
meanlog sdlog
0.01437842 0.01016708
i.e. inside the list the elements are just named vectors, so use [ or [[ to extract them by name.
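Putting those extraction rules together, a short sketch (again on simulated data with an arbitrary seed) of the usual ways to pull out single numbers. Note that [1, 1] fails on the fitdistr list itself but works on its vcov component, which is a genuine 2 x 2 matrix.

```r
library(MASS)

set.seed(1)  # arbitrary seed, added for reproducibility
fit <- fitdistr(rgamma(1000, shape = 5, rate = 0.1), "lognormal")

meanlog     <- fit$estimate[["meanlog"]]  # [[ by name: plain number, name dropped
meanlog_se  <- fit[["sd"]][1]             # [ by position: keeps the name "meanlog"
var_meanlog <- fit$vcov[1, 1]             # [1, 1] works on the vcov matrix itself

# all four printed numbers as a small table, one row per parameter
summary_tab <- data.frame(estimate = fit$estimate, std.error = fit$sd)
print(summary_tab)
```

Using [[ with a name is usually the safest choice, since it does not depend on the order of the parameters.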
I want to apply mk.test() to a large dataset and get the results in a table/matrix.
My data look something like this:
Column A   Column B   ...   ColumnXn
1          2          ...   5
...        ...        ...   ...
3          4          ...   7
So far I managed to perform mk.test() for all columns and print the results:
for(i in 1:ncol(data)) {
print(mk.test(as.numeric(unlist(data[ , i]))))
}
I got all the results printed:
.....
Mann-Kendall trend test
data: as.numeric(unlist(data[, i]))
z = 4.002, n = 71, p-value = 6.28e-05
alternative hypothesis: true S is not equal to 0
sample estimates:
S varS tau
7.640000e+02 3.634867e+04 3.503154e-01
Mann-Kendall trend test
data: as.numeric(unlist(data[, i]))
z = 3.7884, n = 71, p-value = 0.0001516
alternative hypothesis: true S is not equal to 0
sample estimates:
S varS tau
7.240000e+02 3.642200e+04 3.283908e-01
....
However, I was wondering if it is possible to get the results in a table/matrix format that I could save as an Excel file.
Something like this:
Column     z        p-value     S              varS           tau
Column A   4.002    0.0001516   7.640000e+02   3.642200e+04   3.283908e-01
...        ...      ...         ...            ...            ...
ColumnXn   3.7884   6.28e-05    7.240000e+02   3.642200e+04   3.283908e-01
Is it possible to do so?
I would really appreciate your help.
Instead of printing the test results, you can store them in a variable. That variable holds the various test statistics and values as named components. To find the component names, run the test on the first column and inspect the result with str(), which displays the structure of an object:
testres = mk.test(as.numeric(unlist(data[ , 1])))
str(testres)
List of 9
$ data.name : chr "as.numeric(unlist(data[, 1]))"
$ p.value : num 0.296
$ statistic : Named num 1.04
..- attr(*, "names")= chr "z"
$ null.value : Named num 0
..- attr(*, "names")= chr "S"
$ parameter : Named int 3
..- attr(*, "names")= chr "n"
$ estimates : Named num [1:3] 3 3.67 1
..- attr(*, "names")= chr [1:3] "S" "varS" "tau"
$ alternative: chr "two.sided"
$ method : chr "Mann-Kendall trend test"
$ pvalg : num 0.296
- attr(*, "class")= chr "htest"
Here you can see, for example, that the z-value is stored in testres$statistic, and similarly for the other components. S, varS and tau are not separate components; they are grouped together in the named numeric vector testres$estimates (so, e.g., testres$estimates[["S"]] extracts S by name).
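To see where those fields come from, here is a base-R sketch of the Mann-Kendall quantities for a series without ties. It is an illustration, not the trend package's code: mk_sketch is a hypothetical helper, and the tie correction to varS is left out, so its results will differ from mk.test() whenever a column contains repeated values. For c(1, 3, 5) it gives the same values as the str() output above (z = 1.04, p = 0.296, S = 3, varS = 3.67, tau = 1).

```r
# Hypothetical helper: Mann-Kendall quantities for a series WITHOUT ties
# (no tie correction to varS).
mk_sketch <- function(x) {
  n <- length(x)
  # S: sum of sign(x[j] - x[i]) over all pairs i < j
  pairs <- combn(n, 2)
  S <- sum(sign(x[pairs[2, ]] - x[pairs[1, ]]))
  varS <- n * (n - 1) * (2 * n + 5) / 18
  z <- (S - sign(S)) / sqrt(varS)   # continuity-corrected normal score
  p <- 2 * pnorm(-abs(z))           # two-sided p-value
  tau <- S / (n * (n - 1) / 2)      # S relative to the number of pairs
  c(z = z, p.value = p, S = S, varS = varS, tau = tau)
}

print(mk_sketch(c(1, 3, 5)))
```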
In the code below, create an empty data frame, add the results of each run to it inside the loop, and at the end write it to a CSV file with write.csv().
library(trend)
# sample data
mydata = data.frame(ColumnA = c(1,3,5), ColumnB = c(2,4,1), ColumnXn = c(5,7,7))
# empty dataframe to store results
results = data.frame(matrix(ncol=6, nrow=0))
colnames(results) <- c("Column", "z", "p-value", "S", "varS", "tau")
for(i in 1:ncol(mydata)) {
# store test results in variable
testres = mk.test(as.numeric(unlist(mydata[ , i])))
# extract elements of result
testvars = c(colnames(mydata)[i], # column
testres$statistic, # z
testres$p.value, # p-value
testres$estimates[1], # S
testres$estimates[2], # varS
testres$estimates[3]) # tau
# add to results dataframe
results[nrow(results)+1,] <- testvars
}
write.csv(results, "mannkendall.csv", row.names=FALSE)
The resulting csv file can be opened in Excel.
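One caveat with the loop above: c(colnames(mydata)[i], testres$statistic, ...) mixes a string with numbers, so every value is converted to character before it reaches results. A sketch that keeps numeric columns builds one single-row data frame per column and binds them with do.call(rbind, ...). So that it runs without the trend package, it uses base R's cor.test(x, seq_along(x), method = "kendall") as a stand-in (Mann-Kendall is Kendall's tau between the series and time). With exact = FALSE and continuity = TRUE its z should match mk.test's for a tie-free column, but tau and the p-value differ when a column has ties. kendall_trend is a hypothetical helper name, and the file goes to a temporary path.

```r
# Stand-in test: Kendall's tau of each column against its time index.
# With trend installed, use trend::mk.test(x) instead and read S, varS, tau
# from res$estimates rather than res$estimate.
kendall_trend <- function(x) {
  cor.test(x, seq_along(x), method = "kendall", exact = FALSE, continuity = TRUE)
}

mydata <- data.frame(ColumnA = c(1, 3, 5), ColumnB = c(2, 4, 1), ColumnXn = c(5, 7, 7))

# One single-row data frame per column keeps every statistic numeric
rows <- lapply(names(mydata), function(col) {
  res <- kendall_trend(mydata[[col]])
  data.frame(Column  = col,
             z       = unname(res$statistic),
             p.value = res$p.value,
             tau     = unname(res$estimate))
})
results <- do.call(rbind, rows)

str(results)  # z, p.value and tau are num, not chr
write.csv(results, file.path(tempdir(), "mannkendall.csv"), row.names = FALSE)
```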
I was wondering what the most efficient way is to extract (rather than just print) only the Std.Dev. column of the vc object below as a vector.
library(lme4)
library(nlme)
data(Orthodont, package = "nlme")
fm1 <- lmer(distance ~ age + (age|Subject), data = Orthodont)
vc <- VarCorr(fm1) ## extract only the `Std.Dev.` column as a vector
The structure of vc shows that it is a list with a single element, Subject, and that the standard deviations are stored in its "stddev" attribute:
str(vc)
#List of 1
# $ Subject: num [1:2, 1:2] 6.3334 -0.3929 -0.3929 0.0569
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:2] "(Intercept)" "age"
# .. ..$ : chr [1:2] "(Intercept)" "age"
# ..- attr(*, "stddev")= Named num [1:2] 2.517 0.239 ####
So extract the attribute directly:
attr(vc$Subject, "stddev")
The residual standard deviation is stored separately, as the "sc" attribute of vc itself:
attr(vc, "sc")
#[1] 1.297364
Combining them with c() gives a single vector:
c(attr(vc$Subject, "stddev"), attr(vc, "sc"))
# (Intercept) age
# 2.5166317 0.2385853 1.2973640
The result is a named vector; wrap it in as.numeric() or as.vector() to drop the names.
Alternatively, use attributes():
c(attributes(vc)$sc, attributes(vc$Subject)$stddev)
To get just the three Std.Dev. values as an unnamed vector, you can use:
as.numeric(c(attr(vc[[1]], "stddev"), attr(vc, "sc")))
I am trying to model the performance of a portfolio consisting of a basket of ETFs. To do this, I am using a T copula. For now, I have specified the marginals (i.e. the performance of the individual ETFs) as being normal, however, I want to use a Student t-distribution instead of a normal distribution.
I have looked into the fit.st() method from the QRM package, but I am unsure how to combine this with the copula package.
I know how to implement normally distributed margins:
mv.NE <- mvdc(normalCopula(0.75), c("norm"),
list(list(mean = 0, sd =2)))
How can I do the same thing, but with a t-distribution?
All you need to do is use tCopula instead of normalCopula. You need to set the copula's correlation parameter and its degrees of freedom, and you also need to specify the margins.
Hence, here we replace normalCopula with tCopula, with df = 5 degrees of freedom. Both margins are still normal.
mv.NE <- mvdc(tCopula(0.75, df = 5), c("norm", "norm"),
              list(list(mean = 0, sd = 2), list(mean = 0, sd = 2)))
The result is:
Multivariate Distribution Copula based ("mvdc")
# copula:
t-copula, dim. d = 2
Dimension: 2
Parameters:
rho.1 = 0.75
df = 5.00
# margins:
[1] "norm" "norm"
with 2 (not identical) margins; with parameters (# paramMargins)
List of 2
$ :List of 2
..$ mean: num 0
..$ sd : num 2
$ :List of 1
..$ mean:List of 2
.. ..$ mean: num 0
.. ..$ sd : num 2
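The output above was generated with the second margin's parameters wrapped in an extra list(), hence its nested mean entry; each margin should get a plain list(mean = 0, sd = 2). For t margins (5 degrees of freedom each; the copula then keeps tCopula's default df = 4, as the output shows), use this: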
mv.NE <- mvdc(tCopula(0.75), c("t","t"),list(t=5,t=5))
Multivariate Distribution Copula based ("mvdc")
# copula:
t-copula, dim. d = 2
Dimension: 2
Parameters:
rho.1 = 0.75
df = 4.00
# margins:
[1] "t" "t"
with 2 (not identical) margins; with parameters (# paramMargins)
List of 2
$ t: Named num 5
..- attr(*, "names")= chr "df"
$ t: Named num 5
..- attr(*, "names")= chr "df"
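For intuition about what this mvdc object represents, a base-R sketch that simulates from the same construction by hand. It does not use the copula package (rMvdc() is the packaged way to draw from an mvdc object): draw a bivariate t with correlation 0.75 and 4 degrees of freedom, map each coordinate to a uniform with pt(), then to t margins with 5 degrees of freedom with qt(). Because Kendall's tau depends only on the copula, the sample tau should be close to 2/pi * asin(0.75) whatever margins are used.

```r
set.seed(123)   # arbitrary seed
n    <- 2000
rho  <- 0.75    # copula correlation, as in tCopula(0.75)
nu_c <- 4       # copula df (tCopula's default)
nu_m <- 5       # df of each t margin, as in list(t = 5, t = 5)

# 1. bivariate Student t: correlated normals divided by sqrt(chi-squared / df)
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
w  <- sqrt(rchisq(n, df = nu_c) / nu_c)
t1 <- z1 / w
t2 <- z2 / w

# 2. probability integral transform with the copula's df: a t-copula sample
u <- cbind(pt(t1, df = nu_c), pt(t2, df = nu_c))

# 3. quantile transform to the desired t margins
x <- cbind(qt(u[, 1], df = nu_m), qt(u[, 2], df = nu_m))

# Kendall's tau of an elliptical copula is 2/pi * asin(rho)
print(c(sample = cor(x[, 1], x[, 2], method = "kendall"),
        theory = 2 / pi * asin(rho)))
```

Swapping qt() for qnorm(u, mean = 0, sd = 2) in step 3 gives the normal-margin version from the first call instead.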
I have the following code, which produces table-like output:
lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
levels = rev(lvs))
pred <- factor(
c(
rep(lvs, times = c(54, 32)),
rep(lvs, times = c(27, 231))),
levels = rev(lvs))
xtab <- table(pred, truth)
library(caret)
confusionMatrix(xtab)
confusionMatrix(pred, truth)
confusionMatrix(xtab, prevalence = 0.25)
I would like to export the following part of the output as a .csv table:
Accuracy : 0.8285
95% CI : (0.7844, 0.8668)
No Information Rate : 0.75
P-Value [Acc > NIR] : 0.0003097
Kappa : 0.5336
Mcnemar's Test P-Value : 0.6025370
Sensitivity : 0.8953
Specificity : 0.6279
Pos Pred Value : 0.8783
Neg Pred Value : 0.6667
Prevalence : 0.7500
Detection Rate : 0.6715
Detection Prevalence : 0.7645
Balanced Accuracy : 0.7616
Attempting to write it as a .csv table gives this error message:
write.csv(confusionMatrix(xtab),file="file.csv")
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ‘"confusionMatrix"’ to a data.frame
Doing all of this manually is, for obvious reasons, impractical and prone to human error.
Any suggestions on how to export it as a .csv?
Using the caret package:
results <- confusionMatrix(pred, truth)
as.table(results) gives
Reference
Prediction X1 X0
X1 36 29
X0 218 727
as.matrix(results,what="overall") gives
Accuracy 7.554455e-01
Kappa 1.372895e-01
AccuracyLower 7.277208e-01
AccuracyUpper 7.816725e-01
AccuracyNull 7.485149e-01
AccuracyPValue 3.203599e-01
McnemarPValue 5.608817e-33
and
as.matrix(results, what = "classes") gives
Sensitivity 0.8953488
Specificity 0.6279070
Pos Pred Value 0.8783270
Neg Pred Value 0.6666667
Precision 0.8783270
Recall 0.8953488
F1 0.8867562
Prevalence 0.7500000
Detection Rate 0.6715116
Detection Prevalence 0.7645349
Balanced Accuracy 0.7616279
Passing these to write.csv() lets you export all of the confusionMatrix information.
If you inspect the object returned by confusionMatrix() (e.g. confusionMatrix(xtab, prevalence = 0.25)), you'll see it's a list:
cm <- confusionMatrix(pred, truth)
str(cm)
List of 5
$ positive: chr "abnormal"
$ table : 'table' int [1:2, 1:2] 231 27 32 54
..- attr(*, "dimnames")=List of 2
.. ..$ Prediction: chr [1:2] "abnormal" "normal"
.. ..$ Reference : chr [1:2] "abnormal" "normal"
$ overall : Named num [1:7] 0.828 0.534 0.784 0.867 0.75 ...
..- attr(*, "names")= chr [1:7] "Accuracy" "Kappa" "AccuracyLower" "AccuracyUpper" ...
$ byClass : Named num [1:8] 0.895 0.628 0.878 0.667 0.75 ...
..- attr(*, "names")= chr [1:8] "Sensitivity" "Specificity" "Pos Pred Value" "Neg Pred Value" ...
$ dots : list()
- attr(*, "class")= chr "confusionMatrix"
From here, select the components you want in the CSV and combine them into a data.frame with one column per statistic. In your case, this will be:
tocsv <- data.frame(cbind(t(cm$overall),t(cm$byClass)))
# You can then use
write.csv(tocsv,file="file.csv")
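To check what the exported byClass numbers mean (or if caret isn't available), a base-R sketch can compute most of them straight from xtab and write them with write.csv(). It assumes "abnormal" is the positive class (caret's default, the first factor level) and uses the prevalence observed in the data, so it targets the default confusionMatrix(xtab) call; the statistic names copy caret's labels, but the code is not caret's implementation, and it writes to a temporary file.

```r
lvs   <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)), levels = rev(lvs))
pred  <- factor(c(rep(lvs, times = c(54, 32)), rep(lvs, times = c(27, 231))),
                levels = rev(lvs))
xtab  <- table(pred, truth)  # rows = prediction, columns = reference

pos <- "abnormal"            # assumed positive class
TP  <- xtab[pos, pos]
FN  <- sum(xtab[, pos]) - TP # positives predicted as negative
FP  <- sum(xtab[pos, ]) - TP # negatives predicted as positive
TN  <- sum(xtab) - TP - FN - FP
N   <- sum(xtab)

stats <- c(Accuracy               = (TP + TN) / N,
           Sensitivity            = TP / (TP + FN),
           Specificity            = TN / (TN + FP),
           `Pos Pred Value`       = TP / (TP + FP),
           `Neg Pred Value`       = TN / (TN + FN),
           Prevalence             = (TP + FN) / N,
           `Detection Rate`       = TP / N,
           `Detection Prevalence` = (TP + FP) / N)
stats["Balanced Accuracy"] <- (stats[["Sensitivity"]] + stats[["Specificity"]]) / 2

# one row, one column per statistic (like tocsv above), original names kept
out <- data.frame(t(stats), check.names = FALSE)
write.csv(out, file.path(tempdir(), "file.csv"), row.names = FALSE)
print(round(stats, 4))
```

The rounded values should agree with the Accuracy, Sensitivity, Specificity, predictive values, Prevalence, Detection and Balanced Accuracy lines in the question.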
I found that capture.output works best for me.
It writes the printed output to a file, here with a .csv extension (.txt works too).
Note that the file contains the printed text rather than a table of values.
capture.output(
confusionMatrix(xtab, prevalence = 0.25),
file = "F:/Home Office/result.csv")
The simplest solution is to write the object out with readr::write_rds(). That produces an .rds file rather than a CSV, but you can export and re-import it (with readr::read_rds()) while keeping the confusionMatrix structure intact.
If A is a caret::confusionMatrix object, then:
broom::tidy(A) %>% writexl::write_xlsx("mymatrix.xlsx")
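Optionally, replace writexl::write_xlsx() with write.csv(), e.g. broom::tidy(A) %>% write.csv("mymatrix.csv"). The %>% pipe needs magrittr (or a package that re-exports it, such as dplyr) to be attached.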
To also include the table on a separate sheet:
broom::tidy(A) %>% list(as.data.frame(A$table)) %>% writexl::write_xlsx("mymatrix.xlsx")
I am performing a t-test in R:
out <- t.test(x=input1, y=input2, alternative=c("two.sided","less","greater"), mu=0, paired=TRUE, conf.level = 0.95)
It gives the result
Paired t-test
data: input1 and input2
t = -1.1469, df = 7, p-value = 0.2891
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.15100900 0.05236717
sample estimates:
mean of the differences
-0.04932091
I need to change the data names shown in the result, e.g.:
data: Fruits and Vegetables
Could anyone suggest an argument or attribute of t.test that changes the data names?
With some dummy data
set.seed(1)
input1 <- rnorm(20, mean = -1)
input2 <- rnorm(20, mean = 5)
It would be easier just to rename or create objects with the desired names:
Fruits <- input1
Vegetables <- input2
t.test(x = Fruits, y = Vegetables, paired = TRUE, alternative = "two.sided")
Paired t-test
data: Fruits and Vegetables
t = -18.6347, df = 19, p-value = 1.147e-13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.454791 -5.151218
sample estimates:
mean of the differences
-5.803005
But if you really want to do this after the fact, then grab the object returned by t.test():
tmp <- t.test(x = input1, y = input2, paired = TRUE, alternative = "two.sided")
Look at the structure of the object tmp
> str(tmp)
List of 9
$ statistic : Named num -18.6
..- attr(*, "names")= chr "t"
$ parameter : Named num 19
..- attr(*, "names")= chr "df"
$ p.value : num 1.15e-13
$ conf.int : atomic [1:2] -6.45 -5.15
..- attr(*, "conf.level")= num 0.95
$ estimate : Named num -5.8
..- attr(*, "names")= chr "mean of the differences"
$ null.value : Named num 0
..- attr(*, "names")= chr "difference in means"
$ alternative: chr "two.sided"
$ method : chr "Paired t-test"
$ data.name : chr "input1 and input2"
- attr(*, "class")= chr "htest"
and note the data.name component. We can replace that with a character string:
tmp$data.name <- "Fruits and Vegetables"
Then print tmp:
> tmp
Paired t-test
data: Fruits and Vegetables
t = -18.6347, df = 19, p-value = 1.147e-13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.454791 -5.151218
sample estimates:
mean of the differences
-5.803005
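If this relabelling is needed repeatedly, the two steps can be wrapped in a small function. A minimal sketch: t_test_named is a hypothetical helper, not part of stats, and it only rewrites the data.name label, so the statistic, df, p-value and interval are exactly those of the underlying t.test() call.

```r
# Hypothetical helper: run t.test() and overwrite the label on the "data:" line
t_test_named <- function(x, y, x_name, y_name, ...) {
  res <- t.test(x = x, y = y, ...)
  res$data.name <- paste(x_name, "and", y_name)
  res
}

set.seed(1)
input1 <- rnorm(20, mean = -1)
input2 <- rnorm(20, mean = 5)

res <- t_test_named(input1, input2, "Fruits", "Vegetables", paired = TRUE)
res  # the data: line now shows Fruits and Vegetables; all numbers unchanged
```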