I want to apply mk.test() to the large dataset and get results in a table/matrix.
My data look something like this:
Column A
Column B
...
ColumnXn
1
2
...
5
...
...
...
...
3
4
...
7
So far I managed to perform mk.test() for all columns and print the results:
for(i in 1:ncol(data)) {
print(mk.test(as.numeric(unlist(data[ , i]))))
}
I got all the results printed:
.....
Mann-Kendall trend test
data: as.numeric(unlist(data[, i]))
z = 4.002, n = 71, p-value = 6.28e-05
alternative hypothesis: true S is not equal to 0
sample estimates:
S varS tau
7.640000e+02 3.634867e+04 3.503154e-01
Mann-Kendall trend test
data: as.numeric(unlist(data[, i]))
z = 3.7884, n = 71, p-value = 0.0001516
alternative hypothesis: true S is not equal to 0
sample estimates:
S varS tau
7.240000e+02 3.642200e+04 3.283908e-01
....
However, I was wondering if it is possible to get results in a table/matrix format that I could save as excel.
Something like this:
Column
z
p-value
S
varS
tau
Column A
4.002
0.0001516
7.640000e+02
3.642200e+04
3.283908e-01
...
...
...
...
...
...
ColumnXn
3.7884
6.28e-05
7.240000e+02
3.642200e+04
3.283908e-01
Is it possible to do so?
I would really appreciate your help.
Instead of printing the test results you can store them in a variable. This variable holds the various test statistics and values. To find the names of the properties you can perform the test on the first row and find the property names using a string conversion:
testres = mk.test(as.numeric(unlist(data[ , 1])))
str(testres)
List of 9
$ data.name : chr "as.numeric(unlist(data[, 1]))"
$ p.value : num 0.296
$ statistic : Named num 1.04
..- attr(*, "names")= chr "z"
$ null.value : Named num 0
..- attr(*, "names")= chr "S"
$ parameter : Named int 3
..- attr(*, "names")= chr "n"
$ estimates : Named num [1:3] 3 3.67 1
..- attr(*, "names")= chr [1:3] "S" "varS" "tau"
$ alternative: chr "two.sided"
$ method : chr "Mann-Kendall trend test"
$ pvalg : num 0.296
- attr(*, "class")= chr "htest"
Here you see that for example the z-value is called testres$statistic and similar for the other properties. The values of S, varS and tau are not separate properties but they are grouped together in the list testres$estimates.
In the code you can create an empty dataframe, and in the loop add the results of that run to this dataframe. Then at the end you can convert to csv using write.csv().
library(trend)
# sample data
mydata = data.frame(ColumnA = c(1,3,5), ColumnB = c(2,4,1), ColumnXn = c(5,7,7))
# empty dataframe to store results
results = data.frame(matrix(ncol=6, nrow=0))
colnames(results) <- c("Column", "z", "p-value", "S", "varS", "tau")
for(i in 1:ncol(mydata)) {
# store test results in variable
testres = mk.test(as.numeric(unlist(mydata[ , i])))
# extract elements of result
testvars = c(colnames(mydata)[i], # column
testres$statistic, # z
testres$p.value, # p-value
testres$estimates[1], # S
testres$estimates[2], # varS
testres$estimates[3]) # tau
# add to results dataframe
results[nrow(results)+1,] <- testvars
}
write.csv(results, "mannkendall.csv", row.names=FALSE)
The resulting csv file can be opened in Excel.
I was wondering if there is an Base R function to extract the argument values that are in use in a particular function call?
For example, for each of the objects x, y and z below, is there a generic way to extract the arguments names (e.g., n, sd, rate, scale) in use and the values (e.g., 1e4 for n) assigned by the user or the system to each of those arguments?
NOTE: in certain "R" functions such extraction is easily done. For example, in density(), one can easily extract the argument values using density()$call.
x = rnorm(n = 1e4, mean = 2, sd = 4)
y = rlogis(1e4, 20)
z = rexp(1e4, 5)
This is really not an easy thing to do. If you're building the function, you can capture the call with match.call, which can be parsed without too much trouble:
f <- function(x, y = 1, ...){
cl <- match.call()
as.list(cl[-1])
}
str(f(1))
#> List of 1
#> $ x: num 1
str(f(1, 'foo'))
#> List of 2
#> $ x: num 1
#> $ y: chr "foo"
str(f(1, 'foo', list(3), fun = sum))
#> List of 4
#> $ x : num 1
#> $ y : chr "foo"
#> $ : language list(3)
#> $ fun: symbol sum
Note match.call only captures the call, and doesn't add in default parameters (there's no y in the first example). Those can be accessed with formals(f), since f isn't primitive, so complete arguments could be created via
user_args <- f(1)
fun_args <- formals(f)
fun_args[names(user_args)] <- user_args
str(fun_args)
#> List of 3
#> $ x : num 1
#> $ y : num 1
#> $ ...: symbol
This approach doesn't work well for completed dots, but if they're completed then match.call itself should suffice. To extract the parameters passed to an existing function, you could write a wrapper with match.call, but it's hardly practical to reconstruct every function, and the call you capture will look funny anyway unless you overwrite existing functions. As long as the function isn't primitive, you could use quote to enable the formals approach, though:
cl <- quote(rnorm(5, 2))
user_args <- as.list(cl[-1]) # subset call to only args
fun_args <- formals(as.character(cl[1])) # subset call to only function
names(user_args) <- names(fun_args)[seq(length(user_args))]
fun_args[names(user_args)] <- user_args
str(fun_args)
#> List of 3
#> $ n : num 5
#> $ mean: num 2
#> $ sd : num 1
Another approach is to use rlang, whose functions handle primitives well (fn_fmls(sum)), can extract parts of the call easily and reliably (lang_fn, lang_args), name unnamed parameters accurately (lang_standardize), and more. Together with purrr's new list_modify (dev version), it all becomes fairly painless:
library(rlang)
fun_call <- quo(rnorm(5))
fun_call
#> <quosure: frame>
#> ~rnorm(5)
default_args <- fn_fmls(lang_fn(fun_call))
str(default_args)
#> Dotted pair list of 3
#> $ n : symbol
#> $ mean: num 0
#> $ sd : num 1
user_args <- lang_args(lang_standardise(fun_call))
str(user_args)
#> List of 1
#> $ n: num 5
calling_args <- purrr::list_modify(default_args, user_args)
str(calling_args)
#> Dotted pair list of 3
#> $ n : num 5
#> $ mean: num 0
#> $ sd : num 1
From this question, I was wondering if it's possible to extract the Quadratic discriminant analysis (QDA's) scores and reuse them after like PCA scores.
## follow example from ?lda
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
Sp = rep(c("s","c","v"), rep(50,3)))
set.seed(1) ## remove this line if you want it to be pseudo random
train <- sample(1:150, 75)
table(Iris$Sp[train])
## your answer may differ
## c s v
## 22 23 30
Using the QDA here
z <- qda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
## get the whole prediction object
pred <- predict(z)
## show first few sample scores on LDs
Here, you can see that it's not working.
head(pred$x)
# NULL
plot(LD2 ~ LD1, data = pred$x)
# Error in eval(expr, envir, enclos) : object 'LD2' not found
NOTE: Too long/formatted for a comment. NOT AN ANSWER
You may want to try the rrcov package:
library(rrcov)
z <- QdaCov(Sp ~ ., Iris[train,], prior = c(1,1,1)/3)
pred <- predict(z)
str(pred)
## Formal class 'PredictQda' [package "rrcov"] with 4 slots
## ..# classification: Factor w/ 3 levels "c","s","v": 2 2 2 1 3 2 2 1 3 2 ...
## ..# posterior : num [1:41, 1:3] 5.84e-45 5.28e-50 1.16e-25 1.00 1.48e-03 ...
## ..# x : num [1:41, 1:3] -97.15 -109.44 -54.03 2.9 -3.37 ...
## ..# ct : 'table' int [1:3, 1:3] 13 0 1 0 16 0 0 0 11
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ Actual : chr [1:3] "c" "s" "v"
## .. .. ..$ Predicted: chr [1:3] "c" "s" "v"
It also has robust PCA methods that may be useful.
Unfortunately, not every model in R conforms to the same object structure/API and this won't be a linear model, so it is unlikely to conform to linear model fit structure APIs.
There's an example of how to visualize the qda results here — http://ramhiser.com/2013/07/02/a-brief-look-at-mixture-discriminant-analysis/
And, you can do:
library(klaR)
partimat(Sp ~ ., data=Iris, method="qda", subset=train)
for a partition plot of the qda results.
I am applying the CCR Data Envelopment Analysis model to benchmark between stock data. To do that I am running R code from a DEA paper published here. This document comes with step by step instructions on how to implement the model below in R.
The mathematical formulation looks like this:
Finding the model I needed already made for me seemed too good to be true. I am getting this error when I run it:
Error in match.names(clabs, nmi) : names do not match previous names
Traceback:
4 stop("names do not match previous names")
3 match.names(clabs, nmi)
2 rbind(deparse.level, ...)
1 rbind(aux, c(inputs[i, ], rep(0, m)))
My test data looks as follows:
> dput(testdfst)
structure(list(Name = structure(1:10, .Label = c("Stock1", "Stock2",
"Stock3", "Stock4", "Stock5", "Stock6", "Stock7", "Stock8", "Stock9",
"Stock10"), class = "factor"), Date = structure(c(14917, 14917,
14917, 14917, 14917, 14917, 14917, 14917, 14917, 14917), class = "Date"),
`(Intercept)` = c(0.454991569278089, 1, 0, 0.459437188169979,
0.520523252955415, 0.827294243132907, 0.642696631099892,
0.166219881886161, 0.086341470900152, 0.882092217743293),
rmrf = c(0.373075150411683, 0.0349067218712968, 0.895550280607866,
1, 0.180151549474574, 0.28669170468735, 0.0939821798173586,
0, 0.269645291515763, 0.0900619760898984), smb = c(0.764987877309785,
0.509094491489323, 0.933653313048327, 0.355340700554647,
0.654000372286503, 1, 0, 0.221454091364611, 0.660571586102851,
0.545086931342479), hml = c(0.100608151187926, 0.155064872867367,
1, 0.464298576152336, 0.110803875258027, 0.0720803195598597,
0, 0.132407005239869, 0.059742053684015, 0.0661623383303703
), rmw = c(0.544512524466665, 0.0761995312858816, 1, 0, 0.507699534880555,
0.590607506295898, 0.460148690870041, 0.451871218073951,
0.801698199214685, 0.429094840372901), cma = c(0.671162426988512,
0.658898571758625, 0, 0.695830176886926, 0.567814542084284,
0.942862571603074, 1, 0.37571611336359, 0.72565234813082,
0.636762557753099), Returns = c(0.601347600017365, 0.806071701848376,
0.187500487065719, 0.602971876359073, 0.470386289298666,
0.655773224143057, 0.414258177255333, 0, 0.266112191477882,
1)), .Names = c("Name", "Date", "(Intercept)", "rmrf", "smb",
"hml", "rmw", "cma", "Returns"), row.names = c("Stock1.2010-11-04",
"Stock2.2010-11-04", "Stock3.2010-11-04", "Stock4.2010-11-04",
"Stock5.2010-11-04", "Stock6.2010-11-04", "Stock7.2010-11-04",
"Stock8.2010-11-04", "Stock9.2010-11-04", "Stock10.2010-11-04"
), class = "data.frame")
And the linear model program is this:
namesDMU <- testdfst[1]
inputs <- testdfst[c(4,5,6,7,8)]
outputs <- testdfst[9]
N <- dim(testdfst)[1] # number of DMU
s <- dim(inputs)[2] # number of inputs
m <- dim(outputs)[2] # number of outputs
f.rhs <- c(rep(0,N),1) # RHS constraints
f.dir <- c(rep("<=",N),"=") # directions of the constraints
aux <- cbind(-1*inputs,outputs) # matrix of constraint coefficients in (6)
for (i in 1:N) {
f.obj <- c(0*rep(1,s),outputs[i,]) # objective function coefficients
f.con <- rbind(aux ,c(inputs[i,], rep(0,m))) # add LHS of bTz=1
results <-lp("max",f.obj,f.con,f.dir,f.rhs,scale=1,compute.sens=TRUE) # solve LPP
multipliers <- results$solution # input and output weights
efficiency <- results$objval # efficiency score
duals <- results$duals # shadow prices
if (i==1) {
weights <- multipliers
effcrs <- efficiency
lambdas <- duals [seq(1,N)]
} else {
weights <- rbind(weights,multipliers)
effcrs <- rbind(effcrs , efficiency)
lambdas <- rbind(lambdas,duals[seq(1,N)])
}
}
Spotting the problem..
A quick search reveals that the rbind function might be at fault. This is located on this line:
f.con <- rbind(aux ,c(inputs[i,], rep(0,m)))
I tried to isolate the data from the loops to see what the problem is:
aux <- cbind(-1*inputs,outputs)
a <- c(inputs[1,])
b <- rep(0,m)
> aux
rmrf smb hml rmw cma Returns
1 -0.37307515 -0.7649879 -0.10060815 -0.54451252 -0.6711624 0.6013476
2 -0.03490672 -0.5090945 -0.15506487 -0.07619953 -0.6588986 0.8060717
3 -0.89555028 -0.9336533 -1.00000000 -1.00000000 0.0000000 0.1875005
4 -1.00000000 -0.3553407 -0.46429858 0.00000000 -0.6958302 0.6029719
5 -0.18015155 -0.6540004 -0.11080388 -0.50769953 -0.5678145 0.4703863
6 -0.28669170 -1.0000000 -0.07208032 -0.59060751 -0.9428626 0.6557732
7 -0.09398218 0.0000000 0.00000000 -0.46014869 -1.0000000 0.4142582
8 0.00000000 -0.2214541 -0.13240701 -0.45187122 -0.3757161 0.0000000
9 -0.26964529 -0.6605716 -0.05974205 -0.80169820 -0.7256523 0.2661122
10 -0.09006198 -0.5450869 -0.06616234 -0.42909484 -0.6367626 1.0000000
> a
$rmrf
[1] 0.3730752
$smb
[1] 0.7649879
$hml
[1] 0.1006082
$rmw
[1] 0.5445125
$cma
[1] 0.6711624
I also looked at this:
> identical(names(aux[1]), names(a[1]))
[1] TRUE
Column and row names are unimportant to me as long as the problem is calculated so I decided to try remove them. This one works but doesn't solve the problem.
rownames(testdfst) <- NULL
Looking at the contents of a and aux, maybe the problem lies with the column names.
colnames(testdfst) <- NULL does not work. It deletes everything in my data-frame. It could maybe... provide a solution to the problem if I can figure out how to remove the column names.
As you correctly identified, the following line is giving you the trouble:
i <- 1
f.con <- rbind(aux ,c(inputs[i,], rep(0,m))) # add LHS of bTz=1
# Error in match.names(clabs, nmi) : names do not match previous names
You can use the str function to see the structure of each element of this expression:
str(aux)
# 'data.frame': 10 obs. of 6 variables:
# $ rmrf : num -0.3731 -0.0349 -0.8956 -1 -0.1802 ...
# $ smb : num -0.765 -0.509 -0.934 -0.355 -0.654 ...
# $ hml : num -0.101 -0.155 -1 -0.464 -0.111 ...
# $ rmw : num -0.5445 -0.0762 -1 0 -0.5077 ...
# $ cma : num -0.671 -0.659 0 -0.696 -0.568 ...
# $ Returns: num 0.601 0.806 0.188 0.603 0.47 ...
str(inputs[i,])
# 'data.frame': 1 obs. of 5 variables:
# $ rmrf: num 0.373
# $ smb : num 0.765
# $ hml : num 0.101
# $ rmw : num 0.545
# $ cma : num 0.671
str(c(inputs[i,], rep(0, m)))
# List of 6
# $ rmrf: num 0.373
# $ smb : num 0.765
# $ hml : num 0.101
# $ rmw : num 0.545
# $ cma : num 0.671
# $ : num 0
Now you can see that the list you are trying to combine with rbind has different names from the data frame it's being combined with. Probably the simplest way to proceed would be to pass a vector as the new row instead of a list, which you can accomplish by converting inputs[i,] to a matrix with as.matrix:
str(c(as.matrix(inputs[i,]), rep(0, m)))
# num [1:6] 0.373 0.765 0.101 0.545 0.671 ...
This will cause the code to work without an error:
f.con <- rbind(aux, c(as.matrix(inputs[i,]), rep(0, m)))
A few unsolicited R coding tips -- instead of dim(x)[1] and dim(x)[2] to get the number of rows and columns, most would find it more readable to do nrow(x) and ncol(x). Also, building objects in a for loop by rbinding one row at a time can be very inefficient -- you can read more about that in the second circle of the R Inferno.
Thanks to Hadley's plyr package ddply function we can take a dataframe, break it down into subdataframes by factors, send each to a function, and then combine the function results for each subdataframe into a new dataframe.
But what if the function returns an object of a class like glm or in my case, a c("glm", "lm"). Then, these can't be combined into a dataframe can they? I get this error instead
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class 'c("glm", "lm")' into a data.frame
Is there some more flexible data structure that will accommodate all the complex glm class results of my function calls, preserving the information regarding the dataframe subsets?
Or should this be done in an entirely different way?
Just to expand my comment: plyr has set of functions to combine input and output type. So when you function returns something inconvertible to data.frame you should use list as output. So instead of using ddply use dlply.
When you want to do something on each model and convert results to data.frame then ldply is the key.
Lets create some models using dlply
list_of_models <- dlply(warpbreaks, .(tension), function(X) lm(breaks~wool, data=X))
str(list_of_models, 1)
# List of 3
# $ L:List of 13
# ..- attr(*, "class")= chr "lm"
# $ M:List of 13
# ..- attr(*, "class")= chr "lm"
# $ H:List of 13
# ..- attr(*, "class")= chr "lm"
# - attr(*, "split_type")= chr "data.frame"
# - attr(*, "split_labels")='data.frame': 3 obs. of 1 variable:
It gives list of three lm models.
Using ldply you could create a data.frame, e.g.
with predictions of each model:
ldply(list_of_models, function(model) {
data.frame(fit=predict(model, warpbreaks))
})
# tension fit
# 1 L 44.5556
# 2 L 44.5556
# 3 L 44.5556
with statistics to each model:
ldply(list_of_models, function(model) {
c(
aic = extractAIC(model),
deviance = deviance(model),
logLik = logLik(model),
confint = confint(model),
coef = coef(model)
)
})
# tension aic1 aic2 deviance logLik confint1 confint2 confint3 confint4 coef.(Intercept) coef.woolB
# 1 L 2 98.3291 3397.78 -72.7054 34.2580 -30.89623 54.8531 -1.77044 44.5556 -16.33333
# 2 M 2 81.1948 1311.56 -64.1383 17.6022 -4.27003 30.3978 13.82559 24.0000 4.77778
# 3 H 2 76.9457 1035.78 -62.0137 18.8701 -13.81829 30.2411 2.26273 24.5556 -5.77778