When using the mulrank function from the WRS package, I get a p-value of zero:
$test.stat
[1] 44.50749
$nu1
[1] 3.330729
$p.value
[,1]
[1,] 0
$N
[1] 98
$q.hat
[,1] [,2] [,3]
[1,] 0.4260204 0.6738095 0.6554422
[2,] 0.6619048 0.1530612 0.1530612
[3,] 0.4974490 0.5928571 0.6323129
Is this a reasonable output?
Also, when using the cmanova function, I get this error message on the same data:
Error in X[i, ] - X[ii, ] : non-numeric argument to binary operator
I am using the RcppHMM package to build a GHMM (multivariate Gaussian mixture HMM) with continuous observations.
I want to fit it with the EM algorithm, using continuous observations whose sequences have different lengths.
Specifically, each observation sequence has a length between 3 and 6.
I tried to fit the model on the whole dataset at once (I built the dataset with ncol = 6, the maximum sequence length, and padded the empty part with zeros), but it didn't work,
so I split the observations into groups of equal length [O3, O4, O5, O6]
and updated the model with each group in turn.
Each observation group looks like this:
O3
[,1] [,2] [,3]
[1,] 0.8550940 0.3231340 0.8639223
[2,] 0.4453262 0.5840305 0.4356958
[3,] 0.4344789 -1.2234760 0.4344789
[4,] -0.5003085 3.0322560 -0.5003085
[5,] -0.1459598 -0.4661041 -0.1459598
[6,] -0.1977263 -0.6352724 -0.1977263
O4
[,1] [,2] [,3] [,4]
[1,] 0.8965332 0.3338220 0.7270241 0.8824540
[2,] 0.4033438 0.4131293 0.1593136 0.4187023
[3,] -0.7329015 -1.6828296 -0.1550487 -0.1550487
[4,] -0.3213490 7.3449076 -0.2787857 -0.2787857
[5,] -0.2868067 -0.3743332 -0.1340566 -0.1340566
[6,] 2.6832742 -0.5844305 0.2320774 0.2320774
O5
[,1] [,2] [,3] [,4] [,5]
[1,] 0.83401341 0.2492370 0.47493190 0.6440035 0.84985396
[2,] 0.37988234 0.2335883 0.17043570 0.2116066 0.36260248
[3,] -0.05240445 -0.3034002 -0.05240445 -0.3034002 -0.05240445
[4,] -0.37240867 1.1500528 -0.37240867 1.1500528 -0.37240867
[5,] -0.02056839 0.9343497 -0.02056839 0.9343497 -0.02056839
[6,] -0.27586584 -0.4406833 -0.27586584 -0.4406833 -0.27586584
O6
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.9287066 0.35065802 0.4493442 0.6142040 0.7423286 0.9217381
[2,] 0.3852644 0.09612516 0.1623447 0.1320334 0.1875127 0.3928661
[3,] 0.1436024 -0.08326038 0.7800491 0.1436024 0.1926751 0.1436024
[4,] -0.4284304 -0.27916609 -0.5224586 -0.4284304 0.1267840 -0.4284304
[5,] -0.8846364 -0.81131525 -0.1781479 -0.8846364 -0.1266250 -0.8846364
[6,] -0.2141231 -0.78377461 -0.4440142 -0.2141231 -0.7888260 -0.2141231
nrow is the dimension of each observation, and ncol is the sequence length.
When I updated the model with the first group, which has sequence length 3, it ran.
But when I tried to update the model again with the second group, which has sequence length 4, the following warning appeared:
In learnEM(newModel, O4[, 1:4, ], iter = 20, delta = 1e-05, print = TRUE) :
It is recommended to have a covariance matrix with a determinant bigger than 1/ ((2*PI)^k) .
Does anyone know how to fix this warning message?
And is there a proper way to run the EM algorithm on observations with different sequence lengths using this package?
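To put a number on the bound quoted in the warning, here is a minimal base R sketch. It assumes that k in the message is the dimension of one observation (6 here), which the warning itself does not state, and it uses the plain sample covariance of O3 as a stand-in for whatever covariance matrices the model holds internally. It makes no RcppHMM calls, so it only illustrates the arithmetic and does not diagnose the package.

```r
# O3 copied from the question: 6 dimensions (rows) x 3 time points (columns)
O3 <- matrix(c(
   0.8550940,  0.3231340,  0.8639223,
   0.4453262,  0.5840305,  0.4356958,
   0.4344789, -1.2234760,  0.4344789,
  -0.5003085,  3.0322560, -0.5003085,
  -0.1459598, -0.4661041, -0.1459598,
  -0.1977263, -0.6352724, -0.1977263
), nrow = 6, byrow = TRUE)

# Assumption: k is the observation dimension
k <- nrow(O3)
threshold <- 1 / ((2 * pi)^k)

# Stand-in covariance: sample covariance across the 3 time points.
# With fewer time points than dimensions it cannot have full rank.
S <- cov(t(O3))
d <- det(S)

cat("threshold 1/((2*pi)^k):", threshold, "\n")
cat("det(S):", d, "\n")
cat("numerical rank of S:", qr(S)$rank, "of", k, "\n")
```

A covariance estimated from only a handful of time points in 6 dimensions is rank-deficient, so its determinant sits far below the bound. Whether that is what triggers the warning during learnEM is a guess on my part.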
I have the matrix X:
> X
[,1] [,2] [,3] [,4] [,5]
[1,] 0.02253900 -0.012803512 -0.013251695 -0.01728001 0.07287110
[2,] 0.07233855 -0.004631113 -0.010130339 -0.01441094 0.06592686
[3,] 0.05094459 0.030918198 0.032321927 0.01459335 0.02315130
[4,] 0.05484819 -0.019442784 -0.017389669 0.01044847 0.05890890
[5,] 0.02164396 -0.030857845 0.007139042 -0.08033237 -0.02356664
[6,] 0.02388358 -0.007537327 0.018435093 -0.01349781 0.04029035
> class(X)
"matrix"
What I would like is to make a scatterplot in which the values of each column of X are plotted at that column's own x value (1 to 5).
Here is my failed attempt:
ggplot(aes(x = 1:5, y = X)) + geom_point()
Here's one base R solution:
plot(rep(1:ncol(X), each = nrow(X)), X)
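Since the attempt in the question used ggplot, the same idea can be sketched there as well: ggplot wants a data frame, so the matrix is first unrolled into one row per cell. The data below are random placeholders with the same shape as X, and the names `long`, `column` and `value` are made up for this example. The plotting step is guarded so the reshaping still runs if ggplot2 is not installed.

```r
# Placeholder data with the same shape as X in the question (6 x 5)
set.seed(1)
X <- matrix(rnorm(30, sd = 0.03), nrow = 6, ncol = 5)

# One row per matrix cell: the column index and the cell value
long <- data.frame(column = as.vector(col(X)), value = as.vector(X))

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = column, y = value)) +
    ggplot2::geom_point()
  print(p)
}
```

`col(X)` returns a matrix of column indices with the same shape as X, so flattening both with `as.vector()` keeps each value paired with its column.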
I have a matrix of n variables and I want to make a new matrix containing the pairwise differences between the columns, excluding the difference of a column with itself. Here is an example of the data.
Transportation.services Recreational.goods.and.vehicles Recreation.services Other.services
2.958003 -0.25983789 5.526694 2.8912009
2.857370 -0.03425164 5.312857 2.9698044
2.352275 0.30536569 4.596742 2.9190123
2.093233 0.65920773 4.192716 3.2567390
1.991406 0.92246531 3.963058 3.6298314
2.065791 1.06120930 3.692287 3.4422340
I tried running the for loop below, but I'm aware that R is very slow with loops.
Difference.Matrix<- function(data){
n<-2
new.cols="New Columns"
list = list()
for (i in 1:ncol(data)){
for (j in n:ncol(data)){
name <- paste("diff",i,j,data[,i],data[,j],sep=".")
new<- data[,i]-data[,j]
list[[new.cols]]<-c(name)
data<-merge(data,new)
}
n= n+1
}
results<-list(data=data)
return(results)
}
As I said, the code runs very slowly and has not even finished a single run yet. I apologize for the beginner-level coding. I am also aware that this code leaves the original data in the matrix, but I can delete it later.
Is it possible for me to use an apply function or foreach on this data?
You can find the pairs with combn and use apply to create the result:
apply(combn(ncol(d), 2), 2, function(x) d[,x[1]] - d[,x[2]])
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 3.217841 -2.568691 0.0668021 -5.786532 -3.151039 2.6354931
## [2,] 2.891622 -2.455487 -0.1124344 -5.347109 -3.004056 2.3430526
## [3,] 2.046909 -2.244467 -0.5667373 -4.291376 -2.613647 1.6777297
## [4,] 1.434025 -2.099483 -1.1635060 -3.533508 -2.597531 0.9359770
## [5,] 1.068941 -1.971652 -1.6384254 -3.040593 -2.707366 0.3332266
## [6,] 1.004582 -1.626496 -1.3764430 -2.631078 -2.381025 0.2500530
You can add appropriate names with another apply. Here the column names are very long, which impairs the formatting, but the labels tell what differences are in each column:
x <- apply(combn(ncol(d), 2), 2, function(x) d[,x[1]] - d[,x[2]])
colnames(x) <- apply(combn(ncol(d), 2), 2, function(x) paste(colnames(d)[x], collapse=' - '))
> x
Transportation.services - Recreational.goods.and.vehicles Transportation.services - Recreation.services
[1,] 3.217841 -2.568691
[2,] 2.891622 -2.455487
[3,] 2.046909 -2.244467
[4,] 1.434025 -2.099483
[5,] 1.068941 -1.971652
[6,] 1.004582 -1.626496
Transportation.services - Other.services Recreational.goods.and.vehicles - Recreation.services
[1,] 0.0668021 -5.786532
[2,] -0.1124344 -5.347109
[3,] -0.5667373 -4.291376
[4,] -1.1635060 -3.533508
[5,] -1.6384254 -3.040593
[6,] -1.3764430 -2.631078
Recreational.goods.and.vehicles - Other.services Recreation.services - Other.services
[1,] -3.151039 2.6354931
[2,] -3.004056 2.3430526
[3,] -2.613647 1.6777297
[4,] -2.597531 0.9359770
[5,] -2.707366 0.3332266
[6,] -2.381025 0.2500530
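For a quick check of the approach above, here is a small self-contained sketch on made-up data (the matrix `d` and its column names `A`, `B`, `C` are placeholders, not the data from the question). It uses the `FUN` argument of `combn()`, so the differences are produced in a single call without the outer `apply()`.

```r
# Made-up data: three columns whose differences are easy to verify by eye
d <- cbind(A = c(1, 2, 3), B = c(10, 20, 30), C = c(100, 200, 300))

# All unordered column pairs, one pair per column of `pairs`
pairs <- combn(ncol(d), 2)

# combn() applies FUN to each pair and binds the results column-wise
diffs <- combn(ncol(d), 2, FUN = function(ij) d[, ij[1]] - d[, ij[2]])

# Label each result column with the two source columns it came from
colnames(diffs) <- paste(colnames(d)[pairs[1, ]],
                         colnames(d)[pairs[2, ]], sep = " - ")
print(diffs)
```

With n columns this yields choose(n, 2) difference columns, so each pair appears once and no column is differenced with itself.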
When I execute the following my "predictors" dataset is populated correctly:
library(rhdf5)
library(forecast)
library(sltl)
library(tseries)
fid <- H5Fcreate(output_file)
## TODO: compute the order p
p <- 4
# write predictors
h5createDataset(output_file, dataset="predictors", c(p, length(tsstl.remainder) - (p - 1)), storage.mode='double')
predictors <- as.matrix(tsstl.remainder)
for (i in 1:(p - 1)) {
predictors <- as.matrix(cbind(predictors, Lag(as.matrix(tsstl.remainder), i)))
}
predictors <- as.matrix(predictors[-1:-(p-1),])
head(predictors)
h5write(predictors, output_file, name="predictors")
H5Fclose(fid)
The generated (correct) output for head(predictors) is:
[,1] [,2] [,3] [,4]
[1,] 0.3089645 6.7722063 5.1895389 5.2323261
[2,] 8.7607228 0.3089645 6.7722063 5.1895389
[3,] -0.9411553 8.7607228 0.3089645 6.7722063
[4,] -14.1390243 -0.9411553 8.7607228 0.3089645
[5,] -26.6605296 -14.1390243 -0.9411553 8.7607228
[6,] -8.1293076 -26.6605296 -14.1390243 -0.9411553
However, when I read it the results are not correct:
tsmatrix <- t(as.matrix(h5read(output_file, "predictors")))
head(tsmatrix)
Incorrectly outputs:
[,1] [,2] [,3] [,4]
[1,] 0.3089645 8.760723 -0.9411553 -14.13902
[2,] -26.6605296 -8.129308 -9.8687675 31.52086
[3,] 54.2703126 43.902489 31.8164836 43.87957
[4,] 22.1260636 36.733055 54.7064107 56.35158
[5,] 36.3919851 25.193068 48.2244464 57.12196
[6,] 48.0585673 72.402673 68.3265518 80.18960
How come what I write does not correspond to what I get back? I double-checked, and the HDFView HDF5 viewer also shows these incorrect values for the "predictors" dataset.
What is wrong here?
From the rhdf5 docs:
Please note, that arrays appear as transposed matrices when opening it
with a C-program (h5dump or HDFView). This is due to the fact the
fastest changing dimension on C is the last one, but on R it is the
first one (as in Fortran).
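One more thing that may be worth checking, beyond the quoted passage: the dataset in the question is created with dimensions c(p, n), while predictors is n x p. The sketch below uses only base R and makes no rhdf5 calls. It assumes, without confirmation from the documentation, that writing an n x p matrix into a p x n dataset keeps the values in R's column-major order and simply gives them the declared shape. The stand-in matrix and the names `stored` and `read_back` are made up for the illustration.

```r
n <- 6
p <- 4

# Stand-in for `predictors`: an n x p matrix whose entries are easy to trace
predictors <- matrix(seq_len(n * p), nrow = n, ncol = p)

# Assumed effect of storing n x p values in a container declared as p x n:
# the same values in column-major order, under a different shape
stored <- array(as.vector(predictors), dim = c(p, n))

# Reading back and transposing, as in the question
read_back <- t(stored)
print(read_back)

# Row 1 of the result is the first p entries of column 1 of the original
print(predictors[1:p, 1])

# With matching dimensions the values keep their positions
stored_ok <- array(as.vector(predictors), dim = dim(predictors))
print(identical(stored_ok, predictors))
```

This matches the pattern in the question, where the first row read back (0.3089645, 8.760723, -0.9411553, -14.13902) is the top of the first column of the correct output. If the assumption holds, creating the dataset with dim(predictors) and dropping the t() on read would be the thing to try.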
I have a list prob with 50 elements. Each element is a 601x3 matrix of probabilities, each row of which represents a complete sample space (i.e., each row of each matrix sums to 1). For instance, here are the first five rows of the first element of prob:
> prob[[1]][1:5,]
[,1] [,2] [,3]
[1,] 0.6027004 0.3655563 0.03174335
[2,] 0.6013667 0.3665756 0.03205767
[3,] 0.6000306 0.3675946 0.03237481
[4,] 0.5986921 0.3686131 0.03269480
[5,] 0.5973513 0.3696311 0.03301765
Now, what I want to do is to create the following matrix for each row of each matrix/element in the list prob. Taking the first row, let a = .603, b = .366, and c = .032 (rounding to three decimal places). Then,
> w
[,1] [,2] [,3]
[1,] a*(1-a) -a*b -a*c
[2,] -b*a b*(1-b) -b*c
[3,] -c*a -c*b c*(1-c)
Such that:
> w
[,1] [,2] [,3]
[1,] 0.239391 -0.220698 -0.019296
[2,] -0.220698 0.232044 -0.011712
[3,] -0.019296 -0.011712 0.030976
I want to obtain a similar 3x3 matrix 600 more times (for the rest of the rows of this matrix) and then to repeat this entire process 49 more times for the rest of the elements of prob. The only thing I can think of is to call apply within lapply so that I am accessing each row of each matrix one-at-a-time. I'm sure that is not an elegant way to do this (not to mention I can't get it to work), but I can't think of anything else. Can anyone help me out with this? I'd also love to hear suggestions for using a different structure (e.g., is it bad to use matrices within lists?).
Running this process with lapply on a list of similarly dimensioned matrices should be very simple. If it represents a challenge, then you should post the dput(.) output for a two-element list with similar matrices. The real challenge is doing the processing row by row, which is illustrated here with M standing for the five rows of prob[[1]] shown in the question. apply returns a 9xN matrix with one column per row of M, which is then reshaped into a 3x3xN array:
w <- apply(M, 1, function(rw) diag( rw*(1-rw) ) +
rbind( rw*c(0, -rw[1], -rw[1] ),
rw*c(-rw[2],0, -rw[2] ),
rw*c(-rw[3], -rw[3], 0)
)
)
w
[,1] [,2] [,3] [,4] [,5]
[1,] 0.23945263 0.23972479 0.23999388 0.24025987 0.24052272
[2,] -0.22032093 -0.22044636 -0.22056801 -0.22068575 -0.22079962
[3,] -0.01913173 -0.01927842 -0.01942588 -0.01957412 -0.01972314
[4,] -0.22032093 -0.22044636 -0.22056801 -0.22068575 -0.22079962
[5,] 0.23192489 0.23219793 0.23246881 0.23273748 0.23300395
[6,] -0.01160398 -0.01175156 -0.01190081 -0.01205173 -0.01220435
[7,] -0.01913173 -0.01927842 -0.01942588 -0.01957412 -0.01972314
[8,] -0.01160398 -0.01175156 -0.01190081 -0.01205173 -0.01220435
[9,] 0.03073571 0.03102998 0.03132668 0.03162585 0.03192748
w <- array(w, c(3,3,5) )
w
, , 1
[,1] [,2] [,3]
[1,] 0.23945263 -0.22032093 -0.01913173
[2,] -0.22032093 0.23192489 -0.01160398
[3,] -0.01913173 -0.01160398 0.03073571
, , 2
[,1] [,2] [,3]
[1,] 0.23972479 -0.22044636 -0.01927842
[2,] -0.22044636 0.23219793 -0.01175156
[3,] -0.01927842 -0.01175156 0.03102998
.... snipped remaining output
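The same matrix can also be written as diag(p) - p p^T, which gives a shorter per-row function and extends directly to the whole list. This is a sketch under assumptions: `cov_from_row` and `w_list` are names made up here, and the two-row `M` is typed in from the first rows shown in the question, standing in for the real 601x3 matrices.

```r
# Per-row matrix: p_i * (1 - p_i) on the diagonal, -p_i * p_j elsewhere
cov_from_row <- function(p) diag(p) - tcrossprod(p)

# Stand-in for one element of `prob`, using the first two rows from the question
M <- rbind(c(0.6027004, 0.3655563, 0.03174335),
           c(0.6013667, 0.3665756, 0.03205767))
prob <- list(M, M)  # stand-in for the 50-element list

# For each matrix, build a k x k x nrow array with one slice per row
w_list <- lapply(prob, function(m) {
  array(apply(m, 1, cov_from_row), dim = c(ncol(m), ncol(m), nrow(m)))
})

print(w_list[[1]][, , 1])
```

A list of arrays keeps the per-element structure of prob. Note that `diag(p)` only builds a diagonal matrix when p has length greater than one, which holds for these three-category rows.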