I have a question concerning the smacofSym function in the Smacof package. I am using R version 3.1.0 through RStudio Version 0.98.501.
I am using the following command:
MDSdata <- smacofSym(DJaccardMatrix, ndim=2, metric=FALSE, verbose=TRUE)
I've included details of the data I'm using (DJaccardMatrix) below. Every time I run smacofSym, I end up with a final configuration in which the points lie almost on top of each other. Here is a sample of the results:
MDSdata$conf
D1 D2
1 0.06259624 -0.01494732
2 0.06276541 -0.01480409
3 0.06266933 -0.01492375
4 0.06262438 -0.01496111
5 0.06243336 -0.01496193
6 0.06258047 -0.01502270
7 0.06247747 -0.01500037 .......
To check the results I ran the same matrix in XLStat and got what I was expecting: a much more widely distributed set of points. After looking at some of the other help requests, I've tried passing the data to smacofSym as both a matrix and a dist object, but neither has affected the results.
Here is my info on DJaccardMatrix as a matrix:
num [1:121, 1:121] 0 0.969 0.679 0.704 0.939 ...
attr(*, "dimnames")=List of 2
..$ : chr [1:121] "1" "2" "3" "4" ...
..$ : chr [1:121] "1" "2" "3" "4" ...
Here is my info on DJaccardMatrix as a dist object:
Class 'dist' atomic [1:7260] 0.969 0.679 0.704 0.939 0.8 ...
..- attr(*, "Size")= int 121
..- attr(*, "call")= language as.dist.default(m = dissmat)
..- attr(*, "Diag")= logi FALSE
..- attr(*, "Upper")= logi FALSE
I'm thankful for any recommendations people have. I am assuming it is something very basic, but I am definitely not finding it. (On a side note - feel free to ignore this because it's concerning interpretation - what is the relation between the nonmetric stress that smacof reports and Kruskal's stress? Is there any?)
This answer concerns your side question in parentheses at the end: "what is the relation between the nonmetric stress that smacof reports and Kruskal's stress"
Kruskal's stress (or Stress-1) is the square root of the nonmetric stress (stress.nm) reported by smacof.
So, if you have a model called mod obtained by running smacofSym:
Stress-1 = mod$stress.nm^.5
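As a minimal sketch of that conversion, assuming stress.nm behaves as described above (the value 0.0625 is made up for illustration and does not come from the question's data; stress_nm stands in for mod$stress.nm):

```r
# Hypothetical nonmetric stress value, standing in for mod$stress.nm
stress_nm <- 0.0625

# Stress-1 is taken as the square root of the nonmetric stress
stress1 <- sqrt(stress_nm)
print(stress1)
```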
I have n matrices to which I am trying to apply nearPD() from the Matrix package.
I have done this using the following code:
A<-lapply(b, nearPD)
where b is the list of n matrices.
I now would like to convert the list A into matrices. For an individual matrix I would use the following code:
A<-matrix(runif(n*n),ncol = n)
PD_mat_A<-nearPD(A)
B<-as.matrix(PD_mat_A$mat)
But I am trying to do this with a list. I have tried the following code but it doesn't seem to work:
d<-lapply(c, as.matrix($mat))
Any help would be appreciated. Thank you.
Here is some code so you can try to reproduce this:
n<-10
generate<-function (n){
matrix(runif(10*10),ncol = 10)
}
b<-lapply(1:n, generate)
Here is the simplest method using as.matrix as noted by @nicola in the comments below and (a version using apply) by @cimentadaj in the comments above:
d <- lapply(A, function(i) as.matrix(i$mat))
My original answer, exploiting the nearPD data structure was
With a little fiddling with the nearPD object type, here is an extraction method:
d <- lapply(A, function(i) matrix(i$mat@x, ncol=i$mat@Dim[2]))
Below is some commentary on how I arrived at my answer.
This object is fairly complicated as str(A[[1]]) returns
List of 7
$ mat :Formal class 'dpoMatrix' [package "Matrix"] with 5 slots
.. ..@ x : num [1:100] 0.652 0.477 0.447 0.464 0.568 ...
.. ..@ Dim : int [1:2] 10 10
.. ..@ Dimnames:List of 2
.. .. ..$ : NULL
.. .. ..$ : NULL
.. ..@ uplo : chr "U"
.. ..@ factors : list()
$ eigenvalues: num [1:10] 4.817 0.858 0.603 0.214 0.15 ...
$ corr : logi FALSE
$ normF : num 1.63
$ iterations : num 2
$ rel.tol : num 0
$ converged : logi TRUE
- attr(*, "class")= chr "nearPD"
You are interested in the "mat" which is accessed by $mat. The @ symbols show that "mat" is an S4 object and its components are accessed using @. The components of interest are "x", the matrix content, and "Dim", the dimension of the matrix. The code above puts this information together to extract the matrices from the list of "nearPD" objects.
Below is a brief explanation of why as.matrix works in this case. Note the matrix inside a nearPD object is not a matrix:
is.matrix(A[[1]]$mat)
[1] FALSE
However, it is a "Matrix":
class(A[[1]]$mat)
[1] "dpoMatrix"
attr(,"package")
[1] "Matrix"
From the note in the help file, help("as.matrix,Matrix-method"),
Loading the Matrix namespace “overloads” as.matrix and as.array in the base namespace by the equivalent of function(x) as(x, "matrix"). Consequently, as.matrix(m) or as.array(m) will properly work when m inherits from the "Matrix" class.
So, the Matrix package is taking care of the as.matrix conversion "under the hood."
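Putting the pieces together, here is a small end-to-end sketch of both extraction methods. It assumes the Matrix package is installed; the 4 x 4 size, the seed, and the names d_slots and res are arbitrary choices for illustration rather than anything from the question.

```r
library(Matrix)

set.seed(1)

# Three random 4 x 4 matrices, analogous to the list b in the question
b <- lapply(1:3, function(i) matrix(runif(16), ncol = 4))

# Apply nearPD() to each matrix; each result is a list of class "nearPD"
A <- lapply(b, nearPD)

# Method 1: let the Matrix package convert the $mat component
d <- lapply(A, function(res) as.matrix(res$mat))

# Method 2: rebuild the matrix from the S4 slots x and Dim
d_slots <- lapply(A, function(res) matrix(res$mat@x, ncol = res$mat@Dim[2]))

# Both approaches should give ordinary base R matrices with the same entries
print(is.matrix(d[[1]]))
print(all.equal(d[[1]], d_slots[[1]], check.attributes = FALSE))
```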
I want to first calculate a Markov transition matrix and then raise it to a power. To achieve the first goal I use the markovchainFit function from the markovchain package, and it returns a data.frame rather than a matrix. So I need to convert it to a matrix before I take the exponent.
My R code snippet is like
#################################
# Estimate Transition Matrix #
#################################
setwd("G:/Data_backup/GDP_per_Capita")
library("foreign")
library("Hmisc")
mydata <- stata.get("G:/Data_backup/GDP_per_Capita/states.dta")
mydata
library(markovchain)
library(expm)
rgdp_e=mydata[,2:7]
rgdp_o=mydata[,8:13]
createSequenceMatrix(rgdp_e)
rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
rgdp_e_trans<-as.matrix(rgdp_e_trans)
is.matrix(rgdp_e_trans)
rgdp_e_trans %^% 1/5
rgdp_e_trans is a data frame, and I try to convert it to a numeric matrix. It seems to work when I test it using the is.matrix command. However, the final line gives me an error saying
Error in rgdp_e_trans %^% 2 :
(list) object cannot be coerced to type 'double'
After some searching on Stack Overflow, I found this question with a similar problem and used rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans)) to coerce the object to 'double', but it does not seem to work.
Besides, the data.frame rgdp_e_trans contains no factors or characters.
The output in the console is like
> rgdp_e=mydata[,2:7]
> rgdp_o=mydata[,8:13]
> createSequenceMatrix(rgdp_e)
Error: not compatible with STRSXP
> rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
> rgdp_e_trans
$estimate
1 2 3 4 5
1 0.6172840 0.18930041 0.09053498 0.074074074 0.02880658
2 0.1125828 0.59602649 0.28476821 0.006622517 0.00000000
3 0.0000000 0.03846154 0.60256410 0.358974359 0.00000000
4 0.0000000 0.01162791 0.03488372 0.691860465 0.26162791
5 0.0000000 0.00000000 0.00000000 0.044247788 0.95575221
> rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
Error: (list) object cannot be coerced to type 'double'
> rgdp_e_trans<-as.matrix(rgdp_e_trans)
> is.matrix(rgdp_e_trans)
[1] TRUE
> rgdp_e_trans %^% 1/5
Error in rgdp_e_trans %^% 1 :
(list) object cannot be coerced to type 'double'
>
Any suggestion to fix the problem, or alternative way to calculate the exponent ? Thank you.
Additional:
> str(rgdp_e_trans)
List of 1
$ estimate:Formal class 'markovchain' [package "markovchain"] with 4 slots
.. ..@ states : chr [1:5] "1" "2" "3" "4" ...
.. ..@ byrow : logi TRUE
.. ..@ transitionMatrix: num [1:5, 1:5] 0.617 0.113 0 0 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. .. .. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. ..@ name : chr "Bootstrap Mc"
and I comment out the as.matrix part
rgdp_e=mydata[,2:7]
rgdp_o=mydata[,8:13]
createSequenceMatrix(rgdp_e)
rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
rgdp_e_trans
str(rgdp_e_trans)
# rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
# rgdp_e_trans<-as.matrix(rgdp_e_trans)
# is.matrix(rgdp_e_trans)
rgdp_e_trans$estimate %^% 1/5
You can access the transition matrix directly from the object returned by markovchainFit as:
rgdp_e_trans$estimate@transitionMatrix
Here rgdp_e_trans is your return value from markovchainFit, which is actually a list containing the information from the fitting process. You access the estimate item of that list by using the $ operator. The estimate object is from a formal S4 class (see e.g. Advanced R by Hadley Wickham for a description of the object systems used in R), which is why you have to use the @ operator to access its slots instead of the standard $ used for the more common S3 objects.
If you print out the return value of as.matrix(rgdp_e_trans) it should be immediately obvious where your initial approach went wrong. In general it's a good idea to check the structure of an object with the str function - instead of relying on its print method - when you encounter unexpected results or are working with new types of objects.
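The $-then-@ access pattern can be tried without the markovchain package. The sketch below uses a hypothetical stand-in class DemoChain and a made-up 2-state matrix; neither comes from the question. It also illustrates a separate point: R's %any% operators bind more tightly than /, so rgdp_e_trans %^% 1/5 in the question is parsed as (rgdp_e_trans %^% 1)/5, which matches the "%^% 1" shown in the error message.

```r
library(methods)

# Hypothetical stand-in for the S4 'markovchain' class: a single matrix slot
setClass("DemoChain", representation(transitionMatrix = "matrix"))

# Made-up 2-state transition matrix (each row sums to 1)
P <- matrix(c(0.9, 0.1,
              0.5, 0.5), nrow = 2, byrow = TRUE)

# Mimic the list returned by the fitting function
fit <- list(estimate = new("DemoChain", transitionMatrix = P))

# $ picks the list element, @ picks the S4 slot
tm <- fit$estimate@transitionMatrix
print(is.matrix(tm))

# Two-step transition probabilities via ordinary matrix multiplication
tm2 <- tm %*% tm
print(tm2)

# %any% operators bind tighter than '/', shown here with the built-in %%
left  <- 7 %% 4 / 2     # parsed as (7 %% 4) / 2
right <- 7 %% (4 / 2)   # explicit parentheses change the result
print(c(left, right))
```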
I'm using the function tsbootstrap() from the package tseries to generate block bootstrap samples, and to calculate the standard errors for the estimate of the parameters of a regime-switching autoregressive model (which I can obtain using the function msmFit() from the package MSwM).
Here is the code I'm using. Firstly I define a function for the statistic I want to use:
switching.par <- function(z) {
n<-length(z)
x<-z[1:(n-1)]
y<-z[2:n]
my.xy<- data.frame(x,y)
mod<-lm(y~x,data=my.xy)
mod.mswm=msmFit(mod,k=2,sw=c(T,T,T))
b1 <- mod.mswm@Coef[1,2]
b2 <- mod.mswm@Coef[2,2]
c1 <- mod.mswm@Coef[1,1]
c2 <- mod.mswm@Coef[2,1]
del1 <- mod.mswm@std[1]
del2 <- mod.mswm@std[2]
parameters<-c(b1, b2, c1, c2, del1, del2)
names(parameters)<-c("b1", "b2", "c1", "c2", "del1", "del2")
parameters
}
And then I use the tsbootstrap() function (where x is a monthly time series of 10-year US government bonds)
use.boot <- tsbootstrap(x, nb=1000, statistic=switching.par, type="block", b=9)
But I get the following error message:
Error in solve.default(res$Hessian) :
Lapack routine dgesv: system is exactly singular: U[4,4] = 0
Any insight on how to fix this problem? I think the error comes from the function msmFit() of the package.
The error as you correctly understood comes from the msmFit function that fails to converge.
I will give a bit of insight as to the error and then provide a solution that worked for me:
An error from solve.default is common when an optimiser is being used. Usually the optimiser (such as optim) will try to calculate the Hessian matrix in order to 'direct' itself to the optimal solution that minimises the underlying function. At some point the Hessian matrix needs to be inverted, and if it is singular the algorithm crashes. Practically this means that the optimiser failed to converge (i.e. couldn't find a solution).
This can be because of a number of reasons:
Too few observations
Bad starting values
Bad design of the function to be optimised (or used in the function)
Low number of maximum iterations
Literally no solution for the problem
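To see the inversion failure described above in isolation, here is a minimal sketch with a deliberately singular matrix. The matrix H is made up and has nothing to do with the Hessian that msmFit computes; it only shows that solve() stops with an error when no inverse exists.

```r
# A 2 x 2 matrix whose second row is twice the first, so it has no inverse
H <- matrix(c(1, 2,
              2, 4), nrow = 2, byrow = TRUE)

# solve() raises an error for a singular matrix; capture it instead of stopping
res <- tryCatch(solve(H), error = function(e) e)

if (inherits(res, "error")) {
  message("Inversion failed: ", conditionMessage(res))
}
```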
Now let's go to your problem:
It seems that the default maximum number of iterations for msmFit is 100. Try increasing that to 500 as I do below. The design of the function seems OK to me. Now let's turn to the low number of observations. The b argument of the tsbootstrap function, as far as I understand from the documentation, is a value that controls the observations that go to the switching.par function. Setting it to 9 makes the msmFit function fail. I increased that to 50 (assuming that your df has 50 observations; anything less than that will probably fail all the time). Also, having it produce 1000 bootstrap series will take a day to compute (at least on my machine).
With all the above in mind, the following seems to work on my machine (for just 10 bootstrap series) and it took ages.
switching.par <- function(z) {
n<-length(z)
x<-z[1:(n-1)]
y<-z[2:n]
my.xy<- data.frame(x,y)
mod<-lm(y~x,data=my.xy)
mod.mswm=msmFit(mod,k=2,sw=c(T,T,T), control=list(maxiter=500))
b1 <- mod.mswm@Coef[1,2]
b2 <- mod.mswm@Coef[2,2]
c1 <- mod.mswm@Coef[1,1]
c2 <- mod.mswm@Coef[2,1]
del1 <- mod.mswm@std[1]
del2 <- mod.mswm@std[2]
parameters<-c(b1, b2, c1, c2, del1, del2)
names(parameters)<-c("b1", "b2", "c1", "c2", "del1", "del2")
parameters
}
use.boot <- tsbootstrap(x, nb=10, statistic=switching.par, type="block", b=50)
Output:
> str(use.boot)
List of 5
$ statistic : num [1:10, 1:6] -0.0275 -0.1692 -0.0199 -0.0587 -0.0763 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:6] "t1" "t2" "t3" "t4" ...
$ orig.statistic: Named num [1:6] 0.0114 -0.1002 0.5155 0.5319 0.2868 ...
..- attr(*, "names")= chr [1:6] "t1" "t2" "t3" "t4" ...
$ bias : Named num [1:6] -0.2029 0.2041 0.0307 -0.0217 -0.036 ...
..- attr(*, "names")= chr [1:6] "t1" "t2" "t3" "t4" ...
$ se : Named num [1:6] 0.2076 0.1774 0.1686 0.1375 0.0533 ...
..- attr(*, "names")= chr [1:6] "t1" "t2" "t3" "t4" ...
$ call : language tsbootstrap(x = x, nb = 10, statistic = switching.par, b = 50, type = "block")
- attr(*, "class")= chr "resample.statistic"
Whenever I use any sort of HTTP command via the system() function in R studio, the rainbow circle of death appears and I have to force-quit R Studio. Up until now, I've written a bunch of checks to make sure a user isn't in R Studio before using an HTTP command (which I use a ton to access data), but it's quite a pain, and it would be fantastic to get to the root of the problem.
e.g.
system("http get http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M")
causes R studio to crash. Oddly, on another laptop of mine, such commands don't crash R Studio but cause the following error: 'sh: http: command not found', even though http is installed and works fine when using the terminal.
Does anybody know how to fix this problem / why it happens / does it occur for you guys too? Although I know a lot about R, I'm afraid I have no idea how to try to fix this problem.
Thanks!!!
Using http from the httpie package hangs RStudio (but not plain terminal R) on my Linux system (your rainbow circle implies it's a Mac?), so I'm getting the same behaviour as you.
Installing and using wget works for me:
system("wget -O /tmp/data.out http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M")
Or you could try R's native download.file function. There's a whole bunch of other functions for getting stuff off the web - see the Web Task View http://cran.r-project.org/web/views/WebTechnologies.html
I've not seen this http command used much, so maybe it's flaky. Or maybe it's opening stdin...
Yes... Try this:
system("http get http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M >/tmp/data2.out </dev/null" )
I think http is opening stdin, the Unix standard input channel, and RStudio isn't sending anything to it. So it waits. If you explicitly assign http's stdin as /dev/null then http completes. This works for me in RStudio.
However, I still prefer wget or curl-based solutions!
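The stdin behaviour can be illustrated without httpie. In this sketch, cat is only a stand-in for any command that reads standard input; it assumes a Unix-like shell where /dev/null exists. Because stdin is redirected from /dev/null, the command sees end-of-file immediately and returns instead of waiting.

```r
# 'cat' with no file argument reads from stdin until end-of-file.
# Redirecting stdin from /dev/null supplies that end-of-file straight away,
# so the call returns at once with no output.
out <- system("cat </dev/null", intern = TRUE)
print(length(out))
```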
Without more contextual information regarding the RStudio version / operating system, it is hard to do more than suggest an alternative approach that avoids the use of system().
Instead you could use RCurl and getURL
library(RCurl)
getURL('http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M')
#[1] "{\"status\":\"REQUEST_SUCCEEDED\",\"responseTime\":129,\"message\":[],\"Results\":{\n\"series\":\n[{\"seriesID\":\"CXUALCBEVGLB0101M\",\"data\":[{\"year\":\"2013\",\"period\":\"A01\",\"periodName\":\"Annual\",\"value\":\"445\",\"footnotes\":[{}]},{\"year\":\"2012\",\"period\":\"A01\",\"periodName\":\"Annual\",\"value\":\"451\",\"footnotes\":[{}]},{\"year\":\"2011\",\"period\":\"A01\",\"periodName\":\"Annual\",\"value\":\"456\",\"footnotes\":[{}]}]}]\n}}"
You could also use PUT, GET, POST, etc directly in R, abstracted from RCurl by the httr package:
library(httr)
tmp <- GET("http://api.bls.gov/publicAPI/v1/timeseries/data/CXUALCBEVGLB0101M")
dat <- content(tmp, as="parsed")
str(dat)
## List of 4
## $ status : chr "REQUEST_SUCCEEDED"
## $ responseTime: num 27
## $ message : list()
## $ Results :List of 1
## ..$ series:'data.frame': 1 obs. of 2 variables:
## .. ..$ seriesID: chr "CXUALCBEVGLB0101M"
## .. ..$ data :List of 1
## .. .. ..$ :'data.frame': 3 obs. of 5 variables:
## .. .. .. ..$ year : chr [1:3] "2013" "2012" "2011"
## .. .. .. ..$ period : chr [1:3] "A01" "A01" "A01"
## .. .. .. ..$ periodName: chr [1:3] "Annual" "Annual" "Annual"
## .. .. .. ..$ value : chr [1:3] "445" "451" "456"
## .. .. .. ..$ footnotes :List of 3
## .. .. .. .. ..$ :'data.frame': 1 obs. of 0 variables
## .. .. .. .. ..$ :'data.frame': 1 obs. of 0 variables
## .. .. .. .. ..$ :'data.frame': 1 obs. of 0 variables
The mixOmics package is meant to analyze big data sets (e.g. from high throughput experiments), but it does not seem to be working with my big matrix.
I am having issues with both rcc (regularized canonical correlation) and tune.rcc (lambda parameter estimation for regularized canonical correlation).
> str(Y)
num [1:13, 1:17766] ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:13] ...
..$ : chr [1:17766] ...
> str(X)
num [1:13, 1:26] ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:13] ...
..$ : chr [1:26] ...
tune.rcc(X, Y, grid1 = seq(0.001, 1, length = 5),
grid2 = seq(0.001, 1, length = 5),
validation = "loo", plt=F)
On Mavericks: runs forever (I quit R after hours)
Since I know Mavericks is problematic I've tried it on a Windows8 machine and on the mixOmics web interface.
On Windows 8:
Error: cannot allocate vector of size 2.4 Gb
On web interface, since it is not possible to estimate lambdas (tune.rcc) I tried rcc with "some" lambdas and get:
Error: cannot allocate vector of size 2.4 Gb
Am I doing something obviously wrong?
Any help very much appreciated.
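A back-of-the-envelope check, offered as an assumption rather than a confirmed diagnosis: if the computation forms a dense square matrix over the 17766 columns of Y (for example a covariance matrix), the size of that single matrix of doubles can be worked out as follows. Only the column count comes from the str(Y) output above; the variable names are illustrative.

```r
p <- 17766            # number of columns of Y, from str(Y)
bytes <- p^2 * 8      # a p x p matrix of doubles, 8 bytes per entry
gib <- bytes / 1024^3 # convert bytes to units of 1024^3 bytes
print(round(gib, 1))
```

If the result is close to the 2.4 Gb in the error message, that would suggest the allocation failure comes from the number of columns in Y rather than from anything wrong in the call itself.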