as.matrix(A$mat) for a given list A - r

I have n matrices to which I am trying to apply nearPD() from the Matrix package.
I have done this using the following code:
A<-lapply(b, nearPD)
where b is the list of n matrices.
I now would like to convert the list A into matrices. For an individual matrix I would use the following code:
A<-matrix(runif(n*n),ncol = n)
PD_mat_A<-nearPD(A)
B<-as.matrix(PD_mat_A$mat)
But I am trying to do this with a list. I have tried the following code but it doesn't seem to work:
d<-lapply(c, as.matrix($mat))
Any help would be appreciated. Thank you.
Here is some code so you can try to reproduce this:
n<-10
generate<-function (n){
matrix(runif(10*10),ncol = 10)
}
b<-lapply(1:n, generate)

Here is the simplest method, using as.matrix, as noted by @nicola in the comments below and (in a version using apply) by @cimentadaj in the comments above:
d <- lapply(A, function(i) as.matrix(i$mat))
My original answer, which exploits the nearPD data structure, follows.
With a little fiddling with the nearPD object type, here is an extraction method:
d <- lapply(A, function(i) matrix(i$mat@x, ncol=i$mat@Dim[2]))
Below is some commentary on how I arrived at my answer.
This object is fairly complicated as str(A[[1]]) returns
List of 7
$ mat :Formal class 'dpoMatrix' [package "Matrix"] with 5 slots
.. ..@ x : num [1:100] 0.652 0.477 0.447 0.464 0.568 ...
.. ..@ Dim : int [1:2] 10 10
.. ..@ Dimnames:List of 2
.. .. ..$ : NULL
.. .. ..$ : NULL
.. ..@ uplo : chr "U"
.. ..@ factors : list()
$ eigenvalues: num [1:10] 4.817 0.858 0.603 0.214 0.15 ...
$ corr : logi FALSE
$ normF : num 1.63
$ iterations : num 2
$ rel.tol : num 0
$ converged : logi TRUE
- attr(*, "class")= chr "nearPD"
You are interested in "mat", which is accessed by $mat. The @ symbols show that "mat" is an S4 object, and its components (slots) are accessed using @. The slots of interest are "x", the matrix content, and "Dim", the dimensions of the matrix. The code above puts this information together to extract the matrices from the list of "nearPD" objects.
Below is a brief explanation of why as.matrix works in this case. Note the matrix inside a nearPD object is not a matrix:
is.matrix(A[[1]]$mat)
[1] FALSE
However, it is a "Matrix":
class(A[[1]]$mat)
[1] "dpoMatrix"
attr(,"package")
[1] "Matrix"
From the note in the help file, help("as.matrix,Matrix-method"),
Loading the Matrix namespace “overloads” as.matrix and as.array in the base namespace by the equivalent of function(x) as(x, "matrix"). Consequently, as.matrix(m) or as.array(m) will properly work when m inherits from the "Matrix" class.
So, the Matrix package is taking care of the as.matrix conversion "under the hood."
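To make the conversion concrete, here is a small self-contained sketch. It assumes the Matrix package is installed; the seed, the matrix size (4) and the number of matrices (3) are arbitrary choices for illustration, not values from the question.

```r
library(Matrix)

set.seed(42)                      # arbitrary seed, only for reproducibility
n <- 4                            # small illustrative dimension
b <- lapply(1:3, function(i) matrix(runif(n * n), ncol = n))

# nearPD() returns a "nearPD" object (a list); the matrix itself is in $mat
A <- lapply(b, nearPD)

# Convert each S4 "Matrix" object into a base R matrix
d <- lapply(A, function(i) as.matrix(i$mat))

is.matrix(d[[1]])                 # check that the result is a base matrix
dim(d[[1]])
```

Because as.matrix goes through the Matrix package's own coercion, this route does not depend on how the object stores its values internally, which is one reason to prefer it over reading the slots by hand.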

Related

Error in asMethod(object): Cholmod error 'problem too large'

I have the following object
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:120671481] 0 2 3 6 10 13 21 22 25 36 ...
..@ p : int [1:51366] 0 3024 4536 8694 3302271 3302649 5715381 5756541 5784009 5801691 ...
..@ Dim : int [1:2] 10314738 51365
..@ Dimnames:List of 2
.. ..$ : chr [1:10314738] "line1" "line2" "line3" "line4" ...
.. ..$ : chr [1:51365] "sparito" "davide," "15enne" "di" ...
.. .. ..- attr(*, ".match.hash")=Class 'match.hash' <externalptr>
..@ x : num [1:120671481] 1 1 1 1 1 1 1 1 1 1 ...
..@ factors : list()
This object comes from the function dtm_builder of text2map package. Since I would like to remove empty rows from the matrix, I thought about using the command:
raw.sum=apply(dtm,1,FUN=sum) # sum of each row of the table
dtm2=dtm[raw.sum!=0,]
Anyway, I obtained the following error:
Error in asMethod(object): Cholmod error 'problem too large' at file ..
How could I fix it?
The short answer to your problem is that you're likely converting a sparse object to a dense object. Matrix package sparse matrix classes are very memory efficient when a matrix has a lot of zeros (like a DTM) by simply not allocating memory for the zeros.
@akrun's answer should work, but note that there is a rowSums function in base R and a rowSums function from the Matrix package. You would need to load the Matrix package first.
Here is an example dgCMatrix (note not loading Matrix package yet)
m1 <- Matrix::Matrix(1:9, 3, 3, sparse = TRUE)
m1[1, 1:3] <- 0
class(m1)
If we use the base R rowSums you get the error:
rowSums(m1)
Error in rowSums(m1): 'x' must be an array of at least two dimensions
If the Matrix package is loaded, rowSums will be replaced with the Matrix package's own method, which works with dgCMatrix. This is also true for the bracket operator [. If you update text2map to version 0.1.5, Matrix is loaded by default.
That is a massive DTM, so you may still run into memory issues -- which will depend on your machine. One thing to note is that removing sparse rows/columns will not help much. So, although words that occur once or twice will make up about 60% of your columns, you will reduce the size in terms of memory more by removing the most frequent words (i.e. words with a number in every row).
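As a rough sketch of dropping empty rows while staying sparse throughout, the following uses a tiny made-up matrix in place of the real DTM. It assumes the Matrix package is installed and that all counts are non-negative, so a row sum of zero means the row is empty. The names m1, keep and m2 are illustrative only.

```r
library(Matrix)

# Tiny stand-in for a document-term matrix; the first row is empty
m1 <- Matrix(c(0, 0, 0,
               1, 0, 2,
               0, 3, 0), nrow = 3, ncol = 3, byrow = TRUE, sparse = TRUE)

# Matrix::rowSums works on the sparse representation directly
keep <- Matrix::rowSums(m1) != 0

# Subsetting with [ keeps the result sparse once Matrix is loaded
m2 <- m1[keep, , drop = FALSE]

dim(m2)
class(m2)
```

The point of the design is that neither step calls apply() or any other function that would first coerce the sparse matrix to a dense one.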

complex nested list to a clean matrix in R?

I have struggled for two long days to find a way to create a specific matrix from a nested list.
First of all, I am sorry if I don't explain my issue correctly I am one week new to StackOverflow* and R (and programming...)!
I use a file that you can find there :
original link: https://parltrack.org/dumps/ep_mep_activities.json.lz
Uncompressed by me here: https://wetransfer.com/downloads/701b7ac5250f451c6cb26d29b41bd88020200808183632/bb08429ca5102e3dc277f2f44d08f82220200808183652/666973
First 3 lists and the last one (out of 23905) pasted here: https://pastebin.com/Kq7mjis5
With rjson, I have a nested list like this :
Nested list of MEP Votes
List of 23905
$ :List of 7
..$ ts : chr "2004-12-16T11:49:02"
..$ url : chr "http://www.europarl.europa.eu/RegData/seance_pleniere/proces_verbal/2004/12-16/votes_nominaux/xml/P6_PV(2004)12-16(RCV)_XC.xml"
..$ voteid : num 7829
..$ title : chr "Projet de budget général 2005 modifié - bloc 3"
..$ votes :List of 3
.. ..$ +:List of 2
.. .. ..$ total : num 45
.. .. ..$ groups:List of 6
.. .. .. ..$ ALDE :List of 1
.. .. .. .. ..$ : Named num 4404
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
.. .. .. ..$ GUE/NGL:List of 25
.. .. .. .. ..$ : Named num 28469
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
.. .. .. .. ..$ : Named num 4298
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
then my goal is to have something like this :
final matrix
First I would like to keep only the lists (from [[1]] to [[23905]]) containing $vote$+$groups$Renew or $vote$-$groups$Renew or $vote$'0'$groups$Renew. The main list (the 23905) are registered votes. My work is on the Renew group so my only interest is to have a vote where the Renew groups exist to compare them with other groups.
After that, my goal is to create a matrix like this from all the [[x]] where groups$Renew exists:
final matrix
V1 V2 (not mandatory) V3[[x]]$voteid
[mepid==666] GUE/NGL + (mepid==[666] is found in [[1]]$vote$+$groups$GUE/NGL)
[mepid==777] Renew - (mepid==[777] is found in [[1]]$vote$-$groups$GUE/NGL)
I want to create a matrix so I can process the votes of each MEP (referenced by their MEPid). Their votes are either + (for yea), - (for nay) or 0 (for abstain). Moreover, I would like to have political groups of MEP displayed in the column next to their mepid. We can find their political group thanks to the place where their votes are stored. If the mepid is shown in the list [[x]]$vote$+$groups$GUE/NGL she or he belongs to the GUE/NGL groups.
What I want to do might look like this
# Clean the nested list
Keep Vote[[x]] if Vote[[x]] list contain ,
$vote$+$groups$Renew,
or $vote$-$groups$Renew,
or $vote$'0'$groups$Renew
# Create the matrix (or a data.frame if it is easier)
VoteMatrix <- as.matrix(
V1 = all "mepid" found in the nested list
V2 = groups (name of the list where we can find the mepid) (not mandatory)
V3 to Vy = If.else(mepid is in [[x]]$vote$+ then “+”,
mepid is in [[x]]$vote$- then “-“, "0")
)
Thank you in advance,
*Nevertheless, I am reading this website actively since I started R!
You can see that the 'votes' sublist is composed of three items, each a list of member numbers stored within what I think are party designators. Here's how you might "straighten" the positive voters' 'mepid' values by party:
str( unlist( sapply(names(jlis[[1]]$votes$'+'$groups), function(x) unlist(jlis[[1]]$votes$'+'$groups[[x]]) ) ) )
Named num [1:104] 28268 4514 28841 28314 28241 ...
- attr(*, "names")= chr [1:104] "ALDE.mepid" "ALDE.mepid" "ALDE.mepid" "ALDE.mepid" ...
You get a named numeric vector with 104 entries. Perhaps this will demonstrate what sort of terminology to use in better describing your desired result. (Just giving a partial schema for the desired result leaves way too much ambiguity to support a fully formed request.)
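If a two-column layout (member id plus group) is closer to what is wanted, one possible sketch is shown here. The groups list is a hand-made stand-in shaped like votes$'+'$groups in the question's str output, and it assumes every entry holds exactly one mepid; the name flat is hypothetical.

```r
# Hand-made stand-in for jlis[[1]]$votes$'+'$groups
groups <- list(
  ALDE      = list(c(mepid = 4404)),
  `GUE/NGL` = list(c(mepid = 28469), c(mepid = 4298))
)

# One row per member: the id, and the name of the sublist it was found in
flat <- data.frame(
  mepid = unlist(groups, use.names = FALSE),
  group = rep(names(groups), lengths(groups)),
  stringsAsFactors = FALSE
)

flat
```

rep(names(groups), lengths(groups)) repeats each group name once per member, which only lines up with the unlisted ids under the one-mepid-per-entry assumption stated above.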
I do NOT see the number 23905 anywhere in what I downloaded from your link. We are clearly looking at different data. I see this for the timestamp: chr "2004-12-01T15:20:31". I'm not going to cut you any slack for not knowing R, since the task needs to be fully explained in a natural language. I will cut you slack regarding grammar if English is not your native tongue, but you definitely need to make a better effort at explication. This is what I see for the names with the votes$'+'$groups sublists of the first three items, but since RENEW is not in any of them there's not a lot that could be demonstrated about picking items:
> names( jlis[[1]]$votes$'+'$groups)
[1] "ALDE" "GUE/NGL" "IND/DEM" "NI" "PPE-DE" "PSE" "UEN"
> names( jlis[[2]]$votes$'+'$groups)
[1] "GUE/NGL" "IND/DEM" "NI" "PPE-DE"
> names( jlis[[3]]$votes$'+'$groups)
[1] "ALDE" "GUE/NGL" "IND/DEM" "NI" "PPE-DE" "PSE" "UEN" "Verts/ALE"
Furthermore, when I looked at all of the possible votes values using this method (for all three of the items you made available) I still see no RENEW names.
sapply( jlis[[1]]$votes[c("+","-","0")], function(x) names(x$groups) )
After second edit: Here's the next step of isolating those votes that contain a "Renew" value. I'm assuming that it's possible to have a "Renew" value in only one of the three possible 'votes' values (+, -, 0). If not (and there are always "Renew" values in each of them when there is one in any of them) then you might be able to simplify the logic. We make three logical vectors:
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['0']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['+']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['-']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
And then wrap them in a matrix call with 3 columns, take the maximum of each row (the maximum of c(TRUE,FALSE) is 1), and convert back to logical.
selection_vec = as.logical( apply( matrix( c(
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['0']][['groups']]) } ),
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['+']][['groups']]) } ),
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['-']][['groups']]) } ) ),
ncol=3 ), 1,max))
> selection_vec
[1] FALSE FALSE FALSE TRUE
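The same selection can also be written with any() over the three vote keys, which avoids the intermediate matrix. This is only a sketch on a two-element, hand-made MEPVotes list with the same nesting as the question's data; the helper name has_renew is hypothetical.

```r
# Hand-made stand-in: the first vote has no Renew group, the second has one under "-"
MEPVotes <- list(
  list(voteid = 1,
       votes = list(`+` = list(groups = list(ALDE = list(c(mepid = 1)))))),
  list(voteid = 2,
       votes = list(`-` = list(groups = list(Renew = list(c(mepid = 2))))))
)

# TRUE when "Renew" appears among the group names of "+", "-" or "0";
# a missing key yields NULL, and "Renew" %in% NULL is FALSE
has_renew <- vapply(MEPVotes, function(v) {
  any(vapply(c("+", "-", "0"),
             function(k) "Renew" %in% names(v$votes[[k]]$groups),
             logical(1)))
}, logical(1))

renew_votes <- MEPVotes[has_renew]
length(renew_votes)
```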

For new objects in R, how to assign values through subscripting?

For new objects in R, how can I specify assignments to subscripted elements? As in object[3] <- new value. Here is a specific example of the problem I have.
# Rectangle example:
Rectangle <- function(a, b,...){
R <- list(a=a, b=b, others=list(...))
structure(R, class="Rectangle")
}#
`[.Rectangle` <- function(R,ind){
if(ind==1) return(R$a)
if(ind==2) return(R$b)
if(ind>=3) return(R$others[[ind-2]])
}#
R <- Rectangle(2,3,"other1","other2")
> R[1]; R[2]; R[3]; R[4];
[1] 2
[1] 3
[1] "other1"
[1] "other2"
> R[4] <- "new.other";
> R[1]; R[2]; R[3]; R[4];
[1] 2
[1] 3
[1] "other1"
[1] "other2"
Clearly, the assignment to the subscripted object hasn't worked. I would like to know the syntax to define such assignments properly. That is, I would need an example for the following:
`[<-.Rectangle` <- function(){ }
Thank you very much.
To override subset-assign, your function needs to accept three arguments (x, index, value) and return the modified object. It is important that the third parameter is called exactly value, since R internally calls the function using that name (rather than positionally).
Here’s an example:
`[<-.Rectangle` = function (x, index, value) {
if (index == 1L) {
x$a = value
}
else if (index == 2L) {
x$b = value
}
else {
x$others[[index - 2L]] = value
}
x
}
It probably goes without saying that this is pretty convoluted logic; I'm not convinced that real-world code should have objects with such an API.
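For completeness, here is the question's constructor and subset method put together with a subset-assign method of the shape described above, followed by a short usage check. It is a sketch for scalar indices only, like the original.

```r
Rectangle <- function(a, b, ...) {
  structure(list(a = a, b = b, others = list(...)), class = "Rectangle")
}

# Subset method from the question
`[.Rectangle` <- function(R, ind) {
  if (ind == 1) return(R$a)
  if (ind == 2) return(R$b)
  R$others[[ind - 2]]
}

# Subset-assign method: the third argument must be named `value`
`[<-.Rectangle` <- function(x, index, value) {
  if (index == 1L) {
    x$a <- value
  } else if (index == 2L) {
    x$b <- value
  } else {
    x$others[[index - 2L]] <- value
  }
  x   # return the modified object
}

R <- Rectangle(2, 3, "other1", "other2")
R[4] <- "new.other"
R[1] <- 10
R[4]
R[1]
```

With the method defined, R[4] <- "new.other" is evaluated as a call to `[<-.Rectangle`(R, 4, value = "new.other") and the result is assigned back to R.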
Maybe this helps you:
> str(R)
List of 3
$ a : num 2
$ b : num 3
$ others:List of 2
..$ : chr "other1"
..$ : chr "other2"
- attr(*, "class")= chr "Rectangle"
> R[4]='hello'
> str(R)
List of 4
$ a : num 2
$ b : num 3
$ others:List of 2
..$ : chr "other1"
..$ : chr "other2"
$ : chr "hello"
- attr(*, "class")= chr "Rectangle"
> R[4]
[1] "other2"
> R[[4]]
[1] "hello"

Access transition matrix from markovchainFit object

I want to first calculate a Markov transition matrix and then raise it to a power. To achieve the first goal I use the markovchainFit function in the markovchain package, and it returns a data.frame rather than a matrix. So I need to convert it to a matrix before I take the power.
My R code snippet is like
#################################
# Estimate Transition Matrix #
#################################
setwd("G:/Data_backup/GDP_per_Capita")
library("foreign")
library("Hmisc")
mydata <- stata.get("G:/Data_backup/GDP_per_Capita/states.dta")
mydata
library(markovchain)
library(expm)
rgdp_e=mydata[,2:7]
rgdp_o=mydata[,8:13]
createSequenceMatrix(rgdp_e)
rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
rgdp_e_trans<-as.matrix(rgdp_e_trans)
is.matrix(rgdp_e_trans)
rgdp_e_trans %^% 1/5
rgdp_e_trans is a data frame, and I try to convert it to a numeric matrix. It seems to work when I test it with is.matrix. However, the final line gives me an error:
Error in rgdp_e_trans %^% 2 :
(list) object cannot be coerced to type 'double'
After some searching on Stack Overflow, I found this question describing a similar problem, and used rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans)) to coerce the object to 'double', but it does not seem to work.
Besides, the data.frame rgdp_e_trans contains no factor or characters
The output in the console is like
> rgdp_e=mydata[,2:7]
> rgdp_o=mydata[,8:13]
> createSequenceMatrix(rgdp_e)
Error: not compatible with STRSXP
> rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
> rgdp_e_trans
$estimate
1 2 3 4 5
1 0.6172840 0.18930041 0.09053498 0.074074074 0.02880658
2 0.1125828 0.59602649 0.28476821 0.006622517 0.00000000
3 0.0000000 0.03846154 0.60256410 0.358974359 0.00000000
4 0.0000000 0.01162791 0.03488372 0.691860465 0.26162791
5 0.0000000 0.00000000 0.00000000 0.044247788 0.95575221
> rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
Error: (list) object cannot be coerced to type 'double'
> rgdp_e_trans<-as.matrix(rgdp_e_trans)
> is.matrix(rgdp_e_trans)
[1] TRUE
> rgdp_e_trans %^% 1/5
Error in rgdp_e_trans %^% 1 :
(list) object cannot be coerced to type 'double'
>
Any suggestion to fix the problem, or alternative way to calculate the exponent ? Thank you.
Additional:
> str(rgdp_e_trans)
List of 1
$ estimate:Formal class 'markovchain' [package "markovchain"] with 4 slots
.. ..@ states : chr [1:5] "1" "2" "3" "4" ...
.. ..@ byrow : logi TRUE
.. ..@ transitionMatrix: num [1:5, 1:5] 0.617 0.113 0 0 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. .. .. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. ..@ name : chr "Bootstrap Mc"
and I comment out the as.matrix part
rgdp_e=mydata[,2:7]
rgdp_o=mydata[,8:13]
createSequenceMatrix(rgdp_e)
rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
rgdp_e_trans
str(rgdp_e_trans)
# rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
# rgdp_e_trans<-as.matrix(rgdp_e_trans)
# is.matrix(rgdp_e_trans)
rgdp_e_trans$estimate %^% 1/5
You can access the transition matrix directly from the object returned by markovchainFit as:
rgdp_e_trans$estimate@transitionMatrix
Here rgdp_e_trans is your return value from markovchainFit, which is actually a list containing the information from the fitting process. You access the estimate item of that list with the $ operator. The estimate object is from a formal S4 class (see e.g. Advanced R by Hadley Wickham for a description of the object systems used in R), which is why you have to use the @ operator to access its slots, instead of the standard $ used for the more common S3 objects.
If you print out the return value of as.matrix(rgdp_e_trans) it should be immediately obvious where your initial approach went wrong. In general it's a good idea to check the structure of an object with the str function - instead of relying on its print method - when you encounter unexpected results or are working with new types of objects.
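The access pattern can be tried without the markovchain package by mimicking the structure shown in the str output. In this sketch the class DemoChain and the 2x2 matrix are made up for illustration; only the list-then-slot shape mirrors the real object. Note also that special operators such as %^% bind more tightly than /, so x %^% 1/5 is read as (x %^% 1)/5, which matches the "Error in rgdp_e_trans %^% 1" message in the question.

```r
library(methods)

# Hypothetical S4 class standing in for the 'markovchain' class
setClass("DemoChain",
         representation(states = "character", transitionMatrix = "matrix"))

P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE,
            dimnames = list(c("1", "2"), c("1", "2")))

# Same shape as the markovchainFit result: a list with an S4 object in $estimate
fit <- list(estimate = new("DemoChain", states = c("1", "2"),
                           transitionMatrix = P))

tm <- fit$estimate@transitionMatrix   # $ for the list item, @ for the S4 slot
is.matrix(tm)

tm2 <- tm %*% tm                      # two-step transition probabilities
tm2
```

Once the plain matrix has been extracted, it can be handed to whatever matrix-power routine is preferred; the repeated %*% here is just the simplest base R illustration.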

R dataframe define column names at creation

I get monthly price value for the two assets below from Yahoo:
if(!require("tseries") | !require(its) ) { install.packages(c("tseries", 'its')); require("tseries"); require(its) }
startDate <- as.Date("2000-01-01", format="%Y-%m-%d")
MSFT.prices = get.hist.quote(instrument="msft", start= startDate,
quote="AdjClose", provider="yahoo", origin="1970-01-01",
compression="m", retclass="its")
SP500.prices = get.hist.quote(instrument="^gspc", start=startDate,
quote="AdjClose", provider="yahoo", origin="1970-01-01",
compression="m", retclass="its")
I want to put these two into a single data frame with specified columnames (Pandas allows this now - a bit ironic since they take the data.frame concept from R). As below, I assign the two time series with names:
MSFTSP500.prices <- data.frame(msft = MSFT.prices, sp500= SP500.prices )
However, this does not preserve the column names [msft, sp500] I have assigned. I need to define column names in a separate line of code:
colnames(MSFTSP500.prices) <- c("msft", "sp500")
I tried to put colnames and col.names inside the data.frame() call but it doesn't work. How can I define column names while creating the data frame?
I found ?data.frame very unhelpful...
The code fails with an error message indicating no availability of as.its. So I added the missing code (which appears to have been successful after two failed attempts.) Once you issue the missing require() call you can use str to see what sort of object get.hist.quote actually returns. It is neither a dataframe nor a zoo object, although it resembles a zoo-object in many ways:
> str(SP500.prices)
Formal class 'its' [package "its"] with 2 slots
..@ .Data: num [1:180, 1] 1394 1366 1499 1452 1421 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:180] "2000-01-02" "2000-01-31" "2000-02-29" "2000-04-02" ...
.. .. ..$ : chr "AdjClose"
..@ dates: POSIXct[1:180], format: "2000-01-02 16:00:00" "2000-01-31 16:00:00" ...
If you run cbind on those two objects you get a regular matrix with dimnames:
> str(cbind(SP500.prices, MSFT.prices) )
num [1:180, 1:2] 1394 1366 1499 1452 1421 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:180] "2000-01-02" "2000-01-31" "2000-02-29" "2000-04-02" ...
..$ : chr [1:2] "AdjClose" "AdjClose"
You will still need to change the column names, since there does not seem to be a cbind.its that lets you assign column names. I would caution against using the data.frame method, since the resulting object might behave in confusing ways:
> str( MSFTSP500.prices )
'data.frame': 180 obs. of 2 variables:
$ AdjClose :Formal class 'AsIs', 'its' [package ""] with 1 slot
.. ..@ .S3Class: chr "AsIs" "its"
$ AdjClose.1:Formal class 'AsIs', 'its' [package ""] with 1 slot
.. ..@ .S3Class: chr "AsIs" "its"
The columns are still S4 objects. I suppose that might be useful if you were going to pass them to other its-methods but could be confusing otherwise. This might be what you were shooting for:
> MSFTSP500.prices <- data.frame(msft = as.vector(MSFT.prices),
sp500= as.vector(SP500.prices) ,
row.names= as.character(MSFT.prices@dates) )
> str( MSFTSP500.prices )
'data.frame': 180 obs. of 2 variables:
$ msft : num 35.1 32 38.1 25 22.4 ...
$ sp500: num 1394 1366 1499 1452 1421 ...
> head(rownames(MSFTSP500.prices))
[1] "2000-01-02 16:00:00" "2000-01-31 16:00:00" "2000-02-29 16:00:00"
[4] "2000-04-02 17:00:00" "2000-04-30 17:00:00" "2000-05-31 17:00:00"
MSFT.prices is a zoo object, which seems to be a data-frame-alike, with its own column name which gets transferred to the object. Confer
tmp <- data.frame(a=1:10)
b <- data.frame(lost=tmp)
which loses the second column name.
If you do
MSFTSP500.prices <- data.frame(msft = as.vector(MSFT.prices),
sp500=as.vector(SP500.prices))
then you will get the colnames you want (though you won't get zoo-specific behaviours). Not sure why you object to renaming columns in a second command, though.
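The naming behaviour can be reproduced with base R alone, without tseries or its. In this sketch a one-column matrix with its own column name stands in for the price series; the values and the names prices_msft and prices_sp500 are made up.

```r
# One-column matrices that carry their own column name, like the price objects
prices_msft  <- matrix(c(35.1, 32.0, 38.1), ncol = 1,
                       dimnames = list(NULL, "AdjClose"))
prices_sp500 <- matrix(c(1394, 1366, 1499), ncol = 1,
                       dimnames = list(NULL, "AdjClose"))

# The inner column name wins over the argument name
lost <- data.frame(msft = prices_msft, sp500 = prices_sp500)
names(lost)

# Stripping the objects down to plain vectors lets the argument names through
kept <- data.frame(msft  = as.vector(prices_msft),
                   sp500 = as.vector(prices_sp500))
names(kept)

# Same effect with a one-column data frame, as in the example above
tmp <- data.frame(a = 1:10)
b <- data.frame(lost = tmp)
names(b)
```

The rule at work is that data.frame() uses the name an argument is given only when that argument does not already bring a single named column of its own.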
