Browse[1]> lmc
[[1]]
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0090841 0.0063588 1.4286 0.154840
m[, "FX_RET_28"] 0.1122490 0.1599463 0.7018 0.483705
m[, "FX_RET_42"] 0.1702606 0.1041854 1.6342 0.103944
m[, "FX_RET_51"] -0.4735956 0.2450406 -1.9327 0.054823 .
m[, "FX_RET_52"] 0.2475292 0.1458240 1.6975 0.091321 .
m[, "FX_RET_53"] -0.5569527 0.1945823 -2.8623 0.004699 **
m[, "FX_RET_60"] -0.3191905 0.2887157 -1.1056 0.270379
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Browse[1]> str(lmc)
List of 1
$ : coeftest [1:7, 1:4] 0.00908 0.11225 0.17026 -0.4736 0.24753 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:7] "(Intercept)" "m[, \"FX_RET_28\"]" "m[, \"FX_RET_42\"]" "m[, \"FX_RET_51\"]" ...
.. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
..- attr(*, "method")= chr "t test of coefficients"
I want to pull out the Estimate column into a vector with (Intercept), m[, "FX_RET_28"], etc. as the names of the elements. I would appreciate any help.
Thanks
From the str output, one would predict that the values for the Estimate column could be extracted with one of these incantations:
lmc[[1]][ , 1] # using just numerical indexing ... OR
# Apparently not this: lmc[['coeftest']][ , "Estimate" ] # Using character/name indexing
From the comments, it appears that the list element has no name: "coeftest" is not a name but rather the class of the first (and only) item in that list.
I thought (but was wrong) : The reason that lmc[[1]]$coeftest[,1] offered by RichAtMango fails is that the object is a list and lmc[[1]] delivers the first and only item of the list which is a matrix. This would have worked: lmc[1]$coeftest[,1] , because the [.]-function delivers a sublist (rather than the value itself) and it still would have had an element named 'coeftest'.
If you had wanted a one column matrix (which would display with the rownames on the side) then the call would have been:
lmc[[1]][ , "Estimate" , drop=FALSE] # to avoid returning as a vector
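To make the list-versus-matrix indexing concrete, here is a minimal sketch on a mock object. The name lmc_mock and the three-row matrix are assumptions made for illustration (values copied from the printout in the question); the real object comes from lmtest::coeftest, which is not loaded here.

```r
# Mock stand-in for lmc: an UNNAMED one-element list holding a matrix whose
# class attribute is "coeftest".
est <- matrix(
  c(0.0090841, 0.0063588,
    0.1122490, 0.1599463,
    0.1702606, 0.1041854),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("(Intercept)", 'm[, "FX_RET_28"]', 'm[, "FX_RET_42"]'),
                  c("Estimate", "Std. Error"))
)
class(est) <- "coeftest"
lmc_mock <- list(est)

# "coeftest" is the class of the element, not its name.
no_names <- is.null(names(lmc_mock))

# [[1]] returns the matrix itself; [ , "Estimate"] then picks the column.
estimates <- lmc_mock[[1]][, "Estimate"]                # named numeric vector
est_col   <- lmc_mock[[1]][, "Estimate", drop = FALSE]  # one-column matrix

names(estimates)
```

The row names of the matrix become the names of the extracted vector, which is the result the question asks for.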
You cannot "accept" comments as answers in SO. It's not clear why MichaelChirico didn't post an answer. He may have been too busy to post something he thought was sufficiently developed, or maybe he wanted you to post dput(lmc) so he could offer a tested answer. I thought the downvote you got was unfair, since you did provide enough information to answer and the indexing difference between "[" and "[[" can be difficult to get for persons beginning with R. Your request needed understanding of both the indexing of lists and the indexing of R matrices.
I have struggled for two days to find a way to create a specific matrix from a nested list.
First of all, I am sorry if I don't explain my issue correctly; I am one week new to StackOverflow* and R (and programming...)!
I use a file that you can find there :
original link: https://parltrack.org/dumps/ep_mep_activities.json.lz
Uncompressed by me here: https://wetransfer.com/downloads/701b7ac5250f451c6cb26d29b41bd88020200808183632/bb08429ca5102e3dc277f2f44d08f82220200808183652/666973
first 3 lists and last one (out of 23905) pasted here: https://pastebin.com/Kq7mjis5
With rjson, I have a nested list like this :
Nested list of MEP Votes
List of 23905
$ :List of 7
..$ ts : chr "2004-12-16T11:49:02"
..$ url : chr "http://www.europarl.europa.eu/RegData/seance_pleniere/proces_verbal/2004/12-16/votes_nominaux/xml/P6_PV(2004)12-16(RCV)_XC.xml"
..$ voteid : num 7829
..$ title : chr "Projet de budget général 2005 modifié - bloc 3"
..$ votes :List of 3
.. ..$ +:List of 2
.. .. ..$ total : num 45
.. .. ..$ groups:List of 6
.. .. .. ..$ ALDE :List of 1
.. .. .. .. ..$ : Named num 4404
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
.. .. .. ..$ GUE/NGL:List of 25
.. .. .. .. ..$ : Named num 28469
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
.. .. .. .. ..$ : Named num 4298
.. .. .. .. .. ..- attr(*, "names")= chr "mepid"
then my goal is to have something like this :
final matrix
First I would like to keep only the lists (from [[1]] to [[23905]]) containing $votes$+$groups$Renew or $votes$-$groups$Renew or $votes$'0'$groups$Renew. The items of the main list (the 23905) are registered votes. My work is on the Renew group, so my only interest is in votes where the Renew group exists, to compare it with other groups.
After that my goal is to create a matrix like this from all the [[x]] where groups$Renew exists:
final matrix
V1             V2 (not mandatory)   V3 ([[x]]$voteid)
[mepid==666]   GUE/NGL              +   (mepid 666 is found in [[1]]$votes$+$groups$GUE/NGL)
[mepid==777]   Renew                -   (mepid 777 is found in [[1]]$votes$-$groups$Renew)
I want to create a matrix so I can process the votes of each MEP (referenced by their mepid). Their votes are either + (for yea), - (for nay) or 0 (for abstain). Moreover, I would like to have the political group of each MEP displayed in the column next to their mepid. We can find their political group from the place where their vote is stored: if the mepid is shown in the list [[x]]$votes$+$groups$GUE/NGL, she or he belongs to the GUE/NGL group.
What I want to do might look like this
# Clean the nested list
Keep Vote[[x]] if the Vote[[x]] list contains
$votes$+$groups$Renew,
or $votes$-$groups$Renew,
or $votes$'0'$groups$Renew
# Create the matrix (or a data.frame if it is easier)
VoteMatrix <- as.matrix(
V1 = all "mepid" found in the nested list
V2 = groups (name of the list where we can find the mepid) (not mandatory)
V3 to Vy = If.else(mepid is in [[x]]$votes$+ then "+",
mepid is in [[x]]$votes$- then "-", "0")
)
Thank you in advance,
*Nevertheless, I am reading this website actively since I started R!
You can see that the 'votes' sublist is composed of three items, each a list of member numbers stored within what I think are party designators. Here's how you might "straighten" the 'mepid' values of the positive voters by party:
str( unlist( sapply(names(jlis[[1]]$votes$'+'$groups), function(x) unlist(jlis[[1]]$votes$'+'$groups[[x]]) ) ) )
Named num [1:104] 28268 4514 28841 28314 28241 ...
- attr(*, "names")= chr [1:104] "ALDE.mepid" "ALDE.mepid" "ALDE.mepid" "ALDE.mepid" ...
You get a named numeric vector with 104 entries. Perhaps this will demonstrate what sort of terminology to use in better describing your desired result. (Just giving a partial schema for the desired result leaves way too much ambiguity to support a fully formed request.)
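As a rough sketch of where the flattening could lead, the snippet below turns one vote record into a long data frame with one row per MEP. The record vote and the helper flatten_side are hypothetical; the structure is modelled on the str() output in the question, and the sketch assumes every group entry holds exactly one mepid.

```r
# Hypothetical miniature of one vote record (not real data).
vote <- list(
  voteid = 7829,
  votes = list(
    "+" = list(total = 3, groups = list(
      ALDE      = list(c(mepid = 4404)),
      "GUE/NGL" = list(c(mepid = 28469), c(mepid = 4298))
    )),
    "-" = list(total = 1, groups = list(
      PSE = list(c(mepid = 1111))
    ))
  )
)

# Turn one side of the vote ("+", "-" or "0") into a long data frame.
# Returns NULL when that side is absent from the record.
flatten_side <- function(vote, side) {
  groups <- vote$votes[[side]]$groups
  if (length(groups) == 0) return(NULL)
  data.frame(
    voteid = vote$voteid,
    mepid  = unlist(groups, use.names = FALSE),
    group  = rep(names(groups), lengths(groups)),
    vote   = side,
    stringsAsFactors = FALSE
  )
}

# Stack the three sides; NULL results are dropped by rbind.
long <- do.call(rbind, lapply(c("+", "-", "0"), flatten_side, vote = vote))
long
```

A long table like this (voteid, mepid, group, vote) can later be reshaped so that each voteid becomes a column, which is closer to the matrix sketched in the question.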
I do NOT see the number 23905 anywhere in what I downloaded from your link. We are clearly looking at different data. I see this for the timestamp: chr "2004-12-01T15:20:31". I'm not going to cut you any slack for not knowing R, since the task needs to be fully explained in a natural language. I will cut you slack regarding grammar if English is not your native tongue, but you definitely need to make a better effort at explication. This is what I see for the names with the votes$'+'$groups sublists of the first three items, but since RENEW is not in any of them there's not a lot that could be demonstrated about picking items:
> names( jlis[[1]]$votes$'+'$groups)
[1] "ALDE" "GUE/NGL" "IND/DEM" "NI" "PPE-DE" "PSE" "UEN"
> names( jlis[[2]]$votes$'+'$groups)
[1] "GUE/NGL" "IND/DEM" "NI" "PPE-DE"
> names( jlis[[3]]$votes$'+'$groups)
[1] "ALDE" "GUE/NGL" "IND/DEM" "NI" "PPE-DE" "PSE" "UEN" "Verts/ALE"
Furthermore, when I looked at all of the possible votes values using this method (for all three of the items you made available) I still see no RENEW names.
sapply( jlis[[1]]$votes[c("+","-","0")], function(x) names(x$groups) )
After second edit: Here's the next step of isolating those votes that contain a "Renew" value. I'm assuming that it's possible to have a "Renew" value in only one of the three possible 'votes' values (+, -, 0). If not (and there are always "Renew" values in each of them when there is one in any of them) then you might be able to simplify the logic. We make three logical vectors:
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['0']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['+']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['-']][['groups']]) } )
#[1] FALSE FALSE FALSE TRUE
And then wrap them in a matrix call with 3 columns and take the maximum of each row (the maximum of c(TRUE, FALSE) is 1), and then convert back to logical.
selection_vec = as.logical( apply( matrix( c(
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['0']][['groups']]) } ),
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['+']][['groups']]) } ),
sapply( seq_along(MEPVotes) , function(i){ 'Renew' %in% names( MEPVotes[[i]]$votes[['-']][['groups']]) } ) ),
ncol=3 ), 1,max))
> selection_vec
[1] FALSE FALSE FALSE TRUE
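The same selection can be written more compactly with any(). This is only a sketch under assumptions: MEPVotes_mock is a made-up two-record list, and has_group is a hypothetical helper, not part of any package.

```r
# Hypothetical miniature of MEPVotes: only the second record has a Renew group.
MEPVotes_mock <- list(
  list(votes = list("+" = list(groups = list(ALDE  = list(c(mepid = 1)))))),
  list(votes = list("-" = list(groups = list(Renew = list(c(mepid = 2))))))
)

# TRUE if the group appears under any of the three vote values.
# A missing side yields NULL names, for which %in% returns FALSE.
has_group <- function(rec, group = "Renew") {
  any(vapply(c("+", "-", "0"),
             function(side) group %in% names(rec$votes[[side]]$groups),
             logical(1)))
}

selection_vec <- vapply(MEPVotes_mock, has_group, logical(1))
renew_votes   <- MEPVotes_mock[selection_vec]   # keep only matching records
selection_vec
```

Single-bracket indexing with the logical vector keeps the result as a list of vote records, so the remaining processing can run on renew_votes unchanged.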
I'm not sure I understand the type of variable I'm working with. It's the result of a binary classifier:
> mod_binary$predictions %>% glimpse()
num [1:10000, 1:2] 0.989 0.904 0.99 0.989 0.989 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "FALSE" "TRUE"
> mod_binary$predictions %>% head()
FALSE TRUE
[1,] 0.9894592 0.01054078
[2,] 0.9044349 0.09556509
[3,] 0.9898756 0.01012441
[4,] 0.9888804 0.01111959
[5,] 0.9890123 0.01098766
[6,] 0.9641537 0.03584634
What is this variable type called? A list? A named list? A named vector?
I would like to retrieve a vector of the TRUE predictions. Tried:
> mod_binary$predictions$TRUE
Error: unexpected numeric constant in "mod_binary$predictions$TRUE"
> mod_binary$predictions[["TRUE"]]
Error in mod_binary$predictions[["TRUE"]] : subscript out of bounds
> mod_binary$predictions[[1]]
[1] 0.9894592
That last one returned a single observation, and actually the wrong one. It's the FALSE prediction (see the call to head() above)
How can I get just a vector of predicted probabilities for TRUE?
It looks like a matrix, judging from the attributes shown by glimpse and from the printed format of the data, especially the row labels ([1,]). So $ won't work for extraction.
According to ?Extract
The default methods work somewhat differently for atomic vectors, matrices/arrays and for recursive (list-like, see is.recursive) objects. $ is only valid for recursive objects, and is only discussed in the section below on recursive objects.
mod_binary$predictions[, "TRUE"]
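A small sketch with a mock matrix shows why the earlier attempts behaved as they did. The object preds is an assumption standing in for mod_binary$predictions, using the first two rows of the printed output.

```r
# Mock stand-in for mod_binary$predictions: a numeric matrix with column names only.
preds <- matrix(c(0.9894592, 0.9044349,     # column "FALSE"
                  0.01054078, 0.09556509),  # column "TRUE"
                ncol = 2,
                dimnames = list(NULL, c("FALSE", "TRUE")))

is.matrix(preds)   # a matrix ...
is.list(preds)     # ... not a list, so $ does not apply

# [[1]] treats the matrix as one long vector stored column by column,
# so it returns the first element of the "FALSE" column.
first_cell <- preds[[1]]

# Row index left empty, column selected by name: all TRUE probabilities.
p_true <- preds[, "TRUE"]
p_true
```

Quoting the column name matters here, since a bare TRUE would be read as a logical index rather than as the name of the column.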
I got this warning while merging one column from a data frame called data.all into my working data frame called data:
setDT(data)[setDT(data.all), RX_HOSP_SURG_APPR_2010 := i.RX_HOSP_SURG_APPR_2010, on=c("PUF_CASE_ID","SR_ID" )]
Warning message: In [.data.table(setDT(data), setDT(data.all),
:=(RX_HOSP_SURG_APPR_2010, : Coerced double RHS to logical to
match the type of the target column (column 157 named
'RX_HOSP_SURG_APPR_2010'). If the target column's type logical is
correct, it's best for efficiency to avoid the coercion and create the
RHS as type logical. To achieve that consider R's type postfix:
typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs
typeof(NA_real_). You can wrap the RHS with as.logical() to avoid this
warning, but that will still perform the coercion. If the target
column's type is not correct, it's best to revisit where the DT was
created and fix the column type there; e.g., by using colClasses= in
fread(). Otherwise, you can change the column type now by plonking a
new column (of the desired type) over the top of it; e.g. DT[,
RX_HOSP_SURG_APPR_2010:=as.double(RX_HOSP_SURG_APPR_2010)]. If the
RHS of := has nrow(DT) elements then the assignment is called a column
plonk and is the way to change a column's type. Column types can be
observed with sapply(DT,typeof) [... truncated]
I tried different ways but I could not figure this out
str(data$RX_HOSP_SURG_APPR_2010)
logi [1:8671] FALSE FALSE FALSE NA NA NA ...
str(data.all$RX_HOSP_SURG_APPR_2010)
'haven_labelled' num [1:129296] 0 0 NA NA NA NA NA NA NA NA ...
- attr(*, "label")= chr "Surgical Approach at this Facility 2010 and Later"
- attr(*, "format.spss")= chr "F1.0"
- attr(*, "display_width")= int 23
- attr(*, "labels")= Named num [1:7] 0 1 2 3 4 5 9
..- attr(*, "names")= chr [1:7] "No surgical procedure of primary site" "Robotic assisted" "Robotic converted to open" "Laparoscopic" ...
Any advice will be appreciated.
You could share dput(head(data)) and dput(head(data.all)) of your "gigantic" data. Please improve your question.
In order to assign on the fly during the join, you need both column classes to be the same and, as you noticed, your variable in data is logical (probably because at the time you read it from a file it only had zeros and NAs), while your variable in data.all is of class 'haven_labelled' (a labelled double).
You can try to assign the class first with:
class(data$RX_HOSP_SURG_APPR_2010) <- class(data.all$RX_HOSP_SURG_APPR_2010)
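The type mismatch behind the warning can be reproduced in base R. This is only a sketch: target and incoming are made-up vectors, and the 'haven_labelled' class is attached by hand rather than created by the haven package. It illustrates the alternative the warning text itself suggests, which is to convert the target column to double before the join, e.g. DT[, RX_HOSP_SURG_APPR_2010 := as.double(RX_HOSP_SURG_APPR_2010)].

```r
# Mock of the logical target column (like data$RX_HOSP_SURG_APPR_2010).
target <- c(FALSE, FALSE, NA)

# Mock of the incoming labelled double (class attached by hand, assumption).
incoming <- structure(c(0, 2, 9),
                      label = "Surgical Approach",
                      class = "haven_labelled")

# Coercing the codes to logical collapses every non-zero code to TRUE,
# which is what the join would do to match the logical target column.
lossy <- as.logical(c(0, 2, 9))

# Converting the target to double first keeps 0/NA and makes room for the codes.
target_dbl <- as.double(target)

# Stripping class and attributes gives a plain double vector to assign.
incoming_dbl <- as.double(unclass(incoming))

typeof(target_dbl) == typeof(incoming_dbl)
```

With both sides stored as plain doubles, the update join has nothing to coerce, so the codes 1 to 9 survive intact.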
I want to first calculate a Markov transition matrix and then take an exponent of it. To achieve the first goal I use the markovchainFit function in the markovchain package, and it returns a data.frame rather than a matrix. So I need to convert it to a matrix before I take the exponent.
My R code snippet is like
#################################
# Estimate Transition Matrix #
#################################
setwd("G:/Data_backup/GDP_per_Capita")
library("foreign")
library("Hmisc")
mydata <- stata.get("G:/Data_backup/GDP_per_Capita/states.dta")
mydata
library(markovchain)
library(expm)
rgdp_e=mydata[,2:7]
rgdp_o=mydata[,8:13]
createSequenceMatrix(rgdp_e)
rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
rgdp_e_trans<-as.matrix(rgdp_e_trans)
is.matrix(rgdp_e_trans)
rgdp_e_trans %^% 1/5
rgdp_e_trans is a data frame, and I try to convert it to a numeric matrix. It seems to work when I test it with the is.matrix command. However, the final line gives me an error saying
Error in rgdp_e_trans %^% 2 :
(list) object cannot be coerced to type 'double'
After some searching on Stack Overflow, I found this question describing a similar problem and used rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans)) to coerce the object to 'double', but it does not seem to work.
Besides, the data.frame rgdp_e_trans contains no factor or characters
The output in the console is like
> rgdp_e=mydata[,2:7]
> rgdp_o=mydata[,8:13]
> createSequenceMatrix(rgdp_e)
Error: not compatible with STRSXP
> rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
> rgdp_e_trans
$estimate
1 2 3 4 5
1 0.6172840 0.18930041 0.09053498 0.074074074 0.02880658
2 0.1125828 0.59602649 0.28476821 0.006622517 0.00000000
3 0.0000000 0.03846154 0.60256410 0.358974359 0.00000000
4 0.0000000 0.01162791 0.03488372 0.691860465 0.26162791
5 0.0000000 0.00000000 0.00000000 0.044247788 0.95575221
> rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
Error: (list) object cannot be coerced to type 'double'
> rgdp_e_trans<-as.matrix(rgdp_e_trans)
> is.matrix(rgdp_e_trans)
[1] TRUE
> rgdp_e_trans %^% 1/5
Error in rgdp_e_trans %^% 1 :
(list) object cannot be coerced to type 'double'
>
Any suggestion to fix the problem, or alternative way to calculate the exponent ? Thank you.
Additional:
> str(rgdp_e_trans)
List of 1
$ estimate:Formal class 'markovchain' [package "markovchain"] with 4 slots
.. ..@ states : chr [1:5] "1" "2" "3" "4" ...
.. ..@ byrow : logi TRUE
.. ..@ transitionMatrix: num [1:5, 1:5] 0.617 0.113 0 0 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. .. .. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. ..@ name : chr "Bootstrap Mc"
and I comment out the as.matrix part
rgdp_e=mydata[,2:7]
rgdp_o=mydata[,8:13]
createSequenceMatrix(rgdp_e)
rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
rgdp_e_trans
str(rgdp_e_trans)
# rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
# rgdp_e_trans<-as.matrix(rgdp_e_trans)
# is.matrix(rgdp_e_trans)
rgdp_e_trans$estimate %^% 1/5
You can access the transition matrix directly from the object returned by markovchainFit as:
rgdp_e_trans$estimate@transitionMatrix
Here rgdp_e_trans is your return value from markovchainFit, which is actually a list containing the information from the fitting process. You access the estimate item of that list by using the $ operator. The estimate object is from a formal S4 class (see e.g. Advanced R by Hadley Wickham for a description of the object systems used in R), which is why in order to access its slots you have to use the @ operator instead of the standard $ used for lists and the more common S3 objects.
If you print out the return value of as.matrix(rgdp_e_trans) it should be immediately obvious where your initial approach went wrong. In general it's a good idea to check the structure of an object with the str function - instead of relying on its print method - when you encounter unexpected results or are working with new types of objects.
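The two access steps can be tried without the markovchain package on a mock S4 object. Everything named here is an assumption for illustration: MockChain is a made-up class with two slots, and %pow% is a naive base-R stand-in for the %^% operator from expm (which, as far as I know, expects an integer power).

```r
library(methods)

# Mock S4 class standing in for 'markovchain' (hypothetical, two slots only).
setClass("MockChain",
         representation(states = "character", transitionMatrix = "matrix"))

P <- matrix(c(0.9, 0.1,
              0.5, 0.5), nrow = 2, byrow = TRUE,
            dimnames = list(c("1", "2"), c("1", "2")))

# Like the return value of markovchainFit: a list with an S4 object inside.
fit <- list(estimate = new("MockChain", states = c("1", "2"), transitionMatrix = P))

# `$` picks the list element, `@` picks the slot of the S4 object.
tm <- fit$estimate@transitionMatrix

# Naive integer matrix power by repeated multiplication.
`%pow%` <- function(m, k) {
  out <- diag(nrow(m))
  for (i in seq_len(k)) out <- out %*% m
  out
}

tm2 <- tm %pow% 2

# Special %op% operators bind tighter than `/`, so this is (tm %pow% 2) / 5,
# not a power of 2/5.
scaled <- tm %pow% 2 / 5
```

The same precedence rule explains why the error message in the question mentions `%^% 1` rather than `%^% 1/5`: the power is evaluated first and the division by 5 would be applied to its result.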
I am using the tm package to compute a term-document matrix for a dataset. I now have to write the term-document matrix to a file, but when I use the write functions in R I get an error.
Here is the code which I am using and the error I am getting:
data("crude")
tdm <- TermDocumentMatrix(crude, control = list(weighting = weightTfIdf, stopwords = TRUE))
dtm <- DocumentTermMatrix(crude, control = list(weighting = weightTfIdf, stopwords = TRUE))
and this is the error while I use the write.table command on this data:
Error in cat(list(...), file, sep, fill, labels, append) : argument 1 (type 'list') cannot be handled by 'cat'
I understand that tdm is an object of type simple triplet matrix, but how can I write this to a simple text file?
I think I might be misunderstanding the question, but if all you want to do is export the term document matrix to a file, then how about this:
m <- inspect(tdm)
DF <- as.data.frame(m, stringsAsFactors = FALSE)
write.table(DF)
Is that what you're after mate?
Hope that helps a little,
Tony Breyal
Should the file be "human-readable"? If not, use dump, dput, or save. If so, convert your list into a data.frame.
Edit: You can convert your list into a matrix if each list element is equal length by doing matrix(unlist(list.name), nrow=length(list.name[[1]])) or something like that (or with plyr).
Why aren't you doing your SVM analysis in R (e.g. with kernlab)?
Edit 2: Ok, I looked at your data, and it isn't easy to convert into a matrix because the list elements aren't equal length:
> is.list(tdm)
[1] TRUE
> str(tdm)
List of 7
$ i : int [1:1475] 15 29 151 152 173 205 215 216 227 228 ...
$ j : int [1:1475] 1 1 1 1 1 1 1 1 1 1 ...
$ v : Named num [1:1475] 3.32 4.32 2.32 2 2.32 ...
..- attr(*, "names")= chr [1:1475] "1.50" "16.00" "barrel," "barrel." ...
$ nrow : int 985
$ ncol : int 20
$ dimnames :List of 2
..$ Terms: chr [1:985] "(bpd)" "(bpd)." "(gcc)" "(it) appears to be nearing a crossroads with regard to\nderegulation, both as it pertains to investments and imports," ...
..$ Docs : chr [1:20] "127" "144" "191" "194" ...
$ Weighting: chr [1:2] "term frequency - inverse document frequency" "tf-idf"
- attr(*, "class")= chr [1:2] "TermDocumentMatrix" "simple_triplet_matrix"
In order to convert this to a matrix, you will need to either take elements of this list (e.g. i, j) or else do some other manipulation.
Edit 3: Just to conclude my commentary here: these objects are intended to be used with the inspect function (see the package vignette).
As discussed, in order to use a function like write.table, you will need to convert your list into a matrix, which requires some manipulation of that list such that you have several vectors of equal length. Looking at the structure of these tm objects: this will be very difficult to do, and I suggest you work with the helper functions that are included with that package.
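For what it is worth, the manipulation itself is short once you read i, j and v as row index, column index and value of each non-zero cell. The sketch below uses a made-up three-term, two-document list (stm is hypothetical, not a real tm object) and base R only; for real tm objects the package's own conversion helpers are the safer route.

```r
# Hypothetical miniature of a simple_triplet_matrix-style list.
stm <- list(
  i = c(1L, 3L, 2L),          # row index of each non-zero entry
  j = c(1L, 1L, 2L),          # column index of each non-zero entry
  v = c(3.32, 2.00, 4.32),    # the non-zero values
  nrow = 3L, ncol = 2L,
  dimnames = list(Terms = c("barrel", "crude", "oil"), Docs = c("127", "144"))
)

# Start from an all-zero dense matrix and fill only the listed cells.
dense <- matrix(0, nrow = stm$nrow, ncol = stm$ncol, dimnames = stm$dimnames)
dense[cbind(stm$i, stm$j)] <- stm$v

# A dense matrix can be handed to the usual write functions.
out_file <- tempfile(fileext = ".csv")
write.csv(dense, out_file)
dense
```

The dense form needs nrow * ncol cells regardless of how sparse the data are, so this only suits matrices that fit comfortably in memory.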
dtmMatrix <- as.matrix(dtm)
write.csv(dtmMatrix, 'mydata.csv')
This certainly does the work. However, when I tried it on a very large DTM (25000 by 35000), it gave errors relating to lack of memory space.
I used the following method:
dtm <- DocumentTermMatrix(corpus)
dtm1 <- removeSparseTerms(dtm,0.998) ##max allowed sparsity 0.998
m <- inspect(dtm1)
DF <- as.data.frame(m, stringsAsFactors = FALSE)
write.csv(DF,"mydata0.998sparse.csv")
Which reduced the size of the document term matrix to a great extent!
Here you can increase the max allowable sparsity (closer to 1) to include more terms in DF.