I need to extract a row of a sparseMatrix as a sparseVector, but the 'drop=FALSE' option does not do what I need.
To explain the issue, I will use an example from "extract sparse rows from sparse matrix in r" (my question is different, since I need to convert the extracted row to a vector):
i <- c(1,3:8); j <- c(2,9,6:10); x <- 7 * (1:7)
A <- sparseMatrix(i, j, x = x)
b <- sparseVector(7,2,10)
Now A[1,,drop=FALSE] and b should have the same value.
However, A[1,,drop=FALSE] is still a matrix with 2 dimensions. So if I try Matrix::crossprod(b), I get:
1 x 1 Matrix of class "dsyMatrix"
[,1]
[1,] 49
but if I try Matrix::crossprod(A[1,,drop=FALSE]), then I get:
10 x 10 sparse Matrix of class "dsCMatrix"
[1,] . . . . . . . . . .
[2,] . 49 . . . . . . . .
[3,] . . . . . . . . . .
[4,] . . . . . . . . . .
[5,] . . . . . . . . . .
[6,] . . . . . . . . . .
[7,] . . . . . . . . . .
[8,] . . . . . . . . . .
[9,] . . . . . . . . . .
[10,] . . . . . . . . . .
How can I get just 49 in the second case in an efficient way (Matrix::crossprod should be faster than %*%, as far as I understand from the description of the function)?
Also, b%*%b works correctly, while A[1,,drop=FALSE]%*%A[1,,drop=FALSE] returns the following error:
Cholmod error 'A and B inner dimensions must match' at file ../MatrixOps/cholmod_ssmult.c, line 82
I am not quite sure there is a method for (directly) casting a sparse matrix row as a sparse vector.
The reason why you are getting an error from
A[1,,drop=FALSE]%*%A[1,,drop=FALSE]
is that both operands are 1 x 10 matrices, so the inner dimensions (10 and 1) do not match. You need to transpose the second matrix:
A[1,,drop=FALSE] %*% t(A[1,,drop=FALSE])
This returns a 1x1 sparse matrix, which you can then convert with as.numeric().
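A minimal sketch of two other ways to get the scalar, using the same A as in the question. It assumes the Matrix package's tcrossprod() and its as(<sparse matrix>, "sparseVector") coercion behave as documented; the names r, v, s1 and s2 are only for illustration.

```r
library(Matrix)

i <- c(1, 3:8); j <- c(2, 9, 6:10); x <- 7 * (1:7)
A <- sparseMatrix(i, j, x = x)

r <- A[1, , drop = FALSE]      # 1 x 10 sparse matrix (row 1 of A)
v <- as(r, "sparseVector")     # coerce the single row to a sparse vector

# tcrossprod(r) computes r %*% t(r) without forming t(r) explicitly;
# the result is 1 x 1, so pick out its only entry.
s1 <- tcrossprod(r)[1, 1]

# The same scalar from the stored nonzero values only.
s2 <- sum(r@x^2)
```

Both s1 and s2 are plain numeric values, so no further conversion is needed.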
I have read a few other answers but none have worked for me in extracting the penalized logistic regression coefficients from my final trained model.
penlrntune = makeLearner("classif.glmnet", predict.type = "prob", par.vals = logres.mbo$x)
set.seed(9)
pen_model_tune <- mlr::train(penlrntune, task = trn.task)
## when i run coef(getLearnerModel(pen_model_tune)
coef(getLearnerModel(pen_model_tune, more.unwrap = TRUE))
36 x 83 sparse Matrix of class "dgCMatrix"
[[ suppressing 45 column names ‘s0’, ‘s1’, ‘s2’ ... ]]
[[ suppressing 45 column names ‘s0’, ‘s1’, ‘s2’ ... ]]
(Intercept) -3.15367 -3.251889 -3.3528 -3.455813 -3.560283 -3.665525 -3.792622 -3.816052 -3.093964 -2.249888
Obesity . . . . . . . . . .
Diabetes . . . . . . . . . .
Smoke . . . . . . . . . .
Pulm . . . . . . . . . .
Cardiac . . . . . . . . . .
Dyspnea . . . . . . . . . .
Steroids . . . . . . . . . .
RenalComorb . . . . . . . . . .
emergency.Yes . . . . . . . . . .
ASA3 . . . . . . . . . .
This doesn't make much sense, as you can tell. The model otherwise works well, so I must be missing something.
I have reproduced this issue using the PimaIndiansDiabetes2 dataset from the mlbench package, when all I want is the logistic regression coefficients from the trained model.
library(mlbench)
library(mlr)
# Load the data and remove NAs
data("PimaIndiansDiabetes2", package = "mlbench")
PimaIndiansDiabetes2 <- na.omit(PimaIndiansDiabetes2)
PimaIndiansDiabetes2 <- PimaIndiansDiabetes2[2:9]
set.seed(9)
indian.task = makeClassifTask(data=PimaIndiansDiabetes2, target = "diabetes", positive = "pos")
testlrn <-makeLearner("classif.glmnet", predict.type = "prob", par.vals = list(alpha = 0.339, s=0.00556))
testlrn_train <- mlr::train(testlrn, indian.task)
coef(getLearnerModel(testlrn_train))
8 x 70 sparse Matrix of class "dgCMatrix"
[[ suppressing 70 column names ‘s0’, ‘s1’, ‘s2’ ... ]]
(Intercept) -0.7008101 -0.832287755 -0.96789148 -1.107230550 -1.249880176 -1.395389472 -1.593724136
glucose . 0.001070684 0.00217191 0.003300326 0.004452356 0.005624265 0.006706897
pressure . . . . . . .
triceps . . . . . . .
insulin . . . . . . .
mass . . . . . . .
pedigree . . . . . . .
age . . . . . . 0.002042163
(Intercept) -1.795136758 -1.995953151 -2.264942612 -2.5516299342 -2.840942953 -3.125132e+00 -3.411189e+00
glucose 0.007787101 0.008868089 0.009892814 0.0108901514 0.011863974 1.280009e-02 1.367427e-02
pressure . . . . . . .
triceps . . . 0.0003408727 0.001478306 2.543441e-03 3.465742e-03
insulin . . . . . 2.182663e-05 6.272459e-05
mass . . 0.002302462 0.0049824211 0.007195585 9.319553e-03 1.125890e-02
pedigree . . . . . 4.557533e-03 4.506202e-02
age 0.004173928 0.006265506 0.008297616 0.0102624303 0.012130153 1.393673e-02 1.565741e-02
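For context, coef() on a glmnet fit returns one column per lambda value on the regularization path (the s0, s1, ... columns above), and passing s selects a single lambda. The sketch below calls glmnet directly on simulated data rather than going through mlr; X, y and fit are hypothetical, and whether the mlr wrapper forwards s in the same way is not shown here.

```r
library(glmnet)

set.seed(9)
# Simulated data standing in for the real predictors and outcome.
X <- matrix(rnorm(200 * 5), 200, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- rbinom(200, 1, plogis(X[, 1] - X[, 2]))

fit <- glmnet(X, y, family = "binomial", alpha = 0.339)

all_coefs <- coef(fit)               # one column per lambda on the path
one_coef  <- coef(fit, s = 0.00556)  # coefficients at a single lambda
```

one_coef is a sparse matrix with a single column, holding the intercept plus one row per predictor.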
I am a bit stuck on this problem. I am trying to delete the dots before the first number, but I would like to keep any dots between two numbers.
. . . . . . . . . . . . . . 122 (100.0) . . . . . . . . . . . . . . 7 (5. 7)
for example the above should output to
122 (100.0) . . . . . . . . . . . . . . 7 (5. 7)
I am not sure which functions or packages I should be using to do the above.
Thanks!
This should work for what you're asking.
sub('^[\\h.]+', '', x, perl=TRUE)
Maybe something like this:
#copy pasted from your example
text <- ". . . . . . . . . . . . . . 122 (100.0) . . . . . . . . . . . . . . 7 (5. 7)"
#find the location of the first number using gregexpr
loc <- gregexpr('[0-9]', text)[[1]][1]
#substring the text from loc and until the end
substr(text, loc, nchar(text)) # or substring(text, loc)
Output:
[1] "122 (100.0) . . . . . . . . . . . . . . 7 (5. 7)"
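A small check that the two approaches agree, on a shortened version of the sample string. In the regular expression, ^ anchors at the start, and the class [\\h.] matches horizontal whitespace or a literal dot (perl = TRUE is needed for \\h). The names a and b are only for illustration.

```r
text <- ". . . . 122 (100.0) . . . 7 (5. 7)"

# Approach 1: strip leading dots and whitespace with a regex.
a <- sub("^[\\h.]+", "", text, perl = TRUE)

# Approach 2: find the first digit and keep everything from there on.
b <- substring(text, regexpr("[0-9]", text)[1])
```

The two differ on input that has no digits at all: regexpr() then returns -1, so the second approach would need an explicit check.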
I have a total of 54892 documents. After retrieving them from the database, how am I supposed to convert them to a corpus that can be used for Topic Modelling using LDA?
This is the code I have tried:
library(RMySQL)
library(RTextTools)
library(topicmodels)
library(tm)
con <- dbConnect(MySQL(), user="root", password="root", dbname="dbtemp", host="localhost")
rs <- dbSendQuery(con, "select text_body from all_text;")
data <- fetch(rs, n=54892)
huh <- dbHasCompleted(rs)
dbClearResult(rs)
dbDisconnect(con)
I referred to this page, and noticed that the output of data from the line data <- NYTimes[sample(1:3100,size=1000,replace=FALSE),] contains a two-column table along with another table with something called TopicCode; this data is then converted to a term-document frequency matrix. I don't know how to get that TopicCode from the columns that I retrieved from the database.
I have tried a similar problem in Python, where I converted the data to the Matrix Market format. I thought of using this file for further computations in R. I tried reading this file using b <- readMM(file="PRC.mm"), and when I printed b I got a 336331 x 88 matrix which looked like:
. . 2 . . . . . . 1 1 . 1 . . 1 . 2 . . . . . . . . . . . . . ......
. 1 . . . . . . 1 1 . . . . . . . . . . . . . . . . . . . . . ......
. . . . . . . . . 1 1 1 . . . 2 . . . . . . . 1 . . 1 . . . . ......
. . 1 . . . 2 . . . . 1 1 . . . . . . . 1 . . . . . . . . . . ......
where . means 0. This looks like a term-document matrix but I still want to remake such kind of matrix in R. What should I do?
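One possible route, sketched under assumptions: the tm package can build a document-term matrix from a character vector, and topicmodels::LDA() accepts that matrix directly. The toy docs vector below stands in for data$text_body, and k = 2 is an arbitrary choice. No TopicCode column is involved, since LDA is unsupervised.

```r
library(tm)
library(topicmodels)

# Toy documents standing in for data$text_body from the database query.
docs <- c("sparse matrix row vector",
          "sparse vector matrix product",
          "topic model text corpus",
          "corpus document topic words")

corpus <- VCorpus(VectorSource(docs))   # one document per vector element
dtm <- DocumentTermMatrix(corpus)       # rows = documents, columns = terms

# LDA needs every document to contain at least one term.
fit <- LDA(dtm, k = 2, control = list(seed = 1))
doc_topics <- topics(fit)               # most likely topic for each document
```

With real text, cleaning steps such as stop-word removal would normally be applied through tm_map() before building the matrix.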
This problem is harder for me than it might sound. I imported a GML file. All of my rows now contain numbers followed by a comma. I can't figure out how to remove the commas and make the values numeric. I have tried as.numeric and gsub, but when I build my adjacency matrix I get this output:
[1,] . 1 . . 1 . . . . 1 . . . . . . 1 . . . . . . 1 . . . . . . . . . 1 . 1 . . . ......
[2,] 1 . . . . . . . . . . . . . . . . . . . . . . . . 1 . 1 . . . . . 1 . . . 1 . ......
I need the numbers in the [1,] to be a real number so I can attempt a loop that I will come back later for help on!
This code doesn't work:
games[0] <- as.numeric(gsub("[^[:digit:]]","",games[0]))
I get this error:
Error in `[<-.igraph`(`*tmp*`, 0, value = numeric(0)) :
Logical or numeric value must be of length 1
Here is the code I have:
library(igraph)
games <- read.graph("football.gml", format="gml")
and I eventually need to be able to run this call:
get.shortest.paths(games, 1, 155, weights = NULL ,output=c("vpath", "epath", "both"))
[1,] is a row with multiple values (one for each column), not a single string. Note that gsub is vectorized over character vectors, so the error does not come from gsub itself: it comes from assigning into games, which is an igraph object rather than a character vector. Also keep in mind that "[^[:digit:]]" is a regular expression matching any non-digit character, so that call strips everything except digits. If you do have a character matrix and want to clean it cell by cell, you can loop over each value in the n x k matrix (or use an apply function). Here is an example in a loop:
for (i in 1:nrow(data)){
for (j in 1:ncol(data)){
# fixed = TRUE treats "." as a literal dot; as a regex it would match every character
data[i,j] <- gsub(".", "", data[i,j], fixed = TRUE)
}
}
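Since gsub works on whole character vectors, the same cleanup can be sketched without an explicit loop. The small matrix m is hypothetical, and m[] <- ... is used so that the matrix dimensions are kept.

```r
# Hypothetical character matrix with stray dots.
m <- matrix(c("1.", "2.", ".", "4."), nrow = 2)

# Remove literal dots from every cell at once, keeping the matrix shape.
m[] <- gsub(".", "", m, fixed = TRUE)

# Convert to numeric; cells that became empty strings turn into NA.
num <- matrix(as.numeric(m), nrow = nrow(m))
```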
Maybe you could do something creative like this:
read.table(text='1 2 3 4 ,
5 6 7 8 ,
9 1 2 3 ,', sep=' ', na.strings=',')
And then drop the last column.
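On the dots themselves: when a sparse matrix from the Matrix package is printed, a . stands for a zero entry, so the values are already numeric. A minimal sketch with a hypothetical 2 x 5 matrix (not the football data), assuming the adjacency matrix is such a sparse matrix, as the printed output suggests:

```r
library(Matrix)

# Hypothetical sparse adjacency-like matrix; it prints with "." for zeros.
adj <- sparseMatrix(i = c(1, 1, 2), j = c(2, 5, 1), x = c(1, 1, 1),
                    dims = c(2, 5))

# as.matrix() gives an ordinary dense numeric matrix with explicit zeros.
dense <- as.matrix(adj)
```

For a large graph the dense copy can be expensive, so it is usually better to keep the sparse form and index it directly.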
I get a weird effect when trying to overload the + operator while using the Matrix package with sparse matrices. I first define a very simple class that does not use the Matrix package but has a + operator. I then sum two sparse matrices. The first M+M addition delivers the expected result, but the second throws an error. Here is a very simple piece of code that reproduces the error:
require(Matrix)
setClass("TestM",representation(M='numeric'))
setMethod("initialize", "TestM", function(.Object,x) {
.Object@M = x
.Object
})
setMethod("+", c("TestM","TestM"), function(e1,e2) {
e1@M + e2@M
})
M = Matrix(diag(1:10),sparse=T)
M+M # > FINE
M+M # > ERROR
M = Matrix(diag(1:10),sparse=F)
M+M # > FINE
M+M # > FINE
The second addition throws the following error:
Error in forceSymmetric(callGeneric(as(e1, "dgCMatrix"), as(e2, "dgCMatrix"))) :
error in evaluating the argument 'x' in selecting a method for function
'forceSymmetric': Error in .Arith.Csparse(e1, e2, .Generic, class. = "dgCMatrix") :
object '.Generic' not found
And the error does not happen if the matrices are not sparse. Is there some interference between the + I define and the + for sparseMatrix ? Do I not define the + operator correctly?
Thank you!
Try setting the Ops class to be overloaded:
> setMethod(Ops, c("TestM","TestM"), function(e1,e2) {
+ e1@M + e2@M
+ })
[1] "Ops"
attr(,"package")
[1] "base"
>
> M = Matrix(diag(1:10),sparse=T)
> M+M # > FINE
10 x 10 sparse Matrix of class "dsCMatrix"
[1,] 2 . . . . . . . . .
[2,] . 4 . . . . . . . .
[3,] . . 6 . . . . . . .
[4,] . . . 8 . . . . . .
[5,] . . . . 10 . . . . .
[6,] . . . . . 12 . . . .
[7,] . . . . . . 14 . . .
[8,] . . . . . . . 16 . .
[9,] . . . . . . . . 18 .
[10,] . . . . . . . . . 20
> M+M # (NOT error)... was ERROR
10 x 10 sparse Matrix of class "dsCMatrix"
[1,] 2 . . . . . . . . .
[2,] . 4 . . . . . . . .
[3,] . . 6 . . . . . . .
[4,] . . . 8 . . . . . .
[5,] . . . . 10 . . . . .
[6,] . . . . . 12 . . . .
[7,] . . . . . . 14 . . .
[8,] . . . . . . . 16 . .
[9,] . . . . . . . . 18 .
[10,] . . . . . . . . . 20
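One caveat with the group method as written above: its body always adds, whichever operator was called. A minimal sketch of the usual idiom, using callGeneric() so that each operator in the Ops group is applied to the slots. The class is redefined here without the custom initialize method, and a, b, s and p are only illustrative names.

```r
library(methods)

setClass("TestM", representation(M = "numeric"))

# Group method: callGeneric() re-dispatches the operator that was actually
# called (+, *, ==, ...) on the underlying numeric slots.
setMethod("Ops", signature("TestM", "TestM"), function(e1, e2) {
  callGeneric(e1@M, e2@M)
})

a <- new("TestM", M = c(1, 2, 3))
b <- new("TestM", M = c(10, 20, 30))

s <- a + b   # element-wise sum of the slots
p <- a * b   # element-wise product of the slots
```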
Almost 3.5 years later, I stumbled upon the same error, which is also the topic of a more recent question: "setMethod and package Matrix". I sent a bug report to R-devel but found out afterwards that the error is only reproducible if the overloading is done outside of a package. In other words, if you define the + method for objects of class TestM within a package and load the overloaded + method by loading the package, the initial problem goes away without having to overload the whole group generic (which is not always possible).