I want to save the integral values in an array, say for q=1 to q=10 in the following program. But because the output includes a non-numeric part, I am not able to do so. Kindly help.
q=10
integrand<-function(x)(q*x^3)
integrate(integrand,lower=0,upper=10)
The output is 25000 with absolute error < 2.8e-10
How can I remove the non-numerical part?
str() is your friend to figure this out:
> intval <- integrate(integrand,lower=0,upper=10)
> str(intval)
List of 5
$ value : num 25000
$ abs.error : num 2.78e-10
$ subdivisions: int 1
$ message : chr "OK"
$ call : language integrate(f = integrand, lower = 0, upper = 10)
- attr(*, "class")= chr "integrate"
So you can see that it is the value member you need:
> intval$value
[1] 25000
Then:
integrand<-function(x,q=10)(q*x^3)
tmpfun <- function(q) {
integrate(integrand,lower=0,upper=10,q=q)$value
}
sapply(1:10,tmpfun)
## [1] 2500 5000 7500 10000 12500 15000 17500 20000 22500 25000
I hope this is a simplified example, because this particular answer is much more simply obtained by (1) integrating analytically and (2) realizing that a scalar multiple can be taken out of an integral: 1:10*(10^4/4) gets the same answer.
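As a cross-check, the same values can be collected into a plain numeric vector with vapply and compared against the closed form mentioned above. This is only a sketch; the names vals and analytic are illustrative and do not appear in the original post.

```r
# Integrand with q as an explicit argument; integrate() forwards q through '...'
integrand <- function(x, q) q * x^3

# vapply guarantees a plain numeric vector with one value per q
vals <- vapply(
  1:10,
  function(q) integrate(integrand, lower = 0, upper = 10, q = q)$value,
  numeric(1)
)

# Closed form: the integral of q*x^3 from 0 to 10 is q * 10^4 / 4
analytic <- 1:10 * (10^4 / 4)
isTRUE(all.equal(vals, analytic))
```

Using vapply instead of sapply makes the return type explicit, so a failed integration cannot silently turn the result into a list.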
I have the following object
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:120671481] 0 2 3 6 10 13 21 22 25 36 ...
..@ p : int [1:51366] 0 3024 4536 8694 3302271 3302649 5715381 5756541 5784009 5801691 ...
..@ Dim : int [1:2] 10314738 51365
..@ Dimnames:List of 2
.. ..$ : chr [1:10314738] "line1" "line2" "line3" "line4" ...
.. ..$ : chr [1:51365] "sparito" "davide," "15enne" "di" ...
.. .. ..- attr(*, ".match.hash")=Class 'match.hash' <externalptr>
..@ x : num [1:120671481] 1 1 1 1 1 1 1 1 1 1 ...
..@ factors : list()
This object comes from the dtm_builder function of the text2map package. Since I would like to remove empty rows from the matrix, I thought about using the command:
raw.sum=apply(dtm,1,FUN=sum) # sum each row of the table
dtm2=dtm[raw.sum!=0,]
Anyway, I obtained the following error:
Error in asMethod(object): Cholmod error 'problem too large' at file ..
How could I fix it?
The short answer to your problem is that you're likely converting a sparse object to a dense object. Matrix package sparse matrix classes are very memory efficient when a matrix has a lot of zeros (like a DTM) by simply not allocating memory for the zeros.
@akrun's answer should work, but note that there is a rowSums function in base R and another in the Matrix package. You need to load the Matrix package first to get the latter.
Here is an example dgCMatrix (note not loading Matrix package yet)
m1 <- Matrix::Matrix(1:9, 3, 3, sparse = TRUE)
m1[1, 1:3] <- 0
class(m1)
If we use the base R rowSums you get the error:
rowSums(m1)
Error in rowSums(dtm): 'x' must be an array of at least two dimensions
If the Matrix package is loaded, rowSums is replaced with the Matrix package's own method, which works with dgCMatrix. The same is true for the bracket operator [. If you update text2map to version 0.1.5, Matrix is loaded by default.
That is a massive DTM, so you may still run into memory issues -- which will depend on your machine. One thing to note is that removing sparse rows/columns will not help much. So, although words that occur once or twice will make up about 60% of your columns, you will reduce the size in terms of memory more by removing the most frequent words (i.e. words with a number in every row).
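Putting the pieces together, a minimal sketch of dropping empty rows while staying sparse could look like the following. The small matrix m1 is a stand-in for the real DTM, and keep and m2 are illustrative names; memory behaviour on a matrix as large as the one in the question is not something this toy example can confirm.

```r
library(Matrix)  # provides rowSums and `[` methods for sparse classes

# Small stand-in for the DTM: row 1 is entirely zero
m1 <- Matrix(c(0, 0, 0,
               1, 0, 2,
               0, 3, 4), nrow = 3, byrow = TRUE, sparse = TRUE)

# Matrix::rowSums works directly on the sparse representation
keep <- Matrix::rowSums(m1) != 0

# Logical row indexing keeps the result in a sparse class
m2 <- m1[keep, , drop = FALSE]

dim(m2)
class(m2)
```

Calling Matrix::rowSums explicitly avoids depending on which rowSums happens to be found first on the search path.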
I have n matrices to which I am trying to apply nearPD() from the Matrix package.
I have done this using the following code:
A<-lapply(b, nearPD)
where b is the list of n matrices.
I now would like to convert the list A into matrices. For an individual matrix I would use the following code:
A<-matrix(runif(n*n),ncol = n)
PD_mat_A<-nearPD(A)
B<-as.matrix(PD_mat_A$mat)
But I am trying to do this with a list. I have tried the following code but it doesn't seem to work:
d<-lapply(c, as.matrix($mat))
Any help would be appreciated. Thank you.
Here is some code so you can try to reproduce this:
n<-10
generate<-function (n){
matrix(runif(10*10),ncol = 10)
}
b<-lapply(1:n, generate)
Here is the simplest method using as.matrix, as noted by @nicola in the comments below and (in a version using apply) by @cimentadaj in the comments above:
d <- lapply(A, function(i) as.matrix(i$mat))
My original answer, exploiting the nearPD data structure, was:
With a little fiddling with the nearPD object type, here is an extraction method:
d <- lapply(A, function(i) matrix(i$mat@x, ncol=i$mat@Dim[2]))
Below is some commentary on how I arrived at my answer.
This object is fairly complicated as str(A[[1]]) returns
List of 7
$ mat :Formal class 'dpoMatrix' [package "Matrix"] with 5 slots
.. ..@ x : num [1:100] 0.652 0.477 0.447 0.464 0.568 ...
.. ..@ Dim : int [1:2] 10 10
.. ..@ Dimnames:List of 2
.. .. ..$ : NULL
.. .. ..$ : NULL
.. ..@ uplo : chr "U"
.. ..@ factors : list()
$ eigenvalues: num [1:10] 4.817 0.858 0.603 0.214 0.15 ...
$ corr : logi FALSE
$ normF : num 1.63
$ iterations : num 2
$ rel.tol : num 0
$ converged : logi TRUE
- attr(*, "class")= chr "nearPD"
You are interested in "mat", which is accessed by $mat. The @ symbols show that "mat" is an S4 object whose components (slots) are accessed using @. The components of interest are "x", the matrix content, and "Dim", the dimensions of the matrix. The code above puts this information together to extract the matrices from the list of "nearPD" objects.
Below is a brief explanation of why as.matrix works in this case. Note the matrix inside a nearPD object is not a matrix:
is.matrix(A[[1]]$mat)
[1] FALSE
However, it is a "Matrix":
class(A[[1]]$mat)
[1] "dpoMatrix"
attr(,"package")
[1] "Matrix"
From the note in the help file, help("as.matrix,Matrix-method"),
Loading the Matrix namespace “overloads” as.matrix and as.array in the base namespace by the equivalent of function(x) as(x, "matrix"). Consequently, as.matrix(m) or as.array(m) will properly work when m inherits from the "Matrix" class.
So, the Matrix package is taking care of the as.matrix conversion "under the hood."
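For completeness, here is an end-to-end sketch of the as.matrix route on a small reproducible list. The 4 x 4 size, the seed, and the object names are arbitrary choices for illustration, not part of the original question.

```r
library(Matrix)

set.seed(42)
# Three random 4 x 4 matrices standing in for the list b
b <- lapply(1:3, function(k) matrix(runif(16), ncol = 4))

A <- lapply(b, nearPD)                        # list of "nearPD" objects
d <- lapply(A, function(i) as.matrix(i$mat))  # list of base R matrices

# Every element should now be an ordinary matrix
vapply(d, is.matrix, logical(1))
```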
I want to first calculate a Markov transition matrix and then raise it to a power. For the first step I use the markovchainFit function from the markovchain package, which returns a data.frame rather than a matrix, so I need to convert it to a matrix before exponentiating.
My R code snippet is like
#################################
# Estimate Transition Matrix #
#################################
setwd("G:/Data_backup/GDP_per_Capita")
library("foreign")
library("Hmisc")
mydata <- stata.get("G:/Data_backup/GDP_per_Capita/states.dta")
mydata
library(markovchain)
library(expm)
rgdp_e=mydata[,2:7]
rgdp_o=mydata[,8:13]
createSequenceMatrix(rgdp_e)
rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
rgdp_e_trans<-as.matrix(rgdp_e_trans)
is.matrix(rgdp_e_trans)
rgdp_e_trans %^% 1/5
rgdp_e_trans is a data frame, and I try to convert it to a numeric matrix. This seems to work when I test it with the is.matrix command. However, the final line gives me the error
Error in rgdp_e_trans %^% 2 :
(list) object cannot be coerced to type 'double'
After some searching on Stack Overflow, I found this question describing a similar problem and used rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans)) to coerce the object to 'double', but it does not seem to work.
Besides, the data.frame rgdp_e_trans contains no factors or characters.
The output in the console is like
> rgdp_e=mydata[,2:7]
> rgdp_o=mydata[,8:13]
> createSequenceMatrix(rgdp_e)
Error: not compatible with STRSXP
> rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
> rgdp_e_trans
$estimate
1 2 3 4 5
1 0.6172840 0.18930041 0.09053498 0.074074074 0.02880658
2 0.1125828 0.59602649 0.28476821 0.006622517 0.00000000
3 0.0000000 0.03846154 0.60256410 0.358974359 0.00000000
4 0.0000000 0.01162791 0.03488372 0.691860465 0.26162791
5 0.0000000 0.00000000 0.00000000 0.044247788 0.95575221
> rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
Error: (list) object cannot be coerced to type 'double'
> rgdp_e_trans<-as.matrix(rgdp_e_trans)
> is.matrix(rgdp_e_trans)
[1] TRUE
> rgdp_e_trans %^% 1/5
Error in rgdp_e_trans %^% 1 :
(list) object cannot be coerced to type 'double'
>
Any suggestion to fix the problem, or alternative way to calculate the exponent ? Thank you.
Additional:
> str(rgdp_e_trans)
List of 1
$ estimate:Formal class 'markovchain' [package "markovchain"] with 4 slots
.. ..@ states : chr [1:5] "1" "2" "3" "4" ...
.. ..@ byrow : logi TRUE
.. ..@ transitionMatrix: num [1:5, 1:5] 0.617 0.113 0 0 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. .. .. ..$ : chr [1:5] "1" "2" "3" "4" ...
.. ..@ name : chr "Bootstrap Mc"
and with the as.matrix part commented out:
rgdp_e=mydata[,2:7]
rgdp_o=mydata[,8:13]
createSequenceMatrix(rgdp_e)
rgdp_e_trans<-markovchainFit(data=rgdp_e,,method="bootstrap",nboot=5, name="Bootstrap Mc")
rgdp_e_trans
str(rgdp_e_trans)
# rgdp_e_trans<-as.numeric(unlist(rgdp_e_trans))
# rgdp_e_trans<-as.matrix(rgdp_e_trans)
# is.matrix(rgdp_e_trans)
rgdp_e_trans$estimate %^% 1/5
You can access the transition matrix directly from the object returned by markovchainFit as:
rgdp_e_trans$estimate@transitionMatrix
Here rgdp_e_trans is your return value from markovchainFit, which is actually a list containing the information from the fitting process. You access the estimate item of that list with the $ operator. The estimate object belongs to a formal S4 class (see e.g. Advanced R by Hadley Wickham for a description of the object systems used in R), which is why you have to use the @ operator to access its slots instead of the standard $ used for lists and the more common S3 objects.
If you print out the return value of as.matrix(rgdp_e_trans) it should be immediately obvious where your initial approach went wrong. In general it's a good idea to check the structure of an object with the str function - instead of relying on its print method - when you encounter unexpected results or are working with new types of objects.
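The list-then-slot access pattern can be illustrated without the markovchain package. In the sketch below, the class toyChain, the 2 x 2 matrix P, and the helper mat_power are all hypothetical stand-ins; only the `$estimate` and `@transitionMatrix` access mirrors the real object. Note also that in R the %any% operators bind more tightly than /, so `x %^% 1/5` is evaluated as `(x %^% 1)/5`, which is consistent with the "Error in rgdp_e_trans %^% 1" line in the console output.

```r
library(methods)

# Hypothetical S4 class mimicking the two relevant slots of a fitted chain
setClass("toyChain",
         slots = c(states = "character", transitionMatrix = "matrix"))

P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE,
            dimnames = list(c("1", "2"), c("1", "2")))

# Same shape as the fit result: a list whose 'estimate' element is an S4 object
fit <- list(estimate = new("toyChain", states = c("1", "2"), transitionMatrix = P))

# '$' for the list element, '@' for the S4 slot
tm <- fit$estimate@transitionMatrix

# Integer matrix power by repeated multiplication (k must be >= 1)
mat_power <- function(m, k) Reduce(`%*%`, replicate(k, m, simplify = FALSE))

tm5 <- mat_power(tm, 5)
rowSums(tm5)  # rows of a stochastic matrix power still sum to 1
```

Once the numeric matrix has been extracted this way, an operator such as %^% from expm can be applied to it with the exponent in parentheses.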
I have read an ascii (.spe) file into R. This file contains one column of, mostly, integers. However R is interpreting these integers incorrectly, probably because I am not specifying the correct format or something like that. The file was generated in Ortec Maestro software. Here is the code:
library(SDMTools)
strontium<-read.table("C:/Users/Hal 2/Desktop/beta_spec/strontium 90 spectrum.spe",header=F,skip=2)
str_spc<-vector(mode="numeric")
for (i in 1:2037)
{
str_spc[i]<-as.numeric(strontium$V1[i+13])
}
Here, for example, strontium$V1[14] has the value 0, but R is interpreting it as a 10. I think I may have to convert the data to some other format, or something like that, but I'm not sure and I'm probably googling the wrong search terms.
Here are the first few lines from the file:
$SPEC_ID:
No sample description was entered.
$SPEC_REM:
DET# 1
DETDESC# MCB 129
AP# Maestro Version 6.08
$DATE_MEA:
10/14/2014 15:13:16
$MEAS_TIM:
1516 1540
$DATA:
0 2047
Here is a link to the file: https://www.dropbox.com/sh/y5x68jen487qnmt/AABBZyC6iXBY3e6XH0XZzc5ba?dl=0
Any help appreciated.
I saw someone had made a parser for SPE Spectra files in python and I can't let that stand without there being at least a minimally functioning R version, so here's one that parses some of the fields, but gets you your data:
library(stringr)
library(gdata)
library(lubridate)
read.spe <- function(file) {
tmp <- readLines(file)
tmp <- paste(tmp, collapse="\n")
records <- strsplit(tmp, "\\$")[[1]]
records <- records[records!=""]
spe <- list()
spe[["SPEC_ID"]] <- str_match(records[which(startsWith(records, "SPEC_ID"))],
"^SPEC_ID:[[:space:]]*([[:print:]]+)[[:space:]]+")[2]
spe[["SPEC_REM"]] <- strsplit(str_match(records[which(startsWith(records, "SPEC_REM"))],
"^SPEC_REM:[[:space:]]*(.*)")[2], "\n")
spe[["DATE_MEA"]] <- mdy_hms(str_match(records[which(startsWith(records, "DATE_MEA"))],
"^DATE_MEA:[[:space:]]*(.*)[[:space:]]$")[2])
spe[["MEAS_TIM"]] <- strsplit(str_match(records[which(startsWith(records, "MEAS_TIM"))],
"^MEAS_TIM:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
spe[["ROI"]] <- str_match(records[which(startsWith(records, "ROI"))],
"^ROI:[[:space:]]*(.*)[[:space:]]$")[2]
spe[["PRESETS"]] <- strsplit(str_match(records[which(startsWith(records, "PRESETS"))],
"^PRESETS:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
spe[["ENER_FIT"]] <- strsplit(str_match(records[which(startsWith(records, "ENER_FIT"))],
"^ENER_FIT:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
spe[["MCA_CAL"]] <- strsplit(str_match(records[which(startsWith(records, "MCA_CAL"))],
"^MCA_CAL:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
spe[["SHAPE_CAL"]] <- str_match(records[which(startsWith(records, "SHAPE_CAL"))],
"^SHAPE_CAL:[[:space:]]*(.*)[[:space:]]*$")[2]
spe_dat <- strsplit(str_match(records[which(startsWith(records, "DATA"))],
"^DATA:[[:space:]]*(.*)[[:space:]]$")[2], "\n")[[1]]
spe[["SPE_DAT"]] <- as.numeric(gsub("[[:space:]]", "", spe_dat)[-1])
return(spe)
}
dat <- read.spe("strontium 90 spectrum.Spe")
str(dat)
## List of 10
## $ SPEC_ID : chr "No sample description was entered."
## $ SPEC_REM :List of 1
## ..$ : chr [1:3] "DET# 1" "DETDESC# MCB 129" "AP# Maestro Version 6.08"
## $ DATE_MEA : POSIXct[1:1], format: "2014-10-14 15:13:16"
## $ MEAS_TIM : chr "1516 1540"
## $ ROI : chr "0"
## $ PRESETS : chr [1:3] "None" "0" "0"
## $ ENER_FIT : chr "0.000000 0.002529"
## $ MCA_CAL : chr [1:2] "3" "0.000000E+000 2.529013E-003 0.000000E+000 keV"
## $ SHAPE_CAL: chr "3\n3.100262E+001 0.000000E+000 0.000000E+000"
## $ SPE_DAT : num [1:2048] 0 0 0 0 0 0 0 0 0 0 ...
head(dat$SPE_DAT)
## [1] 0 0 0 0 0 0
It needs some polish and there's absolutely no error checking (i.e. for missing fields), but no time today to deal with that. I'll finish the parsing and make a minimal package wrapper for it over the next couple days.
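If only the channel counts are needed, a base R sketch with no extra packages might look like this. It assumes that the line after $DATA: holds the first and last channel numbers (the "0 2047" in the file excerpt matches the 2048 values parsed above) and that one count per line follows. The sample lines and counts below are made up for illustration. A likely, though unconfirmed, cause of the original symptom (a 0 showing up as 10) is that read.table stored the mixed text/number column as a factor, in which case as.numeric returns the internal level codes rather than the printed values.

```r
# Made-up sample in the layout of the file excerpt; in practice use
# lines <- readLines("path/to/file.Spe")
lines <- c("$SPEC_ID:", "No sample description was entered.",
           "$MEAS_TIM:", "1516 1540",
           "$DATA:", "0 3",
           "       0", "       0", "       5", "      12")

# Locate the data header, ignoring surrounding whitespace
data_at <- which(trimws(lines) == "$DATA:")

# Next line: first and last channel number
channels <- as.integer(strsplit(trimws(lines[data_at + 1]), "[[:space:]]+")[[1]])
n <- channels[2] - channels[1] + 1

# Read exactly n count lines as numbers (character -> numeric, no factors involved)
counts <- as.numeric(trimws(lines[data_at + 1 + seq_len(n)]))
counts
```

Because the values go straight from character to numeric, the factor-code problem cannot occur here.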
I am using the tm package to compute a term-document matrix for a dataset. I now have to write the term-document matrix to a file, but when I use the write functions in R I get an error.
Here is the code which I am using and the error I am getting:
data("crude")
tdm <- TermDocumentMatrix(crude, control = list(weighting = weightTfIdf, stopwords = TRUE))
dtm <- DocumentTermMatrix(crude, control = list(weighting = weightTfIdf, stopwords = TRUE))
and this is the error while I use the write.table command on this data:
Error in cat(list(...), file, sep, fill, labels, append) : argument 1 (type 'list') cannot be handled by 'cat'
I understand that tdm is an object of type Simple Triplet Matrix, but how can I write it to a simple text file?
I think I might be misunderstanding the question, but if all you want to do is export the term document matrix to a file, then how about this:
m <- inspect(tdm)
DF <- as.data.frame(m, stringsAsFactors = FALSE)
write.table(DF)
Is that what you're after mate?
Hope that helps a little,
Tony Breyal
Should the file be "human-readable"? If not, use dump, dput, or save. If so, convert your list into a data.frame.
Edit: You can convert your list into a matrix if each list element is equal length by doing matrix(unlist(list.name), nrow=length(list.name[[1]])) or something like that (or with plyr).
Why aren't you doing your SVM analysis in R (e.g. with kernlab)?
Edit 2: Ok, I looked at your data, and it isn't easy to convert into a matrix because the list elements aren't equal length:
> is.list(tdm)
[1] TRUE
> str(tdm)
List of 7
$ i : int [1:1475] 15 29 151 152 173 205 215 216 227 228 ...
$ j : int [1:1475] 1 1 1 1 1 1 1 1 1 1 ...
$ v : Named num [1:1475] 3.32 4.32 2.32 2 2.32 ...
..- attr(*, "names")= chr [1:1475] "1.50" "16.00" "barrel," "barrel." ...
$ nrow : int 985
$ ncol : int 20
$ dimnames :List of 2
..$ Terms: chr [1:985] "(bpd)" "(bpd)." "(gcc)" "(it) appears to be nearing a crossroads with regard to\nderegulation, both as it pertains to investments and imports," ...
..$ Docs : chr [1:20] "127" "144" "191" "194" ...
$ Weighting: chr [1:2] "term frequency - inverse document frequency" "tf-idf"
- attr(*, "class")= chr [1:2] "TermDocumentMatrix" "simple_triplet_matrix"
In order to convert this to a matrix, you will need to either take elements of this list (e.g. i, j) or else do some other manipulation.
Edit 3: Just to conclude my commentary here: these objects are intended to be used with the inspect function (see the package vignette).
As discussed, in order to use a function like write.table, you will need to convert your list into a matrix, which requires some manipulation of that list such that you have several vectors of equal length. Looking at the structure of these tm objects: this will be very difficult to do, and I suggest you work with the helper functions that are included with that package.
dtmMatrix <- as.matrix(dtm)
write.csv(dtmMatrix, 'mydata.csv')
This certainly does the work. However, when I tried it on a very large DTM (25000 by 35000), it gave errors relating to lack of memory space.
I used the following method:
dtm <- DocumentTermMatrix(corpus)
dtm1 <- removeSparseTerms(dtm,0.998) ##max allowed sparsity 0.998
m <- inspect(dtm1)
DF <- as.data.frame(m, stringsAsFactors = FALSE)
write.csv(DF,"mydata0.998sparse.csv")
Which reduced the size of the document term matrix to a great extent!
Here you can increase the max allowable sparsity (closer to 1) to include more terms in DF.
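When the dense conversion runs out of memory, one alternative is to write only the non-zero entries in long (term, document, weight) form, using the i, j, v and dimnames fields visible in the str(tdm) output above. The sketch below uses a hand-built list with those field names as a hypothetical stand-in for a real tm object, so it has not been checked against tm itself; the file name is a temporary file.

```r
# Hypothetical stand-in with the same fields that str(tdm) shows
tdm <- list(
  i = c(1L, 3L, 2L),          # row (term) index of each non-zero entry
  j = c(1L, 1L, 2L),          # column (document) index
  v = c(3.32, 2.00, 4.32),    # the weights themselves
  nrow = 3L, ncol = 2L,
  dimnames = list(Terms = c("barrel", "oil", "price"), Docs = c("127", "144"))
)

# One row per non-zero entry; zeros are simply not written
long <- data.frame(
  term   = tdm$dimnames$Terms[tdm$i],
  doc    = tdm$dimnames$Docs[tdm$j],
  weight = tdm$v,
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".csv")
write.csv(long, out, row.names = FALSE)
read.csv(out)
```

The file size then grows with the number of non-zero entries rather than with rows times columns, at the cost of a format that has to be reshaped before it looks like a matrix again.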