I am not sure what I am doing wrong here.
ee <- eigen(crossprod(X))$values
for(i in 1:length(ee)){
if(ee[i]==0:1e^-9) stop("singular Matrix")}
Using the eigen value approach, I am trying to determine if the matrix is singular or not. I am attempting to find out if one of the eigen values of the matrix is between 0 and 10^-9. How can I use the if statement (as above) correctly to achieve my goal? Is there any other way to approach this?
What if I want to collect the zero (or near-zero) eigenvalues in a vector?
zer <-NULL
ee <- eigen(crossprod(X))$values
for(i in 1:length(ee)){
if(abs(ee[i])<=1e-9)zer <- c(zer,ee[i])}
Can I do that?
@AriBFriedman is quite correct. I can, however, see a couple of other issues:
1e^-9 should be 1e-9.
0:1e-9 returns 0. The : operator creates a sequence in steps of one from 0 up to 1e-9, so it returns just 0. See ?`:` for more details.
Using == with decimals will cause problems due to floating point arithmetic
In the form written, your code checks (individually) whether each element ee[i] == 0, which is not what you want (nor does it make sense in terms of floating point arithmetic).
You are looking for cases where the eigen value is less than this small number, so use less than (<).
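To make the points above concrete, here is a small sketch that can be pasted into a console. The numbers are made up for illustration and are not taken from the question.

```r
# `:` steps by one, so a sequence from 0 "up to" 1e-9 holds only its start value
s <- 0:1e-9
length(s)   # a single element, equal to 0

# exact comparison of decimals is fragile ...
exact <- (0.1 + 0.2 == 0.3)

# ... so compare against a small tolerance instead
close_enough <- abs((0.1 + 0.2) - 0.3) < 1e-9
```

Here exact comes out FALSE while close_enough is TRUE, which is why the check should be written with < and a tolerance rather than with ==.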
What you are looking for is something like
if(any(abs(ee) < 1e-9)) stop('singular matrix')
If you want to get the 0 (or small) eigenvalues, then use which
# this will give the indices (which elements are small)
small_values <- which(abs(ee) < 1e-9)
# and those small values
ee[small_values]
There is no need for the for loop as everything being done is vectorized.
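If the check is needed in several places, it could be wrapped in a small function. This is only a sketch: the name is_near_singular and the default tolerance are my own choices, and an absolute tolerance of 1e-9 may not suit matrices on a very different scale.

```r
# Hypothetical helper: TRUE if any eigenvalue of X'X is (numerically) zero
is_near_singular <- function(X, tol = 1e-9) {
  # crossprod(X) is symmetric, so say so and skip computing the eigenvectors
  ee <- eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values
  any(abs(ee) < tol)
}

X_full <- diag(3)                 # identity matrix: all eigenvalues are 1
X_dep  <- cbind(1:4, 2 * (1:4))   # second column is a multiple of the first

is_near_singular(X_full)
is_near_singular(X_dep)
```

The first call should report FALSE and the second TRUE, because the linearly dependent columns give crossprod(X_dep) a zero eigenvalue.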
if takes a single argument of length 1.
Try either ifelse or using any() or all() to turn your vector of logicals into a logical vector of length 1.
Here's an example reproducing your data:
X <- matrix(1:10,1:10)
ee <- eigen(crossprod(X))$values
This will test if any of the values of ee are > 0 AND < 1e-9
if (any((ee > 0) & (ee < 1e-9))) {stop("singular matrix")}
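As a rough illustration of how any() and all() collapse a logical vector to length one, here is a sketch with invented eigenvalues (not the ones from the question). As far as I know, R 4.2.0 and later treat a condition of length greater than one inside if as an error, so the collapse is needed rather than optional.

```r
ee_demo <- c(4.2, 1.3, 5e-12)   # made-up values; the last one is tiny

small <- ee_demo < 1e-9         # logical vector, one entry per eigenvalue
any_small <- any(small)         # is at least one entry TRUE?
all_small <- all(small)         # are all entries TRUE?

if (any_small) message("at least one eigenvalue is close to zero")
```

any_small is TRUE and all_small is FALSE here, so the if sees a single logical value.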
I am trying to generate the term frequency matrix of a document and subsequently look up the frequency of a certain word in a given query in that matrix. In the end I want to sum the frequencies found of the words in the query.
However, I keep getting the error message: Error in feature[i] <- x : replacement has length zero
I do not have a lot of coding experience in general, and this is my first time working with R, thus I am having difficulties solving this error. I presume it has something to do with a null-value. I already tried to avoid the nested for-loop with an apply function because I thought that might help (not sure though), but I could not quite get the hang of how to convert the for-loop into an apply function.
termfreqname <- function(queries,docs){
n <- length(queries)
feature <- vector(length=n)
for(i in 1:n){
query <- queries[i]
documentcorpus <- c(docs[i])
tdm <- TermDocumentMatrix(tm_corpus) #creates the term frequency matrix per document
m <- sapply(strsplit(query, " "), length) #length of the query in words
totalfreq <- list(0) #initialize list
freq_counter <- rowSums(as.matrix(tdm)) #counts the occurrence of a given word in the tdm matrix
for(j in 1:m){
freq <- freq_counter[word(query,j)] #finds frequency of each word in the given query, in the term frequency matrix
totalfreq[[j]] <- freq #adds this frequency to position j in the list
}
x <- reduce(totalfreq,'+') #sums all the numbers in the list
feature[i] <- x #adds this number to feature list
feature
}
}
It depends on your needs, but bottom line you need to add some if statement. How you use it depends on whether you want the default value of the vector to persist. In your code, while feature starts as a logical vector, it is likely coerced to integer or numeric once you overwrite its first value with a number. In that case, the default value in all positions of the vector will be 0 (or 0L, if integer). That's going to influence your decision on how to use the if statement.
if (length(x)) feature[i] <- x
This will only attempt to overwrite the ith value of feature if the x object has nonzero length (that's equivalent to if (length(x) > 0)). In this case, since the default value in the vector will be zero, this means that when you are done you will not be able to distinguish between an element known to be 0 and an element that failed to find anything.
The alternative (and my preference/recommendation):
feature[i] <- if (length(x)) x else NA
In this case, when you are done, you can clearly distinguish between known-zero (0) and uncertain/unknown values (NA). When doing math operations on that vector, you might want/need na.rm=TRUE ... but it all depends on your use.
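A minimal sketch of that guard, with a made-up list standing in for the per-query results (NULL plays the role of "nothing found"):

```r
# Hypothetical per-query sums; the second query found nothing at all
results <- list(3, NULL, 0)

feature <- numeric(length(results))
for (i in seq_along(results)) {
  x <- results[[i]]
  # a zero-length x would trigger "replacement has length zero", so use NA
  feature[i] <- if (length(x)) x else NA
}

feature
total_known <- sum(feature, na.rm = TRUE)
```

The second position ends up as NA while the genuine zero in the third position is kept, and na.rm = TRUE lets sum skip the unknown value.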
BTW, as MartinGal noted, your use of reduce(totalfreq, '+') is a little flawed: the string '+' may not be (is not?) recognized as a known function. The first fix to this is to use backticks around the function, so
totalfreq <- 5:7
reduce(totalfreq, '+')
# NULL
reduce(totalfreq, `+`)
# [1] 18
sum(totalfreq)
# [1] 18
The last of these is the much-preferred method. Why? With a vector of length 4, for instance, reduce takes the first two elements and adds them, then takes that result and adds it to the third, then takes that result and adds it to the fourth. Three operations. When you have 100 elements, it will make 99 individual additions. sum does it in a single call, and this does have an effect on performance (asymptotically).
However, if totalfreq is instead a list, then this changes slightly:
totalfreq <- as.list(5:7)
reduce(totalfreq, `+`)
# [1] 18
sum(totalfreq)
# Error in sum(totalfreq) : invalid 'type' (list) of argument
sum(unlist(totalfreq))
# [1] 18
The reduce code still works, and the sum by itself fails, but we can unlist the list first, effectively creating a vector, and then call sum on that. Much much faster asymptotically. And perhaps clearer, more declarative.
(I'm assuming purrr::reduce, btw ...)
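Putting the pieces together, the inner loop over the words of a query could probably be replaced by one indexed lookup followed by sum. This is a sketch under assumptions: the counts are invented, and freq_counter stands in for the rowSums(as.matrix(tdm)) result from the question.

```r
# Invented term counts, named by term (stand-in for the question's freq_counter)
freq_counter <- c(the = 4, cat = 2, sat = 1)

query <- "the cat flew"
words <- strsplit(query, " ", fixed = TRUE)[[1]]

# Indexing by name gives NA for words that are not in the matrix ("flew"),
# so drop those when summing
total <- sum(freq_counter[words], na.rm = TRUE)
total
```

Because a missing word yields NA rather than a zero-length result, total is always a single number and can be assigned to feature[i] directly.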
Say I have a function func that takes two scalar numeric inputs and delivers a scalar numeric result, and I have the following code to calculate a result vector u, based on input numeric vector v and initial value u0 for the result vector:
u<-rep(u0,1+length(v))
for (k in 2:length(u)){
u[k]<-func(u[k-1],v[k-1])
}
Note how a component of the result vector depends not only on the corresponding element of the input vector but also on the immediately prior element of the result vector. I can see no obvious way to vectorise this.
It is common to do this sort of thing in financial simulations, for instance when projecting forward company accounts, rolling them up with interest or inflation and adding in operational cash flows each year.
For some specific instances, it is possible to find a case-specific, non-iterative coding, but I would like to know if there's a general solution.
The problem can also be coded by recursion, as follows:
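calc.u<-function(v,u0){
if (length(v)<2){
func(u0,v[1]) }
else {
# recurse on all but the last element of v
u.prior<-calc.u(v[-length(v)],u0)
c(u.prior,func(u.prior[length(u.prior)],v[length(v)]) )
}
}
# note: unlike the loop above, this result does not include u0 itself
u<-calc.u(v,u0)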
Is there an R tactic for doing this without using either iteration or recursion, ie for vectorising it?
Answered: Thank you @MrFlick for introducing me to the Reduce function, which does exactly what I was wanting. I see that
Reduce('+',v,0,accumulate=T)[-1]
gives me
cumsum(v)
and
Reduce('*',v,1,accumulate=T)[-1]
gives me
cumprod(v)
as expected, where the [-1] is to discard the initial value.
Very nice indeed! Thanks again.
If you have this example
u0 <- 5
v <- (1:5)*2
func <- function(u,v) {u/2+v}
u <- rep(u0,1+length(v))
for (k in 2:length(u)){
u[k]<-func(u[k-1],v[k-1])
}
this is equivalent to
w <- Reduce(func, v, u0, accumulate=TRUE)
And we can check that
all(u==w)
# [1] TRUE
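For the financial use case mentioned in the question, a sketch along the same lines might look like this. The opening balance, interest rate and cash flows are invented, and roll_forward is a hypothetical name for the two-argument step function.

```r
opening_balance <- 1000
rate <- 0.05
cash_flows <- c(100, 100, 100)   # made-up yearly operational cash flows

# one step: roll last year's balance up with interest, then add the cash flow
roll_forward <- function(balance, flow) balance * (1 + rate) + flow

balances <- Reduce(roll_forward, cash_flows, opening_balance, accumulate = TRUE)
balances
```

The result has one more element than cash_flows because the opening balance is kept as the first entry. Reduce still calls the function once per element, so this is a tidier way to write the loop rather than true vectorisation.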
I've tried a couple ways of doing this problem but am having trouble with how to write it. I think I did the first three steps correctly, but now I have to fill the vector z with numbers from y that are divisible by four, not divisible by three, and have an odd number of digits. I know that I'm using the print function in the wrong way, I'm just at a loss on what else to use ...
This is different from that other question because I'm not using a while loop.
#Step 1: Generate 1,000,000 random, uniformly distributed numbers between 0
#and 1,000,000,000, and name as a vector x. With a seed of 1.
set.seed(1)
x=runif(1000000, min=0, max=1000000000)
#Step 2: Generate a rounded version of x with the name y
y=round(x,digits=0)
#Step 3: Empty vector named z
z=vector("numeric",length=0)
#Step 4: Create for loop that populates z vector with the numbers from y that are divisible by
#4, not divisible by 3, with an odd number of digits.
for(i in y) {
if(i%%4==0 && i%%3!=0 && nchar(i,type="chars",allowNA=FALSE,keepNA=NA)%%2!=0){
print(z,i)
}
}
NOTE: As per @BenBolker's comment, a loop is an inefficient way to solve your problem here. Generally, in R, try to avoid loops where possible to maximise the efficiency of your code. @SymbolixAU has provided an example of doing so here in the comments. Having said that, in aid of helping you learn the ins-and-outs of loops and vectors, here's a solution which only requires a change to one line of your code:
You've got the vector created before the loop, that's a good start. Now, inside your loop, you need to populate that vector. To do so, you've currently got print(z,i), which won't really do too much. What you need to do is change the vector itself:
z <- c( z, i )
Should work for you (just replace that print line in your loop).
What's happening here is that we're taking the existing z vector, binding i to the end of it, and making that new vector z again. So every time a value is added, the vector gets a little longer, such that you'll end up with a complete vector.
Where you have print, put this instead:
z <- append(z, i)
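For comparison, a loop-free version could use logical indexing. This is a sketch, not the code from the comments: it draws fewer numbers than the exercise asks for so that it runs quickly, and it counts digits with sprintf("%.0f", ...) to avoid scientific notation in the character conversion.

```r
set.seed(1)
# fewer draws than the 1,000,000 in the exercise, purely to keep this quick
y <- round(runif(10000, min = 0, max = 1e9))

# number of digits of each value, without scientific notation
digits <- nchar(sprintf("%.0f", y))

# one TRUE/FALSE per element of y, combining all three conditions
keep <- y %% 4 == 0 & y %% 3 != 0 & digits %% 2 == 1

z <- y[keep]
length(z)
```

Note the single & here: it works element by element on whole vectors, whereas && in the loop compares one value at a time.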
I am normally a maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims<-10
# Because R is not zero-indexed, add one
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create a list
x <- list()
X <- list()
for(k in 1:sims){
x[[k]]<-rep(NA,M[k])
X[[k]]<-rep(NA,M[k])
for(i in 1:M[k]){
x[[k]][i]<-runif(1,min=0,max=1)
if(x[[k]][i]>=0 & x[[k]][i]<=0.1056379){
X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
This will work and I think is what you were trying to do, BUT is not great R code. I strongly recommend using the lapply family instead of for loops, learning to use data.table and parallelisation if you need to get things to scale. Additionally if you want to read more about indexing in R and subsetting Hadley Wickham has a comprehensive break down here.
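As a small illustration of the indexing itself, the Maple expression v[1][n] corresponds roughly to v[[1]][n] on an R list. The vectors below are made up.

```r
# A list can hold vectors of different lengths
v <- list(c(10, 20, 30), c(5, 6))

first_vector  <- v[[1]]      # [[ ]] extracts the vector itself
second_of_it  <- v[[1]][2]   # then [ ] picks an element from that vector
still_a_list  <- v[1]        # single [ ] returns a list of length one

lengths(v)
```

The difference between v[1] and v[[1]] is the usual stumbling block: only the double bracket gives back the vector stored in that position.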
Hope this helps!
Let me start with a few remarks and then show you, how your problem can be solved using R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
set.seed(1234)  # reset the seed so both approaches draw the same numbers
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that indeed x[k] is not a valid variable name in R. [ is used for indexing, so x[k] extracts the element k from a vector x. Since you try to assign a vector of length larger than 1 to a single element, you get an error. What you probably want to use is a list, as you will see in the example below.
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
x <- runif(m, min=0,max=1)
# same condition as in the question: small x gets the first distribution
ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
rlnorm(m, 8.910837, 1.1890874))
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.
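One thing worth checking is what happens when a simulated count is zero. The sketch below uses a hand-picked M instead of rnegbin so that it runs without MASS, and it follows the condition from the question (x <= 0.1056379 selects the first distribution); treat both as assumptions for illustration.

```r
calculate_X <- function(m) {
  x <- runif(m)
  ifelse(x <= 0.1056379,
         rlnorm(m, 6.228244, 0.3565041),
         rlnorm(m, 8.910837, 1.1890874))
}

M <- c(3, 0, 5)             # hand-picked counts, including a zero
X <- lapply(M, calculate_X)
lengths(X)                  # one length per element of M

# why the loop version needs care: 1:0 counts down instead of being empty
down <- 1:0
none <- seq_len(0)
```

The vectorised function simply returns an empty vector for a count of zero, whereas for (i in 1:M[k]) would run twice in that case. seq_len(M[k]) avoids that if a loop is kept.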
If I have a vector a<-c(3, 5, 7, 8)
and run a[1], not surprisingly I will get 3
but if I run a[0] I get numeric(0)
What does this mean?
And what does this do?
Is there any normal use for it?
Others have answered what x[0] does, so I thought I'd expand on why it's useful: generating test cases. It's great for making sure that your functions work with unusual data structure variants that users sometimes produce accidentally.
For example, it makes it easy to generate 0 row and 0 column data frames:
mtcars[0, ]
mtcars[, 0]
These can arise when subsetting goes wrong:
mtcars[mtcars$cyl > 10, ]
But in your testing code it's useful to flag that you're doing it deliberately.
http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Indexing-by-vectors
As you can see it says: A special case is the zero index, which has null effects: x[0] is an empty vector and otherwise including zeros among positive or negative indices has the same effect as if they were omitted.
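The quoted behaviour can be checked with the vector from the question. This is just a sketch for the console:

```r
a <- c(3, 5, 7, 8)

empty      <- a[0]          # zero index alone: an empty numeric vector
with_pos   <- a[c(0, 2)]    # zero is ignored among positive indices
with_neg   <- a[c(0, -1)]   # zero is ignored among negative indices too

empty_rows <- mtcars[0, ]   # same idea for data frames: no rows, all columns
dim(empty_rows)
```

with_pos holds just the second element and with_neg holds everything except the first, so the zero contributes nothing in either case.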