R: Avoid floating point arithmetic when normalizing to sum 0

I need to draw a vector from a normal distribution and normalize it to sum to 0, because I want to simulate power with the pwr.rasch() function from the pwrRasch package. Sounds easy enough.
I create the vector like this:
set.seed(123)
itempars <- rnorm(n = 10, mean = 0, sd = 1.8)
To normalize the parameters to sum 0, I subtract the sum of the vector from its last element, like this:
itempars[10] <- itempars[10] - sum(itempars)
Now sum(itempars) should be 0, but it is -8.326673e-17. How is that possible? How can I get it to 0? I already tried rounding, but that only increases the sum.
I don't want to choose every item parameter by hand. Thanks in advance!
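For reference, the residual here is at the scale of machine epsilon, so the usual approach is to compare against a tolerance rather than test for exact equality. A minimal base-R sketch:
set.seed(123)
itempars <- rnorm(n = 10, mean = 0, sd = 1.8)
itempars[10] <- itempars[10] - sum(itempars)

sum(itempars)                         # -8.326673e-17, not exactly 0
isTRUE(all.equal(sum(itempars), 0))   # TRUE: zero within numerical tolerance
abs(sum(itempars)) < 1e-12            # TRUE: explicit tolerance check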
EDIT:
Obviously the reason is floating-point arithmetic. But it's hard to imagine that there is no way around it.
The error message of pwr.rasch() is as follows:
Error in simul.rasch(eval(parse(text = ppar[[1]])), ipar[[1]]) :
Item pararameters are not normalized to sum-0
Sadly, the function is poorly documented. When I estimate group-wise item parameters with eRm's RM() function, which has an extra argument for normalizing to sum 0, I get a similar deviation to the one in my example.
Any trick would come in handy, as I don't want to create more than 50 normally distributed item parameters by hand. Even worse: if I understood floating-point arithmetic correctly, this problem can appear whenever doubles are used. It would be extremely limiting if I could only use integers as item parameters.
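That reading of floating-point arithmetic is correct: doubles cannot represent most decimal fractions exactly, so tiny residuals are the norm rather than the exception. A minimal demonstration in base R:
print(0.1 + 0.2, digits = 17)   # 0.30000000000000004
0.1 + 0.2 == 0.3                # FALSE: exact comparison of doubles fails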

I downloaded the source code of the pwrRasch package and changed the if condition from
if (all(round(unlist(lapply(ipar, sum)), 3) != 0)) {
stop("Item pararameters are not normalized to sum-0")
}
to
if (all(abs(round(unlist(lapply(ipar, sum)), 3)) > 1e-5)) {
stop("Item pararameters are not normalized to sum-0")
}
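As a quick sanity check of the patched condition against the example vector (a sketch, assuming ipar holds a single group of item parameters):
ipar <- list(itempars)                       # one group, wrapped in a list
sums <- round(unlist(lapply(ipar, sum)), 3)  # rounds -8.326673e-17 to 0
all(abs(sums) > 1e-5)                        # FALSE, so stop() is not triggered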

Related

R: difference between apply(object, 1, function(x) sum(x-a)/b) and rowsums((object-a)/b)

I'm new to R and am struggling with the apply function. It is really slow to execute and I was trying to optimize some code I received.
I am trying to do some matrix operations (element-wise multiplication and division on ~10^6 element matrices) then sum the rows of the resulting matrix. I found the fantastic library Rfast and it executes what I thought was the same code in about 1/30 the time, but I am getting systematic differences between my 'optimized' answer and the previous answer.
The original code was something along the lines of
ans <- apply(object, 1, function(x) sum((x - a) / b))
and my code is
ans <- Rfast::rowsums((object - a) / b)
I'm not sure if it's because one of the methods is throwing away precision or making rounding errors - any thoughts?
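For what it's worth, floating-point addition is not associative, so two routines that sum the same numbers in a different order, or with different intermediate precision, can legitimately disagree in the last digits. A minimal demonstration:
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # FALSE: grouping changes the result
print((0.1 + 0.2) + 0.3, digits = 17)    # 0.60000000000000009
print(0.1 + (0.2 + 0.3), digits = 17)    # 0.59999999999999998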
Edit
Trying to reproduce the error is pretty hard...
I have been able to isolate the discrepancy to the division by my vector b, whose entries are each ~3000 (e.g. [3016.460436, 3021.210321, 3033.3303219]). If I take this term out, the two methods give the same answer.
I then tried two approaches to improve my answer. One was dividing b by 1000 and then dividing the sum by 1000 at the end; this didn't work, presumably because the floating-point precision is the same either way.
I also tried forcing my b vector to be integers, which also didn't work.
Sample data doesn't reproduce my error either, which is frustrating...
objmat = rbind(rep(c(1,0,0),1000),rep(c(0,0,1),1000))
amat = rbind(rep(c(0.064384654, 0.025465132, 0.36543214),1000))
bmat = rbind(rep(c(1016.460431,1021.210431,1033.330431),1000))
ans = apply(objmat,1,function(x) sum((x-amat)/bmat))
gives
ans[1] = 0.5418828413
rowsums((objmat[1,]-amat)/bmat) = 0.5418828413
I think it has to be a floating point precision error, but I'm not sure why my dummy data doesn't reproduce it, or which method (apply or rowsums) would be more accurate!
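One way to probe it with the sample data (a base-R sketch): sum the same vector in a different order and see whether the result moves; if it does, the discrepancy is consistent with accumulation order rather than a bug in either method.
v <- as.vector((objmat[1, ] - amat) / bmat)
print(sum(v), digits = 17)
print(sum(rev(v)), digits = 17)    # may differ in the last digits
print(sum(sort(v)), digits = 17)   # smallest-first summation may differ too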

How can I calculate large numbers in this operation in RStudio?

I am executing this operation and the process gives me NaN:
Choques<-c(1:10)
print (Choques)
pr<-0
n<-3818
p<-0.040633627
for (i in Choques) {
  pr[i] <- (factorial(n) / (factorial(Choques[i]) * factorial(n - Choques[i]))) *
    p^Choques[i] * (1 - p)^(n - Choques[i])
  print(pr[i])
}
However, changing the variable n to a smaller number, say 20, makes it print numbers. I would like to know if there is a method to get numbers instead of NaN; I suppose the intermediate numbers are too big.
The intermediate factorials overflow: in double precision factorial(n) is Inf for any n > 170, and Inf/Inf gives NaN. Replace
factorial(n)/(factorial(Choques[i])*factorial(n-Choques[i]))
with
choose(n, Choques[i])
choose() computes the binomial coefficient without forming the huge factorials, so it does not overflow. Better yet, you are computing binomial probabilities in a loop and accumulating them in a vector, so you can drop the factorials and even the loop by using the vectorized function dbinom. Your loop can be replaced by the single line:
pr <- dbinom(Choques,n,p)
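For reference, the same numbers can also be obtained by working on the log scale, the standard trick when the pieces overflow but the result does not (a base-R sketch):
Choques <- 1:10
n <- 3818
p <- 0.040633627

pr <- dbinom(Choques, n, p)   # stable and vectorized

# log-scale equivalent: log C(n,k) + k*log(p) + (n-k)*log(1-p), then exp()
pr2 <- exp(lchoose(n, Choques) + Choques * log(p) + (n - Choques) * log1p(-p))
all.equal(pr, pr2)            # TRUE, up to numerical tolerance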

Applying rollapply to a written function or using a loop

This is probably a basic rollapply or loop question; however, I cannot find a way to instruct the rollapply function, or to write an expression for a loop calculation.
I have a vector of growth rates with an initial value of 100. I would like to calculate the value at each point of the growth series and obtain a vector of these values. Given that the actual growth series is much longer than the one below, writing it out term by term as in the example is not feasible.
x<-c(0.02,0.01,0.4,0.09,-0.3,0.1)
100*(1+x[1])*(1+x[2])*(1+x[3])*(1+x[4])*(1+x[5])*(1+x[6])#End Value
a1<-100*(1+x[1])#1st value
a2<-a1*(1+x[2])#2nd value
a3<-a2*(1+x[3])#3rd value
a4<-a3*(1+x[4])#4th value
a5<-a4*(1+x[5])#5th value
a6<-a5*(1+x[6])#6th value
s <- c(a1, a2, a3, a4, a5, a6) # vector of values
I believe rollapply could be used here; however, I cannot write the function so that it takes the prior value and the next one sequentially, and I am also unsure if and how to incorporate the initial value of 100 into the function, or whether to add it at the beginning of x. Maybe this can also be done as a loop. (The function below is pseudocode.)
x<-c(0.02,0.01,0.4,0.09,-0.3,0.1)
require(zoo);
fn<- function(y) {(1+prior x)*(1+next x)}
rollapply(x, 1, fun= fn, fill=NA, align='right')
Any help is welcome.
x<-c(0.02,0.01,0.4,0.09,-0.3,0.1)
desired <- 100*(1+x[1])*(1+x[2])*(1+x[3])*(1+x[4])*(1+x[5])*(1+x[6])#End Value
desired
100 * tail(cumprod(1 + x), 1)
Oh, dammit. I should have read the comments first. @G.Grothendieck has already been here. I suppose showing how to do it with Reduce could be useful:
> 100*Reduce("*", 1+x)
[1] 121.0506
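Since the question asks for the value at each point of the series, not just the end value, both approaches extend naturally to return every intermediate point:
x <- c(0.02, 0.01, 0.4, 0.09, -0.3, 0.1)

100 * cumprod(1 + x)   # the full series a1, ..., a6

# Reduce can keep intermediate values too; [-1] drops the leading init of 100
Reduce(function(v, r) v * (1 + r), x, init = 100, accumulate = TRUE)[-1]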

Programming a sensitivity analysis in R: Vary 1 parameter (column), hold others constant. Better way?

I want to test the sensitivity of a calculation to the value of 4 parameters. To do this, I want to vary one parameter at a time -- i.e., change Variable 1, hold variables 2-4 at a "default" value (e.g., 1). I thought an easy way to organize these values would be in a data.frame(), where each column corresponds to a different variable, and each row to a set of parameters for which the calculation should be made. I would then loop through each row of the data frame, evaluating a function given the parameter values in that row.
This seems like it should be a simple thing to do, but I can't find a quick way to do it.
The problem might be my overall approach to programming the sensitivity analysis, but I can't think of a good, simple way to program the aforementioned data.frame.
My code for generating the data.frame:
Adj_vals <- c(seq(0, 1, by=0.1), seq(1.1, 2, by=0.1)) #a series of values for 3 of the parameters to use
A_Adj_vals <- 10^(seq(1,14,0.5)) #a series of values for another one of the parameters to use
n1 <- length(Adj_vals)
n2 <- length(A_Adj_vals)
data.frame(
  "Dg_Adj" = c(Adj_vals, rep(1, n1*2 + n2)),                  # default is 1
  "Df_Adj" = c(rep(1, n1), Adj_vals, rep(1, n1 + n2)),        # default is 1
  "sd_Adj" = c(rep(1, n1*2), 0.01, Adj_vals[-1], rep(1, n2)), # default is 1, but unlike the others using Adj_vals, it can only take on values > 0
  "A"      = c(rep(1E7, n1*3), A_Adj_vals)                    # default is 10 million
)
This code produces the desired data.frame. Is there a simpler way to achieve the same result? I would accept an answer where sd_Adj takes on 0 instead of 0.01.
It's pretty debatable if this is better, but another way to do it would be to follow this pattern:
defaults<-data.frame(a=1,b=1,c=1,d=10000000)
merge(defaults[c("b","c","d")],data.frame(a=c(seq(0, 1, by=0.1), seq(1.1, 2, by=0.1))))
This should be pretty easy to cook up into a function that automatically removes the correct column from defaults, based on the column name in the data frame you are merging with, etc. A sketch of such a function follows.
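A minimal sketch of that function (vary_one is a made-up name, not from any package): merge() on data frames with no common columns returns the cross join, and rbind() then stacks the per-parameter grids, matching columns by name.
# vary one column over a set of values, holding all others at their defaults
vary_one <- function(defaults, name, values) {
  merge(defaults[setdiff(names(defaults), name)],  # defaults minus varied column
        setNames(data.frame(values), name))        # cross join with the values
}

defaults <- data.frame(Dg_Adj = 1, Df_Adj = 1, sd_Adj = 1, A = 1e7)
Adj_vals <- c(seq(0, 1, by = 0.1), seq(1.1, 2, by = 0.1))

grid <- rbind(
  vary_one(defaults, "Dg_Adj", Adj_vals),
  vary_one(defaults, "Df_Adj", Adj_vals),
  vary_one(defaults, "sd_Adj", Adj_vals),        # accepts 0 for sd_Adj, as allowed
  vary_one(defaults, "A", 10^seq(1, 14, 0.5))
)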

if statement in R?

I am not sure what I am doing wrong here.
ee <- eigen(crossprod(X))$values
for(i in 1:length(ee)){
  if(ee[i] == 0:1e^-9) stop("singular Matrix")
}
Using the eigenvalue approach, I am trying to determine whether the matrix is singular. I am attempting to find out if one of the eigenvalues of the matrix is between 0 and 10^-9. How can I use the if statement (as above) correctly to achieve my goal? Is there any other way to approach this?
What if I want to collect the zero eigenvalues in a vector?
zer <- NULL
ee <- eigen(crossprod(X))$values
for(i in 1:length(ee)){
  if(abs(ee[i]) <= 1e-9) zer <- c(zer, ee[i])
}
Can I do that?
@AriBFriedman is quite correct. I can, however, see a couple of other issues:
1e^-9 should be 1e-9.
0:1e-9 returns 0 (: creates a sequence in steps of 1, so between 0 and 1e-9 it yields just 0). See ?`:` for more details.
Using == with decimals will cause problems due to floating point arithmetic
As written, your code checks (individually) whether each element satisfies ee[i] == 0, which is not what you want (nor does it make sense in terms of floating-point arithmetic).
You are looking for cases where the eigenvalue is less than this small number, so use less than (<).
What you are looking for is something like
if(any(abs(ee) < 1e-9)) stop('singular matrix')
If you want to get the 0 (or small) eigenvalues, then use which:
# this will give the indices (which elements are small)
small_values <- which(abs(ee) < 1e-9)
# and those small values
ee[small_values]
There is no need for the for loop as everything being done is vectorized.
if takes a single argument of length 1.
Try either ifelse or using any() or all() to turn your vector of logicals into a logical vector of length 1.
Here's an example reproducing your data:
X <- matrix(1:10, nrow = 10)  # a 10 x 1 matrix
ee <- eigen(crossprod(X))$values
This will test whether any of the values of ee are > 0 AND < 1e-9:
if (any((ee > 0) & (ee < 1e-9))) {stop("singular matrix")}
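As a sketch of an alternative check (an assumption on my part, not from the answers above): an absolute cutoff like 1e-9 ignores the scale of the matrix. A tolerance relative to the largest eigenvalue, in the spirit of common numerical-rank checks, adapts to that scale:
X <- matrix(1:10, nrow = 10)
ee <- eigen(crossprod(X))$values

# treat an eigenvalue as zero if it is tiny relative to the largest one
tol <- max(abs(ee)) * length(ee) * .Machine$double.eps
if (any(abs(ee) < tol)) stop("singular matrix")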
