Overlow on R trying to compute a big integer - r

I am encountering a R problem. I was simply trying to make the sum of all the different values in a column from a big data set. Code looks like that:
sum(Animal$Pigs, na.rm = TRUE)
However R tells me:
In sum(Animal$Pigs, na.rm = TRUE) :
integer overflow - use sum(as.numeric(.))
Does it mean that the resulting integer is too big ? Are there any packages that might help ? If not, is there another language I could turn to for large data set (I know a bit of python).

The manual of sum says:
Integer overflow should no longer happen since R version 3.5.0.
To calculate with large integer numbers you can use the gmp library.
sum(10L^100L, 10L^50L, 1L)
#[1] 1e+100
library(gmp)
sum.bigz(as.bigz("10")^100L, as.bigz("10")^50L, 1)
#[1] 10000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000001

Related

Implementing equations in R

I am new to R (also not too good at math) and I am trying to calculate this equation in R with some difficulties:
X is some integer data I have, with 550 samples.
Any help is appreciated since I am unsure how to do this. I think I have to use a for loop and the sum() function but other than that I don;t know.
R supports vectorisation, which means you very rarely need to implement for loops.
For example, you can solve your equation like so:
## I'm just making up a long numerical vector for x - obviously you can use anything
x <- 1:1000
solution <- sum(20/x)^0.5
Unless the brackets denote the integral, rather than the sum? In which case:
solution <- sum( (20/x)^0.5 )

R: difference between apply(object, 1, function(x) sum(x-a)/b) and rowsums((object-a)/b)

I'm new to R and am struggling with the apply function. It is really slow to execute and I was trying to optimize some code I received.
I am trying to do some matrix operations (element-wise multiplication and division on ~10^6 element matrices) then sum the rows of the resulting matrix. I found the fantastic library Rfast and it executes what I thought was the same code in about 1/30 the time, but I am getting systematic differences between my 'optimized' answer and the previous answer.
The original code was something along the lines of
ans <- apply(object, 1, function(x) sum((x - a) / b))
and my code is
ans = Rfast:::rowsums((object-a)/b)
I'm not sure if it's because one of the methods is throwing away precision or making rounding errors - any thoughts?
Edit
Trying to reproduce the error is pretty hard...
I have been able to isolate the discrepancy to when I divide by my vector b with entries each ~ 3000 (i.e. [3016.460436, 3021.210321, 3033.3303219]. If I take this term out the two methods give the same answer.
I then tried two methods to improve my answer, one was dividing b by 1000 then dividing the sum by 1000 at the end. This didn't work, presumably because the float precision is the same either way.
I also tried forcing my b vector to be integers, which also didn't work.
Sample data doesn't reproduce my error either, which is frustrating...
objmat = rbind(rep(c(1,0,0),1000),rep(c(0,0,1),1000))
amat = rbind(rep(c(0.064384654, 0.025465132, 0.36543214),1000))
bmat = rbind(rep(c(1016.460431,1021.210431,1033.330431),1000))
ans = apply(objmat,1,function(x) sum((x-amat)/bmat))
gives
ans[1] = 0.5418828413
rowsums((objmat[1,]-amat)/bmat) = 0.5418828413
I think it has to be a floating point precision error, but I'm not sure why my dummy data doesn't reproduce it, or which method (apply or rowsums) would be more accurate!

post-processing in mice, replace one variable with another

I'm trying to perform multiple imputation on a dataset in R where I have two variables, one of which needs to be the same or greater than the other one. I have set up the method and the predictive matrix, but I am having trouble understanding how to configure the post-processing. The manual (or main paper - van Buuren and Groothuis-Oudshoorn, 2011) states (section 3.5): "The mice() function has an argument post that takes a vector of strings of R commands. These commands are parsed and evaluated just after the univariate imputation function returns, and thus provide a way to post-process the imputed values." There are a couple of examples, of which the second one seems most useful:
R> post["gen"] <- "imp[[j]][p$data$age[!r[,j]]<5,i] <- levels(boys$gen)[1]"
this suggests to me that I could do:
R> ini <- mice(cbind(boys), max = 0, print = FALSE)
R> post["A"] <- "imp[[j]][p$data$B[!r[,j]]>p$data$A[!r[,j]],i] <- levels(boys$A)[boys$B]"
However, this doesn't work (when I plot A v B, I get random scatter rather than the points being confined to one half of the graph where A >= B).
I have also tried using the ifdo() function, as suggested in another sx post:
post["A"] <- "ifdo(A < B), B"
However, it seems the ifdo() function is not yet implemented. I tried running the code suggested for inspiration but afraid my R programming skills are not that brilliant.
So, in summary, has anyone any advice about how to implement post-processing in mice such that value A >= value B in the final imputed datasets?
Ok, so I've found an answer to my own question - but maybe this isn't the best way to do it.
In FIMD, there is a suggestion to do this kind of thing outside the imputation process, which thus gives:
R> long <- mice::complete(imp, "long", include = TRUE)
R> long$A <- with(long, ifelse(B < A, B, A))
This seems to work, so I'm happy.

I want to be able to manipulate objects in class 'phylo' - ie. round/ turn my bootstrap values from decimals (.998) into percentages (99%)

I am using RStudio, programs ape and phytools. I've generated a tree with 500 bootstrap replicates stored in an object of class phylo.
Where cw is the name of my tree, I've tried the following:
round(cw, digits = 2)
and I get the following error message:
Error in round(cw, digits = 2) :
non-numeric argument to mathematical function
I feel like it's probably a very simple manipulation but I'm not sure how to get there.
Hard to tell without a reproducible example but I guess that your bootstrap scores are probably stored in the $node.label subset of your tree.
You can try the following:
## Are the bootstraps in the $node.label object?
if(!is.null(cw$node.label)) {
## Are they as character or numeric?
class(cw$node.label)
}
If they are numeric values:
cw$node.label <- round(cw$node.label, digits = 2)
If they are characters, you can probably coerce them (that can produce some NAs)
cw$node.label <- round(as.numeric(cw$node.label), digits = 2)

R Sweave: digits number in xtable of prop.table

I'm making an xtableFtable on R Sweave and can't find a way to suppress the digits with this code. What I am doing false? I've read that it can happen if your values aren't numeric but factor or character, but is prop.table making them non-numeric? I'm lost...
library(xtable)
a <- ftable(prop.table(table(mtcars$mpg, mtcars$hp), margin=2)*100)
b <- xtableFtable(a, method = "compact", digits = 0)
print.xtableFtable(b, rotate.colnames = TRUE)
I've already tried with digits=c(0,0,0,0...) too.
You could use options(digits) to control how many digits will print. Try something like options(digits = 4) as the first line of your code (change 4 to whatever value you want between 1 and 22). See ?options for more information.
Or round the values before printing
a = round(ftable(prop.table(table(mtcars$mpg, mtcars$hp), margin=2)*100), 2)
b = xtableFtable(a, method = "compact")
print.xtableFtable(b, rotate.colnames = TRUE)
The "digits" argument to xtableFtable seems to be unimplemented (as of my version, which is 1.8.3), since after playing around with it for half an hour nothing seems to make any difference.
There's a hint to this effect in the function documentation:
It is not recommended that users change the values of align, digits or align. First of all, alternative values have not been tested. Secondly, it is most likely that to determine appropriate values for these arguments, users will have to investigate the code for xtableFtable and/or print.xtableFtable.
It's probably just carried over from the xtable function (on which xtableFtable is surely based) as a TODO which the maintainer hasn't gotten around to yet.

Resources