How is R able to sum an integer sequence so fast?

Create a large contiguous sequence of integers:
x <- 1:1e10
How is R able to compute the sum so fast?
sum(x)
Doesn't it have to loop over 1e10 elements in the vector and sum each element?

Summing up the comments:
R introduced something called ALTREP, or ALternate REPresentation for R objects. Its intent is to do some things more efficiently. From https://www.r-project.org/dsc/2017/slides/dsc2017.pdf, some examples include:
- allow vector data to be in a memory-mapped file or distributed;
- allow compact representation of arithmetic sequences;
- allow adding meta-data to objects;
- allow computations/allocations to be deferred;
- support alternative representations of environments.
The second and fourth bullets seem appropriate here.
We can see a hint of this in action by looking at what I'm inferring is at the core of the R sum primitive for altreps, at https://github.com/wch/r-source/blob/7c0449d81c853f781fb13e9c7118065aedaf2f7f/src/main/altclasses.c#L262:
static SEXP compact_intseq_Sum(SEXP x, Rboolean narm)
{
#ifdef COMPACT_INTSEQ_MUTABLE
    /* If the vector has been expanded it may have been modified. */
    if (COMPACT_SEQ_EXPANDED(x) != R_NilValue)
        return NULL;
#endif
    double tmp;
    SEXP info = COMPACT_SEQ_INFO(x);
    R_xlen_t size = COMPACT_INTSEQ_INFO_LENGTH(info);
    R_xlen_t n1 = COMPACT_INTSEQ_INFO_FIRST(info);
    int inc = COMPACT_INTSEQ_INFO_INCR(info);
    tmp = (size / 2.0) * (n1 + n1 + inc * (size - 1));
    if (tmp > INT_MAX || tmp < R_INT_MIN)
        /**** check for overflow of exact integer range? */
        return ScalarReal(tmp);
    else
        return ScalarInteger((int) tmp);
}
In short, summing a gap-free arithmetic sequence reduces to a closed-form formula; it's only when there are gaps or NAs that things become more complicated.
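The tmp line above is just the textbook arithmetic-series formula, sum = n/2 * (2*first + inc*(n-1)). A throwaway check in R (my own code, not part of the R sources):
n <- 1e6; first <- 1; inc <- 1
(n / 2) * (2 * first + inc * (n - 1))  # closed form, no loop over elements: 500000500000
sum(1:1e6)                             # same value, computed via the ALTREP shortcut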
In action:
vec <- 1:1e10
sum(vec)
# [1] 5e+19
sum(vec[-10])
# Error: cannot allocate vector of size 37.3 Gb
### win11, R-4.2.2
Ideally we would see that sum(vec) == sum(vec[-10]) + 10, but we cannot: subsetting with vec[-10] forces the compact sequence to be materialised as an ordinary 10-billion-element vector (hence the 37.3 Gb allocation), so the sequence-summing optimization no longer applies.
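You can also see the compact representation directly with R's low-level inspector (output abbreviated and version-dependent, so treat it as illustrative):
.Internal(inspect(1:1e10))
# @... 13 INTSXP g0c0 ...  1 : 10000000000 (compact)
Once such a sequence is modified it gets expanded into ordinary storage (the COMPACT_INTSEQ_MUTABLE branch in the C code above checks for exactly that), and subsetting as in vec[-10] likewise has to materialise an ordinary vector.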

Related

Outlier for function runtime in R

I am trying to look at system runtime for computationally heavy math functions, but when I run my code I end up with an outlier at n = 13.
[Plot: Wilson's Theorem runtime in R; image omitted (I can't upload photos directly yet)]
wilson_r <- (function(x) factorial(x-1) %% x == x-1)
r_wilson_runtime <- c(1:22)
# R cannot compute `wilson_r(23)` or any n > 22, as R hits a 64-bit limit and log2(23!) > 64.
for (x in c(1:22)) {
  holder_times <- c(1:10000)
  for (y in c(1:10000)) {
    start_time <- as.numeric(Sys.time())
    wilson_r(x)
    end_time <- as.numeric(Sys.time())
    holder_times[y] <- end_time - start_time
  }
  r_wilson_runtime[x] <- mean(holder_times * (10**6))
}
I have tried knitting the document several times, and the outlier remains. Is there a particular reason for the outlier?
The result can sometimes be noisy. If it always happens at the same n (be sure knitr is regenerating the whole document), it is just a coincidence. You can easily get rid of the noise (outlier measurements) in your example by taking a median instead of a mean.
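With the question's own timing loop, that is a one-line change (a sketch against the variables used in the question):
# after the inner loop has filled holder_times:
r_wilson_runtime[x] <- median(holder_times) * 1e6  # microseconds; robust to the odd slow run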
That said, R has a special function, system.time, which is designed for measuring execution time. It is also better to include the inner repetition loop inside the measurement, like this:
wilson_r <- (function(x) factorial(x-1) %% x == x-1)
r_wilson_n = 1:22
r_wilson_runtime = sapply(r_wilson_n, function(x) {
  N = 100000
  ret = system.time({for (y in c(1:N)) wilson_r(x)})
  1e6 * ret[1] / N
})
plot(r_wilson_n, r_wilson_runtime)
Nevertheless, the result can still be somewhat noisy for such cheap functions (R is a garbage-collected language).
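If you need sharper per-call numbers than Sys.time() gives, a dedicated timing package is another option (an aside beyond the original answer; it assumes the microbenchmark package is installed):
library(microbenchmark)
# times each expression many times and reports min, median, max, etc.
microbenchmark(wilson_r(10), wilson_r(20), times = 1000L)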
As for your wilson_r for higher n, it is not a good idea to build up huge intermediate values and only take the modulo at the end; it is better to take the modulo at every multiplication. You can use the inline package to make a small C function that calculates this efficiently:
factorial_modulo = inline::cfunction(
signature(v="integer"),
" int n=v[0], ret=1, i;
for (i=2; i<n; i++)
ret = (ret * i) % n;
v[0] = ret;",
convention=".C")
wilson_r <- (function(x) factorial_modulo(x)==x-1)
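For reference, here is the same modulo-at-every-multiplication idea in plain R, without compiling anything (my own sketch, not part of the original answer; it stays exact as long as n^2 is below 2^53):
factorial_modulo_r <- function(n) {
  ret <- 1
  if (n > 2) for (i in 2:(n - 1)) ret <- (ret * i) %% n  # reduce mod n at every step
  ret
}
wilson_r2 <- function(x) factorial_modulo_r(x) == x - 1
wilson_r2(23)  # TRUE: 23 is prime, and this works past the point where factorial(x) loses precision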

How does the "runif" function work internally in R?

I am trying to generate a set of uniformly distributed numbers in R. I know that we can use the function "runif" for this, but I really want to understand the idea behind how this function was developed, in the sense of how the code for "runif" works. So, in a nutshell, I want to create my own function that can do the same task as "runif".
Ultimately, runif calls a pseudorandom number generator. One of the simpler ones, defined in C within the R code base, can be found here and should be straightforward to emulate:
static unsigned int I1=1234, I2=5678;

void set_seed(unsigned int i1, unsigned int i2)
{
    I1 = i1; I2 = i2;
}

void get_seed(unsigned int *i1, unsigned int *i2)
{
    *i1 = I1; *i2 = I2;
}

double unif_rand(void)
{
    I1 = 36969*(I1 & 0177777) + (I1>>16);
    I2 = 18000*(I2 & 0177777) + (I2>>16);
    return ((I1 << 16)^(I2 & 0177777)) * 2.328306437080797e-10; /* in [0,1) */
}
So effectively this takes the two integer seed states, updates and mixes them bitwise, then combines them into a single 32-bit value and multiplies it by a small constant (about 2^-32) to scale the result into the [0, 1) range.
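If the goal is to emulate this in pure R, here is a minimal sketch of the same recurrence (it is the same scheme as R's "Marsaglia-Multicarry" RNGkind; the names below are mine, and double arithmetic stands in for C's unsigned 32-bit integers):
i1 <- 1234
i2 <- 5678
my_unif_rand <- function() {
  # update each state word: multiply its low 16 bits, add its high 16 bits (the "carry")
  i1 <<- 36969 * (i1 %% 65536) + i1 %/% 65536
  i2 <<- 18000 * (i2 %% 65536) + i2 %/% 65536
  # (I1 << 16) ^ (I2 & 0xFFFF): the two halves occupy disjoint bits, so XOR is just addition
  word <- (i1 %% 65536) * 65536 + (i2 %% 65536)
  word * 2.328306437080797e-10  # scale the 32-bit word into [0, 1)
}
replicate(5, my_unif_rand())  # five pseudo-uniform draws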

Get a number from an array of digits

To split a number into digits in a given base, Julia has the digits() function:
julia> digits(36, base = 4)
3-element Array{Int64,1}:
0
1
2
What's the reverse operation? If you have an array of digits and the base, is there a built-in way to convert that to a number? I could print the array to a string and use parse(), but that sounds inefficient, and also wouldn't work for bases > 10.
The previous answers are correct, but there is also the matter of efficiency:
sum([x[k]*base^(k-1) for k=1:length(x)])
collects the numbers into an array before summing, which causes unnecessary allocations. Skip the brackets to get better performance:
sum(x[k]*base^(k-1) for k in 1:length(x))
This also allocates an array before summing: sum(d.*4 .^(0:(length(d)-1)))
If you really want good performance, though, write a loop and avoid repeated exponentiation:
function undigit(d; base=10)
    s = zero(eltype(d))
    mult = one(eltype(d))
    for val in d
        s += val * mult
        mult *= base
    end
    return s
end
This has one extra, unnecessary multiplication at the end; you could try to figure out some way of skipping it. But in my tests the performance is 10-15x better than the other approaches, with zero allocations.
Edit: There's actually a slight risk to the type handling above. If the input vector and base have different integer types, you can get a type instability. This code should behave better:
function undigits(d; base=10)
    (s, b) = promote(zero(eltype(d)), base)
    mult = one(s)
    for val in d
        s += val * mult
        mult *= b
    end
    return s
end
The answer seems to be written directly within the documentation of digits:
help?> digits
search: digits digits! ndigits isdigit isxdigit disable_sigint
digits([T<:Integer], n::Integer; base::T = 10, pad::Integer = 1)
Return an array with element type T (default Int) of the digits of n in the given base,
optionally padded with zeros to a specified size. More significant digits are at higher
indices, such that n == sum([digits[k]*base^(k-1) for k=1:length(digits)]).
So for your case this will work:
julia> d = digits(36, base = 4);
julia> sum([d[k]*4^(k-1) for k=1:length(d)])
36
And the above code can be shortened with the dot operator:
julia> sum(d.*4 .^(0:(length(d)-1)))
36
Using foldr and muladd for maximum conciseness and efficiency
undigits(d; base = 10) = foldr((a, b) -> muladd(base, b, a), d, init=0)

Why do I get this error in MATLAB?

I have the image and the vector
a = imread('Lena.tiff');
v = [0,2,5,8,10,12,15,20,25];
and this M-file
function y = Funks(I, gama, c)
[m n] = size(I);
for i = 1:m
    for j = 1:n
        J(i, j) = (I(i, j) ^ gama) * c;
    end
end
y = J;
imshow(y);
when I'm trying to do this:
f = Funks(a,v,2)
I am getting this error:
??? Error using ==> mpower
Integers can only be combined with integers of the same class, or scalar doubles.
Error in ==> Funks at 5
J(i, j) = (I(i, j) ^ gama) * c;
Can anybody help me with this, please?
The error is caused because you're trying to raise a number to a vector power. Translated (i.e. replacing formal arguments with actual arguments in the function call), it would be something like:
J(i, j) = (a(i, j) ^ [0,2,5,8,10,12,15,20,25]) * 2
Element-wise power .^ won't work either, because you'd be trying to stuff a vector into a scalar container.
Later edit: If you want to apply each gamma to your image, maybe this loop is more intuitive (though not the most efficient):
a = imread('Lena.tiff'); % Pics or GTFO
v = [0,2,5,8,10,12,15,20,25]; % Gamma (ar)ray -- this will burn any picture
f = cell(1, numel(v)); % Prepare container for your results
for k=1:numel(v)
    f{k} = Funks(a, v(k), 2); % Save result from your function
end;
% (Afterwards you use cell array f for further processing)
Or you may take a look at the other (more efficient if maybe not clearer) solutions posted here.
Later(er?) edit: If your TIFF file is CMYK, then the result of imread is an MxNx4 color matrix, which must be handled differently than usual (because it is 3-dimensional).
There are two ways I would follow:
1) arrayfun
results = arrayfun(@(i) I(:).^gama(i)*c, 1:numel(gama), 'UniformOutput', false);
J = cellfun(@(x) reshape(x, size(I)), results, 'UniformOutput', false);
2) bsxfun
results = bsxfun(@power, I(:), gama)*c;
results = num2cell(results, 1);
J = cellfun(@(x) reshape(x, size(I)), results, 'UniformOutput', false);
What you're trying to do makes no sense mathematically. You're trying to assign a vector to a number. Your problem is not the MATLAB programming, it's in the definition of what you're trying to do.
If you're trying to produce several images J, each of which corresponds to a certain gamma applied to the image, you should do it as follows:
function J = Funks(I, gama, c)
[m n] = size(I);
% get the number of images to produce
k = length(gama);
% Pre-allocate the output
J = zeros(m, n, k);
for i = 1:m
    for j = 1:n
        J(i, j, :) = (I(i, j) .^ gama) * c;
    end
end
In the end you will get images J(:,:,1), J(:,:,2), etc.
If this is not what you want to do, then figure out your equations first.

Diffie-Hellman -- Primitive root mod n -- cryptography question

In the snippet below, please explain what is happening, starting with the first for loop, and why. Why is 0 added in the first loop, and why is 1 added in the second? What is going on in the if statement involving bigi? Finally, explain the modPow method. Thank you in advance for meaningful replies.
public static boolean isPrimitive(BigInteger m, BigInteger n) {
    BigInteger bigi, vectorint;
    Vector<BigInteger> v = new Vector<BigInteger>(m.intValue());
    int i;
    for (i = 0; i < m.intValue(); i++)
        v.add(new BigInteger("0"));
    for (i = 1; i < m.intValue(); i++)
    {
        bigi = new BigInteger("" + i);
        if (m.gcd(bigi).intValue() == 1)
            v.setElementAt(new BigInteger("1"), n.modPow(bigi, m).intValue());
    }
    for (i = 0; i < m.intValue(); i++)
    {
        bigi = new BigInteger("" + i);
        if (m.gcd(bigi).intValue() == 1)
        {
            vectorint = v.elementAt(bigi.intValue());
            if (vectorint.intValue() == 0)
                i = m.intValue() + 1;
        }
    }
    if (i == m.intValue() + 2)
        return false;
    else
        return true;
}
Treat the vector as a list of booleans, with one boolean for each number from 0 to m-1. When you view it that way, it becomes obvious that each value is set to 0 to initialize it to false, and then set to 1 later to mark it as true.
The last for loop is testing all the booleans. If any of them are 0 (indicating false), then the function returns false. If all are true, then the function returns true.
Explaining the if statement you asked about would require explaining what a primitive root mod n is, which is the whole point of the function. I think if your goal is to understand this program, you should first understand what it implements. If you read Wikipedia's article on it, you'll see this in the first paragraph:
In modular arithmetic, a branch of number theory, a primitive root modulo n is any number g with the property that any number coprime to n is congruent to a power of g (mod n). That is, if g is a primitive root (mod n), then for every integer a that has gcd(a, n) = 1, there is an integer k such that g^k ≡ a (mod n). k is called the index of a. That is, g is a generator of the multiplicative group of integers modulo n.
The function modPow implements modular exponentiation. Once you understand how to find a primitive root mod n, you'll understand it.
Perhaps the final piece of the puzzle for you is to know that two numbers are coprime if their greatest common divisor is 1. And so you see these checks in the algorithm you pasted.
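Putting those pieces together, here is the same test sketched in R for small moduli, to match the rest of this digest's examples (the helpers gcd2 and modpow are mine; base R has no gcd, and the Java version uses BigInteger precisely to avoid the overflow this naive sketch ignores):
gcd2 <- function(a, b) if (b == 0) a else gcd2(b, a %% b)  # Euclid's algorithm
modpow <- function(base, exp, mod) {  # naive modular exponentiation, reducing at every step
  r <- 1
  for (k in seq_len(exp)) r <- (r * base) %% mod
  r
}
is_primitive <- function(n, m) {
  hit <- logical(m)  # hit[r + 1] records that residue r appeared as a power of n
  for (i in 1:(m - 1))
    if (gcd2(i, m) == 1) hit[modpow(n, i, m) + 1] <- TRUE
  targets <- Filter(function(r) gcd2(r, m) == 1, 1:(m - 1))
  all(hit[targets + 1])  # every residue coprime to m must be some power of n
}
is_primitive(3, 7)  # TRUE: the powers of 3 mod 7 are 3, 2, 6, 4, 5, 1
is_primitive(2, 7)  # FALSE: the powers of 2 mod 7 only reach 1, 2, 4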
Bonus link: This paper has some nice background, including how to test for primitive roots near the end.
