How does the "runif" function work internally in R?

I am trying to generate a set of uniformly distributed numbers in R. I know that the function "runif" does this, but I want to understand how it works internally, that is, how the code behind "runif" is written. In a nutshell, I want to write my own function that does the same task as "runif".

Ultimately, runif calls a pseudorandom number generator. One of the simpler ones is defined in C within the R code base and should be straightforward to emulate:
static unsigned int I1=1234, I2=5678;
void set_seed(unsigned int i1, unsigned int i2)
{
    I1 = i1; I2 = i2;
}
void get_seed(unsigned int *i1, unsigned int *i2)
{
    *i1 = I1; *i2 = I2;
}
double unif_rand(void)
{
    I1= 36969*(I1 & 0177777) + (I1>>16);
    I2= 18000*(I2 & 0177777) + (I2>>16);
    return ((I1 << 16)^(I2 & 0177777)) * 2.328306437080797e-10; /* in [0,1) */
}
So effectively this takes the initial integer seed values, shuffles them bitwise, then recasts them as double precision floating point numbers via multiplying by a small constant that normalises the doubles into the [0, 1) range.
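To make the mechanics concrete, here is a minimal Python sketch of the same generator. It is a port of the C code quoted above, not of R's default generator (R uses Mersenne Twister unless told otherwise), so it should not be expected to reproduce the numbers that runif prints in R. The names MulticarryRng and my_runif are hypothetical, chosen for illustration, and the rescaling lo + (hi - lo) * u is an assumption about how a runif-like wrapper would use the unit draws.

```python
MASK32 = 0xFFFFFFFF  # emulate C's 32-bit unsigned int arithmetic


class MulticarryRng:
    """Illustrative Python port of the two-seed C generator quoted above."""

    def __init__(self, i1=1234, i2=5678):
        self.i1 = i1 & MASK32
        self.i2 = i2 & MASK32

    def unif_rand(self):
        # Update each state: multiply its low 16 bits, then add its high 16 bits.
        self.i1 = (36969 * (self.i1 & 0xFFFF) + (self.i1 >> 16)) & MASK32
        self.i2 = (18000 * (self.i2 & 0xFFFF) + (self.i2 >> 16)) & MASK32
        # Low 16 bits of i1 form the high half, low 16 bits of i2 the low half.
        combined = ((self.i1 << 16) & MASK32) ^ (self.i2 & 0xFFFF)
        # Scale the 32-bit integer down to the unit interval.
        return combined * 2.328306437080797e-10


def my_runif(n, lo=0.0, hi=1.0, rng=None):
    """Return n draws rescaled from the unit interval to the range lo..hi."""
    if rng is None:
        rng = MulticarryRng()
    return [lo + (hi - lo) * rng.unif_rand() for _ in range(n)]


print(my_runif(5))
print(my_runif(3, lo=10.0, hi=20.0, rng=MulticarryRng(42, 99)))
```

The same seeds always give the same sequence, which is the sense in which the generator is pseudorandom: all of the apparent randomness comes from the repeated state update.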

Related

How is R able to sum an integer sequence so fast?

Create a large contiguous sequence of integers:
x <- 1:1e10
How is R able to compute the sum so fast?
sum(x)
Doesn't it have to loop over 1e10 elements in the vector and sum each element?
Summing up the comments:
R introduced something called ALTREP, or ALternate REPresentation for R objects. Its intent is to do some things more efficiently. From https://www.r-project.org/dsc/2017/slides/dsc2017.pdf, some examples include:
- allow vector data to be in a memory-mapped file or distributed;
- allow compact representation of arithmetic sequences;
- allow adding meta-data to objects;
- allow computations/allocations to be deferred;
- support alternative representations of environments.
The second and fourth bullets seem appropriate here.
We can see a hint of this in action by looking at what I'm inferring is at the core of the R sum primitive for altreps, at https://github.com/wch/r-source/blob/7c0449d81c853f781fb13e9c7118065aedaf2f7f/src/main/altclasses.c#L262:
static SEXP compact_intseq_Sum(SEXP x, Rboolean narm)
{
#ifdef COMPACT_INTSEQ_MUTABLE
    /* If the vector has been expanded it may have been modified. */
    if (COMPACT_SEQ_EXPANDED(x) != R_NilValue)
        return NULL;
#endif
    double tmp;
    SEXP info = COMPACT_SEQ_INFO(x);
    R_xlen_t size = COMPACT_INTSEQ_INFO_LENGTH(info);
    R_xlen_t n1 = COMPACT_INTSEQ_INFO_FIRST(info);
    int inc = COMPACT_INTSEQ_INFO_INCR(info);
    tmp = (size / 2.0) * (n1 + n1 + inc * (size - 1));
    if(tmp > INT_MAX || tmp < R_INT_MIN)
        /**** check for overflow of exact integer range? */
        return ScalarReal(tmp);
    else
        return ScalarInteger((int) tmp);
}
Namely, the reduction of an integer sequence without gaps is trivial. It's when there are gaps or NAs that things become a bit more complicated.
In action:
vec <- 1:1e10
sum(vec)
# [1] 5e+19
sum(vec[-10])
# Error: cannot allocate vector of size 37.3 Gb
### win11, R-4.2.2
Where ideally we would see that sum(vec) == (sum(vec[-10]) + 10), but we cannot since we can't use the optimization of sequence-summing.
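The closed form that compact_intseq_Sum relies on is the arithmetic-series sum, which needs only the first element, the length, and the increment. A minimal sketch of the same idea in Python follows; compact_seq_sum is a hypothetical name, and it uses Python's exact integers where the C code uses a double.

```python
def compact_seq_sum(first, length, incr=1):
    """Sum of first, first + incr, ... (length terms) without iterating."""
    last = first + incr * (length - 1)
    # (first + last) * length is always even, so integer division is exact.
    return (first + last) * length // 2


total = compact_seq_sum(1, 10**10)
print(total)       # 50000000005000000000
print(total - 10)  # the value sum(vec[-10]) would have, with no allocation
```

This only works while the sequence is still described by those three numbers. Dropping an element, as vec[-10] does, forces the full vector to be materialised before summing, which is where the allocation error above comes from.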

Memoization code for "Longest Common Substring" doesn't work as expected

I was able to think of a recursive solution for the problem "Longest Common Substring", but when I try to memoize it, it does not work as I expected and returns a wrong answer.
Here is the recursive code.
int lcs(string X, string Y,int i, int j, int count)
{
    if (i == 0 || j == 0)
        return count;
    if (X[i - 1] == Y[j - 1])
        count = lcs(X,Y,i - 1, j - 1, count + 1);
    count = max(count,max(lcs(X,Y,i, j-1, 0),lcs(X,Y,i - 1, j, 0)));
    return count;
}
int longestCommonSubstr(string S1, string S2, int n, int m)
{
    return lcs(S1,S2,n,m,0);
}
And here is the memoized code.
int lcs(string X, string Y,int i, int j, int count,vector<vector<vector<int>>>& dp)
{
    if (i == 0 || j == 0)
        return count;
    if(dp[i - 1][j - 1][count] != -1)
        return dp[i - 1][j - 1][count];
    if (X[i - 1] == Y[j - 1])
        count = lcs(X, Y, i - 1, j - 1, count + 1, dp);
    count = max(count,max(lcs(X,Y,i, j-1, 0,dp),lcs(X,Y,i - 1, j, 0,dp)));
    return dp[i-1][j-1][count]=count;
}
int longestCommonSubstr(string S1, string S2, int n, int m)
{
    int maxSize=max(n,m);
    vector<vector<vector<int>>> dp(n,vector<vector<int>>(m,vector<int>(maxSize,-1)));
    return lcs(S1,S2,n,m,0,dp);
}
I know the problem can also be solved with a 2D DP vector, but my objective was to convert my original recursive solution into a memoized one rather than write a solution from scratch. Since three parameters change between calls, it should use a 3D DP table.
Can anyone figure out what is wrong, or help me with a 3D DP solution whose recursive code is the same as or similar to mine?
Note:
An interesting observation: the arguments of the max call are evaluated left to right on my Mac and on Ubuntu running under Parallels, but right to left on a Windows machine and in online compilers. I do not know the reason and would be happy to learn it. I am running the code on an M1 Mac; I do not know whether the ARM compiler differs from the x86 Mac compiler.
Another thing: the memoized code gives different answers depending on which recursive call comes first on the line
count = max(count,max(lcs(X,Y,i, j-1, 0),lcs(X,Y,i - 1, j, 0)));
If I swap the two function calls, it gives the correct output, but only for that specific test case and probably similar ones.
The memoized solution also gives TLE on large test cases, and I do not know why.
I recently started studying DP, and this is the only question I was not able to solve by just modifying the original recursive solution. It has been two days and I still cannot figure out the reasons.
Submission Link:- https://practice.geeksforgeeks.org/problems/longest-common-substring1452/1/#
Any help in this regard would be great.
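Two observations about the code above may help. First, the memo entry is checked at dp[i - 1][j - 1][count] using the count the call received, but it is written after count has been reassigned, so the slot written can differ from the slot checked. Second, C++ leaves the order in which function arguments are evaluated unspecified, which is consistent with the left-to-right versus right-to-left behaviour noted above. Once stale entries exist, the order of the two recursive calls can change which ones get read.

What follows is a minimal sketch of the same three-parameter recursion with the cache keyed on the arguments as they were on entry. It is written in Python rather than C++ and uses functools.lru_cache in place of the 3D vector; the function name longest_common_substr is hypothetical. It illustrates the keying only and is not tuned for the judge's limits, since the number of (i, j, count) states can still be large.

```python
from functools import lru_cache


def longest_common_substr(x, y):
    """Length of the longest common substring of x and y."""

    @lru_cache(maxsize=None)
    def lcs(i, j, count):
        # The cache key is (i, j, count) exactly as passed in; it never changes.
        if i == 0 or j == 0:
            return count
        best = count
        if x[i - 1] == y[j - 1]:
            # Extend the current run of matching characters.
            best = lcs(i - 1, j - 1, count + 1)
        # Restart the run from either shorter prefix.
        return max(best, lcs(i, j - 1, 0), lcs(i - 1, j, 0))

    return lcs(len(x), len(y), 0)


print(longest_common_substr("ABCDGH", "ACDGHR"))
```

In the C++ version, the equivalent change would be to save the incoming count in a separate variable and use that variable for both the lookup and the store. Passing the strings by const reference instead of by value would also avoid copying them on every call.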

How to find a pair of numbers in a list given a specific range?

The problem is as follows:
Given an array of N numbers, find two numbers in the array whose range (max - min) equals K.
For example:
input:
5 3
25 9 1 6 8
output:
9 6
So far, I have tried sorting the array and then searching for two matching numbers with a nested loop. However, since this is a brute-force method, I don't think it is as efficient as other possible approaches.
import java.util.*;
public class Main {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int n = sc.nextInt(), k = sc.nextInt();
        int[] arr = new int[n];
        for(int i = 0; i < n; i++) {
            arr[i] = sc.nextInt();
        }
        Arrays.sort(arr);
        // Initialised so the code compiles; both stay 0 if no pair is found.
        int a = 0, b = 0;
        for(int i = 0; i < n; i++) {
            for(int j = i; j < n; j++) {
                if(Math.max(arr[i], arr[j]) - Math.min(arr[i], arr[j]) == k) {
                    a = arr[i];
                    b = arr[j];
                }
            }
        }
        System.out.println(a + " " + b);
    }
}
Much appreciated if the solution was in code (any language).
Here is code in Python 3 that solves your problem. This should be easy to understand, even if you do not know Python.
This routine uses your idea of sorting the array, but I use two variables left and right (which define two places in the array) where each makes just one pass through the array. So other than the sort, the time efficiency of my code is O(N). The sort makes the entire routine O(N log N). This is better than your code, which is O(N^2).
I never use the inputted value of N, since Python can easily handle the actual size of the array. I add a sentinel value to the end of the array to make the inner short loops simpler and quicker. This involves another pass through the array to calculate the sentinel value, but this adds little to the running time. It is possible to reduce the number of array accesses, at the cost of a few more lines of code--I'll leave that to you. I added input prompts to aid my testing--you can remove those to make my results closer to what you seem to want. My code prints the larger of the two numbers first, then the smaller, which matches your sample output. But you may have wanted the order of the two numbers to match the order in the original, un-sorted array--if that is the case, I'll let you handle that as well (I see multiple ways to do that).
# Get input
N, K = [int(s) for s in input('Input N and K: ').split()]
arr = [int(s) for s in input('Input the array: ').split()]
arr.sort()
sentinel = max(arr) + K + 2
arr.append(sentinel)
left = right = 0
while arr[right] < sentinel:
    # Move the right index until the difference is too large
    while arr[right] - arr[left] < K:
        right += 1
    # Move the left index until the difference is too small
    while arr[right] - arr[left] > K:
        left += 1
    # Check if we are done
    if arr[right] - arr[left] == K:
        print(arr[right], arr[left])
        break
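For comparison, a different technique avoids the sort altogether: store the values in a hash-based counter and, for each value v, check whether v + K is also present. This is a swapped-in alternative to the two-pointer routine above, not a restatement of it. The name find_pair_with_difference is hypothetical, and the sketch assumes K is non-negative. Expected running time is O(N) under the usual hashing assumptions.

```python
from collections import Counter


def find_pair_with_difference(values, k):
    """Return (larger, smaller) with larger - smaller == k, or None."""
    counts = Counter(values)
    for v in values:
        if k == 0:
            # A difference of zero needs the same value at two positions.
            if counts[v] >= 2:
                return (v, v)
        elif v + k in counts:
            return (v + k, v)
    return None


print(find_pair_with_difference([25, 9, 1, 6, 8], 3))  # (9, 6)
```

The pair is reported larger-first, matching the sample output in the question.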

do not understand result of opencl select statement

I have a simple kernel in OpenCL that has the following structure:
kernel void simple_select(global double *input, global double *output) {
    size_t i = get_global_id(0);
    printf("input %d\n", (int)(input[i] != 0.0));
    output[i] = select((float)0.0, (float)1.0, (int)(input[i] != 0.0));
    //output[i] = select((float)0.0, (float)1.0, 1);
}
Equivalently this can be:
kernel void simple_select(global double *input, global double *output) {
    size_t i = get_global_id(0);
    printf("input %d\n", (int)(input[i] != 0.0));
    output[i] = input[i] != 0.0 ? 1.0 : 0.0;
    //output[i] = 1 ? 1.0 : 0.0;
}
When I print to the command line, I see:
input 1
input 1
input 1
But the output array has all 0.0. However, if I uncomment the last line of the kernel and comment out the second-to-last-line (meaning if I use the scalar 1 in the select statement) then it works as expected and the output array has all 1.0. So what is the difference between these two lines that leads to two different results?
Here is the answer.
It's a quirk in OpenCL. The problem is that true/false values for scalars are 1/0 (as printf has shown you), but true/false values for vectors are -1/0, and this is also what select() expects in its last argument (more precisely, it expects the MSB to be set, which is the case for any negative integer).
That said, I think the ternary operator on scalars should still work as expected; if it doesn't, I would consider it a bug.
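The MSB rule described above can be modelled in a few lines. The sketch below is plain Python, not OpenCL, and the helper names msb_set_32 and select_by_msb are hypothetical. It only illustrates how a condition of 1 and a condition of -1 differ when the decision is made on the most significant bit of a 32-bit integer.

```python
def msb_set_32(c):
    """True if bit 31 of c, viewed as a 32-bit two's-complement int, is set."""
    return (c & 0x80000000) != 0


def select_by_msb(a, b, c):
    """Pick b when the MSB of c is set, otherwise a (per-component rule)."""
    return b if msb_set_32(c) else a


print(select_by_msb(0.0, 1.0, 1))   # 0.0: the value 1 does not have its MSB set
print(select_by_msb(0.0, 1.0, -1))  # 1.0: -1 has every bit set, including the MSB
```

Under this rule, a condition that is only ever 0 or 1 would need to be negated (or compared in vector form) before it selects the second operand.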

swapping elements of a 2 dimensional vector c++

I have a matrix of the form
vector<vector<int>> K
which has size NxN. How can I swap two elements of this vector, say K[i][j] with K[n-j][i]?
In general, how can I swap two elements of a 2D vector?
Because vector's operator[] returns a reference, std::swap() will work. For example:
std::swap(K[i][j], K[n-j][i]);
A general swap could look something like this:
void swap(int& a, int& b)
{
    int temp = a;
    a = b;
    b = temp;
}
Then you call it with
swap(K[i][j], K[n - j][i]);
Or you can just call std::swap, as @Jeffrey suggests.
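One thing to check before using either version is the index arithmetic. With 0-based indexing into an NxN matrix, K[n - j][i] is out of range when j is 0 unless n is N - 1. The question does not define n, so the sketch below assumes n = N - 1. It is written in Python for brevity, and swap_elements is a hypothetical helper name.

```python
def swap_elements(matrix, pos_a, pos_b):
    """Swap the entries at (row, col) positions pos_a and pos_b in place."""
    (r1, c1), (r2, c2) = pos_a, pos_b
    matrix[r1][c1], matrix[r2][c2] = matrix[r2][c2], matrix[r1][c1]


K = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
n = len(K) - 1  # assumption: n is the largest valid index, N - 1
i, j = 0, 1
swap_elements(K, (i, j), (n - j, i))
print(K)  # [[1, 4, 3], [2, 5, 6], [7, 8, 9]]
```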
