Error "cannot allocate memory block of size 67108864 Tb" in the R function false.nearest - r

The R function
tseriesChaos::false.nearest(series, m, d, t, rt=10, eps=sd(series)/10)
implements the false nearest neighbours algorithm to help decide the optimal embedding dimension.
I would like to apply it to the following series:
dput(x)
c(0.230960354326456, 0.229123906233121, 0.222750351085665, 0.230096143459004,
0.226315220913903, 0.228151669007238, 0.225775089121746, 0.229447985308415,
0.230096143459004, 0.232256670627633, 0.23722588311548, 0.236361672248029,
0.231716538835476, 0.229231932591552, 0.229880090742141, 0.229447985308415,
0.236901804040186, 0.234525224154694, 0.236577724964891, 0.240574700226855,
0.238090093982932, 0.233552986928811, 0.235929566814303, 0.228799827157827,
0.224694825537431, 0.225775089121746, 0.224694825537431, 0.221129955709193,
0.214540347844874, 0.213352057902128, 0.21054337258291, 0.208706924489575,
0.211083504375068, 0.212487847034676, 0.20903100356487, 0.206654423679378,
0.213027978826834, 0.211083504375068, 0.216160743221346, 0.213244031543697,
0.214324295128011, 0.216160743221346, 0.215512585070757, 0.218753375823701,
0.215836664146052, 0.225126930971157, 0.228367721724101, 0.23128443340175,
0.240574700226855, 0.244139570055093, 0.246732202657448, 0.248028518958626,
0.246300097223723, 0.245976018148428, 0.241762990169601, 0.245976018148428,
0.248892729826078, 0.258831154801772, 0.265744841741385, 0.259803392027655,
0.258831154801772, 0.261855892837852, 0.262504050988441, 0.262071945554715,
0.257102733066868, 0.270065896078643, 0.276655503942962, 0.280544452846495,
0.280004321054337, 0.276547477584531, 0.286485902560225, 0.278924057470023,
0.279140110186886, 0.272658528680998, 0.262828130063736, 0.26466457815707,
0.254726153181376, 0.264448525440207, 0.261207734687264, 0.269741817003349,
0.259587339310792, 0.256886680350005, 0.26163984012099, 0.252133520579021,
0.257858917575888, 0.255158258615102, 0.252457599654316, 0.251701415145295,
0.251161283353138, 0.251053256994707, 0.251917467862158, 0.24316733282921,
0.242195095603327, 0.249540887976666, 0.259263260235497, 0.259263260235497,
0.258399049368046, 0.252565626012747, 0.263800367289619, 0.262071945554715,
0.259695365669223, 0.256886680350005, 0.253213784163336, 0.260127471102949,
0.268769579777466, 0.271578265096684, 0.270173922437075, 0.267905368910014,
0.262071945554715, 0.262936156422167, 0.261855892837852, 0.262720103705304,
0.259047207518635, 0.263044182780598, 0.257102733066868, 0.259155233877066,
0.259155233877066, 0.250297072485687, 0.24089877930215, 0.239494436642541,
0.241546937452738, 0.24014259479313, 0.244355622771956, 0.242195095603327,
0.242303121961759, 0.241438911094307, 0.236901804040186, 0.238954304850383,
0.236793777681754, 0.239386410284109, 0.241546937452738, 0.24608404450686,
0.244139570055093, 0.237333909473912, 0.238954304850383, 0.240250621151561,
0.235281408663714, 0.234093118720968, 0.237657988549206, 0.246948255374311,
0.249432861618235, 0.246516149940585, 0.247164308091174, 0.252997731446473,
0.258399049368046, 0.258399049368046, 0.256238522199417, 0.268661553419034,
0.275143134924922, 0.273630765906881, 0.270281948795506, 0.265204709949228,
0.262071945554715, 0.258074970292751, 0.261747866479421, 0.260883655611969,
0.264124446364913, 0.267257210759425, 0.271146159662958, 0.273954844982176,
0.266933131684131, 0.269201685211192, 0.278383925677865, 0.278491952036297,
0.271146159662958, 0.272982607756293, 0.27503510856649, 0.282921032731987,
0.285297612617479, 0.285189586259047, 0.280436426488063, 0.287026034352382,
0.288538403370422, 0.286593928918656, 0.287998271578265, 0.285081559900616,
0.28464945446689, 0.279032083828454, 0.280112347412769, 0.278816031111591,
0.281624716430809, 0.278491952036297, 0.2802203737712, 0.279896294695906,
0.28097655828022, 0.276763530301394, 0.272550502322567, 0.276979583018256,
0.292643404990818, 0.28907853516258, 0.291239062331209, 0.293615642216701,
0.286918007993951, 0.287998271578265, 0.288322350653559, 0.280868531921789,
0.274386950415901, 0.271146159662958, 0.278275899319434, 0.277411688451982,
0.279140110186886, 0.28907853516258, 0.258939181160203, 0.256670627633142,
0.25278167872961, 0.255698390407259, 0.261423787404127, 0.260559576536675,
0.263692340931187, 0.260667602895106, 0.255158258615102, 0.257858917575888,
0.250081019768824, 0.245219833639408, 0.24684022901588, 0.244895754564114,
0.242195095603327, 0.246300097223723, 0.253861942313925, 0.253429836880199,
0.264988657232365, 0.260235497461381, 0.258831154801772, 0.258831154801772,
0.253213784163336, 0.249864967051961, 0.250081019768824, 0.245219833639408,
0.249756940693529, 0.245651939073134, 0.24835259803392, 0.24835259803392,
0.245867991789997, 0.248244571675489, 0.247056281732743, 0.249756940693529,
0.248676677109215, 0.251593388786864, 0.254186021389219, 0.250837204277844,
0.251593388786864, 0.248676677109215, 0.249540887976666, 0.251593388786864,
0.242627201037053, 0.242519174678622, 0.240250621151561, 0.240034568434698,
0.243059306470779, 0.244031543696662)
Hence, I used the code:
false.nearest(x, m=50, d=r, t=220, eps=1, rt=3)
However, I obtained the error:
Error in false.nearest(x, m = 50, d = r, t = 220, eps = 1, rt = 3) :
cannot allocate memory block of size 67108864 Tb
I can't explain it: vector x has only 250 observations!

Looking at the false.nearest source code in the tseriesChaos package:
/*
False nearest neighbours algorithm.
in_series: input time series (scaled between 0 and 1)
in_length: time series length
in_m, in_d, in_t: embedding dimension, time delay, theiler window
in_eps: neighbourhood size
in_rt: escape factor
out: fraction of false nearests
out2: total number of nearests
*/
void falseNearest(double *in_series, int *in_length, int *in_m, int *in_d, int *in_t, double *in_eps, double *in_rt, double *out, int *out2) {
    double eps, *series;
    double dst;
    double *dsts;
    int *ids;
    int m,d, t, length, blength;
    int num, denum;
    int i,j,md;
    double rt;
    int id;
    boxSearch bs;
    /*
       BIND PARAMETERS
    */
    m = *in_m;
    d = *in_d;
    t = *in_t;
    rt = *in_rt;
    eps=*in_eps;
    series=in_series;
    length=*in_length;
    /**/
    /*
       INIT VARIABLES
    */
    blength = length - m*d - t;
With your parameters set:
length. <- 250
m <- 50
d <- 3
t <- 220
(blength = length. - m*d - t)
[1] -120
blength is passed to R_alloc and must be positive: a negative value gets converted to a huge unsigned size, which triggers the memory allocation error:
dsts = (double*) R_alloc(blength, sizeof(double));
In this case (with d = 3 and t = 220), the largest m that keeps blength positive is m = 9; m = 10 already gives blength = 0.
These constraints on the parameters are not documented in the package, nor does the package output an informative error message: the reason for the error is understood, but it is difficult to help further.
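As a quick guard (a minimal sketch; m = 5, d = 3, t = 20 below are just illustrative values that satisfy the constraint, not a tuning recommendation), check blength before calling:
library(tseriesChaos)
m <- 5; d <- 3; t <- 20
stopifnot(length(x) - m * d - t > 0)   # blength must stay positive
false.nearest(x, m = m, d = d, t = t, eps = sd(x) / 10, rt = 10)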

Related

How is R able to sum an integer sequence so fast?

Create a large contiguous sequence of integers:
x <- 1:1e20
How is R able to compute the sum so fast?
sum(x)
Doesn't it have to loop over 1e20 elements in the vector and sum each element?
Summing up the comments:
R introduced something called ALTREP, or ALternate REPresentation for R objects. Its intent is to do some things more efficiently. From https://www.r-project.org/dsc/2017/slides/dsc2017.pdf, some examples include:
allow vector data to be in a memory-mapped file or distributed
allow compact representation of arithmetic sequences;
allow adding meta-data to objects;
allow computations/allocations to be deferred;
support alternative representations of environments.
The second and fourth bullets seem appropriate here.
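As a rough illustration of those two bullets (a minimal sketch; timings depend on machine and R version), creating a huge sequence is essentially free because only the first value, the increment, and the length are stored:
system.time(s <- 1:1e10)   # effectively instant: the ten billion values are never materialized
length(s)                  # 1e+10, taken from the compact metadata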
We can see a hint of this in action by looking at what I'm inferring is at the core of the R sum primitive for altreps, at https://github.com/wch/r-source/blob/7c0449d81c853f781fb13e9c7118065aedaf2f7f/src/main/altclasses.c#L262:
static SEXP compact_intseq_Sum(SEXP x, Rboolean narm)
{
#ifdef COMPACT_INTSEQ_MUTABLE
/* If the vector has been expanded it may have been modified. */
if (COMPACT_SEQ_EXPANDED(x) != R_NilValue)
return NULL;
#endif
double tmp;
SEXP info = COMPACT_SEQ_INFO(x);
R_xlen_t size = COMPACT_INTSEQ_INFO_LENGTH(info);
R_xlen_t n1 = COMPACT_INTSEQ_INFO_FIRST(info);
int inc = COMPACT_INTSEQ_INFO_INCR(info);
tmp = (size / 2.0) * (n1 + n1 + inc * (size - 1));
if(tmp > INT_MAX || tmp < R_INT_MIN)
/**** check for overflow of exact integer range? */
return ScalarReal(tmp);
else
return ScalarInteger((int) tmp);
}
Namely, the reduction of an integer sequence without gaps is trivial. It's when there are gaps or NAs that things become a bit more complicated.
In action:
vec <- 1:1e10
sum(vec)
# [1] 5e+19
sum(vec[-10])
# Error: cannot allocate vector of size 37.3 Gb
### win11, R-4.2.2
Ideally we would see that sum(vec) == (sum(vec[-10]) + 10), but we cannot: vec[-10] is no longer a compact sequence, so the sequence-summing optimization does not apply and the full vector would have to be materialized.
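For reference, the closed form used by compact_intseq_Sum above is just the arithmetic-series sum; written out in R (seq_sum is a hypothetical helper mirroring tmp = (size / 2.0) * (n1 + n1 + inc * (size - 1)) from the C source):
seq_sum <- function(first, inc, size) (size / 2) * (2 * first + inc * (size - 1))
seq_sum(1, 1, 100)    # 5050, same as sum(as.numeric(1:100))
seq_sum(1, 1, 1e10)   # prints as 5e+19, matching sum(vec) above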

Odd behavior in a recursive f# function

I'm trying a naive recursive function in F#:
let rec fact n =
    if n > 0 then
        n * fact (n - 1)
    else
        1
For small arguments it works fine; however, if you pass a big enough number, it fails in an odd way:
> fact 41;;
val it : int = 0
> fact 25;;
val it : int = 2076180480
> fact 26;;
val it : int = -1853882368
I guess some sort of overflow is going on, but shouldn't I get an error???

Generating tuples containing Long for Vavr Property Checking

I need a pair of random longs for property checking with Vavr.
My implementation looks like this:
Gen<Long> longs = Gen.choose(Long.MIN_VALUE, Long.MAX_VALUE);
Arbitrary<Tuple2<Long, Long>> pairOfLongs = longs
    .flatMap(value -> random -> Tuple.of(value, longs.apply(random)))
    .arbitrary();
Is there any better/nicer way to do the same in Vavr?
Arbitrary<T> can be seen as a function of type
int -> Random -> T
Generating arbitrary integers
Because the sample size is of type int, it would be natural to do the following:
Arbitrary<Tuple2<Integer, Integer>> intPairs = size -> {
    Gen<Integer> ints = Gen.choose(-size, size);
    return random -> Tuple.of(ints.apply(random), ints.apply(random));
};
Let's test it:
Property.def("print int pairs")
    .forAll(intPairs.peek(System.out::println))
    .suchThat(pair -> true)
    .check(10, 5);
Output:
(-9, 2)
(-2, -10)
(5, -2)
(3, 8)
(-10, 10)
Generating arbitrary long values
Currently we are not able to define a size of type long, so the workaround is to ignore the size and use the full long range:
Arbitrary<Tuple2<Long, Long>> longPairs = ignored -> {
    Gen<Long> longs = Gen.choose(Long.MIN_VALUE, Long.MAX_VALUE);
    return random -> Tuple.of(longs.apply(random), longs.apply(random));
};
Let's test it again:
Property.def("print long pairs")
    .forAll(longPairs.peek(System.out::println))
    .suchThat(pair -> true)
    .check(0, 5);
Output:
(2766956995563010048, 1057025805628715008)
(-6881523912167376896, 7985876340547620864)
(7449864279215405056, 6862094372652388352)
(3203043896949684224, -2508953386204733440)
(1541228130048020480, 4106286124314660864)
Interpreting an integer size as long
The size parameter can be interpreted in a custom way. More specifically, we could map a given int size to a long size:
Arbitrary<Tuple2<Long, Long>> longPairs = size -> {
    long longSize = ((long) size) << 32;
    Gen<Long> longs = Gen.choose(-longSize, longSize);
    return random -> Tuple.of(longs.apply(random), longs.apply(random));
};
However, the last example does not cover the full long range. Maybe it is possible to find a better mapping.
Disclaimer: I'm the author of Vavr (formerly known as Javaslang)

pyopencl.LogicError: clEnqueueNDRangeKernel failed: invalid work item size

I am attempting to implement the dot_persist_kernel() shown here in Python using pyopencl, and I've been squashing numerous bugs along the way. But I've stumbled upon an issue that I can't crack:
self.program = cl.Program(self.ctx, code).build()
# code is a string with the code from the link given
a = cl_array.to_device(self.queue, np.random.rand(2**20).astype(np.float32))
b = cl_array.to_device(self.queue, np.random.rand(2**20).astype(np.float32))
c = 0.
mf = cl.mem_flags
c_buf = cl.Buffer(self.ctx, mf.WRITE_ONLY, 4)
MAX_COMPUTE_UNITS = cl.get_platforms()[0].get_devices()[0].max_compute_units
WORK_GROUPS_PER_CU = MAX_COMPUTE_UNITS * 4
ELEMENTS_PER_GROUP = a.size / WORK_GROUPS_PER_CU
ELEMENTS_PER_WORK_ITEM = ELEMENTS_PER_GROUP / 256
self.program.DotProduct(self.queue, a.shape, a.shape,
                        a.data, b.data, c_buf,
                        np.uint32(ELEMENTS_PER_GROUP),
                        np.uint32(ELEMENTS_PER_WORK_ITEM),
                        np.uint32(1028 * MAX_COMPUTE_UNITS))
Assuming an array of size 2^26, the constants will have values of:
MAX_COMPUTE_UNITS = 32 // from get_device()[0].max_compute_units
WORK_GROUPS_PER_CU = 128 // MAX_COMPUTE_UNITS * 4
ELEMENTS_PER_GROUP = 524288 // 2^19
ELEMENTS_PER_WORK_ITEM = 2048 // 2^11
The kernel header looks like:
#define LOCAL_GROUP_XDIM 256
// Kernel for part 1 of dot product, version 3.
__kernel __attribute__((reqd_work_group_size(LOCAL_GROUP_XDIM, 1, 1)))
void dot_persist_kernel(
    __global const double * x, // input vector
    __global const double * y, // input vector
    __global double * r, // result vector
    uint n_per_group, // elements processed per group
    uint n_per_work_item, // elements processed per work item
    uint n // input vector size
)
The error it gives is:
Traceback (most recent call last):
File "GPUCompute.py", line 102, in <module>
gpu = GPUCompute()
File "GPUCompute.py", line 87, in __init__
np.uint32(1028 * MAX_COMPUTE_UNITS))
File "C:\Miniconda2\lib\site-packages\pyopencl\__init__.py", line 512, in kernel_call
global_offset, wait_for, g_times_l=g_times_l)
pyopencl.LogicError: clEnqueueNDRangeKernel failed: invalid work item size
I've tried shifting the numbers around a lot, to no avail. Ideas?
There were a few issues going on with the previous implementation, but this one is working:
WORK_GROUPS = cl.get_platforms()[0].get_devices()[0].max_compute_units * 4
ELEMENTS_PER_GROUP = np_a.size / WORK_GROUPS
LOCAL_GROUP_XDIM = 256
ELEMENTS_PER_WORK_ITEM = ELEMENTS_PER_GROUP / LOCAL_GROUP_XDIM
self.program = cl.Program(self.ctx, kernel).build()
self.program.DotProduct(
    self.queue, np_a.shape, (LOCAL_GROUP_XDIM,), # kernel information
    cl_a, cl_b, cl_c,                            # data
    np.uint32(ELEMENTS_PER_GROUP),               # elements processed per group
    np.uint32(ELEMENTS_PER_WORK_ITEM),           # elements processed per work item
    np.uint32(np_a.size)                         # input vector size
)
It was the culmination of a few things, but the biggest factor was that the second and third arguments passed to DotProduct() are supposed to be tuples (the global and local work sizes), not ints as I thought. :)

How do you use matrices in Nimrod?

I found this project on GitHub; it was the only result returned for a search on "nimrod matrix". I took the bare bones of it and changed it a little so that it compiled without errors, then added the last two lines to build a simple matrix and output a value, but the "getter" function isn't working for some reason. I adapted the instructions for adding properties found here, but something isn't right.
Here is my code so far. I'd like to use the GNU Scientific Library from within Nimrod, and I figured that this was the first logical step.
type
  TMatrix*[T] = object
    transposed: bool
    dataRows: int
    dataCols: int
    data: seq[T]

proc index[T](x: TMatrix[T], r,c: int): int {.inline.} =
  if r<0 or r>(x.rows()-1):
    raise newException(EInvalidIndex, "matrix index out of range")
  if c<0 or c>(x.cols()-1):
    raise newException(EInvalidIndex, "matrix index out of range")
  result = if x.transposed: c*x.dataCols+r else: r*x.dataCols+c

proc rows*[T](x: TMatrix[T]): int {.inline.} =
  ## Returns the number of rows in the matrix `x`.
  result = if x.transposed: x.dataCols else: x.dataRows

proc cols*[T](x: TMatrix[T]): int {.inline.} =
  ## Returns the number of columns in the matrix `x`.
  result = if x.transposed: x.dataRows else: x.dataCols

proc matrix*[T](rows, cols: int, d: openarray[T]): TMatrix[T] =
  ## Constructor. Initializes the matrix by allocating memory
  ## for the data and setting the number of rows and columns
  ## and sets the data to the values specified in `d`.
  result.dataRows = rows
  result.dataCols = cols
  newSeq(result.data, rows*cols)
  if len(d)>0:
    if len(d)<(rows*cols):
      raise newException(EInvalidIndex, "insufficient data supplied in matrix constructor")
    for i in countup(0,rows*cols-1):
      result.data[i] = d[i]

proc `[][]`*[T](x: TMatrix[T], r,c: int): T =
  ## Element access. Returns the element at row `r` column `c`.
  result = x.data[x.index(r,c)]

proc `[][]=`*[T](x: var TMatrix[T], r,c: int, a: T) =
  ## Sets the value of the element at row `r` column `c` to
  ## the value supplied in `a`.
  x.data[x.index(r,c)] = a

var m = matrix( 2, 2, [1,2,3,4] )
echo( $m[0][0] )
This is the error I get:
c:\program files (x86)\nimrod\config\nimrod.cfg(36, 11) Hint: added path: 'C:\Users\H127\.babel\libs\' [Path]
Hint: used config file 'C:\Program Files (x86)\Nimrod\config\nimrod.cfg' [Conf]
Hint: system [Processing]
Hint: mat [Processing]
mat.nim(48, 9) Error: type mismatch: got (TMatrix[int], int literal(0))
but expected one of:
system.[](a: array[Idx, T], x: TSlice[Idx]): seq[T]
system.[](a: array[Idx, T], x: TSlice[int]): seq[T]
system.[](s: string, x: TSlice[int]): string
system.[](s: seq[T], x: TSlice[int]): seq[T]
Thank you guys!
I'd like to first point out that the matrix library you refer to is three years old. For a programming language still in development that's a long time, and the library no longer compiles with the current Nimrod git version:
$ nimrod c matrix
...
private/tmp/n/matrix/matrix.nim(97, 8) Error: ']' expected
It fails on the double array accessor, whose syntax seems to have changed. I suspect your attempt to create a double [][] accessor is problematic; it could be ambiguous: are you accessing the double array accessor of the object, or are you accessing the nested array returned by the first brackets? I had to change the proc to the following:
proc `[]`*[T](x: TMatrix[T], r,c: int): T =
After that change you also need to change the way to access the matrix. Here's what I got:
for x in 0 .. <2:
  for y in 0 .. <2:
    echo "x: ", x, " y: ", y, " = ", m[x,y]
Basically, instead of specifying two bracket accesses you pass all the parameters inside a single pair of brackets. That code generates:
x: 0 y: 0 = 1
x: 0 y: 1 = 2
x: 1 y: 0 = 3
x: 1 y: 1 = 4
With regards to finding software for Nimrod, I recommend using Nimble, Nimrod's package manager. Once you have it installed you can search for available and maintained packages. The command nimble search math shows two potential packages: linagl and extmath. I'm not sure if they are what you are looking for, but at least they seem more current.

Resources