D3 and crossfilter throw error

I need to make a dashboard with D3, DC, and crossfilter.
The data coming from the web service is almost 1 million records.
However, crossfilter throws the error "Uncaught RangeError: Maximum call stack size exceeded".
What could cause crossfilter to crash in this way?

The most likely cause is that you have data which is not naturally ordered, i.e. you have NaN values in your data.
https://github.com/crossfilter/crossfilter/wiki/Crossfilter-Gotchas#natural-ordering-of-dimension-and-group-values
The problem is that NaN < x is false for all numbers x, and NaN > x is also always false. So if a sorting routine does not explicitly check for invalid input, it can run forever; and if that sorting routine is recursive, it can crash with a stack overflow.
For reasons of efficiency, crossfilter does not check its input. Maybe validation could be added without hurting efficiency?
https://github.com/crossfilter/crossfilter/issues/69

Related

fisher.test crashes R with *** caught segfault *** error

As the title says, fisher.test crashes R with a *** caught segfault *** error. Here is the code to reproduce it:
d <- matrix(c(1, 0, 5, 2, 1, 90,
              0, 0, 0, 1, 0, 14,
              0, 0, 0, 0, 0,  5,
              0, 0, 0, 0, 0,  2,
              0, 0, 0, 0, 0,  2,
              2, 1, 0, 2, 3, 89),
            nrow = 6, byrow = TRUE)
fisher.test(d, simulate.p.value = FALSE)
I ran into this because I use fisher.test inside some functions; running them on the data caused R to crash with the aforementioned error.
I understand that the table provided to fisher.test is ill behaved, but that kind of thing should not crash R, I think.
I would appreciate any suggestions on which conditions the contingency table should meet in order to avoid this kind of crash, and on what other arguments should be set in fisher.test to avoid it. In a small test,
fisher.test(d, simulate.p.value = TRUE)
does not crash and produces a result.
I am asking because I will need to guard against future crashes in my pipeline.
I can confirm that this is a bug in R 4.2 and that it is now fixed in the development branch of R (with this commit on 7 May). I wouldn't be surprised if it were ported to a patch-release sometime soon, but that's unknown/up to the R developers. Running your example above doesn't segfault any more, but it does throw an error:
Error in fisher.test(d, simulate.p.value = FALSE) :
FEXACT[f3xact()] error: hash key 5e+09 > INT_MAX, kyy=203, it[i (= nco = 6)]= 0.
Rather set 'simulate.p.value=TRUE'
So this makes your workflow better (you can handle these errors with try()/tryCatch()), but it doesn't necessarily satisfy you if you really want to perform an exact Fisher test on these data. (Exact tests on large tables with large entries are extremely computationally difficult, as they essentially have to do computations over the set of all possible tables with given marginal values.)
I don't have any brilliant ideas for detecting the exact conditions that will cause this problem (maybe you can come up with a rough rubric based on the dimensions of the table and the sum of the counts in the table, e.g. if (prod(dim(d)) > 30 && sum(d) > 200) ... ?)
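Combining the try()/tryCatch() suggestion with that rough rubric, a minimal sketch of a guard might look as follows; the safe_fisher name is mine, and the cutoffs are just the rough numbers floated above, not validated thresholds:
## Hypothetical guard around fisher.test(); safe_fisher is an illustrative name.
safe_fisher <- function(d, B = 10000) {
  ## Rough rubric from above: skip the exact test for big, heavy tables.
  if (prod(dim(d)) > 30 && sum(d) > 200) {
    return(fisher.test(d, simulate.p.value = TRUE, B = B))
  }
  ## Otherwise try the exact test and fall back to simulation if it errors out.
  tryCatch(
    fisher.test(d, simulate.p.value = FALSE),
    error = function(e) fisher.test(d, simulate.p.value = TRUE, B = B)
  )
}
## Usage: safe_fisher(d)$p.value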
Setting simulate.p.value=TRUE is the most sensible approach. However, if you expect precise results for extreme tables (e.g. you are working in bioinformatics and are going to apply a huge multiple-comparisons correction to the results), you're going to be disappointed. For example:
dd <- matrix(0, 6, 6)
dd[5,5] <- dd[6,6] <- 100
fisher.test(dd)$p.value
## 2.208761e-59, reported as "< 2.2e-16"
fisher.test(dd, simulate.p.value = TRUE, B = 10000)$p.value
# 9.999e-05
fisher.test(..., simulate.p.value = TRUE) will never return a value smaller than 1/(B+1) (this is what happens if none of the simulated tables are more extreme than the observed table: technically, the p-value ought to be reported as "<= 9.999e-05"). Therefore, you will never (in the lifetime of the universe) be able to calculate a p-value like 1e-59, you'll just be able to set a bound based on how large you're willing to make B.
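To make that floor concrete, for the B used above the smallest reportable simulated p-value is exactly 1/(B+1):
B <- 10000
1 / (B + 1)
## [1] 9.999e-05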

Using large hash tables in R

I'm trying to use the hash package, which I understand is the most commonly adopted hash-table implementation in R (other than directly using environments).
If I try to create and store hashes larger than ~20MB, I start getting protect(): protection stack overflow errors.
pryr::object_size(hash::hash(1:120000, 1:120000)) # * (see end of post)
#> 21.5 MB
h <- hash::hash(1:120000, 1:120000)
#> Error: protect(): protection stack overflow
If I run the h <- ... command once, the error only appears once. If I run it twice, I get an infinite loop of errors in the console, freezing RStudio and forcing me to restart it from the Task Manager.
From multiple other SO questions, I understand this means I'm creating more pointers than R can protect. This makes sense to me, since hashes are actually just environments (which themselves are just hash tables), so I assume R needs to keep track of each value in the hash table as a separate pointer.
The common solution I've seen for the protect() error is to launch with rstudio.exe --max-ppsize=500000 (which I assume propagates that option to R itself), but it doesn't help in this case; the error remains. This is somewhat surprising, since the hash in the example above is only 120,000 keys/pointers long, much smaller than the given ppsize of 500,000.
So, how can I use large hashes in R? I'm assuming changing to pure environments won't help, since hash is really just a wrapper around environments.
(* For the record, the hash::hash() call above creates hashes with non-syntactic names, but that's irrelevant: my real case has simple character keys and integer values and shows the same behavior.)
This is a bug in RStudio, not a limitation in R. The bug happens when RStudio tries to examine the h object for display in the environment pane. The bug is on their issue list as https://github.com/rstudio/rstudio/issues/5546.
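If the trigger really is the Environment pane trying to display h, one workaround, and this is my own assumption rather than something stated in the answer, is to keep the large hash out of the global environment, for example behind a closure:
## Sketch of a possible workaround (untested assumption): keep the big hash in a
## function's local environment so the Environment pane never tries to render it.
make_lookup <- function(n = 120000) {
  h <- hash::hash(as.character(1:n), 1:n)  # character keys, as in the real use case
  function(key) h[[key]]                   # the hash lives only in this closure
}
lookup <- make_lookup()
lookup("42")
## expected: 42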

Irregular error warning: RuntimeWarning: invalid value encountered in double_scalars

My code contains some random steps and an exponential (monotonic) expression whose root it needs to find at the end. The warning "RuntimeWarning: invalid value encountered in double_scalars" appears occasionally, for example 2 or 3 times out of 5 runs. Could you tell me what's going on here? PS: I get a result each time; it's just the warning that confuses me.
There are two possible ways to solve it, depending on your data.
1.
You may be handling huge numbers that exceed the limit of a double. The fix is essentially mathematical, and it applies as long as the true value of (T_data[runs][0])*(np.exp(-(x)*(T_data[runs][1]))) is always smaller than 1.7976931348623157e+308.
Since a*e^(-x*b) = e^(ln(a) - x*b),
(T_data[runs][0])*(np.exp(-(x)*(T_data[runs][1]))) = np.exp(np.log(T_data[runs][0]) - (x)*(T_data[runs][1]))
Use np.exp(np.log(T_data[runs][0]) - (x)*(T_data[runs][1])) instead.
2.
However, since you said you get a result every time, it is more likely that (T_data[runs][0])*(np.exp(-(x)*(T_data[runs][1]))) is approaching zero: the value becomes too small for a double to hold, and it does no harm to store it as 0.
You can change your code like this to avoid the warning:
t = (x) * (T_data[runs][1])
temp = 0 if t > 709 else np.exp(-t)
exponential += (T_data[runs][0]) * temp
## As ln(1.7976931348623157e+308) ~= 709.78

Product of range in Prolog

I need to write a program which calculates a product over a range.
I have written the following code:
mult(N,N,R,R).
mult(N,Nt,R,Rt):-N1=Nt+1,R1=Rt*(1/(log(Nt))),mult(N,N1,R,R1).
This should implement the basic product of 1/ln(j) for j from Nt to N. As far as I understand, it has to stop when Nt and N are equal. However, I can't get it working; I get the following error:
?- mult(10,2,R,1), write(R).
ERROR: Out of global stack
Is there any other way to implement the loop without using SWI-Prolog's default libraries?
Your program never terminates! To see this consider the following failure-slice of your program:
mult(N,N,R,R) :- false.
mult(N,Nt,R,Rt) :-
   N1 = Nt+1,
   R1 = Rt*(1/(log(Nt))),
   mult(N,N1,R,R1), false.
This new program never terminates, and thus the original program does not terminate either. To see why, consider the two (=)/2 goals. In the first, the new variable N1 is unified with something; this will always succeed. Similarly, the second goal will always succeed. There is never a possibility of failure prior to the recursive goal, and thus this program never terminates.
You need to add some goal, or replace existing goals, in the visible part. Maybe add
N > Nt.
Further, it might be a good idea to replace the two (=)/2 goals by (is)/2. But this is not required for termination, strictly speaking.
Out of global stack means you entered a too-long chain of recursion, possibly an infinite one.
The problem stems from using = instead of is in your assignments.
mult(N,N,R,R).
mult(N,Nt,R,Rt):-N1 is Nt+1, R1 is Rt*(1/(log(Nt))), mult(N,N1,R,R1).
You might want to insert a cut in your first clause to avoid going on after getting an answer.
If you have a graphical debugger (like the one in SWI), try setting 'trace' and 'debug' on and running. You'll soon realize that performing N1 = Nt+1 with Nt bound to 2 yields the term 2+1. Since 2+1+1+1+(...) will never unify with 10, that's the problem right there.

Ignoring errors in R

I'm running a complex but relatively quick simulation in R (takes about 5-10 minutes per simulation) and I'm beginning to run it in parallel with various input values in order to test the robustness of some of my algorithms.
There seems to be one problem: some arrangements of inputs cause a fatal error within the simulation and the whole code comes crashing down, causing the simulations to end. Is there an easy way to catch the error (which may come from a variety of locations) and have it just ignore those input values and move on to the next?
It's frustrating when I set an array of inputs to check that should take 5-6 hours to run through all the simulations and I come back to find that it crashed in the first 45 minutes.
While I work on trying to fix the bug / identify inputs that push me to that error, any ideas on how to ignore / catch the errors as they come?
Thanks
I don't know how you organized your simulations, but I guess you have a loop where you use new arguments at each step.
You can use tryCatch. Here I am throwing an error if I have bad input:
step.simul <- function (x) {
  stopifnot(x %% 2 == 1, x > 0)
  (x - 1)/2
}
Then, using tryCatch, I flag the bad inputs with a code that tells you which input was bad:
sapply(1:5, function(i)tryCatch(step.simul(i), error=function(e)-1000-i))
[1] 0 -1002 1 -1004 2
As you see, the simulation runs over all the loop indices; the bad inputs are flagged instead of stopping the run.
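Applied to the question's setting, a minimal sketch along the same lines (run_simulation and the parameter grid here are placeholders, not the asker's actual code) would wrap each whole simulation so that a failing input set yields NA instead of killing the batch:
## Placeholder simulation: stands in for the real 5-10 minute run.
run_simulation <- function(params) {
  if (params$x > 3) stop("unstable input")  # stand-in for the real failure mode
  params$x^2
}

param_grid <- lapply(1:5, function(i) list(x = i))

results <- sapply(param_grid, function(p) {
  tryCatch(run_simulation(p),
           error = function(e) NA)          # skip bad inputs and keep going
})

results
## [1]  1  4  9 NA NA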
