I am trying to generate some random numbers from a hypergeometric distribution using R. However, rhyper() behaves very strangely when I have a very small number of white balls and a large number of black balls. Here is what I got on my computer:
> sum(rhyper(100,1000,1e9-1000,1e6))
[1] 91
> sum(rhyper(100,2000,1e9-2000,1e6))
[1] 204
> sum(rhyper(100,10000,1e9-10000,1e6))
[1] 1016
> sum(rhyper(100,20000,1e9-20000,1e6))
[1] 1909
> sum(rhyper(100,50000,1e9-50000,1e6))
[1] 4968
> sum(rhyper(100,5000,1e9-5000,1e6))
[1] 60
> sum(rhyper(100,6000,1e9-6000,1e6))
[1] 164
> sum(rhyper(100,8000,1e9-8000,1e6))
[1] 0
> sum(rhyper(100,9000,1e9-9000,1e6))
[1] 45
The first five work fine, but for the sixth I expected a number around 500, not something like 60. The seventh, eighth, and ninth are similarly far from what I expected.
Is something wrong with the rhyper() function or with my computer?
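As a sanity check on the figures above, the expected value of each sum follows from the hypergeometric mean k * m / (m + n). The sketch below does that arithmetic for every urn in the question. It also draws from a binomial approximation for comparison; that cross-check is an added assumption (reasonable here because 1e6 draws is a small fraction of the 1e9 balls), not something rhyper() is known to do internally.

```r
# Expected value of sum(rhyper(nn, m, n, k)) is nn * k * m / (m + n).
nn    <- 100      # number of random values per call
k     <- 1e6      # balls drawn each time
total <- 1e9      # white + black balls in the urn
white <- c(1000, 2000, 5000, 6000, 8000, 9000, 10000, 20000, 50000)

expected_sum <- nn * k * white / total
names(expected_sum) <- white
print(expected_sum)

# Cross-check (assumption: binomial approximation to the hypergeometric)
set.seed(1)
approx_sum <- sapply(white, function(m) sum(rbinom(nn, size = k, prob = m / total)))
print(approx_sum)
```

For 5000 white balls this gives an expected sum of 500, which is the figure the sixth call should be close to.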
I have a web response being returned in raw format which I'm unable to properly encode. It contains the following values:
ef bc 86
The character is meant to be a Fullwidth Ampersand (to illustrate below):
> as.character("\uFF06")
[1] "＆"
> charToRaw("\uFF02")
[1] ef bc 82
However, no matter what I've tried, it gets converted to ï¼‚. To illustrate:
> rawToChar(charToRaw("\uFF02"))
[1] "ï¼‚"
Because of the equivalence of the raw values, I don't think there's anything I can do in my web call to influence the problem I'm having (happy to be corrected). I believe I need to work out how to properly do the character encoding.
I also took an extreme approach of trying all other encodings as follows but none converted to the fullwidth ampersand:
> x_raw <- charToRaw("\uFF02")
> x_raw
[1] ef bc 82
> sapply(
+ stringi::stri_enc_list()
+ ,function(encoding) stringi::stri_encode(str = x_raw, encoding)
+ ) |> # R's new native pipe
+ tibble::enframe(name = "encoding")
# A tibble: 1,203 x 2
encoding value
<chr> <chr>
1 037 "Õ¯b"
2 273 "Õ¯b"
3 277 "Õ¯b"
4 278 "Õ¯b"
5 280 "Õ¯b"
6 284 "Õ¯b"
7 285 "Õ~b"
8 297 "Õ¯b"
9 420 "\u001a\u001ab"
10 424 "\u001a\u001ab"
# ... with 1,193 more rows
My workaround at the moment is to replace the strings after the encoding, but this character is just one example of many, and hard-coding every instance doesn't seem practical.
> rawToChar(x_raw)
[1] "ï¼‚"
> stringr::str_replace_all(rawToChar(x_raw), c("ï¼‚" = "\uFF06"))
[1] "＆"
The substitution workaround is further complicated by characters like the HYPHEN (not HYPHEN-MINUS), where the last two raw values end up in a string as what appear to be octal escapes:
> as.character("\u2010") # HYPHEN
[1] "‐"
> as.character("\u2010") |> charToRaw() # As raw
[1] e2 80 90
> as.character("\u2010") |> charToRaw() |> rawToChar() # Converted back to string
[1] "â€\u0090"
> charToRaw("â\200\220") # string with equivalent raw
[1] e2 80 90
Any help appreciated.
I'm not totally clear on exactly what you are trying to do, but the problem with getting back your original character is that R cannot determine the encoding automatically from the raw bytes. I assume you are on Windows. If you do
val <- rawToChar(charToRaw("\uFF06"))
val
# [1] "ï¼†"
Encoding(val)
# [1] "unknown"
Encoding(val) <- "UTF-8"
val
# [1] "＆"
Just make sure to set the encoding properly.
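A minimal sketch of the same fix, starting from the raw bytes in the question (ef bc 86) rather than from a string literal. It assumes the web response really is UTF-8. utf8ToInt() is used only to confirm the code point without depending on how the console renders the character.

```r
# Bytes as they arrive from the web response (taken from the question)
x_raw <- as.raw(c(0xef, 0xbc, 0x86))

val <- rawToChar(x_raw)
Encoding(val) <- "UTF-8"      # declare the bytes as UTF-8; no bytes are changed

# Confirm the code point is U+FF06 (FULLWIDTH AMPERSAND)
code_point <- utf8ToInt(val)
print(sprintf("U+%04X", code_point))

# The same approach recovers the HYPHEN (U+2010) from e2 80 90
hyphen <- rawToChar(as.raw(c(0xe2, 0x80, 0x90)))
Encoding(hyphen) <- "UTF-8"
print(sprintf("U+%04X", utf8ToInt(hyphen)))
```

Because only the encoding mark changes, this works for any character in the response, so no per-character substitution table is needed.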
I have the first dataset called exprs:
> class(exprs)
[1] "matrix"
> dim(exprs)
[1] 191812 89
My second dataset is called pData:
> class(pData)
[1] "data.frame"
> dim(pData)
[1] 89 3
However, when I run:
> all(rownames(pData)==colnames(exprs))
[1] FALSE
It results in FALSE. I need the final output to be TRUE.
Is this because one object is a data.frame while the other is a matrix?
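The comparison itself does not depend on the classes: rownames() and colnames() both return character vectors. A FALSE result therefore means the names differ in value or in order. The sketch below uses small made-up objects (exprs_toy and pData_toy are hypothetical stand-ins for the real data) to show one way to locate the mismatch and reorder the rows.

```r
# Hypothetical stand-ins: same sample names, different order
exprs_toy <- matrix(1:6, nrow = 2,
                    dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
pData_toy <- data.frame(group = c("a", "b", "c"),
                        row.names = c("s3", "s1", "s2"))

all(rownames(pData_toy) == colnames(exprs_toy))      # FALSE: order differs

# Names present in one object but not the other (both empty here)
setdiff(rownames(pData_toy), colnames(exprs_toy))
setdiff(colnames(exprs_toy), rownames(pData_toy))

# Reorder the rows of pData_toy to follow the columns of exprs_toy
idx <- match(colnames(exprs_toy), rownames(pData_toy))
pData_sorted <- pData_toy[idx, , drop = FALSE]

all(rownames(pData_sorted) == colnames(exprs_toy))   # TRUE
```

If either setdiff() call returns names, the two objects do not describe the same samples and reordering alone will not help.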
This is the next question in my series of "strange" questions.
I found a difference in how code executes in the R console and in RStudio, and I can't understand the reason for it. It is also connected with the incorrect behaviour of the "track" package in RStudio and R.NET, as I wrote before in "Incorrect work of track package in R.NET".
So, let's look at the example from https://search.r-project.org/library/base/html/taskCallback.html
(I modified it slightly so that the sums are printed correctly in RStudio.)
times <- function(total = 3, str = "Task a") {
  ctr <- 0
  function(expr, value, ok, visible) {
    ctr <<- ctr + 1
    cat(str, ctr, "\n")
    if(ctr == total) {
      cat("handler removing itself\n")
    }
    return(ctr < total)
  }
}
# add the callback that will work for
# 4 top-level tasks and then remove itself.
n <- addTaskCallback(times(4))
# now remove it, assuming it is still first in the list.
removeTaskCallback(n)
## Not run:
# There is no point in running this
# as
addTaskCallback(times(4))
print(sum(1:10))
print(sum(1:10))
print(sum(1:10))
print(sum(1:10))
print(sum(1:10))
## End(Not run)
Output in the R console:
>
> # add the callback that will work for
> # 4 top-level tasks and then remove itself.
> n <- addTaskCallback(times(4))
Task a 1
>
> # now remove it, assuming it is still first in the list.
> removeTaskCallback(n)
[1] TRUE
>
> ## Not run:
> # There is no point in running this
> # as
> addTaskCallback(times(4))
1
1
Task a 1
>
> print(sum(1:10))
[1] 55
Task a 2
> print(sum(1:10))
[1] 55
Task a 3
> print(sum(1:10))
[1] 55
Task a 4
handler removing itself
> print(sum(1:10))
[1] 55
> print(sum(1:10))
[1] 55
>
> ## End(Not run)
>
Okay, let's run this in RStudio.
Output:
> source('~/callbackTst.R')
[1] 55
[1] 55
[1] 55
[1] 55
[1] 55
Task a 1
>
A second run gives this:
> source('~/callbackTst.R')
[1] 55
[1] 55
[1] 55
[1] 55
[1] 55
Task a 2
Task a 1
>
Third:
> source('~/callbackTst.R')
[1] 55
[1] 55
[1] 55
[1] 55
[1] 55
Task a 3
Task a 2
Task a 1
>
and so on.
There is a strange difference between RStudio and the R console, and I don't know why. Could anyone help me? Is it a bug, or is this normal and I am doing something wrong?
Thank you.
P.S. This post is connected with the correct working of the "track" package, because the "track.start" function contains this code:
assign(".trackingSummaryChanged", FALSE, envir = trackingEnv)
assign(".trackingPid", Sys.getpid(), envir = trackingEnv)
if (!is.element("track.auto.monitor", getTaskCallbackNames()))
addTaskCallback(track.auto.monitor, name = "track.auto.monitor")
return(invisible(NULL))
which, I think, doesn't work correctly in RStudio and R.NET.
P.P.S. I use R 3.2.2 x64, RStudio 0.99.489 and Windows 10 Pro x64. With RRO this problem also exists under R.NET and RStudio.
addTaskCallback() will add a callback that's executed when R execution returns to the top level. When you're executing code line-by-line, each statement executed will return control to the top level, and callbacks will execute.
When the code is executed within source(), control isn't returned to the top level until the call to source() returns, so each registered callback runs only once.
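This also accounts for the growing output on repeated runs: each call to source() registers one more callback, and every callback still registered fires once when source() returns. One way to avoid the pile-up, sketched below, is to register the callback under a name and skip registration when that name is already present, which is what the track.start code in the question does. The helper register_once and the name "demo.callback" are made up for illustration.

```r
# Register a task callback only if no callback with this name exists yet.
# 'register_once' is a hypothetical helper, not part of base R.
register_once <- function(f, name) {
  if (!is.element(name, getTaskCallbackNames())) {
    addTaskCallback(f, name = name)
  }
  invisible(NULL)
}

# A callback must return TRUE to stay registered.
say_done <- function(expr, value, ok, visible) {
  cat("top-level task finished\n")
  TRUE
}

register_once(say_done, "demo.callback")
register_once(say_done, "demo.callback")   # second call is a no-op

n_registered <- sum(getTaskCallbackNames() == "demo.callback")
print(n_registered)

# Clean up; removeTaskCallback() accepts the name as well as the position
removed <- removeTaskCallback("demo.callback")
```

Sourcing a script that uses this pattern repeatedly leaves a single callback in place instead of adding one per run.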
I would like to know how I can get the exponential of a large negative number in R. For example, when I try:
> exp(-6400)
[1] 0
> exp(-1200)
[1] 0
> exp(-2000)
[1] 0
But I need the value of the above expressions, even if it is very small. How can I get it in R?
Those numbers are too small to be represented as doubles. To see the smallest value your computer can handle, try:
> .Machine$double.xmin
[1] 2.225074e-308
This gives you (from ?.Machine):
the smallest non-zero normalized floating-point number, a power of the radix, i.e., double.base ^ double.min.exp. Normally 2.225074e-308.
In my case
> .Machine$double.base
[1] 2
> .Machine$double.min.exp
[1] -1022
In practice, thanks to denormalized (subnormal) numbers, I can compute values down to
> exp(-745)
[1] 4.940656e-324
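The boundary can be checked directly. Values between the smallest subnormal double (about 4.94e-324) and .Machine$double.xmin are still representable, with reduced precision, and anything smaller rounds to zero. A short sketch:

```r
# Natural log of the smallest normalized double: roughly -708.4
log(.Machine$double.xmin)

# Below that, results are subnormal (fewer significant bits) ...
exp(-745) > 0                       # TRUE
exp(-745) < .Machine$double.xmin    # TRUE

# ... and one step further the result underflows to exactly zero
exp(-746) == 0                      # TRUE
```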
To get around this issue you need arbitrary-precision arithmetic.
In R you can achieve that using the package Rmpfr (PDF vignette)
library(Rmpfr)
# Calculate exp(-100)
> a <- mpfr(exp(-100), precBits=64)
# exp(-1000)
> a^10
1 'mpfr' number of precision 64 bits
[1] 5.07595889754945890823e-435
# exp(-6400)
> a^64
1 'mpfr' number of precision 64 bits
[1] 3.27578823787094497049e-2780
# use an array of powers
> ex <- c(10, 20, 50, 100, 500, 1000, 1e5)
> a ^ ex
7 'mpfr' numbers of precision 64 bits
[1] 5.07595889754945890823e-435 2.57653587296115182772e-869
[3] 3.36969414830892462745e-2172 1.13548386531474089222e-4343
[5] 1.88757769782054893243e-21715 3.56294956530952353784e-43430
[7] 1.51693678090513840149e-4342945
Note that Rmpfr is based on GNU MPFR and requires GNU GMP. Under Linux you will need gmp, gmp-devel, mpfr, and mpfr-devel installed on your system in order to install these packages. I am not sure how that works under Windows.
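If only the magnitude is needed, an alternative that avoids extra packages is to stay on the log scale and split the result into a mantissa and a base-10 exponent, using exp(x) = 10^(x / log(10)). The helper name exp_sci below is made up for this sketch. It is a base-R workaround and not part of Rmpfr.

```r
# Return exp(x) as mantissa * 10^exponent without ever forming exp(x) itself.
# 'exp_sci' is a hypothetical helper name.
exp_sci <- function(x) {
  log10_val <- x / log(10)                 # log10 of exp(x)
  exponent  <- floor(log10_val)            # integer power of ten
  mantissa  <- 10^(log10_val - exponent)   # lies in [1, 10)
  list(mantissa = mantissa, exponent = exponent)
}

res <- exp_sci(-6400)
cat(sprintf("%.6fe%d\n", res$mantissa, as.integer(res$exponent)))
```

For exp(-6400) this gives a mantissa of about 3.2758 and an exponent of -2780, consistent with the a^64 result above. The mantissa is only as accurate as double arithmetic on the log allows, so Rmpfr remains the better choice when many digits matter.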
I have these two character strings, and the as.numeric() function doesn't behave the same way for them. Can anyone explain why this is happening?
options(digits=22)
a="27"
as.numeric(a)
[1] 27.00000000000000000000
a="193381411288395777"
as.numeric(a)
[1] 193381411288395776.0000
It can be seen that in the second case the last digit is "6" rather than "7". Basically, as.numeric() returns a value one less than the number in the string.
Any help is appreciated.
You need to learn about the limits of representation of exact numbers. R can tell you what it has:
R> .Machine
$double.eps
[1] 2.22045e-16
$double.neg.eps
[1] 1.11022e-16
$double.xmin
[1] 2.22507e-308
$double.xmax
[1] 1.79769e+308
$double.base
[1] 2
$double.digits
[1] 53
$double.rounding
[1] 5
$double.guard
[1] 0
$double.ulp.digits
[1] -52
$double.neg.ulp.digits
[1] -53
$double.exponent
[1] 11
$double.min.exp
[1] -1022
$double.max.exp
[1] 1024
$integer.max
[1] 2147483647
$sizeof.long
[1] 8
$sizeof.longlong
[1] 8
$sizeof.longdouble
[1] 16
$sizeof.pointer
[1] 8
R>
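The relevant entry is double.digits: a double carries 53 significant bits, so integers are exact only up to 2^53 (about 9.007e15). The number in the question is larger than that, and at its magnitude adjacent doubles are 32 apart, so the input is rounded to the nearest representable value. A short sketch of the arithmetic:

```r
# Every integer up to 2^53 = 9007199254740992 is exactly representable
2^.Machine$double.digits
2^53 + 1 == 2^53                    # TRUE: the +1 is lost

x <- as.numeric("193381411288395777")

# Gap between neighbouring doubles at the magnitude of x
spacing <- 2^(floor(log2(x)) - (.Machine$double.digits - 1))
spacing                             # 32

# The string is parsed to the nearest representable double
x == 193381411288395776             # TRUE
```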
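Use the int64 package (it has since been archived on CRAN; bit64::as.integer64() is a maintained alternative that also accepts strings):
> library(int64)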
> as.int64("193381411288395777")
[1] 193381411288395777