Need to write a new random generator in R

I need to generate 7 random numbers between -1 and 1 whose sum equals 1. I used this code to do so:
diff(c(0, sort(round(runif(7,-1,1),2)), 1))
But I have a big problem with this.
One output of this code is -0.89, 0.21, 0.00, 0.21, 0.30, 0.19, 0.61, -0.63 (note that this is 8 numbers, not 7: runif(7) plus the two endpoints gives 9 values, and diff() of 9 values returns 8).
The problem is that the first and last numbers are almost always large in magnitude, which is not what I want. I need the magnitude to be spread across all the numbers, e.g.:
0.22 -0.21 0.33 -0.12 0.11 0.35 -0.08 (the sum is not equal to 1; this is just an example of the spread I want)
Do you know how I can write code to get this kind of random numbers?

Your general idea is probably inspired by the answers linked in the question. The standard problem is how to generate 7 numbers between 0 and 1 that add to 1. The answer is:
diff(c(0, sort(runif(6, 0, 1)), 1))
#> [1] 0.27960792 0.02035231 0.02638626 0.09945877 0.25134002 0.03379598 0.28905874
The necessary modifications for getting numbers between -1 and 1 are quite simple; just leave out the sort:
diff(c(0, runif(6, 0, 1), 1))
#> [1] 0.9961661 -0.6528227 0.5298829 -0.2087127 -0.2298045 0.2017705 0.3635203
How does this work? We again partition the space between zero and one. But by leaving out the sort, we allow the partition points to go backward, i.e. negative numbers are possible. [Histogram of 1000 generations omitted.]
One weakness in this approach is that the first and last numbers are necessarily positive. If this bothers you, you can add an additional sample, e.g.:
sample(diff(c(0, runif(6, 0, 1), 1)), 7)
#> [1] -0.004242793 -0.725348335 0.385971491 0.320525822 0.389915347
#> [6] 0.053195271 0.579983197
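For convenience, the whole recipe can be wrapped in a small function (a minimal sketch of my own, not from the original answer; the name rand_sum1 is made up):
rand_sum1 <- function(n) {
  # n - 1 unsorted uniform cut points partition [0, 1]; the n successive
  # differences telescope to 1, and sample() shuffles away the
  # always-positive first and last elements
  sample(diff(c(0, runif(n - 1), 1)), n)
}
x <- rand_sum1(7)
sum(x)   # 1, up to floating-point error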

There are two possible solutions; both use a rejection loop that can, in principle, run for a long time.
Solution 1:
You can draw 6 random numbers and make the 7th the dependent value 1 - sum(first six), so that the total is exactly 1. However, the dependent element may fall outside [-1, 1], so we cannot accept every draw.
while (TRUE) {
  res <- runif(6, -1, 1)
  res <- append(res, 1 - sum(res))
  # accept only if the dependent 7th element also lies in [-1, 1]
  if (all(res >= -1 & res <= 1))
    break
}
res
Output is:
-0.34038038 0.15811401 -0.20748670 0.26443104 0.45216639 -0.09912685 0.77228248
Solution 2:
We continuously generate new draws and hope to hit a proper answer. In order to make a hit likely in reasonable time, we must round the randoms to 1 digit:
while (TRUE) {
  res <- round(runif(7, -1, 1), digits = 1)
  print(sum(res))
  if (sum(res) == 1)
    break
}
res
Output:
> res
[1] -0.6 0.2 0.4 0.7 -0.2 0.6 -0.1

A solution similar to Salman Lashkarara's: you should round the numbers so that an exact hit becomes possible. Note that as written, the loop below targets a sum of 0; to match the question, compare against 1 instead (the seed-dependent output would then differ).
library(magrittr)
set.seed(42)
x <- 1
while (sum(x) != 0) {
  x <- runif(7, -1, 1) %>%
    round(3)
}
x
#> [1] 0.155 0.559 -0.335 -0.230 -0.490 -0.557 0.898
sum(x)
#> [1] 0
Created on 2018-09-16 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).

Related

Use apply() on a 1-dim vector to find the best threshold

My current mission: pick some "good" columns from an incomplete matrix, trying to remove NAs while keeping as much real data as possible.
My idea: I can calculate every column's percentage of missing data (NA%). For a given threshold t, all columns with NA% > t will be removed. The removed columns also contain some real data, so the ratio of present to missing values in them shows the "price" of deleting those columns. My idea is to search for the lowest "price" that deletes as many NAs as possible, for each dataset.
I already wrote my functions, up to the last 2 steps:
myfunc1 <- function(x){
  return(sum(is.na(x)))
}
myfunc2 <- function(x){
  return(round(myfunc1(x) / length(x), 4))
}
myfunc3 <- function(t, set){
  m <- which(apply(set, MARGIN = 2, myfunc2) > t)
  missed <- sum(is.na(set[m]))
  present <- sum(!is.na(set[m]))
  return(present / missed)
}
myfunc3(0.5, setA) # worked
threshold <- seq(from = 0, to = 0.95, by = 0.05)
apply(X = threshold, MARGIN = 1, FUN = myfunc3, set = setA) # does not work; stuck here.
I have 10 datasets, from setA to setJ, and I want to test all thresholds from 0 to 0.95. I want a matrix as the return value, with the 10 datasets as columns and the 20 thresholds (every 0.05) as rows.
Did I do this correctly? Are there better ideas, or already existing libraries that I could use?
----------edit: example-----------
setA <- data.frame(cbind(c(1,2,3,4,NA,6,7,NA), c(1,2,NA,4,5,NA,NA,8),c(1,2,3,4,5,6,NA,8), c(1,2,3,4,5,6,7,8),c(NA,NA,NA,4,NA,6,NA,NA)))
colnames(setA) <- sprintf("col%s", seq(1:5))
rownames(setA) <- sprintf("sample%s", seq(1:8))
View(setA)
myfunc1 <- function(x){
  return(sum(is.na(x)))
}
myfunc2 <- function(x){
  return(round(myfunc1(x) / length(x), 4))
}
myfunc3 <- function(t, set){
  m <- which(apply(set, MARGIN = 2, myfunc2) > t)
  missed <- sum(is.na(set[m]))
  present <- sum(!is.na(set[m]))
  return(present / missed)
}
In setA there are 8 samples, and each sample has 5 attributes describing it. Unfortunately, some data are missing, so I need to delete columns with too many NAs. First, let me calculate every column's NA%:
> apply(setA, MARGIN = 2, myfunc2)
col1 col2 col3 col4 col5
0.250 0.375 0.125 0.000 0.750
If I set the threshold t = 0.3, that means col2 and col5 are considered to have too many NAs and need to be deleted; the others are acceptable. If I delete those 2 columns, I also delete some real data: I delete 7 real values and 9 NAs, and 7/9 ≈ 0.78. This means I sacrifice 0.78 real values for every NA I delete.
> myfunc3(0.3, setA)
[1] 0.7777778
I want to try every threshold's result and then decide.
threshold <- seq(from = 0, to = 0.9, by = 0.1)
apply(X = threshold, MARGIN = 1, FUN = myfunc3, set = setA) # does not work
I manually calculated the setA part:
threshold: 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
price: 1.667 1.667 1.118 0.778 0.334 0.334 0.334 0.334 NaN NaN
In the end I want a table like:
threshold: 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
setA: 1.667 1.667 1.118 0.778 0.334 0.334 0.334 0.334 NaN NaN
setB:
setC:
...
setJ:
Am I approaching the problem correctly?
-----------Edit---------------
I have already solved the problem; please close the thread.
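For reference, a minimal sketch of how the final step can be made to work (my own addition, not from the thread): apply() needs an object with dimensions and therefore fails on a plain vector, whereas sapply() iterates over the vector directly. Collecting the datasets in a named list (assumed below; only setA is defined in the example) yields the desired thresholds-by-datasets table:
sets <- list(setA = setA)   # add setB ... setJ here as available
threshold <- seq(from = 0, to = 0.95, by = 0.05)
# inner sapply: the price at every threshold; outer sapply: one column per dataset
price <- sapply(sets, function(s) sapply(threshold, myfunc3, set = s))
rownames(price) <- threshold
price   # NaN where no column exceeds t, i.e. nothing is deleted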

How does R (v3.6.1) determine rounding with odd floor values? [duplicate]

Yes, I know why we always round to the nearest even number when we are exactly in the middle of two numbers (i.e. 2.5 becomes 2). But when I evaluate data for some people, they don't want this behaviour. What is the simplest method to get this:
x <- seq(0.5,9.5,by=1)
round(x)
to be 1,2,3,...,10 and not 0,2,2,4,4,...,10.
Edit: To clarify: 1.4999 should be 1 after rounding. (I thought this would be obvious.)
This is not my own function, and unfortunately, I can't find where I got it at the moment (originally found as an anonymous comment at the Statistically Significant blog), but it should help with what you need.
round2 = function(x, digits) {
  posneg = sign(x)
  z = abs(x) * 10^digits
  z = z + 0.5 + sqrt(.Machine$double.eps)
  z = trunc(z)
  z = z / 10^digits
  z * posneg
}
x is the object you want to round, and digits is the number of digits you are rounding to.
An Example
x = c(1.85, 1.54, 1.65, 1.85, 1.84)
round(x, 1)
# [1] 1.8 1.5 1.6 1.8 1.8
round2(x, 1)
# [1] 1.9 1.5 1.7 1.9 1.8
(Thanks @Gregor for the addition of + sqrt(.Machine$double.eps).)
If you want something that behaves exactly like round except for those xxx.5 values, try this:
x <- seq(0, 1, 0.1)
x
# [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
floor(0.5 + x)
# [1] 0 0 0 0 0 1 1 1 1 1 1
As @CarlWitthoft said in the comments, this is the IEC 60559 standard as mentioned in ?round:
Note that for rounding off a 5, the IEC 60559 standard is expected to be used, ‘go to the even digit’. Therefore round(0.5) is 0 and round(-1.5) is -2. However, this is dependent on OS services and on representation error (since e.g. 0.15 is not represented exactly, the rounding rule applies to the represented number and not to the printed number, and so round(0.15, 1) could be either 0.1 or 0.2).
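To see that representation error concretely (my own illustration, not part of the quoted documentation):
sprintf("%.20f", 0.15)
# [1] "0.14999999999999999445"
round(0.15, 1)   # 0.1 on this machine; per the quote above, it could be either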
An additional explanation by Greg Snow:
The logic behind the round-to-even rule is that we are trying to represent an underlying continuous value, and if x comes from a truly continuous distribution then the probability that x == 2.5 is 0; the 2.5 was probably already rounded once from values between 2.45 and 2.54999999999999... If we use the round-up-on-0.5 rule that we learned in grade school, then the double rounding means that values between 2.45 and 2.50 will all round to 3 (having been rounded first to 2.5). This will tend to bias estimates upwards. To remove the bias we need to either go back to before the rounding to 2.5 (which is often impossible or impractical), or just round up half the time and round down half the time (or better, round proportionally to how likely we are to see values below or above 2.5 rounded to 2.5, though that will be close to 50/50 for most underlying distributions). The stochastic approach would be to have the round function randomly choose which way to round, but deterministic types are not comfortable with that, so "round to even" was chosen (round to odd should work about the same) as a consistent rule that rounds up and down about 50/50.
If you are dealing with data where 2.5 is likely to represent an exact value (money, for example), then you may do better by multiplying all values by 10 or 100 and working in integers, converting back only for the final printing. Note that 2.50000001 rounds to 3, so if you keep more digits of accuracy until the final printing, rounding will go in the expected direction; or you can add 0.000000001 (or another small number) to your values just before rounding, but that can bias your estimates upwards.
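A quick demonstration of the double-rounding bias described above (my own illustration, not part of Greg Snow's explanation):
x <- c(2.46, 2.47, 2.48, 2.49)    # all closer to 2 than to 3
round(x)                          # rounding once
# [1] 2 2 2 2
floor(0.5 + round(x, 1))          # rounding to 1 digit first (2.5), then half-up
# [1] 3 3 3 3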
This appears to work:
rnd <- function(x) trunc(x + sign(x) * 0.5)
Ananda Mahto's response seems to do this and more - I am not sure what the extra code in his response is accounting for; or, in other words, I can't figure out how to break the rnd() function defined above.
Example:
(x <- seq(-2, 2, by = 0.5))
# [1] -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0
round(x)
# [1] -2 -2 -1 0 0 0 1 2 2
rnd(x)
# [1] -2 -2 -1 -1 0 1 1 2 2
Depending on how comfortable you are with jiggling your data, this works (x here is the question's seq(0.5, 9.5, by = 1)):
round(x + 10 * .Machine$double.eps)
# [1] 1 2 3 4 5 6 7 8 9 10
This method:
round2 = function(x, n) {
  posneg = sign(x)
  z = abs(x) * 10^n
  z = z + 0.5
  z = trunc(z)
  z = z / 10^n
  z * posneg
}
does not seem to work well when the numbers have many digits. E.g. round2(2436.845, 2) gives 2436.84. The issue occurs at the trunc(z) step: in floating point, 2436.845 * 100 evaluates to slightly less than 243684.5, so adding 0.5 and truncating rounds down.
Overall, I think it has to do with the way R stores floating-point numbers, so trunc() doesn't always behave as expected. I was able to get around it, though not in the most elegant way:
round2 = function(x, n) {
  posneg = sign(x)
  z = abs(x) * 10^n
  z = z + 0.5
  z = trunc(as.numeric(as.character(z)))
  z = z / 10^n
  z * posneg
}
This mimics the rounding away from zero at .5:
round_2 <- function(x, digits = 0) {
  x = x + abs(x) * sign(x) * .Machine$double.eps
  round(x, digits = digits)
}
round_2(.5 + -2:4)
# [1] -2 -1  1  2  3  4  5

Is there an R function to return a parameter in a list that cannot be found by str(list)?

I'm trying to return a parameter from a list, but I cannot find the parameter using str(list).
This is my code:
install.packages("meta")
library(meta)
m1 <- metacor(c(0.85, 0.7, 0.95), c(20, 40, 10))
m1
COR 95%-CI %W(fixed) %W(random)
1 0.8500 [0.6532; 0.9392] 27.9 34.5
2 0.7000 [0.4968; 0.8304] 60.7 41.7
3 0.9500 [0.7972; 0.9884] 11.5 23.7
Number of studies combined: k = 3
COR 95%-CI z p-value
Fixed effect model 0.7955 [0.6834; 0.8710] 8.48 < 0.0001
Random effects model 0.8427 [0.6264; 0.9385] 4.87 < 0.0001
How could I save COR (= 0.8427) or the p-value (< 0.0001) for the random effects model as a single variable?
It seems that the numbers you are looking for (COR 0.8427) are created in print.meta. The function is too big, though, so I gave up trying to pinpoint exactly where the value gets calculated and what name it has. I don't think it is even saved within the function; it is just printed.
Anyway I took the alternative road of capturing the output:
#capture the output of the summary - the fifth line gives us what we want
out <- capture.output(summary(m1))[5]
#capture all the numbers and return the first
unlist(regmatches(out, gregexpr("[[:digit:]]+\\.*[[:digit:]]*", out)))[1]
#[1] "0.8427"
I assume your problem is accessing the object.
The $ operator will help you: type the variable name, then $, then press Tab, and the available components of the object will appear. According to your question, the values would be:
> m1$cor[1]
[1] 0.85
> mysummary<-summary(m1)
> mysummary$fixed$p
[1] 2.163813e-17
> mysummary$fixed$z
[1] 8.484643
> ifelse(mysummary$fixed$p<0.0001, "<0.0001", "WHATEVER")
[1] "<0.0001"
To select a specific one, you can use [i], where i is an integer (e.g. i = 1 for 0.85).
To get the "< 0.0001" display, I suggest using an ifelse() statement on the p-values or z-values with the corresponding rule. Cheers!
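For completeness, a hedged alternative: meta objects of this kind also store the pooled estimates as list components (component names assumed here from the package's structure; verify with str(m1, max.level = 1)). Since metacor uses the Fisher z transformation (sm = "ZCOR") by default, the pooled random-effects estimate must be back-transformed with tanh():
tanh(m1$TE.random)   # random effects correlation, ~0.8427
m1$pval.random       # raw p-value for the random effects model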

compute all pairwise differences within a vector in R

There are several posts on computing pairwise differences among vectors, but I cannot find how to compute all differences within a vector.
Say I have a vector, v.
v<-c(1:4)
I would like to generate a second vector that is the absolute value of all pairwise differences within the vector. Similar to:
abs(1-2) = 1
abs(1-3) = 2
abs(1-4) = 3
abs(2-3) = 1
abs(2-4) = 2
abs(3-4) = 1
The output would be a vector of 6 values, which are the result of my 6 comparisons:
output<- c(1,2,3,1,2,1)
Is there a function in R that can do this?
as.numeric(dist(v))
seems to work; it treats v as a column matrix and computes the Euclidean distance between rows, which in this case is sqrt((x-y)^2) = abs(x-y).
If we're golfing, then I'll offer c(dist(v)), which is equivalent and which I'm guessing will be unbeatable.
@AndreyShabalin makes the good point that using method = "manhattan" will probably be slightly more efficient, since it avoids the squaring/square-rooting; see the one-liner below.
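That variant would look like this (a trivial sketch of the suggestion above):
c(dist(v, method = "manhattan"))
# [1] 1 2 3 1 2 1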
Let's play golf
abs(apply(combn(1:4,2), 2, diff))
@Ben, yours is a killer!
> system.time(apply(combn(1:1000,2), 2, diff))
user system elapsed
6.65 0.00 6.67
> system.time(c(dist(1:1000)))
user system elapsed
0.02 0.00 0.01
> system.time({
+ v <- 1:1000
+ z = outer(v,v,'-');
+ z[lower.tri(z)];
+ })
user system elapsed
0.03 0.00 0.03
Who knew that elegant (read: understandable/flexible) code could be so slow.
A possible solution is:
z <- outer(v, v, '-')
z[lower.tri(z)]   # v is sorted here, so these are already non-negative; wrap in abs() otherwise
# [1] 1 2 3 1 2 1
