library(energy)
RR=100
n=10
a=2
b=4
miu1=2
miu2=4
m22=(b^2)*(1-(rho^2))
# This is the point where I am having a problem.
# I want the program to retain the results average0.1, average0.05 and
# average0.01 for every 'rho' from the rho_list used for the simulation,
# but I am stuck because I don't know how to get the result.
rho_list=c(0,0.3,0.6)
for (rho in rho_list){
energy=rep(NA,RR)
for (i in 1:RR){
z1=rnorm(n,0,1)
z2=rnorm(n,0,1)
x1=miu1+a*z1
x2=miu2+(rho*b*z1)+(sqrt(m22)*z2)
X=matrix(c(x1,x2),byrow=TRUE,ncol=2)
energy[i]=mvnorm.etest(X)$p.value
}
average0.1=sum(energy<=0.1)/RR
average0.05=sum(energy<=0.05)/RR
average0.01=sum(energy<=0.01)/RR
}
I want the program to retain the results average0.1, average0.05 and average0.01 for every rho from rho_list used in the simulation, but I am stuck because I don't know how to get the result.
Your example is not reproducible, so I'm giving you some simulated data to demonstrate how to output the result.
rho_list=c(0,0.3,0.6)
result <- sapply(rho_list, FUN = function(rho, ...) {
average0.1 = runif(1)
average0.05 = runif(1)
average0.01 = runif(1)
c(rho = rho, a01 = average0.1, a0.05 = average0.05, a0.01 = average0.01)
}, RR = RR, n = n, a = a, b = b, miu1 = miu1, miu2 = miu2, m22 = m22, simplify = FALSE)
do.call("rbind", result)
rho a01 a0.05 a0.01
[1,] 0.0 0.0136175 0.08581583 0.07171591
[2,] 0.3 0.8334469 0.42103038 0.07857328
[3,] 0.6 0.8231120 0.40647485 0.65408540
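For completeness, here is a sketch of the same sapply() pattern with the simulation from the question plugged in. It assumes library(energy) is loaded (mvnorm.etest comes from that package) and, as an assumption on my part, builds the sample matrix with cbind(x1, x2) so that x1 and x2 form the two columns:
result <- sapply(rho_list, function(rho) {
  m22 <- (b^2) * (1 - (rho^2))  # m22 depends on rho, so compute it per iteration
  energy <- rep(NA, RR)
  for (i in 1:RR) {
    z1 <- rnorm(n, 0, 1)
    z2 <- rnorm(n, 0, 1)
    x1 <- miu1 + a * z1
    x2 <- miu2 + (rho * b * z1) + (sqrt(m22) * z2)
    X <- cbind(x1, x2)          # x1 and x2 as the two columns of the sample
    energy[i] <- mvnorm.etest(X)$p.value
  }
  c(rho = rho,
    average0.1 = sum(energy <= 0.1) / RR,
    average0.05 = sum(energy <= 0.05) / RR,
    average0.01 = sum(energy <= 0.01) / RR)
}, simplify = FALSE)
do.call("rbind", result)  # one row per rho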
One option would be to store the results in a list, with one element for each value of rho, and then bind them into a single data frame. Here's an example. Note that since rho isn't defined in the set-up code, I've substituted the definition of m22 directly into the loop. Also, I've set RR=10 to save time in running the code.
library(energy)
RR=10
n=10
a=2
b=4
miu1=2
miu2=4
rho_list=c(0, 0.3, 0.6)
energy_threshold = c(0.1, 0.05, 0.01) # Store energy thresholds in a vector
# Create a list of data frames. Each data frame contains the result for each
# of the three energy thresholds for one value of rho.
results = lapply(rho_list, function(rho) {
energy=rep(NA,RR)
for (i in 1:RR) {
z1=rnorm(n,0,1)
z2=rnorm(n,0,1)
x1=miu1+a*z1
x2=miu2+(rho*b*z1)+(sqrt((b^2)*(1-(rho^2)))*z2)
X=matrix(c(x1,x2),byrow=TRUE,ncol=2)
energy[i]=mvnorm.etest(X)$p.value
}
data.frame(rho, energy_threshold, result=sapply(energy_threshold, function(y) sum(energy <= y)/RR))
})
# Bind the three data frames into a single data frame
results = do.call(rbind, results)
And here's the output:
results
rho energy_threshold result
1 0.0 0.10 0.1
2 0.0 0.05 0.0
3 0.0 0.01 0.0
4 0.3 0.10 0.2
5 0.3 0.05 0.1
6 0.3 0.01 0.0
7 0.6 0.10 0.0
8 0.6 0.05 0.0
9 0.6 0.01 0.0
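If a wide layout (one row per rho, one column per threshold) is preferred, the long data frame above can be reshaped with base R's reshape(); a quick sketch:
reshape(results, idvar = "rho", timevar = "energy_threshold", direction = "wide")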
I stored the variables from the loop in numeric vectors and then used rbind() to accumulate the results. Here is the entire code:
library(energy)
RR=10
n=10
a=2
b=4
miu1=2
miu2=4
average0.1 <- as.numeric()
average0.05 <- as.numeric()
average0.01 <- as.numeric()
rho_list=c(0,0.3,0.6)
for (rho in unique(rho_list)){
m22=(b^2)*(1-(rho^2)) # rho is only defined inside the loop, so compute m22 here
energy=rep(NA,RR)
for (i in 1:RR){
z1=rnorm(n,0,1)
z2=rnorm(n,0,1)
x1=miu1+a*z1
x2=miu2+(rho*b*z1)+(sqrt(m22)*z2)
X=matrix(c(x1,x2),byrow=TRUE,ncol=2)
energy[i]=mvnorm.etest(X)$p.value
}
average0.1=rbind(average0.1, sum(energy<=0.1)/RR)
average0.05=rbind(average0.05, sum(energy<=0.05)/RR)
average0.01=rbind(average0.01, sum(energy<=0.01)/RR)
}
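If a single labeled table is preferred over three separate vectors, they can be combined after the loop, for example:
results <- data.frame(rho = unique(rho_list),
                      average0.1 = as.vector(average0.1),
                      average0.05 = as.vector(average0.05),
                      average0.01 = as.vector(average0.01))
results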
Context
I asked this question recently:
Comparing partitions from split() using a nested for loop containing an if statement
where I needed to compare partitions generated by split() from a distance matrix, using the code fix provided by @robertdj:
set.seed(1234) # set random seed for reproducibility
# generate random normal variates
x <- rnorm(5)
y <- rnorm(5)
df <- data.frame(x, y) # merge vectors into dataframe
d <- dist(x) # generate distance matrix
splt <- split(d, 1:5) # split data with 5 values in each partition
for (i in 1:length(splt)) {
for (j in 1:length(splt)) {
if (i != j) {
a <- length(which(splt[[i]] >= min(splt[[j]]))) / length(splt[[i]])
b <- length(which(splt[[j]] <= max(splt[[i]]))) / length(splt[[j]])
}
}
}
I generated an MWE where each split contained the same number of elements. I did this just for illustrative purposes, fully knowing that this would not necessarily hold for real data.
As per @Robert Hacken's comment, if I instead do
d <- na.omit(d[lower.tri(d)])
I get partitions of unequal length.
Real Data
However, my real data does not have the "same size" property, and it contains many more partitions than the 5 in my MWE.
Here is my code
splt <- split(dist_matrix, sub("(?:(.*)\\|){2}(\\w+)\\|(\\w+)\\|.*?$", "\\1-\\2", colnames(dist_matrix)))
The distance matrix dist_matrix contains FASTA headers from which I extract the species names.
I then use splt above in the doubly nested loop.
For instance, splt[[4]] contains 5 values, whereas splt[[10]] contains 9.
splt[[4]]
[1] 0.1316667 0.1383333 0.1166667 0.1333333 0.1216667
splt[[10]]
[1] 0.1450000 0.1483333 0.1316667 0.1316667 0.1333333 0.1333333 0.1166667 0.1166667 0.1200000
Expected Output
For my real problem, each partition corresponds to the distances from a single species to all other unique species. So, if Species X has two DNA sequences representing it and there are 10 species in total, the partition for Species X should contain 20 distances. However, I don't want the partition to include the distance between the two sequences for Species X itself.
splt would thus contain 10 partitions (each not necessarily of the same length) for all species
The expected output of a and b is a number between 0 and 1 inclusive. I think these numbers should be small in my real example, but they are large when I try to run my code, which I think is a consequence of the warning.
What I've Done
I've read on SO that %in% is typically used to resolve the warning
In splt[[i]] == splt[[j]] :
longer object length is not a multiple of shorter object length
except in my case, I believe I would need `%notin%` <- Negate(`%in%`).
However, %notin% gives the error in my original post
the condition has length > 1
Question
How can my nested loop be altered to remove the warning?
I'm going to go out on a limb by interpreting parts of what you say, discarding your code, and seeing what I can come up with. If nothing else, it may spark a conversation about which of my interpretations are correct (and which are incorrect).
Starting with the splt as generated by the random data, then replacing elements 4 and 5 with longer vectors,
set.seed(1234)
x <- rnorm(5)
y <- rnorm(5)
df <- data.frame(x, y)
d <- dist(x)
splt <- split(d, 1:5)
splt[[4]] <- rnorm(4)
splt[[5]] <- rnorm(10)
We have:
splt <- list("1" = c(1.48449499149608, 2.62312694474001), "2" = c(2.29150692606848, 0.15169544670039), "3" = c(1.13863195324393, 3.43013887931241), "4" = c(-0.477192699753547, -0.998386444859704, -0.77625389463799, 0.0644588172762693), "5" = c(-0.693720246937475, -1.44820491038647, 0.574755720900728, -1.02365572296388, -0.0151383003641817, -0.935948601168394, 1.10229754620026, -0.475593078869057, -0.709440037512506, -0.501258060594761))
splt
# $`1`
# [1] 1.484495 2.623127
# $`2`
# [1] 2.2915069 0.1516954
# $`3`
# [1] 1.138632 3.430139
# $`4`
# [1] -0.47719270 -0.99838644 -0.77625389 0.06445882
# $`5`
# [1] -0.6937202 -1.4482049 0.5747557 -1.0236557 -0.0151383 -0.9359486 1.1022975 -0.4755931 -0.7094400 -0.5012581
You reference expressions like which(splt[[i]] >= min(splt[[j]])), which I'm interpreting to mean "what is the ratio of splt[[i]] that is above the max value in splt[[j]]?". Since we're comparing (for example) splt[[1]] with all of splt[[2]] through splt[[5]] here, and likewise for the others, we're going to have a square matrix where the diagonal is splt[[i]]-vs-splt[[i]] (likely not interesting).
Some quick math so we know what we should end up with:
splt[[1]]
# [1] 1.484495 2.623127
range(splt[[2]])
# [1] 0.1516954 2.2915069
Since one of [[1]]'s two values is greater than [[2]]'s max of 2.29, we expect 0.5 in a comparison between the two (for >= max(.)); similarly, none of [[1]] is below [[2]]'s min of 0.15, so we expect a 0 there.
Similarly, [[5]] over [[4]]:
splt[[5]]
# [1] -0.6937202 -1.4482049 0.5747557 -1.0236557 -0.0151383 -0.9359486 1.1022975 -0.4755931 -0.7094400 -0.5012581
range(splt[[4]])
# [1] -0.99838644 0.06445882
### 2 of 10 are greater than the max
sum(splt[[5]] >= max(splt[[4]])) / length(splt[[5]])
# [1] 0.2
### 2 of 10 are less than or equal to the min
sum(splt[[5]] <= min(splt[[4]])) / length(splt[[5]])
# [1] 0.2
We can use outer, but sometimes that can be confusing, especially since in this case we'd need to Vectorize the anon-func passed to it. I'll adapt your double-for loop premise into nested sapply calls.
Greater than the other's max
sapply(splt, function(y) sapply(setNames(splt, paste0("max", seq_along(splt))), function(z) sum(y >= max(z)) / length(y)))
# 1 2 3 4 5
# max1 0.5 0.0 0.5 0.00 0.0
# max2 0.5 0.5 0.5 0.00 0.0
# max3 0.0 0.0 0.5 0.00 0.0
# max4 1.0 1.0 1.0 0.25 0.2
# max5 1.0 0.5 1.0 0.00 0.1
Interpretation and subset validation:
1 with max of 2: comparing [[1]] (first column) with the max value from [[2]] (second row), half of 1's values are greater, so we have 0.5 (as expected).
5 with max of 4: comparing [[5]] (fifth column) with the max value from [[4]] (fourth row), 0.2 meet the condition.
Less than the other's min
sapply(splt, function(y) sapply(setNames(splt, paste0("min", seq_along(splt))), function(z) sum(y <= min(z)) / length(y)))
# 1 2 3 4 5
# min1 0.5 0.5 0.5 1.00 1.0
# min2 0.0 0.5 0.0 1.00 0.8
# min3 0.0 0.5 0.5 1.00 1.0
# min4 0.0 0.0 0.0 0.25 0.2
# min5 0.0 0.0 0.0 0.00 0.1
Same two pairs:
1 with min of 2 (row 2, column 1) is 0, as expected
5 with min of 4 (row 4, column 5) is 0.2, as expected
Edit: @compbiostats pointed out that while sum(..) should produce the same results as length(which(..)), the latter may be more robust to missing data (e.g., NA values; cf. Difference between sum(), length(which()), and nrow() in R). For sum(..) to share that resilience, we should add na.rm=TRUE to both the sum(.) and the max(.)/min(.) calls above. Thanks @compbiostats!
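Spelled out, that suggestion amounts to the following (a sketch of the same two calls with the NA handling added):
sapply(splt, function(y) sapply(setNames(splt, paste0("max", seq_along(splt))), function(z) sum(y >= max(z, na.rm = TRUE), na.rm = TRUE) / length(y)))
sapply(splt, function(y) sapply(setNames(splt, paste0("min", seq_along(splt))), function(z) sum(y <= min(z, na.rm = TRUE), na.rm = TRUE) / length(y)))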
Entropy is defined as:
entropy <- function (p) sum(-p * log(p))
I'm performing LCA using the poLCA package and trying to calculate entropy, which for some of my models comes out as NaN.
error_prior <- entropy(lca2$P) # Class proportions model 2
error_post <- mean(apply(lca2$posterior, 1, entropy), na.rm = TRUE)
results[2,8] <- round(((error_prior - error_post) / error_prior), 3)
From the answer to this question: Entropy output is NaN for some class solutions and not others, I learnt that it is caused by zeros in p and it can be resolved by adding na.omit to the function as follows:
entropy <- function (p) sum(na.omit(-p * log(p)))
My question is - is this technical tweak mathematically valid without affecting the integrity of the calculation?
In my case, around 1/3 of the values in p are zeros. I'm really unsure if I should use na.omit or find another way to resolve this problem.
It is valid, but not transparent at first glance. The reason is that the mathematical limit of x * log(x) as x -> 0 is 0 (this can be proved using L'Hôpital's rule). In this regard, the most robust definition of the function would be
entropy.safe <- function (p) {
if (any(p > 1 | p < 0)) stop("probability must be between 0 and 1")
log.p <- numeric(length(p))
safe <- p != 0
log.p[safe] <- log(p[safe])
sum(-p * log.p)
}
But simply dropping the p = 0 cases (in R, 0 * log(0) evaluates to NaN) gives identical results, because the limiting value at p = 0 is 0 and contributes nothing to the sum anyway.
entropy.brutal <- function (p) {
if (any(p > 1 | p < 0)) stop("probability must be between 0 and 1")
log.p <- log(p)
## as same as sum(na.omit(-p * log.p))
sum(-p * log.p, na.rm = TRUE)
}
## p has a single 0
( p <- seq(0, 1, by = 0.1) )
#[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
entropy.brutal(p)
#[1] 2.455935
entropy.safe(p)
#[1] 2.455935
## half of p are zeros
p[1:5] <- 0
p
#[1] 0.0 0.0 0.0 0.0 0.0 0.5 0.6 0.7 0.8 0.9 1.0
entropy.brutal(p)
#[1] 1.176081
entropy.safe(p)
#[1] 1.176081
In conclusion, we can use either entropy.brutal or entropy.safe.
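As a quick check, the na.omit() version from the question gives the same value on the demo vector above:
entropy <- function (p) sum(na.omit(-p * log(p)))
entropy(p)
#[1] 1.176081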
My current mission: pick out some "good" columns from an incomplete matrix, trying to remove NAs while keeping as much real data as possible.
My idea: I can calculate every column's NA percentage. For a given threshold t, all columns with NA% > t will be removed. The removed columns also contain some real data, so the ratio of present to missing values in these columns shows the "price" of deleting them. My idea is to search for the lowest "price" that deletes as many NAs as possible, for each dataset.
I have already written my functions up to the last 2 steps:
myfunc1 <- function(x){
return(sum(is.na(x)))
}
myfunc2 <- function(x){
return (round(myfunc1(x) / length(x),4))
}
myfunc3 <- function(t, set){
m <- which(apply(set, MARGIN = 2, myfunc2) > t)
missed <- sum(is.na(set[m]))
present <- sum(!is.na(set[m]))
return(present/ missed)
}
myfunc3(0.5, setA) # worked
threshold <- seq(from = 0, to = 0.95, by = 0.05)
apply(X = threshold, MARGIN = 1, FUN = myfunc3, set = setA) # does not work; stuck here.
I have 10 datasets, setA to setJ, and I want to test all thresholds from 0 to 0.95. I want a matrix returned with the 10 datasets as columns and 20 threshold rows at 0.05 intervals.
Did I do this correctly? Are there better ideas, or already existing libraries that I could use?
----------edit: example-----------
setA <- data.frame(cbind(c(1,2,3,4,NA,6,7,NA), c(1,2,NA,4,5,NA,NA,8),c(1,2,3,4,5,6,NA,8), c(1,2,3,4,5,6,7,8),c(NA,NA,NA,4,NA,6,NA,NA)))
colnames(setA) <- sprintf("col%s", seq(1:5))
rownames(setA) <- sprintf("sample%s", seq(1:8))
View(setA)
myfunc1 <- function(x){
return(sum(is.na(x)))
}
myfunc2 <- function(x){
return (round(myfunc1(x) / length(x),4))
}
myfunc3 <- function(t, set){
m <- which(apply(set, MARGIN = 2, myfunc2) > t)
missed <- sum(is.na(set[m]))
present <- sum(!is.na(set[m]))
return(present/ missed)
}
In setA, there are 8 samples. Each sample has 5 attributes describing it. Unfortunately, some data are missing. I need to delete columns with too many NAs. First, let me calculate every column's NA%.
> apply(setA, MARGIN = 2, myfunc2)
col1 col2 col3 col4 col5
0.250 0.375 0.125 0.000 0.750
If I set the threshold t = 0.3, that means col2 and col5 are considered to have too many NAs and need to be deleted; the others are acceptable. If I delete those 2 columns, I also delete some real data. (I deleted 7 real values and 9 NAs, 7/9 = 0.78. This means I sacrifice 0.78 real values for every NA I delete.)
> myfunc3(0.3, setA)
[1] 0.7777778
I want to try every threshold's result and then decide.
threshold <- seq(from = 0, to = 0.9, by = 0.1)
apply(X = threshold, MARGIN = 1, FUN = myfunc3, set = setA) # does not work
I manually calculated the setA part:
threshold: 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
price: 1.667 1.667 1.118 0.778 0.334 0.334 0.334 0.334 NaN NaN
In the end I want a table like:
threshold: 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
setA: 1.667 1.667 1.118 0.778 0.334 0.334 0.334 0.334 NaN NaN
setB:
setC:
...
setJ:
Am I approaching the problem the right way?
-----------Edit---------------
I have already solved the problem, so please close the thread.
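For reference, one way to build the requested table is to iterate over the thresholds with sapply() (apply() expects an array, which is why it fails on a plain vector) and over the datasets with a list. This is only a sketch, not necessarily the solution the poster settled on:
sets <- list(setA = setA)  # extend with setB, ..., setJ for the full run
threshold <- seq(from = 0, to = 0.95, by = 0.05)
price_table <- sapply(sets, function(s) sapply(threshold, myfunc3, set = s))
rownames(price_table) <- threshold
t(price_table)  # datasets as rows, thresholds as columns, as in the table above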
I've created a custom function to calculate values based on two inputs.
# function
info.theta <- function(theta, delta) {
P = 1/(1+exp(-1*(theta-delta)))
Q = 1 -P
1*P*Q
}
I'd like to use the function to calculate the value for all possible combinations of values for two sequences of interest.
# for each input of the function create sequences of values to explore
thetas <- seq(-4, 4, by = .5)
deltas <- seq(-4, 4, by = .5)
I'd like to end up with a data frame with a column labeled thetas, deltas and information, where both theta and delta are the values for the sequence that were used in the function, and information is the output of the function for each combination of theta and delta.
I'm at a loss for how to execute the last point, as this level of coding is new to me. My hunch was maybe a nested for loop. This is obviously not correct, but it is as close as I can get to a start. How would I use the function in the way I described to generate the desired data frame?
#nested for loop
y <- NULL
for(i in thetas) {
for(j in deltas) {
tmp <- info.theta(i, j)
y <- rbind(y, tmp)
}
}
y
You can use outer to get a matrix of values:
outer(thetas,deltas,info.theta)
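If you want the data frame described in the question rather than a matrix, the outer() result can be flattened against the corresponding grid of inputs; a small sketch:
info_mat <- outer(thetas, deltas, info.theta)
info_df <- cbind(expand.grid(thetas = thetas, deltas = deltas),
                 information = as.vector(info_mat))
head(info_df)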
A slight change to your original function:
info.theta <- function(theta, delta) {
P = 1/(1+exp(-1*(theta-delta)))
Q = 1 -P
data.frame(theta=theta,delta=delta, information=1*P*Q)
}
Because data.frames are cooler.
Now:
td_grid<-expand.grid(thetas, deltas)
info.theta(td_grid[,1],td_grid[,2])
results in:
theta delta information
1 -4.0 -4.0 0.2500000000
2 -3.5 -4.0 0.2350037122
3 -3.0 -4.0 0.1966119332
4 -2.5 -4.0 0.1491464521
5 -2.0 -4.0 0.1049935854
6 -1.5 -4.0 0.0701037165
7 -1.0 -4.0 0.0451766597
8 -0.5 -4.0 0.0284530239
9 0.0 -4.0 0.0176627062
10 0.5 -4.0 0.0108662297
11 1.0 -4.0 0.0066480567
I'm basically looking for a way to do a variation of this Ruby script in R.
I have an arbitrary list of numbers (steps of a moderator for a regression plot in this case) which have unequal distances from each other, and I'd like to round values which are within a range around these numbers to the nearest number in the list.
The ranges don't overlap.
arbitrary.numbers <- c(4,10,15) / 10
numbers <- c(16:1 / 10, 0.39, 1.45)
range <- 0.1
Expected output:
numbers
## 1.6 1.5 1.4 1.3 1.2 1.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.39 1.45
round_to_nearest_neighbour_in_range(numbers,arbitrary.numbers,range)
## 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
I've got a little helper function that might do for my specific problem, but it's not very flexible and it contains a loop. I can post it here, but I think a real solution would look completely different.
The different answers timed for speed (on a million numbers)
> numbers = rep(numbers,length.out = 1000000)
> system.time({ mvg.round(numbers,arbitrary.numbers,range) })[3]
elapsed
0.067
> system.time({ rinker.loop.round(numbers,arbitrary.numbers,range) })[3]
elapsed
0.289
> system.time({ rinker.round(numbers,arbitrary.numbers,range) })[3]
elapsed
1.403
> system.time({ nograpes.round(numbers,arbitrary.numbers,range) })[3]
elapsed
1.971
> system.time({ january.round(numbers,arbitrary.numbers,range) })[3]
elapsed
16.12
> system.time({ shariff.round(numbers,arbitrary.numbers,range) })[3]
elapsed
15.833
> system.time({ mplourde.round(numbers,arbitrary.numbers,range) })[3]
elapsed
9.613
> system.time({ kohske.round(numbers,arbitrary.numbers,range) })[3]
elapsed
26.274
MvG's function is the fastest, about four times faster than Tyler Rinker's second function.
A vectorized solution, without any apply family functions or loops:
The key is findInterval, which finds the "space" in arbitrary.numbers that each element in numbers falls into. So, findInterval(6,c(2,4,7,8)) returns 2, because 6 lies between the 2nd and 3rd elements of c(2,4,7,8).
# arbitrary.numbers is assumed to be sorted.
# find the index of the number just below each number, and just above.
# So for 6 in c(2,4,7,8) we would find 2 and 3.
low<-findInterval(numbers,arbitrary.numbers) # find index of number just below
high<-low+1 # find the corresponding index just above.
# Find the actual absolute difference between the arbitrary number above and below.
# So for 6 in c(2,4,7,8) we would find 2 and 1.
# (The absolute differences to 4 and 7).
low.diff<-numbers-arbitrary.numbers[ifelse(low==0,NA,low)]
high.diff<-arbitrary.numbers[ifelse(high==0,NA,high)]-numbers
# Find the minimum difference.
# In the example we would find that 6 is closest to 7,
# because the difference is 1.
mins<-pmin(low.diff,high.diff,na.rm=T)
# For each number, pick the arbitrary number with the minimum difference.
# So for 6 pick out 7.
pick<-ifelse(!is.na(low.diff) & mins==low.diff,low,high)
# Compare the actual minimum difference to the range.
ifelse(mins<=range+.Machine$double.eps,arbitrary.numbers[pick],numbers)
# [1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
Yet another solution using findInterval:
arbitrary.numbers<-sort(arbitrary.numbers) # need them sorted
range <- range*1.000001 # avoid rounding issues
nearest <- findInterval(numbers, arbitrary.numbers - range) # index of nearest
nearest <- c(-Inf, arbitrary.numbers)[nearest + 1] # value of nearest
diff <- numbers - nearest # compute errors
snap <- diff <= range # only snap near numbers
numbers[snap] <- nearest[snap] # snap values to nearest
print(numbers)
The nearest in the above code is not really mathematically the nearest number. Instead, it is the largest arbitrary number such that nearest[i] - range <= numbers[i], or equivalently nearest[i] <= numbers[i] + range. So in one go we find the largest arbitrary number which is either in the snapping range for a given input number, or still too small for that. For this reason, we only need to check one way for snap. No absolute value is required, and even the squaring from a previous revision of this post was unnecessary.
Thanks to Interval search on a data frame for the pointer at findInterval, as I found it there before recognizing it in the answer by nograpes.
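For use in the timing comparison above, this snippet can be wrapped into a function along these lines (a sketch; the exact mvg.round wrapper used for the benchmark isn't shown in the post):
mvg.round <- function(numbers, arbitrary.numbers, range) {
  arbitrary.numbers <- sort(arbitrary.numbers)                 # need them sorted
  range <- range * 1.000001                                    # avoid rounding issues
  nearest <- findInterval(numbers, arbitrary.numbers - range)  # index of nearest
  nearest <- c(-Inf, arbitrary.numbers)[nearest + 1]           # value of nearest
  snap <- (numbers - nearest) <= range                         # only snap near numbers
  numbers[snap] <- nearest[snap]                               # snap values to nearest
  numbers
}
mvg.round(numbers, arbitrary.numbers, range)  # reproduces the expected output above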
If, in contrast to your original question, you had overlapping ranges, you could write things like this:
arbitrary.numbers<-sort(arbitrary.numbers) # need them sorted
range <- range*1.000001 # avoid rounding issues
nearest <- findInterval(numbers, arbitrary.numbers) + 1 # index of interval
hi <- c(arbitrary.numbers, Inf)[nearest] # next larger
nearest <- c(-Inf, arbitrary.numbers)[nearest] # next smaller
takehi <- (hi - numbers) < (numbers - nearest) # larger better than smaller
nearest[takehi] <- hi[takehi] # now nearest is really nearest
snap <- abs(nearest - numbers) <= range # only snap near numbers
numbers[snap] <- nearest[snap] # snap values to nearest
print(numbers)
In this code, nearest really ends up being the nearest number. This is achieved by considering both endpoints of every interval. In spirit, this is very much like the version by nograpes, but it avoids using ifelse and NA, which should benefit performance as it reduces the number of branching instructions.
Is this what you want?
> idx <- abs(outer(arbitrary.numbers, numbers, `-`)) <= (range+.Machine$double.eps)
> rounded <- arbitrary.numbers[apply(rbind(idx, colSums(idx) == 0), 2, which)]
> ifelse(is.na(rounded), numbers, rounded)
[1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
Please note that due to rounding errors (most likely), I use range = 0.1000001 to achieve the expected effect.
range <- range + 0.0000001
blah <- rbind( numbers, sapply( numbers, function( x ) abs( x - arbitrary.numbers ) ) )
ff <- function( y ) { if( min( y[-1] ) <= range + 0.000001 ) arbitrary.numbers[ which.min( y[ -1 ] ) ] else y[1] }
apply( blah, 2, ff )
This is still shorter:
sapply(numbers, function(x) ifelse(min(abs(arbitrary.numbers - x)) > range + .Machine$double.eps,
       x, arbitrary.numbers[which.min(abs(arbitrary.numbers - x))]))
Thanks @MvG
Another option:
arb.round <- function(numbers, arbitrary.numbers, range) {
arrnd <- function(x, ns, r){
ifelse(abs(x - ns) <= range +.00000001, ns, x)
}
lapply(1:length(arbitrary.numbers), function(i){
numbers <<- arrnd(numbers, arbitrary.numbers[i], range)
}
)
numbers
}
arb.round(numbers, arbitrary.numbers, range)
Yields:
> arb.round(numbers, arbitrary.numbers, range)
[1] 1.5 1.5 1.5 1.3 1.2 1.0 1.0 1.0 0.8 0.7 0.6 0.4 0.4 0.4 0.2 0.1 0.4 1.5
EDIT: I removed the return call at the end of the function as it's not necessary and can burn time.
EDIT: I think a loop will be even faster here:
loop.round <- function(numbers, arbitrary.numbers, range) {
arrnd <- function(x, ns, r){
ifelse(abs(x - ns) <= range +.00000001, ns, x)
}
for(i in seq_along(arbitrary.numbers)){
numbers <- arrnd(numbers, arbitrary.numbers[i], range)
}
numbers
}