Rounding numbers to the nearest values (with different intervals) in R

I want to round (or replace) numbers in a
a <- c(0.505, 1.555, 2.667, 53.850, 411.793)
to the nearest values in b:
b <- c(0, 5, 10, 50, 100, 200, 500)
The output will be this:
a_rnd <- c(0, 0, 5, 50, 500)
The logic is simple, but I couldn't find any solution, because all the approaches I found require the values in b to be equally spaced!
How can I achieve this?

You can use sapply to loop over all values of a, find the index of the closest value in b with which.min(abs(x - b)), and use these indexes to extract the proper b values:
b[sapply(a, function(x) which.min(abs(x - b)))]
#> [1] 0 0 5 50 500

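This is a relatively simple approach: outer() builds the matrix of all differences between a and b, and which.min is applied to each row: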
b[apply(abs(outer(a, b, "-")), 1, which.min)]
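If b is sorted in ascending order, the nearest value changes exactly at the midpoints between consecutive elements of b, so findInterval() on those midpoints returns the position directly, without a loop or an n-by-m matrix. A minimal sketch, assuming b is sorted ascending; round_to_nearest is a hypothetical helper name:

```r
# Hypothetical helper: round each x to the nearest element of breaks.
# Assumes breaks is sorted in ascending order.
round_to_nearest <- function(x, breaks) {
  # Midpoints between consecutive elements: the boundaries where the
  # nearest element switches from breaks[i] to breaks[i + 1]
  mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
  # findInterval() counts how many midpoints are <= x; adding 1 gives
  # the index of the nearest element of breaks
  breaks[findInterval(x, mids) + 1]
}

a <- c(0.505, 1.555, 2.667, 53.850, 411.793)
b <- c(0, 5, 10, 50, 100, 200, 500)
round_to_nearest(a, b)
#> [1]   0   0   5  50 500
```

One difference from the which.min answers: a value lying exactly on a midpoint (for example 2.5) goes to the upper element here, whereas which.min returns the first, lower one.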

Related

Find the values that are immediately below a given set of values and return the entry from another variable

I have two data frames:
a <- c(10, 20, 30)
c <- c(1, 50, 100)
df1 <- data.frame(a, c)  # other columns of df1 (e.g. b) omitted
x <- c(80, 30, 15)
z <- c(10, 46, 99)
df2 <- data.frame(x, z)  # other columns of df2 (e.g. y) omitted
I want to find the values in c that are immediately below the values in z, and then return the equivalent values in a.
So matching z to c will give me the locations 1, 1, 2, and I want to output the values of a at those locations (i.e. 10, 10, 20).
Edit: For each value in z, I want to find the location of the value immediately below it in c, then return the value in a at that location.
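You can use outer with the comparison <. Each column of the resulting logical matrix corresponds to one value of z, so colSums counts the TRUEs, i.e. how many values of c are below that z. Provided df1 is ordered on c, that count is the position of the value immediately below z: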
colSums(outer(df1$c, df2$z, `<`))
#[1] 1 1 2
or
df1$a[colSums(outer(df1$c, df2$z, `<`))]
#[1] 10 10 20
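An alternative that avoids building the length(c)-by-length(z) matrix is findInterval(). A minimal sketch, assuming df1 is sorted ascending on c as in the answer above; it also guards against a z that has no value of c below it, where the count is 0 and indexing with 0 would silently drop that row from the result:

```r
df1 <- data.frame(a = c(10, 20, 30), c = c(1, 50, 100))
df2 <- data.frame(x = c(80, 30, 15), z = c(10, 46, 99))

# findInterval() requires df1$c to be sorted in ascending order.
# With left.open = TRUE it returns, for each z, the number of values of c
# strictly below z, matching the `<` comparison in the outer() answer.
idx <- findInterval(df2$z, df1$c, left.open = TRUE)

# An index of 0 means no value of c is below z; map it to NA so the
# result keeps one entry per row of df2 instead of dropping it.
idx[idx == 0] <- NA
df1$a[idx]
#> [1] 10 10 20
```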

Calculate length of repeats along a vector in R

Is there an efficient way to calculate the length of portions of a vector that repeat a specified value?
For instance, I want to calculate the length of rainless periods along a vector of daily rainfall values:
daily_rainfall=c(15, 2, 0, 0, 0, 3, 3, 0, 0, 10)
Besides the obvious but clunky approach of looping through the vector, what is a cleaner way to get the desired answer of
rainless_period_length=c(3, 2)
given the vector above?
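R has a built-in function for this, rle ("run-length encoding"). Its lengths component holds the length of each run and values the repeated value, so keeping the lengths of the runs whose value is 0 gives the rainless periods: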
daily_rainfall <- c(15, 2, 0, 0, 0, 3, 3, 0, 0, 10)
runs <- rle(daily_rainfall)
rainless_period_length <- runs$lengths[runs$values == 0]
rainless_period_length
output:
[1] 3 2
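rle() can also tell you where each dry spell starts and ends, because the cumulative sum of the run lengths gives the position at which each run ends. A minimal sketch, assuming a dry day is one with exactly 0 rainfall; dry_spells is an illustrative name:

```r
daily_rainfall <- c(15, 2, 0, 0, 0, 3, 3, 0, 0, 10)

# Encode dry (TRUE) / wet (FALSE) days instead of the raw amounts, so runs
# depend only on whether it rained, not on repeated rainfall values
runs <- rle(daily_rainfall == 0)

ends <- cumsum(runs$lengths)        # index of the last day of each run
starts <- ends - runs$lengths + 1   # index of the first day of each run

dry_spells <- data.frame(start  = starts[runs$values],
                         end    = ends[runs$values],
                         length = runs$lengths[runs$values])
dry_spells
#>   start end length
#> 1     3   5      3
#> 2     8   9      2
```

If the series contains missing days, daily_rainfall == 0 is NA for them and the indexing above produces NA rows, so decide how to treat NA before calling rle().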

R solver optimization

I am new to solvers in R and would like a simple example in R for the problem below.
I have four columns, and I calculate the sum of each column, as in the sample example below:
The problem I want to solve in R:
Find the optimal set of rows that simultaneously satisfies the following:
For the first two columns (a, b), the individual sums should be as close to 0 as possible
The sums of (c, d) should be as close to 5 as possible
I have no restriction on which solver package to use. An example in R code would be helpful!
EDIT
For the same solution I would like to apply some rules:
I want sum(c) > sum(d) AND sum(d) < (a static number, like 5)
Also, if I want the sums to fall within a range of numbers rather than match fixed numbers, how could the solution be written?
Using M, defined reproducibly in the Note at the end, we find the 0/1 vector b that minimizes the following objective:
sum((b %*% M - c(0, 0, 5, 5))^2)
1) CVXR
Using the CVXR package, we get the solution c(1, 0, 0, 1, 1), which means choose rows 1, 4 and 5.
library(CVXR)
n <- nrow(M)
b <- Variable(n, boolean = TRUE)
pred <- t(b) %*% M
y <- c(0, 0, 5, 5)
objective <- Minimize(sum((t(y) - pred)^2))
problem <- Problem(objective)
soln <- solve(problem)
bval <- soln$getValue(b)
zapsmall(c(bval))
## [1] 1 0 0 1 1
2) Brute force
Alternatively, since there are only 5 rows, there are only 2^5 = 32 possible solutions, so we can try them all and pick the one that minimizes the objective. First we compute a matrix solns with 2^5 columns, each column being one possible solution. Then we compute the objective function for each column and take the column that minimizes it.
n <- nrow(M)
inverse.which <- function(ix, n) replace(integer(n), ix, 1)
L <- lapply(0:n, function(i) apply(combn(n, i), 2, inverse.which, n))
solns <- do.call(cbind, L)
pred <- t(t(solns) %*% M)
obj <- colSums((pred - c(0, 0, 5, 5))^2)
solns[, which.min(obj)]
## [1] 1 0 0 1 1
Note
M <- matrix(c(.38, -.25, .78, .83, -.65,
.24, -.35, .44, -.88, .15,
3, 5, 13, -15, 18,
18, -7, 23, -19, 7), 5)
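The rules in the EDIT can be added to the brute-force approach by discarding the candidate selections that violate them before taking the minimum. A minimal sketch, assuming sum(c) and sum(d) are the column sums of the selected rows in the third and fourth columns of M; the range bounds mentioned in the comment are illustrative only:

```r
M <- matrix(c(.38, -.25, .78, .83, -.65,
              .24, -.35, .44, -.88, .15,
              3, 5, 13, -15, 18,
              18, -7, 23, -19, 7), 5)
n <- nrow(M)

# All 2^n candidate 0/1 row selections, one per column
solns <- t(as.matrix(expand.grid(rep(list(0:1), n))))

# Column sums of (a, b, c, d) for every candidate: a 4 x 2^n matrix
pred <- t(t(solns) %*% M)

# Rules from the EDIT: sum(c) > sum(d) and sum(d) < 5.
# A range works the same way, e.g. add & pred[1, ] >= -0.5 & pred[1, ] <= 0.5
feasible <- pred[3, ] > pred[4, ] & pred[4, ] < 5
if (!any(feasible)) stop("no selection of rows satisfies the constraints")

obj <- colSums((pred - c(0, 0, 5, 5))^2)
obj[!feasible] <- Inf  # rule out candidates that break the rules
best <- unname(solns[, which.min(obj)])
best
```

Because every rule is just a filter on the candidate sums, strict inequalities and ranges are easy to express here; the cost is that the number of candidates doubles with each additional row.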

R: selecting items matching criteria from a vector

I have a numeric vector in R that contains both negative and positive numbers. I want to separate the numbers in the vector by sign (ignoring zero for now) into two separate vectors:
a new vector containing only the negative numbers
another vector containing only the positive numbers
The documentation shows how to do this for selecting rows/columns/cells in a data frame, but this doesn't work with vectors AFAICT.
How can it be done (without a for loop)?
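It is done very easily with logical indexing. The check for NaN is needed because NaN > 0 evaluates to NA, and indexing with NA puts an NA element in the result: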
d <- c(1, -1, 3, -2, 0, NaN)
positives <- d[d>0 & !is.nan(d)]
negatives <- d[d<0 & !is.nan(d)]
If you want to exclude both NA and NaN, use is.na(), which returns TRUE for both:
d <- c(1, -1, 3, -2, 0, NaN, NA)
positives <- d[d>0 & !is.na(d)]
negatives <- d[d<0 & !is.na(d)]
It can be done with square-bracket indexing.
The comparison d_vector > 0 produces a logical vector that is TRUE where the element is positive. Indexing the original vector with that logical vector in square brackets keeps only the elements where it is TRUE, giving the numeric values themselves.
d_vector <- c(1, 2, 3, -1, -2, -3)
new_vector <- d_vector > 0            # logical: TRUE where positive
pos_vector <- d_vector[new_vector]    # 1 2 3
new1_vector <- d_vector < 0           # logical: TRUE where negative
neg_vector <- d_vector[new1_vector]   # -1 -2 -3
The purrr package includes some useful functions for filtering vectors:
library(purrr)
test_vector <- c(-5, 7, 0, 5, -8, 12, 1, 2, 3, -1, -2, -3, NA, Inf, -Inf, NaN)
positive_vector <- keep(test_vector, function(x) x > 0)
positive_vector
# [1] 7 5 12 1 2 3 Inf
negative_vector <- keep(test_vector, function(x) x < 0)
negative_vector
# [1] -5 -8 -1 -2 -3 -Inf
You can also use the discard function, which removes the elements for which the predicate is TRUE.
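Base R also has functions that skip NA and NaN without an explicit is.na() check. A minimal sketch using subset() and Filter(); the latter is the base R counterpart of purrr::keep():

```r
d <- c(1, -1, 3, -2, 0, NaN, NA)

# subset() on a vector keeps elements where the condition is TRUE and
# drops those where it is NA (which is what NA > 0 and NaN > 0 give)
positives <- subset(d, d > 0)
negatives <- subset(d, d < 0)

# Filter() applies the predicate to each element and keeps those where
# it returned TRUE; elements giving NA are dropped as well
positives_f <- Filter(function(x) x > 0, d)
negatives_f <- Filter(function(x) x < 0, d)

positives
#> [1] 1 3
negatives
#> [1] -1 -2
```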

R: find nearest index

I have two vectors with a few thousand points, but generalized here:
A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)
How can I get the indices of A that are nearest to each value of b? The expected outcome would be c(1, 2, 2).
I know that findInterval only returns the interval each value falls into (the index of the element at or below it), not the nearest element, and I'm aware that which.min(abs(b[2] - A)) is getting warmer, but I can't figure out how to vectorize it to work with long vectors of both A and b.
You can just put your code in sapply. I think this has about the same speed as a for loop, so it isn't technically vectorized, though:
sapply(b, function(x) which.min(abs(x - A)))
findInterval gets you very close. You just have to choose between the index it returns and the next one. This assumes vec is sorted in ascending order and has at least two elements:
# returns the index of the element of vec nearest to each x
nearest.vec <- function(x, vec)
{
  smallCandidate <- findInterval(x, vec, all.inside = TRUE)
  largeCandidate <- smallCandidate + 1
  # nudge is TRUE if the larger candidate is nearer, FALSE otherwise
  nudge <- 2 * x > vec[smallCandidate] + vec[largeCandidate]
  return(smallCandidate + nudge)
}
nearest.vec(b, A)
This returns c(1, 2, 2), and should be comparable to findInterval itself in performance.
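Here's a solution that uses R's often overlooked outer function. I'm not sure it will perform better, but it does avoid sapply. Note that it builds a length(A)-by-length(b) distance matrix, which gets large when both vectors have thousands of elements.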
A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)
dist <- abs(outer(A, b, '-'))
result <- apply(dist, 2, which.min)
# [1] 1 2 2
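The findInterval approach needs A to be sorted. If it is not, one option is to sort it with order() and map the result back to the original positions. A minimal sketch, assuming vec has at least two elements and no NA; nearest_index is a hypothetical helper name. It also compares the result with the sapply/which.min answer on random data:

```r
# Hypothetical helper: index of the element of vec nearest to each x.
# Unlike nearest.vec above, vec need not be sorted.
nearest_index <- function(x, vec) {
  o <- order(vec)           # positions of vec in ascending order
  sorted <- vec[o]
  i <- findInterval(x, sorted, all.inside = TRUE)
  # Move to the upper neighbour when x lies past the midpoint
  i <- i + (2 * x > sorted[i] + sorted[i + 1])
  o[i]                      # map back to positions in the original vec
}

nearest_index(c(13, 17, 20), c(10, 20, 30, 40, 50))
#> [1] 1 2 2
nearest_index(c(13, 17, 20), c(50, 10, 40, 20, 30))
#> [1] 2 4 4

# Consistency check against the sapply()/which.min() answer
set.seed(1)
A <- runif(5000, 0, 1000)
b <- runif(2000, -10, 1010)
identical(nearest_index(b, A), sapply(b, function(x) which.min(abs(x - A))))
```

If vec contains duplicated values, the index returned for those ties may differ from the first occurrence that which.min reports.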

Resources