Remap vector of values between -1 to 1 - math

I have two sampled vectors. The first vector's maximum value is 0.8 and its minimum value is -0.8. The second vector's minimum value is 0.2 and its maximum value is 0.3. For both the first and the second vector, I want to remap all of the sampled values to the range -1 to 1. How do I do that? I'm looking for a method that works for both vectors outlined below. Thank you in advance.
Sampled vector 1.
[ -0.8, 0.7 , -0.23, 0.56, 0.456, -0.344, -0.75, 0.8]
Sampled vector 2.
[ 0.2, 0.23, 0.21, 0.29, 0.26, 0.25, 0.3]

General formula to map an xmin..xmax range onto a new_min..new_max range:
X' = new_min + (new_max - new_min) * (X - xmin) / (xmax - xmin)
For the destination range -1..1:
X' = -1 + 2 * (X - xmin) / (xmax - xmin)
For the source range 0.2..0.3:
X' = -1 + 20 * (X - 0.2) = -5 + 20 * X
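A minimal R sketch (the helper name remap is my own) applying this formula to both sample vectors from the question:
# Linearly rescale a vector from its own min/max onto [new_min, new_max]
remap <- function(x, new_min = -1, new_max = 1) {
  new_min + (new_max - new_min) * (x - min(x)) / (max(x) - min(x))
}
v1 <- c(-0.8, 0.7, -0.23, 0.56, 0.456, -0.344, -0.75, 0.8)
v2 <- c(0.2, 0.23, 0.21, 0.29, 0.26, 0.25, 0.3)
remap(v1)  # the endpoints -0.8 and 0.8 map to -1 and 1
remap(v2)  # 0.2 maps to -1, 0.3 maps to 1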

You are trying to normalize to [-1, 1].
Obtain the ratio:
norm_ratio = 1 / 0.8
and multiply every element in your vector by it to get the desired result.
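For example (a minimal sketch of my own), this works directly for sampled vector 1, which is already symmetric about 0; vector 2 needs the range-based formula above:
v1 <- c(-0.8, 0.7, -0.23, 0.56, 0.456, -0.344, -0.75, 0.8)
norm_ratio <- 1 / 0.8
v1 * norm_ratio  # now spans exactly -1 to 1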

Related

How do I rescale a range of numbers with these constraints? (R)

I need to rescale a series of numbers with certain constraints.
Let's say I have a vector like this:
x <- c(0.5, 0.3, 0.6, 0.4, 0.9, 0.1, 0.2, 0.3, 0.6)
1. The sum of x must be 6. Right now the sum of x = 3.9.
2. The numbers cannot be lower than 0.
3. The numbers cannot be higher than 1.
I know how to do 1 and 2+3 separately, but not together.
How do I rescale this?
EDIT: As attempted by r2evans, preferably the relative relationships of the numbers are preserved.
I don't know that this can be done with a simple expression, but we can optimize our way through it:
opt <- optimize(function(z) abs(6 - sum(z + (1 - z) * (x - min(x)) / diff(range(x)))),
                lower = 0, upper = 1)
opt
# $minimum
# [1] 0.2380955
# $objective
# [1] 1.257898e-06
out <- ( opt$minimum + (1-opt$minimum) * (x - min(x)) / diff(range(x)) )
out
# [1] 0.6190477 0.4285716 0.7142858 0.5238097 1.0000000 0.2380955 0.3333335 0.4285716 0.7142858 1.0000000
sum(out)
# [1] 6.000001
Because that is not perfectly 6, we can do one more step to safeguard it:
out <- out * 6/sum(out)
out
# [1] 0.6190476 0.4285715 0.7142857 0.5238096 0.9999998 0.2380954 0.3333335 0.4285715 0.7142857 0.9999998
sum(out)
# [1] 6
This process preserves the relative relationships of the numbers. If there are more "low" numbers than "high" numbers, scaling so that the sum is 6 will bring the higher numbers above 1. To compensate for that, we shift the lower-end (z in my code), so that all numbers are nudged up a little (but the lower numbers will be nudged up proportionately more).
The results should always be that the numbers are in [opt$minimum,1], and the sum will be 6.
Should be possible with a while loop to increase the values of x (to an upper limit of 1)
x <- c(0.5, 0.3, 0.6, 0.4, 0.9, 0.1, 0.2, 0.3, 0.6)
current_sum <- sum(x)
target_sum <- 6
while (current_sum != target_sum) {
  print(current_sum)
  perc_diff <- (target_sum - current_sum) / target_sum
  x <- x * (1 + perc_diff)
  x[which(x > 1)] <- 1
  current_sum <- sum(x)
}
x <- c(0.833333333333333, 0.5, 1, 0.666666666666667, 1, 0.166666666666667,
0.333333333333333, 0.5, 1)
There is likely a more mathematical way
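For completeness, a quick sanity check on the loop's result (my own addition):
sum(x)    # 6, by the loop's exit condition
range(x)  # all values remain within [0, 1]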

Is there a way in Base R to replicate what VLOOKUP TRUE in Excel does?

I have a consumption pattern that looks like this:
x <- 0:10
y <- c(0, 0.05, 0.28, 0.45, 0.78, 0.86, 0.90, 0.92, 0.95, 0.98, 1.00)
X is in years, and Y is not always monotonically-increasing, although it should be most of the time.
If I needed to estimate how many years would elapse before 80% is consumed, in Excel I would use the VLOOKUP function with TRUE (approximate match), which would return 78%; then I would look up the next value in the series (86%) and linearly interpolate to get 4.25 years. It's laborious, but it gets the job done.
Is there an easy way to compute this in R, in a user-defined function that I can apply to many cases?
Thanks!
x <- 0:10
y <- c(0, 0.05, 0.28, 0.45, 0.78, 0.86, 0.90, 0.92, 0.95, 0.98, 1.00)
estimate_years <- function(x, y, percent) {
  idx <- max(which(y < percent))
  (percent - y[idx]) / (y[idx + 1] - y[idx]) * (x[idx + 1] - x[idx]) + x[idx]
}
estimate_years(x, y, 0.80) ## 4.25
Although the approx calculation is cool, exact linear interpolation is easy here.
idx is the position of the next smaller value of y (and the corresponding x); idx+1 is thus the next equal-or-bigger position of y (and x) relative to percent.
Through a triangular calculation, where
k = part / total
which here is
k = (percent - y[idx]) / (y[idx+1] - y[idx])
and applying k * total_x, represented here by k * (x[idx+1] - x[idx]) (the linear interpolation step), and then adding the last smaller years value x[idx], we obtain the result.
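Plugging in the numbers from the question for percent = 0.80 (a worked example of my own):
idx = 5, since y[5] = 0.78 is the largest value below 0.80
k = (0.80 - 0.78) / (0.86 - 0.78) = 0.25
result = x[5] + k * (x[6] - x[5]) = 4 + 0.25 * 1 = 4.25 years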
You could try with approx
resolution <- 1000
fn <- approx(x, y, n=resolution)
min(fn$x[fn$y > 0.8])
[1] 4.254254
The more precise you need the estimate to be, the higher the number you should use for resolution.
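For example (my own quick check), a finer grid narrows the gap to the exact 4.25:
fn_fine <- approx(x, y, n = 100000)
min(fn_fine$x[fn_fine$y > 0.8])
# approximately 4.25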

Finding nth percentile in a matrix with conditions

I have a matrix measuring 100 rows x 10 columns:
mat1 = matrix(1:1000, nrow = 100, ncol = 10)
I wish to find the nth percentile of each column using colQuantiles, where the nth percentile is equal to the corresponding probability value in Probs, except when a value in Probs is greater than 0.99, in which case I want the value 0.99 applied.
Probs = c(0.99, 0.95, 1, 1, 0.96, 0.92, 1, 0.98, 0.99, 1)
I have tried the following:
Res = ifelse(Probs > 0.99, colQuantiles(mat1, Probs = c(0.99)), colQuantiles(mat1, probs = Probs))
But this simply returns the if-true part of the above statement for all ten columns of mat1, presumably because at least one of the values in Probs is > 0.99. How can I adapt the above so it treats each column of mat1 individually according to the probabilities in Probs?
You can use mapply as follows:
Probs[Probs > 0.99] <- 0.99
unname(mapply(function(x, p) quantile(x, p),
              split(mat1, rep(1:ncol(mat1), each = nrow(mat1))),
              Probs))
output:
[1] 99.01 195.05 299.01 399.01 496.04 592.08 699.01 798.02 899.01 999.01
It splits the matrix into a set of column vectors (see How to convert a matrix to a list of column-vectors in R?) and then finds the nth percentile for each column.
We cannot pass a different probability for each column to colQuantiles, but we can get all the probabilities for every column using colQuantiles:
temp <- matrixStats::colQuantiles(mat1, probs = pmin(Probs, 0.99))
and then extract the diagonal of the matrix to get the quantile at the required probability for each column:
diag(temp)
#[1] 99.01 195.05 299.01 399.01 496.04 592.08 699.01 798.02 899.01 999.01

Generating random variables with specific correlation threshold value

I am generating random variables with a specified range and dimension. I have written the following code for this:
generateRandom <- function(size, scale) {
  result <- round(runif(size, 1, scale), 1)
  return(result)
}
flag <- TRUE
x <- generateRandom(300, 6)
y <- generateRandom(300, 6)
while (flag) {
  corrXY <- cor(x, y)
  if (corrXY >= 0.2) {
    flag <- FALSE
  } else {
    x <- generateRandom(300, 6)
    y <- generateRandom(300, 6)
  }
}
I want 6 such variables of size 300, each on a scale of 1 to 6 except for one variable, which would have a scale of 1 to 7, with the following correlation structure among them:
1     0.45  -0.35 0.46  0.25  0.3
      1     0.25  0.29  0.5   -0.3
            1     -0.3  0.1   0.4
                  1     0.4   0.6
                        1     -0.4
                              1
But when I try to increase the threshold value, my program gets very slow. Moreover, I want more than 7 variables of size 300, and between each pair of those variables I want some specific correlation threshold. How would I do this efficiently?
This answer is directly inspired by here and there.
We would like to generate 300 samples of a 6-variate uniform distribution with correlation structure equal to
Rhos <- matrix(0, 6, 6)
Rhos[lower.tri(Rhos)] <- c(0.450, -0.35, 0.46, 0.25, 0.3,
                           0.25, 0.29, 0.5, -0.3, -0.3,
                           0.1, 0.4, 0.4, 0.6, -0.4)
Rhos <- Rhos + t(Rhos)
diag(Rhos) <- 1
We first generate from this correlation structure the correlation structure of the Gaussian copula:
Copucov <- 2 * sin(Rhos * pi/6)
This matrix is not positive definite, so we use the nearest positive definite matrix instead:
library(Matrix)
Copucov <- cov2cor(nearPD(Copucov)$mat)
This correlation structure can be used as one of the inputs of MASS::mvrnorm:
library(MASS)
G <- mvrnorm(n = 300, mu = rep(0, 6), Sigma = Copucov, empirical = TRUE)
We then transform G into a multivariate uniform sample whose values range from 1 to 6, except for the last variable which ranges from 1 to 7:
U <- matrix(NA, 300, 6)
U[, 1:5] <- 5 * pnorm(G[, 1:5]) + 1
U[, 6] <- 6 * pnorm(G[, 6]) + 1
After rounding (and taking the nearest positive definite matrix to the copula's covariance matrix, etc.), the correlation structure is not changed much:
Ur <- round(U, 1)
cor(Ur)
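As a rough check (my own addition; the exact figures depend on the random draw), you can compare the achieved correlations with the target Rhos:
max(abs(cor(Ur) - Rhos))  # typically a small deviation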

R Generic solution to create 2*2 confusion matrix

My question is related to this one on producing a confusion matrix in R with the table() function. I am looking for a solution without using a package (e.g. caret).
Let's say these are our predictions and labels in a binary classification problem:
predictions <- c(0.61, 0.36, 0.43, 0.14, 0.38, 0.24, 0.97, 0.89, 0.78, 0.86, 0.15, 0.52, 0.74, 0.24)
labels <- c(1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0)
For these values, the solution below works well to create a 2*2 confusion matrix for, let's say, threshold = 0.5:
# Confusion matrix for threshold = 0.5
conf_matrix <- as.matrix(table(predictions>0.5,labels))
conf_matrix
       labels
        0 1
  FALSE 4 3
  TRUE  2 5
However, I do not get a 2*2 matrix if I select any value that is smaller than min(predictions) or larger than max(predictions), since the data won't then have either a FALSE or a TRUE occurrence, e.g.:
conf_matrix <- as.matrix(table(predictions>0.05,labels))
conf_matrix
      labels
       0 1
  TRUE 6 8
I need a method that consistently produces a 2*2 confusion matrix for all possible thresholds (decision boundaries) between 0 and 1, as I use this as an input in an optimisation. Is there a way I can tweak the table function so it always returns a 2*2 matrix here?
You can make your thresholded prediction a factor variable to achieve this:
(conf_matrix <- as.matrix(table(factor(predictions>0.05, levels=c(F, T)), labels)))
#        labels
#         0 1
#   FALSE 0 0
#   TRUE  6 8
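If it helps for the optimisation loop, here is a small wrapper along the same lines (the name confusion_2x2 is my own); fixing the label levels as well guards against label vectors that contain only one class:
confusion_2x2 <- function(predictions, labels, threshold) {
  as.matrix(table(factor(predictions > threshold, levels = c(FALSE, TRUE)),
                  factor(labels, levels = c(0, 1))))
}
confusion_2x2(predictions, labels, 0.5)   # same counts as the 2*2 matrix above
confusion_2x2(predictions, labels, 0.05)  # still 2*2, now with a zero FALSE row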
