Rescale a vector to a specific average in R

I have a simple vector, for instance:
a <- c(-1.02, 2.25, 9.12, -2.09, 0.02)
I need to rescale it so that its average is 100, but I can't work out how to do it.
I tried the scale() function to rescale the values, but it doesn't let me specify the target mean.
When I calculate the mean of the output vector, I want to get 100.
Thanks in advance for your help!

What about:
rescaled <- a/mean(a)*100
rescaled
[1] -61.594203 135.869565 550.724638 -126.207729 1.207729
mean(rescaled)
[1] 100

scale() centers to a mean of 0 and, by default, also scales to unit standard deviation (with scale = FALSE it only centers). So just add the desired mean to scale(a) to get a vector with the new mean:
b1 <- c(scale(a)) + 100
mean(b1)
#[1] 100
b1
#[1] 99.40138 100.13288 101.66969 99.16202 99.63403
b2 <- c(scale(a, scale = FALSE)) + 100
mean(b2)
#[1] 100
b2
#[1] 97.324 100.594 107.464 96.254 98.364
Note that b2 is equal to
a - mean(a) + 100
#[1] 97.324 100.594 107.464 96.254 98.364
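
If you need this more than once, you could wrap both variants in a small helper (the name rescale_to_mean below is just illustrative): the additive form shifts the values and preserves their differences, while the multiplicative form divides by the current mean, preserves their ratios, and requires mean(x) != 0.
rescale_to_mean <- function(x, target = 100, multiplicative = FALSE) {
  # multiplicative: divide by the current mean (needs mean(x) != 0)
  # additive: shift the values so their differences are kept
  if (multiplicative) x / mean(x) * target else x - mean(x) + target
}
mean(rescale_to_mean(a))                         # 100 (additive, like b2)
mean(rescale_to_mean(a, multiplicative = TRUE))  # 100 (like a/mean(a)*100)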

We can use
library(scales)
rescale(a, to = c(0, mean(a))) * 100

Perhaps you can use scale like below
scale(a) + 100
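Note that scale() returns a one-column matrix (with centering/scaling attributes), so if you want a plain numeric vector, wrap the result in c() or as.vector(), as the earlier answer does:
as.vector(scale(a) + 100)
#[1]  99.40138 100.13288 101.66969  99.16202  99.63403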

Related

Gamma Distribution in R (qgamma)

This may be a very stupid question, but does anyone know why the second call is not giving me the mean (100)?
#beta=4, alpha=5, mean=20
qgamma(0.5, 5, 1/4)
# 18.68364
#beta=2500, alpha=0.04, mean=100
qgamma(0.5,0.04,1/2500)
# 0.00004320412
It is because you are using the quantile function: qgamma(0.5, shape, rate) gives the median, not the mean you are expecting.
See the example below;
x <- rgamma(50000, shape = 0.04, scale = 2500)
mean(x)
# [1] 98.82911
median(x)
# [1] 3.700012e-05
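
For reference, the mean of a gamma distribution is shape * scale (equivalently shape / rate), so you can sanity-check both parameterisations directly; a small sketch in base R:
5 * 4        # mean of Gamma(shape = 5, scale = 4): 20
0.04 * 2500  # mean of Gamma(shape = 0.04, scale = 2500): 100
# the second call from the question, with the scale argument named explicitly:
qgamma(0.5, shape = 0.04, scale = 2500)
# [1] 4.320412e-05   (the median, far below the mean of 100)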

Adjust vector subset values according to certain rules

I am trying to code a function that converts Chinese numerals to Arabic.
The idea is to get a set of scaled_digit vectors and a set of scale_factor vectors, multiply and add them up to generate the desired output.
digit=c('〇'=0,'一'=1,'二'=2,'三'=3,'四'=4,'五'=5,'六'=6,'七'=7,'八'=8,'九'=9)
scale=c('十'=10^1,'百'=10^2,'千'=10^3,'萬'=10^4,'億'=10^8,'兆'=10^12)
One of the problems I encounter is this:
When I have a number that is a few hundred 10^4s (萬) and a few thousand 10^12s (兆), I am left with a scale-factor vector that is as follows:
scale_factor=
structure(c(1000, 1e+12, 100, 10, 10000, 1000, 100, 10), .Names = c("千",
"兆", "百", "十", "萬", "千", "百", "十"))
[千] 兆 [百] [十] 萬 千 百 十
[1e+03] 1e+12 [1e+02] [1e+01] 1e+04 1e+03 1e+02 1e+01
Scale factors to be adjusted have been marked by [ ].
The positions of the cardinal scale_factors can be found with the following code:
cardinal_scale_factor=which(diff(scale_factor)>=0, T)+1
兆 萬
2 5
How do I code so that scale_factor[1] and scale_factor[3:4] can be multiplied by scale_factor[2] and [5] respectively?
Expected result:
[千] 兆 [百] [十] 萬 千 百 十
[1e+15] 1e+12 [1e+06] [1e+05] 1e+04 1e+03 1e+02 1e+01
A possible solution:
w1 <- which(rev(cummax(rev(scale_factor)) > rev(scale_factor)))
grp <- cumsum(c(1,diff(w1)) > 1) + 1
w2 <- aggregate(w1, list(grp), max)[[2]] + 1
scale_factor[w1] <- scale_factor[w1] * scale_factor[w2][grp]
which gives:
> scale_factor
千 兆 百 十 萬 千 百 十
1e+15 1e+12 1e+06 1e+05 1e+04 1e+03 1e+02 1e+01
What this does:
With cummax(rev(scale_factor)) you get the cumulative maximum of the reversed scale.
Comparing that with the reversed scale (cummax(rev(scale_factor)) > rev(scale_factor)) gives a logical vector.
Wrapping the logical vector from step 2 in rev and then which, you get an index vector w1 of the positions that do not conform to the decreasing condition.
With cumsum(c(1,diff(w1)) > 1) + 1 you group these positions; in the example data the 3rd and 4th values form one group.
With aggregate(w1, list(grp), max)[[2]] + 1 you determine the positions of the multipliers.
Finally, you multiply the values in scale_factor as determined in w1 with the multipliers from w2. You need to index w2 with the group numbers from grp.
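
Putting the pieces together on the example data, with the intermediate objects shown as comments (a sketch you can paste to verify each step):
scale_factor <- structure(c(1000, 1e+12, 100, 10, 10000, 1000, 100, 10),
                          .Names = c("千", "兆", "百", "十", "萬", "千", "百", "十"))
w1  <- which(rev(cummax(rev(scale_factor)) > rev(scale_factor)))  # positions 1 3 4
grp <- cumsum(c(1, diff(w1)) > 1) + 1                             # groups    1 2 2
w2  <- aggregate(w1, list(grp), max)[[2]] + 1                     # multipliers at 2 5
scale_factor[w1] <- scale_factor[w1] * scale_factor[w2][grp]
scale_factor
#    千    兆    百    十    萬    千    百    十
# 1e+15 1e+12 1e+06 1e+05 1e+04 1e+03 1e+02 1e+01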

Mode of density function using optimize

I want to find the mode (x-value) of a univariate density function using R's optimize function.
E.g., for a normal density f(x) ~ N(3, 1), the mode should equal the mean, i.e. x = 3.
I tried the following:
# Define the function
g <- function(x) dnorm(x = x, mean = 3, sd = 1)
Dvec <- c(-1000, 1000)
# First get the gradient of the function (grad() is from the numDeriv package; it is not actually used below)
library(numDeriv)
gradfun <- function(x){grad(g, x)}
# Find the maximum value
x_mode <- optimize(f=g,interval = Dvec, maximum=TRUE)
x_mode
This gives an incorrect value for the mode:
$maximum
[1] 999.9999
$objective
[1] 0
which is just the upper end of the (-1000, 1000) interval rather than x = 3.
Could anyone please help fix the optimisation code?
Once this simple test case works, it will be used with more general functions of x.
I would use optim for this, without specifying an interval. You can choose a good starting value by taking the maximum of the function over the originally guessed interval:
guessedInterval = min(Dvec):max(Dvec)
superStarSeed = guessedInterval[which.max(g(guessedInterval))]
optim(par=superStarSeed, fn=function(y) -g(y))
#$par
#[1] 3
#$value
#[1] -0.3989423
#$counts
#function gradient
# 24 NA
#$convergence
#[1] 0
#$message
#NULL
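
For what it's worth, the reason optimize drifts to the boundary is that dnorm(x, 3, 1) is numerically zero over almost all of (-1000, 1000), so the golden-section search sees a flat function. If you prefer to keep optimize, seeding a much narrower interval from a coarse grid works too; a minimal sketch (the grid step of 1 and the half-width of 5 are arbitrary choices):
xs   <- seq(min(Dvec), max(Dvec), by = 1)   # coarse grid over the guessed interval
peak <- xs[which.max(g(xs))]                # grid point where g is largest (here 3)
optimize(g, interval = c(peak - 5, peak + 5), maximum = TRUE)
# $maximum is approximately 3, $objective approximately 0.3989 (i.e. dnorm(3, 3, 1))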

How to obtain the density estimation of a specific value in stats::density?

Suppose I have data like the following:
val <- .65
set.seed(1)
distr <- replicate(1000, jitter(.5, amount = .2))
d <- density(distr)
Since stats::density uses a specific bw, it does not include all possible values in the interval (because they're infinite):
d$x[ d$x > .64 & d$x < .66 ]
[1] 0.6400439 0.6411318 0.6422197 0.6433076 0.6443955 0.6454834 0.6465713 0.6476592 0.6487471
[10] 0.6498350 0.6509229 0.6520108 0.6530987 0.6541866 0.6552745 0.6563624 0.6574503 0.6585382
[19] 0.6596261
I would like to find a way to provide val to the density function, so that it will return its d$y estimate (I will then use it to color areas of the density plot).
I can't guess how silly this question is, but I can't find a fast solution.
I thought of obtaining it by linear interpolation of the d$y values corresponding to the two values of d$x closest to val. Is there a faster way?
This illustrates the use of approxfun:
> Af <- approxfun(d$x, d$y)
> Af(val)
[1] 2.348879
> plot(d)
> points(val, Af(val))
> png(); plot(d); points(val, Af(val)); dev.off()
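
If you only need the estimate at a single point, approx() does the same linear interpolation in one call, without creating a function first:
approx(d$x, d$y, xout = val)$y
# same value as Af(val) above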

Binning data in R

I have a vector with around 4000 values. I would just need to bin it into 60 equal intervals for which I would then have to calculate the median (for each of the bins).
v<-c(1:4000)
v is really just a vector. I read about cut, but that needs me to specify the break points; I just want 60 equal intervals.
Use cut and tapply:
> tapply(v, cut(v, 60), median)
(-3,67.7] (67.7,134] (134,201] (201,268]
34.0 101.0 167.5 234.0
(268,334] (334,401] (401,468] (468,534]
301.0 367.5 434.0 501.0
(534,601] (601,668] (668,734] (734,801]
567.5 634.0 701.0 767.5
(801,867] (867,934] (934,1e+03] (1e+03,1.07e+03]
834.0 901.0 967.5 1034.0
(1.07e+03,1.13e+03] (1.13e+03,1.2e+03] (1.2e+03,1.27e+03] (1.27e+03,1.33e+03]
1101.0 1167.5 1234.0 1301.0
(1.33e+03,1.4e+03] (1.4e+03,1.47e+03] (1.47e+03,1.53e+03] (1.53e+03,1.6e+03]
1367.5 1434.0 1500.5 1567.0
(1.6e+03,1.67e+03] (1.67e+03,1.73e+03] (1.73e+03,1.8e+03] (1.8e+03,1.87e+03]
1634.0 1700.5 1767.0 1834.0
(1.87e+03,1.93e+03] (1.93e+03,2e+03] (2e+03,2.07e+03] (2.07e+03,2.13e+03]
1900.5 1967.0 2034.0 2100.5
(2.13e+03,2.2e+03] (2.2e+03,2.27e+03] (2.27e+03,2.33e+03] (2.33e+03,2.4e+03]
2167.0 2234.0 2300.5 2367.0
(2.4e+03,2.47e+03] (2.47e+03,2.53e+03] (2.53e+03,2.6e+03] (2.6e+03,2.67e+03]
2434.0 2500.5 2567.0 2634.0
(2.67e+03,2.73e+03] (2.73e+03,2.8e+03] (2.8e+03,2.87e+03] (2.87e+03,2.93e+03]
2700.5 2767.0 2833.5 2900.0
(2.93e+03,3e+03] (3e+03,3.07e+03] (3.07e+03,3.13e+03] (3.13e+03,3.2e+03]
2967.0 3033.5 3100.0 3167.0
(3.2e+03,3.27e+03] (3.27e+03,3.33e+03] (3.33e+03,3.4e+03] (3.4e+03,3.47e+03]
3233.5 3300.0 3367.0 3433.5
(3.47e+03,3.53e+03] (3.53e+03,3.6e+03] (3.6e+03,3.67e+03] (3.67e+03,3.73e+03]
3500.0 3567.0 3633.5 3700.0
(3.73e+03,3.8e+03] (3.8e+03,3.87e+03] (3.87e+03,3.93e+03] (3.93e+03,4e+03]
3767.0 3833.5 3900.0 3967.0
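If you want to check how evenly cut has split the data, tabulate the bins first:
table(cut(v, 60))
# every bin holds 66 or 67 of the 4000 values, since the bin width is about 66.65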
In the past, I've used this function:
evenbins <- function(x, bin.count = 10, order = TRUE) {
  bin.size <- rep(length(x) %/% bin.count, bin.count)
  bin.size <- bin.size + ifelse(1:bin.count <= length(x) %% bin.count, 1, 0)
  bin <- rep(1:bin.count, bin.size)
  if (order) {
    bin <- bin[rank(x, ties.method = "random")]
  }
  return(factor(bin, levels = 1:bin.count, ordered = order))
}
and then I can run it with
v.bin <- evenbins(v, 60)
and check the sizes with
table(v.bin)
and see they all contain 66 or 67 elements. By default this orders the values, just as cut does, so each factor level contains increasing values. If you want to bin them based on their original order, use
v.bin <- evenbins(v, 60, order=F)
instead. This just splits the data up in the order it appears.
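To get the per-bin medians the question asks for, combine the bin factor with tapply:
tapply(v, v.bin, median)   # one median per bin, 60 values in total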
This result shows the 59 midpoints between consecutive break points. The bin widths are probably as close to equal as possible (but probably not exactly equal).
> sq <- seq(1, 4000, length = 60)
> sapply(2:length(sq), function(i) median(c(sq[i-1], sq[i])))
# [1] 34.88983 102.66949 170.44915 238.22881 306.00847 373.78814
# [7] 441.56780 509.34746 577.12712 644.90678 712.68644 780.46610
# ......
Actually, after checking, the bins are pretty darn close to being equal.
> unique(diff(sq))
# [1] 67.77966 67.77966 67.77966
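
Note that sq only gives the break points and their midpoints; if you want the medians of the data falling inside those intervals, you can reuse sq as explicit breaks for cut:
tapply(v, cut(v, breaks = sq, include.lowest = TRUE), median)
# 59 medians, one per interval between consecutive break points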
