Related
How do I scale a series such that the first number in the series is 0 and last number is 1. I looked into 'approx', 'scale' but they do not achieve this objective.
# generate series from exponential distr
s = sort(rexp(100))
# scale/interpolate 's' such that it starts at 0 and ends at 1?
# approx(s)
# scale(s)
The scales package has a function that will do this for you: rescale.
library("scales")
rescale(s)
By default, this scales the given range of s onto 0 to 1, but either or both of those can be adjusted. For example, if you wanted it scaled from 0 to 10,
rescale(s, to=c(0,10))
or if you wanted the largest value of s scaled to 1, but 0 (instead of the smallest value of s) scaled to 0, you could use
rescale(s, from=c(0, max(s)))
It's straight-forward to create a small function to do this using basic arithmetic:
s = sort(rexp(100))
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
range01(s)
[1] 0.000000000 0.003338782 0.007572326 0.012192201 0.016055006 0.017161145
[7] 0.019949532 0.023839810 0.024421602 0.027197168 0.029889484 0.033039408
[13] 0.033783376 0.038051265 0.045183382 0.049560233 0.056941611 0.057552543
[19] 0.062674982 0.066001242 0.066420884 0.067689067 0.069247825 0.069432174
[25] 0.070136067 0.076340460 0.078709590 0.080393512 0.085591881 0.087540132
[31] 0.090517295 0.091026499 0.091251213 0.099218526 0.103236344 0.105724733
[37] 0.107495340 0.113332392 0.116103438 0.124050331 0.125596034 0.126599323
[43] 0.127154661 0.133392300 0.134258532 0.138253452 0.141933433 0.146748798
[49] 0.147490227 0.149960293 0.153126478 0.154275371 0.167701855 0.170160948
[55] 0.180313542 0.181834891 0.182554291 0.189188137 0.193807559 0.195903010
[61] 0.208902645 0.211308713 0.232942314 0.236135220 0.251950116 0.260816843
[67] 0.284090255 0.284150541 0.288498370 0.295515143 0.299408623 0.301264703
[73] 0.306817872 0.307853369 0.324882091 0.353241217 0.366800517 0.389474449
[79] 0.398838576 0.404266315 0.408936260 0.409198619 0.415165553 0.433960390
[85] 0.440690262 0.458692639 0.464027428 0.474214070 0.517224262 0.538532221
[91] 0.544911543 0.559945121 0.585390414 0.647030109 0.694095422 0.708385079
[97] 0.736486707 0.787250428 0.870874773 1.000000000
Alternatively:
scale(x,center=min(x),scale=diff(range(x)))
(untested)
This has the feature that it attaches the original centering and scaling factors to the output as attributes, so they can be retrieved and used to un-scale the data later (if desired). It has the oddity that it always returns the result as a (columnwise) matrix, even if it was passed a vector; you can use drop(scale(...)) if you want a vector instead of a matrix (this usually doesn't matter but the matrix format can occasionally cause trouble downstream ... in my experience more often with tibbles/in tidyverse, although I haven't stopped to examine exactly what's going wrong in these cases).
This should do it:
reshape::rescaler.default(s, type = "range")
EDIT
I was curious about the performance of the two methods
> system.time(replicate(100, range01(s)))
user system elapsed
0.56 0.12 0.69
> system.time(replicate(100, reshape::rescaler.default(s, type = "range")))
user system elapsed
0.53 0.18 0.70
Extracting the raw code from reshape::rescaler.default
range02 <- function(x) {
(x - min(x, na.rm=TRUE)) / diff(range(x, na.rm=TRUE))
}
> system.time(replicate(100, range02(s)))
user system elapsed
0.56 0.12 0.68
Yields similar result.
You can also make use of the caret package which will provide you the preProcess function which is just simple like this:
preProcValues <- preProcess(yourData, method = "range")
dataScaled <- predict(preProcValues, yourData)
More details on the package help.
I created following function in r:
ReScale <- function(x,first,last){(last-first)/(max(x)-min(x))*(x-min(x))+first}
Here, first is start point, last is end point.
I generated a series of 10,000 random numbers through:
rand_x = rf(10000, 3, 5)
Now I want to produce another series that contains the variances at each point i.e. the column look like this:
[variance(first two numbers)]
[variance(first three numbers)]
[variance(first four numbers)]
[variance(first five numbers)]
.
.
.
.
[variance of 10,000 numbers]
I have written the code as:
c ( var(rand_x[1:1]) : var(rand_x[1:10000])
but I am only getting 157 elements in the column rather than not 10,000. Can someone guide what I am doing wrong here?
An option is to loop over the index from 2 to 10000 in sapply, extract the elements of 'rand_x' from position 1 to the looped index, apply the var and return a vector of variance output
out <- sapply(2:10000, function(i) var(rand_x[1:i]))
Your code creates a sequence incrementing by one with the variance of the first two elements as start value and the variance of the whole vector as limit.
var(rand_x[1:2]):var(rand_x[1:n])
# [1] 0.9026262 1.9026262 2.9026262
## compare:
.9026262:3.33433
# [1] 0.9026262 1.9026262 2.9026262
What you want is to loop over the vector indices, using seq_along to get the variances of sequences growing by one. To see what needs to be done, I show you first a (rather slow) for loop.
vars <- numeric() ## initialize numeric vector
for (i in seq_along(rand_x)) {
vars[i] <- var(rand_x[1:i])
}
vars
# [1] NA 0.9026262 1.4786540 1.2771584 1.7877717 1.6095619
# [7] 1.4483273 1.5653797 1.8121144 1.6192175 1.4821020 3.5005254
# [13] 3.3771453 3.1723564 2.9464537 2.7620001 2.7086317 2.5757641
# [19] 2.4330738 2.4073546 2.4242747 2.3149455 2.3192964 2.2544765
# [25] 3.1333738 3.0343781 3.0354998 2.9230927 2.8226541 2.7258979
# [31] 2.6775278 2.6651541 2.5995346 3.1333880 3.0487177 3.0392603
# [37] 3.0483917 4.0446074 4.0463367 4.0465158 3.9473870 3.8537925
# [43] 3.8461463 3.7848464 3.7505158 3.7048694 3.6953796 3.6605357
# [49] 3.6720684 3.6580296
The first element has to be NA because the variance of one element is not defined (division by zero).
However, the for loop is slow. Since R is vectorized we rather want to use a function from the *apply family, e.g. vapply, which is much faster. In vapply we initialize with numeric(1) (or just 0) because the result of each iteration is of length one.
vars <- vapply(seq_along(rand_x), function(i) var(rand_x[1:i]), numeric(1))
vars
# [1] NA 0.9026262 1.4786540 1.2771584 1.7877717 1.6095619
# [7] 1.4483273 1.5653797 1.8121144 1.6192175 1.4821020 3.5005254
# [13] 3.3771453 3.1723564 2.9464537 2.7620001 2.7086317 2.5757641
# [19] 2.4330738 2.4073546 2.4242747 2.3149455 2.3192964 2.2544765
# [25] 3.1333738 3.0343781 3.0354998 2.9230927 2.8226541 2.7258979
# [31] 2.6775278 2.6651541 2.5995346 3.1333880 3.0487177 3.0392603
# [37] 3.0483917 4.0446074 4.0463367 4.0465158 3.9473870 3.8537925
# [43] 3.8461463 3.7848464 3.7505158 3.7048694 3.6953796 3.6605357
# [49] 3.6720684 3.6580296
Data:
n <- 50
set.seed(42)
rand_x <- rf(n, 3, 5)
I have raster cell values with 5 digits, but I need to get rid of the first one, for instance, if the the cell value is "31345" I need to make it "1345".
I am trying to use the calc() function from the raster package to do that by subtracting different numbers from based on the raster cell value (since they are all numbers), like this:
correct.grid <- calc(grid, fun=function(x){ifelse(x < 40000, x-30000,
ifelse(x > 40000 & < 50000, x-40000,
ifelse(x > 50000 & < 60000, x-50000,
ifelse(x > 60000, x-60000, 0)))))})
I guess this is probably a terrible approach to the problem (I am not really good at programming), still, I ran into an error because I am using ">" and "<" inside the function I am guessing. Any ideas on how to make these "ifelse"s to work or maybe a smarter approach to the problem?
This is a piece of the unique values in my data if it helps:
> unique(grid)
[1] 30057 30084 30207 30230 30235 30237 30280 30283 30311 30319 30320 30326 30350 30351 30352 30360
[17] 30384 30396 30415 30420 30447 30449 30452 30456 30478 30481 30497 30507 30522 30560 30562 30605
[33] 30606 30612 30638 30639 30645 30654 30657 30658 30662 30665 30678 30682 30701 30707 30714 30736
[49] 30740 30743 30749 30750 30823 30824 30841 30852 30862 30892 30896 30898 30915 30920 30928 30934
[65] 30956 30962 30978 30986 30998 31021 31022 31031 31042 31053 31055 31081 31085 31092 31097 31099
[81] 31114 31115 31122 31126 31129 31130 31131 31141 31157 31168 31171 40019 40026 40075 40197 40217
[97] 50342 50360 50379 50496 50720 50725 50732 50766 50798 50837 51073 51092 51397 53096 53110 53117
[113] 53118 53120 53124 60003 60005 60041 60485 60516 60655 60661 60825 61039 61174 61185 61187 61210
[129] 61221 61224 61227 61259 61287 61289 61290 61295
If you just want to remove the leftmost digit of each value, how about this:
First, let's load a raster object to work with:
library(raster)
# Load a raster object to work with
grid = system.file("external/test.grd", package="raster")
grid = raster(grid)
# Set up values to be whole numbers
values(grid) = round(values(grid)*100)
Now, remove the leftmost digit from every value in the raster:
values(grid) = as.numeric(substr(values(grid), 2, nchar(values(grid))))
Note that a value with one or more zeros after the leftmost digit will get shortened by more than one digit. For example, 60661 will become 661 and 30001 will become 1.
I want to know how can I calculate large values multiplication in R.
R returns Inf!
For example:
6.350218e+277*2.218789e+215
[1] Inf
Let me clarify the problem more:
consider the following code and the results of outFunc function:
library(hypergeo)
poch <-function(a,b) gamma(a+b)/gamma(a)
n<-c(37 , 41 , 4 , 9 , 12 , 13 , 2 , 5 , 23 , 73 , 129 , 22 , 121 )
v<-c(90.2, 199.3, 61, 38, 176.3, 293.6, 318.6, 328.7, 328.1, 313.3, 142.4, 92.9, 95.5)
DF<-data.frame(n,v)
outFunc<-function(k,w,r,lam,a,b) {
((((w*lam)^k) * poch(r,k) * poch(a,b) ) * hypergeo(r+k,a+k,a+b+k,-(w*lam)) )/(poch(a+k,b)*factorial(k))
}
and the function returns:
outFunc(DF$n,DF$v,0.2, 1, 3, 1)
[1] 0.002911330+ 0i 0.003047594+ 0i 0.029886646+ 0i 0.013560599+ 0i 0.010160073+ 0i
[6] 0.008928524+ 0i 0.040165795+ 0i 0.019402318+ 0i 0.005336008+ 0i 0.001689114+ 0i
[11] Inf+NaNi 0.005577985+ 0i Inf+NaNi
As can be seen above, outFunc returns Inf+NaNi for n values of 129 and 121.
I checked the code sections part by part and I find that the returned results of (wlam)^k poch(r,k) for these n values are Inf. I also check my code with equivalent code in Mathematica which everything is OK:
in: out[indata[[All, 1]], indata[[All, 2]], 0.2, 1, 3, 1]
out: {0.00291133, 0.00304759, 0.0298866, 0.0135606, 0.0101601, 0.00892852, \
0.0401658, 0.0194023, 0.00533601, 0.00168911, 0.000506457, \
0.00557798, 0.000365445}
Now please let me know how we can solve this issue as simple as it is in Mathematica. regards.
One option you have available in base R, which does not require a special library, is to convert the two numbers to a common base, and then add the exponents together to get the final result:
> x <- log(6.350218e+277, 10)
> x
[1] 277.8028
> y <- log(2.218789e+215, 10)
> y
[1] 215.3461
> x + y
[1] 493.1489
Since 10^x * 10^y = 10^(x+y), your final answer is 10^493.1489
Note that this solution does not allow to actually store numbers which R would normally treat as INF. Hence, in this example, you still cannot compute 10^493, but you can tease out what the product would be.
For first, I'd recommend two useful reads: logarithms and how floating values are handled by a computer. These are pertinent because with some "tricks" you can handle much bigger values than you think. For instance, your definition of the poch function is terrible. This because the fraction can be simplified a lot but a computer will evaluate the numerator first and if it overflows the result will be useless. That's why R provides beside gamma the lgamma function: it just calculates the logarithm of gamma and can handle much bigger values. So, we calculate the log of each factor in your function and then we use exp to restore the intended values. Try this:
#redefine poch properly
poch<-function(a,b) lgamma(a+b) - lgamma(a)
#redefine outFunc
outFunc<-function(k,w,r,lam,a,b) {
exp((k*(log(w)+log(lam))+ poch(r,k) + poch(a,b) ) +
log(hypergeo(r+k,a+k,a+b+k,-(w*lam)))- poch(a+k,b)-lgamma(k+1))
}
#Now we go
outFunc(DF$n,DF$v,0.2, 1, 3, 1)
#[1] 0.0029113299+0i 0.0030475939+0i 0.0298866458+0i 0.0135605995+0i
#[5] 0.0101600732+0i 0.0089285243+0i 0.0401657947+0i 0.0194023182+0i
#[9] 0.0053360084+0i 0.0016891144+0i 0.0005064566+0i 0.0055779850+0i
#[13] 0.0003654449+0i
> library(gmp)
> x<- pow.bigz(6.350218,277)
> y<- pow.bigz(2.218789,215)
> x*y
Big Integer ('bigz') :
[1] 18592826814872791919942226542714580401488894909642693257011204682802122918146288728149155739011270579948954646130492024596687919148494136290260248656581476275790189359808616520170359345612068099238508437236172770752199936303947098513476300142414338199993261924467166943683593371648
How do I scale a series such that the first number in the series is 0 and last number is 1. I looked into 'approx', 'scale' but they do not achieve this objective.
# generate series from exponential distr
s = sort(rexp(100))
# scale/interpolate 's' such that it starts at 0 and ends at 1?
# approx(s)
# scale(s)
The scales package has a function that will do this for you: rescale.
library("scales")
rescale(s)
By default, this scales the given range of s onto 0 to 1, but either or both of those can be adjusted. For example, if you wanted it scaled from 0 to 10,
rescale(s, to=c(0,10))
or if you wanted the largest value of s scaled to 1, but 0 (instead of the smallest value of s) scaled to 0, you could use
rescale(s, from=c(0, max(s)))
It's straight-forward to create a small function to do this using basic arithmetic:
s = sort(rexp(100))
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
range01(s)
[1] 0.000000000 0.003338782 0.007572326 0.012192201 0.016055006 0.017161145
[7] 0.019949532 0.023839810 0.024421602 0.027197168 0.029889484 0.033039408
[13] 0.033783376 0.038051265 0.045183382 0.049560233 0.056941611 0.057552543
[19] 0.062674982 0.066001242 0.066420884 0.067689067 0.069247825 0.069432174
[25] 0.070136067 0.076340460 0.078709590 0.080393512 0.085591881 0.087540132
[31] 0.090517295 0.091026499 0.091251213 0.099218526 0.103236344 0.105724733
[37] 0.107495340 0.113332392 0.116103438 0.124050331 0.125596034 0.126599323
[43] 0.127154661 0.133392300 0.134258532 0.138253452 0.141933433 0.146748798
[49] 0.147490227 0.149960293 0.153126478 0.154275371 0.167701855 0.170160948
[55] 0.180313542 0.181834891 0.182554291 0.189188137 0.193807559 0.195903010
[61] 0.208902645 0.211308713 0.232942314 0.236135220 0.251950116 0.260816843
[67] 0.284090255 0.284150541 0.288498370 0.295515143 0.299408623 0.301264703
[73] 0.306817872 0.307853369 0.324882091 0.353241217 0.366800517 0.389474449
[79] 0.398838576 0.404266315 0.408936260 0.409198619 0.415165553 0.433960390
[85] 0.440690262 0.458692639 0.464027428 0.474214070 0.517224262 0.538532221
[91] 0.544911543 0.559945121 0.585390414 0.647030109 0.694095422 0.708385079
[97] 0.736486707 0.787250428 0.870874773 1.000000000
Alternatively:
scale(x,center=min(x),scale=diff(range(x)))
(untested)
This has the feature that it attaches the original centering and scaling factors to the output as attributes, so they can be retrieved and used to un-scale the data later (if desired). It has the oddity that it always returns the result as a (columnwise) matrix, even if it was passed a vector; you can use drop(scale(...)) if you want a vector instead of a matrix (this usually doesn't matter but the matrix format can occasionally cause trouble downstream ... in my experience more often with tibbles/in tidyverse, although I haven't stopped to examine exactly what's going wrong in these cases).
This should do it:
reshape::rescaler.default(s, type = "range")
EDIT
I was curious about the performance of the two methods
> system.time(replicate(100, range01(s)))
user system elapsed
0.56 0.12 0.69
> system.time(replicate(100, reshape::rescaler.default(s, type = "range")))
user system elapsed
0.53 0.18 0.70
Extracting the raw code from reshape::rescaler.default
range02 <- function(x) {
(x - min(x, na.rm=TRUE)) / diff(range(x, na.rm=TRUE))
}
> system.time(replicate(100, range02(s)))
user system elapsed
0.56 0.12 0.68
Yields similar result.
You can also make use of the caret package which will provide you the preProcess function which is just simple like this:
preProcValues <- preProcess(yourData, method = "range")
dataScaled <- predict(preProcValues, yourData)
More details on the package help.
I created following function in r:
ReScale <- function(x,first,last){(last-first)/(max(x)-min(x))*(x-min(x))+first}
Here, first is start point, last is end point.