How can I detect changes from negative to positive value? - r

I have calculated the differences of my data points and received this vector:
> diff(smooth$a)/(diff(smooth$b))
[1] -0.0099976150 0.0011162606 0.0116275973 0.0247594149 0.0213592319 0.0205187495 0.0179274056 0.0207752713
[9] 0.0231903072 -0.0077549224 -0.0401528643 -0.0477294350 -0.0340842051 -0.0148157337 0.0003829642 0.0160912230
[17] 0.0311189830
Now I want to get the positions (index) where I have a change from negative to positive when the following 3 data points are also positive.
So my output would be like this:
> output
-0.0099976150 -0.0148157337
How could I do this?

One way to do this, assuming vec holds the vector of differences:
series <- paste(ifelse(vec < 0, 0, 1), collapse = '')
vec[gregexpr('0111', series)[[1]]]
#[1] -0.009997615 -0.014815734
The first line builds a string of 0s and 1s from the sign of each number (0 for negative, 1 otherwise). In the second line, gregexpr finds every position where the pattern '0111' starts, that is, a negative value followed by three non-negative ones. Finally, we use these positions to subset the original vector.
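The same idea can be sketched without going through a string, by comparing shifted logical vectors. This is only an illustration under the assumption that vec has at least four values; the names neg, i, hit and idx are made up for the example.

```r
# Differences copied from the question; `vec` is the name the answer assumes.
vec <- c(-0.0099976150, 0.0011162606, 0.0116275973, 0.0247594149,
         0.0213592319, 0.0205187495, 0.0179274056, 0.0207752713,
         0.0231903072, -0.0077549224, -0.0401528643, -0.0477294350,
         -0.0340842051, -0.0148157337, 0.0003829642, 0.0160912230,
         0.0311189830)

neg <- vec < 0
i <- seq_len(length(vec) - 3)   # candidate start positions

# TRUE where position i is negative and the next three values are not
hit <- neg[i] & !neg[i + 1] & !neg[i + 2] & !neg[i + 3]

idx <- i[hit]
idx        # positions: 1 14
vec[idx]   # values at those positions
```

idx holds the positions asked for in the question, and vec[idx] gives the corresponding values.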

Imagine a vector z:
z <- seq(-2, 2, length.out = 20)
z
#> [1] -2.0000000 -1.7894737 -1.5789474 -1.3684211 -1.1578947 -0.9473684 -0.7368421 -0.5263158
#> [9] -0.3157895 -0.1052632 0.1052632 0.3157895 0.5263158 0.7368421 0.9473684 1.1578947
#> [17] 1.3684211 1.5789474 1.7894737 2.0000000
then you can do
turn_point <- which(z == max(z[z < 0]))
turn_plus_one <- c(turn_point, turn_point + 1)
z[turn_plus_one]
#> [1] -0.1052632 0.1052632
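If the vector can cross zero more than once, a hedged alternative is to look at where the sign jumps from -1 to 1. This sketch assumes the vector contains no exact zeros; cross is an illustrative name.

```r
z <- seq(-2, 2, length.out = 20)

# diff(sign(z)) equals 2 exactly where a negative value is followed by a positive one
cross <- which(diff(sign(z)) == 2)

cross                   # position of the last negative value before each crossing
z[c(cross, cross + 1)]  # the values on either side of the crossing
```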

Related

Obtain the exponent which I should use in scaling to make all numbers in vector >= 1

Let's say I have a vector of some numbers, which can be < 1, but never <= 0.
> x = abs(rnorm(30))
> x
[1] 0.32590946 0.05018667 1.54354863 0.28925652 0.61712682 0.09444528
[7] 0.87951971 1.46243702 0.87099892 1.28553745 0.70360649 0.58973942
[13] 1.20054389 0.94429737 0.64038139 1.04173338 0.24249771 1.67273503
[19] 0.77546385 0.33547348 1.73480609 0.20757933 1.94491872 1.10547259
[25] 1.28570768 1.37621399 0.99389595 2.14107987 2.31719369 1.24458788
And when I log the entire vector I get negative values:
> log(x)
[1] -1.121135658 -2.992005742 0.434084070 -1.240441366 -0.482680726
[6] -2.359734671 -0.128379302 0.380104238 -0.138114546 0.251176883
[11] -0.351536037 -0.528074505 0.182774695 -0.057314153 -0.445691353
[16] 0.040886038 -1.416763021 0.514460030 -0.254293908 -1.092212359
[21] 0.550895644 -1.572241722 0.665220187 0.100272928 0.251309290
[26] 0.319336240 -0.006122756 0.761310314 0.840356837 0.218804452
Now the min of this vector is:
> min(x)
[1] 0.05018667
My question is this: I want to scale the data by 10^x or by 2^x (depending on the log base) so that the log of the scaled data contains only positive (or non-negative) numbers. How can I get the lowest exponent that achieves this?
Maybe this fits your needs. The function scales the values such that the minimum value gets assigned to 1 and returns the exponent as well as the scaled vector as a list:
set.seed(123)
scale_log <- function(x, base = 2) {
  y <- -log(min(x)) / log(base)
  list(exp = y, scaled = base^y * x)
}
x <- abs(rnorm(5))
scale_log(x, 2)
#> $exp
#> [1] 3.826061
#>
#> $scaled
#> [1] 7.949063 3.264540 22.106706 1.000000 1.833650
log2(scale_log(x, 2)$scaled)
#> [1] 2.990785e+00 1.706880e+00 4.466412e+00 3.203427e-16 8.747185e-01
scale_log(x, 10)
#> $exp
#> [1] 1.151759
#>
#> $scaled
#> [1] 7.949063 3.264540 22.106706 1.000000 1.833650
log10(scale_log(x, 10)$scaled)
#> [1] 9.003159e-01 5.138220e-01 1.344524e+00 9.643275e-17 2.633165e-01
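If the exponent has to be a whole number (the question does not say either way), one option is to round the exact exponent up. A minimal sketch, with scale_log_int as a hypothetical name:

```r
scale_log_int <- function(x, base = 2) {
  # round the exact exponent up so that base^k * min(x) is at least 1
  k <- ceiling(-log(min(x), base = base))
  list(exp = k, scaled = base^k * x)
}

x <- c(0.32590946, 0.05018667, 1.54354863)  # first three values from the question
res <- scale_log_int(x, base = 10)
res$exp                          # 2
all(log10(res$scaled) >= 0)      # TRUE
```

With a rounded exponent the minimum no longer maps exactly to 1, so the smallest log is positive rather than zero.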

How to subset matrices in a list in R?

Currently, I have a list of 500 elements, named List.500. In each element, I have 3 vectors and 1 matrix. The first element is:
> List.500[[1]]
$two_values
$two_values$bin
[1] 0 1
$grid_points$grid
[1] 0.05000000 0.06836735 0.08673469 0.10510204 0.12346939 0.14183673 0.16020408
[8] 0.17857143 0.19693878 0.21530612 0.23367347 0.25204082 0.27040816 0.28877551
$mean_0
[1] 14.48597 14.49662 14.51089 14.52915 14.55242 14.58129 14.61866 14.66572 14.72186
[10] 14.79531 14.88589 14.99356 15.13048 15.29701
$mean_1
[1] 16.48597 16.49662 16.51089 16.52915 16.55242 16.58129 16.61866 16.66572 16.72186
[10] 16.79531 16.88589 16.99356 17.13048 17.29701
$mean_grid
g=0.05 g=0.07 g=0.09 g=0.11 g=0.12 g=0.14 g=0.16 g=0.18 g=0.2
bin=0 14.48597 14.49662 14.51089 14.52915 14.55242 14.58129 14.61866 14.66572 14.72186
bin=1 16.48597 16.49662 16.51089 16.52915 16.55242 16.58129 16.61866 16.66572 16.72186
g=0.22 g=0.23 g=0.25 g=0.27 g=0.29
bin=0 14.79531 14.88589 14.99356 15.13048 15.29701
bin=1 16.79531 16.88589 16.99356 17.13048 17.29701
I would like to keep only the 1st, 2nd, and 3rd elements of each of the vectors (except the first one, two_values$bin) and the 1st, 2nd, and 3rd columns of the matrix, for each of the 500 elements of List.500. I want to leave two_values$bin alone.
Ideally, I would like to get:
> List.500[[1]]
$two_values
$two_values$bin
[1] 0 1
$grid_points$grid
[1] 0.05000000 0.06836735 0.08673469
$mean_0
[1] 14.48597 14.49662 14.51089
$mean_1
[1] 16.48597 16.49662 16.51089
$mean_grid
g=0.05 g=0.07 g=0.09
bin=0 14.48597 14.49662 14.51089
bin=1 16.48597 16.49662 16.51089
for each of the 500 elements in List.500. Is there a simple way to do this without resorting to breaking the list apart and looping? Thanks.
As commented, you can use rapply. I think the elements of your list follow a pattern but for the purpose of this demonstration, I used the following data.
set.seed(123)
List.500 <- lapply(1:3, function(x) list(two_values = list(bin = 0:1),
grid_points = list(grid = runif(16, 0,.3)),
mean_0 = runif(14, 14, 16),
mean_1 = runif(14, 16, 18),
mean_grid = matrix(runif(28, 14, 18), nrow = 2, byrow = TRUE)))
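The following code does this, leaving any length-2 vector (here two_values$bin) unchanged.
rapply(List.500,
       function(x) {
         if (is.matrix(x)) x[, 1:3]      # matrix: keep the first 3 columns
         else if (length(x) == 2) x      # two_values$bin: leave as is
         else x[1:3]                     # other vectors: keep the first 3 elements
       },
       how = "replace")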
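The rapply call identifies two_values$bin by its length, which would also skip any other length-2 vector. A more explicit sketch addresses each component by name with lapply. It assumes every element has the same component names as in the question; keep and trimmed are illustrative names, and the demo data is random, as in the answer above.

```r
set.seed(123)
List.500 <- lapply(1:3, function(i) list(
  two_values  = list(bin = 0:1),
  grid_points = list(grid = runif(14, 0, 0.3)),
  mean_0      = runif(14, 14, 16),
  mean_1      = runif(14, 16, 18),
  mean_grid   = matrix(runif(28, 14, 18), nrow = 2, byrow = TRUE)
))

keep <- 1:3
trimmed <- lapply(List.500, function(el) {
  el$grid_points$grid <- el$grid_points$grid[keep]
  el$mean_0           <- el$mean_0[keep]
  el$mean_1           <- el$mean_1[keep]
  el$mean_grid        <- el$mean_grid[, keep, drop = FALSE]  # stays a matrix
  el                                                          # two_values untouched
})

str(trimmed[[1]])
```

This is still a loop under the hood, but the list is never taken apart, and each rule is tied to a named component rather than to a vector length.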

quick standard deviation with weights

I wanted to use a function that would quickly give me the standard deviation of a vector and allow me to include weights for the elements in the vector, i.e.
sd(c(1,2,3)) #weights all equal 1
#[1] 1
sd(c(1,2,3,3,3)) #weights equal 1,1,3 respectively
#[1] 0.8944272
For weighted means I can use wt.mean() from library(SDMTools) e.g.
> mean(c(1,2,3))
[1] 2
> wt.mean(c(1,2,3),c(1,1,1))
[1] 2
>
> mean(c(1,2,3,3,3))
[1] 2.4
> wt.mean(c(1,2,3),c(1,1,3))
[1] 2.4
but the wt.sd function does not seem to provide what I thought I wanted:
> sd(c(1,2,3))
[1] 1
> wt.sd(c(1,2,3),c(1,1,1))
[1] 1
> sd(c(1,2,3,3,3))
[1] 0.8944272
> wt.sd(c(1,2,3),c(1,1,3))
[1] 1.069045
I am expecting a function that returns 0.8944272 for my weighted sd. Preferably I would be using this on a data.frame like:
data.frame(x=c(1,2,3),w=c(1,1,3))
library(Hmisc)
sqrt(wtd.var(1:3,c(1,1,3)))
#[1] 0.8944272
You can use rep to replicate the values according to their weights. Then, sd can be computed for the resulting vector.
x <- c(1, 2, 3) # values
w <- c(1, 1, 3) # weights
sd(rep(x, w))
[1] 0.8944272
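rep only works when the weights are whole-number counts. A formula-based sketch gives the same result here and also accepts a data.frame column pair. It treats the weights as frequencies, and weighted_sd is a hypothetical helper name, not a function from any package.

```r
# Weighted standard deviation treating weights as frequency counts.
weighted_sd <- function(x, w) {
  m <- sum(w * x) / sum(w)                    # weighted mean
  sqrt(sum(w * (x - m)^2) / (sum(w) - 1))     # n - 1 denominator, as in sd()
}

df <- data.frame(x = c(1, 2, 3), w = c(1, 1, 3))
with(df, weighted_sd(x, w))       # same value as sd(rep(df$x, df$w))
```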

R: How to generate a sequence seq() given a condition?

I just discovered R and I am trying to work with it.
Here is what I am trying to achieve:
I have a vector of numbers, x, between 50 and 100 and with a size of 250 observations.
x = sample(seq(50, 100), 250, replace = TRUE)
Now, I want to generate another vector of numbers, y, between 0 and 100, which is the same size as vector x such that each element in y is less than or equal to its equivalent in x.
That is to say that if x[1] is 76, for example, the highest value y[1] could attain when generated is 76. But it could definitely be any other value below 76. In other words and more generally, I want vector y to be generated in such a way that y[i] <= x[i].
I hope I have made my request clearer.
Thank you very much!
y <- x - 1 # always satisfies y[i] <= x[i]
y <- sapply( x, function(x) runif(n=1, max=x))
y
[1] 7.2713788 30.0008063 42.5205775 0.9271717 10.7114456 39.5199145 7.4109775
[8] 28.3464373 28.5840101 34.0654033 15.0675028 50.2836294 45.9031794 13.5931005
[15] 43.2751738 17.0560824 3.1507491 25.7619129 12.3391448 22.6203684 51.3334810
[22] 37.0481703 33.4733277 37.1304850 26.7984406 66.3844126 40.2775918 47.6379024
[29] 16.2480595 66.8358384 33.3513161 60.2673874 65.6204462 45.6951960 1.5729434
[36] 20.4850357 0.1345737 84.5334203 19.7997451 53.8025623 48.5528486 8.8992123
[43] 90.9651742 28.3584167 41.7728159 46.4790641 17.8129578 83.1906415 37.5114353
[50] 89.5685501 85.2499600
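Since runif is vectorised over its max argument, the sapply call can be written as a single call. The sketch below also shows a whole-number variant, in case integer values of y are wanted (the question does not say). The seed and the name y_int are only for illustration.

```r
set.seed(42)                                   # arbitrary seed, for reproducibility
x <- sample(seq(50, 100), 250, replace = TRUE)

# one uniform draw per element of x, each bounded above by that element
y <- runif(length(x), min = 0, max = x)

# whole-number variant: integers from 0 up to and including x[i]
y_int <- floor(runif(length(x), min = 0, max = x + 1))

all(y <= x)       # TRUE
all(y_int <= x)   # TRUE
```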

interpolate a series over another series in R

Suppose I have a random time series that I want to interpolate over another time series. How would I do this in R?
# generate time interval series from exponential distribution
s = sort(rexp(10))
# scale between 0 and 1
scale01 = function(x){(x-min(x))/(max(x)-min(x))}
s = scale01(s)
> s
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
# generate random normal series
x = rnorm(20)
> x
[1] -0.82530658 0.92289557 0.39827984 -0.62416117 -1.69055539 -0.28164232 -1.32717654 -1.36992509
[9] -1.54352202 -1.09826247 -0.68260576 1.07307043 2.35298180 -0.41472811 0.38919315 -0.27325343
[17] -1.52592682 0.05400849 -0.43498544 0.73841106
# interpolate 'x' over 's' ?
> approx(x,xout=s)
$x
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
$y
[1] NA NA NA NA NA NA NA NA NA
[10] -0.8253066
>
I want to interpolate the series 'x' over the series 's'. Let's assume the time interval series for the 'x' series has 20 elements distributed uniformly over the interval [0,1]. Now I want to interpolate those 10 elements from 'x' that occur at the time intervals described by 's'.
EDIT:
I think this does the job.
> approx(seq(0,1,length.out=20), x, xout=s)
$x
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
$y
[1] -0.8253066 0.1061033 0.8777987 0.3781018 -0.6221134 -1.5566990 -0.3483466 -0.4703429 -1.4444105
[10] 0.7384111
Thanks for your help guys. I think I now understand how to use interpolation functions in R. I should really use a time series data structure here.
This isn't meant as a direct answer to the OP's Q but rather to illustrate how approx() works so the OP can formulate a better Q
Your Q makes next to no sense. approx() works by taking a reference set of x and y coordinates and then interpolating to find y at n locations over the range of x, or at the specified xout locations supplied by the user.
So in your call, you don't provide y and x doesn't contain a y component so I don't see how this can work.
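To make the description of approx() concrete, here is a toy sketch with made-up reference points (t_ref, v_ref and v are illustrative names, not from the question). The second call passes only one vector; judging by the output in the question, the values are then treated as y and paired with the positions 1, 2, ..., so any xout below 1 falls outside the reference range and comes back as NA.

```r
# Reference points: known values v_ref at locations t_ref
t_ref <- c(0, 0.5, 1)
v_ref <- c(10, 20, 40)

# Linear interpolation at two new locations inside the range
res <- approx(t_ref, v_ref, xout = c(0.25, 0.75))
res$y   # 15 30

# Only one vector supplied: it is used as y, with x taken as 1, 2, 3
v <- c(5, 7, 9)
out <- approx(v, xout = c(0.5, 1, 1.5))$y
out     # NA 5 6
```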
If you want to interpolate s, so you can find time intervals for any value over range of s then:
> approx(s, seq_along(s), n = 20)
$x
[1] 0.00000000 0.05263158 0.10526316 0.15789474 0.21052632 0.26315789
[7] 0.31578947 0.36842105 0.42105263 0.47368421 0.52631579 0.57894737
[13] 0.63157895 0.68421053 0.73684211 0.78947368 0.84210526 0.89473684
[19] 0.94736842 1.00000000
$y
[1] 1.00000 26.25815 42.66323 54.79831 64.96162 76.99433 79.67388
[8] 83.78458 86.14656 89.86223 91.98513 93.36233 93.77353 94.19731
[15] 94.63652 95.26239 97.67724 98.74056 99.40548 100.00000
Here $y contains the interpolated values for s at n = 20 equally spaced locations on the range of s (0,1).
Edit: If x represents the series at unstated time intervals uniform on 0,1 and you want the interpolated values of x at the time intervals s, then you need something like this:
> set.seed(1)
> x <- rnorm(20)
> s <- sort(rexp(10))
> scale01 <- function(x) {
+ (x - min(x)) / (max(x) - min(x))
+ }
> s <- scale01(s)
>
> ## interpolate x at points s
> approx(seq(0, 1, length = length(x)), x, xout = s)
$x
[1] 0.00000000 0.04439851 0.11870795 0.14379236 0.20767388 0.21218632
[7] 0.25498856 0.29079300 0.40426335 1.00000000
$y
[1] -0.62645381 0.05692127 -0.21465011 0.94393053 0.39810806 0.29323742
[7] -0.64197207 -0.13373472 0.62763207 0.59390132
Is that closer to what you want?
