Calculate mean of every nth element in R

I have a vector that holds hourly data for 31 days, so it has a length of 31*24 = 744. Now I would like to calculate the mean diurnal cycle of the variable contained in the vector. For that, the mean value of every hour of the day is needed. For 01 UTC, for example, the relevant indices are 1, 25, 49, 73, ..., 721; for 02 UTC they are 2, 26, 50, 74, ..., 722, and so forth. So I need something that calculates the mean over every 24th element, i.e. with a step of 24 rather than a sliding window.
Here is some code for an exemplary vector:
set.seed(1)
my.vec <- sample(-20:20, size = 744, replace = T)
The output vector should then be of length 24, of course.
Anybody with a hint?

One possible solution, using base R: reshape the vector into a 24 × 31 matrix. matrix() fills column-wise, so each of the 24 rows holds one hour of the day across the 31 days, and rowMeans() gives the diurnal cycle:
rowMeans(matrix(my.vec, 24, 31))
#> [1] -0.9354839 -0.3548387 -1.0322581 2.5161290 2.1290323 0.7419355
#> [7] 1.3870968 1.4838710 0.9032258 -1.9032258 4.2903226 -0.4193548
#> [13] -1.9354839 -3.1935484 -2.1935484 2.0322581 0.2580645 2.4193548
#> [19] 0.8064516 0.8064516 5.0645161 -0.5806452 -1.2580645 -0.1290323

Another base R option: fill the matrix by row instead (31 × 24), so each of the 24 columns holds one hour of the day, and take column means:
set.seed(1)
my.vec <- sample(-20:20, size = 744, replace = T)
m <- matrix(my.vec, 31, byrow = TRUE)
colMeans(m)
#> [1] -0.9354839 -0.3548387 -1.0322581 2.5161290 2.1290323 0.7419355
#> [7] 1.3870968 1.4838710 0.9032258 -1.9032258 4.2903226 -0.4193548
#> [13] -1.9354839 -3.1935484 -2.1935484 2.0322581 0.2580645 2.4193548
#> [19] 0.8064516 0.8064516 5.0645161 -0.5806452 -1.2580645 -0.1290323
Created on 2022-04-25 by the reprex package (v2.0.1)
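To see why both reshapes give the same result, here is a minimal sketch with toy data (2 "days" of 3 "hours" each, standing in for 31 days of 24 hours): a column-wise fill puts each hour in a row, a row-wise fill puts each hour in a column.
v <- 1:6                    # hours 1,2,3 of day 1, then hours 1,2,3 of day 2
matrix(v, 3, 2)             # filled column-wise: hours in rows    -> rowMeans()
matrix(v, 2, byrow = TRUE)  # filled row-wise:    hours in columns -> colMeans()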

We can use rollapply() from zoo; it should also work for vectors of other lengths:
library(zoo)
out <- colMeans(rollapply(seq_along(my.vec), width = 24, by = 24,
                          FUN = function(i) my.vec[i]))
Checking:
> length(out)
[1] 24
> mean(my.vec[seq(1, length(my.vec), by = 24)])
[1] -0.9354839
> mean(my.vec[seq(2, length(my.vec), by = 24)])
[1] -0.3548387
> out[1:2]
[1] -0.9354839 -0.3548387
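A hedged base R alternative (using the same my.vec): build an hour-of-day index and average with tapply(); this also works when the vector length is not an exact multiple of 24.
hour <- (seq_along(my.vec) - 1) %% 24 + 1   # 1..24, recycled over the 31 days
tapply(my.vec, hour, mean)                  # mean diurnal cycle, length 24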

Related

How to get the value of x in the equation d(M)/d(x) = constant using R?

I am trying to find the optimal quantity, for which I have to equate the derivative of the total revenue equation with the marginal cost. I don't know how to solve for x here. D() works on an expression-type variable and returns the same, and solve() takes a numeric equation with only coefficients. I don't want to input the coefficients manually.
TR <- expression(Quantity * (40 - 3 * Quantity))
MR <- D(TR, "Quantity")
Optimal_Quantity <- solve(MR - MC)   # pseudocode, to get Q
The last line is pseudocode for what I want to achieve; please help. I can manually enter the values, but I wish to make it universal. MC is a constant numeric value on the RHS.
I am not completely sure I understand, but if you want to optimize a function, find that function's derivative and then find the derivative's zeros.
TR <- expression(Quantity * (40- 3*Quantity))
MR <- D(TR,"Quantity")
class(MR)
#> [1] "call"
dTR <- function(x, const) {
  # evaluate the derivative expression MR with Quantity bound to x,
  # then subtract the constant (the marginal cost)
  e <- new.env()
  e$Quantity <- x
  eval(MR, envir = e) - const
}
MC <- 0
u <- uniroot(dTR, interval = c(-10, 10), const = MC)
u
#> $root
#> [1] 6.666667
#>
#> $f.root
#> [1] 0
#>
#> $iter
#> [1] 1
#>
#> $init.it
#> [1] NA
#>
#> $estim.prec
#> [1] 16.66667
curve(dTR(x, const = MC), from = -10, to = 10)
abline(h = 0)
points(u$root, u$f.root, pch = 16, col = "red")
Created on 2022-11-19 with reprex v2.0.2
Edit
To make the function dTR more general purpose, I have included an argument FUN. Above it could only evaluate MR; it can now evaluate any derivative expression passed to it.
The code below plots dTR over a large range of values, from -10 to 100, hoping to catch negative and positive end points. Then, after drawing the horizontal axis, it brackets the root between 20 and 30.
dTR <- function(x, FUN, const) {
  e <- new.env()
  e$Quantity <- x
  eval(FUN, envir = e) - const
}
total.revenue <- expression(Quantity * (10 - Quantity / 5))
marginal.revenue <- D(total.revenue, "Quantity")
marginal.cost <- 1
curve(dTR(x, FUN = marginal.revenue, const = marginal.cost), from = -10, to = 100)
abline(h = 0)
abline(v = c(20, 30), lty = "dashed")
u <- uniroot(dTR, interval = c(20, 30), FUN = marginal.revenue, const = marginal.cost)
u
#> $root
#> [1] 22.5
#>
#> $f.root
#> [1] 0
#>
#> $iter
#> [1] 1
#>
#> $init.it
#> [1] NA
#>
#> $estim.prec
#> [1] 7.5
Created on 2022-11-22 with reprex v2.0.2
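A hedged variation on the same idea (not part of the original approach): stats::deriv() can build a function whose "gradient" attribute evaluates the marginal revenue directly, which avoids setting up an environment by hand.
TR_fun <- deriv(expression(Quantity * (40 - 3 * Quantity)), "Quantity",
                function.arg = TRUE)
MC <- 0
mr_minus_mc <- function(q) attr(TR_fun(q), "gradient")[1] - MC
uniroot(mr_minus_mc, interval = c(-10, 10))$root   # again roughly 40/6 = 6.67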

Obtain the exponent I should use in scaling to make all numbers in a vector >= 1

Let's say I have a vector of some numbers, which can be < 1, but never <= 0.
> x = abs(rnorm(30))
> x
[1] 0.32590946 0.05018667 1.54354863 0.28925652 0.61712682 0.09444528
[7] 0.87951971 1.46243702 0.87099892 1.28553745 0.70360649 0.58973942
[13] 1.20054389 0.94429737 0.64038139 1.04173338 0.24249771 1.67273503
[19] 0.77546385 0.33547348 1.73480609 0.20757933 1.94491872 1.10547259
[25] 1.28570768 1.37621399 0.99389595 2.14107987 2.31719369 1.24458788
And when I log the entire vector I get negative values:
> log(x)
[1] -1.121135658 -2.992005742 0.434084070 -1.240441366 -0.482680726
[6] -2.359734671 -0.128379302 0.380104238 -0.138114546 0.251176883
[11] -0.351536037 -0.528074505 0.182774695 -0.057314153 -0.445691353
[16] 0.040886038 -1.416763021 0.514460030 -0.254293908 -1.092212359
[21] 0.550895644 -1.572241722 0.665220187 0.100272928 0.251309290
[26] 0.319336240 -0.006122756 0.761310314 0.840356837 0.218804452
Now the min of this vector is:
> min(x)
[1] 0.05018667
My question is this: I want to scale the data by 10^x or by 2^x (depending on the log base) so that the log of the scaled set produces only positive (or non-negative) numbers. How can I get the lowest exponent that will make it so?
Maybe this fits your needs. The function scales the values so that the minimum value is mapped to 1, and it returns both the exponent and the scaled vector as a list:
set.seed(123)
scale_log <- function(x, base = 2) {
  y <- -log(min(x)) / log(base)
  list(exp = y, scaled = base^y * x)
}
x <- abs(rnorm(5))
scale_log(x, 2)
#> $exp
#> [1] 3.826061
#>
#> $scaled
#> [1] 7.949063 3.264540 22.106706 1.000000 1.833650
log2(scale_log(x, 2)$scaled)
#> [1] 2.990785e+00 1.706880e+00 4.466412e+00 3.203427e-16 8.747185e-01
scale_log(x, 10)
#> $exp
#> [1] 1.151759
#>
#> $scaled
#> [1] 7.949063 3.264540 22.106706 1.000000 1.833650
log10(scale_log(x, 10)$scaled)
#> [1] 9.003159e-01 5.138220e-01 1.344524e+00 9.643275e-17 2.633165e-01
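If an integer exponent is wanted (the question mentions scaling by 10^x or 2^x), a minimal follow-up sketch under that assumption, using the same x, is to round the exponent up:
k <- ceiling(-log(min(x), base = 2))   # smallest integer k with 2^k * min(x) >= 1
all(log2(2^k * x) >= 0)                # should be TRUE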

How can I detect changes from negative to positive value?

I have calculated the differences of my data points and received this vector:
> diff(smooth$a)/(diff(smooth$b))
[1] -0.0099976150 0.0011162606 0.0116275973 0.0247594149 0.0213592319 0.0205187495 0.0179274056 0.0207752713
[9] 0.0231903072 -0.0077549224 -0.0401528643 -0.0477294350 -0.0340842051 -0.0148157337 0.0003829642 0.0160912230
[17] 0.0311189830
Now I want to get the positions (indices) where there is a change from negative to positive and the following 3 data points are also positive.
So my output would be like this:
> output
-0.0099976150 -0.0148157337
How could I do this?
One way is like this:
series <- paste(ifelse(vec < 0, 0, 1), collapse = '')
vec[gregexpr('0111', series)[[1]]]
#[1] -0.009997615 -0.014815734
The first line creates a sequence of 0s and 1s depending on the sign of each number. In the second line we locate the pattern '0111' (a negative value followed by three positive ones) with gregexpr; the match positions are the indices of the sign changes. Finally, we use these indices to subset the original vector.
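A regex-free sketch of the same idea (vec being the difference vector from the question, as in the answer above): test each position directly for a negative value followed by three strictly positive ones.
neg <- vec < 0
pos <- vec > 0
idx <- which(neg &
             c(pos[-1], FALSE) &                   # next value positive
             c(pos[-(1:2)], FALSE, FALSE) &        # the one after that positive
             c(pos[-(1:3)], FALSE, FALSE, FALSE))  # and the third one too
vec[idx]   # the same two values as above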
Imagine a vector z:
z <- seq(-2, 2, length.out = 20)
z
#> [1] -2.0000000 -1.7894737 -1.5789474 -1.3684211 -1.1578947 -0.9473684 -0.7368421 -0.5263158
#> [9] -0.3157895 -0.1052632 0.1052632 0.3157895 0.5263158 0.7368421 0.9473684 1.1578947
#> [17] 1.3684211 1.5789474 1.7894737 2.0000000
then you can do
turn_point <- which(z == max(z[z < 0]))
turn_plus_one <- c(turn_point, turn_point + 1)
z[turn_plus_one]
#> [1] -0.1052632 0.1052632
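For a vector with several sign changes (unlike this monotone z), a hedged one-liner is to look at where the sign sequence steps upward:
which(diff(sign(z)) > 0)   # index of the value just before each negative-to-positive change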

How to subset matrices in a list in R?

Currently, I have a list of 500 elements, named List.500. In each element, I have 3 vectors and 1 matrix. The first element is:
> List.500[[1]]
$two_values
$two_values$bin
[1] 0 1
$grid_points$grid
[1] 0.05000000 0.06836735 0.08673469 0.10510204 0.12346939 0.14183673 0.16020408
[8] 0.17857143 0.19693878 0.21530612 0.23367347 0.25204082 0.27040816 0.28877551
$mean_0
[1] 14.48597 14.49662 14.51089 14.52915 14.55242 14.58129 14.61866 14.66572 14.72186
[10] 14.79531 14.88589 14.99356 15.13048 15.29701
$mean_1
[1] 16.48597 16.49662 16.51089 16.52915 16.55242 16.58129 16.61866 16.66572 16.72186
[10] 16.79531 16.88589 16.99356 17.13048 17.29701
$mean_grid
g=0.05 g=0.07 g=0.09 g=0.11 g=0.12 g=0.14 g=0.16 g=0.18 g=0.2
bin=0 14.48597 14.49662 14.51089 14.52915 14.55242 14.58129 14.61866 14.66572 14.72186
bin=1 16.48597 16.49662 16.51089 16.52915 16.55242 16.58129 16.61866 16.66572 16.72186
g=0.22 g=0.23 g=0.25 g=0.27 g=0.29
bin=0 14.79531 14.88589 14.99356 15.13048 15.29701
bin=1 16.79531 16.88589 16.99356 17.13048 17.29701
I would like to subset out only the 1st, 2nd, and 3rd elements from each of the vectors (excluding the first vector, two_values$bin) and the 1st, 2nd, and 3rd columns of the matrix, for each of the 500 elements of List.500. I want to leave two_values$bin alone.
Ideally, I would like to get:
> List.500[[1]]
$two_values
$two_values$bin
[1] 0 1
$grid_points$grid
[1] 0.05000000 0.06836735 0.08673469
$mean_0
[1] 14.48597 14.49662 14.51089
$mean_1
[1] 16.48597 16.49662 16.51089
$mean_grid
g=0.05 g=0.07 g=0.09
bin=0 14.48597 14.49662 14.51089
bin=1 16.48597 16.49662 16.51089
for each of the 500 elements in List.500. Is there a simple way to do this without resorting to breaking the list apart and looping? Thanks.
As commented, you can use rapply. I think the elements of your list follow a pattern, but for the purpose of this demonstration I used the following data.
set.seed(123)
List.500 <- lapply(1:3, function(x)
  list(two_values = list(bin = 0:1),
       grid_points = list(grid = runif(16, 0, .3)),
       mean_0 = runif(14, 14, 16),
       mean_1 = runif(14, 16, 18),
       mean_grid = matrix(runif(28, 14, 18), nrow = 2, byrow = TRUE)))
The following code will do exactly what you wanted: matrices are cut to their first three columns, vectors of length 2 (i.e. bin) are left alone, and every other vector is cut to its first three elements.
rapply(List.500,
       function(x) {
         if (is.matrix(x)) x[, 1:3]
         else if (length(x) == 2) x
         else x[1:3]
       },
       how = "replace")
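As a quick check on the simulated data above (names as assumed there), the vectors other than bin drop to three values and the matrix keeps two rows and three columns:
trimmed <- rapply(List.500,
                  function(x) {
                    if (is.matrix(x)) x[, 1:3]
                    else if (length(x) == 2) x
                    else x[1:3]
                  },
                  how = "replace")
length(trimmed[[1]]$grid_points$grid)   # 3
dim(trimmed[[1]]$mean_grid)             # 2 3
trimmed[[1]]$two_values$bin             # still 0 1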

Interpolate a series over another series in R

Suppose I have a random time series that I want to interpolate over another time series. How would I do this in R?
# generate time interval series from exponential distribution
s = sort(rexp(10))
# scale between 0 and 1
scale01 = function(x){(x-min(x))/(max(x)-min(x))}
s = scale01(s)
> s
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
# generate random normal series
x = rnorm(20)
> x
[1] -0.82530658 0.92289557 0.39827984 -0.62416117 -1.69055539 -0.28164232 -1.32717654 -1.36992509
[9] -1.54352202 -1.09826247 -0.68260576 1.07307043 2.35298180 -0.41472811 0.38919315 -0.27325343
[17] -1.52592682 0.05400849 -0.43498544 0.73841106
# interpolate 'x' over 's' ?
> approx(x,xout=s)
$x
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
$y
[1] NA NA NA NA NA NA NA NA NA
[10] -0.8253066
>
I want to interpolate the series 'x' over the series 's'. Let's assume the time-interval series for 'x' has 20 elements distributed uniformly over the interval [0, 1]. Now I want to interpolate the 10 values of 'x' that occur at the time points described by 's'.
EDIT:
I think this does the job.
> approx(seq(0,1,length.out=20), x, xout=s)
$x
[1] 0.00000000 0.02804113 0.05715588 0.10630185 0.15778932 0.20391987 0.26066608 0.27265697 0.39100373
[10] 1.00000000
$y
[1] -0.8253066 0.1061033 0.8777987 0.3781018 -0.6221134 -1.5566990 -0.3483466 -0.4703429 -1.4444105
[10] 0.7384111
Thanks for your help, guys. I think I now understand how to use the interpolation functions in R. I should really use a time series data structure here.
This isn't meant as a direct answer to the OP's question but rather to illustrate how approx() works, so the OP can formulate a better question.
Your question makes next to no sense as posed. approx() works by taking a reference set of x and y coordinates and then interpolating to find y at n locations over the range of x, or at the xout locations supplied by the user.
So in your call you don't provide y, and x doesn't contain a y component, so I don't see how this can work.
If you want to interpolate s, so you can find time intervals for any value over the range of s, then:
> approx(s, seq_along(s), n = 20)
$x
[1] 0.00000000 0.05263158 0.10526316 0.15789474 0.21052632 0.26315789
[7] 0.31578947 0.36842105 0.42105263 0.47368421 0.52631579 0.57894737
[13] 0.63157895 0.68421053 0.73684211 0.78947368 0.84210526 0.89473684
[19] 0.94736842 1.00000000
$y
[1] 1.00000 26.25815 42.66323 54.79831 64.96162 76.99433 79.67388
[8] 83.78458 86.14656 89.86223 91.98513 93.36233 93.77353 94.19731
[15] 94.63652 95.26239 97.67724 98.74056 99.40548 100.00000
Here $y contains the interpolated values for s at n = 20 equally spaced locations on the range of s (0,1).
Edit: If x represents the series at unstated time intervals, uniform on [0, 1], and you want the interpolated values of x at the time points s, then you need something like this:
> set.seed(1)
> x <- rnorm(20)
> s <- sort(rexp(10))
> scale01 <- function(x) {
+ (x - min(x)) / (max(x) - min(x))
+ }
> s <- scale01(s)
>
> ## interpolate x at points s
> approx(seq(0, 1, length = length(x)), x, xout = s)
$x
[1] 0.00000000 0.04439851 0.11870795 0.14379236 0.20767388 0.21218632
[7] 0.25498856 0.29079300 0.40426335 1.00000000
$y
[1] -0.62645381 0.05692127 -0.21465011 0.94393053 0.39810806 0.29323742
[7] -0.64197207 -0.13373472 0.62763207 0.59390132
Is that closer to what you want?
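A small follow-up sketch (same x and s as above): approxfun() returns the interpolating function itself, which is convenient when the series has to be evaluated at several different sets of time points.
f <- approxfun(seq(0, 1, length.out = length(x)), x)
f(s)                    # same values as the approx() call above
f(c(0.25, 0.5, 0.75))   # or any other set of times in [0, 1]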
