How to subset matrices in a list in R?

Currently, I have a list of 500 elements, named List.500. In each element, I have 3 vectors and 1 matrix. The first element is:
> List.500[[1]]
$two_values
$two_values$bin
[1] 0 1
$grid_points$grid
[1] 0.05000000 0.06836735 0.08673469 0.10510204 0.12346939 0.14183673 0.16020408
[8] 0.17857143 0.19693878 0.21530612 0.23367347 0.25204082 0.27040816 0.28877551
$mean_0
[1] 14.48597 14.49662 14.51089 14.52915 14.55242 14.58129 14.61866 14.66572 14.72186
[10] 14.79531 14.88589 14.99356 15.13048 15.29701
$mean_1
[1] 16.48597 16.49662 16.51089 16.52915 16.55242 16.58129 16.61866 16.66572 16.72186
[10] 16.79531 16.88589 16.99356 17.13048 17.29701
$mean_grid
g=0.05 g=0.07 g=0.09 g=0.11 g=0.12 g=0.14 g=0.16 g=0.18 g=0.2
bin=0 14.48597 14.49662 14.51089 14.52915 14.55242 14.58129 14.61866 14.66572 14.72186
bin=1 16.48597 16.49662 16.51089 16.52915 16.55242 16.58129 16.61866 16.66572 16.72186
g=0.22 g=0.23 g=0.25 g=0.27 g=0.29
bin=0 14.79531 14.88589 14.99356 15.13048 15.29701
bin=1 16.79531 16.88589 16.99356 17.13048 17.29701
I would like to subset out only the 1st, 2nd, and 3rd elements of each vector and the 1st, 2nd, and 3rd columns of the matrix, for each of the 500 elements of List.500. The one exception is the first vector, two_values$bin, which I want to leave alone.
Ideally, I would like to get:
> List.500[[1]]
$two_values
$two_values$bin
[1] 0 1
$grid_points$grid
[1] 0.05000000 0.06836735 0.08673469
$mean_0
[1] 14.48597 14.49662 14.51089
$mean_1
[1] 16.48597 16.49662 16.51089
$mean_grid
g=0.05 g=0.07 g=0.09
bin=0 14.48597 14.49662 14.51089
bin=1 16.48597 16.49662 16.51089
for each of the 500 elements in List.500. Is there a simple way to do this without resorting to breaking the list apart and looping? Thanks.

As commented, you can use rapply. I think the elements of your list follow a pattern, but for the purpose of this demonstration I used the following data:
set.seed(123)
List.500 <- lapply(1:3, function(x) list(two_values = list(bin = 0:1),
                                         grid_points = list(grid = runif(16, 0, .3)),
                                         mean_0 = runif(14, 14, 16),
                                         mean_1 = runif(14, 16, 18),
                                         mean_grid = matrix(runif(28, 14, 18), nrow = 2, byrow = TRUE)))
The following code will do exactly what you wanted.
rapply(List.500,
       function(x) {
         if (is.matrix(x)) {
           x[, 1:3]             # first 3 columns of the matrix
         } else if (length(x) == 2) {
           x                    # leave two_values$bin untouched
         } else {
           x[1:3]               # first 3 elements of each vector
         }
       },
       how = "replace")

Related

fill NA raster cells using focal defined by boundary

I have a raster and a shapefile. The raster contains NAs and I am filling them using the focal function:
library(terra)
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
r[45:60, 45:60] <- NA
r_fill <- terra::focal(r, 5, mean, na.policy="only", na.rm=TRUE)
However, there are some NAs still left, so I do this:
na_count <- terra::freq(r_fill, value = NA)
while (na_count$count != 0) {
  r_fill <- terra::focal(r_fill, 5, mean, na.policy = "only", na.rm = TRUE)
  na_count <- terra::freq(r_fill, value = NA)
}
Once all NAs are filled, I clip the raster again using the shapefile:
r_fill <- terra::crop(r_fill, v, mask = T, touches = T)
This is what my before and after looks like:
I wondered if the while loop is an efficient way to fill the NAs, or whether I can determine up front how many times I have to run focal to fill all the NAs in the raster.
Perhaps we can, or want to, dispense with the while loop altogether by making a better estimate of focal's w= argument in a world where r, the ground truth, isn't available. Were it available, we could readily derive a direct value for w:
r <- rast(system.file("ex/elev.tif", package="terra"))
# and its variants
r2 <- r
r2[45:60, 45:60] <- NA
freq(r2, value=NA) - freq(r, value=NA)
  layer value count
1     0    NA   256
sqrt((freq(r2, value=NA) - freq(r, value=NA))$count)
[1] 16
which might be a good value for w=, and introducing another variant
r3 <- r
r3[40:47, 40:47] <- NA
r3[60:67, 60:67] <- NA
r3[30:37, 30:37] <- NA
r3[70:77, 40:47] <- NA
rm(r)
We no longer have our ground truth. How might we estimate a sensible value for w=? Turning to boundaries() with its default (inner) setting:
r2_bi <- boundaries(r2)
r3_bi <- boundaries(r3)
# examining some properties of r2_bi, r3_bi
freq(r2_bi, value=1)$count
[1] 503
freq(r3_bi, value=1)$count
[1] 579
freq(r2_bi, value=1)$count/freq(r2_bi, value = 0)$count
[1] 0.1306833
freq(r3_bi, value=1)$count/freq(r3_bi, value = 0)$count
[1] 0.1534588
sum(freq(r2_bi, value=1)$count,freq(r2_bi, value = 0)$count)
[1] 4352
sum(freq(r3_bi, value=1)$count,freq(r3_bi, value = 0)$count)
[1] 4352
Taken in reverse order, the sums and freqs suggest that while the total area of the holes (let's call them that) is the same, the holes differ in number, and r2's is generally larger than r3's. This is also clear from the first pair of freqs.
Now we drift into some voodoo, hocus pocus in pursuit of a better edge estimate
sum(freq(r2)$count) - sum(freq(r2, value = NA)$count)
[1] 154
sum(freq(r3)$count) - sum(freq(r3, value = NA)$count)
[1] 154
sqrt(sum(freq(r3)$count) - sum(freq(r3, value = NA)$count))
[1] 12.40967
freq(r2_bi, value=1)$count/freq(r2_bi, value = 0)$count
[1] 0.1306833
freq(r2_bi, value=0)$count/freq(r2_bi, value = 1)$count
[1] 7.652087
freq(r3_bi, value=1)$count/freq(r3_bi, value = 0)$count
[1] 0.1534588
Taking the larger, i.e. the freq(r2_bi ratio of 7.652087:
7.652087/0.1306833
[1] 58.55444
154+58
[1] 212
sqrt(212)
[1] 14.56022
round(sqrt(212)+1)
[1] 16
Well, except for that +1 part, this is maybe still a decent estimate for w=, to be used on both r2 and r3 if called upon to find a better w, and perhaps to obviate the need for the while loop.
Another approach to looking for squares and their edges:
wtf3 <- values(r3_bi$elevation)
wtf2 <- values(r2_bi$elevation)
wtf2_tbl_df2 <- as.data.frame(table(rle(as.vector(is.na(wtf2)))$lengths))
wtf3_tbl_df2 <- as.data.frame(table(rle(as.vector(is.na(wtf3)))$lengths))
names(wtf2_tbl_df2)
[1] "Var1" "Freq"
wtf2_tbl_df2[which(wtf2_tbl_df2$Var1 == wtf2_tbl_df2$Freq), ]
Var1 Freq
14 16 16
wtf3_tbl_df2[which(wtf3_tbl_df2$Freq == max(wtf3_tbl_df2$Freq)), ]
Var1 Freq
7 8 35
35/8
[1] 4.375  # 4 squares of 8, plus 3 more 8-length runs
Bringing in v finally and filling:
v <- vect(system.file("ex/lux.shp", package="terra"))
r2_fill_17 <- focal(r2, 16 + 1 , mean, na.policy='only', na.rm = TRUE)
r3_fill_9 <- focal(r3, 8 + 1 , mean, na.policy='only', na.rm = TRUE)
r2_fill_17_cropv <- crop(r2_fill_17, v, mask = TRUE, touches = TRUE)
r3_fill_9_cropv <- crop(r3_fill_9, v, mask = TRUE, touches = TRUE)
And I now appreciate your while() approach, as your r2 looks better, more naturally transitioned, though the r3 looks fine. In my few, brief experiments with windows smaller than the hole, i.e. focal(r2, 9, ...), I got the sense it would take 2 passes to fill, which suggests focal(r2, 5, ...) would take 4.
I guess it would be worthwhile to further determine the proportion of fill:hole:rast for deciding when to deploy a while loop.
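A rough back-of-the-envelope supports those pass counts. A minimal sketch, assuming square holes and that each focal() pass with na.policy = "only" fills a border ring of floor(w/2) cells around the hole's rim (passes_needed is a name introduced here):
# passes to fill a square hole of side hole_side with window width w
passes_needed <- function(hole_side, w) ceiling((hole_side / 2) / (w %/% 2))
passes_needed(16, 17)  # 1: the window spans the whole hole
passes_needed(16, 9)   # 2 passes, as observed above
passes_needed(16, 5)   # 4 passes, as guessed above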

Calculate mean of every nth element

I have a vector that holds hourly data of 31 days, so it has a length of 31*24 = 744. Now I would like to calculate the mean diurnal cycle of the variable that is included in the vector. In order to do that, the mean value of every hour of the day is needed. For 01 UTC for example, the relevant steps are 1,25,49,73,...,721, for 2 UTC they are 2,26,50,74,...,722 and so forth. So I need something that enables the calculation of the mean value with a 24 element moving window.
Here is some code for an exemplary vector:
set.seed(1)
my.vec <- sample(-20:20, size = 744, replace = T)
The output vector should then be of length 24, of course.
Anybody with a hint?
Another possible solution, using base R:
rowMeans(matrix(my.vec, 24, 31))
#> [1] -0.9354839 -0.3548387 -1.0322581 2.5161290 2.1290323 0.7419355
#> [7] 1.3870968 1.4838710 0.9032258 -1.9032258 4.2903226 -0.4193548
#> [13] -1.9354839 -3.1935484 -2.1935484 2.0322581 0.2580645 2.4193548
#> [19] 0.8064516 0.8064516 5.0645161 -0.5806452 -1.2580645 -0.1290323
Another base R option, filling the matrix by row (one row per day) and taking column means:
set.seed(1)
my.vec <- sample(-20:20, size = 744, replace = T)
m <- matrix(my.vec, 31, byrow = TRUE)
colMeans(m)
#> [1] -0.9354839 -0.3548387 -1.0322581 2.5161290 2.1290323 0.7419355
#> [7] 1.3870968 1.4838710 0.9032258 -1.9032258 4.2903226 -0.4193548
#> [13] -1.9354839 -3.1935484 -2.1935484 2.0322581 0.2580645 2.4193548
#> [19] 0.8064516 0.8064516 5.0645161 -0.5806452 -1.2580645 -0.1290323
Created on 2022-04-25 by the reprex package (v2.0.1)
We can use rollapply from zoo; it should also work with vectors of different lengths:
library(zoo)
out <- colMeans(rollapply(seq_along(my.vec), width = 24, by = 24,
                          FUN = function(i) my.vec[i]))
Checking:
> length(out)
[1] 24
> mean(my.vec[seq(1, length(my.vec), by = 24)])
[1] -0.9354839
> mean(my.vec[seq(2, length(my.vec), by = 24)])
[1] -0.3548387
> out[1:2]
[1] -0.9354839 -0.3548387
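For another quick cross-check, grouping the observations by hour of day and averaging per group gives the same 24 means (a sketch; hour is a name introduced here):
# label each observation with its hour of day (1..24), then average per hour
hour <- (seq_along(my.vec) - 1) %% 24 + 1
tapply(my.vec, hour, mean)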

How can I detect changes from negative to positive value?

I have calculated the differences of my data points and received this vector:
> diff(smooth$a)/(diff(smooth$b))
[1] -0.0099976150 0.0011162606 0.0116275973 0.0247594149 0.0213592319 0.0205187495 0.0179274056 0.0207752713
[9] 0.0231903072 -0.0077549224 -0.0401528643 -0.0477294350 -0.0340842051 -0.0148157337 0.0003829642 0.0160912230
[17] 0.0311189830
Now I want to get the positions (index) where I have a change from negative to positive when the following 3 data points are also positive.
So my output would be like this:
> output
-0.0099976150 -0.0148157337
How could I do this?
One way like this:
series <- paste(ifelse(vec < 0, 0, 1), collapse = '')
vec[gregexpr('0111', series)[[1]]]
#[1] -0.009997615 -0.014815734
The first line creates a sequence of 0s and 1s depending on the sign of each number. In the second line, gregexpr finds the starting position of every '0111' pattern, i.e. a negative value followed by three positive ones. Finally, we use these indices to subset the original vector.
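A regex-free alternative (a sketch added here, not part of the original answer): compare shifted copies of the vector directly, which also hands you the indices.
# the differences from the question, as a plain vector
vec <- c(-0.0099976150,  0.0011162606,  0.0116275973,  0.0247594149,
          0.0213592319,  0.0205187495,  0.0179274056,  0.0207752713,
          0.0231903072, -0.0077549224, -0.0401528643, -0.0477294350,
         -0.0340842051, -0.0148157337,  0.0003829642,  0.0160912230,
          0.0311189830)
n <- length(vec)
# position i qualifies when vec[i] < 0 and the next 3 values are positive
idx <- which(vec[1:(n - 3)] < 0 &
             vec[2:(n - 2)] > 0 &
             vec[3:(n - 1)] > 0 &
             vec[4:n]       > 0)
vec[idx]
#[1] -0.009997615 -0.014815734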
Imagine a vector z:
z <- seq(-2, 2, length.out = 20)
z
#> [1] -2.0000000 -1.7894737 -1.5789474 -1.3684211 -1.1578947 -0.9473684 -0.7368421 -0.5263158
#> [9] -0.3157895 -0.1052632 0.1052632 0.3157895 0.5263158 0.7368421 0.9473684 1.1578947
#> [17] 1.3684211 1.5789474 1.7894737 2.0000000
then you can do
turn_point <- which(z == max(z[z < 0]))
turn_plus_one <- c(turn_point, turn_point + 1)
z[turn_plus_one]
#> [1] -0.1052632 0.1052632

quick standard deviation with weights

I wanted to use a function that would quickly give me the standard deviation of a vector and allow me to include weights for the elements in the vector, i.e.
sd(c(1,2,3)) #weights all equal 1
#[1] 1
sd(c(1,2,3,3,3)) #weights equal 1,1,3 respectively
#[1] 0.8944272
For weighted means I can use wt.mean() from library(SDMTools) e.g.
> mean(c(1,2,3))
[1] 2
> wt.mean(c(1,2,3),c(1,1,1))
[1] 2
>
> mean(c(1,2,3,3,3))
[1] 2.4
> wt.mean(c(1,2,3),c(1,1,3))
[1] 2.4
but the wt.sd function does not seem to provide what I thought I wanted:
> sd(c(1,2,3))
[1] 1
> wt.sd(c(1,2,3),c(1,1,1))
[1] 1
> sd(c(1,2,3,3,3))
[1] 0.8944272
> wt.sd(c(1,2,3),c(1,1,3))
[1] 1.069045
I am expecting a function that returns 0.8944272 for my weighted sd. Preferably I would be using this on a data.frame like:
data.frame(x=c(1,2,3),w=c(1,1,3))
library(Hmisc)
sqrt(wtd.var(1:3,c(1,1,3)))
#[1] 0.8944272
You can use rep to replicate the values according to their weights. Then, sd can be computed for the resulting vector.
x <- c(1, 2, 3) # values
w <- c(1, 1, 3) # weights
sd(rep(x, w))
[1] 0.8944272
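If you'd rather not expand the vector (the weights could be large), the same frequency-weighted sd can be hand-rolled. A minimal sketch, assuming whole-number frequency weights (wt.sd.freq is a name introduced here):
wt.sd.freq <- function(x, w) {
  m <- sum(w * x) / sum(w)                 # weighted mean
  sqrt(sum(w * (x - m)^2) / (sum(w) - 1))  # sample variance with n = sum(w)
}
wt.sd.freq(c(1, 2, 3), c(1, 1, 3))
#[1] 0.8944272
This agrees with sd(rep(x, w)) whenever the weights are whole numbers, and it makes the formula being applied explicit.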

Looping in function using R

require(fracdiff)
# load your data, n is the sample size
x <- matrix(scan("data.txt", n = ), n, 1, byrow = TRUE)
x <- log(x)
x <- x - mean(x)
n <- length(x)
# select the truncation
m <- round(n^(0.7))
# perdx and freq are built from px (periodogram ordinates) and the Fourier
# frequencies; their construction is not shown in the question
perdx <- px[2:n]
fn <- function(h)
{
  lambda <- freq[1:m]^(2*h - 1)
  Gh <- mean(perdx[1:m]*lambda)
  Rh <- log(Gh) - (2*h - 1)*mean(log(freq[1:m]))
  return(Rh)
}
est <- optimize(fn, c(0, 1.5), tol = 0.00001)
hhat <- est$minimum
b <- hhat - 0.5
b
I have this function written in R and I want to loop over m, where m <- round(n^p) and the power p runs from 0.3 to 0.8 (the default is 0.7), so that different values of m are supplied to the function, and ultimately get the vector of b values (one b for each power of n from 0.3 to 0.8). So far I have been unsuccessful. Furthermore, I want to plot the different values of b against m. I'm really hoping someone can suggest how to get this result. Any suggestion is highly appreciated, thank you.
First some dummy data:
set.seed(1983)
freq <- rnorm(100, mean=10)
perdx <- rnorm(100, mean=100, sd=10)
Then your function (slightly shortened and modified: you need to add m as a parameter, since it will change at each iteration), the vector m, and an empty vector b (you do want to allocate the length of the vector upfront):
m <- 1:100
b <- vector(length = length(m))
fn <- function(h, m) {
  lambda <- freq[1:m]^(2*h - 1)
  Gh <- mean(perdx[1:m]*lambda)
  log(Gh) - (2*h - 1)*mean(log(freq[1:m]))
}
And finally your loop:
for (i in seq_along(m)) {
  b[i] <- optimize(fn, c(0, 1.5), tol = 0.00001, m = m[i])$minimum - 0.5
}
b
[1] 0.370809995 0.143004969 0.295652288 0.341975500 0.155323568 -0.004270843 -0.004463482 -0.005151013 -0.019702428 -0.066622938 -0.071558276 -0.051269281 -0.010162819 -0.011613268
[15] -0.043173232 -0.023702358 -0.017404588 -0.041314701 -0.041849379 -0.039659543 -0.042926431 -0.041149212 -0.050584172 -0.051101425 -0.051999900 -0.053473729 -0.007245899 -0.023556513
[29] -0.026109458 -0.035935971 -0.063366257 -0.048185532 -0.051862241 -0.051659993 -0.078318886 -0.080683266 -0.082146068 -0.088776082 -0.095815094 -0.097276217 -0.099827675 -0.090215664
[43] -0.091023273 -0.090649640 -0.091877778 -0.091318363 -0.083812376 -0.091700899 -0.086337626 -0.105456723 -0.105972890 -0.101094946 -0.101748039 -0.101323129 -0.070511638 -0.081105305
[57] -0.072667430 -0.072361640 -0.069692202 -0.067384208 -0.072985712 -0.063617816 -0.064122242 -0.067135980 -0.070663150 -0.069359528 -0.069691113 -0.084422380 -0.073379583 -0.072209507
[71] -0.069132825 -0.067681419 -0.063782326 -0.057532656 -0.063031479 -0.054001810 -0.053523184 -0.051783114 -0.053388449 -0.055742505 -0.052429781 -0.058399275 -0.059529803 -0.059389065
[85] -0.058834476 -0.043061836 -0.045186752 -0.048336234 -0.055597368 -0.065307991 -0.060903775 -0.062518358 -0.062898386 -0.059452595 -0.051983381 -0.049742105 -0.050124722 -0.049212744
[99] -0.041458672 -0.043251041
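Since you also wanted to plot b against m, base graphics will do (a sketch; the labels are just suggestions):
# visualize how the estimate b evolves with the truncation m
plot(m, b, type = "l", xlab = "m (truncation)", ylab = "b")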
