Acoustic complexity index time series output - r

I have a wav file and I would like to calculate the Acoustic Complexity Index at each second and receive a time series output.
I understand how to modify other settings within a function like seewave::ACI() but I am unable to find out how to output a time series data frame where each row is one second of time with the corresponding ACI value.
For a reproducible example, this audio file is 20 seconds, so I'd like the output to have 20 rows, with each row printing the ACI for that 1-second of time.
In fact, I'd like to achieve this for a few other indices, for example:

You can subset your wav file according to the samples it contains. Since the sampling frequency can be obtained from the wav object, we can get one-second subsets of the file and perform our calculations on each. Note that you have to set the cluster size to 1 second, since the default is 5 seconds.
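library(soundecology)  # acoustic indices and the tropicalsound example recording
data(tropicalsound)
f <- tropicalsound@samp.rate                          # samples per second
starts <- head(seq(0, length(tropicalsound), f), -1)  # zero-based offset of each second
aci <- sapply(starts, function(i) {
  aci <- acoustic_complexity(tropicalsound[i + seq(f)], j = 1)
  aci$AciTotAll_left  # total ACI, left channel
})
nds <- sapply(starts, function(i) {
  nds <- ndsi(tropicalsound[i + seq(f)])
  nds$ndsi_left       # NDSI, left channel
})
aei <- sapply(starts, function(i) {
  aei <- acoustic_evenness(tropicalsound[i + seq(f)])
  aei$aei_left        # acoustic evenness, left channel
})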
This allows us to create a second-by-second data frame representing a time series of each measure:
data.frame(time = 0:19, aci, nds, aei)
#> time aci nds aei
#> 1 0 152.0586 0.7752307 0.438022
#> 2 1 168.2281 0.4171902 0.459380
#> 3 2 149.2796 0.9366220 0.516602
#> 4 3 176.8324 0.8856127 0.485036
#> 5 4 162.4237 0.8848515 0.483414
#> 6 5 161.1535 0.8327568 0.511922
#> 7 6 163.8071 0.7532586 0.549262
#> 8 7 156.4818 0.7706808 0.436910
#> 9 8 156.1037 0.7520663 0.489253
#> 10 9 160.5316 0.7077717 0.491418
#> 11 10 157.4274 0.8320380 0.457856
#> 12 11 169.8831 0.8396483 0.456514
#> 13 12 165.4426 0.6871337 0.456985
#> 14 13 165.1630 0.7655454 0.497621
#> 15 14 154.9258 0.8083035 0.489896
#> 16 15 162.8614 0.7745876 0.458035
#> 17 16 148.6004 0.1393345 0.443370
#> 18 17 144.6733 0.8189469 0.458309
#> 19 18 156.3466 0.6067827 0.455578
#> 20 19 158.3413 0.7175293 0.477261
Note that this is simply a demonstration of how to achieve the desired output; you would need to check the literature to determine whether it is appropriate to use these measures over such short time periods.
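The windowing used above does not depend on the audio packages, so it can be checked on a plain vector first. The following is a minimal sketch, assuming a made-up signal of 20 samples and a hypothetical sampling rate of 4 samples per second; `signal`, `window_starts` and `window_means` are names introduced here for illustration only.

```r
# Hypothetical toy signal: 20 samples at 4 samples per second (5 seconds).
signal <- 1:20
f <- 4

# Zero-based offset of each one-second window; head(..., -1) drops the
# final offset, which would start a window past the end of the signal.
window_starts <- head(seq(0, length(signal), f), -1)

# i + seq(f) turns an offset into the f sample indices of that window.
windows <- lapply(window_starts, function(i) signal[i + seq(f)])

# One summary value per window, analogous to one index value per second.
window_means <- sapply(windows, mean)
data.frame(time = seq_along(window_means) - 1, window_means)
```

If the number of samples is not an exact multiple of `f`, this scheme silently drops the trailing partial second, which is usually what you want for a per-second series.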


Vectorized function usage and joining individual terms into a single tibble

The title is vague, but let me explain:
I have a non-vectorized function that outputs a 15-row table of volume estimates for a tree. Each row is a different measurement unit or portion of the input tree. I have a Tables argument to help the user decide what units and measurement protocol they're looking to find, but in 99% of use case scenarios, the output for a single tree's volume estimate is a tibble with more than one row.
I've removed ~20 other arguments from the function for demonstration's sake. DBH is a tree's diameter at breast height. Vol column is arbitrary.
Est1 <- TreeVol(Tables = "All", DBH = 7)
# A tibble: 15 x 3
Tables DBH Vol
<chr> <dbl> <dbl>
1 1. Total_Above_Ground_Cubic_Volume 7 2
2 2. Gross_Inter_1/4inch_Vol 7 4
3 3. Net_Scribner_Vol 7 6
4 4. Gross_Merchantable_Vol 7 8
5 5. Net_Merchantable_Vol 7 10
6 6. Merchantable_Vol 7 12
7 7. Gross_SecondaryProduct_Vol 7 14
8 8. Net_SecondaryProduct_Vol 7 16
9 9. SecondaryProduct 7 18
10 10. Gross_Inter_1/4inch_Vol 7 20
11 11. Net_Inter_1/4inch_Vol 7 22
12 12. Gross_Scribner_SecondaryProduct 7 24
13 13. Net_Scribner_SecondaryProduct 7 26
14 14. Stump_Volume 7 28
15 15. Tip_Volume 7 30
The user can use the Tables argument like so:
Est2 <- TreeVol(Tables = "Scribner_BF", DBH = 7)
# A tibble: 3 x 3
Tables DBH Vol
<chr> <dbl> <dbl>
1 3. Net_Scribner_Vol 7 6
2 12. Gross_Scribner_SecondaryProduct 7 24
3 13. Net_Scribner_SecondaryProduct 7 26
The problem arises in that I'd like to write a vectorized version of this function that can calculate the volume for an entire .csv of tree inventory data. Ideally, I'd like the multi-row outputs that relate to a single tree to output as one long tibble, with each 15-row default output filtered by what the user passes to the Tables argument as so:
Est3 <- VectorizedTreeVol(Tables = "Scribner_BF", DBH = c(7, 21, 26))
# A tibble: 9 x 3
Tables DBH Vol
<chr> <dbl> <dbl>
1 3. Net_Scribner_Vol 7 6
2 12. Gross_Scribner_SecondaryProduct 7 24
3 13. Net_Scribner_SecondaryProduct 7 26
4 3. Net_Scribner_Vol 21 18
5 12. Gross_Scribner_SecondaryProduct 21 72
6 13. Net_Scribner_SecondaryProduct 21 76
7 3. Net_Scribner_Vol 26 8
8 12. Gross_Scribner_SecondaryProduct 26 78
9 13. Net_Scribner_SecondaryProduct 26 84
To achieve this, I wrote a for() loop that acts as the heart of the vectorized function. I've heard from multiple people that it's very inefficient (and I agree), but it works with the principle I'd like to achieve, in theory. Nothing I've found on this topic has suggested a better idea for application in a vectorized function like mine.
The general setup for the loop looks like this:
for(i in 1:length(DBH)){
Output <- VectorizedTreeVol(Tables = Tables[[i]], DBH = DBH[[i]]) %>%
purrr::reduce(dplyr::full_join, by = NULL) %>%
In functions where the non-vectorized output is always a single row, the core of the corresponding vectorized function doesn't need to be wrapped in a for() loop and looks like this:
Output <- OtherVectorizedFunction(Tables = Tables, DBH = DBH) %>%
purrr::reduce(dplyr::full_join, by = ColumnNames) %>% #ColumnNames is a vector with all of the output's column names
This specific call to reduce() has worked pretty well when I've used it to vectorize the other functions in the project, but I'm open to suggestions regarding how to join the output tables. I've been stuck on this dilemma for a few months now, and any help regarding how to achieve what this for() loop is striving for in theory would be awesome. Is having a vectorized function that outputs a tibble like Est3 even possible? Any feedback/comments are much appreciated.
Given this function:
TreeVol <- function(DBH) {
  data.frame(Tables = c("Tree_Vol", "Intercapillary_transfusion", "Woodiness"),
             Vol = c(DBH^2, sqrt(DBH) + 3, sin(DBH)),
             DBH = DBH)  # keep the input so each row can be traced back to its tree
}
We could put our DBH parameters into purrr::map and then bind_rows to get a data.frame.
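library(dplyr)  # provides %>% and bind_rows()
VecTreeVol <- function(DBH) {
  DBH %>%
    purrr::map(TreeVol) %>%  # one data.frame per tree
    bind_rows()              # stack them into a single data.frame
}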
> VecTreeVol(DBH = 1:3)
Tables Vol DBH
1 Tree_Vol 1.0000000 1
2 Intercapillary_transfusion 4.0000000 1
3 Woodiness 0.8414710 1
4 Tree_Vol 4.0000000 2
5 Intercapillary_transfusion 4.4142136 2
6 Woodiness 0.9092974 2
7 Tree_Vol 9.0000000 3
8 Intercapillary_transfusion 4.7320508 3
9 Woodiness 0.1411200 3
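The question also asks for the stacked output to be filtered by what the user passes to Tables. A minimal sketch of that step follows, assuming the same toy `TreeVol()` as above; `VecTreeVolFiltered()` and its `keep` argument are hypothetical names, and base R's `lapply()` plus `rbind()` stand in for the `purrr::map()` and `bind_rows()` pattern so the example has no package dependencies.

```r
# Toy single-tree function, standing in for the real TreeVol().
TreeVol <- function(DBH) {
  data.frame(Tables = c("Tree_Vol", "Intercapillary_transfusion", "Woodiness"),
             Vol = c(DBH^2, sqrt(DBH) + 3, sin(DBH)),
             DBH = DBH)
}

# Hypothetical wrapper: call TreeVol() once per tree, stack the results,
# then keep only the rows whose Tables label was requested.
VecTreeVolFiltered <- function(DBH, keep = NULL) {
  out <- do.call(rbind, lapply(DBH, TreeVol))
  if (!is.null(keep)) {
    out <- out[out$Tables %in% keep, , drop = FALSE]
  }
  rownames(out) <- NULL  # renumber rows after filtering
  out
}

res <- VecTreeVolFiltered(DBH = c(7, 21, 26), keep = c("Tree_Vol", "Woodiness"))
res
```

Filtering after stacking keeps the single-tree function untouched; if the real function is expensive, filtering inside it before the rows are computed may be the better choice.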

R detect zeroes in ts object

Simple question: in R, what's the best way to detect whether there is a zero somewhere in a time series (ts class)? I run X13 (seasonal package) on hundreds of time series and I would like to identify those that contain zero values (since multiplicative models don't work when they encounter a zero). If I could detect those series, I could use an IF-THEN-ELSE statement with the proper specs for X13.
Thank you!
You can delete, replace, or detect them:
ts <- ts(0:10)
## Deleting
ts[ts != 0]
#> [1] 1 2 3 4 5 6 7 8 9 10
## Replacing
replace(ts, ts==0, 1)
#> Time Series:
#> Start = 1
#> End = 11
#> Frequency = 1
#> [1] 1 1 2 3 4 5 6 7 8 9 10
## Detecting
any(ts == 0)
#> [1] TRUE
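For hundreds of series, the same check can be applied to a whole list at once and used to drive the IF-THEN-ELSE choice. This is a minimal sketch, assuming the series are held in a named list; the series values, the list names and the "none"/"log" labels are made up for illustration and would need to be mapped onto whatever X13 specs are appropriate.

```r
# Hypothetical collection of monthly series; names and values are made up.
series_list <- list(
  sales   = ts(c(5, 3, 0, 8, 6, 7, 9, 4, 2, 6, 5, 7), frequency = 12),
  traffic = ts(c(12, 15, 11, 14, 13, 16, 18, 17, 15, 14, 13, 19), frequency = 12)
)

# TRUE for each series that contains at least one zero (NAs are ignored).
has_zero <- vapply(series_list, function(x) any(x == 0, na.rm = TRUE), logical(1))

# Choose a label per series: avoid the log transform when zeros are present.
transform_choice <- ifelse(has_zero, "none", "log")
transform_choice
```

`vapply()` is used instead of `sapply()` so that the result is guaranteed to be a logical vector with one element per series.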

How to measure distances between certain pairs of (pixel) coordinates in R?

I have a dataset of 22 point coordinates (the points represent landmarks on a photo of a fish in lateral view).
I would like to measure 24 distances between these points (24 different measurements), for example the distance between points 1 and 5, and so on.
I would also like to do this in a loop: the same set of 24 distances always has to be measured, and I have 2000 such lists of coordinates.
I tried the "dist" function (see below), but it gave me all possible distances between all points.
LCmeasure <- read.csv("LC_meranie2.csv", sep = ";", dec = ",", header = T)
> LCmeasure
point x y
1 1 1724.00000 1747.00000
2 2 1864.00000 1637.00000
3 3 1862.00000 1760.00000
4 4 2004.00000 1757.00000
5 5 2077.00000 1533.00000
6 6 2134.00000 1933.00000
7 7 2293.00000 1699.00000
8 8 2282.00000 1588.00000
9 9 2728.00000 1576.00000
10 10 2922.00000 1440.00000
11 11 3018.00000 1990.00000
12 12 3282.00000 1927.00000
13 13 3435.00000 1462.00000
14 14 3629.00000 1548.00000
15 15 3948.00000 1826.00000
16 16 3935.00000 1571.00000
17 17 4463.00000 1700.00000
18 18 4661.00000 1978.00000
19 19 4671.00000 1445.00000
20 20 4101.00000 1699.00000
21 21 2203.00000 2806.00000
22 22 4772.00000 2788.00000
df= data.frame(LCmeasure)
Points <- data.frame(p1=c(1,1,1,3,4,5,1,1,1,7,10,10,11,12,12,14,15,11,13,7,20,20,20,1),p2=c(8,2,3,4,8,6,11,10,13,10,13,11,13,13,20,20,16,12,14,9,18,17,19,20))
Dists <- Points %>% rowwise() %>% mutate(dist=dist(filter(LCmeasure, Point %in% c(p1,p2))))
Now I need R to measure only those specific 24 distances, for example between points 1 and 5, then between points 2 and 10, and so on.
I also need to run this in a loop, since the same set of 24 distances will always be measured.
Here is my solution to your problem:
Generate a new dataframe with your desired pairs of points and then use dplyr to generate distances based on those points:
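library(dplyr)  # provides %>%, rowwise(), mutate() and filter()
Points <- data.frame(p1=c(1,2,4,5,6),p2=c(5,10,14,15,17))
# Use only the x and y columns so the point ID does not enter the distance
Dists <- Points %>% rowwise() %>% mutate(dist=as.numeric(dist(filter(LCmeasure, point %in% c(p1,p2))[, c("x","y")])))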
> Dists
> p1 p2 dist
> <dbl> <dbl> <dbl>
> 1 1 5 413.
> 2 2 10 1076.
> 3 4 14 1638.
> 4 5 15 1894.
> 5 6 17 2341.
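The same distances can also be computed for all pairs at once without rowwise(), which may help when the calculation has to be repeated over many coordinate files. The following is a minimal base-R sketch, assuming the column names `point`, `x` and `y` from the question; `pair_distances()` is a hypothetical helper, and only the first five landmarks are typed in to keep the example short.

```r
# First five landmarks from the question's data, typed in by hand.
LCmeasure <- data.frame(
  point = 1:5,
  x = c(1724, 1864, 1862, 2004, 2077),
  y = c(1747, 1637, 1760, 1757, 1533)
)

# Hypothetical subset of the wanted pairs.
Points <- data.frame(p1 = c(1, 1, 3), p2 = c(5, 2, 4))

# Hypothetical helper: Euclidean distance for every (p1, p2) pair at once.
pair_distances <- function(coords, pairs) {
  a <- coords[match(pairs$p1, coords$point), ]  # coordinates of first points
  b <- coords[match(pairs$p2, coords$point), ]  # coordinates of second points
  cbind(pairs, dist = sqrt((a$x - b$x)^2 + (a$y - b$y)^2))
}

Dists <- pair_distances(LCmeasure, Points)
Dists
```

Because the pairs are kept in a separate data frame, the helper could be applied to each of the 2000 coordinate tables in turn, for example with lapply() over a vector of file names.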

Avoid memory increase in foreach loop in R

I am trying to create summary statistics combining two different spatial data sets: a big raster file and a polygon file. The idea is to get summary statistics of the raster values within each polygon.
Since the raster is too big to process at once, I try to create subtasks and process them in parallel, i.e. process each polygon from the SpatialPolygonsDataFrame separately.
The code works fine; however, after around 100 iterations I run into memory problems. Here is my code and what I intend to do:
# session setup
# multicore processing.
# assign three clusters to be used for current R session
cluster = makeCluster(3, type = "SOCK",outfile="")
getDoParWorkers()# check if it worked
# load base data
# bring both data-sets to a common CRS
spodf.malha.2007<-spTransform(spodf.malha.2007,CRSobj = CRS(projargs = proj4string(r.terra.2008)))
proj4string(r.terra.2008)==proj4string(spodf.malha.2007) # should be TRUE
# create a function to extract areas
# apply it on one subset to see if it is working
## parallel loop
# define package(s) to be use in the parallel loop
# try a parallel loop for the first 6 polygons
.packages = l.packages) %dopar% {
print(paste("Processing Polygon ",i, ".",sep=""))
Here the output is a list that looks like this:
9 10
193159 2567
7 9 10 12 14 16
17 256 1084 494 67 15
3 5 6 7 9 10 11 12
2199 1327 8840 8579 194437 1061 1073 1834
14 16
222 1395
3 6 7 9 10 12 16
287 102 728 329057 1004 1057 31
3 5 6 7 9 12 16
21 6 20 495 184261 4765 28
6 7 9 10 12 14
161 161 386 943 205 1515
So the result is rather small and should not be the source of the memory allocation problem. However, running the following loop over the whole polygon dataset, which has more than 32,000 rows, leads to a memory allocation that exceeds 8 GB after around 100 iterations.
# apply the parallel loop on the whole dataset
.packages = l.packages) %dopar% {
print(paste("Processing Polygon ",i, ".",sep=""))
# gc(reset=TRUE) # does not resolve the problem
# closeAllConnections() # does not resolve the problem
What am I doing wrong?
I tried (as suggested in the comments) to remove the object after each iteration in the internal loop, but it did not resolve the problem. I also tried to rule out problems caused by multiple data imports by exporting the objects to the workers' environment in the first place:
clusterExport(cl = cluster,
varlist = c("r.terra.2008","function.landcover.sum","spodf.malha.2007"))
without major changes. My R version is 3.4 on a Linux platform, so the patch linked in the first comment should supposedly already be included in this version. I also tried the parallel package as suggested in the first comment, but it made no difference.
You can try exact_extract() in the exactextractr package. It is a fast and memory-efficient function for extracting values from a raster. The main function is implemented in C++ and usually doesn't need parallelization. Since you did not provide any example data, I am posting an example with real data:
library(raster)         # getData(), values(), ncell()
library(sf)             # st_as_sf()
library(exactextractr)  # exact_extract()
# Pull municipal boundaries for Brazil
brazil <- st_as_sf(getData('GADM', country='BRA', level=2))
# Pull gridded precipitation data
prec <- getData('worldclim', var='prec', res=10)
# Transform the precipitation data into a dummy land use map
lu <- prec[[1]]
values(lu) <- sample(1:10,ncell(lu),replace = T)
# Extract the land use class of each pixel inside each polygon
ex <- exact_extract(lu, brazil)
# Apply table() to the resulting list; only the first 5 elements are used to avoid long output.
# x[,1] is used because by default exact_extract() returns, in the second column, the coverage fraction of each pixel by each polygon
lapply(ex[1:5], function(x) table(x[,1]))
Here is the example output:
1 2 4 6 7 9 10
1 1 1 2 3 1 1
2 3 4 5 6 7 8 10
2 4 3 2 1 2 2 2
1 2 4 6 7 8 9 10
4 5 1 1 4 2 5 5
1 2 3 4 5 6 7 8 9 10
2 2 4 2 2 4 1 4 1 2
3 4 5 6 8 10
2 3 1 1 2 3
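The coverage fraction mentioned above can also be used to weight the counts, so that pixels only partly inside a polygon contribute less. The following is a minimal sketch on made-up data; it assumes each list element is a data frame with columns named `value` and `coverage_fraction`, which should be checked against the actual output.

```r
# Made-up stand-in for one element of the extracted list: cell values plus
# the fraction of each cell covered by the polygon.
x <- data.frame(
  value = c(1, 2, 2, 7, 7, 7),
  coverage_fraction = c(1, 0.5, 0.25, 1, 1, 0.1)
)

# Plain cell counts per class, as table() gives them.
counts <- table(x$value)

# Coverage-weighted counts: partially covered cells contribute less.
weighted <- tapply(x$coverage_fraction, x$value, sum)

counts
weighted
```

Whether plain or weighted counts are more appropriate depends on how small the polygons are relative to the raster cells.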

Draw nearest value from sorted data frame into unsorted data frame

I have two data frames in R. The first data frame is a cumulative frequency distribution (cumFreqDist) with associated periods. The first rows of the data frame look like this:
Time cumfreq
0 0.0000000
4 0.9009009
6 1.8018018
8 7.5075075
12 23.4234234
16 39.6396396
18 53.4534535
20 58.2582583
24 75.3753754
100 100.0000000
The second data frame is 10000 draws from a runif distribution, using the code:
testData <- (runif(10000))*100
For each row in testData, I want to locate the corresponding cumfreq in cumFreqDist and add the corresponding Time value into a new column in testData. Because testData is a test data frame standing in for a real data frame, I do not wish to sort testData.
Because I am dealing with cumulative frequencies, if the testData value is 23.30... the Time value that should be returned is 8. That is, I need to locate the nearest cumfreq value that does not exceed the testData value, and return only that one value.
The data.table package has been mentioned for other similar questions, but my limited understanding is that this package requires a key to be identified in both data frames (after conversion to data tables) and I can't assume that the testData values meet the requirements for being assigned as a key - and it appears that assigning a key will sort the data. This will cause me issues when I set a seed later in further work I am doing.
findInterval() is perfect for this:
set.seed(1L);
cumFreqDist <- data.frame(Time=c(0,4,6,8,12,16,18,20,24,100), cumfreq=c(0.0000000,0.9009009,1.8018018,7.5075075,23.4234234,39.6396396,53.4534535,58.2582583,75.3753754,100.0000000) );
testData <- data.frame(x=runif(10000)*100);
testData$Time <- cumFreqDist$Time[findInterval(testData$x,cumFreqDist$cumfreq)];
head(testData,20L);
## x Time
## 1 26.550866 12
## 2 37.212390 12
## 3 57.285336 18
## 4 90.820779 24
## 5 20.168193 8
## 6 89.838968 24
## 7 94.467527 24
## 8 66.079779 20
## 9 62.911404 20
## 10 6.178627 6
## 11 20.597457 8
## 12 17.655675 8
## 13 68.702285 20
## 14 38.410372 12
## 15 76.984142 24
## 16 49.769924 16
## 17 71.761851 20
## 18 99.190609 24
## 19 38.003518 12
## 20 77.744522 24
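To see how findInterval() treats the boundaries, it can help to run it on a few hand-picked values. This is a minimal sketch using the cumFreqDist table from above; the `probe` vector is made up for illustration and is deliberately left unsorted, since only the lookup vector (cumfreq) has to be sorted.

```r
cumFreqDist <- data.frame(
  Time = c(0, 4, 6, 8, 12, 16, 18, 20, 24, 100),
  cumfreq = c(0.0000000, 0.9009009, 1.8018018, 7.5075075, 23.4234234,
              39.6396396, 53.4534535, 58.2582583, 75.3753754, 100.0000000)
)

# Hand-picked probe values, deliberately unsorted.
probe <- c(23.30, 23.4234234, 0, 99.9, 100)

# Index of the largest cumfreq that does not exceed each probe value.
idx <- findInterval(probe, cumFreqDist$cumfreq)
data.frame(probe, idx, Time = cumFreqDist$Time[idx])
```

Here 23.30 maps to Time 8, as required in the question, while a value exactly equal to a cumfreq entry maps to that entry's own Time. Values below the first cumfreq would give index 0 and therefore drop out of the lookup, which cannot happen here because the table starts at 0.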
