How do I output from a (doubly) recursive function?

How do I get the output of this function in R?
nodes2 <- c(1,2,4,5,10,11,20,21)
getLeftOrder <- function(root, snodes, nodes) {
if ( length(nodes) == 0 ) {
return(snodes)
} else {
nodes <- nodes[-(which(nodes == (2*root)))]
getLeftOrder(2*root, snodes, nodes)
snodes <- c(snodes, root)
cat(root, fill = TRUE)
nodes <- nodes[-(which(nodes == (2*root+1)))]
getLeftOrder(2*root+1, snodes, nodes)
}
}
tnodes <- getLeftOrder(1, vector('integer'), nodes2)
The in-order traversal output printed by cat() is correct, but the value stored in tnodes is not. I would like to avoid using the <<- operator.

The function needs to [at least sometimes] return something interesting, and collect results as recursive calls return. Trying to keep the current logic the same, you want the roots from the first set of recursive calls, the current root, and the roots from the second set of recursive calls. Adding in some message() calls so we can see what's happening:
nodes2 <- c(1,2,4,5,10,11,20,21)
getLeftOrder <- function(root, nodes) {
  message('root: ', root, ' nodes: ', paste(nodes, collapse = ' '))
  if ( length(nodes) == 0 ) {
    return(NULL)
  } else {
    nodes <- nodes[-which(nodes == 2*root)]
    r1 <- getLeftOrder(2*root, nodes)
    message(root)
    nodes <- nodes[-which(nodes == 2*root + 1)]
    r2 <- getLeftOrder(2*root+1, nodes)
    return(c(r1, root, r2))
  }
}
which runs like so:
tnodes <- getLeftOrder(1, nodes2)
#> root: 1 nodes: 1 2 4 5 10 11 20 21
#> root: 2 nodes: 1 4 5 10 11 20 21
#> root: 4 nodes: 1 5 10 11 20 21
#> root: 8 nodes:
#> 4
#> root: 9 nodes:
#> 2
#> root: 5 nodes: 1 10 11 20 21
#> root: 10 nodes: 1 11 20 21
#> root: 20 nodes: 1 11 21
#> root: 40 nodes:
#> 20
#> root: 41 nodes:
#> 10
#> root: 21 nodes: 1 11
#> root: 42 nodes:
#> 21
#> root: 43 nodes:
#> 5
#> root: 11 nodes: 1 20 21
#> root: 22 nodes:
#> 11
#> root: 23 nodes:
#> 1
#> root: 3 nodes:
tnodes
#> [1] 4 2 20 10 21 5 11 1
I still don't quite understand the logic here, though; there's quite likely a simpler way to do this.
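One possible simplification, offered as a sketch rather than as the asker's intended design: assuming the tree uses heap-style numbering (the children of node k are 2k and 2k+1, as the question's code implies), the traversal can test membership with %in% instead of removing elements from nodes. The name in_order is made up for this example.

```r
# In-order traversal of a binary tree given as a vector of node labels.
# Assumption: children of node k are labelled 2k (left) and 2k + 1 (right).
in_order <- function(root, nodes) {
  # A label that is not in the tree contributes nothing to the result
  if (!(root %in% nodes)) return(NULL)
  # Left subtree, then the node itself, then the right subtree
  c(in_order(2 * root, nodes), root, in_order(2 * root + 1, nodes))
}

nodes2 <- c(1, 2, 4, 5, 10, 11, 20, 21)
tnodes <- in_order(1, nodes2)
print(tnodes)
```

This gives the same ordering as the tnodes shown above. It also avoids depending on the fact that nodes[-which(...)] returns an empty vector when nothing matches, which is what ends the recursion in the version above.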

Related

Acoustic complexity index time series output

I have a wav file and I would like to calculate the Acoustic Complexity Index at each second and receive a time series output.
I understand how to modify other settings within a function like seewave::ACI() but I am unable to find out how to output a time series data frame where each row is one second of time with the corresponding ACI value.
For a reproducible example, this audio file is 20 seconds, so I'd like the output to have 20 rows, with each row printing the ACI for that 1-second of time.
library(soundecology)
data(tropicalsound)
acoustic_complexity(tropicalsound)
In fact, I'd like to achieve this for a few other indices, for example:
soundecology::ndsi(tropicalsound)
soundecology::acoustic_evenness(tropicalsound)
You can subset your wav file according to the samples it contains. Since the sampling frequency can be obtained from the wav object, we can get one-second subsets of the file and perform our calculations on each. Note that you have to set the cluster size to 1 second, since the default is 5 seconds.
library(soundecology)
data(tropicalsound)
f <- tropicalsound@samp.rate
starts <- head(seq(0, length(tropicalsound), f), -1)
aci <- sapply(starts, function(i) {
  aci <- acoustic_complexity(tropicalsound[i + seq(f)], j = 1)
  aci$AciTotAll_left
})
nds <- sapply(starts, function(i) {
  nds <- ndsi(tropicalsound[i + seq(f)])
  nds$ndsi_left
})
aei <- sapply(starts, function(i) {
  aei <- acoustic_evenness(tropicalsound[i + seq(f)])
  aei$aei_left
})
This allows us to create a second-by-second data frame representing a time series of each measure:
data.frame(time = 0:19, aci, nds, aei)
#> time aci nds aei
#> 1 0 152.0586 0.7752307 0.438022
#> 2 1 168.2281 0.4171902 0.459380
#> 3 2 149.2796 0.9366220 0.516602
#> 4 3 176.8324 0.8856127 0.485036
#> 5 4 162.4237 0.8848515 0.483414
#> 6 5 161.1535 0.8327568 0.511922
#> 7 6 163.8071 0.7532586 0.549262
#> 8 7 156.4818 0.7706808 0.436910
#> 9 8 156.1037 0.7520663 0.489253
#> 10 9 160.5316 0.7077717 0.491418
#> 11 10 157.4274 0.8320380 0.457856
#> 12 11 169.8831 0.8396483 0.456514
#> 13 12 165.4426 0.6871337 0.456985
#> 14 13 165.1630 0.7655454 0.497621
#> 15 14 154.9258 0.8083035 0.489896
#> 16 15 162.8614 0.7745876 0.458035
#> 17 16 148.6004 0.1393345 0.443370
#> 18 17 144.6733 0.8189469 0.458309
#> 19 18 156.3466 0.6067827 0.455578
#> 20 19 158.3413 0.7175293 0.477261
Note that this is simply a demonstration of how to achieve the desired output; you would need to check the literature to determine whether it is appropriate to use these measures over such short time periods.
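To see what the index arithmetic in the sapply() calls does, here is a small sketch with made-up numbers (a 12-sample signal at 4 samples per second, not the tropicalsound data). Each start offset i selects samples i + 1 through i + f, so the windows are consecutive and do not overlap.

```r
# Hypothetical values for illustration only
f <- 4    # samples per second
n <- 12   # total number of samples (3 seconds)

# Offsets of each one-second window; head(..., -1) drops the final
# offset, which would start a window past the end of the signal
starts <- head(seq(0, n, f), -1)

# Sample indices covered by each window
windows <- lapply(starts, function(i) i + seq(f))
print(starts)
print(windows)
```

The first window covers samples 1 to 4, the second 5 to 8, and the third 9 to 12.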

Indexing multiple text files using R

I have to combine 5 files with the same structure and add a new variable to index the new data frame, but all 5 files are using the same ID.
I successfully combine them but I do not find how to index them. I have tried a few loops, but they were not giving me what I wanted.
# Combining files
path <- "D:/..."
filenames <- list.files(path)
t <- do.call("rbind", lapply(filenames, read.table, header = TRUE))
# Trying indexing with loops:
for (i in 1:length(t$ID)){
t$ID2<-(t$ID+last(t$ID2))
}
I have 5 files, all of them with the same structure, and all of them using the same variable for identification, i.e.
file 1 would have:
ID: 1 1 1 2 2 2 3 3 3
And file 2 to 5 would have exactly the same IDs:
I would like to combine them into a single data frame so I would have this:
ID: 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 1 1 1....
and then name them differently. So I would have:
ID: 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7...
How's this? This code finds the largest ID in the first data frame (i) and adds it to the IDs of the next one (i+1). It then records the largest ID of that updated data frame and uses it as the offset for the one after (i+2).
For this to work, you will have to forego the first do.call(rbind, ...) in your code.
xy1 <- data.frame(id = rep(1:4, each = 4), matrix(runif(4*4 * 3), ncol = 3))
xy2 <- data.frame(id = rep(1:7, each = 3), matrix(runif(3*7 * 3), ncol = 3))
xy3 <- data.frame(id = rep(1:3, each = 5), matrix(runif(3*5 * 3), ncol = 3))
xy <- list(xy1, xy2, xy3)
# First find largest ID of the first data.frame.
maxid <- max(xy[[1]]$id)
# Add previous max to current ID.
for (i in 2:length(xy)) {
  xy[[i]]$id <- maxid + xy[[i]]$id
  maxid <- max(xy[[i]]$id) # calculates largest id to be used next
}
> do.call(rbind, xy)
id X1 X2 X3
1 1 0.881397055 0.113236016 0.58935016
2 1 0.205762300 0.216630633 0.04096480
3 1 0.307112552 0.005092413 0.97769030
4 1 0.457299727 0.329346925 0.09582600
5 2 0.007010529 0.089751397 0.69746047
6 2 0.014806573 0.432586138 0.44480438
7 2 0.534909561 0.108258153 0.82475185
8 2 0.313796157 0.749077837 0.38798818
9 3 0.643547518 0.237040912 0.18304776
10 3 0.725906336 0.186099719 0.61738806
11 3 0.506767958 0.646870554 0.27792817
12 3 0.303638439 0.082478410 0.52484137
13 4 0.360623223 0.182054933 0.48604454
14 4 0.804174231 0.427352128 0.70075198
15 4 0.211255624 0.673377745 0.77251727
16 4 0.474358562 0.430095921 0.03648586
17 5 0.731251361 0.635859860 0.90235962
18 5 0.689463703 0.931878683 0.12179179
19 5 0.256770523 0.413928661 0.89254294
20 6 0.358319709 0.393714347 0.53143877
21 6 0.241538687 0.811901018 0.91577045
22 6 0.445141806 0.015133252 0.70977512
23 7 0.179662683 0.574578297 0.09957555
24 7 0.279302309 0.351412534 0.40911867
25 7 0.826039704 0.852739191 0.58671811
26 8 0.822024888 0.061122387 0.12308001
27 8 0.676081285 0.005285565 0.32040908
28 8 0.302821623 0.511678250 0.14814015
29 9 0.966690845 0.221078055 0.72651928
30 9 0.070768391 0.726477379 0.70431920
31 9 0.178425952 0.223096153 0.41111805
32 10 0.952963096 0.209673890 0.73485060
33 10 0.905570765 0.290359419 0.69499805
34 10 0.976600565 0.448144677 0.36100322
35 11 0.458720466 0.636912805 0.04170255
36 11 0.953471285 0.533102906 0.63543974
37 11 0.574490192 0.975327747 0.94730912
38 12 0.878968237 0.956726315 0.04761167
39 12 0.379196322 0.720179957 0.98719308
40 12 0.217246809 0.066895905 0.44981063
41 12 0.309354927 0.048701078 0.24654953
42 12 0.011187546 0.833095978 0.94793368
43 13 0.590529610 0.240967648 0.42954908
44 13 0.525187039 0.739698883 0.72047067
45 13 0.223469798 0.338660741 0.21820068
46 13 0.359939747 0.831732199 0.27095365
47 13 0.672778236 0.327900275 0.04854854
48 14 0.202447020 0.911963711 0.18576047
49 14 0.858830035 0.003633945 0.25713498
50 14 0.784197766 0.527018979 0.30911792
51 14 0.942135786 0.256841256 0.76965498
52 14 0.488395595 0.716133306 0.89618736
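The same offsetting can be written without an explicit loop. The sketch below is one possible variant, not the answerer's code: it computes each data frame's offset as the cumulative sum of the preceding maxima. The three small data frames are made up to mirror the IDs in the question (1 1 1 2 2 2 3 3 3 in every file).

```r
# Hypothetical stand-ins for the files read in with read.table()
make_df <- function() data.frame(id = rep(1:3, each = 3), value = runif(9))
xy <- list(make_df(), make_df(), make_df())

# Largest id in each data frame
maxes <- vapply(xy, function(d) max(d$id), numeric(1))

# Offset for each data frame: sum of the maxima of all earlier ones
offsets <- c(0, cumsum(maxes)[-length(maxes)])

# Shift the ids, then stack the data frames
xy <- Map(function(d, off) { d$id <- d$id + off; d }, xy, offsets)
combined <- do.call(rbind, xy)
print(combined$id)
```

With these inputs the combined id column runs from 1 to 9, three rows each, which is the layout asked for in the question.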

rpart: How to get the "where" vector for validation dataset?

When fitting with rpart, the result includes a "where" vector that tells which leaf of the tree each record in the training dataset falls on. Is there a function that returns something similar to this "where" vector for a test dataset?
I think the partykit package does what you want
library('rpart')
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit
rpart.plot::rpart.plot(fit)
Check with same data
set.seed(1)
idx <- sample(nrow(kyphosis), 5L)
fit$where[idx]
# 22 30 46 71 16
# 9 3 7 7 3
library('partykit')
fit <- as.party(fit)
predict(fit, kyphosis[idx, ], type = 'node')
# 22 30 46 71 16
# 9 3 7 7 3
Check with new data
dd <- kyphosis[idx, ]
set.seed(1)
dd[] <- lapply(dd, sample)
predict(fit, dd, type = 'node')
# 22 30 46 71 16
# 5 3 7 9 3
## so #46 should meet criteria for the 7th leaf:
with(kyphosis[46, ],
Start >= 8.5 & # node 1
Start < 14.5 & # node 2
Age >= 55 & # node 4
Age >= 111 # node 6
)
# [1] TRUE
As you mention, the function predict.rpart in the rpart package
doesn't have a where option (to show the leaf node number associated
with a prediction).
However, the rpart.predict function in the rpart.plot package
will do this. For example
> library(rpart.plot)
> fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
> rpart.predict(fit, newdata=kyphosis[1:3,], nn=TRUE)
gives (note the node number nn column):
absent present nn
1 0.42105 0.57895 3
2 0.85714 0.14286 22
3 0.42105 0.57895 3
And
> rpart.predict(fit, newdata=kyphosis[1:3,], nn=TRUE)$nn
gives just the where node numbers:
[1] 3 22 3
To show the rule for each prediction use
> rpart.predict(fit, newdata=kyphosis[1:5,], rules=TRUE)
which gives
absent present
1 0.42105 0.57895 because Start < 9
2 0.85714 0.14286 because Start is 9 to 15 & Age >= 111
3 0.42105 0.57895 because Start < 9

Creating and appending to data frame in R (Error: arguments imply differing number of rows: 0, 1)

I am creating and appending to a data frame in R:
dat <- data.frame(nodeA = character(), nodeB = character(), edge = numeric())
for (i in 1:length(countTable)-1){
for (j in i+1:length(countTable)){
vecA = as.numeric(as.character(countTable[i,]))
vecB = as.numeric(as.character(countTable[j,]))
nodeA = row.names(countTable[i,])
nodeB = row.names(countTable[j,])
corCoeff = cor(vecA , vecB , method = "spearman")
dat = rbind(dat, data.frame(nodeA = nodeA, nodeB = nodeB, edge = corCoeff))
}
}
Where a head and structure of the countTable are as follows:
> head(countTable)
Norm One Two Three Four
ENST00000000233 12 28 11 4 8
ENST00000000412 23 44 37 23 45
ENST00000000442 9 12 27 10 22
ENST00000001008 18 98 61 21 31
ENST00000001567 16 7 3 9 12
ENST00000002125 2 4 4 5 1
> str(countTable)
'data.frame': 17972 obs. of 5 variables:
$ Norm : int 12 23 9 18 16 2 4 1 22 14 ...
$ One : int 28 44 12 98 7 4 24 14 39 39 ...
$ Two : int 11 37 27 61 3 4 12 3 69 30 ...
$ Three: int 4 23 10 21 9 5 4 3 271 9 ...
$ Four : int 8 45 22 31 12 1 13 7 123 60 ...
If I run the code inside the nested for loop on its own, it works as I hoped. However, when I run the entire code, I get an error:
Error in data.frame(nodeA = nodeA, nodeB = nodeB, edge = corCoeff) :
arguments imply differing number of rows: 0, 1
In addition: Warning message:
NAs introduced by coercion
The : operator has higher precedence than + and -. Your code should be corrected to:
for (i in 1:(length(countTable)-1)){
for (j in (i+1):length(countTable)){
...
}
}
Note the difference between:
n <- 3
for (i in 1:n-1)
for (j in i+1:n)
cat(sprintf("(%g,%g)\n", i, j))
## (0,1)
## (0,2)
## (0,3)
## (1,2)
## (1,3)
## (1,4)
## (2,3)
## (2,4)
## (2,5)
and:
for (i in 1:(n-1))
for (j in (i+1):n)
cat(sprintf("(%g,%g)\n", i, j))
## (1,2)
## (1,3)
## (2,3)
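A short sketch of the precedence rule in isolation. It also shows a separate point that may matter here, stated as an assumption about the asker's intent: length() of a data frame is its number of columns, so nrow() may be what was meant when looping over rows. The data frame df is a made-up stand-in for countTable.

```r
n <- 3
a <- 1:n - 1     # parsed as (1:n) - 1
b <- 1:(n - 1)   # the range that was probably intended
print(a)
print(b)

# Toy data frame, hypothetical
df <- data.frame(u = 1:4, v = 5:8)
n_cols <- length(df)   # number of columns
n_rows <- nrow(df)     # number of rows
print(c(n_cols, n_rows))
```

Here a starts at 0 while b starts at 1. Indexing a data frame with row 0, as in countTable[0, ], returns zero rows, which is consistent with the "differing number of rows: 0, 1" error in the question.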
You may want something like this. Convert countTable to a matrix and drop down to one loop that pairs row i-1 with row i, so each row is correlated only with the row before it. There is no need to create an empty data frame beforehand.
> countTable <- as.matrix(countTable)
> rn <- rownames(countTable)
> dat <- do.call(rbind, lapply(2:nrow(countTable), function(i){
corCoeff <- cor(countTable[i-1,] , countTable[i,],
method = "spearman", use = "complete.obs")
data.frame(nodeA = rn[i-1], nodeB = rn[i], edge = corCoeff)
}))
> dat
# nodeA nodeB edge
# 1 ENST00000000233 ENST00000000412 0.1538968
# 2 ENST00000000412 ENST00000000442 0.6668859
# 3 ENST00000000442 ENST00000001008 0.7000000
# 4 ENST00000001008 ENST00000001567 -0.8000000
# 5 ENST00000001567 ENST00000002125 -0.5642881
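The single loop above pairs only adjacent rows. If every pair of rows is wanted, as the nested loops in the question suggest, one possible sketch is to compute the full Spearman correlation matrix once and keep its upper triangle. This is an assumption about the goal, not part of the answer above, and it uses only the first three rows of the question's head(countTable). For the full 17972-row table the correlation matrix would have roughly 323 million entries, so memory may be a concern.

```r
# First three rows of head(countTable) from the question
countTable <- data.frame(
  Norm  = c(12, 23, 9),
  One   = c(28, 44, 12),
  Two   = c(11, 37, 27),
  Three = c(4, 23, 10),
  Four  = c(8, 45, 22),
  row.names = c("ENST00000000233", "ENST00000000412", "ENST00000000442")
)

# cor() correlates columns, so transpose to correlate rows
cm <- cor(t(as.matrix(countTable)), method = "spearman")

# Positions above the diagonal: each unordered pair exactly once
idx <- which(upper.tri(cm), arr.ind = TRUE)

dat <- data.frame(
  nodeA = rownames(cm)[idx[, "row"]],
  nodeB = colnames(cm)[idx[, "col"]],
  edge  = cm[idx]
)
print(dat)
```

For the two adjacent pairs that also appear in the output above, the edge values agree (about 0.154 and 0.667).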

under what circumstances does R recycle?

I have two variables, x (holding 5 values) and y (holding 11 values). When I run the expression,
> v <- 2*x +y +1
R responds:
Error at 2* x+y: Longer object length is not a multiple of shorter object length.
I tried 1*x, which gives me the 5 values of x, but y has 11 values. So R says it can't add 11 values to 5 values? This raises the general question: under what circumstances does recycling work?
Recycling works in your example:
> x <- seq(5)
> y <- seq(11)
> x+y
[1] 2 4 6 8 10 7 9 11 13 15 12
Warning message:
In x + y : longer object length is not a multiple of shorter object length
> v <- 2*x +y +1
Warning message:
In 2 * x + y :
longer object length is not a multiple of shorter object length
> v
[1] 4 7 10 13 16 9 12 15 18 21 14
The "error" that you reported is in fact a "warning" which means that R is notifying you that it is recycling but recycles anyway. You may have options(warn=2) turned on, which converts warnings into error messages.
In general, avoid relying on recycling. If you get in the habit of ignoring the warnings, some day it will bite you and your code will fail in some very hard to diagnose way.
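A small sketch of the two cases, using made-up vectors: recycling is silent when the longer length is a whole multiple of the shorter one, and raises a warning otherwise. The tryCatch() call is only there to detect the warning programmatically.

```r
# Lengths 3 and 6: x is recycled exactly twice, no warning
x <- 1:3
y <- 1:6
s <- x + y
print(s)

# Lengths 5 and 11: recycling still happens, but R warns
warned <- tryCatch({
  seq(5) + seq(11)
  FALSE
}, warning = function(w) TRUE)
print(warned)
```

The first sum is computed as c(1, 2, 3, 1, 2, 3) + 1:6, and warned ends up TRUE for the 5 and 11 case.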
It is safer not to work this way. Use vectors of the same length:
x_samelen = c(1,2,3)
y_samelen = c(10,20,30)
x_samelen*y_samelen
[1] 10 40 90
If vectors are of the same length, the result is well defined and understood. You can do "recycling", but it really is not advisable to do so.
I wrote a short script that makes your two vectors the same length by padding the shorter vector with repeats of its own values. This will let you execute your code without warnings:
x_orig <- c(1,2,3,4,5,6,7,8,9,10,11)
y_orig <- c(21,22,23,24,25)
if ( length(x_orig)>length(y_orig) ) {
  x <- x_orig
  y <- head(x = as.vector(t(rep(x=y_orig, times=ceiling(length(x_orig)/length(y_orig))))), n = length(x_orig) )
  cat("padding y\r\n")
} else {
  x <- head(x = as.vector(t(rep(x=x_orig, times=ceiling(length(y_orig)/length(x_orig))))), n = length(y_orig) )
  y <- y_orig
  cat("padding x\r\n")
}
The results are:
x_orig
[1] 1 2 3 4 5 6 7 8 9 10 11
y_orig
[1] 21 22 23 24 25
x
[1] 1 2 3 4 5 6 7 8 9 10 11
y
[1] 21 22 23 24 25 21 22 23 24 25 21
If you reverse x_orig and y_orig:
x_orig
[1] 21 22 23 24 25
y_orig
[1] 1 2 3 4 5 6 7 8 9 10 11
x
[1] 21 22 23 24 25 21 22 23 24 25 21
y
[1] 1 2 3 4 5 6 7 8 9 10 11
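A shorter way to get the same padding, shown as a sketch rather than a replacement for the script above: the length.out argument of rep() repeats a vector until it reaches a given length, so both vectors can be extended to the longer length without an if/else.

```r
x_orig <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
y_orig <- c(21, 22, 23, 24, 25)

# Target length: whichever vector is longer
n <- max(length(x_orig), length(y_orig))

# rep() with length.out leaves the longer vector unchanged
# and cycles the shorter one until it has n elements
x <- rep(x_orig, length.out = n)
y <- rep(y_orig, length.out = n)
print(x)
print(y)
```

This makes the recycling explicit in the code, so the later arithmetic on x and y runs without the length warning.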
