Access data deep in a structure using get() - r

I have a structure p in R memory, and I'm trying to access the Rate column of a matrix nested inside it.
When I type p$6597858$Sample in the console, I get ...
p$`6597858`$Sample
     Rate Available    X   Y
[1,] 1.01   1520.93 0.00 0.0
[2,] 1.02    269.13 0.00 0.0
[3,] 1.03    153.19 0.00 0.0
[4,] 1.04    408.80 0.00 0.0
and so on ...
Within my code when I try to
get("p$`6597858`$Sample[,1]")
I get this returned ...
object 'p$`6597858`$Sample[ ,1]' not found
Is this an apostrophe problem?

Neither the $ nor the [[ operator works within get(), because p[[1]] is not a named R object in its own right; it's a component of the object p.
You could try
p <- list(`6597858`=list(Sample=data.frame(Rate=1:3,Available=2:4)))
z <- eval(parse(text="p$`6597858`$Sample[,1]"))
but it's probably a bad idea. Is there a reason that
z <- p[["6597858"]][["Sample"]][,"Rate"]
doesn't do what you want?
You can do this dynamically by using character variables to index, still without using get: for example
needed <- 1234
x <- p[[as.character(needed)]][["Sample"]][,"Rate"]
(edit: suggested by Hadley Wickham in comments) or
x <- p[[c(as.character(needed),"Sample","Rate")]]
(if the second-lowest-level element is a data frame or list: if it's a matrix, this alternative won't work, you would need p[[c(as.character(needed),"Sample")]][,"Rate"] instead)
This is a situation where figuring out the idiom of the language and working with it (rather than struggling against it) will pay off ...
library(fortunes)
fortune(106)
If the answer is parse() you should usually rethink the question.
-- Thomas Lumley
R-help (February 2005)
In general,
extracting elements directly from lists is better (safer, leads to less convoluted code) than using get()
using non-standard names for list elements (i.e. those such as pure numbers that need to be protected by backticks) is unwise
[[ is more robust than $
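For example, here is a small sketch (a toy example, not from the original question) of why [[ is more robust than $: by default $ does partial name matching on lists, while [[ requires an exact match:
p <- list(Sample = 1:3)
p$Samp        # 1 2 3  -- partial matching silently "succeeds" on a misspelled name
p[["Samp"]]   # NULL   -- exact matching exposes the typo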

Does R store the values of recursive functions that it has obtained?

I have used Mathematica for many years and have just started using R for programming.
In both programs we can define recursive functions. In Mathematica there is a way to save values of functions. I am not sure if this is the default setting for R. Take the Fibonacci numbers for example. In Mathematica
fibonacci[0]=1;
fibonacci[1]=1;
fibonacci[n_]:=fibonacci[n-1]+fibonacci[n-2];
Say we want to find fibonacci[10]. The program first needs to compute fibonacci[i] for all i=2, ..., 9.
After that, if we want fibonacci[11], the program has to go through the whole process again to find all fibonacci[i] for i=2, ..., 10; it does not store the values it has already obtained. A modification in Mathematica is
fibonacci[0]=1;
fibonacci[1]=1;
fibonacci[n_]:=fibonacci[n]=fibonacci[n-1]+fibonacci[n-2];
In this way, once we have computed the value fibonacci[10], it is stored, and we do not need to compute it again to find fibonacci[11]. This can save a lot of time when finding, say, fibonacci[10^9].
The Fibonacci function in R can be defined similarly:
fibonacci <- function(n) {
  if (n == 0 || n == 1) n
  else fibonacci(n - 1) + fibonacci(n - 2)
}
Does R store the value fibonacci(10) after we compute it? Does R compute fibonacci(10) again when we want to find fibonacci(11) next? Similar questions have been asked about other programming languages.
Edit: as John suggested, I computed fibonacci(30) (which is 832040) and then fibonacci(31) (which is 1346269). It took longer to get fibonacci(31), so it appears that the R function defined above does not store the values. How can I change the program so that it stores the intermediate values of a recursive function?
R does not do this by default and, as you observed, neither does Mathematica. You can implement memoisation yourself or use the memoise package.
fib <- function(i) {
  if (i == 0 || i == 1)
    i
  else
    fib(i - 1) + fib(i - 2)
}
system.time(fib(30))
# user system elapsed
# 0.92 0.00 1.84
library(memoise)
fib <- memoise(fib) # <== Memoise magic
system.time(fib(100))
# user system elapsed
# 0.02 0.00 0.03
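For reference, here is a minimal hand-rolled memoisation sketch (my own illustration, not from the memoise package), using a local environment as a cache that persists across calls:
fib_memo <- local({
  cache <- new.env()  # maps as.character(n) -> fib(n), shared across calls
  function(n) {
    key <- as.character(n)
    if (exists(key, envir = cache, inherits = FALSE))
      return(get(key, envir = cache))
    result <- if (n < 2) n else fib_memo(n - 1) + fib_memo(n - 2)
    assign(key, result, envir = cache)
    result
  }
})
fib_memo(30)  # 832040; fib_memo(31) now reuses all previously cached values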

R: 'x' values being identical and lambda values being too restricted in boxcoxnc function

I am trying to use the boxcoxnc function in the AID package to transform my data toward normality, using the Shapiro-Wilk W statistic to determine lambda.
I want the boxcoxnc function to run on each column in my data frame in a for loop.
data <- data.frame(data[, 2:27])
for (f in 1:length(data)) {
  model <- boxcoxnc(as.matrix(as.numeric(unlist(data[f]))),
                    method = "sw", lambda = as.numeric(seq(-20, 20, 0.01)))
}
The first three columns work fine, but when I get to the fourth I get this error:
Error in boxcoxnc(as.matrix(as.numeric(unlist(data[f]))), method = "sw", :
Enlarge the range of the lambda
Which I do; I enlarge the range of lambda to (-21, -20, 0.01) and then get the following error on the first column.
Error in shapiro.test(store2[[x]]) : all 'x' values are identical
However, the data are not identical. Only certain columns in my data frame do this, and I do not know why. The fourth column, which triggers the first error, is:
c(1.539, 1.587, 1.558, 1.625, 1.651, 1.659, 1.654, 1.643, 1.53, 1.552,
  1.537, 1.522, 1.559, 1.636, 1.57, 1.631, 1.544, 1.625, 1.552, 1.519,
  1.556, 1.528, 1.616, 1.554, 1.571, 1.534, 1.574, 1.578, 1.574, 1.533,
  1.54, 1.531, 1.561, 1.576, 1.624, 1.593, 1.557, 1.556, 1.559, 1.59)
The first column is:
c(6.301, 6.611, 6.448, 7.049, 7.068, 7.208, 7.215, 7.084, 6.129, 6.471,
  6.295, 5.984, 6.34, 7.052, 6.448, 6.885, 6.42, 6.963, 6.169, 6.185,
  6.289, 6.05, 6.901, 6.333, 6.458, 6.228, 6.458, 6.477, 6.71, 6.296,
  6.147, 6.171, 6.278, 6.667, 6.932, 6.646, 6.369, 6.408, 6.466, 6.688)
Any help is really appreciated.
One thing to always keep in mind with R is that, by design, it tends to just keep going no matter what, coercing and converting data even when some of the transformations do not actually make sense. This produces values and error messages further down the line that do not make sense either, and that may be what is happening here. If you inspect the result of as.matrix(as.numeric(...)) for each of those columns, it is likely not what you expect.
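As a toy illustration (my own example, not the asker's data) of how a silent coercion produces surprising values:
x <- factor(c("1.5", "2.5", "3.5"))
as.numeric(x)                # 1 2 3  -- the factor's internal codes, not the values
as.numeric(as.character(x))  # 1.5 2.5 3.5 -- what was probably intended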
Without knowing exactly how boxcoxnc works, I suggest the following alternative code; it is more readable and may even fix the bug, though that is a big maybe:
for (col in 2:27) {
  model <- boxcoxnc(data[, col], method = "sw", lambda = seq(-20, 20, 0.01))
  # What are you trying to do with model here? It is overwritten on every iteration.
}
Comments:
subsetting the original data is unnecessary since you are iterating through the columns by index.
even though data[col] would work (since a data.frame is in fact a list of columns), it is more appropriate to write data[,col]. Also, instead of length(data) you should write ncol(data), but that expression is gone now anyway.
as.matrix(as.numeric(unlist(...))) seems unnecessary here and is just an opportunity for something to go wrong through an unintended conversion. Perhaps as.numeric is needed if boxcoxnc is particular and really cannot accept anything but a numeric vector.
as.numeric(seq(...)) can be just seq(...); it would be surprising if seq returned anything but a numeric vector.
Now, something you should consider: perhaps some of those columns do not contain numeric data. If they hold numbers stored as strings, then yes, you need as.numeric. Can you confirm that every column contains only numeric- or integer-typed data? Strings or factors would be problematic and might be the root cause of your issues. What is the result of:
sapply(data, class)
By the way, apply methods are often preferable to for loops, so perhaps you want to go that route; you could do something like this:
models <- sapply(data[, 2:27], function(col) {
  boxcoxnc(col, method = "sw", lambda = seq(-20, 20, 0.01))
})
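One more hedged sketch: if sapply(data, class) reveals character or factor columns (assuming they hold numbers stored as text), you could convert everything up front before calling boxcoxnc:
data[] <- lapply(data, function(col) as.numeric(as.character(col)))
# as.character first: as.numeric applied directly to a factor would return
# its internal integer codes rather than the printed values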

Recycling higher-dimensional arrays

I was surprised to find that R's recycling didn't apply in higher dimensions:
> str(Z)
num [1:5, 1:100, 1:10] 1.02 0.989 2.555 1.167 -0.835 ...
> str(w)
num [1:5, 1:100] 1.43 7.84 6.13 2.91 2.8 ...
> Z + w
Error in Z + w : non-conformable arrays
whereas I expected the 2d matrix w to be recycled along the 3rd dimension of Z. I get the same error with a matrix w with dimensions like the last 2 of Z (as with numpy's broadcasting rule). I figured when recycling R would simply flatten each array in the order of the dimensions (C style) and add them, then reshape them back, which would work in however many dimensions. Is there a right way to recycle a matrix like I'm trying to? I guess I could do the flattening and reshaping myself by manipulating the dim attributes, but obviously would prefer not to do the work myself.
The language definition has this line: "That is, if for instance you add c(1, 2, 3) to a six-element vector then you will really add c(1, 2, 3, 1, 2, 3)." Can anyone who has looked under the hood tell me whether R is literally creating a new longer vector from the shorter, to conform to the other operand, and then applying the operator? I had been assuming recycling was more space-efficient. If not, then I might as well achieve the higher-dimensional recycling by creating a 3-way array from the matrix. I imagine there is some package for multiway arrays/tensors but I would prefer to use base.
Implicit recycling only works with vectors. A solution for matrix recycling is to use the sweep function, as documented here. In your case, try
sweep(Z, 1:2, w, FUN = "+")
The second argument specifies which dimensions of Z are preserved.
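A small self-contained sketch (toy data with the same shapes as in the question) to check the behaviour:
Z <- array(rnorm(5 * 100 * 10), dim = c(5, 100, 10))
w <- matrix(rnorm(5 * 100), nrow = 5)
out <- sweep(Z, 1:2, w, FUN = "+")   # adds w to every slice Z[, , k]
all.equal(out[, , 3], Z[, , 3] + w)  # TRUE
# Since R stores arrays in column-major order, plain vector recycling happens
# to give the same result when the matrix spans the first dimensions:
all.equal(out, Z + c(w))             # TRUE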

How can I set a minimum sequence length in cSPADE using R programming

Assuming my cSPADE output is similar to the following, how can I exclude the single-item sequences? I am more concerned with finding patterns among two or more elements (rows 5-8 below). Is there any way I can set a minimum length?
  sequence    support
1 <{A}>          1.00
2 <{B}>          1.00
3 <{D}>          0.50
4 <{F}>          1.00
5 <{A,F}>        0.75
6 <{B,F}>        1.00
7 <{D},{F}>      0.50
8 <{D},{B,F}>    0.50
I know this is an old question, but I wanted to share an answer I came up with after personally failing to find much help on this topic, in case anyone else stumbles on it.
I could not find an option in cspade itself that prevents these sequences from being output in the first place, but you can eliminate them after the fact.
What you can do is use the size() function from the arulesSequences package. See ?size for additional details; assuming you saved the cspade output as "seq", you can subset it like so:
myupdatedseq <- seq[size(seq, "itemsets") > 1]
or equivalently,
myupdatedseq <- subset(seq, subset = size(x, "itemsets") > 1)
See ?subset as well in the arulesSequences package for additional help subsetting sequences.
What worked for me was:
myupdatedseq <- subset(seq, size(x) > 1)
or, if you want to convert to a data frame:
as(subset(seq, size(x) > 1), "data.frame")
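To make this concrete, here is a minimal end-to-end sketch (my own example, using the zaki demo data that ships with arulesSequences):
library(arulesSequences)
data(zaki)                                  # small example transaction data
seqs <- cspade(zaki, parameter = list(support = 0.4))
longer <- subset(seqs, size(x) > 1)         # drop single-item sequences
as(longer, "data.frame")                    # inspect sequences and supports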

R's signal package's filter not matching with Matlab's filter function

In Matlab, there is a 1-D filter function http://www.mathworks.com/help/matlab/ref/filter.html .
In R's signal package, the description of its filter function states: Generic filtering function. The default is to filter with an ARMA filter of given coefficients. The default filtering operation follows Matlab/Octave conventions.
However, the answers don't match if I give the same specification.
In MATLAB (correct answer):
x=[4 3 5 2 7 3]
filter(2/3,[1 -1/3],x,x(1)*1/3)
ans =
4.0000 3.3333 4.4444 2.8148 5.6049 3.8683
In R, if I follow Matlab/Octave's convention (incorrect answer):
library(signal)
x<-c(4,3,5,2,7,3)
filter(2/3,c(1,-1/3),x,x[1]*1/3)
Time Series:
Start = 1
End = 6
Frequency = 1
[1] 3.111111 3.037037 4.345679 2.781893 5.593964 3.864655
I tried a lot of other examples too. R's signal package's filter function doesn't appear to follow the Matlab/Octave conventions even though the documentation says it does. Perhaps I'm using the filter function incorrectly in R. Can someone help me?
I believe the answer is in the documentation (shock!!!!)
matlab:
The filter is a "Direct Form II Transposed"
implementation of the standard difference equation:
a(1)*y(n) = b(1)*x(n) + b(2)*x(n-1) + ... + b(nb+1)*x(n-nb)
            - a(2)*y(n-1) - ... - a(na+1)*y(n-na)
If a(1) is not equal to 1, filter normalizes the filter coefficients by a(1).
[emphasis mine]
R:
a[1]*y[n] + a[2]*y[n-1] + … + a[n]*y[1] = b[1]*x[n] + b[2]*x[n-1] + … + b[m]*x[1]
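To see concretely what Matlab computes here, this hand-rolled sketch (my own check, independent of either package) iterates Matlab's difference equation directly, treating the fourth argument as the initial delay state zi:
x <- c(4, 3, 5, 2, 7, 3)
zi <- x[1] * 1/3                    # Matlab's 4th argument: the initial state
y <- numeric(length(x))
y[1] <- (2/3) * x[1] + zi           # b(1)*x(1) plus the initial state
for (n in 2:length(x)) {
  y[n] <- (2/3) * x[n] + (1/3) * y[n - 1]   # b = 2/3, a = c(1, -1/3), rearranged
}
round(y, 4)
# 4.0000 3.3333 4.4444 2.8148 5.6049 3.8683  -- matches Matlab's output above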
Thanks for raising this issue a couple of years back... I bumped into it as well and think I have an answer. Essentially, the two functions seem to handle the filter's initial conditions differently.
If no initial state is provided (that is, the initial values default to zero in both R and Matlab), the results agree.
R
library(signal)
x<-c(4,3,5,2,7,3)
filter(2/3,cbind(1,-1/3),x, 0.00)
2.666667 2.888889 4.296296 2.765432 5.588477 3.862826
Matlab
x=[4 3 5 2 7 3]
filter(2/3,[1 -1/3],x,0.00)
2.6667 2.8889 4.2963 2.7654 5.5885 3.8628
Now, if we start tweaking the initial state, the results diverge.
R
library(signal)
x<-c(4,3,5,2,7,3)
filter(2/3,cbind(1,-1/3),x, 0.05)
2.683333 2.894444 4.298148 2.766049 5.588683 3.862894
Matlab
x=[4 3 5 2 7 3]
filter(2/3,[1 -1/3],x,0.05)
2.7167 2.9056 4.3019 2.7673 5.5891 3.8630
Hope it helps!
