Attempt to access R environment produces unexpected results

I have a bunch of data frames named Ldat.1, Ldat.2, etc., in my normal R environment that I can access interactively.
From the console, I can type:
> dim(Ldat.1)[1]
[1] 40
> dim(Ldat.2)[1]
[1] 39
So I can tell that the first has 40 rows and the second has 39 rows.
However, with dozens of data frames, I want to write a script to tell me how many rows are in each frame.
I tried the following:
print(dim(Ldat.1)[1])
print(dim(Ldat.2)[1])
for (i in 1:2){
namex<-paste("Ldat.",i,sep="")
size<-dim(.GlobalEnv$namex)
print(size[1])
}
and the console showed:
> print(dim(Ldat.1)[1])
[1] 40
> print(dim(Ldat.2)[1])
[1] 39
> for (i in 1:2){
+ namex<-paste("Ldat.",i,sep="")
+ size<-dim(.GlobalEnv$namex)
+ print(size[1])
+ }
NULL
NULL
It's easy enough to construct the strings:
for (i in 1:2){
namex<-paste("Ldat.",i,sep="")
size<-dim(namex)
print(namex)
}
produces:
> for (i in 1:2){
+ namex<-paste("Ldat.",i,sep="")
+ size<-dim(namex)
+ print(namex)
+ }
[1] "Ldat.1"
[1] "Ldat.2"
But despite trying various combinations of "as.data.frame" and "envir=" I can't seem to get R to interpret the string "Ldat.1" as meaning the name of an object accessible from the console.
Thanks in advance.

1) This will list the names of each data frame and the number of rows it has:
sapply(Filter(is.data.frame, mget(ls())), nrow)
If we already had nms, a vector of names of data frames, then we could reduce this to:
nms <- c("Ldat.1", "Ldat.2")
sapply(mget(nms), nrow)
2) Here is another way:
simplify2array(Filter(Negate(is.null), eapply(.GlobalEnv, nrow)))
3) Also try the ll function in the R.oo package.

Try this instead:
for (i in 1:2){
namex<-paste("Ldat.",i,sep="")
size<-dim(.GlobalEnv[[namex]])
print(size[1])
}
The problem has nothing to do with environments and everything to do with the fact that $ does not evaluate its second argument (its first argument being the token name that precedes it, .GlobalEnv in this case). There is no object in .GlobalEnv named "namex". On the other hand, "[[" does an evaluation step, so the value of namex (which is "Ldat.1" during the first pass of the for-loop) will get substituted and the lookup will succeed.
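The difference between $ and [[ can be seen in isolation with a small sketch. The environment e and the data frame below are made up for illustration; the same behaviour should apply to .GlobalEnv.

```r
# Hypothetical environment and object, used only for illustration.
e <- new.env()
assign("Ldat.1", data.frame(x = 1:40), envir = e)
namex <- "Ldat.1"

e$namex                      # NULL: looks for an object literally called "namex"
nrow(e[[namex]])             # namex is evaluated to "Ldat.1" before the lookup
nrow(get(namex, envir = e))  # get() also accepts the name as a string
```

Both [[ and get() take the name as a character value, which is why they work with a name built by paste().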

Related

how to extract multiple function output automatically in R

I have built my own function, where it returns many values. I really need to extract several values at once. For example, suppose the following is my function
myfunc <- function(x,y){
res <- x+y
res2 <- x^2
res3 <- x*2
out <- list()
out$add <- res
out$squ <- res2
out$or <- res3
out$ADD <- res+res2+res3
out$fi <- res^2+res2+res3
return(out)
}
Then,
> myres <- myfunc(3, 4)
> myres
$add
[1] 7
$squ
[1] 9
$or
[1] 6
$ADD
[1] 22
$fi
[1] 64
Suppose I want to extract two values at a time, for example,
myres$add and myres$ADD.
Is there a way to get them automatically in R instead of repeating it? My original function is very complicated, and this would help a lot.
Perhaps, you can try something like this -
res <- myfunc(6, 4)
extract_values <- c('add', 'ADD')
res[extract_values]
#$add
#[1] 10
#$ADD
#[1] 58
You could concatenate them or join in a list:
c(myres$add, myres$squ)
list(myres$add, myres$squ)
If you only want one call to myres you could also index like this:
myres[c(1, 2)]
What you want is known as destructuring, and unfortunately R does not natively support it. There are multiple packages which support this. The one with the (IMHO) nicest syntax is my own package ‘unpack’, which allows you to write positional unpacking as follows:
c[add, ., ., ADD, .] = myfunc(3, 4)
After this, the variables add and ADD are directly available to the caller.
A similar solution (more powerful but with a less nice syntax) is provided by the ‘zeallot’ package.
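For comparison, here is a base-R sketch that needs no extra package. It assumes the result is a named list like the one in the question; the values are typed in by hand as a stand-in for myfunc(3, 4), and the names wanted, picked and total are only illustrative.

```r
# Stand-in for the list returned by myfunc(3, 4) in the question.
myres <- list(add = 7, squ = 9, or = 6, ADD = 22, fi = 64)

wanted <- c("add", "ADD")
picked <- unlist(myres[wanted])   # named numeric vector with elements add and ADD

# with() lets the elements be used by name without repeating myres$
total <- with(myres, add + ADD)
```

This does not create separate variables in the caller the way destructuring does, but it avoids repeating the list name for each element.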

Alternative to assign function in r

I am using the following code in a loop. I am only reproducing the part where I face the problem: the entire code is extremely long, and I have removed the parts between these lines that run fine. This is just to explain the problem:
for (j in 1:2)
{
assign(paste("numeric_data",j,sep="_"),unique_id)
for (i in 1:2)
{
assign(paste("numeric_data",j,sep="_"),
merge(eval(as.symbol(paste("numeric_data",j,sep="_"))),
eval(as.symbol(paste("sd_1",i,sep="_"))),all.x = TRUE))
}
}
The problem that I am facing is that instead of assign in the second step, I want to use (eval+paste)
for (j in 1:2)
{
assign(paste("numeric_data",j,sep="_"),unique_id)
for (i in 1:2)
{
eval(as.symbol((paste("numeric_data",j,sep="_"))))<-
merge(eval(as.symbol(paste("numeric_data",j,sep="_"))),
eval(as.symbol(paste("sd_1",i,sep="_"))),all.x = TRUE)
}
}
However R does not accept eval while assigning new variables. I looked at the forum and everywhere assign is suggested to solve the problem. However, if I use assign the loop overwrites my previously generated "numeric_data" instead of adding to it, hence I get output for only one value of i instead of both.
Here is a very basic intro to one of the most fundamental data structures in R. I highly recommend reading more about them in standard documentation sources.
#A list is a (possibly named) set of objects
numeric_data <- list(A1 = 1, A2 = 2)
#I can refer to elements by name or by position, e.g. numeric_data[[1]]
> numeric_data[["A1"]]
[1] 1
#I can add elements to a list with a particular name
> numeric_data <- list()
> numeric_data[["A1"]] <- 1
> numeric_data[["A2"]] <- 2
> numeric_data
$A1
[1] 1
$A2
[1] 2
#I can refer to named elements by building the name with paste()
> numeric_data[[paste0("A",1)]]
[1] 1
#I can change all the names at once...
> numeric_data <- setNames(numeric_data,paste0("B",1:2))
> numeric_data
$B1
[1] 1
$B2
[1] 2
#...in multiple ways
> names(numeric_data) <- paste0("C",1:2)
> numeric_data
$C1
[1] 1
$C2
[1] 2
Basically, the lesson is that if you have objects with names with numeric suffixes: object_1, object_2, etc. they should almost always be elements in a single list with names that you can easily construct and refer to.
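Applied to the merge loop in the question, the list-based approach might look roughly like the sketch below. The data frames are invented stand-ins for unique_id and for sd_1_1, sd_1_2 (held here as elements of a list called sd_1), and merge_all is a hypothetical helper, not something from the original code.

```r
# Made-up stand-ins for the objects in the question.
unique_id <- data.frame(id = 1:3)
sd_1 <- list(
  data.frame(id = c(1L, 2L), a = c(10, 20)),
  data.frame(id = c(2L, 3L), b = c(5, 6))
)

# Hypothetical helper: merge every piece into the starting frame, one after another.
merge_all <- function(start, pieces) {
  Reduce(function(acc, piece) merge(acc, piece, all.x = TRUE), pieces, start)
}

# One result per j, stored in a named list instead of numeric_data_1, numeric_data_2.
numeric_data <- lapply(1:2, function(j) merge_all(unique_id, sd_1))
names(numeric_data) <- paste0("numeric_data_", 1:2)
```

Because each merge feeds the next one through Reduce(), the columns accumulate instead of being overwritten, and no assign() or eval() is needed.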

Can I make a function in R return a named object?

I want to write a function that creates a time series, but I'd like it to generate the name of the time series as part of the call.
Sort of
makeTS(my.data.frame, string(dateName), string(varName)){
-create time series tsAux from my.data.frame, dateName and varName
-create string tsName
(-the creation of tsAux is not a problem)
assign(tsName, tsAux)
return(tsName)
}
This, perhaps not surprisingly, returns the string tsName, but is there any way that I can make it return a named object?
I've tried with
do.call('<-', list(tsName, tsAux))
and I've also tried using
as.name(tsName) <- tsAux
but nothing seems to work.
I know that
tsName <- makeTS2(my.data.frame, dateName, varName)
would do the trick (where makeTS2() just generates the time series tsAux and returns it), but is there any way to make it work with one function call?
Thanks!
Can you? Sure:
makeTS <- function(dat, varName) {
result <- NA
assign( varName, result, envir = .GlobalEnv )
result
}
> makeTS(NA, "test")
[1] NA
> test
[1] NA
Should you? Almost surely not.
Ari B.'s answer is good. You could also use assign() with a variable.
> makeTS <- function(dat) {
+ return(666)
+ }
> varName <- "tmp"
> tmp
Error: object 'tmp' not found
> assign(varName, makeTS(1))
> tmp
[1] 666
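If the goal is mainly to look up each series by a generated name, one alternative that avoids creating variables by side effect is to store the result in a named list. This is only a sketch: makeTS2 below is a simplified stand-in for the asker's function, and the data frame, column names and the name "sales.ts" are invented.

```r
# Simplified, hypothetical stand-in for makeTS2: order by date, return a ts object.
makeTS2 <- function(dat, dateName, varName) {
  ts(dat[[varName]][order(dat[[dateName]])])
}

my.data.frame <- data.frame(day = 3:1, sales = c(30, 20, 10))

series <- list()
tsName <- "sales.ts"                       # the generated name is just a string
series[[tsName]] <- makeTS2(my.data.frame, "day", "sales")
```

The series can then be retrieved with series[["sales.ts"]], and the function itself stays free of assign().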

R - Return an object name from a for loop

Using a basic function such as this:
myname<-function(z){
nm <-deparse(substitute(z))
print(nm)
}
I'd like the name of the item to be printed (or returned) when iterating through a list e.g.
for (csv in list(acsv, bcsv, ccsv)){
myname(csv)
}
should print:
acsv
bcsv
ccsv
(and not csv).
It should be noted that acsv, bcsv, and ccsv are all dataframes read in from csvs i.e.
acsv = read.csv("a.csv")
bcsv = read.csv("b.csv")
ccsv = read.csv("c.csv")
Edit:
I ended up using a bit of a compromise. The primary goal of this was not to simply print the frame name - that was the question, because it is a prerequisite for doing other things.
I needed to run the same functions on four identically formatted files. I then used this syntax:
for(i in 1:length(csvs)){
cat(names(csvs[i]), "\n")
print(nrow(csvs[[i]]))
print(nrow(csvs[[i]][1]))
}
Then the indexing of nested lists was utilized e.g.
print(nrow(csvs[[i]]))
which shows the row count for each of the dataframes.
print(nrow(csvs[[i]][1]))
Then provides a table for the first column of each dataframe.
I include this because it was the motivator for the question. I needed to be able to label the data for each dataframe being examined.
The list you have constructed no longer "remembers" the expressions it was constructed from. But you can use a custom constructor:
named.list <- function(...) {
l <- list(...)
exprs <- lapply(substitute(list(...))[-1], deparse)
names(l) <- exprs
l
}
And so:
> named.list(1+2,sin(5),sqrt(3))
$`1 + 2`
[1] 3
$`sin(5)`
[1] -0.9589243
$`sqrt(3)`
[1] 1.732051
Use this list as parameter to names, as Thomas suggested:
> names(named.list(1+2,sin(5),sqrt(3)))
[1] "1 + 2" "sin(5)" "sqrt(3)"
To understand what's happening here, let's analyze the following:
> as.list(substitute(list(1+2,sqrt(5))))
[[1]]
list
[[2]]
1 + 2
[[3]]
sqrt(5)
The [-1] indexing leaves out the first element, and all remaining elements are passed to deparse, which works because of...
> lapply(as.list(substitute(list(1+2,sqrt(5))))[-1], class)
[[1]]
[1] "call"
[[2]]
[1] "call"
Note that you cannot "refactor" the call list(...) inside substitute() to use simply l. Do you see why?
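One way to explore that question is to compare the two forms side by side. The function names with_local and with_dots are made up for this sketch.

```r
# substitute() on an ordinary local variable returns the variable's value,
# so the original expressions are already gone.
with_local <- function(...) {
  l <- list(...)
  substitute(l)
}

# substitute() on list(...) returns the unevaluated call with the expressions intact.
with_dots <- function(...) substitute(list(...))

with_local(1 + 2)   # a list holding the evaluated result
with_dots(1 + 2)    # the call list(1 + 2)
```

By the time l exists, its elements have been evaluated, so only the form that captures the dots directly can recover the text of each argument.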
I am also wondering if such a function is already available in one of the countless R packages around. I have found this post by William Dunlap effectively suggesting the same approach.
I don't know what your data look like, so here's something made up:
csvs <- list(acsv=data.frame(x=1), bcsv=data.frame(x=2), ccsv=data.frame(x=3))
for(i in 1:length(csvs))
cat(names(csvs[i]), "\n")
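A variation on the same idea is to loop over the names directly, which gives the label and the data frame in one step. The data are again made up, and row_counts is just an illustrative name.

```r
# Made-up data frames standing in for the ones read from CSV files.
csvs <- list(acsv = data.frame(x = 1), bcsv = data.frame(x = 2:3))

# Loop over the names and use each one to fetch its data frame.
for (nm in names(csvs)) {
  cat(nm, ":", nrow(csvs[[nm]]), "rows\n")
}

# The same information as a named integer vector, without an explicit loop.
row_counts <- vapply(csvs, nrow, integer(1))
```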

how do i pass parameters to subset()?

I am building a GUI which lets me select a subset of a data.frame by clicking on the various factor names.
After having received user input, how do I pass it to the subset function?
e.g.: I have a dataframe df with factors MORNING and EVENING in column timeofday and RECEIVE and SEND in column optype. From the GUI I know that the user wants a subset containing only RECEIVE operations, so I have the following strings as well:
RequestedFactor1 which equals "optype"
RequestedRelationship1 equals "=="
RequestedValue1 which equals "RECEIVE"
What can i do to those strings to pass them to subset, so that I will receive the same output as if i had called subset(df,optype=="RECEIVE") ?
TIA
For this you can use an eval-parse construct, but again I warn that this is actually tricky business. Please read the help files about these two very carefully. So in your case this becomes :
subset(df,eval(parse(text=paste(RF1,RR1,RV1))))
An example to illustrate some tricky parts :
> RF1 <- "optype"
> RR1 <- "=="
> RV1 <- "\"RECEIVE\""
> optype <- c("RECEIVE","Not")
> ifelse(eval(parse(text=paste(RF1,RR1,RV1))),1,0)
[1] 1 0
Mind the escaped quote-marks (\"). This is necessary as you want to test against a string, and not the RECEIVE object. Alternatively you can do :
> RF1 <- "optype"
> RR1 <- "=="
> RV1 <- "Text"
> optype <- c("RECEIVE","Not")
> Text <- "RECEIVE"
> ifelse(eval(parse(text=paste(RF1,RR1,RV1))),1,0)
[1] 1 0
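If the value arrives from the GUI as a plain string, the escaped quotes do not have to be typed by hand: deparse() turns a character value into its quoted source form. A small sketch, with user_value and condition as made-up names:

```r
RF1 <- "optype"
RR1 <- "=="
user_value <- "RECEIVE"            # plain string, as it might arrive from the GUI
optype <- c("RECEIVE", "Not")

# deparse() adds the surrounding quotes, so the text parses as a string literal.
condition <- paste(RF1, RR1, deparse(user_value))
condition
eval(parse(text = condition))
```

The usual caution about eval(parse()) still applies: any text that reaches parse() is run as code, so this is only reasonable when the inputs are controlled.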
The comparison operators in R are actually special functions, so you can use do.call to run the functions, no need for eval and parse and the potential headaches that can come from there. e.g.:
rf1 <- 'Species'
rr1 <- '=='
rv1 <- 'setosa'
subset(iris, do.call(rr1, list( get(rf1), rv1 ) ) )
You need to "get" the variable so that you have the variable value rather than the name, the rest can be the character strings.
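The same idea can be wrapped in a small helper that looks the column up with [[ and restricts the operator to a known set. This is a sketch under assumptions: filter_rows and the list of allowed operators are hypothetical choices, not part of the answer above.

```r
# Hypothetical helper: filter a data frame from three strings supplied by a GUI.
filter_rows <- function(data, column, op, value) {
  allowed <- c("==", "!=", "<", "<=", ">", ">=")
  stopifnot(op %in% allowed, column %in% names(data))
  keep <- match.fun(op)(data[[column]], value)   # e.g. `==`(data$Species, "setosa")
  data[!is.na(keep) & keep, , drop = FALSE]      # drop rows where the test is NA
}

res <- filter_rows(iris, "Species", "==", "setosa")
```

Checking op against a fixed set means an arbitrary function name typed into the GUI cannot be called by accident.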
