Loading multiple workspaces into R at once

Consider having many *.Rda files in your directory. They all contain exactly one object (in this case, a model obtained from mboost::gamboost), with the added twist that the objects all have the same name ("mod_gam").
Is it possible to load all of them into the workspace at once (and even rename them)?
temp <- list.files(pattern="*.Rda")
models <- lapply(temp, load)
only yields a list of the object names, not the objects themselves:
str(models)
List of 26
$ : chr "mod_gam"
$ : chr "mod_gam"
$ : chr "mod_gam"
... and so on.

My suggestion would be to add an iterative suffix to your objects as they are loaded in. Since you already know that every object loaded in will be called "mod_gam", it makes things a bit easier.
i <- 1
for (each in temp) {
  load(each)  # creates (or overwrites) mod_gam in the workspace
  assign(paste0("mod_gam_", i), mod_gam)  # copy it to mod_gam_1, mod_gam_2, ...
  i <- i + 1
}
This will give you the 26 different objects. Note that this isn't optimal -- I wanted to lapply instead of loop, but I was having trouble figuring out how to iterate the suffix each time I read in a new file.
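For the lapply variant mentioned above, one possible sketch is to load each file into its own temporary environment and return the object from there, so that the objects never overwrite each other. The helper name load_one and the throwaway demo files are assumptions made for illustration; in real use the file list would come from list.files() as in the question.

```r
# Hypothetical helper: load one .Rda file into a private environment
# and return the single object stored under `name`.
load_one <- function(path, name = "mod_gam") {
  env <- new.env()
  load(path, envir = env)
  get(name, envir = env, inherits = FALSE)
}

# Demo data: two throwaway files that each contain an object called mod_gam.
demo_dir <- tempfile("rda_demo")
dir.create(demo_dir)
for (i in 1:2) {
  mod_gam <- list(id = i)  # stand-in for a fitted model
  save(mod_gam, file = file.path(demo_dir, sprintf("fit%02d.Rda", i)))
}

# Real use would be: files <- list.files(pattern = "\\.Rda$")
files <- list.files(demo_dir, pattern = "\\.Rda$", full.names = TRUE)
models <- lapply(files, load_one)
names(models) <- paste0("mod_gam_", seq_along(models))
```

Keeping the models in one named list (models$mod_gam_1, models$mod_gam_2, ...) is usually easier to work with than 26 separate workspace objects.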

Related

create multiple .lsf files with same content but replaceable IDs

I have a 001.lsf file for subject 001 as follows:
#!/bin/bash
#BSUB -J T2_001 # name for the job
export SUBJECTS_DIR=/subjects/001
cd $SUBJECTS_DIR
mri_convert -in 001 -out 001_2 -U 300
I would like to use R to substitute 001 with a list of subjects 002, 003, 004, etc. and generate .lsf files named after each subject.
I have a general idea of writing a loop. Can anyone help with the rest of the loop?
list=c(002, 003, 004...etc)
for i in length(list)
{
df<- read.table("001.lsf")
}
Thank you so much !
Both solutions below could be improved a little by checking for file-clobbering, i.e., by not overwriting an existing file. It sounds like this is a one-time task, so perhaps this isn't critical this time.
Bash
for i in 002 003 004 ; do
  # plain 001 rather than \b001\b: "_" counts as a word character, so \b would skip T2_001 and 001_2
  sed -e "s/001/${i}/g" < 001.lsf > "${i}.lsf"
done
R
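txt1 <- readLines("001.lsf")
for (i in c("002", "003", "004")) {
  # digit lookarounds rather than \\b: "_" is a word character, so \\b001\\b would skip T2_001 and 001_2
  writeLines(gsub("(?<![0-9])001(?![0-9])", i, txt1, perl = TRUE), paste0(i, ".lsf"))
}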
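A minimal sketch of the file-clobbering check mentioned at the top of this answer, assuming it is acceptable to skip a subject whose .lsf file already exists. The function name write_lsf_copies, its arguments and the demo template are hypothetical. The pattern uses digit lookarounds instead of \b, because "_" counts as a word character and T2_001 or 001_2 would otherwise be left unchanged.

```r
# Hypothetical helper: write one copy of the template per id, never overwriting.
write_lsf_copies <- function(template, ids, from_id = "001", out_dir = ".") {
  txt <- readLines(template)
  # Match from_id only when it is not part of a longer run of digits.
  pattern <- paste0("(?<![0-9])", from_id, "(?![0-9])")
  written <- character(0)
  for (id in ids) {
    out <- file.path(out_dir, paste0(id, ".lsf"))
    if (file.exists(out)) {
      message("skipping existing file: ", out)
      next
    }
    writeLines(gsub(pattern, id, txt, perl = TRUE), out)
    written <- c(written, out)
  }
  invisible(written)
}

# Demo in a temporary directory, using the template from the question.
demo_dir <- tempfile("lsf_demo")
dir.create(demo_dir)
template <- file.path(demo_dir, "001.lsf")
writeLines(c("#!/bin/bash",
             "#BSUB -J T2_001 # name for the job",
             "export SUBJECTS_DIR=/subjects/001",
             "cd $SUBJECTS_DIR",
             "mri_convert -in 001 -out 001_2 -U 300"), template)
written <- write_lsf_copies(template, c("002", "003"), out_dir = demo_dir)
```

Running the same call a second time writes nothing, since both target files now exist.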
BTW, in the event your for code is not intended to be pseudo-code:
You are missing parentheses; in R, for loops are always written as:
for (i in some_list) { ... }
i in length(list) will always iterate exactly once: if the list is empty, then it will iterate with i=0; if the list has 1000 elements, then it will iterate once with i=1000. Perhaps you meant:
for (i in seq_along(list)) { ... }
In case you were thinking of it and didn't notice in the previous bullet: one might be tempted to use for (i in 1:length(list)), but see what happens when your list variable is empty. It resolves into for (i in 1:0), which actually loops twice, since 1:0 resolves into [1] 1 0, not an empty vector. That's why I used seq_along (and similarly seq_len), since it returns integer(0) when its argument is empty, and this causes the for loop to not iterate at all.
Lastly, list is technically fine, but re-using function names for variables can be problematic, both in how they perform and what types of errors you get. In this case, if you don't define your vector, length(list) still works without warning or error, because its argument is a function and therefore is of length 1 ... not what you intend. In different circumstances (such as $, [, etc), you might get an error such as
Error: object of type 'builtin' is not subsettable
## or
Error: object of type 'closure' is not subsettable
which is much less intuitive than what you might expect
Error: object 'quux' not found
which clearly indicates that you have not defined your variable.
Both of these issues are avoided if you use an otherwise non-existent name for your variable, such as list_of_ids or listOfIds or quux, depending on your preference for naming convention and name-obscurity.
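The iteration counts described in the bullets above can be checked directly. This is only an illustrative sketch; count_iterations is a hypothetical helper that counts how often a for loop body runs for a given index vector.

```r
# Hypothetical helper: how many times does `for (i in index)` run its body?
count_iterations <- function(index) {
  n <- 0
  for (i in index) n <- n + 1
  n
}

ids <- character(0)  # an empty vector of subject ids

n_length    <- count_iterations(length(ids))    # 1: iterates once, with i = 0
n_colon     <- count_iterations(1:length(ids))  # 2: iterates over 1 and then 0
n_seq_along <- count_iterations(seq_along(ids)) # 0: the loop body never runs

ids3 <- c("002", "003", "004")
n_length3    <- count_iterations(length(ids3))    # 1: a single pass with i = 3
n_seq_along3 <- count_iterations(seq_along(ids3)) # 3: one pass per element
```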

Adding to columns the correct way

I have a simple DT and I would like to add a column to the rest. The code is as follows: (works)
x <- data.table(a=1:5,b=5:1,c=rep(999,5))
for(k in c("a","b")){x[,k] <- x[,..k]+x[,.(c)]}
Now here is the question: Why do I have to use .. for the assignment? Also if I try to use .. in the first case, i.e.
for(k in c("a","b")){x[,..k] <- x[,..k]+x[,.(c)]}
There is an error: "[...]object '..k' not found". This seems strange, that I have to change the syntax within the scope.
Now in dataframe, the equivalent formulation is very clear:
for(k in c("a","b")){x[,k] <- x[,k]+x[,c]} # error with DT
x <- data.frame(a=1:5,b=5:1,c=rep(999,5))
for(k in c("a","b")){x[,k] <- x[,k]+x[,"c"]} # works with dataframe
So I am wondering (1) if the above code is the correct way to do that in datatable (please explain the .. operator, the datatable FAQ 1.1 doesn't address this in particular); and if (2) there are alternative ways to write this in a cleaner way. Thanks for any hints.
From the official introduction (slightly edited for your example):
For those familiar with the Unix terminal, the .. prefix should be
reminiscent of the “up-one-level” command, which is analogous to
what’s happening here – the .. signals to data.table to look for the
k variable “up-one-level”, i.e., in the loop environment
in this case.
So the .. prefix steps outside the data.table, looks for the k variable one level up (in the calling scope), takes its value and uses that to select columns. I am not sure why it was designed this way; perhaps variables from the calling scope are simply not looked up inside [ unless you ask for it.
You can also use the with argument:
x[,k,with=FALSE]
Edit:
I just checked the source code of data.table. They get the called variable from parent.frame(), i.e. the environment from which the function gets called. This is triggered by .. or the with argument. So if you don't use either, the function is not able to get the variables of that environment.
A question about parent.frame() is found here
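Regarding part (2) of the question, here is a hedged sketch of two idioms that are commonly used for this kind of update, assuming the data.table package is installed. The names cols and y are introduced only for illustration. Both variants modify the table by reference instead of re-assigning columns with <-.

```r
library(data.table)

x <- data.table(a = 1:5, b = 5:1, c = rep(999, 5))
cols <- c("a", "b")

# Variant 1: loop over the column names and update each column by reference.
for (k in cols) {
  set(x, j = k, value = x[[k]] + x[["c"]])
}

# Variant 2 (on a fresh table): update all selected columns in one call.
# (cols) on the left-hand side means "the columns named in cols";
# .SD holds the columns listed in .SDcols.
y <- data.table(a = 1:5, b = 5:1, c = rep(999, 5))
y[, (cols) := lapply(.SD, function(v) v + c), .SDcols = cols]
```

Neither variant needs the .. prefix, because the column names are passed as character values (j = k, or (cols) on the left of :=) rather than looked up as symbols.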

How to explore the structure and dimensions of an object in Octave?

In MATLAB, the properties function sounds like a possible equivalent to the commands in R that acquaint you with a particular object in the working environment, providing information about its structure (data.frame, matrix, list, vector) and the types of its variables (character, numeric) (for example, with the R command str()), its dimensions (using perhaps the call dim()), and the names of its variables (names()).
However, this function is not operational in Octave:
>> properties(data)
warning: the 'properties' function is not yet implemented in Octave
I installed the package dataframe as suggested in a comment on the post linked above:
pkg install -forge dataframe and loaded it pkg load dataframe
But I can't find a way of getting a summary of the structure and dimensions of a dataset data.mat in the workspace.
I believe it's a structure consisting of a 4 x 372,550 numerical matrix; two 4 x 46,568 numerical matrices, and a 256 x 1 character matrix. To get this info I had to scroll through many pages of the printout of data.
This info is not available on the Octave IDE, where I get:
Name Class Dimensions
data struc 1 x 1
a far cry from the complexity of the object data.
What is the smart way of getting some of this information about an object in the workspace in Octave?
Following up on the first answer provided, here is what I get with whos:
>> whos
Variables in the current scope:
Attr Name Size Bytes Class
==== ==== ==== ===== =====
data 1x1 7452040 struct
Total is 1 element using 7452040 bytes
This is not particularly informative about what data really contains. In fact, I just found out a way to extract the names inside data:
>> fieldnames(data)
ans =
{
[1,1] = testData
[2,1] = trainData
[3,1] = validData
[4,1] = vocab
}
Now if I call
>> size(data)
ans =
1 1
the output is not very useful. On the other hand, knowing the names of the matrices within data I can do
>> size(data.trainData)
ans =
4 372550
which is indeed informative.
If you type the name of the variable, you'll see information about it. In your case it's a struct, so it'll tell you the field names. Relevant functions are: size, ndims, class, fieldnames, etc.
size(var)
class(var)
etc.
You refer to .mat. Maybe you have a MAT-file, which you can load with load filename. Once loaded you can examine and use the variables in the file.
whos
prints simple information on the variables in memory, most useful to see what variables exist.
Following up on your edited question. This works in Octave:
for s=fieldnames(data)'
s=s{1};
tmp=data.(s);
disp([s,' - ',class(tmp),' - ',mat2str(size(tmp))])
end
It prints basic information of each of the members of the struct. It does assume that data is a 1x1 struct array. Note that a struct can be an array:
data(2).testData = [];
This causes your data struct to become a 1x2 array. This is why size(data) is relevant. class is also important (it's shown in the output of whos). Variables can be of type double (normal arrays) and other numeric types, logical, struct, cell (an array of arrays), or a custom class that you can write yourself.
I highly recommend reading an introductory text on MATLAB/Octave, as it works very differently from R. It's not just a different flavor of language, it's a whole different world.

Find the source file containing R function definition

I come from a python background and am trying to get up to speed with R, so please bear with me
I have an R file - util.R with the following lines:
util.add <- function(a,b) a + b
util.sub <- function(a,b) { a - b }
I source it as follows:
source('path/util.R')
I now have two function objects and want to write a function as follows:
getFilePath(util.add)
that would give me this result
[1] "path/util.R"
Digging into the srcref attribute of one of the loaded functions appears to work, if you go deep enough ...
source("tmp/tmpsrc.R")
str(util.add)
## function (a, b)
## - attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 13 1 31 13 31 1 1
## .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x8fffb18>
srcfile <- attr(attr(util.add,"srcref"),"srcfile")
ls(srcfile)
## [1] "Enc" "filename" "fixedNewlines" "isFile"
## [5] "lines" "parseData" "timestamp" "wd"
srcfile$filename
## [1] "tmp/tmpsrc.R"
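Wrapped up as the getFilePath() function asked for in the question, this could look roughly like the sketch below. It assumes the function was sourced with source references kept (keep.source = TRUE; this is the default in interactive sessions but not under Rscript) and returns NA otherwise. The temporary util.R file is created only for the demo.

```r
# Sketch: return the file a function was sourced from, or NA if unknown.
getFilePath <- function(fn) {
  srcref <- attr(fn, "srcref")
  if (is.null(srcref)) {
    return(NA_character_)  # no source reference was kept for this function
  }
  attr(srcref, "srcfile")$filename
}

# Demo: write the two functions from the question to a temporary util.R.
path <- file.path(tempdir(), "util.R")
writeLines(c("util.add <- function(a,b) a + b",
             "util.sub <- function(a,b) { a - b }"), path)
source(path, keep.source = TRUE)

add_path <- getFilePath(util.add)
sum_path <- getFilePath(sum)  # a builtin has no srcref, so this is NA
```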
I know this was solved years ago, but I've just come across it and realised that there is a bit more to this if you use the body() function.
The raw function has only one attribute, "srcref", which contains the code of the function, along with its own attributes and a class of "srcref" (which dictates how it gets printed).
The body() of a function whose body is wrapped in braces, such as body(util.sub), has three attributes (a brace-less body such as that of util.add is a plain call and carries none):
"srcref" which contains the body of the function stored as a list of expressions.
"srcfile" which contains the source file of the function (which is what you are looking for in this question)
"wholeSrcref" which points to the entire source file.
This gives you an alternative (although slightly slower) method to extract the source file name, attr(body(util.sub),"srcfile")$filename, along with being able to see (although not interact with) the sibling functions (i.e. the other functions loaded from the same source file).
Not sure if it's useful, but it could be interesting.
Let's also not forget about the %@% infix operator for accessing attributes (originally from the {purrr} package, now provided by {rlang}). With it we could use the more succinct (although, again, slower) piece of code:
util.add %@% srcref %@% srcfile

R: environment lookup

I am a bit confused by R's lookup mechanism. When I have the following code
# create chain of empty environments
e1 <- new.env()
e2 <- new.env(parent=e1)
e3 <- new.env(parent=e2)
# set key/value pairs
e1[["x"]] <- 1
e2[["x"]] <- 2
then I would expect to get "2" if I look for "x" in environment e3.
This works if I do
> get(x="x", envir=e3)
[1] 2
but not if I use
> e3[["x"]]
NULL
Could somebody explain the difference? It seems, that
e3[["x"]]
is not just syntactic sugar for
get(x="x", envir=e3)
Thanks in advance,
Sven
These functions are different.
get performs a search for an object in an environment, as well as in the enclosing frames (by default):
From ?get:
This function looks to see if the name x has a value bound to it in the specified environment. If inherits is TRUE and a value is not found for x in the specified environment, the enclosing frames of the environment are searched until the name x is encountered. See environment and the ‘R Language Definition’ manual for details about the structure of environments and their enclosures.
In contrast, the [[ operator does not search enclosing environments.
From ?'[':
Both $ and [[ can be applied to environments. Only character indices are allowed and no partial matching is done. The semantics of these operations are those of get(i, env=x, inherits=FALSE).
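The difference can be seen side by side with the environments from the question. This is a small sketch using only base R; the variable names on the left are introduced for illustration.

```r
# Chain of environments from the question: e3 -> e2 -> e1
e1 <- new.env()
e2 <- new.env(parent = e1)
e3 <- new.env(parent = e2)
e1[["x"]] <- 1
e2[["x"]] <- 2

# get() with the default inherits = TRUE walks up the chain and stops at e2.
inherited <- get("x", envir = e3)

# [[ only looks in e3 itself, where nothing is bound, and returns NULL.
via_brackets <- e3[["x"]]

# get0() with inherits = FALSE shows the same local-only lookup.
local_only <- get0("x", envir = e3, inherits = FALSE)

# exists() distinguishes "bound here" from "bound somewhere up the chain".
found_locally   <- exists("x", envir = e3, inherits = FALSE)
found_inherited <- exists("x", envir = e3, inherits = TRUE)
```

So if the inherited value is wanted, get("x", envir = e3) is the call to use; e3[["x"]] only ever sees bindings made directly in e3.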
