create multiple .lsf files with same content but replaceable IDs - r

I have a 001.lsf file for subject 001 as follows:
#!/bin/bash
#BSUB -J T2_001 # name for the job
export SUBJECTS_DIR=/subjects/001
cd $SUBJECTS_DIR
mri_convert -in 001 -out 001_2 -U 300
I would like to use R to substitute 001 with each subject in a list (002, 003, 004, etc.) and generate a .lsf file named after each subject.
I have a general idea of how to write the loop. Can anyone help with the rest of it?
list=c(002, 003, 004...etc)
for i in length(list)
{
df<- read.table("001.lsf")
}
Thank you so much !

Both could be improved a little by doing checks for file-clobbering, i.e., don't over-write an existing file. It sounds like this is a one-time task, so perhaps this isn't critical this time.
Bash
for i in 002 003 004 ; do
# plain match: \b001\b would skip T2_001 and 001_2, because "_" counts as a word character
sed -e "s/001/${i}/g" < 001.lsf > "${i}.lsf"
done
R
txt1 <- readLines("001.lsf")
for (i in c("002", "003", "004")) {
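# digit lookarounds instead of \\b: "_" is a word character, so \\b001\\b would skip T2_001 and 001_2
writeLines(gsub("(?<![0-9])001(?![0-9])", i, txt1, perl = TRUE), paste0(i, ".lsf"))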
}
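For the file-clobbering check mentioned above, here is a minimal sketch in R. It assumes the template content from the question and writes everything into a temporary directory so that it can be run anywhere; the names out_dir, subject_ids and written are made up for the illustration.

```r
# Self-contained sketch: the template is recreated in a temporary directory
# (an assumption for illustration; in practice 001.lsf already exists).
out_dir <- tempfile("lsf_")
dir.create(out_dir)
template <- file.path(out_dir, "001.lsf")
writeLines(c(
  "#!/bin/bash",
  "#BSUB -J T2_001 # name for the job",
  "export SUBJECTS_DIR=/subjects/001",
  "cd $SUBJECTS_DIR",
  "mri_convert -in 001 -out 001_2 -U 300"
), template)

txt1 <- readLines(template)
subject_ids <- c("002", "003", "004")  # hypothetical IDs
written <- character(0)

for (id in subject_ids) {
  target <- file.path(out_dir, paste0(id, ".lsf"))
  if (file.exists(target)) {
    # Do not overwrite a file that is already there.
    message("Skipping existing file: ", target)
    next
  }
  # Replace 001 only where it is not part of a longer run of digits.
  writeLines(gsub("(?<![0-9])001(?![0-9])", id, txt1, perl = TRUE), target)
  written <- c(written, target)
}
```

Running the loop a second time would write nothing and only emit the "Skipping" messages.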
BTW, in the event your for code is not intended to be pseudo-code, then:
you are missing parens; in R, for loops are always written as:
for (i in some_list) { ... }
i in length(list) will always iterate exactly once: if the list is empty, then it will iterate with i=0; if the list has 1000 elements, then it will iterate once with i=1000. Perhaps you meant:
for (i in seq_along(list)) { ... }
in case you were thinking it and didn't notice in the previous bullet, one might be tempted to use for (i in 1:length(list)), but see what happens when your list variable is empty: it resolves into for (i in 1:0) which actually loops twice, since 1:0 resolves into [1] 1 0, not an empty vector. That's why I used seq_along (and similarly seq_len), since it returns integer(0) when its argument is empty ... and this causes the for loop to not iterate.
Lastly, list is technically fine, but re-using function names for variables can be problematic, both in how they perform and what types of errors you get. In this case, if you don't define your vector, length(list) still works without warning or error, because its argument is a function and therefore is of length 1 ... not what you intend. In different circumstances (such as $, [, etc), you might get an error such as
Error: object of type 'builtin' is not subsettable
## or
Error: object of type 'closure' is not subsettable
which is much less intuitive than what you might expect
Error: object 'quux' not found
which clearly indicates that you have not defined your variable.
Both of these issues are avoided if you use an otherwise non-existent name for your variable, such as list_of_ids or listOfIds or quux, depending on your preference for naming convention and name-obscurity.
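To make the loop-count points concrete, here is a small sketch that counts iterations over an empty vector with each of the three forms. The variable names are only for illustration.

```r
ids <- character(0)  # an empty vector of subject IDs

# for (i in length(ids)): iterates exactly once, with i = 0
count_length <- 0
for (i in length(ids)) count_length <- count_length + 1

# for (i in 1:length(ids)): 1:0 is c(1, 0), so this iterates twice
count_one_to_n <- 0
for (i in 1:length(ids)) count_one_to_n <- count_one_to_n + 1

# for (i in seq_along(ids)): integer(0), so the body never runs
count_seq_along <- 0
for (i in seq_along(ids)) count_seq_along <- count_seq_along + 1
```

After this runs, count_length is 1, count_one_to_n is 2 and count_seq_along is 0.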

Related

How can I check if a passed entity supports the dollar operator?

Lists and environments support the dollar operator in R, so I can do lst$whatever and env$whatever. Other entities, like atomic vectors, do not; for example, I can't do vctr$whatever.
Is there a way to programmatically know if a passed entity supports the dollar operator?
Checking for names() is apparently not good enough, because vectors can have names but are still not dollar indexable. ls() may seem a good approach, but it requires that the entity can be converted to an environment, which may not always be the case.
There's no method that will tell you for sure if something will respond to the $ function. But even if there were, there's no guarantee what the $ would do. The $ is generic and classes are free to redefine how it behaves. For example, it could be used to draw a plot
foo <- function(x) {
structure(x, class="foo")
}
`$.foo`<-function(x, v, ...) {
plot(seq.int(nchar(v)), seq.int(nchar(v)), main=v)
}
x <- foo(5)
x$hello
So just because it will respond to $ doesn't mean it will actually return/extract a value.
If you expect $ to have a certain behavior, then you should test for classes that actually have that behavior. If you want to just try to use $, you can always just catch the error in a tryCatch. Here we just return NULL when it fails but you could return whatever you like.
tryCatch(thing$whatever, error=function(e) NULL)
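As a quick illustration of the tryCatch approach, here is a sketch applied to the three kinds of objects from the question. The object names follow the question; the values are made up.

```r
lst <- list(whatever = 1)
env <- new.env()
env$whatever <- 2
vctr <- c(whatever = 3)

from_list   <- tryCatch(lst$whatever,  error = function(e) NULL)  # list: works
from_env    <- tryCatch(env$whatever,  error = function(e) NULL)  # environment: works
from_vector <- tryCatch(vctr$whatever, error = function(e) NULL)  # atomic vector: error caught

# Caveat: a missing element of a list also yields NULL, without any error.
from_missing <- tryCatch(lst$nothing_here, error = function(e) NULL)
```

Note that NULL alone cannot distinguish "$ is not supported" from "the element does not exist"; if that matters, return a distinct sentinel from the error handler instead.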

declaration of variables in R

I have a problem using a variable in RStudio. My code is as follows. "child_birth" is a vector of 49703 strings that give some information about the birth of children. What I do here is test whether the last 7 characters of each element of the vector are "at home". So I used a for loop and an if statement: if it is "at home", then the corresponding element in the vector "GetValue" will be TRUE.
forloop <- (1:49703)
for (i in forloop){
temp <- child_birth[i]
if (substr(temp, nchar(temp)-6, nchar(temp)) == "at home" ) {
GetValue[i] = TRUE
}
else{ GetValue[i] = FALSE }
}
I googled it to make sure that in R I don't need a predeclaration before using a variable. But when I ran the code above, I got the error message "Error: object 'GetValue' not found". So what's the problem with it?
Thank you!
GetValue[i] only makes sense if GetValue (and i) exist. Compare: x+i only makes sense if x and i exist, which has nothing to do with whether or not x and i must be declared before being used.
In this case, you need to define GetValue before the loop. I recommend
GetValue <- logical(length(child_birth))
so as to allocate enough space. In this case, you could drop the else clause completely since the default logical value is FALSE.
I also recommend dropping the variable forloop and using
for(i in seq_along(child_birth))
Why hard-wire in the magic number 49703? Such numbers are subject to change. If you put them explicitly in the code, you are setting yourself up for future bugs.
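Putting these recommendations together, a sketch might look like the following. The three sample strings are invented stand-ins for the real child_birth data. The endsWith() line is an alternative that is not part of the loop advice above; it is a base R function (R 3.3.0 or later) that does the same test without a loop.

```r
# Hypothetical stand-in for the real 49703-element vector
child_birth <- c("born at home", "born in hospital", "delivered at home")

# Allocate the result first; every element starts out as FALSE
GetValue <- logical(length(child_birth))

for (i in seq_along(child_birth)) {
  temp <- child_birth[i]
  if (substr(temp, nchar(temp) - 6, nchar(temp)) == "at home") {
    GetValue[i] <- TRUE
  }
}

# Vectorized alternative, no loop and no preallocation needed
GetValue2 <- endsWith(child_birth, "at home")
```

Both GetValue and GetValue2 end up as TRUE, FALSE, TRUE for the sample data.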

return value of if statement in r

So, I'm brushing up on how to work with data frames in R and I came across this little bit of code from https://cloud.r-project.org/web/packages/data.table/vignettes/datatable-intro.html:
input <- if (file.exists("flights14.csv")) {
"flights14.csv"
} else {
"https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
Apparently, this assigns the strings (character vectors?) in the if and else statements to input based on the conditional. How is this working? It seems like magic. I am hoping to find somewhere in the official R documentation that explains this.
From other languages I would have just done:
if (file.exists("flights14.csv")) {
input <- "flights14.csv"
} else {
input <- "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
Or, in R, there is ifelse, which also seems designed to do exactly this, but somehow that first example also works. I can memorize that this works but I'm wondering if I'm missing the opportunity to understand the bigger picture about how R works.
From the documentation on the ?Control help page under "Value"
if returns the value of the expression evaluated, or NULL invisibly if none was (which may happen if there is no else).
So the if statement is kind of like a function that returns a value. The value that's returned is the result of evaluating either the if block or the else block. When you have a block in R (code between {}), the brackets are also like a function that just returns the value of the last expression evaluated in the block. And a string literal is a valid expression that returns itself.
So these are the same
x <- "hello"
x <- {"hello"}
x <- {"dropped"; "hello"}
x <- if(TRUE) {"hello"}
x <- if(TRUE) {"dropped"; "hello"}
x <- if(TRUE) {"hello"} else {"dropped"}
And you only really need blocks {} with if/else statements when you have more than one expression to run or when spanning multiple lines. So you could also do
x <- if(TRUE) "hello" else "dropped"
x <- if(FALSE) "dropped" else "hello"
These all store "hello" in x
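Two related cases are worth a quick sketch: what if returns when no branch runs (the "NULL invisibly" part of the ?Control quote), and how it differs from ifelse, which the question mentions. The variable names here are only for illustration.

```r
# No else and a FALSE condition: nothing is evaluated, so the value is NULL
no_branch <- if (FALSE) "hello"

# if/else picks one whole expression based on a single TRUE/FALSE
scalar_choice <- if (TRUE) "yes" else "no"

# ifelse is vectorized: it picks element by element
vector_choice <- ifelse(c(TRUE, FALSE, TRUE), "yes", "no")
```

Here no_branch is NULL, scalar_choice is "yes", and vector_choice is the character vector "yes", "no", "yes".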
You are not really missing anything about the "big picture" in R. The R if function is atypical compared both to other languages and to R's typical behavior. A typical R function evaluates its body in its own environment, so you only keep its result if you assign the output to a "symbol", i.e. a proper R name; anything else that was assigned inside the function body is discarded when the function returns. By contrast, assignments made within the consequent or alternative code blocks of if take effect in the calling environment (the global environment when if is run at top level).
The other common atypical function is for. R for-loops only retain these interior assignments and always return NULL. The R Language Definition calls these atypical R functions "control structures". See section 3.3. On my machine (and I suspect most Linux boxes) that document is installed at: http://127.0.0.1:10731/help/doc/manual/R-lang.html#Control-structures. If you are on another OS then there is probably a pulldown Help menu in your IDE that will have a pointer to it. The help document calls them "control flow constructs" and the help page is at ?Control. Note that it is necessary to quote these terms when you want to access that help page using one of those names, since they are "reserved words". So you would need ?'if' rather than typing ?if. The other reserved words are described in the ?Reserved page.
?Control
?'if' ; ?'for'
?Reserved
# When you just type:
?if # and hit <return>
# you will see a "+" sign, which indicates an incomplete expression.
# You then need to hit <escape> to get back to a regular R interaction.
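A small sketch of the contrast described above, run at top level. The names total, loop_value, add_one and inner_value are invented for the illustration.

```r
total <- 0

# for always returns NULL, but assignments in its body persist
# in the environment the loop was run from
loop_value <- for (i in 1:3) {
  total <- total + i
}

# An ordinary function: its local assignment disappears after it returns,
# and only the value of the last expression comes back
add_one <- function() {
  inner_value <- 1
  inner_value + 1
}
fn_value <- add_one()
```

Afterwards loop_value is NULL, total is 6, fn_value is 2, and exists("inner_value") is FALSE.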
In R, functions don't need an explicit return. If none is given, the value of the last evaluated expression is returned automatically. Consider this example:
a <- 5
b <- 1
result <- if(a == 5) {
a <- a + 1
b <- b + 1
a
} else {b}
result
#[1] 6
The value of the last expression in the if block was saved in result. Similarly, in your case the string values are "returned" implicitly.

R: Enriched debugging for linear code chains

I am trying to figure out if it is possible, with a sane amount of programming, to create a certain debugging function by using R's metaprogramming features.
Suppose I have a block of code, such that each line uses as all or part of its input the output from the line before -- the sort of code you might build with pipes (though no pipe is used here).
{
f1(args1) -> out1
f2(out1, args2) -> out2
f3(out2, args3) -> out3
...
fn(out<n-1>, args<n>) -> out<n>
}
Where for example it might be that:
f1 <- function(first_arg, second_arg, ...){my_body_code},
and you call f1 in the block as:
f1(second_arg = 1:5, list(a1 ="A", a2 =1), abc = letters[1:3], fav = foo_foo)
where foo_foo is an object defined in the calling environment of f1.
I would like a function I could wrap around my block that would, for each line of code, create an entry in a list. Each entry would be named (line1, line2) and each line entry would have a sub-entry for each argument and for the function output. The argument entries would consist, first, of the name of the formal to which the actual argument is matched; second, of the expression or name supplied to that argument if there is one (and a placeholder if the argument is just a constant); and third, of the value of that expression as if it were immediately forced on entry into the function. (I'd rather have the value as of the moment the promise is first kept, but that seems to me like a much harder problem, and the two values will most often be the same).
All the arguments assigned to the ... (if any) would go in a dots = list() sublist, with entries named if they have names and appropriately labeled (..1, ..2, etc.) if they are assigned positionally. The last element of each line sublist would be the name of the output and its value.
The point of this is to create a fairly complete record of the operation of the block of code. I think of this as analogous to an elaborated version of purrr::safely that is not confined to iteration and keeps a more detailed record of each step. Indeed, if a function exits with an error, you would want the error message in the list entry, along with as much of the matched arguments as could be had before the error was produced.
It seems to me like this would be very useful in debugging linear code like this. This lets you do things that are difficult using just the RStudio debugger. For instance, it lets you trace code backwards. I may not know that the value in out2 is incorrect until after I have seen some later output. Single-stepping does not keep intermediate values unless you insert a bunch of extra code to do so. In addition, this keeps the information you need to track down matching errors that occur before promises are even created. By the time you see output that results from such errors via single-stepping, the matching information has likely evaporated.
I have actually written code that takes a piped function and eliminates the pipes to put it in this format, just using text manipulation. (Indeed, it was John Mount's "Bizarro pipe" that got me thinking of this). And if I, or we, or you, can figure out how to do this, I would hope to make a serious run on a second version where each function calls the next, supplying it with arguments internally rather than externally -- like a traceback where you get the passed argument values as well as the function name and formals. Other languages have debugging environments like that (e.g. GDB), and I've been wishing for one for R for at least five years, maybe 10, and this seems like a step toward it.
Just issue the trace shown for each function that you want to trace.
f <- function(x, y) {
z <- x + y
z
}
trace(f, exit = quote(print(returnValue())))
f(1,2)
giving the following, which shows the function name, the input and the output. (The last 3 is from the function itself.)
Tracing f(1, 2) on exit
[1] 3
[1] 3
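If you want the record kept in a list rather than printed, one possibility that does not use trace at all is a small hand-written wrapper. This is only a rough sketch under my own assumptions: record_step is a made-up helper, it captures the supplied argument values and the output (or the error message), and it does not attempt the formal-matching or dots handling described in the question.

```r
# Hypothetical helper: call fun with the given arguments and keep a record
record_step <- function(fun, ...) {
  args <- list(...)
  outcome <- tryCatch(
    list(value = do.call(fun, args), error = NULL),
    error = function(e) list(value = NULL, error = conditionMessage(e))
  )
  c(list(args = args), outcome)
}

f <- function(x, y) {
  z <- x + y
  z
}

steps <- list(
  line1 = record_step(f, x = 1, y = 2),    # succeeds
  line2 = record_step(f, x = 1, y = "a")   # fails; the message is recorded
)
```

Here steps$line1$value is 3, while steps$line2$value is NULL and steps$line2$error holds the error message, so the block can be inspected backwards after the fact.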

attach() inside function

I'd like to give a params argument to a function and then attach it so that I can use a instead of params$a every time I refer to the list element a.
run.simulation<-function(model,params){
attach(params)
#
# Use elements of params as parameters in a simulation
detach(params)
}
Is there a problem with this? If I have defined a global variable named c and have also defined an element named c of the list "params", whose value would be used after the attach command?
Noah has already pointed out that using attach is a bad idea, even though you see it in some examples and books. There is a way around it. You can use a "local attach", which is called with. In Noah's dummy example, this would look like
with(params, print(a))
which will yield an identical result, but is tidier.
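To show how with resolves a name clash like the one asked about, here is a short sketch. The values are made up: a exists both globally and in params.

```r
a <- 1                        # global variable
params <- list(a = 2, b = 3)  # list with an element of the same name

# Inside with(), names are looked up in params first,
# so a refers to params$a, not to the global a
result <- with(params, a * b)
```

result is 6 here, and the global a is still 1 afterwards.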
Another possibility is:
run.simulation <- function(model, params){
# Assume params is a list of parameters from
# "params <- list(name1=value1, name2=value2, etc.)"
for (v in seq_along(params)) assign(names(params)[v], params[[v]])
# Use elements of params as parameters in a simulation
}
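A variation on the same idea, offered only as a sketch: base R's list2env can copy all elements of a named list into the function's own environment in one call. The function name run_simulation_sketch and the a + b body are placeholders I made up.

```r
a <- 1  # a global with the same name as a parameter

run_simulation_sketch <- function(model, params) {
  # Copy every named element of params into this function's environment;
  # these local copies take precedence over globals of the same name.
  list2env(params, envir = environment())
  a + b  # placeholder for the real simulation code
}

sim_value <- run_simulation_sketch(model = NULL, params = list(a = 2, b = 3))
```

sim_value is 5, and the global a is untouched because the copies live only inside the function call.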
The easiest way to solve scope problems like this is usually to try something simple out:
a = 1
params = c()
params$a = 2
myfun <- function(params) {
attach(params)
print(a)
detach(params)
}
myfun(params)
The following object(s) are masked _by_ .GlobalEnv:
a
# [1] 1
As you can see, R is picking up the global variable a here, not the element a of params.
It's almost always a good idea to avoid using attach and detach wherever possible -- scope ends up being tricky to handle (incidentally, it's also best to avoid naming variables c -- R will often figure out what you're referring to, but there are so many other letters out there, why risk it?). In addition, I find code using attach/detach almost impossible to decipher.
Jean-Luc's answer helped me immensely for a case that I had a data.frame Dat instead of the list as specified in the OP:
for (v in seq_len(ncol(Dat))) assign(names(Dat)[v], Dat[,v])

Resources