R: How to pass two filepaths as parameters to a function? - r

I'm trying to pass two file paths as parameters to a function. But it's not accepting the inputs. Here's what I'm doing:
partition<-function(d1,p2){
d1<-read.table(file = d1, fill = TRUE)
p2<-read.table(file = p2, fill = TRUE)
}
and while calling the function:
partition("samcopy.txt","partcopy.txt")
The .txt is not being read by the variables inside the function. How to make the variables read the table?

AidanGawronSki's approach works, but from a programming standpoint should be avoided! Here is a more traditional answer to your problem.
partition<-function(d1,p2){
a <- read.table(file = d1, fill = TRUE)
b <- read.table(file = p2, fill = TRUE)
res <- list(a,b)
names(res) <- c(d1,p2)
res
}
To understand why the above approach is "better", it is important to understand what environments are and more generally the R scoping rules. Environments are essentially your workspace. For example, when you first open R and begin assigning objects, these objects are stored within the Global Environment. Another example of an environment is when you call a function, the function creates it own environment comprised of any parameters you have passed to the function. By doing this R ensures that when you call a function, it has no "side effects" or said another way it does not affect the global environment.
Let me show you an example. Imagine you begin an R session, and assign d1 <- 1 in your Global Environment. You're going to want to use d1 later on in your analysis and it would be a shame if it changed without you knowing it, right?
If you utilize AidanGawronSki's approach when you call
partition<-function(d1,p2){
d1 <<- read.table(file = d1, fill = TRUE)
}
The d1 in your Global Environment will change to be read.table(file = d1, fill = TRUE). This is very very dangerous! A object you previously assigned to be one thing is now another thing and you are not even warned of this change.
The same problem, however, will never occur with the approach I have proposed. I strongly recommend you get in the habit of using this approach! If you don't any function can change things in your Global Environment without you knowing.
For more info read this, this or just google something like "functions with no side effects"
FYI there are also several other problems with your code. First you need to tell your function what to return. All you did is call a function, assign stuff to the local environment and then close the function. Functions will always return the last line (as long as it is not an assignment). This is why in my example, I put res as the last line of the function. Also you are not correctly assigning your object. You pass a string like d1 <- "text.txt", to your function and then ask your function to do the following, "text.txt" <- read.table("text.txt",...). That simply does not make sense. You need to assign the output from read.table to an object. In my example, I assign them to a and b.

use the super assignment operator <<-
partition<-function(d1,p2){
d1 <<- read.table(file = d1, fill = TRUE)
p2 <<- read.table(file = p2, fill = TRUE)
}

Related

Is there a way to re-assign an R accessor function and use it to update the variable properties it accesses?

In my code there is a situation where I conditionally want to use one accessor function or another throughout the code. Instead of having an if-else statement for every time I want to pick which accessor to use and coding it explicitly, I tried to conditionally assign either of the accessor functions to a new function called accessor_fun and use it throughout the code, but this returns an error when I use the accessor function to reassign the values it accesses. Here is a simplified example of the problem I am having:
#reassigning the base r function names to a new function name
alt_names_fun <- names
example_list <- list(cat = 7, dog = 8, fish = 33)
other_example_list <- list(table = 44, chair = 101, desk = 35)
#works
alt_names_fun(example_list)
#throws error
alt_names_fun(example_list) <- alt_names_fun(other_example_list)
#still throws error
access_and_assign <- function(x, y, accessor) {
accessor(x) <- accessor(y)
}
access_and_assign(x = example_list, y = other_example_list, accessor = alt_names_fun)
#still throws error
alt_names_fun_2 <- function(x){names(x)}
alt_names_fun_2(example_list) <- alt_names_fun_2(other_example_list)
#works
names(example_list) <- names(other_example_list)
As you see if you try the code above, an example of the kind of error I am getting is
Error in alt_names_fun(example_list) <- alt_names_fun(other_example_list) :
could not find function "alt_names_fun<-"
So my question is, is there a way to do the reassignment of R accessor functions and use them in a way like I am trying to in the example above?
Accessor functions are really pairs of functions. One for retrieval and one for assignment. If you want to replicate that, you need to replicate both parts
alt_names_fun <- names
`alt_names_fun<-` <- `names<-`
The assignment versions have <- in their name. This is a special naming convection that R uses to find them. Since these are character normally not allowed in basic symbol names, you need to use the back ticks to enclose the function names.

R Function - assign LMER to dynamic variable name

To create a more compact script, I am trying to create my first function.
The general function is:
f.mean <- function(var, fig, datafile){
require(lme4)
change <- as.symbol(paste("change", var, sep=""))
base <- as.symbol(paste("baseline", var, sep = ""))
x <- substitute(lmer(change ~ base + (1|ID), data=datafile))
out<-eval(x)
name <- paste(fig,".", var, sep="")
as.symbol(name) <- out
}
}
The purpose of this function is to input var, fig and datafile and to output a new variable named fig.var containing out (eval of LMER).
Apparently it is difficult to 'change' the variable name on the left side of the <-.
What we have tried so far:
- assign(name, out)
- as.symbol(name) <<- out
- makeActive Binding("y",function() x, .GlobalEnv)
- several rename options to rename out to the specified var name
Can someone help me to assign the out value to this 'run' specific variable name? All other suggestions are welcome as well.
As #Roland comments, in R (or any) programming one should avoid indirect environment manipulators such as assign, attach, list2env, <<-, and others which are difficult to debug and break the flow of usual programming involving explicitly defined objects and methods.
Additionally, avoid flooding your global environment of potentially hundreds or thousands of similarly structured objects that may require environment mining such as ls, mget, or eapply. Simply, use one large container like a list of named elements which is more manageable and makes code more maintainable.
Specifically, be direct in assigning objects and pass string literals (var, fig) or objects (datafile) as function parameters and have function return values. And for many inputs, build lists with lapply or Map (wrapper to mapply) to retain needed objects. Consider below adjustment that builds a formula from string literals and passes into your model with results to be returned at end.
f.mean <- function(var, fig, datafile){
require(lme4)
myformula <- as.formula(paste0("change", var, " ~ baseline", var, " + (1|ID)"))
x <- lmer(myformula, data=datafile)
return(x)
}
var_list <- # ... list/vector of var character literals
fig_list <- # ... list/vector of fig character literals
# BUILD AND NAME LIST OF LMER OUTPUTS
lmer_list <- setNames(Map(f.mean, var_list, fig_list, MorArgs=df),
paste0(fig_list, ".", var_list))
# IDENTIFY NEEDED var*fig* BY ELEMENT NAME OF LARGER CONTAINER
lmer_list$fig1.var1
lmer_list$fig2.var2
lmer_list$fig3.var3

Why assigning to globalenv not allowed when creating R package?

I'm creating a R package and I have a function that returns an object which its name is constructed with the argument passed.
I use the function assign() to do this as in the code below and it works fine.
df <- data.frame(A = 1:10, B = 2:11, C = 5:14)
ot_test <- function(df, min){
tmp <- colSums(df)
tmp2 <- df[, tmp >= min]
assign(paste0(deparse(substitute(df)), "_min_", min), tmp2, envir= .GlobalEnv)
}
ot_test(df,60)
ls()
[1] "df" "df_min_60" "ot_test"
But when I check the package with devtools::check I have the message.
Found the following assignments to the global environment:
File 'test/R/ottest.R':
assign(paste0(deparse(substitute(df)), "_min_", min), tmp2, envir = .GlobalEnv)
Is there a way to do the same without having .GlobalEnv in argument or without using the function assign().
Its just an ugly, bad thing to do in a functional programming environment.
What's wrong with:
df_min_60 = ot_test(df,60)
Your argument will be that your method saves a bit of typing, but it opens you up to all sorts of bugs and obscurities.
Suppose I want to call ot_test in a function, in a loop maybe. Now its stomped on, with no warning or obvious clue its going to do it, the df_min_60 in my global workspace. Gee thanks for that. So what do I have to do?
ot_test(df, 60)
# now rename so I don't stomp on it
df_min_60.1 = df_min_60
results = domyloop(d1,d2,d3)
Which has meant more typing.
Now another idea. Suppose I want to call ot_test on a list of data frames and make a list of the results. Normally I'd do something like:
for(i in 1:10){res[[i]] = ot_test(data[[i]], 60)}
but with your code I can't. I have to do:
for(i in 1:10){d=data[[i]]; ot_test(d,60); res[[i]] = d_min_60)}
which is WAY more typing.
Be thankful that devtools::check only gives a message and doesn't set your computer on fire for doing this. Seriously, don't create things in the global environment, return them as return values.

Retrieve/access dynamic variable from R function

I have a function in R that structures my raw data. I create a dataframe called output and then want to make a dynamic variable name depending on the function value block.
The output object does contain a dataframe as I want, and to rename it dynamically, at the end of the function I do this (within the function):
a = assign(paste("output", block, sep=""), output)
... but after running the function there is no object output1 (if block = 1). I simply cannot retrieve the output object, neither merely output nor the dynamic output1 version.
I tried this then:
a = assign(paste("output", block, sep=""), output)
return(a)
... but still - no success.
How can I retrieve the dynamic output variable? Where is my mistake?
Environments.
assign will by default create a variable in the environment in which it's called. Read about environments here: http://adv-r.had.co.nz/Environments.html
I assume you're doing something like:
foo <- function(x){ assign("b", x); b}
If you run foo(5), you'll see it returns 5 as expected (implying that b was created successfully somewhere), but b won't exist in your current environment.
If, however, you do something like this
foo <- function(x){ assign("b", x, envir=parent.frame()); b}
Here, you're assigning not to the current environment at the time assign is called (which happens to be foo's environment). Instead, you're assigning into the parent environment (which, since you're calling this function directly, will be your environment).
All this complexity should reveal to you that this will be fairly complex, a nightmare to maintain, and a really bad idea from a maintenance perspective. You'd surely be better off with something like:
foo <- function(x) { return(x) };
b <- foo(5)
Or if you need multiple items returned:
foo <- function(x) { return(list(df=data.frame(col1=x), b=x)) }
results <- foo(5)
df <- results$df
b <- results$b
But ours is not to reason why...

Using bnlearn Function "cpquery" Within a Loop

I'm attempting to use the bnlearn package to calculate conditional probabilities, and I'm running into a problem when the "cpquery" function is used within a loop. I've created an example, shown below, using data included with the package. When using the cpquery function in a loop, a variable created in the loop ("evi" in the example) is not recognized by the function. I receive the error:
Error in parse(text = evi) : object 'evi' not found
The creation steps of "evi" are based on examples provided by the author.
Any help you could provide would be great. I'm desperate to find a way that I can apply the cpquery function for a large number of observations.
library(bnlearn)
data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)
bn.function <- function(network, evidence_data) {
a <- NULL
b <- nrow(evidence_data)
for (i in 1:b) {
evi <- paste("(", names(evidence_data), "=='",
sapply(evidence_data[i,], as.character), "')",
sep = "", collapse = " & ")
a[i] <- cpquery(network, (C=='c'), eval(parse(text=evi)))
}
return(a)
}
test <- bn.function(fitted, learning.test)
Thanks in advance!
I don't know if this is due to a bugfix or just because I tried another approach - anyways, looping works if you iteratively build up the evidence list outside of the cpquery-function.
An example for an iteration through a list called evidenceData with all-positive evidences:
for(i in names(evidenceData)){
loopEvidenceList <- list()
loopEvidenceList[[i]] <- "TRUE"
a =cpquery(fitted = bayesNet, event = queryNode == "TRUE",
evidence = loopEvidenceList, method = "lw", n = 100000)
print(a)
}
Depending on the way your evidence is availible, you might need more sophisticated preparation of the "loopEvidenceList" but once you got that prepared, it works fine.
To avoid the scoping problem, you can postpone the call to eval and do it inside the cpquery function. If you directly pass evi (the character variable) to cpquery and then parse it inside the definition, the chain of environments gets shifted and cpquery will have access to evi.
You can use m.cpquery <- edit(cpquery) to fork your own version of the function and insert the following line at its beginning:
evidence = parse(text = evidence)
and then save your new function.
So the heading of m.cpquery will look like:
> m.cpquery
function (fitted, event, evidence, cluster = NULL, method = "ls",
..., debug = FALSE)
{
evidence = parse(text = evidence)
check.fit(fitted)
check.logical(debug)
...
Now you can use m.cpquery in your own function like before, except we'll pass the plain character variable to it:
a[i] <- m.cpquery(network, (C=='c'), evi)
Note that in the first line of m.cpquery, we only parsed the evidence character variable and didn't call eval on it. cpquery is a front-end to conditional.probability.query (see here) and we're relying on conditional.probability.query's subsequent call to eval.
I should say that this is a rather ugly workaround. And it only works if you are using logic sampling (method='ls'). But if you want to use likelihood weighting, the check.mutilated.evidence function will raise an error. I haven't checked if injecting an eval expression before it gets called would result in a mayhem of subsequent errors leading to hell.
I feel like the problem is you are using the same variable in evidence as well as event. Learning.test contains the values of "C" variable. then we are trying to predict C as the event. Maybe using a subset of the original dataset excluding C will do the trick

Resources