Set string = to function in R

I'm interested in why you would write "string" <- function(x), and I'd like some feedback on what this does. I have included some code below; the function is defined within a larger function. I understand what unique(), which.max(), tabulate(), and match() are doing, but I am confused about "mode" <- function(x).
The code was sent to me from another party, and works fine. I'm just interested in what it does.
metric = function(z, i, rn, minht, above)
{
"mode" <- function(x){
ux <-unique(x)
ux[which.max(tabulate(match(x,ux)))]
}
metrics = list(
)
return(metrics)
}
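For what it's worth, a quoted name on the left-hand side of <- behaves like an unquoted one, so "mode" <- function(x) simply defines a function called mode that is local to metric (and masks base::mode there). A minimal sketch, using the hypothetical name mode_of (not from the original code) to avoid that masking:

```r
# A quoted string on the left of <- is treated as the variable name,
# so this is equivalent to: mode_of <- function(x) { ... }
"mode_of" <- function(x) {
  ux <- unique(x)                          # distinct values, in order of appearance
  ux[which.max(tabulate(match(x, ux)))]    # the most frequent one (first on ties)
}

is.function(mode_of)      # TRUE
mode_of(c(2, 3, 3, 5))    # 3
```

The quotes are only needed when the name is not syntactically valid (for example, when it contains spaces); for a name like mode they make no difference.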


IF statements inside function do not recognize conditions

I want to adjust my function so that my if and else if statements recognize the name of the dataframe used and execute the correct plotting function. These are some mock data structured the same as mine:
df1<-data.frame(A=c(1,2,2,3,4,5,1,1,2,3),
B=c(4,4,2,3,4,2,1,5,2,2),
C=c(3,3,3,3,4,2,5,1,2,3),
D=c(1,2,5,5,5,4,5,5,2,3),
E=c(1,4,2,3,4,2,5,1,2,3),
dummy1=c("yes","yes","no","no","no","no","yes","no","yes","yes"),
dummy2=c("high","low","low","low","high","high","high","low","low","high"))
df1[colnames(df1)] <- lapply(df1[colnames(df1)], factor)
vals <- colnames(df1)[1:5]
dummies <- colnames(df1)[-(1:5)]
step1 <- lapply(dummies, function(x) df1[, c(vals, x)])
step2 <- lapply(step1, function(x) split(x, x[, 6]))
names(step2) <- dummies
tbls <- unlist(step2, recursive=FALSE)
tbls<-lapply(tbls, function(x) x[(names(x) %in% names(df1[c(1:5)]))])
A<-lapply(tbls,"[", c(1,2))
B<-lapply(tbls,"[", c(3,4))
C<-lapply(tbls,"[", c(3,4))
list<-list(A,B,C)
names(list)<-c("A","B","C")
And this is my function:
plot_1<-function (section, subsample) {
data<-list[grep(section, names(list))]
data<-data[[1]]
name=as.character(names(data))
if(section=="A" && subsample=="None"){plot_likert_general_section(df1[c(1:2)],"A")}
else if (section==name && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the",name,"topics rank?"));plot_likert(data$Ldummy1.no, title = paste("How do the",name,"topics rank?"))}
}
Basically, I want it to plot a certain graph when I specify the section and subsample I'm interested in. If, for example, I want to plot section C and subsample dummy1, I just write:
plot_1(section="C", subsample="dummy1")
I want to avoid writing this:
else if (section=="A" && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the A topics rank?"));plot_likert(data$Ldummy1.no, title = paste("How do the A topics rank?"))}
else if (section=="B" && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the B topics rank?"));plot_likert(data$Ldummy1.no, title = paste("How do the B topics rank?"))}
else if (section=="C" && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the C topics rank?"));plot_likert(data$Ldummy1.no, title = paste("How do the C topics rank?"))}
else if (section=="C" && subsample=="dummy2")...
.
.
}
So I tried to extract the dataframe used from the list so that it matches the string of the section typed in the function (data<-list[grep(section, names(list))]) and store its name as a character (name=as.character(names(data))), because I thought that in this way the function would have recognized the string "A", "B" or "C" by itself, without the need for me to specify each condition.
However, if I run it, I get this error: Warning message: In section == name && subsample == "dummy1" : 'length(x) = 4 > 1' in coercion to 'logical(1)', that, from what I understand, is due to the presence of a vector in the statement. But I have no idea how to correct for this (I'm still quite new to R).
How can I fix the function so that it does what I want? Thanks in advance!
Well, I can't really test your code without the plot_likert_general_section function or the plot_likert function, but I've done a bit of simplifying and best practices--passing list in as an argument, consistent spaces and assignment operators, etc.--and this is my best guess as to what you want:
plot_1 = function(list, section, subsample) { ## added `list` as an argument
data = list[[grep(section, names(list))]] # use [[ to extract a single item
name = section # names(data) would return the four subsample names, not the section
if(subsample == "None"){
plot_likert_general_section(df1[c(1:2)], section)
} else {
yesno = paste(subsample, c("yes", "no"), sep = ".")
plot_likert(data[[yesno[1]]], title = paste("How do the", name, "topics rank?"))
plot_likert(data[[yesno[2]]], title = paste("How do the", name, "topics rank?"))
}
}
plot_1(list, section = "C", subsample = "dummy1")
I'm not sure if your plot_likert functions use base or grid graphics, but either way you'll need to handle the multiple plots. With base, probably use par(mfrow = ...) to display both of them; with grid, I'd suggest putting them in a list to return them both, and then maybe using gridExtra::grid.arrange() (or similar) to plot both of them.
You're right that the error is due to passing a vector where a single value is expected. Try inserting print statements before the equality test to diagnose why this is.
Also, be careful with variable names like name, list, and data that clash with existing R functions or help topics (e.g. ?name, ?list). I'd also recommend following the tidyverse style guide here: https://style.tidyverse.org/.
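To see where the length-4 vector in the warning comes from, here is a minimal sketch. The list sections and its contents are a hypothetical miniature of the nested list built in the question, not the real data:

```r
# Hypothetical miniature of the nested list from the question:
# each section holds four sub-tables named after the subsamples
sections <- list(
  A = list(dummy1.no = 1, dummy1.yes = 2, dummy2.high = 3, dummy2.low = 4)
)

section <- "A"
section_data <- sections[[section]]
name <- names(section_data)

print(name)               # the four subsample names, not "A"
cmp <- section == name    # logical vector of length 4
print(length(cmp))

# if() and && expect a single TRUE/FALSE, so compare length-one values
is_a <- identical(section, "A")
```

Because names() is applied to the extracted section, it returns the names one level down, which is why section == name has four elements.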

Mocking functions inside a for loop in R

I'm using the testthat package for unit testing in R. I have a function CalcByResultSubModel, which calls another function, CalculateX, inside it. This is the main function:
CalcByResultSubModel = function(doll_data, fn_master, modelPath) {
# load sub model result
load(modelPath)
# calculation
for(abc in c("ABC", fn_master$fn_a)) {
# columns
col_name = paste0("x", abc)
iterModel = resultSubmodel[[abc]]
# calculate yhat X
doll_data[, col_name] = iterModel %>%
purrr::map(., function(imodel) {
CalculateX(data, imodel)
}) %>%
as.data.frame(.) %>%
apply(., 1, mean)
message(paste(col_name, "calculated"))
}
}
This is the function CalculateX
CalculateX = function(data, model) {
iterData = data %>%
dplyr::select(model$feature_names) %>%
as.matrix(.)
set.seed(131)
result = predict(model, iterData, missing = NA)
result = matrix(result, 2)[2, ]
return(result)
}
In order to perform unit testing, we have to mock the function CalculateX. The complication is that the function is called inside a for loop in the main function, and I'm quite new to this scenario in unit testing. Can anyone help me with mocking a function that is called in a for loop? This is the mocking code I tried:
local_mock(CalculateX = function(data, model){
for (abc in c("ABC", fn_master$fn_a))
case_when(
abc == "feature1" ~ .ReadCsvWrapper("feature1.csv"),
abc == "feature2" ~ .ReadCsvWrapper( "feature2.csv"),
abc == "feature3" ~ .ReadCsvWrapper("feature3.csv"))
})
But the above approach doesn't seem to work for me. Can anyone help me with this?
There are a couple of problems in your code. First, CalcByResultSubModel already calls CalculateX inside its loop:
for(abc in c("ABC", fn_master$fn_a)) {
# columns
col_name = paste0("x", abc)
iterModel = resultSubmodel[[abc]]
# calculate yhat X
doll_data[, col_name] = iterModel %>%
purrr::map(., function(imodel) {
CalculateX(data, imodel)
}) %>%
as.data.frame(.) %>%
apply(., 1, mean)
message(paste(col_name, "calculated"))
}
so you don't need to put that for(abc in c("ABC", fn_master$fn_a)) into the mocked function. Just set it up to return results similar to what one call to the real function would do.
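A rough sketch of that idea, written without testthat so it runs on its own. The mock name mock_CalculateX, the value field on each model, and the toy inputs are all hypothetical; the assumption is that one call to the real CalculateX returns one numeric value per row of the data:

```r
# Hypothetical mock: one call returns one numeric vector with one value per
# row, which is the shape the real CalculateX is assumed to produce.
mock_CalculateX <- function(data, model) {
  rep(model$value, nrow(data))
}

# Hypothetical stand-ins for the inputs used in the question
doll_data <- data.frame(id = 1:3)
resultSubmodel <- list(
  ABC      = list(list(value = 1), list(value = 3)),
  feature1 = list(list(value = 10), list(value = 20))
)

# The loop stays in the caller; the mock knows nothing about it
for (abc in names(resultSubmodel)) {
  preds <- lapply(resultSubmodel[[abc]], function(imodel) {
    mock_CalculateX(doll_data, imodel)
  })
  # Average the per-model predictions row by row
  doll_data[[paste0("x", abc)]] <- rowMeans(do.call(cbind, preds))
}

print(doll_data)
```

In an actual test, a function like this would be passed to the mocking helper in place of the looped version, for example as local_mock(CalculateX = mock_CalculateX) in the style used in the question.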
The second problem is that in the real CalculateX, you have set.seed(131). This is almost certainly a bad idea. It resets the random number generator to a fixed setting every time CalculateX is called, which makes it completely non-random, and also makes calls to random number functions afterwards repeat their outputs.
It's often a good idea to set the seed once at the top of your testing script so that tests are predictable, but resetting it as often as you did is not.
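A small sketch of the side effect, using a hypothetical function reseed_then_work that only mimics the set.seed(131) call inside CalculateX:

```r
# Hypothetical stand-in for a function that reseeds on every call
reseed_then_work <- function() {
  set.seed(131)   # same pattern as in CalculateX
  invisible(NULL)
}

reseed_then_work()
first <- runif(3)
reseed_then_work()
second <- runif(3)

identical(first, second)   # TRUE: the "random" draws repeat after every call
```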

R: substitute and execution environments

I am currently writing a function that will take an equation as an argument. The function will expect the variables to be a part of the column names of data.
mydata <- data.frame(x=c(1,2,3,4),y=c(5,6,7,8), z=c(9,10,11,12))
my_function <- function(data, equ) {
EQU.sub <- deparse(substitute(equ))
#Check if colnames are used
for(i in seq_along(colnames(data))) {
if(str_detect(string = EQU.sub, pattern = colnames(data)[i])) {
#if used, create variable with its name.
assign(x = colnames(data)[i],
value = data[[colnames(data)[i]]])
} else {
warning(paste(colnames(data)[i], "was not used in EQU"))
}
}
data$new.value <- eval(equ)
output <- new_function(new.equ = equ)
return(data)
}
my_function(data = mydata, equ = x+(y^2))
I know what you may be thinking, this is a big workaround for just doing
mydata$x+(mydata$y^2)
THE ISSUE
The issue is that I want to pass my input of equ into a new function.
new_function <- function(new.equ) {
string <- deparse(substitute(new.equ))
#does some stuff....
return(output) }
However, when moving from the execution environment of my_function to that of new_function, calling deparse(substitute(new.equ)) returns "equ" instead of "x+(y^2)".
I know that substitute() returns what was explicitly passed for the argument (here, equ), but I am wondering if there is a way for new_function() to see into the execution environment of my_function() so I can get the desired output of "x+(y^2)".
UPDATE
After thinking about it, I could change what I pass to new.equ to the deparsed version of equ as follows...
output <- new_function(new.equ = EQU.sub)
new_function <- function(new.equ) {
#given that these variables are available
value <- parse(text = new.equ)
#does some stuff....
return(output) }
but my original question still stands because I'm still new to R environments. Is there a more elegant way to go through execution environments?
Using non-standard evaluation like this can be pretty messy. Rather than trying to capture expressions from promises passed to functions, it's much safer just to pass a formula. For example
mydata <- data.frame(x=c(1,2,3,4),y=c(5,6,7,8), z=c(9,10,11,12))
my_function <- function(data, equ) {
stopifnot(inherits(equ, "formula"))
eval(equ[[2]], data)
}
new_function <- function(newequ) {
my_function(mydata, newequ)
}
my_function(mydata, ~x+(y^2))
new_function(~x+(y^2))
Or give your function an extra parameter where you can pass an expression instead so you don't have to rely on a promise. This makes it much easier to write other functions that can call your function.
my_function <- function(data, equ, .equ=substitute(equ)) {
eval(.equ, data)
}
new_function <- function(newequ) {
equ <- substitute(newequ)
my_function(mydata, .equ=equ)
}
my_function(mydata, x+(y^2))
new_function(x+(y^2))
my_function(mydata, .equ=quote(x+(y^2)))
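The behaviour described in the question can be reproduced in a few lines. This is a minimal sketch; the function names inner, outer_naive, inner_expr and outer_fixed are hypothetical and only illustrate why capturing the expression once, in the outermost function, avoids the problem:

```r
# substitute() only sees the expression supplied at the immediate call site
inner <- function(new.equ) deparse(substitute(new.equ))
outer_naive <- function(equ) inner(equ)   # inner() is literally called with `equ`
outer_naive(x + (y^2))                    # "equ"

# Capture the expression once, then pass the captured object along
inner_expr <- function(expr) deparse(expr)
outer_fixed <- function(equ) inner_expr(substitute(equ))
outer_fixed(x + (y^2))                    # "x + (y^2)"
```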

What does builtins(internal = TRUE) return?

From ?builtins:
builtins(TRUE) returns an unsorted list of the names of internal functions, that is those which can be accessed as .Internal(foo(args ...)) for foo in the list.
I don't understand which functions are being returned.
I thought it would be all the closure functions in the base package that call .Internal().
However, the two sets don't match up.
base_objects <- mget(
ls(baseenv(), all.names = TRUE),
envir = baseenv()
)
internals <- names(
Filter(
assertive.types::is_internal_function,
base_objects
)
)
builtins_true <- builtins(internal = TRUE)
c(
both = length(intersect(internals, builtins_true)),
internals_not_builtins_true = length(setdiff(internals, builtins_true)),
builtins_true_not_internals = length(setdiff(builtins_true, internals))
)
##                        both internals_not_builtins_true builtins_true_not_internals
##                         288                         125                         226
I also thought that it might be the values listed in src/main/names.c in R's source code, and there definitely seems to be some overlap with this, but it isn't exactly this list of values.
What is builtins() doing when you pass internal = TRUE?
Stibu's comment is a specific example of the general problem. ?builtins says that it fetches the names of the objects it returns directly from the symbol table (this is the C symbol table).
And builtins(TRUE) returns all the built-in objects callable via .Internal. That, however, doesn't mean there must be any function that calls .Internal(foo(args, ...)) for any foo.
Stibu gave one example: the internal function may not be called by an R function with the same name, as is the case for many generic functions where the default method calls .Internal.
Another example is something like .addCondHands and .addRestart, which are called by withCallingHandlers and withRestarts, respectively.
It's also possible that one R function calls multiple .Internal functions. I don't know of an example of that off the top of my head though.
After more digging, it seems that the list of functions is everything in the R_FunTab[] object in src/main/names.c where the second digit of the eval column is 1.
Here's a script to retrieve them.
library(stringi)
library(magrittr)
library(dplyr)
names.c <- readLines("https://raw.githubusercontent.com/wch/r-source/56a1b08b7282c5488acb71ee244098f4fd94f7c7/src/main/names.c")
fun_tab <- names.c[92:974] %>%
stri_replace_all_regex("^\\{", "") %>%
stri_replace_all_fixed("{PP", "PP") %>%
stri_replace_all_fixed("}},", "") %>%
stri_replace_all_fixed("\\t", "")
funs <- read.csv(text = fun_tab, header = FALSE, comment.char = "/")
cols <- names.c[86] %>%
stri_sub(4) %>%
stri_split_regex("\\t+") %>%
extract2(1) %>%
stri_trim()
colnames(funs) <- cols
funs$eval <- formatC(funs$eval, width = 3, flag = "0")
# Internal fns have 2nd digit of eval col == 1. See names.c[62:71]
internals <- funs %>% filter(substring(eval, 2, 2) == "1")
I see slight differences when examining
setdiff(internals$printname, builtins(TRUE))
setdiff(builtins(TRUE), internals$printname)
For example builtins(TRUE) doesn't include shell.exec() if you aren't running Windows; mem.limits() was only recently removed from the devel branch of R, so it shows up in builtins(TRUE) for the current release version of R.
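As a quick sanity check of the .Internal interpretation above, one can look up individual names. This is only a sketch, and it assumes (from my reading of names.c) that paste is registered as an internal function while sum is a primitive:

```r
internal_names <- builtins(internal = TRUE)

# paste() is an R closure whose body calls .Internal(paste(...)),
# so it is expected to appear in the internal list
"paste" %in% internal_names

# sum is a primitive and is not reached via .Internal(),
# so it is not expected to appear there
is.primitive(sum)
"sum" %in% internal_names
```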

Selecting Which Argument to Pass Dynamically in R

I'm trying to pass a specific argument dynamically to a function, where the function has default values for most or all arguments.
Here's a toy example:
library(data.table)
mydat <- data.table(evildeeds=rep(c("All","Lots","Some","None"),4),
capitalsins=rep(c("All", "Kinda","Not_really", "Virginal"),
each = 4),
hellprobability=seq(1, 0, length.out = 16))
hellraiser <- function(arg1 = "All", arg2 = "All"){
mydat[(evildeeds %in% arg1) & (capitalsins %in% arg2), hellprobability]}
hellraiser()
hellraiser(arg1 = "Some")
whicharg = "arg1"
whichval = "Some"
#Could not get this to work:
hellraiser(eval(paste0(whicharg, '=', whichval)))
I would love a way to specify dynamically which argument I'm calling: In other words, get the same result as hellraiser(arg1="Some") but while picking whether to send arg1 OR arg2 dynamically. The goal is to be able to call the function with only one parameter specified, and specify it dynamically.
You could use some form of do.call like
do.call("hellraiser", setNames(list(whichval), whicharg))
but really this just seems like a bad way to handle arguments for your functions. It might be better to treat your parameters like a list that you can more easily manipulate. Here's a version that allows you to choose values where the argument names are treated like column names
hellraiser2 <- function(..., .dots=list()) {
dots <- c(.dots, list(...))
expr <- lapply(names(dots), function(x) bquote(.(as.name(x)) %in% .(dots[[x]])))
expr <- Reduce(function(a,b) bquote(.(a) & .(b)), expr)
eval(bquote(mydat[.(expr), hellprobability]))
}
hellraiser2(evildeeds="Some", capitalsins=c("Kinda","Not_really"))
hellraiser2(.dots=list(evildeeds="Some", capitalsins=c("Kinda","Not_really")))
This use of ... and .dots= syntax is borrowed from the dplyr standard evaluation functions.
I managed to get the result with
hellraiser(eval(parse(text=paste(whicharg, ' = \"', whichval, '\"', sep=''))))
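One caveat worth checking with the eval(parse()) variant: it evaluates an assignment and passes the resulting value positionally, so it appears to fill the first argument whatever whicharg says. A minimal sketch comparing it with do.call(), using a hypothetical base-R stand-in hellraiser_df (a data.frame instead of the data.table in the question):

```r
# Hypothetical base-R stand-in for the question's data and function
mydat <- data.frame(
  evildeeds = rep(c("All", "Lots", "Some", "None"), 4),
  capitalsins = rep(c("All", "Kinda", "Not_really", "Virginal"), each = 4),
  hellprobability = seq(1, 0, length.out = 16)
)
hellraiser_df <- function(arg1 = "All", arg2 = "All") {
  mydat$hellprobability[mydat$evildeeds %in% arg1 & mydat$capitalsins %in% arg2]
}

whicharg <- "arg2"
whichval <- "Kinda"

# do.call() with a named list sets exactly the argument named in whicharg
dynamic <- do.call(hellraiser_df, setNames(list(whichval), whicharg))
direct  <- hellraiser_df(arg2 = "Kinda")
identical(dynamic, direct)   # TRUE

# The eval(parse()) form assigns a variable called arg2 in the calling
# environment and passes "Kinda" as the first positional argument (arg1)
positional <- hellraiser_df(
  eval(parse(text = paste0(whicharg, ' = "', whichval, '"')))
)
length(positional)           # 0: no evildeeds value equals "Kinda"
```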
