Can someone please explain this code to me, especially the role of "function(x)" and "[[x]]"? - r

This is R code, and I'm having trouble understanding the role of function(x) and qdata[[x]] in it. Can someone explain it to me piece by piece? I didn't write this code. Thank you.
outs=lapply(names(qdata[,12:35]), function(x)
hist(qdata[[x]],data=qdata,main="Histogram of Quality Trait",
xlab=as.character(x),las=1.5)$out)

This code generates a series of histograms, one for each of columns 12 to 35 of the data frame qdata. The lapply function iterates over the column names. At each iteration, the name of the current column is passed as the argument "x" to the anonymous function defined by "function(x)". The body of the function is a call to the hist() function, which creates the histogram. qdata[[x]] (where x is the name of a column) extracts the data from that column. I am actually confused by "data=qdata".

We don't have the data object named qdata, so we cannot really be sure what will happen with this code. It appears that the author is trying to collect the value of a component named out from each call to hist and store the results in outs. If qdata is an ordinary dataframe, then I suspect that this code will fail in that goal, because the value returned by hist does not have an out component (look at the Value section of ?hist). When I run this with a simple dataframe, I do get histogram plots that appear in my interactive plotting device, but I get NULL values for the elements of outs. Furthermore, the 12 warnings arise because the hist function has no data parameter.
qdata <- data.frame(a=rnorm(10), b=rnorm(10))
outs=lapply(names(qdata), function(x)
hist(qdata[[x]],data=qdata,main="Histogram of Quality Trait",
xlab=as.character(x),las=1.5)$out)
#There were 12 warnings (use warnings() to see them)
> str(outs)
List of 2
$ : NULL
$ : NULL
So I think we need to be concerned about the level of R knowledge of the author of this code. It's possible I'm wrong about this presumption. The hist function is generic, and it is possible that some unreferenced package has a method designed to handle a data argument and return an out value when given a vector of a particular class. In a typical starting situation with only the base packages loaded, however, there are only three hist.* functions:
methods(hist)
#[1] hist.Date* hist.default hist.POSIXt*
#see '?methods' for accessing help and source code
As for the questions about the role of function and [[x]]: the keyword function creates a function object (a closure) that can receive parameter values, perform operations, and finally return a result. In this case the column names get passed to the anonymous function and become, each in turn, the value of the local name x, and that value is used by the '[[' function to look up the column in what I am presuming is the ‘qdata’ dataframe.
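To make the two pieces concrete, here is a minimal sketch. The small qdata below is a made-up stand-in (the real one was never posted), and plot = FALSE is used only so that hist() returns its result without drawing anything.

```r
# Hypothetical stand-in for the real qdata, which we do not have
qdata <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))

# lapply() hands each column name to the anonymous function as x;
# qdata[[x]] then looks up the column that has that name.
col_means <- lapply(names(qdata), function(x) mean(qdata[[x]]))

# hist() returns a list of class "histogram"; plot = FALSE skips the drawing.
h <- hist(qdata[["a"]], plot = FALSE)
names(h)        # lists the components that actually exist on the result
is.null(h$out)  # TRUE here, because no component is called "out"
```

Checking names() on the returned object is a quick way to see which components can be extracted with $ before writing something like $out.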

Related

To find valid argument for a function in R's help document (meaning of ...)

This question may seem basic, but it has bothered me for quite a while. The help document for many functions has ... as one of its arguments, but somehow I can never get my head around this ... thing.
For example, suppose I have created a model say model_xgboost and want to make a prediction based on a dataset say data_tbl using the predict() function, and I want to know the syntax. So I look at its help document which says:
?predict
**Usage**
predict (object, ...)
**Arguments**
object a model object for which prediction is desired.
... additional arguments affecting the predictions produced.
The usage section and examples didn't really enlighten me, as I still have no idea what the valid syntax/arguments are for the function. An online course uses something like the code below, which works:
data_tbl %>%
predict(model_xgboost, new_data = .)
However, looking through the help doc I cannot find the new_data argument. Instead it mentions a newdata argument in its Details section, which didn't work when I replaced new_data = . with newdata = .:
Error in `check_pred_type_dots()`:
! Did you mean to use `new_data` instead of `newdata`?
My questions are:
How do I know exactly what argument(s) / syntax can be used for a function like this?
Why new_data but not newdata in this example?
I might be missing something here, but is there any reference/resource about how to use/interpret a help document, in plain English? (A lot of documentation, including R help files, seems to give just a brief sentence like "additional arguments affecting the predictions produced".)
@CarlWitthoft's answer is good; I want to add a little bit of nuance about this particular function. The reason the help page for ?predict is so vague is an unfortunate consequence of the fact that predict() is a generic function in R: that is, it's a function that can be applied to a variety of different object types, using slightly different (but appropriate) methods in each case. As such, the ?predict help page only lists object (which is required as the first argument in all methods) and ..., because different predict methods could take very different arguments/options.
If you call methods("predict") in a clean R session (before loading any additional packages) you'll see a list of 16 methods that base R knows about. After loading library("tidymodels"), the list expands to 69 methods. I don't know what class your object is (check with class(model_xgboost)), but assuming that it's of class model_fit, we look at ?predict.model_fit to see
predict(object, new_data, type = NULL, opts = list(), ...)
This tells us that we need to call the new data new_data (and, reading a bit farther down, that it needs to be "A rectangular data object, such as a data frame").
The help page for predict says
Most prediction methods which are similar to those for linear
models have an argument ‘newdata’ specifying the first place to
look for explanatory variables to be used for prediction
(emphasis added). I don't know for certain why the parsnip authors (the predict.model_fit method comes from the parsnip package) decided to use new_data rather than newdata; presumably it is in line with the tidyverse style guide, which says
Use underscores (_) (so called snake case) to separate words within a name.
In my opinion this might have been a mistake, but you can see that the parsnip/tidymodels authors have realized that people are likely to mix the two names up and added an informative error message, as shown in your example.
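As a rough illustration of why the generic's help page can only show (object, ...), here is a small sketch. The describe generic and the toy_model class are made up for this example; only predict, lm, formals and getS3method are real base/stats functions.

```r
# Hypothetical generic and class, for illustration only
describe <- function(object, ...) UseMethod("describe")

# The method, not the generic, decides which extra arguments exist
describe.toy_model <- function(object, new_data, ...) {
  paste("rows:", nrow(new_data))
}

m <- structure(list(), class = "toy_model")
describe(m, new_data = data.frame(x = 1:3))

names(formals(describe))            # the generic only knows object and ...
names(formals(describe.toy_model))  # the method adds new_data

# The same inspection works for a real method, e.g. predict() for lm objects
names(formals(getS3method("predict", "lm")))
```

So a workable routine is: check class(your_object), find the matching method with methods("predict"), and then read that method's help page or inspect its formals.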
Among other things, the existence of ... in a function definition means you can enter any arguments (values, functions, etc) you want to. There are some cases where the main function does not even use the ... but passes them to functions called inside the main function. Simple example:
foo <- function(x,...){
y <- x^2
plot(x,y,...)
}
I know of functions which accept a function as an input argument, at which point the items to include via ... are specific to the selected input function name.
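A minimal sketch of that last case, assuming a made-up helper called apply_to: whatever is supplied through ... is simply forwarded to the function passed in as f, so the valid extra arguments depend on which f you choose.

```r
# Hypothetical helper: forwards everything in ... to the supplied function f
apply_to <- function(x, f, ...) {
  f(x, ...)
}

# na.rm is an argument of mean(), not of apply_to()
apply_to(c(1, NA, 3), mean, na.rm = TRUE)

# digits is an argument of round(), not of apply_to()
apply_to(3.14159, round, digits = 2)
```

This is the same pattern lapply() and sapply() use, which is why their help pages describe ... as optional arguments to FUN.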

use of $ and () in same syntax?

I'm certain there is a really basic answer to this, which is possibly why I'm finding it hard to actually search for and find an answer. But... can somebody please explain exactly what it means to combine $ and () in the same syntax in R?
For example from this vignette:
https://cran.r-project.org/web/packages/pivottabler/vignettes/v00-vignettes.html
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$renderPivot()
I never encountered this while learning R until now years later. I'm seeing it more and more lately but it is not intuitive to me?
$ is usually used when accessing sub-structures of objects in R like columns of a data frame e.g dataframe$column1, while () is usually used to enclose all arguments of a named function e.g rnorm(10,0,1)
What does it mean when they are used together? e.g. x$y(z)
The dollar is a generic operator used to extract or replace parts of recursive objects, such as lists and data frames.
A list is an object consisting of an ordered collection of objects (including other lists), perhaps of different types, called components.
Consider the following list:
L <- list(a = 1, f = function() message("hello"))
This is a list with two components: a and f.
The first is a number and the second is a function. By applying the $-operator, you extract the value of the component, which can also be reassigned:
L$a
# 1
L$a <- 2
L$a
# 2
In the case of the f component, because it is a function, you get the function itself, which prints as its definition:
L$f
# function() message("hello")
This is the same as for any function identifier: its value is the function itself. It is not surprising that, by applying parentheses to the function's identifier, you execute the function, that is:
L$f()
# hello
This opens the doors to very powerful structures, where you can store both data and the functions to manipulate them.
This logic resembles the classes used in the OOP world. Of course, you need many more features, such as instantiation and inheritance. Such mechanisms are provided, for example, by the R6 package, which you mention in your tag.
library(R6)
A <- R6Class("A", list(f=function() message("hello") ))
a <- A$new()
a$f()
# hello
A is an R6 class, so A$new() creates a new instance of the class, a, by means of the class function new. As you can see, this function is called using a syntax (and a logic) similar to L$f() above. The instance a inherits the class function f, called a method here, and a$f() executes it.
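To show the "data plus the functions that manipulate it" idea without any package, here is a small sketch using only a plain list and a closure. The names make_counter, increment and value are invented for this example.

```r
# Hypothetical constructor: returns a list of functions that share one variable
make_counter <- function() {
  count <- 0
  list(
    increment = function() {
      count <<- count + 1   # modifies count in the enclosing environment
      invisible(count)
    },
    value = function() count
  )
}

counter <- make_counter()
counter$increment()   # $ extracts the function, () calls it
counter$increment()
counter$value()
# 2
```

Reading x$y(z) as "extract y from x, then call it with z" works the same way here as it does for R6 objects.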

Working with "..." input in R function

I am putting together an R function that takes some undefined input through the ... argument described in the docs as:
"..." the special variable length argument ***
The idea is that the user will enter a number of column names here, each belonging to a dataset also specified by the user. These columns will then be cross-tabulated in comparison to the dependent variable by tapply. The function is to return a table (independent variable x independent variable).
Thus, I tried:
plotter=function(dataset, dependent_variable, ...)
{
indi_variables=list(...); # making a list of the ... input as described in the docs
result=with (dataset, tapply(dependent_variable, indi_variables, mean)); # this fails
}
I figured this should work as tapply can take a list as input.
But it does not in this case ('Error in tapply...arguments must have same length') and I think it is because indi_variables is a list of strings.
If I input the contents of the list by hand and leave out the quotation marks, everything works just fine.
However, if the user feeds the function the column names as non-strings, R will interpret them as variable names; and I cannot figure out how to transform the list indi_variables in the right way, unsuccessfully trying things like this:
indi_variables=lapply(indi_variables, as.factor)
So I am wondering
What causes the error described above? Is my interpretation correct?
How would one go about transforming the list created through ... in the right way?
Is there an overall better way of doing this, in the input or the implementation of tapply?
Any help is much appreciated!
Thanks to Joran's helpful reading, I have come up with these improvements that make things work out...
indi_variables=substitute(list(...));
result=with (dataset, tapply(dependent_variable, eval(indi_variables, dataset), FUN=mean));
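For completeness, here is how those two lines might look inside a whole function. This is only a sketch under my own assumptions: the dependent variable is also captured with substitute() so it can be given as a bare column name, and the small data frame is made up. As far as I understand it, the original error arose because list("g", "h") is a list of length-one strings, while tapply needs grouping vectors as long as the dependent variable.

```r
plotter <- function(dataset, dependent_variable, ...) {
  # Capture the unevaluated column names instead of evaluating them
  indi_variables <- substitute(list(...))
  dep <- substitute(dependent_variable)
  # Evaluate both inside the data frame, so bare names resolve to columns
  tapply(eval(dep, dataset), eval(indi_variables, dataset), FUN = mean)
}

# Made-up data for illustration
df <- data.frame(
  y = c(1, 3, 5, 7),
  g = c("a", "a", "b", "b"),
  h = c("u", "v", "u", "v")
)

by_g  <- plotter(df, y, g)     # mean of y for each level of g
by_gh <- plotter(df, y, g, h)  # g x h table of means
```

With this version the user writes plotter(df, y, g, h) without quotation marks around the column names.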

How do I remove an object from within a function environment in R?

How do I remove an object from the current function environment?
I'm trying to achieve this:
foo <- function(bar){
x <- bar
rm(bar, envir = environment())
print(c(x, is.null(bar)))
}
Because I want the function to be able to handle multiple inputs.
Specifically I'm trying to pass either a dataframe or a vector to the function, and if I'm passing a dataframe I want to set the vector to NULL for later error handling.
If you want, you can look at my DepthPlotter script, where I want to let the second function check if depth is a dataframe and, if so, assign it to df instead and remove depth from the environment.
Here is a very brief sketch of how to set this up using S3 method dispatch.
First, you define your generic:
DepthPlotter <- function(depth,...){
UseMethod("DepthPlotter", depth)
}
Then you define methods for specific classes of the argument depth. As a very basic example in your case, you might create only two, a data.frame method and a default method to handle the vector case:
DepthPlotter.default <- function(depth, variable, ...){
#Here you write a function assuming that depth is
# anything but a data frame
}
DepthPlotter.data.frame <- function(depth,...){
#Here you'd write a function that assumes
# that depth is a data frame
}
And then you can call DepthPlotter() using either type of argument and the correct function will be run based upon the result of class(depth).
The example I've sketched out here is a little crude, since I've used a default method to handle the vector case. You could write .numeric and .integer methods to handle numeric or integer vectors more specifically. In my example, the .default method will be called for any case other than data.frame, so if you go this route you'd want to write some code in there that checks for strange cases like depth being a complicated list, or other odd object, if you think there's a chance something like that might be passed to the function.
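A filled-in version of that skeleton might look like the following. The name depth_summary and the method bodies are placeholders of mine, chosen only to show which method gets picked; they are not the real DepthPlotter logic.

```r
# Hypothetical generic, standing in for DepthPlotter
depth_summary <- function(depth, ...) {
  UseMethod("depth_summary", depth)
}

# Fallback: anything that is not a data frame; reject non-numeric input early
depth_summary.default <- function(depth, ...) {
  stopifnot(is.numeric(depth))
  paste("vector of length", length(depth))
}

# Chosen automatically when class(depth) includes "data.frame"
depth_summary.data.frame <- function(depth, ...) {
  paste("data frame with", nrow(depth), "rows")
}

depth_summary(c(1.5, 2.5, 4))
# "vector of length 3"
depth_summary(data.frame(depth = c(1.5, 2.5)))
# "data frame with 2 rows"
```

With dispatch doing the type check, there is no need to remove or NULL out an argument inside the function body.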

Naming columns of coefficient matrix in a VAR

I am searching for a fast and simple way to give comprehensible names to the columns of a VAR-coefficient matrix.
What I would like to use is the function VAR.names, which is used in the function VAR.est() in the VAR.etp-package. When I use the function VAR.est(), this works perfectly, but as soon as I modify VAR.est (by adding another element to the list of values which are returned), I receive an error message stating "could not find function VAR.names".
I could not find any information on the function VAR.names.
Example:
library(VAR.etp)
data(dat)
M=VAR.est(dat,p=2,type="const")
M$coef
Another possibility would be to use a loop as in the function VAR() from the vars package, but if VAR.names actually worked, this would be a lot more elegant!

Resources