use of $ and () in same syntax? - r

I'm certain there is a really basic answer to this, which is possibly why I'm finding it hard to actually search for and find an answer. But... can somebody please explain exactly what it means to combine $ and () in the same syntax in R?
For example from this vignette:
https://cran.r-project.org/web/packages/pivottabler/vignettes/v00-vignettes.html
library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains)
pt$renderPivot()
I never encountered this while learning R until now years later. I'm seeing it more and more lately but it is not intuitive to me?
$ is usually used when accessing sub-structures of objects in R like columns of a data frame e.g dataframe$column1, while () is usually used to enclose all arguments of a named function e.g rnorm(10,0,1)
What does it mean when they are used together? e.g. x$y(z)

The dollar is a generic operator used to extract or replace parts of recursive objects, such as lists and data frames.
A list is an object consisting of an ordered collection of objects (including other lists), perhaps of different types, said components.
Consider the following list:
L <- list(a = 1, f = function() message("hello"))
This is a list with two components: a and f.
The first is a number and the second is a function. By applying the $-operator, you extract the value of the component, which can also be reassigned:
L$a
# 1
L$a <- 2
L$a
# 2
In the case of the f component, because it is a function, you get its body:
L$f
# function() message("hello")
This is in line with each function identifier: its value is the function's body. It is not surprising that, applying the parentheses to the function's identifier, you execute the function, that is:
L$f()
# hello
This opens the doors to very powerful structures, where you can store both data and the functions to manipulate them.
This logic resembles the classes used in the OOP world. Of course, you need much more features, such instantiations, inheritance. Such mechanisms are provided, for example, by the R6 package, which you mention in your tag.
library(R6)
A <- R6Class("A", list(f=function() message("hello") ))
a <- A$new()
a$f()
# hello
A is an R6 class, so A$new() creates a new instance of the class, a, by means of the class function new. As you can see, this function is called using a syntax (and a logic) similar to L$f() above. The instance a inherits the class function f, said method here, and a$f() executes it.

Related

Object modification only happens in list

I have put objects that I would like to edit in a list.
Say, the names of the objects are kind of like this:
name1_X
name1_Y
name2_X
name2_Y
And there are different sets of these objects, that are stored in different lists, so for each different set, they would have a slightly different name, like:
name1_P_X
name1_F1_X
name2_F2_Y
and so on..
So for every "name" there are six objects. There are two each ending with X or Y for P, F1, F2. We have three lists (listbF_P, listbF_F1, listbF_F2), each containing objects that end with X and Y.
I edited the objects in the list like this (example for only one list):
for (i in 1:NROW(listbF_P)){
listbF_P[[i]]#first.year <- 1986
listbF_P[[i]]#last.year <- 2005
listbF_P[[i]]#year.aggregate.method <- "mean"
listbF_P[[i]]#id <- makeFieldID(listbF_P[[i]])
}
When I check whether the changes were successfully applied, it works but only when referring to the objects inside the list but not the same objects "unlisted".
So if I call
listbF_P[[1]]#last.year
it returns
"2005"
But if I call
name1_X#last.year
it returns
"Inf"
The problem with this is that I want the edited objects in a different list later.
So I need either a way that the latter call example returns "2005" or a way that I can search for a certain object name pattern in multiple lists to put the ones that fit the pattern into another list.
This is because the example above was made with multiple lists (listbF_P, listbF_F1, listbF_F2) and these lists contain a pattern matching "X" and another matching "Y".
So basically I want to have two lists with edited objects, one matching pattern "X" and the other matching pattern "Y".
I would call the list matching the desired patterns like this:
listbF_ALL_X <- mget(ls(pattern=".*_X$"))
listbF_ALL_Y <- mget(ls(pattern=".*_Y$"))
The first list would hence contain all objects ending with "X", e.g.:
name1_P_X
name1_F1_X
name1_F2_X
name2_P_X
[...]
and I would like to have the ones that I edited in the loop earlier
..but when calling the objects out of that list
listbF_ALL_X[[1]]#last.year
again just returns
"Inf"
since it takes the objects out of the environment and not the list. But I want it to return the desired number that has been changed (e.g. "2005").
I hope my problem and the two possible ways of solving them are clear..
If something isn't, ask :)
Thanks for any input
Regards
In R, unlike in many other modern languages, (almost) all objects are logically copies of each other. You can’t have multiple names that are references to the same object (see below for caveats).
But even if this was supported, your design looks confusing. Rather than have lots of related objects with different names, put your objects into nested lists and classes that logically relate them. That is, rather than have objects with names name{1..10}_{P,F1,F2}_{X,Y}, you should have one list, name, in which you store nested lists or classes with named members P, F1, F2 which, in turn, are objects that have names X and Y. Then you could access an object by, say, name[1L]$P$X (or name[1L]#P#X, if you’re using S4 objects with slots).
Or you use a more data-oriented approach and flatten all these nested objects into a table with corresponding columns P, F1, F2, X and Y. Which solution is more appropriate depends on your exact use-case.
Now for the caveat: you can use reference semantics in R by using *environments8 instead of regular objects. When copying an environment, a reference to the same environment object is created. However, this semantic is usually confusing because it’s contrary to the expectation in R, so it should be used with care. The ‘R6’ package creates an object system with reference semantics based on environments. For many purposes where reference semantics are indispensable, ‘R6’ is the right answer.
I found another solution:
I went on by modifying this part:
listbF_ALL_X <- mget(ls(pattern=".*_X$"))
listbF_ALL_Y <- mget(ls(pattern=".*_Y$"))
To not call objects from the environment but by calling objects from each list:
listbF_ALL_X <- c(c(listbF_P, listbF_F1, listbF_F2)[grepl(".*_X$", names(c(listbF_P, listbF_F1, listbF_F2)))])
listbF_ALL_Y <- c(c(listbF_P, listbF_F1, listbF_F2)[grepl(".*_Y$", names(c(listbF_P, listbF_F1, listbF_F2)))])
It's not the prettiest way of doing it but it works and in my case it was the solution that required the least amount of change in my script.

Can someone please explain me this code? especially the role of "function x and [[x]]"?

This is the code in R and I'm having trouble understanding the role of function(x) and qdata[[x]] in this line of code. Can someone elaborate me this piece by piece? I didn't write this code. Thank you
outs=lapply(names(qdata[,12:35]), function(x)
hist(qdata[[x]],data=qdata,main="Histogram of Quality Trait",
xlab=as.character(x),las=1.5)$out)
This code generate a series of histograms, one for each of columns 12 to 35 of dataframe qdata. The lapply function iterates over the columns. At each iteraction, the name of the current column is passed as argument "x" to the anonymous function defined by "function(x)". The body of the function is a call to the hist() function, which creates the histogram. qdata[[x]] (where x is the name of a column) extracts the data from that column. I am actually confused by "data=qdata".
We don't have the data object named qdata so we cannot really be sure what will happen with this code. It appears that the author of this code is trying to pass the values of components named outs from function calls to hist. If qdata is an ordinary dataframe, then I suspect that this code will fail in that goal, because the hist function does not have an out component. (Look at the output of ?hist. When I run this with a simple dataframe, I do get histogram plots that appear in my interactive plotting device but I get NULL values for the outs components. Furthermore the 12 warnings are caused by the lack of a data parameter to hte hist function.
qdata <- data.frame(a=rnorm(10), b=rnorm(10))
outs=lapply(names(qdata), function(x)
hist(qdata[[x]],data=qdata,main="Histogram of Quality Trait",
xlab=as.character(x),las=1.5)$out)
#There were 12 warnings (use warnings() to see them)
> str(outs)
List of 2
$ : NULL
$ : NULL
So I think we need to be concerned about the level of R knowledge of the author of this code. It's possible I'm wrong about this presumption. The hist function is generic and it is possible that some unreferenced package has a function designed to handle a data object and retrun an outs value when delivered a vector having a particular class. In a typical starting situation with only the base packages loaded however, there are only three hist.* functions:
methods(hist)
#[1] hist.Date* hist.default hist.POSIXt*
#see '?methods' for accessing help and source code
As far as the questions about the role of function and [[x]]: the keyword function returns a language object that can receive parameter values and then do operations and finally return results. In this case the names get passed to the anonymous function and become, each in turn, the local name, x and the that value is used by the '[['-function to look-up the column in what I am presuming is the ‘qdata’-dataframe.

Trying to understand R structure: what does a dot in function names signify?

I am trying to learn how to use R. I can use it to do basic things like reading in data and running a t-test. However, I am struggling to understand the way R is structured (I am have a very mediocre java background).
What I don't understand is the way the functions are classified.
For example in is.na(someVector), is is a class? Or for read.csv, is csv a method of the read class?
I need an easier way to learn the functions than simply memorizing them randomly. I like the idea of things belonging to other things. To me it seems like this gives a language a tree structure which makes learning more efficient.
Thank you
Sorry if this is an obvious question I am genuinely confused and have been reading/watching quite a few tutorials.
Your confusion is entirely understandable, since R mixes two conventions of using (1) . as a general-purpose word separator (as in is.na(), which.min(), update.formula(), data.frame() ...) and (2) . as an indicator of an S3 method, method.class (i.e. foo.bar() would be the "foo" method for objects with class attribute "bar"). This makes functions like summary.data.frame() (i.e., the summary method for objects with class data.frame) especially confusing.
As #thelatemail points out above, there are some other sets of functions that repeat the same prefix for a variety of different options (as in read.table(), read.delim(), read.fwf() ...), but these are entirely conventional, not specified anywhere in the formal language definition.
dotfuns <- apropos("[a-z]\\.[a-z]")
dotstart <- gsub("\\.[a-zA-Z]+","",dotfuns)
head(dotstart)
tt <- table(dotstart)
head(rev(sort(tt)),10)
## as is print Sys file summary dev format all sys
## 118 51 32 18 17 16 16 15 14 13
(Some of these are actually S3 generics, some are not. For example, Sys.*(), dev.*(), and file.*() are not.)
Historically _ was used as a shortcut for the assignment operator <- (before = was available as a synonym), so it wasn't available as a word separator. I don't know offhand why camelCase wasn't adopted instead.
Confusingly, methods("is") returns is.na() among many others, but it is effectively just searching for functions whose names start with "is."; it warns that "function 'is' appears not to be generic"
Rasmus Bååth's presentation on naming conventions is informative and entertaining (if a little bit depressing).
extra credit: are there any dot-separated S3 method names, i.e. cases where a function name of the form x.y.z represents the x.y method for objects with class attribute z ?
answer (from Hadley Wickham in comments): as.data.frame.data.frame() wins. as.data.frame is an S3 generic (unlike, say, as.numeric), and as.data.frame.data.frame is its method for data.frame objects. Its purpose (from ?as.data.frame):
If a data frame is supplied, all classes preceding ‘"data.frame"’
are stripped, and the row names are changed if that argument is
supplied.

How to use a value that is specified in a function call as a "variable"

I am wondering if it is possible in R to use a value that is declared in a function call as a "variable" part of the function itself, similar to the functionality that is available in SAS IML.
Given something like this:
put.together <- function(suffix, numbers) {
new.suffix <<- as.data.frame(numbers)
return(new.suffix)
}
x <- c(seq(1000,1012, 1))
put.together(part.a, x)
new.part.a ##### does not exist!!
new.suffix ##### does exist
As it is written, the function returns a dataframe called new.suffix, as it should because that is what I'm asking it to do.
I would like to get a dataframe returned that is called new.part.a.
EDIT: Additional information was requested regarding the purpose of the analysis
The purpose of the question is to produce dataframes that will be sent to another function for analysis.
There exists a data bank where elements are organized into groups by number, and other people organize the groups
into a meaningful set.
Each group has an id number. I use the information supplied by others to put the groups together as they are specified.
For example, I would be given a set of id numbers like: part-1 = 102263, 102338, 202236, 302342, 902273, 102337, 402233.
So, part-1 has seven groups, each group having several elements.
I use the id numbers in a merge so that only the groups of interest are extracted from the large data bank.
The following is what I have for one set:
### all.possible.elements.bank <- .csv file from large database ###
id.part.1 <- as.data.frame(c(102263, 102338, 202236, 302342, 902273, 102337, 402233))
bank.names <- c("bank.id")
colnames(id.part.1) <- bank.names
part.sort <- matrix(seq(1,nrow(id.part.1),1))
sort.part.1 <- cbind(id.part.1, part.sort)
final.part.1 <- as.data.frame(merge(sort.part.1, all.possible.elements.bank,
by="bank.id", all.x=TRUE))
The process above is repeated many, many times.
I know that I could do this for all of the collections that I would pull together, but I thought I would be able to wrap the selection process into a function. The only things that would change would be the part numbers (part-1, part-2, etc..) and the groups that are selected out.
It is possible using the assign function (and possibly deparse and substitute), but it is strongly discouraged to do things like this. Why can't you just return the data frame and call the function like:
new.part.a <- put.together(x)
Which is the generally better approach.
If you really want to change things in the global environment then you may want a macro, see the defmacro function in the gtools package and most importantly read the document in the refrences section on the help page.
This is rarely something you should want to do... assigning to things out of the function environment can get you into all sorts of trouble.
However, you can do it using assign:
put.together <- function(suffix, numbers) {
assign(paste('new',
deparse(substitute(suffix)),
sep='.'),
as.data.frame(numbers),
envir=parent.env(environment()))
}
put.together(part.a, 1:20)
But like Greg said, its usually not necessary, and always dangerous if used incorrectly.

The Art of R Programming : Where else could I find the information?

I came across the editorial review of the book The Art of R Programming, and found this
The Art of R Programming takes you on a guided tour of software development with R, from basic types and data structures to advanced topics like closures, recursion, and anonymous functions
I immediately became fascinated by the idea of anonymous functions, something I had come across in Python in the form of lambda functions but could not make the connection in the R language.
I searched in the R manual and found this
Generally functions are assigned to symbols but they don't need to be. The value returned by the call to function is a function. If this is not given a name it is referred to as an anonymous function. Anonymous functions are most frequently used as arguments other functions such as the apply family or outer.
These things for a not-very-long-time programmer like me are "quirky" in a very interesting sort of way.
Where can I find more of these for R (without having to buy a book) ?
Thank you for sharing your suggestions
Functions don't have names in R. Whether you happen to put a function into a variable or not is not a property of the function itself so there does not exist two sorts of functions: anonymous and named. The best we can do is to agree to call a function which has never been assigned to a variable anonymous.
A function f can be regarded as a triple consisting of its formal arguments, its body and its environment accessible individually via formals(f), body(f) and environment(f). The name is not any part of that triple. See the function objects part of the language definition manual.
Note that if we want a function to call itself then we can use Recall to avoid knowing whether or not the function was assigned to a variable. The alternative is that the function body must know that the function has been assigned to a particular variable and what the name of that variable is. That is, if the function is assigned to variable f, say, then the body can refer to f in order to call itself. Recall is limited to self-calling functions. If we have two functions which mutually call each other then a counterpart to Recall does not exist -- each function must name the other which means that each function must have been assigned to a variable and each function body must know the variable name that the other function was assigned to.
There's not a lot to say about anonymous functions in R. Unlike Python, where lambda functions require special syntax, in R an anonymous function is simply a function without a name.
For example:
function(x,y) { x+y }
whereas a normal, named, function would be
add <- function(x,y) { x+y }
Functions are first-class objects, so you can pass them (regardless of whether they're anonymous) as arguments to other functions. Examples of functions that take other functions as arguments include apply, lapply and sapply.
Get Patrick Burns' "The R Inferno" at his site
There are several good web sites with basic introductions to R usage.
I also like Zoonekynd's manual
Great answers about style so far. Here's an answer about a typical use of anonymous functions in R:
# Make some data up
my.list <- list()
for( i in seq(100) ) {
my.list[[i]] <- lm( runif(10) ~ runif(10) )
}
# Do something with the data
sapply( my.list, function(x) x$qr$rank )
We could have named the function, but for simple data extractions and so forth it's really handy not to have to.

Resources