What is the difference between objects and variables? - r

I want to know the difference between Variables and Objects in R. Is 'a'in the code provided an object or variable ? Where this a is going to be save, in heap or stack ?
a <- 1

There is no difference, from a data type perspective, between 1 and c(1,2,3). Everything in R is an object. For instance:
a <- 1
b <- c(1,2,3)
typeof(a)==typeof(b)
#[1] TRUE
class(a)==class(b)
#[1] TRUE
R is a high level language and you don't have visibility on where and when R actually allocates memory.

We can do object oriented programming in R. In fact, everything in R is an object.
An object is a data structure having some attributes and methods which act on its attributes.
Class is a blueprint for the object. We can think of class like a sketch (prototype) of a house. It contains all the details about the floors, doors, windows etc. Based on these descriptions we build the house.
House is the object. As, many houses can be made from a description, we can create many objects from a class. An object is also called an instance of a class and the process of creating this object is called instantiation.
While most programming languages have a single class system, R has three class systems. Namely, S3, S4 and more recently Reference class systems.
They have their own features and peculiarities and choosing one over the other is a matter of preference. Below, we give a brief introduction to them.
S3 class is somewhat primitive in nature. It lacks a formal definition and object of this class can be created simply by adding a class attribute to it.
create a list with required components
s <- list(name = "Rafay", age = 21, GPA = 3.72)
name the class appropriately
class(s) <- "student"
S4 class are an improvement over the S3 class. They have a formally defined structure which helps in making object of the same class look more or less similar.
< setClass("student", slots=list(name="character", age="numeric", GPA="numeric"))
Now if you're from a c#, c background you must think that
When in c#
int a =2 #it is called variable
Student std1=new Student() ;# it is called object
But as mentioned above everything in R is called object.

Related

Getting type of object in R [duplicate]

Two R questions:
What is the difference between the type (returned by typeof) and the class (returned by class) of a variable? Is the difference similar to that in, say, C++ language?
What are possible types and classes of variables?
In R every "object" has a mode and a class. The former represents how an object is stored in memory (numeric, character, list and function) while the later represents its abstract type. For example:
d <- data.frame(V1=c(1,2))
class(d)
# [1] "data.frame"
mode(d)
# [1] "list"
typeof(d)
# list
As you can see data frames are stored in memory as list but they are wrapped into data.frame objects. The latter allows for usage of member functions as well as overloading functions such as print with a custom behavior.
typeof(storage.mode) will usually give the same information as mode but not always. Case in point:
typeof(c(1,2))
# [1] "double"
mode(c(1,2))
# [1] "numeric"
The reasoning behind this can be found here:
The R specific function typeof returns the type of an R object
Function mode gives information about the mode of an object in the sense of Becker, Chambers & Wilks (1988), and is more compatible with other implementations of the S language
The link that I posted above also contains a list of all native R basic types (vectors, lists etc.) and all compound objects (factors and data.frames) as well as some examples of how mode, typeof and class are related for each type.
type really refers to the different data structures available in R. This discussion in the R Language Definition manual may get you started on objects and types.
On the other hand, class means something else in R than what you may expect. From
the R Language Definition manual (that came with your version of R):
2.2.4 Classes
R has an elaborate class system1, principally controlled via the class attribute. This attribute is a character vector containing the list
of classes that an object inherits from. This forms the basis of the “generic methods” functionality in R.
This attribute can be accessed and manipulated virtually without restriction by users. There is no checking that an object actually contains the components that class methods expect. Thus, altering the class attribute should be done with caution, and when they are available specific creation and coercion functions should be preferred.

Are there obvious reasons to choose between methods and functions in R?

I've written a function to return a class I built that contains some calculations for the data passed to the function.
Once the new object is returned, I intend to print out some of the data in a little "report" and then map the lines contained in the sf slot colored by an attribute the original function calculated.
carbon_class <- setClass("carbon_class", slots = c(total_carbon = "numeric", carbon_by_type = "data.frame", trips = "sf"), contains = c("data.frame", "sf"))
I was going to define two methods for the class to create the report and map, mainly to practice object oriented programming in R, but as I'm reading about it, I'm having a hard time coming up with a reason to use a method instead of just another function.
Are there obvious use cases for each? I'm reading through Hadley's Advanced R and it talks about how to use the S3 and S4 classes/methods, but not why.
Thanks
edit: is it for using side effects because technically functions should only return a value without any side effects while its more acceptable for methods to do other things in addition to what they return?

R: Finding class inheritance structure within an R package

You can find all the objects in a package with
objs <- mget(ls("package:base"), inherits = TRUE)
You can select the functions from these with
funs <- objs[is.function(objs)]
You can get a complete list of the dependencies of the listed functions in a package by applying codetools::findGlobals(), miniCRAN::makeDepGraph, pkgnet::CreatePackageReport (or others) to the function list. All of these functions either graph the resulting dependencies or return an object easily plotable with, e.g., igraph or DependenciesGraph.
Is there an comparable set of commands to find all the classes created by a package and the inheritance structure of those classes? I know that for most packages the resulting web of class inheritance would be relatively simple, but I think that in a few cases, such as ggplot2 and the survey package, the resulting web of class inheritance could be quite helpful.
I have found a package, classGraph, that creates directed acyclic graphs for S4 class structures, but I am more interested in the much more common S3 structures.
This seems brute-force and sloppy, but I suppose if I had a list of all the class attributes used by objects in the base packages, and all the class attributes of objects in a package, then any of the latter which is not among the former would be new classes created by the package or inherited from another non-base package.
This is slightly tricky since I am not aware of any formal definition of a S3 class. For R objects the S3 classes are governed by a very simple character vector of class names stored in the class attribute. Method dispatch is then done by matching element(s) of that attribute with a function name.
You could essentially do:
x <- 1:5
class(x) <- "MyMadeUpClass"
x
# [1] 1 2 3 4 5
# attr(,"class")
# [1] "MyMadeUpClass"
Does the above really define a class in the intuitive formal understanding of the term ?
You can create a print method for objects of this class like (silly example incoming):
print.MyMadeUpClass <- function(x, ...) {
print(sprintf("Pretty vector: %s", paste(x, collapse = ",")))
}
x
# [1] "Pretty vector: 1,2,3,4,5"
The important distinction here is that methods in S3
"belong to" (generic) functions, not classes
are chosen based on classes of the arguments provided to the function call
Point I am trying to make is that S3 does not really have a formally defined inheritance (which I assume is what you are looking for), with contrast to S4 which implements this via the contains concept, so I am not really sure what would you like to see as a result.
Very good read on the topic Object-Oriented Programming, Functional
Programming and R by John M. Chambers: https://arxiv.org/pdf/1409.3531.pdf
Edit (after question edit) - the sloop package:
From S3 perspective I think it makes a lot of sense to examine the structure of generics and methods. A found the sloop package to be a very useful tool for this: https://github.com/r-lib/sloop.

R: modify any function applied to a S4 class

I've been developing a S4 class which is essentially a data.frame with a little bit of extra information. For the purposes of this question, the "extra" features of this class are irrelevant. What matters is that the class contains a data.frame object stored in one of it's slots. (I put the data.frame in a slot, instead of naming it a superclass, because I find that S4 classes which contain data.frames simplify the data.frames to lists for some reason).
Here's a basic example:
setClass('tmp_class', slots = c(df = 'data.frame'))
test_object <- new('tmp_class', df = data.frame(Num = 1:10, Let = letters[1:10]))
Now what I'd like to do is make it so that essentially any function applied to an object of this class is applied to the data.frame in slot #df. It's easy to write methods for specific functions to do this, like:
setMethod('dim', signature = c(x = 'tmp_class'), function(x) dim(x#df))
But I'm limited to only the functions I can think of, and any function invented by a user wouldn't work.
It is a simple matter to write a sort of wrapper/closure to modify a function to work on my class, like this:
tmp_classize <- function(func){
function(tmp, ...){ func(tmp#df, ...) }
}
So, rather than writing methods for, say, colnames() or ncol(), I could just run:
tmp_classize(colnames)(test_object)
or
tmp_classize(ncol)(test_object)
But what I'd like to do is somehow evoke my "tmp_classize" function on any function applied to my class, automatically. I can't figure out how to do it. I was thinking that if could somehow call a "universal method" with an input signature of class "tmp_class", and then use sys.function() to grab the actual function being called, maybe I could make something work, but A) there are recursion problems B) I don't know how to call such a "universal" method. It seems to me that the solution, if it exists at all, might necessitate non-standard evaluation, which I'd rather avoid, but might use if necessary.
Thanks!
P.S. I realize this undertaking may be unwise/poor programming technique, and I may never actually implement it in a package. Still I'm curious to know if it is possible.
P.P.S. I'd also be interested in the same idea applied to S3 classes!
In principal what you could do is make a classUnion for your class and data.frame and write methods for your class that deal with all of the ways to read and write to data.frames such as $, [, dim(), <- and many more. Then when other functions seek to use your new class as data.frame there will be methods for this to work. This is somewhat explained in John Chambers "Software for Data Analysis" starting on page 375. That said this system may be very difficult to implement.
A simpler system may be to just add an extra attribute to your data.frame with the extra info you need. For example:
x<-data.frame(a=1:3,b=4:6)
attr(x,"Info")<-"Extra info I need"
attributes(x)$Info
[1] "Extra info I need"
This is not as elegant as a S4 class but will do everything a data.frame does. I suspect that someone who is familiar with S3 classes could improve on this idea quite a bit.
The simplest solution is to have your class contain data.frame instead of having it as one of the slots. For example here is a data.frame with a timestamp:
setclass(
"timestampedDF",
slots=c(timestamp="POSIXt"),
contains="data.frame"
)
Now all functions which work for a data.frame (such as head) will automatically work for timestampedDF objects. If you need to get at the "data frame part", then that is held in a hidden slot object#.Data.

Easiest way to save an S4 class

Probably the most basic question on S4 classes imaginable here.
What is the simplest way to save an S4 class you have defined so that you can reuse it elsewhere. I have a project where I'm taking a number of very large datasets and compiling summary information from them into small S4 objects. Since I'll therefore be switching R sessions to create the summary object for each dataset, it'd be good to be able to load in the definition of the class from a saved object (or have it load automatically) rather than having to include the long definition of the object at the top of each script (which I assume is bad practice anyway because the code defining the object might become inconsistent).
So what's the syntax along the lines of saveclass("myClass"), loadclass("myclass") or am I just thinking about this in the wrong way?
setClass("track", representation(x="numeric", y="numeric"))
x <- new("track", x=1:4, y=5:8)
save as binary
fn <- tempfile()
save(x, ascii=FALSE, file=fn)
rm(x)
load(fn)
x
save as ASCII
save(x, ascii=TRUE, file=fn)
ASCII text representation from which to regenerate the data
dput(x, file=fn)
y <- dget(fn)
The original source can be found here.
From the question, I think you really do want to include the class definition at the top of each script (although not literally; see below), rather than saving a binary representation of the class definition and load that. The reason is the general one that binary representations are more fragile (subject to changes in software implementation) compared to simple text representations (for instance, in the not too distant past S4 objects were based on simple lists with a class attribute; more recently they have been built around an S4 'bit' set on the underlying C-level data representation).
Instead of copying and pasting the definition into each script, really the best practice is to included the class definition (and related methods) in an R package, and to load the package at the top of the script. It is not actually hard to write packages; an easy way to get started is to use Rstudio to create a 'New Project' as an 'R package'. Use a version number in the package to keep track of the specific version of the class definition / methods you're using, and version control (svn or git, for instance) to make it easy to track the changes / explorations you make as your class matures. Share with your colleagues and eventually the larger R community to let others benefit from your hard work and insight!

Resources