How can I read the source code for an R function? - r

I have a data frame and I want to learn how summary generates its information. Specifically, how does summary generate a count for the number of elements in each level of a factor? I can use summary, but I want to learn how to work with factors better. When I try ?summary, I just get the general info. Is this impossible because it is in bytecode?

What you see when you type summary is
> summary
function (object, ...)
UseMethod("summary")
<bytecode: 0x0456f73c>
<environment: namespace:base>
This is telling us that summary is a generic function and has many methods attached to it. To see what those methods are actually called we can try
> methods(summary)
[1] summary.aov summary.aovlist summary.aspell*
[4] summary.connection summary.data.frame summary.Date
[7] summary.default summary.ecdf* summary.factor
[10] summary.glm summary.infl summary.lm
[13] summary.loess* summary.manova summary.matrix
[16] summary.mlm summary.nls* summary.packageStatus*
[19] summary.PDF_Dictionary* summary.PDF_Stream* summary.POSIXct
[22] summary.POSIXlt summary.ppr* summary.prcomp*
[25] summary.princomp* summary.srcfile summary.srcref
[28] summary.stepfun summary.stl* summary.table
[31] summary.tukeysmooth*
Non-visible functions are asterisked
Here we see all the methods associated with the summary function. What this means is that there is different code for when you call summary on an lm object than there is when you call summary on a data.frame. This is good because we wouldn't expect the summary to be conducted the same way for those two objects.
To see the code that is run when you call summary on a data.frame you can just type
summary.data.frame
as shown in the methods list. You'll be able to examine it and study it and do whatever you want with the printed code. You mentioned that you were interested in factors so you will probably want to examine the output of summary.factor. Now you might notice that some of the methods printed had an asterisk (*) next to them which implies that they're non-visible. This essentially means that you can't just type the name of the function to try to view the code.
> summary.prcomp
Error: object 'summary.prcomp' not found
However, if you're determined to see what the code actually is you can use the getAnywhere function to view it.
> getAnywhere(summary.prcomp)
A single object matching ‘summary.prcomp’ was found
It was found in the following places
registered S3 method for summary from namespace stats
namespace:stats
with value
function (object, ...)
{
vars <- object$sdev^2
vars <- vars/sum(vars)
importance <- rbind(`Standard deviation` = object$sdev, `Proportion of Variance` = round(vars,
5), `Cumulative Proportion` = round(cumsum(vars), 5))
colnames(importance) <- colnames(object$rotation)
object$importance <- importance
class(object) <- "summary.prcomp"
object
}
<bytecode: 0x03e15d54>
<environment: namespace:stats>
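As a side note, getAnywhere is not the only route to a non-visible method. The sketch below shows two alternatives I am fairly sure are available in any standard R installation: getS3method, which looks a method up by generic and class, and the ::: operator, which reaches into a package namespace directly. The variable names m1 and m2 are just for illustration.

```r
# Look up the S3 method for generic "summary" and class "prcomp"
m1 <- getS3method("summary", "prcomp")

# Reach directly into the stats namespace for the non-exported function
m2 <- stats:::summary.prcomp

# Both should refer to the same function object
identical(m1, m2)

# Printing either one shows the source
m1
```

getS3method is usually the safer of the two, since ::: depends on knowing which namespace the method lives in.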
Hopefully this helps you explore the code in R much more easily in the future.
For even more details you can view Volume 6/4 of R News (the predecessor of The R Journal; warning, PDF) and read Uwe Ligges's "R Help Desk" section, which deals with viewing the source code of R functions.
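To tie this back to the original question about factors: typing summary.factor at the prompt prints the method's source, which is the authoritative answer. As a rough sketch of the idea only (not a copy of that source), the per-level counts that summary reports for a factor can be reproduced with table or tabulate. The factor f and the vector counts are made-up names for this example.

```r
# A small factor to experiment with
f <- factor(c("a", "b", "a", "c", "a"))

# What summary reports for a factor: one count per level
summary(f)

# The same counts via table()
table(f)

# And by hand: a factor is stored as integer codes plus a levels attribute,
# so counting the codes gives the per-level counts
counts <- tabulate(as.integer(f), nbins = nlevels(f))
names(counts) <- levels(f)
counts

# Print the actual method to compare with the sketch above
summary.factor
```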

Related

How to tell R how to plot objects of a certain class?

I'm currently dealing with some objects that are a list of attributes that represents a statistical model. For example, let's say I have a matrix, a numeric vector and an integer.
myobj = list(amatrix = matrix(1:9,3,3),avector = c(1:3),aninteger = 1)
class(myobj) = 'myclass'
Suppose that, for some reason, I can create a plot that represents an object of this class. How can I make plot(myobj) recognize that the object has the class 'myclass' and draw it in the desired way, for example with image(myobj$amatrix)?
I think the question is essentially how to 'modify' R's plot function so it knows how to handle a newly defined object class? Can I use functions of other packages like ggplot when executing this modification?
In a more general sense, how do functions that handle different classes of objects know how to act for each class?
I have little to no experience with classes in R, so even some simple guides about classes would be helpful.
As mentioned by @emilliman, you can define your own method:
myobj = list(amatrix = matrix(1:9,3,3),avector = c(1:3),aninteger = 1)
class(myobj) <- 'myclass'
plot.myclass <- function(x, ...) image(x$amatrix, ...) # keep ... so the method matches the plot(x, y, ...) generic
methods(plot) # check the 4th element of 3rd line :) (list will differ depending on what packages are loaded)
# [1] plot.acf* plot.data.frame* plot.decomposed.ts* plot.default plot.dendrogram* plot.density* plot.ecdf
# [8] plot.factor* plot.formula* plot.function plot.hclust* plot.histogram* plot.HoltWinters* plot.isoreg*
# [15] plot.lm* plot.medpolish* plot.mlm* plot.myclass plot.ppr* plot.prcomp* plot.princomp*
# [22] plot.profile.nls* plot.R6* plot.raster* plot.spec* plot.stepfun plot.stl* plot.table*
# [29] plot.ts plot.tskernel* plot.TukeyHSD*
#and plot :
plot(myobj)
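On the more general question of how functions know what to do for each class: the mechanism is the same UseMethod dispatch that plot uses, and you can build it yourself. Below is a minimal sketch with a made-up generic called describe (not part of base R or any package), using an object like the one in the question.

```r
# A hypothetical generic: UseMethod() picks the method from the class of x
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) "generic object"

# Method for objects of class "myclass"
describe.myclass <- function(x, ...) {
  paste("myclass object with a", nrow(x$amatrix), "x", ncol(x$amatrix), "matrix")
}

myobj <- structure(
  list(amatrix = matrix(1:9, 3, 3), avector = 1:3, aninteger = 1L),
  class = "myclass"
)

describe(myobj)  # dispatches to describe.myclass
describe(1:3)    # no describe.integer or describe.numeric, so describe.default runs
```

A method can call anything inside its body, so using functions from other packages such as ggplot2 inside plot.myclass should work the same way, provided the package is installed.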

Why does the function t return a t.test for objects with class set to "test"?

I'm reading Hadley Wickham's book Advanced R, specifically the OO field guide (http://adv-r.had.co.nz/OO-essentials.html). The first exercise in that chapter is as follows:
Read the source code for t() and t.test() and confirm that t.test() is an S3 generic and not an S3 method. What happens if you create an object with class test and call t() with it?
If I understood the chapter correctly, we can confirm that t() and t.test() are generic, because they use the UseMethod() function in the source code. methods(t) returns t.data.frame, t.default and t.ts* as the methods of function t(). Why then, if both are S3 generics and t does not have a t.test method, does the following code return the t test?
a <- structure(1:4, class = "test")
t(a)
My prediction would be that t would use the default method for class "test" and t.default(a) does the transpose as logically I suppose it should. Where then does the t.test come from?
If you run t(a), where a is your object of class test, then UseMethod("t") is called. This checks the class of the first argument that you provided to t(). The class is test, so R will now look for a function t.test(). Since t.test() exists, t.test(a) is run. This is called "method dispatch".
Only if t.test() did not exist would R resort to calling t.default(). You can actually see this happen by detaching the stats package before running t(a):
a <- structure(1:4, class = "test")
detach("package:stats")
t(a)
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## attr(,"class")
## [1] "test"
The question now is why t.test is not contained in the list when you run methods("t"). When you look at the source code of methods(), you will notice that it calls .S3methods(). This function compiles the names of all the methods of t. However, at some point, it removes the function names that are contained in S3MethodsStopList:
info <- info[grep(name, row.names(info)), ]
info <- info[!row.names(info) %in% S3MethodsStopList,
]
(If I run edit(.S3methods) in RStudio, these are lines 47 and 48).
S3MethodsStopList is defined earlier (on line 15):
S3MethodsStopList <- tools:::.make_S3_methods_stop_list(NULL)
The function tools:::.make_S3_methods_stop_list() seems not to be documented, but it just seems to return a hardcoded list of function names that contain a dot, but are actually not methods. t.test() is one of them:
grep("^t\\.", tools:::.make_S3_methods_stop_list(NULL), value = TRUE)
## Hmisc6 calibrator mosaic mratios1
## "t.test.cluster" "t.fun" "t.test" "t.test.ratio"
## mratios2 mratios3 stats6
## "t.test.ratio.default" "t.test.ratio.formula"
In short, methods() does explicitly filter out functions that are known not to be methods. Method dispatching, on the other hand, simply looks for a function with an appropriate name.
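The name-based lookup can be reproduced without touching t() or the stats package. This is a small sketch with a made-up generic called shape; the function shape.test is deliberately written as if it were an ordinary function that merely has a dot in its name.

```r
# A hypothetical generic
shape <- function(x, ...) UseMethod("shape")

# The fallback method
shape.default <- function(x, ...) "shape.default() was called"

# Imagine this was meant as a standalone function, not as a method
shape.test <- function(x, ...) "shape.test() was called"

a <- structure(1:4, class = "test")

# Dispatch only looks at names: class "test" + generic "shape" -> shape.test
shape(a)

# Without the class attribute, nothing more specific matches,
# so the default method runs
shape(unclass(a))
```

This is the same situation as t() and t.test(): the dispatcher has no way of knowing the author's intent, only that a function with the right name exists.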

R how to view standard generic method when no object is specified [duplicate]

Possible Duplicate:
R: show source code of an S4 function in a package
I downloaded a package (GEOquery) and was playing with some of the functions. One of them is called Table, which, to my understanding, is able to tabulate an S4 dataset.
E.g.
> summary(GDS2853) # GDS2853 is a dataset I downloaded from NCBI
Length Class Mode
1 GDS S4
getAnywhere(Table) shows
> getAnywhere(Table)
A single object matching ‘Table’ was found
It was found in the following places
package:GEOquery
namespace:GEOquery
with value
function (object)
standardGeneric("Table")
<environment: 0x06ad5268>
attr(,"generic")
[1] "Table"
attr(,"generic")attr(,"package")
[1] "GEOquery"
attr(,"package")
[1] "GEOquery"
attr(,"group")
list()
attr(,"valueClass")
character(0)
attr(,"signature")
[1] "object"
attr(,"default")
`\001NULL\001`
attr(,"skeleton")
function (object)
stop("invalid call in method dispatch to \"Table\" (no default method)",
domain = NA)(object)
attr(,"class")
[1] "standardGeneric"
attr(,"class")attr(,"package")
[1] "methods"
I'd like to read the code of Table so that I can learn how to tabulate a GDS dataset, since data.frame and as.list couldn't coerce an S4 class. I could, however, tabulate the GDS dataset with, for example,
GDS_table=Table(GDS2853)[1:20000,1:20] # GDS2853 contains 20 columns and approx. 17000 rows
I tried getMethod as suggested in other posts, but below is what I got
> getMethod("Table")
Error in getMethod("Table") :
No method found for function "Table" and signature
I also tried to specify the "where" by putting in package=:GEOquery but apparently package is an unused argument.
I wonder what I did wrong that keeps me from seeing the source code for Table.
From the output you posted, it looks like Table is an S4 generic.
To view a list of its S4 methods, use showMethods(). To view a particular method, use getMethod(), passing in the 'signature' of the method you want along with the name of the function. (A 'signature' is a character vector composed of the class(es) of the argument(s) on which the generic Table performs its method dispatch. For example, if you will be calling Table(GDS2853), the signature will likely be class(GDS2853).)
Here's an example that gets the code for an S4 method in the sp package:
library(sp)
showMethods("overlay")
# Function: overlay (package sp)
# x="SpatialGrid", y="SpatialPoints"
# x="SpatialGrid", y="SpatialPolygons"
# x="SpatialGridDataFrame", y="SpatialPoints"
# x="SpatialGridDataFrame", y="SpatialPolygons"
# x="SpatialPixels", y="SpatialPoints"
# x="SpatialPixelsDataFrame", y="SpatialPoints"
# x="SpatialPoints", y="SpatialPolygons"
# x="SpatialPointsDataFrame", y="SpatialPolygons"
# x="SpatialPolygons", y="SpatialGrid"
# x="SpatialPolygons", y="SpatialPoints"
getMethod("overlay", signature=c("SpatialGrid", "SpatialPoints"))
In your example, it would be:
getMethod("Table", "GEOData")
You may also be interested in how to get the help documentation for S4 methods, which has an equally unusual invocation required:
method?Table("GEOData")
Generally, with S4, you will need:
- the function name
- the class (signature) of the objects it is for
If you are lost as to the latter:
class(object)
will return the class, and you can also do:
showMethods("Table")
to show all currently available methods. Alternatively, I find I often use:
findMethods("Table")
The reason is that findMethods returns a list of all the methods for a particular function. Classes can have long names, and I often mistype or miscapitalize them, so as a quick hack findMethods("functionname") is handy. Of course, it can also bite you for generic functions with many methods, as the printed list may be quite long.
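If you do not have GEOquery or sp installed, the whole workflow can be tried on a toy S4 class. The sketch below is self-contained; the class Dataset and the generic asTable are invented for this example and have nothing to do with GEOquery's own classes.

```r
library(methods)

# A hypothetical S4 class wrapping a data frame
setClass("Dataset", representation(values = "data.frame"))

# A hypothetical generic, analogous in spirit to Table
setGeneric("asTable", function(object) standardGeneric("asTable"))

# One method, for signature "Dataset"
setMethod("asTable", "Dataset", function(object) object@values)

d <- new("Dataset", values = data.frame(x = 1:3))

# Step 1: find out the class of the object you have
class(d)

# Step 2: list the methods defined for the generic
showMethods("asTable")

# Step 3: fetch the method for that signature and print its source
m <- getMethod("asTable", "Dataset")
m

# Calling getMethod("asTable") with no signature fails in the same way
# as in the question, because no method has an empty signature
```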

Extracting data from an ANOVA object that is based on the data in R

I am trying to find a less convoluted way to extract data from an aov object. Suppose I have a dataset a as shown below, and I ran an ANOVA on the data, resulting in an object named a.model. I tried to locate the data by using str(a.model), but haven't been able to find them. Since I know how to extract data from lm objects, what I did was use lm(a.model)$model$score, which works. But is it possible to extract data directly from a.model without first converting the aov object to an lm object? I guess this is more out of curiosity than anything, because the "extra" conversion step is not that much more work.
a=data.frame(factor1 = rep(letters[1:2], each=10),
factor2 = rep(letters[c(1,2,1,2)], each=5),
score=sort(rlnorm(20)))
a.model = aov(score~factor1*factor2, data=a)
The output from aov also has a component called model which contains the data, i.e. a.model$model$score is identical to lm(a.model)$model$score.
The names function is useful:
> names(a.model)
[1] "coefficients" "residuals"
[3] "effects" "rank"
[5] "fitted.values" "assign"
[7] "qr" "df.residual"
[9] "contrasts" "xlevels"
[11] "call" "terms"
[13] "model"
Another way, which is perhaps more convenient and works in more general cases, is to use the functions model.matrix and model.frame, which give the design matrix and the whole model frame used in the formula. In your second example (in the comments) you can use model.frame to get the data.
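Here is a short sketch of both approaches using the data from the question. The set.seed call and the names mf and mm are additions for this example so that the run is reproducible.

```r
set.seed(1)  # added so the random scores are reproducible

a <- data.frame(factor1 = rep(letters[1:2], each = 10),
                factor2 = rep(letters[c(1, 2, 1, 2)], each = 5),
                score = sort(rlnorm(20)))
a.model <- aov(score ~ factor1 * factor2, data = a)

# The model frame: the response and predictors actually used in the fit
mf <- model.frame(a.model)
head(mf)

# Same values as the stored component and as the original column
all.equal(mf$score, a.model$model$score)
all.equal(mf$score, a$score)

# The design matrix: intercept, two main effects and the interaction
mm <- model.matrix(a.model)
dim(mm)
```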

