How to get the underlying data of an MA plot?

I would like to use the plotMA function of limma.
The example of the documentation works fine:
A <- runif(1000,4,16)
y <- A + matrix(rnorm(1000*3,sd=0.2),1000,3)
status <- rep(c(0,-1,1),c(950,40,10))
y[,1] <- y[,1] + status
plotMA(y, array=1, status=status, values=c(-1,1), hl.col=c("blue","red"))
Now I would like to access the underlying data that is used for the plot as I would like to use the data in a different context, not just the plot. I currently don't see a way to access the data; of course I could implement the method myself and only use the data, but that feels wrong.
Is there a way to access the underlying data used for the MA plot?

Looking at the code of plotMA, we see that several variables are created and used for plotting. These variables are not returned, however.
You could now copy and paste the function to write your own version, which plots and returns the data. This is error-prone, however: if a new version of the function appears, you may be relying on old code.
So what you can do instead is to use trace to insert arbitrary code into plotMA, notably some code which stores the data in your global environment. I illustrate the idea with a toy example:
f <- function(x) {
  y <- x + rnorm(length(x))
  plot(x, y)
  invisible()
}
If we would like to use y in this function, we could do something like this:
trace(f, exit = quote(my_y <<- y))
# [1] "f"
ls()
# [1] "f"
f(1:10)
# Tracing f(1:10) on exit
ls()
# [1] "f" "my_y"
And now we can access my_y.
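Once you have what you need, remember to remove the tracing again (untrace is the standard counterpart to trace):
untrace(f)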
What you should do:
Look at the code of plotMA
Identify which part of the data you need (e.g. x, y and sel)
Use trace(plotMA, exit = quote({my_data <<- list(x, y, sel)}), where = asNamespace("limma"))
Run plotMA
Access the data via my_data
Note: check out ?trace to fully understand its possibilities. In particular, you may want to inject your code not at the end (exit) but at another position (maybe because intermediate variables are overwritten and you need the first results), in which case you would need the at parameter of trace.
Update
Maybe the easiest is to get a full dump of all local variables defined in the function:
trace("plotMA", exit = quote(var_dump <<- mget(ls())), where = asNamespace("limma"))

Related

Passing objects to functions

I am currently working on user-defined functions aimed at modelling empirical data, and I have problems with objects / parameters passed to the function:
bestModel <- function(k=4L, R2=0.994){
  print(k) # here, everything is still fine
  lmX <- mixlm::lm(getLinearModelFunction(k), data)
  best <- mixlm::best.subsets(lmX, nbest=1)
  .
  .
  .
}
At first, everything works as expected, but as soon as I want to pass the parameter k to another user-defined function, getLinearModelFunction(), an error is thrown:
Error in getLinearModelFunction(k) : object 'k' not found
It doesn't help if I assign a new parameter, e.g. l <- k, and try to pass that on. The parameter doesn't seem to be available to the other function. I ran into this problem not only with primitive data types but also with complex structures. On the command line everything works, as long as the objects are in my workspace.
To sum it up: passing parameters works only within that function; calls to other functions from there onwards result in an error. Why? And: what to do about it?
EDIT:
While trying to resolve the problem, it gets really weird. I stripped down all functions:
functionA <- function(data, k){
  lmX <- mixlm::lm(functionB(k), data)
  summary(lmX)
  # best <- mixlm::best.subsets(lmX,nbest=1)
}
functionB <- function(k=4){
  if(k==1){
    return(formula("raw ~ L1"))
  }else if(k==2){
    return(formula("raw ~ L1 + L2"))
  }else if(k==3){
    return(formula("raw ~ L1 + L2 + L3"))
  }else if(k==4){
    return(formula("raw ~ L1 + L2 + L3 + L4"))
  }
}
Let's say we have a data.frame d with the variables raw, L1, L2, L3, L4 ... As long as the # comments out the best line, it works. As soon as it is removed, calling functionA(d, 3) results in
Error in functionB(k) : object 'k' not found
Even though k doesn't play a role in that function, and it worked before that.
Ok, indeed, this was an environment thing. The solution is to get the current environment and to take the object from there:
functionA <- function(data, k){
  e <- environment()
  lmX <- mixlm::lm(functionB(e$k), e$data)
  summary(lmX)
  best <- mixlm::best.subsets(lmX, nbest=1)
}
This is usually not a problem when working directly with R: the objects are then in the global environment. When working with functions, each function has its own environment. I managed to solve this while starting to learn about packaging the code: http://adv-r.had.co.nz/Environments.html
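To make the "each function has its own environment" point concrete, here is a tiny illustration (not from the original question):
f <- function() environment()
e1 <- f()
e2 <- f()
identical(e1, e2)
# [1] FALSE -- each call to f evaluates in a fresh environment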

How to open an R data file in R window

I have some data in R that I intend to analyze. However, the file is not displaying the data; instead, it only shows a variable name. The following is the procedure I used to load the data and the output produced.
load("C:\Users\user\AppData\Local\Temp\1_29_923-Macdonell.RData")
data=load("C:\Users\user\AppData\Local\Temp\1_29_923-Macdonell.RData")
data
[1] "HeightFinger"
How do I get to view the data?
If you read ?load, it says that the return value of load is:
A character vector of the names of objects created, invisibly.
This suggests (but admittedly does not state) that the true work of the load command happens by side-effect: it inserts the objects into an environment (defaulting to the current environment, often but not always .GlobalEnv). You should immediately have access to them from where you called load(...).
For instance, if I can guess at variables you might have in your rda file:
x
# Error: object 'x' not found
# either one of these on windows, NOT BOTH
dat = load("C:\\Users\\user\\AppData\\Local\\Temp\\1_29_923-Macdonell.RData")
dat = load("C:/Users/user/AppData/Local/Temp/1_29_923-Macdonell.RData")
dat
# [1] "x" "y" "z"
x
# [1] 42
If you want them to be not stored in the current environment, you can set up an environment to place them in. (I use parent=emptyenv(), but that's not strictly required. There are some minor ramifications to not including that option, none of them earth-shattering.)
myenv <- new.env(parent = emptyenv())
dat = load("C:/Users/user/AppData/Local/Temp/1_29_923-Macdonell.RData",
           envir = myenv)
dat
# [1] "x" "y" "z"
x
# Error: object 'x' not found
ls(envir = myenv)
# [1] "x" "y" "z"
From here you can get at your data in any number of ways:
ls.str(myenv) # similar in concept to str() but for environments
# x : num 42
# y : num 1
# z : num 2
myenv$x
# [1] 42
get("x", envir = myenv)
# [1] 42
Side note:
You may have noticed that I used dat as my variable name instead of data. Though you are certainly allowed to use that, it can bite you if you use variable names that match existing variables or functions. For instance, all of your code will work just fine as long as you load your data. If, however, you run some of your code without pre-loading your objects into your data variable, you'll likely get an error such as:
mean(data$x)
# Error in data$x : object of type 'closure' is not subsettable
That error message is not immediately self-evident. The problem is that if data has not previously been defined (as it was in your question), then data here refers to the function data. In programming terms, a closure is a special type of function, so the error really should have said:
# Error in data$x : object of type 'function' is not subsettable
meaning that though dat can be subsetted and dat$x means something, you cannot use the $ subsetting method on a function itself. (You can't do mean$x when referring to the mean function, for example.) Regardless, even though this modified error message is less confusing, it still does not clearly tell you what or where the problem is.
Because of this, many seasoned programmers will suggest you use unique variable names (perhaps more than just x :-). If you use my suggestion and name it dat instead, then the mistake of not preloading your data will instead error with:
mean(dat$x)
# Error in mean(dat$x) : object 'dat' not found
which is a lot more meaningful and easier to troubleshoot.
There are two ways to save R objects, and you've got them mixed up. In the first way, you save() any collection of objects in an environment to a file. When you load() that file, those objects are re-created with their original names in your current environment. This is how R saves and restores workspaces.
The second way stores (serializes) a single R object into a file with the saveRDS() function, and recreates it in your environment with the readRDS() function. If you don't assign the results of readRDS(), it'll just print to your screen and drift away.
Examples below:
# Make a simple dataframe
testdf <- data.frame(x = 1:10,
                     y = rnorm(10))
# Save it out using the save() function
savedir <- tempdir()
savepath <- file.path(savedir, "saved.Rdata")
save(testdf, file = savepath)
# Delete it
rm(testdf)
# Load without assigning - and it's back in your environment
load(savepath)
testdf
# But if you assign the results of load, you just get the name of the object
wrong <- load(savepath)
wrong
# Compare with the RDS:
rds_path <- file.path(savedir, "testdf.rds")
saveRDS(testdf, file = rds_path)
rm(testdf)
testdf <- readRDS(file = rds_path)
testdf
Why the two different approaches? The save()-environment approach is good for creating a checkpoint of your entire environment that you can restore later - that's what R uses it for - but that's about it. It's too easy for such an environment to get cluttered, and if an object you load() has the same name as an object in your current environment, it will overwrite that object:
testdf$z <- "blah"
load(savepath)
testdf # testdf$z is gone
The RDS method lets you assign the name on read, as you're looking to do here. It's a little more annoying to save multiple objects, sure, but you probably shouldn't be saving objects very often anyway - recreating objects from scratch is the best way to ensure that your R code does what you think it does.
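If saving several objects in one go is what holds you back, one sketch of a workaround (the names objs and bundle.rds are made up here) is to bundle them into a named list and serialize that single list:
objs <- list(testdf = testdf, meta = "anything else you need")
saveRDS(objs, file = file.path(savedir, "bundle.rds"))
bundle <- readRDS(file.path(savedir, "bundle.rds"))
bundle$testdf  # assign whatever name you like on read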

Source ggplot code and use within ggbio::tracks function

I need to use the ggbio::tracks function to get merged ggplots with aligned x-axes.
As the ggplot scripts are complex, they are saved as separate files which are then sourced for plotting.
Here is one complex ggplot example script file, call it test.R, which will then be sourced:
# test.R
ggplot(cars,aes(speed,dist)) + geom_point(col="red")
Now, the problem:
library(ggbio)
library(ggplot2)
# complex ggplot script sourced from test.R
x1 <- source("test.R")
# another complex ggplot script
x2 <- ggplot(cars,aes(speed,dist)) + geom_point(col="green")
# check classes
class(x1)
# [1] "list"
class(x2)
# [1] "gg" "ggplot"
# this works
print(x1)
# this doesn't work within tracks function
tracks(
  print(x1),
  x2,
  heights=c(10,1)
)
Error: Objects of type list not supported by autoplot. Please use qplot() or ggplot() instead.
# below works - Note: x1$value
tracks(
  x1$value,
  x2,
  heights=c(10,1)
)
I am surely missing something very simple. I tried to play with source() options but couldn't find a way to avoid using $value or print(). Essentially, I want to be able to run the code below and get the merged plot shown above:
# ideal code
tracks(
  x1,
  x2,
  heights=c(10,1)
)
Ad-hoc solution: modify your test.R by wrapping the plot in a dummy function, like so:
# test.R
test_ggplot <- function() {
  ggplot(cars,aes(speed,dist)) + geom_point(col="red")
}
and then
source("test.R")
x1 <- test_ggplot()
which obviously results in
class(x1)
#[1] "gg" "ggplot"
Honestly, I've never seen xx <- source() used, so I doubt it is advisable to do so. There is not even a Value section in ?source...
Edit: source calls withVisible, whose documentation describes the return value exactly as a list:
This function evaluates an expression, returning it in a two element
list containing its value and a flag showing whether it would
automatically print.
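If you want to keep the one-script-per-plot workflow untouched, a small wrapper around source would also hide the $value step (source_plot is a hypothetical helper name, not part of any package):
source_plot <- function(file) source(file)$value
x1 <- source_plot("test.R")
class(x1)
# [1] "gg"     "ggplot"
tracks(x1, x2, heights=c(10,1))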

Pass an object to a function without copying it on change

My question
If an object x is passed to a function f that modifies it, R will create a modified local copy of x within f's environment rather than changing the original object (due to the copy-on-modify principle). However, I have a situation where x is very big and not needed once it has been passed to f, so I want to avoid storing the original copy of x once f is called. Is there a clever way to achieve this?
f is an unknown function to be supplied by a possibly not very clever user.
My current solution
The best I have so far is to wrap x in a function forget that makes a new local reference to x called y, removes the original reference in the workspace, and then passes on the new reference. The problem is that I am not certain it accomplishes what I want, and it only works in globalenv(), which is a deal breaker in my current case.
forget <- function(x){
  y <- x
  # x and y now refer to the same object, which has not yet been copied
  print(tracemem(y))
  rm(list=deparse(substitute(x)), envir=globalenv())
  # The outside reference is now removed, so modifying `y`
  # should no longer result in a copy (other than the
  # intermediate copy produced in the assignment)
  y
}
f <- function(x){
  print(tracemem(x))
  x[2] <- 9000.1
  x
}
Here is an example of calling the above function.
> a <- 1:3
> tracemem(a)
[1] "<0x2ac1028>"
> b <- f(forget(a))
[1] "<0x2ac1028>"
[1] "<0x2ac1028>"
tracemem[0x2ac1028 -> 0x2ac1e78]: f
tracemem[0x2ac1e78 -> 0x308f7a0]: f
> tracemem(b)
[1] "<0x308f7a0>"
> b
[1] 1.0 9000.1 3.0
> a
Error: object 'a' not found
Bottom line
Am I doing what I hope I am doing and is there a better way to do it?
(1) Environments. You can use environments for that:
e <- new.env()
e$x <- 1:3
f <- function(e) with(e, x <- x + 1)
f(e)
e$x
(2) Reference Classes. Since reference classes automatically use environments, you can use those as well:
E <- setRefClass("E", fields = "x",
  methods = list(
    f = function() x <<- x + 1
  )
)
e <- E$new(x = 1:3)
e$f()
e$x
(3) proto objects also use environments:
library(proto)
p <- proto(x = 1:3, f = function(.) with(., x <- x + 1))
p$f()
p$x
I think the easiest approach is to load only the working copy into memory, instead of loading both the original (global environment) and the working copy (function environment). You can sidestep your whole issue by using the 'ff' package to define your 'x' and 'y' data sets as 'ffdf' data frames. As I understand it, 'ffdf' data frames reside on disk, load into memory only as parts of the data frame are needed, and purge when those parts are no longer necessary. This would mean, theoretically, that the data would be loaded into memory to copy into the function environment and then purged after the copy was complete.
I'll admit that I rarely have to use the 'ff' package, and when I do, I usually don't have any issues at all. I'm not checking specific memory usage, though, and my goal is usually just to perform a large calculation across the data. It works, and I don't ask questions.
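For completeness, a rough sketch of what that might look like (assuming ff's as.ffdf constructor; treat this as untested):
library(ff)
big <- as.ffdf(data.frame(v = rnorm(1e6)))  # data backed by a file on disk
f <- function(d) mean(d$v[])  # d$v[] materializes the column only when asked for
f(big)  # only the lightweight ffdf wrapper is passed around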

memory efficient way of passing large objects in R

I have a function that needs to access a variable in its parent environment (scope from which the function is called). The variable is large in terms of memory so I would prefer not to pass it to by value to the function being called. Is there a standard way of doing this other than declaring the variable in the global scope? For example:
g <- function(a, b) {
  # do stuff
}
f <- function(x) {
  y <- 3 # but in my program y is very large
  g(x, y)
}
I would like to access y in g(). So something like this:
g <- function(a) { a + y }
f <- function(x) {
  y <- 3 # but in my program y is very large
  g(x)
}
Is this possible?
Thanks
There is no advantage to "declaring the variable in the global scope", and it may not even be possible in R, depending on what you mean by that. You could certainly use the second form. The action that causes duplicate or even triplicate copies of an object is assignment. You would need to describe in more detail what you are trying to illustrate with the code y <- 3; that would not normally be needed inside a function that merely accesses an object named "y" located in an enclosing frame.
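One caveat worth making explicit: R is lexically scoped, so g looks y up in the environment where g was defined, not where it is called. For the second form to actually find y, g must be defined somewhere y is visible, for example inside f itself. A minimal sketch:
f <- function(x) {
  y <- 3                  # stays in f's frame
  g <- function(a) a + y  # closure: g's enclosing environment is f's frame
  g(x)
}
f(2)
# [1] 5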
Storing variables in a declared environment can sometimes improve access efficiency, but my understanding is that the gain is in speed, because a hash table is used. You access items in an environment the same way you access list elements:
> evn <- new.env()
> evn$a <- rnorm(100000)
> ls(evn)
[1] "a"
> length(evn$a)
[1] 100000
The BigMemory project may offer facilities for this: http://www.bigmemory.org/.
It and Lumley's biglm may help with the large dataset mentioned in the comments.
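As a sketch of why bigmemory can help here (assuming its big.matrix API; untested): big.matrix objects have reference semantics, so passing one to a function does not copy the underlying data:
library(bigmemory)
m <- big.matrix(nrow = 3, ncol = 2, type = "double", init = 0)
f <- function(bm) bm[1, 1] <- 42  # writes to the shared data in place
f(m)
m[1, 1]
# [1] 42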
