Source ggplot code and use within ggbio::tracks function - r

I need to use the ggbio::tracks function to merge ggplots with aligned x-axes.
As the ggplot scripts are complex, they are saved as separate files, which are then sourced to plot.
Here is one complex ggplot example script file, call it test.R, which will then be sourced:
# test.R
ggplot(cars,aes(speed,dist)) + geom_point(col="red")
Now, the problem:
library(ggbio)
library(ggplot2)
# complex ggplot script sourced from test.R
x1 <- source("test.R")
# another complex ggplot script
x2 <- ggplot(cars,aes(speed,dist)) + geom_point(col="green")
# check classes
class(x1)
# [1] "list"
class(x2)
# [1] "gg" "ggplot"
# this works
print(x1)
# this doesn't work within tracks function
tracks(
print(x1),
x2,
heights=c(10,1)
)
Error: Objects of type list not supported by autoplot. Please use qplot() or ggplot() instead.
# below works - Note: x1$value
tracks(
x1$value,
x2,
heights=c(10,1)
)
I am surely missing something very simple. I tried to play with the source() options, but couldn't find a way to avoid using $value or print(). Essentially, I want to be able to run the code below and get the merged plot above:
# ideal code
tracks(
x1,
x2,
heights=c(10,1)
)

Ad-hoc solution: modify your test.R by wrapping its contents in a dummy function, like so:
# test.R
test_ggplot <- function() {
ggplot(cars,aes(speed,dist)) + geom_point(col="red")
}
and then
source("test.R")
x1 <- test_ggplot()
which obviously results in
class(x1)
#[1] "gg" "ggplot"
Honestly, I've never seen xx <- source() used, so I doubt it is advisable. There isn't even a Value section in ?source...
Edit: source calls withVisible, which describes the return value exactly as a list:
This function evaluates an expression, returning it in a two element
list containing its value and a flag showing whether it would
automatically print.

Related

How to get the underling data of a MA plot?

I would like to use the plotMA function of limma.
The example of the documentation works fine:
A <- runif(1000,4,16)
y <- A + matrix(rnorm(1000*3,sd=0.2),1000,3)
status <- rep(c(0,-1,1),c(950,40,10))
y[,1] <- y[,1] + status
plotMA(y, array=1, status=status, values=c(-1,1), hl.col=c("blue","red"))
Now I would like to access the underlying data that is used for the plot as I would like to use the data in a different context, not just the plot. I currently don't see a way to access the data; of course I could implement the method myself and only use the data, but that feels wrong.
Is there a way to access the underlying data used for the MA plot?
Looking at the code of plotMA we see that several variables are created and used for plotting. These variables are not returned however.
You could now copy and paste the function to write your own version, which plots and returns the data. This is, however, error-prone: if a new version of the function is released, you may be relying on old code.
So what you can do instead is use trace to insert arbitrary code into plotMA, notably some code which stores the data in your global environment. I illustrate the idea with a toy example:
f <- function(x) {
y <- x + rnorm(length(x))
plot(x, y)
invisible()
}
If we would like to access y from outside this function, we could do something like this:
trace(f, exit = quote(my_y <<- y))
# [1] "f"
ls()
# [1] "f"
f(1:10)
# Tracing f(1:10) on exit
ls()
# [1] "f" "my_y"
And now we can access my_y.
What you should do:
Look at the code of plotMA
Identify which part of the data you need (e.g. x, y and sel)
Use trace(plotMA, exit = quote({my_data <<- list(x, y, sel)}), where = asNamespace("limma"))
Run plotMA
Access the data via my_data
Note: Check out ?trace to fully understand its possibilities. In particular, if you want to inject your code not at the end (exit) but at another position (maybe because intermediate variables are overwritten and you need the first results), you would need to use the at parameter of trace.
Update
Maybe the easiest is to get a full dump of all local variables defined in the function:
trace("plotMA", exit = quote(var_dump <<- mget(ls())), where = asNamespace("limma"))
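The same dump-everything idea can be tried on a toy function first, without limma. This is only a sketch: f and var_dump are illustrative names, print = FALSE merely silences the "Tracing ..." message, and untrace() removes the injected code again once the data has been captured.

```r
# Toy function whose local variable y is not returned
f <- function(x) {
  y <- x + 1
  invisible(NULL)
}

# On exit, copy all local variables of f into the global variable var_dump
trace(f, exit = quote(var_dump <<- mget(ls())), print = FALSE)
f(1:3)
untrace(f)        # restore the original, untraced function

names(var_dump)   # names of the captured local variables
var_dump$y        # 2 3 4
```

For plotMA the untrace call would presumably need the same where = asNamespace("limma") argument as the trace call.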

Make sure that R functions don't use global variables

I'm writing some code in R and have around 600 lines of functions right now and want to know if there is an easy way to check, if any of my functions is using global variables (which I DON'T want).
For example it could give me an error if sourcing this code:
example_fun<-function(x){
y=x*c
return(y)
}
x=2
c=2
y=example_fun(x)
WARNING: Variable c is accessed from global workspace!
Solution to the problem with the help of @Hugh:
install.packages("codetools")
library("codetools")
x = as.character(lsf.str())
which_global=list()
for (i in seq_along(x)){
which_global[[x[i]]] = codetools::findGlobals(get(x[i]), merge = FALSE)$variables
}
Results will look like this:
> which_global
$arrange_vars
character(0)
$cal_flood_curve
[1] "..count.." "FI" "FI_new"
$create_Flood_CuRve
[1] "y"
$dens_GEV
character(0)
...
For a given function like example_fun, you can use package codetools:
codetools::findGlobals(example_fun, merge = FALSE)$variables
#> [1] "c"
To collect all functions see Is there a way to get a vector with the name of all functions that one could use in R?
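As a sketch, the per-function call can be wrapped so that it runs over a named list of functions. The helper name free_variables and the second function clean_fun are made up for illustration; only findGlobals comes from codetools.

```r
library(codetools)

example_fun <- function(x) {
  y <- x * c   # 'c' is not defined inside the function
  y
}
clean_fun <- function(x) x + 1

# Hypothetical helper: free (non-local) variables used by each function
free_variables <- function(funs) {
  lapply(funs, function(f) findGlobals(f, merge = FALSE)$variables)
}

report <- free_variables(list(example_fun = example_fun, clean_fun = clean_fun))
report$example_fun   # "c"
report$clean_fun     # character(0)
```

Note that this is a static check: it reports c even though calling example_fun would not fail at run time, because R would still find some object named c (for instance the base function).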
What about emptying your global environment and running the function? If an object from the global environment were to be used in the function, you would get an error, e.g.
V <- 100
my.fct <- function(x){return(x*V)}
> my.fct(1)
[1] 100
#### clearing global environment & re-running my.fct <- function... ####
> my.fct(1)
Error in my.fct(1) : object 'V' not found
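A variation on this idea that avoids wiping the workspace, offered as a sketch rather than as part of the answer above: give a copy of the function an environment whose parent is baseenv(), so that lookups skip the global environment. The name isolated is illustrative.

```r
V <- 100
my.fct <- function(x) x * V

# Copy the function and cut it off from the global environment
isolated <- my.fct
environment(isolated) <- new.env(parent = baseenv())

my.fct(1)   # 100
res <- tryCatch(isolated(1), error = function(e) conditionMessage(e))
res         # an error message saying that 'V' was not found
```

One caveat under this setup: functions from other attached packages (stats, utils, ...) are also invisible to the isolated copy, so it only suits functions that rely on base R.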

How to open an R data file in R window

I have some data in R that I intend to analyze. However, the file is not displaying the data. Instead, it only shows the name of a variable in the data. The following is the procedure I used to load the data and the output produced.
load("C:\Users\user\AppData\Local\Temp\1_29_923-Macdonell.RData")
data=load("C:\Users\user\AppData\Local\Temp\1_29_923-Macdonell.RData")
data
[1] "HeightFinger"
How do I get to view the data?
If you read ?load, it says that the return value of load is:
A character vector of the names of objects created, invisibly.
This suggests (but admittedly does not state) that the true work of the load command is by side-effect, in that it inserts the objects into an environment (defaulting to the current environment, often but not always .GlobalEnv). You should immediately have access to them from where you called load(...).
For instance, if I can guess at variables you might have in your rda file:
x
# Error: object 'x' not found
# either one of these on windows, NOT BOTH
dat = load("C:\\Users\\user\\AppData\\Local\\Temp\\1_29_923-Macdonell.RData")
dat = load("C:/Users/user/AppData/Local/Temp/1_29_923-Macdonell.RData")
dat
# [1] "x" "y" "z"
x
# [1] 42
If you want them to be not stored in the current environment, you can set up an environment to place them in. (I use parent=emptyenv(), but that's not strictly required. There are some minor ramifications to not including that option, none of them earth-shattering.)
myenv <- new.env(parent = emptyenv())
dat = load("C:/Users/user/AppData/Local/Temp/1_29_923-Macdonell.RData",
envir = myenv)
dat
# [1] "x" "y" "z"
x
# Error: object 'x' not found
ls(envir = myenv)
# [1] "x" "y" "z"
From here you can get at your data in any number of ways:
ls.str(myenv) # similar in concept to str() but for environments
# x : num 42
# y : num 1
# z : num 2
myenv$x
# [1] 42
get("x", envir = myenv)
# [1] 42
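The environment approach can be folded into a small helper that returns the loaded objects as a named list. This is a sketch: load_as_list is a hypothetical name, and the file is a temporary one created just for the demonstration.

```r
# Hypothetical helper: load an .RData file without touching the
# calling environment, and return its objects as a named list.
load_as_list <- function(path) {
  env <- new.env(parent = emptyenv())
  nms <- load(path, envir = env)
  mget(nms, envir = env)
}

# Demonstration with a temporary file
path <- tempfile(fileext = ".RData")
x <- 42
y <- 1
save(x, y, file = path)

objs <- load_as_list(path)
names(objs)   # the object names stored in the file
objs$x        # 42
```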
Side note:
You may have noticed that I used dat as my variable name instead of data. Though you are certainly allowed to use that, it can bite you if you use variable names that match existing variables or functions. For instance, all of your code will work just fine as long as you load your data. If, however, you run some of your code without pre-loading your objects into your data variable, you'll likely get an error such as:
mean(data$x)
# Error in data$x : object of type 'closure' is not subsettable
That error message is not immediately self-evident. The problem is that if not previously defined as in your question, then data here refers to the function data. In programming terms, a closure is a special type of function, so the error really should have said:
# Error in data$x : object of type 'function' is not subsettable
meaning that though dat can be subsetted and dat$x means something, you cannot use the $ subset method on a function itself. (You can't do mean$x when referring to the mean function, for example.) Regardless, even though this here-modified error message is less confusing, it is still not clearly telling you what/where the problem is located.
Because of this, many seasoned programmers will suggest you use unique variable names (perhaps more than just x :-). If you use my suggestion and name it dat instead, then the mistake of not preloading your data will instead error with:
mean(dat$x)
# Error in mean(dat$x) : object 'dat' not found
which is a lot more meaningful and easier to troubleshoot.
There are two ways to save R objects, and you've got them mixed up. In the first way, you save() any collection of objects in an environment to a file. When you load() that file, those objects are re-created with their original names in your current environment. This is how R saves and restores workspaces.
The second way stores (serializes) a single R object into a file with the saveRDS() function, and recreates it in your environment with the readRDS() function. If you don't assign the results of readRDS(), it'll just print to your screen and drift away.
Examples below:
# Make a simple dataframe
testdf <- data.frame(x = 1:10,
y = rnorm(10))
# Save it out using the save() function
savedir <- tempdir()
savepath <- file.path(savedir, "saved.Rdata")
save(testdf, file = savepath)
# Delete it
rm(testdf)
# Load without assigning - and it's back in your environment
load(savepath)
testdf
# But if you assign the results of load, you just get the name of the object
wrong <- load(savepath)
wrong
# Compare with the RDS:
rds_path <- file.path(savedir, "testdf.rds")
saveRDS(testdf, file = rds_path)
rm(testdf)
testdf <- readRDS(file = rds_path)
testdf
Why the two different approaches? The save()-environment approach is good for creating a checkpoint of your entire environment that you can restore later - that's what R uses it for - but that's about it. It's too easy for such an environment to get cluttered, and if an object you load() has the same name as an object in your current environment, it will overwrite that object:
testdf$z <- "blah"
load(savepath)
testdf # testdf$z is gone
The RDS method lets you assign the name on read, as you're looking to do here. It's a little more annoying to save multiple objects, sure, but you probably shouldn't be saving objects very often anyway - recreating objects from scratch is the best way to ensure that your R code does what you think it does.
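If several objects do need to travel together, one common workaround is to bundle them in a named list and serialize that single list. A minimal sketch, with made-up object names and a temporary file:

```r
# Bundle several objects into one named list
results <- list(testdf = data.frame(x = 1:3), note = "run 1")

rds_path <- tempfile(fileext = ".rds")
saveRDS(results, file = rds_path)

# The name is chosen on read, and nothing else in the workspace is touched
restored <- readRDS(rds_path)
names(restored)   # "testdf" "note"
restored$note     # "run 1"
```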

Why doesn't rm(ls()) work and "list" is needed?

I just got started learning R with the "datacamp" site and I ran into a syntax misunderstanding at the beginning.
It says that rm(list = ls()) is a very useful command to clear everything from your workspace but I don't understand what list = is for.
a. They haven't yet taught me the meaning of = in R and I didn't find an explanation in the documentation. Is = like <-? What's the difference?
b. If the input of rm() can be a list of variables names, and the output of ls() is a list of var names, why can't I just use rm(ls())?
Passing arguments by position vs name
The = symbol plays a special role in naming arguments to a function call.
Consider two essentially identical functions:
f <- function(..., y=3) (2+sum(...))^y
g <- function(y=3, ...) (2+sum(...))^y
If y= is not named, the results are generally different:
f(y=5) # 32
g(y=5) # 32
f(5) # 343
g(5) # 32
rm is like f -- type ?rm to see -- so if you want to call rm(list = ls()), write it out in full.
Representing object names
In most of R, if you write f(g()), evaluation flows naturally:
g() is evaluated to 8 and substituted into f(g()) for f(8)
f(8) is evaluated to 1000
rm breaks this pattern in its unnamed ... arguments, which basically just exist for interactive use. Only manually typed variable names are allowed.† As a result, rm(ls()) won't run.
Hadley Wickham provides another nice example:
ggplot2 <- "plyr"
library(ggplot2) # loads ggplot2, not plyr!
† Okay, you can use the ... without manually typed names, like
do.call(library, as.list(ggplot2)) # loads plyr!
but don't mess with that unless you know what you're doing.
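Because list= takes an ordinary character vector, it can be computed. The sketch below removes everything except one object; it works inside a scratch environment e (an illustrative name) so that running it does not clear your own workspace.

```r
# Scratch environment with three objects
e <- new.env()
assign("a", 1, envir = e)
assign("b", 2, envir = e)
assign("keep", 3, envir = e)

# list= accepts any character vector of names, here "everything but keep"
rm(list = setdiff(ls(e), "keep"), envir = e)
ls(e)   # "keep"
```

At the top level the same pattern would read rm(list = setdiff(ls(), "keep")).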

How to retrieve formals of a primitive function?

For the moment, at least, this is an exercise in learning for me, so the actual functions or their complexity is not the issue. Suppose I write a function whose argument list includes some input variables and a function name, passed as a string. This function then calculates some variables internally and "decides" how to feed them to the function name I've passed in.
For nonprimitive functions, I can do the following (for this example, assume none of my funcname functions have any arguments other than at most (x,y,z). If they did, I'd have to write some code to search for matching names(formals(get(funcname))) so as not to delete the other arguments):
foo <- function (a,b,funcname) {
x <- 2*a
y <- a+3*b
z <- -b
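# formals(get(funcname)) <- ... is not a valid assignment target,
# so modify a local copy of the function instead
fn <- get(funcname)
formals(fn) <- list(x=x, y=y, z=z)
bar <- fn()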
return(bar)
}
And the nice thing is that the function funcname will execute without error even if it doesn't use x, y or z (so long as there are no other args that don't have defaults).
The problem with "primitive" functions is I don't know any way to find or modify their formals. Other than writing a wrapper, e.g. foosin <-function(x) sin(x), is there a way to set up my foo function to work with both primitive and nonprimitive function names as input arguments?
formals(args(FUN)) can be used to get the formals of a primitive function.
You could add an if statement to your existing function.
> formals(sum)
# NULL
> foo2 <- function(x) {
if(is.primitive(x)) formals(args(x)) else formals(x)
## formals(if(is.primitive(x)) args(x) else x) is another option
}
> foo2(sum)
# $...
#
#
# $na.rm
# [1] FALSE
#
> foo2(with)
# $data
#
#
# $expr
#
#
# $...
Building on Richard S' response, I ended up doing the following. Posted just in case anyone else ever tries to do things as weird as I do.
EDIT: I think more type-checking needs to be done. It's possible that coleqn could be the name of an object, in which case get(coleqn) will return some data. I probably need to add an if(is.function(rab)) right after the if(!is.null(rab)). (Of course, given that I wrote the function for my own needs, if I was stupid enough to pass an object, I deserve what I get :-) ).
# "coleqn" is the input argument, which is a string that could be either a function
# name or an expression.
rab<-tryCatch(get(coleqn),error=function(x) {} )
#oops, rab can easily be neither NULL nor a closure. Damn.
if(!is.null(rab)) {
# I believe this means it must be a function
# thanks to Richard Scriven of SO for this fix to handle primitives
# we are not allowed to redefine primitive's formals.
qq <- list(x=x,y=y,z=z)
# matchup the actual formals names
# by building a list of valid arguments to pass to do.call
# keep only the values whose names are formals of the function
argnames<-names(formals(args(coleqn)))
arglist<-list()
for(nm in names(qq) )
if(nm %in% argnames) arglist[[nm]]<-qq[[nm]]
colvar<- do.call(coleqn,arglist)
} else {
# the input is just an expression (string), not a function
colvar <- eval(parse(text=coleqn))
}
The result is an object generated either by the expression or the function just created, using variables internal to the main function (which is not shown in this snippet)
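The argument-matching step can also be written as a small standalone helper. This is a sketch under the same assumptions as the snippet above (candidate values named x, y, z); call_with_matching is a hypothetical name, and match.fun is used here in place of get so that only functions are looked up.

```r
# Hypothetical helper: call a function, given by name, with only those
# candidate values whose names appear among its formals.
call_with_matching <- function(funcname, candidates) {
  f <- match.fun(funcname)
  # args() exposes the formals of a primitive; closures have them directly
  argnames <- names(formals(if (is.primitive(f)) args(f) else f))
  do.call(f, candidates[intersect(names(candidates), argnames)])
}

vals <- list(x = 0, y = 1, z = 2)
call_with_matching("sin", vals)     # primitive, uses only x: sin(0) is 0
call_with_matching("atan2", vals)   # closure, uses y and x: atan2(1, 0) is pi/2
```

For primitives where args() returns NULL the list of names is empty, so the function is called with no arguments; such cases would need separate handling.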

Resources