How to avoid polluting the current scope (with `library(...)`) in R

As a matter of long-standing policy, I avoid importing names into (aka "polluting") the current scope, and instead I use fully-qualified names when referring to items defined in a different package.
The script below shows that, in R, using qualified names is, in itself, not enough.
#!/usr/bin/env Rscript
set.seed(0)
x <- local({
    x0 <- matrix(rnbinom(80, size = 5, mu = 10), nrow = 20)
    `rownames<-`(rbind(0, c(0, 0, 2, 2), x0),
                 paste("Tag", 1:(nrow(x0) + 2), sep = "."))
})
y <- edgeR::DGEList(counts = x,
                    group = rep(1:2, each = 2),
                    lib.size = 1001:1004)
## library(edgeR)
y[1, 1]
The script fails with
Error in y[1, 1] : incorrect number of dimensions
Execution halted
The script's only crime appears to be not having included the line library(edgeR) somewhere before the failing statement, since the error disappears if one un-comments the commented-out line.
This is voodoo, imho.
Is there a way to avoid the error without polluting the current scope with library(edgeR)?

When you avoid attaching the edgeR package with library(), the `[.DGEList` method never becomes visible on the search path, and that method is what makes y[1, 1] work. If you prefer not to attach edgeR, you'll need to call the extraction method directly:
edgeR::`[.DGEList`(y, 1, 1)
If you don't like the fully qualified syntax, you can bring in the method you need with
`[.DGEList` <- edgeR::`[.DGEList`
Then y[1, 1] will work as expected. But this is another form of pollution and I'm not sure I'd recommend it as a general solution.
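If you want to keep even that assignment out of the global environment, one option is to confine it to a local() block (a sketch; it relies on S3 method lookup finding methods that are visible from the environment where the generic is called, so check it against your R version):
z <- local({
    ## the method is defined only inside this local environment,
    ## so the global workspace stays clean
    `[.DGEList` <- edgeR::`[.DGEList`
    y[1, 1]
})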

Related

Is there a way to call variables from another file in R without having them appear in the workspace?

I have a list of HEX colours that I want to use for my graphs/tables etc in R.
I have written a piece of code that calls these values at the start of the script.
col1 <- '#00573F'
col2 <- '#40816F'
col3 <- '#804B9F'
col4 <- '#C0D5D0'
col5 <- '#A29161'
I then call these values when plotting throughout, for example:
x <- seq(-pi, pi, 0.1)
plot(x, sin(x),
main="The Sine Function",
ylab="sin(x)",
type="l",
col=col1)
This works perfectly.
However, I was wondering if there is a way to store these colour variables within R as a standard set of variables that I don't have to call every time I start a new script?
Also, it would be great if they didn't show up in the Environment as values purely because there are so many of these colours and I have a hard time keeping track of all the other values in there.
Many people have adopted packages as the default way to write R code, precisely to organise things like this.
You can get away with a barebones version, which I'll describe here.
You need an R/ folder; dir.create("R").
This directory should not contain scripts, but rather standalone functions and the like that you have no problem sourcing whenever appropriate.
Inside it you could create a custom_colors function; file.edit("R/custom_colors.R") will open the file in RStudio. Add:
custom_colors <- function(color_id) {
    c(
        col1 = '#00573F',
        col2 = '#40816F',
        col3 = '#804B9F',
        col4 = '#C0D5D0',
        col5 = '#A29161'
    )[color_id]
}
Then, wherever you need it, write source("R/custom_colors.R") to have that single function enter your environment.
You can then call custom_colors("col1") (or custom_colors(1)) instead of col1.
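For example (a quick usage sketch, assuming the file above has been saved as R/custom_colors.R):
source("R/custom_colors.R")
x <- seq(-pi, pi, 0.1)
plot(x, sin(x),
     main = "The Sine Function",
     ylab = "sin(x)",
     type = "l",
     col = custom_colors("col1"))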
A handful of options to consider
Develop an internal package for your color constants
I won't go so far as to write the package here, but packages may contain any R object (not just functions and data). You could develop an internal package to hold your color constants. If your package is named myInternals, you can then call
x <- seq(-pi, pi, 0.1)
plot(x, sin(x),
main="The Sine Function",
ylab="sin(x)",
type="l",
col= myInternals::col1)
If you have multiple people who need access to your constants, this is the path I would take. It's a bit more overhead work, but it separates the constants into their own environment that is relatively easy to access.
Truth be told, I have an internal package where I work now that uses @Mossa's strategy.
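As a rough sketch of what the package side might look like (the myInternals name and file layout are hypothetical, purely for illustration):
## R/colors.R inside the hypothetical myInternals package
col1 <- "#00573F"
col2 <- "#40816F"
col3 <- "#804B9F"
col4 <- "#C0D5D0"
col5 <- "#A29161"
## and in NAMESPACE, export the constants by name:
## export(col1, col2, col3, col4, col5)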
Use 'hidden objects'
If you precede an object's name with a ., it won't show up in the list of items in the Environment pane (assuming you're using RStudio).
Run the following to see the difference:
.col1 <- "#00573F"
# .col1 doesn't show up
ls()
# .col1 does show up
ls(all.names = TRUE)
x <- seq(-pi, pi, 0.1)
plot(x, sin(x),
main="The Sine Function",
ylab="sin(x)",
type="l",
col= .col1)
This is probably the easiest, in my opinion, and what I would do if no one else needed access to my constants.
Use a list
Much like @Mossa's answer, using a list will reduce the number of new objects shown in the environment to just 1.
col_list <- list(col1 = '#00573F',
                 col2 = '#40816F',
                 col3 = '#804B9F',
                 col4 = '#C0D5D0',
                 col5 = '#A29161')
x <- seq(-pi, pi, 0.1)
plot(x, sin(x),
main="The Sine Function",
ylab="sin(x)",
type="l",
col=col_list$col1)
Use an environment
This also only adds one object to the environment, and stores the constants outside of the current environment. Using them isn't much different than using a list, however, so I'm not sure what exactly is gained.
col_env <- new.env()
assign("col1", "#00573F", col_env)
x <- seq(-pi, pi, 0.1)
plot(x, sin(x),
main="The Sine Function",
ylab="sin(x)",
type="l",
col=col_env$col1)
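If assigning them one at a time gets tedious, you can fill the environment in a single step (a sketch reusing the list from above):
col_env <- new.env()
list2env(list(col1 = '#00573F', col2 = '#40816F', col3 = '#804B9F',
              col4 = '#C0D5D0', col5 = '#A29161'), envir = col_env)
ls(col_env)  # the colours live here, not in the global environment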
You can add them to your .Rprofile as a list or a function (as Mossa suggests), which R will run at each startup.
See this post on how to find your .Rprofile.

Need some help writing a function

I'm trying to write a function that takes a few lines of code and allows me to input a single variable. I've got the code below that creates an object using the Surv function (survival package). The second line takes the variable in question, in this case a column listed as Variable_X, and outputs data that can then be visualized using ggsurvplot. The output is a Kaplan-Meier survival curve. What I'd like to do is have a function such that I can type f(Variable_X) and have the output KM curve visualized for whichever column I choose from the data. I want f(y) to output the KM curve as if I had put y where ~Variable_X currently is. I'm new to R and very new to how functions work; I've tried the code below but it obviously doesn't work. I'm working through DataCamp and reading posts, but I'm having a hard time with it. I appreciate any help.
surv_object <- Surv(time = KMeier_DF$Followup_Duration, event = KMeier_DF$Death_Indicator)
fitX <- survfit(surv_object ~ Variable_X, data = KMeier_DF)
ggsurvplot(fitX, data = KMeier_DF, pval = TRUE)
f<- function(x) {
dat<-read.csv("T:/datafile.csv")
KMeier_DF < - dat
surv_object <- Surv(time = KMeier_DF$Followup_Duration, event =
KMeier_DF$Death_Indicator)
fitX<-survfit(surv_object ~ x, data = KMeier_DF)
PlotX<- ggsurvplot(fitX, data = KMeier_DF, pval = TRUE)
return(PlotX)
}
The crux of your problem is actually a tough stumbling block to figure out initially: how to pass variable or dataframe column names into a function. I created some example data. In the example below I supply the function four arguments, one of which is your data. You can see two ways I refer to the columns, using [[ ]] and [ , ], which you can think of as roughly equivalent to using $. Outside of functions they are, but not inside. The print calls are there just to show you the data along the way. If those objects already exist in your global environment, remove them one by one, e.g. rm(surv_object), or clear them all with rm(list = ls()).
library(survival)  # provides Surv() and survfit()
duration <- c(1, 3, 4, 3, 3, 4, 2)
di <- c(1, 1, 0, 0, 0, 0, 1)
color <- c(1, 1, 2, 2, 3, 3, 4)
KMdf <- data.frame(duration, di, color)
testfun <- function(df, varb1, varb2, varb3) {
    surv_object <- Surv(time = df[[varb1]], event = df[, varb2])
    print(surv_object)
    fitX <- survfit(surv_object ~ df[[varb3]], data = df)
    print(fitX)
    # plotx <- ggsurvplot(fitX, data = df, pval = TRUE) # this gives an error that surv_object is not found
    # return(plotx)
}
testfun(KMdf, "duration", "di", "color") # note the quotes; without them you'll get an "object not found" error
And even better, you have run into an even tougher stumbling block: how R handles variables and where it looks for them. From what I can tell, you're hitting what is possibly a bug in ggsurvplot: it looks for variables in the global environment rather than inside the function. They closed the issue, but as far as I can tell it's still there. When you try to run the ggsurvplot line, you'll get the error you would get if you hadn't supplied a variable at all:
Error in eval(inp, data, env) : object 'surv_object' not found.
Hopefully that helps. I'd submit a bug report if I were you.
edit
I was hoping this solution would help, but it doesn't.
testfun <- function(df, varb1, varb2, varb3) {
    surv_object <- Surv(time = df[[varb1]], event = df[, varb2])
    print(surv_object)
    fitX <- survfit(surv_object ~ df[[varb3]], data = df)
    print(fitX)
    attr(fitX[['strata']], "names") <- c("color = 1", "color = 2", "color = 3", "color = 4")
    plotx <- ggsurvplot(fitX, data = df, pval = TRUE) # this gives an error that surv_object is not found
    return(plotx)
}
Error in eval(inp, data, env) : object 'surv_object' not found
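One workaround worth trying (a sketch, not a tested fix for every survminer version) is to build the model formula from the column names, so that everything it references can be found in the data frame itself rather than in the function's environment; survminer's surv_fit() wrapper is also meant to cope better with non-global environments:
library(survival)
library(survminer)
testfun2 <- function(df, time_col, event_col, group_col) {
    # build the formula as text so it refers only to columns of df
    fml <- as.formula(sprintf("Surv(%s, %s) ~ %s", time_col, event_col, group_col))
    fitX <- surv_fit(fml, data = df)
    ggsurvplot(fitX, data = df, pval = TRUE)
}
testfun2(KMdf, "duration", "di", "color")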
This is homework, right?
First, you need to try to run the code before you provide it as an example. Your example has several fatal errors. ggsurvplot() needs either a library call to survminer or to be summoned as follows: survminer::ggsurvplot().
You have defined a function f, but you never used it. In the function definition, you have a wayward space < -. It never would have worked.
I suggest you start by defining a function that calculates the sum of two numbers, or concatenates two strings. Start here or here. Then, you can return to the Kaplan-Meier stuff.
Second, in another class or two, you will need to know the three parts of a function. You will need to understand the scope of a function. You might as well dig into the basics before you start copy-and-pasting.
Third, before you post another question, please read "How to make a great R reproducible example?".
Best of luck.

Extract value from RStudio manipulate package control

Is there a way to extract the current value from a slider control in the manipulate package? For example:
library(manipulate)
xx <- seq(-pi, pi, pi/20)
manipulate(
plot(xx, sin(par.a*xx)),
par.a = slider(-3, 3, step=0.01, initial = 1))
After playing with the slider I would like to get the value of par.a for further calculations without having to look at the control and write it by hand each time.
Figured it out myself.
It is possible to do this with a global variable:
manipulate(
{plot(xx, sin(par.a*xx))
a <<- par.a},
par.a = slider(-3, 3, step=0.01, initial = 1))
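The manipulate expression is re-evaluated on every slider change, so once the plot window is closed, a holds the last slider position and can be used directly, for example:
y2 <- sin(a * xx)  # further calculations with the captured slider value
plot(xx, y2)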
Maybe it will be of assistance to somebody.

tryCatch() apparently ignoring a warning

I'm writing a function that uses kmeans to determine bin widths to convert a continuous measurement (a predicted probability) into an integer (one of 3 bins). I've stumbled upon an edge case in which it's possible for my algorithm to (correctly) predict the same probability for a whole set, and I want to handle that situation. I'm using the rattle package's binning() function in the following way:
library(rattle)  # provides binning()
btsKmeansBin <- function(x, k = 3, default = c(0, 0.3, 0.5, 1)) {
    result <- binning(x, bins = k, method = "kmeans", ordered = TRUE)
    bins <- attr(result, "breaks")
    attr(bins, "names") <- NULL
    bins <- bins[order(bins)]
    bins[1] <- 0
    bins[length(bins)] <- 1
    return(bins)
}
Run this function on x <- c(.5,.5,.5,.5,.5,.5), and you'll get an error at the order(bins) step, because bins will be NULL and therefore not a vector.
Obviously, if x has only one distinct value, kmeans shouldn't work. In this case, I'd like to return the default bin divisions. When this happens, binning issues "Warning: the variable is not considered." So I'd like to use tryCatch to handle this warning, but surrounding the result <- ... line with the following code doesn't work the way I expect:
...
tryCatch({
result <- binning(x, bins = k, method = "kmeans", ordered = T)
}, warning = function(w) {
warn(sprintf("%s. Using default values", w))
return(default)
}, error = function(e) {
stop(e)
})
...
The warning gets printed as though I hadn't used tryCatch, and the code progresses past the return statement and throws the error from order again. I have tried a bunch of variations to no avail. What am I missing, here??
If you look in binning I think you'll find that the "warning" you see is not generated via warning() but with cat(), which is why tryCatch isn't picking it up. The author of binning probably deserves a few lashings with a wet noodle for this oversight. ;) (Or it could be on purpose due to the particular way that rattle works, I'm not sure.)
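A minimal illustration of the difference: tryCatch() intercepts conditions signalled with warning() or stop(), but plain text written by cat() is not a condition at all:
tryCatch(cat("Warning: the variable is not considered.\n"),
         warning = function(w) "caught")  # the text just prints; nothing is caught
tryCatch(warning("a real warning"),
         warning = function(w) "caught")  # returns "caught"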
It appears to return NULL when this happens, so you could simply handle it manually. Not ideal, but possibly the only way to go.
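For example, here is a sketch of handling it manually inside the function from the question (assuming binning() really does come back empty in this case, as described above):
btsKmeansBin <- function(x, k = 3, default = c(0, 0.3, 0.5, 1)) {
    result <- binning(x, bins = k, method = "kmeans", ordered = TRUE)
    bins <- attr(result, "breaks")
    # binning() "warns" via cat() and returns nothing usable when x has a
    # single distinct value, so fall back to the default divisions explicitly
    if (is.null(bins)) {
        warning("binning() could not bin the variable; using default values")
        return(default)
    }
    names(bins) <- NULL
    bins <- sort(bins)
    bins[1] <- 0
    bins[length(bins)] <- 1
    bins
}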

Reinitializing variables in R and having them update globally

I'm not sure how to pose this question with the right lingo, and the related questions weren't about the same thing. I wanted to plot a function and noticed that R wasn't updating the plot after my change to a coefficient.
a <- 2
x <- seq(-1, 1, by=0.1)
y <- 1/(1+exp(-a*x))
plot(x,y)
a <- 4
plot(x,y) # no change
y <- 1/(1+exp(-a*x)) # redefine function
plot(x,y) # now it updates
Just in case I didn't know what I was doing, I followed the syntax on this R basic plotting tutorial. The only difference was the use of = instead of <- for assignment of y = 1/(1+exp(-a*x)). The result was the same.
I've actually never just plotted a function with R, so this was the first time I experienced this. It makes me wonder if I've seen bad results in other areas if re-defined variables aren't propagated to functions or objects initialized with the initial value.
1) Am I doing something wrong and there is a way to have variables sort of dynamically assigned so that functions take into account the current value vs. the value it had when they were created?
2) If not, is there a common way R programmers work around this when tweaking variable assignments and making sure everything else is properly updated?
You are not, in fact, plotting a function. Instead, you are plotting two vectors. Since you haven't updated the values of the vector before calling the next plot, you get two identical plots.
To plot a function directly, you need to use the curve() function:
f <- function(x, a) 1/(1 + exp(-a*x))
Plot:
curve(f(x, 1), -1, 1, 100)
curve(f(x, 4), -1, 1, 100)
R is not Excel, or MathCAD, or any other application that might lead you to believe that changing an object's value will update other vectors that happened to use that value at some time in the past. When you did this
a <- 4
plot(x,y) # no change
There was no change in 'x' or 'y'.
Try this:
curve( 1/(1+exp(-a*x)) )
a <- 10
curve( 1/(1+exp(-a*x)) )
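Equivalently, if you prefer the plot(x, y) style, make y the result of a function call and re-evaluate it whenever the coefficient changes (a small sketch of the same sigmoid):
sigmoid <- function(x, a) 1/(1 + exp(-a*x))
x <- seq(-1, 1, by = 0.1)
plot(x, sigmoid(x, a = 2), type = "l")
plot(x, sigmoid(x, a = 4), type = "l")  # recomputed with the new coefficient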

Resources