Pretty log scale breaks with ggplot hex plot

I want to draw a hexbin plot with ggplot, but with log scale "pretty" breaks for the frequency. Consider
df <- data.frame(a = rnorm(1000)); df$b <- df$a + rnorm(1000)
I used this answer to get pretty breaks on a linear scale:
ggplot(df, aes(a, b)) +
  geom_hex(aes(fill = cut(..value.., breaks = pretty(..value.., n = 10)))) +
  scale_fill_discrete("Frequency")
This works. Now say I want pretty breaks on a log scale, so I used the idea from another answer to define
base_breaks <- function(n = 10){
  function(x) {
    axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
  }
}
and tried
ggplot(df, aes(a, b)) +
  geom_hex(aes(fill = cut(..value.., breaks = base_breaks(n = 10)(..value..))))
but it is not able to find the function. It says:
Error in cut.default(value, breaks = base_breaks(n = 10)(value)) :
could not find function "base_breaks"
Even though base_breaks is defined.
> base_breaks(n=10)(c(1:1000))
[1] 1 5 10 50 100 500 1000
How can I make my function visible in whatever environment ggplot calls it from? I even defined it as a global variable with
base_breaks <<- function(n = 10){
  function(x) {
    axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
  }
}
but I still get the same error.

I am not sure about it, but you could try simplifying the function like this:
base_breaks <<- function(n = 10, x){
  axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
}
Maybe the problem is that you have a function whose result is another function, and that could be causing the error. With this approach you would get the values more directly. Check it out!
I can't check it myself, since I get an error object 'value' not found...
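Along the same lines, a minimal sketch (untested) is to inline the base grDevices call directly inside aes(), so that ggplot only has to find axisTicks(), which is always on the search path, rather than your own function:
ggplot(df, aes(a, b)) +
  geom_hex(aes(fill = cut(..value..,
                          breaks = axisTicks(log10(range(..value.., na.rm = TRUE)),
                                             log = TRUE, n = 10)))) +
  scale_fill_discrete("Frequency")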

Related

R debug() "could not find function" even though it exists

When I try to debug a certain function (itself defined within the function NbCluster), I get a "could not find function" error. I have checked, and the function in question is definitely loaded when debug is called.
> data("USArrests")
> arrests <- scale(USArrests)
> source("NbCluster_copy.R")
> NbCluster_copy(data = arrests, diss = NULL, distance = "euclidean", min.nc = 2, max.nc = 12,
+ method = "ward.D2", index = "gap", alphaBeale = 0.1)
[1] "Indice.Gap exists"
Error in debug(fun = "Indice.Gap") : could not find function "Indice.Gap"
And the issue does not happen if I manually step through the function (by selecting and running lines instead of calling the function).
I tried making a minimal example, but was unable to, so I don't think it is the nested functions that are the problem.
### This works as expected; when I run "wrapper", debug is called from within the function:
wrapper <- function(x){
  fun1 <- function(x){
    fun0 <- function(x){
      y = x + 1
      return(y)
    }
    debug(fun0)
    y = fun0(x) * 2
    return(y)
  }
  fun1(x)
}
> wrapper(2)
debugging in: fun0(x)
debug at #3: {
y = x + 1
return(y)
}
Browse[2]>
debug at #4: y = x + 1
Browse[2]>
debug at #5: return(y)
Browse[2]>
exiting from: fun0(x)
[1] 6
This is the part I added into the NbClust function,
if (exists("Indice.Gap")) {
  print("Indice.Gap exists")
}
debug(fun = "Indice.Gap")
right before the first call of Indice.Gap:
resultSGAP <- Indice.Gap(x = jeu, clall = clall,
                         reference.distribution = "unif", B = 10, method = "ward.D2",
                         d = NULL, centrotypes = "centroids")
I only made very minor changes besides the one shown above, but if you want to look at the whole function, my copy is here: https://pastebin.com/wxKKDbHy
Just remove the quotes in debug and it should work:
debug(Indice.Gap)
should do the trick.
outer_fun <- function() {
  inner_fun <- function() 1
  ## does not work
  # debug("inner_fun")
  ## works
  debug(inner_fun)
  inner_fun()
}
outer_fun()
Funnily enough, at the top level you can provide the function name as a string:
debug("outer_fun") # works
debug(outer_fun) # works
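If you genuinely need to refer to the function by a string inside another function (say, because the name is computed at run time), one workaround (a sketch, untested against NbClust itself) is to fetch the function object with get() and pass that to debug():
fname <- "Indice.Gap"
debug(get(fname))  # get() returns the function object itself, so debug() can flag it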

Issues with R's plot: expr did not evaluate to an object of length n

I want to plot this function in R (the formula was given as an image in the original post).
My code:
lambda <- function(i) {
  m <- 50
  j = max(1, i + m)
  n <- 27090
  ((2 * m + 1)^(-1)) *
    for (i in j:min(n, i + m)) {
      sum(fires_2009_2015$Wi)[j]
    }
}
plot(lambda)
However when I run the code, I get an error:
"Error in curve(expr = x, from = from, to = to, xlim = xlim, ylab = ylab, :
'expr' did not evaluate to an object of length 'n' "
The problem is that plot(lambda) will call the function lambda() with a vector argument for i, and your definition of lambda() won't handle that properly. Because of your for loop, it looks a little tricky to make it handle vector arguments efficiently, but there's always the brute force method:
lambda <- Vectorize(lambda)
After this your plot(lambda) should do something. It's probably not what you want, because it will default to a range from 0 to 1, but you can do things like
plot(lambda, from = 0, to = 20)
to change the default range.
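To see the idea in isolation, here is a minimal sketch unrelated to the fires data:
f <- function(i) if (i > 0) 1 else -1  # not vectorized: if() expects a single value
fv <- Vectorize(f)                     # wraps f so it is applied element-wise
plot(fv, from = -2, to = 2)            # plot() can now evaluate fv on a grid of x values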

Variable name to strings, R

I have a question about converting a variable name into a string to use as an x-axis name.
I tried to apply deparse(substitute(input)), but unfortunately it doesn't work well when I call the function from within another function.
plot_CI <- function(input){
  nm <- deparse(substitute(input))
  if (substring(nm, 1, 1) == 'u') {
    prior <- 'uniform'
  } else if (substring(nm, 1, 1) == 'l') {
    prior <- 'logit_Normal'
  } else {
    prior <- paste(strsplit(nm, "_")[[1]][1:2], collapse = "_")
  }
  plot <- ggplot(temp_data, aes(x = x, y = mean)) +
    geom_point(size = 2) +
    geom_errorbar(aes(ymax = high, ymin = low)) +
    geom_hline(yintercept = true_value, col = 'blue') +
    labs(x = prior, y = 'value')
  return(plot)
}
sen_plot <- function(variable){
  # variable <- deparse(substitute(var))
  file_name <- paste0('C:/Users/Qiangsuper/Dropbox/Papers/1/plot/sensitivity_', variable, '.png')
  png(filename = file_name, width = 1000, height = 400)
  p1 <- plot_CI(eval(parse(text = paste0('uniform_', variable))))
  p2 <- plot_CI(eval(parse(text = paste0('logitN_', variable))))
  multiplot(p1, p2, cols = 2)
  dev.off()
}
for (i in c("beta_1", "beta_2", "beta_3", "phi", "p", "delta")) {
  sen_plot(i)
}
I expected 'uniform' as the x-axis name; however, I only received eval(parse(text = paste0('uniform_', variable))).
Thank you very much for your help.
UPDATE ABOUT QUESTION:
I think I should make the question more concise. Here is an easily understood scenario:
uniform_beta_1 is a data frame (or data table) that stores my results. I am trying to develop an automatic plotting routine that identifies which prior distribution I applied and names the x-axis after that prior. For uniform_beta_1, the prior distribution is uniform, so the x-axis name should be uniform. Here is what I tried:
input <- uniform_beta_1
nm <- deparse(substitute(input))
Then I apply the substring command to determine the prior distribution. However, in this case nm returns 'input' rather than 'uniform_beta_1'. I am wondering if there is any way to get back 'uniform_beta_1'.
Thank you very much for your help.
The final solution for my question was to create a list at the very beginning and store all of these strings in it, avoiding the transformation inside the function.
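For reference, a common alternative (a sketch, not the solution adopted above) is to pass the object's name around as a string and fetch the object with get(), so the name never needs to be recovered from a promise:
plot_by_name <- function(nm) {   # hypothetical helper; nm is e.g. "uniform_beta_1"
  input <- get(nm)               # look the data frame up by its name
  prior <- if (substring(nm, 1, 1) == 'u') 'uniform' else 'logit_Normal'
  prior                          # the name is now available for the axis label
}
plot_by_name('uniform_beta_1')   # returns "uniform"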

passing default values from outer function to repeatedly called inner function in R

This question differs from my original; it adheres more closely to a minimal reproducible example and incorporates a recommendation by be_green against silently loading entire libraries within a function.
The outer function starts by defining a number of cases, default values, and a list of any case exceptions. The inner function assembles each case by using the default values in a computation unless exceptions are defined. Finally, the outer function assembles these cases into a data frame.
Here is the function:
outerfun <- function(cases, var_default, exceptions = list()){
  # Inner function to create a case
  innerfun <- function(var = var_default) {
    result = var
    return(result)
  }
  # Combine cases
  datlist <- list()
  for (case in 1:cases) {
    datlist[[paste0("X", case)]] <- do.call(innerfun, as.list(exceptions[[paste0("X", case)]]))
  }
  casedata <- do.call(dplyr::data_frame, datlist)
  return(casedata)
}
This function works fine when I define values for the inner function as exceptions:
data <- outerfun(cases = 3, var_default = 10, exceptions = list("X2" = c(var = 14)))
But not when I mix the two:
data <- outerfun(cases = 3, var_default = 10,
                 exceptions = list("X2" = c(var = var_default + 4)))
Being able to mix the two is important, since it makes the function more intuitive and easier to program for a variety of cases.
I think the problem might result from using do.call, and I have seen other threads detailing this issue (having to do with environments and frames), but I haven't been able to find a solution that works well for me. I like do.call since I can pass a list of arguments into a function. I could rewrite the inner function to take dots (function(...) { }), but then I would have to define every variable instead of relying on the defaults.
Any help or suggestions you might have would be great.
The problem is that lvl_default is not defined outside the context of the function, and yet you use it as an input to a parameter. Because there is no variable called lvl_default in the global environment, when R tries to evaluate the parameter exceptions = list(X3 = c(lvl = lvl_default + 10)), it fails to find a variable to evaluate. You cannot specify parameters by setting them equal to the names of other unevaluated parameters.
Instead, what I would recommend doing is setting a variable outside the function associated with the value you were hoping to pass into lvl_default and then pass it into the function like so:
level <- 1000
data <- genCaseData(n_signals = 3, datestart = "2017-07-01T15:00:00",
                    n_cycles = 4, period_default = 10, phase_default = 0, ampl_default = 15,
                    lvl_default = level, exceptions = list(X1 = c(lvl = 980),
                                                           X3 = c(lvl = level + 10)))
Also as I noted in a comment, I would recommend against silently loading entire libraries within the context of a function. You can end up masking things you didn't mean to, and running into strange errors because the require call doesn't actually throw one if a library is unavailable. Instead I would reference the functions through pkgname::fncname.
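For example (a generic sketch), instead of require(dplyr) followed by a bare data_frame(...), you can write:
casedata <- dplyr::data_frame(x = 1:3)  # explicit namespace; nothing is attached to the search path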
be_green did solve this first, but I wanted to follow-up with what I actually did for my project.
As be_green pointed out, I couldn't call var_default within the exception list since it hadn't yet been defined. I didn't understand this at first since you can actually define the default of an argument to a variable defined within the function itself:
addfun <- function(x, y = z + x + 2) {
z = 20
c(x, y)
}
addfun(x = 20)
[1] 20 42
This is because function arguments in R are lazily evaluated. I thought this gave me a pass to call the function like this:
addfun(x = 10, y = x + z)
Error in addfun(x = 10, y = x + z) : object 'x' not found
If you remove x from the expression, it errors on z instead. So even though the default for y depends on x and z, you can't refer to x or z when calling the function.
be_green suggested that I pass arguments in a string and then parse it within the function. But I was afraid that others on my team would find the resulting syntax confusing.
Instead, I used the ellipsis (...) and evaluated the ellipsis arguments within my function, using this line of code:
list2env(eval(substitute(alist(...))), envir = as.environment(-1))
Here the eval(substitute(alist(...))) pattern is common and results in a named list of the (unevaluated) arguments. Because of some other features of my project, it was more convenient to have the arguments available as objects within the function; list2env(x, envir = as.environment(-1)) accomplishes this with an additional step. Once an argument is used, you still need to explicitly evaluate the call. So if I wanted to change my addfun() above:
addfun <- function(x, ...) {
  z = 20
  list2env(eval(substitute(alist(...))),
           envir = as.environment(-1))
  c(x, eval(y))
}
addfun(x = 10, y = x + z)
This is a contrived example: I now need to supply y even though it's not an argument in the function. But now I can even re-define z within the function call:
addfun(x = 10, y = z + 2, z = 10)
This is all possible because of non-standard evaluation. There can be trade-offs, but in my application non-standard evaluation let me increase the usability and flexibility of the function while making it more intuitive to use.
Final code:
outerfun <- function(caseIDs, var_default, ...){
  list2env(eval(substitute(alist(...))), envir = as.environment(-1))
  # Inner function to create a case
  innerfun <- function(var = var_default) {
    result = var
    return(result)
  }
  # Combine cases
  datlist <- lapply(caseIDs, function(case) {
    do.call(innerfun, eval(get0(case, ifnotfound = list())))
  })
  names(datlist) <- caseIDs
  casedata <- do.call(dplyr::data_frame, datlist)
  return(casedata)
}
Now both examples work with full functionality:
data <- outerfun(caseIDs = c("X1", "X2", "X3"), var_default = 10,
                 X2 = list(var = 14))
data <- outerfun(caseIDs = c("X1", "X2", "X3"), var_default = 10,
                 X2 = list(var = var_default + 4))
I hope this helps someone else! Enjoy!

R - Limit output of summary.princomp

I'm running a principal component analysis on a dataset with more than 1000 variables. I'm using R Studio and when I run the summary to see the cumulative variance of the components, I can only see the last few hundred components. How do I limit the summary to only show, say, the first 100 components?
I tried this and it seems to be working:
l = loadings(prin)
l[,1:100]
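In the same spirit, the variance figures that summary() prints can be recomputed from the sdev component and subset directly (a sketch, assuming prin is your princomp fit with at least 100 components):
vars <- prin$sdev^2 / sum(prin$sdev^2)   # proportion of variance per component
rbind(`Proportion of Variance` = vars,
      `Cumulative Proportion` = cumsum(vars))[, 1:100]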
It's pretty easy to modify print.summary.princomp (you can see the original code by typing stats:::print.summary.princomp) to do this:
pcaPrint <- function(x, digits = 3, loadings = x$print.loadings, cutoff = x$cutoff, n, ...)
{
  # Check for a sensible value of n; default to full output
  if (missing(n) || n > length(x$sdev) || n < 1) { n <- length(x$sdev) }
  vars <- x$sdev^2
  vars <- vars / sum(vars)
  cat("Importance of components:\n")
  print(rbind(`Standard deviation` = x$sdev[1:n],
              `Proportion of Variance` = vars[1:n],
              `Cumulative Proportion` = cumsum(vars)[1:n]))
  if (loadings) {
    cat("\nLoadings:\n")
    cx <- format(round(x$loadings, digits = digits))
    cx[abs(x$loadings) < cutoff] <- paste(rep(" ", nchar(cx[1, 1], type = "w")), collapse = "")
    print(cx[, 1:n], quote = FALSE, ...)
  }
  invisible(x)
}
pcaPrint(summary(princomp(USArrests, cor = TRUE),
                 loadings = TRUE, cutoff = 0.2), digits = 2, n = 2)
Edited to include a basic check for a sensible value of n. Now that I've done this, I wonder whether it's worth suggesting to R Core as a permanent addition; it seems simple and might be useful.
You can put the loadings in matrix form: save the matrix to a variable and then subset it (a la matrix[, 1:100]) to see the first/middle/last n components. In this example I've used head(). Each column is a principal component.
head(
  matrix(
    prin$loadings,
    ncol = length(dimnames(prin$loadings)[[2]]),
    nrow = length(dimnames(prin$loadings)[[1]])
  ),
  100)
