I want to draw a hexbin plot with ggplot, but with log scale "pretty" breaks for the frequency. Consider
df <- data.frame(a = rnorm(1000))
df$b <- df$a + rnorm(1000)
I used this answer to get pretty breaks on a linear scale:
ggplot(df, aes(a, b)) +
  geom_hex(aes(fill = cut(..value.., breaks = pretty(..value.., n = 10)))) +
  scale_fill_discrete("Frequency")
This works. Now say I want to use log scale pretty breaks. So I used the idea from another answer to define
base_breaks <- function(n = 10) {
  function(x) {
    axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
  }
}
and try to do
ggplot(df, aes(a, b)) +
  geom_hex(aes(fill = cut(..value.., breaks = base_breaks(n = 10)(..value..))))
but it is not able to find the function. It says:
Error in cut.default(value, breaks = base_breaks(n = 10)(value)) :
could not find function "base_breaks"
Even though base_breaks is defined:
> base_breaks(n=10)(c(1:1000))
[1] 1 5 10 50 100 500 1000
How can I make my function visible in whatever environment ggplot calls it from? I even defined it as a global variable with
base_breaks <<- function(n = 10) {
  function(x) {
    axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
  }
}
but I still get the same error.
I am not sure about this, but you could try simplifying the function like so:
base_breaks <<- function(n = 10, x) {
  axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
}
Maybe the problem is that you have a function whose result is another function, and that is what causes the error. With this approach you would get the values more directly. Check it out!
I can't check it myself, since I get an error object 'value' not found...
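Not something I can verify here either, but one avenue worth trying: recent ggplot2 (>= 3.3) replaced the ..value.. syntax with after_stat(), whose expressions are evaluated in the plot's environment, where user-defined functions are visible; note that in those versions geom_hex's stat computes count rather than value. A sketch under those assumptions:
library(ggplot2)

df <- data.frame(a = rnorm(1000))
df$b <- df$a + rnorm(1000)

base_breaks <- function(n = 10) {
  function(x) {
    axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
  }
}

# after_stat() wraps the whole expression, so base_breaks() is looked up
# in the plot's environment rather than inside ggplot's internals
ggplot(df, aes(a, b)) +
  geom_hex(aes(fill = after_stat(cut(count, breaks = base_breaks(n = 10)(count))))) +
  scale_fill_discrete("Frequency")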
Related
When I try to debug a certain function (itself defined within the function NbCluster), I get a "could not find function" error. I have checked, and the function in question is definitely loaded when debug is called.
> data("USArrests")
> arrests <- scale(USArrests)
> source("NbCluster_copy.R")
> NbCluster_copy(data = arrests, diss = NULL, distance = "euclidean", min.nc = 2, max.nc = 12,
+ method = "ward.D2", index = "gap", alphaBeale = 0.1)
[1] "Indice.Gap exists"
Error in debug(fun = "Indice.Gap") : could not find function "Indice.Gap"
And the issue does not happen if I manually step through the function (by selecting and running lines instead of calling the function).
I tried making a minimal example, but was unable to, so I don't think it is the nested functions that are the problem.
### This works as expected: when I run wrapper(), debug() is called from within the function:
wrapper <- function(x){
  fun1 <- function(x){
    fun0 <- function(x){
      y = x + 1
      return(y)
    }
    debug(fun0)
    y = fun0(x) * 2
    return(y)
  }
  fun1(x)
}
> wrapper(2)
debugging in: fun0(x)
debug at #3: {
y = x + 1
return(y)
}
Browse[2]>
debug at #4: y = x + 1
Browse[2]>
debug at #5: return(y)
Browse[2]>
exiting from: fun0(x)
[1] 6
This is the part I added into the NbClust function.
if (exists("Indice.Gap")) {
  print("Indice.Gap exists")
}
debug(fun = "Indice.Gap")
right before the first call of Indice.Gap:
resultSGAP <- Indice.Gap(x = jeu, clall = clall,
                         reference.distribution = "unif", B = 10, method = "ward.D2",
                         d = NULL, centrotypes = "centroids")
I only made very minor changes besides the one shown above, but if you want to look at the whole function, my copy is here: https://pastebin.com/wxKKDbHy
Just remove the quotes in debug and it should work:
debug(Indice.Gap)
should do the trick.
outer_fun <- function() {
  inner_fun <- function() 1
  ## does not work
  # debug("inner_fun")
  ## works
  debug(inner_fun)
  inner_fun()
}
outer_fun()
Funnily enough, at the top level you can provide the function name as a string:
debug("outer_fun") # works
debug(outer_fun) # works
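If you only have the name as a string (for instance, built with paste0()), a sketch of a workaround is to resolve it with get() first and pass the resulting function object to debug():
fun_name <- "Indice.Gap"                    # the name held as a string
if (exists(fun_name)) debug(get(fun_name))  # resolve to the function object, then debug it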
I want to plot this function in R:
My code:
lambda <- function(i) {
  m <- 50
  j = max(1, i + m)
  n <- 27090
  ((2*m + 1)^(-1)) *
    for (i in j:min(n, i + m)) {
      sum(fires_2009_2015$Wi)[j]
    }
}
plot(lambda)
However, when I run the code, I get an error:
"Error in curve(expr = x, from = from, to = to, xlim = xlim, ylab = ylab, :
'expr' did not evaluate to an object of length 'n' "
The problem is that plot(lambda) will call the function lambda() with a vector argument for i, and your definition of lambda() won't handle that properly. Because of your for loop, it looks a little tricky to make it handle vector arguments efficiently, but there's always the brute force method:
lambda <- Vectorize(lambda)
After this your plot(lambda) should do something. It's probably not what you want, because it will default to a range from 0 to 1, but you can do things like
plot(lambda, from = 0, to = 20)
to change the default range.
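To make that concrete, here is a minimal, self-contained sketch of the Vectorize() approach, with a toy scalar function standing in for lambda() (fires_2009_2015 isn't available here):
# Toy scalar stand-in for lambda(); it only handles one value of i at a time
lambda_scalar <- function(i) {
  m <- 5
  mean(sin(seq(i - m, i + m)))      # hypothetical windowed average
}
lambda <- Vectorize(lambda_scalar)  # now lambda(c(1, 2, 3)) returns three values
plot(lambda, from = 0, to = 20)     # curve() receives a vector-capable function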
I have a question about converting a variable name into a string to use as an x-axis name.
I tried to apply deparse(substitute(input)), but unfortunately it doesn't work well when I call this function within another function.
plot_CI <- function(input){
  nm <- deparse(substitute(input))
  if (substring(nm, 1, 1) == 'u') {
    prior <- 'uniform'
  } else if (substring(nm, 1, 1) == 'l') {
    prior <- 'logit_Normal'
  } else {
    prior <- paste(strsplit(nm, "_")[[1]][1:2], collapse = "_")
  }
  plot <- ggplot(temp_data, aes(x = x, y = mean)) +
    geom_point(size = 2) +
    geom_errorbar(aes(ymax = high, ymin = low)) +
    geom_hline(yintercept = true_value, col = 'blue') +
    labs(x = prior, y = 'value')
  return(plot)
}
sen_plot <- function(variable){
  # variable <- deparse(substitute(var))
  file_name <- paste0('C:/Users/Qiangsuper/Dropbox/Papers/1/plot/sensitivity_', variable, '.png')
  png(filename = file_name, width = 1000, height = 400)
  p1 <- plot_CI(eval(parse(text = paste0('uniform_', variable))))
  p2 <- plot_CI(eval(parse(text = paste0('logitN_', variable))))
  multiplot(p1, p2, cols = 2)
  dev.off()
}
for (i in c("beta_1", "beta_2", "beta_3", "phi", "p", "delta")) {
  sen_plot(i)
}
I expected 'uniform' as the X-axis name; however, I only received eval(parse(text = paste0('uniform_', variable))) as the label.
Thank you very much for your help.
UPDATE ABOUT QUESTION:
I think I should make the question more concise. Here is an easily-understood scenario.
uniform_beta_1 is a data frame or data table which stores my results. I am trying to develop an automatic plotting routine that identifies which prior distribution I applied and names the X-axis after that prior distribution. For uniform_beta_1, the prior distribution is uniform, so the X-axis's name should be uniform. Here is what I tried:
input <- uniform_beta_1
nm <- deparse(substitute(input))
Then I apply the substring command to determine the prior distribution. However, in this case nm will be 'input' rather than 'uniform_beta_1'. I am wondering if there is any way to recover 'uniform_beta_1'.
Thank you very much for your help.
The final solution to my question was to create a list at the very beginning and store all these strings in it, which avoids the name-to-string transformation inside the function.
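For illustration, a minimal sketch of that list-based idea: keep the names as strings and fetch the objects with get(), so nothing ever needs to be recovered via substitute() (toy data, hypothetical function name):
# Toy stand-in for one of the real result tables
uniform_beta_1 <- data.frame(x = 1:3, mean = rnorm(3))

plot_CI_by_name <- function(nm) {
  input <- get(nm)  # look the object up by its string name
  prior <- if (substring(nm, 1, 1) == "u") "uniform" else "logit_Normal"
  # ... build the ggplot from `input` and use `prior` as the x-axis label
  prior
}

plot_CI_by_name("uniform_beta_1")  # returns "uniform"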
This question differs from my original; it adheres more closely to a minimal reproducible example and incorporates a recommendation by be_green against silently loading entire libraries within the context of a function.
The outer function starts by defining a number of cases, default values, and a list of any case exceptions. The inner function assembles each case by using the default values in a computation unless exceptions are defined. Finally, the outer function assembles these cases into a data frame.
Here is the function:
outerfun <- function(cases, var_default, exceptions = list()){
  # Inner function to create a case
  innerfun <- function(var = var_default) { # Case
    result = var
    return(result)
  }
  # Combine cases
  datlist <- list()
  for (case in 1:cases){
    datlist[[paste0("X", case)]] <- do.call(innerfun, as.list(exceptions[[paste0("X", case)]]))
  }
  casedata <- do.call(dplyr::data_frame, datlist)
  return(casedata)
}
This function works fine when I define values for the inner function as exceptions:
data <- outerfun(cases = 3, var_default = 10, exceptions = list("X2" = c(var = 14)))
But not when I mix the two:
data <- outerfun(cases = 3, var_default = 10,
                 exceptions = list("X2" = c(var = var_default + 4)))
Being able to mix the two is important, since it makes the function more intuitive and easier to program for a variety of cases.
I think the problem might result from using do.call, and I have seen other threads detailing this issue (having to do with environments and frames), but I haven't been able to find an optimal solution for me. I like do.call since I can pass a list of arguments into a function. I could make the inner function take dots (think: function(...) { }), but then I would have to define every variable instead of relying on the default.
Any help or suggestions you might have would be great.
The problem is that lvl_default is not defined outside the context of the function, and yet you call it as an input to a parameter. Because there is no variable called lvl_default in the global environment, when the function tries to evaluate the parameter exceptions = list(X3 = c(lvl = lvl_default + 10)), it fails to find a variable to evaluate. You are not able to specify parameters by setting them equal to the names of other unevaluated parameters.
Instead, what I would recommend doing is setting a variable outside the function associated with the value you were hoping to pass into lvl_default and then pass it into the function like so:
level <- 1000
data <- genCaseData(n_signals = 3, datestart = "2017-07-01T15:00:00",
n_cycles = 4, period_default = 10, phase_default = 0, ampl_default = 15,
lvl_default = level, exceptions = list(X1= c(lvl=980),
X3 = c(lvl = level + 10)))
Also, as I noted in a comment, I would recommend against silently loading entire libraries within the context of a function. You can end up masking things you didn't mean to, and running into strange errors, because require() doesn't actually throw an error if a library is unavailable. Instead I would reference the functions through pkgname::fncname.
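A purely illustrative sketch of the two styles, reusing the do.call pattern from above:
# Risky: silently attaches dplyr (and whatever it masks); require()
# returns FALSE rather than erroring if the package is missing
f_attach <- function(datlist) {
  require(dplyr)
  do.call(data_frame, datlist)
}

# Safer: reference the function through its namespace; nothing is attached
f_namespaced <- function(datlist) {
  do.call(dplyr::data_frame, datlist)
}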
be_green did solve this first, but I wanted to follow up with what I actually did for my project.
As be_green pointed out, I couldn't call var_default within the exception list since it hadn't yet been defined. I didn't understand this at first, since you can actually set the default of an argument to a variable defined within the function itself:
addfun <- function(x, y = z + x + 2) {
  z = 20
  c(x, y)
}
addfun(x = 20)
[1] 20 42
This is because function arguments in R are lazily evaluated. I thought this gave me a pass to call the function like this:
addfun(x = 10, y = x + z)
Error in addfun(x = 10, y = x + z) : object 'x' not found
If you remove x, it then throws an error for z. So even though the default for y depends on x and z, you can't supply a value for y that uses x or z.
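A short sketch that makes the scoping rule concrete: default arguments are evaluated inside the function's own frame, while supplied arguments are evaluated in the caller's environment (hypothetical functions f and g):
f <- function(a = b) {
  b <- 1  # visible to the default, because defaults evaluate inside f
  a
}
f()       # 1

g <- function(a) a
b <- 99   # visible to the caller, so a supplied argument can use it
g(b + 1)  # 100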
be_green suggested that I pass arguments as strings and then parse them within the function. But I was afraid that others on my team would find the resulting syntax confusing.
Instead, I used the ellipsis (...) and evaluated the ellipsis arguments within my function. I did this using this line of code:
list2env(eval(substitute(alist(...))), envir = as.environment(-1))
Here the eval(substitute(alist(...))) pattern is common; it results in a named list of unevaluated arguments. Due to some other features, it becomes more convenient to evaluate the arguments as objects within the function, and list2env(x, envir = as.environment(-1)) accomplishes this with an additional step: it binds those unevaluated arguments in the function's own frame. Once an argument is used, you still need to evaluate the call explicitly. So if I wanted to change my addfun() above:
addfun <- function(x, ...) {
  z = 20
  list2env(eval(substitute(alist(...))), envir = as.environment(-1))
  c(x, eval(y))
}
addfun(x = 10, y = x + z)
This is a contrived example: I now need to supply y even though it's not an argument of the function. But now I can even re-define z within the function call:
addfun(x = 10, y = z + 2, z = 10)
This is all possible because of non-standard evaluation. There are trade-offs, but in my application, non-standard evaluation let me increase the usability and flexibility of the function while making it more intuitive to use.
Final code:
outerfun <- function(caseIDs, var_default, ...){
  list2env(eval(substitute(alist(...))), envir = as.environment(-1))
  # Inner function to create a case
  innerfun <- function(var = var_default) { # Case
    result = var
    return(result)
  }
  # Combine cases
  datlist <- lapply(caseIDs, function(case) {
    do.call(innerfun, eval(get0(case, ifnotfound = list())))
  })
  names(datlist) <- caseIDs
  casedata <- do.call(dplyr::data_frame, datlist)
  return(casedata)
}
Now both examples work with full functionality:
data <- outerfun(caseIDs = c("X1", "X2", "X3"), var_default = 10,
                 X2 = list(var = 14))
data <- outerfun(caseIDs = c("X1", "X2", "X3"), var_default = 10,
                 X2 = list(var = var_default + 4))
I hope this helps someone else! Enjoy!
I'm running a principal component analysis on a dataset with more than 1000 variables. I'm using RStudio, and when I run summary() to see the cumulative variance of the components, I can only see the last few hundred components. How do I limit the summary to show, say, only the first 100 components?
I tried this and it seems to be working:
l = loadings(prin)
l[,1:100]
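Alternatively, if it's only the variance table you need, a sketch that computes it directly from the sdev element and keeps the first 100 columns (assuming prin is the result of princomp() and has at least 100 components):
vars  <- prin$sdev^2
props <- vars / sum(vars)
rbind(`Standard deviation`     = prin$sdev,
      `Proportion of Variance` = props,
      `Cumulative Proportion`  = cumsum(props))[, 1:100]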
It's pretty easy to modify print.summary.princomp (you can see the original code by typing stats:::print.summary.princomp) to do this:
pcaPrint <- function(x, digits = 3, loadings = x$print.loadings, cutoff = x$cutoff, n, ...)
{
  # Check for a sensible value of n; default to full output
  if (missing(n) || n > length(x$sdev) || n < 1) { n <- length(x$sdev) }
  vars <- x$sdev^2
  vars <- vars / sum(vars)
  cat("Importance of components:\n")
  print(rbind(`Standard deviation` = x$sdev[1:n],
              `Proportion of Variance` = vars[1:n],
              `Cumulative Proportion` = cumsum(vars)[1:n]))
  if (loadings) {
    cat("\nLoadings:\n")
    cx <- format(round(x$loadings, digits = digits))
    cx[abs(x$loadings) < cutoff] <- paste(rep(" ", nchar(cx[1, 1], type = "w")), collapse = "")
    print(cx[, 1:n], quote = FALSE, ...)
  }
  invisible(x)
}
pcaPrint(summary(princomp(USArrests, cor = TRUE), loadings = TRUE, cutoff = 0.2),
         digits = 2, n = 2)
Edited to include a basic check for a sensible value of n. Now that I've done this, I wonder if it isn't worth suggesting to R Core as a permanent addition; it seems simple and like it might be useful.
You can put the loadings in matrix form; save the matrix to a variable and then subset it (à la matrix[, 1:100]) to see the first/middle/last n components. In this example, I've used head(). Each column is a principal component.
head(
  matrix(
    prin$loadings,
    ncol = length(dimnames(prin$loadings)[[2]]),
    nrow = length(dimnames(prin$loadings)[[1]])
  ),
  100)