makeCacheMatrix <- function(x = matrix()) {
invm <- NULL
set <- function(y) {
x <<- y
invm <<- NULL
}
get <- function() x
setinverse <- function(solve) invm <<- solve()
getinverse <- function() invm
list(set = set, get = get,
setinverse = setinverse,
getinverse = getinverse)
}
cacheSolve <- function(x, ...) {
invm <- x$getinverse()
if(!is.null(invm)) {
message("getting cached matrix")
return(invm)
}
unit <- x$get()
invm <- solve(unit, ...)
x$setinverse(invm)
invm
}
In the test run, I've created a 4/4 matrix which was then stored with the first function. The second function should check if the matrix is cached, if so, print a message and use the stored matrix. I'd like to know what could possibly cause the error displayed below. I have no idea why the data don't get passed to the solve function.
> test <- matrix(rnorm(16, 3), 4, 4)
> test
[,1] [,2] [,3] [,4]
[1,] 2.654912 4.085775 3.1288214 5.059539
[2,] 3.252612 3.403775 0.9990708 1.623138
[3,] 1.705998 3.586488 3.3337772 1.849144
[4,] 2.040830 4.815228 4.1713251 2.294179
> test2 <- makeCacheMatrix(test)
> cacheSolve(test2)
Error in solve.default() : argument "a" is missing, with no default
When you call x$setinverse(invm), you are passing in a matrix object. But in the function setinverse, you are calling its one argument as a function, using solve(). Unfortunately, while solve at that point refers to the object passed originally as invm, R is smart-enough to know that solve() is meant to use a function and therefore is referring to base::solve and not your matrix. If you run base::solve() you will get the same error. So your problem is that your setinverse is wrong.
If you instead assign setinverse <- function(z) invm <<- solve(z), it seems to operate without error. Edit: actually, I think you're just using this as a "setter" function, so it should really be setinverse <- function(solve) invm <<- solve (or function(z) invm <<- z, your call).
Note: I haven't thought through the whole process to know if this is truly what you want to do.
Functions:
makeCacheMatrix <- function(x = matrix()) {
invm <- NULL
set <- function(y) {
x <<- y
invm <<- NULL
}
get <- function() x
setinverse <- function(z) invm <<- z
getinverse <- function() invm
list(set = set, get = get,
setinverse = setinverse,
getinverse = getinverse)
}
# cacheSolve as previously defined
Reproducible running (I suggest you use set.seed next time):
set.seed(42)
test <- matrix(rnorm(16, 3), 4, 4)
test
# [,1] [,2] [,3] [,4]
# [1,] 4.37095845 3.40426832 5.01842371 1.61113930
# [2,] 2.43530183 2.89387548 2.93728590 2.72121123
# [3,] 3.36312841 4.51152200 4.30486965 2.86667866
# [4,] 3.63286260 2.90534096 5.28664539 3.63595040
test2 <- makeCacheMatrix(test)
cacheSolve(test2)
# [,1] [,2] [,3] [,4]
# [1,] 0.819557500 1.961377325 -1.416577665 -0.714220507
# [2,] -0.162978899 -0.332156840 0.713223804 -0.241513973
# [3,] -0.343049934 -1.761166112 0.897525201 0.762466368
# [4,] -0.189839546 0.866424154 -0.459528758 0.073008724
It works! Thank you so much!
What I still don't get though is why the mean() function doesn't need this specification and still can refer to an object. Originally the code went like this:
makeVector <- function(x = numeric()) {
m <- NULL
set <- function(y) {
x <<- y
m <<- NULL
}
get <- function() x
setmean <- function(mean) m <<- mean
getmean <- function() m
list(set = set, get = get,
setmean = setmean,
getmean = getmean)
}
cachemean <- function(x, ...) {
m <- x$getmean()
if(!is.null(m)) {
message("getting cached data")
return(m)
}
data <- x$get()
m <- mean(data, ...)
x$setmean(m)
m
}
I am trying to run this example for the R Programming course on coursEra. However when I try to determine whether or not the matrix is square I get error saying "Error in is.square.matrix(x) : argument x is not a matrix"
My code is below:
library(matrixcalc)
##non-square matrix
NCols <- sample(3:6, 1)
NRows <- sample(2:8, 1)
myMat <- matrix(runif(NCols*NRows), ncol=NCols)
is.square.matrix(myMat)
## functions
makeMatrix <- function(x = matrix()) {
m <- NULL
set <- function(y) {
x <<- y
m <<- NULL
}
get <- function() x
setInv <- if (is.square.matrix(x) == TRUE) {
function(solve) m <<- solve
}
{
function(ginv) m <<- ginv
}
getInv <- function() m
list(x, set = set, get = get,
setInv = setInv,
getInv = getInv)
}
cacheMatrix <- function(x, ...) {
m <- x$getInv()
if(!is.null(m)) {
message("getting cached data")
return(m)
}
data <- x$get()
m <- if (is.square.matrix(x) == TRUE) {
solve(data, ...)
}
{
ginv(data, ...)
}
x$setInv(m)
m
}
## run functions for matrix
notSquare <- makeMatrix(myMat)
cacheMatrix(notSquare)
##check
ginv(myMat)
Then I get the error:
Error in is.square.matrix(x) : argument x is not a matrix
I am a beginner so not sure how to get the sentInv to recognize and check if the matrix is square or not.
Brian
Nevermind. In the makeMatrix function needed to replace x with (x = matrix()) and in the cacheMatrix replace x with (data)
Here's the answer. I just made the function (x) instead of function(x = matrix()) and 'data' is the variable to pull the matrix in the cache function need to have that me the input.
##non-square matrix
NCols <- sample(3:6, 1)
NRows <- sample(2:8, 1)
myMat <- matrix(runif(NCols*NRows), ncol=NCols)
is.square.matrix(myMat)
## functions
makeCacheMatrix <- function(x) {
m <- NULL
set <- function(y) {
x <<- y
m <<- NULL
}
get <- function() x
setInv <- if (is.square.matrix(x) == TRUE) {
function(solve) m <<- solve
}
else {
function(ginv) m <<- ginv
}
getInv <- function() m
list(x, set = set, get = get,
setInv = setInv,
getInv = getInv)
}
cacheSolve <- function(x, ...) {
m <- x$getInv()
if(!is.null(m)) {
message("getting cached data")
return(m)
}
data <- x$get()
m <- if (is.square.matrix(data) == TRUE) {
solve(data, ...)
}
else {
ginv(data, ...)
}
x$setInv(m)
m
}
## run functions for myMat
notSquare <- makeCacheMatrix(myMat)
cacheSolve(notSquare)
##check
ginv(myMat)
I am trying to add a progress bar to a bootstrap function in R.
I tried to make the example function as simple as possible (hence i'm using mean in this example).
library(boot)
v1 <- rnorm(1000)
rep_count = 1
m.boot <- function(data, indices) {
d <- data[indices]
setWinProgressBar(pb, rep_count)
rep_count <- rep_count + 1
Sys.sleep(0.01)
mean(d, na.rm = T)
}
tot_rep <- 200
pb <- winProgressBar(title = "Bootstrap in progress", label = "",
min = 0, max = tot_rep, initial = 0, width = 300)
b <- boot(v1, m.boot, R = tot_rep)
close(pb)
The bootstrap functions properly, but the problem is that the value of rep_count does not increase in the loop and the progress bar stays frozen during the process.
If I check the value of rep_count after the bootstrap is complete, it is still 1.
What am i doing wrong? maybe the boot function does not simply insert the m.boot function in a loop and so the variables in it are not increased?
Thank you.
You could use the package progress as below:
library(boot)
library(progress)
v1 <- rnorm(1000)
#add progress bar as parameter to function
m.boot <- function(data, indices, prog) {
#display progress with each run of the function
prog$tick()
d <- data[indices]
Sys.sleep(0.01)
mean(d, na.rm = T)
}
tot_rep <- 200
#initialize progress bar object
pb <- progress_bar$new(total = tot_rep + 1)
#perform bootstrap
boot(data = v1, statistic = m.boot, R = tot_rep, prog = pb)
I haven't quite figured out yet why it's necessary to set the number of iterations for progress_bar to be +1 the total bootstrap replicates (parameter R), but this is what was necessary in my own code, otherwise it throws an error. It seems like the bootstrap function is run one more time than you specify in parameter R, so if the progress bar is set to only run R times, it thinks the job is finished before it really is.
The pbapply package was designed to work with vectorized functions. There are 2 ways to achieve that in the context of this question: (1) write a wrapper as was suggested, which will not produce the same object of class 'boot'; (2) alternatively, the line lapply(seq_len(RR), fn) can be written as pblapply(seq_len(RR), fn). Option 2 can happen either by locally copying/updating the boot function as shown in the example below, or asking the package maintainer, Brian Ripley, if he would consider adding a progress bar directly or through pbapply as dependency.
My solution (changes indicated by comments):
library(boot)
library(pbapply)
boot2 <- function (data, statistic, R, sim = "ordinary", stype = c("i",
"f", "w"), strata = rep(1, n), L = NULL, m = 0, weights = NULL,
ran.gen = function(d, p) d, mle = NULL, simple = FALSE, ...,
parallel = c("no", "multicore", "snow"), ncpus = getOption("boot.ncpus",
1L), cl = NULL)
{
call <- match.call()
stype <- match.arg(stype)
if (missing(parallel))
parallel <- getOption("boot.parallel", "no")
parallel <- match.arg(parallel)
have_mc <- have_snow <- FALSE
if (parallel != "no" && ncpus > 1L) {
if (parallel == "multicore")
have_mc <- .Platform$OS.type != "windows"
else if (parallel == "snow")
have_snow <- TRUE
if (!have_mc && !have_snow)
ncpus <- 1L
loadNamespace("parallel")
}
if (simple && (sim != "ordinary" || stype != "i" || sum(m))) {
warning("'simple=TRUE' is only valid for 'sim=\"ordinary\", stype=\"i\", n=0', so ignored")
simple <- FALSE
}
if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
runif(1)
seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
n <- NROW(data)
if ((n == 0) || is.null(n))
stop("no data in call to 'boot'")
temp.str <- strata
strata <- tapply(seq_len(n), as.numeric(strata))
t0 <- if (sim != "parametric") {
if ((sim == "antithetic") && is.null(L))
L <- empinf(data = data, statistic = statistic, stype = stype,
strata = strata, ...)
if (sim != "ordinary")
m <- 0
else if (any(m < 0))
stop("negative value of 'm' supplied")
if ((length(m) != 1L) && (length(m) != length(table(strata))))
stop("length of 'm' incompatible with 'strata'")
if ((sim == "ordinary") || (sim == "balanced")) {
if (isMatrix(weights) && (nrow(weights) != length(R)))
stop("dimensions of 'R' and 'weights' do not match")
}
else weights <- NULL
if (!is.null(weights))
weights <- t(apply(matrix(weights, n, length(R),
byrow = TRUE), 2L, normalize, strata))
if (!simple)
i <- index.array(n, R, sim, strata, m, L, weights)
original <- if (stype == "f")
rep(1, n)
else if (stype == "w") {
ns <- tabulate(strata)[strata]
1/ns
}
else seq_len(n)
t0 <- if (sum(m) > 0L)
statistic(data, original, rep(1, sum(m)), ...)
else statistic(data, original, ...)
rm(original)
t0
}
else statistic(data, ...)
pred.i <- NULL
fn <- if (sim == "parametric") {
ran.gen
data
mle
function(r) {
dd <- ran.gen(data, mle)
statistic(dd, ...)
}
}
else {
if (!simple && ncol(i) > n) {
pred.i <- as.matrix(i[, (n + 1L):ncol(i)])
i <- i[, seq_len(n)]
}
if (stype %in% c("f", "w")) {
f <- freq.array(i)
rm(i)
if (stype == "w")
f <- f/ns
if (sum(m) == 0L)
function(r) statistic(data, f[r, ], ...)
else function(r) statistic(data, f[r, ], pred.i[r,
], ...)
}
else if (sum(m) > 0L)
function(r) statistic(data, i[r, ], pred.i[r, ],
...)
else if (simple)
function(r) statistic(data, index.array(n, 1, sim,
strata, m, L, weights), ...)
else function(r) statistic(data, i[r, ], ...)
}
RR <- sum(R)
res <- if (ncpus > 1L && (have_mc || have_snow)) {
if (have_mc) {
parallel::mclapply(seq_len(RR), fn, mc.cores = ncpus)
}
else if (have_snow) {
list(...)
if (is.null(cl)) {
cl <- parallel::makePSOCKcluster(rep("localhost",
ncpus))
if (RNGkind()[1L] == "L'Ecuyer-CMRG")
parallel::clusterSetRNGStream(cl)
res <- parallel::parLapply(cl, seq_len(RR), fn)
parallel::stopCluster(cl)
res
}
else parallel::parLapply(cl, seq_len(RR), fn)
}
}
else pblapply(seq_len(RR), fn) #### changed !!!
t.star <- matrix(, RR, length(t0))
for (r in seq_len(RR)) t.star[r, ] <- res[[r]]
if (is.null(weights))
weights <- 1/tabulate(strata)[strata]
boot.return(sim, t0, t.star, temp.str, R, data, statistic,
stype, call, seed, L, m, pred.i, weights, ran.gen, mle)
}
## Functions not exported by boot
isMatrix <- boot:::isMatrix
index.array <- boot:::index.array
boot.return <- boot:::boot.return
## Now the example
m.boot <- function(data, indices) {
d <- data[indices]
mean(d, na.rm = T)
}
tot_rep <- 200
v1 <- rnorm(1000)
b <- boot2(v1, m.boot, R = tot_rep)
The increased rep_count is a local variable and lost after each function call. In the next iteration the function gets rep_count from the global environment again, i.e., its value is 1.
You can use <<-:
rep_count <<- rep_count + 1
This assigns to the rep_count first found on the search path outside the function. Of course, using <<- is usually not recommended because side effects of functions should be avoided, but here you have a legitimate use case. However, you should probably wrap the whole thing in a function to avoid a side effect on the global environment.
There might be better solutions ...
I think i found a possible solution. This merges the answer of #Roland with the convenience of the pbapply package, using its functions startpb(), closepb(), etc..
library(boot)
library(pbapply)
v1 <- rnorm(1000)
rep_count = 1
tot_rep = 200
m.boot <- function(data, indices) {
d <- data[indices]
setpb(pb, rep_count)
rep_count <<- rep_count + 1
Sys.sleep(0.01) #Just to slow down the process
mean(d, na.rm = T)
}
pb <- startpb(min = 0, max = tot_rep)
b <- boot(v1, m.boot, R = tot_rep)
closepb(pb)
rep_count = 1
As previously suggested, wrapping everything in a function avoids messing with the rep_count variable.
The progress bar from the package dplyr works well:
library(dplyr)
library(boot)
v1 <- rnorm(1000)
m.boot <- function(data, indices) {
d <- data[indices]
p$tick()$print() # update progress bar
Sys.sleep(0.01)
mean(d, na.rm = T)
}
tot_rep <- 200
p <- progress_estimated(tot_rep+1) # init progress bar
b <- boot(v1, m.boot, R = tot_rep)
You can use the package pbapply
library(boot)
library(pbapply)
v1 <- rnorm(1000)
rep_count = 1
# your m.boot function ....
m.boot <- function(data, indices) {
d <- data[indices]
mean(d, na.rm = T)
}
# ... wraped in `bootfunc`
bootfunc <- function(x) { boot(x, m.boot, R = 200) }
# apply function to v1 , returning progress bar
pblapply(v1, bootfunc)
# > b <- pblapply(v1, bootfunc)
# > |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% Elapsed time: 02s