Suppose fun is a function with 3 arguments (x, y, z) and y or z needs to be specified, but not both.
fun <- function(x, y, z) {
if (missing(y)) {
x^2
} else {
x^5
}
}
Now assume this function gets call within another function as:
fun.v1 <- function(x, y, z) {
sapply(x, fun, y, z)
}
> fun.v1(1:5, y = 4)
[1] 1 32 243 1024 3125
> fun.v1(1:5, z = 4)
[1] 1 4 9 16 25
Rather than using sapply, now I want to implement a parallel backend:
require(parallel)
fun.v2 <- function(x, y, z) {
cl <- makeCluster(2)
bf <- parSapplyLB(cl = cl, X = x, fun, y, z)
stopCluster(cl = cl)
}
fun.v2(1:5, y = 4)
fun.v2(1:5, z = 4)
This code gives an error. Is there a way to fix this?
Update: Below code works as intended. But is there a neater way of doing this?
fun <- function(x, y, z) {
if (is.null(y)) {
x^2
} else {
x^5
}
}
fun.v2 <- function(x, y, z) {
cl <- makeCluster(2)
tmp1 <- if(missing(y))
NULL
else y
tmp2 <- if(missing(z))
NULL
else z
bf <- parSapplyLB(cl = cl, X = x, fun, y = tmp1, z = tmp2)
stopCluster(cl = cl)
return(bf)
}
> fun.v2(1:5, y = 4)
[1] 1 32 243 1024 3125
> fun.v2(1:5, z = 4)
[1] 1 4 9 16 25
It seems that y and z are both non-optional arguments. You can make them optional as follows:
fun.v2 <- function(x, y = NULL, z = NULL) {
cl <- makeCluster(2)
bf <- parSapplyLB(cl = cl, X = x, fun, y, z)
stopCluster(cl = cl)
}
This no longer throws an error.
Related
Model:
X(t) = 4*t + e(t);
t € [0; 1]
e(t) is a Gaussian process with zero mean and covariance function f(s, t) = exp( -|t - s| )
The final result over 100 runs (=100 gray lines) with 50 sampled points each should be like the gray area in the picture.
The green line is what I get from the code below.
library(MASS)
kernel_1 <- function(x, y){
exp(- abs(x - y))
}
cov_matrix <- function(x, kernel_fn, ...) {
outer(x, x, function(a, b) kernel_fn(a, b, ...))
}
draw_samples <- function(x, N=1, kernel_fn, ...) {
set.seed(100)
Y <- matrix(NA, nrow = length(x), ncol = N)
for (n in 1:N) {
K <- cov_matrix(x, kernel_fn, ...)
Y[, n] <- mvrnorm(1, mu = rep(0, times = length(x)), Sigma = K)
}
Y
}
x <- seq(0, 1, length.out = 51) # x-coordinates
model1 <- function(obs, x) {
model1_data <- matrix(NA, nrow = obs, ncol = length(x))
for(i in 1:obs){
e <- draw_samples(x, 1, kernel_fn = kernel_1)
X <- c()
for (p in 1:length(x)){
t <- x[p]
val <- (4*t) + e[p,]
X = c(X, val)
}
model1_data[i,] <- X
}
model1_data
}
# model1(100, x)
Because you have set.seed in draw_samples, you are getting the same random numbers with each draw. If you remove it, then you can do:
a <- model1(100, x)
matplot(t(a), type = "l", col = 'gray')
to get
Below is the downSample function of caret that I found here .
downSample <- function(x, y, list = FALSE, yname = "Class")
{
xc <- class(x)
if(!is.data.frame(x)) x <- as.data.frame(x)
if(!is.factor(y))
{
warning("Down-sampling requires a factor variable as the response. The original data was returned.")
return(list(x = x, y = y))
}
minClass <- min(table(y))
x$.outcome <- y
x <- ddply(x, .(y),
function(dat, n) dat[sample(seq(along = dat$.outcome), n),,drop = FALSE],
n = minClass)
y <- x$.outcome
x <- x[, !(colnames(x) %in% c("y", ".outcome")), drop = FALSE]
if(list)
{
if(xc[1] == "matrix") x <- as.matrix(x)
out <- list(x = x, y = y)
} else {
out <- cbind(x, y)
colnames(out)[ncol(out)] <- yname
}
out
}
suppose that my data set is iris :
data(iris)
x <- iris[, -5]
y <- iris[, 5]
to make the response variable a hugely unbalanced binary one :
y[-c(130, 146)] <- "setosa"
There are now therefore two instances of "virginica" and 148 instances of "setosa". I would like to modify the function downSample so that, in the end, instead of returning a subsampled data set with 50% of minClass, it returns a subsampled data set with for instance 30% (k) of minor class and 70% of major class. Because using the downSample function for n instances in the minClass it selects n instances of the other class to get a fully balanced data set. But in my case I loose a lot of data so I just want to balance it a bit not fully.
Let's suppose that k = 20 % i.e. in the end I want 20% of minClaas and 80% of the other class. I have already tried to modify this part of function :
x <- ddply(x, .(y), function(dat, n)
dat[sample(seq(along = dat$.outcome), n),, drop = FALSE], n = minClass)
by changing n to 4*n but I did not achieve it. There is this error :
Error in size <= n/2 :
comparison (4) is possible only for atomic and list types
Your help would be appreciated.
A simple way to perform this is to change the n = minClass part of the ddply call.
downSample_custom <- function(x, y, list = FALSE, yname = "Class", frac = 1){ #add argument frac which is in the 0 - 1 range
xc <- class(x)
if(!is.data.frame(x)) x <- as.data.frame(x)
if(!is.factor(y))
{
warning("Down-sampling requires a factor variable as the response. The original data was returned.")
return(list(x = x, y = y))
}
minClass <- min(table(y))
x$.outcome <- y
x <- ddply(x, .(y),
function(dat, n) dat[sample(seq(along = dat$.outcome), n),,drop = FALSE],
n = minClass*frac) #change the n to this
y <- x$.outcome
x <- x[, !(colnames(x) %in% c("y", ".outcome")), drop = FALSE]
if(list)
{
if(xc[1] == "matrix") x <- as.matrix(x)
out <- list(x = x, y = y)
} else {
out <- cbind(x, y)
colnames(out)[ncol(out)] <- yname
}
out
}
Does it work:
library(plyr)
imbalanced y:
set.seed(1)
y <- as.factor(sample(c("M", "F"),
prob = c(0.1, 0.9),
size = 10000,
replace = TRUE))
x <- rnorm(10000)
table(downSample_custom(x, y)[,2])
output:
F M
1044 1044
table(downSample_custom(x, y, frac = 0.5)[,2])
output:
F M
522 522
table(downSample_custom(x, y, frac = 0.2)[,2])
output
F M
208 208
using frac > 1 returns an error:
downSample_custom(x, y, frac = 2)
output
Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'
EDIT: answer to the updated question.
This can be achieved for instance by sampling the indexes of each class separately. Here is an example that works only for two class problems:
downSample_custom <- function(x, y, yname = "Class", frac = 1){
lev <- levels(y)
minClass <- min(table(y))
lev_min <- levels(y)[which.min(table(y))]
inds_down <- sample(which(y == lev[lev != lev_min]), size = minClass * frac) #sample the indexes of the more abundant class according to minClass * frac
inds_minClass <- which(y == lev[lev == lev_min]) #take all the indexes of the lesser abundant class
out <- data.frame(x, y)
out <- out[sort(c(inds_down, inds_minClass)),]
colnames(out)[ncol(out)] <- yname
return(out)
}
how it looks in practice:
table(downSample_custom(x, y)[,2])
output:
F M
1044 1044
table(downSample_custom(x, y, frac = 5)[,2])
output:
F M
5220 1044
head(downSample_custom(x, y, frac = 5))
output:
x Class
1 -1.5163733 F
2 0.6291412 F
4 1.1797811 M
5 1.1176545 F
6 -1.2377359 F
7 -1.2301645 M
I'm running a mediation analysis on a dataset in r and can't figure out how to get psych::mediate to work--I've done the same on another dataset before and didn't change anything, but it's not working with this new data for some reason.
I tried:
1. Turning 'condition' into a condition.f factor
2. Explicitly naming DATA a "data.frame"
3. Specifying different parameters such as "z" or "mod" in the function
4. Checked capitalization on all the variable column names.
None of the above seem to work.
library(psych)
DATA = STEX_S1_FINALCLEAN
Mediation_RA = psych::mediate( y = "DV_See", x = "Share_T", m = "Seff", data = DATA)
print(Mediation_RA,short=F)
I'd expect a full output with mediation values, but have gotten:
Error in psych::mediate(y = "DV_See", x = "Share_T", m = "Seff", data = DATA) :
object 'ex' not found
I don't see and object 'ex' anywhere, and that's not a name of any columns in the DATA data frame.
Following the suggestion of #r2evans, you can use the following modified function:
mymediate <- function (y, x, m = NULL, data, mod = NULL, z = NULL, n.obs = NULL,
use = "pairwise", n.iter = 5000, alpha = 0.05, std = FALSE,
plot = TRUE, zero = TRUE, main = "Mediation")
{
cl <- match.call()
if (class(y) == "formula") {
ps <- fparse(y)
y <- ps$y
x <- ps$x
m <- ps$m
mod <- ps$prod
ex <- ps$ex
x <- x[!ps$x %in% ps$m]
z <- ps$z
print(str(ps))
} else {
ex = NULL
}
all.ab <- NULL
if (is.numeric(y))
y <- colnames(data)[y]
if (is.numeric(x))
x <- colnames(data)[x]
if (!is.null(m))
if (is.numeric(m))
m <- colnames(data)[m]
if (!is.null(mod)) {
if (is.numeric(mod)) {
nmod <- length(mod)
mod <- colnames(data)[mod]
}
}
if (is.null(mod)) {
nmod <- 0
}
else {
nmod <- length(mod)
}
var.names <- list(IV = x, DV = y, med = m, mod = mod, z = z,
ex = ex)
if (any(!(unlist(var.names) %in% colnames(data)))) {
stop("Variable names not specified correctly")
}
if (ncol(data) == nrow(data)) {
raw <- FALSE
if (nmod > 0) {
stop("Moderation Analysis requires the raw data")
}
else {
data <- data[c(y, x, m, z), c(y, x, m, z)]
}
}
else {
data <- data[, c(y, x, m, z, ex)]
}
if (nmod == 1) {
mod <- c(x, mod)
nmod <- length(mod)
}
if (!is.matrix(data))
data <- as.matrix(data)
if ((dim(data)[1] != dim(data)[2])) {
n.obs = dim(data)[1]
if (!is.null(mod))
if (zero)
data <- scale(data, scale = FALSE)
C <- cov(data, use = use)
raw <- TRUE
if (std) {
C <- cov2cor(C)
}
}
else {
raw <- FALSE
C <- data
nvar <- ncol(C)
if (is.null(n.obs)) {
n.obs <- 1000
message("The data matrix was a correlation matrix and the number of subjects was not specified. \n n.obs arbitrarily set to 1000")
}
if (!is.null(m)) {
message("The replication data matrices were simulated based upon the specified number of subjects and the observed correlation matrix.")
eX <- eigen(C)
data <- matrix(rnorm(nvar * n.obs), n.obs)
data <- t(eX$vectors %*% diag(sqrt(pmax(eX$values,
0)), nvar) %*% t(data))
colnames(data) <- c(y, x, m)
}
}
if ((nmod > 0) | (!is.null(ex))) {
if (!raw) {
stop("Moderation analysis requires the raw data")
}
else {
if (zero) {
data <- scale(data, scale = FALSE)
}
}
}
if (nmod > 0) {
prods <- matrix(NA, ncol = length(ps$prod), nrow = nrow(data))
colnames(prods) <- paste0("V", 1:length(ps$prod))
for (i in 1:length(ps$prod)) {
prods[, i] <- apply(data[, ps$prod[[i]]], 1, prod)
colnames(prods)[i] <- paste0(ps$prod[[i]], collapse = "*")
}
data <- cbind(data, prods)
x <- c(x, colnames(prods))
}
if (!is.null(ex)) {
quads <- matrix(NA, ncol = length(ex), nrow = nrow(data))
colnames(quads) <- ex
for (i in 1:length(ex)) {
quads[, i] <- data[, ex[i]] * data[, ex[i]]
colnames(quads)[i] <- paste0(ex[i], "^2")
}
data <- cbind(data, quads)
x <- c(x, colnames(quads))
}
if (raw) {
C <- cov(data, use = use)
}
if (std) {
C <- cov2cor(C)
}
xy <- c(x, y)
numx <- length(x)
numy <- length(y)
if (!is.null(m)) {
numm <- length(m)
nxy <- numx + numy
m.matrix <- C[c(x, m), c(x, m), drop = FALSE]
}
else {
numm <- 0
nxy <- numx
}
df <- n.obs - nxy - 1
xy.matrix <- C[c(x, m), y, drop = FALSE]
total.reg <- matReg(x, y, m = m, z = z, C = C, n.obs = n.obs)
direct <- total.reg$beta
if (!is.null(z)) {
colnames(direct) <- paste0(colnames(direct), "*")
rownames(direct) <- paste0(rownames(direct), "*")
}
if (numm > 0) {
a.reg <- matReg(x = x, y = m, C = C, z = z, n.obs = n.obs)
b.reg <- matReg(c(x, m), y, C = C, z = z, n.obs = n.obs)
cprime.reg <- matReg(c(x, m), y, C = C, n.obs = n.obs,
z = z)
a <- a.reg$beta
b <- b.reg$beta[-(1:numx), , drop = FALSE]
c <- total.reg$beta
cprime <- cprime.reg$beta
all.ab <- matrix(NA, ncol = numm, nrow = numx)
for (i in 1:numx) {
all.ab[i, ] <- a[i, ] * t(b[, 1])
}
colnames(all.ab) <- m
rownames(all.ab) <- x
ab <- a %*% b
indirect <- c - ab
if (is.null(n.obs)) {
message("Bootstrap is not meaningful unless raw data are provided or the number of subjects is specified.")
mean.boot <- sd.boot <- ci.quant <- boot <- se <- tvalue <- prob <- NA
}
else {
boot <- psych:::boot.mediate(data, x, y, m, z, n.iter = n.iter,
std = std, use = use)
mean.boot <- colMeans(boot)
sd.boot <- apply(boot, 2, sd)
ci.quant <- apply(boot, 2, function(x) quantile(x,
c(alpha/2, 1 - alpha/2), na.rm = TRUE))
mean.boot <- matrix(mean.boot, nrow = numx)
sd.boot <- matrix(sd.boot, nrow = numx)
ci.ab <- matrix(ci.quant, nrow = 2 * numx * numy)
boots <- list(mean = mean.boot, sd = sd.boot, ci = ci.quant,
ci.ab = ci.ab)
}
}
else {
a.reg <- b.reg <- reg <- NA
a <- b <- c <- ab <- cprime <- boot <- boots <- indirect <- cprime.reg <- NA
}
if (!is.null(z)) {
var.names$IV <- paste0(var.names$IV, "*")
var.names$DV <- paste0(var.names$DV, "*")
var.names$med <- paste0(var.names$med, "*")
colnames(C) <- rownames(C) <- paste0(colnames(C), "*")
}
result <- list(var.names = var.names, a = a, b = b, ab = ab,
all.ab = all.ab, c = c, direct = direct, indirect = indirect,
cprime = cprime, total.reg = total.reg, a.reg = a.reg,
b.reg = b.reg, cprime.reg = cprime.reg, boot = boots,
boot.values = boot, sdnames = colnames(data), data = data,
C = C, Call = cl)
class(result) <- c("psych", "mediate")
if (plot) {
if (is.null(m)) {
moderate.diagram(result)
}
else {
mediate.diagram(result, main = main)
}
}
return(result)
}
You can test the mymediate function using the following example:
library(psych)
mod.k2 <- mymediate(y="OccupAsp", x=c("Intelligence","Siblings","FatherEd","FatherOcc"),
m= c(5:6), data=R.kerch, n.obs=767, n.iter=50)
print(mod.k2)
When I write a setter method in a class, the setter method does not change the value. I just cannot find the error here.
point <- function(x,y){
structure(class = "point", list(
# attributes
x = x,
y = y,
# methods
get_x = function() paste("(", x,",",y,")"),
set_x = function(x,y){ self.x = x; self.y = y}
))}
> p <- point(0,1)
> p$get_x()
[1] "( 0 , 1 )"
> p$set_x(6,5)
> p$get_x()
[1] "( 0 , 1 )"
Try to follow this change to your code.
In function set_x, it is the values of variables x and y created in function point that are assigned new values with <<-, not the x and y that exist in the .GlobalEnv.
point <- function(x, y){
structure(class = "point", list(
x = x,
y = y,
get_x = function() paste("(", x,",",y,")"),
set_x = function(x, y){
x <<- x
y <<- y
}
))
}
x <- 0
y <- 1
p <- point(0,1)
p$get_x()
#[1] "( 0 , 1 )"
p$set_x(6,5)
p$get_x()
#[1] "( 6 , 5 )"
x
#[1] 0
y
#[1] 1
Tried this code via source()
f1 <- function(x, ...){
print(y)
}
f1(x = 1, y = 2)
or this code via source()
f1 <- function(x, ...){
y <- 2
f2(x, y = y, ...)
}
f2 <- function(x, ...){
print(y)
}
f1(x = 1)
Got this Error
Error in print(y) : object 'y' not found
I guess the '...' argument takes from the global environment?
you should call y in your function as correct like this
f1 <- function(x, ...){
l <- list(...)
if(!is.null(l$y)) print(l$y)
}
f1(x = 1, y=2)