Options to speed up R code when parallelization is not feasible [duplicate] - r

I have developed a simulation that can be quite slow (up to 6 hours), depending on the values of the input parameters.
Profiling my code reveals that sample() is the bottleneck, but to my knowledge there is no faster alternative (sample() is written in C and, as far as I know, already well optimized).
Parallelization will not work either, since the algorithm's iterations depend on one another.
Are there any other ways to improve the runtime?
Any advice is warmly welcomed.
Below is my simulation code:
# Haplotype Accumulation Curve Simulator
HAC.sim <- function(K = 1, N, Hstar, probs, m = 0, perms = 10000, p = 0.95){
pop <- array(dim = c(c(perms, ceiling((1 - m) * N / K)), K))
haps <- as.character(1:Hstar)
specs <- 1:ceiling((1 - m) * N / K)
for (j in 1:perms){
for (i in 1:K){
pop[j, specs, i] <- sample(haps, size = length(specs), replace = TRUE, prob = probs)
}
}
HAC.mat <- array(dim = c(c(perms, length(specs), K)))
for (k in specs){
for (j in 1:perms){
for (i in 1:K){
ind.index <- sample(specs, size = k, replace = FALSE)
hap.plot <- pop[sample(1:nrow(pop), size = 1, replace = TRUE), ind.index, sample(i, size = 1, replace = TRUE)]
HAC.mat[j, k, i] <- length(unique(hap.plot))
}
}
}
means <- apply(HAC.mat, MARGIN = 2, mean)
lower <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.025))
upper <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.975))
d <- assign("d", data.frame(specs, means), envir = .GlobalEnv)
P <- max(means)
Q <- Hstar - max(means)
R <- assign("R", max(means) / Hstar, envir = .GlobalEnv)
S <- (Hstar - max(means)) / Hstar
Nstar <- assign("Nstar", (N * Hstar) / max(means), envir = .GlobalEnv)
X <- ((N * Hstar) / max(means)) - N
cat("\n Measures of Sampling Closeness \n \n Mean number of haplotypes sampled: " , P, "\n Mean number of haplotypes not sampled: " , Q, "\n Proportion of haplotypes sampled: " , R, "\n Proportion of haplotypes not sampled: " , S, "\n \n Calculated mean value of N*: ", Nstar, "\n Mean number of individuals not sampled: ", X, "\n \n")
if (R < p){
cat("Desired level of H* has not yet been reached \n")
} else{
cat("Desired level of H* has been reached")
}
par(mfrow = c(1, 2))
plot(specs, means, type = "n", xlab = "Specimens sampled", ylab = "Unique haplotypes", ylim = c(1, Hstar))
polygon(x = c(specs, rev(specs)), y = c(lower, rev(upper)), col = "gray")
lines(specs, means, lwd = 2)
HAC.bar <- barplot(length(specs) * probs, xlab = "Unique haplotypes", ylab = "Specimens sampled", names.arg = 1:Hstar)
}
# A quick reproducible example
N <- 50
Hstar <- 5
probs <- rep(1/Hstar, Hstar)
HAC.sim(N = N, Hstar = Hstar, probs = probs)

Sure, Rcpp is an option, but it would require you to rewrite parts of your code in C++. A less invasive option is the compiler package, which can offer some speed improvements with minimal effort:
# Old R version of lapply
slow_func <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  if (!is.list(X))
    X <- as.list(X)
  rval <- vector("list", length(X))
  for (i in seq(along = X))
    rval[i] <- list(FUN(X[[i]], ...))
  names(rval) <- names(X) # keep `names' !
  return(rval)
}
# Compiled version
library(compiler)
slow_func_compiled <- cmpfun(slow_func)
You can read more about it in the compiler package documentation. However, if the only issue is sample(), it looks like Rcpp has an alternative implementation, although it does not appear to be faster in all cases.
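Since sample() dominates the profile, another option worth trying is simply to call it fewer times. The following is a minimal sketch, assuming the haplotype draws in the first loop of HAC.sim are independent and identically distributed, so the perms * K separate calls that fill pop can be collapsed into one. The small parameter values are chosen only for illustration.

```r
# Illustrative inputs (assumed values, not taken from a real run)
perms <- 100
K <- 1
N <- 50
Hstar <- 5
m <- 0
probs <- rep(1 / Hstar, Hstar)
haps <- as.character(1:Hstar)
n_specs <- ceiling((1 - m) * N / K)

# One call to sample() fills the whole array instead of perms * K calls.
# Because every draw is i.i.d., the order in which the array is filled
# does not change the distribution of its contents.
pop <- array(
  sample(haps, size = perms * n_specs * K, replace = TRUE, prob = probs),
  dim = c(perms, n_specs, K)
)
dim(pop)
```

This only addresses the first loop. The second loop draws without replacement for every value of k, so it would need a different approach.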

Related

Number of items to replace is not a multiple of replacement length - Datacamp

I am taking some DataCamp courses on R, and sometimes when I copy code from DataCamp into RStudio I run into issues. Usually I can find the answer here or elsewhere online, but this time I cannot work out what the mistake is or how to fix it. I copied and pasted the following code from DataCamp:
# Compute cross-validated errors for up to 8 steps ahead
e <- matrix(NA_real_, nrow = 1000, ncol = 8)
for (h in 1:8)
  e[, h] <- tsCV(goog, forecastfunction = naive, h = h)
# Compute the MSE values and remove missing values
mse <- colMeans(e^2, na.rm = TRUE)
# Plot the MSE values against the forecast horizon
data.frame(h = 1:8, MSE = mse) %>% ggplot(aes(x = h, y = MSE)) + geom_point()
When I run this code in RStudio (with the required packages loaded), I always get the following error:
Error in e[, h] <- tsCV(goog, forecastfunction = naive, h = h) :
number of items to replace is not a multiple of replacement length
Does anybody know why this happens?
The problem is that the shape of the tsCV result depends on h: with h = 1 it returns a single vector, with h = 2 it returns two columns (h=1 and h=2), and with h = 8 it returns eight columns (h=1, ..., h=8). Assigning a multi-column result to the single column e[, h] therefore fails. The following code solves the problem by keeping only column h of each result. The case h = 1 has to be handled outside the loop, because tsCV(goog, forecastfunction = naive, h = 1) is a vector and indexing it with [, 1] would give an "incorrect number of dimensions" error:
library(forecast)
library(fpp2)
e <- matrix(data = NA, nrow = 1000, ncol = 8)
for (h in 2:8) {
  e[, h] <- tsCV(goog, forecastfunction = naive, h = h)[, h]
}
e[, 1] <- tsCV(goog, forecastfunction = naive, h = 1)
# Compute the MSE values and remove missing values
mse <- colMeans(e^2, na.rm = TRUE)
# Plot the MSE values against the forecast horizon
data.frame(h = 1:8, MSE = mse) %>% ggplot(aes(x = h, y = MSE)) + geom_point()
Note that:
tsCV (time series cross-validation) is from the forecast package.
goog is a data set from the fpp2 package.
When you call tsCV with h = n, it returns n columns, containing the errors for every horizon 1:n.
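The error itself can be reproduced without the forecast package. The following is a minimal base R sketch in which small made-up matrices stand in for the preallocated error matrix and for a tsCV result with two horizons:

```r
# Toy stand-ins (assumed data): e plays the role of the preallocated
# error matrix, two_cols the role of a tsCV result with h = 2
e <- matrix(NA_real_, nrow = 4, ncol = 3)
two_cols <- matrix(as.numeric(1:8), nrow = 4, ncol = 2)

# Assigning 8 values into a 4-element column raises an error
res <- tryCatch({
  e[, 2] <- two_cols
  "ok"
}, error = function(err) err)
inherits(res, "error")

# Selecting the matching column first makes the lengths agree
e[, 2] <- two_cols[, 2]
e
```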
You can simply change your code to
# Compute cross-validated errors for up to 8 steps ahead
# (no preallocation needed: tsCV already returns all 8 columns)
e <- tsCV(goog, forecastfunction = naive, h = 8)
# Compute the MSE values and remove missing values
mse <- colMeans(e^2, na.rm = TRUE)
# Plot the MSE values against the forecast horizon
data.frame(h = 1:8, MSE = mse) %>%
  ggplot(aes(x = h, y = MSE)) + geom_point()
If you want to know more about the tsCV function, here is its source code:
function (y, forecastfunction, h = 1, window = NULL, ...)
{
y <- as.ts(y)
n <- length(y)
e <- ts(matrix(NA_real_, nrow = n, ncol = h))
tsp(e) <- tsp(y)
for (i in seq_len(n - 1)) {
fc <- try(suppressWarnings(forecastfunction(subset(y,
start = ifelse(is.null(window), 1L, ifelse(i - window >=
0L, i - window + 1L, stop("small window"))),
end = i), h = h, ...)), silent = TRUE)
if (!is.element("try-error", class(fc))) {
e[i, ] <- y[i + (1:h)] - fc$mean
}
}
if (h == 1) {
return(e[, 1L])
}
else {
colnames(e) <- paste("h=", 1:h, sep = "")
return(e)
}
}
<bytecode: 0x10e17fe70>
<environment: namespace:forecast>

Performing t-Test Selection manually

I’m trying to write simulation code that generates data and runs t-test selection on it (discarding the predictors whose t-test p-value exceeds 0.05 and retaining the rest). The simulation is largely an adaptation of Applied Econometrics with R by Kleiber and Zeileis (2008, pp. 183–189).
When I run the code, it usually fails, yet with certain seeds (e.g. 1534) it produces plausible output. When it does not produce output (e.g. with seed 1911), it fails with "Error in x[, ii] : subscript out of bounds", which traces back to na.omit.data.frame(). So, for some reason, the way I attempt to handle the NAs seems to fail, but I'm unable to figure out how.
coef <- rep(coef[,3], length.out = pdim+1)
err <- as.vector(rnorm(nobs, sd = sd))
uX <- c(rep(1, times = nobs))
pX <- matrix(scale(rnorm(nobs)), byrow = TRUE, ncol = pdim, nrow = nobs)
X <- cbind(uX, pX)
y <- coef %*% t(X) + err
y <- matrix(y)

tTp <- (summary(lm(y ~ pX)))$coefficients[,4]
tTp <- tTp[2:length(tTp)]
TTT <- matrix(c(tTp, rep(.7, ncol(pX)-length(tTp))))

tX <- matrix(NA, ncol = ncol(pX), nrow = nrow(pX))
for(i in 1:ncol(pX)) {ifelse(TTT[i,] < ALPHA, tX[,i] <- pX[,i], NA)}
tX <- matrix(Filter(function(x)!all(is.na(x)), tX), nrow = nobs)
TTR <- lm(y ~ tX)
The first block is unlikely to be the cause of the error. It merely generates the data and works well on its own and with other methods, such as PCA. The second block pulls the p-values from the regression output, removes the p-value of the intercept (beta_0), and pads the vector with as many 0.7s as necessary to match the number of variables, so that the dimensions agree in the matrix calculations. The value 0.7 is arbitrary; any number larger than 0.05 would do, since it only has to fail the test in the loop. I believe this padding becomes necessary if R discards predictors due to multicollinearity.
The final block creates an empty matrix of the original dimensions and inserts the original data for each column whose t-test p-value is lower than 0.05, retaining the NAs otherwise. The penultimate line removes all columns containing NAs (a column here is either entirely NA or has none; the approach is taken from mnel’s answer to "Remove columns from dataframe where ALL values are NA"). Lastly, the modified data is again put into a linear regression.
Does anyone know what causes this behavior or how it would work as intended? I would expect it to either work or not, but not kind of both. Ideally, the former.
A working version of the code is:
set.seed(1534)
Sim_TTS <- function(nobs = c(1000, 15000), pdim = pdims, coef = coef100,
model = c("MLC", "MHC"), ...){
DGP_TTS <- function(nobs = 1000, model = c("MLC", "MHC"), coef = coef100,
sd = 1, pdim = pdims, ALPHA = 0.05)
{
model <- match.arg(model)
if(model == "MLC") {
coef <- rep(coef[,1], length.out = pdim+1)
err <- as.vector(rnorm(nobs, sd = sd))
uX <- c(rep(1, times = nobs))
pX <- matrix(scale(rnorm(nobs)), byrow = TRUE, ncol = pdim, nrow = nobs)
X <- cbind(uX, pX)
y <- coef %*% t(X) + err
y <- matrix(y)
tTp <- (summary(lm(y ~ pX)))$coefficients[,4]
tTp <- tTp[2:length(tTp)]
TTT <- matrix(c(tTp, rep(.7, ncol(pX)-length(tTp))))
tX <- matrix(NA, ncol = ncol(pX), nrow = nrow(pX))
for(i in 1:ncol(pX)) {ifelse(TTT[i,] < ALPHA, tX[,i] <- pX[,i], NA)}
tX <- matrix(Filter(function(x)!all(is.na(x)), tX), nrow = nobs)
TTR <- lm(y ~ tX)
} else {
coef <- rep(coef[,2], length.out = pdim+1)
err <- as.vector(rnorm(nobs, sd = sd))
uX <- c(rep(1, times = nobs))
pX <- matrix(scale(rnorm(nobs)), byrow = TRUE, ncol = pdim, nrow = nobs)
X <- cbind(uX, pX)
y <- coef %*% t(X) + err
y <- matrix(y)
tTp <- (summary(lm(y ~ pX)))$coefficients[,4]
tTp <- tTp[2:length(tTp)]
TTT <- matrix(c(tTp, rep(.7, ncol(pX)-length(tTp))))
tX <- matrix(NA, ncol = ncol(pX), nrow = nrow(pX))
for(i in 1:ncol(pX)) {ifelse(TTT[i,] < ALPHA, tX[,i] <- pX[,i], NA)}
tX <- matrix(Filter(function(x)!all(is.na(x)), tX), nrow = nobs)
TTR <- lm(y ~ tX)
}
return(TTR)
}
PG_TTS <- function(nrep = 1, ...)
{
rsq <- matrix(rep(NA, nrep), ncol = 1)
rsqad <- matrix(rep(NA, nrep), ncol = 1)
pastr <- matrix(rep(NA, nrep), ncol = 1)
vmat <- cbind(rsq, rsqad, pastr)
colnames(vmat) <- c("R sq.", "adj. R sq.", "p*")
for(i in 1:nrep) {
vmat[i,1] <- summary(DGP_TTS(...))$r.squared
vmat[i,2] <- summary(DGP_TTS(...))$adj.r.squared
vmat[i,3] <- length(DGP_TTS(...)$coefficients)-1
}
return(c(mean(vmat[,1]), mean(vmat[,2]), round(mean(vmat[,3]))))
}
SIM_TTS <- function(...)
{
prs <- expand.grid(pdim = pdim, nobs = nobs, model = model)
nprs <- nrow(prs)
pow <- matrix(rep(NA, 3 * nprs), ncol = 3)
for(i in 1:nprs) pow[i,] <- PG_TTS(pdim = prs[i,1],
nobs = prs[i,2], model = as.character(prs[i,3]), ...)
rval <- rbind(prs, prs, prs)
rval$stat <- factor(rep(1:3, c(nprs, nprs, nprs)),
labels = c("R sq.", "adj. R sq.", "p*"))
rval$power <- c(pow[,1], pow[,2], pow[,3])
rval$nobs <- factor(rval$nobs)
return(rval)
}
psim_TTS <- SIM_TTS()
tab_TTS <- xtabs(power ~ pdim + stat + model + nobs, data = psim_TTS)
ftable(tab_TTS, row.vars = c("model", "nobs", "stat"), col.vars = "pdim")
}
FO_TTS <- Sim_TTS()
FO_TTS
Preceded by:
pdims <- seq(12, 100, 4)
coefLC12 <- c(0, rep(0.2, 4), rep(0.1, 4), rep(0, 4))/1.3
rtL <- c(0.2, rep(0, 3))/1.3
coefLC100 <- c(coefLC12, rep(rtL, 22))
coefHC12 <- c(0, rep(0.8, 4), rep(0.4, 4), rep(0, 4))/1.1
rtH <- c(0.8, rep(0, 3))/1.1
coefHC100 <- c(coefHC12, rep(rtH, 22))
coef100 <- cbind(coefLC100, coefHC100)
I’m aware that model selection via the significance of individual predictors is not recommended, but that is the whole point – it is meant to be compared to more sophisticated methods.
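One thing worth checking, offered as a possible explanation rather than a confirmed diagnosis: Filter() applied to a matrix iterates over its individual elements, not its columns (it works column-wise only on data frames and lists), so the result may not have the intended shape, and it is empty when no predictor passes the test. A minimal sketch of the selection step using a logical column index instead is shown below. The data-generating values (nobs, pdim, beta) are made up for illustration and are not the ones from the simulation above.

```r
set.seed(1)
nobs <- 200
pdim <- 6
ALPHA <- 0.05

# Assumed toy data: six independent predictors, two of which matter
pX <- matrix(rnorm(nobs * pdim), nrow = nobs, ncol = pdim)
beta <- c(0, 1, 1, 0, 0, 0, 0)            # intercept followed by six slopes
y <- drop(cbind(1, pX) %*% beta + rnorm(nobs))

# p-values of the slopes (first row is the intercept);
# this assumes lm() did not drop any predictor as collinear
pvals <- summary(lm(y ~ pX))$coefficients[-1, 4]

# Keep whole columns; drop = FALSE preserves the matrix shape
keep <- pvals < ALPHA
tX <- pX[, keep, drop = FALSE]

# Fall back to an intercept-only model if nothing is selected
fit <- if (ncol(tX) > 0) lm(y ~ tX) else lm(y ~ 1)
sum(keep)
```

The explicit fallback avoids passing a zero-column matrix to lm(), which is one situation in which the original code could plausibly fail.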

while loop within a user-defined R function is not breaking

I have created a user defined function as follows:
HAC.sim(K = 1, N, Hstar, m = 0, probs, perms, p = 0.95)
and within the body of this function, I would like to put the following 'while' loop (because I want to eventually create a package for easy use).
while(R < p){
  HAC.sim(K = 1, N = Nstar, Hstar, m = 0, probs, perms, p = 0.95)
}
R is a variable that is defined in the main function body.
The problem is that the 'while' loop does not stop when it should (i.e., once R is no longer less than p).
My routine works (i.e., it breaks successfully) when the 'while' loop is placed outside of the function, but fails when I place it within the main function body.
I have also tried implementing a repeat loop with break, but the same problem occurs.
Any ideas on how I can fix the issue?
Here is the full function (quite long)
HAC.sim <- function(K = 1, N, Hstar, probs, m = 0, perms = 10000, p = 0.95){
### Set up a container to hold the identity of each individual from each permutation
pop <- array(dim = c(c(perms, ceiling((1 - m) * N / K)), K))
### Create an ID for each haplotype
haps <- as.character(1:Hstar)
### Generate permutations, we assume each permutation has Npop individuals, and we sample those individuals' haplotypes from the probabilities
specs <- 1:ceiling((1 - m) * N / K)
for(j in 1:perms){
for(i in 1:K){
pop[j, specs, i] <- sample(haps, size = ceiling((1 - m) * N / K), replace = TRUE, prob = probs)
}
}
### Make a matrix to hold the 1:N individuals from each permutation
HAC.mat <- array(dim = c(c(perms, ceiling((1 - m) * N / K)), K))
for(k in specs){
for(j in 1:perms){
for(i in 1:K){
ind.index <- sample(specs, size = k, replace = FALSE) ## which individuals will we sample
hap.plot <- pop[sample(1:nrow(pop), size = 1, replace = TRUE), ind.index, sample(i, size = 1, replace = TRUE)] ## pull those individuals from a permutation
HAC.mat[j, k, i] <- length(unique(hap.plot)) ## how many haplotypes did we get for a given sampling intensity (k) from each permutation (j)
}
}
}
### Calculate the mean and CI for number of haplotypes at each sampling intensity (k)
means <- apply(HAC.mat, MARGIN = 2, mean)
lower <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.025))
upper <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.975))
assign("d", data.frame(specs, means), envir = .GlobalEnv)
### Compute Measures of Sampling Closeness
P <- max(means)
Q <- Hstar - max(means)
R <- assign("R", max(means) / Hstar, envir = .GlobalEnv)
S <- (Hstar - max(means)) / Hstar
Nstar <- assign("Nstar", (N * Hstar) / max(means), envir = .GlobalEnv)
X <- ((N * Hstar) / max(means)) - N
cat("\n Input parameters \n \n Number of (sub)populations: ", K, "\n Number of individuals: ", N, "\n Number of haplotypes: ", Hstar, "\n Haplotype distribution: ", probs, "\n Migration rate: ", m, "\n Number of permutations: ", perms, "\n Proportion of haplotypes to recover: ", p, "\n \n \n Measures of Sampling Closeness \n \n Mean number of haplotypes sampled: " , P, "\n Mean number of haplotypes not sampled: " , Q, "\n Proportion of haplotypes sampled: " , R, "\n Proportion of haplotypes not sampled: " , S, "\n \n Calculated mean value of N*: ", Nstar, "\n Mean number of individuals not sampled: ", X, "\n \n")
if(R < p){
cat("Desired level of H* has not yet been reached")
} else{
cat("Desired level of H* has been reached")
}
### Plot the curve and frequency barplot
par(mfrow = c(1, 2))
plot(specs, means, type = "n", xlab = "Specimens sampled", ylab = "Unique haplotypes", ylim = c(1, Hstar))
polygon(x = c(specs, rev(specs)), y = c(lower, rev(upper)), col = "gray")
lines(specs, means, lwd = 2)
HAC.bar <- barplot(ceiling((1 - m) * N / K)*probs, xlab = "Unique haplotypes", ylab = "Specimens sampled", names.arg = 1:Hstar)
while(R < p){
HAC.sim(K = K, N = ceiling(Nstar), Hstar = Hstar, probs = probs, m = m, perms = perms, p = p)
}
}
### Run simulation
HAC.sim(K = K, N = N, Hstar = Hstar, probs = probs, m = m, perms = perms, p = p)
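A likely explanation, offered as a sketch rather than a definitive diagnosis: R inside HAC.sim is a local variable. The recursive call updates the global R (through assign) and its own local R, but never the R that the while condition in the calling frame tests, so that condition stays TRUE. One way around this is to have the function return its results and to loop on the returned value. In the sketch below, sim_once and iterate_until are hypothetical names, and sim_once is a deterministic stand-in that uses the expected proportion of distinct haplotypes under equal frequencies instead of actual sampling.

```r
# Hypothetical stand-in for one simulation run: returns R and Nstar
# instead of writing them to the global environment
sim_once <- function(N, Hstar) {
  R <- 1 - (1 - 1 / Hstar)^N       # expected proportion of haplotypes seen
  list(R = R, Nstar = N / R)       # same as (N * Hstar) / (R * Hstar)
}

# Loop on the returned value; max_iter guards against endless looping
iterate_until <- function(N, Hstar, p = 0.95, max_iter = 100) {
  res <- sim_once(N, Hstar)
  iter <- 1
  while (res$R < p && iter < max_iter) {
    N <- ceiling(res$Nstar)
    res <- sim_once(N, Hstar)
    iter <- iter + 1
  }
  c(res, list(N = N, iterations = iter))
}

out <- iterate_until(N = 5, Hstar = 10)
out$R >= 0.95
```

Because the loop condition reads res$R, which is reassigned on every pass, the loop ends as soon as the target proportion is reached.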

get(paste0(...)) overwrites results in R subarray

I have two subarrays and would like to sample one of them at random. The subarrays consist of character labels from a larger pool of labels as follows:
K <- as.character(1:10)
Suppose I call the arrays K1 and K2 and assign 10 labels between them as follows:
K1 <- 1:8
K2 <- 9:10
I then use
get(paste0("K", i))
to retrieve the labels that were sampled using a 'for' loop.
The issue I am having is this: if K1 is sampled, get(paste0("K", i)) returns the labels from K2 (9:10) instead of returning those from K1 as needed.
In other words, I believe get(paste0("K", i)) may be overwriting results.
Any thoughts on why this may occur and how to circumvent the issue?
Below is a more extensive example:
K <- 2
N <- 100
Hstar <- 10
perms <- 10000
probs <- rep(1/Hstar, Hstar)
specs <- 1:N
pop <- array(dim = c(c(perms, N), K))
haps <- as.character(1:Hstar)
K1 <- 1:8 # subarray 1
K2 <- 9:10 # subarray 2
for(j in 1:perms){
  for(i in 1:K){
    if(i == 1){
      pop[j, specs, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
    } else{
      pop[j ,, 1] <- sample(haps[K1], size = N, replace = TRUE, prob = probs[K1])
      pop[j ,, 2] <- sample(haps[K2], size = N, replace = TRUE, prob = probs[K2]) # prob must match haps[K2] in length
    }
  }
}
HAC.mat <- array(dim = c(c(perms, N), K))
for(k in specs){
for(j in 1:perms){
for(i in 1:K){
ind.index <- sample(specs, size = k, replace = FALSE)
hap.plot <- pop[sample(1:nrow(pop), size = 1, replace = TRUE), ind.index, sample(i, size = 1, replace = TRUE)]
HAC.mat[j, k, i] <- length(unique(hap.plot))
}
}
}
means <- apply(HAC.mat, MARGIN = 2, mean)
lower <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.025))
upper <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.975))
d <- data.frame(specs, means, lower, upper)
par(mfrow = c(1, 2))
if(i == 1){
plot(specs, means, type = "n", xlab = "Specimens sampled", ylab = "Unique haplotypes", ylim = c(1, Hstar))
polygon(x = c(specs, rev(specs)), y = c(lower, rev(upper)), col = "gray")
lines(specs, means, lwd = 2)
HAC.bar <- barplot(N*probs, xlab = "Unique haplotypes", ylab = "Specimens sampled", names.arg = 1:Hstar)
} else if(i > 1){
plot(specs, means, type = "n", xlab = "Specimens sampled", ylab = "Unique haplotypes", ylim = c(1, Hstar))
polygon(x = c(specs, rev(specs)), y = c(lower, rev(upper)), col = "gray")
lines(specs, means, lwd = 2)
HAC.bar <- barplot(N*probs[get(paste0("K", i))], xlab = "Unique haplotypes", ylab = "Specimens sampled", names.arg = get(paste0("K", i))) ## The issue may lie here
}
Any advice on what may be going on here is greatly appreciated.
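One possible cause, stated as an assumption about the code above: the plotting code runs after the for loops have finished, and in R the loop variable keeps its last value. So i equals K (here 2) at that point, and get(paste0("K", i)) always resolves to K2. Nothing is being overwritten. The minimal sketch below shows this, and also shows storing the subarrays in a list (subarrays and chosen are hypothetical names) so that get() is not needed at all.

```r
K1 <- 1:8
K2 <- 9:10
K <- 2

# The loop body is irrelevant here; only the final value of i matters
for (i in 1:K) {
}
i                                        # i keeps its last value, K

labels_after_loop <- get(paste0("K", i)) # always resolves to K2 here

# Alternative: keep the subarrays in a list and index it directly
subarrays <- list(K1 = K1, K2 = K2)
chosen <- sample(seq_along(subarrays), size = 1)
labels <- subarrays[[chosen]]
```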

Extract a named dataframe from a function in R

I have created the following function that I need to extract some information from. However, R is giving me some issues.
HAC.sim <- function(K = 1, N, Hstar, probs, perms = 10000){
specs <- 1:N
### Set up a container to hold the identity of each individual from each permutation
pop <- array(dim = c(c(perms, N), K))
### Create an ID for each haplotype
haps <- as.character(1:Hstar)
### Assign probabilities of occurrence to each haplotype, ensure they sum to 1
### This is where we assume we "know" the distribution of haplotypes
### Here, I have assumed they all occur with equal frequency, but you can change this to assume some dominant ones and some rare ones, whatever you want
probs <- rep(1/Hstar, Hstar)
# probs <- c(220/N, rep(3/N, 2), rep(2/N, 2), rep(1/N, 10))
### Generate permutations, we assume each permutation has N individuals, and we sample those individuals' haplotypes from our probabilities
# If K > 1, haplotypes are partitioned into equally-sized subpopulations/demes
# Can change number of haplotypes in each subpopulation and re-run simulation
# For each additional, K, add new Ki and new pop[j ,, i] in loop
for(j in 1:perms){
for(i in 1:K){
if(i == 1){
pop[j, specs, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
}
else{
pop[j ,, 1] <- sample(haps[K1], size = N, replace = TRUE, prob = probs[K1])
pop[j ,, 2] <- sample(haps[K2], size = N, replace = TRUE, prob = probs[K2])
}
}
}
### Make a matrix to hold the 1:N individuals from each permutation
HAC.mat <- array(dim = c(c(perms, N), K))
for(k in specs){
for(j in 1:perms){
for(i in 1:K){
ind.index <- sample(specs, size = k, replace = FALSE) ## which individuals will we sample
hap.plot <- pop[sample(1:nrow(pop), size = 1, replace = TRUE), ind.index, sample(1:K, size = 1, replace = TRUE)] ## pull those individuals from a permutation
HAC.mat[j, k, i] <- length(unique(hap.plot)) ## how many haplotypes did we get for a given sampling intensity (k) from each permutation (j)
}
}
}
### Calculate the mean and CI for number of haplotypes at each sampling intensity (k)
means <- apply(HAC.mat, MARGIN = 2, mean)
lower <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.025))
upper <- apply(HAC.mat, MARGIN = 2, function(x) quantile(x, 0.975))
### Plot the curve and frequency barplot
par(mfrow = c(1, 2))
for(i in 1:K){
if(i == 1){
plot(specs, means, type = "n", xlab = "Specimens sampled", ylab = "Unique haplotypes", ylim = c(1, Hstar))
polygon(x = c(specs, rev(specs)), y = c(lower, rev(upper)), col = "gray")
lines(specs, means, lwd = 2)
HAC.bar <- barplot(N*probs, xlab = "Unique haplotypes", ylab = "Specimens sampled", names.arg = 1:Hstar)
}
else{
plot(specs, means, type = "n", xlab = "Specimens sampled", ylab = "Unique haplotypes", ylim = c(1, max(HAC.mat)))
polygon(x = c(specs, rev(specs)), y = c(lower, rev(upper)), col = "gray")
lines(specs, means, lwd = 2)
HAC.bar <- barplot(N*probs[get(paste0("K", i))], xlab = "Unique haplotypes", ylab = "Specimens sampled", names.arg = get(paste0("K",i)))
}
}
d <- data.frame(specs, means)
## Measures of Closeness ##
list(c(cat("\n Number of haplotypes sampled: " , max(means)),
cat("\n Number of haplotypes not sampled: " , Hstar - max(means)),
cat("\n Proportion of haplotypes sampled: " , max(means)/Hstar),
cat("\n Proportion of haplotypes not sampled: " , (Hstar - max(means))/Hstar)), d)
}
HAC.sim(K = 1, N = 100, Hstar = 10)
I would like to be able to easily extract the data frame d from the above function, because I need to pass d to a regression model.
The output gives me almost what I need, except that [[1]] is appended to the end of the value of (Hstar - max(means))/Hstar instead of appearing on a new line. In addition, [[1]] is a NULL list element. Why am I getting this NULL as output, and how can I fix the problem?
Any help is appreciated.
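A probable explanation: cat() prints as a side effect and returns NULL, so c(cat(...), ...) is NULL and the function returns list(NULL, d). The [[1]] is the printed label of that first, NULL element, and it lands on the same line because the last cat() call does not end with a newline. The sketch below uses a hypothetical, stripped-down stand-in for the end of HAC.sim (report_and_return, with made-up means) that prints the messages and returns only the data frame.

```r
# Hypothetical stand-in for the final lines of HAC.sim
report_and_return <- function(means, Hstar) {
  specs <- seq_along(means)
  d <- data.frame(specs, means)
  # cat() is called only for its side effect; its NULL return value
  # is not stored anywhere
  cat("\n Number of haplotypes sampled: ", max(means),
      "\n Proportion of haplotypes sampled: ", max(means) / Hstar, "\n")
  d  # the data frame is the function's return value
}

# Made-up means, for illustration only
d <- report_and_return(means = c(1, 1.8, 2.5, 3.1), Hstar = 5)

# d can now be passed to a regression model
fit <- lm(means ~ specs, data = d)
```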
