Replicate function doesn't work with "on the fly" function - r

I have the following data.frame:
df_1 <- data.frame(
x = replicate(
n = 6, expr = runif(n = 30, min = 20, max = 100), simplify = TRUE
)
)
I want to generate 50 data.frames with this function:
f_1 <- function(x) {
data.frame(x = replicate(n = 5, runif(n = 30, min = 20, max = 100)))
}
lt_1 <- replicate(n = 50, expr = f_1(), simplify = FALSE)
The result is OK. But when I define the same function inline (on the fly) inside replicate, it doesn't work:
lt_2 <- replicate(
n = 50, expr = function(x) {
data.frame(x = replicate(n = 5, runif(n = 30, min = 20, max = 100)))
}, simplify = FALSE
)
What's the problem?

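Here expr is only the function definition, so replicate returns 50 copies of the function object without ever calling it. We can wrap the definition in () and then call it with () so that it is executed on each replication: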
lt_2 <- replicate(
n = 50, expr = (function(x) {
data.frame(x = replicate(n = 5, runif(n = 30, min = 20, max = 100)))
})(), simplify = FALSE
)
In the OP's lt_1, by contrast, the function is actually called, because expr is f_1() and not f_1.
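A minimal sketch of the difference, assuming nothing beyond base R: the first call passes a bare function definition, the second passes the expression itself, which replicate re-evaluates each time, so no function wrapper is needed. The names fns and lt_3 and the seed are illustrative choices, not from the original post.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# A bare function definition: replicate() returns the functions themselves
fns <- replicate(n = 3, expr = function(x) runif(n = 2), simplify = FALSE)
is.function(fns[[1]])

# The expression itself: it is re-evaluated on every replication
lt_3 <- replicate(
  n = 50,
  expr = data.frame(x = replicate(n = 5, runif(n = 30, min = 20, max = 100))),
  simplify = FALSE
)
length(lt_3)    # number of data frames generated
dim(lt_3[[1]])  # rows and columns of the first one
```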

Related

How can I create a new vector command within a loop in R?

for (i in 1:100) {
e <- rnorm(n = 20, mean = 100, sd = 10)
}
e
So I want to know how I can compute something for each of the randomly generated vectors (each of length 20). E.g. how can I get a new result for each new random vector?
for (i in 1:100) {
e <- rnorm(n = 20, mean = 100, sd = 10)
new_vector <- mean(e) - median(e)
}
e
I have tried this but that's definitely not it.
In the OP's code, e is overwritten on every iteration, so only the last vector survives. To keep all the values, initialize e before the loop and append to it in each iteration:
e <- numeric(0)
for (i in 1:100) {
e <- c(e, rnorm(n = 20, mean = 100, sd = 10))
}
If we want to create a list
e <- vector('list', 100)
for(i in 1:100) {
e[[i]] <- rnorm(n = 20, mean = 100, sd = 10)
}
Or, if the goal is a vector holding the difference between mean and median, initialize new_vector with length 100, loop over the sequence 1:100, draw the random numbers into e, and assign mean(e) - median(e) to position i of new_vector:
new_vector <- numeric(100)
for(i in 1:100){
e <- rnorm(n = 20, mean = 100, sd = 10)
new_vector[i] <- mean(e) - median(e)
}
Or using lapply/sapply/replicate
lapply(replicate(100, rnorm(n = 20, mean = 100, sd = 10),
simplify = FALSE), function(x) mean(x) - median(x))
Or with vectorized functions, colMeans and colMedians (from matrixStats):
library(matrixStats)
m1 <- replicate(100, rnorm(n = 20, mean = 100, sd = 10))
colMeans(m1) - colMedians(m1)
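As a quick sanity check (a sketch, not part of the original answer), the loop and the matrix-based approach should agree when started from the same seed, since both draw the 100 vectors in the same order. The seed value and the name vectorised are arbitrary choices here, and base apply(m1, 2, median) stands in for colMedians so that no extra package is needed.

```r
set.seed(123)  # arbitrary seed
new_vector <- numeric(100)
for (i in seq_len(100)) {
  e <- rnorm(n = 20, mean = 100, sd = 10)
  new_vector[i] <- mean(e) - median(e)
}

set.seed(123)  # same seed, so the same random draws
m1 <- replicate(100, rnorm(n = 20, mean = 100, sd = 10))  # 20 x 100 matrix
vectorised <- colMeans(m1) - apply(m1, 2, median)

# Compare the two results up to floating-point tolerance
isTRUE(all.equal(new_vector, vectorised))
```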

Fitting data frame probability distributions with different lengths - EnvStats - looping in R

I'm trying to fit probability distributions in R using the EnvStats package, looping so that several columns are processed at once.
The columns have different lengths and the code fails with an error; the data frame does not remain in numeric format.
Error message: 'x' must be a numeric vector
I couldn't identify the cause. Could anyone help?
Many thanks.
The code follows:
x = runif(n = 50, min = 1, max = 12)
y = runif(n = 70, min = 5, max = 15)
z = runif(n = 35, min = 1, max = 10)
m = runif(n = 80, min = 6, max = 18)
length(x) = length(m)
length(y) = length(m)
length(z) = length(m)
df = data.frame(x=x,y=y,z=z,m=m)
df
library(EnvStats)
nproc = 4
cont = 1
dfr = data.frame(variavel = character(nproc),
locationevd= (nproc), scaleevd= (nproc),
stringsAsFactors = F)
# i = 2
for (i in 1:4) {
print(i)
nome.var=colnames(df)
df = df[,c(i)]
df = na.omit(df)
variavela = nome.var[i]
dfr$variavel[cont] = variavela
evd = eevd(df);evd
locationevd = evd$parameters[[1]]
dfr$locationevd[cont] = locationevd
scaleevd = evd$parameters[[2]]
dfr$scaleevd[cont] = scaleevd
cont = cont + 1
}
writexl::write_xlsx(dfr, path = "Results.xls")
Two major changes to your code:
First, use a list instead of a dataframe (so you can accommodate unequal vector lengths):
x = runif(n = 50, min = 1, max = 12)
y = runif(n = 70, min = 5, max = 15)
z = runif(n = 35, min = 1, max = 10)
m = runif(n = 80, min = 6, max = 18)
vl = list(x=x,y=y,z=z,m=m)
vl
if (!require(EnvStats)) { install.packages('EnvStats'); library(EnvStats) }
nproc = 4
# cont = 1 Not used
dfr = data.frame(variavel = character(nproc),
locationevd = numeric(nproc), scaleevd = numeric(nproc),
stringsAsFactors = FALSE)
Second, use a single loop index and drop the separate "cont" counter:
for ( i in 1:length(vl) ) {
# print(i) Not needed
nome.var=names(vl) # probably should have been done before loop
var = vl[[i]]
variavela = nome.var[i]
dfr$variavel[i] = variavela # all those could have been one step
evd = eevd( vl[[i]] ) # ;evd
locationevd = evd$parameters[[1]]
dfr$locationevd[i] = locationevd
scaleevd = evd$parameters[[2]]
dfr$scaleevd[i] = scaleevd
}
Which gets you the desired structure:
dfr
  variavel locationevd scaleevd
1        x    5.469831 2.861025
2        y    7.931819 2.506236
3        z    3.519528 2.040744
4        m   10.591660 3.223352
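One thing the answer does not spell out: in the question's loop, df = df[,c(i)] replaces the data frame with a single column, so from the second iteration on there is no data frame left to index. This may not be the only cause of the reported error. If the NA-padded data frame is kept, a sketch of the alternative is to extract each column into a separate variable and drop the NAs there. The names v and n_obs are illustrative, and the comment marks where a call such as eevd(v) would go.

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- runif(n = 50, min = 1, max = 12)
m <- runif(n = 80, min = 6, max = 18)
length(x) <- length(m)            # pads x with NA up to length 80
df <- data.frame(x = x, m = m)

n_obs <- integer(ncol(df))
for (i in seq_along(df)) {
  v <- as.numeric(na.omit(df[[i]]))  # clean copy of column i; df stays intact
  n_obs[i] <- length(v)              # an estimator such as eevd(v) would go here
}
n_obs              # non-missing observations per column
is.data.frame(df)  # df is still a data frame after the loop
```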

Creating a function to loop over data frame to create distributions of significant correlations in R

I have trouble creating a function that is too complex for my R knowledge and I'd appreciate any help.
I have a data set (DRC_epi) consisting of ~800,000 columns of epigenetic data. I'd like to randomly draw 1000 samples consisting of 500 column names each:
set.seed(42)
y <- replicate(1000, {
names(DRC_epi[, sample(ncol(DRC_epi), 500, replace = TRUE)])
})
I want to use these samples to select samples of a different data frame (DRC_epi_pheno) from which I want to create correlations with the outcome variable of my interest (phenotype_aas). So for the first sub sample it would look like this:
library(tidyverse)
library(correlation)
DRC_cor_sign_1 <- DRC_epi_pheno %>%
select(phenotype_aas, any_of(y[,1])) %>%
correlation(method = "spearman", p_adjust = "fdr") %>%
filter(Parameter1 %in% "phenotype_aas") %>%
filter(p <= 0.05) %>%
select(Parameter1, Parameter2, p)
From this result, I want to store the percentage of significant results in an object:
percentage <- data.frame()
percentage() <- length(DRC_cor_sign_1)/500*100
The question I have now is, how can I put it all together and automate it, so that I don't have to run the analyses 1000 times manually?
So that you have an idea of my data, I create here a toy data set that is similar to my real data set:
set.seed(42)
DRC_epi <- data.frame("cg1" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg2" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg3" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg4" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg5" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg6" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg7" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg8" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg9" = rnorm(n = 10, mean = 1, sd = 1.5),
"cg10" = rnorm(n = 10, mean = 1, sd = 1.5))
DRC_epi_pheno <- cbind(DRC_epi, phenotype_aas = sample(x = 0:40, size = 10, replace = TRUE))
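One possible way to automate this, offered as a rough sketch and not a definitive answer: wrap the per-sample analysis in a function and call it inside replicate. To stay self-contained, the sketch swaps the correlation package for base cor.test and p.adjust, so the adjusted p-values may differ from what correlation() reports. It also samples 5 of the 10 toy columns without replacement, where the question draws 500 with replacement. The helper name pct_significant is hypothetical. Note also that length() on a data frame counts columns, so nrow() would be needed to count significant results in the question's own code.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Toy data in the same shape as the question's example
DRC_epi <- as.data.frame(replicate(10, rnorm(n = 10, mean = 1, sd = 1.5)))
names(DRC_epi) <- paste0("cg", 1:10)
DRC_epi_pheno <- cbind(DRC_epi,
                       phenotype_aas = sample(x = 0:40, size = 10, replace = TRUE))

# Hypothetical helper: percentage of the given columns whose Spearman
# correlation with the outcome is significant after FDR adjustment
pct_significant <- function(cols, data, outcome = "phenotype_aas") {
  p <- vapply(cols, function(cn) {
    cor.test(data[[cn]], data[[outcome]],
             method = "spearman", exact = FALSE)$p.value
  }, numeric(1))
  mean(p.adjust(p, method = "fdr") <= 0.05, na.rm = TRUE) * 100
}

# 1000 random column subsets (5 of 10 here; 500 in the real data)
percentage <- replicate(
  1000,
  pct_significant(sample(names(DRC_epi), 5), DRC_epi_pheno)
)
summary(percentage)  # distribution of the percentage of significant results
```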

Why does lapply work when apply doesn't?

My data:
df_1 <- data.frame(
x = replicate(
n = 3,
expr = runif(n = 30, min = 20, max = 100)
),
y = sample(
x = 1:3, size = 30, replace = TRUE
)
)
The following code with lapply works:
lapply(X = names(df_1)[c(1:3)], FUN = function(x) {
pairwise.t.test(
x = df_1[, x],
g = df_1[['y']],
p.adj = 'bonferroni'
)
})
But with apply it doesn't:
apply(X = names(df_1)[c(1:3)], MARGIN = 2, FUN = function(x) {
pairwise.t.test(
x = df_1[, x],
g = df_1[['y']],
p.adj = 'bonferroni'
)
})
Error in apply(X = names(df_1)[c(1:3)], MARGIN = 2, FUN = function(x) { :
dim(X) must have a positive length
Why does this fail? Are the two not equivalent?
For apply you should instead use:
apply(X = df_1[1:3], MARGIN = 2, FUN = function(x) {
pairwise.t.test(
x = x,
g = df_1[['y']],
p.adj = 'bonferroni'
)
})
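That is because apply works over the margins of an array or matrix, so X must have a dim attribute (a data frame is converted with as.matrix first). From ?apply:
apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise.
In your attempt you pass names(df_1)[c(1:3)] to apply, which is a plain character vector with no dimensions: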
dim(names(df_1)[c(1:3)])[2]
#NULL
Hence, you get the error.
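A short check of this, assuming only base R: the character vector of names has no dim attribute, so apply stops with an error, while the data frame subset works. The seed, the names nm, err and col_means, and the use of mean and identity as placeholder functions are illustrative choices.

```r
set.seed(1)  # arbitrary seed for reproducibility
df_1 <- data.frame(
  x = replicate(n = 3, expr = runif(n = 30, min = 20, max = 100)),
  y = sample(x = 1:3, size = 30, replace = TRUE)
)

nm <- names(df_1)[1:3]
is.null(dim(nm))  # a character vector carries no dim attribute

# apply() on the names fails; capture the message without stopping the script
err <- tryCatch(apply(nm, 2, identity), error = function(e) conditionMessage(e))
err

# apply() on the data frame columns works (mean is only a placeholder FUN)
col_means <- apply(df_1[1:3], 2, mean)
col_means
```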

How to run a simulation using a for loop

I'm trying to run a simulation in a for loop. The call that runs the simulation is:
sim.predation(size = 30, n = 100, time = 100, handling.time = 2, draw.plot=FALSE)
I want to run a for loop over the n argument only, from 100 to 1000.
sapply(100:1000, function(x){
res <- sim.predation(size = 30, n=x, time = 100, handling.time = 2, draw.plot=FALSE)
return(res)
})
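Or, with a for loop, storing each result so it is not discarded:
n_values <- 100:1000
res <- vector("list", length(n_values))
for (i in seq_along(n_values)) {
res[[i]] <- sim.predation(size = 30, n = n_values[i], time = 100, handling.time = 2, draw.plot=FALSE)
}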
