Using user-defined "for loop" function to construct a data frame - r

I am trying to generate a data frame based on a user-defined function. My problem is that in the output only the first row is being filled.
Here is an example of the function I am using:
df <- data.frame(cs=rep(c("T1","T2","T3","T4"),each=16),yr=rep(c(1:4), times = 4, each = 4))
sp.df <- data.frame(matrix(sample.int(100, size = 20*64,replace=T), ncol = 20, nrow = 64))
myfunc<-function(X, system, Title)
{
for(i in 1:4){
Col_T <- data.frame(matrix(NA, ncol = length(X), nrow = 4))
Col_T[i,] <- colSums(X[which(df$yr==i & df$cs==system),])
return (Col_T)}}
myfunc(X=sp.df, system="T1", Title="T1")
I would welcome any suggestion to resolve this issue.
Thank you very much.

There are two problems with the function:
You're overwriting Col_T with all NAs as the first statement inside the for loop.
You're returning from the function inside the for loop.
Rewrite it as follows:
myfunc <- function(X, system, Title ) {
Col_T <- data.frame(matrix(NA, ncol=length(X), nrow=4 ));
for (i in 1:4)
Col_T[i,] <- colSums(X[which(df$yr==i & df$cs==system),]);
return(Col_T);
};

Related

Store values from a loop

I am simulating dice throws, and would like to save the output in a single object, but cannot find a way to do so. I tried looking here, here, and here, but they do not seem to answer my question.
Here is my attempt to assign the result of a 20 x 3 trial to an object:
set.seed(1)
Twenty = for(i in 1:20){
trials = sample.int(6, 3, replace = TRUE)
print(trials)
i = i+1
}
print(Twenty)
What I do not understand is why I cannot recall the function after it is run?
I also tried using return instead of print in the function:
Twenty = for(i in 1:20){
trials = sample.int(6, 3, replace = TRUE)
return(trials)
i = i+1
}
print(Twenty)
or creating an empty matrix first:
mat = matrix(0, nrow = 20, ncol = 3)
mat
for(i in 1:20){
mat[i] = sample.int(6, 3, replace = TRUE)
print(mat)
i = i+1
}
but they seem to be worse (as I do not even get to see the trials).
Thanks for any hints.
There are several things wrong with your attempts:
1) A loop is not a function nor an object in R, so it doesn't make sense to assign a loop to a variable
2) When you have a loop for(i in 1:20), the loop will increment i so it doesn't make sense to add i = i + 1.
Your last attempt implemented correctly would look like this:
mat <- matrix(0, nrow = 20, ncol = 3)
for(i in 1:20){
mat[i, ] = sample.int(6, 3, replace = TRUE)
}
print(mat)
I personally would simply do
matrix(sample.int(6, 20 * 3, replace = TRUE), nrow = 20)
(since all draws are independent and with replacement, it doesn't matter if you make 3 draws 20 times or simply 60 draws)
Usually, in most programming languages one does not assign objects to for loops as they are not formally function objects. One uses loops to interact iteratively on existing objects. However, R maintains the apply family that saves iterative outputs to objects in same length as inputs.
Consider lapply (list apply) for list output or sapply (simplified apply) for matrix output:
# LIST OUTPUT
Twenty <- lapply(1:20, function(x) sample.int(6, 3, replace = TRUE))
# MATRIX OUTPUT
Twenty <- sapply(1:20, function(x) sample.int(6, 3, replace = TRUE))
And to see your trials, simply print out the object
print(Twenty)
But since you never use the iterator variable, x, consider replicate (wrapper to sapply which by one argument can output a matrix or a list) that receives size and expression (no sequence inputs or functions) arguments:
# MATRIX OUTPUT (DEFAULT)
Twenty <- replicate(20, sample.int(6, 3, replace = TRUE))
# LIST OUTPUT
Twenty <- replicate(20, sample.int(6, 3, replace = TRUE), simplify = FALSE)
You can use list:
Twenty=list()
for(i in 1:20){
Twenty[[i]] = sample.int(6, 3, replace = TRUE)
}

Adding variable name to column in for statement

I have multiple columns in a table called "Gr1","Gr2",...,"Gr10".
I want to convert the class from character to integer. I want to do it in a dynamic way, I'm trying this, but it doesn't work:
for (i in 1:10) {
Col <- paste0('Students1$Gr',i)
Col <- as.integer(Col)
}
My objective here is to know how to add dynamically the for variable to the name of a column. Something like:
for (i in 1:10) {
Students1$Gr(i) <- as.integer(Students1$Gr(i))
}
Any idea is welcome.
Thank you very much,
Matias
# Example matrix
xm <- matrix(as.character(1:100), ncol = 10);
colnames(xm) <- paste0('Gr', 1:10);
# Example data frame
xd <- as.data.frame(xm, stringsAsFactors = FALSE);
# For matrices, this works
xm <- apply(X = xm, MARGIN = 2, FUN = as.integer);
# For data frames, this works
for (i in 1:10) {
xd[ , paste0('Gr', i)] <- as.integer(xd[ , paste0('Gr', i)]);
}

Using "global" variables in function given to apply. We could see them, but could not assign values

Consider the following R-code:
i <- 10
j <- 0
m <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2, byrow = TRUE)
apply(m, 1, function(x) {
if(i > 0)
j<-1
else
j<-2
return <- i
})
Inside the function we can read the content of the variable i, but we cannot manipulate the content of the variable j? But when we can read variables inside the function I would expect that we can assign new values to them too? So can someone explain what is happening here?

How to extract the p.value and estimate from cor.test() in a data.frame?

In this example, I have temperatures values from 50 different sites, and I would like to correlate the Site1 with all the 50 sites. But I want to extract only the components "p.value" and "estimate" generated with the function cor.test() in a data.frame into two different columns.
I have done my attempt and it works, but I don't know how!
For that reason I would like to know how can I simplify my code, because the problem is that I have to run two times a Loop "for" to get my results.
Here is my example:
# Temperature data
data <- matrix(rnorm(500, 10:30, sd=5), nrow = 100, ncol = 50, byrow = TRUE,
dimnames = list(c(paste("Year", 1:100)),
c(paste("Site", 1:50))) )
# Empty data.frame
df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="")
# Extraction
for (i in 1:50) {
df1 <- cor.test(data[,1], data[,i] )
df[,2:3] <- df1[c("estimate", "p.value")]
}
for (i in 1:50) {
df1 <- cor.test(data[,1], data[,i] )
df[i,2:3] <- df1[c("estimate", "p.value")]
}
df
I will appreciate very much your help :)
I might offer up the following as well (masking the loops):
result <- do.call(rbind,lapply(2:50, function(x) {
cor.result<-cor.test(data[,1],data[,x])
pvalue <- cor.result$p.value
estimate <- cor.result$estimate
return(data.frame(pvalue = pvalue, estimate = estimate))
})
)
First of all, I'm guessing you had a typo in your code (you should have rnorm(5000 if you want unique values. Otherwise you're going to cycle through those 500 numbers 10 times.
Anyway, a simple way of doing this would be:
data <- matrix(rnorm(5000, 10:30, sd=5), nrow = 100, ncol = 50, byrow = TRUE,
dimnames = list(c(paste("Year", 1:100)),
c(paste("Site", 1:50))) )
# Empty data.frame
df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="")
estimates = numeric(50)
pvalues = numeric(50)
for (i in 1:50){
test <- cor.test(data[,1], data[,i])
estimates[i] = test$estimate
pvalues[i] = test$p.value
}
df$Estimate <- estimates
df$P.value <- pvalues
df
Edit: I believe your issue was is that in the line df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="") if you do typeof(df$Estimate), you see it's expecting an integer, and typeof(test$estimate) shows it spits out a double, so R doesn't know what you're trying to do with those two values. you can redo your code like thus:
df <- data.frame(label=paste("Site", 1:50), Estimate=numeric(50), P.value=numeric(50))
for (i in 1:50){
test <- cor.test(data[,1], data[,i])
df$Estimate[i] = test$estimate
df$P.value[i] = test$p.value
}
to make it a little more concise.
similar to the answer of colemand77:
create a cor function:
cor_fun <- function(x, y, method){
tmp <- cor.test(x, y, method= method)
cbind(r=tmp$estimate, p=tmp$p.value) }
apply through the data.frame. You can transpose the result to get p and r by row:
t(apply(data, 2, cor_fun, data[, 1], "spearman"))

R apply ecdf to every column

I am trying to get the ecdf() of each column c in a data frame and then feed the rows r of a column into that column's ecdf(c) to return the corresponding value ecdf(c)(r) per cell. The function below works.
getecdf <- function(data){
# initialise result data
result <- data.frame(matrix(NA, nrow = nrow(data), ncol = ncol(data)));
rownames(result) <- rownames(data); colnames(result) <- colnames(data);
for (c in 1:ncol(data)){
col <- data[,c];
cum <- ecdf(col)
for (r in 1: length(col)){
result[r,c] <- cum(col[r])
}
}
return <- result
}
cum_matrix <- getecdf(iris)
However, I am 99.5% sure there is a one liner code that includes apply to do this.
I tried:
apply(iris, 2, function(x) for (r in x){ ecdf(x)(r)})
but don't know how to store the results.
Any help?
ecdf is vectorized, you can use
apply(iris[, 1:4], 2, function(c) ecdf(c)(c))

Resources