Writing loops for multiple columns - r

DT <- as.data.frame(datafile)
for (i in 1:25){
  MER = DT[, 1]
  PER = DT[, (1+i)]
  model_i = assign(paste("lm", i, sep = ""), model_parameters(lm(PER ~ MER)))
  alpha = append(alpha, lm(PER ~ MER)$coefficient[1])
  t_alpha = append(t_alpha, model_i[1,7])
  if (model_i[1,7] > 1.96){
    indicator = append(indicator, 1)
  }else {
    indicator = append(indicator, 0)
  }
}
For a csv file (i.e. datafile), what is the benefit of using as.data.frame?
Is it using the second column, third column, ..., 26th column as y, regressing each on the first column (x)?
What is the model_i line doing? Does it create lm1, lm2, ..., lm25 and then assign them to model 1, model 2, ..., model 25? But lm and model are different names, so what does it assign?
How should we use the append function? Is it like append(name of the item, how to find this item)? If we have already appended it to the datafile, why do we need to store it on the left-hand side (i.e. alpha)?
Thank you very much for your help.
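A minimal sketch of what the loop in the question appears to do, rewritten with base R only so that it runs on its own. The toy data, the use of 3 response columns instead of 25, and summary() standing in for model_parameters() (assumed here to come from the parameters package, with its 7th column holding the t statistic) are all assumptions for illustration, not the asker's actual setup.

```r
set.seed(42)
# Toy stand-in for datafile: column 1 is x (MER), columns 2-4 are the y's (PER)
DT <- as.data.frame(matrix(rnorm(100 * 4), ncol = 4))

# append() returns a NEW vector and leaves its input untouched,
# which is why the result has to be stored back on the left-hand side
v <- c(1, 2)
append(v, 3)          # v itself still has length 2 after this call
v <- append(v, 3)     # now v has length 3

# assign() creates a variable from a string name and returns the value invisibly,
# so `m` and `lm1` below hold the same thing
m <- assign(paste("lm", 1, sep = ""), 10)

# The accumulators must exist before the loop starts
alpha <- numeric(0)
t_alpha <- numeric(0)
indicator <- numeric(0)

MER <- DT[, 1]
for (i in 1:3) {
  PER <- DT[, 1 + i]                                   # (i+1)-th column is the response
  fit <- lm(PER ~ MER)                                 # regress y on the first column
  t_int <- summary(fit)$coefficients[1, "t value"]     # t statistic of the intercept
  alpha <- append(alpha, unname(coef(fit)[1]))         # intercept estimate
  t_alpha <- append(t_alpha, t_int)
  indicator <- append(indicator, if (t_int > 1.96) 1 else 0)
}
```

After the loop, alpha, t_alpha and indicator each hold one entry per regression, in column order.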

Related

Append data to data frame in a loop, but only the last value made it into the data frame

I have a for loop and a while loop, each of which produces data after every iteration.
I want to collect all of the data in a data frame, but only the last value created by the loop ends up in it.
Here is the code; please suggest how to fix it:
df = data.frame(matrix(nrow = 350, ncol = 12))
kol<-1
for (x in 1:350) {
output <- c(paste0(x))
df[,1] = output
}
while (kol <= 223) {
if(kol < 224){
rowd1 <- c(paste("gen ",kol))
}
df[,2] = rowd1
kol = kol+1
}#while
while (kol <= 446) {
if(kol < 447){
rowd2 <- c(paste("gen ",kol))
}
df[,3] = rowd2
kol = kol+1
}
colnames(df) <- c("Kromosom", "A","B","C","D","E","F","G","H","I","J","K")
df
So, I will update the question I posed.
What if the code becomes like this? The problem is now with the row:
...
for (x in 1:350) {
output <- c(paste0(x))
df[x,1] = output
}
for (x2 in 1:223) {
output2 <- c(paste("Gen ",x2))
df[1,2:224] = output2
} # why does only the last value, "Gen 223", come out?
...
For a data.frame df, df[,n] is the entire n-th column. So you are setting the entire column at each step. In your code, use
df[x, 1] = output
for example, to set the value for a single row.
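To make the difference concrete, here is a small sketch on a 5-row data frame (the size and the names df_col, df_row and df_vec are made up for illustration). It contrasts assigning to the whole column, assigning to one row at a time, and skipping the loop altogether.

```r
# Assigning to df[, 1] inside the loop overwrites the WHOLE column each time,
# so only the last value survives
df_col <- data.frame(matrix(nrow = 5, ncol = 2))
for (x in 1:5) {
  df_col[, 1] <- paste0(x)
}

# Assigning to df[x, 1] touches only row x
df_row <- data.frame(matrix(nrow = 5, ncol = 2))
for (x in 1:5) {
  df_row[x, 1] <- paste0(x)
}

# paste0() is vectorized, so the loop is not needed at all
df_vec <- data.frame(matrix(nrow = 5, ncol = 2))
df_vec[, 1] <- paste0(1:5)
```

Here df_col ends up with the same value in every row, while df_row and df_vec hold one distinct value per row.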
SOLVED
Special thanks to @JamesHirschorn, I really appreciate your help in resolving the problem. And thanks to the people who gave feedback.
To do:
"I would probably stay away from while here. Why not just use for instead?" – GuedesBF
Use df[x, 1] = output to set the value for each row in the 1st column.
Code:
df = data.frame(matrix(nrow = 350, ncol = 224))
for (x in 1:350) {
output <- c(paste(x))
df[x,1] = output
}
for (x2 in 2:224) {
output2 <- c(paste("Gen ",x2-1))
df[1,x2] = output2
}
colnames(df) <- c("Kromosom", "A","B","C","D","E","F","G","H","I","J","K", ...)
df
So, wish me luck in the future.

Add column to a database generated with paste0 in for loop in R

I have 76 data frames (288 rows and 13 columns each) called dataset_id_1, ..., dataset_id_76, generated through a for loop and a list:
List = list()
for (i in 1:76) {
List[[i]] = subset(volping, id == i, select = -c(BetaCorrect, AlphaCorrect, AR, V_Ri2))
}
names(List) = paste0('dataset_id_',1:length(List))
list2env(List,envir = .GlobalEnv)
After this step, I did an additional calculation to find a parameter (alpha) for each of the 76 data frames. I therefore created a list called alpha of length 76, with one value per data frame. Now I want to add a column to each data frame (1, 2, 3, ..., 76) holding its parameter from the list. However, I cannot add the column with the usual $ or [ ] because it raises an error. To be clear, this is what I did:
n = c(1:76)
alpha = list()
beta = list()
for (i in n) {
dataset_regr = subset(get(paste0('dataset_id_',i)), TIME >= 740 & TIME < 991)
y = dataset_regr$V_Ri-dataset_regr$V_Rf
x = dataset_regr$V_Rm-dataset_regr$V_Rf
regr = lm(y~x)
alpha[i] = regr$coefficients[1]
beta[i] = regr$coefficients[2]
(get(paste0('dataset_id_',i)))$AlphaCorrect = alpha[i] #I get an error here
}
The generated error is:
(get(paste0("dataset_id_", i)))$AlphaCorrect = alpha[1] :
target of assignment expands to non-language object
Do you know how to solve this issue?
Thank you for your support.
This is exactly why you should not have 76 data frames in your global environment: they are very difficult to manage. Use lists instead.
lapply(split(volping, volping$id), function(tmp) {
y = tmp$V_Ri-tmp$V_Rf
x = tmp$V_Rm-tmp$V_Rf
regr = lm(y~x)
alpha = regr$coefficients[1]
beta = regr$coefficients[2]
tmp$AlphaCorrect <- alpha
return(tmp)
}) -> result
If you still want to continue having separate dataframes you could use the code that you already have.
names(result) = paste0('dataset_id_',1:length(result))
list2env(result, envir = .GlobalEnv)
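For completeness, a sketch of why the original line fails and how it could be worked around without switching to a list. The error arises because get(...) returns a copy of the object, not something that can appear on the left of an assignment. Fetching the copy, modifying it, and writing it back with assign() should work; the one-row toy data and the value 0.5 below are invented for illustration.

```r
# Toy stand-ins for one of the 76 data frames and the alpha list
dataset_id_1 <- data.frame(V_Ri = c(0.1, 0.2, 0.3))
alpha <- list(0.5)

i <- 1
nm <- paste0("dataset_id_", i)   # name of the data frame as a string

tmp <- get(nm)                   # fetch a copy by name
tmp$AlphaCorrect <- alpha[[i]]   # [[ ]] extracts the number; [ ] would give a list
assign(nm, tmp)                  # write the modified copy back under the same name
```

This keeps the separate data frames, but the list-based approach above is usually easier to maintain.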

Nested apply with multiple parameters

I would like to use the apply family instead of a for loop.
My for loop is nested and contains several vectors and a list, for which I am unsure how to input as parameters with apply.
Codes <- c("A","B","C")
Samples <- c("A","A","B","B","B","C")
Samples_Names <- c("A1","A2","B1","B2","B3","C1")
Samples_folder <- c("Alpha","Alpha","Beta","Beta","Beta","Charlie")
Df <- list(data.frame(T1 = c(1,2,3)), data.frame(T1 = c(1,2,3)), data.frame(T1 = c(1,2,3)))
for (i in 1:length(Codes)){
for (j in 1:length(Samples)) {
if(Codes[i] == Samples[j]) {
write_csv(Df[[i]], path = paste0(Working_Directory,Samples_folder[j],"/",Samples_Names[j],".csv"))
}
}
}
This writes A1 and A2 to Alpha; B1, B2 and B3 to Beta; and C1 to Charlie.
Since you are looking to just use write_csv, we can use pwalk from purrr to accomplish this over the three equal size vectors. No need to include the loop on Codes, as for each iteration in the apply we can write_csv the dataset corresponding to where Samples is found in Codes.
I shortened Working_Directory to WD.
library(purrr)
library(readr) # provides write_csv
pwalk(list(Samples, Samples_folder, Samples_Names),
function(x, y, z) write_csv(Df[[match(x, Codes)]], path = paste0(WD, y, "/", z, ".csv")))
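The key step is match(Samples, Codes), which turns each sample into the index of its data frame and so replaces the nested loop. A base R sketch of the same idea follows, using Map() and write.csv() instead of pwalk() and write_csv(). The output directory under tempdir() is an assumption made so the example can run anywhere.

```r
Codes <- c("A", "B", "C")
Samples <- c("A", "A", "B", "B", "B", "C")
Samples_Names <- c("A1", "A2", "B1", "B2", "B3", "C1")
Samples_folder <- c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Charlie")
Df <- list(data.frame(T1 = 1:3), data.frame(T1 = 1:3), data.frame(T1 = 1:3))

# Hypothetical output location; replace with your own working directory
out_dir <- file.path(tempdir(), "apply_demo")

# For each sample, the position of its code in Codes (i.e. which Df element to write)
idx <- match(Samples, Codes)

# One target path per sample
paths <- file.path(out_dir, Samples_folder, paste0(Samples_Names, ".csv"))

# Create the folders first; write.csv does not create missing directories
for (d in unique(dirname(paths))) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Walk over the index and path vectors in parallel
invisible(Map(function(k, p) write.csv(Df[[k]], p, row.names = FALSE), idx, paths))
```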

How to apply a function that gets data from different data frames and has conditions in it?

I have a custom function (psup2) that gets data from a data frame and returns a result. The problem is that it takes a while, since I am using a for loop that runs over every row and column.
Input:
I have a table that contains the ages (table_customers), an n*m matrix of different terms, and two different mortality tables (for males and females).
The mortality tables I'm using contain one column for ages and another for the corresponding survival probabilities.
Output:
I want to create a separate data frame with the same size as the term table. For each cell, the code picks the mortality table that matches the customer's gender and applies psup2, taking the ages from table_customers and the terms from the terms matrix.
So far I have only managed a very inefficient way to do this, but hopefully one of the functions from the apply family could make it faster.
The following code shows the idea of what I am trying to do:
#Function
psup2 <- function(x, age, term) {
P1 = 1
for (i in 1:term) {
P <- x[age + i, 2]
P1 <- P1*P
}
return(P1)
}
#Inputs
terms <- data.frame(V1 = c(1,2,3), V2 = c(1,3,4), V3 = c(2,3,4))
male<- data.frame(age = c(0,1,2,3,4,5), probability = c(0.9981,0.9979,0.9978,.994,.992,.99))
female <- data.frame(age = c(0,1,2,3,4,5), probability = c(0.9983,0.998,0.9979,.9970,.9964,.9950))
table_customers <- data.frame(id = c(1,2,3), age = c(0,0,0), gender = c(1,2,1))
#Loop
output <- data.frame(matrix(NA, nrow = 3, ncol = 0))
for (i in 1:3) {
for (j in 1:3) {
prob <- ifelse(table_customers[j, 3] == 1,
psup2(male, as.numeric(table_customers[j, 2]), as.numeric(terms[j,i])),
psup2(female, as.numeric(table_customers[j, 2]), as.numeric(terms[j,i])))
output[j, i] <- prob
}
}
Your psup2 function can be simplified to:
psup2 <- function(x, age, term) { prod(x$probability[age+(1:term)]) }
So actually, we won't use it; we'll use the formula directly.
We'll put your male and female data frames side by side, so we can use the value of the gender column to choose one or the other.
mf <- merge(male,female,by="age") # assuming you have the same ages on both sides
input_df <- cbind(table_customers,terms)
output <- t(apply(input_df,1,function(x){sapply(1:3,function(i){prod(mf[x["age"]+(1:x[3+i]),x["gender"]+1])})}))
And that's it :)
The sapply function is used to loop on the columns of terms.
x["age"]+(1:x[3+i]) are the indices of the rows you want to multiply
x["gender"]+1 is the relevant column of the mf data.frame
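As a quick sanity check of the simplification, the sketch below compares the original loop with the prod() version on the male table from the question. The function names psup2_loop and psup2_vec are made up here to keep both versions side by side.

```r
male <- data.frame(age = c(0, 1, 2, 3, 4, 5),
                   probability = c(0.9981, 0.9979, 0.9978, 0.994, 0.992, 0.99))

# Original approach: multiply the probabilities one row at a time
psup2_loop <- function(x, age, term) {
  P1 <- 1
  for (i in 1:term) {
    P1 <- P1 * x[age + i, 2]
  }
  P1
}

# Vectorized approach: select all the rows at once and take their product
psup2_vec <- function(x, age, term) {
  prod(x$probability[age + seq_len(term)])
}

# Both versions should agree for any valid age and term
same <- all.equal(psup2_loop(male, 0, 3), psup2_vec(male, 0, 3))
```

Using seq_len(term) rather than 1:term is a small defensive choice: it yields an empty index when term is 0, whereas 1:0 counts backwards.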

For i loop, calling different dataframes

I'm new to loops and I have a problem with calling a variable from the i-th data frame.
I'm able to call each data frame correctly, but problems arise when I try to call a specific variable inside each data frame:
Example:
for (i in 1:15) {
assign(
paste("model", i, sep = ""),
(lm(response ~ variable, data = eval(parse(text = paste("data", i, sep = "")))))
)
plot(data[i]$response, predict.lm(eval(parse(text = paste("model", i, sep = ""))))) #plot obs vs preds
}
Here I'm fitting a simple one-variable linear model 15 times, which works just fine. Problems arise when I try to plot the results. How should I refer to the response column of the i-th data frame?
Let's say there are multiple data frames named data1 ... data15 and that no other objects have names beginning with "data". Let's also assume that each of those data frames has columns named 'response' and 'variable'. Then this would gather the data frames into a list and draw a separate plot of the linear regression line for each.
dlist <- lapply(ls(pattern = '^data'), get)
lapply(dlist, function(df) {
  plot(NA, xlim = range(df$variable), ylim = range(df$response))  # empty canvas
  abline(coef(lm(response ~ variable, data = df)))  # fitted regression line
})
If you wanted to name the dataframes in that list, you could use your paste code to supply names:
names(dlist) <- paste("data", i, sep = "")
There are many other assignments you could make in the context of this loop, but you would need to describe the desired results better than with failed efforts.
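One way to gather the numbered data frames into a named list, sketched here with three toy data frames instead of fifteen, is mget(), which looks up several objects by name at once and keeps those names on the result. Building the names with paste0() also keeps them in numeric order, whereas ls() sorts alphabetically (data1, data10, data11, ...). The simulated data and the slopes summary are assumptions for illustration.

```r
set.seed(1)
# Toy stand-ins for data1 ... data15
data1 <- data.frame(variable = 1:10, response = 2 * (1:10) + rnorm(10))
data2 <- data.frame(variable = 1:10, response = 3 * (1:10) + rnorm(10))
data3 <- data.frame(variable = 1:10, response = -1 * (1:10) + rnorm(10))

# Named list of data frames, in the order of the names supplied
dlist <- mget(paste0("data", 1:3))

# Fit one model per data frame; no assign() or eval(parse()) needed
models <- lapply(dlist, function(df) lm(response ~ variable, data = df))

# Pull out the slope of each model as a named numeric vector
slopes <- vapply(models, function(m) unname(coef(m)[2]), numeric(1))
```

With the list in place, the response of the i-th data frame is simply dlist[[i]]$response.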
Here's modified code that should work. It fits a one-variable lm model, calculates the correlation of predicted and observed values, and stores it in a matrix that starts out empty. It also plots these values.
Thanks Thomas for help.
par(mfrow=c(4,5))
results.matrix <- matrix(NA, nrow = length(datalist), ncol = 2) # one row per data frame
colnames(results.matrix) <- c("Subset","Correlation")
for (i in seq_along(datalist)) {
  model <- lm(response ~ variable, data = datalist[[i]])
  pred <- predict.lm(model) # fitted values
  ct <- cor.test(pred, datalist[[i]]$response) # named ct so that cor() is not masked
  plot(pred, datalist[[i]]$response, xlab="pred", ylab="obs")
  results.matrix[i, 1] <- i
  results.matrix[i, 2] <- ct$estimate
}

Resources