Writing a for loop in R

I don't know how to write for-loops in R. Here is what I want to do:
I have a df called "na" with 50 columns (ana1_1:ana50_1). I want to loop these commands over all columns. Here are the commands for the first two columns (ana1_1 and ana2_1):
t<-table(na$ana1_1)
ana1_1<-capture.output(sort(t))
cat(ana1_1,file="ana.txt",sep="\n",append=TRUE)
t<-table(na$ana2_1)
ana2_1<-capture.output(sort(t))
cat(ana2_1,file="ana.txt",sep="\n",append=TRUE)
After the loop, all tables (ana1_1:ana50_1) should be written to ana.txt. Does anyone have an idea how to solve this? Thank you very much!

One approach is to loop through the columns with lapply, using the same code as in the question:
invisible(lapply(na, function(x) {
x1 <- capture.output(sort(table(x)))
cat(x1, file='ana.txt', sep="\n", append=TRUE)
}))
The call is wrapped in invisible() so that the list of NULL values returned by lapply (cat() returns NULL) is not printed in the R console.
We can also wrap it in a check for whether the file already exists, so that accidentally running the code again does not append the same lines a second time.
if(!file.exists('ana.txt')){
invisible( lapply(na, function(x) {
x1 <- capture.output(sort(table(x)))
cat(x1, file='ana.txt', sep="\n", append=TRUE)
}))
}

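Here is a solution with a for loop. Loops are often described as slow in R, so people tend to prefer other solutions (e.g. the great answer provided by akrun), although for a task like this the time is spent in table() and in writing the file, not in the loop itself. This answer is meant to help you understand the loop syntax: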
for(i in 1:50){
t1<-table(na[,i])
t2<-capture.output(sort(t1))
cat(t2,file="ana.txt",sep="\n",append=TRUE)
}
We loop i from 1 to 50 (first line). There are two ways to select a column (actually more than two, but that's for another time): na$ana1_1 and na[,1] both select the first column (second line). The first refers to the column by name, the second by index; here the index is more convenient. The rest is your desired calculation.
Be aware that cat creates ana.txt if it does not exist yet and appends to it if it already exists.
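A variant of the loop, sketched under the assumption that you also want each column's name written above its table. Opening the file once with file(..., "w") truncates it on every run, so re-running does not duplicate lines (an alternative to the file.exists() check). The toy na data frame and the temporary path are hypothetical stand-ins.

```r
# Hypothetical toy stand-in for the question's data frame `na`
na <- data.frame(ana1_1 = c("a", "b", "a"), ana2_1 = c("x", "x", "y"))

path <- file.path(tempdir(), "ana.txt")  # use "ana.txt" for the real case
out <- file(path, open = "w")            # "w" starts the file fresh on each run
for (col in names(na)) {
  writeLines(col, out)                                    # column name as a header line
  writeLines(capture.output(sort(table(na[[col]]))), out) # the sorted table itself
}
close(out)
```

Writing through one open connection also avoids reopening the file 50 times, which cat(..., append = TRUE) does.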

Related

Run a piece of code several times changing certain parameters in R (like a macro in SAS)

Trying to learn the ropes in R, and I'm already struggling to find a replacement for SAS macros.
I'm trying to run a piece of code several times, but I'm having a hard time, so I came here for help.
First, I'm working with this example file, which has a variable giving the number of rows I previously analyzed in another file (qtde_registros), followed by three variables giving the number of rows that had each type of error.
file <- readRDS(file="file.Rda")
file
  qtde_registros error1 error2 error3
1           1175      0      0      0
After that, I created a list with the errors and another one with the description of each one of them.
Then, using those lists and the file mentioned above, I want to create several files (one for each error) that will later be bound together into one last file to form a final report.
As I said, I'm struggling with this, so here is example code showing how the first file would be built:
library(dplyr) # for select()
error_list <- list("Error1","Error2","Error3")
description_list <- list("Code not found",
"Invalid date.",
"Negative value.")
error1 <- file
error1$file_name <- "Clients"
error1$error <- error_list[[1]]
error1$qtde <- error1$error1
error1$desc <- description_list[[1]]
error1 <- select(error1, file_name, error, qtde, desc)
error1
  file_name  error qtde           desc
1   Clients Error1    0 Code not found
And that leads to my question: how can I make the code above run several times, once for each error in my list?
I'm aware that this whole mindset may not be the best, since the way to do certain things differs between languages, but I have to work with the knowledge I have at the moment.
I'm thinking of using the apply family of functions, but I didn't manage to work it out.
Thanks in advance for the help, and sorry for any errors in typing or grammar (English is not my first language).
EDIT: I forgot to say that I don't intend to do this with a for or while loop.
In R (as in many other languages), running the same code several times means using some form of loop. R also has several wrappers around loops, each with a specific kind of output, in the *apply family. Here's a short (incomplete) list of the *apply functions and their input/output:
lapply -> list output
sapply -> list or atomic vector (integer vector, numeric vector, etc.)
mapply -> similar to sapply, but takes more than one input to iterate over (e.g. when you have two things to loop over simultaneously)
tapply -> loop over groups defined by INDEX
apply -> loop over the rows or columns of an array or matrix; returns a matrix/vector
And so on.
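To make the list above concrete, here is a small sketch of what each function returns for toy inputs (the values are made up purely for illustration):

```r
x <- list(a = 1:3, b = 4:6)

l  <- lapply(x, sum)                          # list: $a = 6, $b = 15
s  <- sapply(x, sum)                          # named vector: a = 6, b = 15
m  <- mapply(function(p, q) p + q, 1:3, 4:6)  # vector: 5 7 9
tp <- tapply(c(10, 20, 30, 40),               # sum per group:
             c("g1", "g2", "g1", "g2"), sum)  #   g1 = 40, g2 = 60
r  <- apply(matrix(1:6, nrow = 2), 1, sum)    # row sums: 9 12
```

The only difference between lapply and sapply here is the shape of the result: sapply simplified the list of single numbers into a vector.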
I am guessing that your example is incomplete, but I'll show three examples to get you started: one using a for-loop, one using lapply and one using mapply.
for-loop
A for-loop is the classic method (found in most programming languages). It has the form for (i in seq) { ... }, where seq is the thing to iterate over. This could be error_list itself, or a numeric vector such as seq(1, n) or 1:n. Here you have more than one thing to iterate over (errors and descriptions), so a numeric index makes sense: we use it to subset both lists.
errors <- list() # <== Somewhere to put our results
for(i in seq_along(error_list)){ # seq_along also behaves correctly if error_list is empty
error_i <- list(file = file,
file_name = "Clients",
error = error_list[[i]], # Use i to subset error_list
qtde = error_list[[i]], # Maybe this should be something else in your case
desc = description_list[[i]]
)
# Put into our errors list. Create "error1" using paste and our index
errors[[paste0('error', i)]] <- error_i
}
By the end, all of your results will be in the errors list and can be extracted using errors[[1]] or errors[["error1"]] (change the number to your error). Once each element is a one-row data.frame rather than a list, they can be combined using do.call(rbind, errors) and saved using write.table (or write.csv or similar).
lapply
For the *apply family, the *apply function takes care of the looping, but we have to provide a function to execute in each iteration (a macro in SAS terms). So we wrap the contents of the loop in a function:
macro <- function(i){
list(file = file,
file_name = "Clients",
error = error_list[[i]], # Use i to subset error_list
qtde = error_list[[i]], # Maybe this should be something else in your case?
desc = description_list[[i]]
)
}
errors <- lapply(seq_along(error_list), macro)
#set names afterwards
names(errors) <- paste0("error", seq_along(error_list))
And once again the data is ready to be extracted, saved, etc. This is equivalent to:
errors <- list()
for(i in seq_along(error_list))
errors[[i]] <- macro(i)
names(errors) <- paste0("error", seq_along(error_list))
mapply
Now, in your case you have more than one thing to iterate over. An alternative is to use mapply and pass these as arguments to your function instead. This way we remove error_list[[i]] and description_list[[i]] from the function body and add error and description as parameters:
macro_mapply <- function(error, description){
list(file = file,
file_name = "Clients",
error = error, # No need to use i here anymore
qtde = error, # Maybe this should be something else in your case?
desc = description
)
}
errors <- mapply(macro_mapply,
# parameters to iterate over comes after function
error = error_list,
description = description_list,
# Avoid simplification (if we want a list returned)
SIMPLIFY = FALSE)
names(errors) <- paste0("error", seq_along(error_list))
Note that mapply will try to simplify the result to a vector or matrix if possible, so I set SIMPLIFY = FALSE to get a list back.
Things to note:
In the three examples above I have not accounted for reading multiple files or for any other parameters changing. If you have to read a file in each iteration, it makes sense to go with the first two examples and add readRDS to the loop or function with appropriate file naming. Also, I have used your data, but I'm guessing qtde and error should be different in your specific case; this is not clear from your example.
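Following up on that last point: if, as in your first example, qtde should come from the matching errorN column of file, a sketch that builds one row per error and binds them into the final report could look like this. The lookup via tolower(error) assumes the column names are the lower-case versions of the labels in error_list, and the one-row file below is a stand-in for readRDS("file.Rda").

```r
# Hypothetical stand-in for readRDS("file.Rda"), matching the printed example
file <- data.frame(qtde_registros = 1175, error1 = 0, error2 = 0, error3 = 0)

error_list <- list("Error1", "Error2", "Error3")
description_list <- list("Code not found", "Invalid date.", "Negative value.")

# One-row data frame per error; qtde is looked up in the matching column
# (assumes "Error1" -> column "error1", and so on)
make_row <- function(error, description) {
  data.frame(file_name = "Clients",
             error     = error,
             qtde      = file[[tolower(error)]],
             desc      = description)
}

# Map() is mapply(..., SIMPLIFY = FALSE); rbind stacks the rows into one report
report <- do.call(rbind, Map(make_row, error_list, description_list))
```

The report can then be saved with, for example, write.csv(report, "report.csv", row.names = FALSE), where the file name is only an illustration.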
Once you've got the hang of your first loops and roughly understand how the *apply functions work, I would suggest checking out the tidyverse, which provides what many find a more user-friendly and intuitive interface for data transformation.
I hope this helps you get started on solving your problem.

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc.
Each of them holds the correct output from the function (I've checked). My next step is to combine them into one big vector that holds each of these variables as a separate element. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can setup the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've been trying to list all the variable names and somehow get the output into a format that can be passed straight to the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
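A sketch of two ways around this, assuming the function returns one value per id. fetch_id() below is a hypothetical stand-in for "FUNCTION GOES HERE", and url_num is shortened. The first keeps the assign() loop and collects the variables by name with mget(), so the names are generated from url_num instead of being typed out; the second skips the separate variables and stores the results in a list directly, which is usually the simpler route.

```r
url_num <- c('85054655', '85023543', '85001177')  # shortened from the question

# Hypothetical stand-in for "FUNCTION GOES HERE"
fetch_id <- function(id) paste0("result_", id)

# Option 1: keep the assign() loop, then fetch the variables by name
for (id in url_num) {
  assign(paste0("test_", id), fetch_id(id))
}
combined <- unlist(mget(paste0("test_", url_num)), use.names = FALSE)

# Option 2: no separate variables; one list element per id
results   <- lapply(url_num, fetch_id)
combined2 <- unlist(results)
```

Either way, adding an id to url_num updates the combined vector automatically. If each call returns more than one value and you want to keep them apart, keep results as a list instead of unlisting it.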

Adding columns to multiple dataframes via loop

I have several dataframes and would like to add columns with a loop. At the moment the code looks like this:
FR1$MONTH<-'2015-01'
FR2$MONTH<-'2015-02'
FR3$MONTH<-'2015-03'
FR4$MONTH<-'2015-04'
I have tried the following:
for (i in 1:12) {
assign(paste("FR",i,$,"MONTH",sep=""),paste("2015-",i,sep=""))
}
Unfortunately it doesn't work.
Can anybody tell me what is wrong with my attempt, or even better, how to do this properly? I suspect a loop isn't the best solution.
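Well, one issue that throws an error is that the '$' must be inside quotes in the first paste() call. Even then, assign() would create a new variable literally named "FR1$MONTH" rather than adding a column to FR1, because assign() takes a variable name, not an expression.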
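I would try, however, the following within your loop (the value needs its own quotes, otherwise 2015-1 is evaluated as arithmetic):
eval(parse(text = paste0("FR", i, "$MONTH <- '2015-", sprintf("%02d", i), "'")))
Here sprintf("%02d", i) adds the leading 0 to single-digit months (an ifelse() would also work).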
And I second Colonel's comment about keeping your data.frames within some other data structure.
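For reference, a sketch of both routes, using hypothetical toy data frames FR1 to FR3 in place of FR1 to FR12. The loop fixes the original attempt by using get() and assign() on the whole data frame, so no eval(parse()) is needed; the second version keeps the data frames together in a named list, as suggested in the comments.

```r
# Hypothetical toy data frames standing in for FR1..FR12
FR1 <- data.frame(x = 1:2); FR2 <- data.frame(x = 3:4); FR3 <- data.frame(x = 5:6)
n <- 3
months <- sprintf("2015-%02d", seq_len(n))  # "2015-01", "2015-02", "2015-03"

# Loop: fetch each data frame by name, add the column, assign it back
for (i in seq_len(n)) {
  d <- get(paste0("FR", i))
  d$MONTH <- months[i]
  assign(paste0("FR", i), d)
}

# List: keep the data frames together and add the column with Map()
frs <- mget(paste0("FR", seq_len(n)))  # named list: FR1, FR2, FR3
frs <- Map(function(d, m) { d$MONTH <- m; d }, frs, months)
```

With the list version, later steps (rbind-ing everything, saving, summarising) become one call such as do.call(rbind, frs) instead of another loop.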

how to convert a character string to a name that accepts data (data frame name) in R

I have stored a list of names as characters and want to convert them to something that can be accepted as a data frame name, something like this:
for (i in 1:18) {
str[i] <- paste("alert_month_amount_",i,sep="")
}
name_str = as.character(str)
then name_str will be:
name_str[1] would be "alert_month_amount_1"
Now I want to assign certain data to a data frame named by name_str[n] inside a loop, like:
for (n in 1:18){
name_str[n] <- subset(by_Month_Acct_Num,month==month_index[n] & year==year_index[n])
}
But this does not work, perhaps because the names are passed as character strings in double quotation marks ("). I would appreciate your help.
You can use assign for this:
assign(name_str[n], subset(by_Month_Acct_Num,month==month_index[n] & year==year_index[n]))
This is R FAQ 7.21. The most important part of that answer is the end, where it says (like @MrFlick) that it is better to use a list. You really should learn how to take advantage of R's vectorized functions.
The paste and paste0 functions are both vectorized, so your first bit of code can be replaced with:
name_str <- paste0("alert_month_amount_", 1:18)
without need for the loop.
You could create your list and fill it with code like:
alert_month_amount <- list()
for(i in 1:18) {
alert_month_amount[[i]] <- subset(by_Month_Acct_Num,month==month_index[i] & year==year_index[i]) # index with the loop variable i
}
Or, possibly even easier, use the split function. You could also use lapply or mapply.
If you want the elements named then just do:
names(alert_month_amount) <- name_str
Now with everything in a single list you can copy, save, delete, etc. one object rather than needing another loop to do each individual piece. If you want to do the same thing (calculate a summary, fit a regression, etc.) on each piece created then with everything in a list you can just use lapply or sapply on the list rather than having to create another loop and figuring out how to grab each piece in the loop and save it to an output object.
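The split() and mapply() alternatives mentioned above could look roughly like this. by_Month_Acct_Num, month_index and year_index are hypothetical toy versions of the question's objects, assuming numeric month and year columns.

```r
# Hypothetical toy versions of the question's objects
by_Month_Acct_Num <- data.frame(
  year   = c(2015, 2015, 2015, 2016),
  month  = c(1, 1, 2, 1),
  amount = c(10, 20, 30, 40)
)
month_index <- c(1, 2, 1)
year_index  <- c(2015, 2015, 2016)

# mapply: one subset per (month, year) pair, returned as a list
alert_month_amount <- mapply(
  function(m, y) subset(by_Month_Acct_Num, month == m & year == y),
  month_index, year_index,
  SIMPLIFY = FALSE
)
names(alert_month_amount) <- paste0("alert_month_amount_", seq_along(month_index))

# split: one data frame per year/month combination present in the data
by_ym <- split(by_Month_Acct_Num,
               list(by_Month_Acct_Num$year, by_Month_Acct_Num$month),
               drop = TRUE)

# Working on the whole list at once, e.g. total amount per piece
totals <- sapply(alert_month_amount, function(d) sum(d$amount))
```

The mapply version follows your month_index/year_index pairs exactly; split() is the shorter choice when you simply want every month/year combination in the data.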

Files details from folder

I'd like to loop through a list of files and record detailed info about them (size, no. of rows, means of columns).
I just started with storing the info in a data frame:
df<-data.frame()
all <-list.files(pattern=".csv")
for (i in all){
file<-read.csv(i)
filas<-nrow(file)
cols<-ncol(file)
info<-c(i,filas,cols)
df<-rbind(df,i,filas,cols)
}
but it triggers an error caused by the 'i' variable, which is just a file name. What am I doing wrong?
Thanks in advance, p.
You don't need a for loop here. Instead, use lapply in combination with do.call to obtain your desired result. Try:
do.call(rbind, lapply(all, function(x) {y <- read.csv(x); data.frame(file = x, filas = nrow(y), cols = ncol(y))}))
Your approach was failing because, for rbind to work here, you need two data.frames with the same number of columns. You initially created an empty data.frame (with 0 columns), and this couldn't be bound with rbind to a vector of length 3 (assuming you want one row per file showing the file name, number of rows and number of columns). If you really want to use a for loop, you could do something like:
for (i in seq_along(all)) {
file<-read.csv(all[i])
info<- data.frame(file=all[i], filas=nrow(file), cols=ncol(file))
if (i==1) df<-info else df<-rbind(df,info)
}
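The question also asked for file size and column means. Here is a sketch extending the lapply approach, assuming means are only wanted for numeric columns and that a comma-separated string is an acceptable way to store them (different files can have different columns). It uses base R's file.size() and the stricter pattern "\\.csv$", since in ".csv" the dot matches any character. The demo folder and its two CSV files are hypothetical.

```r
# Hypothetical demo folder with two small CSV files
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:4, b = c(2, 4, 6, 8)),
          file.path(demo_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(id = c("x", "y"), v = c(1.5, 2.5)),
          file.path(demo_dir, "two.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# One row of details per file
file_info <- function(path) {
  dat <- read.csv(path)
  num <- dat[vapply(dat, is.numeric, logical(1))]  # numeric columns only
  data.frame(file  = basename(path),
             bytes = file.size(path),
             filas = nrow(dat),
             cols  = ncol(dat),
             means = paste(round(colMeans(num), 2), collapse = ", "))
}

info <- do.call(rbind, lapply(files, file_info))
```

If your files contain missing values, colMeans(num, na.rm = TRUE) avoids NA means.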
