Using For Loop in R to loop through a dataframe - r

I have a function in R called Produce_Output. It takes an X variable and a Y variable, then carries out some calculations, runs SQL data retrievals, saves a plot to a file location, and so on. The function itself doesn't return anything; it only triggers other actions.
I want to run this function over a data frame that has been set up for it. The data frame has 8464 observations and 2 variables. I would like to pass the 2 variables to the function one observation at a time.
I call the function as follows:
for (Data_To_Process) {
Produce_Output(TableA$Column1, TableA$Column2)
}
I get the following error
Error in `$<-.data.frame`(`*tmp*`, "OND", value = c(3379L, 3121L,
1699L, : replacement has 8464 rows, data has 3
I read a post on here about the data containing NULLs; I've checked, and it doesn't. I also don't understand what is being replaced with what. I just want it to process the first row, then the next, and so on. As I said, the function has no output but triggers other procedures using the two values passed in. Any help would be appreciated.

Rui Barradas - You are perfectly correct, that was the issue. I was aware of the index method for the loop, but I had only had access to examples where a single column was used, so I didn't understand how to reference other columns in the syntax. Thanks for your help.
"You are passing the entire columns, not individual rows. It could be something like for(i in 1:nrow(TableA)){Produce_Output(TableA$Column1[i], TableA$Column2[i])}"

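A minimal sketch of the row-by-row call described in the comment above. The Produce_Output body here is a hypothetical stand-in that only records its arguments (the real function is not shown in the question), and the three-row TableA is toy data in place of the 8464-row original.

```r
# Hypothetical stand-in for Produce_Output: it only records what it was called with.
calls <- character(0)
Produce_Output <- function(x, y) {
  calls <<- c(calls, paste(x, y))
  invisible(NULL)
}

# Toy version of TableA (the real one has 8464 rows).
TableA <- data.frame(Column1 = c(3379L, 3121L, 1699L),
                     Column2 = c("a", "b", "c"),
                     stringsAsFactors = FALSE)

# Index-based loop: each call receives one value from each column.
# seq_len() is used instead of 1:nrow() so an empty data frame runs zero times.
for (i in seq_len(nrow(TableA))) {
  Produce_Output(TableA$Column1[i], TableA$Column2[i])
}

# Loop-free equivalent: Map() walks both columns in parallel.
invisible(Map(Produce_Output, TableA$Column1, TableA$Column2))
```

Because the function is called only for its side effects, the return value of Map() is discarded with invisible(); either form should make the same sequence of calls.
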
Related

R - how to use a function on a list of data frames

I have a little problem with my code. I hope you can help me :)
I used apply to create a list of 20 data frames (stock index return data, grouped by year and index: three companies plus the stock index, over 5 years). Now I want to apply a function with two arguments; it calculates the ratio of the covariance between the returns of a selected company and the stock index to the variance, for every year, which is why I'm trying to group the data. How can I do this automatically, without manually typing code for every year and company?
I don't know whether I should use a for loop or whether there is some other way.
The other thing is: how can I delete unnecessary columns from a list of data frames?
I'll be thankful for your help.
And sorry for my English :D
You may consider purrr::map_dfr(). The first argument will be your list of data frames, and the second the action to do with that data frame. The final result will be a single data frame uniting the result of all of the above. Your code will likely look something like this:
purrr::map_dfr(list_of_dataframes, function(x) {...})
Within the braces, insert your logic in place of the ... placeholder. In that context, x will be the same as list_of_dataframes[[1]], then list_of_dataframes[[2]], and so on.
You may want to consult the documentation of the package purrr for further details.
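For comparison, here is a base-R sketch of the same "apply to every data frame, then stack the results" pattern, using lapply() and do.call(rbind, ...) rather than purrr. The column names company, index and extra, the list names, and the choice of the index variance as the denominator are all assumptions made for illustration; the question does not show the real data.

```r
# Hypothetical input: one data frame per company/year with two return columns.
set.seed(1)
list_of_dataframes <- list(
  A_2015 = data.frame(company = rnorm(10), index = rnorm(10), extra = 1:10),
  B_2015 = data.frame(company = rnorm(10), index = rnorm(10), extra = 1:10)
)

# Drop an unneeded column from every data frame in the list.
trimmed <- lapply(list_of_dataframes, function(x) {
  x[, setdiff(names(x), "extra"), drop = FALSE]
})

# For each data frame, build a one-row result holding the ratio of
# cov(company, index) to var(index), then stack the rows.
result <- do.call(rbind, lapply(names(trimmed), function(nm) {
  x <- trimmed[[nm]]
  data.frame(group = nm, ratio = cov(x$company, x$index) / var(x$index))
}))
```

Iterating over names(trimmed) rather than over the list itself keeps the group label available inside the function, so no manual typing per year or company is needed.
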

Only last iteration of loop is saved

I have a list of dataframes (subspec2) which I want to loop through to get the columns with the maximum value from each dataframe, and write these to a new dataframe. I wrote the following loop:
good.data<-data.frame(matrix(nrow=401, ncol=78)) #create empty dataframe
for (i in length(subspec2)) ##subspec2 is the list of dataframes
{
max.name<-names(which.max(apply(subspec2[[i]],MARGIN=2,max))) #find column name with max value
good.data[,i]<-subspec2[[i]][max.name] #write the contents of this column into dataframe
}
This seems to work but only returns values in the last column, nothing else appears to have been saved. Many threads point out the df must be outside the loop, but that is not the problem here.
What am I doing wrong?
Thank you!
I believe you need to change for (i in length(subspec2)) to for (i in 1:length(subspec2)). The former runs only one iteration, with i = length(subspec2), whereas the latter iterates over every value of i from 1 to length(subspec2).
(I am fairly sure that is your issue, but it helps to include a reproducible example so that others can run your code and double-check. For example, I don't know exactly what subspec2 looks like, and I can't run your code as it stands. The reprex package is a great resource for this.)
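The difference can be seen on a small example. The three data frames below are hypothetical stand-ins for subspec2, and seq_along() is used in place of 1:length() as a slightly safer variant, since it yields no iterations for an empty list.

```r
# Hypothetical stand-in for subspec2: three data frames with 4 rows each.
subspec2 <- list(
  data.frame(a = c(1, 2, 3, 4), b = c(9, 1, 1, 1)),
  data.frame(a = c(5, 5, 5, 5), b = c(0, 0, 0, 1)),
  data.frame(a = c(2, 2, 2, 2), b = c(3, 3, 3, 3))
)

# length() is a single number, so this loop body runs exactly once.
visited_wrong <- integer(0)
for (i in length(subspec2)) visited_wrong <- c(visited_wrong, i)

# seq_along() yields 1, 2, ..., length(subspec2).
good.data <- data.frame(matrix(nrow = 4, ncol = length(subspec2)))
for (i in seq_along(subspec2)) {
  col_max  <- sapply(subspec2[[i]], max)       # maximum of each column
  max.name <- names(which.max(col_max))        # column holding the largest value
  good.data[, i] <- subspec2[[i]][[max.name]]  # copy that column into the result
}
```
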

function to remove all observations that contain a "prohibited" value - R

I have a large dataset that looks like this:
There are 43 different values for PID overall. I have identified the PIDs that need to be removed and collected them in a vector:
I want to remove all observations (rows) from my data set that contain one of the PIDs from the vector NullNK. I have tried writing a function for it, but I get an error (I have never written functions before):
for (i in length(NullNK)){
SR_DynUeber_einfam <- SR_DynUeber_einfam [-which(SR_DynUeber_einfam$PID == NullNK(i)),]
}
How can I efficiently remove the observations from my original data set that contain PIDs from the NullNK vector?
What is wrong with my function?
Thanks!
For basic operations like this, for loops are often not needed. This does what you are looking for:
SR_DynUeber_einfam[!SR_DynUeber_einfam$PID %in% NullNK,]
One mistake in your function is NullNK(i): in R, you subset a vector with square brackets, as in NullNK[i].
Hope this helps!
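A small sketch of the %in% filter on toy data (the PID values and the value column are made up, since the original data set is not shown). It also illustrates one reason to prefer a logical filter over -which(): when nothing matches, -which() produces an empty index and every row is dropped.

```r
# Toy data: the real data set and the real NullNK are not shown in the question.
SR_DynUeber_einfam <- data.frame(PID = c(1, 2, 3, 2, 4),
                                 value = c(10, 20, 30, 40, 50))
NullNK <- c(2, 4)

# Keep only the rows whose PID is NOT in NullNK.
kept <- SR_DynUeber_einfam[!SR_DynUeber_einfam$PID %in% NullNK, ]

# Pitfall: if no row matches, which() returns integer(0), and indexing
# with -integer(0) selects zero rows instead of keeping all of them.
none <- SR_DynUeber_einfam[-which(SR_DynUeber_einfam$PID == 99), ]
```
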

Looping in R to create transformed variables

I have a dataset of 80 variables, and I want to loop through a subset of 50 of them and construct returns. I have a list of the names of the variables for which I want to construct returns, and am attempting to use the dplyr command mutate to construct the variables in a loop. Specifically my code is:
for (i in returnvars) {
alldta <- mutate(alldta,paste("r",i,sep="") = (i - lag(i,1))/lag(i,1))}
where returnvars is my list, and alldta is my dataset. When I run this code outside the loop with just one of the i values, it works fine. The code for that looks like this:
alldta <- mutate(alldta,rVar = (Var- lag(Var,1))/lag(Var,1))
However, when I run it in the loop (e.g., attempting to do the previous line of code 50 times for 50 different variables), I get the following error:
Error: unexpected '=' in:
"for (i in returnvars) {
alldta <- mutate(alldta,paste("r",i,sep="") ="
I am unsure why this issue is coming up. I have looked into a number of ways to try and do this, and have attempted solutions that use lapply as well, without success.
Any help would be much appreciated! If there is an easy way to do this with one of the apply commands as well, that would be great. I did not provide a dataset because my question is not data specific, I'm simply trying to understand, as a relative R beginner, how to construct many transformed variables at once and add them to my data frame.
EDIT: As per Frank's comment, I updated the code to the following:
for (i in returnvars) {
varname <- paste("r",i,sep="")
alldta <- mutate(alldta,varname = (i - lag(i,1))/lag(i,1))}
This fixes the previous error, but I am still not referencing the variable correctly, so I get the error
Error in "Var" - lag("Var", 1) :
non-numeric argument to binary operator
Which I assume is because R sees my variable name Var as a string, rather than as a variable. How would I correctly reference the variable in my dataset alldta? I tried get(i) and alldta$get(i), both without success.
I'm also still open to (and actively curious about), more R-style ways to do this entire process, as opposed to using a loop.
Using mutate inside a loop might not be a good idea either. I am not sure whether mutate makes a copy of the data frame, but it is generally not good practice to grow a data frame inside a loop. Instead, create a separate data frame with the output and then name the columns based on your logic.
result = do.call(rbind, lapply(returnvars, function(i) {...}))
names(result) = paste("r",returnvars,sep="")
After playing around with this more, I discovered (thanks to Frank's suggestion), that the following works:
extended <- alldta # Make a copy of my dataset
for (i in returnvars) {
varname <- paste("r",i,sep="")
extended[[varname]] = (extended[[i]] - lag(extended[[i]],1))/lag(extended[[i]],1)}
This is still not very R-styled in that I am using a loop, but for a task that is only repeating about 50 times, this shouldn't be a large issue.
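A loop-free sketch of the same computation, assuming toy data with hypothetical column names Var1 and Var2. The lag is written by hand as lag1() so the example does not depend on dplyr being attached; base stats::lag() only shifts the time base of a series and does not shift the values of a plain vector.

```r
# Toy data: two numeric variables to transform and one column to leave alone.
alldta <- data.frame(Var1 = c(100, 110, 121),
                     Var2 = c(50, 25, 50),
                     other = c("x", "y", "z"))
returnvars <- c("Var1", "Var2")

# Hand-rolled one-period lag: shift values down by one and pad with NA.
lag1 <- function(x) c(NA, head(x, -1))

# Compute all return columns at once, then attach them under new names.
extended <- alldta
extended[paste0("r", returnvars)] <- lapply(
  alldta[returnvars],
  function(x) (x - lag1(x)) / lag1(x)
)
```

Selecting columns with alldta[returnvars] passes the column values, not the column names, to the function, which sidesteps the "non-numeric argument to binary operator" error from the earlier attempt.
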

R: Error in .Primitive, non-numeric argument to binary operator

I did some reading on similar SO questions, but couldn't figure out how to resolve my error.
I have written the following string of code:
points[paste0(score.avail,"_pts")] <-
Map('*', points[score.avail], mget(paste0(score.avail,'_m')) )
Essentially, I have a list of columns in the 'points' data frame, defined by 'score.avail'. I am multiplying each of the columns by a respective constant, defined as the paste0(score.avail, '_m') expression. It appends new fields based on the multiplication, given by paste0(score.avail, "_pts") expression.
I have used this function before in a similar setup with no issues. However, I am now getting the following error:
Error in .Primitive("*")(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
non-numeric argument to binary operator
I'm pretty sure R is telling me that one of the fields I'm trying to multiply is not numeric. However, I have checked all my fields, and they are numeric. I have even tried running a line as.numeric(score.avail) but that doesn't help. I also ran the following to remove NA's in the fields (before the Map function above).
for(col in score.avail){
points[is.na(get(col)) & (data.source == "average" |
data.source == "averageWeighted"), (col) := 0]}
The thing that stumps me is that this expression has worked with no issues before.
Update
I did some more digging by separating out each component of my original function. I'm getting odd output when running points[score.avail]. Previously when I ran this, it would return just the columns for all of my rows. Now, however, I'm getting none of the rows in my original data frame -- rather, it is imputing the column names in the 'score.avail' list as rows and filling in NA's everywhere (this is clearly the source of my problem).
I think this is because the object I'm pointing to is a data.table with key variables set. Previously, with this function, I had been pointing to a data frame.
Off to try a few more things.
Another Update
I was able to solve my problem by copying the 'points' object using as.data.frame(). However, I will leave the question open to see if anyone knows how to reset the data table key vars so that the function I specified above will work.
I was able to solve my problem by copying the 'points' object using as.data.frame(). Apparently classifying the object as a data.table was causing my headaches.
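A sketch of the Map() pattern on a plain data frame, with toy data. The multipliers are held in a named list (a hypothetical substitute for the mget() lookup of separate pass_m and rush_m variables), and the column names are made up. With a keyed data.table, points[score.avail] is interpreted as a row lookup on the key rather than a column selection, which matches the odd output described in the update; points[, score.avail, with = FALSE] is the data.table form of column selection and may be worth trying instead of converting, though it is not run here.

```r
# Toy data frame; column names and multipliers are illustrative only.
points <- data.frame(id = 1:3, pass = c(1, 2, 3), rush = c(4, 5, 6))
score.avail <- c("pass", "rush")
multipliers <- list(pass_m = 0.5, rush_m = 2)  # stand-in for mget(paste0(score.avail, "_m"))

# Guard: confirm every selected column really is numeric before multiplying.
stopifnot(all(vapply(points[score.avail], is.numeric, logical(1))))

# Multiply each column by its own constant and append the results as *_pts columns.
points[paste0(score.avail, "_pts")] <- Map(
  `*`,
  points[score.avail],
  multipliers[paste0(score.avail, "_m")]
)
```
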
