Loop over several dataframes in R - r

I have several data frames that I would like to be used in the same code, one after the other. In the code lines that I have written, I am using the variable "my_data" (which is basically a dataframe). Thus, I thought the easiest solution would be to assign each of my other dataframes to "my_data", one after the other, so that all the code that follows can be executed for each data frame in a loop without changing the code I already have.
The structure I have looks as follows:
#Datasets:
my_data
age_data
gender_data
income_data
## Code that uses "my_data" follows here ##
How can I create a loop that first assigns "age_data" to "my_data" and executes the code where "my_data" is used as a variable, then, after it reaches the end, restarts, assigns "gender_data" to "my_data", and does the same until this has been done for all data frames?
Help is much appreciated!

I am attempting to answer based upon information provided:
datanames <- c("age_data", "gender_data", "income_data")
for (dname in datanames) {
  # look up the data frame whose name is stored in dname
  my_data <- get(dname)
  # here you can write the rest of the code
  rm(my_data)
}

Maybe you can try get() within a for loop:
for (i in c("age_data", "gender_data", "income_data")) {
  my_data <- get(i)
}
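An alternative to get() is to keep the data frames in a named list and move the existing code into a function, so nothing has to be looked up by name. The following is only a minimal sketch: the three data frames are made-up stand-ins, and analyse() is a hypothetical wrapper whose body would hold the code that currently uses "my_data".

```r
# Made-up stand-ins for the real data frames
age_data    <- data.frame(id = 1:3, age = c(25, 40, 31))
gender_data <- data.frame(id = 1:3, gender = c("f", "m", "f"))
income_data <- data.frame(id = 1:3, income = c(1000, 2500, 1800))

# Hypothetical wrapper: the code that uses "my_data" goes in the body
analyse <- function(my_data) {
  c(rows = nrow(my_data), cols = ncol(my_data))
}

# One list element per data frame; the names label the results
datasets <- list(age = age_data, gender = gender_data, income = income_data)
results <- lapply(datasets, analyse)
```

Each element of results then holds the output for one data frame, e.g. results$age.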

Related

How can I create a variable with different names depending on the number of an iteration in R?

Hello, I am trying to create a for loop where a variable is created depending on the value of a column that only has 10 possible values. Ideally the for loop filters the data by that number using dplyr and then overwrites the variable with only the first 15 observations.
I created the following, but it doesn't work:
for (i in 1:10){
mvendidos[[i]] <- filter(dff,grupo==i)
mvendidos[[i]] <- slice(dff.1:15)}
You need to cast the type of i. Add %>% as.character() and do not forget to create the "container" list:
mvendidos <- list()
for (i in 1:10){
mvendidos[[i %>% as.character()]] <- filter(dff,grupo==i)
# mvendidos[[i %>% as.character()]] <- slice(dff.1:15) # Commented out as the syntax is doubtful
}
By the way, I doubt that dff.1:15 is correct, but it is not part of your question and the structure of dff is not available.
Appended
I should also draw your attention to the operator %>%. You can read about dplyr pipes here.
In this case you can also replace it with the native pipe |>, which is available in base R 4.1 and later.
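The same result (one element per group, at most 15 rows each) can also be sketched in base R without a loop. This assumes a made-up dff with a grupo column; the column valor and the name por_grupo are invented for illustration.

```r
# Made-up data: 'grupo' takes the values 1..10, with 20 rows per group
dff <- data.frame(grupo = rep(1:10, each = 20), valor = seq_len(200))

# split() returns a list with one data frame per value of grupo,
# named "1", "2", ..., "10"
por_grupo <- split(dff, dff$grupo)

# Keep at most the first 15 rows of each group
mvendidos <- lapply(por_grupo, head, n = 15)
```

Individual groups are then available as mvendidos[["3"]] and so on.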

get() not working for column in a data frame in a list in R (phew)

I have a list of data frames. I want to use lapply on a specific column for each of those data frames, but I keep throwing errors when I tried methods from similar answers:
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place you are better off using a for loop, since lapply would require the <<- assignment operator (a plain <- inside the lapply function only changes a local copy). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)
You have to use invisible() otherwise lapply would print the output on the console. The <<- assigns the vector runif(...) to the new created column.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
return(aList[[x]])
})
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# no length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0.
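On the first part of the question: get() looks up a single object by its name, it does not parse and evaluate an expression, so the string "a[[1]]$DIM" is treated as one (non-existent) object name. If the goal is only to apply a function to the DIM column of every data frame, the list can be passed to lapply directly. A minimal sketch with two made-up data frames, where mean() is only a placeholder for the real function:

```r
# Made-up list of data frames that each have a DIM column
a <- list(
  data.frame(DIM = c(1, 2, 3), other = letters[1:3]),
  data.frame(DIM = c(10, 20), other = letters[4:5])
)

# Pull the DIM column out of every data frame
dims <- lapply(a, function(df) df$DIM)

# Apply a function to each DIM column; mean() is only a placeholder
results <- lapply(a, function(df) mean(df$DIM))
```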

Object selection in loop

I am currently experiencing perpetual issues with object selection within loops in R. I am fairly convinced that this is a common problem but I cannot seem to find the answer so here I am...
Here's a practical example of a problem I have:
I have a dataframe as source with a series of variables named sequentially (X1, X2, X3, X4, and so on). I am looking to create a function which takes the data as source and matches it to another dataset to create a new, combined dataset.
The number of variables will vary. I want to pass my function a parameter which tells it how many variables I have, and the function needs to adjust the number of times it will run the code accordingly. This seems like a task for a for loop, but again there doesn't appear to be an easy way for that selection and recreation of variables within a loop.
Here's the code I need to repeat:
new1$X1 <- data$X1[match(new1$matf1, data$rowID)]
new1$X2 <- data$X2[match(new1$matf1, data$rowID)]
new1$X3 <- data$X3[match(new1$matf1, data$rowID)]
new1$X4 <- data$X4[match(new1$matf1, data$rowID)]
new1$X5 <- data$X5[match(new1$matf1, data$rowID)]
(...)
return(new1)
I've attempted something like this:
for(i in 1:5) {
new1$Xi <- assign(paste0("X", i)), as.vector(paste0("data$X",i)[match(new1$matf1, data$rowID)])
}
without success.
Thank you for your help!
You can try this simple way, however a join would be more efficient:
vals <- paste0('X',1:5)
for(i in vals){
new1[[i]] <- data[[i]][match(new1$matf1, data$rowID)]
}
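Since the question asks for a function that takes the number of variables as a parameter, the loop above can also be sketched as a single vectorised assignment. The data below is made up, and add_matched() and its argument n are hypothetical names chosen for illustration.

```r
# Made-up source data with sequentially named variables X1..X3
data <- data.frame(rowID = 1:4,
                   X1 = c(10, 20, 30, 40),
                   X2 = c(1.5, 2.5, 3.5, 4.5),
                   X3 = c("a", "b", "c", "d"))
new1 <- data.frame(matf1 = c(3, 1, 4))

# Hypothetical helper: copy X1..Xn from `data` into `new1`, matched on rowID
add_matched <- function(new1, data, n) {
  vals <- paste0("X", seq_len(n))
  idx <- match(new1$matf1, data$rowID)
  # drop = FALSE keeps a data frame even when n is 1
  new1[vals] <- data[idx, vals, drop = FALSE]
  new1
}

combined <- add_matched(new1, data, n = 3)
```

The match() call is computed once instead of once per column, which is the main difference from the loop.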

How to write a testthat unit test for a function that returns a data frame

I am writing a script that ultimately returns a data frame. My question is around if there are any good practices on how to use a unit test package to make sure that the data frame that is returned is correct. (I'm a beginning R programmer, plus new to the concept of unit testing)
My script effectively looks like the following:
# initialize data frame
df.out <- data.frame(...)
# function set
function1 <- function(x) {...}
function2 <- function(x) {...}
# do something to this data frame
df.out$new.column <- function1(df.out)
# do something else
df.out$other.new.column <- function2(df.out)
# etc ....
... and I ultimately end up with a data frame with many new columns. However, what is the best approach to test that the data frame that is produced is what is anticipated, using unit tests?
So far I have created unit tests that check the results of each function, but I want to make sure that running all of these together produces what is intended. I've looked at Hadley Wickham's page on testing but can't see anything obvious regarding what to do when returning data frames.
My thoughts to date are:
Create an expected data frame by hand
Check that the output equals this data frame, using expect_that or similar
Any thoughts / pointers on where to look for guidance? My Google-fu has let me down considerably on this one to date.
Your intuition seems correct. Construct a data.frame manually based on the expected output of the function and then compare that against the function's output.
library(testthat)
# manually created data
dat <- iris[1:5, c("Species", "Sepal.Length")]
# function
myfun <- function(row, col, data) {
  data[row, col]
}
# result of applying function
outdat <- myfun(1:5, c("Species", "Sepal.Length"), iris)
# two versions of the same test
expect_true(identical(dat, outdat))
expect_identical(dat, outdat)
If your data.frame may not be identical, you could also run tests in parts of the data.frame, including:
dim(outdat), to check if the size is correct
attributes(outdat) or attributes of columns
sapply(outdat, class), to check variable classes
summary statistics for variables, if applicable
and so forth
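The partial checks listed above could be sketched with testthat roughly as follows. The function add_ratio() is a hypothetical stand-in for the script's real functions, and the expected values are computed by hand from the first row of iris.

```r
library(testthat)

# Hypothetical function under test: adds one derived column
add_ratio <- function(df) {
  df$ratio <- df$Sepal.Length / df$Sepal.Width
  df
}

outdat <- add_ratio(iris[1:5, ])

test_that("output has the expected shape, columns and types", {
  expect_equal(dim(outdat), c(5L, 6L))          # size
  expect_true("ratio" %in% names(outdat))       # new column exists
  expect_s3_class(outdat, "data.frame")         # object class
  expect_type(outdat$ratio, "double")           # column type
  expect_equal(outdat$ratio[1], 5.1 / 3.5)      # one value checked by hand
})
```

Checks like these are less brittle than comparing against a full hand-built data frame, at the cost of not covering every cell.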
If you would like to test this at runtime, you should check out the excellent ensurer package, see here. At the bottom of the page you can see how to construct a template that you can test your dataframe against, you can make it as detailed and specific as you like.
I'm just using something like this
d1 <- iris
d2 <- iris
expect_that(d1, equals(d2)) # passes
d3 <- iris
d3[141,3] <- 5
expect_that(d1, equals(d3)) # fails

Subset function in R won't work with vector selection

I have this weird problem where I have something like this in my code:
#(2,1,6,3)
states.vector <- unique(data$state)
I am iterating through the vector to subset data for each value in the "state" column. At some point through my iteration, the following line of code gives me an empty data frame:
#When state == 1
data.state <- subset(data,state==states.vector[state])
If state is == 1, it means that states.vector[state] == 2. But when I do the following, it works just fine:
subset(data,state==2)
What is weird is that I used this process multiple times, and it worked fine for the exact same task, with the same format for "data", but with some different values inside.
What am I doing wrong?
I think jlhoward has already explained what the problem is.
Why don't you use something like the following lines of code to loop through your states?
states.vector <- unique(data$state)
for (selected_state in states.vector) {
data.state <- subset(data,state==selected_state)
#...
}
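A likely cause of the empty result, assuming the loop index in the question is itself called state (as the comment "#When state == 1" suggests): inside subset(), names in the condition are looked up in the data frame first, so state in states.vector[state] refers to the column rather than the loop index. A minimal sketch with made-up data; the index name k is hypothetical.

```r
# Made-up data with a 'state' column
data <- data.frame(state = c(2, 1, 6, 3, 2, 1), value = 1:6)
states.vector <- unique(data$state)   # 2 1 6 3

state <- 1   # loop index that shares its name with the column

# Inside subset(), 'state' means data$state, so the whole column is used
# as the index into states.vector instead of the single value 1
masked <- subset(data, state == states.vector[state])

# Renaming the index (hypothetical name 'k') removes the clash
k <- 1
fixed <- subset(data, state == states.vector[k])

# split() gives one data frame per state without any explicit index
by_state <- split(data, data$state)
```

With this data, masked has no rows while fixed contains the two rows where state is 2. Whether the clash shows up as an empty result depends on the values in the column, which would explain why the same code worked on other data.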

Resources