loop over dataframes to create new dataframes - r

I have a list of dataframes and I want to loop over all of them to create new dataframes containing only the unique rows. This is my code for creating one new dataframe:
dflist <- list(df1=df1, df2=df2, df3 = df3)
udf1 = unique(df1)
I don't know whether I should use a loop or a function. Any help?
Thanks in advance!

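Given that you want to keep only the unique rows in each data frame, I'd do something like this:
library(dplyr)  # distinct() comes from dplyr
invisible(lapply(seq_along(dflist), function(i, l, n) {
# save the unique rows of each data frame as udf1, udf2, ... in the global environment
assign(paste0("u", n[[i]]), distinct(l[[i]]), envir = globalenv())
}, l = dflist, n = names(dflist)))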
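If you don't need separate objects in the global environment, a simpler option is to keep the results in a list. Below is a minimal base R sketch. It assumes small made-up data frames standing in for df1, df2 and df3, and it uses base unique(), which drops duplicate rows of a data frame much as dplyr's distinct() does:

```r
# Hypothetical example data: each data frame contains duplicated rows
df1 <- data.frame(id = c(1, 1, 2), value = c("a", "a", "b"))
df2 <- data.frame(id = c(3, 3, 3), value = c("c", "c", "c"))
df3 <- data.frame(id = c(4, 5, 5), value = c("d", "e", "e"))
dflist <- list(df1 = df1, df2 = df2, df3 = df3)

# unique() on a data frame keeps only the distinct rows
udflist <- lapply(dflist, unique)

# Name the results udf1, udf2, udf3 to mirror the question
names(udflist) <- paste0("u", names(dflist))

sapply(udflist, nrow)  # udf1 = 2, udf2 = 1, udf3 = 2
```

If you really do need udf1, udf2 and udf3 as separate objects, list2env(udflist, envir = globalenv()) creates them from the list in one call.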

Related

Looping multiple data frames in R

I am wondering whether it's possible to loop over data frames and change the contents of each field.
There are dataframes like df1, df2, df3, ... df100
Each dataframe has a food column containing the values a and b.
I want to change a and b in df$food to apple and banana, respectively!
for (i in 1:100){
paste('df', i, '$food') <- factor(paste('df', i, '$food'), level = c(a,b), labels = c("apple","banana"))
}
Is a loop like the one above possible?
It would be easier if you put them in a list of dataframes and use lapply.
result <- lapply(mget(paste0('df', 1:100)), function(x) transform(x,
food = factor(food, levels = c("a","b"), labels = c("apple","banana"))))
To write the results back to the original dataframes:
list2env(result, .GlobalEnv)
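To answer the loop question directly: a loop can work, but it has to look each data frame up by name with get() and write it back with assign(). paste() only builds a character string, so it can't be the target of an assignment. The following is a minimal sketch that uses three hypothetical data frames instead of 100:

```r
# Hypothetical example data: three data frames with a food column of "a"/"b"
for (i in 1:3) {
  assign(paste0("df", i), data.frame(food = c("a", "b", "a")))
}

for (i in 1:3) {
  nm <- paste0("df", i)          # e.g. "df1"
  d <- get(nm)                   # fetch the data frame with that name
  d$food <- factor(d$food, levels = c("a", "b"),
                   labels = c("apple", "banana"))
  assign(nm, d)                  # overwrite the original object
}

levels(df2$food)
```

The lapply() plus list2env() version above does the same job with less bookkeeping.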

Using a loop to select column names from a list

I've been struggling with column selection with lists in R. I've loaded a bunch of CSVs (all with different column names and different numbers of columns) with the goal of extracting the columns that share the same names (just phone_number, Subregion, and PhoneType) and putting them together into a single data frame.
I can get the columns I want out of one list element with this:
var<-data[[1]] %>% select("phone_number","Subregion", "PhoneType")
But I cannot select the columns from all the elements in the list this way, just one at a time.
I then tried a for loop that looks like this:
new.function <- function(a) {
for(i in 1:a) {
tst<-datas[[i]] %>% select("phone_number","Subregion", "PhoneType")
}
print(tst)
}
But when I try:
new.function(5)
I only get the columns from the 5th element.
I know this might seem like a noob question to most, but I am struggling to learn lists and loops in R. I'm sure I'm missing something simple. Thank you for your help.
Another way you could do this is to make a function that extracts your columns and apply it to all data.frames in your list with lapply:
library(dplyr)
extractColumns = function(x){
select(x,"phone_number","Subregion", "PhoneType")
#or x[,c("phone_number","Subregion","PhoneType")]
}
final_df = lapply(data,extractColumns) %>% bind_rows()
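If you would rather not depend on dplyr, the same idea works in base R. Subset each data frame with [ and stack the pieces with do.call(rbind, ...). This is a minimal sketch. It assumes every data frame in the list has all three columns, and the two data frames shown are hypothetical:

```r
# Hypothetical example data: extra columns differ, the wanted three are shared
data <- list(
  data.frame(id = 1:2, phone_number = c("555-0100", "555-0101"),
             Subregion = c("north", "south"), PhoneType = c("mobile", "landline")),
  data.frame(phone_number = "555-0102", Subregion = "east",
             PhoneType = "mobile", extra = TRUE)
)

wanted <- c("phone_number", "Subregion", "PhoneType")

# Keep only the wanted columns from each data frame, then stack the rows
final_df <- do.call(rbind, lapply(data, function(x) x[wanted]))

dim(final_df)  # 3 rows, 3 columns
```

rbind() matches data frame columns by name, so the three columns only need to exist in each data frame. They don't have to appear in the same position.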
Your current loop only keeps the last iteration because tst holds a single value and is overwritten on every step of the loop.
You can establish tst as a list first with:
tst <- list()
Then, inside the loop, save each step as a separate element of the list by adding brackets and an index to tst (tst[[i]]). Here is a full example following your approach.
#Example data.frame that could be in datas
df_1 <- data.frame("not_selected" = rep(0, 5),
"phone_number" = rep("1-800", 5),
"Subregion" = rep("earth", 5),
"PhoneType" = rep("flip", 5))
# Another bare data.frame that could be in datas
df_2 <- data.frame("also_not_selected" = rep(0, 5),
"phone_number" = rep("8675309", 5),
"Subregion" = rep("mars", 5),
"PhoneType" = rep("razr", 5))
# Datas is a list of data.frames, we want to pull only specific columns from all of them
datas <- list(df_1, df_2)
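library(dplyr)  # provides select() and %>%
#Function for looping through the first 'a' elements of datas
new.function <- function(a) {
#create list to store new data.frames in once columns are selected
tst <- list()
for(i in 1:a) {
#save each selection as a separate element of tst
tst[[i]] <- datas[[i]] %>% select("phone_number","Subregion", "PhoneType")
}
print(tst)  # print() also returns tst (invisibly)
}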
#Proof of concept for 2 elements
new.function(2)
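A small refinement of the loop version is to size the result list to the input with vector("list", ...) and loop with seq_along(). That way the function no longer depends on a hard-coded count such as 5 or 2. This is a base R sketch: it selects columns with [ instead of select(), uses a hypothetical helper name (select_phone_cols), and uses made-up example data:

```r
# Hypothetical example data, mirroring df_1 and df_2 above
datas <- list(
  data.frame(not_selected = 0, phone_number = "1-800",
             Subregion = "earth", PhoneType = "flip"),
  data.frame(also_not_selected = 0, phone_number = "8675309",
             Subregion = "mars", PhoneType = "razr")
)

# Hypothetical helper: select the phone columns from every data frame
select_phone_cols <- function(dfs) {
  # Preallocate one slot per input data frame
  out <- vector("list", length(dfs))
  for (i in seq_along(dfs)) {
    out[[i]] <- dfs[[i]][c("phone_number", "Subregion", "PhoneType")]
  }
  out  # return the filled list instead of relying on a global variable
}

tst <- select_phone_cols(datas)
```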

How do I reorder columns for all data frames in a list in R?

I already have a list of data frames (mylist) and need to switch the first and second columns of every data frame in the list.
Test data frame in the list:
reads  phylum
1      phylum1
2      phylum2
3      phylum3
Into:
phylum   reads
phylum1  1
phylum2  2
phylum3  3
I know I need to use lapply, but I'm not sure what to pass as FUN:
mylist <- lapply(mylist, FUN = mylist[ ,c("phylum", "reads")])
This fails with the error "incorrect number of dimensions".
Sorry if this is a simple question and thanks in advance for your help!
-Brand new R user
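FUN expects a function that lapply can apply to every element of the list. You are passing mylist[ ,c("phylum", "reads")], which is not a function. R evaluates it immediately on the list itself, and a list has no rows or columns, hence the "incorrect number of dimensions" error.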
# sample data
df1 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
df2 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
df3 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
df4 <- data.frame(reads = sample(10,4), phylum = sample(10,4))
ldf <- list(df1,df2,df3,df4)
ldf_re <- lapply(ldf, FUN = function(X){X[c('phylum', 'reads')]})
In the last line, lapply iterates over the data frames and passes each one as the X argument of the function given in FUN. The resulting data frames, with their columns rearranged, are stored in the list ldf_re.
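If the column names differ between data frames, you can swap the first two columns by position instead of by name. The sketch below assumes every data frame has at least two columns. The helper name swap_first_two and the example data are hypothetical, and any columns after the second keep their order:

```r
# Hypothetical example data: two data frames with different column names
ldf <- list(
  data.frame(reads = 1:3, phylum = c("phylum1", "phylum2", "phylum3")),
  data.frame(count = 4:5, taxon = c("t1", "t2"), note = c("x", "y"))
)

# Hypothetical helper: swap columns 1 and 2, keep the rest in place
swap_first_two <- function(X) {
  # New column order: 2, 1, then columns 3..n (empty when ncol(X) == 2)
  X[c(2, 1, seq_len(ncol(X))[-(1:2)])]
}

ldf_re <- lapply(ldf, swap_first_two)

names(ldf_re[[2]])  # "taxon" "count" "note"
```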

Select/choose/access the dataframe by its name in R

Suppose I have 3 dataframes in the current R environment, named d1f, df2 and df_3. Their names follow no pattern. How can I access one dataframe by its name?
For example, I have a for loop to process the three dataframes. How can I do something like this?
df_names<-c("d1f", "df2", "df_3")
for(name in df_names)
{
df<-some_function(name)
....some action on df....
}
It is best to store the data frames in a (named) list, like so:
set.seed(1)
d1f = data.frame(x = rnorm(10))
df2 = data.frame(x = rnorm(10))
df_3 = data.frame(x = rnorm(10))
dfs = list(d1f = d1f, df2 = df2, df_3 = df_3)  # names allow dfs[["df2"]]
for (i in seq_along(dfs)){
dfs[[i]] = dfs[[i]] + 1 # e.g. add 1 to every value in each of the three data frames
}
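For the question as literally asked, which is looking up an existing object from its name as a string, base R's get() does this. mget() returns a named list of several objects at once. This is a minimal sketch with hypothetical data frames:

```r
# Hypothetical example data frames whose names follow no pattern
set.seed(1)
d1f  <- data.frame(x = rnorm(3))
df2  <- data.frame(x = rnorm(3))
df_3 <- data.frame(x = rnorm(3))

df_names <- c("d1f", "df2", "df_3")

# get() returns the object whose name is given as a string
for (name in df_names) {
  df <- get(name)
  print(nrow(df))
}

# mget() collects all of them into a named list in one call
dfs <- mget(df_names)
names(dfs)
```

mget() is a convenient way to build the list suggested above when the objects already exist.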

R - Passing list of dataframe name into function

I have several dataframes that I want to pass to a function for processing. Say there are 4 dataframes and I want to rename the first column of each to 'ROWNUM'.
df1 = data.frame(c(1:10),sample(1:100,10))
df2 = data.frame(c(1:10),sample(1:100,10))
df3 = data.frame(c(1:10),sample(1:100,10))
df4 = data.frame(c(1:10),sample(1:100,10))
function(df) colnames(df)[1] = 'ROWNUM'
My objective is to rename them all in one shot rather than passing them one by one.
Thanks.
We can use lapply after collecting the datasets in a list:
nm1 <- ls(pattern="^df\\d+$")  # only names like df1, df2, ...
lst <- lapply(mget(nm1), function(x) {
colnames(x)[1] <- 'ROWNUM'
x})
It is better to keep the datasets in a list, but if we need to update the original datasets:
list2env(lst, envir=.GlobalEnv)
Or we can use assign:
for(j in seq_along(nm1)){
assign(nm1[j], `names<-`(get(nm1[j]),
c("ROWNUM", names(get(nm1[j]))[-1])))
}
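The question's attempt, function(df) colnames(df)[1] = 'ROWNUM', doesn't work on its own for two reasons. An R function modifies a local copy of its argument, and this one returns the value of the assignment ('ROWNUM') rather than the data frame. This is a sketch of a reusable helper that returns the renamed data frame, so it can be passed to lapply. The name rename_first is hypothetical:

```r
# Example data as in the question
df1 <- data.frame(c(1:10), sample(1:100, 10))
df2 <- data.frame(c(1:10), sample(1:100, 10))

# Hypothetical helper: rename the first column and return the data frame
rename_first <- function(df) {
  colnames(df)[1] <- "ROWNUM"  # modifies the function's local copy
  df                           # return the modified copy
}

# The originals are unchanged until the result is assigned back
renamed <- lapply(list(df1 = df1, df2 = df2), rename_first)
names(renamed$df1)[1]  # "ROWNUM"
```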
