Use a function to modify multiple dataframes - r

I have a function to deduplicate a data frame so that each person (indexed by PatID) is represented once by the latest record (largest RecID):
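dedupit <- function(x) {
  # sort by patient, with the newest record (largest RecID) first within each patient
  x <- x[order(x$PatID, -x$RecID), ]
  # keep only the first row per PatID, i.e. that patient's latest record
  x <- x[!duplicated(x$PatID), ]
  return(x)
}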
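To see what dedupit does, here is a small run on made-up data (the PatID/RecID values below are hypothetical). Sorting by PatID and descending RecID puts each patient's latest record first, so !duplicated() keeps exactly that row.

```r
dedupit <- function(x) {
  # sort by patient, newest record (largest RecID) first within each patient
  x <- x[order(x$PatID, -x$RecID), ]
  # keep only the first row per PatID, i.e. the latest record
  x <- x[!duplicated(x$PatID), ]
  return(x)
}

# Hypothetical example: patient 1 has two records, patient 3 has three
df <- data.frame(PatID = c(1, 1, 2, 3, 3, 3),
                 RecID = c(10, 12, 5, 7, 9, 8))

deduped <- dedupit(df)
print(deduped)  # one row per PatID, each with its largest RecID
```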
It can deduplicate and replace a dataframe if I do:
df <- dedupit(df)
But I have multiple data frames that need deduplication. Rather than writing the line above for each individual data frame, I would like to apply the dedupit function across all of them at once, so that each original data frame is replaced with its deduplicated version.
I was able to make a list of the dataframes and lapply the function across each element in the list with:
listofdfs <- list(df1, df2, ...)
listofdfs <- lapply(listofdfs, dedupit)
However, this only returns a modified list; the original data frames df1, df2, ... are left unchanged. How do I apply this function so that it modifies and replaces multiple data frames?

Does it work?
Name your data frames when creating the list so that you can recover them afterwards:
list.df <- list(df1 = df1, df2 = df2, df3 = df3)
list2env(lapply(list.df, dedupit), .GlobalEnv)
As a result, your data frames df1, df2 and df3 will be replaced by their deduplicated versions.
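A self-contained sketch of the list2env() approach, using two small hypothetical data frames. One convenience worth knowing: mget() with a character vector of object names returns a list that is already named after those objects, so you do not have to write df1 = df1 by hand.

```r
dedupit <- function(x) {
  x <- x[order(x$PatID, -x$RecID), ]  # latest record first within each PatID
  x <- x[!duplicated(x$PatID), ]      # keep that first (latest) row per PatID
  return(x)
}

# Hypothetical example data frames with repeated PatIDs
df1 <- data.frame(PatID = c(1, 1, 2), RecID = c(3, 4, 1))
df2 <- data.frame(PatID = c(5, 5, 5), RecID = c(2, 9, 6))

# mget() builds a list named "df1", "df2" from the objects of those names
list.df <- mget(c("df1", "df2"))

# Deduplicate each element, then overwrite df1 and df2 in the global environment
invisible(list2env(lapply(list.df, dedupit), envir = .GlobalEnv))
```

list2env() assigns each list element to a variable with the element's name, which is why the list must be named.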

Related

Create new column across multiple data frames using dfList

I have several data frames, all with the same column names. I want to combine two columns in each of them into a new column.
The data frames look like this:
I want the output to look like this:
Normally I would do this very easily for one data frame:
a$XY_ID <- paste(a$X, a$Y, sep = ":")
How can I do this for all the data frames in a list?
Thanks for the help!
You can call paste inside lapply:
dfList <- lapply(dfList, function(x) transform(x, XY_ID = paste(X,Y,sep=":")))
In the tidyverse, you can use purrr::map to iterate over the list and tidyr::unite to combine the columns:
dfList <- purrr::map(dfList, ~tidyr::unite(.x, XY_ID, X, Y, sep = ":", remove = FALSE))
We can also use do.call with paste; this works too when there are many columns to concatenate:
dfList <- lapply(dfList, function(x) {
  x$XY_ID <- do.call(paste, c(x[c('X', 'Y')], sep = ":"))
  x
})
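A runnable sketch of the do.call() variant on a made-up list (the element names a and b, the extra column Z and the new column XYZ_ID are hypothetical), concatenating three columns chosen by a character vector so the same code works for any number of columns.

```r
# Hypothetical example list with three columns to combine
dfList <- list(
  a = data.frame(X = c("p", "q"), Y = c(1, 2), Z = c("u", "v")),
  b = data.frame(X = c("r", "s"), Y = c(3, 4), Z = c("w", "x"))
)

cols <- c("X", "Y", "Z")  # columns to concatenate, in order

dfList <- lapply(dfList, function(x) {
  # c(x[cols], sep = ":") is a list of the chosen columns plus sep;
  # do.call passes them as separate arguments to paste()
  x$XYZ_ID <- do.call(paste, c(x[cols], sep = ":"))
  x
})
```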

How do I rename a single column in multiple dataframes to the name of the dataframe in which they reside in R?

I am currently trying to rename a single column in multiple dataframes to match the dataframe name in R.
I have seen some questions and solutions on the site that are similar to what I am attempting, but none of them do it dynamically. I have over 45 data frames in which I need to rename a column, so manually typing in each individual name is doable, but time-consuming.
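Currently, the column has the same name, column, in every data frame (data frame: column name):
Dataframe1: column
Dataframe2: column
Dataframe3: column
I want it to look like this, with each column named after its data frame:
Dataframe1: Dataframe1
Dataframe2: Dataframe2
Dataframe3: Dataframe3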
The ultimate goal is to have a master data frame with the columns Dataframe1, Dataframe2 and Dataframe3.
We can collect all the datasets into a list and rename the column in all of them at once:
lst1 <- lapply(mget(ls(pattern = "Dataframe\\d+")), function(x) {
names(x)[5] <- "newcol"
x})
Update
If the columns in different datasets should get different names, create a vector of column names with one entry per 'Dataframe' object, in the same order in which ls() returns them (alphabetical, so Dataframe10 comes before Dataframe2):
nm1 <- c("col5A", "col5B", "col5C", ..., "col5Z")
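lst2 <- Map(function(x, y) {names(x)[5] <- y; x},
    mget(ls(pattern = "Dataframe\\d+")),
    nm1)
In the first code block, the 5th column of every data frame is renamed to 'newcol'; in the Map version, each data frame's 5th column gets the corresponding name from nm1.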
It can also be done with the tidyverse:
library(dplyr)
library(purrr)
map(mget(ls(pattern = "Dataframe\\d+")), ~ .x %>%
rename_at(5, ~ "newcol"))
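The code above renames to a fixed name. To rename the column to each data frame's own name (the original goal) and then build the master data frame, a minimal sketch, assuming the column to rename is literally called column and the objects are named Dataframe followed by a number: Map() pairs each data frame with its name from names(lst). The example data is hypothetical.

```r
# Hypothetical example data: three data frames, each with a column called "column"
Dataframe1 <- data.frame(id = 1:3, column = c(10, 20, 30))
Dataframe2 <- data.frame(id = 1:3, column = c(40, 50, 60))
Dataframe3 <- data.frame(id = 1:3, column = c(70, 80, 90))

# Collect every object whose name matches the pattern into a named list
lst <- mget(ls(pattern = "^Dataframe\\d+$"))

# Rename "column" in each data frame to that data frame's own name
lst <- Map(function(x, nm) {
  names(x)[names(x) == "column"] <- nm
  x
}, lst, names(lst))

# Put the renamed columns side by side in one master data frame
master <- do.call(cbind, lapply(names(lst), function(nm) lst[[nm]][nm]))
```

Because the renaming uses names(lst), no name has to be typed by hand, however many data frames match the pattern.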

Select the numeric columns of data frames in a list

I have a list of data frames. After applying a function to each of them, I get new columns that are non-numeric. I save the resulting data frames in a list, modified_list, and want to write each of them to a file, but only the columns that contain numeric values.
I am stuck on selecting the numeric columns: I do not know how to select numeric columns across a list of data frames. My code looks something like this. Do you have any idea how I can make it work?
library(plyr)
library(VIM)
data1 <- sleep
data2 <- sleep
data3 <- sleep
# get a list of dataframes
list_dataframes <- list(data1, data2, data3) # list of dataframes
n <- length(list_dataframes)
# apply function to the list_dataframes
modified_list <- llply(list_dataframes, myfunction)
# selects only numeric results
nums <- llply(modified_list, is.numeric)
# saving results
for (i in 1:n){
write.table(file = sprintf( "myfile/%s_hd.txt", dataframes[i]), modified_list[[i]][, nums], row.names = F, sep=",")
}
It sounds like you want to subset each data.frame in a list of data.frames to their numeric columns.
You can test which columns of a data.frame called df are numeric with
sapply(df, is.numeric)
This returns a logical vector, which can be used to subset your data.frame like this:
df[sapply(df, is.numeric)]
This returns the numeric columns of that data.frame. To do this over a list of data.frames, df_list, and return a list of subsetted data.frames:
lapply(df_list, function(df) df[sapply(df, is.numeric)])
Edit: Thanks to @Richard Scriven for the simplifying suggestion.
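Putting the answer together with the question's saving loop, here is a sketch that uses a made-up named list instead of the VIM data (myfunction is omitted; the list names, file names and tempdir() location are assumptions). Naming the list elements provides the file names that the question's loop tried to take from an undefined dataframes object. vapply() is used in place of sapply() so the test always yields a logical vector, even for a data frame with no columns.

```r
# Hypothetical example list: two numeric columns plus one character column each
df_list <- list(
  data1 = data.frame(a = 1:3, b = c(0.5, 1.5, 2.5), label = c("x", "y", "z")),
  data2 = data.frame(a = 4:6, b = c(3.5, 4.5, 5.5), label = c("u", "v", "w"))
)

# Keep only the numeric columns of every data frame
numeric_list <- lapply(df_list, function(df) df[vapply(df, is.numeric, logical(1))])

# Write each subset to its own comma-separated file, named after the list element
out_dir <- tempdir()  # assumption: any writable directory works here
for (nm in names(numeric_list)) {
  write.table(numeric_list[[nm]],
              file = file.path(out_dir, sprintf("%s_hd.txt", nm)),
              row.names = FALSE, sep = ",")
}
```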

list variables to individual data.frames

Let's say I have a list, called myList, of 30 data.frames, each containing 2 variables (called value and rank).
I know I can use
my.DF <- do.call("cbind", myList)
to create the output my.DF containing all the variables next to each other.
Is it possible to cbind each variable individually into its own data.frame, i.e., to get a new data.frame containing just the 2nd variable?
We can extract the second column by looping over the list (lapply) and wrap with data.frame.
data.frame(lapply(myList, `[`, 2))
If we want a separate data.frame for each variable:
lapply(names(myList[[1]]), function(x)
do.call(cbind,lapply(myList, `[`, x)))
data
set.seed(24)
myList <- list( data.frame(value=1:6, rank= sample(6)),
data.frame(value=7:12, rank=sample(6)))
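A sketch that labels the pieces, using the answer's example data. Naming the list elements first (df1 and df2 here are hypothetical labels) makes the columns of each per-variable data.frame distinguishable, and setNames() labels the result by variable.

```r
set.seed(24)
myList <- list(data.frame(value = 1:6, rank = sample(6)),
               data.frame(value = 7:12, rank = sample(6)))

# Hypothetical labels for the list elements; they become the column names below
names(myList) <- paste0("df", seq_along(myList))

# One data.frame per variable; its columns come from the successive list elements
vars <- names(myList[[1]])
by_var <- setNames(
  lapply(vars, function(v) data.frame(lapply(myList, `[[`, v))),
  vars
)
```

by_var$value then holds the value column of every list element, and by_var$rank the rank columns.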

Create a new variable within several data frames in R

I have several data frames, df1, df2, ..., df10. The columns (variables) are the same in all of them.
I want to create a new variable within each of them. I can easily do it "manually" as follows:
df1$newvariable <- ifelse(df1$oldvariable == 999, NA, df1$oldvariable)
or, alternatively
df1 <- transform(df1, newvariable = ifelse(oldvariable == 999, NA, oldvariable))
Unfortunately I'm not able to do this in a loop. If I write
for (i in names) { #names is the list of dataframes
i$newvariable <- ifelse(i$oldvariable == 999, NA, i$oldvariable)
}
I get the following error:
Error in i$oldvariable : $ operator is invalid for atomic vectors
What I'd do is pool all the data.frames into a list and then use lapply, as follows:
df1 <- as.data.frame(matrix(runif(2*10), ncol=2))
df2 <- as.data.frame(matrix(runif(2*10), ncol=2))
df3 <- as.data.frame(matrix(runif(2*10), ncol=2))
df4 <- as.data.frame(matrix(runif(2*10), ncol=2))
# create a list and use lapply
df.list <- list(df1, df2, df3, df4)
out <- lapply(df.list, function(x) {
x$id <- 1:nrow(x)
x
})
Now all the data.frames have a new column id appended, and out is a list of data.frames. You can access each of them with out[[1]], out[[2]], etc.
This has been asked many times. In your loop, i is a character string (a data frame's name), not the data frame itself. The $<- operator cannot translate that "i" index into either its first or its second argument; [[<- can do so for the second argument but not the first. You should learn to use lapply, and you will probably need two nested lapply calls: one over the list of "names" and the other over the columns of each data frame. The question is incomplete because it lacks specific examples: make up a set of three data frames, set some of the values to 999, and provide a list of names.
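Following that advice, a sketch with three made-up data frames containing 999 values. It assumes, as in the question's loop, that you start from a character vector of data frame names: mget() fetches the actual objects, lapply() applies the question's recode to each, and list2env() writes the results back over the originals. A single lapply is enough here because only one column is recoded.

```r
# Hypothetical example data frames that share the column "oldvariable"
df1 <- data.frame(oldvariable = c(1, 999, 3))
df2 <- data.frame(oldvariable = c(999, 5, 6))
df3 <- data.frame(oldvariable = c(7, 8, 999))

# Names of the data frames to modify (a character vector, like `names` in the question)
df_names <- c("df1", "df2", "df3")

# mget() turns the names into a named list of the actual data frames
df_list <- lapply(mget(df_names), function(x) {
  x$newvariable <- ifelse(x$oldvariable == 999, NA, x$oldvariable)
  x
})

# Write the modified data frames back over the originals in the global environment
invisible(list2env(df_list, envir = .GlobalEnv))
```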
