Create a variable in Multiple Dataframes in R - r

I want to create a ranked variable that will appear in multiple data frames.
I'm having trouble getting the ranked variable into the data frames.
Simple code. Can't make it happen.
dfList <- list(df1,df2,df3)
for (df in dfList){
rAchievement <- rank(df["Achievement"])
df[[rAchievement]]<-rAchievement
}
The result I want is for df1, df2 and df3 to each gain a new variable called rAchievement.
I'm struggling!! And my apologies. I know there are similar questions out there. I have reviewed them all. None seem to work and accepted answers are rare.
Any help would be MUCH appreciated. Thank you!

We can use lapply with transform in a single line
dfList <- lapply(dfList, transform, rAchievement = rank(Achievement))
If we need to update the objects 'df1', 'df2', 'df3', set the names of the 'dfList' with the object names and use list2env (not recommended though)
names(dfList) <- paste0('df', 1:3)
list2env(dfList, .GlobalEnv)
Or, using a for loop, we loop over the sequence of the list, extract each list element, and assign a new column based on the rank of 'Achievement'
for(i in seq_along(dfList)) {
dfList[[i]][['rAchievement']] <- rank(dfList[[i]]$Achievement)
}
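To see why the loop in the question leaves df1, df2 and df3 unchanged: `for (df in dfList)` works on a copy of each element, and `df[[rAchievement]]` indexes by the rank values rather than by the column name. The following is only a minimal sketch with made-up data; the values are hypothetical, and only the column name 'Achievement' comes from the question.

```r
# Hypothetical sample data (not from the question).
df1 <- data.frame(Achievement = c(70, 85, 60))
df2 <- data.frame(Achievement = c(90, 40))
df3 <- data.frame(Achievement = c(55, 55, 80, 20))

# A named list keeps the link between each data frame and its object name.
dfList <- list(df1 = df1, df2 = df2, df3 = df3)

# Add the ranked column to every element; the result must be assigned back.
dfList <- lapply(dfList, transform, rAchievement = rank(Achievement))

# Only the list elements have the new column; the original objects
# stay as they were until the list is copied back explicitly.
in_original <- "rAchievement" %in% names(df1)
in_list     <- "rAchievement" %in% names(dfList$df1)
```

Here `in_original` is FALSE and `in_list` is TRUE, which is why the list (or list2env) is needed to carry the change forward.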

Related

colnames and mutate on multiple dataframes

I have a problem with cleaning up my code. I understand I could type this all out but we don't want that obviously.
I have only dataframes in my global environment. They are all "data.frame".
I want to check the dimensions of all of them and put that in a tibble. I managed that somehow. I also would like to change their colnames() tolower() which works easy if I just type the name of the data.frame, but there's more than 2 and I want it done automatically. Then I also want to mutate all data.frames in the same way.
Small example of my code:
library(tidyverse)
x <- data.frame(letters[1:2]) #To create the data
y <- data.frame(letters[3:4])
dfs <- as.list(ls()) #I take whatever is in my environment
I managed below to get a tibble of the dimensions:
z <- as_tibble(lapply(seq_along(dfs),
function(j) dim(get(dfs[[j]]))), .name_repair = "unique")
colnames(z) <- dfs
Now for the colnames of all the data.frames stored in my list I basically want to perform this code:
colnames(dfs[[1]]) <- tolower(colnames(dfs[[1]]))
but that returns NULL as I found out earlier. So I used get() in there to make it work for the dimensions. But if I use get() to assign colnames it says it can't find function "get<-".
Since the colnames of all the data frames are the same (only the number of rows differs), I could save the lowercase colnames as a value and use that, but that doesn't change the fact that it can't find the get<- function.
names <- tolower(colnames(x))
sapply(seq_along(dfs),
function(j) colnames(get(dfs[[j]])) <- names)
Error in colnames(get(dfs[[j]])) <- names :
could not find function "get<-"
as for the mutating part I tried a for loop:
for(i in seq_along(dfs)){
get(dfs[[i]]) <- get(dfs[[i]]) %>% mutate(cd = ab)
}
But it's the same issue.
Could anyone help clearing this problem for me? (and if a cleaner code for the dimensions is available that would be highly appreciated)
I am just trying to up my coding skills. I would have been long done if I just typed it all out but that defeats the purpose.
Thanks!
-JK
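Using base R, assuming dfs holds the data frames themselves (for example, dfs <- mget(ls())) rather than their names: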
lapply(dfs, function(x) transform(setNames(x, tolower(names(x))), X = c('a', 'b')))
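The "get<-" error arises because get() only reads an object; there is no replacement form of it. One way around this, sketched here under assumptions, is to fetch the objects into a named list with mget and work on the list. The data frames and the column name AB are hypothetical; the new column cd mirrors the mutate(cd = ab) call in the question.

```r
# Hypothetical data frames with upper-case column names (not from the question).
x <- data.frame(AB = letters[1:2])
y <- data.frame(AB = letters[3:5])

# Fetch the objects themselves into a named list instead of storing their names.
dfs <- mget(c("x", "y"))

# Dimensions of every data frame: one column per data frame.
dims <- vapply(dfs, dim, integer(2))
rownames(dims) <- c("nrow", "ncol")

# Lower-case the column names, then add a column 'cd' copied from 'ab'.
dfs <- lapply(dfs, function(d) {
  names(d) <- tolower(names(d))
  d$cd <- d$ab
  d
})
```

The matrix `dims` can be passed to as.data.frame or as_tibble if a table is preferred, and list2env(dfs, .GlobalEnv) would write the modified data frames back over x and y.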

Looping through similar dataframes to apply changes using for

I have dataframes in which one column needs to be modified, handling NAs, characters and digits correctly. The dataframes have similar names, and the column of interest is shared.
I made a for loop to change every row of the column of interest correctly. However, I had to create an intermediary object "df" in order to accomplish that.
Is that necessary, or can the original dataframes be modified directly?
sheet1 <- read.table(text="
data
15448
something_else
15334
14477", header=TRUE, stringsAsFactors=FALSE)
sheet2 <- read.table(text="
data
16448
NA
16477", header=TRUE, stringsAsFactors=FALSE)
sheets<-ls()[grep("sheet",ls())]
for(i in 1:length(sheets) ) {
df<-NULL
df<-eval(parse(text = paste0("sheet",i) ))
for (y in 1:length(df$data) ){
if(!is.na(as.integer(df$data[y])))
{
df[["data"]][y]<-as.character(as.Date(as.integer(df$data[y]), origin = "1899-12-30"))
}
}
assign(eval(as.character(paste0("sheet",i))),df)
}
As @d.b. mentions, consider working with a list of dataframes, especially if they are similarly structured: you can run the same operations with the apply family, and you avoid managing many objects in the global environment. Also, consider using the vectorized ifelse to update the column.
And if you ever really need separate dataframe objects, use list2env to convert each list element into its own object. The code below wraps the as.* functions in suppressWarnings, since the NA produced by a failed coercion is exactly what the is.na test relies on.
sheetList <- mget(ls(pattern = "sheet[0-9]"))
sheetList <- lapply(sheetList, function(df) {
df$data <- ifelse(is.na(suppressWarnings(as.integer(df$data))), df$data,
as.character(suppressWarnings(as.Date(as.integer(df$data),
origin = "1899-12-30"))))
return(df)
})
list2env(sheetList, envir=.GlobalEnv)
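The conversion step can also be tried on a plain vector before applying it to the whole list. This is a rough sketch only; the helper name serial_to_date is hypothetical, and it uses logical subsetting in place of ifelse.

```r
# Hypothetical helper: convert Excel-style serial numbers stored as text to
# ISO date strings, leaving non-numeric entries and NA unchanged.
serial_to_date <- function(x) {
  n <- suppressWarnings(as.integer(x))   # non-numeric text becomes NA
  out <- as.character(x)
  ok <- !is.na(n)
  out[ok] <- as.character(as.Date(n[ok], origin = "1899-12-30"))
  out
}

converted <- serial_to_date(c("15448", "something_else", NA))
```

With the helper in place, the list version reduces to lapply(sheetList, function(df) { df$data <- serial_to_date(df$data); df }).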

How to group set of data.frame objects in nested list with different order?

I have a set of data.frame objects in nested lists, and I want to group them by the name of the data.frame object. Because the data.frame objects are placed in a different order in each nested list, I have difficulty grouping them into a new list. I tried the transpose method from the purrr package on CRAN, but it did not give the result I expected. Does anyone know a trick for doing this sort of grouping of data.frame objects more efficiently? Thanks a lot
example:
res_1 <- list(con=list(a.con_1=airquality[1:4,], b.con_1=iris[2:5,], c.con_1=ChickWeight[3:7,]),
dis=list(a.dis_1=airquality[5:7,], b.dis_1=iris[8:11,], c.dis_1=ChickWeight[12:17,]))
res_2 <- list(con=list(b.con_2=iris[7:11,], a.con_2=airquality[4:9,], c.con_2=ChickWeight[2:8,]),
dis=list(b.dis_2=iris[2:5,], a.dis_2=airquality[1:3,], c.dis_2=ChickWeight[12:15,]))
res_3 <- list(con=list(c.con_3=ChickWeight[10:15,], a.con_3=airquality[2:9,], b.con_3=iris[12:19,]),
dis=list(c.dis_3=ChickWeight[2:7,], a.dis_3=airquality[13:16,], b.dis_3=iris[2:7,]))
desired output:
group1_New <- list(con=list(a.con_1, a.con_2, a.con_3),
dis=list(a.dis_1, a.dis_2, a.dis_3))
group2_New <- list(con=list(b.con_1, b.con_2, b.con_3),
dis=list(b.dis_1, b.dis_2, b.dis_3))
group3_New <- list(con=list(c.con_1, c.con_2, c.con_3),
dis=list(c.dis_1, c.dis_2, c.dis_3))
Here is a twice nested for loop that creates the desired structure. There is likely a more efficient method.
# put the nested lists into a list:
myList <- list(res_1, res_2, res_3)
# make a copy of the list to preserve the structure for the new list
myList2 <- myList
for(i in seq_len(length(myList))) {
# get ordering of inner list names
myOrder <- rank(names(myList[[c(i,2)]]))
for(j in seq_len(length(myList[[i]]))) {
for(k in seq_len(length(myList[[c(i, j)]]))) {
# reorder content
myList2[[c(myOrder[k], j, i)]] <- myList[[c(i, j, k)]]
# rename element
names(myList2[[c(myOrder[k], j)]])[i] <- names(myList[[c(i, j)]])[k]
}
}
}
If desired, you could extract the list items after the loops.
The key to this solution is the realization that if you put these lists into a list, the result can be achieved by selectively reversing the indices of the list items. By selectively, I mean that I incorporate rank on the data.frame names to find the proper order for the inner-most loop.
In addition to reordering the data.frames as desired, I included a line to properly reset the names within the list.
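As a possible alternative to index juggling, the grouping can be done by matching on the name prefix ("a", "b", ...). The sketch below is not the method above; it uses small stand-in values instead of data frames, and the helper group_by_prefix is a hypothetical name. It assumes every element name starts with its group letter followed by a dot, as in the question.

```r
# Small stand-in for the nested lists in the question (numbers instead of data frames).
res_1 <- list(con = list(a.con_1 = 1, b.con_1 = 2), dis = list(a.dis_1 = 3, b.dis_1 = 4))
res_2 <- list(con = list(b.con_2 = 5, a.con_2 = 6), dis = list(b.dis_2 = 7, a.dis_2 = 8))

all_res <- list(res_1, res_2)

# Hypothetical helper: for each branch ("con", "dis"), pool the elements of all
# results and keep those whose name starts with the given prefix.
group_by_prefix <- function(prefix, results) {
  branches <- names(results[[1]])
  out <- lapply(branches, function(br) {
    items <- do.call(c, lapply(results, function(r) r[[br]]))
    items[startsWith(names(items), paste0(prefix, "."))]
  })
  setNames(out, branches)
}

# One group per prefix, each with a 'con' and a 'dis' branch.
grouped <- lapply(c(a = "a", b = "b"), group_by_prefix, results = all_res)
```

Because the match is on names, the order of the elements inside each nested list does not matter.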

R - New variables over several data frames in a loop

In R, I have several datasets, and I want to use a loop to create new variables (columns) within each of them:
All dataframes have the same name structure, so that is what I am using to loop through them. Here is some pseudo-code with what I want to do
Name = Dataframe_1 #Assume the for-loop goes from Dataframe_1 to _10 (loop not shown)
#Pseudo-code
eval(as.name(Name))$NewVariable <- c("SomeString") #This is what I would like to do, but I get an error ("could not find function eval<-")
As a result, I should have the same dataframe with one extra column (NewVariable), where all rows have the value "SomeString".
If I use eval(as.name(Name)) I can call up the dataframe Name with no problem, but none of the usual data frame operators seem to work with that particular call (not <- assignment, or $ or [[]])
Any ideas would be appreciated, thanks in advance!
We can place the datasets in a list and create a new column by looping over the list with lapply. If needed, the original dataframe objects can be updated with list2env.
lst <- mget(paste0('Dataframe_', 1:10))
lst1 <- lapply(lst, transform, NewVariable = "SomeString")
list2env(lst1, envir = .GlobalEnv)
Or another option is with assign
nm1 <- ls(pattern = "^Dataframe_\\d+")
nm2 <- rep("NewVariable", length(nm1))
for(j in seq_along(nm1)){
assign(nm1[j], `[<-`(get(nm1[j]), nm2[j], value = "SomeString"))
}
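The mget / lapply / list2env round trip can be tried without touching the global environment. This is a minimal sketch with hypothetical data frames held in a separate environment named env; only the names Dataframe_* and NewVariable come from the question.

```r
# Hypothetical environment and data frames, used so the sketch does not touch .GlobalEnv.
env <- new.env()
assign("Dataframe_1", data.frame(id = 1:2), envir = env)
assign("Dataframe_2", data.frame(id = 3:5), envir = env)

# Collect the data frames by name pattern, add the column, and write them back.
nms <- ls(env, pattern = "^Dataframe_[0-9]+$")
lst <- mget(nms, envir = env)
lst <- lapply(lst, transform, NewVariable = "SomeString")
invisible(list2env(lst, envir = env))

new_col <- env$Dataframe_2$NewVariable
```

The constant "SomeString" is recycled to the number of rows of each data frame, so `new_col` has one entry per row of Dataframe_2.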

R - Using a for loop with a test for multiple data frames

In R, I'm trying to use a for loop, with a nested test, in order to append a column to multiple data frames.
I am having trouble 1) calling a data frame with a variable name and 2) using a logical test to skip.
For example, I created 3 data frames with a number, and I want to add a column that's the square root of the value. I want to skip the data frame if it'll result in an error.
Below is what I've gotten to so far:
df1 <- data.frame(a=c(1))
df2 <- data.frame(a=c(6))
df3 <- data.frame(a=c(-3))
df_lst$b<-
for(df_lst in c("df1","df2","df3"){
ifelse(is.na(df_lst$a) = T, skip,
df_list$b <- sqrt(df1$a)
})
In the above example, I would ideally like to see df1 and df2 with a new column b with the square root of column a, and nothing happens to df3.
Any help would be GREATLY appreciated, thank you everyone!
It's generally not a good idea to just have a bunch of data.frames lying around with different names if you need to do things to all of them. You're better off storing them in a list. For example
mydfs<-list(df1, df2, df3)
Then you can use lapply and such to work with those data.frames. For example
mydfs<-lapply(mydfs, function(x) {
if(all(x$a>=0)) {
x$b<-sqrt(x$a)
}
x
})
Otherwise, changing your code to
for(df_lst in c("df1","df2","df3")) {
df<-get(df_lst)
if( all(df$a>=0) ) {
df$b <- sqrt(df$a)
}
assign(df_lst, df)
}
should work as well, it's just generally not considered good practice.
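To check which data frames were actually changed, the list version can be run with names attached. This is just a sketch; the helper add_sqrt is a hypothetical name, and the extra is.na test is an assumption to cover missing values, which the question mentions but the sample data does not contain.

```r
df1 <- data.frame(a = 1)
df2 <- data.frame(a = 6)
df3 <- data.frame(a = -3)

# A named list makes it easy to see which data frame is which afterwards.
mydfs <- list(df1 = df1, df2 = df2, df3 = df3)

# Hypothetical helper: add 'b' only when every value of 'a' is non-missing
# and has a real square root; otherwise return the data frame unchanged.
add_sqrt <- function(d) {
  if (all(!is.na(d$a) & d$a >= 0)) d$b <- sqrt(d$a)
  d
}
mydfs <- lapply(mydfs, add_sqrt)

# Which data frames gained the new column?
has_b <- vapply(mydfs, function(d) "b" %in% names(d), logical(1))
```

Here `has_b` is TRUE for df1 and df2 and FALSE for df3, matching the result asked for in the question.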
