I have three data frames: EC_Data, ED_Data, and ST_Data.
All of them have the same column names; specifically, the columns after the 4th
are named by year, from 2006 to 2015.
So I created a list that holds all three data frames:
Alldata = list(EC_Data, ED_Data, ST_Data)
So I tried to rename all the columns in a for loop like below...
for(x in seq_along(Alldata))
{
for(j in seq_along(Alldata[[x]]))
{
if(j>4)
{
names(colnames(Alldata[[x]][j])) <- paste("X", substr(colnames(Alldata[[x]][j]), start = 1, stop = 5),sep="")
print(colnames(Alldata[[x]][j]))
}
}
}
But nothing happens...
I cannot understand why, because when I try to call the names of every list, for example with
view(colnames(Alldata[[2]]))
the names seem to be exactly what I want to see.
Can someone help me understand why this loop doesn't work and what I can use instead?
Thank you
If we want to rename all the columns, use lapply to loop over the list, paste "X" onto the substr of the existing column names, and assign the result with setNames:
Alldata <- lapply(Alldata, function(x)
setNames(x, paste0("X", substr(colnames(x), 1, 5))))
Or using a for loop
for(i in seq_along(Alldata)) {
Alldata[[i]] <- setNames(Alldata[[i]],
paste0("X", substr(colnames(Alldata[[i]]), 1, 5)))
}
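The original loop appears to have no effect because names(colnames(...)) <- only attaches a names attribute to the vector of column names; the column name strings themselves are never replaced. The question also asks to rename only the columns after the 4th, whereas the code above renames every column. A minimal sketch of that narrower case follows; make_df and rename_years are hypothetical helpers, and the small data frames stand in for EC_Data, ED_Data and ST_Data:

```r
# Hypothetical stand-in data: four leading columns, then year-named columns
make_df <- function() {
  data.frame(id = 1:2, a = 3:4, b = 5:6, c = 7:8,
             `2006` = c(1.5, 2.5), `2007` = c(3.5, 4.5),
             check.names = FALSE)
}
Alldata <- list(make_df(), make_df())

# Prefix "X" only to columns at position `first` and beyond
rename_years <- function(df, first = 5) {
  idx <- seq_along(df) >= first
  names(df)[idx] <- paste0("X", substr(names(df)[idx], 1, 5))
  df
}

# lapply returns a new list, so the result has to be assigned back
Alldata <- lapply(Alldata, rename_years)
names(Alldata[[1]])
```

The first four names are left alone because the assignment is restricted to the positions selected by idx.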
I have many data frames. I would like to split them based on the values in a column (a factor), and then store each piece of the split in a separate data frame with a specific name.
For the sake of a minimal reproducible example, consider some generated data,
for (i in 1:10) {
assign(paste("df_",i,sep = ""), data.frame(x = rep(1,12), y = c(rep("a",4),rep("b",4),rep("c",4))))
}
Here we have 10 dfs: df_1, df_2, ... up to df_10. (The real data is similar to the generated data, but in the real data column z is different for each df.)
Now, I want to split the dfs by 'y' (column 2).
For 1 df, I can do the following;
splitdf <- split(df_1,df_1$y)
namessplit <- c("a","b","c")
for (i in 1:length(splitdf)) {
assign(paste("df_1_",namessplit[[i]],sep = ""),splitdf[[i]])
}
While this works for 1 df, how can I do it for all the dfs?
Big thanks in advance!
Creating multiple objects in the global env is not recommended, but if we want to create the objects from a nested list: loop over the outer list, then over each inner list, and paste the corresponding names together to assign each extracted inner list element.
lst1 <- lapply(mget(ls(pattern = "^df_\\d+$")), \(x) split(x, x$y))
for(i in seq_along(lst1)) {
for(j in seq_along(lst1[[i]])) {
assign(paste0(names(lst1)[i], "_", names(lst1[[i]][j])), lst1[[i]][[j]])
}
}
Checking for the objects created in the global env:
> ls(pattern = "^df_\\d+_[a-z]+$")
[1] "df_1_a" "df_1_b" "df_1_c" "df_10_a" "df_10_b" "df_10_c" "df_2_a" "df_2_b" "df_2_c" "df_3_a" "df_3_b" "df_3_c" "df_4_a"
[14] "df_4_b" "df_4_c" "df_5_a" "df_5_b" "df_5_c" "df_6_a" "df_6_b" "df_6_c" "df_7_a" "df_7_b" "df_7_c" "df_8_a" "df_8_b"
[27] "df_8_c" "df_9_a" "df_9_b" "df_9_c"
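If the split pieces do not have to be separate global objects, they can stay in one flat named list. A minimal sketch, assuming only three generated data frames held in a list called dfs (the names dfs, nested and flat are made up for this example):

```r
# Build the example data frames directly in a named list instead of with assign()
dfs <- setNames(
  lapply(1:3, function(i) {
    data.frame(x = rep(1, 12), y = rep(c("a", "b", "c"), each = 4))
  }),
  paste0("df_", 1:3)
)

# Split every data frame by y: a list of lists of data frames
nested <- lapply(dfs, function(d) split(d, d$y))

# Flatten one level; the names become "df_1.a", "df_1.b", ...
flat <- unlist(nested, recursive = FALSE)

# Turn the "." separator into "_" to get the df_1_a naming style
names(flat) <- sub(".", "_", names(flat), fixed = TRUE)
names(flat)
```

Individual pieces are then available as flat$df_1_a or flat[["df_2_b"]], and the whole collection can still be processed with lapply.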
I have a data frame, say acs10. I need to relabel the columns. To do so, I created another data frame, named as labelName with two columns: The first column contains the old column names, and the second column contains names I want to use, like the table below:
column_1     column_2
oldLabel1    newLabel1
oldLabel2    newLabel2
Then, I wrote a for loop to change the column names:
for (i in seq_len(nrow(labelName))){
names(acs10)[names(acs10) == labelName[i,1]] <- labelName[i,2]}
This works.
However, when I tried to put the for loop into a function, because I need to rename column names for other data frames as well, the function failed. The function I wrote looks like below:
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
print(varName[i,1])
print(varName[i,2])
print(names(dataF))
}
}
renameDF(acs10, labelName)
where dataF is the data frame whose names I need to change, and varName is another data frame in which old and new variable names are paired. I used print(names(dataF)) to debug, and the printout suggests that the function works. However, calling the function does not actually change the column names. I suspect it has something to do with scope, but I want to know how to make it work.
In your function you need to return the changed dataframe.
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
}
return(dataF)
}
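You can also simplify this and avoid the for loop by using match (note that any column of dataF that is not listed in varName gets NA as its name, so this assumes every column has an entry):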
renameDF <- function(dataF,varName){
names(dataF) <- varName[[2]][match(names(dataF), varName[[1]])]
return(dataF)
}
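With match, columns that are absent from varName come back as NA. A minimal sketch that leaves such columns untouched; rename_safe and the sample acs10 and labelName data are invented for this example:

```r
# Hypothetical lookup table and data frame
labelName <- data.frame(
  column_1 = c("oldLabel1", "oldLabel2"),
  column_2 = c("newLabel1", "newLabel2"),
  stringsAsFactors = FALSE
)
acs10 <- data.frame(oldLabel1 = 1, oldLabel2 = 2, other = 3)

rename_safe <- function(dataF, varName) {
  idx <- match(names(dataF), varName[[1]])  # NA where no entry exists
  hit <- !is.na(idx)
  names(dataF)[hit] <- varName[[2]][idx[hit]]
  dataF                                     # return the modified copy
}

# The function works on a copy, so assign the result back
acs10 <- rename_safe(acs10, labelName)
names(acs10)
```

Only the positions flagged in hit are overwritten, so other keeps its name.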
This should do the whole thing in one line.
colnames(acs10)[colnames(acs10) %in% labelName$column_1] <- labelName$column_2[match(colnames(acs10)[colnames(acs10) %in% labelName$column_1], labelName$column_1)]
This also works when a column name isn't in the data dictionary, but it's a bit more convoluted:
library(tibble)
df <- tribble(~column_1,~column_2,
"oldLabel1", "newLabel1",
"oldLabel2", "newLabel2")
d <- tibble(oldLabel1 = NA, oldLabel2 = NA, oldLabel3 = NA)
fun <- function(dat, dict) {
names(dat) <- sapply(names(dat), function(x) ifelse(x %in% dict$column_1, dict[dict$column_1 == x,]$column_2, x))
dat
}
fun(d, df)
You can create a function containing just one line of code.
renameDF <- function(df, varName){
setNames(df,varName[[2]][pmatch(names(df),varName[[1]])])
}
I have a few data frames named df_JANUARY 2020, df_FEBRUARY 2020, etc. (I know spaces are bad practice in variable names, but it has to do with a SQL query.) I would like to build a function that iterates through the months of these data frames. The purpose is to have the function (not written below) clean each df the same way.
date <- c("JANUARY 2020", "FEBRUARY 2020")
x <- function(date) {
y <- get(paste0("df_", date))
}
for(i in seq_along(date)) {
z <- date[i]
assign(paste0("dfclean_", date[i]), x(z))
}
The problem is that when I use the get() function, it pushes the whole list through rather than one element at a time. Is there a way to avoid this problem with this methodology, or is there a better way to approach it? Any help is extremely appreciated.
We can convert the matrix to a data.frame and then use $, since matrix columns are extracted with [ rather than $:
x <- function(daten) {
y <- as.data.frame(get(paste0("df_", daten)))
y[grep("Enterprise", y$AcctType), ]
}
for(i in seq_along(date)) {
z <- date[i]
assign(paste0("dfclean_", date[i]), x(z))
}
We can also use mget
lst1 <- mget(paste0("df_", date))
lst1 <- lapply(lst1, function(x) subset(as.data.frame(x),
grepl("Enterprise",AcctType)))
names(lst1) <- sub("_", "clean_", names(lst1))
list2env(lst1, .GlobalEnv)
I know you didn't ask for this, but how about just rename all of the dataframes with _ instead of space?
The first line assigns all of the objects in the global environment with df in the name to be elements of a list named mydfs.
The second line replaces space with _ in the names.
The third line assigns all of the list elements into the global environment.
mydfs <- mget(ls(pattern = "df"), globalenv())
names(mydfs) <- gsub(" ","_",names(mydfs))
list2env(mydfs, envir = globalenv())
Or, option two, you could just use lapply on mydfs.
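Option two might look like the sketch below. The two sample data frames and the clean_one helper are made up for illustration, and the "Enterprise" filter on AcctType is borrowed from the earlier answer as a stand-in for the real cleaning step:

```r
# Hypothetical monthly data frames whose names contain spaces (hence the backticks)
`df_JANUARY 2020` <- data.frame(AcctType = c("Enterprise", "Basic"), n = 1:2)
`df_FEBRUARY 2020` <- data.frame(AcctType = c("Basic", "Enterprise Plus"), n = 3:4)

date <- c("JANUARY 2020", "FEBRUARY 2020")

# Placeholder cleaning step: keep rows whose AcctType mentions "Enterprise"
clean_one <- function(d) d[grepl("Enterprise", d$AcctType), , drop = FALSE]

# Collect the data frames by name, then clean them all in one pass
cleaned <- lapply(mget(paste0("df_", date)), clean_one)

# Give the results syntactic names: dfclean_JANUARY_2020, ...
names(cleaned) <- gsub(" ", "_", sub("^df_", "dfclean_", names(cleaned)))
names(cleaned)
```

The cleaned data frames stay in one list, so a single month can be pulled out with cleaned$dfclean_JANUARY_2020 without creating more global objects.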
I have the same problem as this guy: returning from list to data.frame after lapply
Whilst they solved his specific problem, no one actually answered his original question about how to get dataframes out of a list.
I have a list of data frames:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
And I want to filter/replace etc on them all.
So my function is:
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
And I use lapply to run the function on them all like this:
a = lapply(dfPreList, DoThis)
As the other post stated, these data frames are now stuck in this list (a), and I need a for loop to get them out, which just cannot be the correct way of doing it.
This is my current working way of applying the function to the dataframes and then getting them out:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
dfPreListstr= list('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
a = lapply(dfPreList, DoThis)
for( i in seq_along(dfPreList)){
assign(dfPreListstr[[i]], as.data.frame(a[i]))
}
Is there a way of doing this without having to rely on for loops and string names of the dataframes? I.e. a one-liner with the lapply?
Many thanks for your help
You can assign names to the list and then use list2env.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
list2env(a, .GlobalEnv)
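The names can also be given when the list is built, since lapply keeps the names of its input. A minimal base-R sketch under that assumption; DoThisBase is a simplified stand-in for DoThis (no dplyr or janitor, and no totals row), and the sample data is invented:

```r
# Hypothetical sample data
yearlyFunding <- data.frame(year = 2013:2019, amount = c(1, NA, 3, 4, NA, 6, 7))
yearlyPubs    <- data.frame(year = 2013:2019, amount = c(NA, 2, 3, NA, 5, 6, 7))

# Simplified stand-in for DoThis: filter the years and replace NA with 0
DoThisBase <- function(x) {
  x <- x[x$year >= 2015 & x$year <= 2018, , drop = FALSE]
  x[is.na(x)] <- 0
  x
}

# Name the elements up front; lapply carries the names over to its result
dfPreList <- list(yearlyFunding = yearlyFunding, yearlyPubs = yearlyPubs)
a <- lapply(dfPreList, DoThisBase)

# Write each element back to the global environment under its name
invisible(list2env(a, envir = .GlobalEnv))
```

This overwrites the global yearlyFunding and yearlyPubs with their processed versions, so the list a is worth keeping if the originals are still needed.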
Another way would be to unlist each list element and then convert the content into a data frame. The dimensions must come from the processed element in a, because DoThis changes the number of rows.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
yearlyFunding <- data.frame(matrix(unlist(a$yearlyFunding), nrow = nrow(a$yearlyFunding), ncol = ncol(a$yearlyFunding)))
yearlyPubs <- data.frame(matrix(unlist(a$yearlyPubs), nrow = nrow(a$yearlyPubs), ncol = ncol(a$yearlyPubs)))
yearlyAuthors <- data.frame(matrix(unlist(a$yearlyAuthors), nrow = nrow(a$yearlyAuthors), ncol = ncol(a$yearlyAuthors)))
Since the unlist function returns a vector, we first build a matrix and then convert it to a data frame. Note that this drops the original column names and coerces all columns to a single type.
I discovered that I apparently cannot add rows to a data.frame in place.
The following code is a minimal example which should append a new row to the data.frame every iteration, but it does not append any.
Please note, in reality I have a complex for-loop with a lot of different if-statements and depending on them I want to append new different data to different data frames.
df <- data.frame(value=numeric())
appendRows <- function(n_rows) {
for(i in 1:n_rows) {
print(i)
df <- rbind(df, setNames(i,names(df)))
}
}
appendRows(10) #Does not append any row, whereas "df <- rbind(df, setNames(1,names(df)))" in a single call appends one row.
How can rows be added to a data.frame in place?
Thanks :-)
Don't forget to return your object:
df <- data.frame(value=numeric())
appendRows <- function(n_rows) {
for(i in 1:n_rows) {
print(i)
df <- rbind(df, setNames(i,names(df)))
}
return(df)
}
appendRows(10)
To modify df you have to store it:
df <- appendRows(10)
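For loops with many iterations, one commonly used alternative is to collect the rows in a list and bind them once at the end, rather than calling rbind on every pass. A minimal sketch of that pattern; appendRowsList is a hypothetical variant of the function above:

```r
appendRowsList <- function(n_rows) {
  rows <- vector("list", n_rows)       # pre-allocate one slot per row
  for (i in seq_len(n_rows)) {
    rows[[i]] <- data.frame(value = i) # build each row on its own
  }
  do.call(rbind, rows)                 # bind all rows in a single call
}

# As before, the result has to be stored by the caller
df <- appendRowsList(10)
nrow(df)
```

With several target data frames, the same idea applies with one list per data frame, each filled inside the relevant if-branch.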