Setting row names on multiple data frames with a for loop in R

I have several data frames. I want the first column of each one to become its row names.
I can do it for one data frame this way:
# Rename the row according the value in the 1st column
row.names(df1) <- df1[,1]
# Remove the 1st column
df1 <- df1[,-1]
But I want to do that on several data frames. I tried several strategies, including assign() and get(), but with no success. Here are the two main approaches I tried:
# Getting a list of all my dataframes
my_df <- list.files(path="data")
# 1st strategy, adapting what works for 1 dataframe
for (i in 1:length(files_names)) {
rownames(get(my_df[i])) <- get(my_df[[i]])[,1] # The problem seems to be in this line
my_df[i] <- my_df[i][,-1]
}
# The error is: could not find function "get<-"
# 2nd strategy using assign()
for (i in 1:length(my_df)) {
assign(rownames(get(my_df[[i]])), get(my_df[[i]])[,1]) # The problem seems to be in this line
my_df[i] <- my_df[i][,-1]
}
# The error is : Error in assign(rownames(my_df[i]), get(my_df[[i]])[, 1]) : first argument incorrect
I really don't see what I missed. When I type get(my_df[i]) and get(my_df[[i]])[,1], it works alone in the console...
Thank you very much to those who can help me :)

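You can put the code you already have into a function, read each file, and pass every data frame to that function. Note that list.files() returns file names, not data frames, so each file has to be read first; full.names = TRUE keeps the "data" folder in each path so that read.csv() can find the file.
change_rownames <- function(df1) {
  # Use the first column as row names, then drop that column
  row.names(df1) <- df1[,1]
  df1 <- df1[,-1]
  df1
}
my_df <- list.files(path="data", full.names=TRUE)
list_data <- lapply(my_df, function(x) change_rownames(read.csv(x)))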
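As a self-contained illustration of the approach above, here is a minimal sketch that can be run without any existing files. The temporary folder, the two CSV files and their contents are hypothetical demo data, not part of the question. It also uses drop = FALSE, which is an addition here: without it, a data frame left with a single column would be simplified to a plain vector.

```r
# Hypothetical demo data written to a temporary folder (not from the question).
data_dir <- file.path(tempdir(), "rownames_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = c("a", "b"), x = 1:2),
          file.path(data_dir, "df1.csv"), row.names = FALSE)
write.csv(data.frame(id = c("c", "d"), x = 3:4),
          file.path(data_dir, "df2.csv"), row.names = FALSE)

change_rownames <- function(df) {
  row.names(df) <- df[, 1]
  # drop = FALSE keeps a data frame even when only one column remains
  df[, -1, drop = FALSE]
}

# full.names = TRUE returns paths that read.csv() can open directly
files <- list.files(path = data_dir, pattern = "\\.csv$", full.names = TRUE)
list_data <- lapply(files, function(f) change_rownames(read.csv(f)))

# Name each list element after its file so it can be accessed as list_data$df1
names(list_data) <- tools::file_path_sans_ext(basename(files))
```

Keeping the results in one named list, rather than in separate variables, makes later steps easy to apply to every data frame with another lapply call.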

We can use a looping function such as lapply or purrr::map to go through all the files, then use tibble::column_to_rownames, which simplifies the procedure a lot. No explicit for loop is needed.
library(purrr)
library(dplyr)   # provides the %>% pipe
library(tibble)  # provides column_to_rownames()
map(my_df, ~ .x %>% read.csv() %>% column_to_rownames(var = names(.)[1]))
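If the data frames already exist as objects in the workspace rather than as files, which the get() calls in the question suggest, one possible base R sketch is to collect them with mget(), transform the list, and write the results back with list2env(). The object names df_a and df_b and their contents are hypothetical.

```r
# Hypothetical data frames that already exist in the workspace.
df_a <- data.frame(id = c("r1", "r2"), x = 1:2)
df_b <- data.frame(id = c("r3", "r4"), x = 3:4)

df_names <- c("df_a", "df_b")

# mget() returns a named list holding the objects with these names
dfs <- mget(df_names)

dfs <- lapply(dfs, function(d) {
  rownames(d) <- d[[1]]      # first column becomes the row names
  d[, -1, drop = FALSE]      # remove it, keeping a data frame
})

# Write the modified data frames back under their original names.
# environment() is the current environment (the global one at top level).
list2env(dfs, envir = environment())
```

This avoids constructs like rownames(get(name)) <- value, which R cannot evaluate because there is no replacement function for get().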

Related

Code Breaking When Turned Into Custom Function?

I am putting together a summary table from a larger data frame. I noticed that I was re-using the following code but with different %like% characters:
# This code creates a df of values where the row name matches the character
df <- (data[which(data$`col_name` %like% "Total"),])
df <- df[3:ncol(df)]
df[is.na(df)] <- 0
# This creates a row composed of the sum of each column
for (i in seq_along(df)) {
df[10, i] <- sum(df[i])
}
# This inserts the resulting values into a separate summary table
summary[1, 2:ncol(summary)] <- df[nrow(df),]
To keep the code dry and avoid repetition, I thought it would be best to translate this into a custom function that I could then call with different strings:
create_row <- function(x) {
df <- (data[which(data$`Crop year` %like% as.character(x)),])
df <- df[3:ncol(df)]
df[is.na(df)] <- 0
for (i in seq_along(df)) {
df[10, i] <- sum(df[i])
}
}
# Then populate the summary table as before with the results
total <- create_row("Total")
summary[1, 2:ncol(summary)] <- total[nrow(total),]
However when attempting to run this, it simply returns an empty variable.
Through trial and error, I have found that the line of code causing this is:
df[is.na(df)] <- 0
The code works absolutely fine when run line by line outside of this custom function.
As mentioned in the comments, the function works if you add return(df) at the end. This is needed because a function returns the value of its last evaluated expression, and here that expression is the for loop, which returns NULL (invisibly) rather than df.
Also, as @alan mentioned in the comments, you can use colSums to get the sum of each column directly instead of looping over the columns.
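Both suggestions can be combined in a short sketch. The data frame below is a hypothetical stand-in for the asker's data, and grepl() with fixed = TRUE is swapped in for %like% so that the example needs no extra package. Passing the data as an argument is also an assumption made here for clarity.

```r
# Hypothetical stand-in for the asker's `data` (not from the question).
data <- data.frame(
  `Crop year` = c("Total 2019", "Total 2020", "Other"),
  region      = c("N", "S", "N"),
  wheat       = c(1, NA, 5),
  barley      = c(2, 3, 7),
  check.names = FALSE
)

create_row <- function(data, pattern) {
  # Keep rows whose "Crop year" contains the pattern (plain substring match)
  df <- data[grepl(pattern, data[["Crop year"]], fixed = TRUE), ]
  df <- df[3:ncol(df)]   # drop the two leading label columns
  df[is.na(df)] <- 0
  colSums(df)            # last expression, so this is the return value
}

total <- create_row(data, "Total")
```

Here total is a named numeric vector with one sum per column, which can then be assigned into a row of the summary table.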

Using a loop to create multiple dataframes from a single dataset

Quick question for you. I have the following:
a <- c(1,5,2,3,4,5,3,2,1,3)
b <- c("a","a","f","d","f","c","a","r","a","c")
c <- c(.2,.6,.4,.545,.98,.312,.112,.4,.9,.5)
df <- data.frame(a,b,c)
What I am looking to do is use a for loop to create multiple data frames from the rows, based on the contents of column b (i.e. one data frame for "a", one for "d", and so on).
At the same time, I would also like to name each data frame after the corresponding value from column b (the data frame created from the "a" rows would be named "a").
I tried to make it work based on the answers to "Using a loop to create multiple data frames in R", but I had no luck.
If it helps, I have variables created with levels() and nlevels() to use in the loop to keep it scalable based on how my data changes. Any help would be much appreciated.
Thanks!
This should do:
require(dplyr)
df$b <- as.character(df$b)
col.filters <- unique(df$b)
lapply(seq_along(col.filters), function(x) {
filter(df, b == col.filters[x])
}
) -> list
names(list) <- col.filters
list2env(list, .GlobalEnv)
Naturally, you don't need dplyr to do this. You can just use base syntax:
df$b <- as.character(df$b)
col.filters <- unique(df$b)
lapply(seq_along(col.filters), function(x) {
df[df[, "b"] == col.filters[x], ]
}
) -> list
names(list) <- col.filters
list2env(list, .GlobalEnv)
But I find dplyr much more intuitive.
Cheers
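As an alternative to the two versions above, base R's split() does the grouping and the naming in one call. This is a sketch of a different technique than the one in the answer, using the data from the question.

```r
df <- data.frame(
  a = c(1, 5, 2, 3, 4, 5, 3, 2, 1, 3),
  b = c("a", "a", "f", "d", "f", "c", "a", "r", "a", "c"),
  c = c(.2, .6, .4, .545, .98, .312, .112, .4, .9, .5)
)

# One data frame per distinct value of b; the list is named after those values
by_b <- split(df, df$b)

# Individual pieces are then available as by_b$a, by_b$d, and so on.
# list2env(by_b, .GlobalEnv) would create separate variables if really needed.
```

Keeping the pieces in the list is usually easier to scale than creating one global variable per level, since the list can be passed straight to lapply.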

lapply and dplyr combination to process nested data frames

I have a set of data frames stored as files in a folder, which I want to process for analysis. I first read them inside lapply, then I want to transform some columns and order the rows by group. So most of the time I need to combine dplyr and lapply to process my data faster.
I searched the web and checked some books, but most of the examples are simple ones and do not cover combining these two.
Here is the sample code which I'm using:
files <- mixedsort(dir(pattern = "*.txt",full.names = FALSE)) # to read data
data <- lapply(files,function(x){
tmp <- read.table(file=x, fill=T, sep = "\t", dec=".", header=F,stringsAsFactors=F)
df <- tmp [!grepl(c("AC"),tmp $V1),]
new.df <- select(df, V1:V26)
new.df <- apply(new.df, function(x){ x[11:26] <- x[11:26]/10000;x })
I am getting the following error:
Error in match.fun(FUN) : argument "FUN" is missing, with no default
Here is a reproducible example that resembles my data. Let's say I want to process the 2nd and 3rd columns of dat and group by the let column. When I try to put the fun command below inside the data code above, I get an error. Any guidance will be appreciated.
dat <- lapply(1:3, function(x)data.frame(let=sample(letters,4),a=sort(runif(20,0,10000),decreasing=TRUE), b=sort(runif(20,0,10000),decreasing=TRUE), c=rnorm(20),d=rnorm(20)))
fun <- lapply(dat, function(x){x[2:3] <-x[2:3] /10000; x})
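As mentioned in the comments to your question, the apply call was causing the error: apply expects a MARGIN argument before FUN, so the function was matched to MARGIN and FUN was left missing. However, I don't think apply is what you want anyway, because it coerces your data frame to a matrix.
Using just dplyr syntax, your problem can be solved like this: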
tmp %>%
filter(!grepl("AC",V1)) %>%
select(V1:V26) %>%
mutate(across(V11:V26, ~ .x / 10000))
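To show how a dplyr pipeline fits inside lapply, here is a minimal sketch on simulated data shaped like the reproducible example in the question. It assumes dplyr 1.0 or later for across(); the column choice and the ordering step are illustrative assumptions.

```r
library(dplyr)

set.seed(1)
# Simulated list of data frames, similar to `dat` in the question
dat <- lapply(1:3, function(i) data.frame(
  let = sample(letters[1:4], 20, replace = TRUE),
  a   = runif(20, 0, 10000),
  b   = runif(20, 0, 10000),
  c   = rnorm(20)
))

# The anonymous function receives one data frame at a time
processed <- lapply(dat, function(d) {
  d %>%
    mutate(across(a:b, ~ .x / 10000)) %>%  # rescale columns a and b
    arrange(let, desc(a))                  # order rows within each letter
})
```

The same function body could be placed after read.table() in the file-reading lapply, so that each file is read and processed in a single pass.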

R change column names over multiple data

I'm trying to change column names over multiple data sets. I have tried writing the following function to do this:
# simplified test data #
df1<-as.data.frame(c("M","F"))
colnames(df1)<-"M1"
# my function #
rename_cols<-function(df){
colnames(df)[names(df) == "M1"] <- "sex"
}
rename_cols(df1)
However when testing this function on df1, the column is always called "M1" instead of "sex". How can I correct this?
SOLUTION - THANKS TO DAVID ARENBERG
rename_cols<-function(df){
colnames(df)[names(df) == "M1"] <- "sex"
df
}
df1<-rename_cols(df1)
Here is another solution which gets around the problem of functions operating in a temporary space:
df <- as.data.frame(c("M","F"))
colnames(df) <- "M1"
rename_cols <- function(df) {
colnames(df)[names(df) == "M1"] <<- "sex"
}
> rename_cols(df) # this will operate directly on the 'df' object
> df
sex
1 M
2 F
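Using the global assignment operator <<- makes the name change "stick", because the assignment is applied to the variable named df in the enclosing (here, global) environment rather than to the function's local copy. Note that it therefore always targets the variable called df, whatever object is passed as the argument. Granted, this solution is not ideal because the function has side effects and could potentially do something unwanted. But I feel this is in the spirit of what you were trying to do originally.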
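Since the question is about several data sets, the return-value version scales more naturally than <<-. The following sketch applies it to a named list; the second data frame, the list names, and the old/new parameters are hypothetical additions for illustration.

```r
# Returns a modified copy; the caller decides where to store it
rename_cols <- function(df, old = "M1", new = "sex") {
  colnames(df)[names(df) == old] <- new
  df
}

df1 <- data.frame(M1 = c("M", "F"))
df2 <- data.frame(M1 = c("F", "F"), age = c(30, 41))  # hypothetical second data set

# Apply the same renaming to every data frame in the list
dfs <- lapply(list(df1 = df1, df2 = df2), rename_cols)
```

The originals df1 and df2 are left unchanged; the renamed versions live in dfs.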

Replace a number in dataframe

I have a dataframe in which I occasionally have -1s. I want to replace them with NA. I tried the apply function, but it returns a matrix of characters to me, which is no good:
apply(d,c(1,2), function(x){
if (x == -1){
return (NA)
}else{
return (x)
}
})
I am wrestling with by but I cannot seem to handle it properly. I have got this so far:
s <-by(d,d[,'Q1_I1'], function(x){
for(i in x)
print(i)
})
If I understood correctly, by() passes my data frame to x row by row, and I can iterate through each element of the row with the for loop. I just don't know how to replace the value.
The reason that apply does not work is that it converts a data frame to a matrix and if your data frame has any factors then this will be a character matrix.
You can use lapply instead which will process the data frame one column at a time. This code works:
mydf <- data.frame( x=c(1:10, -1), y=c(-1, 10:1), g=sample(letters,11) )
mydf
mydf[] <- lapply(mydf, function(x) { x[x==-1] <- NA; x})
mydf
As @rawr mentions in the comments, this also works:
mydf[ mydf== -1 ] <- NA
but the documentation (?'[.data.frame') says that this is not recommended due to the conversions.
One big question is how the data frame is being created. If you are reading the data using read.table or related functions then you can just specify the na.strings argument and have the conversion done for you as the data is read in.
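A minimal sketch of the na.strings approach, assuming the data comes from a CSV source. The inline text and the second column name are hypothetical and stand in for a real file.

```r
# Hypothetical CSV content standing in for a file on disk
csv_text <- "Q1_I1,Q1_I2\n3,-1\n-1,5\n2,4"

# Treat both "NA" and "-1" as missing while reading
d <- read.csv(text = csv_text, na.strings = c("NA", "-1"))
```

Because the replacement happens while the data is parsed, the columns keep their numeric type and no later conversion step is needed.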
You can do this fast and transparently with the data.table package.
library(data.table)
# take a standard dataset and convert it to a data.table
mtcars = data.table(mtcars,keep.rownames = TRUE)
# select the rows where gear is 5 and set gear to NA in place
mtcars[gear==5,gear:= NA]
mtcars
