How to assign new data to a variable whose name is stored in another variable in R

I have my file names in all.files in the working directory. I want to read these files in a loop and name each resulting object gsub(".csv","", all.files) for the corresponding file.
all.files <- c("harvestA.csv", "harvestB.csv", "harvestC.csv", "harvestD.csv",
"seedA.csv", "seedB.csv", "seedC.csv", "seedD.csv")
I tried something like the code below, but it doesn't work. What am I missing?
for(i in 1:length(all.files)){
assign(gsub(".csv","", all.files)[i]) <- read.table(
all.files[i],
header = TRUE,
sep = ","
)
}

You could keep them in a named list, as it is not good practice to clutter the environment with a lot of global variables:
list_df <- lapply(all.files, read.csv)
names(list_df) <- sub("\\.csv$", "", all.files)
You can always extract individual dataframes as list_df[["harvestA"]], list_df[["harvestB"]] etc.
If you still need them as separate dataframes:
list2env(list_df, .GlobalEnv)
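A minimal, self-contained sketch of the named-list approach. The temporary folder and the two small CSV files are made up so the code can run anywhere; in practice you would point it at your own working directory.

```r
# Assumption: a temporary folder stands in for the real working directory,
# and the two tiny files below are invented sample data.
data_dir <- file.path(tempdir(), "harvest_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(plot = 1:2, yield = c(3.1, 4.2)),
          file.path(data_dir, "harvestA.csv"), row.names = FALSE)
write.csv(data.frame(plot = 1:2, yield = c(2.7, 3.9)),
          file.path(data_dir, "seedA.csv"), row.names = FALSE)

all.files <- list.files(data_dir, pattern = "\\.csv$")
list_df <- lapply(file.path(data_dir, all.files), read.csv)
# Escaped and anchored: strip only a trailing ".csv"
names(list_df) <- sub("\\.csv$", "", all.files)

str(list_df[["harvestA"]])

# Optional: copy each list element into the global environment
list2env(list_df, envir = .GlobalEnv)
```

Keeping the list as the primary object means you can still loop over all the data frames with lapply() later; list2env() is only needed if something really requires separate objects.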

The . is a regex metacharacter that matches any character, so we can use fixed = TRUE to match a literal dot. Also, in the OP's code there is no need for a second assignment operator (<-) with assign: the second argument of assign is value, which here is the dataset read with read.table.
for(i in 1:length(all.files)){
assign(sub(".csv","", all.files, fixed = TRUE)[i], read.table(
all.files[i],
header = TRUE,
sep = ","
))
}
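To see why the unescaped dot matters, here is a small sketch with a hypothetical file name that contains "csv" twice; the three calls give different results.

```r
# Hypothetical file name chosen so that the regex "." matters
x <- "harvestcsv.csv"

sub(".csv", "", x)                 # regex: "." matches "t", so "tcsv" is removed -> "harves.csv"
sub(".csv", "", x, fixed = TRUE)   # literal ".csv" -> "harvestcsv"
sub("\\.csv$", "", x)              # escaped dot, anchored at the end -> "harvestcsv"
```

With ordinary names like "harvestA.csv" all three happen to agree, which is why the bug is easy to miss.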

An option using strsplit
for (i in seq_along(all.files)) {
assign(x = strsplit(all.files[i],"\\.")[[1]][1],
value = read.csv(all.files[i]),
pos = .GlobalEnv)
}
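One caveat with the strsplit approach: it keeps only the text before the first dot. A short sketch with a hypothetical name containing extra dots shows the difference from stripping just the extension (tools::file_path_sans_ext() ships with base R).

```r
# Hypothetical file name with more than one dot
f <- "seed.batch2.csv"

strsplit(f, "\\.")[[1]][1]      # "seed": everything after the first dot is lost
sub("\\.csv$", "", f)           # "seed.batch2"
tools::file_path_sans_ext(f)    # "seed.batch2"
```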

Related

How to name dataframe in for loop

I'm trying to refer to a dataframe whose name starts with a number, because it was created in a loop over multiple files. I want to either rename the dataframes in my loop or find a way to refer to a dataframe even though its name starts with a number. Right now, after I run this code:
filenames <- list.files(path = "filepath",pattern = ".*txt")
head(filenames)
names <- substr(filenames,1,22)
for(i in names){
filepath <-file.path("filepath",paste(i,".txt",sep = ""))
assign(i,read.delim(filepath,colClasses = c('character','character','factor','factor'),sep = "\t"))
}
I get a lot of separate dataframes with names like '101_1b1_Al_sc_Meditron.txt'. When I even try to view one of them, R gets confused because the name begins with a number.
Is there a good solution here?
The simplest solution is to reference the original names using backticks.
Example:
`123_mtcars` <- mtcars
View(`123_mtcars`)
If you would prefer to create a naming convention, or just to remove the numbers from each dataframe name, you can do that in your loop and use the new variable in your assign statement.
Example:
filenames <- list.files(path = "filepath",pattern = ".*txt")
head(filenames)
names <- substr(filenames,1,22)
for(i in names){
filepath <-file.path("filepath",paste(i,".txt",sep = ""))
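# gsub removes all digits from i; sub then drops the leading "_" left behind,
# because a name starting with "_" also needs backticks (note that different
# files can end up with the same name, so check for duplicates)
dfName <- sub("^_+", "", gsub("[0-9]", "", i))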
assign(dfName,read.delim(filepath,colClasses = c('character','character','factor','factor'),sep = "\t"))
}
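Another option, sketched here as an alternative rather than part of the answer above, is base R's make.names(), which turns any string into a syntactically valid name by prepending "X" when needed. The example also shows get() reaching an object by its string name. The name is the one from the question; head(mtcars) is only stand-in data.

```r
raw_name <- "101_1b1_Al_sc_Meditron"

make.names(raw_name)          # "X101_1b1_Al_sc_Meditron"
# make.names(names_vector, unique = TRUE) also de-duplicates a whole vector

# Keeping the original name: both lines below reach the same object
assign(raw_name, head(mtcars))
nrow(get(raw_name))
nrow(`101_1b1_Al_sc_Meditron`)
```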
There are 3 solutions I can think of:
1. Keep your code as it is.
If you don't change anything in your code and your dataframes are named like '101_1b1_Al_sc_Meditron', you can view the contents of a dataframe by using backticks. Try it like this:
`101_1b1_Al_sc_Meditron`
2. Change the names of the dataframes.
In your loop change the assign line to
assign(paste0('df_', i), read.delim(filepath,
colClasses = c('character','character','factor','factor'),sep = "\t"))
After running the for loop you'll have dataframes named like df_101_1b1_Al_sc_Meditron, which is a syntactically valid name, so you can access them without any problem.
3. Store data in a list.
Instead of having so many dataframes in the global environment, why not store them in a list? Lists are easier to manage.
list_of_files <- lapply(file.path("filepath", filenames), function(x) read.delim(x,
colClasses = c('character','character','factor','factor'),sep = "\t"))
names(list_of_files) <- sub("\\.txt$", "", filenames)
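A runnable sketch of option 3. The temporary folder replaces "filepath", and the two tab-delimited files with four columns are invented so the example is self-contained.

```r
# Assumption: invented four-column files written to a temporary folder
txt_dir <- file.path(tempdir(), "meditron_demo")
dir.create(txt_dir, showWarnings = FALSE)
sample_df <- data.frame(id = c("a", "b"), word = c("x", "y"),
                        cond = c("c1", "c2"), group = c("g1", "g2"))
for (nm in c("101_1b1_Al_sc_Meditron", "102_1b1_Al_sc_Meditron")) {
  write.table(sample_df, file.path(txt_dir, paste0(nm, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

filenames <- list.files(txt_dir, pattern = "\\.txt$")
list_of_files <- lapply(file.path(txt_dir, filenames), read.delim,
                        colClasses = c("character", "character", "factor", "factor"),
                        sep = "\t")
names(list_of_files) <- sub("\\.txt$", "", filenames)

# A name starting with a digit is fine as a list element name
str(list_of_files[["101_1b1_Al_sc_Meditron"]])
```

Because the names live inside the list, they never have to be valid R variable names, so no backticks or renaming are needed.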

Naming a dataframe like the path

I have a lot of CSV files that need to be standardized. I created a dictionary for doing so, and so far my function looks like this:
inputpath <- ("input")
files<- paste0(inputpath, "/",
list.files(path = inputpath, pattern = '*.gz',
full.names = FALSE))
standardizefunctiontofiles = lapply(files, function(x){
DF <- read_delim(x, delim = "|", na="")
names(DF) <- dictionary$final_name[match(names(DF), dictionary$old_name)]
})
Nonetheless, the issue I have is that when I read the CSVs and turn them into dataframes, they lose their paths, so I can't write each of them as a CSV whose name matches the input name. What I would normally do is:
output_name <- str_replace(x, "input", "output")
write_delim(DF, output_name, delim = "|")
I was thinking that one way of solving this would be to change this step:
DF <- read_delim(x, delim = "|", na="")
so that DF gets the name of the path, but I haven't found any solution for that.
Any ideas on how to solve this so that I can apply a function and write each file as a standardized CSV?
I don't completely understand the question, but as far as I understand, you want to overwrite the CSV files you are reading with new CSV files that contain the modified (and corrected) data frames.
I think you have three alternatives.
Option 1) When reading the data, store both the data frame and its path (as a string) in a list.
This would be something like
file_list <- list()
for (i in seq_along(files)) {
file_list[[i]] <- list(df = read_delim(files[[i]], delim = "|", na = ""),
path = files[[i]])
}
Then, when you write the corrected data frames, you can use the paths in the second element of the list within the list file_list. Note that in order to get the path as a string you will need to do something like file_list[[1]][["path"]]
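An end-to-end sketch of Option 1: read, rename columns through a lookup table, and write to a matching output path. To keep it runnable without extra packages, base R read.delim() and write.table() stand in for readr's read_delim() and write_delim(); the dictionary, folders and file contents are hypothetical, and the files are plain (uncompressed) text.

```r
# Hypothetical folder layout: <root>/input and <root>/output
root <- file.path(tempdir(), "standardize_demo")
dir.create(file.path(root, "input"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "output"), showWarnings = FALSE)
write.table(data.frame(a = 1:2, b = c("x", "y")),
            file.path(root, "input", "file1.csv"),
            sep = "|", quote = FALSE, row.names = FALSE)

# Hypothetical lookup table with the same columns as the question's dictionary
dictionary <- data.frame(old_name = c("a", "b"),
                         final_name = c("id", "label"))

files <- list.files(file.path(root, "input"), full.names = TRUE)
# Each element keeps the data frame together with the path it came from
file_list <- lapply(files, function(p) {
  list(df = read.delim(p, sep = "|", na.strings = ""), path = p)
})

for (item in file_list) {
  df <- item$df
  names(df) <- dictionary$final_name[match(names(df), dictionary$old_name)]
  # Same file name, but in the output folder instead of the input folder
  out_path <- file.path(root, "output", basename(item$path))
  write.table(df, out_path, sep = "|", quote = FALSE, row.names = FALSE,
              na = "")
}
```

Using basename() rather than a string replacement on the whole path avoids accidentally replacing "input" if it also appears elsewhere in the path.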
Option 2) Use assign
for (i in seq_along(files)) {
assign(files[[i]], read_delim(files[[i]], delim = "|", na = ""))
}
Option 3) Use do.call and the fact that <- is a function!
for (i in seq_along(files)) {
do.call("<-", list(files[[i]], read_delim(files[[i]], delim = "|", na = "")))
}
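A small sketch of the do.call trick on its own: when `<-` is called through do.call(), the target name can be passed as a string, even a non-syntactic one such as a path. The path and data frame are hypothetical.

```r
path <- "input/file1.gz"
do.call("<-", list(path, data.frame(x = 1:3)))

# The object then has to be retrieved by its string name or with backticks
nrow(get(path))
nrow(`input/file1.gz`)
```

The same caveat as with assign() applies: objects named after paths are awkward to use afterwards, which is why Option 1 keeps the path as data instead.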
I hope this is useful!!
NB) None of the functions are implemented as efficiently as possible. They just introduce the idea.

R unable to detect that I have more than one column in loaded files

What I want to do is take every file in the subdirectory I am in and shift the column header names one position to the left.
I try to accomplish this by using fread in a for loop:
library(data.table)
## I need to write this script to reorder the column headers which are now apparently out of whack
## I just need to shift them over one
filelist <- list.files(pattern = ".*.txt")
for(i in 1:length(filelist)){
assign(filelist[[i]], fread(filelist[[i]], fill = TRUE))
names(filelist[[i]]) <- c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)")
}
However, I keep getting the following or a variant of the following error message:
Error in names(filelist[[i]]) <- c("RowID", "rsID", "PosID", "Link", "Link.1", :
'names' attribute [8] must be the same length as the vector [1]
This is confusing to me because RStudio loads the files with the correct number of columns, yet the error message seems to imply that there is only one column. I have tried different functions, such as colnames, and I have even tried setting the separator to a quotation mark (my files were generated by another R script that separated the entries with quotation marks), with no luck. In fact, if I set the separator like this:
for(i in 1:length(filelist)){
assign(filelist[[i]], fread(filelist[[i]], sep = "\"", fill = TRUE))
names(filelist[[i]]) <- c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)")
}
I get the following error:
Error in fread(filelist[[i]], sep = "\"", fill = TRUE) :
sep == quote ('"') is not allowed
Any help would be appreciated.
I think the problem is that, despite its name, list.files returns a character vector, not a list, so [ is the natural way to index it ([[ also works on a vector, but it suggests you are dealing with a list). Then, with assign, you create objects that have the same names as the files (not good practice; it would be better to use a list). Then you try to modify the names of the objects you created, but you only pass the character string holding each object's name. To use an object whose name is in a character string, you need get (which is part of why using a list is better than creating a bunch of objects).
To be more explicit, let's say that filelist = c("data1.txt", "data2.txt"). Then, when i = 1, this code: assign(filelist[[i]], fread(filelist[[i]], fill = TRUE)) creates a data table called data1.txt. But your next line, names(filelist[[i]]) <- ... doesn't modify your data table, it modifies the first element of filelist, which is the string "data1.txt", and that string indeed has length 1.
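The diagnosis can be reproduced with base data frames standing in for the fread() output; the file names are the hypothetical ones used above. The second half shows the get/modify/assign round trip needed if you keep the assign approach.

```r
filelist <- c("data1.txt", "data2.txt")
assign(filelist[1], data.frame(a = 1, b = 2))

length(filelist[[1]])   # 1: this is only the string "data1.txt"

# Renaming the object whose name is stored in the string:
# fetch a copy with get(), modify it, and assign it back under the same name
tmp <- get(filelist[1])
names(tmp) <- c("RowID", "rsID")
assign(filelist[1], tmp)
names(get(filelist[1]))
```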
I recommend reading your files into a list instead of using assign to create objects.
filelist <- list.files(pattern = ".*.txt")
datalist <- lapply(filelist, fread, fill = TRUE)
names(datalist) <- filelist
For changing the names, you can use data.table::setnames instead:
for(dt in datalist) setnames(dt, c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)"))
However, fread has a col.names argument, so you can just do it in the read step directly:
my_names <- c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)")
datalist <- lapply(filelist, fread, fill = TRUE, col.names = my_names)
I would also suggest not using "-log10(p)" as a column name - nonstandard column names (with parens and -) are usually more trouble than they are worth.
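To illustrate the point about nonstandard names, here is a sketch that sets column names while reading. It uses base R read.delim() (which accepts col.names and check.names through read.table) instead of fread() so it runs without data.table; the two-column sample file is made up.

```r
my_names <- c("RowID", "-log10(p)")
tmp_file <- tempfile(fileext = ".txt")
writeLines(c("1\t0.5", "2\t1.3"), tmp_file)

# Default check.names = TRUE: names are passed through make.names()
d_default <- read.delim(tmp_file, header = FALSE, col.names = my_names)
names(d_default)   # "-log10(p)" is rewritten into a syntactic name

# check.names = FALSE keeps the name, but it then needs backticks
d_kept <- read.delim(tmp_file, header = FALSE, col.names = my_names,
                     check.names = FALSE)
d_kept$`-log10(p)`
```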
Could you run the following code to have a closer look at what you are putting into filelist?
i <- 1
assign(filelist[[i]], fread(filelist[[i]], fill = TRUE))
print(filelist[[i]])
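I suspect you may want to store each table in a list instead of using the assign statement (filelist is a character vector, so it cannot hold the tables themselves):
datalist <- vector("list", length(filelist))
datalist[[i]] <- fread(filelist[[i]], fill = TRUE)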

numeric fields turning into "char" while using stringsAsFactors = F

I am trying to import a few csv files from a specific folder:
setwd("C://Users//XYZ//Test")
filelist = list.files(pattern = ".*.csv")
datalist = lapply(filelist, FUN=read.delim, sep = ',', header=TRUE,
stringsAsFactors = F)
for (i in 1:length(datalist)){
datalist[[i]]<-cbind(datalist[[i]],filelist[i])
}
Data = do.call("rbind", datalist)
After I run the above code, a few columns are of type character even though they contain numbers. If I don't use stringsAsFactors = F, those fields are read as factors, which turn into missing values when I use as.numeric(as.character()) later on.
Is there any solution so that I can keep some fields as numeric? The fields that I want to be as numeric look like this:
Price.Plan Feature.Charges
$180.00 $6,307.56
$180.00 $5,431.25
Thanks
Values containing $ and , are not recognized as numbers, so with stringsAsFactors = FALSE, read.delim reads those columns as character. To change that, remove the $ and , characters with gsub, convert to numeric, and assign the result back to the particular columns:
money_cols <- c("Price.Plan", "Feature.Charges")
Data[money_cols] <- lapply(Data[money_cols], function(x) as.numeric(gsub("[$,]", "", x)))
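A self-contained check using the two sample rows from the question, typed in by hand via read.csv(text = ...). Only the listed currency columns are converted, so any other character columns are left untouched.

```r
# Sample rows copied from the question; the CSV text itself is an assumption
Data <- read.csv(text = '"Price.Plan","Feature.Charges"
"$180.00","$6,307.56"
"$180.00","$5,431.25"', stringsAsFactors = FALSE)

money_cols <- c("Price.Plan", "Feature.Charges")
# Strip "$" and thousands separators, then convert to numeric
Data[money_cols] <- lapply(Data[money_cols],
                           function(x) as.numeric(gsub("[$,]", "", x)))
str(Data)
```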

Function to read in multiple delimited text files

Using this answer, I have created a function that should read in all the text datasets in a directory:
read.delims = function(dir, sep = "\t"){
# Make a list of all data frames in the "data" folder
list.data = list.files(dir, pattern = "*.(txt|TXT|csv|CSV)")
# Read them in
for (i in 1:length(list.data)) {
assign(list.data[i],
read.delim(paste(dir, list.data[i], sep = "/"),
sep = sep))
}
}
However, even though there are .txt and .csv files in the specified directory, no R objects get created (I'm guessing this happens because I'm using the read.delim within a function). How to correct this?
You can add the envir parameter to your assign call, like this:
read.delims = function(dir, sep = "\t"){
# Make a list of all data frames in the "data" folder
list.data = list.files(dir, pattern = "*.(txt|TXT|csv|CSV)")
# Read them in
for (i in 1:length(list.data)) {
assign(list.data[i],
read.delim(paste(dir, list.data[i], sep = "/"),
sep = sep),
envir=.GlobalEnv)
}
}
This way, your objects are created in the global environment rather than only in the function's own environment.
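A minimal sketch of the difference: without envir, assign() writes into the function's local environment, which disappears when the function returns. The helper and object names are hypothetical.

```r
make_local <- function() {
  assign("demo_local", 1)                  # local to this call only
  invisible(NULL)
}
make_global <- function(envir = .GlobalEnv) {
  assign("demo_global", 1, envir = envir)  # survives the call
  invisible(NULL)
}

make_local()
make_global()
exists("demo_local")    # FALSE
exists("demo_global")   # TRUE
```

Exposing envir as an argument (defaulting to .GlobalEnv here) lets a caller direct the objects elsewhere, for example into new.env(), without editing the function.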
As I said in my comment, objects created inside a function only exist in the function's environment, so it is necessary to return() a value. I don't really see the point in using assign() though, so here it is with a simple for loop, assuming you want your output to be a list of data frames.
Note that I changed the reading function to read.table() for personal convenience. You might want to adjust that.
read.delims <- function(dir, sep = "\t"){
# Make a list of all data frames in the "data" folder
list.data <- list.files(dir, pattern = "*.(txt|TXT|csv|CSV)")
list.out <- vector("list", length(list.data))
# Read them in (seq_along() also copes with an empty directory)
for (i in seq_along(list.data)) {
list.out[[i]] <- read.table(paste(dir, list.data[i], sep = "/"), sep = sep)
}
names(list.out) <- list.data
return(list.out)
}
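You might also want to add a $ to your regular expression (and escape the dot, e.g. "\\.(txt|TXT|csv|CSV)$") so that only names ending in those extensions match.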
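A quick sketch of why anchoring helps, using made-up file names; list.files(pattern = ...) applies the same regular expression to each file name (and has its own ignore.case argument).

```r
files <- c("a.txt", "b.CSV", "notes.txt.bak", "mycsvfile")

# Unanchored (leading "*" dropped): matches anything containing
# one character followed by an extension string
grepl(".(txt|TXT|csv|CSV)", files)

# Escaped dot plus "$": only real .txt/.csv endings, in any case
grepl("\\.(txt|csv)$", files, ignore.case = TRUE)
```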
Cheers.
