Some background for my question: This is an R script that a previous research assistant wrote, but he did not provide any guidance to me on using it for myself. After working through an R textbook, I attempted to use the code on my data files.
What this code is supposed to do is load multiple .csv files, delete certain items/columns from them, and then write the new cleaned .csv files to a specified directory.
Currently, the files are being created in the right directory with the right file name, but the .csv files that are being created are empty.
I am currently getting the following error message:
Warning in
fread(input = paste0("data/", str_match(pattern = "CAFAS|PECFAS",: Starting data input on line 2 and discarding line 1 because it has too few or too many items to be column names or data: (variable names).
This is my code:
library(data.table)
library(magrittr)
library(stringr)
# create a function to delete unnecessary variables from a CAFAS or PECFAS
data set and save the reduced copy
del.items <- function(file){
data <- fread(input = paste0("data/", str_match(pattern = "CAFAS|PECFAS",
string = file) %>% tolower, "/raw/", file), sep = ",", header = TRUE,
na.strings = "", stringsAsFactors = FALSE, skip = 0, colClasses =
"character", data.table = FALSE)
data <- data[-grep(pattern = "^(CA|PEC)FAS_E[0-9]+(TR?(Initial|[0-
9]+|Exit)|SP[a-z])_(G|S|Item)[0-9]+$", x = names(data))]
write.csv(data, file = paste0("data/", str_match(pattern = "CAFAS|PECFAS",
string = file) %>% tolower, "/items-del/", sub(pattern = "ExportData_", x =
file, replacement = "")) %>% tolower, row.names = FALSE)
}
# delete items from all cafas data sets
cafas.files <- list.files("data/cafas/raw", pattern = ".csv")
for (file in cafas.files){
del.items(file)
}
# delete items from all pecfas data sets
pecfas.files <- list.files("data/pecfas/raw", pattern = ".csv")
for (file in pecfas.files){
del.items(file)
}
Related
I have a list of 50 text files all beginning with NEW.
I want to loop through each textfile/dataframe and run some function and then output the results via the write.table function. Therefore for each file, a function is applied and then an output should be created containing the original name with output at the end.
Here is my code.
fileNames <- Sys.glob("*NEW.*")
for (fileName in fileNames) {
df <- read.table(fileName, header = TRUE)
FUNCTION (not shown as this works)
...
result <-print(chr1$results) #for each file a result would be printed.
write.table(result, file = paste0(fileName,"_output.txt"), quote = F, sep = "\t", row.names = F, col.names = T)
#for each file a new separate file is created with the original output name retained.
}
However, I only get one output rather than 50 output files. It seems like its only looping through one file. What am I doing wrong?
readme <- function(folder_name = "my_texts"){
file_list <- list.files(path = folder_name, pattern = "*.txt",
recursive = TRUE, full.names = TRUE).
#list files with .txt ending
textdata <- lapply(file_list, function(x) {.
paste(readLines(x), collapse=" ").
}).
#apply readlines over the file list.
data.table::setattr(textdata, "names", file_list) .
#add names attribute to textdata from file_list.
lapply(names(file_list), function(x){.
lapply(names(file_list[[x]]), function(y) setattr(DT[[x]], y,
file_list[[x]][[y]])).
}).
#set names attribute over the list.
df1 <- data.frame(doc_id = rep(names(textdata), lengths(textdata)),
doc_text = unlist(textdata), row.names = NULL).
#convert to dataframe where names attribute is doc_id and textdata is text.
return(df1).
}
I have a folder containing several .csv files. I need to delete the first three rows and the very last row of all those .csv files and then save them all as .txt. All the files have the same format so it's always the same rows that I would need to delete.
I know how to modify a single dataframe but I do not know how to load, modify and save as txt several dataframes.
I am a beginner using R so I do not have examples of things I have tried yet.
Any help will be really appreciated!
It's hard to start with stack overflow but the other comments about reproducible examples are worth thinking about for the future. My suggestion would be to write a function that reads, modifies, and writes and then loop it across all the files.
I can't tell exactly how to do this as I can't see your data but something like this should work:
library('tidyverse')
old_paths = list.files(
path = your_folder,
pattern = '\\.csv$',
full.names = TRUE
)
read_write = function(path){
new_filename = str_replace(
string = path,
pattern = '\\.csv$',
replacement = '.txt'
)
read_csv(path) %>%
slice(-(1:3)) %>%
slice(-n()) %>%
write_tsv(new_filename) %>%
invisible()
}
lapply(old_paths, read_write)
Let's do this for one data frame, only referencing its file name
input_file = "my_data_1.csv"
data = read.csv(input_file)
# modify
data = data[-(1:3), ] # delete first 3 rows
data = data[-nrow(data), ] # delete last row
# save as .txt
output_file = sub("csv$", "txt", input_file)
write.table(x = data, file = output_file, sep = "\t", row.names = FALSE)
Now we can turn it into a function taking the file name as an argument:
my_txt_convert = function(input_file) {
data = read.csv(input_file)
# modify
data = data[-(1:3), ] # delete first 3 rows
data = data[-nrow(data), ] # delete last row
# save as .txt
output_file = sub("csv$", "txt", input_file)
write.table(x = data, file = output_file, sep = "\t", row.names = FALSE)
}
Then we call the function on all your files:
to_convert = list.files(pattern='.*.csv')
for (file in to_convert) {
my_txt_convert(file)
}
# or
lapply(to_convert, my_txt_convert)
I have multiple EEG data files in .txt format all saved in a single folder, and I would like R to read all the files in said folder, add column headings (i.e., electrode numbers denoted by ordered numbers from 1 to 129) to every file, and overwrite old files with new ones.
rm(list=ls())
setwd("C:/path/to/directory")
files <- Sys.glob("*.txt")
for (file in files){
# read data:
df <- read.delim(file, header = TRUE, sep = ",")
# add header to every file:
colnames(df) <- paste("electrode", 1:129, sep = "")
# overwrite old text files with new text files:
write.table(df, file, append = FALSE, quote = FALSE, sep = ",", row.names = FALSE, col.names = TRUE)
}
I expect the column headings of ordered numbers (i.e., electrode1 to electrode129) to appear on first row in every text file but the code doesn't seem to work.
I bet the solution is ridiculously simple, but I just haven't found any useful information regarding this issue...
Try this one
for (file in files) {
df = read.delim(file,header = FALSE,sep = ",")
colnames(df) = paste("electrode",1:129,sep = "")
write.table(df, file = "my_data.txt", sep = ",")
}
Some background for my question: This is an R script that a previous research assistant wrote, but he did not provide any guidance to me on using it for myself. After working through an R textbook, I attempted to use the code on my data files.
What this code is supposed to do is load multiple .csv files, delete certain items/columns from them, and then write the new cleaned .csv files to a specified directory.
When I run my code, I don't get any errors, but the code isn't going anything. I originally thought that this was a problem with file permissions, but I'm still having the problem after changing them. Not sure what to try next.
Here's the code:
library(data.table)
library(magrittr)
library(stringr)
# create a function to delete unnecessary variables from a CAFAS or PECFAS
data set and save the reduced copy
del.items <- function(file)
{
data <- read.csv(input = paste0("../data/pecfas|cafas/raw",
str_match(pattern = "cafas|pecfas", string = file) %>% tolower, "/raw/",
file), sep = ",", header = TRUE, na.strings = "", stringsAsFactors = FALSE,
skip = 0, colClasses = "character", data.table = FALSE)
data <- data[-grep(pattern = "^(CA|PEC)FAS_E[0-9]+(T(Initial|[0-
9]+|Exit)|SP[a-z])_(G|S|Item)[0-9]+$", x = names(data))]
write.csv(data, file = paste0("../data/pecfas|cafas/items-del",
str_match(pattern = "cafas|pecfas", string = file) %>% tolower, "/items-
del/", sub(pattern = "ExportData_", x = file, replacement = "")) %>%
tolower, sep = ",", row.names = FALSE, col.names = TRUE)
}
# delete items from all cafas data sets
cafas.files <- list.files("../data/cafas/raw/", pattern = ".csv")
for (file in cafas.files){
del.items(file)
}
# delete items from all pecfas data sets
pecfas.files <- list.files("../data/pecfas/raw/", pattern = ".csv")
for (file in pecfas.files){
del.items(file)
}
I am working on combining csv files into one large csv file that will not be able to fit into my machine's RAM. Is there anyway to go about doing that in R? I realize that I could load each individual csv file into R and append the file to an existing database table but for quirky reasons I'm looking to end up with a large csv file.
Try to read each csv file one by one and write out with write.table and option append = T.
Something like this:
read one csv file;
write.table(..., append = T) to the final csv file;
remove the table with rm();
gc().
Repeate until all files are written out.
You can use the option append = TRUE
first <- data.frame(x = c(1,2), y = c(10,20))
second <- data.frame(c(3,4), c(30,40))
write.table(first, "file.csv", sep = ",", row.names = FALSE)
write.table(second, "file.csv", append = TRUE, sep = ",", row.names = FALSE, col.names = FALSE)
First create 3 test files and then create a variable Files containing their names. We used Sys.glob to do get the vector of file names but you may need to modify this statement. Then define outFile as the name of the output file. For each component of Files read in the file with that name and write it out. If it is the first file then write it all out and if it is a subsequent file write it all except for the header being sure to use append = TRUE. Note that L is overwritten each time a file is read in so that only one file takes up space at a time.
# create test files using built in data frame BOD
write.csv(BOD, "BOD1.csv", row.names = FALSE)
write.csv(BOD, "BOD2.csv", row.names = FALSE)
write.csv(BOD, "BOD3.csv", row.names = FALSE)
Files <- Sys.glob("BOD*.csv") # modify as appropriate
outFile <- "out.csv"
for(f in Files) {
L <- readLines(f)
if (f == Files[1]) cat(L, file = outFile, sep = "\n")
else cat(L[-1], file = outFile, sep = "\n", append = TRUE)
}
# check that the output file was written properly
file.show(outFile)
The loop could alternately be replaced with this:
for(f in Files) {
d <- read.csv(f)
first <- f == Files[1]
write.table(d, outFile, sep = ",", row.names = FALSE, col.names = first, append = !first)
}