Error in reading multiple text files from a directory in R

I would like to read multiple text files from my directory. The files are named in the following format:
regional_vol_GM_atlas1.txt
regional_vol_GM_atlas2.txt
........
regional_vol_GM_atlas152.txt
Data in the files looks like the following:
667869 667869
580083 580083
316133 316133
3631 3631
Following is the script that I have written:
library(readr)
library(stringr)
library(data.table)
array <- c()
for (file in dir("/media/dev/Daten/Task1/subject1/t1")) # path to the directory where the .txt files are located
{
  row4 <- read.table(file = list.files(pattern = "regional_vol*.txt"),
                     header = FALSE,
                     row.names = NULL,
                     skip = 3,  # skip the first 3 rows
                     nrows = 1, # read only the next row after skipping the first 3 rows
                     sep = "\t") # change the separator if it is not "\t"
  array <- cbind(array, row4)
}
I am getting the following error:
Error in file(file, "rt") : invalid 'description' argument
Kindly suggest where I went wrong in the script.

This seems to work fine for me. Make changes as per the code comments in case the files have headers:
[Answer Edited to reflect new information posted by OP]
# rm(list=ls()) #clean memory if you can afford to
mydir <- "~/Desktop/a" # change as per your path
# read full paths
myfiles <- list.files(mydir, pattern = "regional_vol*", full.names = TRUE)
myfiles # check that the files are listed correctly
# initialise the dataframe from first file
# change header =T/F depending on presence of header
# make sure sep is correct
df <- read.csv(myfiles[1], header = F, skip = 0, nrows = 4, sep = "")[-c(1:3), ]
# check that the fourth row was read correctly
df
# read all the other files and update the dataframe
# we read 4 lines, then drop the first 3 to keep only the fourth
ans <- lapply(myfiles[-1], function(x){ read.csv(x, header = F, skip = 0, nrows = 4, sep = "")[-c(1:3), ] })
ans
# update the dataframe
lapply(ans, function(x){ df <<- rbind(df, x) })
# this should be the required dataframe
df
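As a side note, the <<- update can be avoided entirely. A minimal sketch that builds the same dataframe in one pass with do.call, using the same read parameters as above:
# stack row 4 of every file without modifying df in the enclosing scope
df <- do.call(rbind, lapply(myfiles, function(x) {
  read.csv(x, header = F, skip = 0, nrows = 4, sep = "")[-c(1:3), ]
}))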
Also, if you are on Linux, a much simpler method would be to simply let the OS do it for you:
awk 'FNR == 4' regional_vol*.txt
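If you want that output back in R, one option is to capture it with system(); this assumes awk is available and R's working directory contains the files:
# capture the fourth line of each file, then parse into a data frame
row4 <- system("awk 'FNR == 4' regional_vol*.txt", intern = TRUE)
df <- read.table(text = row4)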

This should do it for you.
# set the working directory (where files are saved)
setwd("C:/Users/your_path_here/Desktop/")
file_names = list.files(getwd())
file_names = file_names[grepl(".TXT",file_names)]
# print file_names vector
file_names
# read one file, just for testing
# file = read.csv("C:/Users/your_path_here/Desktop/regional_vol_GM_atlas1.txt", header=F, stringsAsFactors=F)
# see the data structure
# str(file)
# run read.csv on all values of file_names
files = lapply(file_names, read.csv, header=F, stringsAsFactors = F)
files = do.call(rbind,files)
# set column names
names(files) = c("field1", "field2", "field3", "field4", "field5")
str(files)
write.table(files, "C:/Users/your_path_here/Desktop/mydata.txt", sep="\t")
write.csv(files,"C:/Users/your_path_here/Desktop/mydata.csv")
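One caveat on the listing step above: grepl(".TXT", file_names) is case-sensitive and the unescaped dot matches any character. A more robust sketch, assuming the files may be named .txt or .TXT:
file_names = list.files(getwd(), pattern = "\\.txt$", ignore.case = TRUE)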

Related

Converting text files to excel files in R

I have radiotelemetry data that is downloaded as a series of text files. I was provided with code in 2018 that looped through all the text files and converted them into CSV files. Up until 2021 this code worked. However, now the below code (specifically the lapply loop) returns the following error:
"Error in setnames(x, value) :
Can't assign 1 names to a 4 column data.table"
# set the working directory to the folder that contain this script, must run in RStudio
setwd(dirname(rstudioapi::callFun("getActiveDocumentContext")$path))
# get the path to the master data folder
path_to_data <- paste(getwd(), "data", sep = "/", collapse = NULL)
# extract .TXT file
files <- list.files(path=path_to_data, pattern="*.TXT", full.names=TRUE, recursive=TRUE)
# regular expression of the record we want
regex <- "^\\d*\\/\\d*\\/\\d*\\s*\\d*:\\d*:\\d*\\s*\\d*\\s*\\d*\\s*\\d*\\s*\\d*"
# vector of column names, no whitespace
columns <- c("Date", "Time", "Channel", "TagID", "Antenna", "Power")
# loop through all .TXT files, extract valid records and save to .csv files
lapply(files, function(file){
  df <- read_table(file) # read the .TXT file to a DataFrame
  dt <- data.table(df) # convert the dataframe to a more efficient data structure
  colnames(dt) <- c("columns") # modify the column name
  valid <- dt %>% filter(str_detect(col, regex)) # filter based on regular expression
  valid <- separate(valid, col, into = columns, sep = "\\s+") # split into columns
  towner_name <- str_sub(basename(file), start = 1, end = 2) # extract tower name
  valid$Tower <- rep(towner_name, nrow(valid)) # add Tower column
  file_path <- file.path(dirname(file), paste(str_sub(basename(file), end = -5), ".csv", sep=""))
  write.csv(valid, file = file_path, row.names = FALSE, quote = FALSE) # save to .csv
})
I looked up possible fixes for this and found that using setnames(skip_absent = TRUE) in the loop resolved the setnames error, but it instead gave the error "Error in is.data.frame(x) : argument "x" is missing, with no default":
lapply(files, function(file){
  df <- read_table(file) # read the .TXT file to a DataFrame
  dt <- data.table(df) # convert the dataframe to a more efficient data structure
  setnames(skip_absent = TRUE)
  colnames(dt) <- c("col") # modify the column name
  valid <- dt %>% filter(str_detect(col, regex)) # filter based on regular expression
  valid <- separate(valid, col, into = columns, sep = "\\s+") # split into columns
  towner_name <- str_sub(basename(file), start = 1, end = 2) # extract tower name
  valid$Tower <- rep(towner_name, nrow(valid)) # add Tower column
  file_path <- file.path(dirname(file), paste(str_sub(basename(file), end = -5), ".csv", sep=""))
  write.csv(valid, file = file_path, row.names = FALSE, quote = FALSE) # save to .csv
})
I'm confused as to why this code is no longer working despite working fine last year. Any help would be greatly appreciated!
The error occurred at the line colnames(dt) <- c("columns"), where you provided only one value to rename the (supposedly) 4-column data.table. If you meant to rename a particular column, you can do
colnames(dt)[i] <- "columns"
where i is the index of the column you are renaming. Alternatively, provide a vector with 4 new names.
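For illustration, a minimal data.table sketch (the table and names here are placeholders, not the real data):
library(data.table)
dt <- data.table(V1 = 1, V2 = 2, V3 = 3, V4 = 4) # stand-in for the parsed file
setnames(dt, c("Date", "Time", "Channel", "TagID")) # rename all four columns at once
setnames(dt, 1, "Timestamp") # or rename a single column by position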

How to write a .dbf file

I'm encountering an issue using the below script. Everything works fine except for the final line, which results in the error below.
# read dbf
library(foreign)
setwd("C:/Users/JGGliban/Desktop/Work/ADMIN/Other Stream/PH")
# Combine multiple dbf files
# library('tidyverse')
# List all files ending with dbf in directory
dbf_files <- list.files(pattern = c("*.DBF","*.dbf"), full.names = TRUE)
# Read each dbf file into a list
dbf_list <- lapply(dbf_files, read.dbf, as.is = FALSE)
# Concatenate the data in each dbf file into one combined data frame
data <- do.call(rbind, dbf_list)
# Write dbf file - max_nchar is the maximum number of characters allowed in a character field. Anything beyond the max is truncated.
x <- write.dbf(data, file, factor2char = TRUE, max_nchar = 254)
Code modified to:
x <- write.dbf(data, "file.dbf", factor2char = TRUE, max_nchar = 254)
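If you prefer to build the output path explicitly rather than hard-coding it, a small sketch (the output file name is an assumption):
# "combined.dbf" is a hypothetical name; write.dbf needs a character path, not an unquoted object
out_file <- file.path(getwd(), "combined.dbf")
x <- write.dbf(data, out_file, factor2char = TRUE, max_nchar = 254)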

R script for extracting rows from several text files

I have 900 text files in my directory. Each file consists of data in the following format:
667869 667869.000000
580083 580083.000000
316133 316133.000000
11065 11065.000000
I would like to extract the fourth row from each text file and store the values in an array. Any suggestions are welcome.
This sounds more like a StackOverflow question, similar to
Importing multiple .csv files into R
You can try something like:
setwd("/path/to/files")
files <- list.files(path = getwd(), recursive = FALSE)
head(files)
myfiles = lapply(files, function(x) read.csv(file = x, header = TRUE))
mydata = lapply(myfiles, FUN = function(df){df[4,]})
str(mydata)
do.call(rbind, mydata)
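Since the sample data is whitespace-separated with no header (so read.csv's defaults may not match), a leaner sketch using scan() that reads only the fourth line of each file:
# skip 3 lines, read 1; returns a list with one numeric row per file
row4 <- lapply(files, function(x) scan(x, what = numeric(), skip = 3, nlines = 1, quiet = TRUE))
result <- do.call(rbind, row4)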
A lazy answer is:
array <- c()
for (file in dir()) {
  row4 <- read.table(file,
                     header = FALSE,
                     row.names = NULL,
                     skip = 3,  # Skip the 1st 3 rows
                     nrows = 1, # Read only the next row after skipping the 1st 3 rows
                     sep = "\t") # change the separator if it is not "\t"
  array <- cbind(array, row4)
}
You can further keep the names of the files:
colnames(array) <- dir()

'rbind' and 'lapply' work just fine, unless I need to append the filename

I'm reading some csv files to create a dataframe, and append an additional column to it with the file name, using the following code:
wd <- "Working directory"
file_list <- list.files(wd)
### Function: read data ###
read_data <- function(file){
  d <- read.csv(paste(wd,file,sep=""), stringsAsFactors = FALSE, strip.white = TRUE, na.strings = c("NA","")) # read in every file in the working directory
  d$FileName <- substr(file,20,29) # append part of file name
  d # return the dataframe
}
### Call rbind: merge data ###
df <- do.call(rbind, lapply(file_list,read_data))
But this error comes up:
Error in `$<-.data.frame`(`*tmp*`, "FileName", value = "2016010209") :
replacement has 1 row, data has 0
What am I doing wrong?
Cheers
Always check the data! The file was corrupted, therefore empty!
Thanks for your help!
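For anyone hitting the same error: a defensive variant of read_data that returns NULL for empty files, which do.call(rbind, ...) then drops silently. A sketch under the same assumptions as the code above:
read_data <- function(file){
  d <- read.csv(paste(wd, file, sep = ""), stringsAsFactors = FALSE,
                strip.white = TRUE, na.strings = c("NA", ""))
  if (nrow(d) == 0) return(NULL) # skip corrupted/empty files
  d$FileName <- substr(file, 20, 29)
  d
}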

R data.table: using fread on all .csv files in folder skipping the last line of each

I have hundreds of .csv files I need to read in using fread and save as one data table. The basic structure is the same for each .csv. There is header info that needs to be skipped (easy using skip = ). I am having difficulty with skipping the last line of each .csv file. Each .csv file has a different number of rows.
If I have only one file in the Test folder, this script perfectly skips the first rows (using skip = ) and the last row (using nrows = ):
file <- list.files("Q:/Test/", full.names=TRUE)
all <- fread(file, skip = 7, select = c(1:7,9),
             nrows = length(readLines(file))-9)
When saving multiple files in the Test folder, this is the code I tried:
file <- list.files("Q:/Test/", full.names=TRUE)
L <- lapply(file, fread, skip = 7, select = c(1:7,9),
            nrows = length(readLines(file))-9)
dt <- rbindlist(L)
It doesn't create L and gives me this error:
Error in file(con, "r") : invalid 'description' argument
Any ideas on how to skip the last row of each .csv when each .csv has a different number of rows?
I am using data.table version 1.9.6. Thanks.
It's a bit late, but here's what worked for me:
library(data.table)
fnames <- dir("path", pattern = "csv")
read_data <- function(z){
  dat <- fread(z, skip = 1, select = 1)
  return(dat[1:(nrow(dat)-1),])
}
datalist <- lapply(fnames, read_data)
bigdata <- rbindlist(datalist, use.names = TRUE)
Here path refers to the directory you're looking in. I'm assuming the column names are the same across all files; if not, you can always set new names on bigdata with names(). Hope this helps!
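For reference, the original error came from passing the whole file vector to readLines(). Moving the nrows computation inside a per-file function also works; a sketch reusing the question's own parameters:
L <- lapply(file, function(f) fread(f, skip = 7, select = c(1:7, 9),
                                    nrows = length(readLines(f)) - 9))
dt <- rbindlist(L)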
