Is there any way to get the number of rows and columns of multiple CSV files in R and save this information to a CSV file? Here is my R code:
#Library
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("fs")) install.packages("fs")
#Mentioning Files Location
file_paths <- fs::dir_ls("C:\\Users\\Desktop\\FileCount\\Test")
file_paths[[2]]
#Reading Multiple CSV Files
file_paths %>%
  map(function(path) {
    read_csv(path, col_names = FALSE)
  })
#Counting Number of Rows
lapply(X = file_paths, FUN = function(x) {
  length(count.fields(x))
})
#Counting Number of Columns
lapply(X = file_paths, FUN = function(x) {
  length(ncol(x))
})
#Saving CSV File
write.csv(file_paths,"C:\\Users\\Desktop\\FileCount\\Test\\FileName.csv", row.names = FALSE)
A couple of things are not working:
Counting the number of columns of multiple CSV files
When saving the file, I want to save the file name, number of rows and number of columns. See attached image.
What the output should look like:
I have attached some CSV files for testing: Here
Any help appreciated.
Welcome to SO! Using the tidyverse and data.table, here's a way to do it:
Note: All the .csv files are in my TestStack directory, but you can replace it with your own directory (C:/Users/Desktop/FileCount/Test).
Code:
library(tidyverse)
csv.file <- list.files("TestStack", pattern = "\\.csv$") # Directory with your .csv files
data.frame.output <- data.frame(number_of_cols = NA,
                                number_of_rows = NA,
                                name_of_csv = NA) # The df to be written
MyF <- function(x){
  csv.read.file <- data.table::fread(
    paste("TestStack", x, sep = "/")
  )
  number.of.cols <- ncol(csv.read.file)
  number.of.rows <- nrow(csv.read.file)
  # <<- updates data.frame.output in the global environment
  data.frame.output <<- add_row(data.frame.output,
                                number_of_cols = number.of.cols,
                                number_of_rows = number.of.rows,
                                # "\\.csv$" strips only the extension (a bare ".csv" regex matches any character before "csv")
                                name_of_csv = str_remove(x, "\\.csv$")) %>%
    filter(!is.na(name_of_csv))
}
map(csv.file, MyF)
Output:
number_of_cols number_of_rows name_of_csv
1 3 2150 CH_com
2 2 34968 epci_com_20
3 3 732 g1g4
4 7 161905 RP
I have this output because my TestStack directory had 4 files named CH_com.csv, epci_com_20.csv,...
You can then write the object data.frame.output to a .csv as you wanted: data.table::fwrite(data.frame.output, file = "Output.csv")
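For comparison, here is a base-R-only sketch of the same idea, with no tidyverse or data.table dependency. It also shows why the column count in the question returned nothing: ncol() has to be given the data read from the file, not the file path. The csv_demo folder under tempdir() and the toy files built from the built-in mtcars and iris data sets are assumptions made so the example runs on its own.

```r
# Toy setup: two CSV files in a temporary folder (hypothetical data)
dir_path <- file.path(tempdir(), "csv_demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(mtcars, file.path(dir_path, "cars.csv"), row.names = FALSE)
write.csv(iris, file.path(dir_path, "iris.csv"), row.names = FALSE)

file_paths <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

# ncol() and nrow() need the data, so each file is read first
summary_df <- do.call(rbind, lapply(file_paths, function(path) {
  dat <- read.csv(path)
  data.frame(file_name = basename(path),
             number_of_rows = nrow(dat),
             number_of_cols = ncol(dat))
}))

# Written outside dir_path so it is not picked up on a second run
write.csv(summary_df, file.path(tempdir(), "FileName.csv"), row.names = FALSE)
print(summary_df)
```

Note that nrow() counts data rows only, whereas length(count.fields(x)) in the question also counts the header line.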
library(data.table)
files_map <- "test"  # folder containing the .csv files
files <- list.files(files_map, pattern = "\\.csv$")
output <- data.table::rbindlist(
  lapply(files, function(file) {
    dt <- data.table::fread(paste(files_map, file, sep = "/"))
    # one list per file; rbindlist() stacks them into a single table
    list("number_of_cols" = ncol(dt), "number_of_rows" = nrow(dt), "name_of_csv" = file)
  })
)
data.table::fwrite(output, file = "Filename.csv")
Or use map with a separate function to do the work, without first creating an empty table and then updating it through a global assignment. I see this pattern a lot with apply functions, even though it is not needed at all.
library(purrr)
myF <- function(file) {
  dt <- data.table::fread(paste(files_map, file, sep = "/"))
  data.frame("number_of_cols" = ncol(dt), "number_of_rows" = nrow(dt), "name_of_csv" = file)
}
output <- do.call(rbind, map(files, myF))
Related
I am trying to read the number of rows and columns of several csv files inside a folder.
My code reads all the files but shows 0 rows and 0 columns.
files_map <- "C:/Users/Windows 10/Desktop/dados/planilhas LLS"
files <- list.files(full.names = F)
library(data.table)
output <- data.table::rbindlist(lapply(files, function(file) {
dt <- data.table::fread(paste(files_map, file, sep = " "))
list("number_of_cols" = ncol(dt), "number_of_rows" = nrow(dt), "name_of_file" = file)}))
How could I solve this?
Thanks
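I ran a test on my computer with slightly modified files, and this produces the correct output. You need to change paste to paste0, because you don't want spaces in your file paths, and then add the / between the folder and the file name yourself. Also list the files inside files_map: list.files() called without a path looks in the working directory instead.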
library(data.table)
setwd("~/Desktop")
## make up some random files
fwrite(mtcars, "test_a")
fwrite(mtcars, "test_b")
fwrite(mtcars, "test_c")
files_map <- "~/Desktop"
## list the files in files_map itself, not in the working directory
files <- list.files(files_map, pattern = "^test_")
output <- data.table::rbindlist(lapply(files, function(file) {
  dt <- data.table::fread(paste0(files_map, "/", file))
  list("number_of_cols" = ncol(dt), "number_of_rows" = nrow(dt), "name_of_file" = file)
})
)
number_of_cols number_of_rows name_of_file
1: 11 32 test_a
2: 11 32 test_b
3: 11 32 test_c
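To see why the original paste() call fails, it helps to print the paths it builds. A minimal sketch using the folder from the question and a hypothetical file name a.csv:

```r
files_map <- "C:/Users/Windows 10/Desktop/dados/planilhas LLS"
file_name <- "a.csv"  # hypothetical file name

# sep = " " glues the parts with a space, giving a path that does not exist
bad_path <- paste(files_map, file_name, sep = " ")
# paste0() with an explicit "/" builds the intended path
good_path <- paste0(files_map, "/", file_name)
# file.path() inserts the "/" separator for you
same_path <- file.path(files_map, file_name)

print(bad_path)
print(good_path)
```

file.path() is usually the least error-prone choice, since it adds the separator itself.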
I have radiotelemetry data that is downloaded as a series of text files. I was provided with code in 2018 that looped through all the text files and converted them into CSV files. Up until 2021 this code worked. However, the code below (specifically the lapply loop) now returns the following error:
"Error in setnames(x, value) :
Can't assign 1 names to a 4 column data.table"
# set the working directory to the folder that contain this script, must run in RStudio
setwd(dirname(rstudioapi::callFun("getActiveDocumentContext")$path))
# get the path to the master data folder
path_to_data <- paste(getwd(), "data", sep = "/", collapse = NULL)
# extract .TXT file
files <- list.files(path=path_to_data, pattern="*.TXT", full.names=TRUE, recursive=TRUE)
# regular expression of the record we want
regex <- "^\\d*\\/\\d*\\/\\d*\\s*\\d*:\\d*:\\d*\\s*\\d*\\s*\\d*\\s*\\d*\\s*\\d*"
# vector of column names, no whitespace
columns <- c("Date", "Time", "Channel", "TagID", "Antenna", "Power")
# loop through all .TXT files, extract valid records and save to .csv files
lapply(files, function(x){
df <- read_table(file) # read the .TXT file to a DataFrame
dt <- data.table(df) # convert the dataframe to a more efficient data structure
colnames(dt) <- c("columns") # modify the column name
valid <- dt %>% filter(str_detect(col, regex)) # filter based on regular expression
valid <- separate(valid, col, into = columns, sep = "\\s+") # split into columns
towner_name <- str_sub(basename(file), start = 1 , end = 2) # extract tower name
valid$Tower <- rep(towner_name, nrow(valid)) # add Tower column
file_path <- file.path(dirname(file), paste(str_sub(basename(file), end = -5), ".csv", sep=""))
write.csv(valid, file = file_path, row.names = FALSE, quote = FALSE) # save to .csv
})
I looked up possible fixes and found that using "setnames(skip_absent=TRUE)" in the loop resolved the setnames error, but it produced a new error instead: "Error in is.data.frame(x) : argument "x" is missing, with no default"
lapply(files, function(file){
df <- read_table(file) # read the .TXT file to a DataFrame
dt <- data.table(df) # convert the dataframe to a more efficient data structure
setnames(skip_absent = TRUE)
colnames(dt) <- c("col") # modify the column name
valid <- dt %>% filter(str_detect(col, regex)) # filter based on regular expression
valid <- separate(valid, col, into = columns, sep = "\\s+") # split into columns
towner_name <- str_sub(basename(file), start = 1 , end = 2) # extract tower name
valid$Tower <- rep(towner_name, nrow(valid)) # add Tower column
file_path <- file.path(dirname(file), paste(str_sub(basename(file), end = -5), ".csv", sep=""))
write.csv(valid, file = file_path, row.names = FALSE, quote = FALSE) # save to .csv
})
I'm confused as to why this code no longer works when it worked fine last year. Any help would be greatly appreciated!
The error occurred at the line colnames(dt) <- c("columns"), where you provided only one name for a (supposedly) 4-column data frame. If you meant to rename a particular column, you can use
colnames(dt)[i] <- c("columns")
where i is the index of the column you are renaming. Alternatively, provide a vector of 4 new names.
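A small sketch of the rename options, using a toy 4-column data.table with made-up column names a to d. It also points to the most likely source of the second error: setnames() takes the table as its first argument (x), so calling setnames(skip_absent = TRUE) on its own leaves x missing.

```r
library(data.table)

# Toy 4-column table standing in for the parsed .TXT file (hypothetical data)
dt <- data.table(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

# Rename one column by its index
colnames(dt)[1] <- "col"

# Or supply exactly one name per column
colnames(dt) <- c("Date", "Time", "Channel", "TagID")

# setnames() needs the table as its first argument; skip_absent = TRUE
# ignores any name in `old` that is not present (here "Power")
setnames(dt, old = c("TagID", "Power"), new = c("Tag", "Pwr"), skip_absent = TRUE)

print(names(dt))
```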
I have a number of CSV files (say 50 files): crasha, crashabd, crashd, …
I wrote a function that makes some changes to a single data set and analyzes it. I want the output to have a dynamic name. For example, I want newcrasha, newcrashabd, newcrashd, and … as output csv files. In other words, I want to take the names of the imported files and use them to build the output filenames.
For example:
filenames <- list.files(path = "D:/health/car crash/", pattern = "csv",full.names = TRUE)
analyze <- function(filename) {
# Input is character string of a csv file.
crash <- read.csv(file = filename, header = TRUE)
#merg and summation (crashcounter and NUMBER_INJURED)
newcrash<-crash %>% group_by(COLLISION_DATE) %>% summarise(crashcounter = sum(crashcounter), NUMBER_INJURED = sum(NUMBER_INJURED))
write.csv( newcrash, "D://health//car crash// newcrash.csv", row.names = FALSE)
}
filenames <- filenames[1:50]
for (f in filenames) {
analyze(f)
}
Thank you for any help
Try this, following the suggestion of @mhovd:
library(dplyr)
filename <- list.files(path = "D:/health/car crash/", pattern = "\\.csv$", full.names = TRUE)
analyze <- function(filename) {
  # Input is character string of a csv file.
  crash <- read.csv(file = filename, header = TRUE)
  # merge and summation (crashcounter and NUMBER_INJURED)
  newcrash <- crash %>% group_by(COLLISION_DATE) %>% summarise(crashcounter = sum(crashcounter), NUMBER_INJURED = sum(NUMBER_INJURED))
  # "new" + input file name without its extension, e.g. newcrasha.csv
  new.name <- paste0("D:/health/car crash/new", basename(tools::file_path_sans_ext(filename)), ".csv")
  write.csv(newcrash, file = new.name, row.names = FALSE)
}
lapply(filename[1:50], analyze)
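If the output files should always land next to their inputs, the folder does not need to be hard-coded. A minimal sketch, assuming a hypothetical helper make_output_name() that derives each output path from its input path:

```r
# Hypothetical helper: same folder as the input, prefix + original base name
make_output_name <- function(path, prefix = "new") {
  stem <- basename(tools::file_path_sans_ext(path))  # e.g. "crasha"
  file.path(dirname(path), paste0(prefix, stem, ".csv"))
}

inputs <- c("D:/health/car crash/crasha.csv", "D:/health/car crash/crashabd.csv")
outputs <- make_output_name(inputs)
print(outputs)
```

Inside analyze(), write.csv(newcrash, file = make_output_name(filename), row.names = FALSE) could then replace the paste0() line. The helper is vectorized, so it also works on the whole file list at once.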
I have 6 txt files and I want to combine them into 1 dataframe. I know how to read them all at once and combine them in the default way.
I learned how to do this on this website:
txt_files_ls = list.files(path=mypath, pattern="*.txt")
txt_files_df <- lapply(txt_files_ls, function(x) {read.table(file = x, header = T, sep ="\t")})
# Combine them
combined_df <- do.call("rbind", lapply(txt_files_df, as.data.frame))
What I want to do now is make read.table read the txt files in the order I define, so that after combining them I can label each row with the name of its original txt file. Thank you
You can try this:
txt_files_ls <- list.files(path = mypath, pattern = "\\.txt$", full.names = TRUE)
# The function for reading one file and tagging its rows
read.data <- function(x)
{
  y <- read.table(file = x, header = TRUE, sep = "\t")
  y$var <- basename(x)  # file name without the folder
  return(y)
}
# Read data
txt_files_df <- lapply(txt_files_ls, read.data)
# Combine them
combined_df <- do.call("rbind", txt_files_df)
Where var contains the name of each file.
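The order of the combined rows follows the order of the file vector passed to lapply(), so a fixed, user-defined order only requires listing the files explicitly. A minimal sketch, assuming two toy tab-separated files in a temporary folder and a hypothetical wanted_order vector:

```r
# Toy setup: two tab-separated files in a temporary folder (hypothetical data)
mypath <- file.path(tempdir(), "txt_demo")
dir.create(mypath, showWarnings = FALSE)
write.table(data.frame(a = 1:2, b = c("x", "y")), file.path(mypath, "first.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(a = 3L, b = "z"), file.path(mypath, "second.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# The reading order is set here, not by list.files() (hypothetical order)
wanted_order <- c("second.txt", "first.txt")
paths <- file.path(mypath, wanted_order)

combined_df <- do.call(rbind, lapply(paths, function(p) {
  y <- read.table(p, header = TRUE, sep = "\t")
  y$var <- basename(p)  # label every row with its source file
  y
}))
print(combined_df)
```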
I have 900 text files in my directory, as shown in the figure below.
Each file contains data in the following format:
667869 667869.000000
580083 580083.000000
316133 316133.000000
11065 11065.000000
I would like to extract the fourth row from each text file and store the values in an array. Any suggestions are welcome.
This sounds more like a StackOverflow question, similar to
Importing multiple .csv files into R
You can try something like:
setwd("/path/to/files")
files <- list.files(path = getwd(), recursive = FALSE)
head(files)
# the files have no header row and are whitespace-separated, so use read.table()
myfiles <- lapply(files, function(x) read.table(file = x, header = FALSE))
mydata <- lapply(myfiles, FUN = function(df){df[4,]})
str(mydata)
do.call(rbind, mydata)
A lazy answer is:
rows <- NULL
for (file in dir()) {
  row4 <- read.table(file,
                     header = FALSE,
                     row.names = NULL,
                     skip = 3,   # Skip the 1st 3 rows
                     nrows = 1,  # Read only the next row after skipping the 1st 3 rows
                     sep = "\t") # change the separator if it is not "\t"
  rows <- rbind(rows, row4)  # stack one row per file
}
You can also keep the file names:
rownames(rows) <- dir()
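An alternative sketch reads only the first four lines of each file with readLines(), then parses all the fourth lines in a single read.table() call to get a numeric matrix. The demo folder and the three toy files are made up so the example runs on its own; point list.files() at your own directory instead.

```r
# Toy setup: three small files in a temporary folder (hypothetical data)
dir_path <- file.path(tempdir(), "row4_demo")
dir.create(dir_path, showWarnings = FALSE)
for (i in 1:3) {
  v <- (1:5) * i
  writeLines(sprintf("%d %d.000000", v, v), file.path(dir_path, paste0("file", i, ".txt")))
}

files <- list.files(dir_path, full.names = TRUE)

# Read at most 4 lines per file and keep the 4th (NA if a file is shorter)
line4 <- vapply(files, function(f) readLines(f, n = 4)[4], character(1), USE.NAMES = FALSE)

# Parse all the 4th lines at once; whitespace is the default separator
values <- read.table(text = line4, header = FALSE)
rownames(values) <- basename(files)
m <- as.matrix(values)  # numeric matrix: one row per file
print(m)
```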