Constantly amending dataframe - r

I have a dataframe that is built from a folder into which users place several .csv files. The .csv files always have the same column structure, but they vary in the number of rows. The idea is to make a single dataframe from all of the .csv files. When I use the code below with multiple .csv files, I receive the following error message:
"Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 88, 259"
temp <- list.files(pattern="*.csv", path = dir, full.names = TRUE)
importDM<-lapply(temp, read.csv, header = TRUE)
rawDM <- as.data.frame(importDM)
rawDM$Created.Date <- as.Date(rawDM$Created.Date...Time, format="%d/%m/%Y")
rawDM$Week <- strftime(rawDM$Created.Date,format="%W")
Something that will be an issue down the road as well: I want only the header of the first .csv file to be used, as I believe the code as it stands will lapply the header into the dataframe again with each .csv file added.
Cheers,

Found an answer on a blog elsewhere. Here is the final code:
temp <- list.files(pattern="*.csv", path = dir, full.names = TRUE)
importDM<-do.call("rbind", lapply(temp, read.csv, header = TRUE))
rawDM <- as.data.frame(importDM)
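For readers wondering why this works: as.data.frame() on a list of data frames tries to place them side by side, which fails when the row counts differ, whereas rbind stacks them on top of each other. Each file's header line is consumed by read.csv(header = TRUE), so it should not be repeated in the result. A minimal sketch, assuming a throwaway folder and made-up column names (demo_dir, id and value are not from the original post):

```r
# Hypothetical demo folder holding two CSV files of different lengths
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 4:8, value = 5 * 1:5),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# "\\.csv$" is a regular expression matching names that end in .csv
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into its own data frame, then stack the rows
tables <- lapply(csv_paths, read.csv, header = TRUE)
combined <- do.call(rbind, tables)

str(combined)
```

rbind requires every file to have the same column names, which matches the situation described in the question.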

Related

Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input

I have several csv files (say A.csv, B.csv, ...) in several folders (say F1, F2, ...). I want to read all the files so that A.csv, B.csv, ... of each folder are combined with cbind into one main dataframe per folder. That means I need n dataframes for my n folders, each with a unique name based on the folder name.
I have tried this code to get the list of csv files.
files <- dir("/Users/.../.../...", recursive=TRUE, full.names=TRUE, pattern="\\.csv$")
then created a function:
readFun <- function(x) { df <- read.csv(x)}
then sapply:
sapply(files, readFun)
it returns this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input
I played around with the code a lot, but could not figure out how to debug it.
Any help is highly appreciated.
Also, any hint on how to create a main dataframe for each folder?
Thanks
You could first get all the directory names, then create a function that reads all the files from one directory, combines them, and writes the combined file to the same directory.
all_directories <- dir('top/directory', full.names = TRUE)
bind_files_into_one <- function(file_path) {
  all_files <- list.files(file_path, full.names = TRUE, pattern = "\\.csv$")
  temp <- do.call(cbind, lapply(all_files, read.csv))
  write.csv(temp, paste0(file_path, '/combined.csv'), row.names = FALSE)
}
We can use lapply to apply this function to every directory.
lapply(all_directories, bind_files_into_one)
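The answer above does not address the original error. read.csv usually reports "no lines available in input" when a file is completely empty, so one option is to skip zero-byte files before reading. The sketch below also returns one data frame per folder, named after the folder, instead of writing combined.csv. The helper read_folder, the temporary folder layout and the column names are assumptions for illustration; cbind still requires the files in a folder to have the same number of rows.

```r
# Hypothetical helper: read and cbind all non-empty CSV files in one folder
read_folder <- function(folder) {
  csv_paths <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  # Zero-byte files make read.csv fail, so drop them first
  csv_paths <- csv_paths[file.size(csv_paths) > 0]
  do.call(cbind, lapply(csv_paths, read.csv))
}

# Made-up layout: top_dir/F1 and top_dir/F2, each with A.csv and B.csv
top_dir <- tempfile("folders")
for (folder_name in c("F1", "F2")) {
  folder <- file.path(top_dir, folder_name)
  dir.create(folder, recursive = TRUE)
  write.csv(data.frame(a = 1:2), file.path(folder, "A.csv"), row.names = FALSE)
  write.csv(data.frame(b = 3:4), file.path(folder, "B.csv"), row.names = FALSE)
}
# An empty file that would otherwise trigger the error
file.create(file.path(top_dir, "F1", "empty.csv"))

folders <- list.dirs(top_dir, recursive = FALSE)
per_folder <- setNames(lapply(folders, read_folder), basename(folders))

names(per_folder)
per_folder[["F1"]]
```

Keeping the results in a named list avoids creating n separate variables; each folder's data frame is then available as per_folder[["F1"]], per_folder[["F2"]], and so on.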

How to write a .dbf file

I'm encountering an issue with the script below. Everything works fine except the final line, which results in an error.
# read dbf
library(foreign)
setwd("C:/Users/JGGliban/Desktop/Work/ADMIN/Other Stream/PH")
# Combine multiple dbf files
# library('tidyverse')
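# List all files ending with dbf in directory (pattern is a single regular expression;
# ignore.case = TRUE matches both .DBF and .dbf)
dbf_files <- list.files(pattern = "\\.dbf$", ignore.case = TRUE, full.names = TRUE)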
# Read each dbf file into a list
dbf_list <- lapply(dbf_files, read.dbf, as.is = FALSE)
# Concatenate the data in each dbf file into one combined data frame
data <- do.call(rbind, dbf_list)
# Write dbf file - max_nchar is the maximum number of characters allowed in a character field. Longer values are truncated.
x <- write.dbf(data, file, factor2char = TRUE, max_nchar = 254)
Code modified to:
x <- write.dbf(data, "file.dbf", factor2char = TRUE, max_nchar = 254)
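The likely reason the modified call works: write.dbf expects a character string naming the output file, while the bare symbol file (assuming no variable of that name had been assigned) refers to base R's file() function. A small round-trip sketch, assuming the foreign package is installed; the data frame and the output path are made up for illustration:

```r
library(foreign)

# Made-up data; the factor column is written as text because factor2char = TRUE
demo <- data.frame(id = 1:3,
                   name = factor(c("a", "b", "c")))

# Pass the output location as a character string
out_path <- file.path(tempdir(), "combined.dbf")
write.dbf(demo, out_path, factor2char = TRUE, max_nchar = 254)

# Read the file back to confirm that it was written
round_trip <- read.dbf(out_path, as.is = TRUE)
str(round_trip)
```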

R script for extracting rows from several text files

I have 900 text files in my directory, as seen in the following figure.
Each file consists of data in the following format:
667869 667869.000000
580083 580083.000000
316133 316133.000000
11065 11065.000000
I would like to extract the fourth row from each text file and store the values in an array. Any suggestions are welcome.
This sounds more like a StackOverflow question, similar to
Importing multiple .csv files into R
You can try something like:
setwd("/path/to/files")
files <- list.files(path = getwd(), recursive = FALSE)
head(files)
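# the sample files have no header row and are whitespace-separated, so use read.table with header = FALSE
myfiles = lapply(files, function(x) read.table(file = x, header = FALSE))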
mydata = lapply(myfiles, FUN = function(df){df[4,]})
str(mydata)
do.call(rbind, mydata)
A lazy answer is:
array <- c()
for (file in dir()) {
  row4 <- read.table(file,
                     header = FALSE,
                     row.names = NULL,
                     skip = 3, # Skip the 1st 3 rows
                     nrows = 1, # Read only the next row after skipping the 1st 3 rows
                     sep = "\t") # change the separator if it is not "\t"
  array <- cbind(array, row4)
}
You can further keep the name of the files
colnames(array) <- dir()
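Another way to sketch the same idea is to read only the fourth line of each file with scan and stack the results into a matrix with one row per file. The helper read_fourth_row and the temporary demo files are assumptions for illustration; the first demo file reuses the sample values from the question.

```r
# Hypothetical helper: return the numbers on the 4th line of a text file
read_fourth_row <- function(path) {
  scan(path, what = numeric(), skip = 3, nlines = 1, quiet = TRUE)
}

# Made-up demo files with whitespace-separated values
demo_dir <- tempfile("rows")
dir.create(demo_dir)
writeLines(c("667869 667869.000000",
             "580083 580083.000000",
             "316133 316133.000000",
             "11065 11065.000000"),
           file.path(demo_dir, "file1.txt"))
writeLines(c("1 1.0", "2 2.0", "3 3.0", "4 4.0", "5 5.0"),
           file.path(demo_dir, "file2.txt"))

txt_paths <- list.files(demo_dir, full.names = TRUE)

# One row per file, one column per value on the 4th line
fourth_rows <- do.call(rbind, lapply(txt_paths, read_fourth_row))
rownames(fourth_rows) <- basename(txt_paths)
fourth_rows
```

Because scan skips the first three lines and stops after one, only a single line per file is parsed, which may matter with 900 files.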

R load csv files from folder

I am loading a bunch of csv files simultaneously from a local directory using the following code:
myfiles = do.call(rbind, lapply(files, function(x) read.table(x, stringsAsFactors = FALSE, header = F, fill = T, sep=",", quote=NULL)))
and getting an error message:
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
I suspect that quotes cause this: when I inspect the number of columns in each of the 4 files, I see that file number 3 contains 10 columns (incorrect) and the rest only 9 columns (correct). Looking into the corrupted file, the extra column is definitely caused by quotes that split a column.
Any help appreciated.
Found the answer: the quote parameter should be set to quote ="\""
myfiles = do.call(rbind, lapply(files, function(x) read.table(x, stringsAsFactors = FALSE, header = F, fill = T, sep=",", quote ="\"")))
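To see why the quote setting matters, here is a small sketch with made-up data (the rows below are not from the original files). With quoting disabled, a comma inside a quoted field is treated as a separator and produces an extra column; with quote = "\"" the field stays intact.

```r
# Made-up rows where the second field contains a comma inside double quotes
csv_text <- c('1,"Smith, John",42',
              '2,"Doe, Jane",37')

# Quoting disabled: the embedded comma splits the name into two fields
no_quotes <- read.table(text = csv_text, sep = ",", quote = "",
                        stringsAsFactors = FALSE)

# Double quote recognised: the embedded comma is kept inside the field
with_quotes <- read.table(text = csv_text, sep = ",", quote = "\"",
                          stringsAsFactors = FALSE)

ncol(no_quotes)
ncol(with_quotes)
with_quotes$V2
```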

R find maxima of multiple variables from multiple .CSV files

I have multiple csv's, each containing multiple observations for one participant on several variables. Let's say each csv file looks something like the below, and the name of the file indicates the participant's ID:
data.frame(
  happy = sample(1:20, 10),
  sad = sample(1:20, 10),
  angry = sample(1:20, 10)
)
I found some code in an excellent stackoverflow answer that allows me to access all files saved into a specific folder, calculate the sums of these emotions, and output them into a file:
# access all csv files in the working directory
fileNames <- Sys.glob("*.csv")
for (fileName in fileNames) {
  # read original data:
  sample <- read.csv(fileName,
                     header = TRUE,
                     sep = ",")
  # create new data based on contents of original file:
  data.summary <- data.frame(
    File = fileName,
    happy.sum = sum(sample$happy),
    sad.sum = sum(sample$sad),
    angry.sum = sum(sample$angry))
  # write new data to separate file:
  write.table(data.summary,
              "sample-allSamples.csv",
              append = TRUE,
              sep = ",",
              row.names = FALSE,
              col.names = FALSE)
}
However, I can ONLY get "sum" to work in this function. I would like to not only find the sums of each emotion for each participant, but also the maximum value of each.
When I try to modify the above:
for (fileName in fileNames) {
  # read original data:
  sample <- read.csv(fileName,
                     header = TRUE,
                     sep = ",")
  # create new data based on contents of original file:
  data.summary <- data.frame(
    File = fileName,
    happy.sum = sum(sample$happy),
    happy.max = max(sample$happy),
    sad.sum = sum(sample$sad),
    angry.sum = sum(sample$angry))
  # write new data to separate file:
  write.table(data.summary,
              "sample-allSamples.csv",
              append = TRUE,
              sep = ",",
              row.names = FALSE,
              col.names = FALSE)
}
I get the following warning message:
In max(sample$happy) : no non-missing arguments to max; returning -Inf
Would sincerely appreciate any advice anyone can give me!
Using your test data, the max() statement works fine for me. Could the problem be a discrepancy between the sample code you posted and your actual csv file structure?
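Some background on the warning: max() reports "no non-missing arguments" when it receives an empty vector, for example when a file has no happy column and sample$happy is therefore NULL. sum(NULL) quietly returns 0, which would explain why only sum appeared to work. One possible cause (not confirmed in this thread) is that sample-allSamples.csv, which is written without column names into the same directory, gets picked up by Sys.glob("*.csv") on a later run. A sketch that checks for the expected columns first, with the helper summarise_file and the demo files being assumptions for illustration:

```r
# The difference between the two functions on an empty input
sum(NULL)                    # 0, no warning
suppressWarnings(max(NULL))  # -Inf (with a warning if not suppressed)

# Hypothetical helper: sums and maxima for one participant file
summarise_file <- function(path, columns = c("happy", "sad", "angry")) {
  values <- read.csv(path, header = TRUE)
  missing_cols <- setdiff(columns, names(values))
  if (length(missing_cols) > 0) {
    stop("Missing columns in ", basename(path), ": ",
         paste(missing_cols, collapse = ", "))
  }
  sums <- sapply(values[columns], sum)
  maxima <- sapply(values[columns], max)
  names(sums) <- paste0(columns, ".sum")
  names(maxima) <- paste0(columns, ".max")
  data.frame(File = basename(path), t(c(sums, maxima)))
}

# Made-up participant files in a temporary folder
demo_dir <- tempfile("emotions")
dir.create(demo_dir)
write.csv(data.frame(happy = c(3, 9, 4), sad = c(1, 2, 8), angry = c(7, 7, 1)),
          file.path(demo_dir, "p01.csv"), row.names = FALSE)
write.csv(data.frame(happy = c(5, 2, 6), sad = c(4, 4, 4), angry = c(0, 3, 2)),
          file.path(demo_dir, "p02.csv"), row.names = FALSE)

file_names <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
all_summaries <- do.call(rbind, lapply(file_names, summarise_file))
all_summaries
```

Writing the combined summary to a different folder, or excluding it from the file list, keeps it from being read back in as if it were participant data.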
