Add header and footer to data in write.table() - r

I have several hundred data files and I need to add a header (at the start of the data in a file) and a footer (at the end of the data in a file) to each file in R, like the following:
Header:
line1
line2
line3
Likewise, I have a few lines that I would like to add as a footer at the end of each data file:
footer:
line1
line2
line3
while writing a table in r with write.table(). Can someone suggest a simple solution? Thanks

Perhaps something like this:
lapply( c('dobjt1', 'dobjct2', 'other3'),
function(x) {
name <- paste0( x, ".txt")
write(c(line1,line2,line3), file=name)
out <- get(x) # need to use `get` when working with character values
write.table(out, file=name, append=TRUE)
write(c(line1,line2,line3), file=name, append=TRUE)
})
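The same idea can be wrapped in a small helper that writes the header, the table, and the footer through one open connection, so no append = TRUE bookkeeping is needed. This is only a sketch: the function name write_with_header_footer and its arguments are hypothetical, and the tab separator is an arbitrary choice.

```r
# Hypothetical helper: write header lines, then the table, then footer lines.
write_with_header_footer <- function(df, path, header, footer) {
  con <- file(path, open = "wt")      # one connection for all three parts
  on.exit(close(con))
  writeLines(header, con)
  write.table(df, con, sep = "\t", quote = FALSE, row.names = FALSE)
  writeLines(footer, con)
  invisible(path)
}

# Example with a built-in data set and a temporary file
out_file <- tempfile(fileext = ".txt")
write_with_header_footer(head(mtcars, 3), out_file,
                         header = c("line1", "line2", "line3"),
                         footer = c("end1", "end2", "end3"))
result <- readLines(out_file)
```

To process several hundred files, the helper could be called inside lapply() over a vector of file names.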

Adding steps to @42-'s solution to read a list of files from disk and assign their file names as the data frame names leads to a complete, working solution.
We'll use the Pokémon data from Alex Barradas' Pokémon Stats data set on kaggle.com as our example.
download.file("https://github.com/lgreski/PokemonData/raw/master/pokemonData.zip",
"pokemonData.zip",mode="wb",method="wininet")
unzip("pokemonData.zip")
thePokemonFiles <- list.files("./pokemonData",
full.names=FALSE)[1:3] # subset to first 3 files
pokemonData <- lapply(thePokemonFiles,function(x) {
data <- read.csv(paste("./pokemonData/",x,sep=""))
# set input file name as object name so file list can be used in lapply() for write
assign(x,data,parent.env(environment()))
NULL # null return to avoid duplicating data frames in output list
})
header <- c("header 1","header 2","header 3")
footer <- c("footer 1","footer 2","footer 3")
lapply(thePokemonFiles, function(x) {
name <- paste0(x, ".txt")
write(header, file = name)
write.table(get(x), file = name, append = TRUE)
write(footer, file = name, append = TRUE)
})
Each resulting text file starts with the three header lines, followed by the table written by write.table(), and ends with the three footer lines.

Related

How to work with nested for loops in R with same list?

I am trying to do some R coding for my project, where I have to read some .csv files from one directory in R and assign each one to a data frame named like df_subject1_activity1. I have tried nested loops, but it is not working.
ex:
my dir name is "Test" and i have six .csv files
subject1activity1.csv,
subject1activity2.csv,
subject1activity3.csv,
subject2activity1.csv,
subject2activity2.csv,
subject2activity3.csv
now i want to write code to load this .csv file in R and assign dataframe name as
ex:
subject1activity1 = df_subject1_activity1
subject1activity2 = df_subject1_activity2
.... so on using for loop.
my expected output is:
df_subject1_activity1
df_subject1_activity2
df_subject1_activity3
df_subject2_activity1
df_subject2_activity2
df_subject2_activity3
I have tried the following code:
setwd(dirname(getActiveDocumentContext()$path))
new_path <- getwd()
new_path
data_files <- list.files(pattern=".csv") # Identify file names
data_files
for(i in 1:length(data_files)) {
for(j in 1:4){
assign(paste0("df_subj",i,"_activity",j),
read.csv2(paste0(new_path,"/",data_files[i]),sep=",",header=FALSE))
}
}
I am not getting the desired output.
I am new to R; can anyone please help?
Thanks
One solution is to use the vroom package (https://www.tidyverse.org/blog/2019/05/vroom-1-0-0/), e.g.
library(tidyverse)
library(vroom)
library(fs)
files <- fs::dir_ls(glob = "subject*.csv") # matches subject1activity1.csv etc.
data <- purrr::map(files, ~vroom::vroom(.x))
list2env(data, envir = .GlobalEnv)
# You can also combine all the dataframes if they have the same columns, e.g.
library(data.table)
concat <- data.table::rbindlist(data, fill = TRUE)
You are almost there. As always, if you are unsure, it is never a bad idea to code clearly using more lines.
data_files <- list.files(pattern="\\.csv$", full.names=TRUE) # Identify file names
data_files
for( data_file in data_files) {
## check that the data file matches our expected pattern:
if(!grepl( "subject[0-9]activity[0-9]", basename(data_file) )) {
warning( "skipping file ", basename(data_file) )
next
}
## start creating the variable name from the filename
## remove the .csv extension
var.name <- sub( "\\.csv", "", basename(data_file), ignore.case=TRUE )
## prepend 'df' and introduce underscores:
var.name <- paste0(
"df",
gsub( "(subject|activity)", "_\\1", var.name ) ## this looks for literal 'subject' and 'activity' and, if found, adds an underscore in front of it
)
## now read the file
data.from.file <- read.csv2( data_file, sep=",", header=FALSE ) ## same arguments as in the question
## and assign it to our variable name
assign( var.name, data.from.file )
}
I don't have your files to test with, but should the above fail, you should be able to run the code line by line and easily see where it starts to go wrong.
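An alternative to assign() is to keep all data frames in one named list, which avoids creating variables by name. The following is a minimal sketch under the assumption that the files follow the subjectNactivityM.csv naming from the question and have no header row; the function name read_subject_files and the demo files are made up for illustration.

```r
# Sketch: read every matching CSV into one named list instead of using assign().
read_subject_files <- function(dir = ".") {
  paths <- list.files(dir, pattern = "^subject[0-9]+activity[0-9]+\\.csv$",
                      full.names = TRUE)
  # Turn "subject1activity2.csv" into "df_subject1_activity2"
  nms <- sub("^subject([0-9]+)activity([0-9]+)\\.csv$",
             "df_subject\\1_activity\\2", basename(paths))
  setNames(lapply(paths, read.csv, header = FALSE), nms)
}

# Demo with two small files in a temporary directory
demo_dir <- file.path(tempdir(), "subject_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(a = 1:2, b = 3:4),
            file.path(demo_dir, "subject1activity1.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(data.frame(a = 5:6, b = 7:8),
            file.path(demo_dir, "subject1activity2.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
dfs <- read_subject_files(demo_dir)
names(dfs)
```

Individual tables are then available as dfs$df_subject1_activity1 or dfs[["df_subject1_activity2"]], and list2env(dfs, envir = .GlobalEnv) would still create separate variables if they are really needed.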

rbind txt files from online directory (R)

I am trying to concatenate text files from a URL, but I don't know how to do this with the HTML and the different folders.
This is the code I tried, but it only lists the text files and returns a lot of HTML code. How do I fix this so that I can combine the text files into one csv file?
library(RCurl)
url <- "http://weather.ggy.uga.edu/data/daily/"
dir <- getURL(url, dirlistonly = T)
filenames <- unlist(strsplit(dir,"\n")) #split into filenames
#append the files one after another
for (i in 1:length(filenames)) {
file <- paste0(url,filenames[i]) #concatenate for url
if (i==1){
cp <- read_delim(file, header=F, delim=',')
}
else{
temp <- read_delim(file,header=F,delim=',')
cp <- rbind(cp,temp) #append to existing file
rm(temp)# remove the temporary file
}
}
Here is a code snippet that I got to work for me. I like to use rvest over RCurl, just because that's what I've learned. In this case, I was able to use the html_nodes function to isolate each link ending in .txt. The result table has the times saved as character strings, but you could fix that later. Let me know if you have any questions.
library(rvest)
library(readr)
url <- "http://weather.ggy.uga.edu/data/daily/"
doc <- xml2::read_html(url)
text <- rvest::html_text(rvest::html_nodes(doc, "tr td a:contains('.txt')"))
# define column types of fwf data ("c" = character, "n" = number)
ctypes <- paste0("c", paste0(rep("n",11), collapse = ""))
data <- data.frame()
for (i in 1:2){ # first two files only; use seq_along(text) to read them all
file <- paste0(url, text[i])
date <- as.Date(read_lines(file, n_max = 1), "%m/%d/%y")
# Read file to determine widths
columns <- fwf_empty(file, skip = 3)
# Manually expand `solar` column to be 3 spaces wider
columns$begin[8] <- columns$begin[8] - 3
data <- rbind(data, cbind(date,read_fwf(file, columns,
skip = 3, col_types = ctypes)))
}
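The filtering step (keeping only the links that end in .txt) can also be illustrated without any network access. The sketch below uses base R regular expressions on a made-up fragment of a directory listing; the HTML and the file names are hypothetical and are not taken from the actual site.

```r
# Made-up fragment of a directory listing (file names are hypothetical)
listing <- paste0(
  '<tr><td><a href="?C=N;O=D">Name</a></td></tr>',
  '<tr><td><a href="AAA.txt">AAA.txt</a></td></tr>',
  '<tr><td><a href="BBB.txt">BBB.txt</a></td></tr>',
  '<tr><td><a href="images/">images/</a></td></tr>'
)
# Pull out every href attribute, then strip the href="..." wrapper
hrefs <- regmatches(listing, gregexpr('href="[^"]+"', listing))[[1]]
hrefs <- gsub('^href="|"$', "", hrefs)
# Keep only the entries ending in .txt
txt_files <- hrefs[grepl("\\.txt$", hrefs)]
txt_files
```

For real pages an HTML parser such as rvest is usually more robust than regular expressions, which is why the answer above uses html_nodes.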

Extracting file numbers from file names in r and looping through files

I have a folder full of .txt files that I want to loop through and compress into one data frame, but each .txt file is data for one subject and there are no columns in the text files that indicate subject number or time point in the study (e.g. 1-5). I need to add a line or two of code into my loop that looks for strings of four numbers (i.e. each file is labeled something like: "4325.5_ERN_No_Startle") and just creates a column with 4325 and another column with 5 that will appear for every data point for that subject until the loop gets to the next one. I have been looking for awhile but am still coming up empty, any suggestions?
I also have not quite gotten the loop to work:
path = "/Users/me/Desktop/Event Codes/ERN task/ERN text files transferred"
out.file <- ""
file <- ""
file.names <- dir(path, pattern =".txt")
for(i in 1:length(file.names)){
file <- read.table(file.names[i],header=FALSE, fill = TRUE)
out.file <- rbind(out.file, file)
}
which runs okay until I get this error message part way through:
Error in read.table(file.names[i], header = FALSE, fill = TRUE) :
no lines available in input
Consider using regex to parse the file name for subject and study period, both of which are then bound to the data in an lapply over list.files:
path = "path/to/text/files"
# ANY TXT FILE WITH PATTERN OF 4 DIGITS FOLLOWED BY A PERIOD AND ONE DIGIT
file.names <- list.files(path, pattern="[0-9]{4}\\.[0-9]{1}.*txt", full.names=TRUE)
# IMPORT ALL FILES INTO A LIST OF DATAFRAMES AND BIND THE REGEX EXTRACTS
dfList <- lapply(file.names, function(x) {
# SKIP ZERO-BYTE FILES, WHICH CAUSE "no lines available in input"
if (file.exists(x) && file.size(x) > 0) {
fname <- basename(x) # PARSE THE FILE NAME ONLY, NOT THE DIRECTORY PATH
data.frame(subject=regmatches(fname, regexpr('[0-9]{4}', fname)),
period=regmatches(fname, regexpr('\\.[0-9]{1}', fname)),
read.table(x, header=FALSE, fill=TRUE),
stringsAsFactors = FALSE)
}
})
# COMBINE EACH DATA FRAME INTO ONE
df <- do.call(rbind, dfList)
# REMOVE THE DOT IN PERIOD (NEEDED EARLIER TO LOCATE THE DIGIT)
df$period <- gsub("\\.", "", df$period)
You can try to use tryCatch, which basically gives you a NULL instead of an error.
file <- tryCatch(read.table(file.names[i],header=FALSE, fill = TRUE), error=function(e) NULL)
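Put together, the tryCatch guard and the file-name parsing might look like the sketch below. It assumes file names start with a four-digit subject number, a period, and a one-digit time point (as in "4325.5_ERN_No_Startle"); the helper name read_one and the demo files are invented for illustration.

```r
# Sketch: read each file safely, tag rows with subject and time point, then combine.
read_one <- function(path) {
  dat <- tryCatch(read.table(path, header = FALSE, fill = TRUE),
                  error = function(e) NULL)   # e.g. "no lines available in input"
  if (is.null(dat)) return(NULL)
  # File names are assumed to start like "4325.5_..." (subject.timepoint)
  fname <- basename(path)
  parts <- regmatches(fname, regexec("^([0-9]{4})\\.([0-9])", fname))[[1]]
  data.frame(subject = parts[2], timepoint = parts[3], dat,
             stringsAsFactors = FALSE)
}

# Demo: one file with data and one empty file that would otherwise stop the loop
demo_dir <- file.path(tempdir(), "ern_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1 2", "3 4"), file.path(demo_dir, "4325.5_ERN_No_Startle.txt"))
file.create(file.path(demo_dir, "1111.2_ERN_No_Startle.txt"))  # empty file
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read_one))
```

Files that cannot be read return NULL, and rbind simply ignores those entries, so the empty file contributes no rows.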

R - Dynamic reference to files for read csv

I would like to make a script that reads data from the correct folder. Several lines in my code refer to the folder name, so I would like to make this reference dynamic. Is that possible? See below for what I would like to do:
# Clarifies the name of the folder, afterwards "Foldername" will be used as reference
FolderA <- Foldername
# Read csv to import the data from the selected location
data1 <- read.csv(file="c:/R/Foldername/datafile1.csv", header=TRUE, sep=",")
data2 <- read.csv(file="c:/R/Foldername/datafile2.csv", header=TRUE, sep=",")
I am trying to get the same result as what I would get with this code:
data1 <- read.csv(file="c:/R/FolderA/datafile1.csv", header=TRUE, sep=",")
data2 <- read.csv(file="c:/R/FolderA/datafile2.csv", header=TRUE, sep=",")
Can somebody please clarify how it would be possible to make this dynamic?
You could use paste0 for this:
FolderA <- "Foldername"
paste0("c:/R/", FolderA, "/datafile1.csv")
#[1] "c:/R/Foldername/datafile1.csv"
So in your case:
data1 <- read.csv(file=paste0("c:/R/", FolderA, "/datafile1.csv"), header=TRUE, sep=",")
A slight generalization of @LyzandeR's answer,
make_files <- function(directory, filenames) {
sprintf("C:/R/%s/%s", directory, filenames)
}
##
Files <- sprintf("file%i.csv", 1:3)
##
make_files("FolderA", Files)
#[1] "C:/R/FolderA/file1.csv" "C:/R/FolderA/file2.csv" "C:/R/FolderA/file3.csv"
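Base R's file.path() is another way to build such paths; it inserts the "/" separators itself and is vectorized over its arguments. A small sketch, reusing the folder and file names from the question (the variable names base_dir, folder, and files are only illustrative):

```r
# Sketch: build paths with file.path(), which inserts the "/" separators itself.
base_dir <- "c:/R"            # assumed root, taken from the question
folder   <- "FolderA"         # change this one value to switch folders
files    <- c("datafile1.csv", "datafile2.csv")
paths    <- file.path(base_dir, folder, files)
paths
# Each path can then be passed to read.csv(), e.g. lapply(paths, read.csv)
```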
You could also try the following method. The loop in Method 1 creates a list with one data frame per file, but if your files all have the same columns you can rbind them together instead (Method 2). Either way, you specify your folder and then use list.files to pick up every file with the extension ".csv", so if you have many csv files in a folder you won't have to write them all out individually.
# Specify working directory or location of files:
FolderA = "c:/R/Foldername"
# identify all files with specific extension:
files = list.files(FolderA, pattern="\\.csv$", full.names=TRUE) # full paths, so read.csv works from any working directory
Method 1 - Separate by lists
data = NULL
for(i in 1:length(files)){
data[[i]] = read.csv(files[i],header=F,stringsAsFactors=F)
}
Method 2 - single dataframe
data = NULL
for(i in 1:length(files)){
df = read.csv(files[i],header=F,stringsAsFactors=F)
data = rbind(data,df)
}
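Method 2 can also be written without growing a data frame inside a loop, by reading everything with lapply() and binding once at the end. This is a sketch assuming all files share the same columns and have no header row; read_all_csv and the demo files are hypothetical names.

```r
# Sketch: read all CSVs in a folder and stack them in a single rbind call.
read_all_csv <- function(folder) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv, header = FALSE, stringsAsFactors = FALSE)
  do.call(rbind, tables)     # assumes every file has the same columns
}

# Demo with two small files in a temporary directory
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1,a", "2,b"), file.path(demo_dir, "file1.csv"))
writeLines("3,c", file.path(demo_dir, "file2.csv"))
stacked <- read_all_csv(demo_dir)
```

Binding once avoids copying the accumulated data on every iteration, which matters when there are many files.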

Read in multiple txt files and create a list of it to access each file by accessing the list element in R

Being relatively new to R programming, I am struggling with a huge data set of 16 text files (comma-separated) saved in one directory. All the files have the same number of columns and the same naming convention, for example file_year_2000, file_year_2001, etc. I want to create a list in R where I can access each file individually by accessing the list elements. Searching the web, I found some code and tried the following, but as a result I get one huge list (16.2 MB) where the output is just strange. I would like to have 16 elements in the list, each representing one file read from the directory. I tried the following code, but it does not work as I want:
path = "~/.../.../.../Data_1999-2015"
list.files(path)
file.names <- dir(path, pattern =".txt")
length(file.names)
df_list = list()
for( i in length(file.names)){
file <- read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
year = gsub('[^0-9]', '', file)
df_list[[year]] = file
}
Any suggestions?
Thanks in advance.
Just to give more details
path = "~/.../.../.../Data_1999-2015"
list.files(path)
file.names <- dir(path, pattern =".txt")
length(file.names)
df_list = list()
for(i in seq(length(file.names))){
year = gsub('[^0-9]', '', file.names[i])
df_list[[year]] = read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
}
Maybe it would be worth joining the data frames into one big data frame with an additional column being the year?
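A minimal sketch of that idea follows, assuming comma-separated files with a header row and names like file_year_2000.txt; the function name combine_years and the demo data are made up for illustration.

```r
# Sketch: stack yearly files into one data frame with a `year` column.
combine_years <- function(path) {
  files <- list.files(path, pattern = "\\.txt$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    dat <- read.csv(f, header = TRUE, stringsAsFactors = FALSE)
    # Take the year from the file name only, so digits in the path are ignored
    dat$year <- as.integer(gsub("[^0-9]", "", basename(f)))
    dat
  })
  do.call(rbind, pieces)
}

# Demo: the directory name itself contains digits, hence basename() above
demo_dir <- file.path(tempdir(), "Data_1999-2015")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("id,value", "1,10"), file.path(demo_dir, "file_year_2000.txt"))
writeLines(c("id,value", "2,20"), file.path(demo_dir, "file_year_2001.txt"))
all_years <- combine_years(demo_dir)
```

A single year can then be selected with all_years[all_years$year == 2000, ].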
I assume that instead of "access each file individually" you mean you want to access individually data in each file.
Try something like this (untested):
path = "~/.../.../.../Data_1999-2015"
file.names <- dir(path, pattern =".txt")
df_list = vector("list", length(file.names))
# create a list of data frames with correct length
names(df_list) <- rep("", length(df_list))
# give it empty names to begin with
for( i in seq_along(file.names)) {
# now i = 1,2,...,16
file <- read.csv(file.names[i],header=TRUE, sep=",", stringsAsFactors=FALSE)
df_list[[i]] = file
# save the data
year = gsub('[^0-9]', '', file.names[i])
names(df_list)[i] <- year
}
Now you can use either df_list[[1]] or df_list[["2000"]] for year 2000 data.
I am uncertain whether you are reading your csv files from the right directory. If not, use
file <- read.csv(file.path(path, file.names[i]),header=TRUE, sep=",", stringsAsFactors=FALSE)
when reading the file.
