using column names when appending data in write.table - r

I am looping through some data and appending it to a CSV file. What I want is to have the column names at the top of the file once, and then, as the loop continues, not to repeat the column names in the middle of the file.
If I set col.names = TRUE, the column names are repeated for each new loop iteration. If I set col.names = FALSE, there are no column names at all.
How do I do this most efficiently? I feel this is such a common case that there must be a way to do it without writing special code to handle it.
write.table(dd, "data.csv", append = TRUE, col.names = TRUE)

See ?file.exists.
write.table(dd, "data.csv", append=TRUE, col.names=!file.exists("data.csv"))
Thus column names are written only when you are not appending to a file that already exists.
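For example, a minimal sketch of the loop, assuming each iteration produces a data frame dd and that data.csv does not exist before the first iteration:
for (i in 1:10) {
  dd <- data.frame(run = i, value = rnorm(3))  # hypothetical per-iteration data
  write.table(dd, "data.csv", append = TRUE, sep = ",", row.names = FALSE,
              col.names = !file.exists("data.csv"))  # header only on the first write
}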

You may also run into a problem with identical row names, as write.table does not allow identical row names when appending. You could try the following: on the first write to the file, call write.table with row.names = FALSE only; then, from the second write onwards, use both col.names = FALSE and row.names = FALSE.
Here's the first write to file
> d1 <- data.frame(A = 1:5, B = 1:5) ## example data
> write.table(d1, "file.txt", row.names = FALSE)
We can check it with read.table("file.txt", header = TRUE). Then we can append the same data frame to that file with
> write.table(d1, "file.txt", row.names = FALSE,
col.names = FALSE, append = TRUE)
And again we can check it with read.table("file.txt", header = TRUE).
So, if you have a list of data frames, say dlst, your code chunk that appends the data frames together might look something like
> dlst <- rep(list(d1), 3) ## list of example data
> write.table(dlst[[1]], "file.txt", row.names = FALSE)
> invisible(lapply(dlst[-1], write.table, "file.txt", row.names = FALSE,
col.names = FALSE, append = TRUE))
But as @MrFlick suggests, it would be much better to append the data frames in R, and then send them to file once. This would eliminate many possible errors/problems that could occur while writing to file. If the data is in a list, that could be done with
> dc <- do.call(rbind, dlst)
> write.table(dc, "file.txt")

Try resetting the column names of the data frame with the names() function in R, reassigning the same names it already has, and then try the dbWriteTable command with row.names = FALSE. That should resolve the issue.
e.g.
if your data frame df1 has the columns obs, name and age, then
names(df1) <- c('obs','name','age')
and then try
dbWriteTable(conn, 'table_name', df1, append = TRUE, row.names = FALSE)

Related

How do I extract specific rows from a CSV and format the data in R?

I have a CSV file that contains thousands of lines like this:
1001;basket/files/legobrick.mp3
4096;basket/files/sunshade.avi
2038;data/lists/blockbuster.ogg
2038;data/random/noidea.dat
I want to write this to a new CSV file but include only rows which contain '.mp3' or '.avi'. The output file should be just one column and look like this:
"basket/files/legobrick.mp3#1001",
"basket/files/sunshade.avi#4096",
So the value from the first column should be appended to the second column, separated by a hash symbol, and each line should be quoted and followed by a comma, as shown above.
The source CSV file does not contain a header with column names. It's just data.
Can someone tell me how to code this in R?
Edit (following marked answer): This question is not a duplicate because it involves filtering rows and the output code format is completely different requiring different processing methods. The marked answer is also completely different which really backs up my assertion that this is not a duplicate.
You can do it in the following way:
#Read the file with ; as separator
df <- read.csv2(text = text, header = FALSE, stringsAsFactors = FALSE)
#Filter the rows which end with "avi" or "mp3"
inds <- grepl("avi$|mp3$", df$V2)
#Create a new dataframe by pasting those rows with a separator
df1 <- data.frame(new_col = paste(df$V2[inds], df$V1[inds], sep = "#"))
df1
# new_col
#1 basket/files/legobrick.mp3#1001
#2 basket/files/sunshade.avi#4096
#Write the csv
write.csv(df1, "/path/of/file.csv", row.names = FALSE)
Or if you want it as a text file you can do
write.table(df1, "path/test.txt", row.names = FALSE, col.names = FALSE, eol = ",\n")
data
text = "1001;basket/files/legobrick.mp3
4096;basket/files/sunshade.avi
2038;data/lists/blockbuster.ogg
2038;data/random/noidea.dat"
See whether the code below helps; it assumes the data has already been read into a data frame df with columns ID and file_path (column names chosen here for illustration).
library(tidyverse)
df %>%
  filter(grepl("\\.mp3|\\.avi", file_path)) %>%
  mutate(file_path = paste(file_path, ID, sep = "#")) %>%
  pull(file_path) %>%
  dput()
A data.table answer:
library(data.table)
dt <- fread("file.csv")
fwrite(dt[V2 %like% "mp3$|avi$", .(paste0(V2, "#", V1))], "output.csv", col.names = FALSE)
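If you also need the quoted, comma-terminated lines shown in the question, fwrite accepts quote and eol arguments; a sketch:
fwrite(dt[V2 %like% "mp3$|avi$", .(paste0(V2, "#", V1))],
       "output.csv", col.names = FALSE, quote = TRUE, eol = ",\n")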

Read list of names from CSV into R

I have a text file of names, separated by commas, and I want to read this into anything in R (a data frame or a vector is fine). I tried read.csv and it just reads them all in as headers for separate columns, with 0 rows of data. I tried header = FALSE and it still reads them in as separate columns. I could work with this, but what I really want is one column with a bunch of rows, one for each name. For example, when I try to print this data frame, it prints all the column headers, which are useless, and then doesn't print the values. It seems like it should be easily usable, but one column of names would be easier to work with.
Since the OP asked me to, I'll post the comment above as an answer.
It's very simple, and it comes from some practice in reading in sequences of data, numeric or character, using scan.
dat <- scan(file = your_filename, what = 'character', sep = ',')
You can use read.csv and read the string as a header, then just extract the names (using names) and put them into a data.frame:
data.frame(x = names(read.csv("FILE")))
For example:
write.table("qwerty,asdfg,zxcvb,poiuy,lkjhg,mnbvc",
"FILE", col.names = FALSE, row.names = FALSE, quote = FALSE)
data.frame(x = names(read.csv("FILE")))
x
1 qwerty
2 asdfg
3 zxcvb
4 poiuy
5 lkjhg
6 mnbvc
Something like that?
Make some test data:
# test data
list_of_names <- c("qwerty","asdfg","zxcvb","poiuy","lkjhg","mnbvc" )
list_of_names <- paste(list_of_names, collapse = ",")
list_of_names
# write to temp file
tf <- tempfile()
writeLines(list_of_names, tf)
You need this part:
# read from file
line_read <- readLines(tf)
line_read
list_of_names_new <- unlist(strsplit(line_read, ","))
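If you want the single-column data frame described in the question rather than a character vector, wrap the result; a small follow-up to the code above:
names_df <- data.frame(name = list_of_names_new, stringsAsFactors = FALSE)
names_df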

Exporting two data frames into same file

I currently have two data frames. One contains ~100,000 rows, while the other has only ~1,000. I can export either one of them using the write.table function shown below...
write.table(DF_1, file = paste("DF_one.csv" ),
row.names = F, col.names = T, sep = ",")
This is easily opened by Excel and works well. The problem is that I need to include the other data frame in the very same Excel file, and I'm not sure how to do this or whether it is even possible.
I am open to any ideas, and have provided some example data to work with below.
#Example data for data frame one, length =30
Dates<-c(Sys.Date()+1:30)
Data1<-c(1+1:30)
#Data Frame One
Df1<-data.frame(Dates,Data1)
#Example data for data frame two, length=10
Letters<-c(letters[1:10])
Data2<-c(1:10)
#Data Frame Two
Df2<-data.frame(Letters,Data2)
#Now, is there a way we can export both to the same file?
#Here is the export for just data frame one
write.table(Df1, file = paste("DFone.csv" ),
row.names = F, col.names = T, sep = ",")
Any ideas, including "stop being picky, just export 2 files and then merge them in Excel", are appreciated.
Research Done:
I like this approach but would prefer a horizontal format instead of vertical
(I should probably just not be picky)
How to merge multiple data frame into one table and export to Excel?
How to write multiple tables, dataframes, regression results etc - to one excel file?
Thanks for all the help!
I have no idea if this preserves the information structure that you want, but if you are really intent on getting them into the same table, you could do the following.
Both <- data.frame(Df1,Df2)
write.table(Both, file = paste("DF_Both.csv" ),
row.names = F, col.names = T, sep = ",")
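Note that data.frame(Df1, Df2) only lines up here because the 10 rows of Df2 recycle evenly into the 30 rows of Df1. If the row counts are not multiples of each other, one sketch is to pad the shorter frame with NA rows first (pad is a helper defined here, not a base function):
n <- max(nrow(Df1), nrow(Df2))
pad <- function(d, n) {
  out <- d[seq_len(n), , drop = FALSE]  # indexing past the last row fills with NA
  rownames(out) <- NULL
  out
}
Both <- data.frame(pad(Df1, n), pad(Df2, n))
write.table(Both, file = "DF_Both.csv", row.names = F, col.names = T, sep = ",")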
Because the first solution did not meet your requirements here is another one that saves data frames to multiple tabs of an excel spreadsheet.
install.packages("xlsx")
library(xlsx)
###Define the save.xlsx function
save.xlsx <- function(file, ...) {
  require(xlsx, quietly = TRUE)
  objects <- list(...)
  fargs <- as.list(match.call(expand.dots = TRUE))
  objnames <- as.character(fargs)[-c(1, 2)]
  nobjects <- length(objects)
  for (i in 1:nobjects) {
    if (i == 1)
      write.xlsx(objects[[i]], file, sheetName = objnames[i])
    else
      write.xlsx(objects[[i]], file, sheetName = objnames[i], append = TRUE)
  }
  print(paste("Workbook", file, "has", nobjects, "worksheets."))
}
### Save the file to your working directory.
save.xlsx("WorkbookTitle.xlsx", Df1, Df2)
Full disclosure: this was adapted from another question on Stack Overflow, R dataframes to multi sheet Excel Work
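If installing the Java-based xlsx package is a hurdle, a lighter-weight alternative (a sketch assuming the writexl package) writes a named list of data frames to one sheet each:
install.packages("writexl")
library(writexl)
write_xlsx(list(Df1 = Df1, Df2 = Df2), "WorkbookTitle.xlsx")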

Using lapply to apply a function over read-in list of files and saving output as new list of files

I'm quite new to R and a bit stuck on what I feel is likely a common operation. I have a number of files (57, with ~1.5 billion rows cumulatively, by 6 columns) that I need to perform basic functions on. I'm able to read these files in and perform the calculations I need with no problem, but I'm tripping up on the final output. I envision the function working on one file at a time, outputting the worked file and moving on to the next.
After the calculations I would like to output 57 new .txt files, each named after the file the input data came from. So far I'm able to perform the calculations on smaller test datasets and spit out a single appended .txt file, but this isn't what I want as a final output.
#list filenames
files <- list.files(path=, pattern="*.txt", full.names=TRUE, recursive=FALSE)
#begin looping process
loop_output = lapply(files,
function(x) {
#Load 'x' file in
DF<- read.table(x, header = FALSE, sep= "\t")
#Call calculated height average a name
R_ref= 1647.038203
#Add column names to .las data
colnames(DF) <- c("X","Y","Z","I","A","FC")
#Calculate return
DF$R_calc <- (R_ref - DF$Z)/cos(DF$A*pi/180)
#Calculate intensity
DF$Ir_calc <- DF$I * (DF$R_calc^2/R_ref^2)
#Output new .txt with calculated columns
write.table(DF, file=, row.names = FALSE, col.names = FALSE, append = TRUE,fileEncoding = "UTF-8")
})
My latest code endeavors have been to mess around with the initial lapply/sapply function, like so:
#begin looping process
loop_output = sapply(names(files),
function(x) {
As well as the output line:
#Output new .csv with calculated columns
write.table(DF, file=paste0(names(DF), "txt", sep="."),
row.names = FALSE, col.names = FALSE, append = TRUE,fileEncoding = "UTF-8")
From what I've been reading, the file-naming step during the write.table output may be one of the pieces I don't have fully aligned with the rest of the script yet. I've been looking at other questions that seemed applicable:
Using lapply to apply a function over list of data frames and saving output to files with different names
Write list of data.frames to separate CSV files with lapply
with no luck. I deeply appreciate any insights or pointers in the right direction on inputting x number of files, performing the same function on each, then outputting the same x number of files. Thank you.
The reason the output is directed to the same file is probably that file = paste0(names(DF), "txt", sep=".") returns the same value for every iteration. That is, DF must have the same column names in every iteration, therefore names(DF) will be the same, and paste0(names(DF), "txt", sep=".") will be the same. Along with the append = TRUE option the result is that all output is written to the same file.
Inside the anonymous function, x is the name of the input file. Instead of using names(DF) as a basis for the output file name you could do some transformation of this character string.
For example, given
x <- "/foo/raw_data.csv"
Inside the function you could do something like this
infile <- x
outfile <- file.path(dirname(infile), gsub('raw', 'clean', basename(infile)))
outfile
[1] "/foo/clean_data.csv"
Then use the new name for output, with append = FALSE (unless you need it to be true)
write.table(DF, file = outfile, row.names = FALSE, col.names = FALSE, append = FALSE, fileEncoding = "UTF-8")
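Putting that together inside your lapply, here is a sketch; it assumes tab-separated .txt inputs as in your code, and the _calc suffix for the output names is simply an illustrative choice:
files <- list.files(path = ".", pattern = "\\.txt$", full.names = TRUE)
loop_output <- lapply(files, function(x) {
  DF <- read.table(x, header = FALSE, sep = "\t")
  colnames(DF) <- c("X", "Y", "Z", "I", "A", "FC")
  R_ref <- 1647.038203
  DF$R_calc  <- (R_ref - DF$Z) / cos(DF$A * pi / 180)
  DF$Ir_calc <- DF$I * (DF$R_calc^2 / R_ref^2)
  # derive a unique output name from the input name: foo.txt -> foo_calc.txt
  outfile <- sub("\\.txt$", "_calc.txt", x)
  write.table(DF, file = outfile, row.names = FALSE, col.names = FALSE,
              append = FALSE, fileEncoding = "UTF-8")
  outfile
})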
Using your code, this is the general idea:
require(purrr)
#list filenames
files <- list.files(path=, pattern="*.txt", full.names=TRUE, recursive=FALSE)
#Call calculated height average a name
R_ref= 1647.038203
dfTransform <- function(file){
colnames(file) <- c("X","Y","Z","I","A","FC")
#Calculate return
file$R_calc <- (R_ref - file$Z)/cos(file$A*pi/180)
#Calculate intensity
file$Ir_calc <- file$I * (file$R_calc^2/R_ref^2)
return(file)
}
output <- files %>%
  map(read.table, header = FALSE, sep = "\t") %>%
  map(dfTransform) %>%
  # pair each transformed table with its source path so every input gets its own output file
  walk2(files, ~ write.table(.x, file = paste0(tools::file_path_sans_ext(.y), "_out.txt"),
                             row.names = FALSE, col.names = FALSE, fileEncoding = "UTF-8"))

Creating text files from column in data frame in R without for loop

I am trying to create individual text files from the columns in a data frame, using dplyr and the map function from the purrr package, so that I do not have to write a for loop and can use the existing column names as the file names for the new txt files.
Here is the dataframe:
n = c(2, 3, 5)
s = c("aa", "bb", "cc")
b = c(TRUE, FALSE, TRUE)
df = data.frame(n, s, b)
Then I created this function:
textfilecreate <- function(filename){
filename1 <- noquote(names(filename))
colunmname <- select(filename, filename1)
myfile <- paste0( "_", colunmname, ".txt")
write.table(colunmname, file = myfile, sep = "", row.names = FALSE,
col.names = FALSE, quote = FALSE, append = FALSE)
}
Then I called the map function:
map(data_link, textfilecreate)
I got this error:
Error in noquote(names(filename)) : attempt to set an attribute on NULL
I know that I am missing something but I cannot quite pinpoint what.
Thanks in advance.
One of the difficulties here is that map loops through each column one at a time, so you end up working on a vector of values instead of data.frame. This leads to the problems you were having with noquote.
However, you don't need to do any select-ing here, as map will loop through and return each column. The remaining issue is how to get the names for the file names.
One alternative is to loop through the dataset and the column names simultaneously, creating the file name with the names and using each column as the file to save. I use walk2 instead of map2 to loop through two lists simultaneously as it doesn't create a new list.
The two-argument function:
textfilecreate = function(filename, name){
myfile = paste0( "_", name, ".txt")
write.table(filename, file = myfile, sep = "", row.names = FALSE,
col.names = FALSE, quote = FALSE, append = FALSE)
}
Now loop through the dataset and the column names via walk2. The first list is used as the first argument and the second list as the second argument by default.
walk2(df, names(df), textfilecreate)
You can simply use lapply like this:
lapply(names(df), function(colname) write.table(df[,colname],file=paste0(colname,'.txt')))
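Note that the one-liner above uses write.table defaults (quoted values, row names and a column header), which differ from the options in your textfilecreate function; to match those options, a sketch:
lapply(names(df), function(colname)
  write.table(df[[colname]], file = paste0("_", colname, ".txt"),
              sep = "", row.names = FALSE, col.names = FALSE,
              quote = FALSE, append = FALSE))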
