I have a data frame s. I would like to write its contents to outputfile.txt.
When I use the following commands:
> sink("outputfile.txt")
> s
> sink()
I get the following message:
[ reached getOption("max.print") -- omitted 5162 rows ]
How can I write all the content of this dataframe directly into a txt file?
Don’t use sink to write table data to files; use the appropriate functions instead. In base R, that’s write.table and its sibling functions. Unfortunately the function has some rather questionable defaults, but the following, for instance, should work:
write.table(data, filename, sep = '\t', quote = FALSE, col.names = NA)
sink is generally only useful to capture output from functions that don’t return their output but rather echo it directly to the console (such as warnings and messages).
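For instance, here is a minimal sketch of that kind of capture (the file name is made up). Note that messages and warnings go to the message stream, so sink needs type = 'message' to pick them up:
con <- file('log.txt', open = 'wt')
sink(con, type = 'message')    # divert the message stream to the file
message('this lands in log.txt instead of the console')
sink(type = 'message')         # restore the message stream
close(con)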
I have an issue where I'm reading in large (500+ MB) CSV files and then want to verify that all data has been read in correctly. To do so, I have been comparing the length() of readLines() against the nrow() of read.csv2.
The following is my R-code:
df <- readFileFromServer(HOST, KEY,
                         paste0(SERVER_PATH, SERVER_FOLDER),
                         FILENAME,
                         FUN = read.csv2,
                         sep = ";",
                         quote = "", encoding = "UTF-8", skipNul = TRUE)

df_check <- readFileFromServer(HOST, KEY,
                               paste0(SERVER_PATH, SERVER_FOLDER),
                               FILENAME,
                               FUN = readLines, skipNul = TRUE)
Then I verify that all data was loaded, by checking:
if (nrow(df) != (length(df_check) - dif)) {
  stop("some error msg")
}
dif is set to 1, to account for the header in the CSV files.
This check has been working as intended up until now, but it fails for one particular CSV file, and I cannot fully understand why.
The CSV file that fails the check has "NULL" in the data, which I believe readLines interprets as a line delimiter, thus producing an extra line and making the check fail, but I'm really not sure.
I tried passing different parameters to my read functions, but the issue persists.
I expect readLines and read.csv2 to agree, i.e. length() - 1 from readLines should equal nrow() from read.csv2, as shown in my code snippet.
This is not a proper answer, but it was too long for a comment. Here is the debugging strategy I would use.
Pick a file that fails. Slurp it with readLines.
Save the file locally using writeLines.
Your first job is to make sure that the check also fails when the file is loaded from disk. My first thought would be that the two file transfers (the first and second time you ran readFileFromServer) were not precisely identical.
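As a minimal sketch of those first two steps, reusing the question's own readFileFromServer call (its arguments are assumed to still be in scope):
## slurp the failing file from the server and save a local copy to debug against
raw_lines <- readFileFromServer(HOST, KEY,
                                paste0(SERVER_PATH, SERVER_FOLDER),
                                FILENAME,
                                FUN = readLines, skipNul = TRUE)
writeLines(raw_lines, "local_copy.csv")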
Now. If your problem persists for the given file when you read it locally with read.csv (a different number of rows than the number of lines in the readLines output), your job becomes much easier (and probably faster) to solve.
First, take a look at the beginning of the CSV file and at its end. Are they as they should be? Do they match the data in the head and tail of your data frame? If yes, then you need to find the missing lines systematically.
Since a CSV file is just comma-separated text, you can compare each line read from the CSV file with readLines against the line as it should be based on the table you read with read.csv. How this should be done depends on what your original CSV file looks like (whether you need to insert quotes etc.). Basically, you need to figure out a way of reconstructing the lines of the CSV file from the data in your data frame, and then look for the first line that differs.
Here is some code to give you an idea what I mean:
## first, prepare data – for this example only!
f <- file("test.csv", "w")
writeLines(c("a,b,c", "1,what ever,42", "12,89,one"), f)
close(f)
## actual test
## first, read the file with readlines
f <- file("test.csv", "r")
rl <- readLines(f)
close(f)
## then, read it with read.csv
csv <- read.csv("test.csv")
## third, prepare the lines as they should look based on the CSV
rl_sim <- do.call(paste, c(csv, sep=","))
## find the first mismatch
for (i in seq_along(rl_sim)) {
  if (rl_sim[i] != rl[i + 1]) {
    message("Problems start at line ", i, "\n", rl_sim[i], "\n", rl[i + 1])
    break
  }
}
I am trying to read a CSV inside a zip file using the command fread("unzip -cq file.zip"), which works perfectly when the file is in my working directory.
But when I specify the path to the file without changing directories, say fread("unzip -cq C:/Users/My user/file.zip"), I get an error saying: unzip: cannot find either C:/Users/My or C:/Users/My.zip.
The reason this happens is that there are spaces in my path, but what would be a workaround?
The only option I have thought of is to change to the directory where each file is located and read it from there, but this is not ideal.
I use shQuote for this, like...
library(data.table)

fread_zip = function(fp, silent = FALSE){
  qfp = shQuote(fp)            # quote the path so spaces survive the shell call
  patt = "unzip -cq %s"
  thecall = sprintf(patt, qfp)
  if (!silent) cat("The call:", thecall, sep = "\n")
  fread(thecall)
}
Defining a pattern and then substituting in with sprintf can keep things readable and easier to manage. For example, I have a similar wrapper for .tar.gz files (which apparently need to be unzipped twice with a | pipe between the steps).
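A hypothetical call using the path from the question (assuming data.table is loaded so fread is available):
dt <- fread_zip("C:/Users/My user/file.zip")   # spaces in the path are now safe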
If your zip contains multiple csvs, fread isn't set up to read them all (though there's an open issue). My workaround for that case currently looks like...
library(magrittr)
fread_zips = function(fp,
                      unzip_dir = file.path(dirname(fp),
                                            sprintf("csvtemp_%s", sub(".zip", "", basename(fp)))),
                      silent = FALSE,
                      do_cleanup = TRUE){
# only tested on windows
# fp should be the path to mycsvs.zip
# unzip_dir should be used only for CSVs from inside the zip
dir.create(unzip_dir, showWarnings = FALSE)
# unzip
unzip(fp, overwrite = TRUE, exdir = unzip_dir)
# list files, read separately
# not looking recursively, since csvs should be only one level deep
fns = list.files(unzip_dir)
if (!all(tools::file_ext(fns) == "csv")) stop("fp should contain only CSVs")
res = lapply(fns %>% setNames(file.path(unzip_dir, .), .), fread)
if (do_cleanup) unlink(unzip_dir, recursive = TRUE)
res
}
So, because we're not passing a command-line call directly to fread, there's no need for shQuote here. I wrote and used this function yesterday, so there are probably still some oversights or bugs.
The magrittr %>% pipe part could be written as setNames(file.path(unzip_dir, fns), fns) instead.
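A hypothetical call; the result is a named list of data.tables, one per CSV inside the zip:
tables <- fread_zips("C:/Users/My user/mycsvs.zip")
names(tables)   # the names are the csv file names from inside the zip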
Try assigning the location to a variable and using paste0 to build the call, like below:
myVar <- "C:/Users/Myuser/"
fread(paste0("unzip -cq ", myVar, "file.zip"))
(Note that if the path itself contains spaces, you will still need to quote it, e.g. with shQuote as in the answer above.)
I have merged a bunch of CSV files, but I can't get them to export to one file correctly. What am I doing wrong? The data shows up in my console, but I get an error that says:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
  cannot coerce class '"function"' to a data.frame
setwd("c:/users/adam/documents/r data/NBA/DK/TEMP")
filenames <- list.files("c:/users/adam/documents/r data/NBA/DK/TEMP")
do.call("rbind",lapply(filenames, read.csv, header = TRUE))
write.csv(read.csv, file ='Lineups.csv')
You did not assign the result of the do.call function to anything. This is a fairly common R beginner error, stemming from not grasping the functional programming paradigm: results need to be assigned to names or they just get garbage-collected.
The error is actually from the code that you didn't put in a code block:
write.csv(read.csv, file ='Lineups.csv')
The 'read.csv' was presumably your intended name for the result of the do.call operation, except that by default it is a function name rather than what you expected. You could assign the do.call result to the name 'read.csv', but doing so is very poor practice. Choose a more descriptive name, like 'TEMP_files_appended'.
TEMP_files_appended <- do.call("rbind", lapply(filenames, read.csv, header = TRUE))
write.csv(TEMP_files_appended, file = 'Lineups.csv')
(I will observe that using header=TRUE for read.csv is not needed since that is the default for that function.)
I would like to print several items, one after the other, to the same txt file (outfile.txt).
For instance, first I would like to print a data frame u to outfile.txt, then the message 'hello', and finally a summary of a model.
How can I do this? Is sink('outfile.txt') appropriate for this case?
It is generally a very bad idea to mix data and other output in the same file. I advise against it in the strongest terms: it makes the data file next to unusable for other programs.
That said, most functions to save data have an append argument. You can set this to TRUE to append to an existing file rather than overwriting its contents. No need for sink.
Where you do need sink (or equivalent) is when you want to write content formatted in the same way as it’s printed on the console. This, for instance, is the case for summary.
Here’s an example similar to your requirements:
filename = 'test.txt'
write.table(head(cars), filename, quote = FALSE, col.names = NA)
cat('\nHello\n\n', file = filename, append = TRUE)
capture.output(print(summary(cars)), file = filename, append = TRUE)
Rather than sink, this uses capture.output, which is a convenience wrapper around sink.
I am trying to automate some data exporting, and I would like to add a header to each file such as "please cite Bob and Jane 2008" ... or even a few lines of specific instructions depending on the context.
I have looked at the write.csv and write.table documentation, but do not see any such feature.
What is the easiest way to achieve this?
Here are two possible approaches; the solution under EDIT, using connections, is more flexible and efficient.
Using write.table(..., append = TRUE) and cat
Use append = TRUE within a call to write.table, after first writing the header with cat, wrapped in its own function:
write.table_with_header <- function(x, file, header, ...){
  # write the header line first, then append the table below it
  cat(header, '\n', file = file)
  write.table(x, file, append = TRUE, ...)
}
Note that append is ignored in a write.csv call, so you simply need to call
write.table_with_header(x, file, header, sep = ',')
and that will result in a CSV file.
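For example, a quick sketch using built-in data (the file name is made up):
## writes the citation line, then the table beneath it, as csv
write.table_with_header(head(mtcars), 'mtcars.csv',
                        header = 'please cite Bob and Jane 2008', sep = ',')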
EDIT: using connections
(Thanks to #flodel, whose suggestion this is.)
my.write <- function(x, file, header, f = write.csv, ...){
  # create and open the file connection
  datafile <- file(file, open = 'wt')
  # close on exit
  on.exit(close(datafile))
  # if a header is defined, write it to the file (#CarlWitthoft's suggestion)
  if (!missing(header)) writeLines(header, con = datafile)
  # write the file using the defined function and any additional arguments
  f(x, datafile, ...)
}
Note that this version allows you to use write.csv, write.table, or any similar function, and uses a file connection which (as #flodel points out in the comments) will only open and close the file once, and automatically appends. Therefore it is more efficient!
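A quick usage sketch of this version (file names made up), first with the default write.csv and then with write.table and extra arguments:
## header plus CSV, file opened and closed exactly once
my.write(head(cars), 'cars.csv', header = 'please cite Bob and Jane 2008')
## the same with write.table and tab separation
my.write(head(cars), 'cars.txt', header = 'please cite Bob and Jane 2008',
         f = write.table, sep = '\t', row.names = FALSE)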