Read a CSV Directly Into R as a String - r

I have a CSV file that I want to read into R without making it a Data Frame. It seems like it would be quite simple but I can't figure out how to do it. I quite literally just want the CSV file to read in as it would appear in a text editor. The reason for this is I need to feed the string into an API.
Using read.csv() obviously won't work for this because it always returns a data frame.

Try readLines().
This reads the file into a character vector with one element per line. To get a single text string that can be passed to an API, collapse that vector with paste(readLines("file.csv"), collapse = "\n"), where "file.csv" stands in for the path to your file.
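As a minimal sketch of the approach above: the temporary file and its contents are made up for illustration, and readChar() is shown only as an alternative that returns the file's characters as they are on disk, including the trailing newline.

```r
# Hypothetical sample file, created only so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,alpha", "2,beta"), path)

# One vector element per line, then collapsed into a single string.
csv_lines  <- readLines(path)
csv_string <- paste(csv_lines, collapse = "\n")

# Alternative: read the whole file in one call, line endings included.
csv_raw <- readChar(path, nchars = file.info(path)$size)

cat(csv_string)
```

Note that paste() drops the final newline of the file, while readChar() keeps it; which one the API expects is something to check on your side.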

Related

R write fixed width columns to csv (in the plain text file)

I write data frames to csv files using write.csv(). When this is done, the output when viewed in a plain text editor, in particular vi or notepad++, shows no spacing between the column content and the commas, resulting in it being relatively hard to read. For example, the columns are not lined up down the page.
I have no interest in using Excel to view the csv files. I am definitely not looking for a suggestion for a csv viewer. Nor do I want instructions on how to modify the plain text file afterward. Padding needs to be spaces, not tabs.
I am interested in how to get R to line up the columns in the plain text csv file so that they are easier to read using a non specialized plain text editor.
I could (and might) write my own routine that converts everything to some fixed width string format and print that. But, I would prefer to find that this is an option within write.csv() or similar common output library call.
[I just this moment found out about sprintf() in R, and that might be the best answer to this conundrum].
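One possible version of the "own routine" idea, as a sketch only: it uses base R's format() rather than a write.csv() option, the data frame is invented for illustration, and it assumes no field contains a comma or quote, since nothing is quoted on output.

```r
# Hypothetical example data.
df <- data.frame(
  name  = c("a", "bbbb"),
  value = c(1.5, 22.25),
  stringsAsFactors = FALSE
)

# format() pads every element of a character vector with spaces to a common
# width, so the header and the values of each column line up.
padded <- mapply(
  function(col, header) format(c(header, as.character(col))),
  df, names(df)
)

# Each row of the padded matrix becomes one comma-separated line.
out_lines <- apply(padded, 1, paste, collapse = ", ")

path <- tempfile(fileext = ".csv")
writeLines(out_lines, path)
cat(out_lines, sep = "\n")
```

The padding becomes part of each text field, so when reading the file back something like read.csv(path, strip.white = TRUE) is needed to trim it again.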

Writing to a CSV file producing errors

I am using R to analyze some text data. After doing some aggregation, I had a new data frame I wanted to write to a csv file, so I can use it in other analyses. The data frame looks correct in R (it only has 2 columns with text data), but once I write the csv and open it, the text is scattered across different columns. Here is the code I was using:
write.csv(new_df, "4.19 Group 1_agg by user try 2.csv")
I tried adding in an extra bit of code to specify that it should be using UTF-8, since I've heard this could be an encoding error, so the code then looked like this:
write.csv(new_df, "4.19 Group 1_agg by user try 2.csv", fileEncoding = "UTF-8")
I also tried reading in the file differently (using fread instead of read.csv).
Still, the csv file looks wrong/messy in many places. Here is what it should look like:
This is what it looks like currently:
Again, I think the error must be in writing the csv file, because everything looks good in R when I check it using names and head. Any help is appreciated, thank you!

Import data csv with particular quotes in R

I have a csv like this:
"Data,""Ultimo"",""Apertura"",""Massimo"",""Minimo"",""Var. %"""
"28.12.2018,""86,66"",""86,66"",""86,93"",""86,32"",""0,07%"""
What is the solution for importing correctly please?
I tried with read.csv("IT000509408=MI Panoramica.csv", header=T,sep=",", quote="\"") but it doesn't work.
Each row in your file is encoded as a single csv field.
So instead of:
123,"value"
you have:
"123,""value"""
To fix this you can read the file as csv (which will give you one field per row without the extra quotes), and then write the full value of that field to a new file as plain text (without using a csv writer).
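A sketch of that two-pass idea in base R, using the two sample rows from the question written to a temporary file. The dec = "," argument is an added assumption based on the decimal commas in the sample, and the variable names are illustrative.

```r
# Recreate the sample file from the question.
path <- tempfile(fileext = ".csv")
writeLines(c(
  '"Data,""Ultimo"",""Apertura"",""Massimo"",""Minimo"",""Var. %"""',
  '"28.12.2018,""86,66"",""86,66"",""86,93"",""86,32"",""0,07%"""'
), path)

# Pass 1: every row is a single quoted field, so this yields one column (V1)
# whose values are the real csv lines with the outer quoting removed.
outer <- read.csv(path, header = FALSE, stringsAsFactors = FALSE)

# Write those lines out as plain text to get a normal csv file.
fixed_path <- tempfile(fileext = ".csv")
writeLines(outer$V1, fixed_path)

# Pass 2: parse the repaired file; dec = "," turns "86,66" into 86.66.
inner <- read.csv(fixed_path, dec = ",", stringsAsFactors = FALSE)
str(inner)
```

The percentage column stays as text ("0,07%") and would need its own cleanup if it is to be used as a number.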

feed treetagger in R with text in string rather than text in file

I use TreeTagger from R, through the koRpus package.
Calling the treetag function requires me to indicate a filename, which contains the text to be processed. However, I would like to provide a string rather than a filename, because I have to do some preliminary text processing on this string.
I guess this has to go through a file because it is wrapping a script call.
As I am looping over 10000 texts, I would like to avoid wasting time writing each file to disk and instead keep everything in memory.
Can I avoid this? Thanks.
No, or not really. As you suspect, the external script needs a file. The docs say:
Either a connection or a character vector, valid path to a file,
containing the text to be analyzed. If file is a connection, its
contents will be written to a temporary file, since TreeTagger can't
read from R connection objects.
So it's got to be written to a file for the external TreeTagger binary to read. If you don't do that yourself, then the treetag function does it for you. Either way, the text ends up in a file.
If TreeTagger can read from a Unix named pipe, or fifo, then you might be able to stream text to it on the fly.
The only other option would be to see if the TreeTagger source can be linked with R in some way so that you can call one of its subroutines directly, passing an R object. I don't even know if this is written in Java or C++ or whatever, but it might be a big job anyway.
As indicated in the documentation:
format:
Either "file" or "obj", depending on whether you want to scan files or analyze the text in a given object, like a character vector. If the latter, it will be written to a temporary file (see file).
Using this knowledge, we can simply use the treetag() function in combination with a character vector:
treetag(as.vector(yourinput), format = "obj")
Internally, the function writes the object to a temporary text file, and TreeTagger then reads and analyzes that file.

Excel data organized in multiple nested rows, can R read it?

Please see the picture. I've started using R, and know how/that it can read files from Excel, but can it read something formatted like this?
http://www.flickr.com/photos/68814612#N05/8632809494/
(my apologies, upload was not working for me)
Elaborating on some of what's in the comments:
If you load the file into Excel, you can save it as a fixed-width or comma-delimited text file. Either should be easy to read into R.
The following may be obvious to you already.
(First, a question: Are you sure that you can't get the data in a format that has one set of data per line? Is it possible that the file you're getting was generated from a different file format that is more conducive to loading the data into R?)
Whether you should start rearranging the data in R or instead manipulate the raw text depends on what comes naturally to you (or to people you have around who can help). For me, personally, I would rearrange the text file outside of R before loading it into R. That's what's easiest for me. Perl is a great language for this purpose, but you could also do it with Unix shell scripts if that's accessible to you, or using a powerful editor such as Vim or Emacs. If you have no preference, I'd suggest Perl. If you have any significant programming experience, you'll be able to learn what you need. On the other hand, you're already loading it into R, so maybe it would be better to process the data there.
For example, you could execute a loop that goes through the text file line by line and does something like this:
while (still have lines to read) {
    read first header line into a vector if this is the first time through the loop
        otherwise, read it and throw it away
    read data line 1 into a vector
    read second header line into a vector if this is the first time
        otherwise, read it and throw it away
    read data line 2 into a vector
    read third header line into a vector if this is the first time
        otherwise, read it and throw it away
    read data line 3 into a vector
    if this is the first time through, concatenate the header vectors; store as next row
        in something (a file, a matrix, a dataframe, etc.)
    concatenate the data vectors you've been saving, and store as next row in the same thing
}
write out the whole 2D data structure
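The loop above could look roughly like this in R. This is only a sketch under assumptions that the original picture cannot confirm: the data has been saved as comma-delimited text, every record spans six lines alternating header and data, and the sample contents below are invented.

```r
# Hypothetical input: three header/data line pairs per record.
sample_lines <- c(
  "id,name", "1,alpha",
  "x,y",     "0.5,1.5",
  "group",   "a",
  "id,name", "2,beta",
  "x,y",     "2.5,3.5",
  "group",   "b"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

raw <- readLines(path)
lines_per_record <- 6
stopifnot(length(raw) %% lines_per_record == 0)

# Split one comma-separated line into its fields.
split_fields <- function(x) strsplit(x, ",", fixed = TRUE)[[1]]

# Which record each line belongs to, and whether it is a header line
# (odd positions are headers, even positions are data).
record_id <- (seq_along(raw) - 1) %/% lines_per_record
is_header <- seq_along(raw) %% 2 == 1

# Keep the headers from the first record only; later copies are thrown away.
headers <- unlist(lapply(raw[is_header & record_id == 0], split_fields))

# Concatenate the data lines of each record into a single row.
rows <- lapply(split(raw[!is_header], record_id[!is_header]), function(rec) {
  unlist(lapply(rec, split_fields))
})
flat <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(flat) <- headers
flat <- type.convert(flat, as.is = TRUE)

# Write out the whole 2D structure, headers on top.
write.csv(flat, tempfile(fileext = ".csv"), row.names = FALSE)
```

This version works on the whole file at once instead of reading it line by line inside a loop, which is usually the more natural style in R; the per-record logic is the same.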
Or if the headers will never change, then you could just embed them literally into the script before the loop, and throw them out no matter what. That will make the code cleaner. Or read the first few lines of the file separately to get the headers, and then have a separate script to read the data and add it to the file with the headers in it. (The headers will probably be useful in R, so I would suggest preserving them at the top of the text file.)

Resources