Import CSV data with doubled quotes in R

I have a csv like this:
"Data,""Ultimo"",""Apertura"",""Massimo"",""Minimo"",""Var. %"""
"28.12.2018,""86,66"",""86,66"",""86,93"",""86,32"",""0,07%"""
How can I import this file correctly?
I tried read.csv("IT000509408=MI Panoramica.csv", header=TRUE, sep=",", quote="\"") but it doesn't work.

Each row in your file is encoded as a single csv field.
So instead of:
123,"value"
you have:
"123,""value"""
To fix this, read the file as CSV (which gives you one field per row, with the extra quoting removed), then write the full value of that field to a new file as plain text (without using a CSV writer), and read that new file as ordinary CSV.
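A minimal two-pass sketch of that approach, using the file name from the question and an intermediate placeholder file "fixed.csv" (read.csv() strips the outer quotes and un-doubles the inner ones; dec = "," handles values such as "86,66"):
infile  <- "IT000509408=MI Panoramica.csv"
outfile <- "fixed.csv"
# pass 1: each physical row is one quoted CSV field
rows <- read.csv(infile, header = FALSE, stringsAsFactors = FALSE)[[1]]
# pass 2: write the recovered text verbatim (no CSV writer), then parse it normally
writeLines(rows, outfile)
dat <- read.csv(outfile, header = TRUE, sep = ",", quote = "\"",
                dec = ",", stringsAsFactors = FALSE)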

Related

Missing delimiter error when importing html text

I'm playing with Azure Machine Learning using the Designer and am getting a "Delimiter not found" error when importing my data.
I originally started with a few hundred HTML files stored as Azure blobs. Each file would be considered a single row of text; however, I had no luck importing these files for further text analytics.
I created a Data Factory job that imported each file, stripped all the tabs, quotes, and CR/LF from the text, added a column for the file name, and stored it all as a combined tab-delimited file. In Notepad++ I can confirm that the format is FileName <tab> HtmlText. This is the file I'm trying to import into ML, and I get the missing-delimiter message while defining the import module.
Here is the error when I try and create a dataset:
{
"message": "'Delimiter' is not specified or invalid."
}
Question 1: Is there a better way to do text analytics on a large collection of html files?
Question 2: Is there a format I need to use in my combined .tsv file that works?
Question 3: Is there maybe a maximum length for the string column? My HTML can be tens of thousands of characters long.
You're right that it might be line length, but my guess is that there are still some special characters (e.g. anything starting with \) that aren't properly escaped or removed. How did you scrape and strip the text data? Have you tried using BeautifulSoup?
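For what it's worth, here is a rough sketch in R (the language used elsewhere on this page) of the kind of cleaning being described: strip the characters that commonly break a TSV importer before writing the combined file. The folder and file names are illustrative.
files <- list.files("html_blobs", pattern = "\\.html$", full.names = TRUE)
clean <- vapply(files, function(f) {
  txt <- paste(readLines(f, warn = FALSE), collapse = " ")
  gsub("[\t\r\n\"]", " ", txt)   # drop tabs, quotes, CR/LF
}, character(1))
out <- data.frame(FileName = basename(files), HtmlText = clean)
write.table(out, "combined.tsv", sep = "\t", quote = FALSE, row.names = FALSE)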

How can I check if the input file and output file are in the same format in R

I imported my inputs from a "Table1.txt" file using read.table, then worked on my table, and now I would like to save my outputs to a new text file "Table1Modified.txt" using write.table, keeping everything in the same format.
I would like to check whether the files "Table1.txt" and "Table1Modified.txt" are in exactly the same format (number of digits, uppercase/lowercase, ...).
If you would like to compare the contents of two files in R, you can use diffr() from the diffr package. It will point out the contents that differ. Is this what you are looking for?
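A short sketch, assuming both files are in the working directory (identical() gives a plain TRUE/FALSE check, while diffr() shows the differing lines interactively):
# install.packages("diffr")
library(diffr)
diffr("Table1.txt", "Table1Modified.txt")
identical(readLines("Table1.txt"), readLines("Table1Modified.txt"))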

Read a CSV Directly Into R as a String

I have a CSV file that I want to read into R without making it a Data Frame. It seems like it would be quite simple but I can't figure out how to do it. I quite literally just want the CSV file to read in as it would appear in a text editor. The reason for this is I need to feed the string into an API.
Using read.csv() obviously won't work for this because it automatically reads the file in as a data frame.
Try readLines()
This will read in the file with each line becoming an element of a character vector. You'll then need to wrap that in paste(readLines(...), collapse="\n") to get a single text string that can be passed to an API.
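A minimal sketch, with "myfile.csv" as a placeholder for your file name:
csv_string <- paste(readLines("myfile.csv"), collapse = "\n")
# csv_string is now one character string, exactly as the file appears
# in a text editor, ready to be sent in an API request body.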

Converting df to csv format, without creating a file

I am creating a process to convert API data into a df.
My problem is:
The data only appears correct after exporting it to a CSV file using df.to_csv("df.csv", sep=','). If I don't do that, the first column appears as one big data list.
Is there a way to convert to CSV format without creating an external file?
From the documentation of DataFrame.to_csv:
path_or_buf : string or file handle, default None
File path or object, if None is provided the result is returned as a
string.
So simply doing:
csv_string = df.to_csv(None, sep=",")
gives you a string containing a CSV representation of your DataFrame, without creating an external file.

Text under generated table in .csv file in R

For weekly reports I generate, I create a .csv file for my outputs which has 9 columns and 7 rows.
I use this command to create my file:
write.csv(table, paste('home/weekly_',start.date,'.csv',sep=''), row.names=F)
Note: 'table' is a matrix (I believe that's the right R terminology)
I would like to add a footnote/note under the table in this file, would this be possible?
For example, if I were to create a text file instead of a .csv file, I would use the following commands:
cat("Number of participants not eligible:")
cat(length(which((tbl[,'Reg_age_dob']<=18) & as.Date(tbl[,'DateWithdrew'])>='2013-01-01'& as.Date(tbl[,'DateWithdrew'])<'2013-04-01' & as.Date(tbl[,'QuestionnaireEndDate'])<'2013-01-01' )))
cat("\n")
How would I do this to appear under the table in a .csv output file?
After writing the CSV part, just append the rest as new lines using
write("Footer",file="myfile",append=TRUE)
Solution from here: Add lines to a file
But be aware that a CSV parser will be upset if you do not mark such extra lines as comments correctly.
It might be better to use a second file for your purpose.
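Putting the two steps together, a sketch using the file name from the question (the note text and the count are illustrative placeholders):
outfile <- paste('home/weekly_', start.date, '.csv', sep = '')
write.csv(table, outfile, row.names = FALSE)
n.not.eligible <- 42   # hypothetical count; in the question it comes from length(which(...))
write(paste("Number of participants not eligible:", n.not.eligible),
      file = outfile, append = TRUE)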
