Reading Email Attachment .xls in R

I am using R to read attached files from Outlook. I have already tried the steps in Reading Email Attachment to R, and they work perfectly.
The problem is that my files are in .xls format, so read.csv doesn't work.
This is the warning I get:
data <- read.csv(attachment_file)
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on
'C:\Users\GABRIE~1.BEN\AppData\Local\Temp\RtmpWmxRwP\file155c162721e8'
When I tried to open it with the read_excel function, I got this error:
data <- read_excel(attachment_file, sheet = 1)
Error: Missing file extension.

You should use "/" instead of "\" in your file path.
Like this:
'C:/Users/GABRIE~1.BEN/......'
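A concrete sketch of that fix, using the temp path from the warning above. The `.xls` copy step is an assumption on my part: readxl reports "Missing file extension" when the temp file has none, so copying it to a path ending in `.xls` is one plausible workaround.

```r
# Convert the backslash path Windows reports into a forward-slash path
win_path <- "C:\\Users\\GABRIE~1.BEN\\AppData\\Local\\Temp\\RtmpWmxRwP\\file155c162721e8"
attachment_file <- gsub("\\\\", "/", win_path)

# read_excel() also needs a recognizable extension, so copy the temp
# file to a path ending in .xls before reading it (hypothetical fix):
xls_file <- paste0(attachment_file, ".xls")
# file.copy(attachment_file, xls_file)           # needs the real temp file
# data <- readxl::read_excel(xls_file, sheet = 1)
```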

Related

Reading csv file using R and RStudio

I am trying to read a csv file in R, but I am getting some errors.
This is what I have (and I have set the correct path):
mydata <- read.csv("food_poisioning.csv")
But I am getting this error
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>Y'
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
I believe I am getting this error because my csv file is actually not separated by commas but by spaces. This is what it looks like:
I tried using sep=" ", but it didn't work.
If you're having difficulty using read.csv() or read.table() (or other import commands), try the "Import Dataset" button on the Environment panel in RStudio. It is especially useful when you are not sure how to specify the table format or when the format is complex.
For your .csv file, use "From Text (readr)..."
A window will pop up and allow you to choose a file/URL to upload. You will see a preview of the data table after you select a file/URL. You can click on the column names to change the column class or even "skip" the column(s) you don't need. Use the Import Options to further manage your data.
Here is an example using CreditCard.csv from Vincent Arel-Bundock's Github projects:
You can also modify and/or copy and paste the code in Code Preview, or click Import to run the code when you are ready.
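As an aside: if the file really is space-separated, base R's read.table() with its default sep = "" splits on any run of whitespace, which is more forgiving than sep = " " (which treats every single space as a delimiter). A minimal sketch with a made-up table:

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("id  cases   outcome",
             "1   12      mild",
             "2   7       severe"), tmp)

# Default sep = "" splits on runs of whitespace of any length
mydata <- read.table(tmp, header = TRUE)
```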

not able to read file using read.csv in R

I am not able to read a csv file in R. The file I imported needs some cleaning, such as removing text qualifiers like " and '. Still, I am unable to read it; it shows the following error:
currency<-read.csv("02prepared data/scraped data kickstarter/film & video1.csv")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, numerals = numerals, :
invalid multibyte string at '45,<30>97'
Here is the link to the file: https://drive.google.com/open?id=1ABXPoYxk8b4WCQuRAu-Hhh2OvpJ76PhH
You can try setting fileEncoding = 'latin1', as suggested in this answer:
https://stackoverflow.com/a/14363274/6304113
I tried the method in the link to read your file, and it works for me.
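A self-contained sketch of that suggestion (the file contents here are fabricated; the point is the fileEncoding argument):

```r
tmp <- tempfile(fileext = ".csv")

# Write a row containing a latin1-encoded accented character
con <- file(tmp, open = "wb")
writeBin(charToRaw(iconv("title,amount\nAm\u00e9lie,45\n",
                         from = "UTF-8", to = "latin1")), con)
close(con)

# Without fileEncoding this can fail with "invalid multibyte string";
# declaring latin1 lets R re-encode the text correctly
currency <- read.csv(tmp, fileEncoding = "latin1")
```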

Reading a csv file from aws datalake

I am trying to read a csv file from aws datalake using R.
I used the code below to read the data; unfortunately, I am getting an error:
Error in read.table(file = file, header = header, sep = sep, quote =
quote, : no lines available in input
This is the code I am using:
aws.signature::use_credentials()
c<- get_object("s3://datalake-1/x-data/")
cobj<- rawToChar(c)
con<- textConnection(cobj)
data <- read.csv(con)
close(con)
data
It looks like no file is present at the address/URI provided. I am unable to reproduce this error, so double-check your CSV's location.
Apart from that, I'd also wrap the read statement in tryCatch, as referenced in an existing answer linked here.
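A sketch of that tryCatch wrapping, using in-memory text connections instead of S3 so it runs anywhere (the helper name is made up):

```r
# Wrap the read so a missing/empty object fails gracefully instead of aborting
safe_read <- function(con) {
  tryCatch(
    read.csv(con),
    error = function(e) {
      message("Could not read CSV: ", conditionMessage(e))
      NULL
    }
  )
}

good <- safe_read(textConnection("a,b\n1,2"))   # normal data frame
bad  <- safe_read(textConnection(""))           # "no lines available in input"
```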

Scopus_ReadCSV {CITAN} not working with csv file exported from Scopus

I am using Rstudio with R 3.3.1 on Windows 7 and I have installed CITAN package. I am trying to import bibliography entries from a CSV file that I exported from Scopus (as it is, untouched), choosing to export all available information.
This is the error that I get:
example <- Scopus_ReadCSV("scopus.csv")
Error in Scopus_ReadCSV("scopus.csv") : Column not found: `Source'.
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
invalid input found on input connection 'scopus.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'scopus.csv'
The `Source' column is there when I open the file, so I do not know why it says 'not found'.
Eventually I came to the following conclusions:
The encoding of the CSV file as exported from Scopus was UTF-8-BOM, which does not seem to be recognized by R when using Scopus_ReadCSV("file.csv") or read.table("file.csv", header = TRUE, sep = ",", fileEncoding = "UTF-8").
Although the file from Scopus declares an encoding, it contains some "strange" non-English characters that R's read functions cannot handle (I mainly found this problem in names with special characters).
Solutions for those issues:
Open the CSV file with a notepad application like the Notepad++ and save the file with UTF-8 encoding to become readable for R as UTF-8.
When running the read function in R, you will notice that it stops reading (e.g., at the 40th of 200 records). Find exactly where it stopped to locate the special character: open the CSV in the text editor, then erase or change that character so the issue does not recur in R.
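An alternative that skips the re-save step: R's file connections accept the encoding name "UTF-8-BOM", which strips the byte-order mark on read. A sketch with a fabricated two-column file:

```r
tmp <- tempfile(fileext = ".csv")

# Simulate a Scopus-style export: UTF-8 text with a leading byte-order mark
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)              # UTF-8 BOM
writeBin(charToRaw("Source,Year\nScopus,2016\n"), con)
close(con)

# With plain "UTF-8" the first column can come back as something like
# "X.U.FEFF.Source"; "UTF-8-BOM" drops the mark so `Source` is found
scopus <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
```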
Another solution that worked for me:
Open the file in Google Sheets, then download it from there again as a *.csv-file. R opens it correctly afterwards.

Get "embedded nul(s) found in input" when reading a csv using read.csv()

I was reading in a csv file.
Code is:
mydata = read.csv("mycsv.csv", header = TRUE, sep = ",", quote = "\"")
I get the following warning:
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
Now some cells in my CSV have missing values that are represented by "".
How do I write this code so that I do not get the above warning?
Your CSV might be encoded in UTF-16. This isn't uncommon when working with some Windows-based tools.
You can try loading a UTF-16 CSV like this:
read.csv("mycsv.csv", ..., fileEncoding="UTF-16LE")
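A self-contained version of that call, writing a small UTF-16LE file first (no BOM is written here, to keep the sketch simple; a real Windows export usually starts with the bytes FF FE):

```r
tmp <- tempfile(fileext = ".csv")

# Write "x,y\n1,2\n" encoded as little-endian UTF-16
con <- file(tmp, open = "wb")
writeBin(iconv("x,y\n1,2\n", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]],
         con)
close(con)

# Plain read.csv(tmp) warns about embedded nuls (the high bytes of UTF-16);
# declaring the encoding decodes it cleanly
mydata <- read.csv(tmp, fileEncoding = "UTF-16LE")
```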
You can try using the skipNul = TRUE option.
mydata = read.csv("mycsv.csv", quote = "\"", skipNul = TRUE)
From ?read.csv
Embedded nuls in the input stream will terminate the field currently being read, with a warning once per call to scan. Setting skipNul = TRUE causes them to be ignored.
It worked for me.
This has nothing to do with the encoding; the problem is with reading the nulls in the file. To handle that, you need to pass the skipNul = TRUE parameter,
for example:
neg = scan('F:/Natural_Language_Processing/negative-words.txt', what = 'character', comment.char = '', encoding = "UTF-8", skipNul = TRUE)
The file might not have CRLF line endings; it may have only LF. Try checking the hex output of the file.
If so, try running the file through awk:
awk '{printf "%s\r\n", $0}' file > new_log_file
I had the same error message and figured out that although my files had a .csv extension and opened with no problems in a spreadsheet, they were actually saved as "All Formats" rather than "Text CSV (.csv)".
Another quick solution:
Double check that you are, in fact, reading a .csv file!
I was accidentally reading an .rds file instead of a .csv and got this "embedded nul" error.
Also make sure the data you are importing does not contain "#" characters; if it does, try using the option comment.char = "". It worked for me.
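For context, read.csv() already defaults to comment.char = "", but read.table() defaults to "#", so data containing "#" can silently truncate rows there. A sketch with a made-up file:

```r
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,tag", "1,#promo", "2,news"), tmp)

# With read.table's default comment.char = "#", the "#promo" cell would be
# cut off; disabling comment handling keeps it intact
mydata <- read.table(tmp, header = TRUE, sep = ",", comment.char = "")
```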
