Reading a csv file from aws datalake - r

I am trying to read a csv file from an AWS data lake using R.
I used the code below to read the data, but unfortunately I am getting this error:
Error in read.table(file = file, header = header, sep = sep, quote =
quote, : no lines available in input
This is the code I am using:
library(aws.s3)                  # provides get_object()
aws.signature::use_credentials() # load AWS credentials
c <- get_object("s3://datalake-1/x-data/")
cobj <- rawToChar(c)             # convert the raw response to a character string
con <- textConnection(cobj)
data <- read.csv(con)
close(con)
data

It looks like no object exists at the URI you provided; note that s3://datalake-1/x-data/ ends in a prefix (a "folder"), not a specific .csv object key. I was unable to reproduce the error, so double-check the exact location of your CSV in the bucket.
Apart from that, I would also wrap the read statement in tryCatch, as referenced in an already existing answer linked here.
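A minimal sketch of that tryCatch pattern, assuming the aws.s3 package; the object key x-data/file.csv is a placeholder, not taken from the question:
library(aws.s3)

# Wrap the S3 read so a missing or unreadable object does not abort the script;
# the bucket and key below are placeholders
data <- tryCatch({
  obj <- get_object(object = "x-data/file.csv", bucket = "datalake-1")
  read.csv(text = rawToChar(obj))
}, error = function(e) {
  message("Could not read the object from S3: ", conditionMessage(e))
  NULL
})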

Related

Reading csv file using R and RStudio

I am trying to read a csv file in R, but I am getting some errors.
This is what I have, and I have already set the correct path:
mydata <- read.csv("food_poisioning.csv")
But I am getting this error
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>Y'
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls
I believe I am getting this error because my csv file is actually not separated by commas but by spaces. This is what it looks like:
I tried using sep=" ", but it didn't work.
If you're having difficulty using read.csv() or read.table() (or writing other import commands), try using the "Import Dataset" button on the Environment panel in RStudio. It is especially useful when you are not sure how to specify the table format or when the table format is complex.
For your .csv file, use "From Text (readr)..."
A window will pop up and allow you to choose a file/URL to upload. You will see a preview of the data table after you select a file/URL. You can click on the column names to change the column class or even "skip" the column(s) you don't need. Use the Import Options to further manage your data.
Here is an example using CreditCard.csv from Vincent Arel-Bundock's Github projects:
You can also modify and/or copy and paste the code in Code Preview, or click Import to run the code when you are ready.
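As a side note, the <ff><fe> bytes in the error message are a UTF-16 little-endian byte-order mark, which suggests the file is UTF-16 rather than UTF-8. If you would rather fix it at the console, here is a minimal base-R sketch under that assumption (the whitespace delimiter is also a guess from the description):
# fileEncoding handles the UTF-16LE byte-order mark behind "invalid multibyte string";
# sep = "" splits on any run of whitespace (unlike sep = " ")
mydata <- read.table("food_poisioning.csv",
                     header = TRUE,
                     sep = "",
                     fileEncoding = "UTF-16LE")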

fread() in R unable to open a file

I am trying to open a file in R as shown below:
data0 <- filename_a %>% map_df(~fread(., sep=",", skip=1))
Let us assume that fread fails to read this file for various reasons, such as the file being in use by another program or the file not existing. In such a case I would like to read filename_b instead.
At this moment, as soon as the above step fails, the code stops executing. How can I read filename_b when filename_a fails to read?
You can try using tryCatch as follows:
library(data.table)
data <- tryCatch(fread(filename_a, sep = ",", skip = 1),
                 error = function(e) fread(filename_b, sep = ",", skip = 1))
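If you want to keep the map_df pipeline from the question, one option is to wrap the tryCatch in a small helper; this is a sketch, and read_with_fallback is a hypothetical name:
library(data.table)
library(purrr)

# Hypothetical helper: try the primary file first, fall back to the secondary one on error
read_with_fallback <- function(primary, fallback) {
  tryCatch(fread(primary, sep = ",", skip = 1),
           error = function(e) fread(fallback, sep = ",", skip = 1))
}

data0 <- map_df(filename_a, ~ read_with_fallback(.x, filename_b))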

Loading csv files from a zip file in R results in no lines available in input error

Not a question, but a problem somebody else might stumble upon. I handle some data in CSVs each week that is put into zip files to save space.
Usually I can easily read the csvs in the zip file with this code:
connections <- unz(zip_path, csv_file)
DAT_r <- read.csv2(connections, sep = ";", dec = ",", header = TRUE, stringsAsFactors = TRUE,
                   encoding = "latin1", fill = TRUE, check.names = FALSE)
Today, however, I got this misleading error:
Error in read.table(file = file, header = header, sep = sep, quote = quote) : no lines available in input
After tediously checking the CSVs I realised the zip file was larger than usual. Indeed, it was too big to read in, which is what triggered the error. Splitting the zip into two files resolved it.
Cheers
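For anyone hitting the same wall, a quick way to check what is inside the archive and how large the uncompressed entries are, without extracting anything (a minimal sketch, assuming the zip_path object from the snippet above):
contents <- unzip(zip_path, list = TRUE)  # data frame with Name, Length (uncompressed bytes) and Date
contents[order(-contents$Length), ]       # largest entries first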

Importing multiple csv files with lapply

When I need to import multiple CSV files I use:
Cluster <- lapply(dir(),read.csv)
Setting the working directory beforehand, of course. But somehow today it stopped working and started returning this error message:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
The only unusual thing I did was set the Java directory manually so that rJava could be loaded.
Any idea what happened?
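A frequent cause of this error with lapply(dir(), read.csv) is that dir() also picks up non-CSV or zero-byte files. A minimal sketch that restricts the file list (the pattern and the empty-file check are assumptions about what changed):
# Only read .csv files, and skip any empty files that would trigger
# "no lines available in input"
csv_files <- list.files(pattern = "\\.csv$")
csv_files <- csv_files[file.size(csv_files) > 0]
Cluster <- lapply(csv_files, read.csv)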

issues of reading csv files using read.table [duplicate]

This question already has answers here:
'Incomplete final line' warning when trying to read a .csv file into R
(17 answers)
Closed 9 years ago.
I am trying to import CSV files to graph for a project. I'm using R 2.15.2 on Mac OS X.
The first way tried
The script I'm trying to run to import the CSV file is this:
group4 <- read.csv("XXXX.csv", header=T)
But I keep getting this error message:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
object 'XXXXXX.csv' not found
The second way tried
I tried changing my working directory but got an error saying it could not be changed. So I went into the Preferences tab and set the working directory to the folder that has my CSV files. But I still get the same error (as in the first way).
The third way tried
Then I tried this script:
group4 <- read.table(file.choose(), sep="\t", header=T)
And I get this error:
Warning message:
In read.table(file.choose(), sep = "\t", header = T) :
incomplete final line found by readTableHeader on '/Users/xxxxxx/Documents/Programming/R/xxxxxx/xxxxxx.csv'
I've searched on the R site and all over the Internet, and nothing has got me to the point where I can import this simple CSV file into the R console.
The first way: the file is not in your working directory; change the directory, or use an absolute path.
The second way: you are pointing to a non-existent directory, or you do not have write privileges there.
The third way: the last line of your file is malformed.
As to the missing EOF (i.e. the last line of the file is corrupted): a data file should normally end with a newline character. Check whether that is the case for your file.
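One way to repair a missing final newline from within R (a minimal sketch; the file names are placeholders):
txt <- readLines("your_file.csv", warn = FALSE)  # warn = FALSE silences the incomplete-final-line warning
writeLines(txt, "your_file_fixed.csv")           # writeLines terminates every line, including the last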
As an alternative, I would suggest trying readLines(). This function reads each line of your data file into a character vector. If you know the format of your input, i.e. the number of columns in the table, you could do this:
number.of.columns <- 5  # the number of columns in your data file
delimiter <- "\t"       # this is what separates the values in your data file

lines <- readLines("path/to/your/file.csv", -1L)                    # read every line as a string
values <- unlist(lapply(lines, strsplit, delimiter, fixed = TRUE))  # split each line into fields
data <- matrix(values, byrow = TRUE, ncol = number.of.columns)      # reassemble as a table
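If you then need a proper data frame with typed columns, a short follow-up to the snippet above (a sketch):
df <- as.data.frame(data, stringsAsFactors = FALSE)
df[] <- lapply(df, type.convert, as.is = TRUE)  # convert columns to numeric/integer where possible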
