Unable to load .csv file in R on my new laptop

I am facing a weird problem with R while loading a .csv file. Until last week I was working on an assignment in R, having loaded a .csv file with more than 1 million records.
I got a new company laptop and installed R and the following packages:
library(dplyr)
library(lubridate)
library(hms)
library(stringr)
library(tidyr)
library(ggplot2)
library(gridExtra)
library(tidyverse)
library(chron)
library(corrplot)
library(rio)
library(data.table)
library(openxlsx)
library(readr)
But surprisingly I am unable to read the csv file with the read.csv command. After setting the working directory with setwd, I run the following commands and get the errors shown below.
consumer_data <- read.csv("ConsumerElectronics.csv", fileEncoding="UTF-8-BOM")
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input
consumer_data <- read.csv("ConsumerElectronics.csv", stringsAsFactors = FALSE)
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input
I don't know what's causing this problem; earlier I was not getting this issue at all. Have I missed installing any package needed to read a csv file?
I want only the above code to work. Please do not provide any alternative solution, as this code was working earlier and reading the .csv file from the location where it is stored.
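A quick sanity check before suspecting missing packages: a minimal diagnostic sketch (not an alternative reader; file name taken from the question), assuming the usual cause of this error, which is an empty file or a wrong working directory.

```r
# "no lines available in input" is the error read.table raises when the
# file it opened contains no data. An empty file reproduces it exactly:
empty <- tempfile(fileext = ".csv")
file.create(empty)
res <- tryCatch(read.csv(empty),
                error = function(e) conditionMessage(e))
print(res)  # the same error message as in the question
# Checks worth running on the real file before read.csv:
file.exists("ConsumerElectronics.csv")  # FALSE means a wrong working directory
file.size(empty)                        # 0 bytes would explain the error
```

If file.exists() is FALSE, the setwd path on the new laptop does not match the old one; if file.size() is 0, the file itself did not copy over intact.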

Related

Loading csv files from a zip file in R results in no lines available in input error

This is not a question but a problem somebody else might stumble upon. Each week I handle some data in csvs that are put into zip files to save space.
Usually I can easily read the csvs in the zip file with this code:
connections = unz(zip_path,csv_file)
DAT_r = read.csv2(connections, sep=";", dec=",", header=TRUE, stringsAsFactors=TRUE,
encoding="latin1", fill=TRUE, check.names=FALSE)
Today however I got the misleading error:
Error in read.table(file = file, header = header, sep = sep, quote = quote) : no lines available in input
After tedious checking of the csvs I realised the zip file was larger than usual. Indeed, the size was too big to read, which spawned the error. Splitting the zip into two files resolved it.
Cheers
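A hedged helper, assuming the failure mode described above: unz() cannot read entries from oversized archives (the classic zip format tops out near 4 GB, beyond which zip64 is required), so a size check before reading gives a clearer signal than the misleading read error. The limit below is an assumption, not a documented constant.

```r
# Returns TRUE only when the archive exists and is plausibly small enough
# for unz() to read; adjust limit_bytes to taste.
zip_looks_readable <- function(zip_path, limit_bytes = 4 * 1024^3) {
  file.exists(zip_path) && file.size(zip_path) < limit_bytes
}

small <- tempfile(fileext = ".zip")
file.create(small)
zip_looks_readable(small)            # TRUE: exists and under the limit
zip_looks_readable("missing.zip")    # FALSE: no such file
```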

Reading a csv file from aws datalake

I am trying to read a csv file from an AWS data lake using R.
I used the below code to read the data; unfortunately I am getting an error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input
The code is:
aws.signature::use_credentials()
c<- get_object("s3://datalake-1/x-data/")
cobj<- rawToChar(c)
con<- textConnection(cobj)
data <- read.csv(con)
close(con)
data
It looks like the file is not present at the address/URI provided (note that the path passed to get_object ends in a folder-like prefix rather than a file name). I am unable to reproduce this error, so check your CSV's correct location.
Apart from that, I'd also put the read statement within tryCatch, as referenced in an already existing answer linked here.
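A sketch of that tryCatch suggestion, assuming the aws.s3 package used in the question; the object URI below is hypothetical. On any failure (missing object, bad credentials, missing package) it returns NULL with a readable message instead of the cryptic read.table error.

```r
# Wrap the S3 fetch and parse in one tryCatch so failures surface clearly.
read_s3_csv <- function(object_uri) {
  tryCatch({
    raw <- aws.s3::get_object(object_uri)      # fetch raw bytes from S3
    read.csv(text = rawToChar(raw))            # parse them as csv text
  }, error = function(e) {
    message("Could not read ", object_uri, ": ", conditionMessage(e))
    NULL
  })
}
# usage (hypothetical key): read_s3_csv("s3://datalake-1/x-data/yourfile.csv")
```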

fread issue with archive package unzip file in R

I am having issues while trying to use fread, after I unzip a file using the archive package in R. The data I am using can be downloaded from https://www.kaggle.com/c/favorita-grocery-sales-forecasting/data
The code is as follows:
library(dplyr)
library(devtools)
library(archive)
library(data.table)
setwd("C:/jc/2017/13.Lafavorita")
hol<-archive("./holidays_events.csv.7z")
holcsv<-fread(hol$path, header = T, sep = ",")
This code gives the error message:
File 'holidays_events.csv' does not exist. Include one or more spaces to consider the input a system command.
Yet if I try:
holcsv1<-read.csv(archive_read(hol),header = T,sep = ",")
It works perfectly. I need to use the fread command because the other databases I need to open are too big for read.csv. I am puzzled because my code was working fine a few days ago. I could unzip the files manually, but that is not the point. I have tried to solve this problem for hours, but I cannot seem to find anything useful in the documentation. I found this: https://github.com/yihui/knitr/blob/master/man/knit.Rd#L104-L107 , but I cannot understand it.
Turns out the answer is rather simple, but I found it by luck. After using the archive function, you need to pass its result to the archive_extract function. So in my case, I should add the following to the code:
hol1 <- archive_extract(hol)
Then change the last line to:
holcsv <- fread(hol1$path, header = T, sep = ",")

Can't read csv into Spark using spark_read_csv()

I'm trying to use sparklyr to read a csv file into R. I can read the .csv into R just fine using read.csv(), but when I try to use spark_read_csv() it breaks down.
accidents <- spark_read_csv(sc, name = 'accidents', path = '/home/rstudio/R/Shiny/accident_all.csv')
However, when I attempt to execute this code I receive the following error:
Error in as.hexmode(xx) : 'x' cannot be coerced to class "hexmode"
I haven't found much by Googling that error. Can anyone shed some light onto what is going on here?
Yes, local .csv files can be read easily into a Spark DataFrame using spark_read_csv(). I have a .csv file in the Documents directory and I have read it using the following code snippet. I think there is no need to use the file:// prefix. Below is the snippet:
Sys.setenv(SPARK_HOME = "C:/Spark/spark-2.0.1-bin-hadoop2.7/")
library(SparkR, lib.loc = "C:/Spark/spark-2.0.1-bin-hadoop2.7/R/lib")
library(sparklyr)
library(dplyr)
library(data.table)
library(dtplyr)
sc <- spark_connect(master = "local", spark_home = "C:/Spark/spark-2.0.1-bin-hadoop2.7/", version = "2.0.1")
Credit_tbl <- spark_read_csv(sc, name = "credit_data", path = "C:/Users/USER_NAME/Documents/Credit.csv", header = TRUE, delimiter = ",")
You can see the dataframe just by calling the object name Credit_tbl.

R Reading in a zip data file without unzipping it

I have a very large zip file and I am trying to read it into R without unzipping it, like so:
temp <- tempfile("Sales", fileext=c("zip"))
data <- read.table(unz(temp, "Sales.dat"), nrows=10, header=T, quote="\"", sep=",")
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
cannot open zip file 'C:\Users\xxx\AppData\Local\Temp\RtmpyAM9jH\Sales13041760345azip'
If your zip file is called Sales.zip and contains only a file called Sales.dat, I think you can simply do the following (assuming the file is in your working directory):
data <- read.table(unz("Sales.zip", "Sales.dat"), nrows=10, header=T, quote="\"", sep=",")
The methods of the readr package also support compressed files if the file suffix indicates the nature of the file; that is, files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed.
require(readr)
myData <- read_csv("foo.txt.gz")
No need to use unz, as now read.table can handle the zipped file directly:
data <- read.table("Sales.zip", nrows=10, header=T, quote="\"", sep=",")
See this post
This should work just fine if the file is sales.csv.
data <- readr::read_csv(unzip("Sales.zip", "Sales.csv"))
To check the filenames without extracting the file, this works:
unzip("sales.zip", list = TRUE)
If you have zcat installed on your system (which is the case for Linux, macOS, and Cygwin), you could also use:
zipfile<-"test.zip"
myData <- read.delim(pipe(paste("zcat", zipfile)))
This solution also has the advantage that no temporary files are created.
The gzfile function along with read_csv and read.table can read compressed files.
library(readr)
df = read_csv(gzfile("file.csv.gz"))
library(data.table)
df = read.table(gzfile("file.csv.gz"))
read_csv from the readr package can read compressed files even without using gzfile function.
library(readr)
df = read_csv("file.csv.gz")
read_csv is recommended because it is faster than read.table.
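The gzfile approach above can be demonstrated end to end with base R alone (no readr needed): write a gzipped csv through a gzfile connection, then read it back with read.csv.

```r
# Round trip: write a small data frame to a gzipped csv, then read it back.
gz <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz, "w")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), con, row.names = FALSE)
close(con)
df <- read.csv(gzfile(gz))  # read.csv opens the gzfile connection itself
nrow(df)  # 3
```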
In this expression you lost a dot:
temp <- tempfile("Sales", fileext=c("zip"))
It should be:
temp <- tempfile("Sales", fileext=c(".zip"))
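Why the dot matters: fileext is pasted verbatim onto the random name, so without it the "extension" fuses into the file name, which is exactly what the error message in the question shows.

```r
# Compare the generated names with and without the dot:
basename(tempfile("Sales", fileext = "zip"))   # e.g. "Sales13041760345azip"
basename(tempfile("Sales", fileext = ".zip"))  # e.g. "Sales1304176034.zip"
grepl("\\.zip$", tempfile("Sales", fileext = ".zip"))  # TRUE
```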
For remote zipped files:
samhsa2015 <- fread("curl https://www.opr.princeton.edu/workshops/Downloads/2020Jan_LatentClassAnalysisPratt_samhsa_2015F.zip | funzip")
Answer from here: https://stackoverflow.com/a/37824192/12387385
