Can't read csv into Spark using spark_read_csv() - r

I'm trying to use sparklyr to read a csv file into R. I can read the .csv into R just fine using read.csv(), but when I try to use spark_read_csv() it breaks down.
accidents <- spark_read_csv(sc, name = 'accidents', path = '/home/rstudio/R/Shiny/accident_all.csv')
However, when I attempt to execute this code I receive the following error:
Error in as.hexmode(xx) : 'x' cannot be coerced to class "hexmode"
I haven't found much by Googling that error. Can anyone shed some light onto what is going on here?

Yes, local .csv files can be read easily into a Spark DataFrame using spark_read_csv(). I have a .csv file in my Documents directory and read it with the following code snippet. I think there is no need to use the file:// prefix. Below is the snippet:
Sys.setenv(SPARK_HOME = "C:/Spark/spark-2.0.1-bin-hadoop2.7/")
library(SparkR, lib.loc = "C:/Spark/spark-2.0.1-bin-hadoop2.7/R/lib")
library(sparklyr)
library(dplyr)
library(data.table)
library(dtplyr)
sc <- spark_connect(master = "local", spark_home = "C:/Spark/spark-2.0.1-bin-hadoop2.7/", version = "2.0.1")
Credit_tbl <- spark_read_csv(sc, name = "credit_data", path = "C:/Users/USER_NAME/Documents/Credit.csv", header = TRUE, delimiter = ",")
You can see the dataframe just by calling the object name Credit_tbl.

Related

fread() in R unable to open a file

I am trying to open a file in R as shown below:
data0 <- filename_a %>% map_df(~fread(., sep=",", skip=1))
Let us assume that fread fails to read this file for some reason, such as the file being in use by another program or not existing. In that case I would like to read filename_b instead.
At the moment, as soon as the above step fails, the code stops executing. How can I read filename_b when reading filename_a fails?
You can try using tryCatch as follows:
library(data.table)
data <- tryCatch(fread(filename_a, sep=",", skip=1),
error = function(e) return(fread(filename_b, sep=",", skip=1)))
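The same idea can be generalised to any number of candidate files. The following is only a minimal sketch: `read_first_available` is a hypothetical helper name (not part of data.table), and it defaults to base R's read.csv so that it runs without extra packages.

```r
# Try each candidate path in order and return the first one that reads cleanly.
# `read_first_available` is a hypothetical helper, not a data.table function.
read_first_available <- function(paths, reader = read.csv, ...) {
  for (p in paths) {
    result <- tryCatch(
      reader(p, ...),
      error = function(e) {
        message("Could not read '", p, "': ", conditionMessage(e))
        NULL  # signal failure so the loop moves on to the next path
      }
    )
    if (!is.null(result)) return(result)
  }
  stop("None of the candidate files could be read.")
}

# Possible usage with the names from the question (assumes both are defined):
# data0 <- read_first_available(c(filename_a, filename_b),
#                               reader = data.table::fread, sep = ",", skip = 1)
```

Only errors are caught here; a warning raised by the reader (for example about a missing file) is still printed.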

Problem with reading microdata from IPUMS into R

I am trying to read the micro data from the extract that I downloaded from IPUMS USA into R. It seemed simple at first, but I can't get it. I already downloaded the DDI and CSV, and it is not working!
I would appreciate any help on how to get this data into R.
I've tried two different ways. I learned how to do this code from this website: https://tech.popdata.org/Integrating-IPUMS-Data-with-R/ (but apparently it was wrong).
Here's my code:
cps_ddi <- read_ipums_ddi(ipums_example("wagesdata.xml"))
cps_data <- read_ipums_micro(cps_ddi, data_file = ipums_example("usa_00004.csv"), verbose = FALSE)
Console returns this:
Error in ipums_example("wagesdata.xml") :
Could not find file 'wagesdata.xml' in examples. Available files are: cps_00006.csv.gz, cps_00006.dat.gz, cps_00006.xml, cps_00010.dat.gz, cps_00010.xml, cps_00015.dat.gz, cps_00015.xml, nhgis0008_csv.zip, nhgis0008_shape_small.zip
cps_data <- read_ipums_micro(cps_ddi, data_file = ipums_example("usa_00004.csv"), verbose = FALSE)
Error in read_ipums_micro(cps_ddi, data_file = ipums_example("usa_00004.csv"), : object 'cps_ddi' not found
The ipums_example() function is designed to find the example data included with the R package.
However, if you're working with your own data, you don't need it.
I believe this should work:
cps_ddi <- read_ipums_ddi("wagesdata.xml")
cps_data <- read_ipums_micro(cps_ddi, data_file = "usa_00004.csv", verbose = FALSE)
If it doesn't, then most likely you haven't downloaded the data to your current working directory. You can check where your session is running with getwd() and see which files are currently available there with list.files().
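A quick way to run that check before calling the readers is sketched below. `missing_files` is a hypothetical helper name, and the file names are the ones from the question; substitute the names of your own extract.

```r
# Hypothetical helper: return the expected files that are NOT present in a
# directory (defaults to the current working directory).
missing_files <- function(files, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

# File names taken from the question; an empty result means both were found.
missing_files(c("wagesdata.xml", "usa_00004.csv"))
```

If anything is returned, either move the files into the directory reported by getwd() or pass their full paths to read_ipums_ddi() and read_ipums_micro().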

Unable to load .csv file in R on my new laptop

I am facing a weird problem with R while loading a .csv file. Until last week I was working on an assignment in R, having loaded a .csv file with more than 1 million records.
I got a new company laptop and installed R and the following packages:
library(dplyr)
library(lubridate)
library(hms)
library(stringr)
library(tidyr)
library(ggplot2)
library(gridExtra)
library(tidyverse)
library(chron)
library(corrplot)
library(rio)
library(data.table)
library(openxlsx)
library(readr)
But surprisingly I am unable to run the read.csv command to read the csv file. I am using the following commands after giving the setwd command and getting the errors given below.
consumer_data <- read.csv("ConsumerElectronics.csv", fileEncoding="UTF-8-BOM")
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
consumer_data <- read.csv("ConsumerElectronics.csv", stringsAsFactors = "FALSE")
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
I don't know what's causing this problem. I was not getting this error before. Have I forgotten to install a package needed to read a csv file?
I want only the above code to work. Please do not provide any alternative solution, as this code was working earlier and reading the .csv file from the location where it's stored.

r - defining path for read_excel with paste0

I'm working on an R script that is supposed to open an Excel file from a folder on the current user's computer using read_excel from the readxl library.
The path will have a personal folder (C:/Users/Username....).
I'm trying to accomplish that as follows:
string <- getwd()
name <- strsplit(strsplit(x = string, split = "C:/Users/")[[1]][2], split = "/")[[1]][1]
path_crivo <- paste0("C:/Users/", name, "/some_folders/excel_file.xlsx")
So path_crivo stores the string: "C:/Users/João Anselmo/some_folders/excel_file.xlsx"
When I run the read_excel function with this path I get the error:
read_excel(path_crivo)
"Error in read_fun(path = path, sheet_i = sheet, limits = limits, shim = shim, :
Evaluation error: zip file 'C:/Users/João Anselmo/some_folders/excel_file.xlsx' cannot be opened."
If I set path_crivo directly as follows:
path_crivo <- "C:/Users/João Anselmo/some_folders/excel_file.xlsx"
It works perfectly.
Has anyone faced a similar problem?
I can't rename the folders or set path_crivo directly, because it is supposed to be a personal path.
Thanks for your help
Try
Encoding(path_crivo)<-"latin1"
(or, possibly, change the encoding of string before you create path_crivo)
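To see what that suggestion does, here is a small sketch of inspecting and declaring a string's encoding in base R. The byte escape stands in for a name that arrived as Latin-1 bytes; whether "latin1" is the right label on your machine depends on the system locale, which the question does not state, so treat it as an assumption.

```r
# "João" as Latin-1 bytes, written with an escape so this script stays ASCII.
name <- "Jo\xe3o"
Encoding(name)               # how R currently labels these bytes

# Declare what the bytes actually are (assumption: they are Latin-1).
Encoding(name) <- "latin1"

# Re-encode to UTF-8 before building the path.
name_utf8 <- enc2utf8(name)
path_crivo <- paste0("C:/Users/", name_utf8,
                     " Anselmo/some_folders/excel_file.xlsx")
Encoding(path_crivo)         # label carried by the assembled path
```

Comparing Encoding() of the pasted path with that of the hard-coded path that works may show where the two strings differ.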

Can't open .biom file for Phyloseq tree plotting

After trying to read a biom file:
rich_dense_biom <-
system.file("extdata", "D:\sample_otutable.biom", package = "phyloseq")
myData <-
import_biom(rich_dense_biom, treefilename, refseqfilename, parseFunction =
parse_taxonomy_greengenes)
the following error is shown:
Error in read_biom(biom_file = BIOMfilename) :
Both attempts to read input file:
either as JSON (BIOM-v1) or HDF5 (BIOM-v2).
Check file path, file name, file itself, then try again.
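Are you sure D:\sample_otutable.biom really exists? And is it a package file? system.file() only looks for files inside an installed package, so it will not find a file at an arbitrary location on D:.
Also, in an R string a single backslash starts an escape sequence, so Windows paths should be written with \\ (or with forward slashes).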
This works for me
library("devtools")
install_github("joey711/biom")  # current devtools expects "user/repo"
library(biom)
library(phyloseq)  # import_biom() comes from phyloseq
biom.file <-
"C:\\Users\\Mark Miller\\Documents\\R\\win-library\\3.3\\biom\\extdata\\min_dense_otu_table.biom"
my.data <- import_biom(BIOMfilename = biom.file)
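The difference between the two approaches can be sketched in base R. The file name "no_such_file.biom" is made up purely to trigger a miss, and `check_path` is a hypothetical helper, not part of phyloseq or biom.

```r
# system.file() searches inside an installed package and returns "" on a miss.
# "no_such_file.biom" is a made-up name used only to demonstrate that.
pkg_path <- system.file("extdata", "no_such_file.biom", package = "base")
pkg_path == ""   # TRUE: nothing was found

# For a file that lives outside any package, use the path itself and verify it.
# `check_path` is a hypothetical helper name.
check_path <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("File not found: '", path, "'")
  }
  normalizePath(path, winslash = "/")
}

# Possible usage (assumes the file really is at this location):
# biom.file <- check_path("D:/sample_otutable.biom")
```

Passing an empty string from a failed system.file() lookup on to import_biom() is one plausible way to end up with the "Both attempts to read input file" error.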
