R Error: unknown input format - r

NEI <- readRDS(unz(tf, filename = "summarySCC_PM25.rds", open = "", encoding = getOption("encoding")))
The variable tf is a temporary file saved at a specific location on the hard drive. My understanding is that the signature of unz() is:
unz(description, filename, open = "", encoding = getOption("encoding"))
As I read the documentation, my call matches that signature:
description is the path to the zip file, which tf holds as c://...//345du.zip
filename is summarySCC_PM25.rds, which is the file to be extracted from tf
open is already established in the variable, so blank should be fine
encoding specifies the character encoding.
With the code above, I receive "Error: unknown input format" from R 3.1.1. I need clarification on what might be happening, since I interpret my code to be equivalent to:
NEI <- readRDS("summarySCC_PM25.rds")
Am I misinterpreting this?

I found your data online so that I can read your file. It was available from here:
https://www.linkedin.com/today/post/article/20140617173447-5576436-explore-n-analyze-data-assignment-2
> unzip("C:\\Users\\jmiller\\Downloads\\exdata_data_NEI_data.zip")
> NEI <- readRDS("summarySCC_PM25.rds")
> dim(NEI)
[1] 6497651 6
> colnames(NEI)
[1] "fips" "SCC" "Pollutant" "Emissions" "type" "year"

Avoid unz() and use unzip() (with an index) instead, since the temp file is a moving target.
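A minimal sketch of that advice, assuming tf holds the path to the downloaded zip file. The helper name read_rds_from_zip is hypothetical and not part of any package; it extracts the named member to a scratch directory with unzip() and reads the extracted file with readRDS().

```r
# Hypothetical helper: extract one member of a zip archive, then read it.
read_rds_from_zip <- function(zip_path, member) {
  out_dir <- tempfile("unzipped_")
  dir.create(out_dir)
  # Remove the scratch directory once the object has been read.
  on.exit(unlink(out_dir, recursive = TRUE), add = TRUE)
  # unzip() returns the paths of the files it extracted.
  extracted <- unzip(zip_path, files = member, exdir = out_dir)
  stopifnot(length(extracted) == 1L)
  readRDS(extracted)
}

# Usage with the names from the question (tf is the downloaded zip):
# NEI <- read_rds_from_zip(tf, "summarySCC_PM25.rds")
```

Reading from a real file on disk lets readRDS() handle the compression of the .rds file itself, which a bare unz() connection does not appear to do.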

Related

DirSource returning empty directory error despite correct file path

This seems like a very basic issue. The file path is valid and I can open the file using other means in R, but I am looking to use the tm library.
docs <- Corpus(DirSource("C:/Users/xyz/Work/test.corpus.txt", encoding = "UTF-8"))
Throws an error of:
Error in inherits(x, "Source") : empty directory
EDIT:
This works with the original method:
docs <- Corpus(DirSource("C:/Users/xyz/Work/", encoding = "UTF-8"))
Apparently you cannot specify an individual file name. The solution is to read the file via another method and then use another source type such as VectorSource.
You can specify a pattern so that DirSource only picks up the files matching it: pattern = ".txt" for all txt files or, if you want, pattern = "test.corpus.txt". Something like the line below.
docs <- Corpus(DirSource("C:/Users/xyz/Work/", pattern = "test.corpus.txt", encoding = "UTF-8"))
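The VectorSource route mentioned above could look roughly like this. It is a sketch that assumes the tm package is installed; the helper name corpus_from_file is hypothetical, and the whole file is treated as a single document.

```r
library(tm)

# Hypothetical helper: read one text file and wrap it as a one-document corpus.
corpus_from_file <- function(path, encoding = "UTF-8") {
  lines <- readLines(path, encoding = encoding, warn = FALSE)
  # Collapse the lines so the file becomes one element, i.e. one document.
  Corpus(VectorSource(paste(lines, collapse = "\n")))
}

# Usage with the path from the question:
# docs <- corpus_from_file("C:/Users/xyz/Work/test.corpus.txt")
```

Dropping the paste() call would instead give one document per line, which may or may not be what you want.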

r - defining path for read_excel with paste0

I'm working on an R script that is supposed to open an Excel file from a folder on the current user's computer using read_excel from the readxl library.
The path will have a personal folder (C:/Users/Username....).
I'm trying to accomplish that as follows:
string <- getwd()
name <- strsplit(strsplit(x = string, split = "C:/Users/")[[1]][2], split = "/")[[1]][1]
path_crivo <- paste0("C:/Users/", name, "/some_folders/excel_file.xlsx")
So path_crivo stores the string: "C:/Users/João Anselmo/some_folders/excel_file.xlsx"
When I run the read_excel function with this path I get the error:
read_excel(path_crivo)
"Error in read_fun(path = path, sheet_i = sheet, limits = limits, shim = shim, :
Evaluation error: zip file 'C:/Users/João Anselmo/some_folders/excel_file.xlsx' cannot be opened."
If I set path_crivo directly as follows:
path_crivo <- "C:/Users/João Anselmo/some_folders/excel_file.xlsx"
It works perfectly.
Has anyone faced a similar problem?
I can't rename the folders, nor set path_crivo directly, it is supposed to be a personal path.
Thanks for your help
Try
Encoding(path_crivo)<-"latin1"
(or, possibly, change the encoding of string before you create path_crivo)
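To illustrate what the encoding mark changes, here is a small sketch. It assumes the user name arrived as latin1 bytes without a declared encoding, which is only a guess at what getwd() returned on the asker's machine; the literal below is a stand-in for that value.

```r
# "\xe3" is the single latin1 byte for "ã"; this stands in for the
# unmarked string the question obtained from getwd().
name <- "Jo\xe3o Anselmo"

# Tell R how to interpret those bytes.
Encoding(name) <- "latin1"

path_crivo <- paste0("C:/Users/", name, "/some_folders/excel_file.xlsx")

# Optionally convert to UTF-8 so the path matches a literal typed in a
# UTF-8 script.
path_utf8 <- enc2utf8(path_crivo)
```

Setting Encoding() does not change the bytes; it only declares how they should be read, which is why it can make an otherwise identical-looking path resolve correctly.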

Create a .tar.gz file from serialized content string with R

Given a .tar.gz file on my hard disk, I would like to create that exact file, but with R code alone (e.g. with the help of serialization). The goal is to not refer to the file itself, but to generate a plain text variable containing the content of the file and after that to write the file to the file system. I thought about the following:
Take the base64 string of the file (base64 serialization).
Write it to the file system as a binary file.
But the following code generates an empty file:
zzfil <- tempfile("testfile")
zz <- file(zzfil, "wb")
file_content <- "H4sIAAAAAAAAA+1YbW/bNhD2Z/6KW/zBNpLIerHjQmvapo6HBWgyw3ZXDE1X0BJtEZFIgaTguEb/+06S7drJumJA5m6DHsAQxOM9PPF4uscyTJuUBnd0zmzbaV8Oxv3R1XBy9ctN7clgI846nfzq9Lr27rVAr2fXHM+zvV6303N6NdvxHDSDXTsAMm2owlDE/K/nfcv+H8WwzL0PZu8gkMkyxcG1lUy4ifH2XUQNmIhtxuFSMg3Nwgp9qlmL/MqU5lL4YFuOZZOLzERS5Z4SFkoaBtyQa8qFwR9DwwTZ1stCsh2H50uZKc3i2SstE7aImGKWYOYFuWQ6UDw1xRrXUjGgU5kZWOShcQNhEVFCl1Pky80mogKkYBAjcYsA4q1mMEN+0LgyTkd6AVyETBgu5hiOonNF0wgt3ERcFI+8s7BF3vCACb3Zkbi8A67zCDIkUi/JQAQyRDof3k5+On2GgadMhNqHETRfnINndSyvRa6SVCqDo/N5GkvjFjbXci2ndQKGT6e4sfmQg9PdFnlDPy0vqaGYLpUxcsNYqPsySXlMyx0RkqxzE/rg2s6zU7t76jngOL7T870uBtP/EcScbPK/n/b2zcX1YDy86A+e8ox9q/6x8Mv6d92eZztY/+6Z163q/xBg9/kBHFJjmBLNo9/fv/dpnEbU//Dh+KhFahX+33hQ/6P2P7BG0eO73a/Xv20/qH/nDGUAdKv6P3z+IxbH0hod8P2P2T5b59/zOo6d6z+706ve/4dAHX7OE34CC6ni8AdSJ3XUZLmS0YDCid3TJEUNMstEkCsMEDRhITSKU9LAuYuIBxGkCpWbhsYeV8Mq2H6TGQRIFTOqRKnJSsm2kX200Ii58srlFozGJgu5BGr8wh8gMib12211mt7NtRXR0AqkJT61C/MY9SFkms2yGO7YciqpCkEjoQkyDGkm1eOFNsSvMx6H+JghjFgsaQhbNQyNvlExHMM44jOD19eNwqMfseBuZ9oOHnoMSo8JFtifOzzymDQIqTfgVdmTSbHH8Px0u/nNFqxQwBab3Tza22ts1Z8fO3/Mq3ufISeI+VRRtWyuNWfrC+eV3gpRLrAw4piFL4+KCTlNaWuGqEDPExNQpU+AMt28P0/SeaucdgxzJpOPeISMRBWdYNBQh5DNaBYbmCItRlr13X/p+z+h4ukVwN/v/67Tc6r+/53yv1YA4cH+/7mP8u91u17V/w+B27yhr4qUfya3NOZUb+9M/l1nte4z74o+g6OZxrOyKhtMM287t+GXTyMrMvyKFMB5azGhd52rN3CFChUqfB/8AQr6tbUAGgAA"
writeBin(RCurl::base64Decode(file_content), zz)
close(zz)
file.rename(from = zzfil, to = paste0(zzfil,".tar.gz"))
How should I serialize the file instead? I.e. how should I fill the functions file_to_string and string_to_file?
file_to_string <- function(input_file){
# Return a serialized string of input_file
}
string_to_file <- function(input_string){
# Return content to write to a file
}
original_file <- "original.tar.gz"
zzfil <- tempfile("copy")
zz <- file(zzfil, "wb")
file_content <- file_to_string(original_file)
writeBin(string_to_file(file_content), zz)
close(zz)
file.rename(from = zzfil, to = paste0(zzfil,".tar.gz"))
For me, using R 3.4.4 on platform x86_64-pc-linux-gnu, RCurl version 1.95-4.10, the example code produces a non-empty file that can be read back in using readBin, so I can't reproduce your empty file issue.
But that's not the main issue here.
Using writeBin does not achieve what you want to do: its use case is to store an R object (a vector) in a binary format on the filesystem and read it back in with readBin; not to read in a binary file, then manipulate it and save the new version or generate a binary file that is meant to be understood by anything else besides readBin.
In my humble opinion: R is probably not the right tool to do binary patches.
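For reference, the two placeholder functions from the question could be filled in along these lines. This is a sketch, not a tested recipe for every archive: it assumes the base64enc package is installed, and it relies on readBin() and writeBin() passing raw vectors through byte for byte.

```r
library(base64enc)

# Read the file as raw bytes and return them as one base64 string.
file_to_string <- function(input_file) {
  bytes <- readBin(input_file, what = "raw", n = file.size(input_file))
  base64encode(bytes)
}

# Decode the base64 string back into a raw vector for writeBin().
string_to_file <- function(input_string) {
  base64decode(input_string)
}

# Usage, following the skeleton in the question:
# file_content <- file_to_string("original.tar.gz")
# writeBin(string_to_file(file_content), "copy.tar.gz")
```

Working with raw vectors throughout avoids passing the decoded bytes through a character string, where an embedded zero byte would cut the content short.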

Convert .csv file for further manipulation using 'highfrequency' package on R

The highfrequency package has been created in a way to transform .txt and .csv files from the NYSE TAQ and WRDS TAQ respectively into .RData files of xts objects, which then can be easily manipulated through the package.
The problem is that I have limited access to the WRDS database, which only lets me download tick data from the CRSP (The Center for Research in Security Prices) database, not the TAQ (Trades and Quotes) database. So my data look like this. The downloadable file contains tick data for the REIT index from 2014-01-01 to 2014-01-05. I manually changed the ticker header to PRICE, as proposed by Kris Boudt, one of the main authors.
The code that I use is the following:
from="2014-03-01"
to="2014-04-31"
datasource="C:/Users/aris/Desktop/raw_data"
datadestination="C:/Users/aris/Desktop/xts_data"
convert(from = from,to=to,datasource = datasource,datadestination = datadestination,
trades=TRUE,quotes=FALSE,ticker="REIT",dir=FALSE,extension="csv",header = TRUE,
tradecolnames = NULL, quotecolnames = NULL,format = "%Y%m%d %H:%M:%S",onefile=TRUE)
I suspect that the problem lies in the argument format = "%Y%m%d %H:%M:%S", since in the .csv file the date and the time are comma-separated. I tried putting a comma between %d and %H, as in format = "%Y%m%d,%H:%M:%S", but that did not help.
The error reads
Error in `$<-.data.frame`(`*tmp*`, "COND", value = numeric(0)) :
replacement has 0 rows, data has 1048575
All suggestions are welcome.
Thanks to Joshua Ulrich I was able to gain some additional intuition and solve the problem(s). Actually, there is no need to manipulate the .csv file itself and add extra columns. Instead of setting tradecolnames = NULL, you tell the function which columns your file contains by setting tradecolnames = c("DATE","TIME","PRICE"). The problem with the non-existent directories is fixed by setting dir=TRUE. The final code looks like this:
from="2014-03-01"
to="2014-04-31"
datasource="C:/Users/aris/Desktop/raw_data"
datadestination="C:/Users/aris/Desktop/xts_data"
convert(from,to,datasource,datadestination,trades=TRUE,quotes=FALSE,ticker="REIT",dir=TRUE,extension="csv",header= TRUE,tradecolnames=c("DATE","TIME","PRICE"),format = "%Y%m%d %H:%M:%S",onefile=TRUE)
The highfrequency::convert function calls highfrequency:::makeXtsTrades, which expects the following columns in your text file: DATE,TIME,PRICE,SIZE,SYMBOL,EX,COND,CORR,G127.
I added empty columns to your text file, and did not get the error in your question. The edited text file looks like:
DATE,TIME,PRICE,SIZE,SYMBOL,EX,COND,CORR,G127
20140102,9:30:00,1123.77,,,,,,
20140102,9:30:01,1122.81,,,,,,
20140102,9:30:02,1122.77,,,,,,
I got another error though.
Error in gzfile(file, "wb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "wb") :
cannot open compressed file '/home/josh/Desktop/z_xts/2014-01-02/REIT_trades.RData', probable reason 'No such file or directory'
So it looks like the convert function expects all the daily output directories to exist before you run it. The function runs and creates the output after I create those directories.
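Creating those directories by hand could be sketched as below. The directory naming (one folder per day, named YYYY-MM-DD under datadestination) is inferred from the path in the warning message above; the destination here is a stand-in under tempdir() and the date range is the one covered by the sample file.

```r
# Stand-in for the datadestination argument passed to convert().
datadestination <- file.path(tempdir(), "xts_data")

from <- as.Date("2014-01-01")
to <- as.Date("2014-01-05")

# One folder name per calendar day, e.g. "2014-01-02".
days <- format(seq(from, to, by = "day"), "%Y-%m-%d")

for (day in days) {
  dir.create(file.path(datadestination, day),
             recursive = TRUE, showWarnings = FALSE)
}
```

Creating a folder for every calendar day, including weekends, is harmless here; unused folders simply stay empty.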

minimize code for base64 decoding of rds object

I have an R object (saved as an .rds file) as a base64-encoded string:
encoded <- "H4sIAAAAAAAABoVQywrCMBBc8zi0ICiCPyHme7yVrU0hkEZIA/XofwvWTUwQc/Ewy2SyOzvJpQUABpwxYJwoP1DZEHYECQKaeD5nwnHqa+1LTgCCpfGPIB1Oes5eReSD8VVf42+LKr3bmOdBZV3XZ214tTgXQ5bFdsCAKmBv9Y8yenKsDPbKuKC9Q6tmbUevRxKPGa+MP2mlcYO+0/6YFOrLrksT6RkyI36nSInJsCx6A7sXoh15AQAA"
I need to load this object in R. Following this SO question ("Base64 encoding a .Rda file"), I came to the following code:
library("base64enc")
conb64 <- file('obj.b64', 'w+b')
write(encoded, conb64);
close(conb64)
base64decode(file='obj.b64', output = 'obj.rds')
myobj <- readRDS('obj.rds')
This works fine but I would like to minimize the code and ideally manage without creating disk files, something like myobj <- readRDS(base64decode(encoded)). Is there any way to remove at least some operations?
It seems to me that there's a bug in the base64enc package. It can be reproduced by simply executing base64decode(what='anything', output = 'any.name'), which gives an error:
Error in base64decode(what="anything", output = "any.name") :
argument "file" is missing, with no default
Apparently it happens because base64decode() uses file as an argument but also calls the file() function. When I changed the source of the function (replaced file with filename), everything worked correctly, and the code decoded <- base64decode(encoded, what = 'raw') gives the correct binary rds content. Until the bug is fixed, one can use the function of the same name from the caTools package: decoded <- base64decode(z = encoded, what = 'raw'). However, I failed to feed this to the readRDS() function:
library('caTools')
decoded <- base64decode(encoded, what = 'raw')
con1 <- rawConnection(object = decoded, open = 'rb')
myobj <- readRDS(con1)
# Error in readRDS(con1) : unknown input format
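A possible explanation, offered as a sketch: saveRDS() gzip-compresses its output by default, and the readRDS() documentation notes that when it is given a connection, decompression only happens if the connection itself handles it, for example by wrapping it in gzcon(). The example below builds its own bytes rather than using the string from the question; the variable bytes stands in for the decoded raw vector.

```r
# Build some gzip-compressed .rds bytes in memory to stand in for `decoded`.
obj <- list(a = 1:3, b = "text")
tmp <- tempfile(fileext = ".rds")
saveRDS(obj, tmp)
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))

# Wrap the raw connection so the gzip stream is decompressed on read.
con <- gzcon(rawConnection(bytes))
restored <- readRDS(con)
close(con)
```

With this approach no intermediate file is needed once the base64 string has been decoded to a raw vector.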

Resources