minimize code for base64 decoding of rds object


I have an R object (saved as an .rds file) as a base64-encoded string:
encoded <- "H4sIAAAAAAAABoVQywrCMBBc8zi0ICiCPyHme7yVrU0hkEZIA/XofwvWTUwQc/Ewy2SyOzvJpQUABpwxYJwoP1DZEHYECQKaeD5nwnHqa+1LTgCCpfGPIB1Oes5eReSD8VVf42+LKr3bmOdBZV3XZ214tTgXQ5bFdsCAKmBv9Y8yenKsDPbKuKC9Q6tmbUevRxKPGa+MP2mlcYO+0/6YFOrLrksT6RkyI36nSInJsCx6A7sXoh15AQAA"
I need to load this object in R. Following this SO question ("Base64 encoding a .Rda file"), I arrived at the following code:
library("base64enc")
conb64 <- file('obj.b64', 'w+b')
write(encoded, conb64);
close(conb64)
base64decode(file='obj.b64', output = 'obj.rds')
myobj <- readRDS('obj.rds')
This works fine but I would like to minimize the code and ideally manage without creating disk files, something like myobj <- readRDS(base64decode(encoded)). Is there any way to remove at least some operations?
It seems to me that there is a bug in the base64enc package. It can be reproduced by simply executing base64decode(what='anything', output = 'any.name'), which gives an error:
Error in base64decode(what="anything", output = "any.name") :
argument "file" is missing, with no default
Apparently this happens because base64decode() uses file as an argument name but also calls the file() function. When I changed the source of the function (renaming the file argument to filename), everything worked correctly, and decoded <- base64decode(encoded, what = 'raw') gave the correct binary rds content. Until the bug is fixed, one can use the function of the same name from the caTools package: decoded <- base64decode(z = encoded, what = 'raw'). However, I failed to feed the result to the readRDS() function:
library('caTools')
decoded <- base64decode(encoded, what = 'raw')
con1 <- rawConnection(object = decoded, open = 'rb')
myobj <- readRDS(con1)
# Error in readRDS(con1) : unknown input format
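The "unknown input format" error is consistent with readRDS() receiving bytes that are still compressed: saveRDS() gzip-compresses by default, and a plain raw connection does not decompress. A minimal sketch of an in-memory round trip follows, assuming the installed base64enc version decodes a string correctly; the helper name read_rds_base64 and the demo object are made up for illustration.

```r
library(base64enc)  # assumed installed; provides base64encode() and base64decode()

# Build a small demo string: save an object as .rds (gzip-compressed by default),
# read the file's bytes and base64-encode them
tmp <- tempfile(fileext = ".rds")
saveRDS(list(a = 1:3, b = "text"), tmp)
encoded_demo <- base64encode(readBin(tmp, what = "raw", n = file.size(tmp)))
unlink(tmp)

# Hypothetical helper: base64 string -> raw bytes -> decompressing connection -> object
read_rds_base64 <- function(encoded) {
  con <- gzcon(rawConnection(base64decode(encoded)))
  on.exit(close(con))
  readRDS(con)
}

restored <- read_rds_base64(encoded_demo)
```

Wrapping the raw connection in gzcon() is the step missing from the failing attempt above; no disk file is needed for the decoding itself.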

Related

Remove invalid character in json in R

Two problems.
I have a large file with sensor data exported in JSON format as txt files.
When I use jsonlite to parse it:
json1 <- fromJSON(txt = "temp.txt")
I receive:
Error in parse_con(txt, bigint_as_char) :
lexical error: invalid char in json text.
prm,{"event_id":"0d3eefe1-8f7e-
(right here) ------^
I have tried to run some simple code to clean it:
test <- readLines("temp.txt", warn = FALSE)
test <- gsub("prm,", "", test)
This cleans the gunk out but then when I try to save it back as a text file:
write.table(test, "test.txt", sep= ",")
The file contains this at the beginning:
"x"
"1","{\"event_id\":\"0d3eefe1-8 etc
Any ideas?

I think what you are looking for is writeLines().
write.table() converts the character strings to a table before writing. The "1", at the start of the line is the row name, which R writes as an extra column when saving this way, and "x" is the column name it creates.
What I think you wanted to do is:
writeLines(test, "test.txt", useBytes = TRUE)
The part useBytes = TRUE makes sure the encoding is not changed when you save the file (which Windows annoyingly insists on doing otherwise).
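Put together, the clean-and-save step might look like the sketch below. The sample lines and temporary files are hypothetical stand-ins for temp.txt and test.txt, and the pattern is anchored with ^ so only a leading prm, is removed (a slightly stricter variant of the gsub() call in the question).

```r
# Hypothetical sample input standing in for temp.txt
infile <- tempfile(fileext = ".txt")
writeLines(c('prm,{"event_id":"a1","value":1}',
             'prm,{"event_id":"b2","value":2}'), infile)

lines <- readLines(infile, warn = FALSE)
cleaned <- sub("^prm,", "", lines)  # drop the prefix only at the start of each line

# writeLines() writes each string verbatim: no quotes, row names or header
outfile <- tempfile(fileext = ".txt")
writeLines(cleaned, outfile, useBytes = TRUE)

written <- readLines(outfile, warn = FALSE)
```

The resulting file should then contain one JSON object per line, without the "x" header or the quoted row numbers.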

DirSource returning empty directory error despite correct file path

This seems like a very basic issue. The file path is valid and I can open the file using other means in R, but I want to use the tm library.
docs <- Corpus(DirSource("C:/Users/xyz/Work/test.corpus.txt", encoding = "UTF-8"))
Throws an error of:
Error in inherits(x, "Source") : empty directory
EDIT:
This works with the original method:
docs <- Corpus(DirSource("C:/Users/xyz/Work/", encoding = "UTF-8"))
Apparently you cannot specify an individual file name. The solution is to read the file via another method and then use another source type such as VectorSource.

You can specify a pattern so that DirSource only picks up the files matching it, for example pattern = ".txt" for all txt files or, if you want a single file, pattern = "test.corpus.txt". Note that the pattern is treated as a regular expression. Something like this:
docs <- Corpus(DirSource("C:/Users/xyz/Work/", pattern = "test.corpus.txt", encoding = "UTF-8"))
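Because the pattern is a regular expression, an unescaped dot matches any character. The sketch below uses base R's list.files() as a stand-in to show the difference between a loose and an anchored, escaped pattern; the directory and file names are hypothetical, and the assumption is that DirSource applies its pattern to file names in the same way.

```r
# Hypothetical directory with one wanted file and two look-alikes
dir_demo <- tempfile("corpus_dir")
dir.create(dir_demo)
invisible(file.create(file.path(
  dir_demo, c("test.corpus.txt", "test-corpus-txt.bak", "other.txt")
)))

# Unescaped dots match any character, so the .bak look-alike is picked up too
loose <- list.files(dir_demo, pattern = "test.corpus.txt")

# Escaped dots plus ^ and $ anchors match exactly one file name
strict <- list.files(dir_demo, pattern = "^test\\.corpus\\.txt$")
```

If the same holds for DirSource, passing pattern = "^test\\.corpus\\.txt$" would restrict the corpus to that single file.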

r - defining path for read_excel with paste0

I'm working on an R script that is supposed to open an Excel file from a folder on the current user's computer using read_excel from the readxl library.
The path will have a personal folder (C:/Users/Username....).
I'm trying to accomplish that as follows:
string <- getwd()
name <- strsplit(strsplit(x = string, split = "C:/Users/")[[1]][2], split = "/")[[1]][1]
path_crivo <- paste0("C:/Users/", name, "/some_folders/excel_file.xlsx")
So path_crivo stores the string "C:/Users/João Anselmo/some_folders/excel_file.xlsx".
When I run the read_excel function with this path I get the error:
read_excel(path_crivo)
"Error in read_fun(path = path, sheet_i = sheet, limits = limits, shim = shim, :
Evaluation error: zip file 'C:/Users/JoÃ£o Anselmo/some_folders/excel_file.xlsx' cannot be opened."
If I set path_crivo directly as follows:
path_crivo <- "C:/Users/João Anselmo/some_folders/excel_file.xlsx"
It works perfectly.
Has anyone faced a similar problem?
I can't rename the folders, nor set path_crivo directly, it is supposed to be a personal path.
Thanks for your help

Try
Encoding(path_crivo)<-"latin1"
(or, possibly, change the encoding of string before you create path_crivo)
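Which direction the fix has to go depends on how the string coming from getwd() is encoded and marked, which may differ between systems. The sketch below is only a diagnostic aid under that caveat: it shows how to inspect a string's encoding, how relabelling the same bytes produces the kind of garbling seen in the error message, and how converting differs from relabelling. The variable names are made up for illustration.

```r
name_utf8 <- "Jo\u00e3o"          # "João", marked as UTF-8
Encoding(name_utf8)               # "UTF-8"
charToRaw(name_utf8)              # 4a 6f c3 a3 6f

# Relabelling keeps the bytes but changes how they are interpreted:
# the two UTF-8 bytes of the accented letter are now read as two latin1 characters
mislabelled <- name_utf8
Encoding(mislabelled) <- "latin1"
garbled <- enc2utf8(mislabelled)

# Converting rewrites the bytes for the target encoding instead
name_latin1 <- iconv(name_utf8, from = "UTF-8", to = "latin1")
charToRaw(name_latin1)            # 4a 6f e3 6f
```

Comparing Encoding() and charToRaw() for the computed path_crivo and for the hard-coded string that works should show which of the two operations is needed.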

Create a .tar.gz file from serialized content string with R

Given a .tar.gz file on my hard disk, I would like to recreate that exact file with R code alone (e.g. with the help of serialization). The goal is to not refer to the file itself, but to generate a plain-text variable containing the content of the file and then write that content to the file system. I thought about the following:
Take the base64 string of the file (base64 serialization).
Write it to the file system as a binary file.
But the following code generates an empty file:
zzfil <- tempfile("testfile")
zz <- file(zzfil, "wb")
file_content <- "H4sIAAAAAAAAA+1YbW/bNhD2Z/6KW/zBNpLIerHjQmvapo6HBWgyw3ZXDE1X0BJtEZFIgaTguEb/+06S7drJumJA5m6DHsAQxOM9PPF4uscyTJuUBnd0zmzbaV8Oxv3R1XBy9ctN7clgI846nfzq9Lr27rVAr2fXHM+zvV6303N6NdvxHDSDXTsAMm2owlDE/K/nfcv+H8WwzL0PZu8gkMkyxcG1lUy4ifH2XUQNmIhtxuFSMg3Nwgp9qlmL/MqU5lL4YFuOZZOLzERS5Z4SFkoaBtyQa8qFwR9DwwTZ1stCsh2H50uZKc3i2SstE7aImGKWYOYFuWQ6UDw1xRrXUjGgU5kZWOShcQNhEVFCl1Pky80mogKkYBAjcYsA4q1mMEN+0LgyTkd6AVyETBgu5hiOonNF0wgt3ERcFI+8s7BF3vCACb3Zkbi8A67zCDIkUi/JQAQyRDof3k5+On2GgadMhNqHETRfnINndSyvRa6SVCqDo/N5GkvjFjbXci2ndQKGT6e4sfmQg9PdFnlDPy0vqaGYLpUxcsNYqPsySXlMyx0RkqxzE/rg2s6zU7t76jngOL7T870uBtP/EcScbPK/n/b2zcX1YDy86A+e8ox9q/6x8Mv6d92eZztY/+6Z163q/xBg9/kBHFJjmBLNo9/fv/dpnEbU//Dh+KhFahX+33hQ/6P2P7BG0eO73a/Xv20/qH/nDGUAdKv6P3z+IxbH0hod8P2P2T5b59/zOo6d6z+706ve/4dAHX7OE34CC6ni8AdSJ3XUZLmS0YDCid3TJEUNMstEkCsMEDRhITSKU9LAuYuIBxGkCpWbhsYeV8Mq2H6TGQRIFTOqRKnJSsm2kX200Ii58srlFozGJgu5BGr8wh8gMib12211mt7NtRXR0AqkJT61C/MY9SFkms2yGO7YciqpCkEjoQkyDGkm1eOFNsSvMx6H+JghjFgsaQhbNQyNvlExHMM44jOD19eNwqMfseBuZ9oOHnoMSo8JFtifOzzymDQIqTfgVdmTSbHH8Px0u/nNFqxQwBab3Tza22ts1Z8fO3/Mq3ufISeI+VRRtWyuNWfrC+eV3gpRLrAw4piFL4+KCTlNaWuGqEDPExNQpU+AMt28P0/SeaucdgxzJpOPeISMRBWdYNBQh5DNaBYbmCItRlr13X/p+z+h4ukVwN/v/67Tc6r+/53yv1YA4cH+/7mP8u91u17V/w+B27yhr4qUfya3NOZUb+9M/l1nte4z74o+g6OZxrOyKhtMM287t+GXTyMrMvyKFMB5azGhd52rN3CFChUqfB/8AQr6tbUAGgAA"
writeBin(RCurl::base64Decode(file_content), zz)
close(zz)
file.rename(from = zzfil, to = paste0(zzfil,".tar.gz"))
How should I serialize the file instead? That is, how should I fill in the functions file_to_string and string_to_file?
file_to_string <- function(input_file){
# Return a serialized string of input_file
}
string_to_file <- function(input_string){
# Return content to write to a file
}
original_file <- "original.tar.gz"
zzfil <- tempfile("copy")
zz <- file(zzfil, "wb")
file_content <- file_to_string(original_file)
writeBin(string_to_file(file_content), zz)
close(zz)
file.rename(from = zzfil, to = paste0(zzfil,".tar.gz"))

For me, using R 3.4.4 on platform x86_64-pc-linux-gnu with RCurl version 1.95-4.10, the example code produces a non-empty file that can be read back in using readBin, so I can't reproduce your empty file issue.
But that's not the main issue here.
Using writeBin does not achieve what you want to do: its use case is to store an R object (a vector) in a binary format on the filesystem and read it back in with readBin, not to read in a binary file, manipulate it and save the new version, or to generate a binary file that is meant to be understood by anything other than readBin.
In my humble opinion: R is probably not the right tool to do binary patches.
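For what it is worth, a byte-for-byte round trip through a base64 string does appear to be possible when the content is kept as a raw vector throughout. The following is a minimal sketch of the two placeholder functions from the question, assuming the base64enc package is installed (the question itself used RCurl); the stand-in file is hypothetical and merely starts with gzip-like bytes.

```r
library(base64enc)  # assumed installed; provides base64encode() and base64decode()

file_to_string <- function(input_file) {
  # Read the whole file as raw bytes and return them as one base64 string
  bytes <- readBin(input_file, what = "raw", n = file.size(input_file))
  base64encode(bytes)
}

string_to_file <- function(input_string) {
  # Return the decoded content as a raw vector, ready for writeBin()
  base64decode(input_string)
}

# Round trip on a hypothetical stand-in for original.tar.gz
original_file <- tempfile(fileext = ".bin")
writeBin(as.raw(c(0x1f, 0x8b, 0x08, 0x00, 0:255)), original_file)

copy_file <- tempfile(fileext = ".bin")
file_content <- file_to_string(original_file)
writeBin(string_to_file(file_content), copy_file)

# Compare checksums of the original and the copy
same <- identical(unname(tools::md5sum(original_file)),
                  unname(tools::md5sum(copy_file)))
```

The point of the raw vector is that binary data may contain zero bytes, which a character string cannot hold.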

R Error: unknown input format

NEI <- readRDS(unz(tf, filename = "summarySCC_PM25.rds", open = "", encoding = getOption("encoding")))
Variable tf is a temporary file with a very specific location saved on the hard drive. It is my understanding that the format for unz() is:
unz(description, filename, open = "", encoding = getOption("encoding"))
As I read the documentation, my call matches that signature:
description is the path to the zip file, stored in tf as c://...//345du.zip
filename is summarySCC_PM25.rds, the file to be extracted from tf
open is left at its default, so blank should be fine
encoding specifies the character encoding.
With the code above, I receive "Error: unknown input format" from R 3.1.1. I need clarification on what might be happening, since I understand my code to be equivalent to:
NEI <- readRDS("summarySCC_PM25.rds")
Am I misinterpreting this?

I found your data online so that I could read your file. It was available from here:
https://www.linkedin.com/today/post/article/20140617173447-5576436-explore-n-analyze-data-assignment-2
> unzip("C:\\Users\\jmiller\\Downloads\\exdata_data_NEI_data.zip")
> NEI <- readRDS("summarySCC_PM25.rds")
> dim(NEI)
[1] 6497651 6
> colnames(NEI)
[1] "fips" "SCC" "Pollutant" "Emissions" "type" "year"

Avoid unz() and use unzip() (with an index) instead, since the temp file is a moving target.
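The unzip() route can be sketched as a small helper that extracts one member of the archive into a scratch directory and reads it from there. The function name read_rds_from_zip is hypothetical; tf is the zip path from the question.

```r
# Hypothetical helper: extract a single .rds member from a zip archive and read it
read_rds_from_zip <- function(zip_path, member) {
  exdir <- tempfile("unz_")
  dir.create(exdir)
  on.exit(unlink(exdir, recursive = TRUE))  # remove the scratch directory afterwards
  extracted <- unzip(zip_path, files = member, exdir = exdir)
  readRDS(extracted[1])  # readRDS() on a file path handles the compression itself
}

# Usage with the names from the question (not run here):
# NEI <- read_rds_from_zip(tf, "summarySCC_PM25.rds")
```

Calling unzip(zip_path, list = TRUE) first returns a table of the archive's contents, which helps confirm the exact member name.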
