Read an RDS file through a Windows shortcut in R

I've created a shortcut on Windows which points to a RDS file. Is there a way to 'read' the shortcut and load the file?
I've tried the following command, but it failed.
readRDS("...") # with correct location

You could use the R.utils package and its readWindowsShortcut() function (see the CRAN link).
This is a two-step procedure, where:
Using readWindowsShortcut() one converts the .lnk file into a list
One extracts the relativePath element from that list and passes it to readRDS()
Here is a working example with a step-by-step explanation:
data <- data.frame(a = c(1, 2, 3, 4),
                   b = c("a", "b", "c", "d"))
# Save to disk (in working directory)
saveRDS(data, file = "data.Rds")
##
# Create a Windows shortcut (`.lnk`) manually
##
# Load (install if necessary) library
library(R.utils)
# Read content of .lnk and store it to an object
path <- readWindowsShortcut("data.Rds - Shortcut.lnk")
# See what's inside `path` object
summary(path)
                 Length Class  Mode
header           12     -none- list
fileLocationInfo  4     -none- list
relativePath      1     -none- character
workingDirectory  1     -none- character
relativePathname  1     -none- character
pathname          1     -none- character
# Read RDS from `relativePath`
readRDS(path$relativePath)
a b
1 1 a
2 2 b
3 3 c
4 4 d
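If you need to do this often, the two steps can be wrapped into a small helper. A minimal sketch (the name read_rds_shortcut is mine, not part of R.utils; it assumes the shortcut resolves to a readable .Rds file):
# Hypothetical helper: resolve a Windows .lnk shortcut, then readRDS the target
read_rds_shortcut <- function(lnk_path) {
  lnk <- R.utils::readWindowsShortcut(lnk_path)
  # Prefer the absolute pathname when available, fall back to relativePath
  target <- if (!is.null(lnk$pathname)) lnk$pathname else lnk$relativePath
  readRDS(target)
}
data2 <- read_rds_shortcut("data.Rds - Shortcut.lnk")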

Related

Error reading parquet file using sparklyr library in R

I am trying to read a parquet file from the Databricks FileStore:
library(sparklyr)
# parquet_dir has been pre-defined
parquet_dir <- '/dbfs/FileStore/test/flc_next.parquet'
# List the files in the parquet dir
filenames <- dir(parquet_dir, full.names = TRUE)
[1] "/dbfs/FileStore/test/flc_next.parquet/_committed_6244562942368589642"
[2] "/dbfs/FileStore/test/flc_next.parquet/_started_6244562942368589642"
[3] "/dbfs/FileStore/test/flc_next.parquet/_SUCCESS"
[4] "/dbfs/FileStore/test/flc_next.parquet/part-00000-tid-6244562942368589642-0edceedf-7157-4cce-a084-0f2a4a6769e6-925-1-c000.snappy.parquet"
# Show the filenames and their sizes
data_frame(
  filename = basename(filenames),
  size_bytes = file.size(filenames)
)
Warning: `data_frame()` was deprecated in tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
# A tibble: 4 × 2
filename size_bytes
<chr> <dbl>
1 _committed_6244562942368589642 124
2 _started_6244562942368589642 0
3 _SUCCESS 0
4 part-00000-tid-6244562942368589642-0edceedf-7157-4cce-a084-0f2a4a6… 248643
# Import the data into Spark
timbre_tbl <- spark_read_parquet("flc_next.parquet", parquet_dir)
Error : $ operator is invalid for atomic vectors
I would appreciate any help/suggestion
Thanks in advance
The first argument of spark_read_parquet expects a Spark connection; see sparklyr::spark_connect. If you are running the code in Databricks, then this should work:
sc <- spark_connect(method = "databricks")
timbre_tbl <- spark_read_parquet(sc, "flc_next.parquet", parquet_dir)
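To sanity-check the result after connecting, something like this should work (the table name "timbre" is an arbitrary choice here, and collect() comes from dplyr):
library(dplyr)
timbre_tbl <- spark_read_parquet(sc, name = "timbre", path = parquet_dir)
head(timbre_tbl)                   # lazy preview, executed in Spark
# local_df <- collect(timbre_tbl)  # pull into local R memory only if it fits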

How to loop through URLs in a column using download.file()

I have this df from which I need to download all file urls:
library(RCurl)
view(df)
Date column_1
<chr> <chr>
1 5/1/21 https://this.is.url_one.tar.gz
2 5/2/12 https://this.is.url_two.tar.gz
3 7/3/19 https://this.is.url_three.tar.gz
4 8/3/13 https://this.is.url_four.tar.gz
5 10/1/17 https://this.is.url_five.tar.gz
6 12/12/10 https://this.is.url_six.tar.gz
7 9/9/16 https://this.is.url_seven.tar.gz
8 4/27/20 https://this.is.url_eight.tar.gz
9 7/20/15 https://this.is.url_nine.tar.gz
10 8/30/19 https://this.is.url_ten.tar.gz
# … with 30 more rows
Of course I do not want to type download.file(url='https://this.is.url_number.tar.gz', destfile='files.tar.gz', method='curl') 40 times, once for each URL. How can I loop over all URLs in column_1 using download.file()?
Here is one way in a for loop:
for (i in seq_len(nrow(df))) {
  download.file(url = df$column_1[i],
                # number the output files, since df has no ID column
                destfile = paste0('files', i, '.tar.gz'),
                method = 'curl')
}
You can use Map -
Map(download.file, df$column_1, sprintf('file%d.tar.gz', seq(nrow(df))))
where sprintf is used to create filenames to save the file.
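If some of the URLs might be dead, wrapping the call in tryCatch keeps the loop going instead of aborting at the first failure. A sketch under the same column names as the question:
ok <- vapply(seq_len(nrow(df)), function(i) {
  tryCatch({
    download.file(url = df$column_1[i],
                  destfile = sprintf('file%02d.tar.gz', i),
                  method = 'curl')
    TRUE
  }, error = function(e) {
    message('Failed: ', df$column_1[i], ' (', conditionMessage(e), ')')
    FALSE
  })
}, logical(1))
sum(ok)  # number of successful downloads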

Need to use jsonlite to handle ndjson message list using stream_in() and stream_out()

I have an ndjson data source. For a simple example, consider a text file with three lines, each containing a valid json message. I want to extract 7 variables from the messages and put them in a dataframe.
Please use the following sample data in a text file. You can paste this data into a text editor and save it as "ndjson_sample.txt"
{"ts":"1","ct":"{\"Var1\":6,\"Var2\":6,\"Var3\":-70,\"Var4\":12353,\"Var5\":1,\"Var6\":\"abc\",\"Var7\":\"x\"}"}
{"ts":"2","ct":"{\"Var1\":6,\"Var2\":6,\"Var3\":-68,\"Var4\":4528,\"Var5\":1,\"Var6\":\"def\",\"Var7\":\"y\"}"}
{"ts":"3","ct":"{\"Var1\":6,\"Var2\":6,\"Var3\":-70,\"Var4\":-5409,\"Var5\":1,\"Var6\":\"ghi\",\"Var7\":\"z\"}"}
The following three lines of code accomplish what I want to do:
file1 <- "ndjson_sample.txt"
json_data1 <- ndjson::stream_in(file1)
raw_df_temp1 <- as.data.frame(ndjson::flatten(json_data1$ct))
For reasons I won't get into, I cannot use the ndjson package. I must find a way to use the jsonlite package to do the same thing using the stream_in() and stream_out() functions. Here's what I tried:
con_in1 <- file(file1, open = "rt")
con_out1 <- file(tmp <- tempfile(), open = "wt")
callback_func <- function(df){
  jsonlite::stream_out(df, con_out1, pagesize = 1)
}
jsonlite::stream_in(con_in1, handler = callback_func, pagesize = 1)
close(con_out1)
con_in2 <- file(tmp, open = "rt")
raw_df_temp2 <- jsonlite::stream_in(con_in2)
This is not giving me the same data frame as a final output. Can you tell me what I'm doing wrong and what I have to change to make raw_df_temp1 equal raw_df_temp2?
I could potentially solve this with fromJSON() operating on each line of the file, but I'd like to find a way to do it with the stream functions. The files I will be dealing with are quite large, so efficiency will be key. I need this to be as fast as possible.
Thank you in advance.
Currently under ct you'll find a string that can (subsequently) be fed to fromJSON independently, but it will not be parsed as such. Ignoring your stream_out(stream_in(...),...) test, here are a couple of ways to read it in:
library(jsonlite)
json <- stream_in(file('ds_guy.ndjson'), simplifyDataFrame=FALSE)
# opening file input connection.
# Imported 3 records. Simplifying...
# closing file input connection.
cbind(
  ts = sapply(json, `[[`, "ts"),
  do.call(rbind.data.frame, lapply(json, function(a) fromJSON(a$ct)))
)
# ts Var1 Var2 Var3 Var4 Var5 Var6 Var7
# 1 1 6 6 -70 12353 1 abc x
# 2 2 6 6 -68 4528 1 def y
# 3 3 6 6 -70 -5409 1 ghi z
Calling fromJSON on each string might be cumbersome, and with larger data this slow-down is why there is stream_in, so if we can capture the "ct" component into a stream of its own, then ...
writeLines(sapply(json, `[[`, "ct"), 'ds_guy2.ndjson')
(There are far-more-efficient ways to do this with non-R tools, including perhaps a simple
sed -e 's/.*"ct":"\({.*\}\)"}$/\1/g' -e 's/\\"/"/g' ds_guy.ndjson > ds_guy.ndjson2
though this makes a few assumptions about the data that may not be perfectly safe. A better solution would be to use jq, which should "always" parse proper json correctly; its -r flag emits the decoded string, so no follow-up sed is needed:
jq -r '.ct' ds_guy.ndjson > ds_guy2.ndjson
and you can do that with system(...) in R if needed.)
From there, under the assumption that each line will contain exactly one row of data.frame data:
json2 <- stream_in(file('ds_guy2.ndjson'), simplifyDataFrame=TRUE)
# opening file input connection.
# Imported 3 records. Simplifying...
# closing file input connection.
cbind(ts=sapply(json, `[[`, "ts"), json2)
# ts Var1 Var2 Var3 Var4 Var5 Var6 Var7
# 1 1 6 6 -70 12353 1 abc x
# 2 2 6 6 -68 4528 1 def y
# 3 3 6 6 -70 -5409 1 ghi z
NB: in the first example, "ts" is a factor, all others are character because that's what fromJSON gives. In the second example, all strings are factor. This can easily be addressed through judicious use of stringsAsFactors=FALSE, depending on your needs.
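If writing a second file to disk is undesirable, the embedded "ct" strings can also be parsed inside a stream_in() handler, accumulating rows chunk by chunk. A rough one-pass sketch (using the sample file name from the question; not benchmarked against the two-step approach):
library(jsonlite)
rows <- list()
stream_in(file("ndjson_sample.txt"), pagesize = 1000,
          handler = function(df) {
            # df$ct is a character vector of embedded JSON strings
            chunk <- do.call(rbind.data.frame, lapply(df$ct, fromJSON))
            chunk$ts <- df$ts
            rows[[length(rows) + 1]] <<- chunk
          })
raw_df <- do.call(rbind, rows)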

Saving object attributes using a function

I'm trying to modify the save() function so that the script from which the object originated is stored as an attribute of the object.
s = function(object, filepath, original.script.name){
  # modified save() function
  # stores the name of the script from which the object originates as an attribute, then saves as normal
  attr(object, "original.script") = original.script.name
  save(object, file = filepath)
}
Sample:
testob = 1:10
testob
# [1] 1 2 3 4 5 6 7 8 9 10
s(testob, filepath = "rotation1scripts_v4/saved.objects/testob", "this.is.the.name")
load(file = "rotation1scripts_v4/saved.objects/testob")
testob
# [1] 1 2 3 4 5 6 7 8 9 10
attributes(testob)
# NULL
Investigating further, it seems that the object is not being loaded into the environment:
testob2 = 1:5
testob2
# [1] 1 2 3 4 5
s(testob2, "rotation1scripts_v4/saved.objects/testob2", "this.is.the.name")
rm(testob2)
load(file = "rotation1scripts_v4/saved.objects/testob2")
testob2
# Error: object 'testob2' not found
Why isn't it working?
You need to be careful with save(). It saves variables under the same names that are passed to save(), so when you call save(object, ...), it saves the variable as "object" and not as "testob", which you seem to be expecting. You can do some non-standard environment shuffling to make this work, though. Try
s <- function(object, filepath, original.script.name){
  objectname <- deparse(substitute(object))
  attr(object, "original.script") = original.script.name
  save_envir <- new.env()
  save_envir[[objectname]] <- object
  save(list = objectname, file = filepath, envir = save_envir)
}
We use deparse(substitute()) to get the name of the variable passed to the function. Then we create a new environment in which we can create an object with that same name, so we can use that name when actually saving the object.
This appears to work if we test with
testob <- 1:10
s(testob, filepath = "test.rdata", "this.is.the.name")
rm(testob)
load(file = "test.rdata")
testob
# [1] 1 2 3 4 5 6 7 8 9 10
# attr(,"original.script")
# [1] "this.is.the.name"

Missing `parse` information inside vignette build

Goal
The goal is to create a package that parses R scripts and lists the functions they use (from the package itself, as mvbutils does, but also from imports).
Function
The main function relies on parsing an R script with
d<-getParseData(x = parse(text = deparse(x)))
Reproducible code
For example, in an interactive R session, the output of
x<-test<-function(x){x+1}
d<-getParseData(x = parse(text = deparse(x)))
has the following first few lines:
line1 col1 line2 col2 id parent token terminal text
23 1 1 4 1 23 0 expr FALSE
1 1 1 1 8 1 23 FUNCTION TRUE function
2 1 10 1 10 2 23 '(' TRUE (
3 1 11 1 11 3 23 SYMBOL_FORMALS TRUE x
4 1 12 1 12 4 23 ')' TRUE )
Error
When building a vignette with knitr that contains this chunk (either with Knit HTML from RStudio or with devtools::build_vignettes), the output of the previous chunk of code is NULL. On the other hand, using knitr::knit() inside an interactive R session gives the correct output.
Questions:
Is there a reason for the parser to behave differently inside the knit function/environment, and is there a way to bypass this?
Update
Changing the code to:
x<-test<-function(x){x+1}
d <- getParseData(x = parse(text = deparse(x), keep.source = TRUE))
fixes the issue, but this does not answer the question of why the same function behaves differently.
From the help page ?options:
keep.source:
When TRUE, the source code for functions (newly defined or loaded) is stored internally allowing comments to be kept in the right places. Retrieve the source by printing or using deparse(fn, control = "useSource").
The default is interactive(), i.e., TRUE for interactive use.
When building the vignette, you are running a non-interactive R session, so the source code is discarded in parse().
parse(file = "", n = NULL, text = NULL, prompt = "?",
keep.source = getOption("keep.source"), srcfile,
encoding = "unknown")
