Cannot save tables with get() in R

I scan through an SQL database and load every table over ODBC, then want to store each one in a file whose name matches the table name. I compose the filename with paste(path, variablename, Sys.Date(), sep = "_"). I also want to distinguish the data in R by storing each table's value in a variable of the same name as the corresponding SQL table. I achieve this by loading the data into a temporary variable and then assigning its contents to a variable whose name is stored in variablename, using assign(variablename, temporarytable).
I would like to save such an R variable with the save() function, but I need to refer to it by the name stored in the variablename variable. get(variablename) returns its contents just fine, but save(get(variablename), file = paste(..., variablename, ...)) fails with an error that the object ‘get(variablename)’ cannot be found.
What is the problem with get() inside save()? How can I save the variable's contents in this situation?
PS
I scan through the SQL database tables with a for loop. The variablename variable stores the SQL table name in each iteration, and assign(variablename, temporarytable) loads the data into an object of the required name.

It can be solved like this:
save(list = variablename, file = paste(...,variablename,...))
But I would still itch to know why save(get(variablename), ...) does not work.
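The reason, for the curious: save() does not evaluate the objects passed through ...; it captures their literal names (roughly as.character(substitute(list(...)))), so it goes looking for an object literally called get(variablename). The list argument exists for exactly this case, since it takes character names. A minimal sketch:
x <- 1:5
nm <- "x"
## save(get(nm), file = "x.RData")  # fails: looks for an object named "get(nm)"
save(list = nm, file = "x.RData")   # works: list= takes character names
load("x.RData")                     # restores the data under the name "x"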

Alternatively, you can copy the data to an intermediate object and save that:
temp <- get(variablename)
save(temp,file=file.path(...,variablename,...))
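One caveat with this approach: the object will re-inflate under the name temp when the file is loaded later, so the original table name is lost; save(list = variablename, ...) avoids that. For completeness, here is a hedged sketch of the whole loop described in the question (the RODBC calls and the DSN name are assumptions, not the asker's actual code):
library(RODBC)
con <- odbcConnect("mydsn")                       # hypothetical DSN
for (variablename in sqlTables(con)$TABLE_NAME) {
  temporarytable <- sqlFetch(con, variablename)   # pull one table
  assign(variablename, temporarytable)            # bind it under the table's name
  save(list = variablename,                       # save under that name too
       file = paste(path, variablename, Sys.Date(), sep = "_"))
}
close(con)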

Related

Load R data objects' attributes without loading object from file? [duplicate]

Here's the situation. My R code is supposed to check whether existing RData files in the application's cache are up to date. I do that by saving the files under names consisting of base64-encoded names of a specific data element. However, the data corresponding to each of these elements is retrieved by submitting a particular SQL query per element, all specified in the data collection's configuration file. So, when data for an element has already been retrieved but I afterwards had to change that element's SQL query, the data does not get updated.
In order to handle this situation, I decided to use R objects' attributes. I plan to save each data object's corresponding SQL query (request) - base64-encoded - as the object's attribute:
# save hash of the request's SQL query as data object's attribute,
# so that we can detect when configuration contains modified query
attr(data, "SQL") <- base64(request)
Then, when I need to verify whether the SQL query has been changed, I'd like to simply retrieve the object's corresponding attribute and compare it with the hash of the current SQL query. If they match, the query hasn't changed and I skip processing this data request; if they don't match, the query has changed and I go ahead with processing the request:
# check if the archive file has already been processed
if (DEBUG) {message("Processing request \"", request, "\" ...")}
if (file.exists(rdataFile)) {
  # now check if request's SQL query hasn't been modified
  data <- load(rdataFile)
  if (identical(base64(request), attr(data, "SQL"))) {
    skipped <<- skipped + 1
    if (DEBUG) {message("Processing skipped: .Rdata file found.\n")}
    return (invisible())
  }
  rm(data)
}
My question is whether it's possible to read or access an object's attributes without fully loading the object from the file. In other words, can I avoid the load() and rm() in the code above?
Your advice is much appreciated!
UPDATE: An additional question: what's wrong with my code, given that it performs processing even when it shouldn't - that is, when all information is up to date (no changes in the cache or in the configuration file)?
UPDATE 2 (additional code per @MrFlick's answer):
# construct name from data source prefix and data ID (see config. file),
# so that corresponding data object (usually, data frame) will be saved
# later under that name via save()
dataName <- paste(dsPrefix, "data", indicator, sep = ".")
assign(dataName, srdaGetData())
data <- as.name(dataName)
# save hash of the request's SQL query as data object's attribute,
# so that we can detect when configuration contains modified query
attr(data, "SQL") <- base64(request)
# save current data frame to RData file
save(list = dataName, file = rdataFile)
# alternatively, use do.call() as in "getFLOSSmoleDataXML.R"
# clean up
rm(data)
You can't "really" do it, but you could modify the code in my cgwtools::lsdata function.
function (fnam = ".Rdata")
{
    x <- load(fnam, envir = environment())
    return(x)
}
This loads the file, thus taking time and briefly taking memory, and then the local environment disappears. So, add an argument for the items whose attributes you want to check, and add a line inside the function that does y <- attributes(your_items); return(list(x = x, y = y)).
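A hedged sketch of that modification (the name lsattrs and the mget() generalization to several items are mine, not the package's):
lsattrs <- function (fnam = ".Rdata", items)
{
    x <- load(fnam, envir = environment())        # load into the local environment
    y <- lapply(mget(items, envir = environment()), attributes)
    return(list(x = x, y = y))                    # object names plus their attributes
}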
And there is a problem with the way you are using load(). With save/load you can "freeze-dry" multiple objects into an .RData file, and they "re-inflate" into the current environment. As a result, load() does not return the object(s); it returns a character vector with the names of all the objects that it restored. Since you didn't supply your save() code, I'm not sure what's actually in your load file, but if it was a variable called data, then just call
load(rdataFile)
not
data <- load(rdataFile)
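That also answers the UPDATE: in the posted code, data is that character vector of names, so attr(data, "SQL") is NULL, identical() is FALSE, and the request is reprocessed every time. (Note too that in UPDATE 2 the attribute is set on the result of as.name(dataName), not on the data frame saved via save(list = dataName, ...), so the saved object never carries it.) A sketch of a corrected check, assuming the file holds a single object:
if (file.exists(rdataFile)) {
  objName <- load(rdataFile)          # the name(s) of the restored object(s)
  obj <- get(objName[1])              # fetch the object itself
  if (identical(base64(request), attr(obj, "SQL"))) {
    skipped <<- skipped + 1
    return(invisible())
  }
}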

Create database in Azure Databricks using R dataframe

I have my final output in R dataframe. I need to write this output to a database in Azure Databricks. Can someone help me with the syntax? I used this code:
require(SparkR)
data1 <- createDataFrame(output)
write.df(data1, path="dbfs:/datainput/sample_dataset.parquet",
source="parquet", mode="overwrite")
This code runs without error, but I don't see the database in the datainput folder (mentioned in the path). Is there some other way to do it?
I believe you are looking for the saveAsTable function. write.df only saves the data to the file system; it does not register the data as a table.
require(SparkR)
data1 <- createDataFrame(output)
saveAsTable(data1, tableName = "default.sample_table", source="parquet", mode="overwrite")
In the above code, default is an existing database name, under which a new table named sample_table will be created. If you specify sample_table instead of default.sample_table, it will be saved in the default database.
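To confirm the table was registered, you can query it back with SparkR's sql() (a sketch, using the table name from the example above):
result <- sql("SELECT * FROM default.sample_table")   # returns a SparkDataFrame
head(result)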

How to write an R Data Frame to a Snowflake database table

Does anyone know how to WRITE an R Data Frame to a new Snowflake database table? I have a successful Snowflake ODBC connection created in R, and can successfully query from Snowflake. The connection command is: conn <- DBI::dbConnect(odbc::odbc(), "Snowflake").
Now, I want to WRITE a data frame created in R back to Snowflake as a table. I used the following command: dbWriteTable(conn, "database.schema.tablename", R data frame name). Using this command successfully connects with Snowflake, but I get the following error message: "Error in new_result(connection@ptr, statement) : nanodbc/nanodbc.cpp:1344: 22000: Cannot perform CREATE TABLE. This session does not have a current database. Call 'USE DATABASE', or use a qualified name."
I am using a qualified database name in my "database.schema.tablename" argument in the dbWriteTable function. I don't see how to employ "USE DATABASE" in my R function. Any ideas?? Thank you!!
The API for DBI::dbWriteTable(…) requires passing either the literal table name as a string, or as a properly quoted identifier:
dbWriteTable(conn, name, value, ...)
conn: A DBIConnection object, as returned by dbConnect().
name: A character string specifying the unquoted DBMS table name, or the result of a call to dbQuoteIdentifier().
value: a data.frame (or coercible to data.frame).
dbWriteTable(conn, "database.schema.tablename", R data frame name)
Your code above will attempt to create a table literally named "database.schema.tablename", using the database and schema context associated with the connection object.
For example, if your connection had a database DB and schema SCH set, this would have succeeded in creating a table called DB.SCH."database.schema.tablename".
To define the database, schema and table names properly, use the DBI::Id class object with the right hierarchical order:
table_id <- Id(database = "database", schema = "schema", table = "tablename")
dbWriteTable(conn, table_id, df)   # df is your data frame
Behind the scenes, the DBI::dbWriteTable(…) function recognizes the DBI::Id class argument type for name, and converts it into a quoted identifier format via DBI::dbQuoteIdentifier(…) (as a convenience).
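Alternatively, since the error suggests calling 'USE DATABASE', you can set the session context explicitly before writing (a sketch; dbExecute is standard DBI, and the identifiers are placeholders):
DBI::dbExecute(conn, "USE DATABASE database")   # set the session database
DBI::dbExecute(conn, "USE SCHEMA schema")       # and the schema
dbWriteTable(conn, "tablename", df)             # an unqualified name now resolves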

Can convert a string to an object but can't save() it -- why? [duplicate]

I am repeatedly applying a function to read and process a bunch of csv files. Each time it runs, the function creates a data frame (this.csv.data) and uses save() to write it to an .RData file with a unique name. The problem is that later, when I read these .RData files using load(), the loaded variable names are not unique, because each one loads under the name this.csv.data.
I'd like to save them with unique tags so that they come out properly named when I load() them. I've created the following code to illustrate.
this.csv.data = list(data=c(1:9), unique_tag = "some_unique_tag")
assign(this.csv.data$unique_tag,this.csv.data$data)
# I want to save the data,
# with variable name of <unique_tag>,
# at a file named <unique_tag>.dat
saved_file_name <- paste(this.csv.data$unique_tag,"RData",sep=".")
save(get(this.csv.data$unique_tag), file = saved_file_name)
but the last line returns:
"Error in save(get(this_unique_tag), file = data_tag) :
object ‘get(this_unique_tag)’ not found"
even though the following returns the data just fine:
get(this.csv.data$unique_tag)
Just name the arguments you use. save() captures the literal names of the objects passed through ... rather than evaluating them, which is why the get(...) call is never resolved; the list argument takes character names instead. With your code the following works fine:
save(list = this.csv.data$unique_tag, file=saved_file_name)
My preference is to avoid the name baked into the RData file on load:
obj = local(get(load('myfile.RData')))
Inside local(), load() restores the object into the local frame and returns its name; get() then fetches the value. This way you can load various RData files and name the objects whatever you want, or store them in a list etc.
You really should use saveRDS/readRDS to serialize your objects.
save and load are for saving whole collections of named objects, such as entire environments.
saveRDS(this.csv.data, saved_file_name)
# later
mydata <- readRDS(saved_file_name)
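Applied to the example above, the unique tag lives in the file name and the variable name is chosen freely at load time (a sketch):
saveRDS(this.csv.data$data, paste0(this.csv.data$unique_tag, ".rds"))
# later, under any name you like:
my_data <- readRDS("some_unique_tag.rds")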
You can use the following to save the entire workspace at once:
save.image("myfile.RData")
This worked for me:
env <- new.env()
env[[varname]] <- object_to_save
save(list=c(varname), envir=env, file='out.Rda')
You could probably do it without a new env (but I didn't try this):
.GlobalEnv[[varname]] <- object_to_save
save(list=c(varname), envir=.GlobalEnv, file='out.Rda')
You might even be able to omit the envir argument, since save() defaults to the calling environment (the global environment at top level).
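For completeness, loading the file back restores the object under the name stored in varname (a sketch):
load('out.Rda')    # restores the object under the name that varname held
get(varname)       # retrieve it, if varname still holds that string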
