I am connecting to a local SQL Server database and running some queries in a loop. The output from each iteration is saved as an RDS data file on the local disk.
This works fine when I execute it sequentially in a standard for loop, as shown below.
for (i in 1:NROW(Employee_Df)) {
  # pull this employee's salary rows; sep = "'" wraps the name in single quotes
  df_results <- sqlQuery(myconn, paste("SELECT * FROM Salary_Df
    WHERE FullName like ", Employee_Df$FullName[i], "", sep = "'"))
  # write each result set to its own RDS file
  saveRDS(df_results,
          file = paste0("/Home/Desktop/", "Salary", Employee_Df$FullName[i], ".rds"),
          compress = TRUE)
}
When I try to assign the output from the same query using foreach, the code fails. I don't see any error messages, but no valid RDS files are saved with the results of this query.
foreach(i = 1:2, .packages = "RODBC") %dopar% {
  df_results <- sqlQuery(myconn, paste("SELECT * FROM Salary_Df
    WHERE FullName like ", Employee_Df$FullName[i], "", sep = "'"))
  saveRDS(df_results,
          file = paste0("/Home/Desktop/", "Salary", Employee_Df$FullName[i], ".rds"),
          compress = TRUE)
}
Any suggestions on how to make this work in foreach would be much appreciated.
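One likely cause, sketched here as a guess rather than a confirmed fix: an RODBC channel is an external pointer, and external pointers cannot be exported to parallel workers, so myconn is not valid inside %dopar%. The usual workaround is to open a connection per worker (here assuming a placeholder DSN named "my_dsn"):
foreach(i = 1:2, .packages = "RODBC") %dopar% {
  # each worker opens and closes its own channel; "my_dsn" is a placeholder
  con_i <- odbcConnect("my_dsn")
  df_results <- sqlQuery(con_i, paste("SELECT * FROM Salary_Df
    WHERE FullName like ", Employee_Df$FullName[i], "", sep = "'"))
  saveRDS(df_results,
          file = paste0("/Home/Desktop/", "Salary", Employee_Df$FullName[i], ".rds"),
          compress = TRUE)
  odbcClose(con_i)
}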
I want to write data to a table called [foo/bar] inside of a SQLite database created through RSQLite. I wrote the following function to save answers to disk.
save_to_disk = function(data) {
  # assumes library(RSQLite) is loaded
  con = dbConnect(SQLite(), "responses.db")
  path = "[foo/bar]"
  message("Writing the data now!")
  message("The database file size was: ", file.size("responses.db"))
  dbWriteTable(con, path, data, append = TRUE)
  message("Table append complete!")
  message("The database file size is now: ", file.size("responses.db"))
  dbDisconnect(con)
}
However, when I pass data to this function, I see:
Writing the data now!
The database file size was: 8192
Table append complete!
The database file size is now: 8192
If I change the table name to dummy and repeat the process, then instead I see:
Writing the data now!
The database file size was: 8192
Table append complete!
The database file size is now: 12288
It seems as though RSQLite doesn't like my table name for some reason. My understanding was that this was a valid table name since it was wrapped in [...]. In fact, outside of that function I am perfectly able to write to such a table. In other words, this works from the REPL:
test = data.frame(submitted = as.integer(Sys.time()),
                  respondent = "Alice",
                  subject = "Alice",
                  question = "name",
                  part = "first",
                  order = 1L,
                  answer = "Alice")
dbWriteTable(con, "[foo/bar]", test, append = TRUE)
After that I can use dbReadTable to see what I had just entered. Why doesn't it work the same way in a function?
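A hedged side note: the file size reported by file.size() is a weak indicator of whether the append succeeded, because SQLite grows its database file in whole pages; a small append to a table that already has free space on an allocated page may leave the size unchanged, while a new table (like dummy) needs a fresh page. Counting the rows directly is a more reliable check:
library(RSQLite)
con <- dbConnect(SQLite(), "responses.db")
# compare the row count before and after calling save_to_disk()
nrow(dbReadTable(con, "[foo/bar]"))
dbDisconnect(con)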
I want to programmatically source all .R files contained in a character vector retrieved with the Sys.glob() function.
This is the code I wrote:
# fetch the different ETL parts
parts <- Sys.glob("scratch/*.R")
if (length(parts) > 0) {
  for (part in parts) {
    # source the ETL part
    source(part)
    # rest of code goes here
    # ...
  }
} else {
  stop("no ETL parts found (no data to process)")
}
The problem is that this doesn't work; I get the following error:
simpleError in source(part): scratch/foo.bar.com-https.R:4:151: unexpected string constant
I've tried different combinations for the source() function like the following:
source(sprintf("./%s", part))
source(toString(part))
source(file = part)
source(file = sprintf("./%s", part))
source(file = toString(part))
No luck. Since I'm globbing the contents of a directory, I need to tell R to source those files programmatically. As it's a custom-tailored ETL (extract, transform and load) script, I could write the calls manually:
source("scratch/foo.bar.com-https.R")
source("scratch/bar.bar.com-https.R")
source("scratch/baz.bar.com-https.R")
But that's dirty, and right now there are 3 different extraction patterns. There could be 8, 80 or even 2000 different patterns, so writing them by hand is not an option.
How can I do this?
Try getting the list of files with dir and then using lapply:
For example, if your files are of the form t1.R, t2.R, etc., and are inside the path "StackOverflow" do:
d <- dir(pattern = "^t\\d\\.R$", path = "StackOverflow/", recursive = TRUE, full.names = TRUE)
m <- lapply(d, source)
The option recursive = TRUE will search all subdirectories, and full.names = TRUE will prepend the path to the filenames.
If you still want to use Sys.glob(), this works too:
d <- Sys.glob(paths = "StackOverflow/t*.R")
m <- lapply(d, source)
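As a side note on the original error: "unexpected string constant" at scratch/foo.bar.com-https.R:4:151 is a parse error raised from inside that file itself, not from how source() is called. A minimal sketch, using only base R's parse(), to locate such errors without executing any code:
for (part in Sys.glob("scratch/*.R")) {
  # parse() reports syntax errors with file, line and column
  tryCatch(parse(file = part),
           error = function(e) message("Syntax error in ", part, ": ",
                                       conditionMessage(e)))
}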
I am running some code and it is really important for me to catch any errors and save them for later, but not include them in the final result of the foreach. I have used tryCatch and even tried forcing an error using stop. Here is a snippet of my code:
## options: namely stop, remove or pass
error_handle <- "remove"
cores <- round(detectCores() * percent)
cl <- makeCluster(cores)
registerDoParallel(cl)
predict_load_all <- foreach(i = 1:length(id), .export = func,
                            .packages = (.packages()),
                            .errorhandling = error_handle) %dopar% {
  possibleError <- tryCatch({
    weather_cast <- data.frame(udata$date, j, coeff_i,
                               predict(hour_fits[[j]], newdata = udata))
  }, error = function(e) return(paste0("The hour '", j, "'",
                                       " caused the error: '", e, "'")))
  if (!exists("weather_cast")) {
    # possibleError <- data.frame('coeff_item' = coeff_i, 'Error' = possibleError)
    possibleError <- data.frame('Error' = possibleError)
    write_csv(possibleError,
              file.path(path_predict, 'Error_weather_cast.csv'),
              append = TRUE)
    stop('error')
  }
  colnames(weather_cast) <- c("Date", "Hour", "coeff_item", "Predicted_Load")
  ifelse(j == 1,
         predict_load <- weather_cast,
         predict_load <- rbind(predict_load, weather_cast))
  predict_load <- spread(predict_load, Hour, Predicted_Load)
  predict_load
}
I am running the foreach to produce predict_load_all. possibleError is the error which needs to be saved; it is captured by the tryCatch. When the prediction fails, weather_cast is never created, so the exists() check fails, the error is written to CSV, and stop() then induces an error which the "remove" .errorhandling option uses to drop that iteration from the foreach result. This way I should get the errors on disk and a result list without them.
This doesn't seem to be saving an error file.
Any ideas?
In my experience (and I have a lot more "learning experiences" with parallel processing than actual success stories, unfortunately), this is not an easy task. The best solution I found was to add outfile = '' to the makeCluster command. For me, in a non-interactive mode, this caused (some of?) the error messages to get written to the output file instead of being discarded completely.
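A minimal sketch of that setup (the worker count here is arbitrary):
library(doParallel)
# outfile = "" redirects the workers' stdout/stderr, including error
# messages, to the master session's console or batch log instead of
# discarding it, which makes worker-side failures visible
cl <- makeCluster(4, outfile = "")
registerDoParallel(cl)
# ... run the foreach(...) %dopar% { ... } loop as above ...
stopCluster(cl)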
I'm trying to catalog the structure of an MSSQL 2008 R2 database using R/RODBC. I have set up a DSN and connected via R, and used the sqlTables() command, but this only returns the 'system databases' info.
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlTables(conn1)
However if I do this:
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlQuery(conn1, 'USE my_db_1')
sqlTables(conn1)
I get the tables associated with the my_db_1 database. Is there a way to see all of the databases and tables without manually typing in a separate USE statement for each?
There may or may not be a more idiomatic way to do this directly in SQL, but we can piece together a data set of all tables from all databases (a bit more programmatically than repeated USE xyz; statements) by getting a list of databases from master..sysdatabases and passing each one as the catalog argument to sqlTables, e.g.
library(RODBC)
library(DBI)
##
tcon <- RODBC::odbcConnect(
  dsn = "my_dsn",
  uid = "my_uid",
  pwd = "my_pwd"
)
##
db_list <- RODBC::sqlQuery(
  channel = tcon,
  query = "SELECT name FROM master..sysdatabases")
##
RODBC::sqlTables(
  channel = tcon,
  catalog = db_list[14, 1]
)
(I can't show any of the output for confidentiality reasons, but it produces the correct results.) Of course, in your case you probably want to do something like
all_metadata <- lapply(db_list$name, function(DB) {
  RODBC::sqlTables(
    channel = tcon,
    catalog = DB
  )
})
# or some more efficient variant of data.table::rbindlist...
meta_df <- do.call("rbind", all_metadata)
This is in continuation of my related question: Error when trying to interactively load data file saved by paused batch script. I decided to present my question with a reproducible example separately, to avoid making the already large description in the previous question even bigger. In the following reproducible example, I expect to retrieve the value of the stored object ("Important data"), but instead, as you see, I retrieve the name of the object itself ("sf.data.devLinks"). I suspected that it could be because of my use of as.name, but I additionally tested a primitive example in an interactive session and as.name worked fine. I have also, as you see, tried using eval and substitute, but that didn't help.
library(RCurl)
info <- "Important data"
ATTR <- "SQL"
request <- "SELECT info FROM topSecret"
dataName <- "sf.data.devLinks"
rdataFile <- "/tmp/testAttr.rds"
save <- TRUE
getData <- function() {
  return(info)
}
requestDigest <- base64(request)
# check if the archive file has already been processed
message("\nProcessing request \"", request, "\" ...\n")
# read back the object with the attribute
if (file.exists(rdataFile)) {
  # now check if request's SQL query hasn't been modified
  data <- readRDS(rdataFile)
  message("Retrieved object '", data, "', containing:\n")
  message(str(data))
  requestAttrib <- attr(data, ATTR, exact = TRUE)
  if (is.null(requestAttrib)) {
    message("Object '", data, "' doesn't have attribute \"",
            ATTR, "\"\n")
  }
  else {
    message("Object '", data, "' contains attribute ", ATTR, ":\n\"",
            base64(requestAttrib), "\"\n")
    if (identical(requestDigest, requestAttrib)) {
      message("Processing skipped: RDS file is up-to-date.\n")
      save <- FALSE
      return
    }
  }
  rm(data)
}
if (save) {
  message("Saving results of request \"",
          request, "\" as R data object ...\n")
  assign(dataName, getData())
  data <- as.name(dataName)
  #eval(substitute(assign(dataName, getData()),
  #                list(data <- as.name(dataName))))
  # save hash of the request's SQL query as data object's attribute,
  # so that we can detect when configuration contains modified query
  attr(data, ATTR) <- base64(request)
  # save current data frame to RDS file
  saveRDS(data, rdataFile)
}
Please note that testing this code requires running it twice (first run to store the object, second to retrieve it).
The problem is in your use of as.name, not in the code to save the object. This works perfectly fine:
data <- 1:10
object.name <- 'data.name'
query <- 'SELECT * FROM TABLE'
file <- tempfile()
assign(object.name, structure(data, SQL = query))
saveRDS(get(object.name), file)
read.object <- readRDS(file)
identical(read.object, get(object.name))
You're creating a name object and assigning attributes to it, but you're expecting the data to be there. It won't be; a symbol is just a pointer to the value. You need to use eval() or something similar to get the value from the symbol.
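A small illustration of that distinction, with throwaway names:
x <- 1:10
s <- as.name("x")             # 's' is a symbol (a name), not the data held in 'x'
print(s)                      # prints the name itself: x
eval(s)                       # evaluating the symbol looks up the value: 1 2 ... 10
identical(eval(s), get("x"))  # TRUE; get() is an equivalent lookup by name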
Finally, I was able to return to this question. I have figured out the correct solution, so I'm answering my own question. The answer here is based on my reproducible example, but I have made corresponding changes in my more complex real-world R code. The solution is rather simple, but twofold, as follows.
Replace the original code data <- readRDS(rdataFile) with assign(dataName, readRDS(rdataFile)).
Replace the original code as.name(dataName) with get(dataName). An alternative to get() here is eval(parse(text=dataName)), which IMHO is more cumbersome.
Below I provide the solution's full source code, based on the original reproducible example. I don't provide the script's output, which is easy to reproduce (remember to run the script at least twice). Again, thanks to everyone who helped with this question.
library(RCurl)
info <- "Important data"
ATTR <- "SQL"
request <- "SELECT info FROM topSecret"
dataName <- "sf.data.devLinks"
rdataFile <- "/tmp/testAttr.rds"
save <- TRUE
getData <- function() {
  return(info)
}
requestDigest <- base64(request)
# check if the archive file has already been processed
message("\nProcessing request \"", request, "\" ...\n")
# read back the object with the attribute
if (file.exists(rdataFile)) {
  # now check if request's SQL query hasn't been modified
  assign(dataName, readRDS(rdataFile))
  message("Retrieved object '", dataName, "', containing:\n")
  message(str(get(dataName)))
  requestAttrib <- attr(get(dataName), ATTR, exact = TRUE)
  if (is.null(requestAttrib)) {
    message("Object '", dataName, "' doesn't have attribute \"",
            ATTR, "\"\n")
  }
  else {
    message("Object '", dataName, "' contains attribute \"",
            ATTR, "\":\n\"", base64(requestAttrib), "\"\n")
    if (identical(requestDigest, requestAttrib)) {
      message("Processing skipped: RDS file is up-to-date.\n")
      save <- FALSE
      return
    }
  }
}
if (save) {
  message("Saving results of request \"",
          request, "\" as R data object ...\n")
  assign(dataName, getData())
  message(str(dataName))
  data <- get(dataName)
  # alternative to using get(), but more cumbersome:
  # data <- eval(parse(text=dataName))
  # save hash of the request's SQL query as data object's attribute,
  # so that we can detect when configuration contains modified query
  attr(data, ATTR) <- base64(request)
  message(str(data))
  # save current data frame to RDS file
  saveRDS(data, rdataFile)
}