R file runs great in R studio, not VSCode - r

Quick backstory:
I've already written quite a few other R scripts in VSCode. I have the R extension, I've knitted .Rmd files, and so on.
In other words, unlike the solutions posted here, which dealt more with getting R to work in the first place (in Visual Studio at least), I've already got R working for most things within VSCode.
I have an R file (code below). When I open the file in RStudio, it works great! It creates a JDBC connection, queries a database using some SQL, creates a dataframe, etc.
When I close that file and then open it in VSCode, it runs MOST of the R code within it, but when it gets to the query, I get this SQL error:
"JDBC ERROR: ORA-00907: missing right parenthesis"
And it's the exact same file! I'd google what ORA-00907 means for SQL and how to fix it... but the code DOES work in RStudio.
One other thing I noticed is that the problem does NOT happen when I run the file as a whole from within VSCode, i.e.:
source("BlackBox.R")
If I do that, it steps through everything and saves out the results of the query as a .csv like I want it to. But if I OPEN the file and go through it line by line, or try to run the whole thing, it won't work.
Code below (with names changed to protect the innocent):
library(tidyverse)
Sys.setenv(JAVA_HOME="C:\\Program Files\\TIBCO\\Jaspersoft Studio-6.6.0\\features\\jre.win32.win32.x86_64.feature_1.8.0.u171\\jre")
options(java.parameters="-Xmx2g")
replacement <- function(category = "LC_ALL") {
if (identical(category, "LC_MESSAGES"))
return("")
category <- match(category, .LC.categories)
if (is.na(category))
stop("invalid 'category' argument")
.Internal(Sys.getlocale(category))
}
base <- asNamespace("base")
environment(replacement) <- base
unlockBinding("Sys.getlocale", base)
assign("Sys.getlocale", replacement, envir = base)
lockBinding("Sys.getlocale", base)
library(rJava)
rJava::.jinit()
library(RJDBC)
jdbcDriver <- JDBC(driverClass="oracle.jdbc.driver.OracleDriver", classPath="C:/Users/johnDoe/OneDrive - Company/Documents/Data/jar files/ojdbc8.jar")
jdbcConnection <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=website.com)(PORT=2879))(CONNECT_DATA=(SERVICE_NAME=pdblhs)))", "name", "password")
Subjects <- dbGetQuery(jdbcConnection,
"WITH usa AS (
SELECT subj_access.protocol_id
, subj_access.protocol_subject_id
FROM website.sv_user_pcl_permission priv_check
JOIN website.sv_user_pcs_access subj_access ON priv_check.protocol_id = subj_access.protocol_id AND priv_check.contact_id = subj_access.contact_id
WHERE priv_check.function_name = 'CRPT-Subject Visits'
AND priv_check.contact_id = '1234')
, subjects AS (
SELECT Protocol_id, Protocol_no, Protocol_subject_id,subject_no, sequence_number, status,status_date
FROM website.RV_SUBJECT_STATUS
)
, all_visit AS (
SELECT protocol_subject_id, visit_name, visit_date, planned_visit_date, visit_desc FROM website.rv_sub_calendar WHERE visit_status = 'Planned' and visit_date > ADD_MONTHS(SYSDATE, 0)
)
, max_visit AS (
SELECT protocol_subject_id, visit_name, visit_date, planned_visit_date, visit_desc FROM LHS_ONCORE_PROD.all_visit WHERE (visit_date, protocol_subject_id) IN (SELECT MIN(visit_date) visit_date, protocol_subject_id FROM all_visit GROUP BY protocol_subject_id)--
)
, followup AS (
SELECT protocol_subject_id, off_studydate, off_study_reason FROM website.rv_subject_follow_up
)
SELECT subjects.protocol_id
, subjects.protocol_no
, subjects.protocol_subject_id
, subjects.subject_no
, subjects.sequence_number
, subjects.status
, subjects.status_date
, max_visit.visit_date
, max_visit.planned_visit_date
, max_visit.visit_name
, max_visit.visit_desc
, followup.off_studydate
, followup.off_study_reason
FROM usa
INNER JOIN subjects ON subjects.Protocol_subject_id = usa.protocol_subject_id
LEFT JOIN max_visit ON max_visit.protocol_subject_id = usa.protocol_subject_id
LEFT JOIN followup ON followup.Protocol_subject_id = subjects.Protocol_subject_id
")
# Close connection
dbDisconnect(jdbcConnection) ##Closes it.
setwd("A:/Project Documents/R Database connections/place")
write_csv(Subjects,"Subject.csv")
Any ideas? At least where to look?
Update:
It's something to do with the spacing in VSCode. For instance if I run this simple command on one line, it'll recognize it as a new object and save it into the environment:
But if I run this command that's split over two lines, it'll trap me in an endless loop of looking for a right parenthesis or something:
And when I highlight several lines from the query, it'll show me different spacing/indentations in the beginning of each line even though they all look the same:
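A quick way to check the spacing theory, assuming the query text is first stored in a variable (the name qry and the shortened SQL below are made up for illustration), is to look at the raw bytes R received and count the line breaks. Oracle treats -- as a comment that runs to the end of the line, so if the line break after the -- at the end of the max_visit line were lost on the way to the console, the ) on the next line would be commented out. That would be consistent with ORA-00907, but it is only a guess at the mechanism.

```r
# Hypothetical two-line fragment that ends a line with a SQL comment marker,
# like the max_visit line in the query above.
qry <- "SELECT 1 FROM dual WHERE (1) IN (SELECT 1 FROM dual)--
)"

# Byte codes of the string as R received it: 0a is LF, 0d is CR, 09 is a tab.
print(charToRaw(qry))

# Number of line breaks that survived in the string.
n_breaks <- lengths(regmatches(qry, gregexpr("\n", qry, fixed = TRUE)))
print(n_breaks)

# What the query looks like if line breaks are collapsed into spaces:
# the closing parenthesis now sits behind the "--" comment marker.
flattened <- gsub("[\r\n]+", " ", qry)
print(flattened)
```

Running the same check once via source() and once by sending the lines from the editor would show whether the two routes deliver the same string.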

Related

R - sql query stored as object name does not work with r dbGetquery

Need a little help with the following R code. I've got quite a lot of data to load from a Microsoft SQL database. I tried a few things to make the SQL queries manageable.
1) Stored the query as object names with unique prefix
2) Using search to return a vector of the object names with unique prefix
3) using for loop to loop through the vector to load data <- this part didn’t work.
library(odbc)
library(tidyverse)
library(stringr)
# setting up DB connection, odbc pkg
db <- DBI::dbConnect(odbc::odbc(), Driver = 'SQL Server', Server = 'Server_name', Database = 'Database name', UID = 'User ID', trusted_connection = 'yes')
# defining the SQL queries
Sql_query1 <- "select * from db1"
Sql_query2 <- "select top 100 * from db2"
# the following stores the SQL query object names in a vector by searching for object names with prefix sql_
Sql_list <- ls()[str_detect(ls(), regex("sql_", ignore_case = TRUE))]
# This is the part where the code didn't work
for (i in Sql_list) { i <- dbGetQuery(db, i) }
The error I got is: "Error: 'Sql_query1' nanodb.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]Could not find stored procedure 'Sql_query1'"
However, if I don't use the loop, no error occurs! That would be feasible if I only had 2-3 queries to manage... unfortunately I have 20 of them!
dbGetQuery(db, Sql_query1)
Can anyone help? Thank you!
@Rohit's solution, written down:
The first part of your code is fine:
#setting up dB connection, odbc pkg
db <- DBI::dbConnect(odbc::odbc(), Driver = 'SQL Server', Server = 'Server_name', Database = 'Database name', UID = 'User ID', trusted_connection = 'yes')
But then it would be more convenient to do something like this:
A more verbose version:
sqlqry_lst <- vector(mode = 'list', length = 2) # create a list to hold the queries; in real life, length = 20
names(sqlqry_lst) <- paste0('Sql_query', 1:2) # assign names to your list; again, just use 1:20 in your real-life example
# put the SQL code into the list elements
sqlqry_lst['Sql_query1'] <- "select * from db1"
sqlqry_lst['Sql_query2'] <- "select top 100 * from db2"
# if you really want to use for loops
res <- vector(mode = 'list', length(sqlqry_lst)) # result list
# seq_along() visits every index; length() alone would only visit the last one
for (i in seq_along(sqlqry_lst)) res[[i]] <- dbGetQuery(db, sqlqry_lst[[i]])
Or as a two liner, a bit more R stylish and imho elegant:
sqlqry_lst <- list(Sql_query1="select * from db1", Sql_query2="select top 100 * from db2")
res <- lapply(sqlqry_lst, FUN = dbGetQuery, conn=db)
I suggest you mix and match: use the verbose version for creating (or, more precisely, naming) the query list and the short version for running the queries against the database, whichever suits you best.
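The loop in the question failed because i held the object's name ("Sql_query1"), not the query text, so the server was asked to run a statement called Sql_query1. If you would rather keep the queries as separate objects, a minimal sketch using base R's mget() to look the names up could look like this (the variable names sql_names and sql_queries are made up for illustration, and the final call is commented out because it needs a live connection db):

```r
# Two query strings stored as ordinary objects, as in the question.
Sql_query1 <- "select * from db1"
Sql_query2 <- "select top 100 * from db2"

# Names of all objects whose name starts with "sql_" (case-insensitive).
all_names <- ls()
sql_names <- grep("^sql_", all_names, ignore.case = TRUE, value = TRUE)

# mget() returns the *values* of those objects as a named list,
# i.e. the query text rather than the object names.
sql_queries <- mget(sql_names)
print(sql_queries)

# With an open connection `db`, each query could then be run like this:
# res <- lapply(sql_queries, FUN = dbGetQuery, conn = db)
```

The result list keeps the object names, so res$Sql_query1 would hold the data for the first query.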

Search list of keywords in elastic using R elastic package

I am making a Shiny app where a user can input an Excel file of terms and search them on elastic holdings. The Excel file contains one column of keywords and is read into a list; what I am trying to do is have each item in the list searched using Search(). I easily did this in Python with a for loop over the terms, with the search call inside the loop, and got accurate results. I understand that it isn't that easy in R, but I cannot get to the right solution. I am using the R elastic package and have been trying different versions of Search() for over a day. I have not used elastic before, so my apologies for not understanding the syntax much. I know that I need to do something with aggs for a list of terms...
Essentially I want to search on the source.body_ field and I want to use match_phrase for searching my terms.
Here is the code I have in Python that works, but I need everything in R for the shiny app and don't want to use reticulate.
queries = list()
for term in my_terms:
    search_result = es.search(index="cars", body={"query": {"match_phrase": {'body_':term}}}, size = 5000)
    search_result.update([('term', term)])
    queries.append(search_result)
I established my elastic connection as con and made sure it can bring back accurate matches on just one keyword with:
match <- '{"query": {"match_phrase" : {"body_" : "mustang"}}}'
search_results <- Search(con, index="cars", body = match, asdf = TRUE)
That worked how I expected it to with just one keyword explicitly defined.
So after that, here is what I have tried for a list of my_terms:
aggs <- '{"aggs":{"stats":{"terms":{"field":"my_terms"}}}}'
queries <- data.frame()
for (term in my_terms) {
final <- Search(con, index="cars", body = aggs, asdf = TRUE, size = 5000)
rbind(queries, final)
}
When I run this, it just brings back everything in my elastic. I also tried running that with the for loop commented out and that didn't work.
I also have tried to embed my_terms inside my match list from the single term search at the beginning of this post like so:
match <- '{"query": {"match_phrase" : {"body_": "my_terms"}}}'
Search(con, index="cars", body = match, asdf = TRUE, size = 5000)
This returns nothing. I have tried so many combinations of the aggs list and match list but nothing returns what I'm looking for. Any advice would be much appreciated, I feel like I've read just about everything so far and now I'm just confused.
UPDATE:
I figured it out with
p <- data.frame()
for (t in the_terms$keyword) {
result <- Search(con, index="cars", body = paste0('{"query": {"match_phrase" : {"body_":', '"', t, '"', '}}}'), asdf = TRUE, size = 5000)
p <- rbind(p, result$hits$hits)
}
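The paste0() call above works as long as a keyword contains no double quotes or backslashes. A slightly more defensive sketch, using only base R, builds one match_phrase body per term with a small helper (build_match_phrase is a made-up name, and the terms vector stands in for the_terms$keyword); the Search() calls are commented out because they need the live connection con:

```r
# Hypothetical helper: build a match_phrase query body for a single term.
build_match_phrase <- function(term, field = "body_") {
  # Escape backslashes first, then double quotes, so the JSON stays valid.
  term <- gsub("\\", "\\\\", term, fixed = TRUE)
  term <- gsub('"', '\\"', term, fixed = TRUE)
  sprintf('{"query": {"match_phrase": {"%s": "%s"}}}', field, term)
}

# Stand-in for the keywords read from the Excel file.
terms <- c("mustang", "camaro")

# One JSON body per term, named by the term it searches for.
bodies <- vapply(terms, build_match_phrase, character(1))
print(bodies)

# With an elastic connection `con`, the searches could then be run as:
# results <- lapply(bodies, function(b)
#   Search(con, index = "cars", body = b, asdf = TRUE, size = 5000)$hits$hits)
```

Collecting the hits in a list and combining them once at the end avoids growing a data frame inside the loop.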

R: While loop input

I am a bit new to R and have a question about a program I am trying to write. I am hoping to take in files (as many as the user pleases) with a while loop (eventually using read.table on each), but it keeps breaking on me.
Here is what I have so far:
cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
fil<-readLines(con="stdin", 1)
cat(fil, "\n")
while (!input=='X' | !input=='x'){
inputfile=input
input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
}
if(input=='X' | input=='x'){
exit -1
}
When I run it (from the commandline (UNIX)) I get these results:
> library("lattice")
>
> cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
Please enter the full path for your files, if you have no more files to add enter 'X': > fil<-readLines(con="stdin", 1)
x
> cat(fil, "\n")
x
> while (!input=='X' | !input=='x'){
+ inputfile=input
+ input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
+ }
Error: object 'input' not found
Execution halted
I am not quite sure how to fix the problem, but I am pretty sure that it is probably a simple problem.
Any suggestions?
Thanks!
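When you first run the script, input doesn't exist, so the while condition fails before the loop body ever runs. Assign
input <- ""
say, before your while statement. It also helps to put
inputfile=input
below input<- readline...., so that inputfile holds the value just entered.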
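Note also that the condition !input=='X' | !input=='x' is always TRUE, because no string equals both 'X' and 'x', so the loop would never end even once input exists. A minimal sketch of a loop that does stop is below; read_paths is a made-up helper name, and it takes a connection so that the same code can be fed from a text connection (as in the demo) or from stdin under Rscript:

```r
# Hypothetical helper: collect file paths from `con` until the user enters X or x.
read_paths <- function(con) {
  paths <- character(0)
  repeat {
    cat("Please enter the full path for your file, or 'X' to finish: ")
    input <- readLines(con, n = 1)
    # Stop at end of input, or when the answer is X / x.
    if (length(input) == 0 || toupper(trimws(input)) == "X") break
    paths <- c(paths, input)
  }
  paths
}

# Demo with canned input instead of a real terminal.
demo_con <- textConnection(c("a.txt", "b.txt", "x", "c.txt"))
paths <- read_paths(demo_con)
close(demo_con)
cat("\n")
print(paths)

# From the command line (Rscript), one option is:
# con <- file("stdin", open = "r"); paths <- read_paths(con); close(con)
```

Each collected path can then be passed to read.table().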
I'm not exactly sure what the underlying problem is for your issue. It may be that you're inputting the directory path incorrectly.
Here's a solution I've used a few times. It makes it much easier for the user. Basically, your code will not require user input, all it requires is that you have a certain naming convention for your files.
setwd("Your/Working/Directory") # This doesn't change
filecontents <- 1
i <- 1
while (filecontents != 0) {
  # try() returns an error object instead of stopping when the file can't be read
  mydata.csv <- try(read.csv(paste("CSV_file_", i, ".csv", sep = ""), header = FALSE), silent = TRUE)
  if (typeof(mydata.csv) != "list") { # checks to see if the imported data is a list (a data frame)
    filecontents <- 0
  } else {
    assign(paste('dataset', i, sep = ''), mydata.csv)
    # Whatever operations you want to do on the files.
    i <- i + 1
  }
}
As you can see, the naming convention for the files is CSV_file_n, where n runs from 1 up to the number of input files (I took this code out of one of my programs, in which I load CSVs). One of the problems I kept having was error messages stopping my code when it looked for a file that wasn't there. With this loop, those errors won't stop the script. If it assigns the result of reading a nonexistent file to mydata.csv, it merely checks to see if mydata.csv is a list. If it is, it continues operating. If not, it stops. If you're worried about differentiating between your data from different files within the code, just insert any relevant information about the file in a constant location within the file itself. For example, in my CSVs, the 3rd column always contained the name of the image from which I gathered the information contained in the rest of the CSV.
Hope this helps you a bit, even though I see you've already got a solution :-). It's really just an option if you want your program to be more autonomous.
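A variation on the same idea, sketched under the same CSV_file_n naming assumption, checks file.exists() before reading and collects the data frames in a named list instead of creating dataset1, dataset2, ... with assign(). The function name read_numbered_csvs and its arguments are made up for illustration:

```r
# Read CSV_file_1.csv, CSV_file_2.csv, ... from `dir` until one is missing.
read_numbered_csvs <- function(dir = ".", prefix = "CSV_file_") {
  datasets <- list()
  i <- 1
  repeat {
    path <- file.path(dir, paste0(prefix, i, ".csv"))
    # Stop at the first number for which no file exists.
    if (!file.exists(path)) break
    datasets[[paste0("dataset", i)]] <- read.csv(path, header = FALSE)
    i <- i + 1
  }
  datasets
}

# Usage, assuming the files live in the working directory:
# all_data <- read_numbered_csvs(".")
# names(all_data)
```

Keeping everything in one list makes it easy to loop over the files afterwards with lapply().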

Retrieving Variable Declaration

How can I find out how I first declared a certain variable when I am a few hundred lines down from where I first declared it? For example, I have declared the following:
a <- c(vectorA,vectorB,vectorC)
and now I want to see how I declared it. How can I do that?
Thanks.
You could try using the history command:
history(pattern = "a <-")
to try to find lines in your history where you assigned something to the variable a. I think this matches exactly, though, so you may have to watch out for spaces.
Indeed, if you type history at the command line, it doesn't appear to be doing anything fancier than saving the current history in a tempfile, loading it back in using readLines and then searching it using grep. It ought to be fairly simple to modify that function to include more functionality...for example, this modification will cause it to return the matching lines so you can store it in a variable:
myHistory <- function (max.show = 25, reverse = FALSE, pattern, ...)
{
file1 <- tempfile("Rrawhist")
savehistory(file1)
rawhist <- readLines(file1)
unlink(file1)
if (!missing(pattern))
rawhist <- unique(grep(pattern, rawhist, value = TRUE,
...))
nlines <- length(rawhist)
if (nlines) {
inds <- max(1, nlines - max.show):nlines
if (reverse)
inds <- rev(inds)
}
else inds <- integer()
#file2 <- tempfile("hist")
#writeLines(rawhist[inds], file2)
#file.show(file2, title = "R History", delete.file = TRUE)
rawhist[inds]
}
I will assume you're using the default R console. If you're on Windows, you can use File -> Save history and open the file in your favorite text editor, or you can use the function savehistory() (see help(savehistory)).
What you need to do is get a (good) IDE, or at least a decent text editor. You will benefit from code folding, syntax coloring and much more. There's a plethora of choices: Tinn-R, VIM, ESS, Eclipse+StatET, RStudio or RevolutionR, among others.
You can run grep 'a<-' .Rhistory from a terminal (assuming that you've cd'd to your working directory). ESS has several very useful history-searching functions, like (comint-history-isearch-backward-regexp), bound to M-r by default.
For further info, consult ESS manual: http://ess.r-project.org/Manual/ess.html
When you define a function, R stores the source code of the function (preserving formatting and comments) in an attribute of the function (named "source" in older versions of R and "srcref" in current ones). When you type the name of the function, you will get this content printed.
But it doesn't do this with variables. You can deparse a variable, which generates an expression that will produce the variable's value but this doesn't need to be the original expression. For example when you have b <- c(17, 5, 21), deparse(b) will produce the string "c(17, 5, 21)".
In your example, however, the result wouldn't be "c(vectorA,vectorB,vectorC)", it would be an expression that produces the combined result of your three vectors.
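A short illustration of that difference (the contents of vectorA, vectorB and vectorC are made up here):

```r
# deparse() reproduces an expression for the *value*, not the original code.
b <- c(17, 5, 21)
print(deparse(b))

# Hypothetical contents for the three vectors from the question.
vectorA <- c(1, 2)
vectorB <- c(3, 4)
vectorC <- c(5, 6)
a <- c(vectorA, vectorB, vectorC)

# The names vectorA, vectorB and vectorC are gone; only the combined values remain.
print(deparse(a))
```

So deparse(a) tells you what a contains, but not that it was built from three other vectors.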

How can I get Emacs ess to recognize a query string (within quotes) as code?

Background
I have a function dbquery that simplifies the process of querying a MySQL database from within R.
dbquery <- function(querystring) {
dvr <- dbDriver("MySQL")
con <- dbConnect(dvr, group = "databasename")
q <- dbSendQuery(con, querystring)
data <- fetch(q, n = -1)
return(data)
}
Thus I can send:
dbquery(querystring = "select field_1, field_2, field_3
from table_a join table_b on this = that
join table_c on that = something
where field_4 in (1,2,3);")
However, the variable querystring must be contained within quotes. This makes it so that Emacs ESS will not nicely indent my queries like it would if it were in SQL mode - or even like it does if there are no quotes but just in ESS-R mode.
Question
Is it possible to get ESS to do this? Perhaps by writing the function so that it will accept the query without a quote (and add the quotes within the function), or perhaps adding something to .emacs or ess.el?
I think what you want is MMM Mode. As its name suggests, MultiMajorMode Mode allows you to have multiple modes in different regions of the same buffer.
I recommend that you checkout the examples in http://www.emacswiki.org/emacs/HtmlModeDeluxe as they will probably give you an idea how to do it in your case (you might want to add some comment in your code around the sql so that MMM can find the sql code).
You would have to do something like this I guess (untested):
(require 'mmm-mode)
(mmm-add-group
 'sql-in-ess
 '((sql-query
    :submode sql-mode
    :face WHATEVERYOUWANT
    :front "#SQL_QUERY>"
    :back "#<SQL_QUERY")))
(add-to-list 'mmm-mode-ext-classes-alist '(ess-mode nil sql-in-ess))
However, this might be overkill, unless it happens a lot that you have complex sql queries in the R code.
I don't know of any way to do this. It seems like you're asking, "can I make Emacs be in two modes simultaneously (i.e. ESS and SQL)?" I think the answer is "no", but I hope that someone comes along and shows us a clever hack that proves me wrong!
A simple alternative approach would be to use paste, with each line a separate string:
dbquery(querystring = paste("select field_1, field_2, field_3",
"from table_a join table_b on this = that",
"join table_c on that = something",
"where field_4 in (1,2,3);"))
Perhaps a bit clunky, but it works in practice.
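For reference, paste() joins its arguments with a single space by default, so the server receives the query as one line. A small sketch of what is actually sent (the variable names are only for illustration):

```r
# Each line of the query is its own string; paste() glues them with spaces.
query <- paste("select field_1, field_2, field_3",
               "from table_a join table_b on this = that",
               "join table_c on that = something",
               "where field_4 in (1,2,3);")
cat(query, "\n")

# Passing sep = "\n" instead keeps the line breaks in the query text.
query_multiline <- paste("select field_1, field_2, field_3",
                         "from table_a join table_b on this = that",
                         sep = "\n")
cat(query_multiline, "\n")
```

Because each piece is an ordinary R string on its own line, ESS indents the arguments like any other function call.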

Resources