Getting data into a data.frame for user-specified variables [duplicate] - r

This question already has an answer here:
Getting user provided data out of a database and into R [closed]
(1 answer)
Closed 9 years ago.
I asked this question yesterday and it was answered but I don't think they really understood my question. I need to extract data from a database. But the user provides the list of variables. I need my code to loop through each requested variable, pull the data associated with it and drop it in a data.frame, ready for analysis.
f.extractVariables<-
structure(function
(dbPath,dbName,table,variables
){
# LOAD LIBRARIES
require(RODBC)
require(xlsx)
setwd(dbPath)
db <- odbcConnectAccess2007(dbName)
#This was my idea on how to loop through the variable list but it won't run
for (i in 1:length(variables))
{
dataCollection <- sqlQuery(db, 'SELECT table.variables[[i]]
FROM table;')
}
#This piece was the solution I was given. It runs, but all it does is parrot back
#the variable list; it doesn't retrieve anything.
variables=paste(variables,collapse=",")
query<-paste(paste("SELECT",variables),
"FROM table",sep='\n')
cat(query)
odbcClose(db)
}
)
#And the user provided input looks something like this. I can change the way the
#variable list comes in to make it easier.
dbPath = 'z:/sites/'
dbName = 'oysterSites.accdb'
table = 'tblDataSiteOysterSamplingPlan'
variables= 'nwLon,nwLat,neLon,neLat'
f.extractVariables(dbPath,dbName,table,variables)

Which language do you think you are coding in? This is not .NET...
Remove this
for (i in 1:length(variables))
{
dataCollection <- sqlQuery(db, 'SELECT table.variables[[i]]
FROM table;')
}
in which 'SELECT table.variables[[i]] FROM table;' will always be the SAME for every iteration of the loop, and change it to (at least my guess at what you were trying to do):
for (i in 1:length(variables))
{
dataCollection <- sqlQuery(db, paste0('SELECT ', variables[[i]], ' FROM ', table, ';'))
}
You say that the above does not work. There is no point in reviewing the rest of your code: fix the above first and then come back.
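One way to get all the requested columns into a single data.frame is to send one SELECT that lists them all, so no loop is needed. Note also that the solution quoted in the question pastes the literal word table into the FROM clause instead of the value of the table argument, and only cat()s the query rather than running it. The sketch below builds the query from either a character vector or one comma-separated string; build_select() is a hypothetical helper, and the database call is left as a comment because it needs the open RODBC channel db from the question.

```r
# Hypothetical helper: build one SELECT for all user-requested columns.
# `variables` may be a character vector or a single comma-separated string.
build_select <- function(table, variables) {
  vars <- trimws(unlist(strsplit(variables, ",", fixed = TRUE)))
  paste0("SELECT ", paste(vars, collapse = ", "), " FROM ", table, ";")
}

query <- build_select("tblDataSiteOysterSamplingPlan", "nwLon,nwLat,neLon,neLat")
cat(query, "\n")

# Inside f.extractVariables(), run the query and return the data.frame
# before closing the channel (assumes the open RODBC connection `db`):
# dataCollection <- RODBC::sqlQuery(db, query)
# odbcClose(db)
# return(dataCollection)
```

Column names containing spaces or reserved words would need square brackets in Access SQL; the sketch assumes plain names like those in the example.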

Related

R - SQL query stored as object name does not work with dbGetQuery

I need a little help with the following R code. I've got quite a lot of data to load from a Microsoft SQL Server database, and I tried a few things to make the SQL queries manageable:
1) Stored each query in an object whose name has a unique prefix
2) Used a search to return a vector of the object names with that prefix
3) Used a for loop over that vector to load the data <- this part didn't work.
library(odbc)
library(tidyverse)
library(stringr)
#setting up dB connection, odbc pkg
db <- DBI::dbConnect(odbc::odbc(), Driver = 'SQL Server', Server = 'Server_name', Database = 'Database name', UID = 'User ID', trusted_connection = 'yes')
#defining the sql query
Sql_query1 <- "select * from db1"
Sql_query2 <- "select top 100 * from db2"
#the following is to store the sql query object name in a vector by searching for object names with prefix sql_
Sql_list <- ls()[str_detect(ls(), regex("sql_", ignore_case = TRUE))]
#This is the part where the code didn't work
for (i in Sql_list) { i <- dbGetQuery(db, i) }
The error I get is: "Error: 'Sql_query1' nanodb.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]Could not find stored procedure 'Sql_query1'"
However, if I don't use the loop, no error occurs! Running the queries one by one would be feasible with only 2-3 queries to manage... unfortunately I have 20 of them!
dbGetQuery(db, Sql_query1)
Can anyone help? Thank you!
Rohit's solution, written out:
The first part on your side is fine:
#setting up dB connection, odbc pkg
db <- DBI::dbConnect(odbc::odbc(), Driver = 'SQL Server', Server = 'Server_name', Database = 'Database name', UID = 'User ID', trusted_connection = 'yes')
But then it would be more convenient to do something like this:
A more verbose version:
sqlqry_lst <- vector(mode = 'list', length = 2) # create a list to hold the queries; in real life, length = 20
names(sqlqry_lst) <- paste0('Sql_query', 1:2) # name the list elements; again, just use 1:20 in your real-life example
# put the SQL code into the list elements
sqlqry_lst[['Sql_query1']] <- "select * from db1"
sqlqry_lst[['Sql_query2']] <- "select top 100 * from db2"
# if you really want to use a for loop
res <- vector(mode = 'list', length(sqlqry_lst)) # result list
for (i in seq_along(sqlqry_lst)) res[[i]] <- dbGetQuery(db, sqlqry_lst[[i]])
Or as a two-liner, a bit more R-style and, IMHO, more elegant:
sqlqry_lst <- list(Sql_query1="select * from db1", Sql_query2="select top 100 * from db2")
res <- lapply(sqlqry_lst, FUN = dbGetQuery, conn=db)
I suggest you mix and match: use the verbose version for creating (or, more precisely, naming) the query list and the short version for running the queries against the database, whichever suits you best.
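The error in the question comes from passing the loop variable, which holds the name "Sql_query1" rather than the SQL text, to dbGetQuery(); SQL Server receives that bare word and treats it as a call to a stored procedure of that name. The loop also assigns each result to i, which is overwritten on the next pass. If you'd rather keep the question's naming scheme, a minimal sketch is to turn the names into their values with base R's mget() first (get(i) inside the loop would do the same for one name). query_names and queries are illustrative names, and the database call is left as a comment because it needs the connection db.

```r
# Two query strings, named with a common prefix as in the question
Sql_query1 <- "select * from db1"
Sql_query2 <- "select top 100 * from db2"

# Case-insensitive search for object *names* starting with the prefix
query_names <- grep("^sql_", ls(), ignore.case = TRUE, value = TRUE)

# mget() swaps each name for the SQL text stored under it,
# giving a named list of query strings
queries <- mget(query_names)
str(queries)

# With the DBI connection `db` from the question (not run here),
# this returns a named list of data frames, one per query:
# results <- lapply(queries, function(q) DBI::dbGetQuery(db, q))
```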

Search list of keywords in elastic using R elastic package

I am making a Shiny app where a user can upload an Excel file of terms and search for them in our holdings stored in Elasticsearch. The Excel file contains one column of keywords and is read into a list; what I am trying to do is have each item in the list searched using Search(). I easily did this in Python with a for loop over the terms and the search call inside the loop, and got accurate results. I understand that it isn't that easy in R, but I cannot get to the right solution. I am using the R elastic package and have been trying different versions of Search() for over a day. I have not used Elasticsearch before, so my apologies for not understanding the syntax well. I know that I need to do something with aggs for a list of terms.
Essentially I want to search on the source.body_ field and I want to use match_phrase for searching my terms.
Here is the code I have in Python that works, but I need everything in R for the shiny app and don't want to use reticulate.
queries = list()
for term in my_terms:
    search_result = es.search(index="cars", body={"query": {"match_phrase": {'body_': term}}}, size=5000)
    search_result.update([('term', term)])
    queries.append(search_result)
I established my elastic connection as con and made sure it can bring back accurate matches on just one keyword with:
match <- '{"query": {"match_phrase" : {"body_" : "mustang"}}}'
search_results <- Search(con, index="cars", body = match, asdf = TRUE)
That worked how I expected it to with just one keyword explicitly defined.
So after that, here is what I have tried for a list of my_terms:
aggs <- '{"aggs":{"stats":{"terms":{"field":"my_terms"}}}}'
queries <- data.frame()
for (term in my_terms) {
final <- Search(con, index="cars", body = aggs, asdf = TRUE, size = 5000)
rbind(queries, final)
}
When I run this, it just brings back everything in my elastic. I also tried running that with the for loop commented out and that didn't work.
I also have tried to embed my_terms inside my match list from the single term search at the beginning of this post like so:
match <- '{"query": {"match_phrase" : {"body_": "my_terms"}}}'
Search(con, index="cars", body = match, asdf = TRUE, size = 5000)
This returns nothing. I have tried so many combinations of the aggs list and the match list, but nothing returns what I'm looking for. Any advice would be much appreciated; I feel like I've read just about everything so far and now I'm just confused.
UPDATE:
I figured it out with
p <- data.frame()
for (t in the_terms$keyword) {
result <- Search(con, index="cars", body = paste0('{"query": {"match_phrase" : {"body_":', '"', t, '"', '}}}'), asdf = TRUE, size = 5000)
p <- rbind(p, result$hits$hits)
}
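The loop in the update works. A possible refinement, sketched below under some assumptions, collects each term's hits in a list and binds them once at the end (growing a data frame with rbind() inside a loop copies it on every pass), records the matching term as the Python version does, and builds the JSON body with base R's encodeString() so that a keyword containing a double quote does not break the query. search_terms(), build_match_phrase() and the search_fun argument are hypothetical names; with the elastic package you would pass a small wrapper around Search(), as in the commented call.

```r
# Build the match_phrase body for one term; encodeString() adds the
# surrounding double quotes and escapes any quotes or backslashes in the term.
build_match_phrase <- function(term) {
  sprintf('{"query": {"match_phrase": {"body_": %s}}}',
          encodeString(term, quote = '"'))
}

# Run one search per term and bind all hits once at the end.
# `search_fun` is a hypothetical parameter: any function that takes a JSON
# body and returns the hits as a data frame.
search_terms <- function(terms, search_fun) {
  per_term <- lapply(terms, function(term) {
    hits <- search_fun(build_match_phrase(term))
    if (NROW(hits) == 0) return(NULL)  # skip terms with no matches
    hits$term <- term                  # like the Python version's 'term' key
    hits
  })
  do.call(rbind, per_term)             # NULL entries are dropped by rbind()
}

# With the elastic connection `con` from the question (not run here):
# p <- search_terms(the_terms$keyword, function(body)
#   Search(con, index = "cars", body = body, asdf = TRUE, size = 5000)$hits$hits)
```

If the hits for different terms come back with different columns, rbind() will complain about mismatched names, and a binding helper that fills in missing columns would be needed instead.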

Adding rows to different data frames programmatically

I'm creating data frames dynamically and using custom names to refer to them. I can successfully create the data frames dynamically and add information to them individually by hand, but when I try to add a record in the loop, the code runs and nothing happens: when I open the data frame, it is still empty.
#Extract unique machines on the system
machines <- unique(wo_raw$MACHINE)
for(machine in machines){
#Check if the machine is present on current data frames or has a record
if(exists(machine) && is.data.frame(get(machine))){
#Machine already exists on the system
cat(machine," is a dataframe","\n")
netlbs <- subset(wo_raw,((wo_raw$TYPE =="T" & wo_raw$TYPE2=="E") | (wo_raw$TYPE == "T" & is.na(wo_raw$TYPE2))) & wo_raw$WEEK<=curWeek & wo_raw$MACHINE == machine & wo_raw$YEAR == curYear,select = NET_LBS)
scraplbs<- subset(wo_raw,((wo_raw$TYPE =="T" & wo_raw$TYPE2=="E") | (wo_raw$TYPE == "T" & is.na(wo_raw$TYPE2))) & wo_raw$WEEK<=curWeek & wo_raw$MACHINE == machine & wo_raw$YEAR == curYear,select = SCRAP_LBS)
if(is.data.frame(netlbs) && nrow(netlbs)!=0){
totalNet<- sum(netlbs)
totalScrap<- sum(scraplbs)
scrapRate <- percent(totalScrap/(sum(totalNet,totalScrap)),accuracy = 2)
tempDf<-data.frame(curYear,curMonth,curDay,curWeek,totalNet,totalScrap,scrapRate)
names(tempDf)<-c("year","month","day","week","net_lbs","scrap_lbs","scrap_rate")
cat("Total Net lbs for ",machine,": ",totalNet,"\n")
cat("Total Scrap lbs for ",machine,": ",totalScrap,"\n")
cat("Total Scrap Rate for ",machine,": ",scrapRate,"\n")
#machine<-rbind(get(machine),tempDf)
#assign(machine,rbind(machine,tempDf))
add_row(get(machine),year=curYear,
month=curMonth,
day=curDay,
week=curWeek,
net_lbs=totalNet,
scrap_lbs=totalScrap,
scrap_rate=scrapRate)
cat("added row \n")
}
#info<-c(curYear,curMonth,curDay,curWeek,netlbs)
#cat("Total Net lbs: ",netlbs,"\n")
#netlbs <-NULL
}else{
cat("Creating machine dataframe: ",machine,"\n")
#Create a dataframe labeled with machine name containing
#date information, net lbs,scrap lbs and scrap rate
assign(paste0(machine,""),data.frame(year=integer(),
month=integer(),
day=integer(),
week=integer(),
net_lbs=double(),
scrap_lbs=double(),
scrap_rate=integer()
)
)
#machine$year<-curYear
}
#machine<-NULL
}
The approaches I've tried, taken from previous answers on Stack Overflow, are in the commented-out lines. I did get it working with a for loop, but I don't think that is really feasible, since it consumes a lot of resources and doesn't work well with mixed data types. Does anybody have an idea of what's going on? I don't have an error to go by.
I think your code needs quite a lot of cleanup. Make sure you know for yourself at each step what exactly you are handling.
Some hints:
Try to make your code self-contained. If I run your code, I get an error right away, as I don't have wo_raw defined. I understand it's some kind of data.frame, but exactly what is in there? What do I need to do to try to run your code? Also with variables like curYear. I get that it needs to be 2019, but I need to type an awful lot to just get to the problem, I can't just copy-paste.
If you use any libraries, please also include a line for them. I don't know what add_row does or is supposed to do. So I also don't know if that's where your expectations are wrong?
Try to make your code minimal before posting it here. I like the comments and cats sprinkled throughout, but why a line such as netlbs <- subset(wo_raw,((wo_raw$TYPE =="T" & wo_raw$TYPE2=="E") | (wo_raw$TYPE == "T" & is.na(wo_raw$TYPE2))) & wo_raw$WEEK<=curWeek & wo_raw$MACHINE == machine & wo_raw$YEAR == curYear,select = NET_LBS)? For this problem, just something like subset(wo_raw, wo_raw$mach==machine, net) would suffice
I get that the code works, but try to work out where you are using what kind of objects. if (is.data.frame(netlbs)) {total=sum(netlbs)} may work, but summing a data.frame while you actually just need a column leads to confusion.
When using variables to store the names of other variables, as you are doing, be very aware of what you are actually referring to. For that reason, it's generally advisable to steer clear of these constructs; it's almost always easier to store your results in a list or something similar.
Come to that: the variable machine is not a data.frame, it's a character. That character is the name of another variable, which is a data.frame. So (commented out) I see some instances of machine <- NULL and machine$year, those are wrong. As is rbind(machine, ...), as machine is not a data.frame
That being said, I think you got close with the assign-statement.
Does assign(machine,rbind(get(machine),tempDf)) work?
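To illustrate the answer's advice, here is a minimal sketch with a made-up three-row wo_raw containing only MACHINE, NET_LBS and SCRAP_LBS (the question's week and year filters are left out). The key point is that rbind(), and likewise add_row() from tibble, return a new data frame rather than modifying the one passed in, so in the question the result of add_row(get(machine), ...) is computed and then thrown away. Option 1 is the named-list approach the answer recommends; Option 2 is the assign()/get() pattern from the answer's last line. machine_dfs and df_name are illustrative names.

```r
# Made-up stand-in for wo_raw (the real columns and filters are omitted)
wo_raw <- data.frame(MACHINE   = c("M1", "M1", "M2"),
                     NET_LBS   = c(100, 50, 80),
                     SCRAP_LBS = c(5, 5, 20))

# Option 1: keep one data frame per machine in a named list
machine_dfs <- list()
for (machine in unique(wo_raw$MACHINE)) {
  rows <- wo_raw[wo_raw$MACHINE == machine, ]
  total_net <- sum(rows$NET_LBS)      # sum a column, not the whole data frame
  total_scrap <- sum(rows$SCRAP_LBS)
  new_row <- data.frame(net_lbs = total_net,
                        scrap_lbs = total_scrap,
                        scrap_rate = total_scrap / (total_net + total_scrap))
  # rbind() returns a new data frame, so the result must be stored;
  # machine_dfs[[machine]] is NULL the first time, and rbind(NULL, df) gives df
  machine_dfs[[machine]] <- rbind(machine_dfs[[machine]], new_row)
}

# Option 2: the assign()/get() pattern, writing the result back by name
df_name <- "M3"
assign(df_name, data.frame(net_lbs = numeric(0)))
assign(df_name, rbind(get(df_name), data.frame(net_lbs = 42)))
```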

How do I cache vectorized calls that take user input in R?

I am trying to calculate a field for all rows of a large dataset. The function to calculate it is from the package taxize, and uses an HTTP request to query an external site for the right ID number. It is searching by scientific name, and often there are multiple results, in which case this function asks for user input. I would like the function to cache my selection and return that ID number every time the same call is made from then on. I have tried with my own caching function and with memoizedCall() from the package R.cache but every time it hits the second entry of the same scientific name it still prompts me for user input. I feel like I am misunderstanding something basic about how vectorization works. Sorry for my ignorance but any advice is appreciated.
Here is the code I used as a custom caching function.
check_tsn <- function(data,tsn_list){
print(data)
print(tsn_list)
if (is.null(tsn_list$data)){
tsn_list$data = taxize::get_tsn(data)
print('added to tsn_list')
}
return(tsn_list$data)
}
tsn_list <- vector(mode = "list", nrow(wanglang))
Genus.Species <- c('Tamiops swinhoei','Bos taurus','Tamiops swinhoei')
IUCN.ID <- c('21382','','21382')
species <- data.frame(Genus.Species,IUCN.ID)
species$TSN.ID = check_tsn(species$Genus.Species,tsn_list)
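Two details in check_tsn() seem to defeat the caching: tsn_list$data refers to a list element literally named "data" rather than to the species name held in the data argument, and because R passes tsn_list by value, the assignment inside the function only changes a local copy. Passing the whole column in one call also means get_tsn() presumably resolves every name, duplicates included, within that single call. A minimal sketch of a per-name cache, assuming one ID per name, keeps the cache in an environment (which is modified in place) and looks names up one at a time. cached_lookup(), tsn_cache and fake_get_tsn() are hypothetical; the fake stands in for taxize::get_tsn() and returns placeholder IDs, not real TSNs.

```r
# Cache stored in an environment: unlike a list passed as an argument,
# an environment is modified in place, so entries survive between calls.
tsn_cache <- new.env()

cached_lookup <- function(name, lookup_fun) {
  if (!exists(name, envir = tsn_cache, inherits = FALSE)) {
    # Only reached the first time a name is seen (where a prompt would occur)
    assign(name, lookup_fun(name), envir = tsn_cache)
  }
  get(name, envir = tsn_cache, inherits = FALSE)
}

# Hypothetical stand-in for taxize::get_tsn(); IDs are placeholders.
# It counts how often it is actually called.
calls <- 0
fake_get_tsn <- function(name) {
  calls <<- calls + 1
  switch(name, "Tamiops swinhoei" = "1001", "Bos taurus" = "1002", NA_character_)
}

species <- data.frame(Genus.Species = c("Tamiops swinhoei", "Bos taurus", "Tamiops swinhoei"),
                      IUCN.ID = c("21382", "", "21382"))

# Look names up one at a time so repeated names hit the cache
species$TSN.ID <- vapply(as.character(species$Genus.Species),
                         function(n) as.character(cached_lookup(n, fake_get_tsn)),
                         character(1), USE.NAMES = FALSE)
```

With the real package you would pass taxize::get_tsn as lookup_fun; as.character() is there to reduce its classed return value to a plain ID string, on the assumption that each name yields a single ID.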

Running R script: readline and scan do not pause for user input

I have looked at other posts that appeared similar to this question, but they have not helped me. This may just be my ignorance of R, so I decided to sign up and make my first post on Stack Overflow.
I am running an R script and would like the user to choose which of the following two loops to use. The code that asks for the user's input looks similar to the one below:
#Define the function
method.choice<-function() {
Method.to.use<-readline("Please enter 'New' for new method and 'Old' for old method: ")
while(Method.to.use!="New" && Method.to.use!="Old"){ #Make sure selection is one of two inputs
cat("You have not entered a valid input, try again", "\n")
Method.to.use<-readline("Please enter 'New' for new method and 'Old' for old method: ")
cat("You have selected", Method.to.use, "\n")
}
return(Method.to.use)
}
#Run the function
method.choice()
Then below this I have the two possible choices:
if(Method.to.use=="New") {
for(i in 1:nrow(linelist)){...}
}
if(Method.to.use=="Old"){
for(i in 1:nrow(linelist)){...}
}
My issue, which I have also read about in other posts, is that whether I use "readline", "scan" or "ask", R does not wait for my input. Instead, R uses the following lines as the input.
The only way I found to make R pause for input is to put the code all on the same line or to run it line by line (instead of selecting all the code at once). See this example using ask() from gtools:
library(gtools)
silly <- function()
{
age <- ask("How old are you? ")
age <- as.numeric(age)
cat("In 10 years you will be", age+10, "years old!\n")
}
This runs with a pause:
silly(); paste("this is quite silly")
This does not wait for input:
silly()
paste("this is quite silly")
Any guidance would be appreciated so that I can still run my entire script and have it pause at readline without continuing. I am using RStudio and have checked that interactive() returns TRUE.
The only other workaround I found is wrapping my entire script in one main function, which is not ideal for me, since it may require me to use <<- to write to the global environment.
Thank you in advance.
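Two separate things seem to be going on here. When a selection of several lines is sent to the console, readline() can take the next pending line as its input, which matches what the question describes; running the saved file with source() (RStudio's Source button), or wrapping the code in a single { } block so it is parsed before it runs, is a commonly suggested way around this and should behave like the one-line version the question found. Separately, method.choice() returns the choice but the result is never stored, so Method.to.use does not exist outside the function. The sketch below, a rewrite under those assumptions rather than a tested fix for RStudio, stores the return value; choose_method() and its read_input argument are hypothetical names, and read_input exists only so the function can be exercised without a console.

```r
# Ask until the answer is "New" or "Old", then return it.
# `read_input` defaults to readline(); another function can be passed for testing.
choose_method <- function(read_input = readline) {
  prompt <- "Please enter 'New' for the new method or 'Old' for the old method: "
  choice <- read_input(prompt)
  while (!choice %in% c("New", "Old")) {
    cat("You have not entered a valid input, try again\n")
    choice <- read_input(prompt)
  }
  cat("You have selected", choice, "\n")  # printed once, after a valid answer
  choice
}

# Store the return value: variables created inside the function are local to it.
if (interactive()) {
  method_to_use <- choose_method()
  if (method_to_use == "New") {
    # new-method loop over linelist goes here
  } else {
    # old-method loop over linelist goes here
  }
}
```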
