Storing multiple query results in a single variable - R

I'm running into a problem storing the results of multiple queries in a list.
I currently have the results from a previously executed query stored in a list.
What I can't find is a way to store the results of the follow-up queries in a list, or anything of that sort.
Each queryResults2 comes back as a 2D list (a data frame).
# Sample code for the 2nd select
for (i in seq_along(queryResults[[1]])) {
  query_pd <- paste0("SELECT price_date, price FROM price_master WHERE stock_id = '",
                     queryResults[[1]][i], "' ORDER BY price_date")
  queryResults2 <- dbGetQuery(conn, query_pd)
  # storing value here
}

First of all, don't worry too much about the whole "don't use loops" thing. Here are three basic options:
1. for loop. The basic template would be:
result <- vector("list", length(queryResults[[1]]))
for (i in seq_along(queryResults[[1]])) {
  # Do stuff
  result[[i]] <- something
}
2. lapply. Here the basic format would be:
lapply(seq_along(queryResults[[1]]), function(i) dbGetQuery(conn, paste(...)))
You don't necessarily need the function to take the query and connection object as arguments, R's scoping will be able to find them if they exist in the calling environment.
3. Run it all as one query with an IN clause and then split it afterwards (sketched below).
Personally, I try to use (3) as much as possible.
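For the question's tables, a minimal sketch of option (3) might look like this (table and column names taken from the question; split() does the regrouping afterwards):
ids <- queryResults[[1]]
query_pd <- paste0(
  "SELECT stock_id, price_date, price FROM price_master WHERE stock_id IN ('",
  paste(ids, collapse = "','"),
  "') ORDER BY price_date"
)
all_prices <- dbGetQuery(conn, query_pd)
# split() returns one data frame per stock_id, named by id
result <- split(all_prices, all_prices$stock_id)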

Related

R loop to create multiple objects from equally as many existing objects

I've tried searching for the answer to this but am having trouble because I'm not sure I'm even searching for the right thing. Basically, I would like to create a loop in R that creates multiple objects, each from a different existing object. For example, let's say I have 50 existing objects (matrix, data frame, graph, etc.) that are all named similarly (table1, table2...table50). Now I would like to create 50 new objects, let's say graph1...graph50. I'm having trouble with a loop because I don't know how to work with the names being strings. I've tried the assign function, but it isn't dynamic enough in the assignment argument. I would basically like something like this:
for (i in list(table names)){
graph "i" <- as.network(table "i")
}
I would also like this to hold for objects assigned as some function of themselves, i.e. graph "i" <- somefunction(graph "i"), etc.
Additionally, if there is a more efficient way, by all means I'm open to it. It seems like an easy task but I can't figure it out. Right now I'm literally just concatenating the statements in Excel and pasting them into R, so it doesn't take too long, but it is a pain. Thank you.
I think you could use a nested loop to do what you're looking for; you could apply whatever transformations you want to each object within the input list and store the results in a new list with the same object names.
in_list <- list(table1 = iris,
                table2 = EuStockMarkets)
out_list <- list()
for (i in seq_along(in_list)) {
  for (j in colnames(in_list[[i]])) {
    # rename tableN -> graphN and store one result per column
    out_list[[ gsub("table", "graph", names(in_list)[i]) ]][[j]] <- summary(in_list[[i]][, j])
  }
}
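If you specifically want 50 free-standing graph1...graph50 objects rather than a list, get() and assign() let you work with names as strings; a hedged sketch, assuming objects table1...table50 already exist and that as.network() comes from the network package as in the question:
library(network)
for (i in 1:50) {
  tbl <- get(paste0("table", i))               # look up an object by its string name
  assign(paste0("graph", i), as.network(tbl))  # create graphN in the workspace
}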
Hope this helps!

Ordered Map / Hash Table in R

While working with lists I've noticed an issue that I didn't expect.
result5 <- vector("list", length(queryResults[[1]]))
for (i in 1:length(queryResults[[1]])) {
  id <- queryResults[[1]][i]
  result5[[id]] <- getPrices(id)
}
The problem is that after this code runs, instead of the result staying the same size (whatever the length of queryResults[[1]] is), it grows up to the largest id used as an index, creating a bunch of NULL entries in the middle.
result5 currently stores a number of (int, double) lists, so it looks like:
result5[[index(int)]][[row]][col]
While on its own that's not too problematic, I would rather avoid it, simply for easier size calculations later on.
For clarification, id is an integer. In the given case a for loop offers the same performance as the apply functions, but greater convenience.
After some testing, it seems the easiest way of doing it is to convert it to a hash, using the hash package:
result6 <- hash(queryResults[[1]], lapply(queryResults[[1]], getPrices))
And if it needs to be accessed, calling
result6[[toString(id)]]
The difference in performance is marginal, albeit it's still fairly annoying having to include toString in your code.
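For the size calculations mentioned above, the hash object can be inspected directly (assuming result6 from the snippet; length() and keys() here come from the hash package):
length(result6)  # number of stored ids, with no NULL padding
keys(result6)    # the ids, stored as character keys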
It's not clear exactly what your question is, but judging by the structure of the loop, you probably want
result5[[i]] <- getPrices(id)
rather than result5[[id]] <- getPrices(id).
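Alternatively, if you do want id-based lookup without the NULL padding, a plain named list gives the same access pattern as the hash version; a sketch, assuming getPrices() and queryResults from the question:
ids <- queryResults[[1]]
result5 <- lapply(ids, getPrices)
names(result5) <- as.character(ids)  # character names, so no gap-filling with NULLs
result5[[as.character(id)]]          # lookup by id, analogous to result6[[toString(id)]]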

returning different data frames in a function - R

Is it possible to return 4 different data frames from one function?
Scenario:
I am trying to read a file, parse it, and return some parts of the file.
My function looks something like this:
parseFile <- function(file) {
  carFile <- read.table(file, header = TRUE, sep = "\t")
  carNames <- carFile[1, ]
  carYear <- colnames(carFile)
  return(list(carFile, carNames, carYear))
}
I don't want to have to use list(carFile, carNames, carYear). Is there a way to return the 3 data frames without wrapping them in a list first?
R does not support multiple return values. You want to do something like:
foo = function(x,y){return(x+y,x-y)}
plus,minus = foo(10,4)
yeah? Well, you can't. You get an error that R cannot return multiple values.
You've already found the solution - put them in a list and then get the data frames from the list. This is efficient - there is no conversion or copying of the data frames from one block of memory to another.
This is also logical, the return from a function should conceptually be a single entity with some meaning that is transferred to whatever function is calling it. This meaning is also better conveyed if you name the returned values of the list.
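For example, a named return list makes each piece self-describing; a light rework of the question's function (the list names and file name are just illustrative):
parseFile <- function(file) {
  carFile <- read.table(file, header = TRUE, sep = "\t")
  list(file  = carFile,
       names = carFile[1, ],
       year  = colnames(carFile))
}
parsed <- parseFile("cars.tsv")  # hypothetical file name
parsed$names                     # pull out one piece by name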
You could use a technique to create multiple objects in the calling environment, but when you do that, kittens die.
Note in your example carYear isn't a data frame - it's a character vector of column names.
There are other ways you could do that, if you really really want, in R.
assign('carFile',carFile,envir=parent.frame())
If you use that, then carFile will be created in the calling environment. As Spacedman indicated you can only return one thing from your function and the clean solution is to go for the list.
In addition, my personal opinion is that if you find yourself in such a situation, where you feel like you need to return multiple data frames from one function, or do something that no one has ever done before, you should really revisit your approach. In most cases you can find a cleaner solution, perhaps with an additional function, or with the recommended approach (i.e. a list).
In other words the
envir=parent.frame()
will do the job, but as Spacedman mentioned
when you do that, kittens die
The zeallot package does what you need, in a similar way to how Python can unpack variables from a function. Reproducible example below.
parseFile <- function() {
  carMPG <- mtcars$mpg
  carName <- rownames(mtcars)
  carCYL <- mtcars$cyl
  return(list(carMPG, carName, carCYL))
}
library(zeallot)
c(myFile, myName, myYear) %<-% parseFile()

How to use a value that is specified in a function call as a "variable"

I am wondering if it is possible in R to use a value that is declared in a function call as a "variable" part of the function itself, similar to the functionality that is available in SAS IML.
Given something like this:
put.together <- function(suffix, numbers) {
  new.suffix <<- as.data.frame(numbers)
  return(new.suffix)
}
x <- c(seq(1000,1012, 1))
put.together(part.a, x)
new.part.a ##### does not exist!!
new.suffix ##### does exist
As it is written, the function returns a dataframe called new.suffix, as it should because that is what I'm asking it to do.
I would like to get a dataframe returned that is called new.part.a.
EDIT: Additional information was requested regarding the purpose of the analysis
The purpose of the question is to produce dataframes that will be sent to another function for analysis.
There exists a data bank where elements are organized into groups by number, and other people organize the groups into a meaningful set.
Each group has an id number. I use the information supplied by others to put the groups together as they are specified.
For example, I would be given a set of id numbers like: part-1 = 102263, 102338, 202236, 302342, 902273, 102337, 402233.
So, part-1 has seven groups, each group having several elements.
I use the id numbers in a merge so that only the groups of interest are extracted from the large data bank.
The following is what I have for one set:
### all.possible.elements.bank <- .csv file from large database ###
id.part.1 <- as.data.frame(c(102263, 102338, 202236, 302342, 902273, 102337, 402233))
bank.names <- c("bank.id")
colnames(id.part.1) <- bank.names
part.sort <- matrix(seq(1,nrow(id.part.1),1))
sort.part.1 <- cbind(id.part.1, part.sort)
final.part.1 <- as.data.frame(merge(sort.part.1, all.possible.elements.bank,
by="bank.id", all.x=TRUE))
The process above is repeated many, many times.
I know that I could do this for all of the collections that I would pull together, but I thought I would be able to wrap the selection process into a function. The only things that would change would be the part numbers (part-1, part-2, etc..) and the groups that are selected out.
It is possible using the assign function (and possibly deparse and substitute), but it is strongly discouraged to do things like this. Why can't you just return the data frame and call the function like:
new.part.a <- put.together(x)
That is generally the better approach.
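Concretely, for the workflow in the edit, the selection could be wrapped so the caller names the result; build.part here is a hypothetical helper, with all.possible.elements.bank loaded as in the question:
build.part <- function(ids, bank) {
  id.df <- data.frame(bank.id = ids)
  id.df$part.sort <- seq_len(nrow(id.df))
  merge(id.df, bank, by = "bank.id", all.x = TRUE)
}
final.part.1 <- build.part(c(102263, 102338, 202236, 302342,
                             902273, 102337, 402233),
                           all.possible.elements.bank)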
If you really want to change things in the global environment then you may want a macro; see the defmacro function in the gtools package, and most importantly read the document in the references section of the help page.
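As a sketch of the macro route (assuming gtools is installed): defmacro substitutes its arguments literally, so the assignment happens in the caller's environment under whatever name you pass in:
library(gtools)
put.together <- defmacro(name, numbers, expr = {
  name <- as.data.frame(numbers)  # 'name' is replaced by the symbol the caller passes
})
put.together(new.part.a, seq(1000, 1012))
head(new.part.a)  # now exists in the calling environment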
This is rarely something you should want to do... assigning to things out of the function environment can get you into all sorts of trouble.
However, you can do it using assign:
put.together <- function(suffix, numbers) {
  assign(paste('new',
               deparse(substitute(suffix)),  # recover the unevaluated argument name
               sep = '.'),
         as.data.frame(numbers),
         envir = parent.env(environment()))
}
put.together(part.a, 1:20)
But like Greg said, it's usually not necessary, and always dangerous if used incorrectly.

Efficient way to query an SQLite database with an R vector

I have a vector of values in R and want to get the corresponding values from an SQLite database. I use the following code.
values <- c()
for (a in keys) {
  result <- dbGetQuery(con, "SELECT content FROM aacontent WHERE Id = ?", a)
  values <- c(values, result)
}
Unfortunately, this code is very slow. Is there a more efficient way to do this?
Thanks,
Johannes
If aacontent isn't very large then read it all into R and use something like R's match function, or the sqldf function, or data.table functions
If aacontent is too large for that, and keys is small-ish, then write keys to an sqlite table and do a join query. You might benefit from creating an index on one or both of them.
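A sketch of that second approach with DBI/RSQLite (table and column names from the question; keys_tmp is a hypothetical staging table):
library(DBI)
# write the keys into a staging table, index it, then join against it
dbWriteTable(con, "keys_tmp", data.frame(Id = keys), overwrite = TRUE)
dbExecute(con, "CREATE INDEX IF NOT EXISTS idx_keys_tmp ON keys_tmp (Id)")
values <- dbGetQuery(con,
  "SELECT a.content FROM aacontent a INNER JOIN keys_tmp k ON a.Id = k.Id")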
There are certainly pre-built tools for SQL querying tasks like this from R (since you're using SQLite, I'd be sure to check out sqldf), but in my experience I just end up writing lots of little helper-wrapper functions for building queries.
For instance, in your case, your problem isn't really the R piece, it's that you want to roll all the values in keys into one query. So you'd want a query that looks more like:
SELECT content FROM aacontent WHERE Id IN (val1,val2,...)
and then the trick is using paste in R to build the IN clause. I tend to just use a simple wrapper function on dbGetQuery that uses the ... argument and paste to stitch queries together from various pieces. Something like this:
myQuery <- function(con, ...) {
  arg <- list(...)
  res <- dbGetQuery(con, paste(arg, collapse = ""))
  res
}
So that it's a bit easier to stitch together stuff when using IN clauses:
myQuery(con,"SELECT content FROM aacontent WHERE Id IN (",
paste(keys,collapse = ","),"))
Note that it's a bit harder if the values in keys are characters, since then you need to do some more work with paste to get single quotes around each element, but it's not that much more work.
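For character keys, the quoting might look something like this (assumes the keys themselves contain no single quotes):
in_clause <- paste0("'", keys, "'", collapse = ",")  # 'a','b','c'
myQuery(con, "SELECT content FROM aacontent WHERE Id IN (", in_clause, ")")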
This advice is all more relevant if the db in question is fairly small; if you're dealing with bigger data, Spacedman's suggestions are probably more worth looking into.
