Assigning names to a dataframe via a loop in R

There appear to be some similar questions, but I cannot quite get my head round them at this late hour.
I am trying to manipulate a set of data frames based on SQL calls, something like this:
x <- c(3,9,12) # x is of variable length in real world
for (i in 1:length(x)) {
nam <- paste("df",i, sep="")
assign(nam) <- sqlQuery(channel,paste(
"Select myCol from myTable where myVal =",x[i],sep=""));
}
So I am after data frames df1, df2, df3, which I can then combine, etc.
Andrie's answer below is perfect, but I am having trouble extending it to two variables:
myQuery <- function(t,x){
sqlQuery(channel,paste("Select myCol from myTable where myTextVal='",t,"' and myVal =", x, sep=""))
}
x <- c(3,9,12)
t <-c("00","10","12")
myData <- lapply(c(t,x), myQuery)
I am getting an 'Error in paste... argument "x" is missing, with no default'.
I'm not sure whether it is because there is a mix of numeric and character variables in the lapply vector, but applying as.numeric/as.character in the SQL statement did not seem to help.

The R idiom would be to use an apply-type function instead of a loop. The effect of this is that your resulting data object is a list; in this case, a list of data.frame objects.
Something like the following:
myQuery <- function(x){
sqlQuery(channel,paste("Select myCol from myTable where myVal =", x, sep=""))
}
x <- c(3,9,12)
t <- c("00","10","12")
myData <- lapply(c(t, x), myQuery)
You can then extract the individual data.frames with list subsetting:
myData[[1]]
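EDIT. The point is that lapply iterates over a single vector. Your instruction c(t, x) combines its inputs into one character vector of six elements, so lapply calls myQuery six times with one value each; it does not pair each element of t with the matching element of x. That is why myQuery, as written here, still takes a single input argument. To pass t and x together, myQuery needs two arguments and an iterator over both vectors, such as mapply.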
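A minimal offline sketch of the lapply pattern, assuming a hypothetical fake_query() in place of sqlQuery(channel, ...) (it only echoes the SQL back in a one-row data frame). It also names the list elements df1, df2, df3 and combines them with do.call(rbind, ...).

```r
# Hypothetical stand-in for sqlQuery(channel, sql): returns the SQL text
# in a one-row data frame so the example runs without a database.
fake_query <- function(sql) data.frame(query = sql, stringsAsFactors = FALSE)

# One-argument query function, as in the answer above
myQuery <- function(x) {
  fake_query(paste0("Select myCol from myTable where myVal = ", x))
}

x <- c(3, 9, 12)
myData <- lapply(x, myQuery)                  # list of three data frames
names(myData) <- paste0("df", seq_along(x))  # "df1", "df2", "df3"

myData[["df2"]]                               # one data frame, by name
combined <- do.call(rbind, myData)            # all results stacked into one data frame
```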

Well, the assign function needs both the name and the value as arguments:
assign(nam, sqlQuery(channel,paste("Select myCol from myTable where myVal =",x[i],sep="")))
Type ?assign to learn more...
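For completeness, here is the question's loop with assign() called correctly, again with a hypothetical fake_query() standing in for sqlQuery(). mget() then gathers df1, df2, df3 back into a named list, which is usually the more convenient form for combining them.

```r
# Hypothetical stand-in for sqlQuery(channel, sql)
fake_query <- function(sql) data.frame(query = sql, stringsAsFactors = FALSE)

x <- c(3, 9, 12)
for (i in seq_along(x)) {
  nam <- paste0("df", i)
  # assign(name, value): both the name and the value go inside the call
  assign(nam, fake_query(paste0("Select myCol from myTable where myVal = ", x[i])))
}

# Collect the objects df1, df2, df3 by name into a named list, then combine
all_dfs <- mget(paste0("df", seq_along(x)))
combined <- do.call(rbind, all_dfs)
```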

You need mapply:
myData <- mapply(myQuery, t, x, SIMPLIFY=FALSE)
But I think a better solution is to prepare the queries first:
queries <- sprintf(
"Select myCol from myTable where myTextVal='%s' and myVal=%i",
t, x) # here I assume that x is integer, see ?sprintf for other formats
queries
[1] "Select myCol from myTable where myTextVal='00' and myVal=3"
[2] "Select myCol from myTable where myTextVal='10' and myVal=9"
[3] "Select myCol from myTable where myTextVal='12' and myVal=12"
And then lapply over them:
myData <- lapply(queries, function(sql) sqlQuery(channel, sql))
# could be simplified to:
myData <- lapply(queries, sqlQuery, channel=channel)
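To see why lapply(c(t, x), ...) cannot work for two variables while mapply can, here is a small offline sketch; fake_query() is again a hypothetical stand-in for sqlQuery(). mapply() calls the two-argument myQuery once per (t, x) pair, and because t is a character vector it also uses t to name the results.

```r
# Hypothetical stand-in for sqlQuery(channel, sql)
fake_query <- function(sql) data.frame(query = sql, stringsAsFactors = FALSE)

myQuery <- function(t, x) {
  fake_query(paste0("Select myCol from myTable where myTextVal='", t,
                    "' and myVal=", x))
}

x <- c(3, 9, 12)
t <- c("00", "10", "12")

# c(t, x) is a single character vector of length 6, so lapply() would make
# six one-argument calls and never pair "00" with 3
flat <- c(t, x)

# mapply() walks t and x in parallel: myQuery("00", 3), myQuery("10", 9), ...
myData <- mapply(myQuery, t, x, SIMPLIFY = FALSE)
names(myData)   # taken from t
```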

Related

Save a dataframe name and then reference that object in subsequent code

Would like to reference a dataframe name stored in an object, such as:
dfName <- 'mydf1'
dfName <- data.frame(c(x = 5)) #want dfName to resolve to 'mydf1', not create a dataframe named 'dfName'
mydf1
Instead, I get: Error: object 'mydf1' not found
CORRECTED SCENARIO:
olddf <- data.frame(c(y = 8))
mydf1 <- data.frame(c(x = 5))
assign('dfName', mydf1)
dfName <- olddf #why isn't this the same as doing "mydf1 <- olddf"?
I don't want to reference an actual dataframe named "dfName", rather "mydf1".
UPDATE
I have found a clunky workaround for what I wanted to do. The code is:
olddf <- data.frame(x = 8)
olddfName <- 'olddf'
newdfName <- 'mydf1'
statement <- paste(newdfName, "<-", olddfName, sep = " ")
writeLines(statement, "mycode.R")
source("mycode.R")
Anyone have a more elegant way, especially without resorting to a write/source?
I am guessing you want to store multiple data.frames in a loop or similar. In that case it is generally better to store them in a named list. However, you can achieve your goal with assign:
assign('mydf1', data.frame(x = 5))
mydf1
x
1 5
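For the updated scenario (copying olddf to a name held in a string), get() and assign() together replace the write-and-source() workaround. A minimal sketch reusing the question's object names, with the named-list alternative alongside:

```r
olddf <- data.frame(x = 8)
olddfName <- "olddf"
newdfName <- "mydf1"

# get() looks up an object from its name; assign() binds a value to a name.
# This is the programmatic equivalent of typing: mydf1 <- olddf
assign(newdfName, get(olddfName))

# Named-list alternative: the string becomes a list element name instead of
# a variable in the workspace
dfs <- list()
dfs[[newdfName]] <- get(olddfName)
```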

R function for looping over a string with unique values

I am working on a project where I have to download more than 10 million records on a relatively small server. So instead of downloading the entire dataset at once, I have to download it in smaller sections. I am trying to create a loop that will fetch batches of the data based on date. I'm used to coding in Stata, where you can call a local by using `x' or some variant within a string, but I can't find a way to do this in R. Below is a small piece of the code I'm using. Whenever I run it, 'val' and 'val2' aren't replaced with the dates in the defined vectors, so the query literally searches between 'val' and 'val2' instead of between '20190101' and '20190301'. Any suggestions for how to fix this are greatly appreciated!
x<-c(20190101, 20190301)
y<-c(20190301, 20190501)
foreach (val=x, val2=y) %do% {
data<-DBI::dbGetQuery(myconn, "SELECT * FROM .... WHERE (DATE BETWEEN 'val' AND 'val2')")
}
With a basic loop:
x<-c(20190101, 20190301)
y<-c(20190301, 20190501)
data_all = c()
for(i in 1:length(x)){
query = paste0("SELECT * FROM .... WHERE (DATE BETWEEN '",
x[i], "' AND '", y[i], "')")
data <- DBI::dbGetQuery(myconn, query)
data_all = rbind(data_all, data)
}
With sprintf you can construct the query and use lapply + do.call to combine the results into one dataframe.
x<-c(20190101, 20190301)
y<-c(20190301, 20190501)
input <- sprintf("SELECT * FROM .... WHERE (DATE BETWEEN '%s' AND '%s')", x, y)
result <- do.call(rbind, lapply(input, function(x) DBI::dbGetQuery(myconn, x)))
Using purrr::map_df is a bit shorter.
result <- purrr::map_df(input, ~DBI::dbGetQuery(myconn, .x))
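The batch boundaries can also be generated rather than typed. A sketch assuming two-month windows starting 2019-01-01 (an assumption; adjust by and length.out to the real range), formatted as YYYYMMDD like the question's dates; fake_get_query() is a hypothetical stand-in for DBI::dbGetQuery(myconn, ...).

```r
# Two consecutive two-month windows (assumed range)
starts <- seq(as.Date("2019-01-01"), by = "2 months", length.out = 2)
ends   <- seq(as.Date("2019-03-01"), by = "2 months", length.out = 2)
x <- format(starts, "%Y%m%d")
y <- format(ends, "%Y%m%d")

# sprintf() substitutes the values; a variable name inside a quoted string is
# never interpolated in R, which is why 'val' stayed literal in the question
input <- sprintf("SELECT * FROM .... WHERE (DATE BETWEEN '%s' AND '%s')", x, y)

# Hypothetical stand-in for DBI::dbGetQuery(myconn, sql)
fake_get_query <- function(sql) data.frame(sql = sql, stringsAsFactors = FALSE)
result <- do.call(rbind, lapply(input, fake_get_query))
```

Note that SQL BETWEEN includes both endpoints, so with windows that share a boundary, rows dated 20190301 would be fetched by both batches; a half-open condition such as DATE >= '%s' AND DATE < '%s' avoids the overlap.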

How to lookup data and print values based on criteria in R?

So I have a CSV file with 12 columns of data, and I want to get specific values from it based on the desired criteria.
I have this list of maps:
Maps <- c("Nuke","Vertigo","Inferno","Mirage","Train","Overpass","Dust2")
The goal is to get the CTWinProb and TWinProb values for each of the maps in the Maps list, e.g.
CTWinProbs:
Nuke = 0.5758
Dust2 = 0.4965
Inferno = 0.4885
etc., and likewise for TWinProb.
So far I have been using the sqldf library, which is very tedious. This is what I am currently doing:
T1NukeCT <- sqldf("select CTWinProb from Team1 where MapName like '%Nuke%'")
which outputs T1NukeCT = 0.5758,
and I repeat this for each map and then again for TWinProb.
I am sure there is an easier way, but I am quite new to R, so I am not sure of the best method here or how to do this in a less tedious manner.
You may use a WHERE IN (...) clause:
Maps <- c("Nuke","Vertigo","Inferno","Mirage","Train","Overpass","Dust2")
where_in <- paste0("('", paste(Maps, collapse="','"), "')")
sql <- paste0("SELECT MapName, CTWinProb, TWinProb FROM Team1 WHERE MapName IN ", where_in)
T1Probs <- sqldf(sql)
To be clear, the SQL query generated by the above script is (MapName is selected so each probability can be matched to its map):
SELECT MapName, CTWinProb, TWinProb
FROM Team1
WHERE MapName IN ('Nuke','Vertigo','Inferno','Mirage','Train','Overpass','Dust2')
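If the data is already loaded in R, the %in% operator does the same job as WHERE ... IN without going through sqldf. A sketch on a small made-up Team1: the CTWinProb values are the ones quoted in the question, and TWinProb is assumed to be 1 - CTWinProb purely for illustration.

```r
# Illustrative toy data, not the real CSV
Team1 <- data.frame(
  MapName   = c("Nuke", "Dust2", "Inferno"),
  CTWinProb = c(0.5758, 0.4965, 0.4885),
  stringsAsFactors = FALSE
)
Team1$TWinProb <- 1 - Team1$CTWinProb   # assumption for the toy data only

Maps <- c("Nuke", "Vertigo", "Inferno", "Mirage", "Train", "Overpass", "Dust2")

# %in% plays the role of SQL's WHERE MapName IN (...)
subset_rows <- Team1[Team1$MapName %in% Maps, c("MapName", "CTWinProb", "TWinProb")]
```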
What output/results are you looking for exactly?
If you want results in R, these are two simple functions to return the desired values.
They require the dplyr package (for filter and pull) and the readr package (for read_csv) to be loaded.
library(dplyr)
library(readr)
YourData <- read_csv("./yourfile.csv")
CTWinFunc <- function(x){
YourData %>% filter(MapName == x) %>% pull(CTWinProb)}
TWinFunc <- function(x){
YourData %>% filter(MapName == x) %>% pull(TWinProb)}
Now CTWinFunc("Nuke") should return the CTWinProb result for Nuke, i.e. 0.5758,
and TWinFunc("Nuke") should return the TWinProb result for Nuke, i.e. 0.4242.
If you want to return a vector with all the results together, you could use the sapply() function, something like this:
TWins <- sapply(Maps, TWinFunc)
TWins[lengths(TWins)==0] <- NA
TWins <- unlist(TWins)
And this should give you a matrix with the results:
cbind(Maps, TWins)
Of course, all this data already seems to be in the original table, so you could just subset it:
YourData[,c(4,11,12)]
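Another base R option is a named vector used as a lookup table: index it by map name, and maps missing from the data come back as NA. A sketch on made-up data (CTWinProb values from the question, TWinProb assumed to be 1 - CTWinProb for illustration only):

```r
Team1 <- data.frame(
  MapName   = c("Nuke", "Dust2", "Inferno"),
  CTWinProb = c(0.5758, 0.4965, 0.4885),
  stringsAsFactors = FALSE
)
Team1$TWinProb <- 1 - Team1$CTWinProb   # illustrative assumption only

Maps <- c("Nuke", "Vertigo", "Inferno", "Mirage", "Train", "Overpass", "Dust2")

# Named vectors act as lookup tables: ct["Nuke"] gives Nuke's CTWinProb
ct <- setNames(Team1$CTWinProb, Team1$MapName)
tw <- setNames(Team1$TWinProb, Team1$MapName)

# Names not present (e.g. "Vertigo") give NA, so the output has one row per map
result <- data.frame(Map = Maps,
                     CTWinProb = unname(ct[Maps]),
                     TWinProb  = unname(tw[Maps]),
                     stringsAsFactors = FALSE)
```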

argument "second_str" is missing with no default

I have written a function which finds the intersection between two strings. I want to use this function in apply to find all the intersections in the given data frame, using the code below.
Function:
common <- function(first_str,second_str)
{
a <- unlist(strsplit(first_str," "))
b <- unlist(strsplit(second_str," "))
com <- intersect(a,b)
return((length(com)/length(union(a,b)))*100)
}
Data frame:
str1 <- c("One Two Three","X Y Z")
str2 <- c("One Two Four", "X Y A")
df <- data.frame(str1, str2)
When I use apply, I get the error argument "second_str" is missing, with no default:
apply(df, 1, common)
Could you please help me out with the solution?
apply() will only pass a single vector to the function you provide. With MARGIN = 1 it will call your function once for each row, with a single vector containing all the values of the "current" row. It will not split those values into multiple parameters of your function.
You could instead rewrite your function as:
common2 <- function(x) {
first_str <- x[1]
second_str <- x[2]
a <- unlist(strsplit(first_str," "))
b <- unlist(strsplit(second_str," "))
com <- intersect(a,b)
return((length(com)/length(union(a,b)))*100)
}
That doesn't scale well to many parameters, though. You could also use Map or mapply to iterate over multiple vectors at a time.
With your original function you can do:
with(df, Map(common, str1, str2))
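A self-contained check of the mapply route with the question's function and data. stringsAsFactors = FALSE is set explicitly because under R versions before 4.0 data.frame() would otherwise create factor columns, which strsplit() rejects. Unlike Map(), mapply() simplifies the result to a numeric vector here.

```r
common <- function(first_str, second_str) {
  a <- unlist(strsplit(first_str, " "))
  b <- unlist(strsplit(second_str, " "))
  com <- intersect(a, b)
  # share of words common to both strings, as a percentage of all distinct words
  (length(com) / length(union(a, b))) * 100
}

str1 <- c("One Two Three", "X Y Z")
str2 <- c("One Two Four", "X Y A")
df <- data.frame(str1, str2, stringsAsFactors = FALSE)

# mapply() calls common(df$str1[i], df$str2[i]) for each row i
scores <- mapply(common, df$str1, df$str2, USE.NAMES = FALSE)
scores
```

Each pair shares two of four distinct words, so both scores are 50.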

R: How to do a loop on a list and output different dataframes

I'm attempting to create a loop in R that will use a vector of dates, run them through a loop that includes a SQL query, and then generate a separate dataframe for each output. Here is as far as I've gotten:
library(RODBC)
dvect <- as.Date("2015-04-13") + 0:2
d <- list()
for(i in list(dvect)){
queryData <- sqlQuery(myconn, paste("SELECT
WQ_hour,
sum(calls) as calls
FROM database
WHERE DDATE = '", i,"'
GROUP BY 1
", sep = ""))
d[i] <- rbind(d, queryData)
}
From what I can tell, the query portion of the code runs fine since I've tested it by itself. Where I'm stumbling is the last line where I try to save the contents of each loop through the query separately with each having a label of the date that was used in the loop.
I'd appreciate any help. I've only been using R consistently for about 2 months now so I'm definitely open to alternative ways of doing this that are cleaner and more efficient.
Thanks.
I'd suggest making the SQL query a function, and use lapply to apply it and return your result as a list.
userSQLquery = function(i) {
sqlQuery(myconn, paste("SELECT
WQ_hour,
sum(calls) as calls
FROM database
WHERE DDATE = '", i,"'
GROUP BY 1
", sep = ""))
}
dvect = as.Date("2015-04-13") + 0:2
d = as.list(dvect)  # pass the dates themselves, not the indices 1:3
names(d) = dvect
results = lapply(d, userSQLquery)
I have very little experience with SQL though, so this may not work. Maybe it could start you off?
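A runnable sketch of the same idea without a database: fake_query_for_date() is a hypothetical stand-in for userSQLquery() and ignores its argument. The list is named by date, and the per-date data frames are then stacked with a DDATE column recording where each row came from.

```r
dvect <- as.Date("2015-04-13") + 0:2

# Hypothetical stand-in for userSQLquery(): returns fixed dummy rows
fake_query_for_date <- function(day) {
  data.frame(WQ_hour = 1:2, calls = c(10L, 20L))
}

# as.list() keeps the Date class; the names make results[["2015-04-14"]] work
results <- lapply(setNames(as.list(dvect), as.character(dvect)),
                  fake_query_for_date)

# Stack the per-date results, tagging each block of rows with its date
combined <- do.call(rbind, Map(function(df, day) cbind(DDATE = day, df),
                               results, names(results)))
```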
Looks like a job for lapply instead of a for loop. (In R it's often cleaner to replace an explicit for loop with an apply-family function.)
If you want each date to return a separate data frame, and then have each data frame labelled with the original date, try:
dates <- c("Jan 1", "Oct 31", "Dec 25")
queryData <- function(date){
#dummy data
return(runif(5))
}
results <- lapply(dates, queryData)
names(results) <- dates
Assuming your loop runs over indices, e.g. for (i in seq_along(dvect)), either use:
d[[i]] <- queryData
if you want each data.frame (query result) as a separate element in the list output d.
Or use:
d <- rbind(d, queryData)
if you want a single data.frame with all the query outputs combined. In this case you should declare d as a data.frame (i.e. d <- data.frame()).
You can also store each data.frame (i.e. the query result) with its corresponding date in a list as:
d[[i]] <- list(date = dvect[[i]], queryResult = queryData)
I think the last one is what you are looking for.
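Putting the loop fix together, a sketch that iterates over indices so that d[[i]] and dvect[[i]] line up; fake_query_for_date() is a hypothetical stand-in for the sqlQuery() call in the question.

```r
dvect <- as.Date("2015-04-13") + 0:2

# Hypothetical stand-in for sqlQuery(myconn, ...)
fake_query_for_date <- function(day) {
  data.frame(WQ_hour = 1:2, calls = c(10L, 20L))
}

d <- vector("list", length(dvect))   # one slot per date
names(d) <- as.character(dvect)      # label each slot with its date

for (i in seq_along(dvect)) {
  queryData <- fake_query_for_date(dvect[[i]])
  d[[i]] <- list(date = dvect[[i]], queryResult = queryData)
}
```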
