I have a function wrapping RODBC::sqlQuery that takes a start and end date and returns 5 columns and roughly 1 million rows per call. I need to iterate through a list of about 60 dates, storing the function's resulting data frames in a list.
What I want to know is:
How to pass both start and end date arguments to the function in an apply-style fashion
How to store the resulting data frames neatly (like a table of |date|data.frame.pointer|)
Here's some of the code:
get.data <- function(date.start, date.end) { ... }
date.range <- seq(as.Date("2009-01-01"), Sys.Date(), by="1 month")
And sample output:
get.data(date.start="2009-01-01", date.end='2009-02-01')
date country oId eId pId
1 2009-01-01 Australia 12345 12345 12345
2 ... ... ... ... ...
Thank you for your help. I've been trying to figure out how to do this for hours to no avail.
For what you want, mapply will do the trick:
n <- length(date.range)
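mapply(get.data, date.range[-n], date.range[-1], SIMPLIFY = FALSE)
With SIMPLIFY = FALSE this returns a list whose elements are the individual values returned by get.data, so in this case you get a list of data frames. (With the default SIMPLIFY = TRUE, mapply tries to collapse results of equal length, and data frames with the same number of columns end up as a matrix of columns rather than a list of data frames.) A list may well be the most appropriate way of storing the output, but that depends on what you want to do with it.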
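As a minimal sketch of the pairing, here is the same idea run against a stand-in for get.data (the stub below is hypothetical and only returns a one-row data frame; the real function would query the database). Map is mapply with SIMPLIFY = FALSE, so each result stays a whole data frame, and naming the list by start date gives something close to the date-to-data-frame lookup asked for:

```r
# Hypothetical stand-in for the real get.data(), which would call RODBC::sqlQuery
get.data <- function(date.start, date.end) {
  data.frame(date.start = date.start, date.end = date.end, rows = 1L)
}

date.range <- seq(as.Date("2009-01-01"), as.Date("2009-04-01"), by = "1 month")
n <- length(date.range)

# Pair each start date with the date that follows it; Map always returns a list
results <- Map(get.data, date.range[-n], date.range[-1])

# Name each element by its start date so it can be looked up directly
names(results) <- format(date.range[-n])

results[["2009-02-01"]]
```

With roughly 1 million rows per call and about 60 calls, whether the whole list fits in memory is worth checking before running the full range.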
Hopefully this is not a duplicate; it is rather difficult to phrase correctly (I am relatively new to R).
So the problem is: I want to use sequences of dates excluding certain weekdays based on the row information. I can use bizdays and create calendar objects on the fly but it is quite inefficient - I would rather have them created before and use as needed. On the other side I do not want to create a calendar for every single object that can happen to occur (too many to bother, combination of all weekdays plus versions with/without holidays).
I can create a data frame with the list of dates between the start and end date for every row, but I need to provide a calendar with weekdays:
P <- setDT(R)[, list(ID=ID,
dt=bizseq(Start.Date,End.Date, cal)
), by=1:nrow(R)]
To provide a calendar I have to define it like
cal <- Calendar(weekdays=c("monday", "tuesday"))
Now a working dataset that shows what I am struggling with:
> M <-c(0,1,1,0)
> T <- c(1,1,1,0)
> W <- c(0,0,0,1)
> df <- data.frame(M,T,W)
> df$S <-paste0("c",df$M,df$T,df$W)
> udf <- unique(df)
> udf
M T W S
1 0 1 0 c010
2 1 1 0 c110
4 0 0 1 c001
Using udf, I would like to create a list of calendar objects that I can afterwards pass to bizseq using get(df$S), something along the lines of:
require(bizdays)
loop or apply?
.... <- Calendar(weekdays=c(ifelse(udf$M==0,"","monday"), ifelse(udf$T==0,"","tuesday"),ifelse(udf$W==0,"","wednesday")))
So now the right questions;) Firstly - is it the best approach? then if so - how to create these 3 objects under their names ("c101" etc), so for example the c100 will match the calendar with Monday on - it is not a question how to create a calendar as the method above works (it is enough to substitute the dots with the name), but how to create object c101 that would become a calendar if i create names in a dynamic way? I could imagine looping through the rows, but have no idea how to force the resulting object to be named udf$S. Unless you reckon there is any better method of providing the corresponding calendar than get() from a list of pre-created objects (for a dataframe with thousands of dates and combination of days off).
I would like basically to end up with 3 calendar objects named c010, c110, c001, but if the expanded table has more unique options to create all other combinations before i run the setDT() function
Afterthought: I can add ID to the udf and call the calendars by index and then return the index to df, but I wonder if it is possible to create dynamic names of objects just as I tried
NOTE
following Sathish's lead I used what seems sufficient:
for(i in 1:nrow(udf)) {
cal <- Calendar(weekdays=c(ifelse(udf[i,1]==0,"","monday"), ifelse(udf[i,2]==0,"","tuesday"),ifelse(udf[i,3]==0,"","wednesday")))
assign(udf[i,4], cal)
}
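An alternative to assign() and get(), sketched here under the assumption that a lookup by the key string is all that is needed: keep the per-key settings in one named list and index it with [[ ]]. To keep the sketch runnable without bizdays it stores only the weekday vectors; in real use each element would be wrapped in the Calendar(weekdays = ...) call shown above. The names day.names and weekday.sets are made up for the illustration.

```r
df <- data.frame(M = c(0, 1, 1, 0), T = c(1, 1, 1, 0), W = c(0, 0, 0, 1))
df$S <- paste0("c", df$M, df$T, df$W)
udf <- unique(df)

# Hypothetical helper: one weekday name per flag column, in column order
day.names <- c("monday", "tuesday", "wednesday")

# One list element per unique key; each holds the weekdays flagged with 1
weekday.sets <- lapply(seq_len(nrow(udf)), function(i) {
  day.names[unlist(udf[i, c("M", "T", "W")]) == 1]
})
names(weekday.sets) <- udf$S

weekday.sets[["c110"]]  # look up by key instead of get("c110")
# "monday" "tuesday"
```

A named list keeps the calendars out of the global environment and can be indexed with a whole column of keys, for example weekday.sets[df$S].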
I am trying to write a script that loops through month-end dates and compares associated fields, but I am unable to find a way to do this.
I have my data in a flatfile and subset based on 'TheDate'
For instance I have:
date.range <- subset(raw.data, observation_date == theDate)
Say TheDate = 2007-01-31
I want to find the next month included in my data flatfile which is 2007-02-28. How can I reference this in my loop?
I currently have:
date.range.t1 <- subset(raw.data, observation_date == theDate+1)
This obviously doesn't work, as my data is not daily.
EDIT:
To make it more clear, my data is like below
ticker observation_date Price
ADB 31/01/2007 1
ALS 31/01/2007 2
ALZ 31/01/2007 3
ADB 28/02/2007 2
ALS 28/02/2007 5
ALZ 28/02/2007 1
I am using a loop, so I want to skip from 31/01/2007 to 28/02/2007 by recognising that it is the next date, and use that value to subset my data.
First get unique values of date like so:
unique_dates<-unique(raw.data$observation_date)
Then sort these unique dates:
unique_dates_ordered<-unique_dates[order(as.Date(unique_dates, format="%Y-%m-%d"))]
Now you can subset based on the index of unique_dates_ordered i.e.
subset(raw.data, raw.data$observation_date == unique_dates_ordered[i])
Where i = 1 for the first value, i = 2 for the second value etc.
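Putting those steps together for the month-to-month comparison, a minimal sketch using the sample rows from the question. It assumes observation_date is stored as dd/mm/yyyy text, as in the EDIT, and converts it to Date first; the names current and following are made up for the illustration.

```r
raw.data <- data.frame(
  ticker = c("ADB", "ALS", "ALZ", "ADB", "ALS", "ALZ"),
  observation_date = c(rep("31/01/2007", 3), rep("28/02/2007", 3)),
  Price = c(1, 2, 3, 2, 5, 1)
)
raw.data$observation_date <- as.Date(raw.data$observation_date, format = "%d/%m/%Y")

unique_dates_ordered <- sort(unique(raw.data$observation_date))

# Walk over consecutive pairs: date i and the next date present in the data, i + 1
for (i in seq_len(length(unique_dates_ordered) - 1)) {
  current   <- subset(raw.data, observation_date == unique_dates_ordered[i])
  following <- subset(raw.data, observation_date == unique_dates_ordered[i + 1])
  # ... compare the fields of current and following here ...
}
```

Because the loop indexes the dates that actually occur, it does not matter that the gaps between them are uneven.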
I'm looking to sort/process some large data before I enter it into a function.
I have a large dataset of log readings, consisting of many unique addresses and timings.
The data looks a bit like this:
UNIQUE_ADDRESS1 24/08/2016 13:01
UNIQUE_ADDRESS2 24/08/2016 13:02
UNIQUE_ADDRESS3 24/08/2016 13:05
UNIQUE_ADDRESS1 25/08/2016 00:00
UNIQUE_ADDRESS2 25/08/2016 00:01
UNIQUE_ADDRESS3 25/08/2016 00:12
I am ultimately running a function that needs individual data frames consisting of a specific unique address and a specific date only.
The data frame will look like this, consisting of all rows which contain the specific unique address AND specific date.
dataframe1 <- [UNIQUE_ADDRESS1 24/08/2016 13:01,
UNIQUE_ADDRESS1 24/08/2016 13:03,
UNIQUE_ADDRESS1 24/08/2016 13:06,
UNIQUE_ADDRESS1 24/08/2016 13:08
... etc]
Where there will be a dataframe2 which corresponds to UNIQUE_ADDRESS1 again, but with all the timings within the date of 25/08/2016 instead. This will be done for each device.
I figured this needs to be done in a loop, but I can't get the syntax done correctly.
So far I am using grep to extract each unique address from the massive log file, like this, to create data frames with each device separately:
device1 <- logfile[grep("^UNIQUE_ADDRESS1", logfile[,2]), ]
Then I have created an array of dates:
dates <- c("23/09/2016", "24/08/2016", "25/08/2016")
I now want to create new data frames that combine each individual unique addresses and each date. So all the log readings for UNIQUE_ADDRESS1 on date 23/09/2016 in one data frame, then another for 24/08/2016, etc. The same for each UNIQUE_ADDRESS.
I've tried using grep and grepl, but when I have used them in an if loop or ifelse loop they claim that my dates are not in my device data frames (which they definitely are), and any value I try, it returns false no matter what.
Can anybody help me with how I can achieve my aim?
Thanks
Edit
At the moment I'm trying to do this in a for loop, where "device1" contains the unique addresses for device 1, etc. However, when I increment i it'll only save to the same data frame (device1) instead of a new data frame, which is what I need.
for (k in 1:6)
device1 <- device1[grep(dates[i], device1[,4]), ]
device2 <- device2[grep(dates[i], device1[,4]), ]
device3 <- device3[grep(dates[i], device1[,4]), ]
device4 <- device4[grep(dates[i], device1[,4]), ]
device5 <- device5[grep(dates[i], device1[,4]), ]
device6 <- device6[grep(dates[i], device1[,4]), ]
device7 <- device7[grep(dates[i], device1[,4]), ]
device8 <- device8[grep(dates[i], device1[,4]), ]
Assuming that you have your input data in a data frame, e.g.
> myTable
Var1 Var2 Var3
1 UNIQUE_ADDRESS1 24/08/2016 13:01
2 UNIQUE_ADDRESS2 24/08/2016 13:02
3 UNIQUE_ADDRESS3 24/08/2016 13:05
4 UNIQUE_ADDRESS1 25/08/2016 0:00
5 UNIQUE_ADDRESS2 25/08/2016 0:01
6 UNIQUE_ADDRESS3 25/08/2016 0:12
Consider using the dlply function of the R-package plyr.
library(plyr)
myList = dlply(myTable, ~ Var1 + Var2, .fun = identity)
Each element of the list myList will be one of your sub-tables, e.g.
> myList[[1]]
Var1 Var2 Var3
1 UNIQUE_ADDRESS1 24/08/2016 13:01
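If you would rather not depend on plyr, base R's split gives a similar list. This is a sketch assuming the same three-column layout as myTable above; drop = TRUE discards address/date combinations that have no rows.

```r
myTable <- data.frame(
  Var1 = rep(c("UNIQUE_ADDRESS1", "UNIQUE_ADDRESS2", "UNIQUE_ADDRESS3"), 2),
  Var2 = rep(c("24/08/2016", "25/08/2016"), each = 3),
  Var3 = c("13:01", "13:02", "13:05", "0:00", "0:01", "0:12")
)

# One data frame per (address, date) pair; names are "<address>.<date>"
myList <- split(myTable, list(myTable$Var1, myTable$Var2), drop = TRUE)

names(myList)
myList[["UNIQUE_ADDRESS1.25/08/2016"]]
```

The function you ultimately want to run can then be applied to every piece with lapply(myList, ...).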
I have a large data set in which I have to search for specific codes depending on what I want. For example, chemotherapy is coded by ~40 codes that can appear in any of 40 columns (diag1, diag2, etc.).
I am in the process of writing a function that produces plots depending on what I want to show. I thought it would be good to specify what I want to plot in a input data frame. Thus, for example, in case I only want to plot chemotherapy events for patients, I would have a data frame like this:
Dataframe name: Style
Name SearchIn codes PlotAs PlotColour
Chemo data[substr(names(data),1,4)=="diag"] 1,2,3,4,5,6 | red
I already have a function that searches for codes in specific parts of the data frame and flags the events of interest. What I cannot do, and need your help with, is referring to a data frame (Style$SearchIn[1]) using code stored in a data frame as above.
> Style$SearchIn[1]
[1] data[substr(names(data),1,4)=="diag"]
Levels: data[substr(names(data),1,4)=="diag"]
I thought perhaps get() would work, but I can't get it to work:
> get(Style$SearchIn[1])
Error in get(vars$SearchIn[1]) : invalid first argument
or
> get(as.character(Style$SearchIn[1]))
Error in get(as.character(Style$SearchIn[1])) :
object 'data[substr(names(data),1,5)=="TDIAG"]' not found
Obviously, running data[substr(names(data),1,5)=="TDIAG"] works.
Example:
library(survival)
ex <- data.frame(SearchIn="lung[substr(names(lung),1,2) == 'ph']")
lung[substr(names(lung),1,2) == 'ph'] #works
get(ex$SearchIn[1]) # does not work
It is not a good idea to store R code in strings and then try to eval them when needed; there are nearly always better solutions for dynamic logic, such as lambdas.
I would recommend using a list to store the plot specification, rather than a data.frame. This would allow you to include a function as one of the list's components which could take the input data and return a subset of it for plotting.
For example:
library(survival);
plotFromSpec <- function(data,spec) {
filteredData <- spec$filter(data);
## ... draw a plot from filteredData and other stuff in spec ...
};
spec <- list(
Name='Chemo',
filter=function(data) data[,substr(names(data),1,2)=='ph'],
Codes=c(1,2,3,4,5,6),
PlotAs='|',
PlotColour='red'
);
plotFromSpec(lung,spec);
If you want to store multiple specifications, you could create a list of lists.
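As a rough sketch of that list-of-lists idea: each specification carries its own filter function, and lapply walks over them. The toy data frame, the second specification and its column names are invented for the illustration, and the plotting itself is left out.

```r
dat <- data.frame(diag1 = 1:3, diag2 = 4:6, proc1 = 7:9)

specs <- list(
  Chemo = list(
    filter = function(data) data[, substr(names(data), 1, 4) == "diag", drop = FALSE],
    PlotColour = "red"
  ),
  Procedures = list(  # hypothetical second specification
    filter = function(data) data[, substr(names(data), 1, 4) == "proc", drop = FALSE],
    PlotColour = "blue"
  )
)

# Apply each specification's filter to the same input data
filtered <- lapply(specs, function(spec) spec$filter(dat))
names(filtered[["Chemo"]])
```

drop = FALSE keeps the result a data frame even when a filter matches a single column.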
Have you tried using quote()?
I'm not entirely sure what you want but maybe you could store the things you're trying to get() like
quote(data[substr(names(data),1,4)=="diag"])
and then use eval()
eval(quote(data[substr(names(data),1,4)=="diag"]), list(data=data))
For example,
dat <- data.frame("diag1"=1:10, "diag2"=1:10, "other"=1:10)
Style <- list(SearchIn=c(quote(data[substr(names(data),1,4)=="diag"]), quote("Other stuff")))
> head(eval(Style$SearchIn[[1]], list(data=dat)))
diag1 diag2
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
I have a tricky problem with applying a function to a list of data frames. Ultimately I want to plot individual time series charts for large data set of drug usage figures.
My dataset comprises 30 different antibiotics with a usage rate that has been collected monthly over a 5 year period. It has 3 columns and 1692 rows.
So far I have made a list of individual data frames for each antibiotic class. (The name of the list is drugList, and drug.class is a character vector of drug names from the original data frame.)
drugList <- list()
n<-length(drug.class)
for (i in 1:n){
drugList[[i]] <- AB[AB$Drug == drug.class[i], ]
}
For example, I have 30 data frames in a list with the following columns:
[[29]]
Drug Usage DateA
1353 Tobramycin 5.06 01-Jan-2006
1354 Tobramycin 4.21 01-Feb-2006
1355 Tobramycin 6.34 01-Mar-2006
.
.
.
Drug Usage DateA
678 Vancomycin 11.62 01-Jan-2006
679 Vancomycin 11.94 01-Feb-2006
680 Vancomycin 14.29 01-Mar-2006
Before each plot is made, a logical test is performed to determine if the time series is autocorrelated. The data frames in the list are of varying lengths.
I have written a function to perform the test as follows:
acTest <- function(){
id<-ts(1:length(DateA))
a1<-ts(Usage)
a2<-lag(a1-1)
tg<-ts.union(a1,id,a2)
mg<-lm(a1~a2+bs(id,df=3), data=tg)
a2Pval <- summary(mg)$coefficients[2, 4]
if (a2Pval<=0.05) {
TRUE
} else {
FALSE
}
}
I have previously tested all my functions on individual data frames and they work as expected.
I am trying to work out how to apply the test to each data frame in the drug list. I believe if I can get help working this out I will be in a position to apply the time series functions in the same manner.
Thanks in advance for any assistance offered.
A few suggestions:
Change your acTest function so that it actually accepts a data.frame as a parameter. Otherwise you'll have lots of problems with the function looking for (and modifying) objects named DateA and Usage in the global environment.
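library(splines)  # provides bs()
acTest <- function(dat){
  # Index series and usage series taken from the data frame passed in
  id<-ts(1:length(dat$DateA))
  a1<-ts(dat$Usage)
  a2<-lag(a1-1)
  tg<-ts.union(a1,id,a2)
  mg<-lm(a1~a2+bs(id,df=3), data=tg)
  # p-value of the lagged-usage coefficient (second row of the coefficient table)
  a2Pval <- summary(mg)$coefficients[2, 4]
  # TRUE when the lag term is significant at the 5% level, FALSE otherwise
  a2Pval<=0.05
}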
Applying a function to each element of a list is a common task in R. It is (most often) done using lapply.
lapply(drugList,FUN=acTest)
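Since acTest returns a single TRUE or FALSE, sapply or vapply may be handier than lapply here, because they collapse the results into one named logical vector. A minimal sketch with a stand-in test (hasRisingUsage is hypothetical and only replaces acTest so the example runs without the real data):

```r
AB <- data.frame(
  Drug  = rep(c("Tobramycin", "Vancomycin"), each = 3),
  Usage = c(5.06, 4.21, 6.34, 11.62, 11.94, 14.29)
)

# split() builds the same kind of list as the for loop, with drug names attached
drugList <- split(AB, AB$Drug)

# Hypothetical stand-in for acTest: takes one data frame, returns one logical
hasRisingUsage <- function(dat) all(diff(dat$Usage) > 0)

flags <- vapply(drugList, hasRisingUsage, logical(1))
flags
```

The names on flags make it easy to pick out which drugs passed, for example names(flags)[flags].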
Finally, you can do tasks like this without storing each data frame as a separate list element by using tools like ddply (among others) that split a data frame using one variable, apply a function to each piece and then reassemble them into a single data frame again. In your case, that would look something like:
ddply(AB,.(Drug),.fun = acTest)