Reshaping data for panel regression from Datastream - r

I have downloaded data from Datastream in the form of one variable per sheet.
(Image: current data view, one variable: Price)
What I want to do is to convert each sheet (each variable) into panel format so that I can use plm() or export the data to Stata (I am kind of new to R), so that it looks like this:
(Image: what I expect to have)
One complication is that I have >500 companies, and manually writing the names (or codes) in the R code is very burdensome.
I would really appreciate it if you could sketch some basic code rather than just refer me to the reshape function in R.
P.S. Sorry if this question has already been answered.

Your current data set is in wide format and you need it in long format; the melt function from the reshape2 package does this well.
The id variable for melt is date, since it is the same for all companies.
I have assumed a test dataset for the below demo:
# Save price, volume, market value, shares, etc. into individual CSV files
# Rename the first column to "date" and remove rows 2 and 3, since you do not need them
# With your own data, read the price CSV and use it as test_DF:
# price_data = read.csv("path_to_price_csv_file", header=TRUE, stringsAsFactors=FALSE, na.strings="NA")
# test_DF = price_data
# Demo: the managers dataset from PerformanceAnalytics stands in for real price data
library(reshape2)
library(PerformanceAnalytics)
data(managers)
test_DF = data.frame(date=as.Date(index(managers)), managers, row.names=NULL, stringsAsFactors=FALSE)
# This data is similar in format to your price data
head(test_DF)
# date HAM1 HAM2 HAM3 HAM4 HAM5 HAM6 EDHEC.LS.EQ SP500.TR US.10Y.TR US.3m.TR
# 1 1996-01-31 0.0074 NA 0.0349 0.0222 NA NA NA 0.0340 0.00380 0.00456
# 2 1996-02-29 0.0193 NA 0.0351 0.0195 NA NA NA 0.0093 -0.03532 0.00398
# 3 1996-03-31 0.0155 NA 0.0258 -0.0098 NA NA NA 0.0096 -0.01057 0.00371
# 4 1996-04-30 -0.0091 NA 0.0449 0.0236 NA NA NA 0.0147 -0.01739 0.00428
# 5 1996-05-31 0.0076 NA 0.0353 0.0028 NA NA NA 0.0258 -0.00543 0.00443
# 6 1996-06-30 -0.0039 NA -0.0303 -0.0019 NA NA NA 0.0038 0.01507 0.00412
#test_data = test_DF #replace price, volume , shares dataset here
#dateColumnName = "date" #name of your date column
#columnOfInterest1 = "manager" #for you this will be "Name"
#columnOfInterest2 = "return" #this will vary according to your input data, price, volume, shares etc.
Custom_Melt_DataFrame = function(test_data = test_DF, dateColumnName = "date", columnOfInterest1 = "manager", columnOfInterest2 = "return") {
  # wide -> long: every non-date column is stacked under a single value column
  molten_DF = melt(test_data, id.vars = dateColumnName)
  colnames(molten_DF) = c(dateColumnName, columnOfInterest1, columnOfInterest2)
  # melt returns the former column names as a factor; convert them to character
  molten_DF[, columnOfInterest1] = as.character(molten_DF[, columnOfInterest1])
  # numeric id per company: melt stacks the columns in order, nrow(test_data) rows each
  molten_DF$index = rep(1:(ncol(test_data)-1), each = nrow(test_data))
  # reorder columns
  molten_DF = molten_DF[, c("index", columnOfInterest1, dateColumnName, columnOfInterest2)]
  return(molten_DF)
}
custom_data = Custom_Melt_DataFrame (test_data = test_DF ,dateColumnName = "date", columnOfInterest1 = "manager",columnOfInterest2 = "return")
head(custom_data,10)
# index manager date return
# 1 1 HAM1 1996-01-31 0.0074
# 2 1 HAM1 1996-02-29 0.0193
# 3 1 HAM1 1996-03-31 0.0155
# 4 1 HAM1 1996-04-30 -0.0091
# 5 1 HAM1 1996-05-31 0.0076
# 6 1 HAM1 1996-06-30 -0.0039
# 7 1 HAM1 1996-07-31 -0.0231
# 8 1 HAM1 1996-08-31 0.0395
# 9 1 HAM1 1996-09-30 0.0147
# 10 1 HAM1 1996-10-31 0.0288
tail(custom_data,10)
# index manager date return
# 1311 10 US.3m.TR 2006-03-31 0.00385
# 1312 10 US.3m.TR 2006-04-30 0.00366
# 1313 10 US.3m.TR 2006-05-31 0.00404
# 1314 10 US.3m.TR 2006-06-30 0.00384
# 1315 10 US.3m.TR 2006-07-31 0.00423
# 1316 10 US.3m.TR 2006-08-31 0.00441
# 1317 10 US.3m.TR 2006-09-30 0.00456
# 1318 10 US.3m.TR 2006-10-31 0.00381
# 1319 10 US.3m.TR 2006-11-30 0.00430
# 1320 10 US.3m.TR 2006-12-31 0.00441
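If you prefer not to depend on reshape2, here is a minimal base R sketch of the same wide-to-long step, extended to several variables. It assumes each Datastream sheet has been saved as a wide data frame with a `date` column and one column per company code; the names `price`, `volume`, `to_long` and the toy values are hypothetical. The company codes come from the column headers, so nothing has to be typed by hand even with >500 companies, and merging on company and date combines the variables into one panel.

```r
# Hypothetical wide sheets: one row per date, one column per company code
price <- data.frame(date = as.Date(c("2020-01-31", "2020-02-29")),
                    AAA = c(10, 11), BBB = c(20, 21))
volume <- data.frame(date = as.Date(c("2020-01-31", "2020-02-29")),
                     AAA = c(100, 110), BBB = c(200, 210))

# Wide -> long: stack the company columns under one value column
to_long <- function(wide, value_name, date_col = "date") {
  companies <- setdiff(names(wide), date_col)   # company codes from the headers
  long <- data.frame(
    company = rep(companies, each = nrow(wide)),
    date    = rep(wide[[date_col]], times = length(companies)),
    stringsAsFactors = FALSE
  )
  # unlist() concatenates the columns in the same order as rep(..., each = )
  long[[value_name]] <- unlist(wide[companies], use.names = FALSE)
  long
}

# One row per company-date pair, one column per variable
panel <- merge(to_long(price, "price"), to_long(volume, "volume"),
               by = c("company", "date"))
panel <- panel[order(panel$company, panel$date), ]
panel
```

The result is in the company/date layout that plm() works with (for example through its `index = c("company", "date")` argument), and it can also be exported for Stata.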

Related

R: data.table using a for loop to wrangle multiple columns

I am currently working in R to build a for loop which will add the year to 7 columns that contain partial dates (dd/mm). I have been attempting to run the following for-loop and have not been successful. What am I doing wrong?
Here's a sample of what my data set looks like (The actual data set includes columns HomDate - HomDate_7 but I only included the first few as I know you'll get the point...)
Participant DateVisit HomDate HomDate_2 HomeDate_3 year_flag
1 2012-04-25 18/04 19/04 20/04 NA
2 2012-01-04 28/12 29/12 30/12 1
3 2012-01-05 31/12 01/01 01/02 1
4 2012-06-13 06/06 07/06 08/06 NA
5 2012-02-12 05/02 06/02 07/02 NA
Here's the code I've been trying to use:
hom_date <- list("HomDate", "HomDate_2", "HomDate_3", "HomDate_4", "HomDate_5", "HomDate_6",
"HomDate_7")
set_dates <- function(x){
home_morbid[,x:=as.character(x)]
home_morbid[(substr(x, 4, 5)==12) & (year_flag==1), x:=paste(x, "/2011", sep="")]
home_morbid[(substr(x, 4, 5)==01) & (year_flag==1), x:=paste(x, "/2012", sep="")]
home_morbid[is.na(year_flag), x:=paste(x, "/", substr(DateVisit, 1, 4), sep="")]
}
for(i in 1:length(hom_date)){
x <- hom_date[i]
home_morbid_2<-set_dates(x)
}
I'm not sure what happens to those with an NA flag. Here is an approach:
to_replace<-grep("^Hom",names(df))
df[,(to_replace):=lapply(.SD, function(x) ifelse(is.na(year_flag),x,
ifelse(substr(x, 4, 5)==12,
paste0(x,"/","2011"),
paste0(x,"/","2012")))),
.SDcols=HomDate:HomeDate_3][]
Participant DateVisit HomDate HomDate_2 HomeDate_3 year_flag
1: 1 2012-04-25 18/04 19/04 20/04 NA
2: 2 2012-01-04 28/12/2011 29/12/2011 30/12/2011 1
3: 3 2012-01-05 31/12/2011 01/01/2012 01/02/2012 1
4: 4 2012-06-13 06/06 07/06 08/06 NA
5: 5 2012-02-12 05/02 06/02 07/02 NA
To replace NA flagged years with the year from DateVisit:
library(lubridate)
to_replace<-grep("^Hom",names(df))
df[,(to_replace):=lapply(.SD, function(x) ifelse(is.na(year_flag),
paste0(x,"/",year(ymd(DateVisit))),
ifelse(substr(x, 4, 5)==12,
paste0(x,"/","2011"),
paste0(x,"/","2012")))),
.SDcols=HomDate:HomeDate_3][]
Participant DateVisit HomDate HomDate_2 HomeDate_3 year_flag
1: 1 2012-04-25 18/04/2012 19/04/2012 20/04/2012 NA
2: 2 2012-01-04 28/12/2011 29/12/2011 30/12/2011 1
3: 3 2012-01-05 31/12/2011 01/01/2012 01/02/2012 1
4: 4 2012-06-13 06/06/2012 07/06/2012 08/06/2012 NA
5: 5 2012-02-12 05/02/2012 06/02/2012 07/02/2012 NA
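The loop in the question fails for a few reasons. Inside data.table, `x := ...` creates a column literally named x instead of using the column name stored in x (that needs `(x) :=` on the left and `get(x)` on the right); `hom_date[i]` is a one-element list rather than a string (`hom_date[[i]]` gives the string); and `substr(x, 4, 5) == 01` compares the text "01" with "1", which is FALSE. Below is a minimal base R sketch of the loop with those points fixed, on a plain data.frame copy of the sample rows (only two HomDate columns) and with `add_year` as a hypothetical helper; it applies the same year rules as the answer above.

```r
# Sample rows from the question as a plain data.frame (two of the HomDate columns)
home_morbid <- data.frame(
  Participant = 1:5,
  DateVisit = as.Date(c("2012-04-25", "2012-01-04", "2012-01-05",
                        "2012-06-13", "2012-02-12")),
  HomDate   = c("18/04", "28/12", "31/12", "06/06", "05/02"),
  HomDate_2 = c("19/04", "29/12", "01/01", "07/06", "06/02"),
  year_flag = c(NA, 1, 1, NA, NA),
  stringsAsFactors = FALSE
)

hom_date <- c("HomDate", "HomDate_2")  # character vector: each element is a plain string

add_year <- function(v, year_flag, visit) {
  flagged <- !is.na(year_flag) & year_flag == 1
  # compare the month as text: substr() returns "01", and "01" == 01 is FALSE in R
  yr <- ifelse(!flagged, format(visit, "%Y"),
               ifelse(substr(v, 4, 5) == "12", "2011", "2012"))
  paste0(v, "/", yr)
}

for (col in hom_date) {
  # [[col]] looks up the column whose name is stored in col
  home_morbid[[col]] <- add_year(home_morbid[[col]], home_morbid$year_flag,
                                 home_morbid$DateVisit)
}
home_morbid
```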

Mutate while accessing value in list column in a pipe with map and pluck

I would like to achieve the following:
For each row in data frame orders, filter data frame catalogs based on multiple columns of orders, and store the result in a list column of orders (this part works).
Calculate the difference between a date in data frame orders and another date in the new list column.
Table s_orders contains order data for different people (account keys). Table s_catalogs contains all catalogs that were sent to each account key.
For each order, I want to know:
whether any catalogs, and which ones, were sent between the previous order (or the beginning) and the day before the focal order. More specifically, consumers received a (paper) catalog at s_catalogs$CATDATE. I want to know for each order which catalogs were received between the previous order (s_orders$PREVORDER) and the current order. Because some consumers do not have a previous order, I set the previous order date startdate to date("1999-12-31"), which is the beginning of my dataset.
Then I want to do some calculations on the catalog data (in this example: calculate the difference between the date of a catalog and the order date).
For this, I have written a function getCatalogs, which takes the account key and two dates as input and outputs a data frame with the results from the other table. A better, more efficient solution, maybe with some sort of join, would be much appreciated.
I think my main problem is how to combine mutate, pmap, pipes and pluck to build complex queries on multiple tables.
My actual problem is outlined in sections Desired result and Problem.
# packages needed
library("dplyr")
library("lubridate")
library("purrr")
#library("tidyverse")
Example data
(I sampled some users from my data; s_ stands for 'sample'.)
# orders
s_orders <- structure(list(ACCNTKEY = c(2806, 2806, 2806, 3729, 3729, 3729,
3729, 4607, 4607, 4607, 4607, 4742, 11040, 11040, 11040, 11040,
11040, 17384), ORDDATE = structure(c(11325, 11703, 11709, 11330,
11375, 11384, 12153, 11332, 11445, 11589, 11713, 11333, 11353,
11429, 11662, 11868, 11960, 11382), class = "Date")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -18L))
# # A tibble: 18 x 2
# ACCNTKEY ORDDATE
# <dbl> <date>
# 1 2806 2001-01-03
# 2 2806 2002-01-16
# 3 2806 2002-01-22
# 4 3729 2001-01-08
# 5 3729 2001-02-22
# 6 3729 2001-03-03
# 7 3729 2003-04-11
# 8 4607 2001-01-10
# 9 4607 2001-05-03
# 10 4607 2001-09-24
# 11 4607 2002-01-26
# 12 4742 2001-01-11
# 13 11040 2001-01-31
# 14 11040 2001-04-17
# 15 11040 2001-12-06
# 16 11040 2002-06-30
# 17 11040 2002-09-30
# 18 17384 2001-03-01
# catalogs
s_catalogs <- structure(list(ACCNTKEY = c("2806", "2806", "4607", "2806", "4607",
"4607", "4607"), CATDATE = structure(c(11480, 11494, 11522, 11858,
11886, 12264, 12250), class = "Date"), CODE = c("2806/07/2001",
"2806/21/2001", "4607/19/2001", "2806/20/2002", "4607/18/2002",
"4607/31/2003", "4607/17/2003")), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
# # A tibble: 7 x 3
# ACCNTKEY CATDATE CODE
# <chr> <date> <chr>
# 1 2806 2001-06-07 2806/07/2001
# 2 2806 2001-06-21 2806/21/2001
# 3 4607 2001-07-19 4607/19/2001
# 4 2806 2002-06-20 2806/20/2002
# 5 4607 2002-07-18 4607/18/2002
# 6 4607 2003-07-31 4607/31/2003
# 7 4607 2003-07-17 4607/17/2003
calculate the lagged order date
# calculate previous order date for each order in s_orders
s_orders<-s_orders %>%
group_by(ACCNTKEY) %>%
arrange(ORDDATE) %>%
mutate(PREVORDER=as_date(lag(ORDDATE)))
So now we know the previous order (if any)
Function getCatalogs (improvement appreciated)
So the getCatalogs function below returns a data frame with the catalogs received by that account key from startdate (inclusive) up to, but not including, enddate, i.e. between the previous order and the focal order; it returns NA when there are none.
# in case _startdate_ is missing then I set it to some starting value
getCatalogs<-function(key,startdate,enddate){
if(is.na(startdate)){
startdate<-as_date(date("1999-12-31"))
}
tmp <- s_catalogs[s_catalogs$ACCNTKEY==key &
s_catalogs$CATDATE<enddate &
s_catalogs$CATDATE>=startdate,]
if (NROW(tmp)>0){
return(tmp)
}else{return(NA)}
}
Use the function
Let's get all catalogs for each order in a list column.
# For each row in s_orders search in dataframe s_catalogs all catalogs that were received for that account key before the order date but after the previous order.
s_orders <- s_orders %>% as_tibble() %>%
mutate(catalogs =
pmap(c(list(ACCNTKEY),list(PREVORDER),list(ORDDATE)),.f= function(x,y,z){getCatalogs(x,y,z)}))
For example, this line gets the date of the latest catalog for row 13, which is what I need:
s_orders %>% pluck("catalogs") %>% pluck(13) %>% pluck("CATDATE") %>% max()
# [1] "2001-06-21"
Desired result:
Now I would like to retrieve the number of days between the above date and the date of the order (ORDDATE). The following code does exactly that, but the result is only correct in row 13.
# get amount of days since last catalog
s_orders3 <- s_orders %>%
mutate(diff = ORDDATE - s_orders %>%
pluck("catalogs") %>% pluck(13) %>% pluck("CATDATE") %>% max())
# # A tibble: 18 x 5
# ACCNTKEY ORDDATE PREVORDER catalogs diff
# <dbl> <date> <date> <list> <time>
# 1 2806 2001-01-03 NA <lgl [1]> -169 days
# 2 3729 2001-01-08 NA <lgl [1]> -164 days
# 3 4607 2001-01-10 NA <lgl [1]> -162 days
# 4 4742 2001-01-11 NA <lgl [1]> -161 days
# 5 11040 2001-01-31 NA <lgl [1]> -141 days
# 6 3729 2001-02-22 2001-01-08 <lgl [1]> -119 days
# 7 17384 2001-03-01 NA <lgl [1]> -112 days
# 8 3729 2001-03-03 2001-02-22 <lgl [1]> -110 days
# 9 11040 2001-04-17 2001-01-31 <lgl [1]> -65 days
# 10 4607 2001-05-03 2001-01-10 <lgl [1]> -49 days
# 11 4607 2001-09-24 2001-05-03 <tibble [1 × 3]> 95 days
# 12 11040 2001-12-06 2001-04-17 <lgl [1]> 168 days
# 13 2806 2002-01-16 2001-01-03 <tibble [2 × 3]> 209 days
# 14 2806 2002-01-22 2002-01-16 <lgl [1]> 215 days
# 15 4607 2002-01-26 2001-09-24 <lgl [1]> 219 days
# 16 11040 2002-06-30 2001-12-06 <lgl [1]> 374 days
# 17 11040 2002-09-30 2002-06-30 <lgl [1]> 466 days
# 18 3729 2003-04-11 2001-03-03 <lgl [1]> 659 days
Check manually:
date("2002-01-16")-date("2001-06-21")
# Time difference of 209 days
Problem
However, the code subtracts the same date from the order date in every row. I want it to use the date that belongs to each particular row.
So the problem is how to replace %>% pluck(13) %>% with something that does this for every row and puts the result in the diff column.
I am really searching for a solution that uses purrr, dplyr, or some other package that is just as efficient and clear.
Hoping that I have understood the question correctly, here is my attempt. I changed the getCatalogs function to return only the latest CATDATE if there is one, and NA otherwise.
library(dplyr)
library(purrr)
getCatalogs<-function(key,startdate,enddate){
if(is.na(startdate)) startdate<- as.Date("1999-12-31")
tmp <- s_catalogs$CATDATE[s_catalogs$ACCNTKEY==key &
s_catalogs$CATDATE<enddate &
s_catalogs$CATDATE>=startdate]
if (length(tmp) > 0) max(tmp) else NA
}
s1_orders<- s_orders %>%
group_by(ACCNTKEY) %>%
arrange(ORDDATE) %>%
mutate(PREVORDER=lag(ORDDATE))
and then use pmap like this:
s1_orders %>%
mutate(catalogs = pmap_dbl(list(ACCNTKEY,PREVORDER,ORDDATE), getCatalogs),
catalogs = as.Date(catalogs, origin = "1970-01-01"),
diff = ORDDATE - catalogs)
# ACCNTKEY ORDDATE PREVORDER catalogs diff
# <dbl> <date> <date> <date> <drtn>
# 1 2806 2001-01-03 NA NA NA days
# 2 3729 2001-01-08 NA NA NA days
# 3 4607 2001-01-10 NA NA NA days
# 4 4742 2001-01-11 NA NA NA days
# 5 11040 2001-01-31 NA NA NA days
# 6 3729 2001-02-22 2001-01-08 NA NA days
# 7 17384 2001-03-01 NA NA NA days
# 8 3729 2001-03-03 2001-02-22 NA NA days
# 9 11040 2001-04-17 2001-01-31 NA NA days
#10 4607 2001-05-03 2001-01-10 NA NA days
#11 4607 2001-09-24 2001-05-03 2001-07-19 67 days
#12 11040 2001-12-06 2001-04-17 NA NA days
#13 2806 2002-01-16 2001-01-03 2001-06-21 209 days
#14 2806 2002-01-22 2002-01-16 NA NA days
#15 4607 2002-01-26 2001-09-24 NA NA days
#16 11040 2002-06-30 2001-12-06 NA NA days
#17 11040 2002-09-30 2002-06-30 NA NA days
#18 3729 2003-04-11 2001-03-03 NA NA days
Update
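Without changing the original getCatalogs function (which returns either a tibble or NA), we can test the length of each catalogs element: a three-column tibble has length 3, while NA has length 1.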
s1_orders %>%
mutate(catalogs = pmap(list(ACCNTKEY,PREVORDER,ORDDATE), getCatalogs),
temp = map_dbl(catalogs, ~if (length(.x) > 1)
.x %>% pluck("CATDATE") %>% max else NA),
temp = as.Date(temp, origin = "1970-01-01"),
diff = ORDDATE - temp)
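The question also asks about a join. Here is a base R sketch of that idea, assuming plain data.frames instead of tibbles; `orders` and `catalogs` are reduced copies of the sample data (accounts 2806 and 4607 only), with the catalog key converted to numeric so both tables share the same key type. Every order is joined to all catalogs of the same account, catalogs dated in [previous order, order date) are kept, and the latest one per order is used for the difference.

```r
# Reduced copies of s_orders / s_catalogs from the question (accounts 2806 and 4607)
orders <- data.frame(
  ACCNTKEY = c(2806, 2806, 2806, 4607, 4607, 4607, 4607),
  ORDDATE  = as.Date(c("2001-01-03", "2002-01-16", "2002-01-22",
                       "2001-01-10", "2001-05-03", "2001-09-24", "2002-01-26"))
)
catalogs <- data.frame(
  ACCNTKEY = as.numeric(c("2806", "2806", "4607", "2806", "4607", "4607", "4607")),
  CATDATE  = as.Date(c("2001-06-07", "2001-06-21", "2001-07-19", "2002-06-20",
                       "2002-07-18", "2003-07-31", "2003-07-17"))
)

# Previous order per account (NA for the first one), then the window start
orders <- orders[order(orders$ACCNTKEY, orders$ORDDATE), ]
prev <- ave(as.numeric(orders$ORDDATE), orders$ACCNTKEY,
            FUN = function(d) c(NA, head(d, -1)))
orders$PREVORDER <- as.Date(prev, origin = "1970-01-01")
orders$START <- orders$PREVORDER
orders$START[is.na(orders$START)] <- as.Date("1999-12-31")
orders$row <- seq_len(nrow(orders))

# Join on the account key, then keep catalogs with START <= CATDATE < ORDDATE
j <- merge(orders, catalogs, by = "ACCNTKEY")
j <- j[j$CATDATE >= j$START & j$CATDATE < j$ORDDATE, ]

# Latest catalog per order; orders without a matching catalog get NA
last_cat <- tapply(as.numeric(j$CATDATE), j$row, max)
hit <- match(orders$row, as.integer(names(last_cat)))
orders$LASTCAT <- as.Date(as.vector(last_cat)[hit], origin = "1970-01-01")
orders$diff <- orders$ORDDATE - orders$LASTCAT
orders
```

This join materialises one row per order-catalog pair within each account, so memory grows with the number of catalogs per account; for large tables a non-equi join (for example in data.table) avoids building those pairs.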

How to pull last nth trading day of month in XTS in R?

This question is closely related to Pull nth Day of Month in XTS in R, in which I got a good answer:
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)
do.call(rbind, lapply(split(x, "months"), function(x) x[10]))
# Open High Low Close
# 2007-01-11 49.88529 50.23910 49.88529 50.23910
# 2007-02-10 50.68923 50.72696 50.60707 50.69562
# 2007-03-10 49.79370 49.88984 49.70385 49.88698
# 2007-04-10 49.55704 49.78776 49.55704 49.76984
# 2007-05-10 48.83479 48.84549 48.38001 48.38001
# 2007-06-10 47.74899 47.74899 47.28685 47.28685
However, I want to pull the nth-to-last trading day of each month. For example, I have a data frame that looks like this, but it spans several years.
date change open high low close volume
1 1990-01-02 1.780 353.40 359.69 351.98 359.69 162070000
2 1990-01-03 -0.259 359.69 360.59 357.89 358.76 192330000
3 1990-01-04 -0.861 358.76 358.76 352.89 355.67 177000000
4 1990-01-05 -0.976 355.67 355.67 351.35 352.20 158530000
5 1990-01-08 0.451 352.20 354.24 350.54 353.79 140110000
6 1990-01-09 -1.179 353.83 354.17 349.61 349.62 155210000
7 1990-01-10 -0.661 349.62 349.62 344.32 347.31 175990000
8 1990-01-11 0.351 347.31 350.14 347.31 348.53 154390000
9 1990-01-12 -2.468 348.53 348.53 339.49 339.93 183880000
10 1990-01-15 -0.862 339.93 339.94 336.57 337.00 140590000
11 1990-01-16 1.113 337.00 340.75 333.37 340.75 186070000
12 1990-01-17 -0.983 340.77 342.01 336.26 337.40 170470000
13 1990-01-18 0.234 337.40 338.38 333.98 338.19 178590000
14 1990-01-19 0.284 338.19 340.48 338.19 339.15 185590000
15 1990-01-22 -2.586 339.14 339.96 330.28 330.38 148380000
16 1990-01-23 0.372 330.38 332.76 328.67 331.61 179300000
17 1990-01-24 -0.407 331.61 331.71 324.17 330.26 207830000
I want a function that extracts the nth-to-last trading day of every month and returns a new data frame. For example, if I extract the second-to-last trading day of every month, the output should look like the table below.
date change open high low close volume
1990-01-30 -0.683 325.20 325.73 319.83 322.98 186030000
1990-02-27 0.484 328.68 331.94 328.47 330.26 152590000
1990-03-29 -0.354 342.00 342.07 339.77 340.79 132190000
1990-04-27 -1.144 332.92 333.57 328.71 329.11 130630000
Note that I want to extract the last nth data point of each month, rather than the last nth calendar date.
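You could use tail. Note that tail(x, n) keeps the last n rows of each month, as in the output below; to keep only the nth-from-last row (one row per month), use head(tail(x, n), 1) inside the function instead.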
n <- 2
do.call(rbind, lapply(split(x, "months"), function(x) tail(x, n)))
# Open High Low Close
# 2007-01-30 49.85477 50.02180 49.77242 50.02180
# 2007-01-31 50.07049 50.22578 50.07049 50.22578
# 2007-02-27 50.74333 50.78909 50.61874 50.69206
# 2007-02-28 50.69435 50.77091 50.59881 50.77091
# 2007-03-30 48.74562 49.00218 48.74562 48.93546
# 2007-03-31 48.95616 49.09728 48.95616 48.97490
# 2007-04-29 49.30289 49.30289 49.05676 49.13529
# 2007-04-30 49.13825 49.33974 49.11500 49.33974
# 2007-05-30 47.78866 47.93267 47.78866 47.83291
# 2007-05-31 47.82845 47.84044 47.73780 47.73780
# 2007-06-29 47.63629 47.77563 47.61733 47.66471
# 2007-06-30 47.67468 47.94127 47.67468 47.76719
Data
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)
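The question's data is a data.frame with a date column rather than an xts object. A base R sketch for that case, assuming the column is called `date` and holds Date values; `last_nth_per_month` and the weekday-only toy data (no holiday calendar) are hypothetical. Within each calendar month it picks position length - n + 1, so it returns exactly one row per month: the nth from the end.

```r
# Toy daily data: weekdays only (no holiday calendar), Jan-Apr 1990
dates <- seq(as.Date("1990-01-01"), as.Date("1990-04-30"), by = "day")
wday <- as.POSIXlt(dates)$wday              # 0 = Sunday, 6 = Saturday
dates <- dates[!wday %in% c(0, 6)]
df <- data.frame(date = dates, close = seq_along(dates))

# Return the nth-from-last row of each calendar month (months with < n rows are skipped)
last_nth_per_month <- function(df, n, date_col = "date") {
  month_key <- format(df[[date_col]], "%Y-%m")
  rows <- lapply(split(seq_len(nrow(df)), month_key), function(idx) {
    idx <- idx[order(df[[date_col]][idx])]  # make sure rows are in date order
    if (length(idx) >= n) idx[length(idx) - n + 1] else integer(0)
  })
  df[unlist(rows, use.names = FALSE), , drop = FALSE]
}

res <- last_nth_per_month(df, 2)
res
```

On this toy calendar the selected dates coincide with the ones in the question's expected output (1990-01-30, 1990-02-27, 1990-03-29, 1990-04-27), since none of those months has a holiday near its end.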

How to convert daywise(daily) data to monthly data using R? [duplicate]

This question already has answers here: Aggregate Daily Data to Month/Year intervals. Closed 7 years ago.
I have daily interest-rate data for 15 years, from 01-01-2000 to 01-01-2015.
I want to convert this data to monthly data, with only month and year.
I want to take the mean of the values of all the days in a month and make it the single value for that month.
How can I do this in R?
> str(mibid)
'data.frame': 4263 obs. of 6 variables:
$ Days: int 1 2 3 4 5 6 7 8 9 10 ...
$ Date: Date, format: "2000-01-03" "2000-01-04" "2000-01-05" "2000-01-06" ...
$ BID : num 8.82 8.82 8.88 8.79 8.78 8.8 8.81 8.82 8.86 8.78 ...
$ I.S : num 0.092 0.0819 0.0779 0.0801 0.074 0.0766 0.0628 0.0887 0.0759 0.073 ...
$ BOR : num 9.46 9.5 9.52 9.36 9.33 9.37 9.42 9.39 9.4 9.33 ...
$ R.S : num 0.0822 0.0817 0.0828 0.0732 0.084 0.0919 0.0757 0.0725 0.0719 0.0564 ...
> head(mibid)
Days Date BID I.S BOR R.S
1 1 2000-01-03 8.82 0.0920 9.46 0.0822
2 2 2000-01-04 8.82 0.0819 9.50 0.0817
3 3 2000-01-05 8.88 0.0779 9.52 0.0828
4 4 2000-01-06 8.79 0.0801 9.36 0.0732
5 5 2000-01-07 8.78 0.0740 9.33 0.0840
6 6 2000-01-08 8.80 0.0766 9.37 0.0919
>
I'd do this with xts:
set.seed(21)
mibid <- data.frame(Date=Sys.Date()-100:1,
BID=rnorm(100, 8, 0.1), I.S=rnorm(100, 0.08, 0.01),
BOR=rnorm(100, 9, 0.1), R.S=rnorm(100, 0.08, 0.01))
require(xts)
# convert to xts
xmibid <- xts(mibid[,-1], mibid[,1])
# aggregate
agg_xmibid <- apply.monthly(xmibid, colMeans)
# convert back to data.frame
agg_mibid <- data.frame(Date=index(agg_xmibid), agg_xmibid, row.names=NULL)
head(agg_mibid)
# Date BID I.S BOR R.S
# 1 2015-04-30 8.079301 0.07189111 9.074807 0.06819096
# 2 2015-05-31 7.987479 0.07888328 8.999055 0.08090253
# 3 2015-06-30 8.043845 0.07885779 9.018338 0.07847999
# 4 2015-07-31 7.990822 0.07799489 8.980492 0.08162038
# 5 2015-08-07 8.000414 0.08535749 9.044867 0.07755017
A small example of how this might be done using dplyr and lubridate
set.seed(321)
dat <- data.frame(day=seq.Date(as.Date("2010-01-01"), length.out=200, by="day"),
x = rnorm(200),
y = rexp(200))
head(dat)
day x y
1 2010-01-01 1.7049032 2.6286754
2 2010-01-02 -0.7120386 0.3916089
3 2010-01-03 -0.2779849 0.1815379
4 2010-01-04 -0.1196490 0.1234461
5 2010-01-05 -0.1239606 2.2237404
6 2010-01-06 0.2681838 0.3217511
require(dplyr)
require(lubridate)
dat %>%
mutate(year = year(day),
monthnum = month(day),
month = month(day, label=TRUE)) %>%
group_by(year, month) %>%
arrange(year, monthnum) %>%
select(-monthnum) %>%
summarise(x = mean(x),
y = mean(y))
Source: local data frame [7 x 4]
Groups: year
year month x y
1 2010 Jan 0.02958633 0.9387509
2 2010 Feb 0.07711820 1.0985411
3 2010 Mar -0.06429982 1.2395438
4 2010 Apr -0.01787658 1.3627864
5 2010 May 0.19131861 1.1802712
6 2010 Jun -0.04894075 0.8224855
7 2010 Jul -0.22410057 1.1749863
Another option is data.table, which has several very convenient date-time functions. Using the data from Sam Thomas's answer:
library(data.table)
setDT(dat)[, lapply(.SD, mean), by=.(year(day), month(day))]
this gives:
year month x y
1: 2010 1 0.02958633 0.9387509
2: 2010 2 0.07711820 1.0985411
3: 2010 3 -0.06429982 1.2395438
4: 2010 4 -0.01787658 1.3627864
5: 2010 5 0.19131861 1.1802712
6: 2010 6 -0.04894075 0.8224855
7: 2010 7 -0.22410057 1.1749863
On the data from Joshua Ulrich's answer:
setDT(mibid)[, lapply(.SD, mean), by=.(year(Date), month(Date))]
gives:
year month BID I.S BOR R.S
1: 2015 5 7.997178 0.07794925 8.999625 0.08062426
2: 2015 6 8.034805 0.07940600 9.019823 0.07823314
3: 2015 7 7.989371 0.07822263 8.996015 0.08195401
4: 2015 8 8.010541 0.08364351 8.982793 0.07748399
If you want the names of the months instead of numbers, use months() instead of month(); here the Date column is also converted with [, Date:=as.IDate(Date)] after the setDT() part:
setDT(mibid)[, Date:=as.IDate(Date)][, lapply(.SD, mean), by=.(year(Date), months(Date))]
Note: especially on larger datasets, data.table will probably be (a lot) faster than the other two solutions.
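Without extra packages, base R's aggregate() can group by a year-month key built with format(). A minimal sketch on a hypothetical four-day data frame that uses the question's column names (only BID and BOR, to keep it short):

```r
# Hypothetical daily data spanning a month boundary
daily <- data.frame(
  Date = as.Date(c("2000-01-30", "2000-01-31", "2000-02-01", "2000-02-02")),
  BID  = c(8.80, 8.90, 9.00, 9.20),
  BOR  = c(9.40, 9.60, 9.50, 9.70)
)

# One group per calendar month, labelled "YYYY-MM"; each value is the monthly mean
monthly <- aggregate(daily[c("BID", "BOR")],
                     by = list(month = format(daily$Date, "%Y-%m")),
                     FUN = mean)
monthly
```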

Calling a list of tickers in quantmod using R

I want to get some data from a list of Chinese stocks using quantmod.
The list is like below:
002705.SZ -- 002730.SZ (some tickers in this range do not correspond to any stock; for example, there is no stock called 002720.SZ)
300357.SZ -- 300402.SZ
603188.SS
603609.SS
603288.SS
603306.SS
603369.SS
I want to write a loop to run all these stocks to get the data from each of them and save them into one data frame.
This should get you started.
library(quantmod)
library(stringr) # for str_pad
stocks <- paste(str_pad(2705:2730,width=6,side="left",pad="0"),"SZ",sep=".")
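get.stock <- function(s) {
  # try() returns an object of class "try-error" instead of stopping when a fetch fails
  s <- try(Cl(getSymbols(s, auto.assign=FALSE)), silent=TRUE)
  if (inherits(s, "xts")) return(s)  # class(s) is c("xts", "zoo"), so test with inherits()
  return(NULL)
}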
result <- do.call(cbind,lapply(stocks,get.stock))
head(result)
# X002705.SZ.Close X002706.SZ.Close X002707.SZ.Close X002708.SZ.Close X002709.SZ.Close X002711.SZ.Close X002712.SZ.Close X002713.SZ.Close
# 2014-01-21 15.25 27.79 NA 17.26 NA NA NA NA
# 2014-01-22 14.28 28.41 NA 16.56 NA NA NA NA
# 2014-01-23 13.65 27.78 33.62 15.95 19.83 NA 36.58 NA
# 2014-01-24 15.02 30.56 36.98 17.55 21.81 NA 40.24 NA
# 2014-01-27 14.43 31.26 40.68 18.70 23.99 26.34 44.26 NA
# 2014-01-28 14.18 30.01 44.75 17.66 25.57 28.97 48.69 NA
This takes advantage of the fact that try(...) returns either the xts object from getSymbols(...), or an object of class "try-error" (holding the error message) if the fetch fails.
Note that cbind(...) for xts objects aligns according to the index, so it acts like merge(...).
This produces an xts object, not a data frame. To convert this to a data.frame, use:
result.df <- data.frame(date=index(result),result)
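The answer only builds the 002705 to 002730 range. A base R sketch of the full ticker vector from the question, using sprintf() zero-padding in place of stringr::str_pad:

```r
# Shenzhen ranges from the question, zero-padded to six digits
sz <- sprintf("%06d.SZ", c(2705:2730, 300357:300402))
# Individual Shanghai tickers listed in the question
ss <- paste0(c("603188", "603609", "603288", "603306", "603369"), ".SS")
stocks <- c(sz, ss)
head(stocks)
```

This vector can be passed to lapply(stocks, get.stock) as in the answer; judging by the answer's output, codes with no data come back as NULL and do not appear as columns after do.call(cbind, ...).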
