How can I ignore a data set if some column names don't exist in it?
I have a list of weather data from a stream but I think certain key weather conditions don't exist and therefore I have this error below with rbind:
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
My code:
weatherDf <- data.frame()
for(i in weatherData) {
# Get the airport code.
airport <- i$airport
# Get the date.
date <- as.POSIXct(as.numeric(as.character(i$timestamp))/1000, origin="1970-01-01", tz="UTC-1")
# Get the data in dailysummary only.
dailySummary <- i$dailysummary
weatherDf <- rbind(weatherDf, ldply(
list(dailySummary),
function(x) c(airport, format(as.Date(date), "%Y-%m-%d"), x[["meanwindspdi"]], x[["meanwdird"]], x[["meantempm"]], x[["humidity"]])
))
}
So how can I make sure these key conditions below exist in the data:
meanwindspdi
meanwdird
meantempm
humidity
If any of them does not exit, then ignore the bunch of them. Is it possible?
EDIT:
The content of weatherData is in jsfiddle (I can't post it here as it is too long and I dunno where is the best place to show the data publicly for R...)
EDIT 2:
I get some error when I try to export the data into a txt:
> write.table(weatherData,"/home/teelou/Desktop/data/data.txt",sep="\t",row.names=FALSE)
Error in data.frame(date = list(pretty = "January 1, 1970", year = "1970", :
arguments imply differing number of rows: 1, 0
What does it mean? It seems that there are some errors in the data...
EDIT 3:
I have exported my entire data in .RData to my google drive:
https://drive.google.com/file/d/0B_w5RSQMxtRSbjdQYWJMX3pfWXM/view?usp=sharing
If you use RStudio, then you can just import the data.
EDIT 4:
target_names <- c("meanwindspdi", "meanwdird", "meantempm", "humidity")
# If it has data then loop it.
if (!is.null(weatherData)) {
# Initialize a data frame.
weatherDf <- data.frame()
for(i in weatherData) {
if (!all(target_names %in% names(i)))
next
# Get the airport code.
airport <- i$airport
# Get the date.
date <- as.POSIXct(as.numeric(as.character(i$timestamp))/1000, origin="1970-01-01", tz="UTC-1")
# Get the data in dailysummary only.
dailySummary <- i$dailysummary
weatherDf <- rbind(weatherDf, ldply(
list(dailySummary),
function(x) c(airport, format(as.Date(date), "%Y-%m-%d"), x[["meanwindspdi"]], x[["meanwdird"]], x[["meantempm"]], x[["humidity"]])
))
}
# Rename column names.
colnames(weatherDf) <- c("airport", "key_date", "ws", "wd", "tempi", 'humidity')
# Convert certain columns weatherDf type to numberic.
columns <-c("ws", "wd", "tempi", "humidity")
weatherDf[, columns] <- lapply(columns, function(x) as.numeric(weatherDf[[x]]))
}
Inspect the weatherDf:
> View(weatherDf)
Error in .subset2(x, i, exact = exact) : subscript out of bounds
You can use next to skip the current iteration of the loop and go to the next iteration:
target_names <- c("meanwindspdi", "meanwdird", "meantempm", "humidity")
for(i in weatherData) {
if (!all(target_names %in% names(i)))
next
# continue with loop...
I am trying to download bulk Oanda forex data using quantmod::getSymbols. The help file states that you can download only 500 days worth of data per request whereas I get a warning about a cap of 5 years worth of data from warnings(). Nevertheless, I tried to create a loop to download data from 1997 until this date. This is my code:
library(xts)
library(quantmod)
date_from = c("1996-01-01", "2001-01-02", "2005-01-03", "2009-01-03", "2013-01-04")
date_to = c("2001-01-01", "2005-01-02", "2009-01-03", "2013-01-03", "2016-01-04")
for (i in 1:5) {
getSymbols("EUR/AUD", src="oanda", from = dates_from[i], to = date_to[i])
forex = for (i=1) EURAUD else NULL
final_Dataset<- rbind(c(forex, EURAUD))
}
What changes should I implement?
Edit 1
I made it work but it is sloppily written. Any proposed changes would be much appreciated.
date_from = c("1996-01-01", "2001-01-02", "2005-01-03", "2009-01-03", "2013-01-04")
date_to = c("2001-01-01", "2005-01-02", "2009-01-03", "2013-01-03", "2016-01-04")
forex = vector(mode = 'list', length = 5)
for (i in 1:5) {
getSymbols("EUR/AUD", src="oanda", from = dates_from[i], to = date_to[i])
forex[[i]] = EURAUD
}
EUR_AUD = Reduce(rbind,forex)
You can do this by looping over a vector of dates that are 500 days apart. Notice that I wrapped the getSymbols call in try because the first 2 starting dates did not work. I'm not sure why.
require(quantmod)
Data <- do.call(rbind, lapply(dates, function(d) {
sym <- "EUR/AUD"
x <- try(getSymbols(sym, src="oanda", from=d, to=d+499, auto.assign=FALSE))
if (inherits(x, "try-error"))
return(NULL)
else
return(x)
}))
The R package "termstrc", designed for term-structure estimation, is an incredibly useful tool, but it requires data to be set in a particularly awkward format: lists within lists.
Question: What is the best way to prepare and shape data, either outside R or inside R, in order to create the repeated sublist format required to run the function "dyncouponbonds"?
The "dyncouponbonds" command requires data to be set in a repeated sublist, whereby a list of bonds and time-invariant features of those bonds (let's call this "bondlist"), is appended with some time t features of those bonds (price and accrued interest), and replicated for time t+1 to T.
Below is an example of the list format for one period. The "dyncouponbonds" command requires this format to be replicated, within an umbrella list, for all T periods. ISIN, MATURITYDATE, ISSUEDATE, COUPONRATE will be identical for each period. PRICE, ACCRUED, CASHFLOWS and TODAY will be different for each period.
R> str(govbonds$GERMANY)
List of 8
$ ISIN : chr [1:52] "DE0001141414" "DE0001137131" "DE0001141422" ...
$ MATURITYDATE:Class 'Date' num [1:52] 13924 13952 13980 14043 ...
$ ISSUEDATE :Class 'Date' num [1:52] 11913 13215 12153 13298 ...
$ COUPONRATE : num [1:52] 0.0425 0.03 0.03 0.0325 ...
$ PRICE : num [1:52] 100 99.9 99.8 99.8 ...
$ ACCRUED : num [1:52] 4.09 2.66 2.43 2.07 ...
$ CASHFLOWS :List of 3
..$ ISIN: chr [1:384] "DE0001141414" "DE0001137131" "DE0001141422" ...
..$ CF : num [1:384] 104 103 103 103 ...
..$ DATE:Class 'Date' num [1:384] 13924 13952 13980 14043 ...
$ TODAY :Class 'Date' num 13908
This a fairly advanced data manipulation question. R has many powerful data manipulation tools and you're not going to need to move away from R to prepare the (admittedly fairly obtuse) dyncouponbonds object. Indeed you actually shouldn't, because taking a structure from another language and then turning into dyncouponbonds will simply be more work.
The first thing I would make sure is that you are very familiar with the lapply function. You're going to be making plenty of use of it. You're going to be using it to create a list of couponbonds objects, which is what dyncouponbonds actually is. Creating couponbonds objects however is a little tougher, mainly because of the CASHFLOWS sublist which wants each cashflow associated with the bond's ISIN and with the date of the cashflow. For this you'll use lapply and some fairly advanced subscripting. The subset function will also come in handy.
This question also very much depends on where you will be getting the data from, and getting it out of Bloomberg is non-trivial, mainly because you will need to go back in history using the BDS function and "DES_CASH_FLOW" field for each bond to get its cashflows. I say history, because if you're using dyncouponbonds I'm assuming you will want to do historic yield curve analysis. You'll need to override the BDS function's "SETTLE_DT" field, to the value that you will have received for the bond using the BDP function and field "FIRST_SETTLE_DT", so that you get all the cashflows from the beginning of the bond's life (otherwise it'll only return from today, and that's no good for historic analysis). But I digress. If you're not using bloomberg I don't know where you'll get this data from.
You'll then need to get the static data for each bond, namely the maturity, the ISIN, and the coupon rate and the issue date. And you'll need historic price and accrued interest data. Again if using bloomberg, you'll use the BDP function for this with fields you'll see in the code, below, and the historic data function BDH which I have wrapped as bbdh. Assuming again that you're a bloomberg user, here is the code:
bbGetCountry <- function(cCode, up = FALSE) {
# this function is going to get all the data out of bloomberg that we need for a
# country, and update it if ncessary
if (up == TRUE) startDate <- as.Date("2012-01-01") else startDate <- histStartDate
# first get all the curve members for history
wdays <- wdaylist(startDate, Sys.Date()) # create the list of working days from startdate
actives <- lapply(wdays, function(x) {
bds(conn, BBcurveIDs[cCode], "CURVE_MEMBERS", override_fields = "CURVE_DATE",
override_values = format(x, "%Y%m%d"))
})
names(actives) <- wdays
uniqueActives <- unique(unlist(actives)) # there will be puhlenty duplicates. Get rid of them
# now get the unchanging bond data
staticData <- bdp(conn, uniqueActives, bbStaticDataFields)
# now get the cash flowdata
cfData <- lapply(uniqueActives, function(x) {
bds(conn, x, "DES_CASH_FLOW_ADJ", override_fields = "SETTLE_DT",
override_values = format(as.Date(staticData[x, "FIRST_SETTLE_DT"]), "%Y%m%d"))
})
names(cfData) <- uniqueActives
# now for historic data
historicData <- lapply(bbHistoricDataFields, function(x) bbdh(uniqueActives, flds = x, startDate = startDate))
names(historicData) <- bbHistoricDataFields # put the names in otherwise we get a numbered list
allDates <- as.Date(index(historicData$LAST_PRICE)) # all the dates we will find settlement dates for for all bonds. No posix
save(actives, file = paste("data/", cCode, "actives.dat", sep = "")) #save all the files now
save(staticData, file = paste("data/", cCode, "staticData.dat", sep = ""))
save(cfData, file = paste("data/", cCode, "cfData.dat", sep = ""))
save(historicData, file = paste("data/", cCode, "historicData.dat", sep = ""))
#save(settleDates, file = paste("data/", cCode, "settleDates.dat", sep = ""))
assign(paste(cCode, "data", sep = ""), list(actives = actives, staticData = staticData, cfData = cfData, #
historicData = historicData), pos = 1)
}
the bbdh function I use above is wrapper around the Rbbg library's bdh function and looks like this:
bbdh <- function(secs, years = 1, flds = "last_price", startDate = NULL) {
#this function gets secs over years from bloomberg daily data
if(is.null(startDate)) startDate <- Sys.Date() - years * 365.25
if(class(startDate) == "Date") stardDate <- format(startDate, "%Y%m%d") #convert date classes to bb string
if(nchar(startDate) > 8) startDate <- format(as.Date(startDate), "%Y%m%d") # if we've been passed wrong format character string
rawd <- bdh(conn, secs, flds, startDate, always.display.tickers = TRUE, include.non.trading.days = TRUE,
option_names = c("nonTradingDayFillOption", "nonTradingDayFillMethod"),
option_values = c("NON_TRADING_WEEKDAYS", "PREVIOUS_VALUE"))
rawd <- dcast(rawd, date ~ ticker) #put into columns
colnames(rawd) <- sub(" .*", "", colnames(rawd)) #remove the govt, currncy bits from bb tickers
return(xts(rawd[, -1], order.by = as.POSIXct(rawd[, 1])))
}
The country code comes from a structure which associates two letter names with bloomberg yield curve descriptions:
BBcurveIDs <- list(PO = "YCGT0084 Index", #Portugal
DE = "YCGT0016 Index",
FR = "YCGT0014 Index",
SP = "YCGT0061 Index",
IT = "YCGT0040 Index",
AU = "YCGT0001 Index", #Australia
AS = "YCGT0063 Index", #Austria
JP = "YCGT0018 Index",
GB = "YCGT0022 Index",
HK = "YCGT0095 Index",
CA = "YCGT0007 Index",
CH = "YCGT0082 Index",
NO = "YCGT0078 Index",
SE = "YCGT0021 Index",
IR = "YCGT0062 Index",
BE = "YCGT0006 Index",
NE = "YCGT0020 index",
ZA = "YCGT0090 Index",
PL = "YCGT0177 Index", #Poland
MX = "YCGT0251 Index")
So bbGetCountry will create 4 different data structures, called actives, staticData, dynamicData, and historicData, all from the following bloomberg fields:
bbStaticDataFields <- c("ID_ISIN",
"ISSUER",
"COUPON",
"CPN_FREQ",
"MATURITY",
"CALC_TYP_DES", # pricing calculation type
"INFLATION_LINKED_INDICATOR", # N or Y, in R returned as TRUE or FALSE
"ISSUE_DT",
"FIRST_SETTLE_DT",
"PX_METHOD", # PRC or YLD
"PX_DIRTY_CLEAN", # market convention dirty or clean
"DAYS_TO_SETTLE",
"CALLABLE",
"MARKET_SECTOR_DES",
"INDUSTRY_SECTOR",
"INDUSTRY_GROUP",
"INDUSTRY_SUBGROUP")
bbDynamicDataFields <- c("IS_STILL_CALLABLE",
"RTG_MOODY",
"RTG_MOODY_WATCH",
"RTG_SP",
"RTG_SP_WATCH",
"RTG_FITCH",
"RTG_FITCH_WATCH")
bbHistoricDataFields <- c("PX_BID",
"PX_ASK",
#"PX_CLEAN_BID",
#"PX_CLEAN_ASK",
"PX_DIRTY_BID",
"PX_DIRTY_ASK",
#"ASSET_SWAP_SPD_BID",
#"ASSET_SWAP_SPD_ASK",
"LAST_PRICE",
#"SETTLE_DT",
"YLD_YTM_MID")
Now you're ready to create couponbond objects, using all these data structures:
createCouponBonds <- function(cCode, dateString) {
cdata <- get(paste(cCode, "data", sep = "")) # get the data set
today <- as.Date(dateString)
settleDate <- today
daycount <- 0
while(daycount < 3) {
settleDate <- settleDate + 1
if (!(weekdays(settleDate) %in% c("Saturday", "Sunday"))) daycount <- daycount + 1
}
goodbonds <- subset(cdata$staticData, COUPON != 0 & INFLATION_LINKED_INDICATOR == FALSE) # clean out zeros and tbills
goodbonds <- goodbonds[rownames(goodbonds) %in% cdata$actives[[dateString]][, 1], ]
stripnames <- sapply(strsplit(rownames(goodbonds), " "), function(x) x[1])
pxbid <- cdata$historicData$PX_BID[today, stripnames]
pxask <- cdata$historicData$PX_ASK[today, stripnames]
pxdbid <- cdata$historicData$PX_DIRTY_BID[today, stripnames]
pxdask <- cdata$historicData$PX_DIRTY_ASK[today, stripnames]
price <- as.numeric((pxbid + pxask) / 2)
accrued <- as.numeric(pxdbid - pxbid)
cashflows <- lapply(rownames(goodbonds), function(x) {
goodflows <- cdata$cfData[[x]][as.Date(cdata$cfData[[x]][, "Date"]) >= today, ]
#gfstipnames <- sapply(strsplit(rownames(goodflows), " "), function(x) x[1]) dunno if I need this
isin <- rep(cdata$staticData[x, "ID_ISIN"], nrow(goodflows))
cf <- apply(goodflows[, 2:3], 1, sum) / 10000
dt <- as.Date(goodflows[, 1])
return(list(isin = isin, cf = cf, dt = dt))
})
isinvec <- unlist(lapply(cashflows, function(x) x$isin))
cfvec <- as.numeric(unlist(lapply(cashflows, function(x) x$cf)))
datevec <- unlist(lapply(cashflows, function(x) x$dt))
govbonds <- list(ISIN = goodbonds$ID_ISIN,
MATURITYDATE = as.Date(goodbonds$MATURITY),
ISSUEDATE = as.Date(goodbonds$FIRST_SETTLE_DT),
COUPONRATE = as.numeric(goodbonds$COUPON) / 100,
PRICE = price,
ACCRUED = accrued,
CASHFLOWS = list(ISIN = isinvec, CF = cfvec, DATE = as.Date(datevec)),
TODAY = settleDate)
govbonds <- list(govbonds)
names(govbonds) <- cCode
class(govbonds) <- "couponbonds"
return(govbonds)
}
Take a close look at the cashflows <- lapply... function because this is where you'll create the sublist and is the core of the answer to your question, although of course, how this is done depends very much on how you have decided to build the intermediate data structures, and I have given you just one possibility. I realise that my answer is complex, but the problem is very complex. All the code you need is not in this answer either, a few helper functions are missing, but I am happy to provide them if you contact me. Certainly the skeleton of the core functions is all here, and actually, much of the problem is getting the data in the first place, and structuring it appropriately. You correctly surmise that some of the data is static for each bond, some of it is dynamic, and some of it is historical. So the dimensions of the intermediate datas structures are different for different pieces of the couponbonds objects. How you represent that is up to you, though I have used separate lists / data frames for each, linked via the bond IDs where necessary.
The function above will take a date string so you can do it for each of your historic data points, using the above-mentioned lapply, and hey "presto", dyncouponds:
spl <<- lapply(dodates, function(x) createCouponBonds("SP", x))
names(spl) <<- lapply(spl, function(x) x$SP$TODAY)
class(spl) <- "dyncouponbonds"
There you go. You asked for it....
If you're not using bloomberg, your input data structures will be very different but, as I said starting out, get super familiar with lapply and sapply. OBviously there are many other ways this problem could be solved, but the above works for Bloomberg. If you understand this code, you'll surely know what you're doing for other data sources.
Finally please note that the Rbbg package from findata.org is used to interface to bloomberg.
My 2 cents, I have been trying to get this work with new Rblpapi. I still have some problems with createCouponBonds part but I think other functions returns correctly. Won't solve whole problem but at least partial fix. BBcurveIDs, bbStaticDataFields, bbDynamicDataFields, bbHistoricDataFields are the same as above.
bbGetCountry <- function(cCode, up = FALSE) {
if (up == TRUE) startDate <- as.Date("2016-01-01") else startDate <- histStartDate
cal <- Calendar(weekdays=c("saturday", "sunday"))
wdays <- as.list(bizseq(startDate, Sys.Date(), cal))
actives <- lapply(wdays, function(x) {
bds(BBcurveIDs[cCode][[1]], "CURVE_MEMBERS", override = c(CURVE_DATE=format(x, "%Y%m%d")))
})
names(actives) <- wdays
uniqueActives <- unique(unlist(actives))
staticData <- bdp(uniqueActives, bbStaticDataFields)
cfData <- lapply(uniqueActives, function(x) {
bds(x, "DES_CASH_FLOW_ADJ", override = c(SETTLE_DT = format(as.Date(staticData[x, "FIRST_SETTLE_DT"]), "%Y%m%d")))
})
names(cfData) <- uniqueActives
historicData <- lapply(bbHistoricDataFields, function(x) bbdh(uniqueActives, flds = x, startDate = startDate))
names(historicData) <- bbHistoricDataFields
allDates <- as.Date(index(historicData$LAST_PRICE))
save(actives, file = paste("data_", cCode, "actives.dat", sep = ""))
save(staticData, file = paste("data_", cCode, "staticData.dat", sep = ""))
save(cfData, file = paste("data_", cCode, "cfData.dat", sep = ""))
save(historicData, file = paste("data_", cCode, "historicData.dat", sep = ""))
#save(settleDates, file = paste("data_", cCode, "settleDates.dat", sep = ""))
assign(paste(cCode, "data", sep = ""), list(actives = actives, staticData = staticData, cfData = cfData, #
historicData = historicData), pos = 1)
}
And bbdh function:
bbdh <- function(secs, years = 1, flds = "last_price", startDate = NULL) {
if(is.null(startDate)) startDate <- Sys.Date() - years * 365.25
if(class(startDate) == "Date") stardDate <- format(startDate, "%Y%m%d")
if(nchar(startDate) > 8) startDate <- format(as.Date(startDate), "%Y%m%d")
rawd <- bdh(secs, flds,
startDate,
include.non.trading.days = FALSE,
options = structure(c("PREVIOUS_VALUE", "NON_TRADING_WEEKDAYS"),
names = c("nonTradingDayFillMethod","nonTradingDayFillOption")))
rawd <- ldply(rawd, data.frame)
colnames(rawd) <- c("sec", "date", "fld")
rawd <- dcast(rawd, date ~ sec, value.var="fld")
colnames(rawd) <- gsub(" Corp", "", colnames(rawd))
return(xts(rawd[,-1], order.by=rawd[,1]))
}