Can't Download Index Data from Yahoo R - r

So, I want to download an index's data but can't get the data. The problem is that Yahoo Finance lists index tickers with a leading ^, for example ^VIX rather than VIX. Unfortunately, my code doesn't like that. I can't find any functions or alternative ways to call this data. Also, I really don't want to change the platform I get my data from; that would be a HUGE pain in the butt for the rest of my code.
I have tried putting in the ^ with the asset, and it won't download data. I've also tried calling the data without the ^ and that gives the wrong data.
asset <- "VIX"
ticker <- "VIX"
start.date <- as.Date('2009-09-01')
getSymbols(ticker, src='yahoo', from=start.date)
Adj.Close <- get(ticker)[,6]
When I put this in I end up getting this message:
Error in get(ticker) : object '^VIX' not found
Thank you for your time, regardless of whether or not you know a solution.

I cannot confirm your issue. There is no need here for a leading "^" symbol.
Running
library(quantmod)
ticker <- "VIX"
start.date <- as.Date('2009-09-01')
getSymbols(ticker, src='yahoo', from=start.date)
will automatically store the output in an xts object called VIX
head(get(ticker))
# VIX.Open VIX.High VIX.Low VIX.Close VIX.Volume VIX.Adjusted
#2014-12-04 28200.4 30096.9 27953.0 28447.7 811330 28447.7
#2014-12-05 26551.2 27540.7 25974.0 26056.5 377529 26056.5
#2014-12-08 25231.9 26056.5 23582.8 23582.8 367585 23582.8
#2014-12-09 23582.8 23582.8 21274.0 21274.0 570963 21274.0
#2014-12-10 19789.7 20202.0 19212.5 19295.0 539795 19295.0
#2014-12-11 18635.3 19295.0 17398.5 17728.3 1053637 17728.3
Note that you can avoid the get step if you avoid auto-assigning the output of getSymbols to the current environment:
res <- getSymbols("VIX", src='yahoo', from=start.date, auto.assign = FALSE)

I assume Maurits seeks data for the VIX Index, in which case I find it necessary to include the caret in the ticker. Building on the previous answer, we can see that a valid object is returned when we use the "^VIX" ticker.
library(quantmod)
start.date <- as.Date('2009-09-01')
ticker <- "^VIX"
getSymbols(ticker, src='yahoo', from=start.date)
[1] "^VIX"
However, get is unable to find that object:
head(get(ticker))
Error in get(ticker) : object '^VIX' not found
But using "VIX" rather than "^VIX" with get returns the desired result:
head(get("VIX"))
VIX.Open VIX.High VIX.Low VIX.Close VIX.Volume VIX.Adjusted
2009-09-01 26.01 29.23 26.00 29.15 0 29.15
2009-09-02 29.14 29.57 28.41 28.90 0 28.90
2009-09-03 28.90 28.90 26.98 27.10 0 27.10
2009-09-04 26.98 26.98 24.86 25.26 0 25.26
2009-09-08 25.26 26.15 25.26 25.62 0 25.62
2009-09-09 25.66 25.93 24.23 24.32 0 24.32
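One way to sidestep the naming mismatch, offered as a sketch rather than as quantmod's documented behaviour: the output above suggests getSymbols drops the leading ^ when it names the object, so the object name can be derived from the ticker, or get can be avoided entirely with auto.assign = FALSE as in the first answer. The helper names yahoo_object_name and fetch_index are made up for illustration, and fetch_index needs quantmod and network access.

```r
# Hypothetical helper: derive the object name that getSymbols appears to use
# (the ticker with a leading "^" removed), based on the output shown above.
yahoo_object_name <- function(ticker) sub("^\\^", "", ticker)

# Hypothetical wrapper: return the data directly instead of relying on an
# auto-assigned object, so no get() call is needed. Requires quantmod and
# an internet connection; it is only defined here, not called.
fetch_index <- function(ticker, from) {
  quantmod::getSymbols(ticker, src = "yahoo", from = from, auto.assign = FALSE)
}

ticker <- "^VIX"
yahoo_object_name(ticker)
# [1] "VIX"

# Usage (needs network):
# vix <- fetch_index(ticker, as.Date("2009-09-01"))
# Adj.Close <- vix[, 6]
```

With the second form the rest of the code never has to know what name getSymbols picked for the object.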

Related

How do I use an extracted string to call an xts data object?

I want to use a string of stock symbols and loop through it pulling pairs into a block of code for analysis. I can get the loop to pull in the data but then I want to assign the data to a generic data element so I can just run it through my code. Can't get hold of the xts object from the list programmatically and get it to execute - just returns the value.
library(quantmod)
library(xts)
asset1 = "ADBE"
asset2 = "VGT"
assets <- c(asset1, asset2)
assets # This returns [1] "ADBE" "VGT"
getSymbols(assets[1]) # All good so far this returns an xts object [1:3247] [1:6] called ADBE
Manually if I enter:
df01 = ADBE # This makes df01 the same as the data values for ADBE.
df01 <- assets[1] # makes df01 a character string equal to "ADBE"
Question:
How do I make the df01 = ADBE assignment happen programmatically using the values in assets? Using assets[1] fails, and I don't want to type the stock codes every time; I want to assign them once and loop through a list of assets with generic code.
I realise this is probably a simple dumb question, but it's got me stumped and I cannot find a solution online.
Keep the results in one list. By default, getSymbols assigns every asset into the environment; we can change this by using auto.assign = FALSE. Example:
myResult <- lapply(assets, getSymbols, auto.assign = FALSE)
myResult <- setNames(myResult, assets)
# access using names
myResult$ADBE
# ADBE.Open ADBE.High ADBE.Low ADBE.Close ADBE.Volume ADBE.Adjusted
# 2007-01-03 40.72 41.32 38.89 39.92 7126000 39.92
# ...
# or using a variable
assets[ 1 ]
# [1] "ADBE"
myResult[[ assets[ 1 ] ]]
# ADBE.Open ADBE.High ADBE.Low ADBE.Close ADBE.Volume ADBE.Adjusted
# 2007-01-03 40.72 41.32 38.89 39.92 7126000 39.92
# ...
If we do not wish to keep them in a list, then maybe use get:
getSymbols(assets)
df01 <- get(assets[ 1 ])
df01
# ADBE.Open ADBE.High ADBE.Low ADBE.Close ADBE.Volume ADBE.Adjusted
# 2007-01-03 40.72 41.32 38.89 39.92 7126000 39.92
# ...

Quantmod: Create new column for multiple tickers in one time

I've my own csv file with a list of stocks that I use to download tickers data from yahoo.
For that purpose I use the following code(Correct):
library(quantmod)
Tickers <- read.csv("nasdaq_tickers_list.csv", stringsAsFactors = FALSE)
getSymbols(Tickers$Tickers,from="2018-01-01", src="yahoo" )
The result is that 55 tickers have been loaded correctly.
Now I'd like to make some calculations: I need to create a new column on each ticker with the difference (High Price - Open Price).
I need something like this, for example AABA ticker:
New column name= AABA.Range
AABA.Range =(AABA$AABA.High - AABA$AABA.Open)
How can I get this applied and get a new column for the 55 tickers?
I was able to create the new column one by one, but how to do it for all of them with one function?
Is that possible?
Thanks a lot for your help.
One of the problems you have is that all the stock information is in the global environment. So first we need to pull all of them into a giant list. Next I created a range function that returns the stock data plus the range column with the correct name.
# Put all stocks in big list, by checking which xts objects are in the global environment.
stock_data = sapply(.GlobalEnv, is.xts)
all_stocks <- do.call(list, mget(names(stock_data)[stock_data]))
# range function
stock_range <- function(x) {
stock_name <- stringi::stri_extract(names(x)[1], regex = "^[A-Z]+")
stock_name <- paste0(stock_name, ".range")
column_names <- c(names(x), stock_name)
x$range <- quantmod::Hi(x) - quantmod::Lo(x)
x <- setNames(x, column_names)
return(x)
}
# calculate all ranges and add them to the data
all_stocks <- lapply(all_stocks, stock_range)
head(all_stocks$MSFT)
MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted MSFT.range
2007-01-03 29.91 30.25 29.40 29.86 76935100 22.67236 0.850000
2007-01-04 29.70 29.97 29.44 29.81 45774500 22.63439 0.529998
2007-01-05 29.63 29.75 29.45 29.64 44607200 22.50531 0.299999
2007-01-08 29.65 30.10 29.53 29.93 50220200 22.72550 0.569999
2007-01-09 30.00 30.18 29.73 29.96 44636600 22.74828 0.450000
2007-01-10 29.80 29.89 29.43 29.66 55017400 22.52049 0.459999
It might be better, when you load the data, to run an lapply so that all the data ends up in a list. That way the first step is not needed, and you can use all the TTR functions with lapply (or Map).
# Tickers is a data frame, so pass its Tickers column
my_stock_data <- lapply(Tickers$Tickers, getSymbols, auto.assign = FALSE)
names(my_stock_data) <- Tickers$Tickers
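Note that the range function above computes High minus Low, while the question asked for High minus Open. Here is a minimal sketch of the High minus Open variant on a named list, using small made-up xts objects in place of downloaded data. The tickers AAA and BBB, the prices, and the helpers make_stock and add_range are all invented for illustration.

```r
library(xts)

# Made-up data standing in for a list built with getSymbols(..., auto.assign = FALSE)
dates <- as.Date(c("2018-01-02", "2018-01-03"))
make_stock <- function(name, open, high) {
  x <- xts(cbind(open, high), order.by = dates)
  colnames(x) <- paste0(name, c(".Open", ".High"))
  x
}
my_stock_data <- list(
  AAA = make_stock("AAA", open = c(10, 11), high = c(11, 13)),
  BBB = make_stock("BBB", open = c(20, 21), high = c(24, 22))
)

# Hypothetical helper: append <name>.Range = High - Open to one xts object
add_range <- function(x, name) {
  rng <- x[, paste0(name, ".High")] - x[, paste0(name, ".Open")]
  colnames(rng) <- paste0(name, ".Range")
  merge(x, rng)
}

# Map passes each list element together with its name
my_stock_data <- Map(add_range, my_stock_data, names(my_stock_data))
my_stock_data$AAA
```

Passing the ticker name explicitly avoids parsing it out of the column names, which matters for tickers that contain characters other than capital letters.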

Creating a function by taking few arguments and calculating

I'm still working on a question from couple of days ago and would like to receive feedback/support on how I could create a function. Your expertise is highly appreciated.
I have created the following:
##### 1)
> raceIDs
[1] "GER" "SUI" "NZ2" "US1" "US2" "POR" "FRA" "AUS" "NZ1" "SWE"
##### 2)
#For each "raceIDs", there is a csv file which I have made a loop to read and created a list of data frames (assigned to the symbol "boatList")
#For example, if I select "NZ1" the output is:
> head(boatList[[9]]) #Only selected the first six lines as there is more than 30000 rows
Boat Date Secs LocalTime SOG
1 NZ1 01:09:2013 38150.0 10:35:49.997 22.17
2 NZ1 01:09:2013 38150.2 10:35:50.197 22.19
3 NZ1 01:09:2013 38150.4 10:35:50.397 22.02
4 NZ1 01:09:2013 38150.6 10:35:50.597 21.90
5 NZ1 01:09:2013 38150.8 10:35:50.797 21.84
6 NZ1 01:09:2013 38151.0 10:35:50.997 21.95
##### 3)
# A matrix showing the race times for each raceIDs
> raceTimes
start finish
GER "11:10:02" "11:35:05"
SUI "11:10:02" "11:35:22"
NZ2 "11:10:02" "11:34:12"
US1 "11:10:01" "11:33:29"
US2 "11:10:01" "11:36:05"
POR "11:10:02" "11:34:31"
FRA "11:10:02" "11:34:45"
AUS "11:10:03" "11:36:48"
NZ1 "11:10:01" "11:35:16"
SWE "11:10:03" "11:35:08"
What I need to do is calculate the average speed (SOG) of a boat "while it was racing" (between start and finish times) by creating a function called meanRaceSpeed that takes three arguments.
What I have tried so far is to create a function with 3 arguments (with a bit of help from experts here):
meanRaceSpeed <- function(raceIDs, boatList, raceTimes)
{
#Probably need to compare times, and thought it might be useful to convert character values into `DateTime` values but not to sure how to use it
#DateTime <- as.POSIXct(paste(boatList$Date, boatList$Time), format="%Y%m%d %H%M%S")
#To get the times for each boat
start_time <- raceTimes$start[rownames(raceTimes) = raceIDs]
finish_time <- raceTimes$finish[rownames(raceTimes) = raceIDs]
start_LocalTime <- min(grep(start_time, boatList$LocalTime))
finish_LocalTime <- max(grep(finish_time, boatList$LocalTime))
#which `SOG`s contain all the `LocalTimes` between start and finish
#take their `mean`
mean(boatList$SOG[start_LocalTime : finish_LocalTime])
}
### Obviously, my code does not work :( and I don't know where.
So basically, I need to create a function with three arguments and the expected result is:
#e.g For NZ1
> meanRaceSpeed("NZ1", boatList, raceTimes)
[1] 18.32 #Mean speed for NZ1 between 11:10:01 - 11:35:16
#e.g for US1
> meanRaceSpeed("US1", boatList, raceTimes)
[1] 17.23 #Mean speed for US1 between 11:10:01 - 11:33:29
Any help on where I could have gone wrong? I'd highly appreciate it.
I'm going to give some general advice for R, but I will also help you with your specific question. Whenever I have a problem in R, I usually find that it helps to make things more explicit.
If the function isn't working with these table manipulation methods (is that a data frame or a matrix in your function?), then you should try a different one. How?
Here's a few different things you can do to test your function, and a few suggestions that may move you along a bit. (I don't want to fix the whole thing for you, since it's your homework, but rather get you on your way.)
1) Why not try using a loop instead of brackets?
start_time <- raceTimes$start[rownames(raceTimes) = raceIDs]
Make that into a for loop. It's not too hard to do.
2) Debug your functions. There are a lot of tools to do this built into R and in packages you can add. Since you likely don't have time for that with your homework, I'd suggest this: take apart the function and run each part of it with the values you want. Are they of the right length? Are they the right data type? Are they getting the right answer before you put them all together? Make sure of that.
3) If all else fails, don't be afraid if the function and code is not elegant. R is not always an elegant language. (Actually, it's rarely an elegant language.) Especially when you're a beginner, your code will likely be ugly. Just make sure it works.
Since I already had experience with your data, I sat down to make a complete example.
First, data that look like yours:
raceIDs <- c("GER", "SUI", "NZ2", "US1", "US2", "POR", "FRA", "AUS", "NZ1", "SWE")
raceTimes <- as.matrix(read.table(text = ' start finish
GER "11:10:02" "11:35:05"
SUI "11:10:02" "11:35:22"
NZ2 "11:10:02" "11:34:12"
US1 "11:10:01" "11:33:29"
US2 "11:10:01" "11:36:05"
POR "11:10:02" "11:34:31"
FRA "11:10:02" "11:34:45"
AUS "11:10:03" "11:36:48"
NZ1 "11:10:01" "11:35:16"
SWE "11:10:03" "11:35:08"', header = T))
#turn matrix to data.frame or, else, `$` won't work
raceTimes <- as.data.frame(raceTimes, stringsAsFactors = F)
blDF <- data.frame(Boat = rep(raceIDs, 3),
LocalTime = c(raceTimes$start, rep("11:20:25", length(raceIDs)), raceTimes$finish),
SOG = runif(3 * length(raceIDs), 15, 25), stringsAsFactors = F)
boatList <- split(blDF, blDF$Boat)
#remove `names` to create them from scratch
names(boatList) <- NULL
Then:
#create `names` by searching each element of
#`boatList` of what `boat` it contains
names(boatList) <- unlist(lapply(boatList, function(x) unique(x$Boat)))
#the function
meanRaceSpeed <- function(ID, boatList, raceTimes)
{ #named the first argument `ID` instead of `raceIDs`
start_time <- raceTimes$start[rownames(raceTimes) == ID]
finish_time <- raceTimes$finish[rownames(raceTimes) == ID]
start_LocalTime <- min(grep(start_time, boatList[[ID]]$LocalTime))
finish_LocalTime <- max(grep(finish_time, boatList[[ID]]$LocalTime))
mean(boatList[[ID]]$SOG[start_LocalTime : finish_LocalTime])
}
Test:
meanRaceSpeed("US1", boatList, raceTimes)
#[1] 19.7063
meanRaceSpeed("NZ1", boatList, raceTimes)
#[1] 21.74729
mean(boatList$NZ1$SOG) #to test function
#[1] 21.74729
mean(boatList$US1$SOG) #to test function
#[1] 19.7063
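The grep approach above relies on the exact start and finish strings appearing in LocalTime. As an alternative sketch, the times can be converted to seconds since midnight and compared numerically, which is roughly what the commented-out DateTime idea in the question was after. The helper names to_secs and meanRaceSpeedByTime and the sample values below are made up for illustration, and fractional seconds in LocalTime are truncated.

```r
# Hypothetical helper: "HH:MM:SS" (optionally followed by ".mmm") -> seconds since midnight
to_secs <- function(x) {
  parts <- strsplit(substr(x, 1, 8), ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Mean SOG of one boat between its start and finish times (inclusive)
meanRaceSpeedByTime <- function(ID, boatList, raceTimes) {
  boat <- boatList[[ID]]
  clock <- to_secs(boat$LocalTime)
  racing <- clock >= to_secs(raceTimes[ID, "start"]) &
            clock <= to_secs(raceTimes[ID, "finish"])
  mean(boat$SOG[racing])
}

# Made-up sample data in the same shape as the question's
raceTimes <- matrix(c("11:10:01", "11:35:16"), nrow = 1,
                    dimnames = list("NZ1", c("start", "finish")))
boatList <- list(NZ1 = data.frame(
  LocalTime = c("11:09:59.800", "11:10:01.000", "11:20:00.200",
                "11:35:16.000", "11:35:17.000"),
  SOG = c(10, 20, 22, 24, 30),
  stringsAsFactors = FALSE
))

meanRaceSpeedByTime("NZ1", boatList, raceTimes)
# [1] 22
```

Indexing raceTimes by row name and column name works for both a matrix and a data frame, so no conversion is needed for this version.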

Download VIX futures prices from CBOE

I am trying to get historical prices for VIX futures by downloading all the CSV files on this page (http://cfe.cboe.com/Products/historicalVIX.aspx). Here is the code I am using to do this:
library(XML)
#Extract all links for url
url <- "http://cfe.cboe.com/Products/historicalVIX.aspx"
doc <- htmlParse(url)
links <- xpathSApply(doc, "//a/@href")
free(doc)
#Filter out URLs ending with csv and complete the link.
links <- links[substr(links, nchar(links) - 2, nchar(links)) == "csv"]
links <- paste("http://cfe.cboe.com", links, sep="")
#Perform read.csv on each url in links, skipping the first two URLs as they are not relevant.
c <- lapply(links[-(1:2)], read.csv, header = TRUE)
I get the error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names
Upon further investigation, I realize this is because some of the CSV files are formatted differently. If I load the URL links[9] manually, I see that the first row has this disclaimer:
CFE data is compiled for the .......use of CFE data is subject to the Terms and Conditions of CBOE's Websites.
Most of the other files (e.g. links[8] and links[10]) are fine, so it seems this has been randomly inserted. Is there some R magic that can be done to handle this?
Thank you.
I have a getSymbols.cfe method in my qmao package (for the getSymbols function in quantmod package) that will make this a lot easier.
#install.packages('qmao', repos='http://r-forge.r-project.org')
library(qmao)
This is from the examples section of ?getSymbols.cfe (please read the help page as the function has a few arguments that you may want to be different than the defaults)
getSymbols(c("VX_U11", "VX_V11"),src='cfe')
#all contracts expiring in 2010 and 2011.
getSymbols("VX",Months=1:12,Years=2010:2011,src='cfe')
#getSymbols("VX",Months=1:12,Years=10:11,src='cfe') #same
And it's not just for VIX
getSymbols(c("VM","GV"),src='cfe') #The mini-VIX and Gold vol contracts expiring this month
If you're not familiar with getSymbols, by default it stores the data in your .GlobalEnv and returns the name of the object that was saved.
> getSymbols("VX_Z12", src='cfe')
[1] "VX_Z12"
> tail(VX_Z12)
VX_Z12.Open VX_Z12.High VX_Z12.Low VX_Z12.Close VX_Z12.Settle VX_Z12.Change VX_Z12.Volume VX_Z12.EFP VX_Z12.OpInt
2012-10-26 19.20 19.35 18.62 18.87 18.9 0.0 22043 15 71114
2012-10-31 18.55 19.50 18.51 19.46 19.5 0.6 46405 319 89674
2012-11-01 19.35 19.35 17.75 17.87 17.9 -1.6 40609 2046 95720
2012-11-02 17.90 18.65 17.55 18.57 18.6 0.7 42592 1155 100691
2012-11-05 18.60 20.15 18.43 18.86 18.9 0.3 28136 110 102746
2012-11-06 18.70 18.85 17.75 18.06 18.1 -0.8 35599 851 110638
Edit
I see now that I did not answer your question, but rather pointed you to another way to get the same error! A simple way to make your code work is to write a wrapper for read.csv that uses readLines to see if the first row contains the disclaimer; if it does, skip the first row, otherwise use read.csv as normal.
myRead.csv <- function(x, ...) {
if (grepl("Terms and Conditions", readLines(x, 1))) { #is the first row the disclaimer?
read.csv(x, skip=1, ...)
} else read.csv(x, ...)
}
L <- lapply(links[-(1:2)], myRead.csv, header = TRUE)
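To see the wrapper idea in action without hitting the CBOE site, here is a small self-contained sketch using two temporary files, one with a disclaimer line and one without. The file contents and the function name read_csv_skip_disclaimer are invented for illustration; the real files may differ.

```r
# Variant of the wrapper above: skip the first line only if it matches the disclaimer
read_csv_skip_disclaimer <- function(path, pattern = "Terms and Conditions", ...) {
  first_line <- readLines(path, n = 1, warn = FALSE)
  has_disclaimer <- length(first_line) == 1 && grepl(pattern, first_line, fixed = TRUE)
  read.csv(path, skip = if (has_disclaimer) 1 else 0, ...)
}

# Two made-up files with the same data; only the second starts with a disclaimer
body <- c("Trade Date,Settle", "2012-11-05,18.9", "2012-11-06,18.1")
clean <- tempfile(fileext = ".csv")
noisy <- tempfile(fileext = ".csv")
writeLines(body, clean)
writeLines(c("Use of CFE data is subject to the Terms and Conditions of the site.", body), noisy)

files <- c(clean, noisy)
L <- lapply(files, read_csv_skip_disclaimer, header = TRUE)
identical(L[[1]], L[[2]])
# [1] TRUE
```

Both files parse to the same data frame, which is the property the original lapply over links needs.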
I also applied that patch to getSymbols.cfe. You can get the latest version of qmao (1.3.11) using svn checkout (see this post if you need help with that), or, you can wait until R-Forge builds it for you which usually happens pretty quickly, but could take up to a couple of days.

Reading sdmx-xml files into a dataframe in R

I was wondering if anyone has managed to read SDMX-XML files into a dataframe. The file I’d like to read is https://www.ecb.europa.eu/stats/sdmx/icpf/1/data/pension_funds.xml (1mb).
I saved the file as “pension_funds.xml” to the pwd and tried to use the XML package to read it:
fileName <- system.file("pensions", "pensions_funds.xml", package="XML")
parsed<-xmlTreeParse("pension_funds.xml",getDTD=F)
r<-xmlRoot(parsed)
tmp = xmlSApply(r, function(x) xmlSApply(x, xmlValue))
The few lines above basically follow the example here http://www.omegahat.org/RSXML/gettingStarted.html
but I think I would first need to somehow ignore the header (I have pasted below the first couple of pages of the file I’m trying to read). So I think the above might work but it starts from the wrong node for my purposes. I would like to grab the obs_values, indexed by their time_period and ref_area.
The first thing would be to find the right node and start there however I suspect I might be on a fool’s errand since I have limited knowledge of data formats and I’m not sure the XML package can be used for SDMX-XML files. Smarter people appear to have tried to do this
http://opensdmxdevelopers.wikispaces.com/RSDMX
I can’t find this package for download on its homepage here
https://r-forge.r-project.org/projects/rsdmx/
(I can’t see any link/download section but maybe I’m blind) and it seems to be in its early stages. The existence of rsdmx suggests that using the XML package to read SDMX might not be easy, so I’m ready to give up at this stage unless anyone has had success with this. Actually, I’m mainly interested in reading this file
http://www.ecb.europa.eu/stats/sdmx/bsi/1/data/outstanding_amounts.xml
But this is a 10mb file so I was starting smaller.
edit3
attempting sgibb's answer on large file using changes in Mischa's comment
library("XML")
url <- "http://www.ecb.europa.eu/stats/sdmx/bsi/1/data/outstanding_amounts.xml"
sdmxHandler <- function() {
## data.frame which stores results
data <- data.frame(stringsAsFactors=FALSE)
## counter to store current row
i <- 1
## temp values to store current REF_AREA, BS_ITEM and BS_COUNT_SECTOR
refArea <- NA
bsItem <- NA
bsCountSector <- NA
## handler subroutine for Obs tag
Obs <- function(name, attr) {
## found an Obs tag and now fill data.frame
data[i, "refArea"] <<- refArea
data[i, "timePeriod"] <<- as.numeric(attr["TIME_PERIOD"])
data[i, "obsValue"] <<- as.numeric(attr["OBS_VALUE"])
data[i, "bsItem"] <<- bsItem
data[i, "bsCountSector"] <<- bsCountSector
i <<- i + 1
}
## handler subroutine for Series tag
Series <- function(name, attr) {
refArea <<- attr["REF_AREA"]
bsItem <<- as.character(attr["BS_ITEM"])
bsCountSector <<- as.numeric(attr["BS_ITEM"])
}
return(list(getData=function() {return(data)},
Obs=Obs, Series=Series))
}
## run parser
df <- xmlEventParse(file(url), handlers=sdmxHandler())$getData()
Specification mandate value for attribute OBS_VALUE
attributes construct error
Couldn't find end of Start Tag Obs line 15108
Premature end of data in tag Series line 15041
Premature end of data in tag DataSet line 91
Premature end of data in tag CompactData line 2
Error: 1: Specification mandate value for attribute OBS_VALUE
2: attributes construct error
3: Couldn't find end of Start Tag Obs line 15108
4: Premature end of data in tag Series line 15041
5: Premature end of data in tag DataSet line 91
6: Premature end of data in tag CompactData line 2
In addition: There were 50 or more warnings (use warnings() to see the first 50)
edit2:
the answer from sgibb looks ideal and works perfectly on the smaller file. I tried to run it on
url <- "http://www.ecb.europa.eu/stats/sdmx/bsi/1/data/outstanding_amounts.xml"
(the 10mb file, original link corrected), with the only modification being the addition of two lines:
data[i, "bsItem"] <<- as.character(attr["BS_ITEM"])
data[i, "bsCountSector"] <<- as.numeric(attr["BS_COUNT_SECTOR"])
(these are additional id variables which are needed to identify a row in this larger dataset).
It ran for a few minutes then finished with this error:
Error: 1: Specification mandate value for attribute TIME_PE
2: attributes construct error
3: Couldn't find end of Start Tag Obs line 20743
4: Premature end of data in tag Series line 20689
5: Premature end of data in tag DataSet line 91
6: Premature end of data in tag CompactData line 2
In addition: There were 50 or more warnings (use warnings() to see the first 50)
The basic format of the data seems very similar so I thought this might work. The basic format of the 10mb file is as below:
<Series FREQ="M" REF_AREA="AT" ADJUSTMENT="N" BS_REP_SECTOR="A" BS_ITEM="A20" MATURITY_ORIG="A" DATA_TYPE="1" COUNT_AREA="U2" BS_COUNT_SECTOR="0000" CURRENCY_TRANS="Z01" BS_SUFFIX="E" TIME_FORMAT="P1M" COLLECTION="E">
<Obs TIME_PERIOD="1997-09" OBS_VALUE="275.3" OBS_STATUS="A" OBS_CONF="F"/>
<Obs TIME_PERIOD="1997-10" OBS_VALUE="275.9" OBS_STATUS="A" OBS_CONF="F"/>
<Obs TIME_PERIOD="1997-11" OBS_VALUE="276.6" OBS_STATUS="A" OBS_CONF="F"/>
edit1:
desired data format:
Ref_area time_period obs_value
At 2006 118
At 2007 119
…
Be 2006 101
…
Here’s the first bit of the data.
</Header>
<DataSet xsi:schemaLocation="https://www.ecb.europa.eu/vocabulary/stats/icpf/1 https://www.ecb.europa.eu/stats/sdmx/icpf/1/structure/2011-08-11/sdmx-compact.xsd" xmlns="https://www.ecb.europa.eu/vocabulary/stats/icpf/1">
<Group DECIMALS="0" TITLE_COMPL="Austria, reporting institutional sector Insurance corporations and pension funds - Closing balance sheet - All financial assets and liabilities - counterpart area World (all entities), counterpart institutional sector Total economy including Rest of the World (all sectors) - Credit (resources/liabilities) - Non-consolidated, Current prices - Euro, Neither seasonally nor working day adjusted - ESA95 TP table Not applicable" UNIT_MULT="9" UNIT="EUR" ESA95TP_SUFFIX="Z" ESA95TP_DENOM="E" ESA95TP_CONS="N" ESA95TP_DC_AL="2" ESA95TP_CPSECTOR="S" ESA95TP_CPAREA="A1" ESA95TP_SECTOR="S125" ESA95TP_ASSET="F" ESA95TP_TRANS="LE" ESA95TP_PRICE="V" ADJUSTMENT="N" REF_AREA="AT"/><Series ESA95TP_SUFFIX="Z" ESA95TP_DENOM="E" ESA95TP_CONS="N" ESA95TP_DC_AL="2" ESA95TP_CPSECTOR="S" ESA95TP_CPAREA="A1" ESA95TP_SECTOR="S125" ESA95TP_ASSET="F" ESA95TP_TRANS="LE" ESA95TP_PRICE="V" ADJUSTMENT="N" REF_AREA="AT" COLLECTION="E" TIME_FORMAT="P1Y" FREQ="A"><Obs OBS_CONF="F" OBS_STATUS="E" OBS_VALUE="112" TIME_PERIOD="2008"/><Obs OBS_CONF="F" OBS_STATUS="E" OBS_VALUE="119" TIME_PERIOD="2009"/><Obs OBS_CONF="F" OBS_STATUS="E" OBS_VALUE="125" TIME_PERIOD="2010"/><Obs OBS_CONF="F" OBS_STATUS="E" OBS_VALUE="127" TIME_PERIOD="2011"/></Series><Group D
RSDMX seems to be in an early development state. IMHO there is no package available yet. But you could easily implement it on your own using the XML package. I would suggest using xmlEventParse (see ?xmlEventParse for details):
EDIT: adapt example to changed requirements of outstanding_amounts.xml
EDIT2: add download.file
library("XML")
#url <- "http://www.ecb.europa.eu/stats/sdmx/icpf/1/data/pension_funds.xml"
url <- "http://www.ecb.europa.eu/stats/sdmx/bsi/1/data/outstanding_amounts.xml"
## download xml file to avoid download errors disturbing xmlEventParse
tmp <- tempfile()
download.file(url, tmp)
sdmxHandler <- function() {
## data.frame which stores results
data <- data.frame(stringsAsFactors=FALSE)
## counter to store current row
i <- 1
## temp value to store current REF_AREA, BS_ITEM and BS_COUNT_SECTOR
refArea <- NA
bsItem <- NA
bsCountSector <- NA
## handler subroutine for Obs tag
Obs <- function(name, attr) {
## found an Obs tag and now fill data.frame
data[i, "refArea"] <<- refArea
data[i, "bsItem"] <<- bsItem
data[i, "bsCountSector"] <<- bsCountSector
data[i, "timePeriod"] <<- as.Date(paste(attr["TIME_PERIOD"], "-01", sep=""), format="%Y-%m-%d")
data[i, "obsValue"] <<- as.double(attr["OBS_VALUE"])
## update current row
i <<- i + 1
}
## handler subroutine for Series tag
Series <- function(name, attr) {
refArea <<- attr["REF_AREA"]
bsItem <<- attr["BS_ITEM"]
bsCountSector <<- as.numeric(attr["BS_COUNT_SECTOR"])
}
return(list(getData=function() {return(data)},
Obs=Obs, Series=Series))
}
## run parser
df <- xmlEventParse(tmp, handlers=sdmxHandler())$getData()
head(df)
# refArea bsItem bsCountSector timePeriod obsValue
#1 DE A20 2210 12053 39.6
#2 DE A20 2210 12084 46.1
#3 DE A20 2210 12112 50.2
#4 DE A20 2210 12143 52.0
#5 DE A20 2210 12173 52.3
#6 DE A20 2210 12204 47.3
The package rsdmx allows you to read SDMX-ML files and coerce them to a data.frame. It is now hosted on GitHub and is currently available on CRAN; alternatively, you can easily install it from GitHub with the following:
require("devtools")
install_github("opensdmx/rsdmx")
library(rsdmx)
Applying to your data, you can do the following:
sdmx <- readSDMX("http://www.ecb.europa.eu/stats/sdmx/bsi/1/data/outstanding_amounts.xml")
df <- as.data.frame(sdmx)
More examples are given in the rsdmx wiki
Note that its functionalities currently load the xml object into R, as a slot of the SDMX R objects instantiated by rsdmx. In the future, we would like to investigate how rsdmx can use xmlEventParse (as suggested above by @sgibb) to read very large datasets.
library(XML)
xmlparsed <- xmlParse(file(url))
## obtain dataset node::
series_data <- getNodeSet(xmlparsed, "//Series")
if(length(series_data)==0){
datasetnode <- xmlChildren( xmlChildren(xmlparsed)[[1]])[[2]]
series_data<-xmlChildren(datasetnode)[ names(xmlChildren(datasetnode))=="Series"]
}
## prepare dataset
dataset.frame <- data.frame(matrix(ncol=3))
colnames(dataset.frame) <- c('REF_AREA', 'TIME_PERIOD', 'OBS_VALUE')
## loop over data
counter=1
for (i in 1: length(series_data)){
if('Obs'%in%names(xmlChildren(series_data[[i]])) ){ ## To ignore empty //Series nodes
for (j in 1: length(xmlChildren(series_data[[i]]))){
dataset.frame[counter,1] <- xmlAttrs(series_data[[i]])['REF_AREA']
dataset.frame[counter,2] <- xmlAttrs(series_data[[i]][[j]])['TIME_PERIOD']
dataset.frame[counter,3] <- xmlAttrs(series_data[[i]][[j]])['OBS_VALUE']
counter=counter+1
}
}
}
head(dataset.frame,5)
