Rblpapi, unlist data.frame zoo xts - r

Using Rblpapi, I get stock data for 3 indices as a list of data.frames.
I then want to convert it to zoo or, preferably, xts format. However, I first have to unlist it properly.
Since not everyone has access to Rblpapi and so cannot replicate this, please look at the str output and suggest how to unlist.
Any leads or help appreciated!
library(Rblpapi)
library(zoo)
library(xts)
str(res)
List of 3
$ :'data.frame': 9 obs. of 2 variables:
..$ date : Date[1:9], format: ...
..$ PX_LAST: num [1:9] 201 194 188 190 190 ...
$ :'data.frame': 9 obs. of 2 variables:
..$ date : Date[1:9], format: ...
..$ PX_LAST: num [1:9] 4891 4686 4477 4568 4517 ...
$ :'data.frame': 9 obs. of 2 variables:
..$ date : Date[1:9], format: ...
..$ PX_LAST: num [1:9] 19.3 22.5 26.1 22.5 22 ...
head(res)
[[1]]
date PX_LAST
1 2016-01-05 201.3600
2 2016-01-12 193.6608
3 2016-01-19 188.0600
4 2016-01-26 190.2000
5 2016-02-02 190.1600
6 2016-02-09 185.4300
7 2016-02-16 189.7800
8 2016-02-23 192.3200
9 2016-03-01 197.9700
[[2]]
date PX_LAST
1 2016-01-05 4891.430
2 2016-01-12 4685.919
3 2016-01-19 4476.950
4 2016-01-26 4567.673
5 2016-02-02 4516.946
6 2016-02-09 4268.763
7 2016-02-16 4435.956
8 2016-02-23 4503.583
9 2016-03-01 4680.479
[[3]]
date PX_LAST
1 2016-01-05 19.34
2 2016-01-12 22.47
3 2016-01-19 26.05
4 2016-01-26 22.50
5 2016-02-02 21.98
6 2016-02-09 26.54
7 2016-02-16 24.11
8 2016-02-23 20.98
9 2016-03-01 17.85
My attempt to unlist into one data.frame / zoo / xts object with (date, pricedata1, pricedata2, pricedata3):
df <- data.frame(matrix(unlist(res), nrow=9))
head(df)
X1 X2 X3 X4 X5 X6
1 16805 201.36 16805 4891.43 16805 19.34
2 16812 193.6608 16812 4685.919 16812 22.47
3 16819 188.06 16819 4476.95 16819 26.05
4 16826 190.2 16826 4567.673 16826 22.5
5 16833 190.16 16833 4516.946 16833 21.98
6 16840 185.43 16840 4268.763 16840 26.54
However, this is not what I want. Columns X3 and X5 should not be there, and the dates have been turned into plain numbers. As a result, converting to zoo or xts doesn't work:
price<-read.zoo(df, format="%Y%m%d")
df$date <-as.Date(as.character(df$date),format="%Y%m%d")
x<-xts(df$date, df$px_last)
Error in read.zoo(df, format = "%Y%m%d") : index has bad entries at data rows: 1 2 3 4 5 6 7 8 9
Error in xts(df$date, df$px_last) : order.by requires an
appropriate time-based object
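The numbers in X1, X3 and X5 are still the dates: a Date is stored as the number of days since 1970-01-01, and unlist() keeps the number but drops the class. A minimal base R sketch of that effect, using a one-row data.frame built from the first row shown above (the object names are only illustrative):

```r
# One row taken from the head(res) output above
one <- data.frame(date = as.Date("2016-01-05"), PX_LAST = 201.36)

# unlist() flattens to a plain numeric vector, so the Date class is lost
flat <- unlist(one)
print(flat)

# The day count can be turned back into a Date by supplying the origin
restored <- as.Date(flat[["date"]], origin = "1970-01-01")
print(restored)
```

This is also why format="%Y%m%d" cannot parse these values: they are day counts, not year-month-day strings.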

I believe all you need is to join each list element by the date.
However, for that, first you need to rename all those variables PX_LAST to something unique. For example:
require(data.table)
for (i in 1:length(res)) {
setnames(res[[i]],"PX_LAST",paste("PX_LAST",i,sep="_"))
}
Then you can join, either by merging pairwise, or by using the plyr::join_all function:
require(plyr)
df <- join_all(res, by="date", type="full")
# date PX_LAST_1 PX_LAST_2 PX_LAST_3
# 1 2016-01-05 201.3600 4891.430 19.34
# 2 2016-01-12 193.6608 4685.919 22.47
# 3 2016-01-19 188.0600 4476.950 26.05
# 4 2016-01-26 190.2000 4567.673 22.50
# 5 2016-02-02 190.1600 4516.946 21.98
# 6 2016-02-09 185.4300 4268.763 26.54
# 7 2016-02-16 189.7800 4435.956 24.11
# 8 2016-02-23 192.3200 4503.583 20.98
# 9 2016-03-01 197.9700 4680.479 17.85
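Then finally you can use
library(zoo)
price <- read.zoo(df) # the first column (date) becomes the index
library(xts)
x <- xts(df[, -1], order.by = df$date) # keep the date out of the data matrix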
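If you would rather not depend on data.table and plyr, the same rename-then-join idea can be sketched in base R with Map and Reduce. The list below is rebuilt from the first three rows of the output in the question, and the labels idx1 to idx3 are made-up names, not anything Rblpapi returns:

```r
# Toy stand-in for the Rblpapi result: a list of data.frames sharing a date column
dates <- as.Date(c("2016-01-05", "2016-01-12", "2016-01-19"))
res <- list(
  data.frame(date = dates, PX_LAST = c(201.36, 193.6608, 188.06)),
  data.frame(date = dates, PX_LAST = c(4891.430, 4685.919, 4476.950)),
  data.frame(date = dates, PX_LAST = c(19.34, 22.47, 26.05))
)
names(res) <- c("idx1", "idx2", "idx3")  # hypothetical labels

# Give each price column a unique name taken from the list names
renamed <- Map(function(d, nm) setNames(d, c("date", nm)), res, names(res))

# Full outer join of all elements on date
wide <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), renamed)
print(wide)
```

With xts loaded, wide can then be converted with xts(wide[, -1], order.by = wide$date).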

Related

Find mean from subset of one column based on ranking in the top 50 of another column

I have a data frame that has the following columns:
> str(wbr)
'data.frame': 214 obs. of 12 variables:
$ countrycode : Factor w/ 214 levels "ABW","ADO","AFG",..: 1 2 3 4 5 6 7 8 9 10 ...
$ countryname : Factor w/ 214 levels "Afghanistan",..: 10 5 1 6 2 202 8 9 4 7 ...
$ gdp_per_capita : num 19913 35628 415 2738 4091 ...
$ literacy_female : num 96.7 NA 17.6 59.1 95.7 ...
$ literacy_male : num 96.9 NA 45.4 82.5 98 ...
$ literacy_all : num 96.8 NA 31.7 70.6 96.8 ...
$ infant_mortality : num NA 2.2 70.2 101.6 13.3 ...
$ illiteracy_female: num 3.28 NA 82.39 40.85 4.31 ...
$ illiteracy_mele : num 3.06 NA 54.58 17.53 1.99 ...
$ illiteracy_male : num 3.06 NA 54.58 17.53 1.99 ...
$ illiteracy_all : num 3.18 NA 68.26 29.42 3.15 ...
I would like to find the mean of illiteracy_all from the top 50 countries with the highest GDP.
Note that the data frame contains NA values, so to take a mean I have to write:
mean(wbr$illiteracy_all, na.rm=TRUE)
For a reproducible example, let's take:
data.df <- data.frame(x=101:120, y=rep(c(1,2,3,NA), times=5))
So how could I average the y values for e.g. the top 5 values of x?
> data.df
x y
1 101 1
2 102 2
3 103 3
4 104 NA
5 105 1
6 106 2
7 107 3
8 108 NA
9 109 1
10 110 2
11 111 3
12 112 NA
13 113 1
14 114 2
15 115 3
16 116 NA
17 117 1
18 118 2
19 119 3
20 120 NA
Any of the following would work:
mean(data.df[rank(-data.df$x)<=5,"y"], na.rm=TRUE)
mean(data.df$y[rank(-data.df$x)<=5], na.rm=TRUE)
with(data.df, mean(y[rank(-x)<=5], na.rm=TRUE))
To unpack why this works, note first that rank gives ranks in a different order to what you might expect, 1 being the rank of the smallest number not the largest:
> rank(data.df$x)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
We can get round that by negating the input:
> rank(-data.df$x)
[1] 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
So now ranks 1 to 5 are the "top 5". If we want a vector of TRUE and FALSE to indicate the position of the top 5 we can use:
> rank(-data.df$x)<=5
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[14] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
(In reality you might find you have some ties in your data set. This is only going to cause issues if the 50th position is tied. You might want to have a look at the ties.method argument for rank to see how you want to handle this.)
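To make the effect of ties concrete, here is a small sketch with a made-up vector in which the two largest values are equal. It only illustrates how ties.method changes which positions pass a cutoff:

```r
x <- c(10, 10, 9, 8)  # the two largest values are tied

r_avg   <- rank(-x)                          # default "average": tied values share 1.5
r_min   <- rank(-x, ties.method = "min")     # both tied values get rank 1
r_first <- rank(-x, ties.method = "first")   # ties broken by position: 1, 2

# How many elements count as the "top 1" under each rule
n_avg   <- sum(r_avg <= 1)
n_min   <- sum(r_min <= 1)
n_first <- sum(r_first <= 1)
print(c(average = n_avg, min = n_min, first = n_first))
```

So with a tie at the cutoff, "average" can select fewer elements than asked for, "min" can select more, and "first" always selects exactly the requested number.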
So let's grab the values of y in those positions:
> data.df[rank(-data.df$x)<=5,"y"]
[1] NA 1 2 3 NA
Or you could use:
> data.df$y[rank(-data.df$x)<=5]
[1] NA 1 2 3 NA
So now we know what to input into mean:
> mean(data.df[rank(-data.df$x)<=5,"y"], na.rm=TRUE)
[1] 2
Or:
> mean(data.df$y[rank(-data.df$x)<=5], na.rm=TRUE)
[1] 2
Or if you don't like repeating the name of the data frame, use with:
> with(data.df, mean(y[rank(-x)<=5], na.rm=TRUE))
[1] 2
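An equivalent way to get the same result is to sort the rows and take the first five. This is just an alternative sketch on the same example data; it always returns exactly five rows, because ties are broken by row order:

```r
data.df <- data.frame(x = 101:120, y = rep(c(1, 2, 3, NA), times = 5))

# Sort by x, largest first, and keep the first 5 rows
top5 <- head(data.df[order(data.df$x, decreasing = TRUE), ], 5)

# Average y over those rows, ignoring missing values
top5_mean <- mean(top5$y, na.rm = TRUE)
print(top5_mean)
```

For the original question this would be order(wbr$gdp_per_capita, decreasing = TRUE) with 50 instead of 5. Rows with NA in the sort column go last by default.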

Fill data.frame with lists data

I have a data.frame like this, which I split by "bicho" into a list:
row.names bicho freq date ndvi date2 ndvi2 date3 ndvi3 ...
1 john 3 2009-04-08 5001 2009-04-23 4537 2009-05-09 3540
1.1 john 3 2009-04-08 5001 2009-04-23 4537 2009-05-09 3540
1.2 john 3 2009-04-08 5001 2009-04-23 4537 2009-05-09 3540
... ... . ... .. ... .. ... .. ...
2 steve 4 2010-04-29 6338 2010-05-09 5145 2010-05-25 3318
2.1 steve 4 2010-04-29 6338 2010-05-09 5145 2010-05-25 3318
2.2 steve 4 2010-04-29 6338 2010-05-09 5145 2010-05-25 3318
2.3 steve 4 2010-04-29 6338 2010-05-09 5145 2010-05-25 3318
List example:
$ john:'data.frame': 14 obs. of 152 variables:
..$ bicho : Factor w/ 26 levels "john","john",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ freq : num [1:14] 14 14 14 14 14 14 14 14 14 14 ...
..$ date : Date[1:14], format: "2009-04-08" "2009-04-08" ...
..$ ndvi : num [1:14] 5001 5001 5001 5001 5001 ...
..$ date2 : chr [1:14] "2009-04-23" "2009-04-23" "2009-04-23" "2009-04-23" ...
..$ ndvi2 : num [1:14] 4538 4538 4538 4538 4538 ...
..$ date3 : chr [1:14] "2009-05-09" "2009-05-09" "2009-05-09" "2009-05-09" ...
..$ ndvi3 : num [1:14] 3540 3540 3540 3540 3540 ...
The list has 26 elements, each one looking like the one above.
My goal is to fill a data frame with all of the elements in it, but with new columns in which I want to do some calculations. The final data.frame should look like this:
row.names bicho freq time1 time2 ndvi
1 john 3 0 (date2-date1) 5001
1.1 john 3 (date2-date1) (date3-date2) 4537
1.2 john 3 (date3-date2) (date4-date3) 3540
... ... . ... ... ..
2 steve 4 0 (date2-date1) 6338
2.1 steve 4 (date2-date1) (date3-date2) 5145
2.2 steve 4 (date3-date2) (date4-date3) 3318
2.3 steve 4 (date4-date3) (date5-date4) 1239
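The time1 and time2 columns in this target layout are successive date differences, which diff() gives directly. A minimal sketch for one animal, using only the three john dates and ndvi values visible above (date4 is not shown, so the last time2 is left as NA):

```r
# Values copied from the john row shown above
dates <- as.Date(c("2009-04-08", "2009-04-23", "2009-05-09"))
ndvi  <- c(5001, 4537, 3540)

# Days between consecutive dates
gaps <- as.numeric(diff(dates))

# One output row per date: time1 is the gap before it, time2 the gap after it
long <- data.frame(
  bicho = "john",
  time1 = c(0, gaps),
  time2 = c(gaps, NA),  # NA because the next date is not available here
  ndvi  = ndvi
)
print(long)
```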
My initial code looks like this. The problem is that I want to fill the final data.frame row by row (1:563) with list elements with variable length, but I can't find a way to do that.
for(b in list){
for(j in seq_along(df$bicho)){
for(i in seq_along(b$bicho)){
print(i)
if(i==1){
df$tempo1[j]=0
df$tempo2[j]=as.Date(b$date2[i])-b$date[i]
df$NDVI<-b[i,4]
df$tempo1[j+1]=df$tempo2[j]
}}}}
The objective of this code was to fill only the first row of each variable.
How about filling the data first?
## make toy lists
ll1 <- list(bicho=rep("a",3),frep=rep(3,3),x=1:3,y=7:9)
ll2 <- list(bicho=rep("b",5),frep=rep(4,5),x=1:5,y=6:10)
## fill
ret <- c()
your.list.of.list <- list(ll1,ll2)
for (e in your.list.of.list){
ret <- rbind(ret,do.call(cbind,e))
}
ret <- data.frame(ret,stringsAsFactors=FALSE)
Then you can add any additional columns with column-wise computations:
ret$z <- as.numeric(ret$y)-as.numeric(ret$x)
ret
> ret
bicho frep x y z
1 a 3 1 7 6
2 a 3 2 8 6
3 a 3 3 9 6
4 b 4 1 6 5
5 b 4 2 7 5
6 b 4 3 8 5
7 b 4 4 9 5
8 b 4 5 10 5
>
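One side effect of the cbind approach is that every column passes through a character matrix, which is why as.numeric is needed afterwards. A sketch of a variant that keeps the column types, using the same toy lists, is to turn each element into a data.frame first and bind those:

```r
ll1 <- list(bicho = rep("a", 3), frep = rep(3, 3), x = 1:3, y = 7:9)
ll2 <- list(bicho = rep("b", 5), frep = rep(4, 5), x = 1:5, y = 6:10)

# Convert each list to a data.frame, then stack them row-wise
ret <- do.call(rbind, lapply(list(ll1, ll2), as.data.frame, stringsAsFactors = FALSE))

# Numeric columns stay numeric, so no conversion is needed here
ret$z <- ret$y - ret$x
print(ret)
```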

Carc data from rda file to numeric matrix

I am trying to run KDA (kernel discriminant analysis) on the carc data, but when I call X<-data.frame(scale(X)), R shows the error:
"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
I tried as.numeric(as.matrix(carc)) and carc<-na.omit(carc), but neither helps.
install.packages("klaR")
install.packages("FSelector")
library(ks);library(MASS);library(klaR);library(FSelector)
attach("carc.rda")
data<-load("carc.rda")
data
carc<-na.omit(carc)
head(carc)
class(carc) # check for its class
class(as.matrix(carc)) # change class, and
as.numeric(as.matrix(carc))
XX<-carc
X<-XX[,1:12];X.class<-XX[,13];
X<-data.frame(scale(X));
fit.pc<-princomp(X,scores=TRUE);
plot(fit.pc,type="line")
X.new<-fit.pc$scores[,1:5]; X.new<-data.frame(X.new);
cfs(X.class~.,cbind(X.new,X.class))
X.new<-fit.pc$scores[,c(1,4)]; X.new<-data.frame(X.new);
fit.kda1<-Hkda(x=X.new,x.group=X.class,pilot="samse",
bw="plugin",pre="sphere")
kda.fit1 <- kda(x=X.new, x.group=X.class, Hs=fit.kda1)
Can you help me resolve this problem and run the analysis?
Added: the car data set (Chambers, Cleveland, Kleiner & Tukey 1983)
> head(carc)
P M R78 R77 H R Tr W L T D G C
AMC_Concord 4099 22 3 2 2.5 27.5 11 2930 186 40 121 3.58 US
AMC_Pacer 4749 17 3 1 3.0 25.5 11 3350 173 40 258 2.53 US
AMC_Spirit 3799 22 . . 3.0 18.5 12 2640 168 35 121 3.08 US
Audi_5000 9690 17 5 2 3.0 27.0 15 2830 189 37 131 3.20 Europe
Audi_Fox 6295 23 3 3 2.5 28.0 11 2070 174 36 97 3.70 Europe
Here is a small dataset with similar characteristics to what you describe
in order to answer this error:
"Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"
carc <- data.frame(type1=rep(c('1','2'), each=5),
type2=rep(c('5','6'), each=5),
x = rnorm(10,1,2)/10, y = rnorm(10),
stringsAsFactors=TRUE) # needed in R >= 4.0 to get factor columns
This should be similar to your data.frame
str(carc)
# 'data.frame': 10 obs. of 4 variables:
# $ type1: Factor w/ 2 levels "1","2": 1 1 1 1 1 2 2 2 2 2
# $ type2: Factor w/ 2 levels "5","6": 1 1 1 1 1 2 2 2 2 2
# $ x : num -0.1177 0.3443 0.1351 0.0443 0.4702 ...
# $ y : num -0.355 0.149 -0.208 -1.202 -1.495 ...
scale(carc)
# Similar error
# Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Using set()
require(data.table)
DT <- data.table(carc)
cols_fix <- c("type1", "type2")
for (col in cols_fix) set(DT, j=col, value = as.numeric(as.character(DT[[col]])))
str(DT)
# Classes ‘data.table’ and 'data.frame': 10 obs. of 4 variables:
# $ type1: num 1 1 1 1 1 2 2 2 2 2
# $ type2: num 5 5 5 5 5 6 6 6 6 6
# $ x : num 0.0465 0.1712 0.1582 0.1684 0.1183 ...
# $ y : num 0.155 -0.977 -0.291 -0.766 -1.02 ...
# - attr(*, ".internal.selfref")=<externalptr>
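The as.character step matters: calling as.numeric directly on a factor returns the internal level codes, not the labels. A short base R sketch of the difference, followed by the same column fix without data.table (the small carc here is a made-up stand-in):

```r
f <- factor(c("5", "5", "6"))

codes  <- as.numeric(f)                 # level codes
values <- as.numeric(as.character(f))   # the labels read as numbers
print(codes)
print(values)

# Base R version of the column fix, on a toy data.frame
carc <- data.frame(type1 = factor(c("1", "2")),
                   type2 = factor(c("5", "6")),
                   x = c(0.1, 0.2))
cols_fix <- c("type1", "type2")
carc[cols_fix] <- lapply(carc[cols_fix], function(col) as.numeric(as.character(col)))
str(carc)
```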
The first column(s) of your data set may be factors. Taking the data from corrgram:
library(corrgram)
carc <- auto
str(carc)
# 'data.frame': 74 obs. of 14 variables:
# $ Model : Factor w/ 74 levels "AMC Concord ",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ Origin: Factor w/ 3 levels "A","E","J": 1 1 1 2 2 2 1 1 1 1 ...
# $ Price : int 4099 4749 3799 9690 6295 9735 4816 7827 5788 4453 ...
# $ MPG : int 22 17 22 17 23 25 20 15 18 26 ...
# $ Rep78 : num 3 3 NA 5 3 4 3 4 3 NA ...
# $ Rep77 : num 2 1 NA 2 3 4 3 4 4 NA ...
# $ Hroom : num 2.5 3 3 3 2.5 2.5 4.5 4 4 3 ...
# $ Rseat : num 27.5 25.5 18.5 27 28 26 29 31.5 30.5 24 ...
# $ Trunk : int 11 11 12 15 11 12 16 20 21 10 ...
# $ Weight: int 2930 3350 2640 2830 2070 2650 3250 4080 3670 2230 ...
# $ Length: int 186 173 168 189 174 177 196 222 218 170 ...
# $ Turn : int 40 40 35 37 36 34 40 43 43 34 ...
# $ Displa: int 121 258 121 131 97 121 196 350 231 304 ...
# $ Gratio: num 3.58 2.53 3.08 3.2 3.7 3.64 2.93 2.41 2.73 2.87 ...
So exclude them by trying this:
X<-XX[,3:14]
or this
X<-XX[,-(1:2)]
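Two more things may be worth checking. The head(carc) output in the question shows "." in R78 and R77 for AMC_Spirit, which suggests missing values coded as text, and that alone would make those columns non-numeric. Also, rather than hard-coding column positions, you can select whatever is numeric. A sketch on a made-up four-row stand-in (the values are illustrative, not the real data set):

```r
# Toy stand-in: R78 uses "." for missing, C is a text column
carc <- data.frame(P   = c(4099, 4749, 3799, 9690),
                   R78 = c("3", "3", ".", "5"),
                   C   = c("US", "US", "US", "Europe"),
                   stringsAsFactors = FALSE)

# Recode "." as NA, then convert the column to numeric
carc$R78 <- as.numeric(replace(carc$R78, carc$R78 == ".", NA))

# Keep only numeric columns before scaling
num <- carc[sapply(carc, is.numeric)]
scaled <- scale(num)
print(scaled)
```

If the file is read with read.table or read.csv, passing na.strings = "." at import time avoids the recoding step.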

Plot of minute tick data with correct x-axis formatting?

I want to plot tick data on a minute-basis. My dataframe looks like the following:
> head(df)
No Date Time Close Volume Weekday
1 3361 03.12.2012 08:00:00.000 7.435 27000000 Montag
2 3362 03.12.2012 08:01:00.000 7.428 47000000 Montag
3 3363 03.12.2012 08:02:00.000 7.428 41000000 Montag
4 3364 03.12.2012 08:03:00.000 7.429 39000000 Montag
5 3365 03.12.2012 08:04:00.000 7.426 44000000 Montag
6 3366 03.12.2012 08:05:00.000 7.423 49000000 Montag
>
Now I want to plot the first 241 entries, with the correct x-axis description. Currently I use a simple 1:241 vector:
plot(c(1:241),df[1:241,4],type="l")
And I get:
When I try
plot(df[1:241,3],df[1:241,4],type="l")
this looks like:
What's wrong here? Thanks!
EDIT:
> str(df)
'data.frame': 81613 obs. of 6 variables:
$ No : int 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 ...
$ Date : Factor w/ 270 levels "01.01.2013","01.02.2013",..: 25 25 25 25 25 25 25 25 25 25 ...
$ Time : Factor w/ 600 levels "08:00:00.000",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Close : num 7.43 7.43 7.43 7.43 7.43 ...
$ Volume : int 27000000 47000000 41000000 39000000 44000000 49000000 51000000 48000000 49000000 45000000 ...
$ Weekday: Factor w/ 5 levels "Dienstag","Donnerstag",..: 5 5 5 5 5 5 5 5 5 5 ...
>
EDIT2:
Data here.
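You need to convert your Date and Time variables, which are factors, into a single date-time, for example with strptime. The dates look like day.month.year (German weekday names, and 03.12.2012 is a Montag), hence %d.%m.%Y:
df$DateTime = as.POSIXct(strptime(paste(as.character(df$Date), as.character(df$Time)), "%d.%m.%Y %H:%M:%S"))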
plot(df$DateTime[1:241], df$Close[1:241], type="l")
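Here is a self-contained sketch of the conversion on the first three rows from the question. It assumes the dates are day.month.year and fixes the time zone to UTC just to make the example reproducible; both are assumptions about your data:

```r
# First rows of the question's data, as plain strings
df <- data.frame(Date  = c("03.12.2012", "03.12.2012", "03.12.2012"),
                 Time  = c("08:00:00.000", "08:01:00.000", "08:02:00.000"),
                 Close = c(7.435, 7.428, 7.428),
                 stringsAsFactors = FALSE)

# Paste date and time together and parse; %OS also reads fractional seconds
df$DateTime <- as.POSIXct(paste(df$Date, df$Time),
                          format = "%d.%m.%Y %H:%M:%OS", tz = "UTC")

# Spacing between consecutive ticks, in seconds
step <- as.numeric(diff(df$DateTime), units = "secs")
print(df$DateTime)
print(step)
```

Once the column is POSIXct, plot(df$DateTime, df$Close, type = "l") draws a proper time axis.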

Generate entries in time series data

I want to generate a row (with zero amount) for each missing month, up to the current month, in the following data frame. Can you please give me a hand with this? Thanks!
trans_date ammount
1 2004-12-01 2968.91
2 2005-04-01 500.62
3 2005-05-01 434.30
4 2005-06-01 549.15
5 2005-07-01 276.77
6 2005-09-01 548.64
7 2005-10-01 761.69
8 2005-11-01 636.77
9 2005-12-01 1517.58
10 2006-03-01 719.09
11 2006-04-01 1231.88
12 2006-05-01 580.46
13 2006-07-01 1468.43
14 2006-10-01 692.22
15 2006-11-01 505.81
16 2006-12-01 1589.70
17 2007-03-01 1559.82
18 2007-06-01 764.98
19 2007-07-01 964.77
20 2007-09-01 405.18
21 2007-11-01 112.42
22 2007-12-01 1134.08
23 2008-02-01 269.72
24 2008-03-01 208.96
25 2008-04-01 353.58
26 2008-05-01 756.00
27 2008-06-01 747.85
28 2008-07-01 781.62
29 2008-09-01 195.36
30 2008-10-01 424.24
31 2008-12-01 166.23
32 2009-02-01 237.11
33 2009-04-01 110.94
34 2009-07-01 191.29
35 2009-11-01 153.42
36 2009-12-01 222.87
37 2010-09-01 1259.97
38 2010-11-01 375.61
39 2010-12-01 496.48
40 2011-02-01 360.07
41 2011-03-01 324.95
42 2011-04-01 566.93
43 2011-06-01 281.19
44 2011-08-01 428.04
'data.frame': 44 obs. of 2 variables:
$ trans_date : Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
$ ammount: num 2969 501 434 549 277 ...
You can use seq.Date and merge:
> str(df)
'data.frame': 44 obs. of 2 variables:
$ trans_date: Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
$ ammount : num 2969 501 434 549 277 ...
> mns <- data.frame(trans_date = seq.Date(min(df$trans_date), max(df$trans_date), by = "month"))
> df2 <- merge(mns, df, all = TRUE)
> df2$ammount <- ifelse(is.na(df2$ammount), 0, df2$ammount)
> head(df2)
trans_date ammount
1 2004-12-01 2968.91
2 2005-01-01 0.00
3 2005-02-01 0.00
4 2005-03-01 0.00
5 2005-04-01 500.62
6 2005-05-01 434.30
If you need the months up to the current one, use this:
mns <- data.frame(trans_date = seq.Date(min(df$trans_date), Sys.Date(), by = "month"))
Note that it is sufficient to call seq instead of seq.Date when the arguments are of class Date.
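For anyone who wants to try this without the full data, here is a minimal runnable sketch on the first three rows only. The names all_months and filled are just illustrative:

```r
# First three rows of the question's data
df <- data.frame(trans_date = as.Date(c("2004-12-01", "2005-04-01", "2005-05-01")),
                 ammount    = c(2968.91, 500.62, 434.30))

# Every month between the first and last observation
all_months <- data.frame(trans_date = seq(min(df$trans_date), max(df$trans_date), by = "month"))

# Left join keeps every month; months without data come back as NA
filled <- merge(all_months, df, all.x = TRUE)

# Replace the NAs with zero
filled$ammount[is.na(filled$ammount)] <- 0
print(filled)
```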
If you're using xts objects, you can use timeBasedSeq and merge.xts. Assuming your original data is in an object Data:
# create xts object:
# no comma on the first subset (Data['ammount']) keeps column name;
# as.Date needs a vector, so use comma (Data[,'trans_date'])
x <- xts(Data['ammount'],as.Date(Data[,'trans_date']))
# create a time-based vector from 2004-12-01 to 2011-08-01. The "m" denotes
# monthly time-steps. By default this returns a yearmon class. Use
# retclass="Date" to return a Date vector.
d <- timeBasedSeq(paste(start(x),end(x),"m",sep="/"), retclass="Date")
# merge x with an "empty" xts object, xts(,d), filling with zeros
y <- merge(x,xts(,d),fill=0)

Resources