I have the following code in R
v <- c("featureA", "featureB")
newdata <- unique(data[v])
print(unique(data[v]))
print(predict(model, newdata, type = 'response', allow.new.levels = TRUE))
And I got the following result
featureA featureB
1 bucket_in_10_to_30 bucket_in_90_to_100
2 bucket_in_10_to_30 bucket_in_50_to_90
3 bucket_in_0_to_10 bucket_in_50_to_90
4 bucket_in_0_to_10 bucket_in_90_to_100
7 bucket_in_10_to_30 bucket_in_10_to_50
10 bucket_in_30_to_100 bucket_in_90_to_100
19 bucket_in_0_to_10 bucket_in_0_to_10
33 bucket_in_0_to_10 bucket_in_10_to_50
36 bucket_in_30_to_100 bucket_in_10_to_50
38 bucket_in_10_to_30 bucket_in_0_to_10
52 bucket_in_30_to_100 bucket_in_0_to_10
150 bucket_in_30_to_100 bucket_in_50_to_90
1 2 3 4 7 10 19 33 36 38 52 150
0.001920662 0.005480186 0.000961198 0.000335883 0.006311521 0.004005570 0.000620979 0.001107773 0.013100210 0.003546136 0.007382468 0.011384935
And I'm wondering whether it's possible in R to reshape this and directly get a 3 × 4 table similar to this:
featureA \ featureB    bucket_in_0_to_10  bucket_in_10_to_50  bucket_in_50_to_90  bucket_in_90_to_100
bucket_in_0_to_10      ...                ...                 ...                 ...
bucket_in_10_to_30     ...                ...                 ...                 ...
bucket_in_30_to_100    ...                ...                 ...                 ...
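One way to get that shape (a sketch, assuming `model` and `data` are the objects above) is to attach the predictions to `newdata` and cross-tabulate them with `xtabs()`:

```r
# Predictions for each unique (featureA, featureB) combination
newdata$pred <- predict(model, newdata, type = "response",
                        allow.new.levels = TRUE)
# Pivot into a featureA x featureB table of predictions
xtabs(pred ~ featureA + featureB, data = newdata)
```

Combinations absent from `newdata` would show up as 0 in the `xtabs()` result; `reshape2::dcast(newdata, featureA ~ featureB, value.var = "pred")` is an alternative that leaves them as NA instead.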
Thanks for the help!
I have 3 variables, so I will get 7 non-empty combinations (three singletons, three pairs, one triple). I want to produce the second column in combination form. I got the following result.
P <- matrix(c(
  0.427, -0.193, 0.673,
  -0.193, 0.094, -0.428,
  0.673, -0.428, 224.099
), nrow = 3)
G <- matrix(c(
  0.238, -0.033, 0.468,
  -0.033, 0.084, -0.764,
  0.468, -0.764, 205.144
), nrow = 3)
A <- rep(1, nrow(P))
df <- do.call(rbind, lapply(1:ncol(P), function(x) {
  do.call(rbind, combn(ncol(P), x, function(y) {
    data.frame(comb = paste(y, collapse = ""),
               B = (solve(P) %*% G) %*% A,
               stringsAsFactors = FALSE)
  }, simplify = FALSE))
}))
> df
comb B
1 1 -19.7814149
2 1 -44.1515387
3 1 0.8891786
4 2 -19.7814149
5 2 -44.1515387
6 2 0.8891786
7 3 -19.7814149
8 3 -44.1515387
9 3 0.8891786
10 12 -19.7814149
11 12 -44.1515387
12 12 0.8891786
13 13 -19.7814149
14 13 -44.1515387
15 13 0.8891786
16 23 -19.7814149
17 23 -44.1515387
18 23 0.8891786
19 123 -19.7814149
20 123 -44.1515387
21 123 0.8891786
Here I got only the same 3 values (-19.7814149, -44.1515387, 0.8891786) repeated for every combination, but I wanted 12 values, like this:
comb B
1 0.5574
2 0.8936
3 0.9154
12 10.0772, 21.233
13 0.2083 , 0.9169
23 -3.1085, 0.9061
123 -19.7814, -44.1515, 0.8892
I can't manage this. Furthermore, I want to use these B values to calculate my desired result (GA), where my formula is:
b <- t(B)
gain <- do.call(rbind, lapply(1:ncol(P), function(x) {
  do.call(rbind, combn(ncol(P), x, function(y) {
    data.frame(GA = abs(round(1.76 * sum(G[y, y] %*% B[y] * A[y]) /
                                sqrt((b[y] %*% P[y, y]) %*% B[y]), 2)),
               stringsAsFactors = FALSE)
  }, simplify = FALSE))
}))
My final output is
comb B GA
1 0.5574 0.641
2 0.8936 0.4822
3 0.9154 24.1186
12 10.0772, 21.233 3.123
13 0.2083 , 0.9169 24.1748
23 -3.1085, 0.9061 24.0867
123 -19.7814, -44.1515, 0.8892 24.9097
Is there any solution?
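The repeated values come from `B = (solve(P) %*% G) %*% A`, which never uses `y`, so the full three-variable solution is computed for every combination. A sketch of a fix, assuming `P`, `G` and `A` as defined above: subset all three objects to the current combination (`drop = FALSE` keeps one-element subsets as matrices):

```r
df <- do.call(rbind, lapply(1:ncol(P), function(x) {
  do.call(rbind, combn(ncol(P), x, function(y) {
    # Solve the system restricted to the variables in this combination
    B <- solve(P[y, y, drop = FALSE]) %*% G[y, y, drop = FALSE] %*% A[y]
    data.frame(comb = paste(y, collapse = ""),
               B = paste(round(B, 4), collapse = ", "),
               stringsAsFactors = FALSE)
  }, simplify = FALSE))
}))
```

For `comb = "1"` this reduces to `0.238 / 0.427 ≈ 0.5574`, matching the first row of the desired output; the same subsetting of `P`, `G`, `B` and `A` inside the GA formula would fix the `gain` computation as well.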
data_processed <- sqldf(" select a.permno, a.number, a.mean, b.ret as med, a.std
from data_processed as a
left join data_processed2 as b
on a.permno=b.permno")
The code above is not working. I am getting the error below:
Error in result_create(conn@ptr, statement) : no such column: b.ret
Here is my data:
data_processed:
permno number mean std
1 10107 120 0.0117174000 0.06802718
2 11850 120 0.0024398083 0.04594591
3 12060 120 0.0005072167 0.08544500
4 12490 120 0.0063569167 0.05325215
5 14593 120 0.0200060583 0.08865493
6 19561 120 0.0154743500 0.07771348
7 25785 120 0.0184815583 0.16510082
8 27983 120 0.0025951333 0.09538822
9 55976 120 0.0092889000 0.04812975
10 59328 120 0.0098526167 0.07135423
data_processed2:
permno return
1 10107 0.0191920
2 11850 0.0015495
3 12060 -0.0040130
4 12490 0.0078245
5 14593 0.0231735
6 19561 0.0202610
7 25785 -0.0018760
8 27983 0.0027375
9 55976 0.0089435
10 59328 0.0166490
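A sketch of a fix, given the data frames shown above: `data_processed2` has no column named `ret`; the column is called `return`, so the select list should use that name (bracket-quoted, since `return` can clash with SQL keywords):

```r
library(sqldf)

data_processed <- sqldf("
  select a.permno, a.number, a.mean, b.[return] as med, a.std
  from data_processed as a
  left join data_processed2 as b
  on a.permno = b.permno")
```

In base R the same left join is `merge(data_processed, data_processed2, by = "permno", all.x = TRUE)`, followed by renaming `return` to `med`.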
My first question here :)
My goal is: given a data frame with predictors (each column a predictor, each row an observation), fit a regression using lm over a rolling window and then predict the value for the last observation in each window.
The data frame looks like:
> DfPredictor[1:40,]
Y X1 X2 X3 X4 X5
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895
For instance, using a rolling window with width = 10, the regression should be estimated and then the 'Y' corresponding to X1, X2, ..., X5 predicted.
The predictions should be included in a new column 'Ypred'.
Is there some way to do that using rollapply + lm/predict + mutate?
Many thanks!!
Using the data in the Note at the end and assuming that in a window of width 10 we want to predict the last Y (i.e. the 10th), then:
library(zoo)
pred <- function(x) tail(fitted(lm(Y ~ ., as.data.frame(x))), 1)
transform(DF, pred = rollapplyr(DF, 10, pred, by.column = FALSE, fill = NA))
giving:
Y X1 X2 X3 X4 X5 pred
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440 NA
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971 NA
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380 NA
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536 NA
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555 NA
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555 NA
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433 NA
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681 NA
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245 NA
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990 3.219764
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192 3.241614
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752 3.225423
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546 3.217797
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116 3.205856
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225 3.177928
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948 3.156405
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589 3.176243
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600 3.177165
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899 3.177211
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395 3.145533
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396 3.127410
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640 3.148792
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291 3.124913
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991 3.124992
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310 3.117981
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414 3.117679
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480 3.119898
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676 3.121039
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162 3.123903
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640 3.119438
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999 3.113963
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600 3.101229
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200 3.076817
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590 3.083266
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183 3.089377
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133 3.084225
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750 3.075252
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656 3.063025
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447 3.068808
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895 3.091819
Note: Input DF in reproducible form is:
Lines <- " Y X1 X2 X3 X4 X5
1 3.2860 192.5115 2.1275 83381 11.4360 8.7440
2 3.2650 190.1462 2.0050 88720 11.4359 8.8971
3 3.2213 192.9773 2.0500 74130 11.4623 8.8380
4 3.1991 193.7058 2.1050 73930 11.3366 8.7536
5 3.2224 193.5407 2.0275 80875 11.3534 8.7555
6 3.2000 190.6049 2.0950 86606 11.3290 8.8555
7 3.1939 191.1390 2.0975 91402 11.2960 8.8433
8 3.1971 192.2921 2.2700 88181 11.2930 8.8681
9 3.1873 194.9700 2.3300 115959 1.9477 8.5245
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990
11 3.2409 194.5396 2.2025 136685 1.9649 8.4192
12 3.2112 195.1362 2.1900 136316 1.9750 8.3752
13 3.2231 193.3560 2.2475 140295 1.9691 8.3546
14 3.2015 192.9649 2.2575 139474 1.9500 8.3116
15 3.1744 194.0154 2.1900 146202 1.8476 8.2225
16 3.1646 194.4423 2.2650 142983 1.8600 8.1948
17 3.1708 194.9473 2.2425 141377 1.8522 8.2589
18 3.1675 193.9788 2.2400 141377 1.8600 8.2600
19 3.1744 194.2563 2.3000 149875 1.8718 8.2899
20 3.1410 193.4316 2.2300 129561 1.8480 8.2395
21 3.1266 191.2633 2.2550 122636 1.8440 8.2396
22 3.1486 192.0354 2.3600 130996 1.8570 8.8640
23 3.1282 194.3351 2.4825 92430 1.7849 8.1291
24 3.1214 193.5196 2.4750 94814 1.7624 8.1991
25 3.1230 193.2017 2.3725 87590 1.7660 8.2310
26 3.1182 192.1642 2.4475 87715 1.6955 8.2414
27 3.1203 191.3744 2.3775 89857 1.6539 8.2480
28 3.1156 192.2646 2.3725 92159 1.5976 8.1676
29 3.1270 192.7555 2.3675 97425 1.5896 8.1162
30 3.1154 194.0375 2.3725 87598 1.5277 8.2640
31 3.1104 192.0596 2.3850 93236 1.5132 7.9999
32 3.0846 192.2792 2.2900 94608 1.4990 8.1600
33 3.0569 193.2573 2.3050 84663 1.4715 8.2200
34 3.0893 192.7632 2.2550 67149 1.4955 7.9590
35 3.0991 192.1229 2.3050 75519 1.4280 7.9183
36 3.0879 192.1229 2.3100 76756 1.3839 7.9133
37 3.0965 192.0502 2.2175 61748 1.3130 7.8750
38 3.0655 191.2274 2.2300 41490 1.2823 7.8656
39 3.0636 191.6342 2.1925 51049 1.1492 7.7447
40 3.1097 190.9312 2.2150 21934 1.1626 7.6895"
DF <- read.table(text = Lines, header = TRUE)
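If a genuinely out-of-sample prediction is wanted instead (fit on the first 9 rows of each window, then predict the 10th), a variant sketched with the same `DF`:

```r
library(zoo)

pred_oos <- function(x) {
  x <- as.data.frame(x)
  fit <- lm(Y ~ ., head(x, -1))  # fit on all but the last row of the window
  predict(fit, tail(x, 1))       # predict Y for the held-out last row
}
transform(DF, Ypred = rollapplyr(DF, 10, pred_oos, by.column = FALSE, fill = NA))
```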
I am trying, in R, to indicate in which quintile a value of a variable falls for every month of my data frame, in this case based on volatility.
For each month I want to know, for each stock, whether it is in the most volatile quintile or in one of the others.
So far I have come up with the following function (see below). Unfortunately, the function only works in some cases and often gives the following error:
Error in cut.default(df$VOLATILITY, unique(breaks), label = FALSE, na.rm = TRUE) :
  invalid number of intervals
Could you give me some advice on how to improve this code so that it works properly?
It's relatively urgent. Many thanks!
quintilesVolByMonth <- function(x) {
  months <- as.vector(unique(x$DATE))
  dfx <- data.frame()
  for (n in seq(1, length(months))) {
    num <- 5
    print(paste("Appending month", months[n], sep = ""))
    df <- subset(x, DATE == months[n])
    breaks <- quantile(df$VOLATILITY, probs = seq(0, 1, 1/num), na.rm = TRUE)
    df$volquintile <- cut(df$VOLATILITY, unique(breaks),
                          label = FALSE, na.rm = TRUE)
    dfx <- rbind(dfx, df)
  }
  return(dfx)
}
Frame.Quintile <- quintilesVolByMonth(x)
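A minimal sketch of what triggers the error: when a month has too few distinct `VOLATILITY` values, `unique(breaks)` collapses to a single number, and `cut()` then interprets that lone number as a (non-integer) count of intervals:

```r
v <- c(0.073, 0.073)                        # a month with two identical values
breaks <- quantile(v, probs = seq(0, 1, 1/5), na.rm = TRUE)
unique(breaks)                              # collapses to the single value 0.073
# cut(v, unique(breaks), label = FALSE)     # Error: invalid number of intervals
```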
EXAMPLE OF THE DATA: The last column is what I am trying to get. The data here is just an example and not actual results.
> DATE <- c("01/10/2011","01/10/2012","01/10/2010","01/08/2010","01/10/2011","01/12/2011","01/09/2011","01/10/2011","01/09/2012","01/08/2012","01/02/2010","01/01/2011","01/09/2010","01/06/2010","01/07/2010","01/01/2012","01/01/2012","01/11/2011","01/09/2011","01/10/2011")
> NAME<-c("HOEK'S MACHINE DEAD - DELIST.","WORLD SCOPE (CADB TEST STOCK)","BRILL (KON.)", "BBL DEAD - 30/06/465", "GENK LOGISTICS","GROENIJK.YLCBN. DEAD - DELIST.31/05/479", "NOORD-EUR.HOUTH.","PALTHE DEAD - 4/2/475","GENERALE BANQUE DEAD - DEL. 30/12/490","STORK DEAD - TAKEOVER 905099","LOUVAIN-LA-NEUVE","VENTOS DEAD - 06/06/384","BRAINE-LE-COMTE SUSP 14/02/460","VILENZO DEAD - 25/11/370","ECONOSTO KON. DEAD - 07/07/374","ELECTRORAIL DEAD - DELIST 21/02/387","BLYSTEIN FL.1384","OBOURG (CIMENTS)","BRUGEFI DEAD - 31/07/475","GIB NEW")
> VOLATILITY<-c(0.3383, 0.084, 0.046, 0.0945, 0.0465, 0.2008, 0.1361, 0.2183, 0.1032, 0.1083, 0.0494, 0.0538, 0.0357, 0.037, 0.0386, 0.073, 0.073, 0.0393, 0.0687, 0.3308)
> VOLQUINTILE<-c(4,1,1,2,2,3,2,3,4,2,3,2,4,1,2,1,1,2,3,4)
>
> x<-data.frame(DATE,NAME,VOLATILITY, VOLQUINTILE)
> x
DATE NAME VOLATILITY VOLQUINTILE
1 01/10/2011 HOEK'S MACHINE DEAD - DELIST. 0.3383 4
2 01/10/2012 WORLD SCOPE (CADB TEST STOCK) 0.0840 1
3 01/10/2010 BRILL (KON.) 0.0460 1
4 01/08/2010 BBL DEAD - 30/06/465 0.0945 2
5 01/10/2011 GENK LOGISTICS 0.0465 2
6 01/12/2011 GROENIJK.YLCBN. DEAD - DELIST.31/05/479 0.2008 3
7 01/09/2011 NOORD-EUR.HOUTH. 0.1361 2
8 01/10/2011 PALTHE DEAD - 4/2/475 0.2183 3
9 01/09/2012 GENERALE BANQUE DEAD - DEL. 30/12/490 0.1032 4
10 01/08/2012 STORK DEAD - TAKEOVER 905099 0.1083 2
11 01/02/2010 LOUVAIN-LA-NEUVE 0.0494 3
12 01/01/2011 VENTOS DEAD - 06/06/384 0.0538 2
13 01/09/2010 BRAINE-LE-COMTE SUSP 14/02/460 0.0357 4
14 01/06/2010 VILENZO DEAD - 25/11/370 0.0370 1
15 01/07/2010 ECONOSTO KON. DEAD - 07/07/374 0.0386 2
16 01/01/2012 ELECTRORAIL DEAD - DELIST 21/02/387 0.0730 1
17 01/01/2012 BLYSTEIN FL.1384 0.0730 1
18 01/11/2011 OBOURG (CIMENTS) 0.0393 2
19 01/09/2011 BRUGEFI DEAD - 31/07/475 0.0687 3
20 01/10/2011 GIB NEW 0.3308 4
Does this work for you?
library(plyr)
vol1 <- ddply(mydata, .(DATE), transform,
              max.name = NAME[which.max(quantile(VOLATILITY))])
DATE NAME VOLATILITY max.name
1 01/01/2011 VENTOS DEAD - 06/06/384 0.0538 VENTOS DEAD - 06/06/384
2 01/01/2012 ELECTRORAIL DEAD - DELIST 21/02/387 0.0730 ELECTRORAIL DEAD - DELIST 21/02/387
3 01/01/2012 BLYSTEIN FL.1384 0.0730 ELECTRORAIL DEAD - DELIST 21/02/387
4 01/02/2010 LOUVAIN-LA-NEUVE 0.0494 LOUVAIN-LA-NEUVE
5 01/06/2010 VILENZO DEAD - 25/11/370 0.0370 VILENZO DEAD - 25/11/370
6 01/07/2010 ECONOSTO KON. DEAD - 07/07/374 0.0386 ECONOSTO KON. DEAD - 07/07/374
7 01/08/2010 BBL DEAD - 30/06/465 0.0945 BBL DEAD - 30/06/465
8 01/08/2012 STORK DEAD - TAKEOVER 905099 0.1083 STORK DEAD - TAKEOVER 905099
9 01/09/2010 BRAINE-LE-COMTE SUSP 14/02/460 0.0357 BRAINE-LE-COMTE SUSP 14/02/460
10 01/09/2011 NOORD-EUR.HOUTH. 0.1361 <NA>
11 01/09/2011 BRUGEFI DEAD - 31/07/475 0.0687 <NA>
12 01/09/2012 GENERALE BANQUE DEAD - DEL. 30/12/490 0.1032 GENERALE BANQUE DEAD - DEL. 30/12/490
13 01/10/2010 BRILL (KON.) 0.0460 BRILL (KON.)
14 01/10/2011 HOEK'S MACHINE DEAD - DELIST. 0.3383 <NA>
15 01/10/2011 GENK LOGISTICS 0.0465 <NA>
16 01/10/2011 PALTHE DEAD - 4/2/475 0.2183 <NA>
17 01/10/2011 GIB NEW 0.3308 <NA>
18 01/10/2012 WORLD SCOPE (CADB TEST STOCK) 0.0840 WORLD SCOPE (CADB TEST STOCK)
19 01/11/2011 OBOURG (CIMENTS) 0.0393 OBOURG (CIMENTS)
20 01/12/2011 GROENIJK.YLCBN. DEAD - DELIST.31/05/479 0.2008 GROENIJK.YLCBN. DEAD - DELIST.31/05/479
Updated solution:
library(plyr)
vol2 <- ddply(x, .(DATE), transform,
              quantile = ifelse(VOLATILITY < quantile(VOLATILITY, p = 0.25), 1,
                         ifelse(VOLATILITY > quantile(VOLATILITY, p = 0.25) &
                                VOLATILITY < quantile(VOLATILITY, p = 0.5), 2,
                         ifelse(VOLATILITY > quantile(VOLATILITY, p = 0.5) &
                                VOLATILITY < quantile(VOLATILITY, p = 0.75), 3, 4))))
DATE NAME VOLATILITY quantile
1 01/01/2011 VENTOS DEAD - 06/06/384 0.0538 4
2 01/01/2012 ELECTRORAIL DEAD - DELIST 21/02/387 0.0730 4
3 01/01/2012 BLYSTEIN FL.1384 0.0730 4
4 01/02/2010 LOUVAIN-LA-NEUVE 0.0494 4
5 01/06/2010 VILENZO DEAD - 25/11/370 0.0370 4
6 01/07/2010 ECONOSTO KON. DEAD - 07/07/374 0.0386 4
7 01/08/2010 BBL DEAD - 30/06/465 0.0945 4
8 01/08/2012 STORK DEAD - TAKEOVER 905099 0.1083 4
9 01/09/2010 BRAINE-LE-COMTE SUSP 14/02/460 0.0357 4
10 01/09/2011 NOORD-EUR.HOUTH. 0.1361 4
11 01/09/2011 BRUGEFI DEAD - 31/07/475 0.0687 1
12 01/09/2012 GENERALE BANQUE DEAD - DEL. 30/12/490 0.1032 4
13 01/10/2010 BRILL (KON.) 0.0460 4
14 01/10/2011 HOEK'S MACHINE DEAD - DELIST. 0.3383 4
15 01/10/2011 GENK LOGISTICS 0.0465 1
16 01/10/2011 PALTHE DEAD - 4/2/475 0.2183 2
17 01/10/2011 GIB NEW 0.3308 3
18 01/10/2012 WORLD SCOPE (CADB TEST STOCK) 0.0840 4
19 01/11/2011 OBOURG (CIMENTS) 0.0393 4
20 01/12/2011 GROENIJK.YLCBN. DEAD - DELIST.31/05/479 0.2008 4
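A sketch of a more robust per-month quintile, assuming the same `x` as in the question: de-duplicate the breaks and use `findInterval()` (which tolerates any number of breaks) instead of `cut()`, with a fallback for months where all values are identical:

```r
quintilesVolByMonth <- function(x, num = 5) {
  x$volquintile <- ave(x$VOLATILITY, x$DATE, FUN = function(v) {
    breaks <- unique(quantile(v, probs = seq(0, 1, 1/num), na.rm = TRUE))
    if (length(breaks) < 2) return(rep(1L, length(v)))  # one distinct value only
    findInterval(v, breaks, rightmost.closed = TRUE)
  })
  x
}

Frame.Quintile <- quintilesVolByMonth(x)
```

`ave()` applies the function within each `DATE` group and returns a vector aligned with the rows of `x`, so no explicit loop or `rbind()` is needed.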