I would like to collect information on several stocks using a loop and save all the information required into a single data frame. I need to use a loop because the approach I have used (see below) is not very efficient. It retrieves information only for select stocks and skips some. Below is what I've tried:
library(quantmod)
library(TTR)
stocks <-c("MRO", "TSLA", "HAL", "XOM", "DIN", "DRI", "DENN","WEN", "SPCE", "DE", "DRI", "KSS", "AAL","DFS", "LYV","SPXL")
dataEnv <- new.env()
getSymbols(stocks, from = "2014-02-01",to= "2016-01-01", env=dataEnv)
plist <- eapply(dataEnv,Ad)
pframe <- do.call(merge, plist)
pframe1 <- as.data.frame(apply(pframe[,1:ncol(pframe)],2,function(x) diff(x)*100/head(x,-1)))
You can use either the tidyquant or the BatchGetSymbols package. My personal preference is the latter when dealing with data coming from Yahoo.
Using tidyquant:
library(tidyquant)
stocks <-c("MRO", "TSLA", "HAL", "XOM", "DIN", "DRI", "DENN","WEN", "SPCE", "DE", "DRI", "KSS", "AAL","DFS", "LYV","SPXL")
tq_stocks <- tq_get(stocks, from = "2014-02-01",to= "2016-01-01")
tq_stocks
# A tibble: 7,245 x 8
   symbol date        open  high   low close   volume adjusted
   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
 1 MRO    2014-02-03  32.8  32.8  32.0  32.1  8983000     28.1
 2 MRO    2014-02-04  32.2  32.4  31.9  32.3 10932900     28.4
 3 MRO    2014-02-05  32.3  32.4  31.6  32.1  6534500     28.1
 4 MRO    2014-02-06  31.7  33.0  31.6  31.8  9408400     27.9
 5 MRO    2014-02-07  31.9  32.8  31.7  32.6  8184400     28.6
 6 MRO    2014-02-10  32.5  32.5  32.0  32.3  5862600     28.3
 7 MRO    2014-02-11  32.3  32.9  32.3  32.7  6140400     28.7
 8 MRO    2014-02-12  33.0  33.3  32.8  33.3  5202500     29.2
 9 MRO    2014-02-13  33.0  33.4  32.7  33.3  6755900     29.2
10 MRO    2014-02-14  33.0  33.4  32.9  33.2  6096300     29.3
tidyquant will give some warnings. You can ignore these; a ticket has been opened to address them.
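To get from this long format to the percentage changes computed in the question, the returns have to be calculated within each symbol. Here is a minimal base R sketch, assuming a small made-up data frame called prices with the same symbol, date and adjusted columns as the tq_get() result (the tickers and numbers are invented for illustration):

```r
# Toy stand-in for the tq_get() result; tickers and values are made up.
prices <- data.frame(
  symbol   = rep(c("AAA", "BBB"), each = 3),
  date     = rep(as.Date(c("2014-02-03", "2014-02-04", "2014-02-05")), times = 2),
  adjusted = c(100, 110, 99, 50, 45, 54)
)

# Percentage change from the previous row, the same formula as in the question.
pct_change <- function(p) c(NA, diff(p) * 100 / head(p, -1))

# ave() applies the function within each symbol and keeps the row order,
# so the first row of every symbol is NA.
prices$return_pct <- ave(prices$adjusted, prices$symbol, FUN = pct_change)
prices
```

Because the calculation is grouped by symbol, the last price of one stock is never compared with the first price of the next.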
Using BatchGetSymbols:
library(BatchGetSymbols)
batch_stocks <- BatchGetSymbols(stocks, first.date = "2014-02-01", last.date = "2016-01-01")
str(batch_stocks)
List of 2
$ df.control: tibble [15 x 6] (S3: tbl_df/tbl/data.frame)
..$ ticker : chr [1:15] "MRO" "TSLA" "HAL" "XOM" ...
..$ src : chr [1:15] "yahoo" "yahoo" "yahoo" "yahoo" ...
..$ download.status : chr [1:15] "OK" "OK" "OK" "OK" ...
..$ total.obs : int [1:15] 483 483 483 483 483 483 483 483 483 483 ...
..$ perc.benchmark.dates: num [1:15] 1 1 1 1 1 1 1 1 1 1 ...
..$ threshold.decision : chr [1:15] "KEEP" "KEEP" "KEEP" "KEEP" ...
$ df.tickers:'data.frame': 6762 obs. of 10 variables:
..$ price.open : num [1:6762] 32.8 32.2 32.3 31.7 31.9 ...
..$ price.high : num [1:6762] 32.8 32.4 32.4 33 32.8 ...
..$ price.low : num [1:6762] 32 31.9 31.6 31.6 31.7 ...
..$ price.close : num [1:6762] 32.1 32.3 32.1 31.8 32.6 ...
..$ volume : num [1:6762] 8983000 10932900 6534500 9408400 8184400 ...
..$ price.adjusted : num [1:6762] 28.1 28.4 28.1 27.9 28.6 ...
..$ ref.date : Date[1:6762], format: "2014-02-03" "2014-02-04" "2014-02-05" "2014-02-06" ...
..$ ticker : chr [1:6762] "MRO" "MRO" "MRO" "MRO" ...
..$ ret.adjusted.prices: num [1:6762] NA 0.00873 -0.00742 -0.00903 0.02483 ...
..$ ret.closing.prices : num [1:6762] NA 0.00873 -0.00742 -0.00903 0.02483 ...
batch_stocks will be a list of 2 data.frames. The first is a control data.frame that shows whether all the tickers have been downloaded correctly. The second data.frame contains all the ticker data. An advantage of BatchGetSymbols is that it can run in parallel if you use it in combination with the future package. Also, if you already have the data locally, it will not download the data again. So if you run this three times in a row, it will download the data only once and read the rest from the temporarily stored data.
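As a sketch of how the two parts fit together, the control table can be used to keep only the tickers that downloaded cleanly. The list below is a hand-made stand-in that reuses the column names from the str() output above; nothing is downloaded, and the tickers, prices and the second status string are invented, so treat it as an illustration rather than the package's actual output:

```r
# Hand-made stand-in for the BatchGetSymbols() result (no download happens here).
mock_batch <- list(
  df.control = data.frame(
    ticker          = c("AAA", "BBB"),
    download.status = c("OK", "not downloaded")  # second status string is invented
  ),
  df.tickers = data.frame(
    ticker         = c("AAA", "AAA", "BBB"),
    ref.date       = as.Date(c("2014-02-03", "2014-02-04", "2014-02-03")),
    price.adjusted = c(28.1, 28.4, 10.0)
  )
)

# Tickers whose control row reports a clean download.
ok_tickers <- with(mock_batch$df.control, ticker[download.status == "OK"])

# Keep only the price rows that belong to those tickers.
clean <- subset(mock_batch$df.tickers, ticker %in% ok_tickers)
clean
```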
I have built a logistic regression model with the dependent variable WinParty, and it fits fine. However, when I try to do variable selection with stepAIC, I keep getting an error (shown below).
Data Structure
tibble [2,467 × 25] (S3: tbl_df/tbl/data.frame)
$ PollingPlace : chr [1:2467] "Abbotsbury" "Abbotsford" "Abbotsford East" "Aberdare" ...
$ CoalitionVotes : int [1:2467] 9438 15548 3960 3164 2370 4524 3186 10710 372 5993 ...
$ VoteDifference : num [1:2467] 0.1397 -0.0579 0.0796 -0.2454 0.2623 ...
$ Liberal.National.Coalition.Percentage: num [1:2467] 57 47.1 54 37.7 63.1 ...
$ WinParty : num [1:2467] 1 0 1 0 1 0 0 0 1 0 ...
$ Median_age_persons : num [1:2467] 43 46 41.5 37 41 31 37 36 57.5 41 ...
$ Median_mortgage_repay_monthly : num [1:2467] 2232 3000 2831 1452 1559 ...
$ Median_tot_prsnl_inc_weekly : num [1:2467] 818 1262 1380 627 719 ...
$ Median_rent_weekly : num [1:2467] 550 595 576 310 290 ...
$ Median_tot_fam_inc_weekly : num [1:2467] 2541 3062 3126 1521 2021 ...
$ Average_household_size : num [1:2467] 3.27 2.35 2.28 2.46 2.38 ...
$ Indig_Percent : num [1:2467] 0 0 1.09 10.94 10.61 ...
$ BirthPlace_Aus : num [1:2467] 60.9 67.9 61.7 90.9 89 ...
$ Other_lang_Percen : num [1:2467] 44.97 25.85 28.71 2.58 2.45 ...
$ Aus_Cit_Percent : num [1:2467] 91.5 91.5 86.6 93.7 91.9 ...
$ Yr12_Comp_Percent : num [1:2467] 49.7 57.1 62.7 25 23.1 ...
$ Pop_Density_SQKM : num [1:2467] 2849 6112 7951 1686 334 ...
$ Industrial_Percent : num [1:2467] 6.24 3.95 4.69 8.3 15.31 ...
$ Population_Serving_Percent : num [1:2467] 16 12.9 15.1 16.1 13.6 ...
$ Health_Education_Percent : num [1:2467] 9.26 11.43 10.28 9.07 7.79 ...
$ Knowledge_Intensive_Percent : num [1:2467] 11.31 19.64 17.06 7.44 6.56 ...
$ Over60_Yr : num [1:2467] 25.1 31.6 24.9 20.6 25.3 ...
$ GenZ : num [1:2467] 24.5 20 25.9 26.2 23.6 ...
$ GenX : num [1:2467] 27 29.1 26.6 25.8 26.1 ...
$ Millenials : num [1:2467] 23.3 20.3 19.7 27.3 27.1 ...
- attr(*, "na.action")= 'omit' Named int [1:8] 264 647 843 1332 1774 2033 2077 2138
..- attr(*, "names")= chr [1:8] "264" "647" "843" "1332" ...
The glm function fits the logistic regression with no errors:
mod1 <- glm(WinParty~Median_age_persons+Median_rent_weekly+
Median_tot_fam_inc_weekly+Indig_Percent+BirthPlace_Aus+
Other_lang_Percen+Aus_Cit_Percent+Yr12_Comp_Percent+
Industrial_Percent+Population_Serving_Percent+Health_Education_Percent+
Knowledge_Intensive_Percent+Over60_Yr+GenZ+GenX+Millenials,
family = binomial(link = "logit"), data = GS_PP_Agg)
summary(mod1)
step1 <- stepAIC(mod1, scope = list(lower = "~1",upper = "~Median_age_persons+Median_rent_weekly+
Median_tot_fam_inc_weekly+Indig_Percent+BirthPlace_Aus+
Other_lang_Percen+Aus_Cit_Percent+Yr12_Comp_Percent+
Industrial_Percent+Population_Serving_Percent+Health_Education_Percent+
Knowledge_Intensive_Percent+Over60_Yr+GenZ+GenX+Millenials"), data = GS_PP_Agg)
The stepAIC function returns the error:
"Error in FUN(left, right) : non-numeric argument to binary operator"
Some help in solving this error would be greatly appreciated!
I've run some analysis that outputs data in the following format:
> sft
   Power SFT.R.sq   slope truncated.R.sq mean.k. median.k. max.k.
1      1  0.35400  8.4300         0.7710  146.00    145.00  166.0
2      2  0.21900  2.2500         0.8960   83.30     82.80  107.0
3      3  0.17300  1.1600         0.9310   49.90     49.80   72.0
4      4  0.04100  0.3070         0.7360   31.60     31.20   50.3
5      5  0.00165 -0.0298         0.4610   21.30     21.00   37.3
6      6  0.05310 -0.1780        -0.1240   15.30     14.60   28.9
7      7  0.21300 -0.2610        -0.0113   11.60     10.90   24.0
8      8  0.63800 -0.5280         0.5560    9.27      8.18   22.3
9      9  0.82500 -0.6310         0.8110    7.69      6.14   21.2
10    10  0.85000 -0.7400         0.8100    6.59      4.97   20.3
11    11  0.82200 -0.8310         0.7710    5.77      3.95   19.6
12    12  0.81900 -0.8480         0.7680    5.16      3.27   19.0
13    13  0.73300 -0.8670         0.6660    4.67      2.80   18.4
14    14  0.65300 -0.9170         0.5840    4.28      2.39   17.9
15    15  0.70200 -0.9130         0.6440    3.97      2.22   17.4
What I want is to extract the Power that gave the highest (maximum) SFT.R.sq value.
Here is the structure of the object:
> str(sft)
List of 2
$ powerEstimate: int NA
$ fitIndices :'data.frame': 15 obs. of 7 variables:
..$ Power : int [1:15] 1 2 3 4 5 6 7 8 9 10 ...
..$ SFT.R.sq : num [1:15] 0.35392 0.21883 0.17291 0.04098 0.00165 ...
..$ slope : num [1:15] 8.4267 2.2461 1.158 0.307 -0.0298 ...
..$ truncated.R.sq: num [1:15] 0.771 0.896 0.931 0.736 0.461 ...
..$ mean.k. : num [1:15] 145.8 83.3 49.9 31.6 21.3 ...
..$ median.k. : num [1:15] 145.1 82.8 49.8 31.2 21 ...
..$ max.k. : num [1:15] 165.6 107.1 72 50.3 37.3 ...
I can grab the two columns I need easily with:
sft$fitIndices$Power
sft$fitIndices$SFT.R.sq
But I can't get the actual power associated with the highest SFT.R.sq value:
>sft$fitIndices$Power[max(sft$fitIndices$SFT.R.sq)]
integer(0)
Examples of what I'm trying to do usually involve dataframes where you extract a value based on the value from another column - but it doesn't seem to work with attributes.
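max() returns the largest value itself (here 0.85), not its position, and indexing with 0.85 truncates to index 0, which is why you get integer(0). We need which.max to return the position of the maximum, which can then be used to subset 'Power':
sft$fitIndices$Power[which.max(sft$fitIndices$SFT.R.sq)]
Also, if we need the whole row, extract the data.frame element and slice it. Note that order_by takes the unquoted column name:
library(dplyr)
library(purrr)
pluck(sft, "fitIndices") %>%
slice_max(order_by = SFT.R.sq, n = 1)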
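The difference between max() and which.max() can be checked on a cut-down copy of the table. The sketch below rebuilds a few rows of sft$fitIndices by hand (Power 8 to 12, values copied from the printout in the question) rather than running the original analysis:

```r
# Hand-built subset of the object from the question (rows for Power 8 to 12).
sft <- list(
  powerEstimate = NA_integer_,
  fitIndices = data.frame(
    Power    = 8:12,
    SFT.R.sq = c(0.638, 0.825, 0.850, 0.822, 0.819)
  )
)

fit <- sft$fitIndices

max(fit$SFT.R.sq)        # the largest value itself
which.max(fit$SFT.R.sq)  # its position in the vector

# Indexing with the value (0.85) truncates to index 0, hence the empty result.
empty <- fit$Power[max(fit$SFT.R.sq)]
length(empty)

# Indexing with the position returns the matching Power.
best_power <- fit$Power[which.max(fit$SFT.R.sq)]
best_power

# The whole row, in base R.
fit[which.max(fit$SFT.R.sq), ]
```

If several rows share the maximum, which.max() returns only the first of them.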
How do I properly refer to a list element in R when I am using map (or any other purrr function) and want to use the "x" from the map call to pick the appropriate element? For example, I have a list of 3 (let's call it testlist), and each element of that list is a data frame with a single column. Each column is a character vector (in this case, a set of symbols to be passed to tq_get in tidyquant). Below is some simplified code to help illustrate.
The following code works, but it's hardcoded:
library(tidyverse)
library(lubridate)
library(tidyquant)
library(purrr)
library(dplyr)
str(testlist)
List of 3
$ 2010-12-31:'data.frame': 12 obs. of 1 variable:
..$ symbol: chr [1:12] "ASH" "RS" "FUL" "RGLD" ...
$ 2011-12-31:'data.frame': 15 obs. of 1 variable:
..$ symbol: chr [1:15] "CBT" "RS" "TCK" "MEOH" ...
$ 2012-12-31:'data.frame': 13 obs. of 1 variable:
..$ symbol: chr [1:13] "CBT" "ATI" "RS" "SXT" ...
d <- tq_get((pull(testlist$`2012-12-31`)),
get = "stock.prices",
from = "2011-12-30",
to = "2013-12-31")
To clarify, each dataframe within the "testlist" list is labeled with a date. In this case 2012-12-31.
However, I would like to vary the date when referring to each data frame within "testlist". For example:
year <- as.Date("2012-12-31")
d <- tq_get((pull(testlist[year])),
get = "stock.prices",
from = "2011-12-30",
to = "2013-12-31")
This does not work. I have determined that if I'm referring to a column within a dataframe this will work:
testlist[,as.character(year)]
But clearly, referring to a column in a data frame is different from referring to a data frame within a list.
Here is the expected output. It works for the first example but not for the second.
d
# A tibble: 6,526 x 8
   symbol date        open  high   low close volume adjusted
   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
 1 CBT    2011-12-30  32.2  32.4  32.0  32.1 216100     25.9
 2 CBT    2012-01-03  33.2  33.6  32.9  33.2 410500     26.8
 3 CBT    2012-01-04  33.1  33.4  32.7  33.2 502100     26.8
 4 CBT    2012-01-05  32.9  32.9  32.0  32.7 688400     26.4
 5 CBT    2012-01-06  32.8  33.1  31.7  32.8 951900     26.4
 6 CBT    2012-01-09  32.9  33.2  32.5  32.7 393100     26.4
 7 CBT    2012-01-10  33.3  33.9  33.2  33.3 306300     26.9
 8 CBT    2012-01-11  33.3  33.7  33.2  33.5 209700     27.0
 9 CBT    2012-01-12  33.7  34.4  33.4  34.3 209800     27.7
10 CBT    2012-01-13  34.0  34.2  33.3  33.9 273200     27.4
# ... with 6,516 more rows
Any help would be appreciated!
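One thing that can be checked without downloading anything: the names of a list are character strings, so a Date object has to be converted with as.character() before it can select an element by name, and [[ (rather than [) is needed to get the data frame itself instead of a one-element list. Here is a minimal sketch with a made-up two-element list in the same shape as testlist (the symbols are placeholders taken from the str() output):

```r
# Made-up list in the same shape as testlist: data frames named by date strings.
testlist <- list(
  "2011-12-31" = data.frame(symbol = c("CBT", "RS")),
  "2012-12-31" = data.frame(symbol = c("CBT", "ATI", "RS"))
)

year <- as.Date("2012-12-31")
key  <- as.character(year)   # list names are strings, so convert the Date first

# `[` keeps the list wrapper; `[[` extracts the data frame stored under that name.
class(testlist[key])
class(testlist[[key]])

# The character vector of symbols for that date.
symbols <- testlist[[key]]$symbol
symbols
```

The resulting character vector is the kind of input the hardcoded tq_get call receives from pull(). When mapping over names(testlist) with purrr, the same testlist[[.x]] lookup should apply, since the names are already strings.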
I am using this data. When I convert the Date column with as.Date(), I get NA.
This is the code I am using. Can someone please advise what I am doing wrong?
dataf <- read.csv("DB.csv")
dataf$Date <- as.Date(dataf$Date, format = "%b-%Y")
str(dataf)
'data.frame': 55 obs. of 9 variables:
$ Date : Date, format: NA NA ...
$ Sydney : num 85.3 88.2 87 84.4 84.2 84.8 83.2 82.6 81.4 81.8 ...
$ Melbourne: num 60.7 62.1 60.8 60.9 60.9 62.4 62.5 63.2 63.1 64 ...
$ Brisbane : num 64.2 69.4 70.7 71.7 71.2 72 72.6 73.3 73.6 75 ...
$ Adelaide : num 62.2 63.9 64.8 65.9 67.1 68.6 68.6 69.3 70 71.6 ...
$ Perth : num 48.3 50.6 52.5 53.7 54.7 57.1 59.4 62.6 65.2 70 ...
$ Hobart : num 61.2 66.5 68.7 71.8 72.3 74.6 75.8 76.9 76.7 79.1 ...
$ Darwin : num 40.5 43.3 45.5 45.2 46.8 49.7 53.6 54.7 56.1 60.2 ...
$ Canberra : num 68.3 70.9 69.9 70.1 68.6 69.7 70.3 69.9 70.1 71.7 ...
In addition to the good suggestions in the comments, you could try lubridate::parse_date_time, which can handle incomplete dates:
as.Date("01-2017", format="%m-%Y")
# [1] NA
as.POSIXct("01-2017", format="%m-%Y")
# [1] NA
as.POSIXlt("01-2017", format="%m-%Y")
# [1] NA
library(lubridate)
parse_date_time("01-2017", "my")
# [1] "2017-01-01 UTC"
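The base R functions return NA here because a month and a year alone do not identify a day. If you would rather stay in base R, a common workaround is to paste a day in front of the string before parsing. Here is a small sketch using the same "01-2017" style as above. For month abbreviations such as "Jan-2017" (which the format "%b-%Y" in the question suggests, although the file itself is not shown) the format would be "%d-%b-%Y", and matching %b depends on the session locale:

```r
x <- c("01-2017", "02-2017")

# Prepend the first day of the month so each string is a complete date.
d <- as.Date(paste0("01-", x), format = "%d-%m-%Y")
d

# Back to a month-year label if that is all that is needed for display.
format(d, "%Y-%m")
```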
Currently, I am writing my bachelor thesis in economics. One part of my work is a comparison of ETF returns and the returns of their benchmark indices. For this, I want to use an R script. So far I have loaded my raw closing prices into the program and named the tables "ETFs" and "Benchmark". My next step is to calculate the daily returns of the ETFs from these closing prices. I started with a for-loop, but it failed.
The error message was:
Error in daylyreturn_ETFs[r, c] <- (ETFs[r, currColName]/ETFs[(r + 1), :
incorrect number of subscripts on matrix
Variables:
ETFs:
'data.frame': 1672 obs. of 21 variables:
$ Name : Factor w/ 1636 levels "01.02.2010","01.02.2011",..: 1608 1557 1502 1449 1252 1194 1139 1084 1029 863 ...
$ iShares.Core.S.P.500.USD.Acc : num 203 203 206 205 205 ...
$ iShares.Core.DAX.U.00AE...DE. : num 100 100 100 100 100 ...
$ iShares.Core.MSCI.World.USD.Acc : num 42 42.2 42.8 42.5 42.6 ...
$ iShares.S.P.500.USD.Dist : num 21.2 21.3 21.7 21.6 21.5 ...
$ iShares.EURO.STOXX.50..DE. : num 33.1 32.9 33 33 32.9 ...
$ iShares.Core..U.0080..Corp.Bond.EUR.Dist : num 130 130 130 130 130 ...
$ Lyxor.Euro.Stoxx.50.DR.ETF.D.EUR.A.I : num 31.9 31.9 32 32 32 ...
$ iShares..U.0080..High.Yield.Corp.Bond.EUR.Dist: num 107 107 106 106 106 ...
$ iShares.JP.Morgan...EM.Bond.USD.Dist : num 104 104 105 104 104 ...
$ iShares.MSCI.Europe.Dist : num 22.6 22.5 22.6 22.5 22.5 ...
$ iShares.STOXX.Europe.600..DE. : num 36.1 36 36.1 36 36 ...
$ iShares.EURO.STOXX.50.Dist : num 33.1 33.1 33.2 33.2 33.2 ...
$ iShares.MSCI.World.USD.Dist : num 35.5 35.5 35.9 35.8 35.7 ...
$ iShares.Edge.MSCI.USA.Size.Factor : num 5.28 5.27 5.3 5.29 4.32 4.32 4.31 4.32 4.33 4.28 ...
$ ETFS.Physical.Gold : num 106 106 106 105 104 ...
$ iShares.iBonds.Mar.2020.Term.Corp.exFncl : num 24.6 24.6 24.5 24.5 24.5 ...
$ iShares.Euro.Corporate.Bond.Large.Cap : num 135 135 135 135 135 ...
$ db.x.trackers.Euro.Stoxx.50..DR..1D : num 34.8 34.6 34.7 34.7 34.6 ...
$ db.x.trackers.Euro.Stoxx.50..DR..1C : num 44.1 44 44.1 44.1 44 ...
$ Xetra.Gold : num 35.3 35.5 35.3 35 34.9 ...
Code:
library(readr)
werte <- read_delim("~/Uni Frankfurt/Semester 7/Bachelorarbeit/R/werte.csv", ";", escape_double = FALSE, trim_ws = TRUE)
library(readr)
werte1 <- read_delim("~/Uni Frankfurt/Semester 7/Bachelorarbeit/R/werte1.csv", ";", escape_double = FALSE, trim_ws = TRUE)
write.table(x=werte, file = "werte.dat", sep = ";", dec = ",", row.names = FALSE, col.names = TRUE)
ETFs <- read.table("werte.dat", sep = ";", dec = ",", header = TRUE)
ETFs
write.table(x=werte1, file = "werte1.dat", sep = ";", dec = ".", row.names = FALSE, col.names = TRUE)
Benchmark <- read.table("werte1.dat", sep = ";", dec = ".", header = TRUE)
Benchmark
dailyreturn_ETFs<- array()
str(ETFs)
for(c in 2:ncol(ETFs))
{
currColName <- colnames(ETFs)[c];
for(r in nrow(ETFs[c])-1:1)
{
dailyreturn_ETFs[r,c] <- (ETFs[r,currColName]/ETFs[(r+1),currColName])-1
}
}
I am very thankful for any help. If you need additional information to get rid of the problem, don't hesitate to ask.
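Two things in the loop are worth checking. array() without arguments creates a one-dimensional array, so indexing it with [r, c] fails with the "incorrect number of subscripts" error. In addition, nrow(ETFs[c])-1:1 is parsed as nrow(ETFs[c]) - (1:1), because : binds more tightly than binary -, so the inner loop runs only once. A loop is not needed at all, though. The sketch below uses a small invented table in the same layout (a date column called Name followed by price columns, newest row first, as the r/(r+1) ratio in the question implies) and computes all returns in one step:

```r
# Invented prices in the same layout as ETFs: date column first, newest row first.
ETFs <- data.frame(
  Name  = c("03.01.2020", "02.01.2020", "01.01.2020"),
  ETF_A = c(110, 100, 80),
  ETF_B = c(21, 20, 25)
)

prices <- as.matrix(ETFs[-1])   # drop the date column, keep prices as a numeric matrix
n <- nrow(prices)

# Row r divided by row r + 1 (the previous trading day), minus 1.
dailyreturn_ETFs <- prices[-n, , drop = FALSE] / prices[-1, , drop = FALSE] - 1

# Label each return with the date of the later of the two prices.
rownames(dailyreturn_ETFs) <- ETFs$Name[-n]
dailyreturn_ETFs
```

The result has one row fewer than the price table, because the oldest day has no earlier price to compare with. If the real data were sorted oldest first, the two row selections would have to be swapped.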