Afternoon! I'm just starting out with R and learning about data frames, packages, etc... read a lot of the messages here but couldn't find an answer.
I have a table I'm accessing with R that has the following fields:
And, I'm calculating SMAs on the close prices:
sqlQuery <- "Select * from [dbo].[Stock_Data]"
conn <- odbcDriverConnect(connectionString)
dfSMA <- sqlQuery(conn, sqlQuery)
sma20 <- SMA(dfSMA$Close, n = 20)
dfSMA["SMA20"] <- sma20
When I look at the output, it appears to be calculating the SMA without any regard for what the symbol is. I haven't tried to replicate the calculation, but I would suspect it's just doing it by 20 moving rows, regardless of date/symbol.
How do I restrict the calculation to a given symbol?
Any help is appreciated - just need to be pointed in the right direction.

You're far more likely to get answers if you provide reproducible examples. First, let's replicate your data:
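library(quantmod)  # getSymbols(), OHLCV(), Cl(); also attaches xts, zoo and TTR
library(stringr)   # str_replace()
symbols <- c("GS", "MS")
getSymbols(symbols)  # assumed step: creates the GS and MS xts objects used below
# Create example data:
dGS <- data.frame("Symbol" = "GS", "Date" = index(GS), coredata(OHLCV(GS)))
names(dGS) <- str_replace(names(dGS), "GS\\.", "")
dMS <- data.frame("Symbol" = "MS", "Date" = index(MS), coredata(OHLCV(MS)))
names(dMS) <- str_replace(names(dMS), "MS\\.", "")
dfSMA <- rbind(dGS, dMS)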
> head(dfSMA)
Symbol Date Open High Low Close Volume Adjusted
1 GS 2007-01-03 200.60 203.32 197.82 200.72 6494900 178.6391
2 GS 2007-01-04 200.22 200.67 198.07 198.85 6460200 176.9748
3 GS 2007-01-05 198.43 200.00 197.90 199.05 5892900 177.1528
4 GS 2007-01-08 199.05 203.95 198.10 203.73 7851000 181.3180
5 GS 2007-01-09 203.54 204.90 202.00 204.08 7147100 181.6295
6 GS 2007-01-10 203.40 208.44 201.50 208.11 8025700 185.2161
What you want to do is subset your long data object, and then apply technical indicators to each symbol in isolation. Here is one approach to guide you toward achieving your desired result.
You could do this with a list, building the indicators on an xts object for each symbol rather than on a data.frame as in your example. (You can apply the TTR functions to columns of a data.frame, but it is ugly; working with xts objects is much cleaner.) This is a template for how you could do it. The final output should be intuitive to work with: each symbol is kept in a separate "container" (an element of the list) rather than combined with all the other symbols in one data.frame, which isn't easy to work with.
make_xts_from_long_df <- function(x) {
# Subset the symbol you desire
res <- dfSMA[dfSMA$Symbol == x, ]
# Create xts, then allow easy merge of technical indicators
x_res <- xts(OHLCV(res), order.by = res$Date)
merge(x_res, SMA(Cl(x_res), n = 20))
}
# One list element per symbol, named by symbol
xts_list <- setNames(lapply(symbols, make_xts_from_long_df), symbols)
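If you would rather keep everything in one long data frame, the same "one symbol at a time" idea can be sketched in base R. This is only an illustration under assumptions: the data are made-up numbers, `sma()` is a hypothetical helper written here (not the TTR function), and the rows are assumed to be sorted by date within each symbol already.

```r
# Toy long-format data: two symbols, six closes each (made-up numbers)
df <- data.frame(
  Symbol = rep(c("GS", "MS"), each = 6),
  Close  = c(10, 11, 12, 13, 14, 15, 100, 102, 104, 106, 108, 110),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trailing simple moving average over n rows.
# The first n - 1 values are NA because the window is not yet full.
sma <- function(x, n) {
  if (length(x) < n) return(rep(NA_real_, length(x)))
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

# ave() runs the function separately within each Symbol and puts the
# results back in the original row order, so windows never cross symbols.
df$SMA3 <- ave(df$Close, df$Symbol, FUN = function(x) sma(x, n = 3))
df
```

The first two rows of each symbol come back as NA, which is a quick way to confirm that the window restarts at every symbol boundary.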


Recursive / Expanding Window forecasts

I am having a small issue with my RStudio code. I will try to replicate my code, but unfortunately there is no easy data for me to show. This is about the forecast package. What I am looking for is somewhat simpler than what is in the manual, but unfortunately I am not able to work around it.
The issue is with an expanding window forecast. I have a dependent variable Y and 3 regressors (X). I am trying to build a recursive one-step-ahead forecast for each X.
Here is my code.
## Load data
data = Dataset[,2:ncol(Dataset)]
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
data = xts(data, order.by = tt)
RECFORECAST=function (Y,X,h,window){
st <- as.Date("1990-1-1")
en <- as.Date("2020-12-1")
tt <- seq(st, en, by = "1 month")
datas= cbind(Y,X)
newfcast= matrix(0,nrow(datas),h)
for (k in 1:nrow(datas)){
sample =datas[1:(window+k-1),]
# print(sample)
v= window+k
# print(v)
# fit = Arima(sample[,1], order=c(0,0,0),xreg=sample[,2])
fit = lm(sample[,1]~sample[,2], data = sample)
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
# print(fcast)
# newfcast[k+window+1,]=fcast
## Code to send the loop into forecasts
StoreMatrix = data$growth ## This is the first column data[,1]
for (i in 2:4)
RecModel=RECFORECAST(Y,X,h=1,window=60) ##Here the initial window is 60 obs
}, silent=T)
The lines commented out with # were different ways I tried to cross-check my data and may not be useful. I have tried so many things, but I can't seem to get my head around it. At the end I want a matrix (StoreMatrix) whose first column is the realization, with each of the other columns holding the corresponding 1-step-ahead forecast.
The main lines where there seems to be an issue are these ones:
# fcast=forecast(fit,xreg=rep(sample[v,2],h))$mean
fcast = forecast.lm(fit,sample[v,2],h=1)$mean
Not sure how to solve this. Thank you very much.
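For reference, here is a minimal sketch of an expanding-window, one-step-ahead forecast using only base R (`lm()` and `predict()`), not the forecast package. Everything in it is an assumption made for illustration: the data are simulated, `rec_forecast()` is a hypothetical helper, and, as in the code above, the regressor value for the forecast period is treated as known.

```r
set.seed(1)
n <- 100
# Simulated stand-ins for the three regressors and the dependent variable
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- 0.5 * X[, 1] + rnorm(n)

# Hypothetical helper: fit on observations 1..t, forecast observation t + 1,
# then grow the sample by one observation and repeat.
rec_forecast <- function(y, x, window) {
  n  <- length(y)
  fc <- rep(NA_real_, n)  # no forecast exists for the initial window
  for (t in window:(n - 1)) {
    train <- data.frame(y = y[1:t], x = x[1:t])
    fit   <- lm(y ~ x, data = train)
    # newdata must use the same column name as the formula ("x")
    fc[t + 1] <- predict(fit, newdata = data.frame(x = x[t + 1]))
  }
  fc
}

# First column: realizations; one forecast column per regressor
store <- cbind(
  actual = y,
  sapply(colnames(X), function(j) rec_forecast(y, X[, j], window = 60))
)
tail(store, 3)
```

Naming the regressor in the formula and passing a data frame with that same name to `predict()` is what lets the fitted model be evaluated on a new observation; a formula written as `sample[,1] ~ sample[,2]` gives `predict()` nothing to match new data against.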

Subset an xts by year into a list. Subset an xts by year and month into a list

I'm new to R and the Stack platforms.
sti <- getSymbols("^STI", src = "yahoo", auto.assign = F, from = "2007-01-01", to = "2017-12-31")
sti_adjusted <- sti[, 6]
I did this in order to subset the data into a list of years.
ls_sti_adjusted <- list(sti_adjusted["2007"], sti_adjusted["2008"], sti_adjusted["2009"], sti_adjusted["2010"], sti_adjusted["2011"], sti_adjusted["2012"], sti_adjusted["2013"], sti_adjusted["2014"], sti_adjusted["2015"], sti_adjusted["2016"], sti_adjusted["2017"])
I'm looking for a more elegant solution, like a for-loop maybe?
ls_sti_adjusted <- list()
for (i in 2007:2017){
ls_sti_adjusted[[]] <- ???
}
The second issue is how can I further subset the elements into months in the year?
so for example: ls_sti_adjusted[[1]][[2]][[3]] returns the 3rd data point of February in 2007. Is this possible?
I hope that I am clear about the problem that I am facing. Thanks folks, plus any tips/tricks to understand loops and lists better would be greatly appreciated.
Combining .indexyear and split(x, f = "months") will give you the desired list.
lapply(unique(.indexyear(STI)), function(x) split.xts(STI[.indexyear(STI) == x, ], f = "months"))
If you only need yearly lists leave out the split part, like so:
lapply(unique(.indexyear(STI)),function(x) STI[.indexyear(STI) == x ,])
UPDATE: OP’s follow-up question regarding naming of lists
Assuming you named the list of lists object STIlist, you can do the following to name the list by years (keep in mind that the names are converted to strings!).
names(STIlist) <- 2007:2018
To get the list of the year 2007:
> both(STIlist[['2007']])
STI.Open STI.High STI.Low STI.Close STI.Volume STI.Adjusted
2007-01-03 3015.74 3037.74 3010.22 3037.74 192739200 3037.74
2007-01-04 3035.08 3045.18 3008.23 3023.80 198216700 3023.80
2007-01-05 3031.09 3038.27 3000.50 3029.04 233321400 3029.04
STI.Open STI.High STI.Low STI.Close STI.Volume STI.Adjusted
2007-12-27 3469.11 3491.65 3459.97 3477.20 91474200 3477.20
2007-12-28 3452.18 3463.38 3441.96 3445.82 109442100 3445.82
2007-12-31 3424.48 3482.30 3424.48 3482.30 205741900 3482.30
If you need more information about naming lists, "Google is your best friend", or post another question :-)
For the first question, something like this? (Keyed on the year rather than on each individual date, so that you get one list element per year.)
ls_sti_adjusted <- lapply(unique(format(index(sti_adjusted), "%Y")), function(x) sti_adjusted[x])
We could use the indexing directly from xts, check ?index.xts:
split(sti_adjusted, .indexyear(sti_adjusted))
In order to keep the correct naming 2012, 2013, ..., we can try:
split(sti_adjusted, as.integer(format(index(sti_adjusted), '%Y')))
Of course this can be nested in a list as much as you want:
nestedList <- lapply(
split(sti_adjusted, .indexyear(sti_adjusted))
, function(x) split(x, .indexmon(x))
)
nestedList[[3]][[2]][3] #3.year, 2.month, 3. obs.
Example using built-in data from xts:
data(sample_matrix, package = "xts")
sample_matrix <- as.xts(sample_matrix)
nestedList <- lapply(
split(sample_matrix, .indexyear(sample_matrix))
, function(x) split(x, .indexmon(x))
)
nestedList[[1]][[3]][5] #1. year, 3. month, 5. obs.
Open High Low Close
2007-03-05 50.26501 50.3405 50.26501 50.29567
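To see the nesting and the naming in isolation, here is a small base-R sketch on a plain data frame. The data are made up and the object names are hypothetical. The point is only that splitting on `format(date, "%Y")` and then on `format(date, "%m")` yields a list of lists whose elements can be looked up by year and month strings.

```r
# Made-up daily data covering two full years
dates  <- seq(as.Date("2007-01-01"), as.Date("2008-12-31"), by = "day")
prices <- data.frame(Date = dates, Value = seq_along(dates))

# Outer split by year ("2007", "2008"), inner split by month ("01".."12")
by_year <- split(prices, format(prices$Date, "%Y"))
nested  <- lapply(by_year, function(d) split(d, format(d$Date, "%m")))

names(nested)                  # "2007" "2008"
nested[["2007"]][["02"]][3, ]  # third observation of February 2007
```

Because the list names come from the formatted dates, there is no separate renaming step, and `nested[["2007"]][["02"]]` is easier to read than positional indices.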

Filter xts objects by common dates

I am stuck with the following code.
For reference, the code is taken from a website, and I am running it in RStudio.
startDate = as.Date("2013-01-01")
symbolData <- new.env()
getSymbols(symbolLst, env = symbolData, src = "yahoo", from = startDate)
stockPair <- list(
a =coredata(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[1],"\"",sep="")))))
,b = coredata(Cl(eval(parse(text=paste("symbolData$\"",symbolLst[2],"\"",sep="")))))
,hedgeRatio = 0.70 ,name=title)
spread <- stockPair$a - stockPair$hedgeRatio*stockPair$b
I am getting the following error.
Error in stockPair$a - stockPair$hedgeRatio * stockPair$b :
non-conformable arrays
The reason these particular series don't match is because "WPL.AX" has an extra value (date:19-05-2014 - the matrix lengths are different) compared to "BHP". How can I solve this issue when loading data?
I have also tested other stock pairs such as "ANZ","WBC" with the source = "google" which produces two of the same length arrays.
> length(stockPair$a)
[1] 360
> length(stockPair$b)
[1] 359
Add code such as this prior to the stockPair computation, to trim each xts set to the intersection of dates:
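# intersect() may drop the Date class, so convert back with an explicit origin
common_dates <- as.Date(Reduce(intersect, eapply(symbolData, index)), origin = "1970-01-01")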
symbolData <- eapply(symbolData, `[`, i=common_dates)
Your code works fine if you don't convert your xts object to matrix via coredata. Then Ops.xts will ensure that only the rows with the same index will be subtracted. And fortune(106) applies.
# If the answer is parse() you should usually rethink the question.
# -- Thomas Lumley
# R-help (February 2005)
stockPair <- list(
a = Cl(symbolData[[symbolLst[1]]])
,b = Cl(symbolData[[symbolLst[2]]])
,hedgeRatio = 0.70
,name = "title")
spread <- stockPair$a - stockPair$hedgeRatio*stockPair$b
Here's an alternative approach:
# merge stocks into a single xts object
stockPair <- do.call(merge, eapply(symbolData, Cl))
# ensure stockPair columns are in the same order as symbolLst, since
# eapply may loop over the environment in an order you don't expect
stockPair <- stockPair[,pmatch(symbolLst, colnames(stockPair))]
colnames(stockPair) <- c("a","b")
# add hedgeRatio and name as xts attributes
xtsAttributes(stockPair) <- list(hedgeRatio=0.7, name="title")
spread <- stockPair$a - attr(stockPair,'hedgeRatio')*stockPair$b
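The underlying idea, keeping only the dates present in both series before subtracting, can also be sketched without xts. The example below uses plain data frames and an inner join with base `merge()`. The prices are made up, and the names `a`, `b` and `pair` are only for illustration.

```r
# Series a has five dates; series b is missing 2014-05-17 (made-up prices)
a <- data.frame(Date  = as.Date("2014-05-15") + 0:4,
                Close = c(10, 11, 12, 13, 14))
b <- data.frame(Date  = as.Date("2014-05-15") + c(0, 1, 3, 4),
                Close = c(5, 5, 6, 6))

# merge() defaults to an inner join: only dates found in both are kept
pair <- merge(a, b, by = "Date", suffixes = c(".a", ".b"))

hedge_ratio <- 0.7
pair$spread <- pair$Close.a - hedge_ratio * pair$Close.b
pair
```

Both price columns now have the same length by construction, so the "non-conformable arrays" situation cannot arise.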

Scrape number of articles on a topic per year from NYT and WSJ?

I would like to create a data frame that scrapes the NYT and WSJ and has the number of articles on a given topic per year. That is:
Year NYT WSJ
2011 2 3
2012 10 7
I found this tutorial for the NYT but it is not working for me :_(. When I get to line 30 I get this error:
> cts <- as.data.frame(table(dat))
Error in provideDimnames(x) :
length of 'dimnames' [1] not equal to array extent
Any help would be much appreciated.
PS: This is my code that is not working (a NYT API key is needed):
# Need to install from source
# then load:
### set parameters ###
api <- "API key goes here" ###### <<<API key goes here!!
q <- "MOOCs" # Query string, use + instead of space
records <- 500 # total number of records to return, note limitations above
# calculate parameter for offset
os <- 0:(records/10-1)
# read first set of data in
uri <- paste("", q, "&offset=", os[1], "&fields=date&api-key=", api, sep="")
raw.data <- readLines(uri, warn = FALSE) # get them
res <- fromJSON(raw.data) # tokenize
dat <- unlist(res$results) # convert the dates to a vector
# read in the rest via loop
for (i in 2:length(os)) {
# concatenate URL for each offset
uri <- paste("", q, "&offset=", os[i], "&fields=date&api-key=", api, sep="")
raw.data <- readLines(uri, warn = FALSE)
res <- fromJSON(raw.data)
dat <- append(dat, unlist(res$results)) # append
}
# aggregate counts for dates and coerce into a data frame
cts <- as.data.frame(table(dat))
# establish date range
dat.conv <- strptime(dat, format="%Y%m%d") # need to convert dat into POSIX format for this
daterange <- c(min(dat.conv), max(dat.conv))
dat.all <- seq(daterange[1], daterange[2], by="day") # all possible days
# compare dates from counts dataframe with the whole data range
# assign 0 where there is no count, otherwise take count
# (take out PSD at the end to make it comparable)
dat.all <- strptime(dat.all, format="%Y-%m-%d")
# can't seem to be able to compare POSIX objects with %in%, so coerce them to character for this:
freqs <- ifelse(as.character(dat.all) %in% as.character(strptime(cts$dat, format="%Y%m%d")), cts$Freq, 0)
plot (freqs, type="l", xaxt="n", main=paste("Search term(s):",q), ylab="# of articles", xlab="date")
axis(1, 1:length(freqs), dat.all)
lines(lowess(freqs, f=.2), col = 2)
UPDATE: the repo is now at
There is a RNYTimes package created by Duncan Temple-Lang - but it is outdated because the NYTimes API is on v2 now. I've been working on one for political endpoints only, but not relevant for you.
I'm rewiring RNYTimes right now...Install from github. You need to install devtools first to get install_github
Then try your search with that, e.g.,
library(RNYTimes); library(plyr)
moocs <- searchArticles("MOOCs", key = "<yourkey>")
This gives you number of articles found
[1] 121
You could get word counts for each article by
as.numeric(sapply(moocs$response$docs, "[[", 'word_count'))
[1] 157 362 1316 312 2936 2973 355 1364 16 880
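Once you have a vector of publication dates for each source, building the year-by-source table from the question is a small tabulation step. The sketch below uses made-up date strings in the `YYYYMMDD` format the question's code expects. `nyt_dates`, `wsj_dates` and `count_by_year()` are hypothetical names, and no API is called.

```r
# Made-up publication dates (YYYYMMDD), standing in for API results
nyt_dates <- c("20110412", "20111103", "20120105", "20120220", "20120917")
wsj_dates <- c("20110630", "20120301", "20121211")

years <- as.character(2011:2012)

# Hypothetical helper: count dates per year. Using a factor with fixed
# levels means a year with no articles is reported as 0, not dropped.
count_by_year <- function(d) {
  yr <- factor(substr(d, 1, 4), levels = years)
  as.vector(table(yr))
}

counts <- data.frame(
  Year = as.integer(years),
  NYT  = count_by_year(nyt_dates),
  WSJ  = count_by_year(wsj_dates)
)
counts
```

Fixing the factor levels up front also keeps the two sources aligned row by row, even when one of them has no hits in a given year.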

Creating a function by taking few arguments and calculating

I'm still working on a question from a couple of days ago and would like feedback/support on how I could create a function. Your expertise is highly appreciated.
I have created the following:
##### 1)
> raceIDs
[1] "GER" "SUI" "NZ2" "US1" "US2" "POR" "FRA" "AUS" "NZ1" "SWE"
##### 2)
#For each "raceIDs", there is a csv file which I have made a loop to read and created a list of data frames (assigned to the symbol "boatList")
#For example, if I select "NZ1" the output is:
> head(boatList[[9]]) #Only selected the first six lines as there is more than 30000 rows
Boat Date Secs LocalTime SOG
1 NZ1 01:09:2013 38150.0 10:35:49.997 22.17
2 NZ1 01:09:2013 38150.2 10:35:50.197 22.19
3 NZ1 01:09:2013 38150.4 10:35:50.397 22.02
4 NZ1 01:09:2013 38150.6 10:35:50.597 21.90
5 NZ1 01:09:2013 38150.8 10:35:50.797 21.84
6 NZ1 01:09:2013 38151.0 10:35:50.997 21.95
##### 3)
# A matrix showing the race times for each raceIDs
> raceTimes
start finish
GER "11:10:02" "11:35:05"
SUI "11:10:02" "11:35:22"
NZ2 "11:10:02" "11:34:12"
US1 "11:10:01" "11:33:29"
US2 "11:10:01" "11:36:05"
POR "11:10:02" "11:34:31"
FRA "11:10:02" "11:34:45"
AUS "11:10:03" "11:36:48"
NZ1 "11:10:01" "11:35:16"
SWE "11:10:03" "11:35:08"
What I need to do is I need to calculate the average speed (SOG) of a boat "while it was racing" (between start and finish times) by creating a function called meanRaceSpeed and having three arguments:
What I have tried so far is to create a function with 3 arguments (with a bit of help from experts here):
meanRaceSpeed <- function(raceIDs, boatList, raceTimes)
#Probably need to compare times, and thought it might be useful to convert character values into `DateTime` values but not too sure how to use it
#DateTime <- as.POSIXct(paste(boatList$Date, boatList$Time), format="%Y%m%d %H%M%S")
#To get the times for each boat
start_time <- raceTimes$start[rownames(raceTimes) = raceIDs]
finish_time <- raceTimes$finish[rownames(raceTimes) = raceIDs]
start_LocalTime <- min(grep(start_time, boatList$LocalTime))
finish_LocalTime <- max(grep(finish_time, boatList$LocalTime))
#which `SOG`s contain all the `LocalTimes` between start and finish
#take their `mean`
mean(boatList$SOG[start_LocalTime : finish_LocalTime])
### Obviously, my code does not work :( and I don't know where.
So basically, I need to create a function with three arguments and the expected result is:
#e.g For NZ1
> meanRaceSpeed("NZ1", boatList, raceTimes)
[1] 18.32 #Mean speed for NZ1 between 11:10:01 - 11:35:16
#e.g for US1
> meanRaceSpeed("US1", boatList, raceTimes)
[1] 17.23 #Mean speed for US1 between 11:10:01 - 11:33:29
Any help on where I could have gone wrong? I'd highly appreciate it.
I'm going to give some general advice for R, but I will also help you with your specific question. Whenever I have a problem in R, I usually find that it helps to make things more explicit.
If the function isn't working with one way of indexing (is raceTimes a data frame or a matrix inside your function? `$` only works on the former), try another way of manipulating the table. How?
Here's a few different things you can do to test your function, and a few suggestions that may move you along a bit. (I don't want to fix the whole thing for you, since it's your homework, but rather get you on your way.)
1) Why not try using a loop instead of brackets?
start_time <- raceTimes$start[rownames(raceTimes) = raceIDs]
Make that into a for loop. It's not too hard to do.
2) Debug your functions. There are a lot of tools for this built into R, and more in add-on packages. Since you likely don't have time for that with your homework, I'd suggest this instead: take apart the function and run each part on its own with the values you want. Are the results of the right length? Are they the right data type? Do they give the right answer before you put them all together? Make sure of that.
3) If all else fails, don't be afraid if the function and code is not elegant. R is not always an elegant language. (Actually, it's rarely an elegant language.) Especially when you're a beginner, your code will likely be ugly. Just make sure it works.
Since I already had experience with your data, I sat down to make a complete example.
First, data that look like yours:
raceIDs <- c("GER", "SUI", "NZ2", "US1", "US2", "POR", "FRA", "AUS", "NZ1", "SWE")
raceTimes <- as.matrix(read.table(text = ' start finish
GER "11:10:02" "11:35:05"
SUI "11:10:02" "11:35:22"
NZ2 "11:10:02" "11:34:12"
US1 "11:10:01" "11:33:29"
US2 "11:10:01" "11:36:05"
POR "11:10:02" "11:34:31"
FRA "11:10:02" "11:34:45"
AUS "11:10:03" "11:36:48"
NZ1 "11:10:01" "11:35:16"
SWE "11:10:03" "11:35:08"', header = T))
#turn matrix to data.frame or, else, `$` won't work
raceTimes <- as.data.frame(raceTimes, stringsAsFactors = F)
blDF <- data.frame(Boat = rep(raceIDs, 3),
LocalTime = c(raceTimes$start, rep("11:20:25", length(raceIDs)), raceTimes$finish),
SOG = runif(3 * length(raceIDs), 15, 25), stringsAsFactors = F)
boatList <- split(blDF, blDF$Boat)
#remove `names` to create them from scratch
names(boatList) <- NULL
#create `names` by searching each element of
#`boatList` of what `boat` it contains
names(boatList) <- unlist(lapply(boatList, function(x) unique(x$Boat)))
#the function
meanRaceSpeed <- function(ID, boatList, raceTimes)
{ #named the first argument `ID` instead of `raceIDs`
start_time <- raceTimes$start[rownames(raceTimes) == ID]
finish_time <- raceTimes$finish[rownames(raceTimes) == ID]
start_LocalTime <- min(grep(start_time, boatList[[ID]]$LocalTime))
finish_LocalTime <- max(grep(finish_time, boatList[[ID]]$LocalTime))
mean(boatList[[ID]]$SOG[start_LocalTime : finish_LocalTime])
}
meanRaceSpeed("US1", boatList, raceTimes)
#[1] 19.7063
meanRaceSpeed("NZ1", boatList, raceTimes)
#[1] 21.74729
mean(boatList$NZ1$SOG) #to test function
#[1] 21.74729
mean(boatList$US1$SOG) #to test function
#[1] 19.7063
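An alternative to locating the start and finish rows with `grep` is to compare times numerically, which also works when no row matches the start or finish second exactly. The sketch below is an assumption-laden illustration: `to_secs()` and `mean_speed_between()` are hypothetical helpers, the data are made up, and the finish time is treated as inclusive up to exactly that second.

```r
# Hypothetical helper: "HH:MM:SS" or "HH:MM:SS.sss" -> seconds since midnight
to_secs <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Made-up track for one boat; first and last rows fall outside the race
boat <- data.frame(
  LocalTime = c("11:09:59.800", "11:10:01.000", "11:20:00.500",
                "11:35:16.000", "11:35:16.400"),
  SOG = c(5, 20, 22, 18, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean SOG for rows with start <= time <= finish
mean_speed_between <- function(boat, start, finish) {
  t <- to_secs(boat$LocalTime)
  racing <- t >= to_secs(start) & t <= to_secs(finish)
  mean(boat$SOG[racing])
}

mean_speed_between(boat, "11:10:01", "11:35:16")
```

Inside a function like `meanRaceSpeed`, the same comparison could be applied to `boatList[[ID]]` with the start and finish taken from `raceTimes`.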
