Here is my code right now:
f=function(Symbol, start, end, interval){
getSymbols(Symbols=Symbol, from=start, to= end)
Symbol=data.frame(Symbol)
a=length(Symbol$Symbol.Adjusted)
b=a/interval
c=ceiling(b)
origData=as.data.frame(matrix(`length<-`(Symbol$Symbol.Adjusted, c * interval), ncol = interval, byrow = TRUE))
return(origData)
}
f("SPY", "2012-01-01", "2013-12-31", 10)
Next, I need to get the adjusted close price and use only this price data for the following tasks: split the daily adjusted close prices into N blocks, stored as rows of a data frame, so that each block contains M days of data (the columns), where M equals the time interval value. This data frame is called origData in my code.
The function is supposed to return the data frame origData, but whenever I try running this it tells me that the Symbol data frame is empty. How do I need to change my function to get the data frame output?
@IRTFM's observations are correct. Incorporating those changes, you can change your function to:
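library(quantmod)
f = function(Symbol, start, end, interval){
  # getSymbols() assigns the downloaded data to an object named after the ticker
  getSymbols(Symbols=Symbol, from=start, to=end)
  # retrieve that object from its name, which Symbol holds as a string
  data = get(Symbol)
  col = data[, paste0(Symbol, '.Adjusted')]
  a = length(col)
  b = a/interval
  c = ceiling(b)  # number of blocks (rows)
  # extend to c * interval values, then fill the matrix row by row
  origData = as.data.frame(matrix(`length<-`(col, c * interval),
                                  ncol = interval, byrow = TRUE))
  return(origData)
}
f("SPY", "2012-01-01", "2013-12-31", 10)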
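As a rough illustration of what the `length<-` and matrix expression does, here is a small sketch on a plain integer vector instead of downloaded prices. The names prices, n_blocks, padded and blocks are made up for this example; the idea is only to show the padding with NA and the row-wise fill, under the assumption that the adjusted prices behave like an ordinary vector.

```r
# Toy stand-in for the adjusted close prices: 23 "days" of data
prices <- 1:23
interval <- 10

# Number of rows needed so that every value fits: ceiling(23 / 10) = 3
n_blocks <- ceiling(length(prices) / interval)

# `length<-` extends the vector to n_blocks * interval entries, padding with NA
padded <- `length<-`(prices, n_blocks * interval)

# Fill a matrix row by row: each row is one block of `interval` days
blocks <- as.data.frame(matrix(padded, ncol = interval, byrow = TRUE))
print(blocks)
```

The last row holds the three leftover values followed by NA padding.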
I haven't figured out what the set of expressions inside the data.matrix call is supposed to do, and you made no effort to explain your intent. However, your error occurs farther up the line. If you put in a debugging call to str(Symbol), you will see that Symbol evaluates to "SPY", but that is just a character value and not an R object name. The object you want is named SPY, and the way to retrieve an object's value when you only have access to its name as a character value is to use the R function get. So try adding this after the getSymbols call inside the function:
library(quantmod) # I'm assuming this was the package in use
...
Symbol=data.frame( get(Symbol) )
str(Symbol) # will print the result at your console
....
# then perhaps you can work on what you were trying inside the data.matrix call
You will also find that the name Symbol.Adjusted will not work (since R is not a macro language). You will need to do something like:
a=length( Symbol[[ paste0(Symbol, ".Adjusted")]] )
Oh wait. You overwrote the value for Symbol. That won't work. You need to use a different name for your dataframe. So why don't you edit your question to fix the errors I've identified so far and also describe what you are trying to do when you were using as.data.frame.
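The two points above (look the object up with get, and keep the data frame under a different name than the string) can be sketched without quantmod or a network connection. Here SPY is a hand-made toy data frame standing in for what getSymbols would create, and prices and adj are names invented for this example.

```r
# Toy object standing in for the data that getSymbols("SPY") would assign
SPY <- data.frame(SPY.Open = c(1, 2, 3), SPY.Adjusted = c(10, 11, 12))

Symbol <- "SPY"          # only the name is known, as a string

# get() looks up the object whose name is stored in Symbol
prices <- get(Symbol)    # different name, so Symbol still holds "SPY"

# Build the column name from the string and extract with [[ ]]
adj <- prices[[paste0(Symbol, ".Adjusted")]]
a <- length(adj)
print(a)
```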
I'm using quantmod to work on multiple symbols in R. My instinct is to combine the symbols into a list of xts objects, then use lapply to do what I need to do. However, some of the things that make quantmod convenient seem (to this neophyte) not to play nicely with lists. An example:
> symbols <- c("SPY","GLD")
> getSymbols(symbols)
> prices.list <- mget(symbols)
> names(prices.list) <- symbols
> returns.list <- lapply(prices.list, monthlyReturn, leading = FALSE)
This works. But it's unclear to me which column of prices it is using. If I try to specify adjusted close, it throws an error:
> returns.list <- lapply(Ad(prices.list), monthlyReturn, leading = FALSE)
Error in Ad(prices.list) :
subscript out of bounds: no column name containing "Adjusted"
The help for Ad() confirms that it works on "a suitable OHLC object," not on a list of OHLC objects. In this particular case, how can I specify that lapply should apply the monthlyReturn function to the Adjusted column?
More generally, what is the best practice for working with multiple symbols in quantmod? Is it to use lists, or is another approach better suited?
Answer monthlyReturn:
All the *Return functions are based on periodReturn. By default, periodReturn makes sure its input is an xts object, then takes the open price as the start value and the close price as the end value and calculates the return, provided both are available. If they are not available, it calculates the return from the first and last values of the time series, taking into account the requested time interval (day, month, year, etc.).
Answer for lapply:
You want to do 2 operations on each element of a list, so you should use a function inside the lapply:
lapply(prices.list, function(x) monthlyReturn(Ad(x), leading = FALSE))
This will get what you want.
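The same pattern, an anonymous function that chains two operations per list element, can be tried out in base R without quantmod. In this sketch adjusted_col and total_return are hypothetical stand-ins for Ad() and monthlyReturn(), and the price lists are made-up numbers.

```r
# Toy list of price tables, one per symbol (made-up values)
prices.list <- list(
  SPY = data.frame(SPY.Close = c(101, 111, 122), SPY.Adjusted = c(100, 110, 121)),
  GLD = data.frame(GLD.Close = c(51, 50, 56),   GLD.Adjusted = c(50, 49, 55))
)

# Hypothetical stand-in for Ad(): pick the column whose name contains "Adjusted"
adjusted_col <- function(x) x[[grep("Adjusted", names(x))]]

# Hypothetical stand-in for monthlyReturn(): return from first to last value
total_return <- function(p) p[length(p)] / p[1] - 1

# Two operations per element, wrapped in one anonymous function
returns.list <- lapply(prices.list, function(x) total_return(adjusted_col(x)))
print(returns.list)
```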
Answer for multiple symbols:
Do what you are doing.
Or run an lapply when getting the symbols:
stock_prices <- lapply(symbols, getSymbols, auto.assign = FALSE)
Or use the tidyquant or BatchGetSymbols packages to get all the data in one big tibble.
... I probably forgot a few. There are multiple SO answers about this.
I have a program that is supposed to create a pdf file of actograms given a csv of activity and time. I need to loop through multiple activity columns, one for each subject. The first activity column is column 3. Here is the relevant code:
pdf("All Actograms.pdf")
for(i in 3:(length(dat) - 1)) {
activity <- colnames(dat)[i]
# Plot the actogram
print(actogram(activity~datetime, dat=dat, col="black", main=colnames(dat)[i], strip.left.format="%m/%d", doublePlot = TRUE, scale=0.75))
}
dev.off()
When I call my actogram function, I get the error "non-numeric argument to binary operator." The problem is the formula "activity~datetime," because datetime is a column name and activity should be too. If I try it out of the loop, with the name of an activity column rather than a variable containing the name, it works fine. Upon debugging, I found the actogram function is receiving the string "activity," rather than the variable activity. I don't really understand formulas, but I want to know if there's any way to accomplish what I'm trying to do, which is loop through many columns, changing the column before the "~" each time I call the actogram function. I'm very new to R.
Thanks!
We do not have the data you are working on but I think the simplest thing you can do is the following:
pdf("All Actograms.pdf")
for(i in 3:(length(dat) - 1)) {
activity <- colnames(dat)[i] # save the name of column i
colnames(dat)[i] <- "activity" # temporarily rename column i to "activity"
# Plot the actogram
print(actogram(activity~datetime, dat=dat, col="black", main=activity, strip.left.format="%m/%d", doublePlot = TRUE, scale=0.75))
colnames(dat)[i] <- activity # restore the original name of column i
}
dev.off()
Hopefully it works.
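Another option that avoids renaming columns is to build the formula from the column name held in the variable. The sketch below uses the built-in mtcars data and lm() as stand-ins, since actogram and dat are not available here; it assumes actogram accepts a formula object the same way lm() does, which has not been checked.

```r
# Build a formula from a column name stored in a variable.
# mtcars and lm() are stand-ins for dat and actogram().
responses <- c("mpg", "hp", "qsec")
fits <- list()

for (activity in responses) {
  # paste() yields e.g. "mpg ~ wt"; as.formula() converts it to a formula
  fml <- as.formula(paste(activity, "~ wt"))
  fits[[activity]] <- lm(fml, data = mtcars)
}

# reformulate() builds the same kind of formula without pasting strings
fml2 <- reformulate("wt", response = "mpg")
print(fml2)
```

In the loop from the question, the analogous call would pass the constructed formula in place of activity~datetime.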
I am trying to rename the columns of a time series using assign function as follows -
assign(colnames(paste0(<logic_to_get_dataset>)),
c(<logic_to_get_column_names>))
I am getting a warning : In assign(colnames(get(paste0("xvars_", TopVars[j, 1], "_lag", :
only the first element is used as variable name
Also, the column name assignment does not happen. I think this is happening because of the colnames() function. Is there a workaround?
The issue is that assign only looks at the first element of the vector.
You can try this, for example:
df = data.frame(x = 1:3, y = 4:2)
within(df, assign(colnames(df),c('a','b')))
You'll notice that R uses only the first column name as the variable name and tries to assign the entire vector of new names to that one variable as its value. This behavior is obviously not what you're looking for.
Unfortunately, it's kind of hacky, but you can always use something like this:
data.frame.name = get_df()#some function that returns text
data.frame.columns = get_cols()#some function that returns text
eval(parse(text = paste0('colnames(',data.frame.name,') = c(',
paste0('"',data.frame.columns,'"',collapse = ','),')'))) # quote each name so c("a","b") is generated
I prefer to avoid doing these kinds of expressions, but it should work as intended.
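If the goal is only to rename the columns of an object whose name is known as a string, a get/modify/assign round trip may be enough and avoids eval(parse()). This is a sketch: rename_by_name is a hypothetical helper written for this example, and it assumes the object lives in the environment the helper is called from.

```r
# Hypothetical helper: rename the columns of an object given its name as a string
rename_by_name <- function(df_name, new_cols, envir = parent.frame()) {
  tmp <- get(df_name, envir = envir)    # fetch the object by name
  colnames(tmp) <- new_cols             # ordinary replacement on the local copy
  assign(df_name, tmp, envir = envir)   # write the modified copy back
  invisible(tmp)                        # return it so the result can be inspected
}

df <- data.frame(x = 1:3, y = 4:2)
rename_by_name("df", c("a", "b"))
print(colnames(df))
```

Because the helper returns the modified object invisibly, the result can be checked without opening the dataset.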
Here it goes -
temp_var <- paste0('colnames(var_',TopLines[j,1],'_lag',get(paste0('uniqLg_',TopLines[j,1]))[k,],'_',get(paste0('uniqLg_',TopLines[j,1]))[k,]+12 ,
') <- c(gsub( "xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],'" , "xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],'__',get(paste0('uniqLg_',TopLines[j,1]))[k,]+12,
'", colnames(var_',TopLines[j,1],'_xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],')))')
print(temp_var )
eval(parse( text=temp_var ))
where TopLines is a data frame with one column that contains a list of lines. The only problem with this method is that I can't test the output of eval unless I actually open the dataset and check whether the changes have taken effect.
I am having a problem with the window function in R.
newdata1 <-window(mergedall,start=c(as.Date(as.character("2014-06-16"))),end=c(as.Date(as.character("2015-01-31"))))
I got this error. I am trying to understand how I can fix this issue. Thank you!
Error in window.default(mergedall, start = c(as.Date(as.character("2014-06-16"))), :
'start' cannot be after 'end'
In addition: Warning message:
In window.default(mergedall, start = c(as.Date(as.character("2014-06-16"))), :
'end' value not changed
I know it's an old post. But, please make sure that "mergedall" is a time series object which was created using the ts command.
While creating the time series object from any vector or series,
some_result_ts <- ts(vector,frequency=xx,start=c(yyyy,m))
This kind of error occurs when the series ends before the start you specify in the window command.
For example, if you take a data frame column, a vector, or a series and create the time series with the ts command using yyyy=2010, m=1 and a frequency of 12, and the data covers 36 months, the implicit end will be 2012,12.
some_result_ts <- ts(vector,frequency=12,start=c(2010,1))
Then, while using a window function, if you are specifying let's say, start = c(2014,1) , then R will give a message that => 'start' cannot be after 'end' and end value not changed.
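The situation described above can be reproduced with a toy series. In this sketch x and msg are names chosen for the example, and tryCatch is used only so the error message can be captured instead of stopping the script.

```r
# 36 monthly observations starting January 2010, so the series ends December 2012
x <- ts(seq_len(36), frequency = 12, start = c(2010, 1))
print(end(x))

# Asking for a window that starts after the series has ended raises the error
msg <- tryCatch(
  {
    window(x, start = c(2014, 1))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```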
Again it's an old post. But since I stumbled upon it by searching the same error. I want to still provide something useful for future Googlers.
I could not replicate your issue because you did not provide your own mergedall dataset. So I am starting with a toy example to show a few places where the problem might be. It's really not that difficult at all.
Potential problem #1:
You did not create a ts object to begin with. The window function operates on a ts object; it cannot just be a vector taken directly from a data frame. Use the ts function to turn the vector into a ts object first, giving it the proper start, end, and frequency.
all <-seq(1:8) #eight observations in sequence
Assign these eight values as monthly observations, running from June 2014 to January 2015. A frequency of 12 means monthly.
all.ts <- ts(all, start = c(2014,6), end = c(2015,1), frequency = 12)
Potential problem #2:
Perhaps you already created your mergedall series as a ts object, but with a different start/end/frequency. My example above was based on monthly observations, so even though it is a correct example, it will not match your daily-based window call. The window call and the ts object need to be consistent.
Following my example, the window function would look like:
newdata1 <-window(all.ts,start=c(2014,6),end=c(2015,1) )
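As a quick check that the window and the series agree on units, the sketch below rebuilds the toy monthly series and extracts a sub-range with year/month pairs. The name subset_ts is made up for this example, and the series is rebuilt here (without an explicit end) only so the block runs on its own.

```r
# Eight monthly observations, June 2014 to January 2015
all.ts <- ts(1:8, start = c(2014, 6), frequency = 12)

# start/end are given as c(year, month), matching the monthly frequency
subset_ts <- window(all.ts, start = c(2014, 8), end = c(2014, 12))
print(subset_ts)
```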
Here is what you can try; perhaps this is the solution, as I faced the same problem.
You might not be referring to the proper index value in the time series object.
In the code below I have added the index (i). You can use 1 if the object has only one series, use any other number, or pass different values using a simple loop.
Hope it helps!
newdata1 <-window(mergedall[i],start=c(as.Date(as.character("2014-06-16"))),end=c(as.Date(as.character("2015-01-31"))))
I am also a future googler and none of the answers helped me. This was my problem and solution:
MWE issue:
set.seed(50)
data <- ts(rnorm(100), start(1850))
data.train <- window(data, start = 1850, end = 1949)
MWE solution:
set.seed(50)
data <- ts(rnorm(100), start = (1850))
data.train <- window(data, start = 1850, end = 1949)
Issue was the missing equals sign when setting the start date.
The resulting variable data was still a time series; but the give-away was: "Time-Series from 1 to 100" rather than "Time-Series from 1850 to 1949", which told me that something was awry with creating the time series.
The ts function doesn't raise this as an error because start(1850) is a valid call to the start() function from the {stats} package; its result is then passed positionally as the start argument of ts.
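One way to spot this kind of mistake early is to inspect the time attributes right after creating the series. The sketch below compares the two calls; bad and good are names chosen for this example.

```r
set.seed(50)

# start(1850) is evaluated first; for a plain number it gives c(1, 1)
print(start(1850))

bad  <- ts(rnorm(100), start(1850))    # positional: series runs from 1 to 100
good <- ts(rnorm(100), start = 1850)   # named: series runs from 1850 to 1949

# tsp() returns start, end and frequency of a time series
print(tsp(bad))
print(tsp(good))
```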
This is probably an issue arising from the format of your 'mergedall' object.
Make sure that you have a ts, xts or a zoo object.
Try, for example, the following first, in order to check the format of your object:
str(mergedall)
The purpose of this very simple function is just to transform a date column to a date variable and a numeric time (hourly) column to a factor variable, which will be used with plyr later in the code.
I can get this code to run successfully in the command line, but when I attempt to run it in the function I get an error.
# setting up some fake data
set.seed(31)
foo <- function(myHour, myDate){
rlnorm(1, meanlog=0,sdlog=1)*(myHour) + (150*myDate)
}
Hour <- 1:24
Day <-1:1080
dates <-seq(as.Date("2010-01-01"), by = "day", length.out= 1080)
myData <- expand.grid( Day, Hour)
names(myData) <- c("Date","Hour")
myData$Adspend <- apply(myData, 1, function(x) foo(x[2], x[1]))
myData$Date <-dates
myData$Demand <-(rnorm(1,mean = 0, sd=1)+.75*myData$Adspend)
#############################################################
myData
# Function Creation
AddCal <-function(DF,Date,Time) {
DF$Date<-as.Date(DF$Date, format="%m/%d/%Y")#Change Date variable into a date type
DF$Time<-factor(DF$Time,levels=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24))
}
#Test Function
Bob<-AddCal(myData,Date,Hour)
#Error I receive
Error in `$<-.data.frame`(`*tmp*`, "Time", value = integer(0)) :
replacement has 0 rows, data has 25920
I spent about 2 hours searching for answers and trying different things. Because I can run the individual lines of code at the command line and get the desired result, I am assuming this is an advanced coding problem beyond my novice capabilities.
In your function, replace all instances of DF$Time with DF[[Time]], and likewise DF$Date with DF[[Date]].
Also see the two comments below from @Dwin & @mrip:
Make sure to return a value
Make sure to pass string arguments where strings are expected
What's going on:
When you use DF$Time, R is looking for a column named Time in DF. It is not treating Time as the string variable that you expect.
DF[[Time]] on the other hand does treat Time as a variable.
The reason the error only refers to Time and not Date is that Date is both the name of your variable and the name of a column in DF. (If in your function call you had used something like AddCal(.. Date=Demand) or any other column name, you would not get the results you expect.)
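Putting those points together, a corrected version might look like the sketch below. It keeps the function name from the question but renames the arguments to date_col and time_col (names chosen here), expects them as strings, and returns the modified data frame. The small toy data frame assumes dates stored as "%m/%d/%Y" text, as the format string in the question suggests.

```r
AddCal <- function(DF, date_col, time_col) {
  # [[ ]] uses the *value* of date_col / time_col as the column name
  DF[[date_col]] <- as.Date(DF[[date_col]], format = "%m/%d/%Y")
  DF[[time_col]] <- factor(DF[[time_col]], levels = 1:24)
  DF  # return the modified data frame
}

# Toy data: two rows with text dates and numeric hours
toy <- data.frame(Date = c("01/01/2010", "01/02/2010"),
                  Hour = c(1, 24),
                  stringsAsFactors = FALSE)

# Column names are passed as strings
Bob <- AddCal(toy, "Date", "Hour")
str(Bob)
```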
Side Note:
c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24)
is equivalent to
seq(24) and to 1:24