I am trying to find percentage price jumps (a change of ±15%) in a price series and, whenever there is a jump, get the corresponding date. When the first jump is found, the inner for loop should break out and give me the corresponding data, which gets stored in the temp.cyc data frame.
The program does not show any error, but it repeats the same number throughout, and when I check the value of test.df it shows NA.
Can anyone help me understand what's going on? I am new to R, so a detailed answer would be helpful. Thank you :)
# Defining variables
row <- nrow(price.close)
col <- ncol(price.close)
# Defining matrices
m <- matrix(0, ncol = 1, nrow = row)
p <- matrix(0, ncol = 5, nrow = row)
# Vector to temporarily store the percentage change
test.df <- vector(mode = "numeric", length = nrow(price.close))
# Data frame to hold the extracted values
temp.cyc <- as.data.frame(p)
colnames(temp.cyc) <- c("cyc.duration", "Start.date", "End.date", "Start.date.value", "End.date.value")
for (j in 1:row)
{
  for (i in j:row)
  {
    test.df <- (price.close[(i+1), 2] - price.close[j, 2]) / price.close[j, 2]
    if (test.df >= 0.15 | test.df <= -0.15 | is.na(test.df) == TRUE)
    {
      temp.cyc$Start.date.value = price.close[j, 2]
      temp.cyc$End.date.value <- price.close[i, 2]
      temp.cyc$Start.date <- price.close[j, 1]
      temp.cyc$End.date <- price.close[i, 1]
    }
    break
  }
}
It seems you are using this for financial data, e.g. stock prices. If that assumption is right, I suggest using a package that already has this functionality; I would suggest the quantmod package.
Here's a short example of how to get the dates on which the price moves more than 15% up or down.
library(quantmod)
# create some dummy stock data over a 10-day period
# (next time, please attach some of your data)
stockClose <- c(100,50,75,70,68,100,115,120,130,100)
stockDates <- seq(as.Date("2014-01-01"), length.out = 10, by = "1 day")
stock.xts <- xts(stockClose, order.by = stockDates)
# calculate the one-period change; see ?Delt for more info
change <- Delt(stock.xts)
# keep only the rows where the price moved more than 15% in either direction
specialDays <- change[coredata(change) < -0.15 | coredata(change) > 0.15, ]
# get the dates
justDates <- index(specialDays)
This gives us "2014-01-02", "2014-01-03", "2014-01-06" and "2014-01-10".
If you actually want to compare all possible combinations of entry and exit days, you can use something like this:
library(quantmod)
calculatePeriods <- function(stock.xts) {
  prices <- as.numeric(coredata(stock.xts))
  dates <- index(stock.xts)
  # you will be adding rows to this data frame
  result <- data.frame()
  for (i in 1:(length(prices) - 1)) {
    # the exit day j must come after the entry day i
    for (j in (i + 1):length(prices)) {
      change <- (prices[j] - prices[i]) / prices[i]
      if (change < -0.15 | change > 0.15) {
        newRow <- data.frame(cyc.duration = as.numeric(dates[j] - dates[i], units = "days"),
                             Start.date = dates[i],
                             End.date = dates[j],
                             Start.date.value = prices[i],
                             End.date.value = prices[j])
        result <- rbind(result, newRow)
      }
    }
  }
  return(result)
}
stockClose <- c(100,50,75,70,68,100,115,120,130,100)
stockDates <- seq(as.Date("2014-01-01"), length.out = 10, by = "1 day")
calculatePeriods(xts(stockClose, order.by = stockDates))
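To address the original question directly: in the question's loop, break sits outside the if block, so the inner loop always stops after its first comparison (it only ever compares day j with day j + 1); each temp.cyc$... <- assignment overwrites the whole column with a single value, so every row ends up the same; and on the last row, price.close[(i+1),2] is past the end of the data, which gives NA (this is the value test.df holds when the loop finishes). Below is a minimal base-R sketch of a fixed loop, assuming (as the question's indexing suggests) that price.close has the dates in column 1 and the closing prices in column 2. The price.close built here is a hypothetical stand-in using the same dummy prices as the quantmod example.

```r
# Hypothetical stand-in for price.close: dates in column 1, prices in column 2
price.close <- data.frame(
  Date  = seq(as.Date("2014-01-01"), by = "day", length.out = 10),
  Close = c(100, 50, 75, 70, 68, 100, 115, 120, 130, 100)
)
n <- nrow(price.close)

# One row per possible start date; rows without a later jump stay NA
temp.cyc <- data.frame(cyc.duration     = rep(NA_real_, n),
                       Start.date       = as.Date(NA),
                       End.date         = as.Date(NA),
                       Start.date.value = NA_real_,
                       End.date.value   = NA_real_)

for (j in seq_len(n - 1)) {
  for (i in (j + 1):n) {              # i never goes past the last row
    change <- (price.close[i, 2] - price.close[j, 2]) / price.close[j, 2]
    if (abs(change) >= 0.15) {
      # Write into row j only, instead of overwriting whole columns
      temp.cyc[j, "cyc.duration"]     <- as.numeric(price.close[i, 1] - price.close[j, 1])
      temp.cyc[j, "Start.date"]       <- price.close[j, 1]
      temp.cyc[j, "End.date"]         <- price.close[i, 1]
      temp.cyc[j, "Start.date.value"] <- price.close[j, 2]
      temp.cyc[j, "End.date.value"]   <- price.close[i, 2]
      break                           # leave the inner loop only after a jump
    }
  }
}
temp.cyc
```

Each row j now records the first later day whose price differs from day j's by at least 15%; start days with no such later day keep NA.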
RStudio version: 1.0.143
Windows version: Windows 10 Pro
I have 300+ files with the same structure, and I want to create a loop that calculates the correlation index for the required files. I can get the right files and calculate the correlation index, but I can't get all the results displayed. I tried to save them to a vector, but R tells me the object is not found. And even if that works, I'm also worried about whether the contents of the vector will persist if I run the function several times. Here's the loop:
for(i in ind_larg){
  specdata_i <- read.csv(i)
  com_case_ind <- complete.cases(specdata_i)
  sulfate_i <- specdata_i[,2][com_case_ind]
  nitrate_i <- specdata_i[,3][com_case_ind]
  ou[i] <- cor(sulfate_i, nitrate_i)
}
And the result:
Error: object 'ou' not found
I'm not sure whether you need the code that comes before this, so I've attached it below.
setwd("C:/Users/sunxi/Coursera/specdata")
ind <- dir(path = "C:/Users/sunxi/Coursera/specdata", pattern = ".csv") # Save the file names to a vector.
specdata_ful <- lapply(ind, read.csv) # Read all the files into a list of data frames.
specdat_recon_ful <- do.call(rbind, specdata_ful) # Combine them into one data frame, with each variable in one column.
com_case_ful <- complete.cases(specdat_recon_ful) # Filter the complete cases.
id_ful <- specdat_recon_ful[,4][com_case_ful] # The ID of the complete cases.
sulfate_ful <- specdat_recon_ful[,2][com_case_ful] # The sulfate value of the complete cases.
nitrate_ful <- specdat_recon_ful[,3][com_case_ful] # The nitrate value of the complete cases.
id_freq_ful <- table(id_ful) # Count the complete cases for each ID.
id_freq_mat_ful <- as.data.frame(id_freq_ful) # Convert the table into a data frame.
good <- id_freq_mat_ful[["Freq"]] > 1000 # Keep the frequencies larger than the threshold.
id_good <- id_freq_mat_ful[["id_ful"]][good] # IDs whose number of complete cases exceeds the threshold (a factor).
ind_larg <- ind[as.integer(as.character(id_good))] # File names for those IDs; convert the factor to its labels first, since indexing with a factor uses its internal codes.
You have to create the variable ou before you access it with ou[i]:
ou <- c()
for(i in ind_larg){
# your loop here...
ou[i] <- cor(sulfate_i, nitrate_i)
}
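A sketch of an alternative that avoids growing ou inside a loop: wrap the loop body in a function and let sapply() collect the results into a named vector. Because ou is rebuilt from scratch on every call, nothing is left over from previous runs. The helper name corFile is hypothetical, and the two temporary CSV files below merely stand in for the files in ind_larg (assumed, as in the question, to have sulfate in column 2 and nitrate in column 3).

```r
# Hypothetical helper: correlation of columns 2 and 3 of one CSV file,
# using only the complete cases (same logic as the loop body above)
corFile <- function(path) {
  d <- read.csv(path)
  ok <- complete.cases(d)
  cor(d[ok, 2], d[ok, 3])
}

# Two temporary files standing in for the files in ind_larg
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(Date = 1:4, sulfate = c(1, 2, 3, NA),
                     nitrate = c(2, 4, 6, 8), ID = 1), f1, row.names = FALSE)
write.csv(data.frame(Date = 1:3, sulfate = c(1, 2, 3),
                     nitrate = c(3, 2, 1), ID = 2), f2, row.names = FALSE)
files <- c(f1, f2)

# sapply() returns a numeric vector named by file, so ou need not exist beforehand
ou <- sapply(files, corFile)
ou
```

With the real data this would be ou <- sapply(ind_larg, corFile).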
I'm a novice R user and have created a small script that does some trigonometry with movement data. I need to add a final column that removes repeated values from the column before it.
I've tried adding an if-else statement that seems to work in isolation, but I keep getting errors when I put it into the for loop. I'd appreciate any advice.
# trig loop
list.df <- vector("list", max(Sp_test$ID))
names1 <- c(1:max(Sp_test$ID))
for (i in 1:max(Sp_test$ID)) {
  if (i %in% unique(Sp_test$ID)) {
    idata <- subset(Sp_test, ID == i)
    idata$originx <- idata[1,3]
    idata$originy <- idata[1,4]
    idata$deltax <- idata[,"UTME"] - idata[,"originx"]
    idata$deltay <- idata[,"UTMN"] - idata[,"originy"]
    idata$length <- sqrt((idata[,"deltax"])^2 + (idata[,"deltay"]^2))
    idata$arad <- atan2(idata[,"deltay"], idata[,"deltax"])
    idata$xnorm <- idata[,"deltax"] / idata[,"length"]
    idata$ynorm <- idata[,"deltay"] / idata[,"length"]
    sumy <- sum(idata$ynorm, na.rm = TRUE)
    sumx <- sum(idata$xnorm, na.rm = TRUE)
    idata$vecsum <- atan2(sumy, sumx)
    idata$width <- idata$length * sin(idata$arad - idata$vecsum)
    # need if else statement excluding a repeat from the position just before it
    list.df[[i]] <- idata
    names1[i] <- i
  }
}
# this works on its own; I think the problem is the first row of the dataset, which has no row before it
if (idata$width[j] == idata$width[j-1]) {
  print("NA")
} else {
  print(idata$width[j])
}
I think you want to use the function diff for this. diff(idata$width) will give the differences between successive values of idata$width. Then
idata$width[c(FALSE, diff(idata$width) == 0)] <- NA
I think this does what you want. The initial FALSE is needed because there is no difference corresponding to the first element (as you rightly noted, the first element has no element before it).
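The same idea written as a new final column instead of overwriting width, which is what the question asked for. The data frame below is a hypothetical stand-in for one animal's idata with made-up widths; inside the loop, the two lines computing is.repeat and width.norep would go where the "need if else statement" comment is.

```r
# Hypothetical stand-in for one animal's idata
idata <- data.frame(width = c(3.2, 3.2, 5.1, 5.1, 5.1, 2.0))

# TRUE where a value equals the one directly before it;
# the first row has no predecessor, so it is never a repeat
is.repeat <- c(FALSE, diff(idata$width) == 0)

# New final column: the width, or NA when it repeats the previous row
idata$width.norep <- ifelse(is.repeat, NA, idata$width)
idata
```

Note that == only catches exact repeats; values that differ by floating-point noise are kept.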
I'm very new to R and am trying to backtest a strategy I've already programmed in WealthLab.
There are several things I don't understand (and obviously it doesn't work :)
I can't get the close prices nicely into a vector... or rather I get some kind of vector, but it starts with structure and I don't really understand what that function does. That's probably why my series[,1] call doesn't work.
n <- nrow(series) doesn't work either, but I need it for the loop.
So I guess if I get these 2 questions answered, my strategy should work. I'm very thankful for any help; R seems quite complicated, even with programming experience in other languages.
#rm(list = ls(all = TRUE))
#import data, default is yahoo
require(quantmod)
series <- getSymbols('AAPL', from = '2013-01-01')
#generate HLOC series
close <- Cl(AAPL)
open <- Op(AAPL)
low <- Lo(AAPL)
high <- Hi(AAPL)
#setting parameters
lookback <- 24 #24 days ago
startMoney <- 10000
#Empty our time series for position and returns
f <- function(x) 0 * x
position <- apply(series[,1], FUN = f)
colnames(position) = "long_short"
returns <- apply(series[,1], FUN = f)
colnames(returns) = "Returns"
trades = returns
colnames(trades) = "Trades"
amount = returns
colnames(amount) = "DollarAmount"
amt[seq(1, lookback)] = startMoney
#Calculate all the necessary values in a loop with our trading strategy
n <- nrow(series)
for (i in seq(lookback+1, n)) {
  #get the return
  if (position[i-1] == 1) {
    #we were long
    returns[i] = close[i]/close[i-1] - 1
  } else if (position[i-1] == -1) {
    #we were short
    returns[i] = close[i-1]/close[i] - 1
  }
  #long/short position
  if (open[i-lookback] < open[i] && low[i-1] < open[i]) {
    #go long
    position[i] = 1
  } else if (open[i-lookback] > open[i] && high[i-1] > open[i]) {
    # go short
    position[i] = -1
  } else {
    position[i] = position[i-1]
  }
  #mark a trade if we did one
  if (position[i] != position[i-1]) trades[i] = 1
  #Calculate the dollar amount
  amount[i] = amount[i-1]*exp(returns[i])
  if (trades[i]) amount[i] = amount[i] - 2
}
Starting with the second question:
> s <- getSymbols('SPY')
> nrow(s)
NULL
> class(s)
[1] "character"
> s.data <- get(s)
> class(s.data)
[1] "xts" "zoo"
> nrow(s.data)
[1] 1635
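getSymbols() assigns the downloaded data to a variable named after the symbol and returns only that name as a character string, which is why nrow(s) is NULL. So if you want to work on the actual xts object, you need to use get (alternatively, getSymbols('SPY', auto.assign = FALSE) returns the xts object directly).
As for your first question, I don't think you really need to pull the data out as a vector: the xts object is a matrix indexed by date, and it's easy to work with.
If you still want the raw data, you can use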
closing.prices <- coredata(Cl(s.data))
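A small offline sketch of the difference between the xts object, coredata() and a plain vector. The object below is hypothetical data standing in for what getSymbols() would download, with only Open and Close columns, named the way Yahoo data usually is.

```r
library(quantmod)  # also loads xts and zoo

# Hypothetical data standing in for a downloaded symbol
dates <- as.Date("2013-01-01") + 0:4
aapl <- xts(cbind(AAPL.Open  = c(10, 11, 12, 11, 13),
                  AAPL.Close = c(11, 12, 11, 13, 14)),
            order.by = dates)

nrow(aapl)                          # nrow() works on the xts object itself
close.xts <- Cl(aapl)               # still an xts object, indexed by date
close.mat <- coredata(close.xts)    # one-column matrix, dates dropped
close.vec <- as.numeric(close.xts)  # plain numeric vector
```

For most work you can stay with close.xts, since arithmetic and lag() keep the dates aligned.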
Now, to get you started with simple backtesting of strategies, I suggest working in the following steps:
1. Define your strategy.
2. Create an array, or add a column to your xts object, that represents your position for each day: 1 for long, 0 for no position and -1 for short (later on you can play with the number for leverage).
3. Multiply each day's return by the position held going into that day (i.e. the previous day's position, via lag), and you'll get your strategy's return vector.
4. Examine the results; my recommendation is the PerformanceAnalytics package.
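A simple strategy: go long when the close is above the 20-day SMA, and short when it is below.
library(quantmod)
library(PerformanceAnalytics)
# download SPY and keep the data from 2012 onwards
s <- get(getSymbols('SPY'))["2012::"]
# 20-day simple moving average of the close
s$sma20 <- SMA(Cl(s), 20)
# +1 (long) above the SMA, -1 (short) below it
s$position <- ifelse(Cl(s) > s$sma20, 1, -1)
# apply yesterday's position to today's return
myReturn <- lag(s$position) * dailyReturn(s)
charts.PerformanceSummary(cbind(dailyReturn(s), myReturn))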
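charts.PerformanceSummary() then plots the cumulative returns, daily returns and drawdowns of buy-and-hold SPY and of the strategy for comparison.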
I am backtesting an investment strategy using R, and I have the following piece of script:
set.seed(1)
output.df <- data.frame(action = sample(c("initial_buy", "sell", "buy"),
                                        10000, replace = TRUE),
                        stringsAsFactors = FALSE)
output.df[,"uid"] <- 1:nrow(output.df)
cutrow.fx <- function(output.df) {
  loop.del <- 2
  while (loop.del <= nrow(output.df)) {
    if ((output.df[loop.del,"action"] == "initial_buy" &
         output.df[loop.del-1,"action"] == "initial_buy") |
        (output.df[loop.del,"action"] == "sell" &
         output.df[loop.del-1,"action"] == "sell") |
        (output.df[loop.del,"action"] == "buy" &
         output.df[loop.del-1,"action"] == "sell") |
        (output.df[loop.del,"action"] == "initial_buy" &
         output.df[loop.del-1,"action"] == "buy")) {
      output.df <- output.df[-loop.del,]
    } else {
      loop.del <- loop.del + 1
    }
  }
  output.df <<- output.df
}
print(system.time(cutrow.fx(output.df = output.df)))
The strategy determines: 1) when to start buying a stock; 2) when to add further contributions to the stock; and 3) when to sell all of the stock. I have a data frame with the price of a stock over the past 10 years. I wrote 3 scripts to indicate on which dates I should buy or sell the stock, then combined the 3 results and ordered them.
I need to remove some "impossible actions"; e.g. I cannot sell the same stock twice without buying new units in between, so I used the script above to delete those impossible actions. But the loop is rather slow.
Any suggestions for speeding it up?
Update 01
I have updated cutrow.fx to the following, but it fails:
cutrow.fx <- function(output.df) {
  output.df[,"action_pre"] <- "NIL"
  output.df[2:nrow(output.df),"action_pre"] <- output.df[1:(nrow(output.df)-1),"action"]
  while (any(output.df[,"action_pre"] == "initial_buy" & output.df[,"action"] == "initial_buy") |
         any(output.df[,"action_pre"] == "sell" & output.df[,"action"] == "sell") |
         any(output.df[,"action_pre"] == "sell" & output.df[,"action"] == "buy") |
         any(output.df[,"action_pre"] == "buy" & output.df[,"action"] == "initial_buy")) {
    output.df <- output.df[!(output.df[,"action_pre"] == "initial_buy" & output.df[,"action"] == "initial_buy"),]
    output.df <- output.df[!(output.df[,"action_pre"] == "sell" & output.df[,"action"] == "sell"),]
    output.df <- output.df[!(output.df[,"action_pre"] == "sell" & output.df[,"action"] == "buy"),]
    output.df <- output.df[!(output.df[,"action_pre"] == "buy" & output.df[,"action"] == "initial_buy"),]
    output.df[,"action_pre"] <- "NIL"
    output.df[2:nrow(output.df),"action_pre"] <- output.df[1:(nrow(output.df)-1),"action"]
  }
  output.df[,"action_pre"] <- NULL
  output.df <<- output.df
}
I used vector comparisons, somewhat inspired by John's answer (I say "somewhat" because I'm not sure I understood exactly what he meant), and a while loop to repeat them. But the output is not the same.
Is the loop unavoidable here?
It looks like all you're doing is checking the last action. This doesn't require a loop at all. All you have to do is shift the vector and do straight vector comparisons. Here's an artificial example.
x <- sample(1:11)
buysell <- sample(c('buy', 'sell'), 11, replace = TRUE)
So, I have 11 samples, x, and whether I bought or sold each one. I want a vector that shows whether I bought or sold the previous sample.
bought <- c(NA, buysell[1:10])
which( bought == 'buy' )
Examine the x and buysell variables and you'll see that the results are the indices of the x items where a buy was made on the prior item.
Also, you might want to check out the function %in%.
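A deterministic version of the same idea (fixed actions instead of sample(), so the result is reproducible), including one of the question's rules, "sell directly after sell". It compares each row with its original predecessor; once a row is deleted, the following row gets a new predecessor, so a single vectorized pass like this is not guaranteed to reproduce the question's sequential loop.

```r
action <- c("initial_buy", "buy", "sell", "sell", "buy", "initial_buy")

# previous action for each row; the first row has none
prev <- c(NA, head(action, -1))

# rows whose previous row was a buy
which(prev == "buy")

# rows that repeat a sell (one of the "impossible" patterns)
which(prev == "sell" & action == "sell")
```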
I tried to do something clever with vectorization, but failed because earlier iterations of the loop can change the data relationships for later iterations. So I couldn't lag the data by a fixed amount and compare the lagged values with the actual ones.
What I can do is minimize the copying involved. R has copy-on-modify semantics, so when you write a statement like output.df <- output.df[-loop.del,], the entire data structure is copied for every row that is deleted. Instead of changing (and copying) the data frame, I made changes to a logical vector. Other speed-up attempts include using the scalar, short-circuiting && instead of the vectorized &, using %in% to make fewer comparisons, and minimizing accesses to output.df.
To compare the two functions, I slightly modified the OP's solution so that the original data frame is not overwritten. This improves speed by a factor of about 20 in the benchmark below, but it still takes a noticeable amount of time (>0.5 sec). I'd love to see any faster solutions.
OP's solution (slightly modified to return its result instead of using a global assignment):
cutrow.fx <- function(output.df) {
  loop.del <- 2
  while (loop.del <= nrow(output.df)) {
    if ((output.df[loop.del,"action"] == "initial_buy" &
         output.df[loop.del-1,"action"] == "initial_buy") |
        (output.df[loop.del,"action"] == "sell" &
         output.df[loop.del-1,"action"] == "sell") |
        (output.df[loop.del,"action"] == "buy" &
         output.df[loop.del-1,"action"] == "sell") |
        (output.df[loop.del,"action"] == "initial_buy" &
         output.df[loop.del-1,"action"] == "buy")) {
      output.df <- output.df[-loop.del,]
    } else {
      loop.del <- loop.del + 1
    }
  }
  return(output.df)
}
ans1 <- cutrow.fx(output.df)
My solution:
cutrow.fx2 <- function(output.df) {
  ##edge case if output.df has too few rows
  if (nrow(output.df) < 2) return(output.df)
  ##logical vector of indices of rows to keep
  idx <- c(TRUE, logical(nrow(output.df)-1))
  ##keeps track of the previous row
  prev.row <- 1
  prev.act <- output.df[prev.row,"action"]
  for (current.row in seq_len(nrow(output.df))[-1]) {
    ##access output.df only once per iteration
    current.act <- output.df[current.row,"action"]
    ##checks to see if current row is bad
    ##if so, continue to next row and leave previous row as is
    if ( (prev.act %in% c("initial_buy","buy")) &&
         (current.act == "initial_buy") ) {
      next
    } else if ( (prev.act == "sell") &&
                (current.act %in% c("buy","sell")) ) {
      next
    }
    ##if current row is good, mark it in idx and update previous row
    idx[current.row] <- TRUE
    prev.row <- current.row
    prev.act <- current.act
  }
  return(output.df[idx,])
}
ans2 <- cutrow.fx2(output.df)
Check that the answers are the same:
identical(ans1,ans2)
## [1] TRUE
#benchmarking
require(microbenchmark)
mb <- microbenchmark(
  ans1 = cutrow.fx(output.df),
  ans2 = cutrow.fx2(output.df),
  times = 50)
print(mb)
# Unit: milliseconds
# expr min lq median uq max
# 1 ans1 9630.1671 9743.1102 9967.6442 10264.7000 12396.5822
# 2 ans2 481.8821 491.6699 500.6126 544.4222 645.9658
plot(mb)
require(ggplot2)
ggplot2::qplot(y=time, data=mb, colour=expr) + ggplot2::scale_y_log10()
Here is some code that is a bit simpler and much faster. It does not loop over all elements, but only loops between matches. It matches forward rather than backward.
First, modify your cutrow.fx function: remove the <<- assignment on the last line and simply return the result. Then you can run both functions and compare the results.
cutrow.fx1 <- function(d) {
  len <- length(d[,1])
  o <- logical(len)
  f <- function(a) {
    switch(a,
           initial_buy = c('buy', 'sell'),
           buy = c('buy', 'sell'),
           sell = 'initial_buy'
    )
  }
  cur <- 1
  o[cur] <- TRUE
  while (cur < len) {
    nxt <- match(f(d[cur,1]), d[(cur+1):len,1])
    if (all(is.na(nxt))) {
      break
    } else {
      cur <- cur + min(nxt, na.rm = TRUE)
      o[cur] <- TRUE
    }
  }
  d[o,]
}
Show that the results are correct:
identical(cutrow.fx1(output.df), cutrow.fx(output.df))
## [1] TRUE
It is also quite a bit faster, thanks to the partial vectorization of the problem: match is used to find the next row to keep, rather than iterating over rows to discard them.
print(system.time(cutrow.fx(output.df)))
## user system elapsed
## 5.688 0.000 5.720
print(system.time(cutrow.fx1(output.df)))
## user system elapsed
## 1.050 0.000 1.056
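One more variant, sketched here and not benchmarked: it applies the same keep/skip rules as cutrow.fx2, but to the action column as a plain character vector, so the data frame is touched only once at the end. The function name keepRows is hypothetical.

```r
# Hypothetical helper: which rows of an action vector to keep
keepRows <- function(action) {
  n <- length(action)
  keep <- logical(n)                 # all FALSE to start
  if (n == 0) return(keep)
  keep[1] <- TRUE                    # the first row is always kept
  prev <- action[1]                  # last action that was kept
  for (k in seq_len(n)[-1]) {
    cur <- action[k]
    bad <- (prev %in% c("initial_buy", "buy") && cur == "initial_buy") ||
           (prev == "sell" && cur %in% c("buy", "sell"))
    if (!bad) {
      keep[k] <- TRUE
      prev <- cur
    }
  }
  keep
}

act <- c("initial_buy", "initial_buy", "buy", "sell", "sell", "buy", "initial_buy", "buy")
keepRows(act)
```

With the question's data it would be used as output.df[keepRows(output.df$action), ].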
I would like to export an hclust dendrogram from R into a data table in order to subsequently import it into another ("home-made") piece of software. str(unclass(fit)) provides a text overview of the dendrogram, but what I'm looking for is really a numeric table. I've looked at the Bioconductor ctc package, but the output it produces looks somewhat cryptic. I would like to have something similar to this table: http://stn.spotfire.com/spotfire_client_help/heat/heat_importing_exporting_dendrograms.htm
Is there a way to get this out of an hclust object in R?
In case anyone else is interested in dendrogram export, here is my solution. It's probably not the best one, as I only started using R recently, but at least it works, so suggestions on how to improve the code are welcome.
So, if hr is my hclust object and df is my data, whose first column contains a simple index starting from 0 and whose row names are the names of the clustered items:
# Retrieve the leaf order (row name and its position within the leaves)
leaf.order <- matrix(data = NA, ncol = 2, nrow = nrow(df),
                     dimnames = list(c(), c("row.num", "row.name")))
leaf.order[,2] <- hr$labels[hr$order]
for (i in 1:nrow(leaf.order)) {
  leaf.order[which(leaf.order[,2] %in% rownames(df[i,])), 1] <- df[i,1]
}
leaf.order <- as.data.frame(leaf.order)
hr.merge <- hr$merge
n <- max(df[,1])
# Re-index all clustered leaves and nodes. First, all leaves are indexed starting from 0.
# Next, all nodes are indexed starting from the maximum leaf index + 1.
for (i in 1:length(hr.merge)) {
  if (hr.merge[i] < 0) {
    hr.merge[i] <- abs(hr.merge[i]) - 1
  } else {
    hr.merge[i] <- hr.merge[i] + n
  }
}
node.id <- c(0:length(hr.merge))
# Generate the dendrogram matrix with the node index in the first column.
dend <- matrix(data = NA, nrow = length(node.id), ncol = 6,
               dimnames = list(c(0:(length(node.id)-1)),
                               c("node.id", "parent.id", "pruning.level",
                                 "height", "leaf.order", "row.name")))
dend[,1] <- c(0:((2*nrow(df))-2)) # Insert a leaf/node index
# Calculate the parent ID for each leaf/node:
# 1) For each leaf/node index, find the corresponding row number within the merge table.
# 2) Add the maximum leaf index to the row number, as node indexing starts after all the leaves.
for (i in 1:(nrow(dend)-1)) {
  dend[i,2] <- row(hr.merge)[which(hr.merge %in% dend[i,1])] + n
}
# Generate a table indexing all leaves (1st column) with the corresponding row names in the 3rd column.
hr.order <- matrix(data = NA,
                   nrow = length(hr$labels), ncol = 3,
                   dimnames = list(c(), c("order.number", "leaf.id", "row.name")))
hr.order[,1] <- c(0:(nrow(hr.order)-1))
hr.order[,3] <- t(hr$labels[hr$order])
hr.order <- data.frame(hr.order)
# as.character() first, in case data.frame() turned the column into a factor
hr.order[,1] <- as.numeric(as.character(hr.order[,1]))
# Assign the row name to each leaf.
dend <- as.data.frame(dend)
for (i in 1:nrow(df)) {
  dend[which(dend[,1] %in% df[i,1]), 6] <- rownames(df[i,])
}
# Assign the position on the dendrogram (from left to right) to each leaf.
for (i in 1:nrow(hr.order)) {
  dend[which(dend[,6] %in% hr.order[i,3]), 5] <- hr.order[i,1] - 1
}
# Insert the height of each node.
dend[c((n+2):nrow(dend)), 4] <- hr$height
# All leaves get the highest possible pruning level
dend[which(dend[,1] <= n), 3] <- nrow(hr.merge)
# The nodes get a decreasing index, starting from the pruning level of the
# leaves minus 1 and going down to 0
for (i in (n+2):nrow(dend)) {
  if ((dend[i,4] != dend[(i-1),4]) || is.na(dend[(i-1),4])) {
    dend[i,3] <- dend[(i-1),3] - 1
  } else {
    dend[i,3] <- dend[(i-1),3]
  }
}
dend[,3] <- dend[,3] - min(dend[,3])
dend <- dend[order(-node.id),]
# Write the results table.
write.table(dend, file = "path", sep = ";", row.names = FALSE)
There is a package that does exactly the opposite of what you want: Labeltodendro ;-)
But seriously, can't you just manually extract the elements from the hclust object (e.g. $merge, $height, $order) and create a custom table from them?
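A minimal sketch of that idea, using the first six rows of the built-in USArrests data purely for illustration. In hclust's merge matrix, a negative entry -k is observation k (a leaf) and a positive entry k is the cluster formed at merge step k; height holds the height of each merge, and order gives the left-to-right leaf order of the plotted dendrogram. The output file name is a placeholder written to tempdir().

```r
# Cluster the first six states of a built-in dataset (illustration only)
hc <- hclust(dist(USArrests[1:6, ]))

# One row per merge step
nodes <- data.frame(
  node   = seq_len(nrow(hc$merge)),
  left   = hc$merge[, 1],   # < 0: leaf number, > 0: earlier merge step
  right  = hc$merge[, 2],
  height = hc$height
)

# One row per leaf, with its left-to-right position in the plotted dendrogram
leaves <- data.frame(
  leaf     = seq_along(hc$labels),
  label    = hc$labels,
  position = match(seq_along(hc$labels), hc$order)
)

out <- file.path(tempdir(), "dendrogram_nodes.csv")
write.table(nodes, file = out, sep = ";", row.names = FALSE)
```

Parent IDs, pruning levels or a different numbering scheme can then be derived from these two tables to match whatever format the importing software expects.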