R loop doesn't work while single command works

I am trying to convert many time series xts objects to tibbles, and the for loop I wrote does not work properly. I don't know why.
This does not happen only with this particular task, but with other tasks I perform as well. I have a character vector called "code", which contains the names of all the xts objects I want to convert.
code <- c('ABT','BA','CL','ROK')
for (i in code)
{
i <- tk_tbl(i, preserve_index = TRUE, rename_index = "index",
timetk_idx = FALSE, silent = FALSE)
}
What is strange is that if I run a single call without the loop, it works beautifully and converts the xts "ABT" to a tibble "ABT":
ABT <- tk_tbl(ABT, preserve_index = TRUE, rename_index = "index",
timetk_idx = FALSE, silent = FALSE)
The message I get from the loop version is a warning:
Warning: No index to preserve. Object otherwise converted to tibble
successfully.
38: In tk_tbl.data.frame(as.data.frame(data), preserve_index, ... :
Edit:
tk_tbl is a function from the package timetk, and it "Coerce time-series objects to tibble."
And code is a character vector containing the object names.
library(timetk)
code <- c('ABT','BA','CL','ROK')
> dput(head(ROK))
structure(c(8.14062, 8.15625, 8.03125, 7.78125, 7.6875, 7.71875,
8.25, 8.15625, 8.125, 7.90625, 7.71875, 7.75, 8.03125, 8.125,
7.90625, 7.65625, 7.625, 7.65625, 8.1875, 8.125, 7.90625, 7.71875,
7.65625, 7.6875, 109600, 80800, 138400, 151600, 96800, 258800,
0.684505, 0.67928, 0.660992, 0.645316, 0.640091, 0.642704),
class=c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date",
.indexTZ = "UTC", tzone = "UTC", src = "yahoo",
updated = structure(1558826745.23035, class = c("POSIXct","POSIXt")),
index = structure(c(378604800, 378950400, 379036800,
379123200, 379209600, 379296000), tzone = "UTC", tclass = "Date"),
.Dim = c(6L, 6L), .Dimnames = list(NULL, c("ROK.Open", "ROK.High",
"ROK.Low", "ROK.Close", "ROK.Volume", "ROK.Adjusted")))

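It looks to me like you expect <- to do what assign() does. Inside the loop, i is only a character string such as "ABT", not the xts object named ABT, so tk_tbl(i, ...) receives a string (hence the "No index to preserve" warning) and i <- ... merely overwrites the loop variable.
I think you get your expected result when you look the object up with get() and store the result with assign():
for (i in code) {
  assign(i, tk_tbl(get(i), preserve_index = TRUE, rename_index = "index", timetk_idx = FALSE, silent = FALSE))
}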
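An alternative that avoids creating variables by name is to keep the converted series in a named list. The following is only a minimal sketch: the small matrices stand in for the xts objects and as.data.frame() stands in for tk_tbl(), so that it runs without timetk installed.

```r
# Stand-in objects; in the question these would be xts series.
ABT <- matrix(1:4, ncol = 2)
BA  <- matrix(5:8, ncol = 2)
code <- c("ABT", "BA")

# Inside a loop over `code`, i is a character string, not the object itself.
for (i in code) {
  stopifnot(is.character(i))
}

# mget() looks up all the names at once and returns a named list of objects.
objs <- mget(code)

# Convert every element; as.data.frame() is a placeholder for tk_tbl().
tbls <- lapply(objs, as.data.frame)

print(names(tbls))
```

Each converted object is then available as tbls$ABT, tbls$BA, and so on, and the original variables are left untouched.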

Related

How to NOT write_csv if data frame is empty

I have a data frame that is gathered every day via a SQL query. Sometimes it'll have rows in it, sometimes it won't. I then write_csv it into a OneDrive location, which triggers an automated email.
df and code like this if relevant:
df<-structure(list(PROTOCOL_ID = numeric(0), PROTOCOL_NO = character(0),
STATUS = character(0), STATUS_DATE = structure(numeric(0), tzone = "", class = c("POSIXct",
"POSIXt")), PROCESSED_FLAG = character(0), INITIATOR_CODE = numeric(0),
CHANGE_REASON_CODE = numeric(0), PR_STATUS_ID = numeric(0),
COMMENTS = character(0), CREATED_DATE = structure(numeric(0), tzone = "", class = c("POSIXct",
"POSIXt")), CREATED_USER = character(0), MODIFIED_DATE = structure(numeric(0), tzone = "", class = c("POSIXct",
"POSIXt")), MODIFIED_USER = character(0), OUTCOME_ID = numeric(0),
IRB_NO = character(0), NCT_NUMBER = character(0), PI_NAMES = character(0)), row.names = integer(0), class = "data.frame")
write_csv(df, "df.csv")
If the dataframe has zero rows that day, I'd rather it DIDN'T write the csv. I'm sure I could figure out a step that deletes the data frame if empty and then the write_csv line would error, but I'd rather not do that. Is there an easy way to 'turn off' the write?
We could add a condition so that the CSV is only written when the number of rows is greater than 0:
if(nrow(df) > 0) readr::write_csv(df, "df.csv")
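If the same check is needed in several places, it can be wrapped in a small helper. This is a sketch only: write_if_nonempty is a hypothetical name, and it uses base R's write.csv() instead of readr::write_csv() so that it runs without extra packages.

```r
# Hypothetical helper: write the CSV only when there is at least one row.
# Returns TRUE invisibly if a file was written, FALSE otherwise.
write_if_nonempty <- function(df, path) {
  if (nrow(df) == 0) {
    message("No rows; skipping ", path)
    return(invisible(FALSE))
  }
  utils::write.csv(df, path, row.names = FALSE)
  invisible(TRUE)
}

empty_df <- data.frame(x = numeric(0))
full_df  <- data.frame(x = 1:3)

path_empty <- tempfile(fileext = ".csv")
path_full  <- tempfile(fileext = ".csv")

wrote_empty <- write_if_nonempty(empty_df, path_empty)
wrote_full  <- write_if_nonempty(full_df, path_full)
```

Because no file is created for the empty data frame, nothing lands in the watched folder and the automated email should not be triggered that day.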

Change data to numeric type to determine which distribution fits better

I am trying to figure out which distribution fits best logarithmic stock returns. Here is my code:
library(TTR)
sign="^GSPC"
start=19900101
end=20160101
x <- getYahooData(sign, start = start, end = end, freq = "daily")
x$logret <- log(x$Close) - lag(log(x$Close))
x=x[,6]
I want to use the function descdist(x, discrete = FALSE), which I got from this amazing post: https://stats.stackexchange.com/questions/132652/how-to-determine-which-distribution-fits-my-data-best. Nonetheless, R gives me this error:
Error in descdist(x, discrete = FALSE) : data must be a numeric vector
How do I transform my data into a numeric vector?
The output from dput(head(x)) is:
structure(c(NA, -0.00258888580664607, -0.00865029791190164, -0.00980414107803274,
0.00450431207515223, -0.011856706127011), class = c("xts", "zoo"
), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", index = structure(c(631238400,
631324800, 631411200, 631497600, 631756800, 631843200), tzone = "UTC", tclass = "Date"), .Dim = c(6L,
1L), .Dimnames = list(NULL, "logret"))
Pre-process x using as.numeric(na.omit(x)), or simply run
descdist(as.numeric(na.omit(x)), discrete = FALSE)
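To see why the conversion helps, note that an xts object is a matrix with extra attributes, and R does not treat a matrix as a plain vector. The sketch below uses a one-column base R matrix as a stand-in for the xts column (an assumption made so that it runs without xts), with values rounded from the dput output above.

```r
# One-column matrix standing in for the xts "logret" column.
m <- matrix(c(NA, -0.0026, -0.0087, -0.0098, 0.0045, -0.0119),
            ncol = 1, dimnames = list(NULL, "logret"))

# A matrix carries a dim attribute, so it is not a plain vector.
is_vec_before <- is.vector(m)

# na.omit() drops the NA row; as.numeric() strips the remaining attributes.
v <- as.numeric(na.omit(m))
is_vec_after <- is.vector(v, mode = "numeric")

print(c(before = is_vec_before, after = is_vec_after))
```

The leading NA comes from the lagged difference, which is why na.omit() is applied before the conversion.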

How to fix 'no item called "package:pkg" on the search list' without using library(pkg)?

I am writing a package named testpkg and have put quantmod in the Depends section of the DESCRIPTION file.
I have written the following functions:
#' hello1
#'
#' @return NA
#' @export
hello1 <- function() {
print("hello1!")
quantmod::is.HLC("Hello, world!")
}
#' hello2
#'
#' @return NA
#' @export
hello2 <- function () {
x <- structure(c(25.85, 25.639999, 26.700001, 26.26, 26.92, 27.870001,
25.26, 25.52, 26.66, 25.610001, 26.85, 27.74, 26352700, 32512200,
64264600, 25.610001, 26.85, 27.74),
.indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC",
src = "yahoo", updated = structure(1437653990.9303, class = c("POSIXct",
"POSIXt")),
class = c("xts", "zoo"), index = structure(c(1167782400,
1167868800, 1167955200),
tzone = "UTC",
tclass = "Date"),
.Dim = c(3L, 6L), .Dimnames = list(NULL, c("YHOO.Open", "YHOO.High", "YHOO.Low",
"YHOO.Close", "YHOO.Volume", "YHOO.Adjusted")))
print(x)
quantmod::chartSeries(x)
}
Now, when I go into a project and run testpkg::hello1(), I get the output I expected.
However, if I run testpkg::hello2(), I can see that x is printed but the plot is not generated. I get the error:
Error in as.environment("package:quantmod") :
no item called "package:quantmod" on the search list
I know I can fix this by calling library(quantmod) before making the call to testpkg::hello2() but it seems odd to me that testpkg::hello1() can run error-free without calling library(quantmod). What is the reason for this and is there an alternative way to run testpkg::hello2() without calling library(quantmod) first?
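One likely explanation, offered as a sketch and not as a confirmed diagnosis: calling testpkg::hello2() loads the testpkg namespace but does not attach it, so packages listed under Depends are never put on the search path, and pkg::fun() likewise loads a namespace without attaching it. The error message shows that something reached through chartSeries() calls as.environment("package:quantmod"), which only succeeds for attached packages. The snippet below reproduces the same distinction with stats4, a package that ships with R and is not attached by default (chosen here purely for illustration).

```r
pkg   <- "stats4"                      # illustrative package, not quantmod
entry <- paste0("package:", pkg)

# Loading a namespace makes pkg::fun() work ...
loadNamespace(pkg)
loaded <- isNamespaceLoaded(pkg)

# ... but does not put the package on the search list.
attached <- entry %in% search()

# Looking the package up on the search list fails unless it is attached.
lookup <- tryCatch({
  as.environment(entry)
  "found on the search list"
}, error = function(e) conditionMessage(e))

print(c(loaded = loaded, attached = attached))
print(lookup)
```

If this is what is happening, attaching the package with library(testpkg) should also attach quantmod, because packages in Depends are attached together with the package that declares them.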

Access First Graph

I am using the PerformanceAnalytics package to plot a performance summary of a simple PnL series:
library(xts)
library(PerformanceAnalytics)
dates <- structure(c(14008, 14011, 14012, 14013, 14014, 14015, 14018, 14019, 14020, 14021),
class = "Date")
PnL.xts = structure(c(0, -0.00510803851321091, -0.0102109843849305, -0.00138369232677364,
-0.00255257489213331, -0.00200279255353461, 0.0104232666033935,
0.00181846800788812, 4.72633257030091e-05, 0.0138334493571853),
.Dim = c(10L, 1L),
index = structure(c(1210291200, 1210550400, 1210636800, 1210723200,
1210809600, 1210896000, 1211155200, 1211241600,
1211328000, 1211414400),
tzone = "UTC", tclass = "Date"),
.indexCLASS = "Date", tclass = "Date",
.indexTZ = "UTC", tzone = "UTC", .Dimnames = list(NULL, "PnL"),
class = c("xts", "zoo"))
PnL.cum = cumsum(PnL.xts)
ret.ann = Return.annualized(PnL.xts, geometric = FALSE)
ret.cum = Return.cumulative(PnL.xts, geometric = FALSE)
ret.min = min(PnL.cum)
stdev = StdDev.annualized(PnL.xts)
sharpe = SharpeRatio.annualized(PnL.xts, geometric = FALSE)
stats = paste(paste("Annualized Return:", percent(round(ret.ann, 5))),
paste("Cumulative Return:", percent(round(ret.cum, 5))),
paste("Standard Deviation:", round(stdev, 5)),
paste("Sharpe Ratio:", round(sharpe, 5)), sep = '\n' )
lag = 1
descr = paste("Following fitted Granger model - ", lag, " day lag", sep = "")
charts.PerformanceSummary(R = PnL.xts, geometric = FALSE)
text(midrange(dates),ret.min, labels = stats, cex = 1)
mtext(descr, side = 3, line = 31)
However, I want to add some descriptive text to the cumulative PnL chart, such as the annualized return, cumulative return, standard deviation, and Sharpe ratio. How can I paste this into the whitespace in the first graph?
If I plotted the graph by itself, I could just do it with the above code. However, since the charts.PerformanceSummary function automatically plots 3 graphs, it seems I can only access the 3rd graph. Is there any way to access the first of the 3 graphs printed by a function, so that I can write text on it relative to its own coordinates?
Here is an example of what I want: http://i.imgur.com/QXUb2Aq.png. But in this case I had to manually test values of the y coordinate until I found something that worked. Thanks!

How to remove a row from zoo/xts object, given a timestamp

I was happily running with this code:
z=lapply(filename_list, function(fname){
read.zoo(file=fname,header=TRUE,sep = ",",tz = "")
})
xts( do.call(rbind,z) )
until Dirty Data came along with this at the end of one file:
Open High Low Close Volume
2011-09-20 21:00:00 1.370105 1.370105 1.370105 1.370105 1
and this at the start of the next file:
Open High Low Close Volume
2011-09-20 21:00:00 1.370105 1.371045 1.369685 1.3702 2230
So rbind.zoo complains about a duplicate.
I can't use something like:
y <- x[ ! duplicated( index(x) ), ]
as they are in different zoo objects, inside a list. And I cannot use aggregate, as suggested here because they are a list of zoo objects, not one big zoo object. And I can't get one big object 'cos of the duplicates. Catch-22.
So, when the going gets tough, the tough hack together some for loops (excuse the prints and a stop, as this isn't working code yet):
indexes <- do.call("c", unname(lapply(z, index)))
dups=duplicated(indexes)
if(any(dups)){
duplicate_timestamps=indexes[dups]
for(tix in 1:length(duplicate_timestamps)){
t=duplicate_timestamps[tix]
print("We have a duplicate:");print(t)
for(zix in 1:length(z)){
if(t %in% index(z[[zix]])){
print(z[[zix]][t])
if(z[[zix]][t]$Volume==1){
print("-->Deleting this one");
z[[zix]][t]=NULL #<-- PROBLEM
}
}
}
}
stop("There are duplicate bars!!")
}
The bit I've got stumped on is assigning NULL to a zoo row doesn't delete it (Error in NextMethod("[<-") : replacement has length zero). OK, so I do a filter-copy, without the offending item... but I'm tripping up on these:
> z[[zix]][!t,]
Error in Ops.POSIXt(t) : unary '!' not defined for "POSIXt" objects
> z[[zix]][-t,]
Error in `-.POSIXt`(t) : unary '-' is not defined for "POSIXt" objects
P.S. While high-level solutions to my real problem of "duplicate rows across a list of zoo objects" are very welcome, the question here is specifically about how to delete a row from a zoo object given a POSIXt index object.
A small bit of test data:
list(structure(c(1.36864, 1.367045, 1.370105, 1.36928, 1.37039,
1.370105, 1.36604, 1.36676, 1.370105, 1.367065, 1.37009, 1.370105,
5498, 3244, 1), .Dim = c(3L, 5L), .Dimnames = list(NULL, c("Open",
"High", "Low", "Close", "Volume")), index = structure(c(1316512800,
1316516400, 1316520000), class = c("POSIXct", "POSIXt"), tzone = ""), class = "zoo"),
structure(c(1.370105, 1.370115, 1.36913, 1.371045, 1.37023,
1.37075, 1.369685, 1.36847, 1.367885, 1.3702, 1.36917, 1.37061,
2230, 2909, 2782), .Dim = c(3L, 5L), .Dimnames = list(NULL,
c("Open", "High", "Low", "Close", "Volume")), index = structure(c(1316520000,
1316523600, 1316527200), class = c("POSIXct", "POSIXt"), tzone = ""), class = "zoo"))
UPDATE: Thanks to G. Grothendieck for the row-deleting solution. In the actual code I followed the advice of Joshua and GSee to get a list of xts objects instead of a list of zoo objects. So my code became:
z=lapply(filename_list, function(fname){
xts(read.zoo(file=fname,header=TRUE,sep = ",",tz = ""))
})
x=do.call.rbind(z)
(As an aside, please note the call to do.call.rbind. This is because rbind.xts has some serious memory issues. See https://stackoverflow.com/a/12029366/841830 )
Then I remove duplicates as a post-process step:
dups=duplicated(index(x))
if(any(dups)){
duplicate_timestamps=index(x)[dups]
to_delete=x[ (index(x) %in% duplicate_timestamps) & x$Volume<=1]
if(nrow(to_delete)>0){
#Next line says all lines that are not in the duplicate_timestamp group
# OR are in the duplicate timestamps, but have a volume greater than 1.
print("Will delete the volume=1 entry")
x=x[ !(index(x) %in% duplicate_timestamps) | x$Volume>1]
}else{
stop("Duplicate timestamps, and we cannot easily remove them just based on low volume.")
}
}
If z1 and z2 are your zoo objects then to rbind while removing any duplicates in z2:
rbind( z1, z2[ ! time(z2) %in% time(z1) ] )
Regarding deleting points in a zoo object having specified times, the above already illustrates this but in general if tt is a vector of times to delete:
z[ ! time(z) %in% tt ]
or if we knew there were a single element in tt then z[ time(z) != tt ] .
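As a small worked example of this indexing, the sketch below uses two univariate zoo series of volumes loosely based on the test data (the timestamps and the UTC time zone are assumptions made for illustration). Note that it filters z1 instead of z2, so that the low-volume bar is the one dropped, which is what the question asks for.

```r
library(zoo)

# Volumes only; timestamps are illustrative and assumed to be UTC.
z1 <- zoo(c(5498, 3244, 1),
          as.POSIXct(c("2011-09-20 19:00:00", "2011-09-20 20:00:00",
                       "2011-09-20 21:00:00"), tz = "UTC"))
z2 <- zoo(c(2230, 2909),
          as.POSIXct(c("2011-09-20 21:00:00", "2011-09-20 22:00:00"),
                     tz = "UTC"))

# Drop from z1 every time that also occurs in z2, then combine.
z1_clean <- z1[!time(z1) %in% time(z2)]
combined <- rbind(z1_clean, z2)

print(combined)
```

Because ! binds more loosely than %in%, the expression !time(z1) %in% time(z2) negates the whole membership test and needs no extra parentheses.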
rbind.xts will allow duplicate index values, so it could work if you convert to xts first.
x <- lapply(z, as.xts)
y <- do.call(rbind, x)
# keep last value of any duplicates
y <- y[!duplicated(index(y),fromLast=TRUE),]
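The effect of fromLast = TRUE can be checked on a plain vector. In this sketch the numeric idx stands in for the time index and vol for the Volume column (both are illustrative values, not the full data).

```r
idx <- c(1, 2, 2, 3)            # index with one duplicated value
vol <- c(5498, 1, 2230, 2909)   # volumes aligned with idx

# Default: the first occurrence of each index value is kept.
keep_first <- !duplicated(idx)

# fromLast = TRUE: the last occurrence of each index value is kept.
keep_last <- !duplicated(idx, fromLast = TRUE)

print(vol[keep_first])  # 5498 1 2909
print(vol[keep_last])   # 5498 2230 2909
```

Keeping the last duplicate only removes the Volume = 1 bar when the files are bound in chronological order, so that the fuller bar from the later file comes second.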
I think you'll have better luck if you convert to xts first.
a <- structure(c(1.370105, 1.370105, 1.370105, 1.370105, 1), .Dim = c(1L,
5L), index = structure(1316570400, tzone = "", tclass = c("POSIXct",
"POSIXt")), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct",
"POSIXt"), .indexTZ = "", tzone = "", .Dimnames = list(NULL,
c("Open", "High", "Low", "Close", "Volume")), class = c("xts",
"zoo"))
b <- structure(c(1.370105, 1.371045, 1.369685, 1.3702, 2230), .Dim = c(1L,
5L), index = structure(1316570400, tzone = "", tclass = c("POSIXct",
"POSIXt")), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct",
"POSIXt"), .indexTZ = "", tzone = "", .Dimnames = list(NULL,
c("Open", "High", "Low", "Close", "Volume")), class = c("xts",
"zoo"))
(comb <- rbind(a, b))
# Open High Low Close Volume
#2011-09-20 21:00:00 1.370105 1.370105 1.370105 1.370105 1
#2011-09-20 21:00:00 1.370105 1.371045 1.369685 1.370200 2230
dupidx <- index(comb)[duplicated(index(comb))] # indexes of duplicates
tail(comb[dupidx], 1) #last duplicate
# now rbind the last duplicated row with all non-duplicated data
rbind(comb[!index(comb) %in% dupidx], tail(comb[dupidx], 1))
