R data frame with dates - r

I have a data frame in the following form
SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
2007-01-03 142.25 142.86 140.57 141.37 94807600 125.38
2007-01-04 141.23 142.05 140.61 141.67 69620600 125.65
2007-01-05 141.33 141.40 140.38 140.54 76645300 124.64
2007-01-08 140.82 141.41 140.25 141.19 71655000 125.22
2007-01-09 141.31 141.60 140.40 141.07 75680100 125.11
2007-01-10 140.58 141.57 140.30 141.54 72428000 125.53
However, the command index(DATA.FRAME) returns integers rather than dates. What function should I use to get a list of dates instead of integers?
EDIT:
The output of dput(DATA.FRAME) is
structure(list(SPY.Open = c(142.25, 141.23, 141.33, 140.82, 141.31,
140.58), SPY.High = c(142.86, 142.05, 141.4, 141.41, 141.6, 141.57
), SPY.Low = c(140.57, 140.61, 140.38, 140.25, 140.4, 140.3),
SPY.Close = c(141.37, 141.67, 140.54, 141.19, 141.07, 141.54
), SPY.Volume = c(94807600, 69620600, 76645300, 71655000,
75680100, 72428000), SPY.Adjusted = c(125.38, 125.65, 124.64,
125.22, 125.11, 125.53)), .Names = c("SPY.Open", "SPY.High",
"SPY.Low", "SPY.Close", "SPY.Volume", "SPY.Adjusted"), row.names = c("2007-01-03",
"2007-01-04", "2007-01-05", "2007-01-08", "2007-01-09", "2007-01-10"
), class = "data.frame")

I'm not familiar with the command index in R. It looks like the dates are stored as the rownames of your data frame.
I would try:
as.Date(rownames(DataFrameName))
Alternatively, you can turn integers into dates in R. I forget the exact details, but you basically need one reference point, the origin that the integers are counted from (for example, 13516 is 2007-01-03 when counting days from 1970-01-01). See the origin argument in ?as.Date or ?as.POSIXct.

rownames(DATA.FRAME) #results in character vector
#[1] "2007-01-03" "2007-01-04" "2007-01-05" "2007-01-08" "2007-01-09" "2007-01-10"
as.Date(rownames(DATA.FRAME)) #convert to date
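As a minimal sketch of both suggestions, the snippet below uses a small hypothetical data frame (df, cut down to one column) in place of DATA.FRAME. It converts the row names with as.Date, then shows that integer day counts convert back to dates once an origin is supplied. The origin "1970-01-01" is assumed here because that is what R's Date class counts from.

```r
# Hypothetical stand-in for DATA.FRAME: dates live in the row names.
df <- data.frame(
  SPY.Close = c(141.37, 141.67, 140.54),
  row.names = c("2007-01-03", "2007-01-04", "2007-01-05")
)

# Row names are character strings; as.Date() parses the ISO format directly.
dates <- as.Date(rownames(df))
class(dates)

# A Date is a count of days since 1970-01-01, so integers convert back
# as long as the origin is given.
days <- as.integer(dates)
back <- as.Date(days, origin = "1970-01-01")
all(back == dates)
```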

Related

R: do.call with merge and eapply

I am merging two xts objects with join="left", i.e. all rows in the left object and those that match in the right. I loaded these objects into myEnv.
library(quantmod)
myEnv <- new.env()
getSymbols("AAPL;FB", env=myEnv)
[1] "AAPL" "FB"
MainXTS <- do.call(merge, c(eapply(myEnv, Cl), join = "left"))
head(MainXTS)
AAPL.Close FB.Close
2007-01-03 2.992857 NA
2007-01-04 3.059286 NA
2007-01-05 3.037500 NA
2007-01-08 3.052500 NA
2007-01-09 3.306072 NA
2007-01-10 3.464286 NA
range(index(myEnv$AAPL))
[1] "2007-01-03" "2020-10-27"
range(index(myEnv$FB))
[1] "2012-05-18" "2020-10-27"
So far it is working as expected, since the time index in the merged object above is picked up from AAPL. The issue is that when I change the order of the tickers so that FB comes first, the merged object still picks up its time index from AAPL.
myEnv <- new.env()
getSymbols("FB;AAPL", env=myEnv)
[1] "FB" "AAPL"
MainXTS <- do.call(merge, c(eapply(myEnv, Cl), join = "left"))
head(MainXTS)
AAPL.Close FB.Close
2007-01-03 2.992857 NA
2007-01-04 3.059286 NA
2007-01-05 3.037500 NA
2007-01-08 3.052500 NA
2007-01-09 3.306072 NA
2007-01-10 3.464286 NA
I was expecting the time index to be picked up from FB. Does anyone know what I am missing?
I think this happens because the order of the objects in the environment is the same in both cases, regardless of the order in which they were loaded:
ls(myEnv)
[1] "AAPL" "FB"
We can change the order with match, looking up the names in the order we want:
out <- do.call(merge, c(lapply(mget(ls(myEnv)[match(c("FB", "AAPL"),
    ls(myEnv))], myEnv), Cl), join = "left"))
-output
head(out)
# FB.Close AAPL.Close
#2012-05-18 38.23 18.94214
#2012-05-21 34.03 20.04571
#2012-05-22 31.00 19.89179
#2012-05-23 32.00 20.37714
#2012-05-24 33.03 20.19000
#2012-05-25 31.91 20.08178
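The ordering behaviour can be checked without quantmod. This is a minimal base R sketch in which a hypothetical environment env holds two plain numbers standing in for the xts objects: ls() returns names sorted alphabetically no matter the assignment order, while mget() returns objects in exactly the order of the names it is given.

```r
# Hypothetical environment with placeholder values instead of xts objects.
env <- new.env()
assign("FB", 1, envir = env)     # assigned first
assign("AAPL", 2, envir = env)   # assigned second

# ls() sorts names alphabetically by default, so assignment order is lost.
ls(env)

# mget() follows the order of the requested names, so the "left" object
# of a later merge can be chosen explicitly.
ordered <- mget(c("FB", "AAPL"), envir = env)
names(ordered)
```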

diff() function returns an empty object

I'm trying to difference a time series (shown in a linked image: "time series to difference"). Unfortunately, diff(spread) returns an empty object (also shown in a linked image). I also tried diff(spread, 1). I nearly copy-pasted the code from a working example, and I can't find any obvious mistakes. I installed the packages two hours ago, so I have the latest version of every package used.
# working directory path
setwd("C:/Users/Simon/Desktop/Projet serie temp")
#### Q1 ####
require(zoo)
require(tseries)
require(fUnitRoots)
data <- read.csv("base_form.csv",sep=",") #import .csv
View(data) #visualisation
indice = data$Index
dates = data$Dates
spread <- zoo(indice, order.by=dates)
View(spread)
plot.window(ylim = c(-20,20))
plot(spread) # plot the series
dspread <- diff(spread) # first difference
plot(cbind(spread,dspread))
Here is the error I get:
> plot(dspread)
Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
need finite 'ylim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Here is the output of dput(head(spread))
structure(c(83.87, 86.15, 94.07, 90.02, 92.22, 93.18), index = structure(1:6, .Label = c("1990-01",
"1990-02", "1990-03", "1990-04", "1990-05", "1990-06", "1990-07",
"1990-08", "1990-09", "1990-10", "1990-11", "1990-12", "1991-01",
"1991-02", "1991-03", "1991-04", "1991-05", "1991-06", "1991-07",
"1991-08", "1991-09", "1991-10", "1991-11", "1991-12", "1992-01",
"1992-02", "1992-03", "1992-04", "1992-05", "1992-06", "1992-07",
"1992-08", "1992-09", "1992-10", "1992-11", "1992-12", "1993-01",
"1993-02", "1993-03", "1993-04", "1993-05", "1993-06", "1993-07",
"1993-08", "1993-09", "1993-10", "1993-11", "1993-12", "1994-01",
"1994-02", "1994-03", "1994-04", "1994-05", "1994-06", "1994-07",
"1994-08", "1994-09", "1994-10", "1994-11", "1994-12", "1995-01",
"1995-02", "1995-03", "1995-04", "1995-05", "1995-06", "1995-07",
"1995-08", "1995-09", "1995-10", "1995-11", "1995-12", "1996-01",
"1996-02", "1996-03", "1996-04", "1996-05", "1996-06", "1996-07",
"1996-08", "1996-09", "1996-10", "1996-11", "1996-12", "1997-01",
"1997-02", "1997-03", "1997-04", "1997-05", "1997-06", "1997-07",
"1997-08", "1997-09", "1997-10", "1997-11", "1997-12", "1998-01",
"1998-02", "1998-03", "1998-04", "1998-05", "1998-06", "1998-07",
"1998-08", "1998-09", "1998-10", "1998-11", "1998-12", "1999-01",
"1999-02", "1999-03", "1999-04", "1999-05", "1999-06", "1999-07",
"1999-08", "1999-09", "1999-10", "1999-11", "1999-12", "2000-01",
"2000-02", "2000-03", "2000-04", "2000-05", "2000-06", "2000-07",
"2000-08", "2000-09", "2000-10", "2000-11", "2000-12", "2001-01",
"2001-02", "2001-03", "2001-04", "2001-05", "2001-06", "2001-07",
"2001-08", "2001-09", "2001-10", "2001-11", "2001-12", "2002-01",
"2002-02", "2002-03", "2002-04", "2002-05", "2002-06", "2002-07",
"2002-08", "2002-09", "2002-10", "2002-11", "2002-12", "2003-01",
"2003-02", "2003-03", "2003-04", "2003-05", "2003-06", "2003-07",
"2003-08", "2003-09", "2003-10", "2003-11", "2003-12", "2004-01",
"2004-02", "2004-03", "2004-04", "2004-05", "2004-06", "2004-07",
"2004-08", "2004-09", "2004-10", "2004-11", "2004-12", "2005-01",
"2005-02", "2005-03", "2005-04", "2005-05", "2005-06", "2005-07",
"2005-08", "2005-09", "2005-10", "2005-11", "2005-12", "2006-01",
"2006-02", "2006-03", "2006-04", "2006-05", "2006-06", "2006-07",
"2006-08", "2006-09", "2006-10", "2006-11", "2006-12", "2007-01",
"2007-02", "2007-03", "2007-04", "2007-05", "2007-06", "2007-07",
"2007-08", "2007-09", "2007-10", "2007-11", "2007-12", "2008-01",
"2008-02", "2008-03", "2008-04", "2008-05", "2008-06", "2008-07",
"2008-08", "2008-09", "2008-10", "2008-11", "2008-12", "2009-01",
"2009-02", "2009-03", "2009-04", "2009-05", "2009-06", "2009-07",
"2009-08", "2009-09", "2009-10", "2009-11", "2009-12", "2010-01",
"2010-02", "2010-03", "2010-04", "2010-05", "2010-06", "2010-07",
"2010-08", "2010-09", "2010-10", "2010-11", "2010-12", "2011-01",
"2011-02", "2011-03", "2011-04", "2011-05", "2011-06", "2011-07",
"2011-08", "2011-09", "2011-10", "2011-11", "2011-12", "2012-01",
"2012-02", "2012-03", "2012-04", "2012-05", "2012-06", "2012-07",
"2012-08", "2012-09", "2012-10", "2012-11", "2012-12", "2013-01",
"2013-02", "2013-03", "2013-04", "2013-05", "2013-06", "2013-07",
"2013-08", "2013-09", "2013-10", "2013-11", "2013-12", "2014-01",
"2014-02", "2014-03", "2014-04", "2014-05", "2014-06", "2014-07",
"2014-08", "2014-09", "2014-10", "2014-11", "2014-12", "2015-01",
"2015-02", "2015-03", "2015-04", "2015-05", "2015-06", "2015-07",
"2015-08", "2015-09", "2015-10", "2015-11", "2015-12", "2016-01",
"2016-02", "2016-03", "2016-04", "2016-05", "2016-06", "2016-07",
"2016-08", "2016-09", "2016-10", "2016-11", "2016-12", "2017-01",
"2017-02", "2017-03", "2017-04", "2017-05", "2017-06", "2017-07",
"2017-08", "2017-09", "2017-10", "2017-11", "2017-12", "2018-01",
"2018-02"), class = "factor"), class = "zoo")
I cannot reproduce the problem perfectly, but I have some thoughts.
TL;DR: Edit: don't use factors, use either character or Date objects before zoo-ifying things.
I hunted this down by looking at the source for zoo:::diff.zoo. Namely, it was failing at
x - lag(x, k=-1)
# Data:
# numeric(0)
# Index:
# factor(0)
# 338 Levels: 1990-01 1990-02 1990-03 1990-04 1990-05 1990-06 1990-07 1990-08 1990-09 1990-10 1990-11 1990-12 1991-01 1991-02 1991-03 1991-04 1991-05 1991-06 1991-07 1991-08 1991-09 1991-10 1991-11 1991-12 1992-01 1992-02 1992-03 1992-04 ... 2018-02
I believe that typically zoo objects are indexed based on some form of time-progression. This might be simple integers, as in
str(zoo(2:5))
# 'zoo' series from 1 to 4
# Data: int [1:4] 2 3 4 5
# Index: int [1:4] 1 2 3 4
or something more explicit/intentional, such as a Date or POSIXct timestamp. In your case, it's a factor. I don't know if zoo is trying to treat it like an integer (probably not, otherwise it should have come up with something), or like some categorical character, most likely not what you want in a time-series. (Correction: as 42- pointed out, this is actually quite fine.)
So even if zoo intelligently deals with factors, there is also the problem that the date you have listed is not perfectly unambiguous (is not a time-based object). For instance, by "1990-01" do you mean "1990-01-01"? Though it might seem intuitive and obvious to make that assumption, R typically does not follow you on that leap.
Try this:
(ind <- index(x))
# [1] 1990-01 1990-02 1990-03 1990-04 1990-05 1990-06
# 338 Levels: 1990-01 1990-02 1990-03 1990-04 1990-05 1990-06 1990-07 1990-08 1990-09 1990-10 1990-11 1990-12 ... 2018-02
(ind <- as.Date(paste0(ind, "-01"), format="%Y-%m-%d"))
# [1] "1990-01-01" "1990-02-01" "1990-03-01" "1990-04-01" "1990-05-01" "1990-06-01"
index(x) <- ind
(The surrounding parentheses are merely a shortcut to dump the output post-assignment. They can be safely removed for production.) That now allows
x - lag(x, k=-1)
# 1990-01-01 1990-02-01 1990-03-01 1990-04-01 1990-05-01 1990-06-01
# NA 2.28 7.92 -4.05 2.20 0.96
which means your spread is likely working now:
diff(x)
# 1990-02-01 1990-03-01 1990-04-01 1990-05-01 1990-06-01
# 2.28 7.92 -4.05 2.20 0.96
My guess is that your data import should instead look like:
data <- read.csv("base_form.csv",sep=",") #import .csv
indice = data$Index
dates = as.Date(paste0(data$Dates, "-01"), format="%Y-%m-%d")
spread <- zoo(indice, order.by=dates)
or more simply
data <- read.csv("base_form.csv",sep=",")
dates = as.character(data$Dates)
or even more simply
data <- read.csv("base_form.csv",sep=",", stringsAsFactors=FALSE)
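To see the effect of the import options without the original file, here is a minimal sketch that reads a three-row CSV from an in-memory string. The string csv_text is a made-up stand-in for base_form.csv, with column names taken from the question. Note that since R 4.0.0, stringsAsFactors already defaults to FALSE, so the explicit argument mainly matters on older versions.

```r
# Made-up stand-in for base_form.csv, with the same column names.
csv_text <- "Dates,Index\n1990-01,83.87\n1990-02,86.15\n1990-03,94.07"

dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(dat$Dates)   # character rather than factor

# Appending a day makes the year-month strings unambiguous dates.
dates <- as.Date(paste0(dat$Dates, "-01"), format = "%Y-%m-%d")
class(dates)

# First differences of the raw values, for comparison with diff(spread).
diff(dat$Index)
```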
The problem appears to be the dates are encoded as factors. Note the difference if we construct spread manually:
> indice <- c(83.87, 86.15, 94.07, 90.02, 92.22, 93.18)
> dates <- as.factor(c("1990-01", "1990-02", "1990-03", "1990-04", "1990-05", "1990-06"))
> spread <- zoo(indice, order.by = dates)
> diff(spread)
Data:
numeric(0)
Index:
factor(0)
Levels: 1990-01 1990-02 1990-03 1990-04 1990-05 1990-06
> dates <- c("1990-01", "1990-02", "1990-03", "1990-04", "1990-05", "1990-06")
> spread <- zoo(indice, order.by = dates)
> diff(spread)
1990-02 1990-03 1990-04 1990-05 1990-06
2.28 7.92 -4.05 2.20 0.96
To fix it, you can try adding stringsAsFactors = FALSE to your read.csv.
data <- read.csv("base_form.csv", stringsAsFactors = FALSE)
(Note that sep = "," is the default for read.csv, so you don't really need to specify it.)
EDIT: I should add that there are more zoo-like ways of reading dates in correctly, see https://cran.r-project.org/web/packages/zoo/vignettes/zoo-read.pdf
I'm posting to correct what I think are some inaccuracies in r2evans's analysis of the problem. It is true that the problem stems from using a factor as an index. The factor class in R does not support ordering operations and at least one of the "o"'s in the name "zoo" stands for "ordered". It could have been solved quickly by:
index(spread) <- as.character(index(spread))
Then the diff-operation would have succeeded, and the cbind operation would also have succeeded because there is a cbind.zoo function that recognizes differences in number of rows and automagically pads the shorter columns with NA's at the beginning.
> cbind( diff(spread), spread )
diff(spread) spread
1990-01 NA 83.87
1990-02 2.28 86.15
1990-03 7.92 94.07
1990-04 -4.05 90.02
1990-05 2.20 92.22
1990-06 0.96 93.18
> cbind( diff(diff(spread)), spread )
diff(diff(spread)) spread
1990-01 NA 83.87
1990-02 NA 86.15
1990-03 5.64 94.07
1990-04 -11.97 90.02
1990-05 6.25 92.22
1990-06 -1.24 93.18
Character vectors are perfectly acceptable index classes for zoo. They will be ordered as lexical values. It's perfectly acceptable to make a "<" or ">" operation on two character values, so there is no ambiguity in this case. The zoo-package also has a yearmon class that this index could become if desired.
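The difference between the two index classes can be seen in base R alone. The sketch below (variable names are illustrative) compares two year-month labels as character strings and as an unordered factor: the character comparison is lexical and works, while the factor comparison returns NA with a warning, which is suppressed here.

```r
months_chr <- c("1990-01", "1990-02", "1990-03")
months_fct <- factor(months_chr)

# Character strings compare lexically, which matches chronological
# order for zero-padded "YYYY-MM" labels.
months_chr[1] < months_chr[2]

# "<" is not meaningful for unordered factors: R warns and returns NA.
res <- suppressWarnings(months_fct[1] < months_fct[2])
is.na(res)

# Converting the factor back to character restores a usable ordering.
sort(as.character(months_fct), decreasing = TRUE)
```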

dailyReturn with xts object

I am having difficulty using dailyReturn function on an xts object with multiple return series.
a<-Cl(getSymbols("INTC",auto.assign=FALSE))
b<-Cl(getSymbols("IBM",auto.assign=FALSE))
a<-merge(a,b)
dailyReturn(a[,1]) #This works!
dailyReturn(a) #Only return the result for first series
apply(a,2, dailyReturn)
#Error in array(ans, c(len.a%/%d2, d.ans), if (!all(vapply(dn.ans, is.null, :
length of 'dimnames' [1] not equal to array extent
How do I get dailyReturn to return the daily returns for multiple series in xts object?
I prefer ROC also, but if you must use dailyReturn, you can lapply over the columns and cbind them back together.
> head(do.call(cbind, lapply(a, dailyReturn)))
daily.returns daily.returns.1
2007-01-03 0.0000000000 0.000000000
2007-01-04 0.0402948403 0.010691889
2007-01-05 -0.0033065659 -0.009052996
2007-01-08 -0.0042654028 0.015191952
2007-01-09 0.0009519277 0.011830131
2007-01-10 0.0233000476 -0.011791746
I used do.call so that it will work with any number of columns.
I would just use TTR::ROC instead.
> head(r <- ROC(a, type="discrete"))
INTC.Close IBM.Close
2007-01-03 NA NA
2007-01-04 0.0402948403 0.010691889
2007-01-05 -0.0033065659 -0.009052996
2007-01-08 -0.0042654028 0.015191952
2007-01-09 0.0009519277 0.011830131
2007-01-10 0.0233000476 -0.011791746
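For reference, a "discrete" return is each price divided by the previous price, minus one. The following base R sketch applies that to every column of a plain matrix at once. The matrix prices and its values are invented for illustration, and this is not how TTR or quantmod implement it internally.

```r
# Invented prices for two series; rows are consecutive days.
prices <- cbind(A = c(100, 104, 102), B = c(50, 51, 49.98))

# Divide each row by the previous row, column by column, and pad the
# first row with NA because it has no previous price.
n <- nrow(prices)
returns <- rbind(NA, prices[-1, , drop = FALSE] / prices[-n, , drop = FALSE] - 1)
returns
```

The first row is NA here, as in the ROC output above, whereas dailyReturn reports 0 for the first day.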

How to write a "find and replace all BUT" function in R? [closed]

I have a dataframe that looks roughly like below (meaning that it is an approximation made for illustration, and not an exact replica of the dataframe you can download through the link below, or get from the dput() I pasted below):
March_created_at  March_email      March_type  April_created_at  April_email  April_type
3/11/12 7:28      jeremy#asynk.ch  PushEvent   4/1/12 4:03                    PushEvent
3/11/12 7:28      jeremy#asynk.ch  PushEvent   4/1/12 4:03                    PushEvent
3/11/12 7:28      jeremy#asynk.ch  PushEvent   4/1/12 4:03                    PushEvent
3/11/12 7:28      jeremy#asynk.ch  PushEvent   4/1/12 7:03       high         IssuesEvent
3/11/12 11:06     medium           PushEvent   4/1/12 13:57      medium       PushEvent
3/11/12 11:06     medium           PushEvent   4/1/12 13:57      medium       PushEvent
3/11/12 11:06     medium           PushEvent   4/1/12 13:57      medium       PushEvent
3/11/12 12:46                      PushEvent
3/11/12 12:46                      PushEvent
3/11/12 12:46                      PushEvent
The full dataset can be found here as a CSV file
I'm looking to write a function that takes the following inputs:
1. A dataframe
2. Certain columns of that dataframe
3. A list of strings (e.g. a set of email addresses)
4. A replacement string (e.g. "low")
Now, I want the function to go through only the specified columns of that dataframe and replace all of the strings (as well as empty cells) that do not match the list of strings specified in point 3 above with the replacement string in point 4. However, this should only be done if the following condition holds:
The cell under consideration needs to have a timestamp for the same month.
For example, let's say we are about to replace the empty cell on row 8 in column "March_email". I can see that on row 8 in the column "March_created_at" there is a timestamp, so I can go ahead and replace this empty cell with the specified string (e.g. "low"). However, look at row 8 in the column "April_email". This cell is also empty, and so is the cell on row 8 in column "April_created_at". In this case, nothing should be done (i.e. no string inserted).
The reason I want to do this is that certain cells are just empty because there is no data, so nothing should be inserted. Other cells are empty because the data is missing, so I need to impute the data based on the function I specified above.
How can I accomplish this in R?
Appendix: Here is a dput() of the head of the dataset:
structure(list(March_created_at = c("2012-03-11 07:28:04", "2012-03-11 07:28:04",
"2012-03-11 07:28:04", "2012-03-11 07:28:19", "2012-03-11 07:28:19",
"2012-03-11 07:28:19"), March_actor_attributes_email = c("jeremy#asynk.ch",
"jeremy#asynk.ch", "jeremy#asynk.ch", "jeremy#asynk.ch", "jeremy#asynk.ch",
"jeremy#asynk.ch"), March_type = c("PushEvent", "PushEvent",
"PushEvent", "PushEvent", "PushEvent", "PushEvent"), April_created_at = c("2012-04-01 04:03:13",
"2012-04-01 04:03:13", "2012-04-01 04:03:13", "2012-04-01 07:03:11",
"2012-04-01 07:03:11", "2012-04-01 07:03:11"), April_actor_attributes_email = c("",
"", "", "high", "high", "high"), April_type = c("PushEvent",
"PushEvent", "PushEvent", "IssuesEvent", "IssuesEvent", "IssuesEvent"
), May_created_at = c("2012-05-01 00:16:05", "2012-05-01 00:16:05",
"2012-05-01 00:16:05", "2012-05-01 01:03:19", "2012-05-01 01:03:19",
"2012-05-01 01:03:19"), May_actor_attributes_email = c("john.firebaugh#gmail.com",
"john.firebaugh#gmail.com", "john.firebaugh#gmail.com", "mitch.tishmack#gmail.com",
"mitch.tishmack#gmail.com", "mitch.tishmack#gmail.com"), May_type = c("PushEvent",
"PushEvent", "PushEvent", "IssueCommentEvent", "IssueCommentEvent",
"IssueCommentEvent"), June_created_at = c("2012-06-01 00:25:05",
"2012-06-01 00:25:05", "2012-06-01 00:25:05", "2012-06-01 00:42:29",
"2012-06-01 00:42:29", "2012-06-01 00:42:29"), June_actor_attributes_email = c("michaelklishin#me.com",
"michaelklishin#me.com", "michaelklishin#me.com", "", "", ""),
June_type = c("IssueCommentEvent", "IssueCommentEvent", "IssueCommentEvent",
"PushEvent", "PushEvent", "PushEvent"), July_created_at = c("2012-07-01 13:46:20",
"2012-07-01 13:46:20", "2012-07-02 11:53:37", "2012-07-02 11:53:37",
"2012-07-02 12:27:30", "2012-07-02 12:27:30"), July_actor_attributes_email = c("medium",
"medium", "ryoqun#gmail.com", "ryoqun#gmail.com", "ryoqun#gmail.com",
"ryoqun#gmail.com"), July_type = c("PushEvent", "PushEvent",
"CreateEvent", "CreateEvent", "PushEvent", "PushEvent"),
August_created_at = c("2012-08-01 00:04:09", "2012-08-01 00:04:09",
"2012-08-01 00:04:42", "2012-08-01 00:04:42", "2012-08-01 00:05:04",
"2012-08-01 00:05:04"), August_actor_attributes_email = c("jeremy#asynk.ch",
"jeremy#asynk.ch", "jeremy#asynk.ch", "jeremy#asynk.ch",
"jeremy#asynk.ch", "jeremy#asynk.ch"), August_type = c("IssueCommentEvent",
"IssueCommentEvent", "IssuesEvent", "IssuesEvent", "IssueCommentEvent",
"IssueCommentEvent"), September_created_at = c("2012-09-01 18:12:24",
"2012-09-01 18:12:24", "2012-09-01 23:51:18", "2012-09-01 23:51:18",
"2012-09-02 00:34:54", "2012-09-02 00:34:54"), September_actor_attributes_email = c("ryoqun#gmail.com",
"ryoqun#gmail.com", "ryoqun#gmail.com", "ryoqun#gmail.com",
"ryoqun#gmail.com", "ryoqun#gmail.com"), September_type = c("CommitCommentEvent",
"CommitCommentEvent", "CreateEvent", "CreateEvent", "PushEvent",
"PushEvent"), October_created_at = c("2012-10-01 07:48:38",
"2012-10-01 10:01:40", "2012-10-01 10:01:43", "2012-10-01 10:17:00",
"2012-10-01 16:08:29", "2012-10-01 18:06:46"), October_actor_attributes_email = c("medium",
"medium", "medium", "medium", "", "core"), October_type = c("PushEvent",
"IssuesEvent", "PushEvent", "PushEvent", "ForkEvent", "PullRequestEvent"
)), .Names = c("March_created_at", "March_actor_attributes_email",
"March_type", "April_created_at", "April_actor_attributes_email",
"April_type", "May_created_at", "May_actor_attributes_email",
"May_type", "June_created_at", "June_actor_attributes_email",
"June_type", "July_created_at", "July_actor_attributes_email",
"July_type", "August_created_at", "August_actor_attributes_email",
"August_type", "September_created_at", "September_actor_attributes_email",
"September_type", "October_created_at", "October_actor_attributes_email",
"October_type"), row.names = c(NA, 6L), class = "data.frame")
How about something like this:
myfun <- function(month, DF, matches, replacement) {
  email.col <- paste0(month, '_actor_attributes_email')
  date.col <- paste0(month, '_created_at')
  # replace only where a timestamp exists and the email is not in 'matches'
  DF[[email.col]] <- ifelse(DF[[date.col]] != '' & !DF[[email.col]] %in% matches,
                            replacement,
                            DF[[email.col]])
  return (DF[, c(date.col, email.col)])
}
myfun('April', dat, 'high', 'foo')
# April_created_at April_actor_attributes_email
# 1 2012-04-01 04:03:13 foo
# 2 2012-04-01 04:03:13 foo
# 3 2012-04-01 04:03:13 foo
# 4 2012-04-01 07:03:11 high
# 5 2012-04-01 07:03:11 high
# 6 2012-04-01 07:03:11 high
Then, you can just feed it a bunch of months...
out <- lapply(list('March', 'April', 'May'),
              myfun, DF=dat, matches='high', replacement='foo')
And you can get that back into a data.frame quickly:
as.data.frame(unlist(out, recursive=FALSE))
There are plenty of other ways and options but this should give you a big start.
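One of those other ways is to assign into the matching rows directly instead of using ifelse. The sketch below is self-contained: the data frame dat is a three-row toy example (not the real dataset), and the function name replace_unmatched and its argument names are hypothetical. It follows the rule from the question: replace an email only when the row has a timestamp for the same month and the email is not in the list of strings to keep.

```r
# Toy data; column names follow the dput() in the question.
dat <- data.frame(
  April_created_at = c("2012-04-01 04:03:13", "2012-04-01 07:03:11", ""),
  April_actor_attributes_email = c("", "high", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns DF with unmatched emails replaced.
replace_unmatched <- function(DF, month, keep, replacement) {
  email_col <- paste0(month, "_actor_attributes_email")
  date_col  <- paste0(month, "_created_at")
  # A row qualifies only if it has a timestamp for this month...
  has_date  <- !is.na(DF[[date_col]]) & DF[[date_col]] != ""
  # ...and its email (including an empty one) is not in the keep list.
  unmatched <- !(DF[[email_col]] %in% keep)
  DF[[email_col]][has_date & unmatched] <- replacement
  DF
}

res <- replace_unmatched(dat, "April", keep = "high", replacement = "low")
res$April_actor_attributes_email
```

In this toy data the third row has no timestamp, so its empty email is left untouched.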

quantmod ... unable to get OHLCV symbol data for current day

I am not able to get OHLCV data from Yahoo for the current day using the quantmod getSymbols() call. The data exists on Yahoo, and I can also see today's OHLCV data in my charting platform. As a workaround I got today's EOD quote from Yahoo using the getQuote(..) call. But when I tried to append this to the downloaded symbol data via rbind, the data object gets populated with NULLs.
I would appreciate any suggestions on either how to append today's quote to the downloaded historic symbol data, or any R APIs I can call after market hours to get symbol EOD (OHLCV) data including today's. Thanks.
library(quantmod)
library(blotter)
library(PerformanceAnalytics)
getSymbols("SPY")
spy.quote = getQuote("SPY", what = yahooQuote.EOD)
> tail(SPY, n=3)
SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
2012-10-25 142.02 142.28 140.57 141.43 134457400 141.43
2012-10-26 141.30 141.84 140.39 141.35 146023500 141.35
2012-10-31 141.85 142.03 140.68 141.35 103341300 141.35
> spy.quote
Trade Time Open High Low Close Volume
SPY 2012-11-01 04:00:00 141.65 143.01 141.52 142.83 100990760
> SPY = rbind(SPY, spy.quote)
> tail(SPY, n=3)
SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL
spy.quote NULL NULL NULL NULL NULL NULL
You need to convert the quote data from a data.frame to an xts object and add a column for Adjusted price. Then you can rbind.
getSymbols("SPY", src='yahoo', to='2012-10-31')
spy.quote = getQuote("SPY", what = yahooQuote.EOD)
# convert to xts
xts.quote <- xts(spy.quote[, -1], as.Date(spy.quote[, 1])) # use Date for indexClass
xts.quote$Adjusted <- xts.quote[, 'Close'] # add an Adjusted column
tail(rbind(SPY, xts.quote), 3)
SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
2012-10-26 141.30 141.84 140.39 141.35 146023500 141.35
2012-10-31 141.85 142.03 140.68 141.35 103341300 141.35
2012-11-01 141.65 143.01 141.52 142.83 100995568 142.83
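The same two steps (add the missing column, then bind objects of matching shape) can be illustrated in base R. In this sketch plain data frames stand in for the xts objects, with only three columns and values copied from the output above, so it is an analogy rather than the xts code path. The names history and today are hypothetical.

```r
# Data frames standing in for the xts objects (an assumption for illustration).
history <- data.frame(
  Open = c(141.30, 141.85),
  Close = c(141.35, 141.35),
  Adjusted = c(141.35, 141.35),
  row.names = c("2012-10-26", "2012-10-31")
)
today <- data.frame(Open = 141.65, Close = 142.83, row.names = "2012-11-01")

# The quote has no Adjusted column, so copy Close into it first.
today$Adjusted <- today$Close

# With matching columns in the same order, rbind() appends the new row.
combined <- rbind(history, today[, names(history)])
tail(combined, 3)
```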

Resources