Splitting dataset based on date in R, using library (lubridate) - r

While splitting a dataset I get the following error, and I am looking for advice on how to overcome it:
dt=read.csv("C:/xx/fData.csv")
testdata = dt[year(dt$date) < 2010,]
valid = dt[year(dt$date) > 2010,]
> training = dt[year(dt$date) < 2010,]
Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format
> testing = dt[year(dt$date) > 2010,]
Error in as.POSIXlt.character(as.character(x), ...) :
ps: fData looks like
| date | number
----------------------
1 |1/1/2011| 0
2 |1/2/2011| 0

Given that the first part of your string is the day of the month and the second part is the month, you should convert your string into a Date before calling the year function:
dt$date <- as.Date(dt$date,'%d/%m/%Y')
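A minimal sketch of the whole split, assuming a small made-up data frame in place of fData.csv. It uses base R's format() to extract the year, so it runs without extra packages. lubridate's year() should give the same numbers once the column is a Date. The `>=` in the second filter is an assumption, chosen so that rows from 2010 itself are not dropped.

```r
# Made-up stand-in for fData.csv (day/month/year strings, as in the question)
dt <- data.frame(
  date = c("1/1/2009", "15/6/2010", "1/1/2011", "1/2/2011"),
  number = c(3, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Convert the strings to Date first, stating the format explicitly
dt$date <- as.Date(dt$date, format = "%d/%m/%Y")

# Extract the year as an integer (base R equivalent of lubridate::year)
yr <- as.integer(format(dt$date, "%Y"))

# Split on the year; rows from 2010 go to the second set (assumption)
training <- dt[yr < 2010, ]
testing <- dt[yr >= 2010, ]

print(training)
print(testing)
```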

Related

Error in converting character to time variable in r with Lubridate packages

Thanks for your help.
One of the variables in my dataset looks like this:
> df$TM
> [1] "000054" "000020" "000056" "000051" "000025" "000116" "000219" "000207" "000233" "000206" "000142" "000126" "000237" "000215" "000236" "000246" "000219"
[18] "000227" "000803" "000920"...
The real meaning of each character is hours, minutes and seconds.
When I apply the hms function from lubridate as follows
> df$TM <- hms(df$TM)
I get this warning: "In .parse_hms(..., order = "HMS", quiet = quiet) :
Some strings failed to parse, or all strings are NAs"
After that, all the values in the column change to NA.
I also tried
> df$TM <- as.POSIXct(df$TM, format = "%H:%M:%S")
and
> df$TM <- chronicle(times = df$TM)
and
> df$TM <- strptime(df$TM, format = "%H:%M:%S")
but these three attempts gave the same result.
(Since all the data changed to NA, the warning is effectively an error for me.)
I really appreciate your help.
You can make use of this answer to insert a colon after every second character. After that you can convert the resulting character string to a date-time (with day, month and year) or leave it as is.
For completeness, the solution for your problem then is
as.POSIXct(sub(":+$", "", gsub('(.{2})', '\\1:', df$TM)), format = "%H:%M:%S")
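As a minimal sketch of what the substitution does, here it is applied to a few of the values from the question. The vector name `tm` is made up for illustration. The second half shows an alternative that skips the colons and computes seconds since midnight with plain arithmetic.

```r
# A few of the six-character HHMMSS strings from the question
tm <- c("000054", "000020", "000803", "000920")

# Insert a colon after every two characters, then drop the trailing colon
with_colons <- sub(":+$", "", gsub("(.{2})", "\\1:", tm))
print(with_colons)

# Alternative: seconds since midnight, computed from the fixed positions
secs <- as.integer(substr(tm, 1, 2)) * 3600L +
  as.integer(substr(tm, 3, 4)) * 60L +
  as.integer(substr(tm, 5, 6))
print(secs)
```

Note that as.POSIXct with a time-only format fills in the current date, so the seconds form may be easier to work with when only the time of day matters.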

How to choose a specific line in a dataset using R

I want to pick the rows in which Date is 17/12/2006 or 18/12/2006. The type of Date is character. I use this code:
a<-c('17/12/2006','18/12/2006')
NewTable<-WholeTable[which($Date %in% a)]
The error is "Error in which$Date : object of type 'closure' is not subsettable"
Then I tried different code:
WholeTable$Date <- as.character(WholeTable$Date)
NewTable<-subset(WholeTable, Date == "17/12/2006"|Date == "18/12/2006")
It creates a new subset, but with 0 rows.
I am really confused.
It would be easier to help if you provided a minimal dataset. If I understand correctly, though, this should work:
# In this example date is a character column (a factor with 4 levels in R versions before 4.0.0)
Wholetable <- data.frame(date = c("16/12/2006", "17/12/2006", "18/12/2006", "19/12/2006"), a = c(1:4))
Newtable <- subset(Wholetable, date == "17/12/2006" | date == "18/12/2006")
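The first attempt in the question can also be made to work with %in%. This is a sketch on a made-up four-row table, with column names taken from the question. Two things differ from the failing line: the data frame name goes in front of `$Date`, and a comma follows the condition so that rows are selected.

```r
# Made-up stand-in for the table in the question
WholeTable <- data.frame(
  Date = c("16/12/2006", "17/12/2006", "18/12/2006", "19/12/2006"),
  a = 1:4,
  stringsAsFactors = FALSE
)

wanted <- c("17/12/2006", "18/12/2006")

# Logical row index: TRUE where Date is one of the wanted values
NewTable <- WholeTable[WholeTable$Date %in% wanted, ]
print(NewTable)

# If a comparison unexpectedly matches nothing, inspect the raw values first
print(unique(WholeTable$Date))
```

When a subset comes back with 0 rows, printing the unique values as above is a quick way to see whether the stored strings really look like the ones being compared against.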

Error in if ((location <= 1) | (location >= length(x)) - R - Eventstudies

I am trying my best at a simple event study in R, with some data retrieved from the Wharton Research Data Service (WRDS). I am not completely new to R, but I would describe my expertise level as intermediate. So, here is the problem. I am using the eventstudies package and one of the steps is converting the physical dates to event time frame dates with the phys2eventtime(..) function. This function takes multiple arguments:
z: time series data for which the event frame is to be generated, in the form of an xts object.
events: a data frame with two columns, unit and when. unit holds the column name of the response to be measured on the event date, while when holds the event date.
width: the number of days on each side of the event date. For a given width, if there is any NA in the event window then the last observation is carried forward.
The authors of the package have provided an example for the xts object (StockPriceReturns) and for Events (SplitDates). This looks like the following:
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
unit when
5 BHEL 2011-10-03
6 Bharti.Airtel 2009-07-24
8 Cipla 2004-05-11
9 Coal.India 2010-02-16
10 Dr.Reddy 2001-10-10
11 HDFC.Bank 2011-07-14
> head(StockPriceReturns)
Mahindra.&.Mahindra
2000-04-03 -8.3381609
2000-04-04 0.5923550
2000-04-05 6.8097616
2000-04-06 -0.9448889
2000-04-07 7.6843828
2000-04-10 4.1220462
2000-04-11 -1.9078480
2000-04-12 -8.3286900
2000-04-13 -3.8876847
2000-04-17 -8.2886060
So I have constructed my data in the same way, an xts object (DS_xts) and a data.frame (cDS) with the columns "unit" and "when". This is how it looks:
> head(DS_xts)
61241
2011-01-03 0.024247
2011-01-04 0.039307
2011-01-05 0.010589
2011-01-06 -0.022172
2011-01-07 0.018057
2011-01-10 0.041488
> head(cDS)
unit when
1 11754 2012-01-05
2 10104 2012-01-24
3 61241 2012-01-31
4 13928 2012-02-07
5 14656 2012-02-08
6 60097 2012-02-14
These are similar in my opinion, but how it looks does not tell the whole story. I am quite certain that my problem is in how I have constructed these two objects. Below is my R code:
#install.packages("eventstudies")
library("eventstudies")
DS = read.csv("ReturnData.csv")
cDS = read.csv("EventData.csv")
#Calculate Abnormal Returns
DS$AR = DS$RET - DS$VWRETD
#Clean up and let only necessary columns remain
DS = DS[, c("PERMNO", "DATE", "AR")]
cDS = cDS[, c("PERMNO", "DATE")]
#Generate correct date format according to R's as.Date
for (i in 1:nrow(DS)) {
DS$DATE[i] = format(as.Date(toString(DS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
for (i in 1:nrow(cDS)) {
cDS$DATE[i] = format(as.Date(toString(cDS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
#Rename cDS columns according to phys2eventtime format
colnames(cDS)[1] = "unit"
colnames(cDS)[2] = "when"
#Create list of unique PERMNO's
PERMNO <- unique(DS$PERMNO)
for (i in 1:length(PERMNO)) {
#Subset based on PERMNO
DStmp <- DS[DS$PERMNO == PERMNO[i], ]
#Remove PERMNO column and rename AR to PERMNO
DStmp <- DStmp[, c("DATE", "AR")]
colnames(DStmp)[2] = as.character(PERMNO[i])
dates <- as.Date(DStmp$DATE)
DStmp <- DStmp[, -c(1)]
#Create a temporary XTS object
DStmp_xts <- xts(DStmp, order.by = dates)
#If first iteration, just create new variable, otherwise merge
if (i == 1) {
DS_xts <- DStmp_xts
} else {
DS_xts <- merge(DS_xts, DStmp_xts, all = TRUE)
}
}
#Renaming columns for matching
colnames(DS_xts) <- c(PERMNO)
#Making sure classes are the same
cDS$unit <- as.character(cDS$unit)
eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
So, if I run phys2eventtime(..) it returns:
> eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
Error in if ((location <= 1) | (location >= length(x))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In findInterval(when, index(x)) : NAs introduced by coercion
I have looked at the original function (it is available at their GitHub, can't use more than two links yet) to figure out this error, but I ran out of ideas how to debug it. I hope someone can help me sort it out. As a final note, I have also looked at another (magnificent) answer related to this R package (question: "format a zoo object with “dimnames”=List of 2"), but it wasn't enough to help me solve it (or I couldn't yet comprehend it).
Here is the link for the two CSV files if you would like to reproduce my error (or solve it!).
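One thing that may be worth checking, although nothing in this thread confirms it as the cause: the loops above store the result of format(), which is a character string, so cDS$when is probably character rather than Date. That would be consistent with the warning "NAs introduced by coercion" from findInterval(when, index(x)). The sketch below uses a made-up three-row stand-in for EventData.csv and assumes DATE is stored as YYYYMMDD numbers. The name `events` is chosen here for illustration.

```r
# Hypothetical stand-in for EventData.csv, assuming DATE is stored as YYYYMMDD
cDS <- data.frame(
  PERMNO = c(11754, 10104, 61241),
  DATE = c(20120105, 20120124, 20120131)
)

# format() always returns character, so a column filled from it is not a Date
formatted <- format(as.Date("20120105", format = "%Y%m%d"), format = "%Y-%m-%d")
print(class(formatted))

# Vectorised conversion that keeps `when` as a Date and `unit` as character
events <- data.frame(
  unit = as.character(cDS$PERMNO),
  when = as.Date(as.character(cDS$DATE), format = "%Y%m%d"),
  stringsAsFactors = FALSE
)
str(events)
```

Running str() on the real cDS just before the phys2eventtime call would show whether when is a Date or a character column.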

quantmod <- Having trouble writing a formula to extract single day returns without headers

I am attempting to write a function that will return a stock's single-day return, but I believe I'm having trouble with the data type of the periodReturn subset field.
periodReturn(ticker,period='daily',subset='20161010::20161010')
works but
dayReturn <- function(ticker,date) {
ticker <- c(MSFT)
date <- c(20161010)
dayreturn <- periodReturn(ticker,period='daily',paste("subset='",date,"::",date,"'"))
dayreturn
}
gives error
dayReturn(msft,20161010)
daily.returns
Warning messages:
1: In as_numeric(YYYY) : NAs introduced by coercion
2: In as_numeric(MM) : NAs introduced by coercion
3: In as_numeric(DD) : NAs introduced by coercion
>
Thanks in advance for any advice!
You have a couple of syntax errors going on here inside your dayReturn function.
Here is reproducible code extracted from inside your function that will work:
library(quantmod)
getSymbols("MSFT")
ticker <- c(MSFT)
date <- c("20161010")
dayreturn <- periodReturn(ticker,period='daily',subset = paste0(date,"::",date))
Your errors:
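date should be a string, not a number.
Your string for the dates you want to subset over is incorrect. paste() inserts spaces between its pieces, and writing subset= inside the quotes makes it part of the string instead of naming the argument. You want to use subset = "YYYYMMDD::YYYYMMDD" (or subset = "YYYY-MM-DD::YYYY-MM-DD") inside periodReturn.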
Your function would work more correctly like this:
dayReturn <- function(ticker, date1 , date2) {
dayreturn <- periodReturn(ticker,period='daily',subset = paste0(date1,"::",date2))
dayreturn
}
dayReturn(MSFT, "20161010", "20161012")
# daily.returns
# 2016-10-10 0.004152284
# 2016-10-11 -0.014645107
# 2016-10-12 -0.001398811
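To see the string problem in isolation, the sketch below builds the range with base R only, with no quantmod or network access needed. It compares what the question's paste() call produces with a plain "FROM::TO" string. The variable names `bad` and `good` are made up for illustration.

```r
date <- "20161010"

# What the question's call builds: the argument name lands inside the string
# and paste() separates every piece with a space
bad <- paste("subset='", date, "::", date, "'")
print(bad)

# A plain range string, to be passed as subset = good
good <- paste0(date, "::", date)
print(good)
```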

converting time to POSIXct in R

I have a data frame that can have values like this:
p<-c("2012-08-14 9:00", "2012-08-14 7:00:00")
I am trying to convert it to datetime like this:
p<-as.POSIXct(p)
This used to convert everything to the form 2012-08-14 09:00:00,
but for some reason it is not working anymore. As you may have noticed, my data sometimes has seconds and sometimes does not.
How do I force this into datetime format?
I get errors like this:
Error in as.POSIXlt.character(p) :
character string is not in a standard unambiguous format
Your vector isn't in a consistent format, so convert it to POSIXlt first because as.POSIXlt.character checks multiple formats.
p <- c("2012-08-14 9:00", "2012-08-14 7:00:00")
plt <- as.POSIXlt(p)
pct <- as.POSIXct(plt)
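Another way to handle the mixed input, sketched in base R: parse with the seconds format first, then retry the elements that came back NA with the format without seconds. The helper name `parse_mixed` and the choice of tz = "UTC" are assumptions made for this example.

```r
# Hypothetical helper: try "with seconds" first, then "without seconds"
parse_mixed <- function(x, tz = "UTC") {
  out <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = tz)
  missing <- is.na(out)
  # Retry only the elements the first format could not parse
  out[missing] <- as.POSIXct(x[missing], format = "%Y-%m-%d %H:%M", tz = tz)
  out
}

p <- c("2012-08-14 9:00", "2012-08-14 7:00:00")
pct <- parse_mixed(p)
print(pct)
```

Giving the formats explicitly avoids depending on which formats as.POSIXlt tries by default. Elements that match neither format stay NA.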
The lubridate package may help.
Here is an example, perhaps not the most elegant one:
p <- c("2012-08-14 9:00", "2012-08-14 7:00:00")
library(lubridate)
NewDate <- c()
for (i in seq_along(p)) {
  # The time part is everything after the space, e.g. "9:00" or "7:00:00"
  time_part <- unlist(strsplit(p[i], " "))[2]
  # Two colon-separated fields means H:M, three means H:M:S
  n_fields <- length(unlist(strsplit(time_part, ":")))
  if (n_fields == 2) {NewDate <- c(NewDate, as.character(ymd_hm(p[i])))}
  if (n_fields == 3) {NewDate <- c(NewDate, as.character(ymd_hms(p[i])))}
}
NewDate

Resources