R/zoo: duplicate index entries in ‘order.by’ are not unique - r

I have an Excel file containing 3 columns of data against a time column at one-hour intervals. I tried to convert the data into a zoo object, but every time I try, I get a warning that says "In zoo(y, order.by = index(x), ...) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique".
> datos_meterologicos <- read_excel(datos, sheet = "Precip")
> idx <- as.Date(datos_meterologicos$Fecha)
> date.matrix <- as.data.frame(datos_meterologicos[,-1])
> date.xts <- as.xts(date.matrix,order.by=idx)
> date.zoo <- as.zoo(date.xts)
Warning message:
In zoo(y, order.by = index(x), ...) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
I looked up solutions to other questions with the same problem, so I tried the following code:
datos_meterologicos$Fecha <- read.zoo(datos_meterologicos, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M", tz="UTC")
But I get the same warning.
The data is right here https://docs.google.com/spreadsheets/d/1oV2uk5LIL9aFy3Eepw0WkIWucI3_GgkV/edit?usp=sharing&ouid=115562552506837112131&rtpof=true&sd=true

You are converting your datetime values into dates with as.Date, which drops the time of day. You then have 24 values sharing each day's date instead of one value per day and hour, so the index entries are no longer unique. Using as.POSIXct will preserve your times.
idx <- as.POSIXct(datos_meterologicos$Fecha)
# rest of your code...
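To see why as.Date triggers the warning, here is a minimal base-R sketch. The stamps vector is made up for illustration and is not the data from the spreadsheet.

```r
# Hypothetical hourly timestamps spanning two days (not the asker's data)
stamps <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 3600 * 0:47

# as.Date drops the time of day, so 48 timestamps collapse onto 2 dates
idx_date <- as.Date(stamps)
n_unique_dates <- length(unique(idx_date))

# as.POSIXct keeps the hours, so every index entry stays unique
idx_time <- as.POSIXct(stamps)
n_duplicated_times <- sum(duplicated(idx_time))

n_unique_dates      # 2
n_duplicated_times  # 0
```

An index built from idx_date has 46 repeated entries, which is exactly what zoo warns about.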

Related

Error in converting character to time variable in r with Lubridate packages

Thanks for your help.
One of the variables in my dataset looks like this:
> df$TM
> [1] "000054" "000020" "000056" "000051" "000025" "000116" "000219" "000207" "000233" "000206" "000142" "000126" "000237" "000215" "000236" "000246" "000219"
[18] "000227" "000803" "000920"...
Each string encodes hours, minutes and seconds.
When I apply the hms function from lubridate as follows
> df$TM <- hms(df$TM)
I get this warning: "In .parse_hms(..., order = "HMS", quiet = quiet) :
Some strings failed to parse, or all strings are NAs"
After that, all the values in the column change to NA.
I also tried
> df$TM <- as.POSIXct(df$TM, format = "%H:%M:%S")
and
> df$TM <- chronicle(times = df$TM)
and
> df$TM <- strptime(df$TM, format = "%H:%M:%S")
but these three attempts give the same result.
(Actually all the data changed to NA, so the warning is effectively an error for me.)
I really appreciate your help.
You can make use of this answer to insert a colon after every second character. After that you can convert the resulting character string to a date-time (with day, month and year) or leave it as is.
For completeness, the solution for your problem then is
as.POSIXct(sub(":+$", "", gsub('(.{2})', '\\1:', df$TM)), format = "%H:%M:%S")
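To make the individual steps explicit, here is a minimal sketch on a few of the values from the question. The tz = "UTC" argument is my addition so the result does not depend on the local timezone. Since the strings carry no date, as.POSIXct fills in the current day, so only the time part is formatted at the end.

```r
tm <- c("000054", "000020", "000803", "000920")

# Step 1: append ":" after every two characters -> "00:00:54:"
with_colons <- gsub("(.{2})", "\\1:", tm)

# Step 2: drop the trailing ":" -> "00:00:54"
clean <- sub(":+$", "", with_colons)

# Step 3: parse as a time; the date part defaults to the current day
parsed <- as.POSIXct(clean, format = "%H:%M:%S", tz = "UTC")
format(parsed, "%H:%M:%S")
```

With the separators in place, lubridate's hms(clean) should parse as well, since the original failure came from the missing colons.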

How to get chron to convert a vector of Excel serial dates

I am trying to convert an element of a matrix from what were Excel serial dates to a vector of Date objects before using plot().
I can create a vector and I get the expected result:
library(chron)
# set date origin as defined in Excel
options(chron.origin = c(month=1, day=1, year=1900))
test_dates <- c(40917:40920)
test_dates
## [1] 40917 40918 40919 40920
chron(test_dates, out.format = "m/d/y")
## [1] 01/11/12 01/12/12 01/13/12 01/14/12
But when I try to use this on my actual vector, it does not work
# first 10 values in vector
pivot_pred$date
## [1] 40917 40918 40919 40920 40921 40922 40923 40924 40925 40926...
chron(pivot_pred$date, out.format = "m/d/y")
## Error in chron(dates. = floor(dts), times. = tms, format = format, out.format = out.format, :
misspecified chron format(s) length
I'm sure this is simple but I have tried many variations and none worked. Any suggestions on what I'm doing wrong?
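One thing worth checking, as a sketch only: what class pivot_pred$date really has, since a factor or character column prints much like numbers do. I have not confirmed that this is what triggers the chron error. The serials vector below is a hypothetical stand-in for that column. As an alternative to chron, base R's as.Date with origin "1899-12-30" is the usual mapping for serials from the Windows Excel 1900 date system; note that it gives dates two days earlier than the chron origin set above.

```r
# Hypothetical stand-in for pivot_pred$date, stored as a factor
serials <- factor(c("40917", "40918", "40919", "40920"))

class(serials)  # inspect the real column the same way

# Convert via character first; as.numeric() on a factor returns level codes
serial_num <- as.numeric(as.character(serials))

# Windows Excel (1900 date system) serials map to dates with this origin
dates <- as.Date(serial_num, origin = "1899-12-30")
format(dates, "%m/%d/%y")
```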

Error in if ((location <= 1) | (location >= length(x)) - R - Eventstudies

I am trying my best at a simple event study in R, with some data retrieved from the Wharton Research Data Service (WRDS). I am not completely new to R, but I would describe my expertise level as intermediate. So, here is the problem. I am using the eventstudies package and one of the steps is converting the physical dates to event time frame dates with the phys2eventtime(..) function. This function takes multiple arguments:
z: time series data for which the event frame is to be generated, in the form of an xts object.
events: a data frame with two columns, unit and when. unit holds the name of the column whose response is to be measured on the event date, while when holds the event date.
width: the number of days on each side of the event date. For a given width, if there is any NA in the event window then the last observation is carried forward.
The authors of the package have provided an example for the xts object (StockPriceReturns) and for Events (SplitDates). This looks like the following:
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
unit when
5 BHEL 2011-10-03
6 Bharti.Airtel 2009-07-24
8 Cipla 2004-05-11
9 Coal.India 2010-02-16
10 Dr.Reddy 2001-10-10
11 HDFC.Bank 2011-07-14
> head(StockPriceReturns)
Mahindra.&.Mahindra
2000-04-03 -8.3381609
2000-04-04 0.5923550
2000-04-05 6.8097616
2000-04-06 -0.9448889
2000-04-07 7.6843828
2000-04-10 4.1220462
2000-04-11 -1.9078480
2000-04-12 -8.3286900
2000-04-13 -3.8876847
2000-04-17 -8.2886060
So I have constructed my data in the same way, an xts object (DS_xts) and a data.frame (cDS) with the columns "unit" and "when". This is how it looks:
> head(DS_xts)
61241
2011-01-03 0.024247
2011-01-04 0.039307
2011-01-05 0.010589
2011-01-06 -0.022172
2011-01-07 0.018057
2011-01-10 0.041488
> head(cDS)
unit when
1 11754 2012-01-05
2 10104 2012-01-24
3 61241 2012-01-31
4 13928 2012-02-07
5 14656 2012-02-08
6 60097 2012-02-14
These are similar in my opinion, but how it looks does not tell the whole story. I am quite certain that my problem is in how I have constructed these two objects. Below is my R code:
#install.packages("eventstudies")
library("eventstudies")
DS = read.csv("ReturnData.csv")
cDS = read.csv("EventData.csv")
#Calculate Abnormal Returns
DS$AR = DS$RET - DS$VWRETD
#Clean up and let only necessary columns remain
DS = DS[, c("PERMNO", "DATE", "AR")]
cDS = cDS[, c("PERMNO", "DATE")]
#Generate correct date format according to R's as.Date
for (i in 1:nrow(DS)) {
  DS$DATE[i] = format(as.Date(toString(DS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
for (i in 1:nrow(cDS)) {
  cDS$DATE[i] = format(as.Date(toString(cDS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
#Rename cDS columns according to phys2eventtime format
colnames(cDS)[1] = "unit"
colnames(cDS)[2] = "when"
#Create list of unique PERMNO's
PERMNO <- unique(DS$PERMNO)
for (i in 1:length(PERMNO)) {
  #Subset based on PERMNO
  DStmp <- DS[DS$PERMNO == PERMNO[i], ]
  #Remove PERMNO column and rename AR to PERMNO
  DStmp <- DStmp[, c("DATE", "AR")]
  colnames(DStmp)[2] = as.character(PERMNO[i])
  dates <- as.Date(DStmp$DATE)
  DStmp <- DStmp[, -c(1)]
  #Create a temporary XTS object
  DStmp_xts <- xts(DStmp, order.by = dates)
  #If first iteration, just create new variable, otherwise merge
  if (i == 1) {
    DS_xts <- DStmp_xts
  } else {
    DS_xts <- merge(DS_xts, DStmp_xts, all = TRUE)
  }
}
#Renaming columns for matching
colnames(DS_xts) <- c(PERMNO)
#Making sure classes are the same
cDS$unit <- as.character(cDS$unit)
eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
So, if I run phys2eventtime(..) it returns:
> eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
Error in if ((location <= 1) | (location >= length(x))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In findInterval(when, index(x)) : NAs introduced by coercion
I have looked at the original function (it is available at their GitHub, can't use more than two links yet) to figure out this error, but I ran out of ideas how to debug it. I hope someone can help me sort it out. As a final note, I have also looked at another (magnificent) answer related to this R package (question: "format a zoo object with “dimnames”=List of 2"), but it wasn't enough to help me solve it (or I couldn't yet comprehend it).
Here is the link for the two CSV files if you would like to reproduce my error (or solve it!).
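One thing that may be worth checking, based on the findInterval warning: in the code above, cDS$DATE is overwritten with the output of format(), which is character, and it is never converted back to Date before the column is renamed to when. The base-R sketch below uses made-up dates (I have not run the WRDS files) to show that findInterval returns NA, with the same coercion warning, for a character date, and an integer position once the value is a Date. If that is the cause, cDS$when <- as.Date(cDS$when) before calling phys2eventtime might be enough; treat this as an assumption to verify.

```r
# Made-up index and event date for illustration only
idx <- as.Date(c("2012-01-27", "2012-01-30", "2012-01-31", "2012-02-01"))

when_chr  <- "2012-01-31"        # what format() returns: character
when_date <- as.Date(when_chr)   # converted back to Date

# Character input is coerced to numeric and becomes NA (with a warning)
loc_chr <- suppressWarnings(findInterval(when_chr, idx))

# Date input is matched against the index as expected
loc_date <- findInterval(when_date, idx)

loc_chr   # NA
loc_date  # 3
```

An NA location would then make the condition in if ((location <= 1) | ...) evaluate to NA, which matches the "missing value where TRUE/FALSE needed" error.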

invalid 'tz' value, problems with time zone

I'm working with minute data of NASDAQ, it has the index "2015-07-13 12:05:00 EST". I adjusted the system time with Sys.setenv(TZ = 'EST').
I want to program a simple buy/hold/sell strategy, therefore I create a vector of flat positions as a foundation.
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
Then I want to apply a constraint, that in a certain time window, positions are bound to be flat, which in my case means equal to 1.
pos_flat["T13:41/T14:00"] <- 1
And this returns the error:
"Error in as.POSIXlt.POSIXct(.POSIXct(.index(x)), tz = indexTZ(x)) :invalid 'tz' value".
I also get this error doing other calculations, I just used this example because it is easy and shows the problem.
As extra information:
> Sys.timezone
function (location = TRUE)
{
tz <- Sys.getenv("TZ", names = FALSE)
if (nzchar(tz))
return(tz)
if (location)
return(.Internal(tzone_name()))
z <- as.POSIXlt(Sys.time())
zz <- attr(z, "tzone")
if (length(zz) == 3L)
zz[2L + z$isdst]
else zz[1L]
}
<bytecode: 0x03648ff4>
<environment: namespace:base>
I don't understand the problem with the tz value... Any ideas?
The source of your "invalid 'tz' value" error is that tz must be a single character string, so R doesn't accept a vector such as tz = df$var. If you set tz = 'America/New_York' or some other single character value, then it will work.
Here is a better answer (instead of using force_tz as below) for converting UTC times to various timezones based on location. It is also simpler than looping through the rows or using a nested ifelse. I subset and change tz based on a timezone column (which my data already has; if yours does not, you can create it). Just make sure you account for all the timezones in your data (check with unique(df$timezone)):
df$datetime2[df$timezone == 'America/New_York'] <- format(df$datetime, tz="America/New_York")[df$timezone == 'America/New_York']
df$datetime2[df$timezone == 'America/Chicago'] <- format(df$datetime, tz="America/Chicago")[df$timezone == 'America/Chicago']
df$datetime2[df$timezone == 'America/Denver'] <- format(df$datetime, tz="America/Denver")[df$timezone == 'America/Denver']
df$datetime2[df$timezone == 'America/Los_Angeles'] <- format(df$datetime, tz="America/Los_Angeles")[df$timezone == 'America/Los_Angeles']
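The four near-identical lines can also be written as a short loop over the timezones actually present in the data, which avoids forgetting one. This is a sketch on a made-up two-row data frame; the column names datetime, timezone and datetime2 follow the code above, and the sample values are assumptions.

```r
# Made-up sample data: UTC timestamps plus a per-row target timezone
df <- data.frame(
  datetime = as.POSIXct(c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), tz = "UTC"),
  timezone = c("America/Los_Angeles", "America/New_York"),
  stringsAsFactors = FALSE
)

# Result column is character, because one POSIXct vector has only one tz
df$datetime2 <- NA_character_

# format() takes a single tz string, so handle one timezone at a time
for (tz in unique(df$timezone)) {
  rows <- df$timezone == tz
  df$datetime2[rows] <- format(df$datetime[rows], tz = tz)
}

df
```

The loop runs once per distinct timezone rather than once per row, so it stays vectorised within each group.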
Previous solution: Converting to Local Time in R - Vector of Timezones
require(lubridate)
require(dplyr)
df = data.frame(timestring = c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), localzone = c("America/Los_Angeles", "America/New_York"), stringsAsFactors = F)
df$moment = as.POSIXct(df$timestring, format="%Y-%m-%d %H:%M:%S", tz="UTC")
df = df %>% rowwise() %>% mutate(localtime = force_tz(moment, localzone))
df
You are getting errors because "EST" is not a valid timezone specification. It's an abbreviation that's often used when printing and displaying timezones.
The index is printed as "2015-07-13 12:05:00 EST" because "EST" probably represents Eastern Standard Time in the United States. If you want to set the TZ environment variable to that timezone, you should use Sys.setenv() with Country/City notation:
Sys.setenv(TZ = "America/New_York")
You can also set the timezone in the xts constructor:
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ), tzone = "America/New_York")
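To check whether a name is a recognised timezone on your system, base R provides OlsonNames(). A small sketch follows, using the timestamp quoted in the question. One caveat: a July timestamp in America/New_York is on daylight time, so it prints as EDT rather than EST, which is worth keeping in mind if the data really is recorded in fixed EST.

```r
tz_name <- "America/New_York"

# TRUE if the name is in the system's timezone database
is_known <- tz_name %in% OlsonNames()

# Build a timestamp in that timezone and show the abbreviation R prints
x <- as.POSIXct("2015-07-13 12:05:00", tz = tz_name)
abbrev <- format(x, "%Z")

is_known
abbrev
```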
Your error occurs because of a misinterpretation of the time object. You need to have UNIX timestamps in order to use something like
pos_flat["T13:41/T14:00"] <- 1
Try a conversion of your indices by doing something like this:
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
As you want to use EST, you have to change your environment variables (unless your system is already set to the EST timezone). So all in all, this should work:
Sys.setenv(TZ = 'EST')
#load stuff
#...
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
pos_flat["T13:41/T14:00"] <- 1
For further information, have a look at the POSIXct and POSIXlt structures in R.
Best regards

R: formatting the digits in xtable

I have the data:
transaction <- c(1,2,3);
date <- c("2010-01-31","2010-02-28","2010-03-31");
type <- c("debit", "debit", "credit");
amount <- c(-500, -1000.97, 12500.81);
oldbalance <- c(5000, 4500, 17000.81)
evolution <- data.frame(transaction, date, type, amount, oldbalance, row.names=transaction, stringsAsFactors=FALSE);
evolution$date <- as.Date(evolution$date, "%Y-%m-%d");
evolution <- transform(evolution, newbalance = oldbalance + amount);
evolution
If I want to create a table with the values in amount shown to 1 decimal place, would a command like this work?
> tab.for <- formatC(evolution$amount,digits=1,format="f")
> tab.lat <- xtable(tab.for)
Error in UseMethod("xtable") :
no applicable method for 'xtable' applied to an object of class "character"
Thanks.
If you just want to reformat the amount column and present it in an xtable, then you need to supply xtable with a data frame. formatC returns a character vector, and xtable has no method for that class, so passing the single formatted column on its own fails.
evolution$tab.for <- formatC(evolution$amount,digits=1,format="f")
evolutionxtable<-xtable(subset(evolution, select=c(date, type, tab.for)))
print(evolutionxtable)
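The reason for the original error can be seen without xtable at all: formatC returns a character vector, and xtable dispatches on class. Here is a sketch using the amounts from the question. The data frame name tab_df is hypothetical, and the xtable call is guarded with requireNamespace so the snippet also runs where the package is not installed.

```r
amount <- c(-500, -1000.97, 12500.81)

# formatC() returns character, which xtable() has no method for
amount_fmt <- formatC(amount, digits = 1, format = "f")
class(amount_fmt)

# Wrapping the formatted column in a data frame gives xtable an accepted class
tab_df <- data.frame(type = c("debit", "debit", "credit"),
                     amount = amount_fmt,
                     stringsAsFactors = FALSE)

# Only build the LaTeX table if the xtable package is available
if (requireNamespace("xtable", quietly = TRUE)) {
  print(xtable::xtable(tab_df))
}
```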
