R Error: index is not in increasing order

NOTE: PROBLEM RESOLVED IN THE COMMENTS BELOW
I'm getting the following error when trying to turn a data.frame into an xts object, following the answer found here:
Error in .xts(DA[, 3:6], index = as.POSIXct(DAINDEX, format = "%m/%d/%Y %H:%M:%S", :
index is not in increasing order
I've not been able to find much on this error or how to resolve it, so any help towards that would be greatly appreciated.
The data is daily S&P 500 data in comma-delimited format with the following columns: "Date" "Time" "Open" "High" "Low" "Close".
Below is the code:
library(xts)
DA <- read.csv("SNP.csv", header = TRUE, stringsAsFactors = FALSE)
DAINDEX <- paste(DA$Date, DA$Time, sep = " ")
# as.POSIXct's time-zone argument is tz (an argument named tzone would be silently ignored)
Data.hist <- .xts(DA[, 3:6], index = as.POSIXct(DAINDEX, format = "%m/%d/%Y %H:%M:%S", tz = "GMT"))
As requested, here are some lines of the data:
structure(list(Date = c("5/20/2016", "5/19/2016", "5/18/2016",
"5/17/2016", "5/16/2016", "5/13/2016"), Time = c("0:00:00", "0:00:00",
"0:00:00", "0:00:00", "0:00:00", "0:00:00"), Open = c(2041.880005,
2044.209961, 2044.380005, 2065.040039, 2046.530029, 2062.5),
High = c(2058.350098, 2044.209961, 2060.610107, 2065.689941,
2071.879883, 2066.790039), Low = c(2041.880005, 2025.910034,
2034.48999, 2040.819946, 2046.530029, 2043.130005), Close = c(2052.320068,
2040.040039, 2047.630005, 2047.209961, 2066.659912, 2046.609985
)), .Names = c("Date", "Time", "Open", "High", "Low", "Close"
), row.names = c(NA, 6L), class = "data.frame")
The above is the output of dput(head(DA)).

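Your rows are in reverse chronological order (5/20/2016, then 5/19/2016, and so on), and .xts() only checks the index; it does not reorder it, so it stops with this error. The easiest fix is to use the regular xts() constructor instead of .xts(). It checks whether the index is sorted and, if necessary, sorts both the index and the data.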
Data.hist <- xts(DA[, 3:6], order.by = as.POSIXct(DAINDEX, format = "%m/%d/%Y %H:%M:%S", tz = "GMT"))
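If you'd rather keep .xts(), a minimal base-R sketch of the same idea is to sort the rows yourself before building the object. It uses three rows from the dput() output above; the names idx, ord, DA_sorted and idx_sorted are illustrative, and the final .xts() call is left as a comment so the sketch runs without xts installed.

```r
# Three rows from dput(head(DA)) above, newest first as in the CSV
DA <- data.frame(
  Date  = c("5/20/2016", "5/19/2016", "5/18/2016"),
  Time  = c("0:00:00", "0:00:00", "0:00:00"),
  Open  = c(2041.880005, 2044.209961, 2044.380005),
  High  = c(2058.350098, 2044.209961, 2060.610107),
  Low   = c(2041.880005, 2025.910034, 2034.48999),
  Close = c(2052.320068, 2040.040039, 2047.630005),
  stringsAsFactors = FALSE
)

# Parse "Date Time" into POSIXct (note: the time-zone argument is tz)
idx <- as.POSIXct(paste(DA$Date, DA$Time), format = "%m/%d/%Y %H:%M:%S", tz = "GMT")
is.unsorted(idx)  # TRUE: a decreasing index is what .xts() rejects

# Reorder the rows and the index together so that time increases
ord        <- order(idx)
DA_sorted  <- DA[ord, ]
idx_sorted <- idx[ord]

# With xts loaded, the sorted pieces can then be passed on, e.g.
# Data.hist <- .xts(DA_sorted[, 3:6], index = idx_sorted)
```

Sorting once up front is all xts() does for you internally, so either route ends with the same increasing index.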

Related

Creating a function to change a variable type to time

I'm playing around with functions in R and want to create a function that takes a character variable and converts it to a POSIXct.
The time variable currently looks like this:
"2020-01-01T05:00:00.283236Z"
I've successfully converted the time variable in my janviews dataset with the following code:
janviews$time <- gsub('T',' ',janviews$time)
janviews$time <- as.POSIXct(janviews$time, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
Since I have to perform this on multiple datasets, I want to create a function that will perform this. I created the following function but it doesn't seem to be working and I'm not sure why:
set.time <- function(dat, variable.name){
dat$variable.name <- gsub('T', ' ', dat$variable.name)
dat$variable.name <- as.POSIXct(dat$variable.name, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
}
Here's the first four rows of the janviews dataset:
structure(list(customer_id = c("S4PpjV8AgTBx", "p5bpA9itlILN",
"nujcp24ULuxD", "cFV46KwexXoE"), product_id = c("kq4dNGB9NzwbwmiE",
"FQjLaJ4B76h0l1dM", "pCl1B4XF0iRBUuGt", "e5DN2VOdpiH1Cqg3"),
time = c("2020-01-01T05:00:00.283236Z", "2020-01-01T05:00:00.895876Z",
"2020-01-01T05:00:01.362329Z", "2020-01-01T05:00:01.873054Z"
)), row.names = c(NA, -4L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x1488180e0>)
Also, if there is a better way to convert my time variable, I am open to changing my method!
I would use the lubridate package and the as_datetime() function.
lubridate::as_datetime("2020-01-01T05:00:00.283236Z")
This returns:
"2020-01-01 05:00:00 UTC"
Lubridate Info
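As for why the function in the question doesn't work: inside it, dat$variable.name looks for a column literally named variable.name (the $ operator does not evaluate what follows it), and the function never returns dat. Since R functions work on a copy, janviews is left unchanged. A minimal base-R sketch that fixes both is below. It assumes the times are UTC, as the trailing Z indicates (the question's code used Sys.timezone(), which reads the clock time as local time); the tz argument is an illustrative addition, not part of the original function.

```r
# Convert an ISO 8601 column such as "2020-01-01T05:00:00.283236Z" to POSIXct.
# variable.name is a string, so the column is accessed with [[ ]] rather than $.
set.time <- function(dat, variable.name, tz = "UTC") {
  # "%OS" reads seconds including the fractional part; "T" and "Z" match literally
  dat[[variable.name]] <- as.POSIXct(dat[[variable.name]],
                                     format = "%Y-%m-%dT%H:%M:%OSZ", tz = tz)
  dat  # return the modified copy; the caller must assign it back
}

# Usage (janviews as in the question):
# janviews <- set.time(janviews, "time")
```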

R plotting annual data and "January" repeated at end of graph

I'm fairly new to R and am trying to plot some expenditure data. I read the data in from a CSV file exported from Excel and then do some manipulation on the dates:
data <- read.csv("Spending2019.csv", header = TRUE)
# convert the DATE strings to date-times so R can use them
strdate <- strptime(data$DATE, "%m/%d/%Y")
newdate <- cbind(data, strdate)
# sort the rows by date
finaldata <- newdate[order(strdate), ]
This probably isn't the most efficient, but it gets me there :)
Here are the relevant columns of the first four rows of my finaldata data frame:
dput(droplevels(finaldata[1:4,c(5,7)]))
structure(list(AMOUNT = c(25.13, 14.96, 43.22, 18.43), strdate = structure(c(1546578000,
1546750800, 1547010000, 1547010000), class = c("POSIXct", "POSIXt"
), tzone = "")), row.names = c(NA, 4L), class = "data.frame")
The full data set has 146 rows, and the dates range from 1/4/2019 to 12/30/2019.
I then plot the data:
plot(finaldata$strdate,finaldata$AMOUNT, xlab = "Month", ylab = "Amount Spent")
and I get this plot
This is fine for getting started, EXCEPT: why is JAN repeated at the far right end? I have tried various forms of xlim and can't seem to get it to go away.
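A likely cause is the default date axis rather than the data. plot() pads the x range by about 4% on each side (the default xaxs = "r"), so with dates running to 12/30/2019 the axis extends into January 2020; for a span of roughly a year, the default POSIXct axis puts a tick on the first of each month labelled with the month abbreviation alone, so 1 January 2020 shows up as a second "Jan". The same padding applies to a manual xlim, which would explain why changing xlim doesn't help. A sketch of one workaround, under those assumptions: suppress the default axis with xaxt = "n" and draw ticks only for the months of 2019 with axis.POSIXct(). It rebuilds the four rows from the dput() output above, with xlim standing in for the full 1/4/2019 to 12/30/2019 range; full_range and ticks are illustrative names.

```r
# First four rows of finaldata, rebuilt from the dput() output above
finaldata <- data.frame(
  AMOUNT  = c(25.13, 14.96, 43.22, 18.43),
  strdate = as.POSIXct(c(1546578000, 1546750800, 1547010000, 1547010000),
                       origin = "1970-01-01")
)

# Stand-in for the full data's date range (1/4/2019 to 12/30/2019)
full_range <- as.POSIXct(c("2019-01-04", "2019-12-30"))

# One tick on the first day of each month of 2019
ticks <- seq(as.POSIXct("2019-01-01"), as.POSIXct("2019-12-01"), by = "month")

# Draw the points without the default x axis, then add only the 2019 ticks
plot(finaldata$strdate, finaldata$AMOUNT, xlim = full_range, xaxt = "n",
     xlab = "Month", ylab = "Amount Spent")
axis.POSIXct(1, at = ticks, format = "%b")
```

Alternatively, keeping the default ticks but passing format = "%b %Y" to axis.POSIXct() would make the two Januaries distinguishable instead of hiding one.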

mutate_impl error in dplyr/lubridate add date time

Using the lubridate package, I want to add seconds (for the purpose of this example) to a "POSIXct", "POSIXt" field in a tibble.
b <- structure(list(`"a"` = c("a", "a", "a", "a", "a"), Date_time = structure(c(1506694322,
1506694270, 1506693970, 1506693897, 1506693849), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("\"a\"", "Date_time"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
library(dplyr)
library(lubridate)
b %>%
mutate(tol_lower = Date_time - second(2),
tol_upper = Date_time + second(30))
I get the error:
Error in mutate_impl(.data, dots) : 'origin' must be supplied
Why is this? I appreciate I can calculate hours, but I'd like to know what I'm doing wrong.
Additional points:
- I've tried as.Date, which gives the same error.
- I can add seconds directly without issue: tol_lower = Date_time - 2
Why not use this?
b %>% mutate(tol_lower = Date_time - 2,
tol_upper = Date_time + 30)
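If you want to add hours to a given date-time, simply use Date_time + 2*60*60 (i.e., 2 hours added to Date_time).
Also, ?second says that x in second(x) is a "date-time object", but you are passing a number. second() extracts the seconds component of a date-time, so second(2) first tries to convert 2 into a date-time, which is where 'origin' must be supplied comes from (older R versions require an origin to convert a number to a date-time). To create a period of seconds, use the plural seconds() instead, e.g. Date_time - seconds(2).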
Hope it helps!
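The same arithmetic works without dplyr, since adding a number to a POSIXct value shifts it by that many seconds. A minimal base-R sketch using the Date_time values from the question's dput(); the column a and the plus_2h column are illustrative.

```r
# Date_time values from the question's dput(); tz "" means local time
b <- data.frame(
  a = "a",
  Date_time = as.POSIXct(c(1506694322, 1506694270, 1506693970, 1506693897, 1506693849),
                         origin = "1970-01-01")
)

# POSIXct arithmetic is in seconds, so plain numbers shift the times
b$tol_lower <- b$Date_time - 2            # 2 seconds earlier
b$tol_upper <- b$Date_time + 30           # 30 seconds later
b$plus_2h   <- b$Date_time + 2 * 60 * 60  # 2 hours later
```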

R time formatting with dirty data

I'm using R to generate a CZML file from a database.
The database has dirty data.
I need a way to make sure times are in the format "%H:%M:%S".
The data may already be in the correct %H:%M:%S format, or it may be missing the leading zero on the hour, e.g. 8:30:00, which is not valid ISO 8601 and throws the CZML parsing off entirely.
It always needs to look like 08:30:00 or 07:09:00, in 24-hour format.
I get errors because some values look like 8:30:00 or 7:09:00 (still 24-hour format). I haven't checked whether the minutes or seconds are malformed too, but for the moment I assume they are correct and the only problem is the hours.
For example, I have a CSV file like this:
"Date","Time","TZ","Jul.Time","BirdID","Species","Sex","Age","SiteID","Latitude","Longitude"
"4-Mar-13","08:30:00","America/Costa_Rica",2456356.187500,"test2","GREH","M","AHY","56scr25",8.71191178,-82.96866316
"4-Mar-13","8:30:00","America/Costa_Rica",2456356.187500,"test2","GREH","M","AHY","56scr25",8.71191178,-82.96866316
I need to generate CZML like this:
"point": {
"color": {
"rgba": [
"2013-03-04T08:30:00Z",225,50,50,196,"2013-03-04T08:30:01Z",50,50,225,196,"2013-03-04T13:30:00Z",225,50,50,196,"2013-03-04T13:30:01Z",50,50,225,196,"2013-03-04T16:00:00Z",225,50,50,196,"2013-03-04T16:00:01Z",50,50,225,196
]
},
"pixelSize": { "number": 10 }
}
My code is:
j=1
numVisits=nrow(visitedTimes)
while(j<=numVisits){
date=as.Date(visitedTimes$Date[j], format="%d-%b-%y")
time=format(visitedTimes$Time[j], format="%H:%M:%S")
timeOfPassage=paste0(date,"T",time,"Z")
timeAfter=as.POSIXlt(timeOfPassage, format="%Y-%m-%dT%H:%M:%SZ")
timeAfter$sec=timeAfter$sec+1
timeAfter=format(timeAfter, format="%Y-%m-%dT%H:%M:%SZ")
cat(paste0("\"",timeOfPassage,"\","))
cat("225,50,50,196,")
cat(paste0("\"",timeAfter,"\","))
cat("50,50,225,196")
if(j<numVisits){
cat(",")
}
j=j+1
}
But it doesn't produce the desired output because of the dirty data.
Any ideas?
We can use times() from the chron package:
library(chron)
times(v1)
#[1] 08:30:00 08:30:00 07:09:00 07:09:00
Or, using base R:
format(strptime(v2, '%H:%M:%S'), '%H:%M:%S')
#[1] "08:30:00" "08:30:00" "07:09:00" "07:09:00" "07:09:05" "11:10:00"
Using the OP's updated dataset:
df1$Time <- times(df1$Time)
df1$Time
#[1] 08:30:00 08:30:00
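Or, using a regex:
# pad a one-digit hour only
sub('^(.:)', '0\\1', df1$Time)
#[1] "08:30:00" "08:30:00"
# pad every one-digit field (hours, minutes and seconds)
gsub('[^:]{2}(*SKIP)(*F)|(\\d)', '0\\1', v2, perl=TRUE)
#[1] "08:30:00" "08:30:00" "07:09:00" "07:09:00" "07:09:05" "11:10:00"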
data
v1 <- c('8:30:00', '08:30:00', '7:09:00', '7:9:00')
v2 <- c(v1, '7:9:5', '11:10:0')
df1 <- structure(list(Date = c("4-Mar-13", "4-Mar-13"), Time = c("08:30:00",
"8:30:00"), TZ = c("America/Costa_Rica", "America/Costa_Rica"
), Jul.Time = c(2456356.1875, 2456356.1875), BirdID = c("test2",
"test2"), Species = c("GREH", "GREH"), Sex = c("M", "M"), Age = c("AHY",
"AHY"), SiteID = c("56scr25", "56scr25"), Latitude = c(8.71191178,
8.71191178), Longitude = c(-82.96866316, -82.96866316)), .Names = c("Date",
"Time", "TZ", "Jul.Time", "BirdID", "Species", "Sex", "Age",
"SiteID", "Latitude", "Longitude"), class = "data.frame", row.names = c(NA,
-2L))
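In the question's loop, format(visitedTimes$Time[j], format = "%H:%M:%S") does not reformat anything, because the time is still a character string and format() only applies a date-time format to date-time objects; that is why 8:30:00 passes through as-is. A vectorised base-R sketch that parses date and time together (strptime() accepts a one-digit hour on input) and writes zero-padded ISO 8601 strings is below. It assumes, as the original code does, that the clock times are written out unchanged with a Z suffix, so it parses them as UTC rather than using the TZ column; %b matches "Mar" only in an English (or C) locale. The names passage and rgba are illustrative.

```r
# Two rows like the question's CSV: one clean time, one missing the leading zero
visitedTimes <- data.frame(
  Date = c("4-Mar-13", "4-Mar-13"),
  Time = c("08:30:00", "8:30:00"),
  stringsAsFactors = FALSE
)

# Parse date and time together; %H accepts "8" as well as "08" on input
passage <- as.POSIXct(paste(visitedTimes$Date, visitedTimes$Time),
                      format = "%d-%b-%y %H:%M:%S", tz = "UTC")

# Format back out with zero padding, plus the one-second-later timestamp
timeOfPassage <- format(passage, "%Y-%m-%dT%H:%M:%SZ")
timeAfter     <- format(passage + 1, "%Y-%m-%dT%H:%M:%SZ")

# Build the comma-separated "rgba" entries in one go instead of a while loop
rgba <- paste0('"', timeOfPassage, '",225,50,50,196,"',
               timeAfter, '",50,50,225,196', collapse = ",")
cat(rgba, "\n")
```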

R convert YYMMDD to date

I have data in YYMMDDHH format and am trying to get the weekday, so I need to convert it to a date format, but I can't figure out how.
Here's a dput of the relevant data:
structure(list(id = c(7927751403363142656, 18236986451472797696,
5654946373641778176, 14195690822403907584, 1693303484298446848,
1.1362181921561e+19, 11694645532962195456, 1221431312630614784,
1987127670789791488, 379819848497418688), hour = c(14102118L,
14102217L, 14102812L, 14102912L, 14102820L, 14102401L, 14102117L,
14102312L, 14102301L, 14102414L)), .Names = c("id", "hour"), row.names = c(3620479L,
8510796L, 29632625L, 34450879L, 31874113L, 13420799L, 3332671L,
11543560L, 9602012L, 15574701L), class = "data.frame")
When I use:
dat2$dow <- as.Date(substr(as.character(dat2$hour), 1,6), format = '%Y%m%d')
I just get NAs. Any suggestions?
"%Y" is for 4-digit years; "%y" is for 2-digit years. And you don't need to use substr. as.Date will ignore anything after the end of the specified format.
dat2$dow <- as.Date(as.character(dat2$hour), format='%y%m%d')
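Since the goal is the weekday, one more step is needed after as.Date(). A short sketch using the first three hour values from the dput() above: weekdays() returns day names in the current locale, while as.POSIXlt()$wday gives a locale-independent number (0 = Sunday).

```r
# First three values of dat2$hour from the dput() above (YYMMDDHH)
hour <- c(14102118L, 14102217L, 14102812L)

# "%y" is the two-digit year; the trailing HH digits are ignored
d <- as.Date(as.character(hour), format = "%y%m%d")

weekdays(d)          # day names in the current locale
as.POSIXlt(d)$wday   # 0 = Sunday, ..., 6 = Saturday
```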
