Read in times from Excel in R - r

I have data saved in Excel that includes time data.
When I read it in with read.xlsx in R, "1899-12-30" is prepended to every value in the time column, presumably because R tries to read a date (which doesn't exist) in addition to the time.
library(xlsx)
times<-read.xlsx("times.xlsx", sheetName = "Sheet1")
times
Time
1 1899-12-30 20:13:24
2 1899-12-30 08:13:54
3 1899-12-30 08:14:24
4 1899-12-30 08:14:54
5 1899-12-30 08:15:24
I tried
times<-read.xlsx("times.xlsx", sheetName = "Sheet1", colClasses('POSIXct'))
and
times<-read.xlsx("times.xlsx", sheetName = "Sheet1", colClasses('POSIXct(format='%H:%M:%S')'))
but the first doesn't do anything and the second gives me an error.

Note that read.xlsx() recognizes TIME as %H:%M:%S and converts it into a POSIXct/POSIXt object with a dummy date, e.g. 1899-12-31 08:00:00 and 1899-12-31 20:00:00.
#use readxl
library(readxl)
df <- read_excel('test.xlsx')
Or use format():
library(dplyr) # provides %>% and mutate()
read.xlsx("myfile.xlsx") %>%
  mutate(
    TIME = format(TIME, "%I:%M %p")
  )
Or, after reading df, convert the column to a date-time using
as.POSIXct(df$Time, format="%H:%M:%S", tz="CET")
EDIT:
I don't have data to replicate your errors or problem that you are facing , so i have made one according to those date format
df = data.frame(Time = c("1899-12-30 20:13:24","1899-12-30 08:13:54","1899-12-30 08:14:24","1899-12-30 08:14:54","1899-12-30 08:15:24"))
df <- as.POSIXct(df$Time, format = "%Y-%m-%d %H:%M") #apply function to create a POSIXct object
#use the `strftime()` function to split the column and then the function times() to create a chronological object.
library(chron)
time <- times(strftime(df, format="%H:%M:%S"))
This method should def work, hope you got the idea there are many ways to achieve this
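As a minimal base-R sketch of the same idea, the dummy date can be dropped by formatting only the time of day. The sample values are copied from the question; the names x and time_only and the choice of tz = "UTC" are assumptions made for illustration.

```r
# Values as read.xlsx returned them in the question, including the dummy date
x <- as.POSIXct(c("1899-12-30 20:13:24", "1899-12-30 08:13:54"),
                format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Keep only the time of day; the result is a character vector
time_only <- format(x, "%H:%M:%S")
print(time_only)
```

Because the result is plain text, it sorts and compares as text. If arithmetic on the times is needed, a dedicated time class such as chron's times() is likely the better fit.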

Related

How to format properly date-time column in R using mutate?

I am trying to convert a string column to a date-time series.
The rows in the column look like this example: "2019-02-27T19:08:29+000"
(dateTime is the column, i.e. the variable)
mutate(df,dateTime=as.Date(dateTime, format = "%Y-%m-%dT%H:%M:%S+0000"))
But the result is:
2019-02-27
What about the hours, minutes and seconds?
I need them in order to filter by date-time.
Your code is almost correct. Only the extra 0 in the format string and the use of as.Date were wrong: as.Date keeps only the date, whereas as.POSIXct keeps the time as well.
library("dplyr")
df <- data.frame(dateTime = "2019-02-27T19:08:29+000",
                 stringsAsFactors = FALSE)
mutate(df, dateTime = as.POSIXct(dateTime, format = "%Y-%m-%dT%H:%M:%S+000"))
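Since the goal was to filter by date-time, here is a small base-R sketch of that step. The second row, the cutoff value, and the names cutoff and later are hypothetical. Treating the trailing "+000" as literal text and reading the values as UTC is an assumption.

```r
# First row is the example from the question; the second row is made up
df <- data.frame(dateTime = c("2019-02-27T19:08:29+000",
                              "2019-02-27T21:30:00+000"),
                 stringsAsFactors = FALSE)

# "+000" is matched as literal text; tz = "UTC" is an assumption
df$dateTime <- as.POSIXct(df$dateTime,
                          format = "%Y-%m-%dT%H:%M:%S+000", tz = "UTC")

# Hypothetical cutoff: keep rows strictly later than 20:00 on that day
cutoff <- as.POSIXct("2019-02-27 20:00:00", tz = "UTC")
later <- df[df$dateTime > cutoff, , drop = FALSE]
print(later)
```

The comparison works because both sides are POSIXct; comparing against a Date or a plain string would not preserve the time of day in the same way.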

Subsetting times out of time series in R

I downloaded stock market data from Yahoo (code below) - for context, at first I tried with getSymbols(^DJI) but I got error messages possibly related to Yahoo... different issue.
The point is that once downloaded, and imported into R, I massaged it into a format close enough to a time series to be able to run chartSeries(DJI):
require(RCurl)
require(foreign)
require(quantmod) # provides chartSeries()
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")
DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date # Assigning Date to row names
DJI$Date <- NULL # Removing the Date column
chartSeries(DJI, type="auto", theme=chartTheme('white'))
even if the dataset is not really a time series:
> is.ts(DJI)
[1] FALSE
The problem comes about when I try to find out the date of, for instance, the minimum closing value of the Dow. I can do something like
> DJI[DJI$Close == min(DJI$Close),]
Open High Low Close Adj.Close Volume
1985-05-01 1257.18 1262.81 1239.07 1242.05 1242.05 10050000
yielding the entire row, including the row name (1985-05-01), which is the only part I want. However, if I insist on just getting the actual date, I have to juggle a second dataset containing the dates in one of the columns:
require(RCurl)
require(foreign)
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")
DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date # Assigning Date to row names
DJI.raw <- DJI # Second dataset for future subsetting
DJI$Date <- NULL # Removing the Date column
which does allow me to run
> DJI.raw$Date[DJI.raw$Close == min(DJI.raw$Close)]
[1] "1985-05-01"
Further, I don't think that turning the dataset into an .xts file would help.
It's not entirely clear to me what you want, but it sounds like you just want the date. You mention that xts is not an option, although that approach would run:
library(xts)
time(as.xts(DJI))[which.min(DJI$Close)] # POSIXct format
# [1] "1985-05-01 EDT"
Otherwise, a simple rownames + which.min gets the date for you:
as.Date(rownames(DJI)[which.min(DJI$Close)]) # Date format
# [1] "1985-05-01"
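The rownames + which.min approach can be tried without downloading anything. The sketch below uses a tiny stand-in data frame: only the 1985-05-01 close of 1242.05 comes from the question, and the other rows and the names prices and min_date are hypothetical.

```r
# Stand-in for DJI: dates as row names, one Close column
prices <- data.frame(Close = c(1250.10, 1242.05, 1260.30),
                     row.names = c("1985-04-30", "1985-05-01", "1985-05-02"))

# which.min() returns the row position of the lowest close;
# the matching row name is then converted from text to a Date
min_date <- as.Date(rownames(prices)[which.min(prices$Close)])
print(min_date)
```

Note that which.min() returns only the first position if the minimum occurs more than once, unlike the `==` comparison in the question, which returns every matching row.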

How to find the difference of time (Time taken to process a file) in R?

I have log data that records the start and end datetime stamps.
The data from the log file look as follows.
Prepared data in Excel:
Start_Date1  Start_Time1  Start_Millisecond1  Start_Date2  Start_Time2  Start_Millisecond2
29-11-2015   18:25:04     671                 29-11-2015   18:40:05     275
29-11-2015   18:25:03     836                 29-11-2015   18:40:04     333
10-11-2015   02:41:57     286                 10-11-2015   02:51:52     690
When I load the data into R using RStudio, the classes of the columns look as follows.
Data loaded and its data types
I am using the line of code below to convert the date to POSIXlt.
nov$Start.Date1<-as.POSIXlt(as.character(nov$Start.Date1), format="%d-%m-%Y")
nov <-read.csv(file = '././data/Data For R Nov CBEFF log.csv',header = TRUE,na.strings = FALSE,stringsAsFactors = FALSE)
str(nov$Start.Time1)
nov$Start.Date1<-as.POSIXlt(as.character(nov$Start.Date1), format="%d-%m-%Y")
nov$Start.Time1<-as.POSIXlt(as.character(nov$Start.Time1), format="%H:%M:%S")
nov$Start.Time1<-format(nov$Start.Time1, format="%H:%M:%S")
nov$Start.Date2<-as.POSIXlt(as.character(nov$Start.Date2), format="%d-%m-%Y")
nov$Start.Time2<-as.POSIXlt(as.character(nov$Start.Time2), format="%H:%M:%S")
nov$Start.Time2<-format(nov$Start.Time2, format="%H:%M:%S")
I want to calculate the time taken to complete, that is, StartTime2 - StartTime1.
StartTime1 and StartTime2 are now of type chr.
This should do the trick. If you had posted the data (reproducible example), I could check the code. This way it might have some typos in it.
nov<-read.delim("sample.csv", sep=";", dec=".")
# combine date and time into one date-time per stamp
nov$start1<-as.POSIXlt(paste(nov$Start_Date1,nov$Start_Time1 ,sep=" "), format="%d-%m-%Y %H:%M:%S")
nov$start2<-as.POSIXlt(paste(nov$Start_Date2,nov$Start_Time2 ,sep=" "), format="%d-%m-%Y %H:%M:%S")
# whole seconds converted to milliseconds, plus the difference of the millisecond columns
nov$timediff<-as.numeric(difftime(nov$start2,nov$start1, units="secs"))*1000+(nov$Start_Millisecond2-nov$Start_Millisecond1)
This gives you the time in milliseconds.
EDIT
Verified with sample data. The variable names have changed from "Start.Date1" to "Start_Date1"
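For readers without the csv file, the calculation can be reproduced from the three rows shown in the question. This is a sketch, not the poster's exact script: the data frame is typed in by hand, POSIXct is used instead of POSIXlt, and tz = "UTC" is an assumption chosen to avoid daylight-saving effects.

```r
# The three sample rows from the question, entered by hand
nov <- data.frame(
  Start_Date1 = c("29-11-2015", "29-11-2015", "10-11-2015"),
  Start_Time1 = c("18:25:04", "18:25:03", "02:41:57"),
  Start_Millisecond1 = c(671, 836, 286),
  Start_Date2 = c("29-11-2015", "29-11-2015", "10-11-2015"),
  Start_Time2 = c("18:40:05", "18:40:04", "02:51:52"),
  Start_Millisecond2 = c(275, 333, 690),
  stringsAsFactors = FALSE
)

# Combine date and time into one date-time per stamp
start1 <- as.POSIXct(paste(nov$Start_Date1, nov$Start_Time1),
                     format = "%d-%m-%Y %H:%M:%S", tz = "UTC")
start2 <- as.POSIXct(paste(nov$Start_Date2, nov$Start_Time2),
                     format = "%d-%m-%Y %H:%M:%S", tz = "UTC")

# Whole seconds as milliseconds, plus the millisecond remainder
nov$timediff_ms <- as.numeric(difftime(start2, start1, units = "secs")) * 1000 +
  (nov$Start_Millisecond2 - nov$Start_Millisecond1)
print(nov$timediff_ms)
```

For the first row the stamps are 901 seconds apart and the millisecond columns differ by 275 - 671 = -396, so the elapsed time is 900604 ms.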

Extract and save as csv files based on specified timeframe

Below is an example of my dataset, saved as a csv file. Is it possible to extract subsets of it and save them as several csv files based on specified timeframes?
For example:
The specified timeframes are:
daytime: 07:30 (same date) to 20:30 (same date)
nighttime: 21:30 (same date) to 06:30 (next date).
After the extraction, the datasets are saved as csv files using this filename format:
daytime: "date"-day
nighttime: "date"-night
"date" is the date from the timestamp.
Thanks for your help.
timestamp c3.1 c3.2 c3.3 c3.4 c3.5 c3.6 c3.7 c3.8 c3.9 c3.10 c3.11 c3.12
8/13/15 15:43 1979.84 1939.6 2005.21 1970 1955.55 1959.82 1989 2001.12 2004.38 1955.75 1958.75 1986.53
8/13/15 15:44 1979.57 1939.64 2005.14 1970.4 1956.43 1958.56 1989.7 2000.78 2004.53 1954.9 1959.76 1986.18
8/13/15 15:45 1979.32 1938.92 2004.52 1970.21 1955.75 1960.12 1989.07 2001.47 2003.7 1955.32 1958.94 1985.79
8/13/15 15:46 1979.33 1939.7 2004.66 1971.25 1955.89 1958.27 1989.24 2000.86 2003.92 1955.29 1959.25 1985.49
Assuming that dat is your data:
## The date-time format in the data set
format <- "%m/%d/%y %H:%M"
## Convert date-time to POSIXct
timestamp <- as.POSIXct(dat$timestamp, format = format)
## First and last dates in the data
first <- as.Date(min(timestamp))
last <- as.Date(max(timestamp))
## The start of day and night timeframes
start.day <- paste(first, "07:30")
start.night <- paste(first - 1, "20:30") ## first night timeframe starts the day before
end <- paste(last + 1, "20:30")
## The breakpoints, assuming that day is 7:30-20:30 and night 20:31-7:29 (i.e. no missing records)
breaks <- sort.POSIXlt(c(seq.POSIXt(as.POSIXct(start.day), as.POSIXct(end), by= "day"),
seq.POSIXt(as.POSIXct(start.night), as.POSIXct(end), by= "day")))
## The corresponding labels
labels <- head(paste0(as.Date(breaks), c("-night", "-day")), - 1)
## Add column with timeframe
dat$timeframe <-cut.POSIXt(timestamp, breaks = breaks, labels = labels)
## Save csv files
for(x in levels(dat$timeframe)) {
  subset <- dat[dat$timeframe == x, ]
  subset$timeframe <- NULL ## Remove the timeframe column
  if(nrow(subset) > 0) write.csv(subset, file = paste0(x, ".csv"))
}
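The core of the approach is cut() applied to date-times with explicit breakpoints and labels. The sketch below isolates that step on three made-up timestamps; the values, the hand-written breaks and labels, and tz = "UTC" are all assumptions for illustration, and no files are written.

```r
# Hypothetical timestamps: early morning, afternoon, late evening
ts <- as.POSIXct(c("2015-08-13 06:00", "2015-08-13 15:43", "2015-08-13 22:10"),
                 format = "%Y-%m-%d %H:%M", tz = "UTC")

# Breakpoints alternate between the start of a night and the start of a day
breaks <- as.POSIXct(c("2015-08-12 20:30", "2015-08-13 07:30",
                       "2015-08-13 20:30", "2015-08-14 07:30"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")

# One label per interval, so one fewer label than breakpoints
labels <- c("2015-08-12-night", "2015-08-13-day", "2015-08-13-night")

# Intervals are closed on the left and open on the right by default
timeframe <- cut(ts, breaks = breaks, labels = labels)
print(data.frame(ts, timeframe))
```

In this sketch a night is labelled with the date on which it starts, so the 06:00 record falls into "2015-08-12-night". Whether that matches the intended file naming is a design choice to check against the requirements.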

Create a Time Series from a CSV file

I have data in the format Date, Time, Value. Here is a sample:
04/01/2010,07:10,17159
04/01/2010,07:20,4877
04/01/2010,07:30,6078
04/01/2010,07:40,3105
04/01/2010,07:50,4073
04/01/2010,08:00,6986
04/01/2010,08:10,7906
04/01/2010,08:20,7681
04/01/2010,08:30,5665
04/01/2010,08:40,6631
04/01/2010,08:50,4633
04/01/2010,09:00,6346
04/01/2010,09:10,6444
04/01/2010,09:20,6324
04/01/2010,09:30,11696
04/01/2010,09:40,7667
04/01/2010,09:50,6375
04/01/2010,10:00,5934
04/01/2010,10:10,12626
04/01/2010,10:20,11674
04/01/2010,10:30,4660
04/01/2010,10:40,3831
04/01/2010,10:50,7089
04/01/2010,11:00,4548
04/01/2010,11:10,2590
04/01/2010,11:20,3334
04/01/2010,11:30,5171
I want to convert this to a time series of Value, keeping the same format, i.e. I need to be able to store the date and time components too. This is because I want to "deseasonalize" the data.
I have tried
z <- read.csv("fileName", header=TRUE,sep=",")
but I'm not sure what to do from here. Can anyone show me how to load this into a time series object properly? Or is there another way to do this?
Thanks in advance
You can use the zoo package. The code below was written to be reproducible, but in actual practice text = Lines would be replaced with file = "fileName". Also, as shown in the question, the Date field is ambiguous, and you may need to adjust the percent codes if it's not day/month/year.
library(zoo)
Lines <- "Date,Time,Value
04/01/2010,07:10,17159
04/01/2010,07:20,4877
04/01/2010,07:30,6078
04/01/2010,07:40,3105
"
z <- read.zoo(text = Lines, sep = ",", header = TRUE,
              index = 1:2, tz = "", format = "%d/%m/%Y %H:%M")
which gives:
> z
2010-01-04 07:10:00 2010-01-04 07:20:00 2010-01-04 07:30:00 2010-01-04 07:40:00
              17159                4877                6078                3105
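For comparison, the date-time parsing can also be sketched in base R without zoo. This produces a data frame with a POSIXct column rather than a time series object. The column name DateTime and tz = "UTC" are assumptions, and the day/month/year reading of the Date field is the same guess as above.

```r
# Two sample rows from the question, inlined so the example is self-contained
Lines <- "Date,Time,Value
04/01/2010,07:10,17159
04/01/2010,07:20,4877
"

z <- read.csv(text = Lines, header = TRUE, stringsAsFactors = FALSE)

# Join the date and time columns, then parse them as one date-time
z$DateTime <- as.POSIXct(paste(z$Date, z$Time),
                         format = "%d/%m/%Y %H:%M", tz = "UTC")
print(z[, c("DateTime", "Value")])
```

The original Date and Time columns are left in place, so both components stay available alongside the combined index.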
In addition to what has been mentioned as the answer, you can check this link (http://eclr.humanities.manchester.ac.uk/index.php/R_TSplots) which discusses the use of 'xts' in this case.
I hope it helps.
