I have data in the format Date, Time, Value. Here is a sample:
04/01/2010,07:10,17159
04/01/2010,07:20,4877
04/01/2010,07:30,6078
04/01/2010,07:40,3105
04/01/2010,07:50,4073
04/01/2010,08:00,6986
04/01/2010,08:10,7906
04/01/2010,08:20,7681
04/01/2010,08:30,5665
04/01/2010,08:40,6631
04/01/2010,08:50,4633
04/01/2010,09:00,6346
04/01/2010,09:10,6444
04/01/2010,09:20,6324
04/01/2010,09:30,11696
04/01/2010,09:40,7667
04/01/2010,09:50,6375
04/01/2010,10:00,5934
04/01/2010,10:10,12626
04/01/2010,10:20,11674
04/01/2010,10:30,4660
04/01/2010,10:40,3831
04/01/2010,10:50,7089
04/01/2010,11:00,4548
04/01/2010,11:10,2590
04/01/2010,11:20,3334
04/01/2010,11:30,5171
I want to convert this to a time series of Value while keeping the same format, i.e. I need to be able to store the date and time components too. This is because I want to "deseasonalize" the data.
I have tried
z <- read.csv("fileName", header=TRUE,sep=",")
but I'm not sure what to do from here. Can anyone show me how to load this into a time series object properly? Or is there another way to do this?
Thanks in advance
You can use the zoo package. The code below was written to be reproducible, but in practice text = Lines would be replaced with file = "fileName". Also, as shown in the question, the Date field is ambiguous, and you may need to adjust the percent codes if it's not day/month/year.
library(zoo)
Lines <- "Date,Time,Value
04/01/2010,07:10,17159
04/01/2010,07:20,4877
04/01/2010,07:30,6078
04/01/2010,07:40,3105
"
z <- read.zoo(text = Lines, sep = ",", header = TRUE,
index = 1:2, tz = "", format = "%d/%m/%Y %H:%M")
which gives:
> z
2010-01-04 07:10:00 2010-01-04 07:20:00 2010-01-04 07:30:00 2010-01-04 07:40:00
17159 4877 6078 3105
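If the dates are actually month/day/year, only the format string changes. A minimal sketch on the first two sample rows, assuming US-style dates (the name z_mdy is just for illustration):

```r
library(zoo)

Lines <- "Date,Time,Value
04/01/2010,07:10,17159
04/01/2010,07:20,4877
"
# Same call as above; only the percent codes differ (%m/%d instead of %d/%m)
z_mdy <- read.zoo(text = Lines, sep = ",", header = TRUE,
                  index = 1:2, tz = "", format = "%m/%d/%Y %H:%M")
z_mdy
```

With this format the first timestamp is read as 1 April 2010 rather than 4 January 2010.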
In addition to the answer above, you can check this link (http://eclr.humanities.manchester.ac.uk/index.php/R_TSplots), which discusses the use of 'xts' in this case.
I hope it helps.
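A minimal sketch of the xts route, assuming the first four sample rows plus the 08:00 reading, and tz = "UTC" (an assumption, so that the hourly buckets do not depend on the local time zone). It sums the 10-minute values within each clock hour, which is one possible first step before deseasonalizing; as.xts(), endpoints() and period.apply() come from xts.

```r
library(zoo)
library(xts)

Lines <- "Date,Time,Value
04/01/2010,07:10,17159
04/01/2010,07:20,4877
04/01/2010,07:30,6078
04/01/2010,07:40,3105
04/01/2010,08:00,6986
"
z <- read.zoo(text = Lines, sep = ",", header = TRUE,
              index = 1:2, tz = "UTC", format = "%d/%m/%Y %H:%M")
x <- as.xts(z)

# endpoints() marks the last observation of each clock hour;
# period.apply() then sums the values inside each of those blocks
hourly <- period.apply(x, endpoints(x, on = "hours"), sum)
hourly
```

Each hourly row is stamped with the time of the last observation in that hour (07:40 and 08:00 here).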
Related
I have data saved in Excel that includes time data.
When reading it in with read.xlsx in R, it adds "1899-12-30" to the time column, I presume in an attempt to read in a date in addition to the time that doesn't exist.
library(xlsx)
times<-read.xlsx("times.xlsx", sheetName = "Sheet1")
times
Time
1 1899-12-30 20:13:24
2 1899-12-30 08:13:54
3 1899-12-30 08:14:24
4 1899-12-30 08:14:54
5 1899-12-30 08:15:24
I tried
times<-read.xlsx("times.xlsx", sheetName = "Sheet1", colClasses('POSIXct'))
and
times<-read.xlsx("times.xlsx", sheetName = "Sheet1", colClasses('POSIXct(format='%H:%M:%S')'))
but the first doesn't do anything and the second gives me an error.
Note that read.xlsx() recognizes a TIME cell as %H:%M:%S and converts it into a POSIXct/POSIXt object with a dummy date, e.g. 1899-12-30 08:13:54 and 1899-12-30 20:13:24 in your data.
#use readxl
library(readxl)
df <- read_excel('test.xlsx')
Or use format():
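library(xlsx)
library(dplyr)  # provides %>% and mutate()
read.xlsx("myfile.xlsx", sheetName = "Sheet1") %>%
  mutate(
    Time = format(Time, "%I:%M %p")  # e.g. "08:13 PM"; drops the dummy 1899-12-30 date
  )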
Or, after reading df, convert it into a time using:
as.POSIXct(df$Time, format="%H:%M:%S", tz="CET")
EDIT:
I don't have your data to reproduce the problem you are facing, so I have made some in the same date format:
df = data.frame(Time = c("1899-12-30 20:13:24","1899-12-30 08:13:54","1899-12-30 08:14:24","1899-12-30 08:14:54","1899-12-30 08:15:24"))
tt <- as.POSIXct(df$Time, format = "%Y-%m-%d %H:%M:%S") # parse into a POSIXct vector, keeping the seconds
# use strftime() to extract the time-of-day part, then chron's times() to create a chron "times" object
library(chron)
tod <- times(strftime(tt, format = "%H:%M:%S"))
This method should work; hopefully you get the idea. There are many ways to achieve this.
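If the goal is just to drop the dummy 1899-12-30 date, base R is enough. A minimal sketch, assuming the column is already POSIXct (as read.xlsx() returned it) and using tz = "UTC" to keep the arithmetic deterministic: format() gives a clock string, and taking the seconds modulo one day gives seconds since midnight, which is handy for arithmetic.

```r
# Two of the values from the question, as read.xlsx() returned them
x <- as.POSIXct(c("1899-12-30 20:13:24", "1899-12-30 08:13:54"), tz = "UTC")

# Clock time as a character string, without the dummy date
clock <- format(x, "%H:%M:%S")

# Seconds since midnight; valid because x is in UTC, where every day
# is exactly 86400 seconds and day boundaries are multiples of 86400
secs <- as.numeric(x) %% 86400

clock
secs
```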
My data has a start and end time stamp such as this:
200401010000 200401010030
200401010030 200401010100
200401010100 200401010130 and so on...
I'm trying to convert these fields, which are in YYYYMMDDHHMM format, into date-times using lubridate and as.POSIXct, but I get only NAs. Any help will be appreciated.
My goal is to aggregate the data for each month.
The code I've used so far is as follows:
start_time = as.POSIXct(dat$TIMESTAMP_START, format = "%YYYY%MM%DD %HH%MM",origin = "2004-01-01 00:00", tz="EDT")
stop_time = as.POSIXct(dat$TIMESTAMP_END, format = "%YYYY%MM%DD%HH%MM",origin = "2004-01-01 00:30", tz="EDT")
dat$interval <- interval(start_time, stop_time)
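Two problems I can see:
Your format string is not valid: %Y already stands for a four-digit year, and %m, %d, %H and %M for two-digit fields, so "%YYYY%MM%DD%HH%MM" should be "%Y%m%d%H%M". (Also, "EDT" is not a standard time-zone name; use an Olson name such as "America/New_York".)
If you're using lubridate already, you should probably use the function ymd_hm(), which is just cleaner IMO. It is vectorized, so you can apply it directly to the whole columns (which I presume dat$TIMESTAMP_START and dat$TIMESTAMP_END are); there is no need for sapply(), which would also strip the POSIXct class:
start_time <- ymd_hm(dat$TIMESTAMP_START)
end_time <- ymd_hm(dat$TIMESTAMP_END)
That converts every item in your vectors.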
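For the stated goal (monthly aggregation), here is a base-R sketch using the corrected format string. The toy rows, the value column, the helper name parse_stamp() and tz = "UTC" are all assumptions for illustration; sprintf("%.0f", x) guards against read.csv() having read the 12-digit stamps as doubles.

```r
# Toy data in the question's layout; 'value' is a hypothetical measurement column
dat <- data.frame(
  TIMESTAMP_START = c(200401010000, 200401010030, 200402010100),
  TIMESTAMP_END   = c(200401010030, 200401010100, 200402010130),
  value           = c(1.5, 2.5, 4)
)

# Hypothetical helper: turn numeric YYYYMMDDHHMM stamps back into digit
# strings (no scientific notation) and parse them
parse_stamp <- function(x) {
  as.POSIXct(sprintf("%.0f", x), format = "%Y%m%d%H%M", tz = "UTC")
}

start_time <- parse_stamp(dat$TIMESTAMP_START)
stop_time  <- parse_stamp(dat$TIMESTAMP_END)

# Length of each interval, then a year-month key for grouping
dat$minutes <- as.numeric(difftime(stop_time, start_time, units = "mins"))
dat$month   <- format(start_time, "%Y-%m")

monthly <- aggregate(value ~ month, data = dat, FUN = sum)
monthly
```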
I downloaded stock market data from Yahoo (code below). For context, I first tried getSymbols("^DJI"), but I got error messages, possibly related to Yahoo; that's a different issue.
The point is that once downloaded, and imported into R, I massaged it into a format close enough to a time series to be able to run chartSeries(DJI):
require(RCurl)
require(foreign)
require(quantmod) # chartSeries() comes from quantmod
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")
DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date # Assigning Date to row names
DJI$Date <- NULL # Removing the Date column
chartSeries(DJI, type="auto", theme=chartTheme('white'))
even if the dataset is not really a time series:
> is.ts(DJI)
[1] FALSE
The problem comes about when I try to find out the date of, for instance, the minimum closing value of the Dow. I can do something like
> DJI[DJI$Close == min(DJI$Close),]
Open High Low Close Adj.Close Volume
1985-05-01 1257.18 1262.81 1239.07 1242.05 1242.05 10050000
yielding the entire row, including the row name (1985-05-01), which is the only part I want. However, if I insist on just getting the actual date, I have to juggle a second dataset containing the dates in one of the columns:
require(RCurl)
require(foreign)
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")
DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date # Assigning Date to row names
DJI.raw <- DJI # Second dataset for future subsetting
DJI$Date <- NULL # Removing the Date column
which does allow me to run
> DJI.raw$Date[DJI.raw$Close == min(DJI.raw$Close)]
[1] "1985-05-01"
Further, I don't think that turning the dataset into an xts object would help.
I'm not clear on what you want, but it sounds like you just want the date? You mention that xts would not help, but with xts this is a one-liner:
time(as.xts(DJI))[which.min(DJI$Close)] # POSIXct format
# [1] "1985-05-01 EDT"
Otherwise, a simple rownames() plus which.min() gets you the date:
as.Date(rownames(DJI)[which.min(DJI$Close)]) # Date format
# [1] "1985-05-01"
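A self-contained version of the rownames() approach that needs no download, plus a zoo variant as an additional alternative. The closing values other than 1242.05 are made up for illustration.

```r
library(zoo)

# Made-up closing values (except 1242.05), for illustration only
DJI <- data.frame(Close = c(1250.10, 1242.05, 1260.30),
                  row.names = c("1985-04-30", "1985-05-01", "1985-05-02"))

# 1. rownames() + which.min(): no extra packages
d_rownames <- as.Date(rownames(DJI)[which.min(DJI$Close)])

# 2. A zoo series indexed by Date keeps the dates attached to the values
z <- zoo(DJI$Close, order.by = as.Date(rownames(DJI)))
d_zoo <- index(z)[which.min(coredata(z))]

d_rownames
d_zoo
```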
I have log data that records start and end date-time stamps.
The data from the log file, prepared in Excel, look like this:
Start_Date1 Start_Time1 Start_Millisecond1 Start_Date2 Start_Time2 Start_Millisecond2
29-11-2015 18:25:04 671 29-11-2015 18:40:05 275
29-11-2015 18:25:03 836 29-11-2015 18:40:04 333
10-11-2015 02:41:57 286 10-11-2015 02:51:52 690
When I load the data into R using RStudio, the column classes look as below.
(screenshot: data loaded and its data types)
I am using the following code to convert the dates to POSIXlt:
nov <-read.csv(file = '././data/Data For R Nov CBEFF log.csv',header = TRUE,na.strings = FALSE,stringsAsFactors = FALSE)
str(nov$Start.Time1)
nov$Start.Date1<-as.POSIXlt(as.character(nov$Start.Date1), format="%d-%m-%Y")
nov$Start.Time1<-as.POSIXlt(as.character(nov$Start.Time1), format="%H:%M:%S")
nov$Start.Time1<-format(nov$Start.Time1, format="%H:%M:%S")
nov$Start.Date2<-as.POSIXlt(as.character(nov$Start.Date2), format="%d-%m-%Y")
nov$Start.Time2<-as.POSIXlt(as.character(nov$Start.Time2), format="%H:%M:%S")
nov$Start.Time2<-format(nov$Start.Time2, format="%H:%M:%S")
I want to calculate the time taken to complete, that is, StartTime2 - StartTime1. However, StartTime1 and StartTime2 are now of type chr.
This should do the trick. If you had posted the data (a reproducible example), I could check the code; as it is, it might contain some typos.
nov<-read.delim("sample.csv", sep=";", dec=".")
nov$start1<-as.POSIXlt(paste(nov$Start_Date1,nov$Start_Time1 ,sep=" "), format="%d-%m-%Y %H:%M:%S")
nov$start2<-as.POSIXlt(paste(nov$Start_Date2,nov$Start_Time2 ,sep=" "), format="%d-%m-%Y %H:%M:%S")
nov$timediff<-as.numeric(difftime(nov$start2,nov$start1, units="secs"))*1000+(nov$Start_Millisecond2-nov$Start_Millisecond1)
This gives you the time in milliseconds.
EDIT
Verified with sample data. The variable names have changed from "Start.Date1" to "Start_Date1"
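The three sample rows from the question make a self-contained check. This variant, an alternative to adding the millisecond difference separately, folds the milliseconds into POSIXct values as fractional seconds; tz = "UTC" is an assumption that avoids DST shifts, and elapsed_ms is a hypothetical column name.

```r
nov <- data.frame(
  Start_Date1        = c("29-11-2015", "29-11-2015", "10-11-2015"),
  Start_Time1        = c("18:25:04", "18:25:03", "02:41:57"),
  Start_Millisecond1 = c(671, 836, 286),
  Start_Date2        = c("29-11-2015", "29-11-2015", "10-11-2015"),
  Start_Time2        = c("18:40:05", "18:40:04", "02:51:52"),
  Start_Millisecond2 = c(275, 333, 690),
  stringsAsFactors = FALSE
)

# Parse date + time, then add the milliseconds as fractional seconds
start1 <- as.POSIXct(paste(nov$Start_Date1, nov$Start_Time1),
                     format = "%d-%m-%Y %H:%M:%S", tz = "UTC") +
  nov$Start_Millisecond1 / 1000
start2 <- as.POSIXct(paste(nov$Start_Date2, nov$Start_Time2),
                     format = "%d-%m-%Y %H:%M:%S", tz = "UTC") +
  nov$Start_Millisecond2 / 1000

# round() removes tiny floating-point noise from the fractional seconds
nov$elapsed_ms <- round(as.numeric(difftime(start2, start1, units = "secs")) * 1000)
nov$elapsed_ms
```

For the three sample rows this gives 900604, 900497 and 595404 ms.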
I would like to download daily data from yahoo for the S&P 500, the DJIA, and 30-year T-Bonds, map the data to the proper time zone, and merge them with my own data. I have several questions.
My first problem is getting the tickers right. From yahoo's website, it looks like the tickers are: ^GSPC, ^DJI, and ^TYX. However, ^DJI fails. Any idea why?
My second problem is that I would like to constrain the time zone to GMT (I want all my data on the same clock, and GMT seems like a neutral choice), but I couldn't get it to work.
My third problem is that I would like to merge the yahoo data with my own data, obtained by other means and available in a different format. It is also daily data.
Here is my attempt at constraining the data to the GMT time zone, executed at the top of my R script:
Sys.setenv(TZ = "GMT")
# > Sys.getenv("TZ")
# [1] "GMT"
# the TZ variable is properly set
# but does not affect the time zone in zoo objects, why?
Here is my code to get the yahoo data:
library("tseries")
library("xts")
date.start <- "1999-12-31"
date.end <- "2013-01-01"
# tickers <- c("GSPC","TYX","DJI")
# DJI Fails, why?
# http://finance.yahoo.com/q?s=%5EDJI
tickers <- c("GSPC","TYX") # proceed without DJI
z <- zoo()
index(z) <- as.Date(format(time(z)),tz="")
for ( i in 1:length(tickers) )
{
cat("Downloading ", i, " out of ", length(tickers) , "\n")
x <- try(get.hist.quote(
instrument = paste0("^",tickers[i])
, start = date.start
, end = date.end
, quote = "AdjClose"
, provider = "yahoo"
, origin = "1970-01-01"
, compression = "d"
, retclass = "zoo"
, quiet = FALSE )
, silent = FALSE )
print(x[1:4]) # check that it's not empty
colnames(x) <- tickers[i]
z <- try( merge(z,x), silent = TRUE )
}
Here is the dput(head(df)) of my dataset:
df <- structure(list(A = c(-0.011489000171423, -0.00020300000323914,
0.0430639982223511, 0.0201549995690584, 0.0372899994254112, -0.0183669999241829
), B = c(0.00110999995376915, -0.000153000000864267, 0.0497750006616116,
0.0337960012257099, 0.014121999964118, 0.0127800004556775), date = c(9861,
9862, 9863, 9866, 9867, 9868)), .Names = c("A", "B", "date"
), row.names = c("0001-01-01", "0002-01-01", "0003-01-01", "0004-01-01",
"0005-01-01", "0006-01-01"), class = "data.frame")
I'd like to merge the data in df with the data in z. I can't seem to get it to work.
I am new to R and very much open to your advice about efficiency, best practice, etc. Thanks.
EDIT: SOLUTIONS
On the first problem: following GSee's suggestions, the Dow Jones Industrial Average data may be downloaded with the quantmod package: thus, instead of the "^DJI" ticker, which is no longer available from yahoo, use the "DJIA" ticker. Note that there is no caret in the "DJIA" ticker.
On the second problem, Joshua Ulrich points out in the comments that "Dates don't have timezones because days don't have a time component."
On the third problem: The data frame appears to have corrupted dates, as pointed out by agstudy in the comments.
My solutions rely on the quantmod package and the attached zoo/xts packages:
library(quantmod)
Here is the code I have used to get proper dates from my csv file:
toDate <- function(x){ as.Date(as.character(x), format = "%Y%m%d") }
dtz <- read.zoo("myData.csv"
, header = TRUE
, sep = ","
, FUN = toDate
)
dtx <- as.xts(dtz)
The dates in the csv file were stored in a single column in the format "19861231". The key to getting correct dates was to wrap the date in as.character(): as.Date() on a plain number treats it as a count of days since an origin, not as YYYYMMDD digits. Part of this code was inspired by R - Stock market data from csv to xts. I also found the zoo/xts manuals helpful.
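A minimal, self-contained sketch of the same idea, together with the merge the question asked about. The inline CSV reuses the first A/B values from the dput() output with made-up YYYYMMDD dates, and other is a hypothetical stand-in for a downloaded series with dummy values. merge() on zoo objects does a full outer join on the dates by default; all = FALSE keeps only the shared dates.

```r
library(zoo)

toDate <- function(x) as.Date(as.character(x), format = "%Y%m%d")

Lines <- "date,A,B
19861231,-0.011489,0.001110
19870102,-0.000203,-0.000153
19870105,0.043064,0.049775"
dtz <- read.zoo(text = Lines, header = TRUE, sep = ",", FUN = toDate)

# Hypothetical stand-in for a downloaded series (dummy values)
other <- zoo(c(1, 2), as.Date(c("1987-01-02", "1987-01-06")))

m     <- merge(dtz, other = other)           # all dates, NA where a series is missing
inner <- merge(dtz, other, all = FALSE)      # only dates present in both
m
inner
```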
I then extract the date range from this dataset:
date.start <- start(dtx)
date.end <- end(dtx)
I will use those dates with quantmod's getSymbols function so that the other data I download will cover the same period.
Here is the code I have used to get all three tickers.
tickers <- c("^GSPC","^TYX","DJIA")
data <- new.env() # the data environment will store the data
do.call(cbind, lapply( tickers
, getSymbols
, from = date.start
, to = date.end
, env = data # data saved inside an environment
)
)
ls(data) # see what's inside the data environment
data$GSPC # access a particular ticker
Also note, as GSee pointed out in the comments, that the option auto.assign=FALSE cannot be used in conjunction with the option env=data (otherwise the download fails).
A big thank you for your help.
Yahoo doesn't provide historical data for ^DJI. Currently, it looks like you can get the same data by using the ticker "DJIA", but your mileage may vary.
As for the time zone: that part is fine in this case, because you're only dealing with Dates, and Date objects have no time zone.
The df object you provided is yearly data beginning in the year 0001 (see its row names), so that's probably not what you wanted.
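A small sketch of why setting TZ had no visible effect on the downloaded data: a Date is a count of days and formats the same under any TZ setting, whereas a POSIXct is an instant whose clock reading depends on the zone. The zone names are standard Olson names chosen for illustration; the code restores the original TZ setting at the end.

```r
old_tz <- Sys.getenv("TZ", unset = NA)

d <- as.Date("2013-01-01")
Sys.setenv(TZ = "America/New_York"); in_ny    <- format(d)
Sys.setenv(TZ = "Asia/Tokyo");       in_tokyo <- format(d)

# A POSIXct, by contrast, is an instant: the same value shows a
# different wall-clock reading in another zone
p <- as.POSIXct("2013-01-01 00:00", tz = "GMT")
ny_clock <- format(p, "%Y-%m-%d %H:%M", tz = "America/New_York")

# Restore whatever TZ setting was there before
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)

c(in_ny, in_tokyo, ny_clock)
```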
Here's how I would fetch and merge those series (or use an environment and only make one call to getSymbols)
library(quantmod)
do.call(cbind, lapply(c("^GSPC", "^TYX"), getSymbols, auto.assign=FALSE))