I have a data frame (df) that contains daily stock index prices covering over 4000 days. It looks like:
Date Prices
1986-1-1 20
. .
. .
. .
. .
2001-08-31 40
I am trying to convert the data frame into a zoo object using read.zoo(df) (read.zoo is a function in the zoo package). However, it gives me the following warning:
Warning message:
In zoo(rval3, ix) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
This affects the subsequent code I apply to the object.
For reproducibility, the original data (FTSE100jensen.csv) and code (JensenPaper.R) are available at https://github.com/ahmedfsalhin/1stpaper
The problem is that you called read.zoo() without providing a value for format=, but your dates are formatted like "%d/%m/%Y", not "%Y-%m-%d".
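As a minimal sketch of that fix, assuming the zoo package is installed, the format can be passed straight to read.zoo(). The three-row data frame below is made up for illustration and only mimics a day/month/year layout:

```r
library(zoo)

# Illustrative data only: dates stored as day/month/year strings
df <- data.frame(
  Date   = c("01/01/1986", "02/01/1986", "03/01/1986"),
  Prices = c(20, 21, 22),
  stringsAsFactors = FALSE
)

# Tell read.zoo how to parse the index column (column 1 by default)
z <- read.zoo(df, format = "%d/%m/%Y")

class(index(z))          # the index is now of class Date
anyDuplicated(index(z))  # 0 means no duplicated index entries
```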
I'm not quite sure why this warning was occurring, but I first converted Date to the Date class and was then able to call read.zoo without any warning using this:
options(stringsAsFactors=FALSE)
library(zoo)
##
Data <- read.csv(
"F:/gitData.csv",
header=TRUE)
#
Data$Date <- as.Date(
Data$Date,
"%d/%m/%Y")
##
zData <- read.zoo(Data)
##
> head(zData)
Open High Low Close Volume Adj.Close
1986-01-01 1412.6 1412.6 1412.6 1412.6 0 1412.6
1986-01-02 1412.6 1420.8 1412.0 1420.5 0 1420.5
1986-01-03 1420.5 1430.0 1419.6 1429.8 0 1429.8
1986-01-06 1429.8 1436.3 1424.1 1424.1 0 1424.1
1986-01-07 1419.8 1419.8 1411.6 1415.2 0 1415.2
1986-01-08 1415.2 1419.3 1400.3 1404.2 0 1404.2
and everything seems to be in order, e.g. I can call zoo methods properly:
> plot(zData)
To address the comments above: the warning message does seem to indicate that there are duplicated dates, but once the dates are parsed correctly this is not the case:
> dim(Data)
[1] 4088 7
> length(unique(Data$Date))
[1] 4088
I have downloaded the DJI historical data from Yahoo as a csv for further analysis in R. Out of curiosity getSymbols("^DJI") didn't seem to work, but I digress.
The point is that I don't know how to turn this csv file into a time series format.
Here is the output and problem so far:
> DJI = read.csv("^DJI.csv")
> head(DJI)
Date Open High Low Close Adj.Close Volume
1 1/29/1985 1277.72 1295.49 1266.89 1292.62 1292.62 13560000
2 1/30/1985 1297.37 1305.10 1278.93 1287.88 1287.88 16820000
3 1/31/1985 1283.24 1293.40 1272.64 1286.77 1286.77 14070000
4 2/1/1985 1276.94 1286.11 1269.77 1277.72 1277.72 10980000
5 2/4/1985 1272.08 1294.94 1268.99 1290.08 1290.08 11630000
6 2/5/1985 1294.06 1301.13 1278.60 1285.23 1285.23 13800000
> chartSeries(DJI)
Error in try.xts(x, error = "chartSeries requires an xtsible object") :
chartSeries requires an xtsible object
So the {quantmod} function chartSeries is requesting an xts object, but the Date column in DJI is not immediately recognized as a date:
> DJI = as.Date(DJI$Date)
Error in charToDate(x) :
character string is not in a standard unambiguous format
EDIT after the answer below:
> head(DJI)
Open High Low Close Adj.Close Volume
1985-01-29 1277.72 1295.49 1266.89 1292.62 1292.62 13560000
1985-01-30 1297.37 1305.10 1278.93 1287.88 1287.88 16820000
1985-01-31 1283.24 1293.40 1272.64 1286.77 1286.77 14070000
1985-02-01 1276.94 1286.11 1269.77 1277.72 1277.72 10980000
1985-02-04 1272.08 1294.94 1268.99 1290.08 1290.08 11630000
1985-02-05 1294.06 1301.13 1278.60 1285.23 1285.23 13800000
> is.ts(DJI)
[1] FALSE
To convert the dates you need to supply a format:
DJI$Date <- as.Date(DJI$Date,format="%m/%d/%Y")
For a data frame to be convertible to xts, quantmod needs the dates to be row names rather than a separate column. You should therefore also do:
rownames(DJI) <- DJI$Date
DJI$Date <- NULL #to remove the column
chartSeries(DJI)
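An alternative sketch, assuming the xts package (which quantmod loads) is available, builds the xts object explicitly rather than relying on row names. The rows are the first three from the question, with fewer columns; DJI_xts and dates are just illustrative names:

```r
library(xts)

DJI <- data.frame(
  Date  = c("1/29/1985", "1/30/1985", "1/31/1985"),
  Open  = c(1277.72, 1297.37, 1283.24),
  High  = c(1295.49, 1305.10, 1293.40),
  Low   = c(1266.89, 1278.93, 1272.64),
  Close = c(1292.62, 1287.88, 1286.77),
  stringsAsFactors = FALSE
)

# Parse month/day/year strings into Date objects
dates <- as.Date(DJI$Date, format = "%m/%d/%Y")

# Numeric columns become the data, the dates become the index
DJI_xts <- xts(as.matrix(DJI[, -1]), order.by = dates)

is.xts(DJI_xts)  # TRUE
# quantmod::chartSeries(DJI_xts) should then accept the object
```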
I have a data frame, data, which contains integer columns as well as date and time columns, as shown:
>head(data,2)
PRESSURE AMBIENT_TEMP OUTLET_PRESSURE COMP_STATUS DATE TIME predict
1 14 65 21 0 2014-01-09 12:45:00 0.6025863
2 17 65 22 0 2014-01-10 06:00:00 0.6657910
Now I want to write this back to the SQL database with this call:
sqlSave(channel,data,tablename = "ANL_ASSET_CO",append = T)
where channel is the connection name. But this gives an error:
[RODBC] Failed exec in Update
22018 1722 [Oracle][ODBC][Ora]ORA-01722: invalid number
But when I exclude the date column, it writes back without any error:
> sqlSave(channel,data[,c(1:4,7)],tablename = "ANL_ASSET_CO",append = T)
> sqlSave(channel,data[,c(1:4,6:7)],tablename = "ANL_ASSET_CO",append = T)
The date column seems to prevent the data from being written to the Oracle database; the hyphen could be the problem.
How can I write the data, including the date column? Any help is appreciated.
>class(data$DATE)
[1] "POSIXct" "POSIXt"
So I had to change the column's data type to character:
>data$DATE <- as.character(data$DATE)
>sqlSave(channel,data,tablename = "ANL_ASSET_CO",append=T)
This one worked!!
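A slightly more explicit version of the same workaround, as a sketch: format() lets you choose the exact string that is sent to the database. The format string below is an assumption and should match whatever the target column expects, and the two-row data frame is illustrative. The sqlSave() call is commented out because it needs a live connection.

```r
# Illustrative data with a POSIXct column, as in the question
data <- data.frame(
  PRESSURE = c(14, 17),
  DATE     = as.POSIXct(c("2014-01-09", "2014-01-10"), tz = "UTC")
)

# Convert to character with an explicit, unambiguous format
data$DATE <- format(data$DATE, "%Y-%m-%d")

class(data$DATE)  # "character"

# sqlSave(channel, data, tablename = "ANL_ASSET_CO", append = TRUE)
```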
I'm trying to apply a function to a column in a dataframe that contains dates and keep getting an error. I'm not exactly sure what I'm doing wrong.
My dataframe:
dates total
1 2014-12-08 01:10:00 163.7
2 2014-12-08 01:10:00 163.9
3 2014-12-08 01:12:00 163.6
4 2014-12-08 08:27:00 163.0
5 2014-12-08 08:35:00 163.7
6 2014-12-08 08:39:00 162.4
I want to replace the dates with either 'morning' or 'night', or alternatively create a new column containing 'morning' or 'night'. The approach I took involved unclassing the date so I could get the hour. I defined night as before 4am or after 5pm. I put this in a function called timeofday:
timeofday <- function(x) {
bmk <- unclass(x)
if (bmk$hour < 4) {
return("night")
} else if (bmk$hour > 17) {
return("night")
} else {
return("morning")
}
}
I then did the following:
timeofday(df$dates)
Warning message:
In if (bmk$hour < 4) { :
the condition has length > 1 and only the first element will be used
Any help on identifying the issue would be greatly appreciated.
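You could also use cut, as in:
cut((as.POSIXlt(x)$hour - 4) %% 24, c(-1, 13, 23), labels = c('morning', 'night'))
(Note that you have to shift your frame of reference so that you don't have two 'night' categories with this solution; here hours 4 to 17 map to 'morning' and all other hours to 'night', matching the conditions in your function.)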
Your code contains this if statement
if (bmk$hour < 4)
If bmk$hour is a vector with more than one element, as in your case, the if statement only looks at its first element and ignores the rest (in R 4.2.0 and later this is an error rather than a warning).
This is the workaround, which calls the function once per element:
sapply(df$dates, timeofday)
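A vectorised alternative, as a sketch: ifelse() evaluates the condition for every element, so no loop or sapply() is needed. as.POSIXlt() is used to extract the hour, which works whether the column is POSIXct or POSIXlt. The cut-offs follow the code in the question (hour < 4 or hour > 17 is night); timeofday_vec is an illustrative name and the third timestamp is made up to exercise the evening branch.

```r
timeofday_vec <- function(x) {
  hour <- as.POSIXlt(x)$hour          # hour of day, 0 to 23
  ifelse(hour < 4 | hour > 17, "night", "morning")
}

dates <- as.POSIXct(
  c("2014-12-08 01:10:00", "2014-12-08 08:27:00", "2014-12-08 19:00:00"),
  tz = "UTC"
)
timeofday_vec(dates)
# [1] "night"   "morning" "night"
```

The result can be assigned to a new column, e.g. df$period <- timeofday_vec(df$dates).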
I have a CSV file of 1700 daily prices.
They are of this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index runs from 1 to 1700,
but I need to specify a start date and an end date, like this:
the start of the period is, let's say, 25 January 2009,
and the last (1700th) value corresponds to 14 May 2013.
So far this is as close as I have gotten:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? Thanks.
UPDATE:
I managed to create a separate object with dates, as suggested in the answers, and plotted it, but the y axis looks odd, as shown in the screenshot.
Something like this?
as.Date("25-01-2009",format="%d-%m-%Y") + (seq(1:1700)-1)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
myDate=seq(as.Date("2009-01-25"), by="1 day", length.out=1700),
myPrice=runif(1700)
)
plot(df)
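One thing worth checking, as a sketch: 1700 consecutive calendar days starting on 25 January 2009 run past 14 May 2013, so if the stated end date matters, the series is probably not one observation per calendar day (for example, it may skip non-trading days). The variable names here are illustrative.

```r
n     <- 1700
start <- as.Date("2009-01-25")
end   <- as.Date("2013-05-14")   # end date stated in the question

# One date per calendar day, starting at 'start'
dates <- seq(start, by = "1 day", length.out = n)

range(dates)            # first and last generated dates
tail(dates, 1) == end   # FALSE here: 1700 calendar days run past 'end'
```

If the prices are trading-day observations, the real dates would have to come from the data source rather than from a regular sequence.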
R stores Date-classed objects as the number of days since "1970-01-01", but the as.Date.numeric function takes an 'origin' argument, which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You don't need to add the extension .numeric, since R automatically dispatches to that method if you call the generic, as.Date, with a numeric argument. I just put it in because as.Date.numeric has different arguments than as.Date.character.
I installed the quantmod package and I'm trying to import a csv file with 1 minute intraday data. Here is a sample GAZP.csv file:
"D";"T";"Open";"High";"Low";"Close";"Vol"
20130902;100100;132.2000000;133.0500000;131.9200000;132.5000000;131760
20130902;100200;132.3700000;132.5700000;132.2500000;132.2900000;66090
20130902;100300;132.3600000;132.5000000;132.2600000;132.4700000;37500
I've tried:
> getSymbols('GAZP',src='csv')
Error in `colnames<-`(`*tmp*`, value = c("GAZP.Open", "GAZP.High", "GAZP.Low", :
length of 'dimnames' [2] not equal to array extent
> getSymbols.csv('GAZP',src='csv')
> # or
> getSymbols.csv('GAZP',env,dir="c:\\!!",extension="csv")
Error in missing(verbose) : 'missing' can only be used for arguments
How should I properly use the getSymbols.csv command to read such data?
@Vladimir, if you do not insist on using the getSymbols function from the quantmod package, you can import your csv file (assuming it is in your working directory) as a zoo object with the line:
GAZP=read.zoo("GAZP.csv",sep=";",header=TRUE,index.column=list(1,2),FUN = function(D,T) as.POSIXct(paste(D, T), format="%Y%m%d %H%M%S"))
and convert it to an xts object if you want:
GAZP.xts <- as.xts(GAZP)
> GAZP
Open High Low Close Vol
2013-09-02 10:01:00 132.20 133.05 131.92 132.50 131760
2013-09-02 10:02:00 132.37 132.57 132.25 132.29 66090
2013-09-02 10:03:00 132.36 132.50 132.26 132.47 37500
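To see what the FUN argument above is doing, here is a base R sketch that parses the same three sample rows by hand. Padding the time to six digits with sprintf() is a precaution for times before 10:00:00, which would otherwise lose their leading zero when read as integers. The tz value is an assumption, and csv_text, raw and stamp are illustrative names.

```r
csv_text <- '"D";"T";"Open";"High";"Low";"Close";"Vol"
20130902;100100;132.2000000;133.0500000;131.9200000;132.5000000;131760
20130902;100200;132.3700000;132.5700000;132.2500000;132.2900000;66090
20130902;100300;132.3600000;132.5000000;132.2600000;132.4700000;37500'

raw <- read.table(text = csv_text, sep = ";", header = TRUE)

# Pad the time to six digits so that e.g. 93000 would parse as 09:30:00
stamp <- as.POSIXct(
  paste(raw$D, sprintf("%06d", raw$T)),
  format = "%Y%m%d %H%M%S",
  tz = "UTC"   # assumption: the real exchange time zone may differ
)

format(stamp, "%Y-%m-%d %H:%M:%S")
# [1] "2013-09-02 10:01:00" "2013-09-02 10:02:00" "2013-09-02 10:03:00"
```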