Date-formatted cells in xlsx files to R

I have an Excel file which has date information in some cells. I read this file into R with the following command:
library(xlsx)
data.files = list.files(pattern = "*.xlsx")
data <- lapply(data.files, function(x) read.xlsx(x, sheetIndex = 9, header = TRUE))
Everything is read correctly except the cells with dates: instead of the date shown in the xlsx file, I always get a number such as 42948.
Does anybody know how I could fix this?

After importing your files, dates are represented as numeric values (here 42948). These are Excel's internal serial numbers for dates (days counted from Excel's date origin), and they are what R shows instead of the "real" dates.
You can get those dates in R with as.Date(42948 - 25569, origin = "1970-01-01") (25569 is the Excel serial number for 1970-01-01, which is R's Date origin).
Note that you can also pass a whole vector of serial numbers, so this should also work:
vect <- c(42948, 42949, 42950)
as.Date(vect - 25569, origin = "1970-01-01")
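As a quick check of the arithmetic, the same conversion can also be written with Excel's own date origin (1899-12-30 for the 1900 date system used by Windows Excel); both forms should return the same dates:
as.Date(vect, origin = "1899-12-30")
# [1] "2017-08-01" "2017-08-02" "2017-08-03"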
PS: To convert an Excel datetime column, see this (p. 31).

Related

Trouble with converting csv file into xts using Highfrequency package in R

I'm using the convert function from the highfrequency package in R. The dataset I'm using is TAQ data downloaded from WRDS. The data looks like this.
The convert function is supposed to convert the .csv files into .RData files of xts objects.
I followed the package instructions and used the following code:
library(highfrequency)
from <- "2017-01-05"
to <- "2017-01-05"
format <- "%Y%m%d %H:%M:%S"
datasource <- "C:/Users/feimo/OneDrive/SFU/Thesis-Project/R/IBM"
datadestination <- "C:/Users/feimo/OneDrive/SFU/Thesis-Project/R/IBM"
convert(from = from, to = to, datasource = datasource,
        datadestination = datadestination, trades = TRUE, quotes = FALSE,
        ticker = "IBM", dir = TRUE, extension = "csv",
        header = FALSE, tradecolnames = NULL,
        format = format, onefile = TRUE)
But I got the following error message:
> Error in `$<-.data.frame`(`*tmp*`, "COND", value = numeric(0)) :
> replacement has 0 rows, data has 23855
I believe the default column names expected by the function are c("SYMBOL", "DATE", "EX", "TIME", "PRICE", "SIZE", "COND", "CORR", "G127"), which is different from my dataset, so I manually changed the headers in my .csv to match. Then I got another error:
> Error in xts(tdata, order.by = tdobject) : 'order.by' cannot contain 'NA', 'NaN', or 'Inf'
Tried to look at the original code, but couldn't find a solution.
Any suggestion would be really helpful. Thanks!
When I run your code on the data to which you provide a link, I get the second error you mention:
Error in xts(tdata, order.by = tdobject) :
'order.by' cannot contain 'NA', 'NaN', or 'Inf'
This error can be traced to these lines in the function highfrequency:::makeXtsTrades(), which is called by highfrequency::convert():
tdobject = as.POSIXct(paste(as.vector(tdata$DATE), as.vector(tdata$TIME)),
                      format = format, tz = "GMT")
tdata = xts(tdata, order.by = tdobject)
The error results from two problems:
The variable "DATE" in your data file is read into R as numeric, whereas it appears that the code creating tdobject expects tdata$DATE to be a character vector. You could fix this by manually converting that variable to a character vector:
tdata <- read.csv("IBM_trades.csv")
tdata$DATE <- as.character(tdata$DATE)
write.csv(tdata, file = "IBM_trades_DATE_fixed.csv", row.names = FALSE)
The variable "TIME_M" in your data file is not a time of the format "%H:%M:%S". It looks like it is only the minutes and seconds component of a more complete time variable, because values only contain one colon and the values before and after the colon vary from 0 to 59.9. Fixing this problem would require finding the hour component of the time variable.
These two problems result in tdobject being filled with NA values rather than valid date-times, which causes an error when xts::xts() tries to order the data by tdobject.
The more general issue seems to be that the function highfrequency::convert() expects your data to follow something like the format described here on the WRDS website, but your data has slightly different column names and possibly different value formats. I would recommend taking a close look at that WRDS page and the documentation for your data file and determining which variables in your data correspond to those described on that page (for instance, it's not clear to me that your data contains any variable that is equivalent to "G127").
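As a rough sketch of the kind of pre-check that can save time here, the snippet below reads the trades file, keeps DATE as character, and verifies that the combined date-time strings parse before anything is handed to convert(). The column name TIME and the file name IBM_trades.csv are taken from above; the hour column HR is purely hypothetical, since it is not clear from the question where the hour component lives in your file:
tdata <- read.csv("IBM_trades.csv", stringsAsFactors = FALSE)
# keep DATE as character so that paste() yields "YYYYMMDD HH:MM:SS" strings
tdata$DATE <- as.character(tdata$DATE)
# hypothetical: rebuild a full time of day if the hour sits in a separate column HR
# tdata$TIME <- sprintf("%02d:%s", tdata$HR, tdata$TIME_M)
# this should produce valid POSIXct values, not NA, before convert() is called
head(as.POSIXct(paste(tdata$DATE, tdata$TIME), format = "%Y%m%d %H:%M:%S", tz = "GMT"))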

Reading dates from Excel

I have two date columns (formatted as dd-mm-yyyy in Excel) in my Excel sheet:
Date Delivery   Date Collection
06-08-17        15-08-17
11-04-17        15-04-17
24-01-17        24-01-17
11-08-16        14-08-16
There are multiple issues.
Currently I am reading a subset of the data (manually made from the top 100 rows in another Excel sheet).
Dates that have the same format in Excel are shown differently in R, and they all look like Date.Collection does when I read the whole data set.
data <- read.xlsx("file.xlsx", sheetName='subset', startRow=1)
In R, Date.Delivery shows up as dd-mm-yy strings while Date.Collection shows up as numeric serial values.
I need them all to be shown as in Date.Delivery because I need to write the result back after analysis.
I am also trying to convert it to a Date in R using:
dates <- data$Date.Delivery
as.Date(dates, origin = "30-12-1899",format="%d-%m-%y")
To format Date.Collection like Date.Delivery after reading your file, try:
# see the str of your data
str(data)
# if Date.Collection is character
data$Date.Collection <- as.numeric(data$Date.Collection)
# if Date.Collection is factor
data$Date.Collection <- as.numeric(levels(data$Date.Collection))[data$Date.Collection]
# conversion
data$Date.Collection <- as.Date(data$Date.Collection - 25569, origin = "1970-01-01")
Or you can read the file using the "gdata" or "XLConnect" package, which reads the column in as a factor, and then use dmy() from lubridate to convert it into a Date (the values are day-month-year, so dmy() rather than ymd() matches the format shown above):
require(gdata)
library(lubridate)
data <- read.xls(path, sheet = 1, header = TRUE)
data$Date.Collection <- dmy(data$Date.Collection)
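Given the dd-mm-yy values shown in the question's table, a quick check of the lubridate conversion on two of those values:
library(lubridate)
dmy(c("06-08-17", "15-08-17"))
# [1] "2017-08-06" "2017-08-15"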

Convert XML Doc to CSV using R - Or search items in XML file using R

I have a large, unorganized XML file that I need to search to determine whether certain ID numbers are in the file. I would like to use R to do so, but because of the format I am having trouble converting it to a data frame or even a list that I could extract to a csv. I figured I could search easily if it were in csv format. So I need help understanding how to convert and extract it properly, or how to search the document for values using R. Below is the code I have used to try to convert the doc, but several errors occur with my various attempts.
## Method 1. I tried to convert to a data frame, but each column is not the same length.
require(XML)
require(plyr)
file<-"EJ.XML"
doc <- xmlParse(file,useInternalNodes = TRUE)
xL <- xmlToList(doc)
data <- ldply(xL, data.frame)
datanew <- read.table(data, header = FALSE, fill = TRUE)
## Method 2. I tried to convert it to a list; the file is parsed, but only 2 values from the file end up in the output.
data<- xmlParse("EJ.XML")
print(data)
head(data)
xml_data<- xmlToList(data)
class(data)
topxml <- xmlRoot(data)
topxml <- xmlSApply(topxml,function(x) xmlSApply(x, xmlValue))
xml_df <- data.frame(t(topxml), row.names = NULL)
write.csv(xml_df, file = "MyData.csv", row.names = FALSE)
I am going to do some research on how to search within R as well, but I assume the file needs to be in a data frame or a list to do so either way. Any help is appreciated! Attached is a screenshot of the data. I am interested in finding entity id numbers that match a list I have in an Excel doc.
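One way to search for the IDs without flattening the whole document into a data frame is to query it directly with XPath. The node name entity and attribute name id below are hypothetical (the structure of EJ.XML isn't shown), so adjust them to match your file:
require(XML)
doc <- xmlParse("EJ.XML", useInternalNodes = TRUE)
# hypothetical structure: <entity id="12345"> ... </entity>
ids_in_file <- xpathSApply(doc, "//entity", xmlGetAttr, "id")
# the IDs to look for, e.g. read from the Excel list with read.xlsx()
wanted <- c("12345", "67890")
intersect(wanted, ids_in_file)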

Importing Data in R based on Dates

I want to import an Excel file into R. The file, however, has columns such as Jan-13, Jan-14 and so on; these are the column headers. When I import the data using the readxl package, it converts the dates into numbers by default, so my columns which should be dates are now numbers.
I am using the code:
library(readxl)
data = read_excel("FileName", col_names = TRUE, skip = 0)
Can someone please help?
The date information is still there; it's just in the wrong format. This should work (on Windows, Excel serial dates count from an origin of 1899-12-30):
names(data) <- format(as.Date(as.numeric(names(data)), origin = "1899-12-30"), "%b%d")
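A minimal, self-contained illustration of that fix, using made-up serial numbers in place of the real column headers (41287 and 41288 are the serials for 13 and 14 January 2013):
serial_names <- c("41287", "41288")
format(as.Date(as.numeric(serial_names), origin = "1899-12-30"), "%b%d")
# [1] "Jan13" "Jan14"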

Load csv into R as xts, or comparable to enable time series analysis

I am still learning R, and get very confused when using various data types, classes, etc. I have run into this issue of "Dates" not being in the right format for xts countless times now, and find a solution each time after searching long and hard for (what I consider) complicated solutions.
I am looking for a way to load a CSV into R and convert the date column at load time. 99% of my files contain Date as the first column, in the format 01-31-1900 (xts wants YYYY-mm-dd).
Right now I have the following:
FedYieldCurve <- read.csv("Yield Curve.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
FedYieldCurve$Date <- format(as.Date(FedYieldCurve$Date), "%Y/%m/%d")
and I am getting:
Error in charToDate(x) : character string is not in a standard unambiguous format
The format argument must be passed to as.Date itself, not to format(). Try this (if the dates in the file are stored in the 01-31-1900 format):
as.Date(FedYieldCurve$Date,format="%m-%d-%Y")
When you coerce a string to a Date object you have to specify the format of the string via the format argument of as.Date. You get the error you reported when you try to coerce a string whose format is not the standard YYYY-mm-dd.
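If the end goal is an xts object, the corrected Date column can then be used directly as the index. This sketch assumes the remaining columns of the file are numeric yield values, which isn't shown in the question:
library(xts)
FedYieldCurve <- read.csv("Yield Curve.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
# parse the dates with the correct format, then drop the Date column from the data part
FedYieldCurve$Date <- as.Date(FedYieldCurve$Date, format = "%m-%d-%Y")
FedYieldCurve_xts <- xts(FedYieldCurve[, -1, drop = FALSE], order.by = FedYieldCurve$Date)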
Please provide a few lines of the file when asking questions like this. In the absence of that, we have supplied some data below in a self-contained example.
Use read.zoo from the zoo package (which xts loads) specifying the format. (Replace the read.zoo line with the commented line to read from a file.)
Lines <- "Date,Value
01-31-1900,3"
library(xts)
# z <- read.zoo("myfile.csv", header = TRUE, sep = ",", format = "%m-%d-%Y")
z <- read.zoo(text = Lines, header = TRUE, sep = ",", format = "%m-%d-%Y")
x <- as.xts(z)
See ?read.zoo and the zoo vignette "Reading Data in zoo".
