Date changes while importing excel file in R - r

I have multiple Excel files (160) in which one 'date' column, with about 100 observations, is not in a proper format. When I import all the files together, the date column changes as follows:
Date Column in Excel
Date
05-07-2015
04-07-2015
03-07-2015
02-07-2015
.......
Date column importing in R
Date
42190
42189
42188
42187
......
How can I change values such as "42190" back to the original date format?

Excel may store dates as numbers, or they may have been imported in a numeric format. So you can try:
# from Windows Excel:
as.Date(42190, origin = "1899-12-30")
[1] "2015-07-05"
# from Mac Excel:
as.Date(42190, origin = "1904-01-01")
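Interestingly, the Excel support page defines the origin date for Windows Excel as "1900-01-01", but in R "1899-12-30" should be used as the origin date. The two-day difference arises because Excel counts 1900-01-01 as day 1 rather than day 0, and because it treats 1900 as a leap year and so counts a nonexistent 1900-02-29.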
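The two origins above can be wrapped in a small helper. This is only a minimal sketch: excel_serial_to_date and its date_system argument are hypothetical names, not part of any package, and it assumes the serials are plain numbers from March 1900 onward.

```r
# Hypothetical helper (not from any package): convert Excel serial day
# numbers to Date under either of Excel's two date systems.
excel_serial_to_date <- function(serial, date_system = c("1900", "1904")) {
  date_system <- match.arg(date_system)
  # "1899-12-30" absorbs both Excel's 1-based count and its fictitious
  # 1900-02-29, so it is only correct for serials of 61 and above.
  origin <- if (date_system == "1900") "1899-12-30" else "1904-01-01"
  # floor() drops any time-of-day fraction before converting.
  as.Date(floor(serial), origin = origin)
}

excel_serial_to_date(c(42190, 42189, 42188, 42187))
# [1] "2015-07-05" "2015-07-04" "2015-07-03" "2015-07-02"
```

Which date system applies depends on the workbook, so it is worth checking one known date by hand before converting a whole column.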

Copy your date column, paste it as values (Paste Special) into another column, and use that column when importing into R.

I've found this to be quite helpful:
library(openxlsx)
dates <- c(42190,42189,42188,42187)
datesConverted <- convertToDate(dates); datesConverted
# "2015-07-05" "2015-07-04" "2015-07-03" "2015-07-02"
Gives you exactly what you're looking for.

Related

How to read in excel file when Date and Time in the same column in R

I am trying to read an excel file into R. Among other fields, the excel file has two "date" fields, each containing both the date and time stamp in the SAME field.
Example:
StartDate 9/14/2019 10:18:59 AM
EndDate 9/18/2019 2:27:14 AM
When I tried read_excel to read in the Excel file, these two columns came out very strangely in the data frame: as numbers of days (with decimals), such as 43712.429849537039. I thought these were days from Jan-01-1970 (the origin date that popped up when I typed lubridate::origin).
data %<>%
mutate(StartDate = as.Date(StartDate, origin = "1970-01-01 UTC"))
So I tried converting this back using as.Date, but it converts it to the totally wrong date... (converts all the dates to the year 2089). Example, 2089-09-05.
Any help with this would be really appreciated! There must be a simpler way to directly read in a date-time column?!
You can use the lubridate package, it is excellent:
library(tidyverse)
df <- data.frame(StartDate =c("9/14/2019 10:18:59 AM","9/14/2019 3:18:59 PM"),
EndDate= c("9/18/2019 2:27:14 AM","9/18/2019 1:27:14 PM"))
df <- df %>% mutate(StartDate = lubridate::mdy_hms(StartDate), EndDate = lubridate::mdy_hms(EndDate))
It turns out that Excel has a different "origin date" from R. Excel counts days from 01-01-1900, whereas R counts days from 01-01-1970.
When I used read_excel to read the file into a df, R kept Excel's day counts, which is why I got a weird date when I tried to convert using 1970 as the origin. As soon as I used as.Date with Excel's origin date of 1900 instead (in R this is written as origin = "1899-12-30"), my dates parsed correctly!
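For a serial that also carries a time of day, the fractional part can be kept by converting to seconds. The sketch below uses base R only; excel_serial_to_datetime is a hypothetical name, and it assumes the 1900 date system and treats the clock time as UTC (Excel itself stores no time zone).

```r
# Hypothetical helper: Excel serial (days, with fraction) -> POSIXct.
excel_serial_to_datetime <- function(serial) {
  # 25569 is the Excel serial of 1970-01-01; 86400 seconds per day.
  # round() guards against floating-point noise in the day fraction.
  secs <- round((serial - 25569) * 86400)
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

x <- excel_serial_to_datetime(43712.429849537039)
format(x, "%Y-%m-%d %H:%M:%S")
# [1] "2019-09-04 10:18:59"
```

Rounding to whole seconds is a design choice for spreadsheet timestamps; drop it if sub-second precision matters.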

as.Date() not giving desired result. (giving NA)

I read a .xlsx file containing Date columns into R and converted it into dataframe.
Some date columns are being read correctly but most of the others are getting converted to "43116" format.
Any attempt to convert it into Date using as.Date(, origin= <>, format=<>) is returning NA.
I have tried all possible solutions, like using 'stringsAsFactors = FALSE', the POSIXct approach, and checking the Excel file for date formats, but nothing worked.
Please help.
It is difficult to recreate the problem if no data is provided, but if you want to convert the number 43129 or the character "43129" to an R date, you should do the following:
a <- 43129
b <- '43129'
format(as.Date(a, origin = "1899-12-30"), '%Y-%m-%d')
[1] "2018-01-29"
format(as.Date(as.integer(b), origin = "1899-12-30"), '%Y-%m-%d')
[1] "2018-01-29"
I used the format yyyy-mm-dd, but any other date format could be used if you format it properly.
Hope it helps!
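If a column comes in as text where some entries are serial numbers and others are already written as dates, the two cases can be handled separately. This is a rough sketch only: parse_mixed_dates is a hypothetical helper, and the "%d-%m-%Y" default is an assumption about how the text dates are written.

```r
# Hypothetical helper: parse a character column that mixes Excel serials
# ("43129") with date strings ("29-01-2018").
parse_mixed_dates <- function(x, format = "%d-%m-%Y") {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))
  # Entries made only of digits (optionally with a fraction) are serials.
  is_serial <- grepl("^[0-9]+(\\.[0-9]+)?$", x)
  out[is_serial] <- as.Date(floor(as.numeric(x[is_serial])),
                            origin = "1899-12-30")
  # Everything else is parsed as text; unparseable entries stay NA.
  out[!is_serial] <- as.Date(x[!is_serial], format = format)
  out
}

res <- parse_mixed_dates(c("43129", "29-01-2018", NA))
res
# [1] "2018-01-29" "2018-01-29" NA
```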

Reading dates from Excel

I have two date columns (formatted as dd-mm-yyyy in Excel) in my data in an Excel sheet.
Date Delivery    Date Collection
06-08-17         15-08-17
11-04-17         15-04-17
24-01-17         24-01-17
11-08-16         14-08-16
There are multiple issues.
Currently I am reading a subset of the data (the top 100 rows, copied manually into another Excel sheet).
Dates that have the same format in Excel are shown differently in R.
They all look like Date.Collection when I read the whole data set.
data <- read.xlsx("file.xlsx", sheetName='subset', startRow=1)
The data output shown in R is
[image not preserved]
I need them all to be shown as in Date.Delivery because I need to write the result back after analysis.
I am also trying to make it Date in R using
dates <- data$Date.Delivery
as.Date(dates, origin = "30-12-1899",format="%d-%m-%y")
To format Date.Collection like Date.Delivery after reading your file, try
# see the str of your data
str(data)
# if Date.Collection is character
data$Date.Collection <- as.numeric(data$Date.Collection)
# if Date.Collection is factor
data$Date.Collection <- as.numeric(levels(data$Date.Collection))[data$Date.Collection]
# conversion
data$Date.Collection <- as.Date(data$Date.Collection - 25569, origin = "1970-01-01")
Or you can use the "gdata" or "XLConnect" packages to read the column as a factor,
then use ymd() from lubridate to convert it into a date:
library(gdata)
library(lubridate)
data <- read.xls(path, sheet = 1, header = TRUE)
data$Date.Collection <- ymd(data$Date.Collection)

Date formatted cell in xlsx files to R

I have an Excel file which has date information in some cells.
I read this file into R by the following command :
library(xlsx)
data.files = list.files(pattern = "*.xlsx")
data <- lapply(data.files, function(x) read.xlsx(x, sheetIndex = 9,header = T))
Everything is correct except the cells with dates! Instead of the date shown in the xlsx file, those cells always contain a number such as 42948.
Does anybody know how I could fix this?
As you can see, after importing your files, dates are represented as numeric values (here 42948). They are actually the internal representation of the date information in Excel. Those values are the ones that R presents instead of the “real” dates.
You can get those dates in R with as.Date(42948 - 25569, origin = "1970-01-01")
Notice that you can also use a vector containing the internal representation of the dates, so this should also work
vect <- c(42948, 42949, 42950)
as.Date(vect - 25569, origin = "1970-01-01")
PS: To convert an Excel datetime column, see this (p.31)
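The 25569 offset used above is simply the number of days between Excel's effective origin and R's. A quick check in base R (the variable name offset is just for illustration):

```r
# Days between Excel's effective origin (1899-12-30) and R's (1970-01-01).
offset <- as.numeric(as.Date("1970-01-01") - as.Date("1899-12-30"))
offset
# [1] 25569

# Subtracting the offset and using the 1970 origin gives the same dates
# as using the 1899-12-30 origin directly.
vect <- c(42948, 42949, 42950)
same <- all(as.Date(vect - offset, origin = "1970-01-01") ==
            as.Date(vect, origin = "1899-12-30"))
same
# [1] TRUE
```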

Export a simple R dataframe to txt tsv or csv

I am trying to do something apparently obvious, but have no way to solve it. From a dataframe in R downloaded from the web as follows I need to save the data. Here is how I do download it:
library(tseries)
library(zoo)
ts <- get.hist.quote(instrument="DJIA",
start="2008-07-01", end="2017-03-05",
quote="Close", provider="yahoo", origin="1970-01-01",
compression="d", retclass="zoo")
This returns an object "ts" that prints as a two-column table: the first column holds the dates (with no header, as R prefers) and the other the "Close" value of DJIA
> ts
Close
2008-07-01 11382.26
2008-07-02 11215.51
2008-07-03 11288.53
2008-07-07 11231.96
.
.
.
2016-03-03 16943.90
2016-03-04 17006.77
I need this data exported in txt or a similar format so that I can import it later (because I will try to process health information, with no internet access), but when I try to save it, the date column with no header is missing. Additionally, a "row number" column is added. I do apologize if the question is obvious, but I have no other option to solve it.
The date column has no header because the dates are stored as row names/index rather than as a regular column. Try requesting the row names explicitly:
write.csv(ts, file = "ts.csv",row.names=TRUE)
EDIT
Strangely, this doesn't work with an object of class "zoo".
According to ?write.table:
write.table prints its required argument x (after converting it to a
data frame if it is not one nor a matrix) to a file or connection.
Apparently this conversion fails somehow. However, this works:
write.csv(data.frame(ts), file = "ts.csv",row.names=TRUE)
The ts object is a zoo object (not a two column table). In this case the zoo object is internally represented by a one column matrix of data and an "index" attribute holding the dates.
1) save/load If the only thing you want to do with the output file is to read it back into R later then there is no reason to require text and any format will do. In particular you could do this:
save(ts, file = "ts.Rda")
Now in a later session:
library(zoo)
load("ts.Rda")
1a) This would also work and produces an R source file that when sourced reconstructs the zoo object:
dump("ts", "ts.R")
and in a later session:
library(zoo)
source("ts.R")
2) write.zoo/read.zoo This will give a text file:
write.zoo(ts, "ts.dat")
and it can be read back in another session using:
library(zoo)
ts <- cbind( read.zoo("ts.dat", header = TRUE) )
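If a plain CSV with an explicit date column is preferred, one more option is to build a data frame by hand and read it back with the column classes stated. The sketch below uses base R only and assumes the dates and closing values are already available as plain vectors (for a zoo object these would come from index() and coredata()); the column names Date and Close and the temporary file path are illustrative choices.

```r
# Stand-in vectors taken from the first rows printed in the question.
dates  <- as.Date(c("2008-07-01", "2008-07-02", "2008-07-03"))
closes <- c(11382.26, 11215.51, 11288.53)

# Put the dates in a named column so they are written with a header,
# and suppress row names so no extra row-number column appears.
out  <- data.frame(Date = dates, Close = closes)
path <- file.path(tempdir(), "ts.csv")
write.csv(out, path, row.names = FALSE)

# Read it back later, asking for the first column to be parsed as Date.
back <- read.csv(path, colClasses = c("Date", "numeric"))
str(back)
```

Stating colClasses on the way back in avoids having the dates arrive as character strings.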
