I have two date columns (formatted as dd-mm-yyyy in Excel) in an Excel sheet.
Date Delivery    Date Collection
06-08-17         15-08-17
11-04-17         15-04-17
24-01-17         24-01-17
11-08-16         14-08-16
There are multiple issues.
Currently I am reading a subset of the data (the top 100 rows, copied manually into another sheet).
Dates that share the same format in Excel are shown differently in R.
When I read the whole data set, they all look like the values in Date.Collection.
data <- read.xlsx("file.xlsx", sheetName='subset', startRow=1)
The data output shown in R is:
[screenshot of the R output not preserved]
I need them all to be shown as in Date.Delivery because I need to write the result back after analysis.
I am also trying to convert the column to Date in R using
dates <- data$Date.Delivery
as.Date(dates, origin = "30-12-1899",format="%d-%m-%y")
To format Date.Collection like Date.Delivery after reading your file, try:
# inspect the structure of your data
str(data)
# if Date.Collection is character
data$Date.Collection <- as.numeric(data$Date.Collection)
# if Date.Collection is factor
data$Date.Collection <- as.numeric(levels(data$Date.Collection))[data$Date.Collection]
# conversion
data$Date.Collection <- as.Date(data$Date.Collection - 25569, origin = "1970-01-01")
Alternatively, read the file with the "gdata" or "XLConnect" package so that the column comes in as a factor, then use ymd() from lubridate to convert it to a date:
library(gdata)
library(lubridate)  # provides ymd()
# path is the location of your Excel file
data <- read.xls(path, sheet = 1, header = TRUE)
data$Date.Collection <- ymd(data$Date.Collection)
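Since the goal is to write the result back in the same dd-mm-yy form as Date.Delivery, the round trip can be sketched in base R as follows. This is only a sketch: it assumes Date.Collection arrived as Excel serial numbers stored as text and that the workbook uses the Windows 1900 date system. The sample values and the Collection.Text column name are made up for illustration.

```r
# Hypothetical sample: serial numbers as they might arrive for Date.Collection
data <- data.frame(Date.Collection = c("42962", "42840", "42759"),
                   stringsAsFactors = FALSE)

# Serial number -> Date (assumes the Windows 1900 date system)
data$Date.Collection <- as.Date(as.numeric(data$Date.Collection),
                                origin = "1899-12-30")

# Date -> dd-mm-yy text, matching how Date.Delivery is displayed
# (Collection.Text is an illustrative column name)
data$Collection.Text <- format(data$Date.Collection, "%d-%m-%y")
print(data)
```

Keeping the Date column for the analysis and formatting to text only at the point of writing avoids sorting or subtracting strings by accident.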
If you open up a new instance of Excel 2016 and enter a date in the following format 8/15 it will output in Excel like this:
Excel's format menu identifies this as Custom > d-mmm. Now let's move on to the R part of this question. I'll create a simple data frame:
df <- data.frame(col1 = as.Date(c("2021-01-01", "2021-02-15")), col2 = c(3, 9))
#> col1 col2
#> 1 2021-01-01 3
#> 2 2021-02-15 9
I want to write this data frame to an Excel file, and default to Excel's Custom > d-mmm date format (as shown above).
My first attempt is quite generic, and uses the writexl package:
library(writexl)
write_xlsx(df, "df.xlsx")
Excel ends up reading these dates in its yyyy-mm-dd format, not the Custom > d-mmm format I want. Let me get a little more sophisticated and try the openxlsx package instead:
library(openxlsx)
options(xlsx.date.format = "d-mmm")
write.xlsx(df, file = "df.xlsx", asTable = TRUE)
This time, the output is in the Excel format Date > *3/14/2012. How can I get the date to be recognized in Excel, in the Excel Custom > d-mmm format? The output would look like this, and should not involve any manual Excel steps:
Set the date format globally using the option name given in the openxlsx documentation (openxlsx.dateFormat rather than xlsx.date.format):
options(openxlsx.dateFormat = "d-mmm")
Documentation:
https://rdrr.io/cran/openxlsx/man/openxlsx.html
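Put together with the data frame from the question, the whole write step might look like the sketch below. It assumes the openxlsx package is installed, writes to a temporary file purely for illustration, and reads the file back only to confirm that the underlying dates survived. The d-mmm display format itself has to be checked in Excel.

```r
library(openxlsx)

df <- data.frame(col1 = as.Date(c("2021-01-01", "2021-02-15")),
                 col2 = c(3, 9))

# Global default for Date columns written by openxlsx
options(openxlsx.dateFormat = "d-mmm")

out <- tempfile(fileext = ".xlsx")  # temporary path, for illustration only
write.xlsx(df, file = out, asTable = TRUE)

# Read the sheet back; dates come back as serial numbers by default,
# so convert them with openxlsx's own helper
check <- read.xlsx(out)
print(convertToDate(check$col1))
```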
I have multiple Excel files (160) in which one 'date' column of about 100 observations is not in the proper format. When I import all the files together, the date column changes as follows:
Date Column in Excel
Date
05-07-2015
04-07-2015
03-07-2015
02-07-2015
.......
Date column after importing into R
Date
42190
42189
42188
42187
......
How can I change "42191" back to the original date format?
Excel may store dates as numbers, or they may have been imported in a numeric format. So you can try:
# from Windows Excel:
as.Date(42190, origin = "1899-12-30")
#> [1] "2015-07-05"
# from Mac Excel:
as.Date(42190, origin = "1904-01-01")
Interestingly, the Excel support page defines the origin date for Windows Excel as "1900-01-01", but in R "1899-12-30" should be used as the origin date.
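The two-day difference has a simple cause: Excel numbers 1900-01-01 as day 1 rather than day 0, and its 1900 date system also contains a 29 February 1900 that never existed. A small base-R sketch, assuming the Windows 1900 date system, shows where the "1899-12-30" origin agrees with Excel and where it does not:

```r
# Serial numbers as used by Excel's 1900 date system
serial <- c(1, 59, 61, 42190)

converted <- as.Date(serial, origin = "1899-12-30")
print(converted)

# Excel displays serial 1 as 1900-01-01 and serial 59 as 1900-02-28,
# while serial 60 is the non-existent 1900-02-29. The 1899-12-30 origin
# therefore lines up with Excel only from serial 61 (1900-03-01) onwards.
```

For any realistic modern date the origin is safe. Only dates in January and February 1900 come out one day early.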
Alternatively, copy your date column, paste it into another column with Paste Special > Values, and import that column into R.
I've found this to be quite helpful:
library(openxlsx)
dates <- c(42190,42189,42188,42187)
datesConverted <- convertToDate(dates); datesConverted
# "2015-07-05" "2015-07-04" "2015-07-03" "2015-07-02"
Gives you exactly what you're looking for.
I have an Excel file which has date information in some cells, like this:
I read this file into R with the following commands:
library(xlsx)
# pattern is a regular expression, so escape the dot and anchor the extension
data.files <- list.files(pattern = "\\.xlsx$")
data <- lapply(data.files, function(x) read.xlsx(x, sheetIndex = 9, header = TRUE))
Everything is correct except the cells with dates! Instead of the date shown in the xlsx file, I always get 42948:
Does anybody know how I could fix this?
As you can see, after importing your files, dates are represented as numeric values (here 42948). These are Excel's internal representation of the dates, and R shows them instead of the “real” dates.
You can get those dates in R with as.Date(42948 - 25569, origin = "1970-01-01"), where 25569 is the number of days between Excel's origin (1899-12-30) and R's (1970-01-01).
Notice that you can also pass a vector of these internal values, so this should also work:
vect <- c(42948, 42949, 42950)
as.Date(vect - 25569, origin = "1970-01-01")
PS: To convert an Excel datetime column, see this (p.31)
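For date-time cells, the whole part of the serial number is the day and the fractional part is the time of day, so the same offset can be applied in seconds. The helper below is a hypothetical sketch (the name excel_to_datetime is not from any package). It assumes the Windows 1900 date system and treats the values as UTC, since Excel stores no time zone.

```r
# Hypothetical helper: Excel serial date-times -> POSIXct
excel_to_datetime <- function(x) {
  # 25569 = days from 1899-12-30 to 1970-01-01; 86400 = seconds per day.
  # round() guards against floating-point noise in the day fraction.
  as.POSIXct(round((x - 25569) * 86400), origin = "1970-01-01", tz = "UTC")
}

stamps <- excel_to_datetime(c(42948, 42948.5, 42948.75))
print(format(stamps, "%Y-%m-%d %H:%M:%S"))
```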
There is a large dataset that I need to download from the web using R, and I would like to learn how to filter it while downloading so that I keep only the dates I need. Right now I download and unzip the file, then create another data set with a filter. The file is a ";"-delimited text file.
There is a Date column in the format 1/1/2009, and I need to select only two dates, 3/1/2009 and 3/2/2009. How do I do that in R?
When I import it, R sets the column as a factor. Since I only need those two dates and there is no need for a between-style range, I can just select the two factor levels and call it a day.
Thanks!
I don't think you can filter while downloading. To select only these dates you can use the subset function:
# do not convert strings to factors
d.all <- read.csv(file, sep = ";", stringsAsFactors = FALSE)  # add further read.csv arguments as needed
# the date column is assumed to be called DATE:
d.filter <- subset(d.all, DATE %in% c("3/1/2009", "3/2/2009"))
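Matching on strings works for two exact values, but parsing the column into Date makes the filter robust to formatting differences and allows range comparisons later. A minimal sketch follows, using inline sample data in place of the downloaded file. The column names DATE and VALUE and the month/day/year reading of 3/1/2009 are assumptions.

```r
# Hypothetical sample standing in for the downloaded ";"-delimited file
raw <- "DATE;VALUE
1/1/2009;10
3/1/2009;20
3/2/2009;30
3/3/2009;40"

d.all <- read.csv(text = raw, sep = ";", stringsAsFactors = FALSE)

# Parse as month/day/year (an assumption; use "%d/%m/%Y" for day/month/year)
d.all$DATE <- as.Date(d.all$DATE, format = "%m/%d/%Y")

# Keep only the two dates from the question
wanted <- as.Date(c("2009-03-01", "2009-03-02"))
d.filter <- subset(d.all, DATE %in% wanted)
print(d.filter)
```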
I want to import an Excel file into R. However, the file has column headers such as Jan-13, Jan-14 and so on. When I import the data using the readxl package, it converts these dates into numbers by default, so the headers that should be dates are now numbers.
I am using the code :
library(readxl)
data = read_excel("FileName", col_names = TRUE, skip = 0)
Can someone please help?
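The date information is still there. It's just in the wrong format. This should work:
# the names are Excel serial numbers stored as text; "%b-%y" assumes headers like Jan-13 mean month-year
names(data) <- format(as.Date(as.numeric(names(data)), origin = "1899-12-30"), "%b-%y")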
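If the sheet also has ordinary text headers, converting every name would turn those into NA. The base-R sketch below converts only names made up entirely of digits. The sample data frame is a made-up stand-in for what read_excel() returns, and the month-year reading of the headers is an assumption.

```r
# Hypothetical stand-in for the imported sheet: date headers arrive as
# serial numbers in text form, other headers are left as they were
data <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(data) <- c("Region", "41275", "41640")

# Convert only the names that consist entirely of digits
is_serial <- grepl("^[0-9]+$", names(data))
names(data)[is_serial] <- format(
  as.Date(as.numeric(names(data)[is_serial]), origin = "1899-12-30"),
  "%b-%y"  # month abbreviations follow the current locale
)
print(names(data))
```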