Convert a character column to dates in R

I am trying to convert a data column (x_date) whose values look like "31.03.2013" (the class is "character") into Dates of the form "2013-03-31".
I tried the following code:
as.Date(x_date, format = "%d-%m-%Y")
as.Date(x_date, format="%Y-%m-%d")
as.Date(x_date,format= "%Y-%m-%d", tryFormats = c("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y"), optional=FALSE )
In all three cases the entire column turns into NA.
Then I tried this code:
format.Date(x_date, format="%Y-%m-%d")
and I get an error.
Can anybody help me to convert my column into the respective Dates?

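Pass the format that actually matches the data (day.month.year, separated by dots) in the format argument. tryFormats is only consulted when format is not supplied, so it was being ignored: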
as.Date(x_date, format = '%d.%m.%Y')
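As a minimal sketch of the answer above, the following uses two made-up sample values (the original x_date column is not shown) to compare a format string that does not match the data with one that does:

```r
# Hypothetical sample values standing in for the x_date column
x_date <- c("31.03.2013", "01.04.2013")

# The separators in the format string must match the data; "-" does not match "."
wrong <- as.Date(x_date, format = "%d-%m-%Y")
wrong
# [1] NA NA

# Day.month.year with dots matches, so parsing succeeds
parsed <- as.Date(x_date, format = "%d.%m.%Y")
parsed
# [1] "2013-03-31" "2013-04-01"
```

A Date object always prints as year-month-day, so no extra step is needed to get the "2013-03-31" form.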

Related

Change factor into time format

A column contains only times in H:M:S format (e.g. 13:08:20), but it is stored as a factor. I want to convert the factor to POSIXct so that I can apply the ceiling_date() function to it.
I have tried the following. Some of them run without error, but the whole column then contains only NA values:
x <- anytime(cam5$CaptureTime)
x <- hms(cam5$CaptureTime)
x <- hms(as.character(cam5$CaptureTime))
x <- as.POSIXct(cam5$CaptureTime)
x <- as.POSIXct(as.character(cam5$CaptureTime))
We can use as.POSIXct and specify the format (because the string has no date part, the current date is filled in)
as.POSIXct("13:08:20", format = "%T")
Or spell out the components explicitly
as.POSIXct("13:08:20", format = "%H:%M:%S")
This would also work with strptime
strptime("13:08:20", format = "%T")
We can also use hms from lubridate (note that this returns a Period object rather than POSIXct)
library(lubridate)
hms("13:08:20")
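To connect this back to the factor column in the question, here is a rough base-R sketch. The capture_time vector is a made-up stand-in for cam5$CaptureTime, and the fixed date "1970-01-01" and the UTC time zone are assumptions chosen only so the result does not depend on the day the code is run:

```r
# Hypothetical stand-in for cam5$CaptureTime (a factor of H:M:S strings)
capture_time <- factor(c("13:08:20", "07:45:00"))

# Convert the factor to character, prepend an arbitrary fixed date,
# then parse with an explicit format and time zone
x <- as.POSIXct(paste("1970-01-01", as.character(capture_time)),
                format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

format(x, "%Y-%m-%d %H:%M:%S")
# [1] "1970-01-01 13:08:20" "1970-01-01 07:45:00"
```

The resulting POSIXct vector can then be passed to lubridate's ceiling_date(), as the question intends.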

How to convert a numeric value into a Date value

So, I have a data.frame with a column called Date.birth, but I have these values in a numeric format:
Date.birth
43067
43060
It is probably a formatting problem, but I need the values in a date format like this:
Date.birth
11/28/17
11/21/17
The format above is the one I want. I tried this command:
as.Date(levels(data$Date.birth), format="%d.%m.%Y")
but it didn't work. Does anyone have a suggestion?
Thanks.
If the values are numeric, we need to specify the origin
as.Date(data$Date.birth, origin = "1899-12-30")
e.g.
as.Date(43067, origin = "1899-12-30")
#[1] "2017-11-28"
After converting to Date class, if it needs to be in a custom format, use format
format(as.Date(43067, origin = "1899-12-30"), "%m/%d/%y")
#[1] "11/28/17"
If your column is a factor, convert it to numeric first (via character, so that you get the values rather than the level codes)
as.Date(as.numeric(as.character(data$Date.birth)), origin = "1899-12-30")
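The detour through as.character matters because as.numeric applied directly to a factor returns the internal level codes. A small sketch, using a made-up factor f in place of data$Date.birth:

```r
# Hypothetical factor standing in for data$Date.birth
f <- factor(c("43067", "43060"))

# Directly converting a factor gives the level codes, not the values
codes <- as.numeric(f)
codes
# [1] 2 1

# Going through character recovers the actual serial numbers
values <- as.numeric(as.character(f))
values
# [1] 43067 43060

dates <- as.Date(values, origin = "1899-12-30")
dates
# [1] "2017-11-28" "2017-11-21"
```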
If this is an Excel numeric date, janitor has a great solution:
library(janitor)
excel_numeric_to_date(data$Date.birth)
It can be done simply with the lubridate package:
lubridate::as_date(as.numeric(dt$Date.birth),origin="1899-12-30")
[1] "2017-11-28" "2017-11-21"
Sample data:
dt <- read.table(text="Date.birth
43067
43060",header=T)
Try this:
library(dplyr)
library(janitor)
VariableName <- dt %>%
  mutate(Date.birth = format(excel_numeric_to_date(as.numeric(Date.birth)), "%m/%d/%y"))
# Date.birth now holds the strings "11/28/17" and "11/21/17"

How can I reformat a series of dates in a vector in R

I have a vector of dates that I'm trying to convert with as.Date, but I'm not getting the expected output. When I use sapply with as.Date, instead of a series of reformatted dates I get the original strings as names and some odd numeric values.
dates = c("20-Mar-2015", "25-Jun-2015", "23-Sep-2015", "22-Dec-2015")
sapply(dates, as.Date, format = "%d-%b-%Y")
20-Mar-2015 25-Jun-2015 23-Sep-2015 22-Dec-2015
16514 16611 16701 16791
I would like each value in the vector to show the newly formatted value, i.e. what I get when as.Date is applied to a single element:
as.Date("20-Mar-2015", format = "%d-%b-%Y")
[1] "2015-03-20"
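You can directly use as.Date(dates, format = "%d-%b-%Y"). as.Date is vectorized, i.e. it accepts a whole vector as input, not only a single entry. The odd values appear because sapply simplifies its result to a plain numeric vector and drops the Date class, leaving the underlying day counts since 1970-01-01.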
In your case:
dates <- c("20-Mar-2015", "25-Jun-2015", "23-Sep-2015", "22-Dec-2015")
as.Date(dates, format = "%d-%b-%Y")
# [1] "2015-03-20" "2015-06-25" "2015-09-23" "2015-12-22"
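To illustrate where the numbers in the question come from, this short sketch strips the class from the converted dates and then formats them differently. The "%d/%m/%Y" output format is an arbitrary choice for illustration, and parsing "%b" assumes an English locale for month abbreviations:

```r
dates <- c("20-Mar-2015", "25-Jun-2015", "23-Sep-2015", "22-Dec-2015")

# Vectorized conversion keeps the Date class
# (assumes an English locale so that "Mar", "Jun", ... are recognized)
d <- as.Date(dates, format = "%d-%b-%Y")

# Removing the class reveals the day counts that sapply returned
day_counts <- unclass(d)
day_counts
# [1] 16514 16611 16701 16791

# format() turns the dates back into strings in any layout
reformatted <- format(d, "%d/%m/%Y")
reformatted
# [1] "20/03/2015" "25/06/2015" "23/09/2015" "22/12/2015"
```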

Trouble finding non-unique index entries in zooreg time series

I have several years of data that I'm trying to work into a zoo object (.csv at Dropbox). I get a warning once the data is coerced into a zoo object, but I cannot find any duplicates in the index.
df <- read.csv(choose.files(default = "", caption = "Select data source", multi = FALSE), na.strings="*")
df <- read.zoo(df, format = "%Y/%m/%d %H:%M", regular = TRUE, row.names = FALSE, col.names = TRUE, index.column = 1)
Warning message:
In zoo(rval3, ix) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
I've tried:
sum(duplicated(df$NST_DATI))
But the result is 0.
Thanks for your help!
You are using read.zoo(...) incorrectly. According to the documentation:
To process the index, read.zoo calls FUN with the index as the first
argument. If FUN is not specified then if there are multiple index
columns they are pasted together with a space between each. Using the
index column or pasted index column: 1. If tz is specified then the
index column is converted to POSIXct. 2. If format is specified then
the index column is converted to Date. 3. Otherwise, a heuristic
attempts to decide among "numeric", "Date" and "POSIXct". If format
and/or tz is specified then they are passed to the conversion function
as well.
You are specifying format=..., so read.zoo(...) converts the index to Date, not POSIXct. The time of day is discarded, so all timestamps on the same day collapse to the same index value, which produces many duplicated dates.
The obvious fix would be:
df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M")
# Error in read.zoo(df, FUN = as.POSIXct, format = "%Y/%m/%d %H:%M") :
# index has bad entries at data rows: 507 9243 18147 26883 35619 44355
but as you can see, this does not work either. Here the problem is more subtle. The index is converted to POSIXct, but in the system time zone (which on my system is US Eastern). The referenced rows have timestamps that coincide with the changeover from standard time to daylight saving time (DST), so these times do not exist in the US Eastern time zone. If you use:
df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M", tz="UTC")
the data imports correctly.
EDIT:
As @G.Grothendieck points out, this would also work, and is simpler:
df <- read.zoo(df, tz="UTC")
You should set tz to whatever time zone is appropriate for the dataset.
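The effect of the two conversions can be sketched in base R without zoo. The three timestamps below are invented for illustration (they are not taken from the asker's file) and fall on a date when US Eastern time switched to DST:

```r
# Hypothetical timestamps in the same layout as the question's index column
stamps <- c("2015/03/08 01:00", "2015/03/08 02:30", "2015/03/08 03:00")
fmt <- "%Y/%m/%d %H:%M"

# Converting to Date drops the time of day, so all three collapse to one value
as_dates <- as.Date(stamps, format = fmt)
anyDuplicated(as_dates) > 0
# [1] TRUE

# Converting to POSIXct in UTC keeps every timestamp distinct and valid
as_utc <- as.POSIXct(stamps, format = fmt, tz = "UTC")
anyDuplicated(as_utc) > 0
# [1] FALSE
anyNA(as_utc)
# [1] FALSE
```

In a zone that observes DST, such as US Eastern, 02:30 on this date falls in the skipped hour. How as.POSIXct treats such a time can depend on the platform, which is why fixing tz explicitly is the safer choice.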

Default Axis Format of POSIXct

Is there a way to change the default format of POSIXct axis labels when using plot on columns of a data frame (Date HH:MM instead of just HH:MM)?
It would be nice if I could do this without having to issue an axis command each time or convert the data frame to an xts object.
Answer goes to Vincent Zoonekynd.
You can use the format argument of the plot function to label the axis in "%Y-%m-%d %H:%M" format.
Please see the code below:
df <- data.frame(
ms = c(10485849612, 10477641600, 10561104000, 10562745600),
value = 1:4
)
df$posix_time <- as.POSIXct(df$ms, origin = "1582-10-14", tz = "GMT")
plot(df$posix_time, df$value, format = "%Y-%m-%d %H:%M")