Correct date initially as factor in R

I have dates in a format like 31.3.14. When I do:
as.Date(gsub("\\.","-", "31.3.14"))
I get "0031-03-14", but what I need is 31-03-2014.
I need this to work for arbitrary dates, e.g. 31.3.99 should become 31-03-1999.
I don't know how to remove the 00 in front of 31 and add 20 in front of 14 to get 31-03-2014, and do the same for dates like 31.3.99.

Try parsing with as.Date and an explicit input format, then wrapping the result in format:
x <- c("31.3.14", "31.3.99")
format(as.Date(x, "%d.%m.%y"), "%d-%m-%Y")
# [1] "31-03-2014" "31-03-1999"
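
Since the title mentions that the dates start out as a factor, here is a minimal sketch of the same conversion applied to a factor. The variable names f, d and out are made up for illustration, and the explicit as.character() step is a defensive assumption so that the level labels are parsed rather than the underlying integer codes.

```r
# Hypothetical input: the dates arrive as a factor
# (e.g. from read.csv with stringsAsFactors = TRUE)
f <- factor(c("31.3.14", "31.3.99"))

# Convert the factor to its labels first, then parse with the input format
d <- as.Date(as.character(f), format = "%d.%m.%y")

# Date objects always print as yyyy-mm-dd; format() builds the display string
out <- format(d, "%d-%m-%Y")
out
```

Note that out is a character vector again; keep d around if you still need to sort or do date arithmetic.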

Related

Convert character date in R fread to show four digit year format

I have a csv file that displays dates as 14-Mar-20, i.e. in day-month-year format. The underlying stored value, however, is 3/14/2020.
When I read this file into R with fread, the column comes in as character, e.g. 14-Mar-20. I converted it to a date with as.Date(x, format = "%d-%h-%Y").
The issue is that in R the year shows as "20" (two digits). I want to read the data into R with a four-digit year. I don't want to paste the string 20 in front to make it 2020, because there can also be years like 1948. No amount of formatting with %Y helps.
Is there a way to read the csv file so that the date comes in as 14/Mar/2020, or a way in R to turn the years into four digits without pasting 20 onto the year?
Sample Data
c("12-Dec-14", "19-Dec-14", "12-Dec-14", "19-Dec-14", "12-Dec-14",
"26-Dec-14")
Expected Output:
12-Dec-2014, 19-Dec-2014, ...
Note: in the csv file the value is stored as 12/12/2014 but formatted to display as 12-Dec-14, so when I pull the data into R it comes in as 12-Dec-14.
Maybe this could help
> strftime(as.Date(d,format = "%d-%b-%y"),format = "%d-%b-%Y")
[1] "12-Dec-2014" "19-Dec-2014" "12-Dec-2014" "19-Dec-2014" "12-Dec-2014"
[6] "26-Dec-2014"
Data
d <- c("12-Dec-14", "19-Dec-14", "12-Dec-14", "19-Dec-14", "12-Dec-14",
"26-Dec-14")
You want to use %y (two-digit year) rather than %Y (four-digit year) when parsing.
From the documentation of strptime():
Year without century (00–99). On input, values 00 to 68 are prefixed
by 20 and 69 to 99 by 19 – that is the behaviour specified by the 2018
POSIX standard, but it does also say ‘it is expected that in a future
version the default century inferred from a 2-digit year will change’.
You can use as.Date to convert the strings to class Date.
x <- c("12-Dec-14", "19-Dec-14", "12-Dec-14", "19-Dec-14", "12-Dec-14", "26-Dec-14")
x1 <- as.Date(x, '%d-%b-%y')
x1
#[1] "2014-12-12" "2014-12-19" "2014-12-12" "2014-12-19" "2014-12-12" "2014-12-26"
If you want the dates displayed in a specific format, use format on the Date values.
x2 <- format(x1, '%d-%b-%Y')
x2
#[1] "12-Dec-2014" "19-Dec-2014" "12-Dec-2014" "19-Dec-2014" "12-Dec-2014" "26-Dec-2014"
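
The century rule quoted from the strptime() documentation matters for the years like 1948 mentioned in the question: with %y, a two-digit 48 is read as 2048, not 1948. The following is only a sketch of one possible workaround, assuming that no date in the data can lie after some known cutoff. The helper shift_future_century and its latest argument are hypothetical names, not part of base R, and numeric months are used so the example does not depend on the locale's month abbreviations.

```r
# Two-digit years: 00-68 are read as 20xx, 69-99 as 19xx
x <- c("14.03.20", "02.07.48", "31.12.68", "01.01.69")
parsed <- as.Date(x, format = "%d.%m.%y")
years <- format(parsed, "%Y")
years

# Hypothetical helper: move any date later than `latest` back by one century.
# Assumes the data cannot contain dates after `latest`.
shift_future_century <- function(d, latest = Sys.Date()) {
  lt <- as.POSIXlt(d)
  in_future <- !is.na(d) & d > latest
  lt$year[in_future] <- lt$year[in_future] - 100
  as.Date(lt)
}

fixed <- shift_future_century(parsed, latest = as.Date("2024-01-01"))
fixed_txt <- format(fixed, "%d-%m-%Y")
fixed_txt
```

Whether a cutoff like this is appropriate depends on the data; if the file really stores a four-digit year, reading that stored value is more reliable than guessing the century.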

Converting a column of integers that aren't in date format already into abbreviated months

I'm trying to convert a column of integers into abbreviated month names. The column has numbers like 01 02 04 15 13, etc. I want these numbers to show the month they correspond to. Could someone please tell me how? The code I'm trying is this:
#Changing integers to Month Abbrev.
dets_per_month$monthcollected = as.POSIXlt(dets_per_month$monthcollected, format = "%m", origin = "%m")
but I realize the column doesn't have an origin because it's not in date format.
Index the built-in month.abb vector with the integer month:
month.abb[as.integer(dets_per_month$monthcollected)]
I would recommend the lubridate package for all things date-time related. It's a nifty package and has more utility than base R, but YMMV.
library(lubridate)
x <- rep(1:12, 2)
lubridate::month(x, label=TRUE)
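
As a small base R sketch of the month.abb approach, using the sample values from the question: values outside 1 to 12 (such as 15 and 13) have no matching month and come back as NA. The vector names below are made up for illustration.

```r
# Sample values from the question, stored as character with leading zeros
monthcollected <- c("01", "02", "04", "15", "13")

# as.integer() drops the leading zero; month.abb is a built-in constant
m <- as.integer(monthcollected)
abbr <- month.abb[m]
abbr

# Optional: a factor with calendar-ordered levels, so tables and plots
# sort Jan, Feb, ... instead of alphabetically
abbr_f <- factor(abbr, levels = month.abb)
```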

Convert time (24 hours format) in character type to time in R

I have a data frame with time in character format, I need to convert it into time format. I have tried using strptime and POSIXct but it adds the date also. I just need the time.
For example: TRK_DEP_TIME <- c("22:00", "14:30", ...) (character data type)
Doing as.POSIXct(TRK_DEP_TIME, format = "%H:%M")
gives ("10/11/17,22:00", "10/11/17, 14:30", ...)
I am looking for just the time, I don't need the date to be associated with it. Is there any way I can achieve this?
Use chron "times" class:
library(chron)
ch <- c("22:00", "14:30") # test input (character)
times(paste0(ch, ":00"))
## [1] 22:00:00 14:30:00
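
If adding a package is not an option, a rough base R alternative is to store the time of day as a duration since midnight. This is only a sketch, assuming every value has the form HH:MM; the names parts, secs, dep and back are illustrative.

```r
ch <- c("22:00", "14:30")  # test input (character)

# Split "HH:MM" and convert to whole seconds since midnight
parts <- strsplit(ch, ":", fixed = TRUE)
secs <- vapply(
  parts,
  function(p) as.integer(p[1]) * 3600L + as.integer(p[2]) * 60L,
  integer(1)
)

# A difftime carries no date, but still supports sorting and arithmetic
dep <- as.difftime(secs, units = "secs")

# Format back to "HH:MM" for display
back <- sprintf("%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L)
back
```

The difftime keeps the values numeric for comparisons, while back is the plain character form for printing.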

Date Transformation in R

I'm facing a very minor issue, but somehow can't resolve it.
When I import a csv file that has a date column, the date comes in "%Y-%m-%d" format, but I want it in "%d-%m-%Y" format. I tried as.Date to transform it, but it's not working.
The data structure looks like this after importing:
Date Share_Val
21/01/2015 20
22/01/2015 19
23/01/2015 21
24/01/2015 23
25/01/2015 26
But when I import the file with read.csv, the data looks like the following:
Date Share_Val
01/21/2015 20
01/22/2015 19
01/23/2015 21
01/24/2015 23
01/25/2015 26
I tried lubridate, but it didn't help.
Sam's result comes out exactly the way I wanted. But when I try the following, it doesn't work:
data$date<-format(as.Date(data$date,"%m/%d/%Y"))
Can anybody please give me any suggestions?
See if this helps. Note the stringsAsFactors. If your Date field is a factor, you will need data$Date <- as.character(data$Date) first
data <- data.frame(Date = c("21/01/2015", "22/01/2015", "23/01/2015",
"24/01/2015", "25/01/2015"), Share_Val=c(20, 19, 21, 23, 26),
stringsAsFactors=FALSE)
format(as.Date(data$Date, "%d/%m/%Y"), "%d-%m-%Y")
[1] "21-01-2015" "22-01-2015" "23-01-2015" "24-01-2015" "25-01-2015"
Too long for a comment.
I think you may be misunderstanding how Dates work in R. A variable (or column) of class Date is stored internally as the number of days since 1970-01-01. When you print a Date variable, it is displayed using the %Y-%m-%d format. The as.Date(...) function converts character to Date. The format=... argument controls how the character string is interpreted, not how the result is displayed, as in:
as.Date("02/05/2015", format="%m/%d/%Y")
# [1] "2015-02-05"
as.Date("02/05/2015", format="%d/%m/%Y")
# [1] "2015-05-02"
So in the first case the string is interpreted as 05 Feb, in the second 02 May. Note that in both cases the result is displayed (printed) in %Y-%m-%d format.
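
To make the distinction between parsing and display concrete, here is a short sketch. The first part shows the number stored inside a Date; the second applies the two-step pattern (parse with the input format, then format() for display) to month-first strings like the ones read.csv produced in the question. Variable names are illustrative.

```r
# Parsing: the format argument describes the INPUT string
d <- as.Date("02/05/2015", format = "%m/%d/%Y")

# Internally a Date is just a day count since 1970-01-01
day_count <- unclass(d)
day_count

# Display: format() turns the Date back into a character string
shown <- format(d, "%d-%m-%Y")
shown

# Same pattern for the month-first strings from the question
imported <- c("01/21/2015", "01/22/2015")
wanted <- format(as.Date(imported, format = "%m/%d/%Y"), "%d-%m-%Y")
wanted
```

The result of format() is character, not Date, so it is usually best applied only at the point of output.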

lm and time series formats - is conversion necessary?

I want the slope from a couple of columns that look like this:
date time
7/8/2014 23.4917166
7/9/2014 28.69671107
7/10/2014 27.3262166
7/11/2014 30.25426663
7/12/2014 29.8345944
7/13/2014 27.7473055
7/14/2014 29.8657722
7/15/2014 29.2622055
The problem is, lm() doesn't seem to play ball with date in a mm/dd/yyyy format. If I make the date data numeric like so:
date time
1 23.4917166
2 28.69671107
3 27.3262166
4 30.25426663
5 29.8345944
6 27.7473055
7 29.8657722
8 29.2622055
and run something like
timetest <- read.table("clipboard", sep="\t", header=TRUE)
test <- lm(time ~ date, data=timetest)
coefficients(test)[2]
I get:
date
0.5605038
So how should I go about transforming the mm/dd/yyyy date format into something numeric? Is there a function to cast them as unix time?
If you first convert the date field to class Date, specifying the input format (MM/DD/YYYY, i.e. %m/%d/%Y), then lm does the numeric conversion for you:
timetest$new_date <- as.Date(timetest$date, format = "%m/%d/%Y")
So, the regression looks like
test <- lm(time ~ new_date, data = timetest)
coefficients(test)[2]
and gives
new_date
0.5605038
Note that as.numeric turns the date into the number of days since 1970-01-01
as.numeric(timetest$new_date[1])
[1] 16259
and
difftime(timetest$new_date[1], as.Date("1970-01-01"))
Time difference of 16259 days
You can also use predict to obtain fitted values for new dates given in the format of the original field:
predict(test, data.frame(new_date =
seq.Date(as.Date("7/16/2014", format = "%m/%d/%Y"),
as.Date("7/20/2014", format = "%m/%d/%Y"), by = 1)))
that returns
1 2 3 4 5
30.83212 31.39262 31.95312 32.51363 33.07413
For some reason the as.POSIXct() wasn't working, so I went with:
timetest <- read.table("clipboard", sep="\t", header=TRUE)
timetest$date <- as.numeric(as.Date(timetest$date, "%m/%d/%Y"))
test <- lm(time ~ date, data=timetest)
coefficients(test)[2]
The new second line just overwrites the original m/d/y data with numeric values. Unix time is not necessary for this process.
If you want to convert your dates into Unix time, you can use something like
unix_time<-as.numeric(as.POSIXct(date, format="%m/%d/%Y"))
but the values will end with a lot of zeros, so I'm not sure how useful they'll be in a regression.
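
To compare the two numeric encodings discussed in this thread, here is a self-contained sketch that rebuilds the question's data inline instead of reading the clipboard. The column names day_number and unix_time are made up, and tz = "UTC" is an assumption chosen so that each day is exactly 86400 seconds. The only difference between the fits is the unit of the slope: per day versus per second.

```r
# Data from the question, typed in directly
timetest <- data.frame(
  date = c("7/8/2014", "7/9/2014", "7/10/2014", "7/11/2014",
           "7/12/2014", "7/13/2014", "7/14/2014", "7/15/2014"),
  time = c(23.4917166, 28.69671107, 27.3262166, 30.25426663,
           29.8345944, 27.7473055, 29.8657722, 29.2622055),
  stringsAsFactors = FALSE
)

# Days since 1970-01-01
timetest$day_number <- as.numeric(as.Date(timetest$date, format = "%m/%d/%Y"))

# Seconds since 1970-01-01 (Unix time), assuming UTC
timetest$unix_time <- as.numeric(
  as.POSIXct(timetest$date, format = "%m/%d/%Y", tz = "UTC")
)

slope_per_day <- unname(coef(lm(time ~ day_number, data = timetest))[2])
slope_per_second <- unname(coef(lm(time ~ unix_time, data = timetest))[2])

# Rescaling the per-second slope recovers the per-day slope
slope_per_second * 86400
slope_per_day
```

Because the dates are consecutive days, the per-day slope matches the one obtained with the 1 to 8 index in the question.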

Resources