Formatting Dates with non-standard format - r

I'm relatively new to this site, so forgive me if my question is a bit vague. I also realize there are many threads on this topic, yet I feel they do not answer my question specifically, since they are almost all about changing yy/mm/dd to dd/mm/yy or vice versa.
In short, what do I want? I want my current format reduced to the year only.
I have a column full of dates of this format.
31OCT2016:23:52:00.000
I've seen in many topics you can use format commands but they go something like this;
dates <- c("05/27/84", "07/07/05")
I have over 100.000 observations so this can't be done manually.
So I tried;
mydata$dates <- format(as.Date(mydata$dates), "%Y")
But that didn't work. I found the list of format codes on this website:
http://www.statmethods.net/input/dates.html
But it did not say anything on how to get rid of hours minutes and seconds.
So what is the easiest way to strip it all down to year only?

Lubridate is your friend. To be precise, the function dmy_hms:
I'll generate some sample data which has the same format as your example so my code is reproducible. Don't worry about it too much. For your purposes, you can jump right to the conversion part.
#------------------------------------------------------------------------------------------
#This code block is entirely for generating reproducible sample data
d <- sample(1:27,10,T)
mon <- toupper(sample(month.abb,10,T))
y <- sample(2000:2017,10,T)
h <- sample(0:23,10,T)
min <- sample(0:59,10,T)
s <- sample(0:59,10,T)
#load package
library(lubridate)
dts <- sprintf('%02d%s%s:%s:%s:%s.000',d,mon,y,h,min,s)
> dts
[1] "01JAN2012:12:6:53.000" "01NOV2010:0:19:47.000" "03SEP2000:9:45:3.000" "25NOV2009:21:39:57.000" "08DEC2015:19:27:36.000"
[6] "23MAR2009:13:39:40.000" "03JUN2010:14:54:50.000" "03APR2002:6:34:45.000" "19NOV2012:5:17:29.000" "02FEB2003:0:3:59.000"
#------------------------------------------------------------------------------------------
So basically the variable dts is your column full of dates which you want to convert:
#conversion
> dmy_hms(dts)
[1] "2012-01-01 12:06:53 UTC" "2010-11-01 00:19:47 UTC" "2000-09-03 09:45:03 UTC" "2009-11-25 21:39:57 UTC"
[5] "2015-12-08 19:27:36 UTC" "2009-03-23 13:39:40 UTC" "2010-06-03 14:54:50 UTC" "2002-04-03 06:34:45 UTC"
[9] "2012-11-19 05:17:29 UTC" "2003-02-02 00:03:59 UTC"
And then to get just the years, you can use the year function:
> year(dmy_hms(dts))
[1] 2012 2010 2000 2009 2015 2009 2010 2002 2012 2003
So assuming you want to do everything inside the data.frame, your code could look like this:
# example dataframe
dframe <- data.frame(variable=c('A','B','C'),dates=sample(dts,3))
This is a data frame with some variable and the column with the dates.
> dframe
variable dates
1 A 15JAN2000:0:37:6.000
2 B 13DEC2016:8:34:28.000
3 C 18AUG2005:2:27:16.000
So to convert the dates, we can simply do dframe$dates <- year(dmy_hms(dframe$dates))
If we look at dframe again, we can see that the conversion was successful:
> dframe
variable dates
1 A 2000
2 B 2016
3 C 2005
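If you would rather not depend on lubridate, a base R sketch along the same lines might look like this. The vector dts_example is made up for illustration. The regex option assumes every value starts with a two-digit day and a three-letter month; the strptime-style option relies on %b, which matches month abbreviations in the current locale, so it assumes an English locale.

```r
# Hypothetical sample values in the same layout as the question
dts_example <- c("31OCT2016:23:52:00.000", "01JAN2012:12:6:53.000")

# Option 1 (locale-independent): capture the four digits after day and month
years_regex <- as.integer(
  sub("^[0-9]{2}[A-Za-z]{3}([0-9]{4}):.*$", "\\1", dts_example)
)
years_regex
# [1] 2016 2012

# Option 2: parse to a date-time first, then format the year back out
# (assumes English month abbreviations in the current locale)
parsed <- as.POSIXct(dts_example, format = "%d%b%Y:%H:%M:%OS", tz = "UTC")
years_parsed <- as.integer(format(parsed, "%Y"))
```

Option 1 never validates the date, so it is only a shortcut when you trust the input; Option 2 returns NA for values that do not parse.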

Related

Need help to extract the data of only one year from a .nc dataset in R

I need some help with a geospatial data project:
I downloaded this dataset of yearly mean temperatures: https://psl.noaa.gov/data/gridded/data.UDel_AirT_Precip.html, from which I need the data for 1990.
I imported the data into R with the following command:
ras_tempdata_v5 = raster("PATH/air.mon.mean.v501.nc")
and was hoping I could go with something like
tempdata_v5_90 <- raster(ras_tempdata_v5#history == 1990)
(would the "raster" here be needed again or would it automatically be raster data?)
to end up with just the data for 1990. But I have no idea where in the dataset the years are stored or how to access only the data for one specific year.
This is what the dataset looks like in RStudio.
I also installed Panoply to find out something about the variable structure, but that didn't really help me either.
Any help would be highly appreciated!
I think you can accomplish what you are after by first reading the dataset as a RasterStack, which only reads a little bit of data about the rasters. Then you can look at how the raster layers are named. From there, you use a regular expression to get the layer numbers for all layers representing 1990. Then simply subset your RasterStack and create a RasterBrick from the desired layers. You can then remove the original RasterStack from your environment to save memory if you want. Below is a reproducible example:
library(raster)
path<-"Path_to_your_rasters/air.mon.mean.v501.nc"
test<-stack(path)
lyrs<-names(test)
usethese<-grep(pattern="X1990",lyrs)#My machine reads each layer in with an "X", print lyrs to see if yours does the same.
MMAT<-brick(test[[usethese]])
rm(test)
I would use terra (the replacement of raster); perhaps like this:
library(terra)
r <- rast("air.mon.mean.v501.nc")
i <- grep("1990", time(r))
# or something like this but that is more tricky
#i <- which((time(r) > "1989-12-31") & (time(r) < "1990-12-31"))
i
# [1] 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092
x <- r[[i]]
time(x)
# [1] "1990-01-01 UTC" "1990-02-01 UTC" "1990-03-01 UTC" "1990-04-01 UTC"
# [5] "1990-05-01 UTC" "1990-06-01 UTC" "1990-07-01 UTC" "1990-08-01 UTC"
# [9] "1990-09-01 UTC" "1990-10-01 UTC" "1990-11-01 UTC" "1990-12-01 UTC"
(Otherwise the same as Sean McKenzie's approach)
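Both answers boil down to finding the indices of the layers that belong to 1990. A minimal base R sketch of just that index step, without any raster file, could look like this. The layer dates and the "X1990.01.01" naming are assumptions made up for the example; print your own layer names or time(r) to see what your file actually gives you.

```r
# Hypothetical layer dates: 16 monthly time steps starting November 1989
layer_dates <- seq(as.Date("1989-11-01"), by = "month", length.out = 16)
# Hypothetical layer names, assuming an "X" prefix and dots between the parts
lyrs <- paste0("X", format(layer_dates, "%Y.%m.%d"))

# Option 1: match the year at the start of the layer names
by_name <- grep("^X1990", lyrs)

# Option 2: compare real dates, with both ends of the year included
by_date <- which(layer_dates >= as.Date("1990-01-01") &
                 layer_dates <= as.Date("1990-12-31"))

by_name
# [1]  3  4  5  6  7  8  9 10 11 12 13 14
identical(by_name, by_date)
# [1] TRUE
```

The resulting index vector is what goes inside the double brackets, as in test[[usethese]] or r[[i]].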

Recode "date & time variable" into two separate variables

I'm a PhD student (not that experienced in R), and I'm trying to recode a string variable, called RecordedDate into two separate variables: a Date variable and a Time variable. I am using RStudio.
An example of values are:
8/6/2018 18:56
7/26/2018 10:43
7/28/2018 8:36
I would like to use the first part of the value (example: 8/6/2018) to create a date variable, and the second part of the value (example: 18:56) to create a time variable.
I'm thinking the first step would be to write code that breaks this up into two variables, based on some rule. Maybe I can put everything before the space into the Date variable, and everything after the space into the Time variable. I am not able to figure this out.
Then, I'm looking for code that would change the Date from a "string" variable to a "date" type variable. I’m not sure if this is correct, but I’m thinking something like:
better_date <- as.Date(Date, "%m/%d/%Y")
Finally, I would like to change the Time variable to a "time" type format (if this exists). I'm not sure how to do this part either, but it should be something that indicates hours and minutes. This part is less important than getting the date variable.
Two immediate ways:
1. strsplit() on the white space
2. The proper way: parse, and then format back out.
Only 2. will guarantee you do not end up with hour 27 or minute 83 ...
Examples:
R> data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36")
R> strsplit(data, " ")
[[1]]
[1] "8/6/2018" "18:56"
[[2]]
[1] "7/26/2018" "10:43"
[[3]]
[1] "7/28/2018" "8:36"
R>
And:
R> data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36")
R> df <- data.frame(data)
R> df$pt <- anytime::anytime(df$data) ## anytime package used
R> df$time <- format(df$pt, "%H:%M")
R> df$day <- format(df$pt, "%Y-%m-%d")
R> df
data pt time day
1 8/6/2018 18:56 2018-08-06 18:56:00 18:56 2018-08-06
2 7/26/2018 10:43 2018-07-26 10:43:00 10:43 2018-07-26
3 7/28/2018 8:36 2018-07-28 00:00:00 00:00 2018-07-28
R>
I often collect data in a data.frame (or data.table) and then add column by column.
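For comparison, here is a sketch of the parse-then-format route in base R only, assuming every value follows the month/day/year hour:minute layout from the question. The fourth value is made up to show what happens with an impossible time; the column names date and time are just illustrative.

```r
# Values from the question plus one deliberately invalid time (hypothetical)
data <- c("8/6/2018 18:56", "7/26/2018 10:43", "7/28/2018 8:36", "7/28/2018 27:83")

# Parse once with an explicit format; impossible times become NA
pt <- as.POSIXct(data, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Then format the pieces back out
df <- data.frame(data,
                 date = as.Date(pt),
                 time = format(pt, "%H:%M"),
                 stringsAsFactors = FALSE)

is.na(pt)
# [1] FALSE FALSE FALSE  TRUE
df$time[1:3]
# [1] "18:56" "10:43" "08:36"
```

Base R has no dedicated time-of-day class, so the time column stays a character string here.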

How to convert a date with only a year to a date with the format "Year-Month-Day" in R

Sorry for the question, I started using RStudio a month ago and I keep running into things I've never learned. I have checked every website, help page and forum I could find over the past two days and this is driving me crazy.
I got a variable called Release giving the date of the release of a song. Some dates are following the format %Y-%m-%d whereas some others only give me a Year.
I'd like them to be all the same but I'm struggling to only modify the observations with the year.
Brief summary in word:
11/11/2011
01/06/2011
1974
1970
16/09/2003
I've imported the data with :
music<-read.csv("music2.csv", header=TRUE, sep = ",", encoding = "UTF-8",stringsAsFactors = F)
And this is how it looks in RStudio:
"2011-11-11" "2011-06-01" "1974" "1970" "2003-09-16"
This is an example as I got 2200 obs.
The working code is
Modifdates<- ifelse(nchar(music$Release)==4,paste0("01-01-",music$Release),music$Release)
Modifdates
I obtain this :
"2011-11-11" "2011-06-01" "01-01-1974" "01-01-1970" "2003-09-16"
I just would like them to be all with the same format "%Y-%m-%d". How can I do that?
So I tried this
as.Date(music$Release,format="%Y-%m-%d")
But I got NA's where I modified my dates.
Could anyone help?
Update
Use sub() to find the entries that consist of a year only (the "(^[0-9]{4}$)" part), use a back-reference to append -01-01 to them (the "\\1-01-01" part), and finally convert the result to the Date class with as.Date() (as.Date() tries format = "%Y-%m-%d" by default, so you don't need to specify it):
dat <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")
If dat is of class character:
as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", dat))
# "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
If dat is a factor, sub() automatically coerces it to character for you:
# dat <- as.factor(dat); dat
# 2011-11-11 2011-06-01 1974 1970 2003-09-16
# Levels: 1970 1974 2003-09-16 2011-06-01 2011-11-11
as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", dat))
# "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
Welcome to SO, please try to provide a reproducible example next time so that we can best help you.
I think here you could use:
testdates <- c("1974", "12-12-2012")
betterdates <- ifelse(nchar(testdates)==4,paste0("01-01-",testdates),testdates)
> betterdates
[1] "01-01-1974" "12-12-2012"
EDIT: if your vector is a factor, you should use as.character first. If you then want to convert back to a factor, you can use as.factor.
EDIT2: do not convert with as.Date before doing this. Only do it after this modification.
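The ifelse() idea also works if the padding goes after the year, so that every value is already in year-month-day order before as.Date() sees it. A small sketch, using the five example values from the question (the names padded and release are just illustrative):

```r
testdates <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")

# Append "-01-01" to year-only entries so they match "%Y-%m-%d"
padded <- ifelse(nchar(testdates) == 4, paste0(testdates, "-01-01"), testdates)

# Convert only after every entry has the same layout
release <- as.Date(padded, format = "%Y-%m-%d")
format(release)
# [1] "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
```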

Extract time from factor column in R

I would like to extract the time from a table column sd_data$start in R with the following characteristics:
str(sd_data$start)
Factor w/ 122 levels "01/03/2017 08:00",..: 1 2 5 10 12 14 18 19 20 21 ...
I found similar questions on the forum but so far all the answers have only given me NAs or blank values (00:00:00) so I see no other option than raise the question again specifically for my dataset.
I have managed to extract the dates and move them to a new column in the table with little effort and I am very surprised how difficult it is (for me at least) to do the same for hours, minutes and seconds. I must be overlooking something.
sd_data$start_date <- as.Date(sd_data$start,format='%d/%m/%Y')
sd_data$start_time <-
Thanks in advance for helping me to find the right lines of code to complete this task.
Here is an example of what I am trying to do and where I am failing to get the time out.
smpldata <- "01/03/2017 08:00"
smpltime <-as.Date(as.character(smpldata),format='%d/%m/%Y %M:%S')
smpltime
# [1] 08:00 = what I would like to see
# [1] "2017-03-01" = what I am seeing
Try using as.character() to convert to character before converting to a date, because the factor is not converted properly otherwise. Also include the time elements in the format string, as suggested above by Sotos. The values carry no seconds, so the format stops at %M (with a trailing :%S the parse returns NA):
sd_data$start_date <-
as.Date(as.character(sd_data$start),
format='%d/%m/%Y %H:%M')
Another tip is to take a look at the lubridate package. It's very useful for this kind of task.
library(lubridate)
smpldata <- as.factor("01/03/2017 08:00")
(smpltime <-dmy_hm(as.character(smpldata)))
[1] "2017-03-01 08:00:00 UTC"
Here you still see the date. You can handle just the time for plots and other needs using hour() and minute().
hour(smpltime)
[1] 8
minute(smpltime)
[1] 0
Or you can use the format() function to get exactly what you want.
format(smpltime, "%H:%M:%S")
[1] "08:00:00"
format(smpltime, "%H:%M")
[1] "08:00"

R Convert to date from multiple formats

I need to convert a string of dates that is in multiple formats to valid dates.
e.g.
dates <- c("01-01-2017","02-01-2017","12-01-2016","20160901","20161001", "20161101")
> as.Date(dates, format=c("%m-%d-%Y","%Y%m%d"))
[1] "2017-01-01" NA "2016-12-01" "2016-09-01" NA "2016-11-01"
two dates show as NA
This is pretty much what I wrote the anytime package for:
R> dates <- c("01-01-2017","02-01-2017","12-01-2016","20160901","20161001",
+ "20161101")
R> library(anytime)
R> anydate(dates)
[1] "2017-01-01" "2017-02-01" "2016-12-01" "2016-09-01"
[5] "2016-10-01" "2016-11-01"
R>
It parses any sane input reliably and without explicit format or origin or other line noise.
That being said, not starting ISO style with the year is asking for potential trouble, since 02-03-2017 could be February 3 or March 2. I am following the North American convention, which I too consider somewhat broken but which is so darn prevalent. Do yourself a favour and try to limit inputs to ISO dates, or at least ISO order YYYYMMDD.
I tried library(anytime), but it did not work for my large dataset.
Then I found this sequence useful:
df$Date2 <- format(as.Date(df$Date, format="%m/%d/%Y"), "%d/%m/%y")
df$Date2 <- as.Date(df$Date2,"%d/%m/%y")
It worked for me for "8/10/2005" as well as "08/13/05" in the same column.
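If you know the handful of formats that can occur, another base R option is to try them one after the other and only fill in what is still NA. This is a rough sketch, not a general-purpose parser; parse_multi is a hypothetical helper name, and the order of the formats matters because the first one that parses wins.

```r
# Hypothetical helper: try each format in turn on the entries still unparsed
parse_multi <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (f in formats) {
    todo <- is.na(out)                        # entries not parsed yet
    out[todo] <- as.Date(x[todo], format = f) # NA where the format does not fit
  }
  out
}

dates <- c("01-01-2017", "02-01-2017", "12-01-2016",
           "20160901", "20161001", "20161101")
parsed <- parse_multi(dates, c("%m-%d-%Y", "%Y%m%d"))
format(parsed)
# gives 2017-01-01, 2017-02-01, 2016-12-01, 2016-09-01, 2016-10-01, 2016-11-01
```

Unlike as.Date(dates, format = c(...)), which recycles the formats position by position, this applies every format to every entry that has not been parsed yet.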
