I am experimenting with R and came across an issue I don't fully understand.
dates = c("03-19-76", "04/19/76", as.character("04\19\76"), "05.19.76", "060766")
dates
[1] "03-19-76" "04/19/76" "04\0019>" "05.19.76" "060766"
Why is the third date being interpreted, and what sort of interpretation is taking place? I also got this output when I left out the as.character function.
Thanks
Echoing the comments, make sure to escape backslashes in strings.
dates = c("03-19-76", "04/19/76", "04\\19\\76", "05.19.76", "060766")
> dates
[1] "03-19-76" "04/19/76" "04\\19\\76" "05.19.76" "060766"
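To see what R did with the unescaped string, here is a small sketch (the variable names are mine, not from the question). R reads \1 and \76 as octal escapes, so "\19" becomes the control character with code 1 followed by "9", and "\76" becomes ">" (octal 76 is decimal 62):

```r
raw_string     <- "04\19\76"    # backslashes are read as octal escapes
escaped_string <- "04\\19\\76"  # backslashes are kept literally

utf8ToInt(raw_string)      # code points: 48 52 1 57 62
nchar(raw_string)          # 5 characters
nchar(escaped_string)      # 8 characters
cat(escaped_string, "\n")  # cat shows the string as stored: 04\19\76
```

print() re-escapes the backslashes for display, which is why the console shows "04\\19\\76" even though the stored string has single backslashes.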
Now that you've got the dates stored, there are actually a lot of built-in functions you can use with them. Dates even have their own object type! To get one, use as.Date. Since you're using nonstandard date formats, you have to tell R how you've formatted them.
> as.Date(dates[1], "%m-%d-%y")
[1] "1976-03-19"
> as.Date(dates[2], "%m/%d/%y")
[1] "1976-04-19"
> as.Date("20\\10\\1999", "%d\\%m\\%Y")
[1] "1999-10-20"
a <- as.Date(dates[1], "%m-%d-%y")
b <- as.Date(dates[2], "%m/%d/%y")
> b - a
Time difference of 31 days
d <- as.numeric(b-a)
> d
[1] 31
> a + d^2
[1] "1978-11-05"
Note that since you're using 2-digit years, you use %y. If you used 4-digit years, you'd use %Y. If you forget, you'll get oddities like this:
> as.Date("03/14/2001", "%m/%d/%y")
[1] "2020-03-14"
> as.Date("03/14/10", "%m/%d/%Y")
[1] "0010-03-14"
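If you want to convert a whole vector with mixed separators in one go, one possible approach is to try each format in turn and keep the first one that parses. This is only a sketch: parse_mixed is a hypothetical helper name, and the list of formats is my assumption about what the data contains.

```r
# Hypothetical helper: try each format in order, filling in only the
# elements that have not been parsed yet.
parse_mixed <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (f in formats) {
    todo <- is.na(out)
    out[todo] <- as.Date(x[todo], format = f)
  }
  out
}

dates <- c("03-19-76", "04/19/76", "05.19.76", "060766")
fmts  <- c("%m-%d-%y", "%m/%d/%y", "%m.%d.%y", "%m%d%y")
parsed <- parse_mixed(dates, fmts)
parsed
# Note: with %y, R maps 00-68 to 20xx and 69-99 to 19xx (see ?strptime),
# so "060766" comes out as 2066-06-07, not 1966-06-07.
```

Elements that match none of the formats stay NA, so is.na(parsed) tells you what still needs attention.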
How could I modify raw date values? For example:
> DF2
Date
1 11012018
2 7312014
3 6102015
4 10202017
into modified date values separated with "/":
> DF2
Date
1 11/01/2018
2 7/31/2014
3 6/10/2015
4 10/20/2017
Use lubridate for date- and time-related tasks:
> lubridate::mdy(c("11012018", "7/31/2014"))
[1] "2018-11-01" "2014-07-31"
You can also format it if needed:
format(lubridate::mdy(c("11012018", "7/31/2014")), "%m/%d/%Y")
[1] "11/01/2018" "07/31/2014"
This assumes your dates are in month-day-year format. Otherwise, you can use one of the other lubridate parsing functions, such as dmy or ymd.
We could also use a regular expression (this assumes you just need to insert a separator; in any case, you can convert to a date type afterwards):
new<-gsub("([0-9]{1,2})([0-9]{2})([0-9]{4})","\\1 \\2 \\3",df$Date)
gsub(" ","/",new)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
Edit:
More generally, as suggested by @jay.sf:
test4<-gsub("(^[0-1]?\\d)([0-3]?\\d)(\\d{4}$)","\\1 \\2 \\3",df$Date)
gsub(" ","/",test4)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
This is to account for such date formats as:
test3<-c("11012018", "1112015", "7312014", "7312014", "10202017", "772007", "772007",
"7072007")
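As a rough check, the pattern above can be run against test3 directly. This sketch writes the "/" in a single sub call instead of going through a space first; the name with_slashes is mine.

```r
test3 <- c("11012018", "1112015", "7312014", "7312014", "10202017",
           "772007", "772007", "7072007")

# Same three capture groups as above: month, day, four-digit year.
with_slashes <- sub("(^[0-1]?\\d)([0-3]?\\d)(\\d{4}$)", "\\1/\\2/\\3", test3)
with_slashes
# [1] "11/01/2018" "11/1/2015"  "7/31/2014"  "7/31/2014"  "10/20/2017"
# [6] "7/7/2007"   "7/07/2007"
```

Strings like "1112015" are inherently ambiguous (11/1 or 1/11); the regex simply prefers the two-digit month, so the result is only as good as that assumption.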
One possible solution could be:
df <- transform(df, V1 = as.Date(as.character(V1), "%d%m%Y"))
Another, which parses the values in the required month-day-year order, is:
df <- data.frame(lapply(df, function(x) as.Date(as.character(x), "%m%d%Y")))
Both solutions use only base R.
I have dates in the format Apr42016, Aug12017, Apr112018. I am trying to convert them to Y/m/d using R. I have tried the code below, but when the day has a single digit it returns NA. Could anyone help me, please?
strptime(data$date, "%b%e%Y")
as.Date (data$date, format="%b%d%Y")
as.POSIXct(data$date, format="%b%e%Y")
Thank you
You can modify the strings with sub (and add a 0 if necessary) before using as.Date:
myvec <- c("Apr42016", "Aug12017", "Apr112018") # the data
myvec2 <- sub("(?<=\\D)(?=\\d{5}$)", "0", myvec, perl = TRUE) # pad only when exactly 5 digits follow the month
# [1] "Apr042016" "Aug012017" "Apr112018"
as.Date(myvec2, format = "%b%d%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
If you can break up the numbers before as.Date, it will make things much easier. (Borrowing Sven's look-behind.)
sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2",
c("Apr42016", "Aug12017", "Apr112018"), perl=TRUE)
# [1] "Apr-4-2016" "Aug-1-2017" "Apr-11-2018"
From here, the format should be rather straight-forward:
as.Date(sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2", c("Apr42016", "Aug12017", "Apr112018"), perl = TRUE),
format="%b-%d-%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
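One caveat with %b is that it depends on the locale's month abbreviations. A possible locale-independent variant, sketched here with hypothetical variable names, pulls the three pieces out with regexec and looks the month up in the built-in month.abb vector:

```r
x <- c("Apr42016", "Aug12017", "Apr112018")

# Capture month abbreviation, 1-2 digit day, and 4-digit year.
parts <- regmatches(x, regexec("^([A-Za-z]{3})(\\d{1,2})(\\d{4})$", x))

mon <- match(vapply(parts, `[`, "", 2), month.abb)  # "Apr" -> 4
day <- as.integer(vapply(parts, `[`, "", 3))
yr  <- as.integer(vapply(parts, `[`, "", 4))

result <- as.Date(sprintf("%04d-%02d-%02d", yr, mon, day))
result
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
```

This assumes every string matches the pattern and that the abbreviations are the English ones in month.abb.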
R uses the date "1970-01-01" as an origin. Does it make an exception from its typical 1-indexing to index dates with 0-indexing?
> x <- as.Date("1970-01-01")
> y <- as.Date("1970-01-02")
> unclass(x)
[1] 0
> unclass(y)
[1] 1
No. This is not an indexing thing. "Dates are represented as the number of days since 1970-01-01" (From the ?Date help page). Also note
unclass(as.Date("1969-12-31")) == -1
So it's not an index, it's an offset in days from a reference date (the origin). There's no underlying vector being indexed here.
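A short sketch of the same point, treating the stored number as a plain day count (the variable names are mine):

```r
d <- as.Date("2000-01-01")
days_since_origin <- as.numeric(d)   # 10957 days after 1970-01-01

# Going the other way: a day count plus an origin gives the date back.
round_trip <- as.Date(days_since_origin, origin = "1970-01-01")

# Dates before the origin are simply negative.
week_before <- as.numeric(as.Date("1969-12-25"))  # -7
```

Because the value is a count and not a position, arithmetic such as d + 30 or d - as.Date("1999-12-25") works directly.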
I have a column of time values, except that they are in character format and do not have the colons to separate H, M, S. The column looks similar to the following:
Time
024201
054722
213024
205022
205024
125440
I want to convert all the values in the column to look like actual time values in the format H:M:S. The values are already in HMS format, so it is simply a matter of inserting colons, but that is proving more difficult than I thought. I found a package that adds commas every three digits from the right to make Strings look like currency values, but nothing for time (without also adding a date value, which I do not want to do). Any help would be appreciated.
Since the data is time related, you should consider storing it in a POSIX format:
> df <- data.frame(Time=c("024201", "054722", "213024", "205022", "205024", "125440"))
> df$Time <- as.POSIXct(df$Time, format="%H%M%S")
> df
Time
1 2014-01-05 02:42:01
2 2014-01-05 05:47:22
3 2014-01-05 21:30:24
4 2014-01-05 20:50:22
5 2014-01-05 20:50:24
6 2014-01-05 12:54:40
To output just the times:
> format(df, "%H:%M:%S")
Time
1 02:42:01
2 05:47:22
3 21:30:24
4 20:50:22
5 20:50:24
6 12:54:40
A regular expression with lookaround works for this:
gsub('(..)(?=.)', '\\1:', x$Time, perl=TRUE)
The (?=.) means a character (matched by .) must follow, but is not considered part of the match (and is not captured).
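To see why the lookahead matters, here is a small comparison on one value (a sketch; the variable names are mine). Without it, the last pair also gets a colon:

```r
x <- "024201"

no_lookahead   <- gsub("(..)", "\\1:", x)                     # "02:42:01:"
with_lookahead <- gsub("(..)(?=.)", "\\1:", x, perl = TRUE)   # "02:42:01"
```

The final "01" is not followed by any character, so (?=.) fails there and no colon is appended.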
Here is a regex solution:
x <- readLines(n=6)
024201
054722
213024
205022
205024
125440
gsub("(\\d\\d)(\\d\\d)(\\d\\d)", "\\1:\\2:\\3", x)
## [1] "02:42:01" "05:47:22" "21:30:24"
## [4] "20:50:22" "20:50:24" "12:54:40"
Here each (\\d\\d) says we're looking for 2 digits. The parentheses break the string into 3 parts. Then the \\1: in the replacement says take chunk 1 and place a colon after it.
Or via date/times classes:
time <- c("024201", "054722", "213024", "205022", "205024", "125440")
time <- as.POSIXct(paste("1970-01-01", time), format="%Y-%m-%d %H%M%S")
(time <- format(time, "%H:%M:%S"))
# [1] "02:42:01" "05:47:22" "21:30:24" "20:50:22" "20:50:24" "12:54:40"
This gives a chron "times" class vector:
> library(chron)
> times(gsub("(..)(..)(..)", "\\1:\\2:\\3", DF$Time))
[1] 02:42:01 05:47:22 21:30:24 20:50:22 20:50:24 12:54:40
The "times" class can display times without having to display the date and supports various methods on the times.
On the other hand, if only a character string is wanted then only the gsub part is needed.
I have a character string of the date in Year-week format as such:
weeks.strings <- c("2002-26", "2002-27", "2002-28", "2002-29", "2002-30", "2002-31")
However, converting this character to Date class results in a loss of week identifier:
> as.Date(weeks.strings, format="%Y-%U")
[1] "2002-08-28" "2002-08-28" "2002-08-28" "2002-08-28" "2002-08-28"
[6] "2002-08-28"
As shown above, each string is converted to the year combined with today's month and day, so any information about the original week is lost (for example, when using the format function or strptime to try to coerce back into the original format).
One solution I found in a help group is to specify the day of the week:
as.Date(weeks.strings, format="%Y-%u %U")
[1] "2002-02-12" "2002-02-19" "2002-02-26" "2002-03-05" "2002-01-02"
[6] "2002-01-09"
But it looks like this results in incorrect week numbering (doesn't match the original string).
Any guidance would be appreciated.
You just need to add a weekday to your weeks.strings in order to make the dates unambiguous (adapted from Jim Holtman's answer on R-help).
as.Date(paste(weeks.strings,1),"%Y-%U %u")
As pointed out in the comments, the Date class is not appropriate if the dates span a long horizon because--at some point--the chosen weekday will not exist in the first/last week of the year. In that case you could use a numeric vector where the whole portion is the year and the decimal portion is the fraction of weeks/year. For example:
wkstr <- sprintf("%d-%02d", rep(2000:2012,each=53), 0:52)
yrwk <- lapply(strsplit(wkstr, "-"), as.numeric)
yrwk <- sapply(yrwk, function(x) x[1]+x[2]/53)
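A quick check of the weekday approach, as a sketch: appending 1 pins each week to its Monday (%u), and formatting the result with %Y-%U should give back the original strings. The name mondays is mine.

```r
weeks.strings <- c("2002-26", "2002-27", "2002-28", "2002-29",
                   "2002-30", "2002-31")

# "2002-26 1" means: year 2002, week 26 (Sunday-based), weekday 1 = Monday.
mondays <- as.Date(paste(weeks.strings, 1), "%Y-%U %u")
mondays[1]
# [1] "2002-07-01"

# Round trip: the week identifier is preserved.
identical(format(mondays, "%Y-%U"), weeks.strings)
```

Picking a different weekday digit moves each date within the same week; the choice is arbitrary as long as that weekday exists in every week of interest.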
Obviously, there's no unique solution, since each week could be represented by any of up to 7 different dates. That said, here's one idea:
weeks.strings <- c("2002-26", "2002-27", "2002-28", "2002-29",
"2002-30", "2002-31")
x <- as.Date("2002-1-1", format="%Y-%m-%d") + (0:52*7)
x[match(weeks.strings, format(x, "%Y-%U"))]
# [1] "2002-07-02" "2002-07-09" "2002-07-16" "2002-07-23"
# [5] "2002-07-30" "2002-08-06"