Adding characters within strings in an R dataframe column

This is the first line of my dataframe (with column names):
site, date, value
TEES, 20000314, 315
As you can see, the dates don't have separators (- or /), so I can't use as.Date. Thus, I need something like this:
TEES, 2000-03-14, 315
How do I do this? Presumably something with sub.

Will this work:
as.Date(gsub('(\\d{4})(\\d{2})(\\d{2})','\\1-\\2-\\3',df$date))
[1] "2000-03-14"
Data:
df
site date value
1 TEES 20000314 315

You could use the ymd function from the lubridate package. This will automatically add "-" to separate YYYY-MM-DD and convert it to Date.
library(lubridate)
ymd(df$date)
# "2000-03-14"

You can use as.Date; you just need to specify the tryFormats argument:
as.Date("20000314", tryFormats = c("%Y%m%d"))
[1] "2000-03-14"
The default is to try these formats: c("%Y-%m-%d", "%Y/%m/%d"), which don't match your current structure so you have to tell it how to read your structure.
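Since the question is about a whole data frame column rather than a single string, here is a minimal base R sketch of the same idea applied to a column. The one-row data frame is a hypothetical reconstruction of the example in the question, and it assumes the date column was read in as a number, which is why it is coerced to character first.

```r
# Hypothetical data frame mirroring the example row in the question
df <- data.frame(site = "TEES", date = 20000314L, value = 315)

# as.Date() cannot apply a format to a number, so coerce to character first
df$date <- as.Date(as.character(df$date), format = "%Y%m%d")

# Strings that do not fit the format become NA instead of raising an error
bad <- as.Date("20001340", format = "%Y%m%d")

print(df)
print(bad)
```

Checking for NA after the conversion is a cheap way to spot malformed entries in a longer column.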

We can use anydate from anytime
library(anytime)
anydate("20000314")
#[1] "2000-03-14"

Related

Format hms vector as custom format

I want to convert an hms vector into a character vector of a format like "13:15":
t <- hms::as_hms(ymd_hm("2021-07-23 13:15"))
Now formatting with something like format(t, "%H:%M") does not lead to the desired result.
Any suggestions?
Thanks
You may try a regex approach to remove the trailing seconds from t.
sub(':\\d{2}$', '', t)
#[1] "13:15"
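You can paste the components (note that hour() and minute() are not zero-padded, so 09:05 would come out as "9:5"):
library(lubridate)
t <- hms::as_hms(ymd_hm("2021-07-23 13:15"))
paste0(hour(t),":",minute(t))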
Output:
[1] "13:15"
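If zero-padding matters (for example 09:05), one option is to work on the number of seconds since midnight and format it with sprintf. This is only a sketch: fmt_hm is a hypothetical helper name, and it assumes the time is available as seconds, which as far as I know is what as.numeric() returns for an hms object.

```r
# Hypothetical helper: seconds since midnight -> zero-padded "HH:MM"
fmt_hm <- function(secs) {
  secs <- as.integer(secs)
  sprintf("%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L)
}

afternoon <- fmt_hm(13 * 3600 + 15 * 60)  # 13:15:00
morning   <- fmt_hm(9 * 3600 + 5 * 60)    # 09:05:00
print(c(afternoon, morning))
```

sprintf is vectorized, so the same helper should work on a whole column of times.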

Convert YYYYMM factor format to YYYY-MM format

I have data in the format YYYYMM and I wish to convert it to the format YYYY-MM.
Example: 201805 should become 2018-05.
How could I do this, please?
We can use as.yearmon from zoo to convert it to a yearmon object and then format it:
library(zoo)
format(as.yearmon(as.character(v1), "%Y%m"), "%Y-%m")
#[1] "2018-05"
data
v1 <- 201805
I like the idea of using actual dates here. If the day component does not matter to you, then you may arbitrarily set each of your dates to the first of the month. Then we can leverage R's date functions to do the heavy lifting.
x <- "201805"
x <- paste0(x, "01")
x
[1] "20180501"
y <- format(as.Date(x, format = "%Y%m%d"), "%Y-%m-%d")
substr(y, 1, 7)
[1] "2018-05"
You could use regular expressions:
data <- "201805"
sub("(\\d{4})", "\\1-", data)
[1] "2018-05"
Another variant, using only lookarounds:
sub("(?<=\\d{4})(?=\\d{2})", "-", data, perl=TRUE)
How about the following one? (I am assuming the OP does not need to perform any checks on the variable's value here.)
val="201805"
sub("(..$)","-\\1",val)
Or, to restrict the substitution to the last two digits only, try the following:
val="201805"
sub("(\\d{2}$)","-\\1",val)
[1] "2018-05"
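The regex answers above do not check the input. If some values might not be six digits, a small wrapper can validate first and return NA otherwise. This is a sketch only: add_dash is a hypothetical name, and the as.character() call is there because the question title mentions a factor.

```r
# Hypothetical helper: "YYYYMM" -> "YYYY-MM", NA for anything else
add_dash <- function(x) {
  x <- as.character(x)                 # also handles factor input
  ok <- grepl("^\\d{6}$", x)           # exactly six digits
  out <- rep(NA_character_, length(x))
  out[ok] <- sub("^(\\d{4})(\\d{2})$", "\\1-\\2", x[ok])
  out
}

res <- add_dash(factor(c("201805", "2018-05", "20185")))
print(res)
```

Note that this only checks the shape of the string, not whether the month is between 01 and 12.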
Very similar to some of the others, but because I find the package useful I will mention it:
library(lubridate)
date <- "201805"
format(ymd(paste0(date,"01")), "%Y-%m")
Lubridate can make life easy if the formats start to vary.
Here is another option, albeit a longer one:
somestring <- "201805"
stringi::stri_sub(somestring, 1, 4) <- "-"   # somestring is now "-05"
somestring1 <- "201805"
somestring2 <- substring(somestring1, 1, 4)  # "2018"
paste0(somestring2, somestring)
Result:
"2018-05"

R - converting dates within data.frame

I am working with a data.frame in which the dates are given in the following format: Aug 12, 2017.
class(data[,1]) = factor
How can I convert these into dates?
data[,1] <- as.Date.factor(data[,1],format = "%m.%d.%y") returns NAs.
I would suggest the lubridate package, which has very easy-to-use functions for working with dates. For example:
mdy("Aug 12,2017")
[1] "2017-08-12"
If your date is in YYYY-MM-DD format, you can use the ymd function. There are also other functions such as dmy, dmy_hms (for datetime), etc.
If your column is called my.date, you can do:
data$my.date <- mdy(data$my.date)
Alternatively, you can use the %<>% operator from magrittr to make your code even shorter:
data$my.date %<>% mdy
Use as.POSIXct (Base-R Solution):
as.POSIXct("Aug 12,2017", format="%b%d,%Y")
Output:
[1] "2017-08-12 CEST"
Using strptime could also work:
strptime("Aug 12,2017", "%b%d,%Y")
Output:
[1] "2017-08-12 UTC"
The second parameter for strptime is the format of the dates you have. For instance, if your dates are like this "1/5/2005", then the format would be:
format="%m/%d/%Y"
Hope it helps
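For reference, the attempt in the question returns NA because "%m.%d.%y" describes a numeric month, dots as separators and a two-digit year, none of which match "Aug 12, 2017". A base R sketch with a matching format could look like this. The factor below is hypothetical sample data, and the locale switch is an assumption on my part to make %b recognise English month abbreviations regardless of the system language.

```r
# %b depends on the locale; the "C" locale uses English month abbreviations
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical factor column in the format from the question
dates <- factor(c("Aug 12, 2017", "Sep 3, 2017"))

# Convert factor -> character -> Date; the format must mirror the input text
parsed <- as.Date(as.character(dates), format = "%b %d, %Y")
print(parsed)

invisible(Sys.setlocale("LC_TIME", old_locale))  # restore the previous setting
```

Going through as.character() first avoids any surprises from the factor's internal integer codes.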

R convert character "111213" into proper time which is "11:12:13"

R convert character "111213" into time "11:12:13".
strptime("111213", format="%H%m%s") gives NA
and
strptime("111213", "%H%m%s") gives 1970-01-01 01:00:13 CET
I think the canonical answer would be as in my comment: use %M (minutes) and %S (seconds) rather than %m (month) and %s (seconds since the epoch):
format(strptime("111213", format="%H%M%S"), "%H:%M:%S")
#[1] "11:12:13"
where you can read ?strptime for all the details. format is a generic function, and in this specific case we are using format.POSIXlt.
Another solution is to simply manipulate the string:
paste(substring("111213", c(1,3,5), c(2,4,6)), collapse = ":")
#[1] "11:12:13"
This makes sense because your input is really not a Date-Time: there is no Date.
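The paste/substring version above handles one string at a time, because collapse joins everything into a single result. For a whole vector, a regex with three capture groups is one possible alternative. A minimal sketch, with x as hypothetical sample input:

```r
# Hypothetical vector of HHMMSS strings
x <- c("111213", "093005")

# Capture three pairs of digits and rejoin them with colons
out <- sub("^(\\d{2})(\\d{2})(\\d{2})$", "\\1:\\2:\\3", x)
print(out)
```

Strings that are not exactly six digits are returned unchanged, so they are easy to spot afterwards.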
We can use
library(chron)
times(gsub("(.{2})(?=\\d)", "\\1:", "111213", perl = TRUE))
#[1] 11:12:13
To manipulate times, you can use the hms package.
By default, it works with the %H:%M:%S (or %X) format.
For your specific time format ("111213"), you need to go through the base function as.difftime:
hms::as_hms(as.difftime("111213", format = "%H%M%S"))
#> 11:12:13
So if we also include the date in a similar "integer" format, we can use the command:
strptime("20181017 112233", format="%Y%m%d %H%M%S")

extract part of a date in a dataframe column

Thanks for your help in advance. I am working with the getQuote function in the quantmod package, which returns the following data frame:
Is there a way to modify all the dates in the first column to exclude the time stamp, while retaining the data frame structure? I just want the "YYYY-MM-DD" in the first column. I know that if it was a vector of dates, I would use substr(df[,1],1,10). I have also looked into the apply function, with: apply(df[,1],1,substr,1,10).
Another option not mentioned yet:
tt <- getQuote("AAPL")
trunc(tt[,1], units='days')
This returns the date in POSIXlt. You can wrap it in as.POSIXct, if you want.
Using strptime (see ?strptime):
tt <- getQuote("AAPL")
tt[,1]
[1] "2013-01-16 02:52:00 CET"
as.POSIXct(strptime(tt[,1],format ='%Y-%m-%d')) ## as.POSIXct because strptime returns POSIXlt
[1] "2013-01-16 CET"
EDIT
You can use the format argument of as.POSIXct, but you need to convert tt[,1] to character first.
as.POSIXct(as.character(tt[,1]),format ='%Y-%m-%d')
[1] "2013-01-16 CET"
I would do this with lubridate
library(quantmod)  # provides getQuote
library(plyr)
library(lubridate)
tickers <- c("AAPL","AAJX","ABR")
df <- ldply(tickers, getQuote)
rownames(df) <- tickers
df[,"Trade Time"] <- paste(year(df[,"Trade Time"]),month(df[,"Trade Time"]),day(df[,"Trade Time"]),sep="-")
There might be a more elegant way of printing the date, but this is what came to me first.
You may just use gsub. No need to convert data type.
tt <- getQuote("AAPL")
tt[, 'Trade Time']<- gsub(" [0-9]{2}:[0-9]{2}:[0-9]{2}", "", tt[, 'Trade Time'])
It can be as simple as:
tt[,1]=as.Date(tt[,1])
(where tt is tt <- getQuote("AAPL"), as shown in the alternative answers)
The blank before the comma means "do all rows" and the 1 after the comma means "operate on (just) the first column".
I prefer this solution because it gives you a Date object, which must be exactly what you want if you are trying to strip off timestamps.
agstudy's answer gives you a date with a timezone, and that is going to bite you the first time you run your script in a different timezone. (Aside: I got some regressions in a unit test suite when I ran them in the U.K. while there at Christmas, due to a subtle timezone assumption in my test code.)
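One caveat worth checking with as.Date on a POSIXct column: the calendar date depends on the time zone used for the conversion, and as far as I know the tz argument of as.Date defaults to UTC for POSIXct input. A small sketch with a hypothetical timestamp just after midnight in Berlin shows the difference. It assumes the "Europe/Berlin" zone is available on your system.

```r
# Hypothetical timestamp shortly after midnight in Berlin (UTC+1 in January)
x <- as.POSIXct("2013-01-16 00:30:00", tz = "Europe/Berlin")

d_utc   <- as.Date(x, tz = "UTC")            # calendar date as seen in UTC
d_local <- as.Date(x, tz = "Europe/Berlin")  # calendar date as seen in Berlin

print(d_utc)
print(d_local)
```

Passing tz explicitly makes the result independent of where the script is run.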
