I have a dataframe of strings representing times, such as:
times <- structure(list(exp1 = c("17:19:04 \r", "17:28:53 \r", "17:38:44 \r"),
exp2 = c("17:22:04 \r", "17:31:53 \r", "17:41:45 \r")),
row.names = c(NA, 3L), class = "data.frame")
If I run strptime() on a single element of my dataframe times, it converts it into a nice POSIXt object:
strptime(times[1,1], '%H:%M:%S')
[1] "2020-02-19 17:19:04 GMT"
Great, so now I'd like to convert my whole dataframe times into this format.
I cannot seem to find the solution to do this smoothly.
A few of the things I have tried so far:
strptime(times, '%H:%M:%S') # generates NA
strftime(times, '%H:%M:%S') # Error: do not know how to convert 'x' to class “POSIXlt”
apply(times, 2, function(x) strftime(x, '%H:%M:%S')) # Error: character string is not in a standard unambiguous format
The closest I got to what I want is:
apply(times, 2, function(x) strptime(x, '%H:%M:%S'))
It generates a messy list. I can probably find a way to use it, but there must be a more straightforward way?
You could use lapply.
times[] <- lapply(times, strptime, '%H:%M:%S')
# exp1 exp2
# 1 2020-02-19 17:19:04 2020-02-19 17:22:04
# 2 2020-02-19 17:28:53 2020-02-19 17:31:53
# 3 2020-02-19 17:38:44 2020-02-19 17:41:45
Note: apply also works.
times[] <- apply(times, 2, function(x) strptime(x, '%H:%M:%S'))
The trick is to replace the columns with [] <- rather than overwrite the data frame with a list. In this case it is shorthand for times[1:2] <- lapply(times[1:2], strptime, '%H:%M:%S').
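To make the difference concrete, here is a minimal sketch. It assumes a cleaned copy of the example data (without the trailing \r) and uses as.POSIXct with a fixed UTC time zone in place of strptime, so that each column is a plain date-time vector. The names as_list and kept are made up for illustration.

```r
# Cleaned copy of the example data (assumption: trailing " \r" already removed)
times <- data.frame(exp1 = c("17:19:04", "17:28:53", "17:38:44"),
                    exp2 = c("17:22:04", "17:31:53", "17:41:45"),
                    stringsAsFactors = FALSE)

# Plain assignment: the result of lapply() is a list, so the data frame is lost
as_list <- lapply(times, as.POSIXct, format = "%H:%M:%S", tz = "UTC")
class(as_list)   # "list"

# Assignment with []: only the columns are replaced, the data frame is kept
kept <- times
kept[] <- lapply(times, as.POSIXct, format = "%H:%M:%S", tz = "UTC")
class(kept)      # "data.frame"
class(kept$exp1) # "POSIXct" "POSIXt"
```

Because the format contains no date, the current date is filled in, just as in the strptime output above.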
Related
I would like to change all the mixed date format into one format for example d-m-y
here is the data frame
x <- data.frame("Name" = c("A","B","C","D","E"), "Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
I have tried using the code below, but it gives NAs
newdateformat <- as.Date(x$Birthdate,
format = "%m%d%y", origin = "2020-6-25")
newdateformat
Then I tried using parse_date_time, but it also gives NAs, which means it failed to parse
require(lubridate)
parse_date_time(my_data$Birthdate, orders = c("ymd", "mdy"))
[1] NA NA "2001-09-12 UTC" NA
[5] "2005-02-18 UTC"
I also could not find out what the format of the first date in the data frame, "36085.0", is.
I did find this code, but I still couldn't understand what the numbers mean or what "origin" means
dates <- c(30829, 38540)
betterDates <- as.Date(dates,
origin = "1899-12-30")
P.S. I'm quite new to R, so I would appreciate a simple explanation. Thank you!
You should parse each format separately. For each format, select the relevant rows with a regular expression and transform only those rows, then move on to the next format. I'll give the answer with data.table instead of data.frame because I've forgotten how to use data.frame.
library(lubridate)
library(data.table)
x = data.table("Name" = c("A","B","C","D","E"),
"Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
# or use setDT(x) to convert an existing data.frame to a data.table
# handle dates like "2001-sep-12" and "2020-6-25"
# this regex matches strings beginning with four numbers and then a dash
x[grepl('^[0-9]{4}-',Birthdate),Birthdate1:=ymd(Birthdate)]
# handle dates like "36085.0": days since 1904 (or 1900)
# see https://learn.microsoft.com/en-us/office/troubleshoot/excel/1900-and-1904-date-system
# this regex matches strings that only have numeric characters and .
x[grepl('^[0-9\\.]+$',Birthdate),Birthdate1:=as.Date(as.numeric(Birthdate),origin='1904-01-01')]
# assume the rest are like "Feb-18-2005" and "05/27/84" and handle those
x[is.na(Birthdate1),Birthdate1:=mdy(Birthdate)]
# result
> x
Name Birthdate Birthdate1
1: A 36085.0 2002-10-18
2: B 2001-sep-12 2001-09-12
3: C Feb-18-2005 2005-02-18
4: D 05/27/84 1984-05-27
5: E 2020-6-25 2020-06-25
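For readers who prefer a plain data.frame, the same idea can be sketched in base R. This is only an illustration under some assumptions: the formats list covers just the four layouts in the example, the helper names (parsed, is_serial, excel_origin) are made up, and the serial number is assumed to come from Excel's default 1900 date system, for which "1899-12-30" is the usual origin in R. The "origin" is simply day zero: 36085 means 36085 days after that date. With origin "1904-01-01", as in the answer above, the same number gives 2002-10-18 instead of 1998-10-17.

```r
# Month abbreviations such as "sep" are matched per locale; "C" gives English names
invisible(Sys.setlocale("LC_TIME", "C"))

x <- data.frame(Name = c("A", "B", "C", "D", "E"),
                Birthdate = c("36085.0", "2001-sep-12", "Feb-18-2005", "05/27/84", "2020-6-25"),
                stringsAsFactors = FALSE)

parsed <- rep(as.Date(NA), nrow(x))

# Serial day counts: only digits and dots (assumption: Excel 1900 date system)
excel_origin <- "1899-12-30"
is_serial <- grepl("^[0-9.]+$", x$Birthdate)
parsed[is_serial] <- as.Date(as.numeric(x$Birthdate[is_serial]), origin = excel_origin)

# Try each text format in turn, but only on rows that are still unparsed
formats <- c("%Y-%b-%d", "%Y-%m-%d", "%b-%d-%Y", "%m/%d/%y")
for (fmt in formats) {
  todo <- is.na(parsed) & !is_serial
  parsed[todo] <- as.Date(x$Birthdate[todo], format = fmt)
}

x$Birthdate1 <- parsed
x$Birthdate_dmy <- format(parsed, "%d-%m-%Y")  # the d-m-y text asked for
x
```

The order of formats matters, since the first format that parses a string wins.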
After I imported the text file into R, R dropped the leading zeros in the time column.
For example:
Before import time | After import time
072550 | 72550
000002 | 2
As a result, I am unable to convert to the correct time format (from 72550 to 07:25:50).
How can I convert the integer time to the correct time format?
I have tried:
chron (time, "%H:%M:%S")
strptime(time, "%H:%M:%S")
time <- as.hms(time)
You can use str_pad from the stringr package to restore the zeroes:
library(stringr)
time_old <- "2"
time_new <- str_pad(time_old, width = 6, side = "left", pad = "0")
Then, you should be able to use the chron function:
chron::chron(times = time_new, format = list(times = "hms"),
out.format = "h:m:s")
[1] 00:00:02
We can use sprintf and strptime/as.POSIXct
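If you have read them as numeric, use %d in sprintf. If they are characters, convert them with as.integer first, because zero-padding with %s is platform-dependent (on some systems it pads with spaces).
x <- c(072550, 2)
format(strptime(sprintf("%06d", x), "%H%M%S"), "%T")
#[1] "07:25:50" "00:00:02"
x <- c("072550", "2")
format(strptime(sprintf("%06d", as.integer(x)), "%H%M%S"), "%T")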
#[1] "07:25:50" "00:00:02"
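If only the formatted text is needed, the padding step can also be avoided with integer arithmetic. The following is a small sketch rather than part of the answer above; the variable names are made up, and it assumes the values are valid HHMMSS integers.

```r
x <- c(72550L, 2L)            # times as read in, leading zeros already lost

hours   <- x %/% 10000L       # first two digits
minutes <- (x %/% 100L) %% 100L
seconds <- x %% 100L

hms_text <- sprintf("%02d:%02d:%02d", hours, minutes, seconds)
hms_text
# [1] "07:25:50" "00:00:02"
```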
The possibly duplicate question "Specify custom Date format for colClasses argument in read.table/read.csv" shows how to read the data in the format you want directly, by specifying your own conversion function through colClasses:
library(chron) # provides chron(), used by the myTime converter below
setAs("character","myDate", function(from) as.Date(from, format="%Y%m%d") )
setAs("character","myTime", function(from) chron(times = from, format = "hms", out.format = "h:m:s"))
tmp <- c("1\t20080815\t072550", "2\t20100523\t000002")
con <- textConnection(tmp)
tmp2 <- read.delim(con, colClasses=c('numeric','myDate','myTime'), header=FALSE)
tmp2 contains :
V1 V2 V3
1 1 2008-08-15 07:25:50
2 2 2010-05-23 00:00:02
read.delim is a shortcut for read.table that sets a few defaults and passes any extra parameters, such as colClasses, directly to read.table.
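A lighter variant, sketched here without custom classes or extra packages, is to read the date and time columns as character so the leading zeros survive the import, and to convert afterwards. The sample lines are the ones used above; the names raw and stamp, and the fixed UTC time zone, are assumptions made for the example.

```r
tmp <- c("1\t20080815\t072550", "2\t20100523\t000002")

# colClasses = "character" keeps "072550" and "000002" exactly as written
raw <- read.delim(text = tmp, header = FALSE,
                  colClasses = c("numeric", "character", "character"))
raw$V3
# [1] "072550" "000002"

# Combine date and time text and parse both in one step
stamp <- as.POSIXct(paste(raw$V2, raw$V3), format = "%Y%m%d %H%M%S", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S")
# [1] "2008-08-15 07:25:50" "2010-05-23 00:00:02"
```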
I have dates in the format Apr42016, Aug12017, Apr112018. I am trying to convert them to Y/m/d using R. I have tried the code below, but whenever the day has a single digit it returns NA. Could anyone help me, please?
strptime(data$date, "%b%e%Y")
as.Date (data$date, format="%b%d%Y")
as.POSIXct(data$date, format="%b%e%Y")
Thank you
You can modify the strings with sub (and add a 0 if necessary) before using as.Date:
myvec <- c("Apr42016", "Aug12017", "Apr112018") # the data
# insert a 0 after the month letters only when exactly five digits follow
myvec2 <- sub("(?<=[A-Za-z])(?=[0-9]{5}$)", "0", myvec, perl = TRUE)
# [1] "Apr042016" "Aug012017" "Apr112018"
as.Date(myvec2, format = "%b%d%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
If you can break up the numbers before as.Date, it will make things much easier. (Borrowing Sven's look-behind.)
sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2",
c("Apr42016", "Aug12017", "Apr112018"), perl=TRUE)
# [1] "Apr-4-2016" "Aug-1-2017" "Apr-11-2018"
From here, the format should be rather straight-forward:
as.Date(sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2", c("Apr42016", "Aug12017", "Apr112018"), perl = TRUE),
format="%b-%d-%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
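Since the question asks for Y/m/d, here is one more sketch that splits the string with capture groups only (no look-behind) and then formats the result. The names raw, with_sep, and dates are made up, and the locale is set to "C" on the assumption that the month abbreviations are English.

```r
invisible(Sys.setlocale("LC_TIME", "C"))  # parse "Apr", "Aug" as English month names

raw <- c("Apr42016", "Aug12017", "Apr112018")

# letters, then a 1-2 digit day, then a 4 digit year anchored at the end
with_sep <- sub("^([A-Za-z]+)([0-9]{1,2})([0-9]{4})$", "\\1-\\2-\\3", raw)
with_sep
# [1] "Apr-4-2016"  "Aug-1-2017"  "Apr-11-2018"

dates <- as.Date(with_sep, format = "%b-%d-%Y")
format(dates, "%Y/%m/%d")
# [1] "2016/04/04" "2017/08/01" "2018/04/11"
```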
I am wondering why this error occurs. I would like to convert this using brackets as I am making sequential conversions in a loop. And because I just want to be able to do it and understand what is happening.
head(clean.deposit.rates)
Date
1 1/31/1983
2 2/28/1983
3 3/31/1983
4 4/30/1983
5 5/31/1983
6 6/30/1983
class(clean.deposit.rates)
[1] "data.frame"
class(as.Date(clean.deposit.rates[[1]], "%m/%d/%Y"))
[1] "Date"
class(as.Date(clean.deposit.rates$Date, "%m/%d/%Y"))
[1] "Date"
as.Date(clean.deposit.rates["Date"], "%m/%d/%Y")
Error in as.Date.default(clean.deposit.rates["Date"], "%m/%d/%Y") :
do not know how to convert 'clean.deposit.rates["Date"]' to class “Date”
You need to use two [ brackets. With one, the column remains a data frame. With two, it becomes an atomic vector, which can be passed to the correct as.Date method.
as.Date(df["Date"], "%m/%d/%Y")
# Error in as.Date.default(df["Date"], "%m/%d/%Y") :
# do not know how to convert 'df["Date"]' to class “Date”
Since df["Date"] has class data.frame and there is no as.Date.data.frame method, the call dispatches to as.Date.default. None of the conditions in that function's if statements is TRUE for a data frame, so execution falls through to the line
stop(gettextf("do not know how to convert '%s' to class %s",
deparse(substitute(x)), dQuote("Date")), domain = NA)
Using df[["Date"]], the column becomes a vector and is passed to either as.Date.character or as.Date.factor depending on the class of the vector, and the desired result is returned.
as.Date(df[["Date"]], "%m/%d/%Y")
# [1] "1983-01-31" "1983-02-28" "1983-03-31" "1983-04-30" "1983-05-31"
# [6] "1983-06-30"
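The difference between the two kinds of brackets, and the loop the question mentions, can be sketched as follows. The data frame here is a made-up two-column stand-in for clean.deposit.rates; the column name Other is hypothetical.

```r
df <- data.frame(Date  = c("1/31/1983", "2/28/1983"),
                 Other = c("3/31/1983", "4/30/1983"),
                 stringsAsFactors = FALSE)

single <- class(df["Date"])    # one bracket keeps a one-column data frame
double <- class(df[["Date"]])  # two brackets extract the column vector
single  # "data.frame"
double  # "character"

# Sequential conversion in a loop, using [[ ]] on both sides
for (col in c("Date", "Other")) {
  df[[col]] <- as.Date(df[[col]], format = "%m/%d/%Y")
}
class(df$Date)  # "Date"
```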
If you want to do this for multiple columns in a single data frame, then use the lapply function. Something like:
colNames <- c('StartDate','EndDate')
mydf[colNames] <- lapply( mydf[colNames], as.Date, "%m/%d/%Y" )
I have CSV files which have the Date in the following format:
25-Aug-2004
I want to read it as an "xts" object so as to use the function "periodReturn" in quantmod package.
Can I use the following file for the function?
Symbol Series Date Prev.Close Open.Price High.Price Low.Price
1 XXX EQ 25-Aug-2004 850.00 1198.70 1198.70 979.00
2 XXX EQ 26-Aug-2004 987.95 992.00 997.00 975.30
Please guide me on this.
Unfortunately I can't speak for the ts part, but this is how you can convert your dates to a proper format that can be read by other functions as dates (or time).
You can import your data into a data.frame as usual (see here if you've missed it). Then, you can convert your Date column into a POSIXlt (POSIXt) class using strptime function.
nibha <- "25-Aug-2004" # this should be your imported column
lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C") # temporarily change locale to C if you happen to get NAs
strptime(nibha, format = "%d-%b-%Y")
Sys.setlocale("LC_TIME", lct) #revert back to your locale
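The save, switch, and restore steps can be wrapped in a small helper so the locale is restored even if parsing fails. This is a sketch only: parse_in_c_locale is a hypothetical name, not a function from any package, and it returns a Date via as.Date rather than the POSIXlt that strptime gives.

```r
# Hypothetical helper: parse dates with English month names, then restore the locale
parse_in_c_locale <- function(x, format) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # runs even on error
  Sys.setlocale("LC_TIME", "C")
  as.Date(x, format = format)
}

before <- Sys.getlocale("LC_TIME")
d <- parse_in_c_locale(c("25-Aug-2004", "26-Aug-2004"), "%d-%b-%Y")
after <- Sys.getlocale("LC_TIME")

d
# [1] "2004-08-25" "2004-08-26"
identical(before, after)  # the caller's locale is unchanged
```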
Try this. We get rid of the nuisance columns and specify the format of the time index, then convert to xts and apply the dailyReturn function:
Lines <- "Symbol Series Date Prev.Close Open.Price High.Price Low.Price
1 XXX EQ 25-Aug-2004 850.00 1198.70 1198.70 979.00
2 XXX EQ 26-Aug-2004 987.95 992.00 997.00 975.30"
library(quantmod) # this also pulls in xts & zoo
z <- read.zoo(textConnection(Lines), format = "%d-%b-%Y",
colClasses = rep(c(NA, "NULL", NA), c(1, 2, 5)))
x <- as.xts(z)
dailyReturn(x)
Of course, textConnection(Lines) is just there to keep the example self-contained; in reality it would be replaced with something like "myfile.dat".