convert string data.frame to Date - r

I am wondering why this error occurs. I would like to convert this using brackets as I am making sequential conversions in a loop. And because I just want to be able to do it and understand what is happening.
head(clean.deposit.rates)
Date
1 1/31/1983
2 2/28/1983
3 3/31/1983
4 4/30/1983
5 5/31/1983
6 6/30/1983
class(clean.deposit.rates)
[1] "data.frame"
class(as.Date(clean.deposit.rates[[1]], "%m/%d/%Y"))
[1] "Date"
class(as.Date(clean.deposit.rates$Date, "%m/%d/%Y"))
[1] "Date"
as.Date(clean.deposit.rates["Date"], "%m/%d/%Y")
Error in as.Date.default(clean.deposit.rates["Date"], "%m/%d/%Y") :
do not know how to convert 'clean.deposit.rates["Date"]' to class “Date”

You need to use double brackets, [[. With single brackets, the result is still a data frame. With double brackets, it becomes an atomic vector, which is dispatched to the correct as.Date method.
as.Date(df["Date"], "%m/%d/%Y")
# Error in as.Date.default(df["Date"], "%m/%d/%Y") :
# do not know how to convert 'df["Date"]' to class “Date”
Since df["Date"] is of class data.frame and there is no as.Date.data.frame method, the call is dispatched to as.Date.default. None of the if conditions in as.Date.default is TRUE for a data frame, so execution falls through to the line
stop(gettextf("do not know how to convert '%s' to class %s",
deparse(substitute(x)), dQuote("Date")), domain = NA)
Using df[["Date"]], the column becomes a vector and is passed to either as.Date.character or as.Date.factor depending on the class of the vector, and the desired result is returned.
as.Date(df[["Date"]], "%m/%d/%Y")
# [1] "1983-01-31" "1983-02-28" "1983-03-31" "1983-04-30" "1983-05-31"
# [6] "1983-06-30"
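As a quick check of the explanation above, the following minimal sketch compares the two subsetting forms. The data frame df and its values are made up for illustration and only mirror the question's single Date column.

```r
# Hypothetical data frame mirroring the question's single Date column
df <- data.frame(Date = c("1/31/1983", "2/28/1983", "3/31/1983"),
                 stringsAsFactors = FALSE)

single <- df["Date"]    # single brackets: still a one-column data.frame
double <- df[["Date"]]  # double brackets: the underlying character vector

class(single)  # "data.frame"
class(double)  # "character"

# Only the vector reaches a method that knows how to parse strings
dates <- as.Date(double, format = "%m/%d/%Y")
class(dates)   # "Date"
```

Checking class() on the object you are about to pass to as.Date is usually the fastest way to see which method will be used.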

If you want to do this for multiple columns in a single data frame, then use the lapply function. Something like:
colNames <- c('StartDate','EndDate')
mydf[colNames] <- lapply( mydf[colNames], as.Date, "%m/%d/%Y" )
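A self-contained version of the same idea, as a sketch: the data frame mydf and its column values are invented here, and only the two named columns are converted while the others are left untouched.

```r
# Hypothetical data frame with two date columns stored as strings
mydf <- data.frame(
  id        = 1:2,
  StartDate = c("1/31/1983", "2/28/1983"),
  EndDate   = c("3/31/1983", "4/30/1983"),
  stringsAsFactors = FALSE
)

colNames <- c("StartDate", "EndDate")

# lapply() visits each selected column as a vector, so as.Date() receives
# a character vector rather than a data frame
mydf[colNames] <- lapply(mydf[colNames], as.Date, format = "%m/%d/%Y")

sapply(mydf, class)  # id stays integer, the two date columns become Date
```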

Related

data frame with mixed date format

I would like to convert all the mixed date formats into a single format, for example d-m-y.
Here is the data frame:
x <- data.frame("Name" = c("A","B","C","D","E"), "Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
I have tried using the code below, but it gives NAs
newdateformat <- as.Date(x$Birthdate,
format = "%m%d%y", origin = "2020-6-25")
newdateformat
Then I tried using parse, but it also gives NAs which means it failed to parse
require(lubridate)
parse_date_time(my_data$Birthdate, orders = c("ymd", "mdy"))
[1] NA NA "2001-09-12 UTC" NA
[5] "2005-02-18 UTC"
I also couldn't work out what format the first date in the data frame, "36085.0", is in.
I did find this code, but I still couldn't understand what the number means or what "origin" means
dates <- c(30829, 38540)
betterDates <- as.Date(dates,
origin = "1899-12-30")
P.S. I'm quite new to R, so I'd appreciate a simple explanation. Thank you!
You should parse each format separately. For each format, select the relevant rows with a regular expression and transform only those rows, then move on to the next format. I'll give the answer with data.table instead of data.frame because I've forgotten how to use data.frame.
library(lubridate)
library(data.table)
x = data.table("Name" = c("A","B","C","D","E"),
"Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
# or use setDT(x) to convert an existing data.frame to a data.table
# handle dates like "2001-sep-12" and "2020-6-25"
# this regex matches strings beginning with four numbers and then a dash
x[grepl('^[0-9]{4}-',Birthdate),Birthdate1:=ymd(Birthdate)]
# handle dates like "36085.0": days since 1904 (or 1900)
# see https://learn.microsoft.com/en-us/office/troubleshoot/excel/1900-and-1904-date-system
# this regex matches strings that only have numeric characters and .
x[grepl('^[0-9\\.]+$',Birthdate),Birthdate1:=as.Date(as.numeric(Birthdate),origin='1904-01-01')]
# assume the rest are like "Feb-18-2005" and "05/27/84" and handle those
x[is.na(Birthdate1),Birthdate1:=mdy(Birthdate)]
# result
> x
Name Birthdate Birthdate1
1: A 36085.0 2002-10-18
2: B 2001-sep-12 2001-09-12
3: C Feb-18-2005 2005-02-18
4: D 05/27/84 1984-05-27
5: E 2020-6-25 2020-06-25
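On the question of what the number and "origin" mean: a serial date is a count of days, and origin is the calendar date that counts as day zero. The sketch below uses base R only and shows how the same serial number lands on different dates under the two spreadsheet conventions. Which convention produced the original file is not known from the question, so both are shown.

```r
# The serial date from the question, converted from text to a number
serial <- as.numeric("36085.0")

# origin is "day zero"; the result is origin + serial days
date_1900 <- as.Date(serial, origin = "1899-12-30")  # usual origin for the 1900 date system
date_1904 <- as.Date(serial, origin = "1904-01-01")  # origin for the 1904 date system

date_1900  # "1998-10-17"
date_1904  # "2002-10-18"
```

The two results differ by about four years, so it is worth checking one known date from the source spreadsheet before settling on an origin.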

R: How do I convert a dataframe of strings into POSIXt objects?

I have a dataframe of strings representing times, such as:
times <- structure(list(exp1 = c("17:19:04 \r", "17:28:53 \r", "17:38:44 \r"),
exp2 = c("17:22:04 \r", "17:31:53 \r", "17:41:45 \r")),
row.names = c(NA, 3L), class = "data.frame")
If I run strptime() on a single element of my dataframe times, it converts it into a nice POSIXt object:
strptime(times[1,1], '%H:%M:%S')
[1] "2020-02-19 17:19:04 GMT"
Great, so now I'd like to convert my whole dataframe times into this format.
I cannot seem to find the solution to do this smoothly.
A few of the things I have tried so far:
strptime(times, '%H:%M:%S') # generates NA
strftime(times, '%H:%M:%S') # Error: do not know how to convert 'x' to class “POSIXlt”
apply(times, 2, function(x) strftime(x, '%H:%M:%S')) # Error: character string is not in a standard unambiguous format
The closest I got to what I want is:
apply(times, 2, function(x) strptime(x, '%H:%M:%S'))
It generates a messy list. I can probably find a way to use it, but there must be a more straightforward way?
You could use lapply.
times[] <- lapply(times, strptime, '%H:%M:%S')
# exp1 exp2
# 1 2020-02-19 17:19:04 2020-02-19 17:22:04
# 2 2020-02-19 17:28:53 2020-02-19 17:31:53
# 3 2020-02-19 17:38:44 2020-02-19 17:41:45
Note: apply also works.
times[] <- apply(times, 2, function(x) strptime(x, '%H:%M:%S'))
The trick is to replace the columns with [] <- (rather than overwriting the data frame with a list); in this case it is shorthand for times[1:2] <- lapply(times[1:2], ·).
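The difference between replacing the columns and overwriting the object can be sketched as follows. The data are a shortened, cleaned copy of the question's example. Wrapping strptime in as.POSIXct and fixing tz = "UTC" are choices made for this sketch, not part of the answer above; POSIXct is used because it stores one number per element, which sits more comfortably in a data frame column.

```r
# Shortened, cleaned copy of the question's data
times <- data.frame(exp1 = c("17:19:04", "17:28:53"),
                    exp2 = c("17:22:04", "17:31:53"),
                    stringsAsFactors = FALSE)

# Helper for this sketch: parse one column and return POSIXct
parse_col <- function(col) as.POSIXct(strptime(col, format = "%H:%M:%S", tz = "UTC"))

as_list <- lapply(times, parse_col)
class(as_list)     # "list": `times <- as_list` would lose the data frame

times[] <- as_list # replaces the columns, keeps the data frame
class(times)       # "data.frame"
class(times$exp1)  # "POSIXct" "POSIXt"
```

Because no date is given, strptime fills in the current date, so only the time-of-day part is meaningful here.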

R: What are dates in a dates vector: dates or numeric values? (difference between x[i] and i)

Could anyone explain please why in the first loop each element of my dates vector is a date while in the second each element of my dates vector is numeric?
Thank you!
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
class(x)
# Loop 1 - each element is a Date:
for (i in seq_along(x)) print(class(x[i]))
# Loop 2 - each element is numeric:
for (i in x) print(class(i))
The elements are Date, the first loop is correct.
Unfortunately, R does not behave the way the second loop suggests it should. I believe the issue is that the for (i in x) syntax bypasses the Date methods for accessors like [, which it can do because S3 classes in R are very thin and don't force you to use their intended interfaces. This can be confusing because something like for (i in 1:4) print(i) works directly, since numeric is a base vector type. Date is an S3 class layered on top of a numeric vector, so the loop sees only the underlying numbers. To see the numeric values that are printed in the second loop, you can run this:
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
for (i in x) print(i)
#> [1] 17532
#> [1] 17533
#> [1] 17533
#> [1] 17657
which is giving you the same thing as the unclassed version of the Date vector. These numbers are the days since the beginning of Unix time, which you can also see below if you convert them back to Date with that origin.
unclass(x)
#> [1] 17532 17533 17533 17657
as.Date(unclass(x), "1970-01-01")
#> [1] "2018-01-01" "2018-01-02" "2018-01-02" "2018-05-06"
So I would stick to using the proper accessors for any S3 vector types as you do in the first loop.
When you run:
for (i in seq_along(x)) print(class(x[i]))
Here i iterates over the indices of x, and x[i] subsets the vector through the Date method for [, so each time you get the class of a one-element Date vector.
However, when you run:
for (i in x) print(class(i))
Here i takes each underlying value of x directly, with the class attribute dropped. From ?Date:
Dates are represented as the number of days since 1970-01-01
Which is the reason why you get numeric as your class.
Moreover, if you'll use print() for each loop you'll get dates and numbers:
for (i in seq_along(x)) print(x[i])
[1] "2018-01-01"
[1] "2018-01-02"
[1] "2018-01-02"
[1] "2018-05-06"
and
for (i in x) print(i)
[1] 17532
[1] 17533
[1] 17533
[1] 17657
Lastly, if you want to test R's logic we can do something like that:
x[1] - as.Date("1970-01-01")
Taking the first element of x ("2018-01-01") and subtracting "1970-01-01", which is the origin date, gives:
Time difference of 17532 days
If you look at ?'for', you'll see that for(var in seq) is only defined when seq is "An expression evaluating to a vector", and is.vector(x) is FALSE. So the documentation says (maybe not so clearly) that the behavior here is undefined, which is why the behavior is unexpected.
As joran mentions, as.vector(x) returns a numeric vector, same as unclass(x) mentioned by Calum You.
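The behaviours discussed in these answers can be compared side by side. This is a small sketch in base R; the third variant, converting with as.list() first, is an addition here rather than something the answers propose, and relies on the Date method for as.list keeping the class on each element.

```r
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-05-06"))

# Looping over the values: the class attribute is dropped
by_value <- character(0)
for (d in x) by_value <- c(by_value, class(d))

# Looping over indices: x[i] uses the Date method for `[`
by_index <- character(0)
for (i in seq_along(x)) by_index <- c(by_index, class(x[i]))

# Converting to a list first: each element is a length-one Date
by_list <- vapply(as.list(x), class, character(1))

by_value  # "numeric" "numeric" "numeric"
by_index  # "Date" "Date" "Date"
by_list   # "Date" "Date" "Date"
```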

Why apply() converts date objects to numeric objects? [duplicate]

Why does apply() convert my Date objects to numeric before calling the user function?
apply(matrix(seq(as.Date("2010-01-01"), as.Date("2010-01-05"), 1)), 1, function(x) { return(class(x)) })
[1] "numeric" "numeric" "numeric" "numeric" "numeric"
And why doesn't as.Date() have the origin parameter set to "1970-01-01" by default?
> as.Date(apply(matrix(seq(as.Date("2010-01-01"), as.Date("2010-01-05"), 1)), 1, function(x) { return(x) }))
Error in as.Date.numeric(apply(matrix(seq(as.Date("2010-01-01"), as.Date("2010-01-05"), :
'origin' must be supplied
> as.Date(apply(matrix(seq(as.Date("2010-01-01"), as.Date("2010-01-05"), 1)), 1, function(x) { return(x) }), origin="1970-01-01")
[1] "2010-01-01" "2010-01-02" "2010-01-03" "2010-01-04" "2010-01-05"
There is a function seq.Date in the base package that will allow you to make a sequence for a Date object. But a matrix will still only take atomic vectors, so you will either just have to call as.Date() again whenever you need to use the Date, or just store it in a dataframe because that can hold "Date" class values.
As far as the default parameter for as.Date, I don't think it makes sense to have 1970 set as the default. What if people are analyzing data from before that date for whatever possible reason?
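The two options mentioned above (converting back after the matrix step, or keeping the dates in a data frame) might look like this. It is a base R sketch; origin is passed explicitly so the call does not depend on whether the installed R version supplies a default.

```r
d <- seq(as.Date("2010-01-01"), as.Date("2010-01-05"), by = "day")

# matrix() keeps only the underlying numbers, so the Date class is lost
m <- matrix(d)
class(m[1, 1])   # "numeric"

# Option 1: convert back explicitly, giving the origin the numbers count from
restored <- as.Date(as.vector(m), origin = "1970-01-01")

# Option 2: keep the dates in a data frame column, which preserves the class
df <- data.frame(day = d)
class(df$day)    # "Date"
```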

Convert data from csv file into "xts" object

I have CSV files in which the date is in the following format:
25-Aug-2004
I want to read it as an "xts" object so as to use the function "periodReturn" in quantmod package.
Can I use the following file for the function?
Symbol Series Date Prev.Close Open.Price High.Price Low.Price
1 XXX EQ 25-Aug-2004 850.00 1198.70 1198.70 979.00
2 XXX EQ 26-Aug-2004 987.95 992.00 997.00 975.30
Guide me with the same.
Unfortunately I can't speak for the ts part, but this is how you can convert your dates to a proper format that can be read by other functions as dates (or time).
You can import your data into a data.frame as usual. Then you can convert your Date column to class POSIXlt (POSIXt) using the strptime function.
nibha <- "25-Aug-2004" # this should be your imported column
lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C") # temporarily change locale to C if you happen to get NAs
strptime(nibha, format = "%d-%b-%Y")
Sys.setlocale("LC_TIME", lct) #revert back to your locale
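Applied to a whole column rather than a single string, the same approach could look like the sketch below. The data frame prices is a hypothetical stand-in for the imported CSV with only two of its columns, and as.Date is used instead of strptime because the values carry no time of day.

```r
# Hypothetical stand-in for the imported CSV (two columns only)
prices <- data.frame(Date       = c("25-Aug-2004", "26-Aug-2004"),
                     Prev.Close = c(850.00, 987.95),
                     stringsAsFactors = FALSE)

# %b matches abbreviated month names in the current locale,
# so switch to the C locale while parsing English names
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
prices$Date <- as.Date(prices$Date, format = "%d-%b-%Y")
Sys.setlocale("LC_TIME", old_locale)  # restore the previous locale

class(prices$Date)  # "Date"
```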
Try this. We get rid of the nuisance columns and specify the format of the time index, then convert to xts and apply the dailyReturn function:
Lines <- "Symbol Series Date Prev.Close Open.Price High.Price Low.Price
1 XXX EQ 25-Aug-2004 850.00 1198.70 1198.70 979.00
2 XXX EQ 26-Aug-2004 987.95 992.00 997.00 975.30"
library(quantmod) # this also pulls in xts & zoo
z <- read.zoo(textConnection(Lines), format = "%d-%b-%Y",
colClasses = rep(c(NA, "NULL", NA), c(1, 2, 5)))
x <- as.xts(z)
dailyReturn(x)
Of course, textConnection(Lines) is just to keep the example self contained and in reality would be replaced with something like "myfile.dat".

Resources