Add leading Zero to a Date - r

I have dates in a character vector. I cannot easily convert to a date vector using as.Date, because not all of the strings have the form mm/dd/yyyy, thus giving me the ambiguous date error. Some strings have the form m/dd/yyyy (months 1:9).
Here's part of the vector:
data$Date <- c("8/26/2014","3/10/2014","9/25/2014","11/12/2014","8/4/2015")
Indicator to flag which strings need a leading zero:
data$date <- grepl("[0-9]{2}/[0-9]{2}/[0-9]{4}", data$Date)
Attempt to add zeros through a conditional:
data$Date<-ifelse(data$date == "FALSE", paste0("0", data$Date), data$Date)
Doesn't work (I'm not familiar with paste). Any concise solutions to add a leading zero to single-digit months (m/dd/yyyy)? I'm guessing gsub or sub? I need all the strings to be in the form mm/dd/yyyy so I can convert to a date vector.

data <- data.frame(Date=c("8/26/2014","3/10/2014","9/25/2014","11/12/2014","8/4/2015"))
as.Date(data$Date,format="%m/%d/%Y")
works fine for me with your data. Output is
"2014-08-26" "2014-03-10" "2014-09-25" "2014-11-12" "2015-08-04"
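If the zero-padded strings themselves are still wanted, the following is a minimal sketch using the vector from the question. The names dates, parsed, padded and padded_regex are illustrative, not from the original post. It shows two routes: parse and re-format, or pad single digits directly with a regular expression.

```r
dates <- c("8/26/2014", "3/10/2014", "9/25/2014", "11/12/2014", "8/4/2015")

# Route 1: parse, then print back with zero-padded month and day
parsed <- as.Date(dates, format = "%m/%d/%Y")
padded <- format(parsed, "%m/%d/%Y")

# Route 2: put a 0 in front of every digit that stands alone between
# word boundaries (single-digit months and days; years are untouched)
padded_regex <- gsub("\\b(\\d)\\b", "0\\1", dates, perl = TRUE)

print(padded)
print(padded_regex)
```

Route 1 also validates the dates, since anything that fails to parse becomes NA; route 2 only edits the text.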

Related

R: Daily data to monthly

I have a large xts object with multiple variables. The index is daily, in that it corresponds to exact days, but there is only one observation for each variable in a month. Is there a way to drop the day from the index and keep only year-month?
To illustrate my problem: I have var1 with an observation on 2011-06-28 and var2 with an observation on 2011-06-30. I would like to index both as 2011-06.
Thanks
Alternatively, you could "tell" R that you are using dates of a certain format with the as.Date() function and then use format() to change it to the format you desire.
Like this:
dates=c("2011-06-28","2011-06-29","2011-06-30","2011-07-1") #test string with dates in original format
dates2 <- format(as.Date(dates,"%Y-%m-%d"), format="%Y-%m") #changing the "%Y-%m-%d" format to the desired "%Y-%m"
print(dates2)
Edit: If you only want to change the index of a xts:
indexFormat(xts_object) <- "%Y-%m"
Cheers
Chris
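To go one step further and end up with a single row per month, the format() idea above can be combined with aggregate() in base R. This is only a sketch on a plain data frame, not on an xts object; the data frame daily, its values, and the helper first_obs are made up for illustration.

```r
# Hypothetical daily-indexed data: each variable has one observation per month
daily <- data.frame(
  date = as.Date(c("2011-06-28", "2011-06-30", "2011-07-15", "2011-07-29")),
  var1 = c(1.5, NA, 2.5, NA),
  var2 = c(NA, 10, NA, 20)
)

# Year-month key built with format(), as in the answer above
month_key <- format(daily$date, "%Y-%m")

# Illustrative helper: first non-missing value within a group
first_obs <- function(x) x[!is.na(x)][1]

monthly <- aggregate(daily[c("var1", "var2")],
                     by = list(month = month_key),
                     FUN = first_obs)
print(monthly)
```

Both June observations land on the 2011-06 row, which is what the question asks for.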
You can probably do this:
Use gsub (replace a pattern with whatever you want) with regex (a sequence of characters that define a search pattern in e.g. a string).
The pattern is done with regex, which has lots of metacharacters that allow you to do more advanced things. The dot (.) is a wildcard and the $ anchors it at the back. So the pattern is basically any 3 characters before the end and replace them with nothing.
your_object<-c("2011-06-28","2011-06-30")
gsub(pattern = "...$", replacement = "", x = your_object)
Here is a guide for using gsub with regex (http://uc-r.github.io/regex).
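Since the dot matches any character, the pattern above strips the last three characters whatever they are. A slightly stricter sketch, assuming the strings are always in YYYY-MM-DD form, matches only a trailing hyphen plus two digits; substr() gives the same result without a regex. The names by_regex and by_position are illustrative.

```r
your_object <- c("2011-06-28", "2011-06-30")

# Remove only a trailing "-dd"; strings without it are left unchanged
by_regex <- sub("-[0-9]{2}$", "", your_object)

# Or keep the first seven characters ("YYYY-MM")
by_position <- substr(your_object, 1, 7)

print(by_regex)
print(by_position)
```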

How to change dataframe R column from DD/MM/YYYY to YYYY/MM/DD without returning NAs

I have a dataframe that looks like this (see picture below). I want to change the date from DD/MM/YYYY to YYYY/MM/DD but for some reason it returns "NA" values! I think it has to do with the time values behind the date (I do not need those values).
The code I used was this (supposing DF is the data frame)
DF[,1] <- as.Date(DF[,1], format = "%d-%m-%Y")
Gregor Thomas gave me the answer: The format you show in the picture has slashes /, but the format string you use has dashes -. Try format = "%d/%m/%Y"
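A small sketch of that fix follows. The original data is only shown in a picture, so the values and the column name when are assumptions. as.Date() stops reading once the format is satisfied, so the trailing time values are ignored, and format() then writes the date as YYYY/MM/DD.

```r
# Hypothetical stand-in for the data in the picture
DF <- data.frame(when = c("25/12/2019 14:30", "01/02/2020 08:05"),
                 stringsAsFactors = FALSE)

# Dashes in the format do not match the slashes in the data: all NA
wrong <- as.Date(DF$when, format = "%d-%m-%Y")

# Slashes match; the time after the date is ignored
parsed <- as.Date(DF$when, format = "%d/%m/%Y")

# Write back as YYYY/MM/DD text
DF$when <- format(parsed, "%Y/%m/%d")
print(DF)
```

Note that format() returns character strings; keep parsed if a real Date column is needed for arithmetic or sorting.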

Formatting dates using a condition

I am using "R" to format a character variable that has two different kinds of date formats (MM-DD-YYYY & YYYY-MM-DD). The second is an excel origin date.
DateVar <- c("12-07-2017", "43229", "43137", "03-27-2018")
I created vector using grepl to identify both types and then a for loop to apply the as.date function to only the "excel origin dates".
indicator <- !grepl("-", DateVar)
for(i in indicator == TRUE){
as.date(DateVar, origin = "1899-12-30")
It is not working for me however, so I am looking if someone can point me in the right direction.
Thanks.
Couple of things: The for loop is unnecessary - just subset DateVar with [indicator]. Second, it's as.Date, not as.date (note the "D"). Third, since it's a character vector, you need to pass the origin numbers through as.integer for as.Date to be able to work with them:
as.Date(as.integer(DateVar[indicator]), origin = "1899-12-30")
Or, without the intervening indicator assignment:
as.Date(as.integer(DateVar[!grepl("-",DateVar)]), origin = "1899-12-30")
[1] "2018-05-09" "2018-02-06"
If you wish to put these dates back into DateVar, subset with [indicator] again on the left-hand side of the assignment:
DateVar[indicator]<-format(as.Date(as.integer(DateVar[indicator]), origin = "1899-12-30"), "%m-%d-%Y")
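Putting the pieces together, here is a sketch that converts both kinds of entries into one Date vector instead of writing strings back. It assumes the serial numbers use the Excel origin given above and that the remaining entries are MM-DD-YYYY; the names is_serial and result are illustrative.

```r
DateVar <- c("12-07-2017", "43229", "43137", "03-27-2018")

# TRUE for Excel serial numbers, FALSE for MM-DD-YYYY strings
is_serial <- !grepl("-", DateVar)

# Start with an all-NA Date vector and fill each group separately
result <- rep(as.Date(NA), length(DateVar))
result[is_serial] <- as.Date(as.integer(DateVar[is_serial]),
                             origin = "1899-12-30")
result[!is_serial] <- as.Date(DateVar[!is_serial], format = "%m-%d-%Y")

print(result)
```

Keeping the result as a Date vector avoids a second round of parsing later.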

Fail to extract the date from the timestamp by using as.date and if.else

I have read a CSV file in as mydata. An existing column called inbound_date contains data like:
NULL
2017-06-24 16:47:35
2017-06-24 16:47:35
I want to create a new column that extracts the day from this column. I have tried the code below, but it failed:
mydata$inbound_day<-ifelse(is.null(mydata$inbound_date),"null",as.Date(mydata$inbound_date,format = "%Y-%m-%d"))
The new column inbound_day has been added, but it shows as NA in the column for all the rows.
Can someone help me see which part of the code is wrong? Thanks!
There are two things at play here.
First, the behaviour of ifelse: it returns as many values as the length of the condition. If the condition has only one value, ifelse too will return a single value.
Second, the behaviour of is.null is not the same as that of is.na. Unlike is.na, is.null(mydata$inbound_date) checks the whole of mydata$inbound_date as a single object, so you get just one value in return, which is FALSE.
The combined effect of these two things is that you are only getting the as.Date value for the first item as the result, and it is a single NA. What's more, this NA is then recycled to fill the whole column with NAs.
Solution -- Use is.na where you are using is.null. It will return multiple values and the thing will work as expected.
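The points above can be checked with a short sketch. The data frame is a hypothetical stand-in for the CSV, and it assumes the missing entries were read in as the literal string "NULL". In that case ifelse is not needed at all: as.Date is vectorized and returns NA for anything that does not match the format.

```r
# Hypothetical stand-in for the CSV data
mydata <- data.frame(
  inbound_date = c("NULL", "2017-06-24 16:47:35", "2017-06-24 16:47:35"),
  stringsAsFactors = FALSE
)

# is.null looks at the column as one object: a single FALSE
null_check <- is.null(mydata$inbound_date)

# Vectorized conversion; "NULL" does not match the format and becomes NA
mydata$inbound_day <- as.Date(mydata$inbound_date, format = "%Y-%m-%d")

print(null_check)
print(mydata)
```

Dropping ifelse also sidesteps a second pitfall: ifelse strips the Date class, so dates that pass through it come back as plain numbers.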
You have to specify the time as well.
x <- as.POSIXlt("2017-06-24 16:47:35", format = "%Y-%m-%d %H:%M:%S")
format(x, "%Y-%m-%d")
[1] "2017-06-24"
Using lubridate to parse instead of as.Date, then extracting the day:
library(lubridate)
x <- ymd_hms("2017-06-24 16:47:35")
format(x, "%d")

as.posixct when applied for an element in the data frame returns a number instead of date and time

Here is the existing data:
I have 2 columns of data. Each row of the first column has data whereas only certain rows of the second column has data (others being blank). I want to convert the format of the data with the help of as.POSIXct(). For the first column I used the following code (I named the data frame as 'mrkt'):
mrkt[1]<-lapply(mrkt[1],as.POSIXct)
This worked well in terms of converting the existing data to the right format
For the second column the above code won't work as the as.POSIXct() cannot address "" values. So I wrote a loop instead:
for (i in 1:dim(mrkt[2])[1]){
if (!as.character(mrkt[[2]][i])==""){
mrkt$open_time[i]<-as.POSIXct(mrkt$open_time[i])
}
}
However this is giving me weird outputs in the form of a number. How can I avoid that? Here is the output:
An easy way to do this would be to do this:
library(plyr)   # provides mapvalues()
library(dplyr)  # provides mutate() and %>%
mrkt %>%
  mutate(send_time = send_time %>%
           as.POSIXct,
         open_time = open_time %>%
           mapvalues("", NA) %>%  # turn blanks into NA before converting
           as.POSIXct)
This is due to implicit type coercion away from POSIXct. It only happens in the loop because the existing vector already has a type, and a single value assigned into it is coerced to that type, which leaves only the underlying number. When the whole vector is replaced, a new vector is created with the right type.
The simplest solution is to use as.POSIXct(strptime(mrkt$open_time, format=yourformat)), with a correctly defined format; see ?strptime for the formats. This is vectorized, and strptime handles empty strings correctly (returning NA).
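Both effects can be reproduced with a small sketch. The data, the format string fmt, and the UTC time zone are assumptions chosen for illustration; the real column may be a factor or use another format.

```r
# Hypothetical stand-in for the second column, with a blank entry
mrkt <- data.frame(
  open_time = c("2017-06-24 16:47:35", "", "2017-06-25 09:00:00"),
  stringsAsFactors = FALSE
)
fmt <- "%Y-%m-%d %H:%M:%S"

# Element-wise assignment: the POSIXct value is coerced to the type of
# the existing character vector, leaving seconds since 1970 as text
loop_version <- mrkt$open_time
loop_version[1] <- as.POSIXct(loop_version[1], format = fmt, tz = "UTC")

# Whole-column conversion: blanks become NA and the class is kept
converted <- as.POSIXct(strptime(mrkt$open_time, format = fmt, tz = "UTC"))

print(loop_version)
print(converted)
```

Assigning converted back with mrkt$open_time <- converted replaces the whole column, so the POSIXct class survives.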
