I am trying to convert date values into the Date class using lubridate. These are the dates in the "wrong" format:
wrong_format_date1 <- "01-25-1999"
wrong_format_date2 <- 25012005
wrong_format_date3 <- "2005-05-31"
But I would like them in this format:
"1999-01-25"
"2005-01-25"
"2005-05-31"
Can someone please assist me with this?
Try this. Use parse_date_time() from lubridate, which lets you supply a vector of possible formats so that your strings get parsed as dates. Here is the code:
library(lubridate)
#Data
wrong_format_date1 <- "01-25-1999"
wrong_format_date2 <- 25012005
wrong_format_date3 <- "2005-05-31"
#Dataframe
df <- data.frame(v1 = c(wrong_format_date1, wrong_format_date2, wrong_format_date3), stringsAsFactors = FALSE)
#Code
df$Date <- as.Date(parse_date_time(df$v1, c("mdY", "dmY","Ymd")))
Output:
          v1       Date
1 01-25-1999 1999-01-25
2   25012005 2005-01-25
3 2005-05-31 2005-05-31
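If you would rather not depend on lubridate, the same "try several formats" idea can be sketched in base R. This is only an illustration: parse_multi() is a hypothetical helper name, and it simply keeps the first format that produces a non-NA date, so the order of the formats decides ambiguous inputs.

```r
# Hypothetical helper: try each strptime-style format in turn and keep
# the first non-NA result for every element.
parse_multi <- function(x, formats) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)                       # elements still unparsed
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

v1 <- c("01-25-1999", "25012005", "2005-05-31")
res <- parse_multi(v1, c("%m-%d-%Y", "%d%m%Y", "%Y-%m-%d"))
format(res)
```

A string such as "01-02-2005" would match the first format here (month-day-year), so put the format you trust most first.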
I have dates listed as "X5.13.1996", representing May 13th, 1996. The class for the date column is currently a character.
When I use mdy from lubridate, it keeps returning NA. Is there code I can use to get rid of the "X" so that mdy works? Is there anything else I can do?
You can use substring(date_variable, 2) to drop the first character from the string.
substring("X5.13.1996", 2)
[1] "5.13.1996"
To convert a variable (i.e., column) in your data frame:
library(dplyr)
library(lubridate)
dates <- data.frame(
  dt = c("X5.13.1996", "X11.15.2021")
)
dates %>%
  mutate(converted = mdy(substring(dt, 2)))
or, without dplyr:
dates$converted <- mdy(substring(dates$dt, 2))
Output:
           dt  converted
1  X5.13.1996 1996-05-13
2 X11.15.2021 2021-11-15
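As a variation, if not every value is guaranteed to start with "X", a minimal base-R sketch could strip only a leading "X" with sub() and then parse with as.Date. The third value below is made up for illustration.

```r
# "7.4.2000" is a hypothetical value without the "X" prefix
dt <- c("X5.13.1996", "X11.15.2021", "7.4.2000")

# Remove an "X" only when it is the first character
cleaned <- sub("^X", "", dt)

# Month.day.year with "." as the separator
converted <- as.Date(cleaned, format = "%m.%d.%Y")
format(converted)
```

Unlike substring(dt, 2), this leaves values without the prefix untouched instead of cutting off their first digit.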
While trying to analyze a dataset from Kaggle, I ran into some conversion issues. I want to retrieve an ISO date like "2022-04-31" from "4/31/2022 8:26".
My first idea was a classical programming approach with a loop and if-logic, which is way too much effort. The problem here is the missing leading zeroes.
The second approach was to split the column's string values with str_split and then put them back together:
################################################################################
# START OF SCRIPT
################################################################################
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(lubridate))
################################################################################
# ETL
################################################################################
#---->> https://www.kaggle.com/carrie1/ecommerce-data
raw_data <- read.csv("data 2.csv", sep = ",")
clean_data <- raw_data %>% drop_na()
clean_data <- clean_data[!duplicated(clean_data[,1:8]),]
#
## date conversion
#
split <- str_split(clean_data$InvoiceDate, "/") %>% plyr::ldply(,data.frame)
colnames(split) <- c("month", "day", "year")
split$year <- substr(split$year, 1,4)
######
filled_day = as.Date(split$day, format = "%d")
str_day <- substr(filled_day, 9,10)
For the day column this seems to work, but I am failing to convert the month back with either base R or lubridate. Maybe my approach is either too complex or too simple. Please share your ideas with me.
You can use as.Date with the format %m/%d/%Y.
as.Date("4/30/2022 8:26", "%m/%d/%Y")
#[1] "2022-04-30"
But this will work only for valid dates.
as.Date("4/31/2022 8:26", "%m/%d/%Y")
#[1] NA
as there is no 31 April.
Another way is to use sub and gsub, which rearrange the string without testing whether the date is valid:
gsub("\\b(\\d)\\b", "0\\1"
, sub("(\\d+)/(\\d+)/(\\d+).*", "\\3-\\1-\\2", "4/31/2022 8:26"))
#[1] "2022-04-31"
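The two approaches can also be combined: build the ISO string with the regular expressions, and use as.Date only to flag which rows are real calendar dates. This is a rough sketch with made-up input values; perl = TRUE is my addition so that \\b and \\d are handled by PCRE.

```r
# Example inputs (the second one is an impossible date)
x <- c("4/30/2022 8:26", "4/31/2022 8:26", "12/1/2010 8:26")

# Reorder month/day/year to year-month-day, then zero-pad single digits
iso <- gsub("\\b(\\d)\\b", "0\\1",
            sub("(\\d+)/(\\d+)/(\\d+).*", "\\3-\\1-\\2", x, perl = TRUE),
            perl = TRUE)

# as.Date returns NA for dates that do not exist, so use it as a validity flag
valid <- !is.na(as.Date(x, format = "%m/%d/%Y"))

data.frame(x, iso, valid)
```

That way the invalid rows are kept as text but can be filtered or inspected before any date arithmetic.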
My problem is that I am importing a CSV file, and trying to get R to recognize the date column as dates and format them as such.
So far I have managed to replace the format seen below, "#yyyy-mm-dd#", with the integer date value in R.
But when I check the class before and after the transformation it still says "character".
I need the column to be recognized as a date class so that I can use it for forecasting. But
DemandCSV <- read_csv("C:/Users/pth/Desktop/Care/Demand.csv")
nrow <- nrow(DemandCSV)
for(i in 1:nrow){
DemandCSV[i,1] <-as.Date(ymd(substr(DemandCSV[i,1], 2, 11)))
}
DemandCSV[,1] <- format(DemandCSV[,1], "%Y-%m-%d")
Figured out an inelegant solution (turns out it was not a solution)
DemandCSV <- read_csv("C:/Users/pth/Desktop/Care/Demand.csv")
nrow <- nrow(DemandCSV)
for(i in 1:nrow){
DemandCSV[i,1] <-as.Date(ymd(substr(DemandCSV[i,1], 2, 11)))
DemandCSV[i,1] <- format(as.Date(as.numeric(DemandCSV[i,1],origin = "01-01-1970")), "%Y-%m-%d")}
DemandCSV %>% pad %>% fill_by_value(0)
Does including the "#" in the format string solve your problem?
data <- c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#")
a <- as.Date(data,format="#%Y-%m-%d#")
or
library(dplyr)
DemandCSV <- data.frame(date=
c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#"))
# assign the result, otherwise the converted column is only printed
DemandCSV <- mutate_at(DemandCSV,"date",as.Date,format="#%Y-%m-%d#")
Maybe simpler to substitute out the # and then rely on anydate from the anytime package.
Demo:
R> data <- c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#")
R> anytime::anydate(gsub("#", "", data))
[1] "2019-09-23" "2019-09-24" "2019-09-25"
R>
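For what it's worth, the reason the class stays "character" in the question is that format() always returns character strings. A minimal sketch, assuming a small stand-in data frame (the demand column is invented for illustration), converts the whole column in one vectorized step and checks the class before and after:

```r
# Stand-in for the imported CSV; "demand" is a hypothetical column
DemandCSV <- data.frame(
  date   = c("#2019-09-23#", "#2019-09-24#", "#2019-09-25#"),
  demand = c(10, 12, 9),
  stringsAsFactors = FALSE
)

class_before <- class(DemandCSV$date)   # "character"

# Convert the whole column at once; the "#" characters are part of the format
DemandCSV$date <- as.Date(DemandCSV$date, format = "#%Y-%m-%d#")

class_after <- class(DemandCSV$date)    # "Date"

# format() is only for display: its result is character again
class(format(DemandCSV$date, "%Y-%m-%d"))
```

So keep the Date column for forecasting and call format() only when printing or exporting.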
Is it possible to use the dygraphs package with my data, where the date is given as a day:
Day KP1 KP2 KP3
02.01.2007 12345 54564 5156156
03.01.2007
I have tried many ways to convert the day into a format that dygraphs accepts, but nothing works. For example this one:
Data$Day <- strptime(myData2[,1],format="%d.%m.%Y")
Data <- xts(Data[,1], order.by = myData[,1])
My question is: is it possible to use dygraphs for data that is in this day format?
Thanks. Greetings, R007
Are you looking for this:
library(xts)
library(dygraphs)
Data <- read.table(text="
Day KP1 KP2 KP3
02.01.2007 12345 54564 5156156
03.01.2007 10346 50565 5156140
04.01.2007 9346 44565 5156140",
header = TRUE)
Data$Day <- strptime(Data[,1],format="%d.%m.%Y")
Data <- xts(Data[,2:4], order.by = Data[,1])
dygraph(data=Data)
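Since the data are daily, a plain Date vector may be enough for the index instead of the POSIXlt values that strptime returns (xts accepts Date objects in order.by). A small base-R sketch of just the parsing step, using the day strings from the example:

```r
day_chr <- c("02.01.2007", "03.01.2007", "04.01.2007")

# Day.month.year with "." as the separator
day <- as.Date(day_chr, format = "%d.%m.%Y")

class(day)
format(day)
```

The resulting vector could then be passed as order.by in the xts() call above.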
Please help: I have a CSV file from a large database with a date column that contains various date formats, like 20080408 or 2008/04/08 or 08/04/2008. How do I change these to one format, dd/mm/yyyy, in R?
You can do it with failure tests via lubridate mdy and ymd conversions as well (hence the suppressWarnings() calls). I don't think you're going to be able to ensure proper handling of things like "08/04/2008" if 08 is supposed to be the "day" component, though, given that the functions can't read minds.
library(lubridate)
dat <- c("20080408", "2008/04/08", "08/04/2008")
dat.1 <- unlist(lapply(dat, function(x) {
  suppressWarnings(res <- mdy(x))
  if (is.na(res)) { suppressWarnings(res <- ymd(x)) }
  return(as.character(res))
}))
dat.1
## [1] "2008-04-08" "2008-04-08" "2008-08-04"
The following should work for your data.frame. You may need to convert your date column to class character (with as.character) so that the string split function strsplit works correctly. After that, the loop simply checks how many characters are in the string before the first "/" character and picks the format accordingly.
Example:
df <- data.frame(DATE=as.character(c("20080408", "2008/04/08", "08/04/2008")), DATE2=as.Date(NA))
df$DATE=as.character(df$DATE)
for(i in seq(df$DATE)){
  sp <- unlist(strsplit(df$DATE[i], "/"))
  if(nchar(sp[1]) == 8){
    df$DATE2[i] <- as.Date(df$DATE[i], format="%Y%m%d")
  }
  if(nchar(sp[1]) == 4){
    df$DATE2[i] <- as.Date(df$DATE[i], format="%Y/%m/%d")
  }
  if(nchar(sp[1]) == 2){
    df$DATE2[i] <- as.Date(df$DATE[i], format="%d/%m/%Y")
  }
}
Result:
df
# DATE DATE2
#1 20080408 2008-04-08
#2 2008/04/08 2008-04-08
#3 08/04/2008 2008-04-08
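The same rule (length of the part before the first "/") can also be applied without a loop. This is just a sketch of that idea; it relies on as.Date recycling a vector of formats element by element, and it assumes that two leading digits always mean dd/mm/yyyy, as in the loop above. The last line produces the dd/mm/yyyy text the question asked for.

```r
dates_raw <- c("20080408", "2008/04/08", "08/04/2008")

# Characters before the first "/": 8 = yyyymmdd, 4 = yyyy/mm/dd, 2 = dd/mm/yyyy
first_len <- nchar(sub("/.*", "", dates_raw))

# Look up one format per element
fmt_table <- c("8" = "%Y%m%d", "4" = "%Y/%m/%d", "2" = "%d/%m/%Y")
fmt <- unname(fmt_table[as.character(first_len)])

# Each element is parsed with its own format
parsed <- as.Date(dates_raw, format = fmt)

# Character output in dd/mm/yyyy
out <- format(parsed, "%d/%m/%Y")
out
```

Values whose leading part has any other length get an NA format and therefore an NA date, which makes them easy to spot.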
You can read them as character values and convert them using as.Date.
x1 <- '20080408' ## class character (string)
x2 <- '2008/04/08'
x1.dt <- as.Date(x1, format='%Y%m%d')
x2.dt <- as.Date(x2, format='%Y/%m/%d') ## different format
format(c(x1.dt, x2.dt), format='%d/%m/%Y') ## you can return Date objects in any format you want
Check out ?strftime for all the formatting options.