I am trying to transform a list of numbers (e.g. 20200119) into a valid date (here: 2020-01-19)
This is my trial data:
df <- data.frame(c(20200119, 20180718, 20180729, 20150502, 20010301))
colnames(df)[1] = "Dates"
And this is what I tried so far:
df <- as_date(df)
df <- as.Date.numeric(df)
df <- as.Date.factor(df)
None of them works, unfortunately.
I also tried to separate the numbers, but I couldn't get that to work either.
Can somebody help me?
Convert the numbers to character first, then convert the result to a Date with the format %Y%m%d:
as.Date(as.character(df$Dates), "%Y%m%d")
#[1] "2020-01-19" "2018-07-18" "2018-07-29" "2015-05-02" "2001-03-01"
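To store the result back in the data frame, the same call can be assigned to the column. A minimal sketch using the question's data; the sprintf() step is my own precaution (an assumption, not part of the answer above) so the digits can never be rendered in scientific notation:

```r
df <- data.frame(Dates = c(20200119, 20180718, 20180729, 20150502, 20010301))

# format the numbers as plain 8-digit strings, then parse them as year-month-day
df$Dates <- as.Date(sprintf("%08.0f", df$Dates), format = "%Y%m%d")

class(df$Dates)   # "Date"

# a value that is not a real calendar date comes back as NA rather than an error
bad <- as.Date("20201340", format = "%Y%m%d")
is.na(bad)        # TRUE
```

Checking for NA after the conversion is a cheap way to spot malformed entries in a larger column.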
Another option is strptime with the matching format:
df <- data.frame(c(20200119, 20180718, 20180729, 20150502, 20010301))
colnames(df)[1] = "Dates"
df$Dates2 <- strptime(df$Dates, format = "%Y%m%d")
df
#> Dates Dates2
#> 1 20200119 2020-01-19
#> 2 20180718 2018-07-18
#> 3 20180729 2018-07-29
#> 4 20150502 2015-05-02
#> 5 20010301 2001-03-01
Created on 2023-01-12 with reprex v2.0.2
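Note that strptime() returns a POSIXlt date-time rather than a Date. If a plain Date column is wanted, the result can be wrapped in as.Date(). A small sketch (tz = "UTC" is my choice here so the result does not depend on the local time zone):

```r
x  <- c(20200119, 20180718)

# strptime() gives a POSIXlt object (a date-time, with midnight as the time part)
lt <- strptime(as.character(x), format = "%Y%m%d", tz = "UTC")
class(lt)   # "POSIXlt" "POSIXt"

# drop the time part to get a Date
d <- as.Date(lt)
class(d)    # "Date"
```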
I'm trying to convert all the columns of my data frame to a date-time format; their class is currently chr.
My code is working without the loop (when I specify the column: df$colname)
df_2 <- data.frame(matrix(nrow = nrow(df)))
for (i in colnames(df)){
var <- data.frame(as.POSIXct(df$i, format ="%Y-%m-%d %H:%M:%S"))
df_2 <- cbind(df_2, var)
}
My dataframe (df):
col1                 col2
2015-01-02 09:43:45  2015-01-05 10:08:48
2015-01-02 10:15:44  2015-01-05 10:21:05
col1 <- c("2015-01-02 09:43:45", "2015-01-02 10:15:44")
col2 <- c("2015-01-05 10:08:48","2015-01-05 10:21:05")
df <- data.frame(col1, col2)
Let's create df_2 with the same structure as df:
df_2 <- data.frame(matrix(nrow=nrow(df),ncol=ncol(df)))
The loop should then run over the column positions, from 1 to the number of columns, and write each converted column into the matching position of df_2. So, the loop should look like this:
for (i in seq_len(ncol(df))) {
  # convert column i from character to POSIXct
  df_2[i] <- as.POSIXct(df[, i], format = "%Y-%m-%d %H:%M:%S")
}
# copy the column names once, after the loop
colnames(df_2) <- names(df)
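As for why the original loop fails: df$i looks for a column literally named "i", so it returns NULL; with a name stored in a variable you need df[[i]]. A base R sketch of that, plus a loop-free version with lapply() (tz = "UTC" is an assumption on my part, pick the zone your data was recorded in):

```r
df <- data.frame(
  col1 = c("2015-01-02 09:43:45", "2015-01-02 10:15:44"),
  col2 = c("2015-01-05 10:08:48", "2015-01-05 10:21:05")
)

i <- "col1"
is.null(df$i)   # TRUE: there is no column called "i"
df[[i]]         # the contents of col1

# convert every column at once; df_2[] keeps the data frame shape and names
df_2 <- df
df_2[] <- lapply(df, as.POSIXct, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
sapply(df_2, function(col) class(col)[1])   # both columns are now POSIXct
```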
You can do it without using loops at all, if you want to. Here's a tidyverse solution:
library(tidyverse)
d <- tibble(
col1=c("2015-01-02 09:43:45", "2015-01-05 10:08:48"),
col2=c("2015-01-02 10:15:44", "2015-01-05 10:21:05")
)
d %>% mutate(across(everything(), ~as.POSIXct(., format ="%Y-%m-%d %H:%M:%S")))
# A tibble: 2 × 2
col1 col2
<dttm> <dttm>
1 2015-01-02 09:43:45 2015-01-02 10:15:44
2 2015-01-05 10:08:48 2015-01-05 10:21:05
I am unable to reproduce OP's issue with my solution. I've provided the output I obtain above.
It might possibly be a version issue. The versions of the packages I am using are given below:
> packageVersion("tidyverse")
[1] ‘1.3.1’
> packageVersion("tibble")
[1] ‘3.1.5’
> packageVersion("dplyr")
[1] ‘1.0.7’
> packageVersion("base")
[1] ‘4.1.0’
Beyond that, I have no suggestions.
Also, please bear in mind that it is not good practice to upload code, results or data as images for these reasons.
I read in an Excel file in which one column contains dates in different formats: Excel serial numbers (e.g. 43596) and text (e.g. "01.01.2020").
To convert excel format one can use as.Date(as.numeric(df$date), origin = "1899-12-30")
to convert text one can use as.Date(df$date, format = "%d.%m.%Y")
These work for individual values, but when I try ifelse as:
df$date <- ifelse(length(df$date)==5,
as.Date(as.numeric(df$date), origin = "1899-12-30"),
as.Date(df$date, format = "%d.%m.%Y"))
or a for loop:
for (i in length(x)) {
if(nchar(x[i])==5) {
y[i] <- as.Date(as.numeric(x[i]), origin = "1899-12-30")
} else {x[i] <- as.Date(x[i], , format = "%d.%m.%Y"))}
} print(x)
It does not work because of:
"character string is not in a standard unambiguous format"
Could you advise a better way to convert these mixed date formats into a single one?
I have two solutions for it.
The first is to change the code, which I don't like because it depends on the xlsx date formats:
> df <- tibble(date = c("01.01.2020","43596"))
>
> df$date <- as.Date(ifelse(nchar(df$date)==5,
+ as.Date(as.numeric(df$date), origin = "1899-12-30"),
+ as.Date(df$date, format = "%d.%m.%Y")), origin = "1970-01-01")
Warning message:
In as.Date(as.numeric(df$date), origin = "1899-12-30") :
NAs introduced by coercion
>
> df$date
[1] "2020-01-01" "2019-05-11"
>
The second is to save the document as CSV and read it with the read_csv() function from the readr package. That solves everything!
You could use sapply to apply ifelse to each value:
df$date <- as.Date(sapply(df$date,function(date) ifelse(nchar(date)==5,
as.Date(as.numeric(date), origin = "1899-12-30"),
as.Date(date, format = "%d.%m.%Y"))),
origin="1970-01-01")
df
# A tibble: 6 x 2
contract date
<dbl> <date>
1 231429 2019-05-11
2 231437 2020-01-07
3 231449 2021-01-01
4 231459 2020-03-03
5 231463 2020-10-27
6 231466 2011-03-17
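ifelse() returns plain numbers and drops the Date class, which is why the results above are wrapped in a second as.Date(..., origin = "1970-01-01"). One way to avoid ifelse() altogether is to fill a Date vector by logical index. A sketch on a few values from this thread, assuming every purely numeric entry is an Excel serial number:

```r
x <- c("01.01.2020", "43596", "07.01.2020", "44131")

# TRUE where the entry consists of digits only (treated as an Excel serial number)
is_serial <- grepl("^[0-9]+$", x)

# start with an all-NA Date vector and fill each group with its own parser
out <- rep(as.Date(NA), length(x))
out[is_serial]  <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
out[!is_serial] <- as.Date(x[!is_serial], format = "%d.%m.%Y")

out
# [1] "2020-01-01" "2019-05-11" "2020-01-07" "2020-10-27"
```

Because as.numeric() only ever sees the numeric strings, this also avoids the coercion warning.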
A tidyverse solution using rowwise
library(dplyr)
library(lubridate)
df %>%
rowwise() %>%
mutate(date_new=as.Date(ifelse(grepl("\\.",date),
as.character(dmy(date)),
as.character(as.Date(as.numeric(date), origin="1899-12-30"))))) %>%
ungroup()
# A tibble: 6 × 3
contract date date_new
<dbl> <chr> <date>
1 231429 43596 2019-05-11
2 231437 07.01.2020 2020-01-07
3 231449 01.01.2021 2021-01-01
4 231459 03.03.2020 2020-03-03
5 231463 44131 2020-10-27
6 231466 40619 2011-03-17
I am working with large datasets in which one column is stored as character instead of a date-time type. I am trying to convert it, but I have been unable to.
Could you please suggest a solution? It would be very helpful.
Thanks in advance.
The code I am using right now:
c_data$dt_1 <- lubridate::parse_date_time(c_data$started_at,"ymd HMS")
getting output:
2027- 05- 20 20:10:03
but desired output is
2020-05-20 10:03
Here is another way using lubridate. The strings contain hours and minutes but no seconds, so dmy_hm() is the matching parser:
library(dplyr)
library(lubridate)
df <- tibble(start_at = c("27/05/2020 10:03", "25/05/2020 10:47"))
df %>%
  mutate(start_at = dmy_hm(start_at))
# A tibble: 2 x 1
  start_at
  <dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00
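For comparison, base R can parse the same strings with an explicit format. A sketch; tz = "UTC" is an assumption, use the zone your data was recorded in:

```r
start_at <- c("27/05/2020 10:03", "25/05/2020 10:47")

# day/month/year followed by hour:minute, with no seconds in the input
parsed <- as.POSIXct(start_at, format = "%d/%m/%Y %H:%M", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M:%S")
# [1] "2020-05-27 10:03:00" "2020-05-25 10:47:00"
```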
In R, dates and times print in a single standard format. You can change its format to the one you need, but the result is then of type character.
If you want to keep the data in the form year-month-day minute:second, you can use format as -
format(Sys.time(), '%Y-%m-%d %M:%S')
#[1] "2021-08-27 17:54"
For the entire column you can apply this as -
c_data$dt_2 <- format(c_data$dt_1, '%Y-%m-%d %M:%S')
Read ?strptime for different formatting options.
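To illustrate the point about types: format() gives back text, so it makes sense to keep the parsed column for calculations and use the formatted one only for display. A sketch with a made-up timestamp (the %H:%M pattern here shows hours and minutes):

```r
x <- as.POSIXct("2020-05-27 10:03:00", tz = "UTC")

# the formatted value is a plain string
s <- format(x, "%Y-%m-%d %H:%M")
s          # "2020-05-27 10:03"

class(x)   # "POSIXct" "POSIXt"
class(s)   # "character"
```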
Using anytime
library(dplyr)
library(anytime)
addFormats("%d/%m/%Y %H:%M")
df %>%
mutate(start_at = anytime(start_at))
-output
# A tibble: 2 x 1
start_at
<dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00
I have a dataframe with over 8.8 million observations and I need to remove rows from the dataframe before a certain date. Currently the date format is in MM/DD/YYYY but I would like to convert it to R date format (I believe YYYY-MM-DD).
When I run the code that I have below, it puts them in the correct R format, but it does not keep the correct date. For some reason, it makes the dates 2020. None of the dates in my data frame have the year 2020
> dates <- nyc_call_data_sample$INCIDENT_DATETIME
> date <- as.Date(dates,
+ format = "%m/%d/%y")
> head(nyc_call_data_sample$INCIDENT_DATETIME)
[1] "07/01/2015" "04/24/2016" "04/01/2013" "02/07/2015" "06/27/2016" "05/04/2017"
> head(date)
[1] "2020-07-01" "2020-04-24" "2020-04-01" "2020-02-07" "2020-06-27" "2020-05-04"
> nyc_call_data_sample$INCIDENT_DATETIME <- strptime(as.character(nzd$date), "%d/%m/%y")
Also, I have data that goes back as far as 2013. How would I go about removing all rows from the dataframe that are before 01/01/2017
Thanks!
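as.Date and basic ?Extract are your friends here. Note the uppercase %Y for a four-digit year; lowercase %y reads only two digits, which is why "2015" was parsed as year 20 (i.e. 2020).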
dat <- data.frame(
unformatted = c("07/01/2015", "04/24/2016", "04/01/2013", "02/07/2015", "06/27/2016", "05/04/2017")
)
dat$date <- as.Date(dat$unformatted, format = "%m/%d/%Y")
dat
# unformatted date
# 1 07/01/2015 2015-07-01
# 2 04/24/2016 2016-04-24
# 3 04/01/2013 2013-04-01
# 4 02/07/2015 2015-02-07
# 5 06/27/2016 2016-06-27
# 6 05/04/2017 2017-05-04
dat[ dat$date >= as.Date("2017-01-01"), ]
# unformatted date
# 6 05/04/2017 2017-05-04
(Feel free to remove the unformatted column with dat$unformatted <- NULL.)
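The difference between the two year codes is easy to check on a single value from the question. A quick sketch:

```r
x <- "07/01/2015"

# %y consumes only the first two digits of the year ("20") and the rest is ignored
two_digit  <- as.Date(x, format = "%m/%d/%y")
two_digit    # "2020-07-01"

# %Y reads the full four-digit year
four_digit <- as.Date(x, format = "%m/%d/%Y")
four_digit   # "2015-07-01"
```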
With tidyverse:
library(dplyr)
dat %>%
mutate(date = as.Date(unformatted, format = "%m/%d/%Y")) %>%
select(-unformatted) %>%
filter(date >= as.Date("2017-01-01"))
# date
# 1 2017-05-04
Been having difficulty with this one data frame manipulation in R.
I have two columns for well height and a date-time string ("yyyy-mm-dd HH:MM:ss").
I would like to extract all the rows from this table that occur at midnight (00:00:00).
I could manipulate this table in seconds with python, but I want to figure it out in R using strsplit() instead of POSIXct.
How do I mutate the table so that I split the date-time string and extract just the time value into a new column?
I think the answer is in vapply, but I have been drenching myself in manuals the last couple weeks and still can't figure it out.
Welcome to SO. It can be done in multiple ways. Try this:
## some data
df <- data.frame(height=c(11,12),time = c("1999-9-9 00:00:00","1999-9-9 00:00:02"),stringsAsFactors = FALSE)
df
#> height time
#> 1 11 1999-9-9 00:00:00
#> 2 12 1999-9-9 00:00:02
## In base R
df2<- df
df2$hms <- do.call(rbind,strsplit(df2$time," "))[,2]
df2[df2$hms=="00:00:00",]
#> height time hms
#> 1 11 1999-9-9 00:00:00 00:00:00
## In tidyverse
library(dplyr)
df3 <- df %>%
mutate(hms = gsub(".*(..:..:..).*","\\1",time)) %>%
filter(hms == "00:00:00")
df3
#> height time hms
#> 1 11 1999-9-9 00:00:00 00:00:00
Created on 2018-10-04 by the reprex package (v0.2.1)
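Since the question mentions strsplit() and vapply(), here is a sketch of that route on the same toy data. strsplit() returns a list with one character vector per row, and vapply() pulls the second piece out of each:

```r
df <- data.frame(height = c(11, 12),
                 time = c("1999-9-9 00:00:00", "1999-9-9 00:00:02"),
                 stringsAsFactors = FALSE)

# split each string at the space: list(c("1999-9-9", "00:00:00"), ...)
parts <- strsplit(df$time, " ", fixed = TRUE)

# take the second element of every piece; character(1) states the expected result type
df$hms <- vapply(parts, function(p) p[2], character(1))

# keep only the rows recorded at midnight
midnight <- df[df$hms == "00:00:00", ]
midnight
```

vapply() is a stricter sapply(): it stops with an error if any row does not yield exactly one string, which is handy when some timestamps are malformed.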
You don't provide an example, so here is my guess:
Let's say you have a character vector (could be a column):
dateTimes <- c("1999-01-01 11:11:11", "1999-01-01 12:12:12", "1999-01-01 13:13:13")
Extract the time from the end of each string:
ans <- sub(".*-\\d+\\s", "", dateTimes, perl = TRUE)
#[1] "11:11:11" "12:12:12" "13:13:13"
Save them into a new variable (ans above) or into a column of your data frame.
When you want to extract the rows that occur at 00:00:00, simply use a string comparison and subset your data (df1 stands for your data frame):
df1[ans == "00:00:00",]