I'm trying to convert all the column of my dataframe to the date format, their class is currently chr.
My code is working without the loop (when I specify the column: df$colname)
df_2 <- data.frame(matrix(nrow = nrow(df)))
for (i in colnames(df)){
var <- data.frame(as.POSIXct(df$i, format ="%Y-%m-%d %H:%M:%S"))
df_2 <- cbind(df_2, var)
}
My dataframe (df):
col1
col2
2015-01-02 09:43:45
2015-01-05 10:08:48
2015-01-02 10:15:44
2015-01-05 10:21:05
col1<- c("2015-01-02 09:43:45 ", "2015-01-02 10:15:44")
col2 <- c("2015-01-05 10:08:48","2015-01-05 10:21:05")
df <- data.frame(col1, col2)
Lets create df2 with the same structure of df:
df_2 <- data.frame(matrix(nrow=nrow(df),ncol=ncol(df)))
Then, the loop should run in a sequence. In your case is the number of columns. Then you want to add the formatted data as date, to df2, according with the number of columns. So, the loop should work like this:
for (i in 1:ncol(df)){
df_2[i]<-as.POSIXct(df[,i],format ="%Y-%m-%d %H:%M:%S")
colnames(df_2)<-names(df)
}
You can do it without using loops at all, if you want to. Here's a tidyverse solution:
library(tidyverse)
d <- tibble(
col1=c("2015-01-02 09:43:45", "2015-01-05 10:08:48"),
col2=c("2015-01-02 10:15:44", "2015-01-05 10:21:05")
)
d %>% mutate(across(everything(), ~as.POSIXct(., format ="%Y-%m-%d %H:%M:%S")))
# A tibble: 2 × 2
col1 col2
<dttm> <dttm>
1 2015-01-02 09:43:45 2015-01-02 10:15:44
2 2015-01-05 10:08:48 2015-01-05 10:21:05
I am unable to reproduce OP's issue with my solution. I've provided the output I obtain above.
It might possibly be a version issue. The versions of the packages I am using are given below:
> packageVersion("tidyverse")
[1] ‘1.3.1’
> packageVersion("tibble")
[1] ‘3.1.5’
> packageVersion("dplyr")
[1] ‘1.0.7’
> packageVersion("base")
[1] ‘4.1.0’
Beyond that, I have no suggestions.
Also, please bear in mind that it is not good practice to upload code, results or data as images for these reasons.
Related
I am trying to transform a list of numbers (e.g. 20200119) into a valid date (here: 2020-01-19)
This is my trial data:
df <- data.frame(c(20200119, 20180718, 20180729, 20150502, 20010301))
colnames(df)[1] = "Dates"
And this is what I tried so far:
df <- as_date(df)
df <- as.Date.numeric(df)
df <- as.Date.factor(df)
Neither of them works unfortunately.
I also tried to seperate the numbers, but I couldn't achieve either.
Can somebody help me?
Convert it to a character and convert it then to a Date with given format %Y%m%d:
as.Date(as.character(df$Dates), "%Y%m%d")
#[1] "2020-01-19" "2018-07-18" "2018-07-29" "2015-05-02" "2001-03-01"
Another option using strptime with the right format like this:
df <- data.frame(c(20200119, 20180718, 20180729, 20150502, 20010301))
colnames(df)[1] = "Dates"
df$Dates2 <- strptime(df$Dates, format = "%Y%m%d")
df
#> Dates Dates2
#> 1 20200119 2020-01-19
#> 2 20180718 2018-07-18
#> 3 20180729 2018-07-29
#> 4 20150502 2015-05-02
#> 5 20010301 2001-03-01
Created on 2023-01-12 with reprex v2.0.2
I try convert date format yyyymmdd in yyyy only in R.
In how to convert numeric only year in Date in R? presented a very interesting answer, as it managed to make R understand to convert an 8-digit entry (yyyymmdd) as a 4-digit year year (yyyy) in the lubricated package, this is very good for me.
in old code i used round_date() for it:
date2<-c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
name<-c('A','B','C','D','E')
df<-data.frame(date2,name)
df2 <- df %>%
mutate(date2 = dmy(date2)) %>%
mutate(year_date = round_date(date2,'year'))
df2
str(df2)
date2<date> name<chr> year_date <date>
2000-01-01 A 2000-01-01
2000-08-08 B 2001-01-01
2001-03-16 C 2001-01-01
2000-12-25 D 2001-01-01
2000-02-29 E 2000-01-01
But I started to have problems with my statistical analysis when discovering for example that a date 2000-08-08 was rounded up to the year 2001-01-01, instead of 2001-01-01 as I expected.
This is a very big problem for me, since information that belongs to the year 2005 has been moved to the year 2006, considering that I have more than 1400 rows in my database.
I noticed that dates after the middle of the year (after June) are rounded up to the next year, this is very bad.
How do I round a 2000-08-08 date to just 2000 instead of 2001?
Doesn't this (simpler, also only base R) operation do what you want?
> date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
> dd <- as.Date(date2, "%d/%m/%Y")
> yd <- format(dd, "%Y-01-01")
> dt <- as.Date(yd)
> D <- data.frame(date2=date2, date=dd, y=yd, d=dt)
> D
date2 date y d
1 01/01/2000 2000-01-01 2000-01-01 2000-01-01
2 08/08/2000 2000-08-08 2000-01-01 2000-01-01
3 16/03/2001 2001-03-16 2001-01-01 2001-01-01
4 25/12/2000 2000-12-25 2000-01-01 2000-01-01
5 29/02/2000 2000-02-29 2000-01-01 2000-01-01
>
In essence we just extract the year component from the (parsed as date) Date object and append -01-01.
Edit: There are also trunc() operations for Date and Datetime objects. Oddly, truncation for years only works for Datetime (see the help page for trunc.Date for more) so this works too:
> as.Date(trunc(as.POSIXlt(dd), "years"))
[1] "2000-01-01" "2000-01-01" "2001-01-01" "2000-01-01" "2000-01-01"
>
Edit 2: We can use that last step in a cleaner / simpler solution in a data.frame with three columns for input data (as characters), parse data as a proper Date type and the desired truncated year data — all using base R without further dependencies. Of course, if you would want to you could rewrite it via the pipe and lubridate for the same result via slightly slower route (which only matters for "large" data).
> date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
> pd <- as.Date(date2, "%d/%m/%Y")
> td <- as.Date(trunc(as.POSIXlt(pd), "years"))
> D <- data.frame(input = date2, parsed = pd, output = td)
> D
input parsed output
1 01/01/2000 2000-01-01 2000-01-01
2 08/08/2000 2000-08-08 2000-01-01
3 16/03/2001 2001-03-16 2001-01-01
4 25/12/2000 2000-12-25 2000-01-01
5 29/02/2000 2000-02-29 2000-01-01
>
For a real "production" use you may not need the data.frame and do not need to keep the intermediate result leading to a one-liner:
> as.Date(trunc(as.POSIXlt( as.Date(date2, "%d/%m/%Y") ), "years"))
[1] "2000-01-01" "2000-01-01" "2001-01-01" "2000-01-01" "2000-01-01"
>
which is likely the most compact and efficient conversion you can get.
If you want just the year (and not the date corresponding to the first day of the year) you can use lubridate::year().
df %>% mutate(across(date2,dmy),
year_date=year(date2))
If you do want the first day of the year then floor_date() will do the trick.
df %>% mutate(across(date2,dmy),
year_date=floor_date(date2,"year"))
or if you only need the truncated date you could go directly to mutate(year_date=floor_date(dmy(date2)))
In base R, year() would be format(date2, "%Y"), as shown in #DirkEddelbuettel's answer.
If you consult the round_datehelp page, you will also see floor_date:
library("lubridate")
library("dplyr")
date2 <- c('01/01/2000','08/08/2000','16/03/2001','25/12/2000','29/02/2000')
name <- c('A','B','C','D','E')
df <- data.frame(date2,name)
df2 <- df %>%
mutate(date2 = dmy(date2)) %>%
mutate(year_date = floor_date(date2,'year'))
df2
In my data, Time and date is stored in a column 'cord' (class=factor). I want to separate the date and the time into two separate columns.
The data looks like this:
1 2019-05-26T13:50:56.335288Z
2 2019-05-26T17:55:45.348073Z
3 2019-05-26T18:12:00.882572Z
4 2019-05-26T18:26:49.577310Z
I have successfully extracted the date using:cord$Date <- as.POSIXct(cord$Time)
I have however not been able to find a way to extract the time in format "H:M:S".
The output of dput(head(cord$Time)) returns a long list of timestamps: "2020-04-02T13:34:07.746777Z", "2020-04-02T13:41:11.095014Z",
"2020-04-02T14:08:05.508818Z", "2020-04-02T14:17:10.337101Z", and so on...
Extract H:M:S
library(lubridate)
format(as_datetime(cord$Time), "%H:%M:%S")
#> [1] "13:50:56" "17:55:45" "18:12:00" "18:26:49"
If you need milliseconds too:
format(as_datetime(cord$Time), "%H:%M:%OS6")
#> [1] "13:50:56.335288" "17:55:45.348073" "18:12:00.882572" "18:26:49.577310"
where cord is:
cord <- read.table(text = " Time
1 2019-05-26T13:50:56.335288Z
2 2019-05-26T17:55:45.348073Z
3 2019-05-26T18:12:00.882572Z
4 2019-05-26T18:26:49.577310Z ", header = TRUE)
I typically use lubridate and data.table to do my date and manipulation work. This works for me copying in some of your raw dates as strings
library(lubridate)
library(data.table)
x <- c("2019-05-26T13:50:56.335288Z", "2019-05-26T17:55:45.348073Z")
# lubridate to parse to date time
y <- parse_date_time(x, "ymd HMS")
# data.table to split in to dates and time
split_y <- tstrsplit(y, " ")
dt <- as.data.table(split_y)
setnames(dt, "Date", "Time")
dt[]
# if you use data.frames instead
df <- as.data.frame(dt)
df
I have a dataframe with over 8.8 million observations and I need to remove rows from the dataframe before a certain date. Currently the date format is in MM/DD/YYYY but I would like to convert it to R date format (I believe YYYY-MM-DD).
When I run the code that I have below, it puts them in the correct R format, but it does not keep the correct date. For some reason, it makes the dates 2020. None of the dates in my data frame have the year 2020
> dates <- nyc_call_data_sample$INCIDENT_DATETIME
> date <- as.Date(dates,
+ format = "%m/%d/%y")
> head(nyc_call_data_sample$INCIDENT_DATETIME)
[1] "07/01/2015" "04/24/2016" "04/01/2013" "02/07/2015" "06/27/2016" "05/04/2017"
> head(date)
[1] "2020-07-01" "2020-04-24" "2020-04-01" "2020-02-07" "2020-06-27" "2020-05-04"
> nyc_call_data_sample$INCIDENT_DATETIME <- strptime(as.character(nzd$date), "%d/%m/%y")
Also, I have data that goes back as far as 2013. How would I go about removing all rows from the dataframe that are before 01/01/2017
Thanks!
as.Date and basic ?Extraction are your friend here.
dat <- data.frame(
unformatted = c("07/01/2015", "04/24/2016", "04/01/2013", "02/07/2015", "06/27/2016", "05/04/2017")
)
dat$date <- as.Date(dat$unformatted, format = "%m/%d/%Y")
dat
# unformatted date
# 1 07/01/2015 2015-07-01
# 2 04/24/2016 2016-04-24
# 3 04/01/2013 2013-04-01
# 4 02/07/2015 2015-02-07
# 5 06/27/2016 2016-06-27
# 6 05/04/2017 2017-05-04
dat[ dat$date > as.Date("2017-01-01"), ]
# unformatted date
# 6 05/04/2017 2017-05-04
(Feel free to remove the unformatted column with dat$unformatted <- NULL.)
With tidyverse:
library(dplyr)
dat %>%
mutate(date = as.Date(unformatted, format = "%m/%d/%Y")) %>%
select(-unformatted) %>%
filter(date > as.Date("2017-01-01"))
# date
# 1 2017-05-04
I have time-series data in xts representation as
library(xts)
xtime <-timeBasedSeq('2015-01-01/2015-01-30 23')
df <- xts(rnorm(length(xtime),30,4),xtime)
Now I want to calculate co-orelation between different days, and hence I want to represent df in matrix form as:
To achieve this I used
p_mat= split(df,f="days",drop=FALSE,k=1)
Using this I get a list of days, but I am not able to arrange this list in matrix form. Also I used
p_mat<- df[.indexday(df) %in% c(1:30) & .indexhour(df) %in% c(1:24)]
With this I do not get any output.
Also I tried to use rollapply(), but was not able to arrange it properly.
May I get help to form the matrix using xts/zoo objects.
Maybe you could use something like this:
#convert to a data.frame with an hour column and a day column
df2 <- data.frame(value = df,
hour = format(index(df), '%H:%M:%S'),
day = format(index(df), '%Y:%m:%d'),
stringsAsFactors=FALSE)
#then use xtabs which ouputs a matrix in the format you need
tab <- xtabs(value ~ day + hour, df2)
Output:
hour
day 00:00:00 01:00:00 02:00:00 03:00:00 04:00:00 05:00:00 06:00:00 07:00:00 08:00:00 09:00:00 10:00:00 11:00:00 12:00:00
2015:01:01 28.15342 35.72913 27.39721 29.17048 28.42877 28.72003 28.88355 31.97675 29.29068 27.97617 35.37216 29.14168 29.28177
2015:01:02 23.85420 28.79610 27.88688 27.39162 29.77241 22.34256 34.70633 23.34011 28.14588 25.53632 26.99672 38.34867 30.06958
2015:01:03 37.47716 31.70040 29.04541 34.23393 33.54569 27.52303 38.82441 28.97989 24.30202 29.42240 30.83015 39.23191 30.42321
2015:01:04 24.13100 32.08409 29.36498 35.85835 26.93567 28.27915 26.29556 29.29158 31.60805 27.07301 33.32149 25.16767 25.80806
2015:01:05 32.16531 29.94640 32.04043 29.34250 31.68278 28.39901 24.51917 33.95135 36.07898 28.76504 24.98684 32.56897 29.82116
2015:01:06 18.44432 27.43807 32.28203 29.76111 29.60729 32.24328 25.25417 34.38711 29.97862 32.82924 34.13643 30.89392 26.48517
2015:01:07 34.58491 20.38762 32.29096 31.49890 28.29893 33.80405 28.44305 28.86268 33.42964 36.87851 31.08022 28.31126 25.24355
2015:01:08 33.67921 31.59252 28.36989 35.29703 27.19507 27.67754 25.99571 27.32729 33.78074 31.73481 34.02064 28.43953 31.50548
2015:01:09 28.46547 36.61658 36.04885 30.33186 32.26888 25.90181 31.29203 34.17445 30.39631 28.18345 27.37687 29.85631 34.27665
2015:01:10 30.68196 26.54386 32.71692 28.69160 23.72367 28.53020 35.45774 28.66287 32.93100 33.78634 30.01759 28.59071 27.88122
2015:01:11 32.70907 31.51985 29.22881 36.31157 32.38494 25.30569 29.37743 22.32436 29.21896 19.63069 35.25601 27.45783 28.28008
2015:01:12 29.96676 30.51542 29.41650 29.34436 37.05421 33.05035 34.44572 26.30717 30.65737 34.61930 29.77391 21.48256 31.37938
2015:01:13 33.46089 34.29776 37.58262 27.58801 28.43653 28.33511 28.49737 28.53348 28.81729 35.76728 27.20985 28.44733 32.61015
2015:01:14 22.96213 32.27889 36.44939 23.45088 26.88173 27.43529 27.27547 21.86686 32.00385 23.87281 29.90001 32.37194 29.20722
2015:01:15 28.30359 30.94721 20.62911 33.84679 27.58230 26.98849 23.77755 24.18443 30.22533 32.03748 21.60847 25.98255 32.14309
2015:01:16 23.52449 29.56138 31.76356 35.40398 24.72556 31.45754 30.93400 34.77582 29.88836 28.57080 25.41274 27.93032 28.55150
2015:01:17 25.56436 31.23027 25.57242 31.39061 26.50694 30.30921 28.81253 25.26703 30.04517 33.96640 36.37587 24.50915 29.00156
...and so on
Here's one way to do it using a helper function that will account for days that do not have 24 observations.
library(xts)
xtime <- timeBasedSeq('2015-01-01/2015-01-30 23')
set.seed(21)
df <- xts(rnorm(length(xtime),30,4), xtime)
tHourly <- function(x) {
# initialize result matrix for all 24 hours
dnames <- list(format(index(x[1]), "%Y-%m-%d"),
paste0("H", 0:23))
res <- matrix(NA, 1, 24, dimnames = dnames)
# transpose day's rows and set colnames
tx <- t(x)
colnames(tx) <- paste0("H", .indexhour(x))
# update result object and return
res[,colnames(tx)] <- tx
res
}
# split on days, apply tHourly to each day, rbind results
p_mat <- split(df, f="days", drop=FALSE, k=1)
p_list <- lapply(p_mat, tHourly)
p_hmat <- do.call(rbind, p_list)