I have a date format as follows: yyyymmdd. So, 10 March 2022 is formatted as 20220310, with no separator between the year, month and day. Now I want to replace the column holding all those dates with a column that contains only the year. Normally I would use the following code:
library(dplyr)
library(lubridate)
df <- df %>%
mutate(across(contains("Date"), ~(parse_date_time(., orders = c('ymd')))))
And then I would separate the column into three columns for year, month and day, and then delete the month and day columns. But somehow the code above doesn't work. I hope someone can help me out.
Not as fancy, but you could simply get the year from a substring of the whole date:
df$Year <- as.numeric(substr(as.character(df$Date),1,4))
You can try this:
df$column_with_date <- as.integer(x = substr(x = df$column_with_date, start = 1, stop = 4))
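The as.integer() call is optional: without it the column stays character. Converting to integer also uses less memory than as.numeric(), because R stores integers in 4 bytes and doubles in 8.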
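If you would rather have malformed values flagged than silently sliced, a base-R sketch is to parse with as.Date() and the "%Y%m%d" format, then pull the year back out with format(). The column name Date is assumed, as in the question, and the sample values are made up; a value that is not a valid date (here month 13) becomes NA instead of yielding a plausible-looking year.

```r
# Example data: yyyymmdd values stored as numbers (assumed column name "Date")
df <- data.frame(Date = c(20220310, 20211231, 20221310))

# as.character() first, because as.Date() parses strings with a custom format;
# "%Y%m%d" matches the separator-free yyyymmdd layout
parsed <- as.Date(as.character(df$Date), format = "%Y%m%d")

# Keep only the year, as an integer; the invalid month 13 gives NA
df$Year <- as.integer(format(parsed, "%Y"))
df
```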
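Your code works if the data are in the format below. You can then use mutate_at() with a named list of the year, month and day functions to create the three columns (mutate_at() is superseded by across() in current dplyr, but it still works):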
df <- data.frame(Date = c("20220310"))
library(lubridate)
library(dplyr)
df %>%
mutate(across(contains("Date"), ~(parse_date_time(., orders = c('ymd'))))) %>%
mutate_at(vars(Date), list(year = year, month = month, day = day))
#> Date year month day
#> 1 2022-03-10 2022 3 10
Created on 2022-07-25 by the reprex package (v2.0.1)
Related
I have a data frame and would like to create a column with YEAR and MONTH, using two digits for the month. See my code:
ID <- c(111,222,333,444,555)
DATE <- c(as.Date(c('10/10/2021','12/11/2021','30/12/2021','20/01/2022','25/02/2022') ,"%d/%m/%Y"))
DF_1 <- data.frame(ID, DATE)
Adding the YEAR and MONTH column:
library(dplyr)
DF_2 <- DF_1 %>%
mutate(YEAR_MONTH = paste(lubridate::year(DATE),
lubridate::month(DATE),
sep = ""))
As you can see, for IDs 444 and 555 the month has only one digit. I would like it to look like this:
ID <- c(111,222,333,444,555)
DATE <- c(as.Date(c('10/10/2021','12/11/2021','30/12/2021','20/01/2022','25/02/2022') ,"%d/%m/%Y"))
YEAR_MONTH <- c('202110','202111','202112','202201','202202')
DF_3 <- data.frame(ID, DATE, YEAR_MONTH)
How would I go about treating these months that are showing up with just one digit?
Grateful.
Instead of using lubridate's year() and month(), we can use format() directly: "%Y%m" returns the 4-digit year followed by the 2-digit, zero-padded month. lubridate returns numeric/integer values, and a number cannot carry a leading 0.
library(dplyr)
DF_1 <- DF_1 %>%
mutate(YEAR_MONTH = format(DATE, "%Y%m"))
Or using base R
DF_1$YEAR_MONTH <- with(DF_1, format(DATE, "%Y%m"))
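If you would rather keep numeric year and month values (for example the ones lubridate's year() and month() return), the padding can be added afterwards with sprintf(). A minimal base-R sketch, using as.POSIXlt() to get the numbers; the names yr, mo and lt are illustrative only:

```r
# Sample dates taken from the question
DATE <- as.Date(c("10/10/2021", "12/11/2021", "30/12/2021",
                  "20/01/2022", "25/02/2022"), "%d/%m/%Y")

# Numeric year and month, the same values lubridate::year()/month() give
lt <- as.POSIXlt(DATE)
yr <- lt$year + 1900L   # POSIXlt counts years from 1900
mo <- lt$mon + 1L       # POSIXlt months are 0-based

# "%02d" left-pads the month with a zero to two digits
YEAR_MONTH <- sprintf("%d%02d", yr, mo)
YEAR_MONTH
```

With lubridate loaded, the same idea is sprintf("%d%02d", year(DATE), month(DATE)).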
Let's say I have a list of dates from March 1st to July 15th:
daterange = as.data.frame(seq(as.Date("2020-3-1"), as.Date("2020-7-15"), "days"))
I want to group the dates by 1-15 and 16-30/31 for each month. So the dates in March will be separated into two groups: Mar 1-15 and Mar 16-31. Then keep doing this for every month.
I know the lubridate package can group dates by week, but I don't know how to set a custom range.
Thanks
We can group by year-month (zoo's yearmon) and by a day group that is 1 for days 1-15 and 2 for days 16 onwards (built from the logical test day(Date) > 15):
library(dplyr)
library(zoo)
library(lubridate)
library(stringr)
daterange2 <- daterange %>%
setNames('Date') %>% # base R; set_names() would need purrr or rlang loaded
group_by(yearmon = as.yearmon(Date),
Daygroup = (day(Date) > 15) + 1) %>%
mutate(Label = str_c(format(Date, '%b'),
str_c(min(day(Date)), max(day(Date)), sep='-'), sep= ' '))
Using base R, you can create two groups in each month by pasting the abbreviated month name of each date together with a group number: 1 for days 1-15 and 2 for days 16 onwards.
names(daterange) <- "date" # give the single unnamed column a usable name
newdaterange <- transform(daterange, group = paste0(format(date, "%b"), '-group-',
ifelse(as.integer(format(date, "%d")) > 15, 2, 1)))
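Another base-R sketch, offered as an alternative rather than a fix to the answers above, keys each date to the first day of its half-month (the 1st or the 16th). Because that key is itself a Date, the groups sort chronologically across months and years. The names day_of_month, half_start and groups are illustrative.

```r
# Every day from 1 March to 15 July 2020, as in the question
dates <- seq(as.Date("2020-03-01"), as.Date("2020-07-15"), by = "day")

# Day of month as a number, used to decide which half a date falls in
day_of_month <- as.integer(format(dates, "%d"))

# Group key = first day of the half-month: the 1st for days 1-15, the 16th otherwise
half_start <- as.Date(ifelse(day_of_month > 15,
                             format(dates, "%Y-%m-16"),
                             format(dates, "%Y-%m-01")))

# One list element per half-month, named by its start date
groups <- split(dates, half_start)
lengths(groups)
```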
I am an aspiring data scientist, and this is my first ever question on Stack Overflow.
I have this line of code to help wrangle my data. My date filter is static, and I would prefer not to have to go in and change this hardcoded value every year. What is the best alternative to make my date filter more dynamic? The date column is also difficult to work with because it is not a "date", it is a "dbl".
library(dplyr)
library(lubridate)
# create a sample dataframe
df <- data.frame(
DATE = c(20191230, 20191231, 20200122)
)
Tried so far:
df %>%
filter(DATE >= 20191231)
# load packages (lubridate for dates)
library(dplyr)
library(lubridate)
# create a sample dataframe
df <- data.frame(
DATE = c(20191230, 20191231, 20200122)
)
This looks like this:
DATE
1 20191230
2 20191231
3 20200122
# and now...
df %>% # take the dataframe
mutate(DATE = ymd(DATE)) %>% # turn the DATE column actually into a date
filter(DATE >= floor_date(Sys.Date(), "year") - days(1))
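...and filter the rows where DATE is on or after the day before the first day of this year (floor_date(Sys.Date(), "year") - days(1)), i.e. on or after 31 December of last year. Because this uses Sys.Date(), the output below is what you get when running it in 2020: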
DATE
1 2019-12-31
2 2020-01-22
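If you would rather leave DATE as a number, yyyymmdd values sort in date order, so the cutoff can also be computed as a number. A sketch under that assumption; filter_from_prev_year_end() is a hypothetical helper, and its today argument exists only so the example is reproducible:

```r
# Sample data from the question: dates stored as yyyymmdd numbers
df <- data.frame(DATE = c(20191230, 20191231, 20200122))

# Hypothetical helper: keep rows on or after 31 December of last year,
# comparing numbers directly instead of converting DATE to a Date
filter_from_prev_year_end <- function(data, today = Sys.Date()) {
  this_year <- as.integer(format(today, "%Y"))
  cutoff <- (this_year - 1) * 10000 + 1231   # e.g. today in 2020 -> 20191231
  data[data$DATE >= cutoff, , drop = FALSE]
}

# Pin "today" to make the example reproducible
filter_from_prev_year_end(df, today = as.Date("2020-06-01"))
```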
I'm trying to filter a daily panel, but I only want to use the month-end data, so first I need to know each month's last date.
data example:
https://imgur.com/a/SqL7A7F
I tried the code below to get the month-end dates. However, some dates are missed: in some months the last day in the data is the 24th, 25th or 26th, so I lose a lot of data.
My problem is: how can I get the last date of each month in the data without ignoring months whose last day is earlier (like 3/23, 6/25, etc.)?
library(anytime)
x=anydate(as.vector(fundbv$date))
# keeps only dates that are calendar month ends (e.g. 31 March),
# so a month whose last observation is the 23rd or 25th is dropped
finaldays=x[x %in% unique(as.Date(format(x+28,"%Y-%m-01"))-1)]
finaldays=unique(finaldays)
Thanks and appreciate!!!!!
Here's how to do it with dplyr and lubridate:
library(dplyr)
library(lubridate)
# generate a data frame with dates to play with
(df <- tibble( # data_frame() is deprecated; tibble() replaces it
date=seq(as.Date("2017-01-01"), as.Date("2018-12-31"), by=6),
amount=rgamma(length(date), shape=2, scale=20)))
df %>%
group_by(month=floor_date(date, "month")) %>%
summarize(date = max(date))
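If you need the full panel rows on those last dates rather than just the dates, one base-R sketch uses ave() to attach each month's latest observed date to every row and then keeps the matching rows. The two-fund panel is made-up sample data, and the column names fund and date are assumptions standing in for the real data.

```r
# Hypothetical panel: two funds observed every 6 days (stand-in for the real data)
dates <- seq(as.Date("2017-01-01"), as.Date("2018-12-31"), by = 6)
panel <- data.frame(fund = rep(c("A", "B"), each = length(dates)),
                    date = rep(dates, times = 2))

# Year-month label for each row
ym <- format(panel$date, "%Y-%m")

# Latest observed date within each year-month, repeated back onto every row
last_in_month <- ave(as.numeric(panel$date), ym, FUN = max)

# Keep every row that falls on its month's last observed date,
# even when that date is the 24th rather than the calendar month end
month_end_rows <- panel[as.numeric(panel$date) == last_in_month, ]
head(month_end_rows)
```

With dplyr and lubridate loaded, a similar row filter is group_by(month = floor_date(date, "month")) %>% filter(date == max(date)).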
I'm new to R, so please no hate. I want to convert the column of integers below into a column of years:
Convert this:
Date: int 189507 189508 189509 ...
To this:
Year: int 1895 1895 1895
Code
library(tidyverse)
library(lubridate)
df <- read_csv("noaa-central-park.csv")
year <- df$Date
df <- transform(df, year = as.Date(as.character(year), "%Y"))
tempByYears <- group_by(df, year)
Question: I still get a full year-month-day date instead of just the year. How do I fix this?
Sources: Stackoverflow questions, group_by() video
I'm assuming that the value in Date is year + month, in the format %Y%m. In that case, it would be better not to read it into R as an integer; you could specify that Date be read as character, for example.
I'm using df1 as the data frame name because df may be confused with the function of the same name (stats::df(), the F-distribution density).
df1 <- read_csv("noaa-central-park.csv",
col_types = cols(Date = col_character()))
Now assuming that every Date starts with a 4-digit year, the simplest way to get year is to extract the first 4 characters and convert to numeric:
df1 <- df1 %>%
mutate(year = as.numeric(substring(Date, 1, 4)))
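If you keep Date as an integer instead, the year can be computed arithmetically, assuming every value is exactly six digits in yyyymm form. A minimal sketch on the sample values from the question:

```r
# Dates stored as integers in yyyymm form, as in the question
Date <- c(189507L, 189508L, 189509L)

# Dropping the last two digits (the month) leaves the year;
# %/% is integer division, so the result stays an integer
year <- Date %/% 100L
year
```

Inside a pipeline, the same idea is mutate(year = Date %/% 100L).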