Convert column of ints to year - r

I'm new to R, so please no hate. I want to convert the column of ints below to a column of years.
Convert this:
Date: int 189507 189508 189509 ...
To this:
Year: int 1895 1895 1895
Code
library(tidyverse)
library(lubridate)
df <- read_csv("noaa-central-park.csv")
year <- df$Date
df <- transform(df, year = as.Date(as.character(year), "%Y"))
tempByYears <- group_by(df, year)
Question: I still get a year-month-day format as shown below. How to fix?
Sources: Stackoverflow questions, group_by() video

I'm assuming that the value in Date is Year + Month, in the format %Y%m. In that case, it would be better not to read it into R as an integer. You could specify that Date be a character, for example.
I'm using df1 for the data frame variable name because df may cause confusion with the function of the same name.
df1 <- read_csv("noaa-central-park.csv",
col_types = cols(Date = col_character()))
Now assuming that every Date starts with a 4-digit year, the simplest way to get year is to extract the first 4 characters and convert to numeric:
df1 <- df1 %>%
  mutate(year = as.numeric(substring(Date, 1, 4)))
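If the column has already been read as an integer, the year can also be recovered with arithmetic instead of string handling. This is a minimal sketch in base R, assuming every value really has the form YYYYMM; the vector `dates` is made-up sample data, not the NOAA file.

```r
# Hypothetical sample values in YYYYMM form, as in the question
dates <- c(189507L, 189508L, 189509L)

# Integer division by 100 drops the two month digits
year_arith <- dates %/% 100L

# Equivalent string approach: keep the first four characters
year_str <- as.integer(substr(as.character(dates), 1, 4))

# The month is the remainder after dividing by 100
month <- dates %% 100L

print(year_arith)
print(month)
```

Both approaches give the same years here; the arithmetic version avoids the round trip through character, while the string version also works when the column was read as text.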

Related

Extract year from date with weird date format

I have a date format as follows: yyyymmdd. So, 10 March 2022 is formatted as 20220310, with no separator between the day, month and year. Now I want to replace the column holding those dates with a column that contains only the year. Normally I would use the following code:
df <- df %>%
mutate(across(contains("Date"), ~(parse_date_time(., orders = c('ymd')))))
And then separate the column into three different columns with year, month and day, and then delete the month and day columns. But somehow the code above doesn't work. I hope someone can help me out.
Not as fancy, but you could simply get the year from a substring of the whole date:
df$Year <- as.numeric(substr(as.character(df$Date),1,4))
You can try this:
df$column_with_date <- as.integer(x = substr(x = df$column_with_date, start = 1, stop = 4))
The as.integer function is optional, but you could use it to save more space in memory.
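The substring answers above take the first four characters on trust. If you also want invalid values to be caught, one option is to parse the whole value as a date first and then pull out the year. A rough sketch in base R; the data frame and the deliberately invalid value are made up for illustration.

```r
# Hypothetical sample data: dates stored as yyyymmdd integers
df <- data.frame(Date = c(20220310L, 20191231L))

# as.Date needs character input when a format string is supplied
parsed <- as.Date(as.character(df$Date), format = "%Y%m%d")

# format() returns the year as text, so convert it to integer
df$Year <- as.integer(format(parsed, "%Y"))

# A value that is not a valid date (month 13) becomes NA
# instead of silently yielding a year
bad <- as.Date("20221340", format = "%Y%m%d")

print(df)
print(bad)
```

Whether the extra parsing step is worth it depends on how much you trust the input; for clean data the substring version is simpler.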
Your code works if the data is in the format below. You can use mutate_at with a list of year, month, and day to create the three columns like this:
df <- data.frame(Date = c("20220310"))
library(lubridate)
library(dplyr)
df %>%
mutate(across(contains("Date"), ~(parse_date_time(., orders = c('ymd'))))) %>%
mutate_at(vars(Date), list(year = year, month = month, day = day))
#> Date year month day
#> 1 2022-03-10 2022 3 10
Created on 2022-07-25 by the reprex package (v2.0.1)

Set 2 digits in Month

I have a DF and I would like to create a column with YEAR and MONTH, but setting 2 digits for the month. See my code:
ID <- c(111,222,333,444,555)
DATE <- c(as.Date(c('10/10/2021','12/11/2021','30/12/2021','20/01/2022','25/02/2022') ,"%d/%m/%Y"))
DF_1 <- data.frame(ID, DATE)
Adding the YEAR and MONTH column:
DF_2 <- DF_1 %>%
mutate(YEAR_MONTH = paste(lubridate::year(DATE),
lubridate::month(DATE),
sep = ""))
As you can see, in IDs 444 and 555 the month only presented one digit. I would like it to look like this:
ID <- c(111,222,333,444,555)
DATE <- c(as.Date(c('10/10/2021','12/11/2021','30/12/2021','20/01/2022','25/02/2022') ,"%d/%m/%Y"))
YEAR_MONTH <- c('202110','202111','202112','202201','202202')
DF_3 <- data.frame(ID, DATE, YEAR_MONTH)
How would I go about treating these months that are showing up with just one digit?
Grateful.
Instead of using lubridate's year and month, we can use format directly, which returns the 4-digit year and 2-digit month. The lubridate functions return a numeric/integer value, which cannot carry a leading 0 as padding.
library(dplyr)
DF_1 <- DF_1 %>%
mutate(YEAR_MONTH = format(DATE, "%Y%m"))
Or using base R
DF_1$YEAR_MONTH <- with(DF_1, format(DATE, "%Y%m"))
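If the year and month are already available as separate numbers, as with lubridate::year and lubridate::month, the padding can also be added when the string is built. A small sketch using sprintf from base R; the two dates are taken from the question, and the variable names `yr` and `mo` are just for illustration.

```r
# Two of the sample dates from the question
DATE <- as.Date(c("10/10/2021", "20/01/2022"), "%d/%m/%Y")

# Pull year and month out as integers using base R only
yr <- as.integer(format(DATE, "%Y"))
mo <- as.integer(format(DATE, "%m"))

# %02d pads the month with a leading zero to a width of 2
YEAR_MONTH <- sprintf("%d%02d", yr, mo)

print(YEAR_MONTH)
```

For a Date column, format(DATE, "%Y%m") is still the shorter route; sprintf is mainly useful when the numeric parts are all you have.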

Converting ddmmyy-xxxx to date in R

I have a data frame with a column of numbers that represent dates. So 110190-1111 is ddmmyy-xxxx, where the x's don't matter. It is implicit that the century is 1900.
df <- c("110190-1111", "220391-1111", "241287-1111")
I would like to have it converted to.
c("1990-01-11", "1991-03-22", "1987-12-24")
I have removed the last 4 digits and the "-" with the following.
ID <- c("110190-1111", "220391-1111", "241287-1111")
df <- data.frame(ID)
df <- df %>% mutate(date=gsub("-.*", "", ID))
I have tried fiddling with the as.Date function with no luck. Any suggestions? Thanks.
as.Date ignores any trailing characters after the part matched by the format, so:
df %>% mutate(Date = as.Date(ID, "%d%m%y"))
giving:
ID Date
1 110190-1111 1990-01-11
2 220391-1111 1991-03-22
3 241287-1111 1987-12-24
or using only base R:
transform(df, Date = as.Date(ID, "%d%m%y"))
We can use dmy from lubridate
library(lubridate)
df$date <- dmy(df$date)
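One caveat with %y: on input, R reads two-digit years 00 to 68 as 20xx and 69 to 99 as 19xx. The sample IDs are all above 68, so they come out right, but since the question says the century is always 1900, earlier years would need the century made explicit. A minimal sketch in base R; the third ID is a made-up value added to show the difference.

```r
# Hypothetical IDs; the last one has a two-digit year below 69
ID <- c("110190-1111", "241287-1111", "050345-1111")

# With %y, R maps 00-68 to 20xx and 69-99 to 19xx
default <- as.Date(ID, "%d%m%y")

# Insert the century explicitly, then parse a four-digit year with %Y
full <- paste0(substr(ID, 1, 4), "19", substr(ID, 5, 6))
fixed <- as.Date(full, "%d%m%Y")

print(format(default, "%Y"))
print(format(fixed, "%Y"))
```

This assumes every ID starts with exactly six digits in ddmmyy order; anything else would need validating first.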

R: How to remove the day from a date? [duplicate]

I have a bunch of dates in a df column in the following format: dd.mm.yyyy
I want it to look like this: 01/2020 (mm.yyyy)
How can I remove the day from all of the dates?
Use format to specify the date format you'd like
date <- as.Date("13/01/2020", format = "%d/%m/%Y")
format(date, "%m/%Y")
[1] "01/2020"
Edit - applying to dataframe column
dates <- c("13/01/2020", "17/02/2015", "13/03/2013")
df <- data.frame(dates, stringsAsFactors = FALSE)
df$dates <- as.Date(df$dates, format = "%d/%m/%Y")
df$dates_format <- format(df$dates, "%m/%Y")
df
dates dates_format
1 2020-01-13 01/2020
2 2015-02-17 02/2015
3 2013-03-13 03/2013
Besides format, suggested by @Greg, another option is to use sub, as below:
> sub(".*?/","","13/01/2020")
[1] "01/2020"
Here is a solution using lubridate.
library(lubridate)
#Set the desired format (mm-yyyy) as my_stamp
my_stamp<-stamp( "02-2019",
orders = "my")
#A df with a column full of dates
df <- data.frame(dates = c("30/04/2020","29/03/2020","28/02/2020"))
#Change the column from string to date format
df$dates<-dmy(df$dates)
#Apply the format you desire to the dates (i.e., only month and year)
df$dates<-my_stamp(df$dates)
# dates
#1 04-2020
#2 03-2020
#3 02-2020
There are explicit date formatting options in R (see answer by Greg). Another option would be to separate the date into 3 columns, and then recombine the month and year, putting a / in between. Note this leaves the new date column in character format, which you may want to change depending on your needs.
library(tidyr)
df <- data.frame(date = "13/01/2020")
df <- separate(df, date, into = c("day","month","year"), sep = "/")
df$newdate <- paste(df$month, df$year, sep = "/")
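The answers above use "/" as the separator in their sample data, while the question describes dd.mm.yyyy. The same format approach should carry over by changing the separator in the input format string. A short sketch in base R, with made-up sample dates:

```r
# Hypothetical sample in the dd.mm.yyyy form described in the question
dates <- c("13.01.2020", "17.02.2015")

# Parse with a format string that uses dots as separators
parsed <- as.Date(dates, format = "%d.%m.%Y")

# Re-format to month/year; the result is character, not Date
month_year <- format(parsed, "%m/%Y")

print(month_year)
```

Because the result is a character vector, it sorts alphabetically rather than chronologically; keep the parsed Date column around if ordering matters.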

Replace characters with dates in dataframe in r

I have a dataframe with dates stored as strings. The conversion with strptime works fine when I test it in the terminal, but when I want to assign the date in the original cell, I get an error:
provided 11 variables to replace 1 variables
This must be due to the fact that the object created by strptime(), a POSIXlt, is a list.
How can I assign that object into the cell? I later want to order the dataframe by the date column.
I'm sorry that I can't share the code, due to privacy restrictions.
Edit: This snippet should produce the same error
#creating dataframe
x <- c( "20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20")
y <- c( "1234", "1238", "1250")
df <- data.frame( "date" = x, "id" = y)
df[order(df$date),] #ordering by date
df #showing that dates get ordered 'incorrectly'
df[,1] = strptime(df[,1], "%d.%m.%Y %H:%M:%S") #trying to replace character with dates while converting
#afterwards I want to order them again 'correctly'
Personally I would use dplyr to mutate the values of the original cell. In combination with lubridate it works for me (at least I think this is what you wanted):
df <- df %>% mutate(date =ymd_hms(strptime(date, "%d.%m.%Y %H:%M:%S"))) %>% arrange(date)
date id
1 2019-10-20 10:12:20 1250
2 2019-10-21 10:12:16 1238
3 2019-11-20 10:12:15 1234
This simple adjustment also works. Change df[,1] to df$date.
df$date = strptime(df[,1], "%d.%m.%Y %H:%M:%S")
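Another option is to convert to POSIXct rather than POSIXlt. POSIXct is stored as a single numeric vector instead of a list of components, so it can be assigned into a column with df[, 1] as well. A sketch in base R using the data from the question; the tz = "UTC" argument is my assumption, added only so the result does not depend on the local time zone.

```r
# Same sample data as in the question
df <- data.frame(
  date = c("20.11.2019 10:12:15", "21.10.2019 10:12:16", "20.10.2019 10:12:20"),
  id = c("1234", "1238", "1250"),
  stringsAsFactors = FALSE
)

# POSIXct is one numeric vector, so it can replace the column directly
# tz = "UTC" is an assumption; use the zone your timestamps were recorded in
df[, 1] <- as.POSIXct(df[, 1], format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# order() only returns the row order; assign the result to keep it
df <- df[order(df$date), ]

print(df)
```

Note that the snippet in the question calls df[order(df$date), ] without assigning the result, so the data frame itself stays in its original order.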
