I have a column in my large data set called Date. How do I extract both the year and the month from it? I would like to create a column Month, where the month goes from 1 to 12, and a column Year, where the year ranges from the first to the last year in my data set.
Thanks.
> typeof(data$Date)
[1] "character"
> head(data$Date)
[1] "2/06/2020 11:23" "12/06/2020 7:56" "12/06/2020 7:56" "29/06/2020 16:54" "3/06/2020 15:09" "25/06/2020 17:11"
dplyr and lubridate -
library(dplyr)
library(lubridate)
data <- data %>%
mutate(Date = dmy_hm(Date),
month = month(Date),
year = year(Date))
# Date month year
#1 2020-06-02 11:23:00 6 2020
#2 2020-06-12 07:56:00 6 2020
#3 2020-06-12 07:56:00 6 2020
#4 2020-06-29 16:54:00 6 2020
#5 2020-06-03 15:09:00 6 2020
#6 2020-06-25 17:11:00 6 2020
Base R -
data$Date <- as.POSIXct(data$Date, tz = 'UTC', format = '%d/%m/%Y %H:%M')
data <- transform(data, Month = as.integer(format(Date, '%m')), Year = as.integer(format(Date, '%Y')))
data
data <- structure(list(Date = c("2/06/2020 11:23", "12/06/2020 7:56",
"12/06/2020 7:56", "29/06/2020 16:54", "3/06/2020 15:09", "25/06/2020 17:11"
)), class = "data.frame", row.names = c(NA, -6L))
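If the goal is integer columns (month as 1-12, year as a plain number) without extra packages, one option is to go through POSIXlt. This is only a minimal sketch, assuming the same day/month/year hour:minute layout as the sample; the three-row data frame is a cut-down copy of the question's data.

```r
# Cut-down copy of the sample data from the question
data <- data.frame(Date = c("2/06/2020 11:23", "12/06/2020 7:56", "29/06/2020 16:54"))

# POSIXlt exposes the date components as list elements
parsed <- as.POSIXlt(data$Date, tz = "UTC", format = "%d/%m/%Y %H:%M")

data$Month <- parsed$mon + 1L      # POSIXlt months are 0-based (0 = January)
data$Year  <- parsed$year + 1900L  # POSIXlt years are counted from 1900

range(data$Year)  # first and last year present in the data
```

Both new columns are integers, so they sort and compare numerically rather than as text.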
Related
I have a dataframe with a column named date, structured as below. Note that this is a small sample of my dataframe. I have different months and different years (my main date range is from 2005-01-03 to 2021-12-31). I want to count the number of days in each month and year combination, e.g. 2 days in 2005-12, 3 days in 2006-01, and so on. How can I get a vector of these counts?
df$date <- as.Date(c(
"2005-12-28", "2005-12-31", "2006-01-01", "2006-01-02", "2006-01-03", "2006-02-04", "2007-03-02", "2007-03-03", "2007-03-06", "2007-04-10", "2007-04-11"))
library(dplyr)
df %>%
# distinct(date) %>% # unnecessary if no dupe dates
mutate(month = lubridate::floor_date(date, "month")) %>%
count(month)
Result
month n
1 2005-12-01 2
2 2006-01-01 3
3 2006-02-01 1
4 2007-03-01 3
5 2007-04-01 2
Data used:
df <- structure(list(date = structure(c(13145, 13148, 13149, 13150,
13151, 13183, 13574, 13575, 13578, 13613, 13614), class = "Date")), row.names = c(NA,
-11L), class = "data.frame")
df %>% mutate(date = format(.$date, "%Y-%m")) %>% group_by(date) %>% count(date) -> out
out holds the counts per year and month as a tibble.
Here is another solution:
a <- as.Date(c("2005-12-28", "2005-12-31", "2006-01-01",
"2006-01-02", "2006-01-03", "2006-02-04",
"2007-03-02", "2007-03-03", "2007-03-06",
"2007-04-10", "2007-04-11"))
date <- strsplit(as.character(a) , "-")
# to extract months
months <- lapply(date , function(x) x[2])
# to extract years
years <- lapply(date , function(x) x[1])
table(unlist(months))
#>
#> 01 02 03 04 12
#> 3 1 3 2 2
table(unlist(years))
#>
#> 2005 2006 2007
#> 2 4 5
Created on 2022-06-01 by the reprex package (v2.0.1)
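Tabulating months and years separately does not give one count per year-month combination, which is what the question asks for. A possible base R sketch, assuming the dates are already of class Date (the vector is the sample from the question), is to tabulate a "%Y-%m" key:

```r
dates <- as.Date(c("2005-12-28", "2005-12-31", "2006-01-01",
                   "2006-01-02", "2006-01-03", "2006-02-04",
                   "2007-03-02", "2007-03-03", "2007-03-06",
                   "2007-04-10", "2007-04-11"))

# One key per year-month combination, e.g. "2005-12"
ym <- format(dates, "%Y-%m")

# table() counts rows per key; keys sort chronologically as text
counts <- table(ym)

# Plain integer vector of the counts, in chronological order
count_vec <- as.vector(counts)
```

If duplicate dates are possible, calling unique(dates) first would count distinct days instead of rows.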
I am looking for a solution to my task: adding a new column to a data frame of monthly data that holds the year-on-year change for each record that has a corresponding record for the same month a year earlier.
So, my code as of now is:
library(blsAPI)
cpi <- blsAPI("CUUR0000SA0",2,TRUE)
cpi$value <- as.numeric(cpi$value)
cpi$date <- as.Date(
paste0("1 ",cpi$periodName," ",cpi$year),
format = "%d %B %Y")
cpi <- cpi[order(cpi$date),]
I would like to add a new column with the YoY change for the cpi$value column.
Something like:
df <- data.frame(Date = c("2021-01-16", "2017-05-09"))
df |> dplyr::mutate(new = as.Date(Date) + 365)
#> Date new
#> 1 2021-01-16 2022-01-16
#> 2 2017-05-09 2018-05-09
library(lubridate)
df |>
dplyr::mutate(new = as_date(Date) %m+% years(1))
#> Date new
#> 1 2021-01-16 2022-01-16
#> 2 2017-05-09 2018-05-09
Created on 2022-02-11 by the reprex package (v2.0.1)
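Shifting the date by a year is only half of the task; the value from a year earlier still has to be looked up. A rough sketch of one way to do that in base R follows. The cpi data frame here is made up for illustration (it is not real BLS output), and it assumes one row per month with a date column and a numeric value column, as in the question:

```r
# Hypothetical monthly data standing in for the blsAPI result
cpi <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 14),
  value = c(100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 110, 111.1)
)

yr  <- as.integer(format(cpi$date, "%Y"))
mon <- format(cpi$date, "%m")

# Key for each row and for the same month one year earlier
key      <- paste(yr, mon, sep = "-")
prev_key <- paste(yr - 1L, mon, sep = "-")

# Row index of the matching record a year ago (NA when there is none)
idx <- match(prev_key, key)

# Year-on-year change in percent; NA where no earlier record exists
cpi$yoy_pct <- (cpi$value / cpi$value[idx] - 1) * 100
```

Matching on a year-month key rather than using a fixed lag of 12 rows keeps the result correct when some months are missing from the data.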
I have the following data
dat <- structure(list(Datetime = structure(c(1261987200, 1261987500,
1261987800, 1261988100, 1261988400), class = c("POSIXct", "POSIXt"
), tzone = ""), Rain = c(0, -999, -999, -999, -999)), row.names = c(NA,
5L), class = "data.frame")
The first column contains the dates (year, month, day, hour). The second column is Rainfall.
The dates are not continuous. Some of the dates with missing Rainfall were already removed.
I would like to ask what is the best way of subsetting this data in terms of Year, Day, month or hour?
For example, I just want to get all data for July (month = 7). What I do is something like this:
dat$month<-substr(dat$Datetime,6,7)
july<-dat[which(dat$month == 7),]
or, if it's a year, say 2010:
dat$year<-substr(dat$Datetime,1,4)
dat<-dat[which(dat$year == 2010),]
Then convert them into numeric types.
Is there an easier way to do this in R? The dates are already stored as POSIXct.
I'll appreciate any help on this.
Lyndz
If you want to convert Datetime to a numeric year or month, you can use format as below
df1 <- transform(
dat,
year = as.numeric(format(Datetime,"%Y")),
month = as.numeric(format(Datetime,"%m"))
)
which gives
Datetime Rain year month
1 2009-12-28 09:00:00 0 2009 12
2 2009-12-28 09:05:00 -999 2009 12
3 2009-12-28 09:10:00 -999 2009 12
4 2009-12-28 09:15:00 -999 2009 12
5 2009-12-28 09:20:00 -999 2009 12
If you want to subset df1 further by year (for example, year == 2010), then
subset(
df1,
year == 2010
)
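If the extra columns are not needed afterwards, the same format() call can also be used directly inside the subsetting condition. A small sketch, with a made-up three-row data frame (the values and the UTC time zone are assumptions for illustration, not the asker's data):

```r
# Hypothetical data with the same column names as in the question
dat <- data.frame(
  Datetime = as.POSIXct(c("2009-12-28 09:00:00",
                          "2010-07-01 12:00:00",
                          "2010-07-15 18:30:00"), tz = "UTC"),
  Rain = c(0, 1.2, 0.4)
)

# All rows from July, comparing numerically so "07" matches 7
july <- dat[as.integer(format(dat$Datetime, "%m")) == 7L, ]

# All rows from 2010
y2010 <- subset(dat, as.integer(format(Datetime, "%Y")) == 2010L)

# The same pattern works for other components, e.g. hour of day with "%H"
noon <- dat[as.integer(format(dat$Datetime, "%H")) == 12L, ]
```

Converting with as.integer() avoids the pitfall of comparing the string "07" with the number 7, which would silently match nothing.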
You can use the lubridate package and associated month and year functions.
library(tidyverse)
library(lubridate)
df <- structure(list(
Datetime = structure(
c(1261987200, 1261987500,
1261987800, 1261988100, 1261988400),
class = c("POSIXct", "POSIXt"),
tzone = ""
),
Rain = c(0,-999,-999,-999,-999)
),
row.names = c(NA,
5L),
class = "data.frame") %>%
as_tibble()
df %>%
mutate(month = lubridate::month(Datetime),
year = lubridate::year(Datetime))
Output:
# A tibble: 5 x 4
Datetime Rain month year
<dttm> <dbl> <dbl> <dbl>
1 2009-12-28 16:00:00 0 12 2009
2 2009-12-28 16:05:00 -999 12 2009
3 2009-12-28 16:10:00 -999 12 2009
4 2009-12-28 16:15:00 -999 12 2009
5 2009-12-28 16:20:00 -999 12 2009
I have a data frame with dates and the time in it.
Now I want to convert each date into the correct month. How can I do this?
Now it looks like this:
1 01.01.2019 00:00:20.747000
2 21.04.2019 00:00:21.362000
3 31.08.2019 00:00:21.422000
I need it in a format like this:
1 01.01.2019
2 21.04.2019
3 31.08.2019
or eventually like this:
1 January
2 April
3 August
With base R, you can do the following.
I wasn't sure whether the initial data frame was already in POSIXct format, so I converted it for my example.
Then you can use format to extract the month number or month name.
lubridate is also a great package for various date manipulations and has a month function.
df$datetime <- as.POSIXct(df$datetime, format = "%d.%m.%Y %H:%M:%OS")
df$date_only <- as.Date(df$datetime)
df$month_num <- format(df$datetime, "%m")
df$month <- format(df$datetime, "%B")
df
Output
datetime date_only month_num month
1 2019-01-01 00:00:20 2019-01-01 01 January
2 2019-04-21 00:00:21 2019-04-21 04 April
3 2019-08-31 00:00:21 2019-08-31 08 August
Data
df <- structure(list(datetime = c("01.01.2019 00:00:20.747000", "21.04.2019 00:00:21.362000",
"31.08.2019 00:00:21.422000")), class = "data.frame", row.names = c(NA,
-3L))
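One thing to keep in mind is that format(x, "%B") returns month names in the current locale, so the output may not be English on every machine. If English names are wanted regardless of locale, a possible sketch is to index the built-in month.name constant instead. The strings are the sample values from the question; the UTC time zone is an assumption:

```r
raw <- c("01.01.2019 00:00:20.747000",
         "21.04.2019 00:00:21.362000",
         "31.08.2019 00:00:21.422000")

# %OS reads seconds including the fractional part
x <- as.POSIXct(raw, format = "%d.%m.%Y %H:%M:%OS", tz = "UTC")

# Date only, in the original day.month.year layout
date_only <- format(x, "%d.%m.%Y")

# English month names, independent of the session locale
month_name <- month.name[as.integer(format(x, "%m"))]
```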
Try:
df$date <- lubridate::dmy_hms(df$date)
df$date <- format(df$date, "%d.%m.%Y")
data:
df <- structure(list(date = c("01.01.2019 00:00:20.747000", "21.04.2019 00:00:21.362000",
"31.08.2019 00:00:21.422000")), row.names = c(NA, -3L), class = "data.frame")
I have dates in the format mm/yyyy in column 1, and then results in column 2.
month Result
01/2018 96.13636
02/2018 96.40000
03/2018 94.00000
04/2018 97.92857
05/2018 95.75000
11/2017 98.66667
12/2017 97.78947
How can I order by month so that it starts from the first month (11/2017) and ends at the last (05/2018)?
I have tried a few 'orders', but none seem to order by year and then by month.
In tidyverse (w/ lubridate added):
library(tidyverse)
library(lubridate)
dfYrMon <-
df1 %>%
mutate(date = parse_date_time(month, "my"),
year = year(date),
month = month(date)
) %>%
arrange(year, month) %>%
select(date, year, month, result)
With data:
df1 <- tibble(month = c("01/2018", "02/2018", "03/2018", "04/2018", "05/2018", "11/2017", "12/2017"),
result = c(96.13636, 96.4, 94, 97.92857, 95.75, 98.66667, 97.78947))
This gives you the following data frame:
# A tibble: 7 x 4
date year month result
<dttm> <dbl> <dbl> <dbl>
1 2017-11-01 2017 11 98.66667
2 2017-12-01 2017 12 97.78947
3 2018-01-01 2018 1 96.13636
4 2018-02-01 2018 2 96.40000
5 2018-03-01 2018 3 94.00000
6 2018-04-01 2018 4 97.92857
7 2018-05-01 2018 5 95.75000
Making your data values atomic (year in its own column, month in its own column) generally improves the ease of manipulation.
Or if you want to use base R date manipulations instead of lubridate's:
library(tidyverse)
dfYrMon_base <-
df1 %>%
mutate(date = as.Date(paste0("01/", month), "%d/%m/%Y"),
year = format(date, "%Y"),
month = format(date, "%m")
) %>%
arrange(year, month) %>%
select(date, year, month, result)
dfYrMon_base
Note the datatypes created.
# A tibble: 7 x 4
date year month result
<date> <chr> <chr> <dbl>
1 2017-11-01 2017 11 98.66667
2 2017-12-01 2017 12 97.78947
3 2018-01-01 2018 01 96.13636
4 2018-02-01 2018 02 96.40000
5 2018-03-01 2018 03 94.00000
6 2018-04-01 2018 04 97.92857
7 2018-05-01 2018 05 95.75000
We can convert the column to the yearmon class and then order by it
library(zoo)
out <- df1[order(as.yearmon(df1$month, "%m/%Y"), df1$Result),]
row.names(out) <- NULL
out
# month Result
#1 11/2017 98.66667
#2 12/2017 97.78947
#3 01/2018 96.13636
#4 02/2018 96.40000
#5 03/2018 94.00000
#6 04/2018 97.92857
#7 05/2018 95.75000
data
df1 <- structure(list(month = c("01/2018", "02/2018", "03/2018", "04/2018",
"05/2018", "11/2017", "12/2017"), Result = c(96.13636, 96.4,
94, 97.92857, 95.75, 98.66667, 97.78947)), .Names = c("month",
"Result"), class = "data.frame",
row.names = c("1", "2", "3",
"4", "5", "6", "7"))
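For completeness, the same ordering can probably be done without any packages by turning each mm/yyyy string into a real Date on the first of the month and ordering by that. A minimal sketch using the sample data from the question (sorted and ord are just illustrative names):

```r
df1 <- data.frame(
  month  = c("01/2018", "02/2018", "03/2018", "04/2018",
             "05/2018", "11/2017", "12/2017"),
  Result = c(96.13636, 96.4, 94, 97.92857, 95.75, 98.66667, 97.78947)
)

# Prepend a day so the string becomes a complete date, then parse it
as_date <- as.Date(paste0("01/", df1$month), format = "%d/%m/%Y")

# order() on Date values sorts by year first, then by month
ord <- order(as_date)
sorted <- df1[ord, ]
row.names(sorted) <- NULL
```

The original month column is left untouched, so the mm/yyyy labels can still be used for display after sorting.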