I have a data set that contains a simple column consisting of dates, like this:
Dates
1 2012/04/10
2 2012/03/30
3 2012/03/24
4 2012/03/25
5 2012/04/10
6 2012/04/14
7 2012/04/21
My desired output is this:
Dates DateName
1 2012/04/10 April 2012
2 2015/03/30 March 2015
3 2011/03/24 March 2011
4 2016/12/25 December 2016
5 2014/06/10 June 2014
6 2014/05/14 May 2014
7 2018/07/21 July 2018
To do this I used the following code:
dt$Dates <- as.Date(dt$Dates)
dt$DateName <- format(dt$Dates,"%B %Y")
While this works, the new column comes out as class character. I would like it to be a date class instead, because as character the column sorts alphabetically rather than by calendar date.
Is there a way to class or re-class my new date format as some sort of date or calendar class?
(I'm not necessarily looking for a base-R solution).
(If possible, I would also highly prefer to keep my new format as is).
I have tried the following lines of code and more, but these only return errors.
dt$DateName <- format.Date(dt$Dates,"%B %Y")
dt$DateName <- format.POSIXlt(dt$Dates,"%B %Y")
dt$DateName <- format.difftime(dt$Dates,"%B %Y")
dt$DateName <- as.Date(dt$Dates, format ="%B %Y")
You can convert the dates to the yearmon class from the zoo package:
dt$month_year <- zoo::as.yearmon(dt$Dates, "%Y/%m/%d")
dt
# Dates month_year
#1 2012/04/10 Apr 2012
#2 2012/03/30 Mar 2012
#3 2012/03/24 Mar 2012
#4 2012/03/25 Mar 2012
#5 2012/04/10 Apr 2012
#6 2012/04/14 Apr 2012
#7 2012/04/21 Apr 2012
class(dt$month_year)
#[1] "yearmon"
You can then sort by that column:
dt[order(dt$month_year), ]
# Dates month_year
#2 2012/03/30 Mar 2012
#3 2012/03/24 Mar 2012
#4 2012/03/25 Mar 2012
#1 2012/04/10 Apr 2012
#5 2012/04/10 Apr 2012
#6 2012/04/14 Apr 2012
#7 2012/04/21 Apr 2012
data
dt <- structure(list(Dates = structure(c(4L, 3L, 1L, 2L, 4L, 5L, 6L
), .Label = c("2012/03/24", "2012/03/25", "2012/03/30", "2012/04/10",
"2012/04/14", "2012/04/21"), class = "factor")), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7"))
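If the full "%B %Y" label has to be kept, one base-R possibility is an ordered set of factor levels: the labels stay as they are, but sorting follows the level order instead of the alphabet. This is only a sketch on a hypothetical three-row subset of the data, and the month names produced by format() depend on the locale.

```r
# Hypothetical three-row subset of the question's data.
dt <- data.frame(Dates = c("2012/04/10", "2012/03/30", "2012/03/24"),
                 stringsAsFactors = FALSE)

d <- as.Date(dt$Dates, format = "%Y/%m/%d")
label <- format(d, "%B %Y")   # e.g. "April 2012" in an English locale

# Build the levels in calendar order, so sorting the factor sorts by month.
dt$DateName <- factor(label, levels = unique(label[order(d)]))

sorted <- dt[order(dt$DateName), ]
sorted
```

The column is a factor rather than a true date class, so date arithmetic is not available on it; for that, the yearmon column above is the better fit.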
I have data with random dates from 2008 to 2020 and their corresponding values:
Date Val
September 16, 2012 32
September 19, 2014 33
January 05, 2008 26
June 07, 2017 02
December 15, 2019 03
May 28, 2020 18
I want to fill in the missing dates from January 01, 2008 to March 31, 2020 and set their corresponding value to 1.
I referred to some posts such as Post1 and Post2, but I was not able to solve the problem based on them. I am a beginner in R.
I am looking for data like this:
Date Val
January 01, 2008 1
January 02, 2008 1
January 03, 2008 1
January 04, 2008 1
January 05, 2008 26
........
Use tidyr::complete :
library(dplyr)
df %>%
mutate(Date = as.Date(Date, "%B %d, %Y")) %>%
tidyr::complete(Date = seq(as.Date('2008-01-01'), as.Date('2020-03-31'),
by = 'day'), fill = list(Val = 1)) %>%
mutate(Date = format(Date, "%B %d, %Y"))
# A tibble: 4,475 x 2
# Date Val
# <chr> <dbl>
# 1 January 01, 2008 1
# 2 January 02, 2008 1
# 3 January 03, 2008 1
# 4 January 04, 2008 1
# 5 January 05, 2008 26
# 6 January 06, 2008 1
# 7 January 07, 2008 1
# 8 January 08, 2008 1
# 9 January 09, 2008 1
#10 January 10, 2008 1
# … with 4,465 more rows
data
df <- structure(list(Date = c("September 16, 2012", "September 19, 2014",
"January 05, 2008", "June 07, 2017", "December 15, 2019", "May 28, 2020"
), Val = c(32L, 33L, 26L, 2L, 3L, 18L)), class = "data.frame",
row.names = c(NA, -6L))
We can create a data frame with the desired date range, join our data frame onto it, and replace all NAs with 1:
library(tidyverse)
days_seq %>%
left_join(df) %>%
mutate(Val = if_else(is.na(Val), as.integer(1), Val))
Joining, by = "Date"
# A tibble: 4,474 x 2
Date Val
<date> <int>
1 2008-01-01 1
2 2008-01-02 1
3 2008-01-03 1
4 2008-01-04 1
5 2008-01-05 33
6 2008-01-06 1
7 2008-01-07 1
8 2008-01-08 1
9 2008-01-09 1
10 2008-01-10 1
# ... with 4,464 more rows
Data
days_seq <- tibble(Date = seq(as.Date("2008/01/01"), as.Date("2020/03/31"), "days"))
df <- tibble::tribble(
~Date, ~Val,
"2012/09/16", 32L,
"2012/09/19", 33L,
"2008/01/05", 33L
)
df$Date <- as.Date(df$Date)
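The same join-and-replace idea can be sketched in base R with merge(). The data below is a small hypothetical example over a one-week range, and the names all_days and filled are made up for illustration.

```r
# Hypothetical input: two observed dates within a one-week range.
df <- data.frame(Date = as.Date(c("2008-01-05", "2008-01-03")),
                 Val  = c(26L, 7L))

# Every day in the range of interest.
all_days <- data.frame(Date = seq(as.Date("2008-01-01"),
                                  as.Date("2008-01-07"), by = "day"))

# Left join: keep every day, attach Val where a match exists.
filled <- merge(all_days, df, by = "Date", all.x = TRUE)

# Days without an observation come back as NA; set them to 1.
filled$Val[is.na(filled$Val)] <- 1L
filled
```

With all.x = TRUE, dates in df that fall outside the range are dropped, which matches the left_join behaviour above rather than that of tidyr::complete.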
I have a dataframe which has a value column and a "month year" column. In the first row, Aug 2018 is written in the month year column. Is it possible to automatically fill the following rows that have a value in the value column with the next month in sequence, so that row two is filled with Sep 2018, row three with Oct 2018, and so on?
Actual result:
value month
645 Aug 2018
589 NA
465 NA
523 NA
632 NA
984 NA
Expected results:
value month
645 Aug 2018
589 Sep 2018
465 Oct 2018
523 Nov 2018
632 Dec 2018
984 Jan 2019
In base R, you could do something like this to create a monthly sequence
df$month <- format(seq(as.Date(paste("01", df$month[1]), "%d %b %Y"),
length.out = nrow(df), by = "month"), "%b %Y")
df
# value month
#1 645 Aug 2018
#2 589 Sep 2018
#3 465 Oct 2018
#4 523 Nov 2018
#5 632 Dec 2018
#6 984 Jan 2019
An important assumption here is that month has only one value, in the first row, and that you want to fill all the other entries by incrementing one month from the previous entry.
We can do this with as.yearmon from zoo. Used package version 1.8.3
library(zoo)
df$month <- head(as.yearmon(df$month[1]) + c(0, seq_len(nrow(df)))/12, -1)
df
# value month
#1 645 Aug 2018
#2 589 Sep 2018
#3 465 Oct 2018
#4 523 Nov 2018
#5 632 Dec 2018
#6 984 Jan 2019
data
df <- structure(list(value = c(645L, 589L, 465L, 523L, 632L, 984L),
month = c("Aug 2018", NA, NA, NA, NA, NA)), class = "data.frame",
row.names = c(NA, -6L))
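Both answers rely on parsing or printing month abbreviations, which depends on the session locale (the "Okt"/"Dez" spelling in the question suggests a non-English one). A possible locale-independent sketch uses R's built-in month.abb constant and plain integer arithmetic. It assumes the first entry is always of the form "Mon YYYY" with an English abbreviation.

```r
df <- data.frame(value = c(645L, 589L, 465L, 523L, 632L, 984L),
                 month = c("Aug 2018", NA, NA, NA, NA, NA),
                 stringsAsFactors = FALSE)

# Split "Aug 2018" into its month and year parts.
parts <- strsplit(df$month[1], " ", fixed = TRUE)[[1]]
m0 <- match(parts[1], month.abb)   # 8 for "Aug"
y0 <- as.integer(parts[2])

# Zero-based month offsets counted from January of the start year.
idx <- (m0 - 1L) + seq_len(nrow(df)) - 1L

# idx %% 12 picks the month, idx %/% 12 counts the years rolled over.
df$month <- paste(month.abb[idx %% 12L + 1L], y0 + idx %/% 12L)
df
```

The result is a character column; wrap it in zoo::as.yearmon() if a sortable month class is needed.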
This is part of the dataframe I am working on. The first column represents the year, the second the month, and the third one the number of observations for that month of that year.
2005 07 2
2005 10 4
2005 12 2
2006 01 4
2006 02 1
2006 07 2
2006 08 1
2006 10 3
I have observations from 2000 to 2018. I would like to run a kernel regression on this data, so I need to create a continuous integer index from the dates. For instance, Jan 2000 would be 1, Jan 2001 would be 13, Jan 2002 would be 25, and so on. With that I will be able to run the kernel regression. Later on, I need to translate the index back (1 would be Jan 2000, 2 would be Feb 2000, and so on) to plot my model.
Just use a little algebra:
df$cont <- (df$year - 2000L) * 12L + df$month
You could go backward with modulus and integer division. Subtract 1 first so that December (12, 24, ...) stays in its own year.
df$year <- (df$cont - 1L) %/% 12L + 2000L
df$month <- (df$cont - 1L) %% 12L + 1L # maps 0..11 back to months 1..12
Here, %% is the modulus operator and %/% is the integer division operator. See ?"%%" for an explanation of these and other arithmetic operators.
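A quick round-trip check of this arithmetic, written as two small helper functions. The names to_index and from_index are made up for this sketch, and the origin year 2000 is taken from the question. The inverse subtracts 1 before dividing so that December does not spill into the next year.

```r
# Month index: Jan of the origin year is 1, Feb is 2, and so on.
to_index <- function(year, month, origin = 2000L) {
  (year - origin) * 12L + month
}

# Inverse: shift to a zero-based index before dividing, then shift back.
from_index <- function(idx, origin = 2000L) {
  data.frame(year  = (idx - 1L) %/% 12L + origin,
             month = (idx - 1L) %% 12L + 1L)
}

idx  <- to_index(c(2000L, 2000L, 2001L, 2005L), c(1L, 12L, 1L, 7L))
back <- from_index(idx)
idx
back
```

Here Dec 2000 maps to 12 and back to (2000, 12), which is the case a plain idx %/% 12 would get wrong.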
What you can do is something like the following. First create a dates data.frame with expand.grid so we have all the years and months from 2000 01 to 2018 12. Next put this in the correct order and last add an order column so that 2000 01 starts with 1 and 2018 12 is 228. If you merge this with your original table you get the below result. You can then remove columns you don't need. And because you have a dates table you can return the year and month columns based on the order column.
dates <- expand.grid(year = seq(2000, 2018), month = seq(1, 12))
dates <- dates[order(dates$year, dates$month), ]
dates$order <- seq_along(dates$year)
merge(df, dates, by.x = c("year", "month"), by.y = c("year", "month"))
year month obs order
1 2005 10 4 70
2 2005 12 2 72
3 2005 7 2 67
4 2006 1 4 73
5 2006 10 3 82
6 2006 2 1 74
7 2006 7 2 79
8 2006 8 1 80
data:
df <- structure(list(year = c(2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2006L, 2006L),
month = c(7L, 10L, 12L, 1L, 2L, 7L, 8L, 10L),
obs = c(2L, 4L, 2L, 4L, 1L, 2L, 1L, 3L)),
class = "data.frame",
row.names = c(NA, -8L))
An option is to use the yearmon type from the zoo package and calculate the difference in months from Jan 2000 as the difference between two yearmon values.
library(zoo)
# +1 has been added to the difference so that Jan 2000 is treated as 1
df$slNum = (as.yearmon(paste0(df$year, df$month),"%Y%m")-as.yearmon("200001","%Y%m"))*12+1
# year month obs slNum
# 1 2005 7 2 67
# 2 2005 10 4 70
# 3 2005 12 2 72
# 4 2006 1 4 73
# 5 2006 2 1 74
# 6 2006 7 2 79
# 7 2006 8 1 80
# 8 2006 10 3 82
Data:
df <- read.table(text =
"year month obs
2005 07 2
2005 10 4
2005 12 2
2006 01 4
2006 02 1
2006 07 2
2006 08 1
2006 10 3",
header = TRUE, stringsAsFactors = FALSE)
I have a dataframe as follows:
ID Mois Year
A 12 2010
B 01 2011
C 04 2010
D 05 2011
E 07 2011
F 11 2010
G 12 2011
H 03 2010
I 01 2012
J 02 2012
I would like to add quarter columns defined as:
quarter1: (12 of n-1, 01 of n, 02 of n), e.g. (12 of 2010, 01 of 2011, 02 of 2011)
quarter2: (03 of n, 04 of n, 05 of n)
quarter3: (06 of n, 07 of n, 08 of n)
quarter4: (09 of n, 10 of n, 11 of n)
I have tried this code:
data=cbind(data, quarter=ifelse(data$mois==c(12,1,2), "1",
ifelse(data$mois==c(3,4,5),"2",
ifelse(data$mois==c(6,7,8),"3", "4"))))
but it does not work, and I don't know how to add the condition for quarter1 (12 of n-1, 01 of n, 02 of n), e.g. (12 of 2010, 01 of 2011, 02 of 2011).
Or can we set the year to year + 1 where the month is 12, before computing the quarter?
Any help would be much appreciated.
1) formula We can use this formula to calculate quarters:
transform(data, YearQ = Year + (Mois == 12), Quarter = Mois %% 12 %/% 3 + 1)
giving:
ID Mois Year YearQ Quarter
1 A 12 2010 2011 1
2 B 1 2011 2011 1
3 C 4 2010 2010 2
4 D 5 2011 2011 2
5 E 7 2011 2011 3
6 F 11 2010 2010 4
7 G 12 2011 2012 1
8 H 3 2010 2010 2
9 I 1 2012 2012 1
10 J 2 2012 2012 1
2) yearqtr Another possibility is to use "yearqtr" class giving the same result:
library(zoo)
transform(data, YearQ = Year + (Mois == 12), Quarter = cycle(as.yearqtr(Year + Mois/12)))
giving same as (1).
2a) Alternately we may just wish to create yearmon and yearqtr columns:
transform(data, ym = as.yearmon(Year + (Mois -1)/12), yq = as.yearqtr(Year + Mois/12))
giving:
ID Mois Year ym yq
1 A 12 2010 Dec 2010 2011 Q1
2 B 1 2011 Jan 2011 2011 Q1
3 C 4 2010 Apr 2010 2010 Q2
4 D 5 2011 May 2011 2011 Q2
5 E 7 2011 Jul 2011 2011 Q3
6 F 11 2010 Nov 2010 2010 Q4
7 G 12 2011 Dec 2011 2012 Q1
8 H 3 2010 Mar 2010 2010 Q2
9 I 1 2012 Jan 2012 2012 Q1
10 J 2 2012 Feb 2012 2012 Q1
3) switch We can use switch like this:
transform(data, YearQ = Year + (Mois == 12),
Quarter = sapply(Mois, switch, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1))
giving same as (1).
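To see why the formula in (1) works, it can help to evaluate it for every month. %% and %/% have the same precedence and group left to right, so Mois %% 12 %/% 3 is (Mois %% 12) %/% 3: December is first mapped to 0, and the months 0..11 then fall into four blocks of three. The parentheses below are added only for readability.

```r
Mois <- 1:12

# December becomes 0, so months 0, 1, 2 form the first block.
shifted <- Mois %% 12

# Integer division by 3 gives block 0..3; add 1 for quarter 1..4.
Quarter <- shifted %/% 3 + 1

data.frame(Mois, shifted, Quarter)
```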
Note
The input data in reproducible form is:
Lines <- "
ID Mois Year
A 12 2010
B 01 2011
C 04 2010
D 05 2011
E 07 2011
F 11 2010
G 12 2011
H 03 2010
I 01 2012
J 02 2012"
data <- read.table(text = Lines, header = TRUE)
If a factor column is acceptable for the new quarter column, then cut will do it.
m <- data$Mois
m[m == 12] <- 0
data$quarter <- cut(m, breaks = c(-1, 2, 5, 8, 11), labels = as.character(1:4))
rm(m) # tidy up
If you really need or want class character, just coerce it.
data$quarter <- as.character(data$quarter)
DATA.
dput(data)
structure(list(ID = structure(1:10, .Label = c("A", "B", "C",
"D", "E", "F", "G", "H", "I", "J"), class = "factor"), Mois = c(12L,
1L, 4L, 5L, 7L, 11L, 12L, 3L, 1L, 2L), Year = c(2010L, 2011L,
2010L, 2011L, 2011L, 2010L, 2011L, 2010L, 2012L, 2012L)), .Names = c("ID",
"Mois", "Year"), class = "data.frame", row.names = c(NA, -10L
))
Another option follows the same approach as the OP: add the quarter column using ifelse with %in%, then adjust the year using ifelse too.
data$quarter <- ifelse(data$Mois %in% c(12,1,2), "1",
ifelse(data$Mois %in% c(3,4,5),"2",
ifelse(data$Mois %in% c(6,7,8),"3", "4")))
data$Year <- ifelse(data$Mois == 12, data$Year + 1, data$Year)
data
ID Mois Year quarter
1 A 12 2011 1
2 B 1 2011 1
3 C 4 2010 2
4 D 5 2011 2
5 E 7 2011 3
6 F 11 2010 4
7 G 12 2012 1
8 H 3 2010 2
9 I 1 2012 1
10 J 2 2012 1
Data:
data <- read.table(text = "ID Mois Year
A 12 2010
B 01 2011
C 04 2010
D 05 2011
E 07 2011
F 11 2010
G 12 2011
H 03 2010
I 01 2012
J 02 2012", header = TRUE, stringsAsFactors = FALSE)
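The reason the attempt in the question fails is probably the use of == against a vector of months. == compares position by position and recycles the shorter vector, whereas %in% tests set membership. A small illustration on a hypothetical four-element vector:

```r
mois <- c(12, 1, 4, 2)

# `==` recycles c(12, 1, 2) along mois, so the comparisons are
# 12 == 12, 1 == 1, 4 == 2, 2 == 12 (R also warns about the length mismatch).
eq <- suppressWarnings(mois == c(12, 1, 2))

# `%in%` asks, for each month, whether it appears anywhere in the set.
member <- mois %in% c(12, 1, 2)

eq
member
```

Only the last element differs here, but on real data the recycled comparison silently misclassifies any month that does not happen to line up with its position in the set.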
I have a data frame in which the values are stored as characters. However, many values contain two numbers that need to be added together. Example:
2014 Q1 Sales 2014 Q2 Sales 2014 Q3 Sales 2014 Q4 Sales
Product 1 3+6 2+10 8 13+2
Product 2 6 4+0 <NA> 5
Product 3 <NA> 5+9 3+1 11
Is there a way to go through the whole data frame and replace all cells containing characters like "3+6" with new values equal to their sum? I assume this would involve coercing the characters to numeric or integers, but I don't know how that would be possible for values with the + sign in them. I would like the example data frame to end up looking like this:
2014 Q1 Sales 2014 Q2 Sales 2014 Q3 Sales 2014 Q4 Sales
Product 1 9 12 8 15
Product 2 6 4 <NA> 5
Product 3 <NA> 14 4 11
Here's an easier example:
dat <- data.frame(a=c("3+6", "10"), b=c("12", NA), c=c("3+4", "5+6"))
dat
## a b c
## 1 3+6 12 3+4
## 2 10 <NA> 5+6
apply(dat, 1:2, function(x) eval(parse(text=x)))
## a b c
## [1,] 9 12 7
## [2,] 10 NA 11
Using R itself to do the computation with eval and parse does the trick.
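Since eval(parse()) will run whatever code happens to be in a cell, a more conservative base-R sketch is to split each cell on the literal + and sum the pieces. The helper name sum_cell is made up for this example, and it assumes every non-missing cell contains only numbers separated by +.

```r
# Sum the numbers in a single cell such as "3+6"; NA stays NA.
sum_cell <- function(x) {
  if (is.na(x)) return(NA_real_)
  sum(as.numeric(strsplit(x, "+", fixed = TRUE)[[1]]))
}

dat <- data.frame(a = c("3+6", "10"), b = c("12", NA), c = c("3+4", "5+6"),
                  stringsAsFactors = FALSE)

# Apply column by column; as.character() also covers factor columns.
dat[] <- lapply(dat, function(col) {
  vapply(as.character(col), sum_cell, numeric(1), USE.NAMES = FALSE)
})
dat
```

Assigning to dat[] keeps the data frame's column and row names while replacing the contents with numeric columns.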
Here is one option with gsubfn that avoids eval(parse(...)). We convert the data.frame to a matrix (as.matrix(dat)), match a run of digits ([0-9]+) captured as a group with parentheses, followed by a literal +, followed by a second captured run of digits, and replace each match with the sum of the two groups after converting them to numeric. Assigning the result to dat[] keeps the same structure as in 'dat'.
library(gsubfn)
dat[] <- as.numeric(gsubfn('([0-9]+)\\+([0-9]+)',
~as.numeric(x)+as.numeric(y), as.matrix(dat)))
dat
# 2014 Q1 Sales 2014 Q2 Sales 2014 Q3 Sales 2014 Q4 Sales
#Product 1 9 12 8 15
#Product 2 6 4 NA 5
#Product 3 NA 14 4 11
Or we can loop the columns with lapply and perform the replacement with gsubfn for each of the columns.
dat[] <- lapply(dat, function(x) as.numeric(gsubfn('([0-9]+)\\+([0-9]+)',
~as.numeric(x)+as.numeric(y), as.character(x))))
data
dat <- structure(list(`2014 Q1 Sales` = structure(c(1L, 2L, NA), .Label = c("3+6",
"6"), class = "factor"), `2014 Q2 Sales` = structure(1:3, .Label = c("2+10",
"4+0", "5+9"), class = "factor"), `2014 Q3 Sales` = structure(c(2L,
NA, 1L), .Label = c("3+1", "8"), class = "factor"), `2014 Q4 Sales` = structure(c(2L,
3L, 1L), .Label = c("11", "13+2", "5"), class = "factor")), .Names = c("2014 Q1 Sales",
"2014 Q2 Sales", "2014 Q3 Sales", "2014 Q4 Sales"), class = "data.frame", row.names = c("Product 1",
"Product 2", "Product 3"))