I have a data frame as follows:
ID Mois Year
A 12 2010
B 01 2011
C 04 2010
D 05 2011
E 07 2011
F 11 2010
G 12 2011
H 03 2010
I 01 2012
J 02 2012
I would like to add a quarter column defined as:
quarter1: (12 of n-1, 01 of n, 02 of n), i.e. (12 of 2010, 01 of 2011, 02 of 2011)
quarter2: (03 of n, 04 of n, 05 of n)
quarter3: (06 of n, 07 of n, 08 of n)
quarter4: (09 of n, 10 of n, 11 of n)
I have tried this code:
data=cbind(data, quarter=ifelse(data$mois==c(12,1,2), "1",
ifelse(data$mois==c(3,4,5),"2",
ifelse(data$mois==c(6,7,8),"3", "4"))))
but it does not work, and I don't know how to add the condition for quarter1, i.e. (12 of n-1, 01 of n, 02 of n), meaning (12 of 2010, 01 of 2011, 02 of 2011).
Alternatively, can we replace data$Year with Year + 1 where data$Mois == 12, before computing the quarter?
Any help would be much appreciated.
1) formula: We can use this formula to calculate quarters:
transform(data, YearQ = Year + (Mois == 12), Quarter = Mois %% 12 %/% 3 + 1)
giving:
ID Mois Year YearQ Quarter
1 A 12 2010 2011 1
2 B 1 2011 2011 1
3 C 4 2010 2010 2
4 D 5 2011 2011 2
5 E 7 2011 2011 3
6 F 11 2010 2010 4
7 G 12 2011 2012 1
8 H 3 2010 2010 2
9 I 1 2012 2012 1
10 J 2 2012 2012 1
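As a quick check of the arithmetic in (1), here is a minimal base-R sketch that wraps it in a helper (the name season_quarter is hypothetical, not from any package) and compares it with the quarter definition from the question for all twelve months:

```r
# Hypothetical helper wrapping formula (1): December counts as Q1 of the next year
season_quarter <- function(month, year) {
  data.frame(
    YearQ   = year + (month == 12),   # December is shifted to the following year
    Quarter = month %% 12 %/% 3 + 1   # 12 -> 0 -> Q1; 1, 2 -> Q1; 3..5 -> Q2; ...
  )
}

res <- season_quarter(1:12, 2010)

# Quarters as defined in the question: Dec/Jan/Feb = 1, Mar-May = 2, Jun-Aug = 3, Sep-Nov = 4
expected <- c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1)
stopifnot(identical(res$Quarter, expected))

res$YearQ[12]  # 2011: December 2010 counts towards 2011
```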
2) yearqtr: Another possibility is to use the "yearqtr" class, which gives the same result:
library(zoo)
transform(data, YearQ = Year + (Mois == 12), Quarter = cycle(as.yearqtr(Year + Mois/12)))
giving the same result as (1).
2a) Alternatively, we may just wish to create yearmon and yearqtr columns:
transform(data, ym = as.yearmon(Year + (Mois -1)/12), yq = as.yearqtr(Year + Mois/12))
giving:
ID Mois Year ym yq
1 A 12 2010 Dec 2010 2011 Q1
2 B 1 2011 Jan 2011 2011 Q1
3 C 4 2010 Apr 2010 2010 Q2
4 D 5 2011 May 2011 2011 Q2
5 E 7 2011 Jul 2011 2011 Q3
6 F 11 2010 Nov 2010 2010 Q4
7 G 12 2011 Dec 2011 2012 Q1
8 H 3 2010 Mar 2010 2010 Q2
9 I 1 2012 Jan 2012 2012 Q1
10 J 2 2012 Feb 2012 2012 Q1
3) switch: We can use switch like this:
transform(data, YearQ = Year + (Mois == 12),
Quarter = sapply(Mois, switch, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1))
giving the same result as (1).
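The switch in (3) is effectively a lookup table indexed by month number. A vectorized variant of the same idea (a sketch, not taken from the answers here) indexes a length-12 vector directly with Mois, which avoids the per-element sapply call. The name quarter_lookup is illustrative, and the data is a small subset of the question's rows:

```r
# Quarter for months 1..12 under the question's definition (Dec -> Q1)
quarter_lookup <- c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 1)

data <- read.table(text = "
ID Mois Year
A 12 2010
B 01 2011
C 04 2010
H 03 2010", header = TRUE)

# Mois is an integer in 1..12, so it can index the lookup vector directly
data <- transform(data,
                  YearQ   = Year + (Mois == 12),
                  Quarter = quarter_lookup[Mois])
data
```

This only works because month numbers are valid positions 1..12; any 0 or NA in Mois would yield an empty or NA quarter.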
Note
The input data in reproducible form is:
Lines <- "
ID Mois Year
A 12 2010
B 01 2011
C 04 2010
D 05 2011
E 07 2011
F 11 2010
G 12 2011
H 03 2010
I 01 2012
J 02 2012"
data <- read.table(text = Lines, header = TRUE)
If you can work with a quarter column of class factor, then cut will do it.
m <- data$Mois
m[m == 12] <- 0
data$quarter <- cut(m, breaks = c(-1, 2, 5, 8, 11), labels = as.character(1:4))
rm(m) # tidy up
If you really need or want class character, just coerce it.
data$quarter <- as.character(data$quarter)
DATA.
dput(data)
structure(list(ID = structure(1:10, .Label = c("A", "B", "C",
"D", "E", "F", "G", "H", "I", "J"), class = "factor"), Mois = c(12L,
1L, 4L, 5L, 7L, 11L, 12L, 3L, 1L, 2L), Year = c(2010L, 2011L,
2010L, 2011L, 2011L, 2010L, 2011L, 2010L, 2012L, 2012L)), .Names = c("ID",
"Mois", "Year"), class = "data.frame", row.names = c(NA, -10L
))
Another option is to follow the same approach as the OP: add the quarter column with ifelse (using %in% rather than ==), then adjust the year with ifelse as well.
data$quarter <- ifelse(data$Mois %in% c(12,1,2), "1",
ifelse(data$Mois %in% c(3,4,5),"2",
ifelse(data$Mois %in% c(6,7,8),"3", "4")))
data$Year <- ifelse(data$Mois == 12, data$Year + 1, data$Year)
data
ID Mois Year quarter
1 A 12 2011 1
2 B 1 2011 1
3 C 4 2010 2
4 D 5 2011 2
5 E 7 2011 3
6 F 11 2010 4
7 G 12 2012 1
8 H 3 2010 2
9 I 1 2012 1
10 J 2 2012 1
Data:
data <- read.table(text = "ID Mois Year
A 12 2010
B 01 2011
C 04 2010
D 05 2011
E 07 2011
F 11 2010
G 12 2011
H 03 2010
I 01 2012
J 02 2012", header = TRUE, stringsAsFactors = FALSE)
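The key change from the OP's attempt is %in% instead of ==. A small base-R illustration of the difference (the vector mois is only a toy example): == compares element by element and recycles the shorter vector, while %in% checks each element for membership in the set.

```r
mois <- c(1, 12, 2, 4, 12, 1)

# == recycles c(12, 1, 2) to c(12, 1, 2, 12, 1, 2) and compares position by position
mois == c(12, 1, 2)
# [1] FALSE FALSE  TRUE FALSE FALSE FALSE

# %in% asks, for each element, whether it is one of 12, 1, 2
mois %in% c(12, 1, 2)
# [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
```

With a length that is not a multiple of 3, == would additionally emit a "longer object length is not a multiple of shorter object length" warning.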
I have a dataset that looks like this:
TYPE YEAR NUMBERS
A 2020 60
A 2019 NA
A 2018 88
A 2017 NA
A 2016 90
I want to impute the missing values with the mean of the values in column 'NUMBERS'.
I have searched through a lot of tutorials, but they just directly replace the missing value with the mean, which is not what I want. I tried using mice and Hmisc, but they produced errors. Is there a better way to do this? Thanks!
I'd have done this:
df <- read.table(text = 'TYPE YEAR NUMBERS
A 2020 60
A 2019 NA
A 2018 88
A 2017 NA
A 2016 90', header = TRUE)
a <- mean(df$NUMBERS, na.rm = TRUE)
df[is.na(df$NUMBERS), "NUMBERS"] <- a
df
Output:
TYPE YEAR NUMBERS
1 A 2020 60.00000
2 A 2019 79.33333
3 A 2018 88.00000
4 A 2017 79.33333
5 A 2016 90.00000
Is this what you wanted?
I'm inferring from the presence of the TYPE column that you should be imputing based on the group's mean, not the population's mean.
Modified data:
dat <- structure(list(TYPE = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"), YEAR = c(2020L, 2019L, 2018L, 2017L, 2016L, 2020L, 2019L, 2018L, 2017L, 2016L), NUMBERS = c(60L, NA, 88L, NA, 90L, 160L, NA, 188L, NA, 190L)), class = "data.frame", row.names = c(NA, -10L))
base R
do.call(rbind, by(dat, dat$TYPE,
function(z) { z$NUMBERS[is.na(z$NUMBERS)] <- mean(z$NUMBERS, na.rm = TRUE); z}))
# TYPE YEAR NUMBERS
# A.1 A 2020 60.00000
# A.2 A 2019 79.33333
# A.3 A 2018 88.00000
# A.4 A 2017 79.33333
# A.5 A 2016 90.00000
# B.6 B 2020 160.00000
# B.7 B 2019 179.33333
# B.8 B 2018 188.00000
# B.9 B 2017 179.33333
# B.10 B 2016 190.00000
or
do.call(rbind, by(dat, dat$TYPE,
function(z) transform(z, NUMBERS = ifelse(is.na(NUMBERS), mean(NUMBERS, na.rm = TRUE), NUMBERS))))
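Another base-R option (not from the original answers) is ave(), which applies a function within each group and returns the results in the original row order, so no rbind and no row-name cleanup is needed. This sketch recreates the modified dat so it runs on its own; the helper name fill_mean is hypothetical:

```r
dat <- data.frame(
  TYPE    = rep(c("A", "B"), each = 5),
  YEAR    = rep(2020:2016, 2),
  NUMBERS = c(60L, NA, 88L, NA, 90L, 160L, NA, 188L, NA, 190L)
)

# Replace the NAs in a vector by the mean of its non-missing values
fill_mean <- function(v) replace(v, is.na(v), mean(v, na.rm = TRUE))

# ave() applies fill_mean separately within each TYPE and keeps row order;
# the column is converted to double first because the group means are fractional
dat$NUMBERS <- ave(as.numeric(dat$NUMBERS), dat$TYPE, FUN = fill_mean)
dat
```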
dplyr
library(dplyr)
dat %>%
group_by(TYPE) %>%
mutate(NUMBERS = if_else(is.na(NUMBERS), mean(NUMBERS, na.rm = TRUE), as.numeric(NUMBERS))) %>%
ungroup()
# # A tibble: 10 x 3
# TYPE YEAR NUMBERS
# <chr> <int> <dbl>
# 1 A 2020 60
# 2 A 2019 79.3
# 3 A 2018 88
# 4 A 2017 79.3
# 5 A 2016 90
# 6 B 2020 160
# 7 B 2019 179.
# 8 B 2018 188
# 9 B 2017 179.
# 10 B 2016 190
I have a dataframe like this:
data <- data.frame(Time = rep(c("Jan 1999", "Feb 1999", "Mar 1999"), each = 3), Country = rep(c("Australia", "Brazil", "Canada"), 3), Group = rep(c("A", "B", "A"), 3), Intercept = NA)
and another data frame with coefficients from a regression, where A and B are the intercepts for the different groups:
coeffs <- data.frame(Time = c("Jan 1999", "Feb 1999", "Mar 1999"), A = c(1,2,3), B = c(3,2,1))
Now I want to put the intercepts from the coeffs data frame into the Intercept column of data. I did this the following way:
l <- length(unique(data[,"Country"]))
data[,"Intercept"] <- ifelse(data[,"Group"] == "A", rep(coeffs[,"A"], each = l), rep(coeffs[,"B"], each = l))
This seems to work well for 2 groups, but now I need to do the same thing for 7 groups, and I don't see how to generalize the approach above. I guess I could use a 7-level nested ifelse statement or a for loop, but there has to be a more elegant way.
Thanks for your help!
Get coeffs into long format and join it with data:
library(dplyr)
coeffs %>%
tidyr::pivot_longer(cols = -Time, names_to = 'Group',
values_to = 'Intercept') %>%
right_join(data, by = c('Time', 'Group'))
# A tibble: 9 x 4
# Time Group Intercept Country
# <chr> <chr> <dbl> <chr>
#1 Jan 1999 A 1 Australia
#2 Jan 1999 A 1 Canada
#3 Jan 1999 B 3 Brazil
#4 Feb 1999 A 2 Australia
#5 Feb 1999 A 2 Canada
#6 Feb 1999 B 2 Brazil
#7 Mar 1999 A 3 Australia
#8 Mar 1999 A 3 Canada
#9 Mar 1999 B 1 Brazil
I used this data frame for data:
data <- data.frame(Time = rep(c("Jan 1999", "Feb 1999", "Mar 1999"), each = 3),
Country = rep(c("Australia", "Brazil", "Canada"), 3),
Group = rep(c("A", "B", "A"), 3))
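If you prefer to stay in base R, one way to generalize to any number of groups (7 or more) without nested ifelse is matrix indexing: turn the group columns of coeffs into a numeric matrix and look up each row of data by (row of its Time, column of its Group). This is a sketch that assumes coeffs has one column per group, named exactly like the values in data$Group:

```r
data <- data.frame(Time    = rep(c("Jan 1999", "Feb 1999", "Mar 1999"), each = 3),
                   Country = rep(c("Australia", "Brazil", "Canada"), 3),
                   Group   = rep(c("A", "B", "A"), 3))
coeffs <- data.frame(Time = c("Jan 1999", "Feb 1999", "Mar 1999"),
                     A = c(1, 2, 3), B = c(3, 2, 1))

# Numeric matrix of intercepts: one row per Time, one column per group
coef_mat <- as.matrix(coeffs[, -1, drop = FALSE])

# Two-column index matrix: (row matching Time, column matching Group)
idx <- cbind(match(data$Time, coeffs$Time),
             match(data$Group, colnames(coef_mat)))

data$Intercept <- coef_mat[idx]
data
```

Adding more groups only means adding more columns to coeffs; the lookup code stays the same.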
I have data with random dates from 2008 to 2020 and their corresponding values:
Date Val
September 16, 2012 32
September 19, 2014 33
January 05, 2008 26
June 07, 2017 02
December 15, 2019 03
May 28, 2020 18
I want to fill in the missing dates from January 01, 2008 to March 31, 2020, with a corresponding value of 1.
I referred to some posts like Post1 and Post2, but I was not able to solve the problem based on them. I am a beginner in R.
I am looking for data like this:
Date Val
January 01, 2008 1
January 02, 2008 1
January 03, 2008 1
January 04, 2008 1
January 05, 2008 26
........
Use tidyr::complete:
library(dplyr)
df %>%
mutate(Date = as.Date(Date, "%B %d, %Y")) %>%
tidyr::complete(Date = seq(as.Date('2008-01-01'), as.Date('2020-03-31'),
by = 'day'), fill = list(Val = 1)) %>%
mutate(Date = format(Date, "%B %d, %Y"))
# A tibble: 4,475 x 2
# Date Val
# <chr> <dbl>
# 1 January 01, 2008 1
# 2 January 02, 2008 1
# 3 January 03, 2008 1
# 4 January 04, 2008 1
# 5 January 05, 2008 26
# 6 January 06, 2008 1
# 7 January 07, 2008 1
# 8 January 08, 2008 1
# 9 January 09, 2008 1
#10 January 10, 2008 1
# … with 4,465 more rows
data
df <- structure(list(Date = c("September 16, 2012", "September 19, 2014",
"January 05, 2008", "June 07, 2017", "December 15, 2019", "May 28, 2020"
), Val = c(32L, 33L, 26L, 2L, 3L, 18L)), class = "data.frame",
row.names = c(NA, -6L))
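For comparison, here is a base-R sketch of the same idea using merge() with all.x = TRUE. It assumes the month names parse with %B in your locale (English month names, as in the question). One design difference: all.x = TRUE keeps only the target date range, so the May 28, 2020 row (outside the range) is dropped, whereas tidyr::complete keeps it, which is why it reports 4,475 rows rather than 4,474. Use all = TRUE if you want to keep such rows.

```r
df <- data.frame(
  Date = c("September 16, 2012", "September 19, 2014", "January 05, 2008",
           "June 07, 2017", "December 15, 2019", "May 28, 2020"),
  Val  = c(32L, 33L, 26L, 2L, 3L, 18L)
)

# Parse "Month DD, YYYY" (assumes English month names for %B)
df$Date <- as.Date(df$Date, format = "%B %d, %Y")

# Every calendar day in the target range
all_days <- data.frame(Date = seq(as.Date("2008-01-01"), as.Date("2020-03-31"), by = "day"))

# Left join: keep every day in the range and bring in Val where it exists
out <- merge(all_days, df, by = "Date", all.x = TRUE)
out$Val[is.na(out$Val)] <- 1L

head(out)
```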
We can create a data frame with the desired date range, left-join our data frame onto it, and replace all NAs with 1:
library(tidyverse)
days_seq %>%
left_join(df) %>%
mutate(Val = if_else(is.na(Val), as.integer(1), Val))
Joining, by = "Date"
# A tibble: 4,474 x 2
Date Val
<date> <int>
1 2008-01-01 1
2 2008-01-02 1
3 2008-01-03 1
4 2008-01-04 1
5 2008-01-05 33
6 2008-01-06 1
7 2008-01-07 1
8 2008-01-08 1
9 2008-01-09 1
10 2008-01-10 1
# ... with 4,464 more rows
Data
days_seq <- tibble(Date = seq(as.Date("2008/01/01"), as.Date("2020/03/31"), "days"))
df <- tibble::tribble(
~Date, ~Val,
"2012/09/16", 32L,
"2012/09/19", 33L,
"2008/01/05", 33L
)
df$Date <- as.Date(df$Date)
I have a data set that contains a simple column consisting of dates, like this:
Dates
1 2012/04/10
2 2012/03/30
3 2012/03/24
4 2012/03/25
5 2012/04/10
6 2012/04/14
7 2012/04/21
My desired output is this:
Dates DateName
1 2012/04/10 April 2012
2 2015/03/30 March 2015
3 2011/03/24 March 2011
4 2016/12/25 December 2016
5 2014/06/10 June 2014
6 2014/05/14 May 2014
7 2018/07/21 July 2018
To do this I used the following code:
dt$Dates <- as.Date(dt$Dates)
dt$DateName <- format(dt$Dates,"%B %Y")
Whilst this works fine, my new column comes out as character class. I would like it to be a date class instead, because as a character column it cannot be sorted by calendar date; it sorts alphabetically.
Is there a way to class or re-class my new date format as some sort of date or calendar class?
(I'm not necessarily looking for a base-R solution).
(If possible, I would also highly prefer to keep my new format as is).
I have tried the following lines of code and more, but these only return errors.
dt$DateName <- format.Date(dt$Dates,"%B %Y")
dt$DateName <- format.POSIXlt(dt$Dates,"%B %Y")
dt$DateName <- format.difftime(dt$Dates,"%B %Y")
dt$DateName <- as.Date(dt$Dates, format ="%B %Y")
You can convert the dates to the yearmon class:
dt$month_year <- zoo::as.yearmon(dt$Dates, "%Y/%m/%d")
dt
# Dates month_year
#1 2012/04/10 Apr 2012
#2 2012/03/30 Mar 2012
#3 2012/03/24 Mar 2012
#4 2012/03/25 Mar 2012
#5 2012/04/10 Apr 2012
#6 2012/04/14 Apr 2012
#7 2012/04/21 Apr 2012
class(dt$month_year)
#[1] "yearmon"
You can then sort by this column:
dt[order(dt$month_year), ]
# Dates month_year
#2 2012/03/30 Mar 2012
#3 2012/03/24 Mar 2012
#4 2012/03/25 Mar 2012
#1 2012/04/10 Apr 2012
#5 2012/04/10 Apr 2012
#6 2012/04/14 Apr 2012
#7 2012/04/21 Apr 2012
data
dt <- structure(list(Dates = structure(c(4L, 3L, 1L, 2L, 4L, 5L, 6L
), .Label = c("2012/03/24", "2012/03/25", "2012/03/30", "2012/04/10",
"2012/04/14", "2012/04/21"), class = "factor")), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7"))
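If you would rather avoid the zoo dependency, a base-R alternative (a sketch, not from the answer above) is to keep a real Date set to the first day of each month for sorting, and produce the "%B %Y" text only for display. The column names month_start and DateName are illustrative:

```r
dt <- data.frame(Dates = c("2012/04/10", "2012/03/30", "2012/03/24", "2012/03/25",
                           "2012/04/10", "2012/04/14", "2012/04/21"))

d <- as.Date(dt$Dates, format = "%Y/%m/%d")

# Date class: first day of the month, so it sorts chronologically
dt$month_start <- as.Date(format(d, "%Y-%m-01"))

# Character label for display only (month names follow your locale)
dt$DateName <- format(dt$month_start, "%B %Y")

dt[order(dt$month_start), ]
```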
This is part of the dataframe I am working on. The first column represents the year, the second the month, and the third one the number of observations for that month of that year.
2005 07 2
2005 10 4
2005 12 2
2006 01 4
2006 02 1
2006 07 2
2006 08 1
2006 10 3
I have observations from 2000 to 2018. I would like to run a kernel regression on this data, so I need to create a continuous integer index from a date-class vector. For instance, Jan 2000 would be 1, Jan 2001 would be 13, Jan 2002 would be 25, and so on. With that I will be able to run the kernel regression. Later on, I need to translate that back (1 would be Jan 2000, 2 would be Feb 2000, and so on) to plot my model.
Just use a little algebra:
df$cont <- (df$year - 2000L) * 12L + df$month
You can go backward with modulus and integer division. Subtract 1 first so that December (a multiple of 12) is not rolled into the following year:
df$year <- (df$cont - 1L) %/% 12L + 2000L  # 1..12 -> 2000, 13..24 -> 2001, ...
df$month <- (df$cont - 1L) %% 12L + 1L     # always in 1..12
Here, %% is the modulus operator and %/% is the integer division operator. See ?"%%" for an explanation of these and other arithmetic operators.
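To check the round trip, here is a sketch that wraps both directions in two small helpers (the names ym_to_index and index_to_ym are hypothetical) with 2000 as the base year, and verifies that every month from Jan 2000 to Dec 2018 maps to 1..228 and back:

```r
base_year <- 2000L

# Jan 2000 -> 1, Dec 2000 -> 12, Jan 2001 -> 13, ...
ym_to_index <- function(year, month) (year - base_year) * 12L + month

# Inverse: subtract 1 before %/% and %% so December is not rolled into the next year
index_to_ym <- function(idx) {
  data.frame(year  = (idx - 1L) %/% 12L + base_year,
             month = (idx - 1L) %% 12L + 1L)
}

# All months 2000-2018, month varying fastest (i.e. chronological order)
grid <- expand.grid(month = 1:12, year = 2000:2018)
idx  <- ym_to_index(grid$year, grid$month)

stopifnot(identical(idx, 1:228))
back <- index_to_ym(idx)
stopifnot(identical(back$year, grid$year), identical(back$month, grid$month))
```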
What you can do is the following. First, create a dates data frame with expand.grid so that it contains every year and month from 2000 01 to 2018 12. Next, put it in the correct order, and finally add an order column so that 2000 01 is 1 and 2018 12 is 228. If you merge this with your original table, you get the result below. You can then remove the columns you don't need, and because you have a dates table, you can recover the year and month columns from the order column.
dates <- expand.grid(year = seq(2000, 2018), month = seq(1, 12))
dates <- dates[order(dates$year, dates$month), ]
dates$order <- seq_along(dates$year)
merge(df, dates, by.x = c("year", "month"), by.y = c("year", "month"))
year month obs order
1 2005 10 4 70
2 2005 12 2 72
3 2005 7 2 67
4 2006 1 4 73
5 2006 10 3 82
6 2006 2 1 74
7 2006 7 2 79
8 2006 8 1 80
data:
df <- structure(list(year = c(2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2006L, 2006L),
month = c(7L, 10L, 12L, 1L, 2L, 7L, 8L, 10L),
obs = c(2L, 4L, 2L, 4L, 1L, 2L, 1L, 3L)),
class = "data.frame",
row.names = c(NA, -8L))
An option is to use the yearmon class from the zoo package and compute the number of months since Jan 2000 as the difference between two yearmon values.
library(zoo)
# +1 is added to the difference so that Jan 2000 is treated as 1; round() guards against floating-point error
df$slNum <- round((as.yearmon(sprintf("%d%02d", df$year, df$month), "%Y%m") - as.yearmon("200001", "%Y%m")) * 12 + 1)
# year month obs slNum
# 1 2005 7 2 67
# 2 2005 10 4 70
# 3 2005 12 2 72
# 4 2006 1 4 73
# 5 2006 2 1 74
# 6 2006 7 2 79
# 7 2006 8 1 80
# 8 2006 10 3 82
Data:
df <- read.table(text =
"year month obs
2005 07 2
2005 10 4
2005 12 2
2006 01 4
2006 02 1
2006 07 2
2006 08 1
2006 10 3",
header = TRUE, stringsAsFactors = FALSE)