How to calculate the number of months from the initial date for each individual - r

This is a representation of my dataset
ID<-c(rep(1,10),rep(2,8))
year<-c(2007,2007,2007,2008,2008,2009,2010,2009,2010,2011,
2008,2008,2009,2010,2009,2010,2011,2011)
month<-c(2,7,12,4,11,6,11,1,9,4,3,6,7,4,9,11,2,8)
mydata<-data.frame(ID,year,month)
I want to calculate, for each individual, the number of months elapsed since that individual's first date. I have two variables to work with: year and month.
First I sort by year and month within each ID:
library(dplyr)
mydata2 <- mydata %>% group_by(ID, year) %>% arrange(year, month, .by_group = TRUE)
Then I build a date variable, assuming every date falls on day 01 of its month:
mydata2$date <- paste("01", mydata2$month, mydata2$year, sep = "-")
Then I use lubridate to convert this variable to Date format:
library(lubridate)
mydata2$date <- dmy(mydata2$date)
After this I don't know how to proceed to get a dataset like the one below (preferably using dplyr):
ID year month date dif_from_init
1 1 2007 2 01-2-2007 0
2 1 2007 7 01-7-2007 5
3 1 2007 12 01-12-2007 10
4 1 2008 4 01-4-2008 14
5 1 2008 11 01-11-2008 21
6 1 2009 1 01-1-2009 23
7 1 2009 6 01-6-2009 28
8 1 2010 9 01-9-2010 43
9 1 2010 11 01-11-2010 45
10 1 2011 4 01-4-2011 50
11 2 2008 3 01-3-2008 0
12 2 2008 6 01-6-2008 3
13 2 2009 7 01-7-2009 16
14 2 2009 9 01-9-2009 18
15 2 2010 4 01-4-2010 25
16 2 2010 11 01-11-2010 32
17 2 2011 2 01-2-2011 35
18 2 2011 8 01-8-2011 41

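One way could be:
library(dplyr)

mydata %>%
  group_by(ID) %>%
  # build a Date on the first of each month, then convert the day difference
  # from the group's first row to months (approximation: 365/12 days per month)
  mutate(date = as.Date(sprintf('%d-%d-01', year, month)),
         diff = as.numeric(round((date - date[1]) / 365 * 12)))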
# A tibble: 18 x 5
# Groups: ID [2]
ID year month date diff
<dbl> <dbl> <dbl> <date> <dbl>
1 1 2007 2 2007-02-01 0
2 1 2007 7 2007-07-01 5
3 1 2007 12 2007-12-01 10
4 1 2008 4 2008-04-01 14
5 1 2008 11 2008-11-01 21
6 1 2009 6 2009-06-01 28
7 1 2010 11 2010-11-01 45
8 1 2009 1 2009-01-01 23
9 1 2010 9 2010-09-01 43
10 1 2011 4 2011-04-01 50
11 2 2008 3 2008-03-01 0
12 2 2008 6 2008-06-01 3
13 2 2009 7 2009-07-01 16
14 2 2010 4 2010-04-01 25
15 2 2009 9 2009-09-01 18
16 2 2010 11 2010-11-01 32
17 2 2011 2 2011-02-01 35
18 2 2011 8 2011-08-01 41
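Because every date sits on the first of a month, the month count can also be computed exactly from year and month, without going through day differences. The following is a minimal base R sketch under that assumption; `month_index` is an illustrative name, not something from the question. It sorts the rows first and measures from each individual's earliest month rather than from the first row.

```r
ID <- c(rep(1, 10), rep(2, 8))
year <- c(2007, 2007, 2007, 2008, 2008, 2009, 2010, 2009, 2010, 2011,
          2008, 2008, 2009, 2010, 2009, 2010, 2011, 2011)
month <- c(2, 7, 12, 4, 11, 6, 11, 1, 9, 4, 3, 6, 7, 4, 9, 11, 2, 8)
mydata <- data.frame(ID, year, month)

# Put each individual's rows in chronological order
mydata <- mydata[order(mydata$ID, mydata$year, mydata$month), ]
rownames(mydata) <- NULL

# Express every row as a running month number (illustrative helper variable)
month_index <- mydata$year * 12 + mydata$month

# Subtract each individual's earliest month number
mydata$dif_from_init <- month_index - ave(month_index, mydata$ID, FUN = min)
```

The same expression, `year * 12 + month` minus its group minimum, should also work inside `mutate()` after `group_by(ID)` if a dplyr pipeline is preferred.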

Related

Repeating annual values multiple times to form a monthly dataframe

I have an annual dataset as below:
year <- c(2016,2017,2018)
xxx <- c(1,2,3)
yyy <- c(4,5,6)
df <- data.frame(year,xxx,yyy)
print(df)
year xxx yyy
1 2016 1 4
2 2017 2 5
3 2018 3 6
Where the values in column xxx and yyy correspond to values for that year.
I would like to expand this data frame (or create a new one) so that it keeps the same column names but repeats each row 12 times, once per month of that year, with the year value repeated in the first column as well.
As mocked up by the code below:
year <- rep(2016:2018,each=12)
xxx <- rep(1:3,each=12)
yyy <- rep(4:6,each=12)
df2 <- data.frame(year,xxx,yyy)
print(df2)
year xxx yyy
1 2016 1 4
2 2016 1 4
3 2016 1 4
4 2016 1 4
5 2016 1 4
6 2016 1 4
7 2016 1 4
8 2016 1 4
9 2016 1 4
10 2016 1 4
11 2016 1 4
12 2016 1 4
13 2017 2 5
14 2017 2 5
15 2017 2 5
16 2017 2 5
17 2017 2 5
18 2017 2 5
19 2017 2 5
20 2017 2 5
21 2017 2 5
22 2017 2 5
23 2017 2 5
24 2017 2 5
25 2018 3 6
26 2018 3 6
27 2018 3 6
28 2018 3 6
29 2018 3 6
30 2018 3 6
31 2018 3 6
32 2018 3 6
33 2018 3 6
34 2018 3 6
35 2018 3 6
36 2018 3 6
Any help would be greatly appreciated!
I'm new to R and I can see how I would do this with a loop statement but was wondering if there was an easier solution.
Convert df to a matrix, take the kronecker product with a vector of 12 ones and then convert back to a data.frame. The as.data.frame can be omitted if a matrix result is ok.
as.data.frame(as.matrix(df) %x% rep(1, 12))
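An alternative that keeps the column names is to repeat the row indices instead of multiplying matrices. The sketch below assumes the `df` from the question; the `month` column is an optional extra that the question did not ask for. Note that the Kronecker product does not carry dimnames over by default, so the matrix route may need `names(result) <- names(df)` afterwards.

```r
df <- data.frame(year = c(2016, 2017, 2018),
                 xxx = c(1, 2, 3),
                 yyy = c(4, 5, 6))

# Repeat each row index 12 times, then index the data frame with the result
df2 <- df[rep(seq_len(nrow(df)), each = 12), ]
rownames(df2) <- NULL  # replace the 1, 1.1, 1.2, ... row names with 1..36

# Optional (not in the question): label the month within each year
df2$month <- rep(1:12, times = nrow(df))
```

Row indexing works for columns of any type, whereas converting to a matrix forces all columns to a common type.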

How to create a new column using looping and rbind in r?

I have data similar to this. I would like to create three columns (date1, date2, date3) using a loop and rbind, because I am required to use only that method.
(All I was told is to make a loop, subset the data, sort it, make a new data frame, and then rbind it to make a new column.)
year month day id
2011 1 5 3101
2011 1 14 3101
2011 2 3 3101
2011 2 4 3101
2012 1 27 3153
2012 2 20 3153
2012 2 22 3153
2012 3 1 3153
2013 1 31 3103
2013 2 1 3103
2013 2 4 3103
2013 3 4 3103
2013 3 6 3103
The result I expect is:
date1: number of days from 2011, January 1st, start again from 1 in a new year.
date2: number of days of an id working in a year, start again from 1 in a new year.
date3: number of days open within a year, start again from 1 in a new year.
(all of the dates are in ascending order)
year month day id date1 date2 date3
2011 1 5 3101 5 1 1
2011 1 14 3101 14 2 2
2011 2 3 3101 34 3 3
2011 2 4 3101 35 4 4
2012 1 27 3153 27 1 1
2012 2 20 3153 51 2 2
2012 2 22 3153 53 3 3
2012 3 1 3153 60 4 4
2013 1 31 3103 31 1 1
2013 2 1 3103 32 2 2
2013 2 4 3103 35 3 3
2013 3 4 3103 94 4 4
2013 3 6 3103 96 5 5
Please help! Thank you.
You can do it without an unnecessary for loop or subset. Here is one way:
df <- read.table(text =" year month day id
2011 1 5 3101
2011 1 14 3101
2011 2 3 3101
2011 2 4 3101
2012 1 27 3153
2012 2 20 3153
2012 2 22 3153
2012 3 1 3153
2013 1 31 3103
2013 2 1 3103
2013 2 4 3103
2013 3 4 3103
2013 3 6 3103",header = T)
library(lubridate)
df$date1 <- yday(mdy(paste0(df$month,"-",df$day,"-",df$year)))
df$date2 <- ave(df$year, df$id, FUN = seq_along)
df$date3 <- ave(df$year, df$year, FUN = seq_along)
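If the loop-and-rbind structure is mandatory, the same columns can be built one year at a time. This is only a sketch under assumptions: date1 is read as the day of the year, date2 as a running count per id within the year, and date3 as a running count of rows within the year. The names `pieces` and `part` are illustrative.

```r
df <- read.table(text = "year month day id
2011 1 5 3101
2011 1 14 3101
2011 2 3 3101
2011 2 4 3101
2012 1 27 3153
2012 2 20 3153
2012 2 22 3153
2012 3 1 3153
2013 1 31 3103
2013 2 1 3103
2013 2 4 3103
2013 3 4 3103
2013 3 6 3103", header = TRUE)

pieces <- list()
for (yr in sort(unique(df$year))) {
  # Subset one year and sort it chronologically
  part <- df[df$year == yr, ]
  part <- part[order(part$month, part$day), ]

  # Day of the year, taken from a real Date so leap years are handled
  dates <- as.Date(paste(part$year, part$month, part$day, sep = "-"))
  part$date1 <- as.integer(format(dates, "%j"))

  # Running counts, restarting with every year
  part$date2 <- ave(seq_len(nrow(part)), part$id, FUN = seq_along)
  part$date3 <- seq_len(nrow(part))

  pieces[[length(pieces) + 1]] <- part
}

# Stack the yearly pieces back into one data frame
result <- do.call(rbind, pieces)
rownames(result) <- NULL
```

Collecting the pieces in a list and calling rbind once at the end avoids growing a data frame inside the loop.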

How to lump sum the number of days of a data of several year?

I have data similar to this. I would like to compute a running day count (I'm not sure whether "lump sum" is the right term) and store it in a new column "date", so that the new column counts the days across the three years of data in ascending order.
year month day
2011 1 5
2011 2 14
2011 8 21
2012 2 24
2012 3 3
2012 4 4
2012 5 6
2013 2 14
2013 5 17
2013 6 24
I wrote this code, but the result is wrong and the code is too long. It doesn't count February correctly, since February has only 28 days. Are there any shorter ways?
cday <- function(data,syear=2011,smonth=1,sday=1){
year <- data[1]
month <- data[2]
day <- data[3]
cmonth <- c(0,31,28,31,30,31,30,31,31,30,31,30,31)
date <- (year-syear)*365+sum(cmonth[1:month])+day
for(yr in c(syear:year)){
if(yr==year){
if(yr%%4==0&&month>2){date<-date+1}
}else{
if(yr%%4==0){date<-date+1}
}
}
return(date)
}
op10$day.no <- apply(op10[,c("year","month","day")],1,cday)
I expect the result like this:
year month day date
2011 1 5 5
2011 1 14 14
2011 1 21 21
2011 1 24 24
2011 2 3 31
2011 2 4 32
2011 2 6 34
2011 2 14 42
2011 2 17 45
2011 2 24 52
Thank you for helping!!
Use Date classes. Dates and times are complicated, look for tools to do this for you rather than writing your own. Pick whichever of these you want:
df$date = with(df, as.Date(paste(year, month, day, sep = "-")))
df$julian_day = as.integer(format(df$date, "%j"))
df$days_since_2010 = as.integer(df$date - as.Date("2010-12-31"))
df
# year month day date julian_day days_since_2010
# 1 2011 1 5 2011-01-05 5 5
# 2 2011 2 14 2011-02-14 45 45
# 3 2011 8 21 2011-08-21 233 233
# 4 2012 2 24 2012-02-24 55 420
# 5 2012 3 3 2012-03-03 63 428
# 6 2012 4 4 2012-04-04 95 460
# 7 2012 5 6 2012-05-06 127 492
# 8 2013 2 14 2013-02-14 45 776
# 9 2013 5 17 2013-05-17 137 868
# 10 2013 6 24 2013-06-24 175 906
# using this data
df = read.table(text = "year month day
2011 1 5
2011 2 14
2011 8 21
2012 2 24
2012 3 3
2012 4 4
2012 5 6
2013 2 14
2013 5 17
2013 6 24", header = TRUE)
This is all using base R. If you handle dates and times frequently, you may also want to look at the lubridate package.
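To show how the Date approach replaces the hand-written month table, here is a small sketch of a helper. The name `days_since` and its `origin` default are assumptions made for illustration; the default mirrors the 2010-12-31 reference date used above, so that 2011-01-01 counts as day 1.

```r
# Hypothetical helper: running day count from a reference date
days_since <- function(year, month, day, origin = "2010-12-31") {
  dates <- as.Date(paste(year, month, day, sep = "-"))
  as.integer(dates - as.Date(origin))
}

df <- data.frame(year  = c(2011, 2012, 2012, 2013),
                 month = c(1, 2, 3, 6),
                 day   = c(5, 28, 1, 24))
df$date <- days_since(df$year, df$month, df$day)

# Leap years need no special casing: 2012-02-28 to 2012-03-01 spans two days
leap_gap <- days_since(2012, 3, 1) - days_since(2012, 2, 28)
```

Because the subtraction happens on Date objects, month lengths and leap days are handled by R rather than by a lookup vector.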

Panel data in long format

I have two data frames:
d1:
Id group occu D Year
12 1 1 12 2007
13 4 2 67 2007
14 6 3 34 2007
15 7 1 88 2007
16 2 2 72 2007
17 1 1 43 2007
18 4 1 66 2007
and d2:
Id group occu D Year
12 1 1 34 2010
13 4 2 100 2010
14 6 3 76 2010
15 7 1 99 2010
16 2 2 102 2010
17 1 1 55 2010
18 4 1 32 2010
The variables "group" and "occu" are factors. I want to combine the two data frames into panel data for the years 2007 and 2010, in long format, in R.
How can I do this?
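One possible approach, assuming both data frames have exactly the same columns as shown: stack them with rbind, so that each Id contributes one row per Year, which is already the long layout. This is a sketch, not a definitive answer; `panel` is an illustrative name.

```r
d1 <- read.table(text = "Id group occu D Year
12 1 1 12 2007
13 4 2 67 2007
14 6 3 34 2007
15 7 1 88 2007
16 2 2 72 2007
17 1 1 43 2007
18 4 1 66 2007", header = TRUE)

d2 <- read.table(text = "Id group occu D Year
12 1 1 34 2010
13 4 2 100 2010
14 6 3 76 2010
15 7 1 99 2010
16 2 2 102 2010
17 1 1 55 2010
18 4 1 32 2010", header = TRUE)

# Stack the two waves; one row per Id and Year
panel <- rbind(d1, d2)

# Treat group and occu as factors, as described in the question
panel$group <- factor(panel$group)
panel$occu <- factor(panel$occu)

# Order by individual, then by year
panel <- panel[order(panel$Id, panel$Year), ]
rownames(panel) <- NULL
```

If the columns were already factors with different level sets in d1 and d2, rbind would combine the levels, so it may be worth checking `levels(panel$group)` afterwards.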

Recoding two variables into a new variable [closed]

I'm trying to create a fiscal year variable called 'period', which will run from September through August for six years. My data frame 'dat' is structured as follows:
'data.frame': 52966 obs. of 4 variables:
$ userid : int 96 96 96 101 101 101 101 101 101 101 ...
$ comment.year : int 2008 2009 2009 2008 2008 2008 2008 2008 2008 2009 ...
$ comment.month: int 7 3 8 7 8 9 10 11 12 1 ...
$ num.comments : int 1 1 1 33 51 16 27 29 40 39 ...
I get this error message: Error: unexpected '=' in "dat$period[comment.year=2008 & comment.month="
when I run the following code. I've experimented with double equal signs and putting the month and year integers in quotes, but no success. I'm also wondering if there's a simpler way to do the recode. Since I'm dealing with 6 years, my approach takes 72 lines.
dat$period[comment.year=2008 & comment.month=9]<-"1"
dat$period[comment.year=2008 & comment.month=10]<-"1"
dat$period[comment.year=2008 & comment.month=11]<-"1"
dat$period[comment.year=2008 & comment.month=12]<-"1"
dat$period[comment.year=2009 & comment.month=1]<-"1"
dat$period[comment.year=2009 & comment.month=2]<-"1"
dat$period[comment.year=2009 & comment.month=3]<-"1"
dat$period[comment.year=2009 & comment.month=4]<-"1"
dat$period[comment.year=2009 & comment.month=5]<-"1"
dat$period[comment.year=2009 & comment.month=6]<-"1"
dat$period[comment.year=2009 & comment.month=7]<-"1"
dat$period[comment.year=2009 & comment.month=8]<-"1"
dat$period[comment.year=2009 & comment.month=9]<-"2"
dat$period[comment.year=2009 & comment.month=10]<-"2"
dat$period[comment.year=2009 & comment.month=11]<-"2"
dat$period[comment.year=2009 & comment.month=12]<-"2"
Rather than doing a bunch of partial assignments, why not just calculate the difference in years, with a bonus bump for months >= 9?
#sample data
dat<-data.frame(
comment.year=rep(2009:2011, each=12),
comment.month=rep(1:12, 3)
)[-(1:8), ]
#assign new period
dat$period<- dat$comment.year-min(dat$comment.year) + ifelse(dat$comment.month>=9,1,0)
which gives you
comment.year comment.month period
9 2009 9 1
10 2009 10 1
11 2009 11 1
12 2009 12 1
13 2010 1 1
14 2010 2 1
15 2010 3 1
16 2010 4 1
17 2010 5 1
18 2010 6 1
19 2010 7 1
20 2010 8 1
21 2010 9 2
22 2010 10 2
23 2010 11 2
24 2010 12 2
25 2011 1 2
26 2011 2 2
27 2011 3 2
28 2011 4 2
29 2011 5 2
30 2011 6 2
31 2011 7 2
32 2011 8 2
33 2011 9 3
34 2011 10 3
35 2011 11 3
36 2011 12 3
If you want to make sure to start at a certain year, you can use 2009 rather than min(dat$comment.year).
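The original error most likely comes from using = (assignment) where == (comparison) is needed; inside the brackets the columns would also have to be written as dat$comment.year and dat$comment.month. The year arithmetic avoids both problems. Here is a sketch that wraps it in a function with a fixed starting point; `fiscal_period`, `start_year`, and `start_month` are hypothetical names, and the defaults assume the question's September 2008 start.

```r
# Hypothetical helper: fiscal period number, where period 1 begins
# in start_month of start_year
fiscal_period <- function(year, month, start_year = 2008, start_month = 9) {
  year - start_year + ifelse(month >= start_month, 1, 0)
}

dat <- data.frame(comment.year  = c(2008, 2008, 2009, 2009, 2009, 2010),
                  comment.month = c(7, 9, 1, 8, 9, 8))

# One vectorised assignment instead of 72 partial ones
dat$period <- fiscal_period(dat$comment.year, dat$comment.month)
```

Rows that fall before the first fiscal year, such as July 2008 here, come out as period 0 and can be filtered or recoded as needed.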
Using MrFlick's sample data:
dat$period = rep(1:3, each=12)[1:28]
dat
comment.year comment.month period
9 2009 9 1
10 2009 10 1
11 2009 11 1
12 2009 12 1
13 2010 1 1
14 2010 2 1
15 2010 3 1
16 2010 4 1
17 2010 5 1
18 2010 6 1
19 2010 7 1
20 2010 8 1
21 2010 9 2
22 2010 10 2
23 2010 11 2
24 2010 12 2
25 2011 1 2
26 2011 2 2
27 2011 3 2
28 2011 4 2
29 2011 5 2
30 2011 6 2
31 2011 7 2
32 2011 8 2
33 2011 9 3
34 2011 10 3
35 2011 11 3
36 2011 12 3
Can easily be extended to your data.
You could also try the following (using @MrFlick's data):
set.seed(42)
dat1 <- dat[sample(1:nrow(dat)),]
dat<- within(dat, {period<- as.numeric(factor(comment.year))
period[comment.month <9] <- period[comment.month <9] -1})
dat
# comment.year comment.month period
#9 2009 9 1
#10 2009 10 1
#11 2009 11 1
#12 2009 12 1
#13 2010 1 1
#14 2010 2 1
#15 2010 3 1
#16 2010 4 1
#17 2010 5 1
#18 2010 6 1
#19 2010 7 1
#20 2010 8 1
#21 2010 9 2
#22 2010 10 2
#23 2010 11 2
#24 2010 12 2
#25 2011 1 2
#26 2011 2 2
#27 2011 3 2
#28 2011 4 2
#29 2011 5 2
#30 2011 6 2
#31 2011 7 2
#32 2011 8 2
#33 2011 9 3
#34 2011 10 3
#35 2011 11 3
#36 2011 12 3
Using the unordered dat1:
within(dat1, {period<- as.numeric(factor(comment.year)); period[comment.month <9] <- period[comment.month <9] -1})[,3]
#[1] 3 3 1 2 2 1 2 1 2 2 1 2 2 1 1 2 2 1 1 1 3 1 2 1 2 1 2 3
Cross-checking the results with @MrFlick's method:
dat1$comment.year-min(dat1$comment.year) + ifelse(dat1$comment.month>=9,1,0)
# [1] 3 3 1 2 2 1 2 1 2 2 1 2 2 1 1 2 2 1 1 1 3 1 2 1 2 1 2 3

Resources