R - data frame manipulation [duplicate]

This question already has answers here:
Reshape multiple value columns to wide format
(5 answers)
Closed 5 years ago.
Suppose I have this data frame:
df <- data.frame(ID = c("id1", "id1", "id1", "id2", "id2", "id3", "id3", "id3"),
Code = c("A", "B", "C", "A", "B", "A", "C", "D"),
Count = c(34,65,21,3,8,12,15,16), Value = c(3,1,8,2,3,3,5,8))
that looks like this:
df
ID Code Count Value
1 id1 A 34 3
2 id1 B 65 1
3 id1 C 21 8
4 id2 A 3 2
5 id2 B 8 3
6 id3 A 12 3
7 id3 C 15 5
8 id3 D 16 8
I would like to obtain this result data frame:
result <- data.frame(Code = c("A", "B", "C", "D"),
id1_count = c(34,65,21,NA), id1_value = c(3,1,8,NA),
id2_count = c(3, 8, NA, NA), id2_value = c(2, 3, NA, NA),
id3_count = c(12,NA,15,16), id3_value = c(3,NA,5,8))
that looks like this:
> result
Code id1_count id1_value id2_count id2_value id3_count id3_value
1 A 34 3 3 2 12 3
2 B 65 1 8 3 NA NA
3 C 21 8 NA NA 15 5
4 D NA NA NA NA 16 8
Is there a one-liner in base R that can do that? I can get the result I need, but not in an idiomatic R way (I use loops and so on). Any help is appreciated. Thank you.

You can try dcast from the development version of data.table (v1.9.5), which can take multiple value.var columns.
library(data.table)
dcast(setDT(df), Code~ID, value.var=c('Count', 'Value'))
# Code Count_id1 Count_id2 Count_id3 Value_id1 Value_id2 Value_id3
#1: A 34 3 12 3 2 3
#2: B 65 8 NA 1 3 NA
#3: C 21 NA 15 8 NA 5
#4: D NA NA 16 NA NA 8
Or using reshape from base R
reshape(df, idvar='Code', timevar='ID', direction='wide')
# Code Count.id1 Value.id1 Count.id2 Value.id2 Count.id3 Value.id3
#1 A 34 3 3 2 12 3
#2 B 65 1 8 3 NA NA
#3 C 21 8 NA NA 15 5
#8 D NA NA NA NA 16 8
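If you need the exact column names from the question (id1_count, id1_value, ...), the base R reshape() output can be renamed afterwards. This is a minimal sketch, assuming reshape()'s default "<measure>.<id>" naming shown in the output above. The helper names wide, is_measure and parts are illustrative only:

```r
df <- data.frame(ID = c("id1", "id1", "id1", "id2", "id2", "id3", "id3", "id3"),
                 Code = c("A", "B", "C", "A", "B", "A", "C", "D"),
                 Count = c(34, 65, 21, 3, 8, 12, 15, 16),
                 Value = c(3, 1, 8, 2, 3, 3, 5, 8),
                 stringsAsFactors = FALSE)

# long -> wide: one row per Code, one Count/Value column pair per ID
wide <- reshape(df, idvar = "Code", timevar = "ID", direction = "wide")

# reshape() names the new columns "<measure>.<id>" (e.g. "Count.id1");
# turn them into "<id>_<measure in lower case>" (e.g. "id1_count")
is_measure <- names(wide) != "Code"
parts <- strsplit(names(wide)[is_measure], ".", fixed = TRUE)
names(wide)[is_measure] <- vapply(parts,
                                  function(p) paste0(p[2], "_", tolower(p[1])),
                                  character(1))
rownames(wide) <- NULL  # drop the row names inherited from df (1, 2, 3, 8)
wide
```

The column order already matches the requested result, because reshape() emits the Count/Value pair for each ID in turn.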

You could also try:
library(tidyr)
library(dplyr)
df %>%
gather(key, value, -(ID:Code)) %>%
unite(id_key, ID, key) %>%
spread(id_key, value)
Which gives:
# Code id1_Count id1_Value id2_Count id2_Value id3_Count id3_Value
#1 A 34 3 3 2 12 3
#2 B 65 1 8 3 NA NA
#3 C 21 8 NA NA 15 5
#4 D NA NA NA NA 16 8

Related

Convert data from wide format to long format with multiple measure columns [duplicate]

This question already has answers here:
wide to long multiple measures each time
(5 answers)
Closed 1 year ago.
I want to do this but the exact opposite. So say my dataset looks like this:
ID X_1990 X_2000 X_2010 Y_1990 Y_2000 Y_2010
A 1 4 7 10 13 16
B 2 5 8 11 14 17
C 3 6 9 12 15 18
but with many more measure variables (e.g. also Z_1990, etc.). How can I make the year a variable while keeping the different measures as separate columns, like this:
ID Year X Y
A 1990 1 10
A 2000 4 13
A 2010 7 16
B 1990 2 11
B 2000 5 14
B 2010 8 17
C 1990 3 12
C 2000 6 15
C 2010 9 18
You can use pivot_longer with the names_sep argument.
tidyr::pivot_longer(df, cols = -ID, names_to = c('.value', 'Year'), names_sep = '_')
# ID Year X Y
# <chr> <chr> <int> <int>
#1 A 1990 1 10
#2 A 2000 4 13
#3 A 2010 7 16
#4 B 1990 2 11
#5 B 2000 5 14
#6 B 2010 8 17
#7 C 1990 3 12
#8 C 2000 6 15
#9 C 2010 9 18
data
It is easier to help if you provide data in a reproducible format
df <- structure(list(ID = c("A", "B", "C"), X_1990 = 1:3, X_2000 = 4:6,
X_2010 = 7:9, Y_1990 = 10:12, Y_2000 = 13:15, Y_2010 = 16:18),
row.names = c(NA, -3L), class = "data.frame")
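Base R's reshape() can do the same without extra packages. Given varying columns named "<measure>_<year>" and sep = "_", it infers the measure names (X, Y, ...) and the times from the column names. This sketch assumes every column except ID follows that naming pattern. The variable name long is illustrative:

```r
df <- data.frame(ID = c("A", "B", "C"), X_1990 = 1:3, X_2000 = 4:6,
                 X_2010 = 7:9, Y_1990 = 10:12, Y_2000 = 13:15, Y_2010 = 16:18,
                 stringsAsFactors = FALSE)

# every column except ID is a "<measure>_<year>" column
long <- reshape(df, direction = "long", idvar = "ID", timevar = "Year",
                varying = setdiff(names(df), "ID"), sep = "_")

# reshape() stacks one year after another; reorder by ID, then Year
long <- long[order(long$ID, long$Year), ]
rownames(long) <- NULL
long
```

Adding Z_1990, Z_2000, ... columns requires no code change, because the measure names come from the column names.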

Joining dataframes with different dimensions and filling the gaps

I want to join two data frames of different dimensions. Here are the example datasets:
Main dataset
# Main data
id <- c(rep(1, 3), rep(3, 3), rep(10, 1))
time <- c(201601, 201602, 201603, 201601, 201602, 201603, 201601)
data1 <- c(100, 150, 160, 111, 120, 130, 150)
data2 <- c(5, 6, 9, 3, 2, 1, 0)
dataf1 <- data.frame(id, time, data1, data2)
Dataframe to be joined with the main dataset
# Additional data
id <- c(3, 10, 2)
time <- c(rep(201604, 3))
data2 <- c(20, 30, 11)
dataf2 <- data.frame(id, time, data2)
I want to join these two data frames, dataf1 and dataf2. I have tried dplyr::full_join(dataf1, dataf2, by = "id"), but it does not give what I want. The final output should also include the missing timestamps.
Is there any way I can achieve this?
Here is a data.table go at your question
library(data.table)
#create data.tables out of your data.frames
setDT(dataf1)
setDT(dataf2)
#row-bind all your data together
alldata <- rbindlist( list( dataf1, dataf2 ), use.names = TRUE, fill = TRUE )
#get all unique id-time combinations out of your data
DT <- CJ( alldata$id, alldata$time, unique = TRUE)
setnames(DT, names(DT), c("id", "time"))
#join your data to all unique combinataions of id-time
ans <- DT[ alldata, `:=`( data1 = i.data1, data2 = i.data2), on = .(id, time)]
output
# id time data1 data2
# 1: 1 201601 100 5
# 2: 1 201602 150 6
# 3: 1 201603 160 9
# 4: 1 201604 NA NA
# 5: 2 201601 NA NA
# 6: 2 201602 NA NA
# 7: 2 201603 NA NA
# 8: 2 201604 NA 11
# 9: 3 201601 111 3
# 10: 3 201602 120 2
# 11: 3 201603 130 1
# 12: 3 201604 NA 20
# 13:10 201601 150 0
# 14:10 201602 NA NA
# 15:10 201603 NA NA
# 16:10 201604 NA 30
As you can see, it (almost) matches your desired output.
I was confused about why you want data1 = 30 for id = 10 and time = 201604, rather than data1 = NA and data2 = 30.
If that is really what you want, you can move data2 into data1 wherever data1 is missing: ans[is.na(data1) & !is.na(data2), `:=`(data1 = data2, data2 = NA_real_)]
Here is one way using tidyr::complete with dplyr. After the full_join, we convert the time column to a Date object. For every id, we complete the monthly sequence from the minimum date to '2016-04-01' and remove rows in which every column except id is NA.
library(dplyr)
full_join(dataf1, dataf2, by = "id") %>%
select(-time.y, -data2.y) %>%
rename_all(~names(dataf1)) %>%
mutate(time1 = as.Date(paste0(time, "01"), "%Y%m%d")) %>%
tidyr::complete(id, time1 = seq(min(time1, na.rm = TRUE),
as.Date('2016-04-01'), by = "1 month")) %>%
mutate(time = format(time1, "%Y%m")) %>%
filter_at(vars(-id), any_vars(!is.na(.))) %>%
select(-time1)
# id time data1 data2
# <dbl> <chr> <dbl> <dbl>
# 1 1 201601 100 5
# 2 1 201602 150 6
# 3 1 201603 160 9
# 4 1 201604 NA NA
# 5 2 201601 NA NA
# 6 2 201602 NA NA
# 7 2 201603 NA NA
# 8 2 201604 NA NA
# 9 3 201601 111 3
#10 3 201602 120 2
#11 3 201603 130 1
#12 3 201604 NA NA
#13 10 201601 150 0
#14 10 201602 NA NA
#15 10 201603 NA NA
#16 10 201604 NA NA
This matches your exact final output:
library(data.table)
setnames(dataf2, "data2", "data1") # Warning: This will modify the original dataf2
rbindlist(
list(dataf1, dataf2),
fill = TRUE
)[CJ(id, time, unique = TRUE), on = .(id, time)]
# id time data1 data2
# 1: 1 201601 100 5
# 2: 1 201602 150 6
# 3: 1 201603 160 9
# 4: 1 201604 NA NA
# 5: 2 201601 NA NA
# 6: 2 201602 NA NA
# 7: 2 201603 NA NA
# 8: 2 201604 11 NA
# 9: 3 201601 111 3
# 10: 3 201602 120 2
# 11: 3 201603 130 1
# 12: 3 201604 20 NA
# 13: 10 201601 150 0
# 14: 10 201602 NA NA
# 15: 10 201603 NA NA
# 16: 10 201604 30 NA
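For comparison, the same "all id-time combinations" idea can be sketched in base R with rbind(), expand.grid() and merge(). This version keeps dataf2's values in data2, like the first data.table answer, rather than moving them to data1. The names alldata, grid and ans are illustrative:

```r
dataf1 <- data.frame(id = c(1, 1, 1, 3, 3, 3, 10),
                     time = c(201601, 201602, 201603, 201601, 201602, 201603, 201601),
                     data1 = c(100, 150, 160, 111, 120, 130, 150),
                     data2 = c(5, 6, 9, 3, 2, 1, 0))
dataf2 <- data.frame(id = c(3, 10, 2), time = rep(201604, 3), data2 = c(20, 30, 11))

# dataf2 has no data1 column; add it as NA so both tables can be stacked
dataf2$data1 <- NA_real_
alldata <- rbind(dataf1, dataf2[names(dataf1)])

# every id paired with every time that appears anywhere in the data
grid <- expand.grid(id = sort(unique(alldata$id)), time = sort(unique(alldata$time)))

# left join: keep all grid rows, fill the gaps with NA
ans <- merge(grid, alldata, by = c("id", "time"), all.x = TRUE)
ans <- ans[order(ans$id, ans$time), ]
rownames(ans) <- NULL
ans
```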

How can I use merge so that I have data for all times?

I'm trying to transform my data so that every entity has a value for every possible time (month). Here is my data:
Class Value month
A 10 1
A 12 3
A 9 12
B 11 1
B 10 8
From the data above, I want to get the following data;
Class Value month
A 10 1
A NA 2
A 12 3
A NA 4
....
A 9 12
B 11 1
B NA 2
....
B 10 8
B NA 9
....
B NA 12
So I want every Class to have a row for each month from 1 to 12.
How can I do this? I'm currently trying the merge function, but I'd appreciate any other approaches.
We can use tidyverse
library(tidyverse)
df %>%
complete(Class, month = 1:12) %>% # every Class gets months 1 to 12
select(all_of(names(df))) %>% # if we need the original column order
as.data.frame() # if needed, convert the tibble back to a 'data.frame'
In base R using merge (where df is your data):
res <- data.frame(Class=rep(levels(df$Class), each=12), value=NA, month=1:12)
merge(df, res, by = c("Class", "month"), all.y = TRUE)[,c(1,3,2)]
# Class Value month
# 1 A 10 1
# 2 A NA 2
# 3 A 12 3
# 4 A NA 4
# 5 A NA 5
# 6 A NA 6
# 7 A NA 7
# 8 A NA 8
# 9 A NA 9
# 10 A NA 10
# 11 A NA 11
# 12 A 9 12
# 13 B 11 1
# 14 B NA 2
# 15 B NA 3
# 16 B NA 4
# 17 B NA 5
# 18 B NA 6
# 19 B NA 7
# 20 B 10 8
# 21 B NA 9
# 22 B NA 10
# 23 B NA 11
# 24 B NA 12
df <- structure(list(Class = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), Value = c(10L, 12L, 9L, 11L, 10L), month = c(1L,
3L, 12L, 1L, 8L)), .Names = c("Class", "Value", "month"), class = "data.frame", row.names = c(NA,
-5L))
To add to @akrun's answer, if you want to replace the NA values with 0, you can do the following:
library(dplyr)
library(tidyr)
df %>%
complete(Class, month = 1:12) %>%
mutate(Value = ifelse(is.na(Value), 0, Value))
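The same two steps (complete the Class x month grid, then replace NA with 0) can be sketched in base R, building on the merge answer above. This assumes months 1 to 12 and Class stored as a factor, as in the dput() data. The names grid and full are illustrative:

```r
df <- data.frame(Class = factor(c("A", "A", "A", "B", "B")),
                 Value = c(10L, 12L, 9L, 11L, 10L),
                 month = c(1L, 3L, 12L, 1L, 8L))

# every Class level combined with every month 1..12
grid <- expand.grid(Class = levels(df$Class), month = 1:12)

# left join onto the grid; months without data get NA
full <- merge(grid, df, by = c("Class", "month"), all.x = TRUE)

# replace the missing values with 0, then restore the original column order
full$Value[is.na(full$Value)] <- 0L
full <- full[order(full$Class, full$month), c("Class", "Value", "month")]
rownames(full) <- NULL
full
```

Drop the NA-replacement line to keep the NA values the question originally asked for.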

Find row of the next instance of the value in R

I have two columns Time and Event. There are two events A and B. Once an event A takes place, I want to find when the next event B occurs. Column Time_EventB is the desired output.
This is the data frame:
df <- data.frame(Event = sample(c("A", "B", ""), 20, replace = TRUE), Time = paste("t", seq(1,20)))
What is the code in R for finding the next instance of a value (B in this case)?
Once that instance of B is found, how do I return the value of the corresponding Time column?
The code should be something like this:
data$Time_EventB <- ifelse(data$Event == "A", <Code for returning time of next instance of B>, "")
In Excel this can be done using VLOOKUP.
Here's a simple solution:
set.seed(1)
df <- data.frame(Event = sample(c("A", "B", ""), size = 20, replace = TRUE), time = 1:20)
a_idx <- which(df$Event == "A") # rows with event A
b_idx <- which(df$Event == "B") # rows with event B
next_b <- sapply(a_idx, function(a) {
gaps <- b_idx - a
if (all(gaps < 0)) return(NA) # no B after this A
b_idx[min(gaps[gaps > 0]) == gaps] # closest B after this A
})
df$next_b <- NA
df$next_b[a_idx] <- df$time[next_b]
> df
Event time next_b
1 A 1 2
2 B 2 NA
3 B 3 NA
4 4 NA
5 A 5 8
6 6 NA
7 7 NA
8 B 8 NA
9 B 9 NA
10 A 10 14
11 A 11 14
12 A 12 14
13 13 NA
14 B 14 NA
15 15 NA
16 B 16 NA
17 17 NA
18 18 NA
19 B 19 NA
20 20 NA
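A vectorised alternative to the sapply() loop uses base R's findInterval() (a swapped-in technique, not used in either answer). Because the B rows are sorted, findInterval() returns, for each A row, how many B rows come at or before it, so the next B is the following entry. This sketch copies the Event column from the printed output above, so the result does not depend on the random seed. Names such as a_rows and b_rows are illustrative:

```r
# Event column copied from the printed example output above
df <- data.frame(
  Event = c("A", "B", "B", "", "A", "", "", "B", "B", "A",
            "A", "A", "", "B", "", "B", "", "", "B", ""),
  time = 1:20,
  stringsAsFactors = FALSE
)

a_rows <- which(df$Event == "A")
b_rows <- which(df$Event == "B")  # increasing row positions

# number of B rows at or before each A row; the next B is entry k + 1
k <- findInterval(a_rows, b_rows) + 1
next_b_row <- b_rows[k]           # indexing past the end yields NA (no later B)

df$next_b <- NA
df$next_b[a_rows] <- df$time[next_b_row]
df
```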
Here's an attempt using a "rolling join" from the data.table package:
library(data.table)
setDT(df)
df[Event=="B", .(time, nextb=time)][df, on="time", roll=-Inf][Event != "A", nextb := NA][]
# time nextb Event
# 1: 1 2 A
# 2: 2 NA B
# 3: 3 NA B
# 4: 4 NA
# 5: 5 8 A
# 6: 6 NA
# 7: 7 NA
# 8: 8 NA B
# 9: 9 NA B
#10: 10 14 A
#11: 11 14 A
#12: 12 14 A
#13: 13 NA
#14: 14 NA B
#15: 15 NA
#16: 16 NA B
#17: 17 NA
#18: 18 NA
#19: 19 NA B
#20: 20 NA
Using the data borrowed from @thc's answer.

copy values of a column into another column based on a condition using a loop

I need to create a complicated "for" loop, but after reading some examples I'm still clueless about how to write it the proper R way, so I'm not sure whether it will work. I'm still an R beginner :(
I have a dataset in long format with different occasions. However, some occasions are not truly new ones: their start date is the same, but they have a different offence, which I need to copy into a new column called "offence2". After that, I need to drop the false new occasions so that I keep only the rows that represent new occasions. My real data have up to 8 different offences for a single date, but I made a simpler example.
Here is an example of what my data look like:
id<-c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5)
dstart<-c("25/11/2006", "13/12/2006","13/12/2006","07/02/2006","07/02/2006",
"15/01/2006", "22/03/2006","18/09/2006", "04/03/2006","04/03/2006",
"22/08/2006","22/08/2006","11/04/2006", "11/04/2006", "19/10/2006")
dstart1<-as.Date(dstart, "%d/%m/%Y")
offence<-c("a","b","c","b","d","a","a","e","b","a","c","a","a","b","a")
cod_offence<-c(25, 26,27,26,28,25,25,29,26,25,27,25,25,26,25)
mydata<-data.frame(id, dstart1, offence, cod_offence)
Data
id dstart1 offence cod_offence
1 1 2006-11-25 a 25
2 1 2006-12-13 b 26
3 1 2006-12-13 c 27
4 2 2006-02-07 b 26
5 2 2006-02-07 d 28
6 3 2006-01-15 a 25
7 3 2006-03-22 a 25
8 3 2006-09-18 e 29
9 4 2006-03-04 b 26
10 4 2006-03-04 a 25
11 4 2006-08-22 c 27
12 4 2006-08-22 a 25
13 5 2006-04-11 a 25
14 5 2006-04-11 b 26
15 5 2006-10-19 a 25
I need something like this:
id dstart1 offence cod_offence offence2
1 1 2006-11-25 a 25 NA
2 1 2006-12-13 b 26 c
3 1 2006-12-13 c 27 NA
4 2 2006-02-07 b 26 d
5 2 2006-02-07 d 28 NA
6 3 2006-01-15 a 25 NA
7 3 2006-03-22 a 25 NA
8 3 2006-09-18 e 29 NA
9 4 2006-03-04 b 26 a
10 4 2006-03-04 a 25 NA
11 4 2006-08-22 c 27 a
12 4 2006-08-22 a 25 NA
13 5 2006-04-11 a 25 b
14 5 2006-04-11 b 26 NA
15 5 2006-10-19 a 25 NA
I think that I need to do something like this:
given i=individual
j=observation within individual
for each individual I need to check whether mydata$dstart1(j) = mydata$dstart1(j+1)
if this is true, then copy mydata$offence2(j)=mydata$offence(j+1), otherwise keep the same value
This has to stop if id(j) != id(j+1) and re-start with the new id.
My problem is that I don't know how to put this in a loop.
Thank you!!
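The loop described above can be written almost literally in R. This is a minimal sketch for the two-rows-per-date case only, assuming rows are already sorted by id and date. It uses the example data from the question (without cod_offence):

```r
id <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5)
dstart <- c("25/11/2006", "13/12/2006", "13/12/2006", "07/02/2006", "07/02/2006",
            "15/01/2006", "22/03/2006", "18/09/2006", "04/03/2006", "04/03/2006",
            "22/08/2006", "22/08/2006", "11/04/2006", "11/04/2006", "19/10/2006")
dstart1 <- as.Date(dstart, "%d/%m/%Y")
offence <- c("a", "b", "c", "b", "d", "a", "a", "e", "b", "a", "c", "a", "a", "b", "a")
mydata <- data.frame(id, dstart1, offence, stringsAsFactors = FALSE)

mydata$offence2 <- NA_character_
for (j in seq_len(nrow(mydata) - 1)) {
  # only compare with the next row if it belongs to the same individual
  same_id   <- mydata$id[j] == mydata$id[j + 1]
  same_date <- mydata$dstart1[j] == mydata$dstart1[j + 1]
  if (same_id && same_date) {
    mydata$offence2[j] <- mydata$offence[j + 1]
  }
}
mydata
```

The answers below replace this row-by-row loop with vectorised code, which scales better on large data.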
Update
Yes, it works fine with the example, but not yet with my real data, which are a bit more complex.
What happens if, instead of two repeated dates, I have three or more, each with a different offence? Following @CathG's solution, I need to create more variables according to the number of offences (in my case 8). I guess I would need a new vector that identifies the position of each observation within id, and a new "instruction" that tells R to copy the value into a different column depending on the position of mydata$dstart1. But again, I don't know how to do it.
id dstart1 offence cod_offence offence2 offence3 offence4
1 1 2006-11-25 a 25 NA NA NA
2 1 2006-12-13 b 26 c NA NA
3 1 2006-12-13 c 27 NA NA NA
4 2 2006-02-07 b 26 d NA NA
5 2 2006-02-07 d 28 NA NA NA
6 2 2006-04-12 b 26 d c a
7 2 2006-04-12 d 28 NA NA NA
8 2 2006-04-12 c 27 NA NA NA
9 2 2006-04-12 a 25 NA NA NA
Thanks again!!!
With split and a loop:
# data with repeated dates /offences
id<-c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,5)
dstart<-c("25/11/2006", "13/12/2006","13/12/2006","07/02/2006","07/02/2006",
"15/01/2006", "22/03/2006","18/09/2006", "04/03/2006","04/03/2006",
"22/08/2006","22/08/2006","11/04/2006", "11/04/2006", "19/10/2006","19/10/2006","19/10/2006","19/10/2006")
dstart1<-as.Date(dstart, "%d/%m/%Y")
offence<-c("a","b","c","b","d","a","a","e","b","a","c","a","a","b","a","c","b","a")
cod_offence<-c(25, 26,27,26,28,25,25,29,26,25,27,25,25,26,25,27,25,25)
mydata<-data.frame(id, dstart1, offence, cod_offence)
# see the max offences there are for same id and date
maxoff<-max(table(mydata$id,mydata$dstart1))
mydata[,paste("offence",2:maxoff,sep="")]<-NA
# split your data according to id
splitmydata<-split(mydata,mydata$id)
# for each "per id dataset", apply a function that looks for repeated offences / dates and fill the "offences" variables in the row with first occurence of specific date.
splitmydata2<-lapply(splitmydata,
function(tab){
for(datestart in unique(tab[,"dstart1"])){
ind_date<-sort(which(tab[,"dstart1"]==datestart))
if(length(ind_date[-1])){
tab[ind_date[1],grep("^offence",colnames(tab),value=T)[2:(length(ind_date))]]<-as.character(tab[ind_date[-1],"offence"])
}
}
return(tab)
}
)
mydata2<-unsplit(splitmydata2,mydata$id) # finally, unsplit your data
> mydata2
id dstart1 offence cod_offence offence2 offence3 offence4
1 1 2006-11-25 a 25 <NA> <NA> <NA>
2 1 2006-12-13 b 26 c <NA> <NA>
3 1 2006-12-13 c 27 <NA> <NA> <NA>
4 2 2006-02-07 b 26 d <NA> <NA>
5 2 2006-02-07 d 28 <NA> <NA> <NA>
6 3 2006-01-15 a 25 <NA> <NA> <NA>
7 3 2006-03-22 a 25 <NA> <NA> <NA>
8 3 2006-09-18 e 29 <NA> <NA> <NA>
9 4 2006-03-04 b 26 a <NA> <NA>
10 4 2006-03-04 a 25 <NA> <NA> <NA>
11 4 2006-08-22 c 27 a <NA> <NA>
12 4 2006-08-22 a 25 <NA> <NA> <NA>
13 5 2006-04-11 a 25 b <NA> <NA>
14 5 2006-04-11 b 26 <NA> <NA> <NA>
15 5 2006-10-19 a 25 c b a
16 5 2006-10-19 c 27 <NA> <NA> <NA>
17 5 2006-10-19 b 25 <NA> <NA> <NA>
18 5 2006-10-19 a 25 <NA> <NA> <NA>
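A variant that avoids split()/unsplit() numbers the rows within each id-date group with ave(), then copies the k-th offence of each group into column offence<k> of the group's first row. This sketch uses the mydata2 example from the update and assumes that rows with the same id and date belong together even if they are not adjacent. The names key, pos and first_row are illustrative:

```r
mydata2 <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  dstart1 = as.Date(c("2006-11-25", "2006-12-13", "2006-12-13", "2006-02-07", "2006-02-07",
                      "2006-04-12", "2006-04-12", "2006-04-12", "2006-04-12")),
  offence = c("a", "b", "c", "b", "d", "b", "d", "c", "a"),
  stringsAsFactors = FALSE
)

key <- paste(mydata2$id, mydata2$dstart1)            # one key per id-date group
pos <- ave(seq_along(key), key, FUN = seq_along)     # 1, 2, 3, ... within each group
first_row <- match(key, key)                         # first row of each row's group

for (k in seq_len(max(pos))[-1]) {                   # positions 2, 3, ... only
  col <- paste0("offence", k)
  mydata2[[col]] <- NA_character_
  rows <- which(pos == k)
  mydata2[first_row[rows], col] <- mydata2$offence[rows]
}
mydata2
```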
You can use base R
indx <- with(mydata, ave(as.numeric(dstart1), id,
FUN=function(x) c(x[-1]==x[-length(x)], FALSE)))
transform(mydata, offence2=ifelse(!!indx,
c(as.character(offence[-1]), NA), NA))
Or using dplyr
library(dplyr)
mydata %>%
group_by(id) %>%
mutate(offence2= dstart1==lead(dstart1),
offence2= ifelse(!is.na(offence2)&offence2,
as.character(lead(offence)), NA_character_))
# id dstart1 offence cod_offence offence2
#1 1 2006-11-25 a 25 NA
#2 1 2006-12-13 b 26 c
#3 1 2006-12-13 c 27 NA
#4 2 2006-02-07 b 26 d
#5 2 2006-02-07 d 28 NA
#6 3 2006-01-15 a 25 NA
#7 3 2006-03-22 a 25 NA
#8 3 2006-09-18 e 29 NA
#9 4 2006-03-04 b 26 a
#10 4 2006-03-04 a 25 NA
#11 4 2006-08-22 c 27 a
#12 4 2006-08-22 a 25 NA
#13 5 2006-04-11 a 25 b
#14 5 2006-04-11 b 26 NA
#15 5 2006-10-19 a 25 NA
or using data.table
library(data.table)
setDT(mydata)[, indx:=c(dstart1[-1]==dstart1[-.N], FALSE), by=id][,
offence2:=ifelse(indx, shift(as.character(offence), type="lead"), # next row's offence
NA_character_), by=id][,indx:=NULL]
mydata
# id dstart1 offence cod_offence offence2
#1: 1 2006-11-25 a 25 NA
#2: 1 2006-12-13 b 26 c
#3: 1 2006-12-13 c 27 NA
#4: 2 2006-02-07 b 26 d
#5: 2 2006-02-07 d 28 NA
#6: 3 2006-01-15 a 25 NA
#7: 3 2006-03-22 a 25 NA
#8: 3 2006-09-18 e 29 NA
#9: 4 2006-03-04 b 26 a
#10: 4 2006-03-04 a 25 NA
#11: 4 2006-08-22 c 27 a
#12: 4 2006-08-22 a 25 NA
#13: 5 2006-04-11 a 25 b
#14: 5 2006-04-11 b 26 NA
#15: 5 2006-10-19 a 25 NA
Update
Using the new dataset mydata2 with the first (base R) method, we get d1:
indx <- with(mydata2, ave(as.numeric(dstart1), id,
FUN=function(x) c(x[-1]==x[-length(x)], FALSE)))
d1 <- transform(mydata2, offence2=ifelse(!!indx,
c(as.character(offence[-1]), NA), NA))
From d1, we can create an indx column and then use dcast to convert the offence2 column from long to wide form. Columns that are entirely NA can be removed with colSums(is.na(d2)). Then we rename the columns, sort the values within each id/date group with dplyr so that the non-NA entries move to the group's first row, and finally cbind the result with mydata2.
d1$indx <- with(d1, ave(seq_along(id), id, dstart1, FUN=seq_along))
library(reshape2)
d2 <- dcast(d1, id + dstart1+indx~indx, value.var='offence2')
d2New <- d2[,colSums(is.na(d2))!=nrow(d2)]
nm1 <- grep("^\\d",colnames(d2New))
colnames(d2New)[nm1] <- paste0('offence', 2:(length(nm1)+1))
d3 <- d2New[,-3] %>%
group_by(id, dstart1) %>%
mutate(across(everything(), ~ .x[order(.x)])) %>% # non-NA values first within each group
ungroup()
cbind(mydata2, d3[,-c(1:2)])
# id dstart1 offence cod_offence offence2 offence3 offence4
#1 1 2006-11-25 a 25 <NA> <NA> <NA>
#2 1 2006-12-13 b 26 c <NA> <NA>
#3 1 2006-12-13 c 27 <NA> <NA> <NA>
#4 2 2006-02-07 b 26 d <NA> <NA>
#5 2 2006-02-07 d 28 <NA> <NA> <NA>
#6 2 2006-04-12 b 26 d c a
#7 2 2006-04-12 d 28 <NA> <NA> <NA>
#8 2 2006-04-12 c 27 <NA> <NA> <NA>
#9 2 2006-04-12 a 25 <NA> <NA> <NA>
data
mydata <- structure(list(id = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5,
5, 5), dstart1 = structure(c(13477, 13495, 13495, 13186, 13186,
13163, 13229, 13409, 13211, 13211, 13382, 13382, 13249, 13249,
13440), class = "Date"), offence = structure(c(1L, 2L, 3L, 2L,
4L, 1L, 1L, 5L, 2L, 1L, 3L, 1L, 1L, 2L, 1L), .Label = c("a",
"b", "c", "d", "e"), class = "factor"), cod_offence = c(25, 26,
27, 26, 28, 25, 25, 29, 26, 25, 27, 25, 25, 26, 25)), .Names = c("id",
"dstart1", "offence", "cod_offence"), row.names = c(NA, -15L),
class = "data.frame")
mydata2 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
dstart1 = structure(c(13477, 13495, 13495, 13186, 13186, 13250, 13250,
13250, 13250), class = "Date"), offence = c("a", "b", "c", "b", "d", "b",
"d", "c", "a"), cod_offence = c(25L, 26L, 27L, 26L, 28L, 26L, 28L, 27L, 25L
)), .Names = c("id", "dstart1", "offence", "cod_offence"), row.names =
c("1","2", "3", "4", "5", "6", "7", "8", "9"), class = "data.frame")
