Selecting observations for which two years are available by country - r

I have a dataset as follows:
DT <- fread(
"ID country year Event_A Event_B
4 BEL 2002 0 1
5 BEL 2002 0 1
6 NLD 2002 1 1
7 NLD 2006 1 0
8 NLD 2006 1 1
9 GBR 2001 0 1
10 GBR 2001 0 0
11 GBR 2001 0 1
12 GBR 2007 1 1
13 GBR 2007 1 1",
header = TRUE)
I would like to keep only observations from countries that are observed in at least two different years. So BEL drops out because it only has observations in 2002.
I would like to do something like DT[, if (unique(year) > 1) .SD, by = country], but that does not do anything. I also tried DT[unique(year) > 1, .SD, by = country], but this gives the error:
Error in `[.data.table`(DT, unique(year) > 1, .SD, by = country) :
i evaluates to a logical vector length 4 but there are 10 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.
Desired output:
DT <- fread(
"ID country year Event_A Event_B
6 NLD 2002 1 1
7 NLD 2006 1 0
8 NLD 2006 1 1
9 GBR 2001 0 1
10 GBR 2001 0 0
11 GBR 2001 0 1
12 GBR 2007 1 1
13 GBR 2007 1 1",
header = TRUE)

You can use uniqueN to get the count of unique values per group and select the rows using .SD.
library(data.table)
DT[, .SD[uniqueN(year) > 1], country]
# country ID year Event_A Event_B
#1: NLD 6 2002 1 1
#2: NLD 7 2006 1 0
#3: NLD 8 2006 1 1
#4: GBR 9 2001 0 1
#5: GBR 10 2001 0 0
#6: GBR 11 2001 0 1
#7: GBR 12 2007 1 1
#8: GBR 13 2007 1 1
Or in dplyr we can do the same with n_distinct and filter
library(dplyr)
DT %>% group_by(country) %>% filter(n_distinct(year) > 1)
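As an aside, the original attempt DT[, if (unique(year) > 1) .SD, by = country] keeps every group because unique(year) > 1 compares the year values themselves (all well above 1) rather than counting them; the counting version of that idea is DT[, if (uniqueN(year) > 1L) .SD, by = country]. On a large table, a variant that collects row indices with .I instead of subsetting .SD may also be worth a look (a rough sketch, not benchmarked):
library(data.table)
# per country, keep the row numbers of groups with more than one distinct year
idx <- DT[, .I[uniqueN(year) > 1L], by = country]$V1
DT[idx]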

In the same spirit as #user2474226, if you're open to other packages, a simple dplyr solution:
library(data.table)
library(dplyr)
DT <- fread(
"ID country year Event_A Event_B
4 BEL 2002 0 1
5 BEL 2002 0 1
6 NLD 2002 1 1
7 NLD 2006 1 0
8 NLD 2006 1 1
9 GBR 2001 0 1
10 GBR 2001 0 0
11 GBR 2001 0 1
12 GBR 2007 1 1
13 GBR 2007 1 1",
header = TRUE)
# Identify countries observed in more than one year
sel_cnt <-
DT %>%
count(country, year) %>%
count(country) %>%
filter(n > 1)
DT %>%
semi_join(sel_cnt, by = "country")
#> ID country year Event_A Event_B
#> 1 6 NLD 2002 1 1
#> 2 7 NLD 2006 1 0
#> 3 8 NLD 2006 1 1
#> 4 9 GBR 2001 0 1
#> 5 10 GBR 2001 0 0
#> 6 11 GBR 2001 0 1
#> 7 12 GBR 2007 1 1
#> 8 13 GBR 2007 1 1

Here is a base R solution using ave() and subset():
DTout <- subset(DT, as.logical(ave(DT$year, DT$country, FUN = function(x) length(unique(x)) >= 2)))
such that
> DTout
ID country year Event_A Event_B
3 6 NLD 2002 1 1
4 7 NLD 2006 1 0
5 8 NLD 2006 1 1
6 9 GBR 2001 0 1
7 10 GBR 2001 0 0
8 11 GBR 2001 0 1
9 12 GBR 2007 1 1
10 13 GBR 2007 1 1

If it's not necessary to do it in data.table, you can count the number of distinct years by country via base R:
country_count <- aggregate(year ~ country, DT, FUN = function(x) NROW(unique(x)))
DT[DT$country %in% country_count$country[country_count$year > 1],]
# output
ID country year Event_A Event_B
3 6 NLD 2002 1 1
4 7 NLD 2006 1 0
5 8 NLD 2006 1 1
6 9 GBR 2001 0 1
7 10 GBR 2001 0 0
8 11 GBR 2001 0 1
9 12 GBR 2007 1 1
10 13 GBR 2007 1 1

Related

Retaining column information when melting multiple columns into one

I have a data.table which I have melted as follows:
library(data.table)
DT <- fread(
"ID country year Event_A Event_B
4 NLD 2002 0 1
5 NLD 2002 0 1
6 NLD 2006 1 1
7 NLD 2006 1 0
8 NLD 2006 1 1
9 GBR 2002 0 1
10 GBR 2002 0 0
11 GBR 2002 0 1
12 GBR 2006 1 1
13 GBR 2006 1 1",
header = TRUE)
melt(DT, id.var = setdiff(names(DT), c("Event_A", "Event_B")),
value.name = 'Event')[, variable := NULL][order(ID)]
# ID country year Event
# 1: 4 NLD 2002 0
# 2: 4 NLD 2002 1
# 3: 5 NLD 2002 0
# 4: 5 NLD 2002 1
# 5: 6 NLD 2006 1
# 6: 6 NLD 2006 1
# 7: 7 NLD 2006 1
# 8: 7 NLD 2006 0
# 9: 8 NLD 2006 1
#10: 8 NLD 2006 1
#11: 9 GBR 2002 0
#12: 9 GBR 2002 1
#13: 10 GBR 2002 0
#14: 10 GBR 2002 0
#15: 11 GBR 2002 0
#16: 11 GBR 2002 1
#17: 12 GBR 2006 1
#18: 12 GBR 2006 1
#19: 13 GBR 2006 1
#20: 13 GBR 2006 1
However, in hindsight, I want to have the Event category in the melted data set. How do I make sure that this information is retained in the melted data?
EDIT (Due to oversimplification in original post):
DT <- fread(
"ID country year Event_A Event_B Choice_A Choice_B
4 NLD 2002 0 1 0 1
5 NLD 2002 0 1 1 1
6 NLD 2006 1 1 0 1
7 NLD 2006 1 0 1 1
8 NLD 2006 1 1 1 1
9 GBR 2002 0 1 1 0
10 GBR 2002 0 0 1 1
11 GBR 2002 0 1 0 1
12 GBR 2006 1 1 1 1
13 GBR 2006 1 1 0 0",
header = TRUE)
DT<- melt(DT, measure = patterns("^Event_", "^Choice_"),
value.name = c("Event", "Choice"))[, variable := NULL][order(ID)]
Desired output:
# ID country year Event Event_Cat Choice Choice_Cat
# 1: 4 NLD 2002 0 A 0 A
# 2: 4 NLD 2002 1 B 1 B
# 3: 5 NLD 2002 0 A
# 4: 5 NLD 2002 1 B
# 5: 6 NLD 2006 1 A
# 6: 6 NLD 2006 1 B
# 7: 7 NLD 2006 1
# 8: 7 NLD 2006 0
# 9: 8 NLD 2006 1
#10: 8 NLD 2006 1
#11: 9 GBR 2002 0
#12: 9 GBR 2002 1
#13: 10 GBR 2002 0
#14: 10 GBR 2002 0
#15: 11 GBR 2002 0
#16: 11 GBR 2002 1
#17: 12 GBR 2006 1
#18: 12 GBR 2006 1
#19: 13 GBR 2006 1
#20: 13 GBR 2006 1
Don't NULLify the variable column; rename it instead:
setnames(
melt(DT, id.var = setdiff(names(DT), c("Event_A", "Event_B")), value.name = 'Event')[
, variable:=sub("Event_", "", variable)][order(ID)],
old="variable", new="Event_Cat")
ID country year Event_Cat Event
1: 4 NLD 2002 A 0
2: 4 NLD 2002 B 1
3: 5 NLD 2002 A 0
4: 5 NLD 2002 B 1
5: 6 NLD 2006 A 1
6: 6 NLD 2006 B 1 ...
Edit, based on new information provided (melting multiple columns).
DT2 <- setnames(
melt(DT, measure = patterns("^Event_", "^Choice_"),
value.name = c("Event", "Choice"))[, variable := forcats::lvls_revalue(variable,
c("A", "B"))][order(ID)],
old="variable", new="Cetegory")
DT2
ID country year Category Event Choice
1: 4 NLD 2002 A 0 0
2: 4 NLD 2002 B 1 1
3: 5 NLD 2002 A 0 1
4: 5 NLD 2002 B 1 1
5: 6 NLD 2006 A 1 0
6: 6 NLD 2006 B 1 1 ...
You could use pivot_longer from tidyr:
tidyr::pivot_longer(DT, cols = starts_with('Event'),
names_to = c('.value', 'Event_Cat'),
names_sep = '_')
# ID country year Event_Cat Event
# <int> <chr> <int> <chr> <int>
# 1 4 NLD 2002 A 0
# 2 4 NLD 2002 B 1
# 3 5 NLD 2002 A 0
# 4 5 NLD 2002 B 1
# 5 6 NLD 2006 A 1
# 6 6 NLD 2006 B 1
# 7 7 NLD 2006 A 1
# 8 7 NLD 2006 B 0
# 9 8 NLD 2006 A 1
#10 8 NLD 2006 B 1
#11 9 GBR 2002 A 0
#12 9 GBR 2002 B 1
#13 10 GBR 2002 A 0
#14 10 GBR 2002 B 0
#15 11 GBR 2002 A 0
#16 11 GBR 2002 B 1
#17 12 GBR 2006 A 1
#18 12 GBR 2006 B 1
#19 13 GBR 2006 A 1
#20 13 GBR 2006 B 1

Coalesce columns with capitalised and non-capitalised versions of variable names, without specifying the variable names

I have a (large) data frame as follows:
library(data.table)
DT <- fread(
"ID country year A b B a
4 NLD 2002 NA 1 NA 0
5 NLD 2002 NA 0 NA 1
6 NLD 2006 NA 1 NA 1
7 NLD 2006 NA 0 NA 0
8 NLD 2006 0 NA 0 NA
9 GBR 2002 0 NA 0 NA
10 GBR 2002 0 NA 0 NA
11 GBR 2002 0 NA 0 NA
12 GBR 2006 1 NA 1 NA
13 GBR 2006 1 NA 0 NA",
header = TRUE)
I would simply like to merge variables A and a, and B and b.
EDIT: The problem is that I have to do this for more than 1000 variables, so I would like to avoid specifying either the column names that need not be checked or the ones that do.
I was hoping for a solution that first splits the columns into a group for which there is no non-capitalised alternative, and a group for which there is.
As far as I understand the solution here:
Coalesce columns based on pattern in R
It still requires providing the variable names for which the case needs to be checked. If I misunderstand this solution, which is very much possible, please let me know. In any case, as explained, I need a solution that does not explicitly specify the variables.
I found a good start here.
That solution has however a slightly different approach than the one I need.
How do I make such a variable merge conditional on something like tolower(varname) == varname?
Desired output:
DT <- fread(
"ID country year A B
4 NLD 2002 0 1
5 NLD 2002 1 0
6 NLD 2006 1 1
7 NLD 2006 0 0
8 NLD 2006 0 0
9 GBR 2002 0 0
10 GBR 2002 0 0
11 GBR 2002 0 0
12 GBR 2006 1 1
13 GBR 2006 1 0 ",
header = TRUE)
I can offer a solution using tidyverse functions. This is essentially the same as the solution offered by AntoniosK, but pivot_longer and pivot_wider are the preferred alternatives to spread and gather.
library(dplyr)
library(tidyr)
DT %>%
mutate(UNIQUEID = row_number()) %>%
mutate_all(as.character) %>%
pivot_longer(cols = -UNIQUEID) %>%
mutate(name = stringr::str_to_upper(name)) %>%
filter(!is.na(value)) %>%
pivot_wider(names_from = name, values_from = value) %>%
type.convert(as.is=TRUE) %>% select(-UNIQUEID)
h/t #dario for the great suggestions.
A data.table-only solution - using a simple loop instead of reshaping data:
all_cols <- names(DT)
cols <- grep("[A-Z]", all_cols, value = TRUE)
for (col in cols) {
snc <- all_cols[all_cols == tolower(col)]
if (length(snc)) {
DT[, (col) := fcoalesce(.SD), .SDcols = c(snc, col)]
DT[, (setdiff(snc, col)) := NULL]
}
}
> DT[]
ID country year A B
1: 4 NLD 2002 0 1
2: 5 NLD 2002 1 0
3: 6 NLD 2006 1 1
4: 7 NLD 2006 0 0
5: 8 NLD 2006 0 0
6: 9 GBR 2002 0 0
7: 10 GBR 2002 0 0
8: 11 GBR 2002 0 0
9: 12 GBR 2006 1 1
10: 13 GBR 2006 1 0
The OP is using data.table, so the question deserves a data.table answer.
The approach below is similar to sindri_baldur's answer in general but differs in important details. In particular,
it will also coalesce multiple columns like "CC", "cc", "cC" covering the different ways of writing variable names, e.g., upper case, lower case, as well as lower and upper camel case.
it will return a description of the columns which have been coalesced.
library(data.table)
library(magrittr) # piping is used to improve readability
names(DT) %>%
data.table(orig = ., lc = tolower(.)) %>%
.[, {
if (.N > 1L) {
new <- toupper(.BY)
old <- setdiff(orig, new)
DT[, (new) := fcoalesce(.SD), .SDcols = orig]
DT[, (old) := NULL]
sprintf("Coalesced %s onto %s", toString(old), new)
}
}, by = lc]
lc V1
1: a Coalesced a onto A
2: b Coalesced b onto B
DT[]
ID country year A B
1: 4 NLD 2002 0 1
2: 5 NLD 2002 1 0
3: 6 NLD 2006 1 1
4: 7 NLD 2006 0 0
5: 8 NLD 2006 0 0
6: 9 GBR 2002 0 0
7: 10 GBR 2002 0 0
8: 11 GBR 2002 0 0
9: 12 GBR 2006 1 1
10: 13 GBR 2006 1 0
For another use case
DT2 <- fread(
"ID country year A b B a CC cc cC
4 NLD 2002 NA 1 NA 0 1 NA NA
5 NLD 2002 NA 0 NA 1 NA 2 NA
6 NLD 2006 NA 1 NA 1 NA NA 3
7 NLD 2006 NA 0 NA 0 NA NA NA
8 NLD 2006 0 NA 0 NA 1 NA NA
9 GBR 2002 0 NA 0 NA NA 2 NA
10 GBR 2002 0 NA 0 NA NA NA 3
11 GBR 2002 0 NA 0 NA 1 NA NA
12 GBR 2006 1 NA 1 NA NA 2 NA
13 GBR 2006 1 NA 0 NA NA NA 3",
header = TRUE)
DT <- copy(DT2)
the above code returns
lc V1
1: a Coalesced a onto A
2: b Coalesced b onto B
3: cc Coalesced cc, cC onto CC
DT[]
ID country year A B CC
1: 4 NLD 2002 0 1 1
2: 5 NLD 2002 1 0 2
3: 6 NLD 2006 1 1 3
4: 7 NLD 2006 0 0 NA
5: 8 NLD 2006 0 0 1
6: 9 GBR 2002 0 0 2
7: 10 GBR 2002 0 0 3
8: 11 GBR 2002 0 0 1
9: 12 GBR 2006 1 1 2
10: 13 GBR 2006 1 0 3
Explanation
The column names are turned into a data.table with an additional column lc containing the lower-case versions of the column names.
Instead of a for loop, we use grouping with by = and data.table's ability to evaluate any expression in j, even one with side effects. So DT is updated by reference for each distinct value of lc within the scope of the data.table created on the fly in step 1, but only if there is more than one column in the group.
Future extensions
This approach can be extended to coalesce columns whose names also differ by separators such as underscores, dots, or blanks ("_", ".", " "), e.g., "var_1", "VAR.1", "Var 1"; a sketch follows.
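A minimal sketch of that extension, assuming the lower-cased, separator-stripped name is used as the grouping key and the first column name in each group is (arbitrarily) kept as the canonical one; key0 is a made-up name, everything else mirrors the code above:
library(data.table)
library(magrittr)
# group column names by a key that ignores case and the separators "_", ".", " ",
# then coalesce every group with more than one member onto its first column
names(DT) %>%
  data.table(orig = ., key0 = tolower(gsub("[._ ]", "", .))) %>%
  .[, {
    if (.N > 1L) {
      new <- orig[1L]
      old <- orig[-1L]
      DT[, (new) := fcoalesce(.SD), .SDcols = orig]
      DT[, (old) := NULL]
      sprintf("Coalesced %s onto %s", toString(old), new)
    }
  }, by = key0]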
Assuming that your example dataset represents your general case, this should work:
library(data.table)
library(tidyverse)
DT <- fread(
"ID country year A b B a
4 NLD 2002 NA 1 NA 0
5 NLD 2002 NA 0 NA 1
6 NLD 2006 NA 1 NA 1
7 NLD 2006 NA 0 NA 0
8 NLD 2006 0 NA 0 NA
9 GBR 2002 0 NA 0 NA
10 GBR 2002 0 NA 0 NA
11 GBR 2002 0 NA 0 NA
12 GBR 2006 1 NA 1 NA
13 GBR 2006 1 NA 0 NA",
header = TRUE)
# spot the column names to keep as they are
data.frame(x = names(DT), stringsAsFactors = F) %>% # get actual column names of the dataset
mutate(y = toupper(x)) %>% # get the upper values
group_by(y) %>% # for each upper value
filter(n() == 1) %>% # count them and keep only the unique columns
pull(x) -> fix_cols # store unique column names
DT %>%
gather(col_name, value, -fix_cols) %>% # reshape dataset
mutate(col_name = toupper(col_name)) %>% # change column names to upper case
na.omit() %>% # remove NA rows
spread(col_name, value) # reshape again
# ID country year A B
# 1 4 NLD 2002 0 1
# 2 5 NLD 2002 1 0
# 3 6 NLD 2006 1 1
# 4 7 NLD 2006 0 0
# 5 8 NLD 2006 0 0
# 6 9 GBR 2002 0 0
# 7 10 GBR 2002 0 0
# 8 11 GBR 2002 0 0
# 9 12 GBR 2006 1 1
# 10 13 GBR 2006 1 0

Extracting the change of the mean per group over time

I have a data table from which I calculated the mean sales as follows:
library(data.table)
DT <- fread(
"ID country year sales industry size cat4
1 NLD 2000 4 A 1 0
2 NLD 2000 4 B 1 1
3 NLD 2006 2 A 1 1
4 NLD 2002 4 A 1 0
5 NLD 2002 4 B 1 1
6 NLD 2006 2 A 1 1
7 NLD 2006 2 B 2 0
8 NLD 2006 1 A 1 4
9 GBR 2001 2 B 3 5
10 GBR 2001 1 B 2 5
11 GBR 2002 1 A 1 11
12 GBR 2006 1 A 1 2
13 GBR 2006 1 B 3 12
14 GBR 2006 1 A 1 2
15 GBR 2006 1 B 3 12",
header = TRUE)
setDT(DT)[,Mean_Sales:= mean(sales, na.rm=TRUE), by=c("country", "industry", "size")]
However, now I am interested in how Mean_Sales changes over time, per group: by=c("country", "industry", "size").
I would like to take the mean of the absolute differences, divided by the years they are apart.
As an example, for a company in NLD, in industry A and of size 1 (corresponding to ID=1 and ID=8), I want the absolute difference (|1-4| = 3) divided by the number of years apart (2006-2000 = 6), leading to a year-to-year change of the mean of 3/6 = 0.5.
I just cannot figure out how to get it into R code. Any help would be greatly appreciated.
Desired output:
library(data.table)
DT <- fread(
"ID country year sales industry size cat4 delta
1 NLD 2000 4 A 1 0 0.5
2 NLD 2000 4 B 1 1 0.33
3 NLD 2006 2 A 1 1
4 NLD 2002 4 A 1 0
5 NLD 2002 4 B 1 1
6 NLD 2006 2 A 1 1
7 NLD 2006 2 B 1 0 0.33
8 NLD 2006 1 A 1 4 0.5
9 GBR 2001 2 B 3 5
10 GBR 2001 1 B 2 5
11 GBR 2002 1 A 1 11
12 GBR 2006 1 A 1 2
13 GBR 2006 1 B 3 12
14 GBR 2006 1 A 1 2
15 GBR 2006 1 B 3 12",
header = TRUE)
You could order by year, take the absolute difference between the last and first sales values, and divide it by the difference in years.
library(data.table)
DT[order(year), delta := abs(last(sales) - first(sales))/(max(year) - min(year)),
.(country, industry, size)]
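If you prefer one row per group instead of a delta column repeated on every row, the same formula works as a grouped aggregate (a small usage sketch of the code above; groups observed in only a single year come out as NaN or Inf):
library(data.table)
# one delta per country/industry/size combination
DT[order(year),
   .(delta = abs(last(sales) - first(sales)) / (max(year) - min(year))),
   by = .(country, industry, size)]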

Count observations per group satisfying a different condition for each row

I have a dataframe that looks like this one
state start end date treat
1 1999 2000 2001 1
1 1998 2000 2001 1
1 2000 2003 NA 0
2 2001 2002 NA 0
2 2002 2004 2003 1
2 2003 2004 2005 1
3 2002 2004 2006 1
3 2003 2004 NA 0
3 2005 2007 NA 0
I want to group it by the state identifier and, for each row, compute the number of treated observations (treat == 1) within that state whose date lies between that row's start and end.
In other words I want to get the following
state start end date treat result
1 1999 2000 2001 1 0
1 1998 2000 2001 1 0
1 2000 2003 NA 0 2
2 2001 2002 NA 0 0
2 2002 2004 2003 1 1
2 2003 2004 2005 1 0
3 2002 2004 2006 1 0
3 2003 2004 NA 0 0
3 2005 2008 NA 0 1
For instance, result in the first row is equal to 0 because within state 1 there is no date between 1999 and 2000. On the other hand, result in the last row is equal to 1 because within state 3 there is one treated unit whose date lies between 2005 and 2008 (specifically date = 2006 in the 7th row).
Thank you very much for your help.
You can split by state, combine two outer() calls with & to test whether date lies strictly between start and end, and then sum treat for the matching dates.
x$result <- unlist(lapply(split(x, x$state), function(y) {
tt <- outer(y$start, y$date, "<") & outer(y$end, y$date, ">")
tt[is.na(tt)] <- TRUE
apply(tt, 1, function(z) sum(y$treat[z]))
}))
x
# state start end date treat result
#1 1 1999 2000 2001 1 0
#2 1 1998 2000 2001 1 0
#3 1 2000 2003 NA 0 2
#4 2 2001 2002 NA 0 0
#5 2 2002 2004 2003 1 1
#6 2 2003 2004 2005 1 0
#7 3 2002 2004 2006 1 0
#8 3 2003 2004 NA 0 0
#9 3 2005 2007 NA 0 1
Or you can take the part describing treat per state and date, merge it with the part describing state, start and end, and then sum the matching treat.
tt <- aggregate(treat ~ state + date, x[,c("state", "date", "treat")], sum)
tt <- merge(x[,c("state", "start", "end")], tt)
tt$treat[tt$start >= tt$date | tt$end <= tt$date] <- 0
aggregate(treat ~ start + end + state, tt, sum)
# start end state treat
#1 1998 2000 1 0
#2 1999 2000 1 0
#3 2000 2003 1 2
#4 2001 2002 2 0
#5 2002 2004 2 1
#6 2003 2004 2 0
#7 2002 2004 3 0
#8 2003 2004 3 0
#9 2005 2007 3 1
This gives a single count per state, repeated on every row:
library(tidyverse)
df %>% group_by(state) %>%
mutate(result=sum(treat==1 & date>=min(start, na.rm=TRUE) & date<=max(end, na.rm=TRUE), na.rm=TRUE))
#> # A tibble: 9 x 6
#> # Groups: state [3]
#> state start end date treat result
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1 1999 2000 2001 1 2
#> 2 1 1998 2000 2001 1 2
#> 3 1 2000 2003 NA 0 2
#> 4 2 2001 2002 NA 0 1
#> 5 2 2002 2004 2003 1 1
#> 6 2 2003 2004 2005 1 1
#> 7 3 2002 2004 2006 1 1
#> 8 3 2003 2004 NA 0 1
#> 9 3 2005 2007 NA 0 1
If you just want one number per group, summarize might be a better option:
df %>% group_by(state) %>%
summarize(result=sum(treat==1 & date>=min(start, na.rm=TRUE) & date<=max(end, na.rm=TRUE), na.rm=TRUE))
#> # A tibble: 3 x 2
#> state result
#> <dbl> <int>
#> 1 1 2
#> 2 2 1
#> 3 3 1
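If you need the row-by-row result from the desired output rather than one number per state, here is a sketch in the same dplyr style, using strict inequalities as in the outer() answer above (the sapply() loop is just one way to compare each row's start/end against all dates in its state):
library(dplyr)
df %>%
  group_by(state) %>%
  mutate(result = sapply(seq_len(n()), function(i)
    # count treated rows in this state whose date falls strictly inside row i's window
    sum(treat == 1 & date > start[i] & date < end[i], na.rm = TRUE))) %>%
  ungroup()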

Melting/Splitting a row into two rows, using two column values in the original row, leaving the rest intact

I have a data.table as follows:
DT <- fread(
"ID country year Event_A Event_B
4 NLD 2002 0 1
5 NLD 2002 0 1
6 NLD 2006 1 1
7 NLD 2006 1 0
8 NLD 2006 1 1
9 GBR 2002 0 1
10 GBR 2002 0 0
11 GBR 2002 0 1
12 GBR 2006 1 1
13 GBR 2006 1 1",
header = TRUE)
I want to melt the event columns into separate rows, creating new rows rather than summing them. I tried:
meltedsessions <- melt(Exp, id.vars = -c("Event_A", "Event_B"), measure.vars = c("Event_A", "Event_B"))
I need to specify id.vars as a negative because the actual dataset has another 240 variables that need to stay intact. However, if I do this, I get the error:
Error in melt.data.table(Exp, id.vars = c("ID", "country", "year"), measure.vars = c("Event_A", :
One or more values in 'id.vars' is invalid.
How should I solve this?
Desired output:
DT <- fread(
"NewID ID country year Event
1 4 NLD 2002 0
2 4 NLD 2002 1
3 5 NLD 2002 0
4 5 NLD 2002 1
5 6 NLD 2006 1
6 6 NLD 2006 1
7 7 NLD 2006 1
8 7 NLD 2006 0
9 8 NLD 2006 1
10 8 NLD 2006 0
11 9 GBR 2002 1
12 9 GBR 2002 1
13 10 GBR 2002 0
14 10 GBR 2002 0
15 11 GBR 2002 0
16 12 GBR 2002 1
17 13 GBR 2006 1
18 14 GBR 2006 1
19 15 GBR 2006 1
20 16 GBR 2006 1",
header = TRUE)
Instead of using - in id.var, you can use setdiff:
library(data.table)
melt(DT, id.var = setdiff(names(DT), c("Event_A", "Event_B")),
value.name = 'Event')[, variable := NULL][order(ID)]
# ID country year Event
# 1: 4 NLD 2002 0
# 2: 4 NLD 2002 1
# 3: 5 NLD 2002 0
# 4: 5 NLD 2002 1
# 5: 6 NLD 2006 1
# 6: 6 NLD 2006 1
# 7: 7 NLD 2006 1
# 8: 7 NLD 2006 0
# 9: 8 NLD 2006 1
#10: 8 NLD 2006 1
#11: 9 GBR 2002 0
#12: 9 GBR 2002 1
#13: 10 GBR 2002 0
#14: 10 GBR 2002 0
#15: 11 GBR 2002 0
#16: 11 GBR 2002 1
#17: 12 GBR 2006 1
#18: 12 GBR 2006 1
#19: 13 GBR 2006 1
#20: 13 GBR 2006 1
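If you also want the NewID running index shown in the desired output, one option (assuming the order after sorting by ID is the intended numbering) is to add it afterwards:
library(data.table)
out <- melt(DT, id.var = setdiff(names(DT), c("Event_A", "Event_B")),
            value.name = 'Event')[, variable := NULL][order(ID)]
out[, NewID := .I]                                         # running row index, 1..nrow(out)
setcolorder(out, c("NewID", setdiff(names(out), "NewID"))) # move NewID to the front
out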
