Using a conditional in a for loop to create a unique panel id - r

I have a dataset which looks as follows:
# A tibble: 5,458 x 539
# Groups: country, id1 [2,729]
idstd id2 xxx id1 country year
<dbl+> <dbl> <dbl+lbl> <dbl+lbl> <chr> <dbl>
1 445801 NA NA 7 Albania 2009
2 542384 4616555 1163 7 Albania 2013
3 445802 NA NA 8 Albania 2009
4 542386 4616355 1162 8 Albania 2013
5 445803 NA NA 25 Albania 2009
6 542371 4616545 1161 25 Albania 2013
7 445804 NA NA 30 Albania 2009
8 542152 4616556 475 30 Albania 2013
9 445805 NA NA 31 Albania 2009
10 542392 4616542 1160 31 Albania 2013
The data is panel data, but there is no unique panel id. The first two observations are, for example, respondent number 7 from Albania, but number 7 is used again for other countries. id2, however, is unique. My plan is therefore to copy id2 into the NA entry of the corresponding respondent.
I wrote the following code:
for (i in 1:nrow(df)) {
if (df$id1[i]== df$id1[i+1] & df$country[i] == df$country[i+1]) {
df$id2[i] <- df$id2[i+1]
}}
Which gives the following error:
Error in if (df$id1[i] == df1$id1[i + 1] & : missing value where TRUE/FALSE needed
It does however seem to work. As my dataset is quite large and I am not very skilled, I am reluctant to accept the solution I came up with, especially when it gives an error.
Could anyone help explain the error to me?
In addition, is there a more efficient (for example data.table) and maybe error free way to deal with this?

Can you not do something along the line:
library(tidyverse)
df %>%
group_by(country, id1) %>%
mutate(uniqueId = id2 %>% discard(is.na) %>% unique) %>%
ungroup()
Also, judging from your loop, each NA sits one row before the row that holds the ID, so you could also do:
df %>%
mutate(id2Lead = lead(id2),
uniqueId = ifelse(is.na(id2), id2Lead, id2)) %>%
select(-id2Lead)
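On the error itself: a likely explanation is that in the last iteration i + 1 points one row past the end of the data, so the comparison evaluates to NA and if() cannot decide. That would also explain why the loop seems to work, since every earlier row has already been processed by then. A minimal base R sketch on made-up toy data (the values below are assumptions, not your real data):

```r
# Toy data with the same shape as the question; values are invented
df <- data.frame(
  id1     = c(7, 7, 8, 8),
  country = c("Albania", "Albania", "Albania", "Albania"),
  id2     = c(NA, 4616555, NA, 4616355),
  stringsAsFactors = FALSE
)

n <- nrow(df)

# Indexing one past the end returns NA, so the comparison is NA as well
last_comparison <- df$id1[n] == df$id1[n + 1]
print(last_comparison)
# NA

# Stopping at n - 1 avoids the out-of-range index;
# && is the scalar operator intended for use inside if()
for (i in seq_len(n - 1)) {
  if (df$id1[i] == df$id1[i + 1] && df$country[i] == df$country[i + 1]) {
    df$id2[i] <- df$id2[i + 1]
  }
}
print(df)
```

This still assumes that id1 and country themselves contain no NA values; if they do, the condition would need an extra is.na() check.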


How to assign unique country ID number in panel data frame in R

In my dataset, I want to create unique country id numbers. Any help?
library(dplyr)
library(pwt10)
dataframe looks like
country isocode year currency gdp inflation ...
Aruba ABW 1950 N/A N/A N/A
Aruba ABW 1950 N/A N/A N/A
Aruba ABW 1950 N/A N/A N/A
Aruba ABW 1950 N/A N/A N/A
...
Argentina ARG 1950 Peso 130 60 ...
I want to create another column holding a country ID variable (id_num), whose values are numbered in ascending order (1, 2, 3, ...), so that it looks like the following:
country isocode year currency gdp inflation ID
Aruba ABW 1950 N/A N/A N/A 1
Aruba ABW 1950 N/A N/A N/A 1
Aruba ABW 1950 N/A N/A N/A 1
Aruba ABW 1950 N/A N/A N/A 1
...
Argentina ARG 1950 Peso 130 60 ... 5
I was wondering how to create the unique country ID column. Any help?
If I understood your task correctly, you are looking to build a second group identifier (the first being the isocode) by numbering the groups sequentially. One way to achieve this is the cur_group_id() function from dplyr. Here is a toy example that you should be able to adapt to your data.frame:
library(dplyr)
# dummy data
df <- data.frame(col1 = c("a", "a", "b", "b", "b", "c") ,
col2 = 1:6)
df %>%
# arrange the data in growing order for column you want to build sequential group ID from/for
dplyr::arrange(col1) %>%
# build the groupings
dplyr::group_by(col1) %>%
# add new column : sequential group id
dplyr::mutate(ID = dplyr::cur_group_id()) %>%
# always ungroup to prevent unwanted behaviour down stream
dplyr::ungroup()
# A tibble: 6 x 3
col1 col2 ID
<chr> <int> <int>
1 a 1 1
2 a 2 1
3 b 3 2
4 b 4 2
5 b 5 2
6 c 6 3
Do you mean that "ARM" should return 1, "AUS" should return 2, and so on?
Maybe you can try this answer with match.
library(dplyr)
result <- pwt10.0 %>%
filter(isocode %in% comparison_states) %>%
distinct(isocode) %>%
mutate(id_num = match(isocode, comparison_states))
result
# isocode id_num
#ARM-1950 ARM 1
#AUS-1950 AUS 2
#CAN-1950 CAN 3
#CHN-1950 CHN 4
#GBR-1950 GBR 5
#ITA-1950 ITA 6
#JPN-1950 JPN 7
#LUX-1950 LUX 8
#NOR-1950 NOR 9
#NZL-1950 NZL 10
#SGP-1950 SGP 11
#SWE-1950 SWE 12
#THA-1950 THA 13
#TWN-1950 TWN 14
#USA-1950 USA 15
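To see what match does here, a small base R sketch may help. It assumes the intent is to number each isocode by its position in a lookup vector; the comparison_states vector below is a hypothetical stand-in, since the real one is not shown in the question:

```r
# Hypothetical lookup vector; the real comparison_states is not shown above
comparison_states <- c("ARM", "AUS", "CAN")

# A few repeated codes, as they would appear in panel data
isocode <- c("CAN", "CAN", "ARM", "AUS", "ARM")

# match(x, table) returns, for each element of x, its position in table
id_num <- match(isocode, comparison_states)
print(id_num)
# 3 3 1 2 1

# Codes that are missing from the lookup vector come back as NA
missing_id <- match("USA", comparison_states)
print(missing_id)
# NA
```

The numbering follows the order of the lookup vector, so sorting that vector first gives alphabetical IDs.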

Revaluing many observations with a for loop in R

I have a data set where I am looking at longitudinal data for countries.
master.set <- data.frame(
Country = c(rep("Afghanistan", 3), rep("Albania", 3)),
Country.ID = c(rep("Afghanistan", 3), rep("Albania", 3)),
Year = c(2015, 2016, 2017, 2015, 2016, 2017),
Happiness.Score = c(3.575, 3.360, 3.794, 4.959, 4.655, 4.644),
GDP.PPP = c(1766.593, 1757.023, 1758.466, 10971.044, 11356.717, 11803.282),
GINI = NA,
Status = 2,
stringsAsFactors = F
)
> head(master.set)
Country Country.ID Year Happiness.Score GDP.PPP GINI Status
1 Afghanistan Afghanistan 2015 3.575 1766.593 NA 2
2 Afghanistan Afghanistan 2016 3.360 1757.023 NA 2
3 Afghanistan Afghanistan 2017 3.794 1758.466 NA 2
4 Albania Albania 2015 4.959 10971.044 NA 2
5 Albania Albania 2016 4.655 11356.717 NA 2
6 Albania Albania 2017 4.644 11803.282 NA 2
I created that Country.ID variable with the intent of turning them into numerical values 1:159.
I am hoping to avoid doing something like this to replace the value at each individual observation:
master.set$Country.ID[master.set$Country.ID == "Afghanistan"] <- 1
As I implied, there are 159 countries listed in the data set. Because it's longitudinal, there are 460 observations.
Is there any way to use a for loop to save me a lot of time? Here is what I attempted. I made a couple of lists and attempted to use an ifelse command to tell R to label each country the next number.
Here is what I have:
#List of country names
N.Countries <- length(unique(master.set$Country))
Country <- unique(master.set$Country)
Country.ID <- unique(master.set$Country.ID)
CountryList <- unique(master.set$Country)
#For Loop to make Country ID numerically match Country
for (i in 1:460){
for (j in N.Countries){
master.set[[Country.ID[i]]] <- ifelse(master.set[[Country[i]]] == CountryList[j], j, master.set$Country)
}
}
I received this error:
Error in `[[<-.data.frame`(`*tmp*`, Country.ID[i], value = logical(0)) :
replacement has 0 rows, data has 460
Does anyone know how I can accomplish this task? Or will I be stuck using the ifelse command 159 times?
Thanks!
Maybe something like
master.set$Country.ID <- as.numeric(as.factor(master.set$Country.ID))
Or alternatively, using dplyr
library(tidyverse)
master.set <- master.set %>% mutate(Country.ID = as.numeric(as.factor(Country.ID)))
Or this, which creates a new variable Country.ID2 based on a key-value pair between Country and the sequence 1:length(unique(Country)).
library(tidyverse)
master.set <- left_join(master.set,
data.frame( Country = unique(master.set$Country),
Country.ID2 = 1:length(unique(master.set$Country))))
master.set
#> Country Country.ID Year Happiness.Score GDP.PPP GINI Status
#> 1 Afghanistan Afghanistan 2015 3.575 1766.593 NA 2
#> 2 Afghanistan Afghanistan 2016 3.360 1757.023 NA 2
#> 3 Afghanistan Afghanistan 2017 3.794 1758.466 NA 2
#> 4 Albania Albania 2015 4.959 10971.044 NA 2
#> 5 Albania Albania 2016 4.655 11356.717 NA 2
#> 6 Albania Albania 2017 4.644 11803.282 NA 2
#> Country.ID2
#> 1 1
#> 2 1
#> 3 1
#> 4 2
#> 5 2
#> 6 2
library(dplyr)
df<-data.frame("Country"=c("Afghanistan","Afghanistan","Afghanistan","Albania","Albania","Albania"),
"Year"=c(2015,2016,2017,2015,2016,2017),
"Happiness.Score"=c(3.575,3.360,3.794,4.959,4.655,4.644),
"GDP.PPP"=c(1766.593,1757.023,1758.466,10971.044,11356.717,11803.282),
"GINI"=NA,
"Status"=rep(2,6))
df1 <- df %>% arrange(Country) %>% group_by(Country) %>% mutate(Country_id = cur_group_id()) %>% ungroup()
View(df1)
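One detail worth knowing when choosing between these approaches: going through a factor numbers the countries by sorted level order, whereas a lookup built from unique() numbers them in order of first appearance. A small base R sketch with made-up country names, deliberately not sorted:

```r
# Made-up, deliberately unsorted country column
country <- c("Albania", "Albania", "Afghanistan", "Afghanistan", "Zimbabwe")

# Factor levels are sorted, so IDs follow alphabetical order
id_alpha <- as.integer(factor(country))
print(id_alpha)
# 2 2 1 1 3

# unique() keeps the order of first appearance, so IDs follow the data
id_first <- match(country, unique(country))
print(id_first)
# 1 1 2 2 3
```

On data that is already sorted by country, as in the question, both give the same numbering.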

How to subtract each Country's value by year

I have data for each Country's happiness (https://www.kaggle.com/unsdsn/world-happiness), and I made data for each year of the reports. Now, I don't know how to get the values for each year subtracted from each other e.g. how did happiness rank change from 2015 to 2017/2016 to 2017? I'd like to make a new df of differences for each.
I was able to bind the tables for columns in common and started to work on removing Countries that don't have data for all 3 years. I'm not sure if I'm going down a complicated path.
keepcols <- c("Country","Happiness.Rank","Economy..GDP.per.Capita.","Family","Health..Life.Expectancy.","Freedom","Trust..Government.Corruption.","Generosity","Dystopia.Residual","Year")
mydata2015 = read.csv("C:\\Users\\mmcgown\\Downloads\\2015.csv")
mydata2015$Year <- "2015"
data2015 <- subset(mydata2015, select = keepcols )
mydata2016 = read.csv("C:\\Users\\mmcgown\\Downloads\\2016.csv")
mydata2016$Year <- "2016"
data2016 <- subset(mydata2016, select = keepcols )
mydata2017 = read.csv("C:\\Users\\mmcgown\\Downloads\\2017.csv")
mydata2017$Year <- "2017"
data2017 <- subset(mydata2017, select = keepcols )
df <- rbind(data2015,data2016,data2017)
head(df, n=10)
tail(df, n=10)
df15 <- df[df['Year']=='2015',]
df16 <- df[df['Year']=='2016',]
df17 <- df[df['Year']=='2017',]
nocon <- rbind(setdiff(unique(df16['Country']),unique(df17['Country'])),setdiff(unique(df15['Country']),unique(df16['Country'])))
I don't have a clear path to accomplish what I want, but the result would look like
df16_to_17
Country Happiness.Rank ...(other columns)
Yemen (Yemen[Happiness Rank in 2017] - Yemen[Happiness Rank in 2016])
USA (USA[Happiness Rank in 2017] - USA[Happiness Rank in 2016])
(other countries)
df15_to_16
Country Happiness.Rank ...(other columns)
Yemen (Yemen[Happiness Rank in 2016] - Yemen[Happiness Rank in 2015])
USA (USA[Happiness Rank in 2016] - USA[Happiness Rank in 2015])
(other countries)
It's very straightforward with dplyr, and involves grouping by country and then finding the differences between consecutive values with base R's diff. Just make sure to use df and not df15, etc.:
library(dplyr)
rank_diff_df <- df %>%
group_by(Country) %>%
mutate(Rank.Diff = c(NA, diff(Happiness.Rank)))
The above assumes that the data are arranged by year, which they are in your case because of the way you combined the dataframes. If not, you'll need to call arrange(Year) before the call to mutate. Filtering out countries with missing year data isn't necessary, but can be done after group_by() with filter(n() == 3).
If you would like to view the differences it would make sense to drop some variables and rearrange the data:
rank_diff_df %>%
select(Year, Country, Happiness.Rank, Rank.Diff) %>%
arrange(Country)
Which returns:
# A tibble: 470 x 4
# Groups: Country [166]
Year Country Happiness.Rank Rank.Diff
<chr> <fct> <int> <int>
1 2015 Afghanistan 153 NA
2 2016 Afghanistan 154 1
3 2017 Afghanistan 141 -13
4 2015 Albania 95 NA
5 2016 Albania 109 14
6 2017 Albania 109 0
7 2015 Algeria 68 NA
8 2016 Algeria 38 -30
9 2017 Algeria 53 15
10 2015 Angola 137 NA
# … with 460 more rows
The above data frame will work well with ggplot2 if you are planning on plotting the results.
If you don't feel comfortable with dplyr you can use base R's merge to combine the dataframes, and then create a new dataframe with the differences as columns:
df_wide <- merge(merge(df15, df16, by = "Country"), df17, by = "Country")
rank_diff_df <- data.frame(Country = df_wide$Country,
Y2015.2016 = df_wide$Happiness.Rank.y -
df_wide$Happiness.Rank.x,
Y2016.2017 = df_wide$Happiness.Rank -
df_wide$Happiness.Rank.y
)
Which returns:
head(rank_diff_df, 10)
Country Y2015.2016 Y2016.2017
1 Afghanistan 1 -13
2 Albania 14 0
3 Algeria -30 15
4 Angola 4 -1
5 Argentina -4 -2
6 Armenia -6 0
7 Australia -1 1
8 Austria -1 1
9 Azerbaijan 1 4
10 Bahrain -7 -1
Assuming the three datasets are present in your environment under the names data2015, data2016 and data2017, we can add a Year column with the respective year and keep only the columns listed in the keepcols vector. We then arrange the data by Country and Year, group_by Country, keep only those countries that are present in all 3 years, and subtract the previous row's values using lag or diff.
library(dplyr)
data2015$Year <- 2015
data2016$Year <- 2016
data2017$Year <- 2017
df <- bind_rows(data2015, data2016, data2017)
data <- df[keepcols]
data %>%
arrange(Country, Year) %>%
group_by(Country) %>%
filter(n() == 3) %>%
mutate_at(-1, ~. - lag(.)) #OR
#mutate_at(-1, ~c(NA, diff(.)))
# A tibble: 438 x 10
# Groups: Country [146]
# Country Happiness.Rank Economy..GDP.pe… Family Health..Life.Ex… Freedom
# <chr> <int> <dbl> <dbl> <dbl> <dbl>
# 1 Afghan… NA NA NA NA NA
# 2 Afghan… 1 0.0624 -0.192 -0.130 -0.0698
# 3 Afghan… -13 0.0192 0.471 0.00731 -0.0581
# 4 Albania NA NA NA NA NA
# 5 Albania 14 0.0766 -0.303 -0.0832 -0.0387
# 6 Albania 0 0.0409 0.302 0.00109 0.0628
# 7 Algeria NA NA NA NA NA
# 8 Algeria -30 0.113 -0.245 0.00038 -0.0757
# 9 Algeria 15 0.0392 0.313 -0.000455 0.0233
#10 Angola NA NA NA NA NA
# … with 428 more rows, and 4 more variables: Trust..Government.Corruption. <dbl>,
# Generosity <dbl>, Dystopia.Residual <dbl>, Year <dbl>
The first row for each Country is always NA; every other row holds the difference from that country's previous row.
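If you would rather stay in base R, the same grouped difference can be sketched with ave(). The toy data frame below is made up and deliberately shuffled, to show why sorting by Country and Year first matters:

```r
# Made-up toy data, deliberately out of order
toy <- data.frame(
  Country = c("A", "B", "A", "B", "A", "B"),
  Year = c(2017, 2015, 2015, 2016, 2016, 2017),
  Happiness.Rank = c(141, 95, 153, 109, 154, 109),
  stringsAsFactors = FALSE
)

# Sort so that consecutive rows within a country are consecutive years
toy <- toy[order(toy$Country, toy$Year), ]

# ave() applies the function within each country and keeps the row order
toy$Rank.Diff <- ave(
  toy$Happiness.Rank,
  toy$Country,
  FUN = function(x) c(NA, diff(x))
)
print(toy)
```

Without the order() step, diff would subtract whatever rows happen to be adjacent, which is not the year-over-year change.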

R: Creating a table with the highest values by year

I hope I don't ask a question that has been asked already, but I couldn't quite find what I was looking for. I am fairly new to R and have no experience with programming.
I want to make a table with the top 10 values of three sections for each year. My data looks something like this:
Year Country Test1 Test2 Test3
2000 ALB 500 497 501
2001 ALB NA NA NA
...
2000 ARG 502 487 354
2001 ARG NA NA NA
...
(My years go from 2000 to 2015, I only have observations for every three years, and even in those years still a lot of NA's for some countries or tests)
I would like to get a table in which I can see the 10 top values for each test for each year. So for the year 2000,2003,2006,...,2015 the top ten values and the countries that reached those values for test 1,2&3.
AND then (I am not sure if this should be a separate question) I would like to get the table into Latex.
Easier to see top values this way.
You could use melt from the data.table package:
library(data.table)
# convert to data table
setDT(df)
# convert it to long format and select the columns to use
df1 <- melt(df, id.vars=1:2)
df1 <- df1[,c(1,2,4)]
# get top values by year and country
df1 <- df1[,top_value := .(list(sort(value, decreasing = T))), .(Year, Country)][,.(Year, Country, top_value)]
print(df1)
Year Country top_value
1: 2000 ALB 501,500,497
2: 2001 ALB
3: 2000 ARG 502,487,354
4: 2001 ARG
5: 2000 ALB 501,500,497
6: 2001 ALB
7: 2000 ARG 502,487,354
8: 2001 ARG
9: 2000 ALB 501,500,497
10: 2001 ALB
11: 2000 ARG 502,487,354
12: 2001 ARG
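If the goal is the top N countries per year for one test, a base R sketch along these lines might be closer to what was asked. The scores data frame and the top_n_by_year helper are hypothetical names introduced for illustration, and the values are invented:

```r
# Invented toy data: one test column, two survey years, some NAs
scores <- data.frame(
  Year = rep(c(2000, 2003), each = 4),
  Country = rep(c("ALB", "ARG", "AUS", "AUT"), times = 2),
  Test1 = c(500, 502, NA, 480, 510, 495, 520, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top n rows per year for one score column
top_n_by_year <- function(data, column, n) {
  # drop rows without a score for this test
  keep <- data[!is.na(data[[column]]), c("Year", "Country", column)]
  # sort by year, then by score from highest to lowest
  keep <- keep[order(keep$Year, -keep[[column]]), ]
  # position of each row within its year after sorting
  rank_in_year <- ave(seq_len(nrow(keep)), keep$Year, FUN = seq_along)
  keep[rank_in_year <= n, ]
}

top2 <- top_n_by_year(scores, "Test1", 2)
print(top2)
```

Calling the helper once per test column (with n = 10 on the real data) gives one table per test. The result is an ordinary data frame, so it can be handed to a table-export package such as xtable to produce LaTeX.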

Looping through two dataframes and adding columns inside of the loop

I have a problem when specifying a loop with a data frame.
The general idea I have is the following:
I have an area which contains a certain number of raster quadrants. These raster quadrants have been visited irregularly over several years (e.g. from 1950 to 2015).
I have two data frames:
1) a data frame containing the IDs of the rasterquadrants (and one column for the year of first visit of this quadrant):
df1<- as.data.frame(cbind(c("12345","12346","12347","12348"),rep(NA,4)))
df1[,1]<- as.character(df1[,1])
df1[,2]<- as.numeric(df1[,2])
names(df1)<-c("Raster_Q","First_visit")
2) a data frame that contains the information on the visits; it is ordered first by raster quadrant and then by year. This data frame records which raster quadrant was visited and when.
df2<- as.data.frame(cbind(c(rep("12345",5),rep("12346",7),rep("12347",3),rep(12348,9)),
c(1950,1952,1955,1967,1951,1968,1970,
1998,2001,2014,2015,2017,1965,1986,2000,1952,1955,1957,1965,2003,2014,2015,2016,2017)))
df2[,1]<- as.character(df2[,1])
df2[,2]<- as.numeric(as.character(df2[,2]))
names(df2)<-c("Raster_Q","Year")
I want to know when and how often the full area was 'sampled'.
[Figure: scheme of what I want to do; different colors indicate different areas/regions]
My rationale:
I sorted the complete data in df2 according to Quadrant and Year. I then match the rasterquadrant in df1 with the name of the rasterquadrant in df2 and the first value of year from df2 is added.
For this I wrote a loop (see below)
In order not to replicate a quadrant I created a vector "visited"
visited<-c()
Every entry of df2 that matches df1 will be written into this vector, so that the second entry of e.g. rasterquadrant "12345" in df2 is ignored in the loop.
Here comes the loop:
visited<- c()
for (i in 1:nrow(df2)){
index<- which(df1$"Raster_Q"==df2$"Raster_Q"[i])
if(length(index)==0) {next()} else{
if(df1$"Raster_Q"[index] %in% visited){next()} else{
df1$"First_visit"[index]<- df2$"Year"[i]
visited[index]<- df1$"Raster_Q"[index]
}
}
}
This gives me the first full sampling period.
Raster_Q First_visit
1 12345 1950
2 12346 1968
3 12347 1965
4 12348 1952
However, I want to have all full sampling periods.
So I do:
df1$"Second_visit"<-NA
I reset the visited vector and specify the following loop:
visited <- c()
for (i in 1:nrow(df2)){
if(df2$Year[i]<=max(df1$"First_visit")){next()} else{
index<- which(df1$"Raster_Q"==df2$"Raster_Q"[i])
if(length(index)==0) {next()} else{
if(df1$"Raster_Q"[index] %in% visited){next()} else{
df1$"Second_visit"[index]<- df2$"Year"[i]
visited[index]<- df1$"Raster_Q"[index]
}
}
}
}
Which is basically the same loop as before, however, only making sure that, if df2$"Year" in a certain raster quadrant has already been included in the first visit, then it is skipped.
That gives me the second full sampling period:
Raster_Q First_visit Second_visit
1 12345 1950 NA
2 12346 1968 1970
3 12347 1965 1986
4 12348 1952 2003
Okay, so far so good. I could do that all by hand. But I have loads and loads of rasterquadrants and several areas that can and should be screened in this way.
So doing all of this in a single loop for this would be really great! However, I realized that this will create a problem because the loop then gets recursive:
The added column will not be included in the subsequent iterations of the loop, because df1 itself is not re-read on each pass, and in consequence the new column for the new sampling period is never picked up:
visited<- c()
for (i in 1:nrow(df2)){
m<-ncol(df1)
index<- which(df1$"Raster_Q"==df2$"Raster_Q"[i])
if(length(index)==0) {next()} else{
if(df1$"Raster_Q"[index] %in% visited){next()} else{
df1[index,m]<- df2$"Year"[i]
visited[index]<- df1$"Raster_Q"[index]
#finish "first_visit"
df1[,m+1]<-NA
# add column for "second visit"
if(df2$Year[i]<=max(df1$"First_visit")){next()} else{
# make sure that the first visit year are not included
index<- which(df1$"Raster_Q"==df2$"Raster_Q"[i])
if(length(index)==0) {next()} else{
if(df1$"Raster_Q"[index] %in% visited){next()} else{
df1[index,m+1]<- df2$"Year"[i]
visited[index]<- df1$"Raster_Q"[index]
}
}
}
This won't work. Another issue is that the vector visited() is not emptied during this loop, so that basically every Raster_Q has already been visited in the second sampling period.
I am stuck.... any ideas?
You can do this without a for loop by using the dplyr and tidyr packages. First, you take your df2 and use dplyr::arrange to order by raster and year. Then you can rank the years visited using the rank function inside of the dplyr::mutate function. Then using tidyr::spread you can put them all in their own columns. Here is the code:
library(dplyr)
df <- df2 %>%
arrange(Raster_Q, Year) %>%
group_by(Raster_Q) %>%
mutate(visit = rank(Year),
visit = paste0("visit_", as.character(visit))) %>%
tidyr::spread(key = visit, value = Year)
Here is the output:
> df
# A tibble: 4 x 10
# Groups: Raster_Q [4]
Raster_Q visit_1 visit_2 visit_3 visit_4 visit_5 visit_6 visit_7 visit_8 visit_9
* <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 12345 1950 1951 1952 1955 1967 NA NA NA NA
2 12346 1968 1970 1998 2001 2014 2015 2017 NA NA
3 12347 1965 1986 2000 NA NA NA NA NA NA
4 12348 1952 1955 1957 1965 2003 2014 2015 2016 2017
EDIT: So I think I understand your problem a little better now. You are looking to remove all duplicate visits to each quadrant that happened before the maximum Year of each respective "round" of visits. So to accomplish this, I wrote a short function that in essence does what the code above does, but with a slight change. Here is the function:
filter_by_round <- function(data, round) {
output <- data %>%
arrange(Raster_Q, Year) %>%
group_by(Raster_Q) %>%
mutate(visit = rank(Year, ties.method = "first")) %>%
ungroup() %>%
mutate(in_round = ifelse(Year <= max(.$Year[.$visit == round]) & visit > round,
TRUE, FALSE)) %>%
filter(!in_round) %>%
select(-c(in_round, visit))
return(output)
}
What this function does is look through the data and remove any visit whose year is on or before the latest year of the specified "visit round" and whose visit number is greater than round. To apply this only to the first round, you would do this:
df2 %>%
filter_by_round(1) %>%
group_by(Raster_Q) %>%
mutate(visit = rank(Year, ties.method = "first")) %>%
ungroup() %>%
mutate(visit = paste0("visit_", as.character(visit))) %>%
tidyr::spread(key = visit, value = Year)
which would give you this:
# A tibble: 4 x 8
Raster_Q visit_1 visit_2 visit_3 visit_4 visit_5 visit_6 visit_7
* <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 12345 1950 NA NA NA NA NA NA
2 12346 1968 1970 1998 2001 2014 2015 2017
3 12347 1965 1986 2000 NA NA NA NA
4 12348 1952 2003 2014 2015 2016 2017 NA
However, while it does accomplish what your for loop would have, you now have other occurrences of the same problem. I have come up with a way to do this successfully but it requires you to know how many "visit rounds" you had or some trial and error. To accomplish this, you can use map and assign the change to a global variable.
# I do this so we do not lose the original dataset
df <- df2
# I chose 1:5 after some trial and error showed there are 5 unique
# "visit rounds" in your toy dataset
# However, if you overshoot your number, it should still work;
# you will just get warnings about `max` not working correctly.
# This may cause issues, though, so figuring out your exact number is
# recommended
purrr::map(1:5, function(x){
# this assigns the output of each iteration to the global variable df
df <<- df %>%
filter_by_round(x)
})
# now applying the original transformation to get the spread dataset
df %>%
group_by(Raster_Q) %>%
mutate(visit = rank(Year, ties.method = "first")) %>%
ungroup() %>%
mutate(visit = paste0("visit_", as.character(visit))) %>%
tidyr::spread(key = visit, value = Year)
This will give you the following output:
# A tibble: 4 x 6
Raster_Q visit_1 visit_2 visit_3 visit_4 visit_5
* <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 12345 1950 NA NA NA NA
2 12346 1968 1970 2014 2015 2017
3 12347 1965 1986 NA NA NA
4 12348 1952 2003 2014 2015 2016
Granted, this is probably not the most elegant solution, but it works. Hopefully this solves the problem for you.
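If you would rather not guess the number of rounds, a base R sketch with a repeat loop could work: in each round, take every quadrant's first visit after a cutoff year, then move the cutoff to the latest of those years, and stop once no quadrant has a later visit. This is an assumption about what "full sampling period" means, based on the two loops in the question; the names visits, rounds and cutoff are introduced here for illustration:

```r
# Same visit data as df2 in the question, rebuilt so the block is self-contained
visits <- data.frame(
  Raster_Q = c(rep("12345", 5), rep("12346", 7), rep("12347", 3), rep("12348", 9)),
  Year = c(1950, 1952, 1955, 1967, 1951,
           1968, 1970, 1998, 2001, 2014, 2015, 2017,
           1965, 1986, 2000,
           1952, 1955, 1957, 1965, 2003, 2014, 2015, 2016, 2017),
  stringsAsFactors = FALSE
)

quadrants <- sort(unique(visits$Raster_Q))
rounds <- list()   # one numeric vector of years per sampling round
cutoff <- -Inf     # latest year already used by the previous round

repeat {
  # first visit after the cutoff for each quadrant, NA if there is none
  first_after <- vapply(quadrants, function(q) {
    years <- visits$Year[visits$Raster_Q == q & visits$Year > cutoff]
    if (length(years) == 0) NA_real_ else min(years)
  }, numeric(1), USE.NAMES = FALSE)

  # stop when no quadrant has a visit after the cutoff
  if (all(is.na(first_after))) break

  rounds[[length(rounds) + 1]] <- first_after
  cutoff <- max(first_after, na.rm = TRUE)
}

# one column per round, one row per quadrant
result <- data.frame(Raster_Q = quadrants, do.call(cbind, rounds),
                     stringsAsFactors = FALSE)
names(result)[-1] <- paste0("visit_", seq_along(rounds))
print(result)
```

On the toy data this stops after five rounds on its own, because the cutoff strictly increases in every pass.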
