Rename/recode variable value in R based on condition using dplyr

I have a dataset dataExtended with the variables CountryOther and n, where n is the count of wines from that country. CountryOther is of type character and n is integer. I want to rename the values in CountryOther to "Other" when n <= 20. I would like to do this with the dplyr package, but I am not sure how, or whether to use mutate or mutate_at.
Since I wasn't able to write the condition as stated above, I tried to do it manually as follows, but it didn't work:
dataExtended$CountryOther <- dataExtended$Country
dataExtended %>%
mutate(CountryOther = recode(CountryOther,
China = "Other",
Mexico = "Other",
Slovakia = "Other",
Bulgaria = "Other",
Canada = "Other",
Croatia = "Other",
Uruguay = "Other",
Georgia = "Other",
Turkey = "Other",
Moldova = "Other",
Slovenia = "Other",
Hungary = "Other",
Switzerland = "Other",
Greece = "Other",
Israel = "Other",
Lebanon= "Other"))

Importing Red.csv from your link with readr::read_csv() creates a data.frame / tibble:
#> data
# A tibble: 8,666 × 8
Name Country Region Winery Rating NumberOf…¹ Price Year
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr>
1 Pomerol 2011 France Pomerol Château La Providence 4.2 100 95 2011
2 Lirac 2017 France Lirac Château Mont-Redon 4.3 100 15.5 2017
3 Erta e China Rosso di Toscana 2015 Italy Toscana Renzo Masi 3.9 100 7.45 2015
4 Bardolino 2019 Italy Bardolino Cavalchina 3.5 100 8.72 2019
5 Ried Scheibner Pinot Noir 2016 Austria Carnuntum Markowitsch 3.9 100 29.2 2016
6 Gigondas (Nobles Terrasses) 2017 France Gigondas Vieux Clocher 3.7 100 19.9 2017
7 Marion's Vineyard Pinot Noir 2016 New Zealand Wairarapa Schubert 4 100 43.9 2016
8 Red Blend 2014 Chile Itata Valley Viña La Causa 3.9 100 17.5 2014
9 Chianti 2015 Italy Chianti Castello Montaùto 3.6 100 10.8 2015
10 Tradition 2014 France Minervois Domaine des Aires Hautes 3.5 100 6.9 2014
# … with 8,656 more rows, and abbreviated variable name ¹​NumberOfRatings
Now, with dplyr's help,
library(dplyr)
data %>%
add_count(Country, name = "WineCount") %>%
mutate(CountryOther = ifelse(WineCount <= 20, "Other", Country))
we get
# A tibble: 8,666 × 10
Name Country Region Winery Rating Numbe…¹ Price Year WineC…² Count…³
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int> <chr>
1 Pomerol 2011 France Pomerol Château La… 4.2 100 95 2011 2256 France
2 Lirac 2017 France Lirac Château Mo… 4.3 100 15.5 2017 2256 France
3 Erta e China Rosso di Toscana 2015 Italy Toscana Renzo Masi 3.9 100 7.45 2015 2650 Italy
4 Bardolino 2019 Italy Bardolino Cavalchina 3.5 100 8.72 2019 2650 Italy
5 Ried Scheibner Pinot Noir 2016 Austria Carnuntum Markowitsch 3.9 100 29.2 2016 220 Austria
6 Gigondas (Nobles Terrasses) 2017 France Gigondas Vieux Cloc… 3.7 100 19.9 2017 2256 France
7 Marion's Vineyard Pinot Noir 2016 New Zealand Wairarapa Schubert 4 100 43.9 2016 63 New Ze…
8 Red Blend 2014 Chile Itata Valley Viña La Ca… 3.9 100 17.5 2014 326 Chile
9 Chianti 2015 Italy Chianti Castello M… 3.6 100 10.8 2015 2650 Italy
10 Tradition 2014 France Minervois Domaine de… 3.5 100 6.9 2014 2256 France
# … with 8,656 more rows, and abbreviated variable names ¹​NumberOfRatings, ²​WineCount, ³​CountryOther
We can filter for WineCount <= 30:
# A tibble: 125 × 10
Name Country Region Winery Rating Numbe…¹ Price Year WineC…² Count…³
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int> <chr>
1 Steiner 2013 Hungary Sopron Wenin… 3.7 100 24.5 2013 9 Other
2 Viile Metamorfosis Merlot 2015 Romania Dealu Mare Vitis… 3.5 102 7.5 2015 23 Romania
3 Halkidiki Limnio - Merlot 2013 Greece Chalkidiki Tsant… 3.2 105 12.5 2013 13 Other
4 Cabernet Sauvignon 2013 Mexico Valle de Guad… L. A.… 3.4 1066 8.65 2013 1 Other
5 Driopi Classic Agiorgitiko Nemea 2017 Greece Nemea Κτημα… 3.7 107 11.5 2017 13 Other
6 Malbec de Purcari 2018 Moldova South Eastern Châte… 4.1 107 12.0 2018 8 Other
7 Cabernet Sauvignon de Purcari 2017 Moldova South Eastern Châte… 4.1 1082 13.0 2017 8 Other
8 Cabernet Sauvignon 2016 Romania Samburesti Caste… 3.3 112 7.9 2016 23 Romania
9 Aigle Les Murailles Rouge 2015 Switzerland Aigle Henri… 3.7 112 23.2 2015 12 Other
10 Γουμένισσα (Goumenissa) 2015 Greece Goumenissa Chatz… 3.7 115 20 2015 13 Other
This confirms the desired output: rows for countries with 20 or fewer wines have "Other" in column CountryOther, while Romania (23 wines) keeps its name.
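For readers without Red.csv at hand, here is a minimal sketch of the same approach on toy data. The wines tibble, its values, and the threshold of 1 are made up for illustration (the question uses 20); only add_count(), mutate(), and if_else() come from dplyr. Note that the result is assigned to a new object, because dplyr verbs return a modified copy rather than changing the data frame in place.

```r
library(dplyr)

# Toy stand-in for the wine data (hypothetical values, not taken from Red.csv)
wines <- tibble(
  Name    = c("a", "b", "c", "d", "e", "f"),
  Country = c("France", "France", "France", "Italy", "Italy", "Mexico")
)

# The question uses 20; 1 keeps the toy example small
threshold <- 1

# Count wines per country, then recode rare countries to "Other".
# The assignment is what stores the change.
wines_extended <- wines %>%
  add_count(Country, name = "WineCount") %>%
  mutate(CountryOther = if_else(WineCount <= threshold, "Other", Country))

unique(wines_extended$CountryOther)
#> [1] "France" "Italy"  "Other"
```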

In the end I created this code, which works:
#New table with wine count
wineCount <- data %>% count(Country)
#Joining two tables together
dataExtended <- inner_join(wineCount, data, by = "Country")
# Creating new variable CountryOther
dataExtended$CountryOther <- dataExtended$Country
# Renaming count from n to WineCount
dataExtended <- rename(dataExtended, WineCount = n)
# Replacement of countries with WineCount<=20 to Other
dataExtended <- dataExtended %>%
mutate(CountryOther = ifelse(WineCount<=20, "Other", CountryOther))
# Final check
unique(dataExtended$CountryOther)
The problem was that I needed to store the changes back into the data frame, which I hadn't done before (as you can see in my last comment):
dataExtended <- rename(dataExtended, WineCount = n)
and
dataExtended <- dataExtended %>%
mutate(CountryOther = ifelse(WineCount<=20, "Other", CountryOther))
I also tested your code; it works as well and looks neater. Thank you very much for your help.

Related

How can I merge variables to my dataframe from another dataframe if the year is the same?

I have the dataframe assets_year:
fiscalyear countryname Assets net_margin
<int> <chr> <dbl> <dbl>
1 2010 Austria 1602544072. 1.72
2 2010 Belgium 2534519957. 0.974
3 2010 Estonia 33248259. 1.31
4 2010 Finland 1490200498. 1.42
5 2010 France 17137601040. 1.51
6 2010 Germany 11553780086. 2.32
tail
fiscalyear countryname Assets net_margin
<int> <chr> <dbl> <dbl>
1 2017 Luxembourg 503785373. 0.730
2 2017 Netherlands 3810079489. 1.40
3 2017 Portugal 504072448. 1.73
4 2017 Slovakia 61735274. 2.49
5 2017 Slovenia 41642423. 1.96
6 2017 Spain 4397884239. 1.39
Additionally, I summed up the asset values per year in another DF:
fiscalyear `sum(Assets)`
<int> <dbl>
1 2010 52192928317.
2 2011 55914561036.
3 2012 52202110772.
4 2013 42418952433.
5 2014 53001352848.
6 2015 43550880007.
In order to scale net margin per asset value, I would like to cbind(...) the sum(Assets) to my pre-existing dataframe, which is in panel format: every country has an entry for 2010, 2011, ..., 2017.
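One possible way to do this is a join on fiscalyear rather than cbind(), since the two tables have different numbers of rows. The sketch below uses a small made-up stand-in for assets_year; the numbers and the column name total_assets are assumptions chosen for illustration.

```r
library(dplyr)

# Small stand-in for assets_year (hypothetical numbers)
assets_year <- tibble(
  fiscalyear  = c(2010L, 2010L, 2011L, 2011L),
  countryname = c("Austria", "Belgium", "Austria", "Belgium"),
  Assets      = c(100, 300, 150, 350),
  net_margin  = c(1.7, 1.0, 1.5, 1.2)
)

# Yearly totals with an explicit column name instead of `sum(Assets)`
yearly_totals <- assets_year %>%
  group_by(fiscalyear) %>%
  summarise(total_assets = sum(Assets), .groups = "drop")

# Every country-year row picks up the total for its year
merged <- left_join(assets_year, yearly_totals, by = "fiscalyear")

# Same result in one step, without the intermediate table
merged_direct <- assets_year %>%
  group_by(fiscalyear) %>%
  mutate(total_assets = sum(Assets)) %>%
  ungroup()
```

The second variant skips the join entirely: a grouped mutate() repeats the group sum on every row of that year.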

using separate() to separate numbers stuck together (Example: 201612) in R [duplicate]

This question already has answers here:
Splitting Columns by Number of Characters [duplicate]
(2 answers)
Closed 2 years ago.
I want to separate the month_date_yyyymm column from this tibble:
month_date_yyyymm postal_code zip_name nielsen_hh_rank hotness_rank hotness_score
<dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 201612 80230 denver, co 8459 3420 74.0
2 201612 80503 longmont, co 2233 6088 60.7
3 201612 38221 big sandy, tn 15014 12539 25.5
4 201612 13691 theresa, ny 15586 14796 11.6
5 201612 19076 prospect park, pa 11777 1661 84.4
6 201612 18036 coopersburg, pa 8235 7870 51.5
I want the tibble to look like this
year month postal_code zip_name nielsen_hh_rank hotness_rank hotness_score
<chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 2016 12 80230 denver, co 8459 3420 74.0
2 2016 12 80503 longmont, co 2233 6088 60.7
3 2016 12 38221 big sandy, tn 15014 12539 25.5
4 2016 12 13691 theresa, ny 15586 14796 11.6
5 2016 12 19076 prospect park, pa 11777 1661 84.4
6 2016 12 18036 coopersburg, pa 8235 7870 51.5
I can't figure out how to separate numbers that are stuck together, such as the month_date_yyyymm column. I know it has something to do with sep = in the separate function. Here is my code:
hotness_cleaned <- hotness %>% separate(month_date_yyyymm, into = c("year", "month"), sep = "2016", remove = T)
However, it's showing up like this:
year month postal_code zip_name nielsen_hh_rank hotness_rank hotness_score
<chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 "" 12 80230 denver, co 8459 3420 74.0
2 "" 12 80503 longmont, co 2233 6088 60.7
3 "" 12 38221 big sandy, tn 15014 12539 25.5
4 "" 12 13691 theresa, ny 15586 14796 11.6
5 "" 12 19076 prospect park, pa 11777 1661 84.4
6 "" 12 18036 coopersburg, pa 8235 7870 51.5
What is the correct syntax for separating numbers that are stuck together using "sep = "?
Thank you.
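We can specify a position index in sep. A numeric sep splits at that character position, whereas a character sep such as "2016" is treated as a regular expression that is matched and removed, which is why year came out empty: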
library(dplyr)
library(tidyr)
hotness %>%
separate(month_date_yyyymm, into = c("year", "month"),
sep = 4, remove = TRUE, convert = TRUE)
-output
# year month postal_code zip_name nielsen_hh_rank hotness_rank hotness_score
#1 2016 12 80230 denver, co 8459 3420 74.0
#2 2016 12 80503 longmont, co 2233 6088 60.7
#3 2016 12 38221 big sandy, tn 15014 12539 25.5
#4 2016 12 13691 theresa, ny 15586 14796 11.6
#5 2016 12 19076 prospect park, pa 11777 1661 84.4
#6 2016 12 18036 coopersburg, pa 8235 7870 51.5
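As a dependency-free alternative, the same split can be sketched in base R. The data frame hotness_small and its second row (201701) are made up here to show that the month is not hard-coded. Two variants are shown: one slices the digits as text with substr(), the other uses integer division and remainder.

```r
# Hypothetical two-row stand-in for the hotness data
hotness_small <- data.frame(
  month_date_yyyymm = c(201612L, 201701L),
  postal_code       = c(80230L, 80503L)
)

# Variant 1: treat the number as text and slice by character position
ym <- as.character(hotness_small$month_date_yyyymm)
hotness_small$year  <- as.integer(substr(ym, 1, 4))
hotness_small$month <- as.integer(substr(ym, 5, 6))

# Variant 2: pure arithmetic on the integer
year_arith  <- hotness_small$month_date_yyyymm %/% 100L
month_arith <- hotness_small$month_date_yyyymm %% 100L

# Drop the original column, as remove = TRUE does in separate()
hotness_small$month_date_yyyymm <- NULL
```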
data
hotness <- structure(list(month_date_yyyymm = c(201612L, 201612L, 201612L,
201612L, 201612L, 201612L), postal_code = c(80230L, 80503L, 38221L,
13691L, 19076L, 18036L), zip_name = c("denver, co", "longmont, co",
"big sandy, tn", "theresa, ny", "prospect park, pa", "coopersburg, pa"
), nielsen_hh_rank = c(8459L, 2233L, 15014L, 15586L, 11777L,
8235L), hotness_rank = c(3420L, 6088L, 12539L, 14796L, 1661L,
7870L), hotness_score = c(74, 60.7, 25.5, 11.6, 84.4, 51.5)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))

Merging two data frames with different rows in R

I have two data frames. The first one looks like
Country Year production
Germany 1996 11
France 1996 12
Greece 1996 15
UK 1996 17
USA 1996 24
The second one contains all the countries that are in the first data frame plus a few more countries for year 2018. It looks like this
Country Year production
Germany 2018 27
France 2018 29
Greece 2018 44
UK 2018 46
USA 2018 99
Austria 2018 56
Japan 2018 66
I would like to merge the two data frames, and the final table should look like this:
Country Year production
Germany 1996 11
France 1996 12
Greece 1996 15
UK 1996 17
USA 1996 24
Austria 1996 NA
Japan 1996 NA
Germany 2018 27
France 2018 29
Greece 2018 44
UK 2018 46
USA 2018 99
Austria 2018 56
Japan 2018 66
I've tried several functions including full_join, merge, and rbind but they didn't work. Does anybody have any ideas?
With dplyr and tidyr, you may use:
bind_rows(df1, df2) %>%
complete(Country, Year)
Country Year production
<chr> <int> <int>
1 Austria 1996 NA
2 Austria 2018 56
3 France 1996 12
4 France 2018 29
5 Germany 1996 11
6 Germany 2018 27
7 Greece 1996 15
8 Greece 2018 44
9 Japan 1996 NA
10 Japan 2018 66
11 UK 1996 17
12 UK 2018 46
13 USA 1996 24
14 USA 2018 99
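For completeness, here is a self-contained sketch of the same pipeline with the library calls and input data spelled out. The names df1 and df2 follow the answer above; the final arrange() step is an addition of mine, assuming the row order from the question (countries in the order they appear in df2) is wanted.

```r
library(dplyr)
library(tidyr)

# The two data frames from the question
df1 <- tibble(
  Country    = c("Germany", "France", "Greece", "UK", "USA"),
  Year       = 1996L,
  production = c(11L, 12L, 15L, 17L, 24L)
)
df2 <- tibble(
  Country    = c("Germany", "France", "Greece", "UK", "USA", "Austria", "Japan"),
  Year       = 2018L,
  production = c(27L, 29L, 44L, 46L, 99L, 56L, 66L)
)

combined <- bind_rows(df1, df2) %>%
  # Add a row for every Country/Year pair; missing production becomes NA
  complete(Country, Year) %>%
  # Optional: order by year, then by the country order used in df2
  arrange(Year, match(Country, df2$Country))
```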
Consider base R with expand.grid and merge (this avoids any dependencies, should you be a package author). Here the two data frames are named df_96 and df_18:
# BUILD DF OF ALL POSSIBLE COMBINATIONS OF COUNTRY AND YEAR
all_country_years <- expand.grid(Country=unique(c(df_96$Country, df_18$Country)),
Year=c(1996, 2018))
# MERGE (LEFT JOIN)
final_df <- merge(all_country_years, rbind(df_96, df_18), by=c("Country", "Year"),
all.x=TRUE)
# ORDER DATA AND RESET ROW NAMES
final_df <- data.frame(with(final_df, final_df[order(Year, Country),]),
row.names = NULL)
final_df
# Country Year production
# 1 Germany 1996 11
# 2 France 1996 12
# 3 Greece 1996 15
# 4 UK 1996 17
# 5 USA 1996 24
# 6 Austria 1996 NA
# 7 Japan 1996 NA
# 8 Germany 2018 27
# 9 France 2018 29
# 10 Greece 2018 44
# 11 UK 2018 46
# 12 USA 2018 99
# 13 Austria 2018 56
# 14 Japan 2018 66

Rescale data frame columns as percentages of baseline entry with dplyr

I often need to rescale time series relative to their value at a certain baseline time (usually as a percent of the baseline). Here's an example.
> library(dplyr)
> library(magrittr)
> library(tibble)
> library(tidyr)
# [messages from package imports snipped]
> set.seed(42)
> mexico <- tibble(Year=2000:2004, Country='Mexico', A=10:14+rnorm(5), B=20:24+rnorm(5))
> usa <- tibble(Year=2000:2004, Country='USA', A=30:34+rnorm(5), B=40:44+rnorm(5))
> table <- rbind(mexico, usa)
> table
# A tibble: 10 x 4
Year Country A B
<int> <chr> <dbl> <dbl>
1 2000 Mexico 11.4 19.9
2 2001 Mexico 10.4 22.5
3 2002 Mexico 12.4 21.9
4 2003 Mexico 13.6 25.0
5 2004 Mexico 14.4 23.9
6 2000 USA 31.3 40.6
7 2001 USA 33.3 40.7
8 2002 USA 30.6 39.3
9 2003 USA 32.7 40.6
10 2004 USA 33.9 45.3
I want to scale A and B to express each value as a percent of the country-specific 2001 value (i.e., the A and B entries in rows 2 and 7 should be 100). My way of doing this is somewhat roundabout and awkward: extract the baseline values into a separate table, merge them back into a separate column in the main table, and then compute scaled values, with annoying intermediate gathering and spreading to avoid specifying the column names of each time series (real data sets can have far more than two value columns). Is there a better way to do this, ideally with a single short pipeline?
> long_table <- table %>% gather(variable, value, -Year, -Country)
> long_table
# A tibble: 20 x 4
Year Country variable value
<int> <chr> <chr> <dbl>
1 2000 Mexico A 11.4
2 2001 Mexico A 10.4
#[remaining tibble printout snipped]
> baseline_table <- long_table %>%
filter(Year == 2001) %>%
select(-Year) %>%
rename(baseline=value)
> baseline_table
# A tibble: 4 x 3
Country variable baseline
<chr> <chr> <dbl>
1 Mexico A 10.4
2 USA A 33.3
3 Mexico B 22.5
4 USA B 40.7
> normalized_table <- long_table %>%
inner_join(baseline_table) %>%
mutate(value=100*value/baseline) %>%
select(-baseline) %>%
spread(variable, value) %>%
arrange(Country, Year)
Joining, by = c("Country", "variable")
> normalized_table
# A tibble: 10 x 4
Year Country A B
<int> <chr> <dbl> <dbl>
1 2000 Mexico 109. 88.4
2 2001 Mexico 100. 100
3 2002 Mexico 118. 97.3
4 2003 Mexico 131. 111.
5 2004 Mexico 138. 106.
6 2000 USA 94.0 99.8
7 2001 USA 100 100
8 2002 USA 92.0 96.6
9 2003 USA 98.3 99.6
10 2004 USA 102. 111.
My second attempt was to use transform, but this failed because transform doesn't seem to recognize dplyr groups, and it would be suboptimal even if it worked because it requires me to know that 2001 is the second year in the time series.
> table %>%
arrange(Country, Year) %>%
gather(variable, value, -Year, -Country) %>%
group_by(Country, variable) %>%
transform(norm=value*100/value[2])
Year Country variable value norm
1 2000 Mexico A 11.37096 108.9663
2 2001 Mexico A 10.43530 100.0000
3 2002 Mexico A 12.36313 118.4741
4 2003 Mexico A 13.63286 130.6418
5 2004 Mexico A 14.40427 138.0340
6 2000 USA A 31.30487 299.9901
7 2001 USA A 33.28665 318.9811
8 2002 USA A 30.61114 293.3422
9 2003 USA A 32.72121 313.5627
10 2004 USA A 33.86668 324.5395
11 2000 Mexico B 19.89388 190.6402
12 2001 Mexico B 22.51152 215.7247
13 2002 Mexico B 21.90534 209.9157
14 2003 Mexico B 25.01842 239.7480
15 2004 Mexico B 23.93729 229.3876
16 2000 USA B 40.63595 389.4085
17 2001 USA B 40.71575 390.1732
18 2002 USA B 39.34354 377.0235
19 2003 USA B 40.55953 388.6762
20 2004 USA B 45.32011 434.2961
It would be nice for this to be more scalable, but here's a simple solution. You can refer to A[Year == 2001] inside mutate, much as you might do table$A[table$Year == 2001] in base R. This lets you scale against your baseline of 2001 or whatever other year you might need.
Edit: I was missing a group_by to ensure that values are only being scaled against other values in their own group. The "sanity check" (that I clearly didn't do) is that values for Mexico in 2001 should have a scaled value of 1, and same for USA and any other countries.
library(tidyverse)
set.seed(42)
mexico <- tibble(Year=2000:2004, Country='Mexico', A=10:14+rnorm(5), B=20:24+rnorm(5))
usa <- tibble(Year=2000:2004, Country='USA', A=30:34+rnorm(5), B=40:44+rnorm(5))
table <- rbind(mexico, usa)
table %>%
group_by(Country) %>%
mutate(A_base2001 = A / A[Year == 2001], B_base2001 = B / B[Year == 2001])
#> # A tibble: 10 x 6
#> # Groups: Country [2]
#> Year Country A B A_base2001 B_base2001
#> <int> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 Mexico 11.4 19.9 1.09 0.884
#> 2 2001 Mexico 10.4 22.5 1 1
#> 3 2002 Mexico 12.4 21.9 1.18 0.973
#> 4 2003 Mexico 13.6 25.0 1.31 1.11
#> 5 2004 Mexico 14.4 23.9 1.38 1.06
#> 6 2000 USA 31.3 40.6 0.940 0.998
#> 7 2001 USA 33.3 40.7 1 1
#> 8 2002 USA 30.6 39.3 0.920 0.966
#> 9 2003 USA 32.7 40.6 0.983 0.996
#> 10 2004 USA 33.9 45.3 1.02 1.11
Created on 2018-05-23 by the reprex package (v0.2.0).
Inspired by Camille's answer, I found one simple approach that scales well:
table %>%
gather(variable, value, -Year, -Country) %>%
group_by(Country, variable) %>%
mutate(value=100*value/value[Year == 2001]) %>%
spread(variable, value)
# A tibble: 10 x 4
# Groups:   Country [2]
Year Country A B
<int> <chr> <dbl> <dbl>
1 2000 Mexico 109. 88.4
2 2000 USA 94.0 99.8
3 2001 Mexico 100. 100
4 2001 USA 100 100
5 2002 Mexico 118. 97.3
6 2002 USA 92.0 96.6
7 2003 Mexico 131. 111.
8 2003 USA 98.3 99.6
9 2004 Mexico 138. 106.
10 2004 USA 102. 111.
Preserving the original values alongside the scaled ones takes more work. Here are two approaches. The first uses an extra gather call to produce two variable-name columns (one indicating the series name, the other marking original or scaled), then unites them into one column and reformats.
table %>%
gather(variable, original, -Year, -Country) %>%
group_by(Country, variable) %>%
mutate(scaled=100*original/original[Year == 2001]) %>%
gather(scaled, value, -Year, -Country, -variable) %>%
unite(variable_scaled, variable, scaled, sep='_') %>%
mutate(variable_scaled=gsub("_original", "", variable_scaled)) %>%
spread(variable_scaled, value)
# A tibble: 10 x 6
# Groups:   Country [2]
Year Country A A_scaled B B_scaled
<int> <chr> <dbl> <dbl> <dbl> <dbl>
1 2000 Mexico 11.4 109. 19.9 88.4
2 2000 USA 31.3 94.0 40.6 99.8
3 2001 Mexico 10.4 100. 22.5 100
4 2001 USA 33.3 100 40.7 100
5 2002 Mexico 12.4 118. 21.9 97.3
6 2002 USA 30.6 92.0 39.3 96.6
7 2003 Mexico 13.6 131. 25.0 111.
8 2003 USA 32.7 98.3 40.6 99.6
9 2004 Mexico 14.4 138. 23.9 106.
10 2004 USA 33.9 102. 45.3 111.
A second, equivalent approach creates a new table with the columns scaled "in place" and then merges it back with the original one.
table %>%
gather(variable, value, -Year, -Country) %>%
group_by(Country, variable) %>%
mutate(value=100*value/value[Year == 2001]) %>%
ungroup() %>%
mutate(variable=paste(variable, 'scaled', sep='_')) %>%
spread(variable, value) %>%
inner_join(table)
Joining, by = c("Year", "Country")
# A tibble: 10 x 6
Year Country A_scaled B_scaled A B
<int> <chr> <dbl> <dbl> <dbl> <dbl>
1 2000 Mexico 109. 88.4 11.4 19.9
2 2000 USA 94.0 99.8 31.3 40.6
3 2001 Mexico 100. 100 10.4 22.5
4 2001 USA 100 100 33.3 40.7
5 2002 Mexico 118. 97.3 12.4 21.9
6 2002 USA 92.0 96.6 30.6 39.3
7 2003 Mexico 131. 111. 13.6 25.0
8 2003 USA 98.3 99.6 32.7 40.6
9 2004 Mexico 138. 106. 14.4 23.9
10 2004 USA 102. 111. 33.9 45.3
It's possible to replace the final inner_join with arrange(Country, Year) %>% select(-Country, -Year) %>% bind_cols(table), which may perform better for some data sets, though it orders the columns suboptimally and relies on table already being sorted by Country and Year.
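With dplyr 1.0.0 or later, the reshaping can probably be avoided altogether by using across() inside a grouped mutate(). This is a sketch under that version assumption, not something from the answers above. The series tibble and its round numbers are made up so the result is easy to check by hand. The .names argument keeps the original columns and adds the scaled ones next to them.

```r
library(dplyr)

# Hypothetical data with round numbers (2001 is the baseline year)
series <- tibble(
  Year    = rep(2000:2002, times = 2),
  Country = rep(c("Mexico", "USA"), each = 3),
  A       = c(11, 10, 12, 30, 40, 50),
  B       = c(20, 25, 30, 44, 40, 42)
)

scaled <- series %>%
  group_by(Country) %>%
  # For each value column, divide by that country's 2001 value
  mutate(across(c(A, B),
                ~ 100 * .x / .x[Year == 2001],
                .names = "{.col}_scaled")) %>%
  ungroup()

scaled$A_scaled
#> [1] 110 100 120  75 100 125
```

For wider data, c(A, B) can be swapped for a tidyselect expression so the value columns need not be listed by name.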

Remove rows with NA values and delete those observations in another year [duplicate]

This question already has answers here:
Filter rows in R based on values in multiple rows
(2 answers)
Closed 5 years ago.
I find it a bit hard to find the right words for what I'm trying to do.
Say I have this dataframe:
library(dplyr)
# A tibble: 74 x 3
country year conf_perc
<chr> <dbl> <dbl>
1 Canada 2017 77
2 France 2017 45
3 Germany 2017 60
4 Greece 2017 33
5 Hungary 2017 67
6 Italy 2017 38
7 Canada 2009 88
8 France 2009 91
9 Germany 2009 93
10 Greece 2009 NA
11 Hungary 2009 NA
12 Italy 2009 NA
Now I want to delete the rows that have NA values in 2009 but then I want to remove the rows of those countries in 2017 as well. I would like to get the following results:
# A tibble: 74 x 3
country year conf_perc
<chr> <dbl> <dbl>
1 Canada 2017 77
2 France 2017 45
3 Germany 2017 60
4 Canada 2009 88
5 France 2009 91
6 Germany 2009 93
We can use any() after grouping by 'country' to drop every country that has at least one NA:
library(dplyr)
df1 %>%
group_by(country) %>%
filter(!any(is.na(conf_perc)))
# A tibble: 6 x 3
# Groups: country [3]
# country year conf_perc
# <chr> <int> <int>
#1 Canada 2017 77
#2 France 2017 45
#3 Germany 2017 60
#4 Canada 2009 88
#5 France 2009 91
#6 Germany 2009 93
base R solution:
foo <- df$year == 2009 & is.na(df$conf_perc)
bar <- df$year == 2017 & df$country %in% unique(df$country[foo])
df[-c(which(foo), which(bar)), ]
# country year conf_perc
# 1 Canada 2017 77
# 2 France 2017 45
# 3 Germany 2017 60
# 7 Canada 2009 88
# 8 France 2009 91
# 9 Germany 2009 93
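A shorter base R variant is sketched below, with the example data spelled out so it can be run as is. It drops every country that has an NA in any year, which matches the grouped filter above rather than the 2009-specific logic. The data frame is rebuilt from the twelve rows shown in the question; the name incomplete is my own.

```r
# The twelve rows shown in the question
df1 <- data.frame(
  country   = rep(c("Canada", "France", "Germany", "Greece", "Hungary", "Italy"),
                  times = 2),
  year      = rep(c(2017, 2009), each = 6),
  conf_perc = c(77, 45, 60, 33, 67, 38, 88, 91, 93, NA, NA, NA)
)

# Countries with at least one missing value in any year
incomplete <- unique(df1$country[is.na(df1$conf_perc)])

# Keep only the rows of countries that are complete in every year
result <- df1[!df1$country %in% incomplete, ]
```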
