How to find rolling top 3 values in a column by group?

A data frame has 3 columns: ID, Country, and Date.
The 3 columns record each person's travel history.
3 more columns need to be created, representing the rolling top 3 countries this person (ID) has travelled to most often before the date on the row.
(If two countries are tied, the more recently visited country takes precedence.)
mydata <- data.frame(ID = c('A1B1', 'A1B1', 'A1B1', 'A1B1', 'A1B1', 'A1B1', 'A1B1', 'A1B1', 'A2B2', 'A2B2', 'A2B2', 'A2B2', 'A2B2', 'A2B2'),
Country = c('Japan', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Japan', 'France', 'UK', 'Spain', 'Spain', 'UK', 'UK', 'Brazil'),
Date = as.Date(c('2010/01/02', '2010/04/18', '2011/03/22', '2011/11/23', '2012/05/09', '2012/09/11', '2014/01/06', '2015/12/11', '2010/04/03', '2010/05/11', '2011/05/01', '2012/03/01', '2013/01/03', '2014/01/04')))
# final data should look like below
#ID Country Date Pref1 Pref2 Pref3
#A1B1 Japan 2010-01-02 NA NA NA
#A1B1 USA 2010-04-18 Japan NA NA
#A1B1 USA 2011-03-22 USA Japan NA
#A1B1 USA 2011-11-23 USA Japan NA
#A1B1 Germany 2012-05-09 USA Japan NA
#A1B1 Germany 2012-09-11 USA Germany Japan
#A1B1 Japan 2014-01-06 USA Germany Japan
#A1B1 France 2015-12-11 USA Japan Germany
#A2B2 UK 2010-04-03 NA NA NA
#A2B2 Spain 2010-05-11 UK NA NA
#A2B2 Spain 2011-05-01 Spain UK NA
#A2B2 UK 2012-03-01 Spain UK NA
#A2B2 UK 2013-01-03 UK Spain NA
#A2B2 Brazil 2014-01-04 UK Spain NA
Q. How can I create the last 3 columns, holding the rolling top 3 countries by count for each ID?

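Here is one way: for each row, take the three most frequent countries among that ID's earlier rows. Note that ties are not broken by recency here, so rows 3, 8 and 13 of the output differ from the desired result.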
library(dplyr)
mydata %>%
group_by(ID) %>%
mutate(data = purrr::map(row_number(), ~{
un_country <- Country[seq_len(.x - 1)]
if(.x == 1) un_country <- NA
else un_country <- names(sort(table(un_country), decreasing = TRUE))[1:3]
data.frame(t(un_country[1:3]))
})) %>%
tidyr::unnest_wider(data)
# ID Country Date X1 X2 X3
# <chr> <chr> <date> <chr> <chr> <chr>
# 1 A1B1 Japan 2010-01-02 NA NA NA
# 2 A1B1 USA 2010-04-18 Japan NA NA
# 3 A1B1 USA 2011-03-22 Japan USA NA
# 4 A1B1 USA 2011-11-23 USA Japan NA
# 5 A1B1 Germany 2011-05-09 USA Japan NA
# 6 A1B1 Germany 2012-09-11 USA Germany Japan
# 7 A1B1 Japan 2014-01-06 USA Germany Japan
# 8 A1B1 France 2015-12-11 USA Germany Japan
# 9 A2B2 UK 2010-04-03 NA NA NA
#10 A2B2 Spain 2010-05-11 UK NA NA
#11 A2B2 Spain 2011-05-01 Spain UK NA
#12 A2B2 UK 2012-03-01 Spain UK NA
#13 A2B2 UK 2013-01-03 Spain UK NA
#14 A2B2 Brazil 2014-01-04 UK Spain NA
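
The desired output in the question gives ties to the most recently visited country. The following is a minimal base R sketch of that rule, not the method used in the answer above. It assumes rows are sorted by Date within each ID, and rolling_top3 and trips are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: for each row, rank the countries seen on earlier rows
# by visit count, breaking ties in favour of the most recent visit.
rolling_top3 <- function(country) {
  country <- as.character(country)
  out <- matrix(NA_character_, nrow = length(country), ncol = 3,
                dimnames = list(NULL, paste0("Pref", 1:3)))
  for (i in seq_along(country)[-1]) {
    prev <- country[seq_len(i - 1)]          # visits strictly before row i
    u    <- unique(prev)
    n    <- vapply(u, function(k) sum(prev == k), integer(1))         # visit counts
    last <- vapply(u, function(k) max(which(prev == k)), integer(1))  # latest position
    out[i, ] <- u[order(-n, -last)][1:3]     # indexing past the end pads with NA
  }
  out
}

# Small illustrative data set (assumed already sorted by ID, then by date)
trips <- data.frame(
  ID      = rep(c("A1B1", "A2B2"), times = c(3, 2)),
  Country = c("Japan", "USA", "USA", "UK", "Spain"),
  stringsAsFactors = FALSE
)

# Apply per ID and bind the three preference columns back on
prefs <- do.call(rbind, lapply(split(trips$Country, trips$ID), rolling_top3))
cbind(trips, prefs)
```

Because split() returns the groups in sorted ID order, the row order of prefs only lines up with the data when the data frame itself is sorted by ID first.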

I think this does it. I've included the mydata here as I think there was a typo in one of the dates.
mydata <- data.frame(ID = c('A1B1', 'A1B1', 'A1B1', 'A1B1', 'A1B1', 'A1B1', 'A1B1', 'A1B1', 'A2B2', 'A2B2', 'A2B2', 'A2B2', 'A2B2', 'A2B2'),
Country = c('Japan', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Japan', 'France', 'UK', 'Spain', 'Spain', 'UK', 'UK', 'Brazil'),
Date = as.Date(c('2010/01/02', '2010/04/18', '2011/03/22', '2011/11/23', '2012/05/09', '2012/09/11', '2014/01/06', '2015/12/11', '2010/04/03', '2010/05/11', '2011/05/01', '2012/03/01', '2013/01/03', '2014/01/04')))
library(data.table)
setDT(mydata)
mydata[order(Date), `:=`(num_v = seq_len(.N), last_v = Date), .(ID, Country)]
x <- mydata[
mydata[, CJ(Country = unique(Country), Date = unique(Date)), ID],
on=c('ID', 'Country', 'Date'), roll=Inf]
x[, `:=`(num_v = shift(num_v), last_v = shift(last_v)), .(ID, Country)]
x[is.na(num_v), Country := NA]
y <- x[,
.SD[order(-num_v, -last_v)][1:3, .(Pref = paste0('Pref',1:3), Country)],
.(ID, Date)]
dcast(y, ID+Date~Pref, value.var = 'Country')
#> ID Date Pref1 Pref2 Pref3
#> 1: A1B1 2010-01-02 <NA> <NA> <NA>
#> 2: A1B1 2010-04-18 Japan <NA> <NA>
#> 3: A1B1 2011-03-22 USA Japan <NA>
#> 4: A1B1 2011-11-23 USA Japan <NA>
#> 5: A1B1 2012-05-09 USA Japan <NA>
#> 6: A1B1 2012-09-11 USA Germany Japan
#> 7: A1B1 2014-01-06 USA Germany Japan
#> 8: A1B1 2015-12-11 USA Japan Germany
#> 9: A2B2 2010-04-03 <NA> <NA> <NA>
#> 10: A2B2 2010-05-11 UK <NA> <NA>
#> 11: A2B2 2011-05-01 Spain UK <NA>
#> 12: A2B2 2012-03-01 Spain UK <NA>
#> 13: A2B2 2013-01-03 UK Spain <NA>
#> 14: A2B2 2014-01-04 UK Spain <NA>
You can join back on the Country from the original mydata if you need it.

This isn't a super clean answer, and it ranks each ID's full travel history rather than a rolling window. Hopefully it gets you close.
library(readr)
df <- readr::read_table(
"ID Country Date
A1B1 Japan 2010-01-02
A1B1 USA 2010-04-18
A1B1 USA 2011-03-22
A1B1 USA 2011-11-23
A1B1 Germany 2012-05-09
A1B1 Germany 2012-09-11
A1B1 Japan 2014-01-06
A1B1 France 2015-12-11
A2B2 UK 2010-04-03
A2B2 Spain 2010-05-11
A2B2 Spain 2011-05-01
A2B2 UK 2012-03-01
A3B2 UK 2013-01-03
A3B2 Brazil 2014-01-04")
df
library(tidyverse)
rankings <- df %>%
group_by(ID, Country) %>%
summarise(obs = n(),
last_dt = max(Date)) %>%
arrange(ID,-obs, desc(last_dt)) %>%
mutate(rank = 1:n()) %>% print() %>%
filter(rank <= 3) %>%
pivot_wider(
names_from = rank,
values_from = Country,
names_prefix = "rank_",
id_cols = ID
) %>% print()
#> `summarise()` regrouping output by 'ID' (override with `.groups` argument)
#> # A tibble: 8 x 5
#> # Groups: ID [3]
#> ID Country obs last_dt rank
#> <chr> <chr> <int> <date> <int>
#> 1 A1B1 USA 3 2011-11-23 1
#> 2 A1B1 Japan 2 2014-01-06 2
#> 3 A1B1 Germany 2 2012-09-11 3
#> 4 A1B1 France 1 2015-12-11 4
#> 5 A2B2 UK 2 2012-03-01 1
#> 6 A2B2 Spain 2 2011-05-01 2
#> 7 A3B2 Brazil 1 2014-01-04 1
#> 8 A3B2 UK 1 2013-01-03 2
#> # A tibble: 3 x 4
#> # Groups: ID [3]
#> ID rank_1 rank_2 rank_3
#> <chr> <chr> <chr> <chr>
#> 1 A1B1 USA Japan Germany
#> 2 A2B2 UK Spain <NA>
#> 3 A3B2 Brazil UK <NA>
df %>% left_join(rankings, by = "ID")
#> # A tibble: 14 x 6
#> ID Country Date rank_1 rank_2 rank_3
#> <chr> <chr> <date> <chr> <chr> <chr>
#> 1 A1B1 Japan 2010-01-02 USA Japan Germany
#> 2 A1B1 USA 2010-04-18 USA Japan Germany
#> 3 A1B1 USA 2011-03-22 USA Japan Germany
#> 4 A1B1 USA 2011-11-23 USA Japan Germany
#> 5 A1B1 Germany 2012-05-09 USA Japan Germany
#> 6 A1B1 Germany 2012-09-11 USA Japan Germany
#> 7 A1B1 Japan 2014-01-06 USA Japan Germany
#> 8 A1B1 France 2015-12-11 USA Japan Germany
#> 9 A2B2 UK 2010-04-03 UK Spain <NA>
#> 10 A2B2 Spain 2010-05-11 UK Spain <NA>
#> 11 A2B2 Spain 2011-05-01 UK Spain <NA>
#> 12 A2B2 UK 2012-03-01 UK Spain <NA>
#> 13 A3B2 UK 2013-01-03 Brazil UK <NA>
#> 14 A3B2 Brazil 2014-01-04 Brazil UK <NA>
Created on 2020-08-29 by the reprex package (v0.3.0)

Here's a messy Base R solution:
rlln_rnk_df <- do.call("rbind", lapply(split(mydata, mydata$ID), function(x){
y <- do.call("rbind", lapply(seq_len(nrow(x)), function(i){
tmp <- x[x$Date <= x$Date[i],]
tmp1 <- cbind(head(tmp[order(tmp$Date, decreasing = TRUE),], 1),
rnk = t(names(sort(table(tmp$Country), decreasing = TRUE))))
tmp1 <- setNames(tmp1, c(names(tmp), paste0("rnk.", 1:(ncol(tmp1) - ncol(tmp)))))
tmp1[,setdiff(paste0("rnk.", 1:(length(unique(mydata$Country)))), names(tmp1))] <- NA_character_
tmp1
}
)
)
z <- y[order(y$Date),]
cbind(ID = z$ID, Country = z$Country, Date = z$Date,
z[match(z$Date, z$Date[2:nrow(z)]), (grep("rnk", names(z), value = TRUE))])
}
)
)
df_clean <- data.frame(rlln_rnk_df[, colSums(is.na(rlln_rnk_df)) < nrow(rlln_rnk_df)],
row.names = NULL)

Related

Creating a new column when two columns satisfy certain conditions in R

My data is like this:
country supporter1 supporter2 supporter3 supporter4 supporter5
USA Albania Germany USA NA NA
France USA France NA NA NA
UK UK Chile Peru NA NA
Germany USA Iran Mexico India Pakistan
USA China Spain NA NA NA
Cuba Cuba UK Germany South Korea NA
China Russia NA NA NA NA
What I want to do is create a new variable that takes the value 1 when the country column matches one of the supporter columns (supporter1 to supporter5), and 0 otherwise. For instance, in the second row country is France and supporter2 is also France, so the new variable should be 1.
I expect to have this:
country supporter1 supporter2 supporter3 supporter4 supporter5 new variable
USA Albania Germany USA NA NA 1
France USA France NA NA NA 1
UK UK Chile Peru NA NA 1
Germany USA Iran Mexico India Pakistan 0
USA China Spain NA NA NA 0
Cuba Cuba UK Germany South Korea NA 1
China Russia NA NA NA NA 0

Update: a dplyr-only solution using if_any:
library(dplyr)
df %>%
rowwise() %>%
mutate(new_var = as.integer(as.logical(if_any(starts_with("supporter"), ~ . %in% country))))
country supporter1 supporter2 supporter3 supporter4 supporter5 new_var
<chr> <chr> <chr> <chr> <chr> <chr> <int>
1 USA Albania Germany USA NA NA 1
2 France USA France NA NA NA 1
3 UK UK Chile Peru NA NA 1
4 Germany USA Iran Mexico India Pakistan 0
5 USA China Spain NA NA NA 0
6 Cuba Cuba UK Germany South Korea NA 1
7 China Russia NA NA NA NA 0
First answer (also correct).
Here is one possible solution:
calculate rowwise;
check whether country appears in columns supporter1 to supporter5;
unite all the new columns into one and use an ifelse statement to return 1 or 0.
library(dplyr)
library(stringr)
library(tidyr)
df %>%
rowwise() %>%
mutate(across(supporter1:supporter5, ~ifelse(. %in% country, 1,0), .names = "new_{col}")) %>%
unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') %>%
mutate(New_Col = ifelse(str_detect(New_Col, "1"), 1,0))
country supporter1 supporter2 supporter3 supporter4 supporter5 New_Col
<chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 USA Albania Germany USA NA NA 1
2 France USA France NA NA NA 1
3 UK UK Chile Peru NA NA 1
4 Germany USA Iran Mexico India Pakistan 0
5 USA China Spain NA NA NA 0
6 Cuba Cuba UK Germany South Korea NA 1
7 China Russia NA NA NA NA 0

Here is a base R solution.
First, mapply checks each supporter* column for equality with country. Comparisons involving NA are forced to FALSE. Then as.integer/rowSums turns rows with at least one TRUE into 1 and all other rows into 0.
eq <- mapply(\(x, y){x == y & !is.na(x)}, df1[-1], df1[1])
as.integer(rowSums(eq) != 0)
#[1] 1 1 1 0 0 1 0
df1$new_variable <- as.integer(rowSums(eq) != 0)
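
A small sketch of why the `& !is.na(x)` guard matters, using made-up three-element vectors (supporter and country here are illustrative, not columns of df1): a comparison with NA yields NA, which would otherwise propagate into the row sum.

```r
# Illustrative vectors only; not taken from df1
supporter <- c("USA", NA, "UK")
country   <- c("USA", "France", "Chile")

raw  <- supporter == country                      # second element is NA
safe <- supporter == country & !is.na(supporter)  # NA & FALSE collapses to FALSE

sum(raw)   # NA, because one comparison is unknown
sum(safe)  # counts only the genuine matches
```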
Data
df1 <- read.table(text = "
country supporter1 supporter2 supporter3 supporter4 supporter5
USA Albania Germany USA NA NA
France USA France NA NA NA
UK UK Chile Peru NA NA
Germany USA Iran Mexico India Pakistan
USA China Spain NA NA NA
Cuba Cuba UK Germany 'South Korea' NA
China Russia NA NA NA NA
", header = TRUE)

Another solution is checking per row whether country is present in one of the columns:
df <- data.frame(country=c("USA","France","UK","Germany","USA","Cuba","China"),
supporter1=c("Albania","USA","UK","USA","China","Cuba","Russia"),
supporter2=c("Germany","France","Chile","Iran","Spain","UK","NA"),
supporter3=c("USA","NA","Peru","Mexico","NA","Germany","NA"),
supporter4=c("NA","NA","NA","India","NA","South Korea","NA"),
supporter5=c("NA","NA","NA","Pakistan","NA","NA","NA"))
The check and its result:
df$new <- sapply(seq(1,nrow(df)), function(x) ifelse(df$country[x] %in% df[x,2:6],1,0))
> df$new
[1] 1 1 1 0 0 1 0

Create a new column from conditions

I have a dataframe with information of some countries and states like this:
data.frame("state1"= c(NA,NA,"Beijing","Beijing","Schleswig-Holstein","Moskva",NA,"Moskva",NA,"Berlin"),
"country1"=c("Spain","Spain","China","China","Germany","Russia","Germany","Russia","Germany","Germany"),
"state2"= c(NA,NA,"Beijing",NA,NA,NA,"Moskva",NA,NA,NA),
"country2"=c("Germany","Germany","China","Germany","","Ukraine","Russia","Germany","Ukraine","" ),
"state3"= c(NA,NA,NA,NA,"Schleswig-Holstein",NA,NA,NA,NA,"Berlin"),
"country3"=c("Spain","Spain","Germany","Germany","Germany","Germany","Germany","Germany","Germany","Germany"))
Now I would like to create a new column holding the German state (the result would look like the one below).
When at least one of the three state variables is a German state, assign it to the new variable.
data.frame("GE_State"=c(NA,NA,NA,NA, "Schleswig-Holstein",NA,NA,NA,NA,"Berlin"))
Please help a beginner with setting up the condition.
Thank you in advance!

Using dplyr::mutate() with case_when() works, although I suspect there should be a more efficient way using across()
library(dplyr)
df %>%
mutate(GE_state = case_when(country1 == "Germany" & !is.na(state1) ~ state1,
country2 == "Germany" & !is.na(state2) ~ state2,
country3 == "Germany" & !is.na(state3) ~ state3,
TRUE ~ NA_character_))
#> state1 country1 state2 country2 state3 country3
#> 1 <NA> Spain <NA> Germany <NA> Spain
#> 2 <NA> Spain <NA> Germany <NA> Spain
#> 3 Beijing China Beijing China <NA> Germany
#> 4 Beijing China <NA> Germany <NA> Germany
#> 5 Schleswig-Holstein Germany <NA> Schleswig-Holstein Germany
#> 6 Moskva Russia <NA> Ukraine <NA> Germany
#> 7 <NA> Germany Moskva Russia <NA> Germany
#> 8 Moskva Russia <NA> Germany <NA> Germany
#> 9 <NA> Germany <NA> Ukraine <NA> Germany
#> 10 Berlin Germany <NA> Berlin Germany
#> GE_state
#> 1 <NA>
#> 2 <NA>
#> 3 <NA>
#> 4 <NA>
#> 5 Schleswig-Holstein
#> 6 <NA>
#> 7 <NA>
#> 8 <NA>
#> 9 <NA>
#> 10 Berlin
Created on 2021-03-31 by the reprex package (v1.0.0)
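
As a loop-based alternative to writing out one case_when() branch per column pair, here is a minimal base R sketch. It assumes the columns follow the state1/country1 ... state3/country3 naming from the question; the variable names ge, st, ct and hit are introduced only for this example.

```r
df <- data.frame(
  state1   = c(NA, NA, "Beijing", "Beijing", "Schleswig-Holstein", "Moskva", NA, "Moskva", NA, "Berlin"),
  country1 = c("Spain", "Spain", "China", "China", "Germany", "Russia", "Germany", "Russia", "Germany", "Germany"),
  state2   = c(NA, NA, "Beijing", NA, NA, NA, "Moskva", NA, NA, NA),
  country2 = c("Germany", "Germany", "China", "Germany", "", "Ukraine", "Russia", "Germany", "Ukraine", ""),
  state3   = c(NA, NA, NA, NA, "Schleswig-Holstein", NA, NA, NA, NA, "Berlin"),
  country3 = c("Spain", "Spain", "Germany", "Germany", "Germany", "Germany", "Germany", "Germany", "Germany", "Germany"),
  stringsAsFactors = FALSE
)

ge <- rep(NA_character_, nrow(df))
for (i in 1:3) {
  st <- df[[paste0("state", i)]]
  ct <- df[[paste0("country", i)]]
  # Fill only rows that are still empty, have a state, and whose country is Germany.
  # %in% is used instead of == so that a missing country gives FALSE rather than NA.
  hit <- is.na(ge) & !is.na(st) & ct %in% "Germany"
  ge[hit] <- st[hit]
}
df$GE_State <- ge
```

As with case_when(), the first matching pair wins, so the order of the loop decides which state is kept when several pairs qualify.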

I think you want cbind() here:
df1 <- cbind(df1, df2)
Data:
df1 <- <your first data frame>
df2 <- data.frame("GE_State"=c(NA,NA,NA,NA, "Schleswig-Holstein",NA,NA,NA,NA,"Berlin"))

How do I add calendar dates to an existing data table so that an entire month is accounted for in R?

I need to "back-fill" dates for a 3-year time period (2016-2018).
My actual data has over 32,000 observations and is a table of dates, countries, regions within a country, and protest and riot events.
For simplicity, let's say the data is limited to the month of January with 3 countries. I eventually want to build a lagged data panel, but to do so I need every date in the time period accounted for (I think).
DT <- data.table(Date = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-12", "2019-01-08" )),
Country = c("India","India","India","Pakistan","Pakistan", "Cameroon", "India"),
Region = c('Kashmir', 'Rajasthan', 'Punjab', 'Islamabad', 'National', 'Nord-Ouest', "Kashmir"),
Protest = c(4,2,0,1,4,1,0 ),
Riot = c(0,2,1,1,4,1,1 ))
# Date Country Region Protest Riot
# 1: 2019-01-01 India Kashmir 4 0
# 2: 2019-01-01 India Rajasthan 2 2
# 3: 2019-01-01 India Punjab 0 1
# 4: 2019-01-01 Pakistan Islamabad 1 1
# 5: 2019-01-01 Pakistan National 4 4
# 6: 2019-01-12 Cameroon Nord-Ouest 1 1
# 7: 2019-01-08 India Kashmir 0 1
I could make and merge a new data table with dates for the month of January and count out the number of repetitions for each country, but that is not feasible for 32 countries and their regions. Is there a way to account for countries having different numbers of entries for any given date, and then fill in rows so that (in this case) each country and region has a row for every day of the month? A desired output would be along the lines of:
# Date Country Region Protest Riot
# 1: 2019-01-01 India Kashmir 4 0
# 2: 2019-01-01 India Rajasthan 2 2
# 3: 2019-01-01 India Punjab 0 1
# 4: 2019-01-01 Pakistan Islamabad 1 1
# 5: 2019-01-01 Pakistan National 4 4
# 6: 2019-01-01 Cameroon Nord-Ouest NA NA
# 7: 2019-01-02 India Kashmir NA NA
# 8: 2019-01-02 India Rajasthan NA NA
# 9: 2019-01-02 India Punjab NA NA
# 10: 2019-01-02 Pakistan Islamabad NA NA
# 11: 2019-01-02 Pakistan National NA NA
# 12: 2019-01-02 Cameroon Nord-Ouest NA NA

Maybe we can use complete():
library(tidyverse)
DT %>%
mutate_if(is.character, ~ factor(., levels = unique(.))) %>%
group_by(Country, Region) %>%
complete(Date = seq(min(.$Date), max(.$Date), by = 'day')) %>%
ungroup %>%
arrange(Date) %>%
head(12)
# A tibble: 12 x 5
# Country Region Date Protest Riot
# <fct> <fct> <date> <dbl> <dbl>
# 1 India Kashmir 2019-01-01 4 0
# 2 India Rajasthan 2019-01-01 2 2
# 3 India Punjab 2019-01-01 0 1
# 4 Pakistan Islamabad 2019-01-01 1 1
# 5 Pakistan National 2019-01-01 4 4
# 6 Cameroon Nord-Ouest 2019-01-01 NA NA
# 7 India Kashmir 2019-01-02 NA NA
# 8 India Rajasthan 2019-01-02 NA NA
# 9 India Punjab 2019-01-02 NA NA
#10 Pakistan Islamabad 2019-01-02 NA NA
#11 Pakistan National 2019-01-02 NA NA
#12 Cameroon Nord-Ouest 2019-01-02 NA NA
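
The same idea can be sketched in base R without tidyr: build every Country/Region/Date combination and left-join the observations onto it. This is an illustrative sketch only. It uses a plain data frame called events (a hypothetical name) in place of the data.table DT, and fills only the date range present in the data.

```r
events <- data.frame(
  Date    = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01",
                      "2019-01-01", "2019-01-12", "2019-01-08")),
  Country = c("India", "India", "India", "Pakistan", "Pakistan", "Cameroon", "India"),
  Region  = c("Kashmir", "Rajasthan", "Punjab", "Islamabad", "National", "Nord-Ouest", "Kashmir"),
  Protest = c(4, 2, 0, 1, 4, 1, 0),
  Riot    = c(0, 2, 1, 1, 4, 1, 1),
  stringsAsFactors = FALSE
)

# Every Country/Region pair that occurs in the data
pairs <- unique(events[c("Country", "Region")])

# Every calendar day between the first and last observed date
days <- data.frame(Date = seq(min(events$Date), max(events$Date), by = "day"))

# by = NULL makes merge() return the Cartesian product of pairs and days
grid <- merge(pairs, days, by = NULL)

# Left join: combinations without an observation get NA for Protest and Riot
full <- merge(grid, events, by = c("Country", "Region", "Date"), all.x = TRUE)
full <- full[order(full$Date, full$Country, full$Region), ]
```

To cover a whole calendar month regardless of which days were observed, the seq() call could be given fixed start and end dates instead of min() and max().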

cumulative count of character vector

I want to make a cumulative count of country names from a data frame:
df <- data.frame(country = c("Sweden", "Germany", "Sweden", "Sweden", "Germany",
"Vietnam"), year= c(1834, 1846, 1847, 1852, 1860, 1865))
I have tried different versions of count(), cumsum() and tally() but can’t seem to get it right.
Output should look like:
country year n
Sweden 1834 1
Germany 1846 2
Sweden 1847 2
Sweden 1852 2
Germany 1860 2
Vietnam 1865 3

df %>% mutate(count = cumsum(!duplicated(.$country))) %>% as_tibble()
#> # A tibble: 6 x 3
#> country year count
#> <fctr> <dbl> <int>
#> 1 Sweden 1834 1
#> 2 Germany 1846 2
#> 3 Sweden 1847 2
#> 4 Sweden 1852 2
#> 5 Germany 1860 2
#> 6 Vietnam 1865 3
or
dist_cum <- function(var)
sapply(seq_along(var), function(x) length(unique(head(var, x))))
df %>% mutate(var2=dist_cum(country))
#> country year var2
#> 1 Sweden 1834 1
#> 2 Germany 1846 2
#> 3 Sweden 1847 2
#> 4 Sweden 1852 2
#> 5 Germany 1860 2
#> 6 Vietnam 1865 3
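
To see why the cumsum(!duplicated(...)) approach gives a running count of distinct countries, it can help to look at the intermediate vector. A base R sketch on the question's country column (first_seen is a name introduced here for illustration):

```r
country <- c("Sweden", "Germany", "Sweden", "Sweden", "Germany", "Vietnam")

# TRUE the first time a value appears, FALSE for every later repeat
first_seen <- !duplicated(country)

# Summing the TRUEs cumulatively counts how many distinct values have appeared so far
n <- cumsum(first_seen)
n
```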

You can try this:
library(ggplot2)
library(plyr)
df<-data.frame(country=c("Sweden","Germany","Sweden","Sweden","Germany","Vietnam", "Germany"),year= c(1834,1846,1847,1852,1860,1865,1860))
counts <- ddply(df, .(df$country, df$year), nrow)
The output is:
> counts
df$country df$year V1
1 Germany 1846 1
2 Germany 1860 2
3 Sweden 1834 1
4 Sweden 1847 1
5 Sweden 1852 1
6 Vietnam 1865 1

How to remove rows in data frame after frequency tables in R

I have 3 data frames from which I have to find the continents with fewer than 3 countries and remove those countries (rows). The data frames are structured similarly to the data frame x below:
row Country Continent Ranking
1 Kenya Africa 17
2 Gabon Africa 23
3 Spain Europe 04
4 Belgium Europe 03
5 China Asia 10
6 Nigeria Africa 14
7 Holland Europe 01
8 Italy Europe 05
9 Japan Asia 06
First I wanted to know the frequency of each country per continent, so I did
x2<-table(x$Continent)
x2
Africa Europe Asia
3 4 2
Then I wanted to identify the continents with fewer than 3 countries
x3 <- x2[x2 < 3]
x3
Asia
2
My problem now is how to remove these countries. For the example above, that means the 2 countries in Asia, and I want my final data set to look like the one below:
row Country Continent Ranking
1 Kenya Africa 17
2 Gabon Africa 23
3 Spain Europe 04
4 Belgium Europe 03
5 Nigeria Africa 14
6 Holland Europe 01
7 Italy Europe 05
The number of continents with fewer than 3 countries will vary among the different data frames, so I need one universal method that I can apply to all.

Try
library(dplyr)
x %>%
group_by(Continent) %>%
filter(n()>2)
# row Country Continent Ranking
#1 1 Kenya Africa 17
#2 2 Gabon Africa 23
#3 3 Spain Europe 04
#4 4 Belgium Europe 03
#5 6 Nigeria Africa 14
#6 7 Holland Europe 01
#7 8 Italy Europe 05
Or using the x2
subset(x, Continent %in% names(x2)[x2>2])
# row Country Continent Ranking
#1 1 Kenya Africa 17
#2 2 Gabon Africa 23
#3 3 Spain Europe 04
#4 4 Belgium Europe 03
#6 6 Nigeria Africa 14
#7 7 Holland Europe 01
#8 8 Italy Europe 05

A very easy way with "data.table" would be:
library(data.table)
as.data.table(x)[, N := .N, by = Continent][N > 2]
# row Country Continent Ranking N
# 1: 1 Kenya Africa 17 3
# 2: 2 Gabon Africa 23 3
# 3: 3 Spain Europe 4 4
# 4: 4 Belgium Europe 3 4
# 5: 6 Nigeria Africa 14 3
# 6: 7 Holland Europe 1 4
# 7: 8 Italy Europe 5 4
In base R you can try:
x[with(x, ave(rep(TRUE, nrow(x)), Continent, FUN = function(y) length(y) > 2)), ]
# row Country Continent Ranking
# 1 1 Kenya Africa 17
# 2 2 Gabon Africa 23
# 3 3 Spain Europe 4
# 4 4 Belgium Europe 3
# 6 6 Nigeria Africa 14
# 7 7 Holland Europe 1
# 8 8 Italy Europe 5
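
Since the question asks for one method that can be reused across several data frames, the table-based filter can be wrapped in a small function. This is a sketch under assumptions: drop_small_groups, its group and min_n arguments, and the list name cleaned are all hypothetical, and min_n = 3 mirrors the "> 2" threshold used in the answers above.

```r
# Hypothetical helper: keep only rows whose group has at least min_n members
drop_small_groups <- function(d, group = "Continent", min_n = 3) {
  n    <- table(d[[group]])          # rows per group
  keep <- names(n)[n >= min_n]       # groups that are large enough
  d[d[[group]] %in% keep, , drop = FALSE]
}

x <- data.frame(
  Country   = c("Kenya", "Gabon", "Spain", "Belgium", "China",
                "Nigeria", "Holland", "Italy", "Japan"),
  Continent = c("Africa", "Africa", "Europe", "Europe", "Asia",
                "Africa", "Europe", "Europe", "Asia"),
  Ranking   = c(17, 23, 4, 3, 10, 14, 1, 5, 6),
  stringsAsFactors = FALSE
)

# Apply the same rule to any number of data frames held in a list
cleaned <- lapply(list(x = x), drop_small_groups)
cleaned$x
```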
