Mutate and case when issue - dplyr


I have the following data and I want to make a new column using mutate: where colour == 'g', take the level on the 'g' row minus the level on the 'r' row.
Likewise with type: where type == 'type1', take the corresponding level minus the level on the 'type2' row.
library(dplyr)
d <- tibble(
date = c("2018", "2018", "2018", "2019", "2019", "2019", "2020", "2020", "2020", "2020"),
colour = c("none","g", "r", "none","g", "r", "none", "none", "none", "none"),
type = c("type1", "none", "none", "type2", "none", "none", "none", "none", "none", "none"),
level= c(78, 99, 45, 67, 87, 78, 89, 87, 67, 76))
Just to be clear, this is what I want the data to look like:
d2 <- tibble(
date = c("2018", "2018", "2018", "2019", "2019", "2019", "2020", "2020", "2020", "2020"),
colour = c("none","g", "r", "none","g", "r", "none", "none", "none", "none"),
type = c("type1", "none", "none", "type2", "none", "none", "none", "none", "none", "none"),
level= c(78, 99, 45, 67, 87, 78, 89, 87, 67, 76),
color_gap = c(NA, 54, NA, NA, 9, NA, NA, NA, NA, NA),
type_gap = c(11, NA, NA, NA, NA, NA, NA, NA, NA, NA))
I started with mutate and case_when and got as far as the code below. However, I'm stuck on the final calculation. How do I say that I want the colour g level minus the colour r level?
d %>%
mutate(color_gap = case_when(colour == "g" ~ level)) %>%
mutate(type_gap = case_when(type == "type1" ~ level)) -> d2
Anyone know how to complete this?
Thanks

This subtracts the first r level from the first g level, second r level from second g level, etc. Same for type1 and type2. This has no checks at all. It doesn't check whether there is a matching r for each g, whether they are in the expected order, whether they are in the same date-group, etc. It assumes the data is already perfectly formatted as expected, so be careful using this on real data.
d %>%
mutate(color_gap = replace(rep(NA, n()), colour == 'g',
level[colour == 'g'] - level[colour == 'r']),
type_gap = replace(rep(NA, n()), type == 'type1',
level[type == 'type1'] - level[type == 'type2']))
# # A tibble: 10 x 6
# date colour type level color_gap type_gap
# <chr> <chr> <chr> <dbl> <dbl> <dbl>
# 1 2018 none type1 78 NA 11
# 2 2018 g none 99 54 NA
# 3 2018 r none 45 NA NA
# 4 2019 none type2 67 NA NA
# 5 2019 g none 87 9 NA
# 6 2019 r none 78 NA NA
# 7 2020 none none 89 NA NA
# 8 2020 none none 87 NA NA
# 9 2020 none none 67 NA NA
# 10 2020 none none 76 NA NA

You could do this with group_by and mutate.
I assumed that there is only one row per date that satisfies each condition.
d %>%
mutate(color_gap = case_when(colour == "g" ~ level)) %>%
mutate(type_gap = case_when(type== "type1" ~ level)) %>%
group_by(date) %>%
mutate(diff = max(color_gap,na.rm=T)-max(type_gap, na.rm=T))
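A grouped variant can also compute the gap asked for in the question directly. The following is only a sketch, assuming that each date holds at most one 'g' row and one 'r' row, and that there is a single 'type1' and a single 'type2' row in the whole table (they sit in different years in the example, so that gap cannot be taken per date). match() picks the first matching row and returns NA when there is none, so groups without an 'r' row simply get NA.

```r
library(dplyr)

d <- tibble(
  date   = c("2018", "2018", "2018", "2019", "2019", "2019", "2020", "2020", "2020", "2020"),
  colour = c("none", "g", "r", "none", "g", "r", "none", "none", "none", "none"),
  type   = c("type1", "none", "none", "type2", "none", "none", "none", "none", "none", "none"),
  level  = c(78, 99, 45, 67, 87, 78, 89, 87, 67, 76)
)

d2 <- d %>%
  group_by(date) %>%
  # within each date: level on the "g" row minus level on the first "r" row (NA if no "r")
  mutate(color_gap = if_else(colour == "g",
                             level - level[match("r", colour)],
                             NA_real_)) %>%
  ungroup() %>%
  # assumption: type1/type2 are compared across the whole table, not per date
  mutate(type_gap = if_else(type == "type1",
                            level - level[match("type2", type)],
                            NA_real_))

d2$color_gap
d2$type_gap
```

With the example data this puts 54 and 9 on the two 'g' rows and 11 on the 'type1' row, with NA everywhere else.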

Related

Summarising movements of individuals spread over several rows

I am a newly self-taught user of R and require assistance.
I am working with a dataset that has captured location of residence and whether the locality is metropolitan, regional or rural over 7 years (2015-2021) for a subset of a population. Each individual has a unique ID and each year is on a new row (ie. each ID has 7 rows). I am trying to figure out how many individuals have remained in the same location, how many have moved and where they moved to.
I am really struggling to figure out what I need to do to get the required outputs, but I assume there is a way to get a summary table that has number of individuals who havent moved (+- where they are located) and number of individuals that have moved (+- where they have moved to).
Your assistance would be greatly appreciated.
Dummy dataset:
stack <- tribble(
~ID, ~Year, ~Residence, ~Locality,
#--/--/--/----
"a", "2015", "Sydney", "Metro",
"a", "2016", "Sydney", "Metro",
"a", "2017", "Sydney", "Metro",
"a", "2018", "Sydney", "Metro",
"a", "2019", "Sydney", "Metro",
"a", "2020", "Sydney", "Metro",
"a", "2021", "Sydney", "Metro",
"b", "2015", "Sydney", "Metro",
"b", "2016", "Orange", "Regional",
"b", "2017", "Orange", "Regional",
"b", "2018", "Orange", "Regional",
"b", "2019", "Orange", "Regional",
"b", "2020", "Broken Hill", "Rural",
"b", "2021", "Sydney", "Metro",
"c", "2015", "Dubbo", "Regional",
"c", "2016", "Dubbo", "Regional",
"c", "2017", "Dubbo", "Regional",
"c", "2018", "Dubbo", "Regional",
"c", "2019", "Dubbo", "Regional",
"c", "2020", "Dubbo", "Regional",
"c", "2021", "Dubbo", "Regional",
)
Cheers in advance.

You can use the lead function to add columns containing each person's location in the following year. Using mutate with across, you can apply lead to two columns simultaneously. You can then make row-wise comparisons and look for moves before summarising.
library(dplyr)
library(tidyr)
#Group by individual before applying the lead function
#Apply the lead function to the two listed columns and add "nextyear" as a suffix
#Add a logical column which returns TRUE if any change of residence or locality is detected.
#Summarise the data by individual by retaining the location with the max year.
stack%>%
unite(col="Location", c(Residence, Locality), sep="-")%>%
group_by(ID)%>%
mutate(across(c("Year", "Location"), list(nextyear= lead)),
Move=Location!=Location_nextyear)%>%
filter(!is.na(Year_nextyear))%>%
mutate(nb.of.moves=sum(Move, na.rm=TRUE))%>%
slice_max(Year)%>%
select(ID, last.location=Location_nextyear, nb.of.moves)
# A tibble: 3 x 3
# Groups: ID [3]
ID last.location nb.of.moves
<chr> <chr> <int>
1 a Sydney-Metro 0
2 b Sydney-Metro 3
3 c Dubbo-Regional 0

Here is another tidyverse option and using cumsum. We can get the cumulative sum to show how many times each person moves (if they do). Then, we can slice the last row, and get the count of each location. The change column indicates how many times they moved. However, it's unclear what you want the final product to look like.
library(tidyverse)
stack %>%
group_by(ID) %>%
mutate(
change = cumsum(case_when(
paste0(Residence, Locality) != lag(paste0(Residence, Locality)) ~ TRUE,
TRUE ~ FALSE
))
) %>%
slice(n()) %>%
ungroup %>%
count(Residence, Locality, change)
Output
Residence Locality change n
<chr> <chr> <int> <int>
1 Dubbo Regional 0 1
2 Sydney Metro 0 1
3 Sydney Metro 3 1

Using data.table.
library(data.table)
setDT(stack) # convert to data.table
setorder(stack, ID, Year) # assure rows are in correct order
stack[, rle(paste(Residence, Locality, sep=', ')), by=.(ID)]
## ID lengths values
## 1: a 7 Sydney, Metro
## 2: b 1 Sydney, Metro
## 3: b 4 Orange, Regional
## 4: b 1 Broken Hill, Rural
## 5: b 1 Sydney, Metro
## 6: c 7 Dubbo, Regional
So a stayed in Sydney for 7 years, b stayed in Sydney for 1 year then moved to Orange for 4 years, then moved to Broken Hill for 1 year, then moved back to Sydney for 1 year.
To determine how many times each person moved:
result <- stack[, rle(paste(Residence, Locality, sep=', ')), by=.(ID)]
result[, .(N=.N-1), by=.(ID)]
## ID N
## 1: a 0
## 2: b 3
## 3: c 0
So a and c did not move at all, and b moved 3 times.
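The reason the number of runs minus one equals the number of moves can be seen on a single vector in base R. This is only an illustrative sketch; loc is a hypothetical vector that mirrors the yearly locations of individual b.

```r
# Hypothetical yearly locations for one person (mirrors individual "b" in the dummy data)
loc <- c("Sydney", "Orange", "Orange", "Orange", "Orange", "Broken Hill", "Sydney")

# rle() collapses consecutive repeats into runs
runs <- rle(loc)
runs$values    # the place of each consecutive stay
runs$lengths   # the number of years spent in each stay

# each new run after the first one corresponds to one move
n_moves <- length(runs$lengths) - 1
n_moves
```

Here there are four runs (lengths 1, 4, 1, 1), so n_moves is 3.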

Similar to what @Dealec did, but I used the lag function from dplyr instead.
library(tidyverse)
library(janitor)
#>
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#>
#> chisq.test, fisher.test
stack <- tribble(
~ID, ~Year, ~Residence, ~Locality,
#--/--/--/----
"a", "2015", "Sydney", "Metro",
"a", "2016", "Sydney", "Metro",
"a", "2017", "Sydney", "Metro",
"a", "2018", "Sydney", "Metro",
"a", "2019", "Sydney", "Metro",
"a", "2020", "Sydney", "Metro",
"a", "2021", "Sydney", "Metro",
"b", "2015", "Sydney", "Metro",
"b", "2016", "Orange", "Regional",
"b", "2017", "Orange", "Regional",
"b", "2018", "Orange", "Regional",
"b", "2019", "Orange", "Regional",
"b", "2020", "Broken Hill", "Rural",
"b", "2021", "Sydney", "Metro",
"c", "2015", "Dubbo", "Regional",
"c", "2016", "Dubbo", "Regional",
"c", "2017", "Dubbo", "Regional",
"c", "2018", "Dubbo", "Regional",
"c", "2019", "Dubbo", "Regional",
"c", "2020", "Dubbo", "Regional",
"c", "2021", "Dubbo", "Regional",
) %>%
clean_names()
results <- stack %>%
mutate(location = paste(residence, locality, sep = "_")) %>%
arrange(id, year) %>%
group_by(id) %>%
mutate(
row = row_number(),
movement = case_when(
row == 1 ~ NA_character_,
location == lag(location, n = 1) ~ "no_movement",
TRUE ~ location
)
) %>%
ungroup() %>%
select(-row)
results
#> # A tibble: 21 x 6
#> id year residence locality location movement
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a 2015 Sydney Metro Sydney_Metro <NA>
#> 2 a 2016 Sydney Metro Sydney_Metro no_movement
#> 3 a 2017 Sydney Metro Sydney_Metro no_movement
#> 4 a 2018 Sydney Metro Sydney_Metro no_movement
#> 5 a 2019 Sydney Metro Sydney_Metro no_movement
#> 6 a 2020 Sydney Metro Sydney_Metro no_movement
#> 7 a 2021 Sydney Metro Sydney_Metro no_movement
#> 8 b 2015 Sydney Metro Sydney_Metro <NA>
#> 9 b 2016 Orange Regional Orange_Regional Orange_Regional
#> 10 b 2017 Orange Regional Orange_Regional no_movement
#> # ... with 11 more rows
results %>%
count(year, movement) %>%
pivot_wider(names_from = movement,
values_from = n) %>%
clean_names()
#> # A tibble: 7 x 6
#> year na no_movement orange_regional broken_hill_rural sydney_metro
#> <chr> <int> <int> <int> <int> <int>
#> 1 2015 3 NA NA NA NA
#> 2 2016 NA 2 1 NA NA
#> 3 2017 NA 3 NA NA NA
#> 4 2018 NA 3 NA NA NA
#> 5 2019 NA 3 NA NA NA
#> 6 2020 NA 2 NA 1 NA
#> 7 2021 NA 2 NA NA 1
#tracking movement from a location
from_location <- stack %>%
mutate(location = paste(residence, locality, sep = "_")) %>%
arrange(id, year) %>%
group_by(id) %>%
mutate(
row = row_number(),
movement_from = case_when(
row == 1 ~ NA_character_,
location == lag(location, n = 1) ~ "no_movement",
TRUE ~ lag(location, n = 1)
)
) %>%
ungroup() %>%
select(-row)
from_location %>%
count(year, movement_from) %>%
pivot_wider(names_from = movement_from,
names_prefix = "from_",
values_from = n) %>%
clean_names()
#> # A tibble: 7 x 6
#> year from_na from_no_movement from_sydney_metro from_orange_regional
#> <chr> <int> <int> <int> <int>
#> 1 2015 3 NA NA NA
#> 2 2016 NA 2 1 NA
#> 3 2017 NA 3 NA NA
#> 4 2018 NA 3 NA NA
#> 5 2019 NA 3 NA NA
#> 6 2020 NA 2 NA 1
#> 7 2021 NA 2 NA NA
#> # ... with 1 more variable: from_broken_hill_rural <int>
Created on 2022-04-28 by the reprex package (v2.0.1)

Comparing dates across different data frames and return output

I have two data frames with different data for the same people. Data frame 1 (dfx) has unique ids and dates people had appointments and data frame 2 has unique ids and a start and end date.
It looks something like below:
c1 <- c("1", "1", "1", "1", "1", "2", "2", "2", "2", "2")
d1 <- c("2017", "2018", "2019", "2020", "2021", "2019", "2019", "2019", "2020", "2021")
dfx <- data.frame(c1,d1)
c2 <- c("1", "1", "2")
ds <- c("2017", "2020", "2017")
de <- c("2018", "2021", "2018")
dfy <- data.frame(c2,ds,de)
I'm working with data frame 2 and I want to know whether any dates in data frame 1 fall within the start and end dates in data frame 2. I am trying to get an output in dfy saying TRUE or FALSE for overlap.
For this example, the output should return TRUE, TRUE, FALSE.
I've tried working on this with dplyr but I'm not getting the result I'm looking for. I'd appreciate any help. Thanks.
dplyr code:
overlap <- dfy %>%
group_by(c2) %>%
mutate (on_hold = any(mapply(function(id, start, end) any(id == dfx$c1 & dfx$d1 > start & dfx$d1 < end), c2, ds, de))) %>%
arrange(c2, ds, de, on_hold)

solution
library(dplyr)
ranges <- dfx %>%
group_by(c1) %>%
summarise(range = list(unique(d1)))
left_join(dfy, ranges, by = c("c2" = "c1")) %>%
rowwise() %>%
mutate(in_range = ds %in% range & de %in% range)
output
# A tibble: 3 x 5
# Rowwise:
c2 ds de range in_range
<fct> <fct> <fct> <list> <lgl>
1 1 2017 2018 <fct [5]> TRUE
2 1 2020 2021 <fct [5]> TRUE
3 2 2017 2018 <fct [3]> FALSE
data as provided by OP
c1 <- c("1", "1", "1", "1", "1", "2", "2", "2", "2", "2")
d1 <- c("2017", "2018", "2019", "2020", "2021", "2019", "2019", "2019", "2020", "2021")
dfx <- data.frame(c1,d1)
c2 <- c("1", "1", "2")
ds <- c("2017", "2020", "2017")
de <- c("2018", "2021", "2018")
dfy <- data.frame(c2,ds,de)
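If the goal is to test whether any appointment falls between the start and end dates (rather than whether the start and end dates themselves appear among the appointments), a direct comparison is another option. The following base R sketch makes two assumptions that are not stated in the question: the year strings can be converted to numbers, and the interval is inclusive at both ends (which is what the expected TRUE, TRUE, FALSE requires for the example).

```r
dfx <- data.frame(
  c1 = c("1", "1", "1", "1", "1", "2", "2", "2", "2", "2"),
  d1 = c("2017", "2018", "2019", "2020", "2021", "2019", "2019", "2019", "2020", "2021")
)
dfy <- data.frame(
  c2 = c("1", "1", "2"),
  ds = c("2017", "2020", "2017"),
  de = c("2018", "2021", "2018")
)

# years are stored as text (or factors in R < 4.0), so go through character to numeric
appt_year <- as.numeric(as.character(dfx$d1))

# for each row of dfy: is there any appointment of the same id inside [start, end]?
dfy$overlap <- mapply(
  function(id, start, end) {
    any(dfx$c1 == id & appt_year >= start & appt_year <= end)
  },
  as.character(dfy$c2),
  as.numeric(as.character(dfy$ds)),
  as.numeric(as.character(dfy$de)),
  USE.NAMES = FALSE
)

dfy$overlap
```

For the example data this gives TRUE, TRUE, FALSE. With real dates, the same comparison would work after converting the columns with as.Date instead of as.numeric.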

R: Is there a way to select a column according to the current year?

Say you have a database like gapminder with the population per country. Even though the current year is 2021, you also have predictions for the following years to come.
location 2020.0 2021.0 2022.0
Canada 5 7 9
China 23 34 54
Congo 1 2 3
and another database like this, vaccins
location date amount_of_vaccins
Canada 2020-01-02 50
China 2021-05-03 59
Congo 2022-03-05 34
How can I merge the population of each country into the second database, matching the year of each date in the second database?
I managed to merge them by country like this:
merge(gapminder,vaccins, by = "location")
but I'm getting this
location date amount_of_vaccins 2020.0 2021.0 2022.0
Canada 2020-01-02 50 5 7 9
China 2021-05-03 59 23 34 54
Congo 2022-03-05 34 1 2 3
I'd like to have only a new variable giving the population of the country according to the year. Thank you.

You could do something like this with tidyverse.
library(tidyverse)
df1 <- df1 %>%
pivot_longer(!location, names_to = "date", values_to = "population") %>%
dplyr::mutate(year = str_sub(date, 1, 4))
df2 %>%
dplyr::mutate(year = str_sub(date, end = 4)) %>%
dplyr::left_join(., df1, by = c("location", "year")) %>%
dplyr::select(-c(date.y, year)) %>%
dplyr::rename(date = date.x)
Output
location date amount_of_vaccins population
1 Canada 2020-01-02 50 5
2 China 2021-05-03 59 34
3 Congo 2022-03-05 34 3
Data
df1 <-
structure(
list(
location = c("Canada", "China", "Congo"),
`2020.0` = c(5, 23, 1),
`2021.0` = c(7, 34, 2),
`2022.0` = c(9, 54, 3)
),
class = "data.frame",
row.names = c(NA,-3L)
)
df2 <-
structure(
list(
location = c("Canada", "China", "Congo"),
date = c("2020-01-02",
"2021-05-03", "2022-03-05"),
amount_of_vaccins = c(50, 59, 34)
),
class = "data.frame",
row.names = c(NA,-3L)
)

Reshaping dataset in a way that values for variables become variable names and their values are picked from another column

-------------------NEW POST:
I posted an incorrect example of my data in the past (leaving it below). In reality my data has repeated "Modules" in the same column, and the previous solution doesn't work for my problem.
My example data (current dataset):
Year <- c("2013", "2020", "2015", "2012")
Grade <- c(28, 39, 76, 54)
Code <- c("A", "B", "C", "A")
Module1 <- c("English", "English", "Science", "English")
Results1 <- c(45, 58, 34, 54)
Module2 <- c("History", "History", "History", "Art")
Results2 <- c(12, 67, 98, 45)
Module3 <- c("Art", "Geography", "Math", "Geography")
Results3 <- c(89, 84, 45, 67)
Module14 <- c("Math", "Math", "Geography", "Art")
Results14 <- c(89, 24, 95, 67)
Module15 <-c("Science", "Art", "Art", "Science")
Results15 <-c(87, 24, 25, 67)
daf <- data.frame(Year, Grade, Code, Module1, Results1, Module2, Results2, Module3, Results3, Module14, Results14, Module15, Results15)
My target - dataset I need to achieve:
Year <- c("2013", "2020", "2015", "2012")
Grade <- c(28, 39, 76, 54)
Code <- c("A", "B", "C", "A")
English <- c(45, 58,NA,54)
Math <- c(89, 24,45, NA)
Science <- c(87, NA, 34, 67)
Geography <- c(NA, 84, 95,67)
Art <- c(89,24,25,45)
wished_df <- data.frame(Year, Grade, Code, English, Math, Science,Geography, Art)
Thanks again for any help!
-------------------------------- OLD POST:
I am trying to reshape my current data to new format.
Module1 <- c("English", "Math", "Science", "Geography")
Results1 <- c(45, 58, 34, 54)
Module2 <- c("Math", "History", "English", "Art")
Results2 <- c(12, 67, 98, 45)
Module3 <- c("History", "Art", "English", "Geography")
Results3 <- c(89, 84, 45, 67)
daf <- data.frame(Module1, Results1, Module2, Results2, Module3, Results3)
What I need is module names set as ‘variable names’, and module results set as ‘values for variable names’, looking like:
English1 <- c(45, 98, 45)
Math1 <- c(58, 12, NA)
Science1 <- c(34, NA, NA)
Geography1 <- c(54,NA, 67)
Art1 <- c(NA, 45, 84)
wished_df <- data.frame(English1, Math1, Science1,Geography1, Art1)
Thank you for any ideas.

1) reshape Using the data in the Note at the end, split the input column names into two groups (Module columns and Results columns), giving varying. Then use reshape to convert to long form, where varying= defines which columns in the input correspond to a single column in the long form and v.names= specifies the names to use for each of the two columns produced from the varying columns. reshape gives a data frame with time, Module, Results and id columns. We don't need the id column, so drop it using [-4].
Then reshape that back to the new wide form. idvar= specifies the source of the output rows and timevar= specifies the source of the output columns. Everything else is the body of the result. The wide result still carries the time column, which we don't need, so remove it using [-1]. At the end we remove the junk part of each column name.
No packages are used.
varying <- split(names(daf), sub("\\d+$", "", names(daf)))
long <- reshape(daf, dir = "long", varying = varying, v.names = names(varying))[-4]
wide <- reshape(long, dir = "wide", idvar = "time", timevar = "Module")[-1]
names(wide) <- sub(".*[.]", "", names(wide))
giving:
> wide
English Math Science Geography History Art
1.1 45 58 34 54 NA NA
1.2 98 12 NA NA 67 45
1.3 45 NA NA 67 89 84
2) pivot_ Using the data in the Note at the end, specify that all columns are to be used and, using the special ".value" entry in names_to, specify that the column names in long form are taken from the first portion of the input column names, where the input names are split according to the names_pattern= regular expression. Then pivot to a new wide form where the column names are taken from the Module column and the values in the body of the result are taken from the Results column. The index column defines the rows and can be omitted afterwards.
library(dplyr)
library(tidyr)
daf %>%
pivot_longer(everything(), names_to = c(".value", "index"),
names_pattern = "(\\D+)(\\d+)") %>%
pivot_wider(names_from = Module, values_from = Results) %>%
select(-index)
giving:
# A tibble: 3 x 6
English Math History Art Science Geography
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 45 58 NA NA 34 54
2 98 12 67 45 NA NA
3 45 NA 89 84 NA 67
3) unlist/tapply Using the data in the Note at the end, another base solution can be fashioned by separately unlisting the Module and Results columns to get the long form and using tapply to convert to wide form. No packages are used.
is_mod <- grepl("Module", names(daf))
long <- data.frame(Module = unlist(daf[is_mod]), Results = unlist(daf[!is_mod]))
tab <- tapply(long$Results, list(sub("\\d+$", "", rownames(long)), long$Module), sum)
as.data.frame.matrix(tab)
giving:
Art English Geography History Math Science
Module1 NA 45 54 NA 58 34
Module2 45 98 NA 67 12 NA
Module3 84 45 67 89 NA NA
Note
Module1 <- c("English", "Math", "Science", "Geography")
Results1 <- c(45, 58, 34, 54)
Module2 <- c("Math", "History", "English", "Art")
Results2 <- c(12, 67, 98, 45)
Module3 <- c("History", "Art", "English", "Geography")
Results3 <- c(89, 84, 45, 67)
daf <- data.frame(Module1, Results1, Module2, Results2, Module3, Results3)

A data.table version:
library(data.table)
library(magrittr)
dt <- as.data.table(daf)
dt %>%
melt.data.table(measure.vars = patterns("^Module", "^Result")) %>%
dcast.data.table(variable ~ ..., value.var = "value2")
giving:
Key: <variable>
variable Art English Geography History Math Science
<fctr> <num> <num> <num> <num> <num> <num>
1: 1 NA 45 54 NA 58 34
2: 2 45 98 NA 67 12 NA
3: 3 84 45 67 89 NA NA

How to plot this picture using ggplot2?

Above is my dataset, just a simple dataset. It shows the GDP per capita of the richest and the poorest regions in nine countries in 2000 and 2015 as well as the gap of GDP per capita between the poorest and richest regions. Below is the reproducible example of this dataset:
structure(list(Country = c("Britain", "Germany", "United State",
"France", "South Korea", "Italy", "Japan", "Spain", "Sweden"),
Poor2000 = c(69, 50, 74, 52, 79, 50, 80, 80, 90), Poor2015 = c(61,
48, 73, 50, 73, 52, 78, 84, 82), Rich2000 = c(848, 311, 290,
270, 212, 180, 294, 143, 148), Rich2015 = c(1150, 391, 310,
299, 200, 198, 290, 151, 149)), row.names = c(NA, -9L), class = c("tbl_df",
"tbl", "data.frame"))
I wanna make a plot like this:
In this plot I just wanna show the GDP per capita of the poorest regions in the nine countries in 2000 and 2015 (the draft picture just has three countries for the sake of convenience). But I don't know how to do it using ggplot. Because it seems like I need to set x-axis as "Country" and y-axis as "Poor2000" and "Poor2015" the two variables. I don't know how to do that. Thanks many in advance.

Here is a possible solution. Starting from your dataframe, you can first create a new dataframe that reshapes it into a longer format. To do that, I used the pivot_longer function from the tidyr package:
library(tidyr)
library(dplyr)
library(ggplot2)
DF <- df %>% select(Country, Poor2000, Poor2015) %>%
mutate(Diff = Poor2015 - Poor2000) %>%
pivot_longer(-Country, names_to = "Poor", values_to = "value")
# A tibble: 27 x 3
Country Poor value
<fct> <chr> <dbl>
1 Britain Poor2000 69
2 Britain Poor2015 61
3 Britain Diff -8
4 Germany Poor2000 50
5 Germany Poor2015 48
6 Germany Diff -2
7 United States Poor2000 74
8 United States Poor2015 73
9 United States Diff -1
10 France Poor2000 52
# … with 17 more rows
We will also create a second dataframe that will contain the difference of values between Poor2000 and Poor2015:
DF_second_label <- df %>% select(Country, Poor2000, Poor2015) %>%
group_by(Country) %>%
mutate(Diff = Poor2015 - Poor2000, ypos = max(Poor2000,Poor2015))
# A tibble: 9 x 5
# Groups: Country [9]
Country Poor2000 Poor2015 Diff ypos
<fct> <dbl> <dbl> <dbl> <dbl>
1 Britain 69 61 -8 69
2 Germany 50 48 -2 50
3 United States 74 73 -1 74
4 France 52 50 -2 52
5 South Korea 79 73 -6 79
6 Italy 50 52 2 52
7 Japan 80 78 -2 80
8 Spain 80 84 4 84
9 Sweden 90 82 -8 90
Then, we can plot both new dataframes with ggplot2 and select only the countries of interest by using the subset function:
ggplot(subset(DF, Poor != "Diff" & Country %in% c("Britain","South Korea","Sweden")),
aes(x = Country, y = value, fill = Poor))+
geom_col(position = position_dodge())+
geom_text(aes(label = value), position = position_dodge(0.9), vjust = -0.5, show.legend = FALSE)+
geom_text(inherit.aes = FALSE,
data = subset(DF_second_label, Country %in% c("Britain","South Korea","Sweden")),
aes(x = Country,
y = ypos+10,
label = Diff), color = "darkgreen", size = 6, show.legend = FALSE)+
labs(x = "", y = "GDP per Person", title = "Poor in 2000 & 2015")+
theme(plot.title = element_text(hjust = 0.5))
And you get:
Reproducible example
df <- data.frame(Country = c("Britain","Germany", "United States", "France", "South Korea", "Italy","Japan","Spain","Sweden"),
Poor2000 = c(69,50,74,52,79,50,80,80,90),
Poor2015 = c(61,48,73,50,73,52,78,84,82),
Rich2000 = c(848,311,290,270,212,180,294,143,148))
