Plot graphs next to a single output - r

I have a dataset like this dataframe:
df <- structure(list(year = c(2001L, 2002L, 2003L, 2004L, 2005L, 2006L,
2007L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2001L,
2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2001L, 2002L, 2003L,
2004L, 2005L, 2006L, 2007L, 2001L, 2002L, 2003L, 2004L, 2005L,
2006L, 2007L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L
), volume = c(21L, 44L, 37L, 23L, 46L, 21L, 69L, 21L, 44L, 37L,
23L, 46L, 21L, 69L, 21L, 44L, 37L, 23L, 46L, 21L, 69L, 21L, 44L,
37L, 23L, 46L, 21L, 69L, 21L, 44L, 37L, 23L, 46L, 21L, 69L, 21L,
44L, 37L, 23L, 46L, 21L, 69L), stock = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("stock1", "stock2", "stock3",
"stock4", "stock5", "stock6"), class = "factor")), .Names = c("year",
"volume", "stock"), class = "data.frame", row.names = c(NA, -42L
))
I am trying to get an output like this one.
What I have so far:
library(ggplot2)
# use bare column names inside aes(); legend.position expects a string
p <- ggplot(df, aes(x = year, y = volume)) + geom_line(aes(color = "red")) +
facet_grid(stock ~ ., scales = "free_x") + theme(legend.position = "left")
p


How to generate a lag variable (endogenous lag) that captures previous values?

I want to generate the following endogenous lag variable (Y):
set Y=1 in the current routine year if submission==1 and routineyear==1 in the previous routine year
set Y=2 in the current routine year if submission==0 and routineyear==1 in the previous routine year
otherwise, Y=0
Note, though, that the "previous routine year" is not the previous year; the intervals between routine years vary. This is actually what makes it hard for me to generate this variable.
Basically, I want to generate an endogenous variable that captures each state's behavior in its LAST routine year.
To illustrate what I want to do:
Assume that country A had its routine year in 1990, the same year in which submission was also 1. This would generate Y=1.
Now, the next routine year for country A is 1992, where submission=1 and routineyear=1. The endogenous lag in this case should indicate A's previous behavior in 1990 (Y=1).
Then, the next routine year is 1996, where submission=0 while routineyear=1. The endogenous lag in this case is the value of A's previous behavior in 1992 (Y=1).
Then again, the next routine year is 1998, where submission=1 and routineyear=1. The endogenous lag here should indicate A's previous behavior in the last routine year, 1996, that is, Y=2.
This is how the endogenous lag should look (based on the example above):
country year submission routineyear Y(endo lag)
A 1990 1 1 1
A 1991 0 0 0
A 1992 1 1 1
A 1993 1 0 0
A 1994 0 0 0
A 1995 0 0 0
A 1996 0 1 1
A 1997 0 0 0
A 1998 1 1 2
A 1999 0 0 0
A 2000 0 0 0
A 2001 0 1 1
A 2002 0 0 0
A 2003 1 1 2
I've been trying to do this using different approaches, but without success. One of the biggest problems is that the routine years differ between countries and the intervals between them are not regular.
I believe someone who can write proper R code/functions would be able to solve this puzzle. If not, I would appreciate any recommendations on how to proceed from here.
A sample from my real data:
DF <- structure(list(ccode = c(31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L,
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 40L,
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L,
40L, 40L, 40L, 40L, 40L, 40L, 40L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L,
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 51L, 51L, 51L, 51L, 51L,
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L,
51L, 51L, 51L, 51L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L,
52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 53L, 53L,
53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L,
53L, 53L, 53L, 53L, 53L, 53L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L,
54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L,
70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L,
70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 80L, 80L, 80L, 80L, 80L, 80L,
80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L,
80L, 80L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L,
90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L), year = c(1990L,
1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L,
2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L,
1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L,
2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L,
1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L,
2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L,
1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L,
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L,
1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 1999L, 2000L, 2001L,
2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L,
2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L,
1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L,
2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L,
1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L,
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L, 1992L, 1993L,
1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 2003L,
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 1990L, 1991L,
1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L,
2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L,
2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L), country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("Bahamas", "Barbados",
"Belize", "Cuba", "Dominica", "Dominican Republic", "Guatemala",
"Haiti", "Jamaica", "Mexico", "Trinidad and Tobago"), class =
"factor"),
submission = c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L,
1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L,
1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L,
0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 0L), routineyear = c(1L, 0L, 0L,
1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L,
0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L
)), .Names = c("ccode", "year", "country", "submission", "routineyear"), class = "data.frame", row.names = c(NA, -243L ))
Using data.table:
library(data.table)
setDT(DF)
DF[, Y := 0
][routineyear == 1
, Y := 1 + (shift(submission, fill = 1) == 0)
, by = country][]
which gives (first 15 rows shown):
> DF
ccode year country submission routineyear Y
1: 31 1990 Bahamas 1 1 1
2: 31 1991 Bahamas 0 0 0
3: 31 1992 Bahamas 0 0 0
4: 31 1993 Bahamas 0 1 1
5: 31 1994 Bahamas 0 0 0
6: 31 1995 Bahamas 1 0 0
7: 31 1996 Bahamas 0 0 0
8: 31 1997 Bahamas 1 1 2
9: 31 1998 Bahamas 0 0 0
10: 31 1999 Bahamas 1 1 1
11: 31 2000 Bahamas 0 0 0
12: 31 2001 Bahamas 1 1 1
13: 31 2002 Bahamas 0 0 0
14: 31 2003 Bahamas 1 1 1
15: 31 2004 Bahamas 0 0 0
........
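What this does:
setDT(DF) converts your data frame to a data.table.
Y := 0 first sets Y to 0 for all rows, by reference.
The routineyear == 1 filter restricts the next step to routine years only.
Within each country (by = country), Y is updated by reference: shift(submission, fill = 1) is the submission value in the previous routine year, so Y is set to 1 if that value is 1 and to 2 if it is 0. The first routine year of each country has no predecessor, gets fill = 1, and therefore Y = 1.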
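For readers who prefer to avoid extra packages, here is a hedged base-R sketch of the same rule. The helper name `endo_lag` is hypothetical (it is not part of either answer), and it assumes the rows are already sorted by year within each country and that `submission` and `routineyear` are coded 0/1.

```r
# Hypothetical helper: endogenous lag based on the previous *routine* year.
# Assumes rows are sorted by year within each country.
endo_lag <- function(country, submission, routineyear) {
  Y <- integer(length(submission))        # 0 for every non-routine year
  idx <- which(routineyear == 1)          # positions of the routine years
  # Within each country, shift the routine-year submissions down by one;
  # the first routine year has no predecessor and is treated as 1.
  prev_sub <- ave(submission[idx], country[idx],
                  FUN = function(s) c(1L, head(s, -1L)))
  Y[idx] <- ifelse(prev_sub == 1, 1L, 2L) # 1 if previous submission was 1, else 2
  Y
}

# Example from the question (country A)
submission  <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1)
routineyear <- c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 2 - 1)
endo_lag(rep("A", 14), submission, routineyear)
```

On the full data you would sort first, e.g. `DF <- DF[order(DF$country, DF$year), ]`, and then call `DF$Y <- endo_lag(DF$country, DF$submission, DF$routineyear)`.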
The same with dplyr:
library(dplyr)
select(dat2, -Y) %>%
filter(routineyear == 1L) %>%
group_by(country) %>%
mutate(Y = 2L - lag(submission, default = 1L)) %>%
ungroup() %>%
right_join(select(dat2, -Y)) %>%
mutate(Y = replace(Y, is.na(Y), 0L))
# # A tibble: 14 x 5
# country year submission routineyear Y
# <fct> <int> <int> <int> <int>
# 1 A 1990 1 1 1
# 2 A 1991 0 0 0
# 3 A 1992 1 1 1
# 4 A 1993 1 0 0
# 5 A 1994 0 0 0
# 6 A 1995 0 0 0
# 7 A 1996 0 1 1
# 8 A 1997 0 0 0
# 9 A 1998 1 1 2
# 10 A 1999 0 0 0
# 11 A 2000 0 0 0
# 12 A 2001 0 1 1
# 13 A 2002 0 0 0
# 14 A 2003 1 1 2
all.equal(.Last.value, dat2)
# [1] TRUE
where dat2 is:
dat2 <- read.table(text =
"country year submission routineyear Y
A 1990 1 1 1
A 1991 0 0 0
A 1992 1 1 1
A 1993 1 0 0
A 1994 0 0 0
A 1995 0 0 0
A 1996 0 1 1
A 1997 0 0 0
A 1998 1 1 2
A 1999 0 0 0
A 2000 0 0 0
A 2001 0 1 1
A 2002 0 0 0
A 2003 1 1 2
", header = TRUE)

Data manipulation: spread columns with different number of rows in dplyr

I am trying to spread the time column of my data frame into one column per year. left_join was my first choice, but the age and geo groups differ between years, so I end up with most years containing NA values, and one of the age categories disappears.
library(dplyr)
library(tidyr) # spread() comes from tidyr
dt %>%
filter(time!=2001) %>%
group_by(time, geo, age, sex) %>%
filter(time==2011) %>%
left_join(.,dt %>%
group_by(time, sex, age, geo) %>%
mutate(time2 = 2011) %>%
filter(time != 2011) %>%
spread(time, value),
by = c('time' = 'time2', 'age', 'geo'))
What I obtain is this:
time geo sex.x age value sex.y `2000` `2001` `2002` `2003`
2011 51900 1 0 27933 1 NA 26193 NA NA
2011 51900 1 0 27933 2 NA 22760 NA NA
2011 51900 1 5 20627 1 NA 26213 NA NA
2011 51900 1 5 20627 2 NA 25647 NA NA
...
2011 51900 1 75 6400 1 NA 5313 NA NA
2011 51900 1 75 6400 2 NA 11500 NA NA
2011 51900 1 80 4520 NA NA NA NA NA
but there is a problem with the value column: it repeats the same values twice (which it shouldn't), and the years 2000, 2002, ..., 2020 are all NA.
What I would like is this:
geo sex age 2001 2011 2000 2002 2003 ... 2020
51900 1 0 39290 41900 69844 55281 55545 58045
51900 2 0 34140 38270 61192 65301 65429 65391
51902 1 0 4307 4193 69844 55281 55545 58045
51902 2 0 3753 3453 61192 65301 65429 65391
...
51900 1 80 NA 41900 104766 97952 98143 87068
51900 2 80 NA 38270 91788 89921 83317 98086
dt = structure(list(time = c(2001L, 2001L, 2001L, 2001L, 2001L, 2001L,
2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L,
2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2000L, 2000L, 2000L, 2000L, 2000L, 2002L,
2002L, 2002L, 2002L, 2002L, 2003L, 2003L, 2003L, 2003L, 2003L, 2004L, 2004L, 2004L, 2004L, 2004L, 2005L, 2005L, 2005L, 2005L,
2005L, 2006L, 2006L, 2006L, 2006L, 2006L, 2007L, 2007L, 2007L, 2007L, 2007L, 2008L, 2008L, 2008L, 2008L, 2008L, 2009L, 2009L,
2009L, 2009L, 2009L, 2010L, 2010L, 2010L, 2010L, 2010L, 2012L, 2012L, 2012L, 2012L, 2012L, 2013L, 2013L, 2013L, 2013L, 2013L,
2014L, 2014L, 2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L,
2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L, 2020L, 2020L, 2000L,
2000L, 2000L, 2000L, 2000L, 2002L, 2002L, 2002L, 2002L, 2002L, 2003L, 2003L, 2003L, 2003L, 2003L, 2004L, 2004L, 2004L, 2004L,
2004L, 2005L, 2005L, 2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2006L, 2006L, 2007L, 2007L, 2007L, 2007L, 2007L, 2008L, 2008L,
2008L, 2008L, 2008L, 2009L, 2009L, 2009L, 2009L, 2009L, 2010L, 2010L, 2010L, 2010L, 2010L, 2012L, 2012L, 2012L, 2012L, 2012L,
2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L, 2015L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L,
2016L, 2016L, 2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L, 2019L, 2020L,
2020L, 2020L, 2020L, 2020L), geo = c(51900L, 51900L, 51900L, 51900L, 51900L, 51900L, 51900L, 51900L, 51902L, 51902L, 51902L,
51902L, 51902L, 51902L, 51902L, 51902L, 51900L, 51900L, 51900L, 51900L, 51900L, 51900L, 51900L, 51900L, 51900L, 51900L, 51902L,
51902L, 51902L, 51902L, 51902L, 51902L, 51902L, 51902L, 51902L, 51902L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L,
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L,
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L,
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L,
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L,
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L), sex = c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), age = c(0L, 5L, 10L, 75L, 0L, 5L, 10L, 75L, 0L, 5L, 10L, 75L, 0L, 5L, 10L, 75L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L,
80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L,
0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L, 0L, 5L,
10L, 75L, 80L, 0L, 5L, 10L, 75L, 80L), value = c(26193L, 26213L, 31653L, 5313L, 22760L, 25647L, 31393L, 11500L, 4307L, 4793L,
5947L, 667L, 3753L, 4500L, 5207L, 1440L, 27933L, 20627L, 20593L, 6400L, 4520L, 25513L, 17480L, 17800L, 9520L, 8560L, 4193L, 3027L,
3453L, 800L, 580L, 3453L, 2473L, 2980L, 1013L, 1167L, 61192L, 88249L, 105509L, 20595L, 18198L, 55281L, 76667L, 99967L, 25571L,
19187L, 55545L, 70490L, 95697L, 28376L, 19340L, 56564L, 64639L, 90809L, 30322L, 19579L, 57471L, 59755L, 85464L, 30949L, 20081L,
60145L, 55926L, 79537L, 30083L, 22373L, 61425L, 53664L, 73329L, 27916L, 24891L, 61683L, 52992L, 67148L, 25620L, 27118L, 61776L,
53403L, 61637L, 24601L, 28551L, 62477L, 53990L, 57438L, 25439L, 29074L, 64401L, 56247L, 52992L, 31317L, 30495L, 64691L, 58095L,
52582L, 35069L, 30691L, 64689L, 60083L, 52853L, 37023L, 31297L, 64391L, 61877L, 53538L, 36327L, 32537L, 63158L, 63367L, 54657L,
33260L, 35359L, 61961L, 64311L, 56249L, 28203L, 38591L, 60751L, 64639L, 58159L, 22742L, 41433L, 59469L, 64485L, 60081L, 18813L,
42936L, 58045L, 64127L, 61703L, 17280L, 42758L, 69844L, 93632L, 109773L, 11025L, 7397L, 65301L, 82373L, 103304L, 16130L, 7705L,
65429L, 77025L, 98764L, 18861L, 7835L, 66195L, 72123L, 93892L, 20763L, 8231L, 66949L, 68002L, 88909L, 21513L, 8973L, 69257L,
64759L, 83202L, 21269L, 10813L, 70402L, 62813L, 77601L, 20044L, 12820L, 70681L, 62125L, 72404L, 18627L, 14631L, 70818L, 62321L,
68099L, 17947L, 15893L, 71579L, 62729L, 65085L, 18379L, 16509L, 73653L, 64712L, 61851L, 21697L, 17861L, 73764L, 66737L, 61483L,
23663L, 18103L, 73537L, 68968L, 61599L, 24347L, 18455L, 73041L, 70867L, 62190L, 23305L, 18986L, 71645L, 72368L, 63235L, 21077L,
20717L, 70201L, 73275L, 64867L, 17653L, 22534L, 68704L, 73517L,
66893L, 14089L, 23935L, 67117L, 73238L, 68928L, 11606L, 24343L, 65391L, 72725L, 70609L, 10697L, 23592L)), .Names = c("time",
"geo", "sex", "age", "value"), class = "data.frame", row.names = c(NA, -226L))
You can use the spread() function from tidyr:
library(dplyr)
library(tidyr)
dt_final <- dt %>%
  spread(time,   # key: the variable whose values become the new column names
         value)  # value: the variable used to fill the new columns
head(as_tibble(dt_final))
# geo sex age `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007` `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015` `2016` `2017` `2018` `2019` `2020`
# <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 51 1 0 69844 NA 65301 65429 66195 66949 69257 70402 70681 70818 71579 NA 73653 73764 73537 73041 71645 70201 68704 67117 65391
# 2 51 1 5 93632 NA 82373 77025 72123 68002 64759 62813 62125 62321 62729 NA 64712 66737 68968 70867 72368 73275 73517 73238 72725
# 3 51 1 10 109773 NA 103304 98764 93892 88909 83202 77601 72404 68099 65085 NA 61851 61483 61599 62190 63235 64867 66893 68928 70609
# 4 51 1 75 11025 NA 16130 18861 20763 21513 21269 20044 18627 17947 18379 NA 21697 23663 24347 23305 21077 17653 14089 11606 10697
# 5 51 1 80 7397 NA 7705 7835 8231 8973 10813 12820 14631 15893 16509 NA 17861 18103 18455 18986 20717 22534 23935 24343 23592
# 6 51 2 0 61192 NA 55281 55545 56564 57471 60145 61425 61683 61776 62477 NA 64401 64691 64689 64391 63158 61961 60751 59469 58045
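As a side note, in tidyr 1.0.0 and later spread() is superseded by pivot_wider(), e.g. `pivot_wider(dt, names_from = time, values_from = value)`. If you would rather avoid tidyr altogether, base R's `reshape()` does the same long-to-wide conversion. The sketch below uses a small made-up data frame with the same columns as `dt`; note that `reshape()` names the new columns `value.2000`, `value.2002`, and so on, and fills combinations that are missing in a given year with NA.

```r
# Made-up long data with the same columns as dt (geo, sex, age, time, value);
# age 80 has no row in 2000, mirroring the uneven age groups in the question.
long <- data.frame(
  geo   = c(51, 51, 51, 51, 51),
  sex   = c(1, 1, 1, 1, 1),
  age   = c(0, 5, 0, 5, 80),
  time  = c(2000, 2000, 2002, 2002, 2002),
  value = c(10, 20, 30, 40, 50)
)

# One row per geo/sex/age combination, one value.<year> column per year
wide <- reshape(long,
                idvar     = c("geo", "sex", "age"),
                timevar   = "time",
                direction = "wide")
wide
```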

dplyr data manipulation to multiply columns by a value

I have a dataframe that looks like the following (dput at the end):
region type age_group year value
AO1 p 0 1990 12
AO1 p 5 1990 10
AO1 p 10 1990 8
AO1 p 15 1990 14
AO1 p 20 1990 19
...
AO1 p 80 1990 12
AO1 p 1 1990 0.54
AO1 p 2 1990 0.46
AO1 p 3 1990 1
where the last three rows give the proportion of males (1), females (2) and the total (3).
What I would like to do is produce two more variables, value.m and value.f, by multiplying value by the corresponding proportion.
In this case, value.m would use 0.54 and value.f 0.46 for year 1990 in region AO1
dt$value.m <- dt %>%
group_by(region, type, age_num, year) %>%
mutate(value.m=value*???)
Any ideas?
dt <- structure(list(region = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L,
4L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 4L, 4L, 4L, 2L, 2L, 2L), .Label =
c("AO1", "AO11", "AO22", "AO3"), class = "factor"), age = structure(c(1L,
10L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 1L, 10L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 11L, 12L, 13L,
14L, 15L, 16L, 17L, 1L, 10L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 1L, 10L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 19L, 18L, 20L,
19L, 18L, 20L, 19L, 18L, 20L, 19L, 18L, 20L, 21L, 30L, 22L, 23L,
24L, 25L, 26L, 27L, 28L, 29L, 31L, 32L, 33L, 34L, 35L, 36L, 37L,
21L, 30L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 21L, 30L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 21L, 30L, 22L, 23L, 24L,
25L, 26L, 27L, 28L, 29L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 39L,
38L, 40L, 39L, 38L, 40L, 39L, 38L, 40L, 39L, 38L, 40L, 1L, 10L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 1L, 10L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 11L, 12L, 13L,
14L, 15L, 16L, 17L, 1L, 10L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 1L, 10L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 19L, 18L, 20L,
19L, 18L, 20L, 19L, 18L, 20L, 19L, 18L, 20L, 21L, 30L, 22L, 23L,
24L, 25L, 26L, 27L, 28L, 29L, 31L, 32L, 33L, 34L, 35L, 36L, 37L,
21L, 30L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 21L, 30L, 22L, 23L, 24L, 25L, 26L, 27L, 28L,
29L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 21L, 30L, 22L, 23L, 24L,
25L, 26L, 27L, 28L, 29L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 39L,
38L, 40L, 39L, 38L, 40L, 39L, 38L, 40L, 39L, 38L, 40L), .Label = c("c_0_4",
"c_10_14", "c_15_19", "c_20_24", "c_25_29", "c_30_34", "c_35_39",
"c_40_44", "c_45_49", "c_5_9", "c_50_54", "c_55_59", "c_60_64",
"c_65_69", "c_70_74", "c_75_79", "c_80+", "c_f", "c_m", "c_total_sex",
"p_0_4", "p_10_14", "p_15_19", "p_20_24", "p_25_29", "p_30_34",
"p_35_39", "p_40_44", "p_45_49", "p_5_9", "p_50_54", "p_55_59",
"p_60_64", "p_65_69", "p_70_74", "p_75_79", "p_80+", "p_f", "p_m",
"p_total_sex"), class = "factor"), age_num = c(0L, 5L, 10L, 15L,
20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L,
0L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L,
65L, 70L, 75L, 80L, 0L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L,
45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 0L, 5L, 10L, 15L, 20L,
25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 0L, 5L, 10L, 15L,
20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L,
0L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L,
65L, 70L, 75L, 80L, 0L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L,
45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 0L, 5L, 10L, 15L, 20L,
25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 0L, 5L, 10L, 15L,
20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L,
0L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L,
65L, 70L, 75L, 80L, 0L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L,
45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 0L, 5L, 10L, 15L, 20L,
25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 0L, 5L, 10L, 15L,
20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L,
0L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L,
65L, 70L, 75L, 80L, 0L, 5L, 10L, 15L, 20L, 25L, 30L, 35L, 40L,
45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 0L, 5L, 10L, 15L, 20L,
25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L, 65L, 70L, 75L, 80L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), year = c(2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L), value
= c(79.6, 55.1, 44.6, 44.3,
26.8, 9.5, 7.2, 6.5, 5.6, 2.4, 0.6, 5.2, 7.6, 10.4, 12, 13.5,
13.5, 42.4, 23.1, 14.7, 12.5, 3.9, 1.4, 2.4, 5, 4.2, 7, 7.6,
10.2, 9.5, 11.1, 12.1, 13.8, 14.1, 30.5, 18.1, 14.6, 7.6, 1.4,
3.3, 4.1, 6.9, 8, 9.9, 9.8, 13.5, 13.1, 14.1, 14.2, 14.6, 14.6,
60.1, 52.1, 52.5, 64.1, 45.5, 26.9, 10.6, 7.7, 8.7, 0.4, 0.5,
4.1, 8.8, 9.9, 12.4, 13.3, 14, 216.8, 227.6, 459.7, 115.8, 112.3,
243.5, 85, 87.9, 188.2, 241.6, 253.9, 510.8, 0.2, 0.15, 0.13,
0.13, 0.09, 0.053, 0.05, 0.05, 0.04, 0.03, 0.03, 0.024, 0, 0.01,
0.016, 0, 0, 0.22, 0.15, 0.12, 0.11, 0.07, 0.05, 0.05, 0.04,
0.04, 0.03, 0.03, 0.02, 0.02, 0.02, 0.01, 0.01, 0, 0.2, 0.19,
0.15, 0.11, 0.07, 0.06, 0.06, 0.04, 0.04, 0.03, 0.03, 0.01, 0.01,
0.01, 0.01, 0, 0, 0.14, 0.13, 0.13, 0.15, 0.12, 0.08, 0.05, 0.04,
0.05, 0.03, 0.03, 0.02, 0.01, 0.01, 0.01, 0, 0, 0.49, 0.51, 1,
0.51, 0.49, 1, 0.49, 0.51, 1, 0.49, 0.51, 1, 241.9, 175.54, 146.5,
138.46, 108.14, 73.94, 66.58, 64.78, 58.9, 43.86, 49.1, 36.5,
33.38, 25.54, 21.66, 18.42, 18.58, 243.74, 163.86, 130.22, 121.42,
96.1, 80.3, 63.9, 55.02, 49.02, 41.78, 51.74, 35.22, 32.66, 25.78,
23.06, 18.66, 18.14, 152.5, 109.9, 93.34, 82.62, 61.7, 56.06,
44.38, 38.26, 33.02, 29.58, 30.86, 21.86, 21.18, 17.62, 17.86,
15.86, 15.58, 196.82, 175.74, 180.46, 182.3, 153.22, 118.18,
81.34, 70.46, 65.82, 47.7, 54.66, 38.54, 29.42, 25.58, 20.38,
18.18, 17.18, 547.58, 566.78, 1100.38, 519.1, 522.78, 1028.06,
310.54, 322.26, 618.82, 619.62, 647.02, 1252.66, 0.206, 0.15,
0.126, 0.122, 0.088, 0.052, 0.05, 0.05, 0.04, 0.03, 0.032, 0.02,
0.02, 0.01, 0.01, 0, 0.002, 0.222, 0.15, 0.118, 0.108, 0.074,
0.054, 0.05, 0.04, 0.038, 0.028, 0.032, 0.02, 0.02, 0.018, 0.01,
0.008, 0, 0.23, 0.158, 0.142, 0.11, 0.074, 0.064, 0.056, 0.04,
0.038, 0.028, 0.03, 0.012, 0.01, 0.01, 0.01, 0, 0, 0.144, 0.132,
0.134, 0.14, 0.118, 0.082, 0.054, 0.042, 0.046, 0.028, 0.032,
0.02, 0.01, 0.01, 0.008, 0, 0, 0.49, 0.51, 1, 0.57, 0.43, 1,
0.4, 0.6, 1, 0.3, 0.7, 1)), .Names = c("region", "age", "age_num",
"year", "value"), class = "data.frame", row.names = c(NA, -320L))
Step 1: combine year and region into one variable (I work on the dt you posted with dput):
new.dt <- dt %>% mutate(regyear = paste(region, year))
Step 2: create a data frame containing only regyear and your p_m values:
p.m.s<-new.dt %>%
filter(age=='p_m') %>%
select(regyear, value) %>%
rename(pm=value) # to avoid duplicated names in new.dt and p.m.s
Step 3: do the same for the p_f values:
p.f.s<-new.dt %>% filter(age=='p_f') %>% select(regyear, value) %>% rename(pf=value)
Step 4: get what you need :)
new.dt %>%
left_join(p.m.s) %>% # add p_m's
left_join(p.f.s) %>% # add p_f's
mutate(value.m=value*pm, value.f=value*pf) %>%
select(-c(regyear,pm,pf)) # clean up
Hope this helped!
Hi, in the data you posted the variable type is called age, so be careful about that. With your data you can do it like this:
library(dplyr)
dt %>%
  left_join(dt %>% filter(age == "p_m") %>% select(region, year, p_m = value),
            by = c("region", "year")) %>%
  left_join(dt %>% filter(age == "p_f") %>% select(region, year, p_f = value),
            by = c("region", "year")) %>%
  mutate(value.m = value * p_m, value.f = value * p_f) %>%
  select(-c(p_m, p_f))
This code filters the p_m and p_f rows for each region and year and joins them to the original table.
Then it uses mutate to calculate the new values and drops the p_m and p_f columns.
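For comparison, the same lookup can also be done in base R with `match()` on a region-year key (the same idea as regyear in Step 1 above). This is a minimal sketch: the helper `add_sex_shares` and the toy data are hypothetical, and it assumes each region-year has exactly one p_m row and one p_f row.

```r
# Toy data in the same shape as the question's dt (values are made up)
dt_toy <- data.frame(
  region = c("AO1", "AO1", "AO1", "AO1", "AO3", "AO3", "AO3", "AO3"),
  age    = c("p_0_4", "p_5_9", "p_m", "p_f", "p_0_4", "p_5_9", "p_m", "p_f"),
  year   = 2006,
  value  = c(12, 10, 0.54, 0.46, 8, 6, 0.5, 0.5)
)

# Hypothetical helper: multiply every value by the male/female share
# of the same region and year.
add_sex_shares <- function(d) {
  key <- paste(d$region, d$year)   # region-year key for every row
  pm  <- d[d$age == "p_m", ]       # one male-share row per region-year
  pf  <- d[d$age == "p_f", ]       # one female-share row per region-year
  d$value.m <- d$value * pm$value[match(key, paste(pm$region, pm$year))]
  d$value.f <- d$value * pf$value[match(key, paste(pf$region, pf$year))]
  d
}

res <- add_sex_shares(dt_toy)
res
```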

How to change the order of the bars in accordance with the group variable in a barplot using lattice in R?

I am making a bar plot using lattice in R with data for 4 different years on sources of irrigation for different states. With my code the bar plot comes out fine, but I want the bar for 1996 to be plotted first, followed by the bar for 2001 and so on, to show the increasing area irrigated by tube-wells. However, I am unable to change the ordering. Here are my data and R code. Many thanks for your help.
# sample data
irr_atlas <- structure(list(state = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("ANDHRA PRADESH",
"KARNATAKA", "MADHYA PRADESH", "RAJASTHAN"), class = "factor"),
st_code = c(28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L,
28L, 28L, 28L, 28L, 28L, 28L, 28L, 29L, 29L, 29L, 29L, 29L,
29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 23L,
23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L,
23L, 23L, 23L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L), year = c(1996L, 1996L, 1996L, 1996L,
2001L, 2001L, 2001L, 2001L, 2006L, 2006L, 2006L, 2006L, 2011L,
2011L, 2011L, 2011L, 1996L, 1996L, 1996L, 1996L, 2001L, 2001L,
2001L, 2001L, 2006L, 2006L, 2006L, 2006L, 2011L, 2011L, 2011L,
2011L, 1996L, 1996L, 1996L, 1996L, 2001L, 2001L, 2001L, 2001L,
2006L, 2006L, 2006L, 2006L, 2011L, 2011L, 2011L, 2011L, 1996L,
1996L, 1996L, 1996L, 2001L, 2001L, 2001L, 2001L, 2006L, 2006L,
2006L, 2006L, 2011L, 2011L, 2011L, 2011L), irr_area = c(1.84066,
0.942819, 0.82886, 0.853502, 1.54922, 0.825659, 0.542492,
1.53412, 1.72969, 0.70271, 0.637221, 1.53894, 1.99893, 0.678425,
0.819829, 1.70708, 0.921594, 0.231669, 0.316999, 0.358529,
0.91339, 0.207157, 0.426549, 0.481061, 0.921255, 0.18192,
0.426145, 0.547193, 0.930802, 0.148065, 0.377149, 1.51843,
1.59425, 0.112145, 2.67683, 0.540054, 1.48056, 0.030502,
1.63696, 0.563948, 1.12595, 0.058667, 2.46494, 1.15004, 1.10444,
0.157069, 2.64378, 2.14177, 1.55814, 0.106623, 2.71347, 0.644683,
1.35746, 0.030586, 2.41845, 0.935234, 1.76933, 0.054374,
2.46197, 1.76918, 1.62587, 0.050299, 2.14737, 2.82708),irr_source = structure(c(1L,2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L,
1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L,
3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L,
4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L,
2L, 4L, 3L), .Label = c("Canal", "Tank", "Tube", "Well"), class = "factor")), .Names = c("state","st_code", "year", "irr_area", "irr_source"), class = "data.frame", row.names = c(NA, -64L))
Code for plot...
library(lattice)
barchart(~irr_area | factor(state) + factor(irr_source),
         groups=year, data=irr_atlas, auto.key=list(space="right"))
As mentioned, the order of groups in R graphics is usually determined by the order of the factor levels. So you can reorder them with factor() and its levels argument.
library(lattice)
barchart(~irr_area | factor(state) + factor(irr_source),
         groups=factor(year, levels=sort(unique(year), decreasing=TRUE)), # change the order of years
         data=irr_atlas, auto.key=list(space="right"))
You can switch back to the other order by setting decreasing=FALSE.
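Because lattice draws groups in the order of the factor levels, it can help to check those levels directly. This base-R sketch uses a hypothetical vector of years (not the irr_atlas data) to show the default sorted order, the reversed order used in the answer, and an arbitrary custom order.

```r
# Hypothetical years in arbitrary order, for illustration only
years <- c(2011L, 1996L, 2006L, 2001L, 1996L)

# Default: factor() sorts the unique values, so levels run 1996, 2001, 2006, 2011
f_default <- factor(years)
print(levels(f_default))

# Reversed: supply the levels explicitly, as in the answer's groups= argument
f_rev <- factor(years, levels = sort(unique(years), decreasing = TRUE))
print(levels(f_rev))

# Any custom order works too, e.g. put 2006 first
f_custom <- factor(years, levels = c(2006L, 1996L, 2001L, 2011L))
print(levels(f_custom))
```

Only the level order changes; the values themselves stay the same, so passing any of these factors as groups changes the bar order without altering the data.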

Manual annotate a ggplot with different labels, in different facets

JD Long helped me with this: question about manual annotation.
But is it possible to do something similar on a facetted plot, such that the label style corresponds to the line style (aesthetics), and in a way that lets me annotate different facets individually?
Some data:
funny <- structure(list(Institution = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("Q-branch",
"Some-Ville", "Spectre"), class = "factor"), Type = structure(c(5L,
6L, 1L, 3L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L,
6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L,
6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L, 5L, 6L, 2L, 4L), .Label = c("Korte videregående uddannelser",
"Mammas beer", "Mellemlange videregående uddannelser", "Tastes good",
"Unknown", "Your"), class = "factor"), År = c(2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L,
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2006L, 2006L,
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L,
2006L), Mndr = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 27L, 27L, 27L,
27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L), Data = c(159L,
NA, NA, 23L, 204L, NA, NA, 12L, 256L, NA, NA, 24L, 166L, 6L,
NA, 43L, 228L, NA, NA, 20L, 196L, 11L, NA, 37L, 99L, 14L, 9L,
96L, 147L, 7L, 5L, 91L, 100L, 10L, 7L, 126L, 60L, 17L, 6L, 106L,
78L, 18L, 13L, 140L, 48L, 23L, 5L, 136L)), .Names = c("Institution",
"Type", "År", "Mndr", "Data"), class = "data.frame", row.names = c(NA,
-48L))
And a facetted plot:
ggplot(funny, aes(Mndr, y=Data, group=Type, col=Type)) +
geom_line() +
facet_grid(.~Institution)
Thanks in advance for your help!
The idea is that for each manual annotation you have to define not only the label, but all the variables that define the panel, color, etc. The following code adds two labels in different panels.
library(ggplot2)
pl <- ggplot(funny, aes(Mndr, y=Data, group=Type, col=Type)) + geom_line() +
  facet_grid(.~Institution) # your plot
nd <- data.frame(Institution=c("Q-branch","Some-Ville"), #panel
Type=c("Unknown", "Tastes good"), #color
Mndr=c(7,12), #x-coordinate of label
Data= c(170,50), #y-coordinate of label
Text=c("Label 1", "Label 2")) #label text
# add labels to plot:
pl <- pl + geom_text(aes(label=Text), data=nd, hjust=0, show.legend=FALSE)
pl
The show.legend=FALSE option ensures that the small a's denoting the text are not added to the legend (older ggplot2 versions used legend=FALSE for this). You don't need a data frame for the labels; you could add a separate geom_text for each one, but I find this way simpler.
