Grouping months by winter season instead of year in R

I have the following data frame:
year <- c(1949, 1950, 1950, 1950, 1951, 1951, 1951, 1952, 1952, 1952, 1953, 1953, 1953)
month <- c(12, 1, 2, 12, 1, 2, 12, 1, 2, 12, 1, 2, 12)
df <- data.frame(year, month)
df
year month
1 1949 12
2 1950 1
3 1950 2
4 1950 12
5 1951 1
6 1951 2
7 1951 12
8 1952 1
9 1952 2
10 1952 12
11 1953 1
12 1953 2
13 1953 12
where month 1 is January and month 12 is December. Now I would like to group the rows by winter season. For example, month 12 of 1949 should be grouped with months 1 and 2 of 1950, because they belong to the same winter season. The ideal outcome would be:
year month winterseason
1 1949 12 1
2 1950 1 1
3 1950 2 1
4 1950 12 2
5 1951 1 2
6 1951 2 2
7 1951 12 3
8 1952 1 3
9 1952 2 3
10 1952 12 4
11 1953 1 4
12 1953 2 4
13 1953 12 5
Any ideas?

If the data is already sorted by year and month, you can count the Decembers cumulatively:
df$winterseason <- cumsum(df$month == 12)
df$winterseason
#[1] 1 1 1 2 2 2 3 3 3 4 4 4 5
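A minimal sketch of the same idea that does not rely on the rows already being in order. It assumes the earliest row is a December, and the shuffled example data is made up for illustration:

```r
# Hypothetical, deliberately unsorted example data
year  <- c(1950, 1949, 1950, 1950, 1951, 1951)
month <- c(2, 12, 1, 12, 2, 1)
df <- data.frame(year, month)

# Sort chronologically first, then start a new season at every December
df <- df[order(df$year, df$month), ]
df$winterseason <- cumsum(df$month == 12)
df$winterseason
#[1] 1 1 1 2 2 2
```

If the first row after sorting is a January or February, the first season would be numbered 0 with this approach.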

This labels each season with a "yearqtr" class object giving the year and quarter of the last month of each winter. We convert the year and month to a "yearmon" class object and add 1/12, which shifts each month forward by one month, and then convert the result to a "yearqtr" class object.
library(zoo)
transform(df, season = as.yearqtr(as.yearmon(paste(year, month, sep = "-")) + 1/12))
giving:
year month season
1 1949 12 1950 Q1
2 1950 1 1950 Q1
3 1950 2 1950 Q1
4 1950 12 1951 Q1
5 1951 1 1951 Q1
6 1951 2 1951 Q1
7 1951 12 1952 Q1
8 1952 1 1952 Q1
9 1952 2 1952 Q1
10 1952 12 1953 Q1
11 1953 1 1953 Q1
12 1953 2 1953 Q1
13 1953 12 1954 Q1
Note that if season is a variable containing the season column values, then as.integer(season) and cycle(season) can be used to extract the year and quarter numbers. So, for example, if there were also non-winter rows, then cycle(season) == 1 would identify the rows that fall in winter.
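For comparison, here is a rough base R sketch of the same "shift December into the next year" idea without zoo. The column names season_year and is_winter, and the extra March row, are assumptions added for illustration:

```r
# Hypothetical data including one non-winter row (March 1950)
df <- data.frame(year  = c(1949, 1950, 1950, 1950, 1950),
                 month = c(12, 1, 2, 3, 12))

# December belongs to the winter that ends in the following year
df$season_year <- df$year + (df$month == 12)

# Flag the rows that fall in December, January or February
df$is_winter <- df$month %in% c(12, 1, 2)
df
```

Here season_year plays the role of as.integer(season), and is_winter plays the role of cycle(season) == 1.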

Try
year <- c(1949, 1950, 1950, 1950, 1951, 1951, 1951, 1952, 1952, 1952, 1953, 1953, 1953)
month <- c(12, 1, 2, 12, 1, 2, 12, 1, 2, 12, 1, 2, 12)
df <- data.frame(year, month)
df$season <- ifelse(month == 12,year+1,year) - min(year)
This is not very elegant, but it produces your ideal outcome:
year month season
1 1949 12 1
2 1950 1 1
3 1950 2 1
4 1950 12 2
5 1951 1 2
6 1951 2 2
7 1951 12 3
8 1952 1 3
9 1952 2 3
10 1952 12 4
11 1953 1 4
12 1953 2 4
13 1953 12 5
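A variation on this, sketched under the assumption that you want consecutive season numbers even when rows are unordered or some winters are missing entirely. The example data is made up, and match() against the sorted unique winter years is my own substitution for subtracting min(year):

```r
# Hypothetical unordered data with no rows for one of the months
df <- data.frame(year  = c(1950, 1949, 1952, 1950, 1951),
                 month = c(1, 12, 2, 12, 1))

# Assign December to the following year, as in the answer above
winter_year <- with(df, ifelse(month == 12, year + 1, year))

# Number the distinct winter years 1, 2, 3, ... in chronological order
df$season <- match(winter_year, sort(unique(winter_year)))
df$season
#[1] 1 1 3 2 2
```

Because the label is computed from year and month alone, it does not depend on the row order.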

Here is an alternative using magrittr and data.table:
library(magrittr)  # provides %>%
df$winterYear <- ifelse(month %in% c(11,12),year+1,year) %>% data.table::rleidv()
result:
year month winterYear
1 1949 12 1
2 1950 1 1
3 1950 2 1
4 1950 12 2
5 1951 1 2
6 1951 2 2
7 1951 12 3
8 1952 1 3
9 1952 2 3
10 1952 12 4
11 1953 1 4
12 1953 2 4
13 1953 12 5
Side note: To be safe, you should sort your data by year and month first.

Related

Combining two or more columns into one [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 9 months ago.
I have four columns:
Year I I I
1951 4 6 7
1952 8 0 3
1953 3 2 9
How do I combine them into two columns, like this?
Year I
1951 4
1951 6
1951 7
1952 8
1952 0
1952 3
1953 3
1953 2
1953 9
I am not completely sure what your format is, but you probably want something like this, using pivot_longer to reshape your table from wide to long:
df <- data.frame(Year = c(1951, 1952, 1953),
                 v1 = c(4, 8, 3),
                 v2 = c(6, 0, 2),
                 v3 = c(7, 3, 9))
library(tidyr)
df %>%
  pivot_longer(!Year, names_to = "variable", values_to = "values")
Output:
# A tibble: 9 × 3
Year variable values
<dbl> <chr> <dbl>
1 1951 v1 4
2 1951 v2 6
3 1951 v3 7
4 1952 v1 8
5 1952 v2 0
6 1952 v3 3
7 1953 v1 3
8 1953 v2 2
9 1953 v3 9
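If you only need the two columns from the question and would rather avoid extra packages, a base R sketch along these lines should work. The names v1 to v3 and long are placeholders, as in the answer above:

```r
df <- data.frame(Year = c(1951, 1952, 1953),
                 v1 = c(4, 8, 3),
                 v2 = c(6, 0, 2),
                 v3 = c(7, 3, 9))

n_values <- ncol(df) - 1  # number of value columns per year

# Transposing puts each year's values in one matrix column, so
# as.vector() reads them out year by year
long <- data.frame(Year = rep(df$Year, each = n_values),
                   I    = as.vector(t(df[-1])))
long
```

This relies on all value columns having the same type, since t() converts the data frame to a matrix.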

R: create or delete rows given a range of values [duplicate]

This question already has answers here:
Complete dataframe with missing combinations of values
(2 answers)
Fill missing combinations in a dataframe
(1 answer)
Closed 1 year ago.
I have the following dataset with country, year, and GDP:
What I have
Country Year GDP
Afghanistan 1950 $123
Afghanistan 1951 $123
Afghanistan 2019 $123
Australia 1945 $123
Australia 2021 $123
And what I need is to create or delete rows so each country has rows from 1948 to 2021. So, for example, for Afghanistan I need to create rows 1948 to 1949 and 2021 with a null GDP, and for Australia delete the 1945 row and create everything in between.
This isn't my exact dataset; I have 200+ countries, each with different years. Is there an easy way to do this?
What I need
Country Year GDP
Afghanistan 1948 NA
... ... ...
Afghanistan 2021 NA
Australia 1948 $123
... ... ...
Australia 2021 $123
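We can use complete to create the missing combinations and fill the missing GDP values with 0:
library(dplyr)
library(tidyr)
# fill must be passed by name; GDP is assumed to be numeric in df1
complete(df1, Country, Year = 1948:2021, fill = list(GDP = 0)) %>%
  arrange(Country)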
We can use complete, then filter, and finally replace_na.
library(dplyr)
library(tidyr)  # complete() and replace_na() come from tidyr
df <- read.table(header = TRUE, text = "Country Year GDP
Afghanistan 1950 $123
Afghanistan 1951 $123
Afghanistan 2019 $123
Australia 1945 $123
Australia 2021 $123")
df <- df %>%
  complete(Year = 1948:2021, Country) %>%
  filter(between(Year, 1948, 2021)) %>%
  # GDP is a character column here, so the replacement must be a string too
  replace_na(list(GDP = "0")) %>%
  arrange(Country)
head(df)
tail(df)
> print(head(df))
# A tibble: 6 x 3
Year Country GDP
<int> <chr> <chr>
1 1948 Afghanistan 0
2 1949 Afghanistan 0
3 1950 Afghanistan $123
4 1951 Afghanistan $123
5 1952 Afghanistan 0
6 1953 Afghanistan 0
> print(tail(df))
# A tibble: 6 x 3
Year Country GDP
<int> <chr> <chr>
1 2016 Australia 0
2 2017 Australia 0
3 2018 Australia 0
4 2019 Australia 0
5 2020 Australia 0
6 2021 Australia $123
Created on 2021-09-26 by the reprex package (v2.0.1)
library(tidyr)
library(dplyr)
df <-
tibble::tribble(
~Country, ~Year, ~GDP,
"Afghanistan", 1950L, "$123",
"Afghanistan", 1951L, "$123",
"Afghanistan", 2019L, "$123",
"Australia", 1945L, "$123",
"Australia", 2021L, "$123"
)
df %>%
filter(Year >= 1948 & Year <= 2021) %>%
complete(Year = 1948:2021,Country) %>%
arrange(Country)
# A tibble: 148 x 3
Year Country GDP
<int> <chr> <chr>
1 1948 Afghanistan NA
2 1949 Afghanistan NA
3 1950 Afghanistan $123
4 1951 Afghanistan $123
5 1952 Afghanistan NA
6 1953 Afghanistan NA
7 1954 Afghanistan NA
8 1955 Afghanistan NA
9 1956 Afghanistan NA
10 1957 Afghanistan NA
# ... with 138 more rows
Here is a solution with complete and coalesce
library(dplyr)
library(tidyr)
df %>%
complete(Year = 1948:2021, Country) %>%
arrange(Country, Year) %>%
mutate(GDP = coalesce(GDP, "0"))
# A tibble: 149 x 3
Year Country GDP
<int> <chr> <chr>
1 1948 Afghanistan 0
2 1949 Afghanistan 0
3 1950 Afghanistan $123
4 1951 Afghanistan $123
5 1952 Afghanistan 0
6 1953 Afghanistan 0
7 1954 Afghanistan 0
8 1955 Afghanistan 0
9 1956 Afghanistan 0
10 1957 Afghanistan 0
# … with 139 more rows
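For reference, the same "build the full grid, then join" idea can be sketched in base R with expand.grid and merge. The names grid and out are placeholders. Because this is a left join onto the grid, rows outside 1948 to 2021 (such as Australia 1945) are dropped automatically:

```r
df <- data.frame(
  Country = c("Afghanistan", "Afghanistan", "Afghanistan", "Australia", "Australia"),
  Year    = c(1950L, 1951L, 2019L, 1945L, 2021L),
  GDP     = c("$123", "$123", "$123", "$123", "$123"),
  stringsAsFactors = FALSE
)

# Every Country/Year combination we want in the result
grid <- expand.grid(Country = unique(df$Country), Year = 1948:2021,
                    stringsAsFactors = FALSE)

# Keep all grid rows; years without data get NA for GDP
out <- merge(grid, df, by = c("Country", "Year"), all.x = TRUE)
out <- out[order(out$Country, out$Year), ]
head(out)
```

The missing GDP values stay NA here; they can be replaced afterwards if a placeholder value is preferred.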

Reshape dataframe in R using dcast or ftable [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 3 years ago.
I currently have a data frame that looks like this.
country2<-c("Afghanistan","Afghanistan","Afghanistan")
continent2<-c("Asia","Asia","Asia")
series<-c('lifeexp','pop','gdp')
y1901<-c('1','3','100')
y1902<-c('2','4','101')
y1903<-c('2','4','101')
y1904<-c('2','4','101')
y1905<-c('2','4','101')
y1906<-c('2','4','101')
y1907<-c('2','4','101')
df<-data.frame(country2,continent2,series,y1901,y1902,y1903,y1904,y1905,y1906,y1907)
country2 continent2 series y1901 y1902 y1903 y1904 y1905 y1906 y1907
1 Afghanistan Asia lifeexp 1 2 2 2 2 2 2
2 Afghanistan Asia pop 3 4 4 4 4 4 4
3 Afghanistan Asia gdp 100 101 101 101 101 101 101
How can I reshape this data so that it will look like this?
country<-c("Afghanistan","Afghanistan","Afghanistan","Afghanistan","Afghanistan","Afghanistan","Afghanistan")
continent<-c("Asia","Asia","Asia","Asia","Asia","Asia","Asia")
year<-c("1901","1902","1903","1904","1905","1906","1907")
lifeexp<-c("1","2","2","2","2","2","2")
pop<-c('3','4','4','4','4','4','4')
gdp<-c('100','101','101','101','101','101','101')
df<-data.frame(country,continent,year,lifeexp,pop,gdp)
country continent year lifeexp pop gdp
1 Afghanistan Asia 1901 1 3 100
2 Afghanistan Asia 1902 2 4 101
3 Afghanistan Asia 1903 2 4 101
4 Afghanistan Asia 1904 2 4 101
5 Afghanistan Asia 1905 2 4 101
6 Afghanistan Asia 1906 2 4 101
7 Afghanistan Asia 1907 2 4 101
I have tried using dcast from the reshape2 package to reshape the data, but I can only enter one column for value.var.
dcast(df,country+region~series,value.var ='y1901',fun.aggregate = sum)
I also tried using ftable and xtabs but I'm still not sure how to enter more than 1 column for the value. The code below gives an error.
ftable(xtabs(c(y2000,y2001)~country+region+series,df))
Thanks
A data.table approach using melt and dcast could be
library(data.table)
setDT(df)
dcast(melt(df,measure = patterns("^y\\d+")),country2 + continent2 + variable~series)
# country2 continent2 variable gdp lifeexp pop
#1: Afghanistan Asia y1901 100 1 3
#2: Afghanistan Asia y1902 101 2 4
#3: Afghanistan Asia y1903 101 2 4
#4: Afghanistan Asia y1904 101 2 4
#5: Afghanistan Asia y1905 101 2 4
#6: Afghanistan Asia y1906 101 2 4
#7: Afghanistan Asia y1907 101 2 4
I know that you are looking for a solution with ftable or dcast but just for your knowledge, you can achieve it using tidyr:
library(tidyverse)
df %>%
pivot_longer(., cols = starts_with("y190"), names_to = "year", values_to = "Value") %>%
pivot_wider(., names_from = "series", values_from = "Value") %>%
mutate(year = gsub("y","", year)) %>%
rename(country = country2, continent = continent2)
# A tibble: 7 x 6
country continent year lifeexp pop gdp
<fct> <fct> <chr> <fct> <fct> <fct>
1 Afghanistan Asia 1901 1 3 100
2 Afghanistan Asia 1902 2 4 101
3 Afghanistan Asia 1903 2 4 101
4 Afghanistan Asia 1904 2 4 101
5 Afghanistan Asia 1905 2 4 101
6 Afghanistan Asia 1906 2 4 101
7 Afghanistan Asia 1907 2 4 101
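If you prefer to stay in base R, the same two steps (wide to long over the year columns, then long to wide over series) can be sketched with stats::reshape. This is only an illustration: the intermediate names long and wide are mine, and the resulting value columns are prefixed with "value." rather than named exactly as in the desired output:

```r
df <- data.frame(country2 = "Afghanistan", continent2 = "Asia",
                 series = c("lifeexp", "pop", "gdp"),
                 y1901 = c("1", "3", "100"), y1902 = c("2", "4", "101"),
                 y1903 = c("2", "4", "101"), y1904 = c("2", "4", "101"),
                 y1905 = c("2", "4", "101"), y1906 = c("2", "4", "101"),
                 y1907 = c("2", "4", "101"),
                 stringsAsFactors = FALSE)

year_cols <- grep("^y\\d+$", names(df), value = TRUE)

# Step 1: stack the year columns into one 'value' column
long <- reshape(df, direction = "long",
                varying = year_cols, v.names = "value",
                timevar = "year", times = sub("^y", "", year_cols),
                idvar = c("country2", "continent2", "series"))

# Step 2: spread 'series' back out into one column per series
wide <- reshape(long, direction = "wide",
                idvar = c("country2", "continent2", "year"),
                timevar = "series", v.names = "value")
rownames(wide) <- NULL
wide
```

The columns can then be renamed, for example with sub("^value\\.", "", names(wide)).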

What is the best way to create a new column based on two conditions?

I have 60 years of daily weather data and want to label each winter (i.e. 1-60). Because winters cross years, there's no way to subset or write a simple ifelse statement using just the months. A nested ifelse statement specifying both month and year for each of the 60 years seems impractical. Is there a better way to do this?
Here is just an example with three years.
month<-c(11,12,1,2,3,4,11,12,1,2,3,4,11,12,1,2,3,4)
year<-c(1950,1950,1951,1951,1951,1951,1951,1951,1952,1952,1952,1952,1952,1952,1953,1953,1953,1953)
df<-cbind(month,year)
df<-as.data.frame(df)
I want the dates between Nov. 1950 and April 1951 to all be labeled 1 in a new column. The dates between Nov. 1951 - April 1952 labeled 2, etc.
I would like the final dataframe to look something like this:
month year winter
1 11 1950 1
2 12 1950 1
3 1 1951 1
4 2 1951 1
5 3 1951 1
6 4 1951 1
7 11 1951 2
8 12 1951 2
9 1 1952 2
10 2 1952 2
11 3 1952 2
12 4 1952 2
13 11 1952 3
14 12 1952 3
15 1 1953 3
16 2 1953 3
17 3 1953 3
18 4 1953 3
Any thoughts on a simple way to do this, as I have 60 years of daily data for over 30 weather stations?
Use cumsum like this:
transform(df, winter = cumsum(month == 11))
giving:
month year winter
1 11 1950 1
2 12 1950 1
3 1 1951 1
4 2 1951 1
5 3 1951 1
6 4 1951 1
7 11 1951 2
8 12 1951 2
9 1 1952 2
10 2 1952 2
11 3 1952 2
12 4 1952 2
13 11 1952 3
14 12 1952 3
15 1 1953 3
16 2 1953 3
17 3 1953 3
18 4 1953 3
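One caveat for daily data: month == 11 is TRUE for every November day, so the cumulative sum would increase on each of those rows. A sketch that labels by the year in which the winter starts avoids this. It assumes only November to April rows are present, and it simulates repeated observations per month by duplicating the example rows:

```r
month <- c(11,12,1,2,3,4,11,12,1,2,3,4,11,12,1,2,3,4)
year  <- c(1950,1950,1951,1951,1951,1951,1951,1951,1952,1952,1952,1952,
           1952,1952,1953,1953,1953,1953)
df <- data.frame(month, year)

# Simulate several observations per month by repeating each row twice
df_daily <- df[rep(seq_len(nrow(df)), each = 2), ]

# November and December start a winter; January to April belong to the
# winter that started in the previous calendar year
start_year <- ifelse(df_daily$month >= 11, df_daily$year, df_daily$year - 1)
df_daily$winter <- start_year - min(start_year) + 1
table(df_daily$winter)
```

Since the label depends only on month and year, it also works per weather station without any grouping or sorting.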

Merging datasets based on more than 1 column in both datasets

I'm trying to merge two datasets by year and country. The first dataset (df = GNIPC) represents gross national income per capita for every country from 1980 to 2008.
Country Year GNIpc
(chr) (dbl) (dbl)
1 Afghanistan 1990 NA
2 Afghanistan 1991 NA
3 Afghanistan 1992 2010
4 Afghanistan 1993 NA
5 Afghanistan 1994 12550
6 Afghanistan 1995 NA
The second dataset (df = sanctions) represents the imposition of economic sanctions from 1946 to present day.
country imposition sanctiontype sanctions_period
(chr) (dbl) (chr) (chr)
1 Afghanistan 1 1 6 8 1997-2001
2 Afghanistan 1 7 1979-1979
3 Afghanistan 1 4 7 1995-2002
4 Albania 1 2 8 2005-2005
5 Albania 1 7 2005-2006
6 Albania 1 8 2004-2005
I would like to merge the two datasets so that for every GNI year I can see whether sanctions were present in the country or not. For the GNI years that are not in the sanctions_period the value would be 0, and for those that are it would be 1. This is what I want it to look like:
Country Year GNIpc Imposition sanctiontype
(chr) (dbl) (dbl) (dbl) (chr)
1 Afghanistan 1990 NA 0 NA
2 Afghanistan 1991 NA 0 NA
3 Afghanistan 1992 2010 0 NA
4 Afghanistan 1993 NA 0 NA
5 Afghanistan 1994 12550 0 NA
6 Afghanistan 1995 NA 1 4 7
Some example data:
df1 <- data.frame(country = c('Afghanistan', 'Turkey'),
imposition = c(1, 0),
sanctiontype = c('1 6 8', '4'),
sanctions_period = c('1997-2001', '2012-ongoing')
)
country imposition sanctiontype sanctions_period
1 Afghanistan 1 1 6 8 1997-2001
2 Turkey 0 4 2012-ongoing
The "sanctions_period" column can be transformed with dplyr and tidyr:
library(tidyr)
library(dplyr)
df.new <- separate(df1, sanctions_period, c('start', 'end'), remove = FALSE) %>%
  mutate(end = ifelse(end == 'ongoing', '2016', end)) %>%
  mutate(start = as.numeric(start), end = as.numeric(end)) %>%
  group_by(country, sanctions_period) %>%
  do(data.frame(country = .$country, imposition = .$imposition, sanctiontype = .$sanctiontype, year = .$start:.$end))
sanctions_period country imposition sanctiontype year
<fctr> <fctr> <dbl> <fctr> <int>
1 1997-2001 Afghanistan 1 1 6 8 1997
2 1997-2001 Afghanistan 1 1 6 8 1998
3 1997-2001 Afghanistan 1 1 6 8 1999
4 1997-2001 Afghanistan 1 1 6 8 2000
5 1997-2001 Afghanistan 1 1 6 8 2001
6 2012-ongoing Turkey 0 4 2012
7 2012-ongoing Turkey 0 4 2013
8 2012-ongoing Turkey 0 4 2014
9 2012-ongoing Turkey 0 4 2015
10 2012-ongoing Turkey 0 4 2016
From there, it should be easy to merge with your first data frame. Note that your first data frame capitalizes Country and Year, while the second doesn't.
df.merged <- merge(df.first, df.new, by.x = c('Country', 'Year'), by.y = c('country', 'year'))
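Using dplyr, once the sanctions data has been expanded to one row per country and year (as in df.new above):
left_join(GNIPC, df.new, by = c("Country" = "country", "Year" = "year")) %>%
  select(Country, Year, GNIpc, imposition, sanctiontype)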
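Putting the pieces together, here is a small base R sketch of the whole flow: expand each sanctions period into one row per year, left-join onto the GNI data, and set imposition to 0 where no sanctions matched. The toy data reuses a few rows from the question, the names gni and expanded are placeholders, and periods ending in "ongoing" are not handled here:

```r
gni <- data.frame(Country = "Afghanistan", Year = 1990:1995,
                  GNIpc = c(NA, NA, 2010, NA, 12550, NA),
                  stringsAsFactors = FALSE)
sanctions <- data.frame(country = "Afghanistan", imposition = 1,
                        sanctiontype = "4 7", sanctions_period = "1995-2002",
                        stringsAsFactors = FALSE)

# Split "start-end" into numeric bounds
bounds  <- strsplit(sanctions$sanctions_period, "-", fixed = TRUE)
start   <- as.integer(sapply(bounds, `[`, 1))
end     <- as.integer(sapply(bounds, `[`, 2))
n_years <- end - start + 1

# One row per country and sanctioned year
expanded <- data.frame(Country      = rep(sanctions$country, n_years),
                       Year         = unlist(Map(seq, start, end)),
                       imposition   = rep(sanctions$imposition, n_years),
                       sanctiontype = rep(sanctions$sanctiontype, n_years),
                       stringsAsFactors = FALSE)

# Left join keeps every GNI year; unmatched years get imposition 0
merged <- merge(gni, expanded, by = c("Country", "Year"), all.x = TRUE)
merged$imposition[is.na(merged$imposition)] <- 0
merged
```

If several sanctions periods overlap for the same country and year, this produces one row per matching period, so you may want to aggregate before merging.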

Resources