My data is here:
x <- data.frame("Year" = c(1945,1945,1945,1946,1946,1946, 1947,1947,1947), "Age" = c(1,2,3,1,2,3,1,2,3), "Value" = c(4,5,6,7,8,9,10,11,12))
For each row, I would like to assign the Value from the row with Year + 1 and Age + 1 to a new variable. For example, for Year = 1945 and Age = 1, the new variable should be 8 (the Value at Year = 1946, Age = 2).
My ideal result would look like this:
x <- data.frame("Year" = c(1945,1945,1945,1946,1946,1946, 1947,1947,1947), "Age" = c(1,2,3,1,2,3,1,2,3), "Value" = c(4,5,6,7,8,9,10,11,12),"Year1moereandAge1more"= c(8,9,NA, 11, 12, NA, NA, NA,NA))
Thank you for helping a beginner.
Using a modified self-join:
library(dplyr)
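x %>%
  # shift each row's keys back by one so its Value lines up with (Year - 1, Age - 1)
  transmute(Year = Year - 1, Age = Age - 1, Year1moereandAge1more = Value) %>%
  # keep every row of x; rows without a shifted partner get NA
  right_join(x, by = c("Year", "Age")) %>%
  arrange(Year, Age)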
# Year Age Year1moereandAge1more Value
# 1 1945 1 8 4
# 2 1945 2 9 5
# 3 1945 3 NA 6
# 4 1946 1 11 7
# 5 1946 2 12 8
# 6 1946 3 NA 9
# 7 1947 1 NA 10
# 8 1947 2 NA 11
# 9 1947 3 NA 12
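For comparison, the same lookup can be sketched in base R with match() instead of a join. This is only an illustrative alternative, not the method used in the answer; the names row_key and lookup_key are made up for the example, and it assumes each Year/Age pair occurs at most once.

```r
x <- data.frame(
  Year  = c(1945, 1945, 1945, 1946, 1946, 1946, 1947, 1947, 1947),
  Age   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Value = c(4, 5, 6, 7, 8, 9, 10, 11, 12)
)

# Key identifying each existing row (hypothetical helper name).
row_key <- paste(x$Year, x$Age)
# Key of the row to look up: one year later, one age older.
lookup_key <- paste(x$Year + 1, x$Age + 1)

# match() returns NA where no such row exists, so Value[NA] becomes NA.
x$Year1moereandAge1more <- x$Value[match(lookup_key, row_key)]
x
```

Unlike the join, this keeps the original row order, so no final sort is needed.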
I have a data frame that looks like this:
id = c("1", "2", "3")
IN1999 = c(1, 1, 0)
IN2000 = c(1, 0, 1)
TEST1999 = c(10, 12, NA)
TEST2000 = c(15, NA, 11)
df <- data.frame(id, IN1999, IN2000, TEST1999, TEST2000)
I am trying to use pivot_longer to change it into this form:
id year IN TEST
1 1 1999 1 10
2 1 2000 1 15
3 2 1999 1 12
4 2 2000 0 NA
5 3 1999 0 NA
6 3 2000 1 11
My current code looks like this:
df %>%
  pivot_longer(cols = !id, names_to = c(".value", "year"),
               names_sep = 4)
but obviously, with names_sep = 4, R cuts IN1999 and IN2000 at the wrong place. How can I set the argument so that R separates the column name from the last four digits?
The names_sep argument in pivot_longer also accepts a regular expression. A lookahead such as (?=\\d{4}) matches the empty position just before a run of four digits, so each name is split there without consuming any characters, as in the example below:
library(tidyr)
df |>
  pivot_longer(cols = !id, names_to = c(".value", "year"),
               names_sep = "(?=\\d{4})")
Output:
# A tibble: 6 × 4
id year IN TEST
<chr> <chr> <dbl> <dbl>
1 1 1999 1 10
2 1 2000 1 15
3 2 1999 1 12
4 2 2000 0 NA
5 3 1999 0 NA
6 3 2000 1 11
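To see what the split is meant to produce, the two parts of each column name can also be pulled apart in base R. This is only an illustration of the intended split, not a description of how pivot_longer works internally; the variable names are made up for the example.

```r
col_names <- c("IN1999", "IN2000", "TEST1999", "TEST2000")

# Drop the trailing four digits to get the part that becomes the value column.
value_part <- sub("\\d{4}$", "", col_names)
# Drop the leading non-digits to get the year.
year_part <- sub("^\\D+", "", col_names)

data.frame(col_names, value_part, year_part)
```

Both IN and TEST are handled by the same pattern because the split is defined by the digits at the end, not by a fixed character position.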
Below is the sample data, in the form in which I receive it. Each row is a quarter, and the months of that quarter are columns. I am trying to do some month-over-month calculations, and I think I need to transform the data frame first. I assume pivot_longer is the way to go, but I have not found a similar example online. The desired result is shown below.
year<-c(2018,2018,2018,2018,2019,2019,2019,2019,2020,2020,2020,2020)
qtr<-c(1,2,3,4,1,2,3,4,1,2,3,4)
avgemp <-c(3,5,7,9,11,13,15,17,19,21,23,25)
month1emp<-c(2,4,6,8,10,12,14,16,18,20,22,24)
month2emp<-c(3,5,7,9,11,13,15,17,19,21,23,25)
month3emp<-c(4,6,8,10,12,14,16,18,20,22,24,26)
sample<-data.frame(year,qtr,month1emp,month2emp,month3emp)
Desired Result
year qtr month employment
2018 1 1 2
2018 1 2 3
2018 1 3 4
2018 2 4 4
2018 2 5 5
2018 2 6 6
and so on. In 2019, the month value would restart and again run from 1 to 12.
We could use pivot_longer on the 'month' columns and specify names_pattern so that the digits (captured by (\\d+)) become the 'month' column and the trailing emp (captured by (emp)) becomes the .value column:
library(dplyr)
library(tidyr)
sample %>%
  pivot_longer(cols = starts_with('month'),
               names_to = c("month", ".value"), names_pattern = ".*(\\d+)(emp)") %>%
  rename(employment = emp)
Output:
# A tibble: 36 x 4
year qtr month employment
<dbl> <dbl> <chr> <dbl>
1 2018 1 1 2
2 2018 1 2 3
3 2018 1 3 4
4 2018 2 1 4
5 2018 2 2 5
6 2018 2 3 6
7 2018 3 1 6
8 2018 3 2 7
9 2018 3 3 8
10 2018 4 1 8
# … with 26 more rows
If we need to increment the 'month' based on the 'qtr' value:
sample %>%
  pivot_longer(cols = starts_with('month'),
               names_to = c("month", ".value"), names_pattern = ".*(\\d+)(emp)") %>%
  rename(employment = emp) %>%
  mutate(month = as.integer(month) + c(0, 3, 6, 9)[qtr])
Output:
# A tibble: 36 x 4
year qtr month employment
<dbl> <dbl> <dbl> <dbl>
1 2018 1 1 2
2 2018 1 2 3
3 2018 1 3 4
4 2018 2 4 4
5 2018 2 5 5
6 2018 2 6 6
7 2018 3 7 6
8 2018 3 8 7
9 2018 3 9 8
10 2018 4 10 8
# … with 26 more rows
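The offset c(0, 3, 6, 9)[qtr] is the same as (qtr - 1) * 3. A tiny standalone check, using made-up input vectors rather than the sample data:

```r
qtr          <- c(1, 1, 1, 2, 2, 2, 4)
month_in_qtr <- c(1, 2, 3, 1, 2, 3, 3)

# Lookup-table form: index a vector of offsets by quarter.
by_lookup  <- month_in_qtr + c(0, 3, 6, 9)[qtr]
# Equivalent arithmetic form.
by_formula <- month_in_qtr + (qtr - 1) * 3

by_lookup
# [1]  1  2  3  4  5  6 12
```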
Base R solution:
# Create a vector of boolean values,
# denoting whether or not the columns should
# be unpivoted: unpivot_cols => boolean vector
unpivot_cols <- startsWith(
  names(sample),
  "month"
)
# Reshape the data.frame, calculate
# the month value: rshpd_df => data.frame
rshpd_df <- transform(
  reshape(
    sample,
    direction = "long",
    varying = names(sample)[unpivot_cols],
    ids = NULL,
    timevar = "month",
    times = seq_len(sum(unpivot_cols)),
    v.names = "employment",
    # one row name per row of the long result
    new.row.names = seq_len(
      nrow(sample) * sum(unpivot_cols)
    )
  ),
  month = ((12 / 4) * (qtr - 1)) + month
)
# Order the data.frame by year and month:
# ordered_df => data.frame
ordered_df <- with(
  rshpd_df,
  rshpd_df[order(year, month),]
)
I have a data frame like this:
my_df <- data.frame(
year = c("2018","2018","2017","2017", "2016","2016"),
my_month = c(6,7,8,9,4,5),
val=c(5,9,3,2,1,1))
> my_df
year my_month val
1 2018 6 5
2 2018 7 9
3 2017 8 3
4 2017 9 2
5 2016 4 1
6 2016 5 1
I need a data frame like this:
my_df_2 <- data.frame(
year = c("2018","2018","2017","2017", "2016","2016"),
my_month = c(6,7,8,9,4,5),
val=c(5,9,3,2,1,1),
pre_month = c(NA,4,NA,-1,NA,0))
> my_df_2
year my_month val pre_month
1 2018 6 5 NA
2 2018 7 9 4
3 2017 8 3 NA
4 2017 9 2 -1
5 2016 4 1 NA
6 2016 5 1 0
Basically, the "pre_month" column is created, within each year, by taking the "val" of the current "my_month" row and subtracting the "val" of the previous month. So for 7-2018 this gives 9 - 5 = 4, and so on. The first month of each year has no previous month, so it gets NA.
Thank you for your help.
Here's a solution using tidyverse.
my_df <- data.frame(
year = c("2018","2018","2017","2017", "2016","2016"),
my_month = c(6,7,8,9,4,5),
val=c(5,9,3,2,1,1))
library(tidyverse)
my_df %>%
  mutate(year = as.numeric(year)) %>%
  group_by(year) %>%
  arrange(my_month) %>%
  # change from the previous month's val within each year; the first month has no predecessor
  mutate(pre_month = c(NA, diff(val))) %>%
  arrange(desc(year))
I changed year to numeric so that it can be sorted sensibly.
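A base R sketch of the same idea uses ave() to apply the difference within each year. This is an alternative under the assumption that rows are first sorted by year and month; sorted_df is a name chosen for the example.

```r
my_df <- data.frame(
  year     = c("2018", "2018", "2017", "2017", "2016", "2016"),
  my_month = c(6, 7, 8, 9, 4, 5),
  val      = c(5, 9, 3, 2, 1, 1)
)

# Sort so that diff() compares each month with the one before it.
sorted_df <- my_df[order(my_df$year, my_df$my_month), ]

# ave() applies the function separately to the val entries of each year.
sorted_df$pre_month <- ave(
  sorted_df$val,
  sorted_df$year,
  FUN = function(v) c(NA, diff(v))
)
sorted_df
```

Here the result is ordered by ascending year; reorder it afterwards if the descending layout from the question is needed.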
I need to convert my quarterly data to monthly data by dividing one variable by 3.
Example dataset:
df <- data.frame(Year = c(2018,2019,2020), qtr = c(1,3,2),
amount = c(3,6,12), variable = c(5,6,7))
df
What I would need is to get months for every quarter, i.e. the final dataset would look like this:
data.frame(Year = c(2018,2018,2018,2019,2019,2019,2020,2020,2020),
qtr = c(1,2,3,7,8,9,4,5,6),
amount = c(1,1,1,2,2,2,4,4,4),
variable = c(5,5,5,6,6,6,7,7,7))
Also, a bonus question: how do I print the data frames in this environment?
Does this work?
library(dplyr)
library(tidyr)
library(purrr)
df %>%
  mutate(qtr_start_mth = case_when(qtr == 1 ~ 1,
                                   qtr == 2 ~ 4,
                                   qtr == 3 ~ 7,
                                   qtr == 4 ~ 10),
         qtr_end_mth = case_when(qtr == 1 ~ 3,
                                 qtr == 2 ~ 6,
                                 qtr == 3 ~ 9,
                                 qtr == 4 ~ 12)) %>%
  # one list element per row holding the three month numbers of the quarter
  mutate(month = map2(qtr_start_mth, qtr_end_mth, `:`)) %>%
  unnest(month) %>%
  mutate(amount = amount / 3) %>%
  select(Year, qtr, amount, variable, month)
# A tibble: 9 x 5
Year qtr amount variable month
<dbl> <dbl> <dbl> <dbl> <int>
1 2018 1 1 5 1
2 2018 1 1 5 2
3 2018 1 1 5 3
4 2019 3 2 6 7
5 2019 3 2 6 8
6 2019 3 2 6 9
7 2020 2 4 7 4
8 2020 2 4 7 5
9 2020 2 4 7 6
Data used:
> dput(df)
structure(list(Year = c(2018, 2019, 2020), qtr = c(1, 3, 2),
amount = c(3, 6, 12), variable = c(5, 6, 7)), class = "data.frame", row.names = c(NA,
-3L))
Using base:
do.call(rbind,
        c(make.row.names = FALSE,
          lapply(split(df, df$Year), function(i){
            cbind(i, month = 1:3 + (i$qtr - 1) * 3, row.names = NULL)
          })))
# Year qtr amount variable month
# 1 2018 1 3 5 1
# 2 2018 1 3 5 2
# 3 2018 1 3 5 3
# 4 2019 3 6 6 7
# 5 2019 3 6 6 8
# 6 2019 3 6 6 9
# 7 2020 2 12 7 4
# 8 2020 2 12 7 5
# 9 2020 2 12 7 6
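A variant sketch that expands the rows with rep() and also divides amount by 3, as the question asks. The names quarterly and monthly are chosen for the example, and it assumes every quarter contributes exactly three months.

```r
quarterly <- data.frame(
  Year     = c(2018, 2019, 2020),
  qtr      = c(1, 3, 2),
  amount   = c(3, 6, 12),
  variable = c(5, 6, 7)
)

# Repeat every quarterly row three times, once per month.
monthly <- quarterly[rep(seq_len(nrow(quarterly)), each = 3), ]
rownames(monthly) <- NULL

# Month number within the year: offset of the quarter plus 1, 2, 3.
monthly$month <- (monthly$qtr - 1) * 3 + rep(1:3, times = nrow(quarterly))
# Spread the quarterly amount evenly over its three months.
monthly$amount <- monthly$amount / 3
monthly
```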
I have a df looking like this:
Department ID Category.Department Category.ID
NA NA NA NA
Sales 101 2 4
Sales 101 2 4
NA NA NA NA
Sales 101 2 4
Sales 101 2 4
NA NA NA NA
Sales 101 2 4
Sales 101 2 4
df = data.frame(Department = rep(c(NA, 'Sales', 'Sales'), times = 3),
ID = rep(c(NA, 101, 101), times = 3),
Category.Department = rep(c(NA, 2, 2), times = 3),
Category.ID = rep(c(NA, 4, 4), times = 3), stringsAsFactors = FALSE)
I would like an output like the one below, with Department and ID stacked in a single column and the matching category values in another. The NA rows are important because they separate the groups.
New.Col Category
NA NA
Sales 2
101 4
NA NA
Sales 2
101 4
NA NA
Sales 2
101 4
So far I have tried transpose, sapply and a custom function, but none of them worked as I expected. Any suggestions in base R?
It is hard to give an exact answer without the true expected output, but here is an attempt:
df$group <- rep(1:3, times = 3)
df2 <- reshape(df[df$group != 3,], direction = "long", varying = list(New.col = c(1,2), Category = c(3,4)),
idvar = "id", v.names = c("New.col", "Category"))
df3 <- df2[order(df2$id),]
df3[!(df3$time == 1 & df3$group == 1), c(3,4)]
New.col Category
1.2 <NA> NA
2.1 Sales 2
2.2 101 4
3.2 <NA> NA
4.1 Sales 2
4.2 101 4
5.2 <NA> NA
6.1 Sales 2
6.2 101 4
Here is a different approach from casting to long format, which relies on coalesce. In addition, I created a group variable and removed the NA rows, as they will not serve a purpose in your analysis:
library(tidyverse)
df %>%
  # a row that is entirely NA starts a new group
  group_by(grp = cumsum(rowSums(is.na(.)) == ncol(.))) %>%
  # shift the ID columns down and the Department columns up so that they no longer overlap
  mutate(across(contains('ID'), lag)) %>%
  mutate(across(contains('Department'), lead)) %>%
  mutate(new.col = coalesce(Department, as.character(ID)),
         category = coalesce(Category.Department, Category.ID)) %>%
  select(grp, new.col, category) %>%
  distinct()
which gives,
# A tibble: 6 x 3
# Groups: grp [3]
grp new.col category
<int> <chr> <dbl>
1 1 Sales 2
2 1 101 4
3 2 Sales 2
4 2 101 4
5 3 Sales 2
6 3 101 4
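Since the question asked for base R, the same grouping idea can be sketched without any packages. This is only one possible way to do it; the names all_na, grp, rows and result are made up for the example, and it assumes that the non-NA rows within a group are identical, as in the sample data.

```r
df <- data.frame(
  Department          = rep(c(NA, "Sales", "Sales"), times = 3),
  ID                  = rep(c(NA, 101, 101), times = 3),
  Category.Department = rep(c(NA, 2, 2), times = 3),
  Category.ID         = rep(c(NA, 4, 4), times = 3),
  stringsAsFactors    = FALSE
)

# Rows that are entirely NA mark the start of a new group.
all_na <- rowSums(is.na(df)) == ncol(df)
grp <- cumsum(all_na)

# Keep one copy of each distinct non-NA row per group.
rows <- unique(cbind(grp = grp[!all_na], df[!all_na, ]))

# Stack Department over ID and Category.Department over Category.ID.
result <- do.call(rbind, lapply(seq_len(nrow(rows)), function(i) {
  data.frame(
    grp      = rows$grp[i],
    new.col  = c(rows$Department[i], as.character(rows$ID[i])),
    category = c(rows$Category.Department[i], rows$Category.ID[i])
  )
}))
result
```

As in the tidyverse version, the grp column takes over the separating role of the all-NA rows.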