How to set a column name as a row value in R

I have this type of table in R
April Tourist
2018 123
2018 222
I want my table to look like this:
Month Year Domestic International Total
April 2018 123 222 345
I am new to R. I tried using melt and the rownames() function, but I could not get the result I wanted.

Based on your comment that you only have 2 rows in your data set, here's a way to do this with dplyr and tidyr:
library(dplyr)
library(tidyr)
df <- tibble(April = c(2018, 2018),
             Tourist = c(123, 222))
df %>%
  # label the two rows; assumes row 1 is domestic and row 2 is international
  mutate(Type = c("Domestic", "International")) %>%
  gather(Month, Year, April) %>%
  spread(Type, Tourist) %>%
  mutate(
    Total = Domestic + International
  )
# A tibble: 1 x 5
Month Year Domestic International Total
<chr> <dbl> <dbl> <dbl> <dbl>
1 April 2018 123 222 345
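For comparison, the same reshaping can be sketched in base R without any packages. This is only an illustration under the same assumption as above: the data has exactly two rows, the first being the domestic count and the second the international count. The object name result is hypothetical.

```r
# Input as shown in the question: the column name "April" is really a value
df <- data.frame(April = c(2018, 2018), Tourist = c(123, 222))

# Assumption: row 1 is Domestic and row 2 is International
result <- data.frame(
  Month         = names(df)[1],   # "April", taken from the column name
  Year          = df[[1]][1],     # 2018, taken from the first column
  Domestic      = df$Tourist[1],
  International = df$Tourist[2]
)
result$Total <- result$Domestic + result$International
print(result)
```

With more than two rows per month this positional approach would no longer be safe, and an explicit Type column (as in the dplyr answer) would be needed.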

Related

How best to calculate a year over year difference in R

Below is the sample code. The task at hand is to create a year over year difference (2021 q4 value - 2020 q4 value) for only the fourth quarter and percentage difference. Desired result is below. Usually I would do a pivot_wider and such. However, how does one do this and not take all quarters into account?
year <- c(2020,2020,2020,2020,2021,2021,2021,2021,2020,2020,2020,2020,2021,2021,2021,2021)
qtr <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
area <- c(1012,1012,1012,1012,1012,1012,1012,1012,1402,1402,1402,1402,1402,1402,1402,1402)
employment <- c(100,102,104,106,108,110,114,111,52,54,56,59,61,66,65,49)
test1 <- data.frame (year,qtr,area,employment)
area difference percentage
1012 5 4.7%
1402 -10 -16.9%
You would use filter on the quarter:
library(dplyr)
test1 |>
  filter(qtr == 4) |>
  arrange(year) |>  # lag() relies on rows being in year order
  group_by(area) |>
  mutate(employment_lag = lag(employment),
         diff = employment - employment_lag) |>
  na.omit() |>
  ungroup() |>
  mutate(percentage = diff / employment_lag)
Output:
# A tibble: 2 × 7
year qtr area employment employment_lag diff percentage
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021 4 1012 111 106 5 0.0472
2 2021 4 1402 49 59 -10 -0.169
Update: Adding correct percentage.
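If you prefer to avoid lag(), the same fourth-quarter comparison can be sketched in base R by putting the two years side by side with merge(). This is a minimal sketch that assumes exactly the years 2020 and 2021 are being compared; the names prev, curr and yoy are hypothetical.

```r
# Rebuild the sample data from the question
year <- rep(rep(c(2020, 2021), each = 4), times = 2)
qtr <- rep(1:4, times = 4)
area <- rep(c(1012, 1402), each = 8)
employment <- c(100, 102, 104, 106, 108, 110, 114, 111,
                52, 54, 56, 59, 61, 66, 65, 49)
test1 <- data.frame(year, qtr, area, employment)

# Keep only the fourth quarter, then split by year
q4 <- test1[test1$qtr == 4, ]
prev <- q4[q4$year == 2020, c("area", "employment")]
curr <- q4[q4$year == 2021, c("area", "employment")]

# One row per area, with both years as columns
yoy <- merge(prev, curr, by = "area", suffixes = c("_2020", "_2021"))
yoy$difference <- yoy$employment_2021 - yoy$employment_2020
yoy$percentage <- round(100 * yoy$difference / yoy$employment_2020, 1)
print(yoy[, c("area", "difference", "percentage")])
```

Joining on area makes the pairing explicit, so the result does not depend on the row order of the input.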

Reshaping multiple long columns into wide column format in R

My sample dataset has multiple columns that I want to convert into wide format. I have tried using the dcast function, but I get an error. Below is my sample dataset:
df2 = data.frame(emp_id = c(rep(1,2), rep(2,4),rep(3,3)),
Name = c(rep("John",2), rep("Kellie",4), rep("Steve",3)),
Year = c("2018","2019","2018","2018","2019","2019","2018","2019","2019"),
Type = c(rep("Salaried",2), rep("Hourly", 2), rep("Salaried",2),"Hourly",rep("Salaried",2)),
Dept = c("Sales","IT","Sales","Sales", rep("IT",3),rep("Sales",2)),
Salary = c(100,1000,95,95,1500,1500,90,1200,1200))
I'm expecting my output to look like:
One option is the pivot_wider() function from the tidyr package:
df.wide <- tidyr::pivot_wider(df2,
                              names_from = c("Type", "Dept", "Year"),
                              values_from = "Salary",
                              values_fn = mean)
This should get you the desired result.
What do you think about this output? It is not the expected output, but I find it easier to interpret:
df2 %>%
group_by(Name, Year, Type, Dept) %>%
summarise(mean = mean(Salary))
Output:
Name Year Type Dept mean
<chr> <chr> <chr> <chr> <dbl>
1 John 2018 Salaried Sales 100
2 John 2019 Salaried IT 1000
3 Kellie 2018 Hourly Sales 95
4 Kellie 2019 Salaried IT 1500
5 Steve 2018 Hourly IT 90
6 Steve 2019 Salaried Sales 1200

How to do a frequency table where column values are variables?

I have a data frame named JOB with 4 columns: PERSON_ID, JOB, FT (1 for full time, 2 for part time) and YEAR. Every person can have only 1 full-time job per year in this data frame. This is the full-time job from which they got most of their income during the year.
DF
PERSON_ID JOB FT YEAR
1 Analyst 1 2018
1 Analyst 1 2019
1 Analyst 1 2020
2 Coach 1 2018
2 Coach 1 2019
2 Analyst 1 2020
3 Gardener 1 2020
4 Coach 1 2018
4 Coach 1 2019
4 Analyst 1 2020
4 Coach 2 2019
4 Gardener 2 2019
I want to get frequencies that answer the following question:
What full-time job changes occurred between 2019 and 2020?
I want to look only at changes where FT=1.
I want my end table to look like this:
2019 2020 frequency
Analyst Analyst 1
Coach Analyst 2
NA Gardener 1
I want to look at the data so that I can say that 2 people moved from a coaching job to an analyst job, 1 analyst did not change jobs, and 1 person entered the labour market as a gardener.
I tried to fiddle around with the table function but did not even get close to what I wanted. I could not get the years to go into separate variables.
10 bonus points if it can be done in base R :)
Thank you for your help.
Not pretty but worked:
# split df by year
df_2019 <- df[df$YEAR %in% c(2019) & df$FT == 1, ]
df_2020 <- df[df$YEAR %in% c(2020) & df$FT == 1, ]
# rename Job columns
df_2019$JOB_2019 <- df_2019$JOB
df_2020$JOB_2020 <- df_2020$JOB
# select needed columns
df_2019 <- df_2019[, c("PERSON_ID", "JOB_2019")]
df_2020 <- df_2020[, c("PERSON_ID", "JOB_2020")]
# merge dfs
df2 <- merge(df_2019, df_2020, by = "PERSON_ID", all = TRUE)
df2$frequency <- 1
df2$JOB_2019 <- addNA(df2$JOB_2019)
df2$JOB_2020 <- addNA(df2$JOB_2020)
# aggregate frequency
aggregate(frequency ~ JOB_2019 + JOB_2020, data = df2, FUN = sum, na.action=na.pass)
JOB_2019 JOB_2020 frequency
1 Analyst Analyst 1
2 Coach Analyst 2
3 <NA> Gardener 1
Not base R, but it works:
library(dplyr)
library(tidyr)
data %>%
filter(FT==1, YEAR %in% c(2019, 2020)) %>%
group_by(YEAR, JOB, PERSON_ID) %>%
tally() %>%
pivot_wider(names_from = YEAR, values_from = JOB) %>%
select(-PERSON_ID) %>%
group_by(`2019`, `2020`) %>%
summarise(n = n())
`2019` `2020` n
<chr> <chr> <int>
1 Analyst Analyst 1
2 Coach Analyst 2
3 NA Gardener 1
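Since the question mentions table(), here is a minimal base R sketch of how it could be used once the two years sit in separate columns. The data frame is rebuilt from the question; the names ft, jobs_2019, jobs_2020, wide and counts are hypothetical, and the useNA = "ifany" argument is what keeps the person with no 2019 job.

```r
# Sample data from the question
df <- data.frame(
  PERSON_ID = c(1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 4),
  JOB = c("Analyst", "Analyst", "Analyst", "Coach", "Coach", "Analyst",
          "Gardener", "Coach", "Coach", "Analyst", "Coach", "Gardener"),
  FT = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2),
  YEAR = c(2018, 2019, 2020, 2018, 2019, 2020, 2020, 2018, 2019, 2020, 2019, 2019)
)

# Full-time jobs only, one small data frame per year
ft <- df[df$FT == 1, ]
jobs_2019 <- ft[ft$YEAR == 2019, c("PERSON_ID", "JOB")]
jobs_2020 <- ft[ft$YEAR == 2020, c("PERSON_ID", "JOB")]

# Full outer join so people missing in one year are kept with NA
wide <- merge(jobs_2019, jobs_2020, by = "PERSON_ID", all = TRUE,
              suffixes = c("_2019", "_2020"))

# Cross-tabulate, counting NA as its own category, then drop empty cells
counts <- as.data.frame(
  table(JOB_2019 = wide$JOB_2019, JOB_2020 = wide$JOB_2020, useNA = "ifany"),
  responseName = "frequency", stringsAsFactors = FALSE
)
counts <- counts[counts$frequency > 0, ]
print(counts)
```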

Transforming a data frame in R: how to go from date ranges to years (one row per ID per year)

I need to transform my data frame from the current format to the new format (see the image or the structure below), and I have no idea how to accomplish that. I want one row per year for each ID, from 2013 to 2018 (so each ID has 6 rows). The dates are when the person started living at that address (entry date) and when they left that address (end date). Each ID and year should give the zip code and city where they lived. If they lived in more than one place in a year, it should be the place where they lived the longest that year. I've already set the end date to 31-12-2018 if they still live there (shown here with NA). Below are a picture and the first 3 rows. Hopefully you can help me out!
Current format:
ID (1, 1, 2)
ZIPCODE (1234AB, 5678CD, 9012EF)
CITY (NEWYORK, LA, MIAMI)
ENTRY_DATE (2-1-2014, 13-3-2017, 10-11-2011)
END_DATE (13-5-2017, 21-12-2018, 6-9-2017)
New format:
ID (1, 1, 1, 1, 1, 1, 2)
YEAR (2013, 2014, 2015, 2016, 2017, 2018, 2013)
ZIPCODE (NA, 1234AB, 1234AB, 1234AB, 5678CD, 5678CD, 9012EF)
CITY (NA, NEWYORK, NEWYORK, NEWYORK, LA, LA, MIAMI)
See link below
Here is one approach.
First, create a date interval for each location from its entry date to its end date. Using map2 and unnest, you then create one additional row for each calendar year the interval touches.
Since you want the location where the person spent the greatest number of days in each calendar year, look at the overlap between two intervals: one is the calendar year, and the other runs from ENTRY_DATE to END_DATE. For each year you can filter by max(WEEKS); to ensure a single address per year, arrange in descending order by WEEKS and use slice(1), or use slice_max from recent versions of dplyr. This keeps the row with the longest overlap between the two intervals. Note that WEEKS holds a lubridate duration (stored in seconds), despite its name.
The final complete ensures that every ID has a row for each year from 2013 to 2018.
library(tidyverse)
library(lubridate)
df %>%
  mutate(ENTRY_END_INT = interval(ENTRY_DATE, END_DATE),
         YEAR = map2(year(ENTRY_DATE), year(END_DATE), seq)) %>%
  unnest(YEAR) %>%
  mutate(YEAR_INT = interval(as.Date(paste0(YEAR, '-01-01')), as.Date(paste0(YEAR, '-12-31'))),
         WEEKS = as.duration(intersect(ENTRY_END_INT, YEAR_INT))) %>%
  group_by(ID, YEAR) %>%
  arrange(desc(WEEKS)) %>%
  slice(1) %>%
  group_by(ID) %>%
  complete(YEAR = seq(2013, 2018, 1)) %>%
  arrange(ID, YEAR) %>%
  select(-c(ENTRY_DATE, END_DATE, ENTRY_END_INT, YEAR_INT, WEEKS))
Output
# A tibble: 14 x 4
# Groups: ID [2]
ID YEAR ZIPCODE CITY
<dbl> <dbl> <chr> <chr>
1 1 2013 NA NA
2 1 2014 1234AB NEWYORK
3 1 2015 1234AB NEWYORK
4 1 2016 1234AB NEWYORK
5 1 2017 5678CD LA
6 1 2018 5678CD LA
7 2 2011 9012EF MIAMI
8 2 2012 9012EF MIAMI
9 2 2013 9012EF MIAMI
10 2 2014 9012EF MIAMI
11 2 2015 9012EF MIAMI
12 2 2016 9012EF MIAMI
13 2 2017 9012EF MIAMI
14 2 2018 NA NA
Data
df <- structure(list(ID = c(1, 1, 2), ZIPCODE = c("1234AB", "5678CD",
"9012EF"), CITY = c("NEWYORK", "LA", "MIAMI"), ENTRY_DATE = structure(c(16072,
17238, 15288), class = "Date"), END_DATE = structure(c(17299,
17896, 17415), class = "Date")), class = "data.frame", row.names = c(NA,
-3L))
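To make the overlap idea in this answer more concrete, here is a small base R sketch that counts the days a residence period shares with one calendar year and picks the address with the most days. It is only an illustration, not the lubridate code above: the helper overlap_days is hypothetical, and the dates are the two ID 1 rows as written in the question.

```r
# Days (inclusive) that a residence period shares with a calendar year.
# Hypothetical helper for illustration; vectorised over entry and end.
overlap_days <- function(entry, end, year) {
  year_start <- as.Date(paste0(year, "-01-01"))
  year_end   <- as.Date(paste0(year, "-12-31"))
  start <- pmax(entry, year_start)   # later of the two start dates
  stop  <- pmin(end, year_end)       # earlier of the two end dates
  pmax(as.numeric(stop - start) + 1, 0)  # no overlap gives 0, not a negative
}

# The two addresses of ID 1 from the question
entry <- as.Date(c("2014-01-02", "2017-03-13"))
end   <- as.Date(c("2017-05-13", "2018-12-21"))
city  <- c("NEWYORK", "LA")

days_2017 <- overlap_days(entry, end, 2017)
print(days_2017)
print(city[which.max(days_2017)])
```

For 2017 the second address has the larger overlap, which is why the output above shows LA for that year.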

How to replace numeric month with a month's full name

I want to change a column that holds the month as a number into the full month name, using the tidyverse. Please bear in mind that even though this sample has only four months, my real dataset contains every month of the year.
I am new to the tidyverse.
mydata <- tibble(camp = c("Platinum 2018-03","Reboarding 2018","New Acct Auto Jul18", "Loan2019-4"),
Acct = c(1, 33, 6, 43),
Balance = c(222, 7744, 949, 123),
Month = c(1,4,6,8))
I expect the output to be January, April, June, August, etc. Thanks for your help.
R comes with a month.name vector which should be ok as long as you only need English names.
mydata %>% mutate(MonthName = month.name[Month])
giving:
# A tibble: 4 x 5
camp Acct Balance Month MonthName
<chr> <dbl> <dbl> <dbl> <chr>
1 Platinum 2018-03 1 222 1 January
2 Reboarding 2018 33 7744 4 April
3 New Acct Auto Jul18 6 949 6 June
4 Loan2019-4 43 123 8 August
Other Languages
If you need other languages use this code (or omit as.character to get ordered factor output):
library(lubridate)
Sys.setlocale(locale = "French")
mydata %>% mutate(MonthName = as.character(month(Month, label = TRUE, abbr = FALSE)))
giving:
# A tibble: 4 x 5
camp Acct Balance Month MonthName
<chr> <dbl> <dbl> <dbl> <chr>
1 Platinum 2018-03 1 222 1 janvier
2 Reboarding 2018 33 7744 4 avril
3 New Acct Auto Jul18 6 949 6 juin
4 Loan2019-4 43 123 8 août
A dplyr-lubridate solution:
mydata %>%
mutate(Month = lubridate::month(Month, label = TRUE, abbr = FALSE))
# A tibble: 4 x 4
camp Acct Balance Month
<chr> <dbl> <dbl> <ord>
1 Platinum 2018-03 1 222 January
2 Reboarding 2018 33 7744 April
3 New Acct Auto Jul18 6 949 June
4 Loan2019-4 43 123 August
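The month.name lookup also works without any packages. The sketch below uses a reduced, hypothetical version of mydata (only two of its columns) and additionally builds an ordered factor, so that months sort in calendar order rather than alphabetically, similar to what lubridate returns with label = TRUE.

```r
# Reduced version of the question's data (only the columns needed here)
mydata <- data.frame(Acct = c(1, 33, 6, 43), Month = c(1, 4, 6, 8))

# month.name is a built-in character vector: "January", ..., "December"
mydata$MonthName <- month.name[mydata$Month]

# Optional: ordered factor so sorting follows the calendar
mydata$MonthFactor <- factor(mydata$MonthName, levels = month.name, ordered = TRUE)
print(mydata)
```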

Resources