I have R code that takes raw data where each patient entry is one row, and sums it up for a 'frequency' column for each department by date.
What I used here was the code:
department_totals <- as.data.frame(count(sheet, c("Date", "Department")))
To get:
Department  Date    Frequency
Dental      14 Mar  5
Dental      15 Mar  3
Dental      16 Mar  2
Cardio      14 Mar  4
Cardio      15 Mar  7
Cardio      16 Mar  8
Physio      14 Mar  1
Physio      16 Mar  2
But for this new project, I need it to be the actual individual departments by date, like this:
Date    Dental  Cardio  Physio
14 Mar  5       4       1
15 Mar  3       7       blank
16 Mar  2       8       2
And I can't figure out how to do it. I can group by department, but I'm trying to make each unique value in 'Department' its own column, holding the frequency for that department, with one row per date, ordered by date.
The intent here is to be able to make line graphs of how each of these departments' frequency of patients changes over time.
library(tidyverse)
df %>% pivot_wider(names_from = Department, values_from = Frequency)
# A tibble: 3 x 4
Date Dental Cardio Physio
<chr> <int> <int> <int>
1 14_Mar 5 4 1
2 15_Mar 3 7 NA
3 16_Mar 2 8 2
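Since the goal is line graphs, the NA for Physio on 15 Mar may or may not be what you want. A minimal self-contained sketch, with the data typed in from the question's first table, that fills missing combinations with 0 through the values_fill argument; whether 0 is the right reading of a missing department/day is an assumption about the data, not something the question confirms:

```r
library(tidyr)

# Long-format counts, typed in by hand from the question's first table
df <- data.frame(
  Department = c("Dental", "Dental", "Dental",
                 "Cardio", "Cardio", "Cardio",
                 "Physio", "Physio"),
  Date       = c("14 Mar", "15 Mar", "16 Mar",
                 "14 Mar", "15 Mar", "16 Mar",
                 "14 Mar", "16 Mar"),
  Frequency  = c(5, 3, 2, 4, 7, 8, 1, 2)
)

# One column per department; combinations absent from df become 0 instead of NA
wide <- pivot_wider(
  df,
  names_from  = Department,
  values_from = Frequency,
  values_fill = 0
)

wide
```

Dropping values_fill gives the NA shown in the output above.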
I want to extract the last three weeks of data for each household_id and channel combination. Those weeks must fall before the week given by mala_fide_week and mala_fide_year for that combination.
Below is the dataset:
For example, for Household_id 100 and channel A, mala_fide_week is 42 and mala_fide_year is 2021, so the three records must fall before week 42 of 2021. This is determined from the week and year columns.
For the Household_id 100 and channel B combination, there are only two records, both well before mala_fide_week and mala_fide_year.
For Household_id 101 and channel C, two years are involved, 2019 and 2020.
The final dataset will be as below
Household_id 102 is not considered as week and year is greater than mala_fide_week and mala_fide_year.
I am trying multiple options but not getting through. Any help is much appreciated!
sample dataset:
data <- data.frame(
  Household_id = c(100,100,100,100,100,100,101,101,101,101,102,102),
  channel = c("A","A","A","A","B","B","C","C","C","C","D","D"),
  duration = c(12,34,567,67,34,67,98,23,56,89,73,76),
  mala_fide_week = c(42,42,42,42,42,42,5,5,5,5,30,30),
  mala_fide_year = c(2021,2021,2021,2021,2021,2021,2020,2020,2020,2020,2021,2021),
  week = c(36,37,38,39,22,23,51,52,1,2,38,39),
  year = c(2021,2021,2021,2021,2020,2020,2019,2019,2020,2020,2021,2021))
I think you first need to convert each week/year pair to an absolute week number, week + 52 * year, then filter on it. slice_tail then gets the last three rows of each group.
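library(dplyr)
data |>
  # keep only rows strictly before the mala fide week (week index = week + 52 * year)
  filter(week + 52 * year < mala_fide_week + 52 * mala_fide_year) |>
  group_by(Household_id, channel) |>
  arrange(year, week, .by_group = TRUE) |>
  # last three weeks per household/channel
  slice_tail(n = 3)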
# A tibble: 8 x 7
# Groups: Household_id, channel [3]
Household_id channel duration mala_fide_week mala_fide_year week year
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 100 A 34 42 2021 37 2021
2 100 A 567 42 2021 38 2021
3 100 A 67 42 2021 39 2021
4 100 B 34 42 2021 22 2020
5 100 B 67 42 2021 23 2020
6 101 C 23 5 2020 52 2019
7 101 C 56 5 2020 1 2020
8 101 C 89 5 2020 2 2020
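To see why the week + 52 * year index handles the year boundary for Household_id 101, here is a small base R sketch of the arithmetic. The helper name week_index is made up for illustration and is not part of the answer's code:

```r
# Hypothetical helper: the absolute week index used in the filter above
week_index <- function(week, year) week + 52 * year

# Household_id 101, channel C: the cutoff is week 5 of 2020
cutoff <- week_index(5, 2020)

# The four observed weeks for that group, spanning 2019 and 2020
obs <- week_index(c(51, 52, 1, 2), c(2019, 2019, 2020, 2020))

# Weeks before the cutoff: week 52 of 2019 and week 1 of 2020 are consecutive
cutoff - obs
# [1] 6 5 4 3

# Caveat: the index assumes every year has exactly 52 weeks, so a week 53
# collides with week 1 of the following year
week_index(53, 2020) == week_index(1, 2021)
# [1] TRUE
```

The sample data has no week 53, so the collision does not matter here. If your real data can contain one, a date-based comparison would be safer.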
I am trying to extract the team with the most wins each year in women's college basketball. So far I have the number of wins for each team in each year, but I want only the team with the maximum number of wins in each year.
winsbyyear <- WomenCBnewdf %>%
group_by(Year,Team)%>%
summarise(totalwinsyr = sum(Outcome))
Output currently looks like this, but I am expecting to see each year only once with the team with the maximum number of wins in the subsequent columns
Year Team totalwinsyr
<fct> <chr> <dbl>
1 2014 AbileneChristian 10
2 2014 AirForce 0
3 2014 Akron 18
4 2014 Alabama 10
5 2014 AlabamaAM 3
6 2014 AlabamaHuntsville 0
7 2014 AlabamaMobile 0
8 2014 AlabamaSt 15
9 2014 AlaskaAnchorage 1
10 2014 AlbanyNY 16
How to select the rows with maximum values in each group with dplyr?
I have already looked here but I could not find any resources to help with a group_by() with multiple values
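Sum the wins per team and year first, then regroup by year alone and keep the row with the maximum. Filtering while still grouped by both Year and Team would keep every row, because each group then holds a single total:
winsbyyear <- WomenCBnewdf %>%
  group_by(Year, Team) %>%
  summarise(totalwinsyr = sum(Outcome), .groups = "drop") %>%
  group_by(Year) %>%
  filter(totalwinsyr == max(totalwinsyr))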
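A minimal runnable sketch of the same idea on made-up data, since WomenCBnewdf is not shown in the question. The games data frame and its values are invented for illustration (only the column names follow the question), and slice_max() is used in place of the filter() comparison:

```r
library(dplyr)

# Invented game-level data: one row per game, Outcome is 1 for a win, 0 for a loss
games <- data.frame(
  Year    = factor(c(2014, 2014, 2014, 2014, 2015, 2015, 2015)),
  Team    = c("Akron", "Akron", "Alabama", "Alabama",
              "Akron", "Alabama", "Alabama"),
  Outcome = c(1, 1, 1, 0, 0, 1, 1)
)

top_by_year <- games %>%
  group_by(Year, Team) %>%
  summarise(totalwinsyr = sum(Outcome), .groups = "drop") %>%  # wins per team per year
  group_by(Year) %>%
  slice_max(totalwinsyr, n = 1, with_ties = TRUE) %>%          # best team(s) in each year
  ungroup()

top_by_year
```

With with_ties = TRUE, teams tied for the most wins in a year are all kept, which matches what the filter(totalwinsyr == max(totalwinsyr)) form does.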
I have a column like this of the Data data.frame:
Month
3
6
9
3
6
9
3
6
9
...
I want to replace 3 with March, 6 with June, and 9 with September. I know how to do it for two months, for example 3 and 10, with: mutate(Data, Month=if_else(Month==3,"March","October")) How can I do it for three months?
Expected output:
Month
March
June
September
March
June
September
March
June
September
...
You could just use your numerical month values to access month.name, which is R's built-in vector of month names, starting at index 1:
Data <- data.frame(Month=c(3,6,9))
Data$MonthName <- month.name[Data$Month]
Data
Month MonthName
1 3 March
2 6 June
3 9 September
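If you would rather stay inside mutate(), as in the question's if_else() attempt, dplyr's case_when() takes any number of conditions. A minimal sketch, assuming the column holds only the values 3, 6 and 9 (anything else would become NA):

```r
library(dplyr)

Data <- data.frame(Month = c(3, 6, 9, 3, 6, 9))

# Each condition ~ value pair is tried in order; unmatched values become NA
Data <- mutate(
  Data,
  Month = case_when(
    Month == 3 ~ "March",
    Month == 6 ~ "June",
    Month == 9 ~ "September"
  )
)

Data
```

For plain month numbers the month.name lookup is shorter; case_when() is mainly useful when the labels do not follow a built-in vector.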
customer_id transaction_id month year
1 3 7 2014
1 4 7 2014
2 5 7 2014
2 6 8 2014
1 7 8 2014
3 8 9 2015
1 9 9 2015
4 10 9 2015
5 11 9 2015
2 12 9 2015
I am well familiar with R basics. Any help will be appreciated.
the expected output should look like following:
month year number_unique_customers_added
7 2014 2
8 2014 0
9 2015 3
In month 7 of year 2014, only customer_ids 1 and 2 are present, so the number of customers added is two. In month 8 of year 2014, no new customer ids are added, so there should be zero customers added in this period. Finally, in month 9 of year 2015, customer_ids 3, 4 and 5 are the new ones, so the number of customers added in this period is 3.
Using data.table:
require(data.table)
dt[, .SD[1,], by = customer_id][, uniqueN(customer_id), by = .(year, month)]
Explanation: We first keep only the first transaction of each customer (the one that makes her a "new customer"), and then count unique customers for each combination of year and month. Note that a period with no new customers, such as month 8 of 2014 here, gets no row in the result rather than a count of 0.
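If the zero-count periods from the expected output are needed, one option is to flag first appearances and sum the flag per period. A minimal sketch, assuming the question's table is stored in a data.table named dt (typed in by hand below) with rows in chronological order; the is_new column name is made up for illustration:

```r
library(data.table)

# The question's sample data, typed in by hand
dt <- data.table(
  customer_id    = c(1, 1, 2, 2, 1, 3, 1, 4, 5, 2),
  transaction_id = 3:12,
  month          = c(7, 7, 7, 8, 8, 9, 9, 9, 9, 9),
  year           = c(2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2015)
)

# TRUE on the first row in which each customer_id appears
dt[, is_new := !duplicated(customer_id)]

# Summing the flag keeps periods with no new customers as 0
new_by_period <- dt[, .(number_unique_customers_added = sum(is_new)),
                    by = .(month, year)]

new_by_period
```

Because every period that has any transaction forms a group, month 8 of 2014 appears with a count of 0. A period with no transactions at all would still be absent.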
Using dplyr we can first create a column which indicates if a customer is duplicate or not and then we group_by month and year to count the new customers in each group.
library(dplyr)
df %>%
mutate(unique_customers = !duplicated(customer_id)) %>%
group_by(month, year) %>%
summarise(unique_customers = sum(unique_customers))
# month year unique_customers
# <int> <int> <int>
#1 7 2014 2
#2 8 2014 0
#3 9 2015 3