I am working on a document where I have a list of tests with dates.
I am trying to get R to pivot them horizontally, with the first test showing up first and the later tests showing up later.
However, when applying functions such as sort() or order() or even group_by(), R still sometimes puts a later test in the first pivoted column.
I think I should apply some sort of ordering to the date column before numbering, so that R gives the actual first test the first number that I then pivot on.
Any idea as to how I would go about this?
My dataframe looks like this:
employee nr.  date        date2       test_1    test_2
x             2010/01/10  2010/01/05  positive  positive
.................................
The two dates should be switched, so that the earlier one comes first. The date is formatted as yyyy/mm/dd.
In the original dataset it was formatted as dd/mm/yy (you can see the format change in the code).
My expected output should look something like this:
employee nr.  date        date2       test_1    test_2
x             2010/01/05  2010/01/10  positive  positive
#specify dates as variable "date" for R to recognize the variable
ct_clean$date <- as.Date(ct_clean$date, format = "%d/%m/%y")
###assign number to duplicate value of employee number (if multiple tests -> multiple entries)
ct_numbered <- ct_clean %>% group_by(employee) %>% mutate(test_nr = row_number())
ct_clean %>% group_by(employee) %>% mutate(test_nr = 1:n())
ct_clean %>% group_by(employee) %>% mutate(test_nr = seq_len(n()))
ct_clean %>% group_by(employee) %>% mutate(test_nr = seq_along(employee))
#spread out multiple test for one individual horizontally
ct_wide <- ct_numbered %>% group_by(date) %>% pivot_wider(names_from = "test_nr",
values_from = "ct",
names_expand = TRUE, names_vary = "slowest")
#merging rows to include the test-data and test-number in the same row
ct_df <- ct_wide %>%
group_by(employee) %>%
mutate(id = seq_along(employee)) %>%
pivot_wider(names_from = id, values_from = date, names_prefix = "date") %>%
summarize_all(list(~ .[!is.na(.)][1]))
You can do this by using if_else():
library(tidyverse)
d <- structure(list(employee = c("x", "y", "z"), date1 = structure(c(14619,
14611, 14619), class = "Date"), date2 = structure(c(14614, 14614,
14614), class = "Date"), test_1 = c("positive", "negative", "negative"
), test_2 = c("positive", "positive", "positive")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), spec = structure(list(
cols = list(employee = structure(list(), class = c("collector_character",
"collector")), date1 = structure(list(format = ""), class = c("collector_date",
"collector")), date2 = structure(list(format = ""), class = c("collector_date",
"collector")), test_1 = structure(list(), class = c("collector_character",
"collector")), test_2 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
d
#> # A tibble: 3 × 5
#> employee date1 date2 test_1 test_2
#> <chr> <date> <date> <chr> <chr>
#> 1 x 2010-01-10 2010-01-05 positive positive
#> 2 y 2010-01-02 2010-01-05 negative positive
#> 3 z 2010-01-10 2010-01-05 negative positive
d |>
mutate(date1 = if_else(d$date1 < d$date2, d$date1, d$date2),
date2 = if_else(d$date1 < d$date2, d$date2, d$date1),
test_1 = if_else(d$date1 < d$date2, d$test_1, d$test_2),
test_2 = if_else(d$date1 < d$date2, d$test_2, d$test_1)
)
#> # A tibble: 3 × 5
#> employee date1 date2 test_1 test_2
#> <chr> <date> <date> <chr> <chr>
#> 1 x 2010-01-05 2010-01-10 positive positive
#> 2 y 2010-01-02 2010-01-05 negative positive
#> 3 z 2010-01-05 2010-01-10 positive negative
Created on 2022-03-28 by the reprex package (v2.0.1)
I found the answer to my problem:
The ordering had to be applied in the code that assigns numbers to the duplicates.
The original code looked like this:
ct_numbered <- ct_variant %>% group_by(date, umcg) %>% mutate(test_nr =
row_number())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = 1:n())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_len(n()))
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_along(umcg))
This is the solution I used:
ct_numbered <- ct_variant %>% arrange(ymd(ct_variant$date)) %>% group_by(date,
umcg) %>% mutate(test_nr = row_number())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = 1:n())
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_len(n()))
ct_variant %>% group_by(date, umcg) %>% mutate(test_nr = seq_along(umcg))
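For readers with a similar problem, the idea above (sort by date before numbering) can be sketched on made-up data. The tibble ct_long and its values are hypothetical, and grouping by employee only is an assumption that differs from the group_by(date, umcg) used above:

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per test, deliberately out of order
ct_long <- tibble(
  employee = c("x", "x", "y"),
  date     = as.Date(c("2010-01-10", "2010-01-05", "2010-01-02")),
  ct       = c("positive", "negative", "negative")
)

ct_wide <- ct_long %>%
  arrange(employee, date) %>%        # sort first, so numbering follows the dates
  group_by(employee) %>%
  mutate(test_nr = row_number()) %>% # 1 = earliest test per employee
  ungroup() %>%
  pivot_wider(names_from = test_nr, values_from = c(date, ct))

ct_wide
```

Because the rows are sorted before row_number() runs, date_1 should hold the earliest date for each employee, and employees with fewer tests get NA in the later columns.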
Related
I would like a little help with the following question. The code below generates a coefficient for a date I have chosen: for 03/07 (dmda) it gives a coefficient of 15.55. I would like to generate a new table with one column of dates and another column with the coefficient corresponding to each date. For the dates column, only the dates in date2 that fall after the day in date1 (28/06) should be considered; in this case those are 01/07, 02/07 and 03/07.
So the table will look like this:
Thanks!
library(dplyr)
library(tidyverse)
library(lubridate)
df1 <- structure(
list(date1 = c("2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28",
"2021-06-28","2021-06-28","2021-06-28"),
date2 = c("2021-04-02","2021-04-03","2021-04-08","2021-04-09","2021-04-10","2021-07-01","2021-07-02","2021-07-03"),
Week= c("Friday","Saturday","Thursday","Friday","Saturday","Thursday","Friday","Monday"),
DR01 = c(14,11,14,13,13,14,13,16), DR02= c(14,12,16,17,13,12,17,14),DR03= c(19,15,14,13,13,12,11,15),
DR04 = c(15,14,13,13,16,12,11,19),DR05 = c(15,14,15,13,16,12,11,19),
DR06 = c(21,14,13,13,15,16,17,18),DR07 = c(12,15,14,14,19,14,17,18)),
class = "data.frame", row.names = c(NA, -8L))
dmda<-"2021-07-03"
datas<-df1 %>%
filter(date2 == ymd(dmda)) %>%
summarize(across(starts_with("DR"), sum)) %>%
pivot_longer(everything(), names_pattern = "DR(.+)", values_to = "val") %>%
mutate(name = as.numeric(name))
colnames(datas)<-c("Days","Numbers")
mod <- nls(Numbers ~ b1*Days^2+b2,start = list(b1 = 47,b2 = 0), data = datas)
coef(mod)[2]
> coef(mod)[2]
b2
15.55011
We may subset the data where 'date2' is greater than 'date1' and extract the 'date2' column as a vector. Loop over the dates with map_dfr (from purrr), do the transformation within the loop, fit the nls model and extract the coefficient into a tibble; the _dfr suffix row-binds the per-date tibbles into a single tibble.
library(purrr)
library(dplyr)
library(tidyr)      # pivot_longer()
library(lubridate)  # ymd()
dates <- subset(df1, date2 > date1, select = date2)$date2
map_dfr(dates, ~ {
datas <- df1 %>%
filter(date2 == ymd(.x)) %>%
summarize(across(starts_with("DR"), sum)) %>%
pivot_longer(everything(), names_pattern = "DR(.+)", values_to = "val") %>%
mutate(name = as.numeric(name))
colnames(datas)<-c("Days","Numbers")
mod <- nls(Numbers ~ b1*Days^2+b2,start = list(b1 = 47,b2 = 0), data = datas)
tibble(dates = .x, coef = coef(mod)[2])
}) %>%
mutate(dates = format(ymd(dates), "%d/%m/%Y"))
# A tibble: 3 × 2
dates coef
<chr> <dbl>
1 01/07/2021 12.2
2 02/07/2021 12.4
3 03/07/2021 15.6
I have the following problem: how can I generate the table only up to 03/07, instead of up to 05/07?
Executable code below:
library(purrr)
library(dplyr)
library(tidyverse)
library(lubridate)
df1 <- structure(
list(date1 = c("2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28",
"2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28"),
date2 = c("2021-04-02","2021-04-03","2021-04-08","2021-04-09","2021-04-10","2021-07-01","2021-07-02","2021-07-03","2021-07-04","2021-07-05"),
Week= c("Friday","Saturday","Thursday","Friday","Saturday","Thursday","Friday","Saturday","Sunday","Monday"),
DR01 = c(14,11,14,13,13,14,13,16,15,12), DR02= c(14,12,16,17,13,12,17,14,13,13),DR03= c(19,15,14,13,13,12,11,15,13,13),
DR04 = c(15,14,13,13,16,12,11,19,11,13),DR05 = c(15,14,15,13,16,12,11,19,14,13),
DR06 = c(21,14,13,13,15,16,17,18,12,11),DR07 = c(12,15,14,14,19,14,17,18,14,13)),
class = "data.frame", row.names = c(NA, -10L))
dates <- subset(df1, date2 > date1, select = date2)$date2
map_dfr(dates, ~ {
datas <- df1 %>%
filter(date2 == ymd(.x)) %>%
summarize(across(starts_with("DR"), sum)) %>%
pivot_longer(everything(), names_pattern = "DR(.+)", values_to = "val") %>%
mutate(name = as.numeric(name))
colnames(datas)<-c("Days","Numbers")
mod <- nls(Numbers ~ b1*Days^2+b2,start = list(b1 = 47,b2 = 0), data = datas)
tibble(dates = .x, coef = coef(mod)[2])
}) %>%
mutate(dates = format(ymd(dates), "%d/%m/%Y"))
dates coef
<chr> <dbl>
1 01/07/2021 12.2
2 02/07/2021 12.4
3 03/07/2021 15.6
4 04/07/2021 13.3
5 05/07/2021 12.7
In the subset step, add one more condition with &:
dates <- subset(df1, date2 > date1 & date2 <= "2021-07-03", select = date2)$date2
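The comparison above works on character columns because ISO yyyy-mm-dd strings sort in chronological order. A slightly more defensive sketch, assuming the same column names, converts to Date before comparing; the reduced df1 and the name cutoff are made up for illustration:

```r
# Reduced, hypothetical version of df1 with only the two date columns
df1 <- data.frame(
  date1 = rep("2021-06-28", 5),
  date2 = c("2021-04-10", "2021-07-01", "2021-07-03", "2021-07-04", "2021-07-05"),
  stringsAsFactors = FALSE
)

cutoff <- as.Date("2021-07-03")  # last date to keep (hypothetical name)

# Keep date2 values strictly after date1 and no later than the cutoff
dates <- subset(
  df1,
  as.Date(date2) > as.Date(date1) & as.Date(date2) <= cutoff,
  select = date2
)$date2

dates
```

The resulting vector can be passed to map_dfr() exactly as before.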
I have found this dataframe in an Excel file, very disorganized. This is just a sample of a bigger dataset, with many jobs.
df <- data.frame(
Job = c("Frequency", "Driver", "Operator"),
Gloves = c("Daily", 1,2),
Aprons = c("Weekly", 2,0)
)
I need it to be in this format, something that I can work in a database:
df <- data.frame(
Job = c("Driver", "Driver", "Operator", "Operator"),
Frequency= c("Daily", "Weekly", "Daily", "Weekly"),
Item= c("Gloves", "Aprons", "Gloves", "Aprons"),
Quantity= c(1,2,2,0)
)
Any thoughts on how to manipulate the data? I have tried without any luck.
We could use tidyverse methods, doing this in three steps:
1. Remove the first row with slice(-1) and reshape to 'long' format (pivot_longer)
2. Keep only the first row with slice(1) and reshape to 'long' format (pivot_longer)
3. Join the two reshaped datasets
library(dplyr)
library(tidyr)
df %>%
slice(-1) %>%
pivot_longer(cols = -Job, names_to = 'Item',
values_to = 'Quantity') %>%
left_join(df %>%
slice(1) %>%
pivot_longer(cols= -Job, values_to = 'Frequency',
names_to = 'Item') %>%
select(-Job), by = 'Item')
-output
# A tibble: 4 x 4
Job Item Quantity Frequency
<chr> <chr> <chr> <chr>
1 Driver Gloves 1 Daily
2 Driver Aprons 2 Weekly
3 Operator Gloves 2 Daily
4 Operator Aprons 0 Weekly
data
df <- data.frame(
Job = c("Frequency", "Driver", "Operator"),
Gloves = c("Daily", 1,2),
Aprons = c("Weekly", 2,0))
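Note that Quantity comes out as character in the output above, because the Frequency row forces every item column of the original data frame to character. A minimal sketch of one way to finish the job, assuming all quantities are numeric strings (res is a hypothetical name, and by = "Item" is written out only to make the join key explicit):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Job    = c("Frequency", "Driver", "Operator"),
  Gloves = c("Daily", 1, 2),
  Aprons = c("Weekly", 2, 0),
  stringsAsFactors = FALSE
)

# First row holds the frequency per item; reshape it to one row per item
freq <- df %>%
  slice(1) %>%
  pivot_longer(cols = -Job, names_to = "Item", values_to = "Frequency") %>%
  select(-Job)

# Remaining rows hold the quantities; reshape, join, then convert to numeric
res <- df %>%
  slice(-1) %>%
  pivot_longer(cols = -Job, names_to = "Item", values_to = "Quantity") %>%
  left_join(freq, by = "Item") %>%
  mutate(Quantity = as.numeric(Quantity))

res
```

If some cells contained non-numeric text, as.numeric() would turn them into NA with a warning, so it is worth checking the result on the full dataset.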
How can I melt/reshape/rotate my table from this:
profit lost obs fc.mape
mean 3724.743 804.1835 427.8899 0.21037696
std.dev 677.171 406.1391 372.5544 0.06072549
To this:
mean std.dev
profit x
lost x
obs x
fc.mape x
Here is a tidyverse solution. I find it too complicated but it works. Maybe there are simpler ones.
library(dplyr)
library(tidyr)
df1 %>%
mutate(id = row.names(.)) %>%
pivot_longer(
cols = -id,
names_to = "stat"
) %>%
group_by(id) %>%
mutate(n = row_number()) %>%
ungroup() %>%
pivot_wider(
id_cols = c(n, stat),
names_from = id,
values_from = value
) %>%
select(-n)
## A tibble: 4 x 3
# stat mean std.dev
# <chr> <dbl> <dbl>
#1 profit 3725. 677.
#2 lost 804. 406.
#3 obs 428. 373.
#4 fc.mape 0.210 0.0607
Data
df1 <-
structure(list(profit = c(3724.743, 677.171), lost = c(804.1835,
406.1391), obs = c(427.8899, 372.5544), fc.mape = c(0.21037696,
0.06072549)), class = "data.frame", row.names = c("mean", "std.dev"))
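A possibly simpler base R alternative, offered as a sketch rather than a replacement for the answer above: t() transposes the data frame (via a matrix, which is fine here because every column is numeric), and as.data.frame() turns the result back into a data frame. The name out is hypothetical:

```r
df1 <- structure(
  list(profit  = c(3724.743, 677.171),
       lost    = c(804.1835, 406.1391),
       obs     = c(427.8899, 372.5544),
       fc.mape = c(0.21037696, 0.06072549)),
  class = "data.frame", row.names = c("mean", "std.dev")
)

# Transpose: old column names become row names, old row names become columns
out <- as.data.frame(t(df1))
out
```

Here the statistic names end up as row names rather than as a stat column; if a column is needed, something like out$stat <- rownames(out) would add one.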
I have a df that looks like the following:
ID DATE
12 10-20-20
12 10-22-20
10 10-15-20
9 10-10-20
11 11-01-20
7 11-02-20
I would like to group by month and then create a column for unique id count and repeat id count like below:
MONTH Unique_Count Repeat_Count
10-1-20 2 2
11-1-20 2 0
I am able to get the date down to the first of the month and group by ID but I am not sure how to count unique instances within the months.
df %>%
mutate(month = floor_date(as.Date(DATE), "month")) %>%
group_by(ID) %>%
mutate(count = n())
Are you perhaps looking for:
df %>%
mutate(month = strftime(floor_date(as.Date(DATE, "%m-%d-%y"), "month"),
"%m-%d-%y")) %>%
group_by(month) %>%
summarize(unique_count = length(which(table(ID) == 1)),
repeat_count = sum(table(ID)[(which(table(ID) > 1))]))
#> # A tibble: 2 x 3
#> month unique_count repeat_count
#> <chr> <int> <int>
#> 1 10-01-20 2 2
#> 2 11-01-20 2 0
Here's a shot at it:
library(lubridate)
library(dplyr)
dates <- as.Date(c("2020-10-15", "2020-10-15", "2020-11-16", "2020-11-16", "2020-11-16"))
ids <- c(12, 12, 13, 13, 14)
df <- data.frame(dates, ids)
duplicates <- df %>%
group_by(dates_floored = floor_date(dates, unit = "month"), ids) %>%
mutate(duplicate_count = n()) %>%
filter(duplicate_count > 1) %>%
distinct(ids, .keep_all = TRUE)
uniques <- df %>%
group_by(dates_floored = floor_date(dates, unit = "month"), ids) %>%
mutate(unique_count = n()) %>%
filter(unique_count < 2) %>%
distinct(ids, .keep_all = TRUE)
df_cleaned <- full_join(uniques, duplicates, by = c("ids", "dates", "dates_floored")) %>%
group_by(dates_floored) %>%
summarize(count_duplicates = sum(duplicate_count, na.rm = TRUE),
count_unique = sum(unique_count, na.rm = TRUE))
df_cleaned
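A more compact variant of the same idea, sketched with dplyr::count() on the data from the question: first count rows per month and ID, then summarise those counts per month. The column name n_rows and the result name res are hypothetical, and "repeat count" is assumed to mean the total number of rows belonging to IDs that occur more than once in a month:

```r
library(dplyr)
library(lubridate)

df <- data.frame(
  ID   = c(12, 12, 10, 9, 11, 7),
  DATE = c("10-20-20", "10-22-20", "10-15-20", "10-10-20", "11-01-20", "11-02-20"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Parse mm-dd-yy explicitly, then floor to the first day of the month
  mutate(month = floor_date(as.Date(DATE, format = "%m-%d-%y"), "month")) %>%
  count(month, ID, name = "n_rows") %>%   # rows per ID within each month
  group_by(month) %>%
  summarize(
    unique_count = sum(n_rows == 1),          # IDs seen exactly once
    repeat_count = sum(n_rows[n_rows > 1])    # rows from IDs seen more than once
  )

res
```

On this data, October should give 2 unique IDs and 2 repeated rows (ID 12 twice), and November 2 unique IDs and 0 repeats, matching the expected output in the question.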