Bring excel-table in tidy format

I am struggling to convert the following data (from an Excel sheet) into a tidy format:
input <- structure(list(...11 = c(
NA, NA, "<1000", ">=1000 and <2000",
"2000", ">2000 and < 3000", ">=3000"
), ...13 = c(
"male", "female",
NA, NA, NA, NA, NA
), ...14 = c(
"<777", "<555", "0.3", "0.1",
"0.15", "0.13", "0.15"
), ...15 = c(
"888-999", "555-999", "0.23",
"0.21", "0", "0.21", "0.36"
), ...16 = c(
"556-899", "1020-1170",
"0.13", "0.29", "0.7", "0.8", "0.2"
), ...17 = c(
">960", ">11000",
"0.58", "0.31", "0.22", "0.65", "0.7"
)), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 7 × 6
...11 ...13 ...14 ...15 ...16 ...17
<chr> <chr> <chr> <chr> <chr> <chr>
1 NA male <777 888-999 556-899 >960
2 NA female <555 555-999 1020-1170 >11000
3 <1000 NA 0.3 0.23 0.13 0.58
4 >=1000 and <2000 NA 0.1 0.21 0.29 0.31
5 2000 NA 0.15 0 0.7 0.22
6 >2000 and < 3000 NA 0.13 0.21 0.8 0.65
7 >=3000 NA 0.15 0.36 0.2 0.7
I would like to bring it into the following structure:
output <- tibble::tribble(
~gender, ~x, ~y, ~share,
"male", "<777", "<1000", 0.3,
"female", "<555", "<1000", 0.3,
"male", "<777", ">=1000 and <2000", 0.1,
"female", "<555", ">=1000 and <2000", 0.1,
)
# A tibble: 4 × 4
gender x y share
<chr> <chr> <chr> <dbl>
1 male <777 <1000 0.3
2 female <555 <1000 0.3
3 male <777 >=1000 and <2000 0.1
4 female <555 >=1000 and <2000 0.1
Any hints are much appreciated!

As outlined in the comments, here's a suggested approach:
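Import the Excel sheet twice with readxl's read_excel, using the skip argument so that each import picks up a different header row (the file argument of read_excel is called path):
library(readxl)
df1 <- read_excel(path = "yourfile.xlsx", skip = 2)
df2 <- read_excel(path = "yourfile.xlsx", skip = 1)
That should give you something like the following (note that X1 might be called ...1):
library(readr) # only needed to reproduce the imported data here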
df1 <- read_table("NA male <777 888-999 556-899 >960
<1000 NA 0.3 0.23 0.13 0.58
>=1000and<2000 NA 0.1 0.21 0.29 0.31
2000 NA 0.15 0 0.7 0.22
>2000and<3000 NA 0.13 0.21 0.8 0.65
>=3000 NA 0.15 0.36 0.2 0.7")
df2 <- read_table("NA female <555 555-999 1020-1170 >11000
<1000 NA 0.3 0.23 0.13 0.58
>=1000and<2000 NA 0.1 0.21 0.29 0.31
2000 NA 0.15 0 0.7 0.22
>2000and<3000 NA 0.13 0.21 0.8 0.65
>=3000 NA 0.15 0.36 0.2 0.7")
Then do a little wrangling; most importantly turn into a long format:
library(dplyr)
library(tidyr)
df1 <- df1 |>
select(-male) |>
rename(y = X1) |>
mutate(gender = "male") |>
pivot_longer(-c("gender", "y"), names_to = "x", values_to = "share")
df2 <- df2 |>
select(-female) |>
rename(y = X1) |>
mutate(gender = "female") |>
pivot_longer(-c("gender", "y"), names_to = "x", values_to = "share")
And voila, a tidy frame:
bind_rows(df1, df2) |> arrange(y)
Output:
# A tibble: 40 × 4
y gender x share
<chr> <chr> <chr> <dbl>
1 <1000 male <777 0.3
2 <1000 male 888-999 0.23
3 <1000 male 556-899 0.13
4 <1000 male >960 0.58
5 <1000 female <555 0.3
6 <1000 female 555-999 0.23
7 <1000 female 1020-1170 0.13
8 <1000 female >11000 0.58
9 >=1000and<2000 male <777 0.1
10 >=1000and<2000 male 888-999 0.21
# … with 30 more rows
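If re-importing the sheet is not an option, the same reshaping can be sketched in base R directly on the already-imported table. This is only a sketch under assumptions: the column names y, gender, c1 and c2 are hypothetical stand-ins for ...11, ...13, ...14 and ...15, and only two of the four value columns are reproduced.

```r
# Abbreviated copy of the question's data; column names are hypothetical
input <- data.frame(
  y      = c(NA, NA, "<1000", ">=1000 and <2000", "2000",
             ">2000 and < 3000", ">=3000"),
  gender = c("male", "female", NA, NA, NA, NA, NA),
  c1     = c("<777", "<555", "0.3", "0.1", "0.15", "0.13", "0.15"),
  c2     = c("888-999", "555-999", "0.23", "0.21", "0", "0.21", "0.36"),
  stringsAsFactors = FALSE
)

header     <- input[1:2, ]     # the two header rows (male, female)
body       <- input[-(1:2), ]  # the rows holding the shares
value_cols <- c("c1", "c2")

# Build one block per (header row, value column) combination
pieces <- list()
for (h in seq_len(nrow(header))) {
  for (col in value_cols) {
    pieces[[length(pieces) + 1]] <- data.frame(
      gender = header$gender[h],         # recycled over all body rows
      x      = header[[col]][h],         # x label taken from the header row
      y      = body$y,
      share  = as.numeric(body[[col]]),  # shares were imported as text
      stringsAsFactors = FALSE
    )
  }
}
tidy <- do.call(rbind, pieces)
```

Like the read_excel approach, this pairs every header row with every value column, so it yields all gender/x/y combinations rather than only the four rows shown in the question.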

It's a bit unclear, but I think you need something like this:
df <- input[3:nrow(input), ]                  # data rows
hdr <- input[1:2, 2:3]                        # header rows: gender and first x label
t <- hdr[rep(1:nrow(hdr), nrow(df)), ]        # male, female, male, female, ...
s <- df[rep(1:nrow(df), each = nrow(hdr)), ]  # each data row once per header row
t <- cbind(t, s)
Repeat as needed if you want this for multiple columns.

Related

Convert the factors of a variable into the columns of the dataframe

I have a dataframe that looks like this
Concentration Value
Low 0.21
Medium 0.85
Low 0.10
Low 0.36
High 2.21
Medium 0.50
High 1.85
I would like to transform it into a dataframe where the column names are the factors of the variable:
Low Medium High
0.21 0.85 2.21
0.10 0.50 1.85
0.36
I've tried using pivot_wider; however, the values for each of the factors are stored as vectors:
Low Medium High
c(0.21,...) c(0.87 ,...) c(1.47 ,...)

Use an id variable for rows by group:
library(dplyr)
library(tidyr)
dat %>%
group_by(Concentration) %>%
mutate(id = row_number()) %>%
pivot_wider(names_from = Concentration, values_from = Value)
id Low Medium High
<int> <dbl> <dbl> <dbl>
1 1 0.21 0.85 2.21
2 2 0.1 0.5 1.85
3 3 0.36 NA NA
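The per-group id is what makes each cell of the wide table unique. A minimal base R sketch of the same idea, assuming the data frame is called df1 as in the data given for this question (ave and reshape are used here instead of the dplyr/tidyr calls above):

```r
df1 <- data.frame(
  Concentration = c("Low", "Medium", "Low", "Low", "High", "Medium", "High"),
  Value = c(0.21, 0.85, 0.10, 0.36, 2.21, 0.50, 1.85),
  stringsAsFactors = FALSE
)

# Running counter within each Concentration level
df1$id <- ave(seq_along(df1$Value), df1$Concentration, FUN = seq_along)

# One row per id, one column per Concentration level
wide <- reshape(df1, idvar = "id", timevar = "Concentration",
                direction = "wide")

# reshape() prefixes the new columns with "Value."; drop that prefix
names(wide) <- sub("^Value\\.", "", names(wide))
```

Without the id column, all rows of a level share the same key, which is why pivot_wider had to store several values per cell.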

Using unstack from base R
mx <- max(table(df1$Concentration))
data.frame(lapply(unstack(df1, Value ~ Concentration), `length<-`, mx))
High Low Medium
1 2.21 0.21 0.85
2 1.85 0.10 0.50
3 NA 0.36 NA
data
df1 <- structure(list(Concentration = c("Low", "Medium", "Low", "Low",
"High", "Medium", "High"), Value = c(0.21, 0.85, 0.1, 0.36, 2.21,
0.5, 1.85)), class = "data.frame", row.names = c(NA, -7L))

Make connections between two datasets

I would like to make a connection between the x and df2 datasets. The dataset x has a percentage value, which is 0.1 for the day 03-01-2021 and 0.45 for the days 01-02-2021 and 01-01-2022. Since the percentage for 03-01-2021 is 0.1, this value falls into category I of my dataset df2 (whose values range from 0.1 to 0.2). The days 01-02-2021 and 01-01-2022 correspond to category F of df2, since its values range from 0.4 to 0.5. So, I would like to generate an output table as follows:
library(dplyr)
df1<- structure(
list(date2= c("01-01-2022","01-01-2022","03-01-2021","03-01-2021","01-02-2021","01-02-2021"),
Category= c("ABC","CDE","ABC","CDE","ABC","CDE"),
coef= c(5,4,0,2,4,5)),
class = "data.frame", row.names = c(NA, -6L))
x<-df1 %>%
group_by(date2) %>%
summarize(across("coef", sum),.groups = 'drop')%>%
arrange(date2 = as.Date(date2, format = "%d-%m-%Y"))
number<-20
x$Percentage<-x$coef/number
date2 coef Percentage
<chr> <dbl> <dbl>
1 03-01-2021 2 0.1
2 01-02-2021 9 0.45
3 01-01-2022 9 0.45
df2 <- structure(
list(
Category = c("A", "B", "C", "D",
"E", "F", "G", "H", "I", "J"),
From = c(0.9,
0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0),
Until = c(
1,
0.8999,
0.7999,
0.6999,
0.5999,
0.4999,
0.3999,
0.2999,
0.1999,
0.0999
),
`1 Val` = c(
2222,
2017.8,
1793.6,
1621.5,
1522.4,
1457.3,
1325.2,
1229.15,
1223.1,
1177.05
),
`2 Val` = c(3200, 2220, 2560,
2200, 2220, 2080, 1220, 1240, 1720, 1620),
`3 Val` = c(
4665,
4122.5,
3732,
3498.75,
3265.5,
3032.25,
2799,
2682.375,
2565.75,
2449.125
),
`4 Val` = c(
6112,
5222.8,
4889.6,
4224,
4278.4,
3972.8,
3667.2,
3224.4,
3361.6,
3222.8
)
),
row.names = c(NA,-10L),
class = c("tbl_df",
"tbl", "data.frame")
)
Category From Until 1 Val 2 Val 3 Val 4 Val
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 0.9 1 2222 3200 4665 6112
2 B 0.8 0.900 2018 2220 4122 5223
3 C 0.7 0.800 1794 2560 3732 4890
4 D 0.6 0.700 1622 2200 3499 4224
5 E 0.5 0.600 1522 2220 3266 4278
6 F 0.4 0.500 1457 2080 3032 3973
7 G 0.3 0.400 1325 1220 2799 3667
8 H 0.2 0.300 1229 1240 2682 3224
9 I 0.1 0.200 1223 1720 2566 3362
10 J 0 0.0999 1177 1620 2449 3223

Using tidyverse, we do a rowwise on the 'x' dataset, slice the rows of 'df2' where the 'Percentage' falls between the 'From' and 'Until', and unpack the data.frame/tibble column
library(dplyr)
library(tidyr)
x %>%
rowwise %>%
mutate(out = df2 %>%
slice(which(Percentage>= From &
Percentage <= Until)[1]) %>%
select(-(1:3)) ) %>%
ungroup %>%
unpack(out)
-output
# A tibble: 3 × 7
date2 coef Percentage `1 Val` `2 Val` `3 Val` `4 Val`
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 03-01-2021 2 0.1 1223. 1720 2566. 3362.
2 01-02-2021 9 0.45 1457. 2080 3032. 3973.
3 01-01-2022 9 0.45 1457. 2080 3032. 3973.
Or this could be done with a non-equi join
library(data.table)
nm1 <- names(df2)[endsWith(names(df2), 'Val')]
setDT(x)[setDT(df2), (nm1) := mget(nm1),
on = .(Percentage >= From, Percentage <= Until)]
-output
> x
date2 coef Percentage 1 Val 2 Val 3 Val 4 Val
1: 03-01-2021 2 0.10 1223.1 1720 2565.75 3361.6
2: 01-02-2021 9 0.45 1457.3 2080 3032.25 3972.8
3: 01-01-2022 9 0.45 1457.3 2080 3032.25 3972.8
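Because the bands in df2 are contiguous, the category lookup can also be sketched in base R with findInterval. This is a sketch under assumptions: only the Category and From columns are reproduced, and the Until column is ignored, so a value that falls in the tiny gap between one band's Until and the next band's From is assigned to the lower band.

```r
df2 <- data.frame(
  Category = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  From     = c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0),
  stringsAsFactors = FALSE
)
x <- data.frame(
  date2      = c("03-01-2021", "01-02-2021", "01-01-2022"),
  Percentage = c(2, 9, 9) / 20,
  stringsAsFactors = FALSE
)

# findInterval() needs the lower bounds in increasing order
bands <- df2[order(df2$From), ]

# Index of the last band whose lower bound is <= Percentage
idx <- findInterval(x$Percentage, bands$From)
x$Category <- bands$Category[idx]
```

The remaining `Val` columns could then be attached by indexing bands with idx in the same way.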

Convert cox regression table to forest plot

I want to convert a Cox regression table to a forest plot as shown below. Unfortunately I've lost my original data (the coxph object), so I have to use the data from the table. The data below are just examples:
Desired output:
Reprex for the two tables:
GRP1<-tibble::tribble(
~Variable, ~Level, ~Number, ~`HR.(univariable)`, ~`HR.(multivariable)`,
"Sex", "Female", "2204 (100.0)", NA, NA,
NA, "Male", "2318 (100.0)", "1.13 (0.91-1.40, p=0.265)", "1.13 (0.91-1.40, p=0.276)",
"Score", "1", "2401 (100.0)", NA, NA,
NA, "1-2", "1637 (100.0)", "1.49 (1.19-1.87, p=0.001)", "1.15 (0.90-1.47, p=0.250)",
NA, "3-4", "412 (100.0)", "1.71 (1.14-2.56, p=0.010)", "1.09 (0.71-1.67, p=0.710)",
NA, ">=5", "42 (100.0)", "1.67 (0.53-5.21, p=0.381)", "0.96 (0.30-3.05, p=0.943)",
"Treatment", "A", "1572 (100.0)", NA, NA,
NA, "B", "2951 (100.0)", "1.74 (1.26-2.40, p=0.001)", "1.53 (1.09-2.13, p=0.013)"
)
GRP2<-tibble::tribble(
~Variable, ~Level, ~Number, ~`HR.(univariable)`, ~`HR.(multivariable)`,
"Sex", "Female", "2204 (100.0)", NA, NA,
NA, "Male", "2318 (100.0)", "1.70 (1.36-2.13, p<0.001)", "1.62 (1.28-2.04, p<0.001)",
"Score", "1", "2401 (100.0)", NA, NA,
NA, "1-2", "1637 (100.0)", "2.76 (1.21-6.29, p=0.016)", "2.69 (1.18-6.13, p=0.019)",
NA, "3-4", "412 (100.0)", "5.11 (2.26-11.58, p<0.001)", "4.46 (1.95-10.23, p<0.001)",
NA, ">=5", "42 (100.0)", "5.05 (2.19-11.64, p<0.001)", "4.08 (1.73-9.59, p=0.001)",
"Treatment", "A", "1572 (100.0)", NA, NA,
NA, "B", "2951 (100.0)", "1.48 (1.16-1.88, p=0.001)", "1.23 (0.95-1.59, p=0.114)"
)
Is it doable?
Best regards, H

The difficult thing about this task is not making the plot; it is converting your data from a bunch of text strings into a single long-format data frame that can be used for plotting. This involves using regular expressions to capture the appropriate number for each column, pivoting the result, then repeating that process for the second data frame before binding the two frames together. This is unavoidably ugly and complicated, but that is one of the reasons why having data stored in the correct format is so important.
Anyway, the following code performs the necessary operations:
library(dplyr)
wrangler <- function(data){
grp <- as.character(match.call()$data)
data %>%
tidyr::fill(Variable) %>%
mutate(Variable = paste(Variable, Level),
Number = as.numeric(gsub("^(\\d+).*$", "\\1", Number)),
univariable_HR = as.numeric(gsub("^((\\d+|\\.)+).*$", "\\1", `HR.(univariable)`)),
univariable_lower = as.numeric(gsub("^.+? \\((.+?)-.*$", "\\1", `HR.(univariable)`)),
univariable_upper = as.numeric(gsub("^.+?-(.+?),.*$", "\\1", `HR.(univariable)`)),
univariable_p = gsub("^.+?p=*(.+?)\\).*$", "\\1", `HR.(univariable)`),
multivariable_HR = as.numeric(gsub("^((\\d+|\\.)+).*$", "\\1", `HR.(multivariable)`)),
multivariable_lower = as.numeric(gsub("^.+? \\((.+?)-.*$", "\\1", `HR.(multivariable)`)),
multivariable_upper = as.numeric(gsub("^.+?-(.+?),.*$", "\\1", `HR.(multivariable)`)),
multivariable_p = gsub("^.+?p=*(.+?)\\).*$", "\\1", `HR.(multivariable)`),
group = grp) %>%
filter(!is.na(univariable_HR)) %>%
select(-Level, -`HR.(multivariable)`, - `HR.(univariable)`) %>%
tidyr::pivot_longer(cols = -(c(1:2, 11)), names_sep = "_", names_to = c("type", ".value"))
}
df <- rbind(wrangler(GRP1), wrangler(GRP2))
This now gives us the data in the correct format for plotting. Each row will become a single pointrange in our plot, so it needs a hazard ratio, a lower confidence bound, an upper confidence bound, a variable label, the type (multivariable versus univariable), and the group it originally came from (GRP1 or GRP2):
df
#> # A tibble: 20 x 8
#> Variable Number group type HR lower upper p
#> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 Sex Male 2318 GRP1 univariable 1.13 0.91 1.4 0.265
#> 2 Sex Male 2318 GRP1 multivariable 1.13 0.91 1.4 0.276
#> 3 Score 1-2 1637 GRP1 univariable 1.49 1.19 1.87 0.001
#> 4 Score 1-2 1637 GRP1 multivariable 1.15 0.9 1.47 0.250
#> 5 Score 3-4 412 GRP1 univariable 1.71 1.14 2.56 0.010
#> 6 Score 3-4 412 GRP1 multivariable 1.09 0.71 1.67 0.710
#> 7 Score >=5 42 GRP1 univariable 1.67 0.53 5.21 0.381
#> 8 Score >=5 42 GRP1 multivariable 0.96 0.3 3.05 0.943
#> 9 Treatment B 2951 GRP1 univariable 1.74 1.26 2.4 0.001
#> 10 Treatment B 2951 GRP1 multivariable 1.53 1.09 2.13 0.013
#> 11 Sex Male 2318 GRP2 univariable 1.7 1.36 2.13 <0.001
#> 12 Sex Male 2318 GRP2 multivariable 1.62 1.28 2.04 <0.001
#> 13 Score 1-2 1637 GRP2 univariable 2.76 1.21 6.29 0.016
#> 14 Score 1-2 1637 GRP2 multivariable 2.69 1.18 6.13 0.019
#> 15 Score 3-4 412 GRP2 univariable 5.11 2.26 11.6 <0.001
#> 16 Score 3-4 412 GRP2 multivariable 4.46 1.95 10.2 <0.001
#> 17 Score >=5 42 GRP2 univariable 5.05 2.19 11.6 <0.001
#> 18 Score >=5 42 GRP2 multivariable 4.08 1.73 9.59 0.001
#> 19 Treatment B 2951 GRP2 univariable 1.48 1.16 1.88 0.001
#> 20 Treatment B 2951 GRP2 multivariable 1.23 0.95 1.59 0.114
Now that we have the data in this format, the plot itself is straightforward:
library(ggplot2)
ggplot(df, aes(HR, Variable)) +
geom_pointrange(aes(xmin = lower, xmax = upper, colour = type),
position = position_dodge(width = 0.5)) +
facet_grid(group~., switch = "y") +
geom_vline(xintercept = 1, linetype = 2) + # reference line at HR = 1 (no effect)
theme_bw() +
theme(strip.placement = "outside",
strip.text= element_text(angle = 180),
strip.background = element_blank(),
panel.spacing = unit(0, "mm"))
Created on 2021-11-01 by the reprex package (v2.0.0)
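The string parsing inside wrangler can also be illustrated on its own. The following is a minimal base R sketch, not the code used above: it assumes every non-missing cell has exactly the layout "HR (lower-upper, p=value)" or "HR (lower-upper, p<value)", and the names hr_text, pattern and parsed are made up for this example.

```r
hr_text <- c("1.13 (0.91-1.40, p=0.265)",
             "5.11 (2.26-11.58, p<0.001)")

# Capture groups: HR, lower bound, upper bound, comparison sign, p-value
pattern <- "^([0-9.]+) \\(([0-9.]+)-([0-9.]+), p([=<])([0-9.]+)\\)$"

# regexec() returns the full match plus one entry per capture group
matches <- regmatches(hr_text, regexec(pattern, hr_text))
parts   <- do.call(rbind, matches)  # one row per string, 6 columns

parsed <- data.frame(
  HR    = as.numeric(parts[, 2]),
  lower = as.numeric(parts[, 3]),
  upper = as.numeric(parts[, 4]),
  # keep the "<" so that "p<0.001" is not mistaken for an exact value
  p     = ifelse(parts[, 5] == "<", paste0("<", parts[, 6]), parts[, 6]),
  stringsAsFactors = FALSE
)
```

A single anchored pattern fails loudly (the match comes back empty) when a cell deviates from the expected layout, which can be easier to debug than several independent gsub calls.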

R - Extracting rows of max/min values in a dataframe containing strings, NA and groups

I want to find a way to extract the n rows that contain the Top results (min and max) in a dataframe. The problem is that this dataframe contains strings and NA and also groups. Also if the top results are in the same row, I still need exactly n rows, so being in the same row counts just as 1 result.
V01_Code V01_Corr V01_Lag V02_Code V02_Corr V02_Lag V03_Code V03_Corr V03_Lag V04_Code V04_Corr V04_Lag Group
1 AMI 0.63 L7 <NA> NA <NA> <NA> NA <NA> <NA> NA <NA> B
2 CII -0.61 L7 CMI -0.53 L7 <NA> NA <NA> <NA> NA <NA> A
3 AFI 0.51 L7 <NA> NA <NA> <NA> NA <NA> <NA> NA <NA> A
4 AII 0.52 L7 BII 0.62 L4 BMI 0.60 L7 III 0.58 L4 B
5 BII 0.52 L7 IIA 0.74 L6 III 0.51 L7 IMA 0.75 L6 A
6 AII 0.58 L6/L7 BII 0.69 L4 BMI 0.70 L7 IIA 0.57 L4 A
7 IIA 0.58 L6 IMA 0.59 L6 IMI 0.52 L6 <NA> NA <NA> B
8 IMU 0.52 L6 <NA> NA <NA> <NA> NA <NA> <NA> NA <NA> A
I tried several versions like this:
aggregate(. ~ Group, df, function(x) max(head(sort(x),2),na.rm=T))
But it doesn't seem to work! As output I want a dataframe of the rows (for example 2 rows here) that contain the highest and lowest values. In this case 0.75 in row 5 is the highest value; the 2nd highest is in the same row, so it doesn't count. The 2nd highest in any other row would be 0.7 in row 6. So for my top 2 result of max values I want:
V01_Code V01_Corr V01_Lag V02_Code V02_Corr V02_Lag V03_Code V03_Corr V03_Lag V04_Code V04_Corr V04_Lag Group
5 BII 0.52 L7 IIA 0.74 L6 III 0.51 L7 IMA 0.75 L6 A
6 AII 0.58 L6/L7 BII 0.69 L4 BMI 0.70 L7 IIA 0.57 L4 A
1 AMI 0.63 L7 <NA> NA <NA> <NA> NA <NA> <NA> NA <NA> B
4 AII 0.52 L7 BII 0.62 L4 BMI 0.60 L7 III 0.58 L4 B
n in this case would be 2, so the 2 rows that contain the maximum values for each group.
Here is my dataframe
structure(list(V01_Code = c("AMI", "CII", "AFI", "AII", "BII",
"AII", "IIA", "IMU"), V01_Corr = c(0.63, -0.61, 0.51, 0.52, 0.52,
0.58, 0.58, 0.52), V01_Lag = c("L7", "L7", "L7", "L7", "L7",
"L6/L7", "L6", "L6"), V02_Code = c(NA, "CMI", NA, "BII", "IIA",
"BII", "IMA", NA), V02_Corr = c(NA, -0.53, NA, 0.62, 0.74, 0.69,
0.59, NA), V02_Lag = c(NA, "L7", NA, "L4", "L6", "L4", "L6",
NA), V03_Code = c(NA, NA, NA, "BMI", "III", "BMI", "IMI", NA),
V03_Corr = c(NA, NA, NA, 0.6, 0.51, 0.7, 0.52, NA), V03_Lag = c(NA,
NA, NA, "L7", "L7", "L7", "L6", NA), V04_Code = c(NA, NA,
NA, "III", "IMA", "IIA", NA, NA), V04_Corr = c(NA, NA, NA,
0.58, 0.75, 0.57, NA, NA), V04_Lag = c(NA, NA, NA, "L4",
"L6", "L4", NA, NA), Group = c("B", "A", "A", "B", "A", "A",
"B", "A")), row.names = c("1", "2", "3", "4", "5", "6",
"7", "8"), class = "data.frame")

Here is an option using reshaping: create a row sequence column ('rn') with row_number, reshape from wide to long with pivot_longer, arrange the rows by 'Group' and by 'value' in descending order, keep the rows that belong to the first 'n' unique 'rn' values in each group, then ungroup and reshape back to wide format with pivot_wider:
library(dplyr)
library(tidyr)
df1 %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = ends_with("Corr"), names_to = 'Corr') %>%
arrange(Group, desc(value)) %>%
group_by(Group) %>%
filter(rn %in% head(unique(rn), 2)) %>%
ungroup %>%
select(-rn) %>%
pivot_wider(names_from = Corr, values_from = value)
-output
# A tibble: 4 x 13
V01_Code V01_Lag V02_Code V02_Lag V03_Code V03_Lag V04_Code V04_Lag Group V04_Corr V02_Corr V03_Corr V01_Corr
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 BII L7 IIA L6 III L7 IMA L6 A 0.75 0.74 0.51 0.52
2 AII L6/L7 BII L4 BMI L7 IIA L4 A 0.57 0.69 0.7 0.58
3 AMI L7 <NA> <NA> <NA> <NA> <NA> <NA> B NA NA NA 0.63
4 AII L7 BII L4 BMI L7 III L4 B 0.58 0.62 0.6 0.52
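The same result can be sketched without reshaping by first computing one maximum per row. This is a base R sketch under assumptions: only the _Corr columns and Group are reproduced from the question's data, and row_max and top_n_rows are names invented for this example.

```r
df <- data.frame(
  V01_Corr = c(0.63, -0.61, 0.51, 0.52, 0.52, 0.58, 0.58, 0.52),
  V02_Corr = c(NA, -0.53, NA, 0.62, 0.74, 0.69, 0.59, NA),
  V03_Corr = c(NA, NA, NA, 0.60, 0.51, 0.70, 0.52, NA),
  V04_Corr = c(NA, NA, NA, 0.58, 0.75, 0.57, NA, NA),
  Group    = c("B", "A", "A", "B", "A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

corr_cols <- grep("_Corr$", names(df), value = TRUE)

# Row-wise maximum over the correlation columns, ignoring NA
df$row_max <- do.call(pmax, c(df[corr_cols], na.rm = TRUE))

# Keep the n rows with the largest row maximum
top_n_rows <- function(d, n) head(d[order(-d$row_max), ], n)

# Apply per group and stack the results
top <- do.call(rbind, lapply(split(df, df$Group), top_n_rows, n = 2))
```

Because each row contributes a single value, two top correlations in the same row count only once, as the question requires. For the minimum, pmin and an ascending order could be used in the same way.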

Using pmap with c(...) part 2

I have recently been exploring the various applications of the pmap function and its variants, and I am particularly interested in using c(...) to pass all the arguments along. The following data set belongs to another question that we discussed earlier today with a number of very knowledgeable users.
We were supposed to repeat the values in weight column based on values in Days column along their respective rows to get the following output:
df <- tribble(
~Name, ~School, ~Weight, ~Days,
"Antoine", "Bach", 0.03, 5,
"Antoine", "Ken", 0.02, 7,
"Barbara", "Franklin", 0.04, 3
)
Output:
df %>%
mutate(map2_dfr(Weight, Days, ~ set_names(rep(.x, .y), 1:.y))) %>%
select(-c(Weight, Days))
# A tibble: 3 x 9
Name School `1` `2` `3` `4` `5` `6` `7`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
This output is achievable through various solutions, but the following one, proposed by one of the contributors, caught my attention. I would like to know how I could rewrite it by means of c(...):
# This is not my code and it works:
pmap_dfr(df, function(Weight, Days, ...) c(..., setNames(rep(Weight, Days), 1:Days)))
# And I can also rewrite it in the following way which also works:
df %>%
mutate(data = pmap(list(Weight, Days), ~ setNames(rep(.x, .y), 1:.y))) %>%
unnest_wider(data)
But I would like to know why neither of these works:
df %>%
mutate(pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days))))
df %>%
pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days)))
Thank you very much in advance and so sorry for the long description.

The issue seems to be mixing the custom anonymous function (function(Weight, Days, ...), whose arguments are named after the columns) with the formula-style lambda (~, whose arguments are .x and .y when there are two, or ..1, ..2, ..3, etc. when there are more). In the OP's code
library(dplyr)
library(purrr)
df %>%
mutate(pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days))))
'Weight' and 'Days' return the full column values from the original dataset, not the values of the current row. If we still want to make use of the above command, we need to convert the data captured in each row to a tibble and use with:
df %>%
pmap_dfr(., ~ with(as_tibble(list(...)),
setNames(rep(Weight, Days), seq_len(Days))))
# A tibble: 3 x 7
# `1` `2` `3` `4` `5` `6` `7`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 0.03 0.03 0.03 0.03 0.03 NA NA
#2 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 0.04 0.04 0.04 NA NA NA NA
If we want the other columns,
df %>%
pmap_dfr(., ~ c(list(...)[-(3:4)], with(as_tibble(list(...)),
setNames(rep(Weight, Days), seq_len(Days)))))
# A tibble: 3 x 9
# Name School `1` `2` `3` `4` `5` `6` `7`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
#2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
Or use rowwise
library(tidyr)
df %>%
rowwise %>%
mutate(out = list(setNames(rep(Weight, Days), seq_len(Days)))) %>%
ungroup %>%
unnest_wider(c(out)) %>%
select(-Weight, -Days)
# A tibble: 3 x 9
# Name School `1` `2` `3` `4` `5` `6` `7`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
#2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
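The role of ... can be seen in isolation with base R's Map, which is roughly analogous to pmap here. This is a sketch rather than a purrr solution: expand_row is a hypothetical helper, and the point is only that list(...) captures one row's values by column name, so they have to be looked up from that list.

```r
df <- data.frame(
  Name   = c("Antoine", "Antoine", "Barbara"),
  School = c("Bach", "Ken", "Franklin"),
  Weight = c(0.03, 0.02, 0.04),
  Days   = c(5, 7, 3),
  stringsAsFactors = FALSE
)

expand_row <- function(...) {
  row <- list(...)  # one row's values, named after the columns
  # Weight and Days are read from the captured row, not from df
  setNames(rep(row$Weight, row$Days), seq_len(row$Days))
}

# Call expand_row once per row, passing each column as a named argument
out <- do.call(Map, c(list(f = expand_row), df))
```

Inside a ~ lambda the situation is the same: the row values live in ..., so Weight and Days must be taken from list(...) (or from ..3 and ..4) rather than referenced as bare names.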

This may not add much value, but it may help in understanding how arguments work in lambda functions.
pmap_df(df, ~ c(setNames(c(..1, ..2), names(df[1:2])), setNames(rep(..3, ..4), seq_len(..4))))
# A tibble: 3 x 9
Name School `1` `2` `3` `4` `5` `6` `7`
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
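pmap_df alone is sufficient here; pmap_dfr may be redundant.
You can pass specific arguments positionally as ..1, ..2, etc.
Note that every column above is <chr>: c() combines the character and numeric atomic vectors into a single character vector. Wrapping the first two arguments in a list keeps the numeric columns as <dbl>: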
pmap_df(df, ~ c(list(...)[1:2], setNames(rep(..3, ..4), seq_len(..4))))
# A tibble: 3 x 9
Name School `1` `2` `3` `4` `5` `6` `7`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
