How to create a row by dividing First row by third row - r

I have a dataset which has values in first row & total in third row. I want to create a fourth row which is percentage of first by total which can be done by dividing first row with fourth row.
below is structure of dataframe
ds = structure(list(t1 = structure(c("1", "2", "Total"), label = "currently smoke any tobacco product", labels = c(no = 0,
yes = 1), class = "haven_labelled"), c1Female = c(679357.516868591,
8394232.81394577, 9073590.33081436), c1Male = c(2254232.8617363,
5802560.20343018, 8056793.06516647), se.c1Female = c(63743.4459540534,
421866.610586848, 485610.056540901), se.c1Male = c(185544.754820322,
386138.725133411, 571683.479953732), Total_1 = c(`1` = 2933590.37860489,
`2` = 14196793.0173759, `3` = 17130383.3959808), per = c(`1` = 0.171250713471665,
`2` = 0.828749286528335, `3` = 1)), class = "data.frame", row.names = c(NA,
-3L))
My try & what is wrong with this
ds %>% mutate(percentage = .[1,]/.[3,])
OUTPUT SHOULD BE : Below is the dput of Output Dataframe that I want
structure(list(t1 = structure(c(1L, 2L, 4L, 3L), .Label = c("1",
"2", "Percentage", "Total"), class = "factor"), c1Female = c(679357.517,
8394232.814, 9073590.331, 0.074871963), c1Male = c(2254232.86,
5802560.2, 8056793.07, 0.279792821), se.c1Female = c(63743.446,
421866.611, 485610.057, 0.131264674), se.c1Male = c(185544.755,
386138.725, 571683.48, 0.324558539), Total_1 = c(2933590.38,
14196793.02, 17130383.4, 0.171250714), per = c(0.171250713, 0.828749287,
1, 0.171250713)), class = "data.frame", row.names = c(NA, -4L
))
Do share the tidyverse way to do this. Also, do tell what is wrong with this approach below line code
ds %>% mutate(percentage = .[1,]/.[3,])

We can use summarise_at to divide multiple column values to return a single row and then bind with the original dataset
library(dplyr)
ds %>%
summarise_at(-1, ~ .[1]/.[3]) %>%
mutate(t1 = 'Percentage') %>%
bind_rows(ds, .)
# t1 c1Female c1Male se.c1Female se.c1Male Total_1 per
#1 1 6.793575e+05 2.254233e+06 6.374345e+04 1.855448e+05 2.933590e+06 0.1712507
#2 2 8.394233e+06 5.802560e+06 4.218666e+05 3.861387e+05 1.419679e+07 0.8287493
#3 Total 9.073590e+06 8.056793e+06 4.856101e+05 5.716835e+05 1.713038e+07 1.0000000
#4 Percentage 7.487196e-02 2.797928e-01 1.312647e-01 3.245585e-01 1.712507e-01 0.1712507
Or another option is add_row
ds %>%
add_row(t1 = 'Percentage') %>%
mutate_at(-1, ~ replace_na(., .[1]/.[3]))
Or do this within the add_row step itself
ds %>%
add_row(t1 = 'Percentage', !!!as.list(.[-1][1,]/.[-1][3,]))
# t1 c1Female c1Male se.c1Female se.c1Male Total_1 per
#1 1 6.793575e+05 2.254233e+06 6.374345e+04 1.855448e+05 2.933590e+06 0.1712507
#2 2 8.394233e+06 5.802560e+06 4.218666e+05 3.861387e+05 1.419679e+07 0.8287493
#3 Total 9.073590e+06 8.056793e+06 4.856101e+05 5.716835e+05 1.713038e+07 1.0000000
#4 Percentage 7.487196e-02 2.797928e-01 1.312647e-01 3.245585e-01 1.712507e-01 0.1712507

Related

How to cbind a list of tables by one column, and suffix headings with the list item name

I've got a list of dataframes. I'd like to cbind them by the index column, sample_id. Each table has the same column headings, so I can't just cbind them otherwise I won't know which list item the columns came from. The name of the list item gives the measure used to generate them, so I'd like to suffix the column headings with the list item name.
Here's a simplified demo list of dataframes:
list_of_tables <- list(number = structure(list(sample_id = structure(1:3, levels = c("CSF_1",
"CSF_2", "CSF_4"), class = "factor"), total = c(655, 331, 271
), max = c(12, 5, 7)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame")), concentration_cm_3 = structure(list(sample_id = structure(1:3, levels = c("CSF_1",
"CSF_2", "CSF_4"), class = "factor"), total = c(121454697, 90959097,
43080697), max = c(2050000, 2140000, 915500)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame")), volume_nm_3 = structure(list(
sample_id = structure(1:3, levels = c("CSF_1", "CSF_2", "CSF_4"
), class = "factor"), total = c(2412783009, 1293649395, 438426087
), max = c(103500000, 117400000, 23920000)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame")), area_nm_2 = structure(list(
sample_id = structure(1:3, levels = c("CSF_1", "CSF_2", "CSF_4"
), class = "factor"), total = c(15259297.4, 7655352.2, 3775922
), max = c(266500, 289900, 100400)), row.names = c(NA, -3L
), class = c("tbl_df", "tbl", "data.frame")))
You'll see it's a list of 4 tables, and the list item names are "number", "concentration_cm_3", "volume_nm_3", and "area_nm_2".
Using join_all from plyr I can merge them all by sample_id. However, how do I suffix with the list item name?
merged_tables <- plyr::join_all(stats_by_measure, by = "sample_id", type = "left")
we could do it this way:
The trick is to use .id = 'id' in bind_rows which adds the name as a column. Then we could pivot:
library(dplyr)
library(tidyr)
bind_rows(list_of_tables, .id = 'id') %>%
pivot_wider(names_from = id,
values_from = c(total, max))
sample_id total_number total_concentration_cm_3 total_volume_nm_3 total_area_nm_2 max_number max_concentration_cm_3 max_volume_nm_3 max_area_nm_2
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 CSF_1 655 121454697 2412783009 15259297. 12 2050000 103500000 266500
2 CSF_2 331 90959097 1293649395 7655352. 5 2140000 117400000 289900
3 CSF_4 271 43080697 438426087 3775922 7 915500 23920000 100400
Probably, we may use reduce2 here with suffix option from left_join
library(dplyr)
library(purrr)
nm <- names(list_of_tables)[1]
reduce2(list_of_tables, names(list_of_tables)[-1],
function(x, y, z) left_join(x, y, by = 'sample_id', suffix = c(nm, z)))
Or if we want to use join_all, probably we can rename the columns before doing the join
library(stringr)
imap(list_of_tables, ~ {
nm <- .y
.x %>% rename_with(~str_c(.x, nm), -1)
}) %>%
plyr::join_all( by = "sample_id", type = "left")
Or use a for loop
tmp <- list_of_tables[[1]]
names(tmp)[-1] <- paste0(names(tmp)[-1], names(list_of_tables)[1])
for(nm in names(list_of_tables)[-1]) {
tmp2 <- list_of_tables[[nm]]
names(tmp2)[-1] <- paste0(names(tmp2)[-1], nm)
tmp <- left_join(tmp, tmp2, by = "sample_id")
}
tmp

Plot multiple geom_line and geom_smooth objects in one plot

I have somewhat messy looking dataframes, like this one:
df0
# A tibble: 3 x 9
# Groups: Sequ [1]
Sequ Speaker Utterance A_intpl A_dur B_intpl B_dur C_intpl C_dur
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2 ID16.A cool >wha… 31.44786152… 10.5,17,1… 32.86993284… 9.5,16,17… 58.3368399… 14,17,17…
2 2 NA (0.228) 32.75735987… 15.5,17,1… 30.83469006… 14.5,16.9… 26.0386462… 3,17,16,…
3 2 ID16.B u:m Tenne… 32.05752604… 4.5,17,16… 29.95825107… 3.5,16,17… 55.9298614… 8,17,17,…
I want to plot the *_intpl values for each speaker (A, B, or C) for each of the three Utterances in a single chart both as line charts and as trend lines.
I'm just half successful doing this:
library(tidyr)
library(ggplot2)
library(dplyr)
df0 %>%
pivot_longer(cols = contains("_"),
names_to = c("Event_by", ".value"),
names_pattern = "^(.*)_([^_]+$)") %>%
separate_rows(c(intpl, dur), sep = ",", convert = TRUE) %>%
mutate(Time = cumsum(dur)) %>%
mutate(Utterance = paste0(sub(".*(.)$", "\\1",Speaker), ": ", Utterance),
Utterance = factor(Utterance, levels = unique(Utterance))) %>%
ggplot(aes(x = Time, y = log2(intpl),
group = Event_by,
colour = Event_by)) +
geom_line()+
geom_smooth(method = 'lm', color = "red", formula = y~x)+
facet_wrap(~ Utterance, ncol = 1, scales= "free_x")
Half successful because the line plots and trend lines are side-by-side, as if in three columns, whereas they should be in rows, one below the other - how can that be achieved?
Reproducible data:
structure(list(Sequ = c(2L, 2L, 2L), Speaker = c("ID16.A", NA,
"ID16.B"), Utterance = c("cool >what part?<", "(0.228)", "u:m Tennessee="
), A_intpl = c("31.4478615210995,31.5797510648522,31.7143985369445,31.651083739602,31.5806035086034,36.8956763912703,36.2882129597292,35.2124499461012,34.1366869324732,34.1366869324732,32.1927035724058,30.2487202123383,28.3047368522709,26.3607534922035,30.5278334848495,30.5919390424853,30.8898529369568,31.578968913188,31.9011198738002,32.1543265113196,31.9708002079533,31.966536408565,31.8762658607759,31.8994741472105,31.4215913971938,32.1510578328563,31.7863350712876,32.4685052625667,31.7422271490296,32.3286054977263,31.9998974949481,32.5177992323864,32.4727499785435,32.9310888953766,32.7592010033585,33.2231711877427,33.1593949301066,33.2432973964816,33.2569729073414,33.492144800249,33.317650964723,33.4835787832119,33.2377190454279,32.9200836384356,32.9684568771567,32.6400987016883,27.5447101464944,29.3948945479171,35.3449171857603,33.5932932239592,31.8416692621581,30.0900453003569,32.7850431084597,32.7589003618266,32.8365550655013,32.386716057622,32.8420792704881,32.6909995562489,32.6269434402016,32.7370944106334,32.7529759209752,32.6528826975113,32.3663573764448,32.7326853004792,32.6930038462418,32.8975978772676,33.1752899475416,33.2034433355001,33.0667431432803,32.6322933080614,33.2503168843178,32.7573598713719",
"32.7573598713719,32.7531704791313,32.7366130631104,32.918942216354,32.8309939530596,32.3856893430525,32.5368873543441,32.5628510484821,32.5628510484821,32.5628510484821,32.5506564332008,32.7477119716583,32.3458470743288,32.0575260428013",
"32.0575260428013,32.1628824338111,32.0093334061923,32.1461460586991,31.9080762250966,31.9469105074833,31.7431187667232,31.7194255656503,31.7394296413187,31.8594986292975,31.7498243274746,31.9069142374258,32.0835520942767,31.6257067057109,31.757232379438,31.9036689124911,32.1319749301918,31.7203280774998,31.7877137245706,32.3030946636177,32.2800139298454,32.164646135728,32.3636504940227,32.5657818936495,32.3859453482697,32.4797898358193,32.5319835105237,32.92233491509,32.8240561109448,32.664496027779,33.1835064752029,33.0366413969703,33.0406288190821,33.3232964677672,33.2206260057731,33.1537134269402,33.2783471506207,33.2933281566788,33.5322350394609,33.3815736723684,33.7905544185063,33.6143820666896,33.7490659591585,33.7260102344634,34.0721931066557,34.0455026427054,34.3735788774521,34.2888420421073,34.3913721165542,34.5982135545306,34.4417202731001,34.6586347152449,31.1590521215434,31.3276405983897,28.2379253186548,31.133030931336,34.0715906921349,35.8967950760285,35.9334551147377,35.8565504335515,35.7446081905229,35.6300325834155,35.8390086948751,35.9711743270411,36.0029493274176,35.8891056768339"
), A_dur = c("10.5,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,0.5",
"15.5,17,17,16,17,17,16,17,17,16,17,17,16,12.5", "4.5,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,5.5"
), B_intpl = c("32.8699328424689,32.8154348109057,32.5454364786882,32.408257038977,32.5304564519672,32.3270203236281,31.9233218634346,32.0166346064182,31.7360745988363,31.7546527359571,31.8603220354065,31.6520061326962,31.5603191463274,31.3357561466519,31.0976090032219,31.1405090978825,31.1697180784961,31.0863999545386,31.3126984044729,30.580776446803,30.7137016246273,31.0801914571091,31.2343922096768,31.2749857511594,31.3488604642844,30.9327390960718,31.0750482778561,31.1849119826023,31.4180114886183,31.5284273181104,31.147361398529,31.1128597713973,31.5551385744611,31.7479939892741,31.5890352680344,31.5470790538009,31.5427330200078,31.3901913024084,31.5423214446953,31.4814325586741,31.4937336232021,31.3483738841556,31.2516462059018,31.2233881922543,31.2572951780583,31.0087226975291,31.1197589042273,31.053748381687,30.8202174718598,30.845143129195,30.8727194789634,30.4231467151428,30.7254093759809,30.2757746547116,30.6047530953025,29.6835591414008,28.257421076205,29.4634886416064,29.183064807185,28.6935506287734,29.3989017421637,30.8936090542518,30.6884831327852,30.805770713392,30.6938909098627,30.8317757801268,30.8509115577427,30.6836198471168,30.7979978629801,31.0260101704105,30.6248844591805,30.8346900656087",
"30.8346900656087,30.9826158466835,29.814086001996,29.7839590794955,30.7928804535206,31.1589874726521,31.0547403039501,31.2268131145794,31.155503802286,31.3036925274762,31.4782621660348,31.0928322383151,31.589958621025,29.9582510795225",
"29.9582510795225,29.9796434055214,29.9405638729798,30.2602098442174,30.5011865525849,30.6753859842987,28.9331380886365,30.7736467776919,30.8457967803438,30.843630408183,30.8767570425033,30.9178344980247,30.734598946287,30.8877440413271,30.9225051837881,30.9534076039184,31.0172861192043,30.9371712793451,30.9806052132295,31.0593603717961,31.1156928565737,30.4713263393479,26.028518302418,28.1426546887905,29.4308434671559,30.7190322455213,31.2289674937063,31.7389027418913,32.2488379900763,32.7587732382613,33.2687084864463,33.7786437346312,34.2885789828162,34.7985142310012,35.3084494791862,35.8183847273712,36.3283199755562,36.8382552237412,37.3481904719262,37.8581257201112,38.3680609682962,25.5986933949893,29.7968031963901,30.5336819967028,30.1876589408847,30.4260367500101,30.2997107671214,30.3429716412578,30.3537316791924,30.4111899964144,30.7293520851914,30.7778983966343,30.9712137067708,30.9072589183658,31.0696990205164,30.5713926084448,31.3458855877875,31.4169903025083,31.5148974986093,31.5972499257413,31.2293401943969,31.2033325602348,31.1657434266985,30.6784877073261,30.6991365599664,30.6763195188897"
), B_dur = c("9.5,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,2.5",
"14.5,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,13.4999999999854",
"3.5,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,7.5"
), C_intpl = c("58.3368399069697,58.249224089011,59.5198368051218,58.8722012497097,58.4418996252205,58.5849059154389,59.2752163985494,52.8407480422202,51.6276603912397,48.0255346632529,44.753541512539,41.4815483618252,38.2095552111114,34.9375620603975,31.6655689096837,28.3935757589698,25.121582608256,19.4712933827274,22.0108873782783,24.5504813738291,24.8441573376901,24.6902151101703,24.4029572181118,24.9753161974674,24.8664406826514,24.8486668451201,25.1137001504163,25.1142578332509,25.4902077628339,25.4075561268027,25.6622548410237,61.2421678149908,25.1600975771354,25.6667198263373,25.442560744158,25.8736383423437,25.5859074180431,24.7860400673889,24.4337707697216,24.3214953242744,23.915753514736,23.7363185577661,23.7186569801299,23.4313514771952,23.5730151254578,62.5124513171595,23.3260531660862,23.4498217326665,23.2145314844252,57.5586745434594,63.4646233226955,23.0706406704345,23.3318690599491,62.044649715831,62.2720656330432,22.2532276715887,62.7059140614625,22.9511208849958,22.5603175709988,23.3456453893988,63.2523901625561,60.6655429980934,60.2358824325868,59.957910796633,57.3999702562457,54.8277282980263,43.0269305132552,31.2261327284841,19.425334943713,22.7319906068577,26.0386462700023",
"26.0386462700023,29.345301933147,32.6519575962917,35.9586132594364,48.3773995023798,60.7961857453232,49.4980424442242,55.9907960862667,57.2956837917999,58.1409925994177,59.025022056064,60.0098263540792,60.4028460580062,61.2629030450653,55.9298614021542",
"55.9298614021542,55.3877180252389,61.3547152702855,61.7847919095391,56.2457623439544,62.5477315546977,62.3078007189967,62.4272469013149,57.6479672147315,62.9844338801191,58.0081708266629,63.3872796098875,59.0138830718112,58.0612924481098,58.38680047729,58.687179350318,63.8724230039733,63.4126777597892,63.6865154626743,63.5670658627636,63.4496590540706,63.7595297692908,58.9069708176601,63.4547681163061,64.3198376700797,63.415319961042,64.0985879957056,64.1201809531605,63.677902665454,64.1934303628317,64.4682003346273,64.2868853545462,24.8444135816353,64.1579626357752,63.8897139146875,58.5472675827292,64.5784992977498,64.0848591719068,63.8841268679761,64.2901359712354,64.395692486112,64.5425896391638,64.8060565909917,64.3618830026368,64.7088481705444,64.5005944199885,64.5540289192148,64.7408010459365,63.378880767685,63.3415589069662,63.5362700331647,63.5924807719723,63.575801461932,63.6799360982113,64.0041021410894,64.3144923757986,63.8692943755376,63.8594574363473,64.2731841085802,63.3314657812309,64.2758880216293,64.1011768977101,64.0261661917799,64.2865302330478,63.724697791255,64.1202175712152"
), C_dur = c("14,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,14",
"3,17,16,17,17,16,17,17,16,17,17,16,17,17,8", "8,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,2"
)), row.names = c(NA, -3L), groups = structure(list(Sequ = 2L,
.rows = structure(list(1:3), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
There's a possible solution with use of grid.arrange() func from library(gridExtra) library(grid) packages.
I've wrapped your data into unique charts and combined them together into arranged chart.
df1 = df0 %>%
pivot_longer(cols = contains("_"),
names_to = c("Event_by", ".value"),
names_pattern = "^(.*)_([^_]+$)") %>%
separate_rows(c(intpl, dur), sep = ",", convert = TRUE) %>%
mutate(Time = cumsum(dur)) %>%
mutate(Utterance = paste0(sub(".*(.)$", "\\1",Speaker), ": ", Utterance),
Utterance = factor(Utterance, levels = unique(Utterance)))
Set chart objects into enviroment:
for (i in unique(df1$Event_by)){
for (j in levels(df1$Utterance)){
assign(x = paste0(i,j), value = ggplot(data = df1[df1$Event_by == i & df1$Utterance == j,], aes(x = Time, y = log2(intpl))) +
geom_line()+
geom_smooth(method = 'lm', color = "red", formula = y~x))
}
}
Create grided chart:
library(gridExtra) library(grid)
grid.arrange(
`AA: cool >what part?<`,
`AB: u:m Tennessee=` ,
`ANA: (0.228)` ,
`BA: cool >what part?<` ,
`BB: u:m Tennessee=` ,
`BNA: (0.228)` ,
`CA: cool >what part?<` ,
`CB: u:m Tennessee=` ,
`CNA: (0.228)` ,
nrow = 3)
Although i think there should be better solution for that.
You can also try to explore below articlesfor arranging plots:
http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/81-ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page/
https://ggplot2-book.org/facet.html
Moreover, there's is no themming added to my solution

How to convert data with different levels of information into wide format? [duplicate]

This question already has an answer here:
Reshaping data.frame with a by-group where id variable repeats [duplicate]
(1 answer)
Closed 2 years ago.
I have a data of patients' operations/procedures (example as shown in the picture below) where one row describes a patient's procedure. There are 2 levels of information,
the first being the operation details, i.e. op_start_dt, priority_operation and asa_status
the second being the procedure details, i.e. proc_desc and proc_table
An operation can have more than 1 procedures. In the example below, patient A has 2 operations (defined by distinct op_start_dt). In his first operation, he had 1 procedure (defined by distinct proc_desc) and in his second, he had 2 procedures.
I would like to convert the data into a wide format, where a patient only has one row, and his information will be arranged operation by operation and within each operation, it will be arrange procedure by procedure, as shown below. So, proc_descxy refers to the proc_desc on xth operation and yth procedure.
Data:
df <- structure(list(patient = c("A", "A", "A"), department = c("GYNAECOLOGY /OBSTETRICS DEPT",
"GYNAECOLOGY /OBSTETRICS DEPT", "GYNAECOLOGY /OBSTETRICS DEPT"
), op_start_dt = structure(c(1424853000, 1424870700, 1424870700
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), priority_operation = c("Elective",
"Elective", "Elective"), asa_status = c(2, 3, 3), proc_desc = c("UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
"KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
"HEART, VARIOUS LESIONS, HEART TRANSPLANTATION"), proc_table = c("99",
"6A", "7C")), row.names = c(NA, 3L), class = "data.frame")
Desired output:
df <- structure(list(patient = "A", department = "GYNAECOLOGY /OBSTETRICS DEPT",
no_op = 2, op_start_dt1 = structure(1424853000, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), no_proc1 = 1, priority_operation1 = "Elective",
asa_status1 = 2, proc_desc11 = "UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
proc_table11 = "99", op_start_dt2 = structure(1424870700, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), no_of_proc2 = 2, priority_operation2 = "Elective",
asa_status2 = 3, proc_desc21 = "KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
proc_table21 = "6A", proc_desc22 = "HEART, VARIOUS LESIONS, HEART TRANSPLANTATION",
proc_table22 = "7C"), row.names = 1L, class = "data.frame")
My attempt:
I tried to work this out, but it gets confusing along the way, with pivot_longer then pivot_wideragain.
df %>%
# Operation-level Information
group_by(patient) %>%
mutate(op_nth = dense_rank(op_start_dt),
no_op = n_distinct(op_start_dt)) %>%
# Procedure-level Information
group_by(patient, op_start_dt) %>%
mutate(proc_nth = row_number(),
no_proc = n_distinct(proc_desc)) %>%
ungroup() %>%
# Make pivoting easier
mutate_all(as.character) %>%
# Pivot Procedure-level Information
pivot_longer(-c(patient, department, no_op, op_nth, proc_nth)) %>%
# Remove the indices for "Procedure" for Operation_level Information
mutate(proc_nth = case_when(!(name %in% c("op_start_dt", "no_proc", "priority_operation", "asa_status")) ~ proc_nth)) %>%
# Create the column names
unite(name, c(name, op_nth, proc_nth), sep = "", na.rm = TRUE) %>%
distinct() %>%
pivot_wider(names_from = name, values_from = value)
Create a unique ID column for each patient and then use pivot_wider.
library(dplyr)
df %>%
group_by(patient) %>%
mutate(row = row_number()) %>%
tidyr::pivot_wider(names_from = row, values_from = op_start_dt:proc_table)

How can I pivot_longer() while maintaining column pairings?

There's got to be a simpler way to do this!
I start with wide format data:
| family_id | first_name_child1 | surname_child1 | first_name_child2 | second_name_child2 | ... |
|...........|...................|................|...................|....................|.....|
And I want to turn it into long format:
| family_id | sibling_number | first_name | surname |
|...........|................|............|.........|
Question: How can I pivot_longer() while maintaining the first name/surname pairings?
This is how I did it:
df <- structure(list(family_id = 1:2, first_name_child1 = c("Verdie",
"Quentin"), first_name_child2 = c("Iris", "Bryon"), first_name_child3 = c(NA,
"Karie"), first_name_child4 = c(NA, "Christopher"), surname_child1 = c("Moy",
"Mccowen"), surname_child2 = c("Moy", "Mccowen"), surname_child3 = c(NA,
"Mccowen"), surname_child4 = c(NA, "Mccowen")), row.names = c(NA,
-2L), class = c("tbl_df", "tbl", "data.frame"))
library(dplyr)
library(tidyr)
fun <- function(x) {
names(x) <- gsub("_child\\d+", "", names(x))
x
}
df %>%
nest(child1 = ends_with("_child1"),
child2 = ends_with("_child2"),
child3 = ends_with("_child3"),
child4 = ends_with("_child4")) %>%
mutate_at(vars(starts_with("child")), lapply, fun) %>%
pivot_longer(-family_id, names_to = "sibling_number",
names_prefix = "child",
values_to = "name") %>%
unnest(name)
BUT I can do the reverse with 1 line:
df2 <- structure(list(family_id = c(1L, 1L, 2L, 2L, 2L, 2L), sibling_number = c(1L,
2L, 1L, 2L, 3L, 4L), first_name = c("Verdie", "Iris", "Quentin",
"Bryon", "Karie", "Christopher"), surname = c("Moy", "Moy", "Mccowen",
"Mccowen", "Mccowen", "Mccowen")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
pivot_wider(df2,
names_from = sibling_number,
names_prefix = "child",
values_from = c("first_name", "surname"))
Is this pivot_wider() easily reversible? Or alternatively, I thought there might be a way to combine do.call(), nest() and ends_with(), but couldn't work it out?
additional solution
df %>%
pivot_longer(cols = -family_id,
names_to = c(".value", "set"),
names_pattern = "(.*)(\\d+)")
The data.table solution would be:
library(data.table)
g <- melt(setDT(df),
id.vars = "family_id",
measure.vars = patterns(first_name = "first_name_child",
surname = "surname_child"),
variable.name = "sibling_number",
na.rm = F)
g[order(family_id)]
family_id sibling_number first_name surname
1: 1 1 Verdie Moy
2: 1 2 Iris Moy
3: 1 3 <NA> <NA>
4: 1 4 <NA> <NA>
5: 2 1 Quentin Mccowen
6: 2 2 Bryon Mccowen
7: 2 3 Karie Mccowen
8: 2 4 Christopher Mccowen
And, as a side note, you can get it back to wide format with
dcast(g,
family_id ~ sibling_number,
value.var = c("first_name","surname"))
This answer is copied from Henrik's comment:
pivot_longer(df, cols = -1, names_to = c(".value", "sibling_nr"), names_sep = "child")
Answer is posted and accepted to close out question.

How do I unnest a nested df and use the coumn name as part of the new column name?

I realize my title is probably a little confusing. I have some JSON that is a little confusing to unnest. I am trying to use the tidyverse.
Sample Data
df <- structure(list(long_abbr = c("Team11", "BBS"), short_name = c("Ac ",
"BK"), division = c("", ""), name = c("AC Slaters Muscles", "Broken Bats"
), abbr = c("T1", "T1"), owners = list(structure(list(commissioner = 0L,
name = "Chris Liss", id = "300144F8-79F4-11EA-8F25-9AE405472731"), class = "data.frame", row.names = 1L),
structure(list(commissioner = 1L, name = "Mark Ortin", id = "90849EF6-7427-11EA-95AA-4EEEAC7F8CD2"), class = "data.frame", row.names = 1L)),
id = c("1", "2"), logged_in_team = c(NA_integer_, NA_integer_
)), row.names = 1:2, class = "data.frame")
)
# Unnest Owners Information
df <- df %>%
unnest(owners)
I get the following error since I have duplicate columns that use name.
Error: Column names `name` and `id` must not be duplicated.
Is there an easy way to unnest the columns with a naming convention that takes the prefix owners (or in my case, I'd want it to take whatever the name of the column that hold the nested df is) before the nested columns. I.E. owners.commissioner, owners.name, owners.id. I'd also be interested in solutions that use camel case, and an underscore. I.E. ownersName, or owners_name.
set the argument names_sep:
df <- structure(
list(long_abbr = c("Team11", "BBS"),
short_name = c("Ac ", "BK"),
division = c("", ""),
name = c("AC Slaters Muscles", "Broken Bats"),
abbr = c("T1", "T1"),
owners = list(
structure(list(commissioner = 0L, name = "Chris Liss",
id = "300144F8-79F4-11EA-8F25-9AE405472731"),
class = "data.frame", row.names = 1L),
structure(list(commissioner = 1L, name = "Mark Ortin",
id = "90849EF6-7427-11EA-95AA-4EEEAC7F8CD2"),
class = "data.frame", row.names = 1L)),
id = c("1", "2"),
logged_in_team = c(NA_integer_, NA_integer_)),
row.names = 1:2, class = "data.frame"
)
tidyr::unnest(df, owners, names_sep = "_")
#> # A tibble: 2 x 10
#> long_abbr short_name division name abbr owners_commissi… owners_name
#> <chr> <chr> <chr> <chr> <chr> <int> <chr>
#> 1 Team11 "Ac " "" AC S… T1 0 Chris Liss
#> 2 BBS "BK" "" Brok… T1 1 Mark Ortin
#> # … with 3 more variables: owners_id <chr>, id <chr>, logged_in_team <int>
Created on 2020-04-26 by the reprex package (v0.3.0)
Does this solve your problem?

Resources