Count Backwards in String until pattern R - r

I'm trying to extract UPCs from item descriptions. Each description starts with a varying number of /-separated fields, but the UPC always sits right before the last /. I was extracting it by counting characters from the end, but the number of trailing characters varies with pack size. In the reproducible example, the first row shows what the result is supposed to look like, while the second row has dropped the first digit of the UPC and picked up the trailing /. I'm looking for a way to do this inline with dplyr. My original code is below the example.
test <- structure(list(Month = structure(c(17987, 17987), class = "Date"),store_id = c("7005", "7005"), UPC = c("000004150860081","00001200050404/"), `Item Description` = c("ACQUA PANNA SPRING WATER/EACH/000004150860081/1","AQUAFINA 24PK/24PK/000001200050404/24"), `Cals Item Description` = c(NA_character_,NA_character_), `Sub-Category` = c(NA_character_, NA_character_), Category = c(NA_character_, NA_character_), Department = c(NA_character_,NA_character_), `Sales Dollars` = c(17.43, 131.78), Units = c(7,528), Cost = c(8.4, 112.2), `Gross Margin` = c(9.03, 19.58), `Gross Margin %` = c(0.5181, 0.1486)), row.names = c(NA,-2L), class = c("tbl_df", "tbl", "data.frame"))
foo <- list.files(pattern = "*.csv", full.names = T) %>%
map_df(~read_csv(.)) %>%
mutate(date = lubridate::mdy(str_sub(textbox43, start = -10))) %>%
mutate(store_id = str_sub(textbox6, start = 1, end = 4)) %>%
mutate(item_desc = textbox57) %>%
filter(!is.na(item_desc), item_desc != "") %>%
mutate(dollars = textbox58,
units = textbox59,
cost = textbox61,
gm = textbox66,
gm_pct = textbox67) %>%
mutate(UPC = str_sub(item_desc, start = -17, end = -3))

Is this what you want?
sub("^.*/([^/]+)/[^/]*$",
"\\1",
test$`Item Description`)
Returns:
[1] "000004150860081" "000001200050404"
Edit: You were asking for dplyr style:
library(dplyr)
test %>%
mutate(item_id = sub("^.*/([^/]+)/[^/]*$",
"\\1",
`Item Description`))
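If the regular expression is hard to read, the same "second-to-last field" idea can be sketched by splitting on / and counting from the end. This is only an illustration in base R, assuming every description ends with a non-empty pack-size field after the last /; the `desc`, `parts` and `upc` names are made up for the example.

```r
# Two descriptions copied from the question's example data
desc <- c("ACQUA PANNA SPRING WATER/EACH/000004150860081/1",
          "AQUAFINA 24PK/24PK/000001200050404/24")

# Split each description on the literal "/" character
parts <- strsplit(desc, "/", fixed = TRUE)

# Take the second-to-last piece of each split, whatever the number of pieces
upc <- vapply(parts, function(p) p[length(p) - 1], character(1))
upc
```

Inside a pipeline this could be wrapped in a small helper and called from mutate(), just like the sub() call.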

How can I make the group-by code to call a function from another package faster?

I have the code below, which computes a meta value using the meta package:
probMetaControl <- long %>% group_by(ID, sample) %>% group_split() %>% mclapply(mc.cores = 10 ,function(endf){
message(endf$ID[1])
res <- meta::metagen(data = endf, studlab = ID, TE = expression , seTE = sd, sm = "SMD",
n.e = rep(1,nrow(endf)),
method.tau = "REML",
hakn = TRUE,
control = list(maxiter=1000))
data.frame(
ID = endf$ID[1],
sample = endf$sample[1],
meta.exprs = res$TE.fixed,
stringsAsFactors = F
)
}) %>% do.call(what = rbind) %>% as.data.frame()
The long data frame has around 800,000 rows. A small sample of it looks like this:
as.data.table(structure(list(ID = c("h:5982", "h:3310", "h:7849", "h:2978",
"h:7318"), pID = c("X1053_at", "X117_at", "X121_at", "X1255_g_at",
"X1294_at"), sd = c(0.228908614809978, 0.436455554523966, 0.210542866430305,
0.672545478318169, 0.26926204466525), sample = c("A", "B", "A",
"C", "A"), expression = c(6.53920197406645, 6.12380136266864,
8.01553257692446, 4.62636832157394, 7.58222133679378)), row.names = c(NA,
-5L), class = c("data.table", "data.frame")))
At the moment, this code takes 23 minutes to run. Is there any way to make it faster?

Is it possible to make it more readable? treemap

I just want to know how I can make this treemap more readable.
marketcap <- data.frame(Marketcap = c(641899161594, 30552518424, 271028619181,
9277626785, 3986737880, 1202315485,
6049985280, 30722840711),
id = c('Bitcoin', 'Dogecoin', 'Ethereum', 'Litecoin', 'Monero', 'Nem', 'Stellar', 'xrp'),
row.names = c('Bitcoin', 'Dogecoin', 'Ethereum', 'Litecoin', 'Monero', 'Nem',
'Stellar', 'xrp'))
#install.packages('treemap')
library(treemap)
df1 <- na.omit(marketcap[,c('id','Marketcap')])
df1$Marketcap <- as.numeric(round(df1$Marketcap, 0))
df1$formatted_market_cap = paste0(df1$id, '\n', '$', formatC(c("642","30.5","271","9.3","4","1.2","6.044","30.7"), format = "e", digits = 2))
treemap(df1, index = 'formatted_market_cap', vSize = 'Marketcap', title = 'Cryptocurrency Market Cap (bn)', fontsize.labels=c(15, 4), palette='Set3')
For example, the Nem tile looks poor.
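One thing that may help is to build the labels from the data instead of hard-coding them, for example by expressing each market cap in billions with one decimal. The following is a minimal base-R sketch, not a full answer; the `label` column name and the one-decimal format are assumptions made for illustration.

```r
marketcap <- data.frame(
  id = c("Bitcoin", "Dogecoin", "Ethereum", "Litecoin",
         "Monero", "Nem", "Stellar", "xrp"),
  Marketcap = c(641899161594, 30552518424, 271028619181, 9277626785,
                3986737880, 1202315485, 6049985280, 30722840711)
)

# Label = coin name on the first line, market cap in billions on the second
marketcap$label <- sprintf("%s\n$%.1fbn", marketcap$id, marketcap$Marketcap / 1e9)

# cat() shows how one label will be laid out
cat(marketcap$label[marketcap$id == "Nem"], "\n")
```

The `label` column could then be used as the `index` in the existing treemap() call.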

Plot multiple geom_line and geom_smooth objects in one plot

I have somewhat messy looking dataframes, like this one:
df0
# A tibble: 3 x 9
# Groups: Sequ [1]
Sequ Speaker Utterance A_intpl A_dur B_intpl B_dur C_intpl C_dur
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2 ID16.A cool >wha… 31.44786152… 10.5,17,1… 32.86993284… 9.5,16,17… 58.3368399… 14,17,17…
2 2 NA (0.228) 32.75735987… 15.5,17,1… 30.83469006… 14.5,16.9… 26.0386462… 3,17,16,…
3 2 ID16.B u:m Tenne… 32.05752604… 4.5,17,16… 29.95825107… 3.5,16,17… 55.9298614… 8,17,17,…
I want to plot the *_intpl values for each speaker (A, B, or C) for each of the three Utterances in a single chart both as line charts and as trend lines.
I'm just half successful doing this:
library(tidyr)
library(ggplot2)
library(dplyr)
df0 %>%
pivot_longer(cols = contains("_"),
names_to = c("Event_by", ".value"),
names_pattern = "^(.*)_([^_]+$)") %>%
separate_rows(c(intpl, dur), sep = ",", convert = TRUE) %>%
mutate(Time = cumsum(dur)) %>%
mutate(Utterance = paste0(sub(".*(.)$", "\\1",Speaker), ": ", Utterance),
Utterance = factor(Utterance, levels = unique(Utterance))) %>%
ggplot(aes(x = Time, y = log2(intpl),
group = Event_by,
colour = Event_by)) +
geom_line()+
geom_smooth(method = 'lm', color = "red", formula = y~x)+
facet_wrap(~ Utterance, ncol = 1, scales= "free_x")
Half successful because the line plots and trend lines are side-by-side, as if in three columns, whereas they should be in rows, one below the other - how can that be achieved?
Reproducible data:
structure(list(Sequ = c(2L, 2L, 2L), Speaker = c("ID16.A", NA,
"ID16.B"), Utterance = c("cool >what part?<", "(0.228)", "u:m Tennessee="
), A_intpl = c("31.4478615210995,31.5797510648522,31.7143985369445,31.651083739602,31.5806035086034,36.8956763912703,36.2882129597292,35.2124499461012,34.1366869324732,34.1366869324732,32.1927035724058,30.2487202123383,28.3047368522709,26.3607534922035,30.5278334848495,30.5919390424853,30.8898529369568,31.578968913188,31.9011198738002,32.1543265113196,31.9708002079533,31.966536408565,31.8762658607759,31.8994741472105,31.4215913971938,32.1510578328563,31.7863350712876,32.4685052625667,31.7422271490296,32.3286054977263,31.9998974949481,32.5177992323864,32.4727499785435,32.9310888953766,32.7592010033585,33.2231711877427,33.1593949301066,33.2432973964816,33.2569729073414,33.492144800249,33.317650964723,33.4835787832119,33.2377190454279,32.9200836384356,32.9684568771567,32.6400987016883,27.5447101464944,29.3948945479171,35.3449171857603,33.5932932239592,31.8416692621581,30.0900453003569,32.7850431084597,32.7589003618266,32.8365550655013,32.386716057622,32.8420792704881,32.6909995562489,32.6269434402016,32.7370944106334,32.7529759209752,32.6528826975113,32.3663573764448,32.7326853004792,32.6930038462418,32.8975978772676,33.1752899475416,33.2034433355001,33.0667431432803,32.6322933080614,33.2503168843178,32.7573598713719",
"32.7573598713719,32.7531704791313,32.7366130631104,32.918942216354,32.8309939530596,32.3856893430525,32.5368873543441,32.5628510484821,32.5628510484821,32.5628510484821,32.5506564332008,32.7477119716583,32.3458470743288,32.0575260428013",
"32.0575260428013,32.1628824338111,32.0093334061923,32.1461460586991,31.9080762250966,31.9469105074833,31.7431187667232,31.7194255656503,31.7394296413187,31.8594986292975,31.7498243274746,31.9069142374258,32.0835520942767,31.6257067057109,31.757232379438,31.9036689124911,32.1319749301918,31.7203280774998,31.7877137245706,32.3030946636177,32.2800139298454,32.164646135728,32.3636504940227,32.5657818936495,32.3859453482697,32.4797898358193,32.5319835105237,32.92233491509,32.8240561109448,32.664496027779,33.1835064752029,33.0366413969703,33.0406288190821,33.3232964677672,33.2206260057731,33.1537134269402,33.2783471506207,33.2933281566788,33.5322350394609,33.3815736723684,33.7905544185063,33.6143820666896,33.7490659591585,33.7260102344634,34.0721931066557,34.0455026427054,34.3735788774521,34.2888420421073,34.3913721165542,34.5982135545306,34.4417202731001,34.6586347152449,31.1590521215434,31.3276405983897,28.2379253186548,31.133030931336,34.0715906921349,35.8967950760285,35.9334551147377,35.8565504335515,35.7446081905229,35.6300325834155,35.8390086948751,35.9711743270411,36.0029493274176,35.8891056768339"
), A_dur = c("10.5,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,0.5",
"15.5,17,17,16,17,17,16,17,17,16,17,17,16,12.5", "4.5,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,5.5"
), B_intpl = c("32.8699328424689,32.8154348109057,32.5454364786882,32.408257038977,32.5304564519672,32.3270203236281,31.9233218634346,32.0166346064182,31.7360745988363,31.7546527359571,31.8603220354065,31.6520061326962,31.5603191463274,31.3357561466519,31.0976090032219,31.1405090978825,31.1697180784961,31.0863999545386,31.3126984044729,30.580776446803,30.7137016246273,31.0801914571091,31.2343922096768,31.2749857511594,31.3488604642844,30.9327390960718,31.0750482778561,31.1849119826023,31.4180114886183,31.5284273181104,31.147361398529,31.1128597713973,31.5551385744611,31.7479939892741,31.5890352680344,31.5470790538009,31.5427330200078,31.3901913024084,31.5423214446953,31.4814325586741,31.4937336232021,31.3483738841556,31.2516462059018,31.2233881922543,31.2572951780583,31.0087226975291,31.1197589042273,31.053748381687,30.8202174718598,30.845143129195,30.8727194789634,30.4231467151428,30.7254093759809,30.2757746547116,30.6047530953025,29.6835591414008,28.257421076205,29.4634886416064,29.183064807185,28.6935506287734,29.3989017421637,30.8936090542518,30.6884831327852,30.805770713392,30.6938909098627,30.8317757801268,30.8509115577427,30.6836198471168,30.7979978629801,31.0260101704105,30.6248844591805,30.8346900656087",
"30.8346900656087,30.9826158466835,29.814086001996,29.7839590794955,30.7928804535206,31.1589874726521,31.0547403039501,31.2268131145794,31.155503802286,31.3036925274762,31.4782621660348,31.0928322383151,31.589958621025,29.9582510795225",
"29.9582510795225,29.9796434055214,29.9405638729798,30.2602098442174,30.5011865525849,30.6753859842987,28.9331380886365,30.7736467776919,30.8457967803438,30.843630408183,30.8767570425033,30.9178344980247,30.734598946287,30.8877440413271,30.9225051837881,30.9534076039184,31.0172861192043,30.9371712793451,30.9806052132295,31.0593603717961,31.1156928565737,30.4713263393479,26.028518302418,28.1426546887905,29.4308434671559,30.7190322455213,31.2289674937063,31.7389027418913,32.2488379900763,32.7587732382613,33.2687084864463,33.7786437346312,34.2885789828162,34.7985142310012,35.3084494791862,35.8183847273712,36.3283199755562,36.8382552237412,37.3481904719262,37.8581257201112,38.3680609682962,25.5986933949893,29.7968031963901,30.5336819967028,30.1876589408847,30.4260367500101,30.2997107671214,30.3429716412578,30.3537316791924,30.4111899964144,30.7293520851914,30.7778983966343,30.9712137067708,30.9072589183658,31.0696990205164,30.5713926084448,31.3458855877875,31.4169903025083,31.5148974986093,31.5972499257413,31.2293401943969,31.2033325602348,31.1657434266985,30.6784877073261,30.6991365599664,30.6763195188897"
), B_dur = c("9.5,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,2.5",
"14.5,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,13.4999999999854",
"3.5,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,7.5"
), C_intpl = c("58.3368399069697,58.249224089011,59.5198368051218,58.8722012497097,58.4418996252205,58.5849059154389,59.2752163985494,52.8407480422202,51.6276603912397,48.0255346632529,44.753541512539,41.4815483618252,38.2095552111114,34.9375620603975,31.6655689096837,28.3935757589698,25.121582608256,19.4712933827274,22.0108873782783,24.5504813738291,24.8441573376901,24.6902151101703,24.4029572181118,24.9753161974674,24.8664406826514,24.8486668451201,25.1137001504163,25.1142578332509,25.4902077628339,25.4075561268027,25.6622548410237,61.2421678149908,25.1600975771354,25.6667198263373,25.442560744158,25.8736383423437,25.5859074180431,24.7860400673889,24.4337707697216,24.3214953242744,23.915753514736,23.7363185577661,23.7186569801299,23.4313514771952,23.5730151254578,62.5124513171595,23.3260531660862,23.4498217326665,23.2145314844252,57.5586745434594,63.4646233226955,23.0706406704345,23.3318690599491,62.044649715831,62.2720656330432,22.2532276715887,62.7059140614625,22.9511208849958,22.5603175709988,23.3456453893988,63.2523901625561,60.6655429980934,60.2358824325868,59.957910796633,57.3999702562457,54.8277282980263,43.0269305132552,31.2261327284841,19.425334943713,22.7319906068577,26.0386462700023",
"26.0386462700023,29.345301933147,32.6519575962917,35.9586132594364,48.3773995023798,60.7961857453232,49.4980424442242,55.9907960862667,57.2956837917999,58.1409925994177,59.025022056064,60.0098263540792,60.4028460580062,61.2629030450653,55.9298614021542",
"55.9298614021542,55.3877180252389,61.3547152702855,61.7847919095391,56.2457623439544,62.5477315546977,62.3078007189967,62.4272469013149,57.6479672147315,62.9844338801191,58.0081708266629,63.3872796098875,59.0138830718112,58.0612924481098,58.38680047729,58.687179350318,63.8724230039733,63.4126777597892,63.6865154626743,63.5670658627636,63.4496590540706,63.7595297692908,58.9069708176601,63.4547681163061,64.3198376700797,63.415319961042,64.0985879957056,64.1201809531605,63.677902665454,64.1934303628317,64.4682003346273,64.2868853545462,24.8444135816353,64.1579626357752,63.8897139146875,58.5472675827292,64.5784992977498,64.0848591719068,63.8841268679761,64.2901359712354,64.395692486112,64.5425896391638,64.8060565909917,64.3618830026368,64.7088481705444,64.5005944199885,64.5540289192148,64.7408010459365,63.378880767685,63.3415589069662,63.5362700331647,63.5924807719723,63.575801461932,63.6799360982113,64.0041021410894,64.3144923757986,63.8692943755376,63.8594574363473,64.2731841085802,63.3314657812309,64.2758880216293,64.1011768977101,64.0261661917799,64.2865302330478,63.724697791255,64.1202175712152"
), C_dur = c("14,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,14",
"3,17,16,17,17,16,17,17,16,17,17,16,17,17,8", "8,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,2"
)), row.names = c(NA, -3L), groups = structure(list(Sequ = 2L,
.rows = structure(list(1:3), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
One possible solution uses the grid.arrange() function from the gridExtra package (together with grid).
I've turned each Event_by/Utterance combination into its own chart and then combined the charts into one arranged figure.
df1 = df0 %>%
pivot_longer(cols = contains("_"),
names_to = c("Event_by", ".value"),
names_pattern = "^(.*)_([^_]+$)") %>%
separate_rows(c(intpl, dur), sep = ",", convert = TRUE) %>%
mutate(Time = cumsum(dur)) %>%
mutate(Utterance = paste0(sub(".*(.)$", "\\1",Speaker), ": ", Utterance),
Utterance = factor(Utterance, levels = unique(Utterance)))
Create the chart objects in the environment:
for (i in unique(df1$Event_by)){
for (j in levels(df1$Utterance)){
assign(x = paste0(i,j), value = ggplot(data = df1[df1$Event_by == i & df1$Utterance == j,], aes(x = Time, y = log2(intpl))) +
geom_line()+
geom_smooth(method = 'lm', color = "red", formula = y~x))
}
}
Create the gridded chart:
library(gridExtra)
library(grid)
grid.arrange(
`AA: cool >what part?<`,
`AB: u:m Tennessee=` ,
`ANA: (0.228)` ,
`BA: cool >what part?<` ,
`BB: u:m Tennessee=` ,
`BNA: (0.228)` ,
`CA: cool >what part?<` ,
`CB: u:m Tennessee=` ,
`CNA: (0.228)` ,
nrow = 3)
That said, I think there should be a better solution for this.
You can also explore the articles below on arranging plots:
http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/81-ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page/
https://ggplot2-book.org/facet.html
Also note that no theming has been added to my solution.
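As a variation on the assign() loop, the charts can be kept in a named list and handed to gridExtra through its `grobs` argument, which avoids typing every backticked name. This is a sketch on toy data: the `toy` data frame and its values are invented stand-ins for df1, not the question's data.

```r
library(ggplot2)
library(gridExtra)

# Toy stand-in for df1 (hypothetical values)
set.seed(1)
toy <- expand.grid(Time = 1:20,
                   Event_by = c("A", "B", "C"),
                   Utterance = c("u1", "u2", "u3"),
                   stringsAsFactors = FALSE)
toy$intpl <- 30 + rnorm(nrow(toy))

# One data subset per Event_by/Utterance combination
groups <- split(toy, list(toy$Event_by, toy$Utterance), drop = TRUE)

# One chart per subset, stored in a list instead of separate variables
plots <- lapply(names(groups), function(nm) {
  ggplot(groups[[nm]], aes(x = Time, y = log2(intpl))) +
    geom_line() +
    geom_smooth(method = "lm", colour = "red", formula = y ~ x) +
    ggtitle(nm)
})
names(plots) <- names(groups)

# arrangeGrob() builds the 3-row layout without drawing it;
# grid.arrange(grobs = plots, nrow = 3) would draw it directly
layout <- arrangeGrob(grobs = plots, nrow = 3)
```

Calling grid::grid.draw(layout) renders the arranged figure.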

How to convert data with different levels of information into wide format? [duplicate]

This question already has an answer here:
Reshaping data.frame with a by-group where id variable repeats [duplicate]
(1 answer)
Closed 2 years ago.
I have data on patients' operations/procedures (example as shown in the picture below), where each row describes one procedure for a patient. There are two levels of information:
the first being the operation details, i.e. op_start_dt, priority_operation and asa_status
the second being the procedure details, i.e. proc_desc and proc_table
An operation can have more than one procedure. In the example below, patient A has 2 operations (defined by distinct op_start_dt). In the first operation, the patient had 1 procedure (defined by distinct proc_desc) and in the second, 2 procedures.
I would like to convert the data into a wide format, where each patient has only one row, with the information arranged operation by operation and, within each operation, procedure by procedure, as shown below. So, proc_descxy refers to the proc_desc of the xth operation and yth procedure.
Data:
df <- structure(list(patient = c("A", "A", "A"), department = c("GYNAECOLOGY /OBSTETRICS DEPT",
"GYNAECOLOGY /OBSTETRICS DEPT", "GYNAECOLOGY /OBSTETRICS DEPT"
), op_start_dt = structure(c(1424853000, 1424870700, 1424870700
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), priority_operation = c("Elective",
"Elective", "Elective"), asa_status = c(2, 3, 3), proc_desc = c("UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
"KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
"HEART, VARIOUS LESIONS, HEART TRANSPLANTATION"), proc_table = c("99",
"6A", "7C")), row.names = c(NA, 3L), class = "data.frame")
Desired output:
df <- structure(list(patient = "A", department = "GYNAECOLOGY /OBSTETRICS DEPT",
no_op = 2, op_start_dt1 = structure(1424853000, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), no_proc1 = 1, priority_operation1 = "Elective",
asa_status1 = 2, proc_desc11 = "UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
proc_table11 = "99", op_start_dt2 = structure(1424870700, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), no_of_proc2 = 2, priority_operation2 = "Elective",
asa_status2 = 3, proc_desc21 = "KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
proc_table21 = "6A", proc_desc22 = "HEART, VARIOUS LESIONS, HEART TRANSPLANTATION",
proc_table22 = "7C"), row.names = 1L, class = "data.frame")
My attempt:
I tried to work this out, but it gets confusing along the way, with pivot_longer and then pivot_wider again.
df %>%
# Operation-level Information
group_by(patient) %>%
mutate(op_nth = dense_rank(op_start_dt),
no_op = n_distinct(op_start_dt)) %>%
# Procedure-level Information
group_by(patient, op_start_dt) %>%
mutate(proc_nth = row_number(),
no_proc = n_distinct(proc_desc)) %>%
ungroup() %>%
# Make pivoting easier
mutate_all(as.character) %>%
# Pivot Procedure-level Information
pivot_longer(-c(patient, department, no_op, op_nth, proc_nth)) %>%
# Remove the indices for "Procedure" for Operation_level Information
mutate(proc_nth = case_when(!(name %in% c("op_start_dt", "no_proc", "priority_operation", "asa_status")) ~ proc_nth)) %>%
# Create the column names
unite(name, c(name, op_nth, proc_nth), sep = "", na.rm = TRUE) %>%
distinct() %>%
pivot_wider(names_from = name, values_from = value)
Create a row-number column within each patient and then use pivot_wider.
library(dplyr)
df %>%
group_by(patient) %>%
mutate(row = row_number()) %>%
tidyr::pivot_wider(names_from = row, values_from = op_start_dt:proc_table)
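To see what column names this approach produces, here is a small sketch on made-up data (the `toy` values are hypothetical, not the question's data). With several `values_from` columns, pivot_wider() by default appends the `row` index to each column name, so the result has names like proc_desc_1 rather than the proc_desc11 style of the desired output. The ungroup() call is added here only to drop the grouping before reshaping.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the patient data
toy <- data.frame(
  patient    = c("A", "A", "B"),
  proc_desc  = c("first", "second", "third"),
  proc_table = c("99", "6A", "7C")
)

wide <- toy %>%
  group_by(patient) %>%
  mutate(row = row_number()) %>%   # 1, 2, ... within each patient
  ungroup() %>%
  pivot_wider(names_from = row, values_from = proc_desc:proc_table)

# One row per patient; patients with fewer procedures get NA in the extra columns
names(wide)
```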

iterate over certain elements of a list, not a data.frame

I am trying to modify certain items of a list based on a criterion (the name starts with "rr_esp") in the render.data list.
library(tidyr)
library(dplyr)
library(purrr)
per <- 2015:2019
render.data <- list(
emision = structure(
list(
AÑO = c(2017, 2018, 2019),
TRABAJADORESMES_r = c(58147, 57937, 24818),
MASA_r = c(3439195127, 4091347036.2, 2441068565.77),
TRABAJADORESMESsinDOM = c(58147L, 57928L, 24818L),
MESES = c(12, 12, 5)
),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -3L)
),
siniestros = structure(
list(
AÑO = c(2017, 2018, 2019),
N = c(388L, 327L, 115L),
GR_66 = c(64, 53, 15),
JU = c(41L, 5L, 0L),
JN = c(20, 19, 6),
PORINC_66s = c(437.22, 293.73, 82.12),
EDADs = c(15142L, 12886L, 4712L),
SALARIOs = c(13707950.67, 15151144.7, 4800075.4)
),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -3L)
),
rr_esp1 = structure(
list(
AÑO = c(2017, 2018, 2019),
MESES = c(12, 12, 5),
TRAB_PROM = c(4845.58, 4828.08, 4963.60),
PORINC = c(6.83, 5.54, 5.47),
SALARIO = c(35329.76, 46333.77, 41739.78),
EDAD = c(39.02, 39.40, 40.97)
),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -3L)
),
rr_esp7 = structure(
list(
AÑO = c(2017, 2018, 2019),
JUI_LIQ = c(1539624.21, 318726, 0),
JUI_RVA = c(24434809.51, 2292925.89, 0),
JUI_IBNR = c(0, 25284030.0174036, 22434092.26),
JUI_ULT = c(25974433.72, 27895681.90, 22434092.26),
CM_JUICIO = c(1505898.34, 1806002.14, 1557923.07)
),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -3L)
)
)
When I apply a loop over the elements, they lose their original item names.
Beyond that, I don't know a better way to iterate over a subset of list elements and assign them a new value. I googled it but did not find a clear solution for lists instead of data.frames.
render.data <- invisible(lapply(seq_along(render.data), function(i){
if(startsWith(names(render.data)[i], prefix = "rr_esp")){
render.data[[i]] %>%
complete(`AÑO` = per) %>%
gather(
key = "metrica", value = "valor", -`AÑO`
) %>%
mutate(# orden de las metricas
metrica = factor(metrica, levels = unique(metrica))
) %>%
spread(
key = `AÑO`, value = "valor"
)} else{
render.data[[i]]
}
setNames(render.data[[i]], names(render.data)[i])
}))
This seems like a case where a for loop is much clearer than an lapply. The main advantages of lapply are (a) that it pre-allocates a data structure for the result and (b) has simple syntax to apply a simple function. You already have a data structure for the result, and your function is complex. I don't know what your expected output is, but I would try this:
# find elements to modify
rr_elements = which(startsWith(names(render.data), prefix = "rr_esp"))
# modify in for loop
for (i in rr_elements) {
render.data[[i]] = render.data[[i]] %>%
complete(`AÑO` = per) %>%
gather(key = "metrica", value = "valor",-`AÑO`) %>%
mutate(# orden de las metricas
metrica = factor(metrica, levels = unique(metrica))) %>%
spread(key = `AÑO`, value = "valor")
}
If you want to make this code more re-usable, create a function for the operation on one data frame, and then you can use it easily with for or lapply. In general, I'd say that picking the data frames on which to use the function is better done externally than internally. (That is, I don't like how you have an if() statement checking the name inside the function. Do this logic outside the function, and only give the function the data you want it to use.)
foo = function(data) {
data %>%
complete(`AÑO` = per) %>%
gather(key = "metrica", value = "valor",-`AÑO`) %>%
mutate(# orden de las metricas
metrica = factor(metrica, levels = unique(metrica))) %>%
spread(key = `AÑO`, value = "valor")
}
# now the for loop or lapply is simple:
rr_elements = which(startsWith(names(render.data), prefix = "rr_esp"))
# for loop version
for (i in rr_elements) {
render.data[[i]] = foo(render.data[[i]])
}
# lapply version
render.data[rr_elements] = lapply(render.data[rr_elements], foo)
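On why the names were lost in the original attempt: lapply() over seq_along() iterates over a plain integer vector, so the result has no names to carry over, whereas assigning into a subset of the list keeps them. A toy sketch in base R (the list `x` and the "multiply by 10" step are placeholders for render.data and the real transformation):

```r
# Toy named list standing in for render.data
x <- list(emision = 1, rr_esp1 = 2, rr_esp7 = 3)

# Iterating over indices returns an unnamed list
by_index <- lapply(seq_along(x), function(i) x[[i]])
names(by_index)

# Assigning into the matching subset keeps every name and
# leaves the non-matching elements untouched
idx <- startsWith(names(x), "rr_esp")
x[idx] <- lapply(x[idx], function(v) v * 10)
names(x)
```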
