the difference of multiple columns in R based in DATE [duplicate]

the difference of multiple columns in R based in DATE [duplicate] - r

This question already has answers here:
subtract value from previous row by group
(3 answers)
Closed 1 year ago.
I have this data :
structure(list(new_col = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5,
6, 7, 1, 2, 3, 4, 5, 6, 7), DATE = structure(c(1608249600, 1608249600,
1608249600, 1608249600, 1608249600, 1608249600, 1608249600, 1608336000,
1608336000, 1608336000, 1608336000, 1608336000, 1608336000, 1608336000,
1608422400, 1608422400, 1608422400, 1608422400, 1608422400, 1608422400,
1608422400), tzone = "UTC", class = c("POSIXct", "POSIXt")),
HOSP_COUNT = c(582, 931, 1472, 2175, 2791, 3024, 2310, 588,
932, 1477, 2186, 2810, 3051, 2330, 590, 932, 1479, 2188,
2817, 3060, 2335)), row.names = c(NA, -21L), class = c("tbl_df",
"tbl", "data.frame"))
HOPS_COUNT is a cumulative variable, while I need daily measure. Something like this :
structure(list(X1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21), new_col = c(1, 2, 3, 4,
5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7), DATE = c("12/18/2020",
"12/18/2020", "12/18/2020", "12/18/2020", "12/18/2020", "12/18/2020",
"12/18/2020", "12/19/2020", "12/19/2020", "12/19/2020", "12/19/2020",
"12/19/2020", "12/19/2020", "12/19/2020", "12/20/2020", "12/20/2020",
"12/20/2020", "12/20/2020", "12/20/2020", "12/20/2020", "12/20/2020"
), HOSP_COUNT = c(582, 931, 1472, 2175, 2791, 3024, 2310, 6,
1, 5, 11, 19, 27, 20, 2, 1, 2, 2, 7, 9, 15)), row.names = c(NA,
-21L), class = c("tbl_df", "tbl", "data.frame"))
So I need a new column the formula is that
(HOPS_CASE of new_col(1) in the second date (12/19/2020)= (CASE_HOSP of new_col(1) in the first day 12/18/2020)-(Current CASE_HOSP of new_col(1) in the first day 12/19/2020)

This should work for you:
library(dplyr)
df %>%
group_by(new_col) %>%
mutate(new_count = HOSP_COUNT - lag(HOSP_COUNT),
new_count = ifelse(is.na(new_count), HOSP_COUNT, new_count))
Where new_count is your corrected counts:
# A tibble: 20 x 4
# Groups: new_col [7]
new_col DATE HOSP_COUNT new_count
<dbl> <dttm> <dbl> <dbl>
1 1 2020-12-18 00:00:00 582 582
2 2 2020-12-18 00:00:00 931 931
3 3 2020-12-18 00:00:00 1472 1472
4 4 2020-12-18 00:00:00 2175 2175
5 5 2020-12-18 00:00:00 2791 2791
6 6 2020-12-18 00:00:00 3024 3024
7 7 2020-12-18 00:00:00 2310 2310
8 1 2020-12-19 00:00:00 588 6
9 2 2020-12-19 00:00:00 932 1
10 3 2020-12-19 00:00:00 1477 5
11 4 2020-12-19 00:00:00 2186 11
12 5 2020-12-19 00:00:00 2810 19
13 6 2020-12-19 00:00:00 3051 27
14 7 2020-12-19 00:00:00 2330 20
15 1 2020-12-20 00:00:00 590 2
16 2 2020-12-20 00:00:00 932 0
17 3 2020-12-20 00:00:00 1479 2
18 4 2020-12-20 00:00:00 2188 2
19 5 2020-12-20 00:00:00 2817 7
20 6 2020-12-20 00:00:00 3060 9

Related

Data imputation by linear interpolation according to date in R

I have a large dataset of mineral nitrogen values from different plots which includes some missing data were on some dates we could not take samples. it is known that mineral N values in soil change linearly between samplings.
for the sake of simplification I have created a data frame that has 10 plots with 4 dates (with different distances between them) with missing data in one of the dates:
df <- data.frame(plot= c(1,2,3,4,5,6,7,8,9,10),
date = c("2020-10-01", "2020-10-01","2020-10-01","2020-10-01","2020-10-01","2020-10-01","2020-10-01","2020-10-01","2020-10-01","2020-10-01",
"2020-10-08", "2020-10-08","2020-10-08","2020-10-08","2020-10-08","2020-10-08","2020-10-08","2020-10-08","2020-10-08","2020-10-08",
"2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29","2020-10-29",
"2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05","2020-11-05"),
Nmin = c(100, 120, 50, 60, 70, 80, 100, 70, 30, 50, 90, 130, 60, 60, 60, 90, 105, 60, 25, 40, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 50, 170, 100, 60, 20, 130, 125, 20, 5, 0))
df$date <- as.Date(df$date, format="%d.%m.%Y")
df$Nmin <- as.numeric(df$Nmin)
is there a function that can calculate the missing values of Nmin plot-wise and takes in concideration the time between samplings (date)?

Using approx.
df <- transform(df, flag=ifelse(is.na(Nmin), 1, 0)) ## set flag for sake of identification
res <- by(df, df$plot, transform, Nmin=approx(date, Nmin, date)$y) |> unsplit(df$plot)
res
# plot date Nmin flag
# 1 1 2020-10-01 100 0
# 2 2 2020-10-01 120 0
# 3 3 2020-10-01 50 0
# 4 4 2020-10-01 60 0
# 5 5 2020-10-01 70 0
# 6 6 2020-10-01 80 0
# 7 7 2020-10-01 100 0
# 8 8 2020-10-01 70 0
# 9 9 2020-10-01 30 0
# 10 10 2020-10-01 50 0
# 11 1 2020-10-08 90 0
# 12 2 2020-10-08 130 0
# 13 3 2020-10-08 60 0
# 14 4 2020-10-08 60 0
# 15 5 2020-10-08 60 0
# 16 6 2020-10-08 90 0
# 17 7 2020-10-08 105 0
# 18 8 2020-10-08 60 0
# 19 9 2020-10-08 25 0
# 20 10 2020-10-08 40 0
# 21 1 2020-10-29 60 1
# 22 2 2020-10-29 160 1
# 23 3 2020-10-29 90 1
# 24 4 2020-10-29 60 1
# 25 5 2020-10-29 30 1
# 26 6 2020-10-29 120 1
# 27 7 2020-10-29 120 1
# 28 8 2020-10-29 30 1
# 29 9 2020-10-29 10 1
# 30 10 2020-10-29 10 1
# 31 1 2020-11-05 50 0
# 32 2 2020-11-05 170 0
# 33 3 2020-11-05 100 0
# 34 4 2020-11-05 60 0
# 35 5 2020-11-05 20 0
# 36 6 2020-11-05 130 0
# 37 7 2020-11-05 125 0
# 38 8 2020-11-05 20 0
# 39 9 2020-11-05 5 0
# 40 10 2020-11-05 0 0
Let's take a look at the plot.
clr <- rainbow(10)
with(res, plot(Nmin ~ date, type='n'))
by(res, res$plot, with, points(jitter(Nmin) ~ date, type='b', pch=ifelse(flag == 1, 21, 16), col=clr[plot], bg='white'))
legend('topleft', legend=paste('plot', 1:10), lty=1, col=clr, ncol=4, bty='n', cex=.7)
Note: For non-linear inter/extrapolation, see this answer.
Data:
df <- structure(list(plot = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2,
3, 4, 5, 6, 7, 8, 9, 10), date = structure(c(18536, 18536, 18536,
18536, 18536, 18536, 18536, 18536, 18536, 18536, 18543, 18543,
18543, 18543, 18543, 18543, 18543, 18543, 18543, 18543, 18564,
18564, 18564, 18564, 18564, 18564, 18564, 18564, 18564, 18564,
18571, 18571, 18571, 18571, 18571, 18571, 18571, 18571, 18571,
18571), class = "Date"), Nmin = c(100, 120, 50, 60, 70, 80, 100,
70, 30, 50, 90, 130, 60, 60, 60, 90, 105, 60, 25, 40, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 50, 170, 100, 60, 20, 130, 125,
20, 5, 0), flag = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA, -40L
))

How to create before and after scores in two different columns based on date?

I have two tables first table has stress score recorded at various time points and second table has date of treatment. I want to get the stress scores before and after treatment for each participant who has received the treatment. Also I want a column that gives information on when was the stress score recorded before and after treatment. I do not understand from where do I begin,and what should my code look like.
score.dt = data.table(
participant.index = c(1, 1, 1, 3, 4, 4, 13, 21, 21, 25, 37, 40, 41, 41, 41, 43, 43, 43, 44),
repeat.instance = c(2, 3, 6, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 3, 1),
date.recorded = c(
'2017-07-13',
'2017-06-26',
'2018-09-17',
'2016-04-14',
'2014-03-24',
'2016-05-30',
'2018-06-20',
'2014-08-03',
'2015-07-06',
'2014-12-17',
'2014-09-05',
'2013-06-10',
'2015-10-04',
'2016-11-04',
'2016-04-18',
'2014-02-13',
'2013-05-24',
'2014-09-10',
'2014-11-25'
),
subscale = c(
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress"
),
score = c(18, 10, 18, 36, 16, 30, 28, 10, 12, 40, 16, 12, 10, 14, 6, 32, 42, 26, 18)
)
date.treatment.dt = data.table (
participant.index = c(1, 4, 5, 6, 8, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26),
date.treatment = c(
'2018 - 06 - 27',
'2001 - 07 - 16',
'2009 - 12 - 09',
'2009 - 05 - 20',
'2009 - 07 - 22',
'2008-07 - 02',
'2009 - 11 - 25',
'2009 - 09 - 16',
'1991 - 07 - 30',
'2016 - 05 - 25',
'2012 - 07 - 25',
'2007 - 03 - 19',
'2012 - 01 - 25',
'2011 - 09 - 21',
'2000 - 03 - 06',
'2001 - 09 - 25',
'1999 - 12 - 20',
'1997 -07 - 28',
'2002 - 03 - 12',
'2008 - 01 - 23'
))
Desired output columns: is something like this
score.date.dt = c("candidate.index.x", "repeat.instance", "subscale", "score", "date.treatment", "date.recorded", "score.before.treatment", "score.after.treatment", "months.before.treatment", "months.after.treatment")
Here the columns months.before.treatment indicates how many months before treatment the stress score was measured and month.after.treatment indicates how many months after treatment the stress score was measured.

In your example set, you only have four individuals with stress scores that have any rows in the treatment table (participants 1,4,21,and 25). Only one of these, participant 1, has both a pre-treatment stress measures and post-treatment stress measure...
Here is one way to produce the information you need:
inner_join(score.dt,date.treatment.dt, by="participant.index") %>%
group_by(participant.index, date.treatment) %>%
summarize(pre_treatment = min(date.recorded[date.recorded<=date.treatment]),
post_treatment = max(date.recorded[date.recorded>=date.treatment])) %>%
pivot_longer(cols = -(participant.index:date.treatment), names_to = "period", values_to = "date.recorded") %>%
left_join(score.dt, by=c("participant.index", "date.recorded" )) %>%
mutate(period=str_extract(period,".*(?=_)"),
months = abs(as.numeric(date.treatment-date.recorded))/(365.25/12)) %>%
pivot_wider(id_cols = participant.index:date.treatment, names_from = period, values_from=c(date.recorded, subscale, months,score))
Output:
participant.index date.treatment date.recorded_pre date.recorded_post subscale_pre subscale_post months_pre months_post score_pre score_post
<dbl> <date> <date> <date> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 2018-06-27 2017-06-26 2018-09-17 stress stress 12.0 2.69 10 18
2 4 2001-07-16 NA 2016-05-30 NA stress Inf 178. NA 30
3 21 2000-03-06 NA 2015-07-06 NA stress Inf 184. NA 12
4 25 2002-03-12 NA 2014-12-17 NA stress Inf 153. NA 40
Note: you will have to fix the date inputs to the two source files, like this:
# first correct, your date.treatment column, and convert to date
date.treatment.dt[, date.treatment := as.Date(str_replace_all(date.treatment," ",""), "%Y-%m-%d")]
# second, similarly fix the date column in your stress score table
score.dt[,date.recorded := as.Date(date.recorded,"%Y-%m-%d")]

It seems like there are a few parts to what you're asking. First, you need to merge the two tables together. Here I use dplyr::inner_join() which automatically detects that the candidate.index is the only column in common and merges on that while discarding records found in only one of the tables. Second, we convert to a date format for both dates to enable the calculation of elapsed months.
library(tidyverse)
library(data.table)
library(lubridate)
score.dt <- structure(list(participant.index = c(1, 1, 1, 3, 4, 4, 13, 21, 21, 25, 37, 40, 41, 41, 41, 43, 43, 43, 44), repeat.instance = c(2, 3, 6, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 3, 1), date.recorded = c("2017-07-13", "2017-06-26", "2018-09-17", "2016-04-14", "2014-03-24", "2016-05-30", "2018-06-20", "2014-08-03", "2015-07-06", "2014-12-17", "2014-09-05", "2013-06-10", "2015-10-04", "2016-11-04", "2016-04-18", "2014-02-13", "2013-05-24", "2014-09-10", "2014-11-25"), subscale = c("stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress"), score = c(18, 10, 18, 36, 16, 30, 28, 10, 12, 40, 16, 12, 10, 14, 6, 32, 42, 26, 18)), row.names = c(NA, -19L), class = c("data.table", "data.frame"))
date.treatment.dt <- structure(list(participant.index = c(1, 4, 5, 6, 8, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26), date.treatment = c("2018 - 06 - 27", "2001 - 07 - 16", "2009 - 12 - 09", "2009 - 05 - 20", "2009 - 07 - 22", "2008-07 - 02", "2009 - 11 - 25", "2009 - 09 - 16", "1991 - 07 - 30", "2016 - 05 - 25", "2012 - 07 - 25", "2007 - 03 - 19", "2012 - 01 - 25", "2011 - 09 - 21", "2000 - 03 - 06", "2001 - 09 - 25", "1999 - 12 - 20", "1997 -07 - 28", "2002 - 03 - 12", "2008 - 01 - 23")), row.names = c(NA, -20L), class = c("data.table", "data.frame"))
inner_join(date.treatment.dt, score.dt) %>%
mutate(across(contains("date"), as_date)) %>%
mutate(months.after = interval(date.treatment, date.recorded) %/% months(1)) %>%
mutate(months.before = 0 - months.after)
#> Joining, by = "participant.index"
#> participant.index date.treatment repeat.instance date.recorded subscale
#> 1: 1 2018-06-27 2 2017-07-13 stress
#> 2: 1 2018-06-27 3 2017-06-26 stress
#> 3: 1 2018-06-27 6 2018-09-17 stress
#> 4: 4 2001-07-16 1 2014-03-24 stress
#> 5: 4 2001-07-16 2 2016-05-30 stress
#> 6: 21 2000-03-06 1 2014-08-03 stress
#> 7: 21 2000-03-06 2 2015-07-06 stress
#> 8: 25 2002-03-12 1 2014-12-17 stress
#> score months.after months.before
#> 1: 18 -11 11
#> 2: 10 -12 12
#> 3: 18 2 -2
#> 4: 16 152 -152
#> 5: 30 178 -178
#> 6: 10 172 -172
#> 7: 12 184 -184
#> 8: 40 153 -153
Created on 2022-04-05 by the reprex package (v2.0.1)

Why can't I plot my model selection plot correctly?

I am working with multiple regression models. After running the dredge function, I got approximately 54 000 different combinations. I selected the first 300 models and ran this code:
par(mar=c(1,4,10,3))
> plot(fitt, labels = c("Intercept",
+ "YOFE",
+ "'RW Closeness'",
+ "'LW Closeness'",
+ "Age",
+ "SES",
+ "'GAD-7 Score'",
+ "Fantasy",
+ "'Personal Distress'",
+ "'Empathic Concern'",
+ "'Perspective Taking'",
+ "'PHQ-9 Score'",
+ "'Religioius Affinity'",
+ "'Agreement with IH'",
+ "'Moral Judgement of IH'",
+ "'Harm Assessment of IH'",
+ "'Agreement with IB'",
+ "'Moral Judgement of IB'",
+ "RMET",
+ "Sex"),ylab = expression("Cumulative" ~italic(w[i]*(AICc))),col = c(colfunc(1)), border = "gray30",labAsExpr = TRUE)
10 minutes later, I got this error:
Error in (function (text, side = 3, line = 0, outer = FALSE, at = NA, :
zero-length 'text' specified
In addition: Warning message:
In max(strwidth(arg[["text"]], cex = arg$cex, units = "in")) :
no non-missing arguments to max; returning -Inf
And this is the output plot:
I've tried plotting only the first model and the same error appears:
This also happens when using the whole model selection table (54 000 combinations).
What is a solution to this?
I'm running the latest version of R and RStudio on my 2016 12 inch Macbook.
Note: I've tried increasing the plot-window size manually by dragging the edges without any improvement.
This is what I'd like my plot to look like:
EDIT: Here is the data file data and the code.
modeloglobal<-lm(PROMEDIO_CREENCIA_NFALSA_CORONAVIRUS~Edad+Sex+
AnEdu+
Estrato_1+
GAD_TOTAL+
PHQ_TOTAL+
PracticRel_2+
CercanPolDer_1+
CercanPolIz_1+
RMET_TOTAL+
IRI_PREOCUPACIÓN_EMPATICA+
IRI_FANTASÍA+
IRI_MALESTAR_PERSONAL+
IRI_TOMA_DE_PERSPECTIVA+
PROMEDIO_DILEMAS_BI_ACTUARIGUAL_CORONAVIRUS+
PROMEDIO_DILEMAS_BI_BIENOMAL_CORONAVIRUS+
PROMEDIO_DI_SINPOL_ACTUARIGUAL+
PROMEDIO_DI_SINPOL_BIENOMAL+
PROMEDIO_DI_SINPOL_DANO, data=fake_news,na.action="na.fail")
library(MuMIn)
fitt<-dredge(modeloglobal,trace=2)
m.sel <- model.sel(fitt)
m.sel2 <- m.sel[1:300,]
library(binovisualfields)
And the code that runs the error (using a subset of the first 300 rows):
par(mar=c(1,4,10,3))
> plot(m.sel2, labels = c("Intercept",
+ "YOFE",
+ "'RW Closeness'",
+ "'LW Closeness'",
+ "Age",
+ "SES",
+ "'GAD-7 Score'",
+ "Fantasy",
+ "'Personal Distress'",
+ "'Empathic Concern'",
+ "'Perspective Taking'",
+ "'PHQ-9 Score'",
+ "'Religioius Affinity'",
+ "'Agreement with IH'",
+ "'Moral Judgement of IH'",
+ "'Harm Assessment of IH'",
+ "'Agreement with IB'",
+ "'Moral Judgement of IB'",
+ "RMET",
+ "Sex"),ylab = expression("Cumulative" ~italic(w[i]*(AICc))),col = c(colfunc(1)), border = "gray30",labAsExpr = TRUE)
EDIT 2: Here's the data frame I got from dput().
ResponseId Edad Sex Genero Nacion Resid Estrato_1 Gastos salud
1 R_25GEak825Ohmb9G 18 Female Femenino Colombia Colombia 7 Seguro privado
2 R_1kT7u0PALDHV8H6 20 Female Femenino Colombia Colombia 5 Seguro privado
3 R_2cpBb5Ifzj7lVGs 21 Female Femenino Colombia Colombia 6 Seguro privado
4 R_sGqNUMTXTJzwC09 20 Male Masculino Colombia Colombia 5 Seguro del Estado
5 R_2Cpixt9Z5FJkhg1 36 Male Masculino Colombia Colombia 6 Otro (especifique)
6 R_3QFq50SZNs6CePA 18 Female Femenino Colombia Colombia 7 Seguro privado
Relig PracticRel_2 AnEdu Q161 Ecron Epsiq Q183 Eneu Q184
1 Ninguna 0 15 Estudiante 1 0 <NA> 0 <NA>
2 Cristianismo (Catolicismo) 2 15 Estudiante 0 0 <NA> 0 <NA>
3 Cristianismo (Catolicismo) 2 19 Estudiante 0 0 <NA> 0 <NA>
4 Cristianismo (Catolicismo) 2 15 Estudiante 0 0 <NA> 0 <NA>
5 Cristianismo (Catolicismo) 1 17 Empleado de tiempo completo 0 0 <NA> 0 <NA>
6 Cristianismo (Catolicismo) 4 15 Estudiante 0 0 <NA> 0 <NA>
NPviven Sustancias Pviven AdhAS LevantarAS_1 CumplimAS_1 HorasFuera
1 1 1 Padres 1 5 6 Menos de una hora
2 3 0 Padres,Hermanos 1 1 6 Menos de una hora
3 4 0 Padres,Hermanos 1 2 6 Menos de una hora
4 4 0 Padres,Hermanos 1 2 6 Menos de una hora
5 3 0 Pareja,Hijos 1 2 3 Entre cuatro y seis horas
6 3 0 Padres,Hermanos 1 2 6 Entre una y tres horas
Apoyo CV19_1 ContagUd ContagEC Prob_1_contagio Prob_2_familiar_contagio
1 1 No 0 81 100
2 4 No 0 81 35
3 6 No 0 60 80
4 4 No 0 4 15
5 5 No 0 40 40
6 6 No 0 79 86
Prob_3_contagio_poblaciongeneral Caract_1 Caract_2 Inv_3 Caract_3 Caract_4 Caract_5 Caract_6 Caract_8
1 87 4 2 1 6 4 5 4 5
2 81 5 4 3 4 4 5 2 3
3 80 4 4 1 6 6 6 1 2
4 20 6 5 5 2 1 5 1 5
5 60 2 1 2 5 4 3 2 3
6 70 5 4 2 5 6 2 5 6
Caract_9 Caract_11 Caract_14 INV_15 Caract_15 Caract_16 Caract_17 CompPan_1 CompPan_2 CompPan_3
1 5 3 2 4 3 5 5 1 6 1
2 4 5 4 5 2 3 3 4 5 8
3 6 1 6 6 1 6 6 1 1 1
4 5 5 2 6 1 3 1 1 3 2
5 4 1 1 5 2 2 2 2 2 2
6 6 2 3 5 2 6 5 2 7 3
CompPan_4 CompPan_5 CompPan_6 CercanPolDer_1 CercanPolIz_1 IDpol_1 PHQ_TOTAL GAD_TOTAL
1 5 5 7 8 2 5 8 6
2 8 8 8 7 3 5 4 3
3 3 2 4 6 3 4 2 3
4 4 3 3 5 5 4 3 3
5 3 3 2 5 5 4 2 2
6 6 2 7 3 8 3 7 7
INTEROCEPCION_TOTAL BIS BAS_FUN_SEEKING BAS_REWARD_RESPONSIVENESS BAS_DRIVE BAS_TOTAL
1 45 19 14 19 11 44
2 44 20 10 17 14 41
3 24 17 10 19 13 42
4 17 17 9 14 8 31
5 36 21 10 17 11 38
6 41 25 6 17 13 36
IRI_TOMA_DE_PERSPECTIVA IRI_MALESTAR_PERSONAL IRI_FANTASÍA IRI_PREOCUPACIÓN_EMPATICA RMET_TOTAL
1 14 13 14 19 7
2 18 11 14 20 4
3 17 4 10 20 10
4 16 9 11 12 7
5 10 11 7 10 10
6 16 11 16 18 8
PROMEDIO_TIEMPO_REACCION_RMET PROMEDIO_CREENCIA_NFALSA_TODAS PROMEDIO_CREENCIA_NFALSA_CORONAVIRUS
1 2.411750 2.8 2.666667
2 3.348500 2.8 2.333333
3 3.261083 2.4 2.000000
4 6.390500 2.2 1.666667
5 13.212667 1.8 1.333333
6 4.218583 3.6 2.666667
PROMEDIO_CREENCIA_NFALSA_OTRO PROMEDIO_TIEMPOREACCION_NFALSA PROMEDIO_CREENCIA_NVERDADERA_TODAS
1 3.0 4.3438 3.333333
2 3.5 9.4222 3.000000
3 3.0 5.9734 3.666667
4 3.0 10.1448 2.666667
5 2.5 16.3196 1.333333
6 5.0 7.1954 3.333333
PROMEDIO_CREENCIA_NVERDADERA_CORONAVIRUS PROMEDIO_CREENCIA_NVERDADERA_OTRO
1 5 5
2 4 5
3 6 5
4 5 3
5 1 3
6 6 4
PROMEDIO_TIEMPOREACCION_NVERDADERA PROMEDIO_CREENCIA_NMISLEADING_TODAS
1 5.6440 2.666667
2 7.0430 2.666667
3 8.0265 3.666667
4 4.0495 3.000000
5 32.2400 1.666667
6 9.5830 4.333333
PROMEDIO_TIEMPOREACCION_NMISLEADING PROMEDIO_DILEMAS_BI_BIENOMAL_CORONAVIRUS
1 5.726667 1.000000
2 12.012333 4.000000
3 5.753000 4.333333
4 4.969667 1.333333
5 15.233000 0.000000
6 30.045667 3.666667
PROMEDIO_DILEMAS_BI_ACTUARIGUAL_CORONAVIRUS DILEMA_BI_CONTROL_BIENOMAL DILEMA_BI_CONTROL_ACTUARIGUAL
1 5.666667 4 7
2 7.666667 5 4
3 9.666667 2 6
4 4.333333 0 2
5 3.666667 -3 2
6 9.333333 4 10
PROMEDIO_DILEMAS_BI_BIENOMAL_JUNTOS PROMEDIO_DILEMAS_BI_ACTUARIGUAL_JUNTOS
1 1.75 6.00
2 4.25 6.75
3 3.75 8.75
4 1.00 3.75
5 -0.75 3.25
6 3.75 9.50
PROMEDIO_DILEMAS_DI_BIENOMAL PROMEDIO_DILEMAS_DI_ACTUARIGUAL PROMEDIO_DILEMAS_DI_DANO
1 0.5000000 6.666667 5.666667
2 1.8333333 7.666667 6.166667
3 0.5000000 5.666667 5.333333
4 1.6666667 5.000000 5.500000
5 0.8333333 4.833333 5.666667
6 0.1666667 5.166667 7.000000
TIEMPOREACCION_DILEMAS_DI TIEMPOREACCION_DILEMAS_BI PROMEDIO_DI_SINPOL_BIENOMAL
1 12.140500 7.89900 0.2
2 9.130667 9.99550 1.2
3 6.998333 9.25175 -1.0
4 1.857833 2.84125 0.4
5 19.014333 32.82850 0.8
6 11.633667 16.92000 0.2
PROMEDIO_DI_SINPOL_ACTUARIGUAL PROMEDIO_DI_SINPOL_DANO COMPRAS_COVID19 PERCEPCION_RIESGO_TOTAL
1 7.00 7.25 4.166667 39
2 8.00 6.75 6.833333 37
3 4.25 7.25 2.000000 42
4 4.50 7.00 2.666667 38
5 5.00 7.75 2.333333 26
6 5.50 7.75 4.500000 46
PERCEPCION_RIESGO_INDICE PROB_CONTAGIO_TOTAL PROMEDIO_DILEMASPOLITICOS_BIENOMAL
1 3.9 89.33333 1.0
2 3.7 65.66667 2.5
3 4.2 73.33333 4.0
4 3.8 13.00000 4.0
5 2.6 46.66667 0.5
6 4.6 78.33333 0.0
PROMEDIO_DILEMASPOLITICOS_ACTUARIGUAL PROMEDIO_DILEMASPOLITICOS_DANO D31_1_DI D32_2_DI D33_3_DI
1 6.0 2.5 -2 4 9
2 7.0 5.0 3 9 7
3 8.5 1.5 -3 3 8
4 6.0 2.5 0 3 8
5 4.5 1.5 -2 4 8
6 4.5 5.5 4 9 7
D41_1_DI D42_2_DI D43_3_DI D51_1_DI D52_2_DI D53_3_DI D61_1_DI D62_2_DI D63_3_DI D71_1_DIP D72_2_DIP
1 -1 7 7 5 10 4 -1 7 9 0 4
2 1 8 9 0 7 4 2 8 7 3 7
3 0 6 7 1 5 6 -3 3 8 3 7
4 0 5 8 4 7 3 -2 3 9 4 3
5 3 7 9 1 3 7 2 6 7 -2 2
6 1 8 6 0 4 9 -4 1 9 -4 1
D73_3_DIP D81_1_DIP D82_2_DIP D83_3_DIP D91_1_BI D92_2_BI D101_1_BI D102_2_BI D111_1_BI D112_2_BI
1 3 2 8 2 -3 4 3 9 3 4
2 6 2 7 4 3 8 5 8 4 7
3 2 5 10 1 5 10 5 10 3 9
4 2 4 9 3 4 9 0 2 0 2
5 2 3 7 1 -1 3 3 6 -2 2
6 8 4 8 3 4 9 5 10 2 9
D121_1_BI D122_2_BI total_iri promedio_falsaymisleading prediccioncompraspercprob
1 4 7 60 2.750 4.249759
2 5 4 63 2.750 4.404450
3 2 6 51 2.875 4.431635
4 0 2 48 2.500 5.143974
5 -3 2 38 1.750 3.765907
6 4 10 61 3.875 4.893797
prediccioncomprasperc
1 4.474456
2 4.439994
3 4.521980
4 4.689385
5 3.762449
6 4.967286
Here is the raw dput() output:
structure(list(ResponseId = c("R_25GEak825Ohmb9G", "R_1kT7u0PALDHV8H6",
"R_2cpBb5Ifzj7lVGs", "R_sGqNUMTXTJzwC09", "R_2Cpixt9Z5FJkhg1",
"R_3QFq50SZNs6CePA"), Edad = c(18, 20, 21, 20, 36, 18), Sex = structure(c(2L,
2L, 2L, 1L, 1L, 2L), .Label = c("Male", "Female"), class = "factor"),
Genero = c("Femenino", "Femenino", "Femenino", "Masculino",
"Masculino", "Femenino"), Nacion = c("Colombia", "Colombia",
"Colombia", "Colombia", "Colombia", "Colombia"), Resid = c("Colombia",
"Colombia", "Colombia", "Colombia", "Colombia", "Colombia"
), Estrato_1 = c(7, 5, 6, 5, 6, 7), `Gastos salud` = c("Seguro privado",
"Seguro privado", "Seguro privado", "Seguro del Estado",
"Otro (especifique)", "Seguro privado"), Relig = c("Ninguna",
"Cristianismo (Catolicismo)", "Cristianismo (Catolicismo)",
"Cristianismo (Catolicismo)", "Cristianismo (Catolicismo)",
"Cristianismo (Catolicismo)"), PracticRel_2 = c(0, 2, 2,
2, 1, 4), AnEdu = c(15, 15, 19, 15, 17, 15), Q161 = c("Estudiante",
"Estudiante", "Estudiante", "Estudiante", "Empleado de tiempo completo",
"Estudiante"), Ecron = c(1, 0, 0, 0, 0, 0), Epsiq = c(0,
0, 0, 0, 0, 0), Q183 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), Eneu = c(0,
0, 0, 0, 0, 0), Q184 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), NPviven = c("1",
"3", "4", "4", "3", "3"), Sustancias = c(1, 0, 0, 0, 0, 0
), Pviven = c("Padres", "Padres,Hermanos", "Padres,Hermanos",
"Padres,Hermanos", "Pareja,Hijos", "Padres,Hermanos"), AdhAS = c(1,
1, 1, 1, 1, 1), LevantarAS_1 = c(5, 1, 2, 2, 2, 2), CumplimAS_1 = c(6,
6, 6, 6, 3, 6), HorasFuera = c("Menos de una hora", "Menos de una hora",
"Menos de una hora", "Menos de una hora", "Entre cuatro y seis horas",
"Entre una y tres horas"), `Apoyo CV19_1` = c(1, 4, 6, 4,
5, 6), ContagUd = c("No", "No", "No", "No", "No", "No"),
ContagEC = c(0, 0, 0, 0, 0, 0), Prob_1_contagio = c(81, 81,
60, 4, 40, 79), Prob_2_familiar_contagio = c(100, 35, 80,
15, 40, 86), Prob_3_contagio_poblaciongeneral = c(87, 81,
80, 20, 60, 70), Caract_1 = c(4, 5, 4, 6, 2, 5), Caract_2 = c(2,
4, 4, 5, 1, 4), Inv_3 = c(1, 3, 1, 5, 2, 2), Caract_3 = c(6,
4, 6, 2, 5, 5), Caract_4 = c(4, 4, 6, 1, 4, 6), Caract_5 = c(5,
5, 6, 5, 3, 2), Caract_6 = c(4, 2, 1, 1, 2, 5), Caract_8 = c(5,
3, 2, 5, 3, 6), Caract_9 = c(5, 4, 6, 5, 4, 6), Caract_11 = c(3,
5, 1, 5, 1, 2), Caract_14 = c(2, 4, 6, 2, 1, 3), INV_15 = c(4,
5, 6, 6, 5, 5), Caract_15 = c(3, 2, 1, 1, 2, 2), Caract_16 = c(5,
3, 6, 3, 2, 6), Caract_17 = c(5, 3, 6, 1, 2, 5), CompPan_1 = c(1,
4, 1, 1, 2, 2), CompPan_2 = c(6, 5, 1, 3, 2, 7), CompPan_3 = c(1,
8, 1, 2, 2, 3), CompPan_4 = c(5, 8, 3, 4, 3, 6), CompPan_5 = c(5,
8, 2, 3, 3, 2), CompPan_6 = c(7, 8, 4, 3, 2, 7), CercanPolDer_1 = c(8,
7, 6, 5, 5, 3), CercanPolIz_1 = c(2, 3, 3, 5, 5, 8), IDpol_1 = c(5,
5, 4, 4, 4, 3), PHQ_TOTAL = c(8, 4, 2, 3, 2, 7), GAD_TOTAL = c(6,
3, 3, 3, 2, 7), INTEROCEPCION_TOTAL = c(45, 44, 24, 17, 36,
41), BIS = c(19, 20, 17, 17, 21, 25), BAS_FUN_SEEKING = c(14,
10, 10, 9, 10, 6), BAS_REWARD_RESPONSIVENESS = c(19, 17,
19, 14, 17, 17), BAS_DRIVE = c(11, 14, 13, 8, 11, 13), BAS_TOTAL = c(44,
41, 42, 31, 38, 36), IRI_TOMA_DE_PERSPECTIVA = c(14, 18,
17, 16, 10, 16), IRI_MALESTAR_PERSONAL = c(13, 11, 4, 9,
11, 11), IRI_FANTASÍA = c(14, 14, 10, 11, 7, 16), IRI_PREOCUPACIÓN_EMPATICA = c(19,
20, 20, 12, 10, 18), RMET_TOTAL = c(7, 4, 10, 7, 10, 8),
PROMEDIO_TIEMPO_REACCION_RMET = c(2.41175, 3.3485, 3.26108333333333,
6.3905, 13.2126666666667, 4.21858333333333), PROMEDIO_CREENCIA_NFALSA_TODAS = c(2.8,
2.8, 2.4, 2.2, 1.8, 3.6), PROMEDIO_CREENCIA_NFALSA_CORONAVIRUS = c(2.66666666666667,
2.33333333333333, 2, 1.66666666666667, 1.33333333333333,
2.66666666666667), PROMEDIO_CREENCIA_NFALSA_OTRO = c(3, 3.5,
3, 3, 2.5, 5), PROMEDIO_TIEMPOREACCION_NFALSA = c(4.3438,
9.4222, 5.9734, 10.1448, 16.3196, 7.1954), PROMEDIO_CREENCIA_NVERDADERA_TODAS = c(3.33333333333333,
3, 3.66666666666667, 2.66666666666667, 1.33333333333333,
3.33333333333333), PROMEDIO_CREENCIA_NVERDADERA_CORONAVIRUS = c(5,
4, 6, 5, 1, 6), PROMEDIO_CREENCIA_NVERDADERA_OTRO = c(5,
5, 5, 3, 3, 4), PROMEDIO_TIEMPOREACCION_NVERDADERA = c(5.644,
7.043, 8.0265, 4.0495, 32.24, 9.583), PROMEDIO_CREENCIA_NMISLEADING_TODAS = c(2.66666666666667,
2.66666666666667, 3.66666666666667, 3, 1.66666666666667,
4.33333333333333), PROMEDIO_TIEMPOREACCION_NMISLEADING = c(5.72666666666667,
12.0123333333333, 5.753, 4.96966666666667, 15.233, 30.0456666666667
), PROMEDIO_DILEMAS_BI_BIENOMAL_CORONAVIRUS = c(1, 4, 4.33333333333333,
1.33333333333333, 0, 3.66666666666667), PROMEDIO_DILEMAS_BI_ACTUARIGUAL_CORONAVIRUS = c(5.66666666666667,
7.66666666666667, 9.66666666666667, 4.33333333333333, 3.66666666666667,
9.33333333333333), DILEMA_BI_CONTROL_BIENOMAL = c(4, 5, 2,
0, -3, 4), DILEMA_BI_CONTROL_ACTUARIGUAL = c(7, 4, 6, 2,
2, 10), PROMEDIO_DILEMAS_BI_BIENOMAL_JUNTOS = c(1.75, 4.25,
3.75, 1, -0.75, 3.75), PROMEDIO_DILEMAS_BI_ACTUARIGUAL_JUNTOS = c(6,
6.75, 8.75, 3.75, 3.25, 9.5), PROMEDIO_DILEMAS_DI_BIENOMAL = c(0.5,
1.83333333333333, 0.5, 1.66666666666667, 0.833333333333333,
0.166666666666667), PROMEDIO_DILEMAS_DI_ACTUARIGUAL = c(6.66666666666667,
7.66666666666667, 5.66666666666667, 5, 4.83333333333333,
5.16666666666667), PROMEDIO_DILEMAS_DI_DANO = c(5.66666666666667,
6.16666666666667, 5.33333333333333, 5.5, 5.66666666666667,
7), TIEMPOREACCION_DILEMAS_DI = c(12.1405, 9.13066666666666,
6.99833333333333, 1.85783333333333, 19.0143333333333, 11.6336666666667
), TIEMPOREACCION_DILEMAS_BI = c(7.899, 9.9955, 9.25175,
2.84125, 32.8285, 16.92), PROMEDIO_DI_SINPOL_BIENOMAL = c(0.2,
1.2, -1, 0.4, 0.8, 0.2), PROMEDIO_DI_SINPOL_ACTUARIGUAL = c(7,
8, 4.25, 4.5, 5, 5.5), PROMEDIO_DI_SINPOL_DANO = c(7.25,
6.75, 7.25, 7, 7.75, 7.75), COMPRAS_COVID19 = c(4.16666666666667,
6.83333333333333, 2, 2.66666666666667, 2.33333333333333,
4.5), PERCEPCION_RIESGO_TOTAL = c(39, 37, 42, 38, 26, 46),
PERCEPCION_RIESGO_INDICE = c(3.9, 3.7, 4.2, 3.8, 2.6, 4.6
), PROB_CONTAGIO_TOTAL = c(89.3333333333333, 65.6666666666667,
73.3333333333333, 13, 46.6666666666667, 78.3333333333333),
PROMEDIO_DILEMASPOLITICOS_BIENOMAL = c(1, 2.5, 4, 4, 0.5,
0), PROMEDIO_DILEMASPOLITICOS_ACTUARIGUAL = c(6, 7, 8.5,
6, 4.5, 4.5), PROMEDIO_DILEMASPOLITICOS_DANO = c(2.5, 5,
1.5, 2.5, 1.5, 5.5), D31_1_DI = c(-2, 3, -3, 0, -2, 4), D32_2_DI = c(4,
9, 3, 3, 4, 9), D33_3_DI = c(9, 7, 8, 8, 8, 7), D41_1_DI = c(-1,
1, 0, 0, 3, 1), D42_2_DI = c(7, 8, 6, 5, 7, 8), D43_3_DI = c(7,
9, 7, 8, 9, 6), D51_1_DI = c(5, 0, 1, 4, 1, 0), D52_2_DI = c(10,
7, 5, 7, 3, 4), D53_3_DI = c(4, 4, 6, 3, 7, 9), D61_1_DI = c(-1,
2, -3, -2, 2, -4), D62_2_DI = c(7, 8, 3, 3, 6, 1), D63_3_DI = c(9,
7, 8, 9, 7, 9), D71_1_DIP = c(0, 3, 3, 4, -2, -4), D72_2_DIP = c(4,
7, 7, 3, 2, 1), D73_3_DIP = c(3, 6, 2, 2, 2, 8), D81_1_DIP = c(2,
2, 5, 4, 3, 4), D82_2_DIP = c(8, 7, 10, 9, 7, 8), D83_3_DIP = c(2,
4, 1, 3, 1, 3), D91_1_BI = c(-3, 3, 5, 4, -1, 4), D92_2_BI = c(4,
8, 10, 9, 3, 9), D101_1_BI = c(3, 5, 5, 0, 3, 5), D102_2_BI = c(9,
8, 10, 2, 6, 10), D111_1_BI = c(3, 4, 3, 0, -2, 2), D112_2_BI = c(4,
7, 9, 2, 2, 9), D121_1_BI = c(4, 5, 2, 0, -3, 4), D122_2_BI = c(7,
4, 6, 2, 2, 10), total_iri = c(60, 63, 51, 48, 38, 61), promedio_falsaymisleading = c(2.75,
2.75, 2.875, 2.5, 1.75, 3.875), prediccioncompraspercprob = c(`1` = 4.24975892576113,
`2` = 4.40445037029013, `3` = 4.43163539588384, `4` = 5.14397435590305,
`5` = 3.76590707825915, `6` = 4.8937968160894), prediccioncomprasperc = c(`1` = 4.47445595202732,
`2` = 4.4399943212902, `3` = 4.52198006754018, `4` = 4.68938453833302,
`5` = 3.7624488758014, `6` = 4.96728571465517)), row.names = c(NA,
6L), class = c("tbl_df", "tbl", "data.frame"))

Join two data frames in R based on closest timestamp within groups

I have the following dataframes
structure(list(id = c(1, 2, 3, 4, 5), time = structure(c(1484092800,
1485907200, 1490227200, 1490918400, 1491955200), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
id time
<dbl> <dttm>
1 1 2017-01-11 00:00:00
2 2 2017-02-01 00:00:00
3 3 2017-03-23 00:00:00
4 4 2017-03-31 00:00:00
5 5 2017-04-12 00:00:00
structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5), time = structure(c(1466553600,
1465948800, 1453420800, 1485302400, 1433030400, 1421712000, 1453852800,
1485302400, 1485993600, 1517529600, 1400544000, 1434067200, 1466985600,
1497484800, 1390003200, 1516060800, 1464825600, 1497916800, 1527638400,
1454025600, 1390608000, 1421712000, 1466467200, 1453852800, 1485820800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), score = c(3,
2, 5, 4, 5, 24.2, 24.8, 25.4, 26, 26.6, 36.2, 36.8, 37.4, 38,
38.6, 44, 44.6, 45.2, 45.8, 46.4, 59, 59.6, 60.2, 60.8, 61.4)), row.names = c(NA,
-25L), class = c("tbl_df", "tbl", "data.frame"))
id time score
<dbl> <dttm> <dbl>
1 1 2016-06-22 00:00:00 3
2 1 2016-06-15 00:00:00 2
3 1 2016-01-22 00:00:00 5
4 1 2017-01-25 00:00:00 4
5 1 2015-05-31 00:00:00 5
6 2 2015-01-20 00:00:00 24.2
7 2 2016-01-27 00:00:00 24.8
8 2 2017-01-25 00:00:00 25.4
9 2 2017-02-02 00:00:00 26
10 2 2018-02-02 00:00:00 26.6
# … with 15 more rows
I would like to have the score of sdf where the time is closest to that of in df. But I would also have to look at the id's! I already tried this from Join two data frames in R based on closest timestamp:
d <- function(x,y) abs(x-y) # define the distance function
idx <- sapply( df$time, function(x) which.min( d(x,sdf$time) ))
cbind(df,sdf[idx,-1,drop=FALSE])
id time time score
1 1 2017-01-11 2017-01-25 4
2 2 2017-02-01 2017-02-02 26
3 3 2017-03-23 2017-02-02 26
4 4 2017-03-31 2017-02-02 26
5 5 2017-04-12 2017-06-15 38
But you don't look at the id, I tried to incorporate the id, however did not work. Any ideas? Thank you in advance :)

We can join the data frames by id and then calculate the time difference and keep the observation with the minimal time difference for each individual:
library(tidyverse)
df2 %>%
left_join(df1, by = "id") %>%
mutate(time_dif = abs(time.x - time.y)) %>%
group_by(id) %>%
filter(time_dif == min(time_dif))
# A tibble: 5 x 5
# Groups: id [5]
id time.x score time.y time_dif
<dbl> <dttm> <dbl> <dttm> <drtn>
1 1 2017-01-25 00:00:00 4 2017-01-11 00:00:00 14 days
2 2 2017-02-02 00:00:00 26 2017-02-01 00:00:00 1 days
3 3 2017-06-15 00:00:00 38 2017-03-23 00:00:00 84 days
4 4 2017-06-20 00:00:00 45.2 2017-03-31 00:00:00 81 days
5 5 2017-01-31 00:00:00 61.4 2017-04-12 00:00:00 71 days
Data
df1 <- structure(list(id = c(1, 2, 3, 4, 5), time = structure(c(1484092800,
1485907200, 1490227200, 1490918400, 1491955200), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
df2 <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5), time = structure(c(1466553600,
1465948800, 1453420800, 1485302400, 1433030400, 1421712000, 1453852800,
1485302400, 1485993600, 1517529600, 1400544000, 1434067200, 1466985600,
1497484800, 1390003200, 1516060800, 1464825600, 1497916800, 1527638400,
1454025600, 1390608000, 1421712000, 1466467200, 1453852800, 1485820800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), score = c(3,
2, 5, 4, 5, 24.2, 24.8, 25.4, 26, 26.6, 36.2, 36.8, 37.4, 38,
38.6, 44, 44.6, 45.2, 45.8, 46.4, 59, 59.6, 60.2, 60.8, 61.4)), row.names = c(NA,
-25L), class = c("tbl_df", "tbl", "data.frame"))

map over and arrange a list based on the list names

I have a list which looks like:
List of 8
$ 9 :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 2 variables:
..$ date: Date[1:40], format: "2014-03-22" "2019-03-18" "2018-04-28" ...
..$ .id : num [1:40] 9 9 9 9 9 9 9 9 9 9 ...
$ c(1, 7) :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 2 variables:
..$ date: Date[1:40], format: "2004-08-26" "2012-10-21" "2007-03-10" ...
..$ .id : num [1:40] 7 7 1 7 7 7 7 1 7 7 ...
$ c(13, 18) :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 2 variables:
..$ date: Date[1:40], format: "2016-01-31" "2016-03-24" "2018-10-17" ...
..$ .id : num [1:40] 13 13 13 18 13 18 13 13 13 13 ...
$ c(18, 2, 7, 13):Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 2 variables:
..$ date: Date[1:40], format: "2013-04-05" "2019-04-23" "2005-03-05" ...
..$ .id : num [1:40] 13 2 7 2 2 13 13 7 13 7 ...
$ c(19, 5) :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 2 variables:
..$ date: Date[1:40], format: "2018-04-10" "2016-08-03" "2012-05-18" ...
..$ .id : num [1:40] 5 19 5 5 5 5 5 5 19 5 ...
$ c(2, 7, 18) :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 2 variables:
..$ date: Date[1:40], format: "2018-02-01" "2011-03-08" "2009-09-29" ...
..$ .id : num [1:40] 7 7 2 18 2 18 2 2 7 2 ...
$ c(5, 19) :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 2 variables:
..$ date: Date[1:40], format: "2011-05-14" "2005-08-31" "2015-07-06" ...
..$ .id : num [1:40] 19 5 5 5 5 19 5 5 5 5 ...
$ c(7, 1, 2, 18) :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 40 obs. of 2 variables:
..$ date: Date[1:40], format: "2003-04-12" "2014-12-03" "2001-02-21" ...
..$ .id : num [1:40] 7 1 1 7 2 1 1 18 2 1 ...
The names of the list are the following:
9
c(1, 7)
c(13, 18)
c(18, 2, 7, 13)
c(19, 5)
c(2, 7, 18)
c(5, 19)
c(7, 1, 2, 18)
Two of the lists look like:
$`c(19, 5)`
# A tibble: 40 x 2
date .id
<date> <dbl>
1 2018-04-10 5
2 2016-08-03 19
3 2012-05-18 5
4 2007-09-11 5
5 2011-11-03 5
6 2007-04-09 5
7 2001-07-12 5
8 2018-07-30 5
9 2013-07-30 19
10 2001-08-13 5
# ... with 30 more rows
$`c(2, 7, 18)`
# A tibble: 40 x 2
date .id
<date> <dbl>
1 2018-02-01 7
2 2011-03-08 7
3 2009-09-29 2
4 2014-07-30 18
5 2004-04-17 2
6 2016-11-21 18
7 2007-10-27 2
8 2009-02-08 2
9 2016-01-18 7
10 2010-09-27 2
# ... with 30 more rows
What I would like to do is to arrange the lists by the .id and the date columns. However the .id arranged by the order it appears in the list names. So for the c(19, 5) list the 19 would be first (as well as ordered by date) and the 5 would be second (as well as ordered by date). For the c(5, 19) list the 5 would be ordered first (as well as ordered by date) and the 19 would be second (as well as ordered by date).
Any advice on how to do this would be great.
Data:
lst <- list(`9` = structure(list(date = structure(c(16151, 17973, 17649,
17738, 17388, 13927, 11594, 13095, 15312, 12030, 13805, 13240,
15660, 15926, 11645, 12139, 17853, 15328, 12561, 13595, 14147,
12142, 14112, 14083, 16057, 13074, 11458, 14735, 12892, 16139,
11935, 17666, 14789, 12231, 12343, 17012, 13099, 17682, 15150,
14195), class = "Date"), .id = c(9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9)), row.names = c(NA, -40L), class = c("tbl_df",
"tbl", "data.frame")), `c(1, 7)` = structure(list(date = structure(c(12656,
15634, 13582, 17498, 15079, 12265, 18031, 17399, 11603, 13886,
16876, 16022, 16303, 17776, 12717, 15154, 12950, 13693, 17561,
16963, 15690, 12581, 14883, 18010, 14280, 12672, 16108, 14347,
14326, 14628, 17913, 13771, 15369, 14765, 12067, 16397, 11555,
14855, 16308, 12824), class = "Date"), .id = c(7, 7, 1, 7, 7,
7, 7, 1, 7, 7, 1, 1, 7, 1, 7, 1, 1, 1, 7, 1, 7, 1, 1, 1, 1, 1,
1, 7, 7, 1, 7, 7, 7, 7, 1, 1, 7, 7, 1, 1)), row.names = c(NA,
-40L), class = c("tbl_df", "tbl", "data.frame")), `c(13, 18)` = structure(list(
date = structure(c(16831, 16884, 17821, 15686, 14680, 16428,
17462, 15693, 14707, 16889, 17534, 17556, 15243, 17308, 16886,
17212, 15199, 15669, 17761, 17103, 16992, 17396, 17584, 15904,
15643, 16748, 17554, 16822, 17184, 16264, 15425, 16715, 15268,
15205, 14772, 17285, 17184, 16112, 15327, 17100), class = "Date"),
.id = c(13, 13, 13, 18, 13, 18, 13, 13, 13, 13, 13, 13, 18,
13, 18, 13, 13, 13, 18, 18, 13, 13, 13, 13, 18, 18, 13, 13,
13, 18, 13, 13, 13, 13, 13, 13, 18, 18, 18, 13)), row.names = c(NA,
-40L), class = c("tbl_df", "tbl", "data.frame")), `c(18, 2, 7, 13)` = structure(list(
date = structure(c(15800, 18009, 12847, 12378, 12365, 14864,
14961, 14562, 15723, 15856, 11545, 11755, 15080, 13149, 12655,
14898, 13067, 14375, 15499, 16681, 15682, 18030, 15732, 14452,
17624, 15741, 17894, 12768, 17295, 12015, 16533, 13589, 17072,
14678, 14067, 14348, 16846, 18125, 17826, 16874), class = "Date"),
.id = c(13, 2, 7, 2, 2, 13, 13, 7, 13, 7, 7, 7, 7, 2, 7,
7, 7, 7, 7, 18, 13, 13, 18, 7, 2, 7, 7, 7, 13, 2, 2, 2, 7,
18, 7, 2, 2, 18, 13, 18)), row.names = c(NA, -40L), class = c("tbl_df",
"tbl", "data.frame")), `c(19, 5)` = structure(list(date = structure(c(17631,
17016, 15478, 13767, 15281, 13612, 11515, 17742, 15916, 11547,
12959, 16713, 12521, 12457, 12174, 18054, 16407, 13462, 14704,
16642, 12551, 16289, 12034, 17676, 16486, 15009, 17220, 16753,
13335, 12498, 12697, 17725, 17833, 16329, 17182, 16435, 11475,
14732, 15210, 17823), class = "Date"), .id = c(5, 19, 5, 5, 5,
5, 5, 5, 19, 5, 5, 19, 5, 5, 5, 19, 5, 5, 5, 5, 5, 5, 5, 5, 19,
5, 5, 5, 5, 5, 5, 19, 19, 19, 19, 5, 5, 19, 5, 5)), row.names = c(NA,
-40L), class = c("tbl_df", "tbl", "data.frame")), `c(2, 7, 18)` = structure(list(
date = structure(c(17563, 15041, 14516, 16281, 12525, 17126,
13813, 14283, 16818, 14879, 15860, 16616, 17303, 15356, 14899,
14306, 15254, 17836, 12555, 15367, 17721, 16216, 16787, 16603,
14723, 13608, 13276, 17852, 16922, 17774, 14676, 16696, 17059,
15518, 13829, 14623, 17787, 14534, 17579, 15137), class = "Date"),
.id = c(7, 7, 2, 18, 2, 18, 2, 2, 7, 2, 7, 7, 18, 7, 7, 7,
7, 18, 7, 2, 7, 2, 7, 2, 2, 7, 2, 18, 18, 2, 18, 18, 2, 2,
7, 2, 7, 2, 2, 7)), row.names = c(NA, -40L), class = c("tbl_df",
"tbl", "data.frame")), `c(5, 19)` = structure(list(date = structure(c(15108,
13026, 16622, 12813, 11591, 15364, 16033, 16594, 15353, 14652,
14697, 17160, 17084, 16686, 13560, 11401, 16433, 11722, 17606,
15924, 16235, 17817, 16172, 14612, 12021, 17276, 18080, 16222,
16849, 14746, 14036, 17850, 11350, 15036, 15577, 14833, 16464,
15322, 15988, 17023), class = "Date"), .id = c(19, 5, 5, 5, 5,
19, 5, 5, 5, 5, 19, 19, 19, 19, 5, 5, 19, 5, 19, 5, 19, 19, 5,
19, 5, 19, 5, 19, 19, 19, 5, 19, 5, 19, 5, 19, 5, 5, 19, 19)), row.names = c(NA,
-40L), class = c("tbl_df", "tbl", "data.frame")), `c(7, 1, 2, 18)` = structure(list(
date = structure(c(12154, 16407, 11374, 12594, 13229, 13812,
12462, 16255, 16181, 15333, 15337, 16019, 14551, 16383, 13281,
15422, 12951, 17836, 16740, 12130, 18142, 16458, 18148, 15173,
12506, 15581, 15244, 16519, 15785, 17916, 17575, 15128, 15274,
15808, 12137, 16425, 15927, 14696, 12771, 12894), class = "Date"),
.id = c(7, 1, 1, 7, 2, 1, 1, 18, 2, 1, 2, 2, 1, 7, 7, 1,
1, 18, 2, 2, 2, 1, 18, 2, 1, 1, 7, 18, 7, 18, 2, 18, 1, 7,
2, 1, 7, 2, 2, 2)), row.names = c(NA, -40L), class = c("tbl_df",
"tbl", "data.frame")))

You could do something like the following:
# Loop over names of list
newlist <- lapply(names(lst), function(i) {
# Subset list by name
thislist <- lst[[i]]
# evaluate the list name
i <- eval(parse(text = i))
# order list
thislist[order(factor(thislist$.id, levels = as.character(i))),]
})

We can use imap from purrr and use match and order to order each dataframe
purrr::imap(lst, ~.x[order(match(.x$.id, eval(parse(text = .y)))), ])
#$`9`
# A tibble: 40 x 2
# date .id
# <date> <dbl>
# 1 2014-03-22 9
# 2 2019-03-18 9
# 3 2018-04-28 9
# 4 2018-07-26 9
# 5 2017-08-10 9
# 6 2008-02-18 9
# 7 2001-09-29 9
# 8 2005-11-08 9
# 9 2011-12-04 9
#10 2002-12-09 9
# … with 30 more rows
#$`c(1, 7)`
# A tibble: 40 x 2
# date .id
# <date> <dbl>
# 1 2007-03-10 1
# 2 2017-08-21 1
# 3 2016-03-16 1
# 4 2013-11-13 1
# 5 2018-09-02 1
# 6 2011-06-29 1
# 7 2005-06-16 1
# 8 2007-06-29 1
# 9 2016-06-11 1
#10 2004-06-12 1
# … with 30 more rows
#....
#.....
In base R, that can be achieved using Map
Map(function(x, y) x[order(match(x$.id, y)), ], lst,
lapply(names(lst), function(x) eval(parse(text = x))))

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

the difference of multiple columns in R based in DATE [duplicate] - r

Related

Data imputation by linear interpolation according to date in R

How to create before and after scores in two different columns based on date?

Why can't I plot my model selection plot correctly?

Join two data frames in R based on closest timestamp within groups

map over and arrange a list based on the list names

Categories

Resources