I am working with COVID-19 data from my country, broken down by region (3 regions), in a dataframe. I want to use the columns of positive cases to generate new columns containing the growth between rows. The dataframe:
> df
Lima Arequipa Huánuco
1 1 NA NA
2 6 NA NA
3 6 1 NA
4 8 2 5
5 9 3 7
6 11 4 8
I want to use a for loop to create, for each column of df, a new column named after it with "_dif" appended, containing each row minus the previous row (row - lag(row)). So I used this code:
for(col in names(df)) {
df[paste0(col, "_dif")] = df[col] - lag(df[col])
}
The output I want is the following:
Lima Arequipa Huánuco Lima_dif Arequipa_dif Huánuco_dif
1 1 NA NA NA NA NA
2 6 NA NA 5 NA NA
3 6 1 NA 0 NA NA
4 8 2 5 2 1 NA
5 9 3 7 1 1 2
6 11 4 8 2 1 1
But when I look at df after the for loop, I get this (only NA in the new columns):
Lima Arequipa Huánuco Lima_dif Arequipa_dif Huánuco_dif
1 1 NA NA NA NA NA
2 6 NA NA NA NA NA
3 6 1 NA NA NA NA
4 8 2 5 NA NA NA
5 9 3 7 NA NA NA
6 11 4 8 NA NA NA
Thanks in advance.
We can just use mutate() with across() from dplyr, as the _all/_at suffixes are getting deprecated and, in the newer version, across() is more generalized:
library(dplyr)
df %>%
  mutate(across(everything(), ~ . - lag(.), .names = "{.col}_dif"))
# Lima Arequipa Huánuco Lima_dif Arequipa_dif Huánuco_dif
#1 1 NA NA NA NA NA
#2 6 NA NA 5 NA NA
#3 6 1 NA 0 NA NA
#4 8 2 5 2 1 NA
#5 9 3 7 1 1 2
#6 11 4 8 2 1 1
Or in base R
df[paste0(names(df), "_dif")] <- lapply(df, function(x) c(NA, diff(x)))
Or another option is
df[paste0(names(df), "_dif")] <- rbind(NA, diff(as.matrix(df)))
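Both base R versions rely on diff(), which returns consecutive differences and is one element shorter than its input, hence the NA padded back at the top. A quick illustration on the Lima column:
x <- c(1, 6, 6, 8, 9, 11)
diff(x)        # 5 0 2 1 2
c(NA, diff(x)) # NA 5 0 2 1 2 -- lines up with the original rows again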
The issue in the OP's for loop is that df[col] is still a data.frame with a single column; we need df[[col]] to extract it as a vector, because lag() needs a vector. According to ?lag:
x - Vector of values
lag(df[1])
# Lima
#1 NA
returns a single NA, which gets recycled across the whole column, while
lag(df[[1]])
#[1] NA 1 6 6 8 9
therefore, if we change the code to
for(col in names(df)) {
df[paste0(col, "_dif")] = df[[col]] - lag(df[[col]])
}
df
# Lima Arequipa Huánuco Lima_dif Arequipa_dif Huánuco_dif
#1 1 NA NA NA NA NA
#2 6 NA NA 5 NA NA
#3 6 1 NA 0 NA NA
#4 8 2 5 2 1 NA
#5 9 3 7 1 1 2
#6 11 4 8 2 1 1
data
df <- structure(list(Lima = c(1L, 6L, 6L, 8L, 9L, 11L),
                     Arequipa = c(NA, NA, 1L, 2L, 3L, 4L),
                     Huánuco = c(NA, NA, NA, 5L, 7L, 8L)),
                class = "data.frame",
                row.names = c("1", "2", "3", "4", "5", "6"))
In dplyr, you can use mutate_all:
library(dplyr)
df %>% mutate_all(list(diff = ~. - lag(.)))
# Lima Arequipa Huánuco Lima_diff Arequipa_diff Huánuco_diff
#1 1 NA NA NA NA NA
#2 6 NA NA 5 NA NA
#3 6 1 NA 0 NA NA
#4 8 2 5 2 1 NA
#5 9 3 7 1 1 2
#6 11 4 8 2 1 1
Or shift in data.table
library(data.table)
setDT(df)[, (paste0(names(df), '_diff')) := .SD - shift(.SD)]
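shift() is data.table's lag/lead helper; by default it shifts by one position and fills with NA, and applied to .SD it lags every column at once. A quick illustration on the Lima column:
library(data.table)
shift(c(1, 6, 6, 8, 9, 11))
#[1] NA  1  6  6  8  9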
You almost had it.
library(readr)
df <- read_table("V Lima Arequipa Huanuco
1 1 NA NA
2 6 NA NA
3 6 1 NA
4 8 2 5
5 9 3 7
6 11 4 8")
for(col in names(df)) {
df[paste0(col, "_dif")] <- df[col] - lag(df[col], default = 0)
}
df
# A tibble: 6 x 8
V Lima Arequipa Huanuco V_dif Lima_dif Arequipa_dif Huanuco_dif
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 NA NA 1 1 NA NA
2 2 6 NA NA 2 6 NA NA
3 3 6 1 NA 3 6 1 NA
4 4 8 2 5 4 8 2 5
5 5 9 3 7 5 9 3 7
6 6 11 4 8 6 11 4 8
You didn't set lag's default to 0, so it went to NA.
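For reference, the default argument only controls what value is shifted in at the top when lag() is given a plain vector (a quick illustration):
library(dplyr)
lag(c(1, 6, 6, 8, 9, 11))
#[1] NA  1  6  6  8  9
lag(c(1, 6, 6, 8, 9, 11), default = 0)
#[1] 0 1 6 6 8 9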
I have a data frame that has been z-score converted. I want to remove from the data frame (i.e. convert to NA) only those values that are greater than or equal to 4, without dropping any rows or columns. I would appreciate an answer.
Best
You can use the following code:
df <- data.frame(v1 = c(1,3,6,7,3),
v2 = c(2,1,4,6,7),
v3 = c(1,2,3,4,5))
df
#> v1 v2 v3
#> 1 1 2 1
#> 2 3 1 2
#> 3 6 4 3
#> 4 7 6 4
#> 5 3 7 5
is.na(df) <- df >= 4
df
#> v1 v2 v3
#> 1 1 2 1
#> 2 3 1 2
#> 3 NA NA 3
#> 4 NA NA NA
#> 5 3 NA NA
Created on 2022-07-10 by the reprex package (v2.0.1)
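For context, is.na(df) <- df >= 4 works because is.na<- is a replacement function: it sets the elements selected on the right-hand side to NA. A minimal illustration on a plain vector:
x <- c(1, 5, 2, 8)
is.na(x) <- x >= 4   # set every element that is >= 4 to NA
x
#> [1]  1 NA  2 NA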
You can simply use df[df >= 4] <- NA to achieve what you want.
df <- data.frame(replicate(10,sample(0:10,10,rep=TRUE)))
> df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 2 3 4 5 6 4 3 1 10 6
2 5 7 0 4 3 10 10 3 6 10
3 5 5 0 3 1 3 5 7 2 7
4 7 0 4 1 10 0 5 2 5 0
5 8 8 7 8 4 6 6 10 10 0
6 1 4 1 3 3 8 8 0 4 8
7 6 3 3 6 7 4 10 9 7 2
8 2 1 4 0 7 8 10 1 6 3
9 0 9 6 2 9 6 2 9 0 3
10 8 2 1 0 1 4 0 6 2 8
df[df>=4] <- NA
> df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 2 3 NA NA NA NA 3 1 NA NA
2 NA NA 0 NA 3 NA NA 3 NA NA
3 NA NA 0 3 1 3 NA NA 2 NA
4 NA 0 NA 1 NA 0 NA 2 NA 0
5 NA NA NA NA NA NA NA NA NA 0
6 1 NA 1 3 3 NA NA 0 NA NA
7 NA 3 3 NA NA NA NA NA NA 2
8 2 1 NA 0 NA NA NA 1 NA 3
9 0 NA NA 2 NA NA 2 NA 0 3
10 NA 2 1 0 1 NA 0 NA 2 NA
Here is one more, using replace_with_na_all() from the naniar package:
Use replace_with_na_all() when you want to replace ALL values that meet a condition across an entire dataset. The syntax here is a little different, and follows the rules for rlang’s expression of simple functions. This means that the function starts with ~, and when referencing a variable, you use .x.
https://cran.r-project.org/web/packages/naniar/vignettes/replace-with-na.html
library(naniar)
library(dplyr)
df %>%
replace_with_na_all(condition = ~.x >= 4)
v1 v2 v3
<dbl> <dbl> <dbl>
1 1 2 1
2 3 1 2
3 NA NA 3
4 NA NA NA
5 3 NA NA
Though the solution by @Quinten is very concise, here is another approach using the tidyverse:
library(dplyr)
set.seed(123)
df <- data.frame(
x = sample(1:10, 7),
y = sample(1:10, 7)
)
df %>%
mutate(
across(.fns = ~ if_else(.x >= 4, NA_integer_, .x))
)
#> x y
#> 1 3 NA
#> 2 NA NA
#> 3 2 1
#> 4 NA 2
#> 5 NA 3
#> 6 NA NA
#> 7 1 NA
Created on 2022-07-10 by the reprex package (v2.0.1)
In base R, we can use replace():
df <- replace(df, df > 4, NA_real_)
Output
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 NA NA 3 NA 1 3 1 1 NA NA
2 1 NA 2 NA NA 3 NA NA 2 0
3 NA 1 NA 2 2 1 NA NA 4 1
4 NA NA 0 NA NA NA 0 2 4 NA
5 NA 1 NA 3 0 NA 4 NA 2 3
6 0 3 NA 0 NA NA 1 1 NA 2
7 3 NA NA NA 2 2 NA 2 NA 4
8 NA 1 0 2 NA NA 2 NA NA NA
9 NA 3 NA 2 4 NA NA 0 1 3
10 1 3 NA 3 NA NA 3 4 NA NA
Or use replace in dplyr:
library(dplyr)
df %>%
mutate(across(everything(), ~ replace(.x, .x > 4, NA_real_)))
Data
set.seed(321)
df <- data.frame(replicate(10, sample(0:10, 10, rep = TRUE)))
If the columns are numeric, another option is to use ^ on the logical matrix df >= 4: NA^TRUE is NA and NA^FALSE is 1, so multiplying the result by the original data keeps the elements below 4 and turns the rest into NA.
NA^(df >= 4) * df
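As a quick illustration of the intermediate step, reusing the small v1/v2/v3 example data frame from earlier in this thread:
df <- data.frame(v1 = c(1, 3, 6, 7, 3),
                 v2 = c(2, 1, 4, 6, 7),
                 v3 = c(1, 2, 3, 4, 5))
NA^(df >= 4)      # NA where the condition holds, 1 everywhere else
NA^(df >= 4) * df # multiplying keeps the small values and propagates NA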
I have a dataset that looks like this:
set.seed(999)
col1 = sample.int(10, 10)
col2 = sample.int(10, 10)
col3 = sample.int(10, 10)
col4 = sample.int(10, 10)
col5 = sample.int(10, 10)
col_data = data.frame(col1, col2, col3, col4, col5)
col1 col2 col3 col4 col5
1 4 8 3 9 8
2 7 5 9 7 10
3 1 7 7 8 2
4 6 6 5 5 4
5 8 10 8 3 7
6 2 3 1 2 6
7 5 9 2 1 1
8 10 2 4 4 3
9 9 1 10 6 9
10 3 4 6 10 5
I would like to create new columns in this dataset that:
Find out the position (i.e. column number) for the first "9" in each row
Find out the position (i.e. column number) for the first "7" in each row
Find out the position (i.e. column number) for the first "1" in each row
Find out the position (i.e. column number) for the first "10" in each row
Find out the position (i.e. column number) for the first "4" in each row
I thought this might be easier to do if the data were a matrix, which could then be converted back to a data frame:
col_d = as.matrix(col_data)
first_9 = apply(col_d == 9, 1, which.max)
first_7 = apply(col_d == 7, 1, which.max)
first_1 = apply(col_d == 1, 1, which.max)
first_10 = apply(col_d == 10, 1, which.max)
first_4 = apply(col_d == 4, 1, which.max)
final = cbind(col_data, first_9, first_7, first_1, first_10, first_4)
But this does not appear to be working:
col1 col2 col3 col4 col5 first_9 first_7 first_1 first_10 first_4
1 4 8 3 9 8 4 1 1 1 1
2 7 5 9 7 10 3 1 1 5 1
3 1 7 7 8 2 1 2 1 1 1
4 6 6 5 5 4 1 1 1 1 5
5 8 10 8 3 7 1 5 1 2 1
6 2 3 1 2 6 1 1 3 1 1
7 5 9 2 1 1 2 1 4 1 1
8 10 2 4 4 3 1 1 1 1 3
9 9 1 10 6 9 1 1 2 3 1
10 3 4 6 10 5 1 1 1 4 2
For example: in the first row there is no 10, but the value of first_10 is 1.
Is there a way to resolve this error?
Thank you!
How about
apply(col_data == 7, 1, function(x) {ifelse(sum(x)==0, NA, which.max(x))})
[1] NA 1 2 NA 5 NA NA NA NA NA
apply(col_data == 10, 1, function(x) {ifelse(sum(x)==0, NA, which.max(x))})
[1] NA 5 NA NA 2 NA NA 1 3 4
You may change the NA to whatever you want; it just means that the number (i.e. 7 or 10) does not appear in that row.
To get the second occurrence:
apply(col_data == 7, 1, function(x) {ifelse(sum(x)==0, NA, which(x)[2])})
To get the last occurrence:
apply(col_data == 7, 1, function(x) {ifelse(sum(x)==0, NA, dplyr::last(which(x)))})
Use max.col:
nr <- c(9, 7, 1, 10, 4)
nr <- setNames(nr, paste0("first_", nr))
cbind(col_data, sapply(nr, function(x) {
. <- col_data == x                 # logical matrix: where does x occur?
tt <- max.col(., "first")          # column of the first TRUE (1 when the row has none)
is.na(tt) <- tt == 1 & !.[,1]      # a 1 that is not an actual match means "not found"
tt
}))
# col1 col2 col3 col4 col5 first_9 first_7 first_1 first_10 first_4
#1 4 8 3 9 8 4 NA NA NA 1
#2 7 5 9 7 10 3 1 NA 5 NA
#3 1 7 7 8 2 NA 2 1 NA NA
#4 6 6 5 5 4 NA NA NA NA 5
#5 8 10 8 3 7 NA 5 NA 2 NA
#6 2 3 1 2 6 NA NA 3 NA NA
#7 5 9 2 1 1 2 NA 4 NA NA
#8 10 2 4 4 3 NA NA NA 1 3
#9 9 1 10 6 9 1 NA 2 3 NA
#10 3 4 6 10 5 NA NA NA 4 2
For the last:
nr <- c(9, 7, 1, 10, 4)
nr <- setNames(nr, paste0("last_", nr))
cbind(col_data, sapply(nr, function(x) {
. <- col_data == x
tt <- max.col(., "last")           # column of the last TRUE
is.na(tt) <- rowSums(.) == 0       # rows with no match at all become NA
tt
}))
# col1 col2 col3 col4 col5 last_9 last_7 last_1 last_10 last_4
#1 4 8 3 9 8 4 NA NA NA 1
#2 7 5 9 7 10 3 4 NA 5 NA
#3 1 7 7 8 2 NA 3 1 NA NA
#4 6 6 5 5 4 NA NA NA NA 5
#5 8 10 8 3 7 NA 5 NA 2 NA
#6 2 3 1 2 6 NA NA 3 NA NA
#7 5 9 2 1 1 2 NA 5 NA NA
#8 10 2 4 4 3 NA NA NA 1 4
#9 9 1 10 6 9 5 NA 2 3 NA
#10 3 4 6 10 5 NA NA NA 4 2
And for the second match:
nr <- c(9, 7, 1, 10, 4)
nr <- setNames(nr, paste0("2nd_", nr))
cbind(col_data, sapply(nr, function(x) {
. <- which(col_data == x, TRUE)    # row/column indices of every match
. <- tapply(.[,2], .[,1], `[`, 2)  # for each row, take the column of the 2nd match
replace(rep(NA_integer_, nrow(col_data)), as.integer(names(.)), .)  # scatter back; rows with no 2nd match stay NA
}))
# col1 col2 col3 col4 col5 2nd_9 2nd_7 2nd_1 2nd_10 2nd_4
#1 4 8 3 9 8 NA NA NA NA NA
#2 7 5 9 7 10 NA 4 NA NA NA
#3 1 7 7 8 2 NA 3 NA NA NA
#4 6 6 5 5 4 NA NA NA NA NA
#5 8 10 8 3 7 NA NA NA NA NA
#6 2 3 1 2 6 NA NA NA NA NA
#7 5 9 2 1 1 NA NA 5 NA NA
#8 10 2 4 4 3 NA NA NA NA 4
#9 9 1 10 6 9 5 NA NA NA NA
#10 3 4 6 10 5 NA NA NA NA NA
Or use apply() row-wise, shown here for a single target value (9); a sketch covering all five values follows after these examples.
#First
apply(col_data == 9, 1, function(x) if(any(x)) which.max(x) else NA)
# [1] 4 3 NA NA NA NA 2 NA 1 NA
#Last
apply(col_data == 9, 1, function(x) if(any(x)) tail(which(x), 1) else NA)
# [1] 4 3 NA NA NA NA 2 NA 5 NA
#Second
apply(col_data == 9, 1, function(x) if(any(x)) which(x)[2] else NA)
# [1] NA NA NA NA NA NA NA NA 5 NA
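To get all five first_* columns with this approach in one go, you could wrap it in sapply() over the target values (a small sketch, using throwaway names vals/firsts):
vals <- c(9, 7, 1, 10, 4)
firsts <- sapply(vals, function(v)
  apply(col_data == v, 1, function(x) if (any(x)) which.max(x) else NA))
colnames(firsts) <- paste0("first_", vals)
cbind(col_data, firsts)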
I have a dataframe like this:
household person R01 R02 R03 R04 R05
1 1 1 NA 1 7 7 NA
2 1 2 1 NA 7 7 NA
3 1 3 3 3 NA 11 NA
4 1 4 3 3 11 NA NA
5 2 1 NA 7 16 NA NA
6 2 2 3 NA 7 NA NA
7 2 3 15 3 NA NA NA
and I'm trying to add new columns that are the grouped, transposed versions of columns R01 to R05, like this:
household person R01 R02 R03 R04 R05 R01x R02x R03x R04x R05x
1 1 1 NA 1 7 7 NA NA 1 3 3 NA
2 1 2 1 NA 7 7 NA 1 NA 3 3 NA
3 1 3 3 3 NA 11 NA 7 7 NA 11 NA
4 1 4 3 3 11 NA NA 7 7 11 NA NA
5 2 1 NA 7 16 NA NA NA 3 15 NA NA
6 2 2 3 NA 7 NA NA 7 NA 3 NA NA
7 2 3 15 3 NA NA NA 16 7 NA NA NA
I have tried various approaches using t() and reshaping with gather() and spread(), but I don't think they are designed for this, since I'm moving the data around rather than just reshaping it.
Example Code
df <- data.frame(household = c(rep(1,4),rep(2,3)),
person = c(1:4,1:3),
R01 = c(NA,1,3,3,NA,3,15),
R02 = c(1,NA,3,3,7,NA,3),
R03 = c(7,7,NA,11,16,7,NA),
R04 = c(7,7,11,rep(NA,4)),
R05 = rep(NA,7))
Referring to my previous answer, you can transpose the matrix within group_modify():
library(dplyr)
df %>%
group_by(household) %>%
group_modify(~ {
mat <- t(.x[-1][1:nrow(.x)])
colnames(mat) <- paste0(rownames(mat), "x")
cbind(.x, mat)
}) %>%
ungroup()
# # A tibble: 7 × 11
# household person R01 R02 R03 R04 R05 R01x R02x R03x R04x
# <dbl> <int> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 NA 1 7 7 NA NA 1 3 3
# 2 1 2 1 NA 7 7 NA 1 NA 3 3
# 3 1 3 3 3 NA 11 NA 7 7 NA 11
# 4 1 4 3 3 11 NA NA 7 7 11 NA
# 5 2 1 NA 7 16 NA NA NA 3 15 NA
# 6 2 2 3 NA 7 NA NA 7 NA 3 NA
# 7 2 3 15 3 NA NA NA 16 7 NA NA
Partly using a previous answer, here's a way to do it:
1. Split the dataframe by group.
2. For each group, count how many columns contain at least one non-NA value (important for the transposition).
3. Keep only that many columns and transpose them.
4. Swap the column and row names back, since they were swapped by the transposition.
5. Bind the resulting columns to the original dataframe.
l <- split(df[startsWith(colnames(df), "R")], df$household)
len <- lapply(l, \(l) ncol(l) - (sum(sapply(l, \(x) any(!is.na(x))))))
l <- mapply(\(x, y) t(x[1:(length(x) - y)]), l, len, SIMPLIFY = F)
l <- lapply(l, function(x){
r <- paste0(rownames(x), "x")
c <- colnames(x)
rownames(x) <- c
colnames(x) <- r
data.frame(x)
})
cbind(df, bind_rows(l))
output
household person R01 R02 R03 R04 R05 R01x R02x R03x R04x
1 1 1 NA 1 7 7 NA NA 1 3 3
2 1 2 1 NA 7 7 NA 1 NA 3 3
3 1 3 3 3 NA 11 NA 7 7 NA 11
4 1 4 3 3 11 NA NA 7 7 11 NA
5 2 1 NA 7 16 NA NA NA 3 15 NA
6 2 2 3 NA 7 NA NA 7 NA 3 NA
7 2 3 15 3 NA NA NA 16 7 NA NA
Another option is to reshape with tidyr's pivot_longer()/pivot_wider() and join the result back onto the original:
library(dplyr)
library(tidyr)
df %>%
left_join(pivot_longer(.,starts_with('R'), names_to = 'name',
names_pattern = "(\\d+)", values_drop_na = TRUE,
names_transform = list(name = as.integer)) %>%
pivot_wider(c(household,name), names_from = person,
names_glue = "R0{person}x"),
by = c('household', person = 'name'))
household person R01 R02 R03 R04 R05 R01x R02x R03x R04x
1 1 1 NA 1 7 7 NA NA 1 3 3
2 1 2 1 NA 7 7 NA 1 NA 3 3
3 1 3 3 3 NA 11 NA 7 7 NA 11
4 1 4 3 3 11 NA NA 7 7 11 NA
5 2 1 NA 7 16 NA NA NA 3 15 NA
6 2 2 3 NA 7 NA NA 7 NA 3 NA
7 2 3 15 3 NA NA NA 16 7 NA NA
Another solution:
df %>%
left_join(
reshape2::recast(.,household+variable~person,id.var = c('household', 'person'))%>%
group_by(household) %>%
mutate(person = seq_along(variable), variable = NULL))
household person R01 R02 R03 R04 R05 1 2 3 4
1 1 1 NA 1 7 7 NA NA 1 3 3
2 1 2 1 NA 7 7 NA 1 NA 3 3
3 1 3 3 3 NA 11 NA 7 7 NA 11
4 1 4 3 3 11 NA NA 7 7 11 NA
5 2 1 NA 7 16 NA NA NA 3 15 NA
6 2 2 3 NA 7 NA NA 7 NA 3 NA
7 2 3 15 3 NA NA NA 16 7 NA NA
Here's a way to do it.
library(dplyr)
transposed_df <- df %>%
group_split(household) %>%
lapply(\(x){
select(x, -1:-2) %>%
t() %>%
head(nrow(x)) %>%
as_tibble() %>%
setNames(paste0(names(x)[-1:-2], 'x'))
}) %>%
bind_rows()
df %>%
bind_cols(transposed_df)
#> household person R01 R02 R03 R04 R05 R01x R02x R03x R04x
#> 1 1 1 NA 1 7 7 NA NA 1 3 3
#> 2 1 2 1 NA 7 7 NA 1 NA 3 3
#> 3 1 3 3 3 NA 11 NA 7 7 NA 11
#> 4 1 4 3 3 11 NA NA 7 7 11 NA
#> 5 2 1 NA 7 16 NA NA NA 3 15 NA
#> 6 2 2 3 NA 7 NA NA 7 NA 3 NA
#> 7 2 3 15 3 NA NA NA 16 7 NA NA
I have a dataframe with 3 columns: participant ID, questionID, and a column indicating whether they gave the correct response (1) or not (0).
It looks like this:
> head(df)
# A tibble: 6 x 3
ID questionID correct
<dbl> <int> <dbl>
1 1 1 1
2 2 2 0
3 3 3 1
4 4 4 0
5 5 5 0
6 6 6 1
It can be recreated using:
library(tibble)
set.seed(0)
df <- tibble(ID = seq(1, 100, 1),
             questionID = rep(seq(1, 10), 10),
             correct = base::sample(c(0, 1), size = 100, replace = TRUE))
Now I would like each question to have its own column, with the ultimate goal of fitting a 2PL model to the data. For that purpose, the data should have 1 row per participant and 11 columns (ID plus 10 question columns).
How do I achieve this?
You can use pivot_wider from the tidyr package:
library(tidyr)
df %>%
pivot_wider(names_from = questionID,
values_from = correct,
names_prefix = "questionID_")
# A tibble: 100 x 11
ID questionID_1 questionID_2 questionID_3 questionID_4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 NA NA NA
2 2 NA 0 NA NA
3 3 NA NA 0 NA
4 4 NA NA NA 1
5 5 NA NA NA NA
6 6 NA NA NA NA
7 7 NA NA NA NA
8 8 NA NA NA NA
9 9 NA NA NA NA
10 10 NA NA NA NA
# ... with 90 more rows, and 6 more variables: questionID_5 <dbl>,
# questionID_6 <dbl>, questionID_7 <dbl>, questionID_8 <dbl>,
# questionID_9 <dbl>, questionID_10 <dbl>
Using data.table, you can use dcast():
df <- data.frame(ID=c(1,2,3,4,5,6), questionID= c(1,22,13,4,35,8),correct=c(1,0,1,0,0,1))
df
ID questionID correct
1 1 1 1
2 2 22 0
3 3 13 1
4 4 4 0
5 5 35 0
6 6 8 1
library(data.table)
setDT(df)
dcast(df,ID~questionID,value.var="correct")
ID 1 4 8 13 22 35
1: 1 1 NA NA NA NA NA
2: 2 NA NA NA NA 0 NA
3: 3 NA NA NA 1 NA NA
4: 4 NA 0 NA NA NA NA
5: 5 NA NA NA NA NA 0
6: 6 NA NA 1 NA NA NA
# to replace NA with whatever you want, assign the reshaped result first
wide <- dcast(df, ID ~ questionID, value.var = "correct")
wide[is.na(wide)] <- "-"
If I have data such as
idx<-c("1_1_2015_0_00_00","1_1_2015_0_10_00","1_1_2015_0_30_00","1_1_2015_0_40_00","1_1_2015_0_60_00","1_1_2015_0_80_00")
rr<-c(2,3,4,1,5,6)
no<-seq(1,6)
dat<-data.frame(no,idx,rr)
Then I want to pair it with a standard index:
id<-c("1_1_2015_0_00_00","1_1_2015_0_10_00","1_1_2015_0_20_00","1_1_2015_0_30_00","1_1_2015_0_40_00","1_1_2015_0_50_00","1_1_2015_0_60_00","1_1_2015_0_70_00","1_1_2015_0_80_00")
so that I end up with NA rows wherever my data has no entry for the standard index, like this:
no idx rr
1 1 1_1_2015_0_00_00 2
2 2 1_1_2015_0_10_00 3
3 NA NA NA
4 3 1_1_2015_0_30_00 4
5 4 1_1_2015_0_40_00 1
6 NA NA NA
7 5 1_1_2015_0_60_00 5
8 NA NA NA
9 6 1_1_2015_0_80_00 6
How can I get this?
You can use match:
dat[match(id, dat$idx), ]
# no idx rr
#1 1 1_1_2015_0_00_00 2
#2 2 1_1_2015_0_10_00 3
#NA NA <NA> NA
#3 3 1_1_2015_0_30_00 4
#4 4 1_1_2015_0_40_00 1
#NA.1 NA <NA> NA
#5 5 1_1_2015_0_60_00 5
#NA.2 NA <NA> NA
#6 6 1_1_2015_0_80_00 6
match(id, dat$idx) returns
#[1] 1 2 NA 3 4 NA 5 NA 6
and we use this vector to select rows of dat.
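If you also want clean sequential row numbers (1 to 9, as in the desired output) instead of the NA/NA.1 row names, you can reset them afterwards (a small follow-up):
out <- dat[match(id, dat$idx), ]
rownames(out) <- NULL   # row names become 1..9 again
out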