How to only read real values in a set - r

This might be a simple question but I haven't figured it out.
I am writing a simple loop.
v6<-c()
> aa
[1] 1 4 8 9 10 12 15 16 17 18 19 20 21 25 29 30 38
[18] 39 46 47 48 49 52 53 54 60 65 69 73 75 81 82 83 85
[35] 86 87 90 91 92 94 96 97 98 99 100 101 104 105 106 110 112
[52] 113 114 116 117 118 119 122 125 126 128 129
for (i in aa){
v6[i]<-sum(as.numeric(Sep1$Units[Sep1$ID==i]))
}
> v6
[1] 3800 NA NA 2600 NA NA NA 7700 13500 11900 NA
[12] 15600 NA NA 2000 17700 9600 11600 3400 11200 6600 NA
[23] NA NA 6000 NA NA NA 8800 2400 NA NA NA
[34] NA NA NA NA 2600 4500 NA NA NA NA NA
[45] NA 23400 36000 4000 5100 NA NA 9200 5400 7000 NA
[56] NA NA NA NA 5000 NA NA NA NA 60000 NA
[67] NA NA 7200 NA NA NA 20000 NA 39600 NA NA
[78] NA NA NA 23600 1600 10600 NA 39000 1000 6200 NA
[89] NA 3000 100 1400 NA 12800 NA 5100 2000 32000 7000
[100] 10900 4800 NA NA 3200 14600 24000 NA NA NA 16200
[111] NA 5000 28800 16800 NA 2600 40000 800 8400 NA NA
[122] 18000 NA NA 24800 13600 NA 4600 11700
I realized R has read 1 through 129 instead of reading just "1, 4, 8, ...". I know I can use na.omit(v6) to remove all the NA values, but I am wondering whether there is a way to make R read just the values in "aa" instead of going through 1 to 129?
I don't know if I have explained my question well. Thanks

Generally, if you are using a for loop in R, there is often a vectorized way to do the same thing.
You need to provide test data in order for me to show that this works, but I believe the following statement will do what you want without a for loop:
v6[aa] <- sum(as.numeric(Sep1$Units[Sep1$ID %in% aa]))
The expression "v6[aa] <-" says "for the elements in the vector v6 at the positions in the vector aa, assign the values in the following vector to those positions."
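Note that sum() returns a single total, so the statement above would write that same total into every position listed in aa, and the positions not in aa would still be NA. A per-ID sum without the NA gaps could be sketched as follows. The Sep1 data here is a made-up stand-in (the real data was not posted), and v6_alt is a hypothetical name.

```r
# Hypothetical stand-in for the asker's data; the real Sep1 was not posted.
Sep1 <- data.frame(
  ID    = c(1, 1, 4, 8, 8, 9),
  Units = c("100", "200", "50", "10", "20", "5"),
  stringsAsFactors = FALSE
)
aa <- c(1, 4, 8, 9)

# Option 1: index the result by position in aa, not by the ID value itself,
# so v6 has exactly length(aa) elements and no NA gaps.
v6 <- numeric(length(aa))
for (k in seq_along(aa)) {
  v6[k] <- sum(as.numeric(Sep1$Units[Sep1$ID == aa[k]]))
}
names(v6) <- aa  # keep the ID next to each sum

# Option 2: the same per-ID sums without an explicit loop.
v6_alt <- vapply(
  aa,
  function(id) sum(as.numeric(Sep1$Units[Sep1$ID == id])),
  numeric(1)
)
```

The original loop produced NA because v6[i] used the ID value as the position, so assigning to position 129 forced R to create 129 elements.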

Related

How to create a column with percentages for specific values for a group in R [duplicate]

I have this dataset
dat = structure(list(mdm = 7:8, price = c(100L, 200L), count = c(200L, 300L)),
class = "data.frame", row.names = c(NA, -2L))
I need to transform this data by adding a column with percentages for each mdm group. Each group should have a perc column with values
50, 60, 70, 80, 85, 90, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 110, 115, 120, 130, 140, 150
where the row with perc equal to 100 should hold the price and count values for that mdm group from the dat dataset.
Desired output:
perc price count
50 NA NA
60 NA NA
70 NA NA
80 NA NA
85 NA NA
90 NA NA
95 NA NA
96 NA NA
97 NA NA
98 NA NA
99 NA NA
**100 100 200**
101 NA NA
102 NA NA
103 NA NA
104 NA NA
105 NA NA
110 NA NA
115 NA NA
120 NA NA
130 NA NA
140 NA NA
150 NA NA
50 NA NA
60 NA NA
70 NA NA
80 NA NA
85 NA NA
90 NA NA
95 NA NA
96 NA NA
97 NA NA
98 NA NA
99 NA NA
**100 200 300**
101 NA NA
102 NA NA
103 NA NA
104 NA NA
105 NA NA
110 NA NA
115 NA NA
120 NA NA
130 NA NA
140 NA NA
150 NA NA
For mdm=7, price and count are equal to 100 and 200, so they go in the row where perc is 100.
For mdm=8, price and count are equal to 200 and 300, so they go in the row where perc is 100.
What is the easy way to do it? Thank you for your help.
Here is a tidyverse approach:
library(dplyr)
library(tidyr)
vector <- paste(c(50, 60, 70, 80, 85, 90, 95:105, 110, 115, 120, 130, 140, 150), collapse = ", ")
dat %>%
  group_by(mdm) %>%
  mutate(perc = vector) %>%
  separate_rows(perc, sep = ",", convert = TRUE) %>%
  ungroup() %>%
  select(perc, price, count) %>%
  mutate(across(-perc, ~ifelse(perc == 100, ., NA_real_))) %>%
  print(n = 50)
perc price count
<int> <dbl> <dbl>
1 50 NA NA
2 60 NA NA
3 70 NA NA
4 80 NA NA
5 85 NA NA
6 90 NA NA
7 95 NA NA
8 96 NA NA
9 97 NA NA
10 98 NA NA
11 99 NA NA
12 100 100 200
13 101 NA NA
14 102 NA NA
15 103 NA NA
16 104 NA NA
17 105 NA NA
18 110 NA NA
19 115 NA NA
20 120 NA NA
21 130 NA NA
22 140 NA NA
23 150 NA NA
24 50 NA NA
25 60 NA NA
26 70 NA NA
27 80 NA NA
28 85 NA NA
29 90 NA NA
30 95 NA NA
31 96 NA NA
32 97 NA NA
33 98 NA NA
34 99 NA NA
35 100 200 300
36 101 NA NA
37 102 NA NA
38 103 NA NA
39 104 NA NA
40 105 NA NA
41 110 NA NA
42 115 NA NA
43 120 NA NA
44 130 NA NA
45 140 NA NA
46 150 NA NA
You can use complete from tidyr:
tidyr::complete(
  cbind(dat, perc = 100),
  mdm, perc = c(50, 60, 70, 80, 85, 90, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 110, 115, 120, 130, 140, 150)
)
Output:
# A tibble: 46 × 4
mdm perc price count
<int> <dbl> <int> <int>
1 7 50 NA NA
2 7 60 NA NA
3 7 70 NA NA
4 7 80 NA NA
5 7 85 NA NA
6 7 90 NA NA
7 7 95 NA NA
8 7 96 NA NA
9 7 97 NA NA
10 7 98 NA NA
11 7 99 NA NA
12 7 100 100 200
13 7 101 NA NA
14 7 102 NA NA
15 7 103 NA NA
16 7 104 NA NA
17 7 105 NA NA
18 7 110 NA NA
19 7 115 NA NA
20 7 120 NA NA
21 7 130 NA NA
22 7 140 NA NA
23 7 150 NA NA
24 8 50 NA NA
25 8 60 NA NA
26 8 70 NA NA
27 8 80 NA NA
28 8 85 NA NA
29 8 90 NA NA
30 8 95 NA NA
31 8 96 NA NA
32 8 97 NA NA
33 8 98 NA NA
34 8 99 NA NA
35 8 100 200 300
36 8 101 NA NA
37 8 102 NA NA
38 8 103 NA NA
39 8 104 NA NA
40 8 105 NA NA
41 8 110 NA NA
42 8 115 NA NA
43 8 120 NA NA
44 8 130 NA NA
45 8 140 NA NA
46 8 150 NA NA
Alternatively, in base R, merge it with an expand.grid:
merge(cbind(dat, perc = 100),
      expand.grid(mdm = unique(dat$mdm), perc = c(50, 60, 70, 80, 85, 90, 95, 96, 97,
                                                  98, 99, 100, 101, 102, 103, 104, 105,
                                                  110, 115, 120, 130, 140, 150)),
      all = TRUE)
# mdm perc price count
# 1 7 50 NA NA
# 2 7 60 NA NA
# 3 7 70 NA NA
# 4 7 80 NA NA
# 5 7 85 NA NA
# 6 7 90 NA NA
# 7 7 95 NA NA
# 8 7 96 NA NA
# 9 7 97 NA NA
# 10 7 98 NA NA
# 11 7 99 NA NA
# 12 7 100 100 200
# 13 7 101 NA NA
# 14 7 102 NA NA
# 15 7 103 NA NA
# 16 7 104 NA NA
# 17 7 105 NA NA
# 18 7 110 NA NA
# 19 7 115 NA NA
# 20 7 120 NA NA
# 21 7 130 NA NA
# 22 7 140 NA NA
# 23 7 150 NA NA
# 24 8 50 NA NA
# 25 8 60 NA NA
# 26 8 70 NA NA
# 27 8 80 NA NA
# 28 8 85 NA NA
# 29 8 90 NA NA
# 30 8 95 NA NA
# 31 8 96 NA NA
# 32 8 97 NA NA
# 33 8 98 NA NA
# 34 8 99 NA NA
# 35 8 100 200 300
# 36 8 101 NA NA
# 37 8 102 NA NA
# 38 8 103 NA NA
# 39 8 104 NA NA
# 40 8 105 NA NA
# 41 8 110 NA NA
# 42 8 115 NA NA
# 43 8 120 NA NA
# 44 8 130 NA NA
# 45 8 140 NA NA
# 46 8 150 NA NA

Scraping a Table into R using XML package

I am trying to scrape this table into R.
I am reading in the data using the XML library with the following command.
acsi <- htmlParse("https://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Wireless+Telephone+Service")
However, I immediately get this: Warning: XML content does not seem to be XML: 'ss+Telephone+Service'. What am I doing wrong? Why isn't my table reading in properly?
Not sure about the package you tried, but here's a way to do it using rvest.
library(rvest)
raw <- read_html("https://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Wireless+Telephone+Service")
df <- raw %>% html_nodes("table") %>% html_table()
head(df)
> head(df)
[[1]]
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15
1 Base-line 95 96 97 98 99 0 1 2 3 04 05 06 07
2 All Others NA NA NA NA NA NA NA NA NA 70 65 68 68
3 TracFone Wireless NA NA NA NA NA NA NA NA NA NM NM NM NM
4 T-Mobile NA NA NA NA NA NA NA NA NA NM 64 69 70
5 Verizon Wireless NA NA NA NA NA NA NA NA NA 68 67 69 71
6 Wireless Telephone Service NA NA NA NA NA NA NA NA NA 65 63 66 68
7 AT&T NA NA NA NA NA NA NA NA NA 63 62 63 68
8 U.S. Cellular NA NA NA NA NA NA NA NA NA NM NM NM NM
9 Sprint (T-Mobile) NA NA NA NA NA NA NA NA NA 59 63 63 61
10 Nextel Communications NA NA NA NA NA NA NA NA NA NM 59 #
11 AT&T Wireless NA NA NA NA NA NA NA NA NA 61 #
12 Sprint NA NA NA NA NA NA NA NA NA 59 63 63 61
X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
1 08 09 10 11 12 13 14 15 16 17 18 19 20 21 PreviousYear%Change
2 71 73 76 77 76 78 78 79 77 79 80 81 77 NA -4.9
3 NM NM NM NM NM NM NM 77 75 77 78 78 76 NA -2.6
4 71 71 73 70 69 68 69 70 74 73 76 76 75 NA -1.3
5 72 74 73 72 70 73 75 71 71 74 74 74 74 NA 0.0
6 68 69 72 71 70 72 72 70 71 73 74 75 74 NA -1.3
7 71 67 69 66 69 70 68 70 71 72 74 74 74 NA 0.0
8 NM NM NM NM NM NM NM NM 72 74 74 74 71 NA -4.1
9 56 63 70 72 71 71 68 65 70 73 70 69 70 NA 1.4
10 NA NA NA NA NA NA N/A
11 NA NA NA NA NA NA N/A
12 56 63 70 72 71 71 68 65 70 73 70 69 NA NA -1.4
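In the output above, the year labels sit in the first data row rather than in the column names. One way to promote that row to a header is sketched below in base R on a tiny made-up table. The values, the "company" label and the "y" prefix are assumptions for illustration, not taken from the page.

```r
# Hypothetical miniature of a scraped table whose first row holds the real header.
tbl <- data.frame(
  X1 = c("", "AT&T", "Sprint"),
  X2 = c("04", "63", "59"),
  X3 = c("05", "62", "63"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names ("company" is a made-up label for X1).
names(tbl) <- c("company", paste0("y", unlist(tbl[1, -1])))
tbl <- tbl[-1, ]
rownames(tbl) <- NULL

# Convert the score columns from character to numeric.
tbl[-1] <- lapply(tbl[-1], as.numeric)
```

On the real table, entries such as NM or # would become NA under as.numeric (with a warning), so they may need handling first.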

How to replicate a vector in R

I need to replicate the vector in such a way that the numbers change because currently I only replicate the same numbers.
example:
> rep(c(sample(c(1:100),5, replace = T),sample(NA ,5, replace = T)), 2)
[1] 33 91 48 18 29 NA NA NA NA NA 33 91 48 18 29 NA NA NA NA NA
I would like
[1] 33 91 48 18 29 NA NA NA NA NA 23 45 27 67 55 NA NA NA NA NA
You even had the function name in the title :)
mat <-
  replicate(2, c(sample(c(1:100), 5, replace = T), sample(NA, 5, replace = T)))
mat
# [,1] [,2]
# [1,] 6 40
# [2,] 86 37
# [3,] 2 81
# [4,] 35 57
# [5,] 12 15
# [6,] NA NA
# [7,] NA NA
# [8,] NA NA
# [9,] NA NA
# [10,] NA NA
c(mat)
# [1] 6 86 2 35 12 NA NA NA NA NA 40 37 81 57 15 NA NA NA NA NA
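The reason rep() repeats the same numbers is that its argument is evaluated once, before any copying, whereas replicate() re-evaluates the expression on every repetition. A minimal sketch of the difference follows. make_block is a hypothetical helper name and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

# One block: 5 random draws followed by 5 NAs.
make_block <- function() c(sample(1:100, 5, replace = TRUE), rep(NA, 5))

# rep() receives an already-evaluated vector, so it copies the same draws.
repeated <- rep(make_block(), 2)

# replicate() re-evaluates the call each time, so each block is drawn afresh;
# as.vector() flattens the resulting 10 x 2 matrix column by column.
fresh <- as.vector(replicate(2, make_block()))
```

Here rep(NA, 5) stands in for sample(NA, 5, replace = T) from the question, since both give five NA values.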
as.vector(rbind(matrix(sample(c(1:100), 200, replace = T),5,40),matrix(NA,5,40)))
[1] 30 93 2 72 78 NA NA NA NA NA 36 90 40 37 72 NA NA NA NA NA 56 71 100 100 73 NA NA NA NA
[30] NA 27 41 15 57 38 NA NA NA NA NA 62 6 4 35 99 NA NA NA NA NA 77 57 71 25 31 NA NA NA
[59] NA NA 37 92 28 62 20 NA NA NA NA NA 29 42 60 65 28 NA NA NA NA NA 78 31 12 93 80 NA NA
[88] NA NA NA 44 74 98 26 33 NA NA NA NA NA 4 53 86 89 24 NA NA NA NA NA 37 15 14 81 82 NA
[117] NA NA NA NA 97 96 72 53 56 NA NA NA NA NA 71 91 50 73 20 NA NA NA NA NA 98 93 75 2 3
[146] NA NA NA NA NA 38 15 28 55 69 NA NA NA NA NA 92 78 37 43 81 NA NA NA NA NA 1 90 45 97
[175] 83 NA NA NA NA NA 90 23 68 80 91 NA NA NA NA NA 57 52 80 34 93 NA NA NA NA NA 35 74 70
[204] 60 39 NA NA NA NA NA 49 97 87 62 33 NA NA NA NA NA 35 11 13 50 60 NA NA NA NA NA 90 90
[233] 40 34 68 NA NA NA NA NA 56 25 38 81 88 NA NA NA NA NA 73 45 94 73 75 NA NA NA NA NA 22
[262] 96 3 51 19 NA NA NA NA NA 33 52 4 77 60 NA NA NA NA NA 65 64 53 5 44 NA NA NA NA NA
[291] 35 23 29 35 36 NA NA NA NA NA 73 99 35 20 22 NA NA NA NA NA 41 86 83 18 44 NA NA NA NA
[320] NA 39 29 91 36 32 NA NA NA NA NA 95 51 81 51 52 NA NA NA NA NA 89 73 21 21 79 NA NA NA
[349] NA NA 64 88 78 71 59 NA NA NA NA NA 91 90 30 58 15 NA NA NA NA NA 64 6 34 21 1 NA NA
[378] NA NA NA 17 77 62 45 90 NA NA NA NA NA 40 66 41 8 25 NA NA NA NA NA
It takes an additional line, but it gets the job done:
fun <- function() c(sample(c(1:100),5, replace = T), sample(NA ,5, replace = T))
c(fun(), fun())

R data table interval calculation by column

I need to count unique IDs within intervals of different lengths (3, 4, 5, 6 months, ...) for each month. I also need to do this for different groups, such as age and gender. This is what my data looks like:
ID Yr_month Age Gender
11 2012-01 30 M
11 2012-02 30 M
...
11 2012-12 30 M
12 2012-01 32 F...
The output should look like this:
Yr_month cnt_distinctID_3 count_distinctID_4....
2012-01 300 400
I am able to do this using multiple for loops and dplyr. Is there a faster way using data table to get this done? Thanks!
This is what my code looks like:
setorderv(test,c("id","year_mth"))
setkeyv(test,c("id"))
test <- data.table(cbind(test, first=0L))
test[test[unique(test),,mult="first", which=TRUE], first:=1L]
test1 <- test %>%
group_by(year_mth) %>%
summarize(first_total = sum(first)) %>%
select(year_mth,first_total)
test2 <- test1 %>%
arrange(year_mth) %>%
mutate(Cusum = cumsum(first_total)) %>%
select(year_mth, Cusum)
Then I run a for loop over year_mth and K<- seq(3:36) on the above. It's taking a lot of time because I am running it on a big dataset.
If I understand the question correctly, the OP wants to count unique IDs in rolling windows of varying sizes. The counts are to be presented in a table where the length of the rolling window runs horizontally and the ending month of the rolling window vertically.
This approach creates all intervals as a data.table and aggregates during a non-equi join with the dataset. Finally, the results are reshaped from long to wide format.
Creating a sample dataset
The OP has not provided a sample dataset. So, we have to make up our own:
library(data.table)
# create year-month sequence
yr_m <- CJ(2012:2014, 1:12)[, sprintf("%4i-%02i", V1, V2)]
n_id <- 100L # number of individual IDs
n_row <- 1e3L # number of rows to create
set.seed(123L) # required for reproducible results
DT <- data.table(ID = sample.int(n_id, n_row, TRUE),
Yr_month = ordered(sample(yr_m, n_row, TRUE), yr_m))
str(DT)
Classes ‘data.table’ and 'data.frame': 1000 obs. of 2 variables:
$ ID : int 29 79 41 89 95 5 53 90 56 46 ...
$ Yr_month: Ord.factor w/ 36 levels "2012-01"<"2012-02"<..: 10 22 6 31 31 18 28 11 3 16 ...
- attr(*, ".internal.selfref")=<externalptr>
Note that Yr_month has turned into a factor which is required for the subsequent non-equi join which involves comparison operations.
Create intervals
intervals <- rbindlist(
  lapply(3:24, function(x) data.table(K = x,
                                      start = head(yr_m, -(x - 1L)),
                                      end = tail(yr_m, -(x - 1L)))
  ))
For illustration, only intervals of 3 to 24 months length are considered here.
intervals
K start end
1: 3 2012-01 2012-03
2: 3 2012-02 2012-04
3: 3 2012-03 2012-05
4: 3 2012-04 2012-06
5: 3 2012-05 2012-07
---
513: 24 2012-09 2014-08
514: 24 2012-10 2014-09
515: 24 2012-11 2014-10
516: 24 2012-12 2014-11
517: 24 2013-01 2014-12
Aggregate during non-equi join and reshape
DT[intervals, on = .(Yr_month >= start, Yr_month <= end),
   .(count = uniqueN(ID), end, K), by = .EACHI][
     , dcast(.SD, end ~ K, value.var = "count")]
end 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
1: 2012-03 59 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2: 2012-04 53 64 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3: 2012-05 59 69 80 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
4: 2012-06 57 72 78 88 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
5: 2012-07 53 62 75 80 89 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
6: 2012-08 50 65 71 81 86 91 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
7: 2012-09 58 65 71 76 84 89 93 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
8: 2012-10 59 67 72 77 82 88 92 94 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
9: 2012-11 57 66 72 77 82 86 91 94 96 NA NA NA NA NA NA NA NA NA NA NA NA NA
10: 2012-12 57 67 75 80 83 88 91 95 97 98 NA NA NA NA NA NA NA NA NA NA NA NA
11: 2013-01 53 63 71 78 83 85 90 93 97 98 99 NA NA NA NA NA NA NA NA NA NA NA
12: 2013-02 57 68 77 82 87 91 92 95 97 97 98 99 NA NA NA NA NA NA NA NA NA NA
13: 2013-03 56 67 75 83 86 88 92 93 96 97 97 98 99 NA NA NA NA NA NA NA NA NA
14: 2013-04 57 67 76 81 87 90 92 95 96 98 99 99 100 100 NA NA NA NA NA NA NA NA
15: 2013-05 65 74 79 83 86 90 93 95 97 98 99 99 99 100 100 NA NA NA NA NA NA NA
16: 2013-06 71 77 83 85 87 89 92 95 97 98 99 99 99 99 100 100 NA NA NA NA NA NA
17: 2013-07 65 78 83 88 90 91 91 94 96 97 98 99 99 99 99 100 100 NA NA NA NA NA
18: 2013-08 57 73 84 88 91 93 94 94 97 99 99 99 100 100 100 100 100 100 NA NA NA NA
19: 2013-09 62 71 81 90 92 95 96 96 96 97 99 99 99 100 100 100 100 100 100 NA NA NA
20: 2013-10 62 71 79 87 93 95 98 98 98 98 98 99 99 99 100 100 100 100 100 100 NA NA
21: 2013-11 61 74 81 87 91 95 96 99 99 99 99 99 100 100 100 100 100 100 100 100 100 NA
22: 2013-12 64 76 83 88 93 96 98 99 99 99 99 99 99 100 100 100 100 100 100 100 100 100
23: 2014-01 56 70 78 84 89 94 96 98 99 99 99 99 99 99 100 100 100 100 100 100 100 100
24: 2014-02 52 67 76 83 88 90 95 96 98 99 99 99 99 99 99 100 100 100 100 100 100 100
25: 2014-03 51 62 72 80 85 89 91 95 96 98 99 99 99 99 99 99 100 100 100 100 100 100
26: 2014-04 58 62 71 76 83 87 90 92 96 97 99 99 99 99 99 99 99 100 100 100 100 100
27: 2014-05 60 67 70 78 82 88 90 92 94 97 98 99 99 99 99 99 99 99 100 100 100 100
28: 2014-06 58 74 78 80 85 88 93 93 94 94 97 98 99 99 99 99 99 99 99 100 100 100
29: 2014-07 60 70 81 83 85 88 90 94 94 95 95 98 99 100 100 100 100 100 100 100 100 100
30: 2014-08 64 71 79 89 91 91 93 94 96 96 96 96 99 99 100 100 100 100 100 100 100 100
31: 2014-09 57 68 74 82 92 94 94 94 95 96 96 96 96 99 99 100 100 100 100 100 100 100
32: 2014-10 57 67 74 79 87 96 97 97 97 97 98 98 98 98 100 100 100 100 100 100 100 100
33: 2014-11 48 63 71 77 82 89 97 98 98 98 98 99 99 99 99 100 100 100 100 100 100 100
34: 2014-12 52 61 71 77 82 86 91 99 99 99 99 99 99 99 99 99 100 100 100 100 100 100
end 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
uniqueN() is a data.table function which is used here to count the number of unique IDs.
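To make the non-equi join step easier to follow, here is a much smaller sketch of the same idea. The data, the integer month index and the names windows and n_ids are made up for illustration. Only one window length is used, so no reshaping is needed.

```r
library(data.table)

# Tiny made-up dataset; month is an integer index to keep the comparison simple.
DT <- data.table(ID    = c(1L, 2L, 1L, 3L, 2L, 4L),
                 month = c(1L, 1L, 2L, 3L, 4L, 4L))

# All 3-month windows over months 1..4.
windows <- data.table(start = 1:2, end = 3:4)

# Non-equi join: for each window (by = .EACHI), count the distinct IDs
# whose month falls between start and end, inclusive.
res <- DT[windows, on = .(month >= start, month <= end),
          .(n_ids = uniqueN(ID)), by = .EACHI]
```

With this data, the window 1 to 3 contains IDs 1, 2 and 3, and the window 2 to 4 contains IDs 1, 2, 3 and 4, so n_ids is 3 and 4.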

Modified: Replacing values of rows with identical rownames in a dataframes

I have a data frame in which a few rows share identical row names. I want to replace the NAs in every second such row with the non-NA values from the immediately preceding row of the same name. If the second row already holds a value, it should not be affected.
Please see below:
df:
date 1 1 2 3 3
20040101 100 150 NA NA 140
20040115 200 NA 200 NA NA
20040131 170 NA NA NA NA
20040131 NA 165 180 190 190
20040205 NA NA NA NA NA
20040228 140 145 165 150 155
20040228 NA NA NA NA NA
20040301 150 155 170 150 160
20040315 NA NA 180 190 200
20040331 NA 145 160 NA NA
20040331 NA NA NA 175 180
I want the resulting data frame to be:
df_new:
date 1 1 2 3 3
20040101 100 150 NA NA 140
20040115 200 NA 200 NA NA
20040131 170 165 180 190 190
20040205 NA NA NA NA NA
20040228 140 145 165 150 155
20040301 150 155 170 150 160
20040315 NA NA 180 190 200
20040331 NA 145 160 175 180
I have tried the following for loop, but results are not as desired:
for (i in 2:nrow(df)) {
if(all(is.na(df[i, ]))){ df[i, ] = fill[(i-1), ]}
out[i, ]<- df[i-1,ncol]
}
Please guide me in this regard.
Thanks
Saba
Here is an option using data.table. We place the datasets in a list, combine them into a single data.table using rbindlist, group by 'date', loop through the columns (lapply(.SD, ...)) and subset the non-NA elements.
library(data.table)
unique(rbindlist(list(df1, df2))[, lapply(.SD, function(x)
  if (all(is.na(x))) x else x[!is.na(x)]), date])
# date X11A X11A.1 X21B X3CC X3CC.1
#1: 20040101 100 150 NA NA 140
#2: 20040115 200 NA 200 NA NA
#3: 20040131 170 165 180 190 190
#4: 20040205 NA NA NA NA NA
#5: 20040228 140 145 165 150 155
#6: 20040301 150 155 170 150 160
#7: 20040315 NA NA 180 190 200
#8: 20040331 NA 145 160 175 180
Since the OP mentioned using a for loop and which, another data.table option that uses both of them together with set would be
setDT(df1)
dfN <- setDT(df2)[df1, on = "date"]
for (j in 2:ncol(df1)) {
  set(df1, i = which(is.na(df1[[j]])), j = j,
      value = dfN[[j]][is.na(df1[[j]])])
}
df1
# date X11A X11A.1 X21B X3CC X3CC.1
#1: 20040101 100 150 NA NA 140
#2: 20040115 200 NA 200 NA NA
#3: 20040131 170 165 180 190 190
#4: 20040205 NA NA NA NA NA
#5: 20040228 140 145 165 150 155
#6: 20040301 150 155 170 150 160
#7: 20040315 NA NA 180 190 200
#8: 20040331 NA 145 160 175 180
An alternate solution using data.table:
library(data.table)
setDT(df)
df[,lapply(.SD,mean,na.rm=T),by=date]
## date X11A X11A.1 X21B X3CC X3CC.1
##1: 20040101 100 150 NaN NaN 140
##2: 20040115 200 NaN 200 NaN NaN
##3: 20040131 170 165 180 190 190
##4: 20040205 NaN NaN NaN NaN NaN
##5: 20040228 140 145 165 150 155
##6: 20040301 150 155 170 150 160
##7: 20040315 NaN NaN 180 190 200
##8: 20040331 NaN 145 160 175 180
Assumption: if several rows occur for a single date, I am assuming that each column has at most one non-NA value among them; the rest are NA.
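The mean-based version returns NaN for groups that are entirely NA, because mean() of an empty vector is NaN. A variant that keeps NA and takes the first non-missing value per column could be sketched like this. The small table, the column names a and b, and the helper name first_non_na are hypothetical.

```r
library(data.table)

# Small made-up table with one duplicated date; column names are placeholders.
df <- data.table(date = c(20040131, 20040131, 20040205),
                 a    = c(170, NA, NA),
                 b    = c(NA, 165, NA))

# Per date and column: the first non-NA value, or NA if the whole group is NA.
first_non_na <- function(x) if (all(is.na(x))) x[1] else x[!is.na(x)][1]

collapsed <- df[, lapply(.SD, first_non_na), by = date]
```

Unlike mean, this also leaves the values untouched if a column happens to hold two different non-NA values for one date; it simply keeps the first.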

Resources