I am trying to scrape this table into R.
I am reading in the data with the XML package, using the following command:
acsi <- htmlParse("https://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Wireless+Telephone+Service")
However, I immediately get this: Warning: XML content does not seem to be XML: 'ss+Telephone+Service'. What am I doing wrong? Why isn't my table reading in properly?
Not sure about the package you tried, but here's a way to do it using rvest.
library(rvest)
raw <- read_html("https://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Wireless+Telephone+Service")
df <- raw %>% html_nodes("table") %>% html_table()
head(df)
> head(df)
[[1]]
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15
1 Base-line 95 96 97 98 99 0 1 2 3 04 05 06 07
2 All Others NA NA NA NA NA NA NA NA NA 70 65 68 68
3 TracFone Wireless NA NA NA NA NA NA NA NA NA NM NM NM NM
4 T-Mobile NA NA NA NA NA NA NA NA NA NM 64 69 70
5 Verizon Wireless NA NA NA NA NA NA NA NA NA 68 67 69 71
6 Wireless Telephone Service NA NA NA NA NA NA NA NA NA 65 63 66 68
7 AT&T NA NA NA NA NA NA NA NA NA 63 62 63 68
8 U.S. Cellular NA NA NA NA NA NA NA NA NA NM NM NM NM
9 Sprint (T-Mobile) NA NA NA NA NA NA NA NA NA 59 63 63 61
10 Nextel Communications NA NA NA NA NA NA NA NA NA NM 59 #
11 AT&T Wireless NA NA NA NA NA NA NA NA NA 61 #
12 Sprint NA NA NA NA NA NA NA NA NA 59 63 63 61
X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
1 08 09 10 11 12 13 14 15 16 17 18 19 20 21 PreviousYear%Change
2 71 73 76 77 76 78 78 79 77 79 80 81 77 NA -4.9
3 NM NM NM NM NM NM NM 77 75 77 78 78 76 NA -2.6
4 71 71 73 70 69 68 69 70 74 73 76 76 75 NA -1.3
5 72 74 73 72 70 73 75 71 71 74 74 74 74 NA 0.0
6 68 69 72 71 70 72 72 70 71 73 74 75 74 NA -1.3
7 71 67 69 66 69 70 68 70 71 72 74 74 74 NA 0.0
8 NM NM NM NM NM NM NM NM 72 74 74 74 71 NA -4.1
9 56 63 70 72 71 71 68 65 70 73 70 69 70 NA 1.4
10 NA NA NA NA NA NA N/A
11 NA NA NA NA NA NA N/A
12 56 63 70 72 71 71 68 65 70 73 70 69 NA NA -1.4
I am trying to replace NAs across multiple columns with the corresponding values from other columns in the df.
df = data.frame(ID = sample(1000:9999,10),
Age = sample(18:99,10),
Gender = sample(c("M","F"),10, replace = TRUE),
Test1 = sample(60:100,10),
Test2 = sample(60:100,10),
Test3 = sample(60:100,10),
Test1.x = rep(NA,10),
Test2.x = rep(NA,10),
Test3.x = rep(NA,10))
df$Test1[c(2,3,8)] = NA
df$Test2[c(4,10)] = NA
df$Test3[c(1,7)] = NA
df$Test1.x[c(2,3,4,8)] = sample(60:100,4)
df$Test2.x[c(4,9,10)] = sample(60:100,3)
df$Test3.x[c(1,6,7)] = sample(60:100,3)
print(df)
ID Age Gender Test1 Test2 Test3 Test1.x Test2.x Test3.x
1 7877 40 M 78 70 NA NA NA 84
2 6345 54 F NA 99 61 62 NA NA
3 9170 41 F NA 80 96 82 NA NA
4 2400 83 M 100 NA 100 94 95 NA
5 5920 66 M 77 62 69 NA NA NA
6 2569 34 M 99 96 81 NA NA 100
7 7879 28 M 64 71 NA NA NA 90
8 8652 53 F NA 74 89 95 NA NA
9 6357 97 F 92 86 83 NA 86 NA
10 1943 45 M 95 NA 98 NA 72 NA
I would like to replace only the NAs in the test scores with the corresponding test.x score, while using str_replace. My actual data frame contains more than 3 such columns, but each corresponding column has the same name with ".x" appended.
Any ideas to make this quick and easy? I'm torn between mutating across said columns and using replace_na.
Within dplyr we could use coalesce with across.
library(dplyr)
df |>
mutate(across(starts_with("Test") & !ends_with(".x"),
~ coalesce(., get(paste0(cur_column(), ".x")))))
Output (from a different random draw than the data printed above, since no seed was set):
ID Age Gender Test1 Test2 Test3 Test1.x Test2.x Test3.x
1 5022 90 M 94 68 79 NA NA 79
2 1625 41 M 71 66 89 71 NA NA
3 6438 86 M 86 94 94 86 NA NA
4 3249 93 F 74 90 76 68 90 NA
5 7338 70 F 64 63 70 NA NA NA
6 9416 27 F 78 74 75 NA NA 64
7 4374 45 F 82 100 60 NA NA 60
8 6226 21 F 61 82 63 61 NA NA
9 5265 97 M 83 83 68 NA 89 NA
10 5441 95 M 70 79 99 NA 79 NA
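The same fill can be sketched in base R without dplyr. This assumes every ".x" column has a partner whose name is the same minus the suffix. The tiny data frame below uses fixed made-up values rather than the question's random draws:

```r
# Fixed toy data in the shape of the question (no random sampling).
df <- data.frame(Test1   = c(78, NA, NA),
                 Test2   = c(70, 99, NA),
                 Test1.x = c(NA, 62, 82),
                 Test2.x = c(NA, NA, 95))

# Columns ending in ".x" and their partners; "$" strips only a trailing ".x".
x_cols    <- grep("\\.x$", names(df), value = TRUE)
base_cols <- sub("\\.x$", "", x_cols)
keep      <- base_cols %in% names(df)   # ignore ".x" columns without a partner
x_cols    <- x_cols[keep]
base_cols <- base_cols[keep]

# Fill each base column's NAs from its partner (the idea behind coalesce()).
for (i in seq_along(x_cols)) {
  miss <- is.na(df[[base_cols[i]]])
  df[[base_cols[i]]][miss] <- df[[x_cols[i]]][miss]
}
df
```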
Update 2/9:
To allow for other variable names, we could use either a specific solution or a more general one:
Specific:
df |>
mutate(across(c(HW, Exam, Final) & !ends_with(".x"),
~ coalesce(., get(paste0(cur_column(), ".x")))))
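General (note that this version writes the filled values into the .x columns, giving priority to the original column):
df |>
mutate(across(ends_with(".x"),
~ coalesce(get(sub("\\.x$", "", cur_column())), .)))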
New output:
ID Age Gender HW Exam Final HW.x Exam.x Final.x
1 5166 80 F 60 79 NA 60 79 64
2 3375 35 M NA 88 72 65 88 72
3 5722 19 F NA 65 75 81 65 75
4 3701 27 M 89 NA 61 89 89 61
5 1424 67 F 69 94 91 69 94 91
6 1407 20 F 75 72 66 75 72 66
7 2927 39 M 63 82 NA 63 82 86
8 7315 90 F NA 92 79 70 92 79
9 7420 76 F 87 83 87 87 83 87
10 9334 73 F 86 NA 64 86 82 64
New data:
df = data.frame(ID = sample(1000:9999,10),
Age = sample(18:99,10),
Gender = sample(c("M","F"),10, replace = TRUE),
HW = sample(60:100,10),
Exam = sample(60:100,10),
Final = sample(60:100,10),
HW.x = rep(NA,10),
Exam.x = rep(NA,10),
Final.x = rep(NA,10))
df$HW[c(2,3,8)] = NA
df$Exam[c(4,10)] = NA
df$Final[c(1,7)] = NA
df$HW.x[c(2,3,4,8)] = sample(60:100,4)
df$Exam.x[c(4,9,10)] = sample(60:100,3)
df$Final.x[c(1,6,7)] = sample(60:100,3)
Using dplyover:
library(dplyover)
df <- df %>%
mutate(across2(matches("Test\\d+$"), ends_with(".x"),
coalesce, .names = "{xcol}"))
Output:
df
ID Age Gender Test1 Test2 Test3 Test1.x Test2.x Test3.x
1 7877 40 M 78 70 84 NA NA 84
2 6345 54 F 62 99 61 62 NA NA
3 9170 41 F 82 80 96 82 NA NA
4 2400 83 M 100 95 100 94 95 NA
5 5920 66 M 77 62 69 NA NA NA
6 2569 34 M 99 96 81 NA NA 100
7 7879 28 M 64 71 90 NA NA 90
8 8652 53 F 95 74 89 95 NA NA
9 6357 97 F 92 86 83 NA 86 NA
10 1943 45 M 95 72 98 NA 72 NA
I am very confused by the R package Matrix.utils and its implementation of merge.Matrix(). I want to merge two matrices with 0 common values, but merge common column names and fill the rest with zeros.
The results are inconsistent and depend on whether merge() or merge.Matrix() is called. I expected this to behave like dplyr's join functions (e.g. full_join()), but it does not.
Simulating the data I plan to use:
mtx.x <- sample(1:100, 100) ; mtx.x <- matrix(mtx.x, nrow = 10)
mtx.y <- sample(1:100, 100) ; mtx.y <- matrix(mtx.y, nrow = 10)
colnames(mtx.x) <- letters[1:10] ; colnames(mtx.y) <- letters[6:15]
mtx.x ; mtx.y
a b c d e f g h i j
[1,] 82 61 76 36 27 67 85 38 29 87
[2,] 83 89 43 70 81 30 35 17 39 95
[3,] 1 75 69 54 66 3 10 47 93 73
[4,] 52 98 26 88 51 64 31 72 13 92
[5,] 44 74 86 9 63 58 50 56 6 49
[6,] 24 16 77 12 55 97 18 45 14 40
[7,] 11 5 79 94 2 80 37 15 41 42
[8,] 100 84 65 59 34 62 53 60 99 28
[9,] 19 78 8 25 96 21 90 46 68 71
[10,] 32 20 7 4 57 91 22 48 33 23
f g h i j k l m n o
[1,] 24 22 8 94 89 7 50 93 40 4
[2,] 63 80 32 44 64 83 16 96 46 47
[3,] 85 30 81 95 23 91 19 92 99 52
[4,] 21 55 61 58 27 76 67 65 37 14
[5,] 9 66 12 2 41 11 56 84 87 39
[6,] 18 57 88 3 68 100 74 62 82 25
[7,] 70 90 43 54 72 86 69 20 29 51
[8,] 1 59 60 45 79 75 15 5 73 10
[9,] 38 28 26 17 53 36 97 13 77 49
[10,] 6 71 98 35 42 31 78 33 48 34
Case 1: merge() with all.x/all.y set to TRUE does what I want
merge(x = mtx.x, y = mtx.y,
all.x = T, all.y = T)
f g h i j a b c d e k l m n o
1 1 59 60 45 79 NA NA NA NA NA 75 15 5 73 10
2 3 10 47 93 73 1 75 69 54 66 NA NA NA NA NA
3 6 71 98 35 42 NA NA NA NA NA 31 78 33 48 34
4 9 66 12 2 41 NA NA NA NA NA 11 56 84 87 39
5 18 57 88 3 68 NA NA NA NA NA 100 74 62 82 25
6 21 55 61 58 27 NA NA NA NA NA 76 67 65 37 14
7 21 90 46 68 71 19 78 8 25 96 NA NA NA NA NA
8 24 22 8 94 89 NA NA NA NA NA 7 50 93 40 4
9 30 35 17 39 95 83 89 43 70 81 NA NA NA NA NA
10 38 28 26 17 53 NA NA NA NA NA 36 97 13 77 49
11 58 50 56 6 49 44 74 86 9 63 NA NA NA NA NA
12 62 53 60 99 28 100 84 65 59 34 NA NA NA NA NA
13 63 80 32 44 64 NA NA NA NA NA 83 16 96 46 47
14 64 31 72 13 92 52 98 26 88 51 NA NA NA NA NA
15 67 85 38 29 87 82 61 76 36 27 NA NA NA NA NA
16 70 90 43 54 72 NA NA NA NA NA 86 69 20 29 51
17 80 37 15 41 42 11 5 79 94 2 NA NA NA NA NA
18 85 30 81 95 23 NA NA NA NA NA 91 19 92 99 52
19 91 22 48 33 23 32 20 7 4 57 NA NA NA NA NA
20 97 18 45 14 40 24 16 77 12 55 NA NA NA NA NA
Case 2: merge.Matrix() with the same arguments wants me to specify by.x/by.y:
merge.Matrix(x = mtx.x, y = mtx.y,
all.x = T, all.y = T)
Error in grr::matches(by.x, by.y, all.x, all.y, nomatch = NULL) :
argument "by.x" is missing, with no default
Case 3: specifying by.x/by.y as the respective column names does not merge the common columns. Also, I have no idea why it offsets the matrices by 5 rows and not 10; the matrices have no common values.
merge.Matrix(x = mtx.x, y = mtx.y,
all.x = T, all.y = T,
by.x = colnames(mtx.x), by.y = colnames(mtx.y))
a b c d e f g h i j y.f y.g y.h y.i y.j k l m n o
82 61 76 36 27 67 85 38 29 87 NA NA NA NA NA NA NA NA NA NA
83 89 43 70 81 30 35 17 39 95 NA NA NA NA NA NA NA NA NA NA
1 75 69 54 66 3 10 47 93 73 NA NA NA NA NA NA NA NA NA NA
52 98 26 88 51 64 31 72 13 92 NA NA NA NA NA NA NA NA NA NA
44 74 86 9 63 58 50 56 6 49 NA NA NA NA NA NA NA NA NA NA
24 16 77 12 55 97 18 45 14 40 24 22 8 94 89 7 50 93 40 4
11 5 79 94 2 80 37 15 41 42 63 80 32 44 64 83 16 96 46 47
100 84 65 59 34 62 53 60 99 28 85 30 81 95 23 91 19 92 99 52
19 78 8 25 96 21 90 46 68 71 21 55 61 58 27 76 67 65 37 14
32 20 7 4 57 91 22 48 33 23 9 66 12 2 41 11 56 84 87 39
fill.x NA NA NA NA NA NA NA NA NA NA 18 57 88 3 68 100 74 62 82 25
fill.x NA NA NA NA NA NA NA NA NA NA 70 90 43 54 72 86 69 20 29 51
fill.x NA NA NA NA NA NA NA NA NA NA 1 59 60 45 79 75 15 5 73 10
fill.x NA NA NA NA NA NA NA NA NA NA 38 28 26 17 53 36 97 13 77 49
fill.x NA NA NA NA NA NA NA NA NA NA 6 71 98 35 42 31 78 33 48 34
Case 4: with by.x/by.y set to the common column names, all.x/all.y set to TRUE, and fill.x/fill.y set to 0, it does not do a full join as the documentation claims:
common <- intersect(colnames(mtx.x), colnames(mtx.y))
merge.Matrix(x = mtx.x, y = mtx.y,
all.x = T, all.y = T,
by.x = common, by.y = common)
a b c d e f g h i j y.f y.g y.h y.i y.j k l m n o
82 61 76 36 27 67 85 38 29 87 24 22 8 94 89 7 50 93 40 4
83 89 43 70 81 30 35 17 39 95 63 80 32 44 64 83 16 96 46 47
1 75 69 54 66 3 10 47 93 73 85 30 81 95 23 91 19 92 99 52
52 98 26 88 51 64 31 72 13 92 21 55 61 58 27 76 67 65 37 14
44 74 86 9 63 58 50 56 6 49 9 66 12 2 41 11 56 84 87 39
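Suppose the goal is the Case 1 result with zeros instead of NAs, and the rows never match on the shared columns (as the question states). In that case, one option is to skip merging entirely and stack the rows under the union of column names. Below is a minimal sketch for dense base R matrices. stack_union() is a hypothetical helper, not a Matrix.utils function. It keeps the original row order, whereas merge() sorted the rows by the shared columns in Case 1.

```r
# Two small matrices with partly overlapping column names (illustrative values).
mtx.x <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("a", "f")))
mtx.y <- matrix(5:8, nrow = 2, dimnames = list(NULL, c("f", "k")))

# Hypothetical helper: stack the rows of x and y under the union of their
# column names; cells for columns a matrix does not have get `fill`.
stack_union <- function(x, y, fill = 0) {
  cols <- union(colnames(x), colnames(y))
  out  <- matrix(fill, nrow = nrow(x) + nrow(y), ncol = length(cols),
                 dimnames = list(NULL, cols))
  out[seq_len(nrow(x)), colnames(x)] <- x
  out[nrow(x) + seq_len(nrow(y)), colnames(y)] <- y
  out
}

res <- stack_union(mtx.x, mtx.y)
res
```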
I need to replicate a vector in such a way that the random numbers change with each repetition; currently I only repeat the same numbers.
Example:
> rep(c(sample(c(1:100),5, replace = T),sample(NA ,5, replace = T)), 2)
[1] 33 91 48 18 29 NA NA NA NA NA 33 91 48 18 29 NA NA NA NA NA
I would like:
[1] 33 91 48 18 29 NA NA NA NA NA 23 45 27 67 55 NA NA NA NA NA
You even had the function name in the title :)
mat <-
replicate(2, c(sample(c(1:100), 5, replace = T), sample(NA, 5, replace = T)))
mat
# [,1] [,2]
# [1,] 6 40
# [2,] 86 37
# [3,] 2 81
# [4,] 35 57
# [5,] 12 15
# [6,] NA NA
# [7,] NA NA
# [8,] NA NA
# [9,] NA NA
# [10,] NA NA
c(mat)
# [1] 6 86 2 35 12 NA NA NA NA NA 40 37 81 57 15 NA NA NA NA NA
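Alternatively, the whole vector can be built in one go. Here it is 40 blocks of 5 random numbers, each followed by 5 NAs:
as.vector(rbind(matrix(sample(c(1:100), 200, replace = T),5,40),matrix(NA,5,40)))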
[1] 30 93 2 72 78 NA NA NA NA NA 36 90 40 37 72 NA NA NA NA NA 56 71 100 100 73 NA NA NA NA
[30] NA 27 41 15 57 38 NA NA NA NA NA 62 6 4 35 99 NA NA NA NA NA 77 57 71 25 31 NA NA NA
[59] NA NA 37 92 28 62 20 NA NA NA NA NA 29 42 60 65 28 NA NA NA NA NA 78 31 12 93 80 NA NA
[88] NA NA NA 44 74 98 26 33 NA NA NA NA NA 4 53 86 89 24 NA NA NA NA NA 37 15 14 81 82 NA
[117] NA NA NA NA 97 96 72 53 56 NA NA NA NA NA 71 91 50 73 20 NA NA NA NA NA 98 93 75 2 3
[146] NA NA NA NA NA 38 15 28 55 69 NA NA NA NA NA 92 78 37 43 81 NA NA NA NA NA 1 90 45 97
[175] 83 NA NA NA NA NA 90 23 68 80 91 NA NA NA NA NA 57 52 80 34 93 NA NA NA NA NA 35 74 70
[204] 60 39 NA NA NA NA NA 49 97 87 62 33 NA NA NA NA NA 35 11 13 50 60 NA NA NA NA NA 90 90
[233] 40 34 68 NA NA NA NA NA 56 25 38 81 88 NA NA NA NA NA 73 45 94 73 75 NA NA NA NA NA 22
[262] 96 3 51 19 NA NA NA NA NA 33 52 4 77 60 NA NA NA NA NA 65 64 53 5 44 NA NA NA NA NA
[291] 35 23 29 35 36 NA NA NA NA NA 73 99 35 20 22 NA NA NA NA NA 41 86 83 18 44 NA NA NA NA
[320] NA 39 29 91 36 32 NA NA NA NA NA 95 51 81 51 52 NA NA NA NA NA 89 73 21 21 79 NA NA NA
[349] NA NA 64 88 78 71 59 NA NA NA NA NA 91 90 30 58 15 NA NA NA NA NA 64 6 34 21 1 NA NA
[378] NA NA NA 17 77 62 45 90 NA NA NA NA NA 40 66 41 8 25 NA NA NA NA NA
It is an additional line, but it gets the job done:
fun <- function() c(sample(c(1:100),5, replace = T), sample(NA ,5, replace = T))
c(fun(), fun())
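Here is a parameterised version of the replicate() idea, as a sketch. rand_blocks() and its argument names are hypothetical, chosen for illustration. Because replicate() re-evaluates its expression each time, every block gets fresh draws:

```r
# Hypothetical helper: n_rep blocks, each holding n_val fresh draws from `pool`
# followed by n_na NAs. `pool` should have length > 1 (sample() treats a
# single number n as 1:n).
rand_blocks <- function(n_rep, n_val = 5, n_na = 5, pool = 1:100) {
  as.vector(replicate(n_rep,
                      c(sample(pool, n_val, replace = TRUE), rep(NA, n_na))))
}

set.seed(1)
v <- rand_blocks(2)
v
```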
I need to count unique IDs within rolling intervals of different lengths (3, 4, 5, 6 months, ...) for each month. I need to do that for different groups as well, such as age, gender, etc. This is what my data looks like:
ID Yr_month Age Gender
11 2012-01 30 M
11 2012-02 30 M
...
11 2012-12 30 M
12 2012-01 32 F...
The output should look like this:
Yr_month cnt_distinctID_3 count_distinctID_4....
2012-01 300 400
I am able to do this using multiple for loops and dplyr. Is there a faster way to get this done using data.table? Thanks!
This is what my code looks like:
setorderv(test,c("id","year_mth"))
setkeyv(test,c("id"))
test <- data.table(cbind(test, first=0L))
test[test[unique(test),,mult="first", which=TRUE], first:=1L]
test1 <- test %>%
group_by(year_mth) %>%
summarize(first_total = sum(first)) %>%
select(year_mth,first_total)
test2 <- test1 %>%
arrange(year_mth) %>%
mutate(Cusum = cumsum(first_total)) %>%
select(year_mth, Cusum)
Then I run a for loop over year_mth and K <- seq(3:36) on the above (note that seq(3:36) evaluates to 1:34; 3:36 was presumably intended). It's taking a lot of time because I am running it on a big dataset.
If I understand the question correctly, the OP wants to count unique IDs in rolling windows of varying sizes. The counts are to be presented in a table where the length of the rolling window runs horizontally and the ending month of the rolling window vertically.
This approach creates all intervals as a data.table and aggregates during a non-equi join with the dataset. Finally, the results are reshaped from long to wide format.
Creating a sample dataset
The OP has not provided a sample dataset. So, we have to make up our own:
library(data.table)
# create year-month sequence
yr_m <- CJ(2012:2014, 1:12)[, sprintf("%4i-%02i", V1, V2)]
n_id <- 100L # number of individual IDs
n_row <- 1e3L # number of rows to create
set.seed(123L) # required for reproducible results
DT <- data.table(ID = sample.int(n_id, n_row, TRUE),
Yr_month = ordered(sample(yr_m, n_row, TRUE), yr_m))
str(DT)
Classes ‘data.table’ and 'data.frame': 1000 obs. of 2 variables:
$ ID : int 29 79 41 89 95 5 53 90 56 46 ...
$ Yr_month: Ord.factor w/ 36 levels "2012-01"<"2012-02"<..: 10 22 6 31 31 18 28 11 3 16 ...
- attr(*, ".internal.selfref")=<externalptr>
Note that Yr_month has been turned into an ordered factor, which is required for the subsequent non-equi join because the join involves comparison operations.
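Why the ordered factor helps can be illustrated in base R. Comparison operators on an ordered factor follow the level order, and a character operand is first matched against those levels. This sketch shows base R semantics only, not how data.table carries out the join internally:

```r
lev <- c("2012-01", "2012-02", "2012-03")
ym  <- ordered(c("2012-03", "2012-01", "2012-02"), levels = lev)

# The character "2012-02" is matched to its level position before comparing.
ym >= "2012-02"

# A window condition like the one used in the join's `on =` clause.
ym[ym >= "2012-02" & ym <= "2012-03"]
```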
Create intervals
intervals <- rbindlist(
lapply(3:24, function(x) data.table(K = x,
start = head(yr_m, -(x - 1L)),
end = tail(yr_m, -(x - 1L)))
))
For illustration, only intervals of 3 to 24 months length are considered here.
intervals
K start end
1: 3 2012-01 2012-03
2: 3 2012-02 2012-04
3: 3 2012-03 2012-05
4: 3 2012-04 2012-06
5: 3 2012-05 2012-07
---
513: 24 2012-09 2014-08
514: 24 2012-10 2014-09
515: 24 2012-11 2014-10
516: 24 2012-12 2014-11
517: 24 2013-01 2014-12
Aggregate during non-equi join and reshape
DT[intervals, on = .(Yr_month >= start, Yr_month <= end),
.(count = uniqueN(ID), end, K), by = .EACHI][
, dcast(.SD, end ~ K, value.var = "count")]
end 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
1: 2012-03 59 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2: 2012-04 53 64 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3: 2012-05 59 69 80 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
4: 2012-06 57 72 78 88 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
5: 2012-07 53 62 75 80 89 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
6: 2012-08 50 65 71 81 86 91 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
7: 2012-09 58 65 71 76 84 89 93 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
8: 2012-10 59 67 72 77 82 88 92 94 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
9: 2012-11 57 66 72 77 82 86 91 94 96 NA NA NA NA NA NA NA NA NA NA NA NA NA
10: 2012-12 57 67 75 80 83 88 91 95 97 98 NA NA NA NA NA NA NA NA NA NA NA NA
11: 2013-01 53 63 71 78 83 85 90 93 97 98 99 NA NA NA NA NA NA NA NA NA NA NA
12: 2013-02 57 68 77 82 87 91 92 95 97 97 98 99 NA NA NA NA NA NA NA NA NA NA
13: 2013-03 56 67 75 83 86 88 92 93 96 97 97 98 99 NA NA NA NA NA NA NA NA NA
14: 2013-04 57 67 76 81 87 90 92 95 96 98 99 99 100 100 NA NA NA NA NA NA NA NA
15: 2013-05 65 74 79 83 86 90 93 95 97 98 99 99 99 100 100 NA NA NA NA NA NA NA
16: 2013-06 71 77 83 85 87 89 92 95 97 98 99 99 99 99 100 100 NA NA NA NA NA NA
17: 2013-07 65 78 83 88 90 91 91 94 96 97 98 99 99 99 99 100 100 NA NA NA NA NA
18: 2013-08 57 73 84 88 91 93 94 94 97 99 99 99 100 100 100 100 100 100 NA NA NA NA
19: 2013-09 62 71 81 90 92 95 96 96 96 97 99 99 99 100 100 100 100 100 100 NA NA NA
20: 2013-10 62 71 79 87 93 95 98 98 98 98 98 99 99 99 100 100 100 100 100 100 NA NA
21: 2013-11 61 74 81 87 91 95 96 99 99 99 99 99 100 100 100 100 100 100 100 100 100 NA
22: 2013-12 64 76 83 88 93 96 98 99 99 99 99 99 99 100 100 100 100 100 100 100 100 100
23: 2014-01 56 70 78 84 89 94 96 98 99 99 99 99 99 99 100 100 100 100 100 100 100 100
24: 2014-02 52 67 76 83 88 90 95 96 98 99 99 99 99 99 99 100 100 100 100 100 100 100
25: 2014-03 51 62 72 80 85 89 91 95 96 98 99 99 99 99 99 99 100 100 100 100 100 100
26: 2014-04 58 62 71 76 83 87 90 92 96 97 99 99 99 99 99 99 99 100 100 100 100 100
27: 2014-05 60 67 70 78 82 88 90 92 94 97 98 99 99 99 99 99 99 99 100 100 100 100
28: 2014-06 58 74 78 80 85 88 93 93 94 94 97 98 99 99 99 99 99 99 99 100 100 100
29: 2014-07 60 70 81 83 85 88 90 94 94 95 95 98 99 100 100 100 100 100 100 100 100 100
30: 2014-08 64 71 79 89 91 91 93 94 96 96 96 96 99 99 100 100 100 100 100 100 100 100
31: 2014-09 57 68 74 82 92 94 94 94 95 96 96 96 96 99 99 100 100 100 100 100 100 100
32: 2014-10 57 67 74 79 87 96 97 97 97 97 98 98 98 98 100 100 100 100 100 100 100 100
33: 2014-11 48 63 71 77 82 89 97 98 98 98 98 99 99 99 99 100 100 100 100 100 100 100
34: 2014-12 52 61 71 77 82 86 91 99 99 99 99 99 99 99 99 99 100 100 100 100 100 100
end 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
uniqueN() is a data.table function which is used here to count the number of unique IDs.
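To check the data.table result on a small sample, a brute-force base R version can be useful. This sketch assumes months have already been converted to consecutive integer indices (for example with as.integer() on the ordered factor). count_window() is a hypothetical helper. The loop over every (end, K) pair makes it too slow for a big dataset:

```r
# Toy data: months as consecutive integer indices, plus the IDs seen in each.
month <- c(1L, 1L, 2L, 3L, 3L, 4L, 4L)
id    <- c(11L, 12L, 11L, 13L, 11L, 12L, 14L)

# Hypothetical helper: number of distinct IDs in the K months ending at `end`.
count_window <- function(id, month, end, K) {
  length(unique(id[month > end - K & month <= end]))
}

ends <- sort(unique(month))
Ks   <- 2:3
# Rows = ending month, columns = window length; NA where the window would
# start before the first month (a layout similar to the dcast() output).
counts <- sapply(Ks, function(K)
  sapply(ends, function(e)
    if (e - K + 1L >= min(month)) count_window(id, month, e, K) else NA))
dimnames(counts) <- list(end = ends, K = Ks)
counts
```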
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
How do I shift the data in ChangeJanAug up from row 21, so that the NAs are filled with the correct numbers? Since I do not want to shift all the rows, I have no clue what to do.
city latitude JanTemp AprTemp AugTemp ChangeJanAug
1 MiamiFL 26 67 75 83 NA
2 HoustonTX 30 50 68 82 NA
3 MobileAL 31 50 68 82 NA
4 DallasTX 33 43 66 85 NA
5 PhoenixAZ 33 54 70 92 NA
6 LosAngelesCA 34 58 63 75 NA
7 MemphisTN 35 40 63 81 NA
8 NorfolkVA 37 39 57 77 NA
9 SanFranciscoCA 38 49 56 64 NA
10 BaltimoreMD 39 32 53 76 NA
11 KansasCityMO 39 28 55 76 NA
12 WashingtonDC 39 31 53 74 NA
13 PittsburghPA 40 25 50 71 NA
14 ClevelandOH 41 25 48 70 NA
15 NewYorkNY 41 32 53 76 NA
16 BostonMA 42 29 48 72 NA
17 SyracuseNY 43 22 46 68 NA
18 MinneapolisMN 45 12 46 71 NA
19 PortlandOR 46 40 51 69 NA
20 DuluthMN 47 7 39 64 NA
21 <NA> NA NA NA NA 16
22 <NA> NA NA NA NA 32
23 <NA> NA NA NA NA 32
24 <NA> NA NA NA NA 42
25 <NA> NA NA NA NA 38
26 <NA> NA NA NA NA 17
27 <NA> NA NA NA NA 41
28 <NA> NA NA NA NA 38
29 <NA> NA NA NA NA 15
30 <NA> NA NA NA NA 44
31 <NA> NA NA NA NA 48
32 <NA> NA NA NA NA 43
33 <NA> NA NA NA NA 46
34 <NA> NA NA NA NA 45
35 <NA> NA NA NA NA 44
36 <NA> NA NA NA NA 43
37 <NA> NA NA NA NA 46
38 <NA> NA NA NA NA 59
39 <NA> NA NA NA NA 29
40 <NA> NA NA NA NA 57
Thank you so much!
I agree with the comment by @Heroka that it would have been better to avoid such a situation. But now that you have the data in this form, you could use the following line of code to shift the entries of the column ChangeJanAug up by 20 rows:
df$ChangeJanAug <- c(df$ChangeJanAug[21:nrow(df)],rep(NA,(nrow(df)-20)))
Afterwards you could "clean up" the block of NA entries with
df <- df[1:20,]
If you plan to remove the NA rows like this anyway, you need not pad the vector with NAs (R recycles the 20 values to fill the 40 rows), and you could simply use
df$ChangeJanAug <- df$ChangeJanAug[21:nrow(df)]
in the first step.
This could be an option:
data$ChangeJanAug_new = c(data$ChangeJanAug[-(seq(20))], rep(NA, 20))
out = data[colnames(data) != "ChangeJanAug"]
#later if you want to remove NAs you could do this
out[!is.na(out$ChangeJanAug_new),]
Using na.omit and cbind, you could do this (given that your original data is exactly as shown in the question):
cbind(na.omit(data[,-6]), ChangeJanAug = na.omit(data$ChangeJanAug))
# city latitude JanTemp AprTemp AugTemp ChangeJanAug
#1 MiamiFL 26 67 75 83 16
#2 HoustonTX 30 50 68 82 32
#3 MobileAL 31 50 68 82 32
#4 DallasTX 33 43 66 85 42
#5 PhoenixAZ 33 54 70 92 38
#6 LosAngelesCA 34 58 63 75 17
#7 MemphisTN 35 40 63 81 41
#8 NorfolkVA 37 39 57 77 38
#9 SanFranciscoCA 38 49 56 64 15
#10 BaltimoreMD 39 32 53 76 44
#11 KansasCityMO 39 28 55 76 48
#12 WashingtonDC 39 31 53 74 43
#13 PittsburghPA 40 25 50 71 46
#14 ClevelandOH 41 25 48 70 45
#15 NewYorkNY 41 32 53 76 44
#16 BostonMA 42 29 48 72 43
#17 SyracuseNY 43 22 46 68 46
#18 MinneapolisMN 45 12 46 71 59
#19 PortlandOR 46 40 51 69 29
#20 DuluthMN 47 7 39 64 57
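Judging from the values shown, ChangeJanAug equals AugTemp - JanTemp for every city (e.g. MiamiFL: 83 - 67 = 16, DuluthMN: 64 - 7 = 57). If that is indeed how the column was derived (an assumption), it may be simpler to drop the NA rows and recompute it. Here is a sketch on the first three cities of the question's data (AprTemp and latitude omitted for brevity):

```r
# First three cities from the question, followed by three rows that hold
# the misplaced ChangeJanAug values.
temps <- data.frame(
  city         = c("MiamiFL", "HoustonTX", "MobileAL", NA, NA, NA),
  JanTemp      = c(67, 50, 50, NA, NA, NA),
  AugTemp      = c(83, 82, 82, NA, NA, NA),
  ChangeJanAug = c(NA, NA, NA, 16, 32, 32)
)

# Keep only the rows that describe a city ...
fixed <- temps[!is.na(temps$city), ]
# ... and recompute the change (assumption: ChangeJanAug = AugTemp - JanTemp).
fixed$ChangeJanAug <- fixed$AugTemp - fixed$JanTemp
fixed
```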