R, Pivot longer, multiple observations per row - r

I think my question is nearly identical to this one: "R Pivot multiple columns from wide to long", but I am hopelessly lost on the regex when trying to follow along.
I am also trying to pivot data to be longer, and I also have multiple columns I'd like to keep. My data currently:
FollowUpScans<-structure(list(study_id = c(40, 44, 49, 61, 66, 67, 68, 84, 86,
94, 95, 101, 123, 126, 131, 153, 154, 155, 156, 161, 166, 169,
175, 185, 199, 203, 207, 211, 217, 221, 227, 256, 257, 259, 266,
275, 284, 301, 306, 307, 309, 313, 320, 353, 382, 392, 398, 401,
402, 412, 415, 428, 431, 433, 434, 436), Score1 = c(3, 0, 4,
4, NA, 0, 0, 5, 0, 0, 7, 0, 4, 0, 4, 2, 3, 1, 0, 2, 2, 0, 3,
0, 0, 0, 9, 0, 0, 0, 6, 0, 0, 7, 5, 7, 0, 0, 8, 0, 0, 0, 5, 0,
3, 0, 5, 0, 2, 0, 0, 0, 0, 7, 0, 2), TimeBetweenScans = structure(c(316,
113, 335, 104, 7, 42, 30, 643, 404, 40, 171, 51, 449, 56, 104,
79, 116, 65, 39, 1193, 142, 106, 221, 36, 125, 137, 927, 63,
156, 32, 411, 201, 160, 166, 459, 212, 50, 312, 1627, 354, 33,
62, 842, 174, 216, 17, 214, 24, 149, 72, 9, 13, 42, 771, 113,
122), class = "difftime", units = "days"), Score2 = c(NA, 0,
7, NA, NA, NA, 0, 7, NA, 5, 8, 0, NA, NA, NA, 8, NA, NA, 9, NA,
NA, 0, 4, NA, NA, 0, 9, 2, 0, NA, NA, NA, NA, NA, NA, NA, 4,
1, 8, NA, NA, 3, NA, 0, 8, NA, 5, NA, 7, NA, 0, 3, NA, 7, NA,
4), TimeBetweenScans2 = structure(c(NA, 139, 660, NA, NA, NA,
84, 1794, NA, 221, 320, 227, NA, NA, NA, 989, NA, NA, 411, NA,
NA, 216, 474, NA, NA, 372, 1006, 429, 447, NA, NA, NA, NA, NA,
NA, NA, 313, 530, 1706, NA, NA, 130, NA, 300, 264, NA, 268, NA,
382, NA, 38, 138, NA, 1200, 166, 475), class = "difftime", units = "days"),
Score3 = c(NA, NA, NA, NA, NA, NA, 7, NA, NA, 8, NA, NA,
NA, NA, NA, 8, NA, NA, NA, NA, NA, 1, 4, NA, NA, 0, NA, 5,
0, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA,
NA, NA, NA, 5, NA, NA, NA, NA, NA, NA, 8, 0, 4), TimeBetweenScans3 = structure(c(NA,
NA, NA, NA, NA, NA, 467, NA, NA, 394, NA, NA, NA, NA, NA,
1097, NA, NA, NA, NA, NA, 266, 796, NA, NA, 941, NA, 533,
470, NA, NA, NA, NA, NA, NA, NA, NA, 783, NA, NA, NA, NA,
NA, NA, NA, NA, 388, NA, NA, NA, NA, NA, NA, 1512, 180, 640
), class = "difftime", units = "days"), Score4 = c(NA, NA,
NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 5, NA, NA, NA, 1, NA, 5, 0, NA, NA, NA, NA,
NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), TimeBetweenScans4 = structure(c(NA,
NA, NA, NA, NA, NA, 826, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 497, NA, NA, NA, 1102, NA, 567, 1204,
NA, NA, NA, NA, NA, NA, NA, NA, 1574, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), class = "difftime", units = "days"),
Score5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA,
NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
TimeBetweenScans5 = structure(c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 575,
NA, NA, NA, 1225, NA, NA, 1266, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), class = "difftime", units = "days")), row.names = c(NA,
-56L), class = c("tbl_df", "tbl", "data.frame"))
And instead of columns that look like: study_id, Score1, TimeBetweenScans, Score2, TimeBetweenScans2, Score3, TimeBetweenScans3, etc.
I'd love it to ultimately look like: study_id, Score, Time, Occurrence
The "Occurrence" column would just hold 1, 2, 3, 4, etc. to show which column the value came from. The study_id column would be nice to keep because it shows which "person" the value came from.
Any help would be appreciated! Thank you!

You can try:
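library(dplyr)
library(tidyr)

FollowUpScans %>%
  # give the first time column a numeric suffix so that every column matches the pattern
  rename(TimeBetweenScans1 = TimeBetweenScans) %>%
  pivot_longer(-study_id,
               names_to = c(".value", "Time"),
               names_pattern = "([A-Za-z]+)([0-9]+)")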
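The steps are:
1. Rename the column that is likely to cause problems: TimeBetweenScans has no numeric suffix, so it would not match the pattern.
2. Call pivot_longer, specifying that the column names follow the pattern "letters followed by digits". The ".value" entry in names_to keeps the letter part as the name of the output column, and the digit part goes into the "Time" column (the one called "Occurence" in the question). You can use different regex patterns than the one I've shared here. For example, you could probably use "(.*)(\\d+)" for this particular dataset.
If you don't rename first, I would suspect that you would end up with too many rows. After renaming, you should end up with nrow(FollowUpScans) * 5 rows (56 * 5 = 280 here).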
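The steps above can be tried on a small example. This is only a sketch: the toy tibble is made up for illustration (plain numbers instead of difftime values), and naming the index column "Occurrence" and renaming TimeBetweenScans to Time are assumptions about the layout the question asks for, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-person excerpt with the same column naming scheme
toy <- tibble(
  study_id = c(40, 44),
  Score1 = c(3, 0),
  TimeBetweenScans = c(316, 113),
  Score2 = c(NA, 0),
  TimeBetweenScans2 = c(NA, 139)
)

toy_long <- toy %>%
  # add the missing "1" suffix so every column is letters + digits
  rename(TimeBetweenScans1 = TimeBetweenScans) %>%
  pivot_longer(
    -study_id,
    names_to = c(".value", "Occurrence"),
    names_pattern = "([A-Za-z]+)([0-9]+)",
    # store the digit part as an integer rather than a character string
    names_transform = list(Occurrence = as.integer)
  ) %>%
  rename(Time = TimeBetweenScans)

toy_long
```

Each study_id ends up with one row per occurrence (two rows here). Adding values_drop_na = TRUE to pivot_longer should drop the occurrences where both Score and Time are missing, if those rows are not wanted.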

Related

Creating new column with values from multiple other columns

I hope someone can help me with this one!
I have the following dataset and want to create a new column that holds the values of aver1, aver2 and aver3.
I tried rowSums, but this did not work for me: with na.rm = TRUE, rows in which all three columns are NA also get a sum of 0, so I cannot tell them apart from rows whose actual value is 0.
What I have:
count  aver1.  aver2.  aver3.
X      NA      1       NA
Y      1       NA      NA
X      NA      NA      0
What I want:
count  aver1.  aver2.  aver3.  aver_all
X      NA      1       NA      1
Y      1       NA      NA      1
X      NA      NA      0       0
the dput output:
structure(list(count = c(0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0,
0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0,
0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1,
1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0,
1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
1), start = c(NA, NA, NA, 5, NA, NA, NA, NA, 1, NA, NA, NA, NA,
1, 1, 1, NA, NA, 9, NA, NA, NA, 3, 4, NA, 11, 1, NA, NA, 1, NA,
NA, NA, 6, NA, NA, 5, NA, 5, NA, NA, NA, NA, NA, 1, NA, 3, NA,
NA, 3, 1, NA, 13, NA, 0, NA, NA, NA, NA, 1, NA, NA, NA, 12, 1,
NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, NA, 1, NA, NA, NA, NA,
2, NA, 2, NA, NA, NA, 2, NA, NA, 1, NA, 3, NA, 3, NA, NA, NA,
NA, 10, NA, 1, NA, 0, 0, 1, 1, NA, NA, NA, NA, NA, 1, NA, 2,
7, NA, 1, NA, NA, 3, NA, 2, 6, NA, 3, NA, 1, 8, 1, NA, 1, NA,
NA, 0, NA, 0, 1, NA, NA, NA, NA, 3, NA, 0, NA, NA, NA, 1, NA,
NA, 0, NA, NA, NA, NA, NA, 2, NA, NA, 0, NA, NA, NA, NA, NA,
NA, 1, NA, 4), aver1 = c(NA, NA, NA, 0.5, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0.166666666666667, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0.133333333333333, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, 0.266666666666667, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0.566666666666667, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 0.266666666666667, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), aver2 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0.333333333333333, 0.416666666666667, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.25, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.916666666666667,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.472222222222222,
NA, NA, NA, NA, NA, NA, 0.388888888888889, NA, NA, NA, 0.0833333333333333,
NA, NA, NA, NA, 0.0555555555555556, NA, 0.111111111111111, NA,
NA, NA, NA, NA, NA, NA, NA, 0.305555555555556, NA, 0.861111111111111,
NA, NA, NA, NA, NA, NA, NA, NA, 0.194444444444444, NA, NA, NA,
NA, NA, 0.611111111111111, NA, NA, NA, NA, 0, NA, 1, NA, 0.694444444444444,
NA, NA, NA, NA, 0.5, NA, 1, NA, NA, NA, NA, NA, 0.0277777777777778,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.138888888888889,
NA, NA, 0.583333333333333, NA, NA, NA, NA, NA, NA, 0.194444444444444,
NA, NA), aver3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 0.514285714285714,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 0.0285714285714286, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1, 0.214285714285714, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0142857142857143, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0.614285714285714, NA, NA, NA, NA, 0.371428571428571,
NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA,
NA, NA, NA, 0.9, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0571428571428571,
NA, NA, 0.128571428571429, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 0.1)), row.names = c(NA, -170L
), class = c("tbl_df", "tbl", "data.frame"))
This example sums the selected variables row by row in your data frame (called 'df' here).
df$aver_all <- apply(df[, c("aver1", "aver2", "aver3")], 1, function(x) sum(x, na.rm=TRUE))
This gives 0 for rows where aver1, aver2 and aver3 are all NA.
The next version returns NA instead of 0 for rows where all three values are NA.
df$aver_all <- apply(df[, c("aver1", "aver2", "aver3")], 1, function(x) ifelse(FALSE %in% is.na(x), sum(x, na.rm=TRUE), NA))
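The same idea can also be written without apply. The following is a sketch on a made-up four-row data frame (df_toy and cols are names chosen for illustration): it counts the non-missing values per row with rowSums and only keeps the sum where at least one value is present.

```r
# Hypothetical toy data: the last row has only NAs
df_toy <- data.frame(
  aver1 = c(NA, 1, NA, NA),
  aver2 = c(1, NA, NA, NA),
  aver3 = c(NA, NA, 0, NA)
)
cols <- c("aver1", "aver2", "aver3")

# TRUE for rows in which every selected column is NA
all_missing <- rowSums(!is.na(df_toy[cols])) == 0

# Row sum ignoring NAs, but NA where there was nothing to sum
df_toy$aver_all <- ifelse(all_missing, NA, rowSums(df_toy[cols], na.rm = TRUE))

df_toy
```

Here aver_all is 1, 1, 0 and NA, so a genuine 0 stays distinguishable from a row with no data.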
Since you said that you also have rows where all column values are NA, I will add a row to your dataset that fulfills this condition:
library(dplyr)
dataset <- tibble(count = c("X", "Y", "X", "Z"), aver1. = c(NA, 1, NA, NA),
                  aver2. = c(1, NA, NA, NA), aver3. = c(NA, NA, 0, NA))
You can use the conditional case_when (https://dplyr.tidyverse.org/reference/case_when.html), which sets a value for each row depending on the first condition that row satisfies. The column names have to be evaluated inside the data frame, for example within mutate:
dataset <- dataset %>%
  mutate(aver_all = case_when(
    is.na(aver1.) & is.na(aver2.) & is.na(aver3.) ~ NA_real_,
    aver1. | aver2. | aver3. ~ 1,
    TRUE ~ 0))
Here the first condition sets rows where all values are NA to NA; the second sets a 1 if at least one of the three values in a row is nonzero (a 1 in this example); and if neither condition is satisfied, a 0 is set.

extracting information from excel into lists in R

Hello all, I have this dataset:
> dput(test1)
structure(list(startdate = c("2019-11-06", "2019-11-06", "2019-11-06",
"2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06",
"2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06",
"2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06",
"2019-11-06", "2019-11-06", "2019-11-06", "2019-11-27", "2019-11-27",
"2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27",
"2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27",
"2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27",
"2019-11-27", "2019-11-27", "2019-11-01", "2019-11-05", "2019-11-15",
"2019-11-16", "2019-11-17", "2019-11-18", "2019-11-19", "2019-11-20",
"2019-11-21", NA), id = c("POL55", "POL56", "POL57", "POL58",
"POL59", "POL60", "POL61", "POL62", "POL63", "POL64", "POL65",
"POL66", "POL67", "POL68", "POL69", "POL56", "POL57", "POL58",
"POL59", "POL60", "POL61", "POL55", "POL56", "POL57", "POL58",
"POL59", "POL60", "POL61", "POL55", "POL56", "POL57", "POL58",
"POL59", "POL60", "POL61", "POL55", "POL56", "POL57", "POL58",
"POL59", "POL60", "POL61", "POL62", "POL63", "POL64", "POL65",
"POL66", "POL67", "POL68", NA), m0_9 = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98,
33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), m10_19 = c(NA,
NA, NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65,
3, 98, 33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), m20_29 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 32, 34, NA,
NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA,
NA, NA, NA, NA, NA), m30_39 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3,
98, 33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), m40_49 = c(32, 34, NA, NA,
NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), m50_59 = c(NA,
NA, NA, NA, NA, NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA,
7, 9, 1, 65, 3, 98, 33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), m60_69 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9,
1, 65, 3, 98, 33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), m70 = c(NA, NA, NA, NA, NA, NA, 32,
34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), f0_9 = c(32, 34, NA,
NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), f10_19 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 32, 34, NA, NA, NA, NA, 55,
3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), f20_29 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3,
98, 33, NA, NA, NA), f30_39 = c(NA, NA, NA, 32, 34, NA, NA, NA,
NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), f40_49 = c(NA, NA, NA, NA,
NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3,
98, 33, NA, NA, NA, NA, NA, NA, NA, NA, 32, 34, NA, NA, NA, NA,
55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA), f50_59 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 32, 34, NA, NA, NA, NA,
55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), f60_69 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 32, 34, NA, NA, NA, NA,
55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), f70 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3,
98, 33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"))
I would like to create a list called ageCat. This list should contain a number of lists, one per age category. Then for each age category I would like to extract the following info: startAge, endAge, maleCount, femaleCount, totalCount.
Additionally, I only want to sum up individuals that have the same id and start date. So far I have written this:
# create list of age
createLists <- function(startdate, id){
testFiltered = test1[policyid == id & start == startdate]
ageGroup <- vector("list", length == 8)
names(ageGroup) <- as.character(seq_along(ageGroup))
for(ageCat in seq_along(ageGroup)){
ageGroup[[ageCat]] <- getAgeInfo(testFiltered, ageCat)
}
getAgeInfo <- function(testFiltered, ageCat){
start =
end =
nomales =
nofemales =
}
ageGroup <- list(startAge = start,
endAge = end ,
maleCount = nomales ,
femaleCount = nofemales)
}
I have hard coded the length of the vector ageGroup. How can I do this without hard coding it, i.e. by looking up how many age category columns I have for each gender?
Secondly, how can I extract the information startAge, endAge, maleCount, femaleCount, totalCount?
Instead of working with lists, I suggest converting your data.frame to long format, dropping the missing values, and extracting sex and age from the column names. A tidyverse approach might look like this:
library(dplyr)
library(tidyr)
library(tibble)
df <- tibble(
startdate = c(
"2019-11-06", "2019-11-06", "2019-11-06",
"2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06",
"2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06",
"2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06", "2019-11-06",
"2019-11-06", "2019-11-06", "2019-11-06", "2019-11-27", "2019-11-27",
"2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27",
"2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27",
"2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27", "2019-11-27",
"2019-11-27", "2019-11-27", "2019-11-01", "2019-11-05", "2019-11-15",
"2019-11-16", "2019-11-17", "2019-11-18", "2019-11-19", "2019-11-20",
"2019-11-21", NA
),
id = c(
"POL55", "POL56", "POL57", "POL58",
"POL59", "POL60", "POL61", "POL62", "POL63", "POL64", "POL65",
"POL66", "POL67", "POL68", "POL69", "POL56", "POL57", "POL58",
"POL59", "POL60", "POL61", "POL55", "POL56", "POL57", "POL58",
"POL59", "POL60", "POL61", "POL55", "POL56", "POL57", "POL58",
"POL59", "POL60", "POL61", "POL55", "POL56", "POL57", "POL58",
"POL59", "POL60", "POL61", "POL62", "POL63", "POL64", "POL65",
"POL66", "POL67", "POL68", NA
),
m0_9 = c(
NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98,
33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
),
m10_19 = c(
NA,
NA, NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65,
3, 98, 33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
),
m20_29 = c(
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 32, 34, NA,
NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA,
NA, NA, NA, NA, NA
),
m30_39 = c(
NA, NA, NA, NA, NA, NA, NA, NA,
NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3,
98, 33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA
),
m40_49 = c(
32, 34, NA, NA,
NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
),
m50_59 = c(
NA,
NA, NA, NA, NA, NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA,
7, 9, 1, 65, 3, 98, 33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), m60_69 = c(
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9,
1, 65, 3, 98, 33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA
), m70 = c(
NA, NA, NA, NA, NA, NA, 32,
34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), f0_9 = c(
32, 34, NA,
NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), f10_19 = c(
NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 32, 34, NA, NA, NA, NA, 55,
3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), f20_29 = c(
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3,
98, 33, NA, NA, NA
), f30_39 = c(
NA, NA, NA, 32, 34, NA, NA, NA,
NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA
), f40_49 = c(
NA, NA, NA, NA,
NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3,
98, 33, NA, NA, NA, NA, NA, NA, NA, NA, 32, 34, NA, NA, NA, NA,
55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA
), f50_59 = c(
NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 32, 34, NA, NA, NA, NA,
55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), f60_69 = c(
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 32, 34, NA, NA, NA, NA,
55, 3, NA, NA, NA, 7, 9, 1, 65, 3, 98, 33, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA
), f70 = c(
NA, NA, NA, NA, NA, NA, NA, NA,
NA, 32, 34, NA, NA, NA, NA, 55, 3, NA, NA, NA, 7, 9, 1, 65, 3,
98, 33, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA
)
)
# Convert to tidy data frame
df_age <- df %>%
gather(age_sex, count, -startdate, -id) %>%
filter(!is.na(count)) %>%
extract(age_sex, into = c("sex", "start_age", "end_age"), regex = "(m|f)(\\d+)_?(\\d+)?", remove = FALSE) %>%
mutate(ageg = paste0(start_age, "_", end_age))
df_age
#> # A tibble: 187 x 8
#> startdate id age_sex sex start_age end_age count ageg
#> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
#> 1 2019-11-27 POL55 m0_9 m 0 9 32 0_9
#> 2 2019-11-27 POL56 m0_9 m 0 9 34 0_9
#> 3 2019-11-27 POL61 m0_9 m 0 9 55 0_9
#> 4 2019-11-27 POL55 m0_9 m 0 9 3 0_9
#> 5 2019-11-27 POL59 m0_9 m 0 9 7 0_9
#> 6 2019-11-27 POL60 m0_9 m 0 9 9 0_9
#> 7 2019-11-27 POL61 m0_9 m 0 9 1 0_9
#> 8 2019-11-27 POL55 m0_9 m 0 9 65 0_9
#> 9 2019-11-27 POL56 m0_9 m 0 9 3 0_9
#> 10 2019-11-27 POL57 m0_9 m 0 9 98 0_9
#> # ... with 177 more rows
# df back to nested list by startdate and ageg
df_list <- df_age %>%
# Count by startdate, ageg, start_age, end_age, sex
count(startdate, ageg, start_age, end_age, sex, wt = count) %>%
# male and female counts back in columns
spread(sex, n, fill = 0) %>%
# split by startdate
split(.$startdate) %>%
# ... and split each startdate list by ageg
lapply(function(x) split(x, x$ageg))
Created on 2020-03-10 by the reprex package (v0.3.0)
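For reference, the same reshaping can be sketched with pivot_longer and pivot_wider, the newer tidyr counterparts of gather and spread. The toy tibble below is made up for illustration, and grouping by id as well as startdate and adding a totalCount column are assumptions based on what the question asks for, not something the answer above does.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same column naming scheme as test1
toy <- tibble(
  startdate = c("2019-11-06", "2019-11-06", "2019-11-27"),
  id        = c("POL55", "POL55", "POL56"),
  m0_9      = c(32, 3, NA),
  f0_9      = c(NA, 7, 9),
  m70       = c(1, NA, 2)
)

# Long format: one row per non-missing count, with sex and age range split
# out of the column name (end_age has no digits for open-ended columns like m70)
toy_long <- toy %>%
  pivot_longer(
    cols = -c(startdate, id),
    names_to = c("sex", "start_age", "end_age"),
    names_pattern = "(m|f)(\\d+)_?(\\d*)",
    values_to = "count",
    values_drop_na = TRUE
  )

# Sum counts per start date, id and age category, with one column per sex
toy_counts <- toy_long %>%
  count(startdate, id, start_age, end_age, sex, wt = count) %>%
  pivot_wider(names_from = sex, values_from = n, values_fill = 0) %>%
  mutate(totalCount = m + f)

# Number of age categories, looked up from the data instead of hard coded
n_age_categories <- n_distinct(toy_long$start_age)

toy_counts
```

Because the age categories are read from the column names, nothing about their number has to be hard coded. If a nested list is still needed, the split and lapply steps from the answer above can be applied to toy_counts in the same way.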

Filter variable based on NA > 20% in a range—R

How would I create a filter variable that flags cases with more than 20% missing values across a range of items? For example, if more than 20% of the variables mssi1_1:mssi1_4 are NA for a case, then filter it out.
A reproducible dataset:
df2 <- structure(list(uci = c("10001h", "10476h", "10484h", "10580h",
"14280h", "2313n", "2778n", "3063n", "3579h", "3699h", "4090h",
"4393h", "4412h", "4528h", "4582h", "4683h", "4735h", "4736h",
"4745h", "4750h", "4756h", "4770h", "4771h", "4832h", "4872h",
"517n", "6292h", "6309h", "6481h", "6601h", "6704h", "6948h",
"7020h", "7030h", "7071h", "7160h", "7188h", "7235h", "7266h",
"7348h", "7746h", "7810h", "8082h", "8119h", "8334h", "8345h",
"8462h", "8486h", "8518h", "8578h", "8761h", "8799h", "8939h",
"9046h", "9191h", "9194h", "9222h", "9273h", "9293h", "9448h",
"9486h", "9757h", "9894h", "10268h", "10431h", "10498h", "10572h",
"10622h", "10652h", "10660h", "14457h", "2420n", "2966n", "3006n",
"3766h", "4219h", "4256h", "4366h", "4367h", "4534h", "4538h",
"4543h", "4569h", "4570h", "4757h", "4769h", "4806h", "4843h",
"4955h", "4958h", "50n", "601h", "603n", "6315h", "6340h", "6348h",
"6358h", "6369h", "6379h", "6395h"), ID = c(1, 5, 6, 13, 20,
28, 32, 36, 44, 48, 55, 69, 72, 80, 92, 107, 114, 115, 116, 117,
118, 124, 125, 131, 135, 154, 158, 160, 179, 185, 193, 214, 218,
220, 223, 232, 236, 240, 242, 248, 285, 288, 308, 313, 330, 332,
341, 345, 350, 354, 369, 372, 379, 389, 403, 404, 405, 412, 413,
421, 425, 445, 456, 2, 3, 7, 11, 14, 17, 18, 23, 30, 34, 35,
50, 59, 61, 66, 67, 83, 85, 87, 90, 91, 119, 123, 127, 133, 148,
149, 153, 156, 157, 162, 165, 166, 167, 169, 170, 173), Class = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), age = c(14, 17,
14, 14, 15, 14, 16, 20, 12, 16, 12, 15, 15, 12, 16, 17, 14, 14,
13, 13, 14, 14, 23, 12, 15, 15, 14, 13, 17, 22, 15, 17, 22, 14,
15, 15, 23, 15, 17, 12, 24, 15, 13, 13, 14, 17, 13, 21, 14, 14,
15, 13, 21, 14, 21, 15, 15, 14, 16, 13, 12, 12, 12, 14, 17, 16,
16, 15, 15, 13, 14, 20, 24, 15, 15, 14, 17, 14, 16, 15, 15, 17,
14, 15, 13, 19, 19, 14, 16, 16, 22, 21, 23, 19, 15, 15, 14, 14,
15, 24), sex = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1,
0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
1), bhsMean = c(0.47, 0.3, 0.16, 0.15, 0.35, 0.06, 0.25, 0.35,
0.15, 0.35, 0.3, 0, 0.3, 0.38, 0.3, 0.1, 0.2, 0.1, 0.25, 0.2,
0.3, 0.4, 0.3, 0.4, 0.2, 0.2, 0.05, 0.1, 0.35, 0.1, 0, 0.25,
0.2, 0.25, 0.05, 0.35, 0.3, 0.25, 0.2, 0.27, 0.35, 0.15, 0.25,
0.1, 0.2, 0.25, 0.05, 0.1, 0.45, 0.3, 0.36, 0.3, 0.44, 0.15,
0.2, 0.11, 0.25, 0.2, 0.05, 0.45, 0, 0.4, 0.25, 0.6, 0.6, 0.55,
0.71, 0.67, 0.5, 0.5, 0.55, 0.68, 0.55, 0.4, 0.68, 0.5, 0.6,
0.53, 0.6, 0.65, 0.53, 0.65, 0.65, 0.65, 0.6, 0.55, 0.5, 0.55,
0.6, 0.75, 0.65, 0.45, 0.5, 0.5, 0.65, 0.45, 0.6, 0.65, 0.65,
0.45), tbMean = c(2.56, 3.89, 2.67, 2.33, 4.89, 1.44, 2.44, 2.44,
NA, NA, NA, NA, NA, 3.44, 1.22, 3.11, 4, 4.11, 3, 2, 2.78, 2.67,
3.44, 3.33, 3.33, 3.78, 3.89, 2.11, 4.56, 4, 1, 3.22, 3.33, 2.89,
1.44, 3.11, 2.67, 3.33, 3.44, 1.33, 2.78, 2.67, 3.33, 2, 2.44,
3.89, 2.44, 3.78, 3.67, 3.56, 3.56, 3.78, 1.78, 2.11, 3.33, 3.11,
2.67, 2.44, 3.56, 1.67, NA, 2.67, 4.44, 4.89, 4.56, 3.89, 4.44,
4.11, 3.67, 3.44, 4.44, 5, 3.78, 4.78, NA, NA, NA, NA, NA, 3.44,
4, 4.56, 4.11, 4, 3.78, 5.11, 3.56, 2.89, 3.11, 3.11, 4.33, 3.56,
5.11, 3.33, 4.11, 4.44, 4.67, 4, 4.56, 4.67), pbMean = c(2, 3.67,
4, 4.5, 2.17, 1, 3.5, 2.33, NA, NA, NA, NA, NA, 1.5, 3.67, 3,
3.5, 2.5, 2.17, 2, 1, 3.67, 2.33, 1.67, 2, 2, 3.17, 2.17, 1,
3.83, 1, 2.33, 2.67, 3, 1, 3.33, 2, 3, 1.83, 1.17, 1, 2, 2.33,
2.17, 2.17, 2.83, 2.67, 2.67, 1, 2.17, 1.67, 3.33, 1.33, 2.17,
2.17, 1.17, 2.33, 1.83, 2.17, 1, NA, 1.5, 1.2, 3.17, 4.67, 1.33,
2.83, 2.67, 2, 4.33, 3, 3, 5, 3.33, NA, NA, NA, NA, NA, 4.5,
1.5, 4, 5.17, 3.33, 3.33, 3.67, 4.5, 2, 3.17, 3.67, 4.83, 4.33,
3.67, 3.83, 5.17, 3, 2.33, 2.33, 4, 1.33), acssMean = c(2.29,
1.86, 1.14, 2, 1.14, NA, 2, 3.29, NA, NA, NA, NA, NA, NA, 1.57,
2.33, 3.43, 0.14, 1.43, 1.57, 2.29, 1.29, 0.29, 1.43, 0.57, 0.43,
2.29, NA, 2.57, 1.71, 2.43, 1.43, 2.71, 2.29, 2.29, 1.86, 0.86,
3.71, 1.57, NA, 2.29, 1, 2.71, 2, 0, 1.43, 2.71, NA, NA, NA,
1.86, NA, 1.83, 2, 3.43, 0, 3.43, 0.86, NA, NA, NA, 2.14, NA,
3.43, 4, 3.14, 3.29, 2.83, 1.71, 1.86, 2.14, 1.33, 1.71, 1.57,
NA, NA, NA, NA, NA, 2.71, 1.29, 3.57, 2.29, 0.14, 1.71, 0.14,
2.86, 2.71, 1.43, 1.71, 0.86, 2.33, 2.43, 1.71, 2.57, 1.14, 3.43,
2.86, 3.57, 1.86), mssi1_1 = c(NA, 0, 0, 1, 1, 0, 2, 2, 0, 0,
0, 0, 0, NA, 0, 1, 0, 0, 0, 0, 0, 1, NA, 0, 0, 1, 1, 1, 2, 1,
0, 0, NA, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0,
0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 2, 1, 1, 2, 1,
1, 1, NA, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 1, 0, 1, 1, 2,
0, 1, 1, 1, 0, 0, 0, 1, 1), mssi1_2 = c(NA, 1, 0, 1, 1, 0, 1,
2, 1, 1, 0, 0, 2, NA, 1, 1, 0, 0, 0, 0, 0, 1, NA, 0, 0, 2, 1,
0, 2, 1, 0, 1, NA, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 2, 1,
1, 1, 0, 1, 1, NA, 0, 0, 0, 0, 2, 1, 1, 1, 2, 0, 1, 0, 1, 1,
0, 0, 2, 0, 1, 1, 1, 0, 0, 0, 1, 1), mssi1_3 = c(NA, 0, 0, 0,
0, 0, 2, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0,
0, 0, 0, 0, 2, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0), mssi1_4 = c(NA,
0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0,
2, NA, 0, 0, 0, 0, 0, 1, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, NA, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), mssi1_5 = c(NA,
NA, NA, NA, NA, NA, 2, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, 1, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, 0, NA, NA,
NA, 0, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA,
2, NA, NA, NA, NA, NA, NA, NA, 2, NA, 1, NA, 2, NA, NA, NA, NA,
NA), mssi1_6 = c(NA, NA, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA,
0, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA,
1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
0, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA,
NA, NA, NA, NA, NA, NA, 1, NA, NA, 1, 0, NA, NA, NA, NA, NA,
NA, NA, 3, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 2, NA,
1, NA, 0, NA, NA, NA, NA, NA), mssi1_7 = c(NA, NA, NA, NA, NA,
NA, 2, 1, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA,
2, NA, NA, NA, 2, NA, NA, 2, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 2, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA,
2, 1, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, 1, NA, NA, NA,
NA, NA, NA, NA, 2, NA, 1, NA, 2, NA, NA, NA, NA, NA), mssi1_8 = c(NA,
NA, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA, 0, NA, NA, NA, NA,
NA, NA, NA, NA, 0, NA, NA, NA, 1, NA, NA, 2, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 1, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA,
NA, 3, NA, NA, 1, 1, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA,
1, NA, NA, NA, NA, NA, NA, NA, 0, NA, 2, NA, 1, NA, NA, NA, NA,
NA), mssi1_9 = c(NA, NA, NA, NA, NA, NA, 2, 2, NA, NA, NA, NA,
2, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA,
2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA,
NA, NA, NA, NA, NA, NA, 3, NA, NA, 1, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 2, NA,
1, NA, 1, NA, NA, NA, NA, NA), mssi1_10 = c(NA, NA, NA, NA, NA,
NA, 2, 0, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA, NA, NA,
0, NA, NA, NA, 1, NA, NA, 2, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA,
1, 2, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, 2, NA, NA, NA,
NA, NA, NA, NA, 0, NA, NA, NA, 1, NA, NA, NA, NA, NA), mssi1_11 = c(NA,
NA, NA, NA, NA, NA, 3, 1, NA, NA, NA, NA, 2, NA, NA, NA, NA,
NA, NA, NA, NA, 0, NA, NA, NA, 0, NA, NA, 2, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA,
NA, 0, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,
0, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA,
NA, NA), mssi1_12 = c(NA, NA, NA, NA, NA, NA, 1, 2, NA, NA, NA,
NA, 3, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, 0, NA,
NA, 2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0,
NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, 1, 1, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 1,
NA, NA, NA, 0, NA, NA, NA, NA, NA), mssi1_13 = c(NA, NA, NA,
NA, NA, NA, 1, 3, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA, NA,
NA, NA, 0, NA, NA, NA, 1, NA, NA, 2, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 1, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 3,
NA, NA, 1, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1,
NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, 0, NA, NA, NA, NA,
NA), mssi1_14 = c(NA, NA, NA, NA, NA, NA, 1, 0, NA, NA, NA, NA,
0, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 1, NA, NA,
1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
0, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA,
NA, NA, NA, NA, NA, NA, 1, NA, NA, 0, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 1, NA,
0, NA, 0, NA, NA, NA, NA, NA), mssi1_15 = c(NA, NA, NA, NA, NA,
NA, 0, 0, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA,
1, NA, NA, NA, 0, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA,
0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA,
NA, NA, NA, NA, NA, 0, NA, 0, NA, 0, NA, NA, NA, NA, NA), mssi1_16 = c(NA,
NA, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, NA, NA, NA, NA,
NA, NA, NA, NA, 0, NA, NA, NA, 0, NA, NA, 0, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA,
NA, 0, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0, NA, NA, NA, NA, NA, NA, NA, 0, NA, 0, NA, 0, NA, NA, NA,
NA, NA), mssi1_17 = c(NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA,
NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 3, NA,
NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0,
NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, 0, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 0,
NA, 0, NA, 0, NA, NA, NA, NA, NA), mssi1_18 = c(NA, NA, NA, NA,
NA, NA, 0, 0, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA,
NA, 0, NA, NA, NA, 0, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 0, NA,
NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 0, NA,
NA, NA, NA, NA, NA, NA, 0, NA, 0, NA, 0, NA, NA, NA, NA, NA)), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
I would hypothetically want to be able to apply this to any number of variables, but I do not know where to start. Would I first define some cases for the variables of interest, e.g.:
case1 <- vars(mssi1_1:mssi1_4)
case2 <- vars(mssi1_5:mssi1_18)
Again, I do not really know where to start. Apologies in advance if any of this is confusing. Please let me know if you need more info.
library(dplyr)
df2 %>%
mutate(missing_perc = rowMeans(is.na(select(., mssi1_1: mssi1_4))) * 100)
Output is:
uci ID Class age sex bhsMean tbMean pbMean acssMean mssi1_1 mssi1_2 mssi1_3 mssi1_4 missing_perc
1 10001h 1.00 1.00 14.0 0 0.470 2.56 2.00 2.29 NA NA NA NA 100
2 10476h 5.00 1.00 17.0 0 0.300 3.89 3.67 1.86 NA NA 0 0 50.0
3 10484h 6.00 1.00 14.0 0 0.160 2.67 4.00 1.14 0 0 0 0 0
4 10580h 13.0 1.00 14.0 0 0.150 2.33 4.50 2.00 1.00 1.00 0 0 0
5 14280h 20.0 1.00 15.0 0 0.350 4.89 2.17 1.14 1.00 1.00 0 0 0
6 2313n 28.0 1.00 14.0 0 0.0600 1.44 1.00 NA 0 0 0 0 0
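To apply the row-wise calculation above to more than one set of columns, one option is to keep the sets in a named list and loop over it. This is a minimal base-R sketch rather than part of the answer itself: the split into case1 and case2 and the missing_perc_* column names are made up for illustration, and only the four mssi1 columns of the sample data are reproduced.

```r
# Subset of the sample data: only the four mssi1 columns are needed here.
df2 <- data.frame(
  mssi1_1 = c(NA, NA, 0, 1, 1, 0),
  mssi1_2 = c(NA, NA, 0, 1, 1, 0),
  mssi1_3 = c(NA, 0, 0, 0, 0, 0),
  mssi1_4 = c(NA, 0, 0, 0, 0, 0)
)

# Hypothetical column sets; replace these with the real groups of interest.
cases <- list(
  case1 = c("mssi1_1", "mssi1_2"),
  case2 = c("mssi1_3", "mssi1_4")
)

# One new column per set: percentage of missing values in each row.
for (case_name in names(cases)) {
  cols <- cases[[case_name]]
  df2[[paste0("missing_perc_", case_name)]] <- rowMeans(is.na(df2[cols])) * 100
}

print(df2)
```

Because the sets are plain character vectors, adding a third or fourth group only means adding another entry to the list.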
Sample data:
df2 <- structure(list(uci = c("10001h", "10476h", "10484h", "10580h",
"14280h", "2313n"), ID = c(1, 5, 6, 13, 20, 28), Class = c(1,
1, 1, 1, 1, 1), age = c(14, 17, 14, 14, 15, 14), sex = c(0, 0,
0, 0, 0, 0), bhsMean = c(0.47, 0.3, 0.16, 0.15, 0.35, 0.06),
tbMean = c(2.56, 3.89, 2.67, 2.33, 4.89, 1.44), pbMean = c(2,
3.67, 4, 4.5, 2.17, 1), acssMean = c(2.29, 1.86, 1.14, 2,
1.14, NA), mssi1_1 = c(NA, NA, 0, 1, 1, 0), mssi1_2 = c(NA,
NA, 0, 1, 1, 0), mssi1_3 = c(NA, 0, 0, 0, 0, 0), mssi1_4 = c(NA,
0, 0, 0, 0, 0)), .Names = c("uci", "ID", "Class", "age",
"sex", "bhsMean", "tbMean", "pbMean", "acssMean", "mssi1_1",
"mssi1_2", "mssi1_3", "mssi1_4"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I like to use the tidyverse approach:
Using your dataset:
require(tidyverse)
df2 %>%
#Make it tidy. I assumed that uci and ID are keys in your data.
gather(variable, value, -uci, -ID) %>%
#Group the data by variable.
group_by(variable) %>%
#Calculating new variables based on the grouping: missing, valid and percent missing
#for each variable
mutate(Missing = sum(is.na(value)),
Valid = sum(!is.na(value)),
percentMissing = Missing/(Missing+Valid)) %>%
#Filtering OUT variables with more than 20% missing.
filter(percentMissing < 0.2)
Edit
If you'd like you can use spread to go back to wide format.
Same script with new lines for spread and select to remove the new variables:
df2 %>%
#Make it tidy. I assumed that uci and ID are keys in your data.
gather(variable, value, -uci, -ID) %>%
#Group the data by variable.
group_by(variable) %>%
#Calculating new variables based on the grouping: missing, valid and percent missing
#for each variable
mutate(Missing = sum(is.na(value)),
Valid = sum(!is.na(value)),
percentMissing = Missing/(Missing+Valid)) %>%
#Filtering OUT variables with more than 20% missing.
filter(percentMissing < 0.2) %>%
#Going back to the wide format, and removing the new variables
#Remove variables
select(-Missing, -Valid, -percentMissing) %>%
#Back to wide format
spread(variable, value)
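For comparison, the same column filter can be sketched in base R without reshaping. This is an alternative to the gather/spread approach above, not a restatement of it. It assumes, as the answer does, that uci and ID are key columns that should always be kept, and it uses only a few columns of the sample data.

```r
# A few columns of the sample data, enough to show columns being kept and dropped.
df2 <- data.frame(
  uci = c("10001h", "10476h", "10484h", "10580h", "14280h", "2313n"),
  ID = c(1, 5, 6, 13, 20, 28),
  acssMean = c(2.29, 1.86, 1.14, 2, 1.14, NA),
  mssi1_1 = c(NA, NA, 0, 1, 1, 0),
  mssi1_3 = c(NA, 0, 0, 0, 0, 0)
)

key_cols <- c("uci", "ID")                    # always kept (assumed keys)
value_cols <- setdiff(names(df2), key_cols)   # columns subject to the filter

# Fraction of missing values per column.
frac_missing <- colMeans(is.na(df2[value_cols]))

# Keep the keys plus every column with less than 20% missing.
keep <- value_cols[frac_missing < 0.2]
df2_filtered <- df2[c(key_cols, keep)]

print(names(df2_filtered))
```

Here mssi1_1 has two missing values out of six and is dropped, while acssMean and mssi1_3 each have one out of six and are kept.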

How to further format forest Plots in R, from the metafor package?

I'm quite new to R and have been struggling with properly formatting a forest plot I've created.
When I click the "zoom" option in R to open the graph in a new window, it looks as such:
Forest Plot Currently
My main goal is to get the forest plot as compact as possible, i.e. publication quality/style. I currently have far too much white space in my plot. I think it has something to do with me messing around with the par() function, and I now have no clue how to revert to the defaults.
#Metafor library
library(metafor)
#ReadXL library to import excel sheet
library(readxl)
#Name the data sheet from the excel file
ACDF<- read_excel("outpatient_ACDF_meta_analysis.xlsx")
#View the data sheet with view(ACDF)
par(mar=c(20,1,1,1))
#The line below computes log odds ratios (measure="OR"). If you want log risk ratios instead, use measure="RR"
returnop <- escalc(measure="OR", ai=op_return_OR, bi=op_no_return_OR, ci=ip_return_OR, di=ip_no_return_OR, data=ACDF)
#Generate a Random Effects Model
REmodel<-rma(yi=yi, vi=vi, data=returnop, slab=paste(Author, Year, sep=", "), method="REML")
#Generate a forest plot of the data
forest(REmodel, xlim=c(-17, 6),
ilab=cbind(ACDF$op_return_OR, ACDF$op_no_return_OR, ACDF$ip_return_OR, ACDF$ip_no_return_OR),
ilab.xpos=c(-9.5,-8,-6,-4.5), cex=.75, ylim=c(-1, 27),
psize=1)
### add column headings to the plot
text(c(-9.5,-8,-6,-4.5), 26, c("Return+", "Return-", "Return+", "Return-"))
text(c(-8.75,-5.25), 27, c("Outpatient", "Inpatient"))
text(-16, 26, "Study", pos=4)
text(6, 26, "Log Odds Ratio [95% CI]", pos=2)
I'm not 100% sure how to provide my data otherwise, but I used the dput function to provide it as follows. Apologies for the NAs, I'm still fleshing out the data for the future.
structure(list(Study = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), Author = c("Stieber", "Villavicencio",
"Lied", "Liu", "Garringer", "Joseffer", "Trahan", "Lied", "Sheperd",
"Talley", "Martin", "McGirt", "Adamson", "Fu", "Arshi", "Khanna",
"McClelland", "Purger", "McLellend2", NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Year = c(2005, 2007,
2007, 2009, 2010, 2010, 2011, 2012, 2012, 2013, 2015, 2015, 2016,
2017, 2017, 2017, 2017, 2017, 2017, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), op_return_OR = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1, 3, 2, 16, 257, 7, NA, 5, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), op_no_return_OR = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
596, 769, 992, 4581, 958, 1749, NA, 3120, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ip_return_OR = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 8, 9, 2, 257, 2034, 12, NA,
200, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), ip_no_return_OR = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 589, 641, 482, 16171, 8930, 1744, NA, 46312, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), op_death = c(NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, 1, NA,
1, 0, NA, 2, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), op_no_death = c(NA, NA, NA, 45, NA,
NA, NA, NA, NA, NA, 596, NA, 993, 4597, NA, 1754, NA, 3125, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), ip_death = c(NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, 0, NA,
0, 42, NA, 2, NA, 20, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), ip_no_death = c(NA, NA, NA, 64,
NA, NA, NA, NA, NA, NA, 597, NA, 484, 16386, NA, 1754, NA, 46492,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
2979.79797979798), op_thrombo = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0, NA, NA, 8, 20, 4, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), op_no_thrombo = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 597, NA, NA, 4589, 1195,
1752, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), ip_thrombo = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 2, NA, NA, 67, 150, 4, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ip_no_thrombo = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 595, NA, NA, 16361, 10814,
1752, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), op_stroke = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0, NA, NA, 2, 12, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), op_no_stroke = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 597, NA, NA, 4595, 1203,
1756, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), ip_stroke = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 2, NA, NA, 14, 132, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ip_no_stroke = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 595, NA, NA, 16414, 10832,
1756, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), op_dysphagia = c(NA, NA, NA, 0, NA, NA,
NA, NA, NA, NA, NA, NA, 11, NA, NA, NA, NA, 2, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), op_no_dysphagia = c(NA,
NA, NA, 45, NA, NA, NA, NA, NA, NA, NA, NA, 618, NA, NA, NA,
NA, 49, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), ip_dysphagia = c(NA, NA, NA, 1, NA, NA, NA, NA,
NA, NA, NA, NA, 1, NA, NA, NA, NA, 59, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ip_no_dysphagia = c(NA,
NA, NA, 63, NA, NA, NA, NA, NA, NA, NA, NA, 273, NA, NA, NA,
NA, 2917, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), op_hematoma = c(NA, NA, NA, 0, NA, NA, NA, NA,
NA, NA, NA, NA, 1, NA, NA, NA, 1, 4, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), op_no_hematoma = c(NA,
NA, NA, 45, NA, NA, NA, NA, NA, NA, NA, NA, 629, NA, NA, NA,
2015, 47, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), ip_hematoma = c(NA, NA, NA, 1, NA, NA, NA, NA,
NA, NA, NA, NA, 1, NA, NA, NA, 273, 65, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ip_no_hematoma = c(NA,
NA, NA, 63, NA, NA, NA, NA, NA, NA, NA, NA, 273, NA, NA, NA,
7791, 1713, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA)), .Names = c("Study", "Author", "Year", "op_return_OR",
"op_no_return_OR", "ip_return_OR", "ip_no_return_OR", "op_death",
"op_no_death", "ip_death", "ip_no_death", "op_thrombo", "op_no_thrombo",
"ip_thrombo", "ip_no_thrombo", "op_stroke", "op_no_stroke", "ip_stroke",
"ip_no_stroke", "op_dysphagia", "op_no_dysphagia", "ip_dysphagia",
"ip_no_dysphagia", "op_hematoma", "op_no_hematoma", "ip_hematoma",
"ip_no_hematoma"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-35L))
The par option looks ok to me. I changed the ylim option and modified the y location and size of some of the header text as below:
#Generate a forest plot of the data
forest(REmodel, xlim=c(-17, 6),
ylim=c(-1, 10),
ilab=cbind(ACDF$op_return_OR, ACDF$op_no_return_OR, ACDF$ip_return_OR,
ACDF$ip_no_return_OR),
ilab.xpos=c(-9.5,-8,-6,-4.5), cex=.75,
psize=1)
### add column headings to the plot
text(c(-9.5,-8,-6,-4.5), 8.5, c("Return+", "Return-", "Return+", "Return-"),
cex = 0.65)
text(c(-8.75,-5.25), 9.5, c("Outpatient", "Inpatient"))
text(-17, 8.5, "Study", pos=4)
text(6, 8.5, "Log Odds Ratio [95% CI]", pos=2)
This gives the following plot:
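On the separate worry about reverting par() to its defaults: par() returns the previous values of whatever it changes, so they can be saved and restored afterwards. The sketch below is a generic base-R illustration, not something taken from the answer above. It uses pdf(NULL) as an off-screen device so it runs without opening a window, and the variable names are made up.

```r
pdf(NULL)  # off-screen device, so no plot window or file is created

# par() returns the old values of the parameters it changes.
old_par <- par(mar = c(20, 1, 1, 1))
changed_mar <- par("mar")

# ... plotting code such as forest() would go here ...

# Restore the saved settings.
par(old_par)
restored_mar <- par("mar")

invisible(dev.off())

print(changed_mar)
print(restored_mar)
```

Closing the device with dev.off() also works as a reset, since a newly opened device starts with the default settings again.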

Subset xts object using vector of unique index days

I'm trying to subset an xts object using a vector of xts timestamps that have been processed into a vector of unique timestamps. This follows on from this previous question that was only partially answered.
Some sample data:
dput(sample.data.merge, control="all")
structure(c(11.65, 11.13, 11.13, 11.5, 11.8, 11.45, 11.45, 11.08,
11.08, 11.25, 9.8, 10.45, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9,
10.45, 10.5, 10.5, 10.08, 10.08, 10.65, 10.08, 10.65, 10.6, 10.65,
10.65, 10.085, 10.145, 11.9, 11.085, 9.35, 9.15, 9.15, 9.9, 9.0875,
9.3, 9.3, 9.3, 9.35, 9.35, 9.35, 9.25, 9.5, 9.45, 9.3, 11.15,
11.15, 11.15, 11.15, 11.8, 8, 10.05, 10.05, 10.25, 10.4, 10.15,
10.15, 10.3, 10.15, 10.1, 11.08, 11.08, 11.08, 11.65, 11.85,
11.9, 11.9, 11.9, 12.65, 13.35, 13.35, 15.95, 15.9, 15.4, 15.4,
15.4, 15.4, 15.13, 12.13, 12.35, 11.082, 11.082, 11.08, 12.1,
12.3, 12.3, 12.4, 12.6, 12.6, 12.13, 12.45, 12.9, 12.9, 12.9,
14, 12.6, 12.6, 12.45, 15.25, 12.085, 12.95, 12.95, 12.35, 12.13,
12.8, 14, 14, 12.45, 12.45, 12.45, 12.45, 12.25, 12.6, 12.085,
15.1, 15.15, 15.35, 15.3, 12.5, 12.5, 12.15, 12.2, 11.085, 11.35,
11.45, 11.13, 11.13, 11.35, 11.2, 12.5, 12.6, 12.95, 12.95, 12.5,
12.45, 12.3, 12.3, 12.3, 12.45, 12.45, 12.45, 12.5, 12.45, 12.45,
12.13, 12.13, 12.65, 190, 190, 190, 190, 130, 190, 190, 190,
190, 190, 130, 190, 130, 130, 445, 445, 445, 445, 130, 445, 190,
445, 445, 190, 190, 190, 190, 130, 190, 190, 190, 190, 190, 190,
190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190,
190, 275, 190, 190, 190, 190, 190, 190, 190, 190, 190, 130, 130,
190, 190, 190, 130, 130, 130, 190, 130, 190, 190, 190, 130, 190,
190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190,
1190, 190, 190, 130, 130, 130, 190, 1130, 190, 190, 130, 190,
190, 190, 190, 190, 190, 130, 130, 190, 190, 375, 190, 190, 190,
130, 190, 130, 190, 190, 190, 190, 130, 190, 190, 190, 190, 190,
190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190,
130, 130, 130, 190, 130, 190, 190, 190, 130, 130, 445, 445, 130,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0, 0, NA, NA, NA, NA, NA, 0.21, 0.21, 0.26, 0.0250000000000004,
0, 0.0250000000000004, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0.0249999999999995, 0.0250000000000004, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0250000000000004,
0.100000000000001, 0.39, NA, NA, NA, NA, NA, 0.0250000000000004,
NA, NA, NA, NA, NA, 0.524999999999999, 0.25, 0, 0, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0.149999999999999, 0.135000000000001,
0.149999999999999, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0.409999999999999, 0.375, 0.3, 0.635, 0.385, 0.335, 0.175000000000001,
0, NA, NA, NA, NA, NA, 1.4, 0.2, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0.109999999999999, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0.0749999999999993, 0.0749999999999993, 0.0749999999999993,
0.0250000000000004, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA,
NA, NA, 127.5, 0, 0, 0, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 30, 30, 30, NA, NA, NA, NA,
NA, 0, NA, NA, NA, NA, NA, 0, 0, 0, 0, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 30, 30, 30, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0, 30, 30, 0, 0, 0, 0, 0, NA, NA, NA, NA, NA, 0,
0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0, 0, 30, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 10.9,
10.9, NA, NA, NA, NA, NA, 10.29, 10.29, 10.34, 10.625, 10.65,
10.625, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 9.325,
9.325, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 10.15, 10.225, 10.69, NA, NA, NA, NA, NA, 11.9,
NA, NA, NA, NA, NA, 15.4, 15.4, 15.4, 15.4, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 12.35, 12.35, 12.425, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 12.65, 12.575, 12.875, 12.875, 12.625,
12.625, 12.625, 12.45, NA, NA, NA, NA, NA, 13.85, 15.125, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 11.275, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 12.375, 12.375, 12.375, 12.45, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 445, 445, NA, NA, NA, NA, NA, 317.5, 190, 190, 190, 190,
190, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 190,
190, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 160, 160, 160, NA, NA, NA, NA, NA, 190, NA, NA,
NA, NA, NA, 190, 190, 190, 190, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 160, 160, 160, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 190, 190, 190, 190, 190, 190, 190, 190, NA, NA, NA, NA,
NA, 190, 190, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 190, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 130, 130, 160, 190, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NaN, Inf, NA, NA, NA, NA, NA, 0.999999999999996,
1.71428571428572, 1, 1, NaN, 21.5999999999997, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.00000000000004, 2.99999999999993,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 37.1999999999995, 8.54999999999987, 0.999999999999998,
NA, NA, NA, NA, NA, 29.9999999999996, NA, NA, NA, NA, NA, 0,
0, NaN, Inf, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.66666666666666,
1.62962962962963, 0.166666666666658, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 1.26829268292683, 0.600000000000004,
3.75, 1.77165354330709, 0.454545454545457, 0.522388059701495,
1, NaN, NA, NA, NA, NA, NA, 1.07142857142857, 0.875000000000003,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.681818181818179, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 2, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NaN, Inf, NA, NA, NA, NA, NA, 1, NaN, NaN, Inf, NaN, NaN,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NaN, NaN,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, Inf, NA, NA, NA, NA, NA,
NaN, NaN, NaN, NaN, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1,
1, 32.3333333333333, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NaN, 6.16666666666667, 0, NaN, NaN, Inf, NaN, Inf, NA,
NA, NA, NA, NA, NaN, NaN, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NaN, NA, NA, NA, NA, NA, NA, NA, NA, NA, NaN, Inf, 1, NaN,
NA, NA, NA, NA, NA), .Dim = c(150L, 8L), .Dimnames = list(NULL,
c("price", "volume", "madprice", "madvolume", "medianprice",
"medianvolume", "absdevmadprice", "absdevmadvolume")), index = structure(c(1325584080,
1325594940, 1325594940, 1325604600, 1325759100, 1325762520, 1325762520,
1325769300, 1325769300, 1325848080, 1325864880, 1326128220, 1326196500,
1326196500, 1326196500, 1326196500, 1326196500, 1326196500, 1326209700,
1326279480, 1326283620, 1326288300, 1326288300, 1326289680, 1326289680,
1326289680, 1326292320, 1326294060, 1326294600, 1326297600, 1326387000,
1326456720, 1326467160, 1326711600, 1326723000, 1326724260, 1326809940,
1326814860, 1326885960, 1326885960, 1326889980, 1326894000, 1326895200,
1326895200, 1326898080, 1326986700, 1326987240, 1326992100, 1327072140,
1327328040, 1327328040, 1327328040, 1327417920, 1327423140, 1327424820,
1327425240, 1327483200, 1327496520, 1327570320, 1327570320, 1327575420,
1327588680, 1327588980, 1327595880, 1327595880, 1327595880, 1327664820,
1327674720, 1327680660, 1327680780, 1327680780, 1327683960, 1327914300,
1327914300, 1327915260, 1327918140, 1327924860, 1327924920, 1327924980,
1327924980, 1327927680, 1328013360, 1328014200, 1328025000, 1328025000,
1328026740, 1328089440, 1328091360, 1328091360, 1328110620, 1328111340,
1328111340, 1328112420, 1328113800, 1328193540, 1328194080, 1328194140,
1328196720, 1328274360, 1328274420, 1328278320, 1328519280, 1328520120,
1328520600, 1328520600, 1328524140, 1328527980, 1328531580, 1328540880,
1328540880, 1328547600, 1328547660, 1328547720, 1328547780, 1328607060,
1328608080, 1328618760, 1328623380, 1328623380, 1328625720, 1328631480,
1328717760, 1328717880, 1328793000, 1328797980, 1329132840, 1329210480,
1329215400, 1329215820, 1329215820, 1329219480, 1329223140, 1329300900,
1329301620, 1329315240, 1329315240, 1329388740, 1329389700, 1329390000,
1329390000, 1329390180, 1329391860, 1329391860, 1329391860, 1329402120,
1329467700, 1329467700, 1329469080, 1329469080, 1329471300), tzone = "", tclass = c("POSIXlt",
"POSIXt")), .indexCLASS = c("POSIXlt", "POSIXt"), .indexTZ = "", tclass = c("POSIXlt",
"POSIXt"), tzone = "", class = c("xts", "zoo"))
The code:
sample.data.mergesub <- sample.data.merge['T10:30/T17:30']
sample.data.mergeout <- sample.data.mergesub[ which((sample.data.mergesub$absdevmadprice >=5 & sample.data.mergesub$absdevmadprice < Inf) | (sample.data.mergesub$absdevmadvol>=10 & sample.data.mergesub$absdevmadvol<Inf)),]
sample.data.unique <- unique(.indexday(sample.data.mergeout))
This sample.data.unique is therefore a vector of index days. Question: I'd like to use this to extract the full day of data from the original dataset sample.data in order to later graph the full day of trades, rather than the subset of data. For instance, if Jan 03 2012 10:53:00 meets the conditions of having absdevmadprice >= 5, and less than infinite, then I'd like to return the day (Jan 03 2012) into a vector and use this to subset the original dataset. This would select all observations in that day (so over the whole trading period) and I could then graph this day.
I've tried this code (based on Joshua's answer here) but it doesn't work:
> sample.data.uniquePOS<-sample.data.merge[paste(as.Date(as.POSIXct(sample.data.unique, origin = "1970-01-01 00:00.00 UTC", tz="GMT")))]
It returns simply the column names:
> sample.data.uniquePOS
price volume madprice madvolume medianprice medianvolume absdevmadprice
absdevmadvolume
For info, the structure of the variables:
> str(sample.data.merge)
An ‘xts’ object on 2012-01-03 09:48:00/2012-02-17 09:35:00 containing:
Data: num [1:150, 1:8] 11.6 11.1 11.1 11.5 11.8 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:8] "price" "volume" "madprice" "madvolume" ...
Indexed by objects of class: [POSIXlt,POSIXt] TZ:
xts Attributes:
NULL
> str(sample.data.uniquePOS)
An 'xts' object of zero-width
> str(sample.data.unique)
num 15371
Thanks for the help (and if anyone can explain why the code doesn't work!).
Answer to own question:
Using these posts (Ananda's answer to this, Joshua's answer to this, and the as.Date.numeric function I found out about here) I was able to solve my own problem. This line of code seems to do it:
sample.data.uniquePOS <- sample.data.merge[paste(as.Date.numeric(sample.data.unique, origin= "1970-01-01 00:00.00 UTC", tz="GMT")),]
Can't give a great explanation as to why it works compared to the below, but perhaps as.POSIXct can't take the same format that as.Date.numeric can?
sample.data.uniquePOS <- sample.data.merge[paste(as.Date(as.POSIXct(sample.data.unique, origin = "1970-01-01 00:00.00 UTC", tz="GMT")))]
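A likely explanation for the difference is the unit of the number being converted. .indexday() returns whole days since 1970-01-01, and as.Date() on a number also counts days from the origin, whereas as.POSIXct() on a number counts seconds. The sketch below uses the value 15371 that str(sample.data.unique) printed in the question; it needs only base R, and the variable names are made up for illustration.

```r
day_index <- 15371  # value of sample.data.unique shown by str() in the question

# Interpreted as DAYS since 1970-01-01, which is what as.Date() does with a number.
as_days <- as.Date(day_index, origin = "1970-01-01")

# Interpreted as SECONDS since 1970-01-01, which is what as.POSIXct() does,
# then truncated to a calendar date.
as_seconds <- as.Date(as.POSIXct(day_index, origin = "1970-01-01", tz = "GMT"))

print(as_days)
print(as_seconds)
```

The first conversion gives 2012-02-01, which lies inside the range of the data. The second gives 1970-01-01, because 15371 seconds is only a little over four hours after the origin, so subsetting the xts object with that date matches no rows and only the column names are printed.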

Resources