I would like to identify whether an activity occurs on consecutive days and how often during a week. The starting point is t1, which records the occurrence of an activity at t1_1, t1_2, t1_3 and so on. For example, for id 12 the activity occurred at t1_2, t1_3, t2_2, t3_1, t3_3, t4_2, t5_2, t6_1, t6_2, t6_3 and t7_3. Since activity was reported on all 7 days, I assume the activity occurred consecutively. I would like to identify all ids for which an activity occurred consecutively, together with the sum of occurrences.
Input
id t1_1 t1_2 t1_3 t2_1 t2_2 t2_3 t3_1 t3_2 t3_3 t4_1 t4_2 t4_3 t5_1 t5_2 t5_3 t6_1 t6_2 t6_3 t7_1 t7_2 t7_3
12 0 1 1 0 1 0 1 0 1 0 1 0 0 1 0 1 1 1 0 0 1
123 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1
10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Output
Id Sum
12 11
10 21
Here is an option with rle. Loop over the rows of the dataset with apply (MARGIN = 1), excluding the 'id' column, apply rle and extract the lengths of the runs where 'values' is 1 ('x1'). If the length of 'x1' is 1 or greater than or equal to 7, take the sum (a length of 1 covers the case where every value is 1, i.e. a single run spanning the whole week). Then stack the named list into a two-column data.frame and set the column names ('out'). A quick rle check on the first row is shown after the output.
out <- stack(setNames(apply(df1[-1], 1, function(x) {
  x1 <- with(rle(x), lengths[as.logical(values)])
  if (length(x1) >= 7 | length(x1) == 1) sum(x1)
}), df1$id))[2:1]
names(out) <- c('Id', 'Sum')
out
# Id Sum
#1 12 11
#2 10 21
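To see what the rle step produces, here is a quick check on the first row of df1 (id 12); this is just an illustrative sketch, assuming the df1 defined in the data block below:
# run-length encode the 0/1 pattern of id 12 and keep the lengths of the runs of 1s
x1 <- with(rle(unlist(df1[1, -1])), lengths[as.logical(values)])
x1
# [1] 2 1 1 1 1 1 3 1
# eight separate runs of 1s, so length(x1) >= 7 holds and sum(x1) gives 11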
data
df1 <- structure(list(id = c(12L, 123L, 10L), t1_1 = c(0L, 0L, 1L),
t1_2 = c(1L, 0L, 1L), t1_3 = c(1L, 0L, 1L), t2_1 = c(0L,
1L, 1L), t2_2 = c(1L, 1L, 1L), t2_3 = c(0L, 1L, 1L), t3_1 = c(1L,
0L, 1L), t3_2 = c(0L, 0L, 1L), t3_3 = c(1L, 0L, 1L), t4_1 = c(0L,
1L, 1L), t4_2 = c(1L, 1L, 1L), t4_3 = c(0L, 1L, 1L), t5_1 = c(0L,
1L, 1L), t5_2 = c(1L, 1L, 1L), t5_3 = c(0L, 1L, 1L), t6_1 = c(1L,
0L, 1L), t6_2 = c(1L, 0L, 1L), t6_3 = c(1L, 0L, 1L), t7_1 = c(0L,
1L, 1L), t7_2 = c(0L, 1L, 1L), t7_3 = c(1L, 1L, 1L)),
class = "data.frame", row.names = c(NA,
-3L))
An option using data.table:
melt(DT, id.vars="id")[,
  c("day", "time") := tstrsplit(variable, "_")][
    value==1L, if(all(paste0("t", 1L:7L) %chin% day)) .(Sum=sum(value)), id]
output:
id Sum
1: 10 21
2: 12 11
data:
library(data.table)
DT <- fread("id t1_1 t1_2 t1_3 t2_1 t2_2 t2_3 t3_1 t3_2 t3_3 t4_1 t4_2 t4_3 t5_1 t5_2 t5_3 t6_1 t6_2 t6_3 t7_1 t7_2 t7_3
12 0 1 1 0 1 0 1 0 1 0 1 0 0 1 0 1 1 1 0 0 1
123 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1
10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1")
Explanation:
Convert to long format using melt
use tstrsplit to split the column names into day of the week and time
filter for value==1L and then, for each id, check whether all 7 days are present in the subset before summing (i.e. if(all(paste0("t", 1L:7L) %chin% day)) .(Sum=sum(value))); a sketch of the intermediate long table is shown below
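For reference, the intermediate long table after the melt and tstrsplit steps should look roughly like this (a sketch assuming the DT defined above):
head(melt(DT, id.vars = "id")[, c("day", "time") := tstrsplit(variable, "_")])
#     id variable value day time
# 1:  12     t1_1     0  t1    1
# 2: 123     t1_1     0  t1    1
# 3:  10     t1_1     1  t1    1
# 4:  12     t1_2     1  t1    2
# 5: 123     t1_2     0  t1    2
# 6:  10     t1_2     1  t1    2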
Related
I would like to identify the duration of an activity that starts at t1 and ends at t7. The starting point is t1, which records the occurrence of an activity at t1_1, t1_2, t1_3 and so on. For example, for id 12 the activity occurred at t1_2 and t1_3 (I would like to save this), t2_2 (as there is no activity immediately before or after, I am not interested in this occurrence), t3_1 (same as t2_2), t3_3, t4_2, t5_2, t6_1, t6_2, t6_3 and t7_3. For all ids in which an activity occurred, I would like to identify the start and end, the duration, and the most frequent one.
Input:
id t1_1 t1_2 t1_3 t2_1 t2_2 t2_3 t3_1 t3_2 t3_3 t4_1 t4_2 t4_3 t5_1 t5_2 t5_3 t6_1 t6_2 t6_3 t7_1 t7_2 t7_3
12 0 1 1 0 1 0 1 0 1 0 1 0 0 1 0 1 1 1 0 0 1
123 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1
10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Output for id 12
Id Start/End Duration Frequency
12 t1_1, t1_3 2 1
12 t6_1, t6_3 3 1
One way to do this would be to use a Bioconductor library, but is there a better solution?
Sample data
df1 <- structure(list(id = c(12L, 123L, 10L), t1_1 = c(0L, 0L, 1L),
t1_2 = c(1L, 0L, 1L), t1_3 = c(1L, 0L, 1L), t2_1 = c(0L,
1L, 1L), t2_2 = c(1L, 1L, 1L), t2_3 = c(0L, 1L, 1L), t3_1 = c(1L,
0L, 1L), t3_2 = c(0L, 0L, 1L), t3_3 = c(1L, 0L, 1L), t4_1 = c(0L,
1L, 1L), t4_2 = c(1L, 1L, 1L), t4_3 = c(0L, 1L, 1L), t5_1 = c(0L,
1L, 1L), t5_2 = c(1L, 1L, 1L), t5_3 = c(0L, 1L, 1L), t6_1 = c(1L,
0L, 1L), t6_2 = c(1L, 0L, 1L), t6_3 = c(1L, 0L, 1L), t7_1 = c(0L,
1L, 1L), t7_2 = c(0L, 1L, 1L), t7_3 = c(1L, 1L, 1L)),
class = "data.frame", row.names = c(NA,
-3L))
We convert to 'long' format with pivot_longer, then create a grouping variable with rleid (from data.table) based on runs of identical adjacent elements in 'value', filter the rows where 'value' is 1, and, grouped by 'id' and 'grp', keep only the groups whose count is greater than 1. We then summarise by pasting (str_c) the first and last elements of 'name', take the count (n()), and arrange if necessary. A small illustration of the rleid step follows the code.
library(dplyr)
library(tidyr)
library(stringr)
library(data.table)
df1 %>%
  pivot_longer(cols = -id) %>%
  mutate(grp = rleid(value)) %>%
  filter(as.logical(value)) %>%
  group_by(id, grp) %>%
  filter(n() > 1) %>%
  summarise(Start_End = str_c(first(name), last(name), sep = ", "),
            Duration = n()) %>%
  arrange(id, grp)
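To make the rleid step concrete, here is a minimal illustration on a short 0/1 vector (a sketch, assuming data.table is loaded):
rleid(c(0, 1, 1, 0, 1))
# [1] 1 2 2 3 4
# adjacent identical values share a group id, so each run of 1s becomes its own group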
library('data.table')
# reshape to long format and split the variable names into day ('time') and slot ('subtime')
df1 <- melt(setDT(df1), id.var = 'id')
df1[, c('time', 'subtime') := tstrsplit(as.character(variable), "_", fixed = TRUE)]
# run-length encode within each id/day and keep only runs of 1s longer than 1
df2 <- df1[, rle(value), by = .(id, time)][lengths > 1 & values == 1, ]
# join back to recover the sub-times, then build Start/End and Duration per id/day
df3 <- df1[df2, on = c('id', 'time')]
df3 <- df3[, .(`Start/End` = paste0(time, '_', c(min(subtime), max(subtime)), collapse = " - "),
               Duration = unique(lengths)),
           by = .(id, time)]
df3[, Frequency := .N, by = .(id, `Start/End`)]
df3[, time := NULL]
df3[order(id), ]
# id Start/End Duration Frequency
# 1: 10 t1_1 - t1_3 3 1
# 2: 10 t2_1 - t2_3 3 1
# 3: 10 t3_1 - t3_3 3 1
# 4: 10 t4_1 - t4_3 3 1
# 5: 10 t5_1 - t5_3 3 1
# 6: 10 t6_1 - t6_3 3 1
# 7: 10 t7_1 - t7_3 3 1
# 8: 12 t1_1 - t1_3 2 1
# 9: 12 t6_1 - t6_3 3 1
# 10: 123 t2_1 - t2_3 3 1
# 11: 123 t4_1 - t4_3 3 1
# 12: 123 t5_1 - t5_3 3 1
# 13: 123 t7_1 - t7_3 3 1
I have a dataframe like this:
> e=read.table("SG.genotypes.txt", header=TRUE)
> head(e)
ID HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
1 snp_3_47609552 0 1 1 1 1 0 1
2 snp_3_47614413 0 1 1 1 1 0 1
3 snp_3_47616151 0 1 1 1 1 0 1
4 snp_2_47616155 0 1 1 1 1 0 1
5 snp_2_47617504 0 1 1 1 1 0 1
6 snp_5_47617679 0 1 1 1 1 0 1
...
My data frame has many more snp_ names, but for this example, how can I split it into 3 output files named chr_2, chr_3 and chr_5,
where the chr_3 file, for example, would contain just these lines:
ID HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
1 snp_3_47609552 0 1 1 1 1 0 1
2 snp_3_47614413 0 1 1 1 1 0 1
3 snp_3_47616151 0 1 1 1 1 0 1
One way to do this would be to split the ID column on the string name and create two columns, but I wonder if there is a better way to do this.
We can substring the 'ID' column and use that to split
lst1 <- split(df1, substr(df1$ID, 1, 5))
Note that if the chromosome number after 'snp_' can be greater than 9, it is better to use sub instead of substr, because substr(ID, 1, 5) would truncate e.g. 'snp_11' to 'snp_1' (see the illustration after the output below).
lst1 <- split(df1, sub("^(snp_\\d+)_.*", "\\1", df1$ID))
names(lst1) <- sub("snp", "chr", names(lst1))
lst1
#$chr_2
# ID HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
#4 snp_2_47616155 0 1 1 1 1 0 1
#5 snp_2_47617504 0 1 1 1 1 0 1
#$chr_3
# ID HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
#1 snp_3_47609552 0 1 1 1 1 0 1
#2 snp_3_47614413 0 1 1 1 1 0 1
#3 snp_3_47616151 0 1 1 1 1 0 1
#$chr_5
# ID HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103
#6 snp_5_47617679 0 1 1 1 1 0 1
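As a quick illustration of why substr() breaks down once the chromosome number has two digits (the second ID here is hypothetical, for demonstration only):
ids <- c("snp_3_47609552", "snp_11_47609552")
substr(ids, 1, 5)
# [1] "snp_3" "snp_1"
# chromosome 11 gets truncated and would be grouped with chromosome 1
sub("^(snp_\\d+)_.*", "\\1", ids)
# [1] "snp_3"  "snp_11"
# the regex keeps the full chromosome number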
Loop through the names of the list and write each element to a .csv file
lapply(names(lst1), function(nm) write.csv(lst1[[nm]],
   file = paste0(nm, ".csv"), quote = FALSE, row.names = FALSE))
data
df1 <- structure(list(ID = c("snp_3_47609552", "snp_3_47614413", "snp_3_47616151",
"snp_2_47616155", "snp_2_47617504", "snp_5_47617679"), HG00096 = c(0L,
0L, 0L, 0L, 0L, 0L), HG00097 = c(1L, 1L, 1L, 1L, 1L, 1L), HG00099 = c(1L,
1L, 1L, 1L, 1L, 1L), HG00100 = c(1L, 1L, 1L, 1L, 1L, 1L), HG00101 = c(1L,
1L, 1L, 1L, 1L, 1L), HG00102 = c(0L, 0L, 0L, 0L, 0L, 0L), HG00103 = c(1L,
1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
I have a data frame containing the results of a multiple choice question. Each item has either 0 (not mentioned) or 1 (mentioned). The columns are named like this:
F1.2_1, F1.2_2, F1.2_3, F1.2_4, F1.2_5, F1.2_99
etc.
I would like to concatenate these values like this: The new column should be a semicolon-separated string of the selected items. So if a row has a 1 in F1.2_1, F1.2_4 and F1.2_5 it should be: 1;4;5
The last digit(s) of the dichotomous columns are the item codes to be used in the string.
Any idea how this could be achieved with R (and data.table)? Thanks for any help!
edit:
Here is an example DF with the desired result:
structure(list(F1.2_1 = c(0L, 1L, 0L, 1L), F1.2_2 = c(1L, 0L,
0L, 1L), F1.2_3 = c(0L, 1L, 0L, 1L), F1.2_4 = c(0L, 1L, 0L, 0L
), F1.2_5 = c(0L, 0L, 0L, 0L), F1.2_99 = c(0L, 0L, 1L, 0L), desired_result = structure(c(3L,
2L, 4L, 1L), .Label = c("1;2;3", "1;3;4", "2", "99"), class = "factor")), .Names = c("F1.2_1",
"F1.2_2", "F1.2_3", "F1.2_4", "F1.2_5", "F1.2_99", "desired_result"
), class = "data.frame", row.names = c(NA, -4L))
F1.2_1 F1.2_2 F1.2_3 F1.2_4 F1.2_5 F1.2_99 desired_result
1 0 1 0 0 0 0 2
2 1 0 1 1 0 0 1;3;4
3 0 0 0 0 0 1 99
4 1 1 1 0 0 0 1;2;3
In a comment, the OP asked how to deal with more than one multiple choice question.
The approach below will be able to handle an arbitrary number of questions and choices for each question. It uses melt() and dcast() from the data.table package.
Sample input data
Let's assume the input data.frame DT for the extended case contains two questions, one with 6 choices and the other with 4 choices:
DT
# F1.2_1 F1.2_2 F1.2_3 F1.2_4 F1.2_5 F1.2_99 F2.7_1 F2.7_2 F2.7_3 F2.7_11
#1: 0 1 0 0 0 0 0 1 1 0
#2: 1 0 1 1 0 0 1 1 1 1
#3: 0 0 0 0 0 1 1 0 1 0
#4: 1 1 1 0 0 0 1 0 1 1
Code
library(data.table)
# coerce to data.table and add row number for later join
setDT(DT)[, rn := .I]
# reshape from wide to long format
molten <- melt(DT, id.vars = "rn")
# alternatively, the measure cols can be specified (in case of other id vars)
# molten <- melt(DT, measure.vars = patterns("^F"))
# split question id and choice id
molten[, c("question_id", "choice_id") := tstrsplit(variable, "_")]
# reshape only selected choices from long to wide format,
# thereby pasting together the ids of the selected choices for each question
result <- dcast(molten[value == 1], rn ~ question_id, paste, collapse = ";",
fill = NA, value.var = "choice_id")
# final join for demonstration only, remove row number as no longer needed
DT[result, on = "rn"][, rn := NULL][]
# F1.2_1 F1.2_2 F1.2_3 F1.2_4 F1.2_5 F1.2_99 F2.7_1 F2.7_2 F2.7_3 F2.7_11 F1.2 F2.7
#1: 0 1 0 0 0 0 0 1 1 0 2 2;3
#2: 1 0 1 1 0 0 1 1 1 1 1;3;4 1;2;3;11
#3: 0 0 0 0 0 1 1 0 1 0 99 1;3
#4: 1 1 1 0 0 0 1 0 1 1 1;2;3 1;3;11
For each question, the final result shows which choices were selected in each row.
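For reference, the intermediate result table returned by dcast() (before the join back to DT) should look roughly like this:
result
#    rn  F1.2     F2.7
# 1:  1     2      2;3
# 2:  2 1;3;4 1;2;3;11
# 3:  3    99      1;3
# 4:  4 1;2;3   1;3;11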
Reproducible data
The sample data can be created with
DT <- structure(list(F1.2_1 = c(0L, 1L, 0L, 1L), F1.2_2 = c(1L, 0L,
0L, 1L), F1.2_3 = c(0L, 1L, 0L, 1L), F1.2_4 = c(0L, 1L, 0L, 0L
), F1.2_5 = c(0L, 0L, 0L, 0L), F1.2_99 = c(0L, 0L, 1L, 0L), F2.7_1 = c(0L,
1L, 1L, 1L), F2.7_2 = c(1L, 1L, 0L, 0L), F2.7_3 = c(1L, 1L, 1L,
1L), F2.7_11 = c(0L, 1L, 0L, 1L)), .Names = c("F1.2_1", "F1.2_2",
"F1.2_3", "F1.2_4", "F1.2_5", "F1.2_99", "F2.7_1", "F2.7_2",
"F2.7_3", "F2.7_11"), row.names = c(NA, -4L), class = "data.frame")
We can try a base R approach:
# multiply each column by its item code (the digits after "_"), so selected
# items keep their code and unselected items become 0, then paste row-wise
j1 <- do.call(paste, c(as.integer(sub(".*_", "",
      names(DF)[-7]))[col(DF[-7])] * DF[-7], sep = ";"))
# strip the zeros and any leftover leading/trailing separators
DF$newCol <- gsub("^;+|;+$", "", gsub(";*0;|0$|^0", ";", j1))
DF$newCol
#[1] "2" "1;3;4" "99" "1;2;3"
I've got genotyping data from several overlapping SNPs/individuals which I am attempting to compare.
As you can see in the data structure below, e[1,2] and e[2,3] are NA. Now I want to replace the corresponding entries d[1,2] and d[2,3] (currently 1) with NA values.
d <- structure(list(`100099681` = c(0L, 2L, 0L), `101666591` = c(1L, 1L, 0L), `102247652` = c(1L, 1L, 1L), `102284616` = c(0L, 1L, 0L), `103582612` = c(0L, 1L, 1L), `104344528` = c(2L, 1L, 0L), `105729734` = c(1L, 0L, 1L), `109897137` = c(0L, 0L, 2L), `112768301` = c(0L, 1L, 1L), `114724443` = c(1L, 1L, 1L), `114826164` = c(1L, 0L, 1L), `115358770` = c(0L, 2L, 0L), `115399788` = c(1L, 1L, 0L), `118669033` = c(0L, 1L, 1L), `118875482` = c(2L, 1L, 0L), `119366362` = c(0L, 2L, 0L), `119627971` = c(0L, 1L, 1L), `120295351` = c(0L, 2L, 0L), `120998030` = c(0L, 0L, 2L)), .Names = c("100099681", "101666591", "102247652", "102284616", "103582612", "104344528", "105729734", "109897137", "112768301", "114724443", "114826164", "115358770", "115399788", "118669033", "118875482", "119366362", "119627971", "120295351", "120998030"), row.names = c("7:100038150_C", "7:100079759_T", "7:100256942_A"), class = "data.frame")
> d
# 100099681 101666591 102247652 102284616 103582612 104344528 105729734 109897137 112768301 114724443 114826164 115358770 115399788 118669033 118875482 119366362 119627971 120295351 120998030
#7:100038150_C 0 1 1 0 0 2 1 0 0 1 1 0 1 0 2 0 0 0 0
#7:100079759_T 2 1 1 1 1 1 0 0 1 1 0 2 1 1 1 2 1 2 0
#7:100256942_A 0 0 1 0 1 0 1 2 1 1 1 0 0 1 0 0 1 0 2
e<- structure(list(`100099681` = c(1L, 1L, 0L), `101666591` = c(NA, 1L, 1L), `102247652` = c(0L, NA, 0L), `102284616` = c(1L, 1L, 0L), `103582612` = c(1L, 0L, 1L), `104344528` = c(1L, 0L, 1L), `105729734` = c(0L, 0L, 1L), `109897137` = c(1L, 1L, 0L), `112768301` = c(0L, 1L, 1L), `114724443` = c(0L, 2L, 0L), `114826164` = c(0L, 0L, 2L), `115358770` = c(0L, 0L, 2L), `115399788` = c(0L, 2L, 0L), `118669033` = c(0L, 0L, 2L), `118875482` = c(0L, 1L, 1L), `119366362` = c(2L, 1L, 0L), `119627971` = c(0L, 1L, 1L), `120295351` = c(0L, 2L, 0L), `120998030` = c(0L, 2L, 1L)), .Names = c("100099681", "101666591", "102247652", "102284616", "103582612", "104344528", "105729734", "109897137", "112768301", "114724443", "114826164", "115358770", "115399788", "118669033", "118875482", "119366362", "119627971", "120295351", "120998030"), row.names = c("7:100038150_C", "7:100079759_T", "7:100256942_A"), class = "data.frame")
> e
# 100099681 101666591 102247652 102284616 103582612 104344528 105729734 109897137 112768301 114724443 114826164 115358770 115399788 118669033 118875482 119366362 119627971 120295351 120998030
#7:100038150_C 1 NA 0 1 1 1 0 1 0 0 0 0 0 0 0 2 0 0 0
#7:100079759_T 1 1 NA 1 0 0 0 1 1 2 0 0 2 0 1 1 1 2 2
#7:100256942_A 0 1 0 0 1 1 1 0 1 0 2 2 0 2 1 0 1 0 1
Thus my expected output would be
> expected_d
# 100099681 101666591 102247652 102284616 103582612 104344528 105729734 109897137 112768301 114724443 114826164 115358770 115399788 118669033 118875482 119366362 119627971 120295351 120998030
#7:100038150_C 0 NA 1 0 0 2 1 0 0 1 1 0 1 0 2 0 0 0 0
#7:100079759_T 2 1 NA 1 1 1 0 0 1 1 0 2 1 1 1 2 1 2 0
#7:100256942_A 0 0 1 0 1 0 1 2 1 1 1 0 0 1 0 0 1 0 2
I've gotten this far:
g <- which(is.na(e), arr.ind=TRUE)
> g
# row col
#7:100038150_C 1 2
#7:100079759_T 2 3
Then I tried to use an apply function to replace those locations with "TEST" (or NA, for that matter):
apply(g, 1, function(x){
e[x[1], x[2]] <- "TEST" }
)
#> apply(g, 1, function(x){ e[x[1], x[2]] <- "TEST" })
#7:100038150_C 7:100079759_T
# "TEST" "TEST"
I will be running this bit of code over several million rows/columns so speed will be an issue.
Thank you in advance:)
We can exploit the fact that NA^TRUE is NA and NA^FALSE is 1, so multiplying d by this mask blanks out exactly the cells where e is missing:
NA^(is.na(e))*d
If memory is an issue, the same idea can be applied column by column with Map:
d[] <- Map(function(x,y) NA^(is.na(y))* x, d, e)
Another way based on your approach,
d[which(is.na(e), arr.ind = T)] <- NA
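A closely related option: is.na(e) is already a logical matrix of the same shape as d, so it can index the data.frame directly for assignment (a minimal sketch, not benchmarked against the options above):
d[is.na(e)] <- NA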
What is the best way to determine a factor or create a new category field based on a number of boolean fields? In this example, I need to count the number of unique combinations of medications.
> MultPsychMeds
ID OLANZAPINE HALOPERIDOL QUETIAPINE RISPERIDONE
1 A 1 1 0 0
2 B 1 0 1 0
3 C 1 0 1 0
4 D 1 0 1 0
5 E 1 0 0 1
6 F 1 0 0 1
7 G 1 0 0 1
8 H 1 0 0 1
9 I 0 1 1 0
10 J 0 1 1 0
Perhaps another way to state it is that I need to pivot or cross tabulate the pairs. The final results need to look something like:
Combination Count
OLANZAPINE/HALOPERIDOL 1
OLANZAPINE/QUETIAPINE 3
OLANZAPINE/RISPERIDONE 4
HALOPERIDOL/QUETIAPINE 2
This data frame can be replicated in R with:
MultPsychMeds <- structure(list(ID = structure(1:10, .Label = c("A", "B", "C",
"D", "E", "F", "G", "H", "I", "J"), class = "factor"), OLANZAPINE = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L), HALOPERIDOL = c(1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), QUETIAPINE = c(0L, 1L, 1L, 1L,
0L, 0L, 0L, 0L, 1L, 1L), RISPERIDONE = c(0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L, 0L, 0L)), .Names = c("ID", "OLANZAPINE", "HALOPERIDOL",
"QUETIAPINE", "RISPERIDONE"), class = "data.frame", row.names = c(NA,
-10L))
Here's one approach using the reshape and plyr packages:
library(reshape)
library(plyr)
#Melt into long format
dat.m <- melt(MultPsychMeds, id.vars = "ID")
#Group at the ID level and paste the drugs together with "/"
out <- ddply(dat.m, "ID", summarize, combos = paste(variable[value == 1], collapse = "/"))
#Calculate a table
with(out, count(combos))
x freq
1 HALOPERIDOL/QUETIAPINE 2
2 OLANZAPINE/HALOPERIDOL 1
3 OLANZAPINE/QUETIAPINE 3
4 OLANZAPINE/RISPERIDONE 4
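For reference, the intermediate out object from the ddply() step holds one combination string per ID; its first rows should look roughly like this:
head(out)
#   ID                 combos
# 1  A OLANZAPINE/HALOPERIDOL
# 2  B  OLANZAPINE/QUETIAPINE
# 3  C  OLANZAPINE/QUETIAPINE
# 4  D  OLANZAPINE/QUETIAPINE
# 5  E OLANZAPINE/RISPERIDONE
# 6  F OLANZAPINE/RISPERIDONE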
Just for fun, a base R solution (that can be turned into a one-liner :-) ):
data.frame(table(apply(MultPsychMeds[, -1], 1, function(currow){
  wc <- which(currow == 1)
  # +1 offset because apply() works on the data without the ID column
  paste(colnames(MultPsychMeds)[wc + 1], collapse = "/")
})))
Another way could be:
subset(
  as.data.frame(
    with(MultPsychMeds, table(OLANZAPINE, HALOPERIDOL, QUETIAPINE, RISPERIDONE)),
    responseName = "count"
  ),
  count > 0
)
which gives
OLANZAPINE HALOPERIDOL QUETIAPINE RISPERIDONE count
4 1 1 0 0 1
6 1 0 1 0 3
7 0 1 1 0 2
10 1 0 0 1 4
It's not exactly the format you want, but it is fast and simple.
There is a shorthand in the plyr package:
require(plyr)
count(MultPsychMeds, c("OLANZAPINE", "HALOPERIDOL", "QUETIAPINE", "RISPERIDONE"))
# OLANZAPINE HALOPERIDOL QUETIAPINE RISPERIDONE freq
# 1 0 1 1 0 2
# 2 1 0 0 1 4
# 3 1 0 1 0 3
# 4 1 1 0 0 1