I'm trying to use the Scalable Bayesian Rule Lists Model for creating some rule lists in R.
Link to package: SBRL Package R
I read data into a list, split into train and test and plug into the function
sbrl_model <- sbrl(data_train,iters=20000, pos_sign="1", neg_sign="0",)
which gives me the following error:
Error in asMethod(object) :
column(s) 1, 2, 4, 6 not logical or a factor. Discretize the columns first.
When I convert the data_train into a factor and try using:
data_train <- sapply(data_train, as.factor)
sbrl_model <- sbrl::sbrl(data_train, iters=20000, pos_sign="1", neg_sign="0",)
I get the following error:
Error in data_train$label : $ operator is invalid for atomic vectors
My data has the following columns:
state, amounts, timestamp, code, risk, vendor, label
The label is 0 or 1. I need to create rules for detecting what data leads to a 1.
I'm new to R, so this is confusing. If I don't convert to factors, it complains; if I do, it can't use the "$" operator. Any ideas what I'm doing wrong? Thank you
> dput(data_train)
structure(c("PR", "PR", "PR", "PR", "MA", "MA", "NH", "NH", "ME",
"ME", "ME", "VT", "VT", "CT", "CT", "NJ", "NJ", "NY", "NY", "NY",
"NY", "NY", "NY", "NY", "PA", "PA", "PA", "PA", "PA", "PA", "PA",
"PA", "PA", "DE", "VA", "VA", "VA", "WV", "WV", "WV", "WV", "WV",
"WV", "WV", "WV", "WV", "WV", "WV", "WV", "WV", "WV", "WV", "WV",
"WV", "WV", "WV", "GA", "GA", "FL", "FL", "FL", "FL", "FL", "FL",
"AL", "AL", "AL", "TN", "TN", "TN", "MS", "MS", "MS", "KY", "KY",
"KY", "KY", "KY", "KY", "KY", "KY", "KY", "OH", "OH", "OH", "OH",
"OH", "OH", "OH", "OH", "OH", "OH", "OH", "OH", "OH", "OH", "IN",
"IA", "IA", "IA", "IA", "WI", "MN", "MN", "MN", "MN", "MN", "SD",
"SD", "ND", "ND", "ND", "ND", "ND", "MO", "MO", "MO", "MO", "MO",
"MO", "MO", "MO", "MO", "MO", "MO", "MO", "KS", "KS", "KS", "KS",
"KS", "KS", "KS", "16441", "92946", "8970", "19937", "94589",
"50615", "75915", "50005", "23037", "14835", "83678", "66263",
"60818", "82760", "42137", "32888", "35385", "20242", "98269",
"16216", "76562", "49327", "30699", "1866", "91301", "75125",
"34016", "88673", "78612", "85008", "91030", "57276", "96772",
"79568", "59489", "14154", "71655", "78163", "41673", "19942",
"19364", "34004", "79349", "1611", "8875", "19673", "5422", "42395",
"11899", "26967", "73499", "79916", "71015", "73640", "39759",
"7735", "84853", "31662", "43183", "44787", "79001", "82999",
"17031", "88109", "62215", "56040", "66592", "59148", "20786",
"30106", "46561", "9125", "83512", "60031", "65233", "49512",
"8893", "46275", "11362", "29867", "61573", "46363", "91510",
"19267", "45554", "41193", "54267", "8045", "28089", "62450",
"69082", "66685", "80769", "15446", "62589", "42875", "74723",
"2934", "18540", "96540", "60812", "50636", "90924", "60556",
"90009", "15287", "35529", "28702", "82102", "96967", "5296",
"64804", "48743", "10867", "60914", "83678", "77883", "97631",
"97175", "48103", "63128", "46774", "18285", "74512", "69313",
"80414", "32394", "51103", "51155", "28672", "38460", "89024",
"49443", "2016-01-23 12:14:07", "2016-01-17 19:22:37", "2016-01-23 22:41:32",
"2016-01-27 09:58:34", "2016-01-30 08:40:06", "2016-01-28 01:41:40",
"2016-01-27 08:22:27", "2016-01-28 00:13:48", "2016-01-20 12:31:12",
"2016-01-17 08:25:30", "2016-01-28 13:01:36", "2016-01-20 12:10:46",
"2016-01-25 07:32:01", "2016-01-23 02:13:11", "2016-01-24 11:14:46",
"2016-01-16 20:59:35", "2016-01-19 20:12:58", "2016-01-19 06:38:06",
"2016-01-27 10:15:48", "2016-01-26 14:00:30", "2016-01-28 01:54:45",
"2016-01-27 05:43:58", "2016-01-25 22:07:06", "2016-01-18 09:58:05",
"2016-01-20 05:56:54", "2016-01-26 08:05:32", "2016-01-28 14:18:45",
"2016-01-22 06:25:48", "2016-01-27 18:05:50", "2016-01-16 11:33:47",
"2016-01-22 03:31:52", "2016-01-23 05:41:37", "2016-01-27 00:55:22",
"2016-01-16 17:19:51", "2016-01-18 10:05:42", "2016-01-22 10:20:16",
"2016-01-26 21:07:20", "2016-01-17 19:12:00", "2016-01-19 17:59:45",
"2016-01-28 08:50:18", "2016-01-16 09:31:52", "2016-01-24 14:50:13",
"2016-01-17 14:02:36", "2016-01-20 17:08:29", "2016-01-25 16:42:03",
"2016-01-19 04:18:27", "2016-01-20 03:05:13", "2016-01-26 23:34:33",
"2016-01-26 13:44:56", "2016-01-16 07:09:41", "2016-01-26 06:43:12",
"2016-01-26 20:22:25", "2016-01-23 05:58:38", "2016-01-19 23:21:00",
"2016-01-16 08:36:10", "2016-01-30 01:21:00", "2016-01-23 11:10:06",
"2016-01-27 15:29:30", "2016-01-30 15:50:38", "2016-01-19 08:32:33",
"2016-01-19 18:18:02", "2016-01-21 14:20:47", "2016-01-17 13:19:59",
"2016-01-20 05:49:06", "2016-01-16 15:54:17", "2016-01-21 09:15:42",
"2016-01-16 07:32:39", "2016-01-28 03:49:00", "2016-01-26 00:19:56",
"2016-01-25 10:29:44", "2016-01-23 06:26:45", "2016-01-29 08:03:34",
"2016-01-22 14:24:34", "2016-01-16 18:44:43", "2016-01-26 00:00:51",
"2016-01-20 17:38:03", "2016-01-17 22:38:47", "2016-01-30 10:12:01",
"2016-01-21 17:00:43", "2016-01-22 08:43:30", "2016-01-27 12:04:58",
"2016-01-25 21:09:40", "2016-01-27 16:35:42", "2016-01-27 20:09:03",
"2016-01-27 09:52:40", "2016-01-26 16:12:37", "2016-01-28 16:57:29",
"2016-01-30 13:48:47", "2016-01-30 19:15:03", "2016-01-24 19:33:56",
"2016-01-28 06:57:55", "2016-01-22 18:21:40", "2016-01-16 02:54:57",
"2016-01-23 08:18:44", "2016-01-20 13:47:54", "2016-01-24 16:23:39",
"2016-01-24 19:15:09", "2016-01-22 14:59:14", "2016-01-30 10:21:43",
"2016-01-27 11:54:39", "2016-01-30 15:19:59", "2016-01-24 19:21:48",
"2016-01-27 07:20:14", "2016-01-25 07:11:55", "2016-01-24 22:33:42",
"2016-01-26 14:30:57", "2016-01-16 13:12:46", "2016-01-28 11:25:45",
"2016-01-28 14:44:25", "2016-01-23 03:25:10", "2016-01-26 13:45:49",
"2016-01-19 06:14:21", "2016-01-25 22:12:29", "2016-01-25 12:13:07",
"2016-01-22 23:56:39", "2016-01-24 07:51:51", "2016-01-24 10:50:30",
"2016-01-21 07:02:41", "2016-01-21 09:52:54", "2016-01-26 22:35:52",
"2016-01-19 06:48:13", "2016-01-19 15:18:21", "2016-01-20 12:20:37",
"2016-01-16 07:04:34", "2016-01-24 10:20:05", "2016-01-25 09:01:09",
"2016-01-21 17:02:29", "2016-01-21 11:52:00", "2016-01-27 19:39:16",
"2016-01-19 18:33:35", "2016-01-18 06:00:23", "2016-01-17 01:27:11",
"2016-01-18 10:27:57", "3355", "4935", "5454", "9555", "5938",
"5855", "4888", "3885", "8533", "4359", "5339", "5554", "5894",
"8598", "5448", "9535", "3495", "3358", "3485", "3344", "8489",
"8553", "3354", "5889", "5948", "8455", "5988", "5595", "9354",
"8485", "4559", "4838", "5585", "5585", "8554", "8598", "5535",
"5355", "5844", "3485", "5885", "8833", "8558", "9889", "9885",
"8555", "3938", "8343", "8558", "5484", "3558", "3545", "8394",
"9933", "3853", "4598", "3855", "5845", "5588", "5495", "8585",
"9584", "3385", "8858", "9445", "8488", "8558", "5838", "5848",
"8845", "8848", "8945", "4599", "8585", "8858", "4598", "5358",
"5395", "9485", "4893", "4455", "8493", "9358", "5395", "8958",
"5888", "8888", "8555", "4885", "3538", "8998", "4445", "4838",
"9885", "3559", "5584", "9594", "8558", "3844", "5434", "8558",
"9898", "4395", "9585", "3858", "4858", "5895", "9383", "9858",
"8385", "5585", "4884", "8359", "8893", "3484", "8383", "5338",
"3544", "9859", "9454", "3539", "3583", "8455", "5983", "4345",
"4943", "5548", "8353", "8993", "8594", "8994", "3958", "3989",
"W sWn ae", "o gogynh ", " ntsnagWe", "aiatteaav", "shiytWngg",
"vvmthethW", "Wynhvrrht", "tttnheviv", "itg oiWhe", "a enotisn",
"ehaothe h", "stmeathng", "i emranth", "tersggtnh", "oeiehvhh ",
"sngeeetvg", "gyyhWatge", "ritnhengs", "etihi s e", "aoeertyWn",
"eeytitys ", "nmnmegome", "n vitsnot", " h i eoht", "ahghtangh",
"ehgn hynh", "ener aeig", "t niaat g", "agtWh eah", "vehi amae",
"enhnnn hg", "ennWhgnea", "tay hnaah", "igntyvrtv", "niesehahn",
" eoavongr", "hi ehhimm", "yovgianWi", "e tnehngg", "eyehtte n",
"at nimnrg", "enesgennW", "mhahnhyet", "tt amtgna", "hehtsoish",
"hyvtanggv", "et v nssn", "inhnahe h", "onahhraWn", "mn iiahsy",
" mymisnsg", "magWoshgr", "i t eneve", "nghy naen", "eyhsyehea",
"i ihntvea", "ththnWyri", "vntv yran", "ynaieere ", "yenre htW",
"ehyWga g ", "ngeagmenh", " nW ytito", "ermhaagvr", "eeWvtr eg",
"etreaehon", "thtWyerme", "hnveWnrta", "htmr ohee", "stitnthsi",
"snthhWh a", "ehhth iny", "shgoovema", " mseynWee", "netmiitnt",
"nvi eao", "t seWWay", "yngnerarm", "ggenitaeh", "n eaogiag",
"mitnetmnh", "not sine ", "ghmhnyhne", "eattnatgh", "vhatngtts",
"tntmegten", "hreyatert", "ggmneheri", "g y en he", "igrt ggrh",
"mehnssith", "gigstgnym", "iathWh ii", "h atynin ", "eiieWmetg",
"noyggtive", " iotneng ", "oveieteen", "shnagrhti", "itooo aWv",
"toreytnny", " henaaWvn", "shehnrh W", "ttrntehgi", "oWait tn ",
"hhshhnthh", "nogeamnme", "iraah thh", "eto ngvgr", "Wno tseie",
"ehnato eW", "anservnhn", "htsyyoarv", "n aththe", "vaneav h",
"tmttvniri", "gtmhgrtgv", "h tmtnvgt", " nnaiygnr", "httot ami",
"hehnheeis", "ihtaneito", "eogh h yg", "eWgeiimv ", "sgnyisihh",
"r ngangW", "teihyaeee", "hrytWnhgi", "nniaeavmh", "iotrWehn ",
" gnvgorht", "vyinaaen ", "tgniiseae", "14", "86", "51", "54",
"90", "15", "23", "49", "6", "45", "65", "55", "53", "52", "55",
"84", "74", "74", "45", "88", "4", "76", "65", "41", "77", "40",
"66", "39", "80", "6", "35", "56", "40", "57", "90", "66", "59",
"30", "98", "31", "55", "12", "29", "67", "85", "16", "94", "87",
"61", "55", "94", "95", "68", "10", "45", "41", "93", "55", "13",
"12", "80", "45", "59", "23", "45", "1", "68", "89", "86", "68",
"46", "50", "57", "78", "85", "40", "53", "26", "67", "75", "29",
"78", "91", "35", "37", "10", "90", "36", "9", "14", "36", "31",
"5", "57", "90", "65", "48", "80", "20", "13", "92", "62", "72",
"71", "52", "50", "16", "92", "79", "9", "97", "78", "69", "50",
"84", "96", "82", "95", "44", "2", "76", "13", "1", "16", "65",
"75", "91", "30", "60", "62", "97", "86", "82", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0",
"0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0",
"0", "1", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "1"
), .Dim = c(133L, 7L), .Dimnames = list(NULL, c("state", "amounts",
"timestamp", "code", "vendor", "risk", "label")))
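The problem is that sapply() was run over the entire data.frame, and sapply() simplifies its result: instead of a data.frame of factors you get back a character matrix (your dput shows exactly that). A matrix is an atomic vector, so `$` no longer works on it, hence the error message you received. Convert it back to a data.frame and turn each column into a factor individually.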
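To make the difference concrete, here is a minimal sketch on a made-up three-row data.frame (`toy_df` and its values are hypothetical, not your data). It shows what sapply() hands back and one way to convert every column while keeping the data.frame:

```r
# Hypothetical toy data standing in for data_train
toy_df <- data.frame(state = c("PR", "MA", "NH"),
                     label = c(0, 1, 0),
                     stringsAsFactors = FALSE)

# sapply() simplifies the list of factors into a character matrix,
# so the result is no longer a data.frame and `$` fails on it
as_matrix <- sapply(toy_df, as.factor)
is.data.frame(as_matrix)   # FALSE
is.matrix(as_matrix)       # TRUE

# lapply() keeps a list of factors; assigning into toy_df[] preserves
# the data.frame, so `$` keeps working
toy_df[] <- lapply(toy_df, as.factor)
is.factor(toy_df$label)    # TRUE
```

The `toy_df[] <-` form replaces the columns in place, which is why the data.frame class survives.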
This works:
data_train <- as.data.frame(data_train)
data_train$state <- as.factor(data_train$state)
data_train$amounts <- as.factor(as.character(data_train$amounts))
data_train$timestamp <- as.factor(data_train$timestamp)
data_train$code <- as.factor(data_train$code)
data_train$vendor <- as.factor(data_train$vendor)
data_train$risk <- as.factor(data_train$risk)
data_train$label <- as.factor(data_train$label)
sbrl_model <- sbrl(data_train, iters=20000, pos_sign="1", neg_sign="0")
create itemset ...
set transactions ...[48 item(s), 8 transaction(s)] done [0.00s].
sorting and recoding items ... [48 item(s)] done [0.00s].
creating sparse bit matrix ... [48 row(s), 8 column(s)] done [0.00s].
writing ... [48 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].
Eclat
parameter specification:
tidLists support minlen maxlen target ext
FALSE 0.1 1 1 frequent itemsets FALSE
algorithmic control:
sparse sort verbose
7 -2 TRUE
Absolute minimum support count: 12
create itemset ...
set transactions ...[469 item(s), 125 transaction(s)] done [0.00s].
sorting and recoding items ... [4 item(s)] done [0.00s].
creating sparse bit matrix ... [4 row(s), 125 column(s)] done [0.00s].
writing ... [4 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].
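One further thought, offered as a suggestion rather than something taken from the package docs: the original error asks you to discretize, and as.factor() on a nearly unique column such as amounts gives one level per value. The log above lists 469 items but only 4 after recoding, which suggests most of those levels are too rare to be used in rules. A rough sketch of binning with base R's cut(), where the toy values and the bin edges are assumptions you would want to tune:

```r
# Hypothetical toy data; the bin edges below are illustrative assumptions
toy <- data.frame(amounts = c("16441", "92946", "8970", "50615"),
                  risk    = c("14", "86", "51", "90"),
                  stringsAsFactors = FALSE)

# Convert the character columns to numbers, then bin them into a few
# labelled intervals so that each level covers many rows
toy$amounts <- cut(as.numeric(toy$amounts),
                   breaks = c(0, 25000, 50000, 75000, 100000),
                   labels = c("0-25k", "25k-50k", "50k-75k", "75k-100k"))
toy$risk <- cut(as.numeric(toy$risk),
                breaks = c(0, 33, 66, 100),
                labels = c("low", "medium", "high"),
                include.lowest = TRUE)

str(toy)
```

cut() returns a factor, so columns binned this way should already satisfy the "logical or a factor" requirement from the error message.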
I have a problem solving this in R. I have this data frame called testa (dput included). I need to match all the letters in column ALT with the colnames (A,C,G,T,N) and get the corresponding values in those column along with the value for REF letters and get the result ad.new (my code does this job).
However, I need to extend this code to handle the row where the TYPE column ends with flat. For that row, I need to match its start id (chr10:102053031) against the other ids in the start column. Where they match, I need to sum the corresponding ALT values from the A, C, G, T, N columns and put that sum in the ad.new column for the flat row, along with the REF value.
If you run the dput and my code you will be able to understand it. So basically, I want to match the letters in REF and ALT columns and get the corresponding values from the columns (A,C,G,T,N) and separate those values by comma for REF and ALT. However (in this example), for flat line I want to sum up the value in column A with matching start id with the start id of flat line (the value in this case is 6) and the value with another match (the value in this case is 7 from G column) and sum them together to give 13. So for flat line my result should be 0,13.
The expected result is also shown below.
my incomplete code:
testa[is.na(testa)] <- 0
# one value per row: the count in the column named by REF
ref.counts <- testa[, testa[, "REF"]]
ref.counts <- as.matrix(ref.counts)
ref.counts[is.na(ref.counts)] <- 0
ref.counts <- diag(ref.counts)
# one value per row: the count in the column named by ALT
alt.counts <- testa[, testa[, "ALT"]]
alt.counts <- as.matrix(alt.counts)
alt.counts[is.na(alt.counts)] <- 0
alt.counts <- diag(alt.counts)
#############
##need to extend this code here
#############
ad.new <- paste(ref.counts, alt.counts, sep = ",")
dput for testa:
structure(c("chr10:101544447", "chr10:102053031", "chr10:102778767",
"chr10:102789831", "chr10:102989480", "chr10:102053031", "chr10:102053031",
"0", "6", "0", "0", "0", "0", "0", "0", "34", "24", "0", "0",
"34", "34", "0", "0", "0", "0", "0", "0", "7", "53", "0", "0",
"30", "12", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"chr10", "chr10", "chr10", "chr10", "chr10", "chr10", "chr10",
"101544447", "102053031", "102778767", "102789831", "102989480",
"102053031", "102053031", "A", "C", "C", "C", "C", "C", "C",
"T", "A", "T", "T", "T", "G", "G", "snp", "snp", "snp", "snp",
"snp", "snp:102053031:flat", "snp", "nonsynonymous SNV",
"intronic", "nonsynonymous SNV", "nonsynonymous SNV", "ncRNA_exonic",
"intronic", "intronic", "ABCC2:NM_000392:exon2:c.A116T:p.Y39F,",
"PKD2L1", "PDZD7:NM_024895:exon8:c.G1136A:p.R379Q,PDZD7:NM_001195263:exon8:c.G1136A:p.R379Q,",
"PDZD7:NM_024895:exon2:c.G146A:p.R49Q,PDZD7:NM_001195263:exon2:c.G146A:p.R49Q,",
"LBX1-AS1", "PKD2L1", "PKD2L1"), .Dim = c(7L, 15L), .Dimnames = list(
c("1", "2", "3", "4", "5", "6", "7"), c("start", "A", "C",
"G", "T", "N", "=", "-", "chr", "end", "REF", "ALT", "TYPE",
"refGene::location", "refGene::type")))
Expected result
ad.new
"0,53"
"34,6"
"24,0"
"0,30"
"0,12"
"0,13"
"34,7"
Something like this should work :
# apply the "normal" rule (not considering flat exceptions)
alts <- as.numeric(diag(testa[,testa[,"ALT"]]))
refs <- as.numeric(diag(testa[,testa[,"REF"]]))
res <- paste(refs,alts,sep=",")
# replace lines having TYPE ending with "flat"
flats <- grep('.*flat$',testa[,"TYPE"])
res[flats] <-
  unlist(lapply(flats, function(x) {
    startId <- testa[x, "start"]
    # other rows sharing the flat row's start id (the flat row itself is excluded)
    selection <- setdiff(which(testa[, "start"] == startId), x)
    paste0("0,", sum(alts[selection]))
  }))
ad.new <- as.matrix(res)
> ad.new
[,1]
[1,] "0,53"
[2,] "34,6"
[3,] "24,0"
[4,] "0,30"
[5,] "0,12"
[6,] "0,13"
[7,] "34,7"
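As a side note, diag(testa[, testa[, "ALT"]]) builds a full n x n matrix just to read its diagonal, which may get heavy with many rows. A possible alternative, sketched here on a hypothetical three-row stand-in for testa (`toy` is made up and carries only the columns needed), is to index with a two-column (row, column) matrix:

```r
# Hypothetical 3-row stand-in for testa (same idea, fewer columns)
toy <- matrix(c("0", "6", "0",      # A
                "0", "34", "24",    # C
                "0", "0", "0",      # G
                "53", "0", "0",     # T
                "A", "C", "C",      # REF
                "T", "A", "T"),     # ALT
              nrow = 3,
              dimnames = list(NULL, c("A", "C", "G", "T", "REF", "ALT")))

rows <- seq_len(nrow(toy))

# A two-column index matrix picks exactly one cell per row:
# the column whose name matches that row's REF (or ALT) letter
refs <- as.numeric(toy[cbind(rows, match(toy[, "REF"], colnames(toy)))])
alts <- as.numeric(toy[cbind(rows, match(toy[, "ALT"], colnames(toy)))])

ad <- paste(refs, alts, sep = ",")
ad
```

The resulting `refs` and `alts` vectors can be used in place of the diag() results; the flat handling stays the same.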