R looping matrix elements - r

I'm trying to loop over matrix columns.
date <- rbind("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12")
a1 <- rbind("0", "0", "0", "0", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
b1 <- rbind("1", "1", "1", "1", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
hb1 <- rbind("2", "2", "2", "2", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
a2 <- rbind("0", "0", "0", "0", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
b2 <- rbind("1", "1", "1", "1", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
hb2 <- rbind("2", "2", "2", "2", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
a3 <- rbind("0", "0", "0", "0", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
b3 <- rbind("1", "1", "1", "1", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
hb3 <- rbind("2", "2", "2", "2", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
a4 <- rbind("0", "0", "0", "0", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
b4 <- rbind("1", "1", "1", "1", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
hb4 <- rbind("2", "2", "2", "2", "6421", "41", "5667", "44", "1178", "0", "1070", "1")
info_mat <- cbind(date, a1, b1, hb1, a2, b2, hb2, a3, b3, hb3, a4, b4, hb4)
print(info_mat)
I want to compute an evolution rate (V+1 - V)/V between the months for each variable
(evolution from January to Feb, Feb to March, ..., for a1, ..., hb4)
and get the result in a matrix that I will name "evolution_matrix"
I tried the following but for some reason it won't work.
Note that i represents here the fact that I want to perform the evolution for every variable. I think of i as being:
Evolution(January to February for variable a1) =
(value of a1 in February - value of a1 in January)/(value of a1 in January).
I don't know how to model it therefore I put i, but it doesn't refer to anything in the matrix.
for(row in 1:nrow(info_mat)) {
for(col in 1:ncol(info_mat)) {
evolution[[i]] = (info_mat[i+1] - info_mat[i] )/info_mat[i]
print(evolution[[i]])
}
}
Help please!

Why use a matrix? Your matrix holds only character (string) values, but you want to treat them as numbers. I think a data.frame is a better idea.
Base R's lapply applies a function to each column of a data.frame and returns the results as a list; the pipe (%>%) and lag() used below come from dplyr. We don't want to apply the 'evolution' function to the date column, so it is dropped first.
library(dplyr)
evolution <- as.data.frame(info_mat)[, -1] %>%
  lapply(function(x) {x = as.numeric(x); (x - lag(x)) / lag(x)}) %>%
  as.data.frame()
In the last line I convert the list to a data.frame (for nicer printing).
But we forgot about the 'date' column. Let's add it to our data.frame.
evolution <- bind_cols(data.frame(date = date), evolution)
That is all. But if you want to do it with a loop, you can use this code:
evolution <- matrix(NA, nrow(info_mat), ncol(info_mat))
evolution[, 1] <- date
for(row in 2:nrow(info_mat)) {
for(col in 2:ncol(info_mat)) {
evolution[row, col] = as.numeric(info_mat[row, col])/as.numeric(info_mat[row - 1, col]) - 1
}
}
Comments on your example code:
You never define the variable i, and you don't use the variables row and col.
What is the type of the evolution variable?
info_mat[i+1] is not numeric (it is a character string), so you cannot subtract it or divide by info_mat[i].
What does info_mat[i] mean? With a single index, R walks the matrix column by column, so info_mat[row, col] is equal to info_mat[(col - 1) * 12 + row] (12 being the number of rows). That means info_mat[i] and info_mat[i + 1] can be in different columns.
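As a small illustration of the single-index point, here is a sketch on a toy 12-row matrix (the names m, row_i and col_j are made up for this example and do not come from the question):

```r
# Toy matrix with 12 rows and 2 columns, filled column by column with 1..24.
m <- matrix(1:24, nrow = 12)

row_i <- 3
col_j <- 2

# Position of element [row_i, col_j] when the matrix is read as one long vector.
linear <- (col_j - 1) * nrow(m) + row_i

print(m[row_i, col_j] == m[linear])  # same element either way

# Consecutive single indices can straddle two columns:
print(m[12])  # last row of column 1
print(m[13])  # first row of column 2
```

This is why stepping through info_mat[i] and info_mat[i + 1] would eventually compare the last month of one variable with the first month of the next.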
And if you want to create a data.frame with your data, use this code:
df = data.frame(
  date = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10", "2000-01-11", "2000-01-12"),
  a1 = c(0, 0, 0, 0, 6421, 41, 5667, 44, 1178, 0, 1070, 1),
  b1 = c(1, 1, 1, 1, 6421, 41, 5667, 44, 1178, 0, 1070, 1)
)
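Once the values are numeric, the rate (V+1 - V)/V can also be computed for all columns at once, without a loop. The following is only a sketch on a shortened copy of the a1 and b1 values; the name m is an assumption for the example:

```r
# A few numeric values from the question's a1 and b1 columns.
m <- cbind(a1 = c(0, 0, 6421, 41, 5667),
           b1 = c(1, 1, 6421, 41, 5667))

# diff() returns V[t+1] - V[t] for every column;
# head(m, -1) drops the last row, leaving V[t] as the denominator.
evolution_matrix <- diff(m) / head(m, -1)
print(evolution_matrix)
```

Note that a zero in the denominator yields NaN (0/0) or Inf (x/0), so the a1 column needs some care before the rates are interpreted.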

Related

How to convert a range of columns from Character to Number/Integer in R

I am trying to convert a range of columns from character to integer. I don't want to write as.integer for each column.
I am trying to find a more efficient way where I can pass the column names I want to convert and then convert them to integer.
Is this doable in R? Or should I do it one column after the other?
The expected output:
Convert a range of columns from character to integer.
Convert a few columns by passing them as individual columns rather than as a range.
The code I wrote is given below:
library(readxl)
Final <- read_excel("C:/X/X/X- X/Desktop/Final.xlsx")
First_Date <- colnames(Final)[4]
Last_Date <- tail(colnames(Final),1)
str(Final)
Final <- Final %>%
mutate_if(c(First_Date:Last_Date),as.numeric)
The data I am working with is given below:
structure(list(UniqueID = c("3F-FA|807905", "3F-FA|808005", "3F-FA|808006",
"3F-FA|808007", "Py_AuAriFa|761403", "3F-FA|761502", "AutoTheta|761602",
"3F-FA|318901", "3F-FA|339401"), Xreg = c("3F-FA", "3F-FA", "3F-FA",
"3F-FA", "Py_AuAriFa", "3F-FA", "AutoTheta", "3F-FA", "3F-FA"
), Row = c("807905", "808005", "808006", "808007", "761403",
"761502", "761602", "318901", "339401"), `2023-02-01` = c("0",
"0", "0", "0", "50", "1", "7", "0", "0"), `2023-03-01` = c("0",
"0", "0", "0", "32", "1", "7", "0", "0"), `2023-04-01` = c("0",
"0", "0", "0", "36", "1", "7", "0", "0"), `2023-05-01` = c("0",
"0", "0", "0", "41", "1", "7", "0", "0"), `2023-06-01` = c("0",
"0", "0", "0", "31", "1", "6", "0", "0"), `2023-07-01` = c("0",
"0", "0", "0", "38", "1", "6", "0", "0"), `2023-08-01` = c("0",
"0", "0", "0", "34", "1", "6", "0", "0"), `2023-09-01` = c("0",
"0", "0", "0", "32", "1", "6", "0", "0"), `2023-10-01` = c("0",
"0", "0", "0", "35", "1", "5", "0", "0")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -9L))
The columns I am trying to convert are 2023-02-01 to 2023-10-01. I can't use mutate_if on the whole data frame, because the column Row holds character data that could be converted to integer but should not be. Hence the selected few columns.
We can match a pattern in the column names to select those columns and change their class:
library(dplyr)
Final <- Final %>%
mutate(across(matches("^\\d{4}-\\d{2}-\\d{2}$"), as.integer))
Or use the : range operator with backtick-quoted column names:
Final <- Final %>%
  mutate(across(`2023-02-01`:`2023-10-01`, as.integer))
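For comparison, the same pattern-based selection can be sketched in base R without dplyr. The data frame Final_small below is a made-up two-row excerpt of the question's data, not the real Final:

```r
# Two rows and two date columns taken from the question's data.
Final_small <- data.frame(
  Row = c("807905", "761403"),
  `2023-02-01` = c("0", "50"),
  `2023-03-01` = c("0", "32"),
  check.names = FALSE,        # keep the date-like column names as they are
  stringsAsFactors = FALSE
)

# TRUE for column names that look like YYYY-MM-DD.
date_cols <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", names(Final_small))

# Convert only those columns; Row stays character.
Final_small[date_cols] <- lapply(Final_small[date_cols], as.integer)
str(Final_small)
```

Assigning into Final_small[date_cols] keeps the object a data frame and leaves the other columns untouched.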

Plotting multiple binary variables on the same plot in ggplot

I am hoping to use ggplot to construct a barplot of frequencies (or just % 1s) of a bunch of binary variables, and am having trouble getting them all together on one plot.
The variables all stem from the same question in a survey, so ideally it'd be nice to have data that is tidy with one column for this variable, but respondents could select more than one option and I'm hoping to retain that instead of having a "more than one selected" option. Here is a slice of the data:
structure(list(gender = structure(c("Male", "Male", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male"), label = "Q4", format.stata = "%24s"),
var1 = structure(c("0", "0", "1", "1", "0", "0", "0", "0",
"0", "0"), format.stata = "%9s"), var2 = structure(c("0",
"98", "1", "0", "0", "0", "0", "0", "0", "0"), format.stata = "%9s"),
var3 = structure(c("0", "0", "0", "0", "0", "0", "0", "0",
"0", "0"), format.stata = "%9s"), var4 = structure(c("1",
"0", "1", "0", "0", "0", "1", "1", "0", "0"), format.stata = "%9s"),
var5 = structure(c("1", "0", "0", "0", "0", "1", "0", "0",
"0", "0"), format.stata = "%9s")), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))
Get the data in long format so that it is easier to plot.
library(tidyverse)
df %>%
pivot_longer(cols = starts_with('var')) %>%
group_by(name) %>%
summarise(frequency_of_1 = sum(value == 1)) %>%
#If you need percentage use mean instead of sum
#summarise(frequency_of_1 = mean(value == 1)) %>%
ggplot() + aes(name, frequency_of_1) + geom_col()
In base R you can do this with colSums and barplot.
barplot(colSums(df[-1] == 1))
#For percentage
#barplot(colMeans(df[-1] == 1))
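To see what the base R version computes before plotting, here is a sketch on the first four rows of the question's data (df_small and the other names are assumptions for the example):

```r
# First four respondents, three of the binary variables.
df_small <- data.frame(
  gender = c("Male", "Male", "Female", "Female"),
  var1 = c("0", "0", "1", "1"),
  var2 = c("0", "98", "1", "0"),
  var4 = c("1", "0", "1", "0"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where the option was selected.
is_one <- df_small[-1] == "1"

counts <- colSums(is_one)   # number of 1s per variable
shares <- colMeans(is_one)  # proportion of 1s per variable
print(counts)
print(shares)
```

Any value other than "1", such as the "98" in var2, simply counts as not selected in this comparison.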

Merge horizontally only for certain columns flextable

I have the following table:
tmp <- structure(list(SOC = c("Blood", "", "", "Gast", "", "", "", "Skin",
"", "", "Adverse Event"), `Adverse Event` = c("Blood", "Raised Alt", "Raised Ast",
"Gast", "Bloating", "Diarrhoia", "Vomiting", "Skin", "Reddness",
"Rash", "Any Adverse Event"), C11 = c("", "0", "0", "", "0",
"2", "0", "", "0", "0", "2"), C21 = c("", "0", "0", "", "1",
"0", "1", "", "1", "0", "3"), T1 = c("", "0", "0", "", "1", "2",
"1", "", "1", "0", "3"), C12 = c("", "1", "0", "", "0", "0",
"0", "", "0", "1", "2"), C22 = c("", "0", "0", "", "0", "0",
"1", "", "0", "0", "1"), T2 = c("", "1", "0", "", "0", "0", "1",
"", "0", "1", "2"), C23 = c("", "0", "1", "", "0", "0", "0",
"", "0", "0", "1"), T3 = c("", "0", "1", "", "0", "0", "0", "",
"0", "0", "1"), C14 = c("", "1", "0", "", "0", "0", "0", "",
"0", "0", "1"), T4 = c("", "1", "0", "", "0", "0", "0", "", "0",
"0", "1")), row.names = c(NA, 11L), class = "data.frame")
I have turned it into a flextable like this:
tmp %>% regulartable()
And now I am trying to horizontally merge the matching values ONLY in the SOC and Adverse Event columns.
I have tried using merge_h() but that doesn't give me the option to select certain columns, so it merges all of the other columns as well if there are duplicated values.
I have tried merge_at() but it doesn't work if all of the i and j values are not consecutive, which mine won't be.
Does anyone know of a way to only make merge_h() apply to certain columns? Or any other way of achieving what I'm after?
EDIT: I'm trying to make a flextable that looks a bit like this, but without any of the numeric columns being merged. As you can see in the bottom right hand corner all of the 1's have been merged. I just want the first two columns to merge so I can create the indentation effect.
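You can create a for loop that iterates over the rows in question and merges only the first two columns. Note that merge_at() needs the flextable as its first argument:
ft <- regulartable(tmp)
lines <- c(1, 5, 7, 10)  # rows to merge; adjust to your table
for (ll in lines) {
  ft <- merge_at(ft, i = ll, j = 1:2, part = "body")
}
ft
It might not be the most elegant solution, but it will do what you need.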
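Rather than hard-coding the row numbers, they can be derived from the data. The sketch below assumes the rows to merge are exactly those where SOC and Adverse Event hold the same text (an assumption based on the table in the question), uses a cut-down copy called tmp_small, and only calls flextable if the package is installed:

```r
# First two columns of the question's table plus one numeric column.
tmp_small <- data.frame(
  SOC = c("Blood", "", "", "Gast", "", "", "", "Skin", "", "", "Adverse Event"),
  `Adverse Event` = c("Blood", "Raised Alt", "Raised Ast", "Gast", "Bloating",
                      "Diarrhoia", "Vomiting", "Skin", "Reddness", "Rash",
                      "Any Adverse Event"),
  C11 = c("", "0", "0", "", "0", "2", "0", "", "0", "0", "2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Rows where the first two cells are identical (assumed merge rule).
merge_rows <- which(tmp_small$SOC == tmp_small$`Adverse Event`)
print(merge_rows)

# Apply the merges only when flextable is available.
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- flextable::flextable(tmp_small)
  for (i in merge_rows) {
    ft <- flextable::merge_at(ft, i = i, j = 1:2, part = "body")
  }
}
```

Because each merge_at() call touches only columns 1 and 2 of a single row, the numeric columns are never merged.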

Rule Learning using SBRL in R

I'm trying to use the Scalable Bayesian Rule Lists Model for creating some rule lists in R.
Link to package: SBRL Package R
I read data into a list, split into train and test and plug into the function
sbrl_model <- sbrl(data_train,iters=20000, pos_sign="1", neg_sign="0",)
which gives me the following error:
Error in asMethod(object) :
column(s) 1, 2, 4, 6 not logical or a factor. Discretize the columns first.
When I convert the data_train into a factor and try using:
data_train <- sapply(data_train, as.factor)
sbrl_model <- sbrl::sbrl(data_train, iters=20000, pos_sign="1", neg_sign="0",)
I get the following error:
Error in data_train$label : $ operator is invalid for atomic vectors
My data has the following columns:
state, amounts, timestamp, code, risk, vendor, label
The label is 0 or 1. I need to create rules for detecting what data leads to a 1.
I'm new to R so this seems confusing. If I don't convert to factors, it complains, if I do it can't use the "$" operator. Any ideas what I'm doing wrong? Thank you
> dput(data_train)
structure(c("PR", "PR", "PR", "PR", "MA", "MA", "NH", "NH", "ME",
"ME", "ME", "VT", "VT", "CT", "CT", "NJ", "NJ", "NY", "NY", "NY",
"NY", "NY", "NY", "NY", "PA", "PA", "PA", "PA", "PA", "PA", "PA",
"PA", "PA", "DE", "VA", "VA", "VA", "WV", "WV", "WV", "WV", "WV",
"WV", "WV", "WV", "WV", "WV", "WV", "WV", "WV", "WV", "WV", "WV",
"WV", "WV", "WV", "GA", "GA", "FL", "FL", "FL", "FL", "FL", "FL",
"AL", "AL", "AL", "TN", "TN", "TN", "MS", "MS", "MS", "KY", "KY",
"KY", "KY", "KY", "KY", "KY", "KY", "KY", "OH", "OH", "OH", "OH",
"OH", "OH", "OH", "OH", "OH", "OH", "OH", "OH", "OH", "OH", "IN",
"IA", "IA", "IA", "IA", "WI", "MN", "MN", "MN", "MN", "MN", "SD",
"SD", "ND", "ND", "ND", "ND", "ND", "MO", "MO", "MO", "MO", "MO",
"MO", "MO", "MO", "MO", "MO", "MO", "MO", "KS", "KS", "KS", "KS",
"KS", "KS", "KS", "16441", "92946", "8970", "19937", "94589",
"50615", "75915", "50005", "23037", "14835", "83678", "66263",
"60818", "82760", "42137", "32888", "35385", "20242", "98269",
"16216", "76562", "49327", "30699", "1866", "91301", "75125",
"34016", "88673", "78612", "85008", "91030", "57276", "96772",
"79568", "59489", "14154", "71655", "78163", "41673", "19942",
"19364", "34004", "79349", "1611", "8875", "19673", "5422", "42395",
"11899", "26967", "73499", "79916", "71015", "73640", "39759",
"7735", "84853", "31662", "43183", "44787", "79001", "82999",
"17031", "88109", "62215", "56040", "66592", "59148", "20786",
"30106", "46561", "9125", "83512", "60031", "65233", "49512",
"8893", "46275", "11362", "29867", "61573", "46363", "91510",
"19267", "45554", "41193", "54267", "8045", "28089", "62450",
"69082", "66685", "80769", "15446", "62589", "42875", "74723",
"2934", "18540", "96540", "60812", "50636", "90924", "60556",
"90009", "15287", "35529", "28702", "82102", "96967", "5296",
"64804", "48743", "10867", "60914", "83678", "77883", "97631",
"97175", "48103", "63128", "46774", "18285", "74512", "69313",
"80414", "32394", "51103", "51155", "28672", "38460", "89024",
"49443", "2016-01-23 12:14:07", "2016-01-17 19:22:37", "2016-01-23 22:41:32",
"2016-01-27 09:58:34", "2016-01-30 08:40:06", "2016-01-28 01:41:40",
"2016-01-27 08:22:27", "2016-01-28 00:13:48", "2016-01-20 12:31:12",
"2016-01-17 08:25:30", "2016-01-28 13:01:36", "2016-01-20 12:10:46",
"2016-01-25 07:32:01", "2016-01-23 02:13:11", "2016-01-24 11:14:46",
"2016-01-16 20:59:35", "2016-01-19 20:12:58", "2016-01-19 06:38:06",
"2016-01-27 10:15:48", "2016-01-26 14:00:30", "2016-01-28 01:54:45",
"2016-01-27 05:43:58", "2016-01-25 22:07:06", "2016-01-18 09:58:05",
"2016-01-20 05:56:54", "2016-01-26 08:05:32", "2016-01-28 14:18:45",
"2016-01-22 06:25:48", "2016-01-27 18:05:50", "2016-01-16 11:33:47",
"2016-01-22 03:31:52", "2016-01-23 05:41:37", "2016-01-27 00:55:22",
"2016-01-16 17:19:51", "2016-01-18 10:05:42", "2016-01-22 10:20:16",
"2016-01-26 21:07:20", "2016-01-17 19:12:00", "2016-01-19 17:59:45",
"2016-01-28 08:50:18", "2016-01-16 09:31:52", "2016-01-24 14:50:13",
"2016-01-17 14:02:36", "2016-01-20 17:08:29", "2016-01-25 16:42:03",
"2016-01-19 04:18:27", "2016-01-20 03:05:13", "2016-01-26 23:34:33",
"2016-01-26 13:44:56", "2016-01-16 07:09:41", "2016-01-26 06:43:12",
"2016-01-26 20:22:25", "2016-01-23 05:58:38", "2016-01-19 23:21:00",
"2016-01-16 08:36:10", "2016-01-30 01:21:00", "2016-01-23 11:10:06",
"2016-01-27 15:29:30", "2016-01-30 15:50:38", "2016-01-19 08:32:33",
"2016-01-19 18:18:02", "2016-01-21 14:20:47", "2016-01-17 13:19:59",
"2016-01-20 05:49:06", "2016-01-16 15:54:17", "2016-01-21 09:15:42",
"2016-01-16 07:32:39", "2016-01-28 03:49:00", "2016-01-26 00:19:56",
"2016-01-25 10:29:44", "2016-01-23 06:26:45", "2016-01-29 08:03:34",
"2016-01-22 14:24:34", "2016-01-16 18:44:43", "2016-01-26 00:00:51",
"2016-01-20 17:38:03", "2016-01-17 22:38:47", "2016-01-30 10:12:01",
"2016-01-21 17:00:43", "2016-01-22 08:43:30", "2016-01-27 12:04:58",
"2016-01-25 21:09:40", "2016-01-27 16:35:42", "2016-01-27 20:09:03",
"2016-01-27 09:52:40", "2016-01-26 16:12:37", "2016-01-28 16:57:29",
"2016-01-30 13:48:47", "2016-01-30 19:15:03", "2016-01-24 19:33:56",
"2016-01-28 06:57:55", "2016-01-22 18:21:40", "2016-01-16 02:54:57",
"2016-01-23 08:18:44", "2016-01-20 13:47:54", "2016-01-24 16:23:39",
"2016-01-24 19:15:09", "2016-01-22 14:59:14", "2016-01-30 10:21:43",
"2016-01-27 11:54:39", "2016-01-30 15:19:59", "2016-01-24 19:21:48",
"2016-01-27 07:20:14", "2016-01-25 07:11:55", "2016-01-24 22:33:42",
"2016-01-26 14:30:57", "2016-01-16 13:12:46", "2016-01-28 11:25:45",
"2016-01-28 14:44:25", "2016-01-23 03:25:10", "2016-01-26 13:45:49",
"2016-01-19 06:14:21", "2016-01-25 22:12:29", "2016-01-25 12:13:07",
"2016-01-22 23:56:39", "2016-01-24 07:51:51", "2016-01-24 10:50:30",
"2016-01-21 07:02:41", "2016-01-21 09:52:54", "2016-01-26 22:35:52",
"2016-01-19 06:48:13", "2016-01-19 15:18:21", "2016-01-20 12:20:37",
"2016-01-16 07:04:34", "2016-01-24 10:20:05", "2016-01-25 09:01:09",
"2016-01-21 17:02:29", "2016-01-21 11:52:00", "2016-01-27 19:39:16",
"2016-01-19 18:33:35", "2016-01-18 06:00:23", "2016-01-17 01:27:11",
"2016-01-18 10:27:57", "3355", "4935", "5454", "9555", "5938",
"5855", "4888", "3885", "8533", "4359", "5339", "5554", "5894",
"8598", "5448", "9535", "3495", "3358", "3485", "3344", "8489",
"8553", "3354", "5889", "5948", "8455", "5988", "5595", "9354",
"8485", "4559", "4838", "5585", "5585", "8554", "8598", "5535",
"5355", "5844", "3485", "5885", "8833", "8558", "9889", "9885",
"8555", "3938", "8343", "8558", "5484", "3558", "3545", "8394",
"9933", "3853", "4598", "3855", "5845", "5588", "5495", "8585",
"9584", "3385", "8858", "9445", "8488", "8558", "5838", "5848",
"8845", "8848", "8945", "4599", "8585", "8858", "4598", "5358",
"5395", "9485", "4893", "4455", "8493", "9358", "5395", "8958",
"5888", "8888", "8555", "4885", "3538", "8998", "4445", "4838",
"9885", "3559", "5584", "9594", "8558", "3844", "5434", "8558",
"9898", "4395", "9585", "3858", "4858", "5895", "9383", "9858",
"8385", "5585", "4884", "8359", "8893", "3484", "8383", "5338",
"3544", "9859", "9454", "3539", "3583", "8455", "5983", "4345",
"4943", "5548", "8353", "8993", "8594", "8994", "3958", "3989",
"W sWn ae", "o gogynh ", " ntsnagWe", "aiatteaav", "shiytWngg",
"vvmthethW", "Wynhvrrht", "tttnheviv", "itg oiWhe", "a enotisn",
"ehaothe h", "stmeathng", "i emranth", "tersggtnh", "oeiehvhh ",
"sngeeetvg", "gyyhWatge", "ritnhengs", "etihi s e", "aoeertyWn",
"eeytitys ", "nmnmegome", "n vitsnot", " h i eoht", "ahghtangh",
"ehgn hynh", "ener aeig", "t niaat g", "agtWh eah", "vehi amae",
"enhnnn hg", "ennWhgnea", "tay hnaah", "igntyvrtv", "niesehahn",
" eoavongr", "hi ehhimm", "yovgianWi", "e tnehngg", "eyehtte n",
"at nimnrg", "enesgennW", "mhahnhyet", "tt amtgna", "hehtsoish",
"hyvtanggv", "et v nssn", "inhnahe h", "onahhraWn", "mn iiahsy",
" mymisnsg", "magWoshgr", "i t eneve", "nghy naen", "eyhsyehea",
"i ihntvea", "ththnWyri", "vntv yran", "ynaieere ", "yenre htW",
"ehyWga g ", "ngeagmenh", " nW ytito", "ermhaagvr", "eeWvtr eg",
"etreaehon", "thtWyerme", "hnveWnrta", "htmr ohee", "stitnthsi",
"snthhWh a", "ehhth iny", "shgoovema", " mseynWee", "netmiitnt",
"nvi eao", "t seWWay", "yngnerarm", "ggenitaeh", "n eaogiag",
"mitnetmnh", "not sine ", "ghmhnyhne", "eattnatgh", "vhatngtts",
"tntmegten", "hreyatert", "ggmneheri", "g y en he", "igrt ggrh",
"mehnssith", "gigstgnym", "iathWh ii", "h atynin ", "eiieWmetg",
"noyggtive", " iotneng ", "oveieteen", "shnagrhti", "itooo aWv",
"toreytnny", " henaaWvn", "shehnrh W", "ttrntehgi", "oWait tn ",
"hhshhnthh", "nogeamnme", "iraah thh", "eto ngvgr", "Wno tseie",
"ehnato eW", "anservnhn", "htsyyoarv", "n aththe", "vaneav h",
"tmttvniri", "gtmhgrtgv", "h tmtnvgt", " nnaiygnr", "httot ami",
"hehnheeis", "ihtaneito", "eogh h yg", "eWgeiimv ", "sgnyisihh",
"r ngangW", "teihyaeee", "hrytWnhgi", "nniaeavmh", "iotrWehn ",
" gnvgorht", "vyinaaen ", "tgniiseae", "14", "86", "51", "54",
"90", "15", "23", "49", "6", "45", "65", "55", "53", "52", "55",
"84", "74", "74", "45", "88", "4", "76", "65", "41", "77", "40",
"66", "39", "80", "6", "35", "56", "40", "57", "90", "66", "59",
"30", "98", "31", "55", "12", "29", "67", "85", "16", "94", "87",
"61", "55", "94", "95", "68", "10", "45", "41", "93", "55", "13",
"12", "80", "45", "59", "23", "45", "1", "68", "89", "86", "68",
"46", "50", "57", "78", "85", "40", "53", "26", "67", "75", "29",
"78", "91", "35", "37", "10", "90", "36", "9", "14", "36", "31",
"5", "57", "90", "65", "48", "80", "20", "13", "92", "62", "72",
"71", "52", "50", "16", "92", "79", "9", "97", "78", "69", "50",
"84", "96", "82", "95", "44", "2", "76", "13", "1", "16", "65",
"75", "91", "30", "60", "62", "97", "86", "82", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0",
"0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0",
"0", "1", "0", "0", "0", "1", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "1"
), .Dim = c(133L, 7L), .Dimnames = list(NULL, c("state", "amounts",
"timestamp", "code", "vendor", "risk", "label")))
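The problem is that you tried to turn the entire object into a factor, not one column at a time. data_train is a character matrix, so sapply(data_train, as.factor) works through it element by element and returns a single atomic vector. The $ operator does not work on atomic vectors, hence the error message you received.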
This works:
data_train <- as.data.frame(data_train)
data_train$state <- as.factor(data_train$state)
data_train$amounts <- as.factor(as.character(data_train$amounts))
data_train$timestamp <- as.factor(data_train$timestamp)
data_train$code <- as.factor(data_train$code)
data_train$vendor <- as.factor(data_train$vendor)
data_train$risk <- as.factor(data_train$risk)
data_train$label <- as.factor(data_train$label)
sbrl_model <- sbrl(data_train, iters=20000, pos_sign="1", neg_sign="0")
create itemset ...
set transactions ...[48 item(s), 8 transaction(s)] done [0.00s].
sorting and recoding items ... [48 item(s)] done [0.00s].
creating sparse bit matrix ... [48 row(s), 8 column(s)] done [0.00s].
writing ... [48 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].
Eclat
parameter specification:
tidLists support minlen maxlen target ext
FALSE 0.1 1 1 frequent itemsets FALSE
algorithmic control:
sparse sort verbose
7 -2 TRUE
Absolute minimum support count: 12
create itemset ...
set transactions ...[469 item(s), 125 transaction(s)] done [0.00s].
sorting and recoding items ... [4 item(s)] done [0.00s].
creating sparse bit matrix ... [4 row(s), 125 column(s)] done [0.00s].
writing ... [4 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].
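The difference between the two conversions can be seen on a tiny matrix. This is a sketch only; m, bad and good are made-up names, and the values are a few cells from the question's data:

```r
# Small character matrix shaped like data_train (2 rows, 3 named columns).
m <- matrix(c("PR", "MA", "16441", "94589", "0", "1"), nrow = 2,
            dimnames = list(NULL, c("state", "amounts", "label")))

# sapply() over a matrix visits single elements, so the column structure is lost.
bad <- sapply(m, as.factor)
print(is.data.frame(bad))  # FALSE

# Convert to a data frame first, then turn every column into a factor.
good <- as.data.frame(m, stringsAsFactors = FALSE)
good[] <- lapply(good, as.factor)
str(good)
print(levels(good$label))
```

Assigning with good[] keeps the data frame and its column names, so good$label still works afterwards.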

R code challenge: retrieving the values in matching columns and sum them up with matching rows

I have a problem solving this in R. I have this data frame called testa (dput included). I need to match all the letters in column ALT with the colnames (A,C,G,T,N) and get the corresponding values in those column along with the value for REF letters and get the result ad.new (my code does this job).
However, I need to expand this code to solve an issue with the line where the TYPE column has flat at the end. For the row with the flat, I need to match its start id (chr10:102053031) with other ids in start column. If they match, I need to sum up the corresponding value for ALT from A,C,G,T,N column and replace it with ad.new column for the flat line along with the REF value.
If you run the dput and my code you will be able to understand it. So basically, I want to match the letters in REF and ALT columns and get the corresponding values from the columns (A,C,G,T,N) and separate those values by comma for REF and ALT. However (in this example), for flat line I want to sum up the value in column A with matching start id with the start id of flat line (the value in this case is 6) and the value with another match (the value in this case is 7 from G column) and sum them together to give 13. So for flat line my result should be 0,13.
The expected result is also shown below.
my incomplete code:
testa[is.na(testa)]<-0
ref.counts<-testa[,testa[,"REF"]]
ref.counts<-as.matrix(ref.counts)
ref.counts[is.na(ref.counts)]<-0
ref.counts<-diag(ref.counts)
alt.counts<-testa[,testa[,"ALT"]]
alt.counts<-as.matrix(alt.counts)
alt.counts[is.na(alt.counts)]<-0
alt.counts<-diag(alt.counts)
#############
##need to extend this code here
#############
ad.new<-paste(ref.counts,alt.counts,sep=",")
dput for testa:
structure(c("chr10:101544447", "chr10:102053031", "chr10:102778767",
"chr10:102789831", "chr10:102989480", "chr10:102053031", "chr10:102053031",
"0", "6", "0", "0", "0", "0", "0", "0", "34", "24", "0", "0",
"34", "34", "0", "0", "0", "0", "0", "0", "7", "53", "0", "0",
"30", "12", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",
"chr10", "chr10", "chr10", "chr10", "chr10", "chr10", "chr10",
"101544447", "102053031", "102778767", "102789831", "102989480",
"102053031", "102053031", "A", "C", "C", "C", "C", "C", "C",
"T", "A", "T", "T", "T", "G", "G", "snp", "snp", "snp", "snp",
"snp", "snp:102053031:flat", "snp", "nonsynonymous SNV",
"intronic", "nonsynonymous SNV", "nonsynonymous SNV", "ncRNA_exonic",
"intronic", "intronic", "ABCC2:NM_000392:exon2:c.A116T:p.Y39F,",
"PKD2L1", "PDZD7:NM_024895:exon8:c.G1136A:p.R379Q,PDZD7:NM_001195263:exon8:c.G1136A:p.R379Q,",
"PDZD7:NM_024895:exon2:c.G146A:p.R49Q,PDZD7:NM_001195263:exon2:c.G146A:p.R49Q,",
"LBX1-AS1", "PKD2L1", "PKD2L1"), .Dim = c(7L, 15L), .Dimnames = list(
c("1", "2", "3", "4", "5", "6", "7"), c("start", "A", "C",
"G", "T", "N", "=", "-", "chr", "end", "REF", "ALT", "TYPE",
"refGene::location", "refGene::type")))
Expected result
ad.new
"0,53"
"34,6"
"24,0"
"0,30"
"0,12"
"0,13"
"34,7"
Something like this should work:
# apply the "normal" rule (not considering flat exceptions)
alts <- as.numeric(diag(testa[,testa[,"ALT"]]))
refs <- as.numeric(diag(testa[,testa[,"REF"]]))
res <- paste(refs,alts,sep=",")
# replace lines having TYPE ending with "flat"
flats <- grep('.*flat$',testa[,"TYPE"])
res[flats] <-
  unlist(lapply(flats,function(x){
    startId <- testa[x,"start"]
    # other rows that share the flat row's start id (x itself is excluded)
    selection <- setdiff(which(testa[,"start"] == startId),x)
    paste0("0,",sum(alts[selection]))
  }))
ad.new <- as.matrix(res)
> ad.new
[,1]
[1,] "0,53"
[2,] "34,6"
[3,] "24,0"
[4,] "0,30"
[5,] "0,12"
[6,] "0,13"
[7,] "34,7"
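The diag(testa[, testa[, "ALT"]]) step builds a full square matrix just to read its diagonal. As an alternative sketch, a two-column index matrix picks one cell per row directly. The objects counts, ref and alt below are made up from rows 1, 2 and 7 of the question's data:

```r
# Count columns for three of the question's rows (rows 1, 2 and 7).
counts <- cbind(A = c(0, 6, 0), C = c(0, 34, 34), G = c(0, 0, 7), T = c(53, 0, 0))
ref <- c("A", "C", "C")
alt <- c("T", "A", "G")

rows <- seq_len(nrow(counts))

# Each row of the index matrix is a (row, column) pair to extract.
ref_counts <- counts[cbind(rows, match(ref, colnames(counts)))]
alt_counts <- counts[cbind(rows, match(alt, colnames(counts)))]

ad <- paste(ref_counts, alt_counts, sep = ",")
print(ad)
```

The result matches the corresponding entries of the expected ad.new, and the same alt_counts vector could be summed per start id for the flat rows as in the answer above.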

Resources