discretize function of multiple columns - r

I have the following csv:
https://github.com/antonio1695/Python/blob/master/nearBPO/facturasprueba.csv
With it I want to use the apriori function to find association rules. However, I get the error:
Error in asMethod(object) :
column(s) 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 not logical or a factor. Discretize the columns first.
I have already bumped into this error before, and what I did was:
df$columnX <- discretize(df$columnX)
However, this only works if I manually select each column and discretize them one by one. I would like to do the same thing for approximately 3,000 columns. The linked example has only 11 such columns (columns 2 to 12), which should be enough to demonstrate the problem.

I found the answer; thanks for everyone's help, though. To select and discretize multiple columns:
for (i in 2:12){df[,i]<-discretize(df[,i])}
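A loop-free variant of the same idea, as a minimal sketch. To keep it runnable without the arules package, it uses base R's cut() as a stand-in for discretize(); the toy data frame and the helper name to_factor are hypothetical and not part of the original CSV. With arules loaded, discretize could be passed to lapply() in place of to_factor.

```r
# Toy data frame standing in for the CSV (hypothetical values)
df <- data.frame(id = 1:6,
                 a  = c(1.0, 2.5, 3.7, 4.1, 5.9, 6.3),
                 b  = c(10, 20, 30, 40, 50, 60))

# Pick every numeric column except the identifier column
num_cols <- setdiff(names(df)[sapply(df, is.numeric)], "id")

# Base-R stand-in for arules::discretize(): three equal-width bins
to_factor <- function(x) cut(x, breaks = 3)

# Convert all selected columns in one assignment
df[num_cols] <- lapply(df[num_cols], to_factor)

str(df)
```

Selecting columns by type rather than by position (2:12) avoids hard-coding the range when the data has thousands of columns.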

Related

How to find the index of an array, where the element has value x, in R

I have a very large array (RFO_2003; dim = c(360, 180, 13, 12)) of numeric data. I made the array using a for-loop that does some calculations based on another array. I am trying to check some samples of data in this array to ensure I have generated it properly.
To do this, I want to apply a function that returns the index of the array where that element equals a specific value. For example, I want to start by looking at a few examples where the value == 100.
I tried
which(RFO_2003 == 100)
That returned (first line of results)
[1] 459766 460208 460212 1177802 1241374 1241498 1241499 1241711 1241736 1302164 1302165
match gave the same results. What I was expecting was something more like
[8, 20, 3, 6], [12, 150, 4, 7], [16, 170, 4, 8]
Is there a way to get the indices in that format?
My searches have turned up solutions in other languages, a lot of material on vectors, and examples where the index is never output but fed straight into another part of a custom function, so I can't see which part would produce the index in a form I understand. One example is this question, although that one also returns dimnames rather than an index.
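One way to get indices in that format is the arr.ind argument of which(), which returns one row per match and one column per dimension. A minimal sketch on a small hypothetical array (arr stands in for RFO_2003):

```r
# Small 4-dimensional array standing in for RFO_2003 (hypothetical data)
arr <- array(0, dim = c(4, 3, 2, 2))
arr[2, 3, 1, 2] <- 100
arr[4, 1, 2, 1] <- 100

# arr.ind = TRUE converts linear positions into per-dimension subscripts
idx <- which(arr == 100, arr.ind = TRUE)
print(idx)

# The same conversion applied to linear indices that were already computed
lin  <- which(arr == 100)
idx2 <- arrayInd(lin, dim(arr))
```

Rows come back in linear-index order, so here the match at [4, 1, 2, 1] is listed before the one at [2, 3, 1, 2].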

How to get a concrete isomorphism (renaming) in the igraph R package while searching for isomorphic subgraphs?

I have two undirected graphs.
require (igraph)
gsmall <- graph(c(1,3,5,8,3,5), directed = F)
gbig <- graph(c(3, 5, 3, 10, 4, 5, 4, 10, 5, 7, 5, 8, 5, 9, 7, 10, 8, 10, 9, 10), directed = F)
Now I want to know whether gbig contains a subgraph that is isomorphic to gsmall. More precisely, I want one specific mapping (if it exists).
In the igraph R package this can be done with the subgraph_isomorphisms function. The problem is that this function gives me all isomorphisms, which is expensive even in this small example.
So I tried graph.subisomorphic.lad(gsmall, gbig, all.maps =F) which gives me
$iso
[1] TRUE
$map
[1] 3 1 10 6 9 8 4 5
$maps
NULL
as a result. Supposedly $map contains the information I need. But I don't know how to use these numbers to generate a renaming of nodes from gsmall such that the renamed version of gsmall is actually a subgraph of gbig. I have the same translation problem with the output of subgraph_isomorphisms which according to the help returns a 'list of vertex sequences, corresponding to all mappings from the first graph to the second' which I don't understand.
Can anyone tell me how to get the renaming I want? If I am right in assuming that the $map entry of the result of graph.subisomorphic.lad(gsmall, gbig, all.maps =F) contains what I need, how do I get the renaming from there? If not, how can I achieve it another way?
Thanks in advance.
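A possible reading of $map, sketched in base R without igraph: assume map[i] is the vertex of gbig that vertex i of gsmall is sent to. That interpretation is an assumption, but the check below is consistent with it for the output shown above. The edge lists are copied from the two graph() calls; the names small_edges, big_edges, and edge_key are made up for this sketch.

```r
# Edge lists taken from the graph() calls above, one edge per row
small_edges <- matrix(c(1, 3, 5, 8, 3, 5), ncol = 2, byrow = TRUE)
big_edges   <- matrix(c(3, 5, 3, 10, 4, 5, 4, 10, 5, 7, 5, 8, 5, 9,
                        7, 10, 8, 10, 9, 10), ncol = 2, byrow = TRUE)

# $map as printed in the question
map <- c(3, 1, 10, 6, 9, 8, 4, 5)

# Rename both endpoints of every gsmall edge through the map
renamed <- cbind(map[small_edges[, 1]], map[small_edges[, 2]])

# Undirected edges: compare endpoints in sorted order
edge_key <- function(e) paste(pmin(e[, 1], e[, 2]), pmax(e[, 1], e[, 2]))

# TRUE if every renamed edge is also an edge of gbig
all(edge_key(renamed) %in% edge_key(big_edges))
```

Under this reading the edges 1-3, 5-8, and 3-5 of gsmall become 3-10, 9-5, and 10-9, all of which appear in gbig.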

Subsetting columns from a data frame in R

I have a huge data frame, but I only need some columns to work on. My code:
outcome_data<- read.csv("dat.csv", colClasses= "character")
interested_data<- outcome_data[, c(1, 2, 7, 11, 17, 23)]
is giving me this error when I run it in my function:
Error in data.frame(list(Provider.Number = c("450690", "450358", "450820", : arguments imply differing number of rows: 370, 0
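But it works fine in interactive mode.
Is there an alternative, or a way to fix this?
One alternative is data.table::fread(data, select, ...), which reads only the columns you ask for. From its documentation:
select: Vector of column names or numbers to keep, drop the rest.
For example: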
fread(data, select=c("A","D"))
fread(data, select=c(1,4))
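If adding data.table is not an option, base R's read.csv() can also skip columns at read time by setting their colClasses entry to "NULL". A minimal sketch, assuming a small temporary file with hypothetical column names A to D in place of dat.csv:

```r
# Write a small CSV to a temporary file (hypothetical stand-in for dat.csv)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(A = 1:3, B = letters[1:3], C = 4:6, D = c("x", "y", "z")),
          tmp, row.names = FALSE)

keep <- c("A", "D")

# Read only the header row to learn the column names
header <- names(read.csv(tmp, nrows = 1))

# "NULL" tells read.csv to skip a column; kept columns are read as character
classes <- ifelse(header %in% keep, "character", "NULL")

subset_df <- read.csv(tmp, colClasses = classes)
unlink(tmp)

str(subset_df)
```

This does not diagnose the "differing number of rows" error itself, which depends on how the function is called; it only shows another way to load a subset of columns.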

chisq.test() on transition matrix for point-of-gaze

All,
I am trying to do a chisq.test() for eye-tracking data in a transition matrix where each row represents the tally of gaze transitions from one of 7 areas of interest (AoIs) to each of the others. In this analysis, it makes no sense for there to be a transition from one AoI to itself. Hence, those fields contain NAs.
I have tried a variety of formats, from a basic tabular input of 8 columns and rows (with the top row being the headers and the left column being the "from"s) to a simple three-column data frame (from, to, values).
My data.frame looks like this:
from <- c("frLS", "frLF", "frRF", "frRS", "frIns", "frEng", "frOthr")
frLS <- c(NA, 77, 3, 0, 17, 0, 1)
frLF <- c(18, NA, 14, 1, 56, 2, 9)
frRF <- c(1, 52, NA, 15, 16, 1, 14)
frRS <- c(0, 7, 35, NA, 13, 15, 30)
frIns <- c(3, 54, 2, 1, NA, 4, 37)
frEng <- c(0, 9, 0, 3, 27, NA, 61)
frOthr <- c(2, 60, 2, 5, 27, 4, NA)
aoi.df <- data.frame(from, frLS, frLF, frRF, frRS, frIns, frEng, frOthr)
(Note that this is not actual data, but example data taken from the textbook on eye tracking by Holmqvist et al.)
I have also tried this as a matrix:
aoi.matrix <- matrix(c(frLS, frLF, frRF, frRS, frIns, frEng, frOthr), ncol=7)
I believe the problem is the NAs rather than the form of the data, but if that is the case, I am not sure how to handle it.
The NAs are indeed the problem. The error message is quite clear:
> chisq.test(aoi.matrix)
Error in chisq.test(aoi.matrix) :
all entries of 'x' must be nonnegative and finite
You need to substitute the NAs with something else, say 0, if that makes sense.
Now, I don't quite understand your problem. But are you sure that a chisq.test is what you want? It doesn't make sense to me. Recall that you're testing for independence. However, if the diagonal elements are always zero or NA, then rows and columns cannot be independent.
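The substitution suggested above can be sketched as follows, using the example vectors from the question. This only shows that the error goes away once the NAs are replaced; as the caveat above says, the resulting test may not be meaningful when the diagonal is zero by construction. The name aoi.zero is made up for this sketch.

```r
# Example counts from the question
frLS   <- c(NA, 77, 3, 0, 17, 0, 1)
frLF   <- c(18, NA, 14, 1, 56, 2, 9)
frRF   <- c(1, 52, NA, 15, 16, 1, 14)
frRS   <- c(0, 7, 35, NA, 13, 15, 30)
frIns  <- c(3, 54, 2, 1, NA, 4, 37)
frEng  <- c(0, 9, 0, 3, 27, NA, 61)
frOthr <- c(2, 60, 2, 5, 27, 4, NA)
aoi.matrix <- matrix(c(frLS, frLF, frRF, frRS, frIns, frEng, frOthr), ncol = 7)

# Replace the diagonal NAs with 0 so every entry is finite
aoi.zero <- aoi.matrix
aoi.zero[is.na(aoi.zero)] <- 0

# Small expected counts can trigger an approximation warning; silence it here
res <- suppressWarnings(chisq.test(aoi.zero))
print(res)
```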
Okay, here is how to handle a chisq.test with NAs. One thing I did not know when I asked this question is that the NAs in my matrix are what are called "structural zeros." They are neither zeros in the sense of an observed count of zero nor some unexplained blip in data collection. Rather, they arise from the structure of the data set. In the case of the transition matrix, we do not allow a transition from object "A" to itself, only to other objects.
All of that said, it turns out that there is (of course) an R package for that! I need to refer you to the aylmer documentation for a more detailed explanation, but I pretty much got what I was hoping chisq.test would give me from:
aylmer.test(aoi.df, alternative = "two.sided", simulate.p.value = TRUE)
Note that I did have to remove the first column of "from" names, but other than that things worked just fine.

create new vector from existing vectors by using "rep"

Suppose I have the following two vectors,
a<-c(2,3,5)
b<-c(1,3,2)
Now I want to create a new vector c with these results from a and b:
2, 3, 3, 3, 5, 5
I tried this code, but it does not work, and I am stuck. How can I get the result shown above?
for (i in 1:3){
c<-rep(a[i], each=b[i])
}
rep(a,b) is what you're looking for.
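The loop in the question fails because each iteration overwrites the previous result, so only the last repetition (5, 5) survives; each = b[i] would also work for a single value, but times is the argument that accepts a whole vector of counts. A short sketch comparing the vectorized call with a corrected loop (the names result and looped are chosen here to avoid masking the c() function):

```r
a <- c(2, 3, 5)
b <- c(1, 3, 2)

# Vectorized: repeat a[i] exactly b[i] times
result <- rep(a, times = b)
print(result)  # 2 3 3 3 5 5

# Equivalent loop that appends instead of overwriting
looped <- c()
for (i in seq_along(a)) {
  looped <- c(looped, rep(a[i], times = b[i]))
}
```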
