I am trying to solve a question in which I am given an array of unsorted elements, and it asks me to choose three pairwise distinct indices from the array at a time in order to sort the array.
So my question is: what does "pairwise distinct" mean?
A collection of k items are pairwise distinct if no two of them are equal to one another. For example, the values 1, 2, and 3 are pairwise distinct, but the values 1, 1, and 3 are not.
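For instance, a quick check in R (a sketch; the same idea works in any language):
is_pairwise_distinct <- function(x) length(unique(x)) == length(x)
is_pairwise_distinct(c(1, 2, 3))  # TRUE
is_pairwise_distinct(c(1, 1, 3))  # FALSE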
I have a database that contains many categories, each associated with an ID number. I am looking for a method of assigning ID numbers to the categories such that, if you were to add any of the assigned ID numbers together, the sum would be unique. In turn, that unique sum could be used to work backwards and identify which categories were added together. It should not be limited to just two elements summed together; it could be 2, 3, ..., n elements summed together. For example, if I summed four category ID numbers together, that sum would be unique to that combination of four categories only; no other combination of categories, four or otherwise, would result in the same sum.
An applied example of this could be a list or table of races. Each race would have an ID number associated with it. If a person identified themselves as being more than one race, e.g. white and black (two races), or white, black, and Asian (three races), then the sum of the IDs would be a unique number such that the sum would only be associated with that specific combination of races. You could then reference that summed number as a code value for a multi-racial description that could be printed on a report or something.
Is there a mathematical term for this?
Is there a formula one could use to create the ID numbers that would cause this result?
And finally, if presented with a number, what would be the most efficient way to break the number down to determine which elements were summed together, assuming you had the list of constituent category IDs?
I would actually handle this by concatenating strings rather than adding numbers. For instance, if you had three races, with IDs of 1, 2, and 3, and you wanted to identify someone who picked all three, I would use an ID of "123". Of course, if you're sticking with plain numbers, you can add an extra zero for each successive ID, so the first ID would be 1, the second would be 20, and the third would be 300. But, again, with string concatenation your IDs could be more than just numbers.
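For example, a sketch of that digit-position variant in R (the race labels are just for illustration):
ids <- c(white = 1, black = 20, asian = 300)  # each ID occupies its own decimal digit
sum(ids[c("white", "asian")])  # 301: the digits reveal which IDs were added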
The smallest such series of integers is the powers of 2:
1, 2, 4, 8, 16, 32, 64, 128, 256...
No two different subsets of this series have the same sum - this is why the binary numeral system works. Note that this is equivalent to using the bits of an integer as a set data structure, normally called a bitset or bit array.
To determine the individual elements which make up a sum, you can use the bitwise & operator to test whether each individual bit is a 1 or a 0 (indicating whether that element is present or absent, respectively).
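In R, for example, the decomposition might look like this (a sketch; the category IDs here are hypothetical):
ids <- c(white = 1, black = 2, asian = 4, other = 8)  # powers of 2
decode <- function(total) names(ids)[bitwAnd(total, ids) != 0]
decode(3)  # "white" "black"
decode(7)  # "white" "black" "asian"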
I am a new R user.
I have a dataframe consisting of 50 columns and 300 rows. The first column holds the ID, while the 2nd through the last columns hold the standard deviations (sd) of traits. The pooled sd for each column is in the last row. For each column, I want to remove all values that are more than ten times the pooled sd. I want to do this in one run. So far, the script below is what I have come up with for testing whether a value is greater than the pooled sd. However, even the ID column (character) is being processed (resulting in all FALSE). If I use raw_sd_summary[-1], I have no way of knowing which ID on which trait meets the criterion I'm looking for.
logic_sd <- lapply(raw_sd_summary, function(x) x > tail(x, 1))
logic_sd_df <- as.data.frame(logic_sd)
What shall I do? And how can I extract all the values flagged TRUE (i.e. more than ten times the pooled sd), along with their corresponding IDs?
I think your code won't work as intended: lapply runs over all of the data.frame's columns, including the character ID column, and your test is missing the factor of ten. Compute the comparison on the numeric columns only:
logic_sd <- apply(raw_sd_summary[, -1], 2, function(x) x > 10 * tail(x, 1))
This will give you a logical matrix flagging the entries that are more than 10 times the last row's value. You can keep track of the IDs by attaching them as row names:
rownames(logic_sd) <- raw_sd_summary[, 1]
You could remove/replace the unwanted values in the original table directly by
raw_sd_summary[-300, -1][logic_sd[-300, ]] <- NA  # or a new value; row 300 is the pooled-sd row
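If you also want the flagged values together with their IDs in one table, something like this sketch could work (it assumes the layout described in the question: IDs in column 1, pooled sds in the last row):
pooled <- as.numeric(raw_sd_summary[nrow(raw_sd_summary), -1])   # pooled sds
vals <- as.matrix(raw_sd_summary[-nrow(raw_sd_summary), -1])     # trait values
hits <- which(sweep(vals, 2, 10 * pooled, `>`), arr.ind = TRUE)  # flagged cells
data.frame(ID = raw_sd_summary[hits[, "row"], 1],
           trait = colnames(vals)[hits[, "col"]],
           value = vals[hits])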
I have a data matrix in R with 45 rows. Each row represents a value for an individual sample. I need to do a trial simulation: I want to pair up samples randomly and calculate their differences, with a large sampling (maybe 10000) from all the possible permutations and combinations.
This is how I have managed to do it so far:
My data matrix ("data") has 45 rows and 2 columns. I selected 45 rows randomly and subtracted them from another 45 randomly selected rows.
n1 <- data[sample(nrow(data), size = 45, replace = FALSE), ] - data[sample(nrow(data), size = 45, replace = FALSE), ]
This gave me a random set of differences of size 45.
I made 50 such vectors (n1 to n50) and used rbind, which gave me a big data matrix of random differences.
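Sketched with replicate() rather than 50 separate vectors, that repetition amounts to something like:
new <- do.call(rbind, replicate(50, {
  data[sample(nrow(data), 45), ] - data[sample(nrow(data), 45), ]
}, simplify = FALSE))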
Of course, some rows in the first random set matched the corresponding rows in the second random set and cancelled out to zero. I removed those rows as follows:
row_sub <- apply(new, 1, function(row) all(row != 0))
new.remove.zero <- new[row_sub, ]
BUT, is there a cleaner way to do this? A simpler way to generate all possible random pairs of rows, calculate their differences, and bind them together as a new matrix?
Thanks in advance.
I'm interested in identifying contiguous regions within a matrix (not necessarily square) of 0-1 (boolean) values, using R. I would like to identify each contiguous cluster (diagonals count, although an option of whether or not to count them would be ideal) and record the number of cells within that cluster.
Take the following example:
set.seed(14)
p <- matrix(0, ncol = 10, nrow = 10)
p[sample(1:100, 10)] <- 1
ones <- which(p == 1)
image(p)
I'd like to be able to identify (since I'm counting diagonals) four different groups, with (from top to bottom) 2, 1, 5, and 2 cells per cluster.
The raster package has an adjacent function which does a good job of locating adjacent cells, but I can't figure out how to get from there to labeled clusters.
One last constraint is that an ideal solution should be fast. I'd like to be able to use it within a data.table dt[, lapply(.SD, ...)] type situation with a large number of groups (each group being a data set from which I could create the matrix).
You definitely need a connected-component labeling algorithm.
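In R, one readily available implementation is raster::clump() (a sketch; clump() requires the igraph package, and directions = 8 makes diagonals count):
library(raster)
r <- raster(p)                  # p is the 0-1 matrix from the question
cl <- clump(r, directions = 8)  # label each connected cluster; 0 is background
freq(cl)                        # cell count per cluster label
For the 4-connected case (no diagonals), use directions = 4.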
I've computed 4 different similarity matrices for the same items but in different time periods, and I'd like to compare how the similarity between items changes over time. The problem is that the order of items, i.e. of the matrix columns, is different for every matrix. How can I reorder the columns of the matrices so that all my matrices become comparable?
Moving comment to an answer:
For each matrix mx_j, create a vector of the column names:
cnj <- colnames(mx_j)
Then for any given pair of matrices j and k,
colmatch <- intersect(cnj, cnk)
will identify the common columns, and the analysis can be limited to that subset of names.
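A sketch of the full alignment (assuming each similarity matrix carries the same item names on both its rows and its columns):
colmatch <- intersect(colnames(mx_j), colnames(mx_k))
mx_j_sub <- mx_j[colmatch, colmatch]  # reorder rows and columns to the shared items
mx_k_sub <- mx_k[colmatch, colmatch]
mx_j_sub and mx_k_sub then line up cell-by-cell and can be compared directly.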