Determining if each vector element not exceeds all previous elements - r

I need to compare element i with all previous elements i-1,i-2,..., and if i < i-1, i-2, ... return 1, otherwise return 0.
data <- c(10.3,14.3,7.7,15.8,14.4,16.7,15.3,20.2,17.1,7.7,15.3,16.3,19.9,14.4,18.7,20.7)
The result of comparing should be the following.
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I tried to make it with
as.integer(cummin(data)==data)
and i get
1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0
The first elements easy to fix. But what to do with another 1 on 10 position.

A possible approach:
v <- rank(data,ties='first')
out <- as.integer(cummin(v)==v)
# [1] 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Taking care of the first element:
out[1] <- 0

try this:
sapply(1 : length(data), FUN = function(i) all(data[i] < data[1 : (i - 1)]) * 1)
#[1] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

Related

Genetic Algorithm in R: Specify number of 1s in binary chromosomes

I am using the rbga function, but my question still stands for other genetic algorithm implementations in R. Is there a way to specify the number of 1s in binary chromosomes?
I have the following example provided by the library documentation.
data(iris)
library(MASS)
X <- as.data.frame(cbind(scale(iris[,1:4]), matrix(rnorm(36*150), 150, 36)))
Y <- iris[,5]
iris.evaluate <- function(indices) {
print("Chromosome")
print(indices)
print("================================")
result = 1
if (sum(indices) > 2) {
huhn <- lda(X[,indices==1], Y, CV=TRUE)$posterior
result = sum(Y != dimnames(huhn)[[2]][apply(huhn, 1,
function(x)
which(x == max(x)))]) / length(Y)
}
result
}
monitor <- function(obj) {
minEval = min(obj$evaluations);
plot(obj, type="hist");
}
woppa <- rbga.bin(size=40, mutationChance=0.05, zeroToOneRatio=10,
evalFunc=iris.evaluate, showSettings=TRUE, verbose=TRUE)
Here are some of the chromosomes.
"Chromosome"
0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
"================================"
"Chromosome"
0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0
"================================"
"Chromosome"
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
"================================"
"Chromosome"
0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
"================================"
The 1s (i.e., the chosen characteristics) are 5, 8, 5 and 4 respectively.
I am trying to follow the technique specified in a paper and they claim that they apply a genetic algorithm and in the end they pick a specific number of characteristics.
Is it possible to specify in a genetic algorithm the number of characteristics that I want my solution(s)/chromosome(s) to have?
Could this be done on the final solution/chromosome and if yes how?

Identify a vector within a list with at least n ocurrences of given value in R

I would like to write a function which would allow to filter the input data.
My input data is a list object containing named numeric vectors (minimal reproducible example below - dummy list).
vec1 <- c(rep(0, 10), rep(1, 4), rep(0,5), rep(-1,5))
vec2 <- c(rep(-1, 7), rep(0,99), rep(1, 6))
vec3 <- c(rep(1,2), rep(-1,2), rep(0,10), rep(-1,4), rep(0,8))
vec4 <- rep(0, 100)
dummy_list <- list(vec1, vec2, vec3, vec4)
names(dummy_list) <- c("first", "second", "third", "fourth")
I want my function to test whether in this vector any non-zero value occurs at least 5 times in a row.
My desired output should be a list containing only first two vectors of the initial dummy_list.
Below is one of my multiple attempts - I would like it to be as much similar to this as possible (except that the solution should work).
dummy_list <- Filter(function(x) which(rle(x$values !=0) x$lengths>5, dummy_list)
Note that we check whether any of the the rle length is greater or equal to 5.
Filter(function(x)any(with(rle(x), lengths[values!=0]>=5)), dummy_list)
$first
[1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 -1 -1 -1 -1 -1
$second
[1] -1 -1 -1 -1 -1 -1 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[32] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[63] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[94] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1

Find the vertical distance between the top most '1' and the bottom most '1' in a matrix in R

I have successfully imported a csv file into R. It is a 6 by 6 matrix.
0 0 0 0 0 0
0 1 0 0 0 0
0 1 1 0 0 0
0 1 0 0 0 1
0 1 0 1 0 0
0 0 0 0 0 0
'1' exists in the second row and also exists in the second last row. So the distance between them vertically is 4.
Would I use the dist function to calculate this? And if so how would I implement it to give me the value of 4?
diff(range(which(rowSums(mat) > 0)))
# [1] 3
Explanation: since the data is binary, we can look at the distance between rows where the row sum is >0.
Adapting Sathish's nicely share data, this works:
mat <- matrix(as.integer(unlist(strsplit('0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0', " "))),
nrow = 6, ncol = 6, byrow = TRUE)

Counting repeated 5-mers gene from 100 DNA sequence samples

I am beginner in R and trying to solve this but have been struggling for few days already. Please help a newbie out.
I extracted 100 samples each of length 1000 from a 100,000 DNA sequence. Then, I want to count "AATAA" appeared how many times in the each of the sample.
dog_100
# [1] "GGGTCCTTGAAAGAAGCACAGGGTGGGGGTGGGGGTGGGGGTGGGGGAAGGCAGAGAGGAGGAAACAGGTTTTTGTCCTCAGGGCGTTGCCAGTCTGAAGGAGGTGATGGGATAATTATTTATGAGAGTTCAGGAATGCCAGGCATGGATTAAATGCAAACTAATGGAAATGACACAGAACAATACATTACAC......................................"
#[2] "CCAGGCCAGAACTGAGGCCCTCAGGGCCCCCCAGAATTCCTCATTTGCAGGATAAAAATATACTCAGCTCTTCAATCTTGGTTCTTGCTACTGCACCATGTGCTTCCTGGACTCTGGGAGGCCAGGGGTTAAGTGGGAGTGTTTGAATAAGGGAAAGGATGAGCCCTTTCCCCACACTTTGCCCCAAATAAC......................................"
#[3]
#........
# [4]
#........
# [100]
#........
I wrote a function to identify and count the "AATAA".
R
library(stringr)
cal_AATAA <- function(DNA){
sam_pro <- numeric(length(DNA))
k <- 5
sam_code <- "AATAA"
for(i in 1:(length(DNA))){
Num <- str_length(DNA[i])
for(j in 1:(Num - k +1)){
if ((str_sub(DNA[i], j, j+k-1)) == sam_code){
sam_pro[i] <- sam_pro[i] + 1
}
else {
sam_pro[i] <- sam_pro[i]
}
}
return (sam_pro)
}
}
sample_100 <- cal_AATAA(dog_100)
What I got after running the function is
> sample_100
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[46] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[91] 0 0 0 0 0 0 0 0 0 0
Tried to debug my code but don't know where went wrong. Appreciate any tips or guidance.
R has a built in function called gregexpr which can be used for counting patterns in a string. It outputs a list, so we have to use sapply to loop through the elements of the output. For each element, we count the number of values that are greater than zero because a value of -1 indicates that any match was not found. Look at the output of gregexpr("ap", c("appleap", "orange")) as an example.
dna = c("AGTACGTGCATAGC", "GTAGCTAGCTAGCAT")
sam = "AGC"
sapply(gregexpr(sam, dna), function(x) sum(x > 0))
#[1] 1 3

Loop through two data tables from column to row wise?

I have two data frames:
DT1: (This data frame's column values I need to edit based on another datatable DT2)
BIC BCC1 BCC2 BCC6 BCC8 BCC9 BCC10 BCC11
990081899A 0 1 0 0 0 0 0
9900023620 0 1 1 0 0 0 0
9900427160 0 1 0 1 0 0 0
990064457TA 1 1 0 1 0 0 0
990066595A 0 0 0 0 0 0 1
990088248A 0 0 0 0 0 0 1
990088882C1 0 0 0 0 0 0 1
990088882C2 0 0 0 1 1 0 0
990088882C3 0 0 0 1 1 0 0
990088882C4 0 0 0 1 1 0 0
990088882C5 0 0 0 1 1 0 0
DT2:
BCC HIER1 HIER2 HIER3 HIER4 HIER5
BCC8 BCC9 BCC10 BCC11 BCC12 0
BCC9 BCC10 BCC11 BCC12 0 0
BCC10 BCC11 BCC12 0 0 0
BCC11 BCC12 0 0 0 0
BCC17 BCC18 BCC19 0 0 0
BCC18 BCC19 0 0 0 0
BCC27 BCC28 BCC29 BCC80 0 0
BCC28 BCC29 0 0 0 0
BCC46 BCC48 0 0 0 0
BCC54 BCC55 0 0 0 0
BCC57 BCC58 0 0 0 0
BCC70 BCC71 BCC72 BCC103 BCC104 BCC169
I want to look up the column names in DT1 though first column values in DT2$BCC, according to the hierarchy logic, as:
I want to loop through DT1 column names except first column and nest that loop through DT2 first column values to check if they are equal. If they are equal then get that DT2$BCC value and check if DT1$(DT2$BCC) = 1, if yes then set value 0 in DT1 columns are present in (HIER1, HIER2, HIER3,.......)
Result should be:
BIC BCC1 BCC2 BCC6 BCC8 BCC9 BCC10 BCC11
990081899A 0 1 0 0 0 0 0
9900023620 0 1 1 0 0 0 0
9900427160 0 1 0 1 0 0 0
990064457TA 1 1 0 1 0 0 0
990066595A 0 0 0 0 0 0 0
990088248A 0 0 0 0 0 0 0
990088882C1 0 0 0 0 0 0 0
990088882C2 0 0 0 1 0 0 0
990088882C3 0 0 0 1 0 0 0
990088882C4 0 0 0 1 0 0 0
990088882C5 0 0 0 1 0 0 0
I am doing this now:
cols<-setdiff(names(DT1), "HIC")
subs<-as.character(DT2$BCC)
colsHier<-setdiff(names(DT2), "BCC")
paste0("DT1$", eval(cols[i]))<-
for( i in 1:length(cols)){
for (k in 1:length(subs)){
ifelse(cols[i] == subs[k],
ifelse(do.call(paste0, list('DT1$', eval(cols[1]),'[]')) == 1,
for (j in 1:length(colsHeir)){
if(colsHeir[j]!= 0)
x<-paste0('DT2$',eval(colsHier[j]))
paste0('DT1$',eval(x[k])):= 0}
,DT1$cols[i]), DT1$cols[i])}}
I am trying to match the value of do.call(paste0, list('DT1$', eval(cols[1]),'[]')) == 1, but when I am running this expression in R I am getting following:
> do.call(paste0, list('DT1$', eval(cols[2]),'[1]'))
[1] "DT1$BCC2[1]"
and NOT the value of the cell. How can I access the value of that cell to match with 1.
I am not able get the correct way of doing this. I am sorry for long question. Any help is appreciated.
library(reshape2)
melt the data
dt1.m <- melt(dt1, id = "BIC")
dt2.m <- melt(dt2, id = "BCC")
If the dt1.m$variable is equal to one of the values in dt2.m set it to 0
dt1.m$value <- ifelse(dt1.m$variable %in% dt2.m$value, 0, dt1.m$value)
cast the data into proper form
dt1.c <- dcast(dt1.m, ...~variable)
Dcast automatically reorders the rows.

Resources