Loop through two data tables from column to row wise? - r

I have two data frames:
DT1 (the data frame whose column values I need to edit, based on another data table, DT2):
BIC BCC1 BCC2 BCC6 BCC8 BCC9 BCC10 BCC11
990081899A 0 1 0 0 0 0 0
9900023620 0 1 1 0 0 0 0
9900427160 0 1 0 1 0 0 0
990064457TA 1 1 0 1 0 0 0
990066595A 0 0 0 0 0 0 1
990088248A 0 0 0 0 0 0 1
990088882C1 0 0 0 0 0 0 1
990088882C2 0 0 0 1 1 0 0
990088882C3 0 0 0 1 1 0 0
990088882C4 0 0 0 1 1 0 0
990088882C5 0 0 0 1 1 0 0
DT2:
BCC HIER1 HIER2 HIER3 HIER4 HIER5
BCC8 BCC9 BCC10 BCC11 BCC12 0
BCC9 BCC10 BCC11 BCC12 0 0
BCC10 BCC11 BCC12 0 0 0
BCC11 BCC12 0 0 0 0
BCC17 BCC18 BCC19 0 0 0
BCC18 BCC19 0 0 0 0
BCC27 BCC28 BCC29 BCC80 0 0
BCC28 BCC29 0 0 0 0
BCC46 BCC48 0 0 0 0
BCC54 BCC55 0 0 0 0
BCC57 BCC58 0 0 0 0
BCC70 BCC71 BCC72 BCC103 BCC104 BCC169
I want to look up the column names of DT1 in the first column of DT2 (DT2$BCC) and apply the following hierarchy logic:
Loop over the column names of DT1, except the first, and for each one loop over the values of DT2$BCC to check whether they are equal. If they are equal, take that DT2$BCC value and check whether the DT1 column of that name equals 1. If it does, set to 0 the DT1 columns that are named in that row's HIER1, HIER2, HIER3, ... columns.
Result should be:
BIC BCC1 BCC2 BCC6 BCC8 BCC9 BCC10 BCC11
990081899A 0 1 0 0 0 0 0
9900023620 0 1 1 0 0 0 0
9900427160 0 1 0 1 0 0 0
990064457TA 1 1 0 1 0 0 0
990066595A 0 0 0 0 0 0 0
990088248A 0 0 0 0 0 0 0
990088882C1 0 0 0 0 0 0 0
990088882C2 0 0 0 1 0 0 0
990088882C3 0 0 0 1 0 0 0
990088882C4 0 0 0 1 0 0 0
990088882C5 0 0 0 1 0 0 0
I am doing this now:
cols<-setdiff(names(DT1), "HIC")
subs<-as.character(DT2$BCC)
colsHier<-setdiff(names(DT2), "BCC")
paste0("DT1$", eval(cols[i]))<-
for( i in 1:length(cols)){
for (k in 1:length(subs)){
ifelse(cols[i] == subs[k],
ifelse(do.call(paste0, list('DT1$', eval(cols[1]),'[]')) == 1,
for (j in 1:length(colsHeir)){
if(colsHeir[j]!= 0)
x<-paste0('DT2$',eval(colsHier[j]))
paste0('DT1$',eval(x[k])):= 0}
,DT1$cols[i]), DT1$cols[i])}}
I am trying to test whether do.call(paste0, list('DT1$', eval(cols[1]),'[]')) == 1, but when I run this expression in R I get the following:
> do.call(paste0, list('DT1$', eval(cols[2]),'[1]'))
[1] "DT1$BCC2[1]"
and NOT the value of the cell. How can I access the value of that cell so that I can compare it with 1?
I cannot find the correct way of doing this. I am sorry for the long question. Any help is appreciated.
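On the narrower point of reading a cell when the column name is held in a variable: paste0 only assembles a character string, so comparing its result with 1 never looks at the data. A minimal sketch of name-based indexing with `[[`, assuming DT1 is a data.frame (or data.table) and using a two-row stand-in built from the first rows above; the names as_text, cell and hits are illustrative only:

```r
# Two-row stand-in for DT1 (values copied from the first two rows of the question)
DT1 <- data.frame(
  BIC  = c("990081899A", "9900023620"),
  BCC1 = c(0, 0),
  BCC2 = c(1, 1),
  BCC6 = c(0, 1)
)

cols <- setdiff(names(DT1), "BIC")  # "BCC1" "BCC2" "BCC6"

# paste0() only builds text; nothing is looked up in DT1
as_text <- paste0("DT1$", cols[2], "[1]")

# [[ ]] accepts the column name as a string and returns the column itself
cell <- DT1[[cols[2]]][1]      # first value of column BCC2
hits <- DT1[[cols[2]]] == 1    # logical vector, one entry per row
```

Because `[[` returns the whole column, the comparison with 1 is vectorised over rows, which usually removes the need for an explicit row loop.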

Using reshape2 (dt1 and dt2 below are DT1 and DT2 from the question):
library(reshape2)
Melt the data into long form:
dt1.m <- melt(dt1, id = "BIC")
dt2.m <- melt(dt2, id = "BCC")
If dt1.m$variable matches one of the values in dt2.m, set the value to 0:
dt1.m$value <- ifelse(dt1.m$variable %in% dt2.m$value, 0, dt1.m$value)
Cast the data back into wide form:
dt1.c <- dcast(dt1.m, ...~variable)
Note that dcast automatically reorders the rows.
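The same rule can probably be expressed without reshaping. The sketch below is a base R variant under the same assumption as the melt/cast code above: every DT1 column whose name appears anywhere in the HIER columns of DT2 is zeroed, without checking whether the parent column is 1. The small dt1 and dt2 are stand-ins taken from a subset of the question's data, and hier_names and res are illustrative names:

```r
# Small stand-ins for dt1 and dt2 (a subset of the question's data)
dt1 <- data.frame(
  BIC   = c("9900427160", "990066595A", "990088882C2"),
  BCC8  = c(1, 0, 1),
  BCC9  = c(0, 0, 1),
  BCC11 = c(0, 1, 0)
)
dt2 <- data.frame(
  BCC   = c("BCC8", "BCC9"),
  HIER1 = c("BCC9", "BCC10"),
  HIER2 = c("BCC10", "BCC11"),
  stringsAsFactors = FALSE
)

# Every name that occurs in any HIER column of dt2
hier_names <- unique(unlist(lapply(dt2[-1], as.character)))

# Zero the dt1 columns whose names occur there; row order is untouched
res <- dt1
res[names(res) %in% hier_names] <- 0
```

Unlike the dcast route, this keeps the rows in their original order.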

Related

Genetic Algorithm in R: Specify number of 1s in binary chromosomes

I am using the rbga function, but my question still stands for other genetic algorithm implementations in R. Is there a way to specify the number of 1s in binary chromosomes?
I have the following example provided by the library documentation.
library(genalg)  # provides rbga.bin
data(iris)
library(MASS)    # provides lda
X <- as.data.frame(cbind(scale(iris[,1:4]), matrix(rnorm(36*150), 150, 36)))
Y <- iris[,5]
iris.evaluate <- function(indices) {
  print("Chromosome")
  print(indices)
  print("================================")
  result = 1
  if (sum(indices) > 2) {
    huhn <- lda(X[,indices==1], Y, CV=TRUE)$posterior
    result = sum(Y != dimnames(huhn)[[2]][apply(huhn, 1,
                 function(x) which(x == max(x)))]) / length(Y)
  }
  result
}
monitor <- function(obj) {
  minEval = min(obj$evaluations);
  plot(obj, type="hist");
}
woppa <- rbga.bin(size=40, mutationChance=0.05, zeroToOneRatio=10,
                  evalFunc=iris.evaluate, showSettings=TRUE, verbose=TRUE)
Here are some of the chromosomes.
"Chromosome"
0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
"================================"
"Chromosome"
0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0
"================================"
"Chromosome"
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
"================================"
"Chromosome"
0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
"================================"
The numbers of 1s (i.e., of chosen characteristics) are 5, 8, 5 and 4 respectively.
I am trying to follow a technique described in a paper whose authors say they apply a genetic algorithm and end up with a specific number of characteristics.
Is it possible to specify, in a genetic algorithm, the number of characteristics that I want my solution(s)/chromosome(s) to have?
Could this be done on the final solution/chromosome instead and, if so, how?
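One common way to steer a genetic algorithm towards a given number of 1s is to add a penalty to the evaluation function. The sketch below only illustrates that idea; it assumes, as in the example above, that lower evaluation values are better. The wrapper with_cardinality_penalty, its arguments k and weight, and toy_eval are hypothetical names introduced here and are not part of any package:

```r
# Hypothetical wrapper (not part of any package): adds a penalty that grows
# with the distance between the number of 1s and the desired count k.
with_cardinality_penalty <- function(eval_fun, k, weight = 1) {
  function(indices) {
    eval_fun(indices) + weight * abs(sum(indices) - k)
  }
}

# Toy base evaluation, used only to show the effect of the penalty
toy_eval <- function(indices) 0.5

penalised <- with_cardinality_penalty(toy_eval, k = 5)
exact <- penalised(c(1, 1, 1, 1, 1, 0, 0))  # 5 ones: nothing is added
short <- penalised(c(1, 1, 1, 0, 0, 0, 0))  # 3 ones: a penalty of 2 is added
```

In the example above this could be tried as evalFunc = with_cardinality_penalty(iris.evaluate, k = 5). A penalty only encourages the desired count; it does not guarantee that every chromosome has exactly k ones, so the weight would need tuning against the scale of the evaluation values.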

Separate space-separated strings in a data frame into different columns

I am pretty new to R and I have some data that I need to tidy a bit before I can use it. I have a data frame with a number of rows and columns, and every cell holds a string of 20 space-separated ones and zeros ("0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0").
I am trying to split every cell so that each number gets its own column (one cell would become 20 columns), and then convert the resulting strings into numbers. Here is a small sample of the data, for which I would need the numbers separated into 40 columns and 3 rows:
df<-data.frame(
"V1" = c("0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ","0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ","1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "),
"V2" = c("0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ","0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 ","0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 "))
As you can see, a good way to separate the numbers would be to treat the space as a delimiter, but I am not having any luck with that. I tried df<-lapply(strsplit(df, " "), as.numeric), but strsplit does not accept a data frame. I then tried df<-lapply(strsplit(as.character(df), " "), as.numeric)
That splits on the spaces, but converting the whole data frame to character first messes up the data.
I suppose this is easier than I think, but I still lack the skill to do it.
An easier option is read.table (no packages needed):
read.table(text = as.character(df$V1), header = FALSE)
For multiple columns, use lapply:
lapply(df, function(x) read.table(text = as.character(x), header = FALSE))
You can use cSplit from splitstackshape to convert multiple columns into separate columns.
splitstackshape::cSplit(df, names(df), " ")
# V1_01 V1_02 V1_03 V1_04 V1_05 V1_06 V1_07 V1_08 V1_09 V1_10 V1_11
#1: 0 0 0 0 0 0 0 0 0 0 0
#2: 0 0 0 1 0 0 0 0 0 0 0
#3: 1 0 0 0 0 0 0 0 0 0 0
# V1_12 V1_13 V1_14 V1_15 V1_16 V1_17 V1_18 V1_19 V1_20 V2_01 V2_02
#1: 0 0 0 1 0 0 0 0 0 0 0
#2: 0 0 0 0 0 0 0 0 0 0 0
#3: 0 0 0 0 0 0 0 0 0 0 0
# V2_03 V2_04 V2_05 V2_06 V2_07 V2_08 V2_09 V2_10 V2_11 V2_12 V2_13
#1: 0 0 0 0 1 0 0 0 0 0 0
#2: 0 0 0 0 0 0 0 0 0 0 0
#3: 0 0 0 0 0 0 0 1 0 0 0
# V2_14 V2_15 V2_16 V2_17 V2_18 V2_19 V2_20
#1: 0 0 0 0 0 0 0
#2: 0 0 0 0 0 1 0
#3: 0 0 0 0 0 0 0
Note that I have used names(df) here since you want to split all the columns. If you have additional columns and want to split only a few of them, you can also do
splitstackshape::cSplit(df, c("V1", "V2"), " ")
I found both answers equally good, but I think cSplit made the subsequent processing easier. What I finally did to obtain the result:
library(splitstackshape)
df<-cSplit(df, names(df), " ")
df<-lapply(df,as.numeric)
df<-as.data.frame(df)
I suppose this can be done in fewer lines of code, but this way is more understandable to me. Thank you very much for your answers!
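For comparison, the read.table answer can be pushed to a single numeric data frame without extra packages. This is only a sketch: the helper one_hot is a hypothetical convenience used to rebuild the sample strings (the positions of the 1s are taken from the sample data above), and parts and wide are illustrative names:

```r
# Hypothetical helper that rebuilds a sample string: 20 space-separated
# digits with a single 1 at position pos
one_hot <- function(pos) paste(as.integer(seq_len(20) == pos), collapse = " ")

df <- data.frame(
  V1 = c(one_hot(15), one_hot(4), one_hot(1)),
  V2 = c(one_hot(7), one_hot(19), one_hot(10)),
  stringsAsFactors = FALSE
)

# Parse each column's strings as whitespace-separated numbers ...
parts <- lapply(df, function(x) read.table(text = as.character(x), header = FALSE))

# ... and bind the resulting data frames side by side (3 rows, 40 columns)
wide <- do.call(cbind, parts)
```

read.table already returns numeric columns, so no separate as.numeric step should be needed here.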

Filling a table with additional columns if they don't exist

I have the following problem. Here is a short example of my data. Assume that I have two data sets (my real case has about 20). The data frames are returned as a list by a function I wrote myself and call with lapply, so in my example I put the data frames in a list, too. Then I "rbind" them to compute a frequency table.
df1 <- data.frame(rev(seq(12:0)), paste0("a=",sample(0:12, 13, replace=T)))
colnames(df1) <- c("k", "a")
df2 <- data.frame(rev(seq(12:0)), paste0("a=",sample(0:12, 13, replace=T)))
colnames(df2) <- c("k", "a")
list_df <- list(df1,df2)
df_combine<- plyr::ldply(list_df, rbind)
freq_foo <- table(df_combine$k,df_combine$a)
I get a frequency table of the following form.
a=0 a=11 a=12 a=2 a=5 a=6 a=7 a=8 a=3 a=9
1 1 0 0 0 0 0 0 1 0 0
2 1 0 0 0 0 0 0 0 0 1
3 1 0 0 0 0 1 0 0 0 0
4 0 0 0 1 0 1 0 0 0 0
5 0 0 0 1 1 0 0 0 0 0
6 0 0 0 0 0 0 1 0 0 1
7 0 1 1 0 0 0 0 0 0 0
8 1 0 0 0 0 1 0 0 0 0
9 0 0 0 0 0 0 2 0 0 0
10 0 0 1 0 1 0 0 0 0 0
11 1 1 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 1 0 1 0
13 1 0 1 0 0 0 0 0 0 0
I want to extend and rearrange my table in the following way:
First, the table should cover the range a=0 to a=15, so any missing column should be added. Second, I want to order the columns from 0 to 15.
For the first problem I tried
if(freq_foo$paste0("a=",0:15) == F){freq_foo$paste("a=",0:15) <- 0}
but this could only work for data frames, not for tables. Also, I have no idea how to put the columns in ascending order. The data type isn't important to me because I just want to use the output for further calculations, so the result can also be a data frame instead of a table.
#convert freq_foo table to dataframe
df <- as.data.frame.matrix(freq_foo)
#add all zeros column for missing column name in 0:15 series
df[, paste0("a=", c(0:15)[!(c(0:15) %in% as.numeric(gsub(".*=(\\d+)", "\\1", names(df))))])] <- 0
#order columns from 0 to 15
df <- df[, order(as.numeric(gsub(".*=(\\d+)", "\\1", names(df))))]
Output is:
a=0 a=1 a=2 a=3 a=4 a=5 a=6 a=7 a=8 a=9 a=10 a=11 a=12 a=13 a=14 a=15
1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
2 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
3 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0
5 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
6 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
7 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0
8 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
10 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
11 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
12 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
13 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
(Edit: Updated code after getting a requirement clarification from OP)
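An alternative that may avoid the post-processing is to declare the full set of categories before tabulating: table() keeps a column for every factor level, in level order, even when a level never occurs. A minimal sketch, assuming a combined data frame shaped like df_combine above; the sample data, the seed and the names all_levels and freq_full are illustrative only:

```r
set.seed(1)  # only to make this illustration reproducible

# Stand-in for df_combine: 26 rows, k from 13 down to 1 twice, random a-labels
df_combine <- data.frame(
  k = rep(13:1, times = 2),
  a = paste0("a=", sample(0:12, 26, replace = TRUE)),
  stringsAsFactors = FALSE
)

# Fix both the set and the order of the columns up front
all_levels <- paste0("a=", 0:15)
freq_full <- table(df_combine$k, factor(df_combine$a, levels = all_levels))

# Optional: convert to a data frame for further calculations
freq_df <- as.data.frame.matrix(freq_full)
```

Levels that never occur (here at least a=13 to a=15) appear as all-zero columns.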

Determining whether each vector element is smaller than all previous elements

I need to compare element i with all previous elements i-1, i-2, ..., and return 1 if element i is smaller than every one of them, and 0 otherwise.
data <- c(10.3,14.3,7.7,15.8,14.4,16.7,15.3,20.2,17.1,7.7,15.3,16.3,19.9,14.4,18.7,20.7)
The result of the comparison should be the following.
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
I tried to do it with
as.integer(cummin(data)==data)
and I get
1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
The first element is easy to fix, but what can I do about the other 1 at position 10?
A possible approach:
v <- rank(data, ties.method = "first")
out <- as.integer(cummin(v)==v)
# [1] 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Taking care of the first element:
out[1] <- 0
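Try this (for i = 1 the index 1 : (i - 1) is c(1, 0), so the first element is compared with itself and yields 0):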
sapply(1 : length(data), FUN = function(i) all(data[i] < data[1 : (i - 1)]) * 1)
#[1] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
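The same comparison can also be written without ranking or an explicit loop, by comparing each element with the running minimum of the elements before it. This is a sketch of that idea; prev_min and out are illustrative names, and the first element is given 0 because it has no predecessors:

```r
data <- c(10.3, 14.3, 7.7, 15.8, 14.4, 16.7, 15.3, 20.2,
          17.1, 7.7, 15.3, 16.3, 19.9, 14.4, 18.7, 20.7)

# Minimum of all elements strictly before each position (NA for the first)
prev_min <- c(NA, cummin(data)[-length(data)])

# 1 where the element is strictly below every earlier element, else 0
out <- as.integer(!is.na(prev_min) & data < prev_min)
```

The strict `<` is what keeps the repeated 7.7 at position 10 from being flagged, since it is not smaller than the earlier 7.7.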

For loop in a data table with dependencies in R?

I want to update the following data table:
DT1 (the data table whose column values I need to edit based on my input):
BIC BCC1 BCC2 BCC6 BCC8 BCC9 BCC10 BCC11
990081899A 0 1 0 0 0 0 0
9900023620 0 1 1 0 0 0 0
9900427160 0 1 0 0 0 0 0
990064457TA 1 1 0 0 0 0 0
990066595A 0 0 0 1 0 0 1
990088248A 0 0 0 1 0 0 1
990088882C1 0 0 0 1 0 0 1
990088882C2 0 0 0 1 1 0 0
990088882C3 0 0 0 1 1 0 1
990088882C4 0 0 0 0 1 0 1
990088882C5 0 0 0 0 1 0 1
I want to loop through the column names of DT1, except the first, and check whether any of them equals my input. If one does, I check every row of that column for the value 1; wherever it is 1, some of the other columns in that row should be set to 0.
I am doing this now:
Hierarchy <- function(Dt, cc, Hier){
  Dt_cols<-setdiff(names(Dt), "HIC")
  Dt_rows<-1:nrow(Dt)
  for(j in 1:length(Dt_cols)){
    if(Dt_cols[j] == cc){
      for(i in 1:length(Dt_rows)){
        if(eval(parse(text = paste("Dt[",i,",",eval(Dt_cols[j]),"]"))) == 1){
          for(k in 1:length(Hier)){
            if(Dt_cols[j] == Hier[k]){
              hierVar<-as.character(eval(Dt_cols[j]));
              Dt[i,hierVar]<- 0
            }
          }
        }
      }
    }
  }
  return(Dt)
}
If I pass the following arguments to this function:
Hierarchy(DT1,"BCC8", c("BCC9","BCC10","BCC11","BCC12"))
Result should be:
BIC BCC1 BCC2 BCC6 BCC8 BCC9 BCC10 BCC11
990081899A 0 1 0 0 0 0 0
9900023620 0 1 1 0 0 0 0
9900427160 0 1 0 0 0 0 0
990064457TA 1 1 0 0 0 0 0
990066595A 0 0 0 1 0 0 0
990088248A 0 0 0 1 0 0 0
990088882C1 0 0 0 1 0 0 0
990088882C2 0 0 0 1 0 0 0
990088882C3 0 0 0 1 0 0 0
990088882C4 0 0 0 0 1 0 0
990088882C5 0 0 0 0 1 0 0
But this function is not working properly, and I have not been able to find another or a correct way of doing it. Any suggestions are appreciated. Thanks!
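A vectorised sketch of the rule as stated in prose (in every row where column cc equals 1, set the columns listed in Hier to 0) might look like the following. It assumes a plain data.frame; apply_hierarchy is a hypothetical name, and the four-row DT1 is a stand-in copied from rows of the table above. Hierarchy members that are not columns of the data, such as BCC12, are simply skipped:

```r
# Hypothetical helper (not from the original post): zero the columns listed
# in `hier` for every row in which column `cc` equals 1.
apply_hierarchy <- function(dt, cc, hier) {
  targets <- intersect(hier, names(dt))   # ignore names that are not columns
  if (cc %in% names(dt) && length(targets) > 0) {
    rows <- which(dt[[cc]] == 1)          # rows where the parent column is 1
    if (length(rows) > 0) {
      dt[rows, targets] <- 0
    }
  }
  dt
}

# Four-row stand-in for DT1 (rows 990064457TA, 990066595A, 990088882C3, 990088882C4)
DT1 <- data.frame(
  BIC   = c("990064457TA", "990066595A", "990088882C3", "990088882C4"),
  BCC8  = c(0, 1, 1, 0),
  BCC9  = c(0, 0, 1, 1),
  BCC10 = c(0, 0, 0, 0),
  BCC11 = c(0, 1, 1, 1)
)

res <- apply_hierarchy(DT1, "BCC8", c("BCC9", "BCC10", "BCC11", "BCC12"))
```

Under this rule a row with BCC8 equal to 0 is left untouched. The last two rows of the expected table above have BCC11 set to 0 although BCC8 is 0, so reproducing them exactly would need an additional rule that the question does not spell out.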
