Merging two columns with two values - r

I have two columns whose names I know and whose values are all 0 or 1.
I would like to merge them into one column: if a row contains a 1 in either column, take the 1, and if both columns contain 1, keep 1.
Example of data:
stockI stockII
1 0
1 0
0 0
0 0
0 0
0 0
0 0
1 0
0 0
1 1
The output I would expect:
stockI/stockII
0
1
0
0
0
0
0
0
0
1
Is there a cbind method to do this?

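We can try the following, which returns 1 where stockI is 1 in both the current and the previous row, or where both columns are 1: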
as.integer(with(df1, (c(FALSE,stockI[-1] &
stockI[-nrow(df1)]) & stockI) | (stockI & stockII)))
#[1] 0 1 0 0 0 0 0 0 0 1
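If the goal is simply the row-wise OR that the question text describes, a minimal sketch is shown here. Note that it returns 1 for every row where either column is 1, which differs from the expected output listed in the question. The data frame df1 is rebuilt from the example data, and the names merged and merged_max are only illustrative.

```r
# Example data copied from the question
df1 <- data.frame(
  stockI  = c(1, 1, 0, 0, 0, 0, 0, 1, 0, 1),
  stockII = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Logical OR of the two columns, converted back to 0/1
merged <- as.integer(df1$stockI | df1$stockII)

# Equivalent for 0/1 data: element-wise maximum
merged_max <- pmax(df1$stockI, df1$stockII)

print(merged)
# [1] 1 1 0 0 0 0 0 1 0 1
```

Either result can be attached with df1$merged <- merged or cbind(df1, merged).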

Related

Creating a repeated sequence of zero and ones with uneven "breaks" between

I am trying to create a sequence consisting of 1s and 0s using RStudio.
My desired output is a sequence that first has five 1s, then six 0s, followed by four 1s, then six 0s. This whole pattern should then repeat until the end of a given vector.
The result should be like this:
1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 .....
Hope someone has a good solution, and sorry if I have some grammar mistakes
Best,
HB
rep(c(rep(1,5),rep(0,6),rep(1,4),rep(0,6)),n)
This repeats your pattern n times.
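To fill a vector of a given length rather than repeat the pattern a whole number of times, one option is the length.out argument of rep(). This is a sketch only: v is a hypothetical target vector of length 50, and the names pattern and out are illustrative.

```r
v <- seq_len(50)  # hypothetical vector whose length we want to match

# One full cycle: five 1s, six 0s, four 1s, six 0s
pattern <- rep(c(1, 0, 1, 0), times = c(5, 6, 4, 6))

# Recycle the cycle and cut it off at length(v)
out <- rep(pattern, length.out = length(v))

print(out)
```

The last cycle is truncated wherever the target length happens to fall.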
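You could also use Map. Note that this version shortens the run of 1s by one on each pass (from length(v) down to 1) rather than alternating between five and four: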
unlist(Map(function(x, ...) c(rep(x, ...), rep(0, 6)), 1, times=length(v):1))
# [1] 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0
Instead of length(v):1 you may also use rev(seq(v)) but it's slower.
Data
v <- c("Vector", "of", "specific", "length", "five")

Count number of unique instances in a column depending on values in other columns

I've got the following table (which is called train) (in reality much bigger)
UNSPSC adaptor alert bact blood collection packet patient ultrasoft whit
514415 0 0 0 0 0 0 0 1 0
514415 0 0 0 1 0 0 0 1 0
514415 0 0 1 0 0 0 0 1 0
514415 0 0 0 0 0 0 0 1 0
514415 0 0 0 0 0 0 0 1 0
514415 0 0 0 0 0 0 0 1 0
422018 0 0 0 0 0 0 0 1 0
422018 0 0 0 0 0 0 0 1 0
422018 0 0 0 1 0 0 0 1 0
411011 0 0 0 0 0 0 0 1 0
I want to calculate the number of unique UNSPSC per column where the value is equal to 1. So for column blood it will be 2 and for column ultrasoft will be 3.
I'm doing this but don't know how to continue:
apply(train[,-1], 2, ......)
I'm trying not to use loops.
To continue from where you left off, we can use apply with margin=2 and calculate the number of unique values of "UNSPSC" for each column.
apply(train[-1], 2, function(x) length(unique(train$UNSPSC[x==1])))
#   adaptor      alert       bact      blood collection     packet
#         0          0          1          2          0          0
#   patient  ultrasoft       whit
#         0          3          0
A better option is sapply/lapply, which gives the same result but, unlike apply, does not convert the data frame into a matrix.
sapply(train[-1], function(x) length(unique(train$UNSPSC[x==1])))
If your columns contain only 0 and 1, as in the example, and you want the number of rows equal to 1 in each column (rather than the number of distinct UNSPSC values), you can use colSums:
colSums(train[,-1]) # drop non-indicator columns such as UNSPSC first
#   adaptor      alert       bact      blood collection     packet    patient
#         0          0          1          2          0          0          0
# ultrasoft       whit
#        10          0
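The difference between the two approaches can be checked on the example table. The sketch below rebuilds train from the question's data (columns that are constant are filled by recycling a single value) and uses vapply, a type-checked variant of sapply; the names unique_counts and row_counts are illustrative.

```r
# Example table from the question
train <- data.frame(
  UNSPSC     = c(rep(514415, 6), rep(422018, 3), 411011),
  adaptor    = 0,
  alert      = 0,
  bact       = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  blood      = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0),
  collection = 0,
  packet     = 0,
  patient    = 0,
  ultrasoft  = 1,
  whit       = 0
)

# Number of distinct UNSPSC values among the rows where each column equals 1
unique_counts <- vapply(
  train[-1],
  function(x) length(unique(train$UNSPSC[x == 1])),
  integer(1)
)

# Number of rows where each column equals 1 (what colSums reports)
row_counts <- colSums(train[-1])

print(unique_counts[c("bact", "blood", "ultrasoft")])
print(row_counts[c("bact", "blood", "ultrasoft")])
```

For ultrasoft the first count is 3 (three distinct UNSPSC codes) while the second is 10 (ten rows), so only the first answers the question as asked.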

Permutation position of numbers in R

I'm looking for a function in R which can do the permutation. For example, I have a vector with five 1 and ten 0 like this:
> status=c(rep(1,5),rep(0,10))
> status
[1] 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
Now I'd like to randomly permute the positions of these numbers while keeping the same number of 0s and 1s in the vector, to get a new series such as:
1 1 0 1 0 1 0 0 0 0 0 1 0 0 0
or
1 0 0 0 0 0 0 1 1 0 0 1 0 1 0
I found that the function sample() can draw samples, but the number of 1s and 0s is not the same each time. Do you know how I can do this in R? Thanks in advance.
We can use sample
sample(status)
#[1] 1 0 0 1 0 0 1 0 0 0 0 1 0 1 0
sample(status)
#[1] 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0
If we call sample on the entire vector without a size argument, it returns a random permutation (sampling without replacement), so the frequency of each unique element stays the same:
colSums(replicate(5, sample(status)))
#[1] 5 5 5 5 5
i.e. we get five 1s in every sample, so the remaining ten values are 0s.
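A likely reason the counts varied in the question is sampling from the values 0 and 1 with replacement instead of permuting the existing vector. The sketch below contrasts the two; the seed is arbitrary, and the names perm and draws are illustrative.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

status <- c(rep(1, 5), rep(0, 10))

# Permutation: every element is used exactly once, so counts are preserved
perm <- sample(status)
print(table(perm))

# Independent draws with replacement: the number of 1s can differ each time
draws <- sample(c(0, 1), length(status), replace = TRUE)
print(table(draws))
```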

Loop through two data tables from column to row wise?

I have two data frames:
DT1 (I need to edit this data frame's column values based on another data table, DT2):
BIC BCC1 BCC2 BCC6 BCC8 BCC9 BCC10 BCC11
990081899A 0 1 0 0 0 0 0
9900023620 0 1 1 0 0 0 0
9900427160 0 1 0 1 0 0 0
990064457TA 1 1 0 1 0 0 0
990066595A 0 0 0 0 0 0 1
990088248A 0 0 0 0 0 0 1
990088882C1 0 0 0 0 0 0 1
990088882C2 0 0 0 1 1 0 0
990088882C3 0 0 0 1 1 0 0
990088882C4 0 0 0 1 1 0 0
990088882C5 0 0 0 1 1 0 0
DT2:
BCC HIER1 HIER2 HIER3 HIER4 HIER5
BCC8 BCC9 BCC10 BCC11 BCC12 0
BCC9 BCC10 BCC11 BCC12 0 0
BCC10 BCC11 BCC12 0 0 0
BCC11 BCC12 0 0 0 0
BCC17 BCC18 BCC19 0 0 0
BCC18 BCC19 0 0 0 0
BCC27 BCC28 BCC29 BCC80 0 0
BCC28 BCC29 0 0 0 0
BCC46 BCC48 0 0 0 0
BCC54 BCC55 0 0 0 0
BCC57 BCC58 0 0 0 0
BCC70 BCC71 BCC72 BCC103 BCC104 BCC169
I want to look up the column names of DT1 in the first column of DT2 (DT2$BCC) and apply the hierarchy logic as follows:
I want to loop through the DT1 column names (except the first) and, in a nested loop, through the DT2$BCC values to check whether they are equal. If they are, take that DT2$BCC value and check whether the corresponding DT1 column equals 1; if so, set to 0 the DT1 columns named in that row's HIER1, HIER2, HIER3, ... entries.
Result should be:
BIC BCC1 BCC2 BCC6 BCC8 BCC9 BCC10 BCC11
990081899A 0 1 0 0 0 0 0
9900023620 0 1 1 0 0 0 0
9900427160 0 1 0 1 0 0 0
990064457TA 1 1 0 1 0 0 0
990066595A 0 0 0 0 0 0 0
990088248A 0 0 0 0 0 0 0
990088882C1 0 0 0 0 0 0 0
990088882C2 0 0 0 1 0 0 0
990088882C3 0 0 0 1 0 0 0
990088882C4 0 0 0 1 0 0 0
990088882C5 0 0 0 1 0 0 0
I am doing this now:
cols<-setdiff(names(DT1), "HIC")
subs<-as.character(DT2$BCC)
colsHier<-setdiff(names(DT2), "BCC")
paste0("DT1$", eval(cols[i]))<-
for( i in 1:length(cols)){
for (k in 1:length(subs)){
ifelse(cols[i] == subs[k],
ifelse(do.call(paste0, list('DT1$', eval(cols[1]),'[]')) == 1,
for (j in 1:length(colsHeir)){
if(colsHeir[j]!= 0)
x<-paste0('DT2$',eval(colsHier[j]))
paste0('DT1$',eval(x[k])):= 0}
,DT1$cols[i]), DT1$cols[i])}}
I am trying to test whether do.call(paste0, list('DT1$', eval(cols[1]),'[]')) == 1, but when I run this expression in R I get the following:
> do.call(paste0, list('DT1$', eval(cols[2]),'[1]'))
[1] "DT1$BCC2[1]"
and not the value of the cell. How can I access the value of that cell to compare it with 1?
I cannot find the correct way of doing this. Sorry for the long question. Any help is appreciated.
library(reshape2)
Melt both data frames into long format:
DT1.m <- melt(DT1, id = "BIC")
DT2.m <- melt(DT2, id = "BCC")
If a DT1 column name (DT1.m$variable) appears among the hierarchy values in DT2.m, set its value to 0:
DT1.m$value <- ifelse(DT1.m$variable %in% DT2.m$value, 0, DT1.m$value)
Cast the data back into wide format:
DT1.c <- dcast(DT1.m, BIC ~ variable, value.var = "value")
Note that dcast() reorders the rows automatically, so they may not appear in the original order.
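On the specific question of reading a cell by column name: DT1[[cols[2]]][1] returns the value, whereas pasting "DT1$" and a name together only builds a string. The melt-based answer zeroes every column that appears anywhere in the HIER columns; if the parent column must actually equal 1 in the same row, as the description of the hierarchy logic suggests, a base R sketch could look like the following. The small DT1 and DT2 here are made-up illustration data, and the names hier_cols, result, parent, children and rows are assumptions. This stricter reading would leave a lone BCC11 = 1 untouched, unlike the expected result shown in the question.

```r
# Made-up miniature versions of the two tables
DT1 <- data.frame(
  BIC   = c("A", "B", "C"),
  BCC8  = c(1, 0, 0),
  BCC9  = c(1, 0, 1),
  BCC10 = c(0, 0, 1),
  BCC11 = c(0, 1, 1),
  stringsAsFactors = FALSE
)
DT2 <- data.frame(
  BCC   = c("BCC8", "BCC9", "BCC10"),
  HIER1 = c("BCC9", "BCC10", "BCC11"),
  HIER2 = c("BCC10", "BCC11", "0"),
  HIER3 = c("BCC11", "0", "0"),
  stringsAsFactors = FALSE
)

hier_cols <- setdiff(names(DT2), "BCC")
result <- DT1

for (k in seq_len(nrow(DT2))) {
  parent <- DT2$BCC[k]
  if (!parent %in% names(DT1)) next           # parent is not a DT1 column

  # Lower-ranked columns listed for this parent that exist in DT1
  children <- intersect(unlist(DT2[k, hier_cols]), names(DT1))
  if (length(children) == 0) next

  # Rows where the parent is 1 in the original data
  rows <- DT1[[parent]] == 1
  if (any(rows)) result[rows, children] <- 0
}

print(result)
```

The parent check reads from the original DT1 rather than from result, so the outcome does not depend on the order of the rows in DT2.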

How can I calculate an empirical CDF in R?

I'm reading a sparse table from a file which looks like:
1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1
Note row lengths are different.
Each row represents a single simulation. The value in the i-th column in each row says how many times value i-1 was observed in this simulation. For example, in the first simulation (first row), we got a single result with value '0' (first column), 7 results with value '2' (third column) etc.
I wish to create an average cumulative distribution function (CDF) for all the simulation results, so I could later use it to calculate an empirical p-value for true results.
To do this I can first sum up each column, but I need to take zeros for the undef columns.
How do I read such a table with different row lengths? How do I sum up the columns, replacing 'undef' values with 0? And finally, how do I create the CDF? (I can do this manually, but I guess there is some package which can do that.)
This will read the data in:
dat <- textConnection("1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1")
df <- data.frame(scan(dat, fill = TRUE, what = as.list(rep(1, 29))))
names(df) <- paste("Val", 1:29)
close(dat)
Resulting in:
> head(df)
  Val 1 Val 2 Val 3 Val 4 Val 5 Val 6 Val 7 Val 8 Val 9 Val 10 Val 11 Val 12
1     1     0     7     0     0     1     0     0     0      5      0      0
2     1     0     0     1     0     0     0     3     0      0      0      0
3     0     0     0     1     0     0     0     2     0      0      0      0
4     1     0     0     1     0     3     0     0     0      0      1      0
5     0     0     0     1     0     0     0     2     0      0      0      0
....
If the data are in a file, provide the file name instead of dat. This code presumes that there are a maximum of 29 columns, as per the data you supplied. Alter the 29 to suit the real data.
We get the column sums using
df.csum <- colSums(df, na.rm = TRUE)
The ecdf() function then generates the ECDF of those column sums,
df.ecdf <- ecdf(df.csum)
and we can plot it using the plot() method:
plot(df.ecdf, verticals = TRUE)
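Note that ecdf(df.csum) treats the column sums themselves as the sample. If the column sums are instead meant as frequencies of the values 0, 1, 2, ... (column i counting value i-1, as the question describes), one possible sketch of the count-weighted CDF is below. The three short rows are made-up illustration data, padding with zeros stands in for the 'undef' columns, and the names rows, width, counts, totals and cdf are illustrative.

```r
# Made-up ragged input: one string per simulation
lines <- c("1 0 7 0 0 1",
           "1 0 0 1",
           "0 0 0 1 0 0 0 2")

# Split each line into numbers
rows <- lapply(strsplit(lines, " +"), as.numeric)

# Pad every row with zeros up to the longest row
width  <- max(lengths(rows))
counts <- t(vapply(rows,
                   function(r) c(r, rep(0, width - length(r))),
                   numeric(width)))

# Total count per value across simulations, then the cumulative proportion
totals <- colSums(counts)
cdf <- cumsum(totals) / sum(totals)
names(cdf) <- seq_len(width) - 1   # column i holds counts for value i - 1

print(round(cdf, 3))
```

An upper-tail empirical p-value for an observed value x could then be taken as one minus the CDF just below x, depending on the convention you prefer.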
You can use the ecdf() (in base R) or Ecdf() (from the Hmisc package) functions.
