recode values into one column - r

I have a data frame in which each row contains a single 1, in one of several columns. How can I create a single column that contains the number of the column the 1 is in? I would like to do this using dplyr, but the only methods I can think of involve for loops, which seems very un-R-like.
df<-data.frame(
a=c(1,0,0,0),
b=c(0,1,1,0),
c=c(0,0,0,1)
)
a b c
1 1 0 0
2 0 1 0
3 0 1 0
4 0 0 1
GOAL:
1 1
2 2
3 2
4 3

There is no need for dplyr here. This is what max.col() is for. Since all the other values in the row are zero, max.col() gives us the column number where the 1 appears.
max.col(df)
# [1] 1 2 2 3
If you need a column, then
data.frame(x = max.col(df))
# x
# 1 1
# 2 2
# 3 2
# 4 3
Or cbind() or matrix() for a matrix.
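As a minimal sketch building on the above: max.col() breaks ties at random by default, so if a row could ever contain more than one maximum (for example an all-zero row), passing ties.method = "first" keeps the result reproducible. The column names col and name below are only illustrative.

```r
df <- data.frame(
  a = c(1, 0, 0, 0),
  b = c(0, 1, 1, 0),
  c = c(0, 0, 0, 1)
)

# Column index of the 1 in each row; "first" makes ties deterministic
col_idx <- max.col(df, ties.method = "first")

# Store the index and, optionally, the matching column name
df$col  <- col_idx
df$name <- c("a", "b", "c")[col_idx]
df
```

The name column is optional; it is just a lookup of the index in the vector of original column names.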

We could also do
as.matrix(df) %*%seq_along(df)
# [,1]
#[1,] 1
#[2,] 2
#[3,] 2
#[4,] 3

which(df == 1, arr.ind = TRUE)
# row col
# [1,] 1 1
# [2,] 2 2
# [3,] 3 2
# [4,] 4 3
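One thing to keep in mind with which(): it walks the data column by column, so the rows only come out in order here because the 1s happen to move from left to right. A small sketch on a hypothetical data frame df2 (not from the question) shows how to reorder by row when that is not the case.

```r
# Hypothetical data where the 1s are not in left-to-right order
df2 <- data.frame(a = c(0, 1), b = c(1, 0))

# Matches are reported column by column: (row 2, col 1) comes before (row 1, col 2)
idx <- which(df2 == 1, arr.ind = TRUE)

# Reorder by row so the result lines up with the rows of df2
cols <- unname(idx[order(idx[, "row"]), "col"])
cols
```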

Related

Add dynamic subset conditions as variables into the data.frame

I have a data frame like
> x = data.frame(A=c(1,2,3),B=c(2,3,4))
> x
A B
1 1 2
2 2 3
3 3 4
and subsetting conditions in a data frame like
> cond = data.frame(condition=c('A>1','B>2 & B<4'))
> cond
condition
1 A>1
2 B>2 & B<4
which I then apply dynamically
> eval(parse(text=paste0("subset(x,",cond[1,'condition'],")")))
A B
2 2 3
3 3 4
> eval(parse(text=paste0("subset(x,",cond[2,'condition'],")")))
A B
2 2 3
Now, instead of subsetting, I would like to add the subsetting conditions as variables into the data. The end result would look like
A B condition1 condition2
1 1 2 0 0
2 2 3 1 1
3 3 4 1 0
How could I derive the above table using the dynamic conditions?
Before using eval(parse()), I hope you have read about its risks, for example
What specifically are the dangers of eval(parse(…))?
and the many other discussions available.
However, to answer your question, we can keep your approach and use eval(parse()) inside sapply
+(sapply(seq_len(nrow(cond)), function(i)
eval(parse(text=paste0("with(x,",cond[i,'condition'],")")))))
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 1 0
To add it to the dataframe,
x[paste0("condition", 1:nrow(cond))] <-
+(sapply(seq_len(nrow(cond)), function(i)
eval(parse(text=paste0("with(x,",cond[i,'condition'],")")))))
x
# A B condition1 condition2
#1 1 2 0 0
#2 2 3 1 1
#3 3 4 1 0
Simplifying it a bit (using @jogo's comment)
+(sapply(cond$condition, function(i) with(x, eval(parse(text=as.character(i))))))
# [,1] [,2]
#[1,] 0 0
#[2,] 1 1
#[3,] 1 0
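A variation on the same idea, as a sketch only: eval() accepts a data frame as its envir argument, so each condition can be evaluated directly against x without pasting a with() call together. The stringsAsFactors = FALSE argument and the name flags are my additions; the same caveats about eval(parse()) on untrusted strings still apply.

```r
x <- data.frame(A = c(1, 2, 3), B = c(2, 3, 4))
cond <- data.frame(condition = c("A>1", "B>2 & B<4"),
                   stringsAsFactors = FALSE)  # keep conditions as character

# Evaluate each condition with the columns of x in scope; one 0/1 column per condition
flags <- vapply(
  cond$condition,
  function(s) as.integer(eval(parse(text = s), envir = x)),
  integer(nrow(x))
)
colnames(flags) <- paste0("condition", seq_len(ncol(flags)))

cbind(x, flags)
```

Using vapply() instead of sapply() guarantees a matrix with one row per row of x, even if cond has only a single condition.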
Here is an option using tidyverse
library(tidyverse)
x %>%
mutate(!!! rlang::parse_exprs(str_c(cond$condition, collapse=";"))) %>%
rename_at(3:4, ~ paste0("condition", 1:2))
# A B condition1 condition2
#1 1 2 FALSE FALSE
#2 2 3 TRUE TRUE
#3 3 4 TRUE FALSE
If needed, the logical columns can easily be converted to binary with as.integer.

R How to reshape matrix using value in a column

I have a matrix like this:
ID Count
1 2
2 3
3 2
I want to create a matrix in which the number of rows for an ID equals the value of Count while adding a new column containing the index for each row within the ID value. For the matrix above, the result should be:
ID Index
1 1
1 2
2 1
2 2
2 3
3 1
3 2
For a simple case you can just use rep and sequence.
ID=c(1,2,3)
Count=c(2,3,2)
cbind(ID=rep(ID, Count), Index=sequence(Count))
# ID Index
#[1,] 1 1
#[2,] 1 2
#[3,] 2 1
#[4,] 2 2
#[5,] 2 3
#[6,] 3 1
#[7,] 3 2
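If the data really is stored as a matrix, as in the question, the same rep/sequence idea can be applied to its columns. This is a sketch that assumes the matrix is called m and has columns named ID and Count.

```r
# Assumed input: a numeric matrix with columns ID and Count
m <- cbind(ID = c(1, 2, 3), Count = c(2, 3, 2))

out <- cbind(
  ID    = rep(m[, "ID"], times = m[, "Count"]),  # repeat each ID Count times
  Index = sequence(m[, "Count"])                 # 1..Count within each ID
)
out
```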
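Using tidyverse
library(tidyverse)
# df is assumed to hold the question's data
df <- tibble(ID = c(1, 2, 3), Count = c(2, 3, 2))
df1 <- df %>%
  group_by(ID) %>%
  nest() %>%
  # one index per unit of Count within each ID
  mutate(data = map(data, ~ seq_len(.x$Count))) %>%
  unnest(data)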
Output
ID data
1 1 1
2 1 2
3 2 1
4 2 2
5 2 3
6 3 1
7 3 2

Removing duplicate adjacent value in a matrix R

I have a text file with 30 rows and around 1000 columns. The data layout when I use read.table and View() is shown below. I have tried many of the methods suggested for removing adjacent duplicate values from a data.frame, but none of them work in my case.
1 1 1 2 2 2 3 3 3 3 2 2
2 2 2 2 2 2 2 2 2 2 2 2
My expected output would be something like this:
1 2 3 2
2
After filtering out the duplicates, I will write the result back into a new matrix.
You can use rle. It "[c]ompute[s] the lengths and values of runs of equal values in a vector".
DF <- read.table(text = "1 1 1 2 2 2 3 3 3 3 2 2
2 2 2 2 2 2 2 2 2 2 2 2")
x <- apply(DF, 1, function(x) unname(rle(x)$values))
do.call(rbind, lapply(x, `length<-`, max(lengths(x))))
# [,1] [,2] [,3] [,4]
#[1,] 1 2 3 2
#[2,] 2 NA NA NA
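To see what rle is doing for a single row, here is a small sketch on the first row of the example data. The values component is exactly the row with adjacent duplicates collapsed, and lengths says how long each run was.

```r
row1 <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 2, 2)

r <- rle(row1)
r$values   # 1 2 3 2  (one entry per run)
r$lengths  # 3 3 4 2  (how many times each value was repeated)

# inverse.rle() rebuilds the original vector from the runs
identical(inverse.rle(r), row1)
```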

R merging with a preference

Suppose you have a matrix that consists of two columns of only 1's and 2's.
A B
1 2
2 2
1 1
2 1
2 1
2 2
2 1
How would you merge these two columns into one so that 2 always overwrites 1?
Desired Output:
C
2
2
1
2
2
2
2
Assuming that the data is stored in a dataframe named df, you can use
df$C <- pmax(df$A, df$B)
to create a new column C with the desired result.
In the case of a matrix m you can use
m <- cbind(m, pmax(m[,1], m[,2]))
colnames(m) <- LETTERS[1:ncol(m)]
#> m
# A B C
#[1,] 1 2 2
#[2,] 2 2 2
#[3,] 1 1 1
#[4,] 2 1 2
#[5,] 2 1 2
#[6,] 2 2 2
#[7,] 2 1 2
#> class(m)
#[1] "matrix"
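If there are more than two columns to merge, the same idea extends by handing all of them to pmax at once. A minimal sketch, assuming the data frame from the question; do.call passes each selected column as a separate argument to pmax.

```r
df <- data.frame(
  A = c(1, 2, 1, 2, 2, 2, 2),
  B = c(2, 2, 1, 1, 1, 2, 1)
)

# Element-wise maximum across all selected columns
df$C <- do.call(pmax, df[c("A", "B")])
df
```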
Without ifelse:
df$C1 <- apply(df[, c("A", "B")], 1, max)
With ifelse:
df$C2 <- with(df, ifelse(A==1&B==1,1,2))
Result
> df
A B C1 C2
1 1 2 2 2
2 2 2 2 2
3 1 1 1 1
4 2 1 2 2
5 2 1 2 2
6 2 2 2 2
7 2 1 2 2

Build summary table from matrix

I have this matrix
mdat <- matrix(c(0,1,1,1,0,0,1,1,0,1,1,1,1,0,1,1,1,1,0,1), nrow = 4, ncol = 5, byrow = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 1 1 1 0
[2,] 0 1 1 0 1
[3,] 1 1 1 0 1
[4,] 1 1 1 0 1
and I'm trying to build T:
T1 T2 T3
row1 1 2 4
row2 2 2 3
row3 2 5 5
row4 3 1 3
row5 3 5 5
row6 4 1 3
row7 4 5 5
where for each row in mdat:
T1 shows mdat row number
T2 shows mdat column where there's the first 1
T3 shows mdat column where there's the last consecutive 1.
Therefore
row1 in T is [1 2 4] because for row 1 in mdat the first 1 is in column 2 and the last consecutive 1 is in column 4.
row2 in T is [2 2 3] because for row 2 in mdat the first 1 is in column 2 and the last consecutive 1 is in column 3.
This is my try:
for (i in 1:4){
for (j in 1:5) {
if (mdat[i,j]==1) {T[i,1]<-i;T[i,2]<-j;
cont<-0;
while (mdat[i,j+cont]==1){
cont<-cont+1;
T[i,3]<-cont}
}
}
}
Here's a strategy using apply/rle as Richard suggested.
xx<-apply(mdat, 1, function(x) {
r <- rle(x)
w <- which(r$values==1)
l <- r$lengths[w]
s <- cumsum(c(0,r$lengths))[w]+1
cbind(start=s,stop=s+l-1)
})
do.call(rbind, Map(cbind, row=seq_along(xx), xx))
We start by finding the runs of 1 on each row using the "values" property of the rle and we calculate their start and stop positions using the "lengths" property. We turn this data into a list of two column matrices with one list item per row of the original matrix.
Now we use Map to add the row number back onto each matrix and then we rbind all the results. That seems to give you the data you're after:
row start stop
[1,] 1 2 4
[2,] 2 2 3
[3,] 2 5 5
[4,] 3 1 3
[5,] 3 5 5
[6,] 4 1 3
[7,] 4 5 5
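To make the start/stop arithmetic concrete, here is the same calculation as a sketch on row 2 of mdat alone. The variable names first and last are only illustrative.

```r
x <- c(0, 1, 1, 0, 1)             # row 2 of mdat

r <- rle(x)                       # values: 0 1 0 1, lengths: 1 2 1 1
w <- which(r$values == 1)         # runs 2 and 4 are runs of 1s

# cumsum(c(0, lengths)) gives the position just before each run starts
first <- cumsum(c(0, r$lengths))[w] + 1
last  <- first + r$lengths[w] - 1

cbind(start = first, stop = last)
```

This yields the pairs (2, 3) and (5, 5), which are the two entries for row 2 in the table above.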
Try the Bioconductor IRanges package:
library(IRanges)
r <- unlist(slice(split(Rle(mdat), row(mdat)), 1, rangesOnly=TRUE))
r
IRanges of length 7
start end width names
[1] 2 4 3 1
[2] 2 3 2 2
[3] 5 5 1 2
[4] 1 3 3 3
[5] 5 5 1 3
[6] 1 3 3 4
[7] 5 5 1 4
