Removing duplicate adjacent values in a matrix in R

I have a text file with 30 rows and around 1000 columns. When I read it with read.table and inspect it with View(), the data look like the sample below. I have tried many of the methods suggested for removing adjacent duplicate values from a data.frame, but none of them work in my case.
1 1 1 2 2 2 3 3 3 3 2 2
2 2 2 2 2 2 2 2 2 2 2 2
My expected output would be something like this:
1 2 3 2
2
After filtering out the duplicates, I will write the result into a new matrix.

You can use rle. It "[c]ompute[s] the lengths and values of runs of equal values in a vector".
DF <- read.table(text = "1 1 1 2 2 2 3 3 3 3 2 2
2 2 2 2 2 2 2 2 2 2 2 2")
x <- apply(DF, 1, function(x) unname(rle(x)$values))
do.call(rbind, lapply(x, `length<-`, max(lengths(x))))
# [,1] [,2] [,3] [,4]
#[1,] 1 2 3 2
#[2,] 2 NA NA NA
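One caveat with the approach above: apply() simplifies its result to a matrix whenever every row happens to yield the same number of runs, and the later lapply() would then iterate over single elements rather than rows. A minimal sketch that avoids this, assuming the same DF as above, builds the list with lapply() directly (the names runs, width and result are only illustrative):

```r
DF <- read.table(text = "1 1 1 2 2 2 3 3 3 3 2 2
2 2 2 2 2 2 2 2 2 2 2 2")

# Collapse runs of equal values row by row.
# lapply() always returns a list, regardless of how long each result is.
runs <- lapply(seq_len(nrow(DF)), function(i) {
  rle(unlist(DF[i, ], use.names = FALSE))$values
})

# Pad every row with NA up to the longest row, then bind into a matrix.
width <- max(lengths(runs))
result <- do.call(rbind, lapply(runs, `length<-`, width))
result
```

The padded matrix can then be written out with write.table(), keeping in mind that the NA entries are padding, not data.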

Related

Convert 3D array to tidy data frame?

I have a 3D array that looks like this:
# Create two vectors
vector1 <- c(1,2,3,4,5,6)
vector2 <- c(10, 11, 12, 13, 14, 15)
# Convert to 3D array
my_array <- array(c(vector1, vector2), dim = c(2,3,2))
print(my_array)
where the output is
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 10 12 14
[2,] 11 13 15
I would like to turn this into a tidy dataset, where there is one row per value and 4 columns for each of the values:
the value itself
dimension 1
dimension 2
dimension 3
so for example, a few rows would be
Value Dimension1(Row) Dimension2(Column) Dimension3(Width)
1 1 1 1
2 2 1 1
...
15 2 3 2
Is there a good way to do this in base R, or with tidyverse tools like tidyr?
We could use reshape2::melt
library(reshape2)
melt(my_array)
-output
Var1 Var2 Var3 value
1 1 1 1 1
2 2 1 1 2
3 1 2 1 3
4 2 2 1 4
5 1 3 1 5
6 2 3 1 6
7 1 1 2 10
8 2 1 2 11
9 1 2 2 12
10 2 2 2 13
11 1 3 2 14
12 2 3 2 15
Or use as.data.frame.table in base R
as.data.frame.table(my_array)
Or may also use
cbind(which(is.finite(my_array), arr.ind = TRUE), value = c(my_array))
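If the column names from the question are wanted directly, a base R sketch along the same lines could use arrayInd() to recover the index of each element. The column names value, row, column and width are assumptions taken from the example table in the question, not something the functions above produce:

```r
# Same 2 x 3 x 2 array as in the question.
my_array <- array(c(1:6, 10:15), dim = c(2, 3, 2))

# arrayInd() maps each linear position to its (row, column, slice) index.
idx <- arrayInd(seq_along(my_array), dim(my_array))

# One row per value, with the three dimension indices alongside it.
tidy <- data.frame(
  value  = as.vector(my_array),
  row    = idx[, 1],
  column = idx[, 2],
  width  = idx[, 3]
)
tidy
```

Unlike which(is.finite(...)), this keeps every cell, including NA or infinite values.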

How to calculate number of specific values in a data frame in R? [duplicate]

I have a dataframe df:
a b c
1 5 5
2 3 5
3 3 5
3 3 3
3 3 2
4 2 2
1 2 2
I want to count how many 3s there are in each row. How can I do it?
For example, row 2 = 1, row 3 = 2, etc.
Please advise.
The answer by @ManuelBickel is good if you want to count all of the values. If you really just want to know how many 3s there are, this might be simpler.
rowSums(data==3)
[1] 0 1 2 3
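The output above is for the shortened four-row data used in the other answer. As a small sketch on the full data frame from the question (rebuilt here as df, with a hypothetical result column n_threes), the same idea looks like this:

```r
# The full seven-row data frame from the question.
df <- data.frame(
  a = c(1, 2, 3, 3, 3, 4, 1),
  b = c(5, 3, 3, 3, 3, 2, 2),
  c = c(5, 5, 5, 3, 2, 2, 2)
)

# df == 3 is a logical matrix; rowSums() counts the TRUEs in each row.
# na.rm = TRUE keeps missing cells from turning a whole row count into NA.
n_threes <- rowSums(df == 3, na.rm = TRUE)
n_threes
```

Using colSums() instead would give the count per column.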
If you want the counts returned in a more ordered fashion
set.seed(1)
m <- matrix(sample(c(1:3, 5), 15, replace=TRUE), 5, dimnames=list(LETTERS[1:5]))
m
# [,1] [,2] [,3]
# A 2 5 1
# B 2 5 1
# C 3 3 3
# D 5 3 2
# E 1 1 5
u <- sort(unique(as.vector(m)))
r <- sapply(setNames(u, u), function(x) rowSums(m == x))
r
# 1 2 3 5
# A 1 1 0 1
# B 1 1 0 1
# C 0 0 3 0
# D 0 1 1 1
# E 2 0 0 1
You can use apply and table for this. The output is a list giving you the counts of unique elements per row. (If this is of interest, setting the MARGIN of apply to 2 would give you the output per column.)
Update: Since others have provided solutions producing more "ordered" output in the meanwhile, I have amended my approach by using data.table::rbindlist for this purpose.
#I have skipped some of the last rows of your example
data <- read.table(text = "
a b c
1 5 5
2 3 5
3 3 5
3 3 3
", header = T, stringsAsFactors = F)
apply(data, 1, table)
# [[1]]
# 1 5
# 1 2
# [[2]]
# 2 3 5
# 1 1 1
# [[3]]
# 3 5
# 2 1
# [[4]]
# 3
# 3
#Update: output in more ordered fashion
library(data.table)
rbindlist(apply(data, 1, function(x) as.data.table(t(as.matrix(table(x)))))
,fill = TRUE
,use.names = TRUE)
# 1 5 2 3
# 1: 1 2 NA NA
# 2: NA 1 1 1
# 3: NA 1 NA 2
# 4: NA NA NA 3
#if necessary NA values might be replaced, see, e.g.,
##https://stackoverflow.com/questions/7235657/fastest-way-to-replace-nas-in-a-large-data-table

split dataframe cumulatively by variable level

With a df like this:
x=data.frame(id=c(1,1,1,2,2,2,3,3,3), val=c(1,2,3,2,3,4,1,3,0))
I want to get output like this:
[[1]]
id val
1 1 1
2 1 2
3 1 3
[[2]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
[[3]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
7 3 1
8 3 3
9 3 0
where the df is split into a list of as many dataframes as there are levels of the splitting variable, i.e. id. Each dataframe should start at the first level and include all rows up to each successive level.
I can do this with a loop:
out<-NULL
for(i in 1:3){
out[[i]] <- x[x$id<=i,]
}
out
However, is there a simpler method using e.g. split that I am overlooking? Ideally a one liner.
You can do this in base R with split and Reduce, using the accumulate=TRUE argument. split splits the data.frame into a list of data.frames by id. Reduce applies rbind across the list elements, and accumulate=TRUE makes it return every intermediate result, so the data.frames are combined successively.
Reduce(rbind, split(x, x$id), accumulate=TRUE)
[[1]]
id val
1 1 1
2 1 2
3 1 3
[[2]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
[[3]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
7 3 1
8 3 3
9 3 0
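For comparison, the loop from the question can also be written with lapply(). This is only a sketch, and it differs from the loop in one assumption: it takes the levels in order of first appearance via unique(), so it does not rely on the ids being the integers 1:3 (ids and out are illustrative names):

```r
x <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                val = c(1, 2, 3, 2, 3, 4, 1, 3, 0))

# Levels of the splitting variable, in order of first appearance.
ids <- unique(x$id)

# For the k-th level, keep all rows belonging to the first k levels.
out <- lapply(seq_along(ids), function(k) x[x$id %in% ids[seq_len(k)], ])
out
```

Subsetting the original data frame repeatedly avoids the growing rbind() calls, which may matter for large inputs.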

recode values into one column

I have a dataframe with one value per row, potentially in one of several columns. How can I create a single column that contains the column number the 1 is in? I would like to do this using dplyr, but the only methods I can think of involve for loops, which does not seem very R-like.
df<-data.frame(
a=c(1,0,0,0),
b=c(0,1,1,0),
c=c(0,0,0,1)
)
a b c
1 1 0 0
2 0 1 0
3 0 1 0
4 0 0 1
GOAL:
1 1
2 2
3 2
4 3
There is no need for dplyr here. This is what max.col() is for. Since all the other values in the row will be zero, then max.col() will give us the column number where the 1 appears.
max.col(df)
# [1] 1 2 2 3
If you need a column, then
data.frame(x = max.col(df))
# x
# 1 1
# 2 2
# 3 2
# 4 3
Or cbind() or matrix() for a matrix.
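Note that max.col() breaks ties at random by default, which only matters if a row could contain more than one maximum (for example, a row of all zeros). A hedged sketch that makes the result deterministic and also looks up the column name; the names pos, position and name are illustrative:

```r
df <- data.frame(
  a = c(1, 0, 0, 0),
  b = c(0, 1, 1, 0),
  c = c(0, 0, 0, 1)
)

# ties.method = "first" picks the leftmost maximum instead of a random one.
pos <- max.col(df, ties.method = "first")

# Look up the matching column names before adding new columns to df.
nm <- names(df)[pos]

df$position <- pos
df$name <- nm
df
```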
We could also do
as.matrix(df) %*%seq_along(df)
# [,1]
#[1,] 1
#[2,] 2
#[3,] 2
#[4,] 3
or, to get both the row and the column index of each 1,
which(df==1, arr.ind=TRUE)
# row col
# [1,] 1 1
# [2,] 2 2
# [3,] 3 2
# [4,] 4 3

R merging with a preference

Suppose you have a matrix that consists of two columns of only 1's and 2's.
A B
1 2
2 2
1 1
2 1
2 1
2 2
2 1
How would you merge these two columns into one so that 2 always overwrites 1?
Desired Output:
C
2
2
1
2
2
2
2
Assuming that the data is stored in a dataframe named df, you can use
df$C <- pmax(df$A, df$B)
to create a new column C with the desired result.
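As a self-contained sketch using the values from the question, pmax() takes the element-wise maximum, so a 2 in either column wins. The do.call() variant is an assumption about how one might extend this to more than two columns:

```r
df <- data.frame(
  A = c(1, 2, 1, 2, 2, 2, 2),
  B = c(2, 2, 1, 1, 1, 2, 1)
)

# Element-wise maximum of the two columns.
df$C <- pmax(df$A, df$B)

# The same result for an arbitrary set of columns: pass them as a list.
C_all <- do.call(pmax, unname(as.list(df[c("A", "B")])))

df
```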
In the case of a matrix m you can use
m <- cbind(m, pmax(m[,1], m[,2]))
colnames(m) <- LETTERS[1:ncol(m)]
#> m
# A B C
#[1,] 1 2 2
#[2,] 2 2 2
#[3,] 1 1 1
#[4,] 2 1 2
#[5,] 2 1 2
#[6,] 2 2 2
#[7,] 2 1 2
#> class(m)
#[1] "matrix"
Without ifelse:
df$C1 <- apply(df[,c("A","B")],1,max)
With ifelse:
df$C2 <- with(df, ifelse(A==1&B==1,1,2))
Result
> df
A B C1 C2
1 1 2 2 2
2 2 2 2 2
3 1 1 1 1
4 2 1 2 2
5 2 1 2 2
6 2 2 2 2
7 2 1 2 2
