Count equal elements in R - r

I have a vector x:
x <- c(1, 2, 3, 1, 4, 5, 6, 2, 3, 2, 3, 8)
How can I count the equal elements in x? I want the returned result to be 3.
Explanation: There are 3 values (1, 2, 3) that appear at least twice.
With x[i]==1 there are 2 elements, count=1
With x[i]==2 there are 3 elements, count=2
With x[i]==3 there are 3 elements, count=3
I want the result to be count=3.
Thank you very much!

So if I understand correctly, you want to count how many distinct values are repeated in the vector.
One way to do it is to construct a table from the vector and see how many elements have a count higher than one:
x <- c(1, 2, 3, 1, 4, 5, 6, 2, 3, 2, 3, 8)
tab <- table(x)
sum(tab > 1)
#> [1] 3
Created on 2020-11-26 by the reprex package (v0.3.0)
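As a small generalization of the table approach above, the count can be wrapped in a helper. The name count_repeated and the min_count parameter are made up for this sketch and are not part of base R. Note that table() drops NA values by default.

```r
# count_repeated is a hypothetical helper, not a base R function.
# It returns how many distinct values occur at least min_count times.
count_repeated <- function(x, min_count = 2) {
  tab <- table(x)          # frequency of each distinct value
  sum(tab >= min_count)    # number of values reaching the threshold
}

x <- c(1, 2, 3, 1, 4, 5, 6, 2, 3, 2, 3, 8)
count_repeated(x)       # values appearing at least twice
#> [1] 3
count_repeated(x, 3)    # values appearing at least three times
#> [1] 2
```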

Here are a couple of base R options:
Using rle:
sum(with(rle(sort(x)), lengths > 1))
#[1] 3
With tapply:
sum(tapply(x, x, length) > 1)

library(tibble)
library(dplyr)
x<-c( 1, 2 , 3 , 1 , 4 , 5 , 6 , 2 , 3 , 2 , 3 , 8 )
tibble::tibble(x) %>% dplyr::count(x)
Edit:
As I was kindly asked to add some comments, I will gladly do so.
Here I employed the tibble package to create a data.frame (or tibble, rather) and the dplyr package for counting.
tibble::tibble(x)
turns the vector x into a tibble (a type of data.frame) for further analysis. Unfortunately, the only variable of the tibble x is also called x. Sorry for that!
%>%
The pipe operator takes the value to its left (here, the newly created tibble called x) and provides it as input for the subsequent command:
dplyr::count(x)
Here, we use dplyr to count() the variable x from the tibble x (again, sorry for that). The result shows which values occurred how many times:
      x     n
  <dbl> <int>
1     1     2
2     2     3
3     3     3
4     4     1
5     5     1
6     6     1
7     8     1
where the first column (1 to 7) simply contains row numbers, x holds the values from the original question, and n counts how often each value occurred.
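To get the single number asked for in the question, the count table could be filtered before counting its rows. This is only a sketch of one way to finish the pipeline, assuming dplyr and tibble are installed; the name n_repeated is made up here.

```r
library(dplyr)

x <- c(1, 2, 3, 1, 4, 5, 6, 2, 3, 2, 3, 8)

n_repeated <- tibble::tibble(x) %>%
  dplyr::count(x) %>%        # one row per distinct value, with its frequency n
  dplyr::filter(n > 1) %>%   # keep only values that occur more than once
  nrow()                     # number of such values

n_repeated
#> [1] 3
```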

length(unique(x[duplicated(x)]))
# 3
Data
x <- c(1, 2, 3, 1, 4, 5, 6, 2, 3, 2, 3, 8)

Related

Return maximum of conditionally selected pairs from a vector in R

Reproducible example:
set.seed(1)
A <- round(runif(12, min = 1, max = 5))
> A
[1] 1 2 2 4 3 4 3 4 5 3 4 5
expectedResult <- c(max(A[1], A[4]), max(A[2], A[5]), max(A[3], A[6]), max(A[7], A[10]), max(A[8], A[11]), max(A[9], A[12]))
> expectedResult
[1] 4 3 4 3 4 5
Each A needs to be considered as a collection of segments with 6 elements. For example, A here has 2 segments, A[1:6] and A[7:12]. For each segment, the first 3 elements are compared with the next 3 elements. Therefore I need to take max(A[1], A[4]), max(A[2], A[5]), max(A[3], A[6]), max(A[7], A[10]), max(A[8], A[11]), max(A[9], A[12]).
My original vector has far more elements than this example, so I need a much simpler approach. Speed is also a factor in the original calculation, so I am looking for a fast solution as well.
We could create a function that splits the vector into segments of 'n' elements, loops over the resulting list, builds a matrix with nrow = 2 from each segment, converts it to a data.frame so that pmax takes the elementwise max, and finally unlists the result:
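f1 <- function(vec, n) {
  # split the vector into consecutive segments of n elements
  lst1 <- split(vec, as.integer(gl(length(vec), n, length(vec))))
  # within each segment, take the elementwise max of its two halves
  unname(unlist(lapply(lst1, function(x)
    do.call(pmax, as.data.frame(t(matrix(x, nrow = 2, byrow = TRUE)))))))
}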
-output
> f1(A, 6)
[1] 4 3 4 3 4 5
If the length is not a multiple of 3 or 6, another option is to do a group-by operation with tapply after splitting:
unname(unlist(lapply(split(A, as.integer(gl(length(A), 6,
length(A)))), function(x) tapply(x, (seq_along(x)-1) %% 3 + 1, FUN = max))))
[1] 4 3 4 3 4 5
data
A <- c(1, 2, 2, 4, 3, 4, 3, 4, 5, 3, 4, 5)
Another option in base R:
a <- 6
unlist(tapply(A, gl(length(A)/a, a),
              function(x) pmax(head(x, a/2), tail(x, a/2))), use.names = FALSE)
[1] 4 3 4 3 4 5
or even
a <- 6
unlist(tapply(A, gl(length(A)/a, a),
              function(x) do.call(pmax, data.frame(matrix(x, ncol = 2)))), use.names = FALSE)
[1] 4 3 4 3 4 5
You can reshape the vector to a 3d array, split by column, and take the parallel max. This should be pretty efficient as far as base R goes.
do.call(pmax.int, asplit(`dim<-`(A, c(3,2,2)), 2))
[1] 4 3 4 3 4 5
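The array idea above hardcodes the dimensions c(3,2,2) for this 12-element example. A minimal sketch of how it might be generalized to other lengths, assuming the segment length is even and divides length(v); pair_max, v and seg are names made up for illustration:

```r
# pair_max is a hypothetical helper: v is the input vector, seg the segment length
pair_max <- function(v, seg = 6) {
  stopifnot(seg %% 2 == 0, length(v) %% seg == 0)
  half <- seg / 2
  # dimensions: position within a half, which half (1 or 2), segment number
  arr <- array(v, dim = c(half, 2, length(v) / seg))
  # elementwise max of first halves vs. second halves, flattened segment by segment
  as.vector(pmax(arr[, 1, , drop = FALSE], arr[, 2, , drop = FALSE]))
}

A <- c(1, 2, 2, 4, 3, 4, 3, 4, 5, 3, 4, 5)
pair_max(A, 6)
#> [1] 4 3 4 3 4 5
```

Using drop = FALSE keeps the slices as arrays even when there is only one segment, so the same code path handles short inputs.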

Subsetting of data.frames with variable name vs. column number

I am fairly new to R and I have run into a problem with subsetting data frames a number of times. I have found a fix but would just like to understand what I am missing.
Here is an example bit of code where I don't understand the functional difference.
Example data frame:
df <- data.frame(V1 = c(1:10), V2 = c(rep(1, times = 10)))
This produces an "undefined columns selected" error:
df1 <- df[df$V1 < 5, df$V2]
But this works:
df2 <- df[df$V1 < 5, 2]
I don't understand why, when referring to the column by its name via $V2, I do not receive the same result as when referring to the same column by its number.
This is a really basic question, I am aware, but I would just like to get my head around it.
Thanks, and sorry if the formatting is off or anything (first time posting),
Christoph
df[df$V1 < 5, df$V2] doesn't give an "undefined columns selected" error.
df[df$V1 < 5, df$V2]
#  V1 V1.1 V1.2 V1.3 V1.4 V1.5 V1.6 V1.7 V1.8 V1.9
#1  1    1    1    1    1    1    1    1    1    1
#2  2    2    2    2    2    2    2    2    2    2
#3  3    3    3    3    3    3    3    3    3    3
#4  4    4    4    4    4    4    4    4    4    4
Because df$V2 contains only 1s, it is treated as a vector of column positions, and the 1st column is present in your data frame. So the 1st column is selected length(df$V2) times, and since it is not advised to have columns with the same name, the suffixes .1, .2, ... are added to the names.
This is the same as doing
df[df$V1 < 5, c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)]
It would give an "undefined columns selected" error if you selected columns that are not present in the data.
df[df$V1 < 5, c(1, 3)]
Error in `[.data.frame`(df, df$V1 < 5, c(1, 3)) :
undefined columns selected
There are different ways in which you can access the data.
By column name:
df[df$V1 < 5, "V2"]
#[1] 1 1 1 1
Or
df$V2[df$V1 < 5]
and by column position.
df[df$V1 < 5, 2]
#[1] 1 1 1 1
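To make the difference concrete, here is a short sketch contrasting the two calls. The variable names idx, by_values and by_name are made up for illustration; the point is that df$V2 evaluates to the values stored in the column, which are then used as column positions.

```r
df <- data.frame(V1 = 1:10, V2 = rep(1, times = 10))

# df$V2 is the column's content: a numeric vector of ten 1s
idx <- df$V2

# used as a column index, it selects column 1 ten times
by_values <- df[df$V1 < 5, idx]
dim(by_values)
#> [1]  4 10

# the column name as a string selects the column itself
by_name <- df[df$V1 < 5, "V2"]
by_name
#> [1] 1 1 1 1
```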

How to list R data frame variables in alphabetical order? [duplicate]

This is possibly a simple question, but I do not know how to order columns alphabetically.
test = data.frame(C = c(0, 2, 4, 7, 8), A = c(4, 2, 4, 7, 8), B = c(1, 3, 8, 3, 2))
#   C A B
# 1 0 4 1
# 2 2 2 3
# 3 4 4 8
# 4 7 7 3
# 5 8 8 2
I would like to order the columns alphabetically by column name, to achieve
#   A B C
# 1 4 1 0
# 2 2 3 2
# 3 4 8 4
# 4 7 3 7
# 5 8 2 8
For others I want my own defined order:
#   B A C
# 1 1 4 0
# 2 3 2 2
# 3 8 4 4
# 4 3 7 7
# 5 2 8 8
Please note that my datasets are huge, with 10000 variables. So the process needs to be more automated.
You can use order on the names, and use that to order the columns when subsetting:
test[ , order(names(test))]
  A B C
1 4 1 0
2 2 3 2
3 4 8 4
4 7 3 7
5 8 2 8
For your own defined order, you will need to define your own mapping of the names to the ordering. How you do this depends on your use case, but swapping whatever function does this for order above should give your desired output.
You may, for example, have a look at "Order a data frame's rows according to a target vector that specifies the desired order", i.e. you can match your data frame names against a target vector containing the desired column order.
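A minimal sketch of that matching idea in base R. The vectors target and partial are hypothetical names holding the desired order; match() returns the position of each target name among the data frame's column names.

```r
test <- data.frame(C = c(0, 2, 4, 7, 8), A = c(4, 2, 4, 7, 8), B = c(1, 3, 8, 3, 2))

# target is a hypothetical vector holding the full desired column order
target <- c("B", "A", "C")
reordered <- test[, match(target, names(test))]
names(reordered)
#> [1] "B" "A" "C"

# if the vector lists only some columns, keep the remaining ones after them
partial <- c("B")
ordered_names <- c(intersect(partial, names(test)), setdiff(names(test), partial))
names(test[, ordered_names])
#> [1] "B" "C" "A"
```

With thousands of columns, the target vector would typically be read from a file or built programmatically rather than typed by hand.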
Here's the obligatory dplyr answer in case somebody wants to do this with the pipe.
library(dplyr)
test %>%
  select(sort(names(.)))
test = data.frame(C=c(0,2,4, 7, 8), A=c(4,2,4, 7, 8), B=c(1, 3, 8,3,2))
With the following simple subsetting, the reordering can be done by hand (but only if the data frame does not have many columns):
test <- test[, c("A", "B", "C")]
For others:
test <- test[, c("B", "A", "C")]
An alternative option is to use str_sort() from the stringr library with the argument numeric = TRUE. This correctly orders column names that include numbers, rather than ordering them purely alphabetically:
str_sort(c("V3", "V1", "V10"), numeric = TRUE)
# [1] "V1"  "V3"  "V10"
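Applied to a data frame, the sorted names can be used for column subsetting. A small sketch, assuming stringr is installed; the data frame wide and its columns are made up for illustration:

```r
library(stringr)

# wide is a hypothetical data frame with numbered column names
wide <- data.frame(V10 = 1:2, V3 = 3:4, V1 = 5:6)

# numeric = TRUE compares the digits as numbers, so V3 comes before V10
sorted_wide <- wide[, str_sort(names(wide), numeric = TRUE)]
names(sorted_wide)
#> [1] "V1"  "V3"  "V10"
```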
test[, sort(names(test))]
Calling sort on the column names works easily.
If you only want one or more columns in the front and don't care about the order of the rest:
require(dplyr)
test %>%
select(B, everything())
So to have a specific column come first, then the rest alphabetically, I'd propose this solution:
test[, c("myFirstColumn", sort(setdiff(names(test), "myFirstColumn")))]
Here is what I found to solve a similar problem with my data set.
First, do what James mentioned above, i.e.
test[ , order(names(test))]
Second, use the everything() function in dplyr to move specific columns of interest (e.g., "D", "G", "K") to the beginning of the data frame, with the alphabetically ordered columns after them.
select(test, D, G, K, everything())
Similar to the other syntax above, but useful for learning: you can sort the column names themselves. Note that this returns only the sorted names, not the reordered data frame:
sort(colnames(test[1:ncol(test)]))
Another option is:
mtcars %>% dplyr::select(order(names(mtcars)))
In data.table you can use the function setcolorder:
setcolorder reorders the columns of data.table, by reference, to the
new order provided.
Here is a reproducible example:
library(data.table)
test = data.table(C = c(0, 2, 4, 7, 8), A = c(4, 2, 4, 7, 8), B = c(1, 3, 8, 3, 2))
setcolorder(test, c(order(names(test))))
test
#>    A B C
#> 1: 4 1 0
#> 2: 2 3 2
#> 3: 4 8 4
#> 4: 7 3 7
#> 5: 8 2 8
Created on 2022-07-10 by the reprex package (v2.0.1)
