I want to count the number of unique values per row.
For instance with this data frame:
example <- data.frame(var1 = c(2,3,3,2,4,5),
var2 = c(2,3,5,4,2,5),
var3 = c(3,3,4,3,4,5))
I want to add a column which counts the number of unique values per row; e.g. 2 for the first row (as there are 2's and 3's in the first row) and 1 for the second row (as there are only 3's in the second row).
Does anyone know a simple way to do this? So far I have only found code for counting the number of unique values per column.
This apply function returns a vector of the number of unique values in each row:
apply(example, 1, function(x)length(unique(x)))
You can append it to your data.frame in one of the following two ways (here the new column is named count):
example <- cbind(example, count = apply(example, 1, function(x)length(unique(x))))
or
example$count <- apply(example, 1, function(x)length(unique(x)))
We can also use a vectorized approach with regex. After pasting the elements of each row of the dataset together (do.call(paste0, ...)), we match any single character and capture it as a group ((.)). The positive lookahead ((?=.*?\\1)) lets the match succeed only if that character appears again later in the string (\\1 is the backreference to the captured group), and each such match is replaced with an empty string (""). In effect, only the last occurrence of each distinct character remains. Then nchar counts the characters left in the string. Note that this works here because every value is a single digit; with multi-character values the pasted string no longer maps one character to one value.
example$count <- nchar(gsub("(.)(?=.*?\\1)", "", do.call(paste0, example), perl = TRUE))
example$count
#[1] 2 1 3 3 2 1
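To make the difference between the two approaches concrete, here is a small sketch on made-up data with multi-digit values. The data frame multi is hypothetical and not from the question; it is only meant to show where the regex shortcut and the apply version can disagree.

```r
# Hypothetical data (not from the question): values with more than one digit
multi <- data.frame(a = c(12, 1), b = c(1, 21), c = c(2, 1))

# Row-wise count of distinct values; does not depend on how values print
count_apply <- apply(multi, 1, function(x) length(unique(x)))

# Regex version: counts distinct *characters* in the pasted row string
count_regex <- nchar(gsub("(.)(?=.*?\\1)", "", do.call(paste0, multi), perl = TRUE))

count_apply
# row 1 holds 12, 1, 2 (three distinct values); row 2 holds 1, 21, 1 (two)
count_regex
# row 1 pastes to "1212" and row 2 to "1211": two distinct characters each
```

So for data like this the apply version is probably the safer choice, while the regex version is fine when every cell is a single character.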
I want the frequency (Freq) that has the maximum value (Valu) in df:
df <- data.frame(Freq = c(1,2,3,4,5,6,7,8,9,10), Valu = c(10,5,11,7,13,15,9,6,12,12))
apply(df, 2, which.max)
What I want
I want it to print just the frequency of the maximum Valu which is 6
We could use which.max on the column 'Valu' to get the index of the maximum, and then extract ([) the corresponding 'Freq' value:
with(df, Freq[which.max(Valu)])
#[1] 6
If the column names may change, index the columns by position instead:
df[[1]][which.max(df[[2]])]
#[1] 6
Or we could use order as well:
df[[1]][order(-df[[2]])][1]
#[1] 6
By contrast, if we loop over the columns with apply and MARGIN = 2, as in the question, which.max is applied to each column separately, so it returns the index of the maximum for every column rather than the Freq value at the maximum of Valu.
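As a quick check of that difference, the sketch below runs both versions on the df from the question. The last line is an extra that is not in the answer above: a possible way to return every Freq when several rows tie for the maximum, since which.max only reports the first one.

```r
df <- data.frame(Freq = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 Valu = c(10, 5, 11, 7, 13, 15, 9, 6, 12, 12))

# One index per column: position of the largest Freq and of the largest Valu
per_column <- apply(df, 2, which.max)
per_column
# Freq Valu
#   10    6

# The Freq value in the row where Valu is largest
best_freq <- df$Freq[which.max(df$Valu)]
best_freq
#[1] 6

# Assumption: if ties matter, compare against max() to keep all matching rows
all_best <- df$Freq[df$Valu == max(df$Valu)]
```

In this data the index 6 and the Freq value 6 happen to coincide, which makes the apply output easy to misread as the answer.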
I have a data frame with 2 columns and 26 rows, the first column is composed of characters while the second column is composed of numbers.
I also have a vector with a random selection of 5 characters.
I want to sum the numbers in column two for the 5 randomly selected characters.
How can I calculate this sum?
We can use aggregate to get the sum of ints for each char:
aggregate(ints ~ char, data1, sum)
Maybe what you need is:
result <- sum(data1$ints[data1$char %in% sample1], na.rm = TRUE)
This sums the ints values in data1 for the rows whose char is present in sample1.
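Since the question does not show its data, here is a self-contained sketch with invented values. The contents of data1 and sample1 below are assumptions (only the names data1, sample1, char and ints come from the answers above); a fixed vector stands in for the random selection so the result is reproducible.

```r
# Hypothetical stand-in for the 26-row data frame described in the question
data1 <- data.frame(char = letters, ints = 1:26, stringsAsFactors = FALSE)

# Hypothetical selection of 5 characters (fixed here instead of sample())
sample1 <- c("a", "e", "i", "o", "u")

# Sum ints over the rows whose char is in the selection
result <- sum(data1$ints[data1$char %in% sample1], na.rm = TRUE)
result
#[1] 51

# Assumption: if the selection may contain repeats that should count twice,
# look up each selected character with match() instead of %in%
sample2 <- c("a", "a", "b")
sum_in    <- sum(data1$ints[data1$char %in% sample2])        # each row once
sum_match <- sum(data1$ints[match(sample2, data1$char)])     # once per draw
```

With %in% each row of data1 is counted at most once, so the two versions only differ when the selection was drawn with replacement.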
I was given a sample vector v and was asked to use R code to extract, as a number (meaning: not as a character string), the value that was repeated the most times in v.
(Hints: use table(); note that which.max() gives you the index of a vector's maximum value, such as the maximum value within a table; names() allows you to extract the values of the original vector when applied to the output of table().)
My answer is as follows:
names(which.max(table(v)))
It returns the correct answer, but as a string rather than a number. Am I using the hint correctly? Thanks.
names returns the value as a character string, so wrap it in as.integer or as.numeric to convert it to a number.
as.integer(names(which.max(table(v))))
Moreover, in case of a tie, which.max returns only the first maximum. If you want all the values that are tied, you can use:
v <- c(1, 1, 2, 4, 5, 3, 3)
as.integer(names(which.max(table(v))))
#[1] 1
tab <- table(v)
as.integer(names(tab[max(tab) == tab]))
#[1] 1 3
I have got a dataframe and I need to find row numbers where the values of the entries in one column match a certain pattern.
Let col1 = matrix(c(1,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,0,0,1), nrow = 21, ncol = 1) be an example of my column, and let r = c(2, 0, 2) be the vector I need to match against it.
I need R to return an index number of rows where the pattern in r matches the values in col1 (in this case row 11, 12, 13).
I thought I could achieve this with row.match, but that is not the case. I have tried different combinations of match function, but it doesn't yield any results either.
Maybe the way I am approaching this problem is wrong from the beginning, but I have trouble believing that there isn't any function, that would provide me with the expected result given some adjustment.
Thanks.
You could do this using rollapply from zoo. Basically, this runs identical on a rolling basis with a window of length(r). This tells you that the sequence is present starting at position 11 of the col1 vector.
library(zoo)
which(rollapply(col1,length(r),identical,r))
#[1] 11
To get a vector of positions, you could do:
which(rollapply(col1,length(r),identical,r))+0:(length(r)-1)
#[1] 11 12 13
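If you would rather avoid the zoo dependency, the same rolling comparison can be sketched in base R. This is only one possible way to write it, and it assumes the column is at least as long as r. It also expands every match into its full run of row numbers, which the + 0:(length(r)-1) shortcut above only does correctly when there is a single match.

```r
col1 <- matrix(c(1,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,0,0,1), nrow = 21, ncol = 1)
r <- c(2, 0, 2)

x <- as.vector(col1)   # work on a plain vector
n <- length(r)

# Compare each window of length n against r; keep the window start positions
starts <- which(vapply(seq_len(length(x) - n + 1),
                       function(i) all(x[i:(i + n - 1)] == r),
                       logical(1)))
starts
#[1] 11

# Expand every start into the n consecutive row numbers it covers
positions <- as.vector(outer(0:(n - 1), starts, "+"))
positions
#[1] 11 12 13
```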
Consider the following worked example.
person_A <- c(1,1,1,2,2,3,3,3,4,4,4,5,6)
person_B <- c(3,4,5,9,1,1,8,7,1,3,7,6,5)
df1 <- data.frame(person_A, person_B)
So in each row we have the ID of person_A and of person_B.
I want to filter df1 to remove the duplicate combinations of person_A and person_B and keep only the unique combinations as output. The check must also cover switched combinations, that is, person_A--person_B and person_B--person_A count as the same pair.
In other words, I want to remove the parts shaded in red
We can use duplicated. We use apply to sort the elements of each row (MARGIN = 1), transpose the output, use duplicated to flag the repeated rows as a logical vector, negate it (!), and subset only the unique rows:
df1[!duplicated(t(apply(df1, 1, sort))),]
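For the two-column case, the same idea can also be sketched without apply by taking the row-wise smaller and larger ID with pmin and pmax. This is an alternative formulation, not the method from the answer above, and the names lo, hi, keep and result are just illustrative.

```r
person_A <- c(1,1,1,2,2,3,3,3,4,4,4,5,6)
person_B <- c(3,4,5,9,1,1,8,7,1,3,7,6,5)
df1 <- data.frame(person_A, person_B)

# Order-independent key for each pair: (smaller ID, larger ID)
lo <- pmin(df1$person_A, df1$person_B)
hi <- pmax(df1$person_A, df1$person_B)

# Keep the first occurrence of each unordered pair
keep <- !duplicated(data.frame(lo, hi))
result <- df1[keep, ]

which(!keep)
#[1]  6  9 13
```

Rows 6, 9 and 13 are dropped because (3,1), (4,1) and (6,5) repeat the earlier pairs (1,3), (1,4) and (5,6) with the IDs switched, leaving 10 rows.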