Suppose I have a vector of numbers in R:
numbers <- c(1,1, 2,2,2, 3,3, 4,4,4,4, 1)
I want to return a vector that gives, at each position, how many times that value has occurred so far (cumulatively) along the vector, i.e.:
results <- c(1,2, 1,2,3, 1,2, 1,2,3,4, 3)
We can use ave, applying seq_along within each group defined by the values of 'numbers':
ave(numbers, numbers, FUN = seq_along)
#[1] 1 2 1 2 3 1 2 1 2 3 4 3
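A minimal sketch wrapping this in a helper, `cumulative_count` (a hypothetical name, not part of base R). It passes `seq_along(x)` as the first argument instead of `x` itself: `ave` writes each group's results back into a copy of its first argument, so this keeps the result an integer vector even when `x` is a character vector.

```r
# Hypothetical helper: for each position, how many times has this value
# appeared so far (including the current element)?
cumulative_count <- function(x) {
  # ave() splits its first argument by the grouping vector x, applies FUN
  # within each group, and writes the results back in the original order.
  ave(seq_along(x), x, FUN = seq_along)
}

numbers <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 1)
cumulative_count(numbers)
#> [1] 1 2 1 2 3 1 2 1 2 3 4 3

words <- c("a", "b", "a", "c", "b", "a")
cumulative_count(words)
#> [1] 1 1 2 1 2 3
```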
I have an R data frame with two columns, id and text, and I want to turn it into a co-occurrence matrix of word pairs that appear together in the same id's list of words.
So, this dataframe:
df <- data.frame(id = c(1, 1, 1, 2, 2, 2), text = c("but", "the", "and", "but", "a", "the"))
should be turned into something like this:
     but  the  and  a
but  2    2    1    1
the  2    2    1    1
and  1    1    1    0
a    1    1    0    1
The real data is much larger, but I think this toy example should transfer. I'm not sure where to even start here; tidyverse solutions are preferred.
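Following an answer to a similar question (Creating co-occurrence matrix), you can cross-tabulate id against text and take the cross product: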
dat <- crossprod(table(df))
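A minimal base-R sketch of why this works, using the question's toy data with the text values quoted. `table(df$id, df$text)` is an id-by-word incidence matrix, and `crossprod(M)` computes `t(M) %*% M`, so cell [w1, w2] counts the ids in which both words occur and the diagonal counts the ids containing each word. The `ord` vector is only there to reproduce the row/column order shown in the question; `table()` itself sorts the words alphabetically.

```r
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  text = c("but", "the", "and", "but", "a", "the")
)

# Rows are ids, columns are words (sorted alphabetically by table()).
incidence <- table(df$id, df$text)

# crossprod(M) is t(M) %*% M: a word-by-word co-occurrence matrix.
cooc <- crossprod(incidence)

# Put the words in the order used in the question.
ord <- c("but", "the", "and", "a")
cooc[ord, ord]
```

If a word can occur more than once within the same id, the cross product multiplies those counts; to count only shared ids, binarize first, e.g. `crossprod(1 * (incidence > 0))`.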
Say we have the following matrix:
x <- matrix(1:9, nrow = 3, dimnames = list(c("X","Y","Z"), c("A","B","C")))
What I'm trying to do is:
1- Find the maximum value of each row. For this part, I'm doing the following:
df <- apply(X=x, MARGIN=1, FUN=max)
2- Then, I want to extract the column name of each row's maximum and put it next to the value. In this reproducible example, it would be "C" for all three rows.
Any assistance would be wonderful.
You can use apply with which.max to look up the column name of each row's maximum:
maxColumnNames <- apply(x,1,function(row) colnames(x)[which.max(row)])
Because x is a numeric matrix, you can't add the names as an extra column without the whole matrix being converted to a character matrix.
Instead, you can use a data frame:
resDf <- cbind(data.frame(x),data.frame(maxColumnNames = maxColumnNames))
which gives:
resDf
A B C maxColumnNames
X 1 4 7 C
Y 2 5 8 C
Z 3 6 9 C
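A vectorized alternative, sketched with base R's `max.col`, which returns the column index of each row's maximum. Its default `ties.method` is `"random"`, so `"first"` is set here to match the behaviour of `which.max`. The column names `maxValue` and `maxColumnName` are illustrative choices, not anything R requires.

```r
x <- matrix(1:9, nrow = 3, dimnames = list(c("X", "Y", "Z"), c("A", "B", "C")))

# Column index of the maximum in each row; "first" breaks ties like which.max().
idx <- max.col(x, ties.method = "first")

# Indexing with a two-column (row, col) matrix pulls out each row's maximum.
res <- data.frame(
  x,
  maxValue      = x[cbind(seq_len(nrow(x)), idx)],
  maxColumnName = colnames(x)[idx]
)
res
```

This answers both parts of the question in one data frame: the maximum value and the name of the column it came from.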
I have a vector
x <- c(1,2,3,4,5)
How do I count how many elements are greater than 3? Here the output would be 2.
Take advantage of the fact that logical values are treated as 1 (TRUE) and 0 (FALSE) when summed:
sum(x > 3)
You can also do it like this:
x <- list(1, 2, 3, 4, 5)
sum(x > 3)
#Output: [1] 2
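The same idea extends in a couple of useful directions, sketched below on a plain numeric vector: `na.rm = TRUE` skips elements whose comparison is NA, and `mean()` of a logical vector gives the proportion of matches rather than the count.

```r
x <- c(1, 2, 3, 4, 5)

# x > 3 is a logical vector; sum() counts the TRUEs (TRUE = 1, FALSE = 0).
n_over <- sum(x > 3)
n_over
#> [1] 2

# NA > 3 is NA, which would make the sum NA; na.rm = TRUE drops it.
y <- c(1, NA, 4, 5)
n_over_y <- sum(y > 3, na.rm = TRUE)
n_over_y
#> [1] 2

# The mean of a logical vector is the proportion of TRUEs.
prop_over <- mean(x > 3)
prop_over
#> [1] 0.4
```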
I have two vectors, say A and B:
A <- c(1, 2, 3, 4, 5)
B <- c(6, NA, 8, 9, NA)
I would like to drop the elements of A at the positions where B is NA.
That is, I need an automatic way to remove indices 2 and 5 from both A and B, so that both vectors keep the same length.
Use is.na on B to build a logical index and apply it to both vectors:
A[!is.na(B)]
#[1] 1 3 4
B[!is.na(B)]
#[1] 6 8 9
Alternatively, bind the vectors into a matrix and let na.omit drop every row that contains an NA:
na.omit(cbind(A,B))
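If you want to keep A and B as separate vectors while also dropping positions where either of them is NA (as the na.omit approach does), `complete.cases()` accepts several vectors and returns a single logical index. A minimal sketch:

```r
A <- c(1, 2, 3, 4, 5)
B <- c(6, NA, 8, 9, NA)

# TRUE at positions where neither A nor B is NA.
keep <- complete.cases(A, B)
keep
#> [1]  TRUE FALSE  TRUE  TRUE FALSE

A_clean <- A[keep]
B_clean <- B[keep]
```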
I would like to keep only the values that occur exactly once in a vector, dropping every copy of any duplicated value instead of retaining one of them. unique() does not do this, and neither does duplicated() on its own.
For example:
> test <- c(1,1,2,3,4,4,4,5,6,6,7,8,9,9)
> unique(test)
[1] 1 2 3 4 5 6 7 8 9
whereas I would like the result to be 2, 3, 5, 7, 8.
Any ideas on how to approach this? Thank you!
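We can use duplicated twice, scanning from the front and from the back, so that every copy of a repeated value is flagged: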
test[!(duplicated(test)|duplicated(test, fromLast=TRUE))]
#[1] 2 3 5 7 8
You can use ave to compute, for each element, the size of its group of equal values in test, and keep only the elements whose group size is 1 (the values that have no duplicates):
test[ave(test, test, FUN = length) == 1]
#[1] 2 3 5 7 8
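If test is a character vector (or a factor), pass seq_along(test) as the first argument of ave, so that the group sizes are computed as integers rather than being stored back into test's own type: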
test[ave(seq_along(test), test, FUN = length) == 1]
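One more base-R alternative, sketched below: collect the values that occur a second time, then drop every element whose value is in that set. Because it relies only on `duplicated()` and `%in%`, it works the same way for numeric and character vectors.

```r
test <- c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 9, 9)

# Values seen at least twice (duplicated() flags the later copies).
dup_values <- test[duplicated(test)]

# Keep only elements whose value never repeats.
singles <- test[!test %in% dup_values]
singles
#> [1] 2 3 5 7 8
```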