Delete all repeated values [duplicate] - r

If I have a vector:
x <- c(5, 6, 2, 9, 5, 2, 1, 9, 9)
How can I make another vector that contains elements that were never repeated? In this case it would be: c(6, 1) (because 5, 2, and 9 are repeated)

test <- c(5, 6, 2, 9, 5, 2, 1, 9, 9)
setdiff(test, test[duplicated(test)])

vector.a <- c(5, 6, 2, 9, 5, 2, 1, 9, 9)
# TRUE where the value appears nowhere else in the vector
not.reap <- logical(length(vector.a))
for (i in seq_along(vector.a)) {
  not.reap[i] <- !(vector.a[i] %in% vector.a[-i])
}
vector.a[not.reap]
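The two answers above can also be written without a loop. This is only a sketch of one common alternative, not something taken from the answers: `duplicated()` flags the second and later copies of a value, and calling it again with `fromLast = TRUE` flags the earlier copies, so the union marks every value that occurs more than once. The names `is.repeated` and `never.repeated` are made up for illustration.

```r
x <- c(5, 6, 2, 9, 5, 2, 1, 9, 9)

# duplicated() marks later copies; fromLast = TRUE marks earlier copies
is.repeated <- duplicated(x) | duplicated(x, fromLast = TRUE)

# keep only the values that were never repeated
never.repeated <- x[!is.repeated]
never.repeated
```

Unlike `setdiff()`, this keeps the surviving values in their original positions and can be reused as a row filter on a data frame.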

Related

Subset every 5 rows by group?

I have a dataset with multiple groups, and want to subset rows within groups along multiples of 5, with the addition of the first row (so row 1,5,10,15, etc within every group).
Right now my dataset has a column named "Group ID" and a few other columns (e.g. time, date, etc), but nothing indicating row numbers of any kind.
Any help would be appreciated! I was thinking maybe something compatible with dplyr? I was trying things using the function slice but no luck so far.
You need to create a row index within each group and then filter on it:
library(dplyr)
df <- data.frame(id = c(1, 2, 1, 2, 2, 3, 4, 3, 1, 2, 4, 4, 4, 3, 1, 1, 1, 2, 2),
                 b = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6))
df <- df %>%
  group_by(id) %>%
  mutate(group_index = row_number()) %>%
  filter(group_index == 1 | group_index %% 5 == 0)
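For readers without dplyr, the same within-group index can be sketched in base R. This assumes `ave()` with `FUN = seq_along` is an acceptable stand-in for `row_number()` inside `group_by()`; the variable names `group_index`, `keep`, and `result` are illustrative only.

```r
df <- data.frame(id = c(1, 2, 1, 2, 2, 3, 4, 3, 1, 2, 4, 4, 4, 3, 1, 1, 1, 2, 2),
                 b = rep(6, 19))

# running row counter that restarts for every distinct id
group_index <- ave(seq_along(df$id), df$id, FUN = seq_along)

# first row of each group, plus every 5th row within the group
keep <- group_index == 1 | group_index %% 5 == 0
result <- df[keep, ]
result
```

The rows stay in their original order here, whereas the dplyr pipeline returns a grouped tibble.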

Apply operation on the next element of the data frame column vector

Let's say I have a df like this
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
var2 = c(2, 8, 0, 7, 3, 4, 1, 10, 13))
I want a vector of values produced by the following operation:
(x-median(x-1))/median(x-1)
where the -1 means that the element at the current index is dropped from the column before taking the median. For example, for the first element in column var2 the result is:
(2-(median(c( 8, 0, 7, 3, 4, 1, 10, 13))) )/(median(c( 8, 0, 7, 3, 4, 1, 10, 13)))
-0.63636
Thanks!
Using sapply, we can loop over index of each value in var2, ignore that value and calculate median of remaining values and perform the calculation.
sapply(seq_along(df1$var2), function(i) {
  med_i <- median(df1$var2[-i])
  (df1$var2[i] - med_i) / med_i
})
#[1] -0.6364 1.2857 -1.0000 1.0000 -0.4545 -0.2000 -0.8182 1.8571 2.7143
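The same calculation can be wrapped in a small reusable function. This is a sketch under the assumption that a leave-one-out median is what the question intends; the helper name `loo_median_ratio` is hypothetical, and `vapply()` is used in place of `sapply()` only so that the return type is fixed.

```r
df1 <- data.frame(ID = 1:9,
                  var2 = c(2, 8, 0, 7, 3, 4, 1, 10, 13))

# hypothetical helper: (x[i] - median of the other values) / that median
loo_median_ratio <- function(x) {
  vapply(seq_along(x), function(i) {
    med_i <- median(x[-i])   # median with element i left out
    (x[i] - med_i) / med_i
  }, numeric(1))
}

res <- loo_median_ratio(df1$var2)
round(res[1], 5)   # first element, matching the hand calculation in the question
```

If the leave-one-out median is ever 0, the division yields `Inf` or `NaN`, so real data may need a guard for that case.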

Selecting columns using ends_with helper and a vector of string names

I have a data frame, in wide format, with each column representing one questionnaire item for one particular version of a questionnaire for a particular time point (repeated measures design).
My data would look something like the following:
df <- data.frame(id = c(1:5), t1_QOL_child_Q1 = c(5, 3, 6, 2, 7), t1_QOL_child_Q2 = c(5, 2, 3, 7, 1), t1_QOL_child_Q3 = c(7, 7, 6, 2, 5), t1_QOL_child_joy = c(9,9, 5, 3, 6), t1_QOL_teen_Q1 = c(5, 3, 6, 2, 7), t1_QOL_teen_Q2 = c(5, 2, 3, 7, 1), t1_QOL_teen_Q3 = c(7, 7, 6, 2, 5), t1_QOL_teen_joy = c(5, 7, 4, 7, 9), t1_QOL_adult_Q1 = c(5, 3, 6, 2, 7), t1_QOL_adult_Q2 = c(5, 2, 3, 7, 1), t1_QOL_adult_Q3 = c(7, 7, 6, 2, 5), t1_QOL_adult_joy = c(6, 5, 3, 3, 2), t2_QOL_child_Q1 = c(5, 3, 6, 2, 7), t2_QOL_child_Q2 = c(5, 2, 3, 7, 1), t2_QOL_child_Q3 = c(7, 7, 6, 2, 5), t2_QOL_child_joy = c(9,9, 5, 3, 6), t2_QOL_teen_Q1 = c(5, 3, 6, 2, 7), t2_QOL_teen_Q2 = c(5, 2, 3, 7, 1), t2_QOL_teen_Q3 = c(7, 7, 6, 2, 5), t2_QOL_teen_joy = c(5, 7, 4, 7, 9), t2_QOL_adult_Q1 = c(5, 3, 6, 2, 7), t2_QOL_adult_Q2 = c(5, 2, 3, 7, 1), t2_QOL_adult_Q3 = c(7, 7, 6, 2, 5), t2_QOL_adult_joy = c(6, 5, 3, 3, 2))
For example, column t1_QOL_child_Q1 would mean Question 1 (Q1) of the child version (child) of Quality of Life (QOL) questionnaire, with time point 1 (t1) data.
I want to select only subscales/columns whose suffix are labelled differently. In the sample data above, it would be the columns ending with "joy".
I have over 3000 columns and many more suffixes and it would be a pain to use the following:
select(df, ends_with("joy"), ends_with(<another suffix>), ends_with(<another suffix>))
I have thought of putting all the potential suffixes in a string vector, and use the vector as an input to the ends_with function, but ends_with could only take a single string instead of a vector of strings.
I have searched on Stackoverflow and found a solution that could accommodate a small vector of strings, which is the following:
select(df, sapply(vector_of_strings, starts_with))
However, I have too many suffixes in my vector of strings, and it produced the following error message:
Error: sapply(vector_of_strings, ends_with) must resolve to integer column positions, not a list
Help appreciated. Thanks!
We can use a single matches with multiple patterns separated by | to match substrings at the end ($) of the string
df %>%
select(matches("(joy|Q2)$"))
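With many suffixes, the pattern does not have to be typed by hand. The sketch below builds it from a character vector and checks it against a few column names using base `grep()`; the vector `suffixes` and the shortened `col_names` are assumptions made for illustration, and the resulting `pattern` string is the kind of value that could be passed to `matches()`.

```r
# hypothetical vector of suffixes to keep
suffixes <- c("joy", "Q2")

# collapse into one alternation anchored at the end of the name: "(joy|Q2)$"
pattern <- paste0("(", paste(suffixes, collapse = "|"), ")$")

# a few column names in the style of the question's data
col_names <- c("id", "t1_QOL_child_Q1", "t1_QOL_child_Q2",
               "t1_QOL_child_joy", "t2_QOL_teen_Q3", "t2_QOL_teen_joy")

selected <- grep(pattern, col_names, value = TRUE)
selected
```

This assumes the suffixes contain no regex metacharacters such as `.` or `(`; those would need escaping before being pasted into the pattern.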

Subsample a matrix by selection locations with specific values within a matrix in R

I have to use R instead of MATLAB and I'm new to it.
I have a large array of data repeating like 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10...
I need to find the locations where values equal to 1, 4, 7, 10 are found to create a sample using those locations.
In this case it will be position(=corresponding value) 1(=1) 4(=4) 7(=7) 10(=10) 11(=1) 14(=4) 17(=7) 20(=10) and so on.
In MATLAB it would be y=find(ismember(x,[1, 4, 7, 10 ])).
Please, help! Thanks, Pavel
something like this?
foo <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
bar <- c(1, 4, 7, 10)
which(foo %in% bar)
#> [1] 1 4 7 10 11 14 17 20
@nicola, feel free to copy my answer and get the recognition for it; I'm simply trying to close answered questions.
The %in% operator is what you want. For example,
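# example data in x (assumed here: the repeating 1..10 sequence from the question)
x <- rep(1:10, times = 2)
targets <- c(1, 4, 7, 10)
locations <- x %in% targets
# locations is a logical vector you can then use to subset:
y <- x[locations]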
There'll be an extra step or two if you wanted the row and column indices of the locations, but it's not clear if you do. (Note, the logicals will be in column order).
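The extra step for row and column indices might look like the following sketch. It assumes the data sits in a matrix (here a made-up 4 x 5 matrix `m`); because `%in%` returns a plain vector without dimensions, the result is reshaped before calling `which()` with `arr.ind = TRUE`.

```r
# made-up example: the repeating 1..10 sequence stored column-wise in a 4 x 5 matrix
m <- matrix(rep(1:10, times = 2), nrow = 4)
targets <- c(1, 4, 7, 10)

# %in% drops the dimensions, so restore them before asking for array indices
hits <- array(m %in% targets, dim = dim(m))

# linear positions (column order) and the matching row/column pairs
linear_idx <- which(hits)
rc_idx <- which(hits, arr.ind = TRUE)

# either form of index can be used to pull out the sample
sampled <- m[rc_idx]
```

`linear_idx` corresponds to what MATLAB's `find` returns for a matrix, since both count positions down the columns.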

Data Subsetting: Lists within list in R

I have a list containing 100 lists within it, each of which has 552 numerical values. How do I sequentially extract the 1st value (and so on up to 552) from each of the 100 lists?
Example: 5 lists within a list containing the numbers 1-10
list(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), c(1, 2, 3, 4, 5, 6, 7,
8, 9, 10), c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), c(1, 2, 3, 4, 5,
6, 7, 8, 9, 10), c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
I want to extract each term sequentially i.e. 1,1,1,1,1 and then
2,2,2,2,2 and so on
This statement produces a list of vectors, taking the first element of each of your original vectors, the second element, etc., giving NA for the value of a short vector:
num <- max(lengths(x))  ## Length of the longest vector in x
lapply(seq_len(num), function(i) unlist(lapply(x, `[`, i)))
And here's a matrix approach:
matrix(unlist(x), ncol=length(x))
The rows of that matrix are your elements. This relies on each vector being the same length.
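Both approaches can be tried on the five-vector example from the question. This is a sketch that assumes the list is stored in `x`; the names `by_position` and `m` are illustrative.

```r
# the example from the question: 5 vectors, each holding 1..10
x <- rep(list(as.numeric(1:10)), 5)

# list approach: element i collects the i-th value of every vector
by_position <- lapply(seq_len(max(lengths(x))),
                      function(i) sapply(x, `[`, i))

# matrix approach: one column per vector, so row i holds the i-th values
m <- matrix(unlist(x), ncol = length(x))

by_position[[1]]   # the first values of all five vectors
m[2, ]             # the second values of all five vectors
```

For the 100 vectors of 552 values described in the question, `m` would be a 552 x 100 matrix, and `m[i, ]` gives the i-th extraction directly.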
