Data Subsetting: Lists within list in R

I have a list containing 100 numeric vectors, each of which has 552 values. How do I sequentially extract the 1st value (and so on up to the 552nd) from each of the 100 vectors?
Example: a list of 5 vectors, each containing the numbers 1-10:
x <- list(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
          c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
          c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
          c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
          c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
I want to extract the terms sequentially, i.e. 1, 1, 1, 1, 1, then 2, 2, 2, 2, 2, and so on.

This produces a list of vectors: the first holds the first element of each of your original vectors, the second holds the second elements, and so on. A vector that is too short contributes NA:
num <- max(unlist(lapply(x, length))) ## Length of the longest vector in x
lapply(seq(num), function(i) unlist(lapply(x, `[`, i)))
And here's a matrix approach:
matrix(unlist(x), ncol=length(x))
Row i of that matrix holds the i-th value of every vector. This relies on all the vectors having the same length.
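A minimal sketch of both approaches on the example data, assuming the list is stored in `x` as in the answer. `lengths()` (base R 3.2.0 and later) is used here as a shorter way to find the longest vector; `by_position`, `m`, `ragged` and `ragged_by_pos` are illustrative names, not part of the answer.

```r
# Example data from the question: a list of 5 vectors, each holding 1..10
x <- replicate(5, as.numeric(1:10), simplify = FALSE)

# List approach: element i of by_position holds the i-th value of every vector
num <- max(lengths(x))  # length of the longest vector in x
by_position <- lapply(seq_len(num), function(i) unlist(lapply(x, `[`, i)))
by_position[[1]]  # 1 1 1 1 1
by_position[[2]]  # 2 2 2 2 2

# Matrix approach (equal-length vectors only): row i holds the i-th values
m <- matrix(unlist(x), ncol = length(x))
m[2, ]  # 2 2 2 2 2

# With unequal lengths, the list approach pads the short vector with NA
ragged <- list(c(1, 2, 3), c(1, 2))
ragged_by_pos <- lapply(seq_len(max(lengths(ragged))),
                        function(i) unlist(lapply(ragged, `[`, i)))
ragged_by_pos[[3]]  # 3 NA
```

For the real data (100 vectors of 552 values), `by_position` would have 552 elements of length 100, and `m` would be a 552 x 100 matrix.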

Related

Calculate intraclass correlation by group in R

I need some programming/statistics help.
I have a database with multiple groups (variable "group"). The members of each group rated some items (in our example dataset, the variables "var1", "var2" and "var3").
I would like to get the intraclass variance for each group. In particular, I would like to calculate r*wg(j), ICC(1) and ICC(2).
I looked for a solution, but the icc function in R expects the raters (my team members) as columns, not as rows. I could do it by creating a subset for every group and then transposing each dataset, but I believe there is an easier solution.
Thanks to anyone who can help me with this.
group <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4)
var1 <- c(4, 5, 4, 2, 3, 4, 5, 3, 5, 8, 4, 3, 4, 4, 5)
var2 <- c(2, 3, 4, 2, 4, 4, 5, 6, 6, 9, 3, 3, 2, 5, 4)
var3 <- c(4, 5, 6, 2, 3, 6, 7, 6, 7, 8, 5, 6, 3, 3, 6)
df <- data.frame(group, var1, var2, var3)
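The subset-and-transpose step described in the question does not have to be done by hand for each group. A minimal base R sketch using `split()`, assuming the `df` built above; `items` and `per_group` are illustrative names, and which ICC or r*wg(j) function you then apply to each matrix is left open (`your_icc_function` below is hypothetical).

```r
# Example data from the question
group <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4)
var1 <- c(4, 5, 4, 2, 3, 4, 5, 3, 5, 8, 4, 3, 4, 4, 5)
var2 <- c(2, 3, 4, 2, 4, 4, 5, 6, 6, 9, 3, 3, 2, 5, 4)
var3 <- c(4, 5, 6, 2, 3, 6, 7, 6, 7, 8, 5, 6, 3, 3, 6)
df <- data.frame(group, var1, var2, var3)

# Split the rating columns by group, then transpose each piece so that
# items are rows and group members (raters) are columns
items <- c("var1", "var2", "var3")
per_group <- lapply(split(df[, items], df$group),
                    function(d) t(as.matrix(d)))

names(per_group)       # "1" "2" "3" "4"
dim(per_group[["2"]])  # 3 5: three items rated by five members

# Each matrix can now go to a function that expects raters as columns, e.g.
# lapply(per_group, your_icc_function)  # your_icc_function is hypothetical
```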

Apply operation on the next element of the data frame column vector

Let's say I have a df like this:
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
var2 = c(2, 8, 0, 7, 3, 4, 1, 10, 13))
I want to get a vector of values produced by the following operation:
(x-median(x-1))/median(x-1)
where x-1 means the column with the current element removed (in R, indexing with a negative index). For example, for the first element in column var2 the result is:
(2-(median(c( 8, 0, 7, 3, 4, 1, 10, 13))) )/(median(c( 8, 0, 7, 3, 4, 1, 10, 13)))
-0.63636
Thanks!
Using sapply, we can loop over the index of each value in var2, drop that value, take the median of the remaining values, and perform the calculation.
sapply(seq_along(df1$var2), function(i) {
med_i <- median(df1$var2[-i])
(df1$var2[i] - med_i)/med_i
})
#[1] -0.6364 1.2857 -1.0000 1.0000 -0.4545 -0.2000 -0.8182 1.8571 2.7143
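The same leave-one-out calculation can also be written with `vapply()`, which checks that every iteration returns exactly one number, and stored back in the data frame. A sketch; the column name `rel_dev` is an illustrative assumption.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                  var2 = c(2, 8, 0, 7, 3, 4, 1, 10, 13))

# For each row i: median of var2 with element i removed,
# then the relative deviation of element i from that median
df1$rel_dev <- vapply(seq_along(df1$var2), function(i) {
  med_i <- median(df1$var2[-i])  # median of the other eight values
  (df1$var2[i] - med_i) / med_i
}, numeric(1))

round(df1$rel_dev, 4)
```

If the median of the remaining values were 0, the division would give `Inf` or `NaN`, so real data may need a guard for that case.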

Selecting columns using ends_with helper and a vector of string names

I have a data frame in wide format, with each column representing one questionnaire item for one particular version of a questionnaire at a particular time point (repeated-measures design).
My data would look something like the following:
df <- data.frame(id = c(1:5), t1_QOL_child_Q1 = c(5, 3, 6, 2, 7), t1_QOL_child_Q2 = c(5, 2, 3, 7, 1), t1_QOL_child_Q3 = c(7, 7, 6, 2, 5), t1_QOL_child_joy = c(9,9, 5, 3, 6), t1_QOL_teen_Q1 = c(5, 3, 6, 2, 7), t1_QOL_teen_Q2 = c(5, 2, 3, 7, 1), t1_QOL_teen_Q3 = c(7, 7, 6, 2, 5), t1_QOL_teen_joy = c(5, 7, 4, 7, 9), t1_QOL_adult_Q1 = c(5, 3, 6, 2, 7), t1_QOL_adult_Q2 = c(5, 2, 3, 7, 1), t1_QOL_adult_Q3 = c(7, 7, 6, 2, 5), t1_QOL_adult_joy = c(6, 5, 3, 3, 2), t2_QOL_child_Q1 = c(5, 3, 6, 2, 7), t2_QOL_child_Q2 = c(5, 2, 3, 7, 1), t2_QOL_child_Q3 = c(7, 7, 6, 2, 5), t2_QOL_child_joy = c(9,9, 5, 3, 6), t2_QOL_teen_Q1 = c(5, 3, 6, 2, 7), t2_QOL_teen_Q2 = c(5, 2, 3, 7, 1), t2_QOL_teen_Q3 = c(7, 7, 6, 2, 5), t2_QOL_teen_joy = c(5, 7, 4, 7, 9), t2_QOL_adult_Q1 = c(5, 3, 6, 2, 7), t2_QOL_adult_Q2 = c(5, 2, 3, 7, 1), t2_QOL_adult_Q3 = c(7, 7, 6, 2, 5), t2_QOL_adult_joy = c(6, 5, 3, 3, 2))
For example, column t1_QOL_child_Q1 would mean Question 1 (Q1) of the child version (child) of Quality of Life (QOL) questionnaire, with time point 1 (t1) data.
I want to select only the subscales/columns whose suffixes are labelled differently. In the sample data above, these are the columns ending with "joy".
I have over 3000 columns and many more suffixes, and it would be a pain to use the following:
select(df, ends_with("joy"), ends_with(<another suffix>), ends_with(<another suffix>))
I have thought of putting all the potential suffixes in a string vector and using that vector as the input to ends_with, but ends_with seems to take only a single string, not a vector of strings.
I have searched on Stack Overflow and found a solution that works for a small vector of strings:
select(df, sapply(vector_of_strings, starts_with))
However, with my full vector of suffixes I get the following error message: Error: sapply(vector_of_strings, ends_with) must resolve to integer column positions, not a list
Help appreciated. Thanks!
We can use a single matches() call with the patterns separated by |, anchored to the end ($) of the column name:
library(dplyr)
df %>%
  select(matches("(joy|Q2)$"))
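With thousands of columns, the pattern can be built from the vector of suffixes instead of being typed out. A minimal base R sketch, assuming a character vector `suffixes` (a hypothetical name); the same `pattern` string can be passed to `matches()`. Two caveats: suffixes containing regex metacharacters such as `.` would need escaping first, and `matches()` ignores case by default while `grepl()` does not.

```r
# A small version of the wide data from the question
df <- data.frame(id = 1:2,
                 t1_QOL_child_Q1 = c(5, 3), t1_QOL_child_joy = c(9, 9),
                 t1_QOL_teen_Q2 = c(5, 2), t2_QOL_adult_joy = c(6, 5))

suffixes <- c("joy", "Q2")  # hypothetical vector of suffixes to keep

# Builds "(joy|Q2)$": any of the suffixes, anchored at the end of the name
pattern <- paste0("(", paste(suffixes, collapse = "|"), ")$")

selected <- df[, grepl(pattern, names(df)), drop = FALSE]
names(selected)  # "t1_QOL_child_joy" "t1_QOL_teen_Q2" "t2_QOL_adult_joy"

# With dplyr loaded, the equivalent is: select(df, matches(pattern))
```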

Delete all repeated values [duplicate]

This question already has answers here: How can I remove all duplicates so that NONE are left in a data frame?
If I have a vector:
x <- c(5, 6, 2, 9, 5, 2, 1, 9, 9)
How can I make another vector that contains only the elements that are never repeated? In this case it would be c(6, 1), because 5, 2 and 9 are repeated.
test <- c(5, 6, 2, 9, 5, 2, 1, 9, 9)
setdiff(test, test[duplicated(test)])
vector.a <- c(5, 6, 2, 9, 5, 2, 1, 9, 9)
not.reap <- logical(length(vector.a))  # TRUE where the value occurs nowhere else
for (i in seq_along(vector.a)) {
  not.reap[i] <- !(vector.a[i] %in% vector.a[-i])
}
vector.a[not.reap]
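Another common base R idiom calls `duplicated()` from both ends, so that every copy of a repeated value is flagged, not just the second and later ones. A sketch on the example vector:

```r
x <- c(5, 6, 2, 9, 5, 2, 1, 9, 9)

# duplicated(x) flags 2nd and later occurrences scanning forward;
# fromLast = TRUE flags them scanning backward, so together they flag every copy
repeated <- duplicated(x) | duplicated(x, fromLast = TRUE)
x[!repeated]  # 6 1
```

Unlike the loop, this avoids comparing each element against the whole vector, and `duplicated()` also works on data frame rows.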

Subsample a matrix by selecting locations with specific values within a matrix in R

I have to use R instead of MATLAB, and I'm new to it.
I have a large array of data repeating like 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10...
I need to find the locations where the values 1, 4, 7 or 10 occur, so I can create a sample using those locations.
In this case it will be position(=corresponding value) 1(=1) 4(=4) 7(=7) 10(=10) 11(=1) 14(=4) 17(=7) 20(=10) and so on.
In MATLAB it would be y = find(ismember(x, [1, 4, 7, 10])).
Please help! Thanks, Pavel
Something like this?
foo <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
bar <- c(1, 4, 7, 10)
which(foo %in% bar)
#> [1] 1 4 7 10 11 14 17 20
@nicola, feel free to copy my answer and take the credit for it; I'm simply trying to close answered questions.
The %in% operator is what you want. For example,
x <- rep(1:10, 2)  # example data in x
targets <- c(1, 4, 7, 10)
locations <- x %in% targets
# locations is a logical vector you can then use:
y <- x[locations]
You'll need an extra step or two if you want the row and column indices of the locations, but it's not clear whether you do. (Note that the logicals will be in column-major order.)
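For row and column indices, one extra step is enough: `%in%` returns a plain logical vector (the matrix dimensions are dropped), so take the linear positions with `which()` and convert them with base R's `arrayInd()`. A sketch, assuming the data sit in a matrix; `m` and its 4 x 5 shape are illustrative only.

```r
# Illustrative data: the repeating 1..10 sequence stored in a 4 x 5 matrix
m <- matrix(rep(1:10, 2), nrow = 4)
targets <- c(1, 4, 7, 10)

# Linear (column-major) positions of the target values
idx <- which(m %in% targets)
idx  # 1 4 7 10 11 14 17 20

# Convert linear positions to (row, column) pairs, one row per match
rc <- arrayInd(idx, dim(m))
m[rc]  # the matching values themselves: 1 4 7 10 1 4 7 10
```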
