Hi, I'm new to R and for a school project I'm trying to create a list of lists that I can access by index and append to. Something like:
aList[1] = A, B, C
aList[1] returns [1] A, B, C
aList[1] += D
aList[1] returns [1] A, B, C, D
aList[2] = 1, 2, 3
aList[2] returns [2] 1, 2, 3
aList returns [1] A, B, C, D
[2] 1, 2, 3
However, I'm not sure if I'm using the right datatype (and definitely not the proper syntax) as everything I've tried just either makes a single index of a list or makes multiple indexes of one item.
This isn't the homework. This shouldn't even be an issue but I can't find a solution.
Lists in R are separate from vectors: each item in a vector can only be a basic type like a number or a string, while a list can contain vectors or other lists. It sounds like you want to create a list of vectors. This could be done as:
> aList = list(c("A", "B", "C"), c(1, 2, 3))
> aList[[1]]
[1] "A" "B" "C"
> aList[[1]] = c(aList[[1]], "D")
> aList[[1]]
[1] "A" "B" "C" "D"
> aList[[2]]
[1] 1 2 3
> aList
[[1]]
[1] "A" "B" "C" "D"
[[2]]
[1] 1 2 3
Note that you normally access a single element of a list using double brackets, like [[1]]. If you access a list using single brackets, you'll get a sub-list (still a list) rather than the element itself:
> aList[1]
[[1]]
[1] "A" "B" "C" "D"
This isn't what you want if you want to modify that item.
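To make the difference concrete, here is a small sketch comparing the two kinds of brackets and showing one way to add a whole new element to the list. The names sub and item are only for illustration.

```r
aList <- list(c("A", "B", "C"), c(1, 2, 3))

sub  <- aList[1]    # single brackets: still a list, of length 1
item <- aList[[1]]  # double brackets: the character vector itself

is.list(sub)        # TRUE
is.character(item)  # TRUE

# Append to an existing element, then add a third element to the list
aList[[1]] <- c(aList[[1]], "D")
aList[[length(aList) + 1]] <- c(TRUE, FALSE)
length(aList)       # 3
```

Because sub is itself a list, c(sub, "D") would add a second list element instead of extending the vector, which is why the double brackets matter when appending.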
I am trying to match between all available combinations of vectors.
For example, I have 4 vectors:
a<-c(1,2,3,4)
b<-c(1,2,6,7)
c<-c(1,2,8,9)
d<-c(3,6,8,2)
The intended output should be able to tell me:
similarity between a & b: 1, 2
similarity between a & c: 1, 2
similarity between a & d: 2, 3
similarity between b & c: 1, 2
similarity between b & d: 2, 6
similarity between c & d: 2, 8
similarity between a & b & c: 1, 2
similarity between b & c & d: 2
similarity between a & c & d: 2
similarity between a & b & d: 2
similarity between a & b & c & d: 2
Does R have a function that does such comparison/ matching?
For simplicity, the number of vectors is set to 4 for now. In fact I am dealing with hundreds of vectors and would like to match/intersect/compare all possible combinations of them. For example, with 4 vectors there are 4C2 + 4C3 + 4C4 = 11 possible combinations. With 100 vectors there are 100C2 + 100C3 + ... + 100C99 + 100C100 possible combinations.
Thanks in advance.
intersect seems to do what you want. It only handles a pair of vectors at a time, though, e.g.:
intersect(a, b) # 1 2
intersect(b, intersect(c, d)) # 2
If you want a shorthand to intersect more than 2 vectors, try Reduce (see ?Reduce):
# intersection of a/b/c/d
Reduce(intersect, list(a, b, c, d), a)
# intersection of b/c/d
Reduce(intersect, list(b, c, d), b)
Reduce successively applies intersect to the result of the previous intersect call and the next element of the list. In the second example it starts with intersect(b, b): I set the init argument to one of the vectors being intersected, which is harmless because the intersection of a set with itself is the set.
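As a small sketch of what Reduce is doing here: the init argument can also be left out, in which case Reduce starts from the first element of the list, and accumulate = TRUE keeps each intermediate result. The vectors are the ones from the question; all_four and steps are names I made up for the example.

```r
a <- c(1, 2, 3, 4)
b <- c(1, 2, 6, 7)
c <- c(1, 2, 8, 9)
d <- c(3, 6, 8, 2)

# Without init, this computes intersect(intersect(intersect(a, b), c), d)
all_four <- Reduce(intersect, list(a, b, c, d))
all_four
# [1] 2

# accumulate = TRUE returns a list with every intermediate intersection:
# a, then a & b, then a & b & c, then a & b & c & d
steps <- Reduce(intersect, list(a, b, c, d), accumulate = TRUE)
length(steps)
# [1] 4
```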
If you want a way to go through all pairs, triples and quadruples of (a, b, c, d) and return each intersection, you could try the following:
1. Generate all combinations of (a, b, c, d) of length 2 (pairs), 3 (triples) and 4 (quadruples).
combos = lapply(2:4, combn, x=c('a', 'b', 'c', 'd'), simplify=FALSE)
# [[1]]
# [[1]][[1]]
# [1] "a" "b"
# [[1]][[2]]
# [1] "a" "c"
# ...
# [[2]]
# [[2]][[1]]
# [1] "a" "b" "c"
# [[2]][[2]]
# [1] "a" "b" "d"
# ...
# [[3]]
# [[3]][[1]]
# [1] "a" "b" "c" "d"
2. Flatten it out to just a list of character vectors.
combos = unlist(combos, recursive=FALSE)
# [[1]]
# [1] "a" "b"
# ...
# [[10]]
# [1] "b" "c" "d"
# [[11]]
# [1] "a" "b" "c" "d"
3. For each set, call Reduce as specified above. We can use (e.g.) get("a") to get the variable a, or mget(c("a", "b", "c")) to get the variables a, b, c in a list. If your variables are columns in a dataframe, then you can modify this appropriately.
intersects = lapply(combos, function (varnames) {
  # look up the vectors by name, then intersect them all
  Reduce(intersect, mget(varnames, inherits=TRUE), get(varnames[1]))
})
# add some labels for clarity.
# You will probably actually want to /do/ something with the
# resulting intersections rather than this.
names(intersects) <- sapply(combos, paste, collapse=", ")
intersects
# $`a, b`
# [1] 1 2
# $`a, c`
# [1] 1 2
# ...
# $`a, b, c, d`
# [1] 2
You will need to adapt this to how your data is stored in R, e.g. columns of a dataframe vs named vectors in the workspace and so on.
You might also just prefer a for loop from step 3 onwards rather than all the *apply calls, depending on what you want to do with the result. (Plus, if you have many vectors, holding all the intersections simultaneously in memory might not be a good idea anyway.)
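A minimal sketch of that loop version, assuming the vectors are kept in a named list (vecs is a name I made up) rather than as separate variables. Each intersection is printed as soon as it is computed instead of being stored; n_done is only there to count the combinations.

```r
vecs <- list(a = c(1, 2, 3, 4), b = c(1, 2, 6, 7),
             c = c(1, 2, 8, 9), d = c(3, 6, 8, 2))

n_done <- 0
for (k in 2:length(vecs)) {
  # each column of the combn() matrix is one combination of k names
  name_sets <- combn(names(vecs), k)
  for (j in seq_len(ncol(name_sets))) {
    picked <- name_sets[, j]
    common <- Reduce(intersect, vecs[picked])
    cat("similarity between", paste(picked, collapse = " & "), ":",
        paste(common, collapse = ", "), "\n")
    n_done <- n_done + 1
  }
}
n_done
# [1] 11
```

Keep in mind that the number of combinations of size 2 or more is 2^n - n - 1, so with 100 vectors going through every combination is not feasible whichever looping style you use; you would have to restrict k to small sizes.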
I have a vector like:
A B C A B A B D D E
and I'd like to break it into as many vectors as the number of "A" I have, like:
A B C
A B
A B D D E
Is there a way to accomplish this task?
You can use split and cumsum:
split(x, cumsum(x == "A"))
What you get in return is a list of vectors. A list seems most useful to me here since it allows vectors of different sizes in each element (unlike a data.frame for instance).
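To see why this works, here is the same call broken into steps on the vector from the question. The intermediate names starts, groups and pieces are just for illustration.

```r
x <- c("A", "B", "C", "A", "B", "A", "B", "D", "D", "E")

# TRUE wherever a new group starts
starts <- x == "A"

# Running count of the "A"s seen so far = group number of each element
groups <- cumsum(starts)
groups
# [1] 1 1 1 2 2 3 3 3 3 3

# split() collects the elements that share a group number
pieces <- split(x, groups)
pieces[["3"]]
# [1] "A" "B" "D" "D" "E"
```

If the vector does not start with "A", the elements before the first "A" end up together in a group named "0".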
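Not as elegant as the split approach, but we can also use strsplit (this assumes every element is a single character and that the vector, here called vec, starts with "A"):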
strsplit(paste0("A", strsplit(paste0(vec, collapse = ""), "A")[[1]][-1]),"")
# [[1]]
# [1] "A" "B" "C"
# [[2]]
# [1] "A" "B"
# [[3]]
# [1] "A" "B" "D" "D" "E"
I have run into this problem: I have two lists with the same number of elements, and they are linked.
For example, one list stores the ID numbers of the students, while the other stores the exam marks of these students.
I want to sort the exam marks from small to large, but I do not want to lose the one-to-one link between student IDs and their marks. How can I do so in R?
Let's assume a list like this:
a <- list(id = c("c", "a", "b"), score = c(2, 9, 12))
> a
$id
[1] "c" "a" "b"
$score
[1] 2 9 12
Then you can sort every element by the same ordering using lapply and order:
> lapply(a, function(x)x[order(a$id)])
$id
[1] "a" "b" "c"
$score
[1] 9 12 2
> lapply(a, function(x)x[order(a$score, decreasing=TRUE)])
$id
[1] "b" "a" "c"
$score
[1] 12 9 2
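Since the question asked for marks from small to large, here is a sketch of the ascending case with two separate vectors, plus the same thing with a data frame. The IDs and marks are made-up example data.

```r
ids   <- c("s01", "s02", "s03", "s04")
marks <- c(71, 55, 90, 62)

# order() returns the positions that would sort marks ascending;
# indexing both vectors with it keeps each ID next to its mark
ord <- order(marks)
ids_sorted   <- ids[ord]
marks_sorted <- marks[ord]
ids_sorted
# [1] "s02" "s04" "s01" "s03"
marks_sorted
# [1] 55 62 71 90

# With a data frame the link is kept automatically, row by row
df <- data.frame(id = ids, mark = marks, stringsAsFactors = FALSE)
df_sorted <- df[order(df$mark), ]
```

If the two pieces always belong together, the data frame is probably the more convenient structure, because you cannot accidentally sort one column without the other.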
I am trying to generate a random sequence from a fixed number of characters that contains at least one of each character.
For example having the ensemble
m = letters[1:3]
I would like to create a sequence of N = 10 elements that contains at least one of each of the characters in m, like
a
a
a
a
b
c
c
c
c
a
I tried with sample(m, N, replace=TRUE) but in this way also a sequence like
a
a
a
a
a
c
c
c
c
a
can be generated that does not contain b.
f <- function(x, n){
  # x: the required items; n: total length of the result
  sample(c(x, sample(x, n-length(x), replace=TRUE)))
}
f(letters[1:3], 5)
# [1] "a" "c" "a" "b" "a"
f(letters[1:3], 5)
# [1] "a" "a" "b" "b" "c"
f(letters[1:3], 5)
# [1] "a" "a" "b" "c" "a"
f(letters[1:3], 5)
# [1] "b" "c" "b" "c" "a"
Josh O'Brien's answer is a good way to do it but doesn't provide much input checking. Since I had already written it, I might as well present my answer. It's pretty much the same thing, but it only considers unique items and checks that n is large enough to guarantee at least one of each.
at_least_one_samp <- function(n, input){
  # Only consider unique items.
  items <- unique(input)
  unique_items_count <- length(items)
  if(unique_items_count > n){
    stop("n is too small to include at least one of each unique item")
  }
  # Get values for vector - force each item in at least once
  # then randomly select values to get the remaining.
  vals <- c(items, sample(items, n - unique_items_count, replace = TRUE))
  # Now shuffle them
  sample(vals)
}
m <- c("a", "b", "c")
at_least_one_samp(10, m)
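As a quick sanity check, the sketch below draws many samples and confirms that each one has the requested length and contains every item. The function is repeated in condensed form so the snippet runs on its own; the seed and the number of repetitions are arbitrary choices of mine.

```r
# Condensed copy of the function above
at_least_one_samp <- function(n, input) {
  items <- unique(input)
  if (length(items) > n) {
    stop("n is too small to include at least one of each unique item")
  }
  sample(c(items, sample(items, n - length(items), replace = TRUE)))
}

set.seed(42)  # arbitrary, only so the check is reproducible
m <- c("a", "b", "c")
draws <- replicate(1000, at_least_one_samp(10, m), simplify = FALSE)

# Every draw should have length 10 and contain each of a, b, c
all_right_length <- all(sapply(draws, length) == 10)
all_complete     <- all(sapply(draws, function(s) all(m %in% s)))

# Asking for fewer elements than there are unique items is an error
too_small <- tryCatch(at_least_one_samp(2, m), error = function(e) e)
```

Both all_right_length and all_complete come out TRUE by construction, and too_small holds the error object rather than a sample.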
The data format is as follows; the first column is the id:
1, b, c
2, a, d, e, f
3, u, i, c
4, k, m
5, o
However, I have not been able to analyze this data. Do you have a good idea of how to read the data into R? Further, my question is: how can I use R to analyze data in which different rows have different numbers of elements?
It seems you are trying to read a file with rows of unequal length. The structure in R for that is a list.
It is possible to do this by calling read.table with sep="\n" and then applying strsplit to each row of data.
Here is an example:
dat <- "
1 A B
2 C D E
3 F G H I J
4 K L
5 M"
The code to read and convert to a list:
x <- read.table(textConnection(dat), sep="\n")
apply(x, 1, function(i)strsplit(i, "\\s")[[1]])
The results:
[[1]]
[1] "1" "A" "B"
[[2]]
[1] "2" "C" "D" "E"
[[3]]
[1] "3" "F" "G" "H" "I" "J"
[[4]]
[1] "4" "K" "L"
[[5]]
[1] "5" "M"
You can now use any list manipulation technique to work with your data.
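For instance, here is a rough sketch for the comma-separated layout from the question, separating the id from the remaining values. The lines are typed in directly as raw to keep the example self-contained; with a real file you would get them from readLines instead. The names fields, ids and values are my own.

```r
raw <- c("1, b, c", "2, a, d, e, f", "3, u, i, c", "4, k, m", "5, o")

# Split each line on a comma followed by optional whitespace
fields <- strsplit(raw, ",\\s*")

# The first field is the id; the rest are that row's values
ids    <- sapply(fields, `[`, 1)
values <- lapply(fields, `[`, -1)
names(values) <- ids

values[["2"]]
# [1] "a" "d" "e" "f"

# Two simple summaries: values per id, and how often each value occurs
n_per_id <- lengths(values)
counts   <- table(unlist(values))
```

Naming the list by id means you can look up a row with values[["3"]] instead of having to remember its position.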
You can also use readLines and strsplit to solve this problem:
text <- readLines("./xx.txt", encoding = "UTF-8", n = -1L)
# strsplit takes `split`, not `sep`; drop unlist() to keep one vector per line
txt <- unlist(strsplit(text, split = " "))