Is there a way, in R, to save a whole vector into one value of a matrix or data frame, without having to combine it into a single value first?
For example, if I had a vector..
pk<-c(0.021477,0.021114,0.022794,0.014858,0.009690,0.003255,0.002715)
and a matrix..
tst<-matrix(data=NA,nrow=4,ncol=4)
is there any way of saying, for example..
tst[1,1]<-pk
?
I know I could paste the vector together, but I'm wondering whether there's a way of avoiding this? It's a matter of efficiency as the actual matrix is 33427 x 33427, with each vector ~ 300 values long, and I need to run further analysis on each value in the matrix. I'm hoping to find a way to speed up the analysis.
You can certainly put a vector in each element of a matrix. Try something like
tst<-matrix(data=list(),nrow=4,ncol=4)
tst[[1,1]] <- pk #note double square brackets needed for assignment
It doesn't print 'nicely'
tst
     [,1]      [,2] [,3] [,4]
[1,] numeric,7 NULL NULL NULL
[2,] NULL      NULL NULL NULL
[3,] NULL      NULL NULL NULL
[4,] NULL      NULL NULL NULL
but elements can be extracted in the obvious ways
> tst[1,1]
[[1]]
[1] 0.021477 0.021114 0.022794 0.014858 0.009690 0.003255 0.002715
#note list
> tst[[1,1]]
[1] 0.021477 0.021114 0.022794 0.014858 0.009690 0.003255 0.002715
#original vector
If the vectors are of varying length, I can see two ways of dealing with it. Either pad every vector to the maximum length (maxLen below, which you would compute yourself) and put things in an array
tst<- array(data=NA, dim=c(4,4,maxLen))
tst[1,1,1:length(pk)] <- pk
Alternatively, you can just create a flat list of the pks and generate a map that translates each one-dimensional list index to the 2D matrix element it would have corresponded to.
Which of these is optimal will depend on the downstream analysis you wish to do. If there's 'inter-pk' communication (e.g. you use element 1 of the pk at [1,1] and element 1 of pk at [1,2], [2,1]...) then the array solution might be better. But if all computations are within an individual vector, the list may be a better way to go
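The flat-list idea can be sketched as follows. This is only an illustration under stated assumptions: the 4 x 4 grid size and the helper name cell_index are hypothetical, and the mapping simply follows R's column-major storage order so that it agrees with base R's arrayInd().

```r
# Hypothetical small grid; the real one in the question is 33427 x 33427
n_row <- 4
n_col <- 4

pk <- c(0.021477, 0.021114, 0.022794, 0.014858, 0.009690, 0.003255, 0.002715)

# One list slot per matrix cell, all initially NULL
pk_list <- vector("list", n_row * n_col)

# Hypothetical helper: flat position of cell (i, j) in column-major order
cell_index <- function(i, j, n_row) (j - 1) * n_row + i

# Store the vector for cell (2, 3)
pk_list[[cell_index(2, 3, n_row)]] <- pk

# Translate a flat index back to (row, col)
pos <- arrayInd(cell_index(2, 3, n_row), .dim = c(n_row, n_col))
pos
```

Because the vectors live in an ordinary list, lapply() or sapply() can run the per-vector analysis directly, and the index map is only needed when a result has to be placed back into its matrix position.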
You can also use the "as is" I function and make pk a 1 element list (that holds the vector):
pk <- c(0.021477,0.021114,0.022794,0.014858,0.009690,0.003255,0.002715)
dat <- data.frame(a = I(list(pk)))
str(dat)
## 'data.frame': 1 obs. of 1 variable:
## $ a:List of 1
## ..$ : num 0.02148 0.02111 0.02279 0.01486 0.00969 ...
## ..- attr(*, "class")= chr "AsIs"
dat[1,1]
## [[1]]
## [1] 0.021477 0.021114 0.022794 0.014858 0.009690 0.003255 0.002715
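As a small extension of the same idea, here is a sketch with more than one row, where each row holds a vector of a different length. The id column and the shortened second vector are made up for illustration.

```r
pk <- c(0.021477, 0.021114, 0.022794, 0.014858, 0.009690, 0.003255, 0.002715)

# I() keeps the list as a single list-column instead of spreading it out
dat <- data.frame(id = 1:2, a = I(list(pk, pk[1:3])))

# Each cell of column a is a whole vector
second <- dat$a[[2]]

# Per-row computations go through sapply() over the list-column
lens <- sapply(dat$a, length)
lens
```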
I'm trying to handle a vector of named numerics for the first time in R. The vector itself is named p.values. It consists of p-values which are named after their corresponding variables. Through simulation I obtained a huge number of p-values that are always named after one of the five variables they correspond to. However, I'm interested in the p-values of only one variable and tried to extract them with p.values[["var_a"]], but that gives me only the p-value of var_a's last entry. p.values$var_a is invalid, and as.numeric(p.values) or unname(p.values) obviously gives me all values without names. Any idea how I can get R to give me the 1/5 of named numerics that are named var_a?
Short example:
p.values <- as.numeric(c(rep(1:5, each = 5)))
names(p.values) <- rep(letters[1:5], 5)
str(p.values)
Named num [1:25] 1 1 1 1 1 2 2 2 2 2 ...
- attr(*, "names")= chr [1:25] "a" "b" "c" "d" ...
I'd like to get R to show me all 5 numbers named "a".
Thanks for reading my first post here and I hope some more experienced R users know how to deal with named numerics and can help me with this issue.
You can subset p.values using [ with names(p.values) == "a" to show all values named a.
p.values[names(p.values) == "a"]
#a a a a a
#1 2 3 4 5
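If the values for every name are needed rather than just one, a possible alternative (a sketch, not part of the answer above) is to group the vector by its names once with split() and then pick groups from the resulting list. The variable name by_name is made up here.

```r
p.values <- as.numeric(rep(1:5, each = 5))
names(p.values) <- rep(letters[1:5], 5)

# Group the values by their names; unname() drops the repeated labels
by_name <- split(unname(p.values), names(p.values))

# All values that were named "a"
by_name$a
```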
What is the convention to assign an object to a multi-level list?
So far I thought the convention for indexing was to use [[]] instead of $.
Hence, when saving results in loops I usually used the following approach:
> result <- matrix(2,2,2)
> result_list <- list()
> result_list[["A"]][["B"]][["C"]] <- result
> print(result_list)
$A
$A$B
$A$B$C
[,1] [,2]
[1,] 2 2
[2,] 2 2
Which works as intended with this matrix.
But when assigning a single number the list seems to skip the last level.
> result <- 2
> result_list <- list()
> result_list[["A"]][["B"]][["C"]] <- result
> print(result_list)
$A
B
2
At the same time, if I use $ instead of [[]] the list is again as intended.
> result_list$A$B$C <- result
> print(result_list)
$A
$A$B
$A$B$C
[1] 2
As mentioned here you can also use list("A" = list("B" = list("C" = 2))).
Which of these methods should be used for indexing a multi-level list in R?
Although the title of the question refers to multi-level list indexing, and the syntax mylist[['a']][['b']][['c']] is the same that one would use to retrieve an element of a multi-level list, the differences that you're observing actually arise from using the same syntax for creation (or not) of multi-level lists.
To show this, we can first explicitly create the multi-level (nested) lists, and then check that the indexing works as expected both for matrices and for single numbers.
mymatrix=matrix(1:4,nrow=2)
list_b=list(c=mymatrix)
list_a=list(b=list_b)
mynestedlist1=list(a=list_a)
str( mynestedlist1 )
# List of 1
# $ a:List of 1
# ..$ b:List of 1
# .. ..$ c: int [1:2, 1:2] 1 2 3 4
mynumber=2
list_e=list(f=mynumber)
list_d=list(e=list_e)
mynestedlist2=list(d=list_d)
str( mynestedlist2 )
# List of 1
# $ d:List of 1
# ..$ e:List of 1
# .. ..$ f: num 2
( Note that I've created the lists in sequential steps for clarity; they could all have been rolled together in a single line, like: mynestedlist2=list(d=list(e=list(f=mynumber))) )
Anyway, now we'll check that indexing works OK:
str(mynestedlist1[['a']][['b']][['c']])
# int [1:2, 1:2] 1 2 3 4
str(mynestedlist1$a$b$c)
# int [1:2, 1:2] 1 2 3 4
str(mynestedlist2[['d']][['e']][['f']])
# num 2
str(mynestedlist2$d$e$f)
# num 2
# and, just to check that we don't 'skip the last level':
str(mynestedlist2[['d']][['e']])
# List of 1
# $ f: num 2
So the direct answer to the question 'which of these methods should be used for indexing a multi-level list in R' is: 'any of them - they're all ok'.
So what's going on with the examples in the question, then?
Here, the same syntax is being used to try to implicitly create lists, and since the structure of the nested list is not specified explicitly, this relies on whether R can infer the structure that you want.
In the first and third examples, there's no ambiguity, but each for a different reason:
First example:
mynestedlist1=list()
mynestedlist1[['a']][['b']][['c']]=mymatrix
We've specified that mynestedlist1 is a list. But its elements could be any kind of object, until we assign them. In this case, we put into the element named 'a' an object with an element 'b' that contains an object with an element 'c' that is a matrix. Since there's no R object that can contain a matrix in a single element except a list, the only way to achieve this assignment is by creating a nested list.
Third example:
mynestedlist3=list()
mynestedlist3$g$h$i=mynumber
In this case, we've used the $ notation, which only applies to lists (or to data types that are similar/equivalent to lists, like dataframes). So, again, the only way to follow the instructions of this assignment is by creating a nested list.
Finally, the pesky second example, but starting with a simpler variant of it:
mylist2=list()
mylist2[['c']][['d']]=mynumber
Here there's an ambiguity. We've specified that mylist2 is a list, and we've put into the element named 'c' an object with an element 'd' that contains a single number. This element could have been a list, but it can also be a simple vector, and in this case R chooses this as the simpler option:
str(mylist2)
# List of 1
# $ c: Named num 2
# ..- attr(*, "names")= chr "d"
Contrast this with the behaviour when trying to assign a matrix using exactly the same syntax: in this case, the only way to follow the syntax would be by creating another, nested, list inside the first one.
What about the full second example mylist2[['c']][['d']][['e']]=mynumber, where we try to assign a number named 'e' to the just-created but still-empty object 'd'?
This seems rather unclear, and this may be the reason for the different behaviours of different versions of R (as reported in the comments to the question). In the question, the action taken by R has been to assign the number while dropping its name, similarly to:
myvec=vector(); myvec2=vector()
myvec[['a']]=1
myvec2[['b']]=2
myvec[['a']]=myvec2
str(myvec)
# Named num 2
# - attr(*, "names")= chr "a"
However, the syntax alone doesn't seem to force this behaviour, so it would be sensible to avoid relying on this behaviour when trying to create nested lists, or lists of vectors.
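One way to avoid relying on implicit creation is to build each level as an empty list before assigning into it. The following is a minimal sketch using the names from the question; it is only one of several possible approaches.

```r
result <- 2
result_list <- list()

# Create every intermediate level explicitly as a list
result_list[["A"]] <- list()
result_list[["A"]][["B"]] <- list()

# The last level now has an unambiguous container to go into
result_list[["A"]][["B"]][["C"]] <- result

str(result_list)
```

Since each intermediate object already is a list when the final assignment happens, R has no choice between a list and a named vector, and both [[ ]] and $ retrieve the value afterwards.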
In R, when I use a command like this:
b <-c(7,10)
b
Does it create a row vector (1 row, 2 cols) or a column vector (1 col, 2 rows) by default?
I can't tell from the displayed output.
I am an R beginner (as is obvious :))
Neither. A vector does not have a dimension attribute by default, it only has a length.
If you look at the documentation on matrix arithmetic, help("%*%"), you see that:
Multiplies two matrices, if they are conformable. If one argument is a
vector, it will be promoted to either a row or column matrix to make
the two arguments conformable. If both are vectors of the same length,
it will return the inner product (as a matrix).
So R will interpret a vector in whichever way makes the matrix product sensible.
Some examples to illustrate:
> b <- c(7,10)
> b
[1] 7 10
> dim(b) <- c(1,2)
> b
[,1] [,2]
[1,] 7 10
> dim(b) <- c(2,1)
> b
[,1]
[1,] 7
[2,] 10
> class(b)
[1] "matrix"
> dim(b) <- NULL
> b
[1] 7 10
> class(b)
[1] "numeric"
A matrix is just a vector with a dimension attribute. So adding an explicit dimension makes it a matrix, and R will do that in whichever way makes sense in context. (In R 4.0.0 and later, class() of a matrix reports both "matrix" and "array".)
And an example of the behavior in the context of matrix multiplication:
> m <- matrix(1:2,1,2)
> m
[,1] [,2]
[1,] 1 2
> m %*% b
[,1]
[1,] 27
> m <- matrix(1:2,2,1)
> m %*% b
[,1] [,2]
[1,] 7 10
[2,] 14 20
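When the orientation matters, it can be made explicit instead of being left to promotion. The sketch below uses only base R; the names row_b, col_b, inner and outer_prod are made up for illustration.

```r
b <- c(7, 10)

# A bare vector has no dim attribute
is.null(dim(b))

# Explicit row and column versions
row_b <- matrix(b, nrow = 1)   # 1 x 2
col_b <- matrix(b, ncol = 1)   # 2 x 1

# t() treats a bare vector as a column, so the result is a 1 x 2 matrix
dim(t(b))

# Two vectors of the same length give the inner product as a 1 x 1 matrix
inner <- b %*% b

# Column times row gives the 2 x 2 outer product
outer_prod <- col_b %*% row_b
```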
You can treat a vector ( c() ) in R as a row or a column.
You can see this by
rbind(c(1,3,5),c(2,4,6))
cbind(c(1,2,3),c(4,5,6))
It is a collection. By default, though, when casting to a data frame
data.frame(c(1,2,3))
it is made a column, so the first index addresses which column of the table is being referenced, contrary to the usual row-first convention in linear algebra.
i.e., to access the "hello" in this conversion of a vector into a data.frame, an additional index is required
a = data.frame(c("hello","F***ery"))
a[[1]][[1]]
and this is where things get wacky, because data frames don't jibe with strings: the type of "hello" is reported as an integer with levels (a factor), at least in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE.
The c function creates an "atomic" vector, in the words of Norman Matloff in The Art of R Programming:
atomic vectors, since their components cannot be broken down into
smaller components.
It can be seen as a "concatenation" (in fact c stands for concatenate) of elements, indexed by their positions and so no dimensions (in a spatial sense), but just a continuous index that goes from 1 to the length of the object itself.
[revised version]
I have a large character vector in R of size 57241 that contains gene symbols e.g
gene <- c("AL627309.1","SMIM1","DFFB") # assume this of size 57241
I have another table in which one column table$genes has some combinations of genes in each row e.g
head(table$genes)
[1] ,OR4F5,AL627309.1,OR4F29,OR4F16,AL669831.1,
[2] ,TP73,CCDC27,SMIM1,LRRC47,CEP104,DFFB
..
this table has about 1400 rows. For each gene I wanted to find the index of row in table in which it is located.
To do that I used
ind <- sapply(gene, grep, table$genes, fixed=TRUE, USE.NAMES=FALSE)
The variable "ind" returned is a large list of size 57241 which looks like this
head(ind)
[[1]]
[1] 1
[[2]]
[1] 1
[[3]]
[1] 1
[[4]]
[1] 1
[[5]]
[1] 1
[[6]]
[1] 1
I know for a fact each gene exists only once in that table. So the number that I am interested in is the one listed in each entry above, i.e. 1. How can I convert this into an integer vector? When I unlist() this, I somehow get a vector of length ~500000, whereas I should be getting the same length as the list. I have tried many functions and combinations but nothing seems to work. Any ideas?
Thanks
I'm not able to reproduce that behavior with either a list or a dataframe:
> gene <- c("AL627309.1","SMIM1","DFFB")
>
> table <- list(genes =c(",OR4F5,AL627309.1,OR4F29,OR4F16,AL669831.1,",
",TP73,CCDC27,SMIM1,LRRC47,CEP104,DFFB"))
> (ind <- sapply(gene, grep, table$genes, fixed=TRUE,USE.NAMES=FALSE))
[1] 1 2 2
I thought for a bit that you should be using match, but after further consideration it seemed as though there must be something different about your data structure. Try posting dput(head(table$genes)) and dput(gene) to make your problem reproducible. You should also stop using the word "list" to refer to the items in table$genes. It confuses regular users of R, who think you are talking about an R "list". You can check which of the items in your ind "list" has a vector of length greater than one with:
which(sapply(ind, length) > 1)
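One possible cause, which the question's data does not confirm, is that grep with fixed=TRUE matches substrings, so a short symbol can also match inside longer symbols on other rows. Under that assumption, a sketch that matches whole comma-separated symbols only could look like this. The name genes_col stands in for table$genes, and the other helper names are made up.

```r
gene <- c("AL627309.1", "SMIM1", "DFFB")
genes_col <- c(",OR4F5,AL627309.1,OR4F29,OR4F16,AL669831.1,",
               ",TP73,CCDC27,SMIM1,LRRC47,CEP104,DFFB")

# Split every row into its individual symbols
tokens <- strsplit(genes_col, ",", fixed = TRUE)

# Remember which row each symbol came from
row_of_token <- rep(seq_along(tokens), lengths(tokens))
all_tokens <- unlist(tokens)

# match() compares whole strings and returns the first hit (NA if absent),
# so the result always has exactly one entry per gene
ind <- row_of_token[match(gene, all_tokens)]
ind
```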
I have a string matrix where the fields were derived from numbers in scientific notation. I want to convert the character matrix to a data frame and work on the numeric fields. During the matrix to data frame conversion R converts the strings to factors, maybe because of the 'e' character in the middle of the number. If the stringsAsFactors option is set to FALSE, the columns will be left as character, so still not numeric.
For example:
> m
[,1] [,2]
[1,] "1e-07" "4e-06"
[2,] "2e-05" "5e-05"
[3,] "0.03" "1e-07"
> data.frame(m)
X1 X2
1 1e-07 4e-06
2 2e-05 5e-05
3 0.03 1e-07
> class(data.frame(m))
[1] "data.frame"
> df = data.frame(m)
> df
X1 X2
1 1e-07 4e-06
2 2e-05 5e-05
3 0.03 1e-07
> class(df$X1)
[1] "factor"
> class(df$X2)
[1] "factor"
How can I force the data frame to interpret these strings as numbers? data.matrix() does actually convert string in scientific notation to numeric, but I want to know if there is a way to control the character matrix to data frame conversion directly, without going through the intermediate data.matrix() conversion step.
You should change it into a numeric matrix first, then make a data.frame of it.
# A string matrix
m <- matrix(as.character(runif(6)),3)
# as.data.frame doesn't turn it into numbers...
str(as.data.frame(m)) # factors in R < 4.0.0; character from R 4.0.0 on
str(as.data.frame(m, stringsAsFactors=FALSE)) # strings
d <- m
# Make it numeric first
mode(d) <- "numeric"
# Now turn it into a data.frame...
d <- as.data.frame(d)
str(d) # numeric
str(m) # still strings...
...but it would be better if you could avoid storing the matrix values as strings in the first place! Unless you loaded them from a file, there shouldn't be any reason to. If you happened to get them as strings from some other operation, you should look back at that operation and see how you can avoid losing the numeric mode.
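If the conversion has to start from the character matrix, another possible route (a sketch, not the method above) is to build the data frame with character columns and then convert each column in place. The matrix below reuses the values from the question; df is a made-up name.

```r
m <- matrix(c("1e-07", "2e-05", "0.03", "4e-06", "5e-05", "1e-07"), nrow = 3)

# Keep the columns as character; as.numeric() on a factor would return
# the level codes rather than the numbers
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Convert every column; df[] keeps the data.frame structure
df[] <- lapply(df, as.numeric)

str(df)
```

as.numeric() parses scientific notation such as "1e-07" directly, so the 'e' in the strings is not a problem once the columns are character rather than factor.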