R create list or matrix

If I repeat this code
x<-1:6
n<-40
M<-200
y<-replicate(M,as.numeric(table(sample(x,n,1))))
str(y)
sometimes R decides to create a matrix and sometimes it creates a list. Can you explain the reason for that? How can I be sure whether I get a matrix or a list?
If you choose M very small, for example 10, it will almost always create a matrix. If you choose M very large, for example 2000, it will create a list.

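You get a list whenever at least one replicate does not sample all the numbers in x: table() then returns fewer than length(x) counts, the result vectors have unequal lengths, and replicate() cannot simplify them to a matrix.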
You can always return a list by using simplify = FALSE.
y <- replicate(M, as.numeric(table(sample(x,n,TRUE))), simplify = FALSE)
Also, you are using 1 to set the replace argument. It is better to use a logical value, i.e. TRUE.
To always return a matrix, we can do:
sapply(y, `[`, x)
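This pads the shorter vectors with NA at the end, so every column has length(x) entries. Because as.numeric() drops the names of the table, the padded positions no longer tell you which value of x was missing.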
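If the goal is a matrix whose rows always correspond to the values of x, one option (not part of the answer above, just a sketch) is to convert the sample to a factor with fixed levels before tabulating. Values that were never drawn then get a count of 0 instead of being dropped. The seed below is arbitrary and only there for reproducibility.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
x <- 1:6
n <- 40
M <- 2000

# factor(..., levels = x) keeps a zero count for any value that was not drawn,
# so every replicate has length(x) entries and replicate() can simplify.
y <- replicate(M, as.numeric(table(factor(sample(x, n, replace = TRUE), levels = x))))

is.matrix(y)   # check the type explicitly instead of relying on str()
dim(y)         # one row per value of x, one column per replicate
```

is.matrix() and is.list() also answer the "how can I be sure" part of the question programmatically.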

Maybe this will help:
https://rafalab.github.io/dsbook/r-basics.html#data-types
All elements of a matrix must have the same type, and its columns all have the same length.
The elements of a list can have different classes and lengths.
Try this:
x<-1
y<-2:7
z<-matrix(x,y)
z<-list(x,y)
In the first case you get a matrix with 2 rows and 1 column: y is not stored as data, it is passed as the nrow argument of matrix(), and only its first element (2) is used.
In the second case you get a list with elements of different lengths.
Also, the str() function is very useful, and you can find the class of an object with the class() function.

Related

"Indexing" (in a mathematical sense) variables in R so that the correct variable is chosen in each iteration of a loop

I have 28 variables. Each one of them is a vector of numeric class. The names of these vectors are sub_1, sub_2, sub_3, and on and on all the way down to sub_28.
What I want to do with these vectors is to compute a system of 28 equations where, in each equation, only one of those vectors is involved.
The right hand-side of each equation is the calculation that I want to make on each vector and the left hand-side is where I want to store the output of each calculation.
So this is what I do. First, I declare a vector of length 28.
alpha = vector("numeric", 28)
Each component of this vector is going to store the corresponding outputs of the calculations.
For example, I want to set alpha[1] equal to
1 + length(sub_1)*(sum(log(sub_1/(1.5))))
And I want to set alpha[2] equal to
1 + length(sub_2)*(sum(log(sub_2/(1.5))))
And so on. You get the idea.
I thought about using a 'for' loop. This is what comes to my mind:
for (i in 1:28) {
alpha[i] = 1 + length(sub_i)*(sum(log(sub_i/(1.5))))}
I know exactly what is wrong with this code. R searches for a variable whose name is sub_i, and it won't find that variable because I haven't declared it. What I want is for R to read the _i as a subindex: in each iteration of the loop, it should look for the sub_i vector whose subindex i matches the number of the iteration. How can I achieve that?
Edit: by the way, the 28 vectors have varying lengths.
I would put the 28 numeric variable vectors in a list and apply a function on all list elements.
Simulated data:
set.seed(1) # for reproducibility
# pick some random vector lengths
veclengths <- sample(50:100, 28, replace = TRUE)
# generate random numeric values to generate the vectors
my.variables <- lapply(veclengths, function(x) rnorm(x,100,10))
# name the vectors as in your example (not required)
names(my.variables) <- paste0("sub_", seq_along(my.variables))
# extract these 28 separate vectors as individual variables for your use case
list2env(my.variables , envir = .GlobalEnv)
You could then load your vectors into a list, e.g.
vars <- ls(pattern = "^sub_[0-9]+$") # pick only variables named sub_<number>
# I sorted here numerically, for convenience
my.variables <- mget(vars[order(as.numeric(gsub("sub_", "", vars)))])
Then just apply the function you chose to all list elements separately
resfun <- function(x) {1 + length(x)*(sum(log(x/(1.5))))}
alpha <- unlist(lapply(my.variables, resfun))
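For completeness, the loop from the question can also be made to work as written by building each variable name as a string and looking it up with get(). This is only a sketch: the three short vectors below are hypothetical stand-ins for the 28 real ones, and the list-based approach above is usually easier to maintain.

```r
# Hypothetical stand-ins for sub_1 ... sub_28 (only three, with varying lengths)
sub_1 <- c(2, 3, 4)
sub_2 <- c(5, 6)
sub_3 <- c(1.5, 3, 6, 12)

k <- 3                 # would be 28 in the question
alpha <- numeric(k)
for (i in seq_len(k)) {
  v <- get(paste0("sub_", i))   # look up the variable named "sub_<i>"
  alpha[i] <- 1 + length(v) * sum(log(v / 1.5))
}
alpha
```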

Storing numeric vectors in the names of a list

Is it possible to store a numeric vector in the names variable of a list?
i.e.
x <- c(1.2,3.4,5.9)
alist <- list()
alist[[x]]$somevar <- 2
I know I can store it as a vector within the list element, but I thought it would be faster to move through and find the element of the list I want (or add if needed) if the numeric vector is the name of the list element itself...
EDIT:
I have included a snippet of the code in context below, apologies for the change in nomenclature. In brief, I am working on a clustering problem, the dataset is too large to directly do the distance calculation on, my solution was to create bins for each dimension of the data and find the nearest bin for each observation in the original data. Of course, I cannot make a complete permutation matrix since this would be larger than the original data itself. Therefore, I have opted to find the nearest bin for each dimension individually and add it to a vector, temp.bin, which ideally would become the name of the list element in which the rowname of the original observation would be stored. I was hoping that this would simplify searching for and adding bins to the list.
I also realise that the distance calculation part is likely wrong - this is still very much a prototype.
binlist <- list()
for(i in 1:nrow(data)) # iterate through all data points
{
  # for each datapoint make a container for the nearest bin found
  temp.bin <- vector(length = length(markers))
  names(temp.bin) <- markers
  for(j in markers) # and dimensions
  {
    # find the nearest bin for marker j
    if(dist == "eucl")
    {
      dists <- apply(X=bin.mat, MARGIN = 1, FUN= function(x,y) {sqrt(sum((x-y)^2))}, y=data[i,j])
      temp.bin[j] <- bin.mat[which(dists == min(dists)),j]
    }
  }
  ### I realise this part doesn't work
  binlist[[temp.bin]] <- append(binlist[[temp.bin]], values = i)
}
The closest answer so far is John Coleman's.
names(alist) is a character vector. A numeric vector is not a string, hence it isn't a valid name for a list element. What you want is thus impossible. You could create a string representation of such a list and use that as a name, but that would be cumbersome. If this is what you really wanted to do, you could do something like the following:
x <- c(1.2,3.4,5.9)
alist <- list()
alist[[paste(x,collapse = " ")]]$somevar <- 2
This will create a 1-element list whose only element has the name "1.2 3.4 5.9".
While there might be some use cases for this, I suspect that you have an XY problem. What are you trying to achieve?
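Applied to the binning loop in the question, the string-key idea might look like the sketch below. The bins matrix is hypothetical illustration data standing in for the per-observation temp.bin vectors; the only technique used is the paste(..., collapse = " ") key from above.

```r
# Hypothetical nearest-bin vectors, one row per observation.
# Observations 1 and 3 fall in the same bin, as do 2 and 5.
bins <- rbind(c(1.2, 3.4, 5.9),
              c(0.5, 3.4, 5.9),
              c(1.2, 3.4, 5.9),
              c(7.0, 1.1, 2.2),
              c(0.5, 3.4, 5.9))

binlist <- list()
for (i in seq_len(nrow(bins))) {
  key <- paste(bins[i, ], collapse = " ")    # string form of the bin vector
  # binlist[[key]] is NULL for a key not seen yet, so c() starts a new entry
  binlist[[key]] <- c(binlist[[key]], i)
}

names(binlist)
binlist[["1.2 3.4 5.9"]]

# The numeric vector can be recovered from the name if needed
as.numeric(strsplit(names(binlist)[1], " ")[[1]])
```

One caveat with this design: the key is built from the printed form of the numbers, so it is only reliable when the bin values are exact copies of the same underlying values, as they would be when taken from a fixed bin matrix.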
Solution
With some slight modifications we can achieve the following:
x = c(1.2,3.4,5.9)
alist = vector("list", length(x))
names(alist) = x
alist[as.character(x)] = list(c(somevar = 2))
#$`1.2`
#somevar
# 2
#
#$`3.4`
#somevar
# 2
#
#$`5.9`
#somevar
# 2
Explanation
Basically:
I had to create the list with the correct length (vector("list", length(x)))
Then assign the correct names (names(alist) = x), which R coerces to character
So we can select list elements by name using [ and assign a new value to each of them (alist[as.character(x)] = list(c(somevar = 2)))
2nd Solution
Going by John Coleman's comment:
It isn't clear that you answered the question. You gave a list whose
vector of names is the vector x, coerced to character. OP asked if
it was possible "if the numeric vector is the name of the list element
itself... ". They wanted to treat x as a single name, not a vector of
names.
If you wanted the list element to be named after the variable x itself, you could try the deparse(substitute(.)) trick, which at top level evaluates to the string "x":
x = c(1.2,3.4,5.9)
alist = list()
alist[[deparse(substitute(x))]]$somevar = 2
#> alist[[deparse(substitute(x))]]
#$somevar
#[1] 2
If you really wanted the numeric values in x as the name itself, then I point you to John's solution.

R: Iterating Parameter Arguments from List for Random Generation For Loop

I'm new to the forum and to r, so please forgive the sloppy code.
In short, I am trying to get a normal distribution to iteratively use the parameters drawn from two lists for use in a For Loop that generates a 30x10000 matrix of random samples using these parameters.
The first list (List1) is a collection of numeric vectors. The second list (List2) has corresponding values I would like to use for the standard deviation argument in rnorm: i.e. vector 1 from List1's standard deviation is Value1 in List2.
set.seed(1500) # set up the random generator
# pseudocode: mean should be the mean of vector i from List1, sd should be value i from List2
var1 = rnorm(1000, mean = mean(List1[[i]]), sd = List2[[i]])
sample(var1, size = 1)
X = matrix(ncol = 30, nrow = 10000)
for (j in 1:ncol(X)) { # simulates data using the parameters set in var1
  for (i in 1:10000) {
    X[i, j] = sample(var1, 1)
  }
}
Here's the original post where this code is inspired from.
Cheers!
It seems mapply() would help you:
# First let's turn the list1 into means.
dist.means = lapply(list1,mean)
lapply() is a way to execute a function for every element in a list. mapply() works in a very similar way but walks over multiple lists in parallel.
samples = mapply(rnorm, 30*10000, dist.means, list2, SIMPLIFY = FALSE)
A little bit more explanation: mapply() runs rnorm() multiple times. On the first call it uses the first element of the first list as the first argument, the first element of the second list as the second argument, etc. So in our case it will run rnorm(30*10000, dist.means[[1]], list2[[1]]), then rnorm(30*10000, dist.means[[2]], list2[[2]]), and so on, and store the outputs in a list.
Note that I use a small trick here. The first argument is the single number 30*10000. When you give arguments of different lengths to mapply(), it recycles the shorter ones, i.e. it repeats them until they have the same length as the longest one.
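A small self-contained run of the same idea, as a sketch only: list1 and list2 below are hypothetical stand-ins for the asker's data, and 10000 draws per parameter pair are bound into one column each, which is one possible reading of the matrix layout the question asks for.

```r
set.seed(1500)

# Hypothetical inputs: three numeric vectors and one standard deviation per vector
list1 <- list(c(1, 2, 3), c(10, 20, 30), c(100, 200, 300))
list2 <- list(0.5, 2, 10)

dist.means <- lapply(list1, mean)

# One rnorm() call per (mean, sd) pair; the single number 10000 is recycled
samples <- mapply(rnorm, 10000, dist.means, list2, SIMPLIFY = FALSE)

# Bind the draws column-wise: one column per parameter pair
X <- do.call(cbind, samples)
dim(X)
colMeans(X)   # should be close to the means of the vectors in list1
```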
Hope that helps

R sum element X in list of vector

I just started writing R scripts and I can't figure out this problem.
I have a list of vectors, let's say
myListOfVector <- list(
c(1,2),
c(1,2),
c(1,2),
c(1,2)
)
What I want is the element-wise sum across the vectors in my list, based on the position of each element,
so that if I have 3 vectors that each contain (a, b, c), I get the sum of all the a's, all the b's and all the c's in a list or vector.
I know that all the vectors have the same length.
What I'm looking for is something like this:
result <- sum(myListOfVector)
# result must be c(4, 8)
Does anybody have an idea?
The only way I've been able to do it is with a loop, but it takes so much time that I can't settle for it.
I tried some apply and lapply calls, but they don't seem to work the way I want because all I get is one vector at a time.
Clarifications:
The list of vectors is returned by a function that I can't modify
I need an efficient solution if possible
A list of vectors of the same length can be summed with Reduce:
Reduce(`+`, myListOfVector)
Putting it in a matrix and using colSums or rowSums, as mentioned by @AnandaMahto and @JanLuba, might be faster in some cases; I'm not sure.
Side note. The example as originally posted in the question was not a list, because the outer call was c(...); it should be constructed like
myListOfVector <- list( # ... changing c to list on this line
c(1,2),
c(1,2),
c(1,2),
c(1,2)
)
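You could also first convert your list to a matrix (unlist() is needed because matrix() applied to a list gives a list-matrix that colSums cannot handle):
mymatrix = matrix(unlist(myListOfVector), ncol = 2, byrow = TRUE)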
and then use colSums:
colSums(mymatrix)
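The two approaches can be compared side by side on the example data. The sketch below uses do.call(rbind, ...) to build the matrix, which is an alternative that avoids hard-coding the vector length; the variable names by_reduce and by_colsums are just for illustration.

```r
myListOfVector <- list(c(1, 2), c(1, 2), c(1, 2), c(1, 2))

# Element-wise sum by folding `+` over the list
by_reduce <- Reduce(`+`, myListOfVector)

# Stack the vectors as rows of a matrix, then sum each column
by_colsums <- colSums(do.call(rbind, myListOfVector))

by_reduce
by_colsums
all.equal(by_reduce, by_colsums)
```

Both give one sum per position; which one is faster depends on the number and length of the vectors, so it is worth timing on the real data.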

R_List from a selected rows of matrix

I have a matrix and I want to create a list with selected rows of that matrix being the list elements.
For example this is my matrix
my.matrix=matrix(1:100, nrow=20)
and I want to create a list from this matrix in such a way that each element of the list is a block of consecutive rows of the matrix, with the block sizes given by
my.n=c(1,2,4,3,5,5)
where my.n gives the number of rows that should be extracted from my.matrix. my.n[1]=1 means row 1; my.n[2]=2 means rows 2 and 3; my.n[3]=4 means rows 4 to 7 and so on.
So the first element of my list should be
my.matrix[1,]
second
my.matrix[2:3,]
and so on.
How to do it in an elegant way?
Not quite sure, but I think you want something like this ...
S <- split(seq_len(nrow(my.matrix)), rep.int(seq_along(my.n), my.n))
lapply(S, function(x) my.matrix[x, , drop = FALSE])
Here we are splitting the row numbers of my.matrix into groups whose sizes are given by my.n. Then we use lapply() over the resulting list S to subset my.matrix with those row numbers.
Alternatively, compute the first and last row of each block:
end <- cumsum(my.n)
start <- c(1,(end+1)[-length(end)])
mapply(function(a,b) my.matrix[a:b,,drop=FALSE], start, end, SIMPLIFY=FALSE)
mapply() takes the first element of each of the two vectors and passes them to the function, then moves on to the second elements, and so on through both vectors. SIMPLIFY=FALSE makes sure the result is always a list of sub-matrices, as described. Credit to @nongkrong for the mapply approach.
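As a quick sanity check, the split() and mapply() approaches can be run on the example from the question and compared. This is a sketch; the names first, last, by_split and by_range are introduced here for illustration (first and last are used instead of start and end to avoid masking the base R functions of those names).

```r
my.matrix <- matrix(1:100, nrow = 20)
my.n <- c(1, 2, 4, 3, 5, 5)

# Approach 1: split the row numbers into groups of sizes my.n
S <- split(seq_len(nrow(my.matrix)), rep.int(seq_along(my.n), my.n))
by_split <- lapply(S, function(r) my.matrix[r, , drop = FALSE])

# Approach 2: first and last row of each block
last  <- cumsum(my.n)
first <- last - my.n + 1
by_range <- mapply(function(a, b) my.matrix[a:b, , drop = FALSE],
                   first, last, SIMPLIFY = FALSE)

sapply(by_split, nrow)                               # block sizes, should match my.n
all(mapply(identical, unname(by_split), by_range))   # do both approaches agree?
```

Note that both approaches assume sum(my.n) equals nrow(my.matrix), as it does here.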
