Apologies in advance for what I know is simple. I just haven't been able to find the solution despite the 1000 search attempts and my rudimentary skills are not up to the challenge.
I have a list of matrices consisting of rows of integers. I can find row totals etc. with the (l)apply functions. What I am stuck on, however, is removing an entire element if any of its rows fails a certain criterion, say a total of <500.
So in the below example:
x1 <- rnorm(50,5,0.32)
dim(x1) <- c(5,10)
x2 =rnorm(50,25,3.2)
dim(x2) <- c(5,10)
x3 =rnorm(50,25,3.2)
dim(x3) <- c(5,10)
x4=rnorm(50,0.8,0.1)
dim(x4) <- c(5,10)
x5=rep(NaN,50)
dim(x5) <- c(5,10)
list1<-list(x1,x2,x3,x4,x5)
If I sum each row in each element for a total:
goodbit <- lapply(list1, function (x) apply(x, 1, function(c) sum(c)))
I know I can filter out the elements with NAs:
list1nonas <- Filter(Negate(anyNA),list1)
But I am having a hard time extending that to criteria based on the row totals. For example how can I remove any element where any row total in that element is < 8.
(Element [[4]] in this example).
You can use rowSums. If we want to test whether there are any rowSums less than 8 in a given matrix x, we can do any(rowSums(x) < 8). Therefore the logical negation of this will return TRUE if none of the row sums are less than 8.
We can therefore put this inside an sapply to run the test on each matrix in our list, and return a logical vector.
Subsetting our original list by this vector returns a filtered list with only those matrices that have no row sums below 8.
list1[sapply(list1, function(x) !any(rowSums(x) < 8))]
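One caveat with the subsetting approach: for a matrix of NaN values the test evaluates to NA rather than TRUE or FALSE, and subsetting a list with NA yields a NULL element. A minimal sketch of the same test wrapped in Filter(), which drops NA results as well as FALSE ones. The matrices below are made-up deterministic stand-ins for the random ones in the question.

```r
# Deterministic stand-ins for the question's matrices (values are made up)
m_ok  <- matrix(5,   nrow = 5, ncol = 10)  # every row sums to 50
m_low <- matrix(0.5, nrow = 5, ncol = 10)  # every row sums to 5
m_nan <- matrix(NaN, nrow = 5, ncol = 10)  # every row sum is NaN
lst <- list(m_ok, m_low, m_nan)

# Same test as in the answer; the NaN matrix gives NA, not TRUE/FALSE
keep <- sapply(lst, function(m) !any(rowSums(m) < 8))

# Filter() keeps only the elements whose test is TRUE (NA is dropped too)
kept <- Filter(function(m) !any(rowSums(m) < 8), lst)
length(kept)
```

If the NaN matrices have already been removed (as with list1nonas in the question), both forms give the same result.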
If I repeat this code
x<-1:6
n<-40
M<-200
y<-replicate(M,as.numeric(table(sample(x,n,1))))
str(y)
sometimes R decides to create a matrix and sometimes it creates a list. Can you explain the reason for that? How can I be sure whether I will get a matrix or a list?
If you choose M very small, for example 10, it will almost always create a matrix. If you choose M very large, for example 2000, it will create a list.
You get a list when not all the numbers in x are sampled: table() then returns fewer than six counts, the replicates have different lengths, and replicate() cannot simplify them into a matrix.
You can always return a list by using simplify = FALSE.
y <- replicate(M, as.numeric(table(sample(x,n,TRUE))), simplify = FALSE)
Also, you are using 1 to set the replace argument. It is better to use a logical value, i.e. TRUE.
To always return a matrix, we can do:
sapply(y, `[`, x)
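This pads the shorter vectors with NAs so that every column has the same length. Note that the NAs are appended at the end, so in a padded column the counts no longer line up with the values of x.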
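An alternative that avoids the unequal lengths in the first place, sketched here as one possible approach: convert the sample to a factor with all values of x as levels, so that table() reports a count (possibly 0) for every value. The seed is arbitrary and only there to make the run repeatable.

```r
x <- 1:6
n <- 40
M <- 200
set.seed(1)  # arbitrary seed, only for reproducibility

# factor(..., levels = x) forces table() to return one count per value of x,
# including zeros, so every replicate has length 6
y <- replicate(M, as.numeric(table(factor(sample(x, n, replace = TRUE),
                                          levels = x))))
dim(y)  # one row per value of x, one column per replicate
```

Because every replicate now has the same length, replicate() can always simplify the result to a matrix, and row k holds the counts for the value x[k].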
Maybe this will help:
https://rafalab.github.io/dsbook/r-basics.html#data-types
The vectors in a matrix must all be of the same type and length.
The elements of a list can be of different classes and lengths.
Try this:
x<-1
y<-2:7
z<-matrix(x,y)
z<-list(x,y)
In the first case you will get a matrix with 2 rows and 1 column, because y is passed as the nrow argument (of which only the first element, 2, is used) rather than as data.
In the second case you will get a list with elements of different lengths.
Also, the str() function is very useful, and you can find the class of an object using the class() function.
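A small sketch of how this plays out with sapply(), which uses the same simplification as replicate(): equal-length results are combined into a matrix, unequal-length results stay a list. The object names are made up for illustration.

```r
# Every result has length 2, so sapply() can simplify to a matrix
same_len <- sapply(1:3, function(i) rep(i, 2))

# Results have lengths 1, 2 and 3, so sapply() has to return a list
diff_len <- sapply(1:3, function(i) rep(i, i))

is.matrix(same_len)
is.list(diff_len)
str(diff_len)
```

Checking with is.matrix() or is.list() before further processing is one way to be sure which of the two you received.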
I have an initial nested list (list of lists) of vectors of integers with NA values randomly replacing some integers. Within a nested list, if one vector contains all NA values, the list needs to be broken up into two lists (or more, if more than one vector in the nested list contains all NAs). I ultimately need a vector of values that sums the lengths of the nested lists minus 1, i.e. sum(lengths(list[[i]])) - 1, where i is the list of vectors in the nested list, and removes any values less than or equal to 0.
So far I have been able to do this, but realized that if a list is 'artificially' broken into 2+ lists, I only need to subtract one from the first position of the nested list. Furthermore, if the first position of the nested list is NA, the subsequent lists in the nested list do not need to be subtracted by 1.
Below is some sample code that provides examples of the full nested list, the nested list with NA values randomly assigned, and the final desired vector of sums for the example lists.
#Full List
L.full<-list(list(1,3,c(0,2,0),c(0,0)),list(1,6,c(0,3,2,0,1,0),c(0,0,0,1,0,0),1,2,c(0,1),2,c(0,0)),
list(1,0),list(1,0),list(1,4,c(2,0,0,0),c(4,1),c(1,0,0,0,0),0),list(1,0))
#Nested list with "random" NAs
L.miss<-list(list(1,3,c(0,NA,0),c(0,0)),list(1,6,c(0,3,NA,0,NA,0),c(0,NA,0,1,0,0),1,NA,c(0,1),2,c(0,0)),
list(1,NA),list(1,0),list(1,NA,c(NA,0,0,0),c(NA,NA),c(1,0,0,NA,0),0),list(1,0))
#Desired final output
L.want<-c(5,11,5,1,3,5,1)
The below code may be a bit inelegant but almost gets me where I need to be; it outputs the final vector as [5,11,4,1,2,4,1], not the desired [5,11,5,1,3,5,1]. How can I have the code subtract one from just the first element in the list, if it is present?
#Break apart
test<-lapply(lapply(seq_along(L.miss), function(nm) {split(L.miss[[nm]], cumsum(sapply(L.miss[[nm]], function(x) all(is.na(x)))))}), function(lstA) lapply(lstA,function(x) Filter(function(y) !all(is.na(y)), x)))
#Bring the nested list up a level
test2<-unlist(test,recursive=FALSE)
#Remove NA values
test3<-rapply(test2,function(x) x[!is.na(x)], how="replace") #remove NAs
#Sum nested lists
test4<-integer()
for (i in 1:length(test3)){
test4[i]<-sum(lengths(test3[[i]]))-1
}
test5<-test4[test4>0] #remove values <=0
Thank you - if this question is too specific for this forum, please let me know and I will remove it.
I think I solved it - it was much simpler to delete the first position in the initial L.miss nested list then ignore the whole "minus 1" altogether:
#Delete first instance
l<-L.miss
for (i in 1:length(l)){
l[[i]][[1]]<-NULL
}
#Same code as above but for loop sums without -1
test<-lapply(lapply(seq_along(l), function(nm) {split(l[[nm]], cumsum(sapply(l[[nm]], function(x) all(is.na(x)))))}), function(lstA) lapply(lstA,function(x) Filter(function(y) !all(is.na(y)), x)))
test2<-unlist(test,recursive=FALSE)
test3<-rapply(test2,function(x) x[!is.na(x)], how="replace") #remove NAs
test4<-integer()
for (i in 1:length(test3)){
test4[i]<-sum(lengths(test3[[i]]))
}
test5<-test4[test4>0] #remove values <=0
#result
#> test5
#[1] 5 11 5 1 3 5 1
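The same steps (drop the first position, split at all-NA vectors, count the non-NA values in each piece, discard empty pieces) can also be sketched more compactly. This is only one possible formulation; count_runs is a made-up helper name, and it assumes every inner list has at least two elements, as in the example data.

```r
L.miss <- list(list(1, 3, c(0, NA, 0), c(0, 0)),
               list(1, 6, c(0, 3, NA, 0, NA, 0), c(0, NA, 0, 1, 0, 0), 1, NA,
                    c(0, 1), 2, c(0, 0)),
               list(1, NA), list(1, 0),
               list(1, NA, c(NA, 0, 0, 0), c(NA, NA), c(1, 0, 0, NA, 0), 0),
               list(1, 0))

# Hypothetical helper: totals of non-NA values per run between all-NA vectors
count_runs <- function(lst) {
  lst    <- lst[-1]                                            # drop first position
  all_na <- vapply(lst, function(v) all(is.na(v)), logical(1))
  n_ok   <- vapply(lst, function(v) sum(!is.na(v)), integer(1))
  totals <- tapply(n_ok, cumsum(all_na), sum)                  # one total per run
  as.vector(totals[totals > 0])                                # drop empty runs
}

result <- unlist(lapply(L.miss, count_runs))
result
```

For the example data this should reproduce the desired vector 5 11 5 1 3 5 1.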
I am not very experienced with R, and have been struggling for days to repeat a string of code to fill a data matrix. My instinct is to create a for loop.
I am a biology student working on colour differences between sets of images, making use of the R package colordistance. The relevant data has been loaded in R as a list of 8x4 matrices (each matrix describes the colours in one image). Five images make up one set and there are 100 sets in total. Each set is identified by a number (not 1-100, it's an interrupted sequence, but I have stored the sequence of numbers in a vector called 'numberlist'). I have written the code to extract the desired data in the right format for the first set, and it is as follows;
## extract the list of matrices belonging to the first set (A3) from the full list
A3<-histlist[grep('^3',names(histlist))]
## create a colour distance matrix (cdm), ie a pairwise comparison of "similarity" between the five matrices stored in A3
cdm3<-colordistance::getColorDistanceMatrix(A3, method="emd", plotting=FALSE)
## convert to data frame to fix row names
cdm3df<-as.data.frame(cdm3)
## remove column names
names(cdm3df)<-NULL
## return elements in the first row and column 2-5 only (retains row names).
cdm3filtered<-cdm3df[1,2:5]
Now I want to replace "3" in the code above with each number in 'numberlist' (not sure whether they should be as.factor or as.numeric). I've had many attempts starting with for (i in numberlist) {...} but with no successful output. To me it makes sense to store the output from the loop in a storage matrix; matrix(nrow=100,ncol=4) but I am very much stuck, and unable to populate my storage matrix row by row by iterating the code above...
Any help would be greatly appreciated!
Updates
What I want the outputs of the loop to look like (+ appended in the storage matrix);
> cdm17filtered
17clr 0.09246918 0.1176651 0.1220622 0.1323586
This is my attempt:
for (i in numberlist$X) {
A[i] <- histlist[grep(paste0('^',i),names(histlist))]
cdm[i] <- colordistance::getColorDistanceMatrix(A[i], method="emd", plotting=FALSE)
cdm[i]df <- as.data.frame(cdm[i])
cdm[i]filtered <- cdm[i]df[1,2:5]
print(A[i]) # *insert in n'th column of storage matrix
}
The above is not working, and I'm missing the last bit needed to store the outputs of the loop in the storage matrix. (I was advised against using rbind to populate the storage matrix because it is slow..)
In your attempt, you use invalid R names with non-alphanumeric characters not escaped, cdm[i]df and cdm[i]filtered. It seems you intend to index from a larger container like a list of objects.
To properly generalize your process for all items in numberlist, adjust your ^3 setup. Specifically, build empty lists and, in the loop, assign to them by position with [[k]]. Double brackets are needed because each item is itself a list or data frame, and the position k is used as the index because the set numbers are not a 1-to-n sequence:
# INITIALIZE LISTS (SAME LENGTH AS numberlist$X)
ids <- numberlist$X
A <- vector(mode="list", length = length(ids))
cdm_matrices <- vector(mode="list", length = length(ids))
cdm_dfs <- vector(mode="list", length = length(ids))
cdm_filtered_dfs <- vector(mode="list", length = length(ids))
# POPULATE LISTS
for (k in seq_along(ids)) {
  i <- ids[k]
  ## extract the list of matrices belonging to set i
  A[[k]] <- histlist[grep(paste0('^', i), names(histlist))]
  ## create a colour distance matrix (cdm)
  cdm_matrices[[k]] <- colordistance::getColorDistanceMatrix(A[[k]], method="emd", plotting=FALSE)
  ## convert to data frame to fix row names and remove column names
  cdm_dfs[[k]] <- setNames(as.data.frame(cdm_matrices[[k]]), NULL)
  ## return elements in the first row and column 2-5 only (retains row names).
  cdm_filtered_dfs[[k]] <- cdm_dfs[[k]][1,2:5]
}
Alternatively, if you only need the last object, cdm_filtered_df returned, use lapply where you do not need to use or index lists and all objects are local in scope of function (i.e., never saved in global environment):
cdm_build <- function(i) {
A <- histlist[grep(paste0('^', i), names(histlist))]
cdm <- colordistance::getColorDistanceMatrix(A, method="emd", plotting=FALSE)
cdm_df <- setNames(as.data.frame(cdm), NULL)
cdm_filtered_df <- cdm_df[1,2:5]
return(cdm_filtered_df) # REDUNDANT AS LAST LINE IS RETURNED BY DEFAULT
}
# LIST OF FILTERED CDM DATA FRAMES
cdm_filtered_dfs <- lapply(numberlist$X, cdm_build)
Finally, with either solution above, should you want to build a singular data frame, run rbind in a do.call():
cdm_final_df <- do.call(rbind, cdm_filtered_dfs)
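The question also asks about filling a pre-allocated storage matrix row by row. A minimal sketch of that pattern follows. Since the colordistance step cannot be reproduced here, first_row_distances is a hypothetical stand-in that simply returns four numbers per set, and set_ids is a made-up subset of set numbers.

```r
set_ids <- c(3, 17, 21)  # hypothetical set numbers (not a 1..n sequence)

# Hypothetical stand-in for the colordistance step: four numbers per set
first_row_distances <- function(id) id + c(0.1, 0.2, 0.3, 0.4)

# Pre-allocate one row per set, four columns, row names such as "17clr"
storage <- matrix(NA_real_, nrow = length(set_ids), ncol = 4,
                  dimnames = list(paste0(set_ids, "clr"), NULL))

# Fill by position k, not by the set number itself
for (k in seq_along(set_ids)) {
  storage[k, ] <- first_row_distances(set_ids[k])
}
storage
```

In the real code the stand-in would be replaced by something like as.numeric(cdm_build(i)), assuming that function returns the four values of interest.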
Starting with an empty dataframe, I need to fill the dataframe as follows: A for loop generates a fixed number of values in each iteration, and I need to add a new column with the values in that list, and giving the column a unique name, col_i (where i is the ith iteration of the loop).
How can this (seemingly simple task) be done?
The most efficient way to build a dataframe piecewise is to store your parts in a pre-allocated list, then put them together afterwards.
For example:
num.iters <- 10
l <- vector('list', num.iters)
for (i in 1:num.iters) {
l[[i]] <- rnorm(3) # the column data
names(l)[i] <- paste('Col', i, sep='.') # the column name
}
do.call(cbind, l) # ... if your cols are the same datatype and you want a matrix
data.frame(l) # otherwise
What's wrong with ?cbind?
The functions cbind and rbind are S3 generic, with methods for data frames.
The data frame method will be used if at least one argument is a data frame
and the rest are vectors or matrices.
?colnames can also be applied to data.frames
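A minimal sketch of the cbind and colnames route described above. It assumes a starting data frame that already has the right number of rows (here a made-up id column) and uses a simple stand-in for the values generated in each iteration.

```r
df <- data.frame(id = 1:3)  # hypothetical starting frame with 3 rows

for (i in 1:4) {
  new_col <- i * (1:3)                          # stand-in for iteration i's values
  df <- cbind(df, new_col)                      # data frame method of cbind
  colnames(df)[ncol(df)] <- paste0("col_", i)   # rename the column just added
}
df
```

Each cbind call copies the data frame, so for many iterations the pre-allocated list approach shown earlier is likely to be faster.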