Function for same length of vectors in R - r

I want to write a function that converts a list of vectors of different lengths into a list of vectors of the same length. I wrote two functions, but the second one does not work.
My code is below.
The first function (works well):
delka<-function(x){
delky<<-NULL
for(i in 1:length(x)){
delky[i]<<-length(x[[i]])
}
}
Here I create the global object "delky". The second function is:
uprava<- function(x){
stejne<<- NULL
for(i in 1:length(x)){
stejne[[i]]<<-vector(x[[i]], length(max(delky)))
}
}
Here I want to create a global object "stejne" containing vectors of the same length, but R returns an error:
Error in vector(x[[i]], length(max(delky))) : invalid 'mode' argument
Do you have any ideas of what I am doing wrong?

Assuming @RomanLuštrik is correct about what you are trying to do, you can do this much more directly using the following:
lapply(my, `length<-`, max(lengths(my)))
## $a
## [1] 0.8669645 0.9224072 0.2003480 0.9476093 0.1095652 NA
## [7] NA NA NA NA NA
##
## $b
## [1] 0.6679763 0.2742245 0.7726615 0.4247057 0.7274648 0.8218540
## [7] 0.4874759 0.4764729 0.3958279 0.1653358 0.2331573
##
## $c
## [1] 0.71882342 0.92852497 0.75134020 0.53098586 0.17515857
## [6] 0.04997067 0.70350036 NA NA NA
## [11] NA
##
The lengths function was introduced in R 3.2.0, so make sure your version of R is at least that recent.
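As for the error in the question: vector() expects its first argument, mode, to be a character string such as "numeric" or "list", but the code passes the data x[[i]] there. In addition, length(max(delky)) is always 1, because max() returns a single number. A minimal sketch of a corrected function follows, assuming NA padding is what is wanted. The name pad_to_max and the small example list are hypothetical, and the function returns its result instead of assigning globally with <<-.

```r
# Hypothetical helper: pad every vector in a list with NA up to the longest length
pad_to_max <- function(x) {
  target <- max(lengths(x))
  lapply(x, function(v) {
    length(v) <- target   # extending a vector fills the new positions with NA
    v
  })
}

# Small illustrative list (not the data from the answers)
my <- list(a = c(1, 2), b = c(1, 2, 3, 4), c = 5)
stejne <- pad_to_max(my)
lengths(stejne)
```

Returning the padded list and assigning it at the call site avoids the global side effects of the original two functions.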

Assuming you can work on whole lists at a time, and if you want to pad the shorter vectors with NAs, here's one way.
my <- list(a = runif(5),
b = runif(11),
c = runif(7))
maxl <- max(sapply(my, length))
sapply(my, FUN = function(x, ml) {
difference <- ml - length(x)
c(x, rep(NA, difference))
}, ml = maxl, simplify = FALSE)
$a
[1] 0.91906470 0.68651070 0.07317576 0.52985130 0.27916889 NA NA NA NA NA NA
$b
[1] 0.86384953 0.79707167 0.88226627 0.91590091 0.03181455 0.86493584 0.89597354 0.80890065 0.92418156 0.72947596 0.13847751
$c
[1] 0.2576621 0.6512487 0.5806530 0.8782730 0.0262019 0.1000885 0.5245472 NA NA NA NA
Or, as a matrix (dropping simplify = FALSE so that sapply simplifies the result):
sapply(my, FUN = function(x, ml) {
difference <- ml - length(x)
c(x, rep(NA, difference))
}, ml = maxl)
a b c
[1,] 0.91906470 0.86384953 0.2576621
[2,] 0.68651070 0.79707167 0.6512487
[3,] 0.07317576 0.88226627 0.5806530
[4,] 0.52985130 0.91590091 0.8782730
[5,] 0.27916889 0.03181455 0.0262019
[6,] NA 0.86493584 0.1000885
[7,] NA 0.89597354 0.5245472
[8,] NA 0.80890065 NA
[9,] NA 0.92418156 NA
[10,] NA 0.72947596 NA
[11,] NA 0.13847751 NA


Combining multiple character vectors of different lengths into single matrix without recycling [duplicate]

I have a list of about 1,100 individual character vectors, each of which corresponds to a particular set of genes (each element is a gene symbol of the form "ENSG000011", "ENSG000012", etc.).
I want to merge these vectors into a single data.frame/matrix, such that each item in the list becomes its own column. However, each of the items in the list is of a different length.
I cannot find a way of doing this.
I've tried a number of approaches in R, but the result never looks quite right (e.g. it pastes all of the items of the list into one row, on top of one another, or I get an error because the elements have different lengths).
Using Base R we need to...
First, let's create a sample dataset with 4 vectors:
a <- rnorm(10)
b <- rnorm(5)
c <- rnorm(7)
d <- rnorm(20)
Then we can put them in a list as:
f <- list(a,b,c,d)
Then we need to find the length of the longest vector:
max_len <- max(sapply(f, length))
Then we need to extend every vector to max_len by filling the gap with NAs (so if max_len = 20 and the current vector has length(current) = 10, the last 10 values need to be NA):
f1 <- lapply(f, function(x) c(x, rep(NA, max_len - length(x))))
Then you can turn this into a matrix as:
matrix(unlist(f1), ncol = length(f1), byrow = F)
which results in
[,1] [,2] [,3] [,4]
[1,] -0.53487289 -1.8570456 0.8304454 -0.6440267
[2,] 0.04283173 -1.2541836 0.9579962 -1.1664334
[3,] -1.31686110 -0.6789986 0.9424487 0.4073388
[4,] -0.54987484 -0.4326257 -1.5165032 0.1990406
[5,] 0.31529161 -0.2712977 0.1347272 -0.2479010
[6,] -1.08465865 NA 0.7442857 -1.1319033
[7,] 1.11283161 NA -0.8397640 0.2636702
[8,] 0.08882676 NA NA -0.1332037
[9,] 0.76028752 NA NA 0.1607880
[10,] -2.68513818 NA NA -2.3300150
[11,] NA NA NA -0.3356175
[12,] NA NA NA 0.8115210
[13,] NA NA NA 1.1668857
[14,] NA NA NA 0.5538027
[15,] NA NA NA -0.8910439
[16,] NA NA NA -1.4056796
[17,] NA NA NA -1.6713585
[18,] NA NA NA 0.2557690
[19,] NA NA NA -0.5970861
[20,] NA NA NA 0.1851019

Replacing NAs from one list by NAs in second list in equal positions in R

Here is the problem. I have two lists of vectors. Vectors in the same position have the same length, but some of them contain NAs. The data may look like this:
HH
[[1]]
[1] 2 1 5 NA
[[2]]
[1] 2 0 5
[[3]]
[1] NA 1 NA
JJ
[[1]]
[1] 0 5 8 9
[[2]]
[1] NA 1 3
[[3]]
[1] 2 8 3
My goal is to have NAs in equal positions in both lists, in all vectors. More exactly, I want code that finds each NA in the first list and replaces the value in the same position of the second list with NA. I successfully wrote a similar function for a single vector, but I failed here. Can you help me? Here is my code.
D<-NULL
for(j in 1:length(PH)){
for(i in 1:length(PH[[j]])){
if(is.na(PH[[j]][i])==FALSE){
D[[j]][i]=AB[[j]][i]}
else{
D[[j]][i]=NA}}
}
Here's my two cents. Grabbing data from @Colonel's answer,
v1 <- unlist(firstlist)
v2 <- unlist(secondlist)
v1[is.na(v2)] <- NA
relist(v1, firstlist)
#[[1]]
#[1] NA "2" "3" NA
#[[2]]
#[1] "a" NA
You can use Map:
Map(function(u,v) {v[is.na(u)]<-NA;v}, firstlist, secondlist)
Example:
firstlist = list(c(1,2,3,NA), c('a',NA))
secondlist = list(c(NA,22,33,5), c('b','d'))
#[[1]]
#[1] NA 22 33 NA
#[[2]]
#[1] "b" NA
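If the NAs should end up in equal positions in both lists (the stated goal), one possible sketch is to build a combined NA mask per pair of vectors and apply it to each list. It uses the HH and JJ data from the question; the names na_mask, HH2 and JJ2 are hypothetical.

```r
HH <- list(c(2, 1, 5, NA), c(2, 0, 5), c(NA, 1, NA))
JJ <- list(c(0, 5, 8, 9), c(NA, 1, 3), c(2, 8, 3))

# TRUE wherever either list has an NA at that position
na_mask <- Map(function(h, j) is.na(h) | is.na(j), HH, JJ)

# Blank out those positions in both lists
HH2 <- Map(function(h, m) replace(h, m, NA), HH, na_mask)
JJ2 <- Map(function(j, m) replace(j, m, NA), JJ, na_mask)

JJ2
```

For the one-directional version asked about in the question, the Map call above is enough; the mask is only needed when both lists must be changed.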

Need to vectorize a function that uses a loop (replace NA rows with values from vector)

How can I rewrite this function as a vectorized variant? As far as I know, using loops is not good practice in R:
# replaces rows that contains all NAs with non-NA values from previous row and K-th column
na.replace <- function(x, k) {
for (i in 2:nrow(x)) {
if (!all(is.na(x[i - 1, ])) && all(is.na(x[i, ]))) {
x[i, ] <- x[i - 1, k]
}
}
x
}
Here is the input data and the value the function returns:
m <- cbind(c(NA,NA,1,2,NA,NA,NA,6,7,8), c(NA,NA,2,3,NA,NA,NA,7,8,9))
m
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] 1 2
[4,] 2 3
[5,] NA NA
[6,] NA NA
[7,] NA NA
[8,] 6 7
[9,] 7 8
[10,] 8 9
na.replace(m, 2)
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] 1 2
[4,] 2 3
[5,] 3 3
[6,] 3 3
[7,] 3 3
[8,] 6 7
[9,] 7 8
[10,] 8 9
Here is a solution using na.locf in the zoo package. row.na is a vector with one component per row of m such that a component is TRUE if the corresponding row of m is all NA and FALSE otherwise. We then set all elements of such rows to the result of applying na.locf to column 2.
At the expense of a bit of speed the lines ending with ## could be replaced with row.na <- apply(is.na(m), 1, all) which is a bit more readable.
If we knew that if any row has an NA in column 2 then all columns of that row are NA, as in the question, then the lines ending in ## could be reduced to just row.na <- is.na(m[, 2])
library(zoo)
nr <- nrow(m) ##
nc <- ncol(m) ##
row.na <- .rowSums(is.na(m), nr, nc) == nc ##
m[row.na, ] <- na.locf(m[, 2], na.rm = FALSE)[row.na]
The result is:
> m
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] 1 2
[4,] 2 3
[5,] 3 3
[6,] 3 3
[7,] 3 3
[8,] 6 7
[9,] 7 8
[10,] 8 9
REVISED Some revisions to improve speed as in comments below. Also added alternatives in discussion.
Notice that, unless you have a pathological condition where the first row is all NA (in which case you're screwed anyway), you don't need to check whether all(is.na(x[i - 1, ])) is TRUE or FALSE, because in the previous pass through the loop you "fixed" row i - 1.
Further, all you care about is that the designated k-th value is not NA. The rest of the row doesn't matter.
BUT: The k-th value always "falls through" from the top, so perhaps you should:
1) treat the k-th column as a vector, e.g. c(NA,1,NA,NA,3,NA,4,NA,NA) and "fill-down" all numeric values. That's been done many times on SO questions.
2) Every row which is entirely NA except for column k gets filled with that same value.
I think that's still best done using either a loop or apply
You probably need to clarify whether some rows have both numeric and NA values, which your example fails to include. If that's the case, then things get trickier.
The most important part in this answer is getting the grouping you want, which is:
groups = cumsum(rowSums(is.na(m)) != ncol(m))
groups
#[1] 0 0 1 2 2 2 2 3 4 5
Once you have that the rest is just doing your desired operation by group, e.g.:
library(data.table)
dt = as.data.table(m)
k = 2
cond = rowSums(is.na(m)) != ncol(m)
dt[, (k) := .SD[[k]][1], by = cumsum(cond)]
dt[!cond, names(dt) := .SD[[k]]]
dt
# V1 V2
# 1: NA NA
# 2: NA NA
# 3: 1 2
# 4: 2 3
# 5: 3 3
# 6: 3 3
# 7: 3 3
# 8: 6 7
# 9: 7 8
#10: 8 9
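For comparison, the same grouping idea can be sketched in base R with ave(). This is only an illustration and not part of the answer above; it assumes the m and k from the question and fills each all-NA row with the first value of column k within its group. The names all_na and fill are hypothetical.

```r
m <- cbind(c(NA, NA, 1, 2, NA, NA, NA, 6, 7, 8),
           c(NA, NA, 2, 3, NA, NA, NA, 7, 8, 9))
k <- 2

all_na <- rowSums(is.na(m)) == ncol(m)  # rows consisting only of NA
groups <- cumsum(!all_na)               # a new group starts at every non-NA row

# Within each group, repeat the first value of column k
fill <- ave(m[, k], groups, FUN = function(v) v[1])

# Overwrite the all-NA rows; the fill values are recycled across columns
m[all_na, ] <- fill[all_na]
m
```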
Here is another base only vectorized approach:
na.replace <- function(x, k) {
is.all.na <- rowSums(is.na(x)) == ncol(x)
ref.idx <- cummax((!is.all.na) * seq_len(nrow(x)))
ref.idx[ref.idx == 0] <- NA
x[is.all.na, ] <- x[ref.idx[is.all.na], k]
x
}
And for a fair comparison with @Eldar's solution, replace is.all.na with is.all.na <- is.na(x[, k]).
Finally, I implemented my own vectorized solution and it works as expected. Any comments and suggestions are welcome :)
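To see why the cummax trick in the base-only function works, the intermediate vectors can be inspected for the question's matrix. This is just a walk-through of the same steps, with idx as a hypothetical name for the intermediate result:

```r
m <- cbind(c(NA, NA, 1, 2, NA, NA, NA, 6, 7, 8),
           c(NA, NA, 2, 3, NA, NA, NA, 7, 8, 9))

is.all.na <- rowSums(is.na(m)) == ncol(m)

# Row number for rows that hold data, 0 for all-NA rows
idx <- (!is.all.na) * seq_len(nrow(m))
idx
# 0 0 3 4 0 0 0 8 9 10

# The running maximum carries the last "good" row number forward
ref.idx <- cummax(idx)
ref.idx
# 0 0 3 4 4 4 4 8 9 10
```

Rows 5 to 7 therefore look up row 4, and the leading zeros (no earlier data row) are turned into NA by the function before indexing.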
# Last Observation Move Forward
# works as na.locf but much faster and accepts only 1D structures
na.lomf <- function(object, na.rm = F) {
idx <- which(!is.na(object))
if (!na.rm && is.na(object[1])) idx <- c(1, idx)
rep.int(object[idx], diff(c(idx, length(object) + 1)))
}
na.replace <- function(x, k) {
v <- x[, k]
i <- which(is.na(v))
r <- na.lomf(v)
x[i, ] <- r[i]
x
}
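Here's a workaround with the na.locf function from zoo. Note that it carries the entire last non-NA row forward rather than the value from column k, so rows 5 to 7 become 2 3 instead of 3 3: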
m[na.locf(ifelse(apply(m, 1, function(x) all(is.na(x))), NA, 1:nrow(m)), na.rm=F),]
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] 1 2
[4,] 2 3
[5,] 2 3
[6,] 2 3
[7,] 2 3
[8,] 6 7
[9,] 7 8
[10,] 8 9

Match number within list of different length vectors

I want to match a number within a list containing vectors of different lengths. However, my solution (below) doesn't match anything beyond the first item of each vector.
seq_ <- seq(1:10)
list_ <- list(seq_[1:3], seq_[4:7], seq_[8:10])
list_
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6 7
#
# [[3]]
# [1] 8 9 10
but
for (i in seq_) {
print(match(i,list_))
}
# [1] 1
# [1] NA
# [1] NA
# [1] 3
# [1] NA
# [1] NA
# [1] NA
# [1] NA
# [1] NA
# [1] NA
In the general case, you will probably be happier with which, as shown here.
EDIT: rewrote to show the full loop over values.
seq_ <- 1:10
list_ <- list(seq_[1:3], seq_[4:7], seq_[8:10])
# preallocate one slot per list element
matchlist <- vector("list", length(list_))
for (j in seq_along(list_)) {
# positions within list_[[j]] at which any value of seq_ occurs
matchlist[[j]] <- unlist(sapply(seq_, function(k) which(list_[[j]] == k)))
}
That will return the locations of all matches. It's probably more clear what's happening if you create an input like my.list <- list(sample(1:10,4,replace=TRUE), sample(1:10,7,replace=TRUE))
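If the goal is instead to find which list element contains each number (an assumption about the question's intent), one possible sketch is to flatten the list and keep track of where each value came from. The names owner and containing are hypothetical.

```r
seq_ <- 1:10
list_ <- list(seq_[1:3], seq_[4:7], seq_[8:10])

# For every flattened value, the index of the list element it came from
owner <- rep(seq_along(list_), lengths(list_))

# Look each number up in the flattened values; NA if it occurs nowhere
containing <- owner[match(seq_, unlist(list_))]
containing
# 1 1 1 2 2 2 2 3 3 3
```

match() returns only the first hit, so a value that occurs in several list elements is attributed to the earliest one.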

How to simplify a leading-NA count function, and generalize it to work on matrix, dataframe

I wrote a leading-NA count function that works on vectors. However:
a) Can you simplify my version?
b) Can you also generalize it to work directly on matrix, dataframe (must still work on individual vector), so I don't need apply()? Try to avoid all *apply functions, fully vectorize, it must still work on a vector, and no special-casing if at all possible.
leading_NA_count <- function(x) { max(cumsum((1:length(x)) == cumsum(is.na(x)))) }
# v0.1: works but seems clunky, slow and unlikely to be generalizable to two-dimensional objects
leading_NA_count <- function(x) { max(which(1:(length(x)) == cumsum(is.na(x))), 0) }
# v0.2: maybe simpler, needs max(...,0) to avoid max failing with -Inf if the which(...) is empty/ no leading-NAs case: e.g. c(1,2,3)
# (Seems impossible to figure out how to use which.max/which.min on this)
leading_NA_count <- function(x) { max(cumsum((1:length(x)) == cumsum(is.na(x)))) }
set.seed(1234)
mm <- matrix(sample(c(NA,NA,NA,NA,NA,0,1,2), 6*5, replace=T), nrow=6,ncol=5)
mm
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA NA NA NA
[2,] NA NA 2 NA 1
[3,] NA 0 NA NA NA
[4,] NA NA 1 NA 2
[5,] 1 0 NA NA 1
[6,] 0 NA NA NA NA
leading_NA_count(mm)
[1] 4 # WRONG, obviously (looks like it tried to operate on the entire matrix by-column or by-row)
apply(mm,1,leading_NA_count)
[1] 5 2 1 2 0 0 # RIGHT
This works whether mm is a matrix, vector or data.frame. See ?max.col for more info:
max.col(cbind(!is.na(rbind(NA, mm)), TRUE), ties = "first")[-1] - 1
For part (a) of your question this is the simplest function I could think of:
leadingNaCount = function(x) { sum(cumprod(is.na(x))) }
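A short check of the cumprod version, as a sketch: cumprod(is.na(x)) stays 1 while the leading values are NA and drops to 0 for good at the first non-NA value, so its sum is the leading-NA count. The matrix below is typed in by hand from the question's printout rather than regenerated with sample(), since the random draw may differ between R versions.

```r
leadingNaCount <- function(x) sum(cumprod(is.na(x)))

leadingNaCount(c(NA, NA, 3, NA))  # 2
leadingNaCount(c(1, 2, 3))        # 0, no special case needed

# The question's matrix, entered row by row
mm <- rbind(c(NA, NA, NA, NA, NA),
            c(NA, NA,  2, NA,  1),
            c(NA,  0, NA, NA, NA),
            c(NA, NA,  1, NA,  2),
            c( 1,  0, NA, NA,  1),
            c( 0, NA, NA, NA, NA))

# Row-wise counts still go through apply() with this version
counts <- apply(mm, 1, leadingNaCount)
counts
# 5 2 1 2 0 0
```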
