How to stop recycling for uneven row lengths in R [duplicate]

I have several vectors of unequal length and I would like to cbind them. I've put the vectors into a list and tried to combine them using do.call(cbind, ...):
nm <- list(1:8, 3:8, 1:5)
do.call(cbind, nm)
# [,1] [,2] [,3]
# [1,] 1 3 1
# [2,] 2 4 2
# [3,] 3 5 3
# [4,] 4 6 4
# [5,] 5 7 5
# [6,] 6 8 1
# [7,] 7 3 2
# [8,] 8 4 3
# Warning message:
# In (function (..., deparse.level = 1) :
# number of rows of result is not a multiple of vector length (arg 2)
As expected, the number of rows in the resulting matrix is the length of the longest vector, and the values of the shorter vectors are recycled to make up for the length.
Instead I'd like to pad the shorter vectors with NA values to obtain the same length as the longest vector. I'd like the matrix to look like this:
# [,1] [,2] [,3]
# [1,] 1 3 1
# [2,] 2 4 2
# [3,] 3 5 3
# [4,] 4 6 4
# [5,] 5 7 5
# [6,] 6 8 NA
# [7,] 7 NA NA
# [8,] 8 NA NA
How can I go about doing this?

You can use indexing: if you index beyond the length of a vector, R returns NA for the missing positions. This works for any number of rows, set here with foo:
nm <- list(1:8,3:8,1:5)
foo <- 8
sapply(nm, '[', 1:foo)
EDIT:
Or in one line using the largest vector as number of rows:
sapply(nm, '[', seq(max(sapply(nm,length))))
From R 3.2.0 you may use lengths ("get the length of each element of a list") instead of sapply(nm, length):
sapply(nm, '[', seq(max(lengths(nm))))
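The padding works because indexing past the end of a vector yields NA. As a minimal sketch of an equivalent idiom (not part of the answer above), base R's `length<-` replacement function can also extend each vector to a common length, filling with NA. The names target_len and padded are illustrative only.

```r
nm <- list(1:8, 3:8, 1:5)

# Indexing beyond the end of a vector returns NA for the missing positions
x <- 1:5
x[1:8]

# Assumed alternative: `length<-` lengthens a vector and pads it with NA
target_len <- max(lengths(nm))                # length of the longest vector
padded <- sapply(nm, `length<-`, target_len)  # one padded column per list element
padded
```

Both forms return an integer matrix with one column per list element, so either can replace do.call(cbind, nm) when recycling is not wanted.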

You should fill vectors with NA before calling do.call.
nm <- list(1:8,3:8,1:5)
max_length <- max(unlist(lapply(nm,length)))
nm_filled <- lapply(nm, function(x) {
  ans <- rep(NA, length.out = max_length)  # start from an all-NA vector
  ans[seq_along(x)] <- x                   # overwrite the leading entries with x
  ans
})
do.call(cbind,nm_filled)

This is a shorter version of Wojciech's solution.
nm <- list(1:8,3:8,1:5)
max_length <- max(sapply(nm,length))
sapply(nm, function(x){
c(x, rep(NA, max_length - length(x)))
})

Here is an option using stri_list2matrix from stringi
library(stringi)
out <- stri_list2matrix(nm)
class(out) <- 'numeric'
out
# [,1] [,2] [,3]
#[1,] 1 3 1
#[2,] 2 4 2
#[3,] 3 5 3
#[4,] 4 6 4
#[5,] 5 7 5
#[6,] 6 8 NA
#[7,] 7 NA NA
#[8,] 8 NA NA

Late to the party but you could use cbind.fill from rowr package with fill = NA
library(rowr)
do.call(cbind.fill, c(nm, fill = NA))
# object object object
#1 1 3 1
#2 2 4 2
#3 3 5 3
#4 4 6 4
#5 5 7 5
#6 6 8 NA
#7 7 NA NA
#8 8 NA NA
If you have a named list instead and want to maintain the headers you could use setNames
nm <- list(a = 1:8, b = 3:8, c = 1:5)
setNames(do.call(cbind.fill, c(nm, fill = NA)), names(nm))
# a b c
#1 1 3 1
#2 2 4 2
#3 3 5 3
#4 4 6 4
#5 5 7 5
#6 6 8 NA
#7 7 NA NA
#8 8 NA NA

Related

Assigning NA to the Matrix entries in R

I want to create a matrix whose entries are chosen randomly, with the property that every row in the same column has a different value. If two rows in the same column (for example row i and row i+1) have the same value, I want to replace the entry in the later row with NA. For example, if the entries of column 1 are (1,2,2,4,1), I want them to become (1,2,NA,4,NA). I have tried this:
solution = matrix(NA,nrow=5,ncol=5)
for (i in 1:5) {
for (j in 1:5) {
one_entry = sample(1:10, 1)
solution[j,i] = one_entry
if (solution[j+1,i]==solution[j,i]){
#is.na(solution[j+1,i]) <- solution[j+1,I]
solution[j+1,i]<- NA
#solution[solution[j+1,i]] <- NA
} else {
solution[j+1, i] = one_entry
}
}
}
print(solution)
I got the error "Error in if (solution[j + 1, i] == solution[j, i]) { :
missing value where TRUE/FALSE needed". Please help how to resolve this.
Instead of comparing element by element with an if statement, you can replace duplicated entries with NA. duplicated() returns a logical vector that is TRUE at the position of every repeat of an earlier value.
set.seed(1)
nr <- 5
nc <- 7
m <- matrix(sample(1:10, nr * nc, replace = TRUE), nrow = nr)
m
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 9 7 5 9 5 1 10
# [2,] 4 2 10 5 5 4 6
# [3,] 7 3 6 5 2 3 4
# [4,] 1 1 10 9 10 6 4
# [5,] 2 5 7 9 9 10 10
for (i in seq_len(nc)) {
m[, i][duplicated(m[, i])] <- NA
}
m
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 9 7 5 9 5 1 10
# [2,] 4 2 10 5 NA 4 6
# [3,] 7 3 6 NA 2 3 4
# [4,] 1 1 NA NA 10 6 NA
# [5,] 2 5 7 NA 9 10 NA
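The column loop can also be written without an explicit for. A minimal sketch, assuming the same kind of random integer matrix as above: apply() runs duplicated() on each column and returns a logical matrix of the same shape, which can then index m directly. The name dup is illustrative.

```r
set.seed(1)
nr <- 5
nc <- 7
m <- matrix(sample(1:10, nr * nc, replace = TRUE), nrow = nr)

# duplicated() per column gives an nr x nc logical matrix
# (TRUE marks a value that already appeared in an earlier row of that column)
dup <- apply(m, 2, duplicated)

# Blank out every repeated value in one assignment
m[dup] <- NA
m
```

Generating the full matrix first and masking afterwards also avoids the original error, which comes from comparing against entries that are still NA inside if().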
Using purrr library:
library(purrr)
set.seed(123)
#populate the matrix
(mat <- rerun(5, sample(1:10,size = 5, replace = TRUE)) %>%
reduce(cbind))
#> out elt elt elt elt
#> [1,] 3 5 5 3 9
#> [2,] 3 4 3 8 3
#> [3,] 10 6 9 10 4
#> [4,] 2 9 9 7 1
#> [5,] 6 10 9 10 7
map(2:length(mat), ~{ if (mat[[. - 1]] == mat[[.]]) .x } ) %>%
compact() %>%
walk(~{ mat[[.x]] <<- NA })
mat
#> out elt elt elt elt
#> [1,] 3 5 5 3 9
#> [2,] NA 4 3 8 3
#> [3,] 10 6 9 10 4
#> [4,] 2 9 NA 7 1
#> [5,] 6 10 NA 10 7
Created on 2021-06-28 by the reprex package (v2.0.0)

How do I loop correctly?

Here is the data below. I'm not sure which type of looping I should be using, but here is what I am looking to do: If, for row 1, there is a 6 present, then for column 7 we have "Yes", if there is no 6 present, then column 7 has "No". Ignore columns 8 & 9.
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 6 1 1 6 1 NA NA NA
[2,] 5 5 5 5 5 5 NA NA NA
[3,] 1 1 6 1 1 6 NA NA NA
[4,] 5 5 5 5 5 5 NA NA NA
[5,] 6 1 1 6 1 1 NA NA NA
[6,] 5 5 5 5 5 5 NA NA NA
[7,] 1 6 1 1 6 1 NA NA NA
[8,] 5 5 5 5 5 5 NA NA NA
[9,] 1 1 6 1 1 6 NA NA NA
[10,] 5 5 5 5 5 5 NA NA NA
Here is the code that I have.
b <- 10
n <- 6
data.matrix <- matrix(data=NA,nrow = b, ncol = n+3)
for (i in 1:b)
{
data.matrix[,1:n] <- sample(6,n,replace=T)
}
Side Note: I keep getting this error
"the condition has length > 1 and only the first element will be used"
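Here is a solution using apply:
a[,7] <- apply(a, 1, function(x) ifelse(max(x, na.rm = TRUE) == 6, "YES", "NO"))
where a is the input data.frame/tibble. If you have a matrix, convert it to a data.frame first; otherwise assigning a character column coerces every entry of the matrix to character. Note that max(x) == 6 detects a 6 only because 6 is the largest possible value here.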
Here is a solution with apply and which:
res <- apply(data.matrix, 1, function(x) {
x[[7]] <- length(which(x == 6)) > 0
x
})
res <- t(res)
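For the "Yes"/"No" labels asked for in the question, no loop is needed. A minimal sketch, assuming the rolls are kept in a numeric matrix and the labels go into a separate data frame column (the names rolls, has_six, six_present and result are illustrative): rowSums counts the sixes in each row.

```r
set.seed(42)
b <- 10  # number of rows
n <- 6   # number of rolls per row
rolls <- matrix(sample(6, b * n, replace = TRUE), nrow = b)

# TRUE for every row that contains at least one 6
has_six <- rowSums(rolls == 6) > 0

# Keep the numbers numeric by storing the labels in a data frame column
result <- data.frame(rolls, six_present = ifelse(has_six, "Yes", "No"))
result
```

Because the comparison rolls == 6 is vectorised over the whole matrix, the "condition has length > 1" warning from if() does not arise.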

Adding repeating pattern while converting list to matrix in R

I am looking for a fast way to convert a list into a matrix with an additional column containing a repeating 1:5 pattern. The list and the repeating pattern can grow to thousands of values in length, so a fast approach would be ideal.
I can convert the list to a matrix using melt (which may not be ideal for large matrices); however, I am having trouble getting the repeating pattern to work.
The list mat looks like this:
mat
[[1]]
[1] 5
[[2]]
[1] 1 4 5
[[3]]
[1] 3 1
[[4]]
[1] 4 6 5 3
The output should contain the values of the list as well as an index column containing a repeating 1:5 pattern that depends on the length of each element of the list. For instance, mat[[4]] contains 4 values, so for those rows the index column should contain the values 1:4.
output
[,1] [,2]
5 1
1 1
4 2
5 3
3 1
1 2
4 1
6 2
5 3
3 4
mat <- list(5, c(1,4,5), c(3,1), c(4,6,5,3)) ## your example data
We can use basic operations:
cbind( unlist(mat), sequence(lengths(mat)) )
# [,1] [,2]
# [1,] 5 1
# [2,] 1 1
# [3,] 4 2
# [4,] 5 3
# [5,] 3 1
# [6,] 1 2
# [7,] 4 1
# [8,] 6 2
# [9,] 5 3
#[10,] 3 4
Alternatively,
cbind( unlist(mat), unlist(lapply(mat, seq_along)) )
Here is another option with Map. We get the sequence of each list element with lapply, cbind the corresponding elements of list using Map and rbind it.
do.call(rbind, Map(cbind, mat, lapply(mat, seq_along)))
# [,1] [,2]
#[1,] 5 1
#[2,] 1 1
#[3,] 4 2
#[4,] 5 3
#[5,] 3 1
#[6,] 1 2
#[7,] 4 1
#[8,] 6 2
#[9,] 5 3
#[10,] 3 4
Or with data.table: we melt the list to a two-column data.frame (the list method of melt comes from reshape2), convert it to a data.table with setDT, and assign (:=) the within-group sequence to 'L1' after grouping by 'L1'.
library(data.table)
setDT(reshape2::melt(mat))[, L1 := seq_len(.N), L1][]
# value L1
# 1: 5 1
# 2: 1 1
# 3: 4 2
# 4: 5 3
# 5: 3 1
# 6: 1 2
# 7: 4 1
# 8: 6 2
# 9: 5 3
#10: 3 4
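None of the examples above contains a list element longer than five values, so the "repeating 1:5" part of the question is never exercised. A minimal sketch, under the assumption that the index should wrap back to 1 after 5 (the question does not state this explicitly); the extra 1:7 element and the names idx and idx_wrapped are made up for illustration.

```r
# Example list from the question plus one made-up long element
mat <- list(5, c(1, 4, 5), c(3, 1), c(4, 6, 5, 3), 1:7)

# Running index within each list element: 1, 1 2 3, 1 2, ...
idx <- sequence(lengths(mat))

# Assumed behaviour: wrap the index so it cycles through 1:5
idx_wrapped <- (idx - 1L) %% 5L + 1L

out <- cbind(value = unlist(mat), index = idx_wrapped)
out
```

For elements with five or fewer values, idx_wrapped equals idx, so the result matches the answers above.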

Apply function across multiple columns in R

I'm trying to write a streamlined function in R to compare multiple columns in a matrix. What is the optimal way to do this in R? Most likely using apply.
I have seen this question crop up a number of times but with some conflicting views on the optimal way to write this.
for ( j in 2:ncol(net) )
{
for ( i in 1:nrow(net) )
{
net[i,j] <- min(net[i,j],net[i,1])
}
}
For example, the output for the following matrix
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 2 3
[3,] 3 2 3
would be
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
[3,] 3 2 3
We can unlist all columns of "net" except the first (net[-1]), replicate the first column to the same length as the unlisted columns, and use pmin to get the minimum of the corresponding elements of the two vectors.
pmin(unlist(net[-1], use.names=FALSE), net[,1][row(net[-1])])
#[1] 2 2 7 5 2 2 2 6 5 3 2 1 0 5 1
If we need a lapply solution,
unlist(lapply(net[-1], function(x) pmin(x, net[,1])), use.names=FALSE)
Using the OP's for loop
for ( i in 2:ncol(net) ){
for ( j in 1:nrow(net) ){
print(min(net[j,i],net[j,1]))
}
}
#[1] 2
#[1] 2
#[1] 7
#[1] 5
#[1] 2
#[1] 2
#[1] 2
#[1] 6
#[1] 5
#[1] 3
#[1] 2
#[1] 1
#[1] 0
#[1] 5
#[1] 1
Update
As the OP mentioned that this does not give the expected output, here it is again with the new data shown in the OP's post:
net <- cbind(1:3, 2, 3)
cbind(net[,1],pmin(unlist(net[,-1], use.names=FALSE),
net[,1][row(net[,-1])]))
# [,1] [,2] [,3]
#[1,] 1 1 1
#[2,] 2 2 2
#[3,] 3 2 3
data
set.seed(24)
net <- as.data.frame(matrix(sample(0:9, 4*5, replace=TRUE), ncol=4))
If there are no NAs you can do
net <- head(airquality, 4) # example data
for (j in 1:nrow(net)) net[j, net[j,]>net[j,1]] <- net[j,1]
net
Here's a version with sapply and ifelse (which is vectorised, woo), which is likely faster, and deals with NA values in a predictable way:
sapply(X = seq(to = ncol(x = net)), FUN = function(j){
net[,j] <- ifelse(test = net[,1] < net[,j], yes = net[,1], no = net[,j])
})
Some sample data
net <- head(airquality)
net
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
results in:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 41 41 7.4 41 5 1
[2,] 36 36 8.0 36 5 2
[3,] 12 12 12.0 12 5 3
[4,] 18 18 11.5 18 5 4
[5,] NA NA NA NA NA NA
[6,] 28 NA 14.9 28 5 6
Note: I specified pretty much all argument names for explicitness; this should not be expected to change the speed noticeably. A simpler (possibly more readable) version:
sapply(seq(ncol(net)), function(j){
net[,j] <- ifelse(net[,1] < net[,j], net[,1], net[,j])
})
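For a plain numeric matrix, the whole operation can be sketched as a single pmin call. This assumes net is a matrix (not a data frame) without NA values: pmin recycles the first column down every column and keeps the dimensions of its first argument. The name capped is illustrative.

```r
net <- cbind(1:3, 2, 3)  # the example matrix from the question

# net[, 1] has nrow(net) elements, so recycling aligns it with each column in turn
capped <- pmin(net, net[, 1])
capped
```

This relies on R storing matrices column by column. It would not carry over unchanged to a row-wise comparison.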

Merging multiple columns in single data frame in R

I have a bizarre problem: I've combined several data frames that hold different species abundance data. I used rbind.fill() to collate the data frames, but some of the column names for the same species are spelled slightly differently, so for several species I have 2-3 columns.
Does anyone know of a way I can merge the data from these columns together?
Simple example:
dat <- matrix(data=c(
Sp.a=c(1,2,3,4,5,NA,NA,NA,NA,NA),
Sp.b=c(3,4,5,6,7,5,4,6,3,4),
Sp.c=c(4,4,4,3,2,NA,NA,NA,NA,NA),
Spp.A=c(NA,NA,NA,NA,NA,2,3,4,2,3),
Spp.C=c(NA,NA,NA,NA,NA,3,4,2,5,4)
), 10,5)
colnames(dat) <- c("sp.a", "sp.b", "sp.c", "Spp.A", "Spp.C")
dat
sp.a sp.b sp.c Spp.A Spp.C
[1,] 1 3 4 NA NA
[2,] 2 4 4 NA NA
[3,] 3 5 4 NA NA
[4,] 4 6 3 NA NA
[5,] 5 7 2 NA NA
[6,] NA 5 NA 2 3
[7,] NA 4 NA 3 4
[8,] NA 6 NA 4 2
[9,] NA 3 NA 2 5
[10,] NA 4 NA 3 4
How can I get sp.a and Spp.A into a single column? (same for sp.c and Spp.C).
Thanks for any help,
Paul
Using reshape2 and going from wide --> long --> wide (again) format:
library(reshape2)
## long format
dat.m <- melt(dat)
## remove missing values
dat.m <- dat.m[!is.na(dat.m$value),]
## rename names
dat.m$Var2 <- tolower(sub("Spp","Sp", dat.m$Var2) )
## wide format
dcast(Var1~Var2,data=dat.m)
Var1 sp.a sp.b sp.c
1 1 1 3 4
2 2 2 4 4
3 3 3 5 4
4 4 4 6 3
5 5 5 7 2
6 6 2 5 3
7 7 3 4 4
8 8 4 6 2
9 9 2 3 5
10 10 3 4 4
Here's one way. This is pretty general, and would even work if you had one series divided over three or more columns.
dat <- data.frame(dat)
# get the last letter of each column and make it lowercase,
# we'll be grouping the columns by this
ns <- tolower(gsub('^.+\\.', '', names(dat)))
# group the columns by their last letter, and run each group through pmax
result <- lapply(split.default(dat, ns), function(x) do.call(function(...) pmax(..., na.rm=TRUE), x))
do.call(cbind, result)
# a b c
# [1,] 1 3 4
# [2,] 2 4 4
# [3,] 3 5 4
# [4,] 4 6 3
# [5,] 5 7 2
# [6,] 2 5 3
# [7,] 3 4 4
# [8,] 4 6 2
# [9,] 2 3 5
# [10,] 3 4 4
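dat <- as.data.frame(dat)
# columns for species "a", whatever their spelling: the first is preferred, the second fills its gaps
ColsToMerge <- grep("\\.a$", colnames(dat), ignore.case = TRUE, value = TRUE)
dat[["A.merged"]] <-
apply(dat[, ColsToMerge], 1, function(rr) ifelse(is.na(rr[[1]]), rr[[2]], rr[[1]]))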
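When two spellings might both hold a value in the same row, pmax keeps the larger one. A minimal sketch of a variant that keeps the first non-missing value instead, assuming columns are grouped by the lower-cased letter after the dot as in the answer above. first_non_na is a hypothetical helper, not a library function, and the names key and merged are illustrative.

```r
# Example data from the question (column names as in the printed matrix)
dat <- data.frame(
  sp.a  = c(1, 2, 3, 4, 5, NA, NA, NA, NA, NA),
  sp.b  = c(3, 4, 5, 6, 7, 5, 4, 6, 3, 4),
  sp.c  = c(4, 4, 4, 3, 2, NA, NA, NA, NA, NA),
  Spp.A = c(NA, NA, NA, NA, NA, 2, 3, 4, 2, 3),
  Spp.C = c(NA, NA, NA, NA, NA, 3, 4, 2, 5, 4)
)

# Hypothetical helper: keep x where present, otherwise fall back to y
first_non_na <- function(x, y) ifelse(is.na(x), y, x)

# Group columns by the lower-cased letter after the dot
key <- tolower(sub("^.+\\.", "", names(dat)))

# Within each group, fold the columns together from left to right
merged <- as.data.frame(
  lapply(split.default(dat, key), function(cols) Reduce(first_non_na, cols))
)
merged
```

Reduce handles any number of columns per group, so a species spread over three spellings is merged the same way.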
