Change values of matrix where row names equal column names - r

I am trying to change the values of a matrix so that, for each element where the row name equals the column name, the resulting matrix will have a value of one.
> z<-matrix(0, nrow=10, ncol=8)
> colnames(z)<-letters[1:8]
> rownames(z)<-c("f", "c", "a", "f", "a", "b", "f", "b", "h", "c")
> z
a b c d e f g h
f 0 0 0 0 0 0 0 0
c 0 0 0 0 0 0 0 0
a 0 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0
a 0 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0 0
h 0 0 0 0 0 0 0 0
c 0 0 0 0 0 0 0 0
z should be:
a b c d e f g h
f 0 0 0 0 0 1 0 0
c 0 0 1 0 0 0 0 0
a 1 0 0 0 0 0 0 0
f 0 0 0 0 0 1 0 0
a 1 0 0 0 0 0 0 0
b 0 1 0 0 0 0 0 0
f 0 0 0 0 0 1 0 0
b 0 1 0 0 0 0 0 0
h 0 0 0 0 0 0 0 1
c 0 0 1 0 0 0 0 0
I tried:
> z[unique(rownames(z)), unique(rownames(z))]<-1
> z
a b c d e f g h
f 1 1 1 0 0 1 0 1
c 1 1 1 0 0 1 0 1
a 1 1 1 0 0 1 0 1
f 0 0 0 0 0 0 0 0
a 0 0 0 0 0 0 0 0
b 1 1 1 0 0 1 0 1
f 0 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0 0
h 1 1 1 0 0 1 0 1
c 0 0 0 0 0 0 0 0
and:
> z["a", "a"]<-1
> z
a b c d e f g h
f 0 0 0 0 0 0 0 0
c 0 0 0 0 0 0 0 0
a 1 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0
a 0 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0 0
f 0 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0 0
h 0 0 0 0 0 0 0 0
c 0 0 0 0 0 0 0 0
but that only changed the first 'a' in the 'a' column.
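Both attempts fail for reasons that a small example makes visible. The sketch below uses a made-up 4 x 3 matrix (not the one from the question) to illustrate two indexing behaviours: a duplicated name resolves to its first occurrence only, and two index vectors together select a whole block of cells rather than individual pairs.

```r
z <- matrix(0, nrow = 4, ncol = 3,
            dimnames = list(c("a", "b", "a", "c"), c("a", "b", "c")))

# Character indexing resolves a duplicated name to its first occurrence only
which(rownames(z) == "a")   # there are two rows named "a"
z["a", "a"] <- 1
z[, "a"]                    # only the first "a" row was set

# Two index vectors select a block: every listed row crossed with every listed column
z[] <- 0
z[c("a", "b"), c("a", "b")] <- 1
sum(z)                      # four cells changed, not just the two name matches
```

So the fix has to address individual (row, column) cells, either through a logical mask or through a two-column index matrix.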

You can also do this with base R using outer.
z[outer(rownames(z), colnames(z), "==")] <- 1
z
a b c d e f g h
f 0 0 0 0 0 1 0 0
c 0 0 1 0 0 0 0 0
a 1 0 0 0 0 0 0 0
f 0 0 0 0 0 1 0 0
a 1 0 0 0 0 0 0 0
b 0 1 0 0 0 0 0 0
f 0 0 0 0 0 1 0 0
b 0 1 0 0 0 0 0 0
h 0 0 0 0 0 0 0 1
c 0 0 1 0 0 0 0 0
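To see why this works, it can help to look at the intermediate logical matrix. The following is a small sketch with made-up names (`rn`, `cn` and `mask` are illustrative, not from the answer):

```r
rn <- c("f", "c", "a")
cn <- c("a", "c", "f")

# outer() applies "==" to every (row name, column name) pair and returns a
# length(rn) x length(cn) logical matrix
mask <- outer(rn, cn, "==")
mask

z <- matrix(0, 3, 3, dimnames = list(rn, cn))
z[mask] <- 1   # logical matrix index: only the TRUE cells are assigned
z
```

The mask has as many cells as the matrix itself, which is one reason this approach uses more memory and time on large inputs than indexing by position.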

Another option, a modification of @akrun's second option:
z[sapply(colnames(z), `==`, rownames(z))] <- 1
which also gives the correct answer:
> z
a b c d e f g h
f 0 0 0 0 0 1 0 0
c 0 0 1 0 0 0 0 0
a 1 0 0 0 0 0 0 0
f 0 0 0 0 0 1 0 0
a 1 0 0 0 0 0 0 0
b 0 1 0 0 0 0 0 0
f 0 0 0 0 0 1 0 0
b 0 1 0 0 0 0 0 0
h 0 0 0 0 0 0 0 1
c 0 0 1 0 0 0 0 0
Unlike @akrun's 'dimnames' solution, the approach above sets only the matching cells to 1, which is an advantage when the original matrix contains values other than zero. The 'outer' option from @lmo and the 'cbind' option from @akrun behave the same way.
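The difference only shows up when the matrix holds other values. A minimal sketch, assuming a made-up 3 x 2 matrix with non-zero entries (`keep` and `rebuilt` are illustrative names):

```r
z <- matrix(1:6 * 10, nrow = 3, dimnames = list(c("b", "a", "b"), c("a", "b")))

# In-place assignment through a logical mask touches only the matching cells
keep <- z
keep[outer(rownames(keep), colnames(keep), "==")] <- 1
keep

# Rebuilding the matrix from the comparison discards the original values
rebuilt <- `dimnames<-`(+(sapply(colnames(z), `==`, rownames(z))), dimnames(z))
rebuilt
```

In `keep` the non-matching cells still hold 10, 30 and 50, while `rebuilt` contains only zeros and ones.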

We can use row/column indexing to change the elements to 1
z[cbind(1:nrow(z), match( rownames(z), colnames(z)))] <- 1
z
# a b c d e f g h
#f 0 0 0 0 0 1 0 0
#c 0 0 1 0 0 0 0 0
#a 1 0 0 0 0 0 0 0
#f 0 0 0 0 0 1 0 0
#a 1 0 0 0 0 0 0 0
#b 0 1 0 0 0 0 0 0
#f 0 0 0 0 0 1 0 0
#b 0 1 0 0 0 0 0 0
#h 0 0 0 0 0 0 0 1
#c 0 0 1 0 0 0 0 0
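The one-liner above assumes every row name also appears among the column names. The sketch below shows the same technique on a made-up matrix that contains a hypothetical unmatched row "x", and drops the unmatched rows explicitly before indexing instead of depending on how NA indices are handled.

```r
z <- matrix(0, nrow = 4, ncol = 3,
            dimnames = list(c("c", "x", "a", "c"), c("a", "b", "c")))

# match() gives, for each row name, the position of the equal column name
# (NA when no column has that name, as for the row "x")
j <- match(rownames(z), colnames(z))
j
ok <- !is.na(j)

# Each row of the two-column index matrix addresses one (row, column) cell
idx <- cbind(which(ok), j[ok])
z[idx] <- 1
z
```

Because only one index pair per row is built, the work grows with the number of rows rather than with the number of cells, which is consistent with the timings reported further down.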
Another option (likely slower for big datasets) is
`dimnames<-`(+(sapply(colnames(z), `==`, rownames(z))), dimnames(z))
# a b c d e f g h
#f 0 0 0 0 0 1 0 0
#c 0 0 1 0 0 0 0 0
#a 1 0 0 0 0 0 0 0
#f 0 0 0 0 0 1 0 0
#a 1 0 0 0 0 0 0 0
#b 0 1 0 0 0 0 0 0
#f 0 0 0 0 0 1 0 0
#b 0 1 0 0 0 0 0 0
#h 0 0 0 0 0 0 0 1
#c 0 0 1 0 0 0 0 0
NOTE: Both solutions use base R only; neither relies on an external package.
Benchmarks
z1 <- matrix(0, 5000, 5000)
colnames(z1) <- 1:5000
set.seed(24)
row.names(z1) <- sample(1:5000, 5000, replace=TRUE)
z2 <- z1
z3 <- z1
z4 <- z1
system.time(z1[cbind(1:nrow(z1), match( rownames(z1), colnames(z1)))] <- 1)
# user system elapsed
# 0.03 0.08 0.11
system.time(z2[outer(rownames(z2), colnames(z2), "==")] <- 1)
# user system elapsed
# 0.67 0.16 0.83
identical(z1, z2)
#[1] TRUE
system.time( `dimnames<-`(+(sapply(colnames(z3), `==`, rownames(z3))), dimnames(z3)))
# user system elapsed
# 31.70 0.39 32.28
system.time(z3[vapply(colnames(z3), function(x) x== rownames(z3),
logical(nrow(z3)))] <- 1)
# user system elapsed
# 0.22 0.00 0.21
Testing with @Procrastinatus Maximus's modification
system.time(z4[sapply(colnames(z4), `==`, rownames(z4))] <- 1)
# user system elapsed
# 28.42 0.36 28.85
On a 10000 x 10000 matrix, the timings are
system.time(z1[cbind(1:nrow(z1), match( rownames(z1), colnames(z1)))] <- 1)
# user system elapsed
# 0.12 0.32 0.44
system.time(z2[outer(rownames(z2), colnames(z2), "==")] <- 1)
# user system elapsed
# 2.72 0.86 3.58
and on a 20000 x 20000 matrix
system.time(z1[cbind(1:nrow(z1), match( rownames(z1), colnames(z1)))] <- 1)
# user system elapsed
# 0.95 1.00 1.95
system.time(z2[outer(rownames(z2), colnames(z2), "==")] <- 1)
# user system elapsed
# 15.47 5.87 21.39


Split variable into multiple factor variables

I have some dataset similar to this:
df <- data.frame(n = seq(1:1000000), x = sample(LETTERS, 1000000, replace = T))
I'm looking for guidance on how to split variable x into multiple indicator variables, each taking the value 0 or 1.
In the end it would look like this:
n x A B C D E F G H . . .
1 D 0 0 0 1 0 0 0 0 . . .
2 B 0 1 0 0 0 0 0 0 . . .
3 F 0 0 0 0 0 1 0 0 . . .
In my dataset there are many more codes in variable x, so adding each new variable manually would be too time consuming.
I was thinking about sorting the codes in x, assigning each a unique number, and then writing a loop that creates a new variable for each code.
But I feel like I'm overcomplicating things.
A fast and easy way is to use fastDummies::dummy_cols:
fastDummies::dummy_cols(df, "x")
An alternative with tidyverse functions:
library(tidyverse)
df %>%
left_join(., df %>% mutate(value = 1) %>%
pivot_wider(names_from = x, values_from = value, values_fill = 0) %>%
relocate(n, sort(colnames(.)[-1])))
output
> dummy <- fastDummies::dummy_cols(df, "x")
> colnames(dummy)[-c(1,2)] <- LETTERS
> dummy
n x A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 1 Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2 2 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 3 E 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 4 H 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 5 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
6 6 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
7 7 R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
8 8 F 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 9 Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
10 10 S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Benchmark
Since there are many solutions and the question involves a large dataset, a benchmark might help. The nnet solution is the fastest according to the benchmark.
set.seed(1)
df <- data.frame(n = seq(1:1000000), x = sample(LETTERS, 1000000, replace = T))
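library(microbenchmark)
library(ggplot2) # needed for autoplot() on microbenchmark results
# fModel.matrix(), fContrasts(), etc. are wrappers around the respective solutions; their definitions are not shown here
bm <- microbenchmark(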
fModel.matrix(),
fContrasts(),
fnnet(),
fdata.table(),
fFastDummies(),
fDplyr(),
times = 10L,
setup = gc(FALSE)
)
autoplot(bm)
Using match. First create a vector of zeroes, then match the letter in each df row against the alphabet and set that position to 1. You may use the built-in LETTERS constant. Finally, Vectorize the function and cbind the result.
f <- \(x) {
z <- numeric(length(LETTERS))
z[match(x, LETTERS)] <- 1
setNames(z, LETTERS)
}
cbind(df, t(Vectorize(f)(df$x)))
# n x A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
# Q 1 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
# E 2 E 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# A 3 A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# Y 4 Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
# J 5 J 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# D 6 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# R 7 R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
# Z 8 Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
# Q.1 9 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
# O 10 O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Alternatively, transform x to a factor with LETTERS as levels and use model.matrix.
df <- transform(df, x=factor(x, levels=LETTERS))
cbind(df, `colnames<-`(model.matrix(~ 0 + x, df), LETTERS))
# n x A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
# 1 1 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
# 2 2 E 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 3 3 A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 4 4 Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
# 5 5 J 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 6 6 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 7 7 R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
# 8 8 Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
# 9 9 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
# 10 10 O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
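The `0 +` in the formula matters: it removes the intercept, so every level gets its own indicator column. A small sketch with a made-up three-level factor (the names `with_intercept` and `full` are illustrative):

```r
x <- factor(c("B", "A", "C", "A"), levels = c("A", "B", "C"))

# With an intercept, the first level is absorbed into the baseline column
with_intercept <- model.matrix(~ x)
colnames(with_intercept)   # "(Intercept)" "xB" "xC"

# Removing the intercept keeps one indicator column per level
full <- model.matrix(~ 0 + x)
colnames(full)             # "xA" "xB" "xC"
rowSums(full)              # each row has exactly one 1
```

The column names carry the variable name as a prefix, which is why the answer overwrites them with LETTERS.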
Data:
n <- 10
set.seed(42)
df <- data.frame(n = seq(1:n), x = sample(LETTERS, n, replace = T))
Using data.table:
library(data.table)
setDT(df) #make df a data.table if needed
merge(df, dcast(df, n ~ x, fun.agg = length), by = c("n"))
The main question here is one of resources, I think. I found that using nnet is a fast solution:
library(nnet)
library(dplyr)
df %>% cbind(class.ind(.$x) == 1) %>%
mutate(across(-c(n, x), ~.*1))
n x A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 1 E 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 H 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 L 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 4 M 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
5 5 R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
6 6 A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 7 Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
8 8 Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
9 9 F 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 10 U 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
11 11 O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
12 12 I 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 13 O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
14 14 Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
15 15 P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
16 16 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
17 17 F 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 18 K 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 19 H 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 20 V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
21 21 V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
22 22 G 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 23 P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
24 24 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
25 25 V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
26 26 R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
27 27 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
28 28 B 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29 29 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30 30 M 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
31 31 E 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32 32 V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
33 33 S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
34 34 Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
35 35 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
[ reached 'max' / getOption("max.print") -- omitted 999965 rows ]
Another option would be to use ==.
. <- unique(df$x)
cbind(df, +do.call(cbind, lapply(setNames(., .), `==`, df$x)))
# n x C I L T Y
#1 1 I 0 1 0 0 0
#2 2 C 1 0 0 0 0
#3 3 C 1 0 0 0 0
#4 4 Y 0 0 0 0 1
#5 5 L 0 0 1 0 0
#6 6 T 0 0 0 1 0
#...
Or in one line using sapply.
cbind(df, +sapply(unique(df$x), `==`, df$x))
Or use contrasts and match them to df$x.
. <- contrasts(as.factor(df$x), FALSE)
#. <- contrasts(as.factor(unique(df$x)), FALSE) #Alternative
cbind(df, .[match(df$x, rownames(.)),])
#cbind(df, .[fastmatch::fmatch(df$x, rownames(.)),]) #Alternative
Or indexing in a matrix.
. <- unique(df$x) #Could be sorted
#. <- collapse::funique(df$x) #Alternative
#. <- kit::funique(df$x) #Alternative
i <- match(df$x, .)
#i <- fastmatch::fmatch(df$x, .) #Alternative
#i <- data.table::chmatch(df$x, .) #Alternative
nc <- length(.)
nr <- length(i)
cbind(df, matrix(`[<-`(integer(nc * nr), 1:nr + nr * (i - 1), 1), nr, nc,
dimnames=list(NULL, .)))
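The expression `1:nr + nr * (i - 1)` relies on R storing matrices column by column. A minimal sketch on a four-element toy vector, with illustrative names (`lev`, `pos`, `m`), shows how each (row, column) pair becomes a single position:

```r
x   <- c("B", "A", "B", "C")
lev <- unique(x)             # "B" "A" "C"
i   <- match(x, lev)         # column of each row: 1 2 1 3
nr  <- length(x)
nc  <- length(lev)

# Column-major storage: cell (r, c) sits at position r + nr * (c - 1)
pos <- seq_len(nr) + nr * (i - 1)
pos

m <- matrix(0L, nr, nc, dimnames = list(NULL, lev))
m[pos] <- 1L
m
```

Only one position per row is written, so no comparison across all codes is needed.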
Or using outer.
. <- unique(df$x)
cbind(df, +outer(df$x, setNames(., .), `==`))
Or using rep and matrix.
. <- unique(df$x)
n <- nrow(df)
cbind(df, +matrix(df$x == rep(., each=n), n, dimnames=list(NULL, .)))
Benchmark of some methods which will work for more codes in variable x and not only for e.g. LETTERS.
set.seed(42)
df <- data.frame(n = seq(1:1000000), x = sample(LETTERS, 1000000, replace = T))
library(nnet)
library(dplyr)
microbenchmark::microbenchmark(times = 10L, setup = gc(FALSE), control=list(order="block")
, "nnet" = df %>% cbind(class.ind(.$x) == 1) %>%
mutate(across(-c(n, x), ~.*1))
, "contrasts" = {. <- contrasts(as.factor(df$x), FALSE)
cbind(df, .[match(df$x, rownames(.)),])}
, "==" = {. <- unique(df$x)
cbind(df, +do.call(cbind, lapply(setNames(., .), `==`, df$x)))}
, "==Sapply" = cbind(df, +sapply(unique(df$x), `==`, df$x))
, "matrix" = {. <- unique(df$x)
i <- match(df$x, .)
nc <- length(.)
nr <- length(i)
cbind(df, matrix(`[<-`(integer(nc * nr), 1:nr + nr * (i - 1), 1), nr, nc,
dimnames=list(NULL, .)))}
, "outer" = {. <- unique(df$x)
cbind(df, +outer(df$x, setNames(., .), `==`))}
, "rep" = {. <- unique(df$x)
n <- nrow(df)
cbind(df, +matrix(df$x == rep(., each=n), n, dimnames=list(NULL, .)))}
)
Result
Unit: milliseconds
expr min lq mean median uq max neval
nnet 208.6898 220.2304 326.2210 305.5752 386.3385 541.0621 10
contrasts 1110.0123 1168.7651 1263.5357 1216.1403 1357.0532 1514.4411 10
== 146.2217 156.8141 208.2733 185.1860 275.3909 278.8497 10
==Sapply 290.0458 291.4543 301.3010 295.0557 298.0274 358.0531 10
matrix 302.9993 304.8305 312.9748 306.8981 310.0781 363.0773 10
outer 524.5230 583.5224 603.3300 586.3054 595.4086 807.0260 10
rep 276.2110 285.3983 389.8187 434.2754 435.8607 442.3403 10

R multiple for loop

I have this loop over the file msp.chr1
for(i in names(msp.chr1[c(7:70)])){
tmp <- rle(msp.chr1[[i]])$lengths
msp.chr1$idx <- rep(1:length(tmp),tmp)
tmp2 <- unlist(by(msp.chr1[msp.chr1[[i]]==1,], list(msp.chr1$idx[msp.chr1[[i]]==1]),function(x){tail(x["epos"],1)-head(x["spos"],1)}))
assign(paste(i, ".chr1", sep=""), as.vector(tmp2))
rm(i); rm(tmp); rm(tmp2)
}
This file is a dataframe with multiple columns:
head(msp.chr1)
chm spos epos sgpos egpos nsnps PDAC1.0 PDAC1.1 PDAC10.0 PDAC10.1 PDAC100.0 PDAC100.1 PDAC101.0 PDAC101.1 PDAC102.0 PDAC102.1 PDAC103.0 PDAC103.1
1 1 123492 134160 0.12 0.13 252 0 0 0 0 1 0 0 0 0 0 0 0
2 1 134160 135025 0.13 0.14 20 0 0 0 0 1 0 0 0 0 0 0 0
3 1 135025 145600 0.14 0.15 150 0 0 0 0 1 0 0 0 0 0 0 0
4 1 145600 316603 0.15 0.32 195 0 1 0 0 1 0 0 1 0 0 0 1
5 1 316603 520140 0.32 0.52 765 0 0 0 0 0 0 0 0 0 0 0 0
6 1 520140 667054 0.52 0.67 1080 0 0 0 0 0 0 0 0 0 0 0 0
PDAC104.0 PDAC104.1 PDAC105.0 PDAC105.1 PDAC11.0 PDAC11.1 PDAC12.0 PDAC12.1 PDAC13.0 PDAC13.1 PDAC14.0 PDAC14.1 PDAC15.0 PDAC15.1 PDAC17.0 PDAC17.1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
3 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PDAC18.0 PDAC18.1 PDAC19.0 PDAC19.1 PDAC2.0 PDAC2.1 PDAC20.0 PDAC20.1 PDAC21.0 PDAC21.1 PDAC22.0 PDAC22.1 PDAC23.0 PDAC23.1 PDAC24.0 PDAC24.1 PDAC25.0
1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
2 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0
3 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
PDAC25.1 PDAC3.0 PDAC3.1 PDAC4.0 PDAC4.1 PDAC5.0 PDAC5.1 PDAC6.0 PDAC6.1 PDAC7.0 PDAC7.1 PDAC8.0 PDAC8.1 PDAC807.0 PDAC807.1 PDAC810.0 PDAC810.1
1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
PDAC9.0 PDAC9.1 idx
1 0 0 1
2 0 0 1
3 0 0 1
4 0 0 1
5 1 0 1
6 1 0 1
But I actually have 23 files, named msp.chr1, msp.chr2, ..., msp.chr23.
I want to add another loop around the one above, to process all files at once.
I tried several things but none of them worked.
Basically, every chr1 in my loop (including in the assign) should be replaced by chr1 to chr23.
Can you help?
Thanks,
You can build the name of each data frame with paste and then fetch the object by that name with get. A better option would be to store the data frames in a list, so that you only need the index j, as in df = list[[j]].
for(j in 1:23){
  # fetch the data frame msp.chr<j> by name
  df = get(paste("msp.chr",j,sep=""))
  for(i in names(df[c(7:70)])){
    tmp <- rle(df[[i]])$lengths
    df$idx <- rep(1:length(tmp),tmp)
    tmp2 <- unlist(by(df[df[[i]]==1,], list(df$idx[df[[i]]==1]),function(x){tail(x["epos"],1)-head(x["spos"],1)}))
    # name the result after both the column and the chromosome, e.g. PDAC1.0.chr5
    assign(paste(i, ".chr", j, sep=""), as.vector(tmp2))
    rm(i); rm(tmp); rm(tmp2)
  }
}
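The list-based variant mentioned above could look roughly like the following sketch. It uses two tiny made-up data frames with hypothetical sample columns S1 and S2 in place of the real msp.chr objects, and a helper `run_lengths` that is not part of the original answer. With the real objects, something like `mget(paste0("msp.chr", 1:23))` would collect them into a named list.

```r
# Toy stand-ins for msp.chr1, msp.chr2 (hypothetical data, two sample columns)
make_chr <- function(flags) {
  data.frame(spos = c(0, 10, 20, 30), epos = c(10, 20, 30, 40),
             S1 = flags, S2 = rev(flags))
}
msp <- list(chr1 = make_chr(c(1, 1, 0, 1)), chr2 = make_chr(c(0, 1, 1, 1)))

# Span of each run of 1s in one column: last epos minus first spos of the run
run_lengths <- function(df, col) {
  runs <- rle(df[[col]])
  idx  <- rep(seq_along(runs$lengths), runs$lengths)
  is1  <- df[[col]] == 1
  last_epos  <- tapply(df$epos[is1], idx[is1], function(e) e[length(e)])
  first_spos <- tapply(df$spos[is1], idx[is1], function(s) s[1])
  as.vector(last_epos - first_spos)
}

# One result per chromosome and per sample column, addressed by name
res <- lapply(msp, function(df) {
  cols <- c("S1", "S2")
  setNames(lapply(cols, run_lengths, df = df), cols)
})
res$chr1$S1
```

Keeping the results in a nested list avoids creating hundreds of global variables with assign, and an individual result is still reachable as `res$chr1$S1`.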

Filling a table with additional columns if they don't exist

I have the following problem. Here is a short example of my data. Assume that I have two data sets (my real example has about 20). The data frames are returned as a list by a self-written function called with lapply, so I put the data frames in my example in a list, too. Then I rbind them to compute a frequency table.
df1 <- data.frame(rev(seq(12:0)), paste0("a=",sample(0:12, 13, replace=T)))
colnames(df1) <- c("k", "a")
df2 <- data.frame(rev(seq(12:0)), paste0("a=",sample(0:12, 13, replace=T)))
colnames(df2) <- c("k", "a")
list_df <- list(df1,df2)
df_combine<- plyr::ldply(list_df, rbind)
freq_foo <- table(df_combine$k,df_combine$a)
I get a frequency table of the following form.
a=0 a=11 a=12 a=2 a=5 a=6 a=7 a=8 a=3 a=9
1 1 0 0 0 0 0 0 1 0 0
2 1 0 0 0 0 0 0 0 0 1
3 1 0 0 0 0 1 0 0 0 0
4 0 0 0 1 0 1 0 0 0 0
5 0 0 0 1 1 0 0 0 0 0
6 0 0 0 0 0 0 1 0 0 1
7 0 1 1 0 0 0 0 0 0 0
8 1 0 0 0 0 1 0 0 0 0
9 0 0 0 0 0 0 2 0 0 0
10 0 0 1 0 1 0 0 0 0 0
11 1 1 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 1 0 1 0
13 1 0 1 0 0 0 0 0 0 0
I want to extend and manipulate my table in the following way:
First, the table should cover the range a=0 to a=15, so any missing column should be added. Second, I want to order the columns from 0 to 15.
For the first problem I tried
if(freq_foo$paste0("a=",0:15) == F){freq_foo$paste("a=",0:15) <- 0}
but this would only work for data frames, not for tables. Also, I have no idea how to sort the columns in ascending order. The data type isn't important to me because I just want to use the output for further calculations, so it can also be a data frame instead of a table.
#convert freq_foo table to dataframe
df <- as.data.frame.matrix(freq_foo)
#add all zeros column for missing column name in 0:15 series
df[, paste0("a=", c(0:15)[!(c(0:15) %in% as.numeric(gsub(".*=(\\d+)", "\\1", names(df))))])] <- 0
#order columns from 0 to 15
df <- df[, order(as.numeric(gsub(".*=(\\d+)", "\\1", names(df))))]
Output is:
a=0 a=1 a=2 a=3 a=4 a=5 a=6 a=7 a=8 a=9 a=10 a=11 a=12 a=13 a=14 a=15
1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
2 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
3 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0
5 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
6 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
7 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0
8 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
10 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
11 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
12 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
13 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
(Edit: Updated code after getting a requirement clarification from OP)
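A different route, not taken in the answer above, is to declare all expected levels before tabulating, so that table() produces the missing columns and the ordering by itself. The sketch below uses a small made-up `df_combine` rather than the data from the question:

```r
set.seed(1)
df_combine <- data.frame(k = rep(1:3, 2),
                         a = paste0("a=", sample(0:12, 6, replace = TRUE)))

# Declaring all 16 levels up front makes table() emit every column,
# in level order, with zeros where a value never occurs
lev <- paste0("a=", 0:15)
freq_foo <- table(df_combine$k, factor(df_combine$a, levels = lev))
colnames(freq_foo)
```

The result is still a table; as.data.frame.matrix(freq_foo) converts it to a data frame if that is more convenient for later calculations.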

r solve.QP: constraints are inconsistent, no solution

I am trying to find the global minimum of this problem and I cannot figure out why I am getting the error above. I am trying to set 5 of the assets equal to the exact weights and optimize the other 5 within a range of values. I would prefer not to use the meq=5 option.
dvec<-matrix(0, 1,ncol(dmat))
dmat
A B C D E F G H I J
A 6.85E-08 -0.000000039 -0.00000242 1.00E-07 -0.00000206 -0.00000102 -1.14E-07 -0.000000531 -0.00000137 -0.00000132
B -3.90E-08 0.001124367 0.000190585 -2.08E-06 0.000221485 0.000153652 5.99E-05 0.000038 0.0000762 0.000200415
C -2.42E-06 0.000190585 0.001730743 1.30E-07 0.000878497 0.000926944 6.45E-05 0.000339591 0.000958817 0.000665363
D 1.00E-07 -0.00000208 0.00000013 9.68E-07 -0.00000198 -0.00000106 -3.39E-07 0.000000912 0.00000142 0.00000279
E -2.06E-06 0.000221485 0.000878497 -1.98E-06 0.000857829 0.000590873 4.15E-05 0.00025093 0.000521244 0.000455809
F -1.02E-06 0.000153652 0.000926944 -1.06E-06 0.000590873 0.001226696 4.72E-05 0.000198401 0.000512625 0.000343511
G -1.14E-07 0.0000599 0.0000645 -3.39E-07 0.0000415 0.0000472 4.45E-05 0.0000435 0.000052 0.0000425
H -5.31E-07 0.000038 0.000339591 9.12E-07 0.00025093 0.000198401 4.35E-05 0.000362761 0.00031198 0.000224669
I -1.37E-06 0.0000762 0.000958817 1.42E-06 0.000521244 0.000512625 5.20E-05 0.00031198 0.00096765 0.000514901
J -1.32E-06 0.000200415 0.000665363 2.79E-06 0.000455809 0.000343511 4.25E-05 0.000224669 0.000514901 0.000748266
amat
A B C D E F G H I J A B C D E F G H I J
A -1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
B 0 -1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
C 0 0 -1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
D 0 0 0 1 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0
E 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
F 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 1 0 0 0 0
G 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 1 0 0 0
H 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 1 0 0
I 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 1 0
J 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 1
bvec
A B C D E F G H I J A B C D E F G H I J
(757,631) (805) (770,471) (71,668) (10,011,652) (5,870,322) (10,942,502) (52,569) (10,582,791) (5,293,429) - - - - - 5,870,322 10,942,502 52,569 10,582,791 5,293,429
sol<-solve.QP(dmat, dvec, amat, bvec, meq=0)
The error mentioned in the subject line could be due to the matrix not being positive definite. As the OP figured out, is.positive.definite() (available, for example, in the matrixcalc package) is a way to check this.
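If installing an extra package is not desired, positive definiteness can also be checked in base R through the eigenvalues. This is a minimal sketch on two made-up 3 x 3 matrices, not the dmat from the question; the helper `is_pd` and its tolerance `tol` are assumptions chosen for illustration.

```r
# Hypothetical symmetric matrices, not the dmat from the question
good <- matrix(c(2, 1, 0,
                 1, 2, 1,
                 0, 1, 2), 3, 3)
bad  <- matrix(c(1, 2, 0,
                 2, 1, 0,
                 0, 0, 1), 3, 3)

# A symmetric matrix is positive definite when all its eigenvalues are > 0;
# tol guards against eigenvalues that are zero up to rounding error
is_pd <- function(m, tol = 1e-8) {
  isSymmetric(m) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > tol)
}
is_pd(good)
is_pd(bad)
```

For a matrix with entries as small as those in the question, the tolerance may need to be scaled to the magnitude of the data.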

r error recursive indexing failed at level 2 matrix

Each row of the matrix book stores the latitude and longitude in columns 2 and 3, and columns 6 to n store the indices of the points that are within 600 m of the ith point. In the code below, I am trying to check whether any point in the ith row is within range of the jth point. If so, I append the indices of both rows. But while doing so, I get the error: Error in *tmp*[[j]] : recursive indexing failed at level 2
This is the data set
vehicle_id longt latit date B B B B B B B B B B B B B B B
1 19967 86.2885 23.8210 27 3 1 2 6 0 0 0 0 0 0 0 0 0 0 0
2 19967 86.2891 23.8200 27 2 2 6 0 0 0 0 0 0 0 0 0 0 0 0
3 19967 86.5343 23.8254 27 1 3 0 0 0 0 0 0 0 0 0 0 0 0 0
4 19967 86.7273 23.8200 27 1 4 0 0 0 0 0 0 0 0 0 0 0 0 0
5 19967 86.1362 23.7538 28 1 5 0 0 0 0 0 0 0 0 0 0 0 0 0
6 19967 86.2839 23.8212 28 1 6 0 0 0 0 0 0 0 0 0 0 0 0 0
B B B B B B B B B B B B B
1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0
I would like to know why I am getting this error and how I can resolve it.
library(geosphere) # provides distm() and distHaversine()
S=0
for(i in 1:(nrow(book)-1))
{
  if( book[i, 1] != book[i+1,1] )
  {
    next
  }
  for(j in i:nrow(book))
  {
    if( book[i,1]!=book[j,1])
    {
      break
    }
    if( book[i,1]==book[j,1] & (book[i,5] > 2 ))
    {
      for( k in 7:(5+book[i,5]))
        if(distm (c(book[book[i,k],3], book[book[i,k],2]), c(book[j,3], book[j,2]), fun = distHaversine) < 600)
        {
          S=book[i,5]+book[j,5]
          if (S-k > 0)
          {
            B<- matrix(0,nrow(book),(S-k))
            book <- cbind(book,B)
            book[i,5]=book[i,5]+book[j,5]
          }
          book[i,(6+book[i,5]):((6+book[i,5])+book[j,5])] <- book[j,(6:(5+book[j,5]))]
        }
    }
  }
  for( k in 7:(5+book[i,5]))
  {
    if(i!=book[i,k])
    {
      book[book[i,k],1]=0;
    }
  }
}