R: column reference to itself

Please help!
I have a data frame w:
x y
0 0
0 0
0 0
0 1
0 0
0 0
0 -1
0 0
0 0
0 1
0 0
0 -1
0 0
0 0
I would like to get:
x y
0 0
0 0
0 0
1 1
1 0
1 0
0 -1
0 0
0 0
1 1
1 0
0 -1
0 0
0 0
I currently use this loop:
for (i in 2:nrow(w)) w$x[i] <- w$x[i - 1] + w$y[i]
Is it possible to do this without a loop?
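For reproducibility, here is a minimal sketch that rebuilds w from the values printed above and runs the loop. The data frame construction is an assumption (the post only shows printed values); the loop is the question's own.

```r
# Rebuild the question's data frame from the printed values
w <- data.frame(
  x = 0,  # recycled to all 14 rows
  y = c(0, 0, 0, 1, 0, 0, -1, 0, 0, 1, 0, -1, 0, 0)
)

# The question's loop: each x is the previous x plus the current y
for (i in 2:nrow(w)) {
  w$x[i] <- w$x[i - 1] + w$y[i]
}
w$x
```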
Thank you!

Use cumsum(). This assumes that you want x to start from an initial value of 0:
transform(w, x = cumsum(y))
## x y
## 1 0 0
## 2 0 0
## 3 0 0
## 4 1 1
## 5 1 0
## 6 1 0
## 7 0 -1
## 8 0 0
## 9 0 0
## 10 1 1
## 11 1 0
## 12 0 -1
## 13 0 0
## 14 0 0
Otherwise you can include the initial value:
transform(w, x = x[1] + cumsum(y))
The result here is the same.
Both of these assume that either y[1] is zero, or that you want y[1] to be added if it is nonzero (your loop ignores y[1]).
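If you need to match the loop exactly even when y[1] is nonzero, one option (a sketch, not from the answer above) is to drop y[1] and prepend a 0 before taking the cumulative sum. The comparison against the loop is included as a check.

```r
w <- data.frame(
  x = 0,
  y = c(0, 0, 0, 1, 0, 0, -1, 0, 0, 1, 0, -1, 0, 0)
)

# Replace y[1] by 0 so that, like the loop, the result starts at x[1]
# and never adds y[1]
w2 <- transform(w, x = x[1] + cumsum(c(0, y[-1])))

# Reference result from the question's loop
loop_x <- w$x
for (i in 2:nrow(w)) loop_x[i] <- loop_x[i - 1] + w$y[i]

identical(w2$x, loop_x)
```

With y[1] = 0, as in the question, this gives the same result as transform(w, x = cumsum(y)).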

Related

Drawing conditional combinations of a binary vector one by one

I am trying to write a routine that generates combinations of a binary vector subject to a condition. For example, consider the following vector:
> A <- rep(c(1,0,0),3)
> A
[1] 1 0 0 1 0 0 1 0 0
Note that the length of A is always a multiple of 3, so the following condition always holds:
length(A) %% 3 == 0
The main condition is that each consecutive block of 3 elements must contain exactly one 1. In this example, one element of A[1:3] will be 1, one element of A[4:6] will be 1 and one element of A[7:9] will be 1, and the rest will all be 0. Therefore, for this example there are 3^3 = 27 possible combinations.
The objective is to write a routine that draws/returns the next valid combination each time it is called, until all valid combinations have been returned.
Note that I am not looking for a table with all the possible combinations. That solution is already available in another of my StackOverflow questions. However, with that method I run into memory problems when A has more than 45 elements, because it returns the full matrix, which is huge. Therefore, instead of storing the full matrix, I want to retrieve one combination at a time and then decide later whether to store it.
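The rule can be checked mechanically by reshaping A into a 3-row matrix, so that each column is one block. This is a minimal sketch; is_valid_comb() and n_valid() are hypothetical helper names, not from the question.

```r
# Hypothetical helper: TRUE if length(A) is a multiple of k, A is binary,
# and each consecutive block of k elements contains exactly one 1
is_valid_comb <- function(A, k = 3L) {
  if (length(A) %% k != 0) return(FALSE)
  # matrix() fills column by column, so column b holds block b
  blocks <- matrix(A, nrow = k)
  all(A %in% c(0, 1)) && all(colSums(blocks) == 1)
}

# Each of the length / k blocks has k choices for its 1
n_valid <- function(len, k = 3L) k^(len / k)

is_valid_comb(rep(c(1, 0, 0), 3))
n_valid(9)
```

For 45 elements, n_valid(45) is 3^15 = 14348907 combinations, which is why materialising all of them as a matrix becomes expensive.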
What the OP is after is an iterator. Done properly, we would write a class in C++ with a get_next method and expose it to R. As it stands, in base R, since vectors are effectively passed by value (copy-on-modify), the straightforward approach is to call a function on the object to be updated and reassign that object every time.
Here is a very crude implementation:
get_next <- function(comb, v, m) {
  s <- seq(1L, length(comb), length(v))
  e <- seq(length(v), length(comb), length(v))
  last_comb <- rev(v)
  can_be_incr <- sapply(seq_len(m), function(x) {
    !identical(comb[s[x]:e[x]], last_comb)
  })
  if (all(!can_be_incr)) {
    return(FALSE)
  } else {
    idx <- which(can_be_incr)[1L]
    span <- s[idx]:e[idx]
    j <- which(comb[span] == 1L)
    comb[span[j]] <- 0L
    comb[span[j + 1L]] <- 1L
    if (idx > 1L) {
      ## Reset previous maxed out sections
      for (i in 1:(idx - 1L)) {
        comb[s[i]:e[i]] <- v
      }
    }
  }
  return(comb)
}
And here is a simple usage:
m <- 3L
v <- as.integer(c(1,0,0))
comb <- rep(v, m)
count <- 1L
while (!is.logical(comb)) {
  cat(count, ": ", comb, "\n")
  comb <- get_next(comb, v, m)
  count <- count + 1L
}
1 : 1 0 0 1 0 0 1 0 0
2 : 0 1 0 1 0 0 1 0 0
3 : 0 0 1 1 0 0 1 0 0
4 : 1 0 0 0 1 0 1 0 0
5 : 0 1 0 0 1 0 1 0 0
6 : 0 0 1 0 1 0 1 0 0
7 : 1 0 0 0 0 1 1 0 0
8 : 0 1 0 0 0 1 1 0 0
9 : 0 0 1 0 0 1 1 0 0
10 : 1 0 0 1 0 0 0 1 0
11 : 0 1 0 1 0 0 0 1 0
12 : 0 0 1 1 0 0 0 1 0
13 : 1 0 0 0 1 0 0 1 0
14 : 0 1 0 0 1 0 0 1 0
15 : 0 0 1 0 1 0 0 1 0
16 : 1 0 0 0 0 1 0 1 0
17 : 0 1 0 0 0 1 0 1 0
18 : 0 0 1 0 0 1 0 1 0
19 : 1 0 0 1 0 0 0 0 1
20 : 0 1 0 1 0 0 0 0 1
21 : 0 0 1 1 0 0 0 0 1
22 : 1 0 0 0 1 0 0 0 1
23 : 0 1 0 0 1 0 0 0 1
24 : 0 0 1 0 1 0 0 0 1
25 : 1 0 0 0 0 1 0 0 1
26 : 0 1 0 0 0 1 0 0 1
27 : 0 0 1 0 0 1 0 0 1
Note that this implementation is memory efficient, but it will be very slow.
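A sketch of a different technique, not from the answer: because each block independently places its 1 in one of k positions, the combinations can be numbered 1..k^m and the idx-th one decoded directly from the base-k digits of idx - 1 (digit b gives the position in block b, with block 1 changing fastest). That needs only O(m) work per combination and reproduces the order listed above. nth_comb() and make_comb_iterator() are hypothetical names; the closure shows one way to keep the iterator state in an enclosing environment instead of reassigning it by hand.

```r
# Hypothetical helper: the idx-th combination (1-based) of m blocks of
# size k, obtained by writing idx - 1 in base k
nth_comb <- function(idx, m, k = 3L) {
  stopifnot(idx >= 1, idx <= k^m)
  r <- idx - 1
  out <- integer(m * k)
  for (b in seq_len(m)) {
    # r %% k is the 0-based position of the 1 inside block b
    out[(b - 1L) * k + r %% k + 1L] <- 1L
    r <- r %/% k
  }
  out
}

# Hypothetical closure-based iterator: the counter lives in the enclosing
# environment and is updated with <<-, so callers need not reassign it
make_comb_iterator <- function(m, k = 3L) {
  idx <- 0
  function() {
    if (idx >= k^m) return(NULL)  # all combinations returned
    idx <<- idx + 1
    nth_comb(idx, m, k)
  }
}

it <- make_comb_iterator(3L)
count <- 0L
while (!is.null(comb <- it())) {
  count <- count + 1L
  cat(count, ": ", comb, "\n")
}
```

Because nth_comb() needs no previous state, any combination can also be reached or sampled directly. Indices are doubles here, so the arithmetic stays exact only while k^m is below 2^53 (for k = 3, up to m = 33).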

Best way to convert List to Matrix or Tibble format?

I am seeking a decent way to convert the list output of a function into a matrix or tibble.
The following tibble feeds into a function. The function returns a list. In this simple example, the returned list happens to contain the same values as the function input tibble.
# # A tibble: 6 x 15
# rev CoS gm sga ebitda bd ebit ie ii gain ebt chg_DTL current tax ni
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
This is the list that is returned from the function.
> ni_out
$rev
[1] 0 0 0 0 0 0
$CoS
[1] 0 0 0 0 0 0
$gm
[1] 0 0 0 0 0 0
$sga
[1] 0 0 0 0 0 0
$ebitda
[1] 0 0 0 0 0 0
$bd
[1] 0 0 0 0 0 0
$ebit
[1] 0 0 0 0 0 0
$ie
[1] 0 0 0 0 0 0
$ii
[1] 0 0 0 0 0 0
$gain
[1] 0 0 0 0 0 0
$ebt
[1] 0 0 0 0 0 0
$chg_DTL_net
[1] 0 0 0 0 0 0
$current
[1] 0 0 0 0 0 0
$tax
[1] 0 0 0 0 0 0
$ni
[1] 0 0 0 0 0 0
I would like to convert this back into something easier to read, such as the original tibble format or a matrix.
I can get the dimensions of the list output:
lengths(ni_out)[[1]]
# [1] 6
length(ni_out)
# [1] 15
However, my attempt at building a matrix fails and gives the following:
as.matrix(unlist(ni_out), nrow = lengths(ni_out)[[1]], ncol = length(ni_out))
# [,1]
# rev1 0
# rev2 0
# rev3 0
# rev4 0
# rev5 0
# rev6 0
# CoS1 0
# CoS2 0
# CoS3 0
# CoS4 0
# CoS5 0
# CoS6 0
# gm1 0
# gm2 0
# gm3 0
# gm4 0
# gm5 0
# gm6 0
# sga1 0
# sga2 0
# sga3 0
# sga4 0
# sga5 0
# sga6 0
# ebitda1 0
# ebitda2 0
# etc.
Any thoughts on getting a matrix or tibble format?
Next time please provide a reproducible example.
If your list is called mylist, I would try data.table::rbindlist(mylist).
Please see an example below including the conversion of vectors to data.frames.
dat <- 0:5
mylist <- list(dat, dat, dat)
mylist <- lapply(mylist, function(x) data.frame(t(x)))
data.table::rbindlist(mylist)
X1 X2 X3 X4 X5 X6
1: 0 1 2 3 4 5
2: 0 1 2 3 4 5
3: 0 1 2 3 4 5
EDIT: it seems you want to cbind rather than rbind, in which case I would use the following:
dat <- 0:5
mylist <- list(dat, dat, dat)
mylist <- lapply(mylist, function(x) data.frame(x))
dplyr::bind_cols(mylist)
x...1 x...2 x...3
1 0 0 0
2 1 1 1
3 2 2 2
4 3 3 3
5 4 4 4
6 5 5 5
As you can see, the answer depends on what you want, which is why it is important to provide an example.
You can use the do.call() function like this:
a <- list(data.frame(x=1:5),data.frame(y=1:5))
do.call("cbind",a)
Also check the cbindlist function.
Simply call data.frame() or as_tibble() (from the tibble package) on the list:
l <- list(x=rep(0,6),y=rep(0,6), z=rep(0,6), t=rep(0,6))
data.frame(l)
x y z t
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
tibble::as_tibble(l)
# A tibble: 6 x 4
x y z t
<dbl> <dbl> <dbl> <dbl>
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
As for a matrix, convert the list to a data.frame first and then to a matrix:
as.matrix(data.frame(l))
x y z t
[1,] 0 0 0 0
[2,] 0 0 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
[5,] 0 0 0 0
[6,] 0 0 0 0
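A matrix can also be built directly with matrix(). The question's attempt returned one long column because as.matrix() has no nrow or ncol arguments: they fall into its `...` and are ignored. The ni_out below is a stand-in assumption (only three of the question's 15 elements).

```r
# Stand-in for the question's ni_out: a named list of equal-length vectors
ni_out <- list(rev = rep(0, 6), CoS = rep(0, 6), gm = rep(0, 6))

# matrix() fills column by column, so each list element becomes a column
mat <- matrix(unlist(ni_out, use.names = FALSE),
              nrow = lengths(ni_out)[[1]],
              dimnames = list(NULL, names(ni_out)))

# Equivalent: bind the vectors as columns
mat2 <- do.call(cbind, ni_out)
mat
```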
Another option is as.data.table():
library(data.table)
as.data.table(l)
Data used above:
l <- list(x=rep(0,6),y=rep(0,6), z=rep(0,6), t=rep(0,6))

Building a symmetric binary matrix

I have a matrix like this, for example:
rownames V1
a 1
c 3
b 2
d 4
y 2
q 4
i 1
j 1
r 3
I want to build a symmetric binary matrix whose dimnames are the rownames of the matrix above. Each entry should be 1 if the two names have the same value in V1, and 0 otherwise. The result would look like this:
dimnames
a c b d y q i j r
a 1 0 0 0 0 0 1 1 0
c 0 1 0 0 0 0 0 0 1
b 0 0 1 0 1 0 0 0 0
d 0 0 0 1 0 1 0 0 0
y 0 0 1 0 1 0 0 0 0
q 0 0 0 1 0 1 0 0 0
i 1 0 0 0 0 0 1 1 0
j 1 0 0 0 0 0 1 1 0
r 0 1 0 0 0 0 0 0 1
Does anybody know how I can do that?
Use dist(): two values are equal exactly when the distance between them is 0.
DF <- read.table(text = "rownames V1
a 1
c 3
b 2
d 4
y 2
q 4
i 1
j 1
r 3", header = TRUE)
res <- as.matrix(dist(DF$V1)) == 0L
#alternatively:
#res <- !as.matrix(dist(DF$V1))
#diag(res) <- 0L #for the first version of the question, i.e. a zero diagonal
res <- +(res) #for the second version, i.e. to coerce to an integer matrix
dimnames(res) <- list(DF$rownames, DF$rownames)
res
#  a c b d y q i j r
#a 1 0 0 0 0 0 1 1 0
#c 0 1 0 0 0 0 0 0 1
#b 0 0 1 0 1 0 0 0 0
#d 0 0 0 1 0 1 0 0 0
#y 0 0 1 0 1 0 0 0 0
#q 0 0 0 1 0 1 0 0 0
#i 1 0 0 0 0 0 1 1 0
#j 1 0 0 0 0 0 1 1 0
#r 0 1 0 0 0 0 0 0 1
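A related base-R sketch, not from the answer, compares every pair of values directly with outer() instead of going through distances. The data frame is rebuilt inline from the question's values.

```r
DF <- data.frame(
  rownames = c("a", "c", "b", "d", "y", "q", "i", "j", "r"),
  V1 = c(1, 3, 2, 4, 2, 4, 1, 1, 3)
)

# outer() applies `==` to every pair (V1[i], V1[j]); unary + turns the
# logical result into an integer 0/1 matrix
res <- +outer(DF$V1, DF$V1, "==")
dimnames(res) <- list(DF$rownames, DF$rownames)
res
```

The row and column order follows the data, as in the question's desired output.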
You can do this using table() and tcrossprod():
tcrossprod(table(DF))
# rownames
# rownames a b c d i j q r y
# a 1 0 0 0 1 1 0 0 0
# b 0 1 0 0 0 0 0 0 1
# c 0 0 1 0 0 0 0 1 0
# d 0 0 0 1 0 0 1 0 0
# i 1 0 0 0 1 1 0 0 0
# j 1 0 0 0 1 1 0 0 0
# q 0 0 0 1 0 0 1 0 0
# r 0 0 1 0 0 0 0 1 0
# y 0 1 0 0 0 0 0 0 1
If you want the rows and columns in the order they appear in the data, rather than in alphanumeric order, you can subset:
tcrossprod(table(DF))[DF$rownames, DF$rownames]
or set the factor levels:
tcrossprod(table(factor(DF$rownames, levels=unique(DF$rownames)), DF$V1))
If your data are large or sparse, you can use xtabs(sparse = TRUE), which returns a sparse matrix from the Matrix package, together with Matrix::tcrossprod(). The order of the result can be changed in the same ways as before.
Matrix::tcrossprod(xtabs(data=DF, ~ rownames + V1, sparse=TRUE))

How to force table to have equal dimensions?

How can I force the dimensions of a table to be equal in R?
For example:
a <- c(0,1,2,3,4,5,1,3,4,5,3,4,5)
b <- c(1,2,3,3,3,3,3,3,3,3,5,5,6)
c <- table(a,b)
print(c)
# b
#a 1 2 3 5 6
# 0 1 0 0 0 0
# 1 0 1 1 0 0
# 2 0 0 1 0 0
# 3 0 0 2 1 0
# 4 0 0 2 1 0
# 5 0 0 2 0 1
However, I am looking for the following result:
print(c)
# b
#a 0 1 2 3 4 5 6
# 0 0 1 0 0 0 0 0
# 1 0 0 1 1 0 0 0
# 2 0 0 0 1 0 0 0
# 3 0 0 0 2 0 1 0
# 4 0 0 0 2 0 1 0
# 5 0 0 0 2 0 0 1
# 6 0 0 0 0 0 0 0
Use factors. table() cannot know which levels your variables have unless you tell it!
a <- c(0,1,2,3,4,5,1,3,4,5,3,4,5)
b <- c(1,2,3,3,3,3,3,3,3,3,5,5,6)
a <- factor(a, levels = 0:6)
b <- factor(b, levels = 0:6)
table(a,b)
# b
#a 0 1 2 3 4 5 6
# 0 0 1 0 0 0 0 0
# 1 0 0 1 1 0 0 0
# 2 0 0 0 1 0 0 0
# 3 0 0 0 2 0 1 0
# 4 0 0 0 2 0 1 0
# 5 0 0 0 2 0 0 1
# 6 0 0 0 0 0 0 0
Edit: the general way to force a square cross-tabulation is something like this:
x <- factor(a, levels = union(a, b))
y <- factor(b, levels = union(a, b))
table(x, y)
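Applied to the question's data, a minimal sketch of the same idea follows. Sorting the shared levels is an added choice (union() keeps first-appearance order), and dnn only restores the a/b dimension names. Note that values occurring in neither vector are not added; for those, a full range such as levels = 0:6 has to be given explicitly.

```r
a <- c(0,1,2,3,4,5,1,3,4,5,3,4,5)
b <- c(1,2,3,3,3,3,3,3,3,3,5,5,6)

# One shared, sorted level set so both margins list the same values
lev <- sort(union(a, b))
tab <- table(factor(a, levels = lev), factor(b, levels = lev),
             dnn = c("a", "b"))
tab
```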

How can I calculate an empirical CDF in R?

I'm reading a sparse table from a file which looks like:
1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1
Note that the rows have different lengths.
Each row represents a single simulation. The value in the i-th column in each row says how many times value i-1 was observed in this simulation. For example, in the first simulation (first row), we got a single result with value '0' (first column), 7 results with value '2' (third column) etc.
I wish to create an average cumulative distribution function (CDF) for all the simulation results, so I could later use it to calculate an empirical p-value for true results.
To do this, I can first sum each column, but I need to treat the undefined columns as zeros.
How do I read such a table with different row lengths? How do I sum the columns, replacing undefined values with 0? And finally, how do I create the CDF? (I can do this manually, but I guess there is a package that can do it.)
This will read the data in:
dat <- textConnection("1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1")
df <- data.frame(scan(dat, fill = TRUE, what = as.list(rep(1, 29))))
names(df) <- paste("Val", 1:29)
close(dat)
Resulting in:
> head(df)
Val 1 Val 2 Val 3 Val 4 Val 5 Val 6 Val 7 Val 8 Val 9 Val 10 Val 11 Val 12
1 1 0 7 0 0 1 0 0 0 5 0 0
2 1 0 0 1 0 0 0 3 0 0 0 0
3 0 0 0 1 0 0 0 2 0 0 0 0
4 1 0 0 1 0 3 0 0 0 0 1 0
5 0 0 0 1 0 0 0 2 0 0 0 0
....
If the data are in a file, provide the file name instead of dat. This code presumes that there are at most 29 columns, as in the data you supplied; change the 29 to suit the real data.
We get the column sums using
df.csum <- colSums(df, na.rm = TRUE)
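the ecdf() function generates the ECDF you wanted. Since the i-th column sum is the number of times value i - 1 was observed, pass ecdf() the observed values by repeating each value by its count (ecdf(df.csum) would instead describe the distribution of the column sums themselves),
df.ecdf <- ecdf(rep(seq_along(df.csum) - 1, times = df.csum))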
and we can plot it using the plot() method:
plot(df.ecdf, verticals = TRUE)
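A self-contained sketch of the whole pipeline, using strsplit() and zero-padding instead of scan() (a swapped-in approach, not the answer's). It also shows that the pooled CDF at each value is just the running total of the column sums divided by the grand total. The p_upper() helper is a hypothetical illustration of an upper-tail empirical p-value, P(X >= x), assuming integer-valued observations.

```r
lines <- c("1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1",
           "1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1",
           "0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1",
           "1 0 0 1 0 3 0 0 0 0 1 0 0 0 1",
           "0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1")

# Split each row into numbers; rows have different lengths
rows <- lapply(strsplit(lines, " "), as.numeric)
width <- max(lengths(rows))

# Pad short rows with 0: an undefined column means "observed 0 times"
counts <- t(sapply(rows, function(r) c(r, rep(0, width - length(r)))))
csum <- colSums(counts)
values <- seq_len(width) - 1        # column i counts value i - 1

# Pooled CDF at each value: cumulative count / total count
cdf <- cumsum(csum) / sum(csum)

# Same thing via ecdf() on the expanded observations
Fn <- ecdf(rep(values, times = csum))

# Hypothetical upper-tail p-value for an integer observation x
p_upper <- function(x) 1 - Fn(x - 1)
```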
You can use the ecdf() (in base R) or Ecdf() (from the Hmisc package) functions.
