Split variable into multiple multiple factor variables - r

I have some dataset similar to this:
df <- data.frame(n = seq(1:1000000), x = sample(LETTERS, 1000000, replace = T))
I'm looking for a guidance in finding a way to split variable x into multiple categorical variables with range 0-1
In the end it would look like this:
n x A B C D E F G H . . .
1 D 0 0 0 1 0 0 0 0 . . .
2 B 0 1 0 0 0 0 0 0 . . .
3 F 0 0 0 0 0 1 0 0 . . .
In my dataset, there's way more codes in variable x so adding each new variable manually would be too time consuming.
I was thinking about sorting codes in var x and assigning them an unique number each, then creating an iterating loop that creates new variable for each code in variable x.
But i feel like i'm overcomplicating things

A fast and easy way is to use fastDummies::dummy_cols:
fastDummies::dummy_cols(df, "x")
An alternative with tidyverse functions:
library(tidyverse)
df %>%
left_join(., df %>% mutate(value = 1) %>%
pivot_wider(names_from = x, values_from = value, values_fill = 0) %>%
relocate(n, sort(colnames(.)[-1])))
output
> dummmy <- fastDummies::dummy_cols(df, "x")
> colnames(dummy)[-c(1,2)] <- LETTERS
> dummy
n x A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 1 Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2 2 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 3 E 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 4 H 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 5 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
6 6 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
7 7 R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
8 8 F 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 9 Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
10 10 S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Benchmark
Since there are many solutions and the question involves a large dataset, a benchmark might help. The nnet solution is the fastest according to the benchmark.
set.seed(1)
df <- data.frame(n = seq(1:1000000), x = sample(LETTERS, 1000000, replace = T))
library(microbenchmark)
bm <- microbenchmark(
fModel.matrix(),
fContrasts(),
fnnet(),
fdata.table(),
fFastDummies(),
fDplyr(),
times = 10L,
setup = gc(FALSE)
)
autoplot(bm)

Using match. First create a vector of zeroes, then match letter of df row with vector from the alphabet and turn to 1. You may use builtin LETTERS constant. Finally Vectorize the thing and cbind.
f <- \(x) {
z <- numeric(length(LETTERS))
z[match(x, LETTERS)] <- 1
setNames(z, LETTERS)
}
cbind(df, t(Vectorize(f)(df$x)))
# n x A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
# Q 1 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
# E 2 E 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# A 3 A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# Y 4 Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
# J 5 J 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# D 6 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# R 7 R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
# Z 8 Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
# Q.1 9 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
# O 10 O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Alternatively, transform x to a factor with LETTERS as levels and use model.matrix.
df <- transform(df, x=factor(x, levels=LETTERS))
cbind(df, `colnames<-`(model.matrix(~ 0 + x, df), LETTERS))
# n x A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
# 1 1 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
# 2 2 E 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 3 3 A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 4 4 Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
# 5 5 J 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 6 6 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 7 7 R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
# 8 8 Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
# 9 9 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
# 10 10 O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Data:
n <- 10
set.seed(42)
df <- data.frame(n = seq(1:n), x = sample(LETTERS, n, replace = T))

using data.table
library(data.table)
setDT(df) #make df a data.table if needed
merge(df, dcast(df, n ~ x, fun.agg = length), by = c("n"))

The main question here is that of resources? I think. I found using nnet is a fast solution:
library(nnet)
library(dplyr)
df %>% cbind(class.ind(.$x) == 1) %>%
mutate(across(-c(n, x), ~.*1))
n x A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 1 E 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 H 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 L 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 4 M 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
5 5 R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
6 6 A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 7 Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
8 8 Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
9 9 F 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 10 U 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
11 11 O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
12 12 I 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 13 O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
14 14 Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
15 15 P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
16 16 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
17 17 F 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 18 K 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 19 H 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 20 V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
21 21 V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
22 22 G 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 23 P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
24 24 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
25 25 V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
26 26 R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
27 27 Q 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
28 28 B 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29 29 D 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30 30 M 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
31 31 E 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32 32 V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
33 33 S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
34 34 Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
35 35 T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
[ reached 'max' / getOption("max.print") -- omitted 999965 rows ]
>

Another option would be to use ==.
. <- unique(df$x)
cbind(df, +do.call(cbind, lapply(setNames(., .), `==`, df$x)))
# n x C I L T Y
#1 1 I 0 1 0 0 0
#2 2 C 1 0 0 0 0
#3 3 C 1 0 0 0 0
#4 4 Y 0 0 0 0 1
#5 5 L 0 0 1 0 0
#6 6 T 0 0 0 1 0
#...
Or in one line using sapply.
cbind(df, +sapply(unique(df$x), `==`, df$x))
Or use contrasts and match them to df$x.
. <- contrasts(as.factor(df$x), FALSE)
#. <- contrasts(as.factor(unique(df$x)), FALSE) #Alternative
cbind(df, .[match(df$x, rownames(.)),])
#cbind(df, .[fastmatch::fmatch(df$x, rownames(.)),]) #Alternative
Or indexing in a matrix.
. <- unique(df$x) #Could be sorted
#. <- collapse::funique(df$x) #Alternative
#. <- kit::funique(df$x) #Alternative
i <- match(df$x, .)
#i <- fastmatch::fmatch(df$x, .) #Alternative
#i <- data.table::chmatch(df$x, .) #Alternative
nc <- length(.)
nr <- length(i)
cbind(df, matrix(`[<-`(integer(nc * nr), 1:nr + nr * (i - 1), 1), nr, nc,
dimnames=list(NULL, .)))
Or using outer.
. <- unique(df$x)
cbind(df, +outer(df$x, setNames(., .), `==`))
Or using rep and mĖ€atrix`.
. <- unique(df$x)
n <- nrow(df)
cbind(df, +matrix(df$x == rep(., each=n), n, dimnames=list(NULL, .)))
Benchmark of some methods which will work for more codes in variable x and not only for e.g. LETTERS.
set.seed(42)
df <- data.frame(n = seq(1:1000000), x = sample(LETTERS, 1000000, replace = T))
library(nnet)
library(dplyr)
microbenchmark::microbenchmark(times = 10L, setup = gc(FALSE), control=list(order="block")
, "nnet" = df %>% cbind(class.ind(.$x) == 1) %>%
mutate(across(-c(n, x), ~.*1))
, "contrasts" = {. <- contrasts(as.factor(df$x), FALSE)
cbind(df, .[match(df$x, rownames(.)),])}
, "==" = {. <- unique(df$x)
cbind(df, +do.call(cbind, lapply(setNames(., .), `==`, df$x)))}
, "==Sapply" = cbind(df, +sapply(unique(df$x), `==`, df$x))
, "matrix" = {. <- unique(df$x)
i <- match(df$x, .)
nc <- length(.)
nr <- length(i)
cbind(df, matrix(`[<-`(integer(nc * nr), 1:nr + nr * (i - 1), 1), nr, nc,
dimnames=list(NULL, .)))}
, "outer" = {. <- unique(df$x)
cbind(df, +outer(df$x, setNames(., .), `==`))}
, "rep" = {. <- unique(df$x)
n <- nrow(df)
cbind(df, +matrix(df$x == rep(., each=n), n, dimnames=list(NULL, .)))}
)
Result
Unit: milliseconds
expr min lq mean median uq max neval
nnet 208.6898 220.2304 326.2210 305.5752 386.3385 541.0621 10
contrasts 1110.0123 1168.7651 1263.5357 1216.1403 1357.0532 1514.4411 10
== 146.2217 156.8141 208.2733 185.1860 275.3909 278.8497 10
==Sapply 290.0458 291.4543 301.3010 295.0557 298.0274 358.0531 10
matrix 302.9993 304.8305 312.9748 306.8981 310.0781 363.0773 10
outer 524.5230 583.5224 603.3300 586.3054 595.4086 807.0260 10
rep 276.2110 285.3983 389.8187 434.2754 435.8607 442.3403 10

Related

R multiple for loop

I have this loop over the file msp.chr1
for(i in names(msp.chr1[c(7:70)])){
tmp <- rle(msp.chr1[[i]])$lengths
msp.chr1$idx <- rep(1:length(tmp),tmp)
tmp2 <- unlist(by(msp.chr1[msp.chr1[[i]]==1,], list(msp.chr1$idx[msp.chr1[[i]]==1]),function(x){tail(x["epos"],1)-head(x["spos"],1)}))
assign(paste(i, ".chr1", sep=""), as.vector(tmp2))
rm(i); rm(tmp); rm(tmp2)
}
This file is a dataframe with multiple columns:
head(msp.chr1)
chm spos epos sgpos egpos nsnps PDAC1.0 PDAC1.1 PDAC10.0 PDAC10.1 PDAC100.0 PDAC100.1 PDAC101.0 PDAC101.1 PDAC102.0 PDAC102.1 PDAC103.0 PDAC103.1
1 1 123492 134160 0.12 0.13 252 0 0 0 0 1 0 0 0 0 0 0 0
2 1 134160 135025 0.13 0.14 20 0 0 0 0 1 0 0 0 0 0 0 0
3 1 135025 145600 0.14 0.15 150 0 0 0 0 1 0 0 0 0 0 0 0
4 1 145600 316603 0.15 0.32 195 0 1 0 0 1 0 0 1 0 0 0 1
5 1 316603 520140 0.32 0.52 765 0 0 0 0 0 0 0 0 0 0 0 0
6 1 520140 667054 0.52 0.67 1080 0 0 0 0 0 0 0 0 0 0 0 0
PDAC104.0 PDAC104.1 PDAC105.0 PDAC105.1 PDAC11.0 PDAC11.1 PDAC12.0 PDAC12.1 PDAC13.0 PDAC13.1 PDAC14.0 PDAC14.1 PDAC15.0 PDAC15.1 PDAC17.0 PDAC17.1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
3 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PDAC18.0 PDAC18.1 PDAC19.0 PDAC19.1 PDAC2.0 PDAC2.1 PDAC20.0 PDAC20.1 PDAC21.0 PDAC21.1 PDAC22.0 PDAC22.1 PDAC23.0 PDAC23.1 PDAC24.0 PDAC24.1 PDAC25.0
1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
2 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0
3 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
PDAC25.1 PDAC3.0 PDAC3.1 PDAC4.0 PDAC4.1 PDAC5.0 PDAC5.1 PDAC6.0 PDAC6.1 PDAC7.0 PDAC7.1 PDAC8.0 PDAC8.1 PDAC807.0 PDAC807.1 PDAC810.0 PDAC810.1
1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
PDAC9.0 PDAC9.1 idx
1 0 0 1
2 0 0 1
3 0 0 1
4 0 0 1
5 1 0 1
6 1 0 1
for(i in names(msp.chr1[c(7:70)])){
tmp <- rle(msp.chr1[[i]])$lengths
msp.chr1$idx <- rep(1:length(tmp),tmp)
tmp2 <- unlist(by(msp.chr1[msp.chr1[[i]]==1,], list(msp.chr1$idx[msp.chr1[[i]]==1]),function(x){tail(x["epos"],1)-head(x["spos"],1)}))
assign(paste(i, ".chr1", sep=""), as.vector(tmp2))
rm(i); rm(tmp); rm(tmp2)
}
But I actually have 23 files, of names msp.chr1, msp.chr2, ..., msp.chr23.
I want to add another loop on the above, to do that on all files at once.
I tried several things but it is not working...
Basically, every chr1 in my loop (including in the assign) should be replaced by chr1 to chr23.
Can you help?
Thanks,
You can generate the name of the file with paste, and then get the file by its name with get. A better option would be to create these files within a list, then you'd only use the j like df=list[[j]].
for(j in 1:23){
df = get(paste("msp.chr",j,sep=""))
for(i in names(df[c(7:70)])){
tmp <- rle(df[[i]])$lengths
df$idx <- rep(1:length(tmp),tmp)
tmp2 <- unlist(by(df[df[[i]]==1,], list(df$idx[df[[i]]==1]),function(x){tail(x["epos"],1)-head(x["spos"],1)}))
assign(paste(i, ".chr1", sep=""), as.vector(tmp2))
rm(i); rm(tmp); rm(tmp2)
}
}

Simplifying/condensing long time series functions in R

I have a long time series loop that I would like to simplify/condense. I am trying to simulate the calving of a cattle herd over a period of ten years (monthly intervals) using a random binomial distribution. The function starts with the assumption that the cattle have been covered by the bull. Each variable is affected by the previous. The variables are as follows:
G1:G9 gestation for each month.
MC1:MC7 mothers with calves for 7 months, then after the calves are weaned.
Rest1:Rest6 periods of rest before they are covered by the bull again.
DeadCows based on the mortality rate.
NPreg non-pregnant cows based on the conception rate.
Inputs:
size_cowherd, number of cattle in the herd.
concep, conception rate.
Thanks in advance.
The code I have is as follows:
size_cowherd<-100
concep<-0.95
cows <- function(t=119, mort=0.0005){
G1<- numeric(length = t + 1)
G2<- numeric(length = t + 1)
G3<- numeric(length = t + 1)
G4<- numeric(length = t + 1)
G5<- numeric(length = t + 1)
G6<- numeric(length = t + 1)
G7<- numeric(length = t + 1)
G8<- numeric(length = t + 1)
G9<- numeric(length = t + 1)
MC1<- numeric(length = t + 1)
MC2<- numeric(length = t + 1)
MC3<- numeric(length = t + 1)
MC4<- numeric(length = t + 1)
MC5<- numeric(length = t + 1)
MC6<- numeric(length = t + 1)
MC7<- numeric(length = t + 1)
Rest1<- numeric(length = t + 1)
Rest2<- numeric(length = t + 1)
Rest3<- numeric(length = t + 1)
Rest4<- numeric(length = t + 1)
Rest5<- numeric(length = t + 1)
Rest6<- numeric(length = t + 1)
DeadCows <- numeric(length = t + 1)
NPreg <- numeric(length = t + 1)
G1[1]<- rbinom(1,size_cowherd,concep)
G2[1]<- 0
G3[1]<- 0
G4[1]<- 0
G5[1]<- 0
G6[1]<- 0
G7[1]<- 0
G8[1]<- 0
G9[1]<- 0
MC1[1]<- 0
MC2[1]<- 0
MC3[1]<- 0
MC4[1]<- 0
MC5[1]<- 0
MC6[1]<- 0
MC7[1]<- 0
Rest1[1]<-0
Rest2[1]<-0
Rest3[1]<-0
Rest4[1]<-0
Rest5[1]<-0
Rest6[1]<-0
DeadCows[1] <- 0
NPreg[1] <- size_cowherd - G1[1]
for(step in 1:t){
G2[step+1] <- rbinom(1, G1[step], (1-mort))
G3[step+1] <- rbinom(1, G2[step], (1-mort))
G4[step+1] <- rbinom(1, G3[step], (1-mort))
G5[step+1] <- rbinom(1, G4[step], (1-mort))
G6[step+1] <- rbinom(1, G5[step], (1-mort))
G7[step+1] <- rbinom(1, G6[step], (1-mort))
G8[step+1] <- rbinom(1, G7[step], (1-mort))
G9[step+1] <- rbinom(1, G8[step], (1-mort))
MC1[step+1] <- rbinom(1, G9[step], (1-mort))
MC2[step+1] <- rbinom(1, MC1[step], (1-mort))
MC3[step+1] <- rbinom(1, MC2[step], (1-mort))
MC4[step+1] <- rbinom(1, MC3[step], (1-mort))
MC5[step+1] <- rbinom(1, MC4[step], (1-mort))
MC6[step+1] <- rbinom(1, MC5[step], (1-mort))
MC7[step+1] <- rbinom(1, MC6[step], (1-mort))
Rest1[step+1] <- rbinom(1,MC7[step],(1-mort))
Rest2[step+1] <- rbinom(1,Rest1[step],(1-mort))
Rest3[step+1] <- rbinom(1,Rest2[step],(1-mort))
Rest4[step+1] <- rbinom(1,Rest3[step],(1-mort))
Rest5[step+1] <- rbinom(1,Rest4[step],(1-mort))
Rest6[step+1] <- rbinom(1,Rest5[step],(1-mort))
G1[step+1] <- rbinom(1, Rest6[step], (1-mort))
DeadCows[step+1] <-sum(G1[step]-G2[step+1],G2[step]-G3[step+1],G3[step]-
G4[step+1],G4[step]-G5[step+1],G5[step]-G6[step+1],G6[step]-
G7[step+1],G7[step]-G8[step+1],G8[step]-G9[step+1],G9[step]-
MC1[step+1],MC1[step]-MC2[step+1],MC2[step]-MC3[step+1],MC3[step]-
MC4[step+1],MC4[step]-MC5[step+1],MC5[step]-MC6[step+1],MC6[step]-
MC7[step+1],MC7[step]-Rest1[step+1],Rest1[step]-
Rest2[step+1],Rest2[step]-Rest3[step+1],Rest3[step]-
Rest4[step+1],Rest4[step]-Rest5[step+1],Rest5[step]-
Rest6[step+1],Rest6[step]-G1[step+1])
if(G1[step]<size_cowherd){
G1[step+1]<- rbinom(1,Rest6[step], concep)
NPreg[step+1]<-Rest6[step]-G1[step+1]
}
}
out <-cbind(G1,G2,G3,G4,G5,G6,G7,G8,G9,MC1,MC2,MC3,MC4,MC5,MC6,MC7,Rest1,R
est2,Rest3,Rest4,Rest5,Rest6,DeadCows,NPreg)
return(out)
}
Below is a sample of what the output should look like. In the 23rd month, the cycle restarts again.
G1 G2 G3 G4 G5 G6 G7 G8 G9 MC1 MC2 MC3 MC4 MC5 MC6 MC7 Rest1 Rest2 Rest3
Rest4 Rest5 Rest6 DeadCows NPreg
1 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 4
2 0 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
3 0 0 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
4 0 0 0 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
5 0 0 0 0 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
6 0 0 0 0 0 96 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
7 0 0 0 0 0 0 96 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
8 0 0 0 0 0 0 0 96 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
9 0 0 0 0 0 0 0 0 96 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 96 0 0 0 0 0 0 0 0 0
0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 96 0 0 0 0 0 0 0 0
0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 96 0 0 0 0 0 0 0
0 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 96 0 0 0 0 0 0
0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 96 0 0 0 0 0
0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 96 0 0 0 0
0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 96 0 0 0
0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 96 0 0
0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 96 0
0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 95
0 0 0 1 0
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
95 0 0 0 0
21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 95 0 0 0
22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 94 1 0
Something like this should work for you. I think the trick here is to make use of matrices to keep the bookkeeping a bit more straightforward.
size_cowherd <- 100
concep <- 0.95
stage_names <- c(paste0("G",seq(9)), paste0("MC",seq(7)), paste0("Rest",seq(6)))
cows <- function(size_cowherd, concep, t=220, mort=0.0005, names=stage_names) {
n_stages <- length(names)
stages <- matrix(0, t, n_stages)
dead_cows <- n_preg <- rep(NA, t)
stages[1,1] <- rbinom(1, size_cowherd, concep)
dead_cows[1] <- 0
n_preg[1] <- size_cowherd - stages[1,1]
for(tt in 2:t) {
stages[tt,1] <- rbinom(1, stages[tt-1,n_stages], 1-mort)
for(i in 2:n_stages) {
stages[tt,i] <- rbinom(1, stages[tt-1,i-1], 1-mort)
}
dead_cows[tt] <- sum(stages[tt-1,] - stages[tt,c(2:n_stages,1)])
if(stages[tt-1,1] < size_cowherd) {
stages[tt, 1] <- rbinom(1, stages[tt-1,n_stages], concep)
n_preg[tt] <- stages[tt-1,n_stages] - stages[tt,1]
}
}
res <- cbind(stages, dead_cows, n_preg)
colnames(res) <- c(names, "Dead", "N_Preg")
return(res)
}
head(cows(100, 0.95), 24)
G1 G2 G3 G4 G5 G6 G7 G8 G9 MC1 MC2 MC3 MC4 MC5 MC6 MC7 Rest1 Rest2 Rest3 Rest4 Rest5 Rest6 Dead N_Preg
[1,] 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4
[2,] 0 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
[8,] 0 0 0 0 0 0 0 95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[9,] 0 0 0 0 0 0 0 0 95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[10,] 0 0 0 0 0 0 0 0 0 95 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[11,] 0 0 0 0 0 0 0 0 0 0 95 0 0 0 0 0 0 0 0 0 0 0 0 0
[12,] 0 0 0 0 0 0 0 0 0 0 0 95 0 0 0 0 0 0 0 0 0 0 0 0
[13,] 0 0 0 0 0 0 0 0 0 0 0 0 95 0 0 0 0 0 0 0 0 0 0 0
[14,] 0 0 0 0 0 0 0 0 0 0 0 0 0 95 0 0 0 0 0 0 0 0 0 0
[15,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 95 0 0 0 0 0 0 0 0 0
[16,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 95 0 0 0 0 0 0 0 0
[17,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94 0 0 0 0 0 1 0
[18,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94 0 0 0 0 0 0
[19,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94 0 0 0 0 0
[20,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94 0 0 0 0
[21,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94 0 0 0
[22,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94 0 0
[23,] 92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
[24,] 0 92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

R circlize: Error in circos.initialize

I am able to follow the Circlize example in the description of the package on CRAN easily:
library('circlize')
set.seed(123)
mat = matrix(sample(1:100, 18, replace = TRUE), 3, 6)
rownames(mat) = letters[1:3]
colnames(mat) = LETTERS[1:6]
### basic settings
par(mfrow = c(3, 2))
par(mar = c(1, 1, 1, 1))
chordDiagram(mat)
however, when I replace mat with myMatrix I get this error:
Error in circos.initialize(factors = factor(cate, levels = cate), xlim = cbind(rep(0, :
Since `xlim` is a matrix, it should have same number of rows as the length of the level of `factors` and number of columns of 2.
Can somebody explain why I am getting that message? I do not see a difference between mat and myMatrix other than myMatrix is larger:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A2 B2 C2 D2
A 1060360.659 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
B 0 32143148.75 996976.8445 0 4944648.524 5688385.041 61990.5913 0 0 0 0 -1563.225 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31922242.6
C 0 0 6342776.843 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 28617385.81 17842142.64 0 0 0 0 0 0 0 0 409444.5633 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 4990921.202 105686446.3 536246.2188 0 0 0 0 0 0 0 8587899.583 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 378565.5746
F 0 92732.7741 0 4282.9319 33543553.89 36773976.59 1894761.93 0 0 333209.342 0 20739.0655 327956.7365 0 1022673.163 12229.0255 0 0 386112.1743 224039.3207 0 2395066.197 268247.2897 0 0 0 0 0 0 11926701.96
G 0 0 0 0 0 0 7753767.003 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 5184133.29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 462767.7374 0 0 0 8992223.296 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 1950552.642 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 891032.5584 0 0 0 0 0 520107.9821 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26724.8402 0 0 0 418902.5203
L 0 0 0 0 32044317.54 28147.5693 0 0 0 0 0 5383919.293 0 489912.5412 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4559115.003
M 0 0 0 0 0 3125823.41 0 0 0 0 0 0 1738293.164 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
N 0 1053825.966 -8526.9758 1283429.314 60333051.34 2621812.931 -1130.1924 0 -779545.8004 8055145.684 918.8702 -379747.1919 -177.6205 298563606.5 -9316.8654 0 0 0 0 0 2631991.077 0 0 0 0 0 1107369.803 0 0 118812465
O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1500451.292 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7432418.396
P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Q 0 0 1496058.76 0 -4056617.74 294503 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 410.4 0 0 0 0 0 0 0 1765984767
Code
dd <- read.table(header = TRUE, text = " rn A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A2 B2 C2 D2
A 1060360.659 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
B 0 32143148.75 996976.8445 0 4944648.524 5688385.041 61990.5913 0 0 0 0 -1563.225 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31922242.6
C 0 0 6342776.843 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 28617385.81 17842142.64 0 0 0 0 0 0 0 0 409444.5633 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 4990921.202 105686446.3 536246.2188 0 0 0 0 0 0 0 8587899.583 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 378565.5746
F 0 92732.7741 0 4282.9319 33543553.89 36773976.59 1894761.93 0 0 333209.342 0 20739.0655 327956.7365 0 1022673.163 12229.0255 0 0 386112.1743 224039.3207 0 2395066.197 268247.2897 0 0 0 0 0 0 11926701.96
G 0 0 0 0 0 0 7753767.003 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 5184133.29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 462767.7374 0 0 0 8992223.296 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 1950552.642 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 891032.5584 0 0 0 0 0 520107.9821 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26724.8402 0 0 0 418902.5203
L 0 0 0 0 32044317.54 28147.5693 0 0 0 0 0 5383919.293 0 489912.5412 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4559115.003
M 0 0 0 0 0 3125823.41 0 0 0 0 0 0 1738293.164 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
N 0 1053825.966 -8526.9758 1283429.314 60333051.34 2621812.931 -1130.1924 0 -779545.8004 8055145.684 918.8702 -379747.1919 -177.6205 298563606.5 -9316.8654 0 0 0 0 0 2631991.077 0 0 0 0 0 1107369.803 0 0 118812465
O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1500451.292 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7432418.396
P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Q 0 0 1496058.76 0 -4056617.74 294503 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 410.4 0 0 0 0 0 0 0 1765984767")
myMatrix <- as.matrix(dd[, -1])
rownames(myMatrix) <- dd[, 1]
chordDiagram(myMatrix)
In the old version of circlize, the matrix must be of a matrix class instead of a data.frame, so you need to convert the data frame explicitly by:
myMatrix = as.matrix(A + B)
In circlize, a data frame is for data stored as a adjacency list (e.g the first column for group1, second column for group2, third column for the strength of the relation).
Since read.table() always returns a data.frame class, in the newer version of circlize, it is fine if the matrix represents as a data frame. When it is a data frame, the chordDiagram() will first check whether the number of columns is larger than 3 and all columns are numeric. If so, it will be converted to a matrix internally.

Select n rows after specific number

I work with a data.frame like this:
Country Date balance_of_payment business_confidence_indicator consumer_confidence_indicator CPI Crisis_IMF
1 Australia 1980-01-01 -0.87 100.215 99.780 25.4 0
2 Australia 1980-04-01 -1.62 100.061 99.746 26.2 0
3 Australia 1980-07-01 -3.70 100.599 100.049 26.6 0
4 Australia 1980-10-01 -3.13 100.597 100.735 27.2 0
5 Australia 1981-01-01 -2.73 101.149 101.016 27.8 0
6 Australia 1981-04-01 -4.11 100.936 100.150 28.4 0
I want to create a summary statistic with describe(dataset)from the HmiscPackage.
I need to differentiate between the timespans n-quarters before Crisis_IMF is 1, the time in which Crisis_IMF is 1 and the state n-quaters after Crisis_IMF is 1.
To select the time in which Crisis_IMF is 1, I did describe(dataset[dataset$Crisis_IMF==1,"balance_of_payment"]).
But I do not know how to make the command over the timespan of n-quarters (e.g. 8) after the event.
Edit:
dataset$Crisis_IMF
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[60] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[119] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[178] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[237] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[296] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[355] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[414] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[473] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[532] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[591] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[650] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[709] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
[768] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[827] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[886] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[945] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1004] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1063] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1122] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1181] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1240] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1299] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1358] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1417] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1476] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
[1535] 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
[1594] 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1653] 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1712] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1771] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
[1830] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1889] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1948] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2007] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2066] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2125] 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
[2184] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2243] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2302] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2361] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[2420] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2479] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2538] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2597] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2656] 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2715] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2774] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2833] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2892] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2951] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3010] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3069] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3128] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3187] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3246] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3305] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3364] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3423] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3482] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3541] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3600] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3659] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3718] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3777] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3836] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3895] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[3954] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
[4013] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[4072] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[4131] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[4190] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[4249] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[4308] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Edit2; further information on the dataset:
Country Date balance_of_payment Crisis_IMF
1 Australia 1980-01-01 -0.87 0
2 Australia 1980-04-01 -1.62 0
3 Australia 1980-07-01 -3.70 0
4 Australia 1980-10-01 -3.13 0
5 Australia 1981-01-01 -2.73 0
6 Australia 1981-04-01 -4.11 0
7 Australia 1981-07-01 -3.98 0
8 Australia 1981-10-01 -5.27 0
9 Australia 1982-01-01 -5.31 0
10 Australia 1982-04-01 -4.67 0
11 Australia 1982-07-01 -3.30 0
12 Australia 1982-10-01 -3.24 0
13 Australia 1983-01-01 -3.45 0
14 Australia 1983-04-01 -2.86 0
15 Australia 1983-07-01 -3.58 0
...
137 Australia 2014-01-01 -2.18 0
138 Australia 2014-04-01 -3.44 0
139 Australia 2014-07-01 -3.04 0
140 Australia 2014-10-01 -2.39 0
141 Austria 1980-01-01 -3.97 0
142 Austria 1980-04-01 -3.89 0
143 Austria 1980-07-01 -1.84 0
144 Austria 1980-10-01 -1.60 0
145 Austria 1981-01-01 -2.74 0
146 Austria 1981-04-01 -2.88 0
147 Austria 1981-07-01 -2.83 0
148 Austria 1981-10-01 -2.06 0
149 Austria 1982-01-01 -0.63 0
150 Austria 1982-04-01 0.61 0
151 Austria 1982-07-01 2.42 0
152 Austria 1982-10-01 2.70 0
There can be more then one crisis period for one country. That e.g. in Australia are Crisis from 1990-01-01 to 1991-04-01 and 2002-01-01 to 2005-01-01. I want to create 3 different describe commands, which show the behaviour of the variable in the above mentioned states.
You haven't provided your full data, so I have to guess that your Crisis_IMF column has an unbroken sequence of zeroes (before the crisis), followed by an unbroken sequence of ones (during which the IMF crisis was considered to be in effect), finally followed by an unbroken sequence of zeroes (after the crisis).
Below I've synthesized my own data for testing. I only synthesized columns Crisis_IMF and balance_of_payment, because those appear to be the only columns relevant to your problem. I used 30 rows, with the first 10 before, the next 10 during, and the last 10 after the crisis. I used sort of a random parabolic arc for the balance_of_payment, but that was entirely random.
library('Hmisc');
set.seed(1);
N <- 30;
df <- data.frame(balance_of_payment=-5+2*seq(-1.5,1.5,len=N)^2+rnorm(N,0,0.2), Crisis_IMF=c(rep(0,N/3),rep(1,N/3),rep(0,N/3)) );
df;
## balance_of_payment Crisis_IMF
## 1 -0.6252908 0
## 2 -1.0625579 0
## 3 -1.8228927 0
## 4 -1.8503850 0
## 5 -2.5744076 0
## 6 -3.2324647 0
## 7 -3.3561408 0
## 8 -3.6484112 0
## 9 -3.9805631 0
## 10 -4.4136342 0
## 11 -4.2642312 1
## 12 -4.6598435 1
## 13 -4.9904788 1
## 14 -5.3947830 1
## 15 -4.7696630 1
## 16 -5.0036359 1
## 17 -4.9550811 1
## 18 -4.6774634 1
## 19 -4.5735679 1
## 20 -4.4478071 1
## 21 -4.1687610 0
## 22 -3.9392921 0
## 23 -3.7811631 0
## 24 -3.8514970 0
## 25 -2.9444058 0
## 26 -2.6515349 0
## 27 -2.2006002 0
## 28 -1.9499174 0
## 29 -1.1949166 0
## 30 -0.4164117 0
crisisRange <- range(which(df$Crisis_IMF==1));
crisisRange;
## [1] 11 20
df$Off_Crisis <- c((1-crisisRange[1]):-1,rep(0,diff(crisisRange)+1),1:(nrow(df)-crisisRange[2]));
df;
## balance_of_payment Crisis_IMF Off_Crisis
## 1 -0.6252908 0 -10
## 2 -1.0625579 0 -9
## 3 -1.8228927 0 -8
## 4 -1.8503850 0 -7
## 5 -2.5744076 0 -6
## 6 -3.2324647 0 -5
## 7 -3.3561408 0 -4
## 8 -3.6484112 0 -3
## 9 -3.9805631 0 -2
## 10 -4.4136342 0 -1
## 11 -4.2642312 1 0
## 12 -4.6598435 1 0
## 13 -4.9904788 1 0
## 14 -5.3947830 1 0
## 15 -4.7696630 1 0
## 16 -5.0036359 1 0
## 17 -4.9550811 1 0
## 18 -4.6774634 1 0
## 19 -4.5735679 1 0
## 20 -4.4478071 1 0
## 21 -4.1687610 0 1
## 22 -3.9392921 0 2
## 23 -3.7811631 0 3
## 24 -3.8514970 0 4
## 25 -2.9444058 0 5
## 26 -2.6515349 0 6
## 27 -2.2006002 0 7
## 28 -1.9499174 0 8
## 29 -1.1949166 0 9
## 30 -0.4164117 0 10
n <- 8;
describe(df[df$Off_Crisis>=-n&df$Off_Crisis<=-1,'balance_of_payment']);
## df[df$Off_Crisis >= -n & df$Off_Crisis <= -1, "balance_of_payment"]
## n missing unique Info Mean
## 8 0 8 1 -3.11
##
## -4.41363415781177 (1, 12%), -3.98056311135777 (1, 12%), -3.64841115885525 (1, 12%), -3.35614082447269 (1, 12%), -3.23246466374394 (1, 12%), -2.57440760140387 (1, 12%), -1.85038498107066 (1, 12%), -1.82289266659616 (1, 12%)
describe(df[df$Off_Crisis==0,'balance_of_payment']);
## df[df$Off_Crisis == 0, "balance_of_payment"]
## n missing unique Info Mean .05 .10 .25 .50 .75 .90 .95
## 10 0 10 1 -4.774 -5.219 -5.043 -4.982 -4.724 -4.595 -4.429 -4.347
##
## -5.39478302143074 (1, 10%), -5.00363594891363 (1, 10%), -4.99047879387293 (1, 10%), -4.95508109661503 (1, 10%), -4.76966304348196 (1, 10%), -4.67746343562751 (1, 10%), -4.65984348113626 (1, 10%), -4.57356788939893 (1, 10%), -4.44780713171369 (1, 10%), -4.26423116226702 (1, 10%)
describe(df[df$Off_Crisis>=1&df$Off_Crisis<=n,'balance_of_payment']);
## df[df$Off_Crisis >= 1 & df$Off_Crisis <= n, "balance_of_payment"]
## n missing unique Info Mean
## 8 0 8 1 -3.186
##
## -4.16876100605885 (1, 12%), -3.93929212154225 (1, 12%), -3.85149697413106 (1, 12%), -3.78116310320806 (1, 12%), -2.94440583734139 (1, 12%), -2.65153490367274 (1, 12%), -2.20060024283928 (1, 12%), -1.949917420894 (1, 12%)
The solution works by first computing the range of indexes during which the crisis was in effect as crisisRange. Then it appends to the data.frame a new column Off_Crisis which captures how many quarters offset from the crisis the row is, using negative numbers for before and positive numbers for after, and assuming each row represents exactly one quarter.
The describe() calls can then be made by subsetting on the Off_Crisis column, getting just the quarters offset from the crisis that you want for each call.
Edit: Whew! That was tough. Pretty sure I got it though:
library('Hmisc');
set.seed(1);
N <- 60;
df <- data.frame(balance_of_payment=rep(-5+2*seq(-1.5,1.5,len=N/2)^2,2)+rnorm(N,0,0.2), Crisis_IMF=c(rep(0,N/6),rep(1,N/6),rep(0,N/3),rep(1,N/6),rep(0,N/6)) );
df;
## balance_of_payment Crisis_IMF
## 1 -0.6252908 0
## 2 -1.0625579 0
## 3 -1.8228927 0
## 4 -1.8503850 0
## 5 -2.5744076 0
## 6 -3.2324647 0
## 7 -3.3561408 0
## 8 -3.6484112 0
## 9 -3.9805631 0
## 10 -4.4136342 0
## 11 -4.2642312 1
## 12 -4.6598435 1
## 13 -4.9904788 1
## 14 -5.3947830 1
## 15 -4.7696630 1
## 16 -5.0036359 1
## 17 -4.9550811 1
## 18 -4.6774634 1
## 19 -4.5735679 1
## 20 -4.4478071 1
## 21 -4.1687610 0
## 22 -3.9392921 0
## 23 -3.7811631 0
## 24 -3.8514970 0
## 25 -2.9444058 0
## 26 -2.6515349 0
## 27 -2.2006002 0
## 28 -1.9499174 0
## 29 -1.1949166 0
## 30 -0.4164117 0
## 31 -0.2282641 0
## 32 -1.1198441 0
## 33 -1.5782326 0
## 34 -2.1802021 0
## 35 -2.9157211 0
## 36 -3.1513699 0
## 37 -3.5324846 0
## 38 -3.8079388 0
## 39 -3.8757143 0
## 40 -4.1999213 0
## 41 -4.5994921 1
## 42 -4.7884845 1
## 43 -4.7268380 1
## 44 -4.8405104 1
## 45 -5.1324004 1
## 46 -5.1361483 1
## 47 -4.8789267 1
## 48 -4.7125241 1
## 49 -4.7602814 1
## 50 -4.3903659 1
## 51 -4.2729353 0
## 52 -4.2181247 0
## 53 -3.7278522 0
## 54 -3.6794993 0
## 55 -2.7817662 0
## 56 -2.2442292 0
## 57 -2.2428854 0
## 58 -1.8645939 0
## 59 -0.9853426 0
## 60 -0.5270109 0
df$Off_Crisis <- ifelse(df$Crisis_IMF==1,0,with(rle(df$Crisis_IMF),{ mids <- lengths[-c(1,length(lengths))]; c(-lengths[1]:-1,sequence(mids)-rep(rbind(0,mids+1),rbind(ceiling(mids/2),floor(mids/2))),1:lengths[length(lengths)]); }));
df;
## balance_of_payment Crisis_IMF Off_Crisis
## 1 -0.6252908 0 -10
## 2 -1.0625579 0 -9
## 3 -1.8228927 0 -8
## 4 -1.8503850 0 -7
## 5 -2.5744076 0 -6
## 6 -3.2324647 0 -5
## 7 -3.3561408 0 -4
## 8 -3.6484112 0 -3
## 9 -3.9805631 0 -2
## 10 -4.4136342 0 -1
## 11 -4.2642312 1 0
## 12 -4.6598435 1 0
## 13 -4.9904788 1 0
## 14 -5.3947830 1 0
## 15 -4.7696630 1 0
## 16 -5.0036359 1 0
## 17 -4.9550811 1 0
## 18 -4.6774634 1 0
## 19 -4.5735679 1 0
## 20 -4.4478071 1 0
## 21 -4.1687610 0 1
## 22 -3.9392921 0 2
## 23 -3.7811631 0 3
## 24 -3.8514970 0 4
## 25 -2.9444058 0 5
## 26 -2.6515349 0 6
## 27 -2.2006002 0 7
## 28 -1.9499174 0 8
## 29 -1.1949166 0 9
## 30 -0.4164117 0 10
## 31 -0.2282641 0 -10
## 32 -1.1198441 0 -9
## 33 -1.5782326 0 -8
## 34 -2.1802021 0 -7
## 35 -2.9157211 0 -6
## 36 -3.1513699 0 -5
## 37 -3.5324846 0 -4
## 38 -3.8079388 0 -3
## 39 -3.8757143 0 -2
## 40 -4.1999213 0 -1
## 41 -4.5994921 1 0
## 42 -4.7884845 1 0
## 43 -4.7268380 1 0
## 44 -4.8405104 1 0
## 45 -5.1324004 1 0
## 46 -5.1361483 1 0
## 47 -4.8789267 1 0
## 48 -4.7125241 1 0
## 49 -4.7602814 1 0
## 50 -4.3903659 1 0
## 51 -4.2729353 0 1
## 52 -4.2181247 0 2
## 53 -3.7278522 0 3
## 54 -3.6794993 0 4
## 55 -2.7817662 0 5
## 56 -2.2442292 0 6
## 57 -2.2428854 0 7
## 58 -1.8645939 0 8
## 59 -0.9853426 0 9
## 60 -0.5270109 0 10
n <- 8;
describe(df[df$Off_Crisis>=-n&df$Off_Crisis<=-1,'balance_of_payment']);
## df[df$Off_Crisis >= -n & df$Off_Crisis <= -1, "balance_of_payment"]
## n missing unique Info Mean .05 .10 .25 .50 .75 .90 .95
## 16 0 16 1 -3.133 -4.253 -4.090 -3.825 -3.294 -2.476 -1.837 -1.762
##
## -4.41363415781177 (1, 6%), -4.19992133068899 (1, 6%), -3.98056311135777 (1, 6%), -3.87571430729169 (1, 6%), -3.80793877922333 (1, 6%), -3.64841115885525 (1, 6%)
## -3.53248462570045 (1, 6%), -3.35614082447269 (1, 6%), -3.23246466374394 (1, 6%), -3.15136989958027 (1, 6%), -2.91572106713267 (1, 6%), -2.57440760140387 (1, 6%)
## -2.1802021496148 (1, 6%), -1.85038498107066 (1, 6%), -1.82289266659616 (1, 6%), -1.57823262180228 (1, 6%)
describe(df[df$Off_Crisis==0,'balance_of_payment']);
## df[df$Off_Crisis == 0, "balance_of_payment"]
## n missing unique Info Mean .05 .10 .25 .50 .75 .90 .95
## 20 0 20 1 -4.785 -5.149 -5.133 -4.964 -4.765 -4.645 -4.442 -4.384
##
## lowest : -5.395 -5.136 -5.132 -5.004 -4.990, highest: -4.599 -4.574 -4.448 -4.390 -4.264
describe(df[df$Off_Crisis>=1&df$Off_Crisis<=n,'balance_of_payment']);
## df[df$Off_Crisis >= 1 & df$Off_Crisis <= n, "balance_of_payment"]
## n missing unique Info Mean .05 .10 .25 .50 .75 .90 .95
## 16 0 16 1 -3.157 -4.232 -4.193 -3.873 -3.312 -2.244 -2.075 -1.929
##
## -4.27293530430708 (1, 6%), -4.21812466033862 (1, 6%), -4.16876100605885 (1, 6%), -3.93929212154225 (1, 6%), -3.85149697413106 (1, 6%), -3.78116310320806 (1, 6%)
## -3.72785216159621 (1, 6%), -3.67949925417454 (1, 6%), -2.94440583734139 (1, 6%), -2.78176624658013 (1, 6%), -2.65153490367274 (1, 6%), -2.24422917606577 (1, 6%)
## -2.24288543679152 (1, 6%), -2.20060024283928 (1, 6%), -1.949917420894 (1, 6%), -1.86459386937746 (1, 6%)
For this demo I synthesized five periods: 10 rows of non-crisis, 10 rows of crisis (the first), 20 rows of non-crisis, 10 rows of crisis (the second), and 10 rows of non-crisis. The algorithm is the same, namely to compute an Off_Crisis column (which was much more difficult this time!) and then use it to subset the data.frame for each describe() call. Only now, data points from different crises will be combined in the subsets.

3D Array value assignment ruins the structure of array

Here is how to reproduce my problem. I want to create a 3D array
> g=array(0,dim=c(3,31,31))
> dim(g)
[1] 3 31 31
> dim(g[1,,])
[1] 31 31
This is x with dimension 31 by 31
> dim(x)
[1] 31 31
> x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1 NA 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
2 0 NA 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
3 2 1 NA 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
4 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 NA 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
7 0 0 0 0 0 1 NA 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 NA 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 NA 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 1 1 0 0 0 1 0 0 NA 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0
11 0 0 0 0 0 0 0 0 2 0 NA 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 1 1 0 0 0 NA 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 1 1 1 0 0 0 0 0 0 0 0 1 NA 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
16 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 NA 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
17 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 NA 0 0 0 0 0 0 1 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 NA 0 1 1 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 NA 0 0 0 0 0 0 0 0 0 0 0
21 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 NA 0 0 0 0 0 0 0 0 0 0
22 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 1 0 0 0 0 0 0 0 0
23 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 NA 0 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 NA 0 0 1 0 0 0 0
25 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0 0
26 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 NA 0 1 0 0
28 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0
29 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 NA 0 0
30 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0
31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA
when I try to assign x to the first 'section' of g using
> g[1,,] = x
The array structure of g is totally changed, as now it becomes:
> dim(g)
NULL
> head(g)
[[1]]
[1] NA 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
[[2]]
[1] 0
[[3]]
[1] 0
[[4]]
[1] 0 NA 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[[5]]
[1] 0
[[6]]
[1] 0
This is totally different from what I expected, I am just trying to put a matrix to g[1,,] and dim(g) should still be 3 by 31 by 31, am I wrong? where did I do wrong?
Thanks to Pascal's comments below I've amended my answer, though I've left the dimensionality changed to 31x31x3 for perhaps easier understanding. The problem is the way that the data are converted from your data.frame object before storing in your array. I think by converting first to a matrix you should get what you're looking for:
g <- array(0,dim=c(31,31,3))
m <- matrix(1, 31, 31)
x <- as.data.frame(m)
## Storing x as it is will result in g becoming a list...
#g[,,1] <- x
## Converting the data.frame to a matrix will result in
## g remaining an array:
g[,,1] <- as.matrix(x)

Resources