Find common names across several years - r

I have data with two columns: the first column is years from 1980:2020, and the second column contains characters or names within each year. I am interested in producing a two-column table with the number of times a specific name was present in one column and the years in which that name appeared in the other column.
Any idea how to do that?
Thank you in advance for your time and help

Do you mean aggregate like below?
aggregate(year ~ ., df, toString)
such that
> aggregate(year ~ ., df, toString)
name year
1 A 1983, 1990, 2012
2 B 1984, 2007
3 C 2014
4 D 1981
5 E 2001, 2005, 2006
6 F 2015, 2018
7 G 1982, 1997
8 I 1998, 2002
9 J 1993, 1996, 2008, 2016, 2017
10 K 1986
11 L 2010
12 N 1987, 1995, 2004
13 O 1999, 2011, 2019
14 R 1988
15 S 1989
16 T 2013, 2020
17 U 1991, 1992, 2000
18 V 1994
19 W 1985
20 Y 1980, 2003, 2009
or table
> table(df)
name
year A B C D E F G I J K L N O R S T U V W Y
1980 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1981 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1982 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1983 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1984 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1985 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
1986 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1987 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
1988 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
1989 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
1990 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1991 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
1992 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
1993 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1994 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
1995 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
1996 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
1997 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1998 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1999 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
2000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
2001 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2002 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
2003 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2004 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
2005 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2006 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2007 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2008 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
2009 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
2010 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
2011 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
2012 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
2014 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2015 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2016 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
2017 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
2018 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2019 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
2020 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
Data
set.seed(1)
df <- data.frame(
  year = 1980:2020,
  name = sample(LETTERS, 41, replace = TRUE)
)
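
The aggregate() call above collects the years but not the count the question also asked for; here is a minimal base-R sketch that returns both columns, assuming the df defined above:
# Count appearances per name and collect the years, then merge the two tables
n_tab <- aggregate(year ~ name, df, length)
names(n_tab)[2] <- "n"
yr_tab <- aggregate(year ~ name, df, toString)
names(yr_tab)[2] <- "years"
merge(n_tab, yr_tab, by = "name")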

I like @ThomasIsCoding's solution a lot, but here are solutions using dplyr and data.table.
Example data:
dat <- data.frame(
  year = sample(1990:1993, 10, replace=TRUE),
  name = sample(letters[1:4], 10, replace=TRUE))
tidyverse & dplyr:
library(dplyr)
dat %>%
  group_by(name) %>%
  summarize(n = n(), years = toString(sort(unique(year))))
#> # A tibble: 4 x 3
#> name n years
#> <chr> <int> <chr>
#> 1 a 1 1993
#> 2 b 2 1992, 1993
#> 3 c 2 1990
#> 4 d 5 1992, 1993
data.table solution:
library(data.table)
dt = data.table(dat)
dt[, .(.N, years=toString(sort(unique(year)))), by="name"]
#> name N years
#> 1: b 2 1992, 1993
#> 2: d 5 1992, 1993
#> 3: c 2 1990
#> 4: a 1 1993
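A small side note on the data.table version: if you don't need dat kept as a plain data.frame, setDT() converts it by reference instead of copying, so the same query reads:
library(data.table)
setDT(dat)  # converts dat to a data.table in place, no copy is made
dat[, .(.N, years = toString(sort(unique(year)))), by = "name"]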

Related

Multiplying multiple columns with each other into a new dataframe in R

I want to multiply many of my binary variables into new columns, so-called interaction variables. My dataset is structured like this:
YearCountry <- data.frame(Time = c("2000","2001", "2002", "2003",
                                   "2000","2001", "2002", "2003",
                                   "2000","2001", "2002", "2003"),
                          AL = c(1,1,1,1,0,0,0,0,0,0,0,0),
                          FR = c(0,0,0,0,1,1,1,1,0,0,0,0),
                          UK = c(0,0,0,0,0,0,0,0,1,1,1,1),
                          Y2000d = c(1,0,0,0,1,0,0,0,1,0,0,0),
                          Y2001d = c(0,1,0,0,0,1,0,0,0,1,0,0),
                          Y2002d = c(0,0,1,0,0,0,1,0,0,0,1,0),
                          Y2003d = c(0,0,0,1,0,0,0,1,0,0,0,1))
YearCountry
Time AL FR UK Y2000d Y2001d Y2002d Y2003d
1 2000 1 0 0 1 0 0 0
2 2001 1 0 0 0 1 0 0
3 2002 1 0 0 0 0 1 0
4 2003 1 0 0 0 0 0 1
5 2000 0 1 0 1 0 0 0
6 2001 0 1 0 0 1 0 0
7 2002 0 1 0 0 0 1 0
8 2003 0 1 0 0 0 0 1
9 2000 0 0 1 1 0 0 0
10 2001 0 0 1 0 1 0 0
11 2002 0 0 1 0 0 1 0
12 2003 0 0 1 0 0 0 1
I need to multiply the binary variable for each of the countries (AL, FR, UK) with each of the binary variables for a given year, so that I get #country x #year new variables. In this case I have three countries and four years, which gives 12 new variables. My full data contains 105 countries/regions and stretches over twenty years. I therefore need a general formula. I want data that looks like this:
Interact <- data.frame(Time = c("2000","2001", "2002", "2003",
                                "2000","2001", "2002", "2003",
                                "2000","2001", "2002", "2003"),
                       Y2000xAL = c(1,0,0,0,0,0,0,0,0,0,0,0),
                       Y2001xAL = c(0,1,0,0,0,0,0,0,0,0,0,0),
                       Y2002xAL = c(0,0,1,0,0,0,0,0,0,0,0,0),
                       Y2003xAL = c(0,0,0,1,0,0,0,0,0,0,0,0),
                       Y2000xFR = c(0,0,0,0,1,0,0,0,0,0,0,0),
                       Y2001xFR = c(0,0,0,0,0,1,0,0,0,0,0,0),
                       Y2002xFR = c(0,0,0,0,0,0,1,0,0,0,0,0),
                       Y2003xFR = c(0,0,0,0,0,0,0,1,0,0,0,0),
                       Y2000xUK = c(0,0,0,0,0,0,0,0,1,0,0,0),
                       Y2001xUK = c(0,0,0,0,0,0,0,0,0,1,0,0),
                       Y2002xUK = c(0,0,0,0,0,0,0,0,0,0,1,0),
                       Y2003xUK = c(0,0,0,0,0,0,0,0,0,0,0,1))
Interact
Time Y2000xAL Y2001xAL Y2002xAL Y2003xAL Y2000xFR Y2001xFR Y2002xFR Y2003xFR Y2000xUK Y2001xUK Y2002xUK Y2003xUK
1 2000 1 0 0 0 0 0 0 0 0 0 0 0
2 2001 0 1 0 0 0 0 0 0 0 0 0 0
3 2002 0 0 1 0 0 0 0 0 0 0 0 0
4 2003 0 0 0 1 0 0 0 0 0 0 0 0
5 2000 0 0 0 0 1 0 0 0 0 0 0 0
6 2001 0 0 0 0 0 1 0 0 0 0 0 0
7 2002 0 0 0 0 0 0 1 0 0 0 0 0
8 2003 0 0 0 0 0 0 0 1 0 0 0 0
9 2000 0 0 0 0 0 0 0 0 1 0 0 0
10 2001 0 0 0 0 0 0 0 0 0 1 0 0
11 2002 0 0 0 0 0 0 0 0 0 0 1 0
12 2003 0 0 0 0 0 0 0 0 0 0 0 1
Here's an approach with dplyr::across. We can make the final result into a plain data.frame with purrr::invoke as demonstrated in this answer.
library(dplyr)
library(purrr)
library(stringr)  # needed for str_replace below
YearCountry %>%
  mutate(across(AL:UK, ~ . * select(cur_data(), Y2000d:Y2003d))) %>%
  select(-(Y2000d:Y2003d)) %>%
  invoke(.f = data.frame) %>%
  rename_with(~str_replace(.,"\\.",""))
Time ALY2000d ALY2001d ALY2002d ALY2003d FRY2000d FRY2001d FRY2002d FRY2003d UKY2000d UKY2001d UKY2002d UKY2003d
1 2000 1 0 0 0 0 0 0 0 0 0 0 0
2 2001 0 1 0 0 0 0 0 0 0 0 0 0
3 2002 0 0 1 0 0 0 0 0 0 0 0 0
4 2003 0 0 0 1 0 0 0 0 0 0 0 0
5 2000 0 0 0 0 1 0 0 0 0 0 0 0
6 2001 0 0 0 0 0 1 0 0 0 0 0 0
7 2002 0 0 0 0 0 0 1 0 0 0 0 0
8 2003 0 0 0 0 0 0 0 1 0 0 0 0
9 2000 0 0 0 0 0 0 0 0 1 0 0 0
10 2001 0 0 0 0 0 0 0 0 0 1 0 0
11 2002 0 0 0 0 0 0 0 0 0 0 1 0
12 2003 0 0 0 0 0 0 0 0 0 0 0 1
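One caveat worth noting: in dplyr 1.1.0 and later, cur_data() is superseded by pick(), so the multiplication step can be written as below (the invoke()/rename_with() flattening above applies unchanged):
# Same idea with the newer accessor; assumes dplyr >= 1.1.0
YearCountry %>%
  mutate(across(AL:UK, ~ . * pick(Y2000d:Y2003d))) %>%
  select(-(Y2000d:Y2003d))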
1) model.matrix We split the names by the number of characters in them (the country names have 2 characters and the year dummies have 6) and join each group with plus signs. (Alternately use Plus(grep("^..$", nms, value = TRUE)) to get the country names and use that in place of spl["2"], and similarly Plus(grep("^Y....d$", nms, value = TRUE)) in place of spl["6"].)
c(`2` = "AL+FR+UK", `6` = "Y2000d+Y2001d+Y2002d+Y2003d")
and from that the formula:
~(AL + FR + UK):(Y2000d + Y2001d + Y2002d + Y2003d) + 0
and then compute its model matrix.
The formula could also be expanded to one accepted by lm by modifying the sprintf format, so we might not even need to create the model matrix. For example, if we had a response vector R then we could write: s <- sprintf("R ~ (%s)*(%s)", spl["2"], spl["6"]); fo <- formula(s); lm(fo, YearCountry) to include all variables, the interactions of countries and years, and an intercept.
Plus <- function(x) paste(x, collapse = "+")
nms <- names(YearCountry)[-1]
spl <- sapply(split(nms, nchar(nms)), Plus)
s <- sprintf("~ (%s):(%s)+0", spl["2"], spl["6"])
fo <- formula(s)
model.matrix(fo, YearCountry)
giving this matrix:
AL:Y2000d AL:Y2001d AL:Y2002d AL:Y2003d FR:Y2000d FR:Y2001d FR:Y2002d FR:Y2003d UK:Y2000d UK:Y2001d UK:Y2002d UK:Y2003d
1 1 0 0 0 0 0 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0 0 0 0 0
3 0 0 1 0 0 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0 0 0
8 0 0 0 0 0 0 0 1 0 0 0 0
9 0 0 0 0 0 0 0 0 1 0 0 0
10 0 0 0 0 0 0 0 0 0 1 0 0
11 0 0 0 0 0 0 0 0 0 0 1 0
12 0 0 0 0 0 0 0 0 0 0 0 1
attr(,"assign")
[1] 1 2 3 4 5 6 7 8 9 10 11 12
Alternately we can write it compactly like this:
Plus <- function(x) paste(x, collapse = "+")
nms <- names(YearCountry)
s <- sprintf("~ (%s):(%s)+0", Plus(nms[2:4]), Plus(nms[5:8]))
fo <- formula(s)
model.matrix(fo, YearCountry)
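As an aside, here is a hedged sketch of the lm() expansion mentioned earlier, using spl from above; resp is a made-up response vector used purely for illustration:
set.seed(1)
resp <- rnorm(nrow(YearCountry))  # hypothetical response, for illustration only
s  <- sprintf("resp ~ (%s)*(%s)", spl["2"], spl["6"])  # main effects + interactions + intercept
fo <- formula(s)
lm(fo, cbind(YearCountry, resp))  # some coefficients come back NA since the dummies are collinear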
2) eList Another approach is to use list comprehensions. With the eList package we can do this:
library(eList)
DF(for(i in YearCountry[2:4]) for(j in YearCountry[5:8]) i*j)
giving this data frame. Use as.matrix(...) on it if you want a matrix.
AL.Y2000d AL.Y2001d AL.Y2002d AL.Y2003d FR.Y2000d FR.Y2001d FR.Y2002d FR.Y2003d UK.Y2000d UK.Y2001d UK.Y2002d UK.Y2003d
1 1 0 0 0 0 0 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0 0 0 0 0
3 0 0 1 0 0 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0 0 0
8 0 0 0 0 0 0 0 1 0 0 0 0
9 0 0 0 0 0 0 0 0 1 0 0 0
10 0 0 0 0 0 0 0 0 0 1 0 0
11 0 0 0 0 0 0 0 0 0 0 1 0
12 0 0 0 0 0 0 0 0 0 0 0 1
3) listcompr listcompr is another list comprehension package. Note that the development version of this package is needed in order to use bycol=. Replace gen.named.matrix with gen.named.data.frame if you want a data frame.
# devtools::install_github("patrickroocks/listcompr")
library(listcompr)
nms <- names(YearCountry)
gen.named.matrix("{nms[i]}.{nms[j]}", YearCountry[[i]] * YearCountry[[j]],
i = 2:4, j = 5:8, bycol = TRUE)

R multiple for loop

I have this loop over the file msp.chr1
for(i in names(msp.chr1[c(7:70)])){
  tmp <- rle(msp.chr1[[i]])$lengths
  msp.chr1$idx <- rep(1:length(tmp), tmp)
  tmp2 <- unlist(by(msp.chr1[msp.chr1[[i]]==1,], list(msp.chr1$idx[msp.chr1[[i]]==1]), function(x){tail(x["epos"],1)-head(x["spos"],1)}))
  assign(paste(i, ".chr1", sep=""), as.vector(tmp2))
  rm(i); rm(tmp); rm(tmp2)
}
This file is a dataframe with multiple columns:
head(msp.chr1)
chm spos epos sgpos egpos nsnps PDAC1.0 PDAC1.1 PDAC10.0 PDAC10.1 PDAC100.0 PDAC100.1 PDAC101.0 PDAC101.1 PDAC102.0 PDAC102.1 PDAC103.0 PDAC103.1
1 1 123492 134160 0.12 0.13 252 0 0 0 0 1 0 0 0 0 0 0 0
2 1 134160 135025 0.13 0.14 20 0 0 0 0 1 0 0 0 0 0 0 0
3 1 135025 145600 0.14 0.15 150 0 0 0 0 1 0 0 0 0 0 0 0
4 1 145600 316603 0.15 0.32 195 0 1 0 0 1 0 0 1 0 0 0 1
5 1 316603 520140 0.32 0.52 765 0 0 0 0 0 0 0 0 0 0 0 0
6 1 520140 667054 0.52 0.67 1080 0 0 0 0 0 0 0 0 0 0 0 0
PDAC104.0 PDAC104.1 PDAC105.0 PDAC105.1 PDAC11.0 PDAC11.1 PDAC12.0 PDAC12.1 PDAC13.0 PDAC13.1 PDAC14.0 PDAC14.1 PDAC15.0 PDAC15.1 PDAC17.0 PDAC17.1
1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
3 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PDAC18.0 PDAC18.1 PDAC19.0 PDAC19.1 PDAC2.0 PDAC2.1 PDAC20.0 PDAC20.1 PDAC21.0 PDAC21.1 PDAC22.0 PDAC22.1 PDAC23.0 PDAC23.1 PDAC24.0 PDAC24.1 PDAC25.0
1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
2 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0
3 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
PDAC25.1 PDAC3.0 PDAC3.1 PDAC4.0 PDAC4.1 PDAC5.0 PDAC5.1 PDAC6.0 PDAC6.1 PDAC7.0 PDAC7.1 PDAC8.0 PDAC8.1 PDAC807.0 PDAC807.1 PDAC810.0 PDAC810.1
1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
PDAC9.0 PDAC9.1 idx
1 0 0 1
2 0 0 1
3 0 0 1
4 0 0 1
5 1 0 1
6 1 0 1
But I actually have 23 files, of names msp.chr1, msp.chr2, ..., msp.chr23.
I want to add another loop on top of the above, to process all 23 files at once.
I tried several things but it is not working...
Basically, every chr1 in my loop (including in the assign) should be replaced by chr1 to chr23.
Can you help?
Thanks,
You can generate the name of the file with paste, and then get the file by its name with get. A better option would be to create these data frames within a list; then you'd only need the index j, as in df = your_list[[j]] (see the sketch after the loop).
for(j in 1:23){
  df = get(paste("msp.chr", j, sep=""))
  for(i in names(df[c(7:70)])){
    tmp <- rle(df[[i]])$lengths
    df$idx <- rep(1:length(tmp), tmp)
    tmp2 <- unlist(by(df[df[[i]]==1,], list(df$idx[df[[i]]==1]), function(x){tail(x["epos"],1)-head(x["spos"],1)}))
    assign(paste(i, ".chr", j, sep=""), as.vector(tmp2))  # paste j here, not a hard-coded ".chr1"
    rm(i); rm(tmp); rm(tmp2)
  }
}
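And here is a sketch of the list-based route mentioned above (assuming msp.chr1 through msp.chr23 already exist in your workspace); mget() gathers them into a list so no assign()/get() gymnastics are needed:
msp <- mget(paste0("msp.chr", 1:23))  # named list of the 23 data frames
results <- lapply(msp, function(df) {
  sapply(names(df)[7:70], function(i) {
    tmp  <- rle(df[[i]])$lengths
    idx  <- rep(seq_along(tmp), tmp)
    segs <- by(df[df[[i]] == 1, ], idx[df[[i]] == 1],
               function(x) tail(x$epos, 1) - head(x$spos, 1))
    as.vector(unlist(segs))
  }, simplify = FALSE)
})
# results[["msp.chr1"]][["PDAC1.0"]] then holds what the loop stored in PDAC1.0.chr1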

Standard deviation error for EcoTest.sample

I am using EcoTest.sample to compare rarefaction curves for 19 vegetation plots on two soil types (alluvial and canyon). The code below produces the following
warning (more than 50 times): "In cor(x > 0) : the standard deviation is zero".
The test still produces all the expected output. Should I be concerned about the warnings? Is it a result of my relatively small sample size?
rawdata<-read.table(text="Plot SiteType sp1 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9 sp10 sp11 sp12 sp13 sp14 sp15 sp16 sp17 sp18 sp19 sp20 sp21 sp22 sp23 sp24 sp25 sp26 sp27 sp28 sp29 sp30 sp31 sp32 sp33 sp34 sp35
2 canyon 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0
3 alluvial 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
5 alluvial 1 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
6 alluvial 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0
7 alluvial 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
8 alluvial 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0
10 alluvial 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 1 1 0 0
11 canyon 1 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 0
12 canyon 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
13 canyon 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0
14 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
15 canyon 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0
16 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
17 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
18 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0
19 canyon 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0
20 canyon 1 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1
22 alluvial 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 0
23 alluvial 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0
", header=T)
library(rareNMtests)  # EcoTest.sample comes from this package
data <- rawdata[,-1]
rownames(data) <- rawdata[,1]
test.data <- EcoTest.sample(data[,-1], by=data$SiteType, MARGIN=1, trace=F)
EDIT: Perhaps you need to set the nature of the index using q. For instance, if I use q=2 (the inverse Simpson index), I cannot reproduce your error. As it stands you're using q=0, species richness. Perhaps there's nothing to do other than use a different index. I'm not aware of all the factors affecting index choice; I've read a thing or two here: http://www.tiem.utk.edu/~gross/bioed/bealsmodules/shannonDI.html and found this paper, which I didn't go into in much detail: https://dx.doi.org/10.1002%2Fece3.1155
Using Simpson's index: No warnings.
test.data<-EcoTest.sample(data[,-1], by=data$SiteType, MARGIN=1, trace=F,q=2)
Sample-based method
P(Obs <= null) = 0.205
As stated in this answer on SE, a standard deviation of zero will have an impact on the nature of the distribution. Therefore, any tests you perform that depend on a normal distribution will likely be erroneous, and the p-values obtained, say by a t-test, may be misleading.
When standard deviation is zero, your Gaussian (normal) PDF turns into Dirac delta function. You can't simply plug zero standard deviation into the conventional expression. For instance, if the PDF is plugged into some kind of numerical integration, this won't work. (Aksakal on SE)
https://stats.stackexchange.com/questions/233834/what-is-the-normal-distribution-when-standard-deviation-is-zero
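For intuition, here is a minimal sketch of where the warning originates: a column that never varies has zero standard deviation, so cor() returns NA for it and warns:
m <- cbind(a = c(1, 0, 1, 0), b = c(1, 1, 1, 1))  # column b is constant
cor(m > 0)  # warns "the standard deviation is zero"; b's correlations are NA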

Select n rows after specific number

I work with a data.frame like this:
Country Date balance_of_payment business_confidence_indicator consumer_confidence_indicator CPI Crisis_IMF
1 Australia 1980-01-01 -0.87 100.215 99.780 25.4 0
2 Australia 1980-04-01 -1.62 100.061 99.746 26.2 0
3 Australia 1980-07-01 -3.70 100.599 100.049 26.6 0
4 Australia 1980-10-01 -3.13 100.597 100.735 27.2 0
5 Australia 1981-01-01 -2.73 101.149 101.016 27.8 0
6 Australia 1981-04-01 -4.11 100.936 100.150 28.4 0
I want to create a summary statistic with describe(dataset) from the Hmisc package.
I need to differentiate between the timespan n quarters before Crisis_IMF is 1, the time in which Crisis_IMF is 1, and the timespan n quarters after Crisis_IMF is 1.
To select the time in which Crisis_IMF is 1, I did describe(dataset[dataset$Crisis_IMF==1,"balance_of_payment"]).
But I do not know how to make the command cover the timespan of n quarters (e.g. 8) after the event.
Edit:
dataset$Crisis_IMF
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
[532] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[591] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
(output truncated: the vector continues for several thousand more quarterly observations, 0 almost everywhere, with further unbroken runs of 1 each marking a distinct crisis period)
Edit 2: further information on the dataset:
Country Date balance_of_payment Crisis_IMF
1 Australia 1980-01-01 -0.87 0
2 Australia 1980-04-01 -1.62 0
3 Australia 1980-07-01 -3.70 0
4 Australia 1980-10-01 -3.13 0
5 Australia 1981-01-01 -2.73 0
6 Australia 1981-04-01 -4.11 0
7 Australia 1981-07-01 -3.98 0
8 Australia 1981-10-01 -5.27 0
9 Australia 1982-01-01 -5.31 0
10 Australia 1982-04-01 -4.67 0
11 Australia 1982-07-01 -3.30 0
12 Australia 1982-10-01 -3.24 0
13 Australia 1983-01-01 -3.45 0
14 Australia 1983-04-01 -2.86 0
15 Australia 1983-07-01 -3.58 0
...
137 Australia 2014-01-01 -2.18 0
138 Australia 2014-04-01 -3.44 0
139 Australia 2014-07-01 -3.04 0
140 Australia 2014-10-01 -2.39 0
141 Austria 1980-01-01 -3.97 0
142 Austria 1980-04-01 -3.89 0
143 Austria 1980-07-01 -1.84 0
144 Austria 1980-10-01 -1.60 0
145 Austria 1981-01-01 -2.74 0
146 Austria 1981-04-01 -2.88 0
147 Austria 1981-07-01 -2.83 0
148 Austria 1981-10-01 -2.06 0
149 Austria 1982-01-01 -0.63 0
150 Austria 1982-04-01 0.61 0
151 Austria 1982-07-01 2.42 0
152 Austria 1982-10-01 2.70 0
There can be more than one crisis period for one country. For example, in Australia there are crises from 1990-01-01 to 1991-04-01 and from 2002-01-01 to 2005-01-01. I want to create 3 different describe commands, which show the behaviour of the variable in the above-mentioned states.
You haven't provided your full data, so I have to guess that your Crisis_IMF column has an unbroken sequence of zeroes (before the crisis), followed by an unbroken sequence of ones (during which the IMF crisis was considered to be in effect), finally followed by an unbroken sequence of zeroes (after the crisis).
Below I've synthesized my own data for testing. I only synthesized columns Crisis_IMF and balance_of_payment, because those appear to be the only columns relevant to your problem. I used 30 rows, with the first 10 before, the next 10 during, and the last 10 after the crisis. I used sort of a random parabolic arc for balance_of_payment, but that choice was arbitrary.
library('Hmisc');
set.seed(1);
N <- 30;
df <- data.frame(balance_of_payment=-5+2*seq(-1.5,1.5,len=N)^2+rnorm(N,0,0.2), Crisis_IMF=c(rep(0,N/3),rep(1,N/3),rep(0,N/3)) );
df;
## balance_of_payment Crisis_IMF
## 1 -0.6252908 0
## 2 -1.0625579 0
## 3 -1.8228927 0
## 4 -1.8503850 0
## 5 -2.5744076 0
## 6 -3.2324647 0
## 7 -3.3561408 0
## 8 -3.6484112 0
## 9 -3.9805631 0
## 10 -4.4136342 0
## 11 -4.2642312 1
## 12 -4.6598435 1
## 13 -4.9904788 1
## 14 -5.3947830 1
## 15 -4.7696630 1
## 16 -5.0036359 1
## 17 -4.9550811 1
## 18 -4.6774634 1
## 19 -4.5735679 1
## 20 -4.4478071 1
## 21 -4.1687610 0
## 22 -3.9392921 0
## 23 -3.7811631 0
## 24 -3.8514970 0
## 25 -2.9444058 0
## 26 -2.6515349 0
## 27 -2.2006002 0
## 28 -1.9499174 0
## 29 -1.1949166 0
## 30 -0.4164117 0
crisisRange <- range(which(df$Crisis_IMF==1));
crisisRange;
## [1] 11 20
df$Off_Crisis <- c((1-crisisRange[1]):-1,rep(0,diff(crisisRange)+1),1:(nrow(df)-crisisRange[2]));
df;
## balance_of_payment Crisis_IMF Off_Crisis
## 1 -0.6252908 0 -10
## 2 -1.0625579 0 -9
## 3 -1.8228927 0 -8
## 4 -1.8503850 0 -7
## 5 -2.5744076 0 -6
## 6 -3.2324647 0 -5
## 7 -3.3561408 0 -4
## 8 -3.6484112 0 -3
## 9 -3.9805631 0 -2
## 10 -4.4136342 0 -1
## 11 -4.2642312 1 0
## 12 -4.6598435 1 0
## 13 -4.9904788 1 0
## 14 -5.3947830 1 0
## 15 -4.7696630 1 0
## 16 -5.0036359 1 0
## 17 -4.9550811 1 0
## 18 -4.6774634 1 0
## 19 -4.5735679 1 0
## 20 -4.4478071 1 0
## 21 -4.1687610 0 1
## 22 -3.9392921 0 2
## 23 -3.7811631 0 3
## 24 -3.8514970 0 4
## 25 -2.9444058 0 5
## 26 -2.6515349 0 6
## 27 -2.2006002 0 7
## 28 -1.9499174 0 8
## 29 -1.1949166 0 9
## 30 -0.4164117 0 10
n <- 8;
describe(df[df$Off_Crisis>=-n&df$Off_Crisis<=-1,'balance_of_payment']);
## df[df$Off_Crisis >= -n & df$Off_Crisis <= -1, "balance_of_payment"]
## n missing unique Info Mean
## 8 0 8 1 -3.11
##
## -4.41363415781177 (1, 12%), -3.98056311135777 (1, 12%), -3.64841115885525 (1, 12%), -3.35614082447269 (1, 12%), -3.23246466374394 (1, 12%), -2.57440760140387 (1, 12%), -1.85038498107066 (1, 12%), -1.82289266659616 (1, 12%)
describe(df[df$Off_Crisis==0,'balance_of_payment']);
## df[df$Off_Crisis == 0, "balance_of_payment"]
## n missing unique Info Mean .05 .10 .25 .50 .75 .90 .95
## 10 0 10 1 -4.774 -5.219 -5.043 -4.982 -4.724 -4.595 -4.429 -4.347
##
## -5.39478302143074 (1, 10%), -5.00363594891363 (1, 10%), -4.99047879387293 (1, 10%), -4.95508109661503 (1, 10%), -4.76966304348196 (1, 10%), -4.67746343562751 (1, 10%), -4.65984348113626 (1, 10%), -4.57356788939893 (1, 10%), -4.44780713171369 (1, 10%), -4.26423116226702 (1, 10%)
describe(df[df$Off_Crisis>=1&df$Off_Crisis<=n,'balance_of_payment']);
## df[df$Off_Crisis >= 1 & df$Off_Crisis <= n, "balance_of_payment"]
## n missing unique Info Mean
## 8 0 8 1 -3.186
##
## -4.16876100605885 (1, 12%), -3.93929212154225 (1, 12%), -3.85149697413106 (1, 12%), -3.78116310320806 (1, 12%), -2.94440583734139 (1, 12%), -2.65153490367274 (1, 12%), -2.20060024283928 (1, 12%), -1.949917420894 (1, 12%)
The solution works by first computing the range of indexes during which the crisis was in effect as crisisRange. Then it appends to the data.frame a new column Off_Crisis which captures how many quarters offset from the crisis the row is, using negative numbers for before and positive numbers for after, and assuming each row represents exactly one quarter.
The describe() calls can then be made by subsetting on the Off_Crisis column, getting just the quarters offset from the crisis that you want for each call.
Edit: Whew! That was tough. Pretty sure I got it though:
library('Hmisc');
set.seed(1);
N <- 60;
df <- data.frame(balance_of_payment=rep(-5+2*seq(-1.5,1.5,len=N/2)^2,2)+rnorm(N,0,0.2), Crisis_IMF=c(rep(0,N/6),rep(1,N/6),rep(0,N/3),rep(1,N/6),rep(0,N/6)) );
df;
## balance_of_payment Crisis_IMF
## 1 -0.6252908 0
## 2 -1.0625579 0
## 3 -1.8228927 0
## 4 -1.8503850 0
## 5 -2.5744076 0
## 6 -3.2324647 0
## 7 -3.3561408 0
## 8 -3.6484112 0
## 9 -3.9805631 0
## 10 -4.4136342 0
## 11 -4.2642312 1
## 12 -4.6598435 1
## 13 -4.9904788 1
## 14 -5.3947830 1
## 15 -4.7696630 1
## 16 -5.0036359 1
## 17 -4.9550811 1
## 18 -4.6774634 1
## 19 -4.5735679 1
## 20 -4.4478071 1
## 21 -4.1687610 0
## 22 -3.9392921 0
## 23 -3.7811631 0
## 24 -3.8514970 0
## 25 -2.9444058 0
## 26 -2.6515349 0
## 27 -2.2006002 0
## 28 -1.9499174 0
## 29 -1.1949166 0
## 30 -0.4164117 0
## 31 -0.2282641 0
## 32 -1.1198441 0
## 33 -1.5782326 0
## 34 -2.1802021 0
## 35 -2.9157211 0
## 36 -3.1513699 0
## 37 -3.5324846 0
## 38 -3.8079388 0
## 39 -3.8757143 0
## 40 -4.1999213 0
## 41 -4.5994921 1
## 42 -4.7884845 1
## 43 -4.7268380 1
## 44 -4.8405104 1
## 45 -5.1324004 1
## 46 -5.1361483 1
## 47 -4.8789267 1
## 48 -4.7125241 1
## 49 -4.7602814 1
## 50 -4.3903659 1
## 51 -4.2729353 0
## 52 -4.2181247 0
## 53 -3.7278522 0
## 54 -3.6794993 0
## 55 -2.7817662 0
## 56 -2.2442292 0
## 57 -2.2428854 0
## 58 -1.8645939 0
## 59 -0.9853426 0
## 60 -0.5270109 0
## Offset of each row from the nearest crisis: 0 during a crisis, negative
## counting down toward the next crisis, positive counting up away from the
## previous one. rle() splits Crisis_IMF into runs; values computed for the
## crisis runs themselves are discarded by the outer ifelse().
df$Off_Crisis <- ifelse(df$Crisis_IMF==1, 0, with(rle(df$Crisis_IMF), {
    mids <- lengths[-c(1,length(lengths))];  ## runs between the first and last
    c(-lengths[1]:-1,                        ## leading run: count up to the first crisis
      ## middle runs: the first half counts away from the preceding crisis,
      ## the second half counts down toward the following crisis
      sequence(mids)-rep(rbind(0,mids+1),rbind(ceiling(mids/2),floor(mids/2))),
      1:lengths[length(lengths)]);           ## trailing run: count away from the last crisis
}));
df;
## balance_of_payment Crisis_IMF Off_Crisis
## 1 -0.6252908 0 -10
## 2 -1.0625579 0 -9
## 3 -1.8228927 0 -8
## 4 -1.8503850 0 -7
## 5 -2.5744076 0 -6
## 6 -3.2324647 0 -5
## 7 -3.3561408 0 -4
## 8 -3.6484112 0 -3
## 9 -3.9805631 0 -2
## 10 -4.4136342 0 -1
## 11 -4.2642312 1 0
## 12 -4.6598435 1 0
## 13 -4.9904788 1 0
## 14 -5.3947830 1 0
## 15 -4.7696630 1 0
## 16 -5.0036359 1 0
## 17 -4.9550811 1 0
## 18 -4.6774634 1 0
## 19 -4.5735679 1 0
## 20 -4.4478071 1 0
## 21 -4.1687610 0 1
## 22 -3.9392921 0 2
## 23 -3.7811631 0 3
## 24 -3.8514970 0 4
## 25 -2.9444058 0 5
## 26 -2.6515349 0 6
## 27 -2.2006002 0 7
## 28 -1.9499174 0 8
## 29 -1.1949166 0 9
## 30 -0.4164117 0 10
## 31 -0.2282641 0 -10
## 32 -1.1198441 0 -9
## 33 -1.5782326 0 -8
## 34 -2.1802021 0 -7
## 35 -2.9157211 0 -6
## 36 -3.1513699 0 -5
## 37 -3.5324846 0 -4
## 38 -3.8079388 0 -3
## 39 -3.8757143 0 -2
## 40 -4.1999213 0 -1
## 41 -4.5994921 1 0
## 42 -4.7884845 1 0
## 43 -4.7268380 1 0
## 44 -4.8405104 1 0
## 45 -5.1324004 1 0
## 46 -5.1361483 1 0
## 47 -4.8789267 1 0
## 48 -4.7125241 1 0
## 49 -4.7602814 1 0
## 50 -4.3903659 1 0
## 51 -4.2729353 0 1
## 52 -4.2181247 0 2
## 53 -3.7278522 0 3
## 54 -3.6794993 0 4
## 55 -2.7817662 0 5
## 56 -2.2442292 0 6
## 57 -2.2428854 0 7
## 58 -1.8645939 0 8
## 59 -0.9853426 0 9
## 60 -0.5270109 0 10
n <- 8;
describe(df[df$Off_Crisis>=-n&df$Off_Crisis<=-1,'balance_of_payment']);
## df[df$Off_Crisis >= -n & df$Off_Crisis <= -1, "balance_of_payment"]
## n missing unique Info Mean .05 .10 .25 .50 .75 .90 .95
## 16 0 16 1 -3.133 -4.253 -4.090 -3.825 -3.294 -2.476 -1.837 -1.762
##
## -4.41363415781177 (1, 6%), -4.19992133068899 (1, 6%), -3.98056311135777 (1, 6%), -3.87571430729169 (1, 6%), -3.80793877922333 (1, 6%), -3.64841115885525 (1, 6%)
## -3.53248462570045 (1, 6%), -3.35614082447269 (1, 6%), -3.23246466374394 (1, 6%), -3.15136989958027 (1, 6%), -2.91572106713267 (1, 6%), -2.57440760140387 (1, 6%)
## -2.1802021496148 (1, 6%), -1.85038498107066 (1, 6%), -1.82289266659616 (1, 6%), -1.57823262180228 (1, 6%)
describe(df[df$Off_Crisis==0,'balance_of_payment']);
## df[df$Off_Crisis == 0, "balance_of_payment"]
## n missing unique Info Mean .05 .10 .25 .50 .75 .90 .95
## 20 0 20 1 -4.785 -5.149 -5.133 -4.964 -4.765 -4.645 -4.442 -4.384
##
## lowest : -5.395 -5.136 -5.132 -5.004 -4.990, highest: -4.599 -4.574 -4.448 -4.390 -4.264
describe(df[df$Off_Crisis>=1&df$Off_Crisis<=n,'balance_of_payment']);
## df[df$Off_Crisis >= 1 & df$Off_Crisis <= n, "balance_of_payment"]
## n missing unique Info Mean .05 .10 .25 .50 .75 .90 .95
## 16 0 16 1 -3.157 -4.232 -4.193 -3.873 -3.312 -2.244 -2.075 -1.929
##
## -4.27293530430708 (1, 6%), -4.21812466033862 (1, 6%), -4.16876100605885 (1, 6%), -3.93929212154225 (1, 6%), -3.85149697413106 (1, 6%), -3.78116310320806 (1, 6%)
## -3.72785216159621 (1, 6%), -3.67949925417454 (1, 6%), -2.94440583734139 (1, 6%), -2.78176624658013 (1, 6%), -2.65153490367274 (1, 6%), -2.24422917606577 (1, 6%)
## -2.24288543679152 (1, 6%), -2.20060024283928 (1, 6%), -1.949917420894 (1, 6%), -1.86459386937746 (1, 6%)
For this demo I synthesized five periods: 10 rows of non-crisis, 10 rows of crisis (the first), 20 rows of non-crisis, 10 rows of crisis (the second), and 10 rows of non-crisis. The algorithm is the same, namely to compute an Off_Crisis column (which was much more difficult this time!) and then use it to subset the data.frame for each describe() call. Only now, data points from different crises will be combined in the subsets.
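If pooling across crises is not what you want, a hedged variation is to number each crisis (a new one starts wherever Crisis_IMF flips from 0 to 1) and include that number when subsetting:
starts <- df$Crisis_IMF == 1 & c(0, head(df$Crisis_IMF, -1)) == 0  # 0 -> 1 transitions
df$Crisis_ID <- cumsum(starts) * df$Crisis_IMF  # 0 outside crises; 1, 2, ... inside
describe(df[df$Crisis_ID == 1, 'balance_of_payment'])  # during the first crisis only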

3D Array value assignment ruins the structure of array

Here is how to reproduce my problem. I want to create a 3D array
> g=array(0,dim=c(3,31,31))
> dim(g)
[1] 3 31 31
> dim(g[1,,])
[1] 31 31
This is x with dimension 31 by 31
> dim(x)
[1] 31 31
> x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1 NA 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
2 0 NA 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
3 2 1 NA 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
4 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 NA 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
7 0 0 0 0 0 1 NA 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 NA 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 NA 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 1 1 0 0 0 1 0 0 NA 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0
11 0 0 0 0 0 0 0 0 2 0 NA 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 1 1 0 0 0 NA 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 1 1 1 0 0 0 0 0 0 0 0 1 NA 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
16 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 NA 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
17 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 NA 0 0 0 0 0 0 1 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 NA 0 1 1 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 NA 0 0 0 0 0 0 0 0 0 0 0
21 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 NA 0 0 0 0 0 0 0 0 0 0
22 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 1 0 0 0 0 0 0 0 0
23 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 NA 0 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 NA 0 0 1 0 0 0 0
25 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0 0
26 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 NA 0 1 0 0
28 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 NA 0 0 0
29 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 NA 0 0
30 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA 0
31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA
When I try to assign x to the first 'section' of g using
> g[1,,] = x
The array structure of g is totally changed, as now it becomes:
> dim(g)
NULL
> head(g)
[[1]]
[1] NA 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
[[2]]
[1] 0
[[3]]
[1] 0
[[4]]
[1] 0 NA 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[[5]]
[1] 0
[[6]]
[1] 0
This is totally different from what I expected. I am just trying to assign a matrix to g[1,,], and dim(g) should still be 3 by 31 by 31. Am I wrong? Where did I go wrong?
Thanks to Pascal's comments below I've amended my answer, though I've kept the dimensionality changed to 31x31x3, which is perhaps easier to follow. The problem is the way the data are converted from your data.frame object before being stored in your array. I think that by converting first to a matrix you should get what you're looking for:
g <- array(0,dim=c(31,31,3))
m <- matrix(1, 31, 31)
x <- as.data.frame(m)
## Storing x as it is will result in g becoming a list...
#g[,,1] <- x
## Converting the data.frame to a matrix will result in
## g remaining an array:
g[,,1] <- as.matrix(x)
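And the same fix applied to the question's original 3 x 31 x 31 layout, assuming x is the 31 x 31 data.frame from the question:
g <- array(0, dim = c(3, 31, 31))
g[1,,] <- as.matrix(x)  # coercing to a matrix first keeps g an array
dim(g)
## [1]  3 31 31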
