I have two data frames. The grouping codes are: year, pd, treatm and rep.
The variable in the first data frame is LAI; cimer, himv, and nőv are in the second.
I would like to add the LAI variable to the other variables/columns.
I am not sure how to get the ordering of the LAI data right, since each observation is defined by four codes.
Could you help me solve this problem, please?
Thank you very much!
Data frames are:
> sample1
year treatm pd rep LAI
1 2020 1 A 1 2.58
2 2020 1 A 2 2.08
3 2020 1 A 3 2.48
4 2020 1 A 4 2.98
5 2020 2 A 1 3.34
6 2020 2 A 2 3.11
7 2020 2 A 3 3.20
8 2020 2 A 4 2.56
9 2020 1 B 1 2.14
10 2020 1 B 2 2.17
11 2020 1 B 3 2.24
12 2020 1 B 4 2.29
13 2020 2 B 1 3.41
14 2020 2 B 2 3.12
15 2020 2 B 3 2.81
16 2020 2 B 4 2.63
17 2021 1 A 1 2.15
18 2021 1 A 2 2.25
19 2021 1 A 3 2.52
20 2021 1 A 4 2.57
21 2021 2 A 1 2.95
22 2021 2 A 2 2.82
23 2021 2 A 3 3.11
24 2021 2 A 4 3.04
25 2021 1 B 1 3.25
26 2021 1 B 2 2.33
27 2021 1 B 3 2.75
28 2021 1 B 4 3.09
29 2021 2 B 1 3.18
30 2021 2 B 2 2.75
31 2021 2 B 3 3.21
32 2021 2 B 4 3.57
> sample2
   year pd treatm rep cimer himv nőv
1  2020  A      1   1    92   93  94
2  2020  A      2   1    91   92  93
3  2020  B      1   1    72   73  75
4  2020  B      2   1    73   74  75
5  2020  A      1   2    95   96 100
6  2020  A      2   2    90   91  94
7  2020  B      1   2    74   76  78
8  2020  B      2   2    71   72  74
9  2020  A      1   3    94   95  96
10 2020  A      2   3    92   93  96
11 2020  B      1   3    76   77  77
12 2020  B      2   3    74   75  76
13 2020  A      1   4    90   91  97
14 2020  A      2   4    90   91  94
15 2020  B      1   4    74   75  NA
16 2020  B      2   4    73   75  NA
17 2021  A      1   1    92   93  94
18 2021  A      2   1    91   92  93
19 2021  B      1   1    72   73  75
20 2021  B      2   1    73   74  75
21 2021  A      1   2    95   96 100
22 2021  A      2   2    90   91  94
23 2021  B      1   2    74   76  78
24 2021  B      2   2    71   72  74
25 2021  A      1   3    94   95  96
26 2021  A      2   3    92   93  96
27 2021  B      1   3    76   77  77
28 2021  B      2   3    74   75  76
29 2021  A      1   4    90   91  97
30 2021  A      2   4    90   91  94
31 2021  B      1   4    74   75  NA
32 2021  B      2   4    73   75  NA
You can use inner_join from dplyr:
library(tidyverse)
inner_join(sample2,sample1, by=c("year","pd", "treatm", "rep"))
Output (first six lines)
year pd treatm rep cimer himv nov LAI
1: 2020 A 1 1 92 93 94 2.58
2: 2020 A 2 1 91 92 93 3.34
3: 2020 B 1 1 72 73 75 2.14
4: 2020 B 2 1 73 74 75 3.41
5: 2020 A 1 2 95 96 100 2.08
6: 2020 A 2 2 90 91 94 3.11
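An inner join silently drops rows whose keys have no match in the other frame. A quick way to check for unmatched key combinations is dplyr's anti_join, sketched here on two hypothetical miniature frames (not the full data above):

```r
library(dplyr)

# Toy frames sharing the join keys used above (year, pd, treatm, rep).
a <- tibble(year = c(2020, 2020), pd = c("A", "B"), treatm = c(1, 1),
            rep = c(1, 1), LAI = c(2.58, 2.14))
b <- tibble(year = c(2020, 2021), pd = c("A", "A"), treatm = c(1, 1),
            rep = c(1, 1), cimer = c(92, 95))

# Rows of b whose key combination never appears in a:
anti_join(b, a, by = c("year", "pd", "treatm", "rep"))
```

If this returns zero rows in both directions, inner_join and full_join give the same result and no data is lost.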
You can also use data.table
sample2[sample1, on=.(year,pd,treatm,rep)]
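Note that the data.table syntax above assumes both frames have already been converted to data.tables. A minimal sketch of the full sequence, on hypothetical one-row versions of the two frames:

```r
library(data.table)

# Hypothetical miniature versions of the two frames:
sample1 <- data.frame(year = 2020, treatm = 1, pd = "A", rep = 1, LAI = 2.58)
sample2 <- data.frame(year = 2020, pd = "A", treatm = 1, rep = 1,
                      cimer = 92, himv = 93, nov = 94)

# setDT() converts in place; X[i, on = ...] keeps every row of sample1
# and pulls in the matching columns from sample2.
setDT(sample1); setDT(sample2)
merged <- sample2[sample1, on = .(year, pd, treatm, rep)]
merged
```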
I have 11 data frames with observations from seagrass surveys in the Chesapeake; each data frame holds the observations from a single SAMPYR. Each one contains the following variables (example values included):
> head(density.2007)
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS
1 HI 2 1.0 50 2006 2007 1 6.0
2 HI 5 0.5 100 2006 2007 1 11.6
3 HI 7 0.5 50 2006 2007 1 6.0
4 HI 9 0.5 100 2006 2007 1 9.6
5 HI 10 1.0 100 2006 2007 1 30.0
6 HI 23 1.0 50 2006 2007 1 40.4
> head(density.2008)
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS NOTES id
29 HI 1 1.0 100 2007 2008 1 39.6 29
30 HI 2 1.0 50 2006 2008 2 54.8 30
31 HI 3 0.5 100 2007 2008 1 11.2 31
32 HI 4 1.0 100 2007 2008 1 8.8 32
33 HI 5 0.5 100 2006 2008 2 24.0 33
34 HI 7 0.5 50 2006 2008 2 0.0 34
I would like to write a for loop that takes the unique values of the PLOT column and calculates the frequency of each one (so I can then filter to list only those that appear more than once).
What I have so far is:
density.names <- c("density.2007",
"density.2008",
"density.2009",
"density.2010",
"density.2011",
"density.2012",
"density.2013",
"density.2014",
"density.2015",
"density.2016",
"density.2017"
)
for (i in 1:length(density.names)) {
  get(density.names[i]) %>%
    count(PLOT) %>%
    print()
}
This code outputs
PLOT n
1 HI 1 1
2 HI 10 1
3 HI 100 1
4 HI 103 1
5 HI 104 1
6 HI 11 1
7 HI 13 1
8 HI 14 1
9 HI 15 1
10 HI 17 1
11 HI 18 1
12 HI 2 1
13 HI 20 1
14 HI 21 1
15 HI 23 1
16 HI 25 1
17 HI 27 1
18 HI 29 1
19 HI 3 1
20 HI 31 1
21 HI 32 1
22 HI 36 1
23 HI 37 1
24 HI 38 1
25 HI 39 1
26 HI 4 1
27 HI 40 1
But I can't do anything further with that. Is there a way for me to filter the rows so only those with n = 2 show up? Or to have the for loop create the 11 data frames in the global environment, so at least I'll have copies of them to manipulate further?
Thank you! I can provide additional details if that helps.
Don't do it in a loop! It is done completely differently; I'll show you step by step.
My first step is to prepare a function that will generate data similar to yours.
library(tidyverse)
dens = function(year, n) tibble(
PLOT = paste("HI", sample(1:(n/7), n, replace = T)),
SIZE = runif(n, 0.1, 3),
DENSITY = sample(seq(50,200, by=50), n, replace = T),
SEEDYR = year-1,
SAMPYR = year,
AGE = sample(1:5, n, replace = T),
SHOOTS = runif(n, 0.1, 3)
)
Let's see how it works and generate some sample data frames
set.seed(123)
density.2007 = dens(2007, 120)
density.2008 = dens(2008, 88)
density.2009 = dens(2009, 135)
density.2010 = dens(2010, 156)
The density.2007 data frame looks like this
# A tibble: 120 x 7
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 HI 15 1.67 200 2006 2007 4 1.80
2 HI 14 0.270 150 2006 2007 2 2.44
3 HI 3 0.856 50 2006 2007 3 0.686
4 HI 10 1.25 200 2006 2007 5 1.43
5 HI 11 0.673 50 2006 2007 5 1.40
6 HI 5 2.51 150 2006 2007 3 2.23
7 HI 14 0.543 150 2006 2007 2 2.17
8 HI 5 2.43 200 2006 2007 5 2.51
9 HI 9 1.69 100 2006 2007 4 2.67
10 HI 3 2.02 50 2006 2007 2 2.86
# ... with 110 more rows
Now they need to be combined into one frame
df = density.2007 %>%
bind_rows(density.2008) %>%
bind_rows(density.2009) %>%
bind_rows(density.2010)
output
# A tibble: 499 x 7
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 HI 15 1.67 200 2006 2007 4 1.80
2 HI 14 0.270 150 2006 2007 2 2.44
3 HI 3 0.856 50 2006 2007 3 0.686
4 HI 10 1.25 200 2006 2007 5 1.43
5 HI 11 0.673 50 2006 2007 5 1.40
6 HI 5 2.51 150 2006 2007 3 2.23
7 HI 14 0.543 150 2006 2007 2 2.17
8 HI 5 2.43 200 2006 2007 5 2.51
9 HI 9 1.69 100 2006 2007 4 2.67
10 HI 3 2.02 50 2006 2007 2 2.86
# ... with 489 more rows
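With the questioner's eleven frames, the chain of bind_rows() calls can be collapsed: assuming the frames exist in the calling environment under the names collected in density.names, mget() retrieves them all as a named list that bind_rows() stacks in one step. A small sketch with two made-up frames:

```r
library(dplyr)

# Two tiny stand-in frames and the name vector from the question:
density.2007 <- tibble(PLOT = c("HI 2", "HI 5"), SHOOTS = c(6.0, 11.6))
density.2008 <- tibble(PLOT = c("HI 1", "HI 2"), SHOOTS = c(39.6, 54.8))
density.names <- c("density.2007", "density.2008")

# mget() returns a named list of the frames; .id records which frame
# each row came from.
df <- bind_rows(mget(density.names), .id = "source")
```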
In the next step, count how many times each value of the PLOT variable occurs
PLOT.count = df %>%
group_by(PLOT) %>%
summarise(PLOT.n = n()) %>%
arrange(PLOT.n)
output
# A tibble: 22 x 2
PLOT PLOT.n
<chr> <int>
1 HI 20 3
2 HI 22 5
3 HI 21 7
4 HI 18 12
5 HI 2 19
6 HI 1 20
7 HI 15 20
8 HI 17 21
9 HI 6 22
10 HI 11 23
# ... with 12 more rows
In the penultimate step, let's append these counts to the original data frame
df = df %>% left_join(PLOT.count, by="PLOT")
output
# A tibble: 499 x 8
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS PLOT.n
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
1 HI 15 1.67 200 2006 2007 4 1.80 20
2 HI 14 0.270 150 2006 2007 2 2.44 32
3 HI 3 0.856 50 2006 2007 3 0.686 27
4 HI 10 1.25 200 2006 2007 5 1.43 25
5 HI 11 0.673 50 2006 2007 5 1.40 23
6 HI 5 2.51 150 2006 2007 3 2.23 38
7 HI 14 0.543 150 2006 2007 2 2.17 32
8 HI 5 2.43 200 2006 2007 5 2.51 38
9 HI 9 1.69 100 2006 2007 4 2.67 26
10 HI 3 2.02 50 2006 2007 2 2.86 27
# ... with 489 more rows
Now filter it at will
df %>% filter(PLOT.n > 30)
output
# A tibble: 139 x 8
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS PLOT.n
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
1 HI 14 0.270 150 2006 2007 2 2.44 32
2 HI 5 2.51 150 2006 2007 3 2.23 38
3 HI 14 0.543 150 2006 2007 2 2.17 32
4 HI 5 2.43 200 2006 2007 5 2.51 38
5 HI 8 0.598 50 2006 2007 1 1.70 34
6 HI 7 1.94 50 2006 2007 4 1.61 35
7 HI 14 2.91 50 2006 2007 4 0.215 32
8 HI 7 0.846 150 2006 2007 4 0.506 35
9 HI 7 2.38 150 2006 2007 3 1.34 35
10 HI 7 2.62 100 2006 2007 3 0.167 35
# ... with 129 more rows
Or this way
df %>% filter(PLOT.n == min(PLOT.n))
df %>% filter(PLOT.n == median(PLOT.n))
df %>% filter(PLOT.n == max(PLOT.n))
output
# A tibble: 3 x 8
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS PLOT.n
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
1 HI 20 0.392 200 2009 2010 1 0.512 3
2 HI 20 0.859 150 2009 2010 5 2.62 3
3 HI 20 0.882 200 2009 2010 5 1.06 3
> df %>% filter(PLOT.n == median(PLOT.n))
# A tibble: 26 x 8
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS PLOT.n
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
1 HI 9 1.69 100 2006 2007 4 2.67 26
2 HI 9 2.20 50 2006 2007 4 1.49 26
3 HI 9 0.587 200 2006 2007 3 1.13 26
4 HI 9 1.27 50 2006 2007 1 2.55 26
5 HI 9 1.56 150 2006 2007 3 2.01 26
6 HI 9 0.198 100 2006 2007 3 2.08 26
7 HI 9 2.72 150 2007 2008 3 0.421 26
8 HI 9 0.251 200 2007 2008 2 0.328 26
9 HI 9 1.83 50 2007 2008 1 0.192 26
10 HI 9 1.97 100 2007 2008 1 0.900 26
# ... with 16 more rows
> df %>% filter(PLOT.n == max(PLOT.n))
# A tibble: 38 x 8
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS PLOT.n
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
1 HI 5 2.51 150 2006 2007 3 2.23 38
2 HI 5 2.43 200 2006 2007 5 2.51 38
3 HI 5 2.06 100 2006 2007 5 1.93 38
4 HI 5 1.25 150 2006 2007 4 2.29 38
5 HI 5 2.29 200 2006 2007 1 2.97 38
6 HI 5 0.789 150 2006 2007 2 1.59 38
7 HI 5 1.11 100 2007 2008 4 2.61 38
8 HI 5 2.38 150 2007 2008 4 2.95 38
9 HI 5 2.67 200 2007 2008 3 1.77 38
10 HI 5 2.63 100 2007 2008 1 1.90 38
# ... with 28 more rows
With the small reproducible example below, I'd like to identify a dplyr approach that arrives at the data.frame shown at the end of this note. The dplyr output should ensure that the data.frame is sorted by date (note that the dates 1999-04-13 and 1999-03-12 are out of order), and should then accumulate, within each wy grouping (wy = "water year"; Oct 1 to Sep 30), the number of days on which Q is above a threshold of 3.0.
dat <- read.table(text="
Date wy Q
1997-01-01 1997 9.82
1997-02-01 1997 3.51
1997-02-02 1997 9.35
1997-10-04 1998 0.93
1997-11-01 1998 1.66
1997-12-02 1998 0.81
1998-04-03 1998 5.65
1998-05-05 1998 7.82
1998-07-05 1998 6.33
1998-09-06 1998 0.55
1998-09-07 1998 4.54
1998-10-09 1999 6.50
1998-12-31 1999 2.17
1999-01-01 1999 5.67
1999-04-13 1999 5.66
1999-03-12 1999 4.67
1999-06-05 1999 3.34
1999-09-30 1999 1.99
1999-11-06 2000 5.75
2000-03-04 2000 6.28
2000-06-07 2000 0.81
2000-07-06 2000 9.66
2000-09-09 2000 9.08
2000-09-21 2000 6.72", header=TRUE)
dat$Date <- as.Date(dat$Date)
mdat <- dat %>%
group_by(wy) %>%
filter(Q > 3) %>%
?
Desired results:
Date wy Q abvThreshCum
1997-01-01 1997 9.82 1
1997-02-01 1997 3.51 2
1997-02-02 1997 9.35 3
1997-10-04 1998 0.93 0
1997-11-01 1998 1.66 0
1997-12-02 1998 0.81 0
1998-04-03 1998 5.65 1
1998-05-05 1998 7.82 2
1998-07-05 1998 6.33 3
1998-09-06 1998 0.55 3
1998-09-07 1998 4.54 4
1998-10-09 1999 6.50 1
1998-12-31 1999 2.17 1
1999-01-01 1999 5.67 2
1999-03-12 1999 4.67 3
1999-04-13 1999 5.66 4
1999-06-05 1999 3.34 5
1999-09-30 1999 1.99 5
1999-11-06 2000 5.75 1
2000-03-04 2000 6.28 2
2000-06-07 2000 0.81 2
2000-07-06 2000 9.66 3
2000-09-09 2000 9.08 4
2000-09-21 2000 6.72 5
library(dplyr)
dat %>%
arrange(Date) %>%
group_by(wy) %>%
mutate(abv = cumsum(Q > 3)) %>%
ungroup()
# # A tibble: 24 x 4
# Date wy Q abv
# <date> <int> <dbl> <int>
# 1 1997-01-01 1997 9.82 1
# 2 1997-02-01 1997 3.51 2
# 3 1997-02-02 1997 9.35 3
# 4 1997-10-04 1998 0.93 0
# 5 1997-11-01 1998 1.66 0
# 6 1997-12-02 1998 0.81 0
# 7 1998-04-03 1998 5.65 1
# 8 1998-05-05 1998 7.82 2
# 9 1998-07-05 1998 6.33 3
# 10 1998-09-06 1998 0.55 3
# # ... with 14 more rows
data.table approach
library(data.table)
setDT(dat, key = "Date")[, abvThreshCum := cumsum(Q > 3), by = .(wy)]
I have a database with the columns: "Year", "Month", "T1",......"T31":
For example df_0 is the original format and I want to convert it in the new_df (second part)
id0 <- c("Year", "Month", "T_day1", "T_day2", "T_day3", "T_day4", "T_day5")
id1 <- c("2010", "January", 10, 5, 2, 3, 3)
id2 <- c("2010", "February", 20, 36, 5, 8, 1)
id3 <- c("2010", "March", 12, 23, 23, 5, 25)
df_0 <- rbind(id1, id2, id3)
colnames(df_0) <- id0
head(df_0)
I would like to create a new data frame in which the data from T1...T31 for each month and year are joined to a column with all dates, for example from 1 January 2010 to 4 January 2012:
date<-seq(as.Date("2010-01-01"), as.Date("2012-01-04"), by="days")
or to have the values joined as a new column of a data frame based on the values of three other columns (year, month, and day):
year <- lapply(strsplit(as.character(date), "\\-"), "[", 1)
month <- lapply(strsplit(as.character(date), "\\-"), "[", 2)
day <- lapply(strsplit(as.character(date), "\\-"), "[", 3)
df <- cbind(year, month, day)
I would like to have a data frame with the information in this way:
Year <- rep(2010, 15)
Month <- c(rep("January", 5), rep("February", 5), rep("March", 5))
Day <- rep(c(1, 2, 3, 4, 5), 3)
Value <- c(10, 5, 2, 3, 3, 20, 36, 5, 8, 1, 12, 23, 23, 5, 25)
new_df <- cbind(Year, Month, Day, Value)
head(new_df)
Thanks in advance
What you're looking for is to reshape your data. One package you can use for this is reshape2, via its melt function:
library(reshape2)
melt(data.frame(df_0), id.vars=c("Year", "Month"))
Based on the data you have, the output would be:
Year Month variable value
1 2010 January T_day1 10
2 2010 February T_day1 20
3 2010 March T_day1 12
4 2010 January T_day2 5
5 2010 February T_day2 36
6 2010 March T_day2 23
7 2010 January T_day3 2
8 2010 February T_day3 5
9 2010 March T_day3 23
10 2010 January T_day4 3
11 2010 February T_day4 8
12 2010 March T_day4 5
13 2010 January T_day5 3
14 2010 February T_day5 1
15 2010 March T_day5 25
You can then convert the variable column to day numbers, depending on how you want that column formatted.
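For instance, assuming the long-format column names keep the T_dayN pattern shown above, the numeric day can be recovered from the variable column by stripping the prefix:

```r
# Toy long-format frame with the assumed "T_dayN" naming pattern:
long <- data.frame(variable = c("T_day1", "T_day2", "T_day31"))

# Strip the "T_day" prefix and convert the remainder to an integer day.
long$Day <- as.integer(sub("^T_day", "", long$variable))
long
```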
Firstly, I generated my own test data. I used a reduced date vector for easier demonstration: 2010-01-01 to 2010-03-04. In my df_0 I generated a value for each date in my reduced date vector not including the last date, and including one additional date not in my date vector: 2010-03-05. It will become clear later why I did this.
set.seed(1);
date <- seq(as.Date('2010-01-01'),as.Date('2010-03-04'),by='day');
df_0 <- reshape(
  setNames(
    as.data.frame(cbind(
      do.call(rbind, strsplit(strftime(c(date[-length(date)], as.Date('2010-03-05')), '%Y %B %d'), ' ')),
      round(rnorm(length(date)), 3)
    )),
    c('Year', 'Month', 'Day', 'T_day')
  ),
  dir = 'w', idvar = c('Year', 'Month'), timevar = 'Day'
);
attr(df_0,'reshapeWide') <- NULL;
df_0;
## Year Month T_day.01 T_day.02 T_day.03 T_day.04 T_day.05 T_day.06 T_day.07 T_day.08 T_day.09 T_day.10 T_day.11 T_day.12 T_day.13 T_day.14 T_day.15 T_day.16 T_day.17 T_day.18 T_day.19 T_day.20 T_day.21 T_day.22 T_day.23 T_day.24 T_day.25 T_day.26 T_day.27 T_day.28 T_day.29 T_day.30 T_day.31
## 1 2010 January -0.626 0.184 -0.836 1.595 0.33 -0.82 0.487 0.738 0.576 -0.305 1.512 0.39 -0.621 -2.215 1.125 -0.045 -0.016 0.944 0.821 0.594 0.919 0.782 0.075 -1.989 0.62 -0.056 -0.156 -1.471 -0.478 0.418 1.359
## 32 2010 February -0.103 0.388 -0.054 -1.377 -0.415 -0.394 -0.059 1.1 0.763 -0.165 -0.253 0.697 0.557 -0.689 -0.707 0.365 0.769 -0.112 0.881 0.398 -0.612 0.341 -1.129 1.433 1.98 -0.367 -1.044 0.57 <NA> <NA> <NA>
## 60 2010 March -0.135 2.402 -0.039 <NA> 0.69 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
The first half of the solution is a reshaping from wide format to long, and can be done with a single call to reshape(). Additionally, I wrapped it in a call to na.omit() to prevent NA values from being generated from the unavoidable NA cells in df_0:
df_1 <- na.omit(reshape(
  df_0, dir = 'l', idvar = c('Year', 'Month'), timevar = 'Day',
  varying = grep('^T_day\\.', names(df_0)), v.names = 'Value'
));
rownames(df_1) <- NULL;
df_1[order(match(df_1$Month,month.name),df_1$Day),];
## Year Month Day Value
## 1 2010 January 1 -0.626
## 4 2010 January 2 0.184
## 7 2010 January 3 -0.836
## 10 2010 January 4 1.595
## 12 2010 January 5 0.33
## 15 2010 January 6 -0.82
## 17 2010 January 7 0.487
## 19 2010 January 8 0.738
## 21 2010 January 9 0.576
## 23 2010 January 10 -0.305
## 25 2010 January 11 1.512
## 27 2010 January 12 0.39
## 29 2010 January 13 -0.621
## 31 2010 January 14 -2.215
## 33 2010 January 15 1.125
## 35 2010 January 16 -0.045
## 37 2010 January 17 -0.016
## 39 2010 January 18 0.944
## 41 2010 January 19 0.821
## 43 2010 January 20 0.594
## 45 2010 January 21 0.919
## 47 2010 January 22 0.782
## 49 2010 January 23 0.075
## 51 2010 January 24 -1.989
## 53 2010 January 25 0.62
## 55 2010 January 26 -0.056
## 57 2010 January 27 -0.156
## 59 2010 January 28 -1.471
## 61 2010 January 29 -0.478
## 62 2010 January 30 0.418
## 63 2010 January 31 1.359
## 2 2010 February 1 -0.103
## 5 2010 February 2 0.388
## 8 2010 February 3 -0.054
## 11 2010 February 4 -1.377
## 13 2010 February 5 -0.415
## 16 2010 February 6 -0.394
## 18 2010 February 7 -0.059
## 20 2010 February 8 1.1
## 22 2010 February 9 0.763
## 24 2010 February 10 -0.165
## 26 2010 February 11 -0.253
## 28 2010 February 12 0.697
## 30 2010 February 13 0.557
## 32 2010 February 14 -0.689
## 34 2010 February 15 -0.707
## 36 2010 February 16 0.365
## 38 2010 February 17 0.769
## 40 2010 February 18 -0.112
## 42 2010 February 19 0.881
## 44 2010 February 20 0.398
## 46 2010 February 21 -0.612
## 48 2010 February 22 0.341
## 50 2010 February 23 -1.129
## 52 2010 February 24 1.433
## 54 2010 February 25 1.98
## 56 2010 February 26 -0.367
## 58 2010 February 27 -1.044
## 60 2010 February 28 0.57
## 3 2010 March 1 -0.135
## 6 2010 March 2 2.402
## 9 2010 March 3 -0.039
## 14 2010 March 5 0.69
The second part of the solution requires merging the above long-format data.frame with the exact dates you stated you want in the resulting data.frame. This requires a fair amount of scaffolding code to transform the date vector into a data.frame with Year Month Day columns, but once that's done, you can simply call merge() with all.x=T to preserve every date in the date vector whether or not it was present in df_1, and to exclude any date in df_1 that is not also present in the date vector:
df_2 <- merge(
  transform(
    setNames(
      as.data.frame(do.call(rbind, strsplit(strftime(date, '%Y %B %d'), ' '))),
      c('Year', 'Month', 'Day')
    ),
    Day = as.integer(Day)
  ),
  df_1,
  all.x = T
);
df_2[order(match(df_2$Month,month.name),df_2$Day),];
## Year Month Day Value
## 29 2010 January 1 -0.626
## 30 2010 January 2 0.184
## 31 2010 January 3 -0.836
## 32 2010 January 4 1.595
## 33 2010 January 5 0.33
## 34 2010 January 6 -0.82
## 35 2010 January 7 0.487
## 36 2010 January 8 0.738
## 37 2010 January 9 0.576
## 38 2010 January 10 -0.305
## 39 2010 January 11 1.512
## 40 2010 January 12 0.39
## 41 2010 January 13 -0.621
## 42 2010 January 14 -2.215
## 43 2010 January 15 1.125
## 44 2010 January 16 -0.045
## 45 2010 January 17 -0.016
## 46 2010 January 18 0.944
## 47 2010 January 19 0.821
## 48 2010 January 20 0.594
## 49 2010 January 21 0.919
## 50 2010 January 22 0.782
## 51 2010 January 23 0.075
## 52 2010 January 24 -1.989
## 53 2010 January 25 0.62
## 54 2010 January 26 -0.056
## 55 2010 January 27 -0.156
## 56 2010 January 28 -1.471
## 57 2010 January 29 -0.478
## 58 2010 January 30 0.418
## 59 2010 January 31 1.359
## 1 2010 February 1 -0.103
## 2 2010 February 2 0.388
## 3 2010 February 3 -0.054
## 4 2010 February 4 -1.377
## 5 2010 February 5 -0.415
## 6 2010 February 6 -0.394
## 7 2010 February 7 -0.059
## 8 2010 February 8 1.1
## 9 2010 February 9 0.763
## 10 2010 February 10 -0.165
## 11 2010 February 11 -0.253
## 12 2010 February 12 0.697
## 13 2010 February 13 0.557
## 14 2010 February 14 -0.689
## 15 2010 February 15 -0.707
## 16 2010 February 16 0.365
## 17 2010 February 17 0.769
## 18 2010 February 18 -0.112
## 19 2010 February 19 0.881
## 20 2010 February 20 0.398
## 21 2010 February 21 -0.612
## 22 2010 February 22 0.341
## 23 2010 February 23 -1.129
## 24 2010 February 24 1.433
## 25 2010 February 25 1.98
## 26 2010 February 26 -0.367
## 27 2010 February 27 -1.044
## 28 2010 February 28 0.57
## 60 2010 March 1 -0.135
## 61 2010 March 2 2.402
## 62 2010 March 3 -0.039
## 63 2010 March 4 <NA>
Notice how 2010-03-04 is included, even though I didn't generate a value for it in df_0, and 2010-03-05 is excluded, even though I did.
I ran the BMA package in R to do a CoxPH test. I'm wondering what I should edit in the data so that the error "arguments imply differing number of rows: 146, 0" can be solved.
library(BMA)
data <- read.csv("Test1.csv", header = TRUE)
x<- data[1:146,]
x <- data[,c( "dom_econ_2","llgdp", "pcrdbofgdp")]
surv.t<- x$crisis1
cens<- x$cen1
test.bic.surv <- bic.surv(x, surv.t, cens, factor.type=TRUE, strict=FALSE, nbest=200)
Error in data.frame(mm[, -1], surv.t = surv.t, cens = cens) :
arguments imply differing number of rows: 146, 0
Construction of data.
data <- read.table(text=" country Start crisis1 cen1 llgdp pcrdbofgdp dom_econ_2
1 Algeria 1988 48 1 90.537788 65.226883 0.00
2 Algeria 1994 24 1 43.727940 5.994088 14.25
3 Argentina 1985 96 0 12.049210 12.676220 0.00
4 Argentina 2002 12 1 27.514610 18.335609 14.96
5 Australia 1985 12 0 36.909191 30.567970 0.00
6 Australia 1997 12 1 60.054508 69.576698 104.06
7 Australia 2000 12 1 64.405777 80.765381 89.13
8 Australia 2008 12 1 95.728081 115.909699 237.16
9 Austria 2005 12 1 91.344994 108.155701 82.14
10 Belgium 2005 12 1 102.885399 71.527367 114.55
11 Bolivia 1985 12 0 4.461628 4.868293 0.00
12 Bolivia 1987 12 1 13.480320 13.259240 0.00
13 Bolivia 1989 12 1 17.370689 17.162399 0.00
14 Brazil 1985 132 0 7.082396 22.242729 0.00
15 Brazil 1999 12 1 40.434750 30.275040 153.22
16 Brazil 2001 24 1 45.114819 30.151600 133.65
17 Brazil 2008 12 1 57.924221 47.755600 409.57
18 canada 2008 12 1 119.428703 126.900398 225.36
19 Chile 1985 12 0 0.000000 0.000000 0.00
20 Chile 1987 12 1 0.000000 0.000000 0.00
21 Chile 1989 12 1 0.000000 0.000000 0.00
22 Chile 2008 12 1 0.000000 0.000000 35.17
23 Cote D'lvoire 1994 12 1 25.643181 22.177429 2.10
24 Cote D'lvoire 2011 24 1 41.235161 19.288630 4.68
25 china 1986 12 1 0.000000 0.000000 0.00
26 china 1989 12 1 62.773560 71.162529 0.00
27 china 1994 12 1 83.825783 76.370827 67.21
28 Colombia 1985 84 0 29.268551 32.937222 0.00
29 Colombia 1995 12 1 30.042919 30.603430 12.56
30 Colombia 1997 48 1 31.537670 34.393360 17.34
31 Colombia 2002 12 1 16.778780 22.066490 17.12
32 Costa Rica 1987 12 1 35.334270 17.252380 0.00
33 Costa Rica 1991 12 1 30.253300 10.472690 1.01
34 Costa Rica 1995 12 1 25.711729 10.946140 1.88
35 Dominican Republic 1985 12 0 22.065741 38.200081 0.00
36 Dominican Republic 1987 24 1 27.200859 41.605549 0.00
37 Dominican Republic 1990 12 1 23.815241 35.062832 0.77
38 Dominican Republic 2002 24 1 20.893270 38.377579 3.62
39 Ecuador 1985 96 0 24.365290 25.992100 0.00
40 Ecuador 1995 72 1 25.012659 25.226681 3.30
41 Egypt 1989 36 1 0.000000 0.000000 0.00
42 Egypt 2001 12 1 0.000000 0.000000 21.36
43 Egypt 2003 12 1 0.000000 0.000000 21.67
44 El Salvador 1988 12 1 5.249366 4.249679 0.00
45 Finland 1992 12 1 61.804680 93.284843 51.87
46 France 2005 12 1 73.674927 90.176163 1144.92
47 Germany 1997 12 1 69.414650 107.758598 1048.86
48 Germany 1999 12 1 85.617897 115.610901 1037.57
49 Germany 2005 12 1 105.417099 111.763199 1297.82
50 Greece 1985 24 0 58.569908 37.887230 0.00
51 Greece 1990 12 1 68.117287 34.083881 30.32
52 Greece 1999 36 1 55.327202 36.298470 44.28
53 Greece 2005 12 1 85.200127 73.185272 77.85
54 Guatemala 1986 12 1 23.963770 14.939860 0.00
55 Guatemala 1989 24 1 22.968491 14.576470 0.00
56 Honduras 1990 12 1 31.085350 29.356951 0.60
57 Honduras 1993 24 1 29.533979 25.364269 0.91
58 Honduras 1996 12 1 28.978729 22.788309 0.86
59 Hungary 1989 12 1 39.513908 44.371880 0.00
60 Hungary 1991 12 1 44.693378 42.222179 18.29
61 Hungary 1993 12 1 52.589550 28.814779 21.60
62 Hungary 1995 36 1 44.789848 21.890961 21.87
63 Hungary 1999 12 1 44.038410 24.015810 21.43
64 Iceland 1985 24 0 21.419769 34.361641 0.00
65 Iceland 1988 24 1 25.819929 34.976372 0.00
66 Iceland 2008 12 1 93.622017 184.647003 0.00
67 India 1988 12 1 40.268990 28.615240 0.00
68 India 1991 12 1 40.929920 23.150181 55.40
69 India 1993 12 1 42.146000 22.969900 53.35
70 India 2008 12 1 69.759697 44.396610 207.09
71 Indonesia 1997 24 1 50.021770 53.528721 40.59
72 Indonesia 2000 12 1 49.576542 17.631670 27.06
73 Indonesia 2008 12 1 36.236462 23.411659 101.12
74 Ireland 1993 12 1 46.543369 42.833199 16.32
75 Ireland 1997 12 1 69.748718 72.668739 22.49
76 Ireland 2005 12 1 87.587280 141.341995 51.42
77 Italy 1992 12 1 61.862431 57.690781 537.05
78 Italy 2005 12 1 58.811539 85.478607 856.04
79 Malaysia 1997 12 1 116.673599 139.381607 21.01
80 Mexico 1985 36 0 23.277300 10.972870 0.00
81 Mexico 1989 12 1 12.128950 11.774920 0.00
82 Mexico 1994 24 1 27.620720 33.321041 64.37
83 Mexico 1998 12 1 31.633909 22.903950 60.87
84 Mexico 2008 12 1 25.276720 20.486820 175.60
85 Morocco 1985 12 0 46.630791 28.247660 0.00
86 Netherlands 2005 12 1 111.478996 159.227707 196.86
87 New Zealand 1997 12 1 81.314529 96.649277 20.87
88 New Zealand 2008 12 1 91.273071 143.887497 40.38
89 Nicaragua 1985 24 0 0.000000 0.000000 0.00
90 Nicaragua 1988 48 1 0.000000 0.000000 0.00
91 Nicaragua 1993 12 1 0.000000 0.000000 0.54
92 Nigeria 1985 72 0 33.616810 15.274050 0.00
93 Nigeria 1999 12 1 18.795080 12.470600 10.26
94 Norway 1986 12 1 52.509472 65.354111 0.00
95 Norway 2008 12 1 0.000000 0.000000 138.04
96 Paraguay 1985 24 0 19.059549 13.474090 0.00
97 Paraguay 1989 12 1 18.109470 13.592000 0.00
98 Paraguay 1992 24 1 28.895550 20.640970 0.88
99 Paraguay 1998 24 1 27.359171 27.806259 1.41
100 Paraguay 2001 24 1 27.472139 27.111059 1.27
101 Peru 1985 12 0 18.312740 12.587190 0.00
102 Peru 1987 84 1 14.426420 9.529409 0.00
103 Peru 1998 12 1 29.766150 26.084431 9.76
104 Philippines 1990 12 1 32.946239 19.481730 8.97
105 Philippines 1997 12 1 60.959930 55.599201 15.96
106 Philippines 2000 12 1 57.644821 39.109230 14.52
107 Poland 1985 108 0 38.214378 51.334850 0.00
108 Poland 1995 36 1 27.932590 14.869600 51.27
109 Poland 1999 12 1 37.415001 22.911200 32.18
110 Poland 2008 12 1 48.807541 43.228100 178.28
111 Portugal 2005 12 1 92.989853 135.765900 89.34
112 Romania 1990 144 1 0.000000 0.000000 12.92
113 Romania 2008 12 1 31.392929 36.600521 32.11
114 Romania 2010 12 1 37.728611 45.040459 32.29
115 Russia 1987 120 1 0.000000 0.000000 0.00
116 Russia 1998 24 1 0.000000 0.000000 43.93
117 Russia 2008 12 1 0.000000 0.000000 293.34
118 Singapore 1997 12 1 109.437202 107.355103 29.25
119 South Africa 1985 12 0 51.689949 66.574753 0.00
120 South Africa 1988 12 1 49.117390 67.433647 0.00
121 South Africa 1996 12 1 47.592419 112.563797 41.01
122 South Africa 1998 12 1 53.312820 113.043098 36.40
123 South Africa 2000 24 1 52.709499 127.040100 34.19
124 South Africa 2008 12 1 46.246601 149.139099 80.10
125 Spain 1993 12 1 73.074364 77.935318 129.39
126 Spain 2005 12 1 100.510200 129.920197 159.93
127 Sri Lanka 1989 12 1 35.501869 19.156321 0.00
128 Sweden 1992 12 1 50.942661 124.471397 117.62
129 Sweden 2005 12 1 46.589840 102.645203 97.60
130 Sweden 2008 12 1 56.333191 124.272102 116.23
131 Switzerland 1999 12 1 165.171402 159.786499 27.19
132 Thailand 1997 12 1 90.951942 154.129700 27.92
133 Thailand 2000 12 1 112.097000 116.628799 21.31
134 Tunisia 1986 12 1 0.000000 0.000000 0.00
135 Turkey 1985 204 0 20.020611 15.242030 0.00
136 Turkey 2008 12 1 44.036678 29.615061 175.62
137 Uruguay 1985 156 0 43.514191 34.115601 0.00
138 Uruguay 2001 24 1 45.520069 49.360771 5.82
139 Venezuela 1986 12 1 0.000000 0.000000 0.00
140 Venezuela 1989 96 1 0.000000 0.000000 0.00
141 Venezuela 2002 12 1 0.000000 0.000000 23.89
142 Venezuela 2004 12 1 0.000000 0.000000 28.59
143 Venezuela 2010 12 1 0.000000 0.000000 85.81
144 United Kingdom 1993 12 1 59.609852 106.663597 409.43
145 United Kingdom 2008 12 1 163.094299 197.386902 1093.45
146 United States 2002 24 1 64.508629 169.231400 2012.69",
header=TRUE)
The problem is that surv.t and cens are empty (NULL).
## NOTICE IN THIS LINE, YOU SELECT ONLY THREE SPECIFIC COLUMNS
x <- data[,c( "dom_econ_2","llgdp", "pcrdbofgdp")]
## Then in this line, you are trying to access a column that is not there.
surv.t<- x$crisis1
I believe you meant to use data instead of x:
surv.t <- data$crisis1
cens <- data$cen1
If you want just the first 146 rows, use
surv.t <- data$crisis1[1:146]
cens <- data$cen1[1:146]
However, bear in mind that you can just pass data$cen1 (etc.) directly as an argument to your function; there is no need to create a new variable.
As a general troubleshooting tip: If you are getting an error from a function and you are not sure why, one of the first steps is to check the arguments that you are passing to the function (ie, check the things inside the parentheses) and make sure they have the values that you (and the function) expect them to have.
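In this case, a couple of one-liners would have exposed the problem before the call ever reached bic.surv(). A sketch on a hypothetical two-row stand-in for the data:

```r
# x keeps only three columns, as in the question; crisis1 is not among them.
x <- data.frame(dom_econ_2 = c(0, 14.25), llgdp = c(90.5, 43.7),
                pcrdbofgdp = c(65.2, 6.0))
surv.t <- x$crisis1  # $ on a missing column silently returns NULL

# These checks reveal the mismatch that triggered the error:
is.null(surv.t)   # TRUE: the column was never there
length(surv.t)    # 0, whereas nrow(x) is 2
```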