My data:
no.att
year freq
1 1896 380
2 1900 1936
3 1904 1301
4 1906 1733
5 1908 3101
6 1912 4040
7 1920 4292
8 1924 5693
9 1928 5574
10 1932 3321
11 1936 7401
12 1948 7480
13 1952 9358
14 1956 6434
15 1960 9235
16 1964 9480
17 1968 10479
18 1972 11959
19 1976 10502
20 1980 8937
21 1984 11588
22 1988 14676
23 1992 16413
24 1994 3160
25 1996 13780
26 1998 3605
27 2000 13821
28 2002 4109
29 2004 13443
30 2006 4382
31 2008 13602
32 2010 4402
33 2012 12920
34 2014 4891
35 2016 13688
My goal:
From 1992 onwards, the observation interval changes from every 4th year to every 2nd year.
I want to keep it at every 4th year, so I want to ->
no.att[24,2] + no.att[25,2]
my solution is:
x <- 24
y <- 25
temp <- no.att[x,2]
temp1 <- no.att[y,2]
no.att[y,2] <- temp + temp1
x <- x + 2
y <- y + 2
Running the above once, and then re-running everything except the two top lines (x and y have already been advanced), does the trick.
What would an alternative to this approach be?
Using ave to sum freq over 4-year blocks:
ans <- dat
ans$freq <- ave(dat$freq, ceiling(dat$year/4), FUN=sum)
ans[ans$year %in% seq(1896,2016,4),]
output:
year freq
1 1896 380
2 1900 1936
3 1904 1301
5 1908 4834
6 1912 4040
7 1920 4292
8 1924 5693
9 1928 5574
10 1932 3321
11 1936 7401
12 1948 7480
13 1952 9358
14 1956 6434
15 1960 9235
16 1964 9480
17 1968 10479
18 1972 11959
19 1976 10502
20 1980 8937
21 1984 11588
22 1988 14676
23 1992 16413
25 1996 16940
27 2000 17426
29 2004 17552
31 2008 17984
33 2012 17322
35 2016 18579
data:
dat <- read.table(text="year freq
1896 380
1900 1936
1904 1301
1906 1733
1908 3101
1912 4040
1920 4292
1924 5693
1928 5574
1932 3321
1936 7401
1948 7480
1952 9358
1956 6434
1960 9235
1964 9480
1968 10479
1972 11959
1976 10502
1980 8937
1984 11588
1988 14676
1992 16413
1994 3160
1996 13780
1998 3605
2000 13821
2002 4109
2004 13443
2006 4382
2008 13602
2010 4402
2012 12920
2014 4891
2016 13688", header=TRUE)
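A dplyr alternative is sketched below on a small slice of the data. The block index ceiling(year/4) groups each intermediate year with the quadrennial year that follows it, so max(year) recovers the year to keep (same grouping idea as the ave answer above):

```r
library(dplyr)

# a small slice of the data above, enough to show both cases
dat <- read.table(text = "year freq
1904 1301
1906 1733
1908 3101
1992 16413
1994 3160
1996 13780", header = TRUE)

res <- dat %>%
  group_by(block = ceiling(year / 4)) %>%                # 4-year blocks
  summarise(year = max(year), freq = sum(freq),          # keep the quadrennial year
            .groups = "drop") %>%
  select(year, freq)
res
```

As in the ave answer, 1906 folds into 1908 and 1994 into 1996.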
I might be overcomplicating things - would love to know if there is an easier way to solve this. I have a data frame (df) with 5654 observations - 1332 foreign-born and 4322 Canada-born subjects.
The variable df$YR_IMM captures: "In what year did you come to live in Canada?"
See the following distribution of observations per immigration year table(df$YR_IMM) :
1920 1926 1928 1930 1939 1942 1944 1946 1947 1948 1949 1950 1951 1952 1953 1954
2 1 1 2 1 2 1 1 1 9 5 1 7 13 3 5
1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970
10 5 8 6 6 1 5 1 6 3 7 16 18 12 15 13
1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986
10 17 8 18 25 16 15 12 16 27 13 16 11 9 17 16
1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003
24 21 31 36 26 30 26 24 22 30 29 26 47 52 53 28 9
Naturally, these are only foreign-born individuals (mean = 1985); however, 348 foreign-born subjects are missing a value. There are a total of 4670 NAs, which also include the Canada-born subjects.
How can I code these df$YR_IMM NAs in such a way that
348 (NA) --> 1985
4322(NA) --> 100
Additionally, the status is given by df$Brthcoun, with 0 = "born in Canada" and 1 = "born outside of Canada".
Hope this makes sense - thank you!
EDIT: This was the solution ->
df$YR_IMM[is.na(df$YR_IMM) & df$Brthcoun == 0] <- 100
df$YR_IMM[is.na(df$YR_IMM) & df$Brthcoun == 1] <- 1985
Try the below code:
df$YR_IMM[is.na(df$YR_IMM) & df$Brthcoun == 0] <- 100
df$YR_IMM[is.na(df$YR_IMM) & df$Brthcoun == 1] <- 1985
I hope this helps!
Something like this should also work:
df$YR_IMM <- ifelse(is.na(df$YR_IMM), ifelse(df$Brthcoun == 0, 100, 1985), df$YR_IMM)
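Note that a bare ifelse over the whole column would overwrite the non-missing years, so the condition has to be nested; dplyr::case_when makes the same logic easier to read. A self-contained sketch with toy data standing in for df:

```r
library(dplyr)

# toy data standing in for df: one reported year, two NAs
df <- data.frame(YR_IMM = c(1999, NA, NA), Brthcoun = c(1, 0, 1))

df <- df %>%
  mutate(YR_IMM = case_when(
    is.na(YR_IMM) & Brthcoun == 0 ~ 100,   # Canada-born, no year
    is.na(YR_IMM) & Brthcoun == 1 ~ 1985,  # foreign-born, no year
    TRUE ~ YR_IMM                          # keep reported years untouched
  ))
df$YR_IMM  # 1999  100 1985
```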
I want to carry out an estimation procedure that uses data on all firms in a given sector, for a rolling window of 5 years.
I can do it easily in a loop, but since the estimation procedure takes quite a while, I would like to parallelize it. Is there any way to do this?
My data looks like this:
sale_log cogs_log ppegt_log m_naics4 naics_2 gvkey year
1 3.9070198 2.5146032 3.192821715 9.290151e-02 72 1001 1983
2 4.1028774 2.7375141 3.517861329 1.067687e-01 72 1001 1984
3 4.5909863 3.2106595 3.975112703 2.511660e-01 72 1001 1985
4 3.2560391 2.7867256 -0.763368555 1.351031e-02 44 1003 1982
5 3.2966287 2.8088799 -0.305698649 1.151525e-02 44 1003 1983
6 3.2636907 2.8330357 0.154036559 8.699394e-03 44 1003 1984
7 3.7916480 3.2346849 0.887916936 1.351803e-02 44 1003 1985
8 4.1778028 3.5364473 1.177985972 1.761273e-02 44 1003 1986
9 4.1819066 3.7297111 1.393016951 1.686331e-02 44 1003 1987
10 4.0174411 3.6050022 1.479584215 1.601205e-02 44 1003 1988
11 3.4466429 2.9633579 1.312863013 8.888067e-03 44 1003 1989
12 3.0667367 2.6128805 0.909779173 2.102674e-02 42 1004 1965
13 3.2362968 2.8140391 1.430690273 2.050934e-02 42 1004 1966
14 3.1981990 2.8822097 1.721614365 1.702929e-02 42 1004 1967
15 3.9265031 3.6159280 2.399823853 2.559074e-02 42 1004 1968
16 4.3343438 4.0116068 2.592692585 3.649313e-02 42 1004 1969
17 4.5869564 4.3059855 2.772196529 4.743631e-02 42 1004 1970
18 4.7015486 4.3995561 2.875267240 5.155589e-02 42 1004 1971
19 5.0564414 4.7539697 3.218686385 6.863808e-02 42 1004 1972
20 5.4323873 5.1711531 3.350849771 8.272720e-02 42 1004 1973
21 5.2979696 5.0033437 3.383504340 6.726429e-02 42 1004 1974
22 5.3958779 5.1475985 3.475121024 1.534230e-01 42 1004 1975
23 5.5442635 5.3195666 3.517557041 1.674937e-01 42 1004 1976
24 5.6260795 5.3909462 3.694842501 1.711362e-01 42 1004 1977
25 5.8039766 5.5455887 3.895724689 1.836405e-01 42 1004 1978
26 5.8198831 5.5665980 3.960153940 1.700499e-01 42 1004 1979
27 5.7474447 5.4697019 3.943733263 1.520660e-01 42 1004 1980
where gvkey is the firm id and naics are the industry codes.
The code I wrote:
library(dplyr)   # for %>%, select, between
library(prodest) # for prodestOP

theta <- matrix(NA, 60, 23)
count <- 1
temp <- dat %>% select(
"sale_log", "cogs_log", "ppegt_log",
"m_naics4", "naics_2", "gvkey", "year"
)
for (i in 1960:2019) { # 5-year rolling sector-year specific production functions
sub <- temp[between(temp$year, i - 4, i), ] # subset the 5-year window ending in year i
jcount <- 1
for (j in sort(unique(sub$naics_2))) { # loop over sectors
temp2 <- sub[sub$naics_2 == j, ]
mdl <- prodestOP(
Y = temp2$sale_log, fX = temp2$cogs_log, sX = temp2$ppegt_log,
pX = temp2$cogs_log, cX = temp2$m_naics4, idvar = temp2$gvkey,
timevar = temp2$year
)
theta[count, jcount] <- mdl@Model$FSbetas[2] # '@' (S4 slot access), not '#'
jcount <- jcount + 1
}
count <- count + 1
}
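On the parallelization question itself: since the window-by-window fits are independent, one option is base R's parallel package, farming the outer loop over years out to worker processes. This is only a sketch under the question's assumptions (prodestOP from the prodest package, temp as built above; the slot access mdl@Model$FSbetas[2] is my reading of the intended line), not a tested implementation:

```r
library(parallel)

# fit all sector-level production functions for the 5-year window ending in year i
fit_year <- function(i, temp) {
  sub <- temp[temp$year >= i - 4 & temp$year <= i, ]
  sapply(sort(unique(sub$naics_2)), function(j) {
    temp2 <- sub[sub$naics_2 == j, ]
    mdl <- prodest::prodestOP(
      Y = temp2$sale_log, fX = temp2$cogs_log, sX = temp2$ppegt_log,
      pX = temp2$cogs_log, cX = temp2$m_naics4, idvar = temp2$gvkey,
      timevar = temp2$year
    )
    mdl@Model$FSbetas[2]
  })
}

cl <- makeCluster(detectCores() - 1)
res <- parLapply(cl, 1960:2019, fit_year, temp = temp)  # one list entry per year
stopCluster(cl)
```

Because temp is passed as an argument and prodestOP is called with its namespace prefix, no explicit clusterExport is needed.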
I am trying to apply a rolling window regression model to multiple groups in my data. Part of my data is as below:
gvkey year LC YTO
1 001004 1972 0.1919713 2.021182
2 001004 1973 0.2275895 2.029056
3 001004 1974 0.3341368 2.053517
4 001004 1975 0.3313518 2.090532
5 001004 1976 0.4005829 2.136939
6 001004 1977 0.4471945 2.123909
7 001004 1978 0.4442004 2.150281
8 001004 1979 0.5054544 2.173162
9 001004 1980 0.5269449 2.188077
10 001004 1981 0.5423774 2.200805
11 001004 1982 0.3528982 2.200851
12 001004 1983 0.3674031 2.190487
13 001004 1984 0.2267620 2.181291
14 001004 1985 0.2796132 2.159443
15 001004 1986 0.3382120 2.128420
16 001004 1987 0.3214131 2.089670
17 001004 1988 0.3883732 2.048279
18 001004 1989 0.4466488 1.999539
19 001004 1990 0.4929991 1.955500
20 001004 1991 0.5150894 1.934893
21 001004 1992 0.5218845 1.925521
22 001004 1993 0.5038105 1.904241
23 001004 1994 0.5041639 1.881731
24 001004 1995 0.5196658 1.863143
25 001004 1996 0.5352994 1.844464
26 001004 1997 0.4556059 1.835676
27 001004 1998 0.4905767 1.837886
28 001004 1999 0.5471959 1.824636
29 001004 2000 0.5920976 1.814944
30 001004 2001 0.5998172 1.893943
31 001004 2002 0.4499911 1.889703
32 001004 2003 0.4207154 1.870703
33 001004 2004 0.4371594 1.831638
34 001004 2005 0.4525900 1.802684
35 001004 2006 0.4342149 1.781757
36 001004 2007 0.4899473 1.753360
37 001004 2008 0.5436673 1.680464
38 001004 2009 0.5873861 1.612499
39 001004 2010 0.5216734 1.544322
40 001004 2011 0.5592963 1.415892
41 001004 2012 0.5627509 1.407393
42 001004 2013 0.5904637 1.384202
43 001004 2014 0.6170085 1.353340
44 001004 2015 0.7145900 1.314014
45 001007 1975 0.3721916 2.090532
46 001007 1976 0.2760902 2.136939
47 001007 1977 0.1866554 2.123909
48 001007 1978 0.1977654 2.150281
49 001007 1979 0.1927100 2.173162
50 001007 1980 0.2112344 2.188077
51 001007 1981 -0.2141724 2.200805
52 001007 1982 -0.2072785 2.200851
53 001007 1983 -1.7406963 2.190487
54 001007 1984 -14.8071429 2.181291
55 001009 1982 -1.2753247 2.200851
56 001009 1983 1.3349904 2.190487
57 001009 1984 2.6192237 2.181291
58 001009 1985 0.5867925 2.159443
59 001009 1986 0.6959436 2.128420
60 001009 1987 0.7142857 2.089670
61 001009 1988 0.7771897 2.048279
62 001009 1989 0.8293820 1.999539
63 001009 1990 0.8655382 1.955500
64 001009 1991 0.8712144 1.934893
65 001009 1992 0.8882548 1.925521
66 001009 1993 0.9190540 1.904241
67 001009 1994 0.9411806 1.881731
68 001010 1971 0.6492499 2.002337
69 001010 1972 0.6667664 2.021182
70 001010 1973 0.6840115 2.029056
71 001010 1974 0.7011797 2.053517
72 001010 1975 0.7189469 2.090532
73 001010 1976 0.7367344 2.136939
74 001010 1977 0.7511779 2.123909
75 001010 1978 0.7673365 2.150281
76 001010 1979 0.7795880 2.173162
77 001010 1980 0.7824448 2.188077
78 001010 1981 0.7821913 2.200805
79 001010 1982 0.7646078 2.200851
80 001010 1983 0.7426172 2.190487
81 001010 1984 -0.0657935 2.181291
82 001010 1985 0.2802410 2.159443
83 001010 1986 0.2052373 2.128420
84 001010 1987 0.2465290 2.089670
85 001010 1988 0.3437856 2.048279
86 001010 1989 0.7398662 1.999539
87 001010 1990 0.6360582 1.955500
88 001010 1991 0.7790707 1.934893
89 001010 1992 0.7588472 1.925521
90 001010 1993 0.7695341 1.904241
91 001010 1994 0.8060759 1.881731
92 001010 1995 0.8381234 1.863143
93 001010 1996 0.8661541 1.844464
94 001010 1997 0.8700456 1.835676
95 001010 1998 0.8748443 1.837886
96 001010 1999 0.8884077 1.824636
97 001010 2000 0.8979903 1.814944
98 001010 2003 0.6812689 1.870703
99 001011 1983 0.3043007 2.190487
100 001011 1984 0.3080601 2.181291
My function is
Match.LC.YTO<-function(x){rollapplyr(x,width=10,by.column=F,fill=NA, FUN=function(m){
temp.1<-lm(LC~YTO,data=m)
summary(temp.1)$r.squared*(sign(summary(temp.1)$coefficients[2,1]))
})}
df<-df%>%group_by(gvkey)%>%mutate(MTCH=Match.LC.YTO(df))
My data is grouped by gvkey, and for each group I need to calculate a variable named "MTCH", which equals the R-squared value times the sign of the YTO coefficient in the linear model LC ~ YTO, estimated over a rolling window of 10 observations. I got the error message:
Error in mutate_impl(.data, dots) :
'data' must be a data.frame, not a matrix or an array
I have checked many other posts concerning the function rollapply and rollapplyr, and some suggest that I need to convert my df to zoo or matrix before I use rollapply function, but it still did not work.
rollapply in zoo will accept plain matrix and data frame arguments. That is not the problem. The following are problems with this code:
the code passes a matrix to lm but lm takes a data.frame
the code attempts to use rollapply with width of 10 on an object with fewer than 10 rows in the last group
if the intercept fits perfectly then there will be no 2nd coefficient from lm so the reference to coefficients[2, 1] will fail with an error.
Although not erroneous, the following are areas for improvement:
TRUE and FALSE should be written out in full, since T and F are valid variable names, which makes the short forms error-prone.
when using group_by in dplyr, always match it with an ungroup. If you don't, the output will remember the grouping, and the next time you use it you will get a surprise. For example, consider the difference between the following two snippets. The first results in n being the number of elements in the group that each row belongs to, whereas the second results in n being the number of rows in out.
out <- df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO(LC, YTO))
out %>% mutate(n = n())
out <- df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO(LC, YTO)) %>% ungroup
out %>% mutate(n = n())
questions to SO should be self-contained and reproducible so the library statements should not be omitted and the data should be provided in a reproducible manner
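The T/F pitfall above is easy to demonstrate:

```r
F <- 1                 # legal: T and F are ordinary variable names
c("a", "b")[F]         # "a" — F now indexes as 1 instead of acting as FALSE
# FALSE and TRUE are reserved words and cannot be reassigned
```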
To fix these problems we
use partial = TRUE in rollapply to allow it to pass objects with fewer than 10 rows.
pass the variables involved directly
rollapply over the row numbers.
add an NA to the end of the coefficients to be picked up if the coefficient vector otherwise has only 1 element.
for clarity we have separated out the lm_summary function which was anonymous in the question
for reproducibility we have added library statements and the Note at the end
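The degenerate-window problem and the NA-padding fix can be seen directly: when the regressor is constant within a window, summary()'s coefficient matrix has only one row, while padding the plain coefficient vector stays safe:

```r
d <- data.frame(LC = c(1, 2, 3), YTO = c(2, 2, 2))  # YTO constant in the window
fit <- lm(LC ~ YTO, data = d)
nrow(summary(fit)$coefficients)  # 1 — so coefficients[2, 1] would error
sign(c(coef(fit), NA)[2])        # NA instead of an error
```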
The revised code is:
library(dplyr)
library(zoo)
Match.LC.YTO <- function(LC, YTO) {
lm_summary <- function(ix) {
temp.1 <- lm(LC ~ YTO, subset = ix)
summary(temp.1)$r.squared * sign(c(coef(temp.1), NA)[2])
}
rollapplyr(seq_along(LC), width = 10, FUN = lm_summary, partial = TRUE)
}
df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO(LC, YTO)) %>% ungroup
If you would rather use fill = NA instead of partial = TRUE, then add a check for the series length being less than the window width, i.e. less than 10:
Match.LC.YTO2 <- function(LC, YTO) {
lm_summary <- function(ix) {
temp.1 <- lm(LC ~ YTO, subset = ix)
summary(temp.1)$r.squared * sign(c(coef(temp.1), NA)[2])
}
if (length(LC) < 10) return(NA) ##
rollapplyr(seq_along(LC), width = 10, FUN = lm_summary, fill = NA)
}
df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO2(LC, YTO)) %>% ungroup
Note 1
For sake of reproducibility we used this as the input df:
Lines <- " gvkey year LC YTO
1 001004 1972 0.1919713 2.021182
2 001004 1973 0.2275895 2.029056
3 001004 1974 0.3341368 2.053517
4 001004 1975 0.3313518 2.090532
5 001004 1976 0.4005829 2.136939
6 001004 1977 0.4471945 2.123909
7 001004 1978 0.4442004 2.150281
8 001004 1979 0.5054544 2.173162
9 001004 1980 0.5269449 2.188077
10 001004 1981 0.5423774 2.200805
11 001004 1982 0.3528982 2.200851
12 001004 1983 0.3674031 2.190487
13 001004 1984 0.2267620 2.181291
14 001004 1985 0.2796132 2.159443
15 001004 1986 0.3382120 2.128420
16 001004 1987 0.3214131 2.089670
17 001004 1988 0.3883732 2.048279
18 001004 1989 0.4466488 1.999539
19 001004 1990 0.4929991 1.955500
20 001004 1991 0.5150894 1.934893
21 001004 1992 0.5218845 1.925521
22 001004 1993 0.5038105 1.904241
23 001004 1994 0.5041639 1.881731
24 001004 1995 0.5196658 1.863143
25 001004 1996 0.5352994 1.844464
26 001004 1997 0.4556059 1.835676
27 001004 1998 0.4905767 1.837886
28 001004 1999 0.5471959 1.824636
29 001004 2000 0.5920976 1.814944
30 001004 2001 0.5998172 1.893943
31 001004 2002 0.4499911 1.889703
32 001004 2003 0.4207154 1.870703
33 001004 2004 0.4371594 1.831638
34 001004 2005 0.4525900 1.802684
35 001004 2006 0.4342149 1.781757
36 001004 2007 0.4899473 1.753360
37 001004 2008 0.5436673 1.680464
38 001004 2009 0.5873861 1.612499
39 001004 2010 0.5216734 1.544322
40 001004 2011 0.5592963 1.415892
41 001004 2012 0.5627509 1.407393
42 001004 2013 0.5904637 1.384202
43 001004 2014 0.6170085 1.353340
44 001004 2015 0.7145900 1.314014
45 001007 1975 0.3721916 2.090532
46 001007 1976 0.2760902 2.136939
47 001007 1977 0.1866554 2.123909
48 001007 1978 0.1977654 2.150281
49 001007 1979 0.1927100 2.173162
50 001007 1980 0.2112344 2.188077
51 001007 1981 -0.2141724 2.200805
52 001007 1982 -0.2072785 2.200851
53 001007 1983 -1.7406963 2.190487
54 001007 1984 -14.8071429 2.181291
55 001009 1982 -1.2753247 2.200851
56 001009 1983 1.3349904 2.190487
57 001009 1984 2.6192237 2.181291
58 001009 1985 0.5867925 2.159443
59 001009 1986 0.6959436 2.128420
60 001009 1987 0.7142857 2.089670
61 001009 1988 0.7771897 2.048279
62 001009 1989 0.8293820 1.999539
63 001009 1990 0.8655382 1.955500
64 001009 1991 0.8712144 1.934893
65 001009 1992 0.8882548 1.925521
66 001009 1993 0.9190540 1.904241
67 001009 1994 0.9411806 1.881731
68 001010 1971 0.6492499 2.002337
69 001010 1972 0.6667664 2.021182
70 001010 1973 0.6840115 2.029056
71 001010 1974 0.7011797 2.053517
72 001010 1975 0.7189469 2.090532
73 001010 1976 0.7367344 2.136939
74 001010 1977 0.7511779 2.123909
75 001010 1978 0.7673365 2.150281
76 001010 1979 0.7795880 2.173162
77 001010 1980 0.7824448 2.188077
78 001010 1981 0.7821913 2.200805
79 001010 1982 0.7646078 2.200851
80 001010 1983 0.7426172 2.190487
81 001010 1984 -0.0657935 2.181291
82 001010 1985 0.2802410 2.159443
83 001010 1986 0.2052373 2.128420
84 001010 1987 0.2465290 2.089670
85 001010 1988 0.3437856 2.048279
86 001010 1989 0.7398662 1.999539
87 001010 1990 0.6360582 1.955500
88 001010 1991 0.7790707 1.934893
89 001010 1992 0.7588472 1.925521
90 001010 1993 0.7695341 1.904241
91 001010 1994 0.8060759 1.881731
92 001010 1995 0.8381234 1.863143
93 001010 1996 0.8661541 1.844464
94 001010 1997 0.8700456 1.835676
95 001010 1998 0.8748443 1.837886
96 001010 1999 0.8884077 1.824636
97 001010 2000 0.8979903 1.814944
98 001010 2003 0.6812689 1.870703
99 001011 1983 0.3043007 2.190487
100 001011 1984 0.3080601 2.181291"
df <- read.table(text = Lines)
Note 2
The check for length in the line marked with ## at the end is no longer necessary as recent versions of zoo automatically make this check.
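The difference between the two variants (partial = TRUE vs. fill = NA) is easy to see on a toy vector:

```r
library(zoo)
rollapplyr(1:4, width = 3, FUN = sum, partial = TRUE)  # 1 3 6 9
rollapplyr(1:4, width = 3, FUN = sum, fill = NA)       # NA NA 6 9
```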
My Dataframe:
> head(scotland_weather)
JAN Year.1 FEB Year.2 MAR Year.3 APR Year.4 MAY Year.5 JUN Year.6 JUL Year.7 AUG Year.8 SEP Year.9 OCT Year.10
1 293.8 1993 278.1 1993 238.5 1993 191.1 1947 191.4 2011 155.0 1938 185.6 1940 216.5 1985 267.6 1950 258.1 1935
2 292.2 1928 258.8 1997 233.4 1990 149.0 1910 168.7 1986 137.9 2002 181.4 1988 211.9 1992 221.2 1981 254.0 1954
3 275.6 2008 244.7 2002 201.3 1992 146.8 1934 155.9 1925 137.8 1948 170.1 1939 202.3 2009 193.9 1982 248.8 2014
4 252.3 2015 227.9 1989 200.2 1967 142.1 1949 149.5 2015 137.7 1931 165.8 2010 191.4 1962 189.7 2011 247.7 1938
5 246.2 1974 224.9 2014 180.2 1979 133.5 1950 137.4 2003 135.0 1966 162.9 1956 190.3 2014 189.7 1927 242.3 1983
6 245.0 1975 195.6 1995 180.0 1989 132.9 1932 129.7 2007 131.7 2004 159.9 1985 189.1 2004 189.6 1985 240.9 2001
NOV Year.11 DEC Year.12 WIN Year.13 SPR Year.14 SUM Year.15 AUT Year.16 ANN Year.17
1 262.0 2009 300.7 2013 743.6 2014 409.5 1986 455.6 1985 661.2 1981 1886.4 2011
2 244.8 1938 268.5 1986 649.5 1995 401.3 2015 435.6 1948 633.8 1954 1828.1 1990
3 242.2 2006 267.2 1929 645.4 2000 393.7 1994 427.8 2009 615.8 1938 1756.8 2014
4 231.3 1917 265.4 2011 638.3 2007 393.2 1967 422.6 1956 594.5 1935 1735.8 1938
5 229.9 1981 264.0 2006 608.9 1990 391.7 1992 397.0 2004 590.6 1982 1720.0 2008
6 224.9 1951 261.0 1912 592.8 2015 389.1 1913 390.1 1938 589.2 2006 1716.5 1954
The Year.X columns are not ordered. I wish to convert this into the following format:
month year rainfall_mm
Jan 1993 293.8
Feb 1993 278.1
Mar 1993 238.5
...
Nov 2015 230.0
I tried t(), but it keeps the year columns separate.
I also tried reshape2's recast(data, formula, ..., id.var, measure.var), but something is missing, as the month and Year.X columns are numeric and integer respectively.
> str(scotland_weather)
'data.frame': 106 obs. of 34 variables:
$ JAN : num 294 292 276 252 246 ...
$ Year.1 : int 1993 1928 2008 2015 1974 1975 2005 2007 1990 1983 ...
$ FEB : num 278 259 245 228 225 ...
$ Year.2 : int 1990 1997 2002 1989 2014 1995 1998 2000 1920 1918 ...
$ MAR : num 238 233 201 200 180 ...
$ Year.3 : int 1994 1990 1992 1967 1979 1989 1921 1913 2015 1978 ...
$ APR : num 191 149 147 142 134 ...
Based on the pattern of alternating columns in 'scotland_weather' (each measurement column followed by its 'Year.X' column), one way would be to use c(TRUE, FALSE) to select alternate columns by recycling, which is equivalent to seq(1, ncol(scotland_weather), by = 2). Using c(FALSE, TRUE) gives seq(2, ncol(scotland_weather), by = 2). This is useful for extracting those columns, taking the transpose (t), and concatenating (c) to a vector. The next step is to extract the column names that are not 'Year', for which grep can be used. Then data.frame binds the vectors into a data frame.
res <- data.frame(month= names(scotland_weather)[!grepl('Year',
names(scotland_weather))], year=c(t(scotland_weather[c(FALSE,TRUE)])),
rainfall_mm= c(t(scotland_weather[c(TRUE,FALSE)])))
head(res,4)
# month year rainfall_mm
#1 JAN 1993 293.8
#2 FEB 1993 278.1
#3 MAR 1993 238.5
#4 APR 1947 191.1
The problem is not only that you need to transform your data: the years for the first column are in the second column, the years for the third column are in the fourth, and so on...
Here is a solution using tidyr (plus dplyr for the pipeline verbs).
library(tidyr)
library(dplyr)
# TRUE when the month column sits immediately before its Year.X column
is_pair <- Vectorize(function(x, y)
  which(names(scotland_weather) == x) - which(names(scotland_weather) == y) == -1)
years <- grep("Year", names(scotland_weather))
scotland_weather %>%
  gather("month", "rainfall_mm", -years) %>%
  gather("yearname", "year", -c(month, rainfall_mm)) %>%
  filter(is_pair(month, yearname)) %>%
  select(-yearname)
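If the gather-then-filter route feels fragile, the month/year pairing can also be done directly in base R, stacking one small data frame per column pair. A sketch on a toy two-pair version of the table:

```r
# toy version of the alternating-column layout
scotland_weather <- data.frame(
  JAN = c(293.8, 292.2), Year.1 = c(1993, 1928),
  FEB = c(278.1, 258.8), Year.2 = c(1993, 1997)
)

rain_cols <- names(scotland_weather)[c(TRUE, FALSE)]  # JAN, FEB, ...
year_cols <- names(scotland_weather)[c(FALSE, TRUE)]  # Year.1, Year.2, ...

# one small data frame per (month, year) column pair, stacked
res <- do.call(rbind, Map(function(m, y) {
  data.frame(month = m,
             year = scotland_weather[[y]],
             rainfall_mm = scotland_weather[[m]])
}, rain_cols, year_cols))
res[order(res$year), ]
```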
I ran the BMA package in R to fit a Cox PH model. I'm wondering what I should change so that the error "arguments imply differing number of rows: 146, 0" can be resolved.
library(BMA)
data <- read.csv("Test1.csv", header = TRUE)
x<- data[1:146,]
x <- data[,c( "dom_econ_2","llgdp", "pcrdbofgdp")]
surv.t<- x$crisis1
cens<- x$cen1
test.bic.surv <- bic.surv(x, surv.t, cens, factor.type=TRUE, strict=FALSE, nbest=200)
Error in data.frame(mm[, -1], surv.t = surv.t, cens = cens) :
arguments imply differing number of rows: 146, 0
Construction of data.
data <- read.table(text=" country Start crisis1 cen1 llgdp pcrdbofgdp dom_econ_2
1 Algeria 1988 48 1 90.537788 65.226883 0.00
2 Algeria 1994 24 1 43.727940 5.994088 14.25
3 Argentina 1985 96 0 12.049210 12.676220 0.00
4 Argentina 2002 12 1 27.514610 18.335609 14.96
5 Australia 1985 12 0 36.909191 30.567970 0.00
6 Australia 1997 12 1 60.054508 69.576698 104.06
7 Australia 2000 12 1 64.405777 80.765381 89.13
8 Australia 2008 12 1 95.728081 115.909699 237.16
9 Austria 2005 12 1 91.344994 108.155701 82.14
10 Belgium 2005 12 1 102.885399 71.527367 114.55
11 Bolivia 1985 12 0 4.461628 4.868293 0.00
12 Bolivia 1987 12 1 13.480320 13.259240 0.00
13 Bolivia 1989 12 1 17.370689 17.162399 0.00
14 Brazil 1985 132 0 7.082396 22.242729 0.00
15 Brazil 1999 12 1 40.434750 30.275040 153.22
16 Brazil 2001 24 1 45.114819 30.151600 133.65
17 Brazil 2008 12 1 57.924221 47.755600 409.57
18 canada 2008 12 1 119.428703 126.900398 225.36
19 Chile 1985 12 0 0.000000 0.000000 0.00
20 Chile 1987 12 1 0.000000 0.000000 0.00
21 Chile 1989 12 1 0.000000 0.000000 0.00
22 Chile 2008 12 1 0.000000 0.000000 35.17
23 Cote D'lvoire 1994 12 1 25.643181 22.177429 2.10
24 Cote D'lvoire 2011 24 1 41.235161 19.288630 4.68
25 china 1986 12 1 0.000000 0.000000 0.00
26 china 1989 12 1 62.773560 71.162529 0.00
27 china 1994 12 1 83.825783 76.370827 67.21
28 Colombia 1985 84 0 29.268551 32.937222 0.00
29 Colombia 1995 12 1 30.042919 30.603430 12.56
30 Colombia 1997 48 1 31.537670 34.393360 17.34
31 Colombia 2002 12 1 16.778780 22.066490 17.12
32 Costa Rica 1987 12 1 35.334270 17.252380 0.00
33 Costa Rica 1991 12 1 30.253300 10.472690 1.01
34 Costa Rica 1995 12 1 25.711729 10.946140 1.88
35 Dominican Republic 1985 12 0 22.065741 38.200081 0.00
36 Dominican Republic 1987 24 1 27.200859 41.605549 0.00
37 Dominican Republic 1990 12 1 23.815241 35.062832 0.77
38 Dominican Republic 2002 24 1 20.893270 38.377579 3.62
39 Ecuador 1985 96 0 24.365290 25.992100 0.00
40 Ecuador 1995 72 1 25.012659 25.226681 3.30
41 Egypt 1989 36 1 0.000000 0.000000 0.00
42 Egypt 2001 12 1 0.000000 0.000000 21.36
43 Egypt 2003 12 1 0.000000 0.000000 21.67
44 El Salvador 1988 12 1 5.249366 4.249679 0.00
45 Finland 1992 12 1 61.804680 93.284843 51.87
46 France 2005 12 1 73.674927 90.176163 1144.92
47 Germany 1997 12 1 69.414650 107.758598 1048.86
48 Germany 1999 12 1 85.617897 115.610901 1037.57
49 Germany 2005 12 1 105.417099 111.763199 1297.82
50 Greece 1985 24 0 58.569908 37.887230 0.00
51 Greece 1990 12 1 68.117287 34.083881 30.32
52 Greece 1999 36 1 55.327202 36.298470 44.28
53 Greece 2005 12 1 85.200127 73.185272 77.85
54 Guatemala 1986 12 1 23.963770 14.939860 0.00
55 Guatemala 1989 24 1 22.968491 14.576470 0.00
56 Honduras 1990 12 1 31.085350 29.356951 0.60
57 Honduras 1993 24 1 29.533979 25.364269 0.91
58 Honduras 1996 12 1 28.978729 22.788309 0.86
59 Hungary 1989 12 1 39.513908 44.371880 0.00
60 Hungary 1991 12 1 44.693378 42.222179 18.29
61 Hungary 1993 12 1 52.589550 28.814779 21.60
62 Hungary 1995 36 1 44.789848 21.890961 21.87
63 Hungary 1999 12 1 44.038410 24.015810 21.43
64 Iceland 1985 24 0 21.419769 34.361641 0.00
65 Iceland 1988 24 1 25.819929 34.976372 0.00
66 Iceland 2008 12 1 93.622017 184.647003 0.00
67 India 1988 12 1 40.268990 28.615240 0.00
68 India 1991 12 1 40.929920 23.150181 55.40
69 India 1993 12 1 42.146000 22.969900 53.35
70 India 2008 12 1 69.759697 44.396610 207.09
71 Indonesia 1997 24 1 50.021770 53.528721 40.59
72 Indonesia 2000 12 1 49.576542 17.631670 27.06
73 Indonesia 2008 12 1 36.236462 23.411659 101.12
74 Ireland 1993 12 1 46.543369 42.833199 16.32
75 Ireland 1997 12 1 69.748718 72.668739 22.49
76 Ireland 2005 12 1 87.587280 141.341995 51.42
77 Italy 1992 12 1 61.862431 57.690781 537.05
78 Italy 2005 12 1 58.811539 85.478607 856.04
79 Malaysia 1997 12 1 116.673599 139.381607 21.01
80 Mexico 1985 36 0 23.277300 10.972870 0.00
81 Mexico 1989 12 1 12.128950 11.774920 0.00
82 Mexico 1994 24 1 27.620720 33.321041 64.37
83 Mexico 1998 12 1 31.633909 22.903950 60.87
84 Mexico 2008 12 1 25.276720 20.486820 175.60
85 Morocco 1985 12 0 46.630791 28.247660 0.00
86 Netherlands 2005 12 1 111.478996 159.227707 196.86
87 New Zealand 1997 12 1 81.314529 96.649277 20.87
88 New Zealand 2008 12 1 91.273071 143.887497 40.38
89 Nicaragua 1985 24 0 0.000000 0.000000 0.00
90 Nicaragua 1988 48 1 0.000000 0.000000 0.00
91 Nicaragua 1993 12 1 0.000000 0.000000 0.54
92 Nigeria 1985 72 0 33.616810 15.274050 0.00
93 Nigeria 1999 12 1 18.795080 12.470600 10.26
94 Norway 1986 12 1 52.509472 65.354111 0.00
95 Norway 2008 12 1 0.000000 0.000000 138.04
96 Paraguay 1985 24 0 19.059549 13.474090 0.00
97 Paraguay 1989 12 1 18.109470 13.592000 0.00
98 Paraguay 1992 24 1 28.895550 20.640970 0.88
99 Paraguay 1998 24 1 27.359171 27.806259 1.41
100 Paraguay 2001 24 1 27.472139 27.111059 1.27
101 Peru 1985 12 0 18.312740 12.587190 0.00
102 Peru 1987 84 1 14.426420 9.529409 0.00
103 Peru 1998 12 1 29.766150 26.084431 9.76
104 Philippines 1990 12 1 32.946239 19.481730 8.97
105 Philippines 1997 12 1 60.959930 55.599201 15.96
106 Philippines 2000 12 1 57.644821 39.109230 14.52
107 Poland 1985 108 0 38.214378 51.334850 0.00
108 Poland 1995 36 1 27.932590 14.869600 51.27
109 Poland 1999 12 1 37.415001 22.911200 32.18
110 Poland 2008 12 1 48.807541 43.228100 178.28
111 Portugal 2005 12 1 92.989853 135.765900 89.34
112 Romania 1990 144 1 0.000000 0.000000 12.92
113 Romania 2008 12 1 31.392929 36.600521 32.11
114 Romania 2010 12 1 37.728611 45.040459 32.29
115 Russia 1987 120 1 0.000000 0.000000 0.00
116 Russia 1998 24 1 0.000000 0.000000 43.93
117 Russia 2008 12 1 0.000000 0.000000 293.34
118 Singapore 1997 12 1 109.437202 107.355103 29.25
119 South Africa 1985 12 0 51.689949 66.574753 0.00
120 South Africa 1988 12 1 49.117390 67.433647 0.00
121 South Africa 1996 12 1 47.592419 112.563797 41.01
122 South Africa 1998 12 1 53.312820 113.043098 36.40
123 South Africa 2000 24 1 52.709499 127.040100 34.19
124 South Africa 2008 12 1 46.246601 149.139099 80.10
125 Spain 1993 12 1 73.074364 77.935318 129.39
126 Spain 2005 12 1 100.510200 129.920197 159.93
127 Sri Lanka 1989 12 1 35.501869 19.156321 0.00
128 Sweden 1992 12 1 50.942661 124.471397 117.62
129 Sweden 2005 12 1 46.589840 102.645203 97.60
130 Sweden 2008 12 1 56.333191 124.272102 116.23
131 Switzerland 1999 12 1 165.171402 159.786499 27.19
132 Thailand 1997 12 1 90.951942 154.129700 27.92
133 Thailand 2000 12 1 112.097000 116.628799 21.31
134 Tunisia 1986 12 1 0.000000 0.000000 0.00
135 Turkey 1985 204 0 20.020611 15.242030 0.00
136 Turkey 2008 12 1 44.036678 29.615061 175.62
137 Uruguay 1985 156 0 43.514191 34.115601 0.00
138 Uruguay 2001 24 1 45.520069 49.360771 5.82
139 Venezuela 1986 12 1 0.000000 0.000000 0.00
140 Venezuela 1989 96 1 0.000000 0.000000 0.00
141 Venezuela 2002 12 1 0.000000 0.000000 23.89
142 Venezuela 2004 12 1 0.000000 0.000000 28.59
143 Venezuela 2010 12 1 0.000000 0.000000 85.81
144 United Kingdom 1993 12 1 59.609852 106.663597 409.43
145 United Kingdom 2008 12 1 163.094299 197.386902 1093.45
146 United States 2002 24 1 64.508629 169.231400 2012.69",
header=TRUE)
The problem is that surv.t & cens are blank.
## NOTICE IN THIS LINE, YOU SELECT ONLY THREE SPECIFIC COLUMNS
x <- data[,c( "dom_econ_2","llgdp", "pcrdbofgdp")]
## Then in this line, you are trying to access a column that is not there.
surv.t<- x$crisis1
I believe you meant to use data instead of x:
surv.t <- data$crisis1
cens <- data$cen1
If you want just the first 146 rows, use
surv.t <- data$crisis1[1:146]
cens <- data$cen1[1:146]
However, bear in mind that you can just use data$cen1 (etc.) as an argument to your function; there is no need to create a new variable.
As a general troubleshooting tip: If you are getting an error from a function and you are not sure why, one of the first steps is to check the arguments that you are passing to the function (ie, check the things inside the parentheses) and make sure they have the values that you (and the function) expect them to have.
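Applied here, two one-line checks expose the problem immediately (toy data standing in for the first rows of Test1.csv):

```r
data <- data.frame(crisis1 = c(48, 24), cen1 = c(1, 1),
                   dom_econ_2 = c(0, 14.25), llgdp = c(90.5, 43.7),
                   pcrdbofgdp = c(65.2, 6.0))

x <- data[, c("dom_econ_2", "llgdp", "pcrdbofgdp")]  # crisis1/cen1 dropped here
"crisis1" %in% names(x)  # FALSE — the column is gone from x
length(x$crisis1)        # 0 — the source of "differing number of rows: 146, 0"
```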