I have the data frame (DF) below. For each station, I need to count how many times/years it has recorded the maximum avg. temp, the minimum avg. temp, and the maximum total precipitation.
In each row of DF, each year is followed by its max avg. temp, min avg. temp, and total precipitation. For example, if the highest max avg. temperature in 1985 was recorded at station 1, that should count as one for station 1, and so on.
Any suggestion or help is appreciated.
Thanks.
DF:
St_name Met_data
station1 1985 15.33 4.33 780.1, 1986 12.7 2.18 505.3, 1987 17.76 6.33 793.6, 1988 17.35 4.53 541, 1989 15.65 3.98 793.7, 1990 16.9 5.96 1169.4, 1991 16.42 5.26 790.6, 1992 14.99 5.04 932.6, 1993 13.96 4.75 1420.7, 1994 14.96 3.79 668.8, 1995 15 3.67 952.9, 1996 13.77 2.4 808.5, 1997 14.69 3.26 773.5, 1998 17.22 6.25 1126.4, 1999 16.35 4.32 921.9, 2000 14.55 2.83 893.9, 2001 15.71 4.33 1118.8, 2002 15.61 3.96 1000.4, 2003 14.83 2.84 911.7, 2004 14.9 4 965.1, 2005 16.16 4.7 647.7, 2006 16.18 5.14 800.8, 2007 15.52 4.15 890.3, 2008 14.35 2.91 1271.9, 2009 14.4 3.77 1343.8, 2010 15.32 4.57 1145.4, 2011 15.41 4.54 857.3, 2012 17.39 5.4 745, 2013 15.26 3.51 811.4, 2014 13.8 2.37 986.3
station2 1985 19.27 7.81 1465.5, 1986 20.37 8.81 1201.3, 1987 20.95 8.72 949.2, 1988 20.03 7.53 1104.6, 1989 19.11 7.42 1050.1, 1990 20.53 8.76 1486.2, 1991 20.21 9.53 1164.4, 1992 19.55 8.51 913.6, 1993 18.7 8.24 1485.1, 1994 19.43 8.42 1171.7, 1995 19.62 7.41 1084.9, 1996 19.01 6.29 1212.4, 1997 18.85 6.76 1243.2, 1998 21 8.27 1261.1, 1999 21.28 7.99 1122.4, 2000 19.99 7.74 1242.7, 2001 20.13 7.59 1305.8, 2002 20.13 7.69 1563, 2003 19.48 6.52 1237.1, 2004 19.94 7.42 1174.8, 2005 20.53 8.05 1140.5, 2006 20.16 7.18 1542, 2007 21.44 7.91 1167.8, 2008 17.6 5.51 1653.8, 2009 20.63 9.06 1326, 2010 21.31 8.7 1024.8, 2011 21.21 9.96 1847.6, 2012 22.22 9.39 782.5, 2013 20.46 9.29 770.7
station3 1985 14.43 2.97 951.6, 1986 15.41 3.37 415.6, 1987 15.08 4.34 1110, 1988 16.19 3.33 787.6, 1989 14.77 2.19 796.8, 1990 16.28 4.59 1213.6, 1991 16.72 4.67 907.4, 1992 14.74 4.18 935.6, 1993 15.22 5.06 903.1, 1994 15.46 2.79 907.5, 1995 15.34 4.21 1001.1, 1996 14.46 2.49 1204.5, 1997 14.95 2.95 819, 1998 17.5 5.3 1078.6, 1999 16.73 3.24 901.9, 2000 15.81 2.7 931.4, 2001 16.68 4.09 968.7, 2002 16.48 6.41 762.2, 2003 15.47 4.99 999.6, 2004 15.32 5.31 875.7, 2005 16.16 5.91 593.2, 2006 16.06 6.3 997.2, 2007 15.87 5.71 946, 2008 14.46 4.1 1128.1, 2009 14.26 4.38 1146.1, 2010 15.92 4.79 1037.6, 2011 15.25 5.47 1045.8, 2012 17.47 6.43 659.2, 2013 14.25 4 1092.9, 2014 13.26 2.98 1039.4
.
.
Output:
St_name max_T_count min_T_count precip_count
station1 1 0 0
station2 0 2 0
station3 1 1 1
.
.
You should at least make an effort to organize your data in a spreadsheet before posting; the first four lines of the code below are just for tidying it. I am also not sure exactly what you want for precip_count, but you can work that out from this solution.
library(tidyverse)
df %>%
  separate_rows(Met_data, sep = ",") %>%                  # one row per year
  mutate(Met_data = trimws(Met_data)) %>%                 # strip stray whitespace around each chunk
  separate(Met_data, sep = " ",
           into = c("year", "max_avg", "min_avg", "total_avg"),
           convert = TRUE) %>%                            # convert = TRUE makes the new columns numeric,
                                                          # so max()/min() compare values, not strings
  group_by(year) %>%
  mutate(max_T_count = as.integer(max_avg == max(max_avg)),
         min_T_count = as.integer(min_avg == min(min_avg)),
         precip_count = as.integer(total_avg == max(total_avg))) %>%
  ungroup() %>%
  group_by(St_name) %>%
  summarise_at(vars(ends_with("count")), sum)
%>% is the pipe operator from the magrittr package.
separate_rows splits the Met_data column at the commas, putting each year's entry in its own row.
trimws removes the extra whitespace around each entry. This is necessary so that separate can split the strings exactly at the blanks.
separate splits Met_data at the blanks and assigns the new variables their column names; convert = TRUE turns them into numeric columns.
group_by specifies the grouping under which the aggregation is done.
mutate creates new columns.
summarise_at summarises the specified columns with the specified functions.
That is a handful, so I advise reading the documentation for each of these by typing ?function, replacing function with each of the names above. Or you can use help(), e.g. help("%>%", package = "magrittr").
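If these verbs are new to you, a tiny toy example (made-up data, not your DF) may make the reshaping steps clearer:
library(tidyverse)
toy <- tibble(St_name = c("a", "b"),
              Met_data = c("1985 1.1 2.2, 1986 3.3 4.4", "1985 5.5 6.6"))
toy %>%
  separate_rows(Met_data, sep = ",") %>%                       # one row per comma-separated chunk
  mutate(Met_data = trimws(Met_data)) %>%                      # drop the space left after each comma
  separate(Met_data, sep = " ", into = c("year", "x", "y"), convert = TRUE)
That produces one row per station-year with numeric year, x and y columns, which is the shape the grouping above relies on.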
Here is the output.
# A tibble: 3 x 4
# St_name max_T_count min_T_count precip_count
# <fct> <int> <int> <int>
# 1 station1 1 17 2
# 2 station2 29 0 24
# 3 station3 0 13 4
Here is the data.
df <- structure(list(St_name = structure(1:3, .Label = c("station1",
"station2", "station3"), class = "factor"), Met_data = structure(c(2L,
3L, 1L), .Label = c(" 1985 14.43 2.97 951.6, 1986 15.41 3.37 415.6, 1987 15.08 4.34 1110, 1988 16.19 3.33 787.6, 1989 14.77 2.19 796.8, 1990 16.28 4.59 1213.6, 1991 16.72 4.67 907.4, 1992 14.74 4.18 935.6, 1993 15.22 5.06 903.1, 1994 15.46 2.79 907.5, 1995 15.34 4.21 1001.1, 1996 14.46 2.49 1204.5, 1997 14.95 2.95 819, 1998 17.5 5.3 1078.6, 1999 16.73 3.24 901.9, 2000 15.81 2.7 931.4, 2001 16.68 4.09 968.7, 2002 16.48 6.41 762.2, 2003 15.47 4.99 999.6, 2004 15.32 5.31 875.7, 2005 16.16 5.91 593.2, 2006 16.06 6.3 997.2, 2007 15.87 5.71 946, 2008 14.46 4.1 1128.1, 2009 14.26 4.38 1146.1, 2010 15.92 4.79 1037.6, 2011 15.25 5.47 1045.8, 2012 17.47 6.43 659.2, 2013 14.25 4 1092.9, 2014 13.26 2.98 1039.4",
" 1985 15.33 4.33 780.1, 1986 12.7 2.18 505.3, 1987 17.76 6.33 793.6, 1988 17.35 4.53 541, 1989 15.65 3.98 793.7, 1990 16.9 5.96 1169.4, 1991 16.42 5.26 790.6, 1992 14.99 5.04 932.6, 1993 13.96 4.75 1420.7, 1994 14.96 3.79 668.8, 1995 15 3.67 952.9, 1996 13.77 2.4 808.5, 1997 14.69 3.26 773.5, 1998 17.22 6.25 1126.4, 1999 16.35 4.32 921.9, 2000 14.55 2.83 893.9, 2001 15.71 4.33 1118.8, 2002 15.61 3.96 1000.4, 2003 14.83 2.84 911.7, 2004 14.9 4 965.1, 2005 16.16 4.7 647.7, 2006 16.18 5.14 800.8, 2007 15.52 4.15 890.3, 2008 14.35 2.91 1271.9, 2009 14.4 3.77 1343.8, 2010 15.32 4.57 1145.4, 2011 15.41 4.54 857.3, 2012 17.39 5.4 745, 2013 15.26 3.51 811.4, 2014 13.8 2.37 986.3",
" 1985 19.27 7.81 1465.5, 1986 20.37 8.81 1201.3, 1987 20.95 8.72 949.2, 1988 20.03 7.53 1104.6, 1989 19.11 7.42 1050.1, 1990 20.53 8.76 1486.2, 1991 20.21 9.53 1164.4, 1992 19.55 8.51 913.6, 1993 18.7 8.24 1485.1, 1994 19.43 8.42 1171.7, 1995 19.62 7.41 1084.9, 1996 19.01 6.29 1212.4, 1997 18.85 6.76 1243.2, 1998 21 8.27 1261.1, 1999 21.28 7.99 1122.4, 2000 19.99 7.74 1242.7, 2001 20.13 7.59 1305.8, 2002 20.13 7.69 1563, 2003 19.48 6.52 1237.1, 2004 19.94 7.42 1174.8, 2005 20.53 8.05 1140.5, 2006 20.16 7.18 1542, 2007 21.44 7.91 1167.8, 2008 17.6 5.51 1653.8, 2009 20.63 9.06 1326, 2010 21.31 8.7 1024.8, 2011 21.21 9.96 1847.6, 2012 22.22 9.39 782.5, 2013 20.46 9.29 770.7"
), class = "factor")), .Names = c("St_name", "Met_data"), class = "data.frame", row.names = c(NA,
-3L))
Related
With the small reproducible example below, I'd like to identify the dplyr approach that arrives at the data.frame shown at the end of this note. The dplyr output should ensure that the data.frame is sorted by date (note that the dates 1999-04-13 and 1999-03-12 are out of order) and should then "accumulate", within each wy grouping (wy = "water year"; Oct 1-Sep 30), the number of days that Q is above a threshold of 3.0.
dat <- read.table(text="
Date wy Q
1997-01-01 1997 9.82
1997-02-01 1997 3.51
1997-02-02 1997 9.35
1997-10-04 1998 0.93
1997-11-01 1998 1.66
1997-12-02 1998 0.81
1998-04-03 1998 5.65
1998-05-05 1998 7.82
1998-07-05 1998 6.33
1998-09-06 1998 0.55
1998-09-07 1998 4.54
1998-10-09 1999 6.50
1998-12-31 1999 2.17
1999-01-01 1999 5.67
1999-04-13 1999 5.66
1999-03-12 1999 4.67
1999-06-05 1999 3.34
1999-09-30 1999 1.99
1999-11-06 2000 5.75
2000-03-04 2000 6.28
2000-06-07 2000 0.81
2000-07-06 2000 9.66
2000-09-09 2000 9.08
2000-09-21 2000 6.72", header=TRUE)
dat$Date <- as.Date(dat$Date)
mdat <- dat %>%
group_by(wy) %>%
filter(Q > 3) %>%
?
Desired results:
Date wy Q abvThreshCum
1997-01-01 1997 9.82 1
1997-02-01 1997 3.51 2
1997-02-02 1997 9.35 3
1997-10-04 1998 0.93 0
1997-11-01 1998 1.66 0
1997-12-02 1998 0.81 0
1998-04-03 1998 5.65 1
1998-05-05 1998 7.82 2
1998-07-05 1998 6.33 3
1998-09-06 1998 0.55 3
1998-09-07 1998 4.54 4
1998-10-09 1999 6.50 1
1998-12-31 1999 2.17 1
1999-01-01 1999 5.67 2
1999-03-12 1999 4.67 3
1999-04-13 1999 5.66 4
1999-06-05 1999 3.34 5
1999-09-30 1999 1.99 5
1999-11-06 2000 5.75 1
2000-03-04 2000 6.28 2
2000-06-07 2000 0.81 2
2000-07-06 2000 9.66 3
2000-09-09 2000 9.08 4
2000-09-21 2000 6.72 5
library(dplyr)
dat %>%
arrange(Date) %>%
group_by(wy) %>%
mutate(abv = cumsum(Q > 3)) %>%
ungroup()
# # A tibble: 24 x 4
# Date wy Q abv
# <date> <int> <dbl> <int>
# 1 1997-01-01 1997 9.82 1
# 2 1997-02-01 1997 3.51 2
# 3 1997-02-02 1997 9.35 3
# 4 1997-10-04 1998 0.93 0
# 5 1997-11-01 1998 1.66 0
# 6 1997-12-02 1998 0.81 0
# 7 1998-04-03 1998 5.65 1
# 8 1998-05-05 1998 7.82 2
# 9 1998-07-05 1998 6.33 3
# 10 1998-09-06 1998 0.55 3
# # ... with 14 more rows
data.table approach
library(data.table)
setDT(dat, key = "Date")[, abvThreshCum := cumsum(Q > 3), by = .(wy)]
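For comparison, a rough base R sketch of the same idea (assuming dat as defined in the question; sort first so the cumulative count follows calendar order):
dat <- dat[order(dat$Date), ]
dat$abvThreshCum <- ave(dat$Q > 3, dat$wy, FUN = cumsum)   # cumulative count within each water year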
Hi everyone. I am using the dlnm package in R to analyze the lag effect of climatic conditions on the prevalence of a disease.
I followed somebody else's program closely, and it worked for avg.temp and max.speed, but it throws the error "coef/vcov not consistent with basis matrix" for avg.ap and avg.hum. I only changed which variable goes into the code and changed nothing else.
My working hypothesis is that maybe dlnm just doesn't like wet weather. T T
I don't know what to do; can you help me?
Part 1 is the code that ran successfully, part 2 is the code that throws the error, and part 3 is the data I used.
Thank you very much.
Part 1. Successfully run code
attach(cpdlnm)
cb.temp = crossbasis(avg.temp, lag=1 ,
argvar=list(fun="ns",
knots= c(10)),
arglag=list(fun="lin"))
modeltemp = glm(pre1 ~ cb.temp +
ns(no,1*1),
family=quasipoisson(), cpdlnm)
pred1.temp = crosspred(cb.temp,
modeltemp,
cen=round(median(avg.temp)),
bylag=1)
Part 2. Error code
attach(cpdlnm)
cb.hum = crossbasis(avg.hum, lag=1 ,
argvar=list(fun="ns",
knots= c(10)),
arglag=list(fun="lin"))
modelhum = glm(pre1 ~ cb.hum +
ns(no,1*1),
family=quasipoisson(), cpdlnm)
pred1.hum = crosspred(cb.hum, # This step shows "coef/vcov not consistent with basis matrix"
modelhum,
cen=round(median(avg.hum)),
bylag=0.1)
Part 3. The data are as follows:
no pre1 date year month avg.ap avg.temp avg.hum max.speed
1 3.23 12-Jan 2012 1 996.60 9.00 81.60 5.30
2 6.04 12-Feb 2012 2 993.20 10.90 80.80 6.20
3 5.18 12-Mar 2012 3 991.00 16.40 78.70 7.60
4 4.07 12-Apr 2012 4 985.40 23.50 73.50 7.40
5 4.88 12-May 2012 5 982.60 26.30 77.20 7.00
6 5.11 12-Jun 2012 6 978.10 27.00 81.30 6.20
7 6.18 12-Jul 2012 7 979.50 28.10 77.70 6.40
8 6.17 12-Aug 2012 8 980.40 28.00 75.60 7.90
9 5.18 12-Sep 2012 9 987.60 25.30 73.60 6.30
10 5.16 12-Oct 2012 10 990.70 23.60 72.20 6.20
11 4.61 12-Nov 2012 11 991.70 18.00 79.70 6.90
12 5.26 12-Dec 2012 12 995.00 13.20 74.90 6.50
13 3.79 13-Jan 2013 1 997.10 11.20 78.40 5.70
14 3.87 13-Feb 2013 2 993.50 15.30 82.20 6.50
15 3.37 13-Mar 2013 3 989.90 20.20 74.20 8.00
16 2.85 13-Apr 2013 4 987.00 21.50 78.50 7.70
17 4.38 13-May 2013 5 983.30 25.60 79.20 6.80
18 5.67 13-Jun 2013 6 980.60 27.40 76.90 6.60
19 6.45 13-Jul 2013 7 981.30 28.00 77.50 7.10
20 6.95 13-Aug 2013 8 980.50 27.90 78.20 7.90
21 6.51 13-Sep 2013 9 985.90 25.40 77.60 6.00
22 8.16 13-Oct 2013 10 992.20 22.10 68.80 5.30
23 5.34 13-Nov 2013 11 994.50 18.70 72.30 6.20
24 6.18 13-Dec 2013 12 997.30 11.70 67.20 5.30
25 5.69 14-Jan 2014 1 996.70 12.70 70.30 6.00
26 6.44 14-Feb 2014 2 993.00 12.10 76.90 6.40
27 4.16 14-Mar 2014 3 991.60 16.50 83.90 7.30
28 4.13 14-Apr 2014 4 987.60 22.60 82.40 6.70
29 3.96 14-May 2014 5 983.60 25.70 78.80 7.70
30 4.72 14-Jun 2014 6 979.20 27.70 81.40 7.90
31 5.21 14-Jul 2014 7 980.70 28.30 80.20 9.40
32 5.29 14-Aug 2014 8 982.40 27.50 81.30 7.50
33 6.74 14-Sep 2014 9 984.70 27.10 77.70 8.50
34 4.80 14-Oct 2014 10 991.20 23.90 73.10 5.90
35 4.31 14-Nov 2014 11 993.30 18.60 79.60 6.20
36 4.35 14-Dec 2014 12 998.70 12.30 67.30 5.90
37 2.95 15-Jan 2015 1 996.70 13.30 76.30 6.20
38 4.63 15-Feb 2015 2 993.50 15.50 78.30 6.50
39 4.00 15-Mar 2015 3 991.70 17.70 83.40 6.30
40 4.16 15-Apr 2015 4 988.40 22.80 70.20 7.30
41 4.67 15-May 2015 5 982.40 26.70 80.50 8.00
42 5.62 15-Jun 2015 6 980.90 28.20 81.00 7.40
43 5.04 15-Jul 2015 7 980.20 27.30 79.40 6.70
44 5.79 15-Aug 2015 8 982.40 27.60 80.10 6.50
45 5.28 15-Sep 2015 9 986.30 26.00 84.60 6.50
46 4.39 15-Oct 2015 10 991.20 23.00 78.30 6.90
47 4.13 15-Nov 2015 11 993.50 19.40 85.30 6.90
48 3.30 15-Dec 2015 12 997.80 13.00 80.90 5.70
49 5.30 16-Jan 2016 1 996.00 11.80 82.30 6.40
50 4.57 16-Feb 2016 2 997.80 12.20 68.90 7.00
51 4.66 16-Mar 2016 3 991.70 17.00 78.90 7.00
52 4.01 16-Apr 2016 4 984.60 23.40 80.90 9.80
53 4.90 16-May 2016 5 983.80 25.50 78.70 8.30
54 3.75 16-Jun 2016 6 981.70 28.20 78.80 7.70
55 3.13 16-Jul 2016 7 981.10 28.90 77.60 7.60
56 3.25 16-Aug 2016 8 979.00 28.00 79.80 8.70
57 2.93 16-Sep 2016 9 984.30 26.60 75.20 6.40
58 2.93 16-Oct 2016 10 987.90 24.40 72.90 7.00
59 3.08 16-Nov 2016 11 993.40 18.10 79.60 6.70
60 2.99 16-Dec 2016 12 995.70 15.40 71.70 6.80
61 3.10 17-Jan 2017 1 994.70 14.50 79.20 6.50
62 3.75 17-Feb 2017 2 994.80 14.70 71.50 8.30
63 3.49 17-Mar 2017 3 990.20 16.50 83.60 8.50
64 3.36 17-Apr 2017 4 986.80 21.90 76.70 7.80
65 3.69 17-May 2017 5 985.00 24.80 77.50 10.00
66 3.76 17-Jun 2017 6 980.20 26.90 84.80 8.50
67 2.69 17-Jul 2017 7 981.00 27.50 83.60 9.80
68 3.05 17-Aug 2017 8 980.50 27.70 83.40 9.00
69 3.05 17-Sep 2017 9 984.20 27.60 81.50 7.10
70 2.46 17-Oct 2017 10 990.00 22.80 75.90 7.90
71 2.08 17-Nov 2017 11 993.00 17.80 79.50 7.00
72 2.32 17-Dec 2017 12 996.90 13.30 69.30 6.90
73 2.53 18-Jan 2018 1 992.10 12.00 78.40 8.10
74 3.29 18-Feb 2018 2 992.90 13.40 68.70 7.20
75 3.03 18-Mar 2018 3 988.30 19.20 78.20 9.10
76 2.30 18-Apr 2018 4 986.50 21.80 77.30 8.70
77 1.75 18-May 2018 5 982.60 26.70 79.40 8.90
78 2.03 18-Jun 2018 6 978.30 26.90 81.60 9.00
79 2.79 18-Jul 2018 7 976.80 27.90 82.10 9.20
80 2.32 18-Aug 2018 8 976.40 27.50 83.40 9.60
81 1.88 18-Sep 2018 9 983.50 26.10 80.10 8.90
82 2.76 18-Oct 2018 10 990.50 21.10 78.70 7.10
83 2.14 18-Nov 2018 11 991.50 18.20 80.30 7.10
84 1.78 18-Dec 2018 12 994.50 13.00 84.00 7.80
85 2.77 19-Jan 2019 1 995.20 11.70 84.50 7.30
86 4.60 19-Feb 2019 2 990.50 13.70 84.80 8.10
87 2.32 19-Mar 2019 3 987.70 17.30 85.90 9.90
88 2.07 19-Apr 2019 4 983.60 23.10 84.80 9.80
89 2.97 19-May 2019 5 981.80 24.30 83.20 7.70
90 2.48 19-Jun 2019 6 977.80 27.50 84.80 9.00
91 2.32 19-Jul 2019 7 977.20 27.80 85.00 8.90
92 2.06 19-Aug 2019 8 977.20 28.30 81.20 10.30
93 2.10 19-Sep 2019 9 984.60 26.40 72.70 8.20
94 2.89 19-Oct 2019 10 989.10 22.70 78.00 7.00
My guess is that when you specify knots = c(10), 10 lies within the range of temperature but not within the range of humidity: the minimum of avg.hum is above 10, so the knot falls outside the data, which is likely what leads to the coef/vcov error.
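If that is the problem, one way to try it (a hedged, untested sketch reusing the object names from the question) is to take the knot from the humidity data itself, e.g. its median, so it is guaranteed to lie inside the observed range:
library(dlnm)
library(splines)
cb.hum <- crossbasis(cpdlnm$avg.hum, lag = 1,
                     argvar = list(fun = "ns",
                                   knots = quantile(cpdlnm$avg.hum, 0.5, na.rm = TRUE)),
                     arglag = list(fun = "lin"))
modelhum <- glm(pre1 ~ cb.hum + ns(no, 1), family = quasipoisson(), data = cpdlnm)
pred1.hum <- crosspred(cb.hum, modelhum,
                       cen = round(median(cpdlnm$avg.hum)), bylag = 0.1)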
Year.Sales.Advertise.Employees
1 1985 1.05 162 32
2 1986 1.26 285 47
3 1987 1.47 540 23
4 1988 2.16 261 68
5 1989 1.95 360 32
6 1990 2.4 690 17
7 1991 2.37 495 58
8 1992 3.15 948 75
9 1993 3.57 720 98
10 1994 4.41 1.14 43
11 1995 4.5 1.395 76
12 1996 5.61 1.56 89
13 1997 5.19 1.38 108
14 1998 5.67 1.26 76
15 1999 5.16 1.71 65
16 2000 6.84 1.86 93
I want to find the Spearman correlation between Sales and Advertise and I've been stuck for 3 hours, please help. I think I have to split the single variable into separate variables but I'm struggling.
We can use strsplit to split our data, i.e.
new_df <- setNames(data.frame(do.call(rbind, strsplit(df2$Year.Sales.Advertise.Employees, ' '))),
strsplit(names(df2), '.', fixed = TRUE)[[1]])
which gives,
Year Sales Advertise Employees
1 1985 1.05 162 32
2 1986 1.26 285 47
3 1987 1.47 540 23
4 1988 2.16 261 68
5 1989 1.95 360 32
6 1990 2.4 690 17
7 1991 2.37 495 58
8 1992 3.15 948 75
9 1993 3.57 720 98
10 1994 4.41 1.14 43
11 1995 4.5 1.395 76
12 1996 5.61 1.56 89
13 1997 5.19 1.38 108
14 1998 5.67 1.26 76
15 1999 5.16 1.71 65
16 2000 6.84 1.86 93
You can then use cor (e.g. cor(new_df$Advertise, new_df$Employees)) to find correlations between any columns you want.
NOTE1: Make sure that your initial column is a character vector (not a factor).
NOTE2: By default, the cor function calculates the Pearson correlation. For Spearman, add the method argument, i.e. cor(..., method = "spearman"), as mentioned by @Base_R_Best_R.
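One extra step is likely needed before cor will run (my assumption: the columns produced by strsplit/rbind come back as character), so convert them to numeric first:
new_df[] <- lapply(new_df, function(x) as.numeric(as.character(x)))   # works for character or factor columns
cor(new_df$Sales, new_df$Advertise, method = "spearman")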
DATA
dput(df2)
structure(list(Year.Sales.Advertise.Employees = c("1985 1.05 162 32",
"1986 1.26 285 47", "1987 1.47 540 23", "1988 2.16 261 68", "1989 1.95 360 32",
"1990 2.4 690 17", "1991 2.37 495 58", "1992 3.15 948 75", "1993 3.57 720 98",
"1994 4.41 1.14 43", "1995 4.5 1.395 76", "1996 5.61 1.56 89",
"1997 5.19 1.38 108", "1998 5.67 1.26 76", "1999 5.16 1.71 65",
"2000 6.84 1.86 93")), class = "data.frame", row.names = c(NA,
-16L))
If you're asking for the data to be split into 4 discrete columns, this should do it.
Your data in the question needed some cleaning, and it probably needs more (manual) cleaning: Advertise falls from 720 to 1.14 between 1993 and 1994, which likely reflects a switch from hundreds of thousands to millions.
x <- c("1985 1.05 162 32",
"1986 1.26 285 47",
"1987 1.47 540 23",
"1988 2.16 261 68",
"1989 1.95 360 32",
"1990 2.4 690 17",
"1991 2.37 495 58",
"1992 3.15 948 75",
"1993 3.57 720 98",
"1994 4.41 1.14 43",
"1995 4.5 1.395 76",
"1996 5.61 1.56 89",
"1997 5.19 1.38 108",
"1998 5.67 1.26 76",
"1999 5.16 1.71 65",
"2000 6.84 1.86 93")
library(tidyverse)
clean_df <- x %>%
  as.data.frame() %>%     # inside the pipe the single column ends up named "."
  separate('.',
           into = c('year', 'sales', 'advertise', 'empl'),
           sep = ' ') %>%
  as_tibble() %>%
  mutate_all(as.numeric)
cor(clean_df$sales, clean_df$advertise, method = 'spearman')
I am not sure if this is what you are looking for, but here is a base R approach:
# split strings into separate columns
df <- `names<-`(data.frame(t(apply(df, 1, function(x) as.numeric(unlist(strsplit(x,split = " ")))))),
unlist(strsplit(names(df),split = "\\.")))
# calculate the correlation coefficient (Pearson by default; use method = "spearman" for Spearman)
r <- cor(df$Sales,df$Advertise)
such that
> r
[1] -0.5624524
DATA
df <- structure(list(Year.Sales.Advertise.Employees = c("1985 1.05 162 32",
"1986 1.26 285 47", "1987 1.47 540 23", "1988 2.16 261 68", "1989 1.95 360 32",
"1990 2.4 690 17", "1991 2.37 495 58", "1992 3.15 948 75", "1993 3.57 720 98",
"1994 4.41 1.14 43", "1995 4.5 1.395 76", "1996 5.61 1.56 89",
"1997 5.19 1.38 108", "1998 5.67 1.26 76", "1999 5.16 1.71 65",
"2000 6.84 1.86 93")), class = "data.frame", row.names = c(NA,
-16L))
> df
Year.Sales.Advertise.Employees
1 1985 1.05 162 32
2 1986 1.26 285 47
3 1987 1.47 540 23
4 1988 2.16 261 68
5 1989 1.95 360 32
6 1990 2.4 690 17
7 1991 2.37 495 58
8 1992 3.15 948 75
9 1993 3.57 720 98
10 1994 4.41 1.14 43
11 1995 4.5 1.395 76
12 1996 5.61 1.56 89
13 1997 5.19 1.38 108
14 1998 5.67 1.26 76
15 1999 5.16 1.71 65
16 2000 6.84 1.86 93
Below are monthly prices of a particular stock;
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2008 46.09 50.01 48 48 50.15 43.45 41.05 41.67 36.66 25.02 22.98 22
2009 20.98 15 13.04 14.4 26.46 14.32 14.6 11.83 14 14.4 13.07 13.6
2010 15.31 15.71 18.97 15.43 13.5 13.8 14.21 12.73 12.35 13.17 14.59 15.01
2011 15.3 15.22 15.23 15 15.1 14.66 14.8 12.02 12.41 12.9 11.6 12.18
2012 12.45 13.33 12.4 14.16 13.99 13.75 14.4 15.38 16.3 18.02 17.29 19.49
2013 20.5 20.75 21.3 20.15 22.2 19.8 19.75 19.71 19.99 21.54 21.3 27.4
2014 23.3 20.5 20 22.7 25.4 25.05 25.08 24.6 24.5 21.2 20.52 18.41
2015 16.01 17.6 20.98 21.15 21.44 0 0 0 0 0 0 0
I want to decompose the data into seasonal and trend components but I am not getting a result.
How can I load the data as a "ts" class object so I can decompose it?
Here is a solution using tidyr, which is fairly accessible.
library(dplyr); library(tidyr)
data %>% gather(month, price, -Year) %>% # 1 row per year-month pair, name the value "price"
mutate(synth_date_txt= paste(month,"1,",Year), # combine month and year into a date string
date=as.Date(synth_date_txt,format="%b %d, %Y")) %>% # convert date string to date
select(date, price) # keep just the date and price
# date price
# 1 2008-01-01 46.09
# 2 2009-01-01 20.98
# 3 2010-01-01 15.31
# 4 2011-01-01 15.30
# 5 2012-01-01 12.45
This gives you an answer with date format (even though you didn't specify a date, just a month and year). It should work for your time series analysis, but if you really need a timestamp you can just use as.POSIXct(date)
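From there, a possible way to build the "ts" object and decompose it (a sketch, assuming the reshaped data above and treating the trailing zeros in 2015 as missing months rather than real prices):
library(dplyr); library(tidyr)
monthly <- data %>%
  gather(month, price, -Year) %>%
  mutate(date = as.Date(paste(month, "1,", Year), format = "%b %d, %Y")) %>%
  arrange(date) %>%                      # put rows in calendar order
  filter(price > 0)                      # drop the zero placeholders at the end of 2015
price_ts <- ts(monthly$price, start = c(2008, 1), frequency = 12)
plot(decompose(price_ts))                # or plot(stl(price_ts, s.window = "periodic"))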
Mike, the program is R, and below is the code I have tried.
sev=read.csv("X7UPM.csv")
se=ts(sev,start=c(2008, 1), end=c(2015,1), frequency=12)
se
se=se[,1]
S=decompose(se)
plot(se,col=c("blue"))
plot(decompose(se))
S.decom=decompose(se,type="mult")
plot(S.decom)
trend=S.decom$trend
trend
seasonal=S.decom$seasonal
seasonal
ts.plot(cbind(trend,trend*seasonal),lty=1:2)
plot(stl(se,"periodic"))
I have a question about how to calculate year-to-year correlations for certain statistics. For example (using baseball), I want to test whether, for specific players, Average or On-Base Percentage is more consistent over time, i.e. which of the two fluctuates more.
My data is currently in the following format:
name Season ARuns Lag_ARuns BRuns Lag_BRuns
321 Abad Andy 2003 -1.05 NA -1.19 NA
3158 Abercrombie Reggie 2006 27.42 NA -.42 NA
1312 Abercrombie Reggie 2007 7.65 27.42 .15 -.42
1069 Abercrombie Reggie 2008 5.34 7.65 -1.81 .15
4614 Abernathy Brent 2002 46.71 NA -.86 NA
707 Abernathy Brent 2003 -2.29 46.71 -.33 -.86
1297 Abernathy Brent 2005 5.59 -2.29 3.53 -.33
6024 Abreu Bobby 2002 102.89 NA 2.70 NA
6087 Abreu Bobby 2003 113.23 102.89 4.39 2.70
6177 Abreu Bobby 2004 128.60 113.23 2.29 4.39
Any ideas would be appreciated!
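A rough, hypothetical sketch (assuming the data frame is named d and the lag columns are named Lag_ARuns and Lag_BRuns as in the header above): correlate each statistic with its own one-year lag and compare; the statistic with the higher year-to-year correlation is the more consistent one.
cor(d$ARuns, d$Lag_ARuns, use = "complete.obs", method = "spearman")
cor(d$BRuns, d$Lag_BRuns, use = "complete.obs", method = "spearman")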