Spearman correlation and splitting 1 variable - r

Year.Sales.Advertise.Employees
1 1985 1.05 162 32
2 1986 1.26 285 47
3 1987 1.47 540 23
4 1988 2.16 261 68
5 1989 1.95 360 32
6 1990 2.4 690 17
7 1991 2.37 495 58
8 1992 3.15 948 75
9 1993 3.57 720 98
10 1994 4.41 1.14 43
11 1995 4.5 1.395 76
12 1996 5.61 1.56 89
13 1997 5.19 1.38 108
14 1998 5.67 1.26 76
15 1999 5.16 1.71 65
16 2000 6.84 1.86 93
I want to find the Spearman correlation between Sales and Advertise, and I've been stuck for 3 hours, please help. I think I have to split the one variable into separate variables (Year, Sales, Advertise, Employees), but I'm struggling.

We can use strsplit to split our data, i.e.
new_df <- setNames(data.frame(do.call(rbind, strsplit(df2$Year.Sales.Advertise.Employees, ' '))),
strsplit(names(df2), '.', fixed = TRUE)[[1]])
which gives,
Year Sales Advertise Employees
1 1985 1.05 162 32
2 1986 1.26 285 47
3 1987 1.47 540 23
4 1988 2.16 261 68
5 1989 1.95 360 32
6 1990 2.4 690 17
7 1991 2.37 495 58
8 1992 3.15 948 75
9 1993 3.57 720 98
10 1994 4.41 1.14 43
11 1995 4.5 1.395 76
12 1996 5.61 1.56 89
13 1997 5.19 1.38 108
14 1998 5.67 1.26 76
15 1999 5.16 1.71 65
16 2000 6.84 1.86 93
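The split columns are still text, so convert them to numbers first, e.g. new_df[] <- lapply(new_df, function(x) as.numeric(as.character(x))). You can then use cor (e.g. cor(new_df$Advertise, new_df$Employees)) to find the correlation between any columns you want.
NOTE1: Make sure that your initial column is a character (not a factor); strsplit fails with a "non-character argument" error on factors.
NOTE2: By default, cor calculates the Pearson correlation. For Spearman, add the argument method = "spearman", i.e. cor(..., method = "spearman"), as mentioned by @Base_R_Best_R.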
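Below is a minimal base R sketch of the whole workflow. It converts the pieces to numbers as part of the split, so cor() receives numeric columns directly. It rebuilds df2 from the question's strings. The intermediate names parts, num and rho_by_hand are only illustrative. The last line checks the usual definition of Spearman's rho as Pearson's correlation of the ranks.

```r
# The single character column from the question (one string per row)
df2 <- data.frame(
  Year.Sales.Advertise.Employees = c(
    "1985 1.05 162 32", "1986 1.26 285 47", "1987 1.47 540 23",
    "1988 2.16 261 68", "1989 1.95 360 32", "1990 2.4 690 17",
    "1991 2.37 495 58", "1992 3.15 948 75", "1993 3.57 720 98",
    "1994 4.41 1.14 43", "1995 4.5 1.395 76", "1996 5.61 1.56 89",
    "1997 5.19 1.38 108", "1998 5.67 1.26 76", "1999 5.16 1.71 65",
    "2000 6.84 1.86 93"),
  stringsAsFactors = FALSE
)

# Split each string at single spaces and convert the pieces to numbers
parts <- strsplit(df2$Year.Sales.Advertise.Employees, " ", fixed = TRUE)
num <- do.call(rbind, lapply(parts, as.numeric))

# Reuse the dotted column name as the new column names
colnames(num) <- strsplit(names(df2), ".", fixed = TRUE)[[1]]
new_df <- as.data.frame(num)

# Spearman's rank correlation between Sales and Advertise
rho <- cor(new_df$Sales, new_df$Advertise, method = "spearman")

# Same value computed as Pearson's correlation of the ranks
rho_by_hand <- cor(rank(new_df$Sales), rank(new_df$Advertise))
```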
DATA
dput(df2)
structure(list(Year.Sales.Advertise.Employees = c("1985 1.05 162 32",
"1986 1.26 285 47", "1987 1.47 540 23", "1988 2.16 261 68", "1989 1.95 360 32",
"1990 2.4 690 17", "1991 2.37 495 58", "1992 3.15 948 75", "1993 3.57 720 98",
"1994 4.41 1.14 43", "1995 4.5 1.395 76", "1996 5.61 1.56 89",
"1997 5.19 1.38 108", "1998 5.67 1.26 76", "1999 5.16 1.71 65",
"2000 6.84 1.86 93")), class = "data.frame", row.names = c(NA,
-16L))

If you're asking for the data to be split into 4 discrete columns, this should do it.
Your data in the question needed some cleaning. It probably needs more (manual) cleaning, as advertise falls from 720 to 1.14 between 1993 and 1994. That's likely from hundreds of thousands to millions.
x <- c("1985 1.05 162 32",
"1986 1.26 285 47",
"1987 1.47 540 23",
"1988 2.16 261 68",
"1989 1.95 360 32",
"1990 2.4 690 17",
"1991 2.37 495 58",
"1992 3.15 948 75",
"1993 3.57 720 98",
"1994 4.41 1.14 43",
"1995 4.5 1.395 76",
"1996 5.61 1.56 89",
"1997 5.19 1.38 108",
"1998 5.67 1.26 76",
"1999 5.16 1.71 65",
"2000 6.84 1.86 93")
library(tidyverse)
clean_df <- x %>%
as.data.frame() %>%
separate('.',
into = c('year','sales', 'advertise', 'empl'),
sep = ' ') %>%
as_tibble() %>%
mutate_all(as.numeric)
cor(clean_df$sales, clean_df$advertise, method = 'spearman')
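The cleaning note in this answer (Advertise dropping from 720 to 1.14 between 1993 and 1994) changes the result considerably. The sketch below assumes, without confirmation from the data source, that Advertise values below 10 are in millions and the rest are in thousands. The threshold of 10 and the factor of 1000 are illustrative assumptions.

```r
sales <- c(1.05, 1.26, 1.47, 2.16, 1.95, 2.4, 2.37, 3.15,
           3.57, 4.41, 4.5, 5.61, 5.19, 5.67, 5.16, 6.84)
advertise <- c(162, 285, 540, 261, 360, 690, 495, 948,
               720, 1.14, 1.395, 1.56, 1.38, 1.26, 1.71, 1.86)

# Assumption: values below 10 are in millions, the rest in thousands,
# so multiply the small ones by 1000 to put everything in thousands
advertise_k <- ifelse(advertise < 10, advertise * 1000, advertise)

rho_raw   <- cor(sales, advertise,   method = "spearman")
rho_fixed <- cor(sales, advertise_k, method = "spearman")
```

With the raw values, the large later budgets (recorded as 1.14 to 1.86) get the lowest ranks, so rho comes out negative. After rescaling, it is strongly positive.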

Not sure if you are looking for something like the code below or something else:
# split strings into separate columns
df <- `names<-`(data.frame(t(apply(df, 1, function(x) as.numeric(unlist(strsplit(x,split = " ")))))),
unlist(strsplit(names(df),split = "\\.")))
# calculate the (Pearson) correlation coefficient; add method = "spearman" for Spearman's rho
r <- cor(df$Sales,df$Advertise)
such that
> r
[1] -0.5624524
DATA
df <- structure(list(Year.Sales.Advertise.Employees = c("1985 1.05 162 32",
"1986 1.26 285 47", "1987 1.47 540 23", "1988 2.16 261 68", "1989 1.95 360 32",
"1990 2.4 690 17", "1991 2.37 495 58", "1992 3.15 948 75", "1993 3.57 720 98",
"1994 4.41 1.14 43", "1995 4.5 1.395 76", "1996 5.61 1.56 89",
"1997 5.19 1.38 108", "1998 5.67 1.26 76", "1999 5.16 1.71 65",
"2000 6.84 1.86 93")), class = "data.frame", row.names = c(NA,
-16L))
> df
Year.Sales.Advertise.Employees
1 1985 1.05 162 32
2 1986 1.26 285 47
3 1987 1.47 540 23
4 1988 2.16 261 68
5 1989 1.95 360 32
6 1990 2.4 690 17
7 1991 2.37 495 58
8 1992 3.15 948 75
9 1993 3.57 720 98
10 1994 4.41 1.14 43
11 1995 4.5 1.395 76
12 1996 5.61 1.56 89
13 1997 5.19 1.38 108
14 1998 5.67 1.26 76
15 1999 5.16 1.71 65
16 2000 6.84 1.86 93

Related

Why does humidity in DLNM (R) give "coef/vcov not consistent with basis matrix" while temperature works?

Hi everyone. I am using DLNM in R to analyze the lag effect of climatic conditions on the prevalence of a disease.
I followed somebody else's code exactly, and it worked for avg.temp and max.speed, but gave the error "coef/vcov not consistent with basis matrix" for avg.ap and avg.hum. I only changed the variable names in the code and never changed anything else.
I have a hypothesis that maybe DLNM doesn't like wet weather. T T
I don't know what to do. Can you help me?
Part 1 is the code that runs successfully, part 2 is the code that gives the error, and part 3 is the data I used.
Thank you very much. I hope you can help me.
Part 1. Code that runs successfully
library(dlnm)
library(splines) # provides ns()
attach(cpdlnm)
cb.temp = crossbasis(avg.temp, lag=1 ,
argvar=list(fun="ns",
knots= c(10)),
arglag=list(fun="lin"))
modeltemp = glm(pre1 ~ cb.temp +
ns(no,1*1),
family=quasipoisson(), cpdlnm)
pred1.temp = crosspred(cb.temp,
modeltemp,
cen=round(median(avg.temp)),
bylag=1)
Part 2. Code that gives the error
attach(cpdlnm)
cb.hum = crossbasis(avg.hum, lag=1 ,
argvar=list(fun="ns",
knots= c(10)),
arglag=list(fun="lin"))
modelhum = glm(pre1 ~ cb.hum +
ns(no,1*1),
family=quasipoisson(), cpdlnm)
pred1.hum = crosspred(cb.hum, # This step shows "coef/vcov not consistent with basis matrix"
modelhum,
cen=round(median(avg.hum)),
bylag=0.1)
Part 3. The data are as follows:
no pre1 date year month avg.ap avg.temp avg.hum max.speed
1 3.23 12-Jan 2012 1 996.60 9.00 81.60 5.30
2 6.04 12-Feb 2012 2 993.20 10.90 80.80 6.20
3 5.18 12-Mar 2012 3 991.00 16.40 78.70 7.60
4 4.07 12-Apr 2012 4 985.40 23.50 73.50 7.40
5 4.88 12-May 2012 5 982.60 26.30 77.20 7.00
6 5.11 12-Jun 2012 6 978.10 27.00 81.30 6.20
7 6.18 12-Jul 2012 7 979.50 28.10 77.70 6.40
8 6.17 12-Aug 2012 8 980.40 28.00 75.60 7.90
9 5.18 12-Sep 2012 9 987.60 25.30 73.60 6.30
10 5.16 12-Oct 2012 10 990.70 23.60 72.20 6.20
11 4.61 12-Nov 2012 11 991.70 18.00 79.70 6.90
12 5.26 12-Dec 2012 12 995.00 13.20 74.90 6.50
13 3.79 13-Jan 2013 1 997.10 11.20 78.40 5.70
14 3.87 13-Feb 2013 2 993.50 15.30 82.20 6.50
15 3.37 13-Mar 2013 3 989.90 20.20 74.20 8.00
16 2.85 13-Apr 2013 4 987.00 21.50 78.50 7.70
17 4.38 13-May 2013 5 983.30 25.60 79.20 6.80
18 5.67 13-Jun 2013 6 980.60 27.40 76.90 6.60
19 6.45 13-Jul 2013 7 981.30 28.00 77.50 7.10
20 6.95 13-Aug 2013 8 980.50 27.90 78.20 7.90
21 6.51 13-Sep 2013 9 985.90 25.40 77.60 6.00
22 8.16 13-Oct 2013 10 992.20 22.10 68.80 5.30
23 5.34 13-Nov 2013 11 994.50 18.70 72.30 6.20
24 6.18 13-Dec 2013 12 997.30 11.70 67.20 5.30
25 5.69 14-Jan 2014 1 996.70 12.70 70.30 6.00
26 6.44 14-Feb 2014 2 993.00 12.10 76.90 6.40
27 4.16 14-Mar 2014 3 991.60 16.50 83.90 7.30
28 4.13 14-Apr 2014 4 987.60 22.60 82.40 6.70
29 3.96 14-May 2014 5 983.60 25.70 78.80 7.70
30 4.72 14-Jun 2014 6 979.20 27.70 81.40 7.90
31 5.21 14-Jul 2014 7 980.70 28.30 80.20 9.40
32 5.29 14-Aug 2014 8 982.40 27.50 81.30 7.50
33 6.74 14-Sep 2014 9 984.70 27.10 77.70 8.50
34 4.80 14-Oct 2014 10 991.20 23.90 73.10 5.90
35 4.31 14-Nov 2014 11 993.30 18.60 79.60 6.20
36 4.35 14-Dec 2014 12 998.70 12.30 67.30 5.90
37 2.95 15-Jan 2015 1 996.70 13.30 76.30 6.20
38 4.63 15-Feb 2015 2 993.50 15.50 78.30 6.50
39 4.00 15-Mar 2015 3 991.70 17.70 83.40 6.30
40 4.16 15-Apr 2015 4 988.40 22.80 70.20 7.30
41 4.67 15-May 2015 5 982.40 26.70 80.50 8.00
42 5.62 15-Jun 2015 6 980.90 28.20 81.00 7.40
43 5.04 15-Jul 2015 7 980.20 27.30 79.40 6.70
44 5.79 15-Aug 2015 8 982.40 27.60 80.10 6.50
45 5.28 15-Sep 2015 9 986.30 26.00 84.60 6.50
46 4.39 15-Oct 2015 10 991.20 23.00 78.30 6.90
47 4.13 15-Nov 2015 11 993.50 19.40 85.30 6.90
48 3.30 15-Dec 2015 12 997.80 13.00 80.90 5.70
49 5.30 16-Jan 2016 1 996.00 11.80 82.30 6.40
50 4.57 16-Feb 2016 2 997.80 12.20 68.90 7.00
51 4.66 16-Mar 2016 3 991.70 17.00 78.90 7.00
52 4.01 16-Apr 2016 4 984.60 23.40 80.90 9.80
53 4.90 16-May 2016 5 983.80 25.50 78.70 8.30
54 3.75 16-Jun 2016 6 981.70 28.20 78.80 7.70
55 3.13 16-Jul 2016 7 981.10 28.90 77.60 7.60
56 3.25 16-Aug 2016 8 979.00 28.00 79.80 8.70
57 2.93 16-Sep 2016 9 984.30 26.60 75.20 6.40
58 2.93 16-Oct 2016 10 987.90 24.40 72.90 7.00
59 3.08 16-Nov 2016 11 993.40 18.10 79.60 6.70
60 2.99 16-Dec 2016 12 995.70 15.40 71.70 6.80
61 3.10 17-Jan 2017 1 994.70 14.50 79.20 6.50
62 3.75 17-Feb 2017 2 994.80 14.70 71.50 8.30
63 3.49 17-Mar 2017 3 990.20 16.50 83.60 8.50
64 3.36 17-Apr 2017 4 986.80 21.90 76.70 7.80
65 3.69 17-May 2017 5 985.00 24.80 77.50 10.00
66 3.76 17-Jun 2017 6 980.20 26.90 84.80 8.50
67 2.69 17-Jul 2017 7 981.00 27.50 83.60 9.80
68 3.05 17-Aug 2017 8 980.50 27.70 83.40 9.00
69 3.05 17-Sep 2017 9 984.20 27.60 81.50 7.10
70 2.46 17-Oct 2017 10 990.00 22.80 75.90 7.90
71 2.08 17-Nov 2017 11 993.00 17.80 79.50 7.00
72 2.32 17-Dec 2017 12 996.90 13.30 69.30 6.90
73 2.53 18-Jan 2018 1 992.10 12.00 78.40 8.10
74 3.29 18-Feb 2018 2 992.90 13.40 68.70 7.20
75 3.03 18-Mar 2018 3 988.30 19.20 78.20 9.10
76 2.30 18-Apr 2018 4 986.50 21.80 77.30 8.70
77 1.75 18-May 2018 5 982.60 26.70 79.40 8.90
78 2.03 18-Jun 2018 6 978.30 26.90 81.60 9.00
79 2.79 18-Jul 2018 7 976.80 27.90 82.10 9.20
80 2.32 18-Aug 2018 8 976.40 27.50 83.40 9.60
81 1.88 18-Sep 2018 9 983.50 26.10 80.10 8.90
82 2.76 18-Oct 2018 10 990.50 21.10 78.70 7.10
83 2.14 18-Nov 2018 11 991.50 18.20 80.30 7.10
84 1.78 18-Dec 2018 12 994.50 13.00 84.00 7.80
85 2.77 19-Jan 2019 1 995.20 11.70 84.50 7.30
86 4.60 19-Feb 2019 2 990.50 13.70 84.80 8.10
87 2.32 19-Mar 2019 3 987.70 17.30 85.90 9.90
88 2.07 19-Apr 2019 4 983.60 23.10 84.80 9.80
89 2.97 19-May 2019 5 981.80 24.30 83.20 7.70
90 2.48 19-Jun 2019 6 977.80 27.50 84.80 9.00
91 2.32 19-Jul 2019 7 977.20 27.80 85.00 8.90
92 2.06 19-Aug 2019 8 977.20 28.30 81.20 10.30
93 2.10 19-Sep 2019 9 984.60 26.40 72.70 8.20
94 2.89 19-Oct 2019 10 989.10 22.70 78.00 7.00
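My guess is that the problem is knots = c(10). The value 10 lies within the range of avg.temp (9.0 to 28.9) but not within the range of avg.hum (67.2 to 85.9). If the minimum of a variable is greater than 10, the knot falls outside the data, and the spline basis for that variable can't be defined properly. This would also explain the other two variables: 10 is inside the range of max.speed (5.3 to 10.3) but outside that of avg.ap (976.4 to 998.7).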
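You can test this guess by comparing the knot with the range of each predictor before building the cross-basis. The base R sketch below uses the first seven months of the data. knot_inside_range() is a hypothetical helper. Placing the knot at the median is just one data-driven choice, not something dlnm requires.

```r
library(splines)

# First seven months of the question's data
avg.temp <- c(9.0, 10.9, 16.4, 23.5, 26.3, 27.0, 28.1)
avg.hum  <- c(81.6, 80.8, 78.7, 73.5, 77.2, 81.3, 77.7)

# Hypothetical helper: TRUE if every knot lies strictly inside range(x)
knot_inside_range <- function(x, knots) {
  rng <- range(x, na.rm = TRUE)
  all(knots > rng[1] & knots < rng[2])
}

knot_inside_range(avg.temp, 10)  # TRUE: 10 is inside 9.0 to 28.1
knot_inside_range(avg.hum, 10)   # FALSE: 10 is below every humidity value

# One alternative: put the knot at the median of the variable itself
hum_knot <- median(avg.hum)
basis_hum <- ns(avg.hum, knots = hum_knot)  # one interior knot -> 2 columns
```

In the model itself, this would correspond to argvar = list(fun = "ns", knots = median(avg.hum)) in the crossbasis() call. Whether that knot placement suits the analysis is a separate modelling question.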

Count highest value in data frame in R

I have the data frame below (DF). For each station, I need to count how many years it recorded the maximum avg. temp, the minimum avg. temp and the maximum total precipitation.
In each row of DF below, each year is followed by max. avg. temp, min. avg. temp and total avg. precipitation. For example, if the highest max. avg. temperature in 1985 was recorded at station1, that counts as one for station1, and so on.
Any suggestion or help is appreciated.
Thanks.
DF:
St_name Met_data
station1 1985 15.33 4.33 780.1, 1986 12.7 2.18 505.3, 1987 17.76 6.33 793.6, 1988 17.35 4.53 541, 1989 15.65 3.98 793.7, 1990 16.9 5.96 1169.4, 1991 16.42 5.26 790.6, 1992 14.99 5.04 932.6, 1993 13.96 4.75 1420.7, 1994 14.96 3.79 668.8, 1995 15 3.67 952.9, 1996 13.77 2.4 808.5, 1997 14.69 3.26 773.5, 1998 17.22 6.25 1126.4, 1999 16.35 4.32 921.9, 2000 14.55 2.83 893.9, 2001 15.71 4.33 1118.8, 2002 15.61 3.96 1000.4, 2003 14.83 2.84 911.7, 2004 14.9 4 965.1, 2005 16.16 4.7 647.7, 2006 16.18 5.14 800.8, 2007 15.52 4.15 890.3, 2008 14.35 2.91 1271.9, 2009 14.4 3.77 1343.8, 2010 15.32 4.57 1145.4, 2011 15.41 4.54 857.3, 2012 17.39 5.4 745, 2013 15.26 3.51 811.4, 2014 13.8 2.37 986.3
station2 1985 19.27 7.81 1465.5, 1986 20.37 8.81 1201.3, 1987 20.95 8.72 949.2, 1988 20.03 7.53 1104.6, 1989 19.11 7.42 1050.1, 1990 20.53 8.76 1486.2, 1991 20.21 9.53 1164.4, 1992 19.55 8.51 913.6, 1993 18.7 8.24 1485.1, 1994 19.43 8.42 1171.7, 1995 19.62 7.41 1084.9, 1996 19.01 6.29 1212.4, 1997 18.85 6.76 1243.2, 1998 21 8.27 1261.1, 1999 21.28 7.99 1122.4, 2000 19.99 7.74 1242.7, 2001 20.13 7.59 1305.8, 2002 20.13 7.69 1563, 2003 19.48 6.52 1237.1, 2004 19.94 7.42 1174.8, 2005 20.53 8.05 1140.5, 2006 20.16 7.18 1542, 2007 21.44 7.91 1167.8, 2008 17.6 5.51 1653.8, 2009 20.63 9.06 1326, 2010 21.31 8.7 1024.8, 2011 21.21 9.96 1847.6, 2012 22.22 9.39 782.5, 2013 20.46 9.29 770.7
station3 1985 14.43 2.97 951.6, 1986 15.41 3.37 415.6, 1987 15.08 4.34 1110, 1988 16.19 3.33 787.6, 1989 14.77 2.19 796.8, 1990 16.28 4.59 1213.6, 1991 16.72 4.67 907.4, 1992 14.74 4.18 935.6, 1993 15.22 5.06 903.1, 1994 15.46 2.79 907.5, 1995 15.34 4.21 1001.1, 1996 14.46 2.49 1204.5, 1997 14.95 2.95 819, 1998 17.5 5.3 1078.6, 1999 16.73 3.24 901.9, 2000 15.81 2.7 931.4, 2001 16.68 4.09 968.7, 2002 16.48 6.41 762.2, 2003 15.47 4.99 999.6, 2004 15.32 5.31 875.7, 2005 16.16 5.91 593.2, 2006 16.06 6.3 997.2, 2007 15.87 5.71 946, 2008 14.46 4.1 1128.1, 2009 14.26 4.38 1146.1, 2010 15.92 4.79 1037.6, 2011 15.25 5.47 1045.8, 2012 17.47 6.43 659.2, 2013 14.25 4 1092.9, 2014 13.26 2.98 1039.4
.
.
Output:
St_name max_T_count min_T_count precip_count
station1 1 0 0
station2 0 2 0
station3 1 1 1
.
.
You should at least make an effort to organize your data in a spreadsheet before posting. The first four lines in the code below are just for tidying your data. I am also not sure what you want for precip_count, but at least you can work that out based on this solution.
library(tidyverse)
df %>% separate_rows(Met_data, sep = ",") %>%
mutate(Met_data = trimws(Met_data)) %>%
separate(Met_data, sep = " ", into = c("year", "max_avg", "min_avg", "total_avg"), convert = TRUE) %>%
group_by(year) %>%
mutate(max_T_count = as.integer(max_avg == max(max_avg)),
min_T_count = as.integer(min_avg == min(min_avg)),
precip_count = as.integer(total_avg == max(total_avg))) %>%
ungroup() %>%
group_by(St_name) %>%
summarise_at(vars(ends_with("count")), sum)
%>% is the magrittr package pipe operator.
separate_rows splits the entries of the Met_data column at the commas into new rows.
trimws removes leading and trailing whitespace. This is necessary so that separate splits the strings exactly at the single blanks between values.
separate splits Met_data at the blanks and gives the resulting variables the listed column names. convert = TRUE turns them into numbers, so that max() and min() compare values numerically rather than alphabetically (as text, "951.6" would beat "1465.5").
group_by specifies the grouping used for the aggregation.
mutate creates new columns.
summarise_at computes summaries of the specified columns with the specified functions.
These are a handful. I advise you to read the documentation for each of them by typing ?function, replacing function with each of the names above, or by using help, e.g. help("%>%", package = "magrittr").
Here is the output.
# A tibble: 3 x 4
# St_name max_T_count min_T_count precip_count
# <fct> <int> <int> <int>
# 1 station1 1 17 2
# 2 station2 29 0 24
# 3 station3 0 13 4
Here is the data.
df <- structure(list(St_name = structure(1:3, .Label = c("station1",
"station2", "station3"), class = "factor"), Met_data = structure(c(2L,
3L, 1L), .Label = c(" 1985 14.43 2.97 951.6, 1986 15.41 3.37 415.6, 1987 15.08 4.34 1110, 1988 16.19 3.33 787.6, 1989 14.77 2.19 796.8, 1990 16.28 4.59 1213.6, 1991 16.72 4.67 907.4, 1992 14.74 4.18 935.6, 1993 15.22 5.06 903.1, 1994 15.46 2.79 907.5, 1995 15.34 4.21 1001.1, 1996 14.46 2.49 1204.5, 1997 14.95 2.95 819, 1998 17.5 5.3 1078.6, 1999 16.73 3.24 901.9, 2000 15.81 2.7 931.4, 2001 16.68 4.09 968.7, 2002 16.48 6.41 762.2, 2003 15.47 4.99 999.6, 2004 15.32 5.31 875.7, 2005 16.16 5.91 593.2, 2006 16.06 6.3 997.2, 2007 15.87 5.71 946, 2008 14.46 4.1 1128.1, 2009 14.26 4.38 1146.1, 2010 15.92 4.79 1037.6, 2011 15.25 5.47 1045.8, 2012 17.47 6.43 659.2, 2013 14.25 4 1092.9, 2014 13.26 2.98 1039.4",
" 1985 15.33 4.33 780.1, 1986 12.7 2.18 505.3, 1987 17.76 6.33 793.6, 1988 17.35 4.53 541, 1989 15.65 3.98 793.7, 1990 16.9 5.96 1169.4, 1991 16.42 5.26 790.6, 1992 14.99 5.04 932.6, 1993 13.96 4.75 1420.7, 1994 14.96 3.79 668.8, 1995 15 3.67 952.9, 1996 13.77 2.4 808.5, 1997 14.69 3.26 773.5, 1998 17.22 6.25 1126.4, 1999 16.35 4.32 921.9, 2000 14.55 2.83 893.9, 2001 15.71 4.33 1118.8, 2002 15.61 3.96 1000.4, 2003 14.83 2.84 911.7, 2004 14.9 4 965.1, 2005 16.16 4.7 647.7, 2006 16.18 5.14 800.8, 2007 15.52 4.15 890.3, 2008 14.35 2.91 1271.9, 2009 14.4 3.77 1343.8, 2010 15.32 4.57 1145.4, 2011 15.41 4.54 857.3, 2012 17.39 5.4 745, 2013 15.26 3.51 811.4, 2014 13.8 2.37 986.3",
" 1985 19.27 7.81 1465.5, 1986 20.37 8.81 1201.3, 1987 20.95 8.72 949.2, 1988 20.03 7.53 1104.6, 1989 19.11 7.42 1050.1, 1990 20.53 8.76 1486.2, 1991 20.21 9.53 1164.4, 1992 19.55 8.51 913.6, 1993 18.7 8.24 1485.1, 1994 19.43 8.42 1171.7, 1995 19.62 7.41 1084.9, 1996 19.01 6.29 1212.4, 1997 18.85 6.76 1243.2, 1998 21 8.27 1261.1, 1999 21.28 7.99 1122.4, 2000 19.99 7.74 1242.7, 2001 20.13 7.59 1305.8, 2002 20.13 7.69 1563, 2003 19.48 6.52 1237.1, 2004 19.94 7.42 1174.8, 2005 20.53 8.05 1140.5, 2006 20.16 7.18 1542, 2007 21.44 7.91 1167.8, 2008 17.6 5.51 1653.8, 2009 20.63 9.06 1326, 2010 21.31 8.7 1024.8, 2011 21.21 9.96 1847.6, 2012 22.22 9.39 782.5, 2013 20.46 9.29 770.7"
), class = "factor")), .Names = c("St_name", "Met_data"), class = "data.frame", row.names = c(NA,
-3L))
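By default, separate() returns character columns unless convert = TRUE is given, and max() on a character vector compares alphabetically. The base R sketch below shows the effect on the precipitation count using two years of the data. count_yearly_max() is a hypothetical helper, not part of any package.

```r
# Two years of the question's data in long format (one row per station-year).
# total_avg is character, as it is after separate() without convert = TRUE.
long <- data.frame(
  St_name   = rep(c("station1", "station2", "station3"), times = 2),
  year      = rep(c(1985, 1986), each = 3),
  total_avg = c("780.1", "1465.5", "951.6", "505.3", "1201.3", "415.6"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per station, count the years in which it holds
# the yearly maximum of x
count_yearly_max <- function(x, year, station) {
  is_max <- x == ave(x, year, FUN = max)  # TRUE where x equals its year's maximum
  tapply(is_max, station, sum)
}

# As text, "951.6" > "780.1" > "1465.5" (compared character by character)
as_text <- count_yearly_max(long$total_avg, long$year, long$St_name)

# As numbers, station2 has the highest precipitation in both years
as_number <- count_yearly_max(as.numeric(long$total_avg), long$year, long$St_name)
```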

Error in producing the output

I have a problem with my code and can't trace the error. I have coordinate data, coor (a 40 by 2 matrix, shown below), and rainfall data (a 14610 by 40 matrix).
No Longitude Latitude
1 100.69 6.34
2 100.77 6.24
3 100.39 6.11
4 100.43 5.53
5 100.39 5.38
6 101.00 5.71
7 101.06 5.30
8 100.80 4.98
9 101.17 4.48
10 102.26 6.11
11 102.22 5.79
12 102.28 5.31
13 102.02 5.38
14 101.97 4.88
15 102.95 5.53
16 103.13 5.32
17 103.06 4.94
18 103.42 4.76
19 103.42 4.23
20 102.38 4.24
21 101.94 4.23
22 103.04 3.92
23 103.36 3.56
24 102.66 3.03
25 103.19 2.89
26 101.35 3.70
27 101.41 3.37
28 101.75 3.16
29 101.39 2.93
30 102.07 3.09
31 102.51 2.72
32 102.26 2.76
33 101.96 2.74
34 102.19 2.36
35 102.49 2.29
36 103.02 2.38
37 103.74 2.26
38 103.97 1.85
39 103.72 1.76
40 103.75 1.47
rainfall= 14610 by 40 matrix;
coor= 40 by 2 matrix
my_prog=function(rainrain,coordinat,misss,distance)
{
rain3<-rainrain # target station i**
# neighboring stations for target station i
a=coordinat # target station i**
diss=as.matrix(distHaversine(a,coor,r=6371))
mmdis=sort(diss,decreasing=F,index.return=T)
mdis=as.matrix(mmdis$x)
mdis1=as.matrix(mmdis$ix)
dist=cbind(mdis,mdis1)
# NA creation
# create missing values in rainfall data
set.seed(100)
b=sample(1:nrow(rain3),(misss*nrow(rain3)),replace=F)
k=replace(rain3,b,NA)
# pick i closest stations
neig=mdis1[distance] # neighbouring selection distance
# target (with NA) and their neighbors
rainB=rainfal00[,neig]
rainA0=rainB[,2:ncol(rainB)]
rainA<-as.matrix(cbind(k,rainA0))
rain2=na.omit(rainA)
x=as.matrix(rain2[,1]) # used to calculate the correlation
n1=ncol(rainA)-1
#1) normal ratio(nr)
jum=as.matrix(apply(rain2,2,mean))
nr0=(jum[1]/jum)
nr=as.matrix(nr0[2:nrow(nr0),])
m01=as.matrix(rainA[is.na(k),])
m1=m01[,2:ncol(m01)]
out1=as.matrix(sapply(seq_len(nrow(m1)),
function(i) sum(nr*m1[i,],na.rm=T)/n1))
print(out1)
}
impute=my_prog(rainrain=rainfall[,1],coordinat=coor[1,],misss=0.05,distance=mdis<200)
I ran this code and got the following output:
Error in my_prog(rainrain = rainfal00[, 1], misss = 0.05, coordinat = coor[1, :
object 'mdis' not found
I have checked the program but cannot trace the problem. I would really appreciate it if someone could help me.
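One likely cause is that R evaluates the argument expression distance = mdis < 200 in the environment where my_prog() is called, and mdis only exists inside the function. The sketch below shows one possible fix, under that assumption: pass a plain numeric threshold and build the condition inside the function. pick_neighbours and the toy distances are hypothetical stand-ins for mdis and mdis1.

```r
# Toy stand-in for the neighbour selection in my_prog():
# dists plays the role of mdis (sorted distances in km) and
# idx the role of mdis1 (the matching station indices)
pick_neighbours <- function(max_km) {
  dists <- c(0, 90, 150, 250)
  idx   <- c(1, 3, 2, 4)
  idx[dists < max_km]  # condition built where dists exists
}

# Works: only a number is passed in
near <- pick_neighbours(max_km = 200)

# Fails like the question: the argument is evaluated in the caller's
# environment, where dists does not exist
msg <- tryCatch(pick_neighbours(max_km = dists < 200),
                error = function(e) conditionMessage(e))
```

In my_prog(), the corresponding change would be neig = mdis1[mdis < distance], called with distance = 200. The function body also refers to the global objects coor and rainfal00 rather than to its arguments.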

Write a dataframe formatted to a csv sheet

I have a data frame that looks like this:
> (eventStudyList120_After)
Dates Company Returns Market Returns Abnormal Returns
1 25.08.2009 4.81 0.62595516 4.184045
2 26.08.2009 4.85 0.89132960 3.958670
3 27.08.2009 4.81 -0.93323011 5.743230
4 28.08.2009 4.89 1.00388875 3.886111
5 31.08.2009 4.73 2.50655343 2.223447
6 01.09.2009 4.61 0.28025201 4.329748
7 02.09.2009 4.77 0.04999239 4.720008
8 03.09.2009 4.69 -1.52822071 6.218221
9 04.09.2009 4.89 -1.48860354 6.378604
10 07.09.2009 4.85 -0.38646531 5.236465
11 08.09.2009 4.89 -1.54065680 6.430657
12 09.09.2009 5.01 -0.35443455 5.364435
13 10.09.2009 5.01 -0.54107231 5.551072
14 11.09.2009 4.89 0.15189458 4.738105
15 14.09.2009 4.93 -0.36811321 5.298113
16 15.09.2009 4.93 -1.31185921 6.241859
17 16.09.2009 4.93 -0.53398643 5.463986
18 17.09.2009 4.97 0.44765285 4.522347
19 18.09.2009 5.01 0.81109101 4.198909
20 21.09.2009 5.01 -0.76254262 5.772543
21 22.09.2009 4.93 0.11309704 4.816903
22 23.09.2009 4.93 1.64429117 3.285709
23 24.09.2009 4.93 0.37294212 4.557058
24 25.09.2009 4.93 -2.59894035 7.528940
25 28.09.2009 5.21 0.29588776 4.914112
26 29.09.2009 4.93 0.49762314 4.432377
27 30.09.2009 5.41 2.17220569 3.237794
28 01.10.2009 5.21 1.67482716 3.535173
29 02.10.2009 5.25 -0.79014302 6.040143
30 05.10.2009 4.97 -2.69996146 7.669961
31 06.10.2009 4.97 0.18086490 4.789135
32 07.10.2009 5.21 -1.39072582 6.600726
33 08.10.2009 5.05 0.04210020 5.007900
34 09.10.2009 5.37 -1.14940251 6.519403
35 12.10.2009 5.13 1.16479551 3.965204
36 13.10.2009 5.37 -2.24208216 7.612082
37 14.10.2009 5.13 0.41327193 4.716728
38 15.10.2009 5.21 1.54473332 3.665267
39 16.10.2009 5.13 -1.73781565 6.867816
40 19.10.2009 5.01 0.66416288 4.345837
41 20.10.2009 5.09 -0.27007314 5.360073
42 21.10.2009 5.13 1.26968917 3.860311
43 22.10.2009 5.01 0.29432965 4.715670
44 23.10.2009 5.01 1.73758937 3.272411
45 26.10.2009 5.21 0.38854011 4.821460
46 27.10.2009 5.21 2.72671890 2.483281
47 28.10.2009 5.21 -1.76846884 6.978469
48 29.10.2009 5.41 2.95523593 2.454764
49 30.10.2009 5.37 -0.22681024 5.596810
50 02.11.2009 5.33 1.38835160 3.941648
51 03.11.2009 5.33 -1.83751398 7.167514
52 04.11.2009 5.21 -0.68721323 5.897213
53 05.11.2009 5.21 -0.26954741 5.479547
54 06.11.2009 5.21 -2.24083342 7.450833
55 09.11.2009 5.17 0.39168239 4.778318
56 10.11.2009 5.09 -0.99082271 6.080823
57 11.11.2009 5.17 0.07924735 5.090753
58 12.11.2009 5.81 -0.34424802 6.154248
59 13.11.2009 6.21 -2.00230195 8.212302
60 16.11.2009 7.81 0.48655978 7.323440
61 17.11.2009 7.69 -0.21092848 7.900928
62 18.11.2009 7.61 1.55605852 6.053941
63 19.11.2009 7.21 0.71028798 6.499712
64 20.11.2009 7.01 -2.38596631 9.395966
65 23.11.2009 7.25 0.55334705 6.696653
66 24.11.2009 7.21 -0.54239847 7.752398
67 25.11.2009 7.25 3.36386413 3.886136
68 26.11.2009 7.01 -1.28927630 8.299276
69 27.11.2009 7.09 0.98053264 6.109467
70 30.11.2009 7.09 -2.61935612 9.709356
71 01.12.2009 7.01 -0.11946242 7.129462
72 02.12.2009 7.21 0.17152317 7.038477
73 03.12.2009 7.21 -0.79343095 8.003431
74 04.12.2009 7.05 0.43919792 6.610802
75 07.12.2009 7.01 1.62169804 5.388302
76 08.12.2009 7.01 0.74055990 6.269440
77 09.12.2009 7.05 -0.99504492 8.045045
78 10.12.2009 7.21 -0.79728245 8.007282
79 11.12.2009 7.21 -0.73784636 7.947846
80 14.12.2009 6.97 -0.14656077 7.116561
81 15.12.2009 6.89 -1.42712116 8.317121
82 16.12.2009 6.97 0.95988962 6.010110
83 17.12.2009 6.69 0.22718293 6.462817
84 18.12.2009 6.53 -1.46958638 7.999586
85 21.12.2009 6.33 -0.21365446 6.543654
86 22.12.2009 6.65 -0.17256757 6.822568
87 23.12.2009 7.05 -0.59940253 7.649403
88 24.12.2009 7.05 NA NA
89 25.12.2009 7.05 NA NA
90 28.12.2009 7.05 -0.22307263 7.273073
91 29.12.2009 6.81 0.76736750 6.042632
92 30.12.2009 6.81 0.00000000 6.810000
93 31.12.2009 6.81 -1.50965723 8.319657
94 01.01.2010 6.81 NA NA
95 04.01.2010 6.65 0.06111069 6.588889
96 05.01.2010 6.65 -0.13159651 6.781597
97 06.01.2010 6.65 0.09545081 6.554549
98 07.01.2010 6.49 -0.32727619 6.817276
99 08.01.2010 6.81 -0.07225296 6.882253
100 11.01.2010 6.81 1.61131397 5.198686
101 12.01.2010 6.57 -0.40791980 6.977920
102 13.01.2010 6.85 -0.53016383 7.380164
103 14.01.2010 6.93 1.82016604 5.109834
104 15.01.2010 6.97 -0.62552046 7.595520
105 18.01.2010 6.93 -0.80490241 7.734902
106 19.01.2010 6.77 2.02857647 4.741424
107 20.01.2010 6.93 1.68204556 5.247954
108 21.01.2010 6.89 1.02683875 5.863161
109 22.01.2010 6.90 0.96765669 5.932343
110 25.01.2010 6.73 -0.57603687 7.306037
111 26.01.2010 6.81 0.50990350 6.300096
112 27.01.2010 6.81 1.64994011 5.160060
113 28.01.2010 6.61 -1.13511086 7.745111
114 29.01.2010 6.53 -0.82206204 7.352062
115 01.02.2010 7.03 -1.03993428 8.069934
116 02.02.2010 6.93 0.61692305 6.313077
117 03.02.2010 7.73 2.53012795 5.199872
118 04.02.2010 7.97 1.96223075 6.007769
119 05.02.2010 9.33 -0.76549820 10.095498
120 08.02.2010 8.01 -0.34391479 8.353915
When I write it to a CSV file, it looks like this:
write.table(eventStudyList120_After$`Abnormal Returns`, file = "C://Users//AbnormalReturns.csv", sep = ";")
However, I want it to look like this:
So my question is:
How can I write the data frame as it is to a CSV file, and how can I transpose the Abnormal Returns column and add a header as in the example sheet?
Two approaches: transpose the data in R or in Excel
In R
Add an index column, select the columns you want and transpose the data with the function t:
d <- anscombe
d$index <- 1:nrow(anscombe)
td <- t(d[c("index", "x1")])
write.table(td, "filename.csv", col.names = F, sep = ";")
Result:
"index";1;2;3;4;5;6;7;8;9;10;11
"x1";10;8;13;9;11;14;6;4;12;7;5
In Excel
Excel allows you to transpose data as well: http://office.microsoft.com/en-us/excel-help/switch-transpose-columns-and-rows-HP010224502.aspx
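The sketch below applies the R approach to the question's data and also covers the first part of the question (writing the data frame as it is). It uses only the first three rows as a stand-in for eventStudyList120_After. Using the dates as the header row is an assumption about the desired layout. The file names are hypothetical.

```r
# Hypothetical stand-in for eventStudyList120_After: the first three rows
events <- data.frame(
  Dates = c("25.08.2009", "26.08.2009", "27.08.2009"),
  `Abnormal Returns` = c(4.184045, 3.958670, 5.743230),
  check.names = FALSE
)

out_dir <- tempdir()

# 1) The data frame as it is: write.csv2() uses ";" between fields and
#    "," as the decimal mark
write.csv2(events, file.path(out_dir, "event_study.csv"), row.names = FALSE)

# 2) Transposed: the dates become the header row and the abnormal
#    returns the single data row
wide <- t(events[["Abnormal Returns"]])
colnames(wide) <- events$Dates
write.table(wide, file.path(out_dir, "abnormal_returns_wide.csv"),
            sep = ";", row.names = FALSE)
```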

cor() giving a missing value error on small matrix but not large matrix

I'm trying to use cor() to return the most correlated elements in order of their correlation. I wrote this function around cor() to do it, and it works perfectly, but only when I run it on a big input. When I run it on a small input, I get a "missing value where TRUE/FALSE needed" error, and I don't understand why.
Here is an example of my input data:
This can be copied directly into R (printed via write.table):
"Col2" "Col3" "Col4" "Col5" "Col6"
"Market Capitalization" NA NA 17082.69 17879.8 16266.11
"Cash & Equivalents" NA NA 747 132 394
"Preferred & Other" NA NA 0 0 0
"Total Debt" NA NA 12379 11982 11309
"Enterprise Value" NA NA 28714.69 29729.8 27181.11
"Total Revenue" 2896.75 3461.25 2818 3184 2901
"Growth % YoY" -0.15 0.68 1.7 3.44 -0.48
"Gross Profit" NA NA 1874 2080 1981
"Margin %" NA NA 66.5 65.33 68.29
"EBITDA" 758 1074 641 777 699
"Margin %1" 26.17 31.03 22.75 24.4 24.1
"Net Income Before XO" 214.5 410 172 192 207
"Margin %2" 7.4 11.85 6.1 6.03 7.14
"Adjusted EPS" 0.7 1.42 0.59 1.07 0.69
"Growth % YoY1" 0.72 -1.67 -3.28 5.94 -6.76
"Cash from Operations" 375.79 812.21 991 -84 961
"Capital Expenditures" NA NA -660 -676 -608
"Free Cash Flow" NA NA 331 -760 353
"Adjusted Price" 2094.66 3689.2 3805.62 3588.42 3582.4
This is the mycor() function I wrote:
mycor<-function(dataset, relative.to=19, neg.cor=0){
#This takes the dataset (as a matrix) and computes the best correlated value
#and returns the row (variable ID) that is the most strongly correlated
#to the variable row referenced by relative.to. Use neg.cor = 1 for neg correlation
if(neg.cor == 0){
best.cor <- -1.0 #Has to get a better correlation than this
best.cor.row <- integer() #The row with the best correlation
all.cor <- numeric() #The correlation for everything else
index <- 1 #The index for the all.cor array
for(i in 1:nrow(dataset)){
if(i != relative.to){ #No self correlation
temp.cor <- cor(dataset[i,], dataset[relative.to,], use = "na.or.complete")
all.cor[index] <- temp.cor
index <- index+1 #I wish the ++ operator worked in R...
if(temp.cor > best.cor){ #This remembers the best seen cor value
best.cor <- temp.cor
best.cor.row <- i
} #End inner if
} #End outer if
} #End for loop
}else{
best.cor <- 1.0 #Has to get a better correlation than this
best.cor.row <- integer() #The row with the best correlation
all.cor <- numeric() #The correlation for everything else
index <- 1 #The index for the all.cor array
for(i in 1:nrow(dataset)){
if(i != relative.to){ #No self correlation
temp.cor <- cor(dataset[i,], dataset[relative.to,], use = "na.or.complete")
all.cor[index] <- temp.cor
index <- index+1 #I wish the ++ operator worked in R...
if(temp.cor < best.cor){ #This remembers the worst seen cor value
best.cor <- temp.cor
best.cor.row <- i
} #End inner if
} #End outer if
} #End for loop
} #End else
return(list(all.cor = all.cor, best.cor.row = best.cor.row))
}
When I try to run this, I get: Error in if (temp.cor > best.cor) { : missing value where TRUE/FALSE needed. What is strange is that mycor works perfectly and gives no error when I give it a larger chunk of the same data set.
This is the larger chunk of the same data set.
This can also be copied into R (printed via write.table):
"Col2" "Col3" "Col4" "Col5" "Col6" "Col7" "Col8" "Col9" "Col10" "Col11" "Col12" "Col13" "Col14" "Col15" "Col16" "Col17" "Col18" "Col19" "Col20" "Col21" "Col22" "Col23" "Col24" "Col25" "Col26" "Col27" "Col28" "Col29" "Col30" "Col31" "Col32" "Col33" "Col34" "Col35" "Col36" "Col37" "Col38" "Col39" "Col40" "Col41" "Col42" "Col43" "Col44" "Col45" "Col46" "Col47" "Col48" "Col49" "Col50" "Col51" "Col52" "Col53" "Col54" "Col55" "Col56" "Col57" "Col58" "Col59" "Col60" "Col61" "Col62" "Col63" "Col64" "Col65" "Col66" "Col67" "Col68" "Col69" "Col70" "Col71" "Col72" "Col73" "Col74" "Col75" "Col76" "Col77" "Col78" "Col79" "Col80" "Col81" "Col82" "Col83" "Col84" "Col85" "Col86" "Col87" "Col88" "Col89" "Col90" "Col91" "Col92" "Col93" "Col94" "Col95" "Col96" "Col97" "Col98" "Col99" "Col100" "Col101" "Col102" "Col103" "Col104" "Col105" "Col106" "Col107" "Col108" "Col109" "Col110" "Col111"
"Market Capitalization" NA NA 17082.69 17879.8 16266.11 17540.1 18214.39 17110.13 18167.87 16700.24 15592.71 14824.06 14455.42 13685.56 12168.31 12550.1 12771.45 11273.2 10284.48 10863.21 10655.99 11750.74 10671.37 10818.32 13288.42 12558.8 12221.79 13213.51 12375.92 11854.12 10942.65 10689.79 11364.1 11887.9 11426.1 10249.34 10609.99 10167.51 9600.1 10001.68 9713.38 9184.3 9730.33 8249.64 9160.61 8586.38 8894.55 8908.81 11887.9 11426.1 10249.34 10609.99 10167.51 9600.1 10001.68 9713.38 9184.3 9730.33 8249.64 9160.61 8586.38 8894.55 8908.81 8566.69 8641.04 8444.84 7867.83 8163.04 7238.2 6279.55 6173.33 7376.47 9048.75 10095.35 10351.52 12311.04 12006.02 10785.58 11009.16 9655.09 7990.1 6918.52 7050.24 6844.2 6520.75 6873.11 7489.61 7459.85 7136.58 6930.38 6401.43 6048.8 5843.01 6224.43 6840.76 7529.23 8452.46 8247.48 8132.72 7632.03 7339.11 6549.2 6165.26 6535.8 5793.52 5621.57 5877.31 5391.98 4792.51 5362.35
"Cash & Equivalents" NA NA 747 132 394 69 1381 769 648 398 492 516 338 198 178 87 260 75 311 651 74 68 1757 144 210 192 186 157 94 234 63 177 81 119 818 477 26 70 487 55 49 49 60 62 117.86 83.4 59.2 108.34 119 818 477 26 70 487 55 49 49 60 62 117.86 83.4 59.2 108.34 271.35 432.14 41.63 59.57 94.83 72.81 37.66 73.6 485.05 188.94 291.14 57.5 102.29 153.82 105.01 198.26 183.46 269.87 12.23 94.9 106.88 117.28 57.37 103.23 342.29 429.89 48.49 111.39 245.22 360.74 80.65 205.1 36.76 203.96 143.32 74.33 282.45 349.66 384.84 238.24 317.86 315.65 291.01 185.21 353.33 160.33 160.31
"Preferred & Other" NA NA 0 0 0 0 0 0 213 213 213 213 213 213 213 213 213 213 213 213 213 213 213 257 256 255 255 254 254 254 255 255 255 254 255 255 252 252 253 254 255 221 222 221 221.47 221.13 221.2 220.79 254 255 255 252 252 253 254 255 221 222 221 221.47 221.13 221.2 220.79 222.09 212.56 249.61 212.56 249.61 212.56 212.56 212.56 249.61 212.56 212.56 212.56 249.61 318.02 318.02 318.02 318.02 322.34 322.42 322.54 322.65 322.74 322.77 322.84 639.92 639.98 640.13 640.24 640.31 640.39 640.47 640.54 640.73 640.89 640.95 641.09 641.25 645.87 634.99 635.05 635.18 637.51 637.73 638.05 638.15 640.53 640.77
"Total Debt" NA NA 12379 11982 11309 11111 11873 11073 10675 10676 10678 11144 10683 11526 11020 11027 10599 10773 10366 10699 10094 9751 9480 9363 9282 9213 8653 8943 8815 8968 8487 8162 8205 7687 7868 7498 7219 7245 7336 7432 7094 6968 6682 7000 6841.23 6584.25 6374.14 6264.74 7687 7868 7498 7219 7245 7336 7432 7094 6968 6682 7000 6841.23 6584.25 6374.14 6264.74 6234.03 6249.6 6448.51 6100.6 6011.55 5693.56 5536.13 5276.01 5449.52 4792.08 4881.68 4471.08 4312.4 4410.61 4480.08 4437.33 4758.17 4432.04 4532.28 4466.59 4387.54 4313.86 4316.43 4316.66 4146.02 4175.36 4082.33 4085.09 4089.16 4116.98 3970.11 3972.46 3827.89 3850.12 3927.94 3722.68 3709.36 3804.58 3658.69 3885.52 3667.45 3734.29 3737 3615.16 3492.38 3374.62 3229.81
"Enterprise Value" NA NA 28714.69 29729.8 27181.11 28582.1 28706.39 27414.13 28407.87 27191.24 25991.71 25665.06 25013.42 25226.56 23223.31 23703.1 23323.45 22184.2 20552.48 21124.21 20888.99 21646.74 18607.37 20294.32 22616.42 21834.8 20943.79 22253.51 21350.92 20842.12 19621.65 18929.79 19743.1 19709.9 18731.1 17525.34 18054.99 17594.51 16702.1 17632.68 17013.38 16324.3 16574.33 15408.64 16105.45 15308.35 15430.68 15286 19709.9 18731.1 17525.34 18054.99 17594.51 16702.1 17632.68 17013.38 16324.3 16574.33 15408.64 16105.45 15308.35 15430.68 15286 14751.46 14671.06 15101.34 14121.44 14329.37 13071.51 11990.59 11588.31 12590.55 13864.46 14898.46 14977.66 16770.77 16580.82 15478.67 15566.25 14547.82 12474.62 11760.98 11744.46 11447.51 11040.07 11454.93 12025.88 11903.5 11522.02 11604.35 11015.38 10533.05 10239.65 10754.35 11248.66 11961.09 12739.51 12673.05 12422.15 11700.18 11439.9 10458.04 10447.58 10520.58 9849.67 9705.29 9945.31 9169.17 8647.34 9072.61
"Total Revenue" 2896.75 3461.25 2818 3184 2901 3438 2771 3078 2915 3629 2993 3349 3140 3707 3017 3462 3273 3489 2845 3423 2998 3858 3149 3577 3228 3579 2957 3357 2649 3441 2555 3317 3107 3337 2395 2800 2181 2734 2164 2685 2279 2801 2176 2570 2057.03 2539.49 1848 2056 3337 2395 2800 2181 2734 2164 2685 2279 2801 2176 2570 2057.03 2539.49 1848 2056 1942.6 2627.56 2112.22 2886.26 2250.13 2820.78 2041.89 2318.59 1963.38 2346.24 1479.08 1776.59 1617.34 2061.62 1561.04 1853.05 1720.06 2011.03 1504.01 1886.15 1632.3 1920.34 1539.73 1867.36 1528.38 1879.88 1459.85 1668.79 1461.25 1821.99 1392.09 1697.76 1483.61 1799.69 1396.01 1586.08 1478.81 1717.88 1280.11 1456.11 1342.73 1720.3 1330.65 1479.39 1367.21 1613.83 1263.27
"Growth % YoY" -0.15 0.68 1.7 3.44 -0.48 -5.26 -7.42 -8.09 -7.17 -2.1 -0.8 -3.26 -4.06 6.25 6.05 1.14 9.17 -9.56 -9.65 -4.31 -7.13 7.8 6.49 6.55 21.86 4.01 15.73 1.21 -14.74 3.12 6.68 18.46 42.46 22.06 10.67 4.28 -4.3 -2.39 -0.55 4.47 10.79 10.3 17.75 25 5.89 -3.35 -12.51 -28.77 22.06 10.67 4.28 -4.3 -2.39 -0.55 4.47 10.79 10.3 17.75 25 5.89 -3.35 -12.51 -28.77 -13.67 -6.85 3.44 24.48 14.6 20.23 38.05 30.51 21.4 13.81 -5.25 -4.13 -5.97 2.52 3.79 -1.75 5.38 4.72 -2.32 1.01 6.8 2.15 5.47 11.9 4.59 3.18 4.87 -1.71 -1.51 1.24 -0.28 7.04 0.32 4.76 9.05 8.93 10.13 -0.14 -3.8 -1.57 -1.79 6.6 5.33 -1.02 NA NA NA
"Gross Profit" NA NA 1874 2080 1981 2393 1934 1993 1846 2244 1794 2000 1942 2103 1723 1826 1700 1979 1558 1551 1459 1531 1420 1588 1478 1595 1317 1506 1273 1554 1202 1322 1179 1460 1097 1217 916 1285 980 1169 1066 1349 975 1157 1024.93 1317.57 980 1091 1460 1097 1217 916 1285 980 1169 1066 1349 975 1157 1024.93 1317.57 980 1091 1052.71 1368.8 1091.61 1236.41 991.8 1374.86 1043.29 1236.87 1129.87 1507.31 998.19 1190.69 1151.22 1475.08 1025.84 1170.8 1115.9 1438.56 981.96 1159.37 1094.25 1401.25 1001.2 1198.64 1079.65 1405.45 984.46 1196.22 1086.13 1415.37 998.06 1177.1 1086.53 1381.01 971.41 1118.91 1055.19 1331.37 947.22 1036.88 991.58 1301.1 921.48 994.97 967.89 1217.32 848.39
"Margin %" NA NA 66.5 65.33 68.29 69.6 69.79 64.75 63.33 61.84 59.94 59.72 61.85 56.73 57.11 52.74 51.94 56.72 54.76 45.31 48.67 39.68 45.09 44.39 45.79 44.57 44.54 44.86 48.06 45.16 47.05 39.86 37.95 43.75 45.8 43.46 42 47 45.29 43.54 46.77 48.16 44.81 45.02 49.83 51.88 53.03 53.06 43.75 45.8 43.46 42 47 45.29 43.54 46.77 48.16 44.81 45.02 49.83 51.88 53.03 53.06 54.19 52.09 51.68 42.84 44.08 48.74 51.09 53.35 57.55 64.24 67.49 67.02 71.18 71.55 65.72 63.18 64.88 71.53 65.29 61.47 67.04 72.97 65.02 64.19 70.64 74.76 67.44 71.68 74.33 77.68 71.7 69.33 73.24 76.74 69.58 70.55 71.35 77.5 74 71.21 73.85 75.63 69.25 67.26 70.79 75.43 67.16
"EBITDA" 758 1074 641 777 699 1091 711 794 684 978 617 844 708 916 640 696 625 885 569 611 567 586 520 702 596 715 510 694 547 670 467 564 423 717 411 533 274 624 367 497 458 669 334 485 388.44 693.3 384 487 717 411 533 274 624 367 497 458 669 334 485 388.44 693.3 384 487 445 695.27 439.32 538.75 377.16 666.39 492.65 526.86 446.87 748.34 331.51 492.91 430.87 760.5 313.33 474.78 434.79 751.92 280.96 463.41 390.79 712.97 313.14 490.27 368.26 711.24 307.36 506.85 383.64 721.41 317.3 474.34 363.04 678.27 279.09 400.41 320.03 637.82 281.47 340.21 297.39 610.07 247.48 300.27 305.15 561.67 203.06
"Margin %1" 26.17 31.03 22.75 24.4 24.1 31.73 25.66 25.8 23.46 26.95 20.61 25.2 22.55 24.71 21.21 20.1 19.1 25.37 20 17.85 18.91 15.19 16.51 19.63 18.46 19.98 17.25 20.67 20.65 19.47 18.28 17 13.61 21.49 17.16 19.04 12.56 22.82 16.96 18.51 20.1 23.88 15.35 18.87 18.88 27.3 20.78 23.69 21.49 17.16 19.04 12.56 22.82 16.96 18.51 20.1 23.88 15.35 18.87 18.88 27.3 20.78 23.69 22.91 26.46 20.8 18.67 16.76 23.62 24.13 22.72 22.76 31.9 22.41 27.74 26.64 36.89 20.07 25.62 25.28 37.39 18.68 24.57 23.94 37.13 20.34 26.25 24.09 37.83 21.05 30.37 26.25 39.59 22.79 27.94 24.47 37.69 19.99 25.25 21.64 37.13 21.99 23.36 22.15 35.46 18.6 20.3 22.32 34.8 16.07
"Net Income Before XO" 214.5 410 172 192 207 440 214 280 193 386 168 314 236 353 186 229 205 339 153 183 163 185 283 303 209 313 154 261 205 234 129 183 148 290 121 184 55 253 92 158 50 260 69 157 123.03 286.54 101 169 290 121 184 55 253 92 158 50 260 69 157 123.03 286.54 101 169 128.51 280.74 104.07 182.51 49.48 283.27 72.14 191.53 124.96 339.41 69.8 180.05 135.23 351.55 66.51 176.45 143.61 355.04 47.56 166.61 120.15 327.99 71.42 188.48 113.12 333.3 76.4 201.03 117.88 339.87 87.21 189.31 117.29 324.84 62.45 153.94 100.63 309.44 77.54 116.48 92.2 303.36 64.65 106.7 121.1 263.26 49.06
"Margin %2" 7.4 11.85 6.1 6.03 7.14 12.8 7.72 9.1 6.62 10.64 5.61 9.38 7.52 9.52 6.17 6.61 6.26 9.72 5.38 5.35 5.44 4.8 8.99 8.47 6.47 8.75 5.21 7.77 7.74 6.8 5.05 5.52 4.76 8.69 5.05 6.57 2.52 9.25 4.25 5.88 2.19 9.28 3.17 6.11 5.98 11.28 5.47 8.22 8.69 5.05 6.57 2.52 9.25 4.25 5.88 2.19 9.28 3.17 6.11 5.98 11.28 5.47 8.22 6.62 10.68 4.93 6.32 2.2 10.04 3.53 8.26 6.36 14.47 4.72 10.13 8.36 17.05 4.26 9.52 8.35 17.65 3.16 8.83 7.36 17.08 4.64 10.09 7.4 17.73 5.23 12.05 8.07 18.65 6.26 11.15 7.91 18.05 4.47 9.71 6.8 18.01 6.06 8 6.87 17.63 4.86 7.21 8.86 16.31 3.88
"Adjusted EPS" 0.7 1.42 0.59 1.07 0.69 1.44 0.61 1.01 0.74 1.33 0.57 0.99 0.69 1.32 0.51 0.93 0.67 1.16 0.48 0.78 0.72 0.98 0.42 0.87 0.71 1.2 0.58 1.03 0.78 0.92 0.51 0.86 0.59 1.17 0.48 0.75 0.49 1.08 0.38 0.69 0.65 1.16 0.29 0.72 0.56 1.33 0.46 0.78 1.17 0.48 0.75 0.49 1.08 0.38 0.69 0.65 1.16 0.29 0.72 0.56 1.33 0.46 0.78 0.59 1.3 0.48 0.84 0.52 1.4 0.33 0.88 0.57 1.5 0.3 0.76 0.56 1.49 0.26 0.73 0.59 1.49 0.18 0.69 0.49 1.38 0.28 0.78 0.44 1.38 0.29 0.82 0.47 1.41 0.33 0.77 0.46 1.35 0.23 0.62 0.39 1.3 0.3 0.47 0.36 1.29 0.24 0.43 0.49 1.11 0.18
"Growth % YoY1" 0.72 -1.67 -3.28 5.94 -6.76 8.27 7.02 2.02 7.25 0.76 11.76 6.45 2.99 13.79 6.25 19.23 -6.94 18.37 14.29 -10.34 1.41 -18.33 -27.59 -15.53 -8.97 30.43 13.73 19.77 32.2 -21.37 6.25 14.67 20.41 8.33 26.32 8.7 -24.62 -6.9 31.03 -4.17 16.07 -12.78 -36.96 -7.69 -5.08 2.31 -4.17 -7.14 8.33 26.32 8.7 -24.62 -6.9 31.03 -4.17 16.07 -12.78 -36.96 -7.69 -5.08 2.31 -4.17 -7.14 13.46 -7.14 45.45 -4.55 -8.77 -6.67 10 15.79 1.79 0.67 13.64 4.11 -5.08 -0.07 44.44 5.89 20.41 8.05 -34.72 -11.62 11.36 0 -3.45 -4.88 -6.38 -2.13 -12.12 6.49 2.17 4.44 43.48 24.19 17.95 3.85 -23.33 31.91 8.33 0.78 25 9.3 -26.53 16.22 33.33 -23.21 NA NA NA
"Cash from Operations" 375.79 812.21 991 -84 961 391 845 402 976 572 1227 362 1407 179 794 1 997 26 798 645 581 -1237 733 563 630 109 346 481 710 -162 224 593 177 581 -346 389 525 164 490 152 766 218 492 -58 735.49 285 369 146 581 -346 389 525 164 490 152 766 218 492 -58 735.49 285 369 146 490.18 387.73 254.59 141.41 215.82 279.84 489.5 199.17 -325.31 -66.66 280.22 256.65 718.82 438.66 302.05 244.37 -52.38 647.78 53.19 258.9 294.29 359.1 267.8 184.51 310.07 585.52 233.75 145.31 426.63 480.57 187.86 270.34 236.08 472.92 243.13 69.8 261.19 291.41 285.57 77.33 283.64 328.4 309.68 11.95 357.21 141.59 357.15
"Capital Expenditures" NA NA -660 -676 -608 -478 -635 -523 -542 -503 -629 -460 -599 -548 -551 -465 -719 -531 -595 -529 -785 -584 -608 -547 -638 -519 -485 -482 -583 -480 -537 -420 -619 -385 -426 -390 -431 -439 -308 -373 -448 -356 -404 -317 -593.69 -310 -392 -340 -385 -426 -390 -431 -439 -308 -373 -448 -356 -404 -317 -593.69 -310 -392 -340 -302.22 -394.08 -274.8 -228.02 -75.57 -274.36 -684.94 -207.41 -211.95 -218.98 -157.07 -127.56 -210.59 -156.81 -150.58 -127.3 -226.32 -145.55 -171.37 -140.37 -244.12 -167.92 -185.35 -142.94 -239.55 -165.98 -166.25 -147.38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"Free Cash Flow" NA NA 331 -760 353 -87 210 -121 434 69 598 -98 808 -369 243 -464 278 -505 203 116 -204 -1821 125 16 -8 -410 -139 -1 127 -642 -313 173 -442 196 -772 -1 94 -275 182 -221 318 -138 88 -375 141.79 -25 -23 -194 196 -772 -1 94 -275 182 -221 318 -138 88 -375 141.79 -25 -23 -194 187.96 -6.35 -20.21 -86.61 140.26 5.47 -195.45 -8.24 -537.26 -285.64 123.15 129.09 508.23 281.85 151.46 117.07 -278.7 502.23 -118.18 118.53 50.17 191.18 82.45 41.57 70.51 419.54 67.49 -2.08 426.63 480.57 187.86 270.34 236.08 472.92 243.13 69.8 261.19 291.41 285.57 77.33 283.64 328.4 309.68 11.95 357.21 141.59 357.15
"Adjusted Price" 2094.66 3689.2 3805.62 3588.42 3582.4 3885.75 3523.13 3554.9 3420.27 3141.36 2984.19 2838.81 2760.09 2517.44 2447.56 2403.89 2188.98 1960.8 1952.2 2033.87 2099.97 1993.98 2043.36 2296.42 2201.73 2277.15 2301.5 2203.47 2086.87 1938.95 2019.34 2002.47 2048.12 1881.97 1817.17 1807.02 1664.57 1659.78 1717.25 1585.27 1589.9 1506.13 1534.98 1531.24 1498.21 1528.96 1418.46 1431.1 1343.43 1244.04 1194.62 1076.93 1058.66 960.76 1112.69 1322.69 1414.59 1442.28 1545.6 1364.27 1305.46 1231.15 1022.23 869.37 796.9 820.22 762.84 715.9 756.11 816.37 731.97 705.73 657.84 628.55 571.47 624.67 651.89 676.63 759.77 742.27 734.39 657.44 619.61 569.84 524.2 510.26 475.43 449.8 441.27 409.34 383 413.34 441.72 435.71 419.07 385.87 356.85 346.15 326.97 318.45 323.72 314.18 313.22 300.88 329.3 315.1 312.34 279.11 163.47 NA
The larger chunk works perfectly, but I need to be able to check the correlation on the smaller sections. I'm really new to R so it might be easy, but I've read the boards here and the R manuals and can't find it.
In your example above, your code fails on the first (smaller) data set because row 3 consists only of 0's and NA's, so it has a standard deviation of 0. Its correlation with any other row is therefore NA, since computing a correlation involves dividing the sample covariance by the product of the two vectors' sample standard deviations, which here is 0. This doesn't happen in the larger example because row 3 there has enough variation to have a non-zero standard deviation.
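A minimal sketch of the zero-variance problem, assuming the NAs in the all-zero row have already been dropped. The vectors `zeros`, `other` and `a` are made-up illustrations, not rows from the question's data. It computes Pearson's r by hand as covariance divided by the product of standard deviations, then compares with the built-in `cor()`:

```r
# Made-up vectors: an all-zero row (NAs already removed) and an ordinary row
zeros <- c(0, 0, 0, 0, 0)
other <- c(2.1, 3.4, 1.8, 4.0, 2.9)

sd(zeros)  # 0: no variation at all

# Pearson r = cov(x, y) / (sd(x) * sd(y)); with sd(x) == 0 this is 0 / 0
manual  <- cov(zeros, other) / (sd(zeros) * sd(other))   # NaN
builtin <- suppressWarnings(cor(zeros, other))           # NA ("the standard deviation is zero")

# For a row that does vary, the hand formula and cor() agree
a <- c(1, 2, 3, 4, 5)
same <- isTRUE(all.equal(cov(a, other) / (sd(a) * sd(other)), cor(a, other)))
```

Without `suppressWarnings()`, `cor()` also prints a warning, which is a quick way to spot which row is constant.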
However, your approach seems a bit convoluted. If you want to compute the correlation between a single row in the matrix and all other rows, sorted by correlation, then you can use cor() on the transposed matrix and sort the result, for example:
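mycor <- function(dataset, relative.to=19) {
  # transpose so each original row (a variable) becomes a column
  mat <- t(dataset)
  # correlate every column with the reference column; periods with an NA in any row are dropped
  cors <- cor(mat, mat[, relative.to], use="na.or.complete")
  # sort from most negative to most positive correlation
  cors[order(drop(cors)), ]
}
mycor(dataset)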
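To see what `mycor()` returns, here is a minimal sketch on a small made-up data frame. The `toy` object, its row names and values are hypothetical, not the asker's data, but the layout matches the question: one row per variable, one column per period. Because `mat[, relative.to]` indexes a column of the transposed matrix, `relative.to` can be either a row position or a row name:

```r
mycor <- function(dataset, relative.to = 19) {
  mat <- t(dataset)  # variables become columns
  cors <- cor(mat, mat[, relative.to], use = "na.or.complete")
  cors[order(drop(cors)), ]  # ascending; NA results sort last
}

# Hypothetical data: rows are variables, columns are periods
toy <- data.frame(
  Q1 = c(10, 1, 5, 0),
  Q2 = c(12, 2, 4, 0),
  Q3 = c(15, 3, 3, 0),
  Q4 = c(11, 4, 6, 0),
  row.names = c("Revenue", "Debt", "Margin", "Zeros")
)

# Correlate every row with "Revenue"; the constant "Zeros" row gives NA
res <- suppressWarnings(mycor(toy, relative.to = "Revenue"))
res  # Margin < Debt < Revenue (exactly 1), with Zeros (NA) at the end
```

The NA for the constant row ends up last because `order()` uses `na.last = TRUE` by default. With `use = "na.or.complete"`, any period that has an NA in any row is dropped for every correlation; if your smaller sections have NAs scattered across different rows, `use = "pairwise.complete.obs"` instead computes each correlation from the periods where both rows are present.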