I am struggling with an optimization problem involving a simple matrix operation. The task is the following: I have a square matrix D containing "damage multipliers" stemming from a production reduction in producing countries (columns) and felt by "receiving" countries (rows).
AUT BEL BGR CYP CZE DEU DNK ESP EST FIN FRA GBR GRC HRV HUN IRL ITA LTU LUX LVA MLT NLD POL PRT ROU SVK SVN SWE
AUT 1.48 0.15 0.18 0.08 0.19 0.22 0.01 0.01 0.02 0.02 0.05 0.01 0.01 0.02 0.14 0.00 0.02 0.03 0.02 0.02 0.00 0.04 0.10 0.09 0.11 0.16 0.17 0.11
BEL 0.03 2.70 0.34 0.09 0.05 0.03 0.02 0.01 0.04 0.09 0.09 0.02 0.01 0.01 0.03 0.01 0.01 0.03 0.08 0.02 0.00 0.04 0.03 0.37 0.09 0.07 0.15 0.29
BGR 0.01 0.02 9.81 0.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.06 0.00 0.01 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.02 0.12 0.01 0.00 0.01
CYP 0.00 0.01 0.01 9.87 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
CZE 0.19 0.11 0.08 0.07 4.14 0.27 0.01 0.00 0.01 0.01 0.03 0.01 0.00 0.00 0.05 0.00 0.03 0.05 0.01 0.01 0.00 0.02 0.32 0.07 0.03 2.57 0.05 0.05
DEU 0.29 2.54 0.27 0.15 0.19 1.71 0.10 0.04 0.06 0.22 0.22 0.09 0.03 0.02 0.11 0.03 0.08 0.12 0.08 0.07 0.00 0.28 0.28 0.55 0.25 0.26 0.11 1.09
DNK 0.01 0.09 0.02 0.09 0.01 0.14 3.43 0.00 0.02 0.12 0.02 0.02 0.00 0.00 0.01 0.00 0.01 0.02 0.01 0.02 0.00 0.01 0.03 0.05 0.01 0.01 0.01 1.39
ESP 0.02 0.26 0.06 0.05 0.02 0.03 0.02 2.72 0.45 0.04 0.22 0.05 0.04 0.01 0.01 0.05 0.06 0.02 0.01 0.01 0.00 0.02 0.03 1.28 0.05 0.02 0.01 0.32
EST 0.00 0.01 0.00 0.03 0.00 0.00 0.00 0.00 5.03 0.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05
FIN 0.01 0.09 0.02 0.03 0.01 0.01 0.06 0.00 0.21 5.48 0.01 0.01 0.00 0.00 0.00 0.01 0.00 0.02 0.01 0.02 0.00 0.01 0.02 0.05 0.01 0.01 0.00 1.99
FRA 0.04 0.89 0.11 0.13 0.03 0.08 0.03 0.18 0.04 0.08 5.19 0.05 0.02 0.01 0.03 0.05 0.06 0.06 0.03 0.03 0.00 0.14 0.04 0.54 0.08 0.04 0.03 0.79
GBR 0.03 0.80 0.09 2.13 0.03 0.05 0.12 0.08 0.03 0.30 0.15 3.13 0.02 0.01 0.02 0.41 0.02 0.12 0.02 0.05 0.00 0.19 0.06 0.36 0.05 0.04 0.02 2.28
GRC 0.00 0.04 0.14 0.26 0.00 0.00 0.00 0.01 0.00 0.00 0.01 0.00 2.10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.03 0.00 0.00 0.02
HRV 0.19 0.01 0.01 0.03 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.25 0.03 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.09 0.01
HUN 0.29 0.07 0.08 0.17 0.30 0.08 0.02 0.00 0.01 0.01 0.06 0.00 0.00 0.01 4.83 0.00 0.01 0.09 0.01 0.05 0.00 0.01 0.05 0.04 0.13 0.23 0.06 0.04
IRL 0.00 0.03 0.01 0.06 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.03 0.00 0.00 0.00 1.80 0.00 0.00 0.01 0.00 0.00 0.00 0.01 0.02 0.00 0.00 0.00 0.03
ITA 0.76 0.46 0.40 0.20 0.06 0.24 0.02 0.18 0.04 0.05 0.19 0.03 0.14 0.06 0.06 0.06 4.16 0.05 0.02 0.07 0.00 0.14 0.05 0.37 0.15 0.08 0.21 0.34
LTU 0.00 0.02 0.01 0.01 0.00 0.00 0.00 0.00 0.02 0.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.18 0.00 0.03 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.02
LUX 0.00 0.14 0.00 0.02 0.00 0.02 0.00 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.04 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.01 0.01
LVA 0.00 0.01 0.00 0.03 0.00 0.00 0.00 0.00 0.05 0.15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.16 0.00 6.77 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.03
MLT 0.00 0.00 0.00 0.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.67 0.00 0.00 0.00 0.00 0.00 0.00 0.01
NLD 0.02 0.86 0.07 0.08 0.02 0.04 0.03 0.01 0.03 0.11 0.08 0.03 0.01 0.01 0.02 0.05 0.01 0.07 0.03 0.02 0.00 2.03 0.03 0.23 0.04 0.03 0.02 0.43
POL 0.02 0.09 0.03 0.19 0.16 0.13 0.01 0.01 0.01 0.02 0.06 0.01 0.00 0.00 0.02 0.00 0.01 0.33 0.00 0.03 0.00 0.01 2.18 0.05 0.02 0.11 0.01 0.11
PRT 0.00 0.05 0.01 0.10 0.00 0.03 0.01 0.07 0.02 0.01 0.04 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.53 0.01 0.00 0.00 0.07
ROU 0.04 0.06 0.89 0.13 0.04 0.02 0.00 0.00 0.00 0.01 0.01 0.00 0.01 0.00 0.31 0.00 0.02 0.02 0.00 0.01 0.00 0.00 0.03 0.04 10.52 0.06 0.01 0.03
SVK 0.23 0.04 0.02 0.08 1.12 0.60 0.00 0.00 0.00 0.01 0.32 0.00 0.00 0.00 0.11 0.00 0.00 0.07 0.00 0.02 0.00 0.00 0.34 0.03 0.03 7.06 0.02 0.03
SVN 0.13 0.01 0.02 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.01 0.00 0.05 0.01 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.01 6.77 0.01
SWE 0.02 0.20 0.05 0.08 0.02 0.03 0.26 0.01 0.12 0.90 0.04 0.03 0.00 0.01 0.01 0.01 0.01 0.03 0.01 0.06 0.00 0.02 0.05 0.12 0.03 0.02 0.02 8.05
The values represent the effect of a unitary shock in production: i.e. if country AUT reduces production by one unit, the damage felt in country DEU is 0.29. Hence, the matrix can be seen as a network of production effects between countries.
My goal is to find the optimal weights of a weighted unitary shock (i.e. weighting the columns so that the total reduction of production, summed over all countries, equals 1) that:
ensure a certain distribution of damage across receiving (row) countries (i.e. the row sums), let's say an equal distribution,
while at the same time minimizing the damage to the overall economic system.
I've tried solving it as a simple non-linear optimization problem with equality constraints, using the package Rsolnp:
# objective function to be minimized (global damage)
damage <- function(weights) {
  D_weighted <- t(t(D) * weights)
  sum(D_weighted)
}
# constraints (combined in one function):
constr <- function(weights) {
  # constraint 1: the weights need to sum to 1
  c1 <- sum(weights)
  # constraint 2: equal distribution of the damage outcome
  D_weighted <- t(t(D) * weights)
  damage_per_country <- rowSums(D_weighted) / sum(D_weighted)  # share of total damage felt by each row country
  c(c1, damage_per_country)
}
# target distribution of damage outcome (for example: equal distribution)
targ_dist <- rep(1/ncol(D), ncol(D))
# starting weights (start with the same production reduction in every country)
startweights <- rep(1/ncol(D), ncol(D))
# run optimization with Rsolnp
opt_weights <- solnp(pars = startweights, fun = damage,
                     eqfun = constr, eqB = c(1, targ_dist),
                     LB = rep(0, ncol(D)), UB = rep(1, ncol(D)),
                     control = list(outer.iter = 1000, trace = 0, tol = 0.001))
but it doesn't converge and returns a warning message:
"The linearized problem has no feasible solution. The problem may not be feasible".
Changing the tolerance doesn't solve the problem. It might be that this solver is not suited for this kind of problem or I need to reformulate the problem completely. I'd be thankful for any help!
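For what it's worth, here is a sketch of one possible reformulation (an assumption on my part, not tested against your data): the total damage is linear in the weights, and the distribution constraint can be rewritten without the ratio, since requiring the damage shares to equal targ_dist is the same as requiring D %*% w - targ_dist * sum(colSums(D) * w) = 0. The whole problem then becomes a linear program that a solver such as lpSolve could handle directly.
# hedged sketch: recast the problem as a linear program (assumes D is a numeric matrix
# and targ_dist is defined as above; feasibility is not guaranteed for every target)
library(lpSolve)
n <- ncol(D)
obj <- colSums(D)                                          # total damage = sum(colSums(D) * w)
con <- rbind(rep(1, n),                                    # sum of weights = 1
             as.matrix(D) - outer(targ_dist, colSums(D)))  # damage shares = target shares
rhs <- c(1, rep(0, n))
sol <- lp("min", obj, con, rep("=", n + 1), rhs)           # lp() keeps the weights >= 0 by default
sol$status                                                 # 0 indicates a feasible optimum was found
sol$solution                                               # candidate weights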
I want to match elements from df1 with values from array1.
df1 <- (c('A','S','E','E','V','G','H','P','K','L','W','N','P','A','A','S','E','N','M','Y','S','G','D','R','H'))
array1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
A 0.15 0.00 0.10 0.10 0.05 0.00 0.05 0.00 0.00 0.05 0.00 0.0 0.00 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.00 0.00 0.00
C 0.00 0.00 0.00 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
D 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.00 0.05 0.05 0.00 0.0 0.10 0.0 0.00 0.25 0.10 0.20 0.10 0.00 0.15 0.05 0.00 0.00 0.05
E 0.05 0.10 0.05 0.05 0.00 0.05 0.00 0.10 0.10 0.20 0.00 0.0 0.05 0.0 0.00 0.00 0.05 0.10 0.00 0.20 0.10 0.05 0.15 0.10 0.10
F 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.00 0.00 0.05 0.0 0.05 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
G 0.00 0.00 0.10 0.00 0.05 0.00 0.00 0.00 0.05 0.00 0.00 0.0 0.00 0.0 0.00 0.00 0.00 0.00 0.05 0.00 0.00 0.05 0.00 0.00 0.00
H 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
I 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.2 0.05 0.1 0.05 0.05 0.00 0.00 0.00 0.00 0.05 0.00 0.00 0.00 0.00
K 0.00 0.10 0.00 0.05 0.00 0.05 0.05 0.05 0.00 0.00 0.00 0.0 0.00 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.10 0.00 0.05
L 0.00 0.00 0.05 0.05 0.05 0.05 0.10 0.00 0.10 0.00 0.00 0.0 0.00 0.2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.00
M 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.0 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
N 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.0 0.00 0.00 0.00 0.05 0.05 0.00 0.00 0.00 0.00 0.05 0.00
P 0.00 0.00 0.00 0.05 0.05 0.00 0.10 0.10 0.00 0.00 0.00 0.0 0.00 0.0 0.00 0.00 0.00 0.00 0.00 0.05 0.00 0.10 0.00 0.05 0.00
Q 0.00 0.05 0.05 0.00 0.10 0.00 0.00 0.00 0.00 0.00 0.05 0.0 0.00 0.1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05
R 0.00 0.00 0.05 0.00 0.05 0.15 0.00 0.00 0.00 0.05 0.00 0.0 0.00 0.0 0.00 0.00 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00
S 0.10 0.10 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.05 0.0 0.00 0.0 0.15 0.10 0.20 0.05 0.10 0.10 0.05 0.00 0.05 0.05 0.10
T 0.00 0.00 0.00 0.05 0.00 0.05 0.00 0.05 0.05 0.00 0.00 0.0 0.00 0.0 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.05 0.05 0.00
V 0.05 0.05 0.00 0.05 0.00 0.00 0.00 0.05 0.05 0.00 0.10 0.2 0.15 0.0 0.15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
W 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Y 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.10 0.0 0.00 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
The expected outcome can be a list or a df:
0.15, 0.10, 0.05, 0.05, 0.00, 0.00, 0.00, 0.10, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.10, 0.05, 0.05, 0.00, 0.00, 0.05, 0.05, 0.00, 0.00, 0.00
This is what I have tried:
res <- left_join(df1, array1, by = array1[[y]])
view(res)
You can use matrix subsetting on array1:
array1[cbind(match(df1, rownames(array1)), 1:ncol(array1))]
#[1] 0.15 0.10 0.05 0.05 0.00 0.00 0.00 0.10 0.00 0.00 0.00 0.00 0.00
#[14] 0.00 0.00 0.10 0.05 0.05 0.00 0.00 0.05 0.05 0.00 0.00 0.00
match(df1, rownames(array1)) creates, for each element of df1, the row number of array1 to subset, while 1:ncol(array1) supplies the matching column numbers, so each value of df1 is looked up in its corresponding column.
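A minimal toy example of the same two-column matrix indexing (the matrix and values here are made up purely for illustration):
# each row of the index matrix cbind(row, col) picks one cell, so the result is a plain vector
m <- matrix(1:12, nrow = 3, dimnames = list(c("A", "B", "C"), NULL))
keys <- c("B", "A", "C", "B")                        # plays the role of df1
m[cbind(match(keys, rownames(m)), seq_along(keys))]
#[1]  2  4  9 11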
From the output of a sar command, I want to extract only the lines in which the %iowait value is higher than a set threshold.
I tried using AWK but somehow I'm not able to perform the action.
sar -u -f sa12 | sed 's/\./,/g' | awk -f" " '{ if ( $7 -gt 0 ) print $0 }'
I tried to substitute the . with , and using -gt but still no joy.
Can someone suggest a solution?
If we need the entire line output of sar -u where %iowait > 0.01, we can use this:
Command
sar -u | grep -v "CPU" | awk '$7 > 0.01'
Output will be similar to
03:40:01 AM all 3.16 0.00 0.05 0.11 0.00 96.68
04:40:01 PM all 0.19 0.00 0.05 0.02 0.00 99.74
If we wish to output only specific fields, say only %iowait, we can use the command given below.
Command to output specific field(s):
sar -u | grep -v "CPU" | awk '{if($7 > 0.01 ) print $7}'
Output will be
0.11
0.02
Note: grep -v is used just to remove the headings from the output.
Hope this helps,
My sar -u gives several lines similar to the following:
Linux 4.4.0-127-generic (v1) 06/12/2018 _x86_64_ (1 CPU)
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:05:01 AM all 0.29 0.00 0.30 0.01 0.00 99.40
12:15:01 AM all 0.33 0.00 0.34 0.00 0.00 99.32
12:25:01 AM all 0.33 0.00 0.30 0.01 0.00 99.36
12:35:01 AM all 0.31 0.00 0.29 0.01 0.00 99.39
12:45:01 AM all 0.33 0.00 0.32 0.01 0.00 99.35
12:55:01 AM all 0.32 0.00 0.30 0.00 0.00 99.38
01:05:01 AM all 0.32 0.00 0.28 0.00 0.00 99.39
01:15:01 AM all 0.33 0.00 0.30 0.01 0.00 99.37
01:25:01 AM all 0.31 0.00 0.30 0.01 0.00 99.39
01:35:01 AM all 0.31 0.00 0.33 0.00 0.00 99.36
01:45:01 AM all 0.31 0.00 0.28 0.01 0.00 99.40
01:55:01 AM all 0.31 0.00 0.30 0.00 0.00 99.38
02:05:01 AM all 0.31 0.00 0.28 0.01 0.00 99.40
02:15:01 AM all 0.32 0.00 0.30 0.01 0.00 99.38
02:25:01 AM all 0.31 0.00 0.30 0.01 0.00 99.38
02:35:01 AM all 0.33 0.00 0.33 0.00 0.00 99.33
02:45:01 AM all 0.35 0.00 0.32 0.01 0.00 99.32
02:55:01 AM all 0.28 0.00 0.30 0.00 0.00 99.42
03:05:01 AM all 0.32 0.00 0.31 0.00 0.00 99.37
03:15:01 AM all 0.34 0.00 0.30 0.01 0.00 99.36
03:25:01 AM all 0.32 0.00 0.29 0.01 0.00 99.38
03:35:01 AM all 0.33 0.00 0.26 0.00 0.00 99.40
03:45:01 AM all 0.34 0.00 0.29 0.00 0.00 99.36
03:55:01 AM all 0.30 0.00 0.28 0.01 0.00 99.41
04:05:01 AM all 0.32 0.00 0.30 0.01 0.00 99.37
04:15:01 AM all 0.37 0.00 0.31 0.01 0.00 99.32
04:25:01 AM all 1.78 2.04 0.59 0.05 0.00 95.55
To filter out those where %iowait is greater than, let's say, 0.01:
sar -u | awk '$7>0.01{print}'
Linux 4.4.0-127-generic (v1) 06/12/2018 _x86_64_ (1 CPU)
04:25:01 AM all 1.78 2.04 0.59 0.05 0.00 95.55
05:15:01 AM all 0.34 0.00 0.32 0.02 0.00 99.32
06:35:01 AM all 0.33 0.22 1.23 4.48 0.00 93.74
06:45:01 AM all 0.16 0.00 0.12 0.02 0.00 99.71
10:35:01 AM all 0.22 0.00 0.13 0.02 0.00 99.63
12:15:01 PM all 0.42 0.00 0.16 0.03 0.00 99.40
01:45:01 PM all 0.17 0.00 0.11 0.02 0.00 99.71
04:05:01 PM all 0.15 0.00 0.12 0.03 0.00 99.70
04:15:01 PM all 0.42 0.00 0.23 0.10 0.00 99.25
Edit:
As correctly pointed out by @Ed Morton, the awk code can be shortened to simply awk '$7>0.01', since the default action is to print the current line.
I am learning how to use R profiling, and have run the Rprof command on my code.
The summaryRprof function has shown that a lot of time is spent using .External2. What is this? Additionally, a large proportion of the total time is spent on <Anonymous>; is there a way to find out what this is?
> summaryRprof("test")
$by.self
self.time self.pct total.time total.pct
".External2" 4.30 27.74 4.30 27.74
"format.POSIXlt" 2.70 17.42 2.90 18.71
"which.min" 2.38 15.35 4.12 26.58
"-" 1.30 8.39 1.30 8.39
"order" 1.16 7.48 1.16 7.48
"match" 0.58 3.74 0.58 3.74
"file" 0.44 2.84 0.44 2.84
"abs" 0.40 2.58 0.40 2.58
"scan" 0.30 1.94 0.30 1.94
"anyDuplicated.default" 0.20 1.29 0.20 1.29
"unique.default" 0.20 1.29 0.20 1.29
"unlist" 0.18 1.16 0.20 1.29
"c" 0.16 1.03 0.16 1.03
"data.frame" 0.14 0.90 0.22 1.42
"structure" 0.12 0.77 1.74 11.23
"as.POSIXct.POSIXlt" 0.12 0.77 0.12 0.77
"strptime" 0.12 0.77 0.12 0.77
"as.character" 0.08 0.52 0.90 5.81
"make.unique" 0.08 0.52 0.16 1.03
"[.data.frame" 0.06 0.39 1.54 9.94
"<Anonymous>" 0.04 0.26 4.34 28.00
"lapply" 0.04 0.26 1.70 10.97
"rbind" 0.04 0.26 0.94 6.06
"as.POSIXlt.POSIXct" 0.04 0.26 0.04 0.26
"ifelse" 0.04 0.26 0.04 0.26
"paste" 0.02 0.13 0.92 5.94
"merge.data.frame" 0.02 0.13 0.56 3.61
"[<-.factor" 0.02 0.13 0.52 3.35
"stopifnot" 0.02 0.13 0.04 0.26
".deparseOpts" 0.02 0.13 0.02 0.13
".External" 0.02 0.13 0.02 0.13
"close.connection" 0.02 0.13 0.02 0.13
"doTryCatch" 0.02 0.13 0.02 0.13
"is.na" 0.02 0.13 0.02 0.13
"is.na<-.default" 0.02 0.13 0.02 0.13
"mean" 0.02 0.13 0.02 0.13
"seq.int" 0.02 0.13 0.02 0.13
"sum" 0.02 0.13 0.02 0.13
"sys.function" 0.02 0.13 0.02 0.13
$by.total
total.time total.pct self.time self.pct
"write.table" 5.10 32.90 0.00 0.00
"<Anonymous>" 4.34 28.00 0.04 0.26
".External2" 4.30 27.74 4.30 27.74
"mapply" 4.22 27.23 0.00 0.00
"head" 4.16 26.84 0.00 0.00
"which.min" 4.12 26.58 2.38 15.35
"eval" 3.16 20.39 0.00 0.00
"eval.parent" 3.14 20.26 0.00 0.00
"write.csv" 3.14 20.26 0.00 0.00
"format" 2.92 18.84 0.00 0.00
"format.POSIXlt" 2.90 18.71 2.70 17.42
"do.call" 1.78 11.48 0.00 0.00
"structure" 1.74 11.23 0.12 0.77
"lapply" 1.70 10.97 0.04 0.26
"FUN" 1.66 10.71 0.00 0.00
"format.POSIXct" 1.62 10.45 0.00 0.00
"[.data.frame" 1.54 9.94 0.06 0.39
"[" 1.54 9.94 0.00 0.00
"-" 1.30 8.39 1.30 8.39
"order" 1.16 7.48 1.16 7.48
"rbind" 0.94 6.06 0.04 0.26
"paste" 0.92 5.94 0.02 0.13
"as.character" 0.90 5.81 0.08 0.52
"read.csv" 0.84 5.42 0.00 0.00
"read.table" 0.84 5.42 0.00 0.00
"as.character.POSIXt" 0.82 5.29 0.00 0.00
"match" 0.58 3.74 0.58 3.74
"merge.data.frame" 0.56 3.61 0.02 0.13
"merge" 0.56 3.61 0.00 0.00
"[<-.factor" 0.52 3.35 0.02 0.13
"[<-" 0.52 3.35 0.00 0.00
"strftime" 0.48 3.10 0.00 0.00
"file" 0.44 2.84 0.44 2.84
"weekdays" 0.42 2.71 0.00 0.00
"weekdays.POSIXt" 0.42 2.71 0.00 0.00
"abs" 0.40 2.58 0.40 2.58
"unique" 0.38 2.45 0.00 0.00
"scan" 0.30 1.94 0.30 1.94
"data.frame" 0.22 1.42 0.14 0.90
"cbind" 0.22 1.42 0.00 0.00
"anyDuplicated.default" 0.20 1.29 0.20 1.29
"unique.default" 0.20 1.29 0.20 1.29
"unlist" 0.20 1.29 0.18 1.16
"anyDuplicated" 0.20 1.29 0.00 0.00
"as.POSIXct" 0.18 1.16 0.00 0.00
"as.POSIXlt" 0.18 1.16 0.00 0.00
"c" 0.16 1.03 0.16 1.03
"make.unique" 0.16 1.03 0.08 0.52
"as.POSIXct.POSIXlt" 0.12 0.77 0.12 0.77
"strptime" 0.12 0.77 0.12 0.77
"as.POSIXlt.character" 0.12 0.77 0.00 0.00
"object.size" 0.12 0.77 0.00 0.00
"as.POSIXct.default" 0.10 0.65 0.00 0.00
"Ops.POSIXt" 0.08 0.52 0.00 0.00
"type.convert" 0.08 0.52 0.00 0.00
"!=" 0.06 0.39 0.00 0.00
"as.POSIXlt.factor" 0.06 0.39 0.00 0.00
"as.POSIXlt.POSIXct" 0.04 0.26 0.04 0.26
"ifelse" 0.04 0.26 0.04 0.26
"stopifnot" 0.04 0.26 0.02 0.13
"$" 0.04 0.26 0.00 0.00
"$.data.frame" 0.04 0.26 0.00 0.00
"[[" 0.04 0.26 0.00 0.00
"[[.data.frame" 0.04 0.26 0.00 0.00
"head.default" 0.04 0.26 0.00 0.00
".deparseOpts" 0.02 0.13 0.02 0.13
".External" 0.02 0.13 0.02 0.13
"close.connection" 0.02 0.13 0.02 0.13
"doTryCatch" 0.02 0.13 0.02 0.13
"is.na" 0.02 0.13 0.02 0.13
"is.na<-.default" 0.02 0.13 0.02 0.13
"mean" 0.02 0.13 0.02 0.13
"seq.int" 0.02 0.13 0.02 0.13
"sum" 0.02 0.13 0.02 0.13
"sys.function" 0.02 0.13 0.02 0.13
"%in%" 0.02 0.13 0.00 0.00
".rs.getSingleClass" 0.02 0.13 0.00 0.00
"[.POSIXlt" 0.02 0.13 0.00 0.00
"==" 0.02 0.13 0.00 0.00
"close" 0.02 0.13 0.00 0.00
"data.row.names" 0.02 0.13 0.00 0.00
"deparse" 0.02 0.13 0.00 0.00
"factor" 0.02 0.13 0.00 0.00
"is.na<-" 0.02 0.13 0.00 0.00
"match.arg" 0.02 0.13 0.00 0.00
"match.call" 0.02 0.13 0.00 0.00
"pushBack" 0.02 0.13 0.00 0.00
"seq" 0.02 0.13 0.00 0.00
"seq.POSIXt" 0.02 0.13 0.00 0.00
"simplify2array" 0.02 0.13 0.00 0.00
"tryCatch" 0.02 0.13 0.00 0.00
"tryCatchList" 0.02 0.13 0.00 0.00
"tryCatchOne" 0.02 0.13 0.00 0.00
"which" 0.02 0.13 0.00 0.00
$sample.interval
[1] 0.02
$sampling.time
[1] 15.5
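.External2 is one of R's internal interfaces for calling compiled C code; given the by.total numbers above, it is most likely the C routine behind write.table. <Anonymous> simply marks an unnamed function on the call stack. One way to pin both down, sketched here under the assumption that the profiled code can be re-run from a source file (the file name below is a placeholder), is to enable line profiling so that summaryRprof can attribute samples to file and line numbers:
# re-run the code with line profiling enabled ("myscript.R" is a hypothetical file name)
Rprof("test", line.profiling = TRUE)
source("myscript.R", keep.source = TRUE)   # keep.source is needed for line information
Rprof(NULL)
summaryRprof("test", lines = "show")       # results are now annotated with file#line positions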
I have seen this error multiple times in different projects and I was wondering if there is a way to tell which line caused the error in general?
My specific case:
http://archive.ics.uci.edu/ml/machine-learning-databases/00275/
#using the bike.csv
data<-read.csv("PATH_HERE\\Bike-Sharing-Dataset\\day.csv",header=TRUE)
require(psych)
corr.test(data)
data<-data[,c("atemp","casual","cnt","holiday","hum","mnth","registered",
"season","temp","weathersit","weekday","windspeed","workingday","yr")]
data[data=='']<-NA
#View(data)
require(psych)
cors<-corr.test(data)
returns the error:
Error in data.frame(lower = lower, r = r[lower.tri(r)], upper = upper, :
arguments imply differing number of rows: 0, 91
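On the general part of the question (telling which line raised an error), a sketch using only base R debugging tools, independent of the psych example:
# immediately after an uncaught error, print the call stack that produced it
cors <- corr.test(data)    # reproduces the error above
traceback()

# or drop into the offending frame interactively and inspect its objects
options(error = recover)   # requires an interactive session
cors <- corr.test(data)
options(error = NULL)      # restore the default error handling afterwards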
It works for me
> #using the bike.csv
> data <- read.csv("day.csv",header=TRUE)
> require(psych)
> corr.test(data)
Error in cor(x, use = use, method = method) : 'x' must be numeric
> data <- data[,c("atemp","casual","cnt","holiday","hum","mnth","registered",
+ "season","temp","weathersit","weekday","windspeed","workingday","yr")]
> data[data==''] <- NA
> #View(data)
>
> require(psych)
> cors <- corr.test(data)
> cors
Call:corr.test(x = data)
Correlation matrix
atemp casual cnt holiday hum mnth registered season temp
atemp 1.00 0.54 0.63 -0.03 0.14 0.23 0.54 0.34 0.99
casual 0.54 1.00 0.67 0.05 -0.08 0.12 0.40 0.21 0.54
cnt 0.63 0.67 1.00 -0.07 -0.10 0.28 0.95 0.41 0.63
holiday -0.03 0.05 -0.07 1.00 -0.02 0.02 -0.11 -0.01 -0.03
hum 0.14 -0.08 -0.10 -0.02 1.00 0.22 -0.09 0.21 0.13
mnth 0.23 0.12 0.28 0.02 0.22 1.00 0.29 0.83 0.22
registered 0.54 0.40 0.95 -0.11 -0.09 0.29 1.00 0.41 0.54
season 0.34 0.21 0.41 -0.01 0.21 0.83 0.41 1.00 0.33
temp 0.99 0.54 0.63 -0.03 0.13 0.22 0.54 0.33 1.00
weathersit -0.12 -0.25 -0.30 -0.03 0.59 0.04 -0.26 0.02 -0.12
weekday -0.01 0.06 0.07 -0.10 -0.05 0.01 0.06 0.00 0.00
windspeed -0.18 -0.17 -0.23 0.01 -0.25 -0.21 -0.22 -0.23 -0.16
workingday 0.05 -0.52 0.06 -0.25 0.02 -0.01 0.30 0.01 0.05
yr 0.05 0.25 0.57 0.01 -0.11 0.00 0.59 0.00 0.05
weathersit weekday windspeed workingday yr
atemp -0.12 -0.01 -0.18 0.05 0.05
casual -0.25 0.06 -0.17 -0.52 0.25
cnt -0.30 0.07 -0.23 0.06 0.57
holiday -0.03 -0.10 0.01 -0.25 0.01
hum 0.59 -0.05 -0.25 0.02 -0.11
mnth 0.04 0.01 -0.21 -0.01 0.00
registered -0.26 0.06 -0.22 0.30 0.59
season 0.02 0.00 -0.23 0.01 0.00
temp -0.12 0.00 -0.16 0.05 0.05
weathersit 1.00 0.03 0.04 0.06 -0.05
weekday 0.03 1.00 0.01 0.04 -0.01
windspeed 0.04 0.01 1.00 -0.02 -0.01
workingday 0.06 0.04 -0.02 1.00 0.00
yr -0.05 -0.01 -0.01 0.00 1.00
Sample Size
[1] 731
Probability values (Entries above the diagonal are adjusted for multiple tests.)
atemp casual cnt holiday hum mnth registered season temp
atemp 0.00 0.00 0.00 1.00 0.01 0.00 0.00 0.00 0.00
casual 0.00 0.00 0.00 1.00 1.00 0.04 0.00 0.00 0.00
cnt 0.00 0.00 0.00 1.00 0.28 0.00 0.00 0.00 0.00
holiday 0.38 0.14 0.06 0.00 1.00 1.00 0.15 1.00 1.00
hum 0.00 0.04 0.01 0.67 0.00 0.00 0.58 0.00 0.03
mnth 0.00 0.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00
registered 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00
season 0.00 0.00 0.00 0.78 0.00 0.00 0.00 0.00 0.00
temp 0.00 0.00 0.00 0.44 0.00 0.00 0.00 0.00 0.00
weathersit 0.00 0.00 0.00 0.35 0.00 0.24 0.00 0.60 0.00
weekday 0.84 0.11 0.07 0.01 0.16 0.80 0.12 0.93 1.00
windspeed 0.00 0.00 0.00 0.87 0.00 0.00 0.00 0.00 0.00
workingday 0.16 0.00 0.10 0.00 0.51 0.87 0.00 0.74 0.15
yr 0.21 0.00 0.00 0.83 0.00 0.96 0.00 0.96 0.20
weathersit weekday windspeed workingday yr
atemp 0.05 1.00 0.00 1.00 1.00
casual 0.00 1.00 0.00 0.00 0.00
cnt 0.00 1.00 0.00 1.00 0.00
holiday 1.00 0.25 1.00 0.00 1.00
hum 0.00 1.00 0.00 1.00 0.13
mnth 1.00 1.00 0.00 1.00 1.00
registered 0.00 1.00 0.00 0.00 0.00
season 1.00 1.00 0.00 1.00 1.00
temp 0.05 1.00 0.00 1.00 1.00
weathersit 0.00 1.00 1.00 1.00 1.00
weekday 0.40 0.00 1.00 1.00 1.00
windspeed 0.29 0.70 0.00 1.00 1.00
workingday 0.10 0.33 0.61 0.00 1.00
yr 0.19 0.88 0.75 0.96 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
>
It works for me:
rm(list=ls())
# http://archive.ics.uci.edu/ml/machine-learning-databases/00275/
#using the bike.csv
day <- read.csv("Bike-Sharing-Dataset//day.csv")
require(psych)
day<-day[,c("atemp","casual","cnt","holiday","hum","mnth","registered",
"season","temp","weathersit","weekday","windspeed","workingday","yr")]
day[day=='']<-NA
require(psych)
corr.test(day)
# corr.test(day)
# Call:corr.test(x = day)
# Correlation matrix
# atemp casual cnt holiday hum mnth registered season temp weathersit weekday windspeed workingday yr
# atemp 1.00 0.54 0.63 -0.03 0.14 0.23 0.54 0.34 0.99 -0.12 -0.01 -0.18 0.05 0.05
# casual 0.54 1.00 0.67 0.05 -0.08 0.12 0.40 0.21 0.54 -0.25 0.06 -0.17 -0.52 0.25
# cnt 0.63 0.67 1.00 -0.07 -0.10 0.28 0.95 0.41 0.63 -0.30 0.07 -0.23 0.06 0.57
# holiday -0.03 0.05 -0.07 1.00 -0.02 0.02 -0.11 -0.01 -0.03 -0.03 -0.10 0.01 -0.25 0.01
# hum 0.14 -0.08 -0.10 -0.02 1.00 0.22 -0.09 0.21 0.13 0.59 -0.05 -0.25 0.02 -0.11
# mnth 0.23 0.12 0.28 0.02 0.22 1.00 0.29 0.83 0.22 0.04 0.01 -0.21 -0.01 0.00
# registered 0.54 0.40 0.95 -0.11 -0.09 0.29 1.00 0.41 0.54 -0.26 0.06 -0.22 0.30 0.59
# season 0.34 0.21 0.41 -0.01 0.21 0.83 0.41 1.00 0.33 0.02 0.00 -0.23 0.01 0.00
# temp 0.99 0.54 0.63 -0.03 0.13 0.22 0.54 0.33 1.00 -0.12 0.00 -0.16 0.05 0.05
# weathersit -0.12 -0.25 -0.30 -0.03 0.59 0.04 -0.26 0.02 -0.12 1.00 0.03 0.04 0.06 -0.05
# weekday -0.01 0.06 0.07 -0.10 -0.05 0.01 0.06 0.00 0.00 0.03 1.00 0.01 0.04 -0.01
# windspeed -0.18 -0.17 -0.23 0.01 -0.25 -0.21 -0.22 -0.23 -0.16 0.04 0.01 1.00 -0.02 -0.01
# workingday 0.05 -0.52 0.06 -0.25 0.02 -0.01 0.30 0.01 0.05 0.06 0.04 -0.02 1.00 0.00
# yr 0.05 0.25 0.57 0.01 -0.11 0.00 0.59 0.00 0.05 -0.05 -0.01 -0.01 0.00 1.00
# Sample Size
# [1] 731
# Probability values (Entries above the diagonal are adjusted for multiple tests.)
# atemp casual cnt holiday hum mnth registered season temp weathersit weekday windspeed workingday yr
# atemp 0.00 0.00 0.00 1.00 0.01 0.00 0.00 0.00 0.00 0.05 1.00 0.00 1.00 1.00
# casual 0.00 0.00 0.00 1.00 1.00 0.04 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
# cnt 0.00 0.00 0.00 1.00 0.28 0.00 0.00 0.00 0.00 0.00 1.00 0.00 1.00 0.00
# holiday 0.38 0.14 0.06 0.00 1.00 1.00 0.15 1.00 1.00 1.00 0.25 1.00 0.00 1.00
# hum 0.00 0.04 0.01 0.67 0.00 0.00 0.58 0.00 0.03 0.00 1.00 0.00 1.00 0.13
# mnth 0.00 0.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
# registered 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
# season 0.00 0.00 0.00 0.78 0.00 0.00 0.00 0.00 0.00 1.00 1.00 0.00 1.00 1.00
# temp 0.00 0.00 0.00 0.44 0.00 0.00 0.00 0.00 0.00 0.05 1.00 0.00 1.00 1.00
# weathersit 0.00 0.00 0.00 0.35 0.00 0.24 0.00 0.60 0.00 0.00 1.00 1.00 1.00 1.00
# weekday 0.84 0.11 0.07 0.01 0.16 0.80 0.12 0.93 1.00 0.40 0.00 1.00 1.00 1.00
# windspeed 0.00 0.00 0.00 0.87 0.00 0.00 0.00 0.00 0.00 0.29 0.70 0.00 1.00 1.00
# workingday 0.16 0.00 0.10 0.00 0.51 0.87 0.00 0.74 0.15 0.10 0.33 0.61 0.00 1.00
# yr 0.21 0.00 0.00 0.83 0.00 0.96 0.00 0.96 0.20 0.19 0.88 0.75 0.96 0.00
#
# To see confidence intervals of the correlations, print with the short=FALSE option
cheers
I ran PCA in R using the principal() function in the "psych" package. I passed the argument rotate = "none", which asks for the unrotated (orthogonal) solution. From what I understand, the scores of PC1 and PC2 should be orthogonal (i.e. there should be zero correlation between (raw data)(loading of PC1) and (raw data)(loading of PC2)). However, I got 90% correlation. Why is that?
> #load the package
> library(psych)
> #calculate the correlation matrix
> corMat <- cor(data)
> #run PCA
> pca.results <- principal(r = corMat, rotate = "none", nfactors = 20, covar = FALSE, scores = TRUE)
> pca.results
Principal Components Analysis
Call: principal(r = corMat, nfactors = 20, rotate = "none", covar = FALSE,
scores = TRUE)
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
payroll.chg -0.30 0.85 0.21 0.35 -0.03 0.02 0.07 -0.11 -0.02
HH.empl.chg -0.26 0.62 0.64 -0.35 0.01 -0.06 0.06 0.00 0.01
pop.empl.ratio -0.92 -0.34 0.13 0.04 0.06 -0.03 -0.04 0.03 -0.04
u.rate 0.99 0.10 0.02 0.04 0.01 0.04 0.04 0.04 0.01
median.duration.unempl 0.88 0.44 -0.02 0.02 -0.04 0.06 0.02 0.13 -0.05
LT.unempl.unempl.ratio 0.86 0.49 -0.04 0.01 -0.07 0.02 0.00 0.08 -0.02
U4 0.99 0.13 0.01 0.03 0.01 0.04 0.04 0.05 0.01
U6 0.98 0.13 -0.05 -0.02 0.00 0.06 0.04 0.03 0.04
vacancy.rate -0.87 0.35 -0.18 -0.11 -0.01 0.22 0.10 0.03 -0.01
hires.rate -0.92 0.08 0.24 0.21 -0.16 0.06 0.00 0.05 0.09
unemployed.to.employed 0.89 0.17 0.21 -0.02 0.05 0.24 -0.25 -0.05 0.00
Layoff.rate..JOLT. 0.23 -0.86 0.19 -0.03 -0.40 0.09 0.03 -0.02 -0.05
Exhaustion.rate 0.95 0.19 0.14 0.14 0.00 -0.07 0.01 0.06 -0.04
Quits.rate..JOLT. -0.98 0.01 0.04 0.04 0.01 0.02 -0.06 0.10 0.13
participation.rate -0.67 -0.61 0.31 0.14 0.16 -0.01 -0.03 0.11 -0.08
insured.u.rate 0.88 -0.40 0.17 0.08 0.12 0.05 0.09 -0.03 0.02
Initial.jobless.claims 0.78 -0.60 0.04 -0.06 0.06 0.05 0.07 0.02 0.07
Continuing.claims 0.86 -0.44 0.15 0.06 0.14 0.08 0.09 -0.05 0.03
Jobs.plentiful.jobs.hardtoget -0.98 0.00 -0.02 0.01 0.08 0.13 0.04 -0.02 -0.04
vacancy.unempl.ratio -0.97 0.04 -0.05 -0.03 0.08 0.18 0.07 0.03 -0.03
PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18
payroll.chg -0.06 0.02 -0.02 0.00 0.03 0.00 0.00 0.00 0.00
HH.empl.chg 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
pop.empl.ratio -0.02 0.00 -0.01 0.01 0.00 0.00 0.00 0.01 0.01
u.rate -0.01 0.00 0.03 -0.03 0.02 0.00 0.00 -0.01 -0.01
median.duration.unempl 0.02 0.05 -0.06 -0.01 -0.03 0.01 -0.02 0.00 0.00
LT.unempl.unempl.ratio 0.01 0.02 -0.01 0.02 0.00 0.00 0.05 0.00 0.00
U4 -0.01 0.00 0.04 -0.02 0.02 0.00 -0.01 -0.01 0.01
U6 -0.01 0.01 0.03 -0.03 0.02 -0.02 0.00 0.03 0.00
vacancy.rate -0.08 -0.06 0.01 0.01 -0.01 0.04 0.00 0.00 0.00
hires.rate 0.01 0.00 0.04 0.00 -0.06 -0.01 0.00 0.00 0.00
unemployed.to.employed -0.01 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00
Layoff.rate..JOLT. 0.01 0.00 -0.01 -0.01 0.03 0.00 0.00 0.00 0.00
Exhaustion.rate 0.05 -0.07 0.02 0.06 0.01 -0.01 -0.02 0.00 0.00
Quits.rate..JOLT. 0.04 -0.01 -0.04 0.00 0.05 0.02 0.00 0.00 0.00
participation.rate -0.06 0.00 0.02 -0.02 0.01 0.01 0.01 0.00 0.00
insured.u.rate 0.04 -0.02 -0.02 0.00 -0.02 0.02 0.01 0.00 0.02
Initial.jobless.claims -0.09 0.06 0.00 0.06 0.01 -0.01 -0.01 0.00 0.00
Continuing.claims 0.05 -0.02 -0.02 -0.02 -0.01 0.01 0.01 0.01 -0.02
Jobs.plentiful.jobs.hardtoget 0.11 0.07 0.05 0.02 0.01 0.02 0.00 0.00 0.00
vacancy.unempl.ratio 0.03 -0.01 -0.03 0.00 0.01 -0.06 0.00 0.00 0.00
PC19 PC20 h2 u2
payroll.chg 0.00 0.00 1 5.6e-16
HH.empl.chg 0.00 0.00 1 -2.9e-15
pop.empl.ratio 0.01 0.01 1 -1.6e-15
u.rate -0.01 0.01 1 1.1e-16
median.duration.unempl 0.00 0.00 1 -4.4e-16
LT.unempl.unempl.ratio 0.00 0.00 1 -6.7e-16
U4 0.01 0.00 1 -4.4e-16
U6 0.00 0.00 1 2.2e-16
vacancy.rate 0.00 0.00 1 0.0e+00
hires.rate 0.00 0.00 1 4.4e-16
unemployed.to.employed 0.00 0.00 1 -2.2e-16
Layoff.rate..JOLT. 0.00 0.00 1 -2.2e-15
Exhaustion.rate 0.00 0.00 1 -4.4e-16
Quits.rate..JOLT. 0.00 0.00 1 1.1e-16
participation.rate 0.00 -0.01 1 5.6e-16
insured.u.rate -0.01 0.00 1 -6.7e-16
Initial.jobless.claims 0.00 0.00 1 -2.0e-15
Continuing.claims 0.01 0.00 1 -6.7e-16
Jobs.plentiful.jobs.hardtoget 0.00 0.00 1 2.2e-16
vacancy.unempl.ratio 0.00 0.00 1 -2.2e-16
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12
SS loadings 14.23 3.73 0.83 0.37 0.28 0.20 0.12 0.07 0.05 0.05 0.02 0.02
Proportion Var 0.71 0.19 0.04 0.02 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00
Cumulative Var 0.71 0.90 0.94 0.96 0.97 0.98 0.99 0.99 0.99 1.00 1.00 1.00
Proportion Explained 0.71 0.19 0.04 0.02 0.01 0.01 0.01 0.00 0.00 0.00 0.00 0.00
Cumulative Proportion 0.71 0.90 0.94 0.96 0.97 0.98 0.99 0.99 0.99 1.00 1.00 1.00
PC13 PC14 PC15 PC16 PC17 PC18 PC19 PC20
SS loadings 0.01 0.01 0.01 0 0 0 0 0
Proportion Var 0.00 0.00 0.00 0 0 0 0 0
Cumulative Var 1.00 1.00 1.00 1 1 1 1 1
Proportion Explained 0.00 0.00 0.00 0 0 0 0 0
Cumulative Proportion 1.00 1.00 1.00 1 1 1 1 1
Test of the hypothesis that 20 components are sufficient.
The degrees of freedom for the null model are 190 and the objective function was 68.46
The degrees of freedom for the model are -20 and the objective function was 0
Fit based upon off diagonal values = 1
To find the component scores you can skip the step in which you are finding the correlations; principal will do that for you. Then you can skip the step Hong Ooi suggested and just find the scores directly. They should be orthogonal.
Using your example:
pca.results <- principal(data,nfactors=20,rotate='none')
#then correlate the scores
cor(pca.results$scores) #these should be orthogonal
Bill
What you've got there are not the PCA scores, but the PCA loadings. To get the scores, use the predict method on your model. You should find that the predicted scores are indeed uncorrelated with each other.
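A minimal sketch of that suggestion, reusing the objects from the question and assuming data is the raw numeric data behind corMat:
library(psych)
pca.results <- principal(r = corMat, nfactors = 20, rotate = "none")
scores <- predict(pca.results, data)    # component scores for the original observations
round(cor(scores[, 1:2]), 2)            # should be ~0 off the diagonal, unlike the loadings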