Find all points on a plane - r

I am trying to get all points on a 2D plane in the range (0..10, 0..10) with a step of 0.5. I would like to store these values in a data frame like this:
x y
1 1 1.5
2 0 0.5
3 4 2.0
I am considering using a loop to start from 0.0 for the x column and fill the y column such that I get something like this:
x y
1 0 0
2 0 0.5
3 0 1
and so on up to 10, then increment x by 0.5 and repeat, and so on for every x value. Is there a more efficient way of doing this in R?

Is this what you want?
expand.grid(x=seq(0,10,by=0.5),y=seq(0,10,by=0.5))
x y
1 0.0 0.0
2 0.5 0.0
3 1.0 0.0
4 1.5 0.0
5 2.0 0.0
6 2.5 0.0
7 3.0 0.0
8 3.5 0.0
9 4.0 0.0
10 4.5 0.0
11 5.0 0.0
12 5.5 0.0
13 6.0 0.0
14 6.5 0.0
15 7.0 0.0
16 7.5 0.0
17 8.0 0.0
18 8.5 0.0
19 9.0 0.0
20 9.5 0.0
21 10.0 0.0
22 0.0 0.5
23 0.5 0.5
24 1.0 0.5
25 1.5 0.5
26 2.0 0.5
27 2.5 0.5
28 3.0 0.5
29 3.5 0.5
30 4.0 0.5
...
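Note that expand.grid varies its first argument fastest, so the output above steps through x before y, while the question holds x fixed and steps through y. A minimal sketch, assuming the question's layout is the one wanted; the names pts and pts_by_x are only illustrative:

```r
# Build the full 21 x 21 grid; expand.grid varies the first argument fastest.
pts <- expand.grid(x = seq(0, 10, by = 0.5), y = seq(0, 10, by = 0.5))

# Reorder so that y steps through 0..10 for each fixed x, as in the question.
pts_by_x <- pts[order(pts$x, pts$y), ]
rownames(pts_by_x) <- NULL  # renumber the rows after sorting

nrow(pts_by_x)     # 441 rows (21 * 21)
head(pts_by_x, 3)  # x = 0 with y = 0, 0.5, 1
```

Swapping the argument order in expand.grid and then reordering the columns would give the same layout without sorting.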

Related

Time series in R: how to change y-axis?

New R user here, working with meteorological data (the data frame is called "Stations"). I am trying to plot 3 time series with temperature on the y-axis and a regression line on each one, but I have run into a few problems, and there are no error messages.
The loop doesn't seem to be working and I can't figure out why.
I didn't manage to label the x-axis with years ("Année" in the data frame) instead of an index number.
The title is the same for the 3 plots. How do I change it so each plot has its own title?
The regression line is not shown on the graph.
Thanks in advance!
Here is my code:
for (i in c(6,8,10))
plot(ts(Stations[,i]), col="dodgerblue4", xlab="Temps", ylab="Température", main="Genève")
for (i in c(6,8,10))
abline(h=Stations[,i])
Nb.enr time Année Mois Jour T2m_GE pcp_GE T2m_PU pcp_PU T2m_NY
1 19810101 1981 1 1 1.3 0.3 2.8 0.0 2.3
2 19810102 1981 1 2 1.2 0.1 2.3 1.2 1.6
3 19810103 1981 1 3 4.1 21.8 4.9 5.2 3.8
4 19810104 1981 1 4 5.1 10.3 5.1 17.4 4.9
5 19810105 1981 1 5 0.9 0.0 1.0 0.1 0.8
6 19810106 1981 1 6 0.5 5.7 0.7 6.0 0.5
7 19810107 1981 1 7 -2.7 0.0 -2.1 0.1 -1.9
8 19810108 1981 1 8 -3.2 0.0 -4.1 0.0 -3.8
9 19810109 1981 1 9 -5.2 0.0 -3.5 0.0 -5.1
10 19810110 1981 1 10 -3.1 10.6 -0.9 6.0 -2.6
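One possible way to address the listed problems, offered as a sketch rather than a definitive answer. Each call to plot() replaces the previous plot unless the device is split into panels, and abline(h = ...) draws horizontal lines rather than a fitted line. The sketch assumes only the ten rows shown above, uses the column names as titles because the station names are not given, and fits a simple linear trend against the date; the names dates, temp_cols and fits are illustrative:

```r
# First ten rows of the temperature columns shown above (columns 6, 8 and 10)
Stations <- data.frame(
  time   = 19810101:19810110,
  T2m_GE = c(1.3, 1.2, 4.1, 5.1, 0.9, 0.5, -2.7, -3.2, -5.2, -3.1),
  T2m_PU = c(2.8, 2.3, 4.9, 5.1, 1.0, 0.7, -2.1, -4.1, -3.5, -0.9),
  T2m_NY = c(2.3, 1.6, 3.8, 4.9, 0.8, 0.5, -1.9, -3.8, -5.1, -2.6)
)

# Parse the yyyymmdd integers so the x-axis shows dates instead of an index
dates <- as.Date(as.character(Stations$time), format = "%Y%m%d")

temp_cols <- c("T2m_GE", "T2m_PU", "T2m_NY")
fits <- list()

par(mfrow = c(3, 1))  # three panels, so later plots do not replace earlier ones
for (col in temp_cols) {
  plot(dates, Stations[[col]], type = "l", col = "dodgerblue4",
       xlab = "Temps", ylab = "Température",
       main = col)  # a different title for each plot
  # Linear trend of temperature against time (in days)
  fits[[col]] <- lm(Stations[[col]] ~ as.numeric(dates))
  abline(fits[[col]], col = "red")
}
```

The braces after for are what keep plot(), lm() and abline() together in the same iteration.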

Can't get data for all combinations

I have 31 possible scenarios for the mean, and I would like to generate a sample for each scenario, but I do not get as many samples as I expect.
These are the scenarios:
> scenari<-s*shift;scenari
Var1 Var2 Var3 Var4 Var5
2 1.5 0.0 0.0 0.0 0.0
3 0.0 1.5 0.0 0.0 0.0
4 1.5 1.5 0.0 0.0 0.0
5 0.0 0.0 1.5 0.0 0.0
6 1.5 0.0 1.5 0.0 0.0
7 0.0 1.5 1.5 0.0 0.0
8 1.5 1.5 1.5 0.0 0.0
9 0.0 0.0 0.0 1.5 0.0
10 1.5 0.0 0.0 1.5 0.0
11 0.0 1.5 0.0 1.5 0.0
12 1.5 1.5 0.0 1.5 0.0
13 0.0 0.0 1.5 1.5 0.0
14 1.5 0.0 1.5 1.5 0.0
15 0.0 1.5 1.5 1.5 0.0
16 1.5 1.5 1.5 1.5 0.0
17 0.0 0.0 0.0 0.0 1.5
18 1.5 0.0 0.0 0.0 1.5
19 0.0 1.5 0.0 0.0 1.5
20 1.5 1.5 0.0 0.0 1.5
21 0.0 0.0 1.5 0.0 1.5
22 1.5 0.0 1.5 0.0 1.5
23 0.0 1.5 1.5 0.0 1.5
24 1.5 1.5 1.5 0.0 1.5
25 0.0 0.0 0.0 1.5 1.5
26 1.5 0.0 0.0 1.5 1.5
27 0.0 1.5 0.0 1.5 1.5
28 1.5 1.5 0.0 1.5 1.5
29 0.0 0.0 1.5 1.5 1.5
30 1.5 0.0 1.5 1.5 1.5
31 0.0 1.5 1.5 1.5 1.5
32 1.5 1.5 1.5 1.5 1.5
and this is the function:
genereting_fuction<-function(n){
for (i in 1:length(scenari)){
X1=rnorm(n)+scenari[i,1]
X4=rnorm(n)+scenari[i,4]
X2=X1*p12+std_e2*rnorm(n)+scenari[i,2]
X3=X1*p13+X4*p43+std_e3*rnorm(n)+scenari[i,3]
X5=X2*p25+X3*p35+std_e5*rnorm(n)+scenari[i,5]
sample=cbind(X1,X2,X3,X4,X5)
return(sample)
}
}
genereting_fuction(10)
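I should get 31 samples of size 10x5, but I get only one sample.
The return() call is inside the for loop, so the function exits during the first iteration and returns only the sample for the first scenario. In addition, length(scenari) does not give the number of scenarios (for a data frame it is the number of columns); nrow(scenari) does.
Try this: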
genereting_fuction<-function(n){
sample <- list()
for (i in 1:nrow(scenari)){
X1=rnorm(n)+scenari[i,1]
X4=rnorm(n)+scenari[i,4]
X2=X1*p12+std_e2*rnorm(n)+scenari[i,2]
X3=X1*p13+X4*p43+std_e3*rnorm(n)+scenari[i,3]
X5=X2*p25+X3*p35+std_e5*rnorm(n)+scenari[i,5]
sample[[i]]=cbind(X1,X2,X3,X4,X5)
}
sample
}
The output will be a list and its ith element will be a sample corresponding to the ith scenario.
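The snippets above rely on objects that are not shown (s, shift, p12, p13, p43, p25, p35, std_e2, std_e3, std_e5). A minimal self-contained sketch, assuming made-up values for all of them and assuming the scenarios come from expand.grid (which would match the Var1..Var5 column names and row names 2..32). It uses lapply in place of the explicit loop, and generate_samples is an illustrative name:

```r
set.seed(1)

# Assumption: all 0/1 combinations of five variables, minus the all-zero row
shift   <- 1.5
s       <- expand.grid(rep(list(0:1), 5))[-1, ]
scenari <- s * shift

# Hypothetical path coefficients and error standard deviations
p12 <- 0.5; p13 <- 0.3; p43 <- 0.4; p25 <- 0.2; p35 <- 0.6
std_e2 <- 1; std_e3 <- 1; std_e5 <- 1

generate_samples <- function(n) {
  # One n x 5 matrix per scenario row, collected in a list
  lapply(seq_len(nrow(scenari)), function(i) {
    X1 <- rnorm(n) + scenari[i, 1]
    X4 <- rnorm(n) + scenari[i, 4]
    X2 <- X1 * p12 + std_e2 * rnorm(n) + scenari[i, 2]
    X3 <- X1 * p13 + X4 * p43 + std_e3 * rnorm(n) + scenari[i, 3]
    X5 <- X2 * p25 + X3 * p35 + std_e5 * rnorm(n) + scenari[i, 5]
    cbind(X1, X2, X3, X4, X5)
  })
}

samples <- generate_samples(10)
length(samples)    # one element per scenario
dim(samples[[1]])  # n rows, 5 columns
```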

Aggregate/sum and N/A values

I have a problem with the way aggregate() handles NA values when computing sums.
I would like the sums per area.code from the following table:
test <- read.table(text = "
area.code A B C D
1 0 NA 0.00 NA NA
2 1 0.0 3.10 9.6 0.0
3 1 0.0 3.20 6.0 0.0
4 2 0.0 6.10 5.0 0.0
5 2 0.0 6.50 8.0 0.0
6 2 0.0 6.90 4.0 3.1
7 3 0.0 6.70 3.0 3.2
8 3 0.0 6.80 3.1 6.1
9 3 0.0 0.35 3.2 6.5
10 3 0.0 0.67 6.1 6.9
11 4 0.0 0.25 6.5 6.7
12 5 0.0 0.68 6.9 6.8
13 6 0.0 0.95 6.7 0.0
14 7 1.2 NA 6.8 0.0
")
So, seems pretty easy:
aggregate(.~area.code, test, sum)
area.code A B C D
1 1 0 6.30 15.6 0.0
2 2 0 19.50 17.0 3.1
3 3 0 14.52 15.4 22.7
4 4 0 0.25 6.5 6.7
5 5 0 0.68 6.9 6.8
6 6 0 0.95 6.7 0.0
Apparently it is not so simple, because area code 7 is completely omitted from the aggregate() output.
However, I would like the NAs to be either ignored or treated as zero. Which na argument gives that option?
Replacing all NAs with 0 is an option if I just want the sum, but then the mean becomes problematic (since it can no longer differentiate between 0 and NA).
If you are willing to consider an external package (data.table):
library(data.table)
setDT(test)
test[, lapply(.SD, sum), area.code]
area.code A B C D
1: 0 NA 0.00 NA NA
2: 1 0.0 6.30 15.6 0.0
3: 2 0.0 19.50 17.0 3.1
4: 3 0.0 14.52 15.4 22.7
5: 4 0.0 0.25 6.5 6.7
6: 5 0.0 0.68 6.9 6.8
7: 6 0.0 0.95 6.7 0.0
8: 7 1.2 NA 6.8 0.0
One option is to create a function that returns NA when all the values are NA and otherwise uses sum. Along with that, set the na.action argument of aggregate to na.pass, because by default aggregate drops any row that contains at least one NA.
f1 <- function(x) if(all(is.na(x))) NA else sum(x, na.rm = TRUE)
aggregate(.~area.code, test, f1, na.action = na.pass)
# area.code A B C D
#1 0 NA 0.00 NA NA
#2 1 0.0 6.30 15.6 0.0
#3 2 0.0 19.50 17.0 3.1
#4 3 0.0 14.52 15.4 22.7
#5 4 0.0 0.25 6.5 6.7
#6 5 0.0 0.68 6.9 6.8
#7 6 0.0 0.95 6.7 0.0
#8 7 1.2 NA 6.8 0.0
Note that when all the elements are NA, sum with na.rm = TRUE returns 0, which is why f1 checks for that case first:
sum(c(NA, NA), na.rm = TRUE)
#[1] 0
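Another solution is to use dplyr (note that with na.rm = TRUE, a group whose values are all NA sums to 0 rather than NA):
library(dplyr)
test %>%
group_by(area.code) %>%
summarise_all(sum, na.rm = TRUE)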
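The question also mentions the mean. As a minimal sketch in base R, the same na.action = na.pass idea carries over: NAs are skipped within each group, and a group with nothing but NA yields NaN instead of being confused with 0. The small data frame is a cut-down, illustrative version of test:

```r
# Cut-down version of the table: area codes 0 and 7 contain NA values
test <- data.frame(
  area.code = c(0, 1, 1, 7),
  A = c(NA, 0, 0, 1.2),
  B = c(0, 3.1, 3.2, NA)
)

# na.pass keeps rows with NA; na.rm = TRUE skips the NAs inside each group
res <- aggregate(. ~ area.code, test,
                 function(x) mean(x, na.rm = TRUE),
                 na.action = na.pass)
res
```

Groups whose values are all NA come back as NaN, which can be converted to NA afterwards if preferred.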

Runge-Kutta 2nd order, wave transformation, solve an equation of retention*

I have a problem solving the retention equation with the Runge-Kutta method (2nd order) in Scilab.
I need to solve the equation:
dh/dt=(InF(t)-OutF(t))/F(h), or this equation in picture (link)
where:
h - height [m];
t - time [sec];
InF- inflow to reservoir [m^3/sec];
OutF - outflow from reservoir [m^3/sec];
F - area [m^2];
but I don't have OutF(t); I have OutF(h), and I need to obtain a chart of the inflow-outflow wave transformation.
I don't know how to put this into Scilab code without errors.
Example code (2nd-order Runge-Kutta method) for a function of (x,y):
function [y,x]=ExplicitRungeKutta(y0,x0,f,dt,N)
y=zeros(N,1)*%nan;
x=zeros(N,1)*%nan;
y(1)=y0;
x(1)=x0;
for i=2:1:N do
k1=f(x(i-1),y(i-1));
k2=f(x(i-1)+3*dt./4,y(i-1)+3*k1*dt./4);
y(i)=y(i-1)+(k1./3+2*k2./3)*dt;
x(i)=x(i-1)+dt;
end
endfunction
function dy=F(x,y)
dy=-x./(6*y-2*(x.^2)./y);
endfunction
y0=1;
x0=1;
xN=1.34;
N=25;
dt=(xN-x0)/N;
[y,x]=ExplicitRungeKutta(y0,x0,F,dt,N);
plot2d(x,y)
This program works well, but my case is harder to implement in Scilab.
This is the code of the program that produces errors:
clear
clc
WYS=[0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 2.25 2.5 2.75 3 3.25 3.5 3.75 4 4.25 4.5 4.75 5 5.25 5.5 5.75 6];
FZ=[7 12.02 22.00 56.03 170.07 346.36 492.35 655.67 820.95 969.25 1164.35 1461.50 1885.19 2429.21 3040.32 3617.32 4155.82 4605.32 5018.49 5448.25 5801.89 6140.23 6414.84 6710.24 7013.57];
//wys - height in metres, Fz - area in m^2
TIME=[0 900 1800 2700 3600 4500 5400 6300 7200 8100 9000 9900 10800 11700 12600 13500 14400 15300 16200 17100 18000 18900 19800 20700 21600 22500 23400 24300 25200 26100 27000 27900 28800 29700 30600 31500 32400 33300 34200 35100 36000 36900 37800 38700 39600 40500 41400 42300 43200 44100 45000 45900 46800 47700 48600 49500 50400 51300 52200 53100 54000 54900 55800 56700 57600 58500 59400 60300 61200 62100 63000 63900 64800 65700 66600 67500 68400 69300 70200 71100 72000 72900 73800 74700 75600 76500 77400 78300 79200 80100 81000 81900 82800 83700 84600 85500 86400 87300 88200 89100 90000 90900 91800 92700 93600 94500 95400 96300 97200 98100 99000 99900 100800 101700 102600 103500 104400 105300 106200 107100 108000 108900 109800 110700 111600 112500 113400 114300 115200 116100 117000 117900 118800 119700 120600 121500 122400 123300 124200 125100 126000 126900 127800 128700 129600 130500 131400 132300 133200 134100 135000 135900 136800 137700 138600 139500 140400 141300 142200 143100 144000 144900 145800 146700 147600 148500 149400 150300 151200 152100 153000 153900 154800 155700 156600 157500 158400 159300 160200 161100 162000 162900 163800 164700 165600 166500 167400 168300 169200 170100 171000 171900 172800]
R10=[0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.2 0.3 0.3 0.5 0.8 1.3 1.9 2.6 3.3 4.1 4.9 5.7 6.4 7.1 7.7 8.4 9.0 9.5 10.1 10.6 11.0 11.5 11.9 11.7 11.1 10.2 9.0 7.8 6.7 5.8 5.2 4.7 4.4 4.2 4.0 3.9 3.9 3.8 3.8 3.8 3.8 3.8 3.8 3.8 3.8 3.8 3.8 3.8 3.8 3.8 3.9 3.9 3.9 3.9 3.9 3.9 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.2 3.9 3.6 3.1 2.6 2.0 1.5 1.1 0.8 0.6 0.4 0.3 0.2 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0]
// n=193 //inflow to reservoir, time in seconds, R10 - 10-year rain, chart of inflow to reservoir
WYS1=[0 0.35 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.5 1.6 1.7 1.8 1.9 2 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3 3.1 3.2 3.3 3.4 3.5 4.3 4.5 4.6 4.7 4.8 4.9 5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6]
oF1=[0 0 0.05 0.15 0.4 0.66 1.1 1.5 2 2.2 2.42 3 3.4 3.8 4.2 4.52 4.84 5.12 5.42 5.65 5.9 6.14 6.37 6.62 6.84 7.04 7.24 7.45 7.65 7.84 8 9.4 9.75 11.11 13.452 16.435 19.93 23.875 28.2 32.915 37.96 41.955 45.95 49.945 53.94 57.935 61.93 65.925]
//outflow from reservoir, oF1 - outflow in m3/sec
//y=h,x=t, inf(t=x)=u outf(h=y)=w, Fz(h=y)=z
function [fz,w,u,h,t]=ExplicitRungeKutta(fz0,w0,u0,h0,t0,f,dt,N)
h=zeros(N,1)*%nan;
t=zeros(N,1)*%nan;
h(1)=h0;
t(1)=t0;
u(1)=u0;
w(1)=w0;
fz(1)=fz0;
for i=2:1:N do
k1=f(t(i-1),h(i-1));
k2=f(t(i-1)+3*dt./4,h(i-1)+3*k1*dt./4);
h(i)=h(i-1)+(k1./3+2*k2./3)*dt;
t(i)=t(i-1)+dt;
u(i)=interp1(TIME,R10,t(i),'linear');
w(i)=interp1(WYS,OF1,h(i),'linear');
fz(i)=interp1(WYS,FZ,h(i),'linear');
end
endfunction
function dh=F(u,w,fz)
dh=(u-w)./fz;
endfunction
h0=0;
t0=0;
u0=0;
w0=0;
fz0=7;
tN=172800;
N=17280;
dt=(tN-t0)/N;
[u,w,fz,h,t]=ExplicitRungeKutta(u0,w0,fz0,h0,t0,F,dt,N);
Does anybody have an idea how to solve it?
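One way to look at the problem, sketched here in R (the language used elsewhere on this page) rather than Scilab: because OutF is known as a function of h, it can be evaluated inside the derivative function by interpolating the tables, so the right-hand side becomes an ordinary f(t, h) and the two-argument Runge-Kutta routine from the working example applies unchanged. This is an illustration under assumptions, not a tested port of the program above: the stage tables are coarse subsets of WYS/FZ and WYS1/oF1, the inflow hydrograph is made up, and all function names are illustrative.

```r
# Coarse subsets of the question's tables (h in m, area in m^2, outflow in m^3/s).
# rule = 2 holds the end values outside the tabulated range instead of returning NA.
area_of_h    <- approxfun(c(0, 1, 2, 3, 4, 5, 6),
                          c(7, 170.07, 820.95, 1885.19, 4155.82, 5801.89, 7013.57),
                          rule = 2)
outflow_of_h <- approxfun(c(0, 0.5, 1, 2, 3, 4.5, 6),
                          c(0, 0.15, 2, 4.52, 7.04, 9.75, 65.925),
                          rule = 2)

# Hypothetical inflow hydrograph (t in s, inflow in m^3/s), not the R10 series
inflow_of_t  <- approxfun(c(0, 21600, 43200, 86400, 172800),
                          c(0, 0.1, 11.9, 4.2, 0),
                          rule = 2)

# Right-hand side dh/dt = (InF(t) - OutF(h)) / F(h), a function of t and h only
dhdt <- function(t, h) (inflow_of_t(t) - outflow_of_h(h)) / area_of_h(h)

# Same two-stage scheme as the working example: second stage at 3/4 of the step,
# stages combined with weights 1/3 and 2/3
rk2 <- function(f, h0, t0, dt, N) {
  h <- numeric(N)
  t <- numeric(N)
  h[1] <- h0
  t[1] <- t0
  for (i in 2:N) {
    k1 <- f(t[i - 1], h[i - 1])
    k2 <- f(t[i - 1] + 3 * dt / 4, h[i - 1] + 3 * k1 * dt / 4)
    h[i] <- h[i - 1] + (k1 / 3 + 2 * k2 / 3) * dt
    t[i] <- t[i - 1] + dt
  }
  data.frame(t = t, h = h)
}

N   <- 17280
res <- rk2(dhdt, h0 = 0, t0 = 0, dt = 172800 / N, N = N)

# Inflow and outflow along the solution, for the wave transformation chart
res$inflow  <- inflow_of_t(res$t)
res$outflow <- outflow_of_h(res$h)
```

Plotting res$inflow and res$outflow against res$t would give the two hydrographs. Note also that in the question's code the outflow table is defined as oF1 against WYS1 but is interpolated as OF1 against WYS.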

Distance & cluster with dynamic time warping

I am using dtw to calculate distances between several series and getting strange results. Notice that in the sample data below the first 9 customers are identical sets (A==B==C, D==E==F, and G==H==I). The remaining rows are only for noise to allow me to make 8 clusters.
I expect that the first sets would be clustered with their identical partners. This happens when I calculate distance on the original data, but when I scale the data before distance/clustering I get different results.
The distances between identical rows in the original data are 0.0 (as expected), but with scaled data the distances are not 0.0 (not even close). Any ideas why they are not the same?
library(TSdist)
library(dplyr)
library(tidyr)
mydata = as_data_frame(read.table(textConnection("
cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
1 A 1.1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
2 B 1.1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
3 C 1.1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
4 D 0.0 1.0 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0
5 E 0.0 1.0 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0
6 F 0.0 1.0 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0
7 G 2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 1.5
8 H 2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 1.5
9 I 2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 1.5
10 D2 1.0 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0 2.0
11 E2 5.0 6.0 5.0 4.0 5.0 6.0 5.0 4.0 5.0 6.0
12 F2 9.0 10.0 9.0 8.0 9.0 10.0 9.0 8.0 9.0 10.0
13 G2 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 1.5 1.0
14 H2 5.5 5.0 4.5 4.0 4.5 5.0 5.5 6.0 5.5 5.0
15 I2 9.5 9.0 8.5 8.0 8.5 9.0 9.5 10.0 9.5 9.0
16 A3 1.0 1.0 0.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0
17 B3 5.0 5.0 5.0 5.0 5.0 3.0 8.0 5.0 5.0 5.0
18 C3 9.0 9.0 9.0 9.0 9.0 5.4 14.4 9.0 9.0 9.0
19 D3 0.0 1.0 2.0 1.0 0.0 1.0 1.0 2.0 0.0 1.0
20 E3 4.0 5.0 5.0 6.0 4.0 5.0 6.0 5.0 4.0 5.0
21 F3 8.0 9.0 10.0 9.0 9.0 9.0 9.0 9.0 8.0 9.0
22 G3 2.0 1.5 1.0 0.5 0.0 0.5 1.0 2.0 1.5 1.5
23 H3 6.0 5.5 5.0 4.5 4.0 5.0 4.5 5.5 6.0 5.5
24 I3 10.0 9.5 9.0 9.0 8.0 8.5 9.0 9.5 10.0 9.5
25 D4 0.0 3.0 6.0 3.0 0.0 3.0 6.0 3.0 0.0 5.0
26 E4 3.0 6.0 9.0 6.0 3.0 6.0 9.0 6.0 3.0 6.0
27 F4 4.0 6.0 10.0 7.0 5.0 6.0 11.0 8.0 5.0 7.0
28 D5 5.0 0.0 3.0 6.0 3.0 0.0 3.0 6.0 3.0 0.0
29 D6 9.0 6.0 3.0 6.0 9.0 6.0 3.0 6.0 9.0 6.0
30 D7 9.0 11.0 5.0 4.0 6.0 10.0 7.0 5.0 6.0 11.0
31 Dw 0.0 0.8 1.4 2.0 1.0 0.0 2.0 0.0 1.0 2.0
32 Ew 4.0 4.8 5.4 6.0 5.0 4.0 6.0 4.0 5.0 6.0
33 Fw 8.0 8.8 9.4 10.0 9.0 8.0 10.0 8.0 9.0 10.0
34 Gw 2.0 1.5 1.0 0.5 0.0 1.0 2.0 1.5 1.3 1.1
35 Hw 6.0 5.5 5.0 4.5 4.0 5.0 6.0 5.5 5.3 5.1
36 Iw 10.0 9.5 9.0 8.5 8.0 9.0 10.0 9.5 9.3 9.1"),
header = TRUE, stringsAsFactors = FALSE))
k=8
# create a scaled version of mydata: (raw data - mean) / std dev
mydata_long = mydata %>%
mutate (mean = apply(mydata[,2:ncol(mydata)],1,mean,na.rm = T)) %>%
mutate (sd = apply(mydata[,2:(ncol(mydata))],1,sd,na.rm = T))%>%
gather (period,value,-cust,-mean,-sd) %>%
mutate (sc = (value-mean)/sd)
mydata_sc = mydata_long[,-c(2,3,5)] %>%
spread(period,sc)
# dtw
dtw_dist = TSDatabaseDistances(mydata[2:ncol(mydata)], distance = "dtw",lag.max= 2) #distance
dtw_clus = hclust(dtw_dist, method="ward.D2") # Cluster
dtw_res = data.frame(cutree(dtw_clus, k)) # cut dendrogram into k = 8 clusters
# dtw (w scaled data)
dtw_sc_dist = TSDatabaseDistances(mydata_sc[2:ncol(mydata_sc)], distance = "dtw",lag.max= 2) #distance
dtw_sc_clus = hclust(dtw_sc_dist, method="ward.D2") # Cluster
dtw_sc_res = data.frame(cutree(dtw_sc_clus, k)) # cut dendrogram into k = 8 clusters
results = cbind (dtw_res,dtw_sc_res)
names(results) = c("dtw", "dtw_scaled")
print(results)
dtw dtw_scaled
1 1 1
2 1 2
3 1 1
4 1 2
5 1 1
6 1 2
7 1 3
8 1 4
9 1 3
10 1 3
11 2 3
12 3 4
13 1 5
14 2 6
15 3 3
16 1 4
17 2 3
18 4 3
19 1 6
20 2 3
21 3 4
22 1 3
23 2 3
24 3 6
25 5 7
26 6 8
27 7 7
28 5 7
29 6 7
30 8 8
31 1 7
32 2 7
33 3 7
34 1 8
35 2 7
36 3 7
A couple of issues:
You are scaling rowwise, not columnwise (take a look at the intermediate results of your dplyr chain -- do they make sense?)
The data manipulations you used to produce the scaled data changed the row ordering of your data frame to alphabetical:
> mydata_sc %>% head
cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
(chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 A 2.84604989 -0.31622777 -0.31622777 -0.31622777 -0.31622777 -0.3162278 -0.3162278 -0.31622777 -0.31622777 -0.31622777
2 A3 0.00000000 0.00000000 -2.12132034 2.12132034 0.00000000 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000
3 B 2.84604989 -0.31622777 -0.31622777 -0.31622777 -0.31622777 -0.3162278 -0.3162278 -0.31622777 -0.31622777 -0.31622777
vs.
> mydata %>% head
Source: local data frame [6 x 11]
cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
(chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 A 1.1 1 1 1 1 1 1 1 1 1
2 B 1.1 1 1 1 1 1 1 1 1 1
(check the cust variable ordering!)
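Both points can be illustrated in base R. The sketch below uses a small made-up matrix (not the question's data) with two identical rows. It shows that identical rows remain identical after row-wise scaling, so the scaling alone cannot produce non-zero distances between them, and that alphabetical reordering changes which position each customer occupies:

```r
# Small made-up example: rows A and B are identical, A3 is different
m <- rbind(A  = c(1.1, 1, 1, 1),
           B  = c(1.1, 1, 1, 1),
           A3 = c(1.0, 0, 2, 3))

row_scaled <- t(scale(t(m)))  # row-wise: each row gets mean 0 and sd 1
col_scaled <- scale(m)        # column-wise: what scale() does by default

# Identical rows are still identical after row-wise scaling
all.equal(unname(row_scaled["A", ]), unname(row_scaled["B", ]))

# Alphabetical ordering puts A3 between A and B, so positions 1..3
# no longer refer to the same customers as in the original data
reordered <- row_scaled[order(rownames(row_scaled)), ]
rownames(reordered)
```

Any comparison made by row position after such a reordering compares different customers.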
Here's my approach, and how I think you can avoid similar mistakes in the future:
Scale with the built-in scale function:
mydata_sc <- mydata %>% select(-cust) %>% scale %>% as.data.frame %>% cbind(cust =mydata$cust,.) %>% as.tbl
Assert that your scaled data frame is equivalent to a scaled version of your original data frame:
> (scale(mydata_sc %>% select(-cust)) - scale(mydata %>% select(-cust))) %>% colSums %>% sum
[1] 0.000000000000005353357
Create a single function to perform your desired manipulations:
return_dtw <- function(df) {
res_2 = TSDatabaseDistances(df[2:ncol(df)],distance="dtw",lag.max=2) %>%
hclust(.,method="ward.D2")
return(data.frame(cutree(res_2,k)))
}
Execute the function:
> mydata %>% return_dtw %>% cbind(mydata_sc %>% return_dtw)
cutree.res_2..k. cutree.res_2..k.
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
7 1 1
8 1 1
9 1 1
10 1 1
11 2 2
12 3 3
13 1 1
14 2 2
15 3 3
16 1 1
17 2 2
18 4 3
19 1 1
20 2 2
21 3 3
22 1 1
23 2 2
24 3 3
25 5 4
26 6 5
27 7 5
28 5 6
29 6 7
30 8 8
31 1 1
32 2 2
33 3 3
34 1 1
35 2 2
36 3 3
Some of the later customers are not grouped similarly, but that's for another question!
