Distance & cluster with dynamic time warping - r

I am using dtw to calculate distances between several series and getting strange results. Notice that in the sample data below the first 9 customers form three sets of identical series (A==B==C, D==E==F, and G==H==I). The remaining rows are only noise so that I can make 8 clusters.
I expect each of the first nine series to be clustered with its identical partners. This happens when I calculate distances on the original data, but when I scale the data before distance/clustering I get different results.
The distance between identical rows in the original data is 0.0 (as expected), but with scaled data the distance is not 0.0 (not even close). Any ideas why they are not the same?
library(TSdist)
library(dplyr)
library(tidyr)
mydata = as_data_frame(read.table(textConnection("
cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
1 A 1.1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
2 B 1.1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
3 C 1.1 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
4 D 0.0 1.0 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0
5 E 0.0 1.0 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0
6 F 0.0 1.0 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0
7 G 2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 1.5
8 H 2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 1.5
9 I 2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 1.5
10 D2 1.0 2.0 1.0 0.0 1.0 2.0 1.0 0.0 1.0 2.0
11 E2 5.0 6.0 5.0 4.0 5.0 6.0 5.0 4.0 5.0 6.0
12 F2 9.0 10.0 9.0 8.0 9.0 10.0 9.0 8.0 9.0 10.0
13 G2 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0 1.5 1.0
14 H2 5.5 5.0 4.5 4.0 4.5 5.0 5.5 6.0 5.5 5.0
15 I2 9.5 9.0 8.5 8.0 8.5 9.0 9.5 10.0 9.5 9.0
16 A3 1.0 1.0 0.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0
17 B3 5.0 5.0 5.0 5.0 5.0 3.0 8.0 5.0 5.0 5.0
18 C3 9.0 9.0 9.0 9.0 9.0 5.4 14.4 9.0 9.0 9.0
19 D3 0.0 1.0 2.0 1.0 0.0 1.0 1.0 2.0 0.0 1.0
20 E3 4.0 5.0 5.0 6.0 4.0 5.0 6.0 5.0 4.0 5.0
21 F3 8.0 9.0 10.0 9.0 9.0 9.0 9.0 9.0 8.0 9.0
22 G3 2.0 1.5 1.0 0.5 0.0 0.5 1.0 2.0 1.5 1.5
23 H3 6.0 5.5 5.0 4.5 4.0 5.0 4.5 5.5 6.0 5.5
24 I3 10.0 9.5 9.0 9.0 8.0 8.5 9.0 9.5 10.0 9.5
25 D4 0.0 3.0 6.0 3.0 0.0 3.0 6.0 3.0 0.0 5.0
26 E4 3.0 6.0 9.0 6.0 3.0 6.0 9.0 6.0 3.0 6.0
27 F4 4.0 6.0 10.0 7.0 5.0 6.0 11.0 8.0 5.0 7.0
28 D5 5.0 0.0 3.0 6.0 3.0 0.0 3.0 6.0 3.0 0.0
29 D6 9.0 6.0 3.0 6.0 9.0 6.0 3.0 6.0 9.0 6.0
30 D7 9.0 11.0 5.0 4.0 6.0 10.0 7.0 5.0 6.0 11.0
31 Dw 0.0 0.8 1.4 2.0 1.0 0.0 2.0 0.0 1.0 2.0
32 Ew 4.0 4.8 5.4 6.0 5.0 4.0 6.0 4.0 5.0 6.0
33 Fw 8.0 8.8 9.4 10.0 9.0 8.0 10.0 8.0 9.0 10.0
34 Gw 2.0 1.5 1.0 0.5 0.0 1.0 2.0 1.5 1.3 1.1
35 Hw 6.0 5.5 5.0 4.5 4.0 5.0 6.0 5.5 5.3 5.1
36 Iw 10.0 9.5 9.0 8.5 8.0 9.0 10.0 9.5 9.3 9.1"),
header = TRUE, stringsAsFactors = FALSE))
k=8
# create a scaled version of mydata: (raw data - mean) / std dev
mydata_long = mydata %>%
mutate (mean = apply(mydata[,2:ncol(mydata)],1,mean,na.rm = T)) %>%
mutate (sd = apply(mydata[,2:(ncol(mydata))],1,sd,na.rm = T))%>%
gather (period,value,-cust,-mean,-sd) %>%
mutate (sc = (value-mean)/sd)
mydata_sc = mydata_long[,-c(2,3,5)] %>%
spread(period,sc)
# dtw
dtw_dist = TSDatabaseDistances(mydata[2:ncol(mydata)], distance = "dtw",lag.max= 2) #distance
dtw_clus = hclust(dtw_dist, method="ward.D2") # Cluster
dtw_res = data.frame(cutree(dtw_clus, k)) # cut dendrogram into k (= 8) clusters
# dtw (w scaled data)
dtw_sc_dist = TSDatabaseDistances(mydata_sc[2:ncol(mydata_sc)], distance = "dtw",lag.max= 2) #distance
dtw_sc_clus = hclust(dtw_sc_dist, method="ward.D2") # Cluster
dtw_sc_res = data.frame(cutree(dtw_sc_clus, k)) # cut dendrogram into k (= 8) clusters
results = cbind (dtw_res,dtw_sc_res)
names(results) = c("dtw", "dtw_scaled")
print(results)
dtw dtw_scaled
1 1 1
2 1 2
3 1 1
4 1 2
5 1 1
6 1 2
7 1 3
8 1 4
9 1 3
10 1 3
11 2 3
12 3 4
13 1 5
14 2 6
15 3 3
16 1 4
17 2 3
18 4 3
19 1 6
20 2 3
21 3 4
22 1 3
23 2 3
24 3 6
25 5 7
26 6 8
27 7 7
28 5 7
29 6 7
30 8 8
31 1 7
32 2 7
33 3 7
34 1 8
35 2 7
36 3 7

A couple of issues:
You are scaling rowwise, not columnwise (take a look at the intermediate results of your dplyr chain -- do they make sense?)
The data manipulations you used to produce the scaled data changed the row ordering of your data frame to alphabetical:
> mydata_sc %>% head
cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
(chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 A 2.84604989 -0.31622777 -0.31622777 -0.31622777 -0.31622777 -0.3162278 -0.3162278 -0.31622777 -0.31622777 -0.31622777
2 A3 0.00000000 0.00000000 -2.12132034 2.12132034 0.00000000 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000
3 B 2.84604989 -0.31622777 -0.31622777 -0.31622777 -0.31622777 -0.3162278 -0.3162278 -0.31622777 -0.31622777 -0.31622777
vs.
> mydata %>% head
Source: local data frame [6 x 11]
cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
(chr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 A 1.1 1 1 1 1 1 1 1 1 1
2 B 1.1 1 1 1 1 1 1 1 1 1
(check the cust variable ordering!)
Here's my approach, and how I think you can avoid similar mistakes in the future:
Scale with the built-in scale function:
mydata_sc <- mydata %>% select(-cust) %>% scale %>% as.data.frame %>% cbind(cust =mydata$cust,.) %>% as.tbl
Assert that your scaled data frame is equivalent to a scaled version of your original data frame:
> (scale(mydata_sc %>% select(-cust)) - scale(mydata %>% select(-cust))) %>%
colSums %>% sum
[1] 0.000000000000005353357
Create one single function to perform your desired manipulations:
return_dtw <- function(df) {
res_2 = TSDatabaseDistances(df[2:ncol(df)],distance="dtw",lag.max=2) %>%
hclust(.,method="ward.D2")
return(data.frame(cutree(res_2,k)))
}
Execute the function on both data frames:
> mydata %>% return_dtw %>% cbind(mydata_sc %>% return_dtw)
cutree.res_2..k. cutree.res_2..k.
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
7 1 1
8 1 1
9 1 1
10 1 1
11 2 2
12 3 3
13 1 1
14 2 2
15 3 3
16 1 1
17 2 2
18 4 3
19 1 1
20 2 2
21 3 3
22 1 1
23 2 2
24 3 3
25 5 4
26 6 5
27 7 5
28 5 6
29 6 7
30 8 8
31 1 1
32 2 2
33 3 3
34 1 1
35 2 2
36 3 3
Some of the later customers are not grouped similarly, but that's for another question!
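Because the two sets of cluster labels are compared row by row, a cheap guard is to check that the scaled table still lists the customers in the original order. A minimal sketch using only base R; the `toy` data frame and its values are made up for illustration and are not the question's data set:

```r
# Toy data: customers A and B are identical, A3 differs (hypothetical values).
toy <- data.frame(
  cust = c("A", "B", "A3"),
  P1 = c(1.1, 1.1, 1.0),
  P2 = c(1.0, 1.0, 0.0),
  P3 = c(1.0, 1.0, 2.0),
  stringsAsFactors = FALSE
)

# Column-wise scaling with scale(); the row order is left untouched.
toy_sc <- data.frame(cust = toy$cust, scale(toy[-1]), stringsAsFactors = FALSE)

# Guard against silent reordering before comparing labels row by row.
stopifnot(identical(toy_sc$cust, toy$cust))

# Identical input rows must still be identical after scaling.
same_rows <- isTRUE(all.equal(unlist(toy_sc[1, -1]), unlist(toy_sc[2, -1])))

# Alphabetical ordering of the ids (what happened to mydata_sc in the question)
# puts A3 between A and B, so rows no longer line up with the original.
alphabetical <- sort(toy$cust)
```

Adding the same `stopifnot()` check right after any reshaping step would have flagged the reordering in the question before the clustering was run.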

Related

Times series in R : how to change y-axis?

New R user here, working with meteorological data (the data frame is called "Stations"). I am trying to plot 3 time series with temperature on the y-axis and a regression line on each one, but I run into a few problems and there are no error messages.
The loop doesn't seem to be working and I can't figure out why.
I didn't manage to change the x-axis tick labels to years ("Année" in the data frame) instead of an index number.
The title is the same for the 3 plots; how do I change it so each plot has its own title?
The regression line is not shown on the graph.
Thanks in advance!
Here is my code :
for (i in c(6,8,10))
plot(ts(Stations[,i]), col="dodgerblue4", xlab="Temps", ylab="Température", main="Genève")
for (i in c(6,8,10))
abline(h=Stations[,i])
Nb.enr time Année Mois Jour T2m_GE pcp_GE T2m_PU pcp_PU T2m_NY
1 19810101 1981 1 1 1.3 0.3 2.8 0.0 2.3
2 19810102 1981 1 2 1.2 0.1 2.3 1.2 1.6
3 19810103 1981 1 3 4.1 21.8 4.9 5.2 3.8
4 19810104 1981 1 4 5.1 10.3 5.1 17.4 4.9
5 19810105 1981 1 5 0.9 0.0 1.0 0.1 0.8
6 19810106 1981 1 6 0.5 5.7 0.7 6.0 0.5
7 19810107 1981 1 7 -2.7 0.0 -2.1 0.1 -1.9
8 19810108 1981 1 8 -3.2 0.0 -4.1 0.0 -3.8
9 19810109 1981 1 9 -5.2 0.0 -3.5 0.0 -5.1
10 19810110 1981 1 10 -3.1 10.6 -0.9 6.0 -2.6

How to reorganize data with the function `gather` (or similar) to reduce four variables to two

I have the data frame df1, which summarizes the mean number of animals per 6-hour interval and per zone (mean_A and mean_B). I also have the standard errors of these means (Se_A and Se_B). As an example:
df1<-data.frame(Hour=c(0,6,12,18,24),
mean_A= c(7.3,6.8,8.9,3.4,12.1),
mean_B=c(6.3,8.2,3.1,4.8,13.2),
Se_A=c(1.3,2.1,0.9,3.2,0.8),
Se_B=c(0.9,0.3,1.8,1.1,1.3))
> df1
Hour mean_A mean_B Se_A Se_B
1 0 7.3 6.3 1.3 0.9
2 6 6.8 8.2 2.1 0.3
3 12 8.9 3.1 0.9 1.8
4 18 3.4 4.8 3.2 1.1
5 24 12.1 13.2 0.8 1.3
For plotting reasons, I need to reorganize the dataframe. What I would need is this (or similar):
> df1
Hour meanType meanValue Se
1 0 mean_A 7.3 1.3
2 6 mean_A 6.8 2.1
3 12 mean_A 8.9 0.9
4 18 mean_A 3.4 3.2
5 24 mean_A 12.1 0.8
6 0 mean_B 6.3 0.9
7 6 mean_B 8.2 0.3
8 12 mean_B 3.1 1.8
9 18 mean_B 4.8 1.1
10 24 mean_B 13.2 1.3
Does anyone know how to do it?
Using reshape
reshape(df1, idvar = "Hour", varying = 2:5, direction = "long", sep = "_", timevar = "type")
# Hour type mean Se
#0.A 0 A 7.3 1.3
#6.A 6 A 6.8 2.1
#12.A 12 A 8.9 0.9
#18.A 18 A 3.4 3.2
#24.A 24 A 12.1 0.8
#0.B 0 B 6.3 0.9
#6.B 6 B 8.2 0.3
#12.B 12 B 3.1 1.8
#18.B 18 B 4.8 1.1
#24.B 24 B 13.2 1.3
We can also use tidyr's pivot_longer (version 0.8.3.9000)
library(tidyr)
pivot_longer(df1, cols = -Hour, names_to = c(".value", "Type"), names_sep = "_")
# A tibble: 10 x 4
# Hour Type mean Se
# <dbl> <chr> <dbl> <dbl>
# 1 0 A 7.3 1.3
# 2 0 B 6.3 0.9
# 3 6 A 6.8 2.1
# 4 6 B 8.2 0.3
# 5 12 A 8.9 0.9
# 6 12 B 3.1 1.8
# 7 18 A 3.4 3.2
# 8 18 B 4.8 1.1
# 9 24 A 12.1 0.8
#10 24 B 13.2 1.3
From the vignette:
Note the special variable name .value: this tells pivot_longer() that that component of the variable name defines the name of the output value column.
We can use melt from data.table, which makes this easier because it accepts multiple measure patterns and creates a separate value column for each when reshaping from 'wide' to 'long'
library(data.table)
melt(setDT(df1), measure = patterns("^mean", "^Se"),
variable.name = "meanType", value.name = c("meanValue", "Se"))[,
meanType := names(df1)[2:3][meanType]][]
# Hour meanType meanValue Se
# 1: 0 mean_A 7.3 1.3
# 2: 6 mean_A 6.8 2.1
# 3: 12 mean_A 8.9 0.9
# 4: 18 mean_A 3.4 3.2
# 5: 24 mean_A 12.1 0.8
# 6: 0 mean_B 6.3 0.9
# 7: 6 mean_B 8.2 0.3
# 8: 12 mean_B 3.1 1.8
# 9: 18 mean_B 4.8 1.1
#10: 24 mean_B 13.2 1.3
If we need a tidyverse approach
library(tidyverse)
gather(df1, meanType, val, -Hour) %>%
separate(meanType, into = c("meanType1", "meanType")) %>%
spread(meanType1, val) %>%
mutate(meanType = str_c("mean_", meanType)) %>%
arrange(meanType)
# Hour meanType mean Se
#1 0 mean_A 7.3 1.3
#2 6 mean_A 6.8 2.1
#3 12 mean_A 8.9 0.9
#4 18 mean_A 3.4 3.2
#5 24 mean_A 12.1 0.8
#6 0 mean_B 6.3 0.9
#7 6 mean_B 8.2 0.3
#8 12 mean_B 3.1 1.8
#9 18 mean_B 4.8 1.1
#10 24 mean_B 13.2 1.3
NOTE: gather also works here, but make sure to check the column types before gathering. As all the gathered columns are numeric here, it is not an issue. When the columns have different types and are gathered into a single column, we may need type_convert (from readr) after the spread step
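The type issue in the note can be seen without any packages. The sketch below uses base R only; `utils::type.convert()` is used here as a rough base-R analogue of `readr::type_convert()`, and the values are made up:

```r
# Stacking a numeric and a character column into one vector coerces
# everything to character, which is what gathering mixed columns does.
stacked <- c(c(7.3, 6.8), c("low", "high"))
stacked_class <- class(stacked)

# After the columns are separated again, the numeric part is still text ...
num_part <- stacked[1:2]

# ... until it is converted back to a number.
restored <- type.convert(num_part, as.is = TRUE)
```

Here `restored` is a numeric vector again, while `num_part` on its own would sort and summarise as text.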

Setting defaults by ggproto when extending ggplot2

I ran the R code posted at http://ggplot2.tidyverse.org/articles/extending-ggplot2.html#picking-defaults
and below is my modified version, with some print() calls added to show the values of the variables at each step. My questions are marked as comments in the code:
StatDensityCommon <- ggproto("StatDensityCommon", Stat, required_aes = "x",
setup_params = function(data, params) {
print("PARAMS BEFORE:")
print(params)
if(!is.null(params$bandwidth))
return(params)
print("DATA: ")
print(data)
#1. When and how is the data modified and the "group" field added?
xs <- split(data$x, data$group)
print("XS: ")
print(xs)
bws <- vapply(xs, bw.nrd0, numeric(1))
print("BWS: ")
print(bws)
bw <- mean(bws)
print("BW: ")
print(bw)
message("Picking bandwidth of ", signif(bw, 3))
params$bandwidth <- bw
print("PARAMS AFTER: ")
print(params)
params
},
compute_group = function(data, scales, bandwidth = 1) {
#2. How is the bandwidth computed in setup_params passed into compute_group
#even though bandwidth already defaults to 1 in the arguments?
d <- density(data$x, bw = bandwidth)
data.frame(x = d$x, y = d$y)
}
)
stat_density_common <- function(mapping = NULL, data = NULL, geom = "line", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, bandwidth = NULL, ...){
layer(stat = StatDensityCommon, data = data, mapping = mapping, geom = geom, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(bandwidth = bandwidth, na.rm = na.rm, ...))
}
ggplot(mpg, aes(displ, colour = drv)) + stat_density_common()
The following are the outputs except the plot:
[1] "PARAMS BEFORE:"
$bandwidth
NULL
$na.rm
[1] FALSE
[1] "DATA: "
x colour PANEL group
1 1.8 f 1 2
2 1.8 f 1 2
3 2.0 f 1 2
4 2.0 f 1 2
5 2.8 f 1 2
6 2.8 f 1 2
7 3.1 f 1 2
8 1.8 4 1 1
9 1.8 4 1 1
10 2.0 4 1 1
11 2.0 4 1 1
12 2.8 4 1 1
13 2.8 4 1 1
14 3.1 4 1 1
15 3.1 4 1 1
16 2.8 4 1 1
17 3.1 4 1 1
18 4.2 4 1 1
19 5.3 r 1 3
20 5.3 r 1 3
21 5.3 r 1 3
22 5.7 r 1 3
23 6.0 r 1 3
24 5.7 r 1 3
25 5.7 r 1 3
26 6.2 r 1 3
27 6.2 r 1 3
28 7.0 r 1 3
29 5.3 4 1 1
30 5.3 4 1 1
31 5.7 4 1 1
32 6.5 4 1 1
33 2.4 f 1 2
34 2.4 f 1 2
35 3.1 f 1 2
36 3.5 f 1 2
37 3.6 f 1 2
38 2.4 f 1 2
39 3.0 f 1 2
40 3.3 f 1 2
41 3.3 f 1 2
42 3.3 f 1 2
43 3.3 f 1 2
44 3.3 f 1 2
45 3.8 f 1 2
46 3.8 f 1 2
47 3.8 f 1 2
48 4.0 f 1 2
49 3.7 4 1 1
50 3.7 4 1 1
51 3.9 4 1 1
52 3.9 4 1 1
53 4.7 4 1 1
54 4.7 4 1 1
55 4.7 4 1 1
56 5.2 4 1 1
57 5.2 4 1 1
58 3.9 4 1 1
59 4.7 4 1 1
60 4.7 4 1 1
61 4.7 4 1 1
62 5.2 4 1 1
63 5.7 4 1 1
64 5.9 4 1 1
65 4.7 4 1 1
66 4.7 4 1 1
67 4.7 4 1 1
68 4.7 4 1 1
69 4.7 4 1 1
70 4.7 4 1 1
71 5.2 4 1 1
72 5.2 4 1 1
73 5.7 4 1 1
74 5.9 4 1 1
75 4.6 r 1 3
76 5.4 r 1 3
77 5.4 r 1 3
78 4.0 4 1 1
79 4.0 4 1 1
80 4.0 4 1 1
81 4.0 4 1 1
82 4.6 4 1 1
83 5.0 4 1 1
84 4.2 4 1 1
85 4.2 4 1 1
86 4.6 4 1 1
87 4.6 4 1 1
88 4.6 4 1 1
89 5.4 4 1 1
90 5.4 4 1 1
91 3.8 r 1 3
92 3.8 r 1 3
93 4.0 r 1 3
94 4.0 r 1 3
95 4.6 r 1 3
96 4.6 r 1 3
97 4.6 r 1 3
98 4.6 r 1 3
99 5.4 r 1 3
100 1.6 f 1 2
101 1.6 f 1 2
102 1.6 f 1 2
103 1.6 f 1 2
104 1.6 f 1 2
105 1.8 f 1 2
106 1.8 f 1 2
107 1.8 f 1 2
108 2.0 f 1 2
109 2.4 f 1 2
110 2.4 f 1 2
111 2.4 f 1 2
112 2.4 f 1 2
113 2.5 f 1 2
114 2.5 f 1 2
115 3.3 f 1 2
116 2.0 f 1 2
117 2.0 f 1 2
118 2.0 f 1 2
119 2.0 f 1 2
120 2.7 f 1 2
121 2.7 f 1 2
122 2.7 f 1 2
123 3.0 4 1 1
124 3.7 4 1 1
125 4.0 4 1 1
126 4.7 4 1 1
127 4.7 4 1 1
128 4.7 4 1 1
129 5.7 4 1 1
130 6.1 4 1 1
131 4.0 4 1 1
132 4.2 4 1 1
133 4.4 4 1 1
134 4.6 4 1 1
135 5.4 r 1 3
136 5.4 r 1 3
137 5.4 r 1 3
138 4.0 4 1 1
139 4.0 4 1 1
140 4.6 4 1 1
141 5.0 4 1 1
142 2.4 f 1 2
143 2.4 f 1 2
144 2.5 f 1 2
145 2.5 f 1 2
146 3.5 f 1 2
147 3.5 f 1 2
148 3.0 f 1 2
149 3.0 f 1 2
150 3.5 f 1 2
151 3.3 4 1 1
152 3.3 4 1 1
153 4.0 4 1 1
154 5.6 4 1 1
155 3.1 f 1 2
156 3.8 f 1 2
157 3.8 f 1 2
158 3.8 f 1 2
159 5.3 f 1 2
160 2.5 4 1 1
161 2.5 4 1 1
162 2.5 4 1 1
163 2.5 4 1 1
164 2.5 4 1 1
165 2.5 4 1 1
166 2.2 4 1 1
167 2.2 4 1 1
168 2.5 4 1 1
169 2.5 4 1 1
170 2.5 4 1 1
171 2.5 4 1 1
172 2.5 4 1 1
173 2.5 4 1 1
174 2.7 4 1 1
175 2.7 4 1 1
176 3.4 4 1 1
177 3.4 4 1 1
178 4.0 4 1 1
179 4.7 4 1 1
180 2.2 f 1 2
181 2.2 f 1 2
182 2.4 f 1 2
183 2.4 f 1 2
184 3.0 f 1 2
185 3.0 f 1 2
186 3.5 f 1 2
187 2.2 f 1 2
188 2.2 f 1 2
189 2.4 f 1 2
190 2.4 f 1 2
191 3.0 f 1 2
192 3.0 f 1 2
193 3.3 f 1 2
194 1.8 f 1 2
195 1.8 f 1 2
196 1.8 f 1 2
197 1.8 f 1 2
198 1.8 f 1 2
199 4.7 4 1 1
200 5.7 4 1 1
201 2.7 4 1 1
202 2.7 4 1 1
203 2.7 4 1 1
204 3.4 4 1 1
205 3.4 4 1 1
206 4.0 4 1 1
207 4.0 4 1 1
208 2.0 f 1 2
209 2.0 f 1 2
210 2.0 f 1 2
211 2.0 f 1 2
212 2.8 f 1 2
213 1.9 f 1 2
214 2.0 f 1 2
215 2.0 f 1 2
216 2.0 f 1 2
217 2.0 f 1 2
218 2.5 f 1 2
219 2.5 f 1 2
220 2.8 f 1 2
221 2.8 f 1 2
222 1.9 f 1 2
223 1.9 f 1 2
224 2.0 f 1 2
225 2.0 f 1 2
226 2.5 f 1 2
227 2.5 f 1 2
228 1.8 f 1 2
229 1.8 f 1 2
230 2.0 f 1 2
231 2.0 f 1 2
232 2.8 f 1 2
233 2.8 f 1 2
234 3.6 f 1 2
[1] "XS: "
$`1`
[1] 1.8 1.8 2.0 2.0 2.8 2.8 3.1 3.1 2.8 3.1 4.2 5.3 5.3 5.7 6.5 3.7 3.7 3.9 3.9 4.7 4.7 4.7 5.2 5.2
[25] 3.9 4.7 4.7 4.7 5.2 5.7 5.9 4.7 4.7 4.7 4.7 4.7 4.7 5.2 5.2 5.7 5.9 4.0 4.0 4.0 4.0 4.6 5.0 4.2
[49] 4.2 4.6 4.6 4.6 5.4 5.4 3.0 3.7 4.0 4.7 4.7 4.7 5.7 6.1 4.0 4.2 4.4 4.6 4.0 4.0 4.6 5.0 3.3 3.3
[73] 4.0 5.6 2.5 2.5 2.5 2.5 2.5 2.5 2.2 2.2 2.5 2.5 2.5 2.5 2.5 2.5 2.7 2.7 3.4 3.4 4.0 4.7 4.7 5.7
[97] 2.7 2.7 2.7 3.4 3.4 4.0 4.0
$`2`
[1] 1.8 1.8 2.0 2.0 2.8 2.8 3.1 2.4 2.4 3.1 3.5 3.6 2.4 3.0 3.3 3.3 3.3 3.3 3.3 3.8 3.8 3.8 4.0 1.6
[25] 1.6 1.6 1.6 1.6 1.8 1.8 1.8 2.0 2.4 2.4 2.4 2.4 2.5 2.5 3.3 2.0 2.0 2.0 2.0 2.7 2.7 2.7 2.4 2.4
[49] 2.5 2.5 3.5 3.5 3.0 3.0 3.5 3.1 3.8 3.8 3.8 5.3 2.2 2.2 2.4 2.4 3.0 3.0 3.5 2.2 2.2 2.4 2.4 3.0
[73] 3.0 3.3 1.8 1.8 1.8 1.8 1.8 2.0 2.0 2.0 2.0 2.8 1.9 2.0 2.0 2.0 2.0 2.5 2.5 2.8 2.8 1.9 1.9 2.0
[97] 2.0 2.5 2.5 1.8 1.8 2.0 2.0 2.8 2.8 3.6
$`3`
[1] 5.3 5.3 5.3 5.7 6.0 5.7 5.7 6.2 6.2 7.0 4.6 5.4 5.4 3.8 3.8 4.0 4.0 4.6 4.6 4.6 4.6 5.4 5.4 5.4
[25] 5.4
[1] "BWS: "
1 2 3
0.4056219 0.2482564 0.3797632
[1] "BW: "
[1] 0.3445472
Picking bandwidth of 0.345
[1] "PARAMS AFTER: "
$bandwidth
[1] 0.3445472
$na.rm
[1] FALSE
Thanks in advance!
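On question 2 in the code comments: in R, a default such as bandwidth = 1 is only a fallback that is used when the caller supplies no value for that argument. The sketch below is plain R and does not reproduce ggplot2's internals; `compute_group_demo` and the `params` list are hypothetical stand-ins, assuming the parameter list returned by setup_params is what gets handed on to compute_group:

```r
# Hypothetical stand-in for a compute function with a default argument.
compute_group_demo <- function(x, bandwidth = 1) {
  bandwidth
}

# No value supplied: the default is used.
used_default <- compute_group_demo(x = 1:3)

# A parameter list (like the one setup_params returns) supplies a value,
# so the default is never consulted.
params <- list(bandwidth = 0.345)
used_param <- do.call(compute_group_demo, c(list(x = 1:3), params))
```

Under that assumption, the 1 in the signature would only matter if no bandwidth entry reached the function at all.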

Find all points on a plane

I am trying to get all points on a 2D plane in the range (0..10, 0..10) with a step of 0.5. I would like to store these values in a data frame like this:
x y
1 1 1.5
2 0 0.5
3 4 2.0
I am considering using a loop to start from 0.0 for the x column and fill the y column such that I get something like this:
x y
1 0 0
2 0 0.5
3 0 1
and so on up to 10, then increment by 0.5 and repeat for the next x, and so on. Is there a more efficient way of doing this in R?
Is this what you want?
expand.grid(x=seq(0,10,by=0.5),y=seq(0,10,by=0.5))
x y
1 0.0 0.0
2 0.5 0.0
3 1.0 0.0
4 1.5 0.0
5 2.0 0.0
6 2.5 0.0
7 3.0 0.0
8 3.5 0.0
9 4.0 0.0
10 4.5 0.0
11 5.0 0.0
12 5.5 0.0
13 6.0 0.0
14 6.5 0.0
15 7.0 0.0
16 7.5 0.0
17 8.0 0.0
18 8.5 0.0
19 9.0 0.0
20 9.5 0.0
21 10.0 0.0
22 0.0 0.5
23 0.5 0.5
24 1.0 0.5
25 1.5 0.5
26 2.0 0.5
27 2.5 0.5
28 3.0 0.5
29 3.5 0.5
30 4.0 0.5
...
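expand.grid varies its first argument fastest, which is why x changes on every row in the output above. If the layout from the question is preferred (x held fixed while y runs through its values), one possible sketch is to pass y first and then reorder the columns; the names `steps`, `grid` and `by_x` are just illustrative:

```r
steps <- seq(0, 10, by = 0.5)   # 21 values from 0 to 10

# All combinations; the first argument (x) varies fastest.
grid <- expand.grid(x = steps, y = steps)
n_points <- nrow(grid)          # 21 * 21 combinations

# Same points, but with y varying fastest so x stays fixed within a block.
by_x <- expand.grid(y = steps, x = steps)[, c("x", "y")]
first_rows <- head(by_x, 3)
```

Both data frames contain the same 441 points; only the row order differs.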

Moving data in dataframe

I'm trying to move the data in a data frame around. For each tree, I want to move the first value not equal to 0 to height1.
Example data looks as follows:
Tree <- c(1:10)
height0 <- c(0,0,0,0,0,0,0,0,0,0)
height1 <- c(1.5,2.0,0.0,1.2,1.3,0.9,0.0,0.0,1.8,0.0)
height2 <- c(2.4,2.2,1.1,1.9,1.4,1.7,0.0,0.0,2.7,0.0)
height3 <- c(3.1,2.9,2.1,2.6,2.2,2.4,0.0,0.6,3.6,0.0)
height4 <- c(3.8,3.4,2.9,3.0,2.9,3.1,0.0,1.1,4.1,0.0)
height5 <- c(4.2,3.7,3.6,3.7,3.5,3.8,0.7,1.9,4.6,0.0)
height6 <- c(4.4,4.1,4.1,4.2,4.0,4.5,1.6,2.6,4.9,1.2)
height7 <- c(4.7,4.4,4.3,4.6,4.2,4.9,2.2,3.0,5.1,2.0)
df <- data.frame(Tree, height0, height1, height2, height3, height4, height5, height6, height7)
So the data frame df looks as follows:
df
Tree height0 height1 height2 height3 height4 height5 height6 height7
1 1 0 1.5 2.4 3.1 3.8 4.2 4.4 4.7
2 2 0 2.0 2.2 2.9 3.4 3.7 4.1 4.4
3 3 0 0.0 1.1 2.1 2.9 3.6 4.1 4.3
4 4 0 1.2 1.9 2.6 3.0 3.7 4.2 4.6
5 5 0 1.3 1.4 2.2 2.9 3.5 4.0 4.2
6 6 0 0.9 1.7 2.4 3.1 3.8 4.5 4.9
7 7 0 0.0 0.0 0.0 0.0 0.7 1.6 2.2
8 8 0 0.0 0.0 0.6 1.1 1.9 2.6 3.0
9 9 0 1.8 2.7 3.6 4.1 4.6 4.9 5.1
10 10 0 0.0 0.0 0.0 0.0 0.0 1.2 2.0
I'm trying to move each tree's first non-zero height value to height1, as not all the trees germinated at the same time and I only want to compare the growth speed, without getting false results due to germination differences.
So my data should look like this afterwards:
df
Tree height0 height1 height2 height3 height4 height5 height6 height7
1 1 0 1.5 2.4 3.1 3.8 4.2 4.4 4.7
2 2 0 2.0 2.2 2.9 3.4 3.7 4.1 4.4
3 3 0 1.1 2.1 2.9 3.6 4.1 4.3
4 4 0 1.2 1.9 2.6 3.0 3.7 4.2 4.6
5 5 0 1.3 1.4 2.2 2.9 3.5 4.0 4.2
6 6 0 0.9 1.7 2.4 3.1 3.8 4.5 4.9
7 7 0 0.7 1.6 2.2
8 8 0 0.6 1.1 1.9 2.6 3.0
9 9 0 1.8 2.7 3.6 4.1 4.6 4.9 5.1
10 10 0 1.2 2.0
Is there a way to do this?
I have over 3000 trees that I measured 40 times, and doing it manually is going to take too long.
Thank you
One option would be to loop through the rows (apply with MARGIN = 1), extract the non-zero elements, pad the rest with NA using `length<-`, transpose the output and assign it back.
df[-(1:2)] <- t(apply(df[-(1:2)], 1, function(x) `length<-`(x[x!=0], ncol(df)-2)))
df
# Tree height0 height1 height2 height3 height4 height5 height6 height7
#1 1 0 1.5 2.4 3.1 3.8 4.2 4.4 4.7
#2 2 0 2.0 2.2 2.9 3.4 3.7 4.1 4.4
#3 3 0 1.1 2.1 2.9 3.6 4.1 4.3 NA
#4 4 0 1.2 1.9 2.6 3.0 3.7 4.2 4.6
#5 5 0 1.3 1.4 2.2 2.9 3.5 4.0 4.2
#6 6 0 0.9 1.7 2.4 3.1 3.8 4.5 4.9
#7 7 0 0.7 1.6 2.2 NA NA NA NA
#8 8 0 0.6 1.1 1.9 2.6 3.0 NA NA
#9 9 0 1.8 2.7 3.6 4.1 4.6 4.9 5.1
#10 10 0 1.2 2.0 NA NA NA NA NA
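What happens inside the anonymous function is easier to see on a single row. A small sketch with made-up values (the vector `x` stands for one tree's height1 onwards):

```r
# One tree's measurements: it germinated late, so the series starts with zeros.
x <- c(0, 0, 1.1, 2.1)

# Keep only the non-zero heights; this shifts them to the front.
nonzero <- x[x != 0]

# `length<-` called as a function extends the vector to the original length,
# filling the new trailing positions with NA.
padded <- `length<-`(nonzero, length(x))
```

Note that `x != 0` removes every zero, so a genuine zero measurement after germination would also be dropped; that does not occur in the example data, but it is worth checking on the full data set.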
