Getting the distance matrix back from already clustered data - r

I have used hclust together with the TSclust package to do agglomerative hierarchical clustering. My question is: can I get the dissimilarity (distance) matrix back from hclust? I want the distance values so that I can rank which variables are closest to a single variable in the group of variables.
Example: if (x1, x2, x3, x4, x5, x6, x7, x8, x9, x10) are the variables used to form the distance matrix, then what I want is the distance between x3 and each of the other variables (x3x1, x3x2, x3x4, x3x5, and so on). Can we do that? Here is the code and reproducible data.
Data:
structure(list(x1 = c(186.41, 100.18, 12.3, 14.38, 25.97, 0.06,
0, 6.17, 244.06, 19.26, 256.18, 255.69, 121.88, 75, 121.45, 11.34,
34.68, 3.09, 34.3, 26.13, 111.31), x2 = c(327.2, 8.05, 4.23,
6.7, 3.12, 1.91, 37.03, 39.17, 140.06, 83.72, 263.29, 261.22,
202.48, 23.27, 2.87, 7.17, 14.48, 3.41, 5.95, 70.56, 91.58),
x3 = c(220.18, 126.14, 98.59, 8.56, 0.5, 0.9, 17.45, 191.1,
164.64, 224.36, 262.86, 237.75, 254.88, 42.05, 9.12, 0.04,
12.22, 0.61, 61.86, 114.08, 78.94), x4 = c(90.74, 26.11,
47.86, 10.86, 3.74, 23.69, 61.79, 68.12, 87.92, 171.76, 260.98,
266.62, 96.27, 57.15, 78.89, 16.73, 6.59, 49.44, 57.21, 202.2,
67.17), x5 = c(134.09, 27.06, 7.44, 4.53, 17, 47.66, 95.96,
129.53, 40.23, 157.37, 172.61, 248.56, 160.84, 421.94, 109.93,
22.77, 2.11, 49.18, 64.13, 52.61, 180.87), x6 = c(173.17,
46.68, 6.54, 3.05, 0.35, 0.12, 5.09, 72.46, 58.19, 112.31,
233.77, 215.82, 100.63, 65.84, 2.69, 0.01, 3.63, 12.93, 66.55,
28, 61.74), x7 = c(157.22, 141.81, 19.98, 116.18, 16.55,
122.3, 62.67, 141.84, 78.3, 227.27, 340.22, 351.38, 147.73,
0.3, 56.12, 33.2, 5.51, 54.4, 82.98, 152.66, 218.26), x8 = c(274.08,
51.92, 54.86, 15.37, 0.31, 0.05, 36.3, 162.04, 171.78, 181.39,
310.73, 261.55, 237.99, 123.99, 1.92, 0.74, 0.23, 18.51,
7.68, 65.55, 171.33), x9 = c(262.71, 192.34, 2.75, 21.68,
1.69, 3.92, 0.09, 9.33, 120.36, 282.92, 236.7, 161.59, 255.44,
126.44, 7.63, 2.04, 1.02, 0.12, 5.87, 146.25, 134.11), x10 = c(82.71,
44.09, 1.52, 2.63, 4.38, 28.64, 168.43, 80.62, 20.36, 39.29,
302.31, 247.52, 165.73, 18.27, 2.67, 1.77, 23.13, 53.47,
53.14, 46.61, 86.29)), class = "data.frame", row.names = c(NA,
-21L))
Code:
as.ts(cdata)
library(dplyr) # data wrangling
library(ggplot2) # grammar of graphics
library(ggdendro) # dendrograms
library(TSclust) # cluster time series
# cluster analysis
dist_ts <- TSclust::diss(SERIES = t(cdata), METHOD = "INT.PER") # note the data frame must be transposed
hc <- stats::hclust(dist_ts, method="complete") # method can also be "average"; cluster::diana() (DIvisive ANAlysis Clustering) is a separate alternative
hcdata <- ggdendro::dendro_data(hc)
names_order <- hcdata$labels$label
# Use the following to remove labels from the dendrogram so they are not doubled up (keeping them is useful for checking):
# hcdata$labels$label <- ""
hcdata%>%ggdendro::ggdendrogram(., rotate=FALSE, leaf_labels=FALSE)

I believe the object you are looking for is stored in the variable dist_ts:
dist_ts <- TSclust::diss(SERIES = t(cdata), METHOD = "INT.PER")
print(dist_ts)
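To get the distances from one variable to all the others, a rough sketch is to convert the dist object to a square matrix and pull out one row. The example below uses made-up data and stats::dist() as a stand-in for TSclust::diss(), so the names toy, d and x3_dist are only illustrative. Note that an hclust object does not store the original distances; stats::cophenetic() only returns the cophenetic distances implied by the tree.

```r
# Toy data standing in for cdata (hypothetical values); columns are the variables.
set.seed(1)
toy <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20), x4 = rnorm(20))

# Any "dist" object works here; stats::dist() stands in for TSclust::diss().
d <- stats::dist(t(toy))

# Convert to a full square matrix so rows can be indexed by variable name.
dm <- as.matrix(d)

# Distances from x3 to every other variable, closest first.
x3_dist <- sort(dm["x3", colnames(dm) != "x3"])
print(x3_dist)

# Cophenetic distances implied by the tree (not the original distances).
hc_toy <- stats::hclust(d, method = "complete")
coph <- as.matrix(stats::cophenetic(hc_toy))
```

With the real data, the same indexing should apply to as.matrix(dist_ts)["x3", ].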

Related

Automatic lane / band detection for chromatography in R

I would like to implement an (easy) automatic lane / band detection for thin layer chromatography in R. Below I have the raw data and an image for a not-so-clean square-wave signal that represents several bands.
The following image shows the wave and (introduced by hand) start (blue) and stop (red) of a lane.
I would like to automatically determine:
How many lanes are there? (in this example: 9)
How broad are they?
What is the distance between lanes?
Also helpful: what is the center of each lane?
Any strategy on how to achieve this in R would be highly welcome. A "rough" estimation of the values for the questions above would already help, as the "precise" values could later be manually adjusted. But the automatically determined values should be somewhat near the actual values, of course.
So far I have tried peak detection using the pracma package, but this wasn't really useful because I have a square-wave-like signal, not sharp peaks. But maybe I missed something?
Here is the original raw data:
a1 <-c(305.91, 219.13, 117.2, 35.92, -4.89, -9.72, -0.34, 0.67, -15.81,
-42.09, -61.73, -62.25, -43.29, -15.69, 6.4, 14.45, 9.44, -0.57,
-6.75, -5.25, 0.96, 4.55, -1.1, -17.24, -38.05, -52.97, -52.16,
-32.31, 0.65, 34.12, 55.7, 60.34, 53.11, 45.13, 45.36, 53.58,
60.06, 52.48, 25.47, -14.03, -49.77, -65.91, -56.74, -29.88,
-0.87, 16.9, 19.89, 14.68, 11.42, 15.44, 23.25, 25.29, 13.3,
-13.08, -44.98, -68.97, -74.62, -60.26, -33.6, -7.01, 9.42, 13.02,
8.98, 5.86, 9.19, 17.13, 21.35, 12.71, -11.49, -43.9, -69.95,
-76.04, -58.01, -24.17, 9.41, 28.69, 29.92, 20.83, 14.06, 17.41,
27.93, 34.07, 24.37, -3.49, -39.75, -67.82, -74.25, -56.8, -25.3,
4.69, 21.34, 22.66, 16, 11.65, 15.07, 23.04, 25.92, 14.53, -12.82,
-47.78, -75.72, -83.84, -68.64, -38.22, -7.6, 10.16, 11.43, 3.18,
-2.93, 0.37, 10.2, 15.29, 4.32, -24.87, -61.99, -89.58, -93.53,
-71.9, -35.99, -3.03, 14.36, 14.91, 7.51, 3.53, 8.38, 18.02,
22.33, 12.73, -11.41, -41.52, -64.3, -69.08, -53.5, -24.72, 4.7,
23.57, 27.96, 22.7, 17.69, 20.8, 31.89, 42.05, 39.24, 17.06,
-19.32, -54.78, -73.28, -68.27, -47.01, -24.84, -13, -8.96, 2.88,
39.23, 102.66, 174.58, 222.62, 219)
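One possible starting point, offered only as a rough sketch, is to threshold the signal and run-length encode the result with rle(). The example uses a synthetic square wave rather than a1, and the midpoint threshold is an assumption that would need tuning on real data. If the lanes are the low regions of the signal, the comparison would have to be flipped.

```r
# Synthetic square-wave signal (hypothetical values, not the a1 data):
# low for 3 samples, high for 4, low for 2, high for 4, low for 3.
signal <- rep(c(0, 10, 0, 10, 0), times = c(3, 4, 2, 4, 3))

# Assumption: the midpoint between min and max separates lanes from background.
threshold <- (min(signal) + max(signal)) / 2

# Run-length encode the thresholded signal to find consecutive runs.
runs   <- rle(signal > threshold)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the runs that are above the threshold (the candidate lanes).
lanes <- data.frame(
  start = starts[runs$values],
  end   = ends[runs$values]
)
lanes$width  <- lanes$end - lanes$start + 1
lanes$center <- (lanes$start + lanes$end) / 2

n_lanes <- nrow(lanes)

# Gap (in samples) between the end of one lane and the start of the next.
gaps <- lanes$start[-1] - lanes$end[-n_lanes] - 1

print(lanes)
print(gaps)
```

On noisy data, smoothing the signal first (for example with stats::filter()) may help avoid very short spurious runs.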

I want to grow a variable according to the weight of each country and the global changes (Creating a Bartik Instrument in R)

I am conducting some regression analysis and I need to first create a Bartik Instrument to use as an IV. Essentially, I have 10 decile groups of the income distribution. These are both at a global level and a country level (as there is an unbalanced panel of countries in the dataset). I want to grow each country's decile groups according to worldwide changes.
The image above represents the world and shows the percentage of people in each income decile on the left-hand side. On the right are the calculated percentage changes for each income decile between years. There are 10 columns all up for the 10 deciles.
The image below shows the country's decile groups. The starting year will be 1990 for each country (i.e., the 1990 proportion of each decile is the starting value for that country and serves as the "weight" in all of the statistics). Then, each decile will grow at the same percentage change as the global level.
For example, if dp1 is 1.92 in 1990 for the country Afghanistan, 1991 will be calculated from the global percentage change between 1990 and 1991. Because the global change was -2.857%, the predicted value of dp1 in 1991 for Afghanistan will be 1.865. This value will then be used in the calculation for predicting 1992.
The issue is, it needs to start at 1990 for each country and end in the final predicted year of 2019. I cannot just use a mutate function as it won't recognize that each country restarts in 1990.
Any guidance on this issue will be greatly appreciated. Please let me know if you need to see any more of the data as it is all open source and can therefore be freely shared.
Dput of the world data frame:
structure(list(Entity = c("World", "World", "World", "World",
"World", "World", "World", "World", "World", "World", "World",
"World", "World", "World", "World", "World", "World", "World",
"World", "World", "World", "World", "World", "World", "World",
"World", "World", "World", "World", "World", "World"), Year = c(1990,
1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012,
2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020), Code = c("WLD",
"WLD", "WLD", "WLD", "WLD", "WLD", "WLD", "WLD", "WLD", "WLD",
"WLD", "WLD", "WLD", "WLD", "WLD", "WLD", "WLD", "WLD", "WLD",
"WLD", "WLD", "WLD", "WLD", "WLD", "WLD", "WLD", "WLD", "WLD",
"WLD", "WLD", "WLD"), gini = c("69.95", "70.18", "70.07", "69.88",
"69.79", "69.51", "69.16", "68.93", "68.98", "68.83", "68.76",
"68.43", "68.1", "67.68", "67.24", "66.79", "66.22", "65.57",
"64.9", "63.8", "63.28", "62.92", "62.54", "62.11", "61.68",
"61.47", "61.12", "60.92", "60.81", "60.65", "60.6"), palma = c(14.44,
14.74, 14.2, 13.7, 13.28, 12.95, 12.46, 12.31, 12.12, 12.04,
12.12, 11.74, 11.49, 11.18, 10.91, 10.55, 10.08, 9.67, 9.27,
8.53, 8.29, 8.12, 7.91, 7.74, 7.47, 7.36, 7.25, 7.11, 7.07, 6.95,
6.97), dp1 = c(0.35, 0.34, 0.35, 0.36, 0.36, 0.38, 0.38, 0.38,
0.39, 0.38, 0.38, 0.39, 0.38, 0.38, 0.39, 0.39, 0.4, 0.4, 0.4,
0.43, 0.43, 0.43, 0.43, 0.43, 0.45, 0.45, 0.44, 0.43, 0.44, 0.44,
0.43), dp2 = c(0.71, 0.72, 0.76, 0.78, 0.82, 0.82, 0.84, 0.86,
0.87, 0.9, 0.88, 0.89, 0.9, 0.92, 0.93, 0.93, 0.95, 0.98, 1.01,
1.07, 1.09, 1.1, 1.1, 1.14, 1.14, 1.14, 1.17, 1.19, 1.18, 1.19,
1.2), dp3 = c(1.09, 1.06, 1.1, 1.18, 1.19, 1.25, 1.29, 1.31,
1.3, 1.33, 1.31, 1.36, 1.38, 1.39, 1.43, 1.48, 1.5, 1.54, 1.59,
1.69, 1.72, 1.74, 1.79, 1.81, 1.83, 1.88, 1.89, 1.88, 1.94, 1.95,
1.94), dp4 = c(1.52, 1.5, 1.59, 1.64, 1.74, 1.75, 1.82, 1.82,
1.9, 1.89, 1.88, 1.93, 1.97, 2, 2.02, 2.07, 2.18, 2.24, 2.29,
2.44, 2.48, 2.51, 2.57, 2.58, 2.71, 2.72, 2.73, 2.82, 2.76, 2.81,
2.8), dp5 = c(2.11, 2.15, 2.27, 2.34, 2.42, 2.51, 2.53, 2.56,
2.6, 2.64, 2.7, 2.72, 2.74, 2.8, 2.92, 2.99, 3.05, 3.1, 3.2,
3.44, 3.52, 3.57, 3.65, 3.77, 3.76, 3.86, 3.93, 3.96, 3.95, 4.03,
4.04), dp6 = c(3.23, 3.18, 3.25, 3.38, 3.44, 3.52, 3.6, 3.64,
3.66, 3.74, 3.68, 3.87, 3.98, 4.08, 4.07, 4.14, 4.35, 4.54, 4.68,
4.85, 5.02, 5.11, 5.2, 5.29, 5.48, 5.43, 5.54, 5.56, 5.62, 5.6,
5.57), dp7 = c(5.49, 5.42, 5.43, 5.42, 5.37, 5.41, 5.5, 5.66,
5.49, 5.57, 5.67, 5.73, 5.86, 6.03, 6.23, 6.23, 6.49, 6.63, 6.91,
7.12, 7.37, 7.38, 7.59, 7.72, 7.84, 7.94, 7.92, 8.02, 8, 8.05,
8.13), dp8 = c(10.96, 10.76, 10.3, 10.04, 9.78, 9.73, 9.78, 9.82,
9.67, 9.61, 9.7, 9.75, 9.73, 10, 10.18, 10.5, 10.55, 10.88, 11.04,
11.32, 11.4, 11.63, 11.62, 11.72, 11.78, 11.82, 12.05, 11.85,
12.1, 12.08, 12.12), dp9 = c(21.51, 21.26, 20.81, 20.53, 20.22,
20.17, 20.15, 20.03, 19.9, 19.75, 19.77, 19.7, 19.88, 19.72,
19.74, 19.75, 19.69, 19.71, 19.75, 19.51, 19.48, 19.49, 19.39,
19.36, 19.23, 19.14, 19.05, 19.37, 19.25, 19.3, 19.38), dp10 = c(52.93,
53.51, 54.05, 54.24, 54.58, 54.39, 54.02, 53.85, 54.14, 54.13,
53.96, 53.6, 53.13, 52.61, 52.04, 51.45, 50.77, 49.93, 49.08,
48.07, 47.44, 46.98, 46.61, 46.14, 45.75, 45.56, 45.23, 44.9,
44.72, 44.52, 44.37), `dp1_PChangeFrom-1` = c(NA, -0.0285714285714284,
0.0294117647058822, 0.0285714285714286, 0, 0.0555555555555556,
0, 0, 0.0263157894736842, -0.0256410256410257, 0, 0.0263157894736842,
-0.0256410256410257, 0, 0.0263157894736842, 0, 0.0256410256410257,
0, 0, 0.0749999999999999, 0, 0, 0, 0, 0.0465116279069768, 0,
-0.0222222222222222, -0.0227272727272727, 0.0232558139534884,
0, -0.0227272727272727), `dp2_PChangeFrom-1` = c(NA, 0.0140845070422535,
0.0555555555555556, 0.0263157894736842, 0.0512820512820512, 0,
0.024390243902439, 0.0238095238095238, 0.0116279069767442, 0.0344827586206897,
-0.0222222222222222, 0.0113636363636364, 0.0112359550561798,
0.0222222222222222, 0.0108695652173913, 0, 0.0215053763440859,
0.0315789473684211, 0.0306122448979592, 0.0594059405940595, 0.0186915887850467,
0.00917431192660551, 0, 0.0363636363636362, 0, 0, 0.0263157894736842,
0.0170940170940171, -0.00840336134453782, 0.00847457627118645,
0.00840336134453782), `dp3_PChangeFrom-1` = c(NA, -0.0275229357798165,
0.0377358490566038, 0.0727272727272726, 0.00847457627118645,
0.0504201680672269, 0.032, 0.0155038759689923, -0.00763358778625955,
0.0230769230769231, -0.0150375939849624, 0.0381679389312977,
0.014705882352941, 0.00724637681159421, 0.0287769784172662, 0.034965034965035,
0.0135135135135135, 0.0266666666666667, 0.0324675324675325, 0.0628930817610062,
0.0177514792899408, 0.0116279069767442, 0.0287356321839081, 0.0111731843575419,
0.0110497237569061, 0.0273224043715846, 0.00531914893617022,
-0.0052910052910053, 0.0319148936170213, 0.00515463917525774,
-0.00512820512820513), `dp4_PChangeFrom-1` = c(NA, -0.0131578947368421,
0.0600000000000001, 0.031446540880503, 0.0609756097560976, 0.00574712643678161,
0.04, 0, 0.0439560439560439, -0.00526315789473685, -0.0052910052910053,
0.0265957446808511, 0.0207253886010363, 0.0152284263959391, 0.01,
0.0247524752475247, 0.0531400966183576, 0.0275229357798165, 0.0223214285714285,
0.0655021834061135, 0.0163934426229508, 0.0120967741935483, 0.0239043824701195,
0.00389105058365768, 0.0503875968992248, 0.00369003690036909,
0.00367647058823522, 0.0329670329670329, -0.0212765957446809,
0.0181159420289856, -0.00355871886121005), `dp5_PChangeFrom-1` = c(NA,
0.018957345971564, 0.0558139534883721, 0.0308370044052863, 0.0341880341880342,
0.037190082644628, 0.00796812749003985, 0.0118577075098815, 0.015625,
0.0153846153846154, 0.0227272727272727, 0.00740740740740741,
0.00735294117647059, 0.021897810218978, 0.0428571428571429, 0.0239726027397261,
0.0200668896321069, 0.0163934426229509, 0.0322580645161291, 0.0749999999999999,
0.0232558139534884, 0.0142045454545454, 0.0224089635854342, 0.0328767123287672,
-0.00265251989389927, 0.0265957446808511, 0.0181347150259068,
0.00763358778625949, -0.00252525252525247, 0.020253164556962,
0.00248138957816372), `dp6_PChangeFrom-1` = c(NA, -0.0154798761609907,
0.0220125786163522, 0.04, 0.0177514792899408, 0.0232558139534884,
0.0227272727272727, 0.0111111111111111, 0.0054945054945055, 0.0218579234972678,
-0.0160427807486631, 0.0516304347826087, 0.0284237726098191,
0.0251256281407035, -0.00245098039215681, 0.017199017199017,
0.0507246376811594, 0.0436781609195403, 0.0308370044052863, 0.0363247863247863,
0.0350515463917526, 0.0179282868525898, 0.0176125244618395, 0.0173076923076923,
0.0359168241965974, -0.00912408759124101, 0.0202578268876612,
0.00361010830324902, 0.0107913669064749, -0.00355871886121005,
-0.00535714285714274), `dp7_PChangeFrom-1` = c(NA, -0.0127504553734062,
0.00184501845018446, -0.00184162062615097, -0.00922509225092248,
0.00744878957169461, 0.0166358595194085, 0.0290909090909091,
-0.0300353356890459, 0.0145719489981785, 0.0179533213644524,
0.0105820105820107, 0.0226876090750436, 0.0290102389078498, 0.033167495854063,
0, 0.0417335473515248, 0.0215716486902927, 0.0422322775263952,
0.0303907380607815, 0.0351123595505618, 0.00135685210312073,
0.0284552845528455, 0.0171277997364954, 0.0155440414507772, 0.0127551020408164,
-0.00251889168765749, 0.0126262626262626, -0.00249376558603486,
0.00625000000000009, 0.00993788819875777), `dp8_PChangeFrom-1` = c(NA,
-0.0182481751824818, -0.0427509293680297, -0.0252427184466021,
-0.0258964143426295, -0.0051124744376277, 0.00513874614594028,
0.00408997955010234, -0.0152749490835031, -0.00620475698035165,
0.00936524453694067, 0.00515463917525781, -0.00205128205128201,
0.0277492291880781, 0.018, 0.031434184675835, 0.00476190476190483,
0.0312796208530806, 0.014705882352941, 0.0253623188405798, 0.00706713780918729,
0.0201754385964913, -0.00085984522785912, 0.00860585197934608,
0.00511945392491457, 0.00339558573853998, 0.0194585448392555,
-0.0165975103734441, 0.0210970464135021, -0.00165289256198344,
0.00331125827814562), `dp9_PChangeFrom-1` = c(NA, -0.0116225011622501,
-0.0211665098777047, -0.0134550696780393, -0.015099853872382,
-0.00247279920870411, -0.000991571641051221, -0.00595533498759293,
-0.00649026460309548, -0.00753768844221098, 0.00101265822784808,
-0.0035407182599899, 0.00913705583756344, -0.00804828973843059,
0.00101419878296144, 0.000506585612968671, -0.00303797468354424,
0.00101574403250379, 0.00202942668696089, -0.0121518987341771,
-0.00153767298821123, 0.000513347022587167, -0.00513083632632108,
-0.0015471892728211, -0.0067148760330578, -0.00468018720748829,
-0.00470219435736676, 0.0167979002624672, -0.0061951471347445,
0.00259740259740263, 0.00414507772020717), `dp10_PChangeFrom-1` = c(NA,
0.0109578688834309, 0.0100915716688469, 0.00351526364477345,
0.00626843657817102, -0.00348112861854155, -0.00680272108843533,
-0.00314698259903743, 0.00538532961931289, -0.000184706316956003,
-0.00314058747459822, -0.00667160859896218, -0.00876865671641789,
-0.00978731413514028, -0.0108344421212697, -0.011337432744043,
-0.0132167152575316, -0.0165452038605476, -0.0170238333667134,
-0.0205786471067644, -0.0131058872477637, -0.00969645868465432,
-0.00787569178373771, -0.0100836730315383, -0.00845253576072823,
-0.00415300546448082, -0.00724319578577712, -0.00729604244970149,
-0.00400890868596881, -0.00447227191413228, -0.00336927223719689
)), row.names = c(NA, -31L), class = "data.frame")
Dput of the countries data frame:
structure(list(Year = numeric(0), Entity = character(0), Code = character(0),
gini = character(0), palma = numeric(0), dp1 = numeric(0),
dp2 = numeric(0), dp3 = numeric(0), dp4 = numeric(0), dp5 = numeric(0),
dp6 = numeric(0), dp7 = numeric(0), dp8 = numeric(0), dp9 = numeric(0),
dp10 = numeric(0)), class = c("grouped_df", "tbl_df", "tbl",
"data.frame"), row.names = integer(0), groups = structure(list(
Entity = character(0), Year = numeric(0), .rows = structure(list(), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = integer(0), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
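The growth rule described in the question (start from each country's 1990 value and compound the global percentage changes) can be sketched with cumprod(). This is only an illustration under stated assumptions: it covers a single decile (dp1) for 1990 to 1992, takes the two global changes from the world data, uses the Afghanistan value of 1.92 from the example, and adds a made-up second country (CountryB).

```r
# Global percentage changes for dp1 (first value is NA because 1990 is the base year).
world <- data.frame(
  Year       = 1990:1992,
  dp1_change = c(NA, -0.0285714285714284, 0.0294117647058822)
)

# 1990 starting values per country; CountryB is hypothetical.
base <- c(Afghanistan = 1.92, CountryB = 2.50)

# Cumulative growth factor relative to 1990 (the NA for the base year counts as no change).
growth <- cumprod(1 + ifelse(is.na(world$dp1_change), 0, world$dp1_change))

# Each row is a country, each column a year: base value times cumulative growth.
predicted <- outer(base, growth)
dimnames(predicted) <- list(names(base), as.character(world$Year))

print(round(predicted, 3))
```

Because every country is multiplied by the same cumulative factor, each one restarts from its own 1990 value without any grouped mutate. The same idea could be repeated for dp2 to dp10.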

Run Forecasting model with multiple Dependent and Independent variables in R

I have a data set with 7 columns, including the date column. My dependent variables are NORTH and YORKSANDTHEHUMBER, and the rest are independent variables. I want to automate the process: take the first dependent variable, NORTH, and run it against each independent variable one at a time, so that the first model gives the result for NORTH and x1, the second for NORTH and x2, and so on. I tried to do this with a for loop but could not make it work. Can anyone please guide me?
Data:
structure(list(Date = structure(c(289094400, 297043200, 304992000,
312854400, 320716800, 328665600, 336614400, 344476800, 352252800,
360201600, 368150400, 376012800, 383788800, 391737600, 399686400,
407548800, 415324800, 423273600, 431222400, 439084800, 446947200,
454896000, 462844800, 470707200, 478483200, 486432000, 494380800,
502243200, 510019200, 517968000, 525916800, 533779200, 541555200,
549504000, 557452800, 565315200, 573177600, 581126400, 589075200,
596937600, 604713600, 612662400, 620611200, 628473600, 636249600,
644198400, 652147200, 660009600, 667785600, 675734400, 683683200,
691545600, 699408000, 707356800, 715305600, 723168000, 730944000,
738892800, 746841600, 754704000, 762480000, 770428800, 778377600,
786240000, 794016000, 801964800, 809913600, 817776000, 825638400,
833587200, 841536000, 849398400, 857174400, 865123200, 873072000,
880934400, 888710400, 896659200, 904608000, 912470400, 920246400,
928195200, 936144000, 944006400, 951868800, 959817600, 967766400,
975628800, 983404800, 991353600, 999302400, 1007164800, 1014940800,
1022889600, 1030838400, 1038700800, 1046476800, 1054425600, 1062374400,
1070236800, 1078099200, 1086048000, 1093996800, 1101859200, 1109635200,
1117584000, 1125532800, 1133395200, 1141171200, 1149120000, 1157068800,
1164931200, 1172707200, 1180656000, 1188604800, 1196467200, 1204329600,
1212278400, 1220227200, 1228089600, 1235865600, 1243814400, 1251763200,
1259625600, 1267401600, 1275350400, 1283299200, 1291161600, 1298937600,
1306886400, 1314835200, 1322697600, 1330560000, 1338508800, 1346457600,
1354320000, 1362096000, 1370044800, 1377993600, 1385856000, 1393632000,
1401580800, 1409529600, 1417392000, 1425168000, 1433116800, 1441065600,
1448928000, 1456790400, 1464739200, 1472688000, 1480550400, 1488326400,
1496275200, 1504224000, 1512086400, 1519862400, 1527811200, 1535760000,
1543622400, 1551398400, 1559347200, 1567296000, 1575158400, 1583020800,
1590969600, 1598918400, 1606780800, 1614556800, 1622505600, 1630454400,
1638316800), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Industrialproduction = c(8.2, 8.79, 0.94, 1.53, -3.18, -8.66,
-8.96, -11.93, -8.14, -4.5, 1.53, 2.06, 2.39, 2.02, 2.01,
1.68, 2.16, 2.15, 3.77, 5.95, 3.58, 0.81, -1.58, -1.72, 3.62,
9.78, 8.51, 3.49, 1.97, -1.02, 1.92, 6.13, 3.87, 3.54, 2.76,
4.19, 4.73, 4.84, 6.64, 3.88, 2.05, 1.36, 0.53, 1.47, 1.61,
3.22, -1.45, -2.76, -3.83, -5.06, -4.01, -1.76, -0.27, -0.82,
2.23, 0.69, 1.38, 2.07, 2.32, 4.1, 4.61, 5.68, 6.13, 5.91,
2.85, 1.66, 1, 0.37, 2.52, 1.26, 1.24, 1.48, 0.37, 2.24,
2.7, 4.38, 7.6, 3.89, 0.84, -0.82, -0.46, 5.61, 9.48, 5.06,
1.95, 2.1, 1.08, 6.27, 1.46, 2.28, 3.21, 3.37, 12.94, -1.06,
-2.07, -6.22, -5.19, 6.65, 6.78, 4.35, -2.69, -1.31, -2.08,
3.44, -3.08, -0.92, -1.62, -0.91, 8.32, 2.57, 4.33, 2.44,
1.52, -1.3, -4.94, -3.97, -3.59, -1.83, 1.77, -1.86, -4.86,
-5.07, -7.55, -5.37, -0.33, -1.2, -0.11, -1.11, -8.39, -5.4,
-5.52, -4.16, 0.12, -0.7, -0.58, -0.59, 0.48, 3.87, 5.29,
7.91, 7.21, -0.45, -2.23, -1.86, 4.19, 5.9, 5.94, 2.45, 0,
-0.75, -1.08, 1.63, -3.28, -0.22, 3.49, 1.07, 1.53, 5.3,
4.21, 6.14, 10.24, 2.26, 0.71, -1.3, -8.9, -12.36, -5.02,
-2.83, 3.76, 9.86, 1.9, 0.94), Householdconsumption = c(30.09,
32.53, 33.35, 35.23, 37.18, 37.59, 38.89, 39.82, 41.56, 42.7,
43.74, 45.03, 46.19, 46.95, 48.29, 49.84, 51.26, 52.15, 53.5,
54.36, 55.4, 56.7, 57.05, 58.88, 60.09, 61.44, 63.27, 64.74,
66.63, 68.35, 69.55, 70.81, 72.3, 74.29, 76.65, 78.82, 81.51,
83.81, 86.53, 88.4, 90.29, 92.46, 93.95, 95.99, 97.85, 100.83,
102.42, 104.05, 106.08, 107.79, 109.33, 110.63, 111.71, 113.52,
114.9, 116.02, 118.31, 119.4, 122.27, 124.05, 125.13, 125.99,
127.59, 129.19, 130.16, 132.29, 135.06, 136.61, 139.34, 142.14,
144.59, 146.95, 149.43, 151.71, 155.34, 156.37, 158.39, 160.69,
164.47, 164.41, 167.54, 169.48, 170.09, 172.51, 176.26, 177.61,
179.44, 180.28, 182.96, 184.01, 186.83, 186.34, 188.79, 190.18,
191.94, 194.56, 196.46, 198.86, 201.75, 203.09, 205.24, 208.26,
210.84, 213.9, 216.18, 217.54, 220.61, 222.9, 223.67, 227.66,
230.62, 232.57, 234.8, 237.82, 241.91, 244.47, 248.84, 248.63,
248.14, 243.9, 241.46, 239.04, 240.72, 243.03, 241.87, 248,
249.95, 251.91, 254.92, 254.81, 257.17, 261.11, 262.28, 265.29,
266.74, 271.42, 274.28, 277.61, 282.48, 282.94, 285.76, 290.21,
292.88, 294.9, 296.07, 299.14, 302.58, 302.82, 309.63, 313.2,
318.64, 320.87, 323.41, 325.57, 326.56, 329.67, 335.95, 337.61,
341.08, 345.09, 346.16, 350.18, 350.23, 347.89, 339.85, 270.86,
325.65, 320.28, 311.3, 341.24, 354.61, 361.47), Investmentgrowth = c(17.3,
22.73, 25.8, 29.99, 21.59, 15.49, 11.11, 6.04, 4.23, 4.42,
4.28, 3.51, 6.53, 8.81, 10.52, 12.63, 14.6, 8.04, 7.42, 10.72,
11.15, 16.11, 15.45, 11.36, 18.41, 8.32, 8.99, 8.18, 0.86,
5.04, 9.07, 14.27, 11.11, 19.61, 23.14, 19.47, 27.16, 24.6,
17.45, 16.17, 20.57, 17.01, 17.76, 15.36, 8.28, 7.05, 2.92,
2.83, -3.08, -4.32, -7.48, -6.69, -3.71, -4.64, -3.87, -4.88,
-1.72, -0.38, 1.97, 4.65, 2.84, 2.98, 3.68, 2.88, 0.69, 3.5,
4.91, 5.66, 11.3, 13.85, 10.87, 4.01, -5.63, -8.06, -3.81,
3.94, 10.74, 9.14, 3.83, 3.36, 3.29, 3.24, 7.59, 3.43, 7.05,
13.14, 1.12, 7.68, 4.22, 1.34, 9.27, 0.78, 0.66, -1.52, 4.17,
12.34, 11.74, 5.2, 1.89, -1.56, 2.26, 5.89, 5.79, 4.84, 3.44,
7.15, 7.27, 7.31, 6.11, 5.7, 8.15, 6.96, 7.79, 10.05, 2.71,
9.61, 4.63, 2.72, 1.13, -6.1, -8.98, -14.36, -9.8, -11.41,
-3.13, 1.28, 3.81, 9.18, 1.62, 2.05, 2.14, 2.03, 7.32, 3.88,
0.09, 3.44, -1.27, 6.8, 10.41, 5.73, 12.93, 7.89, 6.8, 7.92,
8.2, 9.32, 6.18, 7.39, 5.22, 6.07, 9.44, 5.64, 6.8, 7.2,
4.77, 6.83, 3.74, 1.63, 2.59, 1.17, 4.39, 3.28, 3.78, 2.18,
-1.93, -19.78, -7.51, -2.54, -0.99, 23.33, 6.54, 4.25), ConsumerPriceIndex = c(24.88,
25.94, 27.55, 28.28, 29.79, 31.39, 31.92, 32.55, 33.55, 34.94,
35.55, 36.48, 37.02, 38.14, 38.14, 38.45, 38.73, 39.54, 40.1,
40.49, 40.76, 41.57, 41.99, 42.35, 43.25, 44.46, 44.46, 44.74,
45.06, 45.58, 45.81, 46.42, 46.88, 47.49, 47.72, 48.14, 48.44,
49.43, 49.83, 50.33, 50.82, 52.02, 52.42, 53.11, 53.91, 55.6,
56.69, 57.09, 57.59, 60.27, 60.67, 61.27, 61.67, 62.56, 62.56,
62.86, 63.16, 64.05, 64.45, 64.35, 64.55, 65.35, 65.45, 65.64,
66.24, 67.04, 67.34, 67.63, 68.03, 68.63, 68.93, 69.13, 69.13,
69.82, 70.22, 70.32, 70.7, 71.3, 71.5, 71.8, 71.9, 72.3,
72.4, 72.6, 72.3, 72.9, 73.1, 73.2, 73, 74.1, 74.1, 74, 74.1,
74.6, 74.8, 75.2, 75.3, 75.4, 75.9, 76.2, 76.1, 76.6, 76.7,
77.4, 77.5, 78.1, 78.6, 78.9, 78.9, 80.1, 80.5, 81.3, 81.4,
82, 81.9, 83, 83.4, 85.2, 86.1, 85.5, 85.8, 86.7, 87.1, 88,
88.7, 89.5, 89.8, 91.2, 92.2, 93.3, 94.4, 95.1, 95.4, 95.5,
96.5, 97.6, 98.1, 98.3, 99.1, 99.6, 99.7, 100.2, 100.3, 100.1,
99.7, 100.2, 100.2, 100.3, 100.2, 100.6, 101.1, 101.9, 102.7,
103.5, 104.3, 105, 105.1, 105.9, 106.6, 107.1, 107, 107.9,
108.4, 108.5, 108.6, 108.8, 109.2, 109.4, 109.7, 111.4, 112.4,
114.7), NORTH = c(4.06976744186047, 5.51675977653633, 7.2799470549305,
4.75015422578655, 4.59363957597172, 3.15315315315317, 1.2008733624454,
-0.377562028047452, -0.108283703302655, 0.650406504065032,
0.969305331179318, 0.106666666666688, 3.09003729355352, 2.11886304909562,
2.32793522267207, 5.68743818001977, -1.46934955545156, 3.95611702127658,
5.19438987619354, -0.0912012507600199, 2.81677896109541,
3.97412590369087, 1.30118326353028, 3.31553807249226, 1.32872294960955,
2.93700394923507, 0.908853875665812, 1.81241002546971, -1.3414545718222,
4.81772747317361, -3.4743890895067, 4.63823913990992, 0.857370960463727,
1.78620594713658, 0.527472527472524, -4.05973562947765, -0.136726966764838,
3.16657890117607, 5.95161125667812, 8.01002055498458, 10.5501040737437,
13.4138468987035, 2.93371279497212, 8.84291046495554, -6.87764606265876,
2.90741287990725, 3.71548486856639, 1.23317430567388, -1.1153443739474,
4.31313207880924, -1.64273763383666, 0.751373343751978, -3.21877014345816,
1.16314882913623, -3.59065232516701, -4.65283582701413, 4.98489115166134,
3.18459755147199, -3.72875180849018, 2.20137289784552, -4.22488416879167,
-0.706371260732776, -2.33320725244584, -2.77596063540517,
9.48636128308308, -2.15172116987927, -5.71766285746257, 1.92271571537407,
0.655934629757954, 4.01517293049256, -2.89270965830984, 3.910032505864,
-1.31616434600239, 1.51533020314829, 3.09793915477058, 1.00146317751519,
-0.516295759142123, 4.36356154298765, -0.254418667464494,
-1.38015492270122, -0.375369475589906, 3.79511767246943,
1.67693295616696, 0.197127124553074, -1.01758464617007, 5.70477696100394,
-1.37564670926045, 1.39335708665185, 2.29473337483174, -1.40489357721877,
10.7514355294201, -0.403985348024547, -0.0106181613732362,
10.6504339189417, 7.72602065226992, 6.66622841015428, 7.3618861388054,
7.20852539277177, 7.17954849482943, 5.47999408979134, 9.96115783870405,
6.960515961579, 4.82626274289161, -0.428385428540776, 1.6283388103162,
2.07440844957785, -0.707412409361252, -4.9247119657169, 4.3311229522328,
2.53158682305453, -0.8800288960527, 2.40275362264064, 0.67520264383003,
3.97711266595697, 0.00749650524863867, -0.990038901876062,
-0.63991866618197, -2.00199671222057, -5.15098853828302,
-3.65317386916235, -4.67277715297035, -0.564594703469009,
3.29526766976492, 0.0888482310529472, -0.524228981506815,
3.04012050839788, -1.53185447929528, -0.338917708381546,
-2.5450727924491, 3.36238295093309, -0.918735392055365, -0.766840492430499,
-0.767135363240273, 0.0468961039030733, 1.51618073336643,
-2.02356670927575, -1.11584500803018, 2.45568937824186, 0.989863990072745,
-0.4214032191629, 2.8219393653178, 4.51474479784726, -2.49049271581373,
-0.41346860604498, 3.13864420514751, -0.0877964623534655,
-0.674347043417658, -0.143267961613368, -0.243406512930108,
0.0402054219496719, 0.12912750657269, 0.168664845016241,
-0.713623226415894, 1.49163339466038, 1.57747101133233, -2.10536689354583,
3.12980292320487, -0.90833324273064, -1.71375697178543, 0.582188469928239,
2.89692448021907, 0.0768238907010953, -1.53392147948349,
1.23622644511851, -0.0506227154778281, 0.327869614383542,
2.62019966395382, 3.48629495563575, 0.593740862165774, 4.09560684327741,
2.32207959691005, 0.506809670097958), YORKSANDTHEHUMBER = c(4.0121120363361,
5.45851528384282, 9.52380952380951, 6.04914933837431, 3.03030303030299,
5.42099192618225, 2.78993435448577, -0.53219797764768, 1.97966827180309,
1.15424973767052, 0.466804979253115, -1.96179659266907, 2.42232754081095,
0.719794344473031, -0.306278713629415, 3.37941628264209,
2.74393263992076, 3.91920555341303, 1.91585099967527, 0.892125625853447,
2.91888477848958, 3.78293078507868, 0.109815847271484, 6.83486625601216,
0.722691730511011, 3.56008625759656, -0.227160867754524,
2.69419041475355, -1.17134094520194, 2.78546324684064, 1.01487759630426,
1.54843356139717, 4.15602836879435, 4.43619773934357, -0.309698451507728,
-1.45519947678222, -1.09839057574248, 9.08267346664877, 11.8913598474363,
13.9511229623114, 9.71243848306475, 7.66524473371739, 6.46801731884651,
-2.26736490763654, -4.35729847494552, -2.93870179974964,
-7.72353426221536, -7.01127302722023, 2.02543627323513, 2.51245245873873,
0.712134856164617, -2.74951902189779, 3.20525370229387, -2.17225212432703,
0.304311135936791, -5.21962007478405, -1.22771231792975,
5.62676205566459, -0.0988236572110239, 0.865912760888606,
-3.71050647202427, 1.5475703474865, -3.43233328040058, -2.86288061069106,
-0.551968808874026, 2.05442655433966, 0.388675938226524,
-2.60493926554792, -2.23312255163324, 5.04817095211292, 1.43656632546456,
2.53687507970646, -2.37376845704496, 4.95419269721737, 2.5486061891899,
-0.64046817419928, 1.75846231104579, 0.542834308795226, -0.322606591645488,
-2.67961743436791, 3.57498650723638, 2.89743475977992, 1.28567849851333,
1.828392232888, -0.335580970541442, 5.34860062451308, -2.98213938289875,
3.55468980520775, 2.76514398982056, 3.45832186518539, 1.32470422187813,
2.79428923624948, 3.8093136923264, 9.02544568216825, 7.65854560247412,
11.0775256253873, -0.658987130155868, 10.726463566155, 5.35747018223358,
4.66387144397987, 5.14763674355188, 10.581371911713, 3.46926043870116,
-0.000369065205607915, 0.924675325682334, 3.681119585314,
-0.0731638011738147, 0.690177922935143, 1.33427941484383,
2.65734876034112, 1.62515008951355, 1.48038293242949, 0.494192527588077,
2.39510739408179, 0.818557817036399, -1.1083492547105, -1.89465779498896,
-3.74953204588813, -3.7238074999174, -4.9788025925358, -4.65464963206228,
3.34588197167384, 2.20886725349025, 1.99954661835316, -0.777545762347822,
3.58681336123701, -2.96757202302368, -3.36310924643208, 2.01483012871867,
2.4154475314586, -0.642314624781054, -2.0920093049768, -1.73904001349183,
1.69071701857513, 0.201962934561265, -2.66472457335063, 0.323680874793625,
1.37879437405697, 3.26467995053582, 2.21645486418079, -0.646736928898328,
2.06516965491332, 1.8250141624007, -1.68545096699093, -0.818973277015041,
4.05215303886115, -1.16233786449552, -1.56747999678074, 0.67708495662531,
2.92754908797974, 1.50505329502891, -1.12667258046976, -0.765034978617734,
2.67854615526131, -0.306294171526678, 0.175047038539941,
1.56451236885344, 0.618844724791642, 3.34585295985361, -1.76420421213768,
-0.079420811764984, 1.56942028744185, 0.407910173531572,
-0.268243129544691, 2.57107118459526, -0.758721256899304,
3.03713057699041, 2.68699850192726, 1.88666482868311, 4.78697689266296,
2.43248653386118, 1.27252711337855)), row.names = c(NA, -172L
), class = "data.frame")
Code:
library(tseries)
library(dplyr)
library(ARDL)   # provides auto_ardl()
library(dLagM)  # provides ardlDlm() and forecast()
# ARDL MODELING AND FORECASTING
# Split into in-sample (before March 2020) and out-of-sample data
in_sampleARDL <- data %>%
  dplyr::filter(Date < '2020-03-01')
out_sampleARDL <- data %>%
  dplyr::filter(Date >= '2020-03-01')
# Model 1: NORTH against Householdconsumption
auto_ardl(NORTH ~ Householdconsumption,
          data = in_sampleARDL, max_order = 4, selection = 'BIC')
pred1 <- forecast(ardlDlm(formula = NORTH ~ diff(Householdconsumption),
                          data = in_sampleARDL, p = 3),
                  x = out_sampleARDL$NORTH, h = 4)
error1 = out_sampleARDL$NORTH[1:2] - pred1[["forecasts"]]
mean(error1^2)
# Model 2: NORTH against Industrialproduction
auto_ardl(NORTH ~ Industrialproduction,
          data = in_sampleARDL, max_order = 4, selection = 'BIC')
pred2 <- forecast(ardlDlm(formula = NORTH ~ Industrialproduction,
                          data = in_sampleARDL, p = 3),
                  x = out_sampleARDL$NORTH, h = 4)
error2 = out_sampleARDL$NORTH[1:4] - pred2[["forecasts"]]
mean(error2^2)
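The looping part can be sketched by building each formula from column names with reformulate(). To keep the example self-contained, it uses random toy data and lm() as a stand-in for the ARDL fitting step; the object names toy, dependents, independents and models are only illustrative. With the real data, Date would also need to be excluded from the independent variables.

```r
# Toy data with the same kind of layout (random values, not the real series).
set.seed(42)
toy <- data.frame(
  NORTH                = rnorm(30),
  YORKSANDTHEHUMBER    = rnorm(30),
  Industrialproduction = rnorm(30),
  ConsumerPriceIndex   = rnorm(30)
)

dependents   <- c("NORTH", "YORKSANDTHEHUMBER")
independents <- setdiff(names(toy), dependents)

# Fit one model per (dependent, independent) pair and store it by name.
models <- list()
for (dep in dependents) {
  for (ind in independents) {
    f <- reformulate(ind, response = dep)  # e.g. NORTH ~ Industrialproduction
    # lm() is a stand-in; the ARDL fitting and forecasting calls would go here.
    models[[paste(dep, ind, sep = " ~ ")]] <- lm(f, data = toy)
  }
}

print(names(models))
```

The forecast error for each pair could be collected in the same loop, for example in a named numeric vector.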

Frequency table for intervals

I saved my data in the object datos so that I could calculate the absolute frequency (AF) and relative frequency (RF) of a continuous variable in column V1, but I want the frequencies to be computed per interval.
I don't really know how to do it, so I need your help. Here is my code, where k is the number of intervals I'm using and largo is the number of observations I have:
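# Read the data; with header = FALSE the single column is named V1
datos <- read.table("datos.txt", header = FALSE)
largo <- length(datos$V1)
# Number of intervals: 1 + log2(n), rounded to the nearest integer
k <- round(1 + log2(largo), digits = 0)
vectordatos <- datos$V1
histograma <- hist(datos$V1, breaks = k)
# Frequencies per distinct value (not yet per interval)
FA <- table(datos$V1)
FR <- table(datos$V1) / largo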
FA
FR
The datos object is as follows:
datos = structure(list(V1 = c(6.16, 5.83, 5.66, 3.63, 1.38, 9.64, 7.46,
5.34, 7.93, 8.5, 4.18, 5.18, 10.27, 5.41, 4.76, 4.67, 10.02,
7.1, 5.38, 8.55, 4.85, 8.28, 2.9, 7.18, 6.54, 5.66, 7.26, 6.45,
3.97, 6.55, 5.15, 7.83, 5.52, 7.21, 7.3, 6.19)), class = "data.frame", row.names = c(NA,
-36L))
You can use cut() to split the values into k equal-width intervals and table() to count the observations in each interval:
table(cut(datos$V1,k))
Output:
(1.37,2.86] (2.86,4.34] (4.34,5.83] (5.83,7.31] (7.31,8.79] (8.79,10.3]
          1           4          11          11           6           3
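Since the question asks for both the absolute and the relative frequency per interval, the same idea can be extended with prop.table(). The following is only a minimal sketch using the data from the question; the names intervalos and tabla are illustrative and not part of the original code.

```r
# Data from the question
datos <- data.frame(V1 = c(6.16, 5.83, 5.66, 3.63, 1.38, 9.64, 7.46,
  5.34, 7.93, 8.5, 4.18, 5.18, 10.27, 5.41, 4.76, 4.67, 10.02,
  7.1, 5.38, 8.55, 4.85, 8.28, 2.9, 7.18, 6.54, 5.66, 7.26, 6.45,
  3.97, 6.55, 5.15, 7.83, 5.52, 7.21, 7.3, 6.19))

largo <- nrow(datos)
k <- round(1 + log2(largo))   # number of intervals, as in the question

# Assign every value to one of k equal-width intervals
intervalos <- cut(datos$V1, breaks = k)   # illustrative name

FA <- table(intervalos)   # absolute frequency per interval
FR <- prop.table(FA)      # relative frequency per interval (sums to 1)

# Collect both in a single data frame (illustrative name)
tabla <- data.frame(intervalo = names(FA),
                    FA = as.vector(FA),
                    FR = round(as.vector(FR), 3))
tabla
```

prop.table(FA) is equivalent to FA / sum(FA), so it gives the same result as dividing by largo when there are no missing values.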

R - Combine data frames to a table, separating values with a slash ("/")

I am working with the data frames shown below:
tbl45 <- structure(list(`2010's` = c(0.48, 1.45, 33.33, 25.6, 32.37, 6.76
), `2020's` = c(0.48, 0.97, 31.88, 36.71, 28.5, 1.45), `2030's` = c(0.48,
1.93, 27.54, 34.3, 33.33, 2.42), `2040's` = c(0.48, 1.93, 33.33,
26.57, 28.5, 9.18), `2050's` = c(0.48, 1.93, 33.33, 26.09, 32.85,
5.31), `2060's` = c(0.48, 3.38, 25.6, 32.37, 36.23, 1.93), `2070's` = c(0.48,
1.93, 33.82, 28.99, 31.4, 3.38), `2080's` = c(0.48, 2.42, 34.3,
31.4, 28.99, 2.42), `2090's` = c(0.48, 2.42, 31.4, 33.33, 29.95,
2.42)), .Names = c("2010's", "2020's", "2030's", "2040's", "2050's",
"2060's", "2070's", "2080's", "2090's"), row.names = c("[0,100]",
"(100,200]", "(200,300]", "(300,400]", "(400,500]", "(500,600]"
), class = "data.frame")
tbl85 <- structure(list(`2010's` = c(0.48, 1.45, 31.4, 30.43, 34.78, 1.45
), `2020's` = c(0.48, 1.45, 36.23, 29.95, 30.43, 1.45), `2030's` = c(0.48,
1.93, 32.37, 28.02, 34.3, 2.9), `2040's` = c(0.48, 2.9, 30.43,
33.33, 31.4, 1.45), `2050's` = c(0.48, 2.9, 32.85, 30.43, 29.47,
3.86), `2060's` = c(0.48, 4.83, 33.33, 30.43, 26.57, 4.35), `2070's` = c(0.48,
5.8, 31.88, 36.23, 24.15, 1.45), `2080's` = c(0.48, 5.8, 35.27,
33.82, 23.19, 1.45), `2090's` = c(1.45, 8.21, 38.16, 32.85, 17.87,
1.45)), .Names = c("2010's", "2020's", "2030's", "2040's", "2050's",
"2060's", "2070's", "2080's", "2090's"), row.names = c("[0,100]",
"(100,200]", "(200,300]", "(300,400]", "(400,500]", "(500,600]"
), class = "data.frame")
I would like to combine them into a single table (or data frame), with the corresponding values separated by a slash ("/") or placed in parentheses. I will then save it as an .xls file and copy the table into Word.
The final result would look something like this (only the first column is shown, for simplicity):
          2010's
[0,100]   0.48 / 0.48
(100,200] 1.45 / 1.45
(200,300] 33.33 / 31.40
(300,400] 25.60 / 30.43
(400,500] 32.37 / 34.78
(500,600] 6.76 / 1.45
How can I achieve that using R?
Try this:
res <- mapply(function(x,y) paste(x,y, sep = "/"), tbl45, tbl85)
rownames(res) <- rownames(tbl45)
res
          2010's        2020's        2030's        2040's        2050's        2060's
[0,100]   "0.48/0.48"   "0.48/0.48"   "0.48/0.48"   "0.48/0.48"   "0.48/0.48"   "0.48/0.48"
(100,200] "1.45/1.45"   "0.97/1.45"   "1.93/1.93"   "1.93/2.9"    "1.93/2.9"    "3.38/4.83"
(200,300] "33.33/31.4"  "31.88/36.23" "27.54/32.37" "33.33/30.43" "33.33/32.85" "25.6/33.33"
(300,400] "25.6/30.43"  "36.71/29.95" "34.3/28.02"  "26.57/33.33" "26.09/30.43" "32.37/30.43"
(400,500] "32.37/34.78" "28.5/30.43"  "33.33/34.3"  "28.5/31.4"   "32.85/29.47" "36.23/26.57"
(500,600] "6.76/1.45"   "1.45/1.45"   "2.42/2.9"    "9.18/1.45"   "5.31/3.86"   "1.93/4.35"
          2070's        2080's        2090's
[0,100]   "0.48/0.48"   "0.48/0.48"   "0.48/1.45"
(100,200] "1.93/5.8"    "2.42/5.8"    "2.42/8.21"
(200,300] "33.82/31.88" "34.3/35.27"  "31.4/38.16"
(300,400] "28.99/36.23" "31.4/33.82"  "33.33/32.85"
(400,500] "31.4/24.15"  "28.99/23.19" "29.95/17.87"
(500,600] "3.38/1.45"   "2.42/1.45"   "2.42/1.45"
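Note that mapply() returns a character matrix here, not a data frame. If a data frame is needed for export, one possible follow-up is sketched below. The small tables a and b are made-up stand-ins for tbl45 and tbl85, and out_file is a hypothetical path; writing a CSV file is an assumption on my part, since base R has no .xls writer and a spreadsheet program can open the CSV instead.

```r
# Hypothetical stand-ins for tbl45 and tbl85 (two columns, three rows)
a <- data.frame(`2010's` = c(0.48, 1.45, 33.33),
                `2020's` = c(0.48, 0.97, 31.88),
                row.names = c("[0,100]", "(100,200]", "(200,300]"),
                check.names = FALSE)
b <- data.frame(`2010's` = c(0.48, 1.45, 31.4),
                `2020's` = c(0.48, 1.45, 36.23),
                row.names = rownames(a),
                check.names = FALSE)

# Pair the columns of a and b and paste them element by element
res <- mapply(function(x, y) paste(x, y, sep = "/"), a, b)
rownames(res) <- rownames(a)

# Convert the character matrix to a data frame; column names are kept
res_df <- as.data.frame(res, stringsAsFactors = FALSE)

# Write the table to a CSV file (hypothetical temporary path)
out_file <- tempfile(fileext = ".csv")
write.csv(res_df, out_file)
```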
We could also do this by unlisting both datasets and pasting the resulting vectors together. Assigning to res[] keeps the row and column names of tbl45:
res <- tbl45
res[] <- paste(unlist(tbl45), unlist(tbl85), sep='/')
res
#               2010's      2020's      2030's      2040's      2050's
#[0,100]     0.48/0.48   0.48/0.48   0.48/0.48   0.48/0.48   0.48/0.48
#(100,200]   1.45/1.45   0.97/1.45   1.93/1.93    1.93/2.9    1.93/2.9
#(200,300]  33.33/31.4 31.88/36.23 27.54/32.37 33.33/30.43 33.33/32.85
#(300,400]  25.6/30.43 36.71/29.95  34.3/28.02 26.57/33.33 26.09/30.43
#(400,500] 32.37/34.78  28.5/30.43  33.33/34.3   28.5/31.4 32.85/29.47
#(500,600]   6.76/1.45   1.45/1.45    2.42/2.9   9.18/1.45   5.31/3.86
#               2060's      2070's      2080's      2090's
#[0,100]     0.48/0.48   0.48/0.48   0.48/0.48   0.48/1.45
#(100,200]   3.38/4.83    1.93/5.8    2.42/5.8   2.42/8.21
#(200,300]  25.6/33.33 33.82/31.88  34.3/35.27  31.4/38.16
#(300,400] 32.37/30.43 28.99/36.23  31.4/33.82 33.33/32.85
#(400,500] 36.23/26.57  31.4/24.15 28.99/23.19 29.95/17.87
#(500,600]   1.93/4.35   3.38/1.45   2.42/1.45   2.42/1.45
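The desired output in the question shows two decimals and spaces around the slash ("33.33 / 31.40"), whereas paste() drops trailing zeros. One way to get that format is to swap paste() for sprintf(). This is only a sketch under that assumption; a and b are made-up stand-ins for tbl45 and tbl85.

```r
# Hypothetical stand-ins for tbl45 and tbl85 (two columns, three rows)
a <- data.frame(`2010's` = c(0.48, 1.45, 33.33),
                `2020's` = c(0.48, 0.97, 31.88),
                row.names = c("[0,100]", "(100,200]", "(200,300]"),
                check.names = FALSE)
b <- data.frame(`2010's` = c(0.48, 1.45, 31.4),
                `2020's` = c(0.48, 1.45, 36.23),
                row.names = rownames(a),
                check.names = FALSE)

# Copy a so that the result inherits its row and column names
res <- a

# unlist() flattens each data frame column by column; sprintf() is
# vectorised and formats every pair with exactly two decimals
res[] <- sprintf("%.2f / %.2f", unlist(a), unlist(b))
res
```

Because both data frames are flattened in the same column-by-column order, this relies on a and b having identical dimensions and the same row and column order.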

Resources