Decoding Unknown Data Type - decode

I have received some encoded data from an Arduino via PySerial. I have access to an application that decodes the data, but I do not have its source code and I need to know what it is doing.
Data file contents:
%N|nkNsnrNlnzNqnEOknJOlM
%VA#_##hpZzbdIvzegvxefvkeavdeXvXeXvPeMvReDvlM
%PaA#gH#lnMO#QaLN#mbzM#cbmM#^beM#Pb_M#Fb]M#xaUM#balM
%Ma##HI#FzJP#auPO#~uPO#{uPO#}uMO#vuN#wuyN#uuqN#xulM
%knOOinSOXnMOAnFOcmxNYmlNBm_NslSNqlHNclnM^N
%PezuReouLeluDeju~diuFe`uBeXuAeUu~dJuxdAu^N
%MM#NaJM#`MM#t`VM#h`aM#f`fM#Y`jM#O`mM#G`uM#{_BN#u_^N
%rN#tuhN#nu[N#kuRN#huEN#au{M#[uqM#Nu^M#CuFM#ttuL#at^N
%XlPMMlvLMlWLPlBLVllKMlWKDlCKKlrJNl[J`lHJPO
%pd|trdrttdjtudbtmd_tkd[tkdWtmdOtldGtvdHtPO
Output from application:
86 31 -48 97 -51 33 -1109 -3121
-984 -358 551 -1108 584 -378 -1111 -3117
-1758 -631 973 -1967 1034 -671 -1128 -3123
-1670 -601 908 -1875 976 -642 -1151 -3130
-1672 -602 890 -1885 976 -645 -1181 -3144
-1685 -607 877 -1890 976 -643 -1191 -3156
-1692 -616 869 -1904 973 -650 -1214 -3169
-1704 -616 863 -1914 959 -649 -1229 -3181
-1712 -627 861 -1928 953 -651 -1231 -3192
-1710 -636 853 -1950 945 -648 -1245 -3218
-1712 -646 845 -1970 946 -652 -1256 -3248
-1710 -657 842 -1985 936 -658 -1267 -3274
-1716 -660 845 -1996 923 -661 -1267 -3305
-1724 -662 854 -2008 914 -664 -1264 -3326
-1730 -663 865 -2010 901 -671 -1258 -3348
-1722 -672 870 -2023 891 -677 -1267 -3369
-1726 -680 874 -2033 881 -690 -1276 -3389
-1727 -683 877 -2041 862 -701 -1269 -3406
-1730 -694 885 -2053 838 -716 -1266 -3429
-1736 -703 898 -2059 821 -735 -1248 -3448
I have tried several encodings, such as ASCII, UTF-8, and uuencoding, but none has given me any tangible results.
Does anyone have an idea as to what this could be?
Thanks in advance,
Cheers

Related

Automize portfolios volatilities computation in R

Thanks for reading my post. I have a series of portfolios, each created by combining several stocks. I need to compute the volatility of those portfolios using the historical daily performance of each stock. I have all the combinations in one data frame (called final_output) and all stock returns in another data frame (called perf, where the columns are stocks and the rows are days), but I don't know the most efficient way to automate the process. Below is an extract:
> Final_output
ISIN_1 ISIN_2 ISIN_3 ISIN_4
2 CH0595726594 CH1111679010 XS1994697115 CH0587331973
3 CH0595726594 CH1111679010 XS1994697115 XS2027888150
4 CH0595726594 CH1111679010 XS1994697115 XS2043119358
5 CH0595726594 CH1111679010 XS1994697115 XS2011503617
6 CH0595726594 CH1111679010 XS1994697115 CH1107638921
7 CH0595726594 CH1111679010 XS1994697115 XS2058783270
8 CH0595726594 CH1111679010 XS1994697115 JE00BGBBPB95
> perf
CH0595726594 CH1111679010 XS1994697115 CH0587331973
626 0.0055616769 -0.0023656130 1.363791e-03 1.215922e-03
627 0.0086094443 0.0060037334 0.000000e+00 2.519220e-03
628 0.0053802380 0.0009027081 0.000000e+00 7.508635e-04
629 -0.0025213543 -0.0022046297 4.864050e-05 1.800720e-04
630 0.0192416817 0.0093401627 -6.079767e-03 3.800836e-03
631 -0.0101224820 0.0051741294 6.116956e-03 -1.345184e-03
632 -0.0013293793 -0.0100475153 -4.494163e-03 -1.746106e-03
633 0.0036350604 0.0012999350 3.801130e-03 -5.997121e-05
634 0.0030097434 -0.0011484496 -1.187614e-03 -2.069131e-03
635 0.0002034381 0.0030493901 -1.851762e-03 -3.806280e-04
636 -0.0035594427 0.0167455769 -2.148123e-04 -4.709560e-04
637 0.0007654623 -0.0051958237 -3.711191e-04 1.604010e-04
638 0.0107592678 -0.0016260163 4.298764e-04 3.397951e-03
639 0.0050953486 -0.0007403020 2.011738e-03 8.790770e-04
640 0.0008532851 -0.0071121648 -9.746114e-04 5.389598e-04
641 -0.0068204614 0.0133810874 -9.755622e-05 -1.346674e-03
642 0.0091395678 0.0102591793 1.717157e-03 -1.977785e-03
643 0.0027520640 -0.0157912638 1.256440e-03 -1.301119e-04
644 -0.0048902196 0.0039494471 -1.624514e-03 -3.373340e-03
645 -0.0116838833 0.0062450826 6.625549e-04 1.205255e-03
646 0.0004566442 -0.0018570102 -3.456636e-03 4.474138e-03
647 0.0041586368 0.0085679315 4.435933e-03 1.957455e-03
648 0.0007575758 0.0002912621 0.000000e+00 2.053306e-03
649 0.0046429473 -0.0138309230 -4.435798e-03 1.541798e-03
650 0.0049731250 -0.0488164953 4.181975e-03 -9.733133e-04
651 0.0008497451 -0.0033110870 2.724477e-04 -7.555498e-04
652 0.0004494831 0.0049831300 -8.657588e-04 -1.790813e-04
653 -0.0058905751 0.0020143588 8.178287e-04 -1.213991e-03
654 0.0000000000 0.0167525773 4.864050e-05 9.365068e-04
655 0.0010043186 0.0048162231 0.000000e+00 -2.110146e-03
656 -0.0024079462 -0.0100403633 -2.431907e-03 -9.176600e-04
657 -0.0095544604 -0.0193670047 0.000000e+00 -8.935435e-03
658 0.0008123477 0.0114339172 2.437835e-03 5.530483e-03
659 0.0022828734 -0.0015415446 -3.239300e-03 2.765060e-03
660 0.0049096523 -0.0001029283 3.199079e-02 2.327835e-03
661 -0.0027702226 -0.0357198003 9.456712e-04 3.189602e-04
662 -0.0008081216 -0.0139311449 -2.891020e-02 -1.295363e-03
663 -0.0033867462 0.0068745264 -2.529552e-03 -1.496588e-04
664 -0.0015216068 -0.0558572120 -3.023653e-03 -7.992975e-03
665 0.0052829422 0.0181072771 4.304652e-03 -3.319519e-03
666 0.0084386054 0.0448545861 -8.182748e-04 4.279284e-03
667 -0.0076664829 -0.0059415480 -2.047362e-04 6.059936e-03
668 -0.0062108665 -0.0039847073 7.313506e-04 5.993467e-04
669 -0.0053350948 0.0068119154 -1.042631e-02 -2.056524e-03
670 -0.0263588067 0.0245395479 -2.188962e-02 -6.732491e-03
671 -0.0021511018 0.0220649895 1.412435e-02 1.702085e-03
672 0.0205058100 -0.0007179119 3.057527e-03 -1.002423e-02
673 0.0096862280 -0.0194488633 1.207407e-03 -1.553899e-03
674 0.0007143951 -0.0068557672 6.227450e-03 1.790274e-03
675 -0.0021926470 -0.0051114507 -6.267498e-03 -1.035691e-03
676 0.0076655765 -0.0139300847 6.583825e-03 3.059472e-03
677 -0.0032457653 0.0180480206 -4.635495e-03 1.064002e-03
678 0.0036633764 0.0060676410 -2.762676e-04 5.364970e-04
679 -0.0008111122 -0.0013635410 -1.065898e-03 1.214059e-03
680 0.0050228311 0.0055141267 3.003507e-03 1.121643e-03
681 -0.0007067495 0.0147281558 -2.699002e-03 -1.514035e-04
682 -0.0024248548 0.0002573473 -2.113685e-03 -1.423409e-03
683 -0.0002025624 0.0138417207 -4.374895e-03 1.415328e-04
684 -0.0141822418 -0.0169517332 -3.578920e-03 -1.799234e-03
685 -0.0005651749 -0.0259693324 -5.926428e-03 -3.635333e-03
686 0.0004112688 0.0133043570 -1.545642e-03 1.981828e-03
687 -0.0150565262 -0.0107757493 -1.717916e-02 -1.328749e-02
688 0.0039129754 -0.0441013167 -8.376631e-03 -5.653841e-04
689 0.0019748467 0.0115063340 -2.835394e-02 7.868428e-03
690 0.0072614108 0.0358764014 3.586897e-02 7.960077e-03
691 -0.0003604531 0.0106119001 1.024769e-04 -2.733651e-04
What I need to do is look up each portfolio (each row of final_output is a portfolio of 4 stocks) in perf and compute its volatility (standard deviation) from the stocks' historical daily performance over the last three months. (Here I have pasted the performance of only 4 stocks for simplicity.) Once that is done for the first row, I need to do the same for all the other rows (portfolios).
Below is the formula I used for computing the volatility:
# formula for computing the volatility
sqrt(t(weights) %*% covariance_matrix %*% weights)
# where covariance_matrix is
cov(portfolio_component_monthly_returns)
# all the portfolios are equally weighted
weights <- c(0.25, 0.25, 0.25, 0.25)
What I have been trying to do since yesterday is automate the process for all the rows, since I have more than 10,000 of them. I'm an RStudio novice, so even after trying and searching the web I have no results and no idea how to automate it. Would someone have a clue how to do it?
I hope I have been as clear as possible; if not, do not hesitate to ask me.
Many thanks
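The computation described in the question can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: portfolio_vol is a hypothetical helper name, the toy perf and final_output below are made-up stand-ins for the real data frames, and restricting perf to the last three months (for example with tail(), if the rows are in date order) is left to the caller.

```r
# Hypothetical helper: equal-weight volatility of one portfolio,
# where `isins` is one row of final_output and `perf` holds the returns.
portfolio_vol <- function(isins, perf) {
  isins <- as.character(isins)
  # If any stock has no return column, the volatility is undefined.
  if (!all(isins %in% colnames(perf))) return(NA_real_)
  w <- rep(1 / length(isins), length(isins))            # equal weights
  sigma <- cov(perf[, isins, drop = FALSE],
               use = "pairwise.complete.obs")           # covariance matrix
  as.numeric(sqrt(t(w) %*% sigma %*% w))
}

# Made-up stand-ins for the question's data (not the real values).
set.seed(1)
perf <- as.data.frame(matrix(rnorm(60 * 5, sd = 0.01), ncol = 5,
                             dimnames = list(NULL, c("A", "B", "C", "D", "E"))))
final_output <- data.frame(ISIN_1 = c("A", "A"), ISIN_2 = c("B", "B"),
                           ISIN_3 = c("C", "C"), ISIN_4 = c("D", "E"),
                           stringsAsFactors = FALSE)

# One volatility per row (portfolio); extra arguments are passed on by apply().
final_output$volatility <- apply(final_output[, 1:4], 1, portfolio_vol, perf = perf)
final_output
```

With more than 10,000 rows, apply() over the ISIN columns should still be workable, because each call only computes a 4 x 4 covariance matrix. For equal weights the result should match sd(rowMeans(...)) over the same columns, which gives a quick sanity check.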

R indexing and list translation

I have a list of indexes like this:
> mid_cp
[1] 3065 4871 13153 15587 18100 24010 26324 25648 38195 38196 39384 42237 45686 54217 55032 63684 62800 9134 35261 36449 36866 53968 16969
[24] 43529 46995 52351 4174 7011 18962 18151 18889 24036 32916 34061 34815 36866 51973 55802 53593 55421 56615 88 150 161 192 781
[47] 830 1300 1573 2396 2784 2547 3214 3135 3297 3301 4053 4249 4919 5856 6297 7328 7621 7708 8063 8219 8864 8887 9201
[70] 9214 9533 10334 10301 11235 10529 11356 10566 10872 12228 12250 12507 12048 12643 12913 13224 14297 16772 15363 18759 18979 16264 17363
[93] 20732 17971 22194 22422 19417 22903 22929 23087 19627 19961 23954 24297 25422 25423 25704 25765 25780 22769 22796 26871 27095 23789 24066
[116] 24069 27423 24366 24600 24871 25110 28374 26280 27873 29722 28839 29063 31031 31150 31546 32491 30356 33045 30863 33555 34201 34404 34684
[139] 35498 32912 33207 35874 33488 33716 36761 34543 36807 37000 35157 38195 38196 38458 36438 36619 39484 40109 37532 40143 40160 40458 41257
[162] 38434 38653 41866 41899 39429 42818 40001 43398 43441 40282 40566 43979 43996 40793 40806 40992 41065 41102 41330 41964 46322 43351 46670
and I have a table like this:
> head(movie.cp)
name id
252 $ (Dollars) (The Heist) 252
253 $5 a Day (Five Dollars a Day) 253
1 $9.99 1
254 $windle (Swindle) 254
255 "BBC2 Playhouse" Caught on a Train 255
256 "Independent Lens" Race to Execution 256
How do I get the mid_cp list to be a list of names using the movie.cp table?
P.S.: I am a complete newbie in R.
Are the numbers in mid_cp equivalent to movie.cp$id? If so, try mid_cp <- movie.cp$name[match(mid_cp, movie.cp$id)]
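As a small illustration of the match() lookup, here is a sketch on a toy table. The three rows are taken from the head(movie.cp) output above, while the mid_cp values (including 999, an id assumed not to be in the table) are made up for the example.

```r
# Toy version of movie.cp with three of the rows shown in the question.
movie.cp <- data.frame(
  name = c("$9.99", "$ (Dollars) (The Heist)", "$windle (Swindle)"),
  id   = c(1, 252, 254),
  stringsAsFactors = FALSE
)

# Made-up ids to translate; 999 has no row in the table.
mid_cp <- c(254, 1, 999, 252)

# match() gives, for each id in mid_cp, its row position in movie.cp$id
# (NA when the id is absent); indexing name by that position translates it.
mid_names <- movie.cp$name[match(mid_cp, movie.cp$id)]
mid_names
```

Ids without a matching row come back as NA, so they are easy to spot afterwards with is.na().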

Report the mean number of characters in Corpus document

I have a corpus set up by reading a bunch of text files with paragraphs in them.
library('tm')
my.text.location <- "C:/Users//.../*/"
apapers <- VCorpus(DirSource(my.text.location))
Now I need to find the mean number of characters in each text. Running
mean(nchar(apapers), na.rm =T) gives a very weird output, more than the number of characters.
Any other way to get the mean?
You didn't supply a reproducible example, but rowMeans(sapply(apapers, nchar)) will return the mean number of characters over all documents. It returns two values, content and meta; content is the one you need.
A longer version runs sapply over the corpus, counting the number of characters per document. Transpose the result and turn it into a data.frame. The data.frame will contain two columns, content and meta. Content is the one you need. Taking the mean of the content column gives you the average number of characters in a document. The advantage of this is that you have the table in case you need to report the numbers.
# your code
my_count <- data.frame(t(sapply(apapers, nchar)))
mean(my_count$content)
Reproducible example using the crude dataset:
library(tm)
data("crude")
crude <- as.VCorpus(crude)
# in one statement
rowMeans(sapply(crude, nchar))
content meta
1220.30 453.15
# longer version keeping intermediate results.
my_count <- data.frame(t(sapply(crude, nchar)))
mean(my_count$content)
[1] 1220.3
my_count
content meta
127 527 440
144 2634 458
191 330 444
194 394 441
211 552 441
236 2774 455
237 2747 477
242 930 453
246 2115 440
248 2066 466
273 2241 458
349 593 492
352 621 468
353 591 445
368 629 440
489 876 445
502 1166 446
543 463 447
704 1797 456
708 360 451

select best indices from the result of ensemble using mRMR

I am using the R package mRMRe for feature selection and trying to get the indices of the most common features from the results of the ensemble:
ensemble <- mRMR.ensemble(data = dd, target_indices = target_idx,solution_count = 5, feature_count = 30)
features_indices = as.data.frame(solutions(ensemble))
This gives me the data below:
MR_1 MR_2 MR_3 MR_4 MR_5
2793 2794 2796 2795 2918
1406 1406 1406 1406 1406
2798 2800 2798 2798 2907
2907 2907 2907 2907 2800
2709 2709 2709 2709 2709
1350 2781 1582 1350 1582
2781 1350 2781 2781 636
2712 2712 2712 2712 2781
636 636 636 636 2779
2067 2067 2067 2067 2712
2328 2328 2357 2357 2067
2357 783 2328 2328 2328
772 2357 772 772 772
I want to use some sort of voting logic to select the most frequent index for each row across all columns.
For example, in the data above:
1. For the first row there is no match - so select the first one.
2. There are some rows where min occurrence is 2 - so select that one.
3. In case of tie - check if any occurs thrice, if yes select that one, or else from the tied indices select the first occurring one.
Maybe I am making it too complex, but basically I want to select the best index for each row of the data frame.
Can someone please help me on this?
Here's a simple solution using apply:
apply(df, 1, function(x) { names(which.max(table(x))) })
which gives:
[1] "2793" "1406" "2798" "2907" "2709" "1350" "2781" "2712" "636" "2067" "2328" "2328" "772"
For each row, the function table counts occurrences of each unique element, then we return the name of the element with the maximum number of occurrences (if there is a tie, which.max returns the first one in the order table sorts them, which is by value, not by position in the row).
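Because table() orders its counts by value, which.max(table(x)) breaks ties in favour of the smallest value rather than the first one in the row. If the "first occurring" rule from the question matters, a variant along these lines might work. It is only a sketch: vote_first is a hypothetical helper name, the first three rows are copied from the data above, and the last row is made up to show a case where the two rules differ.

```r
# Hypothetical helper: most frequent value in x; ties go to the value seen first.
vote_first <- function(x) {
  u <- unique(x)                    # distinct values in order of first appearance
  counts <- tabulate(match(x, u))   # occurrences of each distinct value
  u[which.max(counts)]              # which.max returns the first maximum
}

features_indices <- data.frame(
  MR_1 = c(2793, 1350, 2357, 2918),
  MR_2 = c(2794, 2781,  783, 2793),
  MR_3 = c(2796, 1582, 2328, 2794),
  MR_4 = c(2795, 1350, 2328, 2795),
  MR_5 = c(2918, 1582, 2328, 2796)
)

# Row-wise vote with first-occurrence tie-breaking.
best <- unname(apply(features_indices, 1, vote_first))

# The table()-based version from the answer, for comparison.
best_table <- unname(apply(features_indices, 1,
                           function(x) names(which.max(table(x)))))
best
best_table
```

On the last (made-up) row, where every value occurs once, vote_first keeps 2918 because it comes first, while the table-based version picks "2793".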

R generate 2D histogram from raw data

I have some raw data in 2D, x, y as given below. I want to generate a 2D histogram from the data. Typically, dividing the x,y values into bins of size 0.5, and count the number of occurrences in each bin (for both x and y at the same time). Is there any way to do that?
> df
x y
1 4.2179611 5.7588577
2 5.3901279 5.8219784
3 4.1933089 6.4317645
4 5.8076411 5.8999598
5 5.5781166 5.9382342
6 4.5569735 6.7833469
7 4.4024492 5.8019719
8 4.1734975 6.0896355
9 5.1707871 5.5640962
10 5.6380258 6.9112775
11 4.6405353 5.2251746
12 4.1809004 6.1127144
13 4.2764079 5.4598799
14 5.4466446 6.0130047
15 5.2443804 5.5421851
16 5.7521515 5.4115965
17 4.9667564 5.3519795
18 4.5007141 6.8669231
19 5.0268273 5.7681888
20 4.4738948 6.4241168
21 4.4116357 5.9819519
22 4.5741988 6.4595129
23 4.0839075 6.8105259
24 4.7154364 6.5054761
25 4.8986785 5.5511226
26 5.6262397 6.8996480
27 4.9034275 5.6716375
28 4.1872928 5.8387641
29 4.0444855 5.2554446
30 4.8911393 5.8449165
31 5.7268887 6.7100432
32 5.9136374 6.5059128
33 4.9481286 6.4679917
34 4.6198987 5.7462047
35 5.7306916 6.0613158
36 5.5818586 6.4533566
37 5.9240267 6.7748290
38 4.8160926 6.4942865
39 5.5456258 5.7911897
40 4.3075173 6.8165520
41 4.9654533 5.8904734
42 5.9581820 5.7692468
43 4.2417172 5.7990554
44 5.3670112 5.8252479
45 5.2932098 5.3983672
46 5.7456521 6.2563828
47 4.9398795 5.2879065
48 4.8526884 6.9827555
49 5.6135753 6.5219431
50 4.0727956 5.2647714
51 6.9418969 5.2584325
52 5.4189039 5.9936456
53 3.9193741 6.7099562
54 5.5885252 5.9680734
55 5.9581279 5.1843804
56 4.5724421 6.6774004
57 4.7700303 6.6083613
58 5.5490254 6.2431170
59 4.1668548 5.1017475
60 5.8948947 6.7646917
61 6.5501872 5.2803433
62 5.6011444 4.2733087
63 5.1337226 6.5225780
64 5.3153358 6.6164809
65 3.3815056 6.4077659
66 3.8405670 5.3677008
67 6.7036350 4.3090214
68 3.2446588 4.0965275
69 4.6563593 7.6868628
70 5.2382914 7.0020874
71 6.0771605 6.6232541
72 3.5672511 6.9333691
73 5.0865233 4.0778233
74 5.6743559 5.5177734
75 4.5759146 7.2210012
76 5.8203140 4.9787148
77 3.1106176 6.3937707
78 4.6310679 4.4731806
79 6.8237641 6.2679791
80 3.7653803 5.9188107
81 5.6139040 5.8586176
82 6.2016662 5.3514293
83 3.9362048 5.3217560
84 6.8005236 7.9247371
85 5.8030101 7.7492432
86 6.0143418 6.0709249
87 6.5734089 7.6112815
88 4.0569383 5.8440535
89 4.6825752 7.7926235
90 4.8204027 6.3106798
91 3.5001675 6.3156079
92 3.6521280 7.5155810
93 5.0945236 4.8206873
94 3.8732946 5.6771599
95 6.4812309 5.6082170
96 5.0308355 7.6877289
97 5.2193389 7.7133717
98 6.2239631 5.5387684
99 4.6501488 7.8559335
100 3.5389389 5.4594034
101 5.7139486 4.5008182
102 3.5425132 7.3562487
103 6.9950663 6.1036549
104 5.3801845 5.8903123
105 4.7629191 5.3394552
106 4.4102815 7.2312852
107 5.8723641 4.1410996
108 3.4691208 4.6383708
109 4.6479362 5.8562699
110 3.0315732 6.8614265
111 5.9456145 4.7497545
112 4.8461189 4.4730002
113 4.9606723 5.1099093
114 4.7802659 7.8147864
115 5.0189229 6.9308301
116 6.4738074 5.0539666
117 5.3725075 5.3282273
118 6.5374505 7.0508875
119 4.0907139 5.0855075
120 5.0557532 5.6449829
121 6.5483249 7.5800015
122 3.1083616 7.3697234
123 3.6119548 7.7639486
124 6.5157691 7.7152933
125 4.0305622 7.0521419
126 3.2197769 6.5881246
127 4.7570419 6.4564400
128 4.0063007 6.3981942
129 4.4412649 7.6576221
130 5.7348769 6.7601804
131 3.1312551 5.6295996
132 3.8627964 7.5817083
133 5.2008281 5.1082509
134 6.4229161 6.2816475
135 2.5241894 6.0802138
136 7.3759753 5.1090478
137 3.7284166 5.2045976
138 3.4404286 6.9708127
139 6.4237399 5.1363851
140 4.1829368 5.1612791
141 5.9500285 5.4765621
142 3.3555182 6.2627360
143 7.7691356 5.1877095
144 4.0684189 7.1663495
145 7.3929140 7.3819058
146 2.1659981 7.9796005
147 4.8539955 7.3108966
148 5.3932658 4.7116979
149 3.5610560 4.6096759
150 5.1883331 6.8068501
151 6.4233558 7.2955388
152 7.3308739 6.1761356
153 3.0710449 4.5296235
154 7.5400128 5.1559900
155 3.5776389 5.2057676
156 4.0402288 7.1487121
157 2.3107258 6.9816127
158 7.2065591 7.7307439
159 5.7577620 5.6652052
160 2.0595554 7.4373547
161 7.5994468 4.6216856
162 4.8053745 3.9113634
163 7.5769460 7.6019067
164 5.5362034 8.9270974
165 3.6713241 3.9060205
166 6.0612046 7.3862080
167 6.9205755 7.0792392
168 6.0892821 6.3248315
169 2.0532905 4.1545875
170 3.4086310 3.5510909
171 5.2148895 5.3266145
172 4.7638780 7.9240988
173 6.4717329 5.1350172
174 7.8287022 4.3457324
175 6.0299681 3.0952274
176 3.2760103 5.2730464
177 2.5729991 7.6594251
178 3.9403251 7.8928014
179 6.0021556 7.5313493
180 7.8561727 4.5092728
181 3.5818174 4.1140876
182 7.4972295 5.5313987
183 6.0138287 6.9369784
184 3.9257191 7.6395296
185 3.0462106 3.1347680
186 6.0630447 4.1847229
187 7.4878528 5.1004141
188 4.5145570 4.6389011
189 6.2777996 4.2647980
190 3.0166336 7.5755042
191 2.8791041 6.4471746
192 7.1029767 7.0061048
193 2.4526181 6.3373793
194 5.8762775 7.0746223
195 7.0609100 8.1256569
196 4.7252400 8.4829780
197 3.3695501 8.8786640
198 3.8505741 6.8260398
199 5.3573846 6.3864944
200 3.7039072 8.9951078
201 4.6216933 6.7890198
202 7.0390643 5.9458624
203 5.7172605 6.9083246
204 2.3814644 8.3856125
205 2.4432566 3.2618192
206 4.3881965 6.7022219
207 5.2583749 7.2432485
208 5.8540367 8.5154705
209 6.4267791 4.9593757
210 5.0668461 3.1358129
211 2.6845736 8.9880143
212 7.3094761 5.4049133
213 4.2176252 5.5062193
214 5.2025716 4.0798478
215 6.5592571 8.1852765
216 2.0417939 7.0843906
217 7.6045374 7.4870940
218 6.5971789 8.8641329
219 5.3541694 7.2176914
220 2.8314803 6.4831720
221 2.4252467 4.0918736
222 6.6804732 6.3624739
223 6.0325285 6.2057468
224 2.2751047 5.1275412
225 5.5397481 5.9890834
226 4.6420585 4.6013327
227 7.6385642 5.1722194
228 6.7378078 5.8246169
229 5.0647686 7.9219705
230 2.8672731 6.6371082
231 7.5487359 4.5727898
232 1.0837662 7.1788146
233 5.4483746 6.8955122
234 9.3085746 4.8330044
235 3.8484225 6.0133789
236 2.8034987 3.0023096
237 2.8952626 8.2623788
238 5.7666136 3.2158710
239 6.4978214 5.7866574
240 1.5184268 5.9791716
241 2.3836147 8.2897188
242 4.7318649 6.1174515
243 5.8544588 7.5056688
244 9.6776416 6.5151695
245 0.4319531 4.2470331
246 0.9810053 8.6452087
247 7.0819634 3.2488110
248 1.9084265 6.1122130
249 7.5096342 3.3495096
250 8.9564496 3.4960564
251 5.7603943 6.9091760
252 0.8801204 7.2744429
253 1.2183581 6.4264214
254 1.7761613 7.1199729
255 3.2490662 7.9935963
256 3.5420375 8.4801333
257 8.7709382 3.8011487
258 8.4770868 3.4749692
259 0.9965042 6.7509705
260 7.5049457 5.4313474
261 9.7261151 6.5909553
262 5.3893371 4.0194548
263 9.6154510 7.3117416
264 1.0327841 6.2376586
265 4.0064715 3.7333634
266 6.6941050 3.9452152
267 4.1317951 9.3322756
268 9.6481471 7.5330023
269 7.3474233 1.0310166
270 3.7343864 4.9808341
271 9.1412231 2.6655861
272 5.8414100 0.1329439
273 2.4837309 7.4956203
274 2.7983337 1.3563719
275 0.6335727 7.9273816
276 7.5566740 0.4321263
277 8.6182079 0.6038505
278 0.8928523 8.0131172
279 5.7375090 8.5275545
280 0.7864533 3.3954255
281 8.7808839 1.7059789
282 9.6621659 0.9215045
283 8.4894688 8.7667948
284 1.0358920 7.2505891
285 0.7378660 0.1173287
286 9.5485481 3.3186128
287 6.8987508 9.5480887
288 7.4105831 5.8809522
289 6.6984457 5.9509037
290 1.7878216 9.1932955
291 0.8443295 5.1662902
292 0.4498266 8.9636923
293 2.5068754 5.3692908
294 9.2509052 2.4204235
295 4.1333742 6.2581851
296 6.5510938 7.2923688
297 4.3412873 3.5514825
298 4.2349765 9.3207514
299 2.8730785 7.2752405
300 2.0425362 6.6513146
301 6.4498432 7.2949259
302 5.7453188 6.3263712
303 7.0501276 8.2238207
304 4.1915008 1.5325379
305 8.1307954 7.7681944
306 7.3156552 6.3031412
307 4.0302052 0.3039900
308 3.3740358 2.1386235
309 8.2055657 2.9112215
310 1.8817856 7.0503046
311 7.0820523 6.8739097
312 5.0725238 6.9951556
313 1.6246224 5.4126084
314 3.8865553 7.6398192
315 6.6727672 8.9677947
316 9.6048687 7.6757966
317 2.2006018 9.6385351
318 9.6403802 7.6438900
319 0.1267512 0.9048408
320 1.8160829 7.3193066
321 9.9318386 9.6068456
322 2.1275892 7.8034724
323 1.2232242 1.0695030
324 3.0198057 3.8964732
325 3.3265773 8.5865587
326 5.1519605 7.5068253
327 0.4137485 5.9223826
328 1.6896445 0.6071874
329 1.8534083 2.3554291
330 1.7182264 9.3488597
331 6.4165456 9.8670765
332 7.6270001 2.1839607
333 8.9867227 5.9565743
334 6.9185079 0.2440980
335 6.7359209 7.1072908
336 3.8034763 5.8466404
337 3.4583027 6.9041502
338 1.7983897 1.7108336
339 6.9184406 6.3632716
340 1.3538600 6.8484462
341 3.6731748 4.9846946
342 5.6139620 8.0637827
343 9.0991782 2.3051189
344 1.1220448 8.9624365
345 2.5925265 8.3673795
346 9.9977377 8.5423564
347 5.1761187 5.1240824
348 5.9330451 9.4141322
349 6.3337224 6.8055697
350 2.7287418 5.7100024
351 6.1022411 2.9733360
352 2.7331869 3.7135612
353 6.7394034 8.2721572
354 2.1757932 9.0574057
355 5.5011486 6.0124142
356 4.5301911 2.5865048
357 5.3137001 0.7062267
358 0.6959286 3.2395043
359 5.3494169 6.5742589
360 7.1472046 6.3821916
361 0.1749855 0.3954287
362 6.7709760 6.5212015
363 7.2983482 3.0086604
364 0.6147726 9.3336870
365 7.4417342 2.6836695
366 1.2769881 4.0591093
367 9.5342317 5.3443613
368 0.9368862 1.1391497
369 8.4271193 8.6641296
370 6.2000851 8.2987486
371 2.1768279 6.0684896
372 5.2021222 6.9222675
373 0.6095874 8.4759464
374 2.0217473 9.5844241
375 4.8080163 6.5052801
376 3.6099334 0.3272768
377 6.0132712 7.9920535
378 4.0495344 8.8153621
379 6.9646704 7.0375214
380 3.9211171 2.5994333
381 4.4749268 1.0517360
382 1.1683429 3.8710614
383 1.7618115 0.3513996
384 1.1257639 5.7446745
385 3.7351688 8.7376011
386 4.9234662 7.1975462
387 7.4899861 7.3846309
388 7.4170082 2.2885060
389 0.8526702 3.8160722
390 4.5907512 8.9315418
391 7.6996179 9.8409051
392 0.2340987 4.2906009
393 2.2502736 1.7819172
394 3.5679969 1.7419479
395 5.4214908 5.6001803
396 3.9965213 9.2021549
397 3.8610336 2.0462740
398 5.9490575 4.4422382
399 9.8897791 5.6402915
400 6.1153192 4.1236797
401 5.8906384 2.6153750
402 8.0582664 2.7137804
403 7.2969209 2.9362187
404 3.8673527 1.0837191
405 3.5647339 6.2338014
406 9.6490210 0.8373270
407 0.8133243 6.3393130
408 2.8760565 9.9462423
409 3.3836457 7.4451869
410 4.7772609 2.9141127
411 8.6635971 5.7812494
412 5.6192160 1.4764255
413 9.1334625 8.9822399
414 0.4662385 6.6440937
415 3.4503559 4.2064800
416 0.6704780 2.8508758
417 0.5211872 4.3109175
418 7.5615411 9.2851454
419 7.5081906 4.0019450
420 8.8851669 9.7323717
421 7.3856288 8.6152906
422 9.5926351 0.3993818
423 1.4478981 1.4845263
424 5.0425560 1.3501638
425 0.8952120 7.9407680
426 6.4732584 7.1493210
427 9.6595225 5.2377876
428 7.2204625 2.0300222
429 3.5410601 7.3117738
430 6.7991771 3.6368291
Just for clarification, I want to get something like this plot below (this plot doesn't have to do anything with my raw data, I am just showing it to explain the problem more clearly! If I use hist(df$x) it will show the distribution of x only.)
The ggplot is elegant and fast and pretty, as usual. But if you want to use base graphics (image, contour, persp) and display your actual frequencies (instead of the smoothing 2D kernel), you have to first obtain the binnings yourself and create a matrix of frequencies. Here's some code (not necessarily elegant, but pretty robust) that does 2D binning and generates plots somewhat similar to the ones above:
require(mvtnorm)
xy <- rmvnorm(1000,c(5,10),sigma=rbind(c(3,-2),c(-2,3)))
nbins <- 20
x.bin <- seq(floor(min(xy[,1])), ceiling(max(xy[,1])), length=nbins)
y.bin <- seq(floor(min(xy[,2])), ceiling(max(xy[,2])), length=nbins)
freq <- as.data.frame(table(findInterval(xy[,1], x.bin),findInterval(xy[,2], y.bin)))
freq[,1] <- as.numeric(freq[,1])
freq[,2] <- as.numeric(freq[,2])
freq2D <- diag(nbins)*0
freq2D[cbind(freq[,1], freq[,2])] <- freq[,3]
par(mfrow=c(1,2))
image(x.bin, y.bin, freq2D, col=topo.colors(max(freq2D)))
contour(x.bin, y.bin, freq2D, add=TRUE, col=rgb(1,1,1,.7))
palette(rainbow(max(freq2D)))
cols <- (freq2D[-1,-1] + freq2D[-1,-(nbins-1)] + freq2D[-(nbins-1),-(nbins-1)] + freq2D[-(nbins-1),-1])/4
persp(freq2D, col=cols)
For a really fun time, try making an interactive, zoomable, 3D surface:
require(rgl)
surface3d(x.bin,y.bin,freq2D/10, col="red")
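The binning code above uses a fixed number of bins. For the fixed bin width of 0.5 asked about in the question, a minimal sketch with cut() and table() might look like the following. The data frame here is a made-up stand-in for the question's df, and names such as bin_width, x_breaks and counts are chosen for illustration only.

```r
# Made-up stand-in for the question's data frame.
set.seed(42)
df <- data.frame(x = runif(200, 0, 10), y = runif(200, 0, 10))

bin_width <- 0.5

# Break points covering the data range in steps of bin_width.
x_breaks <- seq(floor(min(df$x)), ceiling(max(df$x)), by = bin_width)
y_breaks <- seq(floor(min(df$y)), ceiling(max(df$y)), by = bin_width)

# cut() assigns each value to a bin (a factor that keeps empty bins as levels);
# table() on the two factors then counts points per (x bin, y bin) cell.
counts <- table(cut(df$x, x_breaks, include.lowest = TRUE),
                cut(df$y, y_breaks, include.lowest = TRUE))

dim(counts)   # number of x bins by number of y bins
sum(counts)   # every point is counted exactly once
```

Since cut() returns factors whose levels include the empty bins, the resulting table is a full grid of counts, and no index bookkeeping is needed.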
Bivariate density estimates can be done with MASS::kde2d, or KernSmooth::bkde2D (both supplied with the base R distribution). The latter uses an algorithm based on the fast Fourier transform over a grid of points, and is very fast. The result can be plotted with contour or persp or similar functions in other graphing packages.
Using your data:
require(KernSmooth)
z <- bkde2D(df, .5)
persp(z$fhat)
If you want it with a 2d contour, you can also use the package ggplot2. Some example code is shown in this question:
gradient breaks in a ggplot stat_bin2d plot
Adjusted slightly:
x <- rnorm(10000)+5
y <- rnorm(10000)+5
df <- data.frame(x,y)
require(ggplot2)
p <- ggplot(df, aes(x, y))
p <- p + stat_bin2d(bins = 20)
p
Here's the output of the code above:
For completeness, you can also use the hist2d{gplots} function. It seems to be the most straightforward for a 2D plot:
library(gplots)
# data is in variable df
# define bin sizes
bin_size <- 0.5
xbins <- (max(df$x) - min(df$x))/bin_size
ybins <- (max(df$y) - min(df$y))/bin_size
# create plot
hist2d(df, same.scale=TRUE, nbins=c(xbins, ybins))
# if you want to retrieve the data for other purposes
df.hist2d <- hist2d(df, same.scale=TRUE, nbins=c(xbins, ybins), show=FALSE)
df.hist2d$counts
I came to this page from http://www.r-bloggers.com/5-ways-to-do-2d-histograms-in-r/, which lists one of the answers above.
It provides code samples for a total of 5 methods:
hist2d from the library gplots
hexbin,hexbinplot from the library hexbin
stat_bin2d from the library ggplot2
kde2d from the library MASS
the "hard way" solution listed above.
freq <- as.data.frame(table(findInterval(xy[,1], x.bin),findInterval(xy[,2], y.bin)))
freq[,1] <- as.numeric(freq[,1])
freq[,2] <- as.numeric(freq[,2])
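This is probably wrong, since it destroys the original indices: as.numeric on a factor returns the level codes rather than the bin indices that findInterval produced, and the two differ whenever a bin is empty.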
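The difference is easy to see on a tiny example. The values below are made up and stand in for bin indices where some bins (2, 4, 5 and 6) received no points.

```r
# Made-up bin indices, as findInterval might return them, with gaps.
idx <- c(1, 3, 3, 7)
f <- factor(idx)                     # what as.data.frame(table(...)) stores

codes  <- as.numeric(f)              # level codes, not the original values
values <- as.numeric(as.character(f))  # the original bin indices

codes    # 1 2 2 3
values   # 1 3 3 7
```

Converting through as.character() first, as in the second call, keeps the original indices, so it is probably the safer choice for the two freq columns.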

Resources