convert float to integer - math

How, and by which factor, do I scale the float dctmtx coefficients to get the following integer values?
float dctmtx:
( (0.3536 0.3536 0.3536 0.3536 0.3536 0.3536 0.3536 0.3536),
(0.4904 0.4157 0.2778 0.0975 -0.0975 -0.2778 -0.4157 -0.4904),
(0.4619 0.1913 -0.1913 -0.4619 -0.4619 -0.1913 0.1913 0.4619),
(0.4157 -0.0975 -0.4904 -0.2778 0.2778 0.4904 0.0975 -0.4157),
(0.3536 -0.3536 -0.3536 0.3536 0.3536 -0.3536 -0.3536 0.3536),
(0.2778 -0.4904 0.0975 0.4157 -0.4157 -0.0975 0.4904 -0.2778),
(0.1913 -0.4619 0.4619 -0.1913 -0.1913 0.4619 -0.4619 0.1913),
(0.0975 -0.2778 0.4157 -0.4904 0.4904 -0.4157 0.2778 -0.0975)
)
integer dctmtx:
(( 125, 122, 115, 103, 88, 69, 47, 24 ),
( 125, 103, 47, -24, -88, -122, -115, -69 ),
( 125, 69, -47, -122, -88, 24, 115, 103 ),
( 125, 24, -115, -69, 88, 103, -47, -122 ),
( 125, -24, -115, 69, 88, -103, -47, 122 ),
( 125, -69, -47, 122, -88, -24, 115, -103 ),
( 125, -103, 47, 24, -88, 122, -115, 69 ),
( 125, -122, 115, -103, 88, -69, 47, -24 )
);

Besides one of the two matrices being transposed, the two don't appear to have a direct linear relationship:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    float dctmtx[8][8] = {
        0.3536, 0.3536, 0.3536, 0.3536, 0.3536, 0.3536, 0.3536, 0.3536,
        0.4904, 0.4157, 0.2778, 0.0975, -0.0975, -0.2778, -0.4157, -0.4904,
        0.4619, 0.1913, -0.1913, -0.4619, -0.4619, -0.1913, 0.1913, 0.4619,
        0.4157, -0.0975, -0.4904, -0.2778, 0.2778, 0.4904, 0.0975, -0.4157,
        0.3536, -0.3536, -0.3536, 0.3536, 0.3536, -0.3536, -0.3536, 0.3536,
        0.2778, -0.4904, 0.0975, 0.4157, -0.4157, -0.0975, 0.4904, -0.2778,
        0.1913, -0.4619, 0.4619, -0.1913, -0.1913, 0.4619, -0.4619, 0.1913,
        0.0975, -0.2778, 0.4157, -0.4904, 0.4904, -0.4157, 0.2778, -0.0975
    };
    int j, k, i;
    float m;

    for (j = 0; j < 8; j++) {
        for (k = 0; k < 8; k++) {
            if (k == 0)
                m = dctmtx[k][j] * 354; /* first coefficient uses a larger factor */
            else
                m = dctmtx[k][j] * 248;
            i = lroundf(m);
            printf("%4d ", i);
        }
        printf("\n");
    }
    return 0;
}
The first coefficient in each row appears to be scaled to a different accuracy than the rest:
%% convftoi
125 122 115 103 88 69 47 24
125 103 47 -24 -88 -122 -115 -69
125 69 -47 -122 -88 24 115 103
125 24 -115 -69 88 103 -47 -122
125 -24 -115 69 88 -103 -47 122
125 -69 -47 122 -88 -24 115 -103
125 -103 47 24 -88 122 -115 69
125 -122 115 -103 88 -69 47 -24
A little finessing turned up scaling factors (354 for the first coefficient, 248 for the rest) that did give a match.
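The two finessed factors are close to 250·√2 ≈ 353.55 and 250; the small deviation of 248 compensates for the floats being limited to four decimal places. A short Python sketch (my own cross-check, not from the original post) confirms that rounding the four-decimal floats times 354/248 reproduces the integer matrix exactly:

```python
import math

# Four-decimal DCT-II coefficients, as in the question's float dctmtx:
# row i, column j is cos(pi*i*(2j+1)/16), scaled by 1/(2*sqrt(2)) for
# i == 0 and by 1/2 otherwise, then limited to 4 decimal places.
def float_coeff(i, j):
    c = math.cos(math.pi * i * (2 * j + 1) / 16)
    s = c / (2 * math.sqrt(2)) if i == 0 else c / 2
    return math.floor(s * 10000 + 0.5) / 10000

# The integer matrix from the question (transposed relative to the float
# one): entry [j][i] is 125 * cos(pi*i*(2j+1)/16) truncated toward zero.
def int_coeff(j, i):
    return int(125 * math.cos(math.pi * i * (2 * j + 1) / 16))

# Scaling the 4-decimal floats by 354 (first coefficient) / 248 (rest)
# and rounding reproduces the integer matrix for all 64 entries.
for j in range(8):
    for i in range(8):
        factor = 354 if i == 0 else 248
        assert round(float_coeff(i, j) * factor) == int_coeff(j, i)
print("354/248 scaling matches the integer matrix")
```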
Addendum
After LutzL's answer I derived the float coefficient matrix algorithmically:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

#define PI 3.14159265359

int main(int argc, char *argv[])
{
    float calcmtx[8][8];
    int j, k, i;
    float m;

    printf("float coefficients calculated\n");
    for (j = 0; j < 8; j++) {
        for (k = 0; k < 8; k++) {
            if (j == 0)
                m = cos(PI * j * (2 * k + 1) / 16) / (sqrt(2) * 2); /* DC row scaled by 1/sqrt(8) */
            else
                m = cos(PI * j * (2 * k + 1) / 16) / 2;
            calcmtx[k][j] = floorf(m * 10000 + 0.5) / 10000; /* limit to 4 decimals */
        }
    }
    for (j = 0; j < 8; j++) {
        for (k = 0; k < 8; k++) {
            printf("% 2.4f ", calcmtx[k][j]);
        }
        printf("\n");
    }
    printf("\n");

    printf("integer coefficients derived\n");
    for (j = 0; j < 8; j++) {
        for (k = 0; k < 8; k++) {
            if (k == 0)
                m = sqrt(2); /* undo the extra 1/sqrt(2) on the first coefficient */
            else
                m = 1;
            i = (int)(calcmtx[j][k] * 250 * m); /* the cast truncates toward zero */
            printf("%4d ", i);
        }
        printf("\n");
    }
    printf("\n");

    printf("approximated integer coefficients\n");
    for (j = 0; j < 8; j++) {
        for (k = 0; k < 8; k++) {
            if (k == 0)
                m = calcmtx[j][k] * 354;
            else
                m = calcmtx[j][k] * 248;
            i = lroundf(m);
            printf("%4d ", i);
        }
        printf("\n");
    }
    return 0;
}
And we see that in the integer matrix the first coefficient is the base value multiplied by the square root of two:
%% gencoeffi
float coefficients calculated
0.3536 0.3536 0.3536 0.3536 0.3536 0.3536 0.3536 0.3536
0.4904 0.4157 0.2778 0.0975 -0.0975 -0.2778 -0.4157 -0.4904
0.4619 0.1913 -0.1913 -0.4619 -0.4619 -0.1913 0.1913 0.4619
0.4157 -0.0975 -0.4904 -0.2778 0.2778 0.4904 0.0975 -0.4157
0.3536 -0.3536 -0.3536 0.3536 0.3536 -0.3536 -0.3536 0.3536
0.2778 -0.4904 0.0975 0.4157 -0.4157 -0.0975 0.4904 -0.2778
0.1913 -0.4619 0.4619 -0.1913 -0.1913 0.4619 -0.4619 0.1913
0.0975 -0.2778 0.4157 -0.4904 0.4904 -0.4157 0.2778 -0.0975
integer coefficients derived
125 122 115 103 88 69 47 24
125 103 47 -24 -88 -122 -115 -69
125 69 -47 -122 -88 24 115 103
125 24 -115 -69 88 103 -47 -122
125 -24 -115 69 88 -103 -47 122
125 -69 -47 122 -88 -24 115 -103
125 -103 47 24 -88 122 -115 69
125 -122 115 -103 88 -69 47 -24
approximated integer coefficients
125 122 115 103 88 69 47 24
125 103 47 -24 -88 -122 -115 -69
125 69 -47 -122 -88 24 115 103
125 24 -115 -69 88 103 -47 -122
125 -24 -115 69 88 -103 -47 122
125 -69 -47 122 -88 -24 115 -103
125 -103 47 24 -88 122 -115 69
125 -122 115 -103 88 -69 47 -24
This matches the approximation once the float coefficients are limited to four decimal places.

If you read up on the discrete cosine transform, you will find that the basic coefficient is
cos(pi*i*(2*j+1)/16), i,j=0..7
The first table consists of these values scaled by 0.5, except for the first row/column, which is scaled by 0.25*sqrt(2) = 1/sqrt(8); that is the correct way to obtain an orthogonal matrix, since the square sum of the first column of cosines is 8 and of the others 4.
The second table is the result of multiplying the cosine values uniformly by 125 and truncating toward zero (note that 122.5982 becomes 122, not 123). Here one has to take care to properly rescale the vector when using the transpose matrix to compute the inverse transform.
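The orthogonality claim is easy to verify numerically. A short Python sketch (mine, not part of the answer) checks the square sums and that the scaled matrix times its transpose is the identity:

```python
import math

N = 8
# Raw DCT-II cosine values: row i, column j.
cosv = [[math.cos(math.pi * i * (2 * j + 1) / (2 * N)) for j in range(N)]
        for i in range(N)]

# Square sums of the raw cosine rows: 8 for i == 0, 4 otherwise.
sq = [sum(c * c for c in row) for row in cosv]
assert abs(sq[0] - 8) < 1e-9
assert all(abs(s - 4) < 1e-9 for s in sq[1:])

# Scale row 0 by 1/sqrt(8) and the rest by 1/2: M times its transpose
# is then the identity, i.e. the matrix is orthogonal.
M = [[(c / math.sqrt(8) if i == 0 else c / 2) for c in cosv[i]]
     for i in range(N)]
for a in range(N):
    for b in range(N):
        dot = sum(M[a][k] * M[b][k] for k in range(N))
        assert abs(dot - (1.0 if a == b else 0.0)) < 1e-9
print("DCT matrix is orthogonal")
```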
First table reproduced, except for the first column:
> [[ Cos(pi*i*(2*j+1)/16)/2 : i in [0..7] ]: j in [0..7] ];
[
[ 0.5, 0.49039264, 0.46193977, 0.41573481, 0.35355339, 0.27778512, 0.19134172, 0.09754516 ],
[ 0.5, 0.41573481, 0.19134172, -0.09754516, -0.35355339, -0.49039264, -0.46193977, -0.27778512 ],
[ 0.5, 0.27778512, -0.19134172, -0.49039264, -0.35355339, 0.09754516, 0.46193977, 0.41573481 ],
[ 0.5, 0.09754516, -0.46193977, -0.27778512, 0.35355339, 0.41573481, -0.19134172, -0.49039264 ],
[ 0.5, -0.09754516, -0.46193977, 0.27778512, 0.35355339, -0.41573481, -0.19134172, 0.49039264 ],
[ 0.5, -0.27778512, -0.19134172, 0.49039264, -0.35355339, -0.09754516, 0.46193977, -0.41573481 ],
[ 0.5, -0.41573481, 0.19134172, 0.09754516, -0.35355339, 0.49039264, -0.46193977, 0.27778512 ],
[ 0.5, -0.49039264, 0.46193977, -0.41573481, 0.35355339, -0.27778512, 0.19134172, -0.09754516 ]
]
Second table, before truncation to integers:
> [[ Cos( pi*i*(2*j+1)/16 ) *125 : i in [0..7] ]: j in [0..7] ];
[
[ 125, 122.5982, 115.4849, 103.9337, 88.3883, 69.4463, 47.8354, 24.3863 ],
[ 125, 103.9337, 47.8354, -24.3863, -88.3883, -122.5982, -115.4849, -69.4463 ],
[ 125, 69.4463, -47.8354, -122.5982, -88.3883, 24.3863, 115.4849, 103.9337 ],
[ 125, 24.3863, -115.4849, -69.4463, 88.3883, 103.9337, -47.8354, -122.5982 ],
[ 125, -24.3863, -115.4849, 69.4463, 88.3883, -103.9337, -47.8354, 122.5982 ],
[ 125, -69.4463, -47.8354, 122.5982, -88.3883, -24.3863, 115.4849, -103.9337 ],
[ 125, -103.9337, 47.8354, 24.3863, -88.3883, 122.5982, -115.4849, 69.4463 ],
[ 125, -122.5982, 115.4849, -103.9337, 88.3883, -69.4463, 47.8354, -24.3863 ]
]

Related

New column with percentage change in R

How do I make a new column in DF with the percentage change in share price over the year?
DF <- data.frame(name = c("EQU", "YAR", "MOWI", "AUSS", "GJF", "KOG", "SUBC"),
price20 = c(183, 343, 189, 88, 179, 169, 62),
price21 = c(221, 453, 183, 85, 198, 232, 67))
Here's a line that will do it for you. I've also added a round function to it so the table is more readable.
DF$percent_change <- round((DF$price21 - DF$price20) / DF$price20 * 100, 2)
name price20 price21 percent_change
1 EQU 183 221 20.77
2 YAR 343 453 32.07
3 MOWI 189 183 -3.17
4 AUSS 88 85 -3.41
5 GJF 179 198 10.61
6 KOG 169 232 37.28
7 SUBC 62 67 8.06
This line should do it:
DF$change <- (DF$price21/DF$price20*100) - 100
We can use a simple division and scales::percent:
library(dplyr)
DF %>% mutate(percent_change = scales::percent((price21-price20)/price20))
name price20 price21 percent_change
1 EQU 183 221 20.77%
2 YAR 343 453 32.07%
3 MOWI 189 183 -3.17%
4 AUSS 88 85 -3.41%
5 GJF 179 198 10.61%
6 KOG 169 232 37.28%
7 SUBC 62 67 8.06%
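The arithmetic behind all three answers is the same; here it is once more as a plain-Python cross-check (data copied from the question, rounding as in the first answer):

```python
name = ["EQU", "YAR", "MOWI", "AUSS", "GJF", "KOG", "SUBC"]
price20 = [183, 343, 189, 88, 179, 169, 62]
price21 = [221, 453, 183, 85, 198, 232, 67]

# percent change = (new - old) / old * 100, rounded to 2 decimals
percent_change = [round((p21 - p20) / p20 * 100, 2)
                  for p20, p21 in zip(price20, price21)]
print(percent_change)
# [20.77, 32.07, -3.17, -3.41, 10.61, 37.28, 8.06]
```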

ggplot_line: label the top 2 peaks with x-axis values

I am new to R programming. I am plotting a mass spectrum with ggplot and would like to label the top 2 peaks with their x-axis values (i.e. m). Does anyone know how to achieve that?
Thanks so much for your help!
Here is part of the raw data I used for the ggplot.
m Intensity
1 30001 2.964e+01
2 30002 3.336e+01
3 30003 3.968e+01
4 30004 5.015e+01
5 30005 6.838e+01
6 30006 1.016e+02
7 30007 1.464e+02
8 30008 2.130e+02
9 30009 3.115e+02
10 30010 3.951e+02
11 30011 5.134e+02
12 30012 5.316e+02
13 30013 6.377e+02
14 30014 8.813e+02
15 30015 1.071e+03
16 30016 1.119e+03
17 30017 1.202e+03
18 30018 1.299e+03
19 30019 1.112e+03
20 30020 1.205e+03
21 30021 1.422e+03
22 30022 1.653e+03
23 30023 1.726e+03
24 30024 2.423e+03
25 30025 3.059e+03
26 30026 3.267e+03
27 30027 3.993e+03
28 30028 5.172e+03
29 30029 5.278e+03
30 30030 2.794e+03
31 30031 1.459e+03
32 30032 2.512e+03
33 30033 6.590e+03
34 30034 1.245e+04
35 30035 1.144e+04
36 30036 5.197e+03
37 30037 6.012e+03
38 30038 1.453e+04
39 30039 1.513e+04
40 30040 5.802e+03
41 30041 9.226e+03
42 30042 5.809e+03
43 30043 3.074e+03
44 30044 3.882e+03
45 30045 9.941e+02
46 30046 8.170e+02
47 30047 1.149e+03
48 30048 3.567e+02
49 30049 3.805e+02
50 30050 3.654e+02
51 30051 4.724e+02
52 30052 7.819e+02
53 30053 8.634e+02
54 30054 5.235e+02
55 30055 1.712e+02
56 30056 9.232e+01
57 30057 9.434e+01
58 30058 7.191e+01
59 30059 8.036e+01
60 30060 4.456e+01
61 30061 9.428e+01
62 30062 9.392e+01
63 30063 8.413e+01
64 30064 5.671e+01
65 30065 2.639e+01
66 30066 2.027e+01
67 30067 4.584e+01
68 30068 6.956e+01
69 30069 6.181e+01
70 30070 6.450e+01
71 30071 2.826e+01
72 30072 3.610e+01
73 30073 6.325e+01
74 30074 3.509e+01
75 30075 3.478e+01
76 30076 1.120e+01
77 30077 6.993e+00
78 30078 9.936e+00
79 30079 7.738e+00
80 30080 9.771e+00
81 30081 1.762e+01
82 30082 3.060e+01
83 30083 2.175e+01
84 30084 2.816e+01
85 30085 2.700e+01
86 30086 2.114e+01
87 30087 4.378e+01
88 30088 5.824e+01
89 30089 6.193e+01
90 30090 4.146e+01
91 30091 9.697e+04
92 30092 9.458e+04
93 30093 9.216e+04
94 30094 8.972e+04
95 30095 8.723e+04
96 30096 8.468e+04
97 30097 8.211e+04
98 30098 7.959e+04
99 30099 7.726e+04
100 30100 7.527e+04
101 30101 7.379e+04
102 30102 7.298e+04
103 30103 7.301e+04
104 30104 7.399e+04
105 30105 7.602e+04
106 30106 7.916e+04
107 30107 8.340e+04
108 30108 8.862e+04
109 30109 9.460e+04
110 30110 1.010e+05
111 30111 1.074e+05
112 30112 1.133e+05
113 30113 1.180e+05
114 30114 1.211e+05
115 30115 1.222e+05
116 30116 1.213e+05
117 30117 1.186e+05
118 30118 1.146e+05
119 30119 1.100e+05
120 30120 1.054e+05
121 30121 1.014e+05
122 30122 9.838e+04
123 30123 9.637e+04
124 30124 9.535e+04
125 30125 9.508e+04
126 30126 9.520e+04
127 30127 9.527e+04
128 30128 9.484e+04
129 30129 9.355e+04
130 30130 9.128e+04
131 30131 8.809e+04
132 30132 8.425e+04
133 30133 8.012e+04
134 30134 7.603e+04
135 30135 7.225e+04
136 30136 6.895e+04
137 30137 6.617e+04
138 30138 6.392e+04
139 30139 6.214e+04
140 30140 6.078e+04
141 30141 5.980e+04
142 30142 5.922e+04
143 30143 5.905e+04
144 30144 5.934e+04
145 30145 6.013e+04
146 30146 6.143e+04
147 30147 6.324e+04
148 30148 6.552e+04
149 30149 6.816e+04
150 30150 7.100e+04
151 30151 7.384e+04
152 30152 7.655e+04
153 30153 7.904e+04
154 30154 8.132e+04
155 30155 8.353e+04
156 30156 8.595e+04
157 30157 8.896e+04
158 30158 9.302e+04
159 30159 9.864e+04
160 30160 1.063e+05
161 30161 1.165e+05
162 30162 1.293e+05
163 30163 1.443e+05
164 30164 1.605e+05
165 30165 1.759e+05
166 30166 1.883e+05
167 30167 1.957e+05
168 30168 1.969e+05
169 30169 1.921e+05
170 30170 1.824e+05
171 30171 1.693e+05
172 30172 1.544e+05
173 30173 1.390e+05
174 30174 1.241e+05
175 30175 1.102e+05
176 30176 9.755e+04
177 30177 8.644e+04
178 30178 7.692e+04
179 30179 6.900e+04
180 30180 6.262e+04
181 30181 5.766e+04
182 30182 5.397e+04
183 30183 5.137e+04
184 30184 4.972e+04
185 30185 4.889e+04
186 30186 4.881e+04
187 30187 4.940e+04
188 30188 5.059e+04
189 30189 5.230e+04
190 30190 5.444e+04
191 30191 5.690e+04
192 30192 5.960e+04
193 30193 6.244e+04
194 30194 6.539e+04
195 30195 6.842e+04
196 30196 7.153e+04
197 30197 7.471e+04
198 30198 7.795e+04
199 30199 8.118e+04
200 30200 8.430e+04
201 30201 8.719e+04
202 30202 8.976e+04
203 30203 9.193e+04
204 30204 9.364e+04
205 30205 9.480e+04
206 30206 9.531e+04
207 30207 9.504e+04
208 30208 9.391e+04
209 30209 9.189e+04
210 30210 8.912e+04
211 30211 8.587e+04
212 30212 8.251e+04
213 30213 7.939e+04
214 30214 7.680e+04
215 30215 7.492e+04
216 30216 7.381e+04
217 30217 7.349e+04
218 30218 7.394e+04
219 30219 7.510e+04
220 30220 7.690e+04
221 30221 7.919e+04
222 30222 8.174e+04
223 30223 8.425e+04
224 30224 8.637e+04
225 30225 8.776e+04
226 30226 8.826e+04
227 30227 8.788e+04
228 30228 8.690e+04
229 30229 8.569e+04
230 30230 8.465e+04
231 30231 8.405e+04
232 30232 8.398e+04
233 30233 8.434e+04
234 30234 8.494e+04
235 30235 8.554e+04
236 30236 8.598e+04
237 30237 8.623e+04
238 30238 8.638e+04
239 30239 8.665e+04
240 30240 8.736e+04
241 30241 8.884e+04
242 30242 9.147e+04
243 30243 9.559e+04
244 30244 1.016e+05
245 30245 1.097e+05
246 30246 1.200e+05
247 30247 1.321e+05
Here is my code for ggplot:
ggplot(data=raw.1) +
geom_line(mapping = aes(x=m, y=Intensity))
Below is the ggplot output:
I would do it this way. My solution requires the ggrepel package as well as some dplyr functions. The key to this working is that you can set data = for each geom_ layer in ggplot2. The geom_text_repel() layer from ggrepel ensures that the labels will not overlap your data from geom_line().
library(ggplot2)
library(dplyr)
library(ggrepel)
ggplot(mapping = aes(x = m, y = Intensity, label = m)) +
geom_line(data=raw.1) +
geom_text_repel(data = raw.1 %>%
arrange(desc(Intensity)) %>% # arranges in descending order
slice_head(n = 2)) # only keeps the top two intensities.
My plot does not look like yours since you only shared the first 247 data points. I suspect this initial solution might not work for you; as a chemist, I have some idea of what you hope to accomplish, and this approach labels the top two highest intensities, not necessarily the top two peaks. We need to identify all local maxima and then select the two tallest.
Here is how we do that. The following code calculates the slope between each pair of adjacent points, looks for points where a positive slope changes to a negative one (a local maximum), and then sorts and selects the top two by intensity.
top_two <- raw.1 %>%
mutate(deriv = Intensity - lag(Intensity) ,
max = case_when(deriv >=0 & lead(deriv) <0 ~ T,
T ~ F)) %>%
filter(max) %>%
arrange(desc(Intensity)) %>%
slice_head(n = 2)
Let's modify the original plot code to put this in.
ggplot(mapping = aes(x = m, y = Intensity, label = m)) +
geom_line(data = raw.1) +
geom_text_repel(data = top_two, nudge_y = 1e4)
Data:
raw.1 <- structure(list(m = c(30001, 30002, 30003, 30004, 30005, 30006,
30007, 30008, 30009, 30010, 30011, 30012, 30013, 30014, 30015,
30016, 30017, 30018, 30019, 30020, 30021, 30022, 30023, 30024,
30025, 30026, 30027, 30028, 30029, 30030, 30031, 30032, 30033,
30034, 30035, 30036, 30037, 30038, 30039, 30040, 30041, 30042,
30043, 30044, 30045, 30046, 30047, 30048, 30049, 30050, 30051,
30052, 30053, 30054, 30055, 30056, 30057, 30058, 30059, 30060,
30061, 30062, 30063, 30064, 30065, 30066, 30067, 30068, 30069,
30070, 30071, 30072, 30073, 30074, 30075, 30076, 30077, 30078,
30079, 30080, 30081, 30082, 30083, 30084, 30085, 30086, 30087,
30088, 30089, 30090, 30091, 30092, 30093, 30094, 30095, 30096,
30097, 30098, 30099, 30100, 30101, 30102, 30103, 30104, 30105,
30106, 30107, 30108, 30109, 30110, 30111, 30112, 30113, 30114,
30115, 30116, 30117, 30118, 30119, 30120, 30121, 30122, 30123,
30124, 30125, 30126, 30127, 30128, 30129, 30130, 30131, 30132,
30133, 30134, 30135, 30136, 30137, 30138, 30139, 30140, 30141,
30142, 30143, 30144, 30145, 30146, 30147, 30148, 30149, 30150,
30151, 30152, 30153, 30154, 30155, 30156, 30157, 30158, 30159,
30160, 30161, 30162, 30163, 30164, 30165, 30166, 30167, 30168,
30169, 30170, 30171, 30172, 30173, 30174, 30175, 30176, 30177,
30178, 30179, 30180, 30181, 30182, 30183, 30184, 30185, 30186,
30187, 30188, 30189, 30190, 30191, 30192, 30193, 30194, 30195,
30196, 30197, 30198, 30199, 30200, 30201, 30202, 30203, 30204,
30205, 30206, 30207, 30208, 30209, 30210, 30211, 30212, 30213,
30214, 30215, 30216, 30217, 30218, 30219, 30220, 30221, 30222,
30223, 30224, 30225, 30226, 30227, 30228, 30229, 30230, 30231,
30232, 30233, 30234, 30235, 30236, 30237, 30238, 30239, 30240,
30241, 30242, 30243, 30244, 30245, 30246, 30247), Intensity = c(29.64,
33.36, 39.68, 50.15, 68.38, 101.6, 146.4, 213, 311.5, 395.1,
513.4, 531.6, 637.7, 881.3, 1071, 1119, 1202, 1299, 1112, 1205,
1422, 1653, 1726, 2423, 3059, 3267, 3993, 5172, 5278, 2794, 1459,
2512, 6590, 12450, 11440, 5197, 6012, 14530, 15130, 5802, 9226,
5809, 3074, 3882, 994.1, 817, 1149, 356.7, 380.5, 365.4, 472.4,
781.9, 863.4, 523.5, 171.2, 92.32, 94.34, 71.91, 80.36, 44.56,
94.28, 93.92, 84.13, 56.71, 26.39, 20.27, 45.84, 69.56, 61.81,
64.5, 28.26, 36.1, 63.25, 35.09, 34.78, 11.2, 6.993, 9.936, 7.738,
9.771, 17.62, 30.6, 21.75, 28.16, 27, 21.14, 43.78, 58.24, 61.93,
41.46, 96970, 94580, 92160, 89720, 87230, 84680, 82110, 79590,
77260, 75270, 73790, 72980, 73010, 73990, 76020, 79160, 83400,
88620, 94600, 101000, 107400, 113300, 118000, 121100, 122200,
121300, 118600, 114600, 110000, 105400, 101400, 98380, 96370,
95350, 95080, 95200, 95270, 94840, 93550, 91280, 88090, 84250,
80120, 76030, 72250, 68950, 66170, 63920, 62140, 60780, 59800,
59220, 59050, 59340, 60130, 61430, 63240, 65520, 68160, 71000,
73840, 76550, 79040, 81320, 83530, 85950, 88960, 93020, 98640,
106300, 116500, 129300, 144300, 160500, 175900, 188300, 195700,
196900, 192100, 182400, 169300, 154400, 139000, 124100, 110200,
97550, 86440, 76920, 69000, 62620, 57660, 53970, 51370, 49720,
48890, 48810, 49400, 50590, 52300, 54440, 56900, 59600, 62440,
65390, 68420, 71530, 74710, 77950, 81180, 84300, 87190, 89760,
91930, 93640, 94800, 95310, 95040, 93910, 91890, 89120, 85870,
82510, 79390, 76800, 74920, 73810, 73490, 73940, 75100, 76900,
79190, 81740, 84250, 86370, 87760, 88260, 87880, 86900, 85690,
84650, 84050, 83980, 84340, 84940, 85540, 85980, 86230, 86380,
86650, 87360, 88840, 91470, 95590, 101600, 109700, 120000, 132100
)), row.names = c(NA, -247L), class = c("tbl_df", "tbl", "data.frame"
))
This approach treats your x-axis as discrete samples of a continuous variable and finds the local maxima based on the discrete 2nd derivative, using code from "Finding local maxima and minima".
Rest of the plotting is similar to Ben Norris's answer using geom_text_repel() to label the points of interest.
Also, as noted, the data you provided differ from the figure in your question.
library(ggplot2)
library(ggrepel)
# find local maxima aka peaks
local_maximas <- raw.1[which(diff(sign(diff(raw.1$Intensity)))==-2)+1,]
top2 <- tail(local_maximas[order(local_maximas$Intensity),],2) #subset of top 2 highest peaks
raw.1$label <- ifelse(raw.1$m %in% top2$m, raw.1$m, NA) #make labels for plot
ggplot(data = raw.1) +
geom_line(aes(x=m, y=Intensity)) +
geom_text_repel(aes(x = m, y = Intensity, label = label))
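The peak detection both answers rely on — a positive first difference followed by a negative one — is language-independent. Here is a plain-Python sketch on a small made-up series (the data are illustrative only, not from the question):

```python
def top_peaks(y, n=2):
    """Indices of local maxima (rise followed by fall), tallest first."""
    d = [b - a for a, b in zip(y, y[1:])]          # first differences
    sign = [1 if v > 0 else -1 for v in d]         # slope signs
    peaks = [i + 1 for i in range(len(sign) - 1)
             if sign[i] > 0 and sign[i + 1] < 0]   # + to - transition
    return sorted(peaks, key=lambda i: y[i], reverse=True)[:n]

intensity = [1, 3, 2, 5, 9, 4, 6, 8, 7]
print(top_peaks(intensity))   # [4, 7]: the peaks with heights 9 and 8
```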

R - Iterate through a Data Frame using for Loop

I am trying to generate the series below (see attached image) based on the logic given beneath it. I was able to create the series for one product and store (code given below), but I am having trouble generalizing this for multiple product-store combinations. Could you please advise if there is an easier way to do this?
Logic
a: given
b: lag of d by 4 weeks
c: initial c for the first week; thereafter, previous row's c + current b - current a
d: initial d - current c
My code:
library(dplyr)
df = structure(list(
Product = c(11078931, 11078931, 11078931, 11078931, 11078931,
11078931, 12021216, 12021216, 12021216, 12021216,
12021216, 12021216, 10932270, 10932270, 10932270,
10932270, 10932270),
STORE = c(90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 547, 547,
547, 547, 547),
WEEK = c(201627, 201628, 201629, 201630, 201631, 201632, 201627, 201628,
201629, 201630, 201631, 201632, 201627, 201628, 201629, 201630,
201631),
WEEK_SEQ = c(914, 915, 916, 917, 918, 919, 914, 915, 916, 917, 918, 919,
914, 915, 916, 917, 918),
a = c(9.161, 9.087, 8.772, 8.698, 7.985, 6.985, 0.945, 0.734, 0.629, 0.599,
0.55, 0.583, 5.789, 5.694, 5.488, 5.47, 5.659),
initial_d = c(179, 179, 179, 179, 179, 179, 18, 18, 18, 18, 18, 18, 37, 37,
37, 37, 37),
Initial_c = c(62, 0, 0, 0, 0, 0, 33, 0, 0, 0, 0, 0, 59, 0, 0, 0, 0)
),
.Names = c("Product", "STORE", "WEEK", "WEEK_SEQ", "a", "initial_d",
"Initial_c"),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -17L))
# filter to extract one product and store
# df = df %>% filter(Product == 11078931) %>% filter(STORE == 90)
df$b = 0
df$c = 0
df$d = NA
c_init = 62
d_init = 179
df$d <- d_init
df$c[1] <- c_init
RQ <- function(df,...){
for(i in seq_along(df$WEEK_SEQ)){
if(i>4){
df[i, "b"] = round(df[i-4,"d"], digits = 0)# Calculate b with the lag
}
if(i>1){
df[i, "c"] = round(df[i-1, "c"] + df[i, "b"] - df[i, "a"], digits = 0) # calc c
}
df[i, "d"] <- round(d_init - df[i, "c"], digits = 0) # calc d
if(df[i, "d"] < 0) {
df[i, "d"] <- 0 # reset negative d values
}
}
return(df)
}
df = df %>% group_by(SKU_CD, STORE_CD) %>% RQ(df)
Expected output series
Could you please advise what is wrong in my code? It works fine for one product-store combination, but not for multiple products and stores. Thanks for your time and input!
Consider base R's by, which subsets the input data frame by each combination of factor values and returns a list of the subsetted data frames. Then run do.call(rbind, ...) to row-bind the list into one final data frame.
RQ_dfs <- by(df, df[c("Product", "STORE")], FUN=RQ)
finaldf <- do.call(rbind, RQ_dfs)
While I cannot reproduce your output-series screenshot with the posted data, the product-store pairing from your commented-out filter does show up in finaldf:
# # A tibble: 17 × 10
# Product STORE WEEK WEEK_SEQ a initial_d Initial_c b c d
# * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 11078931 90 201627 914 9.161 179 62 0 62 117
# 2 11078931 90 201628 915 9.087 179 0 0 53 126
# 3 11078931 90 201629 916 8.772 179 0 0 44 135
# 4 11078931 90 201630 917 8.698 179 0 0 35 144
# 5 11078931 90 201631 918 7.985 179 0 117 144 35
# 6 11078931 90 201632 919 6.985 179 0 126 263 0
# 7 12021216 90 201627 914 0.945 18 33 0 0 179
# 8 12021216 90 201628 915 0.734 18 0 0 -1 180
# 9 12021216 90 201629 916 0.629 18 0 0 -2 181
# 10 12021216 90 201630 917 0.599 18 0 0 -3 182
# 11 12021216 90 201631 918 0.550 18 0 179 175 4
# 12 12021216 90 201632 919 0.583 18 0 180 354 0
# 13 10932270 547 201627 914 5.789 37 59 0 0 179
# 14 10932270 547 201628 915 5.694 37 0 0 -6 185
# 15 10932270 547 201629 916 5.488 37 0 0 -11 190
# 16 10932270 547 201630 917 5.470 37 0 0 -16 195
# 17 10932270 547 201631 918 5.659 37 0 179 157 22
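The per-group recurrence itself is straightforward. Here is a plain-Python sketch of the RQ logic for the first product-store group (values copied from the question; my own translation, so treat it as illustrative) — it reproduces the first six rows of finaldf above:

```python
# Column a for Product 11078931 / STORE 90, plus the initial values.
a = [9.161, 9.087, 8.772, 8.698, 7.985, 6.985]
d_init, c_init = 179, 62

n = len(a)
b = [0] * n
c = [0] * n
d = [0] * n
for i in range(n):
    if i >= 4:                       # b is d lagged by 4 weeks
        b[i] = round(d[i - 4])
    if i == 0:
        c[i] = c_init                # initial c for the first week
    else:
        c[i] = round(c[i - 1] + b[i] - a[i])
    d[i] = max(round(d_init - c[i]), 0)  # d floors at zero

print(b)  # [0, 0, 0, 0, 117, 126]
print(c)  # [62, 53, 44, 35, 144, 263]
print(d)  # [117, 126, 135, 144, 35, 0]
```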

clEnqueueWriteBuffer execution time in a loop

I have OpenCL code where I invoke clEnqueueWriteBuffer and clEnqueueNDRangeKernel inside a loop multiple times. I measure the data transfer time and the kernel execution time of each iteration using the GetLocalTime function. The issue I am facing is that the clEnqueueWriteBuffer and clEnqueueNDRangeKernel calls in the first iteration take much longer to complete than those in later iterations. Why does this happen?
I am working on a system with ARM A10 APU. My opencl loop code is :
for (j = 0; j < PARTITION_COUNT; j++) {
    // Writing to input buffers
    GetLocalTime(&start);
    clEnqueueWriteBuffer(queue[0], buf_A, CL_TRUE, 0, PARTITION_SIZE * sizeof(int), input_A + (PARTITION_SIZE * j), 0, NULL, &eventList[0]);
    checkErr(cl_err, "clEnqueueWriteBuffer : buf_A");
    clEnqueueWriteBuffer(queue[1], buf_B, CL_TRUE, 0, PARTITION_SIZE * sizeof(int), input_B + (PARTITION_SIZE * j), 0, NULL, &eventList[1]);
    checkErr(cl_err, "clEnqueueWriteBuffer : buf_B");
    clEnqueueWriteBuffer(queue[2], buf_C, CL_TRUE, 0, PARTITION_SIZE * sizeof(int), input_C + (PARTITION_SIZE * j), 0, NULL, &eventList[2]);
    checkErr(cl_err, "clEnqueueWriteBuffer : buf_C");
    clEnqueueWriteBuffer(queue[3], buf_D, CL_TRUE, 0, PARTITION_SIZE * sizeof(int), input_D + (PARTITION_SIZE * j), 0, NULL, &eventList[3]);
    checkErr(cl_err, "clEnqueueWriteBuffer : buf_D");
    clFinish(queue[0]);
    clFinish(queue[1]);
    clFinish(queue[2]);
    clFinish(queue[3]);
    // getting end time
    GetLocalTime(&end);
    // displaying transfer time
    cout << "\nTime : " << start.wMinute << " " << start.wSecond << " " << start.wMilliseconds;
    cout << "\nTime : " << end.wMinute << " " << end.wSecond << " " << end.wMilliseconds;

    GetLocalTime(&start);
    cl_err = clEnqueueNDRangeKernel(queue[4], kernel[Q6_PROGRAM_ID][FILTER1_KERNEL], 1, NULL, &globalSize, &localSize, 4, eventList, &eventList[4]);
    checkErr(cl_err, "clEnqueueNDRangeKernel : filter1_kernel");
    //clFinish(queue[4]);
    // Invoking the second filter kernel
    cl_err = clEnqueueNDRangeKernel(queue[5], kernel[Q6_PROGRAM_ID][FILTER2_KERNEL], 1, NULL, &globalSize, &localSize, 1, eventList + 4, &eventList[5]);
    checkErr(cl_err, "clEnqueueNDRangeKernel : filter2_kernel");
    //clFinish(queue[5]);
    // Invoking the third filter kernel
    cl_err = clEnqueueNDRangeKernel(queue[6], kernel[Q6_PROGRAM_ID][FILTER3_KERNEL], 1, NULL, &globalSize, &localSize, 1, eventList + 5, &eventList[6]);
    checkErr(cl_err, "clEnqueueNDRangeKernel : filter3_kernel");
    //clFinish(queue[6]);
    // Invoking the aggregate kernel
    cl_err = clEnqueueNDRangeKernel(queue[8], kernel[Q6_PROGRAM_ID][AGGREGATE_KERNEL], 1, NULL, &globalSize, &localSize, 1, eventList + 6, &eventList[7]);
    checkErr(cl_err, "clEnqueueNDRangeKernel : aggregate kernel");
    output_A = (int *)clEnqueueMapBuffer(queue[9], output_buf_A, CL_TRUE, CL_MAP_READ, 0, rLen * sizeof(int), 1, eventList + 7, &eventList[8], &cl_err);
    checkErr(cl_err, "clEnqueueMapBuffer : output_A");
    for (i = 0; i < rLen; i++) {
        if (output_A[i] > 0) {
            //cout << "\n" << output_A[i];
            sum += output_A[i];
        }
    }
    clFinish(queue[4]);
    clFinish(queue[5]);
    clFinish(queue[6]);
    clFinish(queue[8]);
    clFinish(queue[9]);
    GetLocalTime(&end);
    // displaying kernel time
    cout << "\nTime1 : " << start.wMinute << " " << start.wSecond << " " << start.wMilliseconds;
    cout << "\nTime1 : " << end.wMinute << " " << end.wSecond << " " << end.wMilliseconds;
}
GetLocalTime(&end1);
// displaying total time
cout << "\nTime2 : " << start1.wMinute << " " << start1.wSecond << " " << start1.wMilliseconds;
cout << "\nTime2 : " << end1.wMinute << " " << end1.wSecond << " " << end1.wMilliseconds;
my output is :
Time : 27 30 404
Time : 27 30 466
Time1 : 27 30 474
Time1 : 27 30 547
Time : 27 30 551
Time : 27 30 555
Time1 : 27 30 561
Time1 : 27 30 582
Time : 27 30 587
Time : 27 30 591
Time1 : 27 30 597
Time1 : 27 30 617
Time : 27 30 622
Time : 27 30 627
Time1 : 27 30 638
Time1 : 27 30 659
Time : 27 30 670
Time : 27 30 675
Time1 : 27 30 679
Time1 : 27 30 699
Time : 27 30 706
Time : 27 30 711
Time1 : 27 30 718
Time1 : 27 30 737
Time2 : 27 30 404
Time2 : 27 30 743
PROGRAM EXECUTION OVER

Using date time data with ggplot scale_colour_gradient

I am plotting some time series GPS coordinates using ggmap and ggplot. I want to visualise the time series by creating a colour gradient. I have had a couple of attempts so far as shown below.
My data can be accessed here
Import data
Dec7 = read.csv("7-12-15.csv", header = TRUE, stringsAsFactors = FALSE)
Dec7$timestamp <- as.Date(Dec7$timestamp)
head(Dec7)
X_id seq_id timestamp lon address lat rssi sensor gps_quality batt_v
1 56656ecd0dd8e408d8c2e43f 71 2015-12-07 -3.780899 208 53.20252 -63 gps 1 3274
2 56656ed20dd8e408d8c2e440 72 2015-12-07 -3.780958 208 53.20246 -63 gps 1 3274
3 56656edc0dd8e408d8c2e441 73 2015-12-07 -3.780967 208 53.20246 -65 gps 1 3274
4 56656ee60dd8e408d8c2e442 74 2015-12-07 -3.780968 208 53.20242 -64 gps 1 3274
5 56656ef10dd8e408d8c2e443 75 2015-12-07 -3.780997 208 53.20240 -64 gps 1 3274
6 56656efa0dd8e408d8c2e446 76 2015-12-07 -3.780965 208 53.20243 -64 gps 1 3274
str(Dec7)
'data.frame': 22420 obs. of 10 variables:
$ X_id : chr "56656ecd0dd8e408d8c2e43f" "56656ed20dd8e408d8c2e440" "56656edc0dd8e408d8c2e441" "56656ee60dd8e408d8c2e442" ...
$ seq_id : int 71 72 73 74 75 76 77 78 86 87 ...
$ timestamp : Date, format: "2015-12-07" "2015-12-07" "2015-12-07" "2015-12-07" ...
$ lon : num -3.78 -3.78 -3.78 -3.78 -3.78 ...
$ address : num 208 208 208 208 208 208 208 208 208 208 ...
$ lat : num 53.2 53.2 53.2 53.2 53.2 ...
$ rssi : int -63 -63 -65 -64 -64 -64 -64 -63 -64 -64 ...
$ sensor : chr "gps" "gps" "gps" "gps" ...
$ gps_quality: int 1 1 1 1 1 1 1 1 1 1 ...
$ batt_v : int 3274 3274 3274 3274 3274 3274 3274 3274 3274 3274 ...
I have converted timestamp with as.Date because I am aware that it can then be passed successfully to scale_colour_gradient within the ggplot call, as follows:
mapImageData <- get_googlemap(center = c(lon = median(Dec7$lon),
lat = median(Dec7$lat)), zoom = 17,
size = c(500, 500),
maptype = c("satellite"))
sheep_hiraetlyn_Dec7_map = ggmap(mapImageData,extent = "device") +
geom_point(aes(x = lon,y = lat, color=timestamp),
data = Dec7, size = 1, pch = 20) +
scale_color_gradient(trans = "date", low="red", high="blue")
This produces the following map:
As you can see, the colour gradient is discrete rather than the desired continuous gradient; presumably this is because the timestamps are categorized into days.
Also, the legend labels consist of two overlaid labels, so they are not clear.
I have tried using as.POSIXct, but the result cannot be passed to trans.
I have also used as.integer, which creates a nice colour gradient, but then the legend cannot be interpreted in terms of date/time.
Any ideas how I can get round this problem?
Thanks
Convert the POSIXct timestamps to integer and define the breaks and labels manually:
Dec7$time <- as.integer(Dec7$timestamp)
labels <- pretty(Dec7$timestamp, 5)
ggmap(mapImageData,extent = "device") +
geom_point(
aes(x = lon,y = lat, color=time),
data = Dec7, size = 1, pch = 20
) +
scale_color_gradient(
low="red", high="blue",
breaks = as.integer(labels),
labels = format(labels, "%m/%d %H:%M")
)
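The trick generalizes beyond ggplot: map the times to numbers for the gradient, then format those same numbers back into date strings for the legend labels. A minimal Python illustration of the round trip (standard library only, dates invented for the example):

```python
from datetime import date

stamps = [date(2015, 12, 7), date(2015, 12, 8), date(2015, 12, 9)]

# Map each date to an integer for the colour scale...
as_int = [d.toordinal() for d in stamps]

# ...then format the chosen break values back into readable labels.
labels = [date.fromordinal(v).strftime("%m/%d") for v in as_int]
print(labels)  # ['12/07', '12/08', '12/09']
```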