generate human-friendly ticks in power scale - r

I want to find an algorithm that generates ticks on a power scale in a human-friendly way.
For example, if the power is 1/2 and the range is [0, 100], then without considering human friendliness the ticks may be (0, 1, 4, 9, 25, 36, 49, 64, 81, 100).
However, to label a plot in 1/2-power scale with these ticks, it would be better to round them to multiples of 1, 2, 5, or 10 wherever appropriate.
So the human-friendly version of the numbers may be (0, 1, 5, 10, 25, 35, 50, 65, 80, 100) (if the requested number of ticks is around 10). This is easy to do manually for specific examples like this.
How can I come up with a general algorithm that works for any positive power and any non-negative interval (note the interval boundaries may not be integers; they can be arbitrary positive real numbers), so that the algorithmic result matches what a human would choose?

This is how to round a power sequence to numbers divisible by 5:
power <- 2
a <- sapply(seq(10), function(x) x^power)
a
#> [1] 1 4 9 16 25 36 49 64 81 100
b <- seq(0, 100, by = 5)
sapply(a, function(a, b) {b[which.min(abs(a-b))]}, b)
#> [1] 0 5 10 15 25 35 50 65 80 100
Created on 2022-04-29 by the reprex package (v2.0.0)
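The snapping step above can be wrapped into a small function. This is only a minimal sketch, not a full answer to the question: it assumes the raw ticks are spaced evenly in the transformed (x^power) scale and that one rounding step `by` is acceptable for the whole range. `snap_power_ticks` and its arguments are hypothetical names introduced here.

```r
# Sketch: raw ticks evenly spaced in x^power, snapped to multiples of `by`.
snap_power_ticks <- function(lo, hi, power, n = 10, by = 5) {
  stopifnot(lo >= 0, hi > lo, power > 0, by > 0)
  # evenly spaced in the transformed scale, mapped back to data units
  raw <- seq(lo^power, hi^power, length.out = n)^(1 / power)
  # candidate "friendly" values covering the range
  candidates <- seq(floor(lo / by) * by, ceiling(hi / by) * by, by = by)
  snapped <- vapply(raw,
                    function(r) candidates[which.min(abs(r - candidates))],
                    numeric(1))
  unique(snapped)  # drop ticks that collapsed onto the same value
}

snap_power_ticks(0, 100, power = 1/2, n = 11)
# 0 5 10 15 25 35 50 65 80 100
```

A single `by` loses resolution near the low end (here 0 and 1 collapse into one tick). Choosing the step per magnitude from 1, 2, 5, 10 would likely be closer to what a human picks, but that is left open here.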


how to do fourier frequency matrix multiplication if size is different?

Sorry, this is not a programming issue.
I am just confused by this theorem:
The FFT of a convolution is equal to the product of the FFTs of its inputs,
i.e.:
FFT(conv(x,y)) = FFT(x) * FFT(y)
For the left side:
Let's say I have an image of size 100x100 and a 3x3 kernel. If I convolve them, I will get a 98x98 matrix, so its FFT will also be 98x98.
For the right side:
If I take the FFT of each, I will get frequency matrices of size 3x3 and 100x100 respectively.
Then how should I do the multiplication? Some of you may say we can pad the 3x3 kernel to 100x100 and take the FFT, but then we will still get a 100x100 matrix instead of 98x98?
Can someone give me some hints?
A convolution of two signals of size L and P respectively will have a result of size N = L + P - 1.
Therefore, the mathematically correct implementation of conv(x,y) will have size 102x102. You should zero-pad both x and y to size 102x102.
When you perform the convolution as CNN convolution layers do (which is what I think you are doing) without any zero padding, you are actually cropping the result (you are leaving out the border results).
Therefore, you can compute the 102x102 FFT result and crop it to get the 98x98 result (crop 2 at the start and 2 at the end).
ATTENTION: Unlike how zero padding usually works for convolutional layers, in this case add the zeros at the END. Otherwise you introduce a shift that shows up as a shift in the output. E.g. the expected result could be [1, 2, 3, 4], and if you add 1 zero at the beginning and 1 at the end (instead of 2 at the end) you will get [4, 1, 2, 3].
ATTENTION 2: Not padding the sizes to 102 when using the ifft(fft()) technique will produce something called aliasing. For example, this will turn an expected result of 30, 31, 57, 47, 87, 47, 33, 27, 5 into 77, 64, 84, 52, 87. Note this result is actually produced by adding:
30, 31, 57, 47, 87
+ 47, 33, 27, 5
--------------------
77, 64, 84, 52, 87
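A small 1-D illustration of the points above, written in base R as a sketch. The signal, the kernel and the helper names `pad_end` and `fft_conv` are made up for this example; the same idea applies per dimension in 2-D.

```r
x <- c(1, 2, 3, 4, 5)   # signal, length L = 5
k <- c(1, 0, -1)        # kernel, length P = 3

# zero-pad at the END, as recommended above
pad_end <- function(v, n) c(v, rep(0, n - length(v)))

# circular convolution of length n via FFT (R's inverse fft is unnormalised)
fft_conv <- function(a, b, n) {
  Re(fft(fft(pad_end(a, n)) * fft(pad_end(b, n)), inverse = TRUE)) / n
}

n_full  <- length(x) + length(k) - 1        # N = L + P - 1 = 7
full    <- fft_conv(x, k, n_full)           # linear convolution
valid   <- full[length(k):length(x)]        # crop P - 1 = 2 at each end
aliased <- fft_conv(x, k, length(x))        # too short: the tail wraps around

round(full, 10)     # 1 2 2 2 2 -4 -5
round(valid, 10)    # 2 2 2
round(aliased, 10)  # -3 -3 2 2 2
```

The aliased result is the full result with its last two entries (-4, -5) added onto the first two (1, 2), which is the same wrap-around shown in the sum above.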

Finding value of Y-axis from a given X-axis value in R

Fairly new to R and I'm trying to run analysis of FTIR spectra for my dissertation through the ChemoSpec package. In specialist software like Spectragryph (can't access on my own computer, hence using R) it's possible to locate peak values very easily but I can't seem to work out the right way to do it here.
This is the formula I'm hoping to perform on all of my spectra:
Carbonyl Index (CI) = Absorbance at 1740cm-1 (the maximum of carbonyl peak) / Absorbance at 1460cm-1 x (the maximum of carbonyl peak)
Here is an example of the plot code for the spectra:
## ChemoSpec plot
plotSpectra(HDPE_samples,
            main = "48 hr exposure",
            which = c(8, 9, 10, 11, 12, 13, 14, 15, 16,
                      39, 40, 41, 42, 43, 44, 60, 61),
            ## y axis shows absorbance (%)
            yrange = c(0, 0.9),
            offset = 0.005,
            lab.pos = 2450,
            ## x axis shows wave numbers (cm-1)
            xlim = c(1300, 3000))
For now I'd be happy just to retrieve the absorbance values associated with the wave numbers in the formula, if anyone could give me pointers on which functions/packages to look at.
Here is an example of reading data at a specific frequency.
library(ChemoSpec)
#> Loading required package: ChemoSpecUtils
data(metMUD1)
plotSpectra(metMUD1)
# Where is the maximum of signal 1?
which.max(metMUD1$data[1,])
#> [1] 1098
# What is the frequency and intensity at the max value?
metMUD1$freq[1098]
#> [1] 1.340894
metMUD1$data[1, 1098]
#> [1] 0.0680055
Created on 2020-01-15 by the reprex package (v0.3.0)
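To go the other way (from a given x-axis value to its y value), one option is to look up the index of the nearest frequency. The sketch below uses plain vectors with made-up numbers so it runs without ChemoSpec. It assumes, as in the example above, that a Spectra object keeps the x values in `$freq` and one spectrum per row in `$data`. `value_at` and `peak_in_window` are hypothetical helper names.

```r
# y value at the frequency closest to `target`
value_at <- function(freq, y, target) {
  y[which.min(abs(freq - target))]
}

# maximum y value within a frequency window [lo, hi]
peak_in_window <- function(freq, y, lo, hi) {
  max(y[freq >= lo & freq <= hi])
}

# made-up wave numbers (cm-1) and absorbances, for illustration only
freq       <- c(1400, 1455, 1470, 1700, 1745, 1800)
absorbance <- c(0.10, 0.30, 0.25, 0.05, 0.60, 0.02)

a1740 <- value_at(freq, absorbance, 1740)   # nearest point is 1745
a1460 <- value_at(freq, absorbance, 1460)   # nearest point is 1455
ci <- a1740 / a1460
```

With a real Spectra object the calls would presumably look like `value_at(spec$freq, spec$data[i, ], 1740)` for sample `i`, where `spec` stands for your own object.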

R - Generating frequency table from a table of pre-defined bins

I need to generate a cumulative frequency plot of some bubble size data (I have >1000000 objects). In geology the way we do this is by using geometric binning.
I calculate the bins using the following method:
The smallest object value (0.0015 mm) * 10^0.1 = upper limit of bin 1; the upper limit of each successive bin is generated by multiplying its lower limit by 10^0.1:
Bin 1: 0.0015 - 0.001888388
Bin 2: 0.001888388 - 0.002377340
I tried writing a while loop to generate these as breakpoints in R but it wasn't working. So I generated my bins in Excel and now have a table with bins that range from my smallest object to my largest, with bins sized appropriately.
What I now want to do is read this into R and use it to find the frequency of objects in each bin. I can't find how to do this, possibly because in most disciplines you don't set your bins like this.
I am fairly new to R so am trying to keep my code fairly simple.
Thanks,
B
The easiest option is to use ?cut. Here's an example with randomly generated data.
# generate data
set.seed(666)
x <- runif(n = 1000, min = 0, max = 100)
# create arbitrary cutpoints (these should be replaced by the ones generated by your geometric method)
cutpoints <- c(0, 1, 10, 11, 12, 15, 20, 50, 90, 99, 99.01, 100)
table(cut(x, cutpoints))
      (0,1]      (1,10]     (10,11]     (11,12]     (12,15]     (15,20]
          9          92          13          10          27          45
    (20,50]     (50,90]     (90,99]  (99,99.01] (99.01,100]
        310         399          87           0           8
Also note that the include.lowest parameter of cut defaults to FALSE:
include.lowest: logical, indicating if an ‘x[i]’ equal to the lowest
(or highest, for ‘right = FALSE’) ‘breaks’ value should be
included.
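The geometric breakpoints can also be generated directly in R without a loop. A minimal sketch, assuming the 10^0.1 ratio and the 0.0015 starting value from the question; `geometric_breaks` and the `sizes` vector are made up for illustration.

```r
# breaks start at `smallest` and grow by `ratio` until they cover `largest`
geometric_breaks <- function(smallest, largest, ratio = 10^0.1) {
  stopifnot(smallest > 0, largest > smallest, ratio > 1)
  n <- ceiling(log(largest / smallest) / log(ratio))
  smallest * ratio^(0:n)
}

sizes  <- c(0.0015, 0.002, 0.003, 0.01)   # stand-in for the real bubble sizes
breaks <- geometric_breaks(min(sizes), max(sizes))

# include.lowest = TRUE so the smallest object falls into bin 1
counts <- table(cut(sizes, breaks, include.lowest = TRUE))
cumulative <- cumsum(counts)              # cumulative frequency per bin
```

Without `include.lowest = TRUE` the smallest object would sit exactly on the first break and be returned as NA, which is the default behaviour noted above.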

cutting a variable into pieces in R

I'm trying to cut() my data D into 3 pieces: [0-4], [5-12], [13-40] (see pic below). But I wonder how to exactly define my breaks in cut to achieve that?
Here is my data and R code:
D <- read.csv("https://raw.githubusercontent.com/rnorouzian/m/master/t.csv", h = T)
table(cut(D$time, breaks = c(0, 5, 9, 12))) ## what should breaks be?
# (0,5] (5,9] (9,12] # cuts not how I want the 3 pieces .
# 228 37 10
The notation (a,b] means ">a and <=b".
So, to get your desired result, just define the cuts so you get the grouping that you want, including a lower and upper bound:
table(cut(D$time, breaks=c(-1, 4, 12, 40)))
## (-1,4] (4,12] (12,40]
## 319 47 20
You may also find it helpful to look at the two arguments right=FALSE, which changes the endpoints of the intervals from (a,b] to [a,b), and include.lowest, which includes the lowest breaks value (in the OP's example, this is [0,5] with closed brackets on the lower bound). You can also use infinity. Here's an example with a couple of those options put to use:
table(cut(D$time, breaks = c(-Inf, 4, 12, Inf), include.lowest=TRUE))
## [-Inf,4] (4,12] (12, Inf]
## 319 47 20
This produces the right buckets, but the interval notation needs tweaking. Assuming all times are integers, you might need to adjust the labels manually: each time you have a right-open interval notation, replace the factor label with a closed interval notation. Use your best string 'magic'.
Personally, I like to make sure all possibilities are covered. Perhaps future data from this process might exceed 40? I like to put an upper bound of +Inf in all my cuts. This prevents NA from creeping into the data.
What cut needs is a 'whole numbers only' option.
F=cut(D$time,c(0,5,13,40),include.lowest = TRUE,right=FALSE)
# the below levels hard coded but you could write a loop to turn all labels
# of the form [m,n) into [m,n-1]
levels(F)[1:2]=c('[0,4]','[5,12]')
Typically there would be more analysis before final results are obtained, so I wouldn't sweat the labels too much until the work is closer to complete.
Here are my results
> table(F)
F
[0,4] [5,12] [13,40]
319 47 20
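The relabelling that is hard coded above could be automated along these lines. This is a sketch that assumes integer breaks and labels of exactly the form `[m,n)` as produced by `cut(..., right = FALSE)`; `relabel_integer_bins` and the small `time` vector are hypothetical, used only for illustration.

```r
# turn labels of the form [m,n) into [m,n-1]; leave all other labels alone
relabel_integer_bins <- function(f) {
  lv <- levels(f)
  m <- regmatches(lv, regexec("^\\[(-?[0-9]+),(-?[0-9]+)\\)$", lv))
  levels(f) <- vapply(seq_along(lv), function(i) {
    if (length(m[[i]]) == 3) {
      sprintf("[%s,%d]", m[[i]][2], as.integer(m[[i]][3]) - 1L)
    } else {
      lv[i]   # e.g. the last, already closed interval
    }
  }, character(1))
  f
}

time <- c(0, 4, 5, 12, 13, 40)
f <- cut(time, c(0, 5, 13, 40), include.lowest = TRUE, right = FALSE)
levels(relabel_integer_bins(f))
# "[0,4]" "[5,12]" "[13,40]"
```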
R can compare integers to floats, like in
> 6L >= 8.5
[1] FALSE
Thus you can use floats as breaks in cut such as in
table(cut(D$time, breaks = c(-.5, 4.5, 12.5, 40.5)))
For integers this fulfills your bucket definition of [0-4], [5-12], [13-40] without you having to think too much about square brackets versus round brackets.
A fancy alternative would be clustering around the means of your buckets, as in
D <- read.csv("https://raw.githubusercontent.com/rnorouzian/m/master/t.csv", h = T)
D$cluster <- kmeans(D$time, centers = c(4/2, (5+12)/2, (13+40)/2))$cluster
plot(D$time, rnorm(nrow(D)), col=D$cluster)
You should add two additional arguments, right and include.lowest, to your code!
table(cut(D$time, breaks = c(0, 5, 13, 40), right=FALSE, include.lowest = TRUE))
With right=FALSE the intervals are closed on the left and open on the right, which gives your desired result. include.lowest=TRUE causes your highest break value (here 40) to be included in the last interval.
Result:
[0,5) [5,13) [13,40]
319 47 20
Conversely, you can write:
table(cut(D$time, breaks = c(0, 4, 12, 40), right=TRUE, include.lowest = TRUE))
with the result:
[0,4] (4,12] (12,40]
319 47 20
Both mean exactly what you are looking for:
[0,4] [5,12] [13,40]
319 47 20

WKT: how do you define Polygons with 3 rings (==2 holes)?

I found this document here. I read it, but I keep wondering how to define a Polygon with 3 rings in WKT?
You can use either the POLYGON or the MULTIPOLYGON type, but make sure the outer container ring is listed first followed by the inner hole rings. The orientations of the inner rings are not important since holes are explicit in the syntax.
X & Y are space separated, coordinates are comma separated, and ring extents are delimited by parentheses and separated by commas. Polygons (outer ring plus any inner rings) are also delimited by parentheses. Each ring must be closed, i.e. its last coordinate repeats its first.
Finally, inner rings cannot cross each other, nor can they cross the outer ring.
Examples:
POLYGON ((10 10, 110 10, 110 110, 10 110, 10 10), (20 20, 20 30, 30 30, 30 20, 20 20), (40 20, 40 30, 50 30, 50 20, 40 20))
MULTIPOLYGON (((10 10, 110 10, 110 110, 10 110, 10 10), (20 20, 20 30, 30 30, 30 20, 20 20), (40 20, 40 30, 50 30, 50 20, 40 20)))
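If the rings are available as coordinate matrices, the WKT string can be assembled programmatically. The sketch below uses base R only; `ring_to_wkt` and `polygon_wkt` are hypothetical helper names, not part of any package, and the code appends the first point to each ring when it is missing, since WKT rings repeat their first coordinate at the end.

```r
# one ring: two-column matrix (x, y) -> "(x1 y1, x2 y2, ..., x1 y1)"
ring_to_wkt <- function(m) {
  if (!all(m[1, ] == m[nrow(m), ])) m <- rbind(m, m[1, ])  # close the ring
  paste0("(", paste(m[, 1], m[, 2], collapse = ", "), ")")
}

# outer ring first, then any number of hole rings
polygon_wkt <- function(outer, holes = list()) {
  rings <- vapply(c(list(outer), holes), ring_to_wkt, character(1))
  paste0("POLYGON (", paste(rings, collapse = ", "), ")")
}

outer <- rbind(c(10, 10), c(110, 10), c(110, 110), c(10, 110))
hole1 <- rbind(c(20, 20), c(20, 30), c(30, 30), c(30, 20))
hole2 <- rbind(c(40, 20), c(40, 30), c(50, 30), c(50, 20))

wkt <- polygon_wkt(outer, list(hole1, hole2))
```

This only formats text; it does not check that the holes lie inside the outer ring or that rings do not cross.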
