logticks in plain R graphics - r

ggplot2 provides annotation_logticks() (https://ggplot2.tidyverse.org/reference/annotation_logticks.html), but it does not work with base R graphics. Is there a ready-to-use function that takes a range as its input argument and returns nicely placed log ticks that can be used with base R graphics (specifically the axis() command)?
For example, between 10 and 100, I may want major ticks at 10, 20, ..., 90, 100 and minor ticks at 11, ..., 19, 21, ..., 29, ..., 91, ..., 99.
Between 100 and 1000, the ticks are the ones above multiplied by 10. Other ranges are handled the same way.
So for an input range of 15 to 150, the output is every tick described above that lies between 15 and 150.
The function should be reasonably flexible. For example, I may want fewer major ticks, say only 10, 20, 50 and 100. The minor ticks then stay as they are, and the demoted major ticks (30, 40, 60, 70, 80, 90) become additional minor ticks.
Another useful option would let users specify how many major and minor ticks there should be, with the positions rounded so that the ticks do not land at awkward values.
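The tick scheme described above can be sketched in a language-neutral way. The following is a minimal Python sketch, not an existing R function: the name log_ticks and the parameter major_at are hypothetical, and it only computes positions (in R, such vectors could be passed to axis() through its at argument). It does not cover the "choose the number of ticks" option.

```python
from fractions import Fraction

def log_ticks(lo, hi, major_at=(1, 2, 3, 4, 5, 6, 7, 8, 9)):
    """Return (major, minor) tick positions within [lo, hi].

    major_at (hypothetical parameter) lists the leading digits that stay
    major ticks; other multiples of the decade become minor ticks.
    """
    if lo <= 0 or hi < lo:
        raise ValueError("need 0 < lo <= hi")
    # Largest power of ten not exceeding lo, kept exact with Fraction.
    decade = Fraction(1)
    while decade > lo:
        decade /= 10
    while decade * 10 <= lo:
        decade *= 10
    major, minor = set(), set()
    while decade <= hi:
        # n = 10..100 walks one decade in steps of decade/10,
        # e.g. 10, 11, ..., 99, 100 when decade == 10.
        for n in range(10, 101):
            value = decade * n / 10
            if not (lo <= value <= hi):
                continue
            lead = 1 if n == 100 else n // 10  # 100 is the "1" of the next decade
            if n % 10 == 0 and lead in major_at:
                major.add(value)
            else:
                minor.add(value)
        decade *= 10
    # Sets drop the duplicate produced at decade boundaries (e.g. 100).
    return sorted(float(v) for v in major), sorted(float(v) for v in minor)

major, minor = log_ticks(15, 150)
print(major)                  # major ticks between 15 and 150
print(minor[:5], minor[-5:])  # first and last few minor ticks
```

Calling log_ticks(10, 100, major_at=(1, 2, 5)) keeps only 10, 20, 50 and 100 as major ticks and moves 30, 40, 60, 70, 80, 90 into the minor set.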

Related

Julia: How to make the histogram have same number of bins for two vectors of equal size?

I want to calculate the frequency of occurrence in multiple vectors, and I want the resulting number of bins to be consistent across vectors so it's easier to calculate the Wasserstein distance between them.
The following code shows that the histogram ends up with a different number of bins from run to run.
using StatsBase
for i in 1:10
    h = fit(Histogram, randn(1000), nbins=10)
    println(size(h.weights))
end
How can I make the number of bins consistent?
One way to be completely consistent across runs is to supply more than just the number of bins; to be perfectly consistent, we also supply their exact positions. With Julia's StatsBase, you do that by supplying the "edges" (bin boundaries). Here's a demo where bins run from i to i+1:
julia> fit(Histogram, randn(1000), -5:5)
Histogram{Int64, 1, Tuple{UnitRange{Int64}}}
edges:
-5:5
weights: [0, 2, 23, 139, 319, 355, 143, 18, 1, 0]
closed: left
isdensity: false

how to do fourier frequency matrix multiplication if size is different?

Sorry, this is not a programming issue.
I am confused about this theorem:
The FFT of a convolution is equal to the product of the FFTs of the two inputs,
i.e.:
FFT(conv(x,y)) = FFT(x) * FFT(y)
For the left side:
Let's say I have an image of size 100x100 and a 3x3 kernel. If I convolve them, I get a 98x98 matrix, so its FFT is also 98x98.
For the right side:
If I take the FFT of each, I get frequency matrices of size 3x3 and 100x100 respectively.
Then how should I do the multiplication? Some of you may say we can pad the 3x3 kernel to 100x100 and take the FFT, but then we still get a 100x100 matrix instead of 98x98.
Can someone give me some hints?
A convolution of two signals of size L and P respectively has a result of size N = L + P - 1.
Therefore, the mathematically correct implementation of conv(x,y) has size 102x102. You should zero-pad both x and y to size 102x102.
When you perform the convolution the way CNN convolution layers do (which is what I think you are doing) without any zero padding, you are actually cropping the result (you are leaving out the border results).
Therefore, you can compute the 102x102 FFT result and crop it to get the 98x98 result (crop 2 at the start and 2 at the end of each dimension).
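The pad-then-crop recipe can be checked on a small 1-D case. This is a minimal sketch, assuming a naive O(n^2) DFT as a stand-in for a library FFT; the helper names dft, fft_linear_conv and direct_valid_conv and the sample data are made up for illustration.

```python
import cmath

def dft(seq, inverse=False):
    """Naive discrete Fourier transform (stand-in for a library FFT)."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t, v in enumerate(seq))
        out.append(total / n if inverse else total)
    return out

def fft_linear_conv(x, h):
    """Full linear convolution via the frequency domain, length L + P - 1."""
    n = len(x) + len(h) - 1
    xp = list(x) + [0] * (n - len(x))  # zeros appended at the END
    hp = list(h) + [0] * (n - len(h))
    prod = [a * b for a, b in zip(dft(xp), dft(hp))]
    return [round(v.real, 9) for v in dft(prod, inverse=True)]

def direct_valid_conv(x, h):
    """Unpadded convolution computed directly, length L - P + 1."""
    p = len(h)
    return [sum(x[i + p - 1 - j] * h[j] for j in range(p))
            for i in range(len(x) - p + 1)]

x = [1, 2, 3, 4, 5, 6]   # L = 6
h = [1, 0, -1]           # P = 3
full = fft_linear_conv(x, h)        # length 8 = L + P - 1
valid = full[len(h) - 1:len(x)]     # crop P - 1 samples from each end
print(full)
print(valid, direct_valid_conv(x, h))
```

Cropping P - 1 samples from each end of the full result gives the same L - P + 1 values as the unpadded convolution, which is the 1-D analogue of going from 102x102 to 98x98.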
ATTENTION: Unlike how zero padding usually works for convolutional layers, in this case add the zeros at the END. Otherwise you introduce a shift in the input that shows up as a shift in the output. For example, the expected result could be [1, 2, 3, 4], and if you add 1 zero at the beginning and 1 at the end (instead of 2 at the end) you will get [4, 1, 2, 3].
ATTENTION 2: Not padding to size 102 when using the ifft(fft()) technique produces something called aliasing. For example, an expected result of 30, 31, 57, 47, 87, 47, 33, 27, 5 would become 77, 64, 84, 52, 87. Note that this result is the tail of the linear convolution wrapping around and being added to its start:
30, 31, 57, 47, 87
+ 47, 33, 27, 5
--------------------
77, 64, 84, 52, 87
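The wrap-around can be reproduced with any pair of equal-length sequences. Below is a minimal sketch, again assuming a naive DFT in place of a library FFT; the sample sequences are made up and are not the ones behind the numbers above.

```python
import cmath

def dft(seq, inverse=False):
    """Naive discrete Fourier transform (stand-in for a library FFT)."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t, v in enumerate(seq))
        out.append(total / n if inverse else total)
    return out

def unpadded_fft_conv(x, y):
    """ifft(fft(x) * fft(y)) with no zero padding: a circular convolution."""
    prod = [a * b for a, b in zip(dft(x), dft(y))]
    return [round(v.real, 9) for v in dft(prod, inverse=True)]

def linear_conv(x, y):
    """Direct linear convolution, length len(x) + len(y) - 1."""
    out = [0] * (len(x) + len(y) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(y):
            out[i + j] += a * b
    return out

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
n = len(x)
linear = linear_conv(x, y)    # 9 values
# Fold the tail (indices n and above) back onto the start.
wrapped = [linear[i] + (linear[i + n] if i + n < len(linear) else 0)
           for i in range(n)]
aliased = unpadded_fft_conv(x, y)
print(linear)
print(wrapped, aliased)
```

The unpadded frequency-domain product matches the folded linear result, which is the same pattern as the addition shown above.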

Qlik Sense - One line graph, with two lines that need a different scale (one with values of 1000, the other with values of 10)

I have a single line chart with two lines that need different scales on the Y axis (one with values around 1000, the other with values around 10).
I have 5 machines, so 5 results for each function.
Measure 1: Avg(Speed): 1000, 800, 1000, 700, 600
Measure 2: Count(PartsProduced): 3, 5, 23, 50, 10
When I create the graph, you can't really see the results of Measure 2. I want to add a second Y axis because I need to see both measures together.
Thanks!
After you've created both of your measures (Qlik calls them expressions), navigate to the 'Axes' tab in your chart properties dialog and select one of your expressions from the list. Once you've made your selection, choose the 'Right (Top)' radio button in the 'Position' section.

Plot kernel density estimation with the kernels over the individual observations in R

Well to keep things short what I want to achieve is a plot like the right one:
I would like to obtain a standard KDE plot with its individual kernels plotted over the observations.
The best solution would be the one that considers all the different kernel functions (e.g. rectangular, triangular etc).
After reading this answer I managed to come up with a solution.
# Create some input data
x <- c(19, 20, 10, 17, 16, 13, 16, 10, 7, 18)
# Calculate the KDE
kde <- density(x, kernel = "gaussian", bw = bw.SJ(x) * 0.2)
# Calculate the single kernels (pdfs), one per observation, using the KDE's
# bandwidth, and scale each by 1/n so that its area is 1/n
A.kernel <- lapply(x, function(xi) {
  k <- density(xi, kernel = "gaussian", bw = kde$bw)
  k$y <- k$y / length(x)
  k
})
# Plot everything together on the same scale
plot(kde)
rug(x, col = 2, lwd = 2.5)
for (k in A.kernel) lines(k, col = "red")
The result looks like this:

Extracting information on terminal nodes in partykit:ctree with a large number of multivariate responses

I am using partykit:ctree to explore my dataset, which is a set of about 15,000 beach surveys, investigating the number of pieces of debris found from 50 different categories. There are lots of zeros in the data, and a large spread of total debris amounts. I also have a series of independent variables, including some factors, some count data, and some continuous data.
Here is a very small sample dataset:
Counts<- as.data.frame(matrix (rpois(100,1), ncol=5))
colnames(Counts)<-c("Glass", "HardPlastic", "SoftPlastic", "PlasticBag", "Fragments")
State<-rep(c("CA","OR","WA"), each=6)
Counts$State<-c(State,"CA","OR")
County<-rep((1:9), each=2)
Counts$County<-c(County, 1,4)
Counts$Distance<-c(10, 15, 13, 19, 18, 23, 38, 40, 49, 44, 47, 45, 52, 53, 55, 59, 51, 53, 14, 33)
Year<-rep(c("2010","2011","2012"), times=7)
Counts$Year<-Year[1:20]
I have used the following code to partition my data:
M.2 <- ctree(Glass + HardPlastic + SoftPlastic + PlasticBag + Fragments ~
    as.factor(State) + as.factor(County) + Distance + as.factor(Year), data = Counts)
plot(M.2, terminal_panel = node_barplot, cex = 0.5)
This comes up with a lovely graph, but how do I extract the membership of each of the terminal nodes? I can see it in the graph if there are only a few items, but once the number of possible categories increases to 50, it becomes much harder to look at it graphically. I would like to see the information contained within the nodes; particularly the relative probabilities of each individual category being contained in each terminal node.
I know that if this were a BinaryTree class, I could use the nodes argument, but when I query class(M.2) it tells me it is of the constparty class, and I haven't been able to find out how to get node information from this class.
I have also run into a secondary problem, which is that when I run the ctree on my sample data set, it crashes R every time! It works fine with my actual data set, but I can't figure out what is wrong with the sample set.
EDIT: The desired output would be something along the lines of:
Node15:
Hard Plastic 30
Glass 5
Soft Plastic 23
Plastic Bag 6
Fragments 12
I just e-mailed the package maintainer and principal author of ctree() (Torsten Hothorn), to whom such requests would really best be directed. (He currently does not participate in SO.) Apparently, this is a bug in the partykit version of ctree(), and he is working on resolving it. For the time being it is best to use the old party version for this, and hopefully a fixed partykit version will become available soon.

Resources