What is the best approach to cost-comparison reasoning?

If one bottle costs more than one cap, which statement is true?
one bottle + one cap costs more than two bottles
one bottle costs more than two caps
two caps cost more than one bottle
one bottle + one cap costs more than two caps
two bottles cost more than three caps
how to solve questions like this? what should be the approach?

You could solve it by rewriting the expressions into inequalities and graphing it out.
Making inequalities
For how to graph it, check this link
So I would basically see this as a set of inequalities, substituting caps with x and bottles with y.
"One bottle costs more than one cap" becomes y > x.
"One bottle + one cap costs more than two bottles" becomes y + x > y + y, which rewrites to y < x.
"One bottle costs more than two caps" becomes y > 2*x,
and so on.
Plotting it out
Then plot the two inequalities in a coordinate system, as shown here.
For the first statement you'll see that no values satisfy both y > x and y < x, so it can never be true.
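If you'd rather check the candidates numerically than graphically, here is a minimal R sketch of the same idea (random sampling under the premise y > x is my own illustration, not the only method): the statement that holds for every sampled pair is the one that must be true.

# Sample many prices that satisfy the premise y > x (bottle costs more than cap)
set.seed(1)
x <- runif(10000, 0, 10)        # cap prices
y <- x + runif(10000, 0, 10)    # bottle prices, forced above the cap price
all(y + x > 2 * y)   # bottle + cap > two bottles: FALSE (x < y always)
all(y > 2 * x)       # bottle > two caps:          FALSE (not guaranteed)
all(2 * x > y)       # two caps > bottle:          FALSE (not guaranteed)
all(y + x > 2 * x)   # bottle + cap > two caps:    TRUE (reduces to y > x)
all(2 * y > 3 * x)   # two bottles > three caps:   FALSE (not guaranteed)

Only the fourth statement survives, which matches the algebra: y + x > 2*x simplifies to y > x, the premise itself.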


How to calculate the area of valleys in a curve?

I have a series of daily values, y. For each day, d_i (i.e., each row), I would like to calculate the (graph) area, a_i, of the region between the curve and the horizontal line y = y_i between d_i and the most recent previous occurrence of the value y_i. Sketch below. Because observations occur at regular, discrete timesteps (daily), the calculated area, a_i, is equivalent to the sum of the daily differences between each daily y and y_i (black bars in figure). I'm interested only in valleys, so the calculated area, a_i, can be set to 0 when y is decreasing (y_i - y_(i-1) <= 0).
Toy data below. Expected result shown in dat$a.
dat$a[6] was calculated from 55 - 50;
dat$a[7] was calculated from (60-55)+(60-50). And so on.
library(lubridate)  # for as_date()
dat = data.frame(d = seq.Date(as_date("2021-01-01"), as_date("2021-01-10"), by = "1 day"),
                 y = c(100, 95, 90, 70, 50, 55, 60, 75, 85, 90),
                 a = c(0, 0, 0, 0, 0, 5, 15, 65, 115, 145))
My first thought was to calculate the area between the curve and the horizontal line y = y_i between day d_i and the most recent previous occurrence of the value y_i, using perhaps geiger::area.between.curves(), but I couldn't work out how to identify the most recent previous occurrence of the value y_i.
[In case the context helps, the actual data are daily values of the area (m2) of a wetland not submerged by water. When the water rises, a portion of the wetland that had been dry for some time becomes wet. Here, I'm trying to calculate the extent of the reflooding in m2-days. A portion of the wetland that has been dry for a long time but becomes reflooded will contribute many m2-days to the sum.]
I'm most comfortable in the tidyverse, and such answers are greatly preferred. I am not familiar with data.table.
Thanks in advance
Update
I was able to achieve my desired calculation in Excel, though it's brutally inelegant. A couple hundred rows of an example are linked below. Given that my real data are 180k rows, my poor machine hated the 18 million calculated cells. Though I can move on with my analysis, I am still very interested in an R solution. My implemented approach differs subtly from my imagined R approach in that it sums 'horizontal rectangles', so to speak, each of the same (small) y-unit height, rather than 'vertical rectangles', each of unit width.
Here's the file.
Since the question is missing complete information, we will compute the area under the curve assuming that a day is one unit. Modify as appropriate for your specific problem.
library(pracma)
nr <- nrow(dat)
dat0 <- dat[c(1, 1:nr, nr), ]   # duplicate the first and last rows
dat0[c(1, nr + 2), "y"] <- 0    # drop the duplicates to the y = 0 baseline to close the polygon
with(dat0, abs(polyarea(as.numeric(d), y)))
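The polyarea() call above gives the total area under the curve. For the running per-day valley areas described in the question (the dat$a column), a minimal base-R sketch of the "most recent previous occurrence" rule might look like this; area_valley is a hypothetical helper name, and the quadratic backward scan may be too slow for 180k rows without further work.

# For each day i, walk back to the most recent day j with y[j] >= y[i] and
# sum the gaps y[i] - y[k] over the intervening days; 0 when y is not rising.
area_valley <- function(y) {
  a <- numeric(length(y))
  for (i in seq_along(y)) {
    if (i > 1 && y[i] > y[i - 1]) {
      j <- i - 1
      while (j >= 1 && y[j] < y[i]) j <- j - 1
      a[i] <- sum(y[i] - y[(j + 1):(i - 1)])
    }
  }
  a
}
library(dplyr)
dat %>% mutate(a_check = area_valley(y))  # a_check reproduces dat$a above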

3D Plotting in Scilab: Weird plot behaviour

I want to plot a function in Scilab in order to find the maximum over a range of numbers:
function y=pr(a,b)
m=1/(1/270000+1/a);
n=1/(1/150000+1/a);
y=5*(b/(n+b)-b/(m+b))
endfunction
x=linspace(10,80000,50)
y=linspace(10,200000,50)
z=feval(x,y,pr)
surf(x,y,z);
disp( max(z))
For these values this is the plot:
It's obvious that increasing the X-axis range will not increase the maximum, but increasing the Y-axis range will.
However, from my tests it seems the two axes are mixed up: increasing the X axis will actually double the max Z value.
For example, this is what happens when I increase the Y axis by a factor of ten (which intuitively should increase the function value):
It seems to increase along the other axis (in the sense that the z matrix is calculated for (y, x) pairs of numbers instead of (x, y))!
What am I doing wrong here?
With Scilab's surf you have to use the transposed z if it comes from feval. This is easy to realize if you use a different number of points in the X and Y directions, as surf will then complain about the size of the third argument. So in your case, use:
surf(x,y,z')
For more information see the help page of surf.
Stephane's answer is correct, but I thought I'd try to explain better why / what is happening.
From the help surf page (emphasis mine):
X,Y:
two vectors of real numbers, of lengths nx and ny ; or two real matrices of sizes ny x nx: They define the data grid (horizontal coordinates of the grid nodes). All grid cells are quadrangular but not necessarily rectangular. By default, X = 1:size(Z,2) and Y = 1:size(Z,1) are used.
Z:
a real matrix explicitly defining the heights of nodes, of sizes ny x nx.
In other words, think of surf as surf(Col, Row, Z)
From the help feval page (changed notation for convenience):
z=feval(u,v,f):
returns the matrix z such that z(i,j)=f(u(i),v(j))
In other words, in your z output, the index i runs over rows (and therefore u should represent your rows), and j runs over columns (and therefore v should represent your columns).
Therefore, you can see that you've called feval with the x, y arguments the other way round. In a sense, you should have designed pr so that it expected to be called as pr(y,x) instead, so that when passed to feval as feval(y,x,pr), you would end up with an output whose rows increase with y, and whose columns increase with x.
Then you could have called surf(x, y, z) normally, knowing that x corresponds to columns, and y corresponds to rows.
However, if you don't want to change your whole function just for this, which presumably you don't, then you simply have to transpose z in the call to surf, to ensure that you match x to the columns of z' (i.e., the rows of z), and y to the rows of z' (i.e., the columns of z).
Having said all that, it would probably be much better to make your function vectorized, and just use the surf(x, y, pr) syntax directly.
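For what that could look like, here is a minimal sketch (my own illustration, not tested against your data): the element-wise operators ./ and .^ make pr accept whole grids, and meshgrid builds the ny-by-nx matrices that surf expects.

// Vectorized version of pr: element-wise operators accept matrices
function y = pr(a, b)
    m = (1/270000 + a.^(-1)).^(-1);
    n = (1/150000 + a.^(-1)).^(-1);
    y = 5 * (b ./ (n + b) - b ./ (m + b));
endfunction

x = linspace(10, 80000, 50);
y = linspace(10, 200000, 50);
[X, Y] = meshgrid(x, y);   // X, Y are ny-by-nx grids
Z = pr(X, Y);              // Z(i,j) = pr(x(j), y(i)), so rows follow y
surf(x, y, Z);
disp(max(Z))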

Algorithmically detecting jumps in a time-series

I have about 50 datasets that include all trades within a timeframe of 30 days for about 10 pairs on 5 exchanges. All pairs are of the same asset class, meaning they are strongly correlated and expect to have similar properties, but are on different scales. An example of this data would be
set.seed(1)
n <- 1000
dates <- seq(as.POSIXct("2019-08-05 00:00:00", tz="UTC"), as.POSIXct("2019-08-05 23:59:00", tz="UTC"), by="1 min")
x <- data.frame("t" = sort(sample(dates, 1000)),"p" = cumsum(sample(c(-1, 1), n, TRUE)))
Roughly, I need to identify the relevant local minima and maxima, which happen daily. The yellow marks are my points of interest. Unlike this example, there is usually only one such point per day and I consider each day separately. However, it is hard to filter out noise from my actual points of interest.
My actual goal is to find the exact point, at which the pair started to make a jump and the exact point, at which the jump is over. This needs to be as accurate as possible, as I want to observe which asset moved first and which asset followed at which point in time (as said, they are highly correlated).
Between two extreme values, I want to minimize the distance and maximize the relative/absolute change, as my points of interest are usually close to each other and their difference is quite large.
I already looked at other questions like
Finding local maxima and minima and Algorithm to locate local maxima, and also this algorithm that has the same goal. However, my dataset is extremely noisy. I already reduced the dataset to 5-minute intervals; however, this caused the functions that identify local minima & maxima to miss the relevant points. Therefore, this was not a good solution given my goal.
How can I achieve my goal with a reasonably accurate algorithm? Manually skimming through all the time-series is not an option, since this would require me to evaluate 50 * 30 time-series by hand, which is too time-consuming. I'm really puzzled and have been trying to find a suitable solution for a week.
If more code snippets are wanted, I'm happy to share them; however, they didn't give me meaningful results, which would defeat the purpose of a minimal working example, so I decided to leave them out for now.
EDIT:
First off, I updated the plot and added timestamps to the dataset to give you an idea of the actual resolution. Ideally, the algorithm would detect both jumps on the left: the inner two dots because they're closer together and jump without interruption, and the outer dots because they're more extreme in value. In fact, this may answer the question of whether the algorithm is allowed to look into the future: yes, if there's another local extremum in the range of, say, 30 observations (or 30 minutes), then ignore the intermediate local extrema.
In my data, jumps have ranged from 2% to ~15%, so a jump needs to be at least 2% to be considered, and only if a threshold of 15 (this might be adaptable) consecutive steps in the same direction before / after the peaks and valleys is reached.
A very naive approach was to subset the data around the global minimum and maximum of a day. In most cases, this denoised the data and worked as an indicator. However, it is not robust when the global extrema are not in the range of the jump.
Hope this clarifies why this isn't a statistical question (there are some tests to determine whether a jump has happened, but not for jump arrival time afaik).
In case anyone wants a real example:
this is a corresponding graph, this is the raw data of the relevant period and this is the reduced dataset.
Perhaps as a starting point, look at function streaks in package PMwR (which I maintain). A streak is defined as a move of a specified size that is uninterrupted by a countermove of the same size. The function works with returns, not differences, so I add 100 to your data.
For instance:
library(PMwR)  # provides streaks()

set.seed(1)
n <- 1000
x <- 100 + cumsum(sample(c(-1, 1), n, TRUE))
plot(x, type = "l")
s <- streaks(x, up = 0.12, down = -0.12)
abline(v = s[, 1])  # streak starts
abline(v = s[, 2])  # streak ends
The vertical lines show the starts and ends of streaks.
Perhaps you can then filter the identified streaks by required criteria such as length. Or you may play around with different thresholds for up and down moves (though this is not really recommended in the current implementation, but perhaps the results are good enough). For instance, up streaks might look as follows. A green vertical line shows the start of a streak; a red line shows its end.
plot(x, type = "l")
s <- streaks(x, up = 0.12, down = -0.05)
s <- s[!is.na(s$state) & s$state == "up", ]
abline(v = s[, 1], col = "green")
abline(v = s[, 2], col = "red")
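Building on the length criterion mentioned above, a short illustrative filter (the 30-observation minimum is only an assumed threshold) could be:

s_long <- s[s[, 2] - s[, 1] >= 30, ]  # columns 1 and 2 hold streak starts and ends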

Formula for incremental payment plan starting low and ending high

I think this is simple, but maybe I'm overthinking it or I'm just crap at math.
I'm trying to work out a formula for an incremental payment plan calculator without interest, one that starts with a low payment and ends in the 8th month with a higher payment.
$6,600 / 8 = $825 per month
The above shows $825 per month for 8 months.
I want the first payment to start low and increase each month, so that the last payment is the highest, until the $6,600 is paid.
How would I work this out in math terms?
In some sense you are underthinking it rather than overthinking it, since there are infinitely many solutions and you haven't given any criteria for choosing between those solutions.
Presumably you want the increments to be the same size each month.
Let x be the initial amount and y the monthly step size
You want
x + (x+y) + (x + 2y) + ... + (x + 7y) = 6600
or
8x + 28y = 6600
Mathematically, this equation has infinitely many solutions. If you specify that x,y are positive and that furthermore, x has at most 2 decimal places so as to be exactly expressible as currency, there are still a very large number of solutions.
What you can do is solve for y in terms of x to get that:
y = (1650 - 2x)/7
But -- you would still have to pick x. This formula lets you explore the trade-off between x and y. For example, if you pick x = 500 then y is (approximately) 92.86 (you would probably have to adjust the final payment by a few pennies to get it to balance out in the end).
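As a quick numeric check of that example, here is a short R sketch (the penny adjustment on the final payment is one possible convention, not the only one):

total  <- 6600
months <- 8
x <- 500                                          # chosen first payment
y <- (total - months * x) / sum(0:(months - 1))   # (1650 - 2*500)/7, about 92.86
payments <- round(x + y * (0:(months - 1)), 2)
payments[months] <- payments[months] + (total - sum(payments))  # absorb rounding
payments        # 500.00 592.86 685.71 ... 1150.00
sum(payments)   # 6600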

Connectivity graph of a combinational circuit

I am reading the book called, "VLSI Physical Design: From Graph Partitioning to Timing Closure" by Andrew B. Kahng, Jens Lienig, Igor L. Markov, and Jin Hu.
In that book, there is a picture of a combinational circuit like shown in Fig 1.
For the combinational circuit in Fig 1, the authors show the connectivity graph as shown in Fig 2 below.
My question is: there is no direct connectivity between gates x and y. In that case, why does the graph show two edges between gates (or nodes) x and y?
Thanks for your help.
While there is no direct connectivity between x and y (such as x feeding y), the net N1 connects three nodes: a, x, and y. So, as all three are electrically equivalent, you must preserve the connections for the relationship among all three nodes. Therefore, for N1, you need an edge between a and x, an edge between a and y, and an edge between x and y. Similarly for N2, as it connects b, x, and y, you need an edge between every pin pair among b, x, and y.
In the general case, if you have a multi-pin net, a net that connects multiple nodes, then you will need to have an edge between every pin pair:
"A p-pin net is represented by (p choose 2) total connections between its nodes"
-- connectivity graph definition on p.28.
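As a small illustration of that definition, here is an R sketch of the clique expansion for the two nets discussed above (the net contents are taken from the figure; the code itself is my own illustration):

# Each p-pin net contributes choose(p, 2) edges: one per pin pair
nets <- list(N1 = c("a", "x", "y"),
             N2 = c("b", "x", "y"))
edges <- lapply(nets, function(pins) t(combn(pins, 2)))
edges
# $N1: a-x, a-y, x-y
# $N2: b-x, b-y, x-y   -> x-y appears twice, once per net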
As an aside, you can see that this is a tedious process and the number of edges can quickly grow in this model. If you use a hyperedge and hypergraph model, however, then you only need one hyperedge to represent N1 and one hyperedge to represent N2 (versus the three regular edges).
I do also want to point out that this connectivity definition is very general, and in some cases, the edges between x and y can be removed. For instance, if you are performing timing propagation (e.g., arrival time) on N1, then you only need a directed edge between a and x and a directed edge between a and y.
I hope this helps.
