Using SUMPRODUCT instead of SUMIFS - formula
I'm trying to shorten my formula, but I really don't understand how to use the SUMPRODUCT function. I am a new Excel user and I'm working things out piece by piece.
I've tried using the SUM and SUMIFS functions, but they make the formula very long and the file slow to recalculate whenever changes are made.
=SUM(
SUMIFS(OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,7,15,1),OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,5,15,1),$B5,OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,6,15,1),DATA!$D$2),
SUMIFS(OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,12,15,1),OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,10,15,1),$B5,OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,11,15,1),DATA!$F$2),
SUMIFS(OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,18,15,1),OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,16,15,1),$B5,OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,17,15,1),DATA!$F$2),
SUMIFS(OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,23,15,1),OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,21,15,1),$B5,OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,22,15,1),DATA!$H$2))
I expect the result to be at most half the length of my current formula, or shorter if possible. Thank you in advance.
1. This is where sales are recorded.
2. When daily sales are recorded, the quantities are automatically deducted from inventory.
The columns that contain the codes are the SOLD columns. The sums of the QTY columns for retail, rebates, and 25% are what get deducted.
I managed to reduce the formula, but I'm still working on shortening it further. Here it is:
=IF(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),SUM(
SUMPRODUCT(--(OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,5,15,1)=$B5),--(OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,6,15,1)=DATA!$D$2),OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,7,15,1)),
SUMPRODUCT(--(OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,10,15,1)=$B5),--(OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,11,15,1)=DATA!$F$2),OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,12,15,1)),
SUMPRODUCT(--(OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,16,15,1)=$B5),--(OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,17,15,1)=DATA!$F$2),OFFSET(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),2,18,15,1))))
=IF(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),
SUMPRODUCT(--('DAILY SALES'!G4:G18=$B5),--('DAILY SALES'!H4:H18=DATA!$D$2),'DAILY SALES'!I4:I18)+
SUMPRODUCT(--('DAILY SALES'!L4:L18=$B5),--('DAILY SALES'!M4:M18=DATA!$F$2),'DAILY SALES'!N4:N18)+
SUMPRODUCT(--('DAILY SALES'!R4:R18=$B5),--('DAILY SALES'!S4:S18=DATA!$F$2),'DAILY SALES'!T4:T18)+
SUMPRODUCT(--('DAILY SALES'!W4:W18=$B5),--('DAILY SALES'!X4:X18=DATA!$H$2),'DAILY SALES'!Y4:Y18))
This is the shortest I can get; however, I lose the ability to simply extend everything to the last day of the year.
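One way to get that flexibility back (a sketch, not tested against the actual workbook: it assumes the daily-sales rows continue directly below row 4 with the same column layout, and row 370 is just an illustrative end row chosen to cover a full year) is to widen the hard-coded ranges once:
=IF(INDEX('DAILY SALES'!$B:$B,MATCH(O$2,'DAILY SALES'!$B:$B,0)),
SUMPRODUCT(--('DAILY SALES'!G4:G370=$B5),--('DAILY SALES'!H4:H370=DATA!$D$2),'DAILY SALES'!I4:I370)+
SUMPRODUCT(--('DAILY SALES'!L4:L370=$B5),--('DAILY SALES'!M4:M370=DATA!$F$2),'DAILY SALES'!N4:N370)+
SUMPRODUCT(--('DAILY SALES'!R4:R370=$B5),--('DAILY SALES'!S4:S370=DATA!$F$2),'DAILY SALES'!T4:T370)+
SUMPRODUCT(--('DAILY SALES'!W4:W370=$B5),--('DAILY SALES'!X4:X370=DATA!$H$2),'DAILY SALES'!Y4:Y370))
SUMPRODUCT evaluates every cell in the ranges it is given, so keep the ranges as tight as practical; whole-column references here would make the workbook slow again.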
Related
Why can't I use the natural logarithm function with the data set?
I've got some code that works with the data set, but I found out that it doesn't work with the ln(x) function. The data set can be found here.
LY <- ln(Apple$Close - Apple$Open)
Warning in log(x) : NaNs produced
Could you please help me fix this problem?
Since stocks can go down as well as up (unfortunately), Close can be less than Open, and Close - Open can be negative. It just doesn't make sense to take the natural log of a negative number; it's like dividing by zero, or more precisely like taking the square root of a negative number. Actually, you can take the logarithm of a complex number with a negative real part:
log(as.complex(-1))
## [1] 0+3.141593i
... but "i times pi" is probably not a very useful result for further data analysis.
(In R, log() takes the natural logarithm. While the SciViews package provides ln() as a synonym, you might as well just get used to using log() - this is a convention across most programming languages.)
Depending on what you're trying to do, the logarithm of the close/open ratio, log(Close/Open), can be a useful value: it is negative when Close < Open and positive when Close > Open. As @jpiversen points out, this is called the logarithmic return; as @KarelZe points out, log(Close/Open) is mathematically equivalent to log(Close) - log(Open) (which might be what your professor wanted).
Are you looking for logarithmic return? In that case the formula would be:
log(Apple$Close / Apple$Open)
Since A / B for two positive values is always positive, this will not create NaNs.
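To make this concrete, here is a minimal R sketch with made-up prices (the Apple data frame and column names follow the question):
Apple <- data.frame(
  Open  = c(100, 102, 99),
  Close = c(102, 99, 101)
)

# Raw difference: negative on a down day, so log() warns and returns NaN
log(Apple$Close - Apple$Open)     # NaN in the second position

# Logarithmic return: defined whenever both prices are positive
log(Apple$Close / Apple$Open)     # negative on down days, positive on up days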
Formulas and fractions in LyX
I want to write the following formulas in LyX: Ic = (Number of new cases in D)/P0, and this one: TBM = (Total number of all-cause deaths in a given region over a specified period)/(Estimated total exposed population of the same region during the same period). Can someone help me, please?
When you are in math mode, type \text and then a space, then start typing normal text. Or try Ctrl + M (when you are already inside math mode). For more information, please read Help > Math; inserting text is discussed there.
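For reference, in raw LaTeX (LyX's math mode accepts the same \frac and \text commands) the two formulas might look roughly like this; the symbols I_c and P_0 are my guess at what "Ic" and "P0" stand for:
I_c = \frac{\text{Number of new cases in } D}{P_0}

TBM = \frac{\text{Total number of all-cause deaths in a given region over a specified period}}{\text{Estimated total exposed population of the same region during the same period}}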
agrep max.distance arguments in R
I need some help with the specific arguments of the agrep function in R. In terms of cost, all, insertions, deletions, and substitutions each take a "maximum number/fraction of substitutions" integer or fraction as an input parameter. I've read the documentation on it, but I still cannot figure out some specifics: What is the difference between cost=1 and all=1? How is a decimal interpreted, such as cost=0.1, inserts=0.9, all=0.25, etc.? I understand the basics of the Levenshtein distance, but how is it applied in terms of the cost or all arguments? Sorry if this is fairly basic, but like I said, the documentation I have read on it is slightly confusing. Thanks in advance.
Not 100% certain, but here is my understanding:
In max.distance, cost and all are interchangeable if you don't specify a costs argument (this is the next argument); if you do, then cost will limit based on the weighted (as per costs) costs of the insertions/deletions/substitutions you specified, whereas all will limit on the raw count of those operations.
The fraction represents what fraction of the number of characters in your pattern argument you want to allow as insertions/deletions/substitutions (i.e. 0.1 on a 10-character pattern would allow 1 change). If you specify costs, then it is the fraction of (# of characters in pattern) * max(costs), though presumably fractions in max.distance{insertions/deletions/substitutions} will be (# of characters) * the corresponding costs value.
I agree that the documentation is not as complete as it could be. I discovered the above by building simple test examples and messing around with them. You should be able to do the same and confirm for yourself, particularly the last part (i.e. whether costs affects the fraction measure of max.distance{insertions/deletions/substitutions}), which I haven't tested.
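A quick way to see the cost-vs-all difference is a toy example like the following (a sketch: the strings and cost values are made up, and the expected matches in the comments follow the reasoning above):
x <- c("apple", "appla", "appxa")

# One edit of any kind allowed: "apple" (exact) and "appla" (one substitution) should match
agrep("apple", x, max.distance = list(all = 1), value = TRUE)

# Same limit expressed as cost, but substitutions now cost 2,
# so "appla" (one substitution = cost 2) no longer fits within cost = 1
agrep("apple", x, max.distance = list(cost = 1),
      costs = list(insertions = 1, deletions = 1, substitutions = 2),
      value = TRUE)

# A fraction is relative to the pattern length: 0.2 * 5 characters allows 1 edit
agrep("apple", x, max.distance = 0.2, value = TRUE)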
How to select stop words using tf-idf? (non-English corpus)
I have managed to evaluate the tf-idf function for a given corpus. How can I find the stopwords and the best words for each document? I understand that a low tf-idf for a given word and document means that it is not a good word for selecting that document.
Stop words are those words that appear very commonly across the documents, therefore losing their representativeness. The best way to observe this is to measure the number of documents a term appears in and filter out those that appear in more than 50% of them, or the top 500, or some other threshold that you will have to tune.
The best (as in most representative) terms in a document are those with higher tf-idf, because those terms are common in the document while being rare in the collection.
As a quick note, as @Kevin pointed out, very common terms in the collection (i.e., stop words) produce very low tf-idf anyway. However, they will still affect some computations, and this would be wrong if you assume they are pure noise (which might not be true depending on the task). In addition, if they are included, your algorithm will be slightly slower.
Edit: As @FelipeHammel says, you can directly use the IDF (remember to invert the order) as a measure which is (inversely) proportional to df. This is completely equivalent for ranking purposes, and therefore for selecting the top "k" terms. However, it cannot be used to select based on ratios (e.g., words that appear in more than 50% of the documents), although a simple threshold will fix that (i.e., selecting terms with IDF lower than a specific value). In general, a fixed number of terms is used.
I hope this helps.
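A minimal base-R sketch of the document-frequency filter described above (the toy corpus is made up; in practice you would plug in your own tokenized documents):
# One character vector of tokens per document (made-up toy corpus)
docs <- list(
  c("the", "cat", "sat", "on", "the", "mat"),
  c("the", "dog", "ate", "a", "bone"),
  c("a", "cat", "and", "a", "dog", "played"),
  c("the", "mat", "was", "red")
)
n_docs <- length(docs)

# Document frequency: in how many documents does each term appear?
df <- table(unlist(lapply(docs, unique)))

# Candidate stop words: terms appearing in more than 50% of the documents
stop_words <- names(df[df / n_docs > 0.5])
stop_words   # here only "the" crosses the threshold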
From "Introduction to Information Retrieval" book: tf-idf assigns to term t a weight in document d that is highest when t occurs many times within a small number of documents (thus lending high discriminating power to those documents); lower when the term occurs fewer times in a document, or occurs in many documents (thus offering a less pronounced relevance signal); lowest when the term occurs in virtually all documents. So words with lowest tf-idf can considered as stop words.
When is it appropriate to use floating precision data types?
It's clear that one shouldn't use floating precision when working with, say, monetary amounts, since the variation in precision leads to inaccuracies when doing calculations with those amounts. That said, what are the use cases where it is acceptable? And what general principles should one keep in mind when deciding?
Floating point numbers should be used for what they were designed for: computations where what you want is a fixed precision, and you only care that your answer is accurate to within a certain tolerance. If you need an exact answer in all cases, you're best using something else.
Here are three domains where you might use floating point:
Scientific simulations. Science apps require a lot of number crunching, and often use sophisticated numerical methods to solve systems of differential equations. You're typically talking double-precision floating point here.
Games. Think of games as a simulation where it's OK to cheat. If the physics is "good enough" to seem real, then it's OK for games, and you can make up in user experience what you're missing in terms of accuracy. Games usually use single-precision floating point.
Stats. Like science apps, statistical methods need a lot of floating point. A lot of the numerical methods are the same; the application domain is just different. You find a lot of statistics and Monte Carlo simulations in financial applications and in any field where you're analyzing a lot of survey data.
Floating point isn't trivial, and for most business applications you really don't need to know all these subtleties. You're fine just knowing that you can't represent some decimal numbers exactly in floating point, and that you should be sure to use some decimal type for prices and things like that.
If you really want to get into the details and understand all the tradeoffs and pitfalls, check out the classic "What Every Computer Scientist Should Know About Floating-Point Arithmetic", or pick up a book on Numerical Analysis or Applied Numerical Linear Algebra if you're really adventurous.
I'm guessing you mean "floating point" here. The answer is, basically: any time the quantities involved are approximate or measured, rather than exact; any time the quantities involved are larger than can be conveniently represented precisely on the underlying machine; any time the need for computational speed outweighs exact precision; and any time the appropriate precision can be maintained without other complexities. For more details, you really need to read a numerical analysis book.
Short story is that if you need exact calculations, DO NOT USE floating point.
Don't use floating point numbers as loop indices. Don't get caught doing:
for (d = 0.1; d < 1.0; d += 0.1) { /* some code ... */ }
You will be surprised.
Don't use floating point numbers as keys to any sort of map, because you can never count on equality behaving the way you expect.
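A quick illustration of that surprise in R (a sketch; the exact printed digits depend on the platform, but on ordinary IEEE-754 doubles the comparison fails):
# Adding 0.1 ten times does not land exactly on 1.0
x <- 0
for (i in 1:10) x <- x + 0.1
x == 1                    # FALSE
print(x, digits = 17)     # slightly below 1, e.g. 0.99999999999999989

# which is why a loop condition like "d < 1.0" with d += 0.1
# may run one more (or one fewer) time than you expect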
Most real-world quantities are inexact, and typically we know their numeric properties with a lot less precision than a typical floating-point value. In almost all cases, the C types float and double are good enough. It is necessary to know some of the pitfalls. For example, testing two floating-point numbers for equality is usually not what you want, since all it takes is a single bit of inaccuracy to make the comparison non-equal. tgamblin has provided some good references. The usual exception is money, which is calculated exactly according to certain conventions that don't translate well to binary representations. Part of this is the constants used: you'll never see a pi% interest rate, or a 22/7% interest rate, but you might well see a 3.14% interest rate. In other words, the numbers used are typically expressed in exact decimal fractions, not all of which are exact binary fractions. Further, the rounding in calculations is governed by conventions that also don't translate well into binary. This makes it extremely difficult to precisely duplicate financial calculations with standard floating point, and therefore people use other methods for them.
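The representation issue is easy to see directly (an R sketch; any language with IEEE-754 doubles behaves the same way):
# The decimal 0.1 has no exact binary representation
sprintf("%.20f", 0.1)     # "0.10000000000000000555..."

# The two sides accumulate different rounding errors
0.1 + 0.2 == 0.3          # FALSE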
It's appropriate to use floating point types when dealing with scientific or statistical calculations. These will invariably only have, say, 3-8 significant digits of accuracy. As to whether to use single- or double-precision floating point types, this depends on your need for accuracy and how many significant digits you need. Typically, though, people just end up using doubles unless they have a good reason not to.
For example, if you measure distance or weight or any physical quantity like that, the number you come up with isn't exact: it has a certain number of significant digits based on the accuracy of your instruments and your measurements. For calculations involving anything like this, floating point numbers are appropriate.
Also, if you're dealing with irrational numbers, floating point types are appropriate (and really your only choice), e.g. linear algebra, where you deal with square roots a lot.
Money is different because you typically need to be exact and every digit is significant.
I think you should ask the other way around: when should you not use floating point? For most numerical tasks, floating point is the preferred data type, as you can (almost) forget about overflow and other kinds of problems typically encountered with integer types.
One way to look at the floating point data type is that the precision is independent of the magnitude: whether the number is very small or very big (within an acceptable range, of course), the number of meaningful digits is approximately the same.
One drawback is that floating point numbers have some surprising properties: x == x can be false (if x is NaN), and they do not follow most mathematical rules (e.g. distributivity, that is x*(y + z) != x*y + x*z). Depending on the values of x, y, and z, this can matter.
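A couple of those surprises, shown in R (a sketch; note that R reports comparisons involving NaN as NA rather than FALSE, but the underlying point is the same):
# Grouping changes the result: floating point addition is not associative
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # FALSE
print((0.1 + 0.2) + 0.3, digits = 17)    # slightly above 0.6
print(0.1 + (0.2 + 0.3), digits = 17)    # slightly below 0.6

# NaN never compares equal to anything, including itself
x <- NaN
x == x        # NA in R; use is.nan(x) (TRUE) to test for NaN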
From Wikipedia: Floating-point arithmetic is at its best when it is simply being used to measure real-world quantities over a wide range of scales (such as the orbital period of Io or the mass of the proton), and at its worst when it is expected to model the interactions of quantities expressed as decimal strings that are expected to be exact. Floating point is fast but inexact. If that is an acceptable trade off, use floating point.