How to apply conditional number format? - qt

With this modelData.amount.toLocaleCurrencyString() I get:
+---------+---------+
| Input | Output |
+---------+---------+
| 100000 | 100,000 |
| 1000000 | 1e+06 |
| -10000 | -10,000 |
| 0 | 0 |
+---------+---------+
Why do I get scientific notation for numbers above 999,999? This actually isn't that useful for me. What I need is conditional formatting like #,##0;(#,##0);- that puts negative in parentheses and converts 0 to - in addition to normal comma separators for positives.
I also don't want currency symbol in my numbers.

Related

Jupyter Notebook and Lab can not render this markdown snippet from Google Colab

This is how it looks on Google Colab when rendered,
The actual script
# Notation
Here is a summary of some of the notation you will encounter.
|General <img width=70/> <br /> Notation <img width=70/> | Description<img width=350/>| Python (if applicable) |
|: ------------|: ------------------------------------------------------------||
| $a$ | scalar, non bold ||
| $\mathbf{a}$ | vector, bold ||
| **Regression** | | | |
| $\mathbf{x}$ | Training Example feature values (in this lab - Size (1000 sqft)) | `x_train` |
| $\mathbf{y}$ | Training Example targets (in this lab Price (1000s of dollars)). | `y_train`
| $x^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `x_i`, `y_i`|
| m | Number of training examples | `m`|
| $w$ | parameter: weight, | `w` |
| $b$ | parameter: bias | `b` |
| $f_{w,b}(x^{(i)})$ | The result of the model evaluation at $x^{(i)}$ parameterized by $w,b$: $f_{w,b}(x^{(i)}) = wx^{(i)}+b$ | `f_wb` |
However, in my local Jupyter Notebook/Lab it doesn't render correctly, I installed these extensions in Jupyter Lab
still it won't render and looks something like this
Try this below, as it is close:
# Notation
Here is a summary of some of the notation you will encounter.
| General <br> Notation <br /> | Description | Python (if applicable) |
| :-: | :----: | :- |
| $$a$$ | scalar, non bold ||
| $$\mathbf{a}$$ | vector, bold ||
| **Regression** | | | |
| $$\mathbf{x}$$ | Training Example feature values (in this lab - Size (1000 sqft)) | `x_train` |
| $$\mathbf{y}$$ | Training Example targets (in this lab Price (1000s of dollars)). | `y_train`
| $$x^{(i)}, y^{(i)}$$ | $$i_{th}$$Training Example | `x_i`, `y_i`|
| m | Number of training examples | `m`|
| $$w$$ | parameter: weight, | `w` |
| $$b$$ | parameter: bias | `b` |
| $$f_{w,b}(x^{(i)})$$ | The result of the model evaluation at $$x^{(i)}$$ parameterized by $$w,b$$: $$f_{w,b}(x^{(i)}) = wx^{(i)}+b$$ | `f_wb` |
Yields in classic notebook in session launched here:
Most of the issue is explained by here; it seems you need to use double dollar signs when embedding latex in a table. So for all but the first few rows, I simply did find replace to double the dollar sign symbols, and then pasted that in. ( I later realized I needed to hand edit the $$x^{(i)}, y^{(i)}$$ line.) The first few rows I did by hand trying to understand how they matched and attempting to control the alignment.
I cannot say what is going on with the alignment. According to here and even using that code there's a way to align left the first column. It kept messing up the table though incorporating that and the latex.

Addition of calculated field in rpivotTable

I want to create a calculated field to use with the rpivotTable package, similar to the functionality seen in excel.
For instance, consider the following table:
+--------------+--------+---------+-------------+-----------------+
| Manufacturer | Vendor | Shipper | Total Units | Defective Units |
+--------------+--------+---------+-------------+-----------------+
| A | P | X | 173247 | 34649 |
| A | P | Y | 451598 | 225799 |
| A | P | Z | 759695 | 463414 |
| A | Q | X | 358040 | 225565 |
| A | Q | Y | 102068 | 36744 |
| A | Q | Z | 994961 | 228841 |
| A | R | X | 454672 | 231883 |
| A | R | Y | 275994 | 124197 |
| A | R | Z | 691100 | 165864 |
| B | P | X | 755594 | 302238 |
| . | . | . | . | . |
| . | . | . | . | . |
+--------------+--------+---------+-------------+-----------------+
(my actual table has many more columns, both dimensions and measures, time, etc. and I need to define multiple such "calculated columns")
If I want to calculate defect rate (which would be Defective Units/Total Units) and I want to aggregate by either of the first three columns, I'm not able to.
I tried assignment by reference (:=), but that still didn't seem to work and summed up defect rates (i.e., sum(Defective_Units/Total_Units)), instead of sum(Defective_Units)/sum(Total_Units):
myData[, Defect.Rate := Defective_Units / Total_Units]
This ended up giving my defect rates greater than 1. Is there anywhere I can declare a calculated field, which is just a formula evaluated post aggregation?
You're lucky - the creator of pivottable.js foresaw cases like yours (and mine, earlier today) by implementing an aggregator called "Sum over Sum" and a few more, likewise, cf. https://github.com/nicolaskruchten/pivottable/blob/master/src/pivot.coffee#L111 and https://github.com/nicolaskruchten/pivottable/blob/master/src/pivot.coffee#L169.
So we'll use "Sum over Sum" as parameter "aggregatorName", and the columns whose quotient we want in the "vals" parameter.
Here's a meaningless usage example from the mtcars data for reproducibility:
require(rpivotTable)
data(mtcars)
rpivotTable(mtcars,rows="gear", cols=c("cyl","carb"),
aggregatorName = "Sum over Sum",
vals =c("mpg","disp"),
width="100%", height="400px")

Get weight of words by occurence

Maybe this is related to math.stacexhange, but I am affraid, that I will get a formula in answer what I won't undersand.
I have products in our database, and I have products from different suppliers in another table.
What I want is to pair, these supplieres products to our products if it is possible, or show for me at least show me a list, where the matching is high.
I did iterate throught all the suppliers products, and explodes the product name by spaces, and store it in a table, and the count of the occurence.
The table seems like this.
+--------+-------------+---------------+-------+
| id | word | originalWord | count |
+--------+-------------+---------------+-------+
| 220950 | Tracer | Tracer | 493 |
| 220951 | Destroyer | Destroyer | 3 |
| 220952 | Avago5050 | Avago5050 | 4 |
| 220953 | mouse | mouse | 2535 |
| 220954 | TRAMYS44916 | /TRAMYS44916/ | 2 |
| 220955 | GameZone | GameZone | 16 |
| 220956 | Enduro | Enduro | 3 |
| 220957 | AVAGO | AVAGO | 10 |
| 220958 | 5050 | 5050 | 4 |
| 220959 | optical | optical | 2370 |
| 220960 | USB | USB | 6160 |
+--------+-------------+---------------+-------+
and so on. Of course, in another table I stored, what is the product id for each word.
So what I want is to determine the weight of a word by occurence.
As you see, the word TRAMYS44916 is occured only twice, almost certain that is a partnumber, so this is the most heavy word. It weight should be 1.
Let's say the most occured is USB with 6160 occurence, so it weight should be like 0.01 or something like that, I think.
What is the best way to get all the weights of the words?
There are other tables for other suppliers so dispersion is always change.
This reminds me of Naive Bayes text classification, so to determine which product should it belongs to, you can calculate tf-idf of all the words.
Then if you want to pair it from another product name, you can decompose it to words again and select the product id based on the highest term value, however maybe you should specify some threshold for this, because in some cases it would not be that clear.
tf-idf = ("number of word matches in product name"/"word count of product name") * log ("number of products" / "number of products that contains the word")
You can see how it is done in the example here (In your case the document will be the product full name): https://en.wikipedia.org/wiki/Tf–idf#Example_of_tf.E2.80.93idf
Example implementation in Java: https://guendouz.wordpress.com/2015/02/17/implementation-of-tf-idf-in-java/

Combine DataFrame rows into a new column

I am wondering if there is simple way to achieve this in Julia besides iterating over the rows in a for-loop.
I have a table with two columns that looks like this:
| Name | Interest |
|------|----------|
| AJ | Football |
| CJ | Running |
| AJ | Running |
| CC | Baseball |
| CC | Football |
| KD | Cricket |
...
I'd like to create a table where each Name in first column is matched with a combined Interest column as follows:
| Name | Interest |
|------|----------------------|
| AJ | Football, Running |
| CJ | Running |
| CC | Baseball, Football |
| KD | Cricket |
...
How do I achieve this?
UPDATE: OK, so after trying a few things including print_joint and grpby, I realized that the easiest way to do this would be by() function. I'm 99% there.
by(myTable, :Name, df->DataFrame(Interest = string(df[:Interest])))
This gives me my :Interest column as "UTF8String[\"Running\"]", and I can't figure out which method I should use instead of string() (or where to typecast) to get the desired ASCIIString output.

Creating a unique integer on the basis of a string

I have a larger dataset (data.table with approx 9m rows) with a column that I would like to use to aggregate values (min and max etc). The column is a combination of various other columns and has a string based format, like the one below:
string <- "318XXXX | VNSGN | BIER"
To gain some speed in performing tasks, I would like to recode this to a unique integer. Another application that I use on a regular basis to deal with data has a build-in function that transforms a string as the one above in a integer (e.g. 73823). I was wondering whether there is a similar function in R? The idea is that a particular string will always result in the same integer; this will allow it to be used in merging data.tables etc.
Here a little example of the data.table column that I would like to encode in simple integer values:
sample <- c("318XXXX | VNSGN | BIER", "462XXXX | TZZZH | 9905", "462XXXX | TZZZH | 9905",
"462XXXX | TZZZH | 9905", "511XXXX | FAWOR | 336H", "511XXXX | FAWOR | 336H",
"652XXXX | XXXXR | T136", "652XXXX | XXXXR | T136", "672XXXX | BQQSZ | 7777",
"672XXXX | BQQSZ | 7777")
I am hoping to encode the strings into an additional column to the table like the one below; note that the same strings result in the same numbers.
String Number
318XXXX | VNSGN | BIER 19872
462XXXX | TZZZH | 9905 78392
462XXXX | TZZZH | 9905 78392
462XXXX | TZZZH | 9905 78392
511XXXX | FAWOR | 336H 23053
511XXXX | FAWOR | 336H 23053
652XXXX | XXXXR | T136 95832
652XXXX | XXXXR | T136 95832
672XXXX | BQQSZ | 7777 71829
672XXXX | BQQSZ | 7777 71829
The data.table package will create indexes for you without making you handle them explicitly so it would be less work than the approach in the question. See the setkey function in data.table.
Also the sqldf package can use the SQL create index statement as per Examples 4h and 4i on the sqldf home page as can just about any database package.

Resources