Is it possible to add more than one measure in a single crosstab?
I have a crosstab in which rows represent departments and columns represent gender. Is it possible to display both counts and percentages?
I am reporting from relational data.
Yes. You can add multiple measures to your crosstab by nesting them in either rows or columns.
In your example, nest Counts and Percentages under the Departments row or under the Gender column.
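The nesting itself is done in the crosstab designer, but it may help to see what the two nested measures compute. The sketch below is a plain-Python illustration, not Cognos code. The records are hypothetical, and it assumes "percentage" means the share of each gender's column total.

```python
from collections import Counter

# Hypothetical employee records: (department, gender). Illustrative data only.
records = [
    ("Sales", "F"), ("Sales", "M"), ("Sales", "F"),
    ("IT", "M"), ("IT", "M"), ("IT", "F"),
    ("HR", "F"),
]

def crosstab_counts_and_pcts(rows):
    """Return {(dept, gender): (count, percent of that gender's column total)}."""
    cell_counts = Counter(rows)
    # One column total per gender, summed over all departments.
    col_totals = Counter(gender for _, gender in rows)
    return {
        (dept, gender): (n, 100.0 * n / col_totals[gender])
        for (dept, gender), n in cell_counts.items()
    }

table = crosstab_counts_and_pcts(records)
for dept in sorted({d for d, _ in records}):
    for gender in ("F", "M"):
        # Cells with no records show a zero count and zero percent.
        n, pct = table.get((dept, gender), (0, 0.0))
        print(f"{dept:6} {gender}  count={n}  pct={pct:5.1f}%")
```

Each cell carries two values, which is what nesting Counts and Percentages under a row or column edge displays.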
Below is an example of displaying two measures, Revenue and Quantity, in a crosstab.
The result looks like this:
Please refer to the IBM Infocenter for more details on nested crosstabs.
http://publib.boulder.ibm.com/infocenter/c8bi/v8r4m0/index.jsp
Related
I would like to know how to combine duplicated/repeated rows into one. I am currently working with an OTU table with taxonomy, which is the result of merging five different sequencing runs. Each taxon (row) has counts for each sample (column).
The problem is that merging the runs and keeping all the results has left me with many duplicated/repeated taxa. For example:
What I would like to do is combine/summarize those repeated taxa into a single row. I do not want to remove any data, I just want to combine it. I have seen similar posts in the forum, but most of them just want to delete the duplicates. I am not sure how to proceed; any help will be appreciated!
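Combining duplicated taxa without losing data usually means summing their counts sample by sample. The sketch below shows that in plain Python. The taxon names, sample columns and counts are hypothetical, and it assumes rows with an identical taxonomy label are meant to be the same taxon.

```python
# Hypothetical merged OTU table: taxon label plus per-sample counts.
samples = ["S1", "S2", "S3"]
otu_rows = [
    ("Bacteroides", [10, 0, 3]),
    ("Prevotella", [5, 2, 0]),
    ("Bacteroides", [1, 4, 0]),
    ("Prevotella", [0, 0, 7]),
    ("Escherichia", [2, 2, 2]),
]

def collapse_duplicates(rows):
    """Sum counts column-wise for all rows that share a taxon label."""
    combined = {}  # dicts preserve first-seen order (Python 3.7+)
    for taxon, counts in rows:
        if taxon not in combined:
            combined[taxon] = list(counts)
        else:
            combined[taxon] = [a + b for a, b in zip(combined[taxon], counts)]
    return combined

collapsed = collapse_duplicates(otu_rows)
for taxon, counts in collapsed.items():
    print(taxon, dict(zip(samples, counts)))
```

Every original count still appears in exactly one output row, so per-sample totals are unchanged. In R, base `rowsum()` applied to the count matrix, grouped by the taxonomy column, performs the same grouped sum.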
I am extremely new to R and not yet familiar with its various packages.
I am simply using the Soybean data from mlbench (library(mlbench); data(Soybean)), and I want to visualize in a table the CLASS factor (19 levels) against various categorical variables (date, plant.stand, precip, etc.; there are 35 such variables). I want to show the frequency, the number of NAs and the mode. In essence, each class would be broken out by each categorical variable (date, plant.stand, precip, etc.) with its frequency data.
I'm sure there must be a simple way, but I'm very new to R.
Thanks for the help.
Update
As per the table below:
table data
I basically want to count all the categorical data (i.e. date, plant.stand, precip, etc.) and sort it by the CLASS variable. The only way I can think of is to create a key for each factor level of each categorical variable, count the occurrences of each key and then sort. Is there an easier way?
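No per-level key is needed: grouping by class and then tallying each variable's values gives the frequency, NA count and mode directly. The sketch below shows this in plain Python. The rows, the column name `Class` and the level labels are illustrative only, and `None` stands in for R's NA.

```python
from collections import Counter, defaultdict

# Hypothetical rows; None marks a missing value (NA in R).
rows = [
    {"Class": "brown-spot", "date": "6", "precip": "2"},
    {"Class": "brown-spot", "date": "5", "precip": "2"},
    {"Class": "brown-spot", "date": None, "precip": "1"},
    {"Class": "phytophthora-rot", "date": "1", "precip": None},
    {"Class": "phytophthora-rot", "date": "1", "precip": "2"},
]

def summarize(rows, class_col, variables):
    """For each (class, variable): level frequencies, NA count and mode."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[class_col]].append(r)
    summary = {}
    for cls, members in groups.items():
        for var in variables:
            values = [m[var] for m in members]
            freq = Counter(v for v in values if v is not None)
            n_na = sum(v is None for v in values)
            # Mode of the non-missing values; ties keep the first level seen.
            mode = freq.most_common(1)[0][0] if freq else None
            summary[(cls, var)] = {"freq": dict(freq), "na": n_na, "mode": mode}
    return summary

summary = summarize(rows, "Class", ["date", "precip"])
for (cls, var), stats in sorted(summary.items()):
    print(cls, var, stats)
```

In R, `table(Soybean$Class, Soybean$date, useNA = "ifany")` produces the class-by-level frequency table for one variable, including an NA column.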
I am working on a data set in R with these dimensions:
dim(adData)
[1] 15844717 11
Of the 11 features:
one feature has 273596 unique values (random integers used as IDs) out of 15844717 rows;
a second feature has 884353 unique values (random integers used as IDs) out of 15844717 rows.
I am unsure whether to convert them into factors, because categorical variables with a large number of levels may cause problems at modelling time. Please suggest how to treat them.
I am new to Data Science and never worked on large data sets before.
~300k categories for one variable is sure to cause computational issues. I would first take a step back and examine the nature of this variable and its relevance to the prediction at hand. Without knowing the source of the data, it is hard to give specific advice.
If it is truly a categorical variable, it would be silly to leave the ids as numeric variables since the scale and order of the ids are likely meaningless.
Is it possible to group the levels into fewer but still meaningful categories?
Example 1: If the ids were ZIP codes in the United States, there would potentially be 40,000 unique values. These can be grouped into states or regions, reducing the number of levels to 50 or fewer.
Example 2: If the ids were product ids from an e-commerce site, they could be grouped by product category or sub-category, leaving far fewer distinct values to work with.
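Both examples amount to replacing each id with its group through a lookup table. The plain-Python sketch below assumes such a table exists. The product ids, the categories and the `unknown` fallback are hypothetical.

```python
# Hypothetical lookup from product id to product category (illustrative only).
product_category = {
    1001: "electronics", 1002: "electronics",
    2001: "books", 2002: "books", 2003: "books",
}

def to_group(ids, lookup, unknown="unknown"):
    """Map each id to its coarser group; ids missing from the lookup get `unknown`."""
    return [lookup.get(i, unknown) for i in ids]

ids = [1001, 2003, 2001, 9999, 1002]
groups = to_group(ids, product_category)
print(groups)
```

The grouped column, not the raw ids, would then be converted to a factor, so the model sees a handful of levels instead of hundreds of thousands.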
Another option is to examine the relative frequency of each category. If there are a few very common categories and thousands of rare ones, you can leave the common levels intact and group the rare levels into an 'other' category.
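The frequency-based grouping can be sketched as follows. The `min_count` threshold and the toy ids are assumptions for illustration; in practice the cut-off would be chosen by looking at the frequency distribution.

```python
from collections import Counter

def lump_rare_levels(values, min_count, other="other"):
    """Keep levels seen at least `min_count` times; map all rarer levels to `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Toy ids: "a" appears 4 times, "b" 3 times, the rest once each.
ids = ["a", "a", "a", "b", "b", "c", "d", "e", "a", "b"]
lumped = lump_rare_levels(ids, min_count=3)
print(lumped)
```

This assumes no real level is already called "other"; otherwise pick a different label. In R, the forcats package's `fct_lump_*` functions implement this idea for factors.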
I have a table report with product categories on the rows and years on the columns. In the values section, I want to show the number of sales. This works fine. But now I also want to show each product category's percentage of the column total.
I use this DAX measure:
Measure := count(factSales[salesnr])/calculate(count(factSales[salesnr]);all(factSales))
But this yields the percentage of the grand total over all years. I want the percentage of the column total for each separate year.
The ALL(Table) function tells Power Pivot to ignore any filters applied over the whole table. Therefore, you're telling Power Pivot to count all the rows of the factSales table regardless of the Category or Year being filtered on the pivot table.
However, in your case, what you want is the total for ALL the categories in each year. Since you want the total over ALL the categories, you must use `ALL(factsales[categories])`. This way, you ignore only the filters on the categories and not the filters on the years.
Based on this explanation, the DAX formula would be:
Measure :=
count(factSales[salesnr]) / calculate(count(factSales[salesnr]);all(factsales[categories]))
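The difference between the two denominators can be checked with plain arithmetic. The Python sketch below is only a stand-in for the filter logic, not DAX. It uses hypothetical sales rows and assumes the pivot table filters on category and year only.

```python
from collections import Counter

# Hypothetical fact rows: one (category, year) pair per sale.
sales = [
    ("Bikes", 2013), ("Bikes", 2013), ("Helmets", 2013),
    ("Bikes", 2014), ("Helmets", 2014), ("Helmets", 2014), ("Helmets", 2014),
]

cell = Counter(sales)                             # count in each pivot cell
year_total = Counter(year for _, year in sales)   # analogous to ALL(factsales[categories])
grand_total = len(sales)                          # analogous to ALL(factSales)

def pct_of_grand_total(cat, year):
    """Original measure: every filter removed, so the denominator is all rows."""
    return cell[(cat, year)] / grand_total

def pct_of_column_total(cat, year):
    """Corrected measure: only the category filter removed, the year filter kept."""
    return cell[(cat, year)] / year_total[year]

for year in (2013, 2014):
    for cat in ("Bikes", "Helmets"):
        print(year, cat,
              f"{pct_of_grand_total(cat, year):.1%}",
              f"{pct_of_column_total(cat, year):.1%}")
```

With the column-total denominator, the percentages within each year add up to 100%, which is the behaviour the question asks for.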
I have two views that use Table Summarize to count their rows. One view is student arrivals and the other is student departures.
I want to subtract the row count of ViewB (departures) from that of ViewA (arrivals) to get a net figure, perhaps in another view.
I am aware of UNION as a tool for merging, but I don't want to combine the result sets; I only want to subtract the row counts.
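The question does not say which tool provides Table Summarize, so the sketch below uses SQL in an in-memory SQLite database as a stand-in. The `movements` table, its columns and the sample rows are hypothetical. The idea is to subtract two scalar counts, with no UNION, inside a third view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table and the two views from the question.
    CREATE TABLE movements (student_id INTEGER, kind TEXT);
    INSERT INTO movements VALUES
        (1, 'arrival'), (2, 'arrival'), (3, 'arrival'), (1, 'departure');
    CREATE VIEW arrivals AS SELECT * FROM movements WHERE kind = 'arrival';
    CREATE VIEW departures AS SELECT * FROM movements WHERE kind = 'departure';

    -- Net figure: difference of two scalar row counts, with no UNION involved.
    CREATE VIEW net_movement AS
        SELECT (SELECT COUNT(*) FROM arrivals)
             - (SELECT COUNT(*) FROM departures) AS net_count;
""")

net = conn.execute("SELECT net_count FROM net_movement").fetchone()[0]
print(net)  # 3 arrivals - 1 departure = 2
```

Because each subquery returns a single number, the result is one row with one value rather than a merged result set.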
Any suggestions much appreciated!