Slicer refreshing data - ODBC

I have six tables in Power BI, connected using ODBC, that come in measured/targeted pairs: appliances (targeted and measured), electronics (targeted and measured), and household (targeted and measured). Using Merge Queries as New, I combined each pair into three tables (Household, Electronics, Appliances). I have shown the Appliances data in a table visual and a bar graph, and I have created a slicer with three custom options: Appliances, Household, and Electronics. Now I want the slicer to switch the data in the graph and table from Appliances to Household or Electronics, depending on the selection made in the slicer.
Kindly help me out. I have changed the names as the data is highly confidential; let me know if I haven't provided enough clarity.
Thanks a bunch

For a slicer to work the way you want, I think the easiest approach would be to append all your tables into one bigger table, with the product type as a new column.
For example, if the tables Household, Electronics, and Appliances look like this:

Household:
ID   Name    Value
------------------
101  Chair      80
102  Desk      120

Electronics:
ID   Name    Value
------------------
203  Phone     800
206  Tablet    650

Appliances:
ID   Name    Value
------------------
311  Washer    380
367  Dryer     440

they would become
ID   Type         Name    Value
-------------------------------
101  Household    Chair      80
102  Household    Desk      120
203  Electronics  Phone     800
206  Electronics  Tablet    650
311  Appliances   Washer    380
367  Appliances   Dryer     440
Then you'd add a slicer on Type.
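Inside Power BI this stacking is what Append Queries (as New) does in the Power Query editor; you can add the Type column to each source table as a custom column before appending. As a minimal sketch of the same reshape outside Power BI, here it is in R (table contents hypothetical, matching the example above):

library(dplyr)

# Hypothetical stand-ins for the three merged tables
household   <- data.frame(ID = c(101, 102), Name = c("Chair", "Desk"),   Value = c(80, 120))
electronics <- data.frame(ID = c(203, 206), Name = c("Phone", "Tablet"), Value = c(800, 650))
appliances  <- data.frame(ID = c(311, 367), Name = c("Washer", "Dryer"), Value = c(380, 440))

# Stack the rows; .id turns the list names into the new Type column
combined <- bind_rows(
  list(Household = household, Electronics = electronics, Appliances = appliances),
  .id = "Type"
)
combined

A slicer on the resulting Type column then filters every visual built on the combined table.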

Related

How to find duplicates using Lat-Long data and make it a unique identifier in a big dataset

My dataset looks something like this (note: the dataset below is hypothetical).
Objective: a sales employee has to go to a particular location and verify the houses/stores/buildings, and the device captures the information shown below.
Sr.No.  Store_Name    Phone-No.  Agent_id  Area            Lat-Long
1       ABC Stores    89099090   121       Bay Area        23.909090,89.878798
2       Wuhan Masks   45453434   122       Santa Fe        24.452134,78.123243
3       Twitter Cafe  67556090   123       Middle East     11.889766,23.334483
4       abc           33445569   121       Santa Cruz      23.345678,89.234213
5       Silver Gym    11004110   234       Worli Sea Link  56.564311,78.909087
6       CK Clothings  00908876   223       90th Street     34.445887,12.887654
Facts:
#1 The unique identifier is for finding duplicates - e.g. Sr.No. 1 and 4 are basically the same store.
In this dummy dataset, all of the columns can be manipulated for the same store/house/building/outlet:
a) The name is entered manually, so for the same house/store the name can change between entries, and multiple visits can happen.
b) The mobile number can also be manipulated; a different number can be associated with the same outlet.
c) The lat-long captured by the agent's device can also be fudged, by moving closer to or away from the building.
Problem:
How can the Lat-Long data be made the unique identifier for finding duplicates in the huge dataset, keeping point c) above in mind? Deploying QR codes is not very helpful either, as these can also be tweaked.
The goal is to stop a fraudulent practice by employees (the same employee can visit the same store/outlet again, or a different employee can visit the same store/outlet, to increase the visit count).
Right now I can only think of the Lat-Long column for making a UID; please feel free to suggest anything else.
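One direction worth sketching (an illustration, not a full answer): because of point c) exact coordinate matches cannot be trusted, but coordinates can be snapped to a coarse grid so that points falling in roughly the same ~100 m cell become candidate duplicates. In R, with hypothetical values (three decimal places of latitude is about 110 m; the right cell size depends on how far an agent can plausibly stand from the building):

# Hypothetical sample: rows 1 and 4 are the same store recorded twice
stores <- data.frame(
  sr_no = c(1, 2, 3, 4),
  lat   = c(23.909090, 24.452134, 11.889766, 23.908955),
  lon   = c(89.878798, 78.123243, 23.334483, 89.879102)
)

# Snap each point to a grid cell by rounding; rows sharing a cell id
# are candidate duplicates to send for manual review
stores$cell_id <- paste(round(stores$lat, 3), round(stores$lon, 3), sep = "_")
split(stores$sr_no, stores$cell_id)

Plain rounding has edge effects (two points just either side of a cell boundary land in different cells); a pairwise distance threshold, e.g. with geosphere::distHaversine(), avoids that at the cost of comparing more pairs.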

R smbinning package: why 'Too many categories' for some variables?

I have a dataset in R containing many variables of different types and I am attempting to use the smbinning package to calculate Information Value.
I am using the following code:
smbinning.sumiv(Sample,y="flag")
This code produces IV for most of the variables, but for some the Process column states 'Too many categories' as shown in the output below:
Char IV Process
12 relationship NA Too many categories
15 nationality NA Too many categories
22 business_activity NA Too many categories
23 business_activity_group NA Too many categories
25 local_authority NA Too many categories
26 neighbourhood NA Too many categories
If I take a look at the values of business_activity_group for instance, I can see that there are not too many possible values it can take:
Affordable Rent Combined Commercial Community Combined
2546 4
Freeholders Combined Garages
23 6
General Needs Combined Keyworker
57140 340
Leasehold Combined Market Rented Combined
88 1463
Older Persons Combined Rent To Homebuy
4774 76
Shared Ownership Combined Staff Acommodation Combined
167 5
Supported Combined
2892
I thought this could be due to low volumes in some of the categories so I tried banding some of the groups together. This did not change the result.
Can anyone please explain why 'Too many categories' occurs, and what I can do to these variables in order to produce IV from the smbinning package?
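One thing worth checking (an assumption on my part, based on how smbinning documents its factor handling): the package caps the number of levels a factor may have, with a default of 10, and business_activity_group has 13 distinct values in the output above, which would trip that check regardless of the volumes in each category. smbinning.factor() exposes a maxcat argument to raise the cap for an individual variable, so a sketch like this may produce the binning and IV:

library(smbinning)

# Raise the category cap for one variable (maxcat defaults to 10);
# the column may also need converting to a factor first
Sample$business_activity_group <- as.factor(Sample$business_activity_group)
result <- smbinning.factor(df = Sample, y = "flag",
                           x = "business_activity_group", maxcat = 15)
result$ivtable  # inspect the binning and IV if the call succeeds

Note that smbinning.sumiv() itself does not take maxcat, so variables over the cap would need to be handled individually this way, or have their levels grouped until they fall below the cap.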

Counting number of specific strings in an R data frame

I have a data frame that has many columns, but the two columns I am interested in are major and department. I need to find a way to count the number of specific entries in a column. So my data frame looks something like
student_num major dept
123 child education
124 child education
125 special education
126 justice administration
127 justice administration
128 justice administration
129 police administration
130 police administration
What I want is a student count for each major and department. Something like
education child special administration justice police
3 2 1 5 3 2
I have tried several methods, but nothing is quite what I need. I tried the aggregate() function and ddply() from plyr, but they give me department as two - for the two unique entries, education and administration. How can I count the rows for each unique entry rather than the number of unique entries?
You can try:
library(dplyr)
count(my_dataframe, major)
count(my_dataframe, dept)
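With the example data, the first call should print something like this (one row per distinct value, counted in column n; the groups come back alphabetically):

count(my_dataframe, major)
    major n
1   child 2
2 justice 3
3  police 2
4 special 1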
# Create example data frame
dt <- read.table(text = "student_num major dept
123 child education
124 child education
125 special education
126 justice administration
127 justice administration
128 justice administration
129 police administration
130 police administration",
header = TRUE, stringsAsFactors = FALSE)
# Select columns
dt <- dt[, c("major", "dept")]
# Unlist the data frame
dt_vec <- unlist(dt)
# Count the number
table(dt_vec)
dt_vec
administration child education justice police
5 2 3 3 2
special
1

Performing a test on counts

In R I have a dataset data1 that contains game and times. There are 6 games, and times simply tells us how many times a game has been played in data1. So head(data1) gives us
game times
1 850
2 621
...
6 210
Similar for data2 we get
game times
1 744
2 989
...
6 711
And sum(data1$times) is a little higher than sum(data2$times). We have about 2000 users in data1 and about 1000 users in data2, but I do not think that information is relevant.
I want to compare the two datasets, see if there is a statistically significant difference, and find out which game "causes" that difference.
What test should I use to compare these? I don't think Pearson's chisq.test is the right choice in this case; maybe wilcox.test is the right one to choose?
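For what it's worth, a chi-square test of homogeneity is a common fit for exactly this shape of data (two independent sets of counts over the same 6 categories), and its standardized residuals point at the games driving any difference. A sketch, assuming data1 and data2 are as described:

# 2 x 6 contingency table: one row per dataset, one column per game
counts <- rbind(data1 = data1$times, data2 = data2$times)

test <- chisq.test(counts)
test$p.value   # small p-value: the two play distributions differ

# Standardized residuals: cells with large absolute values (roughly
# |r| > 2) flag the games that drive the difference
test$stdres

wilcox.test compares two samples of measurements rather than two vectors of category counts, so it does not match this layout.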

How to perform an aggregate function two times with two different criteria on a data.table

I'd like to use igraph to display the relationship between a Switch and the Networks impacted when there is a problem with that switch. Since there are lots of incidents, I would like to show the 10 most problematic switches and the networks they impact.
df
Incident Date Switch ImpactedNetwork
123 1/1/2012 A Wireless
455 1/2/2012 B LocalLan
460 1/3/2012 A LocalLan
465 1/4/2012 A Production
etc
To assign 1 every time an incident happens with a switch:
df$count <- 1
To come up with the top 10 most problematic switches:
library(data.table)
setDT(df)                                # := needs a data.table
df[, Total := sum(count), by = Switch]   # adds the per-switch total by reference
df
Incident Date Switch ImpactedNetwork count Total
123 1/1/2012 A Wireless 1 3
455 1/2/2012 B LocalLan 1 1
460 1/3/2012 A LocalLan 1 3
465 1/4/2012 A Production 1 3
From this df, how do I get the top 10 switches by Total?
Once I have determined the top ten switches, I need the aggregate count of ImpactedNetwork for each of those problematic switches.
t<-aggregate(count~ImpactedNetwork+Switch, df, sum)
t
ImpactedNetwork Switch count
Production      A      1
Wireless        A      1
g <- graph.data.frame(t[, c("Switch", "ImpactedNetwork")])  # build the graph first
plot(g, layout = layout.kamada.kawai, vertex.label = V(g)$name,
     vertex.label.color = "darkblue", edge.arrow.size = 0.9, edge.curved = TRUE,
     edge.label = t$count, edge.label.color = "#F900F9", edge.label.font = 10,
     vertex.shape = "rectangle", edge.color = "darkgreen",
     main = "Top 10 Problematic Switches and Impacted Network")
The idea is this:
1. Calculate which switches generate the most incidents.
2. Calculate the count of impacted networks given the switch.
3. igraph the result.
Should I first calculate the most incident-generating switches in one data frame, and then use another data frame to calculate the count of impacted networks? Any ideas are appreciated.
df[,Total:=sum(count),by=Switch][head(order(-Total),10)][ ... etc ... ]
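If it helps, here is one way to write out the whole pipeline in data.table, as a sketch (it assumes df holds one row per incident, as in the example, and that "top 10" means the 10 distinct switches with the most incidents):

library(data.table)
setDT(df)

# Step 1: incidents per switch, then the 10 busiest switches
df[, Total := .N, by = Switch]
top_switches <- df[order(-Total), head(unique(Switch), 10)]

# Step 2: count of incidents per (Switch, ImpactedNetwork) pair,
# restricted to those switches
t <- df[Switch %in% top_switches,
        .(count = .N), by = .(Switch, ImpactedNetwork)]

# Step 3: graph it; edges follow the row order of t, so the labels align
library(igraph)
g <- graph.data.frame(t[, .(Switch, ImpactedNetwork)])
plot(g, edge.label = t$count,
     main = "Top 10 Problematic Switches and Impacted Network")

Using .N instead of a helper column also removes the need for the df$count <- 1 step.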
