How can I build association rules that take count information into account? - r

Hi, I'm studying association rules with R right now, and I have a question.
In transaction data, we consider only buy or non-buy (binary data).
I want to know how to mine association rules from count data, e.g.:
customer item1 item2 item3
1 2 0 1
2 0 1 0
3 1 0 0
The first customer bought two item1s!
But ordinary association rules ignore that count information.
How can we take it into account?

Hi, quantitative association rule (QAR) mining may help here.
First, divide the value range of every item into a few intervals and give each interval a unique label. The original dataset can then be transformed into a binary dataset containing those labels.
For example, suppose the original data contains the following information for item1:
the first person bought 5 item1s
the second person bought 2 item1s
the third person bought 7 item1s.
You can divide the value range of item1 into [0, 3), [3, 6) and [6, 9) and use a1, a2 and a3 to represent those intervals. The item 'item1' is then replaced by three new items, a1, a2 and a3, and the original data becomes:
the first person bought one a2.
the second person bought one a1.
the third person bought one a3.
After doing this for every item, the original dataset is transformed into a binary dataset.
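The binning step described above can be sketched in base R as follows. This is a minimal illustration rather than part of the original answer: the helper name `bin_item`, the bin edges and the labels are assumptions chosen for the example, and a count that falls outside every bin (such as 0) is treated as "item not bought".

```r
# Hypothetical helper: turn one column of purchase counts into logical
# "item=label" columns, one per bin. right = FALSE gives half-open bins [a, b).
bin_item <- function(counts, item, breaks, labels) {
  bins <- cut(counts, breaks = breaks, labels = labels, right = FALSE)
  # TRUE where the customer's count falls in that bin; counts outside
  # every bin (cut() returns NA) become FALSE in all columns
  m <- outer(as.character(bins), labels,
             function(b, l) !is.na(b) & b == l)
  colnames(m) <- paste0(item, "=", labels)
  m
}

# The answer's example: three customers bought 5, 2 and 7 units of item1
item1 <- bin_item(c(5, 2, 7), "item1",
                  breaks = c(0, 3, 6, 9), labels = c("a1", "a2", "a3"))

# The question's data; the bins "1" = [1, 2) and "2+" = [2, Inf) are an assumption
counts <- data.frame(item1 = c(2, 0, 1), item2 = c(0, 1, 0), item3 = c(1, 0, 0))
binary <- do.call(cbind, lapply(names(counts), function(item)
  bin_item(counts[[item]], item, breaks = c(1, 2, Inf), labels = c("1", "2+"))))
binary
```

The result is an ordinary logical (buy / not-buy) matrix, so a standard association-rule miner can use it; with the arules package, for example, a logical matrix with column names can be coerced with `as(binary, "transactions")` and passed to `apriori()`.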

Related

Assign new ID taking into account previous changes

Sorry, I do not know how to title my question properly. It is easier to understand with an example.
Sample data
Consider the following example.
> l_ids=as.data.frame(cbind(a=c("strong","intense","intensity"),
id=c("1","2","3"),new_id=c("","1","2")),stringsAsFactors = FALSE)
a id new_id
1 strong 1
2 intense 2 1
3 intensity 3 2
I would like to update the id of each word in a with its new_id, if it has one. Think of this as a synonym dictionary. When I iterate over new_id:
> for (i in 1:nrow(l_ids)){
+ if (nchar(l_ids$new_id[i])>0){
+ l_ids$id[i]=l_ids$new_id[i]
+ }
+ }
> l_ids
a id new_id
1 strong 1
2 intense 1 1
3 intensity 2 2
The problem is that I would like intensity to be given a 1 as well. Is there a way to do this without iterating multiple times?
Update: background
I have a document with a list of synonyms. These synonyms are relevant only to the problem's field of application. Example:
> dictionary
good bad
1 strong intense
2 intense intensity
3 light soft
I am then given a list of words, each with an id. My task is to check whether any of those words is in the bad column of dictionary and, if so, update it with the id of the word to its left. As can be seen, intensity needs two steps to become strong (a good word in the dictionary). Is there a way to do this without multiple iterations (say, a for loop)?
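One possible approach is sketched below; it is not from the original post, and the helper name `resolve` is hypothetical. It does still loop, but only as many times as the longest synonym chain rather than once per word, and each pass updates all words at once with a vectorized `match()`. It assumes the dictionary has no cycles; the loop is capped at `nrow(dictionary)` passes so a cycle cannot make it run forever.

```r
dictionary <- data.frame(good = c("strong", "intense", "light"),
                         bad  = c("intense", "intensity", "soft"),
                         stringsAsFactors = FALSE)

# Hypothetical helper: repeatedly replace every "bad" word by its "good"
# synonym until no word appears in the bad column any more.
resolve <- function(words, dictionary) {
  # An acyclic chain is at most nrow(dictionary) steps long
  for (step in seq_len(nrow(dictionary))) {
    idx <- match(words, dictionary$bad)
    hit <- !is.na(idx)
    if (!any(hit)) break
    words[hit] <- dictionary$good[idx[hit]]
  }
  words
}

l_ids <- data.frame(a = c("strong", "intense", "intensity"),
                    id = c("1", "2", "3"), stringsAsFactors = FALSE)

# Map each word to its final "good" word, then take that word's id;
# words whose final synonym is not in l_ids keep their own id
canonical <- resolve(l_ids$a, dictionary)
new_id <- l_ids$id[match(canonical, l_ids$a)]
l_ids$id <- ifelse(is.na(new_id), l_ids$id, new_id)
l_ids$id
```

Here intensity is rewritten to intense on the first pass and to strong on the second, so all three words end up with id "1".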

Create Expression in Report Builder 3.0 Report to sum a column

I created a report with 3 columns: Department, Ticket Count and Ticket Number. It groups by Department name, and the second column shows one instance of the Ticket Count. The last column shows all of the ticket numbers.
I added a row that shows the grand total for all of the departments displayed.
These are the dataset results:
Department TicketCount TicketNumber
D1 3 12345
D1 3 22345
D1 3 32345
I group by Department and TicketCount so that the display looks like this:
Department TicketCount TicketNumber
D1 3 12345
22345
32345
I want to add a ticket total at the end, but the result always adds up all of the ticket counts instead of just one, so the total displayed is 9, not 3.
I need to create an expression that picks the distinct TicketCount of each department and sums them.
The CountDistinct function returns the correct number of counts when I have multiple departments, but not the values.
I tried the RunningValue function, but it adds up all of the values in the column.
=RunningValue(Fields!ReopenedTicketCount.Value, sum, Nothing)
I need an expression that sums the distinct ticket-count values of each department.
Can anyone point me to the functions I need to use?
I figured this out: I just used the Count function on the third column to get the grand total.

Excel: Select data for graph

To put it simply, I have three columns in Excel like the ones below:
Vehicle x y
1 10 10
1 15 12
1 12 9
2 8 7
2 11 6
3 7 12
x and y are the coordinates of the customers assigned to the corresponding vehicle. This file is the output of a program I run beforehand. The list is always sorted by vehicle, but the number of customers assigned to vehicle "k" may change from one experiment to the next.
I would like to plot a graph containing 3 series, one per vehicle, in which each vehicle's customers appear in a different color (as 2D points based on their x and y values).
In my real file I have 12 vehicles and 3200 customers, and the ranges change from one experiment to the next, so I would like to automate the process, i.e. paste the list into Excel and have the graph appear automatically (if this is possible).
Thanks in advance for your time and effort.
EDIT: There is a similar post here, Use formulas to select chart data, but it requires VBA. Moreover, I am not sure whether it was actually answered.
You should try this free online tool: www.cloudyexcel.com/excel-to-graph/

Conditionally matching elements in multiple columns of two large datasets with each other

I have two very large datasets of product demands and returns (about 4 million entries per dataset, but of unequal length). The first dataset gives [1] the date of demand, [2] the customer id and [3] the product id. The second dataset gives [1] the date of return, [2] the customer id and [3] the product id.
Now I would like to match every demand for a given customer and product with a return by the same customer of the same product. Customer-product pairs are not unique, because a customer can demand a product multiple times. Therefore, I want to match each demand with the earliest return in the dataset. It can also happen that some products are never returned, or that some returned products were never demanded (because customers return items demanded before the start date of the dataset).
To that end, I've written the following code:
transactionNumber = 1:nrow(demandSet) # transaction numbers for the demandSet
matchedNumber = rep(0, nrow(demandSet)) # which rows of returnSet correspond to the transactions in demandSet
for (transaction in transactionNumber) {
  indices <- which(returnSet[, 2] == demandSet[transaction, 2] & returnSet[, 3] == demandSet[transaction, 3])
  if (length(indices) > 0) {
    matchedNumber[transaction] <- indices[which.min(returnSet[indices, ][, 1])] # select the index of the return with the minimum date
  }
}
However, this takes around a day to compute. Does anyone have a better suggestion? Note that the suggestions from "match two columns with two other columns" do not work here, since match() runs out of memory.
As a working example, consider:
demandDates = c(1,1,1,5,6,6,8,8)
demandCustIds = c(1,1,1,2,3,3,1,1)
demandProdIds = c(1,2,3,4,1,5,2,6)
demandSet = data.frame(demandDates,demandCustIds,demandProdIds)
returnDates = c(1,1,4,4,4)
returnCustIds = c(4,4,1,1,1)
returnProdIds = c(5,7,1,2,3)
returnSet = data.frame(returnDates,returnCustIds,returnProdIds)
(This doesn't actually work completely correctly, since transaction 7 is incorrectly matched with return 4, but for the sake of the question let's assume this is what I want; I can fix it later.)
library(data.table)
DD <- data.table(demandSet, key = "demandCustIds,demandProdIds")
DR <- data.table(returnSet, key = "returnCustIds,returnProdIds")
DD[DR, mult = "first"]
demandCustIds demandProdIds demandDates returnDates
1: 1 1 1 4
2: 1 2 1 4
3: 1 3 1 4
4: 4 5 NA 1
5: 4 7 NA 1
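For comparison, here is a base-R sketch (not from the thread) that computes the same thing as the original loop without calling which() once per demand row: sort the returns by date, build one string key per (customer, product) pair, and let a single match() call pick the first, i.e. earliest, matching return. Like the loop, it lets several demands share one return, so transaction 7 still gets return 4. Its speed and memory use on 4 million rows are not benchmarked here.

```r
demandSet <- data.frame(demandDates   = c(1, 1, 1, 5, 6, 6, 8, 8),
                        demandCustIds = c(1, 1, 1, 2, 3, 3, 1, 1),
                        demandProdIds = c(1, 2, 3, 4, 1, 5, 2, 6))
returnSet <- data.frame(returnDates   = c(1, 1, 4, 4, 4),
                        returnCustIds = c(4, 4, 1, 1, 1),
                        returnProdIds = c(5, 7, 1, 2, 3))

# Sort returns by date; order() keeps ties in their original order,
# mirroring which.min(), which also returns the first minimum
ord <- order(returnSet$returnDates)

# One string key per (customer, product) pair; the id columns should have
# the same type in both tables so that they paste to identical strings
demandKey <- paste(demandSet$demandCustIds, demandSet$demandProdIds, sep = "_")
returnKey <- paste(returnSet$returnCustIds, returnSet$returnProdIds, sep = "_")[ord]

# match() returns the position of the first hit, i.e. the earliest return;
# map it back to a row of returnSet, using 0 for "no return found"
hit <- match(demandKey, returnKey)
matchedNumber <- ifelse(is.na(hit), 0L, ord[hit])
matchedNumber
```

On the working example this gives 3 4 5 0 0 0 4 0, the same vector the loop produces.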

Using two datasets in a single report using SQL server reporting service

I need to show a report of the same set of data under different conditions.
I need to show the count of registered users grouped by region, country and userType; I have used the drill-down feature for this and it works fine. The reported data is the count of users registered between two dates. Along with that, I have to show the total number of users in the system, using the same drill-down (total users by region, country and userType), in a separate column next to each count (the count of users between the two dates),
so that my result will be as follows. Initially it will look like:
Region - Country - New Reg - Total Reg - User Type 1 - UserType2
+ Region1 2 10 1 5 1 5
+ Region2 3 7 2 4 1 3
and upon expanding a region it will look like:
Region - Country - New Reg - Total Reg - User Type 1 - UserType2
+ Region1 2 10 1 5 1 5
country1 1 2 1 2 - -
country2 1 8 1 8 - -
+ Region2 3 7 2 4 1 3
Is there a way I can show my report like this? I have tried two datasets, one with the conditional data and the other with the unconditional data, but it didn't work: it always shows the total number of registered users in all of the Total Reg columns.
Unless I'm mistaken, you're trying to create an expandable table with different grouping levels? Fortunately, this is easy to do in SSRS if you know where to look. The totals in your example don't seem to add up in the user columns, so I may have misunderstood the problem.
For starters, set up your query to produce a single dataset like this:
Region Country New Reg - Total Reg - User Type 1 - User Type 2
Region1 country1 1 2 1
Region1 country2 1 8 1
Region2 country3 2 4 1 1
Region2 country4 1 3 1
Now that you have that, set up a new table with the fields "NewReg", "TotalReg", "UserType1" and "UserType2". Then right-click the table row and go to "Add Group > Row Group > Parent Group". Select "Country" in Group by and click OK. Then repeat this process and select "Region". This time, however, tick the "Add group header" box. This inserts another row above the original.
Now, for each of your fields ("NewReg", "TotalReg", etc.), click in the new row above and select the field again. This automatically adds a Sum(FieldName) value to the cell, which adds together all the individual row totals and presents a new row grouped by region when you run the report.
That should give you the table you require, with the data aggregated correctly, so all that's left is to show or hide the detail rows on demand.
To do this, select your detail row (the original row), right-click it and choose "Row Visibility". Set this to "Hide". Now select the cell that contains the "Region" and note its name in Properties (for now, let's assume it's called "Region"). Click back onto your detail row and look at the Properties window. At the bottom you'll see a "Visibility" setting. There, set "InitialToggleState" to False and "ToggleItem" to the name of your region group's cell (i.e. "Region").
Now all that should be left is the formatting and tidying up.
I solved this problem by retrieving all the records from the DB and filtering them to get the new-registration count with the following expression:
=Sum(IIF(Fields!RegisteredOn.Value >Parameters!FromDate.Value and Fields!RegisteredOn.Value < Parameters!EndDate.Value , 1,0))
