How can I draw 'at least three' in an ER diagram (ERD)?

How can I draw the ER diagram with 'at least three students' and 'A librarian may be a summer trainee'? I am very confused by this kind of question: I don't know whether it is one-to-many or one-to-something-else. Thanks a lot!
Statement:
A student works in a library, and a library employs at least three students. A student may be a summer trainee.

Related

Adding custom properties to RDF statements [duplicate]

I know that I can represent any relation as an RDF triple, as in:
Barack Obama -> president of -> USA
(I am aware that this is not RDF, I am just illustrating)
But how do I add additional information about this relation, for example the time dimension? I mean, he is in his second presidential term, and each term only lasts for a certain span of time. And how about the time before and after his presidential terms?
There are several options to do this. I'll illustrate some of the more popular ones.
Named Graphs / Quads
In RDF, named graphs are subsets of an RDF dataset that are assigned a specific identifier (the "graph name"). In most RDF databases, this is implemented by adding a fourth element to the RDF triple, turning it from a triple into a "quad" (sometimes it's also called the 'context' of the triple).
You can use this mechanism to express information about a certain collection of statements. For example (using pseudo N-Quads syntax for RDF):
:i1 a :TimePeriod .
:i1 :begin "2009-01-20T00:00:00Z"^^xsd:dateTime .
:i1 :end "2017-01-20T00:00:00Z"^^xsd:dateTime .
:barackObama :presidentOf :USA :i1 .
Notice the fourth element in the last statement: it links the statement "Barack Obama is president of the USA" to the named graph identified by :i1.
The named graphs approach is particularly useful in situations where you have data to express about several statements at once. It is of course also possible to use it for data about individual statements (as the above example illustrates), though it may quickly become cumbersome if used in that fashion (every distinct time period will need its own named graph).
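Not part of the original answer, but to make the mechanism concrete: a rough sketch of how such quads could be loaded and queried from Python with rdflib, here written in TriG syntax (which rdflib can parse) and reusing the prefixes and graph name from the example above:

from rdflib import Dataset

data = """
@prefix : <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:i1 { :barackObama :presidentOf :USA . }

:i1 a :TimePeriod ;
    :begin "2009-01-20T00:00:00Z"^^xsd:dateTime ;
    :end   "2017-01-20T00:00:00Z"^^xsd:dateTime .
"""

ds = Dataset()
ds.parse(data=data, format="trig")

# Find the named graph that holds the statement, then read the period it describes.
q = """
PREFIX : <http://example.org/>
SELECT ?period ?begin ?end WHERE {
    GRAPH ?period { :barackObama :presidentOf :USA }
    ?period :begin ?begin ;
            :end ?end .
}
"""
for row in ds.query(q):
    print(row.period, row.begin, row.end)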
Representing the relation as an object
An alternative approach is to model the relation itself as an object. The relation between "Barack Obama" and "USA" is not just that one is the president of the other, but that one is president of the other between certain dates. To express this in RDF (as Joshua Taylor also illustrated in his comment):
:barackObama :hasRole :president_44 .
:president_44 a :Presidency ;
:of :USA ;
:begin "2009-01-20T00:00:00Z"^^xsd:dateTime ;
:end "2017-01-20T00:00:00Z"^^xsd:dateTime .
The relation itself has now become an object (an instance of the "Presidency" class, with identifier :president_44).
Compared to using named graphs, this approach is much more tailored to asserting data about individual statements. A possible downside is that it becomes a bit more complex to query the relation in SPARQL.
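To illustrate that last point, here is a minimal sketch (my own, using Python's rdflib, with made-up example.org prefixes) of what such a SPARQL query could look like against the data above; the extra hop through the role object is the added complexity:

from rdflib import Graph

data = """
@prefix : <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:barackObama :hasRole :president_44 .
:president_44 a :Presidency ;
    :of :USA ;
    :begin "2009-01-20T00:00:00Z"^^xsd:dateTime ;
    :end   "2017-01-20T00:00:00Z"^^xsd:dateTime .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Two steps instead of one triple pattern: first find the role, then its properties.
q = """
PREFIX : <http://example.org/>
SELECT ?country ?begin ?end WHERE {
    :barackObama :hasRole ?role .
    ?role a :Presidency ;
          :of ?country ;
          :begin ?begin ;
          :end ?end .
}
"""
for row in g.query(q):
    print(row.country, row.begin, row.end)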
RDF Reification
Not sure this approach actually still counts as "popular", but RDF reification is the historically W3C-sanctioned approach to asserting "statements about statements". In this approach we turn the statement itself into an object:
:obamaPresidency a rdf:Statement ;
rdf:subject :barackObama ;
rdf:predicate :presidentOf ;
rdf:object :USA ;
:trueBetween [
:begin "2009-01-20T00:00:00Z"^^xsd:dateTime ;
:end "2017-01-20T00:00:00Z"^^xsd:dateTime .
] .
There are several good reasons not to use RDF reification in this case, however:
It is conceptually a bit strange. The knowledge that we want to express is about the temporal aspect of the relation, but using RDF reification we are saying something about the statement instead.
What we have expressed in the above example is: "the statement about Barack Obama being president of the USA is valid between ... and ...". Note that we have not expressed that Barack Obama actually is the president of the USA! You could of course still assert that separately (by just adding the original triple as well as the reified one), but this creates a further duplication/maintenance problem.
It is a pain to use in SPARQL queries.
As Joshua also indicated in his comment, the W3C Note on defining N-ary RDF relations is useful to look at, as it goes into more depth about these (and other) approaches.
RDF*, or RDF-star, allows expressing additional information about an RDF triple, using nested structures such as:
<< :BarackObama :presidentOf :USA >> :since :2009
It is a bit of a contrived example (since the term of a presidency could be expressed by simply normalizing the data structure), but it's quite useful for expressing “external” concerns (such as probability or provenance).
See Olaf’s blog post and, for technical details, https://www.w3.org/2021/12/rdf-star.html.
I’m pretty sure Apache Jena supports it already, not quite sure about other products like Neo4j.

customer segmentation in retail [closed]

I have a large sales database from a 'home and construction' retailer.
I need to know which of its customers are the electricians, plumbers, painters, etc.
My first approach was to select the articles related to each specialty (wire [article] is related to electricians [specialty], for example) and then, based on customer sales, work out who those customers are.
But this is a lot of work.
My second approach is to do a cluster segmentation first, and then discover which clusters correspond to which specialties (this is a lot better, because I would also be able to discover new segments).
But how can I do that? What type of clustering should I use: k-means, fuzzy? Which variables should I feed into the model? Should I use PCA to decide how many clusters to look for?
The header of my data (simplified):
customer_id | transaction_id | transaction_date | item_article_id | item_group_id | item_category_id | item_qty | sales_amt
Any help would be appreciated
(Sorry for my English.)
You want to identify classes of customers based on what they buy (I presume this is for marketing reasons). This calls for a clustering approach. I will talk you through the entire setup.
The clustering space
Let us first consider what exactly you are clustering: either orders or customers. In either case, the way you characterize the items and the distances between them is the same. I will discuss the basic case for orders first, and then explain the considerations that apply to clustering by customers instead.
For your purpose, an order is characterized by what articles were purchased, and possibly also how many of them. In terms of a space, this means that you have a dimension for each type of article (item_article_id), for example the "wire" dimension. If all you care about is whether an article is bought or not, each item has a coordinate of either 0 or 1 in each dimension. If some order includes wire but not pipe, then it has a value of 1 on the "wire" dimension and 0 on the "pipe" dimension.
However, there is something to say for caring about the quantities. Perhaps plumbers buy lots of glue while electricians buy only small amounts. In that case, you can set the coordinate in each dimension to the quantity of the corresponding article (presumably item_qty). So suppose you have three articles, wire, pipe and glue, then an order described by the vector (2, 3, 0) includes 2 wire, 3 pipe and 0 glue, while an order described by the vector (0, 1, 4) includes 0 wire, 1 pipe and 4 glue.
If there is a large spread in the quantities for a given article, i.e. if some orders include orders of magnitude more of some article than other orders do, then it may be helpful to work with a log scale. Suppose you have these four orders:
2 wire, 2 pipe, 1 glue
3 wire, 2 pipe, 0 glue
0 wire, 100 pipe, 1 glue
0 wire, 300 pipe, 3 glue
The former two orders look like they may belong to electricians while the latter two look like they belong to plumbers. However, if you work with a linear scale, order 3 will turn out to be closer to orders 1 and 2 than to order 4. We fix that by using a log scale for the vectors that encode these orders (I use the base 10 logarithm here, but it does not matter which base you take because they differ only by a constant factor):
(0.30, 0.30, 0)
(0.48, 0.30, -2)
(-2, 2, 0)
(-2, 2.48, 0.48)
Now order 3 is closest to order 4, as we would expect. Note that I have used -2 as a special value to indicate the absence of an article, because the logarithm of 0 is not defined (log(x) tends to negative infinity as x tends to 0). -2 means that we pretend that the order included 1/100th of the article; you could make the special value more or less extreme, depending on how much weight you want to give to the fact that an article was not included.
The input to your clustering algorithm (regardless of which algorithm you take, see below) will be a position matrix with one row for each item (order or customer), one column for each dimension (article), and either the presence (0/1), amount, or logarithm of the amount in each cell, depending on which you choose based on the discussion above. If you cluster by customers, you can simply sum the amounts from all orders that belong to that customer before you calculate what goes into each cell of your position matrix (if you use the log scale, sum the amounts before taking the logarithm).
Clustering by orders rather than by customers gives you more detail, but also more noise. Customers may be consistent within an order but not between them; perhaps a customer sometimes behaves like a plumber and sometimes like an electrician. This is a pattern that you will only find if you cluster by orders. You will then find how often each customer belongs to each cluster; perhaps 70% of somebody's orders belong to the electrician type and 30% belong to the plumber type. On the other hand, a plumber may only buy pipe in one order and then only buy glue in the next order. Only if you cluster by customers and sum the amounts across their orders do you get a balanced view of what each customer needs on average.
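Not part of the original answer: if you happen to work in Python rather than R, a rough sketch of building such a position matrix with pandas could look like this, assuming the table from the question is loaded in a DataFrame called sales:

import numpy as np
import pandas as pd

# sales = pd.read_csv("sales.csv")  # columns as in the question header

# One row per customer, one column per article, with summed quantities.
# Use index="transaction_id" instead if you prefer to cluster by individual orders.
qty = sales.pivot_table(index="customer_id",
                        columns="item_article_id",
                        values="item_qty",
                        aggfunc="sum",
                        fill_value=0)

# Optional log scale; zero quantities become log10(0.01) = -2, the sentinel value
# discussed above ("pretend the order included 1/100th of the article").
position_matrix = np.log10(qty.where(qty > 0, 0.01))

Where the rest of this answer says my.matrix, read position_matrix if you follow this Python sketch.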
From here on I will refer to your position matrix by the name my.matrix.
The clustering algorithm
If you want to be able to discover new customer types, you probably want to let the data speak for themselves as much as possible. A good old-fashioned hierarchical clustering with complete linkage (CLINK) may be an appropriate choice in this case. In R, you simply do hclust(dist(my.matrix)) (this will use the Euclidean distance measure, which is probably good enough in your case). It will join closely neighbouring items or clusters together until all items are categorized in a hierarchical tree. You can treat any branch of the tree as a cluster, observe typical article amounts for that branch, and decide whether that branch represents a customer segment by itself, should be split into sub-branches, or joined with a sibling branch instead. The advantage is that you find the "full story" of which items and clusters of items are most similar to each other and by how much. The disadvantage is that the outcome of the algorithm does not tell you where to draw the borders between your customer segments; you can cut up the clustering tree in many ways, so it's up to your interpretation how you want to identify your customer types.
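Again not from the original answer: a rough Python equivalent of hclust(dist(my.matrix)) with complete linkage, using scipy (position_matrix is the matrix from the sketch above):

from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist

# Complete-linkage clustering on Euclidean distances, like hclust(dist(...)) in R.
Z = linkage(pdist(position_matrix), method="complete")

dendrogram(Z)                                     # inspect the tree to decide where to cut
labels = fcluster(Z, t=5, criterion="maxclust")   # e.g. cut the tree into 5 clusters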
On the other hand, if you are comfortable fixing the number of clusters (k) beforehand, k-means is a very robust way to get a workable segmentation of your customers into k distinct types. In R, you would do kmeans(my.matrix, k). For marketing purposes, it may be sufficient to have (say) 5 different customer profiles that you create custom advertisements for, rather than treating all customers the same. With k-means you don't explore all of the diversity that is present in your data, but you might not need to do so anyway.
If you don't want to fix the number of clusters beforehand, but you also don't want to manually decide where to draw the borders between the segments afterwards, there is a third possibility. You start with the k-means algorithm, letting it generate a number of cluster centers that is much larger than the number of clusters you hope to end up with (for example, if you hope to end up with about 10 clusters, let the k-means algorithm look for 200). Then use the mean shift algorithm to further cluster the resulting centers. You will end up with a smaller number of compact clusters. The approach is explained in more detail by James Li over here. You can use the mean shift algorithm in R with the ms function from the LPCM package; see this documentation.
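Not from the original answer, and using scikit-learn's MeanShift as a stand-in for the R ms function: a rough sketch of that two-step approach in Python:

from sklearn.cluster import KMeans, MeanShift

# Step 1: deliberately over-segment with far more centres than you expect to need
# (assumes you have well over 200 customers).
km = KMeans(n_clusters=200, n_init=10, random_state=0).fit(position_matrix)

# Step 2: merge the centres with mean shift into a small number of compact clusters.
ms = MeanShift().fit(km.cluster_centers_)
print(len(ms.cluster_centers_), "final clusters")

# Assign each customer to the mean-shift cluster of its nearest k-means centre.
labels = ms.labels_[km.labels_]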
About using PCA
PCA will not tell you how many clusters you need. PCA answers a different question: which variables seem to represent a common underlying (hidden) factor. In a sense, it is a way to cluster variables, i.e. properties of entities, not to cluster the entities themselves. The number of principal components (common underlying factors) is not indicative of the number of clusters needed. PCA can still be interesting if you want to learn something about the predictive value of each article about a customer's interests.
Sources
Michael J. Crawley, 2005. Statistics. An Introduction using R.
Gerry P. Quinn and Michael J. Keough, 2002. Experimental Design and Data Analysis for Biologists.
Wikipedia: hierarchical clustering, k-means, mean shift, PCA

Recursive hypothesis-building with ambiguities - what's it called?

There's a problem I've encountered a lot (in the broad fields of data analysis and AI). However, I can't name it, probably because I don't have a formal CS background. Please bear with me; I'll give two examples:
Imagine natural language parsing:
The flower eats the cow.
You have a program that takes each word, and determines its type and the relations between them. There are two ways to interpret this sentence:
1) flower (substantive) -- eats (verb) --> cow (object)
using the usual SVO word order, or
2) cow (substantive) -- eats (verb) --> flower (object)
using a more poetic word order. The program would rule out other possibilities, e.g. "flower" as a verb, since it follows "the". It would then rank the remaining possibilities: 1) has a more natural word order than 2), so it gets more points. But factoring in the world knowledge that flowers can't eat cows, 2) still wins. So it might return both hypotheses, giving 1) a score of 30 and 2) a score of 70.
Then, it remembers both hypotheses and continues parsing the text, branching off. One branch assumes 1), one 2). If a branch reaches a contradiction, or a ranking of ~0, it is discarded. In the end it presents ranked hypotheses again, but for the whole text.
For a different example, imagine optical character recognition:
** **
** ** *****
** *******
******* **
* ** **
** **
I could look at the strokes and say, sure this is an "H". After identifying the H, I notice there are smudges around it, and give it a slightly poorer score.
Alternatively, I could run my smudge recognition first, and notice that the horizontal line looks like an artifact. After removal, I recognize that this is ll or Il, and give it some ranking.
After processing the whole image, it can be Hlumination, lllumination or Illumination. Using a dictionary and the total ranking, I decide that it's the last one.
The general problem is always some kind of parsing / understanding. Examples:
Natural languages or ambiguous languages
OCR
Path finding
Dealing with ambiguous or incomplete user input - which interpretations make sense, and which is the most plausible?
It's recursive.
It can bail out early (when a branch / interpretation doesn't make sense, or will certainly end up with a score of 0). So it's probably some kind of backtracking.
It keeps all options in mind in light of ambiguities.
It's based on simple rules at the bottom, e.g. can_eat(cow, flower) = true.
It keeps a plausibility ranking of interpretations.
It's recursive on a meta level: It can fork / branch off into different 'worlds' where it assumes different hypotheses when dealing with the next part of data.
It'll forward the individual rankings, probably using Bayesian probability, to dependent hypotheses.
In practice, there will be methods to train this thing, determine ranking coefficients, and there will be cutoffs if the tree becomes too big.
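For illustration only (this is not from the question or any particular library), a toy Python sketch of the branch-and-prune scoring described above, with made-up scores and a fixed beam width:

# Each hypothesis is a (list of interpretations, score) pair. Scores multiply as
# the tree deepens; implausible branches are pruned early (a crude beam search).
def expand(hypotheses, alternatives, threshold=0.05, beam=10):
    new_hypotheses = []
    for interpretation, score in hypotheses:
        for alt, alt_score in alternatives:
            combined = score * alt_score
            if combined >= threshold:           # bail out of hopeless branches early
                new_hypotheses.append((interpretation + [alt], combined))
    # keep only the most plausible branches alive
    return sorted(new_hypotheses, key=lambda h: h[1], reverse=True)[:beam]

# Toy run: two readings of "The flower eats the cow", then the next chunk of input.
hypotheses = [(["flower eats cow"], 0.3), (["cow eats flower"], 0.7)]
hypotheses = expand(hypotheses, [("reading A", 0.9), ("reading B", 0.1)])
print(hypotheses)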
I have no clue what this is called. One might guess 'decision tree' or 'recursive descent', but I know those terms mean different things.
I know Prolog can solve simple cases of this, like genealogies and finding out who is whose uncle. But you have to give it all the data in code, and it doesn't seem convenient or powerful enough for my real-life cases.
I'd like to know: what is this problem called, and are there common strategies for dealing with it? Is there good literature on the topic? Are there libraries, ideally for C(++) or Python, where you can just define a bunch of rules and it works out all the rankings and hypotheses?
I don't think there is one answer that fits all the bullet points you have. But I hope my links will lead you closer to an answer or might give you a different question.
I think the closest answer is a Bayesian network, since you have probabilities affecting each other. As I understand it, it is also related to conditional probability and fuzzy logic.
You also describe a bit of genetic programming, as well as artificial neural networks.
I can name drop some more topics which might be related:
http://en.wikipedia.org/wiki/Rule-based_programming
http://en.wikipedia.org/wiki/Expert_system
http://en.wikipedia.org/wiki/Knowledge_engineering
http://en.wikipedia.org/wiki/Fuzzy_system
http://en.wikipedia.org/wiki/Bayesian_inference

FastICA and Blind Signal Separation Mathematics Question

I'm trying to make my own implementation of the FastICA algorithm based on the paper here: http://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf.
I need some help with the math, though.
In the middle of page 14 there is an equation that looks somewhat like
w+ = E{x g(w^T x)} - E{g'(w^T x)} w
What does the E mean? From my probability days I recall that it is the "expected value" of a random variable, but it doesn't make sense to me what the expected value of a vector would be.
Thanks,
mj
ICA is interesting stuff. I used it some in my graduate research, but I didn't dig in too much under the hood; I just downloaded the FastICA implementation for MatLab and used that.
Anyway, you are correct that E{...} denotes expected value. The elements of the vector x represent the individual signals. Strictly speaking, x is a time series and should be written x(t), but the convention in ICA is to treat x instead as a random variable. In that context, of course, the idea of expected value makes sense. For example E{x} would just be the mean value of x (taken to be zero in ICA as the signals have been centered).
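To make that concrete, here is a rough Python/numpy sketch (my own, not from the paper's authors) of the one-unit update you quoted, where each E{...} is estimated as an average over the samples of x:

import numpy as np

# X: (n_signals, n_samples) matrix of centered (and ideally whitened) observations.
# w: current weight vector of length n_signals.
def fastica_update(w, X):
    wx = w @ X                    # the projections w^T x(t), one per sample
    g = np.tanh(wx)               # a common choice for the nonlinearity g
    g_prime = 1.0 - g ** 2        # its derivative g'
    # E{x g(w^T x)} and E{g'(w^T x)} become averages over the samples:
    w_new = (X * g).mean(axis=1) - g_prime.mean() * w
    return w_new / np.linalg.norm(w_new)   # FastICA renormalizes w after each step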
The authors of the paper you linked also have a book on ICA. It's outrageously expensive on Amazon, but if you can find a copy at, say, a nearby university library, it might be worth a look. It's been several years, but I remember it as being as gentle an introduction as one could hope for given the mathematics.

What is a "graph carving"?

I've faved a question here, and the most promising answer to date implies "graph carvings". The problem is, I have no clue what they are (neither does the OP, apparently), and they sound very promising and interesting for several uses. My Google-fu failed me on this topic, as I found no useful/free resource talking about them.
Can someone please tell me what a 'graph carving' is, how I can make one for a graph, and how I can determine what makes one carving better suited to a task than another?
Please don't go too mathematical on me (or be ready to answer more questions): I understand what a graph, a node and a vertex are, and I manage with big-O notation, but I have no real maths background.
I think the answer given in the linked question is a little loose with terminology. I think it is describing a tree carving of a graph G. This is still not particularly google-friendly, I admit, but perhaps it will get you going on your way. The main application of this structure appears to be in one particular DFS algorithm, described in these two papers.
A possibly more clear description of the same algorithm may appear in this book.
I'm not sure stepping through this algorithm would be particularly helpful. It is a reasonably complex algorithm and the explanation would probably just parrot those given in the papers I linked. I can't claim to understand it very well myself. Perhaps the most fruitful approach would be to look at the common elements of those three links, and post specific questions about parts you don't understand.
Q1:
what is a 'graph carving'
There are two types of graph carving: Tree-Carving and Carving.
A tree-carving of a graph is a partition of the vertex set V into subsets V1,V2,...,Vk with the following properties. Each subset constitutes a node of a tree T. For every vertex v in Vj, all the neighbors of v in G belong either to Vj itself, or to Vi where Vi is adjacent to Vj in the tree T.
A carving of a graph is a partitioning of the vertex set V into a collection of subsets V1,V2,...Vk with the following properties. Each subset constitutes a node of a rooted tree T. Each non-leaf node Vj of T has a special vertex denoted by g(Vj) that belongs to p(Vj). For every vertex v in Vi, all the neighbors of v that are in ancestor sets of Vi belong to either
Vi or
Vj, where Vj is the parent of Vi in the tree T, or
Vl, where Vl is the grandparent in the tree T. In this case, however, the neighbor of v can only be g(p(Vi)).
These definitions are taken from chapter 6 of the book "Approximation Algorithms for NP-Hard Problems" and from paper1. (paper1 is taken from Gain's answer; thanks, Gain.)
As I understand it, a tree-carving or carving is a kind of representation (or simplification) of an original graph G: the resulting structure still preserves the 'connection properties' of G, but has a much smaller size (fewer vertices, fewer nodes). Both methods try, in some sense, to discard 'local', 'similar' information while keeping the 'structural', 'vital' information, by merging some 'close' vertices into one vertex and deleting some edges.
Tree-carving seems a little simpler and easier to understand, since in a carving, edges are also allowed to go to a single vertex in the grandparent node, which preserves more information.
Q2:
how I can make one for a graph
I only know how to get a tree-carving.
You can refer to the algorithm in paper1.
It is a depth-first-search based algorithm.
Do a DFS, and before returning from an iteration, check whether the current edge is a 'bridge' edge or not. If it is, you need to remove the 'bridge' and add some 'back edges'.
You end up with a DFS partition that yields a tree-carving of G.
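For the bridge-detection step only (this is the standard DFS low-link method, not the full tree-carving construction from paper1), a Python sketch might look like this:

# Classic bridge finding (Tarjan): an edge (u, v) is a bridge if no back edge
# from v's DFS subtree reaches u or an ancestor of u.
def find_bridges(adj):                      # adj: dict mapping vertex -> list of neighbours
    disc, low, bridges = {}, {}, []
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:                   # back edge: update the low-link of u
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:        # nothing in v's subtree reaches above u
                    bridges.append((u, v))

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return bridges

# Toy usage: a triangle 2-3-4 hanging off vertex 1 by a single edge.
print(find_bridges({1: [2], 2: [1, 3, 4], 3: [2, 4], 4: [2, 3]}))   # -> [(1, 2)]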
Q3:
how I can determine what makes a certain carving better suited for a task than another?
Sorry, I don't know; I am also new to graph theory.
If you have more questions:
What is the g function in g(Vj)?
It denotes a special vertex called the gray node; see paper1.
What is the p function in p(Vj)?
I am not sure; p probably stands for 'parent'. See paper1.
What is a back edge of node t?
An edge (u, v) such that u is a descendant of t and v is an ancestor of t; see paper1.
What is a bridge?
See the Wikipedia article on bridges (graph theory).
