I have just started programming in R, Neo4j, and RNeo4j, so please bear with me if my question is trivial.
I have created the following database (see the attached picture [1]) using RNeo4j and the R code in [2].
The database contains the outcome of computer game matches between four players. The dataset consists of four nodes, player 1 to player 4. The nodes are connected via the relationship "defeats", which indicates the outcome of the matches. Each relationship carries two properties: judge and game.
Using Cypher queries, I want to extract data from the graph database in the following form (see the picture in [1]):
Winning player  Losing player  Game       Judge
player 1        player 4       Starcraft  player 2
player 1        player 4       LOL        player 3
player 4        player 1       LOL        player 2
player 1        player 4       Starcraft  player 3
player 1        player 2       LOL        player 3
player 2        player 1       LOL        player 4
player 4        player 1       Starcraft  player 4
I want to run a query against the graph database (preferably from within RNeo4j) where the input is "player 1" and the table above is returned.
I hope that my question is clear and someone can help me with this.
Have a good day.
Christian
[1] https://goo.gl/cMxXHo
[2] The R (Rneo4j) code:
clear(graph)
Y
player1 = createNode(graph,"user",ID="Player 1",male=T)
player2 = createNode(graph,"user",ID="Player 2",male=T)
player3 = createNode(graph,"user",ID="Player 3",male=F)
player4 = createNode(graph,"user",ID="Player 4",male=F)
addConstraint(graph,"user","ID")
rel1 = createRel(player1,"defeats",player4)
rel2 = createRel(player1,"defeats",player4)
rel3 = createRel(player4,"defeats",player1)
rel4 = createRel(player1,"defeats",player4)
rel5 = createRel(player1,"defeats",player2)
rel6 = createRel(player2,"defeats",player1)
rel7 = createRel(player3,"defeats",player1)
rel1 = updateProp(rel1, game = "Starcraft", judge = "Player 2")
rel2 = updateProp(rel2, game = "League of Legends", judge = "Player 3")
rel3 = updateProp(rel3, game = "League of Legends", judge = "Player 2")
rel4 = updateProp(rel4, game = "Starcraft", judge = "Player 3")
rel5 = updateProp(rel5, game = "League of Legends", judge = "Player 3")
rel6 = updateProp(rel6, game = "League of Legends", judge = "Player 4")
rel7 = updateProp(rel7, game = "Starcraft", judge = "Player 4")
A couple of things. If you want to use clear(graph) without having to type "Y", you can use clear(graph, input=F). Also, in case you weren't aware, you can set properties on relationships when you create them:
rel1 = createRel(player1, "defeats", player4, game="Starcraft", judge="Player 2")
To answer the question, I'd do this:
getDataForPlayer = function(name) {
  query = "
  MATCH (winner:user)-[game:defeats]->(loser:user)
  WHERE winner.ID = {name} OR loser.ID = {name}
  RETURN winner.ID AS `Winning Player`,
         loser.ID AS `Losing Player`,
         game.game AS Game,
         game.judge AS Judge
  "
  return(cypher(graph, query, name=name))
}
getDataForPlayer("Player 1")
Output:
Winning Player Losing Player Game Judge
1 Player 4 Player 1 League of Legends Player 2
2 Player 2 Player 1 League of Legends Player 4
3 Player 3 Player 1 Starcraft Player 4
4 Player 1 Player 2 League of Legends Player 3
5 Player 1 Player 4 Starcraft Player 2
6 Player 1 Player 4 League of Legends Player 3
7 Player 1 Player 4 Starcraft Player 3
Looking at your graph, it strikes me as not having the right structure. Every scenario is different, but it is always good to consider what happens when you add MUCH more data. Can your model handle it?
For example, you are using relationships to represent results of games, which then of course requires attributes to store the judge and the games. Game names actually look like tournament games to me, but you'll know what works better. When storing the player and tournament names, you end up having a lot of repetition because the same names and players appear everywhere.
If you continue to add results between players you will end up with many relationships and the possibilities for error and repetition keep growing.
What can you do in order to improve your model then? Think of your basic relationship as a starting point but now it has outgrown the original requirement: you can introduce nodes for tournaments and nodes for games; keep relationships for storing the roles of the players within a game and so on. There is always more than one way to do it (TIMTOWTDI).
Given that a picture is worth a thousand words, look at the improved model here:
You see how it is also easier to add additional properties to the corresponding nodes or relationships in the model.
In order to produce your desired table with results, you then can use:
MATCH
  (g:Game)-[:WINNER]->(w:Player),
  (g)-[:LOSER]->(l:Player),
  (g)-[:JUDGE]->(j:Player),
  (g)<-[:HAS_GAMES]-(t:Tournament)
WHERE
  w.name = 'Player 1' OR l.name = 'Player 1'
RETURN
  w.name AS `Winning Player`,
  l.name AS `Losing Player`,
  t.name AS Game,
  j.name AS Judge
and adapt it for R as suggested by Nicole. If you intend to add lots of data, I think this structure will adapt better to your needs. You can also explore different ways of querying for the same data, since you can now start from the tournaments or explore games directly.
I have the following problem regarding permutations and combinations.
I know one solution, which I give here. I also have another approach to the same problem, but it does not give me the same answer as the first one. Can someone tell me where I am making a mistake?
Problem: From a group of 7 men and 6 women, five persons are to be selected to form a committee so that at least 3 men are there in the committee. In how many ways can it be done?
First Answer:
We can select 5 men ...(option 1)
Number of ways to do this = 7C5
We can select 4 men and 1 woman ...(option 2)
Number of ways to do this = 7C4 × 6C1
We can select 3 men and 2 women ...(option 3)
Number of ways to do this = 7C3 × 6C2
Total number of ways = 7C5 + (7C4 × 6C1) + (7C3 × 6C2)
= 756.
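As a quick check of the total above, here is a minimal Python sketch that evaluates the three cases and also counts the committees by brute force. The labels M1..M7 and W1..W6 are illustrative names, not part of the problem statement.

```python
from itertools import combinations
from math import comb

# Illustrative labels for the 7 men and 6 women
men = [f"M{i}" for i in range(1, 8)]
women = [f"W{i}" for i in range(1, 7)]

# Case analysis: 5 men, or 4 men + 1 woman, or 3 men + 2 women
by_cases = comb(7, 5) + comb(7, 4) * comb(6, 1) + comb(7, 3) * comb(6, 2)

# Brute force: enumerate every 5-person committee and keep those with >= 3 men
brute_force = sum(
    1
    for committee in combinations(men + women, 5)
    if sum(person.startswith("M") for person in committee) >= 3
)

print(by_cases, brute_force)  # prints: 756 756
```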
Below is my new approach, in which I am making a mistake that I cannot identify.
At least 3 men should be there, so the number of ways to choose 3 men out of 7 = 7C3
= 35.
Now 2 more people have to be selected from the remaining 4 men and 6 women. The number of ways this can be done = 10C2 = 45.
Therefore, total number of ways = 35*45 = 1575.
Can someone tell me what I am missing in the second approach?
Your approach counts some committees more than once.
Suppose that from the 7 men you choose
M1, M2, M3
and from the remaining 10 people you choose a man M4 and a woman W1.
Now suppose you instead choose M1, M2, M4 from the 7 men
and from the remaining 10 you choose M3 and W1.
Both selections represent the same committee {M1, M2, M3, M4, W1} and should be counted only once, but you are counting them as two different ways. That is why your answer is greater than the expected answer.
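The double counting can be made concrete with a small Python sketch. It replays the two-step selection (3 men first, then 2 of the remaining 10 people) and compares the number of selections with the number of distinct committees they produce. The labels M1..M7 and W1..W6 are illustrative.

```python
from collections import Counter
from itertools import combinations

men = [f"M{i}" for i in range(1, 8)]
women = [f"W{i}" for i in range(1, 7)]
people = men + women

# Two-step selection: first 3 men, then any 2 of the remaining 10 people.
# Each result is stored as a frozenset so that order of selection is ignored.
selections = []
for first in combinations(men, 3):
    rest = [p for p in people if p not in first]
    for second in combinations(rest, 2):
        selections.append(frozenset(first + second))

times_counted = Counter(selections)
two_step_total = len(selections)         # every (first, second) pair
distinct_committees = len(times_counted) # committees that are actually different

example = frozenset({"M1", "M2", "M3", "M4", "W1"})
print(two_step_total, distinct_committees, times_counted[example])
# prints: 1575 756 4
```

A committee with 4 men is reached once for each choice of which 3 of its men were picked first, which is why the example committee appears 4 times.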
Sorry I do not know how to properly title my question. It is easier to understand with an example.
Sample data
Consider the following example.
> l_ids=as.data.frame(cbind(a=c("strong","intense","intensity"),
id=c("1","2","3"),new_id=c("","1","2")),stringsAsFactors = FALSE)
a id new_id
1 strong 1
2 intense 2 1
3 intensity 3 2
I would like to update the id of each word in a with a new_id, if it applies. Consider this as a synonym dictionary. As I iterate over new_id;
> for (i in 1:nrow(l_ids)){
+ if (nchar(l_ids$new_id[i])>0){
+ l_ids$id[i]=l_ids$new_id[i]
+ }
+ }
> l_ids
a id new_id
1 strong 1
2 intense 1 1
3 intensity 2 2
The problem is that I would like for intensity to also be given a 1. Is there a way to do this without having to iterate multiple times?
Update on background
I have a document where I have a list of synonyms. These are synonyms only relevant to the field of application of the problem. Example:
> dictionary
good bad
1 strong intense
2 intense intensity
3 light soft
I am then given a list of words, each with a given id. My task is to check if any of those words is in the bad column of dictionary and, if so, update it with the id of the word to its left. As can be seen, intensity would need two steps to become strong (a good word in the dictionary). Is there a way to do so without having to do multiple iterations? (say, a for loop)
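One way to picture the task is as following each word's chain of "bad -> good" links until it reaches a word that is not in the bad column. The sketch below is written in Python rather than R and is only an illustration of that idea; the names bad_to_good, resolve_root and resolved_ids are hypothetical, and the sample values are taken from the tables above.

```python
# Mapping from each "bad" word to the "good" word on its left in the dictionary
bad_to_good = {"intense": "strong", "intensity": "intense", "soft": "light"}

# Words with their original ids
ids = {"strong": "1", "intense": "2", "intensity": "3"}


def resolve_root(word, mapping):
    """Follow the bad -> good links until a word with no replacement is reached."""
    seen = {word}
    while word in mapping:
        word = mapping[word]
        if word in seen:
            # Assumption: the dictionary should not contain cycles
            raise ValueError(f"cycle detected at {word!r}")
        seen.add(word)
    return word


# Each word takes the id of the root of its chain; if the root has no id,
# the word keeps its own id.
resolved_ids = {
    word: ids.get(resolve_root(word, bad_to_good), own_id)
    for word, own_id in ids.items()
}

print(resolved_ids)  # {'strong': '1', 'intense': '1', 'intensity': '1'}
```

Each word walks its own chain once, so the whole table does not have to be passed over repeatedly.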
I would like to retrieve information from wikidata and store it in a dataframe. For the sake of simplicity I am going to assume that I want to get the genre of the following movies and then filter those that belong to sci-fi:
movies = c("Star Wars Episode IV: A New Hope", "Interstellar",
"Happythankyoumoreplease")
I know there is a package called WikidataR. If I am not wrong, and according to its vignettes, there are two commands that may be useful: find_item and find_property allow you to retrieve a set of Wikidata items or properties where the aliases or descriptions match a particular search term. They seemed well suited to my needs, so I thought of doing something like
for (i in movies) {
info = find_item(i)
}
This is what I get from each item:
> find_item("Interstellar")
Wikidata item search
Number of results: 10
Results:
1 Interstellar (Q13417189) - 2014 US science fiction film
2 Interstellar (Q6057099)
3 interstellar medium (Q41872) - matter and fields (radiation) that exist in the space between the star systems in a galaxy;includes gas in ionic, atomic or molecular form, dust and cosmic rays. It fills interstellar space and blends smoothly into the surrounding intergalactic space
4 space colonization (Q686876) - concept of permanent human habitation outside of Earth
5 rogue planet (Q167910) - planetary-mass object that orbits the galaxy directly
6 interstellar cloud (Q1054444) - accumulation of gas, plasma and dust in a galaxy
7 interstellar travel (Q834826) - term used for hypothetical manned or unmanned travel between stars
8 Interstellar Boundary Explorer (Q835898)
9 starship (Q2003852) - spacecraft designed for interstellar travel
10 interstellar object (Q2441216) - astronomical object in interstellar space, such as a comet
>
Unfortunately, the information that I get from find_item (see above) has two problems:
it is not a dataframe with all the Wikidata information of the item I am searching for, but a list of what seems to be metadata (Wikidata's id, link...).
it does not have the information I need (the Wikidata properties of each particular item).
Similarly, find_property provides metadata of a certain property. find_property("genre") retrieves the following information:
> find_property("genre")
Wikidata property search
Number of results: 4
Results:
1 genre (P136) - a creative work's genre or an artist's field of work (P101). Use main subject (P921) to relate creative works to their topic
2 radio format (P415) - describes the overall content broadcast on a radio station
3 sex or gender (P21) - sexual identity of subject: male (Q6581097), female (Q6581072), intersex (Q1097630), transgender female (Q1052281), transgender male (Q2449503). Animals: male animal (Q44148), female animal (Q43445). Groups of same gender use "subclass of" (P279)
4 gender of a scientific name of a genus (P2433) - determines the correct form of some names of species and subdivisions of species, also subdivisions of a genus
This has similar problems:
it is not a dataframe
it just stores metadata about the property
I don't find any way to link each property with each object in movies vector.
Is there any way to end up with a dataframe containing the genre's of those movies? (or a dataframe with all wikidata's information which I will have to manipulate in order to filter or select my desired data?)
These are just lists. You can get a picture of their structure with str(find_item("Interstellar")), for example.
Then you can go through each element of the list and pick the fields that you need. For example, to get the title and the label:
a <- find_item("Interstellar")
b <- Reduce(rbind,lapply(a, function(x) cbind(x$title,x$label)))
data.frame(b)
## X1 X2
## 1 Q13417189 Interstellar
## 2 Q6057099 Interstellar
## 3 Q41872 interstellar medium
## 4 Q686876 space colonization
## 5 Q167910 rogue planet
## 6 Q1054444 interstellar cloud
## 7 Q834826 interstellar travel
## 8 Q835898 Interstellar Boundary Explorer
## 9 Q2003852 starship
## 10 Q2441216 interstellar object
This works easily for regular data. If some element is missing, you will have to handle it; for example, some items don't have a description. You can get around that with the following:
Reduce("rbind",lapply(a,
function(x) cbind(x$title,
x$label,
ifelse(length(x$description)==0,NA,x$description))))
I have an input file with different food types
Corn Fiber 17
Beans Protein 12
Milk Protien 15
Butter Fat 201
Eggs Fat 2
Bread Fiber 12
Eggs Cholesterol 4
Eggs Protein 8
Milk Fat 5
(Don't take these too seriously. I'm no nutrition expert) Anyway, I have the following script that reads the input file then puts the following into a table
file = io.open("food.txt")
foods = {}
nutritions = {}
for line in file:lines()
do
  local f, n, v = line:match("(%a+) (%a+) (%d+)")
  nutritions[n] = {value = v}
  --foods[f] = {} Not sure how to implement here
end
file:close()
(It's a little messy right now)
Notice also that different foods can have different nutrients. For example, eggs have both protein and fat. I need a way to let the program know which value I am trying to access. For example:
> print(foods.Eggs.Fat)
2
> print(foods.Eggs.Protein)
8
I believe I need two tables, as shown above. The foods table will contain a table of nutritions. This way, I can have multiple food types with multiple different nutrient facts. However, I am not sure how to handle a table of tables. How can I implement this within my program?
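The straightforward way is to test whether foods[f] exists, and then either create a new table or add an entry to the existing one.
-- `file` is the handle opened with io.open("food.txt") in the question
foods = {}
for line in file:lines() do
  local f, n, v = line:match("(%a+) (%a+) (%d+)")
  if foods[f] then
    -- food already seen: add another nutrient to its table
    foods[f][n] = tonumber(v)
  else
    -- first time we see this food: create its nutrient table
    foods[f] = {[n] = tonumber(v)}
  end
end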
My problem is that I need to calculate how much a point is worth based on the number of games played.
When a team plays a match, it gets 3 points for a win, 1 point for a tie, and 0 points for a loss.
The problem is the following:
Team 1
Wins:8 Tie:2 Loss:3 Points:26 Played Games: 13
Team 2
Wins:8 Tie:3 Loss:4 Points:27 Played Games: 15
Here you can see that Team 2 has 1 more point than Team 1. But Team 2 has played 2 more matches and has a lower win % than Team 1. Yet if you ranked these two by points, Team 2 would get a higher "rating" than Team 1.
So what should the math look like to make this fair, so that Team 1 gets a better score than Team 2?
Just divide by the number of games to get the average points per game played.
Team1: 2.0 ppg
Team2: 1.8 ppg
Okay, first of all, thanks for the help.
The solution I went with is the following:
p/pg * p = Real points
p = Sum(points),
pg = Played games
So for the example above, the real points will be:
Team 1: 52
Team 2: 48.6
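The arithmetic in both answers can be reproduced with a short Python sketch. The function names points, games_played, points_per_game and real_points are illustrative; the records are the two teams from the question.

```python
# Win/tie/loss records from the question
teams = {
    "Team 1": {"wins": 8, "ties": 2, "losses": 3},
    "Team 2": {"wins": 8, "ties": 3, "losses": 4},
}


def points(record):
    """3 points per win, 1 per tie, 0 per loss."""
    return 3 * record["wins"] + record["ties"]


def games_played(record):
    return record["wins"] + record["ties"] + record["losses"]


def points_per_game(record):
    """Average points per game played (p / pg)."""
    return points(record) / games_played(record)


def real_points(record):
    """The weighting from the last post: p / pg * p."""
    return points_per_game(record) * points(record)


for name, record in teams.items():
    print(f"{name}: {points_per_game(record):.1f} ppg, "
          f"{real_points(record):.1f} real points")
# Team 1: 2.0 ppg, 52.0 real points
# Team 2: 1.8 ppg, 48.6 real points
```

Both measures rank Team 1 above Team 2; points per game ignores the number of games played, while the second formula still rewards a higher points total.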