Creating a weighted adjacency matrix with iterations - r

I have data on lists of directors from different companies. Directors from one company meet on the same board of directors. Moreover, I also have data on how many times these directors were on the same board of directors. I have to create an adjacency matrix of these directors. Each cell should show how many times two directors were on the same board of directors (i.e. if A and B are from company 1 and there were 11 meetings in this company, then there must be 11 at the intersection of A and B; if A and B are from different boards of directors (different companies), then there must be 0 at the intersection).
I have successfully created this matrix in Excel with the formula
=IF(VLOOKUP($E2;$A$1:$C$27;2;0)=(VLOOKUP(F$1;$A$1:$C$27;2;0));$C2;0)
However, the main problem is that two or more directors may meet on more than one board of directors (in more than one company). In this case the numbers of meetings must be added together. For example, if A and B meet together in company 1 for 11 times and in company 3 for 4 times, then there must be 15 at the intersection, and unfortunately I can't work out how to do this. I've searched for similar problems and didn't find any cases where the original data was repeated like this. I have no idea whether it is possible to do this in Excel or whether I should use other software (R or something else).

See if this array formula works for you:-
=SUM(ISNUMBER(MATCH(IF($A$2:$A$27=F$1,$B$2:$B$27,"+"),IF($A$2:$A$27=$E2,$B$2:$B$27,"-"),0))*$C$2:$C$27)
It must be entered with Ctrl+Shift+Enter.
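If you move to R instead, a minimal sketch of the same idea looks like this; it assumes a data frame called boards with columns Director, Company and Meetings (one row per director-company pair, with Meetings repeated for every director of that company) — these names are assumptions, not taken from your workbook:

inc <- (table(boards$Director, boards$Company) > 0) * 1                  # 0/1 director-by-company incidence matrix
meetings <- tapply(boards$Meetings, boards$Company, max)[colnames(inc)]  # meetings per company, aligned with inc
adj <- inc %*% diag(meetings) %*% t(inc)                                 # cell (A, B) sums meetings over every shared board
diag(adj) <- 0                                                           # zero the diagonal (no self-pairs)

Because the matrix product sums over all companies two directors share, pairs that meet on several boards are added together automatically.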

Related

Generating 'weight' column for network analysis

I'm very new to R, so forgive any omissions or errors.
I have a dataframe that contains a series of events (called 'incidents') represented by a column named 'INCIDENT_NUM'. These are strings (ex. 2016111111), and there are multiple cells per incident if there are multiple employees involved in the incident. Employees are represented in their own string column ('EMPL_NO'), and they can be in the column multiple times if they're involved in multiple incidents.
So, the data I have looks like:
Incident Number   EMPL_NO
201611111         EID0012
201611111         EID0013
201611112         EID0012
201611112         EID0013
201611112         EID0011
What I am aiming to do is see which employees are connected to one another by how many incidents they're co-involved with. Looking at tutorials for network analysis, folks have data that looks like this, which is what I ultimately want:
From      To        Weight
EID0011   EID0012   2
EID0012   EID0013   1
Is there any easy process for this? My data has thousands of rows, so doing this by hand doesn't feel feasible.
Thanks in advance!!!
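One way to get there in R is sketched below; it assumes the data frame is called df and that the columns are named INCIDENT_NUM and EMPL_NO as described above (the object name df is an assumption). The table is joined to itself on the incident number and the employee pairs are counted:

pairs <- merge(df, df, by = "INCIDENT_NUM")         # self-join: all employee pairs within an incident
pairs <- subset(pairs, EMPL_NO.x < EMPL_NO.y)       # keep each unordered pair once
edges <- aggregate(INCIDENT_NUM ~ EMPL_NO.x + EMPL_NO.y, data = pairs, FUN = length)
names(edges) <- c("From", "To", "Weight")           # number of shared incidents per pair

The resulting edges data frame has the From/To/Weight shape shown above and can be handed straight to igraph or similar network packages.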

Need to get combination of records from Data Frame in R that satisfies a specific target in R

Say I have a data frame in R with 500 player records and the following columns:
PlayerID
TotalRuns
RunRate
AutionCost
Now out of the 500 players, I want my code to give me multiple combinations of 3 players that would satisfy the following criteria. Something like a Moneyball problem.
The sum of auction cost of all the 3 players shouldn't exceed X
They should have a minimum of Y TotalRuns
Their RunRate must be higher than the average run rate of all the players.
Kindly help with this. Thank you.
There are choose(500, 3) ways to choose 3 players, which is 20,708,500. It's not impossible to generate all of these combinations; combn might do it for you, but I couldn't be bothered waiting to find out. If you do this with player IDs and then test your three conditions, that would be one way to solve your problem. An alternative is a Monte Carlo method: select three players that initially satisfy your conditions, then randomly select another player who doesn't belong to the current trio; if he satisfies the conditions, save the combination and repeat. If you're optimizing (it's not clear, but your question has optimization in the tag), then the new player has to produce a trio that's better than the last, so if he doesn't improve your objective function (whatever it might be) you don't accept the trade.
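A rough sketch of that random-swap idea, assuming a data frame players with the columns listed in the question, thresholds X for the budget and Y for the minimum combined runs, and reading the run-rate criterion as applying to each player individually (the helper and the number of trials are illustrative only):

satisfies <- function(idx) {                        # do these 3 players meet all criteria?
  grp <- players[idx, ]
  sum(grp$AutionCost) <= X &&
    sum(grp$TotalRuns) >= Y &&
    all(grp$RunRate > mean(players$RunRate))
}
trio <- sample(nrow(players), 3)                    # find a feasible starting trio
while (!satisfies(trio)) trio <- sample(nrow(players), 3)
found <- list(sort(trio))
for (i in seq_len(10000)) {                         # number of trials is arbitrary
  candidate <- trio
  candidate[sample(3, 1)] <- sample(setdiff(seq_len(nrow(players)), trio), 1)
  if (satisfies(candidate)) {
    trio  <- candidate
    found <- c(found, list(sort(trio)))
  }
}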
choose(500,3)
Shows there are almost 21,000,000 combinations of 3 players drawn from a pool of 500, which means a complete analysis of the entire search space ought to be doable in a reasonable time on a modern machine.
You can generate the indices of these combinations using iterpc() and getnext() from the iterpc package, as in:
library(iterpc)    # install.packages("iterpc") first if needed
I <- iterpc(5, 3)  # iterator over all 3-combinations of 5 items; use iterpc(500, 3) for the full pool
getnext(I)         # returns the next combination, starting with 1 2 3
You can also drastically cut the search space in a number of ways by setting up initial filtering criteria and/or by taking the first solution (while loop with condition = meeting criterion). Or, you can get and rank order all of them (loop through all combinations) or some intermediate where you get n solutions. And preprocessing can help reduce the search space. For example, ordering salaries in ascending order first will give you the cheapest salary solution first. Ordering the file by descending runs will give you the highest runs solutions first.
NOTE: While this works fine, I see that iterpc is now superseded by the arrangements package, where the relevant iterator is icombinations(). getnext() is still the access method for the resulting iterators.
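For completeness, a sketch of the filter-first approach with plain combn; the column names follow the question, and X and Y stand for the budget and minimum-runs thresholds (both are assumptions):

avg_rate <- mean(players$RunRate)
pool  <- subset(players, RunRate > avg_rate)        # pre-filter shrinks the search space
trios <- combn(nrow(pool), 3, simplify = FALSE)     # all 3-player index sets within the pool
ok <- Filter(function(idx) {
  grp <- pool[idx, ]
  sum(grp$AutionCost) <= X && sum(grp$TotalRuns) >= Y
}, trios)                                           # each element of ok indexes a qualifying trio in pool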
Thanks, I used a combination of both John's and James's answers.
I filtered out all the players who don't satisfy the criteria, which boiled the pool down to just over 90 players.
Then I picked players at random until all the variations were exhausted.
Finally, I computed combined metrics for each variation (set) of players to arrive at the optimized set.
The code is a bit messy, so I won't post it here.

Set vertex names

I have a network in R and I have to attach names to all vertices that have more than 3 related ties (or better, that have degree >= 2, that is, 2 or more adjacent edges). In one case I have a network made of firms that collaborated with one another, and I need to assign to all vertices with degree >= 3 the corresponding firm's name (which I have in the csv dataset in the column Project Company).
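A minimal igraph sketch of one way to do this, assuming the graph object is called g and firm_names is a character vector taken from the Project Company column in the same order as the vertices (both names are assumptions):

library(igraph)
V(g)$label <- ifelse(degree(g) >= 3, firm_names, NA)  # NA suppresses the label when plotting
plot(g, vertex.label = V(g)$label)                    # only firms with degree >= 3 are labelled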

R How to compare Rows and Delete by Matching Strings

Match Up     Date        Points   Opponent Points   Reb   Opponent Reb
Dal vs Den   8/16/2015   20       21                10    15
Den vs Dal   8/16/2015   21       20                15    10
I have a data frame with sports data. However, I have two rows for every game due to the way the data had to be collected. For example, the two rows above are the same game, but the data had to be collected twice for each game: once for Dal and once for Den.
I'd like to find a way to delete duplicate games. I figure that one of the conditions to compare will have to be the game date. How else can I tell R what to check in order to delete duplicate rows? I assume I should be able to tell R to:
Check that the game dates match
If the game dates match and the "Teams" match, then delete the duplicate. (Can this be done even though the strings are not an exact match, i.e. Den vs Dal and Dal vs Den would not be matching strings?)
Move on to the next row and repeat until the end of the spreadsheet.
R would not need to check more than 50 rows down before moving on to the next row.
Is there a function to test for matching individual words? For example, I do not want to have to tell R "if the cell contains Den..." or "if the cell contains Dal..." as this would involve too many teams. R needs to be able to check the cells for ANY value that could be in them and then find out whether the same value appears as a string in later rows.
Please help.
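One possible sketch, assuming the data frame is called games and its first two columns are named MatchUp and Date (both names are assumptions): build a key that is identical for "Dal vs Den" and "Den vs Dal" on the same date by sorting the two team codes, then drop rows whose key has already been seen.

teams   <- strsplit(games$MatchUp, " vs ", fixed = TRUE)                   # "Dal vs Den" -> c("Dal", "Den")
key     <- vapply(teams, function(x) paste(sort(x), collapse = "-"), character(1))
key     <- paste(key, games$Date)                                          # same key for both rows of a game
deduped <- games[!duplicated(key), ]                                       # keep only the first row of each game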

network directed graph optimization package in R

I have used the R package lpSolve in the past, but I feel that it is not a perfect fit for my current problem.
I want to optimize the problem below.
I have nodes and links as depicted in the diagram. I start from New York and I want to ship fruits to a customer on day 4. Each node consists of 4 parts: physical location, item, site type, and time. You can say that the node name is a combination of these 4 fields.
I can take 2 paths. My objective is to meet the customer demand and to send all fruits to the sink at minimum cost.
The transportation cost per fruit and the travel time on each lane are given by the text on the transportation route.
My New York location is the only input and it gets 50 fruits on day 1; the customer is the only output location and in this case wants 30 fruits on day 4.
In the current scenario the solution is to send 30 fruits along the newyork, newmexico, customer lane and 20 fruits along the newyork, arizona, customer lane. For the 20 fruits we choose the newyork, arizona, customer lane, as the arizona, customer lane has a lower cost (90 USD) than the newmexico, customer lane (100 USD).
To provide the input to the model, I create a sink-to-newyork link and send 50 fruits on that lane. Direct transportation lanes from arizona and newmexico to the sink are very costly, and because of the high cost my optimization will avoid them as much as possible.
As of now I am building all links and nodes using SQL. I am also using SQL to populate the quantity that newyork gets and the quantity that the customer wants. Then I optimize my network using IBM ILOG.
I want to replace the IBM ILOG optimization part with an R package. Which package should I use?
My constraints are:
The input quantity to each node has to equal the output quantity from that node.
newyork gets 50 fruits on day 1.
The customer wants 30 fruits on day 4 and we cannot give the customer more.
To make the optimization easy I create a sink-to-newyork link, which I have shown with a dotted line.
In ILOG I can create TUPLEs and then write my optimization code. I guess I could solve this problem with the R package lpSolve too, but creating the constraints and objective would involve writing many loops. In my actual network I have 10,000+ nodes, and I was wondering if there is an R package specially designed for this purpose.
Would it be possible to provide simple code to solve this problem in R?
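As a starting point, here is a hedged sketch of a minimum-cost flow formulation with the lpSolve package; the node names, arcs and costs are illustrative assumptions based on the description, not the real network:

library(lpSolve)
arcs <- data.frame(                                  # assumed lane list with costs per fruit
  from = c("newyork_d1", "newyork_d1", "arizona_d3", "newmexico_d3", "arizona_d3", "newmexico_d3"),
  to   = c("arizona_d3", "newmexico_d3", "customer_d4", "customer_d4", "sink", "sink"),
  cost = c(50, 60, 90, 100, 10000, 10000)
)
nodes  <- unique(c(arcs$from, arcs$to))
supply <- setNames(rep(0, length(nodes)), nodes)
supply["newyork_d1"]  <-  50                         # 50 fruits arrive in newyork on day 1
supply["customer_d4"] <- -30                         # the customer takes exactly 30 fruits on day 4
supply["sink"]        <- -20                         # the leftover fruit is absorbed by the sink
# flow conservation: out-flow minus in-flow at every node equals its supply
A <- sapply(seq_len(nrow(arcs)), function(j) (nodes == arcs$from[j]) - (nodes == arcs$to[j]))
sol <- lp("min", arcs$cost, A, rep("=", length(nodes)), supply)
cbind(arcs, flow = sol$solution)                     # optimal number of fruits on each lane

For a network with 10,000+ nodes you may find it easier to state the same model with the ompr package on top of an LP solver, which lets you write the constraints algebraically instead of building the constraint matrix by hand.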
