Network directed graph optimization package in R

I have used the R package lpSolve in the past, but I feel it is not a good fit for my current problem.
I want to optimize the problem below.
I have nodes and links as depicted in the diagram. I start from New York and want to ship fruits to a customer on day 4. Each node consists of 4 parts: physical location, item, site type, and time. You can say that the node name is the combination of these 4 fields.
I can take 2 paths. My objective is to meet the customer demand and to send all fruits to the sink at minimum cost.
The transportation cost for each fruit and the travel time on each lane are given by the text on the transportation route.
My New York location is the only input, and it gets 50 fruits on day 1. The customer is the only output location, and in this case the customer is looking for 30 fruits on day 4.
In the current scenario the solution is to send 30 fruits along the New York, New Mexico, customer lane and 20 fruits along the New York, Arizona, customer lane. For the 20 fruits we choose the New York, Arizona, customer lane because the Arizona-customer lane has a lower cost (90 USD) than the New Mexico-customer lane (100 USD).
To provide the input to the model, I create a sink-to-New-York link and send 50 fruits on that lane. Direct transportation lanes from Arizona and New Mexico to the sink are very costly, and because of the high cost my optimization will avoid them as much as possible.
As of now I am building all links and nodes using SQL. I am also using SQL to populate the quantity that New York gets and the quantity that the customer wants. Then I optimize my network using IBM ILOG.
I want to replace the IBM ILOG optimization part with an R package. Which package should I use?
My constraints are:
the input quantity to each node has to be equal to the output quantity from that node;
New York gets 50 fruits on day 1;
the customer wants 30 fruits on day 4, and we cannot give more to the customer.
To make the optimization easier I create the sink-to-New-York link, which I have shown with a dotted line.
In ILOG I can create TUPLEs and then write my optimization code. I guess I could solve this problem with the R package lpSolve too, but creating the constraints and the objective would involve writing many loops. My actual network has 10000+ nodes, and I was wondering whether there is an R package specially designed for this purpose.
Would it be possible to provide simple code to solve this problem in R?
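One option is to stay with lpSolve and generate the flow-balance constraints from an arc table rather than by hand (the `ompr` package offers a more algebraic modelling layer if the network grows). Below is a minimal sketch of the example network; only the New Mexico→customer (100 USD) and Arizona→customer (90 USD) costs come from the question, and the other four costs are made-up placeholders.

```r
library(lpSolve)

# Decision variables = flow on each arc, in this column order:
# NY->NM, NY->AZ, NM->Cust, AZ->Cust, NM->Sink, AZ->Sink
# Only the NM->Cust (100) and AZ->Cust (90) costs are from the question;
# the other four are hypothetical placeholders.
cost <- c(10, 10, 100, 90, 1000, 1000)

# One flow-balance row per node
con <- rbind(
  c( 1,  1,  0,  0,  0,  0),  # New York ships out exactly 50
  c( 1,  0, -1,  0, -1,  0),  # New Mexico: inflow = outflow
  c( 0,  1,  0, -1,  0, -1),  # Arizona: inflow = outflow
  c( 0,  0,  1,  1,  0,  0),  # customer receives exactly 30
  c( 0,  0,  0,  0,  1,  1)   # sink absorbs the remaining 20
)
rhs <- c(50, 0, 0, 30, 20)

sol <- lp("min", cost, con, rep("=", 5), rhs)
sol$solution  # optimal flow on each arc
sol$objval    # minimum total cost
```

For a 10000+ node network you would build `con` programmatically (one row per node, one column per arc) from the same SQL arc table you already generate, instead of writing it out by hand.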

Related

R - How to get/print description of the variables in a dataset loaded from any package / library (eg: ISLR)?

Is there a function that can print the description / detailed info (what a variable represents, its units, etc.) about the variables that are part of a dataset loaded from a library or package?
Note: I am using jupyter notebook.
Is there a way to check if a dataset has in-built info?
I have loaded the datasets from the library (ISLR) for the book "Introduction to Statistical Learning with R."
I want to see the description of the variables included in the 'College' dataset like : Top10perc, Outstate, etc.
# load library
library(ISLR)
df = College # saved data with a generic name 'df'
For example (this description comes from the PDF manual for the ISLR package):
__College__
U.S. News and World Report’s College Data
__Description__
Statistics for a large number of US Colleges from the 1995 issue of US News and World Report.
__Format__
A data frame with 777 observations on the following 18 variables.
Private A factor with levels No and Yes indicating private or public university
Apps Number of applications received
Accept Number of applications accepted
Enroll Number of new students enrolled
Top10perc Pct. new students from top 10% of H.S. class
Top25perc Pct. new students from top 25% of H.S. class
F.Undergrad Number of fulltime undergraduates
P.Undergrad Number of parttime undergraduates
Outstate Out-of-state tuition
Room.Board Room and board costs
Books Estimated book costs
Personal Estimated personal spending
PhD Pct. of faculty with Ph.D.’s
Terminal Pct. of faculty with terminal degree
S.F.Ratio Student/faculty ratio
perc.alumni Pct. alumni who donate
Expend Instructional expenditure per student
Grad.Rate Graduation rate
They are typically documented in help files. For example, to get the documentation on the Auto data set:
library(ISLR)
?Auto
This will show the help file for that data set. To list all of the help topics in the package and click through for more information:
help(package = "ISLR")
Alternately, the help files are assembled into a Reference Manual and that can be readily accessed on the CRAN home page of the package, e.g. https://cran.r-project.org/package=ISLR
This should help you
```{r}
library(ISLR)
?ISLR
```

Creating a weighted adjacency matrix with iterations

I have data on the lists of directors of different companies. Directors from one company meet on the same board of directors, and I also have data on how many times these directors were on the same board. I have to create an adjacency matrix of these directors. The cell values represent how many times 2 directors were on the same board of directors (i.e., if A and B are from company 1, and there were 11 meetings in this company, then there must be 11 at the intersection of A and B; if A and B are from different boards of directors (different companies), then there must be 0 at the intersection).
I have created this matrix in Excel successfully via the command
=IF(VLOOKUP($E2;$A$1:$C$27;2;0)=(VLOOKUP(F$1;$A$1:$C$27;2;0));$C2;0)
However, the main problem is that two or more directors may meet on more than one board of directors (more than one company). In that case the meeting counts must be added together. For example, if A and B meet together in company 1 for 11 times and in company 3 for 4 times, then there must be 15 at the intersection, and unfortunately I can't work out how to achieve this. I've searched for similar problems and didn't find any cases where entries in the original data were repeated. I have no idea whether this is possible in Excel, or whether I should use other software (R or something else)?
See if this array formula works for you:-
=SUM(ISNUMBER(MATCH(IF($A$2:$A$27=F$1,$B$2:$B$27,"+"),IF($A$2:$A$27=$E2,$B$2:$B$27,"-"),0))*$C$2:$C$27)
Must be entered with Ctrl+Shift+Enter.
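The same aggregation is straightforward in base R: build a director × company incidence matrix and multiply through a diagonal matrix of meeting counts. The tiny data frame below is made up for illustration (directors A and B share companies C1 and C3, so their cell should be 11 + 4 = 15).

```r
# one row per (director, company) membership; 'meetings' is the
# meeting count of that company (repeated for each of its directors)
df <- data.frame(
  director = c("A", "B", "C", "A", "B"),
  company  = c("C1", "C1", "C2", "C3", "C3"),
  meetings = c(11, 11, 7, 4, 4)
)

# director x company incidence matrix (1 = sits on that board)
inc <- unclass(table(df$director, df$company))

# meetings per company, aligned with the incidence columns
# (max() works because the count is constant within a company)
w <- tapply(df$meetings, df$company, max)[colnames(inc)]

# weighted adjacency: every shared board adds its meeting count
adj <- inc %*% diag(w) %*% t(inc)
diag(adj) <- 0   # zero out self-pairs
adj  # adj["A", "B"] is 15: 11 from C1 plus 4 from C3
```

The matrix product sums the meeting counts over all companies a pair of directors has in common, which is exactly the "add the totals together" behaviour that is hard to express in a single Excel VLOOKUP.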

Splitting data from a data.frame of length 1 by using a delimiter in R

I just imported a text file into R, and now I have a data frame of length 1.
I had 152 reviews separated by a * character in that text file. Each review is about a paragraph long.
How can I get a data frame of length 152, with one review per element? I used this line to import the file into R:
myReviews <- read.table("C:/Users/Norbert/Desktop/research/Important files/Airline Reviews/Reviews/air_can_Review.txt", header=FALSE,sep="*")
(myReviews has a length of 1 here.) I need it to be 152, the number of reviews inside the text file.
How can I split the data in the data frame by the "*" delimiter, or just import the text file correctly into a data frame of length 152 instead of 1?
EDIT : Example of the data
I have 152 of this kind of data, all separated by "*":
I boarded Air Canada flight AC 7354 to Washington DCA the next morning, April 13th. After arriving at DCA, I
discovered to my dismay that the suitcase I had checked in at Tel Aviv was not on my flight. I spent 5 hours at
Pearson trying to sort out the mess resulting from the bumped connection.
*First time in 40 years to fly with Air Canada. We loved the slightly wider seats. Disappointed in no movies to
watch as not all travelers have i-phones etc. Also, on our trip down my husband asked twice for coffee, black. He
did not get any and was not offered anything else. On our trip home, I asked for a black coffee. She said she was
making more as several travelers had asked. I did not my coffee or anything else to drink. It was a long trip with
nothing to drink especially when they had asked. When trying to get their attention, they seemed to avoid my
tries. We found the two female stewardess's not very friendly. It may seem a small thing but very frustrating for
a traveler.
*Painless check-in. Two legs of 3 on AC: AC105, YYZ-YVR. Roomy and clean A321 with fantastic crew. AC33: YVR-SYD,
very light load and had 3 seats to myself. A very enthusiastic and friendly crew as usual on this transpacific
route that I take several times a year. Arrived 20 min ahead of schedule. The expected high level of service from
our flag carrier, Air Canada. Altitude Elite member.
They are airline reviews, each review separated by a *. I would like to take all of the reviews and put them in one data.frame in R, with each review in its own "slot" or "position". The "*" is intended to be the separator.
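One base-R approach: read the whole file as a single string, then split it on the "*" delimiter. The temp file below stands in for the real `air_can_Review.txt`; with the actual file, `nrow(df)` should come out as 152.

```r
# stand-in for the real review file
path <- tempfile(fileext = ".txt")
writeLines(c("First review text.",
             "*Second review, spanning",
             "two lines.",
             "*Third review."), path)

# read everything into one string, then split on the "*" separator
raw <- paste(readLines(path), collapse = "\n")
reviews <- trimws(strsplit(raw, "*", fixed = TRUE)[[1]])
reviews <- reviews[nzchar(reviews)]  # drop empty pieces

df <- data.frame(review = reviews, stringsAsFactors = FALSE)
nrow(df)  # 3 for this sample; 152 for the real file
```

`fixed = TRUE` matters here: `strsplit` treats its split argument as a regular expression by default, and `*` is a regex metacharacter.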

Partial specification in information retrieval

Hello, I got an assignment on information retrieval and I could not work out how to create that partial specification, I mean the values of the words like here: http://nlp.stanford.edu/IR-book/html/htmledition/finite-automata-and-language-models-1.html
the = 0.2
a = 0.1
frog = 0.01 ... and so on. I would be thankful if someone could explain how to calculate these values.
Learn about Language models!
a) Explain the idea!
b) Consider the following document collection:
D1: Today is sunny. Sunny Berlin! To be or not to be.
D2: She is in Berlin today. She is a sunny girl. Berlin is always exciting!
Calculate the corresponding unigram language model for each document! Assume
the stop probability to be fixed across models (and equal to 0.2). Use these models
to rank the documents given the query "sunny Berlin"!
The values of those words are not calculated there on the page. They are obtained from statistics or from the definition of the model.
For example if you look at the picture below, there are two different models with different probabilities for each word. As the designer of your model you will need to define the probabilities by yourself.
If you couldn't understand what is the language model here is a simple example:
Imagine people who are living in London have one language model M1 and people living in NY have other language model M2.
Based on some statistics, we know that people in London use the word "sunny" twice as often as people in NY (for whatever reason), so in M1 the probability of "sunny" will be 0.04 and in M2 "sunny" = 0.02. Referring to other texts (TV, magazines and so on), we can define with what probability people in London (M1) and NY (M2) use other words, and we create a table like the one shown above.
Now we have the sentence "She is a sunny girl", and we don't know whether it comes from a person in London or in NY.
Referring to the table, we can guess it is more likely from a Londoner (M1), because they use this word more!
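For the quoted exercise (documents D1 and D2, query "sunny Berlin"), a maximum-likelihood unigram model can be sketched in base R. The fixed stop probability of 0.2 multiplies both document scores by the same constant, so it does not change the ranking and is left out here.

```r
# lower-case, strip punctuation, split on whitespace
tokenize <- function(x) {
  tolower(unlist(strsplit(gsub("[[:punct:]]", "", x), "[[:space:]]+")))
}

docs <- c(
  D1 = "Today is sunny. Sunny Berlin! To be or not to be.",
  D2 = "She is in Berlin today. She is a sunny girl. Berlin is always exciting!"
)
query <- tokenize("sunny Berlin")

# P(query | doc) = product over query terms of count(term)/doc length
score <- sapply(docs, function(d) {
  toks <- tokenize(d)
  prod(sapply(query, function(q) sum(toks == q) / length(toks)))
})
score                    # D1: 2/11 * 1/11, D2: 1/14 * 2/14
names(which.max(score))  # D1 ranks first
```

D1 wins because "sunny" appears twice in its 11 tokens, outweighing D2's second occurrence of "Berlin" over 14 tokens.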

Is there some way of recycling a Crystal Reports dataset?

I'm trying to write a Crystal Report which has totals grouped in a different way to the main report. The only way I've been able to do this so far is to use a subreport for the totals, but it means having to hit the data source again to retrieve the same data, which seems like nonsense. Here's a simplified example:
date name earnings source location
-----------------------------------------------------------
12-AUG-2008 Tom $50.00 washing cars uptown
12-AUG-2008 Dick $100.00 washing cars downtown { main report }
12-AUG-2008 Harry $75.00 mowing lawns around town
total earnings for washing cars: $150.00 { subreport }
total earnings for mowing lawns: $75.00
date name earnings source location
-----------------------------------------------------------
13-AUG-2008 John $95.00 dog walking downtown
13-AUG-2008 Jane $105.00 washing cars around town { main report }
13-AUG-2008 Dave $65.00 mowing lawns around town
total earnings for dog walking: $95.00
total earnings for washing cars: $105.00 { subreport }
total earnings for mowing lawns: $65.00
In this example, the main report is grouped by 'date', but the totals are grouped additionally by 'source'. I've looked up examples of using running totals, but they don't really do what I need. Isn't there some way of storing the result set and having both the main report and the subreport reference the same data?
Hmm... as nice as it is to call the stored proc from the report and have it all contained in one location, we found (like you) that you eventually hit a point where you can't get Crystal to do what you want even though the data is right there.
We ended up introducing a business layer which sits under the report and rather than "pulling" data from the report we "push" the datasets to it and bind the data to the report. The advantage is that you can manipulate the data in code in datasets or objects before it reaches the report and then simply bind the data to the report.
This article has a nice intro on how to set up pushing data to the reports. I understand that your time/business constraints may not allow you to do this, but if it's at all possible I'd highly recommend it, as it has meant we could move all "coding" out of our reports and into managed code, which is always a good thing.
The only way I can think of doing this without a second pass through the data would be to create some formulas that do running totals per group. The problem I assume you are running into with the existing running totals is that they are intended to follow each of the groups they are totaling. Since you want the subtotals to appear after all of the 'raw' data, this won't work.
If you create your own formulas for each group that simply adds on the total from those rows matching the group you should be able to place them at the end of the report. The downside to this approach is that the resulting subtotals will not be dynamic in relationship to the groups. In other words if you had a new 'source' it would not show up in the subtotals until you added it or if you had no 'dog walking' data you would still have a subtotal for it.