Cutting stock optimization / waste minimization in R using lpSolve/lpSolveAPI

I am having a tough time understanding how to formulate code for a cutting stock problem. I have searched the web extensively and found a lot of theory but no actual examples.
Most search results point to the Wikipedia page: http://en.wikipedia.org/wiki/Cutting_stock_problem
There are 13 widths to be produced, with the required amounts indicated alongside.
The machine produces, by default, a 5600-wide piece that is to be cut into the widths below. The goal is to minimize waste.
Widths/Required amount
1380 22
1520 25
1560 12
1710 14
1820 18
1880 18
1930 20
2000 10
2050 12
2100 14
2140 16
2150 18
2200 20
Would someone show me how to formulate this in R with lpSolve / lpSolveAPI?
stock=5600
widths=c(1380,1520,1560,1710,1820,1880,1930,2000,2050,2100,2140,2150,2200)
required=c(22,25,12,14,18,18,20,10,12,14,16,18,20)
library(lpSolveAPI)
...
solve(lprec)
get.variables(lprec)

You could model it as a mixed-integer program (MIP) and solve it using various techniques. Of course, to generate the variables (i.e. valid cutting patterns of widths) you need a suitable column-generation method.
Have a look at this C++ project: https://code.google.com/p/cspsol
cspsol is based on the GLPK API, and uses column generation and branch & bound to solve the MIP. It may give you some hints about how to do it in R.
Good luck!
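A minimal sketch of one possible lpSolveAPI formulation (my own addition, not taken from cspsol). Because the narrowest width is 1380, at most floor(5600/1380) = 4 pieces fit on one stock piece, so the full set of maximal cutting patterns is small enough to enumerate outright instead of generating columns on the fly. The MIP then decides how many times to cut each pattern so that every demand is met, minimising the number of 5600-wide pieces used (with the demand fixed, fewer pieces means less waste):
library(lpSolveAPI)

stock    <- 5600
widths   <- c(1380,1520,1560,1710,1820,1880,1930,2000,2050,2100,2140,2150,2200)
required <- c(22,25,12,14,18,18,20,10,12,14,16,18,20)
n        <- length(widths)

# Enumerate every maximal cutting pattern (maximal = no further width fits into
# the leftover). The 'start' index keeps piece indices non-decreasing, so each
# pattern is generated exactly once.
patterns <- list()
gen <- function(cnt, left, start) {
  if (!any(widths <= left)) {                    # nothing else fits: record pattern
    patterns[[length(patterns) + 1]] <<- cnt
    return(invisible(NULL))
  }
  for (i in which(widths <= left & seq_len(n) >= start)) {
    cnt2 <- cnt
    cnt2[i] <- cnt2[i] + 1
    gen(cnt2, left - widths[i], i)
  }
}
gen(integer(n), stock, 1)
A <- do.call(cbind, patterns)                    # one column per pattern, one row per width

# MIP: x[j] = number of times pattern j is cut; minimise total stock pieces used
m     <- ncol(A)
lprec <- make.lp(0, m)
set.objfn(lprec, rep(1, m))                      # each pattern consumes one stock piece
for (i in 1:n) add.constraint(lprec, A[i, ], ">=", required[i])
set.type(lprec, 1:m, "integer")

solve(lprec)
get.objective(lprec)                             # stock pieces needed
x <- get.variables(lprec)                        # times each pattern is cut
cbind(t(A[, x > 0]), times = x[x > 0])           # the patterns actually used
For this instance the enumeration stays small (at most a few hundred patterns), so lp_solve's branch & bound handles it directly; for much larger problems the pattern count explodes and the column-generation approach described above (and implemented in cspsol) becomes the way to go.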

Related

How can I tabulate data that are listed on a map in a pdf?

So, the ATF publishes reports going all the way back to 2008 on trace statistics from each state. I need to pull the number of firearms traced from the source states listed on the pdf (see pdf).
This is for all the years going back to 2008, and I have no idea how to efficiently pull this data. I attempted this with R because that is the only programming language I have experience in (see below).
txt <- pdf_text("https://www.atf.gov/about/docs/report/
colorado-firearms-trace-data-2014/download")
cat(txt[7])
The results...
Top 15 Source States for Firearms with a Colorado Recovery
January 1, 2014 – December 31, 2014
25
27
27
19 23
1,762
71 28
25
45 60
26
97 22
44
NOTE: An additional 32 states accounted for 261 other traces.
The source state was identified in 2,562 total traces.
Bureau of Alcohol, Tobacco, Firearms and Explosives, Office
of Strategic Intelligence and Information
Beyond this, I haven't been able to find anything online that can help me tabulate this data into something like this:
recovered  year  from        weapons
colorado   2014  colorado    1762
colorado   2014  other       261
colorado   2014  washington  25
(and so on...)
Realizing this may be due to the limitations of R, I just want to know if there is a good source where I can learn how to develop a function for this (possibly in R), especially before I attempt to type this out by hand or learn a new language from scratch (neither of which I'm sure I can make the time for).
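One direction that may help (a sketch, not a full solution): the same pdftools package also has pdf_data(), which returns every text fragment on a page together with its x/y position, so the counts printed on the map can at least be pulled out with their coordinates and then matched to states by position (for example via a hand-built lookup of approximate state-label coordinates). The date and footnote numbers would still need filtering out:
library(pdftools)

url  <- "https://www.atf.gov/about/docs/report/colorado-firearms-trace-data-2014/download"
page <- pdf_data(url)[[7]]                  # page 7 holds the source-state map

# keep the purely numeric fragments (the trace counts printed on the map)
nums <- page[grepl("^[0-9,]+$", page$text), c("x", "y", "text")]
nums$weapons <- as.numeric(gsub(",", "", nums$text))
nums   # each count with its position; position -> state still has to be mapped by hand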

R smbinning package: why 'Too many categories' for some variables?

I have a dataset in R containing many variables of different types and I am attempting to use the smbinning package to calculate Information Value.
I am using the following code:
smbinning.sumiv(Sample,y="flag")
This code produces IV for most of the variables, but for some the Process column states 'Too many categories' as shown in the output below:
Char IV Process
12 relationship NA Too many categories
15 nationality NA Too many categories
22 business_activity NA Too many categories
23 business_activity_group NA Too many categories
25 local_authority NA Too many categories
26 neighbourhood NA Too many categories
If I take a look at the values of business_activity_group for instance, I can see that there are not too many possible values it can take:
Affordable Rent Combined         2546
Commercial Community Combined       4
Freeholders Combined               23
Garages                             6
General Needs Combined          57140
Keyworker                         340
Leasehold Combined                 88
Market Rented Combined           1463
Older Persons Combined           4774
Rent To Homebuy                    76
Shared Ownership Combined         167
Staff Acommodation Combined         5
Supported Combined               2892
I thought this could be due to low volumes in some of the categories so I tried banding some of the groups together. This did not change the result.
Can anyone please explain why 'Too many categories' occurs, and what I can do to these variables in order to produce IV from the smbinning package?
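The listing above has 13 distinct levels, and, if I remember the smbinning internals correctly, the package refuses to bin character/factor predictors with more than roughly 10 levels, which is what the 'Too many categories' message reports. smbinning.factor() seems to expose this cap through a maxcat argument, but smbinning.sumiv() does not, so one workaround is to collapse levels until the count actually drops below the cap (banding only helps once that happens). A rough sketch, assuming Sample is the data frame from the question and a cap of 10:
# count distinct values per column to see which variables exceed the (assumed) cap
n_levels <- sapply(Sample, function(col) length(unique(col)))
n_levels[n_levels > 10]

# collapse everything outside the 9 most frequent levels into "Other"
f   <- as.factor(Sample$business_activity_group)
top <- names(sort(table(f), decreasing = TRUE))[1:9]
Sample$business_activity_group <- factor(ifelse(f %in% top, as.character(f), "Other"))

# re-run the IV summary, or bin the single variable with a larger cap
# (maxcat is an argument of smbinning.factor, if memory serves)
smbinning.sumiv(Sample, y = "flag")
sf <- smbinning.factor(Sample, y = "flag", x = "business_activity_group", maxcat = 15)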

Optimizing the sum of a variable in R given a constraint

Using the following dataset:
ID=c(1:24)
COST=c(85,109,90,104,107,87,99,95,82,112,105,89,101,93,111,83,113,81,97,97,91,103,86,108)
POINTS=c(113,96,111,85,94,105,105,95,107,88,113,100,96,89,89,93,100,92,109,90,101,114,112,109)
mydata=data.frame(ID,COST,POINTS)
I need an R function that will consider all combinations of rows where the sum of 'COST' is less than a fixed value - in this case, $500 - and make the optimal selection based on the summed 'POINTS'.
Your help is appreciated.
So, since this post is still open, I thought I would give my solution. These kinds of problems are always fun. You can try to brute-force the solution by checking all possible combinations (some 2^24, or over 16 million) one by one. This can be done by noting that, for each combination, a given row is either in it or not. Thinking in binary, you could use the following code, which was inspired by this post:
#DO NOT RUN THIS CODE (roughly 2^24 iterations)
sum_points <- numeric(2^24)
for (i in 1:2^24)
  sum_points[i] <- ifelse(sum(as.numeric(intToBits(i))[1:24] * mydata$COST) < 500,
                          sum(as.numeric(intToBits(i))[1:24] * mydata$POINTS),
                          0)
I estimate this would take many hours to run. Improvements could be made with parallelization, etc., but it is still a rather intense calculation. This method also will not scale very well, since adding one more ID (25 instead of 24) doubles the computation time. Another option is to cheat a little. For example, we know that we have to stay under $500. If we added up the n cheapest items, at what n would we definitely be over $500?
which(cumsum(sort(mydata$COST))>500)
[1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
So with any more than 5 IDs chosen we are definitely over $500. What else?
Well, we can run a little code over part of the search space, take the max for that portion, and see what that tells us.
sum_points <- numeric(10000)
for (i in 1:10000)
  sum_points[i] <- ifelse(sum(as.numeric(intToBits(i))[1:24]) < 6,
                          ifelse(sum(as.numeric(intToBits(i))[1:24] * mydata$COST) < 500,
                                 sum(as.numeric(intToBits(i))[1:24] * mydata$POINTS),
                                 0),
                          0)
sum_points[which.max(sum_points)]
[1] 549
So we have to try to get over 549 points with the remaining 2^24 - 10000 choices. But:
which(cumsum(rev(sort(mydata$POINTS)))<549)
[1] 1 2 3 4
Even if we sum the 4 highest point values, we still don't beat 549, so there is no reason even to search those. Further, the number of IDs to choose must be greater than 4 but less than 6. My gut feeling tells me 5 would be a good number to try. Instead of looking at all 16 million choices, we can just look at all the ways to choose 5 out of 24, which happens to be 24 choose 5:
num        <- 1:choose(24, 5)
combs      <- combn(24, 5)
sum_points <- numeric(length(num))
for (i in num)
  sum_points[i] <- ifelse(sum(mydata[combs[, i], ]$COST) < 500,
                          sum(mydata[combs[, i], ]$POINTS),
                          0)
which.max(sum_points)
[1] 2582
sum_points[2582]
[1] 563
We have a new max on the 2582nd iteration. To retrieve the IDs:
mydata[combs[,2582],]$ID
[1] 1 3 11 22 23
And to verify that nothing went wrong:
sum(mydata[combs[,2582],]$COST)
[1] 469 #less than 500
sum(mydata[combs[,2582],]$POINTS)
[1] 563 #what we expected.
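Just to add an alternative for anyone landing here: the same selection can be written as a 0/1 knapsack and handed to a solver, which skips the combinatorial search entirely. A small sketch with the lpSolve package (since the costs are whole dollars, "total cost < 500" is enforced as <= 499); it should reproduce the 563-point selection found above:
library(lpSolve)

# maximise total POINTS subject to total COST <= 499, one binary variable per row
res <- lp(direction    = "max",
          objective.in = mydata$POINTS,
          const.mat    = matrix(mydata$COST, nrow = 1),
          const.dir    = "<=",
          const.rhs    = 499,
          all.bin      = TRUE)

res$objval                      # best total points
mydata$ID[res$solution > 0.5]   # the selected IDs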

Test performing on counts

In R I have a dataset data1 that contains game and times. There are 6 games, and times simply tells us how many times a game has been played in data1. So head(data1) gives us
game times
1 850
2 621
...
6 210
Similar for data2 we get
game times
1 744
2 989
...
6 711
And sum(data1$times) is a little higher than sum(data2$times). We have about 2000 users in data1 and about 1000 users in data2 but I do not think that information is relevant.
I want to compare the two datasets and see if there is a statistically significant difference, and which game "causes" that difference.
What test should I use to compare these? I don't think Pearson's chisq.test is the right choice in this case; maybe wilcox.test is the right one to choose?
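For what it's worth, a chi-squared test of homogeneity on the 2 x 6 table of counts is one way to get both pieces of information: the overall test says whether the two distributions over games differ, and the standardized residuals point at the games driving any difference. A sketch only, with made-up placeholder counts for the games hidden behind the '...' in the question:
# counts per game; games 3-5 are made-up placeholders, only games 1, 2 and 6
# are shown in the question
d1  <- c(850, 621, 500, 400, 300, 210)   # data1$times
d2  <- c(744, 989, 450, 350, 280, 711)   # data2$times
tab <- rbind(data1 = d1, data2 = d2)

res <- chisq.test(tab)   # test of homogeneity across the 2 x 6 table
res
res$stdres               # standardized residuals: large absolute values flag the
                         # games that contribute most to the difference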

Sorting data in R

I have a dataset that I need to sort by participant (RECORDING_SESSION_LABEL) and by trial_number. However, none of the sort functions I have tried in R put the variables in the numeric order I want. The participant variable comes out fine, but the trial ID variable comes out in the wrong order for what I need.
using:
fix_rep[order(as.numeric(RECORDING_SESSION_LABEL), as.numeric(trial_number)),]
Participant ID comes out as:
118 118 118 etc. 211 211 211 etc. 306 306 306 etc.(which is fine)
trial_number comes out as:
1 1 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 2 2 20 20 .... (which is not what I want - it seems to be sorting lexically rather than numerically)
What I would like is trial_number to be order like this within each participant number:
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 ....
I have checked that these variables are not factors and are numeric and also tried without the 'as.numeric', but with no joy. Looking around I saw suggestions that sort() and mixedsort() might do the trick in place of 'order', both come up with errors. I am slowly pulling my hair out over what I think should be a simple thing. Can anybody help shed some light on how to do this to get what I need?
Even though you claim it is not a factor, it does behave exactly as if it were a factor. Testing if something is a factor can be tricky since a factor is just an integer vector with a levels attribute and a class label. If it is a factor, your code needs to have a call to as.character() nested inside the as.numeric():
fix_rep[order(as.numeric(RECORDING_SESSION_LABEL), as.numeric(as.character(trial_number))),]
To be really sure if it's a factor, I recommend the str() function:
str(trial_number)
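A quick toy example of the pitfall (not the asker's data): as.numeric() on a factor returns the internal level codes, while converting to character first recovers the printed values:
f <- factor(c("10", "2", "1"))
as.numeric(f)                  # 2 3 1  -- the level codes, not the values
as.numeric(as.character(f))    # 10 2 1 -- the actual numbers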
I think it may be worthwhile for you to design your own function in this case. It wouldn't be too hard: basically you could write a bubble-sort algorithm with a few alterations. These alterations could convert each number to a string and begin by sorting strings with different numbers of digits into different bins (easily done by checking which of the strings have the most characters). Then, in a similar fashion, the numbers in these bins could be sorted by converting the least significant digit back to a numeric type and checking which are largest/smallest. If you're interested, I could come up with some code for this; however, it looks like the answers above have beaten me to the punch with some built-in functions. I've never used those functions, so I'm not sure they'll work as you intend, but there's no use in reinventing the wheel.
