Frequency Plot in R - r

I have an array of numbers (called tails.Z) ranging from 0 to 999 and I want to see which numbers appear most frequently. In order to do so, I do a simple frequency plot using hist(tails.Z, breaks=1000) with the following result:
According to the plot, the most frequent number appears over 400 times and is some value close to zero. A second peak is somewhere around the value of 200 and indicates that the number appears just short of 400 times.
However, when I do sort(table(tails.Z)) to see the actual numbers and their frequencies I get that the most frequent number is 175 which appears 377 times, then the 2nd most frequent number is 176 which appears 290 times, then 3 which appears 266 times, 0 255 times and 5 263 times. How is it possible that the first peak in the graph is higher than 400 but in table there is no number with that frequency?
EDIT: I should add that tails.Z is an array of integers ranging from 0 to 999 and that there are 114,411 elements in it.

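Inspect what hist() actually returns using str():
hs <- hist(tails.Z, breaks = 1000)
str(hs)                                 # breaks, counts, mids, ... as actually used
tail(cbind(hs$mids, hs$counts), 20)     # last 20 bins: midpoint and count
barplot(hs$counts)                      # the same counts drawn bin by bin
summary(hs$counts)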
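A likely explanation for the mismatch is that breaks=1000 is only a suggestion: hist() picks "pretty" break points, and with the default right-closed bins the first bin also includes its lower edge, so two neighbouring integers can end up in one bar. The sketch below uses made-up data (x stands in for tails.Z, which is not available here) and shows one way to force exactly one bin per integer by placing the breaks halfway between integers.

```r
set.seed(1)                                  # illustrative data, NOT the asker's tails.Z
x <- sample(0:999, 114411, replace = TRUE)

# Default behaviour: 'breaks = 1000' is a hint, the real breaks are in hs$breaks.
hs <- hist(x, breaks = 1000, plot = FALSE)
length(hs$counts)                            # number of bins actually used
head(hs$breaks)                              # where the first bins start and end
c(first_bin = hs$counts[1], zeros_and_ones = sum(x <= 1))

# One bin per integer: break points at -0.5, 0.5, ..., 999.5.
exact <- hist(x, breaks = seq(-0.5, 999.5, by = 1), plot = FALSE)
per_value <- tabulate(x + 1, nbins = 1000)   # counts of 0, 1, ..., 999
all(exact$counts == per_value)
```

With explicit half-integer breaks the bar heights agree with table(), so the tallest bar can be read off directly.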

Related

Plot stacked bar with calculated values in another column

I have a data frame of the leads we generate every day. The columns are lead_id, generated_date, handled_date and termination_date, and each row is one lead. If a lead has an empty handled_date, it is still waiting to be handled and remains available on later days before its termination_date. I want to plot a stacked bar chart that shows how many leads are available each day, where each segment of a bar shows the day on which those leads were generated. The scenario is:
on 20190413, we generate 100 leads, so on 20190413 we have one bar with height 100,
on 20190414, we generate 150 new leads, and 50 leads from the previous day (20190413) are handled, so 50 are left (no leads are terminated). In the graph, 20190414 should have 2 stacked segments: 50 (100-50) leads left from 20190413 and 150 newly generated leads,
on 20190415, we generate 100 leads, 30 leads from 20190413 were handled 'yesterday' (20190414, and no leads are terminated), and 50 leads from 20190414 were also handled 'yesterday' (20190414). So in the graph, 20190415 should have 3 stacked segments: 20 (100-50-30) leads from 20190413, 100 (150-50) from 20190414, and 100 newly generated leads.
As long as it is before the termination_date and the lead has an empty handled_date, the lead is available.
Can anybody tell me how to plot this? Thank you very much.

How can I get a random multiple of 50 using Microsoft Small Basic?

How can I get a random multiple of 50 between 0 and 800?
So I would need numbers:
0,50,100,150,200,250,300,350,400,450,500,550,600,650,700,750,800.
I've tried using math.getrandomnumber(800) but that gives me any number.
Get a random number between 0 and 16, then multiply it by 50.
First, get a random integer from 0 to 16, then multiply it by 50. That always gives a random multiple of 50 up to 800, because 50 x 16 = 800, so 16 is the largest factor you can use. Since Math.GetRandomNumber(n) returns a value from 1 to n, ask for a number up to 17 and subtract 1; otherwise 0 can never come up.
RandomNumber_0to16 = Math.GetRandomNumber(17) - 1
random_multiple_of_50 = RandomNumber_0to16 * 50
TextWindow.WriteLine(random_multiple_of_50)
So I would need numbers, 0,50,100,150,200,250,300,350,400,450,500,550,600,650,700,750,800
You can build these numbers by scaling the numbers in the range [0,16] by a factor of 50.
Given the definition of the Math.GetRandomNumber function
Math.GetRandomNumber(maxNumber)
Gets a random number between 1 and the specified maxNumber (inclusive).
Parameters
maxNumber: The maximum number for the requested random value.
Returns
A Random number that is less than or equal to the specified max.
your solution will have to account for the fact that the returned random integer starts at 1 whereas you need to include 0 in your list.
The following code snippet produces the desired result:
For i = 1 To 20
  TextWindow.WriteLine((Math.GetRandomNumber(17) - 1) * 50)
EndFor
' Math.GetRandomNumber(17) -> [1,17]
' Math.GetRandomNumber(17) - 1 -> [0,16]
' (Math.GetRandomNumber(17) - 1) * 50 -> {0,50,100,150, ... ,800}

Simulate a single n-sided die where the side with the highest number shows up twice as often as all other sides

I need to do this assignment, but I don't know how to approach it. The question is:
Modify the function roll() from the lecture in a way that it simulates a single n-sided die where the side with the highest number shows up twice as often as all other sides. Functions you may find useful are ?, c(), min(), max(), length(), sort() and rep().
And the function is:
roll <- function(num = 1:6, rolls = 1) {
  dice <- sample(num, size = rolls, replace = TRUE)
  return(dice)
}
I'm pretty sure that I have to use the prob argument of the sample function, but I don't know how.
You can do it without the prob argument by thinking about what kind of fairly-weighted (all faces equally probable) die would give the results you want.
sample(1:6, 1) gives you a single sample from an unbiased six-sided die. What you seem to want in this instance is equivalent to a seven-sided die with two sixes. Which would be...
sample(c(1:6,6),1)
That's an equal chance of 1 to 5, and double the chance of a 6.
> table(sample(c(1:6,6),7000,replace=TRUE))
1 2 3 4 5 6
972 1018 1016 980 1018 1996
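It's not clear to me whether "the highest number shows up twice as often as all other sides" means "all the other sides put together". If so, the same trick applies with more sixes. For example, sampling from a 10-sided die with 1 to 5 plus 5 sixes:
sample(c(1:5, rep(6,5)),1)
makes a 6 exactly as likely as all the other faces combined (an equal chance of getting either 1 to 5 or a 6); with rep(6,10) a 6 would come up twice as often as the others combined.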
> table(sample(c(1:5, rep(6,5)),10000,replace=TRUE))
1 2 3 4 5 6
1012 961 943 1018 1026 5040
Generalise to N and write your function.
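As a minimal sketch of that generalisation, assuming the first interpretation (the largest face is twice as likely as each single other face), the largest value can simply be appended once more before sampling. The name roll_weighted is made up for this illustration and is not part of the lecture code.

```r
# Hypothetical variant of roll(): duplicate the largest face, then sample fairly.
roll_weighted <- function(num = 1:6, rolls = 1) {
  faces <- c(num, max(num))            # e.g. 1 2 3 4 5 6 6 for a six-sided die
  sample(faces, size = rolls, replace = TRUE)
}

set.seed(42)
freq <- table(roll_weighted(1:6, rolls = 70000))
freq
```

Because the duplicated vector always has at least two elements, sample() never falls into its special case of treating a single number n as 1:n.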
You are right, the prob argument is useful here (even though you could do without it).
Here are the steps you have to complete:
1. Find out which of the entries in num is largest (don't assume that it is the last). You need the index (= "position") of that entry.
2. Calculate which probability each entry except the largest one would have. Example: if n=6 then each prob is 1/7, with the exception of the largest, which has 2/7.
3. Make a vector containing these probabilities in the right positions. You already know the position of the largest, so you would put the doubled prob in that position.
4. Give the prob vector to sample().
5. Test! Run it many times to find out whether the largest really comes up approximately twice as often.
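The steps above can be sketched as follows. This is one possible implementation rather than the expected solution; roll_prob is a made-up name, and the sketch assumes that the values in num are distinct. It samples positions with sample.int() so that a single-element num is not silently expanded to 1:num.

```r
# Hypothetical variant of roll() that uses the prob argument.
roll_prob <- function(num = 1:6, rolls = 1) {
  weights <- rep(1, length(num))
  weights[which.max(num)] <- 2          # position of the largest face, wherever it sits
  probs <- weights / sum(weights)       # e.g. 1/7 each and 2/7 for the largest when n = 6
  picked <- sample.int(length(num), size = rolls, replace = TRUE, prob = probs)
  num[picked]
}

set.seed(7)
freq_prob <- table(roll_prob(c(6, 1, 2, 3, 4, 5), rolls = 70000))  # largest is NOT last
freq_prob
```

Passing an unsorted num, as in the call above, is a quick way to test that the doubled probability really follows the largest value and not the last position.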

Cost Optimization across Different Suppliers for a Product

I have the following optimization problem. A company produces a product, say Big A. Producing it requires 5 processes (see the table below). For each process, a number of suppliers supply raw material for that particular process; e.g. for process 1 there are 3 suppliers: 1, 2 and 3.
The constraint for the CEO of this company, say C, is that for each process he has to purchase supplies from Supplier 1 first, then any additional supplies from Supplier 2, and so on.
The optimization problem is: C wants 700 units of material in total to produce 1 unit of Big A; how can he get them at minimum cost? How does the solution change if the number of units required increases to 1500?
I'd be grateful for a solution, but a pointer to some reference on this kind of problem would be a great help too. I'm mainly using R.
Process  Supplier  Cost  Units  Cumm_Cost  Cumm_Unit
1        1         10    100    10         100
1        2         20    110    30         210
1        3         10    200    40         410
2        1         20    100    20         100
2        2         30    150    50         250
2        3         10    150    60         400
3        1         40    130    40         130
3        2         30    140    70         270
3        3         50    120    120        390
4        1         20    120    20         120
4        2         40    120    60         240
4        3         20    180    80         420
5        1         30    180    30         180
5        2         10    160    40         320
5        3         30    140    70         460
I will start by solving the specific problem that you have posted and then will demonstrate how to formulate the problem more abstractly. For simplicity, I will use Excel's Solver add-in to solve the problem, but any combination of a modeling language (such as AIMMS, AMPL, LINGO, OPL, MOSEL and numerous others) with a solver (CPLEX, GUROBI, GLPK, CBC and numerous others) can be used. If you would like to use R, there is an lpSolve package that calls the lpSolve solver (which is not the best one in the world, to be honest, but it is free of charge).
Note that for "real" (large-scale) integer problems, the commercial solvers CPLEX, GUROBI and XPRESS perform a lot better than others. The first completely free solver that performs decently in most tests (including Hans Mittelmann's page) is CBC. CBC can be hooked up to Excel through the OpenSolver add-in and solve the built-in Solver model without restrictions on the number of constraints or variables. Therefore, assuming that most CPU time is going to be spent by the optimization algorithm, using CBC/OpenSolver seems like an efficient choice.
SPREADSHEET SETUP
I follow some conventions for convenience:
Decision variable cells are marked Green.
Constraints are marked red.
Data are marked grey.
Objective function is marked blue.
First, let's augment the table you presented as follows:
The added columns explained briefly:
Selected?: equals 1 if the (Process, Supplier) combo is allowed to produce a positive quantity, zero otherwise.
Quantity: the quantity produced, defined for each (Process, Supplier) combo.
Max Quantity?: equals 1 if the Supplier produces the maximum amount of units for that particular Process.
Quantity UB: equals Units * Selected?. This makes the upper bound either equal to Units, when the Supplier is allowed to produce this Process, or zero otherwise.
Quantity LB: equals Units * Max Quantity?. This is to ensure that whenever the Max Quantity? column is 1, the produced quantity will be equal to Units.
Selection: For the 1st supplier, it equals 0. For the 2nd and 3rd suppliers, it equals the Max Quantity? of the previous supplier (row) minus the Selected? of the current supplier (row).
A screenshot with formulas:
There exist two more constraints:
There must be at least one item produced from each process and
The total number of items should be 700 (or later 1,500).
Here is their setup:
and here are the formulas:
In brief, we use SUMIF to sum the quantities that belong to each process, and we then constrain each of these sums to be at least 1 item.
To finish the spreadsheet setup, we need to calculate the objective function, namely the cost of the allocation. This is easily done by taking the SUMPRODUCT of columns Quantity and Cost. Note that the cumulative quantities are derived data and not very useful in the current context.
After the above steps, the spreadsheet looks like below:
SOLVER MODEL
For the solver model we need to declare
The Objective
The Decisions
The Constraints
The Solver (and tweak some parameters if necessary).
For ease of exposition, I have given each range the name of its header. The solver model looks as follows:
It should all be self-explanatory, except possibly the Selection >= 0 part. The column Selection equals the binary Max Quantity? of the previous supplier minus the Selected? of the current supplier. Selection >= 0 => Max Quantity? of previous supplier >= Selected? of current supplier. Therefore, if the previous supplier does not produce at max quantity (binary = 0), the current supplier cannot produce.
Then we need to make sure that the solver settings are OK:
and solve the problem.
Solution for req = 700 :
As we see, the model tries to avoid processes 3 and 5 as much as possible, and satisfies the constraint "at least 1 item per process" by picking exactly 1 item for each of processes 3 and 5. The objective function value is 11,710.
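Since the question mentions R, the optimal cost can also be cross-checked without a solver. The sketch below is not the Solver model described here but a small dynamic program over the processes. It assumes that Cost is a per-unit cost (as the SUMPRODUCT objective implies), that quantities are whole units, and that suppliers are filled strictly in order; process_cost and min_total_cost are names invented for this illustration.

```r
supply <- data.frame(
  process  = rep(1:5, each = 3),
  supplier = rep(1:3, times = 5),
  cost     = c(10, 20, 10,  20, 30, 10,  40, 30, 50,  20, 40, 20,  30, 10, 30),
  units    = c(100, 110, 200,  100, 150, 150,  130, 140, 120,
               120, 120, 180,  180, 160, 140)
)

# Cost of buying q = 1, 2, ..., capacity units from one process when the
# suppliers are used up strictly in order (supplier 1 first, then 2, ...).
process_cost <- function(cost, units) cumsum(rep(cost, times = units))

min_total_cost <- function(supply, required) {
  best <- c(0, rep(Inf, required))       # best[t + 1]: cheapest way to hold t units so far
  for (p in unique(supply$process)) {
    rows <- supply[supply$process == p, ]
    rows <- rows[order(rows$supplier), ]
    pc   <- process_cost(rows$cost, rows$units)
    nxt  <- rep(Inf, required + 1)
    for (q in seq_along(pc)) {           # q starts at 1: at least one unit per process
      if (q > required) break
      before <- 0:(required - q)         # totals held before adding q units from p
      cand   <- best[before + 1] + pc[q]
      nxt[before + q + 1] <- pmin(nxt[before + q + 1], cand)
    }
    best <- nxt
  }
  best[required + 1]                     # Inf if the requirement cannot be met
}

min_total_cost(supply, 700)    # 11710, the objective value reported above
min_total_cost(supply, 1500)
```

The approach works here because the ordering rule makes each process's cost a fixed function of the quantity bought from it, so only the split of the total across the five processes has to be searched.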
Solution for req = 1,500 :
Here we need more capacity, yet process 3 still looks expensive, and the model avoids it by allocating only what is necessary (just 1 unit, to supplier 1).
I hope this helps. The spreadsheet can be downloaded here. I include the definition of the mathematical model below, in case you would like to transfer it to another language.
MATHEMATICAL FORMULATION
A formal definition of your problem is as follows.
SETS:
PARAMETERS:
Decisions:
Objective:
Constraints:
Constraint explanation:
C1: A supplier cannot produce anything from a process if he has not been allocated to that process.
C2: If a supplier's maximum indicator is set to 1, then the production variable should be the maximum possible.
C3: We cannot select supplier s for process p if we have not produced the max quantity available from the previous supplier s-1.
C4: We need to produce at least 1 item from each process.
C5: the total production from all processes and suppliers should equal the required amount.
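The formulas for the sets, parameters, decisions, objective and constraints did not survive in this copy of the answer. A formulation consistent with the explanations of C1 to C5 might read as follows. All symbols are assumptions introduced here: P is the set of processes, S the ordered set of suppliers, c_{ps} the unit cost, u_{ps} the units available, R the required total, x_{ps} the quantity produced, y_{ps} the "Selected?" binary and z_{ps} the "Max Quantity?" binary.

```latex
\begin{align*}
\min \quad & \sum_{p \in P} \sum_{s \in S} c_{ps}\, x_{ps} \\
\text{s.t.} \quad
& x_{ps} \le u_{ps}\, y_{ps}      && \forall p \in P,\ s \in S         && \text{(C1)} \\
& x_{ps} \ge u_{ps}\, z_{ps}      && \forall p \in P,\ s \in S         && \text{(C2)} \\
& y_{ps} \le z_{p,s-1}            && \forall p \in P,\ s \in S,\ s \ge 2 && \text{(C3)} \\
& \sum_{s \in S} x_{ps} \ge 1     && \forall p \in P                   && \text{(C4)} \\
& \sum_{p \in P} \sum_{s \in S} x_{ps} = R &&                          && \text{(C5)} \\
& x_{ps} \ge 0, \quad y_{ps},\, z_{ps} \in \{0, 1\} && \forall p \in P,\ s \in S
\end{align*}
```

C3 is the spreadsheet's Selection >= 0 condition written directly in terms of the two binaries. Whether x_{ps} is additionally required to be integer is not recoverable from the text.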
Looks like you should look at the simplex algorithm (or some existing implementation of it).
Wikipedia has a fairly nice description of the algorithm, http://en.wikipedia.org/wiki/Simplex_algorithm

Barplot in R - How to divide the plot into separate parts by showing them in different colors

I am a newbie to R. I would like to create a barplot which is visually divided into different parts.
My data looks like the following:
"1.0";"0.0";"1.0";"2.0";"710";"12500"
The first four numbers give the number of parts that need to be ordered from the two parts lists below. The fifth number gives the resulting sum for the first parts list, the sixth the resulting sum for the second.
part 1: 10;50;100;300
part 2: 500;1000;2000;5000;
this is how it is calculated.
1 * 10 + 0 * 50 + 1 * 100 + 2 * 300 = 710 ;
1 * 500 + 0 * 1000 + 1 * 2000 + 2 * 5000 = 12500
What I now want to plot is, for example, the value 12500, but visually divided into its parts as stacked bars: two five-thousands, then one two-thousand, then one five-hundred. The bar should consist of these parts, each of which can be seen or is marked with its value (it would be nice to have a different color for each kind of part).
How can I do it? Folks, I did my homework, I searched a lot and did try it on my own, but couldn't achieve what I want.
daten <- matrix(c(10,50,100,300,500,1000,2000,5000),ncol=2)
multiplier <- c(1,0,1,2)
barplot(daten*multiplier)
To display bar segments in reverse order, you need to rearrange the rows in the daten*multiplier array:
barplot((daten*multiplier)[nrow(daten):1,])
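If every ordered item should be its own visible segment (two separate 5000 blocks rather than one 10000 block), one option is to repeat each part value by its multiplier before plotting. This is a sketch for the second parts list only; the colour names and the segments and segment_cols variables are arbitrary choices made for this illustration.

```r
part_values <- c(500, 1000, 2000, 5000)      # second parts list from the question
multiplier  <- c(1, 0, 1, 2)

# One entry per ordered item: 500, 2000, 5000, 5000
segments     <- rep(part_values, times = multiplier)
part_cols    <- c("steelblue", "grey60", "orange", "firebrick")
segment_cols <- rep(part_cols, times = multiplier)   # same colour for equal parts

# A one-column matrix gives a single stacked bar with one segment per row.
bp <- barplot(matrix(segments, ncol = 1), col = segment_cols,
              names.arg = as.character(sum(segments)))
text(bp, cumsum(segments) - segments / 2, labels = segments)  # label each segment
legend("topright", legend = part_values, fill = part_cols)
```

The bar's total height is still 12500, while the segment borders and labels show how that sum is composed.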
