Related
Good evening,
As part of a data analysis course we have been thrown into the Metaheuristics realm.....and I am really struggling to understand how to implement a Tabu search in R since my background in programming is rather limited.
I haven't found any R or Python example on Google or youtube either so I'm really praying I'll find something here.
The problem I have is similar to the "location problem" in optimisation. I need to find the best combination of Hubs that minimizes the total distance between Hubs and nodes.
I need to find 5 hubs, and the total capacity for each one is 120
nodes <- structure(list(node_number = 1:50,
x = c(2L, 80L, 36L, 57L, 33L, 76L, 77L, 94L,
89L, 59L, 39L, 87L, 44L, 2L, 19L, 5L,
58L, 14L, 43L, 87L, 11L, 31L, 51L, 55L,
84L, 12L, 53L, 53L, 33L, 69L, 43L, 10L,
8L, 3L, 96L, 6L, 59L, 66L, 22L, 75L, 4L,
41L, 92L, 12L, 60L, 35L, 38L, 9L, 54L, 1L),
y = c(62L, 25L, 88L, 23L, 17L, 43L, 85L, 6L, 11L,
72L, 82L, 24L, 76L, 83L, 43L, 27L, 72L, 50L,
18L, 7L, 56L, 16L, 94L, 13L, 57L, 2L, 33L, 10L,
32L, 67L, 5L, 75L, 26L, 1L, 22L, 48L, 22L, 69L,
50L, 21L, 81L, 97L, 34L, 64L, 84L, 100L, 2L, 9L, 59L, 58L),
node_demand = c(3L, 14L, 1L, 14L, 19L, 2L, 14L, 6L,
7L, 6L, 10L, 18L, 3L, 6L, 20L, 4L,
14L, 11L, 19L, 15L, 15L, 4L, 13L,
13L, 5L, 16L, 3L, 7L, 14L, 17L,
3L, 3L, 12L, 14L, 20L, 13L, 10L,
9L, 6L, 18L, 7L, 20L, 9L, 1L, 8L,
5L, 1L, 7L, 9L, 2L)),
.Names = c("node_number", "x", "y", "node_demand"),
class = "data.frame", row.names = c(NA, -50L))
hubs_required = 5
total_capacity = 120
My strategy was to create a distance matrix, then I will create another 50 x 50 matrix to represent wether a node becomes a hub or not, and finally I will multiply both and add all the distances to get the total distance.
I created the dataframe:
nodes_df <- as.data.frame(nodes)
colnames(nodes_df) <- c("x", "y", "node_demand")
rownames(nodes_df) <- paste('Node',1:50)
I created the distance matrix
distance_df <-as.data.frame(as.matrix(round(dist(nodes_df,method = "euclidean",diag = TRUE,upper = TRUE))))
colnames(distance_df) <- paste("Node",1:50)
I created the node demand matrix:
demand <- as.vector(rep(c(nodes_df[,'node_demand']),50))
demand_matrix <- matrix(demand,nrow=50,ncol=50,byrow = TRUE)
diag(demand_matrix) <- 0
demand_matrix <- as.data.frame(demand_matrix)
I created an empty matrix to show whether a node becomes a hub "1" or not "0"
hubs_matrix <- matrix(0,nrow = 50,ncol = 50,byrow = TRUE)
colnames(hubs_matrix) <- paste("Hub",1:50)
rownames(hubs_matrix) <- paste("Node",1:50)
Then to create the initial solution I randomly assign Hubs and calculate the distance and demand.
set.seed(37)
hubs_matrix <- do.call("cbind", lapply(1:50, function(x) sample(c(1, rep(0, 49)), 50)))
sum_distances <- (hubs_matrix * distance_df)
sum(rowSums(sum_distances))
The idea is to try different combinations of '1'' and '0' as to minimise the total distance but I am having the following issues:
I got no idea how to do the local search and do the permutations from the initial solution.
I got no idea how to prevent R to use the best solution for a certain period of time, i.e the Tabu list
I got no idea how to deal with the supply restriction for each node ( total demand from each node < 120), I could do it with a loop but since in this case I'm multiplying matrices I'm pretty lost.
Anybody could give me a hand???
Many thanks!
I have sampled 'n' rows from a dataframe called nodes:
nodes <- structure(list(node_number = 1:50,
x = c(2L, 80L, 36L, 57L, 33L, 76L, 77L, 94L,
89L, 59L, 39L, 87L, 44L, 2L, 19L, 5L,
58L, 14L, 43L, 87L, 11L, 31L, 51L, 55L,
84L, 12L, 53L, 53L, 33L, 69L, 43L, 10L,
8L, 3L, 96L, 6L, 59L, 66L, 22L, 75L, 4L,
41L, 92L, 12L, 60L, 35L, 38L, 9L, 54L, 1L),
y = c(62L, 25L, 88L, 23L, 17L, 43L, 85L, 6L, 11L,
72L, 82L, 24L, 76L, 83L, 43L, 27L, 72L, 50L,
18L, 7L, 56L, 16L, 94L, 13L, 57L, 2L, 33L, 10L,
32L, 67L, 5L, 75L, 26L, 1L, 22L, 48L, 22L, 69L,
50L, 21L, 81L, 97L, 34L, 64L, 84L, 100L, 2L, 9L, 59L, 58L),
node_demand = c(3L, 14L, 1L, 14L, 19L, 2L, 14L, 6L,
7L, 6L, 10L, 18L, 3L, 6L, 20L, 4L,
14L, 11L, 19L, 15L, 15L, 4L, 13L,
13L, 5L, 16L, 3L, 7L, 14L, 17L,
3L, 3L, 12L, 14L, 20L, 13L, 10L,
9L, 6L, 18L, 7L, 20L, 9L, 1L, 8L,
5L, 1L, 7L, 9L, 2L)),
.Names = c("node_number", "x", "y", "node_demand"),
class = "data.frame", row.names = c(NA, -50L))
To sample I use this code:
hubs <- nodes[sample(1:total_nodes, hubs_required, replace = FALSE),]
Which returns :
node_number x y node_demand
33 33 8 26 12
14 14 2 83 6
42 42 41 97 20
13 13 44 76 3
10 10 59 72 6
I would like to return all the rows that haven't been selected so that I can perform a series of calculations on them.
I thought that using something like data[-sample,] would work but I get the following error
Error in xj[i] : invalid subscript type 'list'.
Anybody know who could I get these values?
It would be easier to keep the list of indexes that selected. Somthing like
hubs <- nodes[keep <- sample(1:total_nodes, hubs_required, replace = FALSE),]
other_hubs <- nodes[-keep, ]
Otherwise, if your data has some sort of key/ID, you can do something like
other_hubs <- nodes[nodes%node_number %in% hubs$node_number, ]
or with dplyr, this can be an anti-join
nodes %>% anti_join(hubs, by="node_number")
This question already has answers here:
How does predict.lm() compute confidence interval and prediction interval?
(2 answers)
Closed 4 years ago.
With the following set up, why does one get the same standard deviations in both cases, namely: 1.396411?
Regression:
CopierDataRegression <- lm(V1~V2, data=CopierData1)
Intervals:
X6 <- data.frame(V2=6)
predict(CopierDataRegression, X6, se.fit=TRUE, interval="confidence", level=0.90)
predict(CopierDataRegression, X6, se.fit=TRUE, interval="prediction", level=0.90)
Both give the same result for se.fit.
One gets the correct standard deviations for the predictions with the following code:
z <- predict(CopierDataRegression, X6, se.fit=TRUE)
sqrt(z$se.fit^2 + z$residual.scale^2),
but I dont understand why one in this formula adds the residual standard deviation in the computation of the standard errors, could someone explain this?
Data:
CopierData1 <- structure(list(V1 = c(20L, 60L, 46L, 41L, 12L, 137L, 68L, 89L,
4L, 32L, 144L, 156L, 93L, 36L, 72L, 100L, 105L, 131L, 127L, 57L,
66L, 101L, 109L, 74L, 134L, 112L, 18L, 73L, 111L, 96L, 123L,
90L, 20L, 28L, 3L, 57L, 86L, 132L, 112L, 27L, 131L, 34L, 27L,
61L, 77L), V2 = c(2L, 4L, 3L, 2L, 1L, 10L, 5L, 5L, 1L, 2L, 9L,
10L, 6L, 3L, 4L, 8L, 7L, 8L, 10L, 4L, 5L, 7L, 7L, 5L, 9L, 7L,
2L, 5L, 7L, 6L, 8L, 5L, 2L, 2L, 1L, 4L, 5L, 9L, 7L, 1L, 9L, 2L,
2L, 4L, 5L)), .Names = c("V1", "V2"),
class = "data.frame", row.names = c(NA, -45L))
You have to account for error in the estimation due to sampling and from the noise term when you make a prediction. The confidence interval only accounts for the former. See the answer here.
Further, they do not give the same result for the bounds:
> predict(CopierDataRegression, X6,
+ se.fit=TRUE, interval="confidence", level=0.90)$fit
fit lwr upr
1 89.63133 87.28387 91.9788
> predict(CopierDataRegression, X6,
+ se.fit=TRUE, interval="prediction", level=0.90)$fit
fit lwr upr
1 89.63133 74.46433 104.7983
The se.fit only gives you for the error of the predicted mean, not the sd of the error term as documented in ?predict.lm:
se.fit standard error of predicted means
residual.scale residual standard deviations
Suppose I have a sequence of numbers:
> dput(firstGrade_count)
c(4L, 346L, 319L, 105L, 74L, 5L, 124L, 2L, 10L, 35L, 6L, 206L,
7L, 8L, 6L, 9L, 26L, 1L, 35L, 18L, 4L, 4L, 2L, 63L, 6L, 23L,
6L, 82L, 10L, 17L, 45L, 74L, 10L, 8L, 14L, 23L, 26L, 53L, 55L,
16L, 2L, 141L, 113L, 98L, 179L, 13L, 34L, 16L, 8L, 144L, 2L,
141L, 26L, 9L, 125L, 201L, 32L, 452L, 179L, 30L, 4L, 141L, 5L,
40L, 7L, 255L, 120L, 223L, 28L, 252L, 21L, 8L, 362L, 4L, 5L,
2L, 285L, 18L, 76L, 5L, 73L, 11L, 367L, 7L, 50L, 6L, 37L, 15L,
48L, 5L, 12L, 7L, 96L)
I want to plot it using ggplot2 so the result would be something similar to:
barplot(firstGrade_count)
How would I define the aesthetics in ggplot2?
Here is the plot produced by base plot that I mentioned above:
firstGrade_count <- c(4,346,319,105,74,5,124,2,10,35,6,206,7,8,6,9,26,1,35,18,4,4,2,63,6,23,6,
82,10,17,45,74,10,8,14,23,26,53,55,16,2,141,113,98,179,13,34,16,8,144,2,141,26,9,
125,201,32,452,179,30,4,141,5,40,7,255,120,223,28,252,21,8,362,4,5,2,285,18,76,5,73,
11,367,7,50,6,37,15,48,5,12,7,96)
You can optionally make this into a data frame to make it easier to work with and to hold any additional features, though it's not required.
library(ggplot2)
# Optional transformation to data.frame:
firstGrade_count <- as.data.frame(firstGrade_count)
# Index for x-coordinates
firstGrade_count$index <- seq(1:nrow(firstGrade_count))
# Plotting
c <- ggplot(firstGrade_count, aes(index,firstGrade_count))
c + geom_bar(stat = "identity")
Or,
firstGrade_count <- as.data.frame(firstGrade_count)
c <- ggplot(firstGrade_count, aes(factor(firstGrade_count)))
c + geom_bar()
The way I set it up shows you the count of each unique value. There are many variations and additional formats you could add:
If you don't want the count you can add the stat = option and change it from the default to something else.
Without creating any dataframe:
ggplot() +
geom_bar(aes(1:length(firstGrade_count),firstGrade_count), stat='identity') +
xlab('')
############ uncoded data
x10<- structure(c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 5L, 8L, 9L, 31L, 1L,
0L, 0L, 0L, 1L, 0L, 1L, 2L, 7L, 2L, 10L, 0L, 2L, 0L, 2L, 2L,
5L, 2L, 4L, 6L, 8L, 4L, 1L, 1L, 3L, 2L, 2L, 6L, 1L, 12L, 18L,
7L, 29L, 8L, 4L, 6L, 8L, 6L, 19L, 3L, 9L, 12L, 3L, 12L, 14L,
1L, 2L, 1L, 3L, 1L, 0L, 4L, 6L, 3L, 11L, 0L, 0L, 0L, 1L, 3L,
7L, 5L, 8L, 21L, 26L, 51L, 0L, 1L, 0L, 3L, 5L, 10L, 9L, 29L,
55L, 60L, 125L, 3L, 0L, 1L, 1L, 3L, 10L, 1L, 6L, 18L, 17L, 13L,
6L, 3L, 4L, 13L, 6L, 33L, 17L, 48L, 84L, 54L, 103L, 34L, 11L,
20L, 27L, 26L, 50L, 29L, 30L, 54L, 28L, 34L, 31L, 5L, 7L, 3L,
4L, 20L, 8L, 16L, 16L, 8L, 41L, 1L, 0L, 0L, 3L, 1L, 3L, 3L, 11L,
19L, 16L, 56L, 0L, 0L, 0L, 0L, 3L, 11L, 3L, 18L, 25L, 21L, 62L,
3L, 0L, 1L, 4L, 2L, 7L, 8L, 15L, 22L, 12L, 19L, 5L, 2L, 8L, 9L,
9L, 42L, 18L, 51L, 70L, 45L, 103L, 29L, 15L, 23L, 34L, 25L, 57L,
23L, 38L, 55L, 30L, 33L, 36L, 5L, 5L, 6L, 6L, 16L, 6L, 10L, 17L,
9L, 35L, 2L, 0L, 1L, 1L, 2L, 4L, 6L, 8L, 22L, 33L, 73L, 0L, 0L,
0L, 1L, 2L, 7L, 7L, 15L, 27L, 21L, 56L, 1L, 2L, 2L, 0L, 2L, 9L,
4L, 8L, 24L, 13L, 17L, 14L, 2L, 8L, 10L, 16L, 51L, 16L, 51L,
69L, 29L, 99L, 44L, 18L, 25L, 34L, 19L, 49L, 26L, 43L, 63L, 15L,
30L, 42L, 9L, 17L, 7L, 3L, 16L, 8L, 13L, 22L, 18L, 45L, 0L, 0L,
1L, 3L, 0L, 7L, 4L, 14L, 15L, 20L, 47L, 0L, 1L, 0L, 1L, 1L, 3L,
3L, 5L, 6L, 11L, 21L, 1L, 0L, 0L, 4L, 2L, 3L, 8L, 7L, 17L, 3L,
13L, 5L, 2L, 6L, 13L, 15L, 34L, 19L, 42L, 62L, 37L, 83L, 52L,
16L, 26L, 26L, 29L, 53L, 28L, 45L, 45L, 15L, 22L, 26L, 8L, 12L,
11L, 5L, 12L, 5L, 7L, 17L, 10L, 28L), .Dim = c(11L, 6L, 5L), .Dimnames = structure(list(
c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
c("I've changed for work/ a new job/ gone on a work plan",
"I want a phone that doesn't offer", "I want Best Mates/ Favourites",
"I was offered or saw a better offer on another network",
"Issues with the network (poor coverage)", "Other"
), YearQuarter = c("2011-09-01", "2011-12-01", "2012-03-01",
"2012-06-01", "2012-09-01")), .Names = c("", "", "YearQuarter"
)), class = "table")
############ recoded data
x10 <- structure(c(40L, 3L, 13L, 12L, 3L, 9L, 12L, 13L, 10L, 36L, 16L,
30L, 15L, 54L, 21L, 14L, 22L, 10L, 77L, 16L, 29L, 185L, 28L,
84L, 30L, 19L, 24L, 157L, 82L, 132L, 62L, 197L, 84L, 49L, 78L,
32L, 72L, 11L, 30L, 83L, 17L, 43L, 31L, 25L, 37L, 148L, 93L,
121L, 63L, 206L, 93L, 44L, 80L, 27L, 106L, 16L, 30L, 77L, 17L,
42L, 30L, 20L, 32L, 128L, 117L, 120L, 45L, 215L, 106L, 63L, 102L,
35L, 67L, 15L, 29L, 32L, 9L, 11L, 16L, 18L, 24L, 120L, 94L, 104L,
37L, 230L, 90L, 38L, 79L, 24L), .Dim = c(3L, 6L, 5L), .Dimnames = structure(list(
c("Promoters", "Detractors", "Passive"), c("I've changed for work/ a new job/ gone on a work plan",
"I want a phone that doesn't offer", "I want Best Mates/ Favourites",
"I was offered or saw a better offer on another network",
"Issues with the network (poor coverage)", "Other"
), YearQuarter = c("2011-09-01", "2011-12-01", "2012-03-01",
"2012-06-01", "2012-09-01")), .Names = c("", "", "YearQuarter"
)), class = "table")
x10.p <- round(prop.table(x10,c(3,2)),2)*100
Hi there
The Net Promotion Score is a question which asks the consumers to rate the 'the likelihood to recommend the product or the service' on a zero to ten scale. People reported with 10 and 9 are called 'promoters', people rated 8 and 7 are seen as 'Passive', and people reported less than 6 are considered as detractors. The Net Promotion score is the difference between the percentage of 'Promoters' minus the the percentage of 'Detractors'.
I summerised and recoded the answers from the question into a table x10 from Sep 2011 to Sep 2012. The numbers are actual people counts for each group (Promoter,Detractor and Passive). Apologies for the three dimensioanl table, I am interested in the Net Promoter Score for each reason( i.e what's the percentage difference among the promoters and detractors for "I've changed for work/ a new job/ gone on a work plan" in Sep 2012.
The Net Promotion Score before I can plot it which requires a bit manipulation. I wonder if anyone knows to how do it?
Cheers
First, don't round until you've done all your calculations (otherwise you will have percentages not adding to 1)
x10.p <- prop.table(x10,c(3,2))*100
# get the total promoters
promoters <- apply(x10.p, 2:3, function(x) sum(tail(x,2)))
# and detractors
detractors <- apply(x10.p, 2:3, function(x) sum(head(x,7)))
# passive is everything else
passive <- passive <- 100 - (detractors +promoters)
# the net score
net <- promoters - detractors
net
YearQuarter
2011-09-01 2011-12-01 2012-03-01 2012-06-01 2012-09-01
I've changed for work/ a new job/ gone on a work plan 66.071429 50.00000 53.982301 59.210526 46.846847
I want a phone that doesn't offer 37.500000 52.86195 46.153846 44.117647 44.230769
I want Best Mates/ Favourites -2.857143 15.06849 6.451613 12.195122 -3.448276
I was offered or saw a better offer on another network 24.390244 20.21563 15.193370 3.013699 8.176101
Issues with the network (poor coverage) -43.333333 -39.35860 -39.502762 -46.448087 -54.061625
Other -17.391304 -18.23899 -23.841060 -19.500000 -29.078014
You want september 2012, select just that column, with drop = FALSE to ensure it is still a matrix with 1 column.
net[,'2012-09-01', drop = FALSE]
YearQuarter
2012-09-01
I've changed for work/ a new job/ gone on a work plan 46.846847
I want a phone that doesn't offer 44.230769
I want Best Mates/ Favourites -3.448276
I was offered or saw a better offer on another network 8.176101
Issues with the network (poor coverage) -54.061625
Other -29.078014