As part of a data analysis course I need to implement a tabu search, but since I don't have a CS background I'm really struggling to come up with a solution.
I got the following data:
node <- c(381, 178, 366, 153, 240, 251, 397, 181, 144, 202, 332, 186,
262, 419, 282, 279, 272, 302, 216, 186, 394, 265, 323, 204, 274,
305, 230, 212, 224, 326, 205, 338, 199, 353, 272, 364, 154, 288,
368, 139, 436, 431, 229, 357, 212, 437, 234, 247, 360, 297)
I need to select the 5 smallest elements by performing a tabu search. The description of the algorithm is taken from this book: https://cs.gmu.edu/~sean/book/metaheuristics/Essentials.pdf
1: ℓ ← desired maximum tabu list length
2: n ← number of tweaks desired to sample the gradient
3: S ← some initial candidate solution
4: Best ← S
5: L ← {}, a tabu list of maximum length ℓ, implemented as a first-in, first-out queue
6: Enqueue S into L
7: repeat
8:     if Length(L) > ℓ then
9:         Remove oldest element from L
10:    R ← Tweak(Copy(S))
11:    for n − 1 times do
12:        W ← Tweak(Copy(S))
13:        if W ∉ L and (Quality(W) > Quality(R) or R ∈ L) then
14:            R ← W
15:    if R ∉ L then
16:        S ← R
17:        Enqueue R into L
18:    if Quality(S) > Quality(Best) then
19:        Best ← S
20: until Best is the ideal solution or we have run out of time
21: return Best
I have some of the pieces, but I'm stuck on adding the elements that aren't already in the tabu list when I get a new solution. I'm also not sure how to put all the pieces together, or whether my approach is "acceptable" (even though the algorithm is relatively simple).
max_length <- 20 # max tabu length
iterations <- 100
#set initial solution
solution <- c(sample(node,5,replace = FALSE))
best_solution <- solution
# we create the tabu list
tabu_list <- c()
tabu_list <- c(tabu_list,best_solution)
if(length(tabu_list) > max_length){
tabu_list <- tabu_list[-1:-5] # we eliminate the first 5 elements.
}
#create a new solution
new_node_list<- node[!(node %in% tabu_list)]
solution <- c(sample(new_node_list,5,replace = FALSE))
I can check whether an item from the solution exists in the tabu list, but I am not sure how to add the ones that don't exist to the tabu list.
which(tabu_list==solution)
## How can I add only those elements from solution that are not included in the tabu list?
Could anybody give me a hand with this one?
Many thanks in advance.
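For reference on that last subquestion: %in% gives a vectorized membership test, so the non-members can be selected and appended in one step. A small sketch with made-up values:

```r
tabu_list <- c(144, 153, 178)
solution  <- c(153, 181, 202, 240, 251)

new_entries <- solution[!(solution %in% tabu_list)]  # elements not yet tabu
tabu_list   <- c(tabu_list, new_entries)
tabu_list
# [1] 144 153 178 181 202 240 251
```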
Related
I am attempting an analysis that requires extracting some (2 or 3) consecutive values on which to perform further analysis later.
I have two vectors: a is the output from a machine recording consecutive cellular signals, and b is the same output shifted by 1. This setup is used to capture the variability between one signal and the next.
a <- c(150, 130, 135, 180, 182, 190, 188, 195, 170, 140, 120, 130, 180, 181)
b <- c(130, 135, 180, 182, 190, 188, 195, 170, 140, 120, 130, 180, 181, 130)
What I am trying to do is to identify the most homogeneous (stable) region (i.e. one value is similar to the following) in this set of data.
The idea I had was to perform a subtraction between a and b and consider the absolute value:
c <- abs(a-b)
which gives
c
[1] 20 5 45 2 8 2 7 25 30 20 10 50 1 51
Now, if I want the 3 closest consecutive points, I can clearly see that the sequence 2 8 2 is by far the one that I would consider, but I have no idea on how I can automatically extract these 3 values, especially from arrays of hundreds of data points.
Initial data:
a <- c(150, 130, 135, 180, 182, 190, 188, 195, 170, 140, 120, 130, 180, 181)
b <- c(130, 135, 180, 182, 190, 188, 195, 170, 140, 120, 130, 180, 181, 130)
Find absolute difference between two vectors:
res <- abs(a - b)
For each element in res, get its neighbors and calculate the sum of absolute differences:
# with res[(x-1):(x+1)] we extract x and its neighbors
# (the first window is truncated and the last contains NA, but which.min skips those)
resSimilarity <- sapply(seq_along(res), function(x) sum(res[(x-1):(x+1)]))
resPosition <- which.min(resSimilarity)
# [1] 5
To extract values from original vectors use:
a[(resPosition - 1):(resPosition + 1)]
# [1] 180 182 190
b[(resPosition - 1):(resPosition + 1)]
# [1] 182 190 188
Here is one more alternative:
a <- c(150, 130, 135, 180, 182, 190, 188, 195, 170, 140, 120, 130, 180, 181)
b <- c(130, 135, 180, 182, 190, 188, 195, 170, 140, 120, 130, 180, 181, 130)
res <- abs(a-b)
which.min(diff(c(0, cumsum(res)), lag=3))
# [1] 4
res[4:(4+2)]
# [1] 2 8 2
The above code uses cumsum to get the cumulative sums of the absolute differences. It then calls diff with lag=3 to get the difference between each element and the element 3 positions away. Finally, it takes the position where the increase in cumulative sum over 3 successive elements is smallest.
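A third option uses base R's embed(): the rows of embed(res, 3) are the length-3 sliding windows, so rowSums() gives every window sum directly, without manual indexing or edge cases.

```r
a <- c(150, 130, 135, 180, 182, 190, 188, 195, 170, 140, 120, 130, 180, 181)
b <- c(130, 135, 180, 182, 190, 188, 195, 170, 140, 120, 130, 180, 181, 130)
res <- abs(a - b)

win_sums <- rowSums(embed(res, 3))  # sum over each window of 3 consecutive diffs
start <- which.min(win_sums)        # index where the most stable window begins
res[start:(start + 2)]
# [1] 2 8 2
```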
I am interested in plotting the range of values of variables so that the names appear on the Y-axis and the range on the X-axis, for a better visualization.
I have used the following code:
primer_matrix1a <- matrix(
c(
"EF1", 65, 217,
"EF6", 165, 197,
"EF14", 96, 138,
"EF15", 103, 159,
"EF20", 86, 118,
"G9", 115, 173,
"G25", 112, 140,
"BE22", 131, 135,
"TT20", 180, 190
)
,nrow=9,ncol=3,byrow = T)
# Format data
Primer_name <- primer_matrix1a[,1]
Primer_name <- matrix(c(Primer_name),nrow = 9,byrow = T)
Primer_values<- matrix(c(as.numeric(primer_matrix1a[ ,2-3])),nrow = 9,ncol = 2,byrow = T)
Primer_Frame <- data.frame(Primer_name,Primer_values)
colnames(Primer_Frame) <- c("Primer","min","max")
Primer_Frame$mean<- mean(c(Primer_Frame$min,Primer_Frame$max))
ggplot(Primer_Frame, aes(x=Primer))+
geom_linerange(aes(ymin=min,ymax=max),linetype=2,color="blue")+
geom_point(aes(y=min),size=3,color="red")+
geom_point(aes(y=max),size=3,color="red")+
theme_bw()
but the plot is weird: EF15 goes from 103 to 159, while G9 goes from 115 to 173, yet they do not overlap in the plot, so I am doing something wrong.
It looks like something is getting muddled when you assemble the matrix, and the approach is already more complex than it needs to be, so you might want to start afresh. It is probably easiest to convert the matrix to a data frame and do the formatting there, rather than fiddling with all the matrix functions:
df <- as.data.frame(primer_matrix1a)
names(df)<- c("Primer","min","max")
df$min <- as.numeric(as.character(df$min)) # Converts factor to numeric
df$max <- as.numeric(as.character(df$max))
df$mean <- (df$min + df$max) / 2 # per-primer midpoint; mean(c(min, max)) would give one global value
ggplot(df, aes(x=Primer))+
geom_linerange(aes(ymin=min,ymax=max),linetype=2,color="blue")+
geom_point(aes(y=min),size=3,color="red")+
geom_point(aes(y=max),size=3,color="red")+
theme_bw()
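Since the stated goal was to have the primer names on the Y-axis and the range on the X-axis, coord_flip() can be added to the plot above. A sketch using three illustrative rows of the data:

```r
library(ggplot2)

df <- data.frame(Primer = c("EF1", "EF6", "EF14"),
                 min    = c(65, 165, 96),
                 max    = c(217, 197, 138))

p <- ggplot(df, aes(x = Primer)) +
  geom_linerange(aes(ymin = min, ymax = max), linetype = 2, color = "blue") +
  geom_point(aes(y = min), size = 3, color = "red") +
  geom_point(aes(y = max), size = 3, color = "red") +
  coord_flip() +   # swap the axes: primers on Y, range on X
  theme_bw()
p
```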
I have been given a dataset covering 118 days and am supposed to forecast the values for the next 28 days. I have tried the code below, but I get the same value for all 28 days. Can you help me find my mistake? Thank you.
library(forecast)
library(dplyr)
head(product)
ts_product = ts(product$Qty, start=1,frequency=1)
ts_product
plot(ts_product)
#predictions of 28 days
m_ets = ets(ts_product)
f_ets = forecast(m_ets, h=28)
plot(f_ets)
The data for Qty is given by:
Qty = c(53, 40, 37, 45, 69, 105, 62, 101, 104, 46, 92, 157, 133, 173,
139, 163, 145, 154, 245, 147, 85, 131, 228, 192, 240, 346, 267,
267, 243, 233, 233, 244, 241, 136, 309, 236, 310, 266, 280, 321,
349, 335, 410, 226, 391, 314, 250, 368, 282, 203, 250, 233, 233,
277, 338, 279, 279, 266, 253, 178, 238, 126, 279, 258, 350, 277,
226, 287, 180, 268, 191, 279, 214, 133, 292, 212, 307, 232, 165,
107, 121, 188, 198, 154, 128, 85, 106, 67, 63, 88, 107, 56, 41,
59, 27, 58, 80, 75, 93, 54, 14, 36, 107, 82, 83, 112, 37, 57,
9, 51, 47, 57, 68, 97, 25, 45, 69, 89)
This is the prediction I get.
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
119 69.53429 2.089823 136.9788 -33.61312 172.6817
120 69.53429 -2.569107 141.6377 -40.73834 179.8069
121 69.53429 -6.944751 146.0133 -47.43031 186.4989
122 69.53429 -11.083248 150.1518 -53.75959 192.8282
123 69.53429 -15.019428 154.0880 -59.77946 198.8480
124 69.53429 -18.780346 157.8489 -65.53129 204.5999
125 69.53429 -22.387517 161.4561 -71.04798 210.1166
126 69.53429 -25.858385 164.9270 -76.35622 215.4248
127 69.53429 -29.207323 168.2759 -81.47798 220.5466
128 69.53429 -32.446345 171.5149 -86.43163 225.5002
129 69.53429 -35.585612 174.6542 -91.23273 230.3013
130 69.53429 -38.633808 177.7024 -95.89454 234.9631
131 69.53429 -41.598429 180.6670 -100.42854 239.4971
132 69.53429 -44.485993 183.5546 -104.84468 243.9133
133 69.53429 -47.302214 186.3708 -109.15172 248.2203
134 69.53429 -50.052133 189.1207 -113.35736 252.4259
135 69.53429 -52.740222 191.8088 -117.46844 256.5370
136 69.53429 -55.370474 194.4391 -121.49106 260.5596
137 69.53429 -57.946468 197.0150 -125.43070 264.4993
138 69.53429 -60.471431 199.5400 -129.29230 268.3609
139 69.53429 -62.948280 202.0169 -133.08032 272.1489
140 69.53429 -65.379664 204.4482 -136.79880 275.8674
141 69.53429 -67.768000 206.8366 -140.45144 279.5200
142 69.53429 -70.115495 209.1841 -144.04163 283.1102
143 69.53429 -72.424177 211.4928 -147.57245 286.6410
144 69.53429 -74.695908 213.7645 -151.04676 290.1153
145 69.53429 -76.932409 216.0010 -154.46719 293.5358
146 69.53429 -79.135268 218.2038 -157.83618 296.9048
Also, do you think any model other than the ets we have used here would work for this problem?
Understanding ets()
The ets() function fits exponential smoothing state space models. By default, ets() attempts to automatically select a model for a time series via model = 'ZZZ', using the supplied frequency= parameter. This is particularly problematic because an incorrectly specified frequency= causes a non-ideal model to be generated with respect to seasonality, yielding the flat estimates you see.
Seasonalities
You may think that one should specify frequency=1 within a ts() object for daily data. However, that is not the right way to go about it. The correct way to choose frequency= is to understand R's definition:
The frequency is the number of observations per season.
Thus, we need to care about the seasonality of your data.
There are two guiding tables to consult.
The first is a macro view:
Data Frequency
Annual 1
Quarterly 4
Monthly 12
Weekly 52
The second is a micro view:
Data          Minute   Hour    Day      Week     Year
Daily                                   7        365.25
Hourly                         24       168      8766
Half-hourly                    48       336      17532
Minutes                60      1440     10080    525960
Seconds       60       3600    86400    604800   31557600
There are two seasonalities (e.g. frequency= options) to consider with daily data:
7 (weekly seasonality) and 365.25 (yearly seasonality)
For more information see: Seasonal periods
Revisiting the estimation
The reason ets() is not working appropriately is the seasonality used (frequency = 1). Changing it based on the above, we get:
# Changed the frequency to 7
ts_product = ts(product$Qty, start=1, frequency=7)
# Predictions of 28 days
m_ets <- ets(ts_product)
f_ets <- forecast(m_ets, h = 28)
plot(f_ets)
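As a package-free sanity check that frequency = 7 exposes a weekly pattern, stl() from base R's stats package can decompose the series once the frequency is set (with frequency = 1 it would refuse to run, since it needs a periodic series):

```r
Qty <- c(53, 40, 37, 45, 69, 105, 62, 101, 104, 46, 92, 157, 133, 173,
         139, 163, 145, 154, 245, 147, 85, 131, 228, 192, 240, 346, 267,
         267, 243, 233, 233, 244, 241, 136, 309, 236, 310, 266, 280, 321,
         349, 335, 410, 226, 391, 314, 250, 368, 282, 203, 250, 233, 233,
         277, 338, 279, 279, 266, 253, 178, 238, 126, 279, 258, 350, 277,
         226, 287, 180, 268, 191, 279, 214, 133, 292, 212, 307, 232, 165,
         107, 121, 188, 198, 154, 128, 85, 106, 67, 63, 88, 107, 56, 41,
         59, 27, 58, 80, 75, 93, 54, 14, 36, 107, 82, 83, 112, 37, 57,
         9, 51, 47, 57, 68, 97, 25, 45, 69, 89)

ts7 <- ts(Qty, frequency = 7)          # 7 observations per (weekly) season
dec <- stl(ts7, s.window = "periodic") # seasonal, trend, remainder components
frequency(ts7)
# [1] 7
```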
Alternative models
There are two other models worth looking into briefly: HoltWinters() and auto.arima(). Discussion of the former is available here: HoltWinters vs. ets
hw = HoltWinters(ts_product)
f_ets = predict(hw, n.ahead = 28, prediction.interval = T, level = 0.95)
plot(hw, f_ets)
The ARIMA generated by running auto.arima():
aa = auto.arima(ts_product)
f_ets = forecast(aa, h = 28)
plot(f_ets)
Misc data note
Briefly looking at your data under:
ts_product = ts(product$Qty, start=1, frequency=1)
plot(ts_product)
Note that there is a relatively large disturbance between times 18 and 85 that would make the series non-stationary. You may wish to first try differencing it out via diff() and then repeat the above.
In addition, you may want to try to obtain a full year's worth of data instead of only 118 days.
Take a look at ?arima. For example:
mar=arima(product$Qty,order = c(1,0,1))
f_ar=forecast(mar, h=28)
plot(f_ar)
Your data appears to have seasonality; try to use that information in the ets or arima models.
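One package-free baseline that exploits weekly seasonality directly is a seasonal-naive forecast, which simply repeats the last observed week over the horizon. This is only an illustrative baseline on a made-up short series, not a fitted model:

```r
qty <- c(53, 40, 37, 45, 69, 105, 62,      # week 1 (made-up daily values)
         101, 104, 46, 92, 157, 133, 173)  # week 2

h <- 28
last_week <- tail(qty, 7)                # most recent full week
fc <- rep(last_week, length.out = h)     # repeat it across the horizon
length(fc)
# [1] 28
```

Comparing a fitted model's accuracy against this baseline on a held-out window is a quick way to tell whether the model is learning anything beyond the weekly cycle.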
I need to plot a large amount of data, but most of the values are equal to 0. My idea, to save space and computation time, was to not store the values equal to 0.
Furthermore, I want to use the geom_line() function from the ggplot2 package in R, because with my data this representation is the best one and has the aesthetics I want.
My problem is: how, between two values on my X axis, can I plot the line at 0? Do I have to generate the corresponding data frame, or is there a trick to plot this directly?
Example:
X Y
117 1
158 14
179 4
187 1
190 1
194 2
197 1
200 4
203 3
208 1
211 1
212 5
218 1
992 15
1001 1
1035 1
1037 28
1046 1
1048 1
1064 14
1078 1
# To generate the DF
X <- c(117, 158, 179, 187, 190, 194, 197, 200, 203, 208, 211, 212, 218, 992, 1001, 1035, 1037, 1046, 1048, 1064, 1078)
Y <- c(1,14,4,1,1,2,1,4,3,1,1,5,1,15,1,1,28,1,1,14,1)
data <- data.frame(X,Y)
g <- ggplot(data = data, aes(x = data$X, y = data$Y))
g <- g + geom_line()
g
To give you an idea, what I am trying to do is to convert this image:
to something like this:
http://www.hostingpics.net/viewer.php?id=407269stack2.png
To generate the second figure, I had to define two positions around each peak in order to get this shape.
I tried changing the scale to continuous or discrete, but I did not get good peaks. So, is there a trick to tell ggplot2 that if a position on the X axis falls between two values of X, it should be displayed at 0?
Thank you a lot; any kind of help will be highly appreciated.
Your problem is that R doesn't see any of the intermediate values of X. You can fix that by doing the following:
X <- c(117, 158, 179, 187, 190, 194, 197, 200, 203, 208, 211, 212, 218, 992, 1001, 1035, 1037, 1046, 1048, 1064, 1078)
Y <- c(1,14,4,1,1,2,1,4,3,1,1,5,1,15,1,1,28,1,1,14,1)
Which is your original data frame.
Z <- data.frame(seq(min(X),max(X)))
Creates a data frame that has all of the X values.
colnames(Z)[1] <- "X"
Renames the first column as "X" to be able to merge it with your "data" dataframe.
data <- data.frame(X,Y)
data <- merge(Z, data, all.x = TRUE)
Creates a new data frame containing every intermediate X value; Y is NA where there was no observation.
data[is.na(data)] <- 0
Sets all Y values that are NA to 0.
g <- ggplot(data = data, aes(x = X, y = Y))
g <- g + geom_line()
g
Now plots it.
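If you'd rather avoid merge(), a base-R alternative (same idea, with illustrative variable names) pre-fills a zero vector over the full X range and drops the observed Y values in via match():

```r
X <- c(117, 158, 179, 187, 190, 194, 197, 200, 203, 208, 211, 212, 218,
       992, 1001, 1035, 1037, 1046, 1048, 1064, 1078)
Y <- c(1, 14, 4, 1, 1, 2, 1, 4, 3, 1, 1, 5, 1, 15, 1, 1, 28, 1, 1, 14, 1)

full_x <- seq(min(X), max(X))        # every X position in the range
full_y <- numeric(length(full_x))    # zeros everywhere...
full_y[match(X, full_x)] <- Y        # ...except where we observed data
data_full <- data.frame(X = full_x, Y = full_y)
```

data_full can then be fed straight to ggplot() + geom_line(), with no NA-replacement step needed.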
Simple question. Consider this vector:
[1] 378 380 380 380 380 360 187 380
How could we determine which numbers differ from the others in that list? In this case it would be 378, 360 and 187. Any ideas? I'm aware that the solution might not be simple...
I'm learning R and working on a dataset for my research, so it's != homework.
Any help would be greatly appreciated !
Maybe another alternative:
x <- c(378, 380, 380, 380, 380, 360, 187, 380)
setdiff(unique(x), x[duplicated(x)])
Extracting unrepeated elements can be done with something like:
a<-c(378, 380, 380, 380, 380, 360, 187, 380)
b <- table(a)
names(b[b==1])
#[1] "187" "360" "378"
A different approach:
x <- c(378, 380, 380, 380, 380, 360, 187, 380)
y <- rle(sort(x))
y$values[y$lengths == 1]
# [1] 187 360 378
You can find the most frequent entry using table() and which.max(), then index the original vector with a logical vector selecting the non-matching entries, like so: data[data != mostfrequent]. You can get help via ?table and ?which.max; please comment if you need more.
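Spelled out in code, that recipe looks like this (note it keeps the original vector's order and any repeats among the non-modal values):

```r
x <- c(378, 380, 380, 380, 380, 360, 187, 380)

counts <- table(x)
mostfrequent <- as.numeric(names(which.max(counts)))  # the modal value, 380
x[x != mostfrequent]
# [1] 378 360 187
```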
Your sample vector
x <- c(378, 380, 380, 380, 380, 360, 187, 380)
Find the frequency of each number in it with table. For convenience later on, we convert it to be a data frame.
counts <- as.data.frame(table(x), stringsAsFactors = FALSE)
which.max lets us locate the modal value (the most common one).
modal_value <- which.max(counts$Freq)
The other values can then be found via indexing.
as.numeric(counts[-modal_value, "x"])