How to get "who's different" in a vector, with R

Simple question. Consider this vector:
[1] 378 380 380 380 380 360 187 380
How could we determine which numbers differ from the others in that list? In this case it would be 378, 360 and 187. Any ideas? I'm aware that the solution might not be simple...
I'm learning R and working on a dataset for my research, so it's != homework.
Any help would be greatly appreciated !

Maybe another alternative:
x <- c(378, 380, 380, 380, 380, 360, 187, 380)
setdiff(unique(x), x[duplicated(x)])

Extracting unrepeated elements can be done with something like:
a<-c(378, 380, 380, 380, 380, 360, 187, 380)
b <- table(a)
names(b[b==1])
#[1] "187" "360" "378"

A different approach:
x <- c(378, 380, 380, 380, 380, 360, 187, 380)
y <- rle(sort(x)); y[[2]][y[[1]]==1]

You can find the most frequent entry with table() and which.max(), then index the original vector with a logical vector that marks the entries not equal to it: data[data != mostfrequent]. See ?table and ?which.max for help; please comment if you need more.
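A minimal sketch of that idea, using the vector from the question. The names counts, most_frequent and different are only illustrative, and the sketch assumes there is a single most frequent value (which.max() returns the first one if there is a tie).

```r
x <- c(378, 380, 380, 380, 380, 360, 187, 380)

# Frequency of each distinct value; the names of the table are the values
counts <- table(x)

# which.max() gives the position of the largest count; look up its name
most_frequent <- as.numeric(names(counts)[which.max(counts)])

# Keep every entry that is not the most frequent value (original order kept)
different <- x[x != most_frequent]
different
```

Unlike the table-based answers, this keeps the values in their original order and would keep repeats of a non-modal value.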
Your sample vector
x <- c(378, 380, 380, 380, 380, 360, 187, 380)
Find the frequency of each number in it with table. For convenience later on, we convert it to be a data frame.
counts <- as.data.frame(table(x), stringsAsFactors = FALSE)
which.max gives us the row index of the modal value (the most common one).
modal_index <- which.max(counts$Freq)
The other values can then be found by dropping that row.
as.numeric(counts[-modal_index, "x"])

Related

How to add a sequence of numbers around each value in a vector

I've got a vector of numbers:
vec <- c(50, 75, 100, 125, 150, 200, 250, 300, 350, 400, 450, 475, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000)
I'd like to add the 10 numbers above and 10 numbers below each value. E.g. for 50 you would add 40 through to 49 and 51 through to 60.
Any help much appreciated!
Using sapply :
c(sapply(vec, `+`, -10:10))
If the ranges overlap, you might want to wrap the output above in unique() to keep only the unique values.
We can use outer from base R
c(outer(-10:10, vec, `+`))
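The ranges in the question's vec do not overlap, so here is a small sketch with a hypothetical shorter vector (small_vec is not from the question) whose ranges do overlap, showing the unique() step mentioned above.

```r
# Hypothetical vector: the ranges around 50 and 60 overlap between 50 and 60
small_vec <- c(50, 60)

# Every value plus the offsets -10..10, flattened to a plain vector
with_neighbours <- c(outer(-10:10, small_vec, `+`))

# Drop the duplicated values and sort the result
result <- sort(unique(with_neighbours))
length(with_neighbours)  # 42 values before removing duplicates
length(result)           # 31 values after
```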

Implementing basic Tabu Search in R without libraries

As part of a data analysis course I need to implement a tabu search, but since I don't have a CS background I'm really struggling to come up with a solution.
I got the following data:
node <- c(381, 178, 366, 153, 240, 251, 397, 181, 144, 202, 332, 186,
262, 419, 282, 279, 272, 302, 216, 186, 394, 265, 323, 204, 274,
305, 230, 212, 224, 326, 205, 338, 199, 353, 272, 364, 154, 288,
368, 139, 436, 431, 229, 357, 212, 437, 234, 247, 360, 297)
I need to select the 5 smallest elements using tabu search. The description of the algorithm is taken from this book: https://cs.gmu.edu/~sean/book/metaheuristics/Essentials.pdf
l ← Desired maximum tabu list length
n ← number of tweaks desired to sample the gradient
S ← some initial candidate solution
Best ← S
L ← {} a tabu list of maximum length l (implemented as a first-in, first-out queue)
Enqueue S into L
repeat
    if Length(L) > l then
        Remove oldest element from L
    R ← Tweak(Copy(S))
    for n − 1 times do
        W ← Tweak(Copy(S))
        if W ∉ L and (Quality(W) > Quality(R) or R ∈ L) then
            R ← W
    if R ∉ L then
        S ← R
        Enqueue R into L
    if Quality(S) > Quality(Best) then
        Best ← S
until Best is the ideal solution or we have run out of time
return Best
I have some of the pieces, but I'm stuck on adding the elements that are not already in the tabu list when I get a new solution. I'm not sure how to put all the pieces together or whether my approach is "acceptable" (even if the algorithm is relatively simple).
max_length <- 20 # max tabu length
iterations <- 100
#set initial solution
solution <- c(sample(node,5,replace = FALSE))
best_solution <- solution
# we create the tabu list
tabu_list <- c()
tabu_list <- c(tabu_list,best_solution)
if(length(tabu_list) > max_length){
tabu_list <- tabu_list[-1:-5] # we eliminate the first 5 elements.
}
#create a new solution
new_node_list<- node[!(node %in% tabu_list)]
solution <- c(sample(new_node_list,5,replace = FALSE))
I can check whether an item from the solution exists in the tabu list, but I am not sure how to add the ones that don't exist to the tabu list.
which(tabu_list==solution)
## How can I add only those elements from solution that are not included in the tabu list?
Could anybody give me a hand with this one?
Many thanks in advance.
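One possible way to do that last step, as a sketch only. It assumes, as in the code above, that the tabu list stores individual node values rather than whole solutions, and it uses fixed illustrative values instead of sample() so the result is predictable. Note that tabu_list == solution compares the two vectors position by position (with recycling), so %in% is the safer membership test.

```r
max_length <- 20  # max tabu length, as in the question

# Illustrative values only; in the real code these come from sample(node, 5)
tabu_list <- c(381, 178, 366, 153, 240)
solution  <- c(153, 240, 251, 397, 181)

# Elements of the new solution that are not yet in the tabu list
new_elements <- solution[!(solution %in% tabu_list)]

# Enqueue them at the end of the list
tabu_list <- c(tabu_list, new_elements)

# First in, first out: keep only the newest max_length entries
if (length(tabu_list) > max_length) {
  tabu_list <- tail(tabu_list, max_length)
}
tabu_list
```

setdiff(solution, tabu_list) gives the same new elements when the solution has no repeated values; it also drops duplicates, which matters because node contains a few repeated numbers.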

Plot the ranges of values in R

I am interested in plotting the range of values of variables so that the names appear on the Y-axis and the range on the X-axis, for a better visualization.
I have used the following code:
primer_matrix1a <- matrix(
c(
"EF1", 65, 217,
"EF6", 165, 197,
"EF14", 96, 138,
"EF15", 103, 159,
"EF20", 86, 118,
"G9", 115, 173,
"G25", 112, 140,
"BE22", 131, 135,
"TT20", 180, 190
)
,nrow=9,ncol=3,byrow = T)
# Format data
Primer_name <- primer_matrix1a[,1]
Primer_name <- matrix(c(Primer_name),nrow = 9,byrow = T)
Primer_values<- matrix(c(as.numeric(primer_matrix1a[ ,2-3])),nrow = 9,ncol = 2,byrow = T)
Primer_Frame <- data.frame(Primer_name,Primer_values)
colnames(Primer_Frame) <- c("Primer","min","max")
Primer_Frame$mean<- mean(c(Primer_Frame$min,Primer_Frame$max))
ggplot(Primer_Frame, aes(x=Primer))+
geom_linerange(aes(ymin=min,ymax=max),linetype=2,color="blue")+
geom_point(aes(y=min),size=3,color="red")+
geom_point(aes(y=max),size=3,color="red")+
theme_bw()
But the plot looks wrong: EF15 goes from 103 to 159 and G9 goes from 115 to 173, yet in the plot they do not overlap, so I am doing something wrong.
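The values get muddled when you rebuild the numeric matrix: as.numeric() flattens the two columns column by column (all the minimums, then all the maximums), and matrix(..., byrow = T) then refills them row by row, so each primer ends up with the wrong pair. The approach is also more complex than it needs to be, so you might want to start afresh. It is probably easiest to convert the matrix to a data frame and format it there, rather than fiddling around with all the matrix functions: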
library(ggplot2)
df <- as.data.frame(primer_matrix1a)
names(df) <- c("Primer", "min", "max")
df$min <- as.numeric(as.character(df$min)) # works for factor or character columns
df$max <- as.numeric(as.character(df$max))
df$mean <- (df$min + df$max) / 2 # per-primer midpoint, not one overall mean
ggplot(df, aes(x=Primer))+
geom_linerange(aes(ymin=min,ymax=max),linetype=2,color="blue")+
geom_point(aes(y=min),size=3,color="red")+
geom_point(aes(y=max),size=3,color="red")+
theme_bw()
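The question asks for the primer names on the Y-axis and the ranges on the X-axis. A sketch of one way to get that, assuming ggplot2 is installed: build the data frame directly (the name primers is only illustrative) and flip the axes with coord_flip().

```r
library(ggplot2)

# Same values as in the question, typed directly as columns
primers <- data.frame(
  Primer = c("EF1", "EF6", "EF14", "EF15", "EF20", "G9", "G25", "BE22", "TT20"),
  min    = c(65, 165, 96, 103, 86, 115, 112, 131, 180),
  max    = c(217, 197, 138, 159, 118, 173, 140, 135, 190)
)

p <- ggplot(primers, aes(x = Primer)) +
  geom_linerange(aes(ymin = min, ymax = max), linetype = 2, color = "blue") +
  geom_point(aes(y = min), size = 3, color = "red") +
  geom_point(aes(y = max), size = 3, color = "red") +
  coord_flip() +  # names on the vertical axis, ranges on the horizontal axis
  theme_bw()
print(p)
```

Building the columns directly avoids the character matrix altogether, so no numeric conversion is needed.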

What is the meaning of "out" object for box plot in R?

Suppose this is my data set:
ID<- seq(1:50)
mou<-sample(c(2000, 2500, 440, 4990, 23000, 450, 3412, 4958,745,1000), 50, replace= TRUE)
calls<-sample(c(50, 51, 12, 60, 90, 888, 444, 668, 16, 89, 222,33, 243, 239, 333, 645,23, 50,555), 50, replace= TRUE)
rev<- sample(c(100, 345, 758, 44, 58, 334, 50000, 888, 205, 940,298, 754), 50, replace= TRUE)
dt<- data.frame(mou, calls, rev)
I did the box plot for calls and while analyzing it, I saw the following objects for the boxplot.
x<-boxplot(dt$calls)
names(x)
> names(x)
[1] "stats" "n" "conf" "out" "group" "names"
Looking at the output of x$stats, I figured out that the stats component gives me the lower whisker, the lower hinge, the median, the upper hinge and the upper whisker for each group. But I am a little confused about what the "out" component really means. Does it hold the outlier values, or something else?
The out object for my boxplot gives the following results:
> x$out
[1] 555 10000 555 555 555 555 555 10000
Yes, it holds the outliers. According to ?boxplot, out gives you "the values of any data points which lie beyond the extremes of the whiskers".
See ?boxplot.stats for more insight into how the whiskers are computed.
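A small sketch with a made-up vector (not the data from the question) that has one obvious extreme value. plot = FALSE is used so that only the returned list is computed, with the default whisker range of 1.5.

```r
# Made-up data with one extreme value
x <- c(1, 2, 3, 4, 5, 100)

b <- boxplot(x, plot = FALSE)

b$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out    # the points beyond the whiskers; here only 100
```

The hinges are 2 and 5, so anything above 5 + 1.5 * 3 = 9.5 lies beyond the upper whisker, and the whisker itself stops at 5, the largest value that is not an outlier.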

Represent interval between values in ggplot2 geom_line()

I need to plot a large amount of data, but most of them are equal to 0. My idea was, in order to save space and computation time, to not store values equal to 0.
Furthermore, I want to use geom_line() function of ggplot2 package in R, because with my data, this representation is the best one and has the aesthetics that I want.
My problem is: how can I plot a line at 0 between two values on my X axis? Do I have to generate the full data frame, or is there a trick to plot this?
Example:
X Y
117 1
158 14
179 4
187 1
190 1
194 2
197 1
200 4
203 3
208 1
211 1
212 5
218 1
992 15
1001 1
1035 1
1037 28
1046 1
1048 1
1064 14
1078 1
# To generate the DF
X <- c(117, 158, 179, 187, 190, 194, 197, 200, 203, 208, 211, 212, 218, 992, 1001, 1035, 1037, 1046, 1048, 1064, 1078)
Y <- c(1,14,4,1,1,2,1,4,3,1,1,5,1,15,1,1,28,1,1,14,1)
data <- data.frame(X,Y)
g <- ggplot(data = data, aes(x = data$X, y = data$Y))
g <- g + geom_line()
g
To give you an idea, what I am trying to do is convert this image:
to something like this:
http://www.hostingpics.net/viewer.php?id=407269stack2.png
To generate the second figure, I had to define two extra positions around each peak to get this shape.
I tried changing the scale to a continuous or a discrete scale, but the peaks did not look right. So, is there a way to tell ggplot2 that any position on the X axis lying between two values of X should be displayed at 0?
Thank you a lot, any kind of help will be highly appreciated.
Your problem is that the data frame has no rows for the X values between the observed points, so geom_line() joins each point directly to the next one. You can fix that by doing the following:
X <- c(117, 158, 179, 187, 190, 194, 197, 200, 203, 208, 211, 212, 218, 992, 1001, 1035, 1037, 1046, 1048, 1064, 1078)
Y <- c(1,14,4,1,1,2,1,4,3,1,1,5,1,15,1,1,28,1,1,14,1)
These are your original vectors.
Z <- data.frame(seq(min(X),max(X)))
Creates a data frame that has all of the X values.
colnames(Z)[1] <- "X"
Renames the first column as "X" to be able to merge it with your "data" dataframe.
data <- data.frame(X,Y)
data <- merge(Z, data, by = "X", all.x = TRUE)
Creates a new data frame with all of the interval X values.
data[is.na(data)] <- 0
Sets all Y values that are NA to 0.
g <- ggplot(data = data, aes(x = X, y = Y))
g <- g + geom_line()
g
Now plot it.
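If generating every intermediate X value is too costly, a lighter alternative (a different technique from the merge above, sketched here under the assumption that X holds integer positions) is to add a zero only immediately before and after each observed point. The names neighbours, zeros and padded are illustrative.

```r
library(ggplot2)

X <- c(117, 158, 179, 187, 190, 194, 197, 200, 203, 208, 211, 212, 218, 992, 1001, 1035, 1037, 1046, 1048, 1064, 1078)
Y <- c(1, 14, 4, 1, 1, 2, 1, 4, 3, 1, 1, 5, 1, 15, 1, 1, 28, 1, 1, 14, 1)
data <- data.frame(X, Y)

# Positions immediately left and right of every observed X
neighbours <- unique(c(X - 1, X + 1))

# Keep only the positions that have no observation of their own
zeros <- setdiff(neighbours, X)

# Add them with Y = 0 and restore the X order
padded <- rbind(data, data.frame(X = zeros, Y = 0))
padded <- padded[order(padded$X), ]

g <- ggplot(padded, aes(x = X, y = Y)) + geom_line()
print(g)
```

The line then drops to 0 right after each peak and stays there until just before the next one, while the data frame stays far smaller than the full sequence from min(X) to max(X).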
