How to increase performance when randomly selecting clusters and adding observations?

In a clustered dataset, I want to randomly pick some clusters and then add some simulated observations to the selected clusters. Then I want to create a dataset that combines the simulated and original observations from the selected clusters with all the original observations from the unselected clusters. I would also like to repeat this process many times and thus create many (maybe 1000) new datasets. I managed to do this using a for loop, but I would like to know whether there is a more efficient and concise way to accomplish this. Here is an example dataset:
## simulate some data
y <- rnorm(20)
x <- rnorm(20)
z <- rep(1:5, 4)
w <- rep(1:4, each=5)
dd <- data.frame(id=z, cluster=w, x=x, y=y)
# id cluster x y
# 1 1 1 0.30003855 0.65325768
# 2 2 1 -1.00563626 -0.12270866
# 3 3 1 0.01925927 -0.41367651
# 4 4 1 -1.07742065 -2.64314895
# 5 5 1 0.71270333 -0.09294102
# 6 1 2 1.08477509 0.43028470
# 7 2 2 -2.22498770 0.53539884
# 8 3 2 1.23569346 -0.55527835
# 9 4 2 -1.24104450 1.77950291
# 10 5 2 0.45476927 0.28642442
# 11 1 3 0.65990264 0.12631586
# 12 2 3 -0.19988983 1.27226678
# 13 3 3 -0.64511396 -0.71846622
# 14 4 3 0.16532102 -0.45033862
# 15 5 3 0.43881870 2.39745248
# 16 1 4 0.88330282 0.01112919
# 17 2 4 -2.05233698 1.63356842
# 18 3 4 -1.63637927 -1.43850664
# 19 4 4 1.43040234 -0.19051680
# 20 5 4 1.04662885 0.37842390
cl <- split(dd, dd$cluster) ## split the data based on clusters
k <- length(dd$id)
l <- length(cl)
`%notin%` <- Negate(`%in%`) ## define "not in" to exclude unselected clusters so
## as to retain their original observations
The clsamp function in the following code is then created with two for loops: the first for loop collects the original observations from the unselected clusters, and the second simulates new observations and appends them to the selected clusters. Note that I randomly sample 2 clusters (10% of the total number of observations), without replacement.
clsamp <- function(cl, k) {
  a <- sample(cl, size=0.1*k, replace=FALSE)
  jud <- (names(cl) %notin% names(a))
  need <- names(cl)[jud]
  T3 <- NULL
  for (k in need) {
    T3 <- rbind(T3, cl[[k]])
  }
  subt <- NULL
  s <- a
  for (j in 1:2) {
    y <- rnorm(2)
    x <- rnorm(2)
    d <- cbind(id=nrow(a[[j]]) + c(1:length(x)),
               cluster=unique(a[[j]]$cluster), x, y)
    s[[j]] <- rbind(a[[j]], d)
    subt <- rbind(subt, s[[j]])
  }
  T <- rbind(T3, subt)
  return(T)
}
Finally, this creates a list of 5 datasets, each of which combines the simulated and original observations from the selected clusters with all the original observations from the unselected clusters:
Q <- vector(mode="list", length=5)
for (i in 1:length(Q)) {
  Q[[i]] <- clsamp(cl, 20)
}
Does anyone know a shorter way to do this? Maybe using the replicate function? Thanks.

This generates a (2*size) x 2 matrix of random values and cbinds sampled cluster names and consecutive ids to it. It starts directly from dd and also works when you convert dd to a matrix mm, which might be slightly faster; the output is a data frame either way. Instead of your k, I use f to directly calculate the number of rows that should be added to the two selected clusters. In case size comes out as zero, the original data frame is returned.
clsamp2 <- function(m, f=.1) {
  size <- round(nrow(m)*f)
  if (size == 0) as.data.frame(m)
  else {
    ids <- unique(m[,1])
    cls <- unique(m[,2])
    rd <- matrix(rnorm(size * 4), ncol=2, dimnames=list(NULL, c("x", "y")))
    out <- rbind.data.frame(m, cbind(id=rep(max(ids) + 1:size, each=2),
                                     cluster=sample(cls, 2), rd))
    `rownames<-`(out[order(out$cluster, out$id), ], NULL)
  }
}
Result
set.seed(42) ## same seed also used for creating `dd`
clsamp2(dd, .1)
## or
mm <- as.matrix(dd)
clsamp2(mm, .1)
# id cluster x y
# 1 1 1 -0.30663859 1.37095845
# 2 2 1 -1.78130843 -0.56469817
# 3 3 1 -0.17191736 0.36312841
# 4 4 1 1.21467470 0.63286260
# 5 5 1 1.89519346 0.40426832
# 6 1 2 -0.43046913 -0.10612452
# 7 2 2 -0.25726938 1.51152200
# 8 3 2 -1.76316309 -0.09465904
# 9 4 2 0.46009735 2.01842371
# 10 5 2 -0.63999488 -0.06271410
# 11 6 2 1.37095845 0.40426832
# 12 7 2 0.36312841 1.51152200
# 13 1 3 0.45545012 1.30486965
# 14 2 3 0.70483734 2.28664539
# 15 3 3 1.03510352 -1.38886070
# 16 4 3 -0.60892638 -0.27878877
# 17 5 3 0.50495512 -0.13332134
# 18 1 4 -1.71700868 0.63595040
# 19 2 4 -0.78445901 -0.28425292
# 20 3 4 -0.85090759 -2.65645542
# 21 4 4 -2.41420765 -2.44046693
# 22 5 4 0.03612261 1.32011335
# 23 6 4 -0.56469817 -0.10612452
# 24 7 4 0.63286260 -0.09465904
To create the list of five samples, you may use replicate.
replicate(5, clsamp2(dd, .1), simplify=FALSE)
Running time is negligible.
system.time(replicate(1000, clsamp2(dd, .1), simplify=FALSE))
# user system elapsed
# 0.44 0.03 0.44


R: arguments imply differing number of rows

I generated a data frame (df) in R (see below). If I use the column "x2" instead of "x2a" to make the data frame, everything works well. However, as soon as I use "x2a" instead of "x2", I get an error because the entries of "x2a" contain differing numbers of values. Do you have an idea how I can change the code so that it works with column "x2a"?
Error message with "x2a":
Error in data.frame(Id = rep(df$Id), Noise = unlist(split_it), Start = rep(df$Start), :
arguments imply differing number of rows: 3, 16
Code to reproduce the data frame and error
x1 <- c("A", "B", "C")
x2 <- c("[1,3,5,6,7]","[5,7,8,9,10]","[3,4,5,8,9]")
x2a <- c("[1,3,5]","[5,7,8,9,10, 20, 30, 24]","[3,4,5,8,9]")
x3 <- c(8000, 74555, 623334)
x4 <- c(9000, 76000, 623500)
df <- data.frame(cbind(x1, x2a, x3, x4))
colnames(df) <- c("Id", "Noise", "Start", "End")
df$Start <- as.numeric(as.character(df$Start))
df$End <- as.numeric(as.character(df$End))
# remove square brackets
df$Noise <- gsub("\\[|\\]", "", df$Noise)
# split
split_it <- strsplit(df$Noise, split = ",")
df_2 <- data.frame(Id = rep(df$Id), Noise = unlist(split_it), Start = rep(df$Start), End = rep(df$End))
df_2 <- df_2[order(df_2$Id),]
rownames(df_2) <- NULL
base R
What I'm inferring you want is not something R can "intuit" for you: you want it to repeat the values in Id based on the number of elements found when strsplit did its work. (How should R know to look in one object and arbitrarily repeat another?)
Try using rep(., times=.) to specify how many times each element of Id (etc) should be repeated in order to stay "in step" with Noise.
# split
split_it <- strsplit(df$Noise, split = ",")
n <- lengths(split_it)
print(n)
# [1] 3 8 5
df_2 <- data.frame(Id = rep(df$Id, times=n),
                   Noise = unlist(split_it),
                   Start = rep(df$Start, times=n),
                   End = rep(df$End, times=n))
df_2 <- df_2[order(df_2$Id),]
rownames(df_2) <- NULL
df_2
# Id Noise Start End
# 1 A 1 8000 9000
# 2 A 3 8000 9000
# 3 A 5 8000 9000
# 4 B 5 74555 76000
# 5 B 7 74555 76000
# 6 B 8 74555 76000
# 7 B 9 74555 76000
# 8 B 10 74555 76000
# 9 B 20 74555 76000
# 10 B 30 74555 76000
# 11 B 24 74555 76000
# 12 C 3 623334 623500
# 13 C 4 623334 623500
# 14 C 5 623334 623500
# 15 C 8 623334 623500
# 16 C 9 623334 623500
dplyr
library(dplyr)
library(tidyr) # unnest() comes from tidyr, not dplyr
df %>%
  mutate(Noise = strsplit(Noise, split = ",")) %>%
  unnest(Noise) %>%
  mutate(Noise = as.integer(Noise)) # I'm inferring this is desired, not required
# # A tibble: 16 x 4
# Id Noise Start End
# <chr> <int> <dbl> <dbl>
# 1 A 1 8000 9000
# 2 A 3 8000 9000
# 3 A 5 8000 9000
# 4 B 5 74555 76000
# 5 B 7 74555 76000
# 6 B 8 74555 76000
# 7 B 9 74555 76000
# 8 B 10 74555 76000
# 9 B 20 74555 76000
# 10 B 30 74555 76000
# 11 B 24 74555 76000
# 12 C 3 623334 623500
# 13 C 4 623334 623500
# 14 C 5 623334 623500
# 15 C 8 623334 623500
# 16 C 9 623334 623500
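As an aside, tidyr's separate_rows() can do the split-and-unnest in one step; a minimal sketch on the same df (the convert=TRUE cast of Noise to integer is optional):
library(tidyr)
# split each comma-separated Noise string into its own row, repeating Id/Start/End
separate_rows(df, Noise, sep = ",", convert = TRUE)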

How to remove near zero variance without employing the caret package?

I am new to programming, but here is the piece of code from which I have tried to remove the nearZeroVar function of the caret package:
N <- 200 # number of points per class
D <- 2 # dimensionality
K <- 4 # number of classes
X <- data.frame() # data matrix (each row = single example)
y <- data.frame() # class labels
...(some lines are omitted)...
X <- as.matrix(X)
Y <- matrix(0, N * K, K)
for (i in 1:(N * K)) { Y[i, y[i,]] <- 1}
...(some lines are omitted)...
nzv <- nearZeroVar(train)
nzv.nolabel <- nzv-1
inTrain <- createDataPartition(y=train$label, p=0.7, list=F)
training <- train[inTrain, ]
CV <- train[-inTrain, ]
X <- as.matrix(training[, -1])
N <- nrow(X)
y <- training[, 1]
K <- length(unique(y))
X.proc <- X[, -nzv.nolabel]/max(X)
D <- ncol(X.proc)
Xcv <- as.matrix(CV[, -1])
ycv <- CV[, 1]
Xcv.proc <- Xcv[, -nzv.nolabel]/max(X)
Y <- matrix(0, N, K)
So, to get rid of the nearZeroVar function, I have tried to use the Filter function and the following foo function:
foo <- function(data) {
  out <- lapply(data, function(x) length(unique(x)))
  want <- which(!out > 1)
  unlist(want)
}
nzv <- foo(trainingSet)
nzv.nolabel <- nzv - 1
But I get error messages like "Error in X[, training.nolabel]: incorrect number of dimensions. Execution halted" or something like "non-conformable arrays". Any ideas on how to work around nearZeroVar are strongly appreciated. Please let me know if I should share some more details.
It's not evident from the code posted how Filter() was being used.
Try the following:
# create sample data
R> df <- data.frame(a=1:10, b=sample(10:19), c=rep(5,10))
R> df
a b c
1 1 16 5
2 2 17 5
3 3 18 5
4 4 13 5
5 5 15 5
6 6 14 5
7 7 11 5
8 8 12 5
9 9 19 5
10 10 10 5
Then create a custom function like this:
R> zeroVarianceCol <- function(df) {
     new_df <- Filter(var, df)
   }
Passing the data frame to this custom function, as in x <- zeroVarianceCol(df), removes the zero-variance column, in this case column c.
R> x
a b
1 1 16
2 2 17
3 3 18
4 4 13
5 5 15
6 6 14
7 7 11
8 8 12
9 9 19
10 10 10
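Note that Filter(var, df) only drops columns whose variance is exactly zero. If you also need to drop near-zero variance columns, one option is a small variance cutoff. This is only a rough sketch, not caret's actual freqRatio/percentUnique rule, and the tol value is an assumption you would have to tune:
nearZeroVarCols <- function(df, tol = 1e-8) {
  # keep only the columns whose variance exceeds the tolerance
  Filter(function(x) var(x) > tol, df)
}
df$d <- c(rep(5, 9), 5 + 1e-9)  # add a near-constant column for illustration
nearZeroVarCols(df)             # drops both c (zero variance) and d (near-zero)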

Create N random integers with no gaps

For a clustering algorithm that I'm implementing, I would like to initialize the cluster assignments at random. However, I need there to be no gaps. That is, this is not OK:
set.seed(2)
K <- 10 # initial number of clusters
N <- 20 # number of data points
z_init <- sample(K,N, replace=TRUE) # initial assignments
z_init
# [1] 2 8 6 2 10 10 2 9 5 6 6 3 8 2 5 9 10 3 5 1
sort(unique(z_init))
# [1] 1 2 3 5 6 8 9 10
where labels 4 and 7 have not been used.
Instead, I would like this vector to be:
# [1] 2 6 5 2 8 8 2 7 4 5 5 3 6 2 4 7 8 3 4 1
where the label 5 has become 4 and so forth to fill the lower empty labels.
More examples:
The vector 1 2 3 5 6 8 should be 1 2 3 4 5 6
The vector 15,5,7,7,10 should be 1 2 3 3 4
Can it be done avoiding for loops? I don't need it to be fast, I prefer it to be elegant and short, since I'm doing it only once in the code (for label initialization).
My solution using a for loop:
z_init <- c(3,2,1,3,3,7,9)
idx <- order(z_init)
for (i in 2:length(z_init)) {
  if (z_init[idx[i]] > z_init[idx[i-1]]) {
    z_init[idx[i]] <- z_init[idx[i-1]] + 1
  } else {
    z_init[idx[i]] <- z_init[idx[i-1]]
  }
}
z_init
# 3 2 1 3 3 4 5
Edit: @GregSnow came up with the current shortest answer. I'm 100% convinced that this is the shortest possible way.
For fun, I decided to golf the code, i.e. write it as short as possible:
z <- c(3, 8, 4, 4, 8, 2, 3, 9, 5, 1, 4)
# solution by hand: 1 2 3 3 4 4 4 5 6 6 7
sort(c(factor(z))) # 18 bytes, as proposed by @GregSnow in the comments
# [1] 1 2 3 3 4 4 4 5 6 6 7
Some other (functioning) attempts:
y=table(z);rep(seq(y),y) # 24 bytes
sort(unclass(factor(z))) # 24 bytes, based on @GregSnow's answer
diffinv(diff(sort(z))>0)+1 # 26 bytes
sort(as.numeric(factor(z))) # 27 bytes, @GregSnow's original answer
rep(seq(unique(z)),table(z)) # 28 bytes
cumsum(c(1,diff(sort(z))>0)) # 28 bytes
y=rle(sort(z))$l;rep(seq(y),y) # 30 bytes
Edit 2: Just to show that byte count isn't everything:
z <- sample(1:10,10000,replace=T)
Unit: microseconds
expr min lq mean median uq max neval
sort(c(factor(z))) 2550.128 2572.2340 2681.4950 2646.6460 2729.7425 3140.288 100
{ y = table(z); rep(seq(y), y) } 2436.438 2485.3885 2580.9861 2556.4440 2618.4215 3070.812 100
sort(unclass(factor(z))) 2535.127 2578.9450 2654.7463 2623.9470 2708.6230 3167.922 100
diffinv(diff(sort(z)) > 0) + 1 551.871 572.2000 628.6268 626.0845 666.3495 940.311 100
sort(as.numeric(factor(z))) 2603.814 2672.3050 2762.2030 2717.5050 2790.7320 3558.336 100
rep(seq(unique(z)), table(z)) 2541.049 2586.0505 2733.5200 2674.0815 2760.7305 5765.815 100
cumsum(c(1, diff(sort(z)) > 0)) 530.159 545.5545 602.1348 592.3325 632.0060 844.385 100
{ y = rle(sort(z))$l; rep(seq(y), y) } 661.218 684.3115 727.4502 724.1820 758.3280 857.412 100
z <- sample(1:100000,replace=T)
Unit: milliseconds
expr min lq mean median uq max neval
sort(c(factor(z))) 84.501189 87.227377 92.13182 89.733291 94.16700 150.08327 100
{ y = table(z); rep(seq(y), y) } 78.951701 82.102845 85.54975 83.935108 87.70365 106.05766 100
sort(unclass(factor(z))) 84.958711 87.273366 90.84612 89.317415 91.85155 121.99082 100
diffinv(diff(sort(z)) > 0) + 1 9.784041 9.963853 10.37807 10.090965 10.34381 17.26034 100
sort(as.numeric(factor(z))) 85.917969 88.660145 93.42664 91.542263 95.53720 118.44512 100
rep(seq(unique(z)), table(z)) 86.568528 88.300325 93.01369 90.577281 94.74137 118.03852 100
cumsum(c(1, diff(sort(z)) > 0)) 9.680615 9.834175 10.11518 9.963261 10.16735 14.40427 100
{ y = rle(sort(z))$l; rep(seq(y), y) } 12.842614 13.033085 14.73063 13.294019 13.66371 133.16243 100
It seems to me that you are trying to randomly assign elements of a set (the numbers 1 to 20) to clusters, subject to the requirement that each cluster be assigned at least one element.
One approach that I could think of would be to select a random reward r_ij for assigning element i to cluster j. Then I would define binary decision variables x_ij that indicate whether element i is assigned to cluster j. Finally, I would use mixed integer optimization to select the assignment from elements to clusters that maximizes the collected reward subject to the following conditions:
Every element is assigned to exactly one cluster
Every cluster has at least one element assigned to it
This is equivalent to randomly selecting an assignment, keeping it if all clusters have at least one element, and otherwise discarding it and trying again until you get a valid random assignment.
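For reference, that rejection-sampling equivalent takes only two lines; a sketch using the question's K and N, practical whenever K is small enough relative to N that a valid draw is likely:
z <- sample(K, N, replace = TRUE)
# resample until every label 1..K occurs at least once
while (length(unique(z)) < K) z <- sample(K, N, replace = TRUE)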
In terms of implementation, this is pretty easy to accomplish in R using the lpSolve package:
library(lpSolve)
N <- 20
K <- 10
set.seed(144)
r <- matrix(rnorm(N*K), N, K)
mod <- lp(direction = "max",
objective.in = as.vector(r),
const.mat = rbind(t(sapply(1:K, function(j) rep((1:K == j) * 1, each=N))),
t(sapply(1:N, function(i) rep((1:N == i) * 1, K)))),
const.dir = c(rep(">=", K), rep("=", N)),
const.rhs = rep(1, N+K),
all.bin = TRUE)
(assignments <- apply(matrix(mod$solution, nrow=N), 1, function(x) which(x > 0.999)))
# [1] 6 5 3 3 5 6 6 9 2 1 3 4 7 6 10 2 10 6 6 8
sort(unique(assignments))
# [1] 1 2 3 4 5 6 7 8 9 10
You could do it like this:
un <- sort(unique(z_init))
(z <- unname(setNames(1:length(un), un)[as.character(z_init)]))
# [1] 2 6 5 2 8 8 2 7 4 5 5 3 6 2 4 7 8 3 4 1
sort(unique(z))
# [1] 1 2 3 4 5 6 7 8
Here I replace elements of un in z_init with corresponding elements of 1:length(un).
A simple (but possibly inefficient) approach is to convert to a factor then back to numeric. Creating the factor will code the information as integers from 1 to the number of unique values, then add labels with the original values. Converting to numeric then drops the labels and leaves the numbers:
> x <- c(1,2,3,5,6,8)
> (x2 <- as.numeric(factor(x)))
[1] 1 2 3 4 5 6
>
> xx <- c(15,5,7,7,10)
> (xx2 <- as.numeric(factor(xx)))
[1] 4 1 2 2 3
> (xx3 <- as.numeric(factor(xx, levels=unique(xx))))
[1] 1 2 3 3 4
The levels = portion in the last example sets the numbers to match the order in which they appear in the original vector.
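A match()-based equivalent avoids the factor detour entirely; a brief sketch with the same inputs:
x <- c(1, 2, 3, 5, 6, 8)
match(x, sort(unique(x)))  # 1 2 3 4 5 6 (labels in sorted order)
xx <- c(15, 5, 7, 7, 10)
match(xx, unique(xx))      # 1 2 3 3 4 (labels in order of appearance)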

R data frame manipulation

Suppose I have a data frame that looks like this.
# start end motif
# 2 6 a
# 10 15 b
# 30 35 c
How would I create a data frame that fills in the remaining start and end locations like so up to a certain number Max_end:
Max_end <- 33
# start end motif
# 0 2 na # <- 0-2 are filled in because it is not in the original data frame
# 2 6 a # <- 2-6 are in the original
# 6 10 na # <- 6-10 is not
# 10 15 b # <- 10-15 is
# 15 30 na # and so on
# 30 33 c
And further, how would I calculate the length of each interval (end minus start) and collect the lengths in a new data frame:
# Length motif
# 2 na
# 4 a
# 4 na
# 5 b
# 15 na
# 3 c
Currently this is how I am doing it; it is very inefficient:
library(data.table)
library(stringi)
f <- fread('ABC.txt',header=F,skip=1)$V1
f <- paste(f, collapse = "")
motifs = c('GATC', 'CTGCAG', 'ACCACC', 'CC(A|T)GG', 'CCAC.{8}TGA(C|T)')
v <- na.omit(data.frame(do.call(rbind, lapply(stri_locate_all_regex(f, motifs), unlist))))
v <- v[order(v[,1]),]
v2difference <- "blah"
for (i in 2:nrow(v)) {
  if (v[i,1] > v[i-1,2]+2) { v2difference[i] <- v[i,1]-v[i-1,2]-2 }
}
v2difference[1] <- v[1,1]
v2 <- data.frame(Order=seq(1, 2*nrow(v), 2),Lengths=matrix(v2difference, ncol = 1),Motifs="na")
v1 <- data.frame(Order=seq(2, 2*nrow(v), 2),Lengths=(v$end-v$start+1),Motifs=na.omit(unlist(stri_extract_all_regex(f,motifs))))
V <- data.frame(Track=1,rbind(v1,v2))
V <- V[order(V$Order),]
B <- V[,!(names(V) %in% "Order")]
## data frame from the question
dat <- data.frame(start=c(2, 10, 30), end=c(6, 15, 35), motif=c("a", "b", "c"))
Max_end <- 33
breaks <- c(0, t(as.matrix(dat[,1:2])), Max_end) # get endpoints
breaks <- breaks[breaks <= Max_end]
merge(dat, data.frame(start=breaks[-length(breaks)], end=breaks[-1]), all=TRUE)
# start end motif
# 1 0 2 <NA>
# 2 2 6 a
# 3 6 10 <NA>
# 4 10 15 b
# 5 15 30 <NA>
# 6 30 33 <NA>
# 7 30 35 c
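The second table from the question (interval lengths) can then be derived from the merged result; a sketch, assuming the merge output is stored in res (the duplicated final interval would still need the handling discussed in the note below):
res <- merge(dat, data.frame(start=breaks[-length(breaks)], end=breaks[-1]), all=TRUE)
data.frame(Length = res$end - res$start, motif = res$motif)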
To specify a start and endpoint, you could do
Max_end <- 33
Max_start <- 10
breaks <- unique(c(Max_start, t(as.matrix(dat[,1:2])), Max_end))
breaks <- breaks[breaks <= Max_end & breaks >= Max_start]
merge(dat, data.frame(start=breaks[-length(breaks)], end=breaks[-1]), all.y=TRUE)
# start end motif
# 1 10 15 b
# 2 15 30 <NA>
# 3 30 33 <NA>
Note: this doesn't include "c" in the shortened final interval; you would need to decide whether that value gets included when the interval changes.
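If you do want "c" attached to the truncated final interval, one option is to clip the motif intervals to the window before building the breaks; a sketch, where treating a clipped motif as still present is an assumption:
dat2 <- dat[dat$end > Max_start & dat$start < Max_end, ]  # keep motifs overlapping the window
dat2$start <- pmax(dat2$start, Max_start)                 # clip starts to the window
dat2$end <- pmin(dat2$end, Max_end)                       # clip ends to the window
breaks <- unique(c(Max_start, t(as.matrix(dat2[,1:2])), Max_end))
merge(dat2, data.frame(start=breaks[-length(breaks)], end=breaks[-1]), all.y=TRUE)
# the final row is now 30, 33, c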

How to select/find coordinates within a distance from a list (X/Y) using R

I have a data frame with a list of X/Y locations (>2000 rows). What I want is to select or find all the rows/locations that lie within a maximum distance of each other. For example, from the data frame, select all the locations that are between 1 and 100 km from each other. Any suggestions on how to do this?
You need to somehow determine the distance between each pair of rows.
The simplest way is with a corresponding distance matrix
# assuming thresh is your threshold
thresh <- 10
# create some sample data
library(data.table)
set.seed(123)
DT <- data.table(X=sample(-10:10, 5, TRUE), Y=sample(-10:10, 5, TRUE))
# create the distance matrix
distTable <- matrix(apply(createTable(DT), 1, distance), nrow=nrow(DT))
# remove the lower triangle since we have symmetry (we don't want duplicates)
distTable[lower.tri(distTable)] <- NA
# show which rows are above the threshold
pairedRows <- which(distTable >= thresh, arr.ind=TRUE)
colnames(pairedRows) <- c("RowA", "RowB") # clean up the names
Starting with:
> DT
X Y
1: -4 -10
2: 6 1
3: -2 8
4: 8 1
5: 9 -1
We get:
> pairedRows
RowA RowB
[1,] 1 2
[2,] 1 3
[3,] 2 3
[4,] 1 4
[5,] 3 4
[6,] 1 5
[7,] 3 5
These are the two functions used for creating the distance matrix
# pair-up all of the rows
createTable <- function(DT)
  expand.grid(apply(DT, 1, list), apply(DT, 1, list))
# simple cartesian/pythagorean distance
distance <- function(CoordPair)
  sqrt(sum((CoordPair[[2]][[1]] - CoordPair[[1]][[1]])^2, na.rm=FALSE))
I'm not entirely clear from your question, but assuming you mean you want to take each row of coordinates and find all the other rows whose coordinates fall within a certain distance:
# Create data set for example
set.seed(42)
x <- sample(-100:100, 10)
set.seed(456)
y <- sample(-100:100, 10)
coords <- data.frame(
  "x" = x,
  "y" = y)
# Loop through all rows
lapply(1:nrow(coords), function(i) {
  dis <- sqrt(
    (coords[i, "x"] - coords[, "x"])^2 + # insert your preferred
    (coords[i, "y"] - coords[, "y"])^2   # distance calculation here
  )
  names(dis) <- 1:nrow(coords) # replace this part with an index or
                               # row names if you have them
  dis[dis > 0 & dis <= 100]    # change numbers to preferred threshold
})
[[1]]
2 6 7 9 10
25.31798 95.01579 40.01250 30.87070 73.75636
[[2]]
1 6 7 9 10
25.317978 89.022469 51.107729 9.486833 60.539243
[[3]]
5 6 8
70.71068 91.78780 94.86833
[[4]]
5 10
40.16217 99.32774
[[5]]
3 4 6 10
70.71068 40.16217 93.40771 82.49242
[[6]]
1 2 3 5 7 8 9 10
95.01579 89.02247 91.78780 93.40771 64.53681 75.66373 97.08244 34.92850
[[7]]
1 2 6 9 10
40.01250 51.10773 64.53681 60.41523 57.55867
[[8]]
3 6
94.86833 75.66373
[[9]]
1 2 6 7 10
30.870698 9.486833 97.082439 60.415230 67.119297
[[10]]
1 2 4 5 6 7 9
73.75636 60.53924 99.32774 82.49242 34.92850 57.55867 67.11930
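As an aside, base R's dist() computes all pairwise Euclidean distances in one call, which can replace the manual loop for moderately sized data; a sketch on the same coords:
d <- as.matrix(dist(coords))  # full symmetric distance matrix, zero diagonal
# for each row, the indices of the other rows within the threshold
apply(d, 1, function(r) which(r > 0 & r <= 100))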
