How can I create unique random numbers in R?

I hope to generate random numbers between 1:100 and then test their divisibility by 3. I have created a loop.
v <- c(0)
for(i in 1:100){
r <- floor(runif(1, min=1, max=100))
if(r %% 3 == 0){
v <- append(v,r)
}
}
print(v)
However, the numbers keep repeating, as you can see in the following output. Is there any way to generate only unique multiples of 3 between 1 and 100? I am aware that the seq function can generate the same numbers, but I still want to know how to obtain unique random numbers.
Output:
[1] 0 18 87 30 45 90 12 72 75 60 27 84 90 27 42 54 63 15 63 30 72 69 57 30 3 6 15 30 3
[30] 60 72 6 6 18 75 96 84 78 24

sample(1:33)*3 is all the multiples of 3 in your range in a random order.
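A short sketch of why this avoids repeats: sample() draws without replacement by default, so each multiple shows up exactly once. The variable names and the seed below are chosen only for illustration.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# A random permutation of 1..33, scaled to the multiples of 3 in 1..100.
multiples <- sample(1:33) * 3

# To draw only a few unique multiples, sample from the candidates directly.
five <- sample(seq(3, 99, by = 3), 5)

anyDuplicated(multiples)  # 0 means there are no repeated values
```

If you do want to keep the loop from the question, wrapping the result in unique(v) removes the repeats afterwards, although it will usually not return every multiple.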

Related

Subsetting data at an irregular interval in an R function

I have a function like this
extract = function(x)
{
a = x$2007[6:18]
b = x$2007[30:42]
c = x$2007[54:66]
}
The subsetting needs to continue up to 744 in this way. I need to skip the first 6 data points, and then pull out every other 12 points into a new object or a list. Is there a more elegant way to do this with a for loop or apply?
Side note: if 2007 is truly a column name (you would have had to do this explicitly, since R by default converts such names into syntactically valid ones; see make.names("2007")), then x$"2007"[6:18] (etc.) should work for column reference.
To generate that sequence of integers, let's try
nr <- 100
ind <- seq(6, nr, by = 12)
ind
# [1] 6 18 30 42 54 66 78 90
ind[ seq_along(ind) %% 2 == 1 ]
# [1] 6 30 54 78
ind[ seq_along(ind) %% 2 == 0 ]
# [1] 18 42 66 90
Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ])
# [[1]]
# [1] 6 7 8 9 10 11 12 13 14 15 16 17 18
# [[2]]
# [1] 30 31 32 33 34 35 36 37 38 39 40 41 42
# [[3]]
# [1] 54 55 56 57 58 59 60 61 62 63 64 65 66
# [[4]]
# [1] 78 79 80 81 82 83 84 85 86 87 88 89 90
So you can use this in your function to create a list of subsets:
nr <- nrow(x)
ind <- seq(6, nr, by = 12)
out <- lapply(Map(seq, ind[ seq_along(ind) %% 2 == 1 ], ind[ seq_along(ind) %% 2 == 0 ]),
function(i) x$"2007"[i])
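As a self-contained sketch of the same idea: the data frame below is a made-up stand-in for x (744 rows, one column literally named 2007), and starts is a name introduced here for illustration. It computes the first index of each window directly and takes 13 values from there, which should give the same windows as the Map call above.

```r
# Hypothetical stand-in for the asker's data: 744 rows, one column named "2007".
x <- data.frame(`2007` = seq_len(744), check.names = FALSE)

# First index of each window: 6, 30, 54, ... (every 24 rows).
# Stop early enough that start + 12 never runs past the last row.
starts <- seq(6, nrow(x) - 12, by = 24)

# One vector of 13 values per window: rows 6:18, 30:42, 54:66, ...
out <- lapply(starts, function(s) x[["2007"]][s:(s + 12)])

out[[2]]  # rows 30 to 42 of the column
```

Using x[["2007"]] sidesteps the quoting problem with x$2007 altogether.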
We could also use
split( x[7:744] , cut(7:744,seq(7,744,12)) )

Calculate number of values in vector that exceed values in column of data.frame

I have a long list of numbers, e.g.
set.seed(123)
y<-round(runif(100, 0, 200))
And I would like to store in column y the number of values that exceed each value in column x of a data frame:
df <- data.frame(x=seq(0,200,20))
I can compute the numbers manually, like this:
length(which(y>=20)) #93 values exceed 20
length(which(y>=40)) #81 values exceed 40
etc. I know I can use a for-loop with all values of x, but is there a more elegant way?
I tried this:
df$y <- length(which(y>=df$x))
But this gives a warning and does not give me the desired output.
The data frame should look like this:
df
     x   y
1    0 100
2   20  93
3   40  81
4   60  70
5   80  61
6  100  47
7  120  40
8  140  29
9  160  19
10 180   8
11 200   0
You can compare each value of df$x against all values of y using sapply:
sapply(df$x, function(a) sum(y>a))
#[1] 99 93 81 70 61 47 40 29 18 6 0
#Looking at your output, maybe you want
sapply(df$x, function(a) sum(y>=a))
#[1] 100 93 81 70 61 47 40 29 19 8 0
Here's another approach using outer that allows for element wise comparison of two vectors
rowSums(outer(df$x,y, "<="))
#[1] 100 93 81 70 61 47 40 29 19 8 0
Yet one more (from alexis_laz's comment)
length(y) - findInterval(df$x, sort(y), left.open = TRUE)
# [1] 100 93 81 70 61 47 40 29 19 8 0
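To put the pieces together, here is a sketch that rebuilds the data from the question, runs the three approaches with >= and checks that they agree before filling the column. The via_* names are made up for this example. The original attempt, length(which(y >= df$x)), fails because y >= df$x recycles the 11 values of x along the 100 values of y and then returns a single count rather than one count per row.

```r
set.seed(123)
y <- round(runif(100, 0, 200))
df <- data.frame(x = seq(0, 200, 20))

# One count per value of x: how many elements of y are >= that value.
via_sapply   <- sapply(df$x, function(a) sum(y >= a))
via_outer    <- rowSums(outer(df$x, y, "<="))
via_interval <- length(y) - findInterval(df$x, sort(y), left.open = TRUE)

# All three should give the same counts.
stopifnot(all(via_sapply == via_outer), all(via_sapply == via_interval))

df$y <- via_sapply
```

For long vectors the findInterval version is likely the cheapest, since it sorts y once instead of comparing every pair.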

Transpose and rearrange rows in a matrix

I have several files with the following structure:
data <- matrix(c(1:100000), nrow=1000, ncol=100)
The first 500 rows are X coordinates and the final 500 rows are Y coordinates of several object contours. Row # 1 (X) and row 501 (Y) correspond to coordinates of the same object. I need to:
transpose the whole matrix and arrange it so now row 1 is column 1 and row 501 is column 2 and have paired x, y coordinates in contiguous columns. Row 2 and row 502 should be in column 1 and column 2 below the data of previous object.
ideally, have an extra column with filename info.
thanks.
Simpler version:
Transpose the matrix, then create a vector with the column indices and subset with them:
mat <- matrix(1:100, nrow = 10)
mat2 <- t(mat)
# Pair column i with column i + ncol/2 (after transposing, objects are columns)
cols <- unlist(lapply(1:(ncol(mat2)/2), function(i) c(i, i+ncol(mat2)/2)))
mat3 <- mat2[,cols]
Then just make it a dataframe as below.
You can subset pairs of rows separated by nrow/2, make them a 2-column matrix and then cbind them all:
df <- as.data.frame(do.call(cbind, lapply(1:(nrow(mat)/2), function(i) {
matrix(mat[c(i, nrow(mat)/2 + i),], ncol = 2, byrow = TRUE)
})))
Then just add the new column as necessary, since it's now a data frame:
df$fname <- sample(letters, nrow(df), TRUE)
df
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 fname
# 1 1 6 2 7 3 8 4 9 5 10 a
# 2 11 16 12 17 13 18 14 19 15 20 e
# 3 21 26 22 27 23 28 24 29 25 30 e
# 4 31 36 32 37 33 38 34 39 35 40 o
# 5 41 46 42 47 43 48 44 49 45 50 y
# 6 51 56 52 57 53 58 54 59 55 60 q
# 7 61 66 62 67 63 68 64 69 65 70 v
# 8 71 76 72 77 73 78 74 79 75 80 b
# 9 81 86 82 87 83 88 84 89 85 90 v
# 10 91 96 92 97 93 98 94 99 95 100 y
What about
n <- 500
df <- data.frame(col1 = data[1:n, ],
col2 = data[(nrow(data) - 500):nrow(data), ],
fileinfo = "this is the name of the file...")
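Try David's answer, but with the row range corrected. (nrow(data) - 500):nrow(data) selects 501 rows, one more than col1 has, so subtract n - 1 instead: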
n <- 500
df <- data.frame(col1 = data[1:n, ],
col2 = data[(nrow(data) - (n-1)):nrow(data), ],
fileinfo = "this is the name of the file...")
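If the goal is the stacked layout described in the question (each object's x, y pairs placed below those of the previous object), one possible sketch follows. The sizes are shrunk to 3 objects with 4 points each, and n_obj, n_pts, the object column and the file name are all made up for illustration.

```r
n_obj <- 3  # objects: rows 1..n_obj hold X, rows n_obj+1..2*n_obj hold Y
n_pts <- 4  # contour points per object (columns of the matrix)
data <- matrix(1:(2 * n_obj * n_pts), nrow = 2 * n_obj)

x_rows <- data[seq_len(n_obj), , drop = FALSE]
y_rows <- data[n_obj + seq_len(n_obj), , drop = FALSE]

# t() turns each object's row into a column; as.vector() then stacks the
# columns, so all points of object 1 come first, then object 2, and so on.
long <- data.frame(
  object = rep(seq_len(n_obj), each = n_pts),
  x      = as.vector(t(x_rows)),
  y      = as.vector(t(y_rows)),
  fname  = "example_file.txt"  # placeholder file name
)

head(long, n_pts)  # the x, y pairs of the first object
```

For the real files, n_obj would be 500 and n_pts would be 100.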

Split vector randomly into two sets

I have a vector t with length 100 and want to divide it into 30 and 70 values but the values should be chosen randomly and without replacement. So none of the 30 values are allowed to be in the sub vector of the 70 values and vice versa.
I know the R function sample, which I can use to randomly choose values from a vector with or without replacement. However, even when I use replace = FALSE, I have to run the sample function twice, once to choose 30 values and once to choose 70. That means that some of the 30 values might be in the 70 values and vice versa.
Any ideas?
How about this:
t <- 1:100 # or whatever your original set is
a <- sample(t, 70)
b <- setdiff(t, a)
Regarding my comment, what is wrong with:
vec <- 1:100
set.seed(2)
samp <- sample(length(vec), 30)
a <- vec[samp]
b <- vec[-samp]
?
To show these are separate sets with no duplicates:
R> intersect(a, b)
integer(0)
If you have duplicate values in your vector that is a different matter, but your question is unclear.
With duplicates in vec things are a bit more complicated and it depends what result you wanted to achieve.
R> set.seed(4)
R> vec <- sample(100, 100, replace = TRUE)
R> set.seed(6)
R> samp <- sample(100, 30)
R> a <- vec[samp]
R> b <- vec[-samp]
R> length(a)
[1] 30
R> length(b)
[1] 70
R> length(setdiff(vec, a))
[1] 41
So the setdiff() "fails" here as it doesn't get the length right, but then a and b contain duplicate values (but not observations! from the sample):
R> intersect(a, b)
[1] 57 35 91 27 71 63 8 92 49 77
The duplicates (the intersection) arise because the values above occurred more than once in the original sample vec.
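A small sketch of the index-based split wrapped in a function, which keeps working when the vector contains repeated values. The function name split_by_index and the toy vector are made up for this example.

```r
# Split v into n randomly chosen elements and the remaining ones.
# Sampling positions rather than values keeps duplicates intact.
split_by_index <- function(v, n) {
  idx <- sample(length(v), n)
  # setdiff on the positions also behaves when n is 0, where v[-idx] would not.
  list(picked = v[idx], rest = v[setdiff(seq_along(v), idx)])
}

set.seed(42)  # arbitrary seed for reproducibility
v <- c(5, 5, 7, 9, 9, 9, 11, 13, 13, 15)
parts <- split_by_index(v, 3)

lengths(parts)  # picked has 3 elements, rest has 7
```

Every observation ends up in exactly one of the two parts, even though the same value may appear in both.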
What about something like this?
x <- 1:100
s70 <- sample(x, 70, replace=FALSE)
s30 <-sample(setdiff(x, s70), 30, replace=FALSE)
s30 will contain the same numbers as setdiff(x, s70). The difference between them is that s30 is an unordered vector of length 30, while setdiff(x, s70) gives you an (ascending) ordered vector of length 30. You said you want random subsamples of length 70 and 30, so s30 is better than just setdiff(x, s70). If order does not really matter, the better alternative is to use setdiff without sample, as in @seancarmody's answer.
As you've mentioned "split", you can also try something like this:
set.seed(1)
t <- sample(20:40, 100, replace=TRUE)
groups <- rep("A", 100)
groups[sample(100, 30)] <- "B"
table(groups)
# groups
# A B
# 70 30
split(t, groups)
# $A
# [1] 25 32 39 24 38 39 33 21 24 23 36 40 27 36 24 33 22 25 28 28 38 27 30 30 23
# [26] 34 35 37 33 31 36 20 30 35 34 30 29 25 22 26 33 28 26 29 26 33 30 36 21 38
# [51] 27 37 27 27 30 38 38 36 29 34 28 26 35 25 23 25 21 33 36 28
#
# $B
# [1] 27 33 34 28 30 35 39 20 32 37 36 22 28 36 31 38 21 30 39 25 28 40 24 34 22
# [26] 38 36 29 37 32

For loop in R with increments

I am trying to write a for loop which will increment its value by 2. The equivalent code in C is
for (i=0; i<=78; i=i+2)
How do I achieve the same in R?
See ?seq for more info:
for(i in seq(from=1, to=78, by=2)){
# stuff, such as
print(i)
}
or
for(i in seq(1, 78, 2))
p.s. Pardon my C ignorance. There, I just outed myself.
However, this is a way to do what you want in R (please see updated code)
EDIT
After learning a bit of how C works, it looks like the example posted in the question iterates over the following sequence: 0 2 4 6 8 ... 74 76 78.
To replicate that exactly in R, start at 0 instead of at 1, as above.
seq(from=0, to=78, by=2)
[1] 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44
[24] 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78
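As a quick check that the loop visits the same values as the C version, here is a sketch that records each value of i. The name visited is made up for this example.

```r
visited <- numeric(0)

# Same sequence as the C loop: i = 0, 2, 4, ..., 78
for (i in seq(from = 0, to = 78, by = 2)) {
  visited <- c(visited, i)
}

length(visited)  # 40 iterations
range(visited)   # 0 and 78
```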
You can do so in the following way. Replace length(v1) with whatever upper limit you want to iterate up to, and replace the 2 with your desired increment:
for(i in seq(1,length(v1),2))
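One caveat with this form: if v1 is empty, seq(1, 0, 2) stops with an error instead of giving an empty loop. A sketch of a variant that also copes with an empty vector is below; v1's contents and the names odd_positions and picked are made up for illustration.

```r
v1 <- c("a", "b", "c", "d", "e")

# Positions 1, 3, 5, ...; seq_len(0) is empty, so an empty v1 gives no iterations.
odd_positions <- seq_len(ceiling(length(v1) / 2)) * 2 - 1

picked <- character(0)
for (i in odd_positions) {
  picked <- c(picked, v1[i])
}

picked  # every second element, starting with the first
```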
