rep and/or seq function to create continuously reducing vector? - r

Suppose I have a vector from 1 to 5,
a<-c(1:5)
What I need to do is to repeat the vector by losing one element continuously. That is, the final outcome should be like
1 2 3 4 5 1 2 3 4 1 2 3 1 2 1

We can reverse the vector and apply sequence
sequence(rev(a))
#[1] 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1
Or another option is toeplitz
m1 <- toeplitz(a)
m1[lower.tri(m1, diag=TRUE)]
#[1] 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1
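For intuition, sequence(nvec) concatenates seq_len(n) for each n in nvec, so the first approach can be spelled out by hand. The sketch below also uses the result as a subscript, which should work for a vector whose values are not 1:n. The names by_hand, v and shrinking are made up for illustration.

```r
a <- 1:5

# sequence(rev(a)) amounts to building 1:5, 1:4, ..., 1:1 and concatenating
by_hand <- unlist(lapply(rev(a), seq_len))
all(by_hand == sequence(rev(a)))

# For arbitrary values, use the shrinking index sequence as a subscript
v <- c(10, 20, 30)                       # hypothetical example vector
shrinking <- v[sequence(rev(seq_along(v)))]
shrinking
```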

Related

Generating a vector with random numbers repeated 'n' times

In R, how can I generate a vector of 1000 values drawn at random from 1:3, with each drawn value repeated "n" times in a row?
I know that
sample(1:3,1000, replace=TRUE)
will generate 1000 values ranging between 1 and 3, but I need each value to repeat, for example, 5 times. Like below:
[1] 2 2 2 2 2 3 3 3 3 3 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 3 3 3 3 3 2 2 2 2 2 1 1 1 1 1 3 3 3 3 3
.
.
.
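We can use rep with the each argument. Note that sampling 1000 values and repeating each one 5 times yields a vector of length 5000; sample 1000/5 = 200 values instead if the final vector should have exactly 1000 elements.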
rep(sample(1:3, 1000, replace = TRUE), each = 5)
Try this:
rep(sample(1:3,1000, replace=TRUE),each=5)
Here is another option using the Kronecker product
kronecker(sample(1:3,1000,replace = TRUE),rep(1,5))
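If the final vector has to contain exactly 1000 values, one possible reading of the question, the number of draws can be derived from the run length. This is only a sketch; the names total, n, draws and x, and the seed, are assumptions for illustration.

```r
set.seed(1)            # arbitrary seed, only to make the sketch reproducible
total <- 1000          # desired final length
n <- 5                 # run length for each drawn value
stopifnot(total %% n == 0)

draws <- sample(1:3, total / n, replace = TRUE)   # 200 random values
x <- rep(draws, each = n)

length(x)                        # 1000
all(rle(x)$lengths %% n == 0)    # every run length is a multiple of n
```

Adjacent draws can be equal, so some runs are 10 or 15 long rather than exactly 5, which is why the check uses a multiple of n.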

Is there a way to change the index on kmeans()$cluster?

I am using kmeans() to create groups based on a score. The goal is to assign star ratings, so that the individuals with the highest scores get four stars, and the individuals with the lowest scores get 1 star. I would like to create the star variable based on the kmeans()$cluster value. However, as it stands, kmeans()$cluster indexes the clusters, but the index does not correspond to the relative position of the group.
Is there a way to manually assign the cluster indexes, or to set the index to be assigned in a certain order? I'm hoping to have kmeans()$cluster=1 for the low score group, kmeans()$cluster=2 for second lowest, etc.
id <- 1:500
set.seed(12); score <- runif(500, 0, 1)
dat <- data.frame(id, score)
km = kmeans(dat$score, 4, nstart=10)
plot(dat$score,
col = c(km$cluster),
main="K-Means result with 4 clusters",
pch=20,
cex=0.8)
dat$star <- km$cluster
plot(dat$score,
dat$star,
main="Score v. cluster number")
Any of these will yield a new cluster assignment vector in which 1 refers to the cluster with the smallest center, 2 to the next smallest, and so on. The first is expressed solely in terms of fitted(km), the second in terms of km$centers and km$cluster, and the last in terms of fitted(km) and km$centers.
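fit <- fitted(km)
# 1. relabel the distinct fitted centers in increasing order (returns a factor)
factor(fit, labels = 1:nlevels(factor(fit)))
# 2. rank the centers, then look up the rank for each point's cluster
rank(km$centers)[km$cluster]
# 3. position of each point's fitted center among the sorted centers
match(fitted(km), sort(km$centers))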
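As a quick sanity check, the rank-based variant can be run on the example data from the question. This is a sketch rather than a definitive recipe; the names star, star_alt and means_by_star are made up here, and the match/order line is an additional equivalent idiom, not part of the answer above.

```r
set.seed(12)
score <- runif(500, 0, 1)
km <- kmeans(score, 4, nstart = 10)

# Rank the four centers, then look up the rank for each observation's cluster
star <- rank(km$centers)[km$cluster]

# Alternative idiom (assumption: no tied centers): position of each cluster
# id in the ordering of the centers
star_alt <- match(km$cluster, order(km$centers))

# Mean score per star; should increase from star 1 to star 4
means_by_star <- tapply(score, star, mean)
means_by_star
```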
Yes. You can just use a small table of what you want the values to be and use the original cluster number to look them up. Here is an example.
set.seed(2017)
KM3 = kmeans(iris[,1:4], 3)
KM3$cluster
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[39] 2 2 2 2 2 2 2 2 2 2 2 2 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[77] 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 1 1 1 3 1 1 1 1 1 1 3
[115] 3 1 1 1 1 3 1 3 1 3 1 1 3 3 1 1 1 1 1 3 1 1 1 1 3 1 1 1 3 1 1 1 3 1 1 3
The clusters are in an awkward order. I want the low numbered points to be in cluster 1, the middle in cluster 2 and the high numbered points in cluster 3. So I want to change all of the 1's to 3, the 2's to 1 and the 3's to 2.
Relabel = c(3,1,2)
KM3$cluster = Relabel[KM3$cluster]
KM3$cluster
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[77] 2 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 2 3 3 3 3 2 3 3 3 3 3 3 2
[115] 2 3 3 3 3 2 3 2 3 2 3 3 2 2 3 3 3 3 3 2 3 3 3 3 2 3 3 3 2 3 3 3 2 3 3 2
Just a little extra detail. It says Relabel = c(3,1,2) because I want 1 to become 3, so the first location has a 3. I want 2 to become 1, so the second location has a 1. And I want 3 to become 2, so the third location has a 2.
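The lookup table does not have to be typed by hand. As a sketch, it could be derived from the cluster centers; ordering by the Petal.Length center is an assumption made here for illustration (the answer above orders by row position instead), and new_cluster and means_by_cluster are made-up names.

```r
set.seed(2017)
KM3 <- kmeans(iris[, 1:4], 3)

# Lookup table: element i holds the new label for old cluster i,
# here the rank of that cluster's Petal.Length center (assumed criterion)
Relabel <- rank(KM3$centers[, "Petal.Length"])

# Same lookup as before, applied without overwriting the original labels
new_cluster <- unname(Relabel[KM3$cluster])

# Mean petal length per new label; should be increasing
means_by_cluster <- tapply(iris$Petal.Length, new_cluster, mean)
means_by_cluster
```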

split dataframe cumulatively by variable level

With a df like this:
x=data.frame(id=c(1,1,1,2,2,2,3,3,3), val=c(1,2,3,2,3,4,1,3,0))
I want to get output like this:
[[1]]
id val
1 1 1
2 1 2
3 1 3
[[2]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
[[3]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
7 3 1
8 3 3
9 3 0
where the df is split into a list of as many dataframes as there are levels of the splitting variable, i.e. id. Each dataframe should start at the first level and include all rows up to each successive level.
I can do this with a loop:
out<-NULL
for(i in 1:3){
out[[i]] <- x[x$id<=i,]
}
out
However, is there a simpler method using e.g. split that I am overlooking? Ideally a one liner.
You can do this in base R with split and Reduce, using the accumulate=TRUE argument. split splits the data.frame into a list of data.frames by id. Reduce applies rbind across the list elements, and accumulate=TRUE keeps every intermediate result, so the data.frames are combined successively.
Reduce(rbind, split(x, x$id), accumulate=TRUE)
[[1]]
id val
1 1 1
2 1 2
3 1 3
[[2]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
[[3]]
id val
1 1 1
2 1 2
3 1 3
4 2 2
5 2 3
6 2 4
7 3 1
8 3 3
9 3 0
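For comparison, the loop from the question can also be written as a single lapply call. The sketch below checks that it agrees with the Reduce approach on the example data; the names by_reduce and by_lapply are made up for illustration, and the lapply version assumes the ids are ordered values, as they are here.

```r
x <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                val = c(1, 2, 3, 2, 3, 4, 1, 3, 0))

# Cumulative row-binding of the per-id pieces
by_reduce <- Reduce(rbind, split(x, x$id), accumulate = TRUE)

# One subset per id level: all rows whose id is at most that level
by_lapply <- lapply(sort(unique(x$id)), function(i) x[x$id <= i, ])

# Both lists should hold data frames with 3, 6 and 9 rows
vapply(by_reduce, nrow, integer(1))
vapply(by_lapply, nrow, integer(1))
```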

R: repeat a number sequence while keeping the order of the sequence

I want to repeat a sequence up to a specific length:
The sequence is 1:4, and I want to repeat it until it matches the number of rows in a data frame.
Let's say the data frame has 24 rows.
I tried the following:
test <- rep(1:4, each=24/4)
1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4
Lengthwise this is fine, but I want to retain the order of the sequence:
1 2 3 4 1 2 3 4 1 2 3 4.....
You need to use times instead of each
rep(1:4, times=24/4)
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
We can also pass the count without naming the argument; the second positional argument of rep is taken as times
rep(1:4, 24/4)
#[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
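When the number of rows is not a multiple of 4, times = n/4 no longer gives the right length. A minimal sketch using the length.out argument of rep, which recycles the sequence and truncates it at the requested length; n_rows and idx are hypothetical names.

```r
n_rows <- 10                      # hypothetical row count, not a multiple of 4
idx <- rep(1:4, length.out = n_rows)
idx
# With a data frame df, the same idea would be rep(1:4, length.out = nrow(df))
```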

Generating large drawing lists in R

Say I have a list in R like so,
[1] 3 5 4 7
And I want to generate all "drawings" from this list, from 1 up to the value of each number. For example,
1 1 1 1
1 1 1 2
1 1 1 3
...
2 3 3 1
2 3 3 2
2 3 3 3
...
3 5 4 7
I know I have used rep() in the past to do something very similar, which works for lists of 2 or 3 numbers (i.e. something like 1 4 5), but I'm not sure how to generalize this here.
Thoughts?
As suggested in the comments, use the Map function to apply seq to each element of your vector, then use expand.grid to generate a data.frame holding the Cartesian product of the resulting sequences:
head(expand.grid(Map(seq,c(3,5,4,7))))
Var1 Var2 Var3 Var4
1 1 1 1 1
2 2 1 1 1
3 3 1 1 1
4 1 2 1 1
5 2 2 1 1
6 3 2 1 1
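Note that expand.grid varies the first column fastest, whereas the listing in the question varies the last one fastest. The sketch below shows one way to get that ordering and checks the row count; the names v, ranges, grid and grid_last are made up for illustration.

```r
v <- c(3, 5, 4, 7)
ranges <- lapply(v, seq_len)      # 1:3, 1:5, 1:4, 1:7

grid <- expand.grid(ranges)       # first column varies fastest
nrow(grid) == prod(v)             # one row per possible drawing

# Build the grid on the reversed ranges, then put the columns back in order,
# so that the last column varies fastest as in the question's listing
grid_last <- expand.grid(rev(ranges))[length(v):1]
names(grid_last) <- names(grid)
head(grid_last, 3)
```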
