How to randomise a vector and keep the frequency of the elements fixed? - r

Extending this former question, how can I shuffle (randomize) the following vector
a1 = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
in order to get something like this:
a2 = c(5, 5, 3, 3, 3, 3, 1, 1, 2, 4, 4, 4)
or even better like this:
a3 = c(4, 4, 4, 2, 3, 3, 3, 3, 1, 1, 5, 5)?
That is, each element may randomly change to another value, while the number of occurrences of each element stays constant.

You can try something like this: create a factor from a1 with randomly shuffled levels and then convert it to integers:
as.integer(factor(a1, levels = sample(unique(a1), length(unique(a1)))))
# [1] 5 5 4 4 4 4 3 3 2 1 1 1
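Note that as.integer() on a factor returns level positions, so the one-liner above gives back the original values only because a1 holds the integers 1 to 5. A minimal sketch for arbitrary values, assuming a random one-to-one relabelling is what is wanted (vals, perm and a2 are names made up for this illustration):

```r
a1 <- c(1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
set.seed(1)  # only to make the sketch reproducible

vals <- unique(a1)                   # distinct values, in order of appearance
perm <- vals[sample(length(vals))]   # shuffle by index (safe for a single value)
a2 <- perm[match(a1, vals)]          # replace each value by its shuffled partner

rle(a2)$lengths                      # same run lengths as rle(a1)$lengths
```

Shuffling by index rather than calling sample(vals) directly avoids the case where a single numeric value would be expanded to 1:vals by sample().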

The data:
a1 <- c(1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
First steps:
# extract values and their frequencies
val <- unique(a1)
tab <- table(a1)
freq <- tab[as.character(val)]
Keep original order of frequencies but sample values
rep(sample(val), freq)
# [1] 4 4 1 1 1 1 3 3 5 2 2 2
Keep original frequencies but sample order of values
rep(sa <- sample(val), freq[as.character(sa)])
# [1] 4 2 2 2 2 3 3 1 1 5 5 5

Seems like a perfect application for rle and its inverse rep:
rand_inverse_rle <- function(x) {
r <- rle(sort(x)) # runs of equal values after sorting
ord <- sample(length(r$values)) # random order of the runs
rep(r$values[ord], r$lengths[ord]) } # rebuild the vector run by run
rand_inverse_rle(a1)
#----------
[1] 3 3 4 5 5 5 2 2 2 2 1 1
This was my reading of a function needed to satisfy the natural language requirements:
> a1 = sample( c(1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5) )
> a1
[1] 5 2 5 2 5 1 3 4 2 2 3 1
> rand_inverse_rle(a1)
[1] 5 5 5 4 2 2 2 2 3 3 1 1
> rand_inverse_rle(a1)
[1] 1 1 3 3 5 5 5 2 2 2 2 4
> rand_inverse_rle(a1)
[1] 1 1 3 3 4 5 5 5 2 2 2 2

Related

Divide data in to chunks with multiple values in each chunk in R

I have a dataframe with observations from three years time, with column df$week that indicates the week of the observation. (The week count of the second year continues from the count of the first, so the data contains 207 weeks).
I would like to divide the data to longer time periods, to df$period that would include all observations from several weeks' time.
If a period were three weeks long and the data included 13 observations over six weeks, the idea would be to divide
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
into
periods <- list(c(1, 1, 1, 2, 2, 3, 3), c(4, 5, 5, 6, 6, 6))
periods
[[1]]
[1] 1 1 1 2 2 3 3
[[2]]
[1] 4 5 5 6 6 6
To look something like
> df
week period
1 1 1
2 1 1
3 1 1
4 2 1
5 2 1
6 3 1
7 3 1
8 4 2
9 5 2
10 5 2
11 6 2
12 6 2
13 6 2
>
The data contains over 13k rows, so I would need some sort of map in the style of
mapPeriod <- function(df, fun) {
out <- vector("list", length(df)) # one slot per column of df
for (i in seq_along(df)) {
out[[i]] <- fun(df[[i]])
}
out
}
I just don't know what to put in fun to divide the weeks into the chosen sequences of periods. Can the function rep be of assistance here? How?
I would be very grateful for all input and suggestions.
split(weeks, f = (weeks - 1) %/% 3)
$`0`
[1] 1 1 1 2 2 3 3
$`1`
[1] 4 5 5 6 6 6
from comments below
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
df <- data.frame(weeks)
library(data.table)
df$period <- data.table::rleid((weeks - 1) %/% 3)
# weeks period
# 1 1 1
# 2 1 1
# 3 1 1
# 4 2 1
# 5 2 1
# 6 3 1
# 7 3 1
# 8 4 2
# 9 5 2
# 10 5 2
# 11 6 2
# 12 6 2
# 13 6 2
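If loading data.table is not desired, the same column can be sketched in base R with integer division alone. This assumes weeks are numbered from 1 and that a period is three weeks long; period_length is a name introduced only for this illustration:

```r
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
df <- data.frame(weeks)

period_length <- 3  # assumed number of weeks per period
# weeks 1-3 -> period 1, weeks 4-6 -> period 2, and so on
df$period <- (df$weeks - 1) %/% period_length + 1
```

One difference from rleid(): if a whole period contains no observations, this version leaves a gap in the period numbers, whereas rleid() numbers the runs consecutively.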

Sorting specific columns of a dataframe by their names in R

df is a test dataframe and I need to sort the last three columns in ascending order (without hardcoding the order).
df <- data.frame(X = c(1, 2, 3, 4, 5),
Z = c(1, 2, 3, 4, 5),
Y = c(1, 2, 3, 4, 5),
A = c(1, 2, 3, 4, 5),
C = c(1, 2, 3, 4, 5),
B = c(1, 2, 3, 4, 5))
Desired output:
> df
X Z Y A B C
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
I'm aware of the order() function but I can't seem to find the right way to implement it to get the desired output.
Update:
Base R:
cbind(df[1:3],df[4:6][,order(colnames(df[4:6]))])
First answer:
We could use relocate from dplyr:
https://dplyr.tidyverse.org/reference/relocate.html
It is designed to rearrange columns.
Here we relocate by index:
we take the last column (index 6, which is B) and put it before position 5 (which is C).
library(dplyr)
df %>%
relocate(6, .before = 5)
An alternative:
library(dplyr)
# keep the first three columns as they are, then the last three sorted by name
df %>%
select(1:3, all_of(sort(names(df)[4:6])))
X Z Y A B C
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
In base R, select the first columns as they are, then sort the names of the last three:
df[, c(names(df)[1:(ncol(df)-3)], sort(names(df)[ncol(df)-2:0]))]
We want to reorder the columns based on the column names, so if we use names(df) as the argument to order, we can reorder the data frame as follows.
The complicating factor is that order() returns a vector of numbers, so if we want to reorder only a subset of the column names, we'll need an approach that retains the original sort order for the first three columns.
We accomplish this by building a vector of the first three column names followed by the sorted remaining column names (using sort(), which returns the values rather than their positions in the vector), and then passing it to the [ form of the extract operator.
df <- data.frame(X = c(1, 2, 3, 4, 5),
Z = c(1, 2, 3, 4, 5),
Y = c(1, 2, 3, 4, 5),
A = c(1, 2, 3, 4, 5),
C = c(1, 2, 3, 4, 5),
B = c(1, 2, 3, 4, 5))
df[,c(names(df[1:3]),sort(names(df[4:6])))]
...and the output:
> df[,c(names(df[1:3]),sort(names(df[4:6])))]
X Z Y A B C
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
# flag the last three columns, then order only those by the alphabetical rank of their names
to_order <- seq(ncol(df)) > ncol(df) - 3
df[order(to_order * rank(names(df)))]
#> X Z Y A B C
#> 1 1 1 1 1 1 1
#> 2 2 2 2 2 2 2
#> 3 3 3 3 3 3 3
#> 4 4 4 4 4 4 4
#> 5 5 5 5 5 5 5
Created on 2021-12-24 by the reprex package (v2.0.1)
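The base R answers above can be wrapped into a small helper so that the number of trailing columns is not hardcoded. This is only a sketch; sort_last_cols and its argument n are hypothetical names, not part of any package:

```r
df <- data.frame(X = 1:5, Z = 1:5, Y = 1:5, A = 1:5, C = 1:5, B = 1:5)

# Hypothetical helper: keep the leading columns in place and
# sort only the last n columns by name.
sort_last_cols <- function(d, n) {
  idx <- seq_len(ncol(d))
  first <- idx[idx <= ncol(d) - n]    # columns left untouched
  last <- idx[idx > ncol(d) - n]      # columns to be sorted
  d[c(first, last[order(names(d)[last])])]
}

names(sort_last_cols(df, 3))
```

Working with column positions rather than names keeps the helper usable when n is 0 or equal to ncol(d).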

R- dataframe with vector entries of different length

I have a vector that looks like this
[1] "NNNNNNNNNN" "NN NN NN NN NN" "NN NN NN NN NN" "N NN NNN NN NN"
[5] "NNNNNNNNNNNN" "NNN NNN NNN" "NN-NNNNNNN" "NNNNNNN"
[9] "NNNNNNN" "NNNNNNN"
The vector is coded as numbers further on in the code to look like this
[[1]]
[1] 2 2 2 2 2 2 2 2 2 2
[[2]]
[1] 2 2 9 2 2 9 2 2 9 2 2 9 2 2
[[3]]
[1] 2 2 9 2 2 9 2 2 9 2 2 9 2 2
[[4]]
[1] 2 9 2 2 9 2 2 2 9 2 2 9 2 2
[[5]]
[1] 2 2 2 2 2 2 2 2 2 2 2 2
[[6]]
[1] 2 2 2 9 2 2 2 9 2 2 2
It is actually this coded vector that I need to turn into a data frame, so that I can transpose it and then use it in a regression. When I try as.data.frame, I get an error message saying that the lengths vary. Any help is greatly appreciated.
Here is the output of dput
dput(vecs2T)
list(c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2), c(2, 2, 9, 2, 2, 9, 2,
2, 9, 2, 2, 9, 2, 2), c(2, 2, 9, 2, 2, 9, 2, 2, 9, 2, 2, 9, 2,
2), c(2, 9, 2, 2, 9, 2, 2, 2, 9, 2, 2, 9, 2, 2), c(2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2), c(2, 2, 2, 9, 2, 2, 2, 9, 2, 2, 2),
c(2, 2, 9, 2, 2, 2, 2, 2, 2, 2), c(2, 2, 2, 2, 2, 2, 2),
c(2, 2, 2, 2, 2, 2, 2), c(2, 2, 2, 2, 2, 2, 2))
This is what I would like the data frame to look like
V1 V2 V3 V4 V5 V6 V7 V8
1 1 1 1 1 9 1 2 2
2 2 2 2 2 1 1 1 1
3 1 1 1 1 1 1 1 1
4 9 9 9 2 2 2 2 1
5 2 2 2 2 2 2 2 2
6 1 1 1 9 2 2 2 2
7 1 1 1 1 1 1 1 1
I would like the first number from each of the vectors to be under V1. So each row corresponds to one of the vectors in dput above. But because the vectors are of different length some of the columns will be empty which seem to be causing the trouble.
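One possible way to get there, sketched on three of the vectors from the dput output above, is to pad every vector with NA up to the longest length and then bind the rows together. The names vecs, n_max, padded and res are made up for this illustration, and whether NA is an acceptable filler for the regression is an assumption:

```r
vecs <- list(c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
             c(2, 2, 9, 2, 2, 9, 2, 2, 9, 2, 2, 9, 2, 2),
             c(2, 2, 2, 2, 2, 2, 2))

n_max <- max(lengths(vecs))            # length of the longest vector
padded <- lapply(vecs, function(v) {
  length(v) <- n_max                   # extending a vector pads it with NA
  v
})
res <- as.data.frame(do.call(rbind, padded))  # one row per vector, columns V1, V2, ...
```

Each row of res corresponds to one vector, with the first number of every vector under V1 and NA in the positions that the shorter vectors do not reach.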

Count the occurrence of one vector's values in another vector including non match values in R

I have 2 vectors:
v1 <- c(1, 2, 3, 4, 1, 3, 5, 6, 4)
v2 <- c(1, 2, 3, 4, 5, 6, 7)
I want to count how often each value of v2 occurs in v1. The expected result is:
1 2 3 4 5 6 7
2 1 2 2 1 1 0
I know there is a function that can do this:
table(v1[v1 %in% v2])
However, it only lists the matched values:
1 2 3 4 5 6
2 1 2 2 1 1
How can I show all the values in v2?
You can do
table(factor(v1, levels=unique(v2)))
# 1 2 3 4 5 6 7
# 2 1 2 2 1 1 0
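If a plain named integer vector is preferred over a table object, the same counts can be sketched with vapply(). This is an alternative to the factor approach above, not a replacement for it; counts is a name introduced here:

```r
v1 <- c(1, 2, 3, 4, 1, 3, 5, 6, 4)
v2 <- c(1, 2, 3, 4, 5, 6, 7)

# for every value of v2, count how many elements of v1 equal it
counts <- vapply(v2, function(v) sum(v1 == v), integer(1))
names(counts) <- v2
counts
```

This loops over v2 once, so for long vectors the table(factor(...)) version is likely the faster choice.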

R Language - The average number of steps to get number

I have a sequence of numbers (this is just a piece of it; in fact I have over 100k numbers):
1 2 3 3 2 3 2 3 2 1 2 3 2 3 2 3 3 2 3 2 3 2 1 3 3 2 3 3 2 3 3 3 2 3 2 3 2 1 3 2 3 3 3 2 3 3 2 3 2 3
I need to calculate the average number of steps between consecutive 1s in this sequence.
For example:
In this sequence, 1 is the first number. Counting the steps to the next 1 gives 9. The next 1 comes after 13 more steps, the one after that after 15 steps, and so on.
Now I have to calculate the average number of steps.
So here we have (9 + 13 + 15) / 3 = 12.33...
How can I do this in R?
You can try:
mean(diff(which(x == 1)))
## [1] 12.33333
Given:
x <- c(1, 2, 3, 3, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 3, 3, 2, 3, 2,
3, 2, 1, 3, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3, 2, 3, 2, 1, 3, 2, 3,
3, 3, 2, 3, 3, 2, 3, 2, 3)
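To see where the average comes from, the one-liner can be split into its steps. The sketch below uses the same x as above; pos and gaps are names chosen only for this illustration:

```r
x <- c(1, 2, 3, 3, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 3, 3, 2, 3, 2,
       3, 2, 1, 3, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3, 2, 3, 2, 1, 3, 2, 3,
       3, 3, 2, 3, 3, 2, 3, 2, 3)

pos <- which(x == 1)   # positions of the 1s: 1 10 23 38
gaps <- diff(pos)      # steps between consecutive 1s: 9 13 15
mean(gaps)             # average number of steps
```

The values after the last 1 are not counted, because no further 1 closes that stretch.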
