Issue generating numbers repeated to a set frequency in R

I am having an issue generating conditional numbers. The number of times each number should be repeated is given in "size". For example, 1 should be repeated 3 times, 2 should be repeated 2 times, and so on.
My desired output is shown below, but I am unable to achieve it. Can somebody correct me, please?
Desired output:
x1
1 1
2 1
3 1
4 2
5 2
6 3
7 4
8 4
9 5
10 5
My attempt:
data <- data.frame(x1= rep(c(1),each=10))
data
size <- as.array(c(3,2,1,2,2))
for(i in 1:5) {
x_val <- size[i]
new <- rep(c(x_val), each=x_val)
data[nrow(size[i]) + 1, ] <- new
}
print(data)
x1
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1

We could use rep with the times argument:
data.frame(x1 = rep(seq_along(size), size))
Output:
x1
1 1
2 1
3 1
4 2
5 2
6 3
7 4
8 4
9 5
10 5
If we need a for loop:
x1 <- c()
for(i in seq_along(size)) x1 <- c(x1, rep(i, each = size[i]))
x1
#[1] 1 1 1 2 2 3 4 4 5 5
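The loop above grows x1 with c() on every iteration, which copies the vector each time. If a loop is really wanted, a minimal sketch that avoids this is to preallocate the full result and fill one block of positions per value. The names first_pos and last_pos are introduced here for illustration, and the size > 0 guard is an added assumption so that a count of 0 simply skips that value.

```r
size <- c(3, 2, 1, 2, 2)

# Preallocate the whole result instead of growing a vector inside the loop
x1 <- integer(sum(size))
last_pos <- cumsum(size)              # last slot of each block: 3 5 6 8 10
first_pos <- last_pos - size + 1      # first slot of each block: 1 4 6 7 9

for (i in seq_along(size)) {
  if (size[i] > 0) {                  # a size of 0 means "skip this value"
    x1[first_pos[i]:last_pos[i]] <- i # value i fills size[i] consecutive slots
  }
}

data <- data.frame(x1 = x1)
data$x1
# [1] 1 1 1 2 2 3 4 4 5 5
```

For this task the one-line rep(seq_along(size), size) is still the simpler choice; the preallocation pattern matters more when each block needs real computation.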

Related

How can I create rank variables from other variables in R?

Hello dear community members.
I'm trying to create ranking variables for certain variables in R. For example, I want to transform this data frame
> df
X1 X2 X3 X4 X5
1 1 4 7 3 2
2 2 5 8 4 3
3 3 6 3 5 4
4 4 1 2 6 5
5 5 2 1 7 6
into
> df
X1 X2 X3 X4 X5 x1_rank x2_rank x3_rank
1 1 4 7 3 2 3 2 1
2 2 5 8 4 3 3 2 1
3 3 6 3 5 4 3 1 3
4 4 1 2 6 5 1 3 2
5 5 2 1 7 6 1 2 3
That is, I want to select X1 to X3 and rank them against each other within each row, with the largest value getting rank 1.
I tried this code
for (i in 1:nrow(df)) {
df_rank <- df[i, ] %>%
dplyr::select(X1, X2, X3, X4) %>%
base::rank()
}
I imagine I could solve this problem with a for loop, but I'm a beginner in R, so I do not understand why this doesn't work.
One way to achieve this is to rank the negated values row by row (so the largest value gets rank 1), using the ties.method argument of rank to control how ties are ranked.
df <- tibble::tribble(
~x1, ~x2, ~x3, ~x4, ~x5,
1,4,7,3,2,
2,5,8,4,3,
3,6,3,5,4,
4,1,2,6,5,
5,2,1,7,6
)
library(magrittr)
df %>%
cbind(
t(apply(-df[,1:3], 1, rank, ties.method = "min")) %>% {colnames(.) <- paste0(colnames(.), "_rank"); .}
)
x1 x2 x3 x4 x5 x1_rank x2_rank x3_rank
1 1 4 7 3 2 3 2 1
2 2 5 8 4 3 3 2 1
3 3 6 3 5 4 2 1 2
4 4 1 2 6 5 1 3 2
5 5 2 1 7 6 1 2 3
As to why your code does not work: a for loop does not return anything. Instead, each iteration overwrites df_rank, so at the end it only holds the ranks of the last row. To fix it, create an object outside the loop, fill in one row of it on each iteration, and finally bind it to the original data.
m <- matrix(ncol = 3, nrow = 5)
for (i in 1:nrow(df)) {
m[i,] <- -df[i, ] %>%
dplyr::select(x1, x2, x3) %>%
base::rank(ties.method = "min")
}
colnames(m) <- paste0(names(df)[1:3], "_rank")
df %>% dplyr::bind_cols(as.data.frame(m))
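Note that the tie in row 3 (x1 and x3 are both 3) comes out as 2 1 2 with ties.method = "min", while the desired output in the question shows 3 1 3, which is what ties.method = "max" gives. A minimal base R sketch that makes the tie rule a parameter is below. add_row_ranks is a hypothetical helper name, and the data is rebuilt with data.frame() instead of tibble::tribble() only to avoid the extra package.

```r
df <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(4, 5, 6, 1, 2),
  x3 = c(7, 8, 3, 2, 1),
  x4 = c(3, 4, 5, 6, 7),
  x5 = c(2, 3, 4, 5, 6)
)

# Hypothetical helper: rank the chosen columns within each row, largest = 1
add_row_ranks <- function(df, cols, ties.method = "min") {
  # Negate so that the largest value in a row gets rank 1;
  # apply() returns one column per row, so transpose back with t()
  r <- t(apply(-as.matrix(df[cols]), 1, rank, ties.method = ties.method))
  colnames(r) <- paste0(cols, "_rank")
  cbind(df, r)
}

add_row_ranks(df, c("x1", "x2", "x3"))                       # row 3 ranks: 2 1 2
add_row_ranks(df, c("x1", "x2", "x3"), ties.method = "max")  # row 3 ranks: 3 1 3
```

With ties.method = "max" all five rows match the desired output shown in the question.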

Filter based on matching condition in R [duplicate]

This question already has an answer here:
Find rows in a data frame where two columns are equal
I'm trying to keep only the rows where the ID in column Y matches the ID in column X.
Edit: here's code that is close but not quite there. I also need a condition on column Y: it should keep rows where the ID in column X equals the ID in column Y when column Y = '34'.
data %>%
filter(ID %in% X == ID %in% Y)
You can use a join, or just do something like this:
df <- data.frame(x = 1:13, y = c(1:5,7:14))
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 7
7 7 8
8 8 9
9 9 10
10 10 11
11 11 12
12 12 13
13 13 14
rows_to_select <- which(df$x == df$y)
df[rows_to_select,]
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
You can use the 'which' function in base R. Example:
set.seed(7) # create toy dataframe
x1 <- sample(1:2, 10, replace = TRUE)
x2 <- sample(1:2, 10, replace = TRUE)
df <- data.frame(x1, x2)
df
x1 x2
1 2 2
2 2 1
3 2 1
4 1 1
5 2 1
6 2 2
7 1 2
8 1 1
9 1 1
10 1 2
keep <- which(df$x1 == df$x2) # only this line
keep
[1] 1 4 6 8 9
df2 <- df[keep , ] # and this line required for the reduced dataframe
df2
x1 x2
1 2 2
4 1 1
6 2 2
8 1 1
9 1 1
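One practical reason to prefer which() over plain logical indexing is missing values: if either column contains NA, the comparison is NA, and df[cond, ] returns an all-NA row for it, while df[which(cond), ] drops it. The sketch below shows this and adds the extra condition from the edit by combining both tests with &. The data and the column names X and Y are made up for illustration, and Y is assumed to be numeric (if it is character, compare with "34" instead).

```r
# Toy data with hypothetical columns X and Y, including a missing value
data <- data.frame(
  X = c(34, 12, 34, 7, 34),
  Y = c(34, 12, 20, NA, 34)
)

# Logical indexing: the NA comparison in row 4 comes back as an all-NA row
data[data$X == data$Y, ]

# which() drops NA comparisons, so only genuine matches are kept (rows 1, 2, 5)
data[which(data$X == data$Y), ]

# Extra condition from the edit: X equals Y and Y is 34 (rows 1 and 5)
data[which(data$X == data$Y & data$Y == 34), ]
```

dplyr::filter() behaves like the which() version here, since it drops rows where the condition evaluates to NA.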

Select rows of data frame based on a vector with duplicated values

I have a data frame that contains all the case-control pairs. In the following example, y is the ID of each case-control pair, and there are 3 pairs in my data set. I'm resampling by the values of y, so both members of a pair are selected or neither is.
sample_df = data.frame(x=1:6, y=c(1,1,2,2,3,3))
> sample_df
x y
1 1 1
2 2 1
3 3 2
4 4 2
5 5 3
6 6 3
select_y = c(1,3,3)
> select_y
[1] 1 3 3
select_y above is a vector I have computed that contains the pairs I want to resample. Case-control pair 1 should appear in my new sample once, and pair 3 should appear twice, since 3 occurs twice in select_y. The desired output is:
x y
1 1
2 1
5 3
6 3
5 3
6 3
I can't find an efficient way to do this other than writing a for loop.
Solution:
Based on @HubertL's answer, with some modifications, a 'vectorized' approach looks like this:
sel_y <- as.data.frame(table(select_y))
> sel_y
select_y Freq
1 1 1
2 3 2
sub_sample_df = sample_df[sample_df$y%in%select_y,]
> sub_sample_df
x y
1 1 1
2 2 1
5 5 3
6 6 3
match_freq = sel_y[match(sub_sample_df$y, sel_y$select_y),]
> match_freq
select_y Freq
1 1 1
1.1 1 1
2 3 2
2.1 3 2
sub_sample_df$Freq = match_freq$Freq
rownames(sub_sample_df) = NULL
sub_sample_df
> sub_sample_df
x y Freq
1 1 1 1
2 2 1 1
3 5 3 2
4 6 3 2
selected_rows = rep(1:nrow(sub_sample_df), sub_sample_df$Freq)
> selected_rows
[1] 1 2 3 3 4 4
sub_sample_df[selected_rows,]
x y Freq
1 1 1 1
2 2 1 1
3 5 3 2
3.1 5 3 2
4 6 3 2
4.1 6 3 2
Another method of doing the same without a loop:
sample_df = data.frame(x=1:6, y=c(1,1,2,2,3,3))
row_names <- split(1:nrow(sample_df),sample_df$y)
select_y = c(1,3,3)
row_num <- unlist(row_names[as.character(select_y)])
ans <- sample_df[row_num,]
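The split() approach above can be wrapped in a reusable function. When a pair is drawn more than once, it is often useful to give each drawn copy its own pair number, so that repeated pairs are not merged back together in a later matched analysis. The sketch below does that; resample_pairs, id_col and the draw column are hypothetical names introduced here, not part of the original answer.

```r
# Hypothetical helper: return the rows of df for each pair ID in ids,
# repeating a pair as many times as its ID occurs in ids
resample_pairs <- function(df, ids, id_col = "y") {
  # Row numbers grouped by pair ID, e.g. list(`1` = 1:2, `2` = 3:4, `3` = 5:6)
  rows_by_id <- split(seq_len(nrow(df)), df[[id_col]])
  picked <- rows_by_id[as.character(ids)]
  out <- df[unlist(picked, use.names = FALSE), , drop = FALSE]
  # Number every drawn copy so repeated pairs stay distinct
  out$draw <- rep(seq_along(ids), lengths(picked))
  rownames(out) <- NULL
  out
}

sample_df <- data.frame(x = 1:6, y = c(1, 1, 2, 2, 3, 3))
select_y <- c(1, 3, 3)
resample_pairs(sample_df, select_y)

# A random bootstrap draw of pair IDs, with replacement
set.seed(1)
boot_ids <- sample(unique(sample_df$y), replace = TRUE)
resample_pairs(sample_df, boot_ids)
```

For select_y = c(1, 3, 3) this returns the rows with x = 1, 2, 5, 6, 5, 6, and draw = 1, 1, 2, 2, 3, 3.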
I can't find a way without a loop, but at least it's not a for loop, and there is only one iteration per frequency:
sample_df = data.frame(x=1:6, y=c(1,1,2,2,3,3))
select_y = c(1,3,3)
sel_y <- as.data.frame(table(select_y))
do.call(rbind,
lapply(1:max(sel_y$Freq),
function(freq) sample_df[sample_df$y %in%
sel_y[sel_y$Freq>=freq, "select_y"],]))
x y
1 1 1
2 2 1
5 5 3
6 6 3
51 5 3
61 6 3

Subsequent row summing in dataframe object

I would like to compute a running (cumulative) sum of one column's values, restarting for each value of another column, and store the result in a new column without deleting any rows.
Below is some R code with an example that does the trick and hopefully illustrates my question. Is there a more elegant way to do this? The for loop will be time-consuming on my actual object.
Thanks for any feedback.
As an example dataframe:
MyDf <- data.frame(ID = c(1,1,1,2,2,2), Y = 1:6)
MyDf$FIRST <- c(1,0,0,1,0,0)
MyDf.2 <- MyDf
MyDf.2$Y2 <- c(1,3,6,4,9,15)
The goal is to write code that calculates Y2 in MyDf.2 above separately for each ID.
This is what I came up with, and it does the trick (it calculates a TEST column in MyDf that should equal Y2 in MyDf.2):
MyDf$TEST <- NA
for(i in 1:length(MyDf$Y)){
MyDf[i,]$TEST <- ifelse(MyDf[i,]$FIRST == 1, MyDf[i,]$Y,MyDf[i,]$Y + MyDf[i-1,]$TEST)
}
MyDf
ID Y FIRST TEST
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15
MyDf.2
ID Y FIRST Y2
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15
You can use ave with cumsum to get the column you want; transform just adds the result as a new column of your existing data.frame.
> MyDf <- transform(MyDf, TEST=ave(Y, ID, FUN=cumsum))
> MyDf
ID Y FIRST TEST
1 1 1 1 1
2 1 2 0 3
3 1 3 0 6
4 2 4 1 4
5 2 5 0 9
6 2 6 0 15
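The for loop in the question restarts the running total whenever FIRST == 1, while the ave answer restarts it whenever ID changes. In this example the two coincide. If they could ever differ, a minimal sketch that follows the FIRST flag instead groups by cumsum(FIRST), which numbers each run of rows that begins at a FIRST == 1 row. run_id is a name introduced here for illustration, and the sketch assumes the first row has FIRST == 1, as the loop also requires.

```r
MyDf <- data.frame(ID = c(1, 1, 1, 2, 2, 2), Y = 1:6)
MyDf$FIRST <- c(1, 0, 0, 1, 0, 0)

# cumsum(FIRST) numbers the runs: 1 1 1 2 2 2 (a new run starts at each FIRST == 1)
run_id <- cumsum(MyDf$FIRST)

# Running sum of Y within each run, matching what the for loop computes
MyDf$TEST <- ave(MyDf$Y, run_id, FUN = cumsum)
MyDf$TEST
# [1]  1  3  6  4  9 15
```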

R: create a data frame out of a rolling window

Let's say I have a data frame with the following structure:
DF <- data.frame(x = 0:4, y = 5:9)
> DF
x y
1 0 5
2 1 6
3 2 7
4 3 8
5 4 9
What is the most efficient way to turn DF into a data frame with the following structure:
w x y
1 0 5
1 1 6
2 1 6
2 2 7
3 2 7
3 3 8
4 3 8
4 4 9
Here w numbers each length-2 window rolling through the data frame DF. The window length should be arbitrary; e.g., a length of 3 yields
w x y
1 0 5
1 1 6
1 2 7
2 1 6
2 2 7
2 3 8
3 2 7
3 3 8
3 4 9
I am a bit stumped by this problem because the data frame can also contain an arbitrary number of columns, e.g. w, x, y, z, etc.
Edit 2: I've realized edit 1 is a bit unreasonable, as xts doesn't seem to deal with multiple observations per data point.
My approach would be to use the embed function. The first step is to build a matrix of rolling-window row indices. Take a data frame:
df <- data.frame(x = 0:4, y = 5:9)
nr <- nrow(df)
w <- 3 # window size
i <- 1:nr # indices of the rows
iw <- embed(i,w)[, w:1] # matrix of rolling-window indices of length w
> iw
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
wnum <- rep(1:nrow(iw),each=w) # window number
inds <- i[c(t(iw))] # the indices flattened, to use below
dfw <- sapply(df, '[', inds)
dfw <- transform(data.frame(dfw), w = wnum)
> dfw
x y w
1 0 5 1
2 1 6 1
3 2 7 1
4 1 6 2
5 2 7 2
6 3 8 2
7 2 7 3
8 3 8 3
9 4 9 3
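One caveat with sapply(df, '[', inds): it simplifies to a matrix, so a data frame with mixed column types (say, a character column next to numeric ones) would be coerced to a single type. Subsetting rows with df[inds, ] keeps each column's type and works for any number of columns. The sketch below builds the same flattened window indices with outer() instead of embed() and puts w first, as in the question. rolling_windows is a hypothetical helper name.

```r
# Hypothetical helper: stack every length-w rolling window of the rows of df,
# tagging each stacked row with its window number in a new first column w
rolling_windows <- function(df, w) {
  nr <- nrow(df)
  stopifnot(w >= 1, w <= nr)
  starts <- seq_len(nr - w + 1)                   # first row of each window
  inds <- c(outer(seq_len(w) - 1L, starts, "+"))  # row indices, window by window
  out <- df[inds, , drop = FALSE]                 # row subsetting keeps column types
  out <- cbind(w = rep(starts, each = w), out)    # window number as first column
  rownames(out) <- NULL
  out
}

DF <- data.frame(x = 0:4, y = 5:9)
rolling_windows(DF, 2)
rolling_windows(DF, 3)
```

For w = 2 the result has columns w, x, y with w = 1 1 2 2 3 3 4 4 and x = 0 1 1 2 2 3 3 4, matching the first desired output.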
