How to subset every 6 rows in R?

I need to split my data into subsets of 6 rows each. How can I do that in R?
Data:
col1 : 1,2,3,4,5,6,7,8,9,10
col2 : a1,a2,a3,a4,a5,a6,a7,a8,a9,a10
I want a subset of 6 rows at a time: the first subset should contain rows 1:6 and the next should contain rows 7:nrow(data). I tried the seq function:
seqData <- seq(1,nrow(data),6)
Output: this returns only the starting indices 1 and 7, but I want rows 1 to 6 first and then rows 7 to nrow(data).
How can I get output like that?

Will this work:
set.seed(1)
dat <- data.frame(c1 = sample(1:5,12,T),
c2 = sample(1:5,12,T))
dat
c1 c2
1 1 2
2 4 2
3 1 1
4 2 5
5 5 5
6 3 1
7 2 1
8 3 5
9 3 5
10 1 2
11 5 2
12 5 1
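# length.out trims the grouping vector so this also works when nrow(dat) is not a multiple of 6
split(dat, rep(1:ceiling(nrow(dat)/6), each = 6, length.out = nrow(dat)))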
$`1`
c1 c2
1 1 2
2 4 2
3 1 1
4 2 5
5 5 5
6 3 1
$`2`
c1 c2
7 2 1
8 3 5
9 3 5
10 1 2
11 5 2
12 5 1

The function below creates a numeric vector of integers that increases by 1 every n rows, then uses this vector to split the data as needed.
data <- data.frame(col1 = 1:10, col2 = paste0("a", 1:10))
split_nrows <- function(x, n){
f <- c(1, rep(0, n - 1))
f <- rep(f, length.out = NROW(x))
f <- cumsum(f)
split(x, f)
}
split_nrows(data, 6)
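The same grouping vector can also be built with integer division. This is only a minimal sketch of that alternative, not part of the answer above; the names chunk_size, group_id and chunks are made up for illustration.

```r
# Example data mirroring the question (10 rows).
data <- data.frame(col1 = 1:10, col2 = paste0("a", 1:10))

chunk_size <- 6  # hypothetical name for the number of rows per subset

# Rows 1-6 map to group 0, rows 7-12 to group 1, and so on.
group_id <- (seq_len(nrow(data)) - 1) %/% chunk_size

# split() returns a list with one data frame per group;
# the last one simply holds whatever rows are left over.
chunks <- split(data, group_id)
sapply(chunks, nrow)
```

Because the grouping vector always has exactly nrow(data) elements, no trimming is needed when the row count is not a multiple of the chunk size.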

Here's a simple example with mtcars (32 rows) that yields a list of 6 data frames; the last one holds the remaining 2 rows.
nrows <- nrow(mtcars)
breaks <- seq(1, nrows, 6)
# each chunk spans rows x to x + 5, capped at the last row so no NA rows are created
listdfs <- lapply(breaks, function(x) mtcars[x:min(x + 5, nrows), ])

Related

I have a list of data frames and a character vector. I want to rename the second column of each data frame by iterating through the vector. How do I do this?

I have a list of dataframes. Each of these dataframes has the same number of columns and rows, and has a similar data structure:
df.list <- list(data.frame1, data.frame2, data.frame3)
I have a vector of characters:
charvec <- c("a","b","c")
I want to replace the column name of the second column in each data frame by iterating through the above character vector. For example, the first data frame's second column should be "a". The second data frame's second column should be "b".
[[1]]
col1 a
1 1 2
2 2 3
[[2]]
col1 b
1 1 2
2 2 3
A reproducible example:
charvec <- c("a","b","c")
df_list <- list(df1 = data.frame(x = seq_len(3), y = seq_len(3)), df2 = data.frame(x = seq_len(4), y = seq_len(4)), df3 = data.frame(x = seq_len(5), y = seq_len(5)))
for(i in seq_along(df_list)){
names(df_list[[i]])[2] <- charvec[i]
}
> df_list
$df1
x a
1 1 1
2 2 2
3 3 3
$df2
x b
1 1 1
2 2 2
3 3 3
4 4 4
$df3
x c
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
You can also use map2 from purrr. Thanks to @ismirsehregal for the example data.
library(purrr)
map2(
df_list,
charvec,
\(x, y) {
names(x)[2] <- y
x
}
)
Output
$df1
x a
1 1 1
2 2 2
3 3 3
$df2
x b
1 1 1
2 2 2
3 3 3
4 4 4
$df3
x c
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
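For comparison, the same pairwise renaming can be sketched in base R with Map(), which walks the list and the vector in parallel. This is an illustrative alternative rather than something from the answers above; the name renamed is made up, and the length check is an added assumption that each data frame has exactly one new name.

```r
charvec <- c("a", "b", "c")
df_list <- list(
  df1 = data.frame(x = seq_len(3), y = seq_len(3)),
  df2 = data.frame(x = seq_len(4), y = seq_len(4)),
  df3 = data.frame(x = seq_len(5), y = seq_len(5))
)

# Assumption: one new name per data frame.
stopifnot(length(df_list) == length(charvec))

# Map() pairs the i-th data frame with the i-th name and
# keeps the list names (df1, df2, df3) on the result.
renamed <- Map(function(d, new_name) {
  names(d)[2] <- new_name
  d
}, df_list, charvec)

lapply(renamed, names)
```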

Divide the number into different groups according to the adjacency relationship

I have a dataframe that stores adjacency relations. I want to divide the numbers into different groups according to this dataframe. The dataframe is as follows:
df = data.frame(from=c(1,1,2,2,2,3,3,3,4,4,4,5,5), to=c(1,3,2,3,4,1,2,3,2,4,5,4,5))
df
from to
1 1 1
2 1 3
3 2 2
4 2 3
5 2 4
6 3 1
7 3 2
8 3 3
9 4 2
10 4 4
11 4 5
12 5 4
13 5 5
In the dataframe above, number 1 is linked to 1 and 3, and number 2 is linked to 2, 3 and 4. So number 1 cannot be in the same group as number 3, and number 2 cannot be in the same group as number 3 or number 4. In the end, the groups can be c(1, 2, 5) and c(3, 4).
How can I program this?
First replace the values of to with NA when from and to are equal.
df2 <- transform(df, to = replace(to, from == to, NA))
Then iterate over the rows, binding a row to the accumulated result only if its from value has not yet appeared in the to column of the rows kept so far.
Reduce(function(x, y) {
if(y$from %in% x$to) x else rbind(x, y)
}, split(df2, 1:nrow(df2)))
# from to
# 1 1 NA
# 2 1 3
# 3 2 NA
# 4 2 3
# 5 2 4
# 12 5 4
# 13 5 NA
Finally, you can extract the unique elements of each column to get the two groups.
The overall pipeline is:
df |>
transform(to = replace(to, from == to, NA)) |>
(\(dat) split(dat, 1:nrow(dat)))() |>
Reduce(f = \(x, y) if(y$from %in% x$to) x else rbind(x, y))
Darren Tsai's answer solves this problem, but it has some flaws.
The following is a clumsier solution:
df = data.frame(from=c(1,1,2,2,2,3,3,3,4,4,4,5,5), to=c(1,3,2,3,4,1,2,3,2,4,5,4,5))
df.list = lapply(split(df,df$from), function(x){
x$to
})
group.idx = rep(1, length(unique(df$from)))
for (i in seq_along(df.list)) {
df.vec <- df.list[[i]]
curr.group = group.idx[i]
remain.vec = setdiff(df.vec, i)
for (j in remain.vec) {
if(group.idx[j] == curr.group){
group.idx[j] = curr.group + 1
}
}
}
group.idx
[1] 1 1 2 2 1
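A grouping like this can be checked against the adjacency data. The sketch below is an added verification step, not part of either answer; it assumes the node labels are the integers 1 to n so that they can index group.idx directly, and the names links and conflicts are made up.

```r
df <- data.frame(from = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
                 to   = c(1,3,2,3,4,1,2,3,2,4,5,4,5))
group.idx <- c(1, 1, 2, 2, 1)  # grouping reported above

# Self-links (from == to) are not conflicts, so drop them.
links <- df[df$from != df$to, ]

# A conflict is a link whose two endpoints share a group.
conflicts <- links[group.idx[links$from] == group.idx[links$to], ]
nrow(conflicts)

# List the members of each group.
split(seq_along(group.idx), group.idx)
```

An empty conflicts data frame means that no two linked numbers were placed in the same group.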

Combining elements of one column into two columns by group in R

Given a two-column data.frame, with one column containing group labels and the other containing integer values ordered from smallest to largest, how can the data be expanded into pairs of combinations of the integer column?
I'm not sure of the best way to state this. I'm not interested in all possible ordered pairs, only in the unique combinations starting from the lowest value.
In R, the combn function gives the desired output when groups are ignored, for example:
t(combn(seq(1:4),2))
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 1 4
[4,] 2 3
[5,] 2 4
[6,] 3 4
Since the first value is 1, we get the unique combination (1,2) and not the additional combination (2,1), which I don't need. How would one apply a similar method by group?
For example, given a data.frame:
test <- data.frame(Group = rep(c("A","B"),each=4),
Val = c(1,3,6,8,2,4,5,7))
test
Group Val
1 A 1
2 A 3
3 A 6
4 A 8
5 B 2
6 B 4
7 B 5
8 B 7
I was able to come up with this solution that gives the desired output:
test <- data.frame(Group = rep(c("A","B"),each=4),
Val = c(1,3,6,8,2,4,5,7))
j=1
for(i in unique(test$Group)){
if(j==1){
one <- filter(test,i == Group)
two <- data.frame(t(combn(one$Val,2)))
test1 <- data.frame(Group = i,Val1=two$X1,Val2=two$X2)
j=j+1
}else{
one <- filter(test,i == Group)
two <- data.frame(t(combn(one$Val,2)))
test2 <- data.frame(Group = i,Val1=two$X1,Val2=two$X2)
test1 <- rbind(test1,test2)
}
}
test1
Group Val1 Val2
1 A 1 3
2 A 1 6
3 A 1 8
4 A 3 6
5 A 3 8
6 A 6 8
7 B 2 4
8 B 2 5
9 B 2 7
10 B 4 5
11 B 4 7
12 B 5 7
However, this is not elegant and becomes really slow as the number of groups and the length of each group grow. It seems like there should be a more elegant and efficient solution, but so far I have not come across anything on SO.
I would appreciate any ideas!
Here is a data.table approach:
library( data.table )
#make test a data.table
setDT(test)
#split by group
L <- split( test, by = "Group")
#get unique combinations of 2 Vals
L2 <- lapply( L, function(x) {
as.data.table( t( combn( x$Val, m = 2, simplify = TRUE ) ) )
})
#merge them back together
data.table::rbindlist( L2, idcol = "Group" )
# Group V1 V2
# 1: A 1 3
# 2: A 1 6
# 3: A 1 8
# 4: A 3 6
# 5: A 3 8
# 6: A 6 8
# 7: B 2 4
# 8: B 2 5
# 9: B 2 7
#10: B 4 5
#11: B 4 7
#12: B 5 7
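You can set simplify = FALSE in combn() and then use unnest_wider() from tidyr. (In dplyr 1.1.0 and later, summarise() warns when it returns more than one row per group; reframe() is the suggested replacement there.)
library(dplyr)
library(tidyr)
test %>%
group_by(Group) %>%
summarise(Val = combn(Val, 2, simplify = FALSE)) %>%
unnest_wider(Val, names_sep = "_")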
# Group Val_1 Val_2
# <chr> <dbl> <dbl>
# 1 A 1 3
# 2 A 1 6
# 3 A 1 8
# 4 A 3 6
# 5 A 3 8
# 6 A 6 8
# 7 B 2 4
# 8 B 2 5
# 9 B 2 7
# 10 B 4 5
# 11 B 4 7
# 12 B 5 7
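Another option uses gtools::combinations() with purrr:
library(tidyverse)
# n is taken from each group's length instead of being hard-coded to 4
df2 <- split(test$Val, test$Group) %>%
map(~gtools::combinations(n = length(.x), r = 2, v = .x)) %>%
map(~as_tibble(.x, .name_repair = "unique")) %>%
bind_rows(.id = "Group")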
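If no extra packages are wanted, the same per-group pairing can be sketched in base R. This is a minimal illustration rather than one of the posted answers; the names groups, pieces and result are made up, and it assumes every group has at least two values, since combn() treats a single positive number as seq_len() of that number.

```r
test <- data.frame(Group = rep(c("A", "B"), each = 4),
                   Val = c(1, 3, 6, 8, 2, 4, 5, 7))

# One vector of values per group.
groups <- split(test$Val, test$Group)

# For each group, build all pairs and tag them with the group label.
# Assumption: each group holds at least two values.
pieces <- lapply(names(groups), function(g) {
  m <- t(combn(groups[[g]], 2))
  data.frame(Group = g, Val1 = m[, 1], Val2 = m[, 2])
})

# Stack the per-group results into one data frame.
result <- do.call(rbind, pieces)
result
```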

add column to existing column in r

How do I append 2 columns of a data.frame to the bottom of 2 other columns?
I.e.:
Data
A B C D
1 3 5 7
2 4 6 8
to
Data
A B
1 3
2 4
5 7
6 8
You can use rbind
rbind(df[,1:2], data.frame(A = df$C, B = df$D))
You can use a fast version of rbind, rbindlist from data.table:
library(data.table)
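# use.names = FALSE binds by position, since the column pairs have different names
rbindlist(lapply(seq(1, ncol(df), 2), function(i) df[,i:(i+1)]), use.names = FALSE)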
Here is my solution, but it requires changing the column names.
names(dat) <- c("A", "B", "A", "B")
merge(dat[1:2], dat[3:4], all = TRUE)
A B
1 1 3
2 2 4
3 5 7
4 6 8
And here is another, simpler solution.
dat[3:4, ] <- dat[ ,3:4]
dat <- dat[1:2]
dat
A B
1 1 3
2 2 4
3 5 7
4 6 8
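For scalability, here is a function that halves the number of columns of any data frame with an even number of columns and stacks the values. Note that it fills column by column, so here A and B end up in the first column and C and D in the second, which differs from the layout requested in the question:
half <- function(df) {
  m <- as.matrix(df)
  # reshape in column-major order: twice the rows, half the columns
  dim(m) <- c(nrow(df) * 2, ncol(df) / 2)
  nd <- as.data.frame(m)
  names(nd) <- names(df)[seq_len(ncol(nd))]
  nd
}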
half(Data)
A B
1 1 5
2 2 6
3 3 7
4 4 8
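To stack blocks of columns while keeping the layout asked for in the question (C under A, D under B), one possible base R sketch is shown below. It is an illustration only; the names width, starts, blocks and stacked are made up, and it assumes the number of columns is a multiple of width.

```r
Data <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8)

width <- 2  # hypothetical: number of columns in the result

# Assumption: the columns divide evenly into blocks of `width`.
stopifnot(ncol(Data) %% width == 0)

# First column index of each block: 1, 3, ...
starts <- seq(1, ncol(Data), by = width)

# Cut out each block and give it the names of the first block,
# so rbind() can match the columns.
blocks <- lapply(starts, function(i) {
  block <- Data[, i:(i + width - 1), drop = FALSE]
  names(block) <- names(Data)[seq_len(width)]
  block
})

stacked <- do.call(rbind, blocks)
rownames(stacked) <- NULL
stacked
```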

Sorting and numbering can be easier (in R) [duplicate]

First: I want to sort a dataframe and then add a rank to the dataframe.
df <- data.frame(a = 3:1, b = 6:4, Rank = NA) # create dataframe
df <- df[order(df[, 1], df[, 2]), ] # sort dataframe
for (i in seq_len(nrow(df))) df[i, 3] <- i # add the ranking
Second: I want to sort within a group g
df <- data.frame(g = sample(1:4, 4), num = 1:20, Rank = NA)
df <- df[order(df[, 1], df[, 2]), ]
row <- 1
for (x in 1:4) {
rank <- 1
df[row, 3] <- rank # adding the number one to list
row <- row + 1 # move to the next row!
while (row <= nrow(df) && df[row - 1, 1] == df[row, 1]) {
# Continue while rows remain and the current row is in the same group as the previous one
rank <- rank + 1 # adding next to rank!
df[row, 3] <- rank # Put rank in dataframe!
row <- row + 1 # move to next row
}
}
It works, but I would like to accomplish the same tasks with more parsimonious or efficient code.
Try:
set.seed(123)
df = data.frame(g=sample(1:4, 4), num = 1:20, Rank = NA)
library(dplyr)
# sort by g and num explicitly; current dplyr's arrange() ignores grouping by default
df %>% arrange(g, num) %>% group_by(g) %>% mutate(rank = row_number())
Source: local data frame [20 x 4]
Groups: g
g num Rank rank
1 1 3 NA 1
2 1 7 NA 2
3 1 11 NA 3
4 1 15 NA 4
5 1 19 NA 5
6 2 1 NA 1
7 2 5 NA 2
8 2 9 NA 3
9 2 13 NA 4
10 2 17 NA 5
11 3 2 NA 1
12 3 6 NA 2
13 3 10 NA 3
14 3 14 NA 4
15 3 18 NA 5
16 4 4 NA 1
17 4 8 NA 2
18 4 12 NA 3
19 4 16 NA 4
20 4 20 NA 5
Is this what you need?
df = data.frame(g=sample(1:4, 4), num = 1:20, Rank = NA)
df <- df[order(df[,1],df[,2]),]
df$Rank <- rep(1:5,4)
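The line above hard-codes 4 groups of 5 rows. A sketch that does not depend on the group sizes can use base R's ave(), which applies a function within each group and returns the results in the original row order. This is an added illustration, not part of the posted answers.

```r
set.seed(123)
df <- data.frame(g = sample(1:4, 4), num = 1:20)

# Sort by group, then by num within each group.
df <- df[order(df$g, df$num), ]

# seq_along() numbers the rows 1, 2, 3, ... inside each group of g.
df$Rank <- ave(df$num, df$g, FUN = seq_along)
head(df)
```

Since the rows are sorted before ave() is called, the running count within each group is the rank by num.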

Resources