How to randomize the order of all sublists simultaneously in R

I am looking to randomize the order of the sublists, but retaining the structure. To illustrate, I can do this with a data frame:
df1 <- data.frame("X1" = LETTERS[1:5], "X2" = letters[1:5])
df1
df1R <- df1[sample(nrow(df1)),]
df1R
> df1
X1 X2
1 A a
2 B b
3 C c
4 D d
5 E e
>
> df1R <- df1[sample(nrow(df1)),]
> df1R
X1 X2
2 B b
5 E e
1 A a
3 C c
4 D d
You can see here that the overall order is randomised but the rows remain together. This is what I mean by retaining the structure: A stays with a, B stays with b, and so on.
I'd like to implement this for a list:
m1 <- list(LETTERS[1:5], letters[1:5])
But I'm stuck on how to do it. I've had a good look around but have not found a solution. Any advice?
The result would look like:
> m1R
[[1]]
[1] "B" "C" "E" "A" "D"
[[2]]
[1] "b" "c" "e" "a" "d"

You could draw one random permutation of the positions and apply it to every sublist:
neworder <- sample.int(5)
lapply(m1, function(x) x[neworder])
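As a minimal sketch of the same idea, the permutation length can be taken from the data rather than hard-coded as 5. This assumes every sublist has the same length as the first one; the name m1R is taken from the question.

```r
m1 <- list(LETTERS[1:5], letters[1:5])

# One permutation of positions, sized from the data instead of hard-coded.
# Assumption: all sublists have the same length as the first.
neworder <- sample.int(length(m1[[1]]))

# Apply the same permutation to each sublist so the pairs stay aligned.
m1R <- lapply(m1, function(x) x[neworder])

# The pairing survives: lower-casing the first sublist reproduces the second.
identical(tolower(m1R[[1]]), m1R[[2]])
```

Because a single index vector is reused for every sublist, "A" always ends up at the same position as "a", whatever order sample.int happens to draw.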

Related

splitting vector every two indices

Given vector of N elements:
LETTERS[1:10]
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
How can one get a data.table/frame (df) as follows?
>df
one two
A B
C D
E F
G H
I J
EDIT
Generalizing I would like to know given a vector to split as follows:
[A B C],[D E],[F G H I J]
and obtaining:
V1 V2 V3 V4 V5
A B C NA NA
D E NA NA NA
F G H I J
One option is the matrix way
as.data.frame(matrix(LETTERS[1:10], ncol=2,byrow=TRUE,
dimnames = list(NULL, c('one', 'two'))), stringsAsFactors=FALSE)
# one two
#1 A B
#2 C D
#3 E F
#4 G H
#5 I J
If we need to create an index, we can use gl to build the grouping, split the vector by it, and rbind the pieces
do.call(rbind, split(v1, as.integer(gl(length(v1), 2, length(v1)))))
where
v1 <- LETTERS[1:10]
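To make the grouping index visible, here is a small sketch of what the gl call produces. The ceiling-based index and the names idx_gl, idx_alt and pairs are only illustrative and are not part of the original answer.

```r
v1 <- LETTERS[1:10]

# gl() repeats each level twice; truncated to length(v1) this labels
# consecutive pairs with the same group number.
idx_gl <- as.integer(gl(length(v1), 2, length(v1)))

# An equivalent index built with plain arithmetic (illustrative alternative).
idx_alt <- ceiling(seq_along(v1) / 2)
all(idx_gl == idx_alt)

# Split by the index and stack the pieces as rows.
pairs <- do.call(rbind, split(v1, idx_gl))
colnames(pairs) <- c("one", "two")
as.data.frame(pairs, stringsAsFactors = FALSE)
```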
Update
Based on the update in OP's post
lst <- split(v1, rep(1:3, c(3, 2, 5)))
do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
# [,1] [,2] [,3] [,4] [,5]
#1 "A" "B" "C" NA NA
#2 "D" "E" NA NA NA
#3 "F" "G" "H" "I" "J"
Or otherwise
library(stringi)
stri_list2matrix(lst, byrow = TRUE)
Update2
If the split positions are supplied in a vector 'splitVec'
lst <- split(v1, cumsum(seq_along(v1) %in% splitVec))
and then proceed as above
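A sketch of the full path with a concrete splitVec follows. The values c(4, 6) are an assumption chosen to reproduce the [A B C], [D E], [F G H I J] grouping from the question; here splitVec is taken to hold the positions at which a new group starts.

```r
v1 <- LETTERS[1:10]

# Assumed meaning of splitVec: positions where a new group begins.
splitVec <- c(4, 6)

# TRUE at each start position; cumsum turns that into a group label.
grp <- cumsum(seq_along(v1) %in% splitVec)
lst <- split(v1, grp)

# Pad every piece with NA up to the longest piece, then stack as rows.
res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
res
```

The first group is labelled 0 because no start position has been passed yet, so the row names of res are "0", "1" and "2".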

how to get the first priority by applying rules into a data frame in r

I have a data frame including different levels of choices:
df = read.table(text="Index V1 V2 V3 V4 V5
1 A A A B A
2 B B B B B
3 B C C B B
4 B B C D E
5 B B C C D
6 A B B B B
7 C C B D D
8 A B C D E", header=T, stringsAsFactors=F)
I would like to create another column that holds the most frequent choice in each row. If a row contains more than one choice, take the one with the maximum number of occurrences. If several choices share that maximum, take the first of them. So my expected result is:
Index V1 V2 V3 V4 V5 final
1 A A A B A A
2 B B B B B B
3 B C C B B B
4 B B C D E B
5 B B C C D B
6 A B B B B B
7 C C B D D C
8 A B C D E A
Thanks for your help.
apply(df[,-1], 1, function(x)
x[which.max(ave(rep(1, length(x)), x, FUN = sum))] )
#[1] "A" "B" "B" "B" "B" "B" "C" "A"
df[7,2:6] = c("D", "C", "B", "C", "D")
apply(df[,-1], 1, function(x)
x[which.max(ave(rep(1, length(x)), x, FUN = sum))] )
#[1] "A" "B" "B" "B" "B" "B" "D" "A"
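We can do this by finding the frequency of the values in each row with table. Loop through the rows of the dataset, excluding the first column (apply with MARGIN = 1), get the frequencies with table, find the index of the maximum frequency (which.max), and take the name that corresponds to it. Setting the factor levels to unique(x) keeps the values in order of first appearance, so which.max, which returns the first maximum, picks the earliest choice when there is a tie.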
df$final <- apply(df[-1], 1, FUN = function(x) {
tbl <- table(factor(x, levels = unique(x)))
names(tbl)[which.max(tbl)]})
df$final
#[1] "A" "B" "B" "B" "B" "B" "C" "A"
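The role of levels = unique(x) can be checked on a single tied row. In this sketch the helper name first_mode is hypothetical, and the test row is the modified row 7 used earlier in this thread.

```r
# Hypothetical helper wrapping the table-based approach for one row.
first_mode <- function(x) {
  # Levels in order of first appearance, so ties resolve to the earliest value.
  tbl <- table(factor(x, levels = unique(x)))
  names(tbl)[which.max(tbl)]
}

x <- c("D", "C", "B", "C", "D")   # D and C both occur twice

first_mode(x)                     # "D": D appears before C in the row

# Without the factor step, table() sorts the names alphabetically,
# so the tie is resolved in favour of "C" instead.
names(which.max(table(x)))        # "C"
```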

"lapply" in R does not work for each element

test.data <- data.frame(a=seq(10),b=rep(seq(5),times=2),c=rep(seq(5),each=2))
test.data <- data.frame(lapply(test.data, as.character), stringsAsFactors = F)
test.ref <- data.frame(original=seq(10),name=letters[1:10])
test.ref <- data.frame(lapply(test.ref, as.character), stringsAsFactors = F)
test.match <- function (x) {
result = test.ref$name[which(test.ref$original == x)]
return(result)
}
> data.frame(lapply(test.data, test.match))
a b c
1 a a a
2 b b a
3 c c a
4 d d a
5 e e a
6 f a a
7 g b a
8 h c a
9 i d a
10 j e a
> lapply(test.data, test.match)
$a
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
$b
[1] "a" "b" "c" "d" "e"
$c
[1] "a"
Hi all,
I am learning to use the apply family in R. However, I am stuck on a rather simple exercise. Above is my code. I am trying to use the "test.match" function to replace all the elements in "test.data" according to the reference rule in "test.ref". However, the last column comes out wrong if I turn the final result into a data frame. It is even worse if I keep the result as a list.
Many thanks for your help,
Kevin
As mentioned in the comments, you probably want match:
do.test.match.df <- function(df, ref_df = test.ref){
res <- df
res[] <- lapply(df, function(x) ref_df$name[ match(x, ref_df$original) ])
return(res)
}
do.test.match.df(test.data)
which gives
a b c
1 a a a
2 b b a
3 c c b
4 d d b
5 e e c
6 f a c
7 g b d
8 h c d
9 i d e
10 j e e
This is the idiomatic way. lapply will always return a vanilla list. A data.frame is a special kind of list (a list of column vectors). With res[] <- lapply(df, myfun), we're assigning to columns of res.
Since all your columns are the same class, I'd suggest using a matrix instead of a data.frame.
test.mat <- as.matrix(test.data)
do.test.match <- function(mat, ref_df=test.ref){
res <- matrix(NA_character_, nrow(mat), ncol(mat))
res[] <- ref_df$name[ match( c(mat), ref_df$original ) ]
return(res)
}
do.test.match(test.mat)
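To see why the original function misbehaves, it may help to compare == and match on one column. This is a small sketch with stand-alone vectors named original, name and x that mirror test.ref and column c of the question.

```r
original <- as.character(1:10)
name <- letters[1:10]
x <- as.character(rep(1:5, each = 2))   # same values as column c

# `==` compares position by position, so it only finds places where
# x and original happen to hold the same value at the same index.
which(original == x)                    # 1
name[which(original == x)]              # "a"

# match() looks up every element of x in original, which is the
# lookup-table behaviour the question is after.
name[match(x, original)]
```

Only the first position coincides, which explains the single "a" in the list output and the recycled column of "a" in the data frame.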

How to add a list to a data frame in R?

I have 2 tables as below:
a = read.table(text=' a b
1 c
1 d
2 c
2 a
2 b
3 a
', head=T)
b = read.table(text=' a c
1 x i
2 y j
3 z k
', head=T)
And I want result to be like this:
1 x i c d
2 y j c a b
3 z k a
Originally I thought of using tapply to collapse the values per group (e.g. aa = tapply(a[,2], a[,1], function(x) paste(x,collapse=","))) and then appending the result back to table b, but I got stuck...
Any suggestions on how to do this?
Thanks a million.
One way to do it:
mapply(FUN = c,
lapply(split(b, row.names(b)), function(x) as.character(unlist(x, use.names = FALSE))),
split(as.character(a$b), a$a),
SIMPLIFY = FALSE)
# $`1`
# [1] "x" "i" "c" "d"
#
# $`2`
# [1] "y" "j" "c" "a" "b"
#
# $`3`
# [1] "z" "k" "a"
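If a single collapsed column on table b is acceptable, the tapply idea from the question can be finished roughly as follows. This is only a sketch: the column name d is made up for illustration, and the tables are rebuilt with data.frame so that the block runs on its own.

```r
a <- data.frame(a = c(1, 1, 2, 2, 2, 3),
                b = c("c", "d", "c", "a", "b", "a"),
                stringsAsFactors = FALSE)
b <- data.frame(a = c("x", "y", "z"),
                c = c("i", "j", "k"),
                row.names = c("1", "2", "3"),
                stringsAsFactors = FALSE)

# Collapse the values of a$b within each group of a$a.
collapsed <- tapply(a$b, a$a, paste, collapse = ",")

# Look the groups up by b's row names so the order cannot drift,
# and drop the array attributes before storing the column.
b$d <- as.vector(collapsed[rownames(b)])
b
```

Indexing by row name rather than by position means the result stays aligned even if b is not sorted the same way as the groups in a.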

Column Split without repeat

I have a dataframe with one column that I would like to split into several columns, but the number of splits is dynamic throughout the rows.
Var1
====
A/B
A/B/C
C/B
A/C/D/E
I have tried using colsplit(df$Var1,split="/",names=c("Var1","Var2","Var3","Var4")), but in rows with fewer than 4 values the values are repeated to fill the remaining columns.
From Hansi, the desired output would be:
Var1 Var2 Var3 Var4
[1,] "A" "B" NA NA
[2,] "A" "B" "C" NA
[3,] "C" "B" NA NA
[4,] "A" "C" "D" "E"
> read.table(text=as.character(df$Var1), sep="/", fill=TRUE)
V1 V2 V3 V4
1 A B
2 A B C
3 C B
4 A C D E
Leading zeros in digit-only fields can be preserved with colClasses="character":
a <- data.frame(Var1=c("01/B","04/B/C","0098/B","8708/C/D/E"))
read.table(text=as.character(a$Var1), sep="/", fill=TRUE, colClasses="character")
V1 V2 V3 V4
1 01 B
2 04 B C
3 0098 B
4 8708 C D E
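The output above pads short rows with empty strings rather than the NA values shown in the desired output. One possible adjustment, sketched here under the assumption that an empty field should always count as missing, is to pass na.strings = "". The names x and out are illustrative.

```r
x <- c("A/B", "A/B/C", "C/B", "A/C/D/E")

# fill = TRUE pads short rows; na.strings = "" turns the padding into NA.
out <- read.table(text = x, sep = "/", fill = TRUE,
                  na.strings = "", colClasses = "character")
out

# Cells that were padded are now missing values.
is.na(out[1, 3])
```

Note that read.table decides the number of columns from the first five lines of input, so with longer data where the widest row comes later, supplying col.names explicitly is the safer route.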
If I understood your objective correctly, here is one possible solution. I'm sure there is a better way of doing it, but this was the first that came to mind:
a <- data.frame(Var1=c("A/B","A/B/C","C/B","A/C/D/E"))
splitNames <- c("Var1","Var2","Var3","Var4")
# R> a
# Var1
# 1 A/B
# 2 A/B/C
# 3 C/B
# 4 A/C/D/E
b <- t(apply(a,1,function(x){
temp <- unlist(strsplit(x,"/"));
return(c(temp,rep(NA,max(0,length(splitNames)-length(temp)))))
}))
colnames(b) <- splitNames
# R> b
# Var1 Var2 Var3 Var4
# [1,] "A" "B" NA NA
# [2,] "A" "B" "C" NA
# [3,] "C" "B" NA NA
# [4,] "A" "C" "D" "E"
I do not know of a function that solves your problem directly, but you can achieve it easily with standard R commands:
# Here are your data
df <- data.frame(Var1=c("A/B", "A/B/C", "C/B", "A/C/D/E"), stringsAsFactors=FALSE)
# Split
rows <- strsplit(df$Var1, split="/")
# Maximum amount of columns
columnCount <- max(sapply(rows, length))
# Fill with NA
rows <- lapply(rows, `length<-`, columnCount)
# Coerce to data.frame
out <- as.data.frame(rows)
# Transpose
out <- t(out)
As it relies on strsplit, every value comes back as character, so you may need to do some type conversion. See type.con
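The same steps can also be sketched by binding the padded pieces directly as rows, which skips the data frame and transpose steps and gives clean column names. The Var1 to Var4 names are generated with paste0 here as an assumption about the desired labels.

```r
df <- data.frame(Var1 = c("A/B", "A/B/C", "C/B", "A/C/D/E"),
                 stringsAsFactors = FALSE)

# Split each string on the literal "/" separator.
rows <- strsplit(df$Var1, split = "/", fixed = TRUE)

# Pad every piece with NA up to the longest one.
columnCount <- max(lengths(rows))
rows <- lapply(rows, `length<-`, columnCount)

# Stack the pieces as rows of a character matrix and label the columns.
out <- do.call(rbind, rows)
colnames(out) <- paste0("Var", seq_len(columnCount))
out
```

The result is a character matrix; wrapping it in as.data.frame gives a data frame if that is what the later steps expect.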
