R - split list every x items

I have data to analyse that is presented in the form of a list (just one row and MANY columns).
A B C D E F G H I
1 2 3 4 5 6 7 8 9
Is there a way to tell R to split this list every x items and get something as seen below (the columns C D E F G H I are virtually the same as A B)?
A B
1 2
3 4
5 6
7 8
9

If the number of columns is a multiple of 'x', then we unlist the dataset, and use matrix to create the expected output.
as.data.frame(matrix(unlist(df1), ncol=2, dimnames=list(NULL, c("A", "B")) , byrow=TRUE))
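If the number of columns is not a multiple of 'x', then we build a grouping index with gl, split the values by it, and pad each piece with NA to a common length before binding the rows: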
x <- 2
gr <- as.numeric(gl(ncol(df1), x, ncol(df1)))
lst <- split(unlist(df1), gr)
do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
# A B
# 1 1 2
# 2 3 4
# 3 5 6
# 4 7 8
# 5 9 NA
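Both cases can probably be handled by one small function. The following is only a sketch: the sample df1 is assumed from the question, and split_every is a hypothetical helper name, not something from the answer. It pads the values with NA up to a multiple of x and then fills a matrix row by row.

```r
# Assumed sample data: a one-row data frame like the one in the question
df1 <- data.frame(A = 1, B = 2, C = 3, D = 4, E = 5, F = 6, G = 7, H = 8, I = 9)

# Hypothetical helper: pad with NA up to a multiple of x, then fill by row
split_every <- function(values, x, col_names = NULL) {
  v <- unlist(values, use.names = FALSE)
  n_rows <- ceiling(length(v) / x)
  length(v) <- n_rows * x   # extending the length pads with NA
  as.data.frame(matrix(v, ncol = x, byrow = TRUE,
                       dimnames = list(NULL, col_names)))
}

res <- split_every(df1, 2, c("A", "B"))
res
```

When the number of values is already a multiple of x, no padding happens and the result is the same as the plain matrix call.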

Related

Creating an identifier using pairs of row indices [duplicate]

I would like to generate indices to group observations based on two columns. I want each group to consist of observations that share at least one value in common, directly or through other observations.
In the data below, I want to check if values in 'G1' and 'G2' are connected directly (appear on the same row), or indirectly via other intermediate values. The desired grouping variable is shown in 'g'.
For example, A is directly linked to Z (row 1) and X (row 2). A is indirectly linked to 'B' via X (A -> X -> B), and further linked to Y via X and B (A -> X -> B -> Y).
dt <- data.frame(id = 1:10,
G1 = c("A","A","B","B","C","C","C","D","E","F"),
G2 = c("Z","X","X","Y","W","V","U","s","T","T"),
g = c(1,1,1,1,2,2,2,3,4,4))
dt
# id G1 G2 g
# 1 1 A Z 1
# 2 2 A X 1
# 3 3 B X 1
# 4 4 B Y 1
# 5 5 C W 2
# 6 6 C V 2
# 7 7 C U 2
# 8 8 D s 3
# 9 9 E T 4
# 10 10 F T 4
I tried with group_indices from dplyr, but haven't managed it.
Using igraph, get the component membership, then map it onto the names (df1 here is the question's dt without the g column):
library(igraph)
# convert to graph, and get clusters membership ids
g <- graph_from_data_frame(df1[, c(2, 3, 1)])
myGroups <- components(g)$membership
myGroups
# A B C D E F Z X Y W V U s T
# 1 1 2 3 4 4 1 1 1 2 2 2 3 4
# then map on names
df1$group <- myGroups[df1$G1]
df1
# id G1 G2 group
# 1 1 A Z 1
# 2 2 A X 1
# 3 3 B X 1
# 4 4 B Y 1
# 5 5 C W 2
# 6 6 C V 2
# 7 7 C U 2
# 8 8 D s 3
# 9 9 E T 4
# 10 10 F T 4
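If igraph is not available, the same grouping can be approximated in base R. This is a swapped-in technique (iterative label propagation), not the igraph method above, and connected_groups is a hypothetical helper name. It assumes that G1 and G2 are separate value spaces, that is, a value in G1 is never treated as the same node as an equal value in G2, which holds for the sample data.

```r
dt <- data.frame(id = 1:10,
                 G1 = c("A","A","B","B","C","C","C","D","E","F"),
                 G2 = c("Z","X","X","Y","W","V","U","s","T","T"))

# Hypothetical helper: every row starts with its own label; rows sharing a
# G1 or G2 value repeatedly take the smallest label among them until
# nothing changes
connected_groups <- function(a, b) {
  lab <- seq_along(a)
  repeat {
    new <- ave(lab, a, FUN = min)   # propagate within each G1 value
    new <- ave(new, b, FUN = min)   # propagate within each G2 value
    if (all(new == lab)) break
    lab <- new
  }
  match(lab, unique(lab))           # renumber labels as 1, 2, 3, ...
}

dt$group <- connected_groups(dt$G1, dt$G2)
dt
```

Labels can only decrease, so the loop stops once each connected set of rows carries a single label.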

I want to create a new CSV from an existing CSV that has the same column repeated several times with unsorted data

I have a CSV with these data:
List Rank.A List Rank.B List Rank.C
a 4 a 8 b 3
b 5 e 5 e 9
c 7 f 5 r 1
I want to create a new CSV with a single column named List holding the unique values, plus three more columns "Rank.A", "Rank.B", and "Rank.C". If a List value has no entry for a given rank, that cell should be blank. I want the data in this format:
List Rank.A Rank.B Rank.C
a    4      8
b    5             3
c    7
e           5      9
f           5
r                  1
Can you please help me in that?
A base R option using split.default (to split your data.frame by columns) and Reduce + merge to combine data into a single data.frame.
Reduce(
function(x, y) merge(x, y, all = TRUE),
split.default(df, rep(1:(ncol(df) / 2), each = 2)))
# List Rank.A Rank.B Rank.C
# 1 a 4 8 NA
# 2 b 5 NA 3
# 3 c 7 NA NA
# 4 e NA 5 9
# 5 f NA 5 NA
# 6 r NA NA 1
Note that this assumes that you always have pairs of columns (List, Rank.x) in your original data.
Sample data
df <- read.table(text =
"List Rank.A List Rank.B List Rank.C
a 4 a 8 b 3
b 5 e 5 e 9
c 7 f 5 r 1", header = TRUE, check.names = FALSE)
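The question asks for a new CSV with blanks instead of NA. A minimal sketch of that last step, assuming the merged result is stored in res and written to a temporary file (out_file is a hypothetical path, not from the question):

```r
df <- read.table(text = "List Rank.A List Rank.B List Rank.C
a 4 a 8 b 3
b 5 e 5 e 9
c 7 f 5 r 1", header = TRUE, check.names = FALSE)

# Merge the (List, Rank.x) pairs as in the answer above
res <- Reduce(function(x, y) merge(x, y, all = TRUE),
              split.default(df, rep(seq_len(ncol(df) / 2), each = 2)))

# Write missing values as empty cells rather than "NA"
out_file <- tempfile(fileext = ".csv")   # hypothetical output path
write.csv(res, out_file, na = "", row.names = FALSE)
csv_lines <- readLines(out_file)
csv_lines
```

The na argument of write.csv controls the string used for missing values, so na = "" leaves those cells empty.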

Reorder a subset of an R data.frame modifying the row names as well

Given a data.frame:
foo <- data.frame(ID=1:10, x=1:10)
rownames(foo) <- LETTERS[1:10]
I would like to reorder a subset of rows, defined by their row names. However, I would like to swap the row names of foo as well. I can do
sel <- c("D", "H") # rows to reorder
foo[sel,] <- foo[rev(sel),]
sel.wh <- match(sel, rownames(foo))
rownames(foo)[sel.wh] <- rownames(foo)[rev(sel.wh)]
but that is long and complicated. Is there a simpler way?
We can replace the sel values in rownames with the reverse of sel.
x <- rownames(foo)
foo[replace(x, x %in% sel, rev(sel)), ]
# ID x
#A 1 1
#B 2 2
#C 3 3
#H 8 8
#E 5 5
#F 6 6
#G 7 7
#D 4 4
#I 9 9
#J 10 10
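The replace call above assumes that sel lists the rows in the order in which they appear in the data frame. A sketch that reverses positions instead, so the order of sel does not matter (swap_rows is a hypothetical helper name):

```r
foo <- data.frame(ID = 1:10, x = 1:10)
rownames(foo) <- LETTERS[1:10]
sel <- c("D", "H")

# Hypothetical helper: reverse the positions of the selected rows,
# carrying the row names along with the data
swap_rows <- function(df, sel) {
  rn <- rownames(df)
  pos <- which(rn %in% sel)   # positions of the selected rows
  rn[pos] <- rn[rev(pos)]     # reversed order among those positions
  df[rn, , drop = FALSE]      # index by name, so names move with the data
}

res <- swap_rows(foo, sel)
rownames(res)
```

Indexing by row name is what keeps each name attached to its original values.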
Not as concise as ronak-shah's answer, but you could also use order.
# extract row names
temp <- row.names(foo)
# swap the selected row names within the vector
temp[which(temp %in% sel)] <- temp[rev(which(temp %in% sel))]
# reset order of data.frame
foo[order(temp),]
ID x
A 1 1
B 2 2
C 3 3
H 8 8
E 5 5
F 6 6
G 7 7
D 4 4
I 9 9
J 10 10
As noted in the comments, this relies on the row names following a lexicographical order. In instances where this is not true, we can use match.
# set up
set.seed(1234)
foo <- data.frame(ID=1:10, x=1:10)
row.names(foo) <- sample(LETTERS[1:10])
sel <- c("D", "H")
Now, the rownames are
# initial data.frame
foo
ID x
B 1 1
F 2 2
E 3 3
H 4 4
I 5 5
D 6 6
A 7 7
G 8 8
J 9 9
C 10 10
# grab row names
temp <- row.names(foo)
# reorder vector containing row names
temp[which(temp %in% sel)] <- temp[rev(which(temp %in% sel))]
Using match along with order:
foo[order(match(row.names(foo), temp)),]
ID x
B 1 1
F 2 2
E 3 3
D 6 6
I 5 5
H 4 4
A 7 7
G 8 8
J 9 9
C 10 10
Your data frame is small, so you can duplicate it and then overwrite each row from the copy:
footmp <- data.frame(foo)
foo[4, ] <- footmp[8, ]
foo[8, ] <- footmp[4, ]
Bob

add column to existing column in r

How do I stack two columns of a data.frame underneath two other columns?
I.e.:
Data
A B C D
1 3 5 7
2 4 6 8
to
Data
A B
1 3
2 4
5 7
6 8
You can use rbind
rbind(df[,1:2], data.frame(A = df$C, B = df$D))
You can use rbindlist from data.table, a fast version of rbind; here the column pairs are bound by position:
library(data.table)
rbindlist(lapply(seq(1, ncol(df), 2), function(i) df[,i:(i+1)]), use.names = FALSE)
Here is my solution, but it requires changing the names of the columns.
names(dat) <- c("A", "B", "A", "B")
merge(dat[1:2], dat[3:4], all = TRUE)
A B
1 1 3
2 2 4
3 5 7
4 6 8
And here is another, simpler solution.
dat[3:4, ] <- dat[ ,3:4]
dat <- dat[1:2]
dat
A B
1 1 3
2 2 4
3 5 7
4 6 8
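For scalability, here is a function that halves any data frame with an even number of columns and appends the rows. Note that it reshapes in column-major order, so the result below differs from the layout requested in the question (A holds 1 to 4 rather than 1, 2, 5, 6):
half <- function(df) {
  m <- as.matrix(df)
  # reinterpret the same values as twice the rows and half the columns
  dim(m) <- c(nrow(df) * 2, ncol(df) / 2)
  nd <- as.data.frame(m)
  names(nd) <- names(df)[seq_len(ncol(nd))]
  nd
}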
half(Data)
A B
1 1 5
2 2 6
3 3 7
4 4 8
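A base R sketch that generalises the rbind approach to any number of column blocks while keeping the row layout asked for in the question. The name stack_blocks and the block width argument k are hypothetical, and the sample Data is assumed from the question.

```r
Data <- data.frame(A = 1:2, B = 3:4, C = 5:6, D = 7:8)

# Hypothetical helper: cut the columns into consecutive blocks of width k,
# rename each block after the first one, and stack the blocks
stack_blocks <- function(df, k) {
  stopifnot(ncol(df) %% k == 0)
  starts <- seq(1, ncol(df), by = k)
  blocks <- lapply(starts, function(i) {
    setNames(df[i:(i + k - 1)], names(df)[seq_len(k)])
  })
  out <- do.call(rbind, blocks)
  rownames(out) <- NULL
  out
}

res <- stack_blocks(Data, 2)
res
```

Renaming each block before rbind is needed because rbind on data frames matches columns by name.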

Convert a matrix with dimnames into a long format data.frame

Hoping there's a simple answer here but I can't find it anywhere.
I have a numeric matrix with row names and column names:
# 1 2 3 4
# a 6 7 8 9
# b 8 7 5 7
# c 8 5 4 1
# d 1 6 3 2
I want to melt the matrix to a long format, with the values in one column and matrix row and column names in one column each. The result could be a data.table or data.frame like this:
# col row value
# 1 a 6
# 1 b 8
# 1 c 8
# 1 d 1
# 2 a 7
# 2 c 5
# 2 d 6
...
Any tips appreciated.
Use melt from reshape2:
library(reshape2)
#Fake data
x <- matrix(1:12, ncol = 3)
colnames(x) <- letters[1:3]
rownames(x) <- 1:4
x.m <- melt(x)
x.m
Var1 Var2 value
1 1 a 1
2 2 a 2
3 3 a 3
4 4 a 4
...
The as.table and as.data.frame functions together will do this:
> m <- matrix( sample(1:12), nrow=4 )
> dimnames(m) <- list( One=letters[1:4], Two=LETTERS[1:3] )
> as.data.frame( as.table(m) )
One Two Freq
1 a A 7
2 b A 2
3 c A 1
4 d A 5
5 a B 9
6 b B 6
7 c B 8
8 d B 10
9 a C 11
10 b C 12
11 c C 3
12 d C 4
Assuming 'm' is your matrix...
data.frame(col = rep(colnames(m), each = nrow(m)),
row = rep(rownames(m), ncol(m)),
value = as.vector(m))
This executes extremely fast on a large matrix and also shows you a bit about how a matrix is made, how to access things in it, and how to construct your own vectors.
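To make the storage point concrete, here is the same construction applied to the matrix from the question. This is a sketch; the matrix is rebuilt by hand from the values shown there. R stores a matrix column by column, so as.vector(m) walks down the first column, then the second, and so on, which is why the column names use each = nrow(m) and the row names are simply recycled.

```r
# Matrix from the question, entered column by column
m <- matrix(c(6, 8, 8, 1,
              7, 7, 5, 6,
              8, 5, 4, 3,
              9, 7, 1, 2),
            nrow = 4,
            dimnames = list(letters[1:4], as.character(1:4)))

long <- data.frame(col = rep(colnames(m), each = nrow(m)),
                   row = rep(rownames(m), ncol(m)),
                   value = as.vector(m))
head(long, 4)
```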
A modification that doesn't require you to know anything about the storage structure, and that extends easily to higher-dimensional arrays if you use the dimnames and slice.index functions:
data.frame(row=rownames(m)[as.vector(row(m))],
col=colnames(m)[as.vector(col(m))],
value=as.vector(m))
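For an array with more than two dimensions, slice.index plays the role of row and col. A minimal sketch with an assumed 2 x 3 x 4 example array; the names arr, dn, and long and the column names d1 to d3 are made up for illustration:

```r
arr <- array(1:24, dim = c(2, 3, 4),
             dimnames = list(letters[1:2], LETTERS[1:3], paste0("k", 1:4)))
dn <- dimnames(arr)

# slice.index(arr, i) gives, for every cell, its index along dimension i
long <- data.frame(
  d1 = dn[[1]][as.vector(slice.index(arr, 1))],
  d2 = dn[[2]][as.vector(slice.index(arr, 2))],
  d3 = dn[[3]][as.vector(slice.index(arr, 3))],
  value = as.vector(arr)
)
head(long)
```

Each row of long then identifies one cell of the array by its three dimension names.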
