Data manipulation in R merge columns - r

I have a column A in a data set: 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5,
and a data set B: 1, 2, 3, 4, 5.
Is there a way to assign the values of B to the corresponding values of A?
The desired result is:
A B C
1 v v
1 b v
2 n b
2 m b
3 k n
4   m
4   m
4   m
4   m
5   k
5   k
5 k

You could try
C <- B[A]
#> C
# [1] "v" "v" "b" "b" "n" "m" "m" "m" "m" "k" "k"
If you want to store this result in a data frame, you could use
length(B) <- length(A) # adapt the length of column B to that of column A
df <- cbind(A, B, C) # generate a matrix with three columns
df[is.na(df)] <- "" # remove the NA entries in column B (replace them with
# an empty string) in the rows where it is not defined
df <- as.data.frame(df) # convert the matrix into a data frame
#> df
#    A B C
#1   1 v v
#2   1 b v
#3   2 n b
#4   2 m b
#5   3 k n
#6   4   m
#7   4   m
#8   4   m
#9   4   m
#10  5   k
#11  5   k
data
A <- c(1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5)
B <- c("v", "b", "n", "m", "k")
However, if you already have the columns A and B stored in a data frame and you only need to generate column C, you could obtain this result using df$C <- with(df, B[A])
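Note that cbind() on a numeric vector and character vectors produces a character matrix, so A is stored as text after the conversion above. A minimal sketch, assuming you would rather keep A numeric, builds the data frame directly with data.frame() and then derives C by index lookup. The name B_padded is illustrative and not part of the original answer.

```r
A <- c(1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5)
B <- c("v", "b", "n", "m", "k")

# Pad B with empty strings so it has the same length as A
B_padded <- c(B, rep("", length(A) - length(B)))

# data.frame() keeps each column's own type, so A stays numeric
df <- data.frame(A = A, B = B_padded, stringsAsFactors = FALSE)

# Look up the A-th element of B for every row
df$C <- with(df, B[A])
df
```

Because A is still numeric here, B[A] indexes by position; with a character A it would index by name and return NA.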

Related

Sort matrix by colnames from another matrix

I have two matrices with the same dimensions. Both have the same stock names as colnames, but in a different order.
I would like to sort the matrix "A" by the colnames of the matrix "B",
so that the colnames of A and the corresponding values are in the same order as the colnames of B.
How can I do this?
Example:
Kind Regards
Your example in R terms would be
A <- matrix(c(1, 4, 2), nrow = 1)
colnames(A) <- c("B", "D", "E")
A
# B D E
# [1,] 1 4 2
B <- matrix(c(2, 5, 1), nrow = 1)
colnames(B) <- c("E", "B", "D")
B
# E B D
# [1,] 2 5 1
Then we may simply subset the columns of A in the same order as they are in B:
A[, colnames(B)]
# E B D
# 2 1 4
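With a one-row matrix, the subset above is simplified to a named vector, as the printed result shows. A small sketch, assuming you want to keep a matrix and also guard against mismatched names, adds drop = FALSE and a check; A_sorted is an illustrative name.

```r
A <- matrix(c(1, 4, 2), nrow = 1, dimnames = list(NULL, c("B", "D", "E")))
B <- matrix(c(2, 5, 1), nrow = 1, dimnames = list(NULL, c("E", "B", "D")))

# Fail early if the two matrices do not share the same column names
stopifnot(setequal(colnames(A), colnames(B)))

# drop = FALSE keeps the result a matrix even when there is a single row
A_sorted <- A[, colnames(B), drop = FALSE]
A_sorted
```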

R-Converting Incidence matrix(csv file) to edge list format

I am studying social network analysis and will be using Ucinet to draw network graphs. For this, I have to convert the csv file to an edge list format. Converting the adjacency matrix to the edge list was successful. However, it is difficult to convert an incidence matrix to the edge list format.
The csv file ('some.csv') I have contains an incidence matrix like this:
A B C D
a 1 0 3 1
b 0 0 0 2
c 3 2 0 1
The code that converted the adjacency matrix to the edge list was as follows:
x<-read.csv("C:/.../something.csv", header=T, row.names=1)
net<-as.network(x, matrix.type='adjacency', ignore.eval=FALSE, names.eval='dd', loops=FALSE)
el<-edgelist(net, attrname='dd')
write.csv(el, file='C:/.../result.csv')
So far I have only succeeded in loading the file. I tried to follow the method above, but I get an error.
y<-read.csv("C:/.../some.csv", header=T, row.names=1)
net2<-network(y, matrix.type='incidence', ignore.eval=FALSE, names.eval='co', loops=FALSE)
Error in network.incidence(x, g, ignore.eval, names.eval, na.rm, edge.check) :
Supplied incidence matrix has empty head/tail lists. (Did you get the directedness right?)
I want to see the result in this way:
a A 1
a C 3
a D 1
b D 2
c A 3
c B 2
c D 1
I tried to set the values as the error message suggested, but I could not get the result I wanted.
Thank you for any assistance with this.
Here's your data:
inc_mat <- matrix(
c(1, 0, 3, 1,
0, 0, 0, 2,
3, 2, 0, 1),
nrow = 3, ncol = 4, byrow = TRUE
)
rownames(inc_mat) <- letters[1:3]
colnames(inc_mat) <- LETTERS[1:4]
inc_mat
#> A B C D
#> a 1 0 3 1
#> b 0 0 0 2
#> c 3 2 0 1
Here's a generalized function that does the trick:
as_edgelist.weighted_incidence_matrix <- function(x, drop_rownames = TRUE) {
  # 3-column matrix of row index, column index, and `x`'s values
  melted <- do.call(cbind, lapply(list(row(x), col(x), x), as.vector))
  # drop rows where column 3 is 0 (drop = FALSE keeps a matrix even if only one row remains)
  filtered <- melted[melted[, 3] != 0, , drop = FALSE]
  # data frame where the first 2 columns are...
  df <- data.frame(mode1 = rownames(x)[filtered[, 1]],  # `x`'s rownames, indexed by the first column in `filtered`
                   mode2 = colnames(x)[filtered[, 2]],  # `x`'s colnames, indexed by the second column in `filtered`
                   weight = filtered[, 3],              # the third column in `filtered`
                   stringsAsFactors = FALSE)
  out <- df[order(df$mode1), ]  # sort by first column
  if (!drop_rownames) {
    return(out)
  }
  `rownames<-`(out, NULL)
}
Take it for a spin:
el <- as_edgelist.weighted_incidence_matrix(inc_mat)
el
#> mode1 mode2 weight
#> 1 a A 1
#> 2 a C 3
#> 3 a D 1
#> 4 b D 2
#> 5 c A 3
#> 6 c B 2
#> 7 c D 1
Here are the results you wanted:
control_df <- data.frame(
mode1 = c("a", "a", "a", "b", "c", "c", "c"),
mode2 = c("A", "C", "D", "D", "A", "B", "D"),
weight = c(1, 3, 1, 2, 3, 2, 1),
stringsAsFactors = FALSE
)
control_df
#> mode1 mode2 weight
#> 1 a A 1
#> 2 a C 3
#> 3 a D 1
#> 4 b D 2
#> 5 c A 3
#> 6 c B 2
#> 7 c D 1
Do they match?
identical(control_df, el)
#> [1] TRUE
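The same row index, column index, and value triple can also be obtained with which(..., arr.ind = TRUE). The following is a sketch of that alternative, not the function from the answer above; the names idx and el_alt are illustrative.

```r
inc_mat <- matrix(
  c(1, 0, 3, 1,
    0, 0, 0, 2,
    3, 2, 0, 1),
  nrow = 3, ncol = 4, byrow = TRUE,
  dimnames = list(letters[1:3], LETTERS[1:4])
)

# Two-column matrix ("row", "col") of the positions of the non-zero cells
idx <- which(inc_mat != 0, arr.ind = TRUE)

el_alt <- data.frame(
  mode1  = rownames(inc_mat)[idx[, "row"]],
  mode2  = colnames(inc_mat)[idx[, "col"]],
  weight = inc_mat[idx],  # a two-column index matrix picks one cell per row
  stringsAsFactors = FALSE
)

# Sort by row name, then column name, and reset the row names
el_alt <- el_alt[order(el_alt$mode1, el_alt$mode2), ]
rownames(el_alt) <- NULL
el_alt
```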
This might not be the most efficient way, but it produces the expected result:
y <- matrix( c(1,0,3,0,0,2,3,0,0,1,2,1), nrow=3)
colnames(y) <- c("e.A","e.B","e.C","e.D")
dt <- data.frame(rnames=c("a","b","c"))
dt <- cbind(dt, y)
# rnames e.A e.B e.C e.D
#1 a 1 0 3 1
#2 b 0 0 0 2
#3 c 3 2 0 1
# use the reshape() function to convert the data frame into long format
M <- reshape(dt, direction="long", idvar = "rnames", varying = c("e.A","e.B","e.C","e.D"))
M <- M[M$e >0,]
M
# rnames time e
# a.A a A 1
# c.A c A 3
# c.B c B 2
# a.C a C 3
# a.D a D 1
# b.D b D 2
# c.D c D 1
# If M needs to be sorted by the column rnames:
M[order(M$rnames), ]
# rnames time e
# a.A a A 1
# a.C a C 3
# a.D a D 1
# b.D b D 2
# c.A c A 3
# c.B c B 2
# c.D c D 1
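If the data is already a matrix with dimnames, another base R route to the long format is as.data.frame(as.table(...)). This is a sketch under that assumption, offered as an alternative to reshape(); the column names mode1, mode2, and weight are chosen here for illustration.

```r
inc_mat <- matrix(
  c(1, 0, 3, 1,
    0, 0, 0, 2,
    3, 2, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(letters[1:3], LETTERS[1:4])
)

# One row per cell: row name, column name, cell value
long <- as.data.frame(as.table(inc_mat), stringsAsFactors = FALSE)
names(long) <- c("mode1", "mode2", "weight")

# Keep only the non-zero cells and sort by row name, then column name
long <- long[long$weight != 0, ]
long <- long[order(long$mode1, long$mode2), ]
rownames(long) <- NULL
long
```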

R find order of a vector

I have these two vectors:
x=c('a','c','b','b','c','a','d','d')
y=c(1, 4, 2, 4, 5, 9, 3, 3)
I want to order x based on the values of y, such that the groups in x are ordered by their minimum in y. Moreover, within each group a, b, c, d, the elements should be ordered by ascending values of y.
E.g. the result of this ordering is:
x |a a b b d d c c
y |1 9 2 4 3 3 4 5
Hence the output must be:
output = c(1, 7, 3, 4, 8, 2, 5, 6)
I tried to use ave but can't combine both:
> ave(y, x, FUN=function(u) rank(u, ties.method='first'))
[1] 1 1 1 2 2 2 1 2
> ave(y, x, FUN=min)
[1] 1 4 2 2 4 1 3 3
You are trying to order first by the grouped y minimum and then by the y value itself, so you should pass these as the first and second arguments to the order function:
ordering <- order(ave(y, x, FUN=min), y)
x[ordering]
# [1] "a" "a" "b" "b" "d" "d" "c" "c"
y[ordering]
# [1] 1 9 2 4 3 3 4 5
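The `ordering` vector says which element goes first, second, and so on. The `output` vector in the question appears instead to give each element's position in the sorted result, which is the inverse permutation and can be obtained with order(ordering). A sketch under that reading of the question; using x as a middle sort key is an addition here, meant to keep two groups with the same minimum from interleaving.

```r
x <- c("a", "c", "b", "b", "c", "a", "d", "d")
y <- c(1, 4, 2, 4, 5, 9, 3, 3)

# Minimum of y within each group of x, repeated for every element
group_min <- ave(y, x, FUN = min)

# Sort by group minimum, then group label (tie-breaker), then y
ordering <- order(group_min, x, y)

# Position of each original element in the sorted result
positions <- order(ordering)
positions
# [1] 1 7 3 4 8 2 5 6
```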

Merging two vectors with an 'or'

I have 2 vectors, each of which has some NA values.
a <- c(1, 2, NA, 3, 4, NA)
b <- c(NA, 6, 7, 8, 9, NA)
I'd like to combine these two with a result that uses the value from a if it is non-NA, otherwise the value from b.
So the result would look like:
c <- c(1, 2, 7, 3, 4, NA)
How can I do this efficiently in R?
How about:
> c <- ifelse(is.na(a), b, a)
> c
[1] 1 2 7 3 4 NA
Try
a[is.na(a)] <- b[is.na(a)]
a
## [1] 1 2 7 3 4 NA
Or, if you don't want to overwrite a, just do
c <- a
c[is.na(c)] <- b[is.na(c)]
c
## [1] 1 2 7 3 4 NA
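The same idea extends to more than two vectors by folding the pairwise rule with Reduce(). A minimal sketch; coalesce_na is a hypothetical helper name, and the result is stored in `merged` rather than `c` to avoid masking the base function c().

```r
# Take the first non-NA value, position by position, across all inputs
coalesce_na <- function(...) {
  Reduce(function(first, second) ifelse(is.na(first), second, first), list(...))
}

a <- c(1, 2, NA, 3, 4, NA)
b <- c(NA, 6, 7, 8, 9, NA)

merged <- coalesce_na(a, b)
merged
# [1]  1  2  7  3  4 NA
```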

Order while splitting (e.g. "TA" should be split into two columns, "A" first and "T" second) in R

I have the following issue, which I could only partly solve:
set.seed (1234)
mydf <- data.frame (var1a = sample (c("TA", "AA", "TT"), 5, replace = TRUE),
varb2 = sample (c("GA", "AA", "GG"), 5, replace = TRUE),
varAB = sample (c("AC", "AA", "CC"), 5, replace = TRUE)
)
mydf
var1a varb2 varAB
1 TA AA CC
2 AA GA AA
3 AA GA AC
4 AA AA CC
5 TT AA AC
I want to split the two letters into different columns and then order them alphabetically.
Edit: The ordering can be done before the split (for example, the var1a value "TA" should become "AT") or after the split, so that var1aa is "A" and var1ab is "T" (instead of "T", "A").
So the sorting is within each cell.
split_col <- function(.col, data){
.x <- colsplit( data[[.col]], names = paste0(.col, letters[1:2]))
}
Split each column and combine:
require(reshape)
splitdf <- do.call(cbind, lapply(names(mydf), split_col, data = mydf))
var1aa var1ab varb2a varb2b varABa varABb
1 T A A A C C
2 A A G A A A
3 A A G A A C
4 A A A A C C
5 T T A A A C
But the unsolved part is that I want each pair of columns (column name + "a" and column name + "b") to be ordered alphabetically within each row. Thus the expected output is:
var1aa var1ab varb2a varb2b varABa varABb
1 A T A A C C
2 A A A G A A
3 A A A G A C
4 A A A A C C
5 T T A A A C
How can I order (sort within each pair of variables)?
mylist <-as.list(mydf)
splits <- lapply(mylist, reshape::colsplit, names=c("a", "b"))
rowsort <- lapply(splits, function(x) t(apply(x, 1, sort)))
comb <- do.call(data.frame, rowsort)
comb
var1a.1 var1a.2 varb2.1 varb2.2 varAB.a varAB.b
1 A T A A C C
2 A A A G A A
3 A A A G A C
4 A A A A C C
5 T T A A A C
EDIT:
If names are important, you can replace them:
replaceNums <- function(x){
  # keep the part of each name before the literal dot
  .which <- regmatches(x, regexpr("[[:alnum:]]*(?=\\.)", x, perl=TRUE))
  stopifnot(length(x) %% 2 == 0) # check step: names must come in pairs
  paste0(.which, c("a", "b"))
}
names(comb) <- replaceNums(names(comb))
comb
var1aa var1ab varb2a varb2b varABa varABb
1 A T A A C C
2 A A A G A A
3 A A A G A C
4 A A A A C C
5 T T A A A C
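For comparison, the split-and-sort step can also be sketched in base R with strsplit(), without the reshape package. This assumes every cell holds exactly two characters. The data frame is typed in from the printed table above rather than regenerated with sample(), since the random draws can differ between R versions; split_sorted is an illustrative name.

```r
mydf <- data.frame(
  var1a = c("TA", "AA", "AA", "AA", "TT"),
  varb2 = c("AA", "GA", "GA", "AA", "AA"),
  varAB = c("CC", "AA", "AC", "CC", "AC"),
  stringsAsFactors = FALSE
)

split_sorted <- do.call(cbind, lapply(names(mydf), function(nm) {
  # Split each cell into its two letters and sort them alphabetically
  halves <- t(vapply(strsplit(mydf[[nm]], ""), sort, character(2)))
  # Name the two new columns <name>a and <name>b
  colnames(halves) <- paste0(nm, c("a", "b"))
  halves
}))
split_sorted <- as.data.frame(split_sorted, stringsAsFactors = FALSE)
split_sorted
```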
