Sort matrix by colnames from another matrix - r

I have two matrices with the same dimensions and they both have the same stock names as colnames, but in a different order!
I would like to sort the matrix "A" by the colnames of the matrix "B".
So the A colnames and the according value should be in the same order as the colnames of B.
How can I do this?
Example:
Kind Regards

Your example in R terms would be
A <- matrix(c(1, 4, 2), nrow = 1)
colnames(A) <- c("B", "D", "E")
A
# B D E
# [1,] 1 4 2
B <- matrix(c(2, 5, 1), nrow = 1)
colnames(B) <- c("E", "B", "D")
B
# E B D
# [1,] 2 5 1
Then we may simply subset the columns of A in the same order as they are in B:
A[, colnames(B)]
# E B D
# 2 1 4

Related

Retaining row names when converting matrix to a factor value

This most likely has a simple remedy, but I'm trying to convert a matrix with one column and row names to a factor value, but when I do so the row names disappear:
x <- c("A", "B", "C", "D")
y <- c(1, 0, 1, 0)
y <- as.matrix(y)
rownames(y) <- x
f <- as.factor(y[,1])
So then the factor 'f' looks like:
[,1]
1 1
2 0
3 1
4 0
Rather than:
[,1]
A 1
B 0
C 1
D 0
Does anyone know if there's a way of retaining the row names when having coverting to a factor value?
I don't think you have any other option than to
f <- as.factor(y)
dim(f) <- c(4, 1)
rownames(f) <- rownames(y)
f
# [,1]
# A 2
# B 1
# C 2
# D 1
# Levels: 1 2
Matrices aren't really well suited for factors, a data frame would be a better option.

How to calculate the mode of all rows or columns from a dataframe

I would like how to calculate the mode of all rows or columns from a matrix.
For example, I have:
seq <- c(1,2,3)
seq1 <- rep(seq, each=4)
mat1 <- matrix(seq1, 3)
mat1
rows <- c(1,2,3)
columns <- c("a", "b", "c", "d")
colnames (mat1) <- columns
rownames (mat1) <- rows
mat1
a b c d
1 1 1 2 3
2 1 2 2 3
3 1 2 3 3
Now, I want to calculate the mode of each row and column.
Thanks in advance
adapted from Is there a built-in function for finding the mode?
modefunc <- function(x){
tabresult <- tabulate(x)
themode <- which(tabresult == max(tabresult))
if(sum(tabresult == max(tabresult))>1) themode <- NA
return(themode)
}
#rows
apply(mat1, 1, modefunc)
#columns
apply(mat1, 2, modefunc)

na.strings applied to a dataframe

I currently have a dataframe in which there are several rows I would like converted to "NA". When I first imported this dataframe from a .csv, I could use na.strings=c("A", "B", "C) and so on to remove the values I didn't want.
I want to do the same thing again, but this time using a dataframe already, not importing another .csv
To import the data, I used:
data<-read.csv("code.csv", header=T, strip.white=TRUE, stringsAsFactors=FALSE, na.strings=c("", "A", "B", "C"))
Now, with "data", I would like to subset it while removing even more specific values in the rows.. I tried someting like:
data2<-data.frame(data, na.strings=c("D", "E", "F"))
Of course this doesn't work because I think na.strings only works with the "read" package.. not other functions. Is there any equivalent to simply convert certain values into NA so I can na.omit(data2) fairly easily?
Thanks for your help.
Here's a way to replace values in multiple columns:
# an example data frame
dat <- data.frame(x = c("D", "E", "F", "G"),
y = c("A", "B", "C", "D"),
z = c("X", "Y", "Z", "A"))
# x y z
# 1 D A X
# 2 E B Y
# 3 F C Z
# 4 G D A
# values to replace
na.strings <- c("D", "E", "F")
# index matrix
idx <- Reduce("|", lapply(na.strings, "==", dat))
# replace values with NA
is.na(dat) <- idx
dat
# x y z
# 1 <NA> A X
# 2 <NA> B Y
# 3 <NA> C Z
# 4 G <NA> A
Just assign the NA values directly.
e.g.:
x <- data.frame(a=1:5, b=letters[1:5])
# > x
# a b
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
# 5 5 e
# convert the 'b' and 'd' in columb b to NA
x$b[x$b %in% c('b', 'd')] <- NA
# > x
# a b
# 1 1 a
# 2 2 <NA>
# 3 3 c
# 4 4 <NA>
# 5 5 e
data[ data == "D" ] = NA
Note that if you were trying to replace NA with "D", the reverse (df[ df == NA ] = "D") will not work; you would need to use df[is.na(df)] <- "D"
Since we don't have your data I will use mtcars. Suppose we want to set values anywhere in mtcars that are equal to 4 or 19.2 to NA
ind <- which(mtcars == 4, arr.ind = TRUE)
mtcars[ind] <- NA
In your setting you would replace this number by "D" or "E"

order while splitting (eg. TA should be split to two column "A" in first "T" second) in r

I have following issue, I could solve:
set.seed (1234)
mydf <- data.frame (var1a = sample (c("TA", "AA", "TT"), 5, replace = TRUE),
varb2 = sample (c("GA", "AA", "GG"), 5, replace = TRUE),
varAB = sample (c("AC", "AA", "CC"), 5, replace = TRUE)
)
mydf
var1a varb2 varAB
1 TA AA CC
2 AA GA AA
3 AA GA AC
4 AA AA CC
5 TT AA AC
I want to split two letter into different column, and then order alphabetically.
Edit: Ordering can be done before split, for example var1a value "TA" var1a should be "AT" or after split so that var1aa should be "A", and var1ab be "T" (instead of "T", "A").
so sorting is within each cell.
split_col <- function(.col, data){
.x <- colsplit( data[[.col]], names = paste0(.col, letters[1:2]))
}
split each column and combine
require(reshape)
splitdf <- do.call(cbind, lapply(names(mydf), split_col, data = mydf))
var1aa var1ab varb2a varb2b varABa varABb
1 T A A A C C
2 A A G A A A
3 A A G A A C
4 A A A A C C
5 T T A A A C
But the unsolved part is I want to order the pair of columns such that columnname"a" and columname"b" are ordered, alphabetically. Thus expected output:
var1aa var1ab varb2a varb2b varABa varABb
1 A T A A C C
2 A A A G A A
3 A A A G A C
4 A A A A C C
5 T T A A A C
Can how can order (short with each pair of variable) ?
mylist <-as.list(mydf)
splits <- lapply(mylist, reshape::colsplit, names=c("a", "b"))
rowsort <- lapply(splits, function(x) t(apply(x, 1, sort)))
comb <- do.call(data.frame, rowsort)
comb
var1a.1 var1a.2 varb2.1 varb2.2 varAB.a varAB.b
1 A T A A C C
2 A A A G A A
3 A A A G A C
4 A A A A C C
5 T T A A A C
EDIT:
If names are important, you can replace them:
replaceNums <- function(x){
.which <- regmatches(x, regexpr("[[:alnum:]]*(?=.)", x, perl=TRUE))
stopifnot(length(x) %% 2 == 0) #checkstep
paste0(.which, c("a", "b"))
}
names(comb) <- replaceNums(names(comb))
comb
var1aa var1ab varb2a varb2b varABa varABb
1 A T A A C C
2 A A A G A A
3 A A A G A C
4 A A A A C C
5 T T A A A C

Create new column based on 4 values in another column

I want to create a new column based on 4 values in another column.
if col1=1 then col2= G;
if col1=2 then col2=H;
if col1=3 then col2=J;
if col1=4 then col2=K.
HOW DO I DO THIS IN R?
Please I need someone to help address this. I have tried if/else and ifelse but none seems to be working. Thanks
You could use nested ifelse:
col2 <- ifelse(col1==1, "G",
ifelse(col1==2, "H",
ifelse(col1==3, "J",
ifelse(col1==4, "K",
NA )))) # all other values map to NA
In this simple case it's overkill, but for more complicated ones...
You have a special case of looking up values where the index are integer numbers 1:4. This means you can use vector indexing to solve your problem in one easy step.
First, create some sample data:
set.seed(1)
dat <- data.frame(col1 = sample(1:4, 10, replace = TRUE))
Next, define the lookup values, and use [ subsetting to find the desired results:
values <- c("G", "H", "J", "K")
dat$col2 <- values[dat$col1]
The results:
dat
col1 col2
1 2 H
2 2 H
3 3 J
4 4 K
5 1 G
6 4 K
7 4 K
8 3 J
9 3 J
10 1 G
More generally, you can use [ subsetting combined with match to solve this kind of problem:
index <- c(1, 2, 3, 4)
values <- c("G", "H", "J", "K")
dat$col2 <- values[match(dat$col1, index)]
dat
col1 col2
1 2 H
2 2 H
3 3 J
4 4 K
5 1 G
6 4 K
7 4 K
8 3 J
9 3 J
10 1 G
There are a number of ways of doing this, but here's one.
set.seed(357)
mydf <- data.frame(col1 = sample(1:4, 10, replace = TRUE))
mydf$col2 <- rep(NA, nrow(mydf))
mydf[mydf$col1 == 1, ][, "col2"] <- "A"
mydf[mydf$col1 == 2, ][, "col2"] <- "B"
mydf[mydf$col1 == 3, ][, "col2"] <- "C"
mydf[mydf$col1 == 4, ][, "col2"] <- "D"
col1 col2
1 1 A
2 1 A
3 2 B
4 1 A
5 3 C
6 2 B
7 4 D
8 3 C
9 4 D
10 4 D
Here's one using car's recode.
library(car)
mydf$col3 <- recode(mydf$col1, "1" = 'A', "2" = 'B', "3" = 'C', "4" = 'D')
One more from this question:
mydf$col4 <- c("A", "B", "C", "D")[mydf$col1]
You could have a look at ?symnum.
In your case, something like:
col2<-symnum(col1, seq(0.5, 4.5, by=1), symbols=c("G", "H", "J", "K"))
should get you close.

Resources