My data looks like this:
A B C D
B C A D
X Y M Z
O M L P
How can I sort the rows to get something like this?
A B C D
A B C D
M X Y Z
L M O P
Thanks,
t(apply(DF, 1, sort))
The t() function is necessary because apply() over rows (MARGIN = 1) returns each row's result as a column, so the output comes back transposed.
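To see the transposition concretely, here is a minimal sketch using the rows from the question. It assumes the data is held in a character matrix; the names `raw` and `sorted` are only illustrative.

```r
# Example matrix built from the rows in the question.
DF <- matrix(c("A", "B", "C", "D",
               "B", "C", "A", "D",
               "X", "Y", "M", "Z",
               "O", "M", "L", "P"),
             nrow = 4, byrow = TRUE)

raw <- apply(DF, 1, sort)   # each sorted row comes back as a COLUMN
sorted <- t(raw)            # transpose to restore the row orientation

raw[, 3]      # sorted version of row 3, stored in column 3
sorted[3, ]   # the same values, back in row 3
```

With a square input like this one, forgetting t() raises no dimension error; the result just has rows and columns swapped, which is easy to miss.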
What did you try? This is really straightforward and easy to solve with a simple loop.
> s <- x
> for(i in 1:NROW(x)) {
+ s[i,] <- sort(s[i,])
+ }
> s
V1 V2 V3 V4
1 A B C D
2 A B C D
3 M X Y Z
4 L M O P
No plyr answer yet?!
foo <- matrix(sample(LETTERS, 10^2, replace = TRUE), 10, 10)
library("plyr")
aaply(foo,1,sort)
Exactly the same as DWin's answer, except that you don't need t().
Another fast base R option, from Martin Morgan's answer to "Fastest way to select i-th highest value from row and assign to new column", is
matrix(a[order(row(a), a, method="radix")], ncol=ncol(a), byrow=TRUE)
Timings can be found here
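As a rough check of the order()-based approach, the sketch below compares it against t(apply(...)) on a small random matrix. The matrix `a` and the seed are made up for illustration. Note that the reordered vector lists row 1's sorted values first, then row 2's, and so on, so the matrix has to be refilled row by row.

```r
set.seed(1)
a <- matrix(sample(LETTERS, 12, replace = TRUE), nrow = 3)  # 3 x 4 example

# Order all cells by row index first, then by value within each row.
idx <- order(row(a), a, method = "radix")

# a[idx] holds row 1's sorted values, then row 2's, and so on,
# so refill the matrix row by row.
by_order <- matrix(a[idx], ncol = ncol(a), byrow = TRUE)

by_apply <- t(apply(a, 1, sort))
identical(by_order, by_apply)
```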
Related
For example I have a dataframe data1 with these columns:
A B C D G T Q Y U J N
And I have another dataframe data2 with rows as follows:
A B C M
D G K T
Q F Y U
J W E N
Based on the above dataframe, I should have a column M after column C and before column D. I should also have a column K between columns G and T, etc.
Therefore I want to use data2 to fill up the missing columns in data1. If I do that successfully, data1 should be :
A B C M D G K T Q F Y U J W E N
My code so far:
for(row in 1:nrow(data2))
{
for(column in 1:ncol(data2)){
element = data2[row,column]
for(column in 1:ncol(data1))
{
if(element!=colnames(data1)[column])
{
}
}
}
I'm not sure where to go with my code now, and I don't think it is efficient to begin with. Any help is appreciated.
We can transpose the second dataset, convert it to a vector, and use that as the column order, after adding the columns that are missing from the original data as NA:
nm1 <- c(t(data2))
nm2 <- setdiff(nm1, names(data1))
data1[nm2] <- NA
data1 <- data1[nm1]
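Here is a self-contained run of that idea. The contents of `data1` (one row of dummy numbers) and the `V1`..`V4` column names of `data2` are assumptions made only so the example can execute; the question does not show them.

```r
# Hypothetical data1: the columns from the question, one row of dummy values.
cols1 <- c("A", "B", "C", "D", "G", "T", "Q", "Y", "U", "J", "N")
data1 <- as.data.frame(matrix(seq_along(cols1), nrow = 1,
                              dimnames = list(NULL, cols1)))

# data2 as shown in the question (column names V1..V4 are assumed).
data2 <- data.frame(V1 = c("A", "D", "Q", "J"),
                    V2 = c("B", "G", "F", "W"),
                    V3 = c("C", "K", "Y", "E"),
                    V4 = c("M", "T", "U", "N"),
                    stringsAsFactors = FALSE)

nm1 <- c(t(data2))                  # target column order, read row by row
nm2 <- setdiff(nm1, names(data1))   # columns missing from data1
data1[nm2] <- NA                    # add them, filled with NA
data1 <- data1[nm1]                 # reorder to the target layout

names(data1)
```

The existing columns keep their values; only the newly added ones are NA.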
What is the fastest way to convert the data.table:
1: A B C
2: D E F
3: G H I
into the vector: G H I D E F A B C
I use:
X <- X[order(nrow(X):1),]
X <- melt(t(X))$value
But my feeling is that this can be optimized :-)
Thank you
One option is to reverse the index, transpose to a matrix and concatenate
c(t(X[.N:1]))
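The same reverse, transpose, flatten sequence can be sketched on a plain data.frame for anyone who wants to follow the steps without data.table. Here `X` is a hypothetical stand-in for the table in the question, and `nrow(X):1` plays the role of `.N:1`.

```r
X <- data.frame(V1 = c("A", "D", "G"),
                V2 = c("B", "E", "H"),
                V3 = c("C", "F", "I"),
                stringsAsFactors = FALSE)

rev_rows <- X[nrow(X):1, ]   # reverse the row order
res <- c(t(rev_rows))        # t() gives a matrix; c() reads it column by column
res
```

Because c() flattens a matrix column by column, transposing first is what makes the values come out row by row.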
I would like to keep only the top 2 factor levels based on the frequency and group all other factors into Other. I tried this but it doesn't help.
df=data.frame(a=as.factor(c(rep('D',3),rep('B',5),rep('C',2))),
b=as.factor(c(rep('A',5),rep('B',5))),
c=as.factor(c(rep('A',3),rep('B',5),rep('C',2))))
myfun=function(x){
if(is.factor(x)){
levels(x)[!levels(x) %in% names(sort(table(x),decreasing = T)[1:2])]='Others'
}
}
df=as.data.frame(lapply(df, myfun))
Expected Output
a b c
D A A
D A A
D A A
B A B
B A B
B B B
B B B
B B B
others B others
others B others
This might get a bit messy, but here is one approach via base R,
fun1 <- function(x){levels(x) <-
c(names(sort(table(x), decreasing = TRUE)[1:2]),
rep('others', length(levels(x))-2));
return(x)}
However, the function above only works if the levels are first reordered by frequency. As the OP states in a comment, the correct version is:
fun1 <- function(x){ x=factor(x,
levels = names(sort(table(x), decreasing = TRUE)));
levels(x) <- c(names(sort(table(x), decreasing = TRUE)[1:2]),
rep('others', length(levels(x))-2));
return(x) }
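Applied to the data frame from the question, the corrected function can be exercised as below. This is only a sketch: it assumes every column is a factor with at least two levels, and `df[] <- lapply(...)` is one of several ways to apply it column-wise.

```r
df <- data.frame(a = factor(c(rep("D", 3), rep("B", 5), rep("C", 2))),
                 b = factor(c(rep("A", 5), rep("B", 5))),
                 c = factor(c(rep("A", 3), rep("B", 5), rep("C", 2))))

fun1 <- function(x) {
  # Reorder the levels by decreasing frequency.
  x <- factor(x, levels = names(sort(table(x), decreasing = TRUE)))
  # Keep the two most frequent names; rename every remaining level to 'others'.
  levels(x) <- c(names(sort(table(x), decreasing = TRUE)[1:2]),
                 rep("others", length(levels(x)) - 2))
  x
}

df[] <- lapply(df, fun1)   # df[] keeps the data.frame structure
as.character(df$a)
```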
This is now easy thanks to fct_lump() from the forcats package.
fct_lump(df$a, n = 2)
# [1] D D D B B B B B Other Other
# Levels: B D Other
The argument n controls the number of most common levels to be preserved, lumping together the others.
I'm trying to build a factor column that relates to two other factor columns with completely different factor levels. Here's example data.
set.seed(1234)
a<-sample(LETTERS[1:10],50,replace=TRUE)
b<-sample(letters[11:20],50,replace=TRUE)
df<-data.frame(a,b)
df$a<-as.factor(df$a)
df$b<-as.factor(df$b)
The rule I want to make creates a new column, c, whose factor level is based on the value of column a.
If a row in column a equals "F", that row in column c should equal whatever the entry is for column b; otherwise it should equal the entry in column a. The code I'm trying:
dfn<-dim(df)[1]
for (i in 1:dfn){
df$c[i]<-ifelse(df$a[i]=="F",df$b[i],df$a[i])
}
df
This only spits out the integer index of the factor level and not the actual entry. What have I done wrong?
I think you'll need to do a little finagling of character values. This seems to do it.
w <- df$a == "F"
df$c <- factor(replace(as.character(df$a), w, as.character(df$b)[w]))
Here is a quick look at the new column,
factor(replace(as.character(df$a), w, as.character(df$b)[w]))
# [1] B G G G I G A C G s G k C J C I C C B C D D B A C I n J I A
# [31] E C D p B H C C J I l G D G D p G E C H
# Levels: A B C D E G H I J k l n p s
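As for why the loop in the question produced numbers: ifelse() does not keep the factor class of its yes/no arguments, so the result is built from the factors' underlying integer codes. A small sketch with made-up vectors `a` and `b`:

```r
a <- factor(c("A", "F", "C"))
b <- factor(c("k", "l", "m"))

# The factor class is lost, so the result holds level codes, not labels.
codes <- ifelse(a == "F", b, a)
is.numeric(codes)

# Converting to character first keeps the labels.
labels <- ifelse(a == "F", as.character(b), as.character(a))
labels
```

Wrapping the character result in factor() then gives a factor whose levels are exactly the labels that occur.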
As in my previous comment, a solution with dplyr:
library(dplyr)
df %>% mutate(c = ifelse(a == "F", as.character(b), as.character(a)))
If you plan on doing anything involving combinations of the columns as factors, for example, comparisons, you should refactor to the same set of levels.
u<-union(levels(df$a),levels(df$b))
df$a<-factor(df$a,u)
df$b<-factor(df$b,u)
df$c<-df$a
ind<-df$a=="F"
df$c[ind]<-df$b[ind]
By taking this precaution, you can sensibly do
> sum(df$c==df$b)
[1] 6
> sum(df$a=="F")
[1] 6
otherwise the first line will fail.
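The failure mentioned here can be reproduced on a tiny example. The vectors below are made up, and tryCatch() is used only so that the error message is captured rather than stopping the script.

```r
a <- factor(c("A", "F"))
b <- factor(c("k", "l"))

# Comparing factors with different level sets raises an error.
res <- tryCatch(a == b, error = function(e) conditionMessage(e))
res

# After refactoring both to the union of levels, the comparison works.
u <- union(levels(a), levels(b))
a2 <- factor(a, u)
b2 <- factor(b, u)
a2 == b2
```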
I have several data.tables that I would like to rbindlist. The tables contain factors with (possibly missing) levels. In that case rbindlist(...) behaves differently from do.call(rbind, ...):
dt1 <- data.table(x=factor(c("a", "b"), levels=letters))
rbindlist(list(dt1, dt1))[,x]
## [1] a b a b
## Levels: a b
do.call(rbind, list(dt1, dt1))[,x]
## [1] a b a b
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
If I want to keep the levels, do I have to resort to rbind or is there a data.table way?
I guess rbindlist is faster because it doesn't do the checking of do.call(rbind.data.frame,...)
Why not set the levels after binding?
Dt <- rbindlist(list(dt1, dt1))
setattr(Dt$x,"levels",letters) ## set attribute without a copy
From ?setattr:
setattr() is useful in many situations to set attributes by reference and can be used on any object or part of an object, not just data.tables.
Thanks for pointing out this problem. As of version 1.8.11 it has been fixed:
dt1 <- data.table(x=factor(c("a", "b"), levels=letters))
rbindlist(list(dt1, dt1))[,x]
#[1] a b a b
#Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z