What is the fastest way to convert this data.table:
1: A B C
2: D E F
3: G H I
into the vector: G H I D E F A B C
I use:
X <- X[order(nrow(X):1),]
X <- melt(t(X))$value
But my feeling is that this can be optimized :-)
Thank you
One option is to reverse the row index, transpose to a matrix, and concatenate:
c(t(X[.N:1]))
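A minimal base R sketch of the same idea, assuming the three columns hold single letters. The data frame X and its column names V1 to V3 are made up for illustration, and nrow(X):1 stands in for data.table's .N:1.

```r
# Hypothetical stand-in for the data.table in the question
X <- data.frame(V1 = c("A", "D", "G"),
                V2 = c("B", "E", "H"),
                V3 = c("C", "F", "I"))

# Reverse the row order, transpose to a character matrix,
# then flatten it column by column with c()
v <- c(t(X[nrow(X):1, ]))
print(v)
```

Because t() puts each original row into a column, the column-major flattening done by c() reads the rows out one after another.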
I'm trying to combine these data, but I get "N/A" in the final output when I enter new names that didn't exist in the first vector. How can I handle this so that all the data are shown without any "N/A"?
I think you just want to merge/append two factors. An easy approach is to convert them to character, append them, and convert the result back to a factor.
Just a simple example with letters
p <- as.factor(LETTERS[3:8])
q <- as.factor(LETTERS[1:5])
as.factor(c(as.character(p), as.character(q)))
# [1] C D E F G H A B C D E
# Levels: A B C D E F G H
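The "N/A" in the question most likely comes from assigning a value that is not among the factor's existing levels, which R turns into NA with a warning. A small sketch of that cause and of the character round trip suggested above, using made-up names:

```r
p <- factor(c("ann", "bob"))

# Assigning a label that is not an existing level yields NA (with a warning)
bad <- p
suppressWarnings(bad[3] <- "cat")
print(bad)

# Going through character keeps the new value and rebuilds the levels
fixed <- factor(c(as.character(p), "cat"))
print(fixed)
```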
I'm trying to build a factor column that relates to two other factor columns with completely different factor levels. Here's example data.
set.seed(1234)
a<-sample(LETTERS[1:10],50,replace=TRUE)
b<-sample(letters[11:20],50,replace=TRUE)
df<-data.frame(a,b)
df$a<-as.factor(df$a)
df$b<-as.factor(df$b)
The rule I want creates a new column, c, whose value depends on column a.
If a row in column a equals "F", that row in column c should take the entry from column b; otherwise it should keep the value from column a. The code I'm trying:
dfn<-dim(df)[1]
for (i in 1:dfn){
df$c[i]<-ifelse(df$a[i]=="F",df$b[i],df$a[i])
}
df
only spits out the numbered index of the factor level for column b and not the actual entry. What have I done wrong?
I think you'll need to do a little finagling of character values. This seems to do it.
w <- df$a == "F"
df$c <- factor(replace(as.character(df$a), w, as.character(df$b)[w]))
Here is a quick look at the new column,
factor(replace(as.character(df$a), w, as.character(df$b)[w]))
# [1] B G G G I G A C G s G k C J C I C C B C D D B A C I n J I A
# [31] E C D p B H C C J I l G D G D p G E C H
# Levels: A B C D E G H I J k l n p s
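The integer output in the question comes from ifelse() dropping the factor attributes of its yes and no arguments, so only the underlying codes survive. A short sketch with two made-up factors shows the difference that converting to character first makes:

```r
a <- factor(c("A", "F", "B"))
b <- factor(c("k", "l", "m"))

# Factors passed straight to ifelse() come back as integer codes
codes <- ifelse(a == "F", b, a)

# Converting to character first keeps the labels
labels <- ifelse(a == "F", as.character(b), as.character(a))

print(codes)
print(labels)

# Wrap the result in factor() if a factor column is wanted
new_col <- factor(labels)
```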
As in my previous comment, a solution with dplyr (note that the new column c is character, not factor):
library(dplyr)
df %>% mutate(c = ifelse(a == "F", as.character(b), as.character(a)))
If you plan on doing anything involving combinations of the columns as factors, for example, comparisons, you should refactor to the same set of levels.
u<-union(levels(df$a),levels(df$b))
df$a<-factor(df$a,u)
df$b<-factor(df$b,u)
df$c<-df$a
ind<-df$a=="F"
df$c[ind]<-df$b[ind]
By taking this precaution, you can sensibly do
> sum(df$c==df$b)
[1] 6
> sum(df$a=="F")
[1] 6
Otherwise the first comparison fails, because == on two factors with different level sets is an error.
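To see why the shared level set matters, here is a small sketch with two made-up factors: comparing them directly raises an error, while the comparison works once both use the union of the levels.

```r
x <- factor(c("A", "F"))
y <- factor(c("k", "F"))

# Direct comparison fails because the level sets differ;
# tryCatch() captures the error message instead of stopping
msg <- tryCatch(x == y, error = function(e) conditionMessage(e))
print(msg)

# Re-level both factors to the union of their levels, then compare
u  <- union(levels(x), levels(y))
x2 <- factor(x, levels = u)
y2 <- factor(y, levels = u)
same <- x2 == y2
print(same)
```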
I am a self-taught useR, so please bear with me.
I have something similar to the following dataset:
individual value
a 0.917741317
a 0.689673689
a 0.846208486
b 0.439198006
b 0.366260159
b 0.689985484
c 0.703381117
c 0.29467743
c 0.252435687
d 0.298108973
d 0.42951805
d 0.011187204
e 0.078516181
e 0.498118235
e 0.003877632
I would like to create a matrix with the values for a in column 1, the values for b in column 2, etc. [I also add a 1 at the bottom of every column for later algebra operations.]
I have tried so far:
for (i in unique(df$individual)) {
values <- subset(df$value, df$individual == i)
m <- cbind(c(values[1:3],1))
}
I get a (4,1) matrix with the values of the last individual only. What is missing to make each iteration add a column, so that I end up with as many columns as individuals?
This operation is called "reshaping". There is a base function, but I find it easier with the reshape2 package:
DF <- read.table(text="individual value
a 0.917741317
a 0.689673689
a 0.846208486
b 0.439198006
b 0.366260159
b 0.689985484
c 0.703381117
c 0.29467743
c 0.252435687
d 0.298108973
d 0.42951805
d 0.011187204
e 0.078516181
e 0.498118235
e 0.003877632", header=TRUE)
DF$id <- 1:3
library(reshape2)
DF2 <- dcast(DF, id ~ individual)
DF2[,-1]
# a b c d e
#1 0.9177413 0.4391980 0.7033811 0.2981090 0.078516181
#2 0.6896737 0.3662602 0.2946774 0.4295180 0.498118235
#3 0.8462085 0.6899855 0.2524357 0.0111872 0.003877632
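The loop in the question overwrites m on every pass, which is why only the last individual survives. A base R sketch that builds all columns at once and appends the trailing 1 the question asks for, assuming every individual has the same number of values (the small df below is a shortened, made-up version of the data):

```r
df <- data.frame(individual = rep(c("a", "b"), each = 3),
                 value = c(0.92, 0.69, 0.85, 0.44, 0.37, 0.69))

# split() groups the values by individual; sapply() binds the
# equal-length results into the columns of a matrix
m <- sapply(split(df$value, df$individual), function(v) c(v, 1))
print(m)
```

If the groups had different lengths, sapply() would return a list rather than a matrix, so this shortcut depends on the balanced layout.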
I have several data.tables that I would like to rbindlist. The tables contain factors with (possibly missing) levels. In that case rbindlist(...) behaves differently from do.call(rbind, ...):
dt1 <- data.table(x=factor(c("a", "b"), levels=letters))
rbindlist(list(dt1, dt1))[,x]
## [1] a b a b
## Levels: a b
do.call(rbind, list(dt1, dt1))[,x]
## [1] a b a b
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
If I want to keep the levels, do I have to resort to rbind or is there a data.table way?
I guess rbindlist is faster because it doesn't do the checking of do.call(rbind.data.frame,...)
Why not set the levels after binding?
Dt <- rbindlist(list(dt1, dt1))
setattr(Dt$x,"levels",letters) ## set attribute without a copy
From ?setattr:
setattr() is useful in many situations to set attributes by reference and can be used on any object or part of an object, not just data.tables.
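One caveat, sketched here in base R with attr<- standing in for setattr(): overwriting the levels attribute only relabels the existing integer codes. That is safe for the dt1 example because a and b are also the first two entries of letters. When the surviving levels are not at the front of the new level set, re-levelling by label with factor() is the safer route. The factor x below is made up for illustration.

```r
x <- factor(c("b", "c"))            # levels "b", "c" with codes 1, 2

# Overwriting the attribute relabels the codes: 1 -> "a", 2 -> "b"
relabelled <- x
attr(relabelled, "levels") <- c("a", "b", "c")

# factor() matches by label, so the values are preserved
relevelled <- factor(x, levels = c("a", "b", "c"))

print(as.character(relabelled))
print(as.character(relevelled))
```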
Thanks for pointing out this problem. As of version 1.8.11 it has been fixed:
dt1 <- data.table(x=factor(c("a", "b"), levels=letters))
rbindlist(list(dt1, dt1))[,x]
#[1] a b a b
#Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
My data looks like,
A B C D
B C A D
X Y M Z
O M L P
How can I sort the rows to get something like
A B C D
A B C D
M X Y Z
L M O P
Thanks,
t(apply(DF, 1, sort))
The t() function is necessary because apply() over rows returns each row's result as a column of the output, so the result has to be transposed back.
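A small sketch using the data from the question (the column names V1 to V4 are assumed) shows the role of t(): apply() hands back each sorted row as a column, and transposing turns those columns into rows again.

```r
DF <- data.frame(V1 = c("A", "B", "X", "O"),
                 V2 = c("B", "C", "Y", "M"),
                 V3 = c("C", "A", "M", "L"),
                 V4 = c("D", "D", "Z", "P"))

by_col <- apply(DF, 1, sort)   # each sorted row ends up as a column
sorted <- t(by_col)            # transpose so each row is sorted
print(unname(sorted))
```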
What did you try? This is really straight-forward and easy to solve with a simple loop.
> s <- x
> for(i in 1:NROW(x)) {
+ s[i,] <- sort(s[i,])
+ }
> s
V1 V2 V3 V4
1 A B C D
2 A B C D
3 M X Y Z
4 L M O P
No plyr answer yet?!
foo <- matrix(sample(LETTERS,10^2,T),10,10)
library("plyr")
aaply(foo,1,sort)
Exactly the same as DWin's answer, except that you don't need t().
Another fast base R option, from Martin Morgan in "Fastest way to select i-th highest value from row and assign to new column", is
matrix(a[order(row(a), a, method="radix")], ncol=ncol(a), byrow=TRUE)
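A sketch of this ordering approach on a small made-up numeric matrix. The ordered vector lists the sorted values of row 1, then row 2, and so on, so the matrix is filled with byrow = TRUE here to put each sorted run back into its own row; the final check compares against the apply() version.

```r
a <- matrix(c(3, 1, 2,
              9, 7, 8), nrow = 2, byrow = TRUE)

# Order all elements by row index first, then by value within each row
idx <- order(row(a), a, method = "radix")

# a[idx] holds row 1 sorted, then row 2 sorted; fill the result row by row
sorted <- matrix(a[idx], ncol = ncol(a), byrow = TRUE)
print(sorted)

# Same result as the apply() approach
stopifnot(all(sorted == t(apply(a, 1, sort))))
```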
Timings can be found here