I need to change/invert rows in my data frame, not transposing the data but moving the bottom row to the top and so on. If the data frame was:
1 2 3
4 5 6
7 8 9
I need to convert to
7 8 9
4 5 6
1 2 3
I've read about sort() but I don't think it is what I need or I'm not able to find the way.
There probably are more elegant ways, but this works:
m <- matrix(1:9, ncol=3, byrow=TRUE)
# m[rev(seq_len(nrow(m))), ] # Initial answer
m[nrow(m):1, ]
[,1] [,2] [,3]
[1,] 7 8 9
[2,] 4 5 6
[3,] 1 2 3
This works because you are indexing the matrix with a reversed sequence of integers as the row index. nrow(m):1 results in 3 2 1.
You can reverse the order of a data.frame using the dplyr package:
iris %>% arrange(-row_number())
Or without using the pipe-operator by doing
arrange(iris, -row_number())
I would reverse the rows an index starting with the number of rows, along this line
revdata <- thedata[dim(thedata)[1L]:1,]
I think this is the simplest way:
MyMatrix = matrix(1:20, ncol = 2)
MyMatrix[ nrow(MyMatrix):1, ]
If you want to reverse the columns, just do
MyMatrix[ , ncol(MyMatrix):1 ]
We can reverse the order of row.names (for data.frame only):
# create data.frame
m <- matrix(1:9, ncol=3, byrow=TRUE)
df_m <- data.frame(m)
#reverse
df_m[rev(rownames(df_m)), ]
# X1 X2 X3
# 3 7 8 9
# 2 4 5 6
# 1 1 2 3
Veeery late, but this seems to be working fast, does not need any extra packages and is simple:
for(i in 1:ncol(matrix)) {matrix[,i] = rev(matrix[,i])}
I guess that for frequent use, one would make a function out of it.
Tested with R v=3.3.1.
Encounter this problem today and here I am providing another solution for your interests.
m <- matrix(1:9, ncol=3, byrow=TRUE)
apply(m,2,rev)
Related
I would like to do calculations across columns in my data, by row. The calculations are "moving" in that I would like to know the difference between two numbers in column 1 and 2, then columns 3 and 4, and so on. I have looked at "loops" and "rollapply" functions, but could not figure this out. Below are three options of what was attempted. Only the third option gives me the result I am after, but it is very lengthy code and also does not allow for automation (the input data will be a much larger matrix, so typing out the calculation for each row won't work).
Please advice how to make this code shorter and/or any other packages/functions to check out which will do the job. THANK YOU!
MY TEST SCRIPT IN R + errors/results
Sample data set
a<- c(1,2,3, 4, 5)
b<- c(1,2,3, 4, 5)
c<- c(1,2,3, 4, 5)
test.data <- data.frame(cbind(a,b*2,c*10))
names(test.data) <- c("a", "b", "c")
Sample of calculations attempted:
OPTION 1
require(zoo)
rollapply(test.data, 2, diff, fill = NA, align = "right", by.column=FALSE)
RESULT 1 (not what we're after. What we need is at the bottom of Option 3)
# a b c
#[1,] NA NA NA
#[2,] 1 2 10
#[3,] 1 2 10
#[4,] 1 2 10
#[5,] 1 2 10
OPTION 2:
results <- for (i in 1:length(nrow(test.data))) {
diff(as.numeric(test.data[i,]), lag=1)
print(results)}
RESULT 2: (again not what we're after)
# NULL
OPTION 3: works, but long way, so would like to simplify code and make generic for any length of observations in my dataframe and any number of columns (i.e. more than 3). I would like to "automate" the steps below, if know number of observations (i.e. rows).
row1=diff(as.numeric(test[1,], lag=1))
row2=diff(as.numeric(test[2,], lag=1))
row3=diff(as.numeric(test[3,], lag=1))
row4=diff(as.numeric(test[4,], lag=1))
row5=diff(as.numeric(test[5,], lag=1))
results.OK=cbind.data.frame(row1, row2, row3, row4, row5)
transpose.results.OK=data.frame(t(as.matrix(results.OK)))
names(transpose.results.OK)=c("diff.ab", "diff.bc")
Final.data = transpose.results.OK
print(Final.data)
RESULT 3: (THIS IS WHAT I WOULD LIKE TO GET, "row1" can be "obs1" etc)
# diff.ab diff.bc
#row1 1 8
#row2 2 16
#row3 3 24
#row4 4 32
#row5 5 40
THE END
Here are the 3 options redone plus a 4th option:
# 1
library(zoo)
d <- t(rollapplyr(t(test.data), 2, diff, by.column = FALSE))
# 2
d <- test.data[-1]
for (i in 1:nrow(test.data)) d[i, ] <- diff(unlist(test.data[i, ]))
# 3
d <- t(diff(t(test.data)))
# 4 - also this works
nc <- ncol(test.data)
d <- test.data[-1] - test.data[-nc]
For any of them to set the names:
colnames(d) <- paste0("diff.", head(names(test.data), -1), colnames(d))
(2) and (4) give this data.frame and (1) and (3) give the corresponding matrix:
> d
diff.ab diff.bc
1 1 8
2 2 16
3 3 24
4 4 32
5 5 40
Use as.matrix or as.data.frame if you want the other.
An apply based solution using diff on row-wise can be achieved as:
# Result
res <- t(apply(test.data, 1, diff)) #One can change it to data.frame
# Name of the columns
colnames(res) <- paste0("diff.", head(names(test.data), -1),
tail(names(test.data), -1))
res
# diff.ab diff.bc
# [1,] 1 8
# [2,] 2 16
# [3,] 3 24
# [4,] 4 32
# [5,] 5 40
I need to change/invert rows in my data frame, not transposing the data but moving the bottom row to the top and so on. If the data frame was:
1 2 3
4 5 6
7 8 9
I need to convert to
7 8 9
4 5 6
1 2 3
I've read about sort() but I don't think it is what I need or I'm not able to find the way.
There probably are more elegant ways, but this works:
m <- matrix(1:9, ncol=3, byrow=TRUE)
# m[rev(seq_len(nrow(m))), ] # Initial answer
m[nrow(m):1, ]
[,1] [,2] [,3]
[1,] 7 8 9
[2,] 4 5 6
[3,] 1 2 3
This works because you are indexing the matrix with a reversed sequence of integers as the row index. nrow(m):1 results in 3 2 1.
You can reverse the order of a data.frame using the dplyr package:
iris %>% arrange(-row_number())
Or without using the pipe-operator by doing
arrange(iris, -row_number())
I would reverse the rows an index starting with the number of rows, along this line
revdata <- thedata[dim(thedata)[1L]:1,]
I think this is the simplest way:
MyMatrix = matrix(1:20, ncol = 2)
MyMatrix[ nrow(MyMatrix):1, ]
If you want to reverse the columns, just do
MyMatrix[ , ncol(MyMatrix):1 ]
We can reverse the order of row.names (for data.frame only):
# create data.frame
m <- matrix(1:9, ncol=3, byrow=TRUE)
df_m <- data.frame(m)
#reverse
df_m[rev(rownames(df_m)), ]
# X1 X2 X3
# 3 7 8 9
# 2 4 5 6
# 1 1 2 3
Veeery late, but this seems to be working fast, does not need any extra packages and is simple:
for(i in 1:ncol(matrix)) {matrix[,i] = rev(matrix[,i])}
I guess that for frequent use, one would make a function out of it.
Tested with R v=3.3.1.
Encounter this problem today and here I am providing another solution for your interests.
m <- matrix(1:9, ncol=3, byrow=TRUE)
apply(m,2,rev)
I know there are similar questions but I couldn't find an answer to my question. I'm trying to rank elements in a matrix and then extract data of 5 highest elements.
Here is my attempt.
set.seed(20)
d<-matrix(rnorm(100),nrow=10,ncol=10)
start<-d[1,1]
for (i in 1:10) {
for (j in 1:10) {
if (start < d[i,j])
{high<-d[i,j]
rowind<-i
colind<-j
}
}
}
Although this gives me the data of the highest element, including row and column numbers, I can't think of a way to do the same for elements ranked from 2 to 5. I also tried
rank(d, ties.method="max")
But it wasn't helpful because it just spits out the rank in vector format.
What I ultimately want is a data frame (or any sort of table) that contains
rank, column name, row name, and the data(number) of highest 5 elements in matrix.
Edit
set.seed(20)
d<-matrix(rnorm(100),nrow=10,ncol=10)
d[1,2]<-5
d[2,1]<-5
d[1,3]<-4
d[3,1]<-4
Thanks for the answers. Those perfectly worked for my purpose, but as I'm running this code for correlation chart -where there will be duplicate numbers for every pair- I want to count only one of the two numbers for ranking purpose. Is there any way to do this? Thanks.
Here's a very crude way:
DF = data.frame(row = c(row(d)), col = c(col(d)), v = c(d))
DF[order(DF$v, decreasing=TRUE), ][1:5, ]
row col v
91 1 10 2.208443
82 2 9 1.921899
3 3 1 1.785465
32 2 4 1.590146
33 3 4 1.556143
It would be nice to only have to partially sort, but in ?order, it looks like this option is only available for sort, not for order.
If the matrix has row and col names, it might be convenient to see them instead of numbers. Here's what I might do:
dimnames(d) <- list(letters[1:10], letters[1:10])
DF = data.frame(as.table(d))
DF[order(DF$Freq, decreasing=TRUE), ][1:5, ]
Var1 Var2 Freq
91 a j 2.208443
82 b i 1.921899
3 c a 1.785465
32 b d 1.590146
33 c d 1.556143
The column names don't make much sense here, unfortunately, but you can change them with names(DF) <- as usual.
Here is one option with Matrix
library(Matrix)
m1 <- summary(Matrix(d, sparse=TRUE))
head(m1[order(-m1[,3]),],5)
# i j x
#93 3 10 2.359634
#31 1 4 2.234804
#23 3 3 1.980956
#55 5 6 1.801341
#16 6 2 1.678989
Or use melt
library(reshape2)
m2 <- melt(d)
head(m2[order(-m2[,3]), ], 5)
Here is something quite simple in base R.
# set.seed(20)
# d <- matrix(rnorm(100), nrow = 10, ncol = 10)
d.rank <- matrix(rank(-d), nrow = 10, ncol = 10)
which(d.rank <= 5, arr.ind=TRUE)
row col
[1,] 3 1
[2,] 2 4
[3,] 3 4
[4,] 2 9
[5,] 1 10
d[d.rank <= 5]
[1] 1.785465 1.590146 1.556143 1.921899 2.208443
Results can (easily) be made clearer (see comment from Frank):
cbind(which(d.rank <= 5, arr.ind=TRUE), v = d[d.rank <= 5], rank = rank(-d[d.rank <= 5]))
row col v rank
[1,] 3 1 1.785465 3
[2,] 2 4 1.590146 4
[3,] 3 4 1.556143 5
[4,] 2 9 1.921899 2
[5,] 1 10 2.208443 1
I have 2 objects:
A data frame with 3 variables:
v1 <- 1:10
v2 <- 11:20
v3 <- 21:30
df <- data.frame(v1,v2,v3)
A numeric vector with 3 elements:
nv <- c(6,11,28)
I would like to compare the first variable to the first number, the second variable to the second number and so on.
which(df$v1 > nv[1])
which(df$v2 > nv[2])
which(df$v3 > nv[3])
Of course in reality my data frame has a lot more variables so manually typing each variable is not an option.
I encounter these kinds of problems quite frequently. What kind of documentation would I need to read to be fluent in these matters?
One option would be to compare with equally sized elements. For this we can replicate the elements in 'nv' each by number of rows of 'df' (rep(nv, each=nrow(df))) and compare with df or use the col function that does similar output as rep.
which(df > nv[col(df)], arr.ind=TRUE)
If you need a logical matrix that corresponds to comparison of each column with each element of 'nv'
sweep(df, 2, nv, FUN='>')
You could also use mapply:
mapply(FUN=function(x, y)which(x > y), x=df, y=nv)
#$v1
#[1] 7 8 9 10
#
#$v2
#[1] 2 3 4 5 6 7 8 9 10
#
#$v3
#[1] 9 10
I think these sorts of situations are tricky because normal looping solutions (e.g. the apply function) only loop through one object, but you need to loop both through df and nv simultaneously. One approach is to loop through the indices and to use them to grab the appropriate information from both df and nv. A convenient way to loop through indices is the sapply function:
sapply(seq_along(nv), function(x) which(df[,x] > nv[x]))
# [[1]]
# [1] 7 8 9 10
#
# [[2]]
# [1] 2 3 4 5 6 7 8 9 10
#
# [[3]]
# [1] 9 10
I have a vector to be append, and here is the code,which is pretty slow due to the nrow is big.
All I want to is to speed up. I have tried c() and append() and both seems not fast enough.
And I checkd Efficiently adding or removing elements to a vector or list in R?
Here is the code:
compare<-vector()
for (i in 1:nrow(domin)){
for (j in 1:nrow(domin)){
a=0
if ((domin[i,]$GPA>domin[j,]$GPA) & (domin[i,]$SAT>domin[j,]$SAT)){
a=1
}
compare<-c(compare,a)
}
print(i)
}
I found it is hard to figure out the index for the compare if I use
#compare<-rep(0,times=nrow(opt_predict)*nrow(opt_predict))
The information you want would be better placed in a matrix:
v1 <- 1:3
v2 <- c(1,2,2)
mat1 <- outer(v1,v1,`>`)
mat2 <- outer(v2,v2,`>`)
both <- mat1 & mat2
To see which positions the inequality holds for, use which:
which(both,arr.ind=TRUE)
# row col
# [1,] 2 1
# [2,] 3 1
Comments:
This answer should be a lot faster than your loop. However, you are really just sorting two vectors, so there is probably a faster way to do this than taking the exhaustive set of inequalities...
In your case, there is only a partial ordering (since, for a given i and j, it is possible that neither one is strictly greater than the other in both dimensions). If you were satisfied with sorting first on v1 and then on v2, you could use the data.table package to easily get a full ordering:
set.seed(1)
v1 <- sample.int(10,replace=TRUE)
v2 <- sample.int(10,replace=TRUE)
require(data.table)
DT <- data.table(v1,v2)
setkey(DT)
DT[,rank:=.GRP,by='v1,v2']
which gives
v1 v2 rank
1: 1 8 1
2: 3 3 2
3: 3 8 3
4: 4 2 4
5: 6 7 5
6: 7 4 6
7: 7 10 7
8: 9 5 8
9: 10 4 9
10: 10 8 10
It depends on what you were planning to do next.