Loss of dimensions of dataframe after applying rowMeans() in R

I subset a dataframe and applied rowMeans() to it, but the resulting variable ('y') has no dimensions, so I am not able to use 'y' in my further code.
dim(mtcars)
# [1] 32 11
y = rowMeans((mtcars[,3:6]))
dim(y)
# NULL
Why is 'y' no longer a dataframe, and what can I do to get its dimensions back?
I tried the following but it didn't work.
as.data.frame(y)
# or
data.frame(y)

When you apply rowMeans() you are creating a vector out of a dataframe. So, you are going from n rows and k columns to a plain vector of length n. A plain vector has no dim attribute, which is why dim(y) returns NULL.
For a case with n=8 and k=5 we would have:
> a=as.data.frame(matrix(1:40,8,5))
> a
V1 V2 V3 V4 V5
1 1 9 17 25 33
2 2 10 18 26 34
3 3 11 19 27 35
4 4 12 20 28 36
5 5 13 21 29 37
6 6 14 22 30 38
7 7 15 23 31 39
8 8 16 24 32 40
> rowMeans(a)
[1] 17 18 19 20 21 22 23 24
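To get something with dimensions back, one option is to store the means as a column, either in a new one-column data frame or next to the original data. A minimal sketch using the question's mtcars subset; the column name avg and the copy mt are illustrative choices, not names from the question:

```r
# rowMeans() returns a plain numeric vector, so it has no dimensions
y <- rowMeans(mtcars[, 3:6])
is.null(dim(y))

# wrap the vector in a one-column data frame ("avg" is a hypothetical name)
y_df <- data.frame(avg = y)
dim(y_df)

# or keep the means as an extra column of a copy of the original data
mt <- mtcars
mt$avg <- rowMeans(mt[, 3:6])
dim(mt)
```

Note that as.data.frame(y) and data.frame(y) from the question do return a data frame, but the result has to be assigned (for example y <- as.data.frame(y)) for 'y' itself to change.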

Related

How do I convert a vector of triplets to a 3xnx3 matrix in Dyalog APL?

I have a vector containing 9000 integer elements, where each group of 9 has 3 sub-groups that I'd like to separate out, resulting in a matrix with the shape 3 1000 3. Here's what I did:
⎕IO←0
m←(9÷⍨≢data) 9⍴data
a←m[;0 1 2]
b←m[;3 4 5]
c←m[;6 7 8]
d←↑a b c
which does what I want -- but can I shape the vector directly?
Solution
1 0 2 ⍉ (9÷⍨≢data) 3 3 ⍴ data
Explanation
By using ⍳45 as placeholder data (with ⎕IO←0, as in the question), we can see what is intended:
data ← ⍳45
m←(9÷⍨≢data) 9⍴data
a←m[;0 1 2]
b←m[;3 4 5]
c←m[;6 7 8]
d←↑a b c
d
 0  1  2
 9 10 11
18 19 20
27 28 29
36 37 38

 3  4  5
12 13 14
21 22 23
30 31 32
39 40 41

 6  7  8
15 16 17
24 25 26
33 34 35
42 43 44
The final shape will clearly be 3 (9÷⍨≢data) 3, but we are filling one row from each layer first, then the second row from each layer, and so on. Compare this to the normal way of filling, which is all rows of the first layer, then all the rows of the second layer, and so on:
3 (9÷⍨≢data) 3⍴data
 0  1  2
 3  4  5
 6  7  8
 9 10 11
12 13 14

15 16 17
18 19 20
21 22 23
24 25 26
27 28 29

30 31 32
33 34 35
36 37 38
39 40 41
42 43 44
In other words, our job is to swap the filling order of the first two axes. To do this, we list the axis lengths in the order we want them filled:
(9÷⍨≢data) 3 3⍴data
 0  1  2
 3  4  5
 6  7  8

 9 10 11
12 13 14
15 16 17

18 19 20
21 22 23
24 25 26

27 28 29
30 31 32
33 34 35

36 37 38
39 40 41
42 43 44
Now we need to swap the first two axes. This is possible using the dyadic transpose function ⍉ which (for our use case) can be thought of as the "reorder axes" function. The left argument is an array saying where you want each corresponding axis to go (the first element defines the final location of the first axis, and so on). Since the normal order of the axes is 0 1 2, we can swap the first two axes with 1 0 2.
Thus 1 0 2 ⍉ (9÷⍨≢data) 3 3 ⍴ data takes our (9÷⍨≢data) 3 3 shape and puts it into the desired shape of 3 (9÷⍨≢data) 3.
d ≡ 1 0 2 ⍉ (9÷⍨≢data) 3 3 ⍴ data
1
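For readers following along in R rather than APL, the same reshape-then-reorder-axes idea can be sketched with array() and aperm(). This is only an analogue and not part of the original answer: R fills arrays with the first axis varying fastest (the opposite of APL), so the axis lengths are listed innermost-first and the permutation vector differs from 1 0 2.

```r
data <- 0:44                       # same placeholder values as ⍳45 with ⎕IO←0
n <- length(data) / 9              # number of groups of 9

# R fills the first axis fastest, so the axes are, in order:
# element within triplet, triplet within group, group
arr <- array(data, dim = c(3, 3, n))

# reorder the axes to (triplet, group, element), i.e. shape 3 n 3
d <- aperm(arr, c(2, 3, 1))
dim(d)
d[1, , ]                           # first layer: rows 0 1 2, then 9 10 11, ...
```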

Doing row products conditionally on the values of specific columns

I have a dataset of this form.
a=data.frame(A=1:5,B=1:5,matrix(seq(50),nrow = 5))
colnames(a)<-c("A","B", paste0(1:10))
A B 1 2 3 4 5 6 7 8 9 10
1 1 1 6 11 16 21 26 31 36 41 46
2 2 2 7 12 17 22 27 32 37 42 47
3 3 3 8 13 18 23 28 33 38 43 48
4 4 4 9 14 19 24 29 34 39 44 49
5 5 5 10 15 20 25 30 35 40 45 50
I intend to use apply to compute a product along each row, conditional on the values of A and B. Take row 2, for instance: we have A=2 and B=2, so the code should look for column "2" and column "4" (that is, 2+2) and take the product of all the elements from the first to the second. The result is thus 7*12*17=1428.
I can do it for a row
prod(a[1,match(a$A[1],colnames(a)):match(a$A[1]+a$B[1],colnames(a))])
but I can't figure out a way to apply it to the whole data.frame. Any help?
Here is one option with apply: specify the MARGIN as 1 to loop over the rows, create the two indexes by matching the first element (A) and the sum of the first two elements (A + B) against the column names, create a sequence between them (:), subset the values of 'x' and get the prod.
apply(a, 1, function(x) {
  i1 <- match(x[1], names(x))
  i2 <- match(x[1] + x[2], names(x))
  prod(x[i1:i2])
})
#[1] 6 1428 150696 17535024 2362500000
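apply() first converts the data frame to a matrix, so the function above sees each row as a named vector. An alternative sketch that indexes the data frame directly, looping over the row numbers with vapply(); the name row_prod is made up for illustration, and the data are rebuilt from the question so the block runs on its own:

```r
a <- data.frame(A = 1:5, B = 1:5, matrix(seq(50), nrow = 5))
colnames(a) <- c("A", "B", paste0(1:10))

row_prod <- vapply(seq_len(nrow(a)), function(r) {
  # positions of the columns named A and A + B for this row
  from <- match(as.character(a$A[r]), colnames(a))
  to   <- match(as.character(a$A[r] + a$B[r]), colnames(a))
  # product of everything between those two columns, inclusive
  prod(unlist(a[r, from:to]))
}, numeric(1))

row_prod
```

Declaring numeric(1) makes vapply() check that every row yields exactly one number.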

Select every nth row, offset the start and repeat

I am trying to create a new column in a data.frame by selecting every 8th row of an existing column, starting at the first row (i.e. row 1, row 9, row 17). Once it reaches the end of the column I need it to repeat this process starting at row 2 (selecting row 2, row 10, row 18). I have a fixed number of rows at 96, so I need it to repeat until it would start on the 9th row and then quit.
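Here is an example of what I would like to do (only the first 24 of the 96 values are shown):
df <- data.frame(Row=1:96)
# first 24 values of the desired column
nineth <- c(1,9,17,25,33,41,49,57,65,73,81,89,2,10,18,26,34,42,50,58,66,74,82,90)
print(data.frame(Row=1:24, nineth))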
Row nineth
1 1 1
2 2 9
3 3 17
4 4 25
5 5 33
6 6 41
7 7 49
8 8 57
9 9 65
10 10 73
11 11 81
12 12 89
13 13 2
14 14 10
15 15 18
16 16 26
17 17 34
18 18 42
19 19 50
20 20 58
21 21 66
22 22 74
23 23 82
24 24 90
Is there a way to do this using a for loop? I am more familiar with them than the apply family.
You can use R's matrix/vector duality to do this easily: fill a matrix with 8 columns row by row, then read it back column by column.
df <- data.frame(Row=1:96)
df$nineth <- as.vector(matrix(df$Row, byrow = TRUE, ncol = 8))
head(df,15)
Row nineth
1 1 1
2 2 9
3 3 17
4 4 25
5 5 33
6 6 41
7 7 49
8 8 57
9 9 65
10 10 73
11 11 81
12 12 89
13 13 2
14 14 10
15 15 18
The following also works:
n <- 9
# one sequence per starting row 1..8, each stepping by n-1 = 8
df$nineth <- unlist(lapply(1:(n-1), function(x) {
  df$Row[seq(x, nrow(df), by = n-1)]
}))
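Since the question asks about a for loop, the same result can also be sketched that way. This assumes the 96-row data frame from the question; the names step, pos and idx are illustrative only:

```r
df <- data.frame(Row = 1:96)
step <- 8                                # distance between selected rows

nineth <- integer(nrow(df))              # preallocate the new column
pos <- 1                                 # next free position in nineth
for (start in seq_len(step)) {
  idx <- seq(start, nrow(df), by = step) # start, start + 8, start + 16, ...
  nineth[pos:(pos + length(idx) - 1)] <- df$Row[idx]
  pos <- pos + length(idx)
}
df$nineth <- nineth

head(df$nineth, 15)
```

Each pass of the loop handles one starting row (1 through 8) and appends its 12 selected rows to the new column.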

How to choose certain rows and certain columns from a group of rows, convert them into one row, and then bind them to the data frame?

Please see the Example df: there are Group 1 and Group 2, and each Group has 6 rows.
For the 1st row of the Group, take Weight, Height & Volume from Rows 1, 2 and 3 of that Group, convert them into one row and bind them to the data frame.
For the 2nd row of the Group, take Weight, Height & Volume from Rows 1, 2 and 3 of that Group, convert them into one row and bind them to the data frame.
For the 3rd row of the Group, take Weight, Height & Volume from Rows 2, 3 and 4 of that Group, convert them into one row and bind them to the data frame.
For the 4th row of the Group, take Weight, Height & Volume from Rows 3, 4 and 5 of that Group, convert them into one row and bind them to the data frame.
For the 5th row of the Group, take Weight, Height & Volume from Rows 4, 5 and 6 of that Group, convert them into one row and bind them to the data frame.
For the 6th row of the Group, take Weight, Height & Volume from Rows 4, 5 and 6 of that Group, convert them into one row and bind them to the data frame.
The result will look something like the Expected Outcome.
How can I achieve this in a vectorized way?
Update: I have got what I want to achieve, but is there a more efficient approach?
library(plyr)
# create a column to store the row indexes
# (as integers, so they can be used for positional indexing below)
df$row = seq_len(nrow(df))
df$V1 = 0
df$V2 = 0
df$V3 = 0
# choose the first, second and third row of every group, and store their indexes
first = ddply(df, .(Group), function(x) x[ 1 , ])$row
second= ddply(df, .(Group), function(x) x[ 2 , ])$row
third = ddply(df, .(Group), function(x) x[ 3 , ])$row
df$V1[first] = df$Weight[first]
df$V2[first] = df$Height[first]
df$V3[first] = df$Volume[first]
#....and so on for the rest of the V4~6
Example df
Weight Height Volume Group
1: 11 12 17 1
2: 25 17 19 1
3: 29 25 20 1
4: 34 35 27 1
5: 39 36 31 1
6: 18 20 37 1
7: 9 12 4 2
8: 10 33 7 2
9: 18 25 19 2
10: 26 19 20 2
11: 27 22 25 2
12: 38 59 36 2
Expected Outcome
Weight Height Volume Group V1 V2 V3 V4 V5 V6 V7 V8 V9
1: 11 12 17 1 11 12 17 25 17 19 29 25 20
2: 25 17 19 1 11 12 17 25 17 19 29 25 20
3: 29 25 20 1 25 17 19 29 25 20 34 35 27
4: 34 35 27 1 29 25 20 34 35 27 39 36 31
5: 39 36 31 1 34 35 27 39 36 31 18 20 37
6: 18 20 37 1 34 35 27 39 36 31 18 20 37
7: 9 12 4 2 9 12 4 10 33 7 18 25 19
8: 10 33 7 2 9 12 4 10 33 7 18 25 19
9: 18 25 19 2 10 33 7 18 25 19 26 19 20
10: 26 19 20 2 18 25 19 26 19 20 27 22 25
11: 27 22 25 2 26 19 20 27 22 25 38 59 36
12: 38 59 36 2 26 19 20 27 22 25 38 59 36
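The selection rules above amount to a three-row window that slides with the current row and is clamped at both ends of each group. A minimal base R sketch of that reading, assuming the rows are already ordered by Group and that every group has at least 3 rows; the helper name window_rows and the variable names are hypothetical:

```r
df <- data.frame(
  Weight = c(11, 25, 29, 34, 39, 18, 9, 10, 18, 26, 27, 38),
  Height = c(12, 17, 25, 35, 36, 20, 12, 33, 25, 19, 22, 59),
  Volume = c(17, 19, 20, 27, 31, 37, 4, 7, 19, 20, 25, 36),
  Group  = rep(1:2, each = 6)
)

# hypothetical helper: adds V1..V9 to the rows of one group
window_rows <- function(g) {
  n <- nrow(g)
  vals <- as.matrix(g[, c("Weight", "Height", "Volume")])
  # window start for row i is i - 1, clamped to the range 1 .. n - 2
  start <- pmin(pmax(seq_len(n) - 1, 1), n - 2)
  # flatten each 3-row window row by row into 9 values
  out <- t(vapply(start, function(s) as.vector(t(vals[s:(s + 2), ])), numeric(9)))
  colnames(out) <- paste0("V", 1:9)
  cbind(g, out)
}

result <- do.call(rbind, lapply(split(df, df$Group), window_rows))
rownames(result) <- NULL
result
```

The clamping with pmin() and pmax() is what makes the first two rows share one window and the last two rows share another.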

rank and order in R

I am having trouble understanding the difference between the R function rank and the R function order. They seem to produce the same output:
> rank(c(10,30,20,50,40))
[1] 1 3 2 5 4
> order(c(10,30,20,50,40))
[1] 1 3 2 5 4
Could somebody shed some light on this for me?
Thanks
set.seed(1)
x <- sample(1:50, 30)
x
# [1] 14 19 28 43 10 41 42 29 27 3 9 7 44 15 48 18 25 33 13 34 47 39 49 4 30 46 1 40 20 8
rank(x)
# [1] 9 12 16 25 7 23 24 17 15 2 6 4 26 10 29 11 14 19 8 20 28 21 30 3 18 27 1 22 13 5
order(x)
# [1] 27 10 24 12 30 11 5 19 1 14 16 2 29 17 9 3 8 25 18 20 22 28 6 7 4 13 26 21 15 23
rank returns a vector with the "rank" of each value: the number in the first position (14) is the 9th lowest. order returns the indices that would put the initial vector x in order.
The 27th value of x is the lowest, so 27 is the first element of order(x) - and if you look at rank(x), the 27th element is 1.
x[order(x)]
# [1] 1 3 4 7 8 9 10 13 14 15 18 19 20 25 27 28 29 30 33 34 39 40 41 42 43 44 46 47 48 49
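A smaller example makes the difference easier to see by hand. The sketch below uses a made-up three-element vector where the two functions disagree, then checks the vector from the question, which only swaps neighbouring pairs and so happens to give the same answer for both:

```r
x <- c(30, 10, 20)
order(x)   # 2 3 1: the smallest value sits at index 2, the next at 3, the largest at 1
rank(x)    # 3 1 2: 30 is the 3rd smallest, 10 the 1st, 20 the 2nd

# the vector from the question
q <- c(10, 30, 20, 50, 40)
identical(order(q), as.integer(rank(q)))   # TRUE
```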
As it turned out this was a special case and made things confusing. I explain below for anyone interested:
rank returns, for each element, the position it would have in an ascending list
order returns, for each position in the ascending list, the index of the element that belongs there
I always find it confusing to think about the difference between the two, and I always think, "how can I get to order using rank"?
Starting with Justin's example:
Order using rank:
## Setup example to match Justin's example
set.seed(1)
x <- sample(1:50, 30)
## Make a vector to store the sorted x values
xx = integer(length(x))
## i is the index, ir is the ith "rank" value
i = 0
for(ir in rank(x)){
  i = i + 1
  xx[ir] = x[i]
}
all(xx==x[order(x)])
[1] TRUE
rank is more complicated and not necessarily an index (integer), because tied values share the average of their positions:
> rank(c(1))
[1] 1
> rank(c(1,1))
[1] 1.5 1.5
> rank(c(1,1,1))
[1] 2 2 2
> rank(c(1,1,1,1))
[1] 2.5 2.5 2.5 2.5
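The fractional values come from rank's default way of handling ties, which can be changed through its ties.method argument. A short sketch on a made-up vector with one tie:

```r
x <- c(10, 20, 10, 30)

rank(x)                          # 1.5 3.0 1.5 4.0 (default, ties.method = "average")
rank(x, ties.method = "min")     # 1 3 1 4: tied values all get the lowest position
rank(x, ties.method = "first")   # 1 3 2 4: ties are broken by order of appearance
```

With "min" or "first" the result is an integer vector, so it can be used as an index again.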
In layman's language, order gives the position that each value of the sorted vector had in the original vector.
For example:
a<-c(3,4,2,7,8,5,1,6)
sort(a)
[1] 1 2 3 4 5 6 7 8
The position of 1 in a is 7. Similarly, the position of 2 in a is 3.
order(a)
[1] 7 3 1 2 6 8 4 5
As stated by ?order at the R prompt,
order just returns a permutation which sorts the original vector into ascending/descending order.
Suppose that we have a vector
A<-c(1,4,3,6,7,4);
A.sort<-sort(A);
then the following hold element-wise, except at the positions affected by ties (here the two tied 4s):
order(A) == match(A.sort,A);
rank(A) == match(A,A.sort);
Besides, I find that order has the following properties (not validated theoretically):
1 every element of order(A) lies between 1 and length(A)
2 order(order(order(....order(A)....))): if you apply order to A an odd number of times, the result is always the same, and likewise for an even number of times.
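The second property can be checked directly: order applied to a permutation returns its inverse permutation, so applying it twice more gets back to where it started. A small sketch, using a tie-free variation of the vector above (the last value is changed to 5 as an assumption for illustration):

```r
A <- c(1, 4, 3, 6, 7, 5)     # distinct values only
o1 <- order(A)
o2 <- order(o1)              # inverse permutation of o1
o3 <- order(o2)              # inverse of the inverse

all(o2 == rank(A))           # without ties, order(order(A)) equals rank(A)
identical(o3, o1)            # odd number of applications: same as order(A)
identical(order(o3), o2)     # even number of applications: same as order(order(A))
```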
some observations:
set.seed(0)
x<-matrix(rnorm(10),1)
dm<-diag(length(x))
# compute rank from order and backwards:
rank(x) == col(x)%*%dm[order(x),]
order(x) == col(x)%*%dm[rank(x),]
# in special cases like this
x<-cumsum(rep(c(2,0),times=5))+rep(c(0,-1),times=5)
# they are equal
order(x)==rank(x)

Resources