R: rowsum function changes order of groups after aggregation

I've got this data frame which has duplicates (same ID but different numbers):
ID X1 X2 X3 X4 X5
45 1 0 0 1 0
45 0 1 0 0 1
15 1 0 1 0 0
7 1 0 1 1 0
7 0 1 0 0 0
I want to sum the vectors that have the same ID so I've used rowsum:
m <- rowsum(m, m$ID)
However, it changes the order of the rows, giving something like this:
ID X1 X2 X3 X4 X5
15 1 0 1 0 0
45 1 1 0 1 1
7 1 1 1 1 0
Instead of what I want:
ID X1 X2 X3 X4 X5
45 1 1 0 1 1
15 1 0 1 0 0
7 1 1 1 1 0
Does anyone know how to fix this?

Set reorder = FALSE in the call to rowsum.
From ?rowsum:
reorder: if ‘TRUE’, then the result will be in order of
‘sort(unique(group))’, if ‘FALSE’, it will be in the order
that groups were encountered.
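A minimal sketch of that suggestion, rebuilding the question's data by hand. It assumes the ID column should act only as the grouping key, so it is dropped from the summed columns with m[-1] (otherwise the IDs themselves would be added up). The group labels end up as the row names of the result.

```r
# Data from the question, typed in by hand
m <- data.frame(
  ID = c(45, 45, 15, 7, 7),
  X1 = c(1, 0, 1, 1, 0),
  X2 = c(0, 1, 0, 0, 1),
  X3 = c(0, 0, 1, 1, 0),
  X4 = c(1, 0, 0, 1, 0),
  X5 = c(0, 1, 0, 0, 0)
)

# Sum every column except ID within each ID group;
# reorder = FALSE keeps the groups in order of first appearance
res <- rowsum(m[-1], group = m$ID, reorder = FALSE)

rownames(res)  # "45" "15" "7"
```

If ID is needed as a regular column again, something like res$ID <- rownames(res) would restore it (as character).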

Related

Change values in multiple columns with unique value and merge into single column (R)

Let's say I have a dataset (ds) with 4 rows with 3 variables as seen below:
ds
x1 x2 x3
1 0 0
0 0 1
0 1 0
0 0 1
How do I change the "1" to a unique value for each column and combine them into a single column?
So, the first step:
x1 x2 x3
1 0 0
0 0 3
0 2 0
0 0 3
Then, the second step (creating x4):
x1 x2 x3 x4
1 0 0 1
0 0 3 3
0 2 0 2
0 0 3 3
I have a lot more variables than this, I just want to know how to minimize the number of lines I write so it's not like 10+ lines.
You could do this:
df <- read.table(text="x1 x2 x3
1 0 0
0 0 1
0 1 0
0 0 1", header=TRUE, stringsAsFactors=FALSE)
df <- df*col(df)
df$x4 <- rowSums(df)
# x1 x2 x3 x4
# 1 1 0 0 1
# 2 0 0 3 3
# 3 0 2 0 2
# 4 0 0 3 3
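To see why this works, it may help to look at the intermediate pieces. The sketch below uses the same values as the question, built with data.frame() instead of read.table(); the names idx and recoded are only illustrative.

```r
df <- data.frame(
  x1 = c(1, 0, 0, 0),
  x2 = c(0, 0, 1, 0),
  x3 = c(0, 1, 0, 1)
)

# col() returns a matrix of the same shape in which every cell
# holds its own column number (1, 2, 3, ...)
idx <- col(df)

# Element-wise product: each 1 becomes its column number, each 0 stays 0
recoded <- df * idx

# At most one non-zero value per row here, so the row sum picks it out
recoded$x4 <- rowSums(recoded)
```

This relies on each row having at most one 1; with several 1s in a row the sum would mix the column numbers together.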

Looking to multiply all rows of all columns by a constant

I am trying to find a simple way (1 line of code or so) to multiply all rows of all columns of a dataframe by a constant, for example 100.
df <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
head(df)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 1 1 0 0 0 0 1 0 1
2 0 0 1 0 0 1 0 1 0 0
3 0 0 1 1 0 0 1 1 0 0
4 0 1 0 1 1 1 0 0 0 1
5 1 0 1 1 0 0 0 1 0 0
6 0 0 0 0 1 0 0 1 1 1
This is how I am currently doing it:
dfX1 <- as.data.frame(df$X1 * 100)
But this way I would have to do this 10 times... and then use the cbind function to bind them all back together again.
dfFULL <- cbind(dfX1, dfX2, dfX3...)
Anybody know of a cleaner way?
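One possible approach, given here as a sketch rather than a definitive answer: arithmetic on a data frame whose columns are all numeric is applied element-wise, so the whole frame can be multiplied at once. The set.seed() call and the name df100 are assumptions added for reproducibility and illustration.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(replicate(10, sample(0:1, 1000, replace = TRUE)))

# Multiply every cell by 100 in one step; the result is still a data frame
df100 <- df * 100

head(df100)
```

If the frame also held non-numeric columns, the multiplication would need to be restricted to the numeric ones first.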

Use if statement for each cell of a dataframe in R

I have two dataframes, A and B, each with 64 rows and 431 columns. Each dataframe contains values of zeros and ones. I need to create a new dataframe C that has values of 1 when the cell of A is equal to the cell of B, and a value of 0 when a cell of A is different to the cell of B. How to apply the if statement to each cell of the two dataframes?
Example of dataframes
A <- data.frame(replicate(431,sample(0:1,64,rep=TRUE)))
B <- data.frame(replicate(431,sample(0:1,64,rep=TRUE)))
Example rows from A
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 1 1 0 1 0 1 0 0 1
2 1 1 0 1 1 0 0 0 0 0
3 1 0 0 0 1 0 0 1 1 0
4 0 0 0 0 1 1 1 1 1 0
5 1 0 1 1 0 0 0 1 1 1
Example rows from B
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 0 1 0 0 1 0 1 0 1
2 0 0 0 1 0 1 1 1 1 1
3 1 0 1 1 1 1 0 0 0 0
4 1 0 0 0 0 1 1 0 0 0
5 0 0 0 0 1 1 1 1 1 0
Output I would like to obtain, dataframe C
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0 0 1 0 0 0 0 0 1 0
2 0 0 1 1 0 0 0 0 0 0
3 1 1 0 0 1 0 1 0 0 1
4 0 1 1 1 0 1 1 0 0 1
5 0 1 0 0 0 0 0 1 0 0
Because of R's behind the scenes magic, you don't even need to use an if statement. You can just do this:
C <- (A == B) * 1
The first part (A == B) goes through every cell of A and B and compares them directly. The result is a bunch of TRUE and FALSE values. Multiplying everything by 1 forces the TRUE values to become 1 and FALSE to become 0.
You assess whether A and B are the same (cell-wise) and then transform the TRUE / FALSE values into binary by multiplying it by 1:
df <- (A == B) * 1
The previous answers are correct. If you really want to use an if statement, then you can use this:
C <- ifelse(A == B, 1, 0)
Basic operations on R matrix-like data structures tend to be cell-wise. Logicals mixed with numbers in arithmetic are coerced to numbers, 0 (FALSE) and 1 (TRUE), so (A == B) + 0 does what you want cell-wise. However, to make sure that the result is a data.frame and not a matrix, you need to call as.data.frame:
C = as.data.frame((A == B) + 0)
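The answers above can be compared side by side on a small case. This is only a sketch with two hand-made 3 x 2 data frames (not the 64 x 431 ones from the question); the names C1, C2 and C3 are illustrative.

```r
A <- data.frame(X1 = c(0, 1, 1), X2 = c(1, 1, 0))
B <- data.frame(X1 = c(1, 1, 0), X2 = c(1, 0, 0))

C1 <- (A == B) * 1                  # numeric matrix
C2 <- ifelse(A == B, 1, 0)          # numeric matrix as well
C3 <- as.data.frame((A == B) + 0)   # data frame

is.matrix(C1)      # TRUE
is.data.frame(C3)  # TRUE
all(C1 == C2)      # TRUE
```

All three give the same 0/1 values; they differ only in whether the result is a matrix or a data frame.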

Add columns from different data frames and stack on two indicators

We’d like to merge some columns from a data frame with the matching columns from various different data frames. Our main data frame predict looks as follows:
>predict
x1 x2 x3
1 1 1
0 1 0
1 1 0
1 1 0
0 0 1
(There may be more columns depending on the quantity of prediction runs)
Our goal is to merge this data frame with the y-columns from three different test data frames (df_1 df_2 and df_3) which all have the same structure. The needed columns are accessed through df_1$y[test] ([test] is a logical vector which identifies the 5 values which match our x-values) and have the same structure as the x-columns from predict.
The desired output would look like this:
>predict_test
x1 x2 x3 y1 y2 y3
1 1 1 1 1 1
0 1 0 0 0 0
1 1 0 0 1 0
1 1 0 1 1 1
0 0 1 0 0 1
In the next step we need to stack the x-columns into one column and the y-columns into another in order to do evaluations. It is important to stack them in the correct order, i.e. x2 under x1 and x3 under x2, and likewise for the y-columns.
>predict_test_stack
x_all y_all
1 1
0 0
1 0
1 1
0 0
1 1
1 0
1 1
1 1
0 0
1 1
0 0
0 0
0 1
1 1
This probably works with melt, but we don't know how to apply it while indicating two different id variables.
Thanks for your help.
data
df1 <- read.table(text = "x1 x2 x3
1 1 1
0 1 0
1 1 0
1 1 0
0 0 1",stringsAsFactors = FALSE,header=TRUE)
df2 <- read.table(text = "y1 y2 y3
1 1 1
0 0 0
0 1 0
1 1 1
0 0 1",stringsAsFactors = FALSE,header=TRUE)
solution
We bind the data.frames side by side, then unlist the result and refill it into a matrix with one column per original data.frame. Finally, we build the column names by stripping the digit from the first column name of each data.frame.
list1 <- list(df1,df2)
side_by_side <- data.frame(list1)
# x1 x2 x3 y1 y2 y3
# 1 1 1 1 1 1 1
# 2 0 1 0 0 0 0
# 3 1 1 0 0 1 0
# 4 1 1 0 1 1 1
# 5 0 0 1 0 0 1
output <- data.frame(matrix(unlist(side_by_side),ncol = length(list1)))
names(output) <- sapply(list1,function(x){sub("[[:digit:]]","",names(x)[1])})
# x y
# 1 1 1
# 2 0 0
# 3 1 0
# 4 1 1
# 5 0 0
# 6 1 1
# 7 1 0
# 8 1 1
# 9 1 1
# 10 0 0
# 11 1 1
# 12 0 0
# 13 0 0
# 14 0 1
# 15 1 1
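A shorter variant of the same idea, as a sketch: unlist() on a data frame already concatenates its columns in order (x1, then x2, then x3), so each block can be stacked directly. The column names x_all and y_all are taken from the question, and the name stacked is only illustrative.

```r
df1 <- data.frame(
  x1 = c(1, 0, 1, 1, 0),
  x2 = c(1, 1, 1, 1, 0),
  x3 = c(1, 0, 0, 0, 1)
)
df2 <- data.frame(
  y1 = c(1, 0, 0, 1, 0),
  y2 = c(1, 0, 1, 1, 0),
  y3 = c(1, 0, 0, 1, 1)
)

# Stack the columns of each data frame on top of each other, in column order
stacked <- data.frame(
  x_all = unlist(df1, use.names = FALSE),
  y_all = unlist(df2, use.names = FALSE)
)

nrow(stacked)  # 15
```

This assumes both data frames have the same number of rows and columns, as in the question.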

How to change data to binary in R and keep the row names column?

I have a data frame that looks like this
Site <- c("X1","X2","X3","X4","X5","X6","X7","X8","X9","X10")
A <- c(0,0,1,2,4,5,6,7,13,56)
B <- c(1,0,0,0,0,4,5,7,7,8)
C <- c(2,3,0,0,4,5,67,8,43,21)
D <- c(134,0,0,2,0,0,9,0,45,55)
mydata <- data.frame(Site,A,B,C,D,stringsAsFactors=FALSE)
I want to convert all values > 0 to be 1 (i.e. binary), without jeopardising the column and row names.
I have tried mydata[mydata>=1]<-1 but it also changed my first column (the row names) to 1 as well:
head(mydata)
Site A B C D
1 1 0 1 1 1
2 1 0 0 1 0
3 1 1 0 0 0
4 1 1 0 0 1
5 1 1 0 1 0
6 1 1 1 1 0
So how do I change just the values to binary, not the row names?
We can create a logical matrix and coerce to binary
mydata[-1] <- +(mydata[-1] > 0)
As an alternative to the answer given by @akrun (+1), we can also use sapply() to convert any value greater than 0 to 1 and everything else to 0:
mydata[-1] <- sapply(mydata[-1], function(x) { as.numeric(x > 0) })
mydata
Site A B C D
1 X1 0 1 1 1
2 X2 0 0 1 0
3 X3 1 0 0 0
4 X4 1 0 0 1
5 X5 1 0 1 0
6 X6 1 1 1 0
7 X7 1 1 1 1
8 X8 1 1 1 0
9 X9 1 1 1 1
10 X10 1 1 1 1
If we weren't sure about the relative positioning of the columns, we could also address the numeric columns using mydata[c("A", "B", "C", "D")] or something similar.
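The columns could also be picked by type instead of by position or name. A minimal sketch, assuming every numeric column should be converted; the name num_cols is only illustrative.

```r
Site <- c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10")
A <- c(0, 0, 1, 2, 4, 5, 6, 7, 13, 56)
B <- c(1, 0, 0, 0, 0, 4, 5, 7, 7, 8)
C <- c(2, 3, 0, 0, 4, 5, 67, 8, 43, 21)
D <- c(134, 0, 0, 2, 0, 0, 9, 0, 45, 55)
mydata <- data.frame(Site, A, B, C, D, stringsAsFactors = FALSE)

# Logical vector marking which columns are numeric (Site is character, so FALSE)
num_cols <- sapply(mydata, is.numeric)

# Convert only those columns: TRUE/FALSE from the comparison becomes 1/0
mydata[num_cols] <- +(mydata[num_cols] > 0)
```

The Site column is left untouched because it is never part of the selection.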
You could also try this, which works regardless of whether the number is negative or positive:
mydata[-1] <- (!is.na(mydata[-1]/mydata[-1]))*1
The ifelse function assigns one value where a condition holds and another where it does not. It works on vectors and on data frames as well. Here I bind the Site column to the transformed columns.
myBinData <- data.frame(Site = mydata$Site, ifelse(mydata[, -1] == 0, 0, 1))
Site A B C D
1 X1 0 1 1 1
2 X2 0 0 1 0
3 X3 1 0 0 0
4 X4 1 0 0 1
5 X5 1 0 1 0
6 X6 1 1 1 0
7 X7 1 1 1 1
8 X8 1 1 1 0
9 X9 1 1 1 1
10 X10 1 1 1 1
