How to generate binary variables without duplicated ID rows - r

I have this data frame :
>df
ID X1 X2 X3
IX 0 0 1
IX 1 1 0
IY 0 0 1
IZ 1 0 0
IZ 0 1 0
I need to create a data frame without duplicates, with one row per unique ID, and
the result should take all the binary values for that ID into account.
In other words, the result should be:
ID X1 X2 X3
IX 1 1 1
IY 0 0 1
IZ 1 1 0
I tried the duplicated function, but it just deletes the duplicated ID rows without taking the binary values into account, so it doesn't give the needed result.
What should I do?

aggregate(df[2:4], df[1], sum)
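The one-liner above gives the desired result for the example because no ID has two 1s in the same column. As a hedged sketch (the data frame below is rebuilt from the question, and swapping sum for max is a suggestion, not part of the original answer), using max keeps the output binary even if the same column is 1 in several rows of one ID:

```r
# Rebuild the example data from the question
df <- data.frame(
  ID = c("IX", "IX", "IY", "IZ", "IZ"),
  X1 = c(0, 1, 0, 1, 0),
  X2 = c(0, 1, 0, 0, 1),
  X3 = c(1, 0, 1, 0, 0)
)

# Collapse rows per ID; max() returns 1 if any row of that ID has a 1,
# whereas sum() could return 2 or more for repeated 1s
result <- aggregate(df[2:4], df[1], max)
result
```

With binary inputs, max acts as a per-group logical OR, which is what the question describes.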

Related

Create a new variable based on any 2 conditions being true

I have a dataframe in R with 4 variables and would like to create a new variable based on any 2 conditions being true on those variables.
I have attempted to create it via if/else statements; however, that would require a permutation of every variable condition being true. I would also need to scale this so that I can create a new variable based on any 3 conditions being true. Is there a more efficient method than using if/else statements?
My example:
I have a dataframe X with following column variables
x1 = c(1,0,1,0)
X2 = c(0,0,0,0)
X3 = c(1,1,0,0)
X4 = c(0,0,1,0)
I would like to create a new variable X5 that is 1 if any 2 of the variables are true (e.g. == 1)
The new variable based on the above dataframe would produce X5 (1,0,1,0)
This can easily be done by using the apply function:
x1 = c(1,0,1,0)
x2 = c(0,0,0,0)
x3 = c(1,1,0,0)
x4 = c(0,0,1,0)
df <- data.frame(x1,x2,x3,x4)
df$x5 <- apply(df,1,function(row) ifelse(sum(row != 0) == 2, 1, 0))
x1 x2 x3 x4 x5
1 1 0 1 0 1
2 0 0 1 0 0
3 1 0 0 1 1
4 0 0 0 0 0
Calling apply with MARGIN = 1 means: apply this function to every row. To scale this up to 3...N true values, just change the number in the ifelse statement.
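To make the "just change the number" step concrete, here is a minimal sketch that wraps the row count in a function. The name flag_n_true is hypothetical (it is not a base R function), and rowSums is used in place of apply for brevity:

```r
x1 <- c(1, 0, 1, 0)
x2 <- c(0, 0, 0, 0)
x3 <- c(1, 1, 0, 0)
x4 <- c(0, 0, 1, 0)
df <- data.frame(x1, x2, x3, x4)

# flag_n_true is a hypothetical helper, not part of base R.
# It returns 1 for rows with exactly n non-zero values, 0 otherwise.
flag_n_true <- function(data, n) {
  as.numeric(rowSums(data != 0) == n)
}

# Flag rows where exactly 2 of the 4 variables are true
df$x5 <- flag_n_true(df[c("x1", "x2", "x3", "x4")], 2)
df
```

If "any 2" should mean "at least 2", replacing == with >= inside the helper would cover that case.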
You can try this:
#Data
df <- data.frame(x1,X2,X3,X4)
#Code
df$X5 <- ifelse(rowSums(df, na.rm = TRUE) == 2, 1, 0)
x1 X2 X3 X4 X5
1 1 0 1 0 1
2 0 0 1 0 0
3 1 0 0 1 1
4 0 0 0 0 0
You can use:
df$X5 <- 1*(apply(df == 1, 1, sum) == 2)
or (note that mapply(sum, df) on its own would sum each column, not each row)
df$X5 <- 1*(do.call(mapply, c(list(FUN = sum), df)) == 2)
Output
> df
X1 X2 X3 X4 X5
1 0 1 0 1
0 0 1 0 0
1 0 0 1 1
0 0 0 0 0
Data
df <- data.frame(X1,X2,X3,X4)

How to count how many conditions an observation meets using R?

I have a data set with lots of binary variables, all with values 0/1. I want to create a new column that counts, for each observation, how many of the binary variables equal 1: 1 if one variable is 1, 2 if two variables are 1, and so on.
Such as:
x1 x2 x3 x4 x5
1 1 1 0 1
0 0 1 0 0
0 0 0 0 0
I want to have
x1 x2 x3 x4 x5 count
1 1 1 0 1 4
0 0 1 0 0 1
0 0 0 0 0 0
If your dataset contains only the binary variables you are interested in, you can use
df$count <- rowSums(df)
Otherwise, please provide a more detailed description of your data.
Another option is Reduce with +
df$count <- Reduce(`+`, df)
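If the data set also holds non-binary columns, the same idea can be restricted to the binary ones. A minimal sketch, assuming a made-up id column and a hand-written vector of column names (both are illustrative and not from the question):

```r
# Example data from the question, plus a hypothetical non-binary id column
df <- data.frame(
  id = c("a", "b", "c"),
  x1 = c(1, 0, 0),
  x2 = c(1, 0, 0),
  x3 = c(1, 1, 0),
  x4 = c(0, 0, 0),
  x5 = c(1, 0, 0)
)

# Names of the binary columns to count over (assumed known in advance)
binary_cols <- c("x1", "x2", "x3", "x4", "x5")

# Count the 1s per row, ignoring the id column
df$count <- rowSums(df[binary_cols])
df
```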

extract rows for which first non-zero element is one

I would like to extract every row from the data frame my.data for which the first non-zero element is a 1.
my.data <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
0 2 1 1
2 1 2 1
1 1 1 2
0 0 0 0
0 1 0 0
', header = TRUE)
my.data
desired.result <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
1 1 1 2
0 1 0 0
', header = TRUE)
desired.result
I am not even sure where to begin. Sorry if this is a duplicate. Thank you for any suggestions or advice.
Here's one approach:
# index of rows
idx <- apply(my.data, 1, function(x) any(x) && x[as.logical(x)][1] == 1)
# extract rows
desired.result <- my.data[idx, ]
The result:
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0
Probably not the best answer, but:
rows.to.extract <- apply(my.data, 1, function(x) {
no.zeroes <- x[x!=0] # removing 0
to.return <- no.zeroes[1] == 1 # checking whether the first non-zero number is 1
# if a row is all 0, then to.return will be NA
# this fixes that problem
to.return[is.na(to.return)] <- FALSE # if row is all 0
to.return
})
my.data[rows.to.extract, ]
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0
1. Use apply to iterate over all rows:
first.element.is.one <- apply(my.data, 1, function(x) x[x != 0][1] == 1)
The function passed to apply compares the first ([1]) non-zero (x != 0) element of x with 1. It is called once for each row; x will be a vector of four elements in your example.
2. Use which to extract the indices of the candidate rows (and remove NA values, too):
desired.rows <- which(first.element.is.one)
3. Select the rows of the matrix -- you probably know how to do this.
Bonus question: Where do the NA values mentioned in step 2 come from?
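The apply / which / subset steps above can be put together as follows. This is only a sketch using the example data from the question; the final subsetting line is one possible way to do the last step, which the answer leaves to the reader:

```r
my.data <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
0 2 1 1
2 1 2 1
1 1 1 2
0 0 0 0
0 1 0 0
', header = TRUE)

# TRUE/FALSE per row; a row with no non-zero element yields NA,
# because indexing an empty vector with [1] returns NA
first.element.is.one <- apply(my.data, 1, function(x) x[x != 0][1] == 1)

# which() keeps only the TRUE positions, silently dropping FALSE and NA
desired.rows <- which(first.element.is.one)

# Select the matching rows
desired.result <- my.data[desired.rows, ]
desired.result
```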

R subsetting data according to non zeros in row

I have a dataset like this:
1 2 3 4 5
1 1 0 0 0 0
2 1 0 0 2 0
3 0 0 0 1 5
4 2 0 0 0 0
I want to subset the rows with more than one non-zero column (i.e. rows 2 and 3).
I know that it has to be something like dataset[... dataset ...], but I could not find out how to access rows as such without using a for loop.
You just need rowSums really. Assuming your dataset is called "mydf", try:
> mydf[rowSums(mydf != 0) > 1, ]
X1 X2 X3 X4 X5
2 1 0 0 2 0
3 0 0 0 1 5
Here, rowSums(mydf != 0) returns a vector of how many values in each row are not zero. Adding the condition > 1 then creates a logical vector, which can be used to subset the rows we want.
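To see the intermediate values, the expression can be split into steps. A small sketch, with mydf rebuilt from the question's data (the column names X1 to X5 follow the answer's output; the names nonzero_per_row and keep are illustrative):

```r
mydf <- data.frame(
  X1 = c(1, 1, 0, 2),
  X2 = c(0, 0, 0, 0),
  X3 = c(0, 0, 0, 0),
  X4 = c(0, 2, 1, 0),
  X5 = c(0, 0, 5, 0)
)

# Number of non-zero entries in each row
nonzero_per_row <- rowSums(mydf != 0)

# Logical vector: TRUE for rows with more than one non-zero entry
keep <- nonzero_per_row > 1

# Subset the rows
mydf[keep, ]
```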

Adding columns to data.frame in R

Having 2 vectors like the following:
vec1<-c("x", "y")
vec2<-c(rep(0, 5))
I would like to create a data.frame object DF where vec1 becomes the first column and vec2 supplies the values of each row, spread across the remaining columns. Visually, it would look like this:
vec1 1 2 3 4 5
x 0 0 0 0 0
y 0 0 0 0 0
I have tried the following code, but it adds both vectors as columns:
DF<-data.frame(vec1, vec2)
Instead of generating a vector for your rows, you can generate a whole matrix, and then use data.frame to bind it to your first vector. Something like this:
mat <- matrix(0, nrow=2, ncol=5)
vec <- c("x","y")
data.frame(vec, mat)
Which gives :
vec X1 X2 X3 X4 X5
1 x 0 0 0 0 0
2 y 0 0 0 0 0
You can use rbind() inside the data.frame() function to put vec2 values in both rows of new data frame.
vec1<-c("x", "y")
vec2<-c(rep(0, 5))
data.frame(vec1,rbind(vec2,vec2))
vec1 X1 X2 X3 X4 X5
1 x 0 0 0 0 0
2 y 0 0 0 0 0
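Both answers hard-code two rows. As a hedged sketch of a variant that adapts to the lengths of vec1 and vec2 (it combines the matrix idea with the original vectors; the byrow = TRUE layout is a choice made here, not taken from either answer):

```r
vec1 <- c("x", "y")
vec2 <- rep(0, 5)

# One row per element of vec1; each row is a copy of vec2
mat <- matrix(vec2, nrow = length(vec1), ncol = length(vec2), byrow = TRUE)

# data.frame() names the matrix columns X1, X2, ... automatically
DF <- data.frame(vec1, mat)
DF
```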
