extract rows for which first non-zero element is one - r

I would like to extract every row from the data frame my.data for which the first non-zero element is a 1.
my.data <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
0 2 1 1
2 1 2 1
1 1 1 2
0 0 0 0
0 1 0 0
', header = TRUE)
my.data
desired.result <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
1 1 1 2
0 1 0 0
', header = TRUE)
desired.result
I am not even sure where to begin. Sorry if this is a duplicate. Thank you for any suggestions or advice.

Here's one approach:
# index of rows
idx <- apply(my.data, 1, function(x) any(x) && x[as.logical(x)][1] == 1)
# extract rows
desired.result <- my.data[idx, ]
The result:
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0

Probably not the best answer, but:
rows.to.extract <- apply(my.data, 1, function(x) {
no.zeroes <- x[x!=0] # removing 0
to.return <- no.zeroes[1] == 1 # finding if first non-zero number is 1
# if a row is all 0, then to.return will be NA
# this fixes that problem
to.return[is.na(to.return)] <- FALSE # if row is all 0
to.return
})
my.data[rows.to.extract, ]
x1 x2 x3 x4
1 0 0 1 1
2 0 0 0 1
5 1 1 1 2
7 0 1 0 0
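For larger data frames, the row-wise apply calls above can be avoided. The following is a minimal sketch, not taken from the original answers, that assumes every column is numeric. The names m, nonzero, first.pos, first.val and keep are illustrative. It uses max.col with ties.method = "first" to locate the first non-zero column of each row.

```r
my.data <- read.table(text = '
x1 x2 x3 x4
0 0 1 1
0 0 0 1
0 2 1 1
2 1 2 1
1 1 1 2
0 0 0 0
0 1 0 0
', header = TRUE)

m <- as.matrix(my.data)
nonzero <- m != 0
# nonzero * 1 turns TRUE/FALSE into 1/0, so the first row maximum
# sits at the first non-zero entry (or at column 1 for an all-zero row)
first.pos <- max.col(nonzero * 1, ties.method = "first")
# matrix indexing: pick one (row, column) cell per row
first.val <- m[cbind(seq_len(nrow(m)), first.pos)]
# all-zero rows are excluded explicitly
keep <- rowSums(nonzero) > 0 & first.val == 1
my.data[keep, ]
```

Because an all-zero row has the value 0 at its reported position, no NA values arise in this variant.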

Use apply to iterate over all rows:
first.element.is.one <- apply(my.data, 1, function(x) x[x != 0][1] == 1)
The function passed to apply takes the non-zero elements of x (x[x != 0]), picks the first of them ([1]), and tests whether it equals 1 (== 1). It is called once for each row; in your example, x is a vector of length four.
Use which to extract the indices of the candidate rows (and remove NA values, too):
desired.rows <- which(first.element.is.one)
Select the rows of the matrix -- you probably know how to do this.
Bonus question: Where do the NA values mentioned in step 2 come from?
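A small sketch of what happens for an all-zero row (such as row 6 of my.data) may help with the bonus question. The variable names are illustrative only.

```r
all.zero <- c(0, 0, 0, 0)
non.zero <- all.zero[all.zero != 0]   # empty vector: nothing is non-zero
first <- non.zero[1]                  # indexing past the end yields NA
is.first.one <- first == 1            # NA == 1 is NA, not FALSE

flags <- c(TRUE, NA, FALSE, TRUE)
kept <- which(flags)                  # positions of TRUE only; NA is dropped
```

Indexing a data frame directly with a logical vector that contains NA would instead return a row filled with NA, which is why which() (or an explicit is.na() fix) is needed.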

Related

How to randomly replace a value

I have a vector in which I want to randomly replace every 2 with 0 or 1, where 1 is drawn with probability 0.4. I used the code below. I expected each replaced 2 to get its own value (0 or 1), but instead all the 2s are replaced by the same value, either all 0 or all 1.
vec<-c(rep(2,18),1,0)
ifelse(vec == 2, rbinom(1, 1, 0.40), vec)
Here is one output:
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
and another output
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
If you inspect the source code of ifelse by typing View(ifelse), you will see this piece of code:
len <- length(ans)
ypos <- which(test)
npos <- which(!test)
if (length(ypos) > 0L)
ans[ypos] <- rep(yes, length.out = len)[ypos]
if (length(npos) > 0L)
ans[npos] <- rep(no, length.out = len)[npos]
ans
That means that when yes or no is a single value, ifelse repeats that value len times and places it at the corresponding logical positions.
In your case, rbinom(1,1,0.40) produces just one value for yes, so that single realization is repeated for every 2.
One workaround is shown below:
> ifelse(vec == 2, rbinom(sum(vec == 2), 1, 0.40), vec)
[1] 1 1 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 0
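The recycling described above can be seen with a tiny example. The sketch below is illustrative (the names test, recycled and out are not from the original posts). It also draws length(vec) values instead of sum(vec == 2): because ifelse picks yes values by position after recycling, a shorter yes vector may be reused at more than one position when the 2s are not all at the front of the vector.

```r
test <- c(TRUE, FALSE, TRUE)
# yes = c(10, 20) is recycled to c(10, 20, 10); positions 1 and 3 are taken,
# so the value 20 is never used
recycled <- ifelse(test, c(10, 20), 0)

set.seed(1)  # only to make the sketch reproducible
vec <- c(rep(2, 18), 1, 0)
# one independent draw per position; only those where vec == 2 are used
out <- ifelse(vec == 2, rbinom(length(vec), 1, 0.40), vec)
```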
This replaces all 2 values with either 0 or 1:
vec[vec == 2] <- rbinom(sum(vec == 2), 1, prob = .4)
If you draw a 0 and want the value to remain 2, you could use sample, which is equivalent to a binomial draw:
vec[vec == 2] <- sample(c(1, 2), sum(vec == 2), prob = c(0.4, 0.6), replace = TRUE)
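Try the following code, which redraws every second element of vec (positions 2, 4, ...) rather than every element equal to 2: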
#Code
vec<-c(rep(2,18),1,0)
vec2 <- unlist(lapply(seq(2,length(vec),by=2), function(x) {vec[x] <- rbinom(1,1,0.40)}))
vec[seq(2,length(vec),by=2)] <-vec2
Output:
vec
[1] 2 0 2 0 2 1 2 0 2 0 2 0 2 1 2 0 2 0 1 1

Create a new variable based on any 2 conditions being true

I have a data frame in R with 4 variables and would like to create a new variable based on any 2 conditions being true on those variables.
I have attempted to create it via if/else statements; however, that would require spelling out every combination of conditions being true. I would also need to scale this so that I can create a new variable based on any 3 conditions being true. Is there a more efficient method than using if/else statements?
My example:
I have a dataframe X with following column variables
x1 = c(1,0,1,0)
X2 = c(0,0,0,0)
X3 = c(1,1,0,0)
X4 = c(0,0,1,0)
I would like to create a new variable X5 that is 1 if any 2 of the variables are true (e.g. == 1).
For the data frame above, the new variable X5 would be (1,0,1,0).
This can easily be done by using the apply function:
x1 = c(1,0,1,0)
x2 = c(0,0,0,0)
x3 = c(1,1,0,0)
x4 = c(0,0,1,0)
df <- data.frame(x1,x2,x3,x4)
df$x5 <- apply(df,1,function(row) ifelse(sum(row != 0) == 2, 1, 0))
x1 x2 x3 x4 x5
1 1 0 1 0 1
2 0 0 1 0 0
3 1 0 0 1 1
4 0 0 0 0 0
The second argument of apply, 1, means that the function is applied to every row. To scale this up to 3...N true values, just change the number in the ifelse statement.
You can try this:
#Data
df <- data.frame(x1,X2,X3,X4)
#Code
df$X5 <- ifelse(rowSums(df,na.rm=T)==2,1,0)
x1 X2 X3 X4 X5
1 1 0 1 0 1
2 0 0 1 0 0
3 1 0 0 1 1
4 0 0 0 0 0
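To make the threshold easy to change, the rowSums idea can be wrapped in a small function. This is a minimal sketch; flag_rows, its at.least argument, and the column x6 are hypothetical names introduced here for illustration, not part of the original answers.

```r
x1 <- c(1, 0, 1, 0)
x2 <- c(0, 0, 0, 0)
x3 <- c(1, 1, 0, 0)
x4 <- c(0, 0, 1, 0)
df <- data.frame(x1, x2, x3, x4)

# hypothetical helper: flag rows where exactly (or at least) k columns equal 1
flag_rows <- function(data, k, at.least = FALSE) {
  n.true <- rowSums(data == 1, na.rm = TRUE)
  as.integer(if (at.least) n.true >= k else n.true == k)
}

# restrict to the original columns so earlier results are not counted
cols <- c("x1", "x2", "x3", "x4")
df$x5 <- flag_rows(df[cols], k = 2)                   # exactly two are 1
df$x6 <- flag_rows(df[cols], k = 1, at.least = TRUE)  # one or more are 1
```

Selecting the input columns explicitly matters once the data frame already holds a result column, since rowSums over the whole data frame would count that column too.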
You can use:
df$X5 <- 1*(apply(df == 1, 1, sum) == 2)
or
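# pass the columns as parallel arguments so that sum runs row by row;
# mapply(sum, df) would sum each column instead
df$X5 <- 1*(mapply(sum, df$X1, df$X2, df$X3, df$X4) == 2)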
Output
> df
X1 X2 X3 X4 X5
1 0 1 0 1
0 0 1 0 0
1 0 0 1 1
0 0 0 0 0
Data
df <- data.frame(X1,X2,X3,X4)

selecting all rows which has value > 1 in r

I have a data set and want to select all rows which have a value > 1 in R.
I tried
sel <- apply(data[,collist],1,function(row) "1" %in% row)
but it is not working and gives me the whole data frame.
How can I subset these data?
Thanks
The Note at the end shows the data used in the examples below. I have changed the headings as shown since the ones provided in the question are unwieldy and have removed the column of minus signs.
1) Using that data, the correct answer to the question of selecting all rows with a 1 in any column is that only the first two data rows are selected and that is, in fact, what happens:
subset(data, A == 1 | B == 1 | C == 1)
## Sym A B C
## 1 ACAP3 0 0 1
## 2 ACTRT2 0 0 1
2) This version does not make use of the headings:
has1 <- rowSums(data == 1) > 0
data[has1, ]
## Sym A B C
## 1 ACAP3 0 0 1
## 2 ACTRT2 0 0 1
3) Although the above should work it would be a bit safer to just check the numeric columns which for this data can be done like this:
has1 <- rowSums(data[-1] == 1) > 0
data[has1, ]
## Sym A B C
## 1 ACAP3 0 0 1
## 2 ACTRT2 0 0 1
4) or if we did not know which columns were numeric:
is.num <- sapply(data, is.numeric)
has1 <- rowSums(data[is.num] == 1) > 0
data[has1, ]
## Sym A B C
## 1 ACAP3 0 0 1
## 2 ACTRT2 0 0 1
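The title asks for values > 1 while the approaches above test for == 1. A sketch that takes the comparison as an argument covers both readings. The helper rows_where_any and its pred argument are hypothetical names, and the data frame is rebuilt by hand from the values shown in this answer.

```r
data <- data.frame(
  Sym = c("ACAP3", "ACTRT2", "AGRN", "ANKRD65", "ATAD3A"),
  A = c(0, 0, 0, 0, 0),
  B = c(0, 0, 0, 0, 0),
  C = c(1, 1, 0, 0, 0)
)

# hypothetical helper: keep rows where any numeric column satisfies pred
rows_where_any <- function(data, pred) {
  is.num <- vapply(data, is.numeric, logical(1))
  hit <- rowSums(pred(data[is.num]), na.rm = TRUE) > 0
  data[hit, , drop = FALSE]
}

equal.one <- rows_where_any(data, function(x) x == 1)  # rows containing a 1
above.one <- rows_where_any(data, function(x) x > 1)   # rows with a value > 1
```

With this data, the > 1 reading selects no rows at all, since no value exceeds 1.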
Note
As the question did not provide input in reproducible form, the input shown in such form is assumed to be:
Lines <- 'Hugo_Symbol "A - 3 A- A9J" "B - F2 - 7273 - 01" "C - FB - AAPP - 01"
ACAP3 0 0 - 1
ACTRT2 0 0 - 1
AGRN 0 0 - 0
ANKRD65 0 0 - 0
ATAD3A 0 0 - 0
'
data <- read.table(text = Lines, skip = 1, col.names = c("Sym", "A", "B", "X", "C"),
colClasses = c(NA, NA, NA, "NULL", NA))
The above produces this:
data
## Sym A B C
## 1 ACAP3 0 0 1
## 2 ACTRT2 0 0 1
## 3 AGRN 0 0 0
## 4 ANKRD65 0 0 0
## 5 ATAD3A 0 0 0

Assign value to new column if another column has exact string

I have a df that looks like this.
Date Winner
4/12 Tom
4/13 Abe
4/14 George
4/15 Tom
I would like to add new columns that assign a 1 if the name appears in the Winner column and a 0 if it does not, and vice versa for the lose columns. Ideally the df would look like this as a result:
Date Winner Tom_Win Tom_Lose Abe_Win Abe_Lose George_Win George_Lose
4/12 Tom 1 0 0 1 0 1
4/13 Abe 0 1 1 0 0 1
4/14 George 0 1 0 1 1 0
4/15 Tom 1 0 0 1 0 1
Is there an easy way to accomplish this?
This is simple to do with the model.matrix function. It creates N dummy columns holding 0 when the name does not appear and 1 when it does (exactly as you requested). The code is below (assuming your data is called db):
> winners <- model.matrix(~Winner - 1, data=db)
> winners
WinnerAbe WinnerGeorge WinnerTom
1 0 0 1
2 1 0 0
3 0 1 0
4 0 0 1
This bit is to compute the columns with the losing values
winners <- as.data.frame(winners)
winners$loserAbe <- as.numeric(!winners$WinnerAbe) #naturally you have to
#do this for every column you need
WinnerAbe WinnerGeorge WinnerTom loserAbe
1 0 0 1 1
2 1 0 0 0
3 0 1 0 1
4 0 0 1 1
winners$Date <- db$Date #this last bit so you don't lose the date.
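Rather than writing one loser line per player, all the losing columns can be derived at once as the complement of the dummy matrix. This is a minimal sketch under the assumption that db has the Date and Winner columns from the question; the names win, lose and result are illustrative.

```r
db <- data.frame(
  Date = c("4/12", "4/13", "4/14", "4/15"),
  Winner = c("Tom", "Abe", "George", "Tom")
)

# one dummy column per player, named WinnerAbe, WinnerGeorge, WinnerTom
win <- model.matrix(~ Winner - 1, data = db)
colnames(win) <- paste0(sub("^Winner", "", colnames(win)), "_Win")

# a player loses exactly when that player does not win
lose <- 1 - win
colnames(lose) <- sub("_Win$", "_Lose", colnames(win))

result <- cbind(db, win, lose)
result
```

This keeps Date and Winner alongside the new columns, so no separate step is needed to restore the date.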
Using mtabulate from the qdapTools package, we can do the following three steps:
library(qdapTools)
d1 <- mtabulate(d3$Winner)
d2 <- setNames(data.frame(sapply(d1, function(i) ifelse(i == 1, 0, 1))),
paste0(names(d1), '_Lose'))
cbind(d3$Date, d1, d2)
# d3$Date Abe George Tom Abe_Lose George_Lose Tom_Lose
#1 4/12 0 0 1 1 1 0
#2 4/13 1 0 0 0 1 1
#3 4/14 0 1 0 1 0 1
#4 4/15 0 0 1 1 1 0
DATA
str(d3)
'data.frame': 4 obs. of 2 variables:
$ Date : Factor w/ 4 levels "4/12","4/13",..: 1 2 3 4
$ Winner: Factor w/ 3 levels "Abe","George",..: 3 1 2 3
I'm sure there is a better way than this but this works in base R and it's fairly simple:
If your data looks like this:
df <- data.frame(Date = c("4/12","4/13","4/14","4/15"),Winner = c("Tom","Abe","George","Tom"))
Append the extra columns like so:
xcols <- c(paste0(unique(df$Winner), '_Win'), paste0(unique(df$Winner), '_Lose'))
df[ , xcols] <- 0
Now make a character vector with instructions to give the points for every player.
evl <- unlist(lapply(unique(df$Winner), function(x){paste0('df[', which(df$Winner == x), ',', which(names(df) == paste0(x, '_Win')), '] <- 1')}))
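And execute the code:
eval(parse(text = evl))
The step above only fills the _Win columns, so set each _Lose column to the complement of the matching _Win column:
df[ , grep('_Lose$', names(df))] <- 1 - df[ , grep('_Win$', names(df))]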
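Another base R option builds a win and a lose column for each level of Winner:
df <- data.frame(
Date = c("4/12", "4/13","4/14", "4/15"),
Winner = c("Tom", "Abe", "George", "Tom"),
stringsAsFactors = TRUE # levels() below needs a factor; since R 4.0.0 strings are no longer converted by default
)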
df2 <- do.call(cbind,
lapply(seq_along(levels(df$Winner)), function(x) {
win <- ifelse(df$Winner == levels(df$Winner)[x], 1, 0)
lose <- ifelse(df$Winner == levels(df$Winner)[x], 0, 1)
dat <- cbind(win, lose)
colnames(dat) <- c(paste(levels(df$Winner)[x], "win", sep = "_"), paste(levels(df$Winner)[x], "lose", sep = "_"))
dat
})
)
> cbind(df, df2)
Date Winner Abe_win Abe_lose George_win George_lose Tom_win Tom_lose
1 4/12 Tom 0 1 0 1 1 0
2 4/13 Abe 1 0 0 1 0 1
3 4/14 George 0 1 1 0 0 1
4 4/15 Tom 0 1 0 1 1 0

Split vector into contiguous runs of equal values

I have a data table and one of the columns is a bunch of 0's and 1's, just like vec below.
vec = c(rep(1, times = 6), rep(0, times = 10), rep(1, times = 11), rep(0, times = 4))
> vec
[1] 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
What I want to do is to split the data every time there's a change in that column from 0 to 1 or vice versa. Here is what I have done so far:
b = c(vec[1],diff(vec))
rowby = numeric(0)
for (i in 2:(length(b))) {
if (b[i] != 0) {
rowby <- c(rowby, i-1)
}
}
splitted_data <- split(vec, cumsum(c(TRUE,(1:length(vec) %in% rowby)[-length(vec)])))
There must be something right under my nose that I can't see. What is a correct way to do this? My code works for the example above, but not in general.
Try
split(vec,cumsum(c(1, abs(diff(vec)))))
#$`1`
#[1] 1 1 1 1 1 1
#$`2`
#[1] 0 0 0 0 0 0 0 0 0 0
#$`3`
#[1] 1 1 1 1 1 1 1 1 1 1 1
#$`4`
#[1] 0 0 0 0
Or use rle
split(vec,inverse.rle(within.list(rle(vec), values <- seq_along(values))))
With current versions of data.table, rleid is one function which can be used for this job:
library(data.table)#v1.9.5+
split(vec,rleid(vec))
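The cumsum(abs(diff(vec))) idea relies on numeric input. A minimal base R sketch of the same idea that compares neighbouring elements directly also works for character vectors. split_runs is a hypothetical name, and the sketch assumes a non-empty vector without NA values.

```r
# hypothetical helper: split x into runs of equal neighbouring values
split_runs <- function(x) {
  stopifnot(length(x) > 0, !anyNA(x))
  starts <- c(TRUE, x[-1] != x[-length(x)])  # TRUE where a new run begins
  split(x, cumsum(starts))                   # run number is the grouping key
}

vec <- c(rep(1, times = 6), rep(0, times = 10), rep(1, times = 11), rep(0, times = 4))
runs <- split_runs(vec)
letter.runs <- split_runs(c("a", "a", "b", "c", "c", "c"))
```

The run lengths produced this way agree with rle(vec)$lengths, which gives a quick sanity check.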
