Find rows after certain value in same column - r

I have a column in my data frame containing ascending numbers that are interrupted by zeros.
I would like to find the last nonzero row before each run of zeros and create a new data frame containing only these rows.
My Column: 1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0
What I need: 4, 6
Any help would be much appreciated! Thanks!

A dplyr solution:
library(dplyr)
df %>%
filter(lead(x) == 0, x != 0)
#> x
#> 1 4
#> 2 6
Created on 2021-07-08 by the reprex package (v2.0.0)
data
df <- data.frame(x = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0))

Welcome to SO!
You can try with base R. The idea is to fetch the rownames of the rows before the 0 and subset() the df by them:
# your data
df <- data.frame(col = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0))
# row names of the rows immediately before each 0
index <- as.numeric(rownames(df)[df$col == 0]) - 1
# subset the original df by index; the col != 0 condition drops a 0 that precedes another 0
df_ <- subset(df, rownames(df) %in% index & col != 0)
df_
col
4 4
12 6
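The row-name lookup above assumes the default row names 1, 2, 3, and so on. As a minimal sketch of the same idea that works on positions instead (the names zero_pos, prev_pos and result are illustrative, not from the answer):

```r
# Same data as in the answer above
df <- data.frame(col = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0))

# Positions of the zeros, then the positions immediately before them
zero_pos <- which(df$col == 0)
prev_pos <- zero_pos - 1

# Drop position 0 (a zero in the first row has no predecessor)
prev_pos <- prev_pos[prev_pos >= 1]

# Drop predecessors that are themselves zeros (the first 0 of "0, 0")
prev_pos <- prev_pos[df$col[prev_pos] != 0]

# drop = FALSE keeps a data frame even though there is only one column
result <- df[prev_pos, , drop = FALSE]
result
```

Because it indexes by position, this variant should still behave if the data frame was filtered or reordered earlier and no longer has consecutive row names.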

Using base R:
df <- data.frame(x = c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0),
y = LETTERS[1:13])
# diff() is one element shorter than x, so pad with FALSE for the last row
df[c(diff(df$x) < 0, FALSE), ]
x y
4 4 D
12 6 L

Using run lengths in base R. The cumulative sum of the run lengths gives the end position of each run in x; the value wanted is the one ending the run just before each run of 0s.
x <- c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0)
y <- rle(x)
x[cumsum(y$lengths)][which(y$values == 0) - 1]
# [1] 4 6
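To make the intermediate steps visible, here is a hedged sketch that splits the one-liner into named pieces and returns positions as well as values. The variable names (runs, run_ends, zero_runs, before_zero) are introduced here for illustration only.

```r
x <- c(1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0)

# Run-length encode: consecutive equal values collapse into one run
runs <- rle(x)

# Position in x where each run ends
run_ends <- cumsum(runs$lengths)

# Runs made of zeros; skip a zero run at the very start (nothing precedes it)
zero_runs <- which(runs$values == 0)
zero_runs <- zero_runs[zero_runs > 1]

# End position of the run just before each zero run
before_zero <- run_ends[zero_runs - 1]

before_zero     # positions in x
x[before_zero]  # values at those positions
```

The positions can be used directly to subset a data frame, for example df[before_zero, ], if x is one of its columns.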

Related

Plot histogram for each group dplyr

Let's consider a very simple data frame containing four groups:
cat <- c(1, 0, 0, 1, 2, 1, 2, 3, 2, 1, 3)
var <- c(10, 5, 3, 2, 5, 1, 2, 10, 50, 2, 30)
df <- data.frame(cat, var)
What I would like to do is use dplyr to plot the distribution of var within each of those four categories.
I have the feeling that it can easily be done with group_by, but I'm not sure how. Do you know how I can do it?

split vector after all predefined set of elements occurred

I have to do the following:
I have a vector, let us say
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
I have to subset the remainder of a vector after 1, 2, 3, 4 occurred at least once.
So the subset new vector would only include 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1.
I need a relatively easy solution on how to do this. It might be possible to do an if and while loop with breaks, but I am kinda struggling to come up with a solution.
Is there a simple (even mathematical way) to do this in R?
Use sapply to find where each predefined number occurs for the first time, then drop everything up to and including the latest of those positions.
x[-seq(max(sapply(1:4, function(y) which(x == y)[1])))]
# [1] 4 5 5 3 2 11 1 3 3 4 1
Data
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
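The same idea can be wrapped in a small helper. This is only a sketch: the function name drop_until_all_seen and its targets argument are hypothetical, and it uses match(), which returns the first position of each target, in place of the sapply/which combination. Returning an empty vector when a target never occurs is an assumption about the desired behaviour.

```r
# Hypothetical helper: keep only the part of x after every value in
# `targets` has been seen at least once.
drop_until_all_seen <- function(x, targets) {
  # match() gives the position of the first occurrence of each target
  first_pos <- match(targets, x)

  # Assumption: if some target never occurs, nothing qualifies
  if (anyNA(first_pos)) {
    return(x[0])
  }

  # Drop everything up to and including the latest first occurrence
  x[-seq_len(max(first_pos))]
}

x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
drop_until_all_seen(x, 1:4)
```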
You can use run length encoding for this
x = c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
encoded = rle(x)
# Pick the first location of 1, 2, 3, and 4
# Then find the max index location
indices = c(which(encoded$values == 1)[1],
which(encoded$values == 2)[1],
which(encoded$values == 3)[1],
which(encoded$values == 4)[1])
index = max(indices)
# Find the index of x corresponding to your split location
reqd_index = cumsum(encoded$lengths)[index-1] + 2
# Print final split value
x[reqd_index:length(x)]
The result is as follows
> x[reqd_index:length(x)]
[1] 4 5 5 3 2 11 1 3 3 4 1

if else in a loop in R

I want to create a variable region based on a series of similar variables zipid1 to zipid26. My current code is like this:
dat$region <- with(dat, ifelse(zipid1 == 1, 1,
ifelse(zipid2 == 1, 2,
ifelse(zipid3 == 1, 3,
ifelse(zipid4 == 1, 4,
5)))))
How can I write a loop to avoid typing from zipid1 to zipid26? Thanks!
We subset the 'zipid' columns, create a logical matrix by comparing them with 1 (== 1), and get the column index of the first TRUE value in each row with max.col (assuming there is exactly one 1 per row). The result is assigned to 'region'.
dat$region <- max.col(dat[paste0("zipid", 1:26)] == 1, "first")
Using a small reproducible example
max.col(dat[paste0("zipid", 1:5)] == 1, "first")
data
dat <- data.frame(id = 1:5, zipid1 = c(1, 3, 2, 4, 5),
zipid2 = c(2, 1, 3, 5, 4), zipid3 = c(3, 2, 1, 5, 4),
zipid4 = c(4, 3, 6, 2, 1), zipid5 = c(5, 3, 8, 1, 4))
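As a hedged sketch of what happens on the small example, and of one way to guard the "exactly one 1 per row" assumption: rows without any 1 would otherwise silently get column 1 from max.col. The names hits and region are illustrative, and marking such rows as NA is an assumption, not part of the answer (the question's own code used 5 as the fallback).

```r
dat <- data.frame(id = 1:5, zipid1 = c(1, 3, 2, 4, 5),
                  zipid2 = c(2, 1, 3, 5, 4), zipid3 = c(3, 2, 1, 5, 4),
                  zipid4 = c(4, 3, 6, 2, 1), zipid5 = c(5, 3, 8, 1, 4))

# Logical matrix: TRUE where a zipid column equals 1
hits <- dat[paste0("zipid", 1:5)] == 1

# Column index of the first TRUE in each row
region <- max.col(hits, ties.method = "first")

# Assumption: rows with no 1 at all should not be assigned a region
region[rowSums(hits) == 0] <- NA

region
# [1] 1 2 3 5 4
```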

dplyr sample by groups of values

I want to make samples based on grouped values with dplyr :
What I tried :
id <- c(1, 1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 8, 8, 8, 8)
id <- as.data.frame(id)
sample <- id %>%
group_by(id) %>%
sample_n(2, replace = FALSE) %>%
ungroup(id)
sample
Expected result ( n sample =2) :
1, 1, 1, 2
or
1, 1, 1, 3, 3
or
5, 5, 5, 6, 6
etc.
I have got an error:
Error: `size` must be less or equal than 1 (size of data), set `replace` = TRUE to use sampling with replacement
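Perhaps this helps: sample from the distinct ids first, then join back to keep every row belonging to the sampled ids.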
id %>%
distinct(id) %>%
sample_n(2, replace = FALSE) %>%
inner_join(id, .)
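For comparison, a minimal base R sketch of the same two-step idea, without dplyr. The names picked and sampled are illustrative; set.seed is only there to make the draw repeatable.

```r
set.seed(1)  # only for reproducibility of this sketch

id <- data.frame(id = c(1, 1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 8, 8, 8, 8))

# Step 1: sample two of the distinct group values
picked <- sample(unique(id$id), size = 2)

# Step 2: keep every row whose id is one of the sampled values
sampled <- id[id$id %in% picked, , drop = FALSE]
sampled
```

Sampling the group labels and then filtering avoids the original error, which came from asking sample_n for two rows inside groups that contain only one row.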

merge table in R

I have the 2 tables as below
subj <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
gamble <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
ev <- c(4, 5, 6, 4, 5, 6, 4, 5, 6)
table1 <- data.frame(subj, gamble, ev)
subj2 <- c(1, 2, 3)
gamble2 <- c(1, 3, 2)
table2 <- data.frame(subj2, gamble2)
I want to merge the two tables by subject and gamble, keeping from table 1 only the rows whose gamble matches the gamble listed for that subject in table 2. The expected output is as follows:
subj gamble ev
1 1 4
2 3 6
3 2 5
You are looking for merge
merge(table1, table2, by.x=c("subj", "gamble"), by.y=c("subj2", "gamble2"), all=FALSE, sort=TRUE)
edited as per Ananda's helpful observation
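A self-contained sketch of that merge on the question's data, relying on the defaults all = FALSE and sort = TRUE (the name merged is illustrative):

```r
subj <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
gamble <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
ev <- c(4, 5, 6, 4, 5, 6, 4, 5, 6)
table1 <- data.frame(subj, gamble, ev)

subj2 <- c(1, 2, 3)
gamble2 <- c(1, 3, 2)
table2 <- data.frame(subj2, gamble2)

# Inner join on the (subject, gamble) pair; the key columns have
# different names in the two tables, hence by.x and by.y
merged <- merge(table1, table2,
                by.x = c("subj", "gamble"),
                by.y = c("subj2", "gamble2"))
merged
#   subj gamble ev
# 1    1      1  4
# 2    2      3  6
# 3    3      2  5
```

Matching on both columns is what restricts each subject to its own gamble; merging on gamble alone would pair every subject in table 1 with every matching gamble in table 2.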
