R: row-wise checking for multiple values

I have a dataset that looks like this
With further rows below. I want to add a column on the right that is 1 if the row contains a certain value I am checking for, and 0 otherwise.
For a single value I have the following code -
set.seed(4991)
my_data <- data.frame(ceiling(matrix(runif(100,4,10),ncol = 5)))
comval <- c(5)
my_data$bleh <- as.integer(apply(my_data, 1, function(r) any(comval %in% r)))
The output looks like this -
Which is what I want. The issue is that if I have two or more values in 'comval', for instance,
comval<-c(5,10)
I am getting 1 in the 'bleh' column for all rows that have either 5 or 10. The output is like -
It behaves like a logical OR. I need it to work as a logical AND, that is, the 'bleh' column should be 1 only if all the values in 'comval' are present in the row.
Also, I am trying to write a function here, so I need to take length(comval) as an input and then check all the values in 'comval' against each row.

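You could check whether the length of the intersect is greater than or equal to 1. Note that a threshold of 1 reproduces the OR behaviour (any value present), as the output below shows; for the AND behaviour, compare against length(unique(comval)) instead.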
my_data$bleh <- as.integer(apply(my_data, 1, function(r) {
length(intersect(comval, unlist(r))) >= 1
}))
# X1 X2 X3 X4 X5 bleh
# 1 5 10 5 6 10 1
# 2 9 9 5 8 6 1
# 3 5 10 5 5 5 1
# 4 10 8 6 5 8 1
# 5 8 6 7 9 10 1
# 6 5 10 8 10 8 1
# 7 9 8 10 5 7 1
# 8 6 8 10 6 7 1
# 9 5 5 6 6 8 1
# 10 10 5 8 6 8 1
# 11 9 10 10 7 7 1
# 12 6 8 7 10 8 1
# 13 6 9 7 6 9 0
# 14 8 6 6 10 7 1
# 15 9 9 5 7 7 1
# 16 10 9 9 10 6 1
# 17 7 10 5 10 8 1
# 18 9 8 10 9 9 1
# 19 10 8 9 6 8 1
# 20 5 8 6 7 5 1
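For the AND behaviour asked about in the question, one possible sketch is to require that every element of comval appears in the row, which all(comval %in% r) expresses directly. The helper name has_all_values and the small toy data frame are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper: 1 if a row contains every value in comval, else 0.
has_all_values <- function(data, comval) {
  # as.matrix() hands apply() a plain numeric matrix to iterate over by row
  as.integer(apply(as.matrix(data), 1, function(r) all(comval %in% r)))
}

# Toy data (illustrative only, not the data from the question)
toy <- data.frame(X1 = c(5, 9, 10), X2 = c(10, 9, 8), X3 = c(6, 5, 5))

# Rows 1 and 3 contain both 5 and 10; row 2 contains only 5
toy$bleh <- has_all_values(toy[, c("X1", "X2", "X3")], comval = c(5, 10))
toy$bleh
```

Because the check is written against the whole comval vector, the function works for any number of values without passing length(comval) separately.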

Related

How to replace NA values in one column of a data frame, with values from a column in a different data frame?

How do I replace the NA values in 'example' with the corresponding values in 'example 2'? So 7 would take the place of the first NA and 8 would take the place of the second NA etc. My data is much larger so I would not be able to rename the values individually for the multiple NAs. Thanks
example <- data.frame('count' = c(1,3,4,NA,8,NA,9,0,NA,NA,7,5,8,NA))
example2 <- data.frame('count' = c(7,8,4,6,7))
Another possible solution, based on replace:
example$count <- replace(example$count, is.na(example$count), example2$count)
example
#> count
#> 1 1
#> 2 3
#> 3 4
#> 4 7
#> 5 8
#> 6 8
#> 7 9
#> 8 0
#> 9 4
#> 10 6
#> 11 7
#> 12 5
#> 13 8
#> 14 7
You can try:
example[is.na(example),] <- example2
Which will give you:
count
1 1
2 3
3 4
4 7
5 8
6 8
7 9
8 0
9 4
10 6
11 7
12 5
13 8
14 7
EDIT: Since you probably have more than one column in your data frames, you should use:
example$count[is.na(example$count)] <- example2$count
Another option using which to check the index of NA values:
ind <- which(is.na(example$count))
example[ind, "count"] <- example2$count
Output:
count
1 1
2 3
3 4
4 7
5 8
6 8
7 9
8 0
9 4
10 6
11 7
12 5
13 8
14 7
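All of the approaches above rely on the number of NA values matching the number of replacement values (five each in this example). A minimal sketch that makes this assumption explicit is shown here; the helper name fill_na is hypothetical.

```r
# Hypothetical helper: fill the NAs in x, in order, with the values in fill.
fill_na <- function(x, fill) {
  na_pos <- which(is.na(x))
  # Stop early if the counts differ, rather than letting R recycle values
  stopifnot(length(na_pos) == length(fill))
  x[na_pos] <- fill
  x
}

example  <- data.frame(count = c(1, 3, 4, NA, 8, NA, 9, 0, NA, NA, 7, 5, 8, NA))
example2 <- data.frame(count = c(7, 8, 4, 6, 7))

example$count <- fill_na(example$count, example2$count)
example$count
```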

How to find closest match from list in R

I have a list of numbers and would like to find which is the next highest compared to each number in a data.frame. I have:
list <- c(3,6,9,12)
X <- c(1:10)
df <- data.frame(X)
And I would like to add a variable to df being the next highest number in the list. i.e:
X Y
1 3
2 3
3 3
4 6
5 6
6 6
7 9
8 9
9 9
10 12
I've tried:
df$Y <- which.min(abs(list-df$X))
but that gives an error message and would just get the closest value from the list, not the next above.
Another approach is to use findInterval:
df$Y <- list[findInterval(X, list, left.open=TRUE) + 1]
> df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
You could do this...
df$Y <- sapply(df$X, function(x) min(list[list>=x]))
df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
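Neither answer says what should happen when a value is larger than everything in the list. As a sketch under that extra assumption, the findInterval approach can be wrapped so that such values come back as NA (the sapply version would return Inf with a warning in that case). The helper name next_highest is hypothetical.

```r
# Hypothetical helper: smallest element of candidates that is >= each x.
next_highest <- function(x, candidates) {
  candidates <- sort(unique(candidates))
  # left.open = TRUE makes the intervals (a, b], so an x equal to a
  # candidate maps to that candidate rather than to the next one up
  idx <- findInterval(x, candidates, left.open = TRUE) + 1
  # idx is length(candidates) + 1 when x exceeds every candidate,
  # and indexing past the end of a vector yields NA
  candidates[idx]
}

next_highest(1:10, c(3, 6, 9, 12))
next_highest(13, c(3, 6, 9, 12))
```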

How do I select rows in a data frame before and after a condition is met?

I've been searching the web for a few days now and I can't find a solution to my (probably easy to solve) problem.
I have huge data frames with 4 variables and over a million observations each. Now I want to select the 100 rows before a specific condition is met, all rows while it is met, and the 1000 rows after it, and fill the rest with NAs. I tried it with a for loop and if/ifelse, but it doesn't work so far. I think it shouldn't be a big thing, but at the moment I just can't get the hang of it.
I create the data using:
foo<-data.frame(t = 1:15, a = sample(1:15), b = c(1,1,1,1,1,4,4,4,4,1,1,1,1,1,1), c = sample(1:15))
My Data looks like this:
ID t a b c
1 1 4 1 7
2 2 7 1 10
3 3 10 1 6
4 4 2 1 4
5 5 13 1 9
6 6 15 4 3
7 7 8 4 15
8 8 3 4 1
9 9 9 4 2
10 10 14 1 8
11 11 5 1 11
12 12 11 1 13
13 13 12 1 5
14 14 6 1 14
15 15 1 1 12
What I want is to pick the value of a (in this example) 2 rows before, all rows while and 3 rows after the value of b is >1 and fill the rest with NA's. [Because this is just an example I guess you can imagine that after these 15 rows there are more rows with the value for b changing from 1 to 4 several times (I did not post it, so I won't spam the question with unnecessary data).]
So I want to get something like:
ID t a b c d
1 1 4 1 7 NA
2 2 7 1 10 NA
3 3 10 1 6 NA
4 4 2 1 4 2
5 5 13 1 9 13
6 6 15 4 3 15
7 7 8 4 15 8
8 8 3 4 1 3
9 9 9 4 2 9
10 10 14 1 8 14
11 11 5 1 11 5
12 12 11 1 13 11
13 13 12 1 5 NA
14 14 6 1 14 NA
15 15 1 1 12 NA
I'm thankful for any help.
Thank you.
Best regards,
Chris
Here is the same attempt as missuse's, but with data.table:
library(data.table)
foo<-data.frame(t = 1:11, a = sample(1:11), b = c(1,1,1,4,4,4,4,1,1,1,1), c = sample(1:11))
DT <- setDT(foo)
DT[ unique(c(DT[,.I[b>1] ],DT[,.I[b>1]+3 ],DT[,.I[b>1]-2 ])), d := a]
t a b c d
1: 1 10 1 2 NA
2: 2 6 1 10 6
3: 3 5 1 7 5
4: 4 11 4 4 11
5: 5 4 4 9 4
6: 6 8 4 5 8
7: 7 2 4 8 2
8: 8 3 1 3 3
9: 9 7 1 6 7
10: 10 9 1 1 9
11: 11 1 1 11 NA
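Here
unique(c(DT[,.I[b>1] ],DT[,.I[b>1]+3 ],DT[,.I[b>1]-2 ]))
gives you your desired indices: the unique row indices where the condition holds, plus the same indices + 3 and - 2. Note that this only fills the whole window when each run of b > 1 is at least three rows long, and the shifted indices are not clamped to the range of valid rows.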
Here is an attempt.
Get the indexes that satisfy the condition b > 1:
z <- which(foo$b > 1)
Get the indexes for (z - 2) : (z + 3), clamped to the rows that exist:
ind <- unique(unlist(lapply(z, function(x){
g <- pmax(x - 2, 1) # if x - 2 is less than 1
h <- pmin(x + 3, nrow(foo)) # if x + 3 is past the last row
g : h
})))
Create the d column filled with NA:
foo$d <- NA
Replace the elements at those indexes with foo$a:
foo$d[ind] <- foo$a[ind]
library(dplyr)
library(purrr)
# example dataset
foo<-data.frame(t = 1:15,
a = sample(1:15),
b = c(1,1,1,1,1,4,4,4,4,1,1,1,1,1,1),
c = sample(1:15))
# function to get indices of interest
# for a given index x go 2 positions back and 3 forward
# keep only positive indices
GetIDsBeforeAfter = function(x) {
v = (x-2) : (x+3)
v[v > 0]
}
foo %>% # from your dataset
filter(b > 1) %>% # keep rows where b > 1
pull(t) %>% # get the positions
map(GetIDsBeforeAfter) %>% # for each position apply the function
unlist() %>% # unlist all sets indices
unique() -> ids_to_remain # keep unique ones and save them in a vector
foo$d = foo$a # copy column a as d (the question asks for the values of a)
foo$d[-ids_to_remain] = NA # put NA to all positions not in our vector
foo
# t a b c d
# 1 1 5 1 8 NA
# 2 2 6 1 14 NA
# 3 3 4 1 10 NA
# 4 4 1 1 7 1
# 5 5 10 1 5 10
# 6 6 8 4 9 8
# 7 7 9 4 15 9
# 8 8 3 4 6 3
# 9 9 7 4 2 7
# 10 10 12 1 3 12
# 11 11 11 1 1 11
# 12 12 15 1 4 15
# 13 13 14 1 11 NA
# 14 14 13 1 13 NA
# 15 15 2 1 12 NA
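With over a million rows, it may help to build the window indices without looping over each matching row. A possible vectorised sketch uses outer() to add every offset to every matching row index and then clamps the result to valid rows. The helper name window_index is hypothetical, and a fixed vector stands in for the random column a so the result is reproducible.

```r
# Hypothetical helper: row indices lying within `before` rows ahead of,
# or `after` rows behind, any row where cond is TRUE.
window_index <- function(cond, before, after) {
  hits <- which(cond)
  # outer() builds every (hit + offset) combination in one step
  idx <- unique(as.vector(outer(hits, seq(-before, after), `+`)))
  # Drop indices that fall off either end of the data
  sort(idx[idx >= 1 & idx <= length(cond)])
}

b <- c(1, 1, 1, 1, 1, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1)
a <- 101:115  # fixed stand-in for the random column a in the question

keep <- window_index(b > 1, before = 2, after = 3)
d <- rep(NA_integer_, length(a))
d[keep] <- a[keep]
d
```

For the sizes in the question, the same call would use before = 100 and after = 1000.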

Eliminate in an increasing order rows in a data frame

x<-c(4,5,6,23,5,6,7,8,0,3)
y<-c(2,4,5,6,23,5,6,7,8,0)
z<-c(1,2,4,5,6,23,5,6,7,8)
df<-data.frame(x,y,z)
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 23 6 5
5 5 23 6
6 6 5 23
7 7 6 5
8 8 7 6
9 0 8 7
10 3 0 8
I would like to remove the number 23 from every column of the df by position rather than by matching the value 23: the row to remove increases by one for each successive column, starting from its location in x (row 4 in x, row 5 in y, row 6 in z).
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 5 6 5
5 6 5 6
6 7 6 5
7 8 7 6
8 0 8 7
9 3 0 8
Thank you
You can iterate through the columns and remove the element from each, then reassemble as a data frame:
result <- as.data.frame(lapply(1:ncol(df), function(x) df[-(x+3),x]))
names(result) <- names(df)
result
## x y z
## 1 4 2 1
## 2 5 4 2
## 3 6 5 4
## 4 5 6 5
## 5 6 5 6
## 6 7 6 5
## 7 8 7 6
## 8 0 8 7
## 9 3 0 8
df[-(x+3),x] is the column with the value removed, by location. To start with row N in column x you would use df[-(x+N-1),x].
You could also try:
n <- 4
df1 <- df[-n,]
df1[] <- unlist(df,use.names=FALSE)[-seq(n, prod(dim(df)), by=nrow(df)+1)]
df1
# x y z
#1 4 2 1
#2 5 4 2
#3 6 5 4
#5 5 6 5
#6 6 5 6
#7 7 6 5
#8 8 7 6
#9 0 8 7
#10 3 0 8
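The same idea can be wrapped in a small function that takes the starting row as an argument. This is only a sketch: the helper name drop_shifted_rows is hypothetical, and it assumes the shifted row index never runs past the last row of the data frame.

```r
# Hypothetical helper: drop row `start` from column 1, `start + 1` from
# column 2, and so on, then reassemble the shortened columns.
drop_shifted_rows <- function(data, start) {
  rows <- start + seq_along(data) - 1
  # Assumption: every shifted row index must exist in the data
  stopifnot(all(rows <= nrow(data)))
  # Map() pairs each column with its own row index and keeps column names
  out <- Map(function(col, r) col[-r], data, rows)
  as.data.frame(out)
}

df <- data.frame(x = c(4, 5, 6, 23, 5, 6, 7, 8, 0, 3),
                 y = c(2, 4, 5, 6, 23, 5, 6, 7, 8, 0),
                 z = c(1, 2, 4, 5, 6, 23, 5, 6, 7, 8))

result <- drop_shifted_rows(df, start = 4)
result
```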

Remove rows from a single-column data frame

When I try to remove the last row from a single column data frame, I get a vector back instead of a data frame:
> df = data.frame(a=1:10)
> df
a
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
> df[-(length(df[,1])),]
[1] 1 2 3 4 5 6 7 8 9
The behavior I'm looking for is what happens when I use this command on a two-column data frame:
> df = data.frame(a=1:10,b=11:20)
> df
a b
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
> df[-(length(df[,1])),]
a b
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
My code is general, and I don't know a priori whether the data frame will contain one or many columns. Is there an easy workaround for this problem that will let me remove the last row no matter how many columns exist?
Try adding the drop = FALSE option:
R> df[-(length(df[,1])), , drop = FALSE]
a
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
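Since the code in the question is meant to be general, the drop = FALSE fix can be wrapped in a small function. This is a sketch; the helper name drop_last_row is hypothetical, and it uses nrow(data) in place of length(df[,1]), which gives the same row count.

```r
# Hypothetical helper: remove the last row and always return a data frame.
drop_last_row <- function(data) {
  # drop = FALSE stops a one-column result collapsing to a plain vector
  data[-nrow(data), , drop = FALSE]
}

one <- drop_last_row(data.frame(a = 1:10))
two <- drop_last_row(data.frame(a = 1:10, b = 11:20))

class(one)
dim(one)
dim(two)
```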
