From one vector delete all elements of another vector in R [duplicate]

This question already has answers here:
R: Remove the number of occurrences of values in one vector from another vector, but not all
(2 answers)
Closed 6 years ago.
I have 2 vectors
vec_1
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 2 3 4 5 6 7 8 9 10 11 12 13 14 2 3 4 5 6 7 8 9
[35] 10 11 12 13 14 2 3 4 5 6 7 8 9 10 11 12 13 14
vec_2
[1] 12 3 13 3 14 4 10 8 9 5 7 5 13 11 6 10 8 8 14 12 6 11 8 5 3 6
I want to delete all elements of vec_2 from vec_1.
Note that setdiff does not work here because, for example, vec_2 contains two 10s, and I want to delete only two of the 10s from vec_1 (not all four).
EDITED: expected output:
vec_1
[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
How can I do this in R?

Here is one idea via union
unlist(sapply(union(vec_1, vec_2), function(i)
rep(i, each = length(vec_1[vec_1 == i]) - length(vec_2[vec_2 == i]))))
#[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14

This is definitely not the best solution, but here is one way.
I created a simplified example.
vec1 <- c(1, 2, 3, 1, 1, 5)
vec2 <- c(1, 3, 5)
#Converting the frequency table to a data frame
x1 <- data.frame(table(vec1))
x2 <- data.frame(table(vec2))
#Assuming your vec1 has all the elements present in vec2
new_df <- merge(x1, x2, by.x = "vec1", by.y = "vec2", all.x = TRUE)
new_df
# vec1 Freq.x Freq.y
#1 1 3 1
#2 2 1 NA
#3 3 1 1
#4 5 1 1
#Replacing NA's by 0
new_df[is.na(new_df)] <- 0
#Subtracting the frequencies of common elements in two vectors
final <- cbind(new_df[1], new_df[2] - new_df[3])
final
# vec1 Freq.x
#1 1 2
#2 2 1
#3 3 0
#4 5 0
#Recreating a new vector based on the final dataframe
#(vec1 is a factor after table(), so convert it back to numeric first)
rep(as.numeric(as.character(final$vec1)), times = final$Freq.x)
# [1] 1 1 2

You can do this using a simple for loop:
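for (i in seq_along(vec2)) {
# position of the first remaining occurrence of vec2[i] in vec1 (NA if absent)
pos <- match(vec2[i], vec1)
# skip values that no longer occur in vec1
if (!is.na(pos)) vec1 <- vec1[-pos]
}
For each value in vec2, you just identify the first position where it occurs in vec1 and remove that element from the original vector.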

You can try this too:
for (el in vec2[vec2 %in% intersect(vec1, vec2)])
vec1 <- vec1[-which(vec1==el)[1]]
sort(vec1)
#[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
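The answers above all remove one occurrence from the first vector for each occurrence in the second. A loop-free sketch of the same idea labels every element with its occurrence number and drops the labels that also appear in the second vector. The helper name remove_once is hypothetical, and the data is the simplified example from the earlier answer:

```r
# Hypothetical helper: drop one element of x for each matching element of y
remove_once <- function(x, y) {
  # Label each element with its occurrence number, e.g. the second 10 becomes "10 2"
  key_x <- paste(x, ave(seq_along(x), x, FUN = seq_along))
  key_y <- paste(y, ave(seq_along(y), y, FUN = seq_along))
  # Keep only the elements of x whose label has no counterpart in y
  x[!key_x %in% key_y]
}

vec1 <- c(1, 2, 3, 1, 1, 5)
vec2 <- c(1, 3, 5)
result <- remove_once(vec1, vec2)
result
# [1] 2 1 1
```

Unlike the table-based answers, this keeps the surviving elements in their original order.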

Related

R: row-wise checking for multiple values

I have a dataset that looks like this
With further rows below. I want to create a column to the right that will have 1 if it matches with a certain value I am checking for row-wise and otherwise it will be 0.
For a single value I have the following code -
set.seed(4991)
my_data <- data.frame(ceiling(matrix(runif(100,4,10),ncol = 5)))
comval <- c(5)
my_data$bleh <- as.integer(apply(my_data, 1, function(r) any(comval %in% r)))
The output looks like this -
Which is what I want. Now the issue I am having is that if I have two or more values under 'comval' , for instance,
comval<-c(5,10)
I am getting 1 in the 'bleh' column for all rows that have either 5 or 10. The output is like -
It is like an OR logical operator. I need it to work as an AND logical operator, that is, 'bleh' column will have the value 1 only if all the values in 'comval' are there in the rows.
Also, I am trying to write a function here so I need to take the length(comval) as an input and then check for all the values in 'comval' against each row.
You could check whether the length of the intersection is greater than or equal to 1.
my_data$bleh <- as.integer(apply(my_data, 1, function(r) {
length(intersect(comval, unlist(r))) >= 1
}))
# X1 X2 X3 X4 X5 bleh
# 1 5 10 5 6 10 1
# 2 9 9 5 8 6 1
# 3 5 10 5 5 5 1
# 4 10 8 6 5 8 1
# 5 8 6 7 9 10 1
# 6 5 10 8 10 8 1
# 7 9 8 10 5 7 1
# 8 6 8 10 6 7 1
# 9 5 5 6 6 8 1
# 10 10 5 8 6 8 1
# 11 9 10 10 7 7 1
# 12 6 8 7 10 8 1
# 13 6 9 7 6 9 0
# 14 8 6 6 10 7 1
# 15 9 9 5 7 7 1
# 16 10 9 9 10 6 1
# 17 7 10 5 10 8 1
# 18 9 8 10 9 9 1
# 19 10 8 9 6 8 1
# 20 5 8 6 7 5 1
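A test of the form length(intersect(...)) >= 1 is true as soon as any one value of comval appears in a row, which is the OR behaviour. For the AND behaviour asked for in the question, one option is to require every value to be present with all(). A minimal sketch, using a small made-up data frame in place of the question's my_data; the helper name has_all_values is hypothetical:

```r
# Hypothetical helper: 1 if a row contains every value in `values`, else 0
has_all_values <- function(df, values) {
  as.integer(apply(df, 1, function(r) all(values %in% r)))
}

# Made-up rows standing in for the question's my_data
toy <- data.frame(X1 = c(5, 9, 6), X2 = c(10, 9, 9), X3 = c(5, 5, 7))
toy$bleh <- has_all_values(toy, c(5, 10))
toy$bleh
# [1] 1 0 0
```

Because all() looks at every element of values, the same function works for any number of values without passing their count separately.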

selecting common columns from different elements of a list

I have a data set in list format. The list is further divided into 20 elements. Each element contains 12 rows and some columns. Now I want to extract the common columns from each element of the list and make a new data set. I tried to make a reproducible example. Please see the code:
library(plyr) # provides ldply()
a<-data.frame(x=(1:10),y=(1:10),z=(1:10))
b<-data.frame(x=(1:10),y=(1:10),n=(1:10))
c<-data.frame(x=(1:10),y=(1:10),q=(1:10))
data<-list(a,b,c)
data1<-ldply(data)
required_data<-data1[,-3:-5]
Find the common columns using Reduce, subset them from list and bind them together
cols <- Reduce(intersect, lapply(data, colnames))
do.call(rbind, lapply(data, `[`, cols))
# x y
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
#6 6 6
#7 7 7
#8 8 8
#9 9 9
#10 10 10
#11 1 1
#...
The last step can also be performed using
purrr::map_df(data, `[`, cols)
With base R, you can first find the names that occur in every element of the list
commonName <- names((r<-table(unlist(Map(names,data))))[r==length(data)])
then retrieve those columns from the list and combine them (similar to the second step in the solution by @Ronak Shah)
res <- Reduce(rbind,lapply(data, '[',commonName))
which gives:
> res
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
11 1 1
12 2 2
13 3 3
14 4 4
15 5 5
16 6 6
17 7 7
18 8 8
19 9 9
20 10 10
21 1 1
22 2 2
23 3 3
24 4 4
25 5 5
26 6 6
27 7 7
28 8 8
29 9 9
30 10 10

How to find closest match from list in R

I have a list of numbers and would like to find which is the next highest compared to each number in a data.frame. I have:
list <- c(3,6,9,12)
X <- c(1:10)
df <- data.frame(X)
And I would like to add a variable to df being the next highest number in the list. i.e:
X Y
1 3
2 3
3 3
4 6
5 6
6 6
7 9
8 9
9 9
10 12
I've tried:
df$Y <- which.min(abs(list-df$X))
but that gives an error message and would just get the closest value from the list, not the next above.
Another approach is to use findInterval:
df$Y <- list[findInterval(X, list, left.open=TRUE) + 1]
> df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
You could do this...
df$Y <- sapply(df$X, function(x) min(list[list>=x]))
df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
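Both answers assume that every value of X has some element of list at or above it. A short sketch of the findInterval approach wrapped in a function, assuming the lookup values may arrive unsorted; the name next_at_or_above is hypothetical. Values above the largest lookup value come back as NA:

```r
# Hypothetical helper: smallest element of `breaks` that is >= each x
next_at_or_above <- function(x, breaks) {
  breaks <- sort(breaks)  # findInterval() needs sorted break points
  # With left.open = TRUE, a value equal to a break maps to that break itself
  breaks[findInterval(x, breaks, left.open = TRUE) + 1]
}

res <- next_at_or_above(c(1, 3, 4, 10, 13), c(3, 6, 9, 12))
res
# [1]  3  3  6 12 NA
```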

How do I select rows in a data frame before and after a condition is met?

I've been searching the web for a few days now and I can't find a solution to my (probably easy to solve) problem.
I have huge data frames with 4 variables and over a million observations each. Now I want to select the 100 rows before, all rows during, and the 1000 rows after a specific condition is met, and fill the rest with NAs. I tried it with a for loop and if/ifelse, but it hasn't worked so far. I think it shouldn't be a big thing, but at the moment I just can't get the hang of it.
I create the data using:
foo<-data.frame(t = 1:15, a = sample(1:15), b = c(1,1,1,1,1,4,4,4,4,1,1,1,1,1,1), c = sample(1:15))
My Data looks like this:
ID t a b c
1 1 4 1 7
2 2 7 1 10
3 3 10 1 6
4 4 2 1 4
5 5 13 1 9
6 6 15 4 3
7 7 8 4 15
8 8 3 4 1
9 9 9 4 2
10 10 14 1 8
11 11 5 1 11
12 12 11 1 13
13 13 12 1 5
14 14 6 1 14
15 15 1 1 12
What I want is to pick the value of a (in this example) 2 rows before, all rows while and 3 rows after the value of b is >1 and fill the rest with NA's. [Because this is just an example I guess you can imagine that after these 15 rows there are more rows with the value for b changing from 1 to 4 several times (I did not post it, so I won't spam the question with unnecessary data).]
So I want to get something like:
ID t a b c d
1 1 4 1 7 NA
2 2 7 1 10 NA
3 3 10 1 6 NA
4 4 2 1 4 2
5 5 13 1 9 13
6 6 15 4 3 15
7 7 8 4 15 8
8 8 3 4 1 3
9 9 9 4 2 9
10 10 14 1 8 14
11 11 5 1 11 5
12 12 11 1 13 11
13 13 12 1 5 NA
14 14 6 1 14 NA
15 15 1 1 12 NA
I'm thankful for any help.
Thank you.
Best regards,
Chris
Here is the same approach as missuse's answer, but with data.table:
library(data.table)
foo<-data.frame(t = 1:11, a = sample(1:11), b = c(1,1,1,4,4,4,4,1,1,1,1), c = sample(1:11))
DT <- setDT(foo)
DT[ unique(c(DT[,.I[b>1] ],DT[,.I[b>1]+3 ],DT[,.I[b>1]-2 ])), d := a]
t a b c d
1: 1 10 1 2 NA
2: 2 6 1 10 6
3: 3 5 1 7 5
4: 4 11 4 4 11
5: 5 4 4 9 4
6: 6 8 4 5 8
7: 7 2 4 8 2
8: 8 3 1 3 3
9: 9 7 1 6 7
10: 10 9 1 1 9
11: 11 1 1 11 NA
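Here
unique(c(DT[,.I[b>1] ],DT[,.I[b>1]+3 ],DT[,.I[b>1]-2 ]))
gives you the desired indices: the unique indices of the rows that meet your condition, plus the same indices shifted by +3 and by -2. This covers the whole window only when each run of matching rows is at least three rows long, since only the shifted indices are added, not the rows in between.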
Here is an attempt.
Get indexes that satisfy the condition b > 1
z <- which(foo$b > 1)
get indexes for (z - 2) : (z + 3)
ind <- unique(unlist(lapply(z, function(x){
g <- pmax(x - 2, 1) #if x - 2 is less than 1
h <- pmin(x + 3, nrow(foo)) #if x + 3 is past the last row
g : h
})))
create d column filled with NA
foo$d <- NA
replace elements with appropriate indexes with foo$a
foo$d[ind] <- foo$a[ind]
library(dplyr)
library(purrr)
# example dataset
foo<-data.frame(t = 1:15,
a = sample(1:15),
b = c(1,1,1,1,1,4,4,4,4,1,1,1,1,1,1),
c = sample(1:15))
# function to get indices of interest
# for a given index x go 2 positions back and 3 forward
# keep only positive indices
GetIDsBeforeAfter = function(x) {
v = (x-2) : (x+3)
v[v > 0]
}
foo %>% # from your dataset
filter(b > 1) %>% # keep rows where b > 1
pull(t) %>% # get the positions
map(GetIDsBeforeAfter) %>% # for each position apply the function
unlist() %>% # unlist all sets indices
unique() -> ids_to_remain # keep unique ones and save them in a vector
foo$d = foo$c # copy column c as d
foo$d[-ids_to_remain] = NA # put NA to all positions not in our vector
foo
# t a b c d
# 1 1 5 1 8 NA
# 2 2 6 1 14 NA
# 3 3 4 1 10 NA
# 4 4 1 1 7 7
# 5 5 10 1 5 5
# 6 6 8 4 9 9
# 7 7 9 4 15 15
# 8 8 3 4 6 6
# 9 9 7 4 2 2
# 10 10 12 1 3 3
# 11 11 11 1 1 1
# 12 12 15 1 4 4
# 13 13 14 1 11 NA
# 14 14 13 1 13 NA
# 15 15 2 1 12 NA
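The window selection described in the question can also be sketched in base R without an explicit loop, by adding every offset from -2 to +3 to every matching row index at once. This is a sketch under the assumption that rows are addressed by position; the helper name window_mask is hypothetical, and a and b are the columns printed in the question:

```r
# Hypothetical helper: TRUE for rows within `before` rows ahead of, or
# `after` rows behind, any row where `cond` is TRUE
window_mask <- function(cond, before, after) {
  hits <- which(cond)
  # All combinations of matching index + offset
  idx <- unique(as.vector(outer(hits, -before:after, `+`)))
  # Drop positions that fall outside the data
  idx <- idx[idx >= 1 & idx <= length(cond)]
  keep <- logical(length(cond))
  keep[idx] <- TRUE
  keep
}

a <- c(4, 7, 10, 2, 13, 15, 8, 3, 9, 14, 5, 11, 12, 6, 1)
b <- c(1, 1, 1, 1, 1, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1)
d <- ifelse(window_mask(b > 1, before = 2, after = 3), a, NA)
d
# [1] NA NA NA  2 13 15  8  3  9 14  5 11 NA NA NA
```

For the real data, before = 100 and after = 1000 would be passed instead.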

How to generate an uneven sequence of numbers in R

Here's an example data frame:
df <- data.frame(x=c(1,1,2,2,2,3,3,4,5,6,6,6,9,9),y=c(1,2,3,4,6,3,7,8,6,4,3,7,3,2))
I want to generate a sequence of numbers according to the number of observations of y per x group (e.g. there are 2 observations of y for x=1). I want the sequence to be continuously increasing and jumps by 2 after each x group.
The desired output for this example would be:
1,2,5,6,7,10,11,14,17,20,21,22,25,26
How can I do this simply in R?
To expand on my comment: the groupings can be arbitrary; you simply need to recast them to the correct ordering. There are a few ways to do this. @akrun has shown that this can be accomplished using the match function, or you can make use of the as.numeric function if that is easier for you to understand.
df <- data.frame(x=c(1,1,2,2,2,3,3,4,5,6,6,6,9,9),y=c(1,2,3,4,6,3,7,8,6,4,3,7,3,2))
# these are equivalent
df$newx <- as.numeric(factor(df$x, levels=unique(df$x)))
df$newx <- match(df$x, unique(df$x))
Since you now have a "new" releveling which is sequential, we can use the logic that was discussed in the comments.
df$newNumber <- 1:nrow(df) + (df$newx-1)*2
For this example, this will result in the following dataframe:
x y newx newNumber
1 1 1 1
1 2 1 2
2 3 2 5
2 4 2 6
2 6 2 7
3 3 3 10
3 7 3 11
4 8 4 14
5 6 5 17
6 4 6 20
6 3 6 21
6 7 6 22
9 3 7 25
9 2 7 26
where df$newNumber is the output you wanted.
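The two steps can be combined into one small function. This is only a sketch: the name gapped_sequence and its gap argument (the number of values skipped between consecutive groups) are hypothetical, and it assumes rows belonging to the same group are adjacent, as in the example:

```r
# Hypothetical helper: running row number plus `gap` skipped values per group change
gapped_sequence <- function(x, gap = 2) {
  group_index <- match(x, unique(x))  # 1, 2, 3, ... in order of first appearance
  seq_along(x) + (group_index - 1) * gap
}

x <- c(1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 6, 6, 9, 9)
out <- gapped_sequence(x)
out
# [1]  1  2  5  6  7 10 11 14 17 20 21 22 25 26
```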
To create the sequence 0,0,4,4,4,9,..., what you're basically doing is taking the minimum of each group and subtracting 1. The easiest way to do this is with the dplyr package.
library(dplyr)
df %>%
group_by(x) %>%
mutate(newNumber2 = min(newNumber) -1)
Which will have the output:
Source: local data frame [14 x 5]
Groups: x
x y newx newNumber newNumber2
1 1 1 1 1 0
2 1 2 1 2 0
3 2 3 2 5 4
4 2 4 2 6 4
5 2 6 2 7 4
6 3 3 3 10 9
7 3 7 3 11 9
8 4 8 4 14 13
9 5 6 5 17 16
10 6 4 6 20 19
11 6 3 6 21 19
12 6 7 6 22 19
13 9 3 7 25 24
14 9 2 7 26 24
