How to find closest match from list in R

I have a list of numbers and would like to find which is the next highest compared to each number in a data.frame. I have:
list <- c(3,6,9,12)
X <- c(1:10)
df <- data.frame(X)
And I would like to add a variable to df being the next highest number in the list. i.e:
X Y
1 3
2 3
3 3
4 6
5 6
6 6
7 9
8 9
9 9
10 12
I've tried:
df$Y <- which.min(abs(list-df$X))
but that gives an error message and would just get the closest value from the list, not the next above.

Another approach is to use findInterval:
df$Y <- list[findInterval(X, list, left.open=TRUE) + 1]
> df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
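To see why the `+ 1` works, here is a small sketch. It assumes the lookup vector is sorted in increasing order, which findInterval requires. The names `breaks` and `next_at_or_above` are made up for illustration; `breaks` stands in for `list` so that the base `list()` function is not masked.

```r
# Hypothetical helper: for each x, return the smallest break that is >= x.
# Assumes `breaks` is sorted in increasing order, as findInterval() requires.
next_at_or_above <- function(x, breaks) {
  # With left.open = TRUE the intervals are (b[i], b[i+1]], so the result is
  # the number of breaks strictly below x; the next break is one position on.
  idx <- findInterval(x, breaks, left.open = TRUE) + 1
  # Indexing past the end of `breaks` yields NA for values above the maximum.
  breaks[idx]
}

breaks <- c(3, 6, 9, 12)
res <- next_at_or_above(c(1, 3, 4, 12, 13), breaks)
print(res)
```

Values above the largest break come back as NA rather than raising an error, which may or may not be what you want.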

You could do this...
df$Y <- sapply(df$X, function(x) min(list[list>=x]))
df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
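One caveat with the sapply approach: for a value larger than everything in the lookup vector, the subset is empty and min() returns Inf with a warning. A possible guard is sketched below; the names `lookup` and `next_or_na` are assumptions for illustration, not part of the answer above.

```r
lookup <- c(3, 6, 9, 12)

# Hypothetical variant of the sapply() answer that returns NA instead of Inf
# when no element of `lookup` is >= x.
next_or_na <- function(x, lookup) {
  vapply(x, function(v) {
    candidates <- lookup[lookup >= v]
    if (length(candidates) == 0) NA_real_ else min(candidates)
  }, numeric(1))
}

res <- next_or_na(c(1, 7, 10, 15), lookup)
print(res)
```

This version does not need the lookup vector to be sorted, at the cost of scanning it once per element.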

Related

R: row-wise checking for multiple values

I have a dataset that looks like this
With further rows below. I want to create a column to the right that will have 1 if it matches with a certain value I am checking for row-wise and otherwise it will be 0.
For a single value I have the following code -
set.seed(4991)
my_data <- data.frame(ceiling(matrix(runif(100,4,10),ncol = 5)))
comval <- c(5)
my_data$bleh <- as.integer(apply(my_data, 1, function(r) any(comval %in% r)))
The output looks like this -
Which is what I want. Now the issue I am having is that if I have two or more values under 'comval' , for instance,
comval<-c(5,10)
I am getting 1 in the 'bleh' column for all rows that have either 5 or 10. The output is like -
It is like an OR logical operator. I need it to work as an AND logical operator, that is, 'bleh' column will have the value 1 only if all the values in 'comval' are there in the rows.
Also, I am trying to write a function here so I need to take the length(comval) as an input and then check for all the values in 'comval' against each row.
You could check whether the length of the intersect equals the number of distinct values in comval; this is TRUE only when every value is present in the row.
my_data$bleh <- as.integer(apply(my_data, 1, function(r) {
  length(intersect(comval, unlist(r))) == length(unique(comval))
}))
# X1 X2 X3 X4 X5 bleh
# 1 5 10 5 6 10 1
# 2 9 9 5 8 6 0
# 3 5 10 5 5 5 1
# 4 10 8 6 5 8 1
# 5 8 6 7 9 10 0
# 6 5 10 8 10 8 1
# 7 9 8 10 5 7 1
# 8 6 8 10 6 7 0
# 9 5 5 6 6 8 0
# 10 10 5 8 6 8 1
# 11 9 10 10 7 7 0
# 12 6 8 7 10 8 0
# 13 6 9 7 6 9 0
# 14 8 6 6 10 7 0
# 15 9 9 5 7 7 0
# 16 10 9 9 10 6 0
# 17 7 10 5 10 8 1
# 18 9 8 10 9 9 0
# 19 10 8 9 6 8 0
# 20 5 8 6 7 5 0
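Since the question also asks for a function that accepts any number of values, here is a minimal sketch. The function name `has_all_values` and the toy data frame are assumptions for illustration. The key idea is that `all(values %in% r)` is TRUE only when every value occurs in the row, whereas `any()` behaves like OR.

```r
# Hypothetical helper: 1 if every value in `values` occurs in the row, else 0.
has_all_values <- function(data, values) {
  as.integer(apply(data, 1, function(r) all(values %in% r)))
}

# Toy data, made up for illustration
toy <- data.frame(X1 = c(5, 9, 10), X2 = c(10, 5, 8), X3 = c(6, 8, 7))
toy$bleh <- has_all_values(toy[c("X1", "X2", "X3")], c(5, 10))
print(toy)
```

Only the first toy row contains both 5 and 10, so only that row is flagged.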

From one vector delete all elements of another vector in r [duplicate]

I have 2 vectors
vec_1
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 2 3 4 5 6 7 8 9 10 11 12 13 14 2 3 4 5 6 7 8 9
[35] 10 11 12 13 14 2 3 4 5 6 7 8 9 10 11 12 13 14
vec_2
[1] 12 3 13 3 14 4 10 8 9 5 7 5 13 11 6 10 8 8 14 12 6 11 8 5 3 6
I want to delete all elements of vec_2 from vec_1
The setdiff function does not work here because, for example, vec_2 contains two 10s, and I want to delete only two of the 10s from vec_1 (not all four of them).
EDITED: expected output:
vec_1
[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
How can I do this in R?
Here is one idea via union
unlist(sapply(union(vec_1, vec_2), function(i)
rep(i, each = length(vec_1[vec_1 == i]) - length(vec_2[vec_2 == i]))))
#[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
Definitely, not the best solution but here is one way.
I created a simplified example.
vec1 <- c(1, 2, 3, 1, 1, 5)
vec2 <- c(1, 3, 5)
#Converting the frequency table to a data frame
x1 <- data.frame(table(vec1))
x2 <- data.frame(table(vec2))
#Assuming your vec1 has all the elements present in vec2
new_df <- merge(x1, x2, by.x = "vec1", by.y = "vec2", all.x = TRUE)
new_df
# vec1 Freq.x Freq.y
#1 1 3 1
#2 2 1 NA
#3 3 1 1
#4 5 1 1
#Replacing NA's by 0
new_df[is.na(new_df)] <- 0
#Subtracting the frequencies of common elements in two vectors
final <- cbind(new_df[1], new_df[2] - new_df[3])
final
# vec1 Freq.x
#1 1 2
#2 2 1
#3 3 0
#4 5 0
#Recreating a new vector based on the final dataframe
#(vec1 is a factor after table(), so convert it back to numeric first)
rep(as.numeric(as.character(final$vec1)), times = final$Freq.x)
# [1] 1 1 2
You can do this using a simple for loop:
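for (i in seq_along(vec2)) {
  # position of the first remaining occurrence of vec2[i] in vec1
  pos <- match(vec2[i], vec1)
  # skip values that are not (or no longer) present in vec1
  if (!is.na(pos)) vec1 <- vec1[-pos]
}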
You just identify the first position and remove from the original vector.
You can try this too:
for (el in vec2[vec2 %in% intersect(vec1, vec2)])
vec1 <- vec1[-which(vec1==el)[1]]
sort(vec1)
#[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
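A loop-free alternative, offered here as a sketch rather than something taken from the answers above, is to tag each element with its occurrence number and drop the tags that also appear in the second vector. The helper name `multiset_diff` is made up for illustration, and the example reuses the simplified vectors from earlier.

```r
# Hypothetical helper: remove from `a` one occurrence for each element of `b`.
multiset_diff <- function(a, b) {
  # Label every element with "value occurrence-number", e.g. "10 1", "10 2".
  key <- function(v) paste(v, ave(seq_along(v), v, FUN = seq_along))
  a[!key(a) %in% key(b)]
}

vec1 <- c(1, 2, 3, 1, 1, 5)
vec2 <- c(1, 3, 5)
res <- multiset_diff(vec1, vec2)
print(res)
```

The result keeps the original order of the first vector (the earliest occurrences are the ones removed), so wrap it in sort() if a sorted vector is wanted.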

How to generate an uneven sequence of numbers in R

Here's an example data frame:
df <- data.frame(x=c(1,1,2,2,2,3,3,4,5,6,6,6,9,9),y=c(1,2,3,4,6,3,7,8,6,4,3,7,3,2))
I want to generate a sequence of numbers according to the number of observations of y per x group (e.g. there are 2 observations of y for x=1). I want the sequence to be continuously increasing and jumps by 2 after each x group.
The desired output for this example would be:
1,2,5,6,7,10,11,14,17,20,21,22,25,26
How can I do this simply in R?
To expand on my comment, the groupings can be arbitrary; you simply need to recast them to the correct ordering. There are a few ways to do this: @akrun has shown that it can be accomplished using the match function, or you can use as.numeric on a factor if that is easier to follow.
df <- data.frame(x=c(1,1,2,2,2,3,3,4,5,6,6,6,9,9),y=c(1,2,3,4,6,3,7,8,6,4,3,7,3,2))
# these are equivalent
df$newx <- as.numeric(factor(df$x, levels=unique(df$x)))
df$newx <- match(df$x, unique(df$x))
Since you now have a "new" releveling which is sequential, we can use the logic that was discussed in the comments.
df$newNumber <- 1:nrow(df) + (df$newx-1)*2
For this example, this will result in the following dataframe:
x y newx newNumber
1 1 1 1
1 2 1 2
2 3 2 5
2 4 2 6
2 6 2 7
3 3 3 10
3 7 3 11
4 8 4 14
5 6 5 17
6 4 6 20
6 3 6 21
6 7 6 22
9 3 7 25
9 2 7 26
where df$newNumber is the output you wanted.
To create the sequence 0,0,4,4,4,9,..., take the minimum of newNumber within each group and subtract 1. The easiest way to do this is with the dplyr package.
library(dplyr)
df %>%
group_by(x) %>%
mutate(newNumber2 = min(newNumber) -1)
Which will have the output:
Source: local data frame [14 x 5]
Groups: x
x y newx newNumber newNumber2
1 1 1 1 1 0
2 1 2 1 2 0
3 2 3 2 5 4
4 2 4 2 6 4
5 2 6 2 7 4
6 3 3 3 10 9
7 3 7 3 11 9
8 4 8 4 14 13
9 5 6 5 17 16
10 6 4 6 20 19
11 6 3 6 21 19
12 6 7 6 22 19
13 9 3 7 25 24
14 9 2 7 26 24
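If you would rather avoid an extra package, the same per-group minimum can probably be obtained in base R with ave(). This is a sketch under the assumption that the columns are built exactly as in the answer above; only the x column is recreated here to keep the example short.

```r
df <- data.frame(x = c(1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 6, 6, 9, 9))
# Same steps as above: renumber groups, then offset the row index by 2 per group
df$newx <- match(df$x, unique(df$x))
df$newNumber <- seq_len(nrow(df)) + (df$newx - 1) * 2
# ave() replaces each element by its group's minimum, so no extra package is needed
df$newNumber2 <- ave(df$newNumber, df$x, FUN = min) - 1
print(df$newNumber2)
```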

How to replace the NA values after merge two data.frame? [duplicate]

I have two data.frame as the following:
> a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
> a
x y
1 1 1
2 2 3
3 3 5
4 4 7
5 5 9
6 6 11
7 7 13
8 8 15
> b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
> b
x z
1 1 2
2 5 4
3 7 6
Then I use "join" for two data.frames:
> c <- join(a, b, by="x", type="left")
> c
x y z
1 1 1 2
2 2 3 NA
3 3 5 NA
4 4 7 NA
5 5 9 4
6 6 11 NA
7 7 13 6
8 8 15 NA
My requirement is to replace the NAs in the z column with the last non-NA value before the current position. I want the result to look like this:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
This time (if your data is not too large) a loop is an elegant option:
for(i in which(is.na(c$z))){
c$z[i] = c$z[i-1]
}
gives:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
data:
library(plyr)
a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
c <- join(a, b, by="x", type="left")
You might also want to check na.locf in the zoo package.
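For reference, the same forward fill can be sketched without a loop in base R. The function name `fill_forward` is made up for illustration, and the vector `z` mirrors the z column after the join.

```r
# Hypothetical helper: carry the last non-NA value forward.
fill_forward <- function(v) {
  # Number of non-NA values seen so far (0 for leading NAs)
  idx <- cumsum(!is.na(v))
  # Prepend NA so that leading NAs (idx 0) stay NA
  c(NA, v[!is.na(v)])[idx + 1]
}

z <- c(2, NA, NA, NA, 4, NA, 6, NA)
res <- fill_forward(z)
print(res)
```

Unlike the loop, this sketch leaves a leading NA in place instead of failing on index 0.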

Eliminate in an increasing order rows in a data frame

x<-c(4,5,6,23,5,6,7,8,0,3)
y<-c(2,4,5,6,23,5,6,7,8,0)
z<-c(1,2,4,5,6,23,5,6,7,8)
df<-data.frame(x,y,z)
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 23 6 5
5 5 23 6
6 6 5 23
7 7 6 5
8 8 7 6
9 0 8 7
10 3 0 8
I would like to remove the value 23 from every column by position rather than by matching the value: starting at its row in x (row 4), drop one row further down in each successive column (row 4 in x, row 5 in y, row 6 in z). The desired result is:
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 5 6 5
5 6 5 6
6 7 6 5
7 8 7 6
8 0 8 7
9 3 0 8
Thank you
You can iterate through the columns and remove the element from each, then reassemble as a data frame:
result <- as.data.frame(lapply(1:ncol(df), function(x) df[-(x+3),x]))
names(result) <- names(df)
result
## x y z
## 1 4 2 1
## 2 5 4 2
## 3 6 5 4
## 4 5 6 5
## 5 6 5 6
## 6 7 6 5
## 7 8 7 6
## 8 0 8 7
## 9 3 0 8
df[-(x+3),x] is the column with the value removed, by location. To start with row N in column x you would use df[-(x+N-1),x].
You could also try:
n <- 4
df1 <- df[-n,]
df1[] <- unlist(df,use.names=FALSE)[-seq(n, prod(dim(df)), by=nrow(df)+1)]
df1
# x y z
#1 4 2 1
#2 5 4 2
#3 6 5 4
#5 5 6 5
#6 6 5 6
#7 7 6 5
#8 8 7 6
#9 0 8 7
#10 3 0 8
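Both answers rely on the same idea: column j loses the row at position start + j - 1. A minimal sketch that wraps this in a function follows; the name `drop_staggered` and its `start` argument are assumptions for illustration, and it presumes start + ncol(d) - 1 does not exceed the number of rows.

```r
# Hypothetical helper: drop row `start` from column 1, `start + 1` from column 2, ...
drop_staggered <- function(d, start) {
  cols <- lapply(seq_along(d), function(j) d[[j]][-(start + j - 1)])
  names(cols) <- names(d)
  as.data.frame(cols)
}

df <- data.frame(x = c(4, 5, 6, 23, 5, 6, 7, 8, 0, 3),
                 y = c(2, 4, 5, 6, 23, 5, 6, 7, 8, 0),
                 z = c(1, 2, 4, 5, 6, 23, 5, 6, 7, 8))
res <- drop_staggered(df, start = 4)
print(res)
```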
