I have a simple function:
new_function <- function(x)
{
letters <- c("A","B","C")
new_letters<- c("D","E","F")
if (x %in% letters) {"Correct"}
else if (x %in% new_letters) {"Also Correct"}
else {x}
}
I make a dataframe with letters:
df <- as.data.frame(LETTERS[seq( from = 1, to = 10 )])
names(df)<- c("Letters")
I want to apply the function on the dataframe:
df$result <- new_function(df$Letters)
And it doesn't work (the function only writes "Correct")
I get this warning:
Warning message:
In if (x %in% letters) { :
the condition has length > 1 and only the first element will be used
You can use lapply:
df$result <- lapply(df$Letters,new_function)
Output:
df
Letters result
1 A Correct
2 B Correct
3 C Correct
4 D Also Correct
5 E Also Correct
6 F Also Correct
7 G 7
8 H 8
9 I 9
10 J 10
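The 7 to 10 in the last rows are most likely factor codes: in R versions before 4.0, as.data.frame() turns strings into factors by default, and new_function returns x unchanged in its else branch. A minimal sketch, assuming the new_function from the question, that converts to character first and uses vapply so the result is a plain character vector rather than a list column:

```r
# new_function as defined in the question
new_function <- function(x) {
  letters <- c("A", "B", "C")
  new_letters <- c("D", "E", "F")
  if (x %in% letters) {
    "Correct"
  } else if (x %in% new_letters) {
    "Also Correct"
  } else {
    x
  }
}

df <- data.frame(Letters = LETTERS[1:10])

# as.character() leaves character columns unchanged and unwraps factors;
# vapply() calls new_function once per element and checks that each
# result is a single string
df$result <- vapply(as.character(df$Letters), new_function,
                    character(1), USE.NAMES = FALSE)
```

Because the function is called on one element at a time, the if condition always has length 1 and no warning is raised.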
I would rewrite your new_function with ifelse, as @akrun suggested. as.character converts x to character in case it is a factor:
new_function <- function(x){
ifelse(x %in% c("A","B","C"), "Correct",
ifelse(x %in% c("D","E","F"), "Also Correct", as.character(x)))
}
df$result <- new_function(df$Letters)
or with case_when from dplyr:
library(dplyr)
new_function <- function(x){
case_when(x %in% c("A","B","C") ~ "Correct",
x %in% c("D","E","F") ~ "Also Correct",
TRUE ~ as.character(x))
}
df %>%
mutate(result = new_function(Letters))
Result:
Letters result
1 A Correct
2 B Correct
3 C Correct
4 D Also Correct
5 E Also Correct
6 F Also Correct
7 G G
8 H H
9 I I
10 J J
Data:
df <- as.data.frame(LETTERS[seq( from = 1, to = 10 )])
names(df)<- c("Letters")
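For context on the warning in the question: if expects a single TRUE or FALSE, while x %in% letters returns one logical value per element of x (R 4.2 and later raise an error here instead of a warning). A small sketch with made-up input showing why ifelse fits this case:

```r
x <- c("A", "D", "G")

# %in% is vectorised: one TRUE/FALSE per element of x
cond <- x %in% c("A", "B", "C")
n_cond <- length(cond)

# ifelse() picks a value element by element, so a condition of
# length > 1 is exactly what it expects
labels <- ifelse(cond, "Correct", x)
```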
I have a basic R question. Imagine the following code:
a <- c("A","B","C")
b <- c("A","B","C")
c <- c("A","X","C")
x <- c("A","B","C")
y <- c("","B","C")
z <- c("","","C")
frame <- data.frame(a,b,c,x,y,z)
Now I want to get, for each row, the last 3 values that are not empty. The output should look like this:
new1 <- c("A","X","C")
new2 <- c("A","B","C")
new3 <- c("A","B","C")
frame2 <- data.frame(new1,new2,new3)
I am thankful for any help.
Using apply from base R
as.data.frame(t(apply(frame, 1, FUN = function(x) tail(x[nzchar(x)], 3))))
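To see what the anonymous function does, here is a sketch applied to a single row (row 2 of frame). It assumes every row has at least three non-empty values; otherwise the per-row results would have different lengths and apply would return a list rather than a matrix.

```r
frame <- data.frame(
  a = c("A", "B", "C"), b = c("A", "B", "C"), c = c("A", "X", "C"),
  x = c("A", "B", "C"), y = c("", "B", "C"),  z = c("", "", "C"),
  stringsAsFactors = FALSE
)

row2 <- unlist(frame[2, ])        # named character vector for row 2
non_empty <- row2[nzchar(row2)]   # drop the empty string from column z
last3 <- tail(non_empty, 3)       # keep the last three remaining values
```

For row 2 this leaves the values X, B, B, which is the second row of the expected frame2.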
You can do,
new_frame <- frame[colSums(frame == '') == 0]
new_frame[tail(seq_along(new_frame), 3)]
b c x
1 A A A
2 B X B
3 C C C
I would like to find the index of the rows in a matrix m that match the vectors v1 and v2. So from something similar to res I would like to get element 2, and from res1 get 8. Thank you!
flink = c("logit", "probit", "GEVmodNS", "GEVmod")
fcor = c("tanimoto", "exponential", "gaussian", "independent")
v1 = c('logit', 'exponential')
v2 = c('probit', 'independent')
m = expand.grid(fcor,flink)
res = m %in%v1
res1 = m %in%v2
which(apply(m, 1, function(x) all(x %in% v1)))
[1] 2
which(apply(m, 1, function(x) all(x %in% v2)))
[1] 8
If order matters so that you want a row to match the order of v1 or v2 exactly then use == instead of %in%.
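A short sketch of the difference, using a reversed copy of v1 (the name v1_rev is only for illustration). Since m was built as expand.grid(fcor, flink), its columns hold the fcor value first and the flink value second, so the == comparison needs the vector in that order:

```r
flink <- c("logit", "probit", "GEVmodNS", "GEVmod")
fcor  <- c("tanimoto", "exponential", "gaussian", "independent")
m <- expand.grid(fcor, flink, stringsAsFactors = FALSE)

v1     <- c("logit", "exponential")
v1_rev <- rev(v1)  # c("exponential", "logit"), same order as m's columns

# %in% ignores order: both vectors match row 2
in_v1     <- which(apply(m, 1, function(x) all(x %in% v1)))
in_v1_rev <- which(apply(m, 1, function(x) all(x %in% v1_rev)))

# == compares position by position: only the vector ordered like the
# columns matches row 2, the other one matches nothing
eq_v1     <- which(apply(m, 1, function(x) all(x == v1)))
eq_v1_rev <- which(apply(m, 1, function(x) all(x == v1_rev)))
```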
Update: Thanks to @Greg's pointer!
We could still use which with ==, but we have to specify the positions of the vector and matrix elements:
> which(m[,1] == v1[2] & m[,2] == v1[1])
[1] 2
> which(m[,1] == v2[2] & m[,2] == v2[1])
[1] 8
First answer (which is not correct in the sense of OP's question):
which(v1 == m)
which(v2 == m)
> which(v1 == m)
[1] 2 6 10 14 17 19
> which(v2 == m)
[1] 4 8 12 16 21 23
Try this
> which(!lengths(Map(setdiff, list(v1), asplit(m, 1))))
[1] 2
> which(!lengths(Map(setdiff, list(v2), asplit(m, 1))))
[1] 8
Assuming you want order to matter when matching
Var2 Var1
1 logit tanimoto
2 logit exponential
# ... ...
# Should match row 2.
v1 <- c('logit', 'exponential')
# Should NOT match row 2.
v3 <- c('exponential', 'logit')
Here's an elegant alternative with the native pipe |> (R 4.1 or later) that works for an arbitrary number of fields:
(v1 == t(m)) |> apply(2, all) |> which()
# [1] 2
(v2 == t(m)) |> apply(2, all) |> which()
# [1] 8
Just make sure you name your columns in the proper order
m <- expand.grid(flink, fcor)
such that they correspond to the values in v1, etc.
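One caveat worth sketching: rebuilding the grid as expand.grid(flink, fcor) also changes the row order, so the matching row numbers are no longer 2 and 8. Reordering the columns of the existing m instead keeps the original row numbers. The sketch below uses nested calls rather than |> so it also runs on R versions before 4.1; the names m_cols and m_new are illustrative.

```r
flink <- c("logit", "probit", "GEVmodNS", "GEVmod")
fcor  <- c("tanimoto", "exponential", "gaussian", "independent")
v1 <- c("logit", "exponential")
v2 <- c("probit", "independent")

# Grid as built in the question: fcor first, flink second
m <- expand.grid(fcor, flink, stringsAsFactors = FALSE)

# Option A: reorder the columns of the existing grid (rows stay in place)
m_cols <- m[, c("Var2", "Var1")]
a1 <- which(apply(v1 == t(m_cols), 2, all))  # row 2
a2 <- which(apply(v2 == t(m_cols), 2, all))  # row 8

# Option B: rebuild the grid with the arguments swapped (rows are reordered)
m_new <- expand.grid(flink, fcor, stringsAsFactors = FALSE)
b1 <- which(apply(v1 == t(m_new), 2, all))   # row 5
b2 <- which(apply(v2 == t(m_new), 2, all))   # row 14
```

Either option makes the column order agree with v1 and v2; which one is appropriate depends on whether the row numbers need to refer to the original m.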
Here's a function in base R to do this -
match_a_row <- function(data, var1, var2) {
which(data[[1]] == var1 & data[[2]] == var2)
}
match_a_row(m, 'exponential', 'logit')
#[1] 2
match_a_row(m, 'independent', 'probit')
#[1] 8
I am trying to perform a function similar to the Excel formula below:
IF(COUNTIF(RANGE, CRITERIA), "FOUND", "MISSING")
I want to add a new column to my dataframe containing found or missing. I understand that in R I can use %in%, for example:
A$C %in% C$B
to find whether the values in column C of the A dataframe exist in column B of the C dataframe. However, I do not know how to combine those results with a conditional function to write found or missing to a new column in the correct row.
Here is an example of the dataframes:
A <- data.frame("C" = c(3,5,9,21,25), "D" = 1:5)
C <- data.frame("B" = c(3,6,21,22,8) , "F" = 10:14)
A$C %in% C$B
A[A$C %in% C$B,]
Based on the limited information:
library(dplyr)  # provides %>%, mutate() and case_when()
lookup_list <- c(1:3)
x <- c('a','b','c')
y <- c(10, 3, 5)
df <- data.frame(x,y)
x y
1 a 10
2 b 3
3 c 5
df <- df %>%
mutate(status = case_when(
y %in% lookup_list ~ 'FOUND',
!y %in% lookup_list ~ 'MISSING'
))
x y status
1 a 10 MISSING
2 b 3 FOUND
3 c 5 MISSING
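Applied to the A and C data frames from the question, the same idea can be sketched in base R with ifelse. The column name status is an assumption, not something the question specifies:

```r
A <- data.frame("C" = c(3, 5, 9, 21, 25), "D" = 1:5)
C <- data.frame("B" = c(3, 6, 21, 22, 8), "F" = 10:14)

# A$C %in% C$B is TRUE where a value of A$C occurs anywhere in C$B;
# ifelse() turns each TRUE/FALSE into the matching label
A$status <- ifelse(A$C %in% C$B, "FOUND", "MISSING")
```

Only 3 and 21 occur in C$B, so rows 1 and 4 are labelled FOUND and the rest MISSING.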
I have a list of ~8000 vectors, and I would like to know how many duplicates there are of these 8000 vectors, but the order of the elements in each could be different.
for example:
list <- c()
list[[1]] <- c(1,2,3)
list[[2]] <- c(2,1,3)
list[[3]] <- c(3,2,1)
list[[4]] <- c(4,5)
list[[5]] <- c(5,4)
list[[6]] <- c(1,2,3,5)
This should give me a count of 3 for c(1,2,3), 2 for c(4,5), and 1 for c(1,2,3,5).
I'd like the count of each of the duplicates, not just how many are duplicated.
library(tidyverse)
library(gtools)
get_perm <- function(v) {
m <- permutations(n = length(v), r = length(v), v = v, set = F)
m[order(c(m))]
}
all <- map(list, get_perm)
unique <- map(list, get_perm) %>% unique()
res_vec <- c()
element <- c()
for(i in seq_along(unique)) {
element[[i]] <- unique[[i]] %>% unique() %>% paste(collapse = ",")
res_vec[[i]] <- all %in% unique[i] %>% sum()
}
tibble(
elements = unlist(element),
numbers = res_vec
)
Result
# A tibble: 3 x 2
elements numbers
<chr> <int>
1 1,2,3 3
2 4,5 2
3 1,2,3,5 1
elements contains all the individual elements of the vectors for each group and numbers are the numbers of vectors you have in each group.
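A lighter alternative sketch, not the approach above: since order does not matter, each vector can be sorted and collapsed into a string key, and the keys tabulated. This avoids generating permutations, whose number grows factorially with vector length. The names vecs, keys and counts are illustrative.

```r
vecs <- list(c(1, 2, 3), c(2, 1, 3), c(3, 2, 1),
             c(4, 5), c(5, 4), c(1, 2, 3, 5))

# Sorting makes c(2, 1, 3) and c(3, 2, 1) identical to c(1, 2, 3);
# paste() turns each sorted vector into a single comparable string
keys <- vapply(vecs, function(v) paste(sort(v), collapse = ","),
               character(1))

# table() counts how often each key occurs
counts <- table(keys)
```

counts can be turned into a two-column data frame with as.data.frame(counts) if that shape is more convenient.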
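We create a function that takes the list ('lst') and a vector ('val') as arguments, loops through the list with sapply, checks with setequal whether each element 'x' contains exactly the same values as 'val', and sums the resulting logical vector. (Checking only all(val %in% x) would also count c(1,2,3,5) as a match for c(1,2,3).)
f1 <- function(lst, val) sum(sapply(lst, function(x) setequal(val, x)))
f1(list, c(1, 2, 3))
#[1] 3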
f1(list, c(4, 5))
#[1] 2
I have the following df and use case: I'd like to find, and set a flag on, all rows for which another row exists that satisfies a condition, e.g.
df <- data.frame(X=c('a','b','c'), Y=c('a','c','d'))
> df
X Y
1 a a
2 b c
3 c d
I'd like to find those rows whose Y value is the same as the X value in another row. In the example above, row #2 is true because Y = c and row #3 has X = c. Note that row #1 does not satisfy the condition.
Something like:
df$Flag <- find(df, Y == X_in_another_row(df))
1
For each Y, we check if any value in X (other than in the same row) matches.
sapply(1:NROW(df), function(i) df$Y[i] %in% df$X[-i])
#[1] FALSE TRUE FALSE
If indices are necessary, wrap the whole thing in which
which(sapply(1:NROW(df), function(i) df$Y[i] %in% df$X[-i]))
#[1] 2
2 (not tested well)
df <- data.frame(X=c('a','b','c'), Y=c('a','c','d'), stringsAsFactors = FALSE)
temp = outer(df$X, df$Y, "==") #Check equality among values of X and Y
diag(temp) = FALSE #Set diagonal values as FALSE (for same row)
colSums(temp) > 0
#[1] FALSE TRUE FALSE
which(match(df$Y,df$X)!=1:nrow(df))
I think this should work.
df <- data.frame(X= c(1,2,3,4,5,3,2,1), Y = c(1,2,3,4,5,6,7,8))
which(with(df, (X %in% Y) & (X != Y)))
Works on the original data.frame if we set stringsAsFactors = FALSE
df <- data.frame(X=c('a','b','c'), Y=c('a','c','d'), stringsAsFactors = F)
which(with(df, (X %in% Y) & (X != Y)))
Quite convoluted but I'll put it here anyway. This should work even if there are repeated values in X.
For example with the following dataframe df2:
df2 = data.frame(X=c('a','b','c','a','d'), Y=c('a','c','d','e','b'))
X Y
1 a a
2 b c
3 c d
4 a e
5 d b
## Specifying the same factor levels allows us to get a square matrix
df2$X = factor(df2$X,levels=union(df2$X,df2$Y))
df2$Y = factor(df2$Y,levels=union(df2$X,df2$Y))
m = as.matrix(table(df2))
valY = rowSums(m)*colSums(m)-diag(m)
which(df2$Y %in% names(valY)[as.logical(valY)])
[1] 1 2 3 5
Essentially you want to know whether Y is in X but you want the condition to be FALSE when X == Y:
df$Z <- with(df, (Y != X) & (Y %in% X))
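One edge case to keep in mind, sketched on the df2 example with repeated X values from the earlier answer: the shortcut treats a row as FALSE whenever its own X equals its Y, even if another row has the same X. A per-row check that drops the current row does not. The names shortcut and by_row are illustrative.

```r
df2 <- data.frame(X = c("a", "b", "c", "a", "d"),
                  Y = c("a", "c", "d", "e", "b"),
                  stringsAsFactors = FALSE)

# Shortcut: Y occurs somewhere in X and differs from the X in its own row
shortcut <- with(df2, (Y != X) & (Y %in% X))

# Per-row check: Y occurs in the X of some *other* row
by_row <- sapply(seq_len(nrow(df2)), function(i) df2$Y[i] %in% df2$X[-i])

which(shortcut)  # rows 2, 3, 5
which(by_row)    # rows 1, 2, 3, 5 (row 1 matches the X in row 4)
```

When X has no repeated values the two checks agree, so the shortcut is fine for the original three-row example.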
# Assume you want to use the X value at position 4, 'c', to find all the rows where Y is 'c'
df <- data.frame(X = c('a', 'b', 'd', 'c'),
Y = c('a', 'c', 'c', 'd'))
row <- 4 # assume the desired row is position 4
val <- as.character( df[(row),'X'] ) # get the value and convert it to character type (in case it is a factor)
df[df$Y == val,]
# Result
# X Y
# 2 b c
# 3 d c