I have the following dataframe:
d <- data.frame(a=c(1,2,3,4), b=c(20,19,18,17))
row.names(d) <- c("A", "B", "C", "D")
I want another data.frame with the same columns and 2 rows, where each column contains the row names of the 2 smallest elements of that column.
In the example the expected result would be:
# Expected results
exp <- data.frame(a=c("A", "B"), b=c("C","D"))
We loop over the columns with lapply, order the values, use that index to subset the n corresponding row.names of 'd', and wrap with data.frame
n <- 2
data.frame(lapply(d, function(x) sort(head(row.names(d)[order(x)], n))))
-output
# a b
#1 A C
#2 B D
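As a sketch, the same steps can be wrapped in a small helper so that n becomes a parameter. The function name n_smallest_rownames and the second data frame d2 are hypothetical, introduced here only for illustration; d2 has unsorted columns to show that the result does not depend on the row order.

```r
# Hypothetical helper: for each column, return the row names of the n smallest values.
n_smallest_rownames <- function(df, n = 2) {
  rn <- row.names(df)
  # order(x) gives row positions sorted by value; keep the first n names, then sort them
  data.frame(lapply(df, function(x) sort(head(rn[order(x)], n))))
}

d <- data.frame(a = c(1, 2, 3, 4), b = c(20, 19, 18, 17))
row.names(d) <- c("A", "B", "C", "D")
res <- n_smallest_rownames(d, 2)

# Assumed extra example with values that are not already sorted
d2 <- data.frame(a = c(3, 1, 4, 2), b = c(10, 40, 20, 30))
row.names(d2) <- c("A", "B", "C", "D")
res2 <- n_smallest_rownames(d2, 2)
```

Sorting the names at the end only fixes the display order; drop sort() if the names should stay ordered by value instead.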
With R 4.1.0 or later, we can also chain the functions with the native pipe |> (so they are applied in reading order, which is easier to follow), together with \(x), the shorthand for an anonymous function in base R
# // ordered the column values
# // get corresponding row names
lapply(d, \(x) row.names(d)[order(x)] |>
head(n) |> # // get the first n values
sort()) |> # // sort them
data.frame() # // convert the list to data.frame
# a b
#1 A C
#2 B D
Or using dplyr
library(dplyr)
d %>%
summarise(across(everything(),
~ sort(head(row.names(d)[order(.)], n))))
# a b
#1 A C
#2 B D
Using sapply in base R -
rn <- rownames(d)
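# rank(), not order(): rank(x) gives each element's position in the sorted values
sapply(d, function(x) rn[rank(x, ties.method = "first") %in% 1:2])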
# a b
#[1,] "A" "C"
#[2,] "B" "D"
There are three character columns in my data frame, containing "A", "B", and "C" (the order can vary between data frames). I want to assign values to them: A = 1+0i, B = 2+3i and C = 3+2i. I use as.complex(factor(col1)), and the same for columns two and three, but it makes all three columns equal to 1+0i!
col1 <- c("A","A", "A")
col2 <- c("B", "B","B")
col3 <- c("C","C","C")
df <- data.frame(col1,col2,col3)
print(df)
A= 1+0i
B=2+3i
C=3+2i
df2<- transform(df, col1=as.complex(as.factor(col1)),col2=as.complex(as.factor(col2)),col3=as.complex(as.factor(col3)))
sapply(df2,class)
View(df2)
So this is a weird thing you're doing. You have a column of strings, letters like "A" and "B". Then you have objects with the same names, A = 1 + 0i, etc. Normally we don't treat object names as "data", but you're sort of mixing the two here. The solution I'd propose is to make everything data: combine your A, B, and C values into a vector, and give the vector names accordingly. Then we can replace the values in the data frame with the corresponding values from our named vector:
vec = c(A, B, C)
names(vec) = c("A", "B", "C")
df[] = lapply(df, \(x) vec[x])
df
# col1 col2 col3
# 1 1+0i 2+3i 3+2i
# 2 1+0i 2+3i 3+2i
# 3 1+0i 2+3i 3+2i
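A slightly more defensive variant of the same lookup, as a sketch. It builds the named vector in one call and wraps each column in as.character(). That wrapper is an assumption added here for the case where the columns are factors (the default for data.frame() before R 4.0), because indexing by a factor uses its integer codes rather than its labels. unname() only drops the lookup names from the result.

```r
# Named lookup vector: the names are the letters found in the data
vec <- c(A = 1+0i, B = 2+3i, C = 3+2i)

df <- data.frame(col1 = rep("A", 3), col2 = rep("B", 3), col3 = rep("C", 3))

# Replace every column by the looked-up complex values;
# as.character() makes the lookup go by label even if a column is a factor
df[] <- lapply(df, function(x) unname(vec[as.character(x)]))

sapply(df, class)
```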
Hello I have a list :
list=c("OK_67J","GGT_je","Ojj_OK_778","JUu3","JJE")
and I would like to transform it into a data frame:
COL1 COL2
OK_67J A
GGT_je B
Ojj_OK_778 A
JUu3 B
JJE B
where COL2 is A if the value contains the pattern OK_ and B if not.
I tried :
COL2<-rep("Virus",length(list))
list[grep("OK_",tips)]<-"A"
df <- data.frame(COL1=list,COL2=COL2)
Use grepl :
ifelse(grepl('OK_', list), "A", "B")
#[1] "A" "B" "A" "B" "B"
You can also do it without ifelse :
c("B", "A")[grepl('OK_', list) + 1]
It is better not to use list as a variable name, since list() is a base R function.
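Why the indexing trick works, as a short sketch: grepl() returns a logical vector, adding 1 coerces FALSE/TRUE to 1/2, and those numbers pick "B" or "A" from the two-element vector. The name x is used here instead of list (an assumption for illustration), and fixed = TRUE is an optional addition that treats the pattern as a literal string rather than a regular expression.

```r
x <- c("OK_67J", "GGT_je", "Ojj_OK_778", "JUu3", "JJE")

hit <- grepl("OK_", x, fixed = TRUE)  # TRUE where the literal "OK_" occurs
idx <- hit + 1                        # FALSE -> 1, TRUE -> 2
COL2 <- c("B", "A")[idx]              # position 1 is "B", position 2 is "A"

df <- data.frame(COL1 = x, COL2 = COL2)
df
```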
If you replace list[grep("OK_",tips)]<-"A" with COL2[grep("OK_",list)] <- "A", and initialise COL2 with "B" instead of "Virus", your solution will work.
list <- c("OK_67J", "GGT_je", "Ojj_OK_778", "JUu3", "JJE")
COL2 <- rep("B", length(list))
COL2[grep("OK_", list)] <- "A"
df <- data.frame(COL1 = list, COL2 = COL2)
df
# COL1 COL2
#1 OK_67J A
#2 GGT_je B
#3 Ojj_OK_778 A
#4 JUu3 B
#5 JJE B
First off, list is not a list but a character vector:
list=c("OK_67J","GGT_je","Ojj_OK_778","JUu3","JJE")
class(list)
[1] "character"
To transform it to a dataframe:
df <- data.frame(v1 = list)
To add the new column use grepl:
df$v2 <- ifelse(grepl("OK_", df$v1), "A", "B")
or use str_detect:
library(stringr)
df$v2 <- ifelse(str_detect(df$v1, "OK_"), "A", "B")
Result:
df
v1 v2
1 OK_67J A
2 GGT_je B
3 Ojj_OK_778 A
4 JUu3 B
5 JJE B
I have a vector with words, e.g., like this:
w <- LETTERS[1:5]
and a dataframe with tokens of these words but also tokens of other words in different columns, e.g., like this:
set.seed(21)
df <- data.frame(
w1 = c(sample(LETTERS, 10)),
w2 = c(sample(LETTERS, 10)),
w3 = c(sample(LETTERS, 10)),
w4 = c(sample(LETTERS, 10))
)
df
w1 w2 w3 w4
1 U R A Y
2 G X P M
3 Q B S R
4 E O V T
5 V D G W
6 T A Q E
7 C K L U
8 D F O Z
9 R I M G
10 O T T I
# convert factor to character:
df[] <- lapply(df[], as.character)
I'd like to extract from df all the tokens of those words that are contained in the vector w. I can do it like this, but it doesn't look nice, is highly repetitive, and is error-prone if the dataframe is larger:
extract <- c(df$w1[df$w1 %in% w],
df$w2[df$w2 %in% w],
df$w3[df$w3 %in% w],
df$w4[df$w4 %in% w])
I tried this, using paste0 to avoid addressing each column separately but that doesn't work:
extract <- df[paste0("w", 1:4)][df[paste0("w", 1:4)] %in% w]
extract
data frame with 0 columns and 10 rows
What's wrong with this code? Or which other code would work?
To answer your question, "What's wrong with this code?": The code df[paste0("w", 1:4)][df[paste0("w", 1:4)] %in% w] is the equivalent of df[df %in% w] because df[paste0("w", 1:4)], which you use twice, simply returns the entirety of df. That means df %in% w will return FALSE FALSE FALSE FALSE because none of the variables in df are in w (w contains strings but not vectors of strings), and df[c(F, F, F, F)] returns an empty data frame.
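The difference can be seen on a small example. This is a sketch with a hypothetical two-column data frame toy: applying %in% to the data frame compares each whole column, as a single item, against w and so yields one value per column, whereas applying it to a single column compares element by element.

```r
w <- c("A", "B")
toy <- data.frame(w1 = c("A", "X", "B"),
                  w2 = c("Y", "A", "Z"),
                  stringsAsFactors = FALSE)

# Data frame on the left: one result per column, each column treated as one item
col_level <- toy %in% w

# Single column on the left: one result per element
elem_level <- toy$w1 %in% w

col_level
elem_level
```

Subsetting with col_level therefore selects columns, not cells, which is why the original attempt returns a data frame with 0 columns.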
If you're dealing with a single data type (strings), and the output can be a character vector, then use a matrix instead of a data frame, which is faster and is, in this case, a little easier to subset:
mat <- as.matrix(df)
mat[mat %in% w]
#[1] "B" "D" "E" "E" "A" "B" "E" "B"
This produces the same output as your attempt above with extract <- ….
If you want to keep some semblance of the original data frame structure then you can try the following, which outputs a list (necessary as the returned vectors for each variable might have different lengths):
lapply(df, function(x) x[x %in% w])
#### OUTPUT ####
$w1
[1] "B" "D" "E"
$w2
[1] "E" "A"
$w3
[1] "B"
$w4
[1] "E" "B"
Just call unlist on the returned list if you want a vector (add use.names = FALSE to drop the generated names).
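As a sketch on a hypothetical toy data frame (not the data from the question), flattening the per-column list gives the same character vector as the matrix approach, because both go through the data column by column:

```r
w <- c("A", "B")
toy <- data.frame(w1 = c("A", "X", "B"),
                  w2 = c("Y", "A", "Z"),
                  stringsAsFactors = FALSE)

# Per-column matches, kept as a list
per_col <- lapply(toy, function(x) x[x %in% w])

# Flattened to a plain character vector without names
flat <- unlist(per_col, use.names = FALSE)

# Matrix approach for comparison
mat <- as.matrix(toy)
from_mat <- mat[mat %in% w]
```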
Let's have a list of data frames:
df1 <- data.frame(V1=c("a", "b", "c"),V2=c("d", "e","f"), V3=c("g","h","i"),V4=c("j","k","l"))
df2 <- data.frame(V1=c("m","n"), V2=c("o","p"), V3=c("q","r"))
l <-list(df1, df2)
> l
[[1]]
V1 V2 V3 V4
1 a d g j
2 b e h k
3 c f i l
[[2]]
V1 V2 V3
1 m o q
2 n p r
Moreover, we have a vector:
ele <- c("a","b","e","g","i","m","p","s","t")
I want to obtain a new data frame constructed by matching elements from the vector ele against the list l. The data frame should use the matched elements of the vector as column names, and each value should be the element immediately to the right of the match in the list.
For instance:
df3 <-data.frame(a="d",b='e',e="h",g="j",i="l",m="o",p="r")
> df3
a b e g i m p
1 d e h j l o r
As you may notice, there is no specific matching pattern.
There are probably better solutions somewhere, but this is a possibility:
library(tidyverse)
library(magrittr)
l %<>%
map(~ t(.x) %>%
as_tibble() %>%
flatten_chr())
ele %>%
map(~ map(l, equals, .x)) %>%
map_chr(~ {
lgl <- map_lgl(.x, any)
if (!any(lgl)) {
NA
} else {
lgl_idx <- min(which(lgl))
lgl <- l[[lgl_idx]]
lgl[min(which(.x[[lgl_idx]])) + 1]
}
}) %>%
set_names(ele) %>%
na.omit()
Needs some more exception handling (such as when the vector equals an element in the last column) but it works on the example you've given.
a b e g i m p
"d" "e" "h" "j" "l" "o" "r"
You can find the position of the element that matches an argument using which with arr.ind = TRUE, and then add a vector to it (in this case c(0, 1), which keeps the row and moves one column to the right).
ele_list = as.list(ele)
names(ele_list) = ele
unlist(lapply(ele_list, function(e) df1[which(df1 == e, arr.ind = TRUE) + c(0, 1)]))
a b e g i
"d" "e" "h" "j" "l"
I only did it for df1, you could run the third line for both, then combine the vectors and convert to dataframe.
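One way to run the lookup over every data frame in the list, as a base R sketch. The helper names next_right and lookup_all are hypothetical, and two behaviours are assumptions made here: the first match wins, and an element found in the last column (with nothing to its right) is treated as not found.

```r
# Hypothetical helper: value one column to the right of the first match of e, or NA
next_right <- function(df, e) {
  m <- as.matrix(df)
  hit <- which(m == e, arr.ind = TRUE)   # one row per match: (row, col)
  if (nrow(hit) == 0) return(NA_character_)
  r <- as.integer(hit[1, 1])
  k <- as.integer(hit[1, 2]) + 1L
  if (k > ncol(m)) return(NA_character_) # match is in the last column
  unname(m[r, k])
}

# Hypothetical helper: try each data frame in turn, keep only elements that were found
lookup_all <- function(dfs, ele) {
  out <- vapply(ele, function(e) {
    for (df in dfs) {
      v <- next_right(df, e)
      if (!is.na(v)) return(v)
    }
    NA_character_
  }, character(1))
  out[!is.na(out)]
}

df1 <- data.frame(V1 = c("a", "b", "c"), V2 = c("d", "e", "f"),
                  V3 = c("g", "h", "i"), V4 = c("j", "k", "l"))
df2 <- data.frame(V1 = c("m", "n"), V2 = c("o", "p"), V3 = c("q", "r"))
ele <- c("a", "b", "e", "g", "i", "m", "p", "s", "t")

res <- lookup_all(list(df1, df2), ele)
df3 <- as.data.frame(as.list(res))
```

The named vector res can be used directly; as.data.frame(as.list(res)) turns it into the one-row data frame asked for in the question.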
Is there a function that takes one dataset, one col, one operator, but several values to evaluate a condition?
v1 <- c(1:3)
v2 <- c("a", "b", "c")
df <- data.frame(v1, v2)
Options to subset (programmatically)
result <- df[df$v2 == "a" | df$v2 == "b", ]
result
v1 v2
1 1 a
2 2 b
Or, for more robustness
result1 <- df[ df[[2]] == "a" | df[[2]] == "b", ]
result1
v1 v2
1 1 a
2 2 b
Alternatively, for easier syntax:
library(dplyr)
result2 <- filter(df, v2 == "a" | v2 == "b")
result2
v1 v2
1 1 a
2 2 b
(Am I right to assume that I can safely use dplyr's filter() inside a function?)
I did not include subset() above as it is known to be for interactive use only.
In all the cases above, one has to repeat the condition (v2 == "a" | v2 == "b").
I'm looking for a function to which I could pass a vector to the argument, like c("a", "b") because I would like to pass a large number of values, and automate the process.
Such function could perhaps be something like:
fun(df, col = v2, operator = "|", value = c("a", "b"))
Thank you
We can use %in% if the number of elements to check is more than 1.
df[df$v2 %in% c('a', 'b'),]
# v1 v2
#1 1 a
#2 2 b
Or if we use subset, the df$ can be removed
subset(df, v2 %in% c('a', 'b'))
Or the dplyr::filter
filter(df, v2 %in% c('a', 'b'))
This can be wrapped in a function
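f1 <- function(dat, col, val){
  # {{ col }} passes the unquoted column name through to filter();
  # a bare col would only work if an object called v2 existed outside the data
  filter(dat, {{ col }} %in% val)
}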
f1(df, v2, c('a', 'b'))
# v1 v2
#1 1 a
#2 2 b
If we need to use ==, we could loop the vector to compare in a list and use Reduce with |
df[Reduce(`|`, lapply(letters[1:2], `==`, df$v2)),]
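The wrapper idea can also be written without dplyr, as a sketch. The function name filter_in is hypothetical, and unlike f1 it takes the column name as a string, which is often simpler when the column is chosen programmatically.

```r
# Hypothetical base R wrapper: keep rows whose value in column `col` is one of `values`
filter_in <- function(dat, col, values) {
  # dat[[col]] extracts the column by name; drop = FALSE keeps a data frame
  dat[dat[[col]] %in% values, , drop = FALSE]
}

df <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"))
res <- filter_in(df, "v2", c("a", "b"))
res
```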