DF
ID B C D
1 A 1 1 3
2 B 2 3 1
3 C 1 1 1
4 D 3 1 1
5 E 1 0 0
Given a dataframe such as the one above, how can I quickly calculate the means for each row in one column and store them in another column of the dataframe? For example, the averages for column B would be: 0.5, 1, 0.5, 1.5, 0.5.
And is it possible to have a function that does it automatically for several columns at once?
One option is to find the row whose 'ID' matches the column name, take that row's value in the column, and divide the whole column by it
f1 <- function(dat, colNm) transform(dat,
newCol = dat[[colNm]]/dat[match(colNm, ID), colNm])
f1(DF, 'B')
# ID B C D newCol
#1 A 1 1 3 0.5
#2 B 2 3 1 1.0
#3 C 1 1 1 0.5
#4 D 3 1 1 1.5
#5 E 1 0 0 0.5
If it is to divide by a constant value, then just do
DF[-1] <- DF[-1]/2
data
DF <- structure(list(ID = c("A", "B", "C", "D", "E"), B = c(1L, 2L,
1L, 3L, 1L), C = c(1L, 3L, 1L, 1L, 0L), D = c(3L, 1L, 1L, 1L,
0L)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5"))
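For the second part of the question (several columns at once), the same lookup can be wrapped in a loop over column names. This is only a sketch: the helper name scale_by_id_row and the "_scaled" suffix are made up for illustration and are not part of the answer above.

```r
# Same sample data as in the answer above
DF <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 B = c(1L, 2L, 1L, 3L, 1L),
                 C = c(1L, 3L, 1L, 1L, 0L),
                 D = c(3L, 1L, 1L, 1L, 0L))

# Hypothetical helper: for each column name in `cols`, divide the column by
# the value found in the row whose ID equals that column name
scale_by_id_row <- function(dat, cols, suffix = "_scaled") {
  scaled <- lapply(cols, function(nm) dat[[nm]] / dat[match(nm, dat$ID), nm])
  dat[paste0(cols, suffix)] <- scaled  # one new column per input column
  dat
}

res <- scale_by_id_row(DF, c("B", "C", "D"))
res
```

If a column name has no matching entry in ID, match() returns NA and the new column is all NA, so it may be worth checking for that before relying on the result.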
Related
I have a dataframe like this:
ID S1 C
1 1 2 3
2 1 2 3
3 3 1 1
4 6 2 5
5 6 7 5
What I need is the number of rows per group ID where S1 <= C. This is the desired output.
ID Obs
1 1 2
2 3 1
3 6 1
Even though the question was answered below, I have a follow-up question: is it possible to do the same for multiple columns (S1, S2, ...)? For example, for the dataframe below:
ID S1 S2 C
1 1 2 2 3
2 1 2 2 3
3 3 1 1 1
4 6 2 2 5
5 6 7 7 5
And then get:
ID S1.Obs S2.Obs
1 1 2 2
2 3 1 1
3 6 1 1
A base R solution with aggregate().
aggregate(Obs ~ ID, transform(df, Obs = S1 <= C), sum)
# ID Obs
# 1 1 2
# 2 3 1
# 3 6 1
A dplyr solution
library(dplyr)
df %>%
filter(S1 <= C) %>%
count(ID, name = "Obs")
# ID Obs
# 1 1 2
# 2 3 1
# 3 6 1
Data
df <- structure(list(ID = c(1L, 1L, 3L, 6L, 6L), S1 = c(2L, 2L, 1L, 2L, 7L),
C = c(3L, 3L, 1L, 5L, 5L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
Extension
If you want to apply this rule on multiple columns such as S1, S2, S3:
df %>%
group_by(ID) %>%
summarise(across(starts_with("S"), ~ sum(.x <= C)))
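The same multi-column count can be sketched in base R, here with the S1/S2 data from the follow-up question. The names df2, flags and res are placeholders chosen for this example.

```r
# Data from the follow-up question (S1 and S2 compared against C)
df2 <- data.frame(ID = c(1, 1, 3, 6, 6),
                  S1 = c(2, 2, 1, 2, 7),
                  S2 = c(2, 2, 1, 2, 7),
                  C  = c(3, 3, 1, 5, 5))

# One logical column per S column: TRUE where the value is <= C
flags <- as.data.frame(lapply(df2[c("S1", "S2")], function(s) s <= df2$C))

# Summing a logical vector counts the TRUEs, so this gives counts per ID
res <- aggregate(flags, by = list(ID = df2$ID), FUN = sum)
names(res)[-1] <- paste0(names(res)[-1], ".Obs")
res
```

One design difference to be aware of: summing a condition keeps an ID whose count is 0, whereas filtering first and then counting drops such an ID from the result.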
data <- data.frame(
ID = c(1, 1, 3, 6, 6),
S1 = c(2, 2, 1, 2, 7),
C = c(3, 3, 1, 5, 5)
)
library(dplyr)
data.filtered <- data[data$S1 <= data$C,]
data.filtered %>% group_by(ID) %>%
summarize(Obs = length(ID))
An option with data.table
library(data.table)
setDT(df)[S1 <=C, .(Obs = .N), ID]
# ID Obs
#1: 1 2
#2: 3 1
#3: 6 1
data
df <- structure(list(ID = c(1L, 1L, 3L, 6L, 6L), S1 = c(2L, 2L, 1L, 2L, 7L),
C = c(3L, 3L, 1L, 5L, 5L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
I have a dataframe as follows
group x y
a 1 2
a 3 1
b 1 3
c 1 1
c 2 3
I want to be able to generate all combinations of the x and y columns within a group, like so
group xy
a 1-2
a 1-1
a 3-2
a 3-1
b 1-3
c 1-1
c 1-3
c 2-1
c 2-3
I've tried using the following code, but it seems as though the group_by function is not working as expected
library(dplyr)
library(tidyr)
combn <- df %>%
group_by(group) %>%
expand(x, y)
My current results are instead giving me every combination of all three columns
head(combn)
group x y
a 1 1
a 1 2
a 1 3
a 2 1
a 2 2
a 2 3
Dput:
structure(list(group = structure(c(1L, 1L, 2L, 3L, 3L), .Label = c("a",
"b", "c"), class = "factor"), x = structure(c(1L, 3L, 1L, 1L,
2L), .Label = c("1", "2", "3"), class = "factor"), y = structure(c(2L,
1L, 3L, 1L, 3L), .Label = c("1", "2", "3"), class = "factor")), class = "data.frame", row.names = c(NA,
-5L))
You could use crossing from tidyr to create the combinations within each group and then unnest to put them in separate rows. Naming the arguments (a = x, b = y) gives the columns the names a and b that are used below.
library(dplyr)
df1 <- df %>%
group_by(group) %>%
summarise(xy = list(tidyr::crossing(a = x, b = y))) %>%
tidyr::unnest(xy)
df1
# group a b
# <fct> <int> <int>
#1 a 1 2
#2 a 3 2
#3 a 1 1
#4 a 3 1
#5 b 1 3
#6 c 1 1
#7 c 2 1
#8 c 1 3
#9 c 2 3
If you want to combine the two columns, you could use unite :
tidyr::unite(df1, xy, a, b, sep = "-")
# group xy
# <fct> <chr>
#1 a 1-2
#2 a 3-2
#3 a 1-1
#4 a 3-1
#5 b 1-3
#6 c 1-1
#7 c 2-1
#8 c 1-3
#9 c 2-3
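For comparison, a base R sketch of the same idea: split the data by group, build the grid of x and y values inside each piece with expand.grid, and paste the pairs together. The data are re-created here with plain numeric columns instead of factors, which is an assumption made to keep the example short.

```r
# Example data from the question, with numeric x and y (assumption)
df <- data.frame(group = c("a", "a", "b", "c", "c"),
                 x = c(1, 3, 1, 1, 2),
                 y = c(2, 1, 3, 1, 3))

# Build every x-y pair separately within each group
pieces <- lapply(split(df, df$group), function(g) {
  grid <- expand.grid(x = unique(g$x), y = unique(g$y))
  data.frame(group = g$group[1], xy = paste(grid$x, grid$y, sep = "-"))
})

res <- do.call(rbind, pieces)
rownames(res) <- NULL  # drop the "a.1", "a.2", ... row names from rbind
res
```

Because the grid is built per group, values that only occur in other groups never enter a group's combinations.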
I would like to summarize my data by counting the entities and creating a count column for each entity.
Let's say:
df:
id class
1 A
1 B
1 A
1 A
1 B
1 c
2 A
2 B
2 B
2 D
I want to create a table like
id A B C D
1 3 2 1 0
2 1 2 0 1
How can I do this in R using the apply function?
df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
class = structure(c(1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L, 2L, 4L
), .Label = c("A", "B", "C", "D"), class = "factor")), .Names = c("id",
"class"), class = "data.frame", row.names = c(NA, -10L))
with(df, table(id, class))
# class
#id A B C D
# 1 3 2 1 0
# 2 1 2 0 1
xtabs(~ id + class, df)
# class
#id A B C D
# 1 3 2 1 0
# 2 1 2 0 1
tapply(rep(1, nrow(df)), df, length, default = 0)
# class
#id A B C D
# 1 3 2 1 0
# 2 1 2 0 1
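The results above are table or matrix objects. If a plain data frame with an id column is needed, as in the layout the question asks for, one possible conversion is sketched below. The names tab and wide are placeholders, and the data are rebuilt here with the factor levels A to D.

```r
# Example data with class as a factor with levels A, B, C, D
df <- data.frame(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 class = factor(c("A", "B", "A", "A", "B", "C",
                                  "A", "B", "B", "D")))

tab <- table(df$id, df$class)       # contingency table of counts
wide <- as.data.frame.matrix(tab)   # one column per class, one row per id
wide <- cbind(id = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

Calling as.data.frame() directly on a table would instead give the long format (one row per id/class pair), which is why as.data.frame.matrix() is used here.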
This seems like a very strange requirement, but if you insist on using apply: the function count counts the number of rows for which id equals x and class equals y. It is applied to every combination of id and class, using nested apply calls, to build the matrix a. Finally we add the row and column names.
uid <- unique(DF$id)
uclass <- unique(DF$class)
count <- function(x, y, DF) sum(x == DF$id & y == DF$class)
a <- apply(matrix(uclass), 1, function(u) apply(matrix(uid), 1, count, u, DF))
dimnames(a) <- list(uid, uclass)
giving:
> a
A B c D
1 3 2 1 0
2 1 2 0 1
Note
We used this for DF
Lines <- "id class
1 A
1 B
1 A
1 A
1 B
1 c
2 A
2 B
2 B
2 D"
DF <- read.table(text = Lines, header = TRUE)
Hello, I have a data frame and I need to remove every row that contains the maximum value of any column.
Example
A B C
1 2 3 5
2 4 1 1
3 1 4 3
4 2 1 1
So the output is:
A B C
4 2 1 1
Is there any quick way to do this?
We can do this with %in%
df1[!seq_len(nrow(df1)) %in% sapply(df1, which.max),]
# A B C
#4 2 1 1
If there are ties for the maximum value within a column, then do
df1[!Reduce(`|`, lapply(df1, function(x) x== max(x))),]
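To see why ties matter, here is a small sketch with a modified version of the example data in which column A reaches its maximum twice (this tied data set is made up for illustration). which.max returns only the first position of the maximum, so the second tied row survives the first approach.

```r
# Modified example: A has its maximum (4) in rows 1 and 2
df1 <- data.frame(A = c(4, 4, 1, 2),
                  B = c(3, 1, 4, 1),
                  C = c(5, 1, 3, 1))

# which.max() approach: removes only the first maximum of each column
first_only <- df1[!seq_len(nrow(df1)) %in% sapply(df1, which.max), ]

# Comparison with max(): removes every row holding a column maximum
all_ties <- df1[!Reduce(`|`, lapply(df1, function(x) x == max(x))), ]

first_only  # rows 2 and 4 remain
all_ties    # only row 4 remains
```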
df[-sapply(df, which.max),]
# A B C
#4 2 1 1
DATA
df = structure(list(A = c(2L, 4L, 1L, 2L), B = c(3L, 1L, 4L, 1L),
C = c(5L, 1L, 3L, 1L)), .Names = c("A", "B", "C"),
class = "data.frame", row.names = c(NA,-4L))
I have a dataframe containing gene expression data.
I'm trying to extract all rows where ANY of the columns has a value >= 2 (the data is already in log2 values), but I can't seem to get there. My data is:
A B C D
Gene1 1 2 3 1
Gene2 2 1 1 4
Gene3 1 1 0 1
Gene4 1 2 0 1
I would like to retain only Gene1, Gene2 and Gene4, without naming every column (as this is just a toy example).
You could use rowSums on a logical matrix derived from df >=2 and double negate (!) to get the index of rows to subset.
df[!!rowSums(df >=2),]
# A B C D
#Gene1 1 2 3 1
#Gene2 2 1 1 4
#Gene4 1 2 0 1
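The intermediate steps may be easier to follow when written out. In this sketch the double negation is replaced by an explicit comparison with 0, which selects the same rows; the names hits and keep are placeholders.

```r
# Toy data from the question
df <- data.frame(A = c(1, 2, 1, 1),
                 B = c(2, 1, 1, 2),
                 C = c(3, 1, 0, 0),
                 D = c(1, 4, 1, 1),
                 row.names = c("Gene1", "Gene2", "Gene3", "Gene4"))

hits <- rowSums(df >= 2)  # number of columns with a value >= 2, per gene
keep <- hits > 0          # TRUE where at least one column qualifies
df[keep, ]
```

Here !!hits and hits > 0 give the same logical vector, because the counts are never negative and !! maps every non-zero count to TRUE.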
Or use the reverse condition df < 2 to get the logical matrix, apply rowSums, and then check whether the result is less than ncol(df)
df[rowSums(df <2) < ncol(df),]
# A B C D
#Gene1 1 2 3 1
#Gene2 2 1 1 4
#Gene4 1 2 0 1
Or
df[apply(t(df>=2),2, any), ]
data
df <- structure(list(A = c(1L, 2L, 1L, 1L), B = c(2L, 1L, 1L, 2L),
C = c(3L, 1L, 0L, 0L), D = c(1L, 4L, 1L, 1L)), .Names = c("A",
"B", "C", "D"), class = "data.frame", row.names = c("Gene1",
"Gene2", "Gene3", "Gene4"))