If in a data.frame - r

I would like to choose between the values of two columns in the same row, depending on the values in other columns.
My rule would be: if the values in shapiro1, shapiro2 and F_test are less than 0.05, choose the value in t_test; otherwise choose the wilcox value. Is it possible to write a function like this and apply it to a larger number of columns?
df <- structure(list(modalities = structure(1:3, .Label = c("BS1",
"HW1", "PG"), class = "factor"), shapiro1 = c(0.0130672654432492,
0.305460485386201, 0.148320635833262), shapiro2 = c(0.920315823302857,
0.1354174735521, 0.148320635833262), F_test = c(0.20353475323665,
0.00172897172228584, 1), t_test = c(2.88264982135322e-06, 5.75374264225996e-05,
NaN), wilcox = c(0.00909069801592506, 0.00902991076269246, NaN
)), class = "data.frame", row.names = c(NA, -3L))

You could select the three test columns, compare them with 0.05, and use rowSums to check whether any value in each row is below 0.05; ifelse then picks the t_test or the wilcox value accordingly.
cols <- c("shapiro1", "shapiro2", "F_test")
ifelse(rowSums(df[cols] < 0.05) > 0, df$t_test, df$wilcox)
#[1] 2.882650e-06 5.753743e-05 NaN
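To reuse the rule on other column sets, the ifelse/rowSums idea can be wrapped in a small function. This is a sketch: the name choose_value and its arguments are hypothetical, and the rule argument is an assumption added because the question does not say whether any or all of the p-values must be below the threshold (the answer above implements "any").

```r
# Hypothetical helper wrapping the ifelse/rowSums idea from the answer above.
# rule = "any": take yes_col when at least one column in `cols` is below alpha.
# rule = "all": take yes_col only when every column in `cols` is below alpha.
choose_value <- function(df, cols, yes_col, no_col,
                         alpha = 0.05, rule = c("any", "all")) {
  rule <- match.arg(rule)
  below <- as.matrix(df[cols]) < alpha     # logical matrix, one row per row of df
  n_below <- rowSums(below)                # how many tests fall below alpha
  hit <- if (rule == "any") n_below > 0 else n_below == length(cols)
  ifelse(hit, df[[yes_col]], df[[no_col]])
}

# Data from the question
df <- data.frame(
  modalities = factor(c("BS1", "HW1", "PG")),
  shapiro1 = c(0.0130672654432492, 0.305460485386201, 0.148320635833262),
  shapiro2 = c(0.920315823302857, 0.1354174735521, 0.148320635833262),
  F_test   = c(0.20353475323665, 0.00172897172228584, 1),
  t_test   = c(2.88264982135322e-06, 5.75374264225996e-05, NaN),
  wilcox   = c(0.00909069801592506, 0.00902991076269246, NaN)
)

cols <- c("shapiro1", "shapiro2", "F_test")
df$chosen_any <- choose_value(df, cols, "t_test", "wilcox", rule = "any")
df$chosen_all <- choose_value(df, cols, "t_test", "wilcox", rule = "all")
```

With rule = "any" this reproduces the output above; with rule = "all" the first two rows fall back to wilcox, because each of them has at least one test above 0.05.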

Related

Pull out column names of cells which match logical criteria

I've got a table such as this:
df1 <- structure(list(Suggested.Symbol = c("CCT4", "DHRS2", "PMS2",
"FARSB", "RPL31", "ASNS"), p_onset = c(0.9378, 0.5983, 7.674e-10,
0.09781, 0.5495, 0.7841), p_dc14 = c(0.3975, 0.3707, 6.117e-17,
0.2975, 0.4443, 0.7661), p_tfc6 = c(0.2078, 0.896, 7.388e-19,
0.5896, 0.3043, 0.6696), p_tms30 = c(0.5724, 0.3409, 4.594e-13,
0.2403, 0.1357, 0.3422)), row.names = c(NA, 6L), class = "data.frame")
I'd like to create a new column called 'summary'. In it, row by row, I'd like to return the names of the columns whose values are < 0.05, comma-separated. Is that possible?
We can loop over the rows with apply, create a logical vector marking the values below 0.05, use it to subset the names, and collapse them into a single comma-separated string with toString:
df1$summary <- apply(df1[-1], 1, \(x) toString(names(x)[x < 0.05]))
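A minimal variant, written with function(x) instead of the \(x) shorthand (which requires R 4.1.0 or later), and using which() so that an NA comparison is skipped rather than turned into an "NA" entry. The helper name sig_columns and its alpha argument are illustrative, not part of the original answer.

```r
# Hypothetical helper: for each row, the comma-separated names of the
# columns whose value is below `alpha`.
sig_columns <- function(df, alpha = 0.05) {
  below <- as.matrix(df) < alpha            # logical matrix that keeps the column names
  # which() drops NA comparisons, so missing values are simply not listed
  apply(below, 1, function(b) toString(colnames(below)[which(b)]))
}

# Data from the question
df1 <- data.frame(
  Suggested.Symbol = c("CCT4", "DHRS2", "PMS2", "FARSB", "RPL31", "ASNS"),
  p_onset = c(0.9378, 0.5983, 7.674e-10, 0.09781, 0.5495, 0.7841),
  p_dc14  = c(0.3975, 0.3707, 6.117e-17, 0.2975, 0.4443, 0.7661),
  p_tfc6  = c(0.2078, 0.896, 7.388e-19, 0.5896, 0.3043, 0.6696),
  p_tms30 = c(0.5724, 0.3409, 4.594e-13, 0.2403, 0.1357, 0.3422)
)

df1$summary <- sig_columns(df1[-1])   # drop the symbol column before comparing
df1[c("Suggested.Symbol", "summary")]
```

Only the PMS2 row is non-empty for this data; rows with no value below 0.05 get an empty string.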

Create a new column by aggregating multiple columns in R

Background
I have a dataset, df, in which I would like to combine several columns into a new one. I need to multiply the Type, Span and Population columns to create a new Output column.
ID Status Type Span State Population
A Yes 2 70% Ga 10000
Desired output
ID Status Type Span State Population Output
A Yes 2 70% Ga 10000 14000
dput
df <- structure(list(ID = structure(1L, .Label = "A ", class = "factor"),
Status = structure(1L, .Label = "Yes", class = "factor"),
Type = 2L, Span = structure(1L, .Label = "70%", class = "factor"),
State = structure(1L, .Label = "Ga", class = "factor"), Population = 10000L), class = "data.frame",
row.names = c(NA,
-1L))
This is what I have tried:
df %>%
mutate(Output = Type * Span * Population)
Here we create a new column from several existing ones. The attempt fails because Span is a factor whose labels contain '%', and arithmetic on a factor gives NA with a warning. So we extract the numeric part with readr::parse_number, divide by 100 to turn the percentage into a fraction, and multiply by Population and Type inside mutate:
library(dplyr)
df %>%
mutate(Output = Type * Population * readr::parse_number(as.character(Span))/100)
# ID Status Type Span State Population Output
#1 A Yes 2 70% Ga 10000 14000
If the Type and Population columns are not numeric either (e.g. if they are factors), convert them first with as.numeric(as.character(...)), for example as.numeric(as.character(df$Type)), and likewise for Population. Another option is type.convert(df, as.is = TRUE), which tries to convert each column to a suitable type, and then to work on the converted dataset.
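The as.character step matters: calling as.numeric directly on a factor returns the internal level codes, not the numbers printed in the column. The small vectors below are made up purely for illustration.

```r
# Illustrative factor whose labels look like numbers
pop <- factor(c("10000", "500"))

codes  <- as.numeric(pop)                 # level codes; levels sort as text, "10000" before "500"
values <- as.numeric(as.character(pop))   # the numbers actually shown in the column

# Same idea for a Span-like column: strip "%" from the labels, then convert
span <- factor(c("70%", "25%"))
span_fraction <- as.numeric(sub("%", "", as.character(span), fixed = TRUE)) / 100
```

Here codes is c(1, 2) while values is c(10000, 500), which is why the conversion goes through as.character first.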
Alternatively, in base R, we can remove the '%' sign with sub, convert the result to numeric and multiply:
df$output <- with(df, Type * as.numeric(sub('%', '', Span)) * Population/100)
df
# ID Status Type Span State Population output
#1 A Yes 2 70% Ga 10000 14000

Conditional filter of rows based on the values of multiple variables in previous row

I am trying to subset a data frame to retain only the rows for which the values of two variables both differ from those of the previously retained row.
Starting with:
df<-structure(list(x = c("ARM018", "ARM018", "ARM018", "ARM021",
"ARM021"), y = c("ARF014", "ARF027", "ARF028",
"ARF014", "ARF020")), class = "data.frame", row.names = c(NA,
-5L))
df
I would like to obtain:
df_wanted <-structure(list(x = c("ARM018", "ARM021"), y = c("ARF014",
"ARF020")), class = "data.frame", row.names = c(NA, -2L))
df_wanted
because the values of both x and y differ across the two rows.
I had assumed that the lag function from the dplyr package could help, and that the following code would return df_wanted, yet it does not return the expected result:
library(dplyr)
df_attempt<-df %>%
filter(lag(x)!=x & lag(y)!=y)
Is there any solution to this using the lag function?
A combination of base R's cumsum and dplyr::lag could do the trick:
library(dplyr)
df %>% mutate_all(as.character) %>%
filter(cumsum(x != x[1] & y != y[1]) !=
lag(cumsum(x != x[1] & y != y[1]), default = -1))
#        x      y
#1 ARM018 ARF014
#2 ARM021 ARF020
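The cumsum answer compares every row with the first row, which happens to work for this example (the question's lag attempt, in turn, compares each row with the previous input row). If the rule really is "both values differ from the last retained row", each decision depends on earlier decisions, so a sequential pass is the straightforward option. This is a sketch: keep_changed is a hypothetical name, and the second data set below is made up. With its third row ARM021/ARF099 following ARM021/ARF020, the cumsum version would keep all three rows, because both values differ from row 1; the loop drops it because x matches the last retained row.

```r
# Hypothetical helper: keep a row only if every column in `cols` differs
# from the most recently *retained* row. The first row is always kept.
# Assumes the compared columns contain no NA values.
keep_changed <- function(df, cols = c("x", "y")) {
  n <- nrow(df)
  if (n == 0) return(df)
  m <- as.matrix(df[cols])      # character matrix for simple row access
  keep <- logical(n)
  keep[1] <- TRUE
  last <- 1                     # index of the last retained row
  for (i in seq_len(n)[-1]) {
    if (all(m[i, ] != m[last, ])) {
      keep[i] <- TRUE
      last <- i
    }
  }
  out <- df[keep, , drop = FALSE]
  rownames(out) <- NULL         # renumber rows 1..k
  out
}

# Data from the question
df <- data.frame(x = c("ARM018", "ARM018", "ARM018", "ARM021", "ARM021"),
                 y = c("ARF014", "ARF027", "ARF028", "ARF014", "ARF020"))
res <- keep_changed(df)

# Made-up case: the third row shares x with the retained second row
df2 <- data.frame(x = c("ARM018", "ARM021", "ARM021"),
                  y = c("ARF014", "ARF020", "ARF099"))
res2 <- keep_changed(df2)
```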

removing sublists from a list

I have a list of 155 elements, each of which is itself a list of three components.
Below is a small example. I am only interested in the values in gene: I am trying to remove the first and second components (name and desc) from every element at once, leaving only the gene values.
test <- list(name="Adipose", desc= "Roche", gene = c("KRT14", "RPE65"))
test1 <- list(name="muscle", desc= "Roche", gene = c("THRSP", "KRT14"))
test2 <- list(name="WBC" , desc= "Roche", gene = c("RBP4", "CCDC80"))
x <- c(test,test1, test2)
How to achieve that?
As shown by the dput you posted in the comments, your actual data structure is a list of lists. In this case, you can use lapply to get what you want (the object is called lst here to avoid masking the base function list):
lst <- structure(list(Adipose = structure(list(name = "Adipose", desc = "Roche", genes = c("ACACB", "ACP5", "ACTA1")), .Names = c("name", "desc", "genes")), WBC = structure(list( name = "WBC ", desc = "Roche", genes = c("THRSP", "KRT14", "APOB", "LEP")), .Names = c("name", "desc", "genes"))), .Names = c("Adipose ", "WBC "))
lapply(lst, function(x) x[names(x) == "genes"])
#$`Adipose `
#$`Adipose `$genes
#[1] "ACACB" "ACP5" "ACTA1"
#
#$`WBC `
#$`WBC `$genes
#[1] "THRSP" "KRT14" "APOB" "LEP"
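If only the gene vectors themselves are needed (rather than the one-element sublists that x[names(x) == "genes"] returns), extracting with [[ gives them directly, and unlist() flattens them into a single vector. For the flat list built with c() in the question, the repeated names can be used for selection instead. The names genes_only, all_genes and flat_genes are illustrative.

```r
# List of lists shaped like the dput in the answer (names without trailing spaces)
lst <- list(
  Adipose = list(name = "Adipose", desc = "Roche", genes = c("ACACB", "ACP5", "ACTA1")),
  WBC     = list(name = "WBC", desc = "Roche", genes = c("THRSP", "KRT14", "APOB", "LEP"))
)

genes_only <- lapply(lst, `[[`, "genes")                    # named list of character vectors
all_genes  <- unique(unlist(genes_only, use.names = FALSE))  # one de-duplicated vector

# The question's c(test, test1, test2) is a flat list with repeated names,
# so select the "gene" entries by name
test  <- list(name = "Adipose", desc = "Roche", gene = c("KRT14", "RPE65"))
test1 <- list(name = "muscle",  desc = "Roche", gene = c("THRSP", "KRT14"))
test2 <- list(name = "WBC",     desc = "Roche", gene = c("RBP4", "CCDC80"))
x <- c(test, test1, test2)
flat_genes <- unlist(x[names(x) == "gene"], use.names = FALSE)
```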

How to match column values in two dataframes and make rownames with the matching corresponding column values

I have a data frame called mydf. I want to match its current column against the key.genomloc column of another object, secondf (a character matrix in the data below), extract the corresponding key.wesmut.genom values, and use them as row names, as shown in the result.
This is what I have tried, but does not work as desired:
current <- secondf[,"key.genomloc"]
replacement <- secondf[,"key.wesmut.genom"]
v <- mydf[,"current"] %in% current
w <- current %in% mydf[,"current"]
rownames(mydf)<-mydf[,"current"]
rownames(mydf)[v] <- replacement[w]
Data:
mydf <-structure(list(current = structure(c(5L, 2L), .Label = c("chr1:115256529:T:C",
"chr1:115256530:G:T", "chr1:115258744:C:A", "chr1:115258744:C:T",
"chr1:115258747:C:T", "chr11:32417945:T:C", "chr12:25398284:C:A",
"chr12:25398284:C:T", "chr13:28592640:A:C", "chr13:28592641:T:A",
"chr13:28592642:C:A", "chr13:28592642:C:G", "chr15:90631838:C:T",
"chr15:90631934:C:T", "chr2:209113112:C:T", "chr2:209113113:G:A",
"chr2:209113113:G:C", "chr2:209113113:G:T", "chr2:25457242:C:T",
"chr2:25457243:G:A", "chr2:25457243:G:T", "chr4:55599320:G:T"
), class = "factor"), `index` = c(1451738, 1451718)), .Names = c("current",
"index"), row.names = 1:2, class = "data.frame")
secondf<-structure(c("WES:FLT3:p.D835H", "WES:FLT3:p.D835N", "WES:FLT3:p.D835Y",
"WES:FLT3:p.D835A", "WES:FLT3:p.D835V", "chr1:115256530:G:T",
"chr13:28592642:C:T", "chr13:28592642:C:A", "chr1:115258747:C:T",
"chr13:28592641:T:A"), .Dim = c(5L, 2L), .Dimnames = list(NULL,
c("key.wesmut.genom", "key.genomloc")))
Desired result:
rowname current index
WES:FLT3:p.D835A chr1:115258747:C:T 1451738
WES:FLT3:p.D835H chr1:115256530:G:T 1451718
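We can use match: for each value of current it returns the position of that value in key.genomloc (column 2 of secondf), and indexing key.wesmut.genom (column 1) with those positions gives the new labels: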
mydf$rowname <- secondf[,1][match(mydf$current,secondf[,2])]
mydf[c(3,1:2)]
# rowname current index
#1 WES:FLT3:p.D835A chr1:115258747:C:T 1451738
#2 WES:FLT3:p.D835H chr1:115256530:G:T 1451718
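The answer stores the matches in a column named rowname. To set actual row names, as the question asks, keep in mind that row names must be unique and non-missing, so unmatched rows need a fallback; in this sketch they keep their current value, as in the question's own attempt. The third row of mydf and its NA index are hypothetical, added only to exercise the fallback.

```r
# Smaller equivalent of the question's data
mydf <- data.frame(
  current = factor(c("chr1:115258747:C:T", "chr1:115256530:G:T",
                     "chr4:55599320:G:T")),   # hypothetical third row with no match
  index = c(1451738, 1451718, NA)
)
secondf <- matrix(
  c("WES:FLT3:p.D835H", "WES:FLT3:p.D835N", "WES:FLT3:p.D835Y",
    "WES:FLT3:p.D835A", "WES:FLT3:p.D835V",
    "chr1:115256530:G:T", "chr13:28592642:C:T", "chr13:28592642:C:A",
    "chr1:115258747:C:T", "chr13:28592641:T:A"),
  ncol = 2,
  dimnames = list(NULL, c("key.wesmut.genom", "key.genomloc"))
)

cur <- as.character(mydf$current)
idx <- match(cur, secondf[, "key.genomloc"])   # NA where there is no match
# Use the matched label where there is one, otherwise fall back to `current`
rownames(mydf) <- ifelse(is.na(idx), cur, secondf[idx, "key.wesmut.genom"])
mydf
```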
