Extract the max values for row and column - r

I need to find the values for which the row max and the column max are at the same position. Test data (the real data doesn't need to be a square matrix):
scores<-structure(c(0.4, 0.6, 0.222222222222222, 0.4, 0.4, 0, 0.25, 0.5,
0.285714285714286), .Dim = c(3L, 3L), .Dimnames = list(c("a",
"b", "c"), c("d", "e", "f")))
I already found which are the columns/rows with the max value for that row/column.
rows<-structure(list(a = c("d", "e"), b = "d", c = "f"), .Names = c("a",
"b", "c"))
cols<-structure(list(d = "b", e = c("a", "b"), f = "b"), .Names = c("d",
"e", "f"))
But I can't manage to get the values from the matrix. The problem arises when the same (max) value appears twice or more; I don't know how to check the indices in that case. I tried using mapply:
mapply(function(x, y) {
cols[x] == rows[y]
}, rows, cols)
But this stops when rows or cols has more than one element.
Expected output: c(0.6, 0.4)
The first is the max value of column 1 and row 2, the second value is the max value of row 1 and column 2.
          d   e         f | Max
a 0.4000000 0.4 0.2500000   0.4
b 0.6000000 0.4 0.5000000   0.6
c 0.2222222 0.0 0.2857143   0.2857
Max:    0.6 0.4       0.5
As you can see, for row 2 and column 1 the max value is the same, and for row 1 and column 2 it is also the same, but for row 3 and column 3 it isn't.

I think I understood what you are trying to do. Not an optimal solution, though.
We find the indices of the maximum values in each row and in each column, take the indices that appear in both, and display the corresponding values from the matrix.
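# apply() over rows returns its result transposed, so t() restores the original layout
# (without it the indices only line up by coincidence on some square matrices)
a1 <- which(t(apply(scores, 1, function(x) x == max(x))))
a2 <- which(apply(scores, 2, function(x) x == max(x)))
scores[intersect(a1, a2)]
#[1] 0.6 0.4
And in one line:
scores[intersect(which(t(apply(scores, 1, function(x) x == max(x)))),
which(apply(scores, 2, function(x) x == max(x))))]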

This is what you want:
# Compute row and column maxima
row_max <- apply(scores, 1, max)
col_max <- apply(scores, 2, max)
# For each row, check whether its max equals the column max at any position
# where that row max occurs (which.max alone would only test the first tie)
res <- sapply(seq_along(row_max),
function(i) any(col_max[scores[i, ] == row_max[i]] == row_max[i]))
row_max[res]
It also works when the same max value appears in multiple rows/columns, for example with this data:
scores <- structure(c(0.4, 0.6, 0.222222222222222, 0.4, 0.4, 0, 0.25, 0.5,
0.285714285714286, 0.13, 0.2, 0.6), .Dim = c(4L, 3L),
.Dimnames = list(c("a", "b", "c", "d"), c("e", "f", "g")))
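Both comparisons can also be written without explicit loops. The following is only a sketch, using the question's 3 x 3 test matrix; the names is_row_max, is_col_max and both are introduced here for illustration. It marks every cell that equals its row max and its column max, so ties are handled and the matrix does not need to be square.

```r
scores <- structure(c(0.4, 0.6, 0.222222222222222, 0.4, 0.4, 0, 0.25, 0.5,
                      0.285714285714286), .Dim = c(3L, 3L),
                    .Dimnames = list(c("a", "b", "c"), c("d", "e", "f")))

row_max <- apply(scores, 1, max)
col_max <- apply(scores, 2, max)

# row(scores) / col(scores) give each cell's row / column number, so indexing
# the max vectors with them lines up one max per cell
is_row_max <- scores == row_max[row(scores)]
is_col_max <- scores == col_max[col(scores)]
both <- is_row_max & is_col_max

scores[both]                  # the values
which(both, arr.ind = TRUE)   # their row/column positions
```

On the test data this selects the cells (b, d) and (a, e), i.e. 0.6 and 0.4.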

Related

Remove Columns from a table that are 90% one value

Example Data:
A<- c(1,2,3,4,1,2,3,4,1,2)
B<- c("A","B","C","D","E","F","G","H","I","J")
C<- c(1,1,1,1,1,1,1,1,1,0)
D<- c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE)
df1<-data.frame(A,B,C,D)
df1 %>%
select_if(
###column is <90% one value
)
So I have a table that has a few columns that are predominantly one value, like C and D in the above example. I need to get rid of any columns that are 90% or more one unique value. How can I get rid of the columns that fit this criterion?
We can use select() with where(): get the frequency counts with table(), convert them to proportions, take the max, and keep the column only if that max is less than 0.90.
library(dplyr)
df1 <- df1 %>%
select(where(~ max(proportions(table(.))) < .90))
data
df1 <- structure(list(A = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2), B = c("A",
"B", "C", "D", "E", "F", "G", "H", "I", "J"), C = c(1, 1, 1,
1, 1, 1, 1, 1, 1, 0), D = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-10L))
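If dplyr is not available, the same rule can be sketched in base R with Filter(). This is an illustrative variant, not the answer's method; top_share is a hypothetical helper name, and it assumes the columns contain no NA values (table() drops them while length() counts them).

```r
df1 <- data.frame(
  A = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2),
  B = LETTERS[1:10],
  C = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  D = c(rep(TRUE, 9), FALSE),
  stringsAsFactors = FALSE
)

# Share of the most frequent value in a vector
top_share <- function(x) max(table(x)) / length(x)

# Keep only the columns whose most frequent value covers less than 90% of rows
kept <- Filter(function(col) top_share(col) < 0.90, df1)
names(kept)
```

Here C and D are dropped because their most frequent value covers exactly 90% of the rows.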

Volcano plot for multiple clusters

I am trying to make a volcano plot for different clusters. I have 2 conditions, untreated vs. treated. I have a differential expression excel file that cellranger generated for me but within the file it has multiple clusters each which have a fold change and p value. How do I create a volcano plot that contains all the clusters rather than one? Would I have to do a volcano plot for each cluster and then combine them all somehow?
I used this code to generate the plot for just one of the clusters...
macrophage_list <- read.table("differential_expression_macrophage.csv", header = TRUE, sep = ",")
EnhancedVolcano(macrophage_list,
    lab = as.character(macrophage_list$FeatureName),
    x = 'Cluster1.Log2.Fold.Change', y = 'Cluster1.Adjusted.P.Value',
    xlim = c(-8, 8), title = 'Macrophage',
    pCutoff = 10e-5, FCcutoff = 1.5, pointSize = 3.0, labSize = 3.0)
How do I merge all the information in the excel file to create a volcano plot?
I uploaded each data cluster one by one and then merged them by using rbind, but is there a simpler/quicker way to do this?
output for dput(gene_list[1:20, 1:14])
structure(list(Feature.ID = structure(1:20, .Label = c("a", "b",
"c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
"p", "q", "r", "s", "t"), class = "factor"), Feature.Name = structure(1:20, .Label = c("A",
"B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N",
"O", "P", "Q", "R", "S", "T"), class = "factor"), Cluster.1_Mean.Counts = c(0.000960904,
0.000320301, 0.001281205, 0.000320301, 0.000320301, 0.016335362,
0.000960904, 0, 0.001601506, 0.000320301, 0.007046627, 0.026585,
0.017296265, 0.004804518, 0, 0.874742598, 0.017616566, 0.007366928,
0.008327831, 0.001921807), Cluster.1_Log2.fold.change = c(0.291978774,
1.954943787, -2.008530337, -2.482461526, 3.539906287, 0.407455991,
-0.214981215, 1.539906287, 0.802940693, 2.539906287, -1.333136538,
-1.879953595, -0.52422405, -0.877946228, 1.539906287, -0.629373147,
1.118442519, 0.170672478, 1.065975099, 1.099333696), Cluster.1_Adjusted.p.value = c(1,
0.910243711, 0.04672812, 0.080866038, 0.610296549, 0.80063597,
1, 1, 0.951841603, 0.797013021, 0.103401275, 0.000594428, 0.907754993,
0.532689631, 1, 0.480958806, 0.078345008, 1, 0.198557945, 0.668312142
), Cluster.2_Mean.Counts = c(0.000902278, 0.001804555, 0.006315943,
0.004511388, 0, 0.029775159, 0.001804555, 0, 0.002706833, 0,
0.023459216, 0.128123411, 0.030677437, 0.009022775, 0, 2.174488883,
0.018947828, 0.019850106, 0.010827331, 0.000902278), Cluster.2_Log2.fold.change = c(0.792589781,
4.769869705, 0.35201719, 0.839132367, 3.184907204, 1.32985554,
0.962514783, 3.184907204, 1.725475586, 2.599944703, 0.560416339,
0.580736324, 0.407299626, 0.184907204, 3.184907204, 0.816580902,
1.120776867, 1.742684876, 1.409613491, 0.599944703), Cluster.2_Adjusted.p.value = c(1,
0.153573448, 1, 0.737977734, 1, 0.14478935, 0.853816767, 1, 0.47952604,
1, 0.65316285, 0.507251471, 0.776636022, 1, 1, 0.346630571, 0.285006452,
0.060868933, 0.21546202, 1), Cluster.3_Mean.Counts = c(0.001813813,
0, 0.019045032, 0.00725525, 0, 0.022672657, 0.000906906, 0, 0,
0, 0.029927908, 0.043531502, 0.046252221, 0.029021001, 0, 3.146057931,
0.020858845, 0.013603594, 0.008162157, 0), Cluster.3_Log2.fold.change = c(1.455721575,
2.192687169, 2.008262598, 1.504631175, 3.192687169, 0.9044422,
0.334706174, 3.192687169, -0.451169021, 2.607724668, 0.931421856,
-1.032594057, 1.038258504, 1.970294748, 3.192687169, 1.412371018,
1.26985503, 1.14829305, 0.991053308, -0.451169021), Cluster.3_Adjusted.p.value = c(0.757752635,
1, 0.032609935, 0.33316083, 1, 0.441825712, 1, 1, 1, 1, 0.380305075,
0.605158722, 0.339946318, 0.016952505, 1, 0.056529024, 0.259458704,
0.339639234, 0.536765022, 1), Cluster.4_Mean.Counts = c(0.000641899,
0, 0.002567596, 0.004493293, 0, 0.010270384, 0.003209495, 0,
0.000641899, 0, 0.028243557, 0.160474756, 0.012196081, 0.005135192,
0, 1.199709274, 0.005135192, 0.004493293, 0.005777091, 0.001283798
), Cluster.4_Log2.fold.change = c(0.269229783, 1.661547206, -0.886889419,
0.778904157, 2.661547206, -0.289908942, 1.602653517, 2.661547206,
0.076584705, 2.076584705, 0.854192284, 0.961549693, -0.967809414,
-0.644261223, 2.661547206, -0.104384578, -0.790579612, -0.467735811,
0.459913345, 0.722947751), Cluster.4_Adjusted.p.value = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.584036686, 1, 1, 1, 1, 1, 1,
1, 1)), class = "data.frame", row.names = c(NA, 20L))
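Your dataset needs to be reshaped to long format. To split the column names with a single pattern, first rename some columns so that the cluster prefix is separated from the rest by an underscore:
# fixed = TRUE treats "." literally rather than as a regex wildcard
colnames(df) <- gsub(".Mean", "_Mean", colnames(df), fixed = TRUE)
colnames(df) <- gsub(".Log2", "_Log2", colnames(df), fixed = TRUE)
colnames(df) <- gsub(".Adjus", "_Adjus", colnames(df), fixed = TRUE)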
Now we can reshape it with that pattern, using the pivot_longer function from the tidyr package:
library(tidyr)
final_df <- df %>%
    pivot_longer(-c(Feature.ID, Feature.Name),
                 names_to = c("set", ".value"),
                 names_pattern = "(.+)_(.+)")
# A tibble: 80 x 6
   Feature.ID Feature.Name set       Mean.Counts Log2.fold.change Adjusted.p.value
   <fct>      <fct>        <chr>           <dbl>            <dbl>            <dbl>
 1 a          A            Cluster.1    0.000961            0.292           1
 2 a          A            Cluster.2    0.000902            0.793           1
 3 a          A            Cluster.3    0.00181             1.46            0.758
 4 a          A            Cluster.4    0.000642            0.269           1
 5 b          B            Cluster.1    0.000320            1.95            0.910
 6 b          B            Cluster.2    0.00180             4.77            0.154
 7 b          B            Cluster.3    0                   2.19            1
 8 b          B            Cluster.4    0                   1.66            1
 9 c          C            Cluster.1    0.00128            -2.01            0.0467
10 c          C            Cluster.2    0.00632             0.352           1
# … with 70 more rows
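For readers without tidyr, the same wide-to-long step can be sketched in base R. The small table below is hypothetical (made-up values, two clusters, three features) and the names wide, value_cols, clusters and long are introduced only for this illustration; it assumes every value column is named as cluster prefix, underscore, measure.

```r
# Hypothetical wide table: two clusters, three features (made-up values)
wide <- data.frame(
  Feature.Name = c("A", "B", "C"),
  Cluster.1_Log2.fold.change = c(0.3, 2.0, -2.0),
  Cluster.1_Adjusted.p.value = c(1, 0.9, 0.05),
  Cluster.2_Log2.fold.change = c(0.8, 4.8, 0.4),
  Cluster.2_Adjusted.p.value = c(1, 0.15, 1),
  stringsAsFactors = FALSE
)

# Cluster prefixes are whatever precedes the first underscore
value_cols <- grep("_", names(wide), fixed = TRUE, value = TRUE)
clusters <- unique(sub("_.*$", "", value_cols))

# Build one block per cluster, strip the prefix, and stack the blocks
long <- do.call(rbind, lapply(clusters, function(cl) {
  cols <- value_cols[startsWith(value_cols, paste0(cl, "_"))]
  block <- wide[cols]
  names(block) <- substring(cols, nchar(cl) + 2)
  cbind(Feature.Name = wide$Feature.Name, set = cl, block,
        stringsAsFactors = FALSE)
}))
long
```

The result has one row per feature and cluster, with the cluster name in the set column, which is the shape a single plot over all clusters needs.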
Now, we can create the volcano plot by using ggplot2 and ggrepel libraries for the labeling of Feature.Name (if you don't have ggrepel, you have to install it):
library(ggplot2)
library(ggrepel)
ggplot(final_df, aes(x = Log2.fold.change,y = -log10(Adjusted.p.value), label = Feature.Name))+
geom_point()+
geom_text_repel(data = subset(final_df, Adjusted.p.value < 0.05),
aes(label = Feature.Name))
This gives a volcano plot with all clusters merged: all points share one color, and features with an adjusted p-value < 0.05 are labeled with their Feature.Name.

Building sequence data for a recommender system- replacing cross-tabular matrix with a variable value

I am trying to build sequence data for a recommender system. I have built cross-tabulated data (Table 1) and a second table (Table 2), as shown below:
[image: Table 1 and Table 2]
I have been trying to replace all the 1's in Table 1 with the "Grade" from Table 2 in R.
Any insight/suggestion is greatly appreciated.
Instead of replacing values in the first table with those from the second, the second table can be reshaped directly to 'wide' format with dcast:
library(reshape2)
res <- dcast(df2, St.No. ~ Courses, value.var = 'Grade')[names(df1)]
res
#  St.No. Math Phys Chem CS
#1      1         A       B
#2      2         B    B
#3      3    A         A  C
#4      4    B    B       D
If we need to replace the blanks with 0:
res[res == ""] <- "0"
data
df1 <- data.frame(St.No. = 1:4, Math = c(0, 0, 1, 1), Phys = c(1, 1, 0, 1),
Chem = c(0, 1, 1, 0), CS = c(1, 0, 1, 1))
df2 <- data.frame(St.No. = rep(1:4, each = 4), Courses = rep(c("Math",
"Phys", "Chem", "CS"), 4),
Grade = c("", "A", "", "B", "", "B", "B", "",
"A", "", "A", "C", "B", "B", "", "D"),
stringsAsFactors = FALSE)
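The literal replacement the question asks for (swap each 1 in Table 1 for the matching grade) can also be sketched with match(). This is an alternative illustration in base R, not the answer's approach; it assumes each student/course pair appears at most once in df2.

```r
# Same toy tables as in the data section above
df1 <- data.frame(St.No. = 1:4, Math = c(0, 0, 1, 1), Phys = c(1, 1, 0, 1),
                  Chem = c(0, 1, 1, 0), CS = c(1, 0, 1, 1))
df2 <- data.frame(St.No. = rep(1:4, each = 4),
                  Courses = rep(c("Math", "Phys", "Chem", "CS"), 4),
                  Grade = c("", "A", "", "B", "", "B", "B", "",
                            "A", "", "A", "C", "B", "B", "", "D"),
                  stringsAsFactors = FALSE)

res <- df1
for (course in setdiff(names(df1), "St.No.")) {
  # Locate each student's row for this course in the long table
  idx <- match(paste(df1$St.No., course), paste(df2$St.No., df2$Courses))
  # Keep the grade where Table 1 has a 1, otherwise write "0"
  res[[course]] <- ifelse(df1[[course]] == 1, df2$Grade[idx], "0")
}
res
```

The course columns of res are character vectors, with "0" wherever Table 1 had a 0.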

Conditional displaying values in R

I'd like to see which values have a particular entry issue, but I'm not getting things done right.
For instance, I need to print the values from column "c" conditional on a given value of "b", say where b == 0.
Finally, I need to add a new string for those whose condition is true.
df<- structure(list(a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7,
11.43, 11.41, 10.48512, 11.19), b = c(2, 3, 2, 0, 0, 0, 1, 2,
4, 0), c = c("q", "c", "v", "f", "", "e", "e", "v", "a", "c")), .Names = c("a",
"b", "c"), row.names = c(NA, -10L), class = "data.frame")
I tried this without success:
if(df[b]==0){
print(df$c)
}
if((df[b]==0)&(df[c]=="v")){
df[c] <-paste("2")
}
Thanks for helping.
The correct syntax is like df[rows, columns], so you could try:
df[df$b==0, "c"]
You can accomplish changing values using ifelse:
df$c <- ifelse(df$b==0 & df$c=="v", paste(df$c, 2, sep=""), df$c)
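The original attempt fails for two reasons: b and c are not objects on their own (they are columns, so df$b or df[["b"]] is needed), and if() expects a single TRUE/FALSE while df$b == 0 yields one value per row. A minimal sketch with logical indexing, on a small hypothetical data frame whose values are made up for illustration:

```r
# Small hypothetical data frame (values made up for illustration)
df <- data.frame(b = c(2, 0, 0, 2), c = c("q", "f", "v", "v"),
                 stringsAsFactors = FALSE)

# One TRUE/FALSE per row: rows where both conditions hold
hits <- df$b == 0 & df$c == "v"

# Print the matching values of c, then overwrite them in place
print(df$c[hits])
df$c[hits] <- "2"
df$c
```

Only the third row satisfies both conditions, so only that entry of c is replaced.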
Does this help?
rows <- which(df$b==0)
if (length(rows)>0) {
print(df$c[rows])
df$c[rows] <- paste(df$c[rows],'2')
## maybe you wanted to have:
# df$c[rows] <- '2'
}
There are several ways to subset data in R, e.g.:
df$c[df$b == 0]
df[df$b == 0, "c"]
subset(df, b == 0, c)
with(df, c[b == 0])
# ...
To conditionally add another column (here: TRUE/FALSE):
df$e <- FALSE; df$e[df$b == 0] <- TRUE
df <- transform(df, e = ifelse(b == 0, TRUE, FALSE))
df <- within(df, e <- ifelse(b == 0, TRUE, FALSE))
# ...

How to reorder rows in a matrix

I have a matrix and would like to reorder the rows so that, for example, row 5 moves to row 2 and row 2 moves to row 7. I have a list of all the row names, delimited with \n, in a txt file. I thought I could read it into R and then use it to index the matrix (in my case 'k'), something like k[txt file,] -> k_new, but this does not work since the identifiers are not the first column but are defined as rownames.
k[ c(1,5,3,4,7,6,2), ] #But probably not what you meant....
Or perhaps (if your 'k' object rownames are something other than the default character-numeric sequence):
k[ char_vec , ] # where char_vec will get matched to the row names.
(dat <- structure(list(person = c(1, 1, 1, 1, 2, 2, 2, 2), time = c(1,
2, 3, 4, 1, 2, 3, 4), income = c(100, 120, 150, 200, 90, 100,
120, 150), disruption = c(0, 0, 0, 1, 0, 1, 1, 0)), .Names = c("person",
"time", "income", "disruption"), row.names = c("h", "g", "f",
"e", "d", "c", "b", "a"), class = "data.frame"))
dat[ c('h', 'f', 'd', 'b') , ]
#-------------
  person time income disruption
h      1    1    100          0
f      1    3    150          0
d      2    1     90          0
b      2    3    120          1
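Reading the desired order from a text file, as the question describes, might look like the sketch below. The matrix, its row names and the temporary file are all made up for illustration; the only assumption is one row name per line in the file.

```r
# Hypothetical matrix with row names (values made up)
k <- matrix(1:6, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("x", "y")))

# Stand-in for the txt file: one row name per line, in the desired order
order_file <- tempfile(fileext = ".txt")
writeLines(c("r3", "r1", "r2"), order_file)

# readLines() returns a character vector, which indexes rows by name
char_vec <- readLines(order_file)
k_new <- k[char_vec, , drop = FALSE]
k_new
```

A name in the file that does not match any row name raises a "subscript out of bounds" error, so stray whitespace in the file may need trimming first (for example with trimws()).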
