Get data from the last column with data per row [duplicate] - r

This question already has answers here:
Column name of last non-NA row per row; using tidyverse solution?
(1 answer)
Extract first and last values among a number of columns in data frame
(2 answers)
Closed 4 years ago.
I have different sequences of events for elements in a spreadsheet. The number of events differs from row to row.
I want to get the last non-missing event in each row and put it in a new column, as shown in the column "Last":
ev1 ev2 ev3 ev4 Last
A A1 A2 A3 NA A3
B B1 B2 NA NA B2
C C1 C2 C3 C4 C4
D D1 D2 D3 NA D3
E E1 NA NA NA E1
If any of the events in a row is "Delivered", I want to show "Delivered" instead of the last event.

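You can try dplyr::coalesce on the whole data.frame, but you have to reverse the order of the columns. For each row, coalesce returns the first non-missing value, scanning its arguments from left to right, whereas you want the last (rightmost) one. Reversing the columns makes the rightmost non-missing value come first. The solution could be: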
library(dplyr)
df$Last <- coalesce(!!! df[ncol(df):1])
df
# ev1 ev2 ev3 ev4 Last
# A A1 A2 A3 <NA> A3
# B B1 B2 <NA> <NA> B2
# C C1 C2 C3 C4 C4
# D D1 D2 D3 <NA> D3
# E E1 <NA> <NA> <NA> E1
Data:
df <- read.table(text =
"ev1 ev2 ev3 ev4
A A1 A2 A3 NA
B B1 B2 NA NA
C C1 C2 C3 C4
D D1 D2 D3 NA
E E1 NA NA NA",
header = TRUE, stringsAsFactors = FALSE)
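The answer above does not cover the "Delivered" rule from the question. Here is a minimal base-R sketch, assuming the `df` layout from the Data block (character columns, NA for missing events) and an exact match on the string "Delivered". The helper name `add_last` is hypothetical:

```r
# Same example data as above (the first column becomes the row names)
df <- read.table(text =
"ev1 ev2 ev3 ev4
A A1 A2 A3 NA
B B1 B2 NA NA
C C1 C2 C3 C4
D D1 D2 D3 NA
E E1 NA NA NA",
header = TRUE, stringsAsFactors = FALSE)

cols <- c("ev1", "ev2", "ev3", "ev4")

# Hypothetical helper: last non-NA event per row, overridden by "Delivered"
add_last <- function(df, cols) {
  ev <- df[cols]
  last <- apply(ev, 1, function(r) {
    v <- r[!is.na(r)]
    if (length(v)) v[[length(v)]] else NA_character_  # NA if the row is empty
  })
  # TRUE for rows where any event equals "Delivered" (NAs ignored)
  delivered <- rowSums(ev == "Delivered", na.rm = TRUE) > 0
  last[delivered] <- "Delivered"
  df$Last <- unname(last)
  df
}

out <- add_last(df, cols)

# Row C gets a "Delivered" event that is not its last one
df2 <- df
df2$ev1[3] <- "Delivered"
out2 <- add_last(df2, cols)
```

Here `out$Last` holds the last events, and in `out2` row C shows "Delivered" even though C4 is its last event.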

Related

How to filter rows in R with a specific value? [duplicate]

This question already has answers here:
Filter data.frame rows by a logical condition
(9 answers)
Closed 4 years ago.
My dataset has 21 columns and 4625 rows. I can't paste a few lines of the dataset here because of what the columns contain, so here is a demo dataset:
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21
1 GCF1 ............................10..................................... 386
2 GCF2 ............................10......................................10
3 GCF3 ............................32......................................10
Column 21 has 331 different numbers, and I want to group my data by the value in column 21. For example, I want to see how many of the GCFs have '10' there, and their characteristics according to the other columns. I tried the following command, but it returns 236 rows that have 10 in column 11 and not in column 21.
f2 <- f1[rowSums(sapply(f1[-21], '%in%', c('10'))) > 0,]
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21
1 GCF1 ............................10......................................386
2 GCF2 ............................10......................................10
How can I filter rows based on the value in column 21?
The filter command from dplyr is designed to do exactly this.
This returns only the rows that have 10 in c21:
library(dplyr)
df %>%
filter(c21 == 10)
Let's make your question reproducible:
df <- data.frame("a" = 1:5, "b" = c(3, 5, 7, 7, 7), "c" = c(5, 3, 3, 7, 9))
a b c
1 1 3 5
2 2 5 3
3 3 7 3
4 4 7 7
5 5 7 9
You want to filter this data frame on a condition such as column c being equal to 3, correct? Then df$c==3 is your "mask": FALSE TRUE TRUE FALSE FALSE
You can use this mask to filter your data frame: df[df$c==3,] gives:
a b c
2 2 5 3
3 3 7 3
Using base R:
df[df$c21==10, ]
or
subset(df, c21==10)
Using dplyr:
filter(df, c21==10)
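The question's own attempt goes wrong because `f1[-21]` drops column 21, so `%in%` searches every column except the one of interest. For the underlying goal (how many rows share each value of c21, and what those rows look like), here is a small base-R sketch. The data frame `f1` below is a made-up stand-in with only the columns c1 and c21:

```r
# Hypothetical stand-in for the real 21-column data
f1 <- data.frame(c1  = c("GCF1", "GCF2", "GCF3", "GCF4"),
                 c21 = c(386, 10, 10, NA),
                 stringsAsFactors = FALSE)

# Rows whose c21 equals 10; which() skips NA comparisons, which plain
# logical indexing (f1[f1$c21 == 10, ]) would turn into all-NA rows
rows10 <- f1[which(f1$c21 == 10), ]

# Number of rows for each value of c21 (NA is excluded by default)
counts <- table(f1$c21)

# One data frame per c21 value, e.g. groups[["10"]]
groups <- split(f1, f1$c21)
```

`table()` answers "how many have 10", and `split()` keeps the other columns of each group available for inspection.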

RStudio: Compare two different data tables when row.name is same

I have two data frames with different columns and rows:
The first table has specifically expressed gene names in its rows (750 entries) and statistical results (p-value, fold change) in its columns (a 750x2 matrix).
The second table has all expressed gene names (13,000) with their associated genes in the same row (rows run up to 100 entries long), i.e. a 13,000x100 matrix.
I want to create a data frame with the 750 expressed gene names from the first table, using a match function in R to insert the associated genes from the second table.
Example:
First data table
Name Fold Change P-value
A 0 3
B 1 4
F 2 6
H 1 8
Second Data table
Name X1 X2 X3 X4 X5
A A1 A2 A3 A4 A5
B B1 B2 B3 B4 B5
C C1 C2 C3 C4 C5
D D1 D2 D3 D4 D5
E E1 E2 E3 E4 E5
F F1 F2 F3 F4 F5
Desired Output
Name X1 X2 X3 X4 X5
A A1 A2 A3 A4 A5
B B1 B2 B3 B4 B5
D D1 D2 D3 D4 D5
F F1 F2 F3 F4 F5
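No answer is shown for this question. Here is a minimal base-R sketch using `match()`, assuming both tables share a `Name` column. The column names `FoldChange` and `Pvalue` are placeholders. With the example as given, H has no row in the second table, and D (listed in the desired output) does not appear in the first table, so the lookup returns A, B and F:

```r
# First table: gene names with statistics
first <- data.frame(Name = c("A", "B", "F", "H"),
                    FoldChange = c(0, 1, 2, 1),
                    Pvalue = c(3, 4, 6, 8),
                    stringsAsFactors = FALSE)

# Second table: Name plus associated genes X1..X5 (A1..A5, B1..B5, ...)
second <- data.frame(Name = LETTERS[1:6], stringsAsFactors = FALSE)
for (k in 1:5) second[[paste0("X", k)]] <- paste0(second$Name, k)

# Row of each first-table gene in the second table (NA if absent)
idx <- match(first$Name, second$Name)
out <- second[idx[!is.na(idx)], ]
rownames(out) <- NULL

# Alternative that also keeps the statistics: an inner join on Name
both <- merge(first, second, by = "Name")
```

`match()` keeps the order of the first table, while `merge()` sorts by `Name` and carries the fold change and p-value along.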

Append three Excel sheets' data with the same header in R

data.xlsx contains three sheets, S1, S2 and S3. All of them use the same header. How can I merge their data into one data frame?
data.xlsx S1 sheet
A B C
a1 b1 c1
data.xlsx S2 sheet
A B C
a2 b2 c2
data.xlsx S3 sheet
A B C
a3 b3 c3
Here is my starting code:
library(xlsx)  # provides read.xlsx(file, sheetName = ...)
s1 = read.xlsx('data.xlsx', sheetName='S1') # contains 2 rows
s2 = read.xlsx('data.xlsx', sheetName='S2') # contains 3 rows
s3 = read.xlsx('data.xlsx', sheetName='S3') # contains 4 rows
all = s1 + s2 + s3 # of course this is wrong code
I want the combined data frame all to contain:
A B C
a1 b1 c1
a2 b2 c2
a3 b3 c3
Do this:
rbind(s1, s2, s3)
This assumes s1, s2 and s3 have the same columns; rbind matches data frame columns by name.
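For more than three sheets, the same idea can be wrapped in `do.call(rbind, ...)`. This sketch assumes the xlsx package's `read.xlsx(file, sheetName = ...)` as in the question. Here the sheets are replaced by in-memory data frames so the example runs without the file:

```r
# With the real file (requires the xlsx package):
# sheets <- lapply(c("S1", "S2", "S3"),
#                  function(s) xlsx::read.xlsx("data.xlsx", sheetName = s))

# Stand-ins for the three sheets, all with the header A B C
sheets <- list(
  data.frame(A = "a1", B = "b1", C = "c1", stringsAsFactors = FALSE),
  data.frame(A = "a2", B = "b2", C = "c2", stringsAsFactors = FALSE),
  data.frame(A = "a3", B = "b3", C = "c3", stringsAsFactors = FALSE)
)

# Stack all sheets in one call; rbind matches data frame columns by name
all_rows <- do.call(rbind, sheets)
```

Building the list with `lapply()` means the sheet names are the only thing that changes when more sheets are added.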

Recursively building a list of dataframes (in R) - is a loop the only option?

I have a dataframe with n columns and I need to obtain combinations of its variables:
E.g.:
df <- data.frame(A = c("a1","a2","a3","a4","a5","a6"),
B = c("a1","a1","a3","a3","a5","a5"),
C = c("a1","a1","a1","a3","a4","a4"),
D = c("a1","a1","a1","a3","a4","a5"))
I need to make a list with n-1 elements, each holding the unique combinations of a subset of the dataframe's variables. The first element holds the unique rows of all columns, from the first to the last. Each subsequent element drops the first column of the previous one. Like this:
myList <- list(unique(df[, 1:ncol(df)]),
unique(df[, 2:ncol(df)]),
unique(df[, 3:ncol(df)]))
I managed to solve this with a for loop:
myList <- list()
for (i in 1:(ncol(df) - 1)){
myList[[i]] <- unique(df[, i:ncol(df)])
}
but I was left wondering whether there was a faster and more elegant way to do this.
With sapply():
sapply(1:(ncol(df)-1),
FUN = function(x, nc, df) unique(df[, x:nc]), nc = ncol(df), df = df)
A more elegant solution uses recursion:
func = function(df, n, lst)
{
if(ncol(df)==n) return(lst)
func(df, n+1, c(lst, list(unique(df[n:ncol(df)]))))
}
#> func(df,1, list())
#[[1]]
# A B C D
#1 a1 a1 a1 a1
#2 a2 a1 a1 a1
#3 a3 a3 a1 a1
#4 a4 a3 a3 a3
#5 a5 a5 a4 a4
#6 a6 a5 a4 a5
#[[2]]
# B C D
#1 a1 a1 a1
#3 a3 a1 a1
#4 a3 a3 a3
#5 a5 a4 a4
#6 a5 a4 a5
#[[3]]
# C D
#1 a1 a1
#4 a3 a3
#5 a4 a4
#6 a4 a5
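Because each element depends only on its starting column, neither recursion nor an accumulator is strictly needed, and `lapply()` over the start positions returns a list directly. This sketch uses `df[i:ncol(df)]` (list-style indexing) so every element stays a data frame even if only one column were selected:

```r
df <- data.frame(A = c("a1","a2","a3","a4","a5","a6"),
                 B = c("a1","a1","a3","a3","a5","a5"),
                 C = c("a1","a1","a1","a3","a4","a4"),
                 D = c("a1","a1","a1","a3","a4","a5"))

# Element i holds the unique rows of columns i..ncol(df)
myList <- lapply(seq_len(ncol(df) - 1), function(i) unique(df[i:ncol(df)]))
```

The result matches the recursive version: 6, 5 and 4 unique rows for columns A:D, B:D and C:D.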

expand.grid with separate variable for each column

I would like to create the following data.frame in R:
i1 i2 i3
1 A1 A2 A3
2 No A2 A3
3 A1 No A3
4 No No A3
5 A1 A2 No
6 No A2 No
7 A1 No No
8 No No No
In each column the value is either the string "A" concatenated with the column number, or "No". The data.frame should contain all possible combinations.
My idea was to use expand.grid, but I don't know how to create the list dynamically. Or is there a better approach?
expand.grid(list(c("A1", "No"), c("A2", "No"), c("A3", "No")))
I guess you could create your own helper function, something like this:
MyList <- function(n) expand.grid(lapply(paste0("A", seq_len(n)), c, "No"))
Then simply pass it the number of elements (e.g., 3):
MyList(3)
# Var1 Var2 Var3
# 1 A1 A2 A3
# 2 No A2 A3
# 3 A1 No A3
# 4 No No A3
# 5 A1 A2 No
# 6 No A2 No
# 7 A1 No No
# 8 No No No
Alternatively, you could try data.table's CJ equivalent, which should be much more efficient than expand.grid for a big n:
library(data.table)
DTCJ <- function(n) do.call(CJ, lapply(paste0("A", seq_len(n)), c, "No"))
DTCJ(3) # will return a sorted cross join
# V1 V2 V3
# 1: A1 A2 A3
# 2: A1 A2 No
# 3: A1 No A3
# 4: A1 No No
# 5: No A2 A3
# 6: No A2 No
# 7: No No A3
# 8: No No No
Another option is using Map with expand.grid:
n <- 3
expand.grid(Map(c, paste0('A', seq_len(n)), 'No'))
Or
expand.grid(as.data.frame(rbind(paste0('A', seq_len(n)), 'No')))
Another option, only using the most fundamental functions in R, is to use the indices:
df <- data.frame(V1 = c('A','A','A', 'A',rep('No',4)), V2 = c('A','A','No','No','A','A','No','No'), V3 = c('A','No','A','No','A','No','A','No'), stringsAsFactors = FALSE)
To get the row and column indices of the elements we need to change:
rindex <- (which(df != 'No') - 1) %% nrow(df) + 1  # linear index -> row number (1..nrow, never 0)
cindex <- ceiling(which(df != 'No')/nrow(df))
The solution is basically a one-liner:
df[matrix(c(rindex,cindex),ncol=2)] <- paste0(df[matrix(c(rindex,cindex),ncol=2)],cindex)
> df
V1 V2 V3
1 A1 A2 A3
2 A1 A2 No
3 A1 No A3
4 A1 No No
5 No A2 A3
6 No A2 No
7 No No A3
8 No No No
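None of the answers produce the column names i1, i2, i3 from the question, and expand.grid() returns factors by default. Here is a small variation on the helper-function answer (the name `make_grid` is hypothetical) that names the columns and keeps them as character:

```r
make_grid <- function(n) {
  # One c("Ak", "No") vector per column, named i1..in
  cols <- setNames(lapply(paste0("A", seq_len(n)), c, "No"),
                   paste0("i", seq_len(n)))
  # First column varies fastest, matching the row order in the question
  expand.grid(cols, stringsAsFactors = FALSE)
}

g <- make_grid(3)
```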
