The file data.xlsx contains three sheets: S1, S2 and S3. All of them use the same header. How can I merge their data into one data frame?
data.xlsx S1 sheet
A B C
a1 b1 c1
data.xlsx S2 sheet
A B C
a2 b2 c2
data.xlsx S3 sheet
A B C
a3 b3 c3
Here is my starting code:
s1 = read.xlsx('data.xlsx', sheetName='S1') # contains 2 rows
s2 = read.xlsx('data.xlsx', sheetName='S2') # contains 3 rows
s3 = read.xlsx('data.xlsx', sheetName='S3') # contains 4 rows
all = s1 + s2 + s3 # of course this is wrong code
I want all to contain:
A B C
a1 b1 c1
a2 b2 c2
a3 b3 c3
Do this:
rbind(s1, s2, s3)
This assumes s1, s2 and s3 have the same number of columns, with the same column names.
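When the number of sheets grows, the same rbind idea can be applied to a list of data frames with do.call. The following is a minimal sketch, not a definitive implementation: the three data frames are built in memory as stand-ins for the sheets (an assumption, so the example runs without data.xlsx), and the names sheets and all_data are hypothetical.

```r
# Stand-ins for the three sheets. In practice each element would come from
# read.xlsx('data.xlsx', sheetName = ...) as in the question.
sheets <- list(
  S1 = data.frame(A = "a1", B = "b1", C = "c1", stringsAsFactors = FALSE),
  S2 = data.frame(A = "a2", B = "b2", C = "c2", stringsAsFactors = FALSE),
  S3 = data.frame(A = "a3", B = "b3", C = "c3", stringsAsFactors = FALSE)
)

# rbind matches data frame columns by name, so every sheet must share the header.
all_data <- do.call(rbind, sheets)

# Drop the row names that rbind derives from the list names.
rownames(all_data) <- NULL

print(all_data)
```

Keeping the sheets in a list avoids one variable per sheet and works unchanged for any number of sheets.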
I have two text files:
First.txt
A1 B1 C1
A2 B1 C2
A3 B2 C2
A4 B3 C3
and
Second.txt
C1 D1
C2 D1
C3 D2
Here is the code, run in a Jupyter notebook:
!type First.txt Second.txt
The output is given as:
A1 B1 C1
A2 B1 C2
A3 B2 C2
A4 B3 C3C1 D1
C2 D1
C3 D2
but I want the output to be like:
A1 B1 C1
A2 B1 C2
A3 B2 C2
A4 B3 C3
C1 D1
C2 D1
C3 D2
How can I print the two text files one after the other?
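I would not resort to shell commands for this. This snippet should work; stripping each line's own newline keeps the output from being double-spaced, and it still starts Second.txt on a fresh line when First.txt lacks a trailing newline:
for filename in ["First.txt", "Second.txt"]:
    with open(filename) as f:
        for line in f:
            # print() adds exactly one newline per line
            print(line.rstrip("\n"))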
I have two data frames with different columns and rows:
One table has the names of specifically expressed genes in rows (750 entries) with statistical results (p-value, fold change) in columns (a 750x2 matrix).
The second table has the names of all expressed genes (13,000), with each gene's associated genes in the same row (rows can be up to 100 entries long), giving a 13,000x100 matrix.
I want to create a data frame that takes the 750 expressed gene names from the first table and uses a match function in R to pull in the associated genes from the second table.
Example:
First Data table
Name Fold Change P-value
A 0 3
B 1 4
F 2 6
H 1 8
Second Data table
Name X1 X2 X3 X4 X5
A A1 A2 A3 A4 A5
B B1 B2 B3 B4 B5
C C1 C2 C3 C4 C5
D D1 D2 D3 D4 D5
E E1 E2 E3 E4 E5
F F1 F2 F3 F4 F5
Desired Output
Name X1 X2 X3 X4 X5
A A1 A2 A3 A4 A5
B B1 B2 B3 B4 B5
D D1 D2 D3 D4 D5
F F1 F2 F3 F4 F5
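One way to approach this with base R's match() is sketched below, under stated assumptions: the two sample tables above are rebuilt by hand, the names table1, table2, idx and result are hypothetical, and names from the first table that have no row in the second table are simply dropped. With the sample data exactly as given, H has no entry in the second table and D is not in the first table, so the sketch returns the rows for A, B and F.

```r
# First table: gene names with their statistics (sample data from the question).
table1 <- data.frame(Name = c("A", "B", "F", "H"),
                     FoldChange = c(0, 1, 2, 1),
                     PValue = c(3, 4, 6, 8),
                     stringsAsFactors = FALSE)

# Second table: every gene name with its associated genes X1..X5.
genes <- c("A", "B", "C", "D", "E", "F")
table2 <- data.frame(Name = genes, stringsAsFactors = FALSE)
for (i in 1:5) {
  table2[[paste0("X", i)]] <- paste0(genes, i)
}

# match() gives, for each name in table1, its row position in table2 (NA if absent).
idx <- match(table1$Name, table2$Name)

# Keep only the matched rows, in the order of table1.
result <- table2[idx[!is.na(idx)], ]
rownames(result) <- NULL

print(result)
```

To keep unmatched names as rows filled with NA instead, index with idx directly and overwrite the Name column with table1$Name.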
I have different sequences of events for elements in a spreadsheet. The number of events differs from row to row.
I want to get the last event in each row and put it in a new column, as in the column "Last" below:
ev1 ev2 ev3 ev4 Last
A A1 A2 A3 NA A3
B B1 B2 NA NA B2
C C1 C2 C3 C4 C4
D D1 D2 D3 NA D3
E E1 NA NA NA E1
If any of the events in a row is "Delivered", I want to show Delivered instead of the last event.
You can try dplyr::coalesce on the whole data.frame, but you have to reverse the order of the columns. coalesce returns the first non-missing value scanning its arguments left to right, whereas you want the last (right-most) one. The solution could be:
library(dplyr)
df$Last <- coalesce(!!! df[ncol(df):1])
df
# ev1 ev2 ev3 ev4 Last
# A A1 A2 A3 <NA> A3
# B B1 B2 <NA> <NA> B2
# C C1 C2 C3 C4 C4
# D D1 D2 D3 <NA> D3
# E E1 <NA> <NA> <NA> E1
Data:
df <- read.table(text =
"ev1 ev2 ev3 ev4
A A1 A2 A3 NA
B B1 B2 NA NA
C C1 C2 C3 C4
D D1 D2 D3 NA
E E1 NA NA NA",
header = TRUE, stringsAsFactors = FALSE)
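The "Delivered" rule from the question can be layered on top of the last-event lookup. The following base R sketch is one possible way to do it, not the answer's method: the helper last_non_na and the variables m and delivered are hypothetical names, and one cell of the sample data is overwritten with "Delivered" purely so the rule has something to act on.

```r
df <- read.table(text =
"ev1 ev2 ev3 ev4
A A1 A2 A3 NA
B B1 B2 NA NA
C C1 C2 C3 C4
D D1 D2 D3 NA
E E1 NA NA NA",
header = TRUE, stringsAsFactors = FALSE)

# Hypothetical change to the sample data: row C gets a "Delivered" event.
df["C", "ev1"] <- "Delivered"

# Work on a character matrix of the event columns only.
m <- as.matrix(df[c("ev1", "ev2", "ev3", "ev4")])

# Last non-missing value of a row, or NA if the row is entirely missing.
last_non_na <- function(r) {
  r <- r[!is.na(r)]
  if (length(r) == 0) NA_character_ else r[[length(r)]]
}

df$Last <- unname(apply(m, 1, last_non_na))

# Rows containing "Delivered" anywhere override the last event.
delivered <- rowSums(m == "Delivered", na.rm = TRUE) > 0
df$Last[delivered] <- "Delivered"

print(df)
```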
I have a dataframe with n columns and I need to obtain combinations of its variables:
E.g.:
df <- data.frame(A = c("a1","a2","a3","a4","a5","a6"),
B = c("a1","a1","a3","a3","a5","a5"),
C = c("a1","a1","a1","a3","a4","a4"),
D = c("a1","a1","a1","a3","a4","a5"))
I need to make a list of n-1 elements, each containing the unique combinations of the dataframe variables. The first element holds the unique rows across all columns, from the first to the last. For each subsequent element I need to drop the first column of the previous element. Like this:
myList <- list(unique(df[, 1:ncol(df)]),
               unique(df[, 2:ncol(df)]),
               unique(df[, 3:ncol(df)]))
I managed to solve this with a for loop:
myList <- list()
for (i in 1:(ncol(df) - 1)){
  myList[[i]] <- unique(df[, i:ncol(df)])
}
but I was left wondering whether there was a faster and more elegant way to do this.
With sapply():
sapply(1:(ncol(df)-1),
FUN = function(x, nc, df) unique(df[, x:nc]), nc = ncol(df), df = df)
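A variant with lapply() is sketched here as an alternative, on the assumption that a plain list is always wanted: lapply never tries to simplify its result, and drop = FALSE keeps each element a data frame even when only one column is selected. The variable nc is a hypothetical helper name.

```r
df <- data.frame(A = c("a1", "a2", "a3", "a4", "a5", "a6"),
                 B = c("a1", "a1", "a3", "a3", "a5", "a5"),
                 C = c("a1", "a1", "a1", "a3", "a4", "a4"),
                 D = c("a1", "a1", "a1", "a3", "a4", "a5"),
                 stringsAsFactors = FALSE)

nc <- ncol(df)

# One element per starting column i: the unique rows of columns i..nc.
myList <- lapply(seq_len(nc - 1), function(i) unique(df[, i:nc, drop = FALSE]))

print(myList)
```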
An elegant solution would be a recursion:
func = function(df, n, lst)
{
  if(ncol(df)==n) return(lst)
  func(df, n+1, c(lst, list(unique(df[n:ncol(df)]))))
}
#> func(df,1, list())
#[[1]]
# A B C D
#1 a1 a1 a1 a1
#2 a2 a1 a1 a1
#3 a3 a3 a1 a1
#4 a4 a3 a3 a3
#5 a5 a5 a4 a4
#6 a6 a5 a4 a5
#[[2]]
# B C D
#1 a1 a1 a1
#3 a3 a1 a1
#4 a3 a3 a3
#5 a5 a4 a4
#6 a5 a4 a5
#[[3]]
# C D
#1 a1 a1
#4 a3 a3
#5 a4 a4
#6 a4 a5
Suppose the following situation. There are two tables, each with data of different quality. Both have the same variables A, B and C. The variables in the first table are called A1, B1 and C1, while those in the second table are called A2, B2 and C2.
The first table can be updated with the second table. There are six possible combinations:
A1, B1, C2
A1, B2, C1
A2, B1, C1
A1, B2, C2
A2, B1, C2
A2, B2, C1
The question is how to get that in R. What I'm using is what follows:
require(utils)
require(stringr)
vars <- c("A1", "B1", "C1", "A2", "B2", "C2")
combine <- function(data, n){
  com1 = combn(data, n)  # make all combinations
  com2 = c(str_sub(com1, end=-2L))  # remove the number in the end of the name
  com3 = matrix(com2, nrow = dim(com1)[1], ncol = dim(com1)[2])  # vector to matrix
  com3 = split(com3, rep(1:ncol(com3), each = nrow(com3)))  # matrix to list
  com3 = lapply(com3, duplicated)  # find list elements with duplicated names
  com3 = lapply(com3, function(X){X[which(!any(X == TRUE))]})  # identify duplicated names
  pos = which(as.numeric(com3) == 0)  # get position of duplicates
  com3 = com1[,pos]  # return elements from the original list
  com3 = split(com3, rep(1:ncol(com3), each = nrow(com3)))  # matrix to list
  com3 = lapply(com3, sort)  # sort by alphabetical order
  com3 = as.data.frame(com3, stringsAsFactors = FALSE)  # matrix to data frame
  res = list(positions = pos, combinations = com3)  # return position and combinations
  return(res)
}
combine(vars, 3)
$positions
[1] 1 4 6 10 11 15 17 20
$combinations
X1 X2 X3 X4 X5 X6 X7 X8
1 A1 A1 A1 A1 A2 A2 A2 A2
2 B1 B1 B2 B2 B1 B1 B2 B2
3 C1 C2 C1 C2 C1 C2 C1 C2
I'd like to know if anyone knows a more straightforward solution than creating all possible combinations and afterwards cleaning up the result as my function does.
You're overthinking the problem. Just use expand.grid:
> expand.grid(c('A1','A2'),c('B1','B2'),c('C1','C2'))
Var1 Var2 Var3
1 A1 B1 C1
2 A2 B1 C1
3 A1 B2 C1
4 A2 B2 C1
5 A1 B1 C2
6 A2 B1 C2
7 A1 B2 C2
8 A2 B2 C2
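expand.grid returns all eight combinations, including the two rows drawn entirely from one table. If only the six mixed combinations listed in the question are wanted, those two rows can be filtered out afterwards. This is a sketch under the assumption that the trailing digit of each name identifies its source table; grid, src and mixed are hypothetical names.

```r
grid <- expand.grid(A = c("A1", "A2"),
                    B = c("B1", "B2"),
                    C = c("C1", "C2"),
                    stringsAsFactors = FALSE)

# Source table of each value: the second character of names like "A1" or "B2".
src <- sapply(grid, function(col) substr(col, 2, 2))

# A row is mixed when its values come from more than one table.
mixed <- apply(src, 1, function(r) length(unique(r)) > 1)

print(grid[mixed, ])
```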