data.table alternative for dplyr mutate_? - r

I have the following R code, which uses dplyr.
Because the data is large, we want to use data.table instead.
test <- function(Act, mac, type, thisYear){
Act %>%
mutate_(var = type) %>%
filter(var == mac) %>%
filter(floor_date(as.Date(submit_ts), 'year') == thisYear)
}
Act is as follows
| submit_ts | col1 | col2 |
| ------------- |---------------|-------|
| '2015-01-01' | 'x' | 1000 |
| '2015-01-01' | 'y' | 200 |
| '2015-01-01' | 'x' | 200 |
Basically, the function works as follows:
test(act, 'x', 'col1', 2015)
result is as follows
| submit_ts | col1 | col2 |
| ------------- |---------------|-------|
| '2015-01-01' | 'x' | 1000 |
| '2015-01-01' | 'x' | 200 |
test(act, 200, 'col2', 2015)
result is as follows
| submit_ts | col1 | col2 |
| ------------- |---------------|-------|
| '2015-01-01' | 'y' | 200 |
| '2015-01-01' | 'x' | 200 |
How should I do it using data.table ?

We can do a similar approach in data.table with
library(data.table)
library(lubridate)
test1 <- function(Act, mac, type, thisYear){
# rename the column given in `type` to "var", then filter on value and year
setnames(setDT(Act), type, "var")[
var==mac & year(floor_date(as.Date(submit_ts), "year"))==thisYear]
}
test1(dat, 2, "val", thisYear)
# submit_ts var
#1: 2013-05-05 2
#2: 2013-05-12 2
NOTE: floor_date() returns a Date (the first day of the year), not a four-digit year, so it is wrapped in year() before comparing with thisYear.
data
dat <- data.frame(submit_ts= c("2013-05-05", "2012-05-10", "2013-05-12"),
val = c(2, 1, 2), stringsAsFactors=FALSE)
thisYear <- 2013
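If renaming the column is not wanted, one alternative is to look the column up by name with get() inside the data.table filter. This is a minimal sketch rather than the answer's method: filter_by_col is a hypothetical name, and act recreates the small table from the question.

```r
library(data.table)

# Hypothetical helper (not from the original answer): filter on a column
# whose name is passed as a string, plus the year of submit_ts.
filter_by_col <- function(Act, mac, type, thisYear) {
  DT <- as.data.table(Act)  # nothing below modifies DT by reference
  # get(type) resolves the column named in `type`; year() is data.table's helper
  DT[get(type) == mac & year(as.Date(submit_ts)) == thisYear]
}

# Sample data mirroring the table in the question
act <- data.frame(
  submit_ts = c("2015-01-01", "2015-01-01", "2015-01-01"),
  col1 = c("x", "y", "x"),
  col2 = c(1000, 200, 200),
  stringsAsFactors = FALSE
)

res_chr <- filter_by_col(act, "x", "col1", 2015)
res_num <- filter_by_col(act, 200, "col2", 2015)
```

Because the column keeps its original name, the result has the same columns as the input, which matches the output shown in the question.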

Related

Display long table with hiding records

I want to show a table that displays only the top n and bottom n records when the table is very long.
df <- nycflights13::flights
funct <- function(data, var){
var_lab(data[[var]])<-"Table 1"
t1<- expss::cro_cpct(data[[var]])
t1
}
funct(data=df,var="distance")
# I tried like below but still doesn't work
t1<- expss::cro_cpct(df[["distance"]]) %>% filter(row_number() <= 10 | row_number() >= (n() - 10)) %>%
add_row(.after = 10)
t2 <- t1 %>% mutate(across(everything(), as.character))
t3 <- t2 %>% mutate(across(everything(), ~replace_na(t2, "...")))
I want to add a parameter that trims the table. For example, with n = 10 it should show the first 10 and the last 10 records and drop the rest, without changing the original percentage values.
Not very nice, but works for me:
library(expss)
df <- nycflights13::flights
funct <- function(data, var){
var_lab(data[[var]])<-"Table 1"
t1<- expss::cro_cpct(data[[var]])
t1
}
res = funct(data=df,var="distance")
res = add_rows(
head(res, 10),
NA,
tail(res, 10)
)
# All row labels are located in the first column separated with '|'.
# We need to replace the last label with '...'.
# That's why we have this regular expression here.
res$row_labels[11] = gsub("\\|[^|]+$", "|...", res$row_labels[1])
# I don't recommend using the line below because it converts all numerics to characters.
# It can complicate the further processing.
# It's better to leave all columns except row_labels as is, e. g. filled with NA
res[11, -1] = '...'
res
# | | | #Total |
# | ------- | ------------ | -------------------- |
# | Table 1 | 17 | 0.000296933273154857 |
# | | 80 | 0.014549730384588 |
# | | 94 | 0.28980687459914 |
# | | 96 | 0.180238496804998 |
# | | 116 | 0.131541440007601 |
# | | 143 | 0.130353706914982 |
# | | 160 | 0.111646910706226 |
# | | 169 | 0.161828633869397 |
# | | 173 | 0.0656222533672233 |
# | | 184 | 1.63432073544433 |
# | | ... | ... |
# | | 2475 | 3.34406252227 |
# | | 2521 | 0.0843290495759793 |
# | | 2565 | 1.52237689146495 |
# | | 2569 | 0.0976910468679478 |
# | | 2576 | 0.0926431812243153 |
# | | 2586 | 2.43604057296244 |
# | | 3370 | 0.00237546618523885 |
# | | 4963 | 0.108380644701523 |
# | | 4983 | 0.101551179418961 |
# | | #Total cases | 336776 |
Filter and add_row in between the top and bottom rows:
library(dplyr)
df <- nycflights13::flights
df %>%
select(carrier, distance) %>%
arrange(desc(distance )) %>%
# keep the first 10 and the last 10 rows (n() - 9 through n())
filter(row_number() <= 10 | row_number() > (n() - 10)) %>%
mutate(across(everything(), as.character)) %>%
add_row(.after = 10, carrier = "...", distance = "...") %>%
writexl::write_xlsx(., "table.xlsx")
If you want an SPSS-style format, you could build it manually with the janitor package, e.g.
df %>%
janitor::tabyl(distance ) %>%
select(-n) %>%
arrange(desc(distance )) %>%
janitor::adorn_totals() %>%
janitor::adorn_pct_formatting() %>%
filter(row_number() <= 10 | row_number() >= (n() - 10)) %>%
add_row(.after = 10) %>%
as_tibble() %>%
mutate(across(everything(), as.character)) %>%
mutate(across(everything(), ~replace_na(.x, "...")))
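The head/gap/tail idea used in the answers above can be wrapped in a small function with an n parameter. The following is a rough base R sketch under stated assumptions: trim_rows is a hypothetical helper, not part of expss or janitor, and the gap row is left as NA so that numeric columns stay numeric.

```r
# Hypothetical helper: keep the first n and last n rows of a data frame
# and put a single all-NA row between them as a gap marker.
trim_rows <- function(tbl, n = 10) {
  if (nrow(tbl) <= 2 * n) return(tbl)   # nothing to trim
  top <- head(tbl, n)
  bottom <- tail(tbl, n)
  gap <- top[1, , drop = FALSE]
  gap[1, ] <- NA                        # NA keeps each column's type
  out <- rbind(top, gap, bottom)
  rownames(out) <- NULL
  out
}

# Toy frequency table standing in for the real one
tab <- data.frame(value = 1:50, pct = rep(2, 50))
trimmed <- trim_rows(tab, n = 10)
```

The NA row can be turned into "..." at printing or export time, which avoids converting every column to character earlier than necessary.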

How can I select cases based on time data?

I am new to using R and I'm stumbling upon a few problems which I can't seem to solve on my own. I can't figure out how I can select cases based on time units.
I want to select cases where Time_D - Time_A is 5 seconds or more (within the same row, for each individual).
For instance my data frame consists of the following data:
+---+------------+----------+----------+
|   | Individual | Time_A   | Time_D   |
+---+------------+----------+----------+
| 1 | A          | 09:21:27 | 09:21:28 |
| 2 | A          | 09:21:29 | 09:21:40 |
| 3 | A          | 09:21:30 | 09:21:36 |
| 4 | B          | 09:32:14 | 09:32:23 |
| 5 | B          | 09:32:18 | 09:32:22 |
+---+------------+----------+----------+
And I want to only select the cases where Time_D - Time_A >= 5 seconds to get the following data frame:
+---+------------+----------+----------+
|   | Individual | Time_A   | Time_D   |
+---+------------+----------+----------+
| 2 | A          | 09:21:29 | 09:21:40 |
| 3 | A          | 09:21:30 | 09:21:36 |
| 4 | B          | 09:32:14 | 09:32:23 |
+---+------------+----------+----------+
I have already coded for time:
DT <- as.data.table(df3)[, Time_A := as.ITime(Time_A)][, Time_D := as.ITime(Time_D)]
After converting the columns to ITime (which stores times as whole seconds), you can subtract Time_A from Time_D and keep the rows where the difference is at least 5 seconds.
library(data.table)
cols <- c('Time_A', 'Time_D')
setDT(df)[, (cols) := lapply(.SD, as.ITime), .SDcols = cols]
df[(Time_D - Time_A) >= 5]
# Individual Time_A Time_D
#1: A 09:21:29 09:21:40
#2: A 09:21:30 09:21:36
#3: B 09:32:14 09:32:23
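In base R, you can do this with as.POSIXct. Calling difftime() with units = 'secs' keeps the comparison in seconds even when some differences are large.
subset(df, difftime(as.POSIXct(Time_D, format = '%T'),
as.POSIXct(Time_A, format = '%T'), units = 'secs') >= 5)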
We can use tidyverse
library(dplyr)
library(lubridate)
df1 %>%
filter(period_to_seconds(hms(Time_D)) - period_to_seconds(hms(Time_A)) >=5)
# Individual Time_A Time_D
#1 A 09:21:29 09:21:40
#2 A 09:21:30 09:21:36
#3 B 09:32:14 09:32:23
data
df1 <- structure(list(Individual = c("A", "A", "A", "B", "B"),
Time_A = c("09:21:27",
"09:21:29", "09:21:30", "09:32:14", "09:32:18"), Time_D = c("09:21:28",
"09:21:40", "09:21:36", "09:32:23", "09:32:22")), class = "data.frame",
row.names = c(NA,
-5L))
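For reference, the same filter can also be written without any date-time classes by converting the HH:MM:SS strings to seconds by hand. This is only a sketch: to_secs is a hypothetical helper, and it assumes every value is a well-formed HH:MM:SS string within a single day.

```r
# Same data as df1 above
df1 <- data.frame(
  Individual = c("A", "A", "A", "B", "B"),
  Time_A = c("09:21:27", "09:21:29", "09:21:30", "09:32:14", "09:32:18"),
  Time_D = c("09:21:28", "09:21:40", "09:21:36", "09:32:23", "09:32:22"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "HH:MM:SS" -> seconds since midnight
to_secs <- function(x) {
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  as.numeric(parts[, 1]) * 3600 + as.numeric(parts[, 2]) * 60 + as.numeric(parts[, 3])
}

df1$gap_secs <- to_secs(df1$Time_D) - to_secs(df1$Time_A)
res <- df1[df1$gap_secs >= 5, ]
```

Keeping the difference as a column (gap_secs) makes it easy to check which rows were dropped and why.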

For each combination of a set of variables in a list, calculating correlations between this combination and another variable in R

In R, I want to generate correlation coefficients by comparing two variables whilst also retaining a phylogenetic signal.
The way I first thought of doing this is not computationally efficient, and I think there is a much simpler way, but I do not have the R skills to do it.
I have a csv file which looks like this:
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Species | OGT | Domain | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Aeropyrum pernix | 95 | Archaea | 9.7659115711 | 0.6720465616 | 4.3895390781 | 7.6501943794 | 2.9344881615 | 8.8666657183 | 1.5011817208 | 5.6901432494 | 4.1428307243 | 11.0604191603 | 2.21143353 | 1.9387130928 | 5.1038552753 | 1.6855017182 | 7.7664358772 | 6.266067034 | 4.2052190807 | 9.2692433532 | 1.318690698 | 3.5614200159 |
| Argobacterium fabrum | 26 | Bacteria | 11.5698896021 | 0.7985475923 | 5.5884500155 | 5.8165463343 | 4.0512504104 | 8.2643271309 | 2.0116736244 | 5.7962804605 | 3.8931525401 | 9.9250463349 | 2.5980609708 | 2.9846761128 | 4.7828063605 | 3.1262365491 | 6.5684282943 | 5.9454781844 | 5.3740045968 | 7.3382308193 | 1.2519739683 | 2.3149400984 |
| Anaeromyxobacter dehalogenans | 27 | Bacteria | 16.0337898849 | 0.8860252895 | 5.1368827707 | 6.1864992608 | 2.9730203513 | 9.3167603253 | 1.9360386851 | 2.940143349 | 2.3473650439 | 10.898494736 | 1.6343905351 | 1.5247123262 | 6.3580285706 | 2.4715303021 | 9.2639057482 | 4.1890063803 | 4.3992339725 | 8.3885969061 | 1.2890166336 | 1.8265589289 |
| Aquifex aeolicus | 85 | Bacteria | 5.8730327277 | 0.795341216 | 4.3287799008 | 9.6746388172 | 5.1386954322 | 6.7148035486 | 1.5438364179 | 7.3358775924 | 9.4641440609 | 10.5736658776 | 1.9263080969 | 3.6183861236 | 4.0518679067 | 2.0493569604 | 4.9229955632 | 4.7976564501 | 4.2005259246 | 7.9169763709 | 0.9292167138 | 4.1438942987 |
| Archaeoglobus fulgidus | 83 | Archaea | 7.8742687687 | 1.1695110027 | 4.9165979364 | 8.9548767369 | 4.568636662 | 7.2640358917 | 1.4998752909 | 7.2472039919 | 6.8957233203 | 9.4826333048 | 2.6014466253 | 3.206476915 | 3.8419576418 | 1.7789787933 | 5.7572748236 | 5.4763351139 | 4.1490633048 | 8.6330814159 | 1.0325605451 | 3.6494619148 |
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
For each possible combination of the 20 single-letter columns (amino acids, so 2^20 - 1 = 1,048,575, roughly 1 million, combinations), I want to calculate the correlation between the summed percentages of that combination and the OGT variable in the CSV, whilst retaining a phylogenetic signal.
My current code is this:
library(parallel)
library(dplyr)
library(tidyr)
library(magrittr)
library(ape)
library(geiger)
library(caper)
taxonomynex <- read.nexus("taxonomyforzeldospecies.nex")
zeldodata <- read.csv("COMPLETECOPYFORR.csv")
Species <- dput(zeldodata)
SpeciesLong <-
Species %>%
gather(protein, proportion,
A:Y) %>%
arrange(Species)
S <- unique(SpeciesLong$protein)
Scombi <- unlist(lapply(seq_along(S),
function(x) combn(S, x, FUN = paste0, collapse = "")))
joint_protein <- function(protein_combo, data){
sum(data$proportion[vapply(data$protein,
grepl,
logical(1),
protein_combo)])
}
SplitSpecies <-
split(SpeciesLong,
SpeciesLong$Species)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c("Scombi", "joint_protein"))
SpeciesAggregate <-
parLapply(cl,
X = SplitSpecies,
fun = function(data){
X <- lapply(Scombi,
joint_protein,
data)
names(X) <- Scombi
as.data.frame(X)
})
Species <- cbind(Species, SpeciesAggregate)
This attempts to load every combination into memory and then calculate the sum of the proportions of the acids in each one, but it takes forever and crashes before completion.
I think it would be better to store the correlation coefficients in a vector and then print out the coefficient for each combination, but I don't know the best way of doing this in R.
I also aim to retain a phylogenetic signal using the ape package, with something along the lines of this:
pglsModel <- gls(OGT ~ AminoAcidCombination, correlation = corBrownian(phy = taxonomynex),
data = zeldodata, method = "ML")
summary(pglsModel)
Apologies for how unclear this is, if anyone has any advice, much appreciated!
Edit: Link to taxonomyforzeldospecies.nex
Output from dput(Zeldodata):
1 Species OGT Domain A C D E F G H I K L M N P Q R S T V W Y
------------------------------- ----- ---------- --------------- -------------- -------------- -------------- -------------- -------------- -------------- -------------- -------------- --------------- -------------- -------------- -------------- -------------- -------------- -------------- -------------- -------------- -------------- --------------
2 Aeropyrum pernix 95 Archaea 9.7659115711 0.6720465616 4.3895390781 7.6501943794 2.9344881615 8.8666657183 1.5011817208 5.6901432494 4.1428307243 11.0604191603 2.21143353 1.9387130928 5.1038552753 1.6855017182 7.7664358772 6.266067034 4.2052190807 9.2692433532 1.318690698 3.5614200159
3 Argobacterium fabrum 26 Bacteria 11.5698896021 0.7985475923 5.5884500155 5.8165463343 4.0512504104 8.2643271309 2.0116736244 5.7962804605 3.8931525401 9.9250463349 2.5980609708 2.9846761128 4.7828063605 3.1262365491 6.5684282943 5.9454781844 5.3740045968 7.3382308193 1.2519739683 2.3149400984
4 Anaeromyxobacter dehalogenans 27 Bacteria 16.0337898849 0.8860252895 5.1368827707 6.1864992608 2.9730203513 9.3167603253 1.9360386851 2.940143349 2.3473650439 10.898494736 1.6343905351 1.5247123262 6.3580285706 2.4715303021 9.2639057482 4.1890063803 4.3992339725 8.3885969061 1.2890166336 1.8265589289
5 Aquifex aeolicus 85 Bacteria 5.8730327277 0.795341216 4.3287799008 9.6746388172 5.1386954322 6.7148035486 1.5438364179 7.3358775924 9.4641440609 10.5736658776 1.9263080969 3.6183861236 4.0518679067 2.0493569604 4.9229955632 4.7976564501 4.2005259246 7.9169763709 0.9292167138 4.1438942987
6 Archaeoglobus fulgidus 83 Archaea 7.8742687687 1.1695110027 4.9165979364 8.9548767369 4.568636662 7.2640358917 1.4998752909 7.2472039919 6.8957233203 9.4826333048 2.6014466253 3.206476915 3.8419576418 1.7789787933 5.7572748236 5.4763351139 4.1490633048 8.6330814159 1.0325605451 3.6494619148
This will give you a long data frame with each combination and its sum per Species (it takes about 35 seconds on my machine):
zeldodata <-
Species %>%
gather(protein, proportion, A:Y) %>%
group_by(Species) %>%
mutate(combo = sapply(1:n(), function(i) combn(protein, i, FUN = paste0, collapse = ""))) %>%
mutate(sum = sapply(1:n(), function(i) combn(proportion, i, FUN = sum))) %>%
unnest() %>%
select(-protein, -proportion)
Here is an example of calculating each species separately and saving the data to disk, before reading each file back in and combining them:
library(readr)
library(dplyr)
library(tidyr)
library(purrr)
# read in CSV file
zeldodata <-
read_delim(
delim = "|",
trim_ws = TRUE,
col_names = TRUE,
col_types = "cicdddddddddddddddddddd",
file = "Species | OGT | Domain | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y
Aeropyrum pernix | 95 | Archaea | 9.7659115711 | 0.6720465616 | 4.3895390781 | 7.6501943794 | 2.9344881615 | 8.8666657183 | 1.5011817208 | 5.6901432494 | 4.1428307243 | 11.0604191603 | 2.21143353 | 1.9387130928 | 5.1038552753 | 1.6855017182 | 7.7664358772 | 6.266067034 | 4.2052190807 | 9.2692433532 | 1.318690698 | 3.5614200159
Argobacterium fabrum | 26 | Bacteria | 11.5698896021 | 0.7985475923 | 5.5884500155 | 5.8165463343 | 4.0512504104 | 8.2643271309 | 2.0116736244 | 5.7962804605 | 3.8931525401 | 9.9250463349 | 2.5980609708 | 2.9846761128 | 4.7828063605 | 3.1262365491 | 6.5684282943 | 5.9454781844 | 5.3740045968 | 7.3382308193 | 1.2519739683 | 2.3149400984
Anaeromyxobacter dehalogenans | 27 | Bacteria | 16.0337898849 | 0.8860252895 | 5.1368827707 | 6.1864992608 | 2.9730203513 | 9.3167603253 | 1.9360386851 | 2.940143349 | 2.3473650439 | 10.898494736 | 1.6343905351 | 1.5247123262 | 6.3580285706 | 2.4715303021 | 9.2639057482 | 4.1890063803 | 4.3992339725 | 8.3885969061 | 1.2890166336 | 1.8265589289
Aquifex aeolicus | 85 | Bacteria | 5.8730327277 | 0.795341216 | 4.3287799008 | 9.6746388172 | 5.1386954322 | 6.7148035486 | 1.5438364179 | 7.3358775924 | 9.4641440609 | 10.5736658776 | 1.9263080969 | 3.6183861236 | 4.0518679067 | 2.0493569604 | 4.9229955632 | 4.7976564501 | 4.2005259246 | 7.9169763709 | 0.9292167138 | 4.1438942987
Archaeoglobus fulgidus | 83 | Archaea | 7.8742687687 | 1.1695110027 | 4.9165979364 | 8.9548767369 | 4.568636662 | 7.2640358917 | 1.4998752909 | 7.2472039919 | 6.8957233203 | 9.4826333048 | 2.6014466253 | 3.206476915 | 3.8419576418 | 1.7789787933 | 5.7572748236 | 5.4763351139 | 4.1490633048 | 8.6330814159 | 1.0325605451 | 3.6494619148"
)
# save an RDS file for each species
for(species in unique(zeldodata$Species)) {
zeldodata %>%
filter(Species == species) %>%
gather(protein, proportion, A:Y) %>%
mutate(combo = sapply(1:n(), function(i) combn(protein, i, FUN = paste0, collapse = ""))) %>%
mutate(sum = sapply(1:n(), function(i) combn(proportion, i, FUN = sum))) %>%
unnest() %>%
select(-protein, -proportion) %>%
saveRDS(file = paste0(species, ".RDS"))
}
# read in and combine all the RDS files
zeldodata <-
list.files(pattern = "\\.RDS") %>%
map(read_rds) %>%
bind_rows()
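Once the combination sums exist, the plain (non-phylogenetic) correlation with OGT can be computed for all combinations at once. The sketch below is an illustration under stated assumptions, not the answer's method: it uses only three of the amino-acid columns, with values rounded from the question's table, builds the sums with a 0/1 indicator matrix, and ignores the phylogenetic correction entirely.

```r
# Toy data: 5 species, 3 amino-acid columns (rounded from the table above)
ogt <- c(95, 26, 27, 85, 83)
aa <- cbind(
  A = c(9.77, 11.57, 16.03, 5.87, 7.87),
  C = c(0.67, 0.80, 0.89, 0.80, 1.17),
  D = c(4.39, 5.59, 5.14, 4.33, 4.92)
)

k <- ncol(aa)
# One column per non-empty subset of the k amino acids (1 = included)
ind <- t(as.matrix(expand.grid(rep(list(0:1), k))))[, -1, drop = FALSE]
rownames(ind) <- colnames(aa)
colnames(ind) <- apply(ind, 2, function(z) paste(rownames(ind)[z == 1], collapse = ""))

# species x combinations matrix of summed percentages
sums <- aa %*% ind

# Pearson correlation of each combination's sum with OGT
cors <- cor(sums, ogt)[, 1]
```

With all 20 columns the indicator matrix would have 2^20 - 1 columns, so it would likely need to be processed in chunks; a phylogenetically corrected fit such as the gls() model in the question would still have to be run separately.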

Converting comma separated list to dataframe

If I have a list similar to x <- c("Name,Age,Gender", "Rob,21,M", "Matt,30,M"), how can I convert it to a data frame in which Name, Age, and Gender become the column headers?
Currently my approach is,
dataframe <- data.frame(matrix(unlist(x), nrow=3, byrow=T))
which gives me
matrix.unlist.user_data...nrow...num_rows..byrow...T.
1 Name,Age,Gender
2 Rob,21,M
3 Matt,30,M
and doesn't help me at all.
How can I get something which resembles the following from the list mentioned above?
+------+-----+--------+
| name | age | gender |
+------+-----+--------+
| ...  | ... | ...    |
| ...  | ... | ...    |
+------+-----+--------+
We paste the strings into a single string with \n and use either read.csv or read.table from base R
read.table(text=paste(x, collapse='\n'), header = TRUE, stringsAsFactors = FALSE, sep=',')
Alternatively,
data.table::fread(paste(x, collapse = "\n"))
Name Age Gender
1: Rob 21 M
2: Matt 30 M
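As a small variation on the base R approach, the text argument of read.csv also accepts a character vector with one element per line, so the paste step can be skipped. A minimal sketch using the vector from the question:

```r
x <- c("Name,Age,Gender", "Rob,21,M", "Matt,30,M")

# Each element of x is read as one line; the first line supplies the header
df <- read.csv(text = x, stringsAsFactors = FALSE)
```

Column types are guessed from the data, so Age comes back as an integer column while Name and Gender stay character.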

Extracting columns from text file

I load a text file (tree.txt) to R, with the below content (copy pasted from JWEKA - J48 command).
I use the following command to load the text file:
data3 <-read.table (file.choose(), header = FALSE,sep = ",")
I would like to put each column into a separate variable named COL1, COL2 ... COL8 (in this example there are 8 columns). If you load the file into Excel as delimited text, each token of a row lands in its own column; this is the result I need.
Each COLn will contain the relevant characters of the tree in this example.
How can I split the text file into these columns automatically while ignoring the header and footer content of the file?
Here is the text file content:
[[1]]
J48 pruned tree
------------------
MSTV <= 0.4
| MLTV <= 4.1: 3 -2
| MLTV > 4.1
| | ASTV <= 79
| | | b <= 1383:00:00 2 -18
| | | b > 1383
| | | | UC <= 05:00 1 -2
| | | | UC > 05:00 2 -2
| | ASTV > 79:00:00 3 -2
MSTV > 0.4
| DP <= 0
| | ALTV <= 09:00 1 (170.0/2.0)
| | ALTV > 9
| | | FM <= 7
| | | | LBE <= 142:00:00 1 (27.0/1.0)
| | | | LBE > 142
| | | | | AC <= 2
| | | | | | e <= 1058:00:00 1 -5
| | | | | | e > 1058
| | | | | | | DL <= 04:00 2 (9.0/1.0)
| | | | | | | DL > 04:00 1 -2
| | | | | AC > 02:00 1 -3
| | | FM > 07:00 2 -2
| DP > 0
| | DP <= 1
| | | UC <= 03:00 2 (4.0/1.0)
| | | UC > 3
| | | | MLTV <= 0.4: 3 -2
| | | | MLTV > 0.4: 1 -8
| | DP > 01:00 3 -8
Number of Leaves : 16
Size of the tree : 31
An example of the COL1 content will be:
MSTV
|
|
|
|
|
|
|
|
MSTV
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
COL2 content will be:
MLTV
MLTV
|
|
|
|
|
|
>
DP
|
|
|
|
|
|
|
|
|
|
|
|
DP
|
|
|
|
|
|
Try this:
cleaned.txt <- capture.output(cat(paste0(tail(head(readLines("FILE_LOCATION"), -4), -4), collapse = '\n'), sep = '\n'))
cleaned.df <- read.fwf(file = textConnection(cleaned.txt),
header = FALSE,
widths = rep.int(4, max(nchar(cleaned.txt)/4)),
strip.white= TRUE
)
cleaned.df <- cleaned.df[,colSums(is.na(cleaned.df))<nrow(cleaned.df)]
For the cleaning step, I use a combination of head and tail to remove the 4 lines at the top and the 4 lines at the bottom. There's probably a more efficient way to do this outside of R, but this isn't so bad. Generally, I'm just making the file readable to R.
Your file looks like a fixed-width file so I use read.fwf, and use textConnection() to point the function to the cleaned output.
Finally, I'm not sure how your data is actually structured, but when I copied it from stackoverflow, it pasted with a bunch of whitespace at the end of each line. I'm using some tricks to guess at how long the file is, and removing extraneous columns over here
widths = rep.int(4, max(nchar(cleaned.txt)/4))
cleaned.df <- cleaned.df[,colSums(is.na(cleaned.df))<nrow(cleaned.df)]
Next, I'm creating the data in the way you would like it structured.
for (i in colnames(cleaned.df)) {
assign(i, subset(cleaned.df, select=i))
assign(i, capture.output(cat(paste0(unlist(get(i)[get(i)!=""])),sep = ' ', fill = FALSE)))
}
rm(i)
rm(cleaned.df)
rm(cleaned.txt)
This loops over each column name in your data frame.
Inside the loop, assign() first puts the data of each column into its own data frame. In your case, they are named V1 through V15.
Next, a combination of cat() and paste0() with unlist() and capture.output() concatenates each column into a single string, so each of these objects is now a character vector instead of a data frame.
Keep in mind that because you wanted a space at each new character, I'm using a space as a separator. But because this is a fixed-width file, some columns are completely blank, which I'm removing using
get(i)[get(i)!=""]
(Your question said you wanted COL2 to be: MLTV MLTV | | | | | | > DP | | | | | | | | | | | | DP | | | | | |).
If we just use get(i), there will be a leading whitespace in the output.
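A different way to get one column per token is to split each line on whitespace and pad the shorter lines, instead of relying on fixed widths. This is only a sketch under assumptions: the four lines are copied from the top of the tree in the question, the header and footer are taken to be removed already, and the COL1, COL2, ... names follow the naming asked for in the question.

```r
# A few lines from the tree body, header and footer already removed
tree_lines <- c(
  "MSTV <= 0.4",
  "| MLTV <= 4.1: 3 -2",
  "| MLTV > 4.1",
  "| | ASTV <= 79"
)

# Split every line into whitespace-separated tokens
tokens <- strsplit(trimws(tree_lines), "\\s+")
width <- max(lengths(tokens))

# Pad each line with "" so all rows have the same number of columns
mat <- t(vapply(
  tokens,
  function(tk) c(tk, rep("", width - length(tk))),
  character(width)
))
colnames(mat) <- paste0("COL", seq_len(width))
cols <- as.data.frame(mat, stringsAsFactors = FALSE)
```

Each column of cols can then be used directly (for example cols$COL1) rather than creating separate variables with assign().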
