I am rather new to R and I am trying to prepare an interactive data table using the DT package. My data contains numeric values, but some of these values are preceded by < or > sign. What I want is for my data table is to allow interactive sorting on the numeric values, regardless of whether there is a < or > sign in front of it. So for example >10, <5, 9, >8 should sort to <5, >8, 9, >10.
My initial approach for this was to duplicate the column containing the numeric values with < and > signs, to remove the < and > signs from this duplicate column, and to convert this data to numeric values to obtain a column with only the numeric values. What I then would like is to be able to order the data in the table on these numeric values, but I want to be able to do this when clicking the ordening button of the column containing the numeric values with the < and > signs. Therefore, I want to hide the column containing only the numeric values (since I do not want this column to be present in the table), but I want to somehow link the ordining function of the original column to this hidden column.
Here are some example data and a script in which I have already duplicated the column (b to c), removed the < and > signs, and converted it to numeric values to obtain the column c, which I have then hidden:
library(DT)
df <- data.frame(a=1:5, b=c('10','5.0','2.0','< 1.0','> 20'), c=c(10,5,2,1,20))
DT <- DT::datatable(df,
options = list(columnDefs =
list(list(visible=FALSE,
targets=3))))
DT
I have not been able to find a way to sort the data in the table on this hidden column c by using the sorting button of column b.
I have found that this should be possible in JavaScript: jQuery DataTables - Ordering dates by hidden column
However, I am not able to figure out how to do the same in R, either by using a suitable function in R, or by providing it in JavaScript using the JS() function.
Could anyone help me with this problem?
Here is a solution using render:
library(DT)
render <- c(
"function(data, type, row){",
" if(type === 'sort'){",
" return parseFloat(data.match(/\\d+\\.?\\d+/)[0]);",
" }else{",
" return data;",
" }",
"}"
)
df <- data.frame(
a = 1:5,
b = c('10','5.0','2.0','< 1.0','> 20')
)
DT <- datatable(df,
options = list(
columnDefs = list(
list(render = JS(render), type = "num", targets = 2)
)
)
)
DT
This solution does not require a hidden column.
Here's a way to do it. To get the "sorting key" use order.
library(DT)
# df <- data.frame(a=1:5, b=c('10','5.0','2.0','< 1.0','> 20'), c=c(10,5,2,1,20))
df <- data.frame(a = 1:5, b = c('10', '5.0', '2.0', '< 1.0', '> 20'))
df
#ONE APPROACH
df$c <-
stringr::str_replace(string = df$b,
pattern = "[<>]",
replacement = "") %>%
as.numeric()
#ANOTHER APPROACH
df$c <- gsub("[<>]", "", df$b) %>% as.numeric()
DT::datatable(df[order(df$c), -3], rownames = FALSE)
library(DT)
df <- data.frame(a=1:5, b=c('10','5.0','2.0','< 1.0','> 20'), c=c(10,5,2,1,20))
DT <- DT::datatable(df,
options = list(columnDefs =
list(list(visible=FALSE, targets=3),
list(orderData=3, targets=2)
)))
DT
Note: This answer is based on this one here, but DT now uses R indexing instead of JS indexing.
Related
I have runtime data for various devices that can be widely different, ranging from a few minutes to several months that I would like to display in a datatable. So I thought the seconds_to_period function from lubridate provides a neat format to print this data. However, I seem unable to display it within a datatable from DT, which is what I want to do (within a shiny App).
Some example data:
library(lubridate)
library(DT)
names <- c("A","B","C","D","E","F")
timevec <- c(225,2250,22500,225000,2250000,22500000)
timevec <- seconds_to_period(timevec)
Writing this into a datatable without any formatting does not work as it only displays the seconds without considering the minutes/hours etc.:
##### This cuts off at the seconds -> useless
table <- data.frame(name = names, time = timevec)
my_table <- datatable(table)
Formatting the time column with formatDate also doesn't work since it is not a date or POSIXct object. I can print the desired format by typecasting it as a string, but then the sorting of the column doesn't work as it is sorted alphabetically:
##### This prints the period format, but sorting does not work
table <- data.frame(name = names, time = as.character(timevec))
my_table <- datatable(table)
and of course I could just print the total time in seconds, but as I said I find this very unintuitive to read:
##### This prints the seconds -> unintuitive to read
table <- data.frame(name = names, time = as.duration(timevec))
my_table <- datatable(table)
Any Ideas on how to achieve this or alternative suggestions how to intuitively display duration data?
solution by programming DT to sort a shown character column by a hidden numeric column via columnDefs
library(tidyverse)
library(lubridate)
library(DT)
names <- c("A", "B", "C", "D", "E", "F")
timevec_raw <- c(225, 2250, 22500, 225000, 2250000, 22500000)
timevec_period <- seconds_to_period(timevec_raw)
(table <- tibble(
name = names,
timenum = timevec_raw,
timechar = as.character(timevec_period)
)
)
my_table <- datatable(table,
options = list(
columnDefs = list(
list(
visible = FALSE, targets = 2
), # hide column 2 the numeric one
list(
orderData = c(2), # the ordering of column 3 comes from hidden column 2
targets = c(3)
)
)
)
)
I am trying to escape special character "--" in column of DT table but formatPercentage is not letting this happen , manually passing formatPercentage(c(2,3,5)) is working and i want to make it dynamic. so i am looking for a solution through which column having "--" can be displayed in DT table.
I have tried ifelse but doesn't work , this code is just a part of my function.
df <- mtcars[1:6,1:5]
df$drat <- "--"
df$disp <- "--"
datatable(df, escape = FALSE) %>%
formatPercentage(2:5)
so the actual problem is I am trying to mask one column in my DT table output but formatPercentage not providing the require output. so i am looking for a solution .
My function is big thats why i am unable to create a reproducible example
You can exclude the columns which has '--' in them.
library(DT)
all_cols <- 2:5
format_cols <- setdiff(all_cols, which(colSums(df == '--') > 0))
datatable(df, escape = FALSE) %>% formatPercentage(format_cols)
I have this log file that has about 1200 characters (max) on a line. What I want to do is read this first and then extract certain portions of the file into new columns. I want to extract rows that contain the text “[DF_API: input string]”.
When I read it and then filter based on the rows that I am interested, it almost seems like I am losing data. I tried this using the dplyr filter and using standard grep with the same result.
Not sure why this is the case. Appreciate your help with this. The code and the data is there at the following link.
Satish
Code is given below
library(dplyr)
setwd("C:/Users/satis/Documents/VF/df_issue_dec01")
sec1 <- read.delim(file="secondary1_aa_small.log")
head(sec1)
names(sec1) <- c("V1")
sec1_test <- filter(sec1,str_detect(V1,"DF_API: input string")==TRUE)
head(sec1_test)
sec1_test2 = sec1[grep("DF_API: input string",sec1$V1, perl = TRUE),]
head(sec1_test2)
write.csv(sec1_test, file = "test_out.txt", row.names = F, quote = F)
write.csv(sec1_test2, file = "test2_out.txt", row.names = F, quote = F)
Data (and code) is given at the link below. Sorry, I should have used dput.
https://spaces.hightail.com/space/arJlYkgIev
Try this below code which could give you a dataframe of filtered lines from your file based a matching condition.
#to read your file
sec1 <- readLines("secondary1_aa_small.log")
#framing a dataframe by extracting required lines from above file
new_sec1 <- data.frame(grep("DF_API: input string", sec1, value = T))
names(new_sec1) <- c("V1")
Edit: Simple way to split the above column into multiple columns
#extracting substring in between < & >
new_sec1$V1 <- gsub(".*[<\t]([^>]+)[>].*", "\\1", new_sec1$V1)
#replacing comma(,) with a white space
new_sec1$V1 <- gsub("[,]+", " ", new_sec1$V1)
#splitting into separate columns
new_sec1 <- strsplit(new_sec1$V1, " ")
new_sec1 <- lapply(new_sec1, function(x) x[x != ""] )
new_sec1 <- do.call(rbind, new_sec1)
new_sec1 <- data.frame(new_sec1)
Change columns names for your analysis.
I have a data base with 250 columns and want to read only 50 of them instead of loading all of them then dropping columns with dplyr::select. I suppose I can do that using a column specification. I don't want to type the column specification manually for all those columns.
The 50 columns I want to keep have a common prefix, say 'blop', so I managed to manually change the column specification object I got from readr::spec_csv. I then used it to read my data file :
short_colspec <- readr::spec_csv('myfile.csv')
short_colspec$cols <- lapply(names(short_colspec$cols), function(name){
if (substr(name, 1, 4) == 'blop'){
return(col_character())
} else {
return(col_skip())
}
})
short_data <- read_csv('myfile.csv', col_types = short_colspec)
Is there a way to specify such a column specification with readr (or any other package) functions in a more robust way than what I did ?
using data.table's fread you can select columns you want to skip (=drop) or keep (=select)
#read first line of file to select which columns to keep
#adjust the strsplit-character here ';' according to your csv-type
keep_col <- readLines( "myfile.csv", n = 1L ) %>% strsplit( ";" ) %>% el() %>% grep( "blop", . )
#read file, only the desired columns
fread( "myfile.csv", select = keep_col )
I am trying to highlight rows of an excel file based on a match from the columns in a separate excel file. Pretty much, I want to highlight a row in file1 if a cell in that row matches a cell in file2.
I saw the R package "conditionalFormatting" has some of this functionality, but I cannot figure out how to use it.
the pseudo-code i think would look something like this:
file1 <- read_excel("file1")
file2 <- read_excel("file2")
conditionalFormatting(file1, sheet = 1, cols = 1:end, rows = 1:22,
rule = "number in file1 is found in a specific column of file 2")
Please let me know if this makes sense or if i need to clarify something.
Thanks!
The conditionalFormatting() function embeds active conditional formatting into the excel document but is likely more complicated than you need for a one-time highlight. I'd suggest loading each file into a dataframe, determining which rows contain a matching cell, creating a highlight style (yellow background), loading the file as a workbook object, setting the appropriate rows to the highlight style, and saving the updated workbook object.
The following function is the used to determine which rows have a match. The magrittr package provides the %>% pipes and the data.table package provides the transpose() function.
find_matched_rows <- function(df1, df2) {
require(magrittr)
require(data.table)
# the dataframe object treats each column as a list making it much easier and
# faster to search via column than row. Transpose the original file1 dataframe
# to treat the rows as columns.
df1_transposed <- data.table::transpose(df1)
# assuming that the location of the match in the second file is irrelevant,
# unlist the file2 dataframe so that each value in file1 can be searched in a
# vector
df2_as_vector <- unlist(df2)
# determine which columns contain a match. If one or more matches are found,
# attribute the row as 'TRUE' in the output vector to be used to subset the
# row numbers
match_map <- lapply(df1_transposed,FUN = `%in%`, df2_as_vector) %>%
as.data.frame(stringsAsFactors = FALSE) %>%
sapply(function(x) sum(x) > 0)
# make a vector of row numbers using the logical match_map vector to subset
matched_rows <- seq(1:nrow(df1))[match_map]
return(matched_rows)
}
The following code loads the data, finds the matched rows, applies the highlight, and saves over the original file1.xlsx. The second tst_df1 and tst_df2 provide for an easy way of testing the find_matched_rows() function. As expected, it finds that the 1st and 3rd rows of the first dataframe contain a cell that matches a cell in second dataframe.
# used to ensure that the correct rows are highlighted. the dataframe does not
# include the header as an independent row unlike excel.
file1_header_row <- 1
file2_header_row <- 1
tst_df1 <- openxlsx::read.xlsx("./file1.xlsx",
startRow = file1_header_row)
tst_df2 <- openxlsx::read.xlsx("./file2.xlsx",
startRow = file2_header_row)
#example data for testing
tst_df1 <- data.frame(fname = c("John", "Bob", "Bill"),
lname = c("Smith", "Johnson", "Samson"),
wage = c(10, 15.23, 137.38),
stringsAsFactors = FALSE)
tst_df2 <- data.frame(a = c(10, 34, 284.2),
b = c("Billy", "Bill", "Billy-Bob"),
c = c("Samson", "Johansson", NA),
stringsAsFactors = FALSE)
df_matched_rows <- find_matched_rows(tst_df1, tst_df2)
# any color found in colours() can be used here or hex color beginning with "#"
highlight_style <- openxlsx::createStyle(fgFill = "yellow")
file1_wb <- openxlsx::loadWorkbook(file = "./file1.xlsx")
openxlsx::addStyle(wb = file1_wb,
sheet = 1,
style = highlight_style,
rows = file1_header_row + df_matched_rows,
cols = 1:ncol(tst_df1),
stack = TRUE,
gridExpand = TRUE)
openxlsx::saveWorkbook(wb = file1_wb,
file = "./file1.xlsx",
overwrite = TRUE)