How do I drop the row numbers when displaying R tibble through the DT package?
The options = list(rownames = FALSE) argument doesn't seem to work. I also "made up" options = list(rownumbers = FALSE), which didn't work either. I messed around with things like select(2:everything()), with no luck. Maybe piping in remove_rownames() at the end will work... it does not. Perhaps as.data.frame() piped at the end... Nope.
library(tidyverse)
library(DT)
datatable(mtcars %>% head() %>% as_tibble(), options = list(rownames = FALSE))
library(tidyverse)
library(DT)
datatable(mtcars %>% head() %>% as_tibble(), rownames = FALSE)
You just need to move rownames = FALSE out of the options list: it is a direct argument of datatable() itself, not a DataTables option, so inside options it was being silently ignored.
Could you please help with getting a row gap (a blank line) between the sections displayed in the report? I am using the r2rtf package along with tidyverse.
For example, using mtcars: I have a rowname column and want to display the data with a gap between these rowname groups.
library(r2rtf)
library(tidyverse)

mtcars$rowname <- row.names(mtcars)

mtcars %>%
  rtf_body() %>%
  rtf_encode() %>%
  write_rtf('cars.rtf')
It can be handled in the data manipulation step with dplyr, before the data reach r2rtf: just add "\n" at the end of each value.
mtcars %>%
  mutate(across(everything(), function(x) paste(as.character(x), "\n"))) %>%
  rtf_body() %>%
  rtf_encode() %>%
  write_rtf('cars.rtf')
I'm not really familiar with the r2rtf package, but you could try the text_space_after argument of rtf_body:
library(r2rtf)
mtcars$rowname <- row.names(mtcars)
mtcars[1:5, c("rowname", "mpg", "cyl")] |>
  rtf_body(text_space_after = 200) |>
  rtf_encode() |>
  write_rtf('cars.rtf')
Created on 2022-10-03 with reprex v2.0.2
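A note on the units, in case 200 looks arbitrary: RTF expresses vertical spacing in twips (1/20 of a point), so text_space_after = 200 should amount to roughly 10pt of space below each row, assuming r2rtf passes the value straight through to RTF's \sa control word.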
Is there a quick and easy way, using dplyr, to add a column called 'site_id' that is populated from the year in each filename, when using map_df from the purrr package to combine the data into one data frame?
For example, my.files will pick up two csv files:
"H:/Documents/2015.csv" and "H:/Documents/2021.csv"
my.files <- list.files(my.path, pattern = "*.csv", full.names = TRUE)
I then use map_df to combine all the data into one data frame, but would like to create an additional column called 'site_id' that populates each row from a given file with that file's original title, e.g. 2015 or 2021.
I currently merge the .csv files together with this code:
temp.df <- my.files %>% map_df(~read.csv(., skip = 15))
I envisage using mutate to help, but am unsure how it would work...
temp.df <- my.files %>%
  map_df(~ read.csv(., skip = 15) %>%
           mutate(site_id = ????))
Any help is much appreciated.
We may use imap_dfr if we want to use mutate:
library(dplyr)
library(purrr)
setNames(my.files, my.files) %>%
  imap_dfr(~ read.csv(.x, skip = 15) %>%
             mutate(site_id = .y))
Or specify the .id argument in map_dfr:
setNames(my.files, my.files) %>%
  map_dfr(read.csv, skip = 15, .id = "site_id")
Using purrr & dplyr:
temp.df <- my.files %>%
  purrr::set_names() %>%
  purrr::map(~ read.csv(.x, skip = 15)) %>%
  dplyr::bind_rows(.id = "site_id")
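All the answers above label site_id with the full file path. If you want just the year part of the filename (e.g. 2015 or 2021), one option, sketched here under the same read.csv assumptions, is to name the vector with the stripped filenames first:
site_labels <- tools::file_path_sans_ext(basename(my.files))   # "2015", "2021"
temp.df <- my.files %>%
  setNames(site_labels) %>%
  map_dfr(read.csv, skip = 15, .id = "site_id")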
I have a moderate-sized data set that is 1000 rows by 81 columns. I'd like to use the output from str(), but I'd like to present it in a "prettier" way. I've tried things like this:
df %>% str() %>% kableExtra::kbl() %>% kableExtra::kable_minimal()
and
tbl_summary(as.data.frame(str(df)))
but neither works. I'm not married to str() or to any specific package, but that's the kind of summary I'm going for.
In the end, this is intended to generate an HTML file, but I'd like it to work with PDF output as well.
Any ideas on how to do this?
Update II:
This can be achieved by making use of this gist:
devtools::source_gist('4a0a5ab9fe7e1cf3be0e')
# strtable() is defined by the sourced gist
print(strtable(iris, factor.values = as.integer), na.print = '') %>%
  kable() %>%
  htmlTable()
Update I:
you could extend it:
data.frame(variable = names(iris),
           type = sapply(iris, typeof),
           class = sapply(iris, class),
           first_values = sapply(iris, function(x) paste0(head(x), collapse = ", ")),
           unique_values = sapply(iris, function(x) paste0(unique(x), collapse = ", ")),
           row.names = NULL) %>%
  kable() %>%
  htmlTable()
First answer:
Something like this using iris dataset:
library(knitr)
library(magrittr)
library(htmlTable)
data.frame(variable = names(iris),
           type = sapply(iris, typeof),
           first_values = sapply(iris, function(x) paste0(head(x), collapse = ", ")),
           row.names = NULL) %>%
  kable() %>%
  htmlTable()
skimr and gt (or kable, or flextable, or DT, or many other table packages) could also work here:
mtcars |>
  skimr::skim() |>
  gt::gt()
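For the PDF side of the question: the skim() result is an ordinary data frame, so it can be trimmed and passed to knitr::kable(), which renders in both HTML and LaTeX output. A minimal sketch, assuming an all-numeric frame like mtcars (the numeric.* summary columns exist only for numeric variables):
library(dplyr)

mtcars |>
  skimr::skim() |>
  select(skim_variable, n_missing, numeric.mean, numeric.sd) |>
  knitr::kable(digits = 2)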
I'm trying to extract a table from a PDF with the R tabulizer package. The functions run fine, but they don't capture all the data in the table. My code is below:
library(tabulizer)
library(tidyverse)
library(abjutils)
D_path = "https://github.com/financebr/files/raw/master/Compacto09-08-2019.pdf"
out <- extract_tables(D_path, encoding = 'UTF-8')
# Normalise the column names: lower-case, trim, snake_case,
# '%' -> 'p', drop 'R$', and strip accents
arrumar_nomes <- function(x) {
  x %>%
    tolower() %>%
    str_trim() %>%
    str_replace_all('[[:space:]]+', '_') %>%
    str_replace_all('%', 'p') %>%
    str_replace_all('r\\$', '') %>%
    abjutils::rm_accent()
}
tab_tidy <- out %>%
  map(as_tibble) %>%
  bind_rows() %>%
  set_names(arrumar_nomes(.[1, ])) %>%
  slice(-1) %>%
  mutate(across(everything(), str_squish))  # collapse repeated whitespace and trim
Comparing the PDF table (D_path) with the tab_tidy data frame, you can see that some information is missing: the first column of each table, whose cells are merged, is not picked up by extract_tables(), and neither are the rows containing the "Boi Gordo" and "Boi Magro" information.
The rest is extracted perfectly. Do you know why this happens and how to solve it? The existing questions here on this topic haven't received much of an answer.
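Not a full answer, but one thing worth trying: extract_tables() also takes method and guess arguments, and tables with merged cells are often recovered by one extraction algorithm but not the other. A sketch (whether it recovers the "Boi Gordo"/"Boi Magro" rows depends on this particular PDF):
# Force the stream algorithm instead of letting tabulizer decide
out_stream <- extract_tables(D_path, encoding = 'UTF-8', method = "stream")

# Or turn off table-area guessing and extract the whole page,
# then clean up the columns afterwards
out_full <- extract_tables(D_path, encoding = 'UTF-8', guess = FALSE)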
allcsvs = list.files(pattern = "*.csv$", recursive = TRUE)
library(tidyverse)
##LOOP to redact the snow data csvs##
for (x in 1:length(allcsvs)) {
  df = read.csv(allcsvs[x], check.names = FALSE)
  newdf = df %>%
    gather(COL_DATE, SNOW_DEPTH, -PT_ID, -DATE) %>%
    mutate(
      DATE = as.Date(DATE, format = "%m/%d/%Y"),
      COL_DATE = as.Date(COL_DATE, format = "%Y.%m.%d")
    ) %>%
    filter(DATE == COL_DATE) %>%
    select(-COL_DATE)
  ####TURN DATES UNAMBIGUOUS HERE####
  df$DATE = lubridate::mdy(df$DATE)
  finaldf = merge(newdf, df, all.y = TRUE)
  write.csv(finaldf, allcsvs[x])
  df = read.csv(allcsvs[x])
  newdf = df[, -grep("X20", colnames(df))]
  write.csv(newdf, allcsvs[x])
}
I am using the code above to populate a new column row by row with values from different existing columns, using the date as the selection criterion. If I manually open each .csv in Excel and delete the first column, this code works great. However, if I run it on the .csvs as is, I get the following message:
Error: Column 1 must be named
So far I've tried putting -rownames within the parentheses of gather(), and putting remove_rownames %>% below newdf = df %>%, but nothing seems to work. I also tried reading the csv without the first column ([, -1]) and deleting the first column in R (df[, 1] <- NULL), but for some reason either of those makes my code return an empty table instead of what I want. In other words, if I delete the rownames in Excel it works great; if I delete them in R, something funky happens.
Here is some sample data: https://drive.google.com/file/d/1RiMrx4wOpUdJkN4il6IopciSF6pKeNLr/view?usp=sharing
Consider importing the files with readr::read_csv.
An easy solution with tidyverse:
allcsvs %>%
  map(read_csv) %>%
  reduce(bind_rows) %>%
  gather(COL_DATE, SNOW_DEPTH, -PT_ID, -DATE) %>%
  mutate(
    DATE = as.Date(DATE, format = "%m/%d/%Y"),
    COL_DATE = as.Date(COL_DATE, format = "%Y.%m.%d")
  ) %>%
  filter(DATE == COL_DATE) %>%
  select(-COL_DATE)
With utils::read.csv, strings are imported as factors, and as.Date(DATE, format = "%m/%d/%Y") then evaluates to NA.
Update
The solution above returns one single data frame. To write each data file separately, use a for loop:
for (x in 1:length(allcsvs)) {
  read_csv(allcsvs[x]) %>%
    gather(COL_DATE, SNOW_DEPTH, -PT_ID, -DATE) %>%
    mutate(
      COL_DATE = as.Date(COL_DATE, format = "%Y.%m.%d")
    ) %>%
    filter(DATE == COL_DATE) %>%
    select(-COL_DATE) %>%
    write_csv(paste('tidy', allcsvs[x], sep = '_'))
}
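A side note on the original error: Error: Column 1 must be named almost certainly comes from the unnamed row-name column that utils::write.csv adds by default, which gather() then refuses. readr::write_csv never writes row names; if you stay with base R, pass row.names = FALSE explicitly:
write.csv(finaldf, allcsvs[x], row.names = FALSE)  # keeps the header free of an unnamed column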
Comparison
purrr::map and purrr::reduce can be used instead of a for loop in some cases. These higher-order functions take other functions as arguments.
readr::read_csv is typically ~10x faster than the base R equivalents (more info: http://r4ds.had.co.nz/data-import.html). It also never converts strings to factors, which avoids the NA problem described above.
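As a minimal illustration of passing functions as arguments: reduce() folds a binary function over a vector or list, which is exactly what combines the per-file data frames above (list_of_dfs below is a placeholder for the list produced by map(allcsvs, read_csv)):
library(purrr)
reduce(1:4, `+`)                        # ((1 + 2) + 3) + 4 = 10
reduce(list_of_dfs, dplyr::bind_rows)   # the same folding idea with data frames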