I have 70 CSV files with the same columns in a folder; each of them is about 0.5 GB. I want to import them all into a single data frame in R.
Normally I import each of them correctly as below:
df <- read_delim("file.csv", "|", escape_double = FALSE,
                 col_types = cols(pc_no = col_character(),
                                  id_key = col_character()),
                 trim_ws = TRUE)
To import all of them, I wrote the code below, but it fails with this error:
argument "delim" is missing, with no default
tbl <-
  list.files(pattern = "*.csv") %>%
  map_df(~read_delim("|", escape_double = FALSE,
                     col_types = cols(pc_no = col_character(),
                                      id_key = col_character()),
                     trim_ws = TRUE))
With read_csv it imports, but everything lands in a single column that contains all the column names and values:
tbl <-
  list.files(pattern = "*.csv") %>%
  map_df(~read_csv(., col_types = cols(.default = "c")))
In your second block of code you're missing the ., so read_delim interprets your arguments as read_delim(file = "|", delim = <nothing provided>, ...). Try:
tbl <- list.files(pattern = "*.csv") %>%
map_df(~ read_delim(., delim = "|", escape_double = FALSE,
col_types = cols(pc_no = col_character(), id_key = col_character()),
trim_ws = TRUE))
I explicitly named delim = here, though it's not strictly necessary. Had you done that in your first attempt, however, you would have seen
readr::read_delim(delim = "|", escape_double = FALSE,
col_types = cols(pc_no = col_character(), id_key = col_character()),
trim_ws = TRUE)
# Error in read_delimited(file, tokenizer, col_names = col_names, col_types = col_types, :
# argument "file" is missing, with no default
which is more indicative of the actual problem.
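As a side note, if you have readr >= 2.0, read_delim() can take a vector of file paths directly, so purrr isn't needed at all; the optional id argument records which file each row came from. A minimal, self-contained sketch (two small temp files stand in for the real 70 CSVs):

```r
library(readr)

# Two small demo files standing in for the real CSVs:
dir <- tempdir()
writeLines("pc_no|id_key|amount\n001|A|10", file.path(dir, "a.csv"))
writeLines("pc_no|id_key|amount\n002|B|20", file.path(dir, "b.csv"))

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# One call reads and row-binds every file; `id` adds the source path.
tbl <- read_delim(files, delim = "|", escape_double = FALSE,
                  col_types = cols(pc_no = col_character(),
                                   id_key = col_character()),
                  trim_ws = TRUE, id = "source_file")
nrow(tbl)  # one row per demo file, i.e. 2
```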
I am working with the SALES dataset and a trimmed copy of it (TEST). The problem I have is that the SALES file doubles in size when it is saved.
This happens only with the SALES file; when the same procedure is performed on the TEST file, the result is the same size as the original file.
I tried converting the file to a base R data frame, but the result is still the same.
Likewise, if I open the SALES_2 file and save it, its size doubles again.
This is the current code:
library(jsonlite)
library(lubridate)
library(tidyverse)
library(readr)
library(stringi)
library(stringr)
library(readxl)
options(scipen = 999)
SALES <- read_delim("C:/Users/edjca/OneDrive/FORMA/PRUEBA/SALES.csv",
                    delim = "|", escape_double = FALSE,
                    locale = locale(decimal_mark = ",", grouping_mark = "."),
                    trim_ws = TRUE)
TEST <- read_delim("C:/Users/edjca/OneDrive/FORMA/PRUEBA/Test.csv",
delim = "|", escape_double = FALSE,
locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)
data.table::fwrite(SALES, "C:/Users/edjca/OneDrive/FORMA/PRUEBA/SALES_2.csv", sep = "|", dec = ",")
data.table::fwrite(TEST, "C:/Users/edjca/OneDrive/FORMA/PRUEBA/Test_2.csv", sep = "|", dec = ",")
I attach a picture of the resulting files in my folder and of the object sizes in R.
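A quick diagnostic sketch (using the paths from the question) that can narrow down where the extra bytes come from; differences in quoting, decimal formatting, or line endings between read_delim() input and fwrite() output usually show up in the first few raw lines:

```r
# Hypothetical check on the question's own files:
orig <- "C:/Users/edjca/OneDrive/FORMA/PRUEBA/SALES.csv"
copy <- "C:/Users/edjca/OneDrive/FORMA/PRUEBA/SALES_2.csv"

# Compare the byte sizes directly:
file.size(orig)
file.size(copy)

# Compare the first few raw lines of each file as written on disk:
readLines(orig, n = 3)
readLines(copy, n = 3)
```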
I have several files in the format "XXX.qassoc" saved in a folder. I am trying to write a for loop that converts all of these files to .txt at once, like this:
A <- read.table(file = "XXX.qassoc", quote = "\"", comment.char = "",header=TRUE)
write.table(A, file = "XXX.txt", sep = "\t", row.names = FALSE)
Does anyone know what I can do? Thank you!
Set the working directory with setwd('path') and then:
library(rebus)
library(tidyverse)
library(stringr)
list.files() %>%
  str_subset("\\.qassoc$") %>%
  walk(~ {
    A <- read.table(file = .x, quote = "\"", comment.char = "", header = TRUE)
    write.table(A, file = str_replace(.x, "\\.qassoc$", ".txt"),
                sep = "\t", row.names = FALSE)
  })
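A variant of the same idea that avoids setwd() by asking list.files() for full paths ("path/to/folder" is a placeholder for your actual folder):

```r
library(purrr)

# Full paths mean the loop works regardless of the working directory:
files <- list.files(path = "path/to/folder", pattern = "\\.qassoc$",
                    full.names = TRUE)

walk(files, function(f) {
  A <- read.table(f, quote = "\"", comment.char = "", header = TRUE)
  write.table(A, sub("\\.qassoc$", ".txt", f), sep = "\t", row.names = FALSE)
})
```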
b <- data.frame(var1 = c(9.2, 3.5, 5.5), var2 = 1:3,
                row.names = c("a", "b", "c"))
write_tsv(b, path = result_path, na = "NA", append = T, col_names = T,
          quote_escape = "double")
b is exported as a TSV, but the row names are missing, and row.names = TRUE is not an argument of write_tsv.
What can I do to keep the row names?
Row names are never kept by any of the readr write_delim() functions. You can either add the row names to the data as a column or use write.table().
Add row names:
library(tibble)
write_tsv(b %>% rownames_to_column(), path = result_path, na = "NA",
          append = TRUE, col_names = TRUE, quote_escape = "double")
Or:
write.table(b, result_path, na = "NA", append = TRUE, col.names = TRUE, row.names = TRUE, sep = "\t", quote = TRUE)
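For reference, this is what rownames_to_column() does to the example data frame; by default the row names become a first column named "rowname":

```r
library(tibble)

b <- data.frame(var1 = c(9.2, 3.5, 5.5), var2 = 1:3,
                row.names = c("a", "b", "c"))

# Row names move into an ordinary column, so write_tsv() keeps them:
rownames_to_column(b)
#   rowname var1 var2
# 1       a  9.2    1
# 2       b  3.5    2
# 3       c  5.5    3
```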
I am currently trying to download public Treasury data, but my scrape only pulls the date column, the 20-year column, and the extrapolation factor. The 10-year column, situated in the middle of the table, is not included in the scrape and paste into Excel. My code is below (directory setup not included).
url <- "https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=longtermrateYear&year=2020"
ten_year_comp <- read_html(url, encoding = "table")
ten_year_comp %>%
  html_nodes("table") %>%
  .[[4]] %>%
  html_table(fill = TRUE) %>%
  write.xlsx(ten_year_comp, file = "TREASURY10YR.xlsx", sheetName = "ten_year_comp",
             col.names = TRUE, row.names = TRUE, asTable = TRUE, append = FALSE)
Assign the parsed table to a variable first and then write it. In your version you both pipe the table into write.xlsx() and pass ten_year_comp again as its first argument, so write.xlsx() receives the wrong objects in the wrong positions:

library(rvest)
library(openxlsx)  # provides write.xlsx() with the asTable argument

url <- "https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=longtermrateYear&year=2020"
ten_year_comp <- read_html(url) %>%
  html_nodes("table") %>%
  .[[4]] %>%
  html_table(fill = TRUE)
write.xlsx(
ten_year_comp,
file = "TREASURY10YR.xlsx",
sheetName = "ten_year_comp",
col.names = TRUE,
row.names = TRUE,
asTable = TRUE,
append = FALSE
)
I have a script that I am using to read multiple PDF files. Here is my code:
corpus_raw <- data.frame("company" = c(),"text" = c(), check.names = FALSE)
for (i in 1:length(pdf_list)) {
  print(i)
  document_text <- pdf_text(paste("V:/CodingProject2_FundOverview/", pdf_list[i], sep = "")) %>%
    strsplit("\r\n")
  document <- data.frame("company" = gsub(x = pdf_list[i], pattern = ".pdf", replacement = ""),
                         "text" = document_text,
                         stringsAsFactors = FALSE, check.names = FALSE)
  colnames(document) <- c("company", "text")
  corpus_raw <- rbind(corpus_raw, document)
}
I get the following error message:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 79, 56
I even tried keeping check.names = FALSE, but it seems like I am doing something wrong. Any help will be appreciated. Thanks.
I knew I was doing something silly. Anyway, I was able to figure out the answer on my own: wrapping the text in I() keeps it as a single list-column instead of letting data.frame() spread the list into columns of unequal length.
for (i in 1:length(pdf_list)) {
  print(i)
  document_text <- pdf_text(paste("V:/CodingProject2_FundOverview/", pdf_list[i], sep = "")) %>%
    strsplit("\r\n")
  document <- data.frame("company" = gsub(x = pdf_list[i], pattern = ".pdf", replacement = ""),
                         "text" = I(document_text),
                         stringsAsFactors = FALSE, check.names = FALSE)
  colnames(document) <- c("company", "text")
  corpus_raw <- rbind(corpus_raw, document)
}
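A minimal illustration of why I() fixes the error: data.frame() normally turns a list into one column per element, and pages with different numbers of lines (79 vs 56 in the original error) cannot be recycled into one frame. I() keeps the whole list as a single list-column, one row per page:

```r
# Two "pages" of unequal length, as strsplit() on pdf_text() would return:
pages <- list(c("line 1", "line 2", "line 3"),  # page with 3 lines
              c("line 1", "line 2"))            # page with 2 lines

# Without I(), data.frame(text = pages) errors with
# "arguments imply differing number of rows: 3, 2".
df <- data.frame(company = "fund_a", text = I(pages))
nrow(df)  # 2: one row per page, with text stored as a list-column
```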