I am working with the SALES dataset and a trimmed copy of it (TEST). The problem is that the SALES file doubles in size when it is saved.
This happens only with this file (SALES); when the same procedure is performed on the TEST file, the result has the same size as the original file.
I tried converting the file to a base R data frame and the result is still the same.
Similarly, if I open the SALES_2 file and save it, its size doubles again.
This is the current code:
library(jsonlite)
library(lubridate)
library(tidyverse)
library(readr)
library(stringi)
library(stringr)
library(readxl)
options(scipen = 999)
SALES <- read_delim("C:/Users/edjca/OneDrive/FORMA/PRUEBA/SALES.csv",
                    delim = "|", escape_double = FALSE,
                    locale = locale(decimal_mark = ",", grouping_mark = "."),
                    trim_ws = TRUE)
TEST <- read_delim("C:/Users/edjca/OneDrive/FORMA/PRUEBA/Test.csv",
                   delim = "|", escape_double = FALSE,
                   locale = locale(decimal_mark = ",", grouping_mark = "."),
                   trim_ws = TRUE)
data.table::fwrite(SALES, "C:/Users/edjca/OneDrive/FORMA/PRUEBA/SALES_2.csv", sep = "|", dec = ",")
data.table::fwrite(TEST, "C:/Users/edjca/OneDrive/FORMA/PRUEBA/Test_2.csv", sep = "|", dec = ",")
Attached is a picture of the resulting file sizes in my folder and the object sizes in R.
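Whatever fwrite changed on disk (quoting, line endings, the way numbers are printed) should be visible in the raw text. A minimal check, assuming the paths above:
# Compare on-disk sizes of the original and the re-saved copy
file.size("C:/Users/edjca/OneDrive/FORMA/PRUEBA/SALES.csv")
file.size("C:/Users/edjca/OneDrive/FORMA/PRUEBA/SALES_2.csv")
# Inspect the first few raw lines of each file to spot the difference
readLines("C:/Users/edjca/OneDrive/FORMA/PRUEBA/SALES.csv", n = 3)
readLines("C:/Users/edjca/OneDrive/FORMA/PRUEBA/SALES_2.csv", n = 3)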
I have several files in the "XXX.qassoc" format saved in a folder. I am trying to write a for loop that converts all of them to .txt at once, like this:
A <- read.table(file = "XXX.qassoc", quote = "\"", comment.char = "",header=TRUE)
write.table(A, file = "XXX.txt", sep = "\t", row.names = FALSE)
Does anyone know what I can do? Thank you!
Set the working directory with setwd('path') and then:
library(tidyverse)
library(stringr)

list.files() %>%
  str_subset("\\.qassoc$") %>%
  walk(~ {
    A <- read.table(file = .x, quote = "\"", comment.char = "", header = TRUE)
    write.table(A, file = str_replace(.x, "\\.qassoc$", ".txt"),
                sep = "\t", row.names = FALSE)
  })
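If you would rather not change the working directory, the same idea works with the folder passed straight to list.files(); this is a sketch assuming your files live in path/to/folder:
library(tidyverse)

list.files("path/to/folder", pattern = "\\.qassoc$", full.names = TRUE) %>%
  walk(~ {
    A <- read.table(file = .x, quote = "\"", comment.char = "", header = TRUE)
    write.table(A, file = str_replace(.x, "\\.qassoc$", ".txt"),
                sep = "\t", row.names = FALSE)
  })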
I am often in the situation where I have multiple files with identical structure but different content, which leaves me with ugly, repetitive read.table() lines. For example:
df1 <- read.table("file1.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df2 <- read.table("file2.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df3 <- read.table("file3.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df4 <- read.table("file4.tsv", fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
Is there a way to store the parameters in a variable, or somehow set a default, to avoid this repetitiveness? (Maybe not, and I've been writing too much Python lately.)
Naively, I tried
read_parameters <- c(fill = T, header = T, stringsAsFactors = F, quote = "", sep = "\t")
df1 <- read.table("file1.tsv", read_parameters)
but this gives the error Error in !header : invalid argument type.
Alternatively, I could run a loop over the files, but I have never figured out how to iteratively name data frames in a loop in R. In any case, I think an answer to this question would be useful to the community, as this seems like a common situation.
You could write a wrapper function for read.table and set the default parameters as you need them:
my.read.table <- function(temp.source, fill = TRUE, header = TRUE,
                          stringsAsFactors = FALSE, quote = "", sep = "\t") {
  read.table(temp.source, fill = fill, header = header,
             stringsAsFactors = stringsAsFactors, quote = quote, sep = sep)
}
Then you can call this function simply with:
df <- my.read.table("file1.tsv")
Or you could use lapply to call the same function on every source string:
sources.to.load <- c("file1.tsv", "file2.tsv", "file3.tsv")
df_list <- lapply(sources.to.load, read.table, fill = TRUE, header = TRUE,
                  stringsAsFactors = FALSE, quote = "", sep = "\t")
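As for the side question about naming data frames created in a loop: the usual approach is to keep them in a named list instead of creating df1, df2, ... by hand. A sketch (the df names are illustrative):
# Name each list element after its source file
names(df_list) <- sources.to.load
# Or, if you really want separate df1, df2, ... objects in the global
# environment (generally discouraged in favor of working with the list):
list2env(setNames(df_list, paste0("df", seq_along(df_list))),
         envir = globalenv())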
Edit:
If you want to keep the parameter vector method as well, you could add it to your wrapper function.
my.read.table2 <- function(temp.source, fill = TRUE, header = TRUE,
                           stringsAsFactors = FALSE, quote = "", sep = "\t",
                           parameterstring) {
  # Use missing() rather than exists(): exists("parameterstring") is TRUE
  # even when the argument was not supplied, and accessing it then errors.
  if (!missing(parameterstring)) {
    fill <- as.logical(parameterstring[1])
    header <- as.logical(parameterstring[2])
    stringsAsFactors <- as.logical(parameterstring[3])
    quote <- parameterstring[4]
    sep <- parameterstring[5]
    # To be stricter about the parameter names in the supplied vector:
    # sep <- parameterstring[which(names(parameterstring) == "sep")]
  }
  read.table(temp.source, fill = fill, header = header,
             stringsAsFactors = stringsAsFactors, quote = quote, sep = sep)
}
Then you can call this function simply with:
df <- my.read.table2("file1.tsv") # this will call the function with the default settings
df2 <- my.read.table2("file1.tsv", parameterstring = read_parameters) # this will overwrite the default settings by the parameters stored in read_parameters
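As an aside, and not part of the original answer: the plain vector failed because c() coerces the mixed logical and character values to character, and read.table() then received the whole vector as its second positional argument (header), hence Error in !header. Storing the parameters in a list and splicing them with do.call() avoids both problems:
# A list keeps logicals as logicals
read_parameters <- list(fill = TRUE, header = TRUE, stringsAsFactors = FALSE,
                        quote = "", sep = "\t")
# do.call() passes the list elements as named arguments to read.table()
df1 <- do.call(read.table, c(list(file = "file1.tsv"), read_parameters))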
I have 70 CSV files with the same columns in a folder; each of them is 0.5 GB.
I want to import them into a single data frame in R.
Normally I import each of them correctly, like this:
df <- read_delim("file.csv", "|", escape_double = FALSE,
                 col_types = cols(pc_no = col_character(),
                                  id_key = col_character()),
                 trim_ws = TRUE)
To import all of them at once, I wrote the following, but it fails with the error:
argument "delim" is missing, with no default
tbl <-
list.files(pattern = "*.csv") %>%
map_df(~read_delim("|", escape_double = FALSE, col_types = cols(pc_no = col_character(), id_key = col_character()), trim_ws = TRUE))
With read_csv the files are imported, but everything appears in a single column that contains all the columns and values (the files are pipe-delimited, so read_csv does not split them):
tbl <-
list.files(pattern = "*.csv") %>%
map_df(~read_csv(., col_types = cols(.default = "c")))
In your second block of code you're missing the ., so read_delim interprets your arguments as read_delim(file = "|", delim = <nothing provided>, ...). Try:
tbl <- list.files(pattern = "*.csv") %>%
map_df(~ read_delim(., delim = "|", escape_double = FALSE,
col_types = cols(pc_no = col_character(), id_key = col_character()),
trim_ws = TRUE))
I explicitly named delim= here, but it's not strictly necessary. Had you done that in your first attempt, however, you would have seen
readr::read_delim(delim = "|", escape_double = FALSE,
col_types = cols(pc_no = col_character(), id_key = col_character()),
trim_ws = TRUE)
# Error in read_delimited(file, tokenizer, col_names = col_names, col_types = col_types, :
# argument "file" is missing, with no default
which is more indicative of the actual problem.
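Given the volume (70 files at roughly 0.5 GB each), data.table may also be worth a try for speed; this is a sketch of the same read using fread() and rbindlist(), not from the original answer:
library(data.table)

files <- list.files(pattern = "\\.csv$")
# fread() autodetects most settings; sep and colClasses are set here
# to match the read_delim() call above
tbl <- rbindlist(lapply(files, fread, sep = "|",
                        colClasses = list(character = c("pc_no", "id_key"))))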
I'm trying to import data into R.
When I submit
Dataset <- read.table("Data.txt",
header = TRUE, sep = "\t", na.strings = "NA", dec = ".", strip.white = TRUE)
it works, but when I add row.names = 1 and submit
Dataset <- read.table("Data.txt",
header = TRUE, sep = "\t", na.strings = "NA", dec = ".", row.names = 1, strip.white = TRUE)
I get an error: ERREUR: <text> (the rest of the message is cut off; ERREUR is French for ERROR).
If your first call works, perhaps the easiest way would simply be:
Dataset <- read.table("Data.txt", header = TRUE, sep = "\t",
                      na.strings = "NA", dec = ".", strip.white = TRUE)
rownames(Dataset) <- Dataset[, 1]
Dataset <- Dataset[, -1]
Then the first column of Data.txt ends up as the row names of Dataset.
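Note that rownames<- also requires unique values, so if row.names = 1 failed because the first column contains duplicates, the workaround above fails the same way. Assuming duplicates are the culprit, make.unique() can de-duplicate the names first:
Dataset <- read.table("Data.txt", header = TRUE, sep = "\t",
                      na.strings = "NA", dec = ".", strip.white = TRUE)
# make.unique() appends .1, .2, ... to repeated values so they can be
# used as row names
rownames(Dataset) <- make.unique(as.character(Dataset[, 1]))
Dataset <- Dataset[, -1]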
I have two files, one data.csv and the other header.csv. How can I use header.csv to set the column names of the data frame read from data.csv?
This does not work:
data = read.table(path, sep = ";", quote = "",
dec = ".", head=FALSE, fill = TRUE, comment.char = "")
header = read.table(path_header, sep = ";", quote = "",
dec = ".", head=FALSE, fill = TRUE, comment.char = "")
colnames(data) <- header
And since I have more than 200 columns, it is not really convenient to use
colnames(data) <- c("A", "B", ...)
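A likely fix, assuming header.csv holds a single row of column names: read.table() returns a data.frame, and colnames<- expects a character vector, so the header has to be flattened first:
data <- read.table(path, sep = ";", quote = "", dec = ".",
                   header = FALSE, fill = TRUE, comment.char = "")
header <- read.table(path_header, sep = ";", quote = "", dec = ".",
                     header = FALSE, fill = TRUE, comment.char = "",
                     stringsAsFactors = FALSE)
# header is a one-row data.frame; unlist() turns it into the character
# vector that colnames() expects
colnames(data) <- as.character(unlist(header[1, ]))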