I have almost 400 dataframes loaded in R. But the names still have the .csv extension.
I read the data with this code
Files <- list.files(pattern="\\.csv$")
for (i in 1:length(Files)){
assign(Files[i],
read.csv(Files[i],
sep = ";",
header = T))
}
Is there a way to remove the .csv extension while importing the datasets?
Thanks a lot!
Here is a way that doesn't use assign, which is likely much better practice. You can keep the file names as the element names of the list.
library(tidyverse)
files <- list.files(pattern="\\.csv$")
df_list <- map(files, read_csv2)
names(df_list) <- str_remove(files, "\\.csv$")
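For anyone who would rather not load the tidyverse, here is a minimal base-R sketch of the same idea. The temp directory and the file names `a.csv` and `b.csv` are made up for illustration; `tools::file_path_sans_ext` strips the extension.

```r
# Hypothetical demo files in a temp directory (semicolon-separated)
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
write.csv2(data.frame(x = 1:2), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv2(data.frame(x = 3:4), file.path(demo_dir, "b.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into one list, then name the elements after the files
df_list <- lapply(files, read.csv2)
names(df_list) <- tools::file_path_sans_ext(basename(files))

names(df_list)   # element names without the .csv extension
df_list[["a"]]   # access one data frame by name
```

Keeping the data frames in one named list means you can loop over them with lapply later, instead of tracking 400 separate objects.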
Try this:
Files <- list.files(pattern="\\.csv$")
for (i in 1:length(Files)){
assign(gsub("\\..*", "", Files[i]), # replace this line of your code
read.csv(Files[i],
sep = ";",
header = T))
}
You may want to add an extra gsub step:
Files <- list.files(pattern="\\.csv$")
File.name <- gsub("\\.csv$", "", Files)
for (i in 1:length(Files)){
assign(File.name[i],
read.csv(Files[i],
sep = ";",
header = T))
}
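The loop-based answers above use two different patterns, `\\..*` and `\\.csv$`, and they are not equivalent when a file name contains more than one dot. A small sketch with made-up file names:

```r
# Hypothetical file names, including one with an extra dot
Files <- c("sales.csv", "sales.2019.csv")

first_dot <- gsub("\\..*", "", Files)    # cuts everything from the FIRST dot
csv_only  <- gsub("\\.csv$", "", Files)  # removes only a trailing .csv

first_dot
csv_only
```

With the first pattern both files end up with the same name, so the second assign would silently overwrite the first.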
I have written the following function to combine 300 .csv files. My directory name is "specdata". I have done the following steps for execution,
x <- function(directory) {
dir <- directory
data_dir <- paste(getwd(),dir,sep = "/")
files <- list.files(data_dir,pattern = '\\.csv')
tables <- lapply(paste(data_dir,files,sep = "/"), read.csv, header = TRUE)
pollutantmean <- do.call(rbind , tables)
}
# Step 2: call the function
x("specdata")
# Step 3: inspect results
head(pollutantmean)
Error in head(pollutantmean) : object 'pollutantmean' not found
What is my mistake? Can anyone please explain?
There's a lot of unnecessary code in your function. You can simplify it to:
load_data <- function(path) {
files <- dir(path, pattern = '\\.csv', full.names = TRUE)
tables <- lapply(files, read.csv)
do.call(rbind, tables)
}
pollutantmean <- load_data("specdata")
Be aware that do.call + rbind is relatively slow. You might find dplyr::bind_rows or data.table::rbindlist to be substantially faster.
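A quick way to try the function without the real data, assuming a temp directory as a stand-in for "specdata". The file names and the `ID`/`sulfate` columns are made up for illustration.

```r
load_data <- function(path) {
  files <- dir(path, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv)
  do.call(rbind, tables)
}

# Hypothetical stand-in for "specdata": two small files with the same columns
demo_dir <- tempfile("specdata_")
dir.create(demo_dir)
write.csv(data.frame(ID = 1, sulfate = c(2.5, 3.1)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(ID = 2, sulfate = 4.0),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# The function returns the combined data frame; assign it at the call site
pollutantmean <- load_data(demo_dir)
nrow(pollutantmean)  # rows from both files stacked together
```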
To update Prof. Wickham's answer above with code from the more recent purrr library which he coauthored with Lionel Henry:
Tbl <-
list.files(pattern = "\\.csv$") %>%
map_df(~read_csv(.))
If the column type guessing causes trouble, you can force all the columns to be read as character with this:
Tbl <-
list.files(pattern = "\\.csv$") %>%
map_df(~read_csv(., col_types = cols(.default = "c")))
If you are wanting to dip into subdirectories to construct your list of files to eventually bind, then be sure to include the path name, as well as register the files with their full names in your list. This will allow the binding work to go on outside of the current directory. (Thinking of the full pathnames as operating like passports to allow movement back across directory 'borders'.)
Tbl <-
list.files(path = "./subdirectory/",
pattern = "\\.csv$",
full.names = TRUE) %>%
map_df(~read_csv(., col_types = cols(.default = "c")))
As Prof. Wickham describes here (about halfway down):
map_df(x, f) is effectively the same as do.call("rbind", lapply(x, f)) but under the hood is much more efficient.
and a thank you to Jake Kaupp for introducing me to map_df() here.
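One detail worth knowing when building the file list: `list.files` treats `pattern` as a regular expression, not a shell-style glob. A small sketch using base R's `glob2rx` to translate a glob; the file names are made up.

```r
# Hypothetical file names
f <- c("data.csv", "notes_csv.txt", "data.csv.bak")

rx <- glob2rx("*.csv")        # translate the glob into a regular expression
rx

glob_hits <- grepl(rx, f)     # anchored: only names ending in .csv

# An unanchored ".csv" means "any character followed by csv"
loose_hits <- grepl(".csv", f)

glob_hits
loose_hits
```

The anchored pattern only picks up `data.csv`, while the loose one also matches the `.txt` and `.bak` files.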
This can be done very succinctly with dplyr and purrr from the tidyverse. Where x is a list of the names of your csv files you can simply use:
bind_rows(map(x, read.csv))
Mapping read.csv to x produces a list of dfs that bind_rows then neatly combines!
```{r echo = FALSE, warning = FALSE, message = FALSE}
setwd("~/Data/R/BacklogReporting/data/PastDue/global/") ## where file are located
path = "~/Data/R/BacklogReporting/data/PastDue/global/"
out.file <- NULL ## start empty; rbind(NULL, df) returns df, whereas "" would add a blank first row
file.names <- dir(path, pattern = ".csv")
for(i in 1:length(file.names)){
file <- read.csv(file.names[i], header = TRUE, stringsAsFactors = FALSE)
out.file <- rbind(out.file, file)
}
write.csv(out.file, file = "~/Data/R/BacklogReporting/data/PastDue/global/global_stacked/past_due_global_stacked.csv", row.names = FALSE) ## directory to write stacked file to
past_due_global_stacked <- read.csv("C:/Users/E550143/Documents/Data/R/BacklogReporting/data/PastDue/global/global_stacked/past_due_global_stacked.csv", stringsAsFactors = FALSE)
files <- list.files(pattern = "\\.csv$") %>% t() %>% paste(collapse = ", ")
```
If your csv files are in another directory, you could use something like this:
library(data.table)
readFilesInDirectory <- function(directory, pattern){
  # full.names = TRUE returns paths that already include the directory
  files <- list.files(path = directory, pattern = pattern, full.names = TRUE)
  temp <- lapply(files, fread, sep = ",")
  rbindlist(temp)
}
In your current function, pollutantmean is available only in the scope of the function x. Modify your function to this:
x <- function(directory) {
dir <- directory
data_dir <- paste(getwd(),dir,sep = "/")
files <- list.files(data_dir,pattern = '\\.csv')
tables <- lapply(paste(data_dir,files,sep = "/"), read.csv, header = TRUE)
# envir = .GlobalEnv is required; without it assign() creates the variable inside the function
assign('pollutantmean', do.call(rbind, tables), envir = .GlobalEnv)
}
With envir = .GlobalEnv, assign puts the result of do.call(rbind, tables) into a variable called pollutantmean in the global environment.
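The underlying scoping rule can be seen in a few lines. This is only an illustration; `make_total` and `local_sum` are made-up names.

```r
make_total <- function() {
  local_sum <- 1 + 2   # 'local_sum' exists only while the function runs
  local_sum            # return the value to the caller
}

result <- make_total()           # capture the returned value
leaked <- exists("local_sum")    # did the local name leak out?

result
leaked
```

Returning the value and assigning it at the call site is usually easier to follow than writing into the global environment from inside a function.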
I am attempting to load multiple text files into R and in each of the files, the columns are divided using the "|" character.
To give a sense of what the file structure looks like, a given row will look like:
congression printer|-182.431552949032
In this file I want to separate the congressional printer string from the numerical characters.
When using the following code:
folder <- '~/filepath'
file_list <- list.files(path=folder, pattern="*.txt")
data <-
do.call('rbind',
lapply(file_list,
function(x)
read.table(paste(folder, x, sep= ""),
header = TRUE, row.names = NULL)))
It'll load in the data as:
[1] [2]
congression printer|-182.431552949032
Is there a way to correct this later using the tidyr::separate() function, or by heading off the problem at the beginning? When I tried putting sep = "|" in the code above, it only affected how my text files are found, so that doesn't really work.
Things are always easier (and more powerful) with data.table :
library(data.table)
folder <- '~/filepath'
pathsList <- list.files(path=folder, pattern="*.txt", full.names = T)
rbindlist(lapply(pathsList, fread))
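This works too. Note that sep = "|" goes inside read.table, and file.path adds the "/" between the folder and the file name:
folder <- '~/filepath'
file_list <- list.files(path=folder, pattern="\\.txt$")
data <-
do.call('rbind',
lapply(file_list,
function(x)
read.table(file.path(folder, x), sep = "|",
header = TRUE, row.names = NULL)))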
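To check the separator handling without touching any files, `read.table` can parse a string through its `text` argument. The header names `term` and `score` below are assumptions; the question does not show the real header.

```r
# One line in the shape shown in the question, plus a made-up header
txt <- "term|score\ncongression printer|-182.431552949032"

# sep = "|" splits on the pipe, so the space inside the term is preserved
df <- read.table(text = txt, sep = "|", header = TRUE)
str(df)
```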
I want to apply the below script to every file in the Weather directory and copy the changes back to the same csv file (Bladen.csv in this case).
Bladen <- read.csv("C:/Users//Desktop/Weather/Bladen.csv",header=T, na.strings=c("","NA"))
Bladen <- Bladen[,c(1,6,11,17,18,19)]
I would try something like this:
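setwd('/address/to/the/path')
files <- dir(pattern = "\\.csv$") # only pick up the csv files
for(i in files){
Bladen <- read.csv(i, header=TRUE, na.strings=c("","NA"))
Bladen <- Bladen[,c(1,6,11,17,18,19)]
write.csv(Bladen, i, row.names = FALSE) # row.names = FALSE avoids writing an extra index column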
}
Please tell me if it works for you.
If you want to update each file in your directory by applying the same column selection to each file and writing it back to the same directory:
setwd(set_your_path) # set_your_path is a placeholder for your directory
filenames <- list.files(pattern = "\\.csv$")
lapply(filenames, function(i){
  Bladen <- read.csv(i, header = TRUE, na.strings = c("NA","N/A","null",""," "))
  Bladen <- Bladen[, c(1,6,11,17,18,19)]
  # write.csv always uses a comma and ignores sep; row.names = FALSE avoids an extra index column
  write.csv(Bladen, i, row.names = FALSE)
})
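One thing to watch when writing back over the source file: `write.csv` includes row names by default, so a read/write round trip gains an extra first column. A small sketch with a made-up data frame and a temp file:

```r
tmp <- tempfile(fileext = ".csv")
df <- data.frame(a = 1:2, b = c("x", "y"))

write.csv(df, tmp)                      # default: row names are written too
with_rn <- names(read.csv(tmp))         # picks up an extra first column

write.csv(df, tmp, row.names = FALSE)   # suppress row names
without_rn <- names(read.csv(tmp))      # same columns as the original

with_rn
without_rn
```

This matters here because the script selects columns by position, so an extra leading column would shift every index on the next run.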
I have a list of approximately 500 csv files each with a filename that consists of a six-digit number followed by a year (ex. 123456_2015.csv). I would like to append all files together that have the same six-digit number. I tried to implement the code suggested in this question:
Import and rbind multiple csv files with common name in R but I want the appended data to be saved as new csv files in the same directory as the original files are currently saved. I have also tried to implement the below code however the csv files produced from this contain no data.
rm(list=ls())
filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test")
NAPS_ID <- gsub('.+?\\([0-9]{5,6}?)\\_.+?$', '\\1', filenames)
Unique_NAPS_ID <- unique(NAPS_ID)
n <- length(Unique_NAPS_ID)
for(j in 1:n){
curr_NAPS_ID <- as.character(Unique_NAPS_ID[j])
NAPS_ID_pattern <- paste(".+?\\_(", curr_NAPS_ID,"+?)\\_.+?$", sep = "" )
NAPS_filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test", pattern = NAPS_ID_pattern)
write.csv(do.call("rbind", lapply(NAPS_filenames, read.csv, header = TRUE)),file = paste("C:/Users/smithma/Desktop/PM25_test/MERGED", "MERGED_", Unique_NAPS_ID[j], ".csv", sep = ""), row.names=FALSE)
}
Any help would be greatly appreciated.
Because you're not doing any data manipulation, you don't need to treat the files like tabular data. You only need to copy the file contents.
filenames <- list.files("C:/Users/smithma/Desktop/PM25_test", full.names = TRUE)
NAPS_ID <- substr(basename(filenames), 1, 6)
Unique_NAPS_ID <- unique(NAPS_ID)
for(curr_NAPS_ID in Unique_NAPS_ID){
NAPS_filenames <- filenames[startsWith(basename(filenames), curr_NAPS_ID)]
output_file <- paste0(
"C:/Users/smithma/Desktop/PM25_test/MERGED_", curr_NAPS_ID, ".csv"
)
for (fname in NAPS_filenames) {
line_text <- readLines(fname)
# Write the header from the first file
if (fname == NAPS_filenames[1]) {
cat(line_text[1], '\n', sep = '', file = output_file)
}
# Append every line in the file except the header
line_text <- line_text[-1]
cat(line_text, file = output_file, sep = '\n', append = TRUE)
}
}
My changes:
list.files(..., full.names = TRUE) is usually the best way to go.
Because the digits appear at the start of the filenames, I suggest substr. It's easier to get an idea of what's going on when skimming the code.
Instead of looping over the indices of a vector, loop over the values. It's more succinct and less likely to cause problems if the vector's empty.
startsWith and endsWith are relatively new functions, and they're great.
You only care about copying lines, so just use readLines to get them in and cat to get them out.
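The grouping step can also be written with `split`, which avoids the explicit loop over IDs. This is a sketch under assumptions: the temp directory, file contents and column names are made up, and the merged lines are kept in memory rather than written out.

```r
demo_dir <- tempfile("pm25_")
dir.create(demo_dir)
writeLines(c("date,pm25", "2015-01-01,10"), file.path(demo_dir, "123456_2015.csv"))
writeLines(c("date,pm25", "2016-01-01,12"), file.path(demo_dir, "123456_2016.csv"))
writeLines(c("date,pm25", "2015-01-01,7"),  file.path(demo_dir, "654321_2015.csv"))

filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Group the paths by the six-digit prefix of the file name
groups <- split(filenames, substr(basename(filenames), 1, 6))

# For each group: keep the first file whole, drop the header of the others
merged <- lapply(groups, function(paths) {
  lines <- lapply(paths, readLines)
  c(lines[[1]], unlist(lapply(lines[-1], function(x) x[-1])))
})

merged[["123456"]]
```

Each element of `merged` could then be passed to `writeLines` with an output path of your choice.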
You might consider something like this:
##will take the first 6 characters of each file name
six.digit.filenames <- substr(filenames, 1,6)
path <- "C:/Users/smithma/Desktop/PM25_test/"
unique.numbers <- unique(six.digit.filenames)
for(j in unique.numbers){
sub <- filenames[which(substr(filenames,1,6) == j)]
data.for.output <- c()
for(file in sub){
##now do your stuff with these files including read them in
data <- read.csv(paste0(path,file))
data.for.output <- rbind(data.for.output,data)
}
write.csv(data.for.output,paste0(path,j, '.csv'), row.names = F)
}
I have 40 text files with names :
[1] "2006-03-31.txt" "2006-06-30.txt" "2006-09-30.txt" "2006-12-31.txt" "2007-03-31.txt"
[6] "2007-06-30.txt" "2007-09-30.txt" "2007-12-31.txt" "2008-03-31.txt" etc...
I need to extract one specific value from each file. I know how to do it individually, but this takes a while:
m_value1 <- `2006-03-31.txt`$Marknadsvarde_tot[1]
m_value2 <- `2006-06-30.txt`$Marknadsvarde_tot[1]
m_value3 <- `2006-09-30.txt`$Marknadsvarde_tot[1]
m_value4 <- `2006-12-31.txt`$Marknadsvarde_tot[1]
Can someone help me with a for loop which would extract the data from a specific column and row through all the different text files please?
Assuming your files are all in the same folder, you can use list.files to get the names of all the files, then loop through them and get the value you need. So something like this?
m_value <- character() # or whatever the type of your variable is
filelist <- list.files(path="...", pattern = "\\.txt$", full.names = TRUE)
for (i in seq_along(filelist)){
  df <- read.table(filelist[i], header = TRUE)
  m_value[i] <- df$Marknadsvarde_tot[1]
}
EDIT:
If you have already imported all the data, you can use get:
txt_files <- list.files(pattern = "\\.txt$")
for(i in txt_files) {
x <- read.delim(i, header=TRUE)
assign(i, x)
}
m_value<-character()
for(i in 1:length(txt_files)) {
m_value[i] <- get(txt_files[i])$Marknadsvarde_tot[1]
}
You could utilize the select-parameter from fread of the data.table-package for this:
library(data.table)
file.list <- list.files(pattern = '\\.txt$')
lapply(file.list, fread, select = 'Marknadsvarde_tot', nrows = 1, header = TRUE)
This will result in a list of data.tables. Selecting a column by name requires header = TRUE. If you just want a vector with all the values:
sapply(file.list, function(x) fread(x, select = 'Marknadsvarde_tot', nrows = 1, header = TRUE)[[1]])
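A base-R variant of the same extraction, in case data.table is not available. It assumes whitespace-separated files with a header row, which the question does not confirm; the demo files and values are made up.

```r
demo_dir <- tempfile("txt_demo_")
dir.create(demo_dir)
write.table(data.frame(Marknadsvarde_tot = c(100.5, 101.5)),
            file.path(demo_dir, "2006-03-31.txt"), row.names = FALSE)
write.table(data.frame(Marknadsvarde_tot = c(200.25, 201.25)),
            file.path(demo_dir, "2006-06-30.txt"), row.names = FALSE)

txt_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read only the first data row of each file and pull out the one column
m_value <- vapply(txt_files, function(f) {
  read.table(f, header = TRUE, nrows = 1)$Marknadsvarde_tot
}, numeric(1))

# Label each value with the file it came from
names(m_value) <- basename(txt_files)
m_value
```

`vapply` with `numeric(1)` fails loudly if any file returns something other than a single number, which is easier to debug than a silently misshapen result.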
temp = list.files(pattern="*.txt")
library(data.table)
list2env(
lapply(setNames(temp, make.names(gsub("\\.txt$", "", temp))),
fread), envir = .GlobalEnv)
Added data.table to an existing answer at Importing multiple .csv files into R
After you have read in all your files, you can get data from the data.tables using DT[i, j, by], where i is your row condition.