I'm trying to write a function to more or less automate file import for experimental data. So far it works fine if the folder only contains one file, but the program I use creates two files in the given file path, e.g. for Trial1:
Trial1_001_match_20161115_121628.csv.aborted and Trial1_001_midi_20161115_121628.csv.aborted. I'm only interested in the midi file. Is there an easy way to ensure that only the file containing the string midi gets imported, or something like this?
library(readr)   # for read_csv
library(plyr)    # for ldply
path <- "C:/Users/Thomas/Desktop/tapping backup/Pilot141116/pilot_151116_pat1_250/realisations/participant_8/Trial1"
setwd( path )
files <- list.files(path = path, pattern = ".csv", full.names = T )
# set up a function to read a file and add a column for filename
import <- function( file ) {
df <- read_csv( file, col_names = T )
df$file <- file
return( df )
}
# run that function across all files.
data1 <- ldply( .data = files, .fun = import )
As you don't give a reproducible example, I can't check, but the following should work: files[grepl("midi", files)].
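A minimal sketch of that filtering step, using stand-in names copied from the question (your actual listing will differ):

```r
# stand-ins for what list.files() returned in the question
files <- c("Trial1_001_match_20161115_121628.csv.aborted",
           "Trial1_001_midi_20161115_121628.csv.aborted")

# keep only the entries whose name contains "midi"
files <- files[grepl("midi", files)]
files
#> [1] "Trial1_001_midi_20161115_121628.csv.aborted"
```

Alternatively, you can push the filter into list.files() itself with something like pattern = "midi.*\\.csv".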
I'm processing files through an application using R. The application requires a simple inputfile, outputfilename specification as parameters. Using the below code, this works fine.
input <- "\"7374.txt\""
output <- "\"7374_cleaned.txt\""
system2("DataCleaner", args = c(input, output))
However I wish to process a folder of .txt files, rather than have to do each one individually. If I had access to the source code I would simply alter the application to accept a folder rather than an individual file, but unfortunately I don't. Is it possible to somehow do this in R? I had tried starting to create a loop,
input <- dir(pattern=".txt")
but I don't know how I could insert a vector as an argument without the regex being included as part of it? Also I would then need to be able to paste '_cleaned' on to the end of the output file names? Many thanks in advance.
Obviously, I can't test it because I don't have your DataCleaner program, but how about this...
# make some files
dir.create('folder')
x = sapply(seq_len(5), function(f) {t = tempfile(tmpdir = 'folder', fileext = '.txt'); file.create(t); t})
# find the files
inputfiles = list.files(path = 'folder', pattern = '\\.txt$', full.names = T)
# remove the extension
base = tools::file_path_sans_ext(inputfiles)
# make the output file names
outputfiles = paste0(base, '_cleaned.txt')
mysystem <- function(input, output) {
system2('DataCleaner', args = c(input, output))
}
lapply(seq_along(inputfiles), function(f) mysystem(inputfiles[f], outputfiles[f]))
It uses lapply to iterate over all the members of the input and output files and calls the system2 function.
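As a variation, Map() walks the input and output vectors in parallel without the index bookkeeping. This sketch substitutes a harmless echo for DataCleaner, since I can't test against the real program:

```r
inputfiles  <- c("7374.txt", "8456.txt")   # hypothetical file names
outputfiles <- paste0(tools::file_path_sans_ext(inputfiles), "_cleaned.txt")

# Map() pairs inputfiles[1] with outputfiles[1], and so on;
# swap "echo" for "DataCleaner" in real use
results <- Map(function(inp, out) system2("echo", args = c(inp, out)),
               inputfiles, outputfiles)
```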
I need to shape the data stored in Excel files and save it as new .csv files. I figured out what specific actions should be done, but can't understand how to use lapply.
All Excel files have the same structure. Each of the .csv files should have the name of the original file.
## the original actions successfully performed on a single file
library(readxl)
library(reshape2)
DataSource <- read_excel("File1.xlsx", sheet = "Sheet10")
DataShaped <- melt(subset(DataSource[-(1), ], select = -c(ng)), id.vars = c("itemname", "week"))
write.csv2(DataShaped, "C:/Users/Ol/Desktop/Meta/File1.csv")
## my attempt to apply to the rest of the files in the directory
lapply(Files, function (i){write.csv2((melt(subset(read_excel(i,sheet = "Sheet10")[-(1),], select = - c(ng)), id.vars = c ("itemname","week"))))})
R returns the result to the console but doesn't create any files. The result resembles a .csv structure.
Could anybody explain what I am doing wrong? I'm new to R, so I would be really grateful for the help.
Answer
Thanks to the prompt answer from @Parfait, the code is working! So glad. Here it is:
library(readxl)
library(reshape2)
Files <- list.files(full.names = TRUE)
lapply(Files, function(i) {
write.csv2(
melt(subset(read_excel(i, sheet = "Decomp_Val")[-(1),],
select = -c(ng)),id.vars = c("itemname","week")),
file = paste0(sub(".xlsx", ".csv",i)))
})
It reads an Excel file in the directory, drops the first row (but keeps the headers) and the column named "ng", melts the data by the labels "itemname" and "week", and writes the result as a .csv to the working directory, giving it the name of the original file. And then - rinse and repeat.
Simply pass an actual file path to write.csv2. Otherwise, as noted in the docs ?write.csv, the default value for the file argument is the empty string "":
file: either a character string naming a file or a connection open for writing. "" indicates output to the console.
Below concatenates the Excel file stem to the specified path directory with .csv extension:
path <- "C:/Users/Ol/Desktop/Meta/"
lapply(Files, function (i){
write.csv2(
melt(subset(read_excel(i, sheet = "Sheet10")[-(1),],
select = -c(ng)),
id.vars = c("itemname","week")),
file = paste0(path, sub(".xlsx", ".csv", i))
)
})
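One caveat, since Files was built with full.names = TRUE above: each i then already carries a directory prefix, so paste0(path, i) can produce a mangled result. Wrapping i in basename() makes the output name robust either way (a sketch with a hypothetical input path):

```r
path <- "C:/Users/Ol/Desktop/Meta"
i <- "./data/File1.xlsx"   # the kind of value full.names = TRUE can return

# strip the directory, swap the extension, then anchor at the output path
out <- file.path(path, sub("\\.xlsx$", ".csv", basename(i)))
out
#> [1] "C:/Users/Ol/Desktop/Meta/File1.csv"
```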
To parse JSON, I can use this approach:
library("rjson")
json_file <- "https://api.coindesk.com/v1/bpi/currentprice/USD.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
but what if I want to work with a set of JSON files? They are located in
json_file <- "C:/myfolder/"
How can I parse all the JSON files in this folder into a data.frame? (There are 1000 files.)
A lot of missing info, but this will probably work.
I used pblapply to get a nice progress bar (since you mention >1000 files).
I never used the solution below for JSON files (no experience with JSON), but it works flawlessly on .csv and .xls files (of course with different read functions), so I expect it to work with JSON as well.
library(data.table)
library(pbapply)
library(rjson)
folderpath <- "C:\\myfolder\\"
filefilter <- "*.json$"
#set parameters as needed
f <- list.files( path = folderpath,
pattern = filefilter,
full.names = TRUE,
recursive = FALSE )
#read all files to a list
f.list <- pblapply( f, function(x) fromJSON( file = x ) )
#join lists together
dt <- data.table::rbindlist( f.list )
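One thing to hedge: rbindlist() assumes every parsed file yields the same fields. If the JSON payloads can differ between files, pass fill = TRUE so missing fields become NA. A sketch with in-memory lists standing in for fromJSON() results:

```r
library(data.table)

# two parsed files with slightly different fields
f.list <- list(list(code = "USD", rate = 711.6),
               list(code = "EUR"))

dt <- rbindlist(f.list, fill = TRUE)   # fill = TRUE pads missing fields with NA
```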
Suppose I have file_a.R. It is sourced via R's base source function by some other files file_b.R, file_c.R, which are located in the same folder or a subfolder. Is there an easy way to get the paths of file_b.R and file_c.R given the path of file_a.R?
EDIT:
If you want to get all links between R files, and some files that are sourced in those files, you can use something like that:
library(stringr)
#Get all R files paths in working directory and subdirectories
filelist <- list.files(pattern = "[.]R$", recursive = TRUE)
#Extract one file's sources
getSources <- function(file, pattern) {
#Store all file lines in a character vector
lines <- readLines(file, warn = FALSE)
#Extract R-filenames starting with "pattern" in all lines containing "source"
sources <- lapply(lines, function(x) {
if (length(grep("source", x)) > 0) {
str_extract(x, paste0(pattern, ".*[.]R"))
}
else{
NA
}
})
#Remove NA (lines without source)
sources <- sources[!is.na(sources)]
#Return a list
list(path = file,
pattern = pattern,
sources = unlist(sources))
}
#Example
corresp <- lapply(X = filelist, FUN = getSources, pattern = "file")
It will return a list of:
$path: R file path
$pattern: pattern used to match sources
$sources: the name of the sourced file
And you'll be able to see if anything is sourced anywhere, including file_a.R.
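A quick self-contained check of the extraction idea, written against temporary files so it can run anywhere; it uses base regmatches()/regexpr() instead of stringr's str_extract(), but the pattern is the same:

```r
dir <- tempdir()
writeLines('x <- 1', file.path(dir, "file_a.R"))
writeLines('source("file_a.R")', file.path(dir, "file_b.R"))

lines <- readLines(file.path(dir, "file_b.R"))
src_lines <- lines[grepl("source", lines)]
# extract R file names starting with "file" on lines that call source()
hits <- regmatches(src_lines, regexpr("file.*[.]R", src_lines))
hits
#> [1] "file_a.R"
```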
I'm working on some R code to automate file import. I've been using sub to change the path string; more specifically, I want to go through Trial1 to Trial10 for participant 1 and so forth, and then save each result as data[i]. Instead of doing this manually for all trials, could it be done more efficiently with a loop? The function itself adds the file path to the imported data so I can use this information later.
library(readr)   # for read_csv
library(plyr)    # for ldply
path <- "C:/Users/Thomas/Desktop/tapping backup/Pilot141116/pilot_151116_pat1_250/realisations/participant_8/Trial1"
setwd( path )
files <- list.files(path = path, pattern = "midi.*\\.csv", full.names = T )
# set up a function to read a file and add a column for filename
import <- function( file ) {
df <- read_csv( file, col_names = T )
df$file <- file
return( df )
}
# run that function across all files.
data1 <- ldply( .data = files, .fun = import )
I would build the file list from pilot_151116_pat1_250/realisations/ with recursive set to TRUE and full.names set to TRUE. Then you run the ldply loop with the import function. Later you can deduce from the file column which participant and trial your data belongs to. This can be done by using strsplit with split = "/", or by using separate from the tidyr package.
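A sketch of the strsplit() step, on a hypothetical path of the shape described in the question:

```r
# hypothetical entry from the file column after a recursive list.files()
f <- "pilot_151116_pat1_250/realisations/participant_8/Trial1/Trial1_001_midi_20161115_121628.csv"

parts <- strsplit(f, split = "/")[[1]]
participant <- parts[3]   # "participant_8"
trial       <- parts[4]   # "Trial1"
```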