I have 900 text files in my directory.
Each file consists of data in the following format:
667869 667869.000000
580083 580083.000000
316133 316133.000000
11065 11065.000000
I would like to extract the fourth row from each text file and store the values in an array. Any suggestions are welcome.
This sounds more like a Stack Overflow question, similar to Importing multiple .csv files into R.
You can try something like:
setwd("/path/to/files")
files <- list.files(path = getwd(), recursive = FALSE)
head(files)
# use header = FALSE (and sep = "") if the files have no header row, as in the sample data
myfiles <- lapply(files, function(x) read.csv(file = x, header = TRUE))
# keep only the 4th row of each file
mydata <- lapply(myfiles, FUN = function(df) df[4, ])
str(mydata)
# stack the extracted rows into a single data frame, one row per file
do.call(rbind, mydata)
A lazy answer is:
array <- c()
for (file in dir()) {
  row4 <- read.table(file,
                     header = FALSE,
                     row.names = NULL,
                     skip = 3,  # skip the 1st 3 rows
                     nrows = 1, # read only the next row after skipping the 1st 3 rows
                     sep = "")  # "" matches any whitespace; change the separator if needed
  array <- rbind(array, row4)  # append one row per file
}
You can further keep the names of the files:
rownames(array) <- dir()
I have a list of 50 text files all beginning with NEW.
I want to loop through each text file/dataframe, run some function, and then output the results via the write.table function. Therefore, for each file, a function is applied and then an output file should be created containing the original name with "output" appended.
Here is my code.
fileNames <- Sys.glob("*NEW.*")
for (fileName in fileNames) {
  df <- read.table(fileName, header = TRUE)
  # FUNCTION (not shown as this works)
  ...
  result <- print(chr1$results) # for each file a result would be printed
  write.table(result, file = paste0(fileName, "_output.txt"), quote = F, sep = "\t",
              row.names = F, col.names = T)
  # for each file a new separate file is created with the original output name retained
}
However, I only get one output rather than 50 output files. It seems like it's only looping through one file. What am I doing wrong?
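One likely culprit (an assumption, since the full file names are not shown): Sys.glob("*NEW.*") only matches names containing the literal substring "NEW.", not names beginning with NEW. A minimal sketch with a corrected glob, where chr1 stands in for the question's (unshown) function output:
fileNames <- Sys.glob("NEW*") # match files whose names *begin* with NEW
for (fileName in fileNames) {
  df <- read.table(fileName, header = TRUE)
  # ... apply your function to df here, producing chr1 ...
  result <- chr1$results
  # one output file per input file, original name retained
  write.table(result, file = paste0(fileName, "_output.txt"), quote = FALSE,
              sep = "\t", row.names = FALSE, col.names = TRUE)
}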
readme <- function(folder_name = "my_texts") {
  # list files with .txt ending
  file_list <- list.files(path = folder_name, pattern = "*.txt",
                          recursive = TRUE, full.names = TRUE)
  # apply readLines over the file list, collapsing each file to a single string
  textdata <- lapply(file_list, function(x) {
    paste(readLines(x), collapse = " ")
  })
  # add names attribute to textdata from file_list
  data.table::setattr(textdata, "names", file_list)
  # convert to dataframe where the names attribute is doc_id and textdata is the text
  df1 <- data.frame(doc_id = rep(names(textdata), lengths(textdata)),
                    doc_text = unlist(textdata), row.names = NULL)
  return(df1)
}
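A call would then look like this (assuming a folder my_texts/ of .txt files relative to the working directory):
df <- readme("my_texts")
str(df) # one row per file: doc_id (the file path) and doc_text (the file contents)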
I need to extract the cells in the range C6:E6 (in the code the range is [4, 3:5]) from three different csv files ("Multi_year_summary.csv"), which are in different folders, and then copy them into a new Excel file. All csv files have the same name (written above). I tried the following:
library("xlsx")
zz <- dir("C:/Users/feder/Documents/Simulations_DNDC")
aa <- list.files("C:/Users/feder/Documents/Simulations_DNDC/Try_1", pattern = "Multi_year_summary.csv",
full.names = T, recursive = T, include.dirs = T)
bb <- lapply(aa, read.csv2, sep = ",", header = F)
for (i in 1:length(bb)) {
xx <- bb[[i]][4, 3:5]
qq <- rbind(xx)
jj <- write.xlsx(qq, "C:/Users/feder/Documents/Simulations_DNDC/Try_1/Results.xlsx",
sheetName="Tabelle1",col.names = FALSE, row.names = FALSE)
}
The code is executed, but it extracts the cells from only one file, so in Results.xlsx I have only one row instead of three. Maybe the problem starts at xx <- bb[[i]][4, 3:5], since if I inspect xx the console gives back "1 obs. of 3 variables" instead of 3 objects.
Any help will be greatly appreciated.
After reading the csv files you can extract the relevant cells in the same lapply loop, combine them into one dataframe, and write that out in xlsx format.
result <- do.call(rbind, lapply(aa, function(x) read.csv(x, header = FALSE)[4, 3:5]))
write.xlsx(result,
           "C:/Users/feder/Documents/Simulations_DNDC/Try_1/Results.xlsx",
           sheetName = "Tabelle1", col.names = FALSE, row.names = FALSE)
ListOfFileNames = list.files(path = "D:/in/",
                             pattern = '*.txt', recursive = T)
options(stringsAsFactors = F)
setwd("D:/in/")
outFile <- file("output.txt", "w")
for (i in ListOfFileNames) {
  x = read.delim(ListOfFileNames[i], skip = 29, nrows = 1)
  x = as.character(x)
  writeLines(x, paste('D:/out/out.csv', sep = ","))
}
These are the txt files that I have.
I would like to extract row numbers 30 and 63 from each txt file and save them into one txt file. How can I solve this in R? Above is the code I tried, which extracts row number 30 and stores it in one csv file, but it doesn't work. Could you please help?
Thanks
You can try:
ListOfFileNames <- list.files(path = "D:/in/",
                              pattern = '*.txt', recursive = TRUE, full.names = TRUE)
# note: read.csv assumes a header line by default; use header = FALSE
# if rows 30 and 63 should be counted from the very first line of the file
result <- do.call(rbind, lapply(ListOfFileNames, function(x)
  read.csv(x)[c(30, 63), ]))
write.csv(result, 'D:/out/out.csv', row.names = FALSE)
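If the files are not really comma-separated and you just want the raw lines 30 and 63 from each file, a sketch using readLines (same D:/in/ and D:/out/ paths assumed):
ListOfFileNames <- list.files(path = "D:/in/", pattern = '*.txt',
                              recursive = TRUE, full.names = TRUE)
# grab raw lines 30 and 63 from each file as character strings
result <- unlist(lapply(ListOfFileNames, function(x) readLines(x)[c(30, 63)]))
writeLines(result, "D:/out/out.txt")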
I would like to read multiple text files from my directory; the files are named in the following format:
regional_vol_GM_atlas1.txt
regional_vol_GM_atlas2.txt
........
regional_vol_GM_atlas152.txt
Data in the files looks like the following:
667869 667869
580083 580083
316133 316133
3631 3631
Following is the script that I have written:
library(readr)
library(stringr)
library(data.table)
array <- c()
for (file in dir("/media/dev/Daten/Task1/subject1/t1")) # path to the directory where .txt files are located
{
  row4 <- read.table(file = list.files(pattern = "regional_vol*.txt"),
                     header = FALSE,
                     row.names = NULL,
                     skip = 3,  # Skip the 1st 3 rows
                     nrows = 1, # Read only the next row after skipping the 1st 3 rows
                     sep = "\t") # change the separator if it is not "\t"
  array <- cbind(array, row4)
}
I am getting the following error:
Error in file(file, "rt") : invalid 'description' argument
Kindly suggest where I went wrong in the script.
This seems to work fine for me. Make changes as per the code comments in case the files have headers:
[Answer Edited to reflect new information posted by OP]
# rm(list=ls()) # clean memory if you can afford to
mydir <- "~/Desktop/a" # change as per your path
# read full paths
myfiles <- list.files(mydir, pattern = "regional_vol*", full.names = TRUE)
myfiles # check that files are listed correctly
# initialise the dataframe from the first file
# change header = T/F depending on presence of a header
# make sure sep is correct
df <- read.csv(myfiles[1], header = F, skip = 0, nrows = 4, sep = "")[-c(1:3), ]
# check that the first file was read correctly
df
# read all the other files and update the dataframe
# we read 4 lines so the 4th row is always present, then drop the first 3
ans <- lapply(myfiles[-1], function(x) {
  read.csv(x, header = F, skip = 0, nrows = 4, sep = "")[-c(1:3), ]
})
ans
# update the dataframe
lapply(ans, function(x) { df <<- rbind(df, x) })
# this should be the required dataframe
df
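A more compact equivalent under the same assumptions (whitespace-separated files, no header), avoiding the <<- side effect:
# read the 4th row of every file directly and stack the rows in one go
df <- do.call(rbind, lapply(myfiles, function(x) {
  read.csv(x, header = FALSE, skip = 3, nrows = 1, sep = "")
}))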
Also, if you are on Linux, a much simpler method would be to simply make the OS do it for you:
awk 'FNR == 4' regional_vol*.txt
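If you want that output back in an R object, a minimal sketch (assuming the files sit in the working directory and a Unix shell is available):
# run awk from R and capture its stdout, one character string per matched line
rows <- system("awk 'FNR == 4' regional_vol*.txt", intern = TRUE)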
This should do it for you.
# set the working directory (where files are saved)
setwd("C:/Users/your_path_here/Desktop/")
file_names = list.files(getwd())
file_names = file_names[grepl(".txt", file_names, ignore.case = TRUE)]
# print file_names vector
file_names
# read one file, just for testing
# file = read.csv("C:/Users/your_path_here/Desktop/regional_vol_GM_atlas1.txt", header = F, stringsAsFactors = F)
# see the data structure
# str(file)
# run read.csv on all values of file_names
files = lapply(file_names, read.csv, header = F, stringsAsFactors = F)
files = do.call(rbind, files)
# set column names (adjust to the actual number of columns)
names(files) = c("field1", "field2", "field3", "field4", "field5")
str(files)
write.table(files, "C:/Users/your_path_here/Desktop/mydata.txt", sep = "\t")
write.csv(files, "C:/Users/your_path_here/Desktop/mydata.csv")
I am trying to convert all my .txt files to .csv, but I didn't manage to create the loop.
The actual line for one file (which works perfectly) would be the following:
tab = read.delim("name_file", header = TRUE, skip = 11)
write.table(tab, file = "name_file.csv", sep = ",", col.names = TRUE, row.names = FALSE)
And I would like to do that for all the .txt files I have in the wd.
I tried the loop below, based on some research on the web, but I am not sure it's the right one:
FILES = list.files(pattern = ".txt")
for (i in 1:length(FILES)) {
  FILES = read.csv(file = FILES[i], header = TRUE, skip = 11, fill = TRUE)
  write.csv(FILES, file = paste0(sub("folder_name", ".txt", "", FILES[i]), ".csv"))
}
I'm on a Windows system.
I would appreciate some help... Thanks!
Hi, I had the same problem before, just like you, and now I've made it work. Try this:
directory <- "put_your_txt_directory_here"
ndirectory <- "put_your_csv_directory_here"
file_name <- list.files(directory, pattern = ".txt")
files.to.read <- paste(directory, file_name, sep = "/")
files.to.write <- paste(ndirectory, paste0(sub(".txt", "", file_name), ".csv"), sep = "/")
for (i in 1:length(files.to.read)) {
  temp <- read.csv(files.to.read[i], header = TRUE, skip = 11, fill = TRUE)
  write.csv(temp, file = files.to.write[i])
}
You need to index the output inside the loop as well, and hold the data frames in a list (a character vector cannot store them). Try this:
INFILES = list.files(pattern = ".txt")
OUTDATA = vector(mode = "list", length = length(INFILES))
for (i in 1:length(INFILES)) {
  # store each data frame in the list, then write it out under a new name
  OUTDATA[[i]] = read.csv(file = INFILES[i], header = TRUE, skip = 11,
                          fill = TRUE)
  write.csv(OUTDATA[[i]], file = paste0("folder_name", sub(".txt", "", INFILES[i]), ".csv"))
}
Assuming that your input files always have at least 11 rows (since you skip the first 11 rows!) this should work:
filelist = list.files(pattern = ".txt")
for (i in 1:length(filelist)) {
  cur.input.file <- filelist[i]
  cur.output.file <- paste0(cur.input.file, ".csv")
  print(paste("Processing the file:", cur.input.file))
  # If the input file has fewer than 11 rows you will receive the error message:
  # "Error in read.table: no lines available in input"
  data = read.delim(cur.input.file, header = TRUE, skip = 11)
  write.table(data, file = cur.output.file, sep = ",", col.names = TRUE, row.names = FALSE)
}
If you receive any error during file conversion it is caused by the content (e.g. unequal number of rows per column, unequal number of columns, etc.).
PS: Using a for loop is OK here since it does not limit the performance (there is no "vectorized" logic for reading and writing files).
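For completeness, the same conversion written with lapply (a stylistic alternative only; it is not faster than the loop):
filelist <- list.files(pattern = ".txt")
invisible(lapply(filelist, function(f) {
  # read each file, skipping the first 11 rows, and write it back out as csv
  data <- read.delim(f, header = TRUE, skip = 11)
  write.table(data, file = paste0(f, ".csv"), sep = ",",
              col.names = TRUE, row.names = FALSE)
}))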