Trouble with a specific character (") in a text file in R

In R I have a folder with many pictures. I would like to create one text file in which the folder paths of all my pictures are written (one line per picture), like this:
data/obj/scan 1.png
data/obj/scan 2.png
But this is what I actually get in my text file:
"data/obj/""scan 1.png"
"data/obj/""scan 2.png"
How can I remove or avoid the " characters?
Here is my program:
folder <- "C:/Users/user/Desktop/R/test/"
jpeg <- list.files(folder)
train <- matrix(nrow = length(jpeg), ncol = 2)
ind <- 0
for (k in jpeg) {
  ind <- ind + 1
  name <- paste0(k)
  train[ind, 1:2] <- cbind(as.character("data/obj/"), as.character(name))
}
write.table(train, "train.txt", sep="", row.names = FALSE, col.names = FALSE)

Welcome to SO.
You'll need to provide quote = FALSE to write.table to avoid writing the quotation marks:
folder <- "C:/Users/user/Desktop/R/test/"
jpeg <- basename(list.files(folder, pattern= "\\.jpeg$", recursive = TRUE))
train <- paste0("data/obj/", jpeg)
write.table(train, "train.txt", sep="", row.names = FALSE, col.names = FALSE, quote = FALSE)
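Alternatively, since train here is just a character vector with one path per line, writeLines writes it with no quoting at all (a minimal sketch, assuming the code above has already built train):
writeLines(train, "train.txt")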

Related

How to convert .txt to .csv file in R

I found this code here, and it works to convert the .txt to .csv, but the file is not broken into columns. I'm pretty sure there's an easy fix or a line to add here, but I'm not finding it. I'm still new to R and working through this, so any help or direction is appreciated.
EDIT: The file contains the following, a list of invasive plants:
Header: Noxious Weed List.
'(a) Abrus precatorius – rosary pea '
'(b) Aeginetia spp. – aeginetia'
'(c) Ageratina adenophora – crofton weed '
'(d) Alectra spp. – alectra '
And so I would like to get all the parts, i.e. genus, species, and common name, into separate columns, and, if possible, delete the letters '(a)' and the ' – ' separating dash.
filelist <- list.files(pattern = "\\.txt$")
for (i in 1:length(filelist)) {
  input <- filelist[i]
  output <- paste0(gsub("\\.txt$", "", input), ".csv")
  print(paste("Processing the file:", input))
  data <- read.delim(input, header = TRUE)
  write.table(data, file = output, sep = ",", col.names = TRUE, row.names = FALSE)
}
You'll need to adjust if you have common names with three or more words, but this is the general idea:
path <- "C:\\Your File Path Here\\"
file <- paste0(path, "WeedList.txt")
DT <- read.delim(file, header = FALSE, sep = " ")
DT <- DT[-c(1), -c(1, 4, 7)]  # drop the header row and the "(a)", dash, and quote columns
colnames(DT) <- c("Genus", "Species", "CommonName", "CommonName2")
DT$CommonName <- gsub("'", "", DT$CommonName)    # strip stray quote characters
DT$CommonName2 <- gsub("'", "", DT$CommonName2)
DT$CommonName <- paste(DT$CommonName, DT$CommonName2, sep = " ")  # rejoin two-word common names
DT <- DT[, -c(4)]  # the helper column is no longer needed
write.csv(DT, paste0(path, "WeedList.csv"), row.names = FALSE)
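If the fixed column positions turn out to be brittle (for example, with common names of three or more words), a single regular expression can pull the three fields out of each line directly. This is a sketch under the assumption that every data line follows the "(x) Genus species – common name" layout shown above:
lines <- readLines(file)[-1]                 # drop the "Noxious Weed List" header
lines <- gsub("^'|'$", "", trimws(lines))    # strip the surrounding quote marks
m <- regexec("^\\([a-z]\\)\\s+(\\S+)\\s+(\\S+)\\s+[–-]\\s+(.*)$", lines)
parts <- regmatches(lines, m)
DT <- do.call(rbind, lapply(parts, function(p)
  data.frame(Genus = p[2], Species = p[3], CommonName = trimws(p[4]))))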

Extract specific cells from multiple csv files and copy them into a new excel file

I need to extract the cells in the range C6:E6 (in the code, the range is [4, 3:5]) from three different csv files ("Multi_year_summary.csv"), which live in different folders, and then copy them into a new Excel file. All the csv files have the same name (written above). I tried the following:
library("xlsx")
zz <- dir("C:/Users/feder/Documents/Simulations_DNDC")
aa <- list.files("C:/Users/feder/Documents/Simulations_DNDC/Try_1", pattern = "Multi_year_summary.csv",
                 full.names = TRUE, recursive = TRUE, include.dirs = TRUE)
bb <- lapply(aa, read.csv2, sep = ",", header = F)
for (i in 1:length(bb)) {
  xx <- bb[[i]][4, 3:5]
  qq <- rbind(xx)
  jj <- write.xlsx(qq, "C:/Users/feder/Documents/Simulations_DNDC/Try_1/Results.xlsx",
                   sheetName = "Tabelle1", col.names = FALSE, row.names = FALSE)
}
The code executes, but it extracts the cells from only one file, so in Results.xlsx I get one row instead of three. Maybe the problem starts at xx <- bb[[i]][4, 3:5], since if I inspect xx the console reports "1 obs. of 3 variables" instead of 3 observations.
Any help will be greatly appreciated.
The rows go missing because write.xlsx overwrites Results.xlsx on every pass through the loop, so only the last file's row survives. Instead, extract the relevant cells inside the same lapply call that reads each csv, combine the pieces into one dataframe, and write that out once in xlsx format:
result <- do.call(rbind, lapply(aa, function(x) read.csv(x, header = FALSE)[4, 3:5]))
write.xlsx(result,
           "C:/Users/feder/Documents/Simulations_DNDC/Try_1/Results.xlsx",
           sheetName = "Tabelle1", col.names = FALSE, row.names = FALSE)

R: Append multiple rows to dataframe within for-loop

I have PDF files that I made from these wikipedia pages (for example):
https://en.wikipedia.org/wiki/AIM-120_AMRAAM
https://en.wikipedia.org/wiki/AIM-9_Sidewinder
I have a list of keywords I want to search for within the document and extract the sentences in which they appear.
keywords <- c("altitude", "range", "speed")
I can call a file, extract the text from the PDF, and pull the sentences containing the keywords. This works if I do it with each of the keywords individually, but when I try to do it in a loop, the rows aren't appending. Instead it's almost doing a cbind, and then an error gets thrown regarding the number of columns. Here is my code; any help you can provide to make this work is much appreciated.
How do I get the rows to append correctly and end up in one file per PDF?
pdf.files <- list.files(path = "/path/to/file", pattern = "*.pdf", full.names = FALSE, recursive = FALSE)
for (i in 1:length(pdf.files)) {
  for (j in 1:length(keywords)) {
    text <- pdf_text(file.path("path", "to", "file", pdf.files[i]))
    text2 <- tolower(text)
    text3 <- gsub("\r", "", text2)
    text4 <- gsub("\n", "", text3)
    text5 <- grep(keywords[j], unlist(strsplit(text4, "\\.\\s+")), value = TRUE)
  }
  temp <- rbind(text5)
  assign(pdf.files[i], temp)
}
After I get the rows to append correctly, the next step will be to add the keywords as a variable to the left of the extracted sentences. An example of the ideal output:
keywords sentence
altitude sentence1.1
altitude sentence1.2
range sentence2.1
range sentence2.2
range sentence2.3
speed sentence3.1
speed sentence3.2
Would this be done in the loop as well or post as a separate function?
Any help is appreciated.
Alright, so it took some real thinking, but I made it work. It's not pretty, but it gets the job done:
# This first part initializes the files to be written to
files <- list.files(path = "/path/to/file", pattern = "*.*", full.names = FALSE, recursive = FALSE)
for (h in 1:length(files)) {
  temp1 <- data.frame(matrix(ncol = 2, nrow = 0))
  x <- c("Title", "x")
  colnames(temp1) <- x
  write.table(temp1, paste0("/path/to/file", tools::file_path_sans_ext(files[h]), ".txt"), sep = "\t", row.names = FALSE, quote = FALSE)
}
# This next part fills in the files with the sentences
pdf.files <- list.files(path = "/path/to/file", pattern = "*.pdf", full.names = FALSE, recursive = FALSE)
for (i in 1:length(pdf.files)) {
  for (j in 1:length(keywords)) {
    text <- pdf_text(file.path("path", "to", "file", pdf.files[i]))
    text2 <- tolower(text)
    text3 <- gsub("\r", "", text2)
    text4 <- gsub("\n", "", text3)
    text5 <- as.data.frame(grep(keywords[j], unlist(strsplit(text4, "\\.\\s+")), value = TRUE))
    colnames(text5) <- "x"
    if (nrow(text5) != 0) {
      title <- as.data.frame(keywords[j])
      colnames(title) <- "Title"
      temp <- cbind(title, text5)
      temp <- unique(temp)
      write.table(temp, paste0("/path/to/file", tools::file_path_sans_ext(pdf.files[i]), ".txt"), sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE, append = TRUE)
    }
  }
}
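For anyone who would rather keep everything in memory, the same logic collapses into one helper per PDF, producing a data frame per file in the keywords/sentence shape shown above. A sketch under the same assumptions (pdftools loaded, keywords defined as before, and pdf.files holding paths that pdf_text can resolve; the code above built it with full.names = FALSE, so either set the working directory or rebuild with full.names = TRUE):
library(pdftools)
extract_hits <- function(f) {
  # flatten the PDF text, lowercase it, and split into rough sentences
  sentences <- unlist(strsplit(gsub("[\r\n]", " ", tolower(pdf_text(f))), "\\.\\s+"))
  # one block of rows per keyword, stacked into a single data frame
  do.call(rbind, lapply(keywords, function(k) {
    hits <- grep(k, sentences, value = TRUE)
    if (length(hits) == 0) NULL else data.frame(keywords = k, sentence = hits)
  }))
}
results <- lapply(pdf.files, extract_hits)
names(results) <- pdf.files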

How can I quickly find all the files in a directory that are missing a first row?

I have a folder of files that are in .csv format. They have blank lines in them that are necessary (they indicate an absence of a measure from the LiDAR unit, which is good and needs to stay in). But occasionally the first row is empty, and this throws off the code and the package, and everything aborts.
Right now I have to open each .csv and see if the first line is empty.
I would like to do one of the following, but am at a loss how to:
1) write code that quickly scans through all of the files in the directory and tells me which ones are missing the first line,
2) be able to skip the empty lines that are only at the beginning (which can vary; sometimes more than one line is empty), or
3) have code that cycles through all of the .csv files and inserts a dummy first line of numbers so the files all import without a problem.
Thanks!
Here's a bit of code that does 1 and 2 above. I'm not sure why you'd want to insert dummy line(s) given the ability to do 1 and 2; it's straightforward to do, but usually it's not a good idea to modify raw data files.
# Create some test files
cat("x,y", "1,2", sep="\n", file = "blank0.csv")
cat("", "x,y", "1,2", sep="\n", file = "blank1.csv")
cat("", "", "x,y", "1,2", sep="\n", file = "blank2.csv")
files <- list.files(pattern = "*.csv", full.names = TRUE)
for (i in seq_along(files)) {
  filedata <- readLines(files[i])
  lines_to_skip <- min(which(filedata != "")) - 1
  cat(i, files[i], lines_to_skip, "\n")
  x <- read.csv(files[i], skip = lines_to_skip)
}
This prints
1 ./blank0.csv 0
2 ./blank1.csv 1
3 ./blank2.csv 2
and reads in each dataset correctly.
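If only the list of offending files is needed (option 1 by itself), a compact variant reads just the first line of each file. A sketch, assuming every file has at least one line:
blank_first <- files[sapply(files, function(f) isTRUE(readLines(f, n = 1) == ""))]
blank_first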
I believe that the two functions that follow can do what you want/need.
First, a function to determine which files have a blank second line.
second_blank <- function(path = ".", pattern = "\\.csv"){
  fls <- list.files(path = path, pattern = pattern)
  second <- sapply(fls, function(f) readLines(f, n = 2)[2])
  which(nchar(gsub(",", "", second)) == 0)
}
Then, a function to read in the files with such lines, one at a time. Note that I assume that the first line is the column header and that at least the second line is left blank. There is a dots argument, ..., for you to pass other arguments to read.table, such as stringsAsFactors = FALSE.
skip_blank <- function(file, ...){
  header <- readLines(file, n = 1)
  header <- strsplit(header, ",")[[1]]
  count <- 1L
  while(TRUE){
    # scan() one line at a time; a truly empty line comes back as character(0),
    # and a comma-only line reduces to "" once the commas are removed
    txt <- scan(file, what = "character", skip = count, nlines = 1, quiet = TRUE)
    if(length(txt) && nchar(gsub(",", "", paste(txt, collapse = ""))) > 0) break
    count <- count + 1L
  }
  dat <- read.table(file, skip = count, header = TRUE, sep = ",", dec = ".", fill = TRUE, ...)
  names(dat) <- header
  dat
}
Now, an example usage.
second_blank(pattern = "csv") # a first run as an example usage
inx <- second_blank() # this will be needed later
fl_names <- list.files(pattern = "\\.csv") # get all the CSV files
df_list <- lapply(fl_names[inx], skip_blank) # read the problem ones
names(df_list) <- fl_names[inx] # tidy up the result list
df_list
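To read every csv in the folder in one pass, flagged or not, the lines_to_skip idea from the first answer drops straight into lapply. A sketch, assuming each file eventually contains a non-blank line:
read_skipping_blanks <- function(f) {
  n <- min(which(readLines(f) != "")) - 1  # number of leading blank lines
  read.csv(f, skip = n)
}
df_all <- lapply(fl_names, read_skipping_blanks)
names(df_all) <- fl_names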

How to load text files one by one in R rather than reading them all at once and combining them into a single matrix

I have 100 text files in a folder. I can use the code below to read all the files and store them in myfile.
file_list <- list.files("C:/Users/User/Desktop/code/Test/", full=T)
file_con <- lapply(file_list, function(x){
  return(read.table(x, head = FALSE, quote = "\"", skip = 6, sep = ","))
})
myfile <- do.call(rbind, file_con)
My question is how I can read the first file in the Test folder before I read the second file. All the text file names are different, and I cannot change them to, say, the numbers 1 to 100. I was thinking of maybe adding an integer in front of every text file name and then using a for loop to match and read each file, but is that possible?
I need to read the first file, do some calculations, and export the result into result.txt before reading the second file. Right now I'm doing it manually, and I have almost 800 files, so it would be a big chore to sit and wait for everything to compute. The code below is the one I currently use:
myfile = read.table("C:/Users/User/Desktop/code/Test/20081209014205.txt", header = FALSE, quote = "\"", skip = 0, sep = ",")
The following setup will read one file at a time, perform an analysis,
and save the result back under a slightly modified name.
save_file_list <- structure(
  .Data = gsub(
    pattern = "\\.txt$",
    replacement = "-e.txt",
    x = file_list),
  .Names = file_list)

your_function <- function(.file_content) {
  ## The analysis you want to do on the content of each file.
}
for (.file in file_list) {
  .file_content <- read.table(
    file = .file,
    head = FALSE,
    quote = "\"",
    skip = 6,
    sep = ",")
  .result <- your_function(.file_content)
  write.table(
    x = .result,
    file = save_file_list[.file])
}
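Incidentally, the same name lookup can be built a little more readably with setNames; the result is identical to the structure call above:
save_file_list <- setNames(gsub("\\.txt$", "-e.txt", file_list), file_list)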
Now I can read a file and do the calculation using:
for (e in 1:100) {
  myfile <- read.table(file_list[e], header = FALSE, quote = "\"", skip = 0, sep = ",")
  while (condition) {
    # Calculation
  }
  myresult <- file.path("C:/Users/User/Desktop/code/Result/", paste0("-", e, ".txt"))
  write.table(x, file = myresult, row.names = FALSE, col.names = FALSE, sep = ",")
}
Now my problem is: how can I make the output file keep the same name as the original file, but with a -e suffix added at the end?
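A minimal sketch of one way to do that, reusing the renaming idea from save_file_list above together with the loop counter e (it assumes the Result folder already exists):
in_name <- basename(file_list[e])  # e.g. "20081209014205.txt"
out_name <- gsub("\\.txt$", paste0("-", e, ".txt"), in_name)
myresult <- file.path("C:/Users/User/Desktop/code/Result", out_name)
write.table(x, file = myresult, row.names = FALSE, col.names = FALSE, sep = ",")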
