I have 100 csv files in the same folder, let's say the path="D:\Data".
For each file I want to:
Step 1. read the file from row 12 since the column names are at row 12;
Step 2. select certain columns from the file, let's say the colname I want to keep
are "Date","Time","Value";
Step 3. add the file name to the file as a new column, for example, I want to
save file1 of which name is "example 1.csv" as file1$Name="example 1.csv",
and similarly, save file2 of which name is "example 2.csv" as
file2$Name="example 2.csv", etc...
At this point there are 100 new data frames, each with the 4 columns "Date","Time","Value","Name". Finally, rbind all 100 of them together.
I have no idea how to code all these steps together in R. Can anyone help? Thanks very much for your time.
Update
Due to the complicated structure of my data, the sample code in the answers always returned errors. The ideas behind the code were correct, but I could only solve the problem with the code below. I believe there is a more elegant way to do this than using a loop.
# set up working directory
setwd("D:/Data")
library(data.table)
# pattern is a regular expression, so escape the dot and anchor the extension
files <- list.files(path = "D:/Data", pattern = "\\.csv$")
# read and save each file as a list of data frame in temp
temp <- lapply(files, read.csv, header = TRUE, skip=11, sep = "\t", fileEncoding="utf-16")
seq_along(temp) # the number of files is 112
## select columns "Date","Time","Value" as a new file,
## and attach the file name as a new column to each new file,
## and finally row bind all the files together
temp2 <- NULL
# loop over however many files were read instead of hard-coding 112
for (i in seq_along(temp)) {
  dd <- cbind(File = files[i], temp[[i]][, c("Date", "Time", "Value")])
  temp2 <- rbind(temp2, dd)
}
You can do this very neatly with vroom. It can take a list of files as an argument rather than having to do each separately, and add the filename column itself:
library(vroom)
vroom(files, skip = 11, id = 'filename', col_select = c(Date, Time, Value, filename))
You could try something like this
list_of_files <- list.files(path = "D:/Data/", pattern = "\\.csv$", full.names = TRUE)
library(dplyr)
library(purrr)
list_of_files %>%
  set_names() %>%
  map_dfr(~ .x %>%
    readr::read_csv(.,
      skip = 11, # the column names are on row 12, so skip the 11 rows above it
      col_names = TRUE
    ) %>%
    select(Date, Time, Value) %>%
    mutate(Date = as.character(Date)) %>%
    # Alternatively you could use the .id argument in map_dfr for the filename
    mutate(filename = basename(.x)))
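If you would rather stay in base R and avoid the explicit loop from the question's update, the same steps can be sketched with lapply and do.call(rbind, ...). This is only a sketch under assumptions: the demo files, the demo_dir folder and the read_one helper are made up for illustration, and the demo files are plain tab-separated text, so for the UTF-16 files in the question you would probably also need to pass fileEncoding = "utf-16" to read.delim.

```r
# Create two small demo files: 11 junk lines, then a tab-separated header row
# (hypothetical data, only there to make the sketch runnable)
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (nm in c("example 1.csv", "example 2.csv")) {
  writeLines(c(rep("metadata", 11),
               "Date\tTime\tValue\tOther",
               "2020-01-01\t00:00\t1.5\tx",
               "2020-01-02\t00:00\t2.5\ty"),
             file.path(demo_dir, nm))
}

# pattern is a regular expression, so escape the dot and anchor the extension
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file, keep three columns, add the file name
read_one <- function(path) {
  d <- read.delim(path, skip = 11, stringsAsFactors = FALSE)
  cbind(d[, c("Date", "Time", "Value")], Name = basename(path))
}

# Apply the helper to every file and row-bind the results in one step
combined <- do.call(rbind, lapply(files, read_one))
```

Binding once at the end also avoids growing temp2 inside the loop, which gets slow as the number of files increases.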
In R, how does one read a file that uses the vertical line "|" as its delimiter, and also convert that delimiter to something else? I need to split on whole numbers inside the file, so strsplit() does not help me.
I have R code that reads the csv file, but the result still contains the vertical line "|" character. The file uses "|" as the separator between fields. When I try read.table() together with gsub(), I get a comma "," separating every individual character. I also tried piping the result into tab_spanner_delim(delim = "|") to convert the vertical line after reading the file with read.delim("file.csv", sep="|"), but even read.delim() does not work. I am new to handling special characters in R.
read.table(text = gsub("|", ",", readLines("file.csv")))
dat_csv <- read.delim("file.csv", sep="|")
x <- dat_csv %>% tab_spanner_delim(delim = "|")
dput() from read.table(text = gsub("|", ",", readLines("file.csv")))
",\",R,D,|,I,|,7,8,|,0,1,0,|,0,0,1,2,|,8,8,1,0,1,|,1,|,7,|,1,0,5,|,1,1,6,|,1,9,9,9,1,2,2,0,|,0,0,:,0,0,|,|,A,M,|,6,|,|,|,|,|,|,|,|,|,|,|,|,|,\",",
",\",R,D,|,I,|,7,8,|,0,1,0,|,0,0,1,2,|,8,8,1,0,1,|,1,|,7,|,1,0,5,|,1,1,6,|,1,9,9,9,1,2,2,6,|,0,0,:,0,0,|,4,.,9,|,|,6,|,|,|,|,|,|,|,|,|,|,|,|,|,\","
dput() from dat_csv <- read.delim("file.csv", sep="|")
"RD|I|78|010|0012|88101|1|7|105|116|19991220|00:00||AM|6|||||||||||||",
"RD|I|78|010|0012|88101|1|7|105|116|19991226|00:00|4.9||6|||||||||||||"
We can read the data line by line using readLines, remove the unwanted characters (quotes, commas and spaces) at both ends of each line using trimws, paste the lines into one string with the newline character (\n) as the collapse argument, and pass that string to read.table to read the data as a data frame.
data <- read.table(text = paste0(trimws(readLines('file.csv'),
whitespace = '[", ]'), collapse = '\n'), sep = '|')
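To see why the gsub() attempt in the question went wrong: in a regular expression "|" means alternation rather than a literal bar, so it has to be matched with fixed = TRUE (or escaped). The sep argument of read.table, on the other hand, is a literal character and needs no escaping. Here is a minimal sketch using the two sample lines from the question instead of a file; colClasses = "character" is my own choice so that values such as 010 keep their leading zero.

```r
# Two sample lines copied from the question, including the wrapping quotes
lines <- c(
  '"RD|I|78|010|0012|88101|1|7|105|116|19991220|00:00||AM|6|||||||||||||",',
  '"RD|I|78|010|0012|88101|1|7|105|116|19991226|00:00|4.9||6|||||||||||||"'
)

# Strip the wrapping quotes, commas and spaces from both ends of each line
clean <- trimws(lines, whitespace = '[", ]')

# sep is a literal character for read.table, so "|" needs no escaping here
dat <- read.table(text = clean, sep = "|", colClasses = "character")

# In regex functions, fixed = TRUE makes "|" match the literal bar
as_commas <- gsub("|", ",", clean, fixed = TRUE)
```

The wrapping quotes are also the likely reason read.delim() returned each line as a single field: everything between the quotes is treated as one quoted value.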
I am attempting to solve this issue in R, but I'll upvote answers in any programming language.
I have an example vector of filenames called file_list:
c("D:/example/sub1/session1/OD/CD/text.txt", "D:/example/sub2/session1/OD/CD/text.txt",
"D:/example/sub3/session1/OD/CD/text.txt")
What I'm trying to do is move and rename the text files based on the part of the parent directory path that contains sub and session. So the first file would be renamed sub1_session1_text.txt and be copied, along with the other text files, into a single new directory called all_files.
I'm struggling with some of the specifics of how to rename the file. I'm trying to use substr combined with str_locate_all and paste0 to copy and rename the files based on these parent directories.
Locate the position in each element of the vector file_list to construct starting and ending position for substr
library(stringr)
ending<-str_locate_all(pattern="/OD",file_list)
starting <- str_locate_all(pattern="/sub", file_list)
I then want to pull the starting and ending positions of those patterns out of the lists for each element, feed them to substr to get the new name, and then use paste0 to create the destination paths.
What I'd like is something like
substr_naming_vector<-substr(file_list, start=starting[starting_position],stop=ending[starting_position])
but I don't know how to index the list such that it can know how to correctly index for each element the starting_position. Once I figure that out I'd fill in something like this
#paste the filenames into a vector that represents them being renamed in a new directory
all_files <- paste0("D:/all_files/", substr_naming_vector)
#rename and copy the files
file.copy(from = file_list, to = all_files)
Here's an example using a regular expression, which makes it somewhat shorter:
library(stringr)
library(magrittr)
all_dirs <-
c("D:/example/sub1/session1/OD/CD/text.txt",
"D:/example/sub2/session1/OD/CD/text.txt",
"D:/example/sub3/session1/OD/CD/text.txt")
new_dirs <-
all_dirs %>%
# Match each group using regex
str_match_all("D:/example/(.+)/(.+)/OD/CD/(.+)") %>%
# Paste the matched groups into one path
vapply(function(x) paste0(x[2:4], collapse = "_"), character(1)) %>%
paste0("D:/all_files/", .)
# Copy them.
file.copy(all_dirs, new_dirs)
This is one way of doing it. I assumed your file is always called text.txt.
library(stringr)
my_files <- c("D:/example/sub1/session1/OD/CD/text.txt",
"D:/example/sub2/session1/OD/CD/text.txt",
"D:/example/sub3/session1/OD/CD/text.txt")
# get the sub information
subs <- str_extract(string = my_files,
                    pattern = "sub[0-9]+")
# get the session information
sessions <- str_extract(string = my_files,
                        pattern = "session[0-9]+")
# paste it all together
new_file_names <- paste("D:/all_files/",
paste(subs,
sessions,
"text.txt",
sep = "_"),
sep = "")
file.copy(from = my_files,
to = new_file_names)
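One thing to watch with either answer: file.copy() does not create the destination directory, so all_files has to exist before copying. Below is a base R sketch of the whole round trip that also keeps the original file name instead of assuming text.txt. It runs against a throwaway copy of the layout under tempdir(), so root, out_dir and the demo files are assumptions for illustration, not your real D:/ paths.

```r
# Build a throwaway copy of the directory layout from the question
root <- file.path(tempdir(), "example")
file_list <- file.path(root, paste0("sub", 1:3), "session1", "OD", "CD", "text.txt")
for (f in file_list) {
  dir.create(dirname(f), recursive = TRUE, showWarnings = FALSE)
  writeLines("demo", f)
}

# Capture the sub* and session* directories plus the file name in one pattern
new_names <- sub("^.*/(sub[0-9]+)/(session[0-9]+)/.*/([^/]+)$",
                 "\\1_\\2_\\3", file_list)

# file.copy() does not create the destination directory, so make it first
out_dir <- file.path(tempdir(), "all_files")
dir.create(out_dir, showWarnings = FALSE)
copied <- file.copy(from = file_list, to = file.path(out_dir, new_names))
```

file.copy() returns one logical per file, so all(copied) is a quick check that nothing was skipped.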
I have a CSV file that contains thousands of lines like this:
1001;basket/files/legobrick.mp3
4096;basket/files/sunshade.avi
2038;data/lists/blockbuster.ogg
2038;data/random/noidea.dat
I want to write this to a new CSV file but include only rows which contain '.mp3' or '.avi'. The output file should be just one column and look like this:
"basket/files/legobrick.mp3#1001",
"basket/files/sunshade.avi#4096",
So the first column should be suffixed to the second column and separated by a hash symbol and each line should be quoted and separated by a comma as shown above.
The source CSV file does not contain a header with column names. It's just data.
Can someone tell me how to code this in R?
Edit (following marked answer): This question is not a duplicate because it involves filtering rows and the output code format is completely different requiring different processing methods. The marked answer is also completely different which really backs up my assertion that this is not a duplicate.
You can do it in the following way:
#Read the file with ; as separator
df <- read.csv2(text = text, header = FALSE, stringsAsFactors = FALSE)
#Filter the rows which end with "avi" or "mp3"
inds <- grepl("avi$|mp3$", df$V2)
#Create a new dataframe by pasting those rows with a separator
df1 <- data.frame(new_col = paste(df$V2[inds], df$V1[inds], sep = "#"))
df1
# new_col
#1 basket/files/legobrick.mp3#1001
#2 basket/files/sunshade.avi#4096
#Write the csv
write.csv(df1, "/path/of/file.csv", row.names = FALSE)
Or if you want it as a text file you can do
write.table(df1, "path/test.txt", row.names = FALSE, col.names = FALSE, eol = ",\n")
data
text = "1001;basket/files/legobrick.mp3
4096;basket/files/sunshade.avi
2038;data/lists/blockbuster.ogg
2038;data/random/noidea.dat"
See whether the code below helps. It assumes df has already been read in with the columns named ID and file_path:
library(tidyverse)
df %>%
  filter(grepl("\\.mp3|\\.avi", file_path)) %>%
  mutate(file_path = paste(file_path, ID, sep = "#")) %>%
  pull(file_path) %>%
  dput()
A data.table answer:
library(data.table)
# the source file has no header row
dt <- fread("file.csv", header = FALSE)
fwrite(dt[V2 %like% "mp3$|avi$", .(paste0(V2, "#", V1))], "output.csv", col.names = FALSE)
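If the exact output format matters (every line quoted and ending in a comma), it may be simplest to build the lines yourself with sprintf and write them with writeLines. A minimal base R sketch, using the four sample rows from the question; the column names id and path and the temporary output file are my own choices for illustration:

```r
# The four sample rows from the question, one string per line
lines <- c("1001;basket/files/legobrick.mp3",
           "4096;basket/files/sunshade.avi",
           "2038;data/lists/blockbuster.ogg",
           "2038;data/random/noidea.dat")

# No header in the source, so name the columns explicitly (names are assumed)
df <- read.table(text = lines, sep = ";", header = FALSE,
                 col.names = c("id", "path"), stringsAsFactors = FALSE)

# Keep only rows whose path ends in .mp3 or .avi
keep <- grepl("\\.(mp3|avi)$", df$path)

# Build each output line as "path#id", with quotes and a trailing comma
out <- sprintf('"%s#%s",', df$path[keep], df$id[keep])

# Write the lines to a temporary file (swap in a real path as needed)
out_file <- tempfile(fileext = ".csv")
writeLines(out, out_file)
```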
I want to read an xlsx file and convert the data in it into one long text string. I want to format this string in an intelligent manner, such that each row is enclosed in parentheses “()” and the values are kept as a comma-separated string. For example, if the xlsx file looked like this:
one,two,three
x,x,x
y,y,y
z,z,z
After formatting, the string would look like:
header(one,two,three)row(x,x,x)row(y,y,y)row(z,z,z)
How would you accomplish this task with R?
My first instinct was something like this, but I can’t figure it out:
library(xlsx)
sheet1 <- read.xlsx("run_info.xlsx",1)
paste("(",sheet1[1,],")")
This works for me:
DF <- read.xlsx("run_info.xlsx",1)
paste0("header(", paste(names(DF), collapse = ","), ")",
paste(paste0("row(", apply(DF, 1, paste, collapse = ","), ")"),
collapse = ""))
# [1] "header(one,two,three)row(x,x,x)row(y,y,y)row(z,z,z)"
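A variant that avoids apply() may be worth considering, since apply() first converts the data frame to a matrix, and as far as I know that conversion can pad numeric columns with spaces when the data frame mixes types. The sketch below pastes the columns together directly; it uses a hand-built data frame in place of read.xlsx so that it runs without the xlsx package, which is an assumption on my part about what your sheet contains.

```r
# Stand-in for the sheet from the question (read.xlsx would return this shape)
DF <- data.frame(one = c("x", "y", "z"),
                 two = c("x", "y", "z"),
                 three = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# header(...) from the column names
header <- sprintf("header(%s)", paste(names(DF), collapse = ","))

# Paste the columns together row-wise without going through a matrix
rows <- sprintf("row(%s)", do.call(paste, c(DF, sep = ",")))

# Glue everything into one long string
result <- paste0(header, paste(rows, collapse = ""))
```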