I have 2 CSV datasets:
File 1:
Identity,Number,Data,Result,RT
5,3,13,45,34
6,1,44,12,56
3,1,67,23,47
0,6,43,55,91
4,5,33,34,29
File 2:
Identity,NB,NB,Result,Data,
1,4,55,92,62
3,7,43,12,74
7,3,58,52,64
0,6,10,22,96
3,8,13,92,22
I would like to concatenate these two datasets to create one dataset with the data of file 2 beneath the data of file 1 in the correct corresponding columns.
File 3:
Identity,Number,Data,Result,RT
5,3,13,45,34
6,1,44,12,56
3,1,67,23,47
0,6,43,55,91
4,5,33,34,29
Identity,NB,NB,Result,Data,
1,4,55,92,62
3,7,43,12,74
7,3,58,52,64
0,6,10,22,96
3,8,13,92,22
But the columns for Data and Result should line up on top of one another.
N.B. The columns holding a given piece of data in File 1 do not align with the columns holding the same data in File 2.
If you expect the output to be a single R object, that isn't possible, because the two files have different columns. If you mean that File 3 should be a CSV file stacking File 1 and File 2 with the shared columns aligned, then you can try this:
f1 <- read.csv("file1.csv")
f2 <- read.csv("file2.csv")
inter <- intersect(names(f1), names(f2)) # columns shared by both files
diff1 <- setdiff(names(f1), inter)       # columns only in file 1
diff2 <- setdiff(names(f2), inter)       # columns only in file 2
write.table(f1[c(inter, diff1)], "file3.csv", quote = F, sep = ",", row.names = F)
write.table(f2[c(inter, diff2)], "file3.csv", quote = F, sep = ",", row.names = F, append = T)
# append = T is applicable to write.table, not write.csv.
file3.csv
Identity,Data,Result,Number,RT
5,13,45,3,34
6,44,12,1,56
3,67,23,1,47
0,43,55,6,91
4,33,34,5,29
Identity,Data,Result,NB,NB.1
1,62,92,4,55
3,74,12,7,43
7,64,52,3,58
0,96,22,6,10
3,22,92,8,13
Open it in Excel and the shared Identity, Data and Result columns will line up.
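If a single R data frame is acceptable after all, a possible alternative (a sketch, assuming the dplyr package is installed) is bind_rows(), which matches columns by name and fills the non-shared ones with NA instead of writing a second header row:

```r
library(dplyr)

f1 <- read.csv("file1.csv")
f2 <- read.csv("file2.csv")   # the duplicate "NB" becomes NB and NB.1 on import

# Stack by column name; Number/RT and NB/NB.1 are NA-padded where absent
f3 <- bind_rows(f1, f2)
write.csv(f3, "file3.csv", row.names = FALSE)
```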
I have radiotelemetry data that is downloaded as a series of text files. I was provided with code in 2018 that looped through all the text files and converted them into CSV files. Up until 2021 this code worked. However, now the below code (specifically the lapply loop), returns the following error:
"Error in setnames(x, value) :
Can't assign 1 names to a 4 column data.table"
# set the working directory to the folder that contain this script, must run in RStudio
setwd(dirname(rstudioapi::callFun("getActiveDocumentContext")$path))
# get the path to the master data folder
path_to_data <- paste(getwd(), "data", sep = "/", collapse = NULL)
# extract .TXT file
files <- list.files(path=path_to_data, pattern="*.TXT", full.names=TRUE, recursive=TRUE)
# regular expression of the record we want
regex <- "^\\d*\\/\\d*\\/\\d*\\s*\\d*:\\d*:\\d*\\s*\\d*\\s*\\d*\\s*\\d*\\s*\\d*"
# vector of column names, no whitespace
columns <- c("Date", "Time", "Channel", "TagID", "Antenna", "Power")
# loop through all .TXT files, extract valid records and save to .csv files
lapply(files, function(x){
df <- read_table(file) # read the .TXT file to a DataFrame
dt <- data.table(df) # convert the dataframe to a more efficient data structure
colnames(dt) <- c("columns") # modify the column name
valid <- dt %>% filter(str_detect(col, regex)) # filter based on regular expression
valid <- separate(valid, col, into = columns, sep = "\\s+") # split into columns
towner_name <- str_sub(basename(file), start = 1 , end = 2) # extract tower name
valid$Tower <- rep(towner_name, nrow(valid)) # add Tower column
file_path <- file.path(dirname(file), paste(str_sub(basename(file), end = -5), ".csv", sep=""))
write.csv(valid, file = file_path, row.names = FALSE, quote = FALSE) # save to .csv
})
I looked up possible fixes for this and found using "setnames(skip_absent=TRUE)" in the loop resolved the setnames error but instead gave the error "Error in is.data.frame(x) : argument "x" is missing, with no default"
lapply(files, function(file){
df <- read_table(file) # read the .TXT file to a DataFrame
dt <- data.table(df) # convert the dataframe to a more efficient data structure
setnames(skip_absent = TRUE)
colnames(dt) <- c("col") # modify the column name
valid <- dt %>% filter(str_detect(col, regex)) # filter based on regular expression
valid <- separate(valid, col, into = columns, sep = "\\s+") # split into columns
towner_name <- str_sub(basename(file), start = 1 , end = 2) # extract tower name
valid$Tower <- rep(towner_name, nrow(valid)) # add Tower column
file_path <- file.path(dirname(file), paste(str_sub(basename(file), end = -5), ".csv", sep=""))
write.csv(valid, file = file_path, row.names = FALSE, quote = FALSE) # save to .csv
})
I'm confused as to why this code is no longer working despite working fine last year. Any help would be greatly appreciated!
The error occurred at the line colnames(dt) <- c("columns"), where you provided only one name for the (presumably) 4-column data.table. If you meant to rename a particular column, you can use
colnames(dt)[i] <- "columns"
where i is the index of the column you are renaming. Alternatively, provide a vector of 4 new names.
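A corrected version of the loop might look like this (a sketch; it assumes the readr, data.table, dplyr, stringr and tidyr packages from the question, that files, regex and columns are defined as above, and that each raw line should first sit in a single column named col):

```r
lapply(files, function(file) {
  # read the raw lines into a one-column data.table, so one column name suffices
  dt <- data.table::data.table(col = readr::read_lines(file))
  valid <- dplyr::filter(dt, stringr::str_detect(col, regex))        # keep valid records
  valid <- tidyr::separate(valid, col, into = columns, sep = "\\s+") # split into columns
  tower_name <- stringr::str_sub(basename(file), start = 1, end = 2) # extract tower name
  valid$Tower <- tower_name                                          # add Tower column
  out <- file.path(dirname(file),
                   paste0(stringr::str_sub(basename(file), end = -5), ".csv"))
  write.csv(valid, file = out, row.names = FALSE, quote = FALSE)     # save to .csv
})
```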
I have several files with the names RTDFE, TRYFG, FTYGS, WERTS...like 100 files in txt format. For each file, I'm using the following code and writing the output in a file.
name = c("RTDFE")
file1 <- paste0(name, "_filter",".txt")
file2 <- paste0(name, "_data",".txt")
### One
A <- read.delim(file1, sep = "\t", header = FALSE)
#### two
B <- read.delim(file2, sep = "\t", header = FALSE)
C <- merge(A, B, by="XYZ")
nrow(C)
145
Output:
Samples Common
RTDFE 145
Every time, I assign a file name to the variable name, run my code, and write the output to a file. Instead, I want the code to run on all the files in one go and produce the following output, where Common is the row count of the merged data frame C.
The output I need:
Samples Common
RTDFE 145
TRYFG ...
FTYGS ...
WERTS ...
How can I do this? Any help is appreciated.
How about putting all your names in a single vector, called names, like this:
names<-c("TRYFG","RTDFE",...)
and then feeding each one to a function that reads the files, merges them, and returns the rows
f<-function(n) {
fs = paste0(n,c("_filter", "_data"),".txt")
C = merge(
read.delim(fs[1],sep="\t", header=F),
read.delim(fs[2],sep="\t", header=F), by="XYZ")
data.frame(Samples=n,Common=nrow(C))
}
Then just call this function f on each of the values in names, row-binding the results together:
do.call(rbind, lapply(names, f))
An easy way to create the vector names is like this:
p = "_(filter|data).txt"
names = unique(gsub(p,"",list.files(pattern = p)))
I am making some assumptions here.
The first assumption is that you have all these files in a folder with no other text files (.txt) in it.
If so, you can get the list of files with the command list.files.
But when doing so you will get both the "_data.txt" and the "_filter.txt" files.
We need a way to extract the basic part of the name.
I use str_replace to remove "_data.txt" and "_filter.txt" from the list.
But when doing so you will get each name twice, therefore I use the unique command.
I store this in lfiles, which will now contain "RTDFE, TRYFG, FTYGS, WERTS..." and any other file that satisfies the conditions.
After this I run a for loop on this list.
I reopen the files similarly to how you do.
I merge by XYZ and immediately put the results in a data frame.
By using rbind I keep adding results to the data frame res.
library(stringr)
lfiles=list.files(path = ".", pattern = "\\.txt$")
## strip "_filter.txt" and "_data.txt" from the file names
lfiles=unique( sapply(lfiles, function(x){
x=str_replace(x, "_data.txt", "")
x=str_replace(x, "_filter.txt", "")
return(x)
} ))
res=NULL
for(i in lfiles){
file1 <- paste0(i, "_filter.txt")
file2 <- paste0(i, "_data.txt")
### One
A <- read.delim(file1, sep = "\t", header = FALSE)
#### two
B <- read.delim(file2, sep = "\t", header = FALSE)
res=rbind(res, data.frame(Samples=i, Common=nrow(merge(A, B, by="XYZ"))))
}
Ok, I will assume you have a folder called "data" with files named "RTDFE_filter.txt, RTDFE_data.txt, TRYFG_filter.txt, TRYFG_data.txt", etc. (only and exactly these files).
This code shows one possible way:
# save the file names
files = list.files("data")
# get indexes for "data" (for "filter" indexes, add 1)
files_data_index = seq(1, length(files), 2) # 1, 3, 5, ...
# loop on indexes
results = lapply(files_data_index, function(i) {
A <- read.delim(files[i+1], sep = "\t", header = FALSE)
B <- read.delim(files[i], sep = "\t", header = FALSE)
C <- merge(A, B, by="XYZ")
samp = strsplit(files[i], "_")[[1]][1]
com = nrow(C)
return(c(Samples = samp, Common = com))
})
# combine results
do.call(rbind, results)
I need to extract cells from the range C6:E6 (in the code, the range is [4, 3:5]) from three different csv files ("Multi_year_summary.csv"), which are in different folders, and then copy them into a new Excel file. All csv files have the same name (written above). I tried as follows:
library("xlsx")
zz <- dir("C:/Users/feder/Documents/Simulations_DNDC")
aa <- list.files("C:/Users/feder/Documents/Simulations_DNDC/Try_1", pattern = "Multi_year_summary.csv",
full.names = T, recursive = T, include.dirs = T)
bb <- lapply(aa, read.csv2, sep = ",", header = F)
for (i in 1:length(bb)) {
xx <- bb[[i]][4, 3:5]
qq <- rbind(xx)
jj <- write.xlsx(qq, "C:/Users/feder/Documents/Simulations_DNDC/Try_1/Results.xlsx",
sheetName="Tabelle1",col.names = FALSE, row.names = FALSE)
}
The code is executed, but extracts the cells from only one file, so that in Results.xlsx I have only one row instead of three. Maybe the problem starts from xx <- bb[[i]][4, 3:5], since if I execute xx the console gives back "1 obs. of 3 variables" instead of 3 observations.
Any help will be greatly appreciated.
After reading the csv files you can extract the relevant data in the same lapply loop, combine the rows into one dataframe and write it out in xlsx format:
result <- do.call(rbind, lapply(aa, function(x) read.csv(x, header = FALSE)[4, 3:5]))
write.xlsx(result,
"C:/Users/feder/Documents/Simulations_DNDC/Try_1/Results.xlsx",
sheetName="Tabelle1",col.names = FALSE, row.names = FALSE)
I have approximately 400 .csv files and need to take just one value from each of them (cell B2 if opened using spreadsheet software).
Each file is an extract from a single date and is named accordingly (i.e. extract_2017-11-01.csv, extract_2018-04-05, etc.)
I know that I can do something like this to iterate over the files (correct me if I am wrong, or if there is a better way please do tell me):
path <- "~/csv_files"
out.file <- ""
file.names <- dir(path, pattern =".csv")
for(i in 1:length(file.names)){
file <- read.table(file.names[i], header = TRUE, sep = ",")
out.file <- rbind(out.file, file)
}
I want to effectively add something at the end of this which creates a data frame consisting of two columns: the first column will show the date (this ideally would be taken from the filename) and the second column will hold the value in cell B2.
How can I do this?
This lets you read only the second row and the second column (i.e. cell B2) when you import:
extract_2018_11_26 <- read.table("csv_files/extract_2018-11-26.csv",
sep=";", header = T, nrows=1, colClasses = c("NULL", NA, "NULL"))
Because nrows=1 means that we read only the first data row (the header is read separately), and
in colClasses you specify "NULL" if you want to skip a column and NA if you want to keep it.
Here following your code, gsub() lets you find a pattern and replace it in a string:
out.file <- data.frame()
for(i in 1:length(file.names)){
file <- read.table(file.names[i],
sep=";", header = T, nrows=1, colClasses = c("NULL", NA,"NULL"))
date <- gsub("csv_files/extract_|.csv", "",x=file.names[i]) # extracts the date from the file name
out.file <- rbind(out.file, data.frame(date, col=file[, 1]))
}
out.file
# date col
# 1 2018-11-26 2
# 2 2018-11-27 2
Here the two .csv original files:
#first file, name: extract_2018-11-26.csv
col1 col2 col3
1 1 2 3
2 4 5 6
#second file, name: extract_2018-11-27.csv
col1 col2 col3
1 1 2 3
2 4 5 6
data.table approach
#build a list with csv files you want to load
files <- list.files( path = "yourpath", pattern = ".*.csv$", full.names = TRUE )
library(data.table)
#get value from second row (skip = 1) , second column ( select = 2 ) from each csv, using `data.table::fread`...
#bind the list together using `data.table::rbindlist`
rbindlist( lapply( files, fread, nrows = 1, skip = 1, select = 2 ) )
Extracting the date from the file name is a separate, regex-related question; please ask it as a different question.
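That said, a minimal sketch of the idea in base R (the file names below are hypothetical, following the pattern in the question):

```r
# hypothetical file names following the pattern in the question
files <- c("extract_2017-11-01.csv", "extract_2018-04-05.csv")

# strip the "extract_" prefix and ".csv" suffix, leaving only the date
dates <- gsub("^extract_|\\.csv$", "", basename(files))
dates
# "2017-11-01" "2018-04-05"
```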
How can I save 3 dataframes of different dimensions to one csv in order to load them afterwards in 3 different dataframes?
E.g
write.table(A, file = "di2.csv", row.names = FALSE, col.names = FALSE, sep=',')
write.table(B, file = "di2.csv", row.names = FALSE, col.names = FALSE, sep=',', append=TRUE)
write.table(C, file = "di2.csv", row.names = FALSE, col.names = FALSE, sep=',', append=TRUE)
or in a more elegant way
write.csv(rbind(A, B, C), "di2.csv")
How can I load this CSV to 3 dataframes, A,B and C?
This worked for me:
save(A_table, B_table, C_table, file="temp.Rdata")
load("temp.Rdata")
As mentioned in the comments if your purpose is just to read them back into R later then you could use save/load.
Another simple solution is dump/source:
A <- B <- C <- BOD # test input
dump(c("A", "B", "C"))
# read back
source("dumpdata.R")
Other multiple-object formats that you could consider would be HDF5 or an SQLite database (which is a single file).
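For example, with the DBI and RSQLite packages (an assumption; any similar database driver works), each data frame becomes a table in a single database file:

```r
library(DBI)

A <- B <- C <- BOD   # test inputs, as above

con <- dbConnect(RSQLite::SQLite(), "frames.sqlite")
dbWriteTable(con, "A", A, overwrite = TRUE)
dbWriteTable(con, "B", B, overwrite = TRUE)
dbWriteTable(con, "C", C, overwrite = TRUE)

A2 <- dbReadTable(con, "A")   # read one back as a data frame
dbDisconnect(con)
```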
On the other hand if it is important that it be readable text, directly readable by Excel and at least somewhat similar to a csv file then write the data frames out one after another with a blank line after each. Then to read them back later, read the file and separate the input at the blank lines. st and en are the starting and ending line numbers of the chunks in Lines.
A <- B <- C <- BOD # test inputs
# write data frames to a single file
con <- file("out.csv", "w")
for(el in list(A, B, C)) {
write.csv(el, con, row.names = FALSE)
writeLines("", con)
}
close(con)
# read data.frames from file into a list L
Lines <- readLines("out.csv")
en <- which(Lines == "")
st <- c(1, head(en, -1))
L <- Map(function(st, en) read.csv(text = Lines[st:en]), st, en)
Note that there are some similarities between this question and Importing from CSV from a specified range of values