I have 15 data frames, automatically imported from csv files, each consisting of 3 columns; that's why the column names are the same in all of them: V1, V2 and V3. I would like to be able to tell them apart once they are joined together, so I'm asking for an idea of how to rename the columns automatically, e.g. to DataFrameName_V1 or something similar.
I have searched a lot for code to do this, but haven't found a solution yet.
Btw, this is the code I use for the csv import:
file_list <- list.files(pattern = "\\.csv$")
for (i in 1:length(file_list)) {
  assign(file_list[i],
         read.csv(file_list[i], na.strings = c("NA", "NULL", ""),
                  header = FALSE, sep = ",", stringsAsFactors = FALSE,
                  encoding = "UTF-8"))
}
I'm new to R and I can't make this work with the information I'm finding.
I have many .txt files in a folder, each containing data from one subject. The files have identical columns, but the number of rows varies per file. In addition, the column headers only start in row 9. What I want to do is
1. import the .txt files into RStudio in one go while skipping the first 8 rows, and
2. merge them all together into one data frame, so that the final data frame is a data set containing the data from all subjects in long format.
I managed to do 1 (I think) using the easycsv package and the following code:
fread_folder(directory = "C:/Users/path/to/my/files",
             extension = "TXT",
             sep = "auto",
             nrows = -1L,
             header = "auto",
             na.strings = "NA",
             stringsAsFactors = FALSE,
             verbose = getOption("datatable.verbose"),
             skip = 8L,
             drop = NULL,
             colClasses = NULL,
             integer64 = getOption("datatable.integer64"), # default: "integer64"
             dec = if (sep != ".") "." else ",",
             check.names = FALSE,
             encoding = "unknown",
             quote = "\"",
             strip.white = TRUE,
             fill = FALSE,
             blank.lines.skip = FALSE,
             key = NULL,
             Names = NULL,
             prefix = NULL,
             showProgress = interactive(),
             data.table = FALSE)
That worked; however, my problem now is that the data frames have been named after the very long path to my files and, obviously, after the txt files (without the 7 though). So the names are very long and unwieldy and contain characters that they probably shouldn't, such as spaces.
So now I'm having trouble merging the data frames into one, because I don't know how to refer to them other than by the default names they were given, how to rename them, or how to control how they are named during import in the first place.
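Both steps from the question can also be sketched in base R alone, without easycsv. The folder, file names and column layout below are made up for illustration; the key pieces are skip = 8 and tagging each data frame with its source file before stacking:

```r
# Toy stand-in for the real folder: two subject files whose first 8 lines
# are junk and whose header starts on line 9
subj_dir <- file.path(tempdir(), "subjects")
dir.create(subj_dir, showWarnings = FALSE)
for (f in c("subj1.txt", "subj2.txt")) {
  writeLines(c(rep("junk", 8), "score,rt", "1,300", "2,350"),
             file.path(subj_dir, f))
}

files <- list.files(subj_dir, pattern = "\\.txt$", full.names = TRUE)

# Step 1: read each file, skipping the 8 junk rows (header sits on line 9)
# Step 2: remember which subject each block of rows belongs to, then stack
subject_list <- lapply(files, function(f) {
  df <- read.csv(f, skip = 8)
  df$subject <- basename(f)
  df
})
combined <- do.call(rbind, subject_list)
```

This sidesteps the naming problem entirely: nothing is ever assigned to an automatically generated variable name, and the subject column carries the file identity into the long-format result.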
The code below looks at which files are in your directory, uses those names to fetch each table as a variable, and then uses data.table's rbindlist() to combine the tables into a single one. Hope that helps. It assumes each .csv or .txt file in the directory has already been pulled into the current environment as a separate data.table.
library(data.table)

for (x in list.files(directory)) {
  # Remove the .txt extension from the filename to get the table name
  if (grepl("\\.txt$", x)) {
    x <- gsub("\\.txt$", "", x)
  }
  thisTable <- get(x)  # use "get" to look the string up as a variable
  # now combine into a single data frame
  if (exists("combined")) {
    combined <- rbindlist(list(combined, thisTable))
  } else {
    combined <- thisTable
  }
}
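If the tables already sit in the environment under known names, an alternative sketch is to collect them all at once with mget() and bind in a single call, rather than growing the combined table inside the loop. The table names below are hypothetical, and plain rbind is used so the sketch runs without data.table (swap in rbindlist for many large files):

```r
# Toy stand-in: two tables assigned by name, as an import loop would do
assign("subj1", data.frame(x = 1:2))
assign("subj2", data.frame(x = 3:4))

# In real use the names would come from the files, e.g.:
#   table_names <- gsub("\\.txt$", "", list.files(directory))
table_names <- c("subj1", "subj2")

# mget() returns a named list of the objects; bind them once
combined <- do.call(rbind, mget(table_names))
```

Binding once at the end is also faster than repeatedly re-binding a growing table inside a loop.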
The following should work well. However, without sample data or a clearer description of what you want, it's hard to know for certain whether this is what you are looking to accomplish.
# set working directory
setwd("C:/Users/path/to/my/files")
# read in all .txt files, skipping the first 8 rows
Data.in <- lapply(list.files(pattern = "\\.txt$"), read.csv,
                  header = TRUE, skip = 8)
# stack all of the tables into one data frame
Data.in <- do.call(rbind, Data.in)
I am combining one column of data from many text files into a single CSV file. That part works fine with the code I have. However, I would like to use each file's name (e.g., "roth_Aluminusa_E1.0.DPT") as the column header for the data column taken from that file. I know similar questions have been asked, but I can't work it out. Thanks for any help :-)
Code I am using which works for combining the files:
files3 <- list.files()
# shared x-axis: first column of the first file
WAVELENGTH <- read.table(files3[1], header = FALSE, sep = ",")[, 1]
# second column of every file, bound side by side
TEST9 <- do.call(cbind, lapply(files3, function(fn)
  read.table(fn, header = FALSE, sep = ",")[, 2]))
TEST10 <- cbind(WAVELENGTH, TEST9)
You can do the following to add column names to TEST10. This assumes the column name you want for the first column is files3[1]
colnames(TEST10) <- c(files3[1], files3)
In case you want to keep the name of the first column as is, add the desired column names before binding WAVELENGTH with TEST9:
colnames(TEST9) <- files3
TEST10 <- cbind(WAVELENGTH, TEST9)
Then you can write to a csv as usual, keeping the column names as headers in the resulting file.
write.csv(TEST10, file = "TEST10.csv", row.names = FALSE)
I have a couple of thousand *.csv files (all with unique names), but the header columns are identical across the files - "Timestamp", "System_Name", "CPU_ID", etc.
My question is: how can I replace and anonymize the "System_Name" value (a system name like "as12535.org.at" or any other character combination)? I am grateful for any hint or pointer in the right direction.
Below is the structure of a CSV file:
"Timestamp","System_Name","CPU_ID","User_CPU","User_Nice_CPU","System_CPU","Idle_CPU","Busy_CPU","Wait_IO_CPU","User_Sys_Pct"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
I tried the R package anonymizer, which works fine at the vector level, but I ran into issues doing this for the thousands of csv files I was reading into R. What I tried was the following - creating a list with all the csv files as data frames inside it.
# initialize a list
r.path <- setwd("mypath")
ldf <- list()
# create the list of all the csv files in my directory - but filter for
# files with UnixM in the filename for testing
listcsv <- dir(pattern = ".UnixM.")
for (i in 1:length(listcsv)) {
  ldf[[i]] <- read.csv(file = listcsv[i])
}
I was twisting my brain to death, as I could not anonymize the System_Name column, or even replace some characters (for pseudo-anonymization), while looping through the list (ldf) and the data frame elements inside it.
My list ldf (containing the data frames for the individual csv files) looks like this:
summary(ldf)
Length Class Mode
[1,] 5 data.frame list
[2,] 5 data.frame list
[3,] 5 data.frame list
How can I read in all the CSV files, change or anonymize all or part of the "System_Name" column, and do this for every CSV in my directory, in a loop in R? It doesn't need to be super elegant - I'm happy as long as it does the job :-)
A common pattern for doing this would be:
df <- do.call(
  rbind,
  lapply(dir(pattern = "UnixM"),
         read.csv, stringsAsFactors = FALSE)
)
df$System_Name <- anonymizer::anonymize(df$System_Name)
It differs from what you were trying in that it first binds all the data frames into one, and then anonymizes.
Of course you can keep everything in a list, like @S Rivero suggests. It would look like:
listdf <- lapply(
dir(pattern = "UnixM"),
function(filename) {
df <- read.csv(filename, stringsAsFactors = FALSE)
df$System_Name <- anonymizer::anonymize(df$System_Name)
df
}
)
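If the anonymized copies also need to be written back to disk, one output file per input file, a sketch along the same lines follows. The file name and the replacement rule are placeholders so the sketch runs without the anonymizer package; in real use the replacement line would call anonymizer::anonymize():

```r
# Toy stand-in for the real directory of CSVs
csv_dir <- file.path(tempdir(), "csvs")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(Timestamp = 1, System_Name = "as06240.org.xyz:LZ"),
          file.path(csv_dir, "host_UnixM_01.csv"), row.names = FALSE)

for (f in list.files(csv_dir, pattern = "UnixM", full.names = TRUE)) {
  df <- read.csv(f, stringsAsFactors = FALSE)
  # Placeholder pseudo-anonymization: map each distinct system name to an
  # opaque label. Real use: df$System_Name <- anonymizer::anonymize(df$System_Name)
  df$System_Name <- paste0("host_", match(df$System_Name, unique(df$System_Name)))
  # Write the anonymized copy alongside the original
  write.csv(df, sub("\\.csv$", "_anon.csv", f), row.names = FALSE)
}
```

Writing to a new "_anon" file keeps the originals untouched in case the mapping needs to be redone.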
One of the columns in my data frame contains a semicolon (;), and when I write the data frame out to a csv using fwrite, that value gets split into different columns.
For example, the input abcd;#6 ends up after writing with abcd in the first column and #6 in the second. I want both parts in the same column.
Could you please suggest how to keep the value within a single column?
I am using the code below to read the input file:
InpData <- read.table(File01, header=TRUE, sep="~", stringsAsFactors = FALSE,
fill=TRUE, quote="", dec=",", skipNul=TRUE, comment.char="")
while for writing:
fwrite(InpData, File01, col.names=T, row.names=F, quote = F, sep="~")
You didn't give us an example, but it is possible you need to use a separator other than ";":
fwrite(x, file = "", sep = ",")
sep: The separator between columns. Default is ",".
If this simple solution does not work, we need your data to reproduce the problem.
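To illustrate why quoting matters here: with quoting enabled (fwrite's default is quote = "auto"), a field that contains the separator gets wrapped in quotes and survives a round trip intact; it is the quote = F in the question's fwrite call that leaves such a value unprotected. A base-R sketch of the same idea, deliberately using ";" as the separator via write.csv2:

```r
df <- data.frame(id = 1L, value = "abcd;#6", stringsAsFactors = FALSE)
f <- tempfile(fileext = ".csv")

# write.csv2 uses ";" as the separator and quotes character fields by
# default, so the embedded ";" in value is protected
write.csv2(df, f, row.names = FALSE)

# reading back with the matching reader keeps the value in one column
back <- read.csv2(f, stringsAsFactors = FALSE)
```

The same holds for fwrite: leave quote at its default (or set it to TRUE) instead of quote = F, and the semicolon stays inside a single quoted field.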
Hi, so I have data in the following format
101,20130826T155649
------------------------------------------------------------------------
3,1,round-0,10552,180,yellow
12002,1,round-1,19502,150,yellow
22452,1,round-2,28957,130,yellow,30457,160,brake,31457,170,red
38657,1,round-3,46662,160,yellow,47912,185,red
and I have been reading them in and cleaning/formatting them with this code:
b <- read.table("sid-101-20130826T155649.csv", sep = ",", fill = TRUE,
                col.names = paste("V", 1:18, sep = ""))
b$id <- b[1, 1]   # first line holds the id
b <- b[-1, ]      # drop the id line
b <- b[-1, ]      # drop the dashed separator line
b$yellow <- b$V6
and so on
There are about 300 files like this. Ideally they would all be compiled without the first two lines, since the first line is just the id, for which I've made a separate column. Does anyone know how to read these tables quickly, clean and format them the way I want, then compile them into one large file and export it?
You can use lapply to read all the files, clean and format them, and store the resulting data frames in a list. Then use do.call to combine all of the data frames into a single large data frame.
# Get vector of file names to read
files.to.load = list.files(pattern = "csv$")
# Read the files
df.list = lapply(files.to.load, function(file) {
  df = read.table(file, sep = ",", fill = TRUE,
                  col.names = paste("V", 1:18, sep = ""))
  ...  # Cleaning and formatting code goes here
  df$file.name = file  # In case you need to know which file each row came from
  return(df)
})
# Combine into a single data frame
df.combined = do.call(rbind, df.list)
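A runnable toy version of the pattern above: two made-up files in a temp folder, six columns instead of the real eighteen, and the cleaning step reduced to dropping the first two lines, as the question asks:

```r
# Two fake files in the question's layout: id line, dashed separator,
# then the data rows (only 6 columns here for brevity)
rounds_dir <- file.path(tempdir(), "rounds")
dir.create(rounds_dir, showWarnings = FALSE)
writeLines(c("101,20130826T155649", "----", "3,1,round-0,10552,180,yellow"),
           file.path(rounds_dir, "sid-101.csv"))
writeLines(c("102,20130826T160000", "----", "5,1,round-0,11000,170,yellow"),
           file.path(rounds_dir, "sid-102.csv"))

files.to.load <- list.files(rounds_dir, pattern = "csv$", full.names = TRUE)
df.list <- lapply(files.to.load, function(file) {
  df <- read.table(file, sep = ",", fill = TRUE,
                   col.names = paste("V", 1:6, sep = ""))
  df$id <- df[1, 1]    # first line holds the subject id
  df <- df[-(1:2), ]   # drop the id line and the separator line
  df$file.name <- file
  df
})
df.combined <- do.call(rbind, df.list)
```

With the real files, the only changes would be the directory, col.names = paste("V", 1:18, sep = ""), and whatever extra formatting each file needs inside the function.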