I have a folder containing different csv files. I would like to import all of them at once and name them in one go. Also, I would like to keep the column names unchanged.
Here is what I tried:
#Loading the data
filenames <- list.files(path="C:/Users/Juste/Desktop/Customs Data",
pattern="Imports 201+.*csv")
filelist <- lapply(filenames, read.csv)
#assigning names to data.frames
names(filelist) <- paste0("Imports_201",2:length(filelist))
#note the invisible function keeps lapply from spitting out the data.frames to the console
invisible(lapply(names(filelist), function(x) assign(x,filelist[[x]],envir=.GlobalEnv)))
When I tried this, it only imports the first five csv files and leaves out "Imports 2017_anonymised". Also, the column names change format: for example, the column "Best country" becomes "Best.country". How can I import all of the csv files and keep the column names unchanged?
You could try map() from the purrr package and read_csv() from the readr package (note that it is written with an underscore). This way your column names don't get changed.
library(purrr)
library(readr)
map(filenames, read_csv)
or if you automatically want to concatenate the dataframes use
map_df(filenames, read_csv)
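For reference, here is the whole pipeline as a self-contained sketch (it fakes a folder of csv files standing in for the real "Imports ... anonymised.csv" files so it can run anywhere, and assumes purrr and readr are installed):

```r
library(purrr)
library(readr)

# Fake a folder of csv files so the sketch runs anywhere
dir <- tempfile()
dir.create(dir)
for (yr in 2012:2017)
  writeLines(c("Best country,Value", "FR,1"),
             file.path(dir, paste0("Imports ", yr, "_anonymised.csv")))

# full.names = TRUE returns complete paths, so read_csv() can find the files
filenames <- list.files(dir, pattern = "^Imports 201\\d.*\\.csv$",
                        full.names = TRUE)

# Name the list after the files, then read each one;
# read_csv() leaves headers such as "Best country" untouched
filelist <- map(set_names(filenames, basename(filenames)), read_csv)

length(filelist)       # 6
names(filelist[[1]])   # "Best country" "Value"
```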
I think your regex might be a little off for the import (note that pattern is a regular expression, where + means "one or more of the preceding character"). Try pattern = "^Imports\\s+201\\d_anonymised\\.csv$".
Regarding the "."s in the column names: by default, R's base import functions run make.names() on the headers, which replaces spaces (and other invalid characters) with dots. Otherwise you need backticks each time you refer to a column with a space in its name. Try setting check.names = FALSE in your read.csv() call, which disables the call to make.names() that sanitises the column names on import. Type ?make.names to see what it's doing.
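A minimal self-contained illustration of the check.names behaviour, written against a throw-away file:

```r
# Write a small csv whose header contains a space
f <- tempfile(fileext = ".csv")
writeLines(c("Best country,Value", "FR,1", "DE,2"), f)

names(read.csv(f))                       # "Best.country" "Value"
names(read.csv(f, check.names = FALSE))  # "Best country" "Value"
```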
I'm new here and I don't know how this site works, so sorry if I make mistakes.
I have 23 xlsx files, each with many sheets in it.
I have to create a dataset which contains all of those files, but only one sheet from each. The columns and the sheet names are the same in every file.
I have to bind them by rows.
If anyone knows how to do this, I will be very grateful.
file.list <-list.files("D:/Profile/name/Desktop/Viss/foldername",pattern=".xlsx")
df.list <- lapply(file.list, read_excel)
Error: path does not exist:
df <- rbindlist(df.list, idcol = "id")
I don't know where to specify which sheet to extract, and I don't know what to write in idcol="".
I think your approach is correct, but you should ask list.files() for full paths: file.list <- list.files("D:/Profile/name/Desktop/Viss/foldername", pattern=".xlsx", full.names=TRUE)
EDIT: The pattern argument is a regular expression, so escape the dot: use pattern="\\.xlsx" in
list.files("D:/Profile/name/Desktop/Viss/foldername",pattern="\\.xlsx", full.names=TRUE)
EDIT2: You can always see any function's help by running ? followed by the function name, like ?rbindlist, or, in RStudio, by pressing F1 on the function name. The idcol parameter can be TRUE, FALSE, or a column name; in your case you can probably leave it out.
idcol
Generates an index column. Default (NULL) is not to. If idcol=TRUE then the column is auto named .id. Alternatively the column name can be directly provided, e.g., idcol = "id". If input is a named list, ids are generated using them, else using an integer vector from 1 to the length of the input list. See examples.
EDIT3 if you want to specify the sheet name you can use
lapply(file.list, function(x) read_excel(x, sheet="sheetname"))
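Putting the pieces together, an untested sketch (it assumes readxl and data.table are installed, and "sheetname" is a placeholder for the real sheet name):

```r
library(readxl)
library(data.table)

file.list <- list.files("D:/Profile/name/Desktop/Viss/foldername",
                        pattern = "\\.xlsx", full.names = TRUE)

# Read the same sheet ("sheetname" is a placeholder) from every workbook
df.list <- lapply(file.list, function(x) read_excel(x, sheet = "sheetname"))

# Name the list after the files so idcol records where each row came from
names(df.list) <- basename(file.list)
df <- rbindlist(df.list, idcol = "file")  # "file" column holds the file name
```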
I'm importing a csv file into R. I read a post here that said in order to get R to treat the first row of data as headers I needed to include the call header=TRUE.
I'm using the import function for RStudio and there is a Code Preview section in the bottom right. The default is:
library(readr)
existing_data <- read_csv("C:/Users/rruch/OneDrive/existing_data.csv")
View(existing_data)
I've tried placing header=TRUE in the following places:
read_csv(header=TRUE, "C:/Users...)
existing_data.csv", header=TRUE
after 2/existing_data.csv")
Would anyone be able to point me in the right direction?
You should use col_names instead of header. Try this:
library(readr)
existing_data <- read_csv("C:/Users/rruch/OneDrive/existing_data.csv", col_names = TRUE)
There are two different functions to read csv files (actually far more than two): read.csv from the utils package and read_csv from the readr package. The first one takes a header argument and the second one takes col_names.
You could also try fread function from data.table package. It may be the fastest of all.
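All three spellings side by side, as a runnable sketch against a throw-away file (the fread() line assumes data.table is installed, so it is left commented out):

```r
f <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,10", "2,20"), f)

d1 <- read.csv(f, header = TRUE)        # utils:      header =
library(readr)
d2 <- read_csv(f, col_names = TRUE)     # readr:      col_names =
# library(data.table)
# d3 <- fread(f, header = TRUE)         # data.table: header =

names(d1)   # "id" "score"
```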
Good luck!
It looks like there is one variable name that is correctly identified as a variable name (notice your first column). I would guess that your first row only contains the variable "Existing Product List", and that your other variable names are actually contained in the second row. Open the file in Excel or LibreOffice Calc to confirm.
If it is indeed the case that all of the variable names you've listed (including "Existing Product List") are in the first row, then you're in the same boat as me. In my case, the first row contains all of my variables, however they appear as both variable names and the first row of observations. Turns out the encoding is messed up (which could also be your problem), so my solution was simply to remove the first row.
library(readr)
mydat = read_csv("my-file-path-&-name.csv")
mydat = mydat[-1, ]
I have many big files, but I would like to get only the names of the columns without loading the files.
Using the data.table package, I can do
df1 <- fread("file.txt")
names1 <- names(df1)
But doing this for every file is very expensive. Is there some other option?
Many functions to read in data have optional arguments that allow you to specify how many lines you'd like to read in. For example, the read.table function would allow you to do:
df1 <- read.table("file.txt", nrows=1, header=TRUE)
colnames(df1)
I'd bet that fread() has this option too.
(Note that you may even be able to get away with nrows=0, but I haven't checked to see if that works)
EDIT
As a commenter kindly points out, fread() and read.table() work a little differently.
For fread(), you'll want to supply argument nrows=0 :
df1 <- fread("file.txt", nrows=0) ##works
As per the documentation,
nrows=0 is a special case that just returns the column names and types; e.g., a dry run for a large file or to quickly check format consistency of a set of files before starting to read any.
But nrows=0 is one of the ignored cases when supplied in read.table()
df1 <- read.table("file.txt") ##reads entire file
df1 <- read.table("file.txt", nrows=-1) ##reads entire file
df1 <- read.table("file.txt", nrows=0) ##also reads entire file
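A quick self-contained check of the base-R behaviour (data.table users can swap the read.table() call for fread(f, nrows = 0)):

```r
f <- tempfile(fileext = ".txt")
writeLines(c("a b c", "1 2 3", "4 5 6"), f)

df1 <- read.table(f, header = TRUE, nrows = 1)
colnames(df1)   # "a" "b" "c"
nrow(df1)       # 1 -- only the first data row was read
```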
I have used the following code to read multiple .csv files in R:
Assembly<-t(read.table("E:\\test\\exp1.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Assembly","f"))[1:4416,"Assembly",drop=FALSE])
Top1<-t(read.table("E:\\test\\exp2.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Top1","f"))[1:4416,"Top1",drop=FALSE])
Top3<-t(read.table("E:\\test\\exp3.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Top3","f"))[1:4416,"Top3",drop=FALSE])
Top11<-t(read.table("E:\\test\\exp4.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Top11","f"))[1:4416,"Top11",drop=FALSE])
Assembly1<-t(read.table("E:\\test\\exp5.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Assembly1","f"))[1:4416,"Assembly1",drop=FALSE])
Area<-t(read.table("E:\\test\\exp6.csv",sep="|",header=FALSE,col.names=c("a","b","c","d","Area","f"))[1:4416,"Area",drop=FALSE])
data<-rbind(Assembly,Top1,Top3,Top11,Assembly1,Area)
So the entire data is in the folder "test" in E drive. Is there a simpler way in R to read multiple .csv data with a couple of lines of code or some sort of function call to substitute what has been made above?
(Untested code; no working example available.) Try this: use the list.files function to generate the correct names, and then use colClasses as an argument to read.csv to throw away the first 4 columns (and since that vector is recycled, you will also throw away the 6th column):
lapply(list.files("E:\\test\\", pattern="^exp[1-6]", full.names=TRUE), read.csv,
       sep="|", header=FALSE, colClasses=c(rep("NULL", 4), "numeric"), nrows=4416)
If you want this to be returned as a dataframe, then wrap data.frame around it.
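Spelling that out, an untested sketch that reproduces the original six objects and the final rbind in one go (it assumes the files sort in exp1..exp6 order; the row names below just mirror the original object names):

```r
files <- list.files("E:\\test\\", pattern = "^exp[1-6]", full.names = TRUE)

# Keep only the 5th (pipe-separated) column of each file, transposed as before
rows <- lapply(files, function(f)
  t(read.table(f, sep = "|", header = FALSE,
               colClasses = c(rep("NULL", 4), "numeric", "NULL"),
               nrows = 4416)))

data <- do.call(rbind, rows)
rownames(data) <- c("Assembly", "Top1", "Top3", "Top11", "Assembly1", "Area")
```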
I am using a for loop to read in multiple csv files and naming the datasets import1, import2, etc. For example:
assign(paste("import",i,sep=""), read.csv(files[i], header=FALSE))
However, I now want to rename the variables in each dataset. I have tried the following:
names(as.name(paste("import",i,sep=""))) <- c("xxxx", "yyyy")
But get the error "target of assignment expands to non-language object". (I need to change the name of variables in each dataset within the loop as the variable names need to be different in each dataset).
Any suggestions on how to do this would be much appreciated.
Thanks.
While I do agree it would be much better to keep your data.frames in a list rather than creating a bunch of variables in your global environment, you can also set the names when you read the files in:
assign(paste("import",i,sep=""),
read.csv(files[i], header=FALSE, col.names=c("xxxx", "yyyy")))
Using assign() isn't very "R-like".
A better approach would be to read the files into a list of data.frames, instead of one data.frame object per file. Assuming files is the vector of file names (as you imply above):
import <- lapply(files, read.csv, header=FALSE)
Then if you want to operate on each data.frame in the list using a loop, you easily can:
for (i in seq_along(import)) names(import[[i]]) <- c('xxx', 'yyy')
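Since the question says the variable names need to differ per dataset, a self-contained variant: pair the files with a list of name vectors via Map() (the throw-away files and name.list below are hypothetical stand-ins for the real inputs):

```r
# Two throw-away csv files stand in for the OP's `files` vector
f1 <- tempfile(fileext = ".csv"); writeLines(c("1,2", "3,4"), f1)
f2 <- tempfile(fileext = ".csv"); writeLines(c("5,6", "7,8"), f2)
files <- c(f1, f2)

# One character vector of column names per file (hypothetical names)
name.list <- list(c("a1", "b1"), c("a2", "b2"))

# Map() walks the two lists in parallel, one file and one name vector at a time
import <- Map(function(f, nm) read.csv(f, header = FALSE, col.names = nm),
              files, name.list)

names(import[[1]])   # "a1" "b1"
names(import[[2]])   # "a2" "b2"
```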