I am trying to perform the following:
Analyse_Store <- function("x"){
datapaste"x" <- read.table(file = "x".txt, sep=";", header = TRUE)
}
So basically I am trying to read a table named "x" and assign it to the name datapaste"x" unfortunately this does not work as expected and I cannot find any help online.
Thanks in advance
Use assign to read table into the variable named x.
The following function will take argument x and assign x.txt from working directory to a variable x inside the function.
Analyse_Store <- function(x){
assign(paste("datapaste",x,sep=""), read.table(file = paste(x,".txt",sep=""), sep=";", header = TRUE))
}
Usage would be
datapastex = Analyse_Store("x")
Unless you are further processing the table inside the function, I don't see much use for a function in this case. You could just do
datapastex = read.table(file = "x.txt", sep=";", header = TRUE)
Related
What I want to do is take every file in the subdirectory that I am in and essentially just shift the column header names over one left.
I try to accomplish this by using fread in a for loop:
library(data.table)
## I need to write this script to reorder the column headers which are now apparently out of wack
## I just need to shift them over one
filelist <- list.files(pattern = ".*.txt")
for(i in 1:length(filelist)){
assign(filelist[[i]], fread(filelist[[i]], fill = TRUE))
names(filelist[[i]]) <- c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)")
}
However, I keep getting the following or a variant of the following error message:
Error in names(filelist[[i]]) <- c("RowID", "rsID", "PosID", "Link", "Link.1", :
'names' attribute [8] must be the same length as the vector [1]
Which is confusing to me because, as you can clearly see above, R Studio is able to load the files as having the correct number of columns. However, the error message seems to imply that there is only one column. I have tried different functions, such as colnames, and I have even tried to define the separator as being quotation marks (as my files were previously generated by another R script that quotation-separated the entries), to no luck. In fact, if I try to define the separator as such:
for(i in 1:length(filelist)){
assign(filelist[[i]], fread(filelist[[i]], sep = "\"", fill = TRUE))
names(filelist[[i]]) <- c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)")
}
I get the following error:
Error in fread(filelist[[i]], sep = "\"", fill = TRUE) :
sep == quote ('"') is not allowed
Any help would be appreciated.
I think the problem is that, despite the name, list.files returns a character vector, not a list. So using [[ isn't right. Then, with assign, you create an objects that have the same name as the files (not good practice, it would be better to use a list). Then you try to modify the names of the object created, but only using the character string of the object name. To use an object who's name is in a character string, you need to use get (which is part of why using a list is better than creating a bunch of objects).
To be more explicit, let's say that filelist = c("data1.txt", "data2.txt"). Then, when i = 1, this code: assign(filelist[[i]], fread(filelist[[i]], fill = TRUE)) creates a data table called data1.txt. But your next line, names(filelist[[i]]) <- ... doesn't modify your data table, it modifies the first element of filelist, which is the string "data1.txt", and that string indeed has length 1.
I recommend reading your files into a list instead of using assign to create objects.
filelist <- list.files(pattern = ".*.txt")
datalist <- lapply(filelist, fread, fill = TRUE)
names(datalist) <- filelist
For changing the names, you can use data.table::setnames instead:
for(dt in datalist) setnames(dt, c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)"))
However, fread has a col.names argument, so you can just do it in the read step directly:
my_names <- c("RowID", "rsID", "PosID", "Link", "Link.1","Direction", "Spearman_rho", "-log10(p)")
datalist <- lapply(filelist, fread, fill = TRUE, col.names = my_names)
I would also suggest not using "-log10(p)" as a column name - nonstandard column names (with parens and -) are usually more trouble than they are worth.
Could you run the following code to have a closer look at what you are putting into filelist?
i <- 1
assign(filelist[[i]], fread(filelist[[i]], fill = TRUE))
print(filelist[[i]])
I suspect you may need to use the code below instead of the assign statement
filelist[[i]] <- fread(filelist[[i]], fill = TRUE)
I am having trouble using lapply with a custom function I created, but it is a fairly specific function and I haven't been able to find an answer that suits me.
The function is quite long and does a lot of things, but I think I've managed to trim it down to a reproducible example that gives the error I am getting.
The thing is, I have a folder with two different types of files, that I want to load into R as elements of a list. They have more or less the same information, but they have different layout, different file extension, different almost everything, so the process to import them is different for each type.
The function goes like this:
f <- function(a_file, b_file, type){
if (type == "a"){act <- read.table(a_file, skip = 19, header = TRUE, sep = "", dec = ".")}
if (type == "b"){act <- read.table(b_file, header=FALSE, sep="\t", dec=".")}
return(act)}
Then I create vectors with the names of the two types of files I want to call, like this:
a_files <- dir(pattern=".deg")
b_files <- dir(pattern=".act")
And finally try to apply the function like this:
act_list <- lapply(a_files, f, b_files, type = "b")
which works if type = "a", but fails for type = "b", giving the error:
Error in file(file, "rt") : invalid 'description' argument
which I am pretty sure has to do with the fact that I am applying the function to only the "a_files" vector and not to "b_files", but I as much as I try I can't figure out a way to fix it...
There is a much simpler way to solve your problem.
First, define the function to have only one argument for the filename. Then decide inside it what type of file it is and do the right type of read.
readFiles = function(file){
if(grepl(file,pattern = "\\.deg")){
act <- read.table(a_file, skip = 19, header = TRUE, sep = "", dec = ".")
}
if(grepl(file,pattern = "\\.act")){
act <- read.table(b_file, header=FALSE, sep="\t", dec=".")
}
return(act)
}
Finally, you only have to call lapply on your files vector:
filesVector = dir(pattern = "(\\.act|\\.deg)")
result = lapply(filesVector, readFiles)
filesVector will return all files that contain ".act" or ".deg".
NOTE: Your pattern is not correct, since it will return files that contain any character followed by "act" or by "deg".
I have this little problem in R : I loaded a dataset, modified it and stored it in the variable "mean". Then I used an other variable "dataset" also containing this dataset
data<-read.table()
[...modification on data...]
mean<-data
dataset<-mean
I used the variable "dataset" in some other functions of my script, etc. and at the end I want to store in a file with the name "table_mean.csv"
Of course the command write.csv(tabCorr,file=paste("table_",dataset,".csv",sep=""))
nor the one with ...,quote(dataset)... do what I want...
Does anyone know how I can retrieve "mean" (as string) from "dataset" ?
(The aim would be that I could use this script for other purposes simply changing e.g. dataset<-variance)
Thank you in advance !
I think you are trying to do something like the following code does:
data1 <- 1:4
data2 <- 4:8
## Configuration ###
useThisDataSet <- "data2" # Change to "data1" to use other dataset.
currentDataSet <- get(x = useThisDataSet)
## Your data analysis.
result <- fivenum(currentDataSet)
## Save results.
write.csv(x = result, file = paste0("table_", useThisDataSet, ".csv"))
However, a better alternative would be to wrap your code into a function and pass in your data:
doAnalysis <- function(data, name) {
result <- fivenum(data)
write.csv(x = result, file = paste0("table_", name, ".csv"))
}
doAnalysis(data1, "data1")
If you always want to use the name of the object passed into the function as part of the filename, we can use non-standard evaluation to save some typing:
doAnalysisShort <- function(data) {
result <- fivenum(data)
write.csv(x = result, file = paste0("table_", substitute(data), ".csv"))
}
doAnalysisShort(data1)
I am still pretty new to R and very new to for-loops and functions, but I searched quite a bit on stackoverflow and couldn't find an answer to this question. So here we go.
I'm trying to create a script that will (1) read in multiple .csv files and (2) apply a function to strip twitter handles from urls in and do some other things to these files. I have developed script for these two tasks separately, so I know that most of my code works, but something goes wrong when I try to combine them. I prepare for doing so using the following code:
# specify directory for your files and replace 'file' with the first, unique part of the
# files you would like to import
mypath <- "~/Users/you/data/"
mypattern <- "file+.*csv"
# Get a list of the files
file_list <- list.files(path = mypath,
pattern = mypattern)
# List of names to be given to data frames
data_names <- str_match(file_list, "(.*?)\\.")[,2]
# Define function for preparing datasets
handlestripper <- function(data){
data$handle <- str_match(data$URL, "com/(.*?)/status")[,2]
data$rank <- c(1:500)
names(data) <- c("dateGMT", "url", "tweet", "twitterid", "rank")
data <- data[,c(4, 1:3, 5)]
}
That all works fine. The problem comes when I try to execute the function handlestripper() within the for-loop.
# Read in data
for(i in data_names){
filepath <- file.path(mypath, paste(i, ".csv", sep = ""))
assign(i, read.delim(filepath, colClasses = "character", sep = ","))
i <- handlestripper(i)
}
When I execute this code, I get the following error: Error in data$URL : $ operator is invalid for atomic vectors. I know that this means that my function is being applied to the string I called from within the vector data_names, but I don't know how to tell R that, in this last line of my for-loop, I want the function applied to the objects of name i that I just created using the assign command, rather than to i itself.
Inside your loop, you can change this:
assign(i, read.delim(filepath, colClasses = "character", sep = ","))
i <- handlestripper(i)
to
tmp <- read.delim(filepath, colClasses = "character", sep = ",")
assign(i, handlestripper(tmp))
I think you should make as few get and assign calls as you can, but there's nothing wrong with indexing your loop with names as you are doing. I do it all the time, anyway.
I'm wondering if it would be possible to draw out out the filename from a file.choose() command embedded within a read.csv call. Right now I'm doing this in two steps, but the user has to select the same file twice in order to extract both the data (csv) and the filename for use in a function I run. I want to make it so the user only has to select the file once, and then I can use both the data and the name of the file.
Here's what I'm working with:
data <- read.csv(file.choose(), skip=1))
name <- basename(file.choose())
I'm running OS X if that helps, since I think file.choose() has different behavior depending on the OS. Thanks in advance.
Why you use an embedded file.choose() command?
filename <- file.choose()
data <- read.csv(filename, skip=1)
name <- basename(filename)
use this:
df = read.csv(file.choose(), sep = "<use relevant seperator>", header = T, quote = "")
seperators are generally comma , or fullstop .
Example:
df = read.csv(file.choose(), sep = ",", header = T, quote = "")
#
Use :
df = df[,!apply(is.na(df), 2, all)] # works for every data format
to remove blank columns to the left