Concatenate text using paste to call a vector in R

I'm very new to R, so I may still be thinking in spreadsheets. I'd like to loop over a vector of names, pass each name to a function (effect), and wrap the name in a bit of text ("data$" in front, ".time0" or ".time1" at the end) so that it references a specific column of a data frame I already have loaded (i.e., data$variable.time0 and data$variable.time1).
paste just gives me a character string, "data$variable.time0" or "data$variable.time1", rather than the column of the data frame I want. Can I convert this string to a reference somehow?
effect <- function(i){
time0 <- paste("data$",i,".time0", sep = "")
time1 <- paste("data$",i,".time1", sep = "")
#code continues but not relevant here
}
for (i in list){
effect(i)
}

You can use eval(parse(text = "...")) to evaluate a character string as R code.
Try
time0 <- eval(parse(text = paste("data$",i,".time0", sep = "")))
within your function.
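As an alternative to evaluating a pasted string, a column can usually be looked up by name with `[[`. The following is a minimal sketch, not the original poster's code: the data frame contents, the column names (height, weight) and the body of effect are made up for illustration.

```r
# Hypothetical data frame with paired ".time0" / ".time1" columns
data <- data.frame(
  height.time0 = c(1, 2, 3),
  height.time1 = c(2, 4, 6),
  weight.time0 = c(10, 20, 30),
  weight.time1 = c(11, 21, 31)
)

# Build the column name as a string, then index the data frame with [[ ]]
effect <- function(name, df) {
  time0 <- df[[paste0(name, ".time0")]]
  time1 <- df[[paste0(name, ".time1")]]
  mean(time1 - time0)  # placeholder computation, an assumption for this sketch
}

results <- sapply(c("height", "weight"), effect, df = data)
print(results)
```

Passing the data frame as an argument keeps the function independent of a global object called data.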

Related

How to name dataframe in for loop

I'm trying to call a dataframe but it's named with a number because it was originally multiple. I want to either rename the dataframes in my loop or find a way to call my dataframe even though it is titled with a number. Right now, after I run this code:
filenames <- list.files(path = "filepath",pattern = ".*txt")
head(filenames)
names <- substr(filenames,1,22)
for(i in names){
filepath <-file.path("filepath",paste(i,".txt",sep = ""))
assign(i,read.delim(filepath,colClasses = c('character','character','factor','factor'),sep = "\t"))
}
I get a lot of separate dataframes with names like '101_1b1_Al_sc_Meditron'. When I try even to view one of them, R is confused because the name begins with a number.
Is there a good solution here?
The simplest solution is to reference the original names using backticks.
example:
`123_mtcars` <- mtcars
View(`123_mtcars`)
If you would prefer to create a naming convention or just to remove numbers from each dataframe name you could do that in your loop and use the new variable in your assign statement.
example:
filenames <- list.files(path = "filepath",pattern = ".*txt")
head(filenames)
names <- substr(filenames,1,22)
for(i in names){
filepath <-file.path("filepath",paste(i,".txt",sep = ""))
# gsub to replace all numbers with "" for the name i
dfName <- gsub("[0-9]", "", i)
assign(dfName,read.delim(filepath,colClasses = c('character','character','factor','factor'),sep = "\t"))
}
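Stripping digits can still leave a name that starts with an underscore, and two files may collapse to the same name. One hedged alternative is base R's make.names(), which turns arbitrary strings into syntactic names. The file stems below are illustrative only.

```r
# Example file stems that start with a digit (illustrative values)
raw_names <- c("101_1b1_Al_sc_Meditron", "102_1b1_Ar_sc_Meditron")

# make.names() prepends "X" when a name does not start with a letter or dot
clean_names <- make.names(raw_names, unique = TRUE)
print(clean_names)
# "X101_1b1_Al_sc_Meditron" "X102_1b1_Ar_sc_Meditron"
```

The cleaned names can then be passed to assign() in place of the original ones.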
There are 3 solutions I can think of:
1. Keep your code in its current state.
If you don't change anything about your code and your dataframes are named like '101_1b1_Al_sc_Meditron', you can use backticks to view the contents of a dataframe. Try it like this:
`101_1b1_Al_sc_Meditron`
2. Change the name of dataframes.
In your loop change the assign line to
assign(paste0('df_', i), read.delim(filepath,
colClasses = c('character','character','factor','factor'),sep = "\t"))
So after running the for loop you'll have dataframes named like df_101_1b1_Al_sc_Meditron, which is a standard name, and you can access them without any problem.
3. Store data in a list.
Instead of having so many dataframes in the global environment, why not store them in a list? Lists are easier to manage.
# build the full path of every file first, then read them all in one call
filepaths <- file.path("filepath", filenames)
list_of_files <- lapply(filepaths, function(x) read.delim(x,
colClasses = c('character','character','factor','factor'),sep = "\t"))
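A list becomes easier to work with once its elements are named after the files. The sketch below is self-contained and writes two small tab-separated files to a temporary directory. The file names and contents are invented for the example.

```r
# Create a temporary directory with two small tab-separated files (made-up data)
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
demo_df <- data.frame(a = c("p", "q"), b = c("r", "s"))
write.table(demo_df, file.path(demo_dir, "101_demo.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(demo_df, file.path(demo_dir, "102_demo.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Read every .txt file into one list
paths <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
dfs <- lapply(paths, read.delim, colClasses = "character")

# Name each element after its file, without the extension
names(dfs) <- tools::file_path_sans_ext(basename(paths))

# A name that starts with a digit is not a problem inside [[ ]]
first <- dfs[["101_demo"]]
print(names(dfs))
```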

How to assign new data to the variable within variable in R

I have my file names stored as all.files, and the files are in the working directory. I want to read these files in a loop and name each resulting object gsub(".csv","", all.files).
all.files <- c("harvestA.csv", "harvestB.csv", "harvestC.csv", "harvestD.csv",
"seedA.csv", "seedB.csv", "seedC.csv", "seedD.csv")
I tried something like this below but it won't work. What do I need here?
for(i in 1:length(all.files)){
assign(gsub(".csv","", all.files)[i]) <- read.table(
all.files[i],
header = TRUE,
sep = ","
)
}
You could keep them in a named list, as it is not good practice to clutter the environment with a lot of global variables:
list_df <- lapply(all.files, read.csv)
names(list_df) <- sub("\\.csv", "", all.files)
You can always extract individual dataframes as list_df[["harvestA"]], list_df[["harvestB"]] etc.
If you still need them as separate dataframes:
list2env(list_df, .GlobalEnv)
The . is a regex metacharacter that matches any character, so we can use fixed = TRUE to match the literal dot. Also, in the OP's code there is no need for another assignment operator (<-) after assign: the second argument of assign is value, and here it is the dataset read with read.table.
for(i in 1:length(all.files)){
assign(sub(".csv","", all.files, fixed = TRUE)[i], read.table(
all.files[i],
header = TRUE,
sep = ","
))
}
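The difference between the regex dot and a literal dot shows up with a file name that contains "csv" twice. The file name below is hypothetical, chosen only to make the difference visible.

```r
fname <- "my_csv_data.csv"  # hypothetical file name

# Regex: "." matches any character, so the first match is "_csv"
regex_result <- sub(".csv", "", fname)

# fixed = TRUE: ".csv" is matched literally
fixed_result <- sub(".csv", "", fname, fixed = TRUE)

# Escaped dot anchored at the end of the string
anchored_result <- sub("\\.csv$", "", fname)

print(c(regex_result, fixed_result, anchored_result))
# "my_data.csv" "my_csv_data" "my_csv_data"
```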
An option using strsplit
for (i in seq_along(all.files)) {
assign(x = strsplit(all.files[i],"\\.")[[1]][1],
value = read.csv(all.files[i]),
pos = .GlobalEnv)
}

`rbind` unique entries of all columns of a data frame and write it to a csv file

##Initialise empty dataframe
g <-data.frame(x= character(), y= character(),z=numeric())
## Loop through each columns and list out unique values (with the column name)
for(i in 1:ncol(iris))
{
a<-data.frame(colnames(iris)[i],unique(iris[,i]),i)
g<-rbind(g,a)
setNames(g,c('x','y','z'))
}
## write the output to csv file
write.csv(g,"1.csv")
The column headers in the output CSV file are not what I want: they should be 'x', 'y', 'z' respectively. Also, the first column (the row names) should not be there.
If you know a more efficient way to do this, let me know. Thanks!
This will do the work:
for(i in 1:ncol(iris))
{
a<-data.frame(colnames(iris)[i],unique(iris[,i]),i)
g<-rbind(g,a)
}
g <- setNames(g,c('x','y','z')) ## note the `g <-`
write.csv(g, file="1.csv", row.names = FALSE) ## don't write row names
setNames returns a new data frame with names "x", "y" and "z", rather than updating the input data frame g. You need the explicit assignment <- to do the "replacement". You can avoid writing g <- setNames(...) by using either of the following replacement forms:
names(g) <- c('x','y','z')
colnames(g) <- c('x','y','z')
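The copy semantics of setNames can be checked on a tiny data frame. This is a minimal sketch with made-up values.

```r
g <- data.frame(a = 1:2, b = c("u", "v"))

# Result is discarded: g itself keeps its old names
setNames(g, c("x", "y"))
names_after_discard <- names(g)

# Result is assigned: the new object carries the new names
g2 <- setNames(g, c("x", "y"))

print(names_after_discard)  # "a" "b"
print(names(g2))            # "x" "y"
```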
Alternatively, you can use the col.names argument inside write.table:
for(i in 1:ncol(iris))
{
a<-data.frame(colnames(iris)[i],unique(iris[,i]),i)
g<-rbind(g,a)
}
write.table(g, file="a.csv", col.names=c("x","y","z"), sep =",", row.names=FALSE)
write.csv() does not support col.names, hence we use write.table(..., sep = ","). Trying to use col.names in write.csv will generate a warning.
A more efficient way
I would avoid using rbind inside a loop. I would do:
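x <- lapply(iris, function (column) as.character(unique(column)))
## stack() returns the columns (values, ind); put the column name first to match x, y, z
g <- cbind.data.frame(stack(x)[, c("ind", "values")], rep.int(1:ncol(iris), lengths(x)))
write.table(g, file="1.csv", row.names=FALSE, col.names=c("x","y","z"), sep=",")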
Read ?lapply and ?stack for more.
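To see what lapply and stack produce, here is a small sketch on a made-up two-column data frame instead of iris.

```r
df <- data.frame(a = c(1, 1, 2), b = c("u", "v", "u"))

# One character vector of unique values per column
x <- lapply(df, function(column) as.character(unique(column)))

# stack() flattens the list: "values" holds the entries, "ind" the source column
s <- stack(x)
print(s)
```

The ind column is a factor, so as.character(s$ind) gives the plain column names.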

R 3.1 sapply to a list of files

I want to apply the read.table() function to a list of .txt files. These files are in my current directory.
my.txt.list <-
list("subject_test.txt", "subject_train.txt", "X_test.txt", "X_train.txt")
Before applying read.table() to the elements of this list, I want to check whether the dt has already been computed and stored in a cache directory. The dt objects from the cache directory are already in my environment(), in the form file_name.dt
R> ls()
"subject_test.dt" "subject_train.dt"
In this example, I only want to compute "X_test.txt" and "X_train.txt". I wrote a small function to test whether a dt has already been cached and to apply read.table() if not.
my.rt <- function(x,...){
# apply read.table to txt files if data table is not already cached
# x is a character vector
y <- strsplit(x,'.txt')
y <- paste(y,'.dt',sep = '')
if (y %in% ls() == FALSE){
rt <- read.table(x, header = F, sep = "", dec = '.')
}
}
This function works if I take one element this way :
subject_test.dt <- my.rt('subject_test.txt')
Now I want to sapply to my files list this way:
my.res <- sapply(my.txt.list,my.rt)
I get my.res as a list of df, but the issue is that the function reads all the files and does not take the already computed ones into account.
I must be missing something, but I can't see why.
TY for suggestions.
I think it has to do with the use of strsplit in your example. strsplit returns a list.
What about this?
my.txt.files <- c("subject_test.txt", "subject_train.txt", "X_test.txt", "X_train.txt")
> ls()
[1] "subject_test.dt" "subject_train.dt"
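my.rt <- function(x){
y <- gsub(".txt", ".dt", x, fixed = TRUE)
# a bare ls() inside a function lists only the function's own variables,
# so look in the global environment explicitly
if (!(y %in% ls(envir = .GlobalEnv))) {
read.table(x, header = FALSE, sep = "", dec = '.') }
}
my.res <- sapply(my.txt.files, FUN = my.rt)
Note that I'm replacing .txt with .dt, doing a "not in", and calling ls(envir = .GlobalEnv), because a bare ls() inside a function lists only that function's local variables. You will get NULL entries in the result list if a file is not processed.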
This is untested, but I think it should work...
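One thing worth checking in a function like this is where ls() looks. Called without arguments inside a function, it lists that function's local variables, not the caller's workspace. The sketch below uses a separate environment, cache_env, as a stand-in for the workspace. That environment and both helper functions are assumptions made for the example.

```r
# Stand-in for the workspace that holds the cached objects (hypothetical)
cache_env <- new.env()
assign("subject_test.dt", data.frame(id = 1), envir = cache_env)

# ls() with no arguments sees only this function's own variables
local_names <- function(x) {
  y <- sub(".txt", ".dt", x, fixed = TRUE)
  ls()
}

# exists() with an explicit environment checks the intended place
is_cached <- function(x, env) {
  y <- sub(".txt", ".dt", x, fixed = TRUE)
  exists(y, envir = env, inherits = FALSE)
}

seen <- local_names("subject_test.txt")
cached <- vapply(c("subject_test.txt", "X_test.txt"), is_cached,
                 logical(1), env = cache_env)
print(seen)    # "x" "y"
print(cached)
```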

Executing function on objects of name 'i' within for-loop in R

I am still pretty new to R and very new to for-loops and functions, but I searched quite a bit on stackoverflow and couldn't find an answer to this question. So here we go.
I'm trying to create a script that will (1) read in multiple .csv files and (2) apply a function that strips Twitter handles from URLs and does some other things to these files. I have developed scripts for these two tasks separately, so I know that most of my code works, but something goes wrong when I try to combine them. I prepare for doing so using the following code:
# str_match() used below comes from the stringr package
library(stringr)
# specify directory for your files and replace 'file' with the first, unique part of the
# files you would like to import
mypath <- "~/Users/you/data/"
mypattern <- "file+.*csv"
# Get a list of the files
file_list <- list.files(path = mypath,
pattern = mypattern)
# List of names to be given to data frames
data_names <- str_match(file_list, "(.*?)\\.")[,2]
# Define function for preparing datasets
handlestripper <- function(data){
data$handle <- str_match(data$URL, "com/(.*?)/status")[,2]
data$rank <- c(1:500)
names(data) <- c("dateGMT", "url", "tweet", "twitterid", "rank")
data <- data[,c(4, 1:3, 5)]
}
That all works fine. The problem comes when I try to execute the function handlestripper() within the for-loop.
# Read in data
for(i in data_names){
filepath <- file.path(mypath, paste(i, ".csv", sep = ""))
assign(i, read.delim(filepath, colClasses = "character", sep = ","))
i <- handlestripper(i)
}
When I execute this code, I get the following error: Error in data$URL : $ operator is invalid for atomic vectors. I know that this means that my function is being applied to the string I called from within the vector data_names, but I don't know how to tell R that, in this last line of my for-loop, I want the function applied to the objects of name i that I just created using the assign command, rather than to i itself.
Inside your loop, you can change this:
assign(i, read.delim(filepath, colClasses = "character", sep = ","))
i <- handlestripper(i)
to
tmp <- read.delim(filepath, colClasses = "character", sep = ",")
assign(i, handlestripper(tmp))
I think you should make as few get and assign calls as you can, but there's nothing wrong with indexing your loop with names as you are doing. I do it all the time, anyway.
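The pattern of transforming first and assigning afterwards can be tried without any files. In this sketch, add_rank is a hypothetical stand-in for handlestripper, the data frames are built in memory, and a dedicated environment is used instead of the global one.

```r
# Hypothetical stand-in for handlestripper(): adds a rank column
add_rank <- function(data) {
  data$rank <- seq_len(nrow(data))
  data
}

store <- new.env()  # where the named data frames end up

for (nm in c("fileA", "fileB")) {
  tmp <- data.frame(url = c("a", "b"))      # stands in for read.delim(filepath, ...)
  assign(nm, add_rank(tmp), envir = store)  # apply the function to the data, not to the name
}

# get() retrieves an object by its name as a string
result <- get("fileA", envir = store)
print(result)
```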
