How to store data from read.table to variable array - r

I have data files that look something like this:
class1 class2 ....
1 1 ....
2 1
If I read a data file like this:
var <- read.table("file path", sep="\t",header=TRUE)
it works correctly, and I can access the data through the 'var' variable.
But if I try to read the files in a for loop and store them in one variable, like this:
var <- c()
for(file in list.files(path="inputDir")){
i <- i+1
var[i] <- read.table("file path", sep="\t", header=TRUE)
}
I get only the first column of the file and can't get the full data.
Do I have to make separate variables like var1, var2, ...?
Can't I use var[i]?

With
var <- c()
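you create an empty vector (c() returns NULL), not a list. When you assign a data frame to a single element with var[i] <- ..., only its first element (the first column) is stored and the rest is dropped with a warning, which is why you only see 'one column'.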
What you want is a list:
var <- list()
Make sure to index it with double brackets afterwards, like so:
var[[i]] = ...
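A small sketch of the difference, using two made-up data frames in place of files read with read.table. The names df1, df2, as_vector and as_list are illustrative only and do not come from the question:

```r
# Two small data frames standing in for the imported files (made-up data).
df1 <- data.frame(class1 = c(1, 2), class2 = c(1, 1))
df2 <- data.frame(class1 = c(3, 4), class2 = c(2, 2))

# Single brackets on an empty vector: only the first column is stored.
# R normally warns here; suppressWarnings() just keeps the demo quiet.
as_vector <- c()
suppressWarnings(as_vector[1] <- df1)
length(as_vector)            # one element, holding df1$class1 only

# A list indexed with double brackets keeps each data frame whole.
as_list <- list()
as_list[[1]] <- df1
as_list[[2]] <- df2
ncol(as_list[[1]])           # both columns are still there
```

After the loop, each file's data is available as as_list[[1]], as_list[[2]], and so on, so separate variables like var1 and var2 are not needed.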

You should use a list for this. A data.frame can only hold variables with the same number of rows.
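var <- list()
i <- 1
# full.names=TRUE returns paths that read.table can open directly
for(file in list.files(path="inputDir", full.names=TRUE)){
  # read the file from the current iteration, not a fixed path
  var[[as.character(i)]] <- read.table(file, sep="\t", header=TRUE)
  i <- i+1
}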
I hope this helps.
I don't know whether this code works as-is, so you may need to debug it based on the error messages.
If you still can't get it to work, please post some sample files so that others can debug it for you.
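The same idea can be written without a counter by using lapply(). This is only a sketch: the directory name inputDir_demo, the file names a.txt and b.txt, and their contents are made up so that the example runs on its own.

```r
# Create a throwaway input directory with two tab-separated files (made-up data).
input_dir <- file.path(tempdir(), "inputDir_demo")
dir.create(input_dir, showWarnings = FALSE)
write.table(data.frame(class1 = 1:2, class2 = c(1, 1)),
            file.path(input_dir, "a.txt"), sep = "\t", row.names = FALSE)
write.table(data.frame(class1 = 3:4, class2 = c(2, 2)),
            file.path(input_dir, "b.txt"), sep = "\t", row.names = FALSE)

# full.names = TRUE returns paths that read.table can open directly.
paths <- list.files(input_dir, full.names = TRUE)

# Read every file; the result is a list with one data frame per file.
tables <- lapply(paths, read.table, sep = "\t", header = TRUE)
names(tables) <- basename(paths)

tables[["a.txt"]]   # the full data frame for the first file
```

Naming the list elements after the files makes it easier to tell later which data frame came from which file.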

Related

How to quickly export multiple files from RStudio

I'd like to know, how can I export subsets of a dataframe in R in an automated way?
I am currently using this manual method, where I retype 'a' and 'file_name' values for every file I want to save:
data <- MS[grepl('a', MS$name),]
write.xlsx(data, 'file_path/file_name')
Any help would be very much appreciated.
I would try something like this:
lijst <- c('a','b','c') # list of the values you type for 'a'
for(a in lijst){
filename <- paste0('file_path/',a,'.xlsx')
data <- MS[grepl(a, MS$name),]
write.xlsx(data, filename)
}
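A self-contained variant of that loop, as a sketch only. The MS data frame below is made up, the output goes to a temporary directory, and write.csv from base R is used as a stand-in so the example runs without an Excel package; swap write.xlsx back in for .xlsx output.

```r
# Made-up stand-in for the MS data frame from the question.
MS <- data.frame(name = c("a1", "a2", "b1", "c1"), value = 1:4)

patterns <- c("a", "b", "c")
out_dir <- file.path(tempdir(), "export_demo")
dir.create(out_dir, showWarnings = FALSE)

for (p in patterns) {
  subset_df <- MS[grepl(p, MS$name), ]
  # Skip patterns that match nothing, so no empty files are written.
  if (nrow(subset_df) == 0) next
  # write.csv is a base-R stand-in for write.xlsx here.
  write.csv(subset_df, file.path(out_dir, paste0(p, ".csv")), row.names = FALSE)
}

list.files(out_dir)
```

Building the path with file.path() and paste0() keeps the file name tied to the pattern, so nothing has to be retyped per file.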

Binding rows of multiple data frames into one data frame in R

I have a vector of file paths called dfs, and I want to read those files into dataframes and bind them together into one huge dataframe, so I did something like this:
for (df in dfs){
clean_df <- bind_rows(as.data.table(read.delim(df, header=T, sep="|")))
return(clean_df)
}
But only the data from the last file is being returned. How do I fix this?
I'm not sure about your file format, so I'll take common .csv as an example. Replace the a * i part with actually reading all the different files, instead of just generating mockup data.
files = list()
for (i in 1:10) {
a = read.csv('test.csv', header = FALSE)
a = a * i
files[[i]] = a
}
full_frame = data.frame(data.table::rbindlist(files))
The problem is that you can only pass one file at a time to the function read.delim(). So the solution would be to use a function like lapply() to read in each file specified in your df.
Here's an example, and you can find other answers to your question here.
library(tidyverse)
df <- c("file1.txt","file2.txt")
all.files <- lapply(df,function(i){read.delim(i, header=T, sep="|")})
clean_df <- bind_rows(all.files)
(clean_df)
Note that you don't need return(): it is only valid inside a function and raises an error at the top level. Putting clean_df in parentheses prompts R to print the variable.
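For comparison, here is a base-R sketch of the same read-then-bind pattern that needs no extra packages. The directory bind_demo, the file names, and the columns id and score are made up so the example is runnable on its own.

```r
# Write two small pipe-delimited files to a temporary directory (made-up data).
demo_dir <- file.path(tempdir(), "bind_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("id|score", "1|10", "2|20"), file.path(demo_dir, "file1.txt"))
writeLines(c("id|score", "3|30"), file.path(demo_dir, "file2.txt"))

dfs <- list.files(demo_dir, full.names = TRUE)

# Read each file into a list element, then bind the list once at the end.
all_files <- lapply(dfs, read.delim, header = TRUE, sep = "|")
clean_df <- do.call(rbind, all_files)

nrow(clean_df)   # rows from both files combined
```

The key point is that binding happens once, after all files are read, rather than overwriting clean_df on every iteration. Note that rbind expects the same columns in every file, whereas bind_rows fills missing columns with NA.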

Apply function to all dataframes

I work with SAS files (sas7bdat = dataframes) and SAS formats (sas7bcat).
My sas7bdat files are in a "data" folder, so I can get a list of them in the object files_names.
Here is the first part of my code, which works perfectly:
files_names <- list.files(here("data"))
nb_files <- length(files_names)
data_names <- vector("list",length=nb_files)
for (i in 1 : nb_files) {
data_names[i] <- strsplit(files_names[i], split=".sas7bdat")
}
for (i in 1:nb_files) {
assign(data_names[[i]],
read_sas(paste(here("data", files_names[i])), "formats/formats.sas7bcat")
)}
But I run into issues when trying to apply the function as_factor from the haven package (in order to apply labels to my new dataframes and get, for example, SEX = "Male" instead of SEX = 1).
I can make it work dataframe by dataframe, as in the code below:
df_labelled <- haven::as_factor(df, only_labelled = TRUE)
I would like to do this in a loop, but it didn't work because data_names[i] is a name, not a dataframe, and as_factor requires a dataframe as its first argument.
I'm quite new to R; thank you very much if someone can help me.
You might want to think about using a different data structure. For example, you can use a named list to store your dataframes; then you can easily loop through them.
In fact, you could do everything in one loop. I'm sure there's a more efficient way to do this, but here's one way that doesn't change your code too much:
library(haven)
library(here)
files_names <- list.files(here("data"))
raw_dfs <- list()
labelled_dfs <- list()
for (file_name in files_names) {
  # strsplit returns a list, so either extract the first element like this:
  # df_name <- strsplit(file_name, split=".sas7bdat", fixed=TRUE)[[1]]
  # or strip the extension with something else, like gsub
  df_name <- gsub(".sas7bdat", "", file_name, fixed=TRUE)
  # use [[ ]] so the whole data frame is stored, not just its first column
  raw_dfs[[df_name]] <- read_sas(here("data", file_name), "formats/formats.sas7bcat")
  labelled_dfs[[df_name]] <- haven::as_factor(raw_dfs[[df_name]], only_labelled = TRUE)
}
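Once the data frames sit in a named list, the labelling step can also be written with lapply(). The sketch below uses made-up data and a hypothetical helper, label_sex, as a base-R stand-in for haven::as_factor so that it runs without SAS files or extra packages.

```r
# Made-up data frames standing in for the imported SAS tables.
raw_dfs <- list(
  patients = data.frame(SEX = c(1, 2, 1)),
  visits   = data.frame(SEX = c(2, 2))
)

# Hypothetical stand-in for haven::as_factor(): turns the SEX codes into labels.
label_sex <- function(df) {
  df$SEX <- factor(df$SEX, levels = c(1, 2), labels = c("Male", "Female"))
  df
}

# lapply() returns a list with the same names, one labelled data frame each.
labelled_dfs <- lapply(raw_dfs, label_sex)

labelled_dfs[["patients"]]$SEX
```

With real data, the call would presumably look like lapply(raw_dfs, haven::as_factor, only_labelled = TRUE), since lapply passes extra arguments on to the function.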

Write multiple loaded variables into different .txt files

I need to write multiple variables (dataframes) to different .txt files, each named after its original variable. I tried using the ls() function to select the variables I want by pattern, but with no success. Is there another approach?
Using ls() function I was able to create .txt files with the correct filenames based on my variables (data1_tables.txt, data2_tables.txt, etc), but with the wrong output.
#create some variables based on mtcars data
data1 <- mtcars[1:5,]
data2 <- mtcars[6:10,]
data3 <- mtcars[11:20,]
fileNames=ls(pattern="data",all.names=TRUE)
for (i in fileNames) {
write.table(i,paste(i,"_tables.txt",sep=""),row.names = T,sep="\t",quote=F)
}
I want that the created files (data1_tables.txt, data2_tables.txt, data3_tables.txt) have the output from the original data1, data2, data3 variables.
What is happening is that you're actually writing the elements of the fileNames vector (which are just strings) to the files. If you want to write an object to a file with the write functions, you need to pass the object itself, not the name of the object.
#create some variables based on mtcars data
data1 <- mtcars[1:5,]
data2 <- mtcars[6:10,]
data3 <- mtcars[11:20,]
fileNames = ls(pattern="data", all.names=TRUE)
for(i in fileNames) {
write.table(x=get(i), # The get function gets an object with a given name.
file=paste0(i, "_tables.txt"), # paste0 is basically a paste with sep="" by default
row.names=T,
sep="\t",
quote=F)
}
Change the end of your code to:
for (i in fileNames) {
write.table(eval(as.name(i)),paste(i,"_tables.txt",sep=""),row.names = T,sep="\t",quote=F)
}
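Another option, sketched here under a few assumptions, is to collect the objects into a named list with mget() and loop over that list. The stricter pattern "^data[0-9]+$" and the temporary output directory tables_demo are choices made for this example, not part of the question.

```r
# Create some variables based on mtcars data.
data1 <- mtcars[1:5, ]
data2 <- mtcars[6:10, ]

# mget() returns a named list of the objects whose names are given.
# The anchored pattern matches only names like data1, data2.
tables <- mget(ls(pattern = "^data[0-9]+$"))

out_dir <- file.path(tempdir(), "tables_demo")
dir.create(out_dir, showWarnings = FALSE)

# Each list element is the data frame itself; its name gives the file name.
for (nm in names(tables)) {
  write.table(tables[[nm]], file.path(out_dir, paste0(nm, "_tables.txt")),
              row.names = TRUE, sep = "\t", quote = FALSE)
}
```

Anchoring the pattern avoids picking up unrelated objects whose names merely contain "data".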

Reading nodes from multiple html and storing result as a vector

I have a list of locally saved html files. I want to extract multiple nodes from each html file and save the results in vectors. Afterwards, I would like to combine them in a dataframe. I have a piece of code for one node, which works (see below), but it seems long and inefficient if I have to repeat it for ~20 variables. Also, something strange happens when saving to the vector (XXX_name): it starts with the last observation and then continues with the first, second, and so on. Do you have any suggestions for simplifying the code or making it more efficient?
# Extracts name variable and stores in a vector
XXX_name <- c()
for (i in 1:216) {
XXX_name <- c(XXX_name, name)
mydata <- read_html(files[i], encoding = "latin-1")
reads_name <- html_nodes(mydata, 'h1')
name <- html_text(reads_name)
#print(i)
#print(name)
}
Many thanks!
You can put the workings inside a function and then apply that function to each of your variables with map.
First, create the function:
library(rvest)
read_names <- function(var, node) {
  mydata <- read_html(files[var], encoding = "latin-1")
  reads_name <- html_nodes(mydata, node)
  # the last expression is the function's return value
  html_text(reads_name)
}
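Then we create a data frame with all possible combinations of inputs and apply the function to each row. Here vector_of_nodes stands for a character vector of the node selectors you want to extract, such as 'h1'.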
library(tidyverse)
inputs <- crossing(var = 1:216, node = vector_of_nodes)
output <- map2(inputs$var, inputs$node, read_names)
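As for the odd ordering in the question's loop: XXX_name is extended with name before name is recomputed for the current file, so the first iteration appends whatever name held from an earlier run and the final file is never appended. The sketch below shows the corrected order in base R. The helper extract_name and the file names are made up; the helper stands in for the read_html, html_nodes and html_text steps.

```r
# Hypothetical stand-in for read_html() + html_nodes() + html_text().
extract_name <- function(path) toupper(tools::file_path_sans_ext(basename(path)))

files <- c("alpha.html", "beta.html", "gamma.html")

# Compute first, then append: results stay in the same order as the files.
names_loop <- c()
for (i in seq_along(files)) {
  name <- extract_name(files[i])
  names_loop <- c(names_loop, name)
}

# vapply() does the same without growing a vector by hand.
names_vec <- vapply(files, extract_name, character(1), USE.NAMES = FALSE)
```

Using seq_along(files) instead of a hard-coded 1:216 also keeps the loop correct if the number of files changes.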
