I would like to tabulate how often a function is used in one or more R script files. I have found the function NCmisc::list.functions.in.file, and it is very close to what I want:
library(NCmisc)
library(stringr)
cat("median(iris$Sepal.Length)\n median(iris$Sepal.Width)\n library(stringr); str_length(iris$Species) \n", file = "script.R")
list.functions.in.file("script.R")
   package:base    package:stats  package:stringr
      "library"         "median"     "str_length"
Note that median is used twice in the script, but list.functions.in.file does not use this information, and only lists each unique function. Are there any packages out there that can produce such frequencies? And bonus for the ability to analyze a corpus of multiple R scripts, not just a single file.
(note this is NOT about counting function calls, e.g. in recursion, and I want to avoid executing the scripts)
That NCmisc function is just a wrapper around base::parse and utils::getParseData, so you can make your own function (and then you don't need the dependency on NCmisc):
count.function.names <- function(file) {
  # parse the file without evaluating it, then inspect the parse data
  data <- getParseData(parse(file = file))
  # SYMBOL_FUNCTION_CALL tokens are the names of called functions,
  # one entry per call site, so repeats are preserved
  function_names <- data$text[which(data$token == "SYMBOL_FUNCTION_CALL")]
  # tabulate into a named vector of frequencies
  occurrences <- data.frame(table(function_names))
  result <- occurrences$Freq
  names(result) <- occurrences$function_names
  result
}
Should do what you want...
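For the multi-file bonus: a minimal sketch (the directory path is a placeholder) that applies the counter above to every .R file in a folder and sums the per-file counts into corpus-wide frequencies:

# hypothetical directory of scripts; adjust the path and pattern to your corpus
scripts <- list.files("path/to/scripts", pattern = "\\.R$", full.names = TRUE)

# count per file, then sum the named count vectors across all files
per_file <- lapply(scripts, count.function.names)
combined <- unlist(per_file)
sort(tapply(combined, names(combined), sum), decreasing = TRUE)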
Does anyone know the best way to write a "for loop" that reads in different subject IDs and appends them to the name of an exported CSV?
As an example, I have multiple output files from an electrocardiogram software program (each file belongs to one individual). The files are named C800_HR.bdf.evt, C801_HR.bdf.evt, C802_HR.bdf.evt, etc. Each file gets read into R and then has a script applied to calculate heart rate variability. At the end of the script, I need to add a loop that extracts the subject ID (e.g., C800, C801, C802) and writes a new file for each individual, so that the output becomes C800_RtoR.csv. Essentially, I would like to avoid changing the syntax every time I read in and export a file.
I am currently using the following syntax to read in multiple files:
setwd("/Users/kmpc/Downloads")
myhrvdata <- lapply(Sys.glob("C8**_HR.bdf.evt"), read.delim)
Try this out:
cardio_files <- list.files(pattern = "C8\\d{2}_HR\\.bdf\\.evt")
subject_ids <- sub("^(C8\\d{2})_.*", "\\1", cardio_files)

myList <- lapply(cardio_files, read.delim)
names(myList) <- subject_ids

## do calculations on the list

for (i in names(myList)) {
  write.csv(myList[[i]], paste0(i, "_RtoR.csv"))
}
The only catch is that you have to work with a list when doing your calculations. You could combine the elements into a single data.frame, but it is easiest to keep them as a list for writing the files at the end.
Consider generalizing your process by creating a function that: 1) reads in a file, 2) processes the data, 3) outputs a csv. Then have lapply call the defined function iteratively across all Sys.glob items and even return a list of the calculated data frames.
proc_heart_rate <- function(f_name) {
  # READ IN .evt FILE INTO df
  df <- read.delim(f_name)

  # CALCULATE HEART RATE VARIABILITY WITH df
  ...

  # OUTPUT df TO CSV
  subject_id <- sub("_.*", "", f_name)
  write.csv(df, paste0(subject_id, "_RtoR.csv"))

  # RETURN df FOR OTHER USES
  return(df)
}
# LIST OF DATA FRAMES WITH CALCULATIONS
myhrvdata_list <- lapply(Sys.glob("C8**_HR.bdf.evt"), proc_heart_rate)
I have a large number of quite heavy datasets. I would like to extract a subset out of each of them and save it into a different csv file (one for each dataset). These are the commands I would like to loop over all the files I have in the folder:
df <- read.csv("1985.csv", header = FALSE, stringsAsFactors = TRUE, sep = "\t")
df_short <- df[df$V6 == "OPP", ]
write.csv(df_short, file = "OPP_1985.csv", row.names = FALSE)
rm(df)
rm(df_short)
This is probably a very noob question, but I am struggling to understand how to do it, so I would really appreciate some help with this!
EDIT:
Following @SimonShine's suggestion, I have run this code and it works!
You don't specify whether you are trying to collect the subsets into one dataset or to make one file per subset. You refer to OPP_1985, which appears to be out of scope for the code you wrote. Did you mean to refer to df_short?
You could start by abstracting what you want to do with one datafile into a function, e.g.:
extract_and_save_from_dataset <- function(csvfile) {
  df <- read.csv(csvfile, header = FALSE, stringsAsFactors = TRUE, sep = "\t")
  df_short <- df[df$V6 == "OPP", ]
  csvfile_short <- sub("\\.csv$", "_short.csv", csvfile)
  write.csv(df_short, file = csvfile_short, row.names = FALSE)
}
Assuming you have a collection of dataset filenames, you could apply this function multiple times:
# csvfiles <- c("OPP_1985.csv", "OPP_1986.csv", ...)
csvfiles <- list.files("/path/to/my/csvfiles", full.names = TRUE)
for (csvfile in csvfiles) {
  extract_and_save_from_dataset(csvfile)
}
The data.table approach is probably the fastest option, especially if you have a large dataset. The function fwrite{data.table} writes in parallel using many CPUs, making it extremely fast.
Here is how you can divide your original data according to subgroups defined based on the values of df$V6 and save each subset into a separate .csv file.
library(data.table)
setDT(df)[, fwrite(.SD, paste0("output_", V6, ".csv")), by = V6, .SDcols = names(df)]
P.S. The files will be named output_*.csv, where * is the corresponding V6 value.
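On the reading side, a sketch under the same assumptions (tab-separated input named as in the question): fread is data.table's fast reader and already returns a data.table, so no setDT is needed:

library(data.table)

# "1985.csv" is the example file from the question; columns come in as V1, V2, ...
df <- fread("1985.csv", header = FALSE, sep = "\t")
df[, fwrite(.SD, paste0("output_", V6, ".csv")), by = V6, .SDcols = names(df)]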
I'm sure this is very simple, but I'm new to doing my own programming in R and haven't quite gotten the hang of the syntax for looping.
I have code like this:
mydata1 <- read.table("ph001.txt", header=TRUE)
# ... series of formatting and merging steps
write.table(mydata4, "ph001_anno.txt", row.names=FALSE, quote=FALSE, sep="\t")
png("manhattan_ph001.png"); manhattan(mydata4); dev.off()
png("qq_ph001.png"); qq(mydata4$P); dev.off()
The input file ph001.txt is output from a linear regression algorithm, and from that file I need to produce ph001_anno.txt, manhattan_ph001.png, and qq_ph001.png. The latter two use the qqman package.
I have a folder that contains ph001 through ph138, and I would like a loop that reads these files individually and creates the corresponding output files for each one. As I said, I'm sure there is an easy way to do this with a loop, but the part that's tripping me up is modifying the output filenames.
You can use the stringr package to do a lot of the string manipulation you want in order to generate your file names, like so:
library(stringr)

f <- function(i) {
  num <- str_pad(i, 3, pad = "0")
  a <- str_c("ph", num, "_anno.txt")
  m <- str_c("manhattan_ph", num, ".png")
  q <- str_c("qq_ph", num, ".png")
  # Put code to do stuff with these file names here
}
sapply(1:138, f)
In the above block of code, for each number in 1:138 you create the name of three files. You can then use those file names in calls to read.table or ggsave or whatever you want.
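For instance, a minimal sketch of the whole loop body, assuming the input files follow the phNNN.txt pattern from the question and that manhattan and qq come from qqman (the formatting/merging steps are a placeholder you would replace with your own):

library(stringr)
library(qqman)

f <- function(i) {
  num <- str_pad(i, 3, pad = "0")
  mydata <- read.table(str_c("ph", num, ".txt"), header = TRUE)
  # ... your formatting and merging steps go here, producing mydata4
  mydata4 <- mydata  # placeholder only
  write.table(mydata4, str_c("ph", num, "_anno.txt"),
              row.names = FALSE, quote = FALSE, sep = "\t")
  png(str_c("manhattan_ph", num, ".png")); manhattan(mydata4); dev.off()
  png(str_c("qq_ph", num, ".png")); qq(mydata4$P); dev.off()
}

sapply(1:138, f)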
My question ties to the following problem:
Run external R script n times and save outputs in a data frame
The difference is that I don't generate different results with randomization functions, but would like to run the chunk of code each time with a different set of input variables (e.g. for a range of latitudes, lat = c(50, 60, 70, 80)).
Has anyone a hint for me?
Thanks a lot!
Wrap the script into a function by putting:
my_function <- function(latitude) {
at the top and
}
at the bottom.
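Concretely, a minimal sketch of what the wrapped script might look like (the body here is only a stand-in for whatever script.R actually computes):

my_function <- function(latitude) {
  # ... the original contents of script.R, now using `latitude` ...
  # placeholder computation so the sketch runs end to end
  data.frame(latitude = latitude, result = sin(latitude * pi / 180))
}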
That way, you can source it once and then use ldply from the plyr package:
results <- ldply(10 * 5:8, my_function)
If you wanted a column to identify which latitude was used, you could either add that to your function's data.frame or use:
results <- ldply(10 * 5:8, function(lat) data.frame(latitude = lat, my_function(lat)))
If for some reason you didn't want to modify your script, you could create a wrapper function:
my_wrapper <- function(a) {
  latitude <- a
  source("script.R", local = TRUE)$value
}
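A usage sketch, assuming the last expression in script.R evaluates to a data frame:

library(plyr)

# each call sources script.R with a different latitude in a local environment
results <- ldply(10 * 5:8, my_wrapper)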
or even use eval and parse:
my_function <- eval(parse(text = c("function(latitude) {",
readLines("script.R"), "}")))
Hi, I'm writing a function that reads a file and returns a time series. I then need to assign this time series to a variable. I'm trying to do this in a terse way and to utilise any functional programming features of R.
library(xts)  # for as.xts; xts also loads zoo for read.zoo and na.locf

# read a file and return its time series, with missing values carried forward
readFile <- function(fileName, filePath) {
  fullPath <- paste(filePath, fileName, sep = '')
  f <- as.xts(read.zoo(fullPath, format = '%d/%m/%Y',
                       FUN = as.Date, header = TRUE, sep = '\t'))
  return(na.locf(f))
}
filePath <- 'C://data/'

# real list of files is a lot longer
fnames <- c('d1.csv', 'd2.csv', 'd3.csv')
varnames <- c('data1', 'data2', 'data3')
In the above piece of code I would like to initialise variables with the names data1, data2, data3 by applying the readFile function over fnames with a constant filePath.
Something like:
lapply(fnames, readFile, filePath)
The above doesn't work, of course, and neither does it do the dynamic variable assignment that I'm trying to achieve. Any R functional programming gurus out there who could guide me?
The working version of this would look something like:
data1 <- readFile('d1.csv', filePath)
data2 <- readFile('d2.csv', filePath)
YIKES
Constructing many variables with specified names is a somewhat common request on SO and can certainly be managed with the assign function, but you'll probably find it easier to handle your data if you build a list instead of multiple variables. For instance, you could read in all the results and obtain a named list with:
lst <- setNames(lapply(fnames, readFile, filePath), varnames)
Then you could access your results from each csv file with lst[["data1"]], lst[["data2"]], and lst[["data3"]].
The benefit of this approach is that you can now perform an operation over all your time series variables using lapply(lst, ...) instead of looping through all your variables or looping through variable names and using the get function.
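For example, a small sketch assuming each element is an xts series as returned by readFile above:

# one operation applied across every series in the list
series_means <- lapply(lst, function(x) mean(x, na.rm = TRUE))

# if standalone variables are truly needed, assign can still create them,
# though the list is usually easier to work with
for (v in varnames) assign(v, lst[[v]])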