Unable to program a loop for regression analysis in R

I have a problem with my R code. I would like to run about 100 regressions using a loop. I have tried to write the loop myself with help from YouTube tutorials and similar resources, but I am getting nowhere, so I would like to ask for your help.
Specifically:
I have a dataset of the 100 companies in the Nasdaq-100, and I would like to regress sales per share on stock price performance on a quarterly basis. Another problem is that the dataset contains all 100 companies, so a subset for each ticker symbol has to be created before R can run the regression for that company.
Here is an excerpt from the code:
library(readxl)
Nasdaq_100 <- read_xlsx("Nasdaq_100_Sales_Data.xlsx")
# Regression of AMD's quarterly sales on its quarterly close price
AMD <- subset(Nasdaq_100, TickerSymbol == "AMD")
AMD_regression <- lm(Sales ~ Stockprice_quarterly, data = AMD)
summary(AMD_regression)
Can you help me to program this loop for regression analysis?
Thanks in advance for any help!

To convert this to a for loop, first get a list of the .xlsx files in your working directory:
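library(readxl)
library(data.table)
myfiles <- list.files(pattern = "\\.xlsx$")
Then loop through each file, saving the results, with minor modifications to your existing code. Note that fread() cannot read .xlsx files and fwrite() needs a data frame rather than an lm object, so each file is read with read_xlsx() and the coefficient table is written out instead:
for (file in myfiles) {
  Nasdaq_100 <- read_xlsx(file)
  AMD <- subset(Nasdaq_100, TickerSymbol == "AMD")
  AMD_regression <- lm(Sales ~ Stockprice_quarterly, data = AMD)
  print(summary(AMD_regression))  # inside a loop, results are only shown via print()
  # Coefficient table (estimate, std. error, t value, p value) as a data frame
  coefs <- as.data.frame(coef(summary(AMD_regression)))
  coefs$term <- rownames(coefs)
  fwrite(coefs, file = paste0("output_", sub("\\.xlsx$", ".txt", file)),
         quote = FALSE, sep = "\t")
}
Copy and paste this into R and let me know if it works.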

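library(readxl)
# Assumes each company is on its own sheet (100 sheets) of one workbook
file <- choose.files()  # Windows only; use file.choose() on other systems
lmset <- data.frame(intercept = numeric(0), slope = numeric(0))
for (i in seq_len(100)) {
  data <- read_excel(file, sheet = i)
  coefs <- coef(lm(Sales ~ Stockprice_quarterly, data = data))
  lmset <- rbind(lmset, data.frame(intercept = coefs[[1]], slope = coefs[[2]]))
}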
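Since the question's data are one table holding all 100 companies, another option is to let split() build the per-ticker subsets and lapply() fit one model per subset, instead of writing 100 subset() calls by hand. A minimal sketch, assuming the columns TickerSymbol, Sales and Stockprice_quarterly from the question; the toy data frame is invented so the example runs on its own (in practice it would come from read_xlsx()).

```r
# Minimal sketch: one lm() per ticker from a single data frame.
# Column names follow the question; the toy data are invented for illustration.
set.seed(1)
Nasdaq_100 <- data.frame(
  TickerSymbol = rep(c("AMD", "AAPL", "MSFT"), each = 8),
  Stockprice_quarterly = runif(24, 50, 150)
)
Nasdaq_100$Sales <- 2 + 0.05 * Nasdaq_100$Stockprice_quarterly + rnorm(24, sd = 0.5)

# split() returns a named list of data frames, one per ticker symbol
by_ticker <- split(Nasdaq_100, Nasdaq_100$TickerSymbol)

# Fit Sales ~ Stockprice_quarterly within each company's rows
models <- lapply(by_ticker, function(d) lm(Sales ~ Stockprice_quarterly, data = d))

# Collect intercept, slope and R-squared into one table (one row per ticker)
results <- do.call(rbind, lapply(names(models), function(tk) {
  fit <- models[[tk]]
  data.frame(
    TickerSymbol = tk,
    intercept = unname(coef(fit)[1]),
    slope = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared
  )
}))
print(results)

# The full summary for a single company is still available by name
summary(models[["AMD"]])
```

Keeping the models in a named list means nothing is overwritten between iterations, and the results table can be written out once with write.csv() at the end.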

Related

How do I summarise data from multiple files into batches of multiple files?

The file names in my lab contain the monkey ID, system number, date, and the task the data are for, and each file has a header row. We would like to check the monkeys' progress daily, so we're interested in automating the daily summaries as much as possible using R. The way we are doing it in Excel is clunky and takes too much time. For a daily summary, we need to know which task(s) they worked on, how many trials they did on each task, and their percentage correct on each task.
I have written a script that produces the daily summaries, but right now it only runs on one monkey and one data file. Ideally, I could point the script at a folder of data files from different monkeys and generate daily summaries for all of them. Here are some examples of the summaries I have generated:
7-255 Summary table:
n_trials correct incorrect task_name
7 42.85714 57.14286 SHAPE2
H033 Summary table:
n_trials correct incorrect task_name
177 44.0678 55.9322 MTSseq
Here is my script:
library(tidyverse)
#library(readr)
file <- "/Users/siddharthsatishchandran/R project/R project status 1/Data/H033.csv"
data_trials <- read_csv(file)
head(data_trials)
summary(data_trials)
n_trials <- length(data_trials$trial)
correct <- mean(data_trials$correct)*100
incorrect <- 100 - correct
df_trials_correct <- data.frame(n_trials = n_trials,
correct = correct,
incorrect = incorrect,
task_name = unique(data_trials$task_name))
df_trials_correct
This might be what you are looking for:
library(tidyverse)
path <- "/Users/siddharthsatishchandran/R project/R project status 1/Data"
file_list <- list.files(
  path,
  pattern = "\\.csv$"
)
summary_tables <- list()
for (file in file_list) {
  data_trials <- read_csv(file.path(path, file))
  # compute 'correct' first so 'incorrect' can refer to it
  correct <- mean(data_trials$correct) * 100
  summary_tables[[file]] <- data.frame(
    n_trials = length(data_trials$trial),
    correct = correct,
    incorrect = 100 - correct,
    task_name = unique(data_trials$task_name)
  )
}
Now you get a list of data.frames, each containing your desired information.
This could be "flattened" into a single data.frame using bind_rows:
bind_rows(summary_tables, .id = "monkey_id")
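The loop above puts one set of percentages per file. If a file contains trials from more than one task, unique(task_name) has several values and data.frame() repeats the same overall numbers on every task row. A minimal base-R sketch that gives one row per task instead; the helper name summarise_by_task() is hypothetical, the columns trial, correct (assumed coded 0/1) and task_name follow the question, and the toy data stand in for one monkey's CSV.

```r
# Minimal sketch (base R): one summary row per task for a single file.
summarise_by_task <- function(data_trials) {
  per_task <- split(data_trials, data_trials$task_name)
  do.call(rbind, lapply(names(per_task), function(task) {
    d <- per_task[[task]]
    pct <- mean(d$correct) * 100   # 'correct' assumed to be coded 0/1
    data.frame(n_trials = nrow(d), correct = pct,
               incorrect = 100 - pct, task_name = task)
  }))
}

# Invented stand-in for one monkey's data file
data_trials <- data.frame(
  trial = 1:6,
  correct = c(1, 0, 1, 1, 0, 0),
  task_name = c("SHAPE2", "SHAPE2", "SHAPE2", "MTSseq", "MTSseq", "MTSseq")
)
summary_table <- summarise_by_task(data_trials)
print(summary_table)

# Inside the answer's loop, the data.frame(...) call could become:
# summary_tables[[tools::file_path_sans_ext(file)]] <- summarise_by_task(data_trials)
```

Using tools::file_path_sans_ext(file) as the list name means bind_rows(..., .id = "monkey_id") labels rows "H033" rather than "H033.csv".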

How to get averages of values from multiple data sets written to one output file

So, I've used R for a few years, but I still find thinking through coding problems pretty hard, so I'd really appreciate explanations that assume as little as possible!
Basically, I have lots of data files that correspond to different dates. I would like to write a loop that reads in each day's data file, runs an analysis (e.g. the mean of one of the columns), and writes the output to a separate file labelled with the date/name of the file. (The date isn't currently part of the data file, so I haven't figured out how to include it in the code yet.)
To complicate things, I need to pull out subsets from the data file to analyse separately. I've already figured out how to do this and get the separate means; I just don't know how to incorporate the loop.
#separating LINK (SL) satellites from entire list
SL<- data[grepl("^LINK", data$name), ]
#separating non-SL sat. from entire list
nonSL<- data[!grepl("^LINK", data$name), ]
analyse<- function(filenames){
#mean mag. for satellites in data frame
meansat<- print(mean(data[,2]))
#mean mag. for LINK sat. in data frame
meanSLsat<- print(mean(SL[,2]))
#mean mag. non-SL sat. in data frame
meannonSLsat<- print(mean(nonSL[,2]))
means<-c(meansat, meanSLsat, meannonSLsat)
}
#looping in data files
filenames<- list.files(path = "Data")
for (f in filenames) {
print(f)
allmeans<-analyse(f)
}
write.table(allmeans, file = "outputloop.txt", col.names = "Mean Magnitude", quote = FALSE, row.names = FALSE)
This is what I have so far, but it's not working and I don't understand why. There are feeble attempts at a loop, but I have no idea where, or in what order, to put the loop when I then need to separate out the subclasses, so any help would be really appreciated! Thank you in advance!
Try:
for (f in filenames) {
  # note: analyse() must read the file itself, e.g. data <- read.csv(file.path("Data", f))
  allmeans <- analyse(f)
  # Build a different output file name for each input, e.g. "day1_output.txt"
  file_out <- paste0(tools::file_path_sans_ext(f), "_output.txt")
  write.table(allmeans, file = file_out, col.names = "Mean Magnitude", quote = FALSE, row.names = FALSE)
}
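In the question's code, analyse() never reads the file it is given, SL and nonSL are computed from a data object outside the loop, and allmeans is overwritten on every iteration. A minimal sketch of the whole flow, assuming each daily file is a CSV with a name column and the magnitudes in column 2 (as data[,2] in the question suggests), and that the file name (here a date) should label the output. The demo folder, file names and values are invented.

```r
# Minimal sketch: one output file per daily input file, plus a combined table.
# Assumed input: CSV files with a 'name' column and magnitudes in column 2.
analyse <- function(path) {
  data <- read.csv(path)                   # read this day's file inside the function
  is_link <- grepl("^LINK", data$name)     # TRUE for LINK (SL) satellites
  c(all = mean(data[, 2]),
    LINK = mean(data[is_link, 2]),
    nonLINK = mean(data[!is_link, 2]))
}

# Demo: two small invented daily files in a temporary "Data" folder
data_dir <- file.path(tempdir(), "Data")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(name = c("LINK-1", "LINK-2", "SAT-A"), mag = c(5, 7, 3)),
          file.path(data_dir, "2021-01-01.csv"), row.names = FALSE)
write.csv(data.frame(name = c("LINK-3", "SAT-B", "SAT-C"), mag = c(6, 2, 4)),
          file.path(data_dir, "2021-01-02.csv"), row.names = FALSE)

filenames <- list.files(path = data_dir, pattern = "\\.csv$", full.names = TRUE)
allmeans <- list()
for (f in filenames) {
  means <- analyse(f)
  day <- tools::file_path_sans_ext(basename(f))   # e.g. "2021-01-01"
  allmeans[[day]] <- means                        # keep every day, not just the last
  out <- data.frame(group = names(means), mean_magnitude = unname(means))
  write.table(out, file = file.path(data_dir, paste0(day, "_output.txt")),
              quote = FALSE, row.names = FALSE, sep = "\t")
}
summary_all <- do.call(rbind, allmeans)   # matrix: one row per day
print(summary_all)
```

Doing the subsetting inside analyse() is what ties it to each file: the function receives a path, reads it, and only then splits LINK from non-LINK rows.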

R: use single file while running a for loop on list of files

I am trying to create a loop where I select one file name from a list of file names and use that one file to run read.capthist, then discretize, fit, and derived, and save the outputs using save. The list contains 10 files with identical rows and columns; the only difference between them is the geographical coordinates in each row.
The issue I am running into is that capt needs to be a single file (in the secr package they are 'captfile' types), but I don't know how to select a single file from this list and get my loop to recognize it as a single entity.
This is the error I get when I try to select only one file:
Error in read.capthist(female[[i]], simtraps, fmt = "XY", detector = "polygon") :
requires single 'captfile'
I am not a programmer by training, I've learned R on my own and used stack overflow a lot for solving my issues, but I haven't been able to figure this out. Here is the code I've come up with so far:
library(secr)
setwd("./")
files = list.files(pattern = "female*")
lst <- vector("list", length(files))
names(lst) <- files
for (i in 1:length(lst)) {
capt <- lst[i]
femsimCH <- read.capthist(capt, simtraps, fmt = 'XY', detector = "polygon")
femsimdiscCH <- discretize(femsimCH, spacing = 2500, outputdetector = 'proximity')
fit <- secr.fit(femsimdiscCH, buffer = 15000, detectfn = 'HEX', method = 'BFGS', trace = FALSE, CL = TRUE)
save(fit, file="C:/temp/fit.Rdata")
D.fit <- derived(fit)
save(D.fit, file="C:/temp/D.fit.Rdata")
}
simtraps is a list of coordinates.
Ideally, I would also like my outputs to have unique identifiers: since I am simulating data and will have to compare all the results, I don't want each iteration to overwrite the previous output.
I know I can use this code by bringing in each file and running this separately (this code works for non-simulation runs of a couple data sets), but as I'm hoping to run 100 simulations, this would be laborious and prone to mistakes.
Any tips would be greatly appreciated for an R novice!
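One likely cause of the error: lst is created empty by vector("list", ...), so lst[i] is a one-element list holding NULL rather than a file name, while read.capthist() expects a single file name. Looping over the character vector files itself (files[i] or files[[i]]) passes one name per iteration. Note also that pattern = "female*" is a regular expression, not a wildcard; "^female" is probably what was meant. A minimal sketch, assuming the secr calls and arguments are exactly those in the question; the helper names output_path() and fit_one_file() and the example file names are hypothetical, and only the indexing demo and output_path() run without secr installed.

```r
# Why the error appears: lst[1] is a one-element list holding NULL,
# while files[1] is a single character string (a file name).
files <- c("female1.txt", "female2.txt")   # hypothetical file names
lst <- vector("list", length(files))
names(lst) <- files
print(class(lst[1]))     # "list"
print(class(files[1]))   # "character"

# Build a unique output path per input file, e.g. "C:/temp/fit_female1.Rdata"
output_path <- function(capt, prefix, out_dir) {
  stem <- tools::file_path_sans_ext(basename(capt))
  file.path(out_dir, paste0(prefix, "_", stem, ".Rdata"))
}

# Fit one capture file and save both outputs under file-specific names.
# The secr calls mirror the question's code.
fit_one_file <- function(capt, simtraps, out_dir = "C:/temp") {
  femsimCH <- secr::read.capthist(capt, simtraps, fmt = "XY", detector = "polygon")
  femsimdiscCH <- secr::discretize(femsimCH, spacing = 2500, outputdetector = "proximity")
  fit <- secr::secr.fit(femsimdiscCH, buffer = 15000, detectfn = "HEX",
                        method = "BFGS", trace = FALSE, CL = TRUE)
  save(fit, file = output_path(capt, "fit", out_dir))
  D.fit <- secr::derived(fit)
  save(D.fit, file = output_path(capt, "D.fit", out_dir))
  invisible(D.fit)
}

# Usage (not run here; needs secr, simtraps and the real files):
# files <- list.files(pattern = "^female")
# results <- lapply(files, fit_one_file, simtraps = simtraps)
# names(results) <- files
```

Returning D.fit from each call also keeps all results in memory as a named list, which may be easier to compare than reloading 100 .Rdata files.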

Data transpose function in R not working properly

I am using R for some work, but I'm having difficulty transposing data.
My data are arranged in rows, and the columns are different variables. The author of the function phyDat indicates that the data should be transposed, because imported data are stored in columns.
So I use the following code to finish this process:
#read file from local disk in csv format. this format can be generated by save as function of excel.
origin <- read.csv(file.choose(),header = TRUE, row.names = 1)
origin <- t(origin)
events <- phyDat(origin, type="USER", levels=c(0,1))
When I check the data in RStudio, it is transposed, but the result is not. So I went back and modified the code as follows:
origin <- read.csv(file.choose(),header = TRUE, row.names = 1)
events <- phyDat(origin, type="USER", levels=c(0,1))
This time the data does not reflect transposed data, and the result is consistent with it.
Currently I work around the problem by transposing the data in the CSV file before importing it into R. Is there something I can do to fix this problem?
I had the same problem and I solved it by doing an extra step as follows:
#read file from local disk in csv format. this format can be generated by save as function of excel.
origin <- read.csv(file.choose(),header = TRUE, row.names = 1)
origin <- as.data.frame(t(origin))
events <- phyDat(origin, type="USER", levels=c(0,1))
Maybe it is too late but hope it could help other users with the same problem.
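A short illustration of why the extra as.data.frame() step matters: t() applied to a data frame returns a matrix, not a data frame. If I read phangorn's phyDat() correctly, it treats the rows of a matrix as taxa but the columns of a data frame as taxa, so the class of the object, not only its layout, decides which orientation phyDat() uses (worth checking against the phyDat() documentation). The toy data below are invented, and phangorn is not needed to run the example.

```r
# Toy presence/absence data: 3 taxa (rows) x 2 characters (columns), invented
origin <- data.frame(A = c(0, 1, 1), B = c(1, 0, 1),
                     row.names = c("s1", "s2", "s3"))

t_origin <- t(origin)
print(class(t_origin))   # t() on a data frame returns a matrix

t_df <- as.data.frame(t(origin))
print(class(t_df))       # "data.frame"
print(dim(t_df))         # 2 rows (characters) x 3 columns (taxa)
print(names(t_df))       # the former row names are now column names
```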

read, manipulate and export multiple .dta Files using a for Loop in R

I have multiple time series (each in a separate file), which I need to seasonally adjust using the seasonal package in R, storing each adjusted series in a separate file in a different directory.
The code works for a single county.
So I tried to use a for loop, but R is unable to use read.dta with a wildcard.
I'm new to R and usually use Stata, so the question may be quite basic and my code quite messy.
Sorry, and thanks in advance!
Nathan
for(i in 1:402)
{
alo[i] <- read.dta("/Users/nathanrhauke/Desktop/MA_NH/Data/ALO/SEASONAL_ADJUSTMENT/SINGLE_SERIES/County[i]")
alo_ts[i] <-ts(alo[i], freq = 12, start = 2007)
m[i] <- seas(alo_ts[i])
original[i]<-as.data.frame(original(m[i]))
adjusted[i]<-as.data.frame(final(m[i]))
trend[i]<-as.data.frame(trend(m[i]))
irregular[i]<-as.data.frame(irregular(m[i]))
County[i] <- data.frame(cbind(adjusted[i],original[i],trend[i],irregular[i], deparse.level =1))
write.dta(County[i], "/Users/nathanrhauke/Desktop/MA_NH/Data/ALO/SEASONAL_ADJUSTMENT/ADJUSTED_SERIES/County[i].dta")
}
This is a good place to use a function and the *apply family. As noted in a comment, your main problem is likely to be that you're using Stata-like character string construction that will not work in R. You need to use paste (or paste0, as here) rather than just passing the indexing variable directly in the string like in Stata. Here's some code:
library(foreign)   # read.dta(), write.dta()
library(seasonal)  # seas() and its extractor functions
f <- function(i) {
d <- read.dta(paste0("/Users/nathanrhauke/Desktop/MA_NH/Data/ALO/SEASONAL_ADJUSTMENT/SINGLE_SERIES/County",i,".dta"))
alo_ts <- ts(d, freq = 12, start = 2007)
m <- seas(alo_ts)
original <- as.data.frame(original(m))
adjusted <- as.data.frame(final(m))
trend <- as.data.frame(trend(m))
irregular <- as.data.frame(irregular(m))
County <- cbind(adjusted,original,trend,irregular, deparse.level = 1)
write.dta(County, paste0("/Users/nathanrhauke/Desktop/MA_NH/Data/ALO/SEASONAL_ADJUSTMENT/ADJUSTED_SERIES/County",i,".dta"))
invisible(County)
}
# return a list of all of the resulting datasets
lapply(1:402, f)
It would probably also be a good idea to take advantage of relative directories by first setting your working directory:
setwd("/Users/nathanrhauke/Desktop/MA_NH/Data/ALO/SEASONAL_ADJUSTMENT/")
Then you can simplify the above paths to:
d <- read.dta(paste0("./SINGLE_SERIES/County",i,".dta"))
and
write.dta(County, paste0("./ADJUSTED_SERIES/County",i,".dta"))
which will make your code more readable and reproducible should, for example, someone ever run it on another computer.
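The string-construction point from the answer, in isolation: R never substitutes a variable inside a quoted string, so "County[i]" stays literal text, while paste0() builds the intended name. The sketch also shows an alternative to assuming files 1 to 402 exist: list the matching files with list.files() and a regular expression, then loop over those. The temporary folder and the demo file names are invented.

```r
i <- 7
literal <- "SINGLE_SERIES/County[i].dta"             # R does not substitute i inside a string
built   <- paste0("SINGLE_SERIES/County", i, ".dta")  # "SINGLE_SERIES/County7.dta"

# Vectorised: all 402 paths at once
all_paths <- file.path("SINGLE_SERIES", paste0("County", 1:402, ".dta"))

# Alternative: loop over the files that actually exist (demo in a temp folder)
demo_dir <- file.path(tempdir(), "SINGLE_SERIES")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("County1.dta", "County2.dta", "notes.txt"))))
found <- list.files(demo_dir, pattern = "^County[0-9]+\\.dta$", full.names = TRUE)
print(basename(found))
```

With the real folder, lapply(found, ...) would then take the file paths directly instead of numbers.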
