I'm writing a function that uses the Pushshift API to get data from subreddits, and iterates over the function to get more than the maximum amount of posts. So far it works and prints to the screen, but won't save the data frame to the environment. What am I doing wrong here? Bit of an R newbie so any help and explanations would be great!
Here is my code:
get_data_loop <- function(after, subreddit, iterations = c(1:2)) {
loopeddata <- as.data.frame(NULL)
for(i in iterations) {
after+1
data <- as.data.frame(fromJSON(paste("https://api.pushshift.io/reddit/search/comment/?after=", after, "d&subreddit=", subreddit, "&size=100&fields=body,author", sep = "")))
loopeddata <- rbind(data, loopeddata)
#Sleep for API
Sys.sleep(5)
}
print(i)
view(loopeddata)
}
You have to return the variable you are interested in from the function. In the last line of your function use return(loopeddata) and then run the function as
loopeddata <- get_data_loop(..your..options..)
Alternatively, after the iterations, you can assign that variable from inside the function to a global variable. The last line of the function could then be the following, though I prefer the former solution, which is cleaner:
assign("loopeddata", loopeddata, .GlobalEnv)
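A minimal sketch of the first approach, assuming a hypothetical fetch_page() helper that stands in for the Pushshift request so the example runs offline. The function builds the data frame in the loop and hands it back as its last value; the caller decides what name to store it under.

```r
# Hypothetical stand-in for one API call: returns a one-row data frame
fetch_page <- function(page) {
  data.frame(page = page, body = paste("comment", page),
             stringsAsFactors = FALSE)
}

get_pages <- function(iterations = 1:3) {
  collected <- data.frame()
  for (i in iterations) {
    # Re-assign on every pass so the rows accumulate
    collected <- rbind(collected, fetch_page(i))
  }
  collected  # the last value is what the caller receives
}

# The result only exists in the workspace because it is assigned here
pages <- get_pages(1:3)
nrow(pages)
```

Separately, a bare after+1 inside a loop computes a value and discards it; after <- after + 1 would be needed if the offset is meant to advance between requests.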
I have written a function that should make my script more streamlined, however I am having trouble with assigning variables or calling variables. In this case I want tax_count to be as if Domain_count was written outside the function. I have tried the solution by R: How can a function assign a value to a variable that will persist outside the function? but with no luck.
Here is my function and calling the function:
taxonomic_count <- function(tax_count, tax, col_num, data) {
tax_count <- aggregate(.~tax, data, sum)
tax_count$count <- rowSums(select(tax_count, col_num:7))
tax_count <- tax_count %>%
select(tax, count)
}
taxonomic_count(Domain_count, Domain, 2, Data1)
Help would be awesome!
In R I am attempting to create a function that will do multiple other functions with just one command input.
I have tried
data_answer<-function(){
data<-read.csv("data.csv",stringsAsFactors=TRUE)
summary(data)
Cor(data[c("x","y","z","h")])
pairs(data[c("x","y","z","h")])
data_train<-data[1,1000]
data_test<-data[1001,1500]
data_model<-lm(h~x+y+z,data=data)
data_pred<-predict(data_model,data_test)
}
All that results is a big multiple scatterplot window.
So my question is:
How do I write a function that runs all the above commands, shows the results of each, and uses an outside data set as the parameter? Then I can just enter data_answer("_____") in R and it will run all the functions on that data frame.
First of all, make your function a function of file name:
data_answer<-function(filename){
data<-read.csv(filename,stringsAsFactors=TRUE)
Then either explicitly print the result of every call whose value you want to see (inside a function, results are not auto-printed the way they are at the console), or return a list of all the results. For example:
f = function(x){
print(mean(x))
print(sd(x))
}
f(runif(100))
will print the mean and sd of 100 random numbers, and:
f2 = function(x){
result = list(mean=mean(x), sd=sd(x))
return(result)
}
f2(runif(100))
will return a list of the mean and sd which will then get printed. You can store it and print it later, or access the values:
zz = f2(runif(100))
zz
zz$mean
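Applied to the question, the list approach might look like the sketch below. Several things here are assumptions rather than the asker's code: the function takes a data frame instead of a file name so the example is self-contained, the toy data and the 70/30 row split are made up, the model is fitted on the training rows only, and rows are selected with data[1:70, ] (note that data[1,1000] picks a single cell, not a range of rows).

```r
# Sketch: run every step and return all results in one named list
data_answer <- function(data) {
  vars <- c("x", "y", "z", "h")
  data_train <- data[1:70, ]    # illustrative split sizes
  data_test  <- data[71:100, ]
  data_model <- lm(h ~ x + y + z, data = data_train)
  list(
    summary = summary(data),
    cor     = cor(data[vars]),
    model   = data_model,
    pred    = predict(data_model, data_test)
  )
}

# Made-up data so the sketch runs without data.csv
set.seed(1)
toy <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
toy$h <- toy$x + 2 * toy$y - toy$z + rnorm(100)

res <- data_answer(toy)
res$cor
head(res$pred)
```

With a file, the call would become data_answer(read.csv(filename, stringsAsFactors = TRUE)). A plot such as pairs() is drawn as a side effect, so it can simply be called inside the function before the list is returned.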
First let me say that I am not an expert coder and any advice about this particular question or my general technique will be greatly appreciated.
I have a large data set that is made up of similar data frames named Table6.#, such as Table6.1, Table6.2, etc. Each data frame also has variables that repeat, such as ST1_Delta_PV%, ST2_Delta_PV%, etc. and ST1_Reallocation_Margin, ST2_Reallocation_Margin, etc.
I am trying to write several nested loops that will calculate values in each table across these similar variables. I have tried to do this with the paste function as shown below, but this is obviously not the correct way to do this.
for (i in 1:25){
for (j in 1:4){
for (k in 1:length(paste("Table6.",i,"sep="")[,1]){
paste("Table6.",i,sep="")$paste("ST",j,"NonTgt_Shr",sep="")[k] <- paste("Table6.",i,sep="")$paste("ST",j,"_Delta_PV%",sep="")[k] * paste("Table6.",i,sep="")$paste("ST",j,"_Reallocation_Margin",sep="")[k]
}
}
}
I apologize if this is a complete mess. I appreciate your help.
As akrun says, you should put your data frames in a list
Tables <- list(Table6.1, Table6.2, …)
for (Table in Tables) { … }
This way, you do not need to use paste to construct the different Table names.
For accessing the different columns, you can use the df[["column"]] syntax. This is equivalent to df$column, except that inside the brackets you can use any string, including one built with paste0:
nonTgt_Shr.column.name <- paste0("ST",j,"NonTgt_Shr")
delta.column.name <- paste0("ST",j,"_Delta_PV%")
for (k in 1:nrow(Table)) {
Table[[nonTgt_Shr.column.name]][k] <- Table[[delta.column.name]][k] * …
}
Note how I use variables for storing the name, making the line with the actual computation much more readable.
Also, nrow is more intuitive than length(Table[,1]).
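Putting these pieces together, here is a minimal sketch with two made-up tables and two ST prefixes (the values and the make_table() helper are hypothetical). Two details differ from the loop above: the multiplication is done on whole columns, so no loop over rows k is needed, and lapply is used because a data frame modified inside for (Table in Tables) is only a copy whose changes are lost.

```r
# Toy stand-ins for Table6.1 and Table6.2 (values are made up)
make_table <- function() {
  data.frame(
    "ST1_Delta_PV%" = c(0.1, 0.2), "ST1_Reallocation_Margin" = c(10, 20),
    "ST2_Delta_PV%" = c(0.3, 0.4), "ST2_Reallocation_Margin" = c(30, 40),
    check.names = FALSE  # keep the % in the column names
  )
}
Tables <- list(Table6.1 = make_table(), Table6.2 = make_table())

# lapply returns the modified copies, which replace the original list
Tables <- lapply(Tables, function(Table) {
  for (j in 1:2) {
    delta  <- paste0("ST", j, "_Delta_PV%")
    margin <- paste0("ST", j, "_Reallocation_Margin")
    out    <- paste0("ST", j, "NonTgt_Shr")
    Table[[out]] <- Table[[delta]] * Table[[margin]]  # whole column at once
  }
  Table
})

Tables$Table6.1$ST1NonTgt_Shr
```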
The calculations could be moved into a function, which improves readability, scalability, and robustness.
In the actual calculation function, the function get is used to retrieve the data frame based on the name.
#Calculation Function
fn_CalcVariables <- function(
tableName="Table6.1",
outputVarName="NonTgt_Shr",
inputVarNames=c("_Delta_PV%", "_Reallocation_Margin"),
variablePrefix="ST1"
) {
DF <- get(tableName)
outputVarName <- paste0(variablePrefix, outputVarName)
inputVarNames <- paste0(variablePrefix, inputVarNames)
DF[,outputVarName] <- DF[,inputVarNames[1]] * DF[,inputVarNames[2]]
return(DF)
}
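The get call above looks up a single object by its name. As a related sketch (the two toy tables are hypothetical), base R's mget does the same for a whole vector of names and returns a named list, which is one way to collect the existing data frames into a list without typing each name:

```r
# Toy objects standing in for the real tables
Table6.1 <- data.frame(a = 1:2)
Table6.2 <- data.frame(a = 3:4)

tableNames <- paste0("Table6.", 1:2)
Tables <- mget(tableNames)  # named list, one element per object found

names(Tables)
Tables[["Table6.2"]]$a
```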
This function should be called by nested lapply calls.
lapply iterates over the lists of the arguments, calls the function (second argument), and collects a list of the return values.
(As an exercise, try l <- list(a=1, b=2); lapply(l, function(x) { x*2 }).)
#List object names for tables and variable names
tableNamesList <- paste0("Table6.",1:25)
variablePrefixList <- paste0("ST",1:4)
#Nested loops to invoke custom function from above
results <- lapply(variablePrefixList, function(alpha) {
lapply(tableNamesList, function(x, varprefix=alpha) {
cat("Begin Processing Table",x,"varPrefix",varprefix,"\n")
DF <- fn_CalcVariables(
tableName=x,
outputVarName="NonTgt_Shr",
inputVarNames=c("_Delta_PV%","_Reallocation_Margin"),
variablePrefix=varprefix
)
cat("End Processing Table", x, "varPrefix", varprefix, "\n")
DF #Return the data frame; otherwise lapply collects the NULL returned by cat
}) #End of inner lapply
}) #End of outer lapply
Hi everyone.
I am programming a simulation app in R Shiny and I am stuck on a for loop.
Basically, in a reactive I am calling a function that loops through a couple of other functions, like this:
In the server.R:
output.addiction <- reactive ({
SimulateMultiple(input$no.simulations, vectors(), parameters(), input$S.plus, input$q,
input$weeks, input$d, list.output)
})
The function:
SimulateMultiple <- function (no.simulations, vectors, parameters, S.plus, q, weeks, d, list.output) {
for (i in 1:no.simulations) {
thisi <- i
simulation <- SimulateAddictionComponents(vectors, parameters, S.plus, q, weeks, d) # returns list "simulation"
df.output <- BuildOutputDataframe(weeks, simulation, vectors) # returns "df.output"
output.addiction <-BuildOutputList(df.output, simulation, list.output) # returns "output.addiction"
}
return(output.addiction)
}
And, again, the last function, which creates the output list:
BuildOutputList <- function (df.output, simulation, list.output) {
addiction <- simulation$addiction
output.w.success <- list(df.output, addiction) # includes success data
output.addition <- c(list.output, list(output.w.success)) # adds the new data to the list
return(output.addition)
}
I read about the issue online a lot, I tried to isolate some stuff, to introduce a local({}) etc. But it never works. In the end, I get a list of length 1.
I would be forever grateful, if you could help me - I have been on this for two days now.
The problem solved itself when I edited the code in the function from
output.addition <- c(list.output, list(output.w.success)) # adds the new data to the list
return(output.addition)
to
list.output <- c(list.output, list(output.w.success)) # adds the new data to the list
return(list.output)
so as to not overwrite the object every time in the loop. After all - very easy and stupid problem, but hard to spot.
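For the list to grow, the value returned by the helper also has to be assigned back to the same object that is passed in on the next iteration; if the caller stores it under a different name, every pass starts again from the original list. A minimal sketch of that pattern, with hypothetical stand-ins (build_output_list, simulate_multiple) for the real simulation functions:

```r
# Hypothetical stand-in for BuildOutputList: appends one result to the list
build_output_list <- function(result, list.output) {
  c(list.output, list(result))
}

simulate_multiple <- function(no.simulations) {
  list.output <- list()
  for (i in seq_len(no.simulations)) {
    result <- list(run = i, value = i^2)  # stand-in for one simulation
    # Assign back to the name that is passed in on the next iteration
    list.output <- build_output_list(result, list.output)
  }
  list.output
}

out <- simulate_multiple(3)
length(out)
```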
So, I built a function called sort.song.
My goal with this function is to randomly sample the rows of a data.frame (DATA) and then filter it (DATA.NEW) for analysis. I want to do this multiple times (let's say 10 times). In the end, I want each object (mantel.something) produced by this function to be saved in my workspace with a name that I can relate to each cycle (mantel.something1, mantel.something2, ..., mantel.something10).
I have the following code, so far:
sort.song<-function(DATA){
require(ade4)
for(i in 1:10){ # Am I using for correctly here?
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantel.numnotes[i]<<-mantel.rtest(coord.dist,num.notes.dist,nrepet=1000)
mantel.songdur[i]<<-mantel.rtest(coord.dist,songdur.dist,nrepet=1000)
mantel.hfreq[i]<<-mantel.rtest(coord.dist,hfreq.dist,nrepet=1000)
mantel.lfreq[i]<<-mantel.rtest(coord.dist,lfreq.dist,nrepet=1000)
mantel.bwidth[i]<<-mantel.rtest(coord.dist,bwidth.dist,nrepet=1000)
mantel.hfreqlnote[i]<<-mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
}
}
Could someone please help me to do it the right way?
I think I'm not assigning the cycles correctly for each mantel.something object.
Many thanks in advance!
The best way to implement what you are trying to do is through a list. You can even make it take two indices, the first for the iterations, the second for the type of analysis.
mantellist <- as.list(1:10) ## initiate list with some values
for (i in 1:10){
...
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
...)
}
return(mantellist)
In this way you can index your specific analysis for each iteration in an intuitive way:
mantellist[[2]][['hfreq']]
mantellist[[2]]$hfreq ## alternative
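The same two-level structure can be tried without ade4. In this sketch the mean and standard deviation of random numbers stand in for the mantel.rtest results (an assumption made only so the example runs anywhere), and vector("list", 3) is used as an alternative way to pre-allocate an empty list:

```r
set.seed(1)
results <- vector("list", 3)  # three empty slots, one per iteration
for (i in 1:3) {
  x <- rnorm(20)
  results[[i]] <- list(mean = mean(x), sd = sd(x))
}

results[[2]]$mean                    # one analysis from one iteration
sapply(results, function(r) r$mean)  # the same analysis across iterations
```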
EDIT by Mohr:
Just for clarification...
So, according to your suggestion the code should be something like this:
sort.song<-function(DATA){
require(ade4)
mantellist <- as.list(1:10)
for(i in 1:10){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist<-dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist<-dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist<-dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist<-dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist<-dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist<-dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist<-dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
mantellist[[i]] <- list(numnotes=mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur=mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq=mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq=mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth=mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote=mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
)
}
return(mantellist)
}
You can achieve your objective of repeating this exercise 10 (or more) times without using an explicit for-loop. Rather than have the function run the loop, write the sort.song function to run one iteration of the process; then you can use replicate to repeat that process as many times as you like.
It is generally good practice not to create a bunch of named objects in your global environment. Instead, you can hold the results of each iteration of this process in a single object. replicate returns an array when it can simplify the results, and a plain list when called with simplify = FALSE (for this function, a list of lists). That list will have 10 elements (one for each iteration), and each element will itself be a list containing named elements corresponding to each result of mantel.rtest.
sort.song<-function(DATA){
DATA.NEW <- DATA[sample(1:nrow(DATA),replace=FALSE),]
DATA.NEW <- DATA.NEW[!duplicated(DATA.NEW$Point),]
coord.dist <- dist(DATA.NEW[,4:5],method="euclidean")
num.notes.dist <- dist(DATA.NEW$Num_Notes,method="euclidean")
songdur.dist <- dist(DATA.NEW$Song_Dur,method="euclidean")
hfreq.dist <- dist(DATA.NEW$High_Freq,method="euclidean")
lfreq.dist <- dist(DATA.NEW$Low_Freq,method="euclidean")
bwidth.dist <- dist(DATA.NEW$Bwidth_Song,method="euclidean")
hfreqlnote.dist <- dist(DATA.NEW$HighFreq_LastNote,method="euclidean")
return(list(
numnotes = mantel.rtest(coord.dist,num.notes.dist,nrepet=1000),
songdur = mantel.rtest(coord.dist,songdur.dist,nrepet=1000),
hfreq = mantel.rtest(coord.dist,hfreq.dist,nrepet=1000),
lfreq = mantel.rtest(coord.dist,lfreq.dist,nrepet=1000),
bwidth = mantel.rtest(coord.dist,bwidth.dist,nrepet=1000),
hfreqlnote = mantel.rtest(coord.dist,hfreqlnote.dist,nrepet=1000)
))
}
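library(ade4) # provides mantel.rtest
# simplify = FALSE keeps the result as a list of 10 lists instead of a list-matrix
results <- replicate(10, sort.song(DATA), simplify = FALSE)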
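A small self-contained sketch of the replicate pattern, with a hypothetical one_run() function whose mean and standard deviation stand in for the mantel.rtest results. It shows how the list of lists is indexed and how one statistic can be pulled out of every run afterwards:

```r
# One iteration: resample the input, then summarise it
one_run <- function(x) {
  s <- sample(x, replace = TRUE)
  list(mean = mean(s), sd = sd(s))
}

set.seed(42)
runs <- replicate(10, one_run(1:50), simplify = FALSE)

length(runs)       # one element per repetition
runs[[2]]$mean     # a single result from the second repetition
means <- vapply(runs, function(r) r$mean, numeric(1))  # one value per run
```

Without simplify = FALSE, replicate would try to simplify the results and return a matrix of mode list here, which is indexed as runs["mean", 2] rather than runs[[2]]$mean.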