I am using "edgarWebR" package to get the data from USSEC EDGAR website. There is a function in the package called "company_filings", which has several arguments and I would like to use four of the arguments and it should be like this -
company_filings (comp, type = c('10-K','10-Q'), before = 20181231, count = 40)
where comp is a vector defined as follows -
comp <- c ("AAPL", "GOOG", "INTC")
but the company_filings function accepts only one element at a time in comp vector - for example -
company_filings ("AAPL", type = c('10-K','10-Q'), before = 20181231, count = 40)
Actually, I use the following code to get the results for all elements in comp vector -
filing <- Reduce(rbind, lapply(comp, company_filings))
but it does not work. Can anybody help me in this respect?
I appreciate your help.
To use functions in the apply family, the function in question should be of a single variable. You can create an anonymous function of one variable from a function of several variables and do something like:
sapply(comp,function(x){company_filings (x, type = c('10-K','10-Q'), before = 20181231, count = 40)})
Related
I am attempting to have multiple mutates within a for loop, but each of the mutate should create a new variable in an increasing numeric. Am I able to do this within one block of code without repeating the same line with the new variable name? In this example, I am attempting to find the datediff between 2 dates within each mutuate function
ex:
for (i in c(1:nrow(pd)))
{
result <- pd %>%
mutate(n1.DateDiff = abs(difftime(pd[i,]$`Date(US)`, n1.Status_dt, units = c("days"))),
n2.DateDiff = abs(difftime(pd[i,]$`Date(US)`, n2.Status_dt, units = c("days"))),
n3.DateDiff = abs(difftime(pd[i,]$`Date(US)`, n3.Status_dt, units = c("days"))))
}
Ideally, I'd like one line where I'm able to loop and create n1-n3 without writing this in 3 lines.
The across pattern recommended by #akrun is the way such problems are intended to be solved. But if you want to use a for loop, perhaps you are looking for something like the following:
in_cols = c("n1.Stats_dt", "n2.Stats_dt", "n3.Stats_dt")
out_cols = c("n1.DateDiff", "n2.DateDiff", "n3.DateDiff")
for(ii in 1:3){
this_in = in_cols[ii]
this_out = out_cols[ii]
df = df %>%
mutate(!!sym(this_out) := difftime(pd, !!sym(this_in)))
}
Notes:
!!sym(.) is used to turn a text string into a variable
:= is equivalen to = but allows us to use !!sym(.) on the left-hand side
I'm still new to writing my own functions. As an exercise and because I use it alot, I want to write a flexible function to easily reverse survey response scales. This is what I came up with:
rev_scale = function(var, new_var, scale){
for (i in 1:length(abs(var))){
new_var[i] = scale-abs(var[i])+1
}
}
Info on code
var = variable I want to reverse.
new_var = new column with the reversed variable
scale = how many points in the scale (eg. 5 for a 5-point scale)
The reason why I use 'abs' instead of just 'var' is that some dataframes also return value-labels, and I only want the values in this function.
Question
When applying this new function on a variable, R returns "NULL". However, if I run the for-loop separately, with the arguments 'imputed', my new variable is properly reversed.
Any ideas on what is happening here?
Thanks in advance!
### Example of the (working) for-loop with arguments 'imputed' ###
df <- data.frame(matrix(ncol = 1, nrow = 4))
df$var = c(1,2,3,4)
for (i in 1:length(abs(df$var))){
df$var_rev[i] = 4-abs(df$var[i])+1
}
df$var_rev
OUTPUT:
[1] 4 3 2 1
R does not use reference-variables (think pointers)*. So your new_var outside of your function does not get updated when refered to inside a function. Instead, R creates a new copy of new_var and updates that.
You should instead return the new value from your function. I.e.
rev_scale = function(var, scale){
res <- vector('numeric', length(var))
for (i in 1:length(abs(var))){
res[i] = scale-abs(var[i])+1
}
return(res)
}
Also note that I have removed new_var from the function's arguments. In other words, I have completely separated the functions input-arguments from its output.
The reason you get a NULL from the function is that in R, all functions returns somethings. If not specified, the function will return the last value of the last statement, except when the last statement is a control structure (ifs, loops) - then it defaults to a NULL.
* There are a couple of exceptions and work-arounds, but I will not go into that here.
Edit:
As benimwolfspelz noted, you do not need to explicitly iterate over each element in var, as R does this implicitly. Your entire function could be reduced to:
rev_scale = function(var, scale) {
scale-abs(var)+1
}
Secondly, in your for-loop, your can simplify length(abs(var)) to length(var) as abs(var) does not change the length of the vector.
I have a tibble called 'Volume' in which I store some data (10 columns - the first 2 columns are characters, 30 rows).
Now I want to calculate the relative Volume of every column that corresponds to Column 3 of my tibble.
My current solution looks like this:
rel.Volume_unmod = tibble(
"Volume_OD" = Volume[[3]] / Volume[[3]],
"Volume_Imp" = Volume[[4]] / Volume[[3]],
"Volume_OD_1" = Volume[[5]] / Volume[[3]],
"Volume_WS_1" = Volume[[6]] / Volume[[3]],
"Volume_OD_2" = Volume[[7]] / Volume[[3]],
"Volume_WS_2" = Volume[[8]] / Volume[[3]],
"Volume_OD_3" = Volume[[9]] / Volume[[3]],
"Volume_WS_3" = Volume[[10]] / Volume[[3]])
rel.Volume_unmod
I would like to keep the tibble structure and the labels. I am sure there is a better solution for this, but I am relative new to R so I it's not obvious to me. What I tried is something like this, but I can't actually run this:
rel.Volume = NULL
for(i in Volume[,3:10]){
rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])
}
Mockup Data
Since you did not provide some data, I've followed the description you provided to create some mockup data. Here:
set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
GR = sample(LETTERS, 30, TRUE))
Volume[3:10] <- rnorm(30*8)
Solution with Dplyr
library(dplyr)
# rename columns [brute force]
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(Volume)[3:10] <- cols
# divide by Volumn_OD
rel.Volume_unmod <- Volume %>%
mutate(across(all_of(cols), ~ . / Volume_OD))
# result
rel.Volume_unmod
Explanation
I don't know the names of your columns. Probably, the names correspond to the names of the columns you intended to create in rel.Volume_unmod. Anyhow, to avoid any problem I renamed the columns (kinda brutally). You can do it with dplyr::rename if you wan to.
There are many ways to select the columns you want to mutate. mutate is a verb from dplyr that allows you to create new columns or perform operations or functions on columns.
across is an adverb from dplyr. Let's simplify by saying that it's a function that allows you to perform a function over multiple columns. In this case I want to perform a division by Volum_OD.
~ is a tidyverse way to create anonymous functions. ~ . / Volum_OD is equivalent to function(x) x / Volumn_OD
all_of is necessary because in this specific case I'm providing across with a vector of characters. Without it, it will work anyway, but you will receive a warning because it's ambiguous and it may work incorrectly in same cases.
More info
Check out this book to learn more about data manipulation with tidyverse (which dplyr is part of).
Solution with Base-R
rel.Volume_unmod <- Volume
# rename columns
cols <- c("Volume_OD","Volume_Imp","Volume_OD_1","Volume_WS_1","Volume_OD_2","Volume_WS_2","Volume_OD_3","Volume_WS_3")
colnames(rel.Volume_unmod)[3:10] <- cols
# divide by columns 3
rel.Volume_unmod[3:10] <- lapply(rel.Volume_unmod[3:10], `/`, rel.Volume_unmod[3])
rel.Volume_unmod
Explanation
lapply is a base R function that allows you to apply a function to every item of a list or a "listable" object.
in this case rel.Volume_unmod is a listable object: a dataframe is just a list of vectors with the same length. Therefore, lapply takes one column [= one item] a time and applies a function.
the function is /. You usually see / used like this: A / B, but actually / is a Primitive function. You could write the same thing in this way:
`/`(A, B) # same as A / B
lapply can be provided with additional parameters that are passed directly to the function that is being applied over the list (in this case /). Therefore, we are writing rel.Volume_unmod[3] as additional parameter.
lapply always returns a list. But, since we are assigning the result of lapply to a "fraction of a dataframe", we will just edit the columns of the dataframe and, as a result, we will have a dataframe instead of a list. Let me rephrase in a more technical way. When you are assigning rel.Volume_unmod[3:10] <- lapply(...), you are not simply assigning a list to rel.Volume_unmod[3:10]. You are technically using this assigning function: [<-. This is a function that allows to edit the items in a list/vector/dataframe. Specifically, [<- allows you to assign new items without modifying the attributes of the list/vector/dataframe. As I said before, a dataframe is just a list with specific attributes. Then when you use [<- you modify the columns, but you leave the attributes (the class data.frame in this case) untouched. That's why the magic works.
Whithout a minimal working example it's hard to guess what the Variable Volume actually refers to. Apart from that there seems to be a problem with your for-loop:
for(i in Volume[,3:10]){
Assuming Volume refers to a data.frame or tibble, this causes the actual column-vectors with indices between 3 and 10 to be assigned to i successively. You can verify this by putting print(i) inside the loop. But inside the loop it seems like you actually want to use i as a variable containing just the index of the current column as a number (not the column itself):
rel.Volume[i] = tibble(Volume = Volume[[i]] / Volume[[3]])
Also, two brackets are usually used with lists, not data.frames or tibbles. (You can, however, do so, because data.frames are special cases of lists.)
Last but not least, initialising the variable rel.Volume with NULL will result in an error, when trying to reassign to that variable, since you haven't told R, what rel.Volume should be.
Try this, if you like (thanks #Edo for example data):
set.seed(1)
Volume <- data.frame(ID = sample(letters, 30, TRUE),
GR = sample(LETTERS, 30, TRUE),
Vol1 = rnorm(30),
Vol2 = rnorm(30),
Vol3 = rnorm(30))
rel.Volume <- Volume[1:2] # Assuming you want to keep the IDs.
# Your data.frame will need to have the correct number of rows here already.
for (i in 3:ncol(Volume)){ # ncol gives the total number of columns in data.frame
rel.Volume[i] = Volume[i]/Volume[3]
}
A more R-like approach would be to avoid using a for-loop altogether, since R's strength is implicit vectorization. These expressions will produce the same result without a loop:
# OK, this one messes up variable names...
rel.V.2 <- data.frame(sapply(X = Volume[3:5], FUN = function(x) x/Volume[3]))
rel.V.3 <- data.frame(Map(`/`, Volume[3:5], Volume[3]))
Since you said you were new to R, frankly I would recommend avoiding the Tidyverse-packages while you are still learing the basics. From my experience, in the long run you're better off learning base-R first and adding the "sugar" when you're more familiar with the core language. You can still learn to use Tidyverse-functions later (but then, why would anybody? ;-) ).
This question already has answers here:
How to assign from a function which returns more than one value?
(16 answers)
Closed 6 years ago.
How can I return multiple objects in an R function? In Java, I would make a Class, maybe Person which has some private variables and encapsulates, maybe, height, age, etc.
But in R, I need to pass around groups of data. For example, how can I make an R function return both an list of characters and an integer?
Unlike many other languages, R functions don't return multiple objects in the strict sense. The most general way to handle this is to return a list object. So if you have an integer foo and a vector of strings bar in your function, you could create a list that combines these items:
foo <- 12
bar <- c("a", "b", "e")
newList <- list("integer" = foo, "names" = bar)
Then return this list.
After calling your function, you can then access each of these with newList$integer or newList$names.
Other object types might work better for various purposes, but the list object is a good way to get started.
Similarly in Java, you can create a S4 class in R that encapsulates your information:
setClass(Class="Person",
representation(
height="numeric",
age="numeric"
)
)
Then your function can return an instance of this class:
myFunction = function(age=28, height=176){
return(new("Person",
age=age,
height=height))
}
and you can access your information:
aPerson = myFunction()
aPerson#age
aPerson#height
Is something along these lines what you are looking for?
x1 = function(x){
mu = mean(x)
l1 = list(s1=table(x),std=sd(x))
return(list(l1,mu))
}
library(Ecdat)
data(Fair)
x1(Fair$age)
You can also use super-assignment.
Rather than "<-" type "<<-". The function will recursively and repeatedly search one functional level higher for an object of that name. If it can't find one, it will create one on the global level.
You could use for() with assign() to create many objects.
See the example from assign():
for(i in 1:6) { #-- Create objects 'r.1', 'r.2', ... 'r.6' --
nam <- paste("r", i, sep = ".")
assign(nam, 1:i)
Looking the new objects
ls(pattern = "^r..$")
One way to handle this is to put the information as an attribute on the primary one. I must stress, I really think this is the appropriate thing to do only when the two pieces of information are related such that one has information about the other.
For example, I sometimes stash the name of "crucial variables" or variables that have been significantly modified by storing a list of variable names as an attribute on the data frame:
attr(my.DF, 'Modified.Variables') <- DVs.For.Analysis$Names.of.Modified.Vars
return(my.DF)
This allows me to store a list of variable names with the data frame itself.
I have a custom function that I want to apply to any dataset that shares a common name.
common_funct=function(rank_p=5){
df = ANY_DATAFRAME_HERE[ANY_DATAFRAME_HERE$rank <rank_p,]
return(df)
}
I know with common functions I could do something like below to get the value of each.
apply(mtcars,1,mean)
But what if I wanted to do :
apply(any_dataset, 1, common_funct(anyvalue))
How would I pass that along?
library(dplyr)
mtcars$rank = dense_rank(mtcars$mpg)
iris$rank = dense_rank(iris$Sepal.Length)
Now how would I go about applying my same function to both values?
If I understand you question, I would suggest putting you data frames into a list and apply over it. So
## Your example function
common_funct=function(df, rank_p=5){
df[df$rank <rank_p,]
}
## Sanity check
common_funct(mtcars)
common_funct(iris)
Next create a list of the data frames
l = list(mtcars, iris)
and use lapply
lapply(l, common_funct)