R edgar Error: Input year(s) is not numeric

I have a "test" dataframe with 3 companies (ciknum variable) and years in which each company filed annual reports (fyearq):
ciknum fyearq
1 1408356 2012
2 1557255 2012
3 1557255 2013
4 1557255 2014
5 1557255 2015
6 1557255 2016
7 1555538 2013
8 1555538 2014
9 1555538 2015
10 1555538 2016
After obtaining the MasterIndex folder and running this code (see proposed solution) I use the R edgar package to obtain 10-K filings. I run the following code:
for (i in 1:nrow(test)){
firm<-test[i,"ciknum"] #edit: seems like mistake can be here since new firm data only contains 1 obs of 1 variable
year<-test[i,"fyearq"] #edit: seems like mistake can be here since new year data only contains 1 obs of 1 variable
my_getFilings(firm,'10-K',year,downl.permit="y")
}
It keeps throwing the following error: Error: Input year(s) is not numeric. I checked the variable types, and my fyearq variable appears to be numeric:
sapply(test,class)
ciknum fyearq
"numeric" "numeric"
I don't really understand why the "numeric" fyearq variable is not read as such by the my_getFilings function. Any help would be much appreciated.
Thank you in advance.

Martins
The order of the arguments seems to matter here. I solved this problem by passing each argument by name, using the argument names from the function definition, so that
my_getFilings(firm,'10-K',year,downl.permit="y")
as you wrote is written as
my_getFilings(cik.no = firm, form.def = '10-K', filing.year = 2016, downl.permit = "y")

Thank you #bartosz25 and M Grace,
I finally made it work through the following code:
for (row in 1:nrow(test)){
firm <- as.numeric(test[row, "ciknum"])
year <- as.numeric(test[row, "fyearq"])
my_getFilings(firm, c('10-K'), year, downl.permit="y")
}
Apologies for not posting it before.
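A possible explanation of why as.numeric() helps, sketched under the assumption that test is a tibble (or is otherwise indexed so that a single cell comes back as a one-cell data frame, as the "1 obs of 1 variable" note in the question suggests): is.numeric() is FALSE for a data frame even when its only column is numeric, so a numeric check on the year would fail. The data below is a two-row stand-in for test, and drop = FALSE is used only to mimic that behaviour with a plain data frame.

```r
# Stand-in for the question's data (first two rows only).
test <- data.frame(ciknum = c(1408356, 1557255), fyearq = c(2012, 2012))

# Assumption: mimic tibble-style indexing, which keeps a single cell as a data frame.
cell_df <- test[1, "fyearq", drop = FALSE]
class(cell_df)       # "data.frame"
is.numeric(cell_df)  # FALSE, although the column inside is numeric

# Extracting the column first with [[ ]] yields a plain numeric value.
year <- test[["fyearq"]][1]
is.numeric(year)     # TRUE
```

Extracting with test[["fyearq"]][row] avoids the need for as.numeric() regardless of whether test is a data frame or a tibble.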

Related

Error: object not found in R. Headers not naming from .csv file

I am new to R and I keep getting inconsistent results with trying to display a column of data from a csv. I am able to import the csv into R without issue, but I can't call out the individual columns.
Here's my code:
setwd('mypath')
cdata <- read.csv(file="cendata.csv",header=TRUE, sep=",")
cdata
This prints out the following:
year pop
1 2010 2,775,332
2 2011 2,814,384
3 2012 2,853,375
4 2013 2,897,640
5 2014 2,936,879
6 2015 2,981,835
7 2016 3,041,868
8 2017 3,101,042
9 2018 3,153,550
10 2019 3,205,958
When I try to plot the following, the columns cannot be found.
plot(pop,year)
Error: object 'pop' not found
I even checked if the column names existed, and only data shows up.
ls()
[1] "data"
I can manually enter the data and label them "pop" and "year" but that kind of defeats the point of importing the csv.
Is there a way to label each header as an object?
year and pop are not independent objects. You need to refer to them as columns of the data frame you imported. You will probably also need to remove the "," from the numbers and convert them to numeric before plotting. Try:
cdata$pop <- as.numeric(gsub(',', '', cdata$pop))
plot(cdata$year, cdata$pop)
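For reference, here is a self-contained sketch of the same idea. It assumes the population values are quoted in the file (they contain commas), and it uses inline text with the first three rows in place of cendata.csv. with() lets you use the column names directly without creating separate objects.

```r
# Inline stand-in for cendata.csv; the quoting of pop is an assumption.
csv_text <- 'year,pop
2010,"2,775,332"
2011,"2,814,384"
2012,"2,853,375"'

cdata <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# pop is read as text because of the thousands separators; strip them and convert.
cdata$pop <- as.numeric(gsub(",", "", cdata$pop))

# with() evaluates the expression using the data frame's columns as variables.
with(cdata, plot(year, pop))
```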

R - Time Series ggplot Missing column

I am working on a time series plot in R. However, I cannot seem to access the columns in my data frame. The error message is Error in FUN(X[[i]], ...) : object 'Dates' not found.
Below includes my script and the brief table. Any help is much appreciated.
# Transpose USA to get dates
t_USA_G_1 <- as.data.frame(t(USA_G_1_date))
#Rename column headers
colnames(t_USA_G_1)[0] = "Dates"
colnames(t_USA_G_1)[1] = "USA_Net_Enrollment"
t_USA_G_1
#Time series plot
t_USA_G_1%>%
ggplot(aes(Dates, USA_Net_Enrollment)) +
geom_line() +
geom_point()
------Output-----
USA_Net_Enrollment
1999 96.56902
2000 96.69755
2001 96.28022
2002 94.99747
2003 94.74116
2004 93.37412
2005 93.68804
2006 94.81912
2007 95.86296
2008 96.26724
2009 94.81539
2010 93.62400
2011 92.91374
2012 93.16648
2013 92.77709
2014 93.09830
2015 93.75419
I found the answer using row.names.
t_USA_G_1%>%
ggplot(aes(row.names(t_USA_G_1), USA_Net_Enrollment)) +
geom_point(color="blue")+
labs(x="Dates", y="USA Net Enrollment")
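Some background on why the original attempt failed: R indices start at 1, so assigning to colnames(t_USA_G_1)[0] changes nothing, and the years live in the row names rather than in a column. An alternative to calling row.names() inside aes() is to copy the row names into a real column first. The sketch below uses the first three rows of the output above as stand-in data, and only draws the plot if ggplot2 is installed.

```r
# Stand-in for the transposed data: years are row names, not a column.
t_USA_G_1 <- data.frame(
  USA_Net_Enrollment = c(96.56902, 96.69755, 96.28022),
  row.names = c("1999", "2000", "2001")
)

# Copy the row names into a numeric Dates column.
t_USA_G_1$Dates <- as.integer(row.names(t_USA_G_1))

# Plot only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(t_USA_G_1, aes(Dates, USA_Net_Enrollment)) +
    geom_line() +
    geom_point()
  print(p)
}
```

With a numeric Dates column, geom_line() can connect the points in year order, which the original script was aiming for.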

Are there simple ways to lag (by group) in data frames without workarounds like data tables, xts, zoo, dplyr etc in R?

Whenever I want to lag in a data frame I realize that something that should be simple is not. While the problem has been asked & answered many times (see p.s.), I did not find a simple solution which I can remember until the next time I lag. In general, lagging does not seem to be a simple thing in R as the multiple workarounds testify. I run into this problem often and it would be very helpful to have some basic R solutions which do not need extra packages. Could you provide your simple solution for lagging?
If that is not possible, could you at least provide your workaround here so we can choose amongst second best alternatives? One collection already exists here
Also, in all blog posts on this subject I see people complain about how unexpectedly difficult lagging is so how can we get a simple lag function for data frames into R Core? This must be extremely disappointing for anyone coming from Stata or EViews. Or am I missing something and there is a simple built in solution?
Say we want to lag "value" by 3 "year"s for each "country" here:
Data <- data.frame(year=c(rep(2010:2015,2)),country=c(rep("AT",6),rep("DE",6)),value=rnorm(12))
to create L3 like:
year country value L3
2010 AT 0.3407 NA
2011 AT -1.7981 NA
2012 AT -0.8390 NA
2013 AT -0.6888 0.3407
2014 AT -1.1019 -1.7981
2015 AT -0.8953 -0.8390
2010 DE 0.5877 NA
2011 DE -1.0204 NA
2012 DE -0.6576 NA
2013 DE 0.6620 0.5877
2014 DE 0.9579 -1.0204
2015 DE -0.7774 -0.6576
And we neither want to change the nature of our data (to ts or data table) nor do we want to immerse ourselves in three new packages when the deadline is tonight and our supervisor uses Stata and thinks lagging is easy ;-) (it's not, I just want to be prepared...)
p.s.:
without groups
with data.table: Lag in dataframe or How to create a lag variable within each group?
time series are straightforward
If the question is how to provide a column with the prior third year's value not using packages then try this:
prior_year3 <- function(x, k = 3) head(c(rep(NA, k), x), length(x))
transform(Data, prior_year_value = ave(value, country, FUN = prior_year3))
giving:
year country value prior_year_value
1 2010 AT -1.66562121 NA
2 2011 AT -0.04950063 NA
3 2012 AT 1.55930293 NA
4 2013 AT -0.40462394 -1.66562121
5 2014 AT 0.78602610 -0.04950063
6 2015 AT 0.73912916 1.55930293
7 2010 DE 1.03710539 NA
8 2011 DE -1.13370942 NA
9 2012 DE -1.20530981 NA
10 2013 DE 1.66870572 1.03710539
11 2014 DE 1.53615793 -1.13370942
12 2015 DE -0.09693335 -1.20530981
That said, to use R effectively you do need to learn how to use the key packages.
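To make the mechanics easier to check, here is a small sketch of the same helper applied first to a plain vector and then per group with ave(). It uses deterministic values instead of rnorm() so the result is easy to verify. Note the assumptions: the rows must already be sorted by year within each country, and there must be no gaps in the years, because the helper shifts by position, not by the year value.

```r
# Shift a vector down by k positions, padding the start with NA.
prior_year3 <- function(x, k = 3) head(c(rep(NA, k), x), length(x))

prior_year3(1:6)  # NA NA NA 1 2 3

# Deterministic stand-in for the question's Data.
Data <- data.frame(
  year    = rep(2010:2015, 2),
  country = rep(c("AT", "DE"), each = 6),
  value   = 1:12
)

# Sort so that positions within each country correspond to consecutive years.
Data <- Data[order(Data$country, Data$year), ]

# ave() applies the helper separately within each country.
Data$L3 <- ave(Data$value, Data$country, FUN = prior_year3)
Data
```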
Try slide from the DataCombine package; it's simple:
slide(Data,Var='value',GroupVar = 'country',slideBy=-3)

R plots: simple statistics on data by year. Base package

How to apply simple statistics to data and plot them elegantly by year using the R base plotting system and default functions?
The database is quite heavy, so it would be preferable not to generate new variables.
I hope it is not a silly question, but I have not found a solution that avoids additional packages such as ggplot2, dplyr or lubridate. The ones I found on the site all rely on them:
ggplot2: Group histogram data by year
R group by year
Split data by year
I am using the R default systems for didactic purposes. I think it could be important training before turning to the more "comfortable" R-specific packages.
Consider a simple dataset:
> prod_dat
lab year production(kg)
1 2010 0.3219
1 2011 0.3222
1 2012 0.3305
2 2010 0.3400
2 2011 0.3310
2 2012 0.3310
3 2010 0.3400
3 2011 0.3403
3 2012 0.3410
I would like to plot a histogram of, let's say, the total production of material during specific years.
> hist(sum(prod_dat$production[prod_dat$year == c(2010, 2012)]))
Unfortunately, this is my best attempt, and it throws an error:
in prod_dat$year == c(2010, 2012):
longer object length is not a multiple of shorter object length
I am really stuck, so any suggestion would be useful.
Without ggplot2, I used to do it like this, though I think there are smarter ways:
all <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "lab year production
1 2010 1
1 2011 0.3222
1 2012 0.3305
2 2010 0.3400
2 2011 0.3310
2 2012 0.3310
3 2010 0.3400
3 2011 0.3403
3 2012 0.3410")
ar <- data.frame(year = unique(all$year), prod = tapply(all$production, list(all$year), FUN = sum))
barplot(ar$prod)
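On the message in the question: == compares element by element and recycles the shorter vector, so year == c(2010, 2012) does not test membership, and R warns because 9 is not a multiple of 2. %in% is the base operator for "is one of these years". A minimal sketch using the question's data follows; the column is named production here because "production(kg)" is not a syntactic R name, and selected and totals are only illustrative names.

```r
prod_dat <- read.table(header = TRUE, text = "lab year production
1 2010 0.3219
1 2011 0.3222
1 2012 0.3305
2 2010 0.3400
2 2011 0.3310
2 2012 0.3310
3 2010 0.3400
3 2011 0.3403
3 2012 0.3410")

# Membership test: TRUE for rows whose year is 2010 or 2012.
selected <- prod_dat$year %in% c(2010, 2012)

# Total production per selected year, without adding columns to prod_dat.
totals <- tapply(prod_dat$production[selected], prod_dat$year[selected], sum)

# One bar per year; hist() would instead bin the individual values.
barplot(totals, xlab = "year", ylab = "total production (kg)")
```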

R how to remove duplicates elements in the column and get average value

Sorry I am new to R, and the problem is quite hard for me,
Here is the matrix:
V1 predictions
1 Jeffery Howes 0.0909596345057677
2 Sherilee Waring 0.00434589236424605
3 Rachel Maitland 0.0909596345057677
4 Jan Maitland 0.0909596345057677
5 Jan Maitland 0.0909596345057677
6 Jan Maitland 0.0909596345057677
7 Jan Maitland 0.0909596345057677
8 Sandra McEwen 0.0909596345057677
....
How can I remove the duplicates in the column? (Removing them is fine, I could use unique, but the next part is quite hard for me.)
For example, the name Jan Maitland is duplicated many times. The duplicates should be removed, but the prediction values should be combined: the value left in the final result should be the average over the duplicated names.
Could someone help me on that? thanks a lot!!
You can use the dplyr library (the old %.% chaining operator has been removed from dplyr, so use %>%):
result %>% group_by(V1) %>% summarise(predictions = mean(predictions))
# Equivalent nested-call syntax
summarise(group_by(result, V1), predictions = mean(predictions))
hth
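If you would rather stay in base R, aggregate() does the same grouping. This is only a sketch under an assumption: since the question calls the data a matrix, the predictions are taken to be stored as text alongside the names, so they are converted to numeric first. The four rows are a small subset of the values shown in the question, and m, result and averaged are illustrative names.

```r
# Assumption: a character matrix, as a matrix holding names cannot hold numbers.
m <- cbind(
  V1 = c("Jeffery Howes", "Sherilee Waring", "Jan Maitland", "Jan Maitland"),
  predictions = c("0.0909596345057677", "0.00434589236424605",
                  "0.0909596345057677", "0.0909596345057677")
)

# Convert to a data frame with a numeric predictions column.
result <- data.frame(
  V1 = m[, "V1"],
  predictions = as.numeric(m[, "predictions"]),
  stringsAsFactors = FALSE
)

# One row per name, with the mean of that name's predictions.
averaged <- aggregate(predictions ~ V1, data = result, FUN = mean)
averaged
```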
