I am trying to create a time series plot in R, but I cannot access the columns in my data frame. The error message is Error in FUN(X[[i]], ...) : object 'Dates' not found.
My script and a brief excerpt of the table are below. Any help is much appreciated.
# Transpose USA to get dates
t_USA_G_1 <- as.data.frame(t(USA_G_1_date))
#Rename column headers
colnames(t_USA_G_1)[0] = "Dates"
colnames(t_USA_G_1)[1] = "USA_Net_Enrollment"
t_USA_G_1
#Time series plot
t_USA_G_1%>%
ggplot(aes(Dates, USA_Net_Enrollment)) +
geom_line() +
geom_point()
------Output-----
USA_Net_Enrollment
1999 96.56902
2000 96.69755
2001 96.28022
2002 94.99747
2003 94.74116
2004 93.37412
2005 93.68804
2006 94.81912
2007 95.86296
2008 96.26724
2009 94.81539
2010 93.62400
2011 92.91374
2012 93.16648
2013 92.77709
2014 93.09830
2015 93.75419
I found the answer: the dates are stored as the row names of the data frame, not as a column, so I used row.names.
t_USA_G_1%>%
ggplot(aes(row.names(t_USA_G_1), USA_Net_Enrollment)) +
geom_point(color="blue")+
labs(x="Dates", y="USA Net Enrollment")
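A possible alternative to plotting against row.names directly: index 0 selects nothing in R, so colnames(t_USA_G_1)[0] never created a Dates column, and the years stay character row names. The sketch below assumes the transposed frame looks like the posted output (only the first four rows are reproduced) and copies the row names into a numeric column, which also lets geom_line() connect the points. The ggplot2 part runs only if that package is installed.

```r
# Rebuild a small version of the transposed data frame shown above:
# years are row names, not a column (values copied from the posted output).
t_USA_G_1 <- data.frame(
  USA_Net_Enrollment = c(96.56902, 96.69755, 96.28022, 94.99747),
  row.names = c("1999", "2000", "2001", "2002")
)

# Copy the row names into a real numeric column so aes(Dates, ...) can find it.
t_USA_G_1$Dates <- as.numeric(row.names(t_USA_G_1))

# Build the plot only if ggplot2 is available; draw it in interactive sessions.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(t_USA_G_1, aes(Dates, USA_Net_Enrollment)) +
    geom_line() +
    geom_point() +
    labs(x = "Dates", y = "USA Net Enrollment")
  if (interactive()) print(p)
}
```

With a numeric x-axis the years are spaced on a continuous scale rather than treated as categories.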
How can I display % on Y-axis? I can edit the values in the Graph Editor but don't know how this can be done via a script as I am creating several graphs in a loop and tick values change with graphs.
clear
input yr v1
2005 77.01
2006 84.01
2007 83.01
2008 85.01
2009 86.01
2010 83.01
2011 98.01
2012 80.01
2013 79.01
end
graph twoway connected v1 yr
[Figures not reproduced: the actual graph and the expected graph.]
My previous answer was a bit messy given edits.
Here is a fresh self-contained answer based on nicelabels (on SSC since 10 May 2022) and mylabels (on SSC for some while, perhaps 2003).
Let's start by noting that adding % signs is not part of any official display format. So, we have to do it in our own code.
clear
input yr v1
2005 77.01
2006 84.01
2007 83.01
2008 85.01
2009 86.01
2010 83.01
2011 98.01
2012 80.01
2013 79.01
end
nicelabels v1, local(yla)
if wordcount("`yla'") < 5 nicelabels v1, local(yla) nvals(10)
mylabels `yla', suffix(%) local(yla)
twoway connected v1 yr , yla(`yla')
So nicelabels is asked to suggest nice labels for v1. If the number suggested is < 5 it is told to try again. Once those labels exist, they are pushed through mylabels for adding % to each. The process needs no user intervention.
I have a "test" dataframe with 3 companies (ciknum variable) and years in which each company filed annual reports (fyearq):
ciknum fyearq
1 1408356 2012
2 1557255 2012
3 1557255 2013
4 1557255 2014
5 1557255 2015
6 1557255 2016
7 1555538 2013
8 1555538 2014
9 1555538 2015
10 1555538 2016
After obtaining the MasterIndex folder and running this code (see proposed solution) I use the R edgar package to obtain 10-K filings. I run the following code:
for (i in 1:nrow(test)){
firm<-test[i,"ciknum"] #edit: seems like mistake can be here since new firm data only contains 1 obs of 1 variable
year<-test[i,"fyearq"] #edit: seems like mistake can be here since new year data only contains 1 obs of 1 variable
my_getFilings(firm,'10-K',year,downl.permit="y")
}
And it keeps spitting the following error: Error: Input year(s) is not numeric. I checked the variable type and it seems my fyearq variable is numeric.
sapply(test,class)
ciknum fyearq
"numeric" "numeric"
Don't really understand why the "numeric" fyearq variable is not read as such by the my_getFilings function. Any help would be much appreciated.
Thank you in advance.
Martins
The argument order seems to matter here. I solved this problem by naming the arguments explicitly, as in the function definition, so that
my_getFilings(firm,'10-K',year,downl.permit="y")
as you wrote is written as
my_getFilings(cik.no = firm, form.def = '10-K', filing.year = 2016, downl.permit = "y")
Thank you @bartosz25 and M Grace,
I finally made it work through the following code:
for (row in 1:nrow(test)){
firm <- as.numeric(test[row, "ciknum"])
year <- as.numeric(test[row, "fyearq"])
my_getFilings(firm, c('10-K'), year, downl.permit="y")
}
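One plausible reason the as.numeric() calls help, assuming test is a tibble (the "1 obs of 1 variable" note in the question hints at this, but it is not confirmed): single-bracket subsetting of a tibble returns a 1 x 1 data frame rather than a number. The sketch below imitates that behaviour in base R with drop = FALSE, using the first two rows of the posted data.

```r
# Toy version of the data frame from the question (first rows only).
test <- data.frame(ciknum = c(1408356, 1557255), fyearq = c(2012, 2012))

# drop = FALSE mimics how a tibble subsets: the result is a 1 x 1 data frame.
year_df <- test[1, "fyearq", drop = FALSE]
is.numeric(year_df)          # FALSE: a data frame is a list, not a numeric vector

# Extracting the column first with [[ ]] yields a plain numeric value.
year_num <- test[["fyearq"]][1]
is.numeric(year_num)         # TRUE

# as.numeric() on a 1 x 1 data frame also unwraps the single value.
year_conv <- as.numeric(year_df)
```

Either form (column extraction with [[ ]] or as.numeric()) hands the function a bare number instead of a one-cell table.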
Apologies for not posting it before.
Whenever I want to lag in a data frame I realize that something that should be simple is not. While the problem has been asked & answered many times (see p.s.), I did not find a simple solution which I can remember until the next time I lag. In general, lagging does not seem to be a simple thing in R as the multiple workarounds testify. I run into this problem often and it would be very helpful to have some basic R solutions which do not need extra packages. Could you provide your simple solution for lagging?
If that is not possible, could you at least provide your workaround here so we can choose amongst second best alternatives? One collection already exists here
Also, in all blog posts on this subject I see people complain about how unexpectedly difficult lagging is so how can we get a simple lag function for data frames into R Core? This must be extremely disappointing for anyone coming from Stata or EViews. Or am I missing something and there is a simple built in solution?
Say we want to lag "value" by 3 years within each "country" here:
Data <- data.frame(year=c(rep(2010:2015,2)),country=c(rep("AT",6),rep("DE",6)),value=rnorm(12))
to create L3 like:
year country value L3
2010 AT 0.3407 NA
2011 AT -1.7981 NA
2012 AT -0.8390 NA
2013 AT -0.6888 0.3407
2014 AT -1.1019 -1.7981
2015 AT -0.8953 -0.8390
2010 DE 0.5877 NA
2011 DE -1.0204 NA
2012 DE -0.6576 NA
2013 DE 0.6620 0.5877
2014 DE 0.9579 -1.0204
2015 DE -0.7774 -0.6576
We want neither to change the nature of our data (to ts or data.table) nor to immerse ourselves in three new packages when the deadline is tonight and our supervisor uses Stata and thinks lagging is easy ;-) (it's not, I just want to be prepared...)
p.s.:
without groups
with data.table: Lag in dataframe or How to create a lag variable within each group?
time series are straightforward
If the question is how to provide a column with the prior third year's value not using packages then try this:
prior_year3 <- function(x, k = 3) head(c(rep(NA, k), x), length(x))
transform(Data, prior_year_value = ave(value, country, FUN = prior_year3))
giving:
year country value prior_year_value
1 2010 AT -1.66562121 NA
2 2011 AT -0.04950063 NA
3 2012 AT 1.55930293 NA
4 2013 AT -0.40462394 -1.66562121
5 2014 AT 0.78602610 -0.04950063
6 2015 AT 0.73912916 1.55930293
7 2010 DE 1.03710539 NA
8 2011 DE -1.13370942 NA
9 2012 DE -1.20530981 NA
10 2013 DE 1.66870572 1.03710539
11 2014 DE 1.53615793 -1.13370942
12 2015 DE -0.09693335 -1.20530981
That said, to use R effectively you do need to learn how to use the key packages.
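Note that the ave() approach above shifts values by position, so it relies on the rows being ordered by year within each country and on there being no missing years. A minimal sketch with deliberately unsorted, made-up values (not the random data from the question) that sorts first:

```r
# Deterministic toy data in the same shape as Data, deliberately unsorted.
Data <- data.frame(
  year    = c(2012, 2010, 2011, 2013, 2010, 2011, 2012, 2013),
  country = c(rep("AT", 4), rep("DE", 4)),
  value   = c(30, 10, 20, 40, 1, 2, 3, 4)
)

# Same helper as in the answer: shift x down by k positions, padding with NA.
prior_year3 <- function(x, k = 3) head(c(rep(NA, k), x), length(x))

# Sort by country and year first, because the shift is purely positional.
Data <- Data[order(Data$country, Data$year), ]
Data$L3 <- ave(Data$value, Data$country, FUN = prior_year3)
Data
```

After sorting, the 2013 row of each country carries that country's 2010 value in L3, and the first three years are NA.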
Try slide from the DataCombine package; it's simple:
slide(Data,Var='value',GroupVar = 'country',slideBy=-3)
How can I apply simple statistics to data and plot them elegantly by year using the base R plotting system and default functions?
The dataset is quite large, so not generating new variables would be preferable.
I hope it is not a silly question, but I have not found a specific solution that avoids additional packages such as ggplot2, dplyr or lubridate, like the ones I found on the site:
ggplot2: Group histogram data by year
R group by year
Split data by year
I am using the default R systems for didactic purposes. I think it could be important training before turning to the more "comfortable" R-specific packages.
Consider a simple dataset:
> prod_dat
lab year production(kg)
1 2010 0.3219
1 2011 0.3222
1 2012 0.3305
2 2010 0.3400
2 2011 0.3310
2 2012 0.3310
3 2010 0.3400
3 2011 0.3403
3 2012 0.3410
I would like to plot a histogram of, let's say, the total production of material during specific years.
> hist(sum(prod_dat$production[prod_dat$year == c(2010, 2012)]))
Unfortunately, this is my best attempt, and it throws an error:
in prod_dat$year == c(2010, 2012):
longer object length is not a multiple of shorter object length
I am really lost, so any suggestion would be useful.
Without ggplot I used to do it like this, but I think there are smarter ways:
all <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "lab year production
1 2010 1
1 2011 0.3222
1 2012 0.3305
2 2010 0.3400
2 2011 0.3310
2 2012 0.3310
3 2010 0.3400
3 2011 0.3403
3 2012 0.3410")
ar <- data.frame(year = unique(all$year), prod = tapply(all$production, list(all$year), FUN = sum))
barplot(ar$prod)
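The message in the question comes from comparing year with a length-2 vector using ==, which recycles the shorter vector instead of testing membership. A minimal base R sketch, using the values from the question's table and assuming the goal is one total per selected year, that uses %in% and creates no new columns in the data frame:

```r
# Data from the question's table (column renamed to a syntactic name).
prod_dat <- data.frame(
  lab        = rep(1:3, each = 3),
  year       = rep(2010:2012, times = 3),
  production = c(0.3219, 0.3222, 0.3305,
                 0.3400, 0.3310, 0.3310,
                 0.3400, 0.3403, 0.3410)
)

# %in% tests membership row by row; == would recycle c(2010, 2012) instead.
keep <- prod_dat$year %in% c(2010, 2012)

# Sum production per year for the selected rows only.
totals <- tapply(prod_dat$production[keep], prod_dat$year[keep], sum)

# One bar per year; drawn only in an interactive session.
if (interactive()) {
  barplot(totals, xlab = "year", ylab = "total production (kg)")
}
```

A bar plot is used rather than hist() because a sum per year is a single number, while a histogram shows the distribution of many values.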
MarriageLicen
Year Month Amount
1 2011 Jan 742
2 2011 Feb 796
3 2011 Mar 1210
4 2011 Apr 1376
BusinessLicen
Month Year MARRIAGE_LICENSES
1 Jan 2011 754
2 Feb 2011 2706
3 Mar 2011 2689
4 Apr 2011 738
My question is: how can we predict the number of Marriage Licenses (Y) issued by the city using the number of Business Licenses (X)?
And how can we join the two datasets together?
It says that you can join using the combined key of Month and Year.
But I have been struggling with this question for several days.
There are three options here.
The first is to just be direct. I'm going to assume you have the labels swapped around for the data frames in your example (it doesn't make a whole lot of sense to have a MARRIAGE_LICENSES variable in the BusinessLicen data frame, if I'm following what you are trying to do).
You can model the relationship between those two variables with:
my.model <- lm(MarriageLicen$MARRIAGE_LICENSES ~ BusinessLicen$Amount)
The second (not very rational) option would be to make a new data frame explicitly, since it looks like you have an exact match on each of your rows:
new.df <- data.frame(marriage.licenses=MarriageLicen$MARRIAGE_LICENSES, business.licenses=BusinessLicen$Amount)
my.model <- lm(marriage.licenses ~ business.licenses, data=new.df)
Finally, if you don't actually have the perfect alignment shown in your example you can use merge.
my.df <- merge(BusinessLicen, MarriageLicen, by=c("Month", "Year"))
my.model <- lm(MARRIAGE_LICENSES ~ Amount, data=my.df)
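The merge option can be sketched end to end with the four rows from the question. This follows the answer's assumption that the data frame labels were swapped, so MARRIAGE_LICENSES is placed in MarriageLicen and Amount in BusinessLicen; the new value of 1000 used for prediction is purely hypothetical.

```r
# Toy data from the question, with the labels swapped as the answer assumes.
MarriageLicen <- data.frame(
  Month = c("Jan", "Feb", "Mar", "Apr"),
  Year  = 2011,
  MARRIAGE_LICENSES = c(754, 2706, 2689, 738)
)
BusinessLicen <- data.frame(
  Year   = 2011,
  Month  = c("Jan", "Feb", "Mar", "Apr"),
  Amount = c(742, 796, 1210, 1376)
)

# Inner join on the combined key; rows without a match in both frames are dropped.
my.df <- merge(BusinessLicen, MarriageLicen, by = c("Month", "Year"))

# Regress marriage licenses (Y) on business licenses (X).
my.model <- lm(MARRIAGE_LICENSES ~ Amount, data = my.df)
coef(my.model)

# Predict Y for a hypothetical new value of X.
predict(my.model, newdata = data.frame(Amount = 1000))
```

Because merge matches on the key columns, the column order and row order of the two inputs do not need to agree.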