How can I display % on the y-axis? I can edit the values in the Graph Editor, but I don't know how to do this in a script. I am creating several graphs in a loop, and the tick values change from graph to graph.
clear
input yr v1
2005 77.01
2006 84.01
2007 83.01
2008 85.01
2009 86.01
2010 83.01
2011 98.01
2012 80.01
2013 79.01
end
graph twoway connected v1 yr
Actual: [graph image]
Expected: [graph image with % on the y-axis labels]
My previous answer was a bit messy given edits.
Here is a fresh self-contained answer based on nicelabels (on SSC since 10 May 2022) and mylabels (on SSC for some while, perhaps 2003).
Let's start by noting that adding % signs is not part of any official display format. So, we have to do it in our own code.
clear
input yr v1
2005 77.01
2006 84.01
2007 83.01
2008 85.01
2009 86.01
2010 83.01
2011 98.01
2012 80.01
2013 79.01
end
nicelabels v1, local(yla)
if wordcount("`yla'") < 5 nicelabels v1, local(yla) nvals(10)
mylabels `yla', suffix(%) local(yla)
twoway connected v1 yr , yla(`yla')
So nicelabels is asked to suggest nice labels for v1. If the number suggested is < 5 it is told to try again. Once those labels exist, they are pushed through mylabels for adding % to each. The process needs no user intervention.
Related
I have a panel data of stock returns, where after a certain year the coverage universe of stocks doubled. It looks a bit like this:
Year Stock 1 Stock 2 Stock 3 Stock 4
2000 5.1% 0.04% NA NA
2001 3.6% 9.02% NA NA
2002 5.0% 12.09% NA NA
2003 -2.1% -9.05% 1.1% 4.7%
2004 7.1% 1.03% 4.2% -1.1%
.....
Of course, I am trying to maximize my observations both in the time series and in the cross-section as much as possible. However, I am not sure which of these 3 ways to sort would be the most "academically honest":
1. Sort the years through 2002 using only stocks 1 and 2, and incorporate the remaining stocks in the calculations once they become available in 2003.
2. Only include those stocks in calculations that have been available since 2000, i.e. stocks 1 and 2. Ignore the remaining stocks altogether since we do not have the full return profile.
3. Start the sort in year 2003, to have a larger cross-section.
The reason why our coverage universe expands in 2003 is simply because the data provider I am using changed their methodology in that year and decided to track more stocks. Stocks 3 and 4 do exist before 2003, but I cannot use their past return data since I need to follow my data provider (for the second variable I am sorting on).
Thanks all!
I am using the portsort package in R, but it does not seem to work well with NAs.
I am working on a time series plot in R. However, I cannot seem to access the columns in my data frame. The error message is Error in FUN(X[[i]], ...) : object 'Dates' not found.
My script and a brief table are below. Any help is much appreciated.
# Transpose USA to get dates
t_USA_G_1 <- as.data.frame(t(USA_G_1_date))
#Rename column headers
colnames(t_USA_G_1)[0] = "Dates"
colnames(t_USA_G_1)[1] = "USA_Net_Enrollment"
t_USA_G_1
#Time series plot
t_USA_G_1%>%
ggplot(aes(Dates, USA_Net_Enrollment)) +
geom_line() +
geom_point()
------Output-----
USA_Net_Enrollment
1999 96.56902
2000 96.69755
2001 96.28022
2002 94.99747
2003 94.74116
2004 93.37412
2005 93.68804
2006 94.81912
2007 95.86296
2008 96.26724
2009 94.81539
2010 93.62400
2011 92.91374
2012 93.16648
2013 92.77709
2014 93.09830
2015 93.75419
I found the answer using row.names.
t_USA_G_1%>%
ggplot(aes(row.names(t_USA_G_1), USA_Net_Enrollment)) +
geom_point(color="blue")+
labs(x="Dates", y="USA Net Enrollment")
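For context on why this works: R indexes from 1, so colnames(t_USA_G_1)[0] = "Dates" changes nothing, and the years stay in the row names rather than in a column. A minimal sketch of an alternative, assuming a small stand-in data frame built from the first rows of the output above, is to copy the row names into a real numeric column so that the x-axis is continuous:

```r
# Stand-in for the transposed data frame; values copied from the output above.
t_USA_G_1 <- data.frame(
  USA_Net_Enrollment = c(96.56902, 96.69755, 96.28022),
  row.names = c("1999", "2000", "2001")
)

# The years live in the row names, so aes(Dates, ...) finds no "Dates" column.
# Copy them into an actual column (named Dates as in the question).
t_USA_G_1$Dates <- as.integer(row.names(t_USA_G_1))

str(t_USA_G_1)
```

With Dates stored as a number, ggplot(aes(Dates, USA_Net_Enrollment)) + geom_line() should draw a connected line without further changes.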
First time posting a question here. This forum has helped me countless times, but now I feel my R skills are not strong enough to do the job.
My problem is: I have a spatial data frame with multiple attributes, such as Grid_code (pixel values, integer), Sub_Population (character) and Origin_year (integer). I need to find the break values, in this case 3 break values that place 1/4 of the pixels in each class, giving 4 classes.
Also, these breaks will vary with each unique combination of Sub_population and Origin_year.
SubPop Origin grid_code
AL 2008 4.730380
AL 2008 5.552315
AL 2008 5.968850
AL 2008 5.128384
AL 2009 6.927450
AL 2009 7.135734
ALCentral 2008 7.381087
ALCentral 2008 6.232927
ALCentral 2009 6.431800
ALCentral 2009 6.690246
ALCentral 2009 6.794144
That said, the breaks that will allocate the pixels into 4 different classes (1/4 of pixels in each class) will be a unique single set for each combination of Sub_population and Origin_Year.
What I'm thinking to do:
For each unique combination of Sub_population and Origin_year I'll create a df.
cstands_spdf_split <- cstands_select_df[which(
  cstands_select_df$SubPop == "AL" & cstands_select_df$Origin == 2008), ]
Now I need to know how to define the breaks for this unique combination. I was thinking of using the split function with quantiles, but I don't know how this can be done...
With time and learning, I'll turn this script into a function.
Any feedback is appreciated.
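One possible way to combine split with quantile, as a minimal base R sketch. The data frame df below is a stand-in built from the sample rows above, not the real spatial data frame, and the 25/50/75% quantiles are assumed to be the 3 breaks wanted:

```r
# Stand-in data copied from the sample table in the question.
df <- data.frame(
  SubPop = c(rep("AL", 6), rep("ALCentral", 5)),
  Origin = c(2008, 2008, 2008, 2008, 2009, 2009, 2008, 2008, 2009, 2009, 2009),
  grid_code = c(4.730380, 5.552315, 5.968850, 5.128384, 6.927450, 7.135734,
                7.381087, 6.232927, 6.431800, 6.690246, 6.794144)
)

# One vector of grid_code values per SubPop/Origin combination.
groups <- split(df$grid_code, list(df$SubPop, df$Origin), drop = TRUE)

# Three break values per combination: the quartile boundaries.
breaks <- lapply(groups, quantile, probs = c(0.25, 0.5, 0.75), names = FALSE)

breaks
```

The result is a list with one element per combination (named like AL.2008), so no separate data frame per combination is needed.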
Whenever I want to lag in a data frame I realize that something that should be simple is not. While the problem has been asked & answered many times (see p.s.), I did not find a simple solution which I can remember until the next time I lag. In general, lagging does not seem to be a simple thing in R as the multiple workarounds testify. I run into this problem often and it would be very helpful to have some basic R solutions which do not need extra packages. Could you provide your simple solution for lagging?
If that is not possible, could you at least provide your workaround here so we can choose amongst second best alternatives? One collection already exists here
Also, in all blog posts on this subject I see people complain about how unexpectedly difficult lagging is so how can we get a simple lag function for data frames into R Core? This must be extremely disappointing for anyone coming from Stata or EViews. Or am I missing something and there is a simple built in solution?
say we want to lag "value" by 3 "year"s for each "country" here:
Data <- data.frame(year=c(rep(2010:2015,2)),country=c(rep("AT",6),rep("DE",6)),value=rnorm(12))
to create L3 like:
year country value L3
2010 AT 0.3407 NA
2011 AT -1.7981 NA
2012 AT -0.8390 NA
2013 AT -0.6888 0.3407
2014 AT -1.1019 -1.7981
2015 AT -0.8953 -0.8390
2010 DE 0.5877 NA
2011 DE -1.0204 NA
2012 DE -0.6576 NA
2013 DE 0.6620 0.5877
2014 DE 0.9579 -1.0204
2015 DE -0.7774 -0.6576
And we neither want to change the nature of our data (to ts or data.table) nor do we want to immerse ourselves in three new packages when the deadline is tonight and our supervisor uses Stata and thinks lagging is easy ;-) (it's not, I just want to be prepared...)
p.s.:
without groups
with data.table: Lag in dataframe or How to create a lag variable within each group?
time series are straightforward
If the question is how to provide a column with the prior third year's value not using packages then try this:
prior_year3 <- function(x, k = 3) head(c(rep(NA, k), x), length(x))
transform(Data, prior_year_value = ave(value, country, FUN = prior_year3))
giving:
year country value prior_year_value
1 2010 AT -1.66562121 NA
2 2011 AT -0.04950063 NA
3 2012 AT 1.55930293 NA
4 2013 AT -0.40462394 -1.66562121
5 2014 AT 0.78602610 -0.04950063
6 2015 AT 0.73912916 1.55930293
7 2010 DE 1.03710539 NA
8 2011 DE -1.13370942 NA
9 2012 DE -1.20530981 NA
10 2013 DE 1.66870572 1.03710539
11 2014 DE 1.53615793 -1.13370942
12 2015 DE -0.09693335 -1.20530981
That said, to use R effectively you do need to learn how to use the key packages.
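Note that the ave approach shifts by position, so it relies on the rows being sorted by year within each country with no missing years. A hedged alternative sketch, still in base R, looks up the value from exactly 3 years earlier by matching on country and year. The values below are copied from the example table in the question, and the column name L3 follows the question:

```r
Data <- data.frame(
  year = rep(2010:2015, 2),
  country = rep(c("AT", "DE"), each = 6),
  value = c(0.3407, -1.7981, -0.8390, -0.6888, -1.1019, -0.8953,
            0.5877, -1.0204, -0.6576, 0.6620, 0.9579, -0.7774)
)

# Key for each row, and the key of the row 3 years earlier in the same country.
key_now  <- paste(Data$country, Data$year)
key_prev <- paste(Data$country, Data$year - 3)

# match() returns NA where no such earlier row exists, which gives NA in L3.
Data$L3 <- Data$value[match(key_prev, key_now)]

Data
```

Because the lookup is by key rather than by position, row order does not matter, and a gap in the years yields NA instead of a value from the wrong year.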
Try slide from the DataCombine package; it's simple:
slide(Data,Var='value',GroupVar = 'country',slideBy=-3)
How to apply simple statistics to data and plot them elegantly by year using the R base plotting system and default functions?
The dataset is quite large, so avoiding the creation of new variables would be preferable.
I hope it is not a silly question, but I am wondering about this problem without finding a specific solution not involving additional packages such as ggplot2, dplyr, lubridate, such as the ones I found on the site:
ggplot2: Group histogram data by year
R group by year
Split data by year
I am using the R default systems for didactic purposes. I think it could be important training before turning to the more "comfortable" R-specific packages.
Consider a simple dataset:
> prod_dat
lab year production(kg)
1 2010 0.3219
1 2011 0.3222
1 2012 0.3305
2 2010 0.3400
2 2011 0.3310
2 2012 0.3310
3 2010 0.3400
3 2011 0.3403
3 2012 0.3410
I would like to plot a histogram of, let's say, the total production of material during specific years.
> hist(sum(prod_dat$production[prod_dat$year == c(2010, 2012)]))
Unfortunately, this is my best attempt, and it throws an error:
in prod_dat$year == c(2010, 2012):
longer object length is not a multiple of shorter object length
I am really stuck, so any suggestion would be useful.
Without ggplot I used to do it like this, though I think there are smarter ways:
all <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "lab year production
1 2010 1
1 2011 0.3222
1 2012 0.3305
2 2010 0.3400
2 2011 0.3310
2 2012 0.3310
3 2010 0.3400
3 2011 0.3403
3 2012 0.3410")
ar <- data.frame(year = unique(all$year), prod = tapply(all$production, list(all$year), FUN = sum))
barplot(ar$prod)
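On the message in the question: year == c(2010, 2012) recycles the two-element vector along the nine-element column and compares element by element, which is why R complains about object lengths. Membership tests need %in%. A minimal sketch restricted to the chosen years, using the values from the question's table (the column is named production here, since production(kg) is not a syntactic name):

```r
prod_dat <- read.table(header = TRUE, text = "lab year production
1 2010 0.3219
1 2011 0.3222
1 2012 0.3305
2 2010 0.3400
2 2011 0.3310
2 2012 0.3310
3 2010 0.3400
3 2011 0.3403
3 2012 0.3410")

# TRUE for rows whose year is one of the requested years.
keep <- prod_dat$year %in% c(2010, 2012)

# Total production per selected year, without adding columns to prod_dat.
totals <- tapply(prod_dat$production[keep], prod_dat$year[keep], sum)

barplot(totals, xlab = "Year", ylab = "Total production (kg)")
```

A bar plot is used rather than hist because the aim is one total per year; hist would instead bin the individual values.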