I have managed to aggregate some data into the following:
Month Year Number
1 1 2011 3885
2 2 2011 3713
3 3 2011 6189
4 4 2011 3812
5 5 2011 916
6 6 2011 3813
7 7 2011 1324
8 8 2011 1905
9 9 2011 5078
10 10 2011 1587
11 11 2011 3739
12 12 2011 3560
13 1 2012 1790
14 2 2012 1489
15 3 2012 1907
16 4 2012 1615
I am trying to create a barplot where the bars for the months are next to each other, so for the above example January through April will have two bars (one for 2011 and one for 2012) and the remaining months will only have one bar representing 2011.
I know I have to use beside=TRUE, but I guess I need to reshape the data somehow for barplot to display it properly, and I can't figure out what that step is. I have a feeling it involves matrix, but I am completely stumped by what seems like it should have a very simple solution.
Also, I have this vector: y=c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec') which I would like to use as my names.arg. When I try to use it with the above data I get the error "undefined columns selected", which I take to mean that I need 16 values in y. How can I fix this?
To use barplot you need to rearrange your data:
dat <- read.table(text = " Month Year Number
1 1 2011 3885
2 2 2011 3713
3 3 2011 6189
4 4 2011 3812
5 5 2011 916
6 6 2011 3813
7 7 2011 1324
8 8 2011 1905
9 9 2011 5078
10 10 2011 1587
11 11 2011 3739
12 12 2011 3560
13 1 2012 1790
14 2 2012 1489
15 3 2012 1907
16 4 2012 1615",sep = "",header = TRUE)
y <- c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')
barplot(rbind(dat$Number[1:12],c(dat$Number[13:16],rep(NA,8))),
beside = TRUE,names.arg = y)
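The rearranging step can also be done without hard-coded row indices. The following is a minimal sketch, assuming the same three columns as dat above: tapply() builds a Year-by-Month matrix and leaves NA wherever a month/year combination has no data. The object name mat and the use of the built-in month.abb vector for the labels are choices made for this illustration, not part of the original answer.

```r
# Same data as in the question, typed in directly so the sketch is self-contained
dat <- data.frame(
  Month  = c(1:12, 1:4),
  Year   = rep(c(2011, 2012), times = c(12, 4)),
  Number = c(3885, 3713, 6189, 3812, 916, 3813, 1324, 1905,
             5078, 1587, 3739, 3560, 1790, 1489, 1907, 1615)
)

# One row per year, one column per month; missing combinations become NA
mat <- tapply(dat$Number, list(dat$Year, dat$Month), sum)

# beside = TRUE draws the rows (years) as adjacent bars within each month
barplot(mat, beside = TRUE, names.arg = month.abb,
        legend.text = rownames(mat))
```

Because names.arg labels the 12 month groups rather than the 16 rows, a vector of 12 labels is the right length here.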
Or you can use ggplot2 with the data pretty much as is:
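library(ggplot2)
dat$Year <- factor(dat$Year)
dat$Month <- factor(dat$Month)
# stat = "identity" plots the Number values themselves instead of counting rows
ggplot(dat,aes(x = Month,y = Number,fill = Year)) +
geom_bar(stat = "identity",position = "dodge") +
scale_x_discrete(labels = y)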
I have a data.frame with three columns: Year, Nominal_Revenue and COEFFICIENT. I want to forecast with this data, as in the example below:
library(dplyr)
TEST<-data.frame(
Year= c(2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021),
Nominal_Revenue=c(8634,5798,6011,6002,6166,6478,6731,7114,6956,6968,7098,7610,7642,8203,9856,10328,11364,12211,13150,NA,NA,NA),
COEFFICIENT=c(NA,1.016,1.026,1.042,1.049,1.106,1.092,1.123,1.121,0.999,1.059,1.066,1.006,1.081,1.055,1.063,1.071,1.04,1.072,1.062,1.07, 1.075))
SIMULATION<-mutate(TEST,
FORECAST=lag(Nominal_Revenue)*COEFFICIENT
)
The result of this code is shown in the picture below. In other words, this code calculates a forecast for only one year, namely 2019.
My intention is to get forecasts for every NA in the Nominal_Revenue column, as in the picture below.
Can anybody help me fix this code?
Because each forecast needs the previously computed value, we can loop once per NA in your variable and apply a dplyr mutate on each pass:
for (i in seq_len(sum(is.na(TEST$Nominal_Revenue)))){
# each pass fills the NA whose previous row is already known
TEST=TEST%>%mutate(Nominal_Revenue=if_else(is.na(Nominal_Revenue),COEFFICIENT*lag(Nominal_Revenue),Nominal_Revenue))
}
> TEST
Year Nominal_Revenue COEFFICIENT
1 2000 8634.00 NA
2 2001 5798.00 1.016
3 2002 6011.00 1.026
4 2003 6002.00 1.042
5 2004 6166.00 1.049
6 2005 6478.00 1.106
7 2006 6731.00 1.092
8 2007 7114.00 1.123
9 2008 6956.00 1.121
10 2009 6968.00 0.999
11 2010 7098.00 1.059
12 2011 7610.00 1.066
13 2012 7642.00 1.006
14 2013 8203.00 1.081
15 2014 9856.00 1.055
16 2015 10328.00 1.063
17 2016 11364.00 1.071
18 2017 12211.00 1.040
19 2018 13150.00 1.072
20 2019 13965.30 1.062
21 2020 14942.87 1.070
22 2021 16063.59 1.075
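Since every missing value is just the last observed value multiplied by the running product of the coefficients, the same numbers can be obtained without a loop. This is a minimal sketch in base R, assuming the NAs form a single trailing block as they do in this data; the names idx and last_obs are introduced here for illustration only.

```r
TEST <- data.frame(
  Year = 2000:2021,
  Nominal_Revenue = c(8634, 5798, 6011, 6002, 6166, 6478, 6731, 7114, 6956,
                      6968, 7098, 7610, 7642, 8203, 9856, 10328, 11364,
                      12211, 13150, NA, NA, NA),
  COEFFICIENT = c(NA, 1.016, 1.026, 1.042, 1.049, 1.106, 1.092, 1.123, 1.121,
                  0.999, 1.059, 1.066, 1.006, 1.081, 1.055, 1.063, 1.071,
                  1.04, 1.072, 1.062, 1.07, 1.075)
)

idx      <- which(is.na(TEST$Nominal_Revenue))   # rows to forecast (assumed trailing)
last_obs <- min(idx) - 1                         # last row with an observed value

# Chain the coefficients: value_t = last observed value * product of coefficients up to t
TEST$Nominal_Revenue[idx] <-
  TEST$Nominal_Revenue[last_obs] * cumprod(TEST$COEFFICIENT[idx])

tail(TEST, 3)
```

If the missing values were scattered through the series rather than all at the end, this shortcut would not apply and the loop above would be the safer choice.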
I am running a GAMM using package mgcv. The model is running fine and gives an output that makes sense, but when I use vis.gam(plot.type="persp") my graph appears like this:
[Figure: persp plot, "heat" color]
Why is this happening? When I use vis.gam(plot.type="contour") there is no area which is transparent.
It does not appear to be simply a problem with the heat color palette; the same thing happens when I change the color scheme of the "persp" plot:
[Figure: persp plot, "topo" color]
The contour plot is completely filled while the persp plot is still transparent at the top.
Data:
logcpue assnage distkm fsamplingyr
1 -1.5218399 7 3.490 2015
2 -1.6863990 4 3.490 2012
3 -1.4534337 6 3.490 2014
4 -1.5207723 5 3.490 2013
5 -2.4061258 2 3.490 2010
6 -2.5427262 3 3.490 2011
7 -1.6177367 3 3.313 1998
8 -4.4067192 10 3.313 2005
9 -4.3438054 11 3.313 2006
10 -2.8834031 7 3.313 2002
11 -2.3182512 2 3.313 1997
12 -4.1108738 1 3.235 2010
13 -2.0149030 3 3.235 2012
14 -1.4900912 6 3.235 2015
15 -3.7954892 2 3.235 2011
16 -1.6499840 4 3.235 2013
17 -1.9924302 5 3.235 2014
18 -1.2122716 4 3.189 1998
19 -0.6675703 3 3.189 1997
20 -4.7957905 7 3.106 1998
21 -3.8763958 6 3.106 1997
22 -1.2205021 4 3.073 2010
23 -1.9262374 7 3.073 2013
24 -3.3463891 9 3.073 2015
25 -1.7805862 2 3.073 2008
26 -3.2451931 8 3.073 2014
27 -1.4441139 5 3.073 2011
28 -1.4395389 6 3.073 2012
29 -1.6357552 4 2.876 2014
30 -1.3449091 5 2.876 2015
31 -2.3782225 3 2.876 2013
32 -4.4886364 1 2.876 2011
33 -2.6026897 2 2.876 2012
34 -3.5765503 1 2.147 2002
35 -4.8040211 9 2.147 2010
36 -1.3993664 5 2.147 2006
37 -1.2712250 4 2.147 2005
38 -1.8495790 7 2.147 2008
39 -2.5073795 1 2.034 2012
40 -2.0654553 4 2.034 2015
41 -3.6309855 2 2.034 2013
42 -2.2643639 3 2.034 2014
43 -2.2643639 6 1.452 2006
44 -3.3900241 8 1.452 2008
45 -4.9628446 2 1.452 2002
46 -2.0088240 5 1.452 2005
47 -3.9186675 1 1.323 2013
48 -4.3438054 2 1.323 2014
49 -3.5695327 3 1.323 2015
50 -1.6986690 7 1.200 2005
51 -3.2451931 8 1.200 2006
52 -0.9024016 4 1.200 2002
library(mgcv)
f1 <- formula(logcpue ~ s(assnage)+distkm)
m1 <- gamm(f1,random = list(fsamplingyr =~ 1),
method = "REML",
data =ycsnew)
vis.gam(m1$gam,color="topo",plot.type = "persp",theta=180)
vis.gam(m1$gam,color="heat",plot.type = "persp",theta=180)
vis.gam(m1$gam,view=c("assnage","distkm"),
plot.type="contour",color="heat",las=1)
vis.gam(m1$gam,view=c("assnage","distkm"),
plot.type="contour",color="terrain",las=1,contour.col="black")
The code of vis.gam has this:
surf.col[surf.col > max.z * 2] <- NA
I am unable to understand what it is doing, and it appears to be rather ad hoc. NA color values are generally drawn as transparent. If you make a copy of the function (called vis.gam2 here), comment out that line in the copy, and assign the environment of the new function as:
environment(vis.gam2) <- environment(vis.gam)
... you get complete coloring of the surface.
I want to run a regression on my panel data.
The panel data is in the following format:
Column 1 has the year, column 2 has the company name and column 3 has the EQUITY variable.
Year company name EQUITY
2006 A 12
2007 A 13
2008 A 23
2009 A 24
2010 A 13
2011 A 14
2012 A 12
2013 A 14
2014 A 14
2015 A 15
2006 B 221
2007 B 242
2008 B 262
2009 B 250
2010 B 400
2011 B 411
2012 B 420
2013 B 420
2014 B 422
2015 B 450
I have 10 years of data for 200 companies. I want to regress the log of equity of each company on time (the 10 years). I only want the slope coefficient.
I want my output like this:
Column 1-years column 2-company name column 3- beta values
Year company name slope(beta) p-value
2006 A beta value (assumed)
2007 A "
2008 A "
2009 A
2010 A
2011 A
2012 A "
2013 A
2014 A "
2015 A "
I mean the slope coefficient of each company.
Can't see what you've tried so far so here's a solution to get you up and running. The final output you sketch out doesn't really make sense since you have a slope for each company - not for each company for each year.
Here's a base R version for running the regressions. by is used to split the data and then lm for the estimation.
res <- by(indata, indata$company, FUN=function(x) { coef(lm(log(EQUITY) ~ Year+0, data=x))} )
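Note that Year+0 drops the intercept, so each regression is fitted through the origin and returns a single coefficient per company. This results in the following output of the slopes, which can be used for plotting or listing: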
> res
indata$company: A
[1] 0.001344837
-------------------------------------------------------
indata$company: B
[1] 0.002896053
Update
If you want to add the slopes to the dataset for each year, you can add
indata$slope <- res[indata$company]
which gives
> indata
Year company EQUITY slope
1 2006 A 12 0.001344837
2 2007 A 13 0.001344837
3 2008 A 23 0.001344837
4 2009 A 24 0.001344837
5 2010 A 13 0.001344837
6 2011 A 14 0.001344837
7 2012 A 12 0.001344837
8 2013 A 14 0.001344837
9 2014 A 14 0.001344837
10 2015 A 15 0.001344837
11 2006 B 221 0.002896053
12 2007 B 242 0.002896053
13 2008 B 262 0.002896053
14 2009 B 250 0.002896053
15 2010 B 400 0.002896053
16 2011 B 411 0.002896053
17 2012 B 420 0.002896053
18 2013 B 420 0.002896053
19 2014 B 422 0.002896053
20 2015 B 450 0.002896053
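The question also asks for a p-value, which the by() call above does not return. The following is a minimal sketch of one way to collect both, assuming a regression with an intercept (log(EQUITY) ~ Year) is acceptable; that differs from the through-origin fit above, so the slopes will not match the numbers shown. The names fits, slopes and p_value are introduced here for illustration.

```r
# The two example companies from the question
indata <- data.frame(
  Year    = rep(2006:2015, times = 2),
  company = rep(c("A", "B"), each = 10),
  EQUITY  = c(12, 13, 23, 24, 13, 14, 12, 14, 14, 15,
              221, 242, 262, 250, 400, 411, 420, 420, 422, 450)
)

# Fit one model per company and keep the Year row of the coefficient table
fits <- lapply(split(indata, indata$company), function(d) {
  cf <- summary(lm(log(EQUITY) ~ Year, data = d))$coefficients
  data.frame(company = d$company[1],
             slope   = cf["Year", "Estimate"],
             p_value = cf["Year", "Pr(>|t|)"])
})
slopes <- do.call(rbind, fits)

# Optional: repeat each company's slope and p-value on every yearly row
indata <- merge(indata, slopes, by = "company")
```

With 200 companies the same code applies unchanged, since split() creates one piece per company.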
I've got a data frame with panel data: subjects' characteristics over time. I need to create a column with a sequence from 1 to the number of years each subject is present. For example, if subject 1 is in the data frame from 2000 to 2005, I need the following sequence: 1,2,3,4,5,6.
Below is a small fraction of my data. The last column (exp) is what I am trying to get. Additionally, if you look at the first subject (13) you'll see that in 2008 the value of qtty is zero. In this case I just need an NA or a code (0, 1, -9999); it doesn't matter which one.
Below the data is what I did to try to get that vector, but it didn't work.
Any help will be much appreciated.
subject season qtty exp
13 2000 29 1
13 2001 29 2
13 2002 29 3
13 2003 29 4
13 2004 29 5
13 2005 27 6
13 2006 27 7
13 2007 27 8
13 2008 0 NA
28 2000 18 1
28 2001 18 2
28 2002 18 3
28 2003 18 4
28 2004 18 5
28 2005 18 6
28 2006 18 7
28 2007 18 8
28 2008 18 9
28 2009 20 10
28 2010 20 11
28 2011 20 12
28 2012 20 13
35 2000 21 1
35 2001 21 2
35 2002 21 3
35 2003 21 4
35 2004 21 5
35 2005 21 6
35 2006 21 7
35 2007 21 8
35 2008 21 9
35 2009 14 10
35 2010 11 11
35 2011 11 12
35 2012 10 13
My code:
numbY<-aggregate(season ~ subject, data = toCountY,length)
colnames(numbY)<-c("subject","inFish")
toCountY$inFish<-numbY$inFish[match(toCountY$subject,numbY$subject)]
numbYbyFisher<-unique(numbY)
seqY<-aggregate(numbYbyFisher$inFish, by=list(numbYbyFisher$subject), function(x)seq(1,x,1))
I am using ddply (from the plyr package) and I distinguish 2 cases:
Either generate a sequence along subject and replace it with NA where qtty is zero:
library(plyr)
ddply(dat,.(subject),transform,new.exp=ifelse(qtty==0,NA,seq_along(subject)))
Or generate a sequence along the rows where qtty is not zero, with a gap (NA) where qtty is zero:
ddply(dat,.(subject),transform,new.exp={
hh <- seq_along(which(qtty !=0))
if(length(which(qtty ==0))>0)
hh <- append(hh,NA,which(qtty==0)-1)
hh
})
EDITED
ind=qtty!=0
exp=numeric(length(subject))
temp=0
for(i in 1:length(unique(subject[ind]))){
temp[i]=list(seq(from=1,to=table(subject[ind])[i]))
}
exp[ind]=unlist(temp)
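This will provide what you need, assuming subject and qtty are available as vectors (for example via attach() or with()). Rows where qtty is zero are left as 0.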
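For comparison, the first case (a running count per subject, with NA where qtty is zero) can also be written in base R with ave(). This is a minimal sketch using the first two subjects from the question; the column name new.exp follows the ddply answer and is otherwise arbitrary.

```r
# First two subjects from the question's data
dat <- data.frame(
  subject = rep(c(13, 28), times = c(9, 13)),
  season  = c(2000:2008, 2000:2012),
  qtty    = c(29, 29, 29, 29, 29, 27, 27, 27, 0,
              rep(18, 9), rep(20, 4))
)

# ave() applies seq_along within each subject and returns a vector
# aligned with the original rows
dat$new.exp <- ave(dat$season, dat$subject, FUN = seq_along)

# Blank out the years where nothing was recorded
dat$new.exp[dat$qtty == 0] <- NA
```

This keeps counting across a zero year rather than skipping it, which matches the first ddply variant and not the second.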
My dataframe, df:
df
EffYr EffMo count dts
2 2012 1 1 2012-01-01
3 2012 2 3 2012-02-01
4 2012 3 1 2012-03-01
5 2012 5 1 2012-05-01
6 2012 6 1 2012-06-01
7 2012 7 2 2012-07-01
8 2012 8 11 2012-08-01
9 2012 9 84 2012-09-01
10 2012 10 184 2012-10-01
11 2012 11 165 2012-11-01
12 2012 12 246 2012-12-01
13 2013 1 414 2013-01-01
14 2013 2 130 2013-02-01
15 2013 3 182 2013-03-01
16 2013 4 261 2013-04-01
17 2013 5 229 2013-05-01
18 2013 6 249 2013-06-01
19 2013 7 330 2013-07-01
20 2013 8 135 2013-08-01
Each row of df represents a "month-year", the earliest being Jan 2012 and the latest being Aug 2013. I want to plot a bar graph (using ggplot2) where each bar represents a row of df with the bar height equal to the row's count. So, I should have 24 bars in total.
I want my x axis to be divided into 12 intervals: Jan-Dec, and bars that represent the same calendar month should lie in the same "month interval". For example, if df has a row for Jan 2011, Jan 2012, Jan 2013, then the Jan portion of my graph should have 3 bars so that I can compare my business's performance in the month of January for subsequent years.
Thanks
Edit: I want something that looks like
ggplot(diamonds, aes(cut, fill=cut)) + geom_bar() +
facet_grid(. ~ clarity)
But broken down by month. I tried to modify that code to fit my data, but never could get it right.
@Ben you're asking a number of ggplot2 questions. I would recommend you sit down with some good ggplot2 resources and try the examples to become more skilled. Here are 2 excellent resources I use often:
http://docs.ggplot2.org/current/
http://www.cookbook-r.com/Graphs/
Now the solution I think you're after:
## dat <- read.table(text=" EffYr EffMo count dts
## 2 2012 1 1 2012-01-01
## 3 2012 2 3 2012-02-01
## 4 2012 3 1 2012-03-01
## 5 2012 5 1 2012-05-01
## 6 2012 6 1 2012-06-01
## 7 2012 7 2 2012-07-01
## 8 2012 8 11 2012-08-01
## 9 2012 9 84 2012-09-01
## 10 2012 10 184 2012-10-01
## 11 2012 11 165 2012-11-01
## 12 2012 12 246 2012-12-01
## 13 2013 1 414 2013-01-01
## 14 2013 2 130 2013-02-01
## 15 2013 3 182 2013-03-01
## 16 2013 4 261 2013-04-01
## 17 2013 5 229 2013-05-01
## 18 2013 6 249 2013-06-01
## 19 2013 7 330 2013-07-01
## 20 2013 8 135 2013-08-01", header=TRUE)
library(ggplot2)
# month.name is built in; fixing the levels keeps the months in calendar order
dat$month <- factor(month.name[dat$EffMo], levels = month.name)
dat$year <- as.factor(dat$EffYr)
ggplot(dat, aes(month, fill=year)) + geom_bar(aes(weight=count), position="dodge")
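Since the question's edit mentions facet_grid, here is a faceted variant for comparison. It is a minimal sketch that assumes one panel per calendar month is wanted, with one bar per year inside each panel; geom_col() is used because the bar heights are already in the count column. The object name p is introduced here for illustration.

```r
library(ggplot2)

# Same data as in the question, typed in directly
dat <- data.frame(
  EffYr = rep(c(2012, 2013), times = c(11, 8)),
  EffMo = c(1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 1:8),
  count = c(1, 3, 1, 1, 1, 2, 11, 84, 184, 165, 246,
            414, 130, 182, 261, 229, 249, 330, 135)
)
dat$month <- factor(month.name[dat$EffMo], levels = month.name)
dat$year  <- as.factor(dat$EffYr)

# One panel per month, one bar per year within each panel
p <- ggplot(dat, aes(year, count, fill = year)) +
  geom_col() +
  facet_grid(. ~ month)
print(p)
```

With twelve panels side by side the year labels on the x axis can get crowded, so the dodged version above may be easier to read on a narrow plot.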