I have aggregated a table from my data file using this syntax:
sumtab <- as.data.frame(table(S$MONTH))
colnames(sumtab) <- c("Month", "Frq")
rownames(sumtab) <- c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug",
"Sep","Oct","Dec")
This results in the following table, sumtab:
Month Frq
Jan 1 3
Feb 2 5
Mar 3 16
Apr 4 45
May 5 11
Jun 6 16
Jul 7 99
Aug 8 101
Sep 9 45
Oct 10 456
Dec 12 112
And this script produces a ggplot:
library(ggplot2)
ggplot(sumtab, aes(x=Month,y=Frq),width=1.5) +
scale_y_continuous(limits=c(0,17),expand=c(0, 0)) +
geom_bar(stat='identity',fill="lightgreen",colour="black") +
xlab("Month") + ylab("No of bears killed") +
theme_bw(base_size = 11) +
theme(axis.text.x=element_text(angle=0,size=9))
The problem is that there are no values for November in my data, and I need to somehow enter a zero for November in the table. This is probably simple for most of you; I have searched other questions, googled, and read the books, but have been unable to find the correct syntax. I need a little help.
I tried adding rbind to the script:
sumtab <- as.data.frame(table(S$MONTH))
sumtab <- rbind(sumtab, c(11, 0))
This produced the following warning:
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 11) :
invalid factor level, NA generated
and this table:
Var1 Freq
1 1 3
2 2 5
3 3 6
4 4 14
5 5 7
6 6 2
7 7 13
8 8 12
9 9 3
10 10 1
11 12 4
12 <NA> 0
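So thanks @PaulH for your help, but I've probably applied it the wrong way.
You could use rbind to add the November row. Because table() returns the month as a factor, convert it to a number first; otherwise rbind() cannot add the new value 11 and inserts NA, which is the warning you saw:
sumtab$Month <- as.integer(as.character(sumtab$Month))
sumtab <- rbind(sumtab, Nov = c(11, 0))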
Good luck!
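An alternative that avoids rbind altogether, offered as a sketch: if MONTH is turned into a factor with all twelve levels before tabulating, table() reports a zero count for November on its own. The data frame S below is made-up sample data standing in for the asker's file.

```r
# Made-up sample data standing in for the asker's S; note there is no November (11)
S <- data.frame(MONTH = c(1, 1, 2, 3, 7, 10, 10, 12))

# Declaring all 12 levels makes table() count empty months as 0
sumtab <- as.data.frame(table(factor(S$MONTH, levels = 1:12)))
colnames(sumtab) <- c("Month", "Frq")
rownames(sumtab) <- month.abb  # Jan ... Dec, now including Nov

print(sumtab)
```

Because the Month column keeps all twelve levels, the ggplot call from the question will also draw an (empty) November bar without further changes.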
This is what my data.table looks like:
library(data.table)
dt <- fread('
Year Total Shares Balance
2017 10 1 10
2016 12 2 9
2015 10 2 7
2014 10 3 6
2013 10 NA 3
')
**Balance** is my desired column. I am trying to compute cumulative subtractions: take the first value of Total, which is 10 (it should also be the first value of Balance), and then cumulatively subtract the values in Shares. So the second value is 10 - 1 = 9, the third value is 9 - 2 = 7, and so on. There is one condition: if the Year is 2014, the Shares value is divided by 2 before it is subtracted, so the fourth value is 7 - (2/2) = 6 and the fifth value is 6 - 3 = 3. The calculation should stop at the last row.
My attempt is:
dt[, Balance:= ifelse( Year == 2014, cumsum(Total[1]-Shares/2), cumsum(Total[1] - Shares))]
Here is one method.
dt[, Balance2 := Total[1] - cumsum(shift(Shares * (1 - (0.5 *(Year == 2015))), fill=0))]
shift is used to create a lag variable, and the first element is filled with 0 using fill=0. The other elements are calculated as Shares * (1 - (0.5 *(Year == 2015))), which returns Shares except when Year == 2015, in which case Shares * 0.5 is returned. Because of the lag, that halved 2015 value is the one subtracted on the 2014 row.
which returns
dt
Year Total Shares Balance Balance2
1: 2017 10 1 10 10
2: 2016 12 2 9 9
3: 2015 10 2 7 7
4: 2014 10 3 6 6
5: 2013 10 NA 3 3
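For readers not using data.table, the same lag-and-cumsum idea can be sketched in base R. This assumes the rows are ordered by descending Year, as in the example, and reuses the rule of halving the 2015 Shares value before lagging; the column name Balance2 simply mirrors the answer above.

```r
# Same sample data as in the question (Balance omitted)
dt <- data.frame(Year   = 2017:2013,
                 Total  = c(10, 12, 10, 10, 10),
                 Shares = c(1, 2, 2, 3, NA))

# Halve Shares on the 2015 row; after the one-row lag this halved value
# is what gets subtracted on the 2014 row
step <- ifelse(dt$Year == 2015, dt$Shares / 2, dt$Shares)

# Lag by one row (first element 0), then subtract the running total from Total[1]
dt$Balance2 <- dt$Total[1] - cumsum(c(0, head(step, -1)))
print(dt)
```

head(step, -1) drops the last element, so the trailing NA in Shares never enters the running total.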
FWIW, I wanted to provide a functional alternative that allows more flexible calculations of the cumulative differences, indexing, etc. I also read the data in with read.table instead of fread.
dt <- read.table(header=TRUE, text='
Year Total Shares Balance
2017 10 1 10
2016 12 2 9
2015 10 2 7
2014 10 3 6
2013 10 NA 3
')
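makeNewBalance <- function(dt) {
  output <- numeric(nrow(dt))  # preallocate the result
  for (i in seq_len(nrow(dt))) {
    if (i == 1) {
      # the first balance is the first Total
      output[i] <- dt$Total[i]
    } else {
      # subtract the previous row's Shares, halved when the current row is 2014
      output[i] <- output[i-1] - as.integer(ifelse(dt$Year[i] == 2014,
                                                   dt$Shares[i-1]/2,
                                                   dt$Shares[i-1]))
    }
  }
  return(output)
}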
dt$NewBalance <- makeNewBalance(dt)
which also returns
> dt
Year Total Shares Balance NewBalance
1 2017 10 1 10 10
2 2016 12 2 9 9
3 2015 10 2 7 7
4 2014 10 3 6 6
5 2013 10 NA 3 3
I'm learning R after coming from Stata, but have run into the following two problems for which I cannot find elegant solutions in R:
1) I have a panel dataset with gaps in my time variable. I would like to expand my time variable to include the gaps despite having no observed data for these rows.
In Stata I would usually go about this by setting my ID and time variables with xtset and then expanding the dataset based on this with tsfill. Is there an equivalently elegant way in R?
2) I would like to fill some of the new, blank cells with data for constant variables.
In Stata I would do this by copying data from previous (relative to my time variable) observations using the l.-prefix; for example using replace Con = l.Con.
In other words I'm asking how to go from something like this:
ID Time Num Con
1 Jan 10 A
1 Feb 15 A
1 May 20 A
2 Feb 12 B
2 Mar 14 B
2 Jun 15 B
To something like this:
ID Time Num Con
1 Jan 10 A
1 Feb 15 A
1 Mar A
1 Apr A
1 May 20 A
2 Feb 12 B
2 Mar 14 B
2 Apr B
2 May B
2 Jun 15 B
Hopefully that makes sense. Thanks in advance.
You can try merge from base R or a data.table join:
library(data.table)
DT2 <- setDT(df1)[, {tmp <- match(Time, month.abb)
list(Time=month.abb[min(tmp):max(tmp)])}, .(ID,Con)]
setkey(df1[, c(1,4,2,3), with=FALSE], ID, Con, Time)[DT2]
# ID Con Time Num
# 1: 1 A Jan 10
# 2: 1 A Feb 15
# 3: 1 A Mar NA
# 4: 1 A Apr NA
# 5: 1 A May 20
# 6: 2 B Feb 12
# 7: 2 B Mar 14
# 8: 2 B Apr NA
# 9: 2 B May NA
#10: 2 B Jun 15
NOTE: It may be better to keep the missing values as NA rather than leaving the cells blank.
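The answer mentions base R merge without showing it; a minimal sketch follows. It assumes Time holds three-letter English month abbreviations matching month.abb and that each ID has a single Con value. It builds the complete month range for each ID and left-joins the observed rows onto it, which plays the role of Stata's tsfill.

```r
# Example panel from the question
df1 <- data.frame(ID   = c(1, 1, 1, 2, 2, 2),
                  Time = c("Jan", "Feb", "May", "Feb", "Mar", "Jun"),
                  Num  = c(10, 15, 20, 12, 14, 15),
                  Con  = c("A", "A", "A", "B", "B", "B"),
                  stringsAsFactors = FALSE)

# For each ID, list every month between its first and last observation
full <- do.call(rbind, lapply(split(df1, df1$ID), function(d) {
  idx <- match(d$Time, month.abb)
  data.frame(ID = d$ID[1], Con = d$Con[1],
             Time = month.abb[min(idx):max(idx)],
             stringsAsFactors = FALSE)
}))

# Left join: months with no observation get Num = NA, while Con is carried by the grid
out <- merge(full, df1, by = c("ID", "Con", "Time"), all.x = TRUE)

# merge() sorts Time alphabetically, so restore calendar order
out <- out[order(out$ID, match(out$Time, month.abb)), ]
rownames(out) <- NULL
print(out)
```

Putting Con into the grid is what fills the constant variable; a time-varying column would need a separate carry-forward step.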
I issue the following commands:
library(forecast)
ops <- read.csv("ops.csv")
ops.ts <- ts(ops, frequency=12, start=c(2014,1))
ops.fc <- forecast(ops.ts)
forecast() then throws the following error:
Error in ...fourier(x, K, 1:length(x)) :
K must be not be greater than period/2
The data from the csv looks like this according to summary(ops):
1 10
2 3
3 7
4 4
5 2
6 20
7 13
8 9
9 8
10 7
11 6
12 11
13 7
R is up to date, and forecast is installed from CRAN.
I appreciate any advice, especially because I am quite new to R.
The error message is self-explanatory.
You have 13 elements in your dataset so when you do:
ops.ts <- ts(ops, frequency = 12, start=c(2014, 1))
You get (notice the 2015 value here):
#> ops.ts
# Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#2014 10 3 7 4 2 20 13 9 8 7 6 11
#2015 7
I'm guessing you only want to use the first 12 months and then call forecast()? If that is the case, you can do either:
ops.ts <- ts(ops, frequency = 12, start = 2014, end = c(2014, 12))
ops.fc <- forecast(ops.ts)
or
ops <- ops[1:12, ]
ops.ts <- ts(ops, frequency = 12, start = 2014)
ops.fc <- forecast(ops.ts)
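To see the length issue without the forecast package, here is a small base R sketch using the 13 values from the question. window() is used here as another way to keep only 2014; it is a swap-in, not part of the answer above.

```r
# The 13 values from the question
vals <- c(10, 3, 7, 4, 2, 20, 13, 9, 8, 7, 6, 11, 7)

ts13 <- ts(vals, frequency = 12, start = c(2014, 1))
print(end(ts13))  # the 13th value spills into January 2015

# Keep only the 12 months of 2014
ts12 <- window(ts13, end = c(2014, 12))
print(length(ts12))
```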
The text file looks like this:
#---*----1----*----2----*---
Name Time.Period Value
A Jan 2013 10
B Jan 2013 11
C Jan 2013 12
A Feb 2013 9
B Feb 2013 11
C Feb 2013 15
A Mar 2013 10
B Mar 2013 8
C Mar 2013 13
I tried to use read.table with readLines and count.fields, as shown below:
> path <- list.files()
> data <- read.table(text=readLines(path)[count.fields(path, blank.lines.skip=FALSE) == 4])
Warning message:
In readLines(path) : incomplete final line found on 'data1.txt'
> data
V1 V2 V3 V4
1 A Jan 2013 10
2 B Jan 2013 11
3 C Jan 2013 12
4 A Feb 2013 9
5 B Feb 2013 11
6 C Feb 2013 15
7 A Mar 2013 10
8 B Mar 2013 8
9 C Mar 2013 13
The problem is that this gives four columns instead of three, so I manipulate the data as shown below, but I am looking for an alternative.
> library(zoo)
> data$Name <- as.character(data$V1)
> data$Time.Period <- as.yearmon(paste(data$V2, data$V3, sep=" "))
> data$Value <- as.numeric(data$V4)
> DATA <- data[, 5:7]
> DATA
Name Time.Period Value
1 A Jan 2013 10
2 B Jan 2013 11
3 C Jan 2013 12
4 A Feb 2013 9
5 B Feb 2013 11
6 C Feb 2013 15
7 A Mar 2013 10
8 B Mar 2013 8
9 C Mar 2013 13
You can use read.fwf to read fixed-width files. You need to specify the width of each column correctly, in characters.
data <- read.fwf(path, widths=c(-12, 8, -4, 2), header=TRUE)
The key is how you specify the widths: a negative width skips that many characters, and a positive width reads that many. I am assuming entries in the last column have only 2 digits; change the widths accordingly if this is not the case. You will probably also have to fix the column names.
You will have to change the widths if the file format changes, or come up with a clever regexp to derive them from the first few rows. A better solution would be to enclose your strings in double quotes or, even better, to avoid this format altogether.
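A self-contained sketch of read.fwf on a small made-up file follows. The column widths (4, 8 and 3 characters) are assumptions chosen for this example, not the layout of the asker's file, and strip.white = TRUE (passed through to read.table) trims the padding.

```r
# Made-up fixed-width file: Name in columns 1-4, Time.Period in 5-12, Value in 13-15
txt <- c("A   Jan 2013 10",
         "B   Jan 2013 11",
         "C   Feb 2013  9")
tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)

# Positive widths read that many characters; the space inside "Jan 2013" is kept
dat <- read.fwf(tmp, widths = c(4, 8, 3),
                col.names = c("Name", "Time.Period", "Value"),
                strip.white = TRUE, stringsAsFactors = FALSE)
print(dat)
```

The Time.Period strings can then be converted with zoo::as.yearmon, as in the question.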
?count.fields
As the R documentation states, count.fields counts the number of fields, as separated by sep, in each line of the file. When you keep only the lines where count.fields(path, blank.lines.skip=FALSE) == 4, you skip the header row, which has only three fields.
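A quick way to confirm this is to run count.fields on a small made-up file with the same layout: the header line has one field fewer than the data rows, because the space inside "Jan 2013" splits it into two fields.

```r
# Made-up file with the same layout as the question: a 3-word header, 4-word data rows
txt <- c("Name Time.Period Value",
         "A Jan 2013 10",
         "B Jan 2013 11")
tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)

n_fields <- count.fields(tmp, blank.lines.skip = FALSE)
print(n_fields)  # 3 4 4: keeping only the lines with 4 fields drops the header
```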
I want to sum each month across all the years in a time series that looks like this:
Jan Feb Mar Apr Jun Jul Aug Sep Oct Nov Dec
2006 4 4 3 4 4 5 5 3 3
2007 3 3 2 2 4 3 3 2 2 5 5
2008 3 3 3 2 2 4 4 3
by using
window(the time series object,start=c(2006,3),end=c(2008,3),frequency=1)
This line gives a new ts object with just March of 2006-2007. However, it does not work when a month has no values. Is there a way to replace the gaps with NA? I have seen similar questions before, but I don't think they apply to a ts object.
Assuming that
the_time_series_object <- ts(1:31, frequency = 12, start = c(2006, 3))
then:
window(the_time_series_object, start = c(2006, 3), end = c(2008, 3), frequency = 12)
Your frequency should be 12 instead of 1. There is no NA problem; the frequency is the only argument you have wrong.
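For the original goal of summing each month across all years, one approach (a sketch, not taken from the answer above) is to group the values by cycle(), which gives the position of each observation within the year, and pass na.rm = TRUE so that gaps stored as NA are ignored. The series below is made-up test data.

```r
# Made-up monthly series for 2006-2007 with one gap stored as NA (April 2006)
x <- ts(1:24, frequency = 12, start = c(2006, 1))
x[4] <- NA

# cycle() gives each observation's month (1-12); sum within each month, skipping NAs
monthly_totals <- setNames(as.vector(tapply(x, cycle(x), sum, na.rm = TRUE)),
                           month.abb)
print(monthly_totals)
```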