I had a time series in long format and converted it to wide format for forecasting purposes. Currently the time series has the following structure:
Day    Value   Strength1   Strength2   Strength3
1/2    1.356   3           NA          NA
2/2    1.385   NA          NA          NA
3/2    1.385   NA          1.01        NA
4/2    1.4     NA          NA          10
5/2    1.6     NA          NA          NA
6/2    1.7     4           NA          NA
7/2    1.8     NA          1.05        NA
8/2    1.88    NA          NA          NA
9/2    1.98    NA          NA          11
10/2   1.8     NA          NA          NA
I want a function that, given a time series, loops through each column and handles every cell as follows:
if the cell is NA and only NAs have been found so far in that column, keep NA
if the cell is not NA, leave it as it is
if the cell is NA but a non-NA value was found earlier in the column, replace it with the most recent non-NA value
This would be the result:
Day    Value   Strength1   Strength2   Strength3
1/2    1.356   3           NA          NA
2/2    1.385   3           NA          NA
3/2    1.385   3           1.01        NA
4/2    1.4     3           1.01        10
5/2    1.6     3           1.01        10
6/2    1.7     4           1.01        10
7/2    1.8     4           1.05        10
8/2    1.88    4           1.05        10
9/2    1.98    4           1.05        11
10/2   1.8     4           1.05        11
I tried this function, but it does not work:
filler <- function(df) {
  col <- colnames(df)
  one <- NA
  for (i in col) {
    for (a in i) {
      if(!is.na(a)) {
        one = a
      }
      if(!is.na(one) & is.na(a)) {
        a = one
      }
    }
  }
}
You may use tidyr::fill, which by default fills each missing value downward with the last non-missing value above it:
filler <- function(data) tidyr::fill(data, dplyr::everything())
filler(df)
# Day Value Strength1 Strength2 Strength3
#1 1/2 1.356 3 NA NA
#2 2/2 1.385 3 NA NA
#3 3/2 1.385 3 1.01 NA
#4 4/2 1.400 3 1.01 10
#5 5/2 1.600 3 1.01 10
#6 6/2 1.700 4 1.01 10
#7 7/2 1.800 4 1.05 10
#8 8/2 1.880 4 1.05 10
#9 9/2 1.980 4 1.05 11
#10 10/2 1.800 4 1.05 11
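If you would rather avoid the extra dependency, the same forward fill can be sketched in base R. This is only an illustration: fill_down and the small data frame ts_df are names made up here, not part of tidyr or of the question.

```r
# Forward-fill a single vector: leading NAs stay NA, later NAs take the
# most recent non-NA value (hypothetical helper, base R only).
fill_down <- function(x) {
  seen <- cumsum(!is.na(x))         # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[seen + 1]     # position 1 is the NA used before any value is seen
}

# Small made-up example with the same shape as the question's data
ts_df <- data.frame(
  Day       = c("1/2", "2/2", "3/2", "4/2"),
  Strength1 = c(3, NA, NA, 4),
  Strength2 = c(NA, NA, 1.01, NA)
)

# Apply the helper to every column while keeping the data frame structure
filled <- ts_df
filled[] <- lapply(ts_df, fill_down)
filled
```

The cumulative count of non-NA values acts as an index into the vector of observed values, so no explicit loop over cells is needed.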
I have a quarterly time series. I am trying to apply a function that is supposed to calculate the year-over-year (yoy) growth and the yoy difference, and to multiply one variable by (-1).
I already used a similar function for calculating quarter-to-quarter changes and it worked.
I modified this function for yoy changes, but it has no effect on my data frame, and no error popped up.
Do you have any suggestion on how to modify the function, or on another way to apply yoy changes to a time series?
Here is the code:
Date <- c("2004-01-01","2004-04-01", "2004-07-01","2004-10-01","2005-01-01","2005-04-01","2005-07-01","2005-10-01","2006-01-01","2006-04-01","2006-07-01","2006-10-01","2007-01-01","2007-04-01","2007-07-01","2007-10-01")
B1 <- c(3189.30,3482.05,3792.03,4128.66,4443.62,4876.54,5393.01,5885.01,6360.00,6930.00,7430.00,7901.00,8279.00,8867.00,9439.00,10101.00)
B2 <- c(7939.97,7950.58,7834.06,7746.23,7760.59,8209.00,8583.05,8930.74,9424.00,9992.00,10041.00,10900.00,11149.00,12022.00,12662.00,13470.00)
B3 <- as.numeric(c("","","","",140.20,140.30,147.30,151.20,159.60,165.60,173.20,177.30,185.30,199.30,217.10,234.90))
B4 <- as.numeric(c("","","","",-3.50,-14.60,-11.60,-10.20,-3.10,-16.00,-4.90,-17.60,-5.30,-10.90,-12.80,-8.40))
df <- data.frame(Date,B1,B2,B3,B4)
The code will produce the following data frame:
Date B1 B2 B3 B4
1 2004-01-01 3189.30 7939.97 NA NA
2 2004-04-01 3482.05 7950.58 NA NA
3 2004-07-01 3792.03 7834.06 NA NA
4 2004-10-01 4128.66 7746.23 NA NA
5 2005-01-01 4443.62 7760.59 140.2 -3.5
6 2005-04-01 4876.54 8209.00 140.3 -14.6
7 2005-07-01 5393.01 8583.05 147.3 -11.6
8 2005-10-01 5885.01 8930.74 151.2 -10.2
9 2006-01-01 6360.00 9424.00 159.6 -3.1
10 2006-04-01 6930.00 9992.00 165.6 -16.0
11 2006-07-01 7430.00 10041.00 173.2 -4.9
12 2006-10-01 7901.00 10900.00 177.3 -17.6
13 2007-01-01 8279.00 11149.00 185.3 -5.3
14 2007-04-01 8867.00 12022.00 199.3 -10.9
15 2007-07-01 9439.00 12662.00 217.1 -12.8
16 2007-10-01 10101.00 13470.00 234.9 -8.4
I want to apply the following changes to the variables:
# yoy absolute difference change
abs.diff = c("B1","B2")
# yoy percentage change
percent.change = c("B3")
# make the variable negative
negative = c("B4")
This is the function that I am trying to use on my data frame:
transformation = function(D,abs.diff,percent.change,negative)
{
  TT <- dim(D)[1]
  DData <- D[-1,]
  nms <- c()
  for (i in c(2:dim(D)[2])) {
    # yoy absolute difference change
    if (names(D)[i] %in% abs.diff) {
      DData[,i] = (D[5:TT,i]-D[1:(TT-4),i])
      names(DData)[i] = paste('a',names(D)[i],sep='')
    }
    # yoy percent. change
    if (names(D)[i] %in% percent.change) {
      DData[,i] = 100*(D[5:TT,i]-D[1:(TT-4),i])/D[1:(TT-4),i]
      names(DData)[i] = paste('p',names(D)[i],sep='')
    }
    #CA.deficit
    if (names(D)[i] %in% negative) {
      DData[,i] = (-1)*D[1:TT,i]
    }
  }
  return(DData)
}
This is what I would like to get :
Date pB1 pB2 aB3 B4
1 2004-01-01 NA NA NA NA
2 2004-04-01 NA NA NA NA
3 2004-07-01 NA NA NA NA
4 2004-10-01 NA NA NA NA
5 2005-01-01 39.33 -2.26 NA 3.5
6 2005-04-01 40.05 3.25 NA 14.6
7 2005-07-01 42.22 9.56 NA 11.6
8 2005-10-01 42.54 15.29 11.0 10.2
9 2006-01-01 43.13 21.43 19.3 3.1
10 2006-04-01 42.11 21.72 18.3 16.0
11 2006-07-01 37.77 16.99 22.0 4.9
12 2006-10-01 34.26 22.05 17.7 17.6
13 2007-01-01 30.17 18.3 19.7 5.3
14 2007-04-01 27.95 20.32 26.1 10.9
15 2007-07-01 27.04 26.1 39.8 12.8
16 2007-10-01 27.84 23.58 49.6 8.4
Group the rows by month, i.e. by characters 6 and 7 of the date string, using ave, and do the calculation within each group. With sapply we can loop over the columns. Note that this computes the percentage change for every column.
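f <- function(x) {
  g <- substr(Date, 6, 7)   # month part of the Date vector defined above
  l <- length(unique(g))    # observations per year (4 for quarterly data)
  # within each month group: percent change from one year to the next
  o <- ave(x, g, FUN=function(x) 100/x * c(x[-1], NA) - 100)
  # shift down by one year so each change sits on the later date
  c(rep(NA, l), head(o, -l))
}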
cbind(df[1], sapply(df[-1], f))
# Date B1 B2 B3 B4
# 1 2004-01-01 NA NA NA NA
# 2 2004-04-01 NA NA NA NA
# 3 2004-07-01 NA NA NA NA
# 4 2004-10-01 NA NA NA NA
# 5 2005-01-01 39.32901 -2.259202 NA NA
# 6 2005-04-01 40.04796 3.250329 NA NA
# 7 2005-07-01 42.21960 9.560688 NA NA
# 8 2005-10-01 42.54044 15.291439 NA NA
# 9 2006-01-01 43.12655 21.434066 13.83738 -11.428571
# 10 2006-04-01 42.10895 21.720063 18.03279 9.589041
# 11 2006-07-01 37.77093 16.986386 17.58316 -57.758621
# 12 2006-10-01 34.25636 22.050356 17.26190 72.549020
# 13 2007-01-01 30.17296 18.304329 16.10276 70.967742
# 14 2007-04-01 27.95094 20.316253 20.35024 -31.875000
# 15 2007-07-01 27.03903 26.102978 25.34642 161.224490
# 16 2007-10-01 27.84458 23.577982 32.48731 -52.272727
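The output above applies the percentage change to every column. If each column should get its own treatment, as in the question, one possible sketch looks like the following. The function name yoy_transform and its lag argument are made up for illustration, and the code assumes the rows are sorted by date, evenly spaced, and that there are more rows than lag.

```r
# Hypothetical helper: per-column year-over-year transformations.
# Assumes one row per quarter in date order, so "one year ago" is `lag` rows up.
yoy_transform <- function(D, abs.diff, percent.change, negative, lag = 4) {
  out <- D
  for (nm in names(D)[-1]) {                    # skip the first (date) column
    x    <- D[[nm]]
    prev <- c(rep(NA, lag), head(x, -lag))      # value from `lag` rows earlier
    if (nm %in% abs.diff) {
      out[[nm]] <- x - prev                     # yoy absolute difference
      names(out)[names(out) == nm] <- paste0("a", nm)
    } else if (nm %in% percent.change) {
      out[[nm]] <- 100 * (x - prev) / prev      # yoy percentage change
      names(out)[names(out) == nm] <- paste0("p", nm)
    } else if (nm %in% negative) {
      out[[nm]] <- -x                           # flip the sign only
    }
  }
  out
}

# Small made-up example (six quarters)
toy <- data.frame(
  Date = c("2004-01-01", "2004-04-01", "2004-07-01",
           "2004-10-01", "2005-01-01", "2005-04-01"),
  B1 = c(1, 2, 3, 4, 6, 8),
  B3 = c(10, 20, 30, 40, 20, 30),
  B4 = c(-1, -2, -3, -4, -5, -6)
)
res <- yoy_transform(toy, abs.diff = "B1", percent.change = "B3", negative = "B4")
res
```

Unlike the function in the question, this keeps every row and pads the first year with NA, so the result has the same number of rows as the input. Remember to assign the result, for example df <- yoy_transform(df, abs.diff, percent.change, negative), since R functions do not modify their arguments in place.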
I have a dataset like this:
data <- read.table(text = "Me EE PE DE TE DEE CE
1 1 1 4.5 2000 0.50 0.2547 0.69
2 1 2 2.4 3000 NA 0.5896 2.56
3 1 3 6.5 2345 15.24 NA 1.85
4 1 4 NA NA 18.23 1.2594 2.06
5 2 1 2.6 NA 12.25 1.5943 2.34
6 2 2 NA 3145 10.25 NA NA
7 2 3 2.7 4235 NA NA 2.90
8 2 4 NA NA 6.32 2.5990 3.18
9 3 1 3.5 NA 8.25 2.9339 3.46
10 3 2 3.8 NA NA NA NA
11 3 3 NA NA NA 3.6037 1.58
12 3 4 4.4 4325 NA 3.9386 4.30
13 4 1 4.7 NA 15.24 4.2735 4.58
14 4 2 NA 4325 6.66 NA NA
15 4 3 5.3 NA 25.20 NA NA
16 4 4 5.6 3256 NA 5.2782 5.42
17 5 1 NA 4351 25.36 5.6131 5.70
18 5 2 6.2 2345 NA 5.9480 NA
19 5 3 6.5 NA 19.36 NA NA
20 5 4 NA 4643 17.25 6.6178 6.54", header = T)
The dataset can be found here.
I am trying to plot a ggplot2 point graph using
library(ggplot2)
ggplot(data,aes(x=EE,y=PE)) + geom_point( aes(color = factor(Me)))
and the plot is
I want to pick colors of my choice, specified as RGB, HSV or HCL, and assign them to the individual levels of the factor Me. How can I do that?
How can I assign specific colors to the unique(data$Me) values, like c(red, blue, green, yellow, orange, black, brown, magenta, ...)?
Thanks in advance.
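Add this to the plot, giving one color per level of factor(Me) (five levels here):
+ scale_colour_manual(values = c('red', 'blue', 'green', 'yellow', 'orange'))
You can also use hex codes.
+ scale_colour_manual(values = c('#FF0000', '#0000FF', '#00FF00', '#FFFF00', '#FFA500'))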
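Since the question mentions RGB, HSV and HCL, here is a small sketch of how colors built with the base R functions rgb(), hsv() and hcl() could be tied to specific levels through a named vector. The toy data frame and the vector name me_colours are made up for illustration, and the particular colors are arbitrary choices.

```r
library(ggplot2)

# Toy stand-in for the question's data (made-up values)
toy <- data.frame(
  Me = rep(1:5, each = 2),
  EE = rep(1:2, times = 5),
  PE = c(4.5, 2.4, 2.6, 2.7, 3.5, 3.8, 4.7, 5.3, 6.2, 6.5)
)

# One color per level; the names must match the levels of factor(Me)
me_colours <- c(
  "1" = "red",                          # color name
  "2" = "#0000FF",                      # hex code
  "3" = rgb(0, 0.6, 0),                 # RGB components in [0, 1]
  "4" = hsv(h = 0.1, s = 1, v = 1),     # hue, saturation, value
  "5" = hcl(h = 300, c = 80, l = 50)    # hue, chroma, luminance
)

p <- ggplot(toy, aes(x = EE, y = PE)) +
  geom_point(aes(colour = factor(Me)), size = 3) +
  scale_colour_manual(values = me_colours, name = "Me")
p
```

Naming the vector makes the mapping independent of the order of the levels, which is handy when some levels are missing from a subset of the data.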
I have two datasets, which I call data and data_other.
1. data
data <- read.table(text = "Me EE PE DE TE DEE CE
1 1 1 4.5 2000 0.50 0.2547 0.69
2 1 2 2.4 3000 NA 0.5896 2.56
3 1 3 6.5 2345 15.24 NA 1.85
4 1 4 NA NA 18.23 1.2594 2.06
5 2 1 2.6 NA 12.25 1.5943 2.34
6 2 2 NA 3145 10.25 NA NA
7 2 3 2.7 4235 NA NA 2.90
8 2 4 NA NA 6.32 2.5990 3.18
9 3 1 3.5 NA 8.25 2.9339 3.46
10 3 2 3.8 NA NA NA NA
11 3 3 NA NA NA 3.6037 1.58
12 3 4 4.4 4325 NA 3.9386 4.30
13 4 1 4.7 NA 15.24 4.2735 4.58
14 4 2 NA 4325 6.66 NA NA
15 4 3 5.3 NA 25.20 NA NA
16 4 4 5.6 3256 NA 5.2782 5.42
17 5 1 NA 4351 25.36 5.6131 5.70
18 5 2 6.2 2345 NA 5.9480 NA
19 5 3 6.5 NA 19.36 NA NA
20 5 4 NA 4643 17.25 6.6178 6.54", header = T)
2. data_other
Me EE PE DE TE DEE CE
1 1 1.3 1452 12.5 0.2587 1.25
1 2 1.7 2458 10.8 1.5469 0.69
1 3 0.8 3524 11.96 0.5874 0.87
1 4 0.7 1905 13.58 0.9654 0.98
2 1 0.5 2941 12.54 0.2548 1.65
2 2 0.2 3183.5 13.134 1.3658 1.11
2 3 0.01 3426 13.42 0.48692 1.320666667
2 4 1.52 3668.5 13.706 1.9547 1.386095238
3 1 1.98 3911 13.992 0.36906 1.45152381
3 2 2.1 4153.5 14.278 0.31013 1.516952381
3 3 2.44 6674 15.96 0.564 0.2512
3 4 2.736 6671 16.5 1.85 1.5687
4 1 1.05 4881 15.136 0.13334 1.713238095
4 2 1.52 5123.5 15.422 0.07441 1.778666667
4 3 0.95 5366 15.708 0.01548 1.844095238
4 4 1.073 5608.5 15.994 1.2548 1.90952381
5 1 1.0233 5851 16.28 1.9658 1.974952381
5 2 0.9733 6093.5 16.566 2.2154 2.040380952
5 3 0.9233 6336 16.852 1.2587 2.105809524
5 4 2.45 6578.5 17.138 1.9657 2.171238095
I am plotting a point graph with data, using the following code:
library(reshape2)  # provides melt()
library(ggplot2)
data <- read.table("data.txt", header = T)
datad <- melt(data,id.vars = c("Me","EE"),measure.vars= c("PE","DE","TE","DEE","CE"))
ggplot(subset(datad, !is.na(EE)),aes(x=EE,y=value), as.table = T) +
geom_point(aes(color=factor(Me)), size = 3) +
facet_wrap(~variable, nrow=3, ncol=2, scales = "free_y")
I am plotting a line plot with data_other, using the following code:
data_other <- read.table("data_other.txt", header = T)
data_otherd <- melt(data_other, id.vars = c("Me","EE"), measure.vars = c("PE","DE","TE","DEE","CE"))
ggplot(subset(data_otherd, !is.na(EE)), aes(x=EE, y=value), as.table = T) +
geom_line(aes(color = factor (Me))) +
facet_wrap(~variable, nrow=3, ncol=2, scales = "free_y")
Result.
data (point graph)
data_other (line graph)
I am trying to combine both datasets on the same plot, meaning the data_other line plot should be overlaid on the data point plot, or vice versa.
Both datasets have the same number of columns, and the column names are also the same.
Without facet_wrap, I know I can use a different geom for each dataset and draw the line plot on top of the point graph. But with facet_wrap I don't know how to combine the two datasets.
How can I facet two datasets with different geoms on a single plot?
Please help. Thanks in advance.
You need just one call to ggplot() with the first data frame, and then add geom_line() with data=data_otherd. As all variable names are the same, geom_line() will use the same mapping that is provided in aes() of ggplot() (also move color=factor(Me) into aes() of the ggplot() call).
ggplot(datad,aes(x=EE,y=value,color=factor(Me)))+
geom_point()+
geom_line(data=data_otherd)+
facet_wrap(~variable, nrow=3, ncol=2, scales = "free_y")
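For readers who want to try this without the original files, the following is a self-contained sketch of the same idea. The two small data frames are made-up stand-ins for data and data_other with only two measure columns, and the names pts, lns, pts_long and lns_long are hypothetical.

```r
library(ggplot2)
library(reshape2)

# Toy stand-ins for `data` (points, with NAs) and `data_other` (lines)
pts <- data.frame(Me = rep(1:2, each = 3), EE = rep(1:3, times = 2),
                  PE = c(4.5, NA, 6.5, 2.6, 2.7, NA),
                  DE = c(2000, 3000, NA, 3145, NA, 4235))
lns <- data.frame(Me = rep(1:2, each = 3), EE = rep(1:3, times = 2),
                  PE = c(1.3, 1.7, 0.8, 0.5, 0.2, 0.1),
                  DE = c(1452, 2458, 3524, 2941, 3183, 3426))

# Long format: one row per (Me, EE, variable) combination
pts_long <- melt(pts, id.vars = c("Me", "EE"))
lns_long <- melt(lns, id.vars = c("Me", "EE"))

# The mapping set in ggplot() is inherited by both layers;
# only the data differs for the line layer.
p <- ggplot(pts_long, aes(x = EE, y = value, colour = factor(Me))) +
  geom_point(size = 3, na.rm = TRUE) +
  geom_line(data = lns_long) +
  facet_wrap(~variable, scales = "free_y")
p
```

Because both long data frames contain a column called variable with the same levels, facet_wrap() places the points and the lines for each measure in the same panel.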
I have a data frame X with several columns and want to select, for each row, column N, where N differs from row to row and is given by some vector (in this example, the values in column sel).
A B C D sel
16/04/2012 NA -1.25 -1.25 0.25 1
17/04/2012 NA 20 21.25 17.25 1
18/04/2012 -5.25 -5.25 -5.75 -1 2
19/04/2012 -6 -6 -6.25 -12 2
20/04/2012 2.5 2.5 2.75 NA 2
23/04/2012 NA -12.25 -12 NA 2
24/04/2012 NA 7.25 7.5 7.25 2
25/04/2012 NA 17.5 17 18.25 4
26/04/2012 NA 9.5 10 11.5 4
27/04/2012 NA 2 1 -3.25 4
30/04/2012 NA -4.75 -4 -1 4
01/05/2012 NA 6.25 5.75 17 3
02/05/2012 NA -3 -2.75 -16 3
03/05/2012 NA -11.5 -11.5 -6.75 4
04/05/2012 NA -23.5 -23.75 -23 4
So I would end up with:
16/04/2012 NA
17/04/2012 NA
18/04/2012 -5.25
19/04/2012 -6
20/04/2012 2.5
23/04/2012 -12.25
24/04/2012 7.25
25/04/2012 18.25
26/04/2012 11.5
27/04/2012 -3.25
30/04/2012 -1
01/05/2012 5.75
02/05/2012 -2.75
03/05/2012 -6.75
04/05/2012 -23
X[,X$sel]
gave me a square result with nrow(X) rows and nrow(X) columns, which is not quite what I need.
Is there some sort of function like Excel's INDEX that I can use, maybe inside an apply function?
You could use the method of subsetting a data frame by passing a two-column matrix with row numbers in the first column and column numbers in the second column. So:
X[matrix(c(1:nrow(X), X$sel), ncol=2)]
will give you a vector of those selected elements, which you can then build into whatever result data frame you're aiming for. Or just add to the existing data frame like this:
X$selected_values <- X[matrix(c(1:nrow(X), X$sel), ncol=2)]
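As a small runnable illustration of this matrix-indexing idea, here is a sketch on three rows taken from the question. It assumes the dates are row names and that sel counts columns from A, as in the example; the names value_cols, idx and picked are made up here.

```r
# Three rows from the question, with the dates as row names
X <- data.frame(
  A   = c(NA, -5.25, NA),
  B   = c(-1.25, -5.25, 17.5),
  C   = c(-1.25, -5.75, 17),
  D   = c(0.25, -1, 18.25),
  sel = c(1, 2, 4),
  row.names = c("16/04/2012", "18/04/2012", "25/04/2012")
)

# Restrict to the value columns so that sel = 1 means column A
value_cols <- c("A", "B", "C", "D")

# Two-column index matrix: row number, then column number
idx <- cbind(seq_len(nrow(X)), X$sel)

# Indexing a matrix with a two-column matrix returns one element per index row
picked <- as.matrix(X[value_cols])[idx]

data.frame(date = rownames(X), value = picked)
```

Converting to a matrix first makes it explicit that all value columns share one type. If the dates were stored in a regular column instead of the row names, that column would have to be excluded in the same way before indexing.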