R: Variable lengths differ

I want to regress a differenced dependent variable on differenced independent variables and on one non-differenced variable.
I tried the following lines in R:
xt <- ts(xx)
yt <- ts(yy)
zt <- ts(zz)
bt <- ts(bb)
mt <- ts(mm)
xtd <- diff(xt)
ytd <- diff(yt)
ztd <- diff(zt)
btd <- diff(bt)
axx <- ts.intersect(xtd, ytd, ztd, btd, mt)
reg1 <- lm(xtd~ytd+ztd+btd+mt, axx)
summary(reg1)
Without the command ts.intersect() an error message pops up, saying that the variable lengths differ, found for the variable mt. That makes sense, since mt isn't differenced. My questions are:
i) Is this a correct way to deal with differing variable lengths? and ii) Is there a more efficient way? Many thanks in advance.
Date xx yy zz bb mm
1 03.01.2005 0.065 0.001 14.4700 17.938 345001.0
2 04.01.2005 0.067 0.006 14.5100 17.886 345001.0
3 05.01.2005 0.064 -0.007 14.4200 17.950 334001.0
4 06.01.2005 0.065 -0.005 13.8000 17.950 334001.0
5 07.01.2005 0.060 -0.006 13.5700 17.913 334001.0
6 10.01.2005 0.059 -0.007 12.9200 17.958 334001.0
7 11.01.2005 0.057 -0.009 13.6800 17.962 334001.0
8 12.01.2005 0.060 -0.005 14.0500 17.886 340001.0
9 13.01.2005 0.060 -0.004 13.6400 17.568 340001.0
10 14.01.2005 0.059 -0.005 13.5700 17.471 340001.0
11 17.01.2005 0.058 -0.005 13.2000 17.365 340001.0
12 18.01.2005 0.059 -0.005 13.1700 17.214 340001.0
13 19.01.2005 0.057 -0.006 13.6300 17.143 354501.0
14 20.01.2005 0.057 -0.007 14.1700 17.125 354501.0
15 21.01.2005 0.056 -0.007 13.9600 17.193 354501.0
16 24.01.2005 0.057 -0.006 14.1100 17.283 354501.0
17 25.01.2005 0.058 -0.006 13.6300 17.083 354501.0
18 26.01.2005 0.057 -0.006 13.3200 17.348 348001.0
19 27.01.2005 0.059 -0.005 12.4600 17.295 353001.0
20 28.01.2005 0.060 -0.004 12.8100 17.219 353001.0
21 31.01.2005 0.058 -0.004 12.7200 17.143 353001.0
22 01.02.2005 0.059 -0.003 12.3600 17.125 353001.0
23 02.02.2005 0.058 -0.003 12.2500 17.000 357501.0
24 03.02.2005 0.056 -0.008 12.3800 16.808 357501.0
25 04.02.2005 0.058 -0.004 11.6000 16.817 357501.0
26 07.02.2005 0.058 -0.004 11.9900 16.798 357501.0
27 08.02.2005 0.058 -0.003 11.9200 16.804 355501.0
28 09.02.2005 0.062 0.000 12.1900 16.589 355501.0
29 10.02.2005 0.060 0.000 12.0400 16.500 355501.0
30 11.02.2005 0.062 0.002 11.9900 16.429 355501.0

The short answer is yes, you need to use ts.intersect() when you have some variables that are differenced and some that are not.
You can probably clean up the code a little so you don't have so many repeated lines, but (especially if these are all your variables) it doesn't make much difference.
For example, you might convert all columns to time series in one step with ts.d <- ts(d[2:6]), as in the sketch below.
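A minimal sketch along those lines, assuming the table above has been read into a data frame d with columns Date, xx, yy, zz, bb, mm:
ts.d <- ts(d[, 2:6])                                           # all value columns as one multivariate ts
axx <- ts.intersect(diff(ts.d[, c("xx", "yy", "zz", "bb")]),   # differenced series
                    mm = ts.d[, "mm"])                         # undifferenced series
colnames(axx) <- c("xtd", "ytd", "ztd", "btd", "mt")
reg1 <- lm(xtd ~ ytd + ztd + btd + mt, data = as.data.frame(axx))
summary(reg1)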

Related

Error trying to produce forecast errors in R

I am trying to modify some code that I have, which works, so that it instead works with a different function for estimating a model. The original code is the following, and it works with the arima() function:
S=round(0.75*length(ts_HHFCE_log))   # start forecasting once 75% of the sample is available
h=1                                  # forecast horizon
error1.h <- c()
for (i in S:(length(ts_HHFCE_log)-h))
{
  # re-estimate the ARIMA(0,1,3) on data up to i, forecast h steps ahead, store the error
  mymodel.sub <- arima(ts_HHFCE_log[1:i], order = c(0,1,3),seasonal=c(0,0,0))
  predict.h <- predict(mymodel.sub,n.ahead=h)$pred[h]
  error1.h <- c(error1.h,ts_HHFCE_log[i+h]-predict.h)
}
The intuition is the following: your time series has length T. You start somewhere near the beginning of your sample, far enough in to have enough observations to regress on and obtain coefficient estimates for your alpha and betas. Call this point t for simplicity. Based on the data up to t, you produce a one-step-ahead forecast for period (t+1). Your forecast error is then the difference between the actual value at (t+1) and the forecast based on data available up to t. Then you iterate: use data up to (t+1), regress, and forecast (t+2), obtaining a forecast error for (t+2). You keep repeating this until you reach (T-1) and produce a forecast for T. This gives you what is known as a dynamic out-of-sample forecast error series. You do this for different models and then use a statistical test to ascertain which model is more appropriate. It is a way of producing out-of-sample forecasts using only the data you already have.
I have modified the code to be the following:
S=round(0.75*length(ts.GDP))
h=1
error1.h <- c()
for (i in S:(length(ts.GDP)-h))
{
mymodel.sub <- lm(ts.GDP[4:i] ~ ts.GDP[3:(i-1)] + ts.GDP[2:(i-2)] + ts.GDP[1:(i-3)])
predict.h <- predict(mymodel.sub,n.ahead=h)$pred[h]
error1.h <- c(error1.h,ts.GDP[i+h]-predict.h)
}
I'm trying to fit an AR(3) model. The reason I am not using the arima() function is that I then also want to compare these forecast errors with those from an ARDL model, and to my knowledge there is no simple function for ARDL models (I'd have to use lm(), hence why I want to fit the AR(3) model using lm() as well).
The model I wish to compare the AR(3) model with is the following:
model_ts.GDP_1 <- lm(ts.GDP[4:123] ~ ts.GDP[3:122] + ts.GDP[2:121] + ts.GDP[1:120] + ts.CCI_AGG[3:122] + ts.CCI_AGG[2:121] + ts.CCI_AGG[1:120])
I am unsure how to modify the code further to get what I am after. Hopefully the intuition I explained above makes clear what I am trying to do.
The GDP data is basically the quarterly growth rate, which is stationary. The other variable in the second model is an index I've constructed using dynamic PCA and then first-differenced, so it too is stationary. In any case, in the second model the forecast at t is based only on lagged values of GDP and of the index I constructed. Equally, since I am simulating out-of-sample forecasts using data I already have, there is no issue with genuine forecasting. (In time series, this technique is seen as a more robust way to compare models than simply using measures such as RMSE.)
Thanks!
The data I am using:
Date GDP_qoq CCI_A_qoq
31/03/1988 2.956 0.540
30/06/1988 2.126 -0.743
30/09/1988 3.442 0.977
31/12/1988 3.375 -0.677
31/03/1989 2.101 0.535
30/06/1989 1.787 -0.667
30/09/1989 2.791 0.343
31/12/1989 2.233 -0.334
31/03/1990 1.961 0.520
30/06/1990 2.758 -0.763
30/09/1990 1.879 0.438
31/12/1990 0.287 -0.708
31/03/1991 1.796 -0.078
30/06/1991 1.193 -0.735
30/09/1991 0.908 0.896
31/12/1991 1.446 0.163
31/03/1992 0.870 0.361
30/06/1992 0.215 -0.587
30/09/1992 0.262 0.238
31/12/1992 1.646 -1.436
31/03/1993 2.375 0.646
30/06/1993 0.249 -0.218
30/09/1993 1.806 0.676
31/12/1993 1.218 -0.393
31/03/1994 1.501 0.346
30/06/1994 0.879 -0.501
30/09/1994 1.123 0.731
31/12/1994 2.089 0.062
31/03/1995 0.386 0.475
30/06/1995 1.238 -0.243
30/09/1995 1.836 0.263
31/12/1995 1.236 -0.125
31/03/1996 1.926 -0.228
30/06/1996 2.109 -0.013
30/09/1996 1.312 0.196
31/12/1996 0.972 -0.015
31/03/1997 1.028 -0.001
30/06/1997 1.086 -0.016
30/09/1997 2.822 0.156
31/12/1997 -0.818 -0.062
31/03/1998 1.418 0.408
30/06/1998 0.970 -0.548
30/09/1998 0.968 0.466
31/12/1998 2.826 -0.460
31/03/1999 0.599 0.228
30/06/1999 -0.651 -0.361
30/09/1999 1.289 0.579
31/12/1999 1.600 0.196
31/03/2000 2.324 0.535
30/06/2000 1.368 -0.499
30/09/2000 0.825 0.440
31/12/2000 0.378 -0.414
31/03/2001 0.868 0.478
30/06/2001 1.801 -0.521
30/09/2001 0.319 0.068
31/12/2001 0.877 0.045
31/03/2002 1.253 0.061
30/06/2002 1.247 -0.013
30/09/2002 1.513 0.625
31/12/2002 1.756 0.125
31/03/2003 1.443 -0.088
30/06/2003 0.874 -0.138
30/09/2003 1.524 0.122
31/12/2003 1.831 -0.075
31/03/2004 0.780 0.395
30/06/2004 1.665 -0.263
30/09/2004 0.390 0.543
31/12/2004 0.886 -0.348
31/03/2005 1.372 0.500
30/06/2005 2.574 -0.066
30/09/2005 0.961 0.058
31/12/2005 2.378 -0.061
31/03/2006 1.015 0.212
30/06/2006 1.008 -0.218
30/09/2006 1.105 0.593
31/12/2006 0.943 -0.144
31/03/2007 1.566 0.111
30/06/2007 1.003 -0.125
30/09/2007 1.810 0.268
31/12/2007 1.275 -0.592
31/03/2008 1.413 0.017
30/06/2008 -0.491 -0.891
30/09/2008 -0.617 -0.836
31/12/2008 -1.410 -1.092
31/03/2009 -1.593 0.182
30/06/2009 -0.106 -0.922
30/09/2009 0.788 0.351
31/12/2009 0.247 0.414
31/03/2010 1.221 -0.329
30/06/2010 1.561 -0.322
30/09/2010 0.163 0.376
31/12/2010 0.825 -0.104
31/03/2011 2.484 0.063
30/06/2011 -0.574 -0.107
30/09/2011 0.361 -0.006
31/12/2011 0.997 -0.304
31/03/2012 0.760 0.243
30/06/2012 0.143 -0.381
30/09/2012 2.547 0.315
31/12/2012 0.308 -0.046
31/03/2013 0.679 0.221
30/06/2013 0.766 -0.170
30/09/2013 1.843 0.352
31/12/2013 0.756 0.080
31/03/2014 1.380 -0.080
30/06/2014 1.501 0.162
30/09/2014 0.876 0.017
31/12/2014 0.055 -0.251
31/03/2015 0.497 0.442
30/06/2015 1.698 -0.278
30/09/2015 0.066 0.397
31/12/2015 0.470 0.076
31/03/2016 1.581 0.247
30/06/2016 0.859 -0.342
30/09/2016 0.865 -0.011
31/12/2016 1.467 0.049
31/03/2017 1.006 0.087
30/06/2017 0.437 -0.215
30/09/2017 0.527 0.098
31/12/2017 0.900 0.218
The only thing you need to understand is how to get predictions using lm; it's not necessary to add the other details (without reproducible data you're only making it more difficult).
Create dummy data:
set.seed(123)
df<-data.frame(a=runif(10),b=runif(10),c=runif(10))
> print(df)
a b c
1 0.2875775 0.95683335 0.8895393
2 0.7883051 0.45333416 0.6928034
3 0.4089769 0.67757064 0.6405068
4 0.8830174 0.57263340 0.9942698
5 0.9404673 0.10292468 0.6557058
6 0.0455565 0.89982497 0.7085305
7 0.5281055 0.24608773 0.5440660
8 0.8924190 0.04205953 0.5941420
9 0.5514350 0.32792072 0.2891597
10 0.4566147 0.95450365 0.1471136
Fit your model:
model<-lm(c~a+b,data=df)
Create new data:
new_df<-data.frame(a=runif(1),b=runif(1))
> print(new_df)
a b
1 0.9630242 0.902299
Get predictions from your new data:
prediction<- predict(model,new_df)
> print(prediction)
1
0.8270997
In your case, the new data new_df will be your lagged data, but you have to make the appropriate changes, OR provide reproducible data as above if you want us to go through the details of your problem.
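For instance, the rolling AR(3) loop might be rewritten along these lines (only a sketch, assuming ts.GDP holds the GDP_qoq series; the names y, dat, fit, l1, l2, l3 and new_df are made up for illustration):
y <- as.numeric(ts.GDP)
S <- round(0.75 * length(y))
h <- 1
error1.h <- c()
for (i in S:(length(y) - h)) {
  # fit the AR(3) by lm on observations 4..i
  dat <- data.frame(y  = y[4:i],
                    l1 = y[3:(i - 1)],
                    l2 = y[2:(i - 2)],
                    l3 = y[1:(i - 3)])
  fit <- lm(y ~ l1 + l2 + l3, data = dat)
  # new data: the three most recent observations, used to forecast y[i + h]
  new_df <- data.frame(l1 = y[i], l2 = y[i - 1], l3 = y[i - 2])
  predict.h <- predict(fit, newdata = new_df)
  error1.h <- c(error1.h, y[i + h] - predict.h)
}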
Hope this helps.

Averaging Duplicate Values in an R data frame

I have a df named ColorMap in which I am looking to average all numerical values corresponding to the same feature (further explanation below). Here is the df.
> ColorMap
KEGGnumber Colors
1 c("C00489" 0.162
2 "C06104" 0.162
3 "C02656") 0.162
4 C00163 -0.173
5 c("C02656" -0.140
6 "C00036" -0.140
7 "C00232" -0.140
8 "C01571" -0.140
9 "C00422") -0.140
10 c("C00402" 0.147
11 "C06664" 0.147
12 "C06687" 0.147
13 "C02059") 0.147
14 c("C00246" 0.069
15 "C00902") 0.069
**16 C00033 0.011
...
25 C00033 -0.073**
26 C00048 0.259
**27 c("C00803" 0.063
...
37 C00803 -0.200
38 C00803 -0.170**
39 c("C00164" -0.020
40 "C01712" -0.020
...
165 c("C00246" 0.076
166 "C00902") 0.076
**167 C00163 -0.063
...
169 C00163 0.046**
170 c("C00058" -0.208
171 "C00036") -0.208
172 C00121 -0.178
173 C00033 -0.193
174 C00163 -0.085
I would like the final to look something like this
> ColorMap
KEGGnumber Colors
1 C00489 0.162
2 C06104 0.162
3 C02656 0.162
4 C00163 -0.173
5 C02656 -0.140
6 C00036 -0.140
7 C00232 -0.140
8 C01571 -0.140
9 C00422 -0.140
10 C00402 0.147
11 C06664 0.147
12 C06687 0.147
13 C02059 0.147
14 C00246 0.069
15 C00902 0.069
**16 C00033 0.031**
26 C00048 0.259
**27 C00803 -0.100**
39 C00164 -0.020
40 C01712 -0.020
...
165 C00246 0.076
166 C00902 0.076
**167 C00163 0.0085**
170 C00058 -0.208
171 C00036 -0.208
172 C00121 -0.178
173 C00033 -0.193
174 C00163 -0.085
They do not need to be next to each other; I simply chose those for easy visualization. I would like the mean of all Colors values for each single KEGG number, so that each KEGG number is unique, with no duplicates.
You can clean that column using
library(stringr)
ColorMap$KEGGnumber <- str_extract(ColorMap$KEGGnumber, "[C][0-9]+")
The pattern argument lets you match with a regular expression; in this case a simple one that matches the capital letter C followed by one or more digits.
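For example, applied to a few of the values shown above it pulls out just the identifier:
str_extract(c('c("C00489"', '"C02656")', "C00163"), "[C][0-9]+")
# [1] "C00489" "C02656" "C00163"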
Afterwards, grouping using dplyr we have
library(dplyr)
ColorMap %>% group_by(KEGGnumber) %>% summarize(mean(Colors))
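Putting the two steps together (a sketch, assuming ColorMap is the data frame shown above; ColorMap_avg is just an illustrative name):
library(dplyr)
library(stringr)
ColorMap_avg <- ColorMap %>%
  mutate(KEGGnumber = str_extract(KEGGnumber, "[C][0-9]+")) %>%  # strip the c("...") wrapping
  group_by(KEGGnumber) %>%
  summarize(Colors = mean(Colors))                               # one averaged value per KEGG number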

Different regression output using dynlm and lm

I ran a regression first using lm and then using dynlm (from the package dynlm). Here is what I did using lm:
Euribor3t <- ts(diff(Euribor3))
OIS3t <- ts(diff(Ois3))
x <- ts(diff(Eurepo3-Ois3))
Vstoxxt <- ts(diff(Vstoxx))
CDSt <- ts(diff(CDS))
omo2 <- ts(diff(log(Open.Market.Operations)))
l1 <- (lag(Euribor3t, k=-1))
axx <- ts.intersect(Euribor3t, OIS3t, x, Vstoxxt, CDSt, omo2, l1)
reg1 <- lm(Euribor3t~OIS3t+CDSt+x+Vstoxxt+omo2+l1, data=axx)
summary(reg1)
and for dynlm:
zooX = zoo(test[, -1])
lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Eurepo3-Ois3)+d(Vstoxx)+d(log(Open.Market.Operations))+d(L(Euribor3, 1)), data=zooX)
summary(lmx)
These two approaches give me exactly the same output. However, if I add a subset from 1 to 24 to both regressions (all else equal):
Euribor3t <- ts(diff(Euribor3))
OIS3t <- ts(diff(Ois3))
x <- ts(diff(Eurepo3-Ois3))
Vstoxxt <- ts(diff(Vstoxx))
CDSt <- ts(diff(CDS))
omo2 <- ts(diff(log(Open.Market.Operations)))
l1 <- (lag(Euribor3t, k=-1))
axx <- ts.intersect(Euribor3t, OIS3t, x, Vstoxxt, CDSt, omo2, l1)
reg1 <- lm(Euribor3t~OIS3t+CDSt+x+Vstoxxt+omo2+l1, data=axx, subset=1:24)
summary(reg1)
zooX = zoo(test[, -1])
lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Eurepo3-Ois3)+d(Vstoxx)+d(log(Open.Market.Operations))+d(L(Euribor3, 1)), data=zooX[1:24])
summary(lmx)
The two outputs differ from each other. What might be the problem causing the deviation in my regression outputs?
Here is the data sample I experimented with:
Date Euribor3 Ois3 Eurepo3 Vstoxx CDS Open.Market.Operations
1 03.01.2005 2.154 2.089 2.09 14.47 17.938 344999
2 04.01.2005 2.151 2.084 2.09 14.51 17.886 344999
3 05.01.2005 2.151 2.087 2.08 14.42 17.950 333998
4 06.01.2005 2.150 2.085 2.08 13.80 17.950 333998
5 07.01.2005 2.146 2.086 2.08 13.57 17.913 333998
6 10.01.2005 2.146 2.087 2.08 12.92 17.958 333998
7 11.01.2005 2.146 2.089 2.08 13.68 17.962 333998
8 12.01.2005 2.145 2.085 2.08 14.05 17.886 339999
9 13.01.2005 2.144 2.084 2.08 13.64 17.568 339999
10 14.01.2005 2.144 2.085 2.08 13.57 17.471 339999
11 17.01.2005 2.143 2.085 2.08 13.20 17.365 339999
12 18.01.2005 2.144 2.085 2.08 13.17 17.214 347999
13 19.01.2005 2.143 2.086 2.08 13.63 17.143 354499
14 20.01.2005 2.144 2.087 2.08 14.17 17.125 354499
15 21.01.2005 2.143 2.087 2.08 13.96 17.193 354499
16 24.01.2005 2.143 2.086 2.08 14.11 17.283 354499
17 25.01.2005 2.144 2.086 2.08 13.63 17.083 354499
18 26.01.2005 2.143 2.086 2.08 13.32 17.348 347999
19 27.01.2005 2.144 2.085 2.08 12.46 17.295 352998
20 28.01.2005 2.144 2.084 2.08 12.81 17.219 352998
21 31.01.2005 2.142 2.084 2.08 12.72 17.143 352998
22 01.02.2005 2.142 2.083 2.08 12.36 17.125 352998
23 02.02.2005 2.141 2.083 2.08 12.25 17.000 357499
24 03.02.2005 2.144 2.088 2.08 12.38 16.808 357499
25 04.02.2005 2.142 2.084 2.08 11.60 16.817 357499
26 07.02.2005 2.142 2.084 2.08 11.99 16.798 359999
27 08.02.2005 2.141 2.083 2.08 11.92 16.804 355500
28 09.02.2005 2.142 2.080 2.08 12.19 16.589 355500
29 10.02.2005 2.140 2.080 2.08 12.04 16.500 355500
30 11.02.2005 2.140 2.078 2.08 11.99 16.429 355500
31 14.02.2005 2.139 2.078 2.08 12.52 16.042 355500
You are not allowing dynlm to use the same amount of data as lm. The dynlm model contains two fewer observations.
dim(model.frame(reg1))
# [1] 24 7
dim(model.frame(lmx))
# [1] 22 7
The reason is that with lm you are transforming the variables (differencing) using the entire data set (31 observations), whereas in dynlm you are passing only 24 observations, so dynlm does the differencing within those 24 observations. Because one observation is lost to the difference and another to the lagged difference, the resulting number of model rows is not the same in both cases: 24 raw rows leave only 22 usable rows, while the lm fit uses 24 rows of the already-differenced data.
In dynlm you should therefore use data=zooX[1:26]: 26 raw rows minus the two observations lost to differencing and lagging leave exactly 24, so the same subset is used and the same result is obtained:
reg1 <- lm(Euribor3t~OIS3t+CDSt+x+Vstoxxt+omo2+l1, data=axx, subset=1:24)
lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Eurepo3-Ois3)+d(Vstoxx)+
d(log(Open.Market.Operations))+d(L(Euribor3, 1)), data=zooX[1:26])
all.equal(as.vector(fitted(reg1)), as.vector(fitted(lmx)))
# [1] TRUE
all.equal(coef(reg1), coef(lmx), check.attributes=FALSE)
# [1] TRUE

automatic filtering of measurement data within R

My question is about automatic filtering of measurement data, because I have several hundred files to process.
The file-structure looks like:
test1 <- read.table("~/test1.txt",sep="\t",dec=".",skip=17,header=TRUE)
Number Time.s Potential.V Current.A
1 0.0000 0.060 -0.7653
2 0.0285 0.060 -0.7597
3 0.0855 0.060 -0.7549
.....
17 0.8835 0.060 -0.7045
18 0.9405 0.060 -0.5983
19 0.9975 0.061 -0.1370
20 1.0545 0.062 0.1295
21 1.1115 0.063 0.2680
......
8013 456.6555 0.066 -1.1070
8014 456.7125 0.065 -1.1850
8015 456.7695 0.063 -1.2610
8016 456.8265 0.062 -1.3460
8017 456.8835 0.061 -1.4380
8018 456.9405 0.060 -1.4350
8019 456.9975 0.060 -1.0720
8020 457.0545 0.060 -0.8823
8021 457.1115 0.060 -0.7917
8022 457.1685 0.060 -0.7481
I need to get rid of the extra lines at the beginning and end that have Potential.V == 0.06. My problem is that the number of such lines at the beginning and end of the various files isn't fixed.
A further restriction is that each file contains several measurements one after the other, so I can't simply remove all lines with 0.06 from the data.frame.
At the moment I do the cutting manually, which is not very elegant, but I don't know of a better solution:
test_b1 <- data.frame(test1$Number[18:8018],test1$Time.s[18:8018],test1$Potential.V[18:8018],test1$Current.A[18:8018])
I tried using iterations like
for (c in 1:(length(test1))) {
if (counter>1) & ((as.numeric(r[counter])- as.numeric(r[counter-1]))==1) {
cat("Skip \n")}
}
but I didn't get a working solution, owing to a lack of skill on my side :/ .
Is there a package on CRAN or a more elegant way to solve such problems?
Best regards
Another way using which.max:
# data modified to include 0.06 Potential.V in inner range
d <- read.table(text="Number Time.s Potential.V Current.A
1 0.0000 0.060 -0.7653
2 0.0285 0.060 -0.7597
3 0.0855 0.060 -0.7549
17 0.8835 0.060 -0.7045
18 0.9405 0.060 -0.5983
19 0.9975 0.061 -0.1370
19 0.9975 0.060 -0.1370
20 1.0545 0.062 0.1295
21 1.1115 0.063 0.2680
8013 456.6555 0.066 -1.1070
8014 456.7125 0.065 -1.1850
8015 456.7695 0.063 -1.2610
8016 456.8265 0.062 -1.3460
8017 456.8835 0.061 -1.4380
8018 456.9405 0.060 -1.4350
8019 456.9975 0.060 -1.0720
8020 457.0545 0.060 -0.8823
8021 457.1115 0.060 -0.7917
8022 457.1685 0.060 -0.7481", header=TRUE)
with(d, {
inner.start <- which.max(Potential.V != 0.06)
inner.end <- nrow(d) - which.max(rev(Potential.V != .06)) + 1
d[inner.start:inner.end, ]
})
# Number Time.s Potential.V Current.A
# 6 19 0.9975 0.061 -0.1370
# 7 19 0.9975 0.060 -0.1370
# 8 20 1.0545 0.062 0.1295
# 9 21 1.1115 0.063 0.2680
# 10 8013 456.6555 0.066 -1.1070
# 11 8014 456.7125 0.065 -1.1850
# 12 8015 456.7695 0.063 -1.2610
# 13 8016 456.8265 0.062 -1.3460
# 14 8017 456.8835 0.061 -1.4380
If you want to include the 0.06 row just before and after the inner range, subtract 1 from inner.start and add 1 to inner.end.
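In code, that variant would be something like:
with(d, {
  inner.start <- which.max(Potential.V != 0.06)
  inner.end <- nrow(d) - which.max(rev(Potential.V != 0.06)) + 1
  d[(inner.start - 1):(inner.end + 1), ]   # one bordering 0.06 row kept on each side
})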
Here's one using rle:
filter.df <- function(df) {
  pot.rle <- rle(df$Potential.V)     # runs of consecutive Potential.V values
  idx <- cumsum(pot.rle$lengths)     # end index of each run
  val <- pot.rle$values
  chk <- val[1] == 0.06 && val[length(val)] == 0.06   # leading and trailing 0.06 runs present?
  if (chk) {
    # keep the inner range plus one bordering 0.06 row on each side
    df[(idx[1]):(max(idx[1], idx[length(idx)-1])+1), ]
  }
}
filter.df(df)
# Number Time.s Potential.V Current.A
# 5 18 0.9405 0.060 -0.5983
# 6 19 0.9975 0.061 -0.1370
# 7 20 1.0545 0.062 0.1295
# 8 21 1.1115 0.063 0.2680
# 9 8013 456.6555 0.066 -1.1070
# 10 8014 456.7125 0.065 -1.1850
# 11 8015 456.7695 0.063 -1.2610
# 12 8016 456.8265 0.062 -1.3460
# 13 8017 456.8835 0.061 -1.4380
# 14 8018 456.9405 0.060 -1.4350
Here's another one, quite similar, also with rle :
val <- rle(df$Potential.V)
# drop leading 0.06 rows, keeping the last one of the run
if (val$values[1]==0.06) df <- df[-(1:(val$lengths[1]-1)),]
# drop trailing 0.06 rows, keeping the first one of the run
if (tail(val$values,1)==0.06) {
  nb <- nrow(df)
  df <- df[-((nb-tail(val$lengths,1)+2):nb),]
}
It gives:
Number Time.s Potential.V Current.A
5 18 0.9405 0.060 -0.5983
6 19 0.9975 0.061 -0.1370
7 20 1.0545 0.062 0.1295
8 21 1.1115 0.063 0.2680
9 8013 456.6555 0.066 -1.1070
10 8014 456.7125 0.065 -1.1850
11 8015 456.7695 0.063 -1.2610
12 8016 456.8265 0.062 -1.3460
13 8017 456.8835 0.061 -1.4380
14 8018 456.9405 0.060 -1.4350

Mapping spatial Distributions in R

My data set includes 17 stations, and for each station there are 24 hourly temperature values.
I would like to map each station's value for each hour, and do so for all of the hours.
What I want to do is something like the image.
The data is in the following format:
N2 N3 N4 N5 N7 N8 N10 N12 N13 N14 N17 N19 N25 N28 N29 N31 N32
1 1.300 -0.170 -0.344 2.138 0.684 0.656 0.882 0.684 1.822 1.214 2.046 2.432 0.208 0.312 0.530 0.358 0.264
2 0.888 -0.534 -0.684 1.442 -0.178 -0.060 0.430 -0.148 1.420 0.286 1.444 2.138 -0.264 -0.042 0.398 -0.196 -0.148
3 0.792 -0.564 -0.622 0.998 -0.320 1.858 -0.036 -0.118 1.476 0.110 0.964 2.048 -0.480 -0.434 0.040 -0.538 -0.322
4 0.324 -1.022 -1.128 1.380 -0.792 1.042 -0.054 -0.158 1.518 -0.102 1.354 2.386 -0.708 -0.510 0.258 -0.696 -0.566
5 0.650 -0.774 -0.982 1.124 -0.540 3.200 -0.052 -0.258 1.452 0.028 1.022 2.110 -0.714 -0.646 0.266 -0.768 -0.532
6 0.670 -0.660 -0.844 1.248 -0.550 2.868 -0.098 -0.240 1.380 -0.012 1.164 2.324 -0.498 -0.474 0.860 -0.588 -0.324
MeteoSwiss
1 -0.6
2 -1.2
3 -1.0
4 -0.8
5 -0.4
6 -0.2
where N2, N3, ..., MeteoSwiss are the stations and each row gives each station's temperature for one hour. The station coordinates are:
id Longitude Latitude
2 7.1735 45.86880001
3 7.17254 45.86887001
4 7.171636 45.86923601
5 7.18018 45.87158001
7 7.177229 45.86923001
8 7.17524 45.86808001
10 7.179299 45.87020001
12 7.175189 45.86974001
13 7.179379 45.87081001
14 7.175509 45.86932001
17 7.18099 45.87262001
19 7.18122 45.87355001
25 7.15497 45.87058001
28 7.153399 45.86954001
29 7.152649 45.86992001
31 7.154419 45.87004001
32 7.156099 45.86983001
MeteoSwiss 7.184 45.896
I define a toy example more or less resembling your data:
vals <- matrix(rnorm(24*17), nrow=24)
cds <- data.frame(id=paste0('N', 1:17),
Longitude=rnorm(n=17, mean=7.1),
Latitude=rnorm(n=17, mean=45.8))
vals <- as.data.frame(t(vals))
names(vals) <- paste0('H', 1:24)
The sp package defines several classes and methods to store and display spatial data. For your example you should use the SpatialPointsDataFrame class:
library(sp)
mySP <- SpatialPointsDataFrame(coords=cds[,-1], data=data.frame(vals))
and the spplot method to display the information:
spplot(mySP, as.table=TRUE,
col.regions=bpy.colors(10),
alpha=0.8, edge.col='black')
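Applied to the station data in the question, that might look roughly like this (only a sketch; temps and coords are assumed names for the hourly table and the coordinate table read in as data frames, with the stations in the same order in both):
library(sp)
hourly <- as.data.frame(t(temps))                  # one row per station, one column per hour
names(hourly) <- paste0("H", seq_len(ncol(hourly)))
stationSP <- SpatialPointsDataFrame(coords = coords[, c("Longitude", "Latitude")],
                                    data   = hourly)
spplot(stationSP, as.table = TRUE, col.regions = bpy.colors(10))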
Besides, you may find the spacetime package useful (see the accompanying paper in the Journal of Statistical Software).
