Xtabs to xts in R

I have data of class xtabs in R that looks like below:
Tenor
Last Update    1Y    2Y    3Y    4Y    5Y    7Y   10Y   15Y   20Y   25Y   30Y
2011-04-15  0.666 1.315 2.105 2.780 3.355 4.180 4.807 5.233 5.411 5.504 5.504
2011-04-18  0.653 1.280 2.053 2.727 3.311 4.142 4.785 5.206 5.395 5.491 5.486
2011-04-19  0.652 1.273 2.053 2.730 3.312 4.143 4.771 5.201 5.380 5.468 5.461
2011-04-20  0.655 1.293 2.092 2.766 3.356 4.181 4.796 5.227 5.402 5.490 5.503
2011-04-21  0.644 1.281 2.079 2.772 3.337 4.171 4.805 5.231 5.409 5.485 5.503
2011-04-25  0.635 1.261 2.047 2.734 3.314 4.141 4.762 5.181 5.361 5.449 5.494
The function I am trying to use is Nelson.Siegel from the YieldCurve package. It expects input with exactly this structure, but in xts format. I tried as.xts(), but it does not work. How can I convert from xtabs to xts while keeping the same structure?

Apparently there is not much you can do to convert xtabs directly to xts. I rebuilt the structure using reshape(), converted it into a data frame, and finally used xts() to coerce it to xts format.
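For reference, a shorter route that avoids reshape(), assuming the xtabs object is called tab and that its row names are the dates and its column names the tenors (both assumptions taken from the printout above):
library(xts)
# Rebuild the xtabs as a plain numeric matrix, keeping the dimnames
m <- matrix(as.numeric(tab), nrow = nrow(tab), dimnames = dimnames(tab))
rates <- xts(m, order.by = as.Date(rownames(m)))
# rates should now be usable with Nelson.Siegel(), e.g.:
library(YieldCurve)
Nelson.Siegel(rate = rates, maturity = c(1, 2, 3, 4, 5, 7, 10, 15, 20, 25, 30))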

Related

Delete entire top and bottom rows if value is lower than -50

I have the below data set:
Profit     MRO 15x5   D30
$150.00    -9.189     -0.24
$12.50     -6.076     -0.248
-$125.00   -7.699     -0.282
-$162.50   -8.008     -0.281
-$175.00   -0.183     -0.056
-$175.00   -0.235     -0.061
$275.00     0.141     -0.027
-$175.00   -4.062     -0.103
-$162.50   -5.654     -0.258
-$162.50   -1.578     -0.051
-$175.00   -3.336     -0.205
-$162.50   -1.523     -0.022
$412.50    -1.524     -0.194
$337.50    -1.049     -0.055
$100.00    -1.043     -0.059
I want to first arrange column D30 in ascending order and then look at the Profit column. If the values in the top n rows or bottom n rows (a range of cells) are less than -50 in the Profit column, then delete those entire rows from the data set.
The result would be like this:
Profit     MRO 15x5   D30
$275.00     0.141     -0.027
-$162.50   -1.578     -0.051
$337.50    -1.049     -0.055
-$175.00   -0.183     -0.056
$100.00    -1.043     -0.059
-$175.00   -0.235     -0.061
-$175.00   -4.062     -0.103
$412.50    -1.524     -0.194
-$175.00   -3.336     -0.205
$150.00    -9.189     -0.24
$12.50     -6.076     -0.248
This output results from deleting the first row and the bottom three rows of the sorted data set, as those rows had Profit values less than -50.
Can anyone please help me to do this in the R program using dplyr or by using some other filtering packages?
I would be thankful for your kind support.
Regards,
Farhan
Use cumany. Combined with filter(), it drops the leading rows until the condition (here Profit > -50) is first met; sorting in each direction in turn trims both ends.
The first command parses your Profit column into a numeric column.
library(dplyr)
library(readr)    # parse_number()
library(stringr)  # str_replace()
data %>%
  mutate(Profit = parse_number(str_replace(Profit, "^-\\$(.*)$", "$-\\1"))) %>%
  arrange(D30) %>%
  filter(cumany(Profit > -50)) %>%   # drop leading rows until Profit > -50
  arrange(desc(D30)) %>%
  filter(cumany(Profit > -50))       # repeat from the other end
Profit MRO_15x5 D30
1 275.0 0.141 -0.027
2 -162.5 -1.578 -0.051
3 337.5 -1.049 -0.055
4 -175.0 -0.183 -0.056
5 100.0 -1.043 -0.059
6 -175.0 -0.235 -0.061
7 -175.0 -4.062 -0.103
8 412.5 -1.524 -0.194
9 -175.0 -3.336 -0.205
10 150.0 -9.189 -0.240
11 12.5 -6.076 -0.248
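For intuition, cumany() is a cumulative any(): once the condition becomes TRUE it stays TRUE, so filter() keeps everything from the first qualifying row onward.
library(dplyr)
cumany(c(FALSE, FALSE, TRUE, FALSE, TRUE))
#> [1] FALSE FALSE  TRUE  TRUE  TRUE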

ggplot2 Legend order - Multiple mappings?

I am generating scatterplots of some acoustics data. Because of the way the data are mapped in ggplot, I am unable to reorder the legend in the traditional way by renaming the factor levels.
Here is the toy example for you:
tickpointer$stt
[1] 0.000 0.166 0.324 0.515 0.734 0.909 1.059 1.191 1.319 1.445 1.561 1.683
tickpointer$highhz
[1] 7338 7073 7312 7948 7975 7709 7895 7736 8293 8373 8081 9920
tickpointer$peakhz
[1] 4969 4969 4969 4969 5250 5438 3938 4125 4594 4875 5531 7125
tickpointer$lowhz
[1] 2351 2404 2510 2351 2802 2908 2882 3174 3598 4022 4580 5587
i.e. a data frame with the columns highhz, peakhz, lowhz, and stt.
I plotted each of these in a ggplot:
require(ggplot2)
tickpointer
ggplot(tickpointer, aes(stt)) +  # stt is chosen as the x value to have a standardized start time
  geom_line(aes(y = highhz, col = "High")) +
  geom_point(aes(y = highhz, col = "High")) +
  geom_line(aes(y = peakhz, col = "Peak")) +
  geom_point(aes(y = peakhz, col = "Peak")) +
  geom_line(aes(y = lowhz, col = "Low")) +
  geom_point(aes(y = lowhz, col = "Low"))
Notice the legend is in the order: High, Peak, Low.
I would like it to be: High, Low, Peak.
Any leads or help trying to figure this out would be great.
Cheers
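A minimal sketch of one common approach (an assumption, not from the original thread): pass the desired order to the breaks argument of the colour scale, leaving the mappings untouched.
library(ggplot2)
ggplot(tickpointer, aes(stt)) +
  geom_line(aes(y = highhz, col = "High")) +
  geom_point(aes(y = highhz, col = "High")) +
  geom_line(aes(y = peakhz, col = "Peak")) +
  geom_point(aes(y = peakhz, col = "Peak")) +
  geom_line(aes(y = lowhz, col = "Low")) +
  geom_point(aes(y = lowhz, col = "Low")) +
  scale_colour_discrete(breaks = c("High", "Low", "Peak"))  # legend: High, Low, Peak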

Error: faceting variables must have at least one value

I have the following dataset:
Col1 Col2 Col3 Col4 Col5 Col6
4439.5 6.5211 50.0182 29.4709 -0.0207 0.0888
4453 25.1186 46.5586 34.1279 -0.0529 0.082
4453.5 24.2974 46.6291 30.6281 -0.057 0.0809
4457.5 25.3257 49.6885 26.2664 -0.0357 0.0837
4465 7.1077 53.516 32.5077 -0.0398 0.1099
4465.5 7.5892 53.0884 33.1582 -0.0395 0.1128
4898.5 8.8296 55.0611 40.3813 -0.0123 0.1389
4899 9.2469 54.4799 37.1927 -0.0061 0.1354
4900 13.4119 50.8334 28.9441 -0.0272 0.1071
4900.5 21.8415 50.1127 24.2351 -0.0375 0.0882
4905 11.3824 52.4024 37.2646 -0.0324 0.1215
4918.5 6.2601 49.9454 27.715 0.0101 0.1444
4919 7.4157 49.7412 25.6159 -0.0164 0.1038
4932 25.737 46.2825 38.6334 -0.0425 0.0717
5008.5 13.641 49.7868 18.0337 -0.0213 0.111
5010.5 13.5935 49.5352 23.9319 -0.0518 0.0979
5012 16.6945 48.0672 25.2408 -0.0446 0.0985
5014.5 14.1303 49.6361 23.1816 -0.0455 0.1056
5040 7.6895 49.8688 31.562 -0.0138 0.126
5044 12.594 60.822 52.4569 0.0481 0.1877
5045.5 10.3719 56.443 43.3782 0.0076 0.1403
5046 8.1382 54.5388 46.2675 0.01 0.1443
5051.5 29.0142 46.8052 43.3224 -0.0465 0.0917
5052 32.3053 46.4278 32.9387 -0.0509 0.0868
5052.5 38.4807 45.3555 24.4187 -0.0619 0.0774
5053 38.8954 43.8459 21.8487 -0.0688 0.0681
5055 19.69 50.9335 46.9419 -0.0527 0.0897
5055.5 11.7398 51.8329 59.5443 -0.0307 0.1083
5056 13.3196 51.8329 55.4419 -0.0276 0.1262
5056.5 18.3702 51.7003 39.232 -0.0408 0.1105
5057.5 14.0531 50.1129 24.4546 -0.0444 0.0921
5058 15.292 49.8805 23.0938 -0.0347 0.0925
5059 20.5135 49.52 21.6173 -0.0333 0.1006
5060 14.5151 47.5836 27.0685 -0.0156 0.1062
5060.5 14.5188 48.2506 27.9704 -0.0363 0.1018
5228 1.2168 54.2009 17.4351 0.0583 0.1794
5229 3.5896 51.7649 26.1107 -0.0033 0.1362
5232.5 2.7404 53.5941 38.6852 0.0646 0.194
5233 3.6694 53.9483 36.674 0.0633 0.204
5234 1.3789 53.8741 18.5804 0.0693 0.1958
5234.5 0.8592 53.6052 18.1654 0.0742 0.1982
5237 2.6951 52.3763 24.8098 0.0549 0.1923
I am trying to create an R visual that will break out each Column into facets, using Col1 as the identity column.
To do this I am using this (faulty) code:
library(reshape2)
library(plotly)  # plotly attaches ggplot2 as a dependency

plot.data <- dataset
melted <- melt(dataset, id.vars = "Col1")
sp <- ggplot(melted, aes(x = Col1, y = value)) + geom_line()
# Divide by variable in the vertical direction
sp + facet_grid(variable ~ .)
ggplotly()
However, I am receiving an error saying:
Faceting variables must have at least one value
I know this is an unlikely solution, but did you make sure all your filters are correct and are not filtering out values somehow? I find that filters are often a source of mistakes for me, so if the code works in R on the unfiltered data, that could be the problem.
I had the same error, and it was my filtering.
Example:
I did data <- data[data$symbol == geneId,] instead of data <- data[data$symbol %in% geneId,]
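As a quick diagnostic (a sketch, assuming the melted object from the question): this error usually means no rows reach facet_grid(), so inspect the data right before plotting.
nrow(melted)             # 0 here would explain the error
table(melted$variable)   # every facet level should have at least one row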

Issues with merging multiple dataframes (with varying rows) using a common column

Suppose I have many data frames that have varying numbers of rows but share a common Date column, e.g.:
DF1:
Date Index Change
05-04-17 29911.55 0
03-04-17 29910.22 0.0098
31-03-17 29620.5 -0.0009
30-03-17 29647.42 0.0039
29-03-17 29531.43 0.0041
28-03-17 29409.52 0.0059
27-03-17 29237.15 -0.0063
24-03-17 29421.4 0.003
And
DF2:
Date NG NG_Change
05-04-17 213.8 0.0047
04-04-17 212.8 0.0421
03-04-17 204.2 -0.0078
31-03-17 205.8 -0.0068
30-03-17 207.2 -0.0166
29-03-17 210.7 0.0483
28-03-17 201 0.005
27-03-17 200 -0.0015
24-03-17 200.3 0.0137
And another one:
DF3:
Date TI_Price TI_Change
05-04-17 51.39 0.0071
04-04-17 51.03 0.0157
03-04-17 50.24 -0.0071
31-03-17 50.6 0.005
30-03-17 50.35 0.017
29-03-17 49.51 0.0236
28-03-17 48.37 0.0134
I want to combine them, using the Date column as the common variable, so that the final result contains only those rows whose dates are common to all of them, such as:
Date TI_Price TI_Change NG NG_Change TI_Price TI_Change
05-04-17 51.39 0.0071 213.8 0.0047 51.39 0.0071
04-04-17 51.03 0.0157 212.8 0.0421 51.03 0.0157
03-04-17 50.24 -0.0071 204.2 -0.0078 50.24 -0.0071
31-03-17 50.6 0.005 205.8 -0.0068 50.6 0.005
30-03-17 50.35 0.017 207.2 -0.0166 50.35 0.017
29-03-17 49.51 0.0236 210.7 0.0483 49.51 0.0236
28-03-17 48.37 0.0134 201 0.005 48.37 0.0134
I am just wondering whether there is any method to merge them in one go, unlike the merge() function, which takes two data frames at a time (DF1 and DF2 first, then that result with DF3).
What I used and tweaked around with (to no avail):
myfulldata = merge(DF1, DF2, all.x = TRUE)
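The usual one-shot idiom is to fold merge() over a list with Reduce(); a sketch assuming the three data frames above. By default merge() keeps only rows whose Date occurs in both inputs, which gives the desired common-dates result.
dfs <- list(DF1, DF2, DF3)  # add as many data frames as needed
myfulldata <- Reduce(function(x, y) merge(x, y, by = "Date"), dfs)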

Read the data from a text file and reshape the data in R

I have a data set covering different time intervals. For each time interval there are three comment lines followed by 500 data points. I want to reshape the data set into the following format:
t1 t2 t3 ................
0.00208 0.00417 0.00625 .................
a1 a2 a3 ...................
b1 b2 b3 ...................
c1 c2 c3 .................
...............................
................................
The link to the file is as follows: https://www.dropbox.com/s/hc8n3qcai1mlxca/WAT_DEP.DAT
As you will see in the file, the time for each interval is the second value on the third of the header lines that precede the data. For the first interval, t = 0.00208. I need to reshape the data from several rows into one column, and finally create a data frame in the format shown above. In the sample above, a1, b1, c1 are the data for time t1, and so on.
I am sorry for posting a relatively large data set.
Thank you for the help.
Sample data added
The sample data is as follows:
** N:SNAPSHOT TIME DELT[S]
** WATER DEPTH [M]: (HP(L),L=2,LA)
1800 0.00208 0.10000
3.224 3.221 3.220 3.217 3.216 3.214 3.212 3.210 3.209 3.207
3.205 3.203 3.202 3.200 3.199 3.197 3.196 3.193 3.192 3.190
3.189 3.187 3.186 3.184 3.184 3.182 3.181 3.179 3.178 3.176
3.175 3.174 3.173 3.171 3.170 3.169 3.168 3.167 3.166 3.164
3.164 3.162 3.162 3.160 3.160 3.158 3.158 3.156 3.156 3.155
3.154 3.153 3.152 3.151 3.150 3.150 3.149 3.149 3.147 3.147
3.146 3.146 3.145 3.145 3.144 3.144 3.143 3.143 3.142 3.142
3.141 3.142 3.141 3.141 3.140 3.141 3.140 3.140 3.139 3.140
3.139 3.140 3.139 3.140 3.139 3.140 3.139 3.140 3.139 3.140
3.139 3.140 3.140 3.140 3.140 3.141 3.141 3.142 3.141 3.142
3.142 3.142 3.143 3.143 3.144 3.144 3.145 3.145 3.146 3.146
3.147 3.148 3.149 3.149 3.150 3.150 3.152 3.152 3.153 3.154
3.155 3.156 3.157 3.158 3.159 3.160 3.161 3.162 3.163 3.164
3.165 3.166 3.168 3.169 3.170 3.171 3.173 3.174 3.176 3.176
3.178 3.179 3.181 3.182 3.184 3.185 3.187 3.188 3.190 3.191
3.194 3.195 3.196 3.198 3.199 3.202 3.203 3.205 3.207 3.209
3.210 3.213 3.214 3.217 3.218 3.221 3.222 3.225 3.226 3.229
3.231 3.233 3.235 3.238 3.239 3.242 3.244 3.247 3.248 3.251
3.253 3.256 3.258 3.261 3.263 3.266 3.268 3.271 3.273 3.276
3.278 3.281 3.283 3.286 3.289 3.292 3.294 3.297 3.299 3.303
3.305 3.307 3.311 3.313 3.317 3.319 3.322 3.325 3.328 3.331
3.334 3.337 3.340 3.343 3.347 3.349 3.353 3.356 3.359 3.362
3.366 3.369 3.372 3.375 3.379 3.382 3.386 3.388 3.392 3.395
3.399 3.402 3.406 3.409 3.413 3.416 3.420 3.423 3.427 3.430
3.435 3.438 3.442 3.445 3.449 3.453 3.457 3.460 3.464 3.468
3.472 3.475 3.479 3.483 3.486 3.491 3.494 3.498 3.502 3.506
3.510 3.514 3.518 3.522 3.526 3.531 3.534 3.539 3.542 3.547
3.551 3.555 3.559 3.564 3.567 3.572 3.576 3.581 3.584 3.589
3.593 3.598 3.602 3.606 3.610 3.615 3.619 3.624 3.628 3.633
3.637 3.642 3.646 3.651 3.655 3.660 3.664 3.669 3.673 3.678
3.682 3.686 3.691 3.695 3.700 3.704 3.710 3.714 3.719 3.723
3.728 3.733 3.738 3.742 3.747 3.752 3.757 3.761 3.766 3.771
3.776 3.780 3.786 3.790 3.795 3.800 3.805 3.810 3.815 3.819
3.825 3.829 3.835 3.839 3.845 3.849 3.855 3.859 3.865 3.869
3.875 3.879 3.885 3.889 3.895 3.900 3.905 3.910 3.915 3.920
3.926 3.930 3.935 3.941 3.945 3.951 3.956 3.961 3.966 3.972
3.976 3.982 3.987 3.993 3.997 4.003 4.008 4.014 4.018 4.024
4.029 4.035 4.039 4.045 4.050 4.056 4.061 4.066 4.071 4.077
4.082 4.088 4.093 4.099 4.103 4.109 4.114 4.120 4.125 4.131
4.136 4.142 4.147 4.153 4.157 4.163 4.168 4.174 4.179 4.185
4.190 4.195 4.201 4.206 4.212 4.217 4.223 4.228 4.234 4.239
4.245 4.250 4.256 4.261 4.267 4.272 4.278 4.283 4.289 4.294
4.300 4.305 4.311 4.316 4.322 4.327 4.333 4.339 4.345 4.350
4.356 4.361 4.367 4.372 4.378 4.383 4.389 4.394 4.400 4.405
4.411 4.417 4.423 4.428 4.434 4.439 4.445 4.450 4.456 4.461
4.467 4.473 4.478 4.484 4.489 4.495 4.500 4.506 4.511 4.517
4.523 4.529 4.534 4.540 4.545 4.551 4.556 4.562 4.568 4.574
4.579 4.585 4.590 4.596 4.601 4.607 4.613 4.619 4.624 4.630
4.635 4.641 4.646 4.652 4.658 4.664 4.669 4.675 4.680 4.686
4.691 4.697 4.703 4.709 4.714 4.720 4.725 4.731 4.736 4.741
** N:SNAPSHOT TIME DELT[S]
** WATER DEPTH [M]: (HP(L),L=2,LA)
3600 0.00417 0.10000
4.124 4.123 4.123 4.122 4.122 4.121 4.121 4.120 4.120 4.119
4.118 4.117 4.117 4.116 4.116 4.115 4.115 4.114 4.114 4.114
4.114 4.113 4.113 4.112 4.112 4.111 4.111 4.110 4.110 4.109
4.109 4.109 4.109 4.108 4.108 4.107 4.107 4.106 4.107 4.106
4.106 4.105 4.105 4.105 4.105 4.104 4.104 4.104 4.104 4.103
4.103 4.103 4.102 4.102 4.102 4.102 4.101 4.102 4.101 4.101
4.101 4.101 4.100 4.101 4.100 4.101 4.100 4.100 4.100 4.100
4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.100
4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.100 4.101
4.100 4.101 4.100 4.101 4.101 4.101 4.101 4.102 4.101 4.102
4.102 4.101 4.102 4.102 4.103 4.102 4.103 4.103 4.104 4.103
4.104 4.104 4.105 4.104 4.105 4.105 4.106 4.106 4.107 4.106
4.107 4.107 4.108 4.108 4.109 4.109 4.110 4.110 4.110 4.110
4.111 4.111 4.112 4.112 4.113 4.113 4.114 4.114 4.115 4.115
4.116 4.116 4.117 4.117 4.118 4.118 4.120 4.120 4.121 4.121
4.122 4.122 4.122 4.123 4.123 4.125 4.125 4.126 4.126 4.127
4.128 4.129 4.129 4.130 4.130 4.132 4.132 4.133 4.133 4.135
4.135 4.136 4.137 4.138 4.138 4.139 4.140 4.141 4.141 4.143
4.143 4.145 4.145 4.146 4.147 4.148 4.149 4.150 4.150 4.152
4.152 4.154 4.154 4.156 4.156 4.158 4.158 4.160 4.160 4.162
4.162 4.163 4.164 4.165 4.166 4.167 4.168 4.169 4.171 4.171
4.173 4.173 4.175 4.176 4.177 4.178 4.180 4.180 4.182 4.183
4.184 4.185 4.187 4.187 4.189 4.190 4.192 4.192 4.194 4.195
4.197 4.197 4.199 4.200 4.202 4.203 4.204 4.205 4.207 4.208
4.210 4.210 4.212 4.213 4.215 4.216 4.218 4.219 4.221 4.221
4.223 4.224 4.225 4.227 4.228 4.230 4.231 4.233 4.234 4.236
4.237 4.239 4.240 4.242 4.243 4.245 4.246 4.248 4.249 4.251
4.252 4.254 4.255 4.257 4.258 4.260 4.262 4.264 4.265 4.267
4.268 4.270 4.271 4.273 4.275 4.277 4.278 4.280 4.281 4.283
4.285 4.287 4.288 4.290 4.291 4.294 4.295 4.297 4.298 4.301
4.302 4.303 4.305 4.307 4.309 4.310 4.312 4.314 4.316 4.317
4.320 4.321 4.323 4.325 4.327 4.328 4.331 4.332 4.334 4.336
4.338 4.339 4.342 4.343 4.346 4.347 4.349 4.351 4.353 4.355
4.357 4.359 4.361 4.362 4.365 4.366 4.369 4.370 4.373 4.374
4.377 4.378 4.381 4.382 4.385 4.386 4.389 4.390 4.393 4.394
4.397 4.398 4.400 4.402 4.404 4.406 4.408 4.411 4.412 4.415
4.416 4.419 4.421 4.423 4.425 4.427 4.429 4.432 4.433 4.436
4.437 4.440 4.442 4.444 4.446 4.449 4.450 4.453 4.455 4.457
4.459 4.462 4.463 4.466 4.468 4.470 4.472 4.475 4.476 4.479
4.481 4.484 4.485 4.488 4.490 4.492 4.494 4.497 4.499 4.501
4.503 4.505 4.508 4.509 4.512 4.514 4.517 4.519 4.521 4.523
4.526 4.528 4.530 4.532 4.535 4.537 4.540 4.541 4.544 4.546
4.549 4.551 4.554 4.555 4.558 4.560 4.563 4.565 4.568 4.569
4.572 4.574 4.577 4.579 4.582 4.584 4.586 4.588 4.591 4.593
4.596 4.598 4.601 4.603 4.605 4.607 4.610 4.612 4.615 4.617
4.620 4.622 4.624 4.627 4.628 4.631 4.633 4.636 4.638 4.641
4.643 4.646 4.648 4.651 4.653 4.656 4.657 4.660 4.662 4.665
4.667 4.670 4.672 4.675 4.677 4.680 4.682 4.685 4.687 4.690
4.692 4.695 4.697 4.700 4.702 4.705 4.706 4.709 4.711 4.714
4.716 4.719 4.721 4.724 4.726 4.729 4.731 4.734 4.736 4.741
Currently I have ten columns of data for each time. I want to turn that into a single column of 500 data points, taking the values row by row (first the values on row 1, then row 2, and so on), so that each time ends up as one column.
This produces a matrix, result, containing the times in the first row and the data in columns underneath the corresponding time.
L <- readLines(infile)
nt <- length(grep("TIME", L))    # no. of TIME header lines = no. of blocks
nd <- round(length(L) / nt - 3)  # no. of data lines per block (3 header lines each)

# times: the second field of the third header line of each block
ix.times <- rep(c(FALSE, TRUE, FALSE), c(2, 1, nd))
times <- scan(text = L[ix.times])[c(FALSE, TRUE, FALSE)]

# data: the nd lines after the 3 header lines of each block;
# scan() reads them row by row and matrix() fills one column per block
ix.dat <- rep(c(FALSE, TRUE), c(3, nd))
dat <- matrix(scan(text = L[ix.dat]), ncol = nt)

result <- rbind(times, dat)
The first few rows are:
> head(result)
[,1] [,2]
times 0.00208 0.00417
3.22400 4.12400
3.22100 4.12300
3.22000 4.12300
3.21700 4.12200
3.21600 4.12200
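If you also want the data frame layout sketched in the question (columns t1, t2, ... with the snapshot time as the first data row), a small follow-up on top of the snippet above:
df <- as.data.frame(result)            # row 1 holds the times, the rest the depths
names(df) <- paste0("t", seq_len(nt))  # t1, t2, ...
head(df)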
For the first part of your question: one idea to remove the comment lines is to use recycling. First, read all the data using fill = TRUE:
dat <- read.table(file = file.Name, fill = TRUE)
Then, since each block has a fixed number of rows (3 header lines followed by 50 data lines of 10 values each), you can do this:
dat <- dat[c(rep(FALSE, 3), rep(TRUE, 50)), ]  # the logical index recycles over the blocks
You will get a clean data.frame.
I don't get the second part of your question.
Second part solution:
First, call the sample data sample. I assume two columns in the solution below; you can use lapply to apply the same step to the other columns.
col.1 <- as.data.frame(sample[, 1])
col.2 <- as.data.frame(sample[, 2])
Now col.1 and col.2 are data frames. Make sure they have the same colnames for `rbind` to work:
sample.1 <- rbind(col.1, col.2)
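A sketch of the lapply generalization mentioned above, assuming sample is a data frame whose columns are all depth series:
cols <- lapply(seq_len(ncol(sample)), function(i) {
  setNames(as.data.frame(sample[, i]), "depth")  # common column name so rbind works
})
sample.long <- do.call(rbind, cols)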
