Month language in the as.Date function - R

I can't figure out how to solve this problem. Here is a piece of my df:
structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("B0", "B1", "B12", "B2", "B21", "B22", "B26",
"B3", "B33", "B4", "B7", "P1", "P21", "P24", "P24 ", "P25", "P27",
"P28", "P29"), class = "factor"), Date = structure(c(9839, 9946,
10045, 10133, 10190, 10302, 10354, 10423, 10528, 10676, 10756,
10841, 10904, 11032, 11129, 11227, 11290, 11390, 11485, 11571,
11645, 11725, 11843, 11928, 12003, 12128, 12221, 12305, 12380,
12499, 12549, 12640, 12716, 12856, 12926, 12996, 13104, 13580,
13671, 13759, 13802), class = "Date"), T = c(9.6, 10.1, 10.4,
9.9, 9.4, 9.8, 10, 9.8, 9.8, 9.9, 10.3, 10.6, 9.9, 10, 10.3,
10.1, 10.3, 10, 10.2, 10.4, 10.1, 10.1, 10.1, 10.5, 10.3, NA,
NA, NA, NA, 10.3, 10.4, 10.9, 10.6, 10.4, 10.7, 10.2, 10, 10.2,
10.6, 10.5, 10.4), ph = c(6.9, 7.08, 6.96, 7, 7, 6.97, 6.92,
7.02, 6.93, 6.91, 6.83, 6.87, 6.8, 6.92, 7.02, 6.94, 6.94, 6.86,
6.9, 6.89, 6.9, 6.97, 6.92, 6.93, 6.91, 6.88, 6.93, 6.78, 6.87,
6.91, 6.82, 6.91, 6.98, 6.99, 6.79, 6.91, 6.61, 6.86, 6.93, 6.88,
6.74), EC = c(2810, 3020, 2170, 2511, 1695, 3100, 2510, 1759,
1101, 3330, 5370, 3300, 3210, 921, 2300, 3380, 3340, 2850, 3430,
3510, 3450, 3400, 3280, 3170, 3210, 3250, 3010, 2970, 3080, 3120,
3100, 3040, 3100, 2940, 3050, 3070, 3040, 2270, 2990, 2830, 3010
), O2 = c(0.1, 0.1, 1.3, 0.2, 0.2, 0.1, NA, 0.2, 0.1, 0.2, 0.1,
NA, NA, NA, 0.1, 0.1, NA, 0.1, 0.1, 0.2, NA, NA, 0.2, 0.1, 0.1,
0, 0, 0.1, 0.4, 0.2, 0.2, 0.3, 0.2, 0.1, 0.1, 0.3, 0.7, 0.2,
0.4, 0.2, 0.2), Cl = c(696, 718, 722, 856, 776, 752, 745, 788,
822, 727, 650, 800, 766, 700, 800, 720, 760, 710, 730, 720, 810,
610, 720, 830, 820, 740, 670, 510, 710, 500, 640, 630, 650, 430,
660, 660, 630, 560, 680, 670, 670), SO4 = c(152, 111, 133, 245,
194, 110, 105, 104, 185, 156, 137, 194, 196, 170, 220, 230, 240,
200, 220, 200, 220, 170, 230, 210, 240, 280, 240, 190, 260, 360,
280, 250, 220, 380, 240, 240, 230, 320, 220, 210, 220), NO2 = c(NA,
NA, NA, NA, NA, NA, NA, 0.0067, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0.015, NA, NA, NA, NA, NA, 0.01,
NA, NA, NA, 0.031, NA, NA, NA, NA, NA, NA, NA), NO3 = c(0.15,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 0.06, NA, 0.02, NA, 0.02, 0.07, 0.2,
0.02, NA, NA, NA, 0.05, 0.08, NA, NA, NA, NA, NA), Fe = c(22,
20, NA, 23, NA, 25, NA, NA, NA, NA, 27, NA, NA, NA, 32, NA, NA,
NA, 33, NA, NA, NA, 33, NA, NA, NA, 29, NA, NA, NA, 9, NA, NA,
NA, 8.3, NA, NA, NA, 17, NA, NA), Mn = c(3.8, 3.8, NA, 4.5, NA,
4.7, NA, NA, NA, NA, 4.9, NA, NA, NA, 5.8, NA, NA, NA, 6, NA,
NA, NA, 6, NA, NA, NA, 5.3, NA, NA, NA, 4.1, NA, NA, NA, 4.2,
NA, NA, NA, 4.9, NA, NA), Month = c("dicembre", "marzo", "luglio",
"settembre", "novembre", "marzo", "maggio", "luglio", "ottobre",
"marzo", "giugno", "settembre", "novembre", "marzo", "giugno",
"settembre", "novembre", "marzo", "giugno", "settembre", "novembre",
"febbraio", "giugno", "agosto", "novembre", "marzo", "giugno",
"settembre", "novembre", "marzo", "maggio", "agosto", "ottobre",
"marzo", "maggio", "agosto", "novembre", "marzo", "giugno", "settembre",
"ottobre")), .Names = c("ID", "Date", "T", "ph", "EC", "O2",
"Cl", "SO4", "NO2", "NO3", "Fe", "Mn", "Month"), row.names = c(NA,
-41L), class = "data.frame")
I converted the Date column into a Date object using:
df$Date <- as.Date(df$Date, "%d.%m.%y")
Then I created the Month column from the Date column:
df$Month <- months(as.Date(df$Date))
But the month names are in Italian, and when I try to create another column as an ordered factor
df$Month_factor <- factor(df$Month, levels = month.name, ordered = TRUE)
the month names are not recognized and the new column contains only NA. So my question is: is it possible to change the language of the months when creating the new column? Otherwise, is it possible for R to treat the months as an ordered factor even if the month names are not in English?
I need the months as an ordered factor because I have to plot the values as explained in this post.

It works for me: there is no Italian left, since you overwrite the column with df$Month <- months(as.Date(df$Date)). Can you please give us your sessionInfo()? Here is mine:
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] Rcpp_0.10.4 TeachingDemos_2.9 fastmatch_1.0-4 fasttime_1.0-0 data.table_1.8.9 bit64_0.9-2
[7] bit_1.1-10 vimcom_0.9-8
loaded via a namespace (and not attached):
[1] tools_3.0.1
Try setting LC_TIME to C as well, with Sys.setlocale("LC_TIME", "C").
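As a hedged sketch of two ways around the locale issue (the three dates are simply the first three values of the Date column in the question, and the variable names are illustrative): the first option derives the month from its number with format(), which does not depend on the locale; the second temporarily switches LC_TIME so that months() returns English names.

```r
# First three dates from the sample data (stored as days since 1970-01-01)
d <- as.Date(c(9839, 9946, 10045), origin = "1970-01-01")

# Option 1: locale-independent. "%m" yields the month number, which is
# then used to index the built-in English month.name vector.
month_en <- month.name[as.integer(format(d, "%m"))]
month_factor <- factor(month_en, levels = month.name, ordered = TRUE)

# Option 2: switch LC_TIME to "C" so that months() returns English names,
# then restore the previous setting.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
month_factor2 <- factor(months(d), levels = month.name, ordered = TRUE)
Sys.setlocale("LC_TIME", old_locale)
```

Option 1 leaves the session locale untouched, which may be preferable if other output should stay in Italian.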

Related

Formattable - Export to PDF

I'd like to accompany my ggplot2 visualisations with nice-looking tables, or in some cases just display the tables. My target audience is not fond of simply being presented with a table; it needs clear indicators of 'where to look', so to speak. For that I've been using formattable (see the pct_change... columns).
I can export the table below in an image format, but I've been unable to fully reproduce it as a PDF. When I export it as HTML and then print from the browser, I lose the colour formatting (see the pct_formatter code at the bottom). I've tried Edge, Firefox and Chrome. Turning on printing in colour does not help. So in addition to being cumbersome (the table below is one of a group of 150), printing via the browser also doesn't give me the desired result.
I've also found a workaround here on Stack Overflow where someone wrote an 'export_formattable' function. This does indeed export to PDF directly from R. However, I again lose the colour, and when I open the file in Adobe Illustrator I also lose the arrow icons, which become [X] boxes. So that doesn't work either.
I haven't really tried R Markdown, to be honest, simply because I'm quite unskilled in using it. From what I tried, it seems it's not made to simply output a table in the way, shape or size I want. I don't want to create a (reproducible) report. I just need a 'nicely' formatted PDF table (or .svg!) that I will then manually combine with a visualisation in InDesign to make the desired document.
Thanks for reading, hope there's some way to help!
pct_formatter <- formatter("span",
  style = x ~ style(
    color =
      ifelse(
        x > 0, "#39870c",
        ifelse(
          x < 0, "#d52b1e",
          "black")
      )
  ), x ~ icontext(ifelse(x > 0, "arrow-up", "arrow-down"), x)
)
Data:
structure(list(Year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2019", "2020",
"2021"), class = "factor"), Month = c(1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3,
4, 5, 6, 7, 8, 9), Totaal_permaand = c(2243L, 2007L, 2884L, 2206L,
2701L, 2325L, 1452L, 1721L, 3152L, 3067L, 3097L, 2554L, 3303L,
2948L, 3325L, 3173L, 3504L, 3209L, 5924L, 4637L, 5735L, 6206L,
4252L, 3479L, 4312L, 3128L, 4529L, 4170L, 3814L, 5587L, 9281L,
4615L, 4426L), abs_change.M = c(NA, -236L, 877L, -678L, 495L,
-376L, -873L, 269L, 1431L, -85L, 30L, -543L, 749L, -355L, 377L,
-152L, 331L, -295L, 2715L, -1287L, 1098L, 471L, -1954L, -773L,
833L, -1184L, 1401L, -359L, -356L, 1773L, 3694L, -4666L, -189L
), pct_change.M = c(NA, -10.5, 43.7, -23.5, 22.4, -13.9, -37.5,
18.5, 83.1, -2.7, 1, -17.5, 29.3, -10.7, 12.8, -4.6, 10.4, -8.4,
84.6, -21.7, 23.7, 8.2, -31.5, -18.2, 23.9, -27.5, 44.8, -7.9,
-8.5, 46.5, 66.1, -50.3, -4.1), abs_change.Y = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1060L, 941L, 441L, 967L,
803L, 884L, 4472L, 2916L, 2583L, 3139L, 1155L, 925L, 1009L, 180L,
1204L, 997L, 310L, 2378L, 3357L, -22L, -1309L), pct_change.Y = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 47.3, 46.9, 15.3,
43.8, 29.7, 38, 308, 169.4, 81.9, 102.3, 37.3, 36.2, 30.5, 6.1,
36.2, 31.4, 8.8, 74.1, 56.7, -0.5, -22.8), abs_change.Y2 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 2069L, 1121L, 1645L, 1964L, 1113L,
3262L, 7829L, 2894L, 1274L), pct_change.Y2 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 92.2, 55.9, 57, 89, 41.2, 140.3, 539.2, 168.2,
40.4), CS = c(2243L, 4250L, 7134L, 9340L, 12041L, 14366L, 15818L,
17539L, 20691L, 23758L, 26855L, 29409L, 3303L, 6251L, 9576L,
12749L, 16253L, 19462L, 25386L, 30023L, 35758L, 41964L, 46216L,
49695L, 4312L, 7440L, 11969L, 16139L, 19953L, 25540L, 34821L,
39436L, 43862L), abs_change.SY = c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1060L, 2001L, 2442L, 3409L, 4212L, 5096L,
9568L, 12484L, 15067L, 18206L, 19361L, 20286L, 1009L, 1189L,
2393L, 3390L, 3700L, 6078L, 9435L, 9413L, 8104L), pct_change.SY = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 47.3, 47.1, 34.2,
36.5, 35, 35.5, 60.5, 71.2, 72.8, 76.6, 72.1, 69, 30.5, 19, 25,
26.6, 22.8, 31.2, 37.2, 31.4, 22.7), abs_change.SY2 = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 2069L, 3190L, 4835L, 6799L, 7912L, 11174L,
19003L, 21897L, 23171L), pct_change.SY2 = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 92.2, 75.1, 67.8, 72.8, 65.7, 77.8, 120.1, 124.8,
112)), row.names = c(NA, -33L), class = c("tbl_df", "tbl", "data.frame"
))
Have you tried to open your exported PDF in Inkscape? I have edited several PDF files in Inkscape and not lost anything from the original PDF.

Add a new row to each dataframe in a list of dataframes in R

I have a list of dataframes, and I have calculated the column means for every column of each dataframe except columns 1 and 2. A small section of my list of dataframes is at the bottom.
I calculated the column means with the line below
df_colmeans <- lapply(df, function(x) colMeans(x[-c(1:2)], na.rm = TRUE))
I want to add the column means back to each dataframe in my list and tried the line below, but the result wasn't correct.
df <- rbind(df[-c(1:2)],df_colmeans)
I get the following warning
In rbind(df[-c(1:2)], df_colmeans) :
number of columns of result is not a multiple of vector length (arg 2)
If there is a way to do it in one step, without creating a separate variable (df_colmeans), that would be ideal.
Any help and suggestions are greatly appreciated.
list(Fe.II. = structure(list(Lab = c("AD", "AD", "AK", "AK",
"AO", "AO", "BQ", "BQ", "CI", "CI", "CL", "CL", "CP", "CP", "CU",
"CU", "CW", "CW", "CZ", "CZ", "DA", "DA", "DC", "DC", "DF", "DF",
"EQ", "EQ", "EY", "EY", "FL", "FL", "FM", "FM", "FO", "FO", "FP",
"FP", "FR", "FR", "GB", "GB", "GC", "GC", "GL", "GL", "GM", "GM",
"GT", "GT", "GY", "GZ", "I", "I", "K", "K", "M", "M", "Q", "Q",
"S", "S", "U", "U", "V", "V", "W", "W"), `Sample Prep` = c(335,
337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337,
335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335,
337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337,
335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 337, 337, 335,
337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337,
335, 337), `1` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.19,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0.17, NA, 0.14, NA, NA, NA, NA, NA), `2` = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0.01, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.13, NA, 0.15, NA, NA,
NA, NA, NA), `3` = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0.02, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0.23, NA, 0.29, NA, NA, NA, NA), `4` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.02, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.22, NA,
0.29, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-68L)), SiO2 = structure(list(Lab = c("AD", "AD", "AK", "AK",
"AO", "AO", "BQ", "BQ", "CI", "CI", "CL", "CL", "CP", "CP", "CU",
"CU", "CW", "CW", "CZ", "CZ", "DA", "DA", "DC", "DC", "DF", "DF",
"EQ", "EQ", "EY", "EY", "FL", "FL", "FM", "FM", "FO", "FO", "FP",
"FP", "FR", "FR", "GB", "GB", "GC", "GC", "GL", "GL", "GM", "GM",
"GT", "GT", "GY", "GZ", "I", "I", "K", "K", "M", "M", "Q", "Q",
"S", "S", "U", "U", "V", "V", "W", "W"), `Sample Prep` = c(335,
337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337,
335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335,
337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337,
335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 337, 337, 335,
337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337,
335, 337), `1` = c(3.65, NA, 3.595, NA, 3.645, NA, 3.66, NA,
3.644, NA, 3.547, NA, 3.648, NA, 3.655, NA, NA, NA, 3.666, NA,
3.686, NA, 3.674, NA, 3.667, NA, 3.59, NA, 3.63, NA, 3.689, NA,
3.64, NA, 3.58, NA, NA, NA, 3.46, NA, 3.71, NA, 3.66, NA, 3.65,
NA, 3.64, NA, 3.503, NA, NA, NA, 3.631, NA, 3.731, NA, 3.656,
NA, NA, NA, 3.75, NA, 3.656, NA, 3.667, NA, 3.684, NA), `2` = c(3.6,
NA, 3.591, NA, 3.646, NA, 3.65, NA, 3.625, NA, 3.548, NA, 3.648,
NA, 3.643, NA, NA, NA, 3.679, NA, 3.69, NA, 3.646, NA, 3.577,
NA, 3.6, NA, 3.63, NA, 3.692, NA, 3.66, NA, 3.59, NA, NA, NA,
3.512, NA, 3.703, NA, 3.62, NA, 3.63, NA, 3.63, NA, 3.49, NA,
NA, NA, 3.627, NA, 3.739, NA, 3.669, NA, NA, NA, 3.73, NA, 3.66,
NA, 3.664, NA, 3.669, NA), `3` = c(NA, 3.55, NA, 3.662, NA, 3.665,
NA, 3.65, NA, 3.67, NA, 3.576, NA, 3.631, NA, 3.683, NA, NA,
NA, 3.638, NA, 3.703, NA, 3.666, NA, NA, NA, 3.59, NA, 3.61,
NA, 3.657, NA, 3.67, NA, NA, NA, NA, NA, 3.8, NA, 3.684, NA,
3.69, NA, 3.66, NA, 3.69, NA, 4.398, 3.672, NA, NA, 3.678, NA,
3.769, NA, 3.678, NA, 3.718, NA, 3.75, NA, 3.636, NA, NA, NA,
3.603), `4` = c(NA, 3.57, NA, 3.69, NA, 3.624, NA, 3.66, NA,
3.673, NA, 3.551, NA, 3.64, NA, 3.684, NA, NA, NA, 3.678, NA,
3.673, NA, 3.696, NA, NA, NA, 3.56, NA, 3.62, NA, 3.615, NA,
3.66, NA, NA, NA, NA, NA, 3.642, NA, 3.684, NA, 3.67, NA, 3.67,
NA, 3.67, NA, 4.38, 3.672, NA, NA, 3.68, NA, 3.757, NA, 3.673,
NA, 3.759, NA, 3.78, NA, 3.64, NA, NA, NA, 3.597)), class = "data.frame", row.names = c(NA,
-68L)), CaO = structure(list(Lab = c("AD", "AD", "AK", "AK",
"AO", "AO", "BQ", "BQ", "CI", "CI", "CL", "CL", "CP", "CP", "CU",
"CU", "CW", "CW", "CZ", "CZ", "DA", "DA", "DC", "DC", "DF", "DF",
"EQ", "EQ", "EY", "EY", "FL", "FL", "FM", "FM", "FO", "FO", "FP",
"FP", "FR", "FR", "GB", "GB", "GC", "GC", "GL", "GL", "GM", "GM",
"GT", "GT", "GY", "GZ", "I", "I", "K", "K", "M", "M", "Q", "Q",
"S", "S", "U", "U", "V", "V", "W", "W"), `Sample Prep` = c(335,
337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337,
335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335,
337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337,
335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 337, 337, 335,
337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337, 335, 337,
335, 337), `1` = c(0.04, NA, NA, NA, 0.055, NA, 0.053, NA, 0.055,
NA, 0.021, NA, 0.049, NA, 0.052, NA, NA, NA, 0.053, NA, 0.056,
NA, 0.054, NA, 0.036, NA, 0.032, NA, 0.06, NA, 0.053, NA, 0.05,
NA, 0.043, NA, NA, NA, NA, NA, 0.054, NA, 0.057, NA, 0.052, NA,
0.05, NA, 0.048, NA, NA, NA, 0.056, NA, 0.043, NA, 0.053, NA,
NA, NA, 0.052, NA, 0.076, NA, 0.051, NA, 0.047, NA), `2` = c(0.04,
NA, NA, NA, 0.054, NA, 0.053, NA, 0.055, NA, 0.023, NA, 0.05,
NA, 0.05, NA, NA, NA, 0.051, NA, 0.053, NA, 0.056, NA, 0.036,
NA, 0.032, NA, 0.06, NA, 0.053, NA, 0.05, NA, 0.043, NA, NA,
NA, NA, NA, 0.056, NA, 0.056, NA, 0.057, NA, 0.05, NA, 0.05,
NA, NA, NA, 0.053, NA, 0.043, NA, 0.053, NA, NA, NA, 0.052, NA,
0.076, NA, 0.051, NA, 0.047, NA), `3` = c(NA, 0.04, NA, NA, NA,
0.055, NA, 0.053, NA, 0.057, NA, 0.023, NA, 0.05, NA, 0.055,
NA, NA, NA, 0.053, NA, 0.053, NA, 0.054, NA, NA, NA, 0.045, NA,
0.06, NA, 0.052, NA, 0.05, NA, NA, NA, NA, NA, 0.05, NA, 0.054,
NA, 0.056, NA, 0.054, NA, 0.05, NA, 0.054, 0.057, NA, NA, 0.057,
NA, 0.053, NA, 0.054, NA, 0.052, NA, 0.05, NA, 0.045, NA, NA,
NA, 0.051), `4` = c(NA, 0.04, NA, NA, NA, 0.054, NA, 0.053, NA,
0.056, NA, 0.019, NA, 0.052, NA, 0.056, NA, NA, NA, 0.051, NA,
0.051, NA, 0.056, NA, NA, NA, 0.047, NA, 0.06, NA, 0.046, NA,
0.05, NA, NA, NA, NA, NA, 0.05, NA, 0.054, NA, 0.055, NA, 0.054,
NA, 0.05, NA, 0.052, 0.056, NA, NA, 0.055, NA, 0.053, NA, 0.053,
NA, 0.051, NA, 0.052, NA, 0.046, NA, NA, NA, 0.051)), class = "data.frame", row.names = c(NA,
-68L)))
Do you want to add the column mean as a new row to the existing dataframe?
Try :
result <- lapply(df, function(x) rbind(x[-c(1:2)], colMeans(x[-c(1:2)],na.rm = TRUE)))
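To illustrate what that line does, here is a minimal sketch on a made-up two-element list (the data frames A and B and the column name Prep are hypothetical stand-ins for the real data). Note that, as in the line above, the first two columns are dropped from the result.

```r
# Toy list of data frames with the same layout: two ID columns, then numeric columns
dfs <- list(
  A = data.frame(Lab = c("AD", "AK"), Prep = c(335, 337),
                 `1` = c(1, 3), `2` = c(NA, 4), check.names = FALSE),
  B = data.frame(Lab = c("AD", "AK"), Prep = c(335, 337),
                 `1` = c(10, 20), `2` = c(5, 7), check.names = FALSE)
)

# For each data frame: keep the numeric columns and append their column
# means (ignoring NA) as one extra row at the bottom.
result <- lapply(dfs, function(x) {
  num <- x[-(1:2)]
  rbind(num, colMeans(num, na.rm = TRUE))
})
```

Each element of result then has one more row than the corresponding input, with the means in the last row.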

R, Pivot longer, multiple observations per row

I think I have a question that is nearly identical to this one: R Pivot multiple columns from wide to long but I am hopelessly lost on the regex when trying to follow along.
I am also trying to pivot data to be longer, and I also have multiple columns I'd like to save. My data currently:
FollowUpScans<-structure(list(study_id = c(40, 44, 49, 61, 66, 67, 68, 84, 86,
94, 95, 101, 123, 126, 131, 153, 154, 155, 156, 161, 166, 169,
175, 185, 199, 203, 207, 211, 217, 221, 227, 256, 257, 259, 266,
275, 284, 301, 306, 307, 309, 313, 320, 353, 382, 392, 398, 401,
402, 412, 415, 428, 431, 433, 434, 436), Score1 = c(3, 0, 4,
4, NA, 0, 0, 5, 0, 0, 7, 0, 4, 0, 4, 2, 3, 1, 0, 2, 2, 0, 3,
0, 0, 0, 9, 0, 0, 0, 6, 0, 0, 7, 5, 7, 0, 0, 8, 0, 0, 0, 5, 0,
3, 0, 5, 0, 2, 0, 0, 0, 0, 7, 0, 2), TimeBetweenScans = structure(c(316,
113, 335, 104, 7, 42, 30, 643, 404, 40, 171, 51, 449, 56, 104,
79, 116, 65, 39, 1193, 142, 106, 221, 36, 125, 137, 927, 63,
156, 32, 411, 201, 160, 166, 459, 212, 50, 312, 1627, 354, 33,
62, 842, 174, 216, 17, 214, 24, 149, 72, 9, 13, 42, 771, 113,
122), class = "difftime", units = "days"), Score2 = c(NA, 0,
7, NA, NA, NA, 0, 7, NA, 5, 8, 0, NA, NA, NA, 8, NA, NA, 9, NA,
NA, 0, 4, NA, NA, 0, 9, 2, 0, NA, NA, NA, NA, NA, NA, NA, 4,
1, 8, NA, NA, 3, NA, 0, 8, NA, 5, NA, 7, NA, 0, 3, NA, 7, NA,
4), TimeBetweenScans2 = structure(c(NA, 139, 660, NA, NA, NA,
84, 1794, NA, 221, 320, 227, NA, NA, NA, 989, NA, NA, 411, NA,
NA, 216, 474, NA, NA, 372, 1006, 429, 447, NA, NA, NA, NA, NA,
NA, NA, 313, 530, 1706, NA, NA, 130, NA, 300, 264, NA, 268, NA,
382, NA, 38, 138, NA, 1200, 166, 475), class = "difftime", units = "days"),
Score3 = c(NA, NA, NA, NA, NA, NA, 7, NA, NA, 8, NA, NA,
NA, NA, NA, 8, NA, NA, NA, NA, NA, 1, 4, NA, NA, 0, NA, 5,
0, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA,
NA, NA, NA, 5, NA, NA, NA, NA, NA, NA, 8, 0, 4), TimeBetweenScans3 = structure(c(NA,
NA, NA, NA, NA, NA, 467, NA, NA, 394, NA, NA, NA, NA, NA,
1097, NA, NA, NA, NA, NA, 266, 796, NA, NA, 941, NA, 533,
470, NA, NA, NA, NA, NA, NA, NA, NA, 783, NA, NA, NA, NA,
NA, NA, NA, NA, 388, NA, NA, NA, NA, NA, NA, 1512, 180, 640
), class = "difftime", units = "days"), Score4 = c(NA, NA,
NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 5, NA, NA, NA, 1, NA, 5, 0, NA, NA, NA, NA,
NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), TimeBetweenScans4 = structure(c(NA,
NA, NA, NA, NA, NA, 826, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 497, NA, NA, NA, 1102, NA, 567, 1204,
NA, NA, NA, NA, NA, NA, NA, NA, 1574, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), class = "difftime", units = "days"),
Score5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, 1, NA,
NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
TimeBetweenScans5 = structure(c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 575,
NA, NA, NA, 1225, NA, NA, 1266, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), class = "difftime", units = "days")), row.names = c(NA,
-56L), class = c("tbl_df", "tbl", "data.frame"))
And instead of columns that look like: study_id, Score1, TimeBetweenScans, Score2, TimeBetweenScans2, Score3, TimeBetweenScans3, etc.
I'd love it to ultimately look like: study_id, Score, Time, Occurence
The "Occurence" column would just have 1, 2, 3, 4, etc. to show which column the value came from. The study_id column would be nice to keep because it shows which "person" the value came from.
Any help would be appreciated! Thank you!
You can try:
library(dplyr)
library(tidyr)
FollowUpScans %>%
  rename(TimeBetweenScans1 = TimeBetweenScans) %>%
  pivot_longer(-study_id,
               names_to = c(".value", "Time"),
               names_pattern = "([A-Za-z]+)([0-9]+)")
The steps are:
1. Rename the column that is likely to cause problems, so that every column name ends in a number.
2. Call pivot_longer, specifying that the column names follow the pattern "any number of letters followed by any number of digits". You can use different regex patterns than the one I've shared here. For example, you could probably use "(.*)(\\d+)" for this particular dataset.
If you don't rename first, I suspect you would end up with too many rows. You should end up with nrow(FollowUpScans) * 5 rows.
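As a small, hedged illustration of the same idea on a two-row toy table (the data frame wide is made up, the rename is done in base R, and the index column is named Occurrence here instead of Time purely for illustration):

```r
library(tidyr)

# Toy version of the wide data: two scans per person
wide <- data.frame(
  study_id = c(40, 44),
  Score1 = c(3, 0),
  TimeBetweenScans = c(316, 113),
  Score2 = c(NA, 0),
  TimeBetweenScans2 = c(NA, 139)
)

# Give the unnumbered column a "1" suffix so every name matches the pattern
names(wide)[names(wide) == "TimeBetweenScans"] <- "TimeBetweenScans1"

# ".value" turns the letter part into output columns (Score, TimeBetweenScans);
# the digit part goes into the Occurrence column.
long <- pivot_longer(
  wide,
  cols = -study_id,
  names_to = c(".value", "Occurrence"),
  names_pattern = "([A-Za-z]+)([0-9]+)"
)
```

The result has one row per person and scan, with the scan number stored as text in Occurrence; as.integer() could convert it if a numeric index is needed.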

How to determine a value in a column immediately before a value in another column in R?

Plot
Following is a plot of speeds of two vehicles over time. The subject vehicle (blue) is following the lead vehicle (red) in the same lane. So, the speed profile of subject vehicle is very similar to lead vehicle's.
I have manually labelled the points where a vehicle changes its speed by acceleration/deceleration. Now, I want to determine these points from the data. Following are the sample data:
Data
> dput(veh)
structure(list(Time = c(287, 288, 289, 290, 291, 292, 293, 294,
295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307,
308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320,
321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331), fit_p = c(NA,
NA, NA, 8.86, 8.5, 8.15, 7.79, 7.44, 7.08, 6.73, 6.38, 6.1, 6.48,
6.86, 7.24, 7.63, 8.01, 8.38, 8.58, 8.68, 8.7, 8.53, 8.33, 8.12,
7.92, 7.71, 7.74, 8.1, 8.45, 8.8, 9.15, 9.29, 9.22, 9.16, 9.09,
9.13, 9.25, 9.37, 9.49, 9.51, 9.34, 9.17, NA, NA, NA), psi_p2 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 298, NA, NA, NA, NA,
NA, 304, 305, NA, 307, NA, NA, NA, NA, NA, 313, NA, NA, NA, 317,
NA, NA, NA, 321, NA, NA, NA, NA, 326, NA, NA, NA, NA, NA), slo_p = c(-0.35,
-0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35,
-0.35, 0.38, 0.38, 0.38, 0.38, 0.38, 0.38, 0.2, 0.02, 0.02, -0.2,
-0.2, -0.2, -0.2, -0.2, -0.2, 0.35, 0.35, 0.35, 0.35, -0.06,
-0.06, -0.06, -0.06, 0.12, 0.12, 0.12, 0.12, 0.12, -0.17, -0.17,
-0.17, -0.17, -0.17, -0.17), fit_v = c(NA, NA, NA, 9.16, 8.57,
7.99, 7.4, 7.23, 7.13, 7.04, 6.94, 6.85, 6.75, 6.66, 7.07, 7.57,
8.06, 8.56, 9.04, 9.15, 9.26, 9.37, 9.15, 8.92, 8.68, 8.45, 8.22,
7.99, 8.03, 8.24, 8.55, 8.87, 9.02, 8.96, 8.89, 8.82, 8.75, 8.99,
9.28, 9.47, 9.42, 9.37, NA, NA, NA), psi_v2 = c(NA, NA, NA, NA,
NA, NA, 293, NA, NA, NA, NA, NA, NA, 300, NA, NA, NA, NA, 305,
NA, NA, 308, NA, NA, NA, NA, NA, 314, 315, 316, NA, NA, 319,
NA, NA, NA, 323, NA, NA, 326, NA, NA, NA, NA, NA), slo_v = c(-0.59,
-0.59, -0.59, -0.59, -0.59, -0.59, -0.1, -0.1, -0.1, -0.1, -0.1,
-0.1, -0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.11, 0.11, 0.11, -0.23,
-0.23, -0.23, -0.23, -0.23, -0.23, 0.04, 0.16, 0.32, 0.32, 0.32,
-0.07, -0.07, -0.07, -0.07, 0.29, 0.29, 0.29, -0.05, -0.05, -0.05,
-0.05, -0.05, -0.05)), .Names = c("Time", "fit_p", "psi_p2",
"slo_p", "fit_v", "psi_v2", "slo_v"), row.names = c(NA, -45L), class = "data.frame")
In the column psi_v2, I have the time where subject vehicle changed the speed. These are all the S points. The points where the lead vehicle changed the speed are in the column psi_p2. But, I only want to determine the location of those points in psi_p2 which happened immediately before point S. These points are all the L points on the plot. For instance, S1 happened at psi_v2=300, therefore, L1 is 298 in psi_p2.
Question
I guess that I need to use which() to determine the relevant points from psi_p2, but I don't know how to code the part where only the "immediately before" point is picked.
Once the points are identified, I want to check whether the subject vehicle accelerated in response to the lead vehicle's acceleration. The acceleration of the subject vehicle is in slo_v and that of the lead vehicle is in slo_p. Example: for S1, slo_v = 0.5, and for L1, slo_p = 0.38. Since the subject vehicle accelerated in response to the lead vehicle's acceleration, we call it "opening" (or "closing" in the opposite case).
So, my desired output is:
structure(list(Time = 287:331, fit_p = c(NA, NA, NA, 8.86, 8.5,
8.15, 7.79, 7.44, 7.08, 6.73, 6.38, 6.1, 6.48, 6.86, 7.24, 7.63,
8.01, 8.38, 8.58, 8.68, 8.7, 8.53, 8.33, 8.12, 7.92, 7.71, 7.74,
8.1, 8.45, 8.8, 9.15, 9.29, 9.22, 9.16, 9.09, 9.13, 9.25, 9.37,
9.49, 9.51, 9.34, 9.17, NA, NA, NA), psi_p2 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 298L, NA, NA, NA, NA, NA, 304L, 305L,
NA, 307L, NA, NA, NA, NA, NA, 313L, NA, NA, NA, 317L, NA, NA,
NA, 321L, NA, NA, NA, NA, 326L, NA, NA, NA, NA, NA), slo_p = c(-0.35,
-0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35, -0.35,
-0.35, 0.38, 0.38, 0.38, 0.38, 0.38, 0.38, 0.2, 0.02, 0.02, -0.2,
-0.2, -0.2, -0.2, -0.2, -0.2, 0.35, 0.35, 0.35, 0.35, -0.06,
-0.06, -0.06, -0.06, 0.12, 0.12, 0.12, 0.12, 0.12, -0.17, -0.17,
-0.17, -0.17, -0.17, -0.17), fit_v = c(NA, NA, NA, 9.16, 8.57,
7.99, 7.4, 7.23, 7.13, 7.04, 6.94, 6.85, 6.75, 6.66, 7.07, 7.57,
8.06, 8.56, 9.04, 9.15, 9.26, 9.37, 9.15, 8.92, 8.68, 8.45, 8.22,
7.99, 8.03, 8.24, 8.55, 8.87, 9.02, 8.96, 8.89, 8.82, 8.75, 8.99,
9.28, 9.47, 9.42, 9.37, NA, NA, NA), psi_v2 = c(NA, NA, NA, NA,
NA, NA, 293L, NA, NA, NA, NA, NA, NA, 300L, NA, NA, NA, NA, 305L,
NA, NA, 308L, NA, NA, NA, NA, NA, 314L, 315L, 316L, NA, NA, 319L,
NA, NA, NA, 323L, NA, NA, 326L, NA, NA, NA, NA, NA), slo_v = c(-0.59,
-0.59, -0.59, -0.59, -0.59, -0.59, -0.1, -0.1, -0.1, -0.1, -0.1,
-0.1, -0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.11, 0.11, 0.11, -0.23,
-0.23, -0.23, -0.23, -0.23, -0.23, 0.04, 0.16, 0.32, 0.32, 0.32,
-0.07, -0.07, -0.07, -0.07, 0.29, 0.29, 0.29, -0.05, -0.05, -0.05,
-0.05, -0.05, -0.05), label = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 9L, 1L, 1L, 1L, 3L, 10L, 1L,
4L, 11L, 1L, 1L, 1L, 1L, 5L, 1L, 1L, 12L, 6L, 1L, 13L, 1L, 7L,
1L, 14L, 1L, 1L, 8L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "L1",
"L2", "L3", "L4", "L5", "L6", "L7&S7", "S1", "S2", "S3", "S4",
"S5", "S6"), class = "factor"), condition = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L,
3L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 2L, 1L,
1L, 1L, 3L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "closing",
"opening"), class = "factor")), .Names = c("Time", "fit_p", "psi_p2",
"slo_p", "fit_v", "psi_v2", "slo_v", "label", "condition"), class = "data.frame", row.names = c(NA,
-45L))
Which function should I use to identify these points? I prefer using dplyr because I have multiple pairs like this example; an operation that works for one data frame can then be applied to all the others using group_by().
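One possible reading of "immediately before" can be sketched in base R as follows. This is an assumption-laden illustration, not a full solution: the S times are the ones labelled S1 to S6 in the desired output, "before" is taken to mean strictly earlier (so the L7&S7 tie at 326 is left out), and "opening"/"closing" is assumed to mean that both slopes are positive or both are negative. All variable names are hypothetical.

```r
# Change points copied from the sample data
s_times <- c(300, 305, 308, 316, 319, 323)             # S1..S6 (from psi_v2)
l_times <- c(298, 304, 305, 307, 313, 317, 321, 326)   # all non-NA psi_p2 values
slo_v_at_s <- c(0.5, 0.11, -0.23, 0.32, -0.07, 0.29)   # slo_v at each S time
slo_p_at_l <- c(0.38, 0.2, 0.02, -0.2, 0.35, -0.06, 0.12, -0.17)  # slo_p at each L time

# For every S time, pick the latest L time that is strictly earlier
prev_l <- vapply(s_times, function(s) {
  before <- l_times[l_times < s]
  if (length(before) > 0) max(before) else NA_real_
}, numeric(1))

# Look up the lead vehicle's slope at the matched L time
slo_p_matched <- slo_p_at_l[match(prev_l, l_times)]

# Assumed rule: both accelerating -> "opening", both decelerating -> "closing"
condition <- ifelse(slo_v_at_s > 0 & slo_p_matched > 0, "opening",
             ifelse(slo_v_at_s < 0 & slo_p_matched < 0, "closing", ""))
```

Under these assumptions prev_l matches the L1 to L6 times in the desired output; the same logic could be wrapped in a function and applied per group.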

Subset xts object using vector of unique index days

I'm trying to subset an xts object using a vector of xts timestamps that have been processed into a vector of unique timestamps. This follows on from this previous question that was only partially answered.
Some sample data:
dput(sample.data.merge, control="all")
structure(c(11.65, 11.13, 11.13, 11.5, 11.8, 11.45, 11.45, 11.08,
11.08, 11.25, 9.8, 10.45, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9,
10.45, 10.5, 10.5, 10.08, 10.08, 10.65, 10.08, 10.65, 10.6, 10.65,
10.65, 10.085, 10.145, 11.9, 11.085, 9.35, 9.15, 9.15, 9.9, 9.0875,
9.3, 9.3, 9.3, 9.35, 9.35, 9.35, 9.25, 9.5, 9.45, 9.3, 11.15,
11.15, 11.15, 11.15, 11.8, 8, 10.05, 10.05, 10.25, 10.4, 10.15,
10.15, 10.3, 10.15, 10.1, 11.08, 11.08, 11.08, 11.65, 11.85,
11.9, 11.9, 11.9, 12.65, 13.35, 13.35, 15.95, 15.9, 15.4, 15.4,
15.4, 15.4, 15.13, 12.13, 12.35, 11.082, 11.082, 11.08, 12.1,
12.3, 12.3, 12.4, 12.6, 12.6, 12.13, 12.45, 12.9, 12.9, 12.9,
14, 12.6, 12.6, 12.45, 15.25, 12.085, 12.95, 12.95, 12.35, 12.13,
12.8, 14, 14, 12.45, 12.45, 12.45, 12.45, 12.25, 12.6, 12.085,
15.1, 15.15, 15.35, 15.3, 12.5, 12.5, 12.15, 12.2, 11.085, 11.35,
11.45, 11.13, 11.13, 11.35, 11.2, 12.5, 12.6, 12.95, 12.95, 12.5,
12.45, 12.3, 12.3, 12.3, 12.45, 12.45, 12.45, 12.5, 12.45, 12.45,
12.13, 12.13, 12.65, 190, 190, 190, 190, 130, 190, 190, 190,
190, 190, 130, 190, 130, 130, 445, 445, 445, 445, 130, 445, 190,
445, 445, 190, 190, 190, 190, 130, 190, 190, 190, 190, 190, 190,
190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190,
190, 275, 190, 190, 190, 190, 190, 190, 190, 190, 190, 130, 130,
190, 190, 190, 130, 130, 130, 190, 130, 190, 190, 190, 130, 190,
190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190,
1190, 190, 190, 130, 130, 130, 190, 1130, 190, 190, 130, 190,
190, 190, 190, 190, 190, 130, 130, 190, 190, 375, 190, 190, 190,
130, 190, 130, 190, 190, 190, 190, 130, 190, 190, 190, 190, 190,
190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190, 190,
130, 130, 130, 190, 130, 190, 190, 190, 130, 130, 445, 445, 130,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0, 0, NA, NA, NA, NA, NA, 0.21, 0.21, 0.26, 0.0250000000000004,
0, 0.0250000000000004, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0.0249999999999995, 0.0250000000000004, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0250000000000004,
0.100000000000001, 0.39, NA, NA, NA, NA, NA, 0.0250000000000004,
NA, NA, NA, NA, NA, 0.524999999999999, 0.25, 0, 0, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0.149999999999999, 0.135000000000001,
0.149999999999999, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0.409999999999999, 0.375, 0.3, 0.635, 0.385, 0.335, 0.175000000000001,
0, NA, NA, NA, NA, NA, 1.4, 0.2, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0.109999999999999, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0.0749999999999993, 0.0749999999999993, 0.0749999999999993,
0.0250000000000004, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA,
NA, NA, 127.5, 0, 0, 0, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 30, 30, 30, NA, NA, NA, NA,
NA, 0, NA, NA, NA, NA, NA, 0, 0, 0, 0, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 30, 30, 30, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 0, 30, 30, 0, 0, 0, 0, 0, NA, NA, NA, NA, NA, 0,
0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0, 0, 30, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 10.9,
10.9, NA, NA, NA, NA, NA, 10.29, 10.29, 10.34, 10.625, 10.65,
10.625, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 9.325,
9.325, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 10.15, 10.225, 10.69, NA, NA, NA, NA, NA, 11.9,
NA, NA, NA, NA, NA, 15.4, 15.4, 15.4, 15.4, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 12.35, 12.35, 12.425, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 12.65, 12.575, 12.875, 12.875, 12.625,
12.625, 12.625, 12.45, NA, NA, NA, NA, NA, 13.85, 15.125, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 11.275, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 12.375, 12.375, 12.375, 12.45, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 445, 445, NA, NA, NA, NA, NA, 317.5, 190, 190, 190, 190,
190, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 190,
190, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 160, 160, 160, NA, NA, NA, NA, NA, 190, NA, NA,
NA, NA, NA, 190, 190, 190, 190, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 160, 160, 160, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 190, 190, 190, 190, 190, 190, 190, 190, NA, NA, NA, NA,
NA, 190, 190, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 190, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 130, 130, 160, 190, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NaN, Inf, NA, NA, NA, NA, NA, 0.999999999999996,
1.71428571428572, 1, 1, NaN, 21.5999999999997, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.00000000000004, 2.99999999999993,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 37.1999999999995, 8.54999999999987, 0.999999999999998,
NA, NA, NA, NA, NA, 29.9999999999996, NA, NA, NA, NA, NA, 0,
0, NaN, Inf, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.66666666666666,
1.62962962962963, 0.166666666666658, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 1.26829268292683, 0.600000000000004,
3.75, 1.77165354330709, 0.454545454545457, 0.522388059701495,
1, NaN, NA, NA, NA, NA, NA, 1.07142857142857, 0.875000000000003,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.681818181818179, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, 2, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NaN, Inf, NA, NA, NA, NA, NA, 1, NaN, NaN, Inf, NaN, NaN,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NaN, NaN,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, Inf, NA, NA, NA, NA, NA,
NaN, NaN, NaN, NaN, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1,
1, 32.3333333333333, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NaN, 6.16666666666667, 0, NaN, NaN, Inf, NaN, Inf, NA,
NA, NA, NA, NA, NaN, NaN, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NaN, NA, NA, NA, NA, NA, NA, NA, NA, NA, NaN, Inf, 1, NaN,
NA, NA, NA, NA, NA), .Dim = c(150L, 8L), .Dimnames = list(NULL,
c("price", "volume", "madprice", "madvolume", "medianprice",
"medianvolume", "absdevmadprice", "absdevmadvolume")), index = structure(c(1325584080,
1325594940, 1325594940, 1325604600, 1325759100, 1325762520, 1325762520,
1325769300, 1325769300, 1325848080, 1325864880, 1326128220, 1326196500,
1326196500, 1326196500, 1326196500, 1326196500, 1326196500, 1326209700,
1326279480, 1326283620, 1326288300, 1326288300, 1326289680, 1326289680,
1326289680, 1326292320, 1326294060, 1326294600, 1326297600, 1326387000,
1326456720, 1326467160, 1326711600, 1326723000, 1326724260, 1326809940,
1326814860, 1326885960, 1326885960, 1326889980, 1326894000, 1326895200,
1326895200, 1326898080, 1326986700, 1326987240, 1326992100, 1327072140,
1327328040, 1327328040, 1327328040, 1327417920, 1327423140, 1327424820,
1327425240, 1327483200, 1327496520, 1327570320, 1327570320, 1327575420,
1327588680, 1327588980, 1327595880, 1327595880, 1327595880, 1327664820,
1327674720, 1327680660, 1327680780, 1327680780, 1327683960, 1327914300,
1327914300, 1327915260, 1327918140, 1327924860, 1327924920, 1327924980,
1327924980, 1327927680, 1328013360, 1328014200, 1328025000, 1328025000,
1328026740, 1328089440, 1328091360, 1328091360, 1328110620, 1328111340,
1328111340, 1328112420, 1328113800, 1328193540, 1328194080, 1328194140,
1328196720, 1328274360, 1328274420, 1328278320, 1328519280, 1328520120,
1328520600, 1328520600, 1328524140, 1328527980, 1328531580, 1328540880,
1328540880, 1328547600, 1328547660, 1328547720, 1328547780, 1328607060,
1328608080, 1328618760, 1328623380, 1328623380, 1328625720, 1328631480,
1328717760, 1328717880, 1328793000, 1328797980, 1329132840, 1329210480,
1329215400, 1329215820, 1329215820, 1329219480, 1329223140, 1329300900,
1329301620, 1329315240, 1329315240, 1329388740, 1329389700, 1329390000,
1329390000, 1329390180, 1329391860, 1329391860, 1329391860, 1329402120,
1329467700, 1329467700, 1329469080, 1329469080, 1329471300), tzone = "", tclass = c("POSIXlt",
"POSIXt")), .indexCLASS = c("POSIXlt", "POSIXt"), .indexTZ = "", tclass = c("POSIXlt",
"POSIXt"), tzone = "", class = c("xts", "zoo"))
The code:
sample.data.mergesub <- sample.data.merge['T10:30/T17:30']
sample.data.mergeout <- sample.data.mergesub[which(
  (sample.data.mergesub$absdevmadprice >= 5 & sample.data.mergesub$absdevmadprice < Inf) |
  (sample.data.mergesub$absdevmadvolume >= 10 & sample.data.mergesub$absdevmadvolume < Inf)), ]
sample.data.unique <- unique(.indexday(sample.data.mergeout))
sample.data.unique is therefore a vector of index days. Question: I'd like to use it to extract the full day of data from the original dataset sample.data, so that I can later graph the whole day of trades rather than only the subset. For instance, if the observation at Jan 03 2012 10:53:00 meets the condition of having absdevmadprice >= 5 and less than infinity, then I'd like to store that day (Jan 03 2012) in a vector and use the vector to subset the original dataset. That would select every observation on that day (the whole trading period), which I could then graph.
I've tried this code (based on Joshua's answer here) but it doesn't work:
> sample.data.uniquePOS<-sample.data.merge[paste(as.Date(as.POSIXct(sample.data.unique, origin = "1970-01-01 00:00.00 UTC", tz="GMT")))]
It returns simply the column names:
> sample.data.uniquePOS
price volume madprice madvolume medianprice medianvolume absdevmadprice
absdevmadvolume
For info, the structure of the variables:
> str(sample.data.merge)
An ‘xts’ object on 2012-01-03 09:48:00/2012-02-17 09:35:00 containing:
Data: num [1:150, 1:8] 11.6 11.1 11.1 11.5 11.8 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:8] "price" "volume" "madprice" "madvolume" ...
Indexed by objects of class: [POSIXlt,POSIXt] TZ:
xts Attributes:
NULL
> str(sample.data.uniquePOS)
An 'xts' object of zero-width
> str(sample.data.unique)
num 15371
Thanks for the help, especially if anyone can explain why the code doesn't work!
Answer to own question:
Using these posts (Ananda's answer to this, Joshua's answer to this, and the as.Date.numeric function I found out about here) I was able to solve my own problem. This line of code seems to do it:
sample.data.uniquePOS <- sample.data.merge[paste(as.Date.numeric(sample.data.unique, origin= "1970-01-01 00:00.00 UTC", tz="GMT")),]
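The same idea can be tried on a small made-up series. This is only a sketch: the object trades, its flagged column and the timestamps are hypothetical stand-ins for the real data, and the index is assumed to be in UTC so that the day numbers from .indexday() line up with calendar dates.

```r
library(xts)

# Hypothetical toy series in UTC; not the question's data.
idx <- as.POSIXct(c("2012-01-03 09:48:00", "2012-01-03 10:53:00",
                    "2012-01-04 11:00:00", "2012-01-05 12:00:00",
                    "2012-01-05 16:30:00"), tz = "UTC")
trades <- xts(cbind(price   = c(11.6, 11.1, 11.5, 11.8, 12.0),
                    flagged = c(0, 1, 0, 0, 1)),
              order.by = idx)

# Rows that meet the condition, and the distinct days they fall on.
hit_rows <- which(coredata(trades)[, "flagged"] == 1)
hit_days <- unique(.indexday(trades[hit_rows, ]))

# .indexday() counts days since 1970-01-01, so as.Date() can turn the
# numbers into ISO date strings that xts accepts as row selectors.
day_keys  <- format(as.Date(hit_days, origin = "1970-01-01"))
full_days <- trades[day_keys, ]

# Alternative without any date conversion: compare day numbers directly.
full_days_alt <- trades[.indexday(trades) %in% hit_days, ]
```

Both full_days and full_days_alt should contain every observation from the flagged days, including the rows that did not meet the condition themselves. The second form sidesteps the numeric-to-date conversion entirely, which may be preferable if the conversion is the part that keeps going wrong.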
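The likely reason this works while the version below does not: .indexday() returns whole days since 1970-01-01 (here 15371). as.Date.numeric reads a number as days from the origin, giving 2012-02-01, whereas as.POSIXct reads it as seconds from the origin, giving 1970-01-01 04:16:11 GMT. as.Date() then turns that into 1970-01-01, which matches no rows in the data, so the subset comes back empty: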
sample.data.uniquePOS <- sample.data.merge[paste(as.Date(as.POSIXct(sample.data.unique, origin = "1970-01-01 00:00.00 UTC", tz="GMT")))]
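A quick base-R check of the two conversions, as a minimal sketch that takes the value 15371 from the str(sample.data.unique) output above (the variable names are made up for illustration): as.Date() treats a bare number as days from the origin, while as.POSIXct() treats it as seconds.

```r
day_number <- 15371  # value reported by str(sample.data.unique)

# Numeric input to as.Date() is counted in days from the origin.
as_days <- as.Date(day_number, origin = "1970-01-01")

# Numeric input to as.POSIXct() is counted in seconds from the origin.
as_seconds <- as.POSIXct(day_number, origin = "1970-01-01", tz = "GMT")

format(as_days)              # "2012-02-01"
format(as_seconds)           # "1970-01-01 04:16:11"
format(as.Date(as_seconds))  # "1970-01-01"
```

Only the first result falls inside the 2012-01-03 to 2012-02-17 range of the series, which is consistent with the as.POSIXct version selecting nothing.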

Resources