I may be misunderstanding how for loops work, but I'm having a hard time comprehending why the current code doesn't populate the vectors (the vectors evidently remain NA, although the code itself runs). I imagine there may also be a way to subset all of this information using ifelse(), but I'm experiencing "coder's block".
Issue (elaborated): I am trying to code a running Electoral College projection based on a betting market from the 2008 presidential cycle, over the final 90 days until Election Day. I justify using two for loops because the code needs to check conditional statements on a particular day and add a particular value to a preexisting sum on that day. In other words, if the betting price for Obama is higher than for McCain on a particular day for a particular state, that state's electoral votes are awarded to Obama on that day, and vice versa. Again, the code runs, but the vectors apparently remain NA.
Key of Relevant Variables
EV, electoral votes of that particular state
X, a unique value assigned to each observation
day, date class
PriceD, betting price for the Dem candidate
PriceR, betting price for the Rep candidate
DaysToEday, a numeric value indicating the difference between variable day and election day (2008-11-04)
Code in Question
Obama08.ECvotesByDay <- McCain08.ECvotesByDay <- rep(NA, 90)
for (i in 1:90) {
  for (j in 1:nrow(subset(mpres08, mpres08$DaysToEday <= 90))){
    if(mpres08$PriceD[j] > mpres08$PriceR[j]) {
      Obama08.ECvotesByDay[i] <- Obama08.ECvotesByDay[i]+mpres08$EV[j]
    }
    else {
      McCain08.ECvotesByDay[i] <- McCain08.ECvotesByDay[i]+mpres08$EV[j]
    }
  }
}
dput of Data (five rows)
structure(list(state = c("AK", "AK", "AK", "AK", "AK"), state.name = c("Alaska",
"Alaska", "Alaska", "Alaska", "Alaska"), Obama = c(38L, 38L,
38L, 38L, 38L), McCain = c(59L, 59L, 59L, 59L, 59L), EV = c(3L,
3L, 3L, 3L, 3L), X = c(24073L, 25195L, 8773L, 25603L, 25246L),
day = structure(c(13937, 13959, 13637, 13967, 13960), class = "Date"),
PriceD = c(7.5, 7.5, 10, 8, 7.5), VolumeD = c(0L, 0L, 0L,
0L, 0L), PriceR = c(92.5, 92.5, 90, 92, 92.5), VolumeR = c(0L,
0L, 0L, 0L, 0L), DaysToEday = c(250, 228, 550, 220, 227)), row.names = c(NA,
5L), class = "data.frame")
You are adding a number to NA, and for R the result is NA.
Obama08.ECvotesByDay[i] and McCain08.ECvotesByDay[i] are initialised with NA. In R, if you try to do arithmetic with NA it stays NA (e.g. NA + 1 results in NA). Depending on what is a neutral result for you, you could initialise the vectors in the beginning with 0:
Obama08.ECvotesByDay <- McCain08.ECvotesByDay <- rep(0, 90)
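As a side note, the inner loop in the question never uses i, so even with a 0 start every day would receive the same total. Below is a minimal sketch of one way to tally per day, assuming the rows for day i are those with DaysToEday == i. The data frame toy and its values are invented for illustration; only the column names come from the question.

```r
# NA propagates through arithmetic, which is why the vectors stayed NA
NA + 1                      # NA

# Toy data using the question's column names (values are made up)
toy <- data.frame(
  EV         = c(3, 9, 3, 9),
  PriceD     = c(40, 60, 55, 45),
  PriceR     = c(60, 40, 45, 55),
  DaysToEday = c(1, 1, 2, 2)
)

n_days <- 2
obama <- mccain <- rep(0, n_days)          # start from 0, not NA

for (i in seq_len(n_days)) {
  today     <- toy[toy$DaysToEday == i, ]  # keep only the rows for day i
  dem_leads <- today$PriceD > today$PriceR # TRUE where the Dem price is higher
  obama[i]  <- sum(today$EV[dem_leads])
  mccain[i] <- sum(today$EV[!dem_leads])
}

obama    # 9 3
mccain   # 3 9
```

With the real data, n_days would be 90 and toy would be replaced by mpres08. Ties (equal prices) go to McCain here, as in the original else branch.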
I have a dataset that was recorded by observation (each observation has its own row of data). I am looking to combine/condense these rows by the plant they were found on, which is currently a character variable. All other columns are numerical values.
EX:
This is the raw data
|Sci_Name|Honeybee_count|Other_bee_Obsevrved|Stem_count|
|---|---|---|---|
|Zizia aurea|1|5|10|
|Asclepias viridiflora|15|1|3|
|Viola unknown|0|0|4|
|Zizia aurea|0|2|6|
|Zizia aurea|3|6|3|
|Asclepias viridiflora|8|2|17|
and I want:
|Sci_Name|Honeybee_count|Other_bee_Obsevrved|Stem_count|
|---|---|---|---|
|Zizia aurea|4|13|19|
|Asclepias viridiflora|23|3|20|
|Viola unknown|0|0|4|
I am currently pulling this data from a CSV already in table form. I have been attempting to create a new table/data frame with one entry for each plant species and blanks/0s for every other variable, which I could then cbind with the original. This, however, has been clunky at best, and I am having trouble figuring out how to have each row check itself. I am open to any approach, let me know what you think!
Thanks :D
We can use the formula method of aggregate from base R. On the rhs of the ~, specify the grouping variable, and on the lhs use . to denote the rest of the variables. Specify FUN as sum and it will compute the column-wise sum by group.
aggregate(. ~ Sci_Name, df1, sum)
-output
Sci_Name Honeybee_count Other_bee_Obsevrved Stem_count
1 Asclepias viridiflora 23 3 20
2 Viola unknown 0 0 4
3 Zizia aurea 4 13 19
data
df1 <- structure(list(Sci_Name = c("Zizia aurea", "Asclepias viridiflora",
"Viola unknown", "Zizia aurea", "Zizia aurea", "Asclepias viridiflora"
), Honeybee_count = c(1L, 15L, 0L, 0L, 3L, 8L), Other_bee_Obsevrved = c(5L,
1L, 0L, 2L, 6L, 2L), Stem_count = c(10L, 3L, 4L, 6L, 3L, 17L)),
class = "data.frame", row.names = c(NA,
-6L))
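For comparison, here is a hedged sketch of another base R route, rowsum(), which sums the numeric columns within each group and returns the group labels as row names rather than as a column. It rebuilds the same df1 so that it runs on its own. This is an alternative illustration, not part of the answer above.

```r
# Same data as df1 above, rebuilt so the snippet is self-contained
df1 <- data.frame(
  Sci_Name = c("Zizia aurea", "Asclepias viridiflora", "Viola unknown",
               "Zizia aurea", "Zizia aurea", "Asclepias viridiflora"),
  Honeybee_count      = c(1L, 15L, 0L, 0L, 3L, 8L),
  Other_bee_Obsevrved = c(5L, 1L, 0L, 2L, 6L, 2L),
  Stem_count          = c(10L, 3L, 4L, 6L, 3L, 17L)
)

# Sum every numeric column within each species; groups come back sorted
res <- rowsum(df1[-1], group = df1$Sci_Name)
res
```

If the species should be a regular column again, something like cbind(Sci_Name = rownames(res), res) restores it.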
I'm looking to find the most common species (the "spid" variable, a code made from the first 4 letters of the genus name followed by the first 4 letters of the species name) in a data frame that covers several habitats (variable "hab", levels: TA, TB, TC).
I don't know how I can apply the "max n" ("slice(which.max(n))") to each habitat to select the species that are the most common across those habitats. As an example, if one species has been counted 50 times in 1 habitat and 0 times in the others, and another species has 10 counts in each habitat, the latter would be the more common.
Here is the code I started with :
brk %>%
  dplyr::select(spid,hab)%>%
  dplyr::group_by(spid) %>%
  dplyr::mutate(n = length(unique(hab))) %>%
  filter(n == 3)
At first I thought to filter the species that are present in all 3 habitats, but I couldn't select those species. How can I then apply my "max" function to select the most shared species? Is an "apply" function a good approach?
Here is a reproducible example:
library(dplyr)
brk%>%
dplyr::select(spid,hab)%>%
dplyr::sample_n(20)%>%
dput()
structure(list(spid = structure(c(157L, 21L, 181L, 128L, 191L,
197L, 202L, 122L, 179L, 150L, 15L, 162L, 43L, 202L, 154L, 179L,
57L, 229L, 231L, 183L), .Label = c("ACROEMER", "ACROMEGA", "AEROSUBPM",
"AMAZDIPL", "ANASAURI", "ANASPILI", "ANDRABER", "ANDRBILO", "ANEULATI",
"BAZZDECR", "BAZZDECRM", "BAZZMASC", "BAZZNITI", "BAZZPRAE",
"BAZZROCA", "BRACEURY", "BUCKMEMB", "CALYARGU", "CALYFISS", "CALYMASC",
"CALYPALI", "CALYPERU", "CAMPARCTM", "CAMPAURE", "CAMPCRAT",
"CAMPFLEX", "CAMPJAME", "CAMPROBI", "CAMPTHWA", "CEPHVAGI", "CERABELA",
"CERACORN", "CERAZENK", "CHEICAME", "CHEICORDI", "CHEIDECU",
"CHEIMONT", "CHEISERP", "CHEISURR", "CHEITRIF", "CHEIUSAM", "CHEIXANT",
"COLOCEAT", "COLOHASK", "COLOHILD", "COLOOBLI", "COLOPEPO", "COLOTANZ",
"COLOZENK", "COLUBENO", "COLUCALY", "COLUDIGI", "COLUHUMB", "COLUOBES",
"COLUTENU", "CONOTRAP", "CRYPMART", "CUSPCONT", "CYCLBORB", "CYCLBREV",
"CYLIKIAE", "DALTANGU", "DALTLATI", "DENDBORB", "DICRBILLB",
"DIPLCAVI", "DIPLCOGO", "DIPLCORN", "DREPCULT", "DREPHELE", "DREPMADA",
"DREPPHYS", "ECTRREGU", "ECTRVALE", "FISSASPL", "FISSMEGAH",
"FISSSCIO", "FRULAPIC", "FRULAPICU", "FRULBORB", "FRULCAPE",
"FRULGROS", "FRULHUMB", "FRULLIND", "FRULREPA", "FRULSCHI", "FRULSERR",
"FRULUSAMR", "FRULVARI", "FUSCCONN", "GOTTNEES", "GOTTSCHI",
"GOTTSPHA", "GROULAXO", "HAPLSTIC", "HERBDICR", "HERBJUNI", "HERBMAUR",
"HETEDUBI", "HETESPLE", "HETESPN", "HOLOBORB", "HOLOCYLI", "HYPNCUPR",
"ISOPCHRY", "ISOPCITR", "ISOPINTO", "ISOTAUBE", "JAEGSOLI", "JAEGSOLIR",
"KURZCAPI", "KURZCAPIS", "LEJEALAT", "LEJEANIS", "LEJECONF",
"LEJEECKL", "LEJEFLAV", "LEJELOMA", "LEJEOBTU", "LEJERAMO", "LEJETABU",
"LEJETUBE", "LEJEVILL", "LEPIAFRI", "LEPICESP", "LEPIDELE", "LEPIHIRS",
"LEPISTUH", "LEPISTUHP", "LEPTFLEX", "LEPTINFU", "LEPTMACU",
"LEUCANGU", "LEUCBIFI", "LEUCBORY", "LEUCCANDI", "LEUCCAPI",
"LEUCCINC", "LEUCDELI", "LEUCGRAN", "LEUCHILD", "LEUCISLE", "LEUCLEPE",
"LEUCMAYO", "LEUCSEYC", "LOPHBORB", "LOPHCOAD", "LOPHCONC", "LOPHDIFF",
"LOPHEULO", "LOPHMULT", "LOPHMURI", "LOPHNIGR", "LOPHSUBF", "MACRACID",
"MACRMAUR", "MACRMICR", "MACRPALL", "MACRSERP", "MACRSULC", "MACRTENU",
"MASTDICL", "METZCONS", "METZFURC", "METZLEPT", "METZMADA", "MICRAFRI",
"MICRANKA", "MICRDISP", "MICRINFL", "MICRKAME", "MICROBLO", "MICRSTRA",
"MITTLIMO", "MNIOFUSC", "PAPICOMP", "PLAGANGU", "PLAGDREP", "PLAGPECT",
"PLAGRENA", "PLAGREPA", "PLAGRODR", "PLAGTERE", "PLEUGIGA", "PLICHIRT",
"POLYCOMM", "POROELON", "POROMADA", "POROUSAG", "PRIOGRAT", "PSEUDECI",
"PTYCSTRI", "PYRRSPIN", "RACOAFRI", "RADUANKE", "RADUAPPR", "RADUBORB",
"RADUBORY", "RADUCOMO", "RADUEVEL", "RADUFULV", "RADUMADA", "RADUSTEN",
"RADUTABU", "RADUVOLU", "RHAPCRIS", "RHAPGRAC", "RHAPRUBR", "RICCAMAZ",
"RICCEROS", "RICCFAST", "RICCLIMB", "RICCLONG", "SCHLBADI", "SCHLMICRO",
"SCHLOANGU", "SCHLSQUA", "SEMACRAS", "SEMASCHI", "SEMASUBP",
"SERPCYRT", "SOLEBORG", "SOLEONRA", "SOLESPHA", "SPHATUMI", "SPHEMINU",
"SYRRAFRI", "SYRRAPER", "SYRRDIMO", "SYRRGAUD", "SYRRHISP", "SYRRPOTT",
"SYRRPROL", "SYRRPROLA", "SYZYPURP", "TAXICONFO", "TELACOAC",
"TELADIAC", "TELANEMA", "TRICADHA", "TRICDEBE", "TRICPERV", "ULOTFULV",
"WARBLEPT", "ZYGOINTE", "ZYGOREIN"), class = "factor"), hab = structure(c(3L,
2L, 2L, 1L, 1L, 2L, 3L, 2L, 3L, 1L, 2L, 3L, 3L, 2L, 1L, 2L, 2L,
1L, 1L, 2L), .Label = c("TA", "TB", "TC"), class = "factor")), row.names = c(NA,
-20L), class = "data.frame")
Thank you for your help,
Germain V
We can try
library(dplyr)
brk %>%
  group_by(spid) %>%
  summarise(n = n_distinct(hab)) %>%
  slice(which.max(n))
Thank you for your answer.
I would like to have a list of the species (the spid code) that are the most common across those 3 habitats, based on the counts of those species in each habitat.
spid n_TA n_TB n_TC
DREPPHYS 6 1 1
BUCKMEMB 4 4 4
LEIJCOLE 0 0 0
In this random example (I have a total of 246 species) I would like to run a computation on this array to select the "most common" or "most shared" species. In this case that should be BUCKMEMB, then DREPPHYS and finally LEIJCOLE.
Maybe the "max" function isn't a good approach?
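One possible reading of "most shared" is to rank species first by the number of habitats they occur in and then by their smallest count across habitats. The sketch below works under that assumption, which is an interpretation and not something confirmed in the thread. brk_toy is invented data that mirrors the three rows of the small table above.

```r
# Toy long-format data: one row per observation (values invented to match
# the counts 6/1/1 for DREPPHYS and 4/4/4 for BUCKMEMB; LEIJCOLE has none)
brk_toy <- data.frame(
  spid = factor(c(rep("DREPPHYS", 8), rep("BUCKMEMB", 12)),
                levels = c("BUCKMEMB", "DREPPHYS", "LEIJCOLE")),
  hab  = factor(c(rep("TA", 6), "TB", "TC",
                  rep(c("TA", "TB", "TC"), each = 4)),
                levels = c("TA", "TB", "TC"))
)

counts    <- table(brk_toy$spid, brk_toy$hab)  # species x habitat counts
n_hab     <- rowSums(counts > 0)               # habitats where the species occurs
min_count <- apply(counts, 1, min)             # smallest count over habitats

# Rank: more habitats first, then larger minimum count
ranking <- rownames(counts)[order(-n_hab, -min_count)]
ranking
```

On the real data the same table() call on brk$spid and brk$hab would give the full species-by-habitat matrix; a different tie-breaker (for example the total count) can be swapped into order() if that matches the intended definition better.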
I would appreciate any help to randomly select a subset of var.w_X containing 5 out of 10 var.w_X variables from my sample data sampleDT, while keeping all the other variables that do not start with var.w_.
Below is the sample data sampleDT which contains, among other variables (those to be kept altogether), X variables starting with var.w_ in their names (those from which to draw the random sample).
In the current example, X=10, so that var.w_ includes var.w_1 to var.w_10, and I want to draw a random sample of 5 out of these 10. However, in my actual data, X>1,000,000 and I might want to draw a sample of 7,500 var.w_ variables out of these X>1,000,000.
Therefore, efficiency is paramount in any solution, since I recently experienced performance issues with mutate_at whose cause I still cannot explain.
Importantly, the other variables to keep (those that do not start with var.w_) are not guaranteed to stay in any pre-specified order, as they might be located before and/or between and/or after the var.w_ variables, for example. So solutions that rely on order of columns will not work.
#sample data
sampleDT<-structure(list(n = c(62L, 96L, 17L, 41L, 212L, 143L, 143L, 143L,
73L, 73L), r = c(3L, 1L, 0L, 2L, 170L, 21L, 0L, 33L, 62L, 17L
), p = c(0.0483870967741935, 0.0104166666666667, 0, 0.0487804878048781,
0.80188679245283, 0.146853146853147, 0, 0.230769230769231, 0.849315068493151,
0.232876712328767), var.w_8 = c(1.94254385942857, 1.18801169942857,
3.16131123942857, 3.16131123942857, 1.13482609242857, 1.13042157942857,
2.13042157942857, 1.13042157942857, 1.12335579942857, 1.12335579942857
), var.w_9 = c(1.942365288, 1.187833128, 3.161132668, 3.161132668,
1.134647521, 1.130243008, 2.130243008, 1.130243008, 1.123177228,
1.123177228), var.w_10 = c(1.94222639911111, 1.18769423911111,
3.16099377911111, 3.16099377911111, 1.13450863211111, 1.13010411911111,
2.13010411911111, 1.13010411911111, 1.12303833911111, 1.12303833911111
), group = c(1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L,
0L, 0L), treat = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L), c1 = c(1.941115288,
1.186583128, 1.159882668, 1.159882668, 1.133397521, 1.128993008,
1.128993008, 1.128993008, 1.121927228, 1.121927228), var.w_6 = c(1.939115288, 1.184583128,
3.157882668, 3.157882668, 1.131397521, 1.126993008, 2.126993008,
1.126993008, 1.119927228, 1.119927228), var.w_7 = c(1.94278195466667,
1.18824979466667, 3.16154933466667, 3.16154933466667, 1.13506418766667,
1.13065967466667, 2.13065967466667, 1.13065967466667, 1.12359389466667,
1.12359389466667), c2 = c(0.1438,
0.237, 0.2774, 0.2774, 0.2093, 0.1206, 0.1707, 0.0699, 0.1351,
0.1206), var.w_1 = c(1.941115288, 1.186583128, 3.159882668, 3.159882668,
1.133397521, 1.128993008, 2.128993008, 1.128993008, 1.121927228,
1.121927228), var.w_2 = c(1.931115288, 1.176583128, 3.149882668,
3.149882668, 1.123397521, 1.118993008, 2.118993008, 1.118993008,
1.111927228, 1.111927228), var.w_3 = c(1.946115288, 1.191583128,
3.164882668, 3.164882668, 1.138397521, 1.133993008, 2.133993008,
1.133993008, 1.126927228, 1.126927228), var.w_4 = c(1.93778195466667,
1.18324979466667, 3.15654933466667, 3.15654933466667, 1.13006418766667,
1.12565967466667, 2.12565967466667, 1.12565967466667, 1.11859389466667,
1.11859389466667), var.w_5 = c(1.943615288, 1.189083128, 3.162382668,
3.162382668, 1.135897521, 1.131493008, 2.131493008, 1.131493008,
1.124427228, 1.124427228)), class = "data.frame", row.names = c(NA, -10L))
#my attempt
# based on the comment by @akrun - this does not keep the other variables as specified above
myvars <- sample(grep("var\\.w_", names(sampleDT), value = TRUE), 5)
sampleDT_test <- sampleDT[myvars]
Thanks in advance for any help
Apologies, had to step into a meeting for a little bit. So, I think you could adapt akrun's solution and keep the first columns for the sample dataframe. Let me know how this scales on the full dataframe. Also, thanks for clarifying further.
> # Subsetting the variable names not matching your pattern using grepl
> names(sampleDT)[!grepl("var\\.w_", names(sampleDT))]
[1] "n" "r" "p" "group" "treat" "c1" "c2"
>
> # Combine that with akrun's solution
> myvars <- c(names(sampleDT)[!grepl("var\\.w_", names(sampleDT))],
+ sample(grep("var\\.w_", names(sampleDT), value = TRUE), 5))
> head(sampleDT[myvars])
n r p group treat c1 c2 var.w_6 var.w_1 var.w_4 var.w_3 var.w_8
1 62 3 0.04838710 1 0 1.941115 0.1438 1.939115 1.941115 1.937782 1.946115 1.942544
2 96 1 0.01041667 1 0 1.186583 0.2370 1.184583 1.186583 1.183250 1.191583 1.188012
3 17 0 0.00000000 0 0 1.159883 0.2774 3.157883 3.159883 3.156549 3.164883 3.161311
4 41 2 0.04878049 1 0 1.159883 0.2774 3.157883 3.159883 3.156549 3.164883 3.161311
5 212 170 0.80188679 0 0 1.133398 0.2093 1.131398 1.133398 1.130064 1.138398 1.134826
6 143 21 0.14685315 1 1 1.128993 0.1206 1.126993 1.128993 1.125660 1.133993 1.130422
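The approach above moves the non-matching columns to the front. If the original column positions should be preserved instead, a logical mask is one option. This is a small sketch on a made-up data frame dt (not sampleDT); set.seed() is used only so the draw is repeatable.

```r
set.seed(1)  # only to make the random draw reproducible

# Toy frame with var.w_ columns interleaved among other columns
dt <- data.frame(n = 1:3, var.w_2 = 4:6, c1 = 7:9,
                 var.w_1 = 10:12, var.w_3 = 13:15, treat = 0:2)

is_w   <- startsWith(names(dt), "var.w_")   # TRUE for the var.w_ columns
picked <- sample(names(dt)[is_w], 2)        # draw 2 of the 3 var.w_ names

# Keep every non-var.w_ column plus the sampled ones, in original order
keep <- !is_w | names(dt) %in% picked
sub  <- dt[, keep, drop = FALSE]
names(sub)
```

startsWith() does a fixed prefix match, so no regular expression escaping is needed. Whether this is fast enough with more than a million columns would need to be measured on the real data.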
I have a CSV file, and when I use this command
SOLK<-read.table('Book1.csv',header=TRUE,sep=';')
I get this output
> SOLK
Time Close Volume
1 10:27:03,6 0,99 1000
2 10:32:58,4 0,98 100
3 10:34:16,9 0,98 600
4 10:35:46,0 0,97 500
5 10:35:50,6 0,96 50
6 10:35:50,6 0,96 1000
7 10:36:10,3 0,95 40
8 10:36:10,3 0,95 100
9 10:36:10,4 0,95 500
10 10:36:10,4 0,95 100
. . . .
. . . .
. . . .
285 17:09:44,0 0,96 404
Here is the result of dput(SOLK[1:10,]):
> dput(SOLK[1:10,])
structure(list(Time = structure(c(1L, 2L, 3L, 4L, 5L, 5L, 6L,
6L, 7L, 7L), .Label = c("10:27:03,6", "10:32:58,4", "10:34:16,9",
"10:35:46,0", "10:35:50,6", "10:36:10,3", "10:36:10,4", "10:36:30,8",
"10:37:23,3", "10:37:38,2", "10:37:39,3", "10:37:45,9", "10:39:07,5",
"10:39:07,6", "10:39:46,6", "10:41:21,8", "10:43:20,6", "10:43:36,4",
"10:43:48,8", "10:43:48,9", "10:43:54,6", "10:44:01,5", "10:44:08,4",
"10:45:47,2", "10:46:16,7", "10:47:03,6", "10:47:48,6", "10:47:55,0",
"10:48:09,9", "10:48:30,6", "10:49:20,6", "10:50:31,9", "10:50:34,6",
"10:50:38,1", "10:51:02,8", "10:51:11,5", "10:55:57,7", "10:57:57,2",
"10:59:06,9", "10:59:33,5", "11:00:31,0", "11:00:31,1", "11:04:46,4",
"11:04:53,4", "11:04:54,6", "11:04:56,1", "11:04:58,9", "11:05:02,0",
"11:05:02,6", "11:05:24,7", "11:05:56,7", "11:06:15,8", "11:13:24,1",
"11:13:24,2", "11:13:32,1", "11:13:36,2", "11:13:37,2", "11:13:44,5",
"11:13:46,8", "11:14:12,7", "11:14:19,4", "11:14:19,8", "11:14:21,2",
"11:14:38,7", "11:14:44,0", "11:14:44,5", "11:15:10,5", "11:15:10,6",
"11:15:12,9", "11:15:16,6", "11:15:23,3", "11:15:31,4", "11:15:36,4",
"11:15:37,4", "11:15:49,5", "11:16:01,4", "11:16:06,0", "11:17:56,2",
"11:19:08,1", "11:20:17,2", "11:26:39,4", "11:26:53,2", "11:27:39,5",
"11:28:33,0", "11:30:42,3", "11:31:00,7", "11:33:44,2", "11:39:56,1",
"11:40:07,3", "11:41:02,1", "11:41:30,1", "11:45:07,0", "11:45:26,6",
"11:49:50,8", "11:59:58,1", "12:03:49,9", "12:04:12,6", "12:06:05,8",
"12:06:49,2", "12:07:56,0", "12:09:37,7", "12:14:25,5", "12:14:32,1",
"12:15:42,1", "12:15:55,2", "12:16:36,9", "12:16:44,2", "12:18:00,3",
"12:18:12,8", "12:28:17,8", "12:28:17,9", "12:28:23,7", "12:28:51,1",
"12:36:33,2", "12:37:45,0", "12:39:22,2", "12:40:19,5", "12:42:22,1",
"12:58:46,3", "13:06:05,8", "13:06:05,9", "13:07:17,6", "13:07:17,7",
"13:09:01,3", "13:09:01,4", "13:09:11,3", "13:09:31,0", "13:10:07,8",
"13:35:43,8", "13:38:27,7", "14:11:16,0", "14:17:31,5", "14:26:13,9",
"14:36:11,8", "14:38:43,7", "14:38:47,8", "14:38:51,8", "14:48:26,7",
"14:52:07,4", "14:52:13,8", "15:09:24,7", "15:10:25,8", "15:29:12,1",
"15:31:55,9", "15:34:04,1", "15:44:10,8", "15:45:07,1", "15:57:04,9",
"15:57:13,9", "16:16:27,9", "16:21:41,7", "16:36:01,5", "16:36:13,2",
"16:46:10,5", "16:46:10,6", "16:47:37,3", "16:50:52,4", "16:50:52,5",
"16:51:44,5", "16:55:11,5", "16:56:21,8", "16:56:37,5", "16:57:37,9",
"16:58:18,6", "16:58:44,5", "17:00:39,1", "17:01:50,7", "17:03:13,2",
"17:03:28,3", "17:03:46,7", "17:03:47,0", "17:04:30,4", "17:08:41,8",
"17:09:44,0"), class = "factor"), Close = structure(c(8L, 7L,
7L, 6L, 5L, 5L, 4L, 4L, 4L, 4L), .Label = c("0,92", "0,93", "0,94",
"0,95", "0,96", "0,97", "0,98", "0,99"), class = "factor"), Volume = c(1000L,
100L, 600L, 500L, 50L, 1000L, 40L, 100L, 500L, 100L)), .Names = c("Time",
"Close", "Volume"), row.names = c(NA, 10L), class = "data.frame")
The first column includes the time stamp of every transaction during a stock's exchange daily session. I would like to convert the Close and Volume columns to an xts object ordered by the Time column.
UPDATE: From your edits, it appears you imported your data using two different commands. It also appears you should be using read.csv2. I've updated my answer with Lines that (I assume) look more like your original CSV (I have to guess because you don't say what the file looks like). The rest of the answer doesn't change.
You have to add a date to your times because xts stores all index values internally as POSIXct (I just used today's date).
I had to convert the "," decimal notation to the "." convention (using gsub), but that may be locale-dependent and you may not need to. paste today's date with the (possibly converted) time and then convert it to POSIXct to create an index suitable for xts.
I've also formatted the index so you can see the fractional seconds.
Lines <- "Time;Close;Volume
10:27:03,6;0,99;1000
10:32:58,4;0,98;100
10:34:16,9;0,98;600
10:35:46,0;0,97;500
10:35:50,6;0,96;50
10:35:50,6;0,96;1000
10:36:10,3;0,95;40
10:36:10,3;0,95;100
10:36:10,4;0,95;500
10:36:10,4;0,95;100"
SOLK <- read.csv2(con <- textConnection(Lines))
close(con)
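library(xts)
# Build the index: paste a date onto each time, with "," swapped for "."
solk <- xts(SOLK[,c("Close","Volume")],
  as.POSIXct(paste("2011-09-02", gsub(",",".",SOLK[,1]))))
# Show fractional seconds when printing the index
indexFormat(solk) <- "%Y-%m-%d %H:%M:%OS6"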
solk
# Close Volume
# 2011-09-02 10:27:03.599999 0.99 1000
# 2011-09-02 10:32:58.400000 0.98 100
# 2011-09-02 10:34:16.900000 0.98 600
# 2011-09-02 10:35:46.000000 0.97 500
# 2011-09-02 10:35:50.599999 0.96 50
# 2011-09-02 10:35:50.599999 0.96 1000
# 2011-09-02 10:36:10.299999 0.95 40
# 2011-09-02 10:36:10.299999 0.95 100
# 2011-09-02 10:36:10.400000 0.95 500
# 2011-09-02 10:36:10.400000 0.95 100
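The index-building step can be tried on its own in base R, without xts. The sketch below uses two of the time strings from the question and the same placeholder date; the tz = "UTC" argument is an assumption added here so the result does not depend on the local time zone.

```r
# Two raw time strings with a comma as the decimal mark
times_raw <- c("10:27:03,6", "10:32:58,4")

# Swap the decimal comma for a point, then attach a date
times_dot <- gsub(",", ".", times_raw, fixed = TRUE)
idx <- as.POSIXct(paste("2011-09-02", times_dot), tz = "UTC")

# Difference between the two stamps, in seconds (fractions are kept)
gap <- as.numeric(difftime(idx[2], idx[1], units = "secs"))
gap
```

Values such as .599999 in the printed index above are most likely the usual floating-point representation of .6 combined with truncation when formatting fractional seconds, not a change in the data.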
That's an odd structure. Translating it to dput syntax:
SOLK <- structure(list(structure(c(1L, 2L, 3L, 4L, 5L, 5L, 6L, 6L, 7L,
7L), .Label = c("10:27:03,6", "10:32:58,4", "10:34:16,9", "10:35:46,0",
"10:35:50,6", "10:36:10,3", "10:36:10,4"), class = "factor"),
Close = c(0.99, 0.98, 0.98, 0.97, 0.96, 0.96, 0.95, 0.95,
0.95, 0.95), Volume = c(1000L, 100L, 600L, 500L, 50L, 1000L,
40L, 100L, 500L, 100L)), .Names = c("", "Close", "Volume"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10"))
I'm assuming the comma in the timestamp is a decimal separator.
library("chron")
library("xts")
time.idx <- times(gsub(",",".",as.character(SOLK[[1]])))
Unfortunately, it seems xts won't take this as a valid order.by; so a date (today, for lack of a better choice) must be included to make xts happy.
xts(SOLK[[2]], order.by=chron(Sys.Date(), time.idx))