Need some help determining how this date/time is being encoded.
Here are some examples (known date only):
2B5F0200 -> 31/10/2021
2B9F0200 -> 31/12/2021
2C3F0200 -> 31/01/2022
I don't understand how this datetime format works.
First, since the last two bytes are the same in all cases, let's focus on the first two bytes. Look at them in binary:
0x2B5F: 0b_0010_1011_0101_1111
0x2B9F: 0b_0010_1011_1001_1111
0x2C3F: 0b_0010_1100_0011_1111
Next, consider the binary representations of the numbers in the dates. In some date formats, months are 0-based (January is 0), in others they're 1-based, so include both.
21: 0b_1_0101
22: 0b_1_0110
10: 0b_1010, 0b_1001
12: 0b_1100, 0b_1011
01: 0b_0001, 0b_0000
31: 0b_1_1111
By inspection, each of these binary numerals appears in the appropriate date. 31 is the last 5 bits. The next 4 bits are 10, 12 and 1, so months are 1-based. 21 and 22 show up in the first 7 bits (to cover 100 years, you'll need at least 7 bits).
             21    10   31
0x2B5F: 0b_0010101_1010_11111
             21    12   31
0x2B9F: 0b_0010101_1100_11111
             22     1   31
0x2C3F: 0b_0010110_0001_11111
The format is thus a packed bit-field:
0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-------------+-------+---------+
| year (YY) | month | day |
+-------------+-------+---------+
Or, as bit masks:
year: 0xFE00
month: 0x01E0
day: 0x001F
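For completeness, a minimal decoder sketch in R, assuming the 7-bit year is an offset from 2000 (which is consistent with the three samples above):
decode_date <- function(word) {
  # Mask out each field, then shift it down to its numeric value.
  year  <- bitwShiftR(bitwAnd(word, 0xFE00), 9) + 2000  # 2000 base is an assumption
  month <- bitwShiftR(bitwAnd(word, 0x01E0), 5)
  day   <- bitwAnd(word, 0x001F)
  sprintf("%02d/%02d/%04d", day, month, year)
}
decode_date(0x2B5F)  # "31/10/2021"
decode_date(0x2C3F)  # "31/01/2022"
This says nothing about the trailing 0x0200 bytes, which are constant in all three samples.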
I'm trying to decode this Firebird blob to extract decimal numbers from it (not sure exactly what format they'll be in).
Some context: the blob is storing vibration spectrum data charting amplitude against frequency. I'm pretty sure that the blob only contains the amplitude data, though. Here's an example blob export for a small test spectrum I generated:
0000803F0000004000004040000080400000A0400000C0400000E0400000004100001041000020410000304100004041000050410000604100007041000080410000884100009041000098410000A0410000A8410000B0410000B8410000C0410000C8410000D0410000D8410000E0410000E8410000F0410000F84100000042000004420000084200000C4200001042000014420000184200001C4200002042000000006666663FA4707D3F77BE7F3F72F97F3F58FF7F3F0000803F0000C03F0000004000002040000040400000604000008040000088400000904000009840CDCC9C400000A0400000C84200007A4400401C46
As far as I can tell, each number is represented by 4 bytes of data, shown as hexadecimal in this export. I know it's 4 bytes per value because of how it lines up with my test set below. I also think that the first 2 bytes might be the fractional part and the last 2 the whole number, possibly with a scaling factor as well. Here is my test set (same as above, just reformatted), with the actual values (amplitudes):
Actual Value Blob Section
1 0000803F
2 00000040
3 00004040
4 00008040
5 0000A040
6 0000C040
7 0000E040
8 00000041
9 00001041
10 00002041
11 00003041
12 00004041
13 00005041
14 00006041
15 00007041
16 00008041
17 00008841
18 00009041
19 00009841
20 0000A041
21 0000A841
22 0000B041
23 0000B841
24 0000C041
25 0000C841
26 0000D041
27 0000D841
28 0000E041
29 0000E841
30 0000F041
31 0000F841
32 00000042
33 00000442
34 00000842
35 00000C42
36 00001042
37 00001442
38 00001842
39 00001C42
40 00002042
0 00000000
0.9 6666663F
0.99 A4707D3F
0.999 77BE7F3F
0.9999 72F97F3F
0.99999 58FF7F3F
1 0000803F
1.5 0000C03F
2 00000040
2.5 00002040
3 00004040
3.5 00006040
4 00008040
4.25 00008840
4.5 00009040
4.75 00009840
4.9 CDCC9C40
5 0000A040
100 0000C842
1000 00007A44
10000 00401C46
It's pretty obvious that it's not a straight hexadecimal-to-decimal conversion, but I feel like this is something an expert would recognize. Any help or pointers on how to decode these 4 bytes of hex back to a number value would be much appreciated!
That is the industry-standard 4-byte floating-point format (single-precision float).
https://www.h-schmidt.net/FloatConverter/IEEE754.html
https://en.wikipedia.org/wiki/Single-precision_floating-point_format
Of course, byte order must be accounted for too (the values in your dump are little-endian, so the bytes appear visually reversed compared to the normal way of writing hexadecimal integers on the site above).
https://en.wikipedia.org/wiki/Endianness
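For example, a small sketch in R that decodes one 4-byte group under that interpretation (IEEE 754 single, little-endian byte order as seen in the dump):
hex_to_float <- function(hex) {
  # Split the 8 hex digits into 4 bytes, then read them back as a
  # little-endian single-precision float.
  bytes <- as.raw(strtoi(substring(hex, seq(1, 7, 2), seq(2, 8, 2)), 16L))
  readBin(bytes, what = "numeric", size = 4, endian = "little")
}
hex_to_float("0000803F")  # 1
hex_to_float("CDCC9C40")  # 4.9 (approximately)
hex_to_float("00401C46")  # 10000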
I'm fairly new here and also fairly new to R, so apologies if anything is unclear.
Basically, I have a csv table of numbers for each person, 1 number for each week for 38 weeks.
For example, Anthony has number 6 in week 1, 12 in week 2, and so on; these numbers are fairly random and range from 1-20.
I have taken the numbers from the table and saved them into a string, hence Anthony's string, when printed, would look like
"6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"
What I'm trying to do is find/count the number of times numbers between 1 and 10 occur in runs of 3 consecutive values, then in runs of 4, and possibly 5.
For example, in this string 8, 9 and 1 occur consecutively, and then 3, 5, 8 and 9 occur consecutively, so the number of occurrences is 2.
I've tried using str_count from the stringr package and also tried a few different functions located here - Count the number of overlapping substrings within a string
I can't seem to find a method/function to get this to output what I want (a simple count of the number of occurrences).
If anyone could provide any insight/help it would be greatly appreciated.
It would be easier to keep these as numbers. Here I use scan() to turn your string into a logical vector indicating whether each number is less than 10, then I call rle() on it to calculate run lengths.
x <- "6 12 18 7 17 4 16 11 20 15 3 5 19 10 8 9 1 14 13 19 11 16 18 4 17 7 6 12 14 1 10 13 20 15 3 5 8 9"
rr <- rle(scan(text=x)<10)
Now I can mangle this into a data.frame and see which runs were longer than 2
subset(as.data.frame(unclass(rr)), values==T & lengths>2)
# lengths values
# 9 3 TRUE
# 17 4 TRUE
So we can see that we had a run of 3 and a run of 4.
I could clean this up by defining a function to turn the rle into a data.frame more easily and track the starting indexes
as.data.frame.rle <- function(x) {
  # Note: use x (the function argument), not the global rr
  data.frame(unclass(x), start=head(cumsum(c(0, x$lengths))+1, -1))
}
and can then run
subset(as.data.frame(rle(scan(text=x)<10)), values==T & lengths>2)
# lengths values start
# 9 3 TRUE 15
# 17 4 TRUE 35
so we can see those runs start at positions 15 and 35.
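If all you need is the simple count of occurrences, a short sketch along the same lines:
runs <- rle(scan(text = x) < 10)
sum(runs$values & runs$lengths == 3)  # runs of exactly 3
sum(runs$values & runs$lengths == 4)  # runs of exactly 4
sum(runs$values & runs$lengths >= 3)  # or: runs of 3 or more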
My question title is probably not appropriate, sorry for that. I have a csv file named "table_parameter". Please download it from here. The data look like this:
X time Avg.PM10 sill range nugget
1 1 2012030101 52.269231 0.11054330 45574.072 0.037261216
2 2 2012030102 55.314286 0.20250974 87306.391 0.048315377
3 3 2012030103 56.038095 0.17711558 56806.827 0.034956709
4 4 2012030104 55.904762 0.16466350 104767.669 0.030752835
5 5 2012030105 57.123810 0.23638953 87306.391 0.037308364
6 6 2012030106 58.542857 0.24130317 87306.391 0.042108754
7 7 2012030107 60.066667 0.20362439 87306.391 0.037353980
8 8 2012030108 63.790476 0.19417801 87306.391 0.034144464
.
.
.
In my dataframe there is a variable named time that contains hourly values from 01 March 2012 to 07 March 2012 in numeric form. For example, 01 March 2012, 1:00 a.m. is written as 2012030101, and so on.
I want to subset this dataframe time-wise: I want a dataframe that contains only the morning hours of each of the 7 days, where morning means 1:00 a.m. to 5:00 a.m. That is, it should contain all the values belonging to 2012030101 to 2012030105, 2012030201 to 2012030205, ..., 2012030701 to 2012030705. In other words, I want a dataframe like the one below:
time Avg.PM10 sill range nugget
1 49 49 2012030301 17.371429 0.7154449 48239.54 0.17163448
2 50 50 2012030302 17.811321 1.1201199 117603.55 0.12425337
3 51 51 2012030303 17.094340 0.5799705 55103.16 0.12061258
4 52 52 2012030304 16.679245 0.8486774 86725.77 0.15210005
5 53 53 2012030305 16.885714 1.2408621 154677.61 0.09743375
6 73 73 2012030401 21.619048 0.4417369 104767.67 0.08567888
7 74 74 2012030402 20.485714 2.0271124 215474.54 0.06340464
8 75 75 2012030403 20.552381 0.4509354 104767.67 0.06319812
9 76 76 2012030404 20.104762 0.4438798 104767.67 0.05639840
10 77 77 2012030405 20.133333 0.5050201 104767.67 0.09037341
.
.
.
To do this I wrote the following code:
table<-read.csv("table_parameter.csv")
table
table_morning<-subset(table, time %in% c(2012030101:2012030105,
2012030201:2012030205,
2012030301:2012030305,
2012030401:2012030405,
2012030501:2012030505,
2012030601:2012030605,
2012030701:2012030705) & Avg.PM10 <=30)
table_morning
But this code is not efficient. As you can see, I wrote out every hour value to subset! If I wanted to do the same for 90 days, it would be very inefficient. So, how can I do this subsetting efficiently? If you have any further queries, please let me know.
You could use substring, like below:
table_morning <- subset(table, substring(time, 9, 10) %in% c("01", "02","03","04", "05") & Avg.PM10 <=30)
I would extract the hour from the time and then filter accordingly.
For example:
library(dplyr)
data_orpheus = read.csv('table_parameter.csv')
data_orpheus$hour = as.numeric(substr(as.character(data_orpheus$time),9,10))
data_morning = data_orpheus %>% filter(hour >= 1 & hour <= 5)
The dplyr operator %>% is not necessary; you could filter with data_morning = data_orpheus[with(data_orpheus, hour >= 1 & hour <= 5), ] (note the trailing comma, which selects rows rather than columns).
Update
I am still learning dplyr, so here is a beautiful one-liner that does it all:
data_morning = read.csv('table_parameter.csv') %>% # Read CSV
mutate(hours = as.numeric(substr(time,9,10))) %>% # Extract hours
filter(hours >= 1 & hours <= 5) %>% # Keep only mornings
select(-hours) # Drop hours, if not needed
head(data_morning)
X time Avg.PM10 sill range nugget
1 1 2012030101 52.26923 0.1105433 45574.07 0.03726122
2 2 2012030102 55.31429 0.2025097 87306.39 0.04831538
3 3 2012030103 56.03810 0.1771156 56806.83 0.03495671
4 4 2012030104 55.90476 0.1646635 104767.67 0.03075283
5 5 2012030105 57.12381 0.2363895 87306.39 0.03730836
6 25 2012030201 67.10476 0.1434977 72755.33 0.03003781
Thanks a lot for the other answers. Here is my adapted version, kept for my own future reference:
table<-read.csv("table_parameter.csv")
times<- as.numeric(substr(table$time,9,10))
table_morning<- subset(table, times>=1 & times<=5 & Avg.PM10<=30)
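Another sketch, for what it's worth: since time is numeric, the hour can also be extracted arithmetically with the modulo operator, avoiding the string conversion entirely:
hours <- table$time %% 100  # last two digits of 2012030101 are the hour
table_morning <- subset(table, hours >= 1 & hours <= 5 & Avg.PM10 <= 30)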
I have two equally long, matching vectors of time-series data: price (x) and hour (h). Hour goes from 0-23. My hour variable is my dummy variable (or factor/level variable, as I guess it is called in R).
Right now I've defined 24 different dummy variables, and for each hour I type out the corresponding dummy variable. So, for example, to generate 24 plots to look at, or to calculate 24 means, I would type:
plot.ts(hour1) # and so on for all 24.
I would like to do this for all 24 variables as easily as possible, so I can run a lot of different calculations. For example, how could I compute the mean for all 24 dummy variables without writing 24 lines of code, changing the dummy variable each time?
EDIT: Sorry, thought it was clear with the two vectors. Example:
Price Hour
8     0
12    1
14    2
16    3
18    4
20    5
22    6
24    7
26    8
28    9
24    10
26    11
23    12
23    13
23    14
14    15
19    16
25    17
26    18
28    19
30    20
33    21
24    22
10    23
14    0
12    1
13    2
x     etc.
It is not clear how your data are stored since you don't give a reproducible example. I assume you have a separate variable for each hour (hour1, hour2, and so on).
Generally, it is better to put your hourxx variables in a list to perform calculations.
For example, this will compute the mean for all hours:
lapply(lapply(ls(pattern='hour.*'),get),mean)
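Equivalently, a sketch that collects the variables into a named list first (assuming they are all named hour1 through hour24 in the global environment):
hours <- mget(ls(pattern = "^hour"))  # named list of the 24 vectors
sapply(hours, mean)                   # one mean per hour variable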
EDIT after OP clarification:
You should create a new variable to distinguish between the daily Hour cycles. Something like:
dat <- data.frame(Price=rnorm(24*5),Hour=rep(0:23,5))
dat$id <- cumsum(c(0,diff(dat$Hour)==-23))
Then, using the plyr package for example, you can compute the mean by id:
library(plyr)
ddply(dat,.(id),summarise,mPrice=mean(Price))
id mPrice
1 0 0.2999602
2 1 -0.2201148
3 2 0.2400192
4 3 -0.2087594
5 4 0.1666915
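As for the original per-hour means, with the data kept as two plain vectors no dummy variables are needed at all; a grouped mean is a one-liner (a sketch reusing the dat frame above):
tapply(dat$Price, dat$Hour, mean)  # mean Price for each hour 0-23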
Sorry, me again. I will keep on trying, but I want help in case I can't figure it out within the next hour.
My data looks like this:
B<-data.frame(ID=c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2),EVID=c(1,1,1,0,1,2,2,1,1,1,2,2,1,1,1),VALUE=seq(15))
B$TIME<-c(Sys.time()+6*3600*(seq_len(nrow(B))-1))
Actually the times are more variable, and each ID may have multiple EVIDs of 2.
I want to add one-hour increments between the times of each pair of EVID=2 rows, for as many hours as they are apart, i.e., for each pair of EVID=2 rows, keep adding one hour until the time is within one hour of the second EVID=2 in the pair, so I can get something like this (VALUE and ID just duplicate the previous row):
ID EVID VALUE TIME
1 1 1 1 2013-05-31 07:51:09
2 1 1 2 2013-05-31 13:51:09
3 1 1 3 2013-05-31 19:51:09
4 1 0 4 2013-06-01 01:51:09
5 1 1 5 2013-06-01 07:51:09
6 1 2 6 2013-06-01 13:51:09
6 1 2 6 2013-06-01 14:51:09
6 1 2 6 2013-06-01 15:51:09
6 1 2 6 2013-06-01 16:51:09
6 1 2 6 2013-06-01 17:51:09
6 1 2 6 2013-06-01 18:51:09
7 1 2 7 2013-06-01 19:51:09
8 1 1 8 2013-06-02 01:51:09
9 2 1 9 2013-06-02 07:51:09
10 2 1 10 2013-06-02 13:51:09
11 2 2 11 2013-06-02 19:51:09
11 2 2 11 2013-06-02 20:51:09
11 2 2 11 2013-06-02 21:51:09
11 2 2 11 2013-06-02 22:51:09
11 2 2 11 2013-06-02 23:51:09
11 2 2 11 2013-06-03 00:51:09
12 2 2 12 2013-06-03 01:51:09
13 2 1 13 2013-06-03 07:51:09
14 2 1 14 2013-06-03 13:51:09
15 2 1 15 2013-06-03 19:51:09
Below is my brainstorm/attempt:
library(data.table)
BDT <- data.table(row=1:nrow(B), B, key="ID")
BDT[,list(row,EVID,c(EVID)==2)]
attach(B)
newB<-BDT[c(EVID)==2,list(row=row+1,ID=ID,EVID=EVID,VALUE=VALUE,TIME=head(TIME+3600,-1))]
finalB<-rbind(BDT,newB)[order(EVID,decreasing=TRUE)][order(row)][,-1,with=FALSE]
However, this adds only one row of TIME+1 hour after each EVID=2 row, which is not what I desired.
The next thing I tried duplicates every row after the first, which is not what I wanted, but it has the advantage of sparing me from typing out all the column names (I have about 32):
newB<-B[c(1,rep(2:nrow(B),each=2)),]
## My wild guess -- (as.numeric(head(TIME))-as.numeric(tail(TIME)))/3600 -- doesn't work. I know the line above says: from row 2 to the last row, repeat each row twice
newB[c(FALSE,TRUE),"EVID"]<-2
newB[c(FALSE,TRUE),"TIME"]<-newB[c(FALSE,TRUE),"TIME"]+3600
Thank you for any feedback.
=================================================================
eddie's code works well with my example, which I thought was a good representation, but my actual data keep producing
error in seq.int(...) wrong sign in 'by' argument
where (...) varies depending on what I was trying.
I have relatively large data; the column that I use as the ID (as in the example) is in the middle of the data table. I see, even with my small sample data, that if I place the ID along with the other names in the list, R treats item 2 as having one more column than item 1 in the rbind. But if I don't include it in the list, so that I can use by=ID, R complains that the names are in a different order. And if I do not list one of the unimportant columns at the beginning of the data, R says item 2 has one fewer column than item 1!
I thought that perhaps my error came from my times not being exactly hours apart, but from test runs I see that small differences are tolerated, and rounding, either to the hour or to integers, doesn't help.
I tried using length.out, ignoring the warning
Warning message: In .rbind.data.table(...) : Argument 2 has names in
a different order. Columns will be bound by name for consistency with
base. Alternatively, you can drop names (by using an unnamed list) and
the columns will then be joined by position. Or, set use.names=FALSE.
But then the code does not add rows between the 2's, except at the end, where it adds too many!
What am I doing wrong? I've been pulling an all-nighter over this :(
OK, so when I rearrange the original data I can get rid of the warnings. However, the insertions still happen only at the end of the data, and there are too many of them.
This should work:
library(data.table)
dt = data.table(B)
dt[, TIME := as.POSIXct(TIME)]
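# For each ID, expand the pair of EVID == 2 rows into an hourly sequence
# from the first TIME to the second, bind those rows back onto dt, drop
# exact duplicates, and re-sort by ID and TIME: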
rbind(dt, dt[EVID == 2,
list(EVID=EVID[1],
VALUE=VALUE[1],
TIME=seq.POSIXt(TIME[1], TIME[2], "hour")),
by = ID])[!duplicated(paste(ID,EVID,TIME))][order(ID, TIME)]