How are bytes represented in a file by Ocaml? - hex

I'm trying to write a function to write a list of bytes to a file (we're writing a parser of the .class file and after some insertions writing the file back.) When my partner wrote the code to read it in, the bytecode list variable is a list of ints. So now I need to convert it to bytes and then write it to a new .class file. Will the representation of the hex numbers be the same essentially so that the new .class file can be processed by the JVM? My functional programming is LISP from a single semester three years ago and a semester of Coq one year ago. Not enough to make me think in functional terms easily.

Your question is, in fact, pretty confusing. Here's some code that reads in the bytes of a file as an int list, then writes the ints back out as bytes to a new file. On a reasonable system (you don't mention your system), this will copy a file exactly so that no program including the JVM can tell the difference.
let get_bytes fn =
let inc = open_in_bin fn in
let rec go sofar =
match input_char inc with
| b -> go (Char.code b :: sofar)
| exception End_of_file -> List.rev sofar
in
let res = go [] in
close_in inc;
res
let put_bytes fn ints =
let outc = open_out_bin fn in
List.iter (fun b -> output_char outc (Char.chr b)) ints;
close_out outc
let copy_file infn outfn =
put_bytes outfn (get_bytes infn)
I tested this on my system (OS X 10.11.2). I don't have any class files around, but the JVM had no trouble running a jarfile copied with copy_file.
The essence of this problem has nothing to do with hexadecimal numbers. Those are a way of representing numbers as strings, which don't appear anywhere. It also has little to do with functional programming, other than the fact that you want to write your code in OCaml.
The essence of the problem is the meaning of a series of bytes stored in a file. At the lowest level, the bytes stored in the file are the meaning of the file. So you can faithfully copy the file just by copying the bytes. That's what copy_file does.
Since you want to change the bytes, you of course need to make sure your new bytes represent a valid class file. Once you've figured out the new bytes that you want, you can write them out with put_bytes (on a reasonable system).

Related

Vector contains data but reports length is 0, can be accessed by some functions

I've written a wrapper for a camera library in Rust that commands and operates a camera, and also saves an image to file using bindgen. Once I command an exposure to start (basically telling the camera to take an image), I can grab the image using a function of the form:
pub fn GetQHYCCDSingleFrame(
handle: *mut qhyccd_handle,
w: *mut u32,
...,
imgdata: &mut [u8],) -> u32 //(u32 is a retval)
In C++, this function was:
uint32_t STDCALL GetQHYCCDSingleFrame(qhyccd_handle: *handle, ..., uint8_t *imgdata)
In C++, I could pass in a buffer of the form imgdata = new unsigned char[length_buffer] and the function would fill the buffer with image data from the camera.
In Rust, similarly, I can pass in a buffer in the form of a Vec: let mut buffer: Vec<u8> = Vec::with_capacity(length_buffer).
Currently, the way I have structured the code is that there is a main struct, with settings such as the width and height of image, the camera handle, and others, including the image buffer. The struct has been initialized as a mut as:
let mut main_settings = MainSettings {
width: 9600,
...,
buffer: Vec::with_capacity(length_buffer),
}
There is a separate function I wrote that takes the main struct as a parameter and calls the GetQHYCCDSingleFrame function:
fn grab_image(main_settings: &mut MainSettings) {
let retval = unsafe { GetQHYCCDSingleFrame(main_settings.cam_handle, ..., &mut main_settings.image_buffer) };
}
Immediately after calling this function, if I check the length and capacity of main_settings.image_buffer:
println!("Elements in buffer are {}, capacity of buffer is {}.", main_settings.image_buffer.len(), main_settings.image_buffer.capacity());
I get 0 for length, and the buffer_length as the capacity. Similarly, printing any index such as main_settings.image_buffer[0] or 1 leads to a panic exit saying len is 0.
This would make me think that the GetQHYCCDSingleFrame code is not working properly, however, when I save the image_buffer to file using fitsio and hdu.write_region (fitsio docs linked here), I use:
let ranges = [&(x_start..(x_start + roi_width)), &(y_start..(y_start+roi_height))];
hdu.write_region(&mut fits_file, &ranges, &main_settings.image_buffer).expect("Could not write to fits file");
This saves an actual image to file with the right size and is a perfectly fine image (exactly what it would look if I took using the C++ program). However, when I try to print the buffer, for some reason is empty, yet the hdu.write_region code is able to access data somehow.
Currently, my (not good) workaround is to create another vector that reads data from the saved file and saves to a buffer, which then has the right number of elements:
main_settings.new_buffer = hdu.read_region(&mut fits_file, &ranges).expect("Couldn't read fits file");
Why can I not access the original buffer at all, and why does it report length 0, when the hdu.write_region function can access data from somewhere? And where exactly is it accessing the data from, and how can correctly I access it as well? I am bit new to borrowing and referencing, so I believe I might be doing something wrong in borrowing/referencing the buffer, or is it something else?
Sorry for the long story, but the details would probably be important for everything here. Thanks!
Well, first of all, you need to know that Vec<u8> and &mut [u8] are not quite the same as C or C++'s uint8_t *. The main difference is that Vec<u8> and &mut [u8] have the size of the array or slice saved within themselves, while uint8_t * doesn't. The Rust equivalent to C/C++ pointers are raw pointers, like *mut [u8]. Raw pointers are safe to build, but requires unsafe to be used. However, even tho they are different types, a smart pointer as &mut [u8] can be casted to a raw pointer without issue AFAIK.
Secondly, the capacity of a Vec is different of its size. Indeed, to have good performances, a Vec allocates more memory than you use, to avoid reallocating on each new element added into vector. The length however is the size of the used part. In your case, you ask the Vec to allocate a heap space of length length_buffer, but you don't tell them to consider any of the allocated space to be used, so the initial length is 0. Since C++ doesn't know about Vec and only use a raw pointer, it can't change the length written inside the Vec, that stays at 0. Thus the panicking.
To resolve it, I see multiple solutions:
Changing the Vec::with_capacity(length_buffer) into vec![0; length_buffer], explicilty asking to have a length of length_buffer from the start
Using unsafe code to explicitly set the length of the Vec without touching what is inside (using Vec::from_raw_parts). This might be faster than the first solution, but I'm not sure.
Using a Box<[u8; length_buffer]>, which is like a Vec but without reallocation and with the length that is the capacity
If your length_buffer is constant at compile time, using a [u8; length_buffer] would be much more efficient as no allocation is needed, but it comes with downsides, as you probably know

How to start reading a file x bytes from the beginning in Julia?

I need to read records from a file, each being 9 bytes long. I need to know how to start reading at different points in the file
It looks like you're looking for the seek function:
help?> seek
search: seek seekend seekstart ParseError setenv select select! selectperm
seek(s, pos)
Seek a stream to the given position.
In particular you might want to
open(filename) do f
seek(f, n) # seek past nth byte
read(f, m) # read m bytes
end
There is also the skip function that may come in useful
help?> skip
search: skip skipchars
skip(s, offset)
Seek a stream relative to the current position.

Immutable dictionary

Is there a way to enforce a dictionary being constant?
I have a function which reads out a file for parameters (and ignores comments) and stores it in a dict:
function getparameters(filename::AbstractString)
f = open(filename,"r")
dict = Dict{AbstractString, AbstractString}()
for ln in eachline(f)
m = match(r"^\s*(?P<key>\w+)\s+(?P<value>[\w+-.]+)", ln)
if m != nothing
dict[m[:key]] = m[:value]
end
end
close(f)
return dict
end
This works just fine. Since i have a lot of parameters, which i will end up using on different places, my idea was to let this dict be global. And as we all know, global variables are not that great, so i wanted to ensure that the dict and its members are immutable.
Is this a good approach? How do i do it? Do i have to do it?
Bonus answerable stuff :)
Is my code even ok? (it is the first thing i did with julia, and coming from c/c++ and python i have the tendencies to do things differently.) Do i need to check whether the file is actually open? Is my reading of the file "julia"-like? I could also readall and then use eachmatch. I don't see the "right way to do it" (like in python).
Why not use an ImmutableDict? It's defined in base but not exported. You use one as follows:
julia> id = Base.ImmutableDict("key1"=>1)
Base.ImmutableDict{String,Int64} with 1 entry:
"key1" => 1
julia> id["key1"]
1
julia> id["key1"] = 2
ERROR: MethodError: no method matching setindex!(::Base.ImmutableDict{String,Int64}, ::Int64, ::String)
in eval(::Module, ::Any) at .\boot.jl:234
in macro expansion at .\REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at .\event.jl:46
julia> id2 = Base.ImmutableDict(id,"key2"=>2)
Base.ImmutableDict{String,Int64} with 2 entries:
"key2" => 2
"key1" => 1
julia> id.value
1
You may want to define a constructor which takes in an array of pairs (or keys and values) and uses that algorithm to define the whole dict (that's the only way to do so, see the note at the bottom).
Just an added note, the actual internal representation is that each dictionary only contains one key-value pair, and a dictionary. The get method just walks through the dictionaries checking if it has the right value. The reason for this is because arrays are mutable: if you did a naive construction of an immutable type with a mutable field, the field is still mutable and thus while id["key1"]=2 wouldn't work, id.keys[1]=2 would. They go around this by not using a mutable type for holding the values (thus holding only single values) and then also holding an immutable dict. If you wanted to make this work directly on arrays, you could use something like ImmutableArrays.jl but I don't think that you'd get a performance advantage because you'd still have to loop through the array when checking for a key...
First off, I am new to Julia (I have been using/learning it since only two weeks). So do not put any confidence in what I am going to say unless it is validated by others.
The dictionary data structure Dict is defined here
julia/base/dict.jl
There is also a data structure called ImmutableDict in that file. However as const variables aren't actually const why would immutable dictionaries be immutable?
The comment states:
ImmutableDict is a Dictionary implemented as an immutable linked list,
which is optimal for small dictionaries that are constructed over many individual insertions
Note that it is not possible to remove a value, although it can be partially overridden and hidden
by inserting a new value with the same key
So let us call what you want to define as a dictionary UnmodifiableDict to avoid confusion. Such object would probably have
a similar data structure as Dict.
a constructor that takes a Dict as input to fill its data structure.
specialization (a new dispatch?) of the the method setindex! that is called by the operator [] =
in order to forbid modification of the data structure. This should be the case of all other functions that end with ! and hence modify the data.
As far as I understood, It is only possible to have subtypes of abstract types. Therefore you can't make UnmodifiableDict as a subtype of Dict and only redefine functions such as setindex!
Unfortunately this is a needed restriction for having run-time types and not compile-time types. You can't have such a good performance without a few restrictions.
Bottom line:
The only solution I see is to copy paste the code of the type Dict and its functions, replace Dict by UnmodifiableDict everywhere and modify the functions that end with ! to raise an exception if called.
you may also want to have a look at those threads.
https://groups.google.com/forum/#!topic/julia-users/n-lqjybIO_w
https://github.com/JuliaLang/julia/issues/1974
REVISION
Thanks to Chris Rackauckas for pointing out the error in my earlier response. I'll leave it below as an illustration of what doesn't work. But, Chris is right, the const declaration doesn't actually seem to improve performance when you feed the dictionary into the function. Thus, see Chris' answer for the best resolution to this issue:
D1 = [i => sind(i) for i = 0.0:5:3600];
const D2 = [i => sind(i) for i = 0.0:5:3600];
function test(D)
for jdx = 1:1000
# D[2] = 2
for idx = 0.0:5:3600
a = D[idx]
end
end
end
## Times given after an initial run to allow for compiling
#time test(D1); # 0.017789 seconds (4 allocations: 160 bytes)
#time test(D2); # 0.015075 seconds (4 allocations: 160 bytes)
Old Response
If you want your dictionary to be a constant, you can use:
const MyDict = getparameters( .. )
Update Keep in mind though that in base Julia, unlike some other languages, it's not that you cannot redefine constants, instead, it's just that you get a warning when doing so.
julia> const a = 2
2
julia> a = 3
WARNING: redefining constant a
3
julia> a
3
It is odd that you don't get the constant redefinition warning when adding a new key-val pair to the dictionary. But, you still see the performance boost from declaring it as a constant:
D1 = [i => sind(i) for i = 0.0:5:3600];
const D2 = [i => sind(i) for i = 0.0:5:3600];
function test1()
for jdx = 1:1000
for idx = 0.0:5:3600
a = D1[idx]
end
end
end
function test2()
for jdx = 1:1000
for idx = 0.0:5:3600
a = D2[idx]
end
end
end
## Times given after an initial run to allow for compiling
#time test1(); # 0.049204 seconds (1.44 M allocations: 22.003 MB, 5.64% gc time)
#time test2(); # 0.013657 seconds (4 allocations: 160 bytes)
To add to the existing answers, if you like immutability and would like to get performant (but still persistent) operations which change and extend the dictionary, check out FunctionalCollections.jl's PersistentHashMap type.
If you want to maximize performance and take maximal advantage of immutability, and you don't plan on doing any operations on the dictionary whatsoever, consider implementing a perfect hash function-based dictionary. In fact, if your dictionary is a compile-time constant, these can even be computed ahead of time (using metaprogramming) and precompiled.

How to simply read binary data sheets which contains string column in Julia?

I am trying to (write, read) a number of tabular data sheets (in, from) a binary file, data are of Integer, Float64 and ASCIIString types, I write them without difficulty, I lpad ASCIIString to make ASCIIString columns of the same length. now I am facing reading operation, I want to read each table of data by a single call to read function e.g.:
read(myfile,Tuple{[UInt16;[Float64 for i=1:10];UInt8]...}, dim) # => works
EDIT-> I do not use the above line of code in my real solution because
I found that
sizeof(Tuple{Float64,Int32})!=sizeof(Float64)+sizeof(Int32)
but how to include ASCIIString fields in in my Tuple type?
check this simplified example:
file=open("./testfile.txt","w");
ts1="5char";
ts2="7 chars";
write(file,ts1,ts2);
close(file);
file=open("./testfile.txt","r");
data=read(file,typeof(ts1)); # => Errror
close(file);
Julia is right because typeof(ts1)==ASCIIString and ASCIIString is a variable length array, so Julia don't know how many bytes must be read.
What kind of type I must replace there? Is there a type that represents ConstantLangthString<length> or Bytes<length> , Chars<length>? any better solution exists?
EDIT
I should add more complete sample code that includes my latest progress, my latest solution is to read some part of data into a buffer (one row or more), allocate memory for one row of data then reinterpret bytes and copy result value from buffer into an out location:
#convert array of bits and copy them to out
function reinterpretarray!{ty}(out::Vector{ty}, buffer::Vector{UInt8}, pos::Int)
count=length(out)
out[1:count]=reinterpret(ty,buffer[pos:count*sizeof(ty)+pos-1])
return count*sizeof(ty)+pos
end
file=open("./testfile.binary","w");
#generate test data
infloat=ones(20);
instr=b"MyData";
inint=Int32[12];
#write tuple
write(file,([infloat...],instr,inint)...);
close(file);
file=open("./testfile.binary","r");
#read data into a buffer
buffer=readbytes(file,sizeof(infloat)+sizeof(instr)+sizeof(inint));
close(file);
#allocate memory
outfloat=zeros(20)
outstr=b"123456"
outint=Int32[1]
outdata=(outfloat,outstr,outint)
#copy and convert
pos=1
for elm in outdata
pos=reinterpretarray!(elm, buffer, pos)
end
assert(outdata==(infloat,instr,inint))
But my experiments in C language tell me that there must be a better, more convenient and faster solution exists, I would like to do it using C style pointers and references, I don't like to copy data from one location to another one.
Thanks
You can use Array{UInt8} as an alternative type for ASCIIString, which is the type for underlying data.
ts1="5chars"
print(ts1.data) #Array{UInt8}
someotherarray=ts1.data[:] #copies as new array
someotherstring=ASCIIString(somotherarray)
assert(someotherstring == ts1)
Do mind that I'm reading UInt8 in a x86_64 system, which might not be your case. You should use Array{eltype(ts1.data)} for safety reasons.

Miranda Error cannot unify [[char]] with[char] on line 12

having a problem with coding with miranda am just a newbie to functional programming so slap me hard if ive dont an easy mistake so i learn
anyway i a getting an error on line 12 with having a problem with unifyin char with char my idea is to check is something is spelt right by filtering it with the dictionary wich would be both an list of words and another list from a file added together
this is my line 12
= [filter (= typed) ((read file) ++ dictionary)]
and this is the rest of my program so far
filename == [char]
word == [ char ]
dictionary :: [ word ]
spell:: filename -> filename -> [ char ]
look:: word -> filename ->[[[ char ]]]
look typed file
= [filter (= typed) ((read file) ++ dictionary)]
dictionary =
["aardvark","bell","camp","dictionary","editor","file","ground",
"grounds","help","intelligent","joint","kettle","light","memory",
"nettle","orange","quite","research","standard","terminal",
"umbrella","violin","water","xenon","yellow","zoo","aaa","abb",
"acc","add","aee"]
so could anyone point out where i'v gone wrong?
I've never used Miranda, but having used Haskell, it looks like the problem is that you're trying to append a string and a list of strings; however, I guess that ++ needs two lists of the same type (as in Haskell):
(++) :: [a] -> [a] -> [a]
But read file is of type [char], and dictionary is of type [[char]].
Trying to substitute these into the type signature for ++ causes the type error:
(++) :: [char] -> [[char]] -> ?? -- type error!!
Maybe you want to split (read file) into words before you append it to dictionary. Then you would be appending [[char]] to [[char]], which will work just fine.
Note I don't know anything about Miranda -- this answer is based on looking at your code, the error message you gave, and my experience with Haskell (where I've made oodles of similar mistakes).

Resources