The documentation is the ksy code... and the ksy code is the documentation.
I can imagine what you mean by "documentation". When I was reverse engineering (RE) bcf (Bruker hypermaps), I did it the old way: the documentation, meaning the offsets of the different pieces of data, could be written down in Word documents, Excel spreadsheets, on paper and so on. For such a format static documentation makes sense, especially when the format was clearly designed by someone and thought through from the bottom up. Such formats have a well-defined structure (the best binary structure I ever saw was in some old JEOL SEM formats: simply beautiful, if there is beauty in 0's and 1's) and often contain a table of contents, which, if the data is itemized (i.e. points, images), holds the offsets to the particular items. This makes it possible to build a memory-efficient lazy loader, which reads only the item you need from the disk instead of loading everything into RAM.
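To illustrate the lazy-loader idea, here is a minimal Python sketch. The file layout is purely hypothetical (an item count, then a table of offset/size pairs, then the items); it is not the bcf or JEOL layout, and the names `build_file` and `LazyReader` are made up for this example.

```python
import io
import struct


def build_file(items):
    """Build a toy file: item count, table of (offset, size) pairs, then the items."""
    header_size = 4 + 8 * len(items)
    toc = b""
    offset = header_size
    for item in items:
        toc += struct.pack("<II", offset, len(item))
        offset += len(item)
    return struct.pack("<I", len(items)) + toc + b"".join(items)


class LazyReader:
    """Reads the table of contents once, then fetches single items on demand."""

    def __init__(self, stream):
        self.stream = stream
        (count,) = struct.unpack("<I", stream.read(4))
        self.toc = [struct.unpack("<II", stream.read(8)) for _ in range(count)]

    def __len__(self):
        return len(self.toc)

    def read_item(self, index):
        # Jump straight to the item; nothing else is read from the stream.
        offset, size = self.toc[index]
        self.stream.seek(offset)
        return self.stream.read(size)


data = build_file([b"point-0", b"image-1-pixels", b"point-2"])
reader = LazyReader(io.BytesIO(data))
print(reader.read_item(1))
```

With a real file the `io.BytesIO` object would simply be replaced by a file opened in binary mode; only the table of contents and the requested item are ever read.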
Cameca formats, as I mentioned before, are very probably direct memory dumps made by the software. The structure is highly dynamic and chaotic (well, it has general parts which I would call header, main and footer), and all of them (maybe except the footer) are dynamic. There is no table of contents... and items have no fixed size and no hint of their size, so an item needs to be read field by field. Each item is built from its own header, main part and footer... with junk here and there: some of it fixed-size, some activated by a preceding binary flag, some the result of a combination of file version, sub-sub-sub structure version, a combination of file type (or version, or both) and substructure version, and so on... Of course all strings have dynamic length, thus you need to parse every string to get further in the bit-stream. I opened a file in peaksight, did something (like changing the contrast or other stuff) and saved over the file: the structure had changed, the header got some strings, the main part got some new stuff and so on and on... Just when I thought I had got those offsets, those offsets worked no more...
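A small Python sketch of why fixed offsets stop working in such a format. The item layout below is invented for illustration only (a version, a flag that switches on an extra field, a length-prefixed name, a comment that exists only from version 4 on, then the value); it is not the real Cameca structure, and all function names are hypothetical.

```python
import io
import struct


def read_u4(stream):
    return struct.unpack("<I", stream.read(4))[0]


def read_string(stream):
    # Length-prefixed string: the length has to be read before the next field can be found.
    length = read_u4(stream)
    return stream.read(length).decode("ascii")


def parse_item(stream):
    item = {"version": read_u4(stream)}
    has_extra = stream.read(1) != b"\x00"
    if has_extra:                      # field activated by a preceding flag
        item["extra"] = read_u4(stream)
    item["name"] = read_string(stream)
    if item["version"] >= 4:           # field that exists only in newer versions
        item["comment"] = read_string(stream)
    item["value"] = read_u4(stream)
    return item


def pack_string(text):
    raw = text.encode("ascii")
    return struct.pack("<I", len(raw)) + raw


# The same logical item as written by an "old" and a "new" (resaved) version.
old = struct.pack("<I", 3) + b"\x00" + pack_string("Fe Ka") + struct.pack("<I", 42)
new = (struct.pack("<I", 4) + b"\x01" + struct.pack("<I", 7)
       + pack_string("Fe Ka") + pack_string("resaved") + struct.pack("<I", 42))

old_stream, new_stream = io.BytesIO(old), io.BytesIO(new)
old_item, new_item = parse_item(old_stream), parse_item(new_stream)

# Same value in both files, but found at a different byte offset in each.
old_offset = old_stream.tell() - 4
new_offset = new_stream.tell() - 4
print(old_item["value"], old_offset)
print(new_item["value"], new_offset)
```

The value is identical in both toy files, yet its position differs, so it can only be located by parsing every preceding field in order.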
So with the old reverse-engineering methods I found these formats too hard to RE. Pretty soon I found out that documenting this format in a fixed (static) form makes no sense.
What I needed was a dynamic hex editor, and I was looking for one to buy... but fortunately I found a free one, and I am 100% sure in saying that it is the most powerful RE tool: "kaitai_struct".
https://kaitai.io/ With that tool I could finally make real breakthroughs in the RE of those formats. I could start bisecting not only older vs newer versions of the files, but also formats against formats, to find the common structures, "illogicities" and junk. There is still lots of work to bisect all the possible stuff, but at least there is some hope to finalize this and make it fully functional.
So that should explain why ksy is the documentation and documentation is ksy.
I am attaching the most recent version of my RE ksy; as the only author and the only copyright holder I decided to release it under the LGPL license. That way there are no problems for me or for anyone else. LGPL code can be used and distributed with proprietary programs as is, with no need for separate licenses. The only requirement is that it should not be packed into the binary, so that users can exchange it for other versions.
It's up to you to implement your own parser based on the information in the ksy... but I would advise using this ksy directly, as in my opinion that has more benefits for both sides.
Your benefits:
1) When new/next peaksight versions add new "features" to the binary cameca files, having the ksy file will make it possible to address the new bytes/records inserted into the structure. With a ready ksy and the kaitai web IDE it is so easy that it would take a few minutes to add additional parsing instructions, without breaking the API.
2) I could have missed some bits which some user will activate in peaksight; that user would be able to experiment with modifying the ksy and send the found bits to you, so that all users would benefit.
3) Some users may have an older peaksight, or may want to return to datasets acquired before peaksight 5. They could modify the ksy so that older files can be read without recompiling anything. Changes to the ksy could be shared upstream, so that all ProbeSoftware users would benefit.
4) In case I stop being an interested party (i.e. stop being an EPMA operator, which is quite real and imaginable, as I need to feed my family), it is much easier to continue reverse engineering from work already in progress than to start from scratch. With the ksy and a free, powerful IDE it is easy to catch up.
5) In case I find new structures, I update the ksy upstream and Probesoftware could use it right away, without recompiling the whole of Probesoftware, only the wrapper dll.
My benefits:
1) Direct changes to the ksy would benefit my applications. A high number of users has a higher chance of hitting corner cases which could have slipped my mind.
How to use ksy directly:
As Probesoftware is written mostly in VB, and there is no kaitai parser compiler target for VB, some dynamic library/wrapper should be made which can be used from VB. I guess VB can use compiled C++ or C# DLLs. Such a small C++ or C# wrapper could expose the relevant data parts to VB while hiding all the junk and useless details. The wrapper library should use the kaitai runtime library code, which is MIT licensed and can be statically linked into the same wrapper library.
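The shape of such a wrapper can be sketched in a few lines. The sketch below is in Python only for brevity (the real wrapper would be C++ or C#, as said above), and every name in it is hypothetical: `RawFile` and `RawItem` merely stand in for whatever tree the generated parser produces, and `CamecaFacade` is the thin layer that would be exported to VB.

```python
from dataclasses import dataclass


@dataclass
class RawItem:
    """Stand-in for one node of the generated parser tree, junk fields included."""
    name: str
    value: float
    stuff: bytes = b""
    thingy: int = 0


@dataclass
class RawFile:
    """Stand-in for the root of the generated parser tree."""
    version: int
    items: list
    footer_junk: bytes = b""


class CamecaFacade:
    """Exposes only the relevant data through simple types; the raw tree stays private."""

    def __init__(self, raw):
        self._raw = raw

    def item_count(self):
        return len(self._raw.items)

    def item_name(self, index):
        return self._raw.items[index].name

    def item_value(self, index):
        return self._raw.items[index].value


raw = RawFile(version=5,
              items=[RawItem("Fe Ka", 12.5, b"\x00\x01", 3), RawItem("Si Ka", 7.0)],
              footer_junk=b"\xff" * 8)
facade = CamecaFacade(raw)
print(facade.item_count(), facade.item_name(0), facade.item_value(0))
```

The point of the design is that the caller only ever sees integers, floats and strings, so the ksy (and with it the internal tree) can change without changing what VB calls.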
The attached ksy file is a work in progress; there is some unfinished stuff. I attach it so that you can look into it in the kaitai web IDE and look at the possibility of making a wrapper C library for your use. Also, it is not yet cleaned of various questionable variable names, like "stuff", "thingy" and other Monty Python references.
My plan is to reach a state where all kinds of files can be parsed successfully; cleanup and optimization will follow after that.