Topic: Cameca .Tiff file outputs

jon_wade

  • Professor
  • ****
  • Posts: 72
Cameca .Tiff file outputs
« on: June 14, 2019, 10:22:40 AM »
Dear collected wisdom

I know this isn't the place, and I know John's (very valid) answer, but that may have to wait until we win the lottery...

Anyone have experience of exporting images from PeakSight 6?  I have a *lot* collected in a mosaic, but each one is being exported with a different intensity, suggesting that, somewhere, they are being auto-ranged.
All I really want is raw counts so I can stuff them into Python and, if necessary, fix the ranging, but I cannot work out if they are ranged, and if so how, or even how to get the unadulterated raw data out of PS6.
The CSV data looks like the 32-bit TIFFs, which appear to be raw counts, but it's really not clear how or what is going on in the software.

Anyone got any magic insights?  (Granted, it's a lot of data... but each .imDAT file is 3+ GB.  Nope, no idea what is in that 3 GB..... pixies?  unicorns?  analysts' tears? ;) )


John Donovan

  • Administrator
  • Emeritus
  • *****
  • Posts: 2691
  • Other duties as assigned...
    • Probe Software
Re: Cameca .Tiff file outputs
« Reply #1 on: June 14, 2019, 01:03:42 PM »
Quote
I know this isn't the place, and I know John's (very valid) answer, but that may have to wait until we win the lottery...

Listen up mate, you talking about our software?  "Win the lottery"?  You're at Oxford University, right?  I think you meant to say, "hold a bake sale".  Either that or lotteries in the UK are pretty disappointing!   ;)

Quote
Anyone have experience of exporting images from PeakSight 6?  I have a *lot* collected in a mosaic, but each one is being exported with a different intensity, suggesting that, somewhere, they are being auto-ranged.
All I really want is raw counts so I can stuff them into Python and, if necessary, fix the ranging, but I cannot work out if they are ranged, and if so how, or even how to get the unadulterated raw data out of PS6.
The CSV data looks like the 32-bit TIFFs, which appear to be raw counts, but it's really not clear how or what is going on in the software.

Anyone got any magic insights?  (Granted, it's a lot of data... but each .imDAT file is 3+ GB.  Nope, no idea what is in that 3 GB..... pixies?  unicorns?  analysts' tears? ;) )

Don't know anything about the ImDAT file format, but from my failing memory the .ImpDat image file format has a 1 (or 2) kilobyte header and then just long integers (X by Y).  It's a binary file but it would be easy to write some code to process these files if all you need are the intensity values.

And yes, we have asked Cameca repeatedly for the format of the .ImpDat header and they simply refuse to provide it, saying: just use the ASCII export file.  Unfortunately the ASCII export format leaves some things ambiguous, not to mention that they modify it on occasion. So the binary format would be much better to parse, especially if one wants the raw intensities. But maybe someone out there will specify information on the .ImpDat header format as part of their next Cameca microprobe purchase...?
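
A minimal sketch of the kind of code described above, in Python. Everything specific in it is an assumption, since the header layout is undocumented: a fixed header of 1024 (or 2048) bytes per the "1 (or 2) kilobyte" recollection, "long integers" meaning little-endian signed 32-bit values, row-by-row pixel order, and image dimensions that are known from elsewhere. The name `read_impdat_pixels` is hypothetical.

```python
import struct

def read_impdat_pixels(path, width, height, header_bytes=1024):
    """Read width*height 32-bit integers that follow a fixed-size header.

    Assumptions (not confirmed by Cameca): the header is exactly
    `header_bytes` long and the pixels are little-endian signed 32-bit
    integers stored row by row.
    """
    count = width * height
    with open(path, "rb") as fh:
        fh.seek(header_bytes)  # skip the undocumented header
        raw = fh.read(count * 4)
    if len(raw) != count * 4:
        raise ValueError("file too short for the assumed header size and dimensions")
    flat = struct.unpack("<%di" % count, raw)
    # Re-shape the flat tuple into a list of rows.
    return [list(flat[r * width:(r + 1) * width]) for r in range(height)]
```

If the values that come back look like noise, the header size, byte order or data type guess is probably wrong for that file, so it is worth trying `header_bytes=2048` before concluding anything.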
« Last Edit: June 14, 2019, 03:57:49 PM by John Donovan »
John J. Donovan, Pres. 
(541) 343-3400

"Not Absolutely Certain, Yet Reliable"

jon_wade

Re: Cameca .Tiff file outputs
« Reply #2 on: June 15, 2019, 08:27:09 AM »
cheers John - I can see myself getting Hexedit out at some point  :P

Looking at consecutive BSE TIFFs of the same material, the grey levels are varying.  Is this a scaling thing in the software?

Honestly, I can promise you that should we have a windfall we'll definitely be investing in PfEPMA, but cash, for a variety of reasons, appears short right now.

I'd be interested to know from others, but I get the distinct smell in the UK that geoscience demand for EPMA and analysis in general is declining.... :(

« Last Edit: June 15, 2019, 10:15:36 AM by John Donovan »

Probeman

  • Emeritus
  • *****
  • Posts: 2118
  • Never sleeps...
    • John Donovan
Re: Cameca .Tiff file outputs
« Reply #3 on: June 15, 2019, 10:21:25 AM »
Quote
Looking at consecutive BSE TIFFs of the same material, the grey levels are varying.  Is this a scaling thing in the software?

Hi Jon,
You're asking about the TIFF files from PeakSight?  I have no idea, but I do know that we have to set the BSE mode from "Differential" to "Ground", and then adjust the brightness until the image looks good, otherwise the BSE images show streaks when the BSE signal changes from pits and cracks.

Also on my system we have in the past seen gradients from left to right in the BSE brightness at lower magnifications:

https://probesoftware.com/smf/index.php?topic=583.msg3318#msg3318

This can be fixed by adjusting some electronics.
The only stupid question is the one not asked!

neko

  • Professor
  • ****
  • Posts: 63
Re: Cameca .Tiff file outputs
« Reply #4 on: July 09, 2019, 11:54:20 AM »
Quote
Looking at consecutive BSE TIFFs of the same material, the grey levels are varying.  Is this a scaling thing in the software?

Based on the 3 GB file size you've generated, I'm guessing you're not assembling these maps in PeakSight (one might hope they've removed the 4 megapixel limitation since version 4, but I doubt it's large enough now) but are outputting individual cells for re-assembly in ImageJ or similar (have they fixed the rows/column designations being backwards relative to literally every other application on the planet yet?). Assembling the map in PeakSight will actually result in everything being set to 0-255 (or whatever their highest normalization value is), but when you output the cells individually, each one is normalized based on its own brightest pixel value. WHICH IS TERRIBLE AND SHOULD NEVER EVER BE DONE IN THE FIRST PLACE EVEN WITH ONLY A SINGLE IMAGE BECAUSE THIS IS SCIENCE DAMNIT.
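
To make the complaint concrete, here is a small Python sketch of why per-cell normalization breaks comparability. It is an illustration only, not PeakSight's actual algorithm: the function names and the tiny example tiles are made up, and linear scaling to the maximum is an assumption.

```python
def normalize_per_tile(tile, out_max=255):
    """Scale one tile so that its own brightest pixel maps to out_max."""
    peak = max(max(row) for row in tile)
    if peak == 0:
        return [[0 for _ in row] for row in tile]
    return [[round(v * out_max / peak) for v in row] for row in tile]

def normalize_global(tiles, out_max=255):
    """Scale all tiles with one shared factor taken from the brightest pixel overall."""
    peak = max(max(max(row) for row in tile) for tile in tiles)
    if peak == 0:
        return [[[0 for _ in row] for row in tile] for tile in tiles]
    return [[[round(v * out_max / peak) for v in row] for row in tile]
            for tile in tiles]

# Two hypothetical tiles that both contain the same material (raw value 40),
# but the second also contains one much brighter phase.
tile_a = [[40, 100]]
tile_b = [[40, 250]]

per_tile = [normalize_per_tile(tile_a), normalize_per_tile(tile_b)]
shared = normalize_global([tile_a, tile_b])
```

With per-tile scaling the raw value 40 ends up at two different grey levels in the two tiles, while the shared factor keeps it identical, which is what a mosaic needs.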

Someone did write an open-source Python program for extracting impdat, but unfortunately they wrote it in a weird Python GUI instead of making it command line so I'm not sure how useful it might be (link is here: https://probesoftware.com/smf/index.php?topic=938 ).

The ASCII format does just contain an array of pixel values (AFAIK 0-255 for BSE, 0-? for X-ray channels), but the headers suck and ImageJ needs them stripped before it can import them. Writing something to strip the BSE headers, then running the files through a batch importer in ImageJ as 8-bit greyscale, should work (not sure if the Unix tool ImageMagick works with ASCII images, but if so, command lines are great for batch processing, and who knows, it might even have a header-stripping tool built in). They are n-length integers but ImageJ and Matlab understand them when importing.
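
A rough Python sketch of the header-stripping step. The exact layout of the ASCII export is not given in this thread, so the sketch assumes that pixel rows consist only of numbers separated by whitespace, commas or semicolons, and that any line containing something else is header text. The name `parse_ascii_image` is hypothetical.

```python
def parse_ascii_image(text):
    """Return the numeric rows of an ASCII image export, skipping header lines.

    Assumption: a line is pixel data only if every field on it parses as a
    number; anything else is treated as header and dropped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.replace(",", " ").replace(";", " ").split()
        if not fields:
            continue  # blank line
        try:
            rows.append([float(f) for f in fields])
        except ValueError:
            continue  # non-numeric content: treat as header
    # A ragged result usually means a numeric header line slipped through.
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows have different lengths; check the header assumption")
    return rows
```

A header line that happens to be purely numeric would be mistaken for data, which is why the row-length check is there; it will not catch every case.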

This is definitely a problem I'm interested in solving so feel free to contact me about it, because I've had to go through and de-normalize (e.g. set high value to 255) tons of images by hand when people want them to be directly comparable to each other BECAUSE THIS IS SCIENCE AND THAT IS IMPORTANT AND WHY DOESN'T CAMECAAAAARGIGIHTNEHUONSTHUSANOTBKT *deep breath* I just can't for the life of me understand why Cameca doesn't understand that imaging is actually important. I had been hoping they'd have fixed that by 6.2 but I guess I was wrong.

Probeman

Re: Cameca .Tiff file outputs
« Reply #5 on: July 09, 2019, 12:52:52 PM »
Hi Nick,
Yes. It is weird that they would do this.

Thermo doesn't do this in their mosaics and they are also using a .tif format. The Stage app in Probe for EPMA saves the raw intensity data as floating point in GRD format as seen here for one of my standard mounts:



https://probesoftware.com/smf/index.php?topic=324.0

I guess you can wait for them to fix this (it shouldn't be that hard), or maybe someday you and Jon Wade will get it together and get some better software!   ;)

Attached below are some high resolution BSE and CL mosaics acquired on our probe.  I downsized them so they aren't so huge for uploading.  The Surfer GRD mosaics themselves are each over 1 GB.
« Last Edit: July 09, 2019, 12:58:20 PM by Probeman »

sem-geologist

  • Professor
  • ****
  • Posts: 67
Re: Cameca .Tiff file outputs
« Reply #6 on: April 06, 2021, 05:48:30 AM »
Quote
...weird Python GUI...
hehehe...  ;D
just found this rant by neko.
1) I am still working on reverse engineering the Cameca formats in my FREE TIME.
2) Cameca will not give you the header information, as there is a 99.9% probability that they don't know its structure themselves. To me, as a seasoned reverse engineer (I have successfully reverse engineered some other microscope binary formats), lots of stuff looks like a direct memory dump (with all the garbage in between). So don't ask them; they will not give it to you, as they don't have it.
3) There are lots of changes between the PeakSight 5 and 6 formats; older formats contain some template structures (empty structures inserted in different weird binary places; for what? don't ask me) which I could only recently decipher - it was the main wall I had been banging my head against for the last two years.
4) I am working on a universal parser - that is, not only for impDat, but also wdsDat, qtiDat... and calibrations too! Oh, and setups. And it has to do it right for both PeakSight 5 and PeakSight 6 files.
5) The GitHub repository has an old version; I have not been updating it, as I mainly work in kaitai_struct for RE. Well, it contains a GUI because this whole initiative was driven by the demand for real solutions to real problems. I started with wdsDat, as I wanted to overlay all WDS scans of all our standards at once, without clicking and waiting to death in PeakSight SX-Results (and that is what the GUI is for). It is a weird GUI as it is not finished, experimental, WIP. You can use it for PeakSight 6 WDS files. I attach a screenshot (shameless self-advertising) so others will know what the weird GUI is.
6) The parser file is GUI agnostic; you can take it out and use it in your own Python program.
7) But even better, as soon as I finish the RE with kaitai, I will upload the ksy. That will allow any popular programming language to be used to parse those binary files (C++, C#, Python, Go, JavaScript, Java, Lua, Ruby, OMG OMG... even "asleep on the keyboard", I mean Perl).
8) The current GUI will be heavily modified, but that is probably unnecessary, as for impDat files I would also prefer a CLI to a GUI (WDS is a different story; those need graphical interaction to play with them).
« Last Edit: April 06, 2021, 05:51:17 AM by sem-geologist »

John Donovan

Re: Cameca .Tiff file outputs
« Reply #7 on: April 06, 2021, 06:22:39 AM »
Quote
1) I am still working on reverse engineering the Cameca formats in my FREE TIME.
2) Cameca will not give you the header information, as there is a 99.9% probability that they don't know its structure themselves. To me, as a seasoned reverse engineer (I have successfully reverse engineered some other microscope binary formats), lots of stuff looks like a direct memory dump (with all the garbage in between). So don't ask them; they will not give it to you, as they don't have it.
3) There are lots of changes between the PeakSight 5 and 6 formats; older formats contain some template structures (empty structures inserted in different weird binary places; for what? don't ask me) which I could only recently decipher - it was the main wall I had been banging my head against for the last two years.
4) I am working on a universal parser - that is, not only for impDat, but also wdsDat, qtiDat... and calibrations too! Oh, and setups. And it has to do it right for both PeakSight 5 and PeakSight 6 files.

If you reverse engineer the Cameca impDat file format, please share it with us!  We are utilizing the .TXT format and it is not ideal.

sem-geologist

Re: Cameca .Tiff file outputs
« Reply #8 on: April 06, 2021, 07:52:21 AM »
Quote
If you reverse engineer the Cameca impDat file format, please share it with us!  We are utilizing the .TXT format and it is not ideal.
There is no problem in sharing it under the GPL; the problem could be in incorporating the code directly into proprietary software (ProbeSoftware). Still, we could probably make a special agreement (as e.g. the Qt framework does, providing the same code under a commercial license for closed-source projects and under the GPL for open-source projects).

...going back to neko's rant. BSE images are 8-bit because the BSE signal is digitized with an 8-bit ADC, so it is saved as 8-bit TIFFs. Normalization is stupid, and Cameca is not the only offender: we had similar idiotic issues with mosaicking in Bruker Esprit on our SEMs (same problem, same cause: 32-bit software, thus memory limitations, thus converting everything to 8 bits before merging, plus an idiotic normalization to the maximum pixel value in each cell)... until I reverse engineered their bcf format, and now we have truly 16-bit mosaics. So you would not get more information from Cameca with 16-bit TIFFs, as there are only 8 bits of information. (Bruker is the exception, as their ADC is 16-bit, which is why they can gather 16-bit BSE/SEI; you could get 16-bit BSE images on Cameca instruments by using the full Bruker licence and hardware. The basic licence not only has fewer software options, its hardware (PCI card) also lacks the add-on card with the ADCs.)

Element mapping is a different story: the information is stored internally as 32-bit floating point values, as that allows much larger values to be stored than 32-bit integers do, and 16 bits would easily overflow in many cases. I can't remember how it is in PeakSight 5, but in 6 you can export images as 32-bit TIFFs, and those are 32-bit floating point, not integer. The Windows image viewer gets confused and shows garbage because it misinterprets them as 32-bit integer TIFFs.
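
The "garbage" effect is easy to reproduce in Python with nothing but the standard library: the same four bytes give a sensible number when read as a 32-bit float and a huge, meaningless one when read as a 32-bit integer. This is a generic illustration of the two encodings, not of PeakSight's TIFF writer; the helper name is made up.

```python
import struct

def as_float32_and_int32(value):
    """Pack a value as little-endian float32, then read the same bytes back
    both as float32 and as signed int32."""
    raw = struct.pack("<f", value)
    as_float = struct.unpack("<f", raw)[0]
    as_int = struct.unpack("<i", raw)[0]
    return as_float, as_int

count_as_float, count_as_int = as_float32_and_int32(1.0)
print(count_as_float, count_as_int)  # 1.0 1065353216
```

So a pixel holding 1.0 counts shows up as 1065353216 to a viewer that assumes integers, which is why the data should be loaded with software that honours the floating point sample format.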

John Donovan

Re: Cameca .Tiff file outputs
« Reply #9 on: April 06, 2021, 07:55:51 AM »
Quote
If you reverse engineer the Cameca impDat file format, please share it with us!  We are utilizing the .TXT format and it is not ideal.
There is no problem in sharing it under the GPL; the problem could be in incorporating the code directly into proprietary software (ProbeSoftware). Still, we could probably make a special agreement (as e.g. the Qt framework does, providing the same code under a commercial license for closed-source projects and under the GPL for open-source projects).

I do not want the code, just any documentation you manage to produce.

sem-geologist

Re: Cameca .Tiff file outputs
« Reply #10 on: April 06, 2021, 12:13:02 PM »
The documentation is the ksy code... and the ksy code is the documentation.   :o
I can imagine what you mean by "documentation". When I was reverse engineering bcf (Bruker hypermaps) I did it the old way: the documentation, i.e. the offset positions of the different data, could be written down in Word documents, Excel spreadsheets, on paper and so on. For such a format static documentation makes sense, especially when the format was clearly designed by someone and thought through from the bottom up. Such formats have a well-defined structure (the best binary structure I ever saw was in some old JEOL SEM formats - just simply beautiful, if there is beauty in 0's and 1's) and often contain a table of contents, which holds offsets to the particular items if the data is itemized (i.e. points, images). This allows building a lazy loader (memory efficient) which can read only the item you need from disk, and not load everything into RAM.
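
The lazy-loader idea can be sketched in Python with a toy container. The layout used here (a 32-bit item count, then one offset/size pair per item, then the payloads) is invented for illustration and is not the Bruker, JEOL or Cameca layout; both function names are hypothetical.

```python
import io
import struct

def build_container(items):
    """Pack byte strings into a toy container: count, table of contents, payloads."""
    toc_size = 4 + 8 * len(items)
    toc, offset = [], toc_size
    for item in items:
        toc.append(struct.pack("<II", offset, len(item)))  # (offset, size)
        offset += len(item)
    return struct.pack("<I", len(items)) + b"".join(toc) + b"".join(items)

def read_item(stream, index):
    """Read a single item by seeking to it; nothing else is loaded."""
    stream.seek(0)
    (count,) = struct.unpack("<I", stream.read(4))
    if not 0 <= index < count:
        raise IndexError(index)
    stream.seek(4 + 8 * index)  # jump to this item's table entry
    offset, size = struct.unpack("<II", stream.read(8))
    stream.seek(offset)  # jump straight to the payload
    return stream.read(size)

data = build_container([b"aa", b"bbbb", b"c"])
second = read_item(io.BytesIO(data), 1)
```

Because every entry in the table has a fixed size, finding item `index` costs two seeks no matter how large the other items are. That is exactly what becomes impossible when a format has no table and no size hints.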

Cameca formats, as I mentioned before, are most probably direct memory dumps by the software. The structure is highly dynamic and chaotic (well, it has general parts that I would call header, main and footer); all of them (maybe except the footer) are dynamic. There is no table of contents... and items have no fixed size and no hint of their size, so an item needs to be read field by field. Each item is built from its own header, main part and footer... with junk here and there: some of fixed size, some activated by a preceding binary flag, some the result of a combination of file version, sub-sub-sub-structure version, a combination of file type (or version, or both) and substructure version, and so on... Of course all strings have dynamic length, so you need to parse every string to go further in the bit-stream. Open the file in PeakSight, do something (like changing the contrast) and save over the file, and the structure has changed: the header got some strings, the main part got some new stuff, and so on and on... You thought you had those offsets; those offsets work no more...
So using the old reverse-engineering methods I found it too hard to RE. Pretty soon I found out that documenting this format in hard form makes no sense.
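
A small Python sketch of why fixed offsets stop working once strings have dynamic length. The record layout (two strings, each prefixed by a 32-bit little-endian length, followed by one 32-bit integer) is hypothetical and chosen only to show the effect; the real Cameca structures are described in the ksy, not here.

```python
import struct

def read_prefixed_string(buf, pos):
    """Read a length-prefixed string at pos; return (text, position after it)."""
    (length,) = struct.unpack_from("<I", buf, pos)
    start = pos + 4
    end = start + length
    if end > len(buf):
        raise ValueError("string runs past the end of the buffer")
    return buf[start:end].decode("latin-1"), end

def read_record(buf, pos=0):
    """Parse name, comment and value in order; each field's start depends on
    the lengths of the fields before it."""
    name, pos = read_prefixed_string(buf, pos)
    comment, pos = read_prefixed_string(buf, pos)
    (value,) = struct.unpack_from("<i", buf, pos)
    return {"name": name, "comment": comment, "value": value, "value_offset": pos}

def make_record(name, comment, value):
    """Build a record in the same hypothetical layout, for demonstration."""
    n, c = name.encode("latin-1"), comment.encode("latin-1")
    return (struct.pack("<I", len(n)) + n +
            struct.pack("<I", len(c)) + c +
            struct.pack("<i", value))

before = read_record(make_record("BSE", "", 42))
after = read_record(make_record("BSE", "contrast changed", 42))
```

The value is the same in both records, but its byte offset moves as soon as the comment is filled in. That is the "those offsets work no more" situation, and it is why a declarative, field-by-field description is a better fit than a table of offsets.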

What I needed was a dynamic hex editor, and I was looking for one to buy... but fortunately I found a free one and, I am 100% sure in saying this, the most powerful RE tool: "kaitai_struct". https://kaitai.io/
With that tool I could finally make real breakthroughs in the RE of those formats. I could start bisecting not only older vs newer versions of the files, but formats against formats, to find common structures, "illogicities" and junk. There is still lots of work to bisect all the possible stuff, but at least there is some hope of finalizing this and making it fully functional.
So that should explain why the ksy is the documentation and the documentation is the ksy.

I am attaching the most recent version of my RE ksy; as the only author and the only copyright holder I decided that I should release it under the LGPL license. Then there are no problems for me or for anyone. LGPL code can be used and distributed with proprietary programs, as there is no need for separate licenses. The only requirement is that it should not be packed into the binary, so that users can swap it for other versions.

It's up to you to implement your own parser based on the information in the ksy... but I would advise using the ksy directly, as that has, in my opinion, more benefits for both sides.
Your benefits:
1) When new PeakSight versions add new "features" to the binary Cameca files, having the ksy file will allow the new bytes/records inserted in the structure to be addressed. With a ready ksy and kaitai_webide it is so easy that it would take a few minutes to add additional parsing instructions, without breaking the API.
2) I could miss some bits which some user activates in PeakSight; that user would be able to experiment with modifying the ksy and send the newly found bits to you, so that all users would benefit.
3) Some users may have an older PeakSight, or may want to return to datasets acquired before PeakSight 5. They could modify the ksy so that older files can be read without recompiling anything. Changes to the ksy could be shared upstream, so that all ProbeSoftware users would benefit.
4) In case I stop being an interested party (i.e. quit being an EPMA operator, which is quite real and imaginable, as I need to feed my family), it is much easier to continue reverse engineering from work in progress than to start from scratch. With the ksy and a free, powerful IDE it is easy to catch up.
5) In case I find new structures, I update the ksy upstream, and Probesoftware could use that instantly, without recompiling the whole of Probesoftware, only the wrapper DLL.

My benefits:
1) Direct changes to the ksy would benefit my applications. A larger number of users has a higher chance of hitting corner cases that could slip my mind.

How to use ksy directly:
As Probesoftware is written mostly in VB, and there is no kaitai parser compiler for VB, some dynamic library/wrapper should be made which can be used from VB. I guess VB can use compiled C++ or C# DLLs. Such a small C++ or C# wrapper could expose the relevant data parts to VB while hiding all the junk and useless details. The wrapper library should use the kaitai compiler code, which is MIT licensed and can be statically linked into the same wrapper library.

The attached ksy file is a work in progress; there is some unfinished stuff. I attach it so you can look into it in the kaitai web IDE and consider making a wrapper C library for your use. Also, it has not been cleaned of various questionable variable names, like "stuff", "thingy" and other Monty Pythonic references.

My plan is to reach a state where all kinds of files can be successfully parsed; cleanup and optimization will then follow.
« Last Edit: April 07, 2021, 05:14:29 AM by sem-geologist »

sem-geologist

Re: Cameca .Tiff file outputs
« Reply #11 on: April 06, 2021, 12:40:03 PM »
Oh, I forgot... the ksy is not used directly. That means that if it is used as code, the wrapper library should also have the LGPL license, and on a user's demand its source for recompilation, along with the ksy, should be shared or distributed together with the program. Sorry for the misleading statements in the post above. I forgot how this works, as in Python literally nothing is compiled.
I was probably misled back when I thought that this was going to be one of the common ways:
https://www.gitmemory.com/issue/kaitai-io/kaitai_struct_python_runtime/50/691314234; but it looks like it is a bad idea, and precompilation/transformation into a specific language is going to be the main way.