Author Topic: Cameca .Tiff file outputs  (Read 3542 times)

jon_wade

  • Professor
  • ****
  • Posts: 82
Cameca .Tiff file outputs
« on: June 14, 2019, 10:22:40 AM »
Dear collected wisdom

I know this isn't the place, and I know John's (very valid) answer, but that may have to wait until we win the lottery...

Anyone have experience of exporting images from PeakSight 6?  I have a *lot* collected in a mosaic, but each one is being exported with a different intensity, suggesting that, somewhere, they are being auto-ranged.
All I really want is raw counts so I can stuff them into Python and, if necessary, fix the ranging, but I cannot work out if they are ranged, and if so how, or even how to get the unadulterated raw data out of PS6.
The CSV data look like the 32-bit TIFFs, which appear to be raw counts, but it's really not clear how or what is going on in the software.

Anyone got any magic insights?  (granted, it's a lot of data... but each .imDAT file is 3+ GB.  Nope, no idea what is in that 3 GB.....pixies?  unicorns?  analysts' tears? ;) )


John Donovan

  • Administrator
  • Emeritus
  • *****
  • Posts: 3275
  • Other duties as assigned...
    • Probe Software
Re: Cameca .Tiff file outputs
« Reply #1 on: June 14, 2019, 01:03:42 PM »
I know this isn't the place, and I know John's (very valid) answer, but that may have to wait until we win the lottery...

Listen up mate, you talking about our software?  "Win the lottery"?  You're at Oxford University, right?  I think you meant to say, "hold a bake sale".  Either that or lotteries in the UK are pretty disappointing!   ;)

Anyone have experience of exporting images from PeakSight 6?  I have a *lot* collected in a mosaic, but each one is being exported with a different intensity, suggesting that, somewhere, they are being auto-ranged.
All I really want is raw counts so I can stuff them into Python and, if necessary, fix the ranging, but I cannot work out if they are ranged, and if so how, or even how to get the unadulterated raw data out of PS6.
The CSV data look like the 32-bit TIFFs, which appear to be raw counts, but it's really not clear how or what is going on in the software.

Anyone got any magic insights?  (granted, it's a lot of data... but each .imDAT file is 3+ GB.  Nope, no idea what is in that 3 GB.....pixies?  unicorns?  analysts' tears? ;) )

Don't know anything about the ImDAT file format, but from my failing memory the .ImpDat image file format has a 1 (or 2) kilobyte header and then just long integers (X by Y).  It's a binary file but it would be easy to write some code to process these files if all you need are the intensity values.
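A minimal sketch of the "skip the header, read the integers" idea, in Python since that is where the data is headed. Everything specific here is an assumption rather than a documented fact about .impDat: the fixed 1024-byte header, little-endian 32-bit signed integers, row-major order, and the function name `read_long_integers` are all hypothetical, and the image width and height have to be known from elsewhere.

```python
import struct

HEADER_BYTES = 1024  # assumption: might be 2048, or not fixed at all


def read_long_integers(raw, width, height, header_bytes=HEADER_BYTES):
    """Skip an assumed fixed-size header, then unpack width*height int32 values."""
    count = width * height
    needed = header_bytes + 4 * count
    if len(raw) < needed:
        raise ValueError(f"need {needed} bytes, got {len(raw)}")
    # "<i" = little-endian signed 32-bit integer (assumed byte order and type)
    flat = struct.unpack_from(f"<{count}i", raw, header_bytes)
    # Split the flat tuple into rows (row-major order is assumed)
    return [list(flat[r * width:(r + 1) * width]) for r in range(height)]


# Synthetic stand-in for a file: a zeroed header followed by a 3x2 image
fake = bytes(HEADER_BYTES) + struct.pack("<6i", 10, 20, 30, 40, 50, 60)
image = read_long_integers(fake, width=3, height=2)
```

For a real file the bytes would come from `open(path, "rb").read()`; if the values come out as nonsense, the header size, byte order or data type assumed here is probably wrong.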

And yes, we have asked Cameca repeatedly for the format of the .ImpDat header and they simply refuse to provide it, saying: just use the ASCII export file.  Unfortunately the ASCII export format leaves some things ambiguous, not to mention that they modify it on occasion. So the binary format would be much better to parse, especially if one wants the raw intensities. But maybe someone out there will specify information on the .ImpDat header format as part of their next Cameca microprobe purchase...?
« Last Edit: June 14, 2019, 03:57:49 PM by John Donovan »
John J. Donovan, Pres. 
(541) 343-3400

"Not Absolutely Certain, Yet Reliable"

jon_wade

  • Professor
  • ****
  • Posts: 82
Re: Cameca .Tiff file outputs
« Reply #2 on: June 15, 2019, 08:27:09 AM »
cheers John - I can see myself getting Hexedit out at some point  :P

Looking at consecutive BSE TIFFs of the same material, the grey levels are varying.  Is this a scaling thing in the software?

Honestly, I can promise you that should we have a windfall we'll definitely be investing in PfEPMA, but cash, for a variety of reasons, appears short right now.

I'd be interested to know from others, but I get the distinct smell in the UK that geoscience demand for EPMA and analysis in general is declining.... :(

« Last Edit: June 15, 2019, 10:15:36 AM by John Donovan »

Probeman

  • Emeritus
  • *****
  • Posts: 2836
  • Never sleeps...
    • John Donovan
Re: Cameca .Tiff file outputs
« Reply #3 on: June 15, 2019, 10:21:25 AM »
Looking at consecutive BSE TIFFs of the same material, the grey levels are varying.  Is this a scaling thing in the software?

Hi Jon,
You're asking about the TIFF files from PeakSight?  I have no idea, but I do know that we have to set the BSE mode from "Differential" to "Ground", and then adjust the brightness until the image looks good, otherwise the BSE images show streaks when the BSE signal changes from pits and cracks.

Also on my system we have in the past seen gradients from left to right in the BSE brightness at lower magnifications:

https://probesoftware.com/smf/index.php?topic=583.msg3318#msg3318

This can be fixed by adjusting some electronics.
The only stupid question is the one not asked!

neko

  • Professor
  • ****
  • Posts: 66
Re: Cameca .Tiff file outputs
« Reply #4 on: July 09, 2019, 11:54:20 AM »
Looking at consecutive BSE TIFFs of the same material, the grey levels are varying.  Is this a scaling thing in the software?

Based on the 3 GB file size you've generated, I'm guessing you're not assembling these maps in PeakSight (one might hope they've removed the 4 megapixel limitation since version 4 but I doubt it's large enough now) but are outputting individual cells for re-assembly in ImageJ or similar (have they fixed the rows/column designations being backwards to literally every other application on the planet yet?). Assembling the map in PeakSight will actually result in everything being set to 0-255 (or whatever their highest normalization value is), but when you output them individually, each one is normalized based on its own brightest pixel value. WHICH IS TERRIBLE AND SHOULD NEVER EVER BE DONE IN THE FIRST PLACE EVEN WITH ONLY A SINGLE IMAGE BECAUSE THIS IS SCIENCE DAMNIT.
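To make the problem concrete, here is a small sketch comparing per-image normalization with scaling against one shared range. The tile values and both function names are made up for illustration; this is not PeakSight's actual algorithm, only the behaviour described above.

```python
def normalize_per_image(pixels, out_max=255):
    """Scale so this image's own brightest pixel maps to out_max (the problem case)."""
    peak = max(pixels)
    if peak == 0:
        return [0 for _ in pixels]
    return [round(p * out_max / peak) for p in pixels]


def scale_common(pixels, lo, hi, out_max=255):
    """Scale with one shared [lo, hi] range so equal counts give equal grey levels."""
    span = hi - lo
    if span <= 0:
        raise ValueError("hi must be greater than lo")
    # Clip to the shared range before scaling
    return [round(min(max(p - lo, 0), span) * out_max / span) for p in pixels]


# Two hypothetical tiles of the same material; tile_b also contains a bright phase
tile_a = [100, 200, 400]
tile_b = [100, 200, 800]

per_a = normalize_per_image(tile_a)  # the 100-count pixel becomes 64
per_b = normalize_per_image(tile_b)  # the same 100 counts become 32

lo, hi = 0, max(tile_a + tile_b)     # one range for the whole mosaic
common_a = scale_common(tile_a, lo, hi)
common_b = scale_common(tile_b, lo, hi)
```

With the shared range the 100-count pixel gets the same grey level in both tiles, which is what makes neighbouring cells directly comparable.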

Someone did write an open-source Python program for extracting impdat, but unfortunately they wrote it in a weird Python GUI instead of making it command line so I'm not sure how useful it might be (link is here: https://probesoftware.com/smf/index.php?topic=938 ).

The ASCII format does just contain an array of pixel values (AFAIK 0-255 for BSE, 0-? for X-ray channels), but the headers suck and ImageJ needs them stripped before it can import them - writing something to strip the BSE headers, then running them through a batch importer in ImageJ as 8 bit greyscale should work (not sure if the unix tool ImageMagick works with ASCII images but if so, command lines are great for batch processing, and who knows, might even have a header-stripping tool built in). They are n-length integers but ImageJ and Matlab understand them when importing.
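A rough sketch of the header-stripping step in Python. The layout of the Cameca ASCII export is not specified in this thread, so the sample text below is invented and the rule used (drop every line that is not purely numeric) is an assumption; a header line made up only of numbers would slip through and need special handling.

```python
def parse_ascii_image(text):
    """Return rows of numbers, dropping any line that is not purely numeric."""
    rows = []
    for line in text.splitlines():
        # Accept comma- or whitespace-separated values
        tokens = line.replace(",", " ").split()
        if not tokens:
            continue
        try:
            rows.append([float(t) for t in tokens])
        except ValueError:
            continue  # treated as a header or comment line
    return rows


# Invented sample; the real export header looks different
sample = """Hypothetical header: BSE image
Size 3 x 2
0 128 255
12 34 56
"""
rows = parse_ascii_image(sample)
```

The resulting rows can then be written out as plain text images, or handed straight to whatever does the re-ranging.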

This is definitely a problem I'm interested in solving so feel free to contact me about it, because I've had to go through and de-normalize (eg set high value to 255) tons of images by hand when people want them to be directly comparable to each other BECAUSE THIS IS SCIENCE AND THAT IS IMPORTANT AND WHY DOESN'T CAMECAAAAARGIGIHTNEHUONSTHUSANOTBKT *deep breath* I just can't for the life of me understand why Cameca doesn't understand that imaging is actually important. I had been hoping they'd have fixed that by 6.2 but I guess I was wrong.

Probeman

  • Emeritus
  • *****
  • Posts: 2836
  • Never sleeps...
    • John Donovan
Re: Cameca .Tiff file outputs
« Reply #5 on: July 09, 2019, 12:52:52 PM »
Hi Nick,
Yes. It is weird that they would do this.

Thermo doesn't do this in their mosaics and they are also using a .tif format. The Stage app in Probe for EPMA saves the raw intensity data as floating point in GRD format as seen here for one of my standard mounts:



https://probesoftware.com/smf/index.php?topic=324.0

I guess you can wait for them to fix this (it shouldn't be that hard), or maybe someday you and Jon Wade will get it together and get some better software!   ;)

Attached below are some high resolution BSE and CL mosaics acquired on our probe.  I downsized them so they aren't so huge for uploading.  The Surfer GRD mosaics themselves are each over 1 GB.
« Last Edit: July 09, 2019, 12:58:20 PM by Probeman »

sem-geologist

  • Professor
  • ****
  • Posts: 302
Re: Cameca .Tiff file outputs
« Reply #6 on: April 06, 2021, 05:48:30 AM »
Quote
...weird Python GUI...
hehehe...  ;D
just found this rant by neko.
1) I am still working on reverse engineering the Cameca formats in my FREE TIME.
2) Cameca will not give you the header information, as there is a 99.9% probability they don't know its structure themselves. To me, as a seasoned reverse engineer (I have successfully reverse-engineered some other microscope binary formats), lots of stuff looks like a direct memory dump (with all the garbage in between). So don't ask them; they will not give you this, as they don't have it.
3) There are lots of changes between the PeakSight 5 and 6 formats; older formats contain some template structures (empty structures inserted in different weird binary places, for what? don't ask me) which I could only recently decipher - that was the main wall I had been banging my head against for the last two years.
4) I am working on a universal parser - that is, not only for impDat, but also wdsDat, qtiDat... and calibrations too! Oh, and setups. And it has to do it right for PeakSight 5 and PeakSight 6 files.
5) The GitHub repository has an old version; I was not updating it as I mainly work in kaitai_struct for RE. It contains a GUI because this whole initiative was driven by demand for real solutions to real problems. I started with wdsDat, as I wanted to overlay all WDS scans of all our standards at once, without clicking and waiting to death in PeakSight SX-Results (and that is what the GUI is for). It is a weird GUI because it is not finished: experimental, WIP. You can use it for PeakSight 6 WDS scans. I attach a screenshot (shameless self-advertising) so others know what the weird GUI is.
6) The parser file is GUI-agnostic; you can take it out and use it in your own Python program.
7) But even better, as soon as I finish the RE with kaitai, I will upload the ksy. That will allow any popular programming language to be used to parse those binary files (C++, C#, Python, Go, JavaScript, Java, Lua, Ruby, OMG OMG... even "asleep on the keyboard", I mean Perl).
8) The current GUI will be heavily modified, but it is probably unnecessary here, as for impDat files I would also prefer a CLI to a GUI (WDS is a different story; those need graphical interaction to play with them).
« Last Edit: April 06, 2021, 05:51:17 AM by sem-geologist »

John Donovan

  • Administrator
  • Emeritus
  • *****
  • Posts: 3275
  • Other duties as assigned...
    • Probe Software
Re: Cameca .Tiff file outputs
« Reply #7 on: April 06, 2021, 06:22:39 AM »
1) I am still working on reverse engineering the Cameca formats in my FREE TIME.
2) Cameca will not give you the header information, as there is a 99.9% probability they don't know its structure themselves. To me, as a seasoned reverse engineer (I have successfully reverse-engineered some other microscope binary formats), lots of stuff looks like a direct memory dump (with all the garbage in between). So don't ask them; they will not give you this, as they don't have it.
3) There are lots of changes between the PeakSight 5 and 6 formats; older formats contain some template structures (empty structures inserted in different weird binary places, for what? don't ask me) which I could only recently decipher - that was the main wall I had been banging my head against for the last two years.
4) I am working on a universal parser - that is, not only for impDat, but also wdsDat, qtiDat... and calibrations too! Oh, and setups. And it has to do it right for PeakSight 5 and PeakSight 6 files.

If you reverse engineer the Cameca impDat file format, please share it with us!  We are utilizing the .TXT format and it is not ideal.

sem-geologist

  • Professor
  • ****
  • Posts: 302
Re: Cameca .Tiff file outputs
« Reply #8 on: April 06, 2021, 07:52:21 AM »
Quote
If you reverse engineer the Cameca impDat file format, please share it with us!  We are utilizing the .TXT format and it is not ideal.
There is no problem sharing it under the GPL; the problem could be incorporating the code directly into proprietary software (ProbeSoftware). Although we could probably make a special agreement (as, e.g., the Qt framework does, providing the same code under a commercial license for closed-source projects and under the GPL for open-source projects).

...going back to neko's rant. BSE images are 8-bit because the BSE signal is converted to digital form with an 8-bit ADC, so it is saved as 8-bit TIFFs. Normalization is stupid, and Cameca is not alone in this: we had similar idiotic issues with mosaicing in Bruker Esprit on our SEMs (same problem, same cause: 32-bit software, thus memory limitations, thus converting everything to 8 bits before merging, and on top of that an idiotic normalization to the max-value pixel in the cell)... until I reverse-engineered their bcf format, and now we have true 16-bit mosaics. So you would not get more information on Cameca with 16-bit TIFFs, as they only have 8 bits of information. (Bruker is the exception, as their ADC is 16-bit, which is why they can gather 16-bit BSE/SEI; you could get 16-bit BSE images on Cameca instruments using Bruker's full licence and hardware (with the basic licence, not only are there fewer software options, but the hardware (PCI card) is also missing the add-on card with the ADCs)). Element mapping is a different story - the information is stored internally as 32-bit floating point values, as that allows much higher values to be stored than 32-bit integers do, and 16 bits would easily overflow in many cases. I can't remember how it is in PeakSight 5, but in 6 you can export images as 32-bit TIFFs, and those are 32-bit floating point, not integer; the Windows image viewer gets confused and shows garbage because it misinterprets them as 32-bit integer TIFFs.
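The float-versus-integer confusion is easy to reproduce with Python's standard `struct` module. The count values below are made up; the point is only that the same four bytes read as a 32-bit integer give a huge, meaningless number.

```python
import struct

# Hypothetical X-ray counts, as they would sit in a 32-bit float pixel buffer
counts = [0.0, 1.0, 250.5, 100000.0]
raw = struct.pack("<4f", *counts)

as_float = struct.unpack("<4f", raw)  # correct interpretation
as_int = struct.unpack("<4i", raw)    # what a viewer expecting integers sees

# 1.0 as float32 has the bit pattern 0x3F800000, i.e. 1065353216 as an int32
one_as_int = as_int[1]
```

So when reading the 32-bit TIFFs in Python, it is worth checking that the reader reports a floating point dtype before trusting the numbers.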

John Donovan

  • Administrator
  • Emeritus
  • *****
  • Posts: 3275
  • Other duties as assigned...
    • Probe Software
Re: Cameca .Tiff file outputs
« Reply #9 on: April 06, 2021, 07:55:51 AM »
Quote
If you reverse engineer the Cameca impDat file format, please share it with us!  We are utilizing the .TXT format and it is not ideal.
There is no problem sharing it under the GPL; the problem could be incorporating the code directly into proprietary software (ProbeSoftware). Although we could probably make a special agreement (as, e.g., the Qt framework does, providing the same code under a commercial license for closed-source projects and under the GPL for open-source projects).

I do not want the code, just any documentation you manage to produce.

sem-geologist

  • Professor
  • ****
  • Posts: 302
Re: Cameca .Tiff file outputs
« Reply #10 on: April 06, 2021, 12:13:02 PM »
the documentation is the ksy code... and the ksy code is the documentation.   :o
I can imagine what you mean by "documentation". When I was reverse engineering bcf (Bruker hypermaps) I did it the old way, where the documentation, i.e. the offsets of the different pieces of data, could be written down in Word documents, Excel spreadsheets, on paper and so on. For such a format static documentation makes sense, especially when the format was clearly designed by someone and thought through from the bottom up. Such formats have a well-defined structure (the best binary structure I ever saw was in some old JEOL SEM formats - simply beautiful, if there is beauty in 0s and 1s) and often contain a table of contents which holds the offsets to the particular items (if the data is itemized, e.g. points, images). This makes it possible to build a lazy (memory-efficient) loader which reads only the item you need from disk instead of loading everything into RAM.
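As an illustration of why a table of contents makes lazy loading easy, here is a toy container in Python. The layout (an item count, then an offset/size pair per item, then the payloads) is invented for this sketch and is not the JEOL, Bruker or Cameca format.

```python
import io
import struct


def build_container(items):
    """Pack byte strings into a toy container: count, (offset, size) table, payloads."""
    table_size = 4 + 8 * len(items)
    offset = table_size
    table = struct.pack("<I", len(items))
    for item in items:
        table += struct.pack("<II", offset, len(item))
        offset += len(item)
    return table + b"".join(items)


def read_item(stream, index):
    """Read only item `index`: seek to its table entry, then to its payload."""
    stream.seek(0)
    (count,) = struct.unpack("<I", stream.read(4))
    if not 0 <= index < count:
        raise IndexError(index)
    stream.seek(4 + 8 * index)
    offset, size = struct.unpack("<II", stream.read(8))
    stream.seek(offset)
    return stream.read(size)


blob = build_container([b"point-1", b"image-data", b"spectrum"])
second = read_item(io.BytesIO(blob), 1)  # touches only the table and one payload
```

The reader never has to walk through the first item to find the second one, which is exactly what a format without a table of contents cannot offer.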

Cameca formats, as I mentioned before, are very probably direct memory dumps by the software. The structure is highly dynamic and chaotic (well, it has general parts which I would call header, main and footer); all of them (maybe except the footer) are dynamic. There is no table of contents... and items have no fixed size and no hint of their size, so an item needs to be read field by field. Each item is built from its own header, main part and footer... with junk here and there: some of fixed size, some activated by a prefixed binary flag, some the result of a combination of file version, sub-sub-sub-structure version, file type (or version, or both), substructure version and so on... Of course all strings have dynamic length, so you need to parse every string to get further along the byte stream. Open the file in PeakSight, do something (like changing the contrast) and save over the file - and the structure has changed: the header got some strings, the main part got some new stuff, and so on and on... You thought you had those offsets; those offsets work no more...
So using the old reverse-engineering methods I found it too hard to RE. Pretty soon I realized that documenting this format in a fixed form makes no sense.
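A toy example of why fixed offsets stop working in such a layout. The record structure here (a length-prefixed string, a flag byte that switches on an optional 4-byte block, then a value) is purely hypothetical and far simpler than the real files; it only shows that the position of the value depends on everything read before it.

```python
import io
import struct


def read_string(stream):
    """Read a uint32 length followed by that many bytes of text."""
    (length,) = struct.unpack("<I", stream.read(4))
    return stream.read(length).decode("latin-1")


def read_record(stream):
    """Read a name, an optional block switched on by a flag byte, then a value."""
    name = read_string(stream)
    (flag,) = struct.unpack("<B", stream.read(1))
    extra = stream.read(4) if flag else b""  # optional block is 4 bytes in this toy
    (value,) = struct.unpack("<i", stream.read(4))
    return name, extra, value


def make_record(name, value, extra=b""):
    """Build a record in the same toy layout (extra must be empty or 4 bytes)."""
    data = name.encode("latin-1")
    flag = 1 if extra else 0
    return (struct.pack("<I", len(data)) + data + bytes([flag])
            + extra + struct.pack("<i", value))


short = make_record("BSE", 42)
longer = make_record("BSE after re-save", 42, extra=b"\x00\x01\x02\x03")

rec_short = read_record(io.BytesIO(short))
rec_longer = read_record(io.BytesIO(longer))
```

Both records carry the same value, but it sits at a different byte offset in each, so only a field-by-field parser finds it reliably.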

What I needed was a dynamic hex editor, and I was looking for one to buy... but fortunately I found a free one which is, I am 100% sure in saying, the most powerful RE tool - "kaitai_struct": https://kaitai.io/
With that tool I could finally make real breakthroughs in the RE of those formats. I could start bisecting not only older vs newer versions of the files, but formats against formats, to find common structures, "illogicalities" and junk. There is still lots of work to bisect all the possible stuff, but at least there is some hope of finalizing this and making it fully functional.
So that should explain why the ksy is the documentation and the documentation is the ksy.

I am attaching the most recent version of my RE ksy; as the only author and the only copyright holder I have decided to release it under the LGPL license. That way there are no problems for me or for anyone else. LGPL code can be used and distributed with proprietary programs with no need for separate licenses. The only requirement is that it must not be packed into the binary, so that users can swap it for other versions.

It's up to you to implement your own parser based on the information in the ksy... but I would advise using the ksy directly, as in my opinion that has more benefits for both sides.
Your benefits:
1) When new PeakSight versions add new "features" to the binary Cameca files, having the ksy file will allow the new bytes/records inserted into the structure to be addressed. With a ready ksy and the Kaitai Web IDE it is so easy that it would take a few minutes to add the additional parsing instructions, without breaking the API.
2) I could miss some bits which some user activates in PeakSight; that user would be able to experiment with modifying the ksy and send the bits found to you, so that all users would benefit.
3) Some users may have an older PeakSight, or may want to return to datasets acquired before PeakSight 5. They could modify the ksy so that older files could be read without recompiling anything. Changes to the ksy could be shared upstream, so that all ProbeSoftware users would benefit.
4) In case I stop being an interested party (stop being an EPMA operator, which is quite real and imaginable, as I need to feed my family), it is much easier to continue reverse engineering from work in progress than to start from scratch. With the ksy and a free, powerful IDE it is easy to catch up.
5) In case I find new structures, I update the ksy upstream, and ProbeSoftware could use that right away without recompiling the whole of ProbeSoftware, only the wrapper DLL.

My benefits:
1) Direct changes to the ksy would benefit my applications. A larger number of users has a higher chance of hitting corner cases which could slip my mind.

How to use ksy directly:
As ProbeSoftware is written mostly in VB, and there is no kaitai parser compiler for VB, some dynamic library/wrapper would have to be made which could be used from VB. I guess VB can use compiled C++ or C# DLLs. Such a small C++ or C# wrapper could expose the relevant parts of the data to VB while hiding all the junk and useless details. The wrapper library should use the kaitai compiler code, which is MIT licensed and can be statically linked into the same wrapper library.

The attached ksy file is a work in progress; some things are not finished. I attach it so you can look at it in the Kaitai Web IDE and consider making a wrapper C library for your use. Also, it has not been cleaned of questionable variable names like "stuff", "thingy" and other Monty-Pythonic references.

My plan is to reach a state where all kinds of files can be parsed successfully; cleanup and optimization will follow.
« Last Edit: April 07, 2021, 05:14:29 AM by sem-geologist »

sem-geologist

  • Professor
  • ****
  • Posts: 302
Re: Cameca .Tiff file outputs
« Reply #11 on: April 06, 2021, 12:40:03 PM »
oh, I forgot... the ksy is not used directly. That means that if it is used as code, the wrapper library should also carry the LGPL license, and its source for recompilation, along with the ksy, should be shared at the user's request or distributed together with the program. Sorry for the misleading statements in the post above. I forgot how this works in Python, as there literally nothing is compiled.
I was probably misled at the time into thinking that using the ksy directly at runtime was going to be one of the common ways:
https://www.gitmemory.com/issue/kaitai-io/kaitai_struct_python_runtime/50/691314234; but it looks like that is a bad idea, and precompilation/transformation into a specific language is going to be the main way.

sem-geologist

  • Professor
  • ****
  • Posts: 302
Re: Cameca .Tiff file outputs
« Reply #12 on: May 10, 2021, 03:08:50 PM »
I am attaching a somewhat newer version. Thanks to this thread I was reminded that it is possible to do mosaics (how could I miss that...?). That was quite a missing piece, and taking it into account, some binary parts which previously looked like complete junk at last start to make some sense. I tried the latest version on many qtiDat, wdsDat and impDat files, but there is a very high probability that I have not covered some corner case. The implementation still does not parse all items, only the first dataset. I had one mosaic file (8x7) which contained only a single BSE channel; it would be good to test on some larger (more signals) mapping (mosaicing) impDat.

I ask all interested parties to download this and test it on your special binary files. To check the parser you don't need to install anything; you can use the Kaitai Web IDE (https://ide.kaitai.io/). Although it is a web IDE (it works in the browser), it processes the data on your PC and does not upload files to the internet, so no worries (the web IDE is the easiest way to use JavaScript, as that comes preinstalled with most web browsers). For extremely big files, make sure you are using a 64-bit browser and that your OS or browser does not constrain the memory of the browser's built-in JavaScript. You basically need to drag and drop the ksy file, and after that the binary file, into the IDE.

I want to be sure that no binary files cause any errors at this stage so that I can move on to finalizing this. Some outer structures are left to be (re)defined before re-enabling parsing of all datasets in the file. That outer data structure is different for qti, wds and image/mapping data, with only some parts in common, so I need to rewrite and/or write it (that is why parsing is currently set to the first dataset only). If no more surprises come up in those outer structs (unless some of you report errors), then finalizing it is very, very near.
I will then go on to cleaning up and renaming some secondary fields/attributes (like turning "junk", "unknown", "etc" fields into numbered "reserved" ones) and adding some inline documentation useful for a complete implementation in the target language.

Quote
Nope, no idea what is in that 3 GB.....pixies?  unicorns?  analysts' tears?
Coming back to this statement, I need to say that those files can look bloated, but they actually have quite a lot of advantages. If the acquisition is set to do more than a single run (scanning the same area multiple times), the data from every run is saved in the file, and after those the sum of all the runs is saved as an additional layer/array. The same goes for WDS scans: if multiple passes are set, all of the runs are saved separately, followed by a sum spectrum. The file can easily bloat, as mapping (and/or mosaicing) takes 32-bit floating point per pixel. 8-bit grey is available only when taking a single picture with the take-image button in PeakSight. Optical impDat images also take 32 bits (but as RGBX, where X is not used, so 24 bits in a 32-bit block; an optical image mosaic can therefore also look like 32-bit).
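A back-of-the-envelope estimate of how quickly this adds up, as a small Python function. The tile size, signal count and run count below are hypothetical, and the function counts pixel data only (no headers or other structures); it also assumes that no separate sum layer is written for a single run, which the description above does not state either way.

```python
def estimated_bytes(width, height, tiles=1, signals=1, runs=1, bytes_per_pixel=4):
    """Rough pixel-data size: every run is stored, plus one summed layer if runs > 1."""
    layers = runs + 1 if runs > 1 else 1
    return width * height * bytes_per_pixel * layers * signals * tiles


# Hypothetical 8x7 mosaic (56 tiles) of 1024x1024 pixels with 5 signals
single_pass = estimated_bytes(1024, 1024, tiles=56, signals=5)
two_passes = estimated_bytes(1024, 1024, tiles=56, signals=5, runs=2)
```

Under these assumptions a single pass comes to 1,174,405,120 bytes (about 1.2 GB) and two passes to 3,523,215,360 bytes (about 3.5 GB), so a 3+ GB file needs no unicorns.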
« Last Edit: May 10, 2021, 03:26:39 PM by sem-geologist »

John Donovan

  • Administrator
  • Emeritus
  • *****
  • Posts: 3275
  • Other duties as assigned...
    • Probe Software
Re: Cameca .Tiff file outputs
« Reply #13 on: May 10, 2021, 03:58:07 PM »
Since SG mentioned mosaicing of image files, I should also mention that if you have the Probe for EPMA software one can mosaic BMP, GIF or JPG images using the Mosaic feature in the Stage application as described here:

https://probesoftware.com/smf/index.php?topic=324.msg5044#msg5044

However, this requires that each image file also has an associated .ACQ file which contains the image extents in stage coordinates.  These ACQ files are automatically generated when the mosaic images are acquired using the Stage application.  ACQ files can also be generated in Probe for EPMA through various image acquisition modes when the images are exported.

Similarly, one can also utilize the Probe Image software to mosaic sets of images as described here:

https://probesoftware.com/smf/index.php?topic=132.msg542#msg542

But again, this requires that each image file have an associated .PrbImg file which contains the stage extents.

On the other hand one can mosaic sets of images (using 32 bit data) using Golden Software's Surfer package. That is, if the images are converted to the .GRD format used by the Surfer program (which already includes the X and Y coordinates). 

Finally one can utilize the *free* PictureSnapApp software to mosaic sets of .BMP files into a mosaic, but again, each BMP file requires an associated .ACQ file containing the stage extents as described here:

https://probesoftware.com/smf/index.php?topic=1087.msg7230#msg7230

sem-geologist

  • Professor
  • ****
  • Posts: 302
Re: Cameca .Tiff file outputs
« Reply #14 on: May 11, 2021, 06:22:09 AM »
Yes John,
I can imagine that your software can do that. But why then do you need the ability to read impDat?

I am attaching a nearly fully working parser. Some dynamic dtype checking for qti data is still missing (ALL, by stoichiometry, and matrix + stoichiometry are implemented; the others are not). But for images and WDS scans this version should already work fully. I did a small cleanup; a few attributes got reorganized and renamed. It is going to be cleaned up a bit more. I should probably open a GitHub repository for this file, so that anyone who comes across a non-implemented feature or an error can report it there. I think you can really start experimenting with the latest version. I think the ability to read wdsScans and Qti data, or even calDat, could be a nice feature for very new customers, as importing old data would let them switch seamlessly.

A note about the way mosaics are saved in impDat: it looks like all mosaic images/pieces are blobbed into a single continuous array, which looks extremely strange; the exact data boundaries of every piece need to be deduced experimentally. Also, there is only a single record of the stage coordinates (for the first piece of the mosaic?).
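If the pieces turn out to be stored back to back with equal sizes and no padding, cutting the blob apart would be simple. That is an untested assumption, not something established in this thread (the boundaries still have to be found experimentally), and the function name and toy data below are made up.

```python
def split_tiles(flat, tile_width, tile_height, n_tiles):
    """Cut one flat sequence into n_tiles row-major tiles of equal size."""
    per_tile = tile_width * tile_height
    if len(flat) != per_tile * n_tiles:
        raise ValueError("array length does not match the assumed tile geometry")
    tiles = []
    for t in range(n_tiles):
        chunk = flat[t * per_tile:(t + 1) * per_tile]
        # Rebuild the rows of this tile from its contiguous chunk
        tiles.append([chunk[r * tile_width:(r + 1) * tile_width]
                      for r in range(tile_height)])
    return tiles


flat = list(range(12))              # two toy 3x2 tiles stored back to back
tiles = split_tiles(flat, 3, 2, 2)
```

The length check is a cheap first test of the assumption: if the array length is not an exact multiple of the tile size, there is padding or extra data between the pieces.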
« Last Edit: May 12, 2021, 12:19:21 AM by sem-geologist »