John Donovan

Error trapping JEOL (and Cameca) instrument problems
« on: January 21, 2017, 10:06:18 AM »
One significant difference between our Probe Software applications and the OEM software is that our software generally reports all errors returned by the instrument to the user. The JEOL or Cameca OEM software, by contrast, sometimes appears to ignore certain errors entirely, and at other times appears to ignore the first N instances of an error, reporting it only after it has occurred a number of times. It is not entirely clear which errors are treated this way; undoubtedly it depends on the model and on the software and firmware versions of the specific instrument.

In practice this means that if a communication chip in your instrument electronics is getting a little old and noisy, PFE will often report the problem first, while the OEM software acts as though nothing is wrong and continues on its merry way!
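To make the difference between the two behaviors concrete, here is a minimal Python sketch of both reporting policies. The threshold behavior of the OEM software is only inferred from what we observe, so the `ThresholdReporter` class and its `n_ignore` parameter are hypothetical illustrations, not how any JEOL or Cameca software is actually written.

```python
from collections import Counter


class ReportAllReporter:
    """Report every error the instrument returns (the policy PFE generally follows)."""

    def __init__(self):
        self.reported = []

    def handle(self, error_code):
        self.reported.append(error_code)


class ThresholdReporter:
    """Hypothetical policy: silently ignore the first n_ignore occurrences of each error code."""

    def __init__(self, n_ignore):
        self.n_ignore = n_ignore
        self.seen = Counter()
        self.reported = []

    def handle(self, error_code):
        self.seen[error_code] += 1
        # Only surface the error once it has occurred more than n_ignore times
        if self.seen[error_code] > self.n_ignore:
            self.reported.append(error_code)


# An aging comms chip: the same error shows up intermittently during a run
errors = [1303, 1303, 1303]
report_all = ReportAllReporter()
threshold = ThresholdReporter(n_ignore=5)
for code in errors:
    report_all.handle(code)
    threshold.handle(code)

print(len(report_all.reported), len(threshold.reported))  # 3 0
```

With only a few intermittent errors, the threshold policy reports nothing at all, which is exactly the situation in which a noisy chip goes unnoticed.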

This is not just an issue for JEOL instruments. I once received an email from an SX100 lab saying there was something wrong with our Probe for EPMA software, because PFE was occasionally reporting a stage movement timeout during an overnight 1 um step stage traverse. This particular error means that PFE had commanded a 1 um stage movement, but after 60 seconds (or so) the instrument still had not reported that the motion was complete, so PFE reports the problem to the user. The user also said that it had to be a problem with PFE, because the Cameca software could do 1 um step stage traverses just fine.

So I asked: when you get this motion-complete timeout error from PFE, is the stage still oscillating back and forth by a tiny amount? And they said "well yeah, it is just sitting there moving back and forth slightly". And I said: "well, that means the stage is trying to reach the commanded position but cannot, and that is why the instrument never reports the motion complete!" What was interesting was that the Cameca UNIX software did not wait for this motion-complete flag. Instead it simply waited an arbitrary amount of time and then let the acquisition proceed while the stage was still oscillating, thus "smearing out" the measurement!
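The difference between waiting for the motion-complete flag and waiting a fixed time can be sketched in Python. PFE's actual implementation is not shown here, so `wait_for_motion_complete`, the `is_motion_complete` callable and the 0.1 s polling interval are hypothetical; only the roughly 60 second timeout comes from the description above. A simulated clock is injected so the example runs instantly instead of really waiting a minute.

```python
import time


class MotionTimeoutError(Exception):
    """Raised when the stage never reports motion complete within the timeout."""


def wait_for_motion_complete(is_motion_complete, timeout_s=60.0, poll_s=0.1,
                             clock=time.monotonic, sleep=time.sleep):
    """Poll a (hypothetical) motion-complete flag until it is set or the timeout expires."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_motion_complete():
            return
        sleep(poll_s)
    raise MotionTimeoutError(f"stage did not report motion complete within {timeout_s} s")


def wait_fixed_delay(delay_s, sleep=time.sleep):
    """The alternative: wait an arbitrary time and proceed regardless of stage state."""
    sleep(delay_s)


class FakeClock:
    """Simulated clock so the example does not actually sleep."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, dt):
        self.t += dt


# A stage with a loose brake oscillates around the target and never settles
clock = FakeClock()
oscillating_stage = lambda: False

try:
    wait_for_motion_complete(oscillating_stage, timeout_s=60.0,
                             clock=clock.now, sleep=clock.sleep)
    result = "acquire"
except MotionTimeoutError:
    result = "error reported"

print(result)  # error reported
```

With the fixed-delay approach the acquisition would simply start after `delay_s`, with nothing to tell the user that the stage was still moving.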

Fortunately the Cameca engineer was able to determine that the Y stage axis "brake" was too loose and needed adjustment. I don't know the details of how specific versions of the JEOL and Cameca software handle such errors, but at times in the past the OEM software has simply ignored some errors that we think should be reported, because they indicate a problem with the instrument acquisition that should be dealt with.

On the JEOL side of things, some early 8230/8530 instruments had some spectrometer communication errors which were reported by PFE, but never by the JEOL PC-SEM software.  When we contacted JEOL, they said "Yes, we know about these spectrometer comms errors, and are working on a fix, so just ignore those errors for now as they aren't causing any actual problems".   The problem was apparently a firmware issue that JEOL has since fixed in subsequent firmware updates for the 8230/8530.
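One way to handle a case like this, where the vendor confirms that a specific error is harmless, is an explicit allowlist: every other error is still raised, and even the waived one is logged so it leaves a trail. This is a minimal Python sketch under assumptions, not PFE code; the `KNOWN_BENIGN` set, the placeholder code 9999 and `handle_device_error` are all hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("instrument")

# Hypothetical: device error codes the vendor has explicitly confirmed are benign.
# 9999 is a placeholder, not a real JEOL or Cameca error code.
KNOWN_BENIGN = {9999}


class InstrumentError(Exception):
    """Raised for any device error not on the known-benign list."""

    def __init__(self, code, message):
        super().__init__(f"device error {code}: {message}")
        self.code = code


def handle_device_error(code, message):
    """Raise on every device error unless it is known to be benign; log the benign ones anyway."""
    if code in KNOWN_BENIGN:
        log.info("ignoring known-benign device error %d: %s", code, message)
        return
    raise InstrumentError(code, message)


try:
    handle_device_error(1303, "undocumented spectrometer error")
except InstrumentError as err:
    print(err)  # device error 1303: undocumented spectrometer error
```

Keeping the list explicit means an error is only waived after someone has positively confirmed it is harmless, rather than by default.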

More recently, some JEOL 8900 instruments (which aren't getting any younger!) have reported errors in PFE while, again, the same operation in the JEOL UNIX software reports no problem. One example is an error reported by Minghua Ren on his 8900 instrument:

ERROR in JeolMoveMotor (J8K_SetSpectroSpeed [spectro 4]) : JEOL device error number returned,  1303, undocumented spectrometer error

This is an example of the JEOL instrument reporting a general communication error, which the JEOL UNIX software apparently ignores. The same command works fine on other JEOL 8900 instruments running PFE, so we suspect that one or more communication chips on this particular instrument are getting a little "flaky", as I believe the technical term goes!
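When errors like this turn up, tallying them by spectrometer and error code helps separate a flaky board on one channel from a general software problem. The Python sketch below assumes error lines look exactly like the single example above; other PFE messages may be formatted differently, so the regular expression and the `tally_spectro_errors` helper are hypothetical.

```python
import re
from collections import Counter

# Assumes lines match the format of the example error message above
ERROR_RE = re.compile(
    r"ERROR in (?P<routine>\w+) \((?P<call>\w+) \[spectro (?P<spectro>\d+)\]\)"
    r" : JEOL device error number returned,\s+(?P<code>\d+),"
)


def tally_spectro_errors(lines):
    """Count JEOL device errors per (spectrometer, error code); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            counts[(int(m["spectro"]), int(m["code"]))] += 1
    return counts


error_log = [
    "ERROR in JeolMoveMotor (J8K_SetSpectroSpeed [spectro 4]) : "
    "JEOL device error number returned,  1303, undocumented spectrometer error",
    "ERROR in JeolMoveMotor (J8K_SetSpectroSpeed [spectro 4]) : "
    "JEOL device error number returned,  1303, undocumented spectrometer error",
    "Some unrelated status line",
]

print(tally_spectro_errors(error_log))  # Counter({(4, 1303): 2})
```

If one spectrometer dominates the tally while the others stay clean, that points at hardware on that channel rather than at the acquisition software.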

Anette von der Handt puts it this way in an email discussion with Minghua:

Quote
I found PFE much more sensitive to hardware issues. Usually I pick it up with PFE or PI first and then the JEOL software follows suit within a couple of days (happened at my machine with spectrometer boards, stage decoders, and crystal flip motors regardless)...

By now I "trust" PFE (at first I was puzzled and thought it is a software issue. But again, it would usually quickly show up in the JEOL system too) and even my JEOL engineer is slowly coming around to it too.

This is a topic where one can debate whether software should ignore hardware errors or report them. My preference is to always report errors unless we specifically know that the error is benign and will not result in corrupted data. If we ignore a hardware error returned from the instrument, the instrument may not be performing exactly the acquisition that one thinks was performed. Here are a few more wise words from Anette on this specific subject (quoted with permission).

Quote
However, "Bad communication chips" is my guess too. Somehow, the JEOL system is more forgiving which can be nice when you desperately need data but often it mixes in bad data beforehand.

And some stuff can lurk around in the machine forever. I am still annoyed by the faulty XRay counter board which was apparently bad since the installation of the machine (23 years ago). And my machine was not the only 8900 that was affected by it...

In case you are interested, you might want to run high current maps, lower magnification beam scan maps (so that you get Bragg defocusing) on an aluminum metal standard. I would get 2-5% of pixels with 128/256/512 values like hot pixels. I only noticed because I started doing quant maps with PI (I showed that at the PFE user meeting at the TC this year).

This hot pixel mapping issue reminds me a bit of the counting tests John Armstrong performed on his 8530 a year or so ago, in which he determined that JEOL had used some lower-performing chips for photon counting in the spectrometer pre-amps, and found that they registered low count rates even when the beam was off! I believe JEOL has replaced these chips in all the affected instruments, but you might want to ask your engineer!
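For the aluminum map test Anette describes, hot pixels can be flagged by comparing each pixel with the median of its neighbors. This is a generic neighborhood-median detector in Python, not her procedure or a PI feature; `find_hot_pixels`, `hot_pixel_fraction` and the `threshold` value are assumptions. For real maps the threshold should sit well above counting noise (several times the square root of the typical pixel count). Listing the excess counts lets you check whether the spikes cluster at levels such as 128, 256 or 512.

```python
from statistics import median


def find_hot_pixels(image, threshold):
    """Return (row, col, excess) for interior pixels whose counts exceed the
    median of their 8 neighbors by more than `threshold` counts.

    image: list of rows of integer counts, e.g. a map of a uniform Al standard.
    Edge pixels are skipped for simplicity.
    """
    hot = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            neighbors = [image[r + dr][c + dc]
                         for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                         if (dr, dc) != (0, 0)]
            excess = image[r][c] - median(neighbors)
            if excess > threshold:
                hot.append((r, c, excess))
    return hot


def hot_pixel_fraction(image, threshold):
    """Fraction of interior pixels flagged as hot."""
    interior = (len(image) - 2) * (len(image[0]) - 2)
    return len(find_hot_pixels(image, threshold)) / interior


# Synthetic 5x5 map for illustration: uniform 100 counts, one pixel spiked by 256
test_map = [[100] * 5 for _ in range(5)]
test_map[2][2] += 256
print(find_hot_pixels(test_map, threshold=50))  # [(2, 2, 256.0)]
```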

The point of all this is simply that we should not treat our instruments as a "black box"; instead we should perform routine tests that check that the stage, spectrometers, counters, etc., are all operating as expected. I know, I know, none of us has the time to do regular instrument validation (something Paul Carpenter has been urging us all to do for many years), because our labs are very busy. But if PFE does report a hardware problem from the instrument, be assured there is probably something wrong with the instrument hardware and/or electronics (or even the stage mechanics).
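One simple routine check of this kind compares the scatter of repeated measurements with what counting statistics alone predict. For raw X-ray counts N, Poisson statistics give a standard deviation of about sqrt(N), so the ratio of the observed standard deviation to sqrt(mean) should be near 1. This is a generic Python sketch, not Paul Carpenter's protocol or a PFE feature; `counting_stats_ratio` and the example numbers are hypothetical. A ratio well above 1 means extra variance from somewhere (stage, spectrometer reproducibility, counting electronics or beam current drift).

```python
from math import sqrt
from statistics import mean, stdev


def counting_stats_ratio(counts):
    """Observed standard deviation divided by the Poisson expectation sqrt(mean).

    counts: raw integer counts (not count rates) from repeated measurements of the
    same spot on a stable standard under identical conditions.
    """
    return stdev(counts) / sqrt(mean(counts))


# Synthetic repeat measurements, for illustration only
steady = [10000, 10100, 9900, 10050, 9950]
unsteady = [10000, 10600, 9400, 10300, 9700]
print(round(counting_stats_ratio(steady), 2))    # 0.79
print(round(counting_stats_ratio(unsteady), 2))  # 4.74
```

With only a handful of repeats the ratio itself is noisy, so more repeats give a more trustworthy result.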

Figuring out whether a problem is a software, electronic, or even mechanical hardware problem (e.g., the stage mechanical issue mentioned above) is sometimes very difficult, but we need to be able to recognize these symptoms and treat them properly.

john
« Last Edit: January 21, 2017, 11:24:04 AM by John Donovan »
John J. Donovan, Pres. 
(541) 343-3400

"Not Absolutely Certain, Yet Reliable"