Validation of Novel Test Methods
In Research and Development, a novel test is often required to measure a property of a new material or prototype in situ, or as close to in situ as possible. Where in-situ testing is not feasible, as in the case of the long-term insulation properties of a passive flow assurance coating, both the test equipment and the validation of test data become important. I present a very specific example to make some very broad points on the governing principles of the qualification of testing:
1. No test procedure should rely on the special expertise (beyond a basic understanding of the test and data analysis) of the test facility to interpret the significance of results.
2. Tests performed under a wide range of conditions, purporting to measure the same property, must demonstrate that those conditions do not impact the accuracy or precision of test results.
3. Where the property of interest is complex (that is, made up of two or more measurements combined to give a single test result), the individual measurements must be shown to measure the same sample, at the same time and in the same geometry.
4. Propagation of errors must be correctly applied to report the variance (precision) of the test.
5. A comparison of a test result to a standard requirement (for pass/fail purposes) must be both specified and reported unambiguously.
A Simulated Service Test (SST) is the Oil and Gas Industry's name for a test which simulates the pressure and temperature gradient across an insulated pipe coating under operating conditions (i.e., 1000 to 3000 meters depth, internal temperature of 80°C to 300°C, and external water temperature of 4°C to 23°C). Insulation is important because temperature drops inside the pipe can lead to the formation of hydrates, asphaltenes and/or waxes, any one of which can permanently shut down a line with an intended operating service life of decades.
Apart from the cost of a pressure vessel capable of sustaining this pressure on a large sample, while maintaining an internal temperature under steady-state conditions for the duration of the test, the question of axial heat loss (how much heat is lost through the coating versus through the length of the pipe) has traditionally made both the engineering set-up of the test and the interpretation of data controversial, to the point where the specific expertise and qualifications of the facility operator are as much at issue as the integrity of the test equipment. The situation is aggravated by the fact that the test is very expensive and is typically the last gatekeeper before full approval for production. I won't weigh in on the issues of quantifying reputation or experience as a test requirement, but there is an alternative available through the ISO 17025 standard for qualifying novel tests and facilities where no other qualified facility exists and no objective primary standard for the test property is available (either because no such universal standard exists, or because the property itself is so complex that no representative standard can fully qualify the test or facility).
Under normal circumstances, a new lab (facility, piece of equipment, or proposed methodology) can be qualified by testing a primary standard (e.g., NIST-certified) of known and certified property values. For example: the standard kilogram, a platinum-iridium alloy cylinder kept at the International Bureau of Weights and Measures near Paris, against which all other weight measures were standardized. Later the mass standard was redefined in terms of Planck's constant (which never changes), and the distance standard was based on the wavelength of the krypton-86 spectrum (which also never changes). In either case, any new device or facility we wish to qualify demonstrates capability by showing that its measurement of an object calibrated against the primary standard consistently yields the certified value as a test result, under a variety of conditions which change from test to test but do not alter the dimensions or weight of the sample being tested. Another method is round-robin testing against other facilities already qualified to perform the test. Alone or in tandem, these methods, properly documented and witnessed, demonstrate that the test facility is capable of generating the 'right' answer (meaning measurably accurate and precise).
The insulation value of a passive flow assurance coating is defined by U value:
U = Q / (A*delta T)
Where A is a ratio of dimensions, ΔT is the temperature gradient, and Q is the heat flux measured through the thickness at any one axial and radial location. Temperature and dimensions can be measured with verifiable precision and accuracy by any number of methods, but the point of dispute is the measurement of heat flux. Heat flux can be measured directly by a heat flux sensor, which can be calibrated to a known standard, or indirectly by energy loss. In a building, or a sub-sea pipeline, the reason we want to know heat flux is to determine how much energy is lost through the wall when it is warm inside and cold outside. The difficulty in measuring this in a pipe is that some of the energy is lost through the inefficiency of the heaters themselves, and a good deal more is lost through thermal conduction along the inner steel pipe to the axial ends of the test. Estimating the proportion lost is, by any standard definition, an estimate at best, and may vary with a large number of other factors (the thermal properties of the test fluid and their tendency to dissipate heat axially and radially, both internally and at interfaces; the thermal efficiency of the heaters; thermal loss in other mechanical parts and interfaces; and environmental effects on the precision and accuracy of thermocouples). In short, while we can readily measure the total energy required to maintain the internal temperature at a constant value, we cannot definitively say how much of that energy loss comes from the fluid itself and how much comes from the design and construction of the test device and its measurement systems.
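As a minimal sketch of the direct-measurement approach, the U value follows immediately from one sensor array's readings (all numbers below are illustrative, not taken from any real test):

```python
def u_value(q_flux, area_ratio, delta_t):
    """U = Q / (A * delta_T): q_flux in W, area_ratio dimensionless
    (per the definition above), delta_t in K. Names are illustrative."""
    return q_flux / (area_ratio * delta_t)

# Hypothetical readings from one sensor array:
q = 150.0   # W, radial heat flux through the coating
a = 1.2     # dimensionless ratio of dimensions
dt = 76.0   # K, internal minus external temperature
print(round(u_value(q, a, dt), 3))  # prints 1.645
```

The point is that each array yields its own U directly, with no correction factor in sight.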
One way to simulate this in the laboratory is with a Flow Loop, which runs heated oil through a much longer continuous length of insulated piping. This system potentially suffers from the same problem as the SST, but the fraction of energy lost to axial effects is incrementally smaller. The conventional solution was to have an expert generate a correction factor. This was problematic, because the correction factor included, at some point, the theoretical U value of the test sample, and an unverifiable assumption about the overall efficiency of the test vessel. It would be vastly preferable for a test to eliminate any assumptions (and therefore the expertise required to justify them). For simple equipment, a calibration certificate solves this problem instantly: a calibrated scale can be trusted to accurately show the weight of what is being measured (if used according to the manufacturer's directions, and under the conditions for which it was calibrated). For complex and unique tests, calibration certificates for individual sensors do not assure that the reported values have been correctly assembled. Although the equation is a rather obvious and simple verification of the correct math based on the averaged individual measurements, the variance (uncertainty) requires calculus to evaluate the propagation of error.
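The calculus in question reduces to a well-known first-order result: for a pure product/quotient like U = Q / (A × ΔT), the relative variances of the inputs add in quadrature. A sketch with illustrative uncertainties:

```python
import math

def u_uncertainty(q, sq, a, sa, dt, sdt):
    """First-order propagation of error for U = Q / (A * dT).
    For a product/quotient, relative variances add:
    (sU/U)^2 = (sQ/Q)^2 + (sA/A)^2 + (sdT/dT)^2.
    All input values below are illustrative."""
    u = q / (a * dt)
    rel = math.sqrt((sq / q) ** 2 + (sa / a) ** 2 + (sdt / dt) ** 2)
    return u, u * rel

u, su = u_uncertainty(150.0, 3.0, 1.2, 0.01, 76.0, 0.5)
print(round(u, 3), round(su, 3))  # prints 1.645 0.037
```

Reporting U as 1.645 ± 0.037 (one standard deviation) rather than a bare number is exactly what point 4 above demands.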
The main argument against heat flux sensors is that they represent an extremely localized measure, roughly equal to their surface area (1 to 10 cm²), and are therefore not representative of any other axial or radial location where heat flow may be different. The other challenge is that a sensor measures only radial heat flux and ignores all other energy dissipation. This argument is somewhat ironic, given that all of the other dissipated energy only needs to be considered for the correction factor used in calculating overall energy loss: a correction which isn't required if we can measure radial heat flux directly. The first criticism is addressed by replication and placement of sensors. An internal thermocouple is aligned axially and radially with the external thermocouple and heat flux sensor. Replication, with placement at different axial and radial locations in the sample, demonstrates both axial and radial variance in U. High reproducibility in readings of the same property by independent sensor arrays demonstrates that the 'effects' which potentially compromise the accuracy of the test are trivial. When a large number of sensor arrays are deployed and most of them agree, it also allows us to discard readings that are unreliable (i.e., due to an individually malfunctioning sensor, or to a location where a localized flaw in the sample may exist). To avoid accusations of selective discarding of data, corrected and uncorrected data are both presented, to demonstrate that the overall impact is on variance rather than performance. This mitigates disputes and retesting, but requires absolute transparency to persuade operators or clients of its legitimacy. The best way to assure this is to write the rules for removing outlier data into the test procedure prior to approval of the test.
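A pre-registered outlier rule can be as simple as a median-absolute-deviation cut; this is a sketch of one possible rule (the actual rule belongs in the approved procedure, and the readings below are invented for illustration):

```python
import statistics

def apply_outlier_rule(readings, k=3.0):
    """Hypothetical pre-registered rule: discard any array whose reading
    lies more than k median-absolute-deviations from the median, and
    report both uncorrected and corrected summaries for transparency."""
    med = statistics.median(readings)
    mad = statistics.median(abs(r - med) for r in readings)
    kept = [r for r in readings if abs(r - med) <= k * mad]
    return {
        "uncorrected_mean": statistics.mean(readings),
        "corrected_mean": statistics.mean(kept),
        "discarded": [r for r in readings if r not in kept],
    }

# Hypothetical U readings from seven independent sensor arrays:
arrays = [1.64, 1.65, 1.63, 1.66, 1.64, 2.10, 1.65]
result = apply_outlier_rule(arrays)
print(result["discarded"])  # prints [2.1]
```

Because the rule is fixed before the test, the one discordant array is removed by procedure rather than by judgment, and both means are still reported.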
Current calibration certificates, a test procedure approved by both the test facility and the client, and full agreement over the interpretation of results prior to performance of the test constitute the first proof of the reliability and accountability of the test facility.
The second 'proof' of the test is agreement between the measured value of U based on the test and the theoretical value of U based on properties measured by other means (typically material properties such as creep, thermal conductivity and density). The material properties are tested against calibrated standards, and the equation relating them carries no correction factors and requires no theoretical justification.
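For a single-layer cylindrical coating, one standard form of that theoretical value follows from the conduction equation for a cylindrical shell, referenced to the inner surface. This is a generic textbook result, not necessarily the exact equation in any facility's procedure, and the dimensions and conductivity below are illustrative:

```python
import math

def u_theoretical(k_cond, r_inner, r_outer):
    """Theoretical U for one cylindrical insulation layer, referenced to
    the inner surface: U = k / (r_i * ln(r_o / r_i)). Standard conduction
    result; inputs below are illustrative, not from any real coating."""
    return k_cond / (r_inner * math.log(r_outer / r_inner))

# Hypothetical 50 mm foam layer on a 0.25 m radius pipe, k = 0.15 W/m.K:
print(round(u_theoretical(0.15, 0.25, 0.30), 3))  # prints 3.291
```

Agreement between this independently derived number and the measured U is what makes the second proof compelling: the two routes share no assumptions.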
The third proof of the test is that it shows agreement between theoretical and measured property values under a wide range of conditions including pressure, temperature and dimension. In this test, it would mean that thermocouples and heat flux sensors are demonstrated to give the same results at different pressures and temperatures. Since compression under pressure also enters the test equation (through the dimensional component), the outer diameter of the insulative foam under compression must also be continuously measured through the course of the test.
An SST typically runs 28 days, and measurements (of temperature, pressure, heat flux and compression) are recorded continuously. Multiple sensor arrays generate multiple independent measures of U value over the course of the test, but given that each sensor in each array collects data continuously, the 'data file' at the end of the test contains millions of readings, mostly identical to each other. This very high degree of replication produces a very low overall variance for any single sensor and makes it demonstrably obvious (and statistically provable) if and when an individual sensor or array is compromised, and why this doesn't necessarily compromise the test.
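With so many near-identical readings, the standard error of each sensor's mean is tiny, so a drifted sensor stands out against the consensus of its peers. A sketch of that cross-check, with invented sensor names, tolerance and readings:

```python
import statistics

def flag_drift(sensor_series, tol=0.5):
    """Sketch: flag any sensor whose mean reading sits more than `tol`
    units from the consensus (median of all sensor means). With thousands
    of near-identical readings per sensor, deviations of this size are
    statistically unambiguous. Names and tolerance are illustrative."""
    means = {name: statistics.mean(xs) for name, xs in sensor_series.items()}
    consensus = statistics.median(means.values())
    return [name for name, m in means.items() if abs(m - consensus) > tol]

# Hypothetical: three healthy thermocouples and one drifted sensor.
series = {
    "TC1": [80.00, 80.01, 79.99] * 100,
    "TC2": [80.01, 80.00, 80.02] * 100,
    "TC3": [79.99, 80.00, 80.01] * 100,
    "TC4": [81.50, 81.49, 81.51] * 100,  # drifted sensor
}
print(flag_drift(series))  # prints ['TC4']
```

Because the remaining arrays still agree, discarding TC4 weakens the variance estimate slightly but does not compromise the test result.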
When I was writing on this as the scientific lead for our company's facility, the arguments sounded self-serving, no matter how objectively true and verifiable. In practice, they were found acceptable by inspectors, client engineers and certifying bodies (like ISO), even in the absence of standardized testing.
Now I find them generally useful and instructive in addressing the question of test validation, whether we simply need to convince ourselves or our management team that a proposed test is useful and relevant to the objectives of product development, or whether we are concerned with presenting or defending a specific test result or methodology to a client, governing body, or legal challenge.
To recap my original 5 points:
When we consider a property P to be measured, particularly against a pass/fail standard which determines whether or not a sample (as representative of a concept) is fit for purpose, the criteria for acceptance should be unconditionally defined BEFORE the sample is tested and never 'corrected' after the fact to justify a failed test, no matter how relevant or sophisticated the justification.
Sensors calibrated to generate a measurement under the conditions of the test must be calibrated to the environmental conditions that the sensor (not necessarily the sample) operates under.
Complex properties must be measured such that the individual measurements can be demonstrated to measure the specific property at the location in the sample where the property is representative. Complex properties should always be reported with a variance established by propagation of errors, unless it can be shown that one particular property is the primary (and nearly exclusive) source of variation under all conditions of the test.
Finally, both the client and the test facility must understand the difference between "greater than" and "greater than or equal to". For a test to pass a "greater than or equal to" requirement, the test result, within a mutually acceptable standard of deviation (such as the 95% confidence interval), may be up to two standard deviations less than the specified pass criterion and still be fit for purpose. A test result that needs to be "greater than" actually needs to be two standard deviations better than the standard to objectively pass, while a requirement for 'between' (such as a dimensional specification) must demonstrate both at least two standard deviations above the lower specification limit and two standard deviations below the upper specification limit to be 'between' the performance requirements.
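The distinction is mechanical once written down; this sketch encodes the reading above (k = 2 standard deviations as the mutually agreed margin, all specification numbers invented for illustration):

```python
def passes(mean, sd, requirement, spec, k=2.0):
    """Pass/fail per the reading above, with k*sd as the agreed margin
    (roughly the 95% confidence half-width).
    '>=' : the mean may fall up to k*sd below spec and still pass.
    '>'  : the mean must clear spec by k*sd to objectively pass.
    Spec values in the examples are illustrative."""
    if requirement == ">=":
        return mean >= spec - k * sd
    if requirement == ">":
        return mean >= spec + k * sd
    raise ValueError(f"unknown requirement: {requirement}")

def between(mean, sd, low, high, k=2.0):
    """'between' requires k*sd clearance of both specification limits."""
    return low + k * sd <= mean <= high - k * sd

print(passes(9.8, 0.15, ">=", 10.0))   # prints True: within 2 sd of limit
print(passes(10.2, 0.15, ">", 10.0))   # prints False: not 2 sd above limit
```

Writing the comparison into the procedure this explicitly is what prevents the argument from happening after the invoice arrives.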
If you think this doesn't matter, I welcome you to take my place in arguing whether a $200,000 test needs to be repeated or not, and whose fault it is.