Test Compression for Launch-on-Capture Transition Fault Testing

A new low-power test compression scheme, called Dcompress, is proposed for launch-on-capture transition fault testing by using a new seed encoding scheme, a new design for testability architecture, and a new low-power test application procedure. The new seed encoding scheme generates seeds for all tests by selecting a primitive polynomial that encodes all tests of a compact test set. A software-defined linear feedback shift register architecture, called SLFSR, is proposed to make the new method conform to the current flow of design and test. Experimental results on benchmark circuits show that test data volume can be compressed up to 6300X with the well-compacted baseline test set for a design with 11.8M gates and more than 1.1M scan flip-flops.


INTRODUCTION
The huge amount of test data for Launch-on-Capture (LOC) delay testing makes test compression for delay testing even more important [2, 6, 40] than for single stuck-at tests. In addition, applying the compressed test data further increases test power.
Test data compression for stuck-at tests has been extensively studied in the past decades [4, 7, 8, 15, 18, 19, 21, 23, 32, 46, 50]. Deterministic Built-In Self-Test (BIST) with reseeding techniques was proposed to compress test data [16, 17, 19, 21, 23, 32]. An early version of the scan forest for single stuck-at fault testing based on structural analysis was presented by Xiang et al. [50], which compresses test data and reduces test application cost. A gating technique was proposed by Saxena et al. [38] to reduce shift power for scan testing; however, test application cost increases with that gating technique. A low-power test application scheme was presented by Czysz et al. [13] in an EDT test compression environment, although some extra control data are required by combining a new gating technique. A gating technique was used by Xiang et al. [48] for cost-effective logic BIST, which was proposed based on the original weighted scan enable based logic BIST technique [46, 48, 51]. The logic BIST techniques based on weighted scan enables can significantly improve fault coverage for single stuck-at fault testing [51] and LOC transition fault testing [49].
The test response compactor of Mitra and Kim [22] handles simultaneous errors from multiple scan chains and sources of unknown logic values (X's), provides diagnosis capabilities, and does not depend on test sets, while providing a significant reduction in scan-out pins and test response data. Xiang et al. [53, 54] proposed an unknown-tolerant test compression technique for single stuck-at tests. A two-stage low-power scan architecture was proposed by Xiang et al. [55] for low-power test application, which also compresses test data.
Combinational test compression techniques were proposed by Bayraktaroglu and Orailoglu [4] and Wang et al. [41], which establish an XOR network that encodes all tests. Test seeds are applied to the network directly, unlike other methods [16, 17, 32] that use a sequential Linear-Feedback Shift Register (LFSR) or ring generator. A response compactor is also combined to compact test responses.
A test data compression and seed encoding approach was proposed by Hellebrand et al. [16] that configures multiple primitive polynomials on the same LFSR. However, only a limited number of primitive polynomials can be configured with that method. It is therefore essential to propose a cost-effective LFSR that can configure a large number of primitive polynomials into a single LFSR, which can be very beneficial for seed encoding and pseudo-random test pattern generation, especially for LOC delay testing.
Compactor circuits designed with the X-Compact technique [22] can be inserted at the scan chain outputs to significantly reduce test application time and test data volume. These compactors handle simultaneous errors from multiple scan chains and sources of unknown logic values (X's), provide diagnosis capabilities, and do not depend on test sets.
Generally, test compression schemes for LOC delay fault testing cannot be extended directly from those for stuck-at tests [14, 44]. This is mainly because LOC delay tests are generated in a two-frame circuit, whereas single stuck-at tests are generated in a single-frame circuit. The same argument applies to test response compaction for LOC delay testing [44].
Test data volume for LOC delay testing is usually quite large, so reduction of test data and test power for at-speed delay testing is essential [18, 33, 42, 43]. An integrated Automatic Test Pattern Generation (ATPG) scheme was proposed by Wu et al. [43], which efficiently and effectively performs compressible low-capture-power X-filling. Initial results on delay test compression were presented in other works [24, 29-31, 39, 43, 44] (see also [48, 56]). However, the method of Wu et al. [43] was proposed for power/thermal considerations, and other methods [29-31] were proposed for functional broadside delay testing. Therefore, there is a strong need for an effective low-power test compression scheme for LOC delay testing, which is widely used in industry.
The Illinois scan architecture was extended to delay testing based on the generated test set by Sharma et al. [39]. A new scan forest architecture for LOC delay testing was proposed by Xiang et al. [44] to compress test data, compact test responses, and reduce test application cost based on two-frame circuit structural analysis. A new Design for Testability (DFT) architecture and test application scheme was presented in another work by Xiang et al. [53] to reduce thermal emergencies for small delay defect testing. The method in yet another work by Xiang et al. [56] includes an effective test ordering scheme, a thermal-aware path selection scheme, and an initial test compression technique. DFT support was proposed by Saeed and Sinanoglu [34] for enabling the use of a set of patterns optimized for cost and quality in a low-power manner. A low-power test generator that reduces test power at launch and capture cycles was proposed by Wen et al. [42] through proper X-filling of tests with many don't-cares, which can be easily incorporated into any test generation or design flow. Other techniques [9-11] reduce test power at launch or capture cycles by revising the test generator or partitioning the circuit.
A fully autonomous BIST approach for FAST was presented by Kampmann et al. [18], which supports in-field testing through appropriate strategies for test generation and response compaction. The required test frequencies for HDF detection are selected such that hardware overhead and test time are minimized. Test response compaction handles the large number of unknowns (X-values) on long paths by storing intermediate MISR (Multiple-Input Signature Register) signatures in a small on-chip memory for later analysis using X-canceling transformations.
BIST for LOC and Launch-Off-Shift (LOS) delay testing [26-28, 36, 37] is an effective way to reduce the test data volume of at-speed delay testing. A combination of BIST and deterministic BIST can reduce test data volume for LOC/LOS delay testing significantly. However, fault coverage for LOC delay testing is usually very low. Additionally, the test power of BIST for LOC/LOS delay testing must be reduced.
Low-power BIST approaches to LOC/LOS delay testing were proposed by Omana et al. [26-28]. A novel scalable approach was proposed in [26, 28] to reduce the power droop during at-speed testing of sequential circuits with scan-based logic BIST using the LOC scheme. Two approaches were presented in [27] to reduce the power droop generated at capture cycles during at-speed testing of sequential circuits with scan-based logic BIST using the LOS scheme.
In this article, a new low-power test compression architecture is proposed for LOC transition fault testing. The major contributions are as follows. A new low-power test compression scheme is proposed to compress LOC delay tests; a new test application scheme and DFT architecture are proposed to implement the low-power test compression method; a new technique is proposed to select primitive polynomials and the number of extra variables, which together encode all tests for LOC transition fault testing; and a novel software-defined LFSR, called SLFSR, is proposed to implement the new low-power test compression approach. The SLFSR can establish an LFSR of any size with any primitive polynomial.
We propose a new DFT architecture and a new test compression scheme that select a primitive polynomial for each test; together, the selected primitive polynomials encode all tests. The proposed decompressor is also configurable in the number of external scan-in pins. The SLFSR implements the selected primitive polynomial by delivering an extra control vector into the SLFSR inserted early in the flow, and can thus establish an LFSR defined by any primitive polynomial.
A low-power deterministic BIST approach is proposed for LOC delay testing. We present a new method that effectively combines an efficient low-power test application scheme with test compression capability for LOC delay testing. Power consumption for delay testing is an even more difficult problem because of the much larger number of delay test patterns.
The new primitive polynomial selection method is combined with a new scheme to select the number of extra variables. Using the software-defined decompressor, each deterministic test can be encoded into a seed together with a sequence of extra variables injected into the LFSR. The DFT logic, including the LFSR, is inserted into the circuit before test pattern generation; the selected LFSR is then implemented by shifting in an extra control vector.
A new Phase Shifter (PS) is used for the new decompressor design. The ATPG scheme of Xiang et al. [49] is the baseline of the compression-aware test generation technique in this article; that ATPG method [49] did not consider test compression. Details of the new PS and the ATPG scheme are omitted for simplicity. The rest of this article is organized as follows. The flowchart of the proposed DFT technique is presented in Section 2. The new DFT architecture with the new LFSR for low-power test compression of LOC transition fault testing is presented in Section 3. The innovative SLFSR, along with a new procedure to select primitive polynomials and the number of extra variables injected into it, is presented in Section 4. The new low-power test compression scheme is presented in Section 5. Experimental results are shown in Section 6, and the article concludes in Section 7.

THE FLOWCHART OF THE PROPOSED DFT TECHNIQUE
Figure 1 presents the flowchart of the proposed low-power test application scheme. Our method first establishes the DFT architecture, as presented in Figure 2, for the proposed low-power test compression method. It includes the SLFSR, the scan tree construction [44] for LOC delay testing, the test response compactor, the gating logic insertion, and the new PS. After the DFT architecture has been established, our method generates a compact test for each selected target transition fault under the LOC delay testing mode [45]. After all deterministic tests have been generated, our method selects, for each test, a primitive polynomial and a minimum number of extra variables that configure the SLFSR to encode the test. The encoded seed is then applied to the circuit using the proposed low-power test application scheme.
Test data volume for LOC transition fault testing is far greater than that of single stuck-at testing, so an effective method for LOC transition fault test compression is important. We cannot directly use test compression techniques for stuck-at fault testing to compress test data for LOC delay fault testing. The most significant differences between test compression for stuck-at and LOC delay fault testing are as follows. First, stimulus test data for LOC delay testing are generated with a two-frame circuit model, whereas stuck-at tests are generated with a one-frame circuit model. Therefore, the correlation between any pair of scan flip-flops extends to the two-frame circuit, and test compression for LOC delay testing must differ from that for stuck-at tests so as not to introduce new untestable faults that are testable in the original circuit. Second, test response data compaction is different: the response data of any pair of scan flip-flops are also considered in the two-frame circuit model, whereas the test responses of two scan flip-flops for single stuck-at fault testing are considered in a one-frame circuit model.
As for sequential linear decompressors, such as the ones in the work of Kampmann et al. [18] and Rajski et al. [32], a DFT configuration tuned for single stuck-at tests can also be ineffective. The numbers of external scan-in pins and internal scan-in pins should be regulated, as otherwise some unexpected untestable faults may be introduced. In other words, a single decompressor configuration cannot effectively compress test data for both single stuck-at tests and LOC transition fault tests.
The Pseudo-Primary Inputs (PPIs) of two scan flip-flops are driven by the same test signal for all tests when the flip-flops are placed at the same level of the same scan tree. The sufficient conditions to merge two scan flip-flops for LOC transition fault testing and for single stuck-at fault testing are not the same: two scan flip-flops can be placed at the same level of the same scan tree for LOC delay testing if they do not have any common combinational successor in the two-frame circuit [44], whereas for single stuck-at testing they can be so placed if they do not have any common combinational successor in the combinational part of the circuit.
The principle for current test compression flows with scan chain designs is similar: the structural correlation among scan flip-flops at the same level of the scan chains impacts the performance of test compression. The correlation between scan flip-flops for single stuck-at tests shows up in a one-frame circuit, whereas the dependencies for LOC delay testing are reflected in the two-frame circuit model. This consideration should be reflected in the ratio of external scan-in pins to internal scan-in pins. In other words, compressing LOC delay tests with the same scan configuration as for single stuck-at tests may introduce unnecessary new untestable faults, coverage loss, or test pattern expansion.
Two scan flip-flops f1 and f2 can be placed at the same level of the same scan tree, i.e., into the same scan flip-flop group, for single stuck-at fault testing if their PPIs do not have any common combinational successor in the combinational part of the circuit. In other words, the test stimulus for two scan flip-flops can be compressed into a single bit for single stuck-at tests if they do not have any common combinational successor in the combinational part of the circuit.
Sufficient conditions for test compression in LOC delay fault testing can be stated as follows: the test stimulus for two scan flip-flops can be compressed into a single bit for LOC transition fault tests if they do not have any common combinational successor in the two-frame circuit. The sufficient condition to compact test responses is analogous: the test responses of two scan flip-flops can be compacted into a single bit for LOC transition fault tests if they do not have any common combinational predecessor in the two-frame circuit.
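The stimulus-compression condition above can be turned into a simple greedy grouping pass. The following sketch is illustrative only: the names `group_scan_flip_flops` and `successors` are ours, and the successor sets would come from a structural analysis of the two-frame circuit that is not shown here.

```python
def group_scan_flip_flops(flip_flops, successors):
    """Greedily group scan flip-flops whose stimuli may share a single bit.

    successors[f] is the set of combinational successors of flip-flop f in
    the two-frame circuit (assumed precomputed). A flip-flop joins a group
    only if its successor set is disjoint from those of all current members,
    i.e., no pair in the group has a common combinational successor.
    """
    groups = []  # list of (member list, union of members' successor sets)
    for f in flip_flops:
        for members, reach in groups:
            if successors[f].isdisjoint(reach):
                members.append(f)
                reach |= successors[f]
                break
        else:
            groups.append(([f], set(successors[f])))
    return [members for members, _ in groups]
```

A greedy first-fit pass like this does not guarantee a minimum number of groups, but it respects the sufficient condition for every pair it merges.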

THE DFT ARCHITECTURE
According to the features of LOC delay testing, one can use the same scan architecture and test response compactor for both single stuck-at fault and LOC delay fault testing. In other words, a test response compactor and a test data compressor designed for LOC delay testing can also be used for single stuck-at fault testing. However, the performance of test compression and test response compaction for single stuck-at tests cannot be tuned to the best level in this way.
On the other hand, a test response compactor or test compressor designed for single stuck-at fault testing cannot be directly applied to LOC delay fault testing. According to the work of Xiang et al. [44], it is not proper to use a scan architecture and test response compactor based on the features of single stuck-at tests to compress test data and compact test responses simultaneously for both single stuck-at tests and LOC delay testing, because this can introduce a large number of aliasing faults. It is strongly suggested that separate DFT architectures and test response compactors be used for single stuck-at fault testing and LOC delay testing. We consider only test compression for LOC transition fault testing in this article.
A new DFT architecture is presented in Figure 2 to implement the low-power test compression method for LOC transition fault testing. A gating technique is combined with the two-frame circuit scan forest architecture [44] to reduce scan shift power. The demultiplexers, represented by dmux in Figure 2, are used for the gating technique, which helps to increase the compression ratio and also significantly reduces the number of internal scan chains. The PS is directly connected to the SLFSR to improve the randomness of the test signals, which also benefits the encoding capability of the SLFSR. The SLFSR is proposed to encode all tests using the selected primitive polynomials.
First of all, the scan forest architecture produces multiple fanouts only at the start, driven by the outputs of the demultiplexers; after that, all scan flip-flops are organized like scan chains, so little extra layout or routing overhead is introduced. If the number of scan chains driven by the same demultiplexer output is large (e.g., 20), two stages of expansion can be used: the output drives 5 scan flip-flops, each of which is connected to 4 more scan flip-flops, and each of the resulting 20 scan flip-flops then drives a scan chain. This can significantly reduce the layout and routing overhead.
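One simple way to choose such a split is to factor the fanout into two roughly balanced stages. The balancing heuristic below is our own illustration, not the paper's procedure, and the function name is hypothetical.

```python
import math

def two_stage_fanout(n):
    """Split one demultiplexer output driving n scan chains into two
    expansion stages of roughly balanced fanout, so that no single node
    drives all n chains directly (reducing routing congestion).
    Returns (stage-1 fanout, stage-2 fanout per stage-1 node)."""
    # largest divisor of n that does not exceed sqrt(n)
    a = max(d for d in range(1, math.isqrt(n) + 1) if n % d == 0)
    return a, n // a
```

For the 20-chain example in the text this yields a 4-by-5 expansion, equivalent to the 5-by-4 split described above.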
In this work, the number of internal scan-in pins is usually kept very low. An effective primitive polynomial selection scheme is proposed in combination with the novel SLFSR. These factors weaken the role of the PS for randomness and encoding capability improvement, unlike conventional methods such as EDT [32]. We use our own techniques for all hardware related to the new DFT method, including the PS; details of the new PS are omitted for simplicity.
The proposed gating technique has a great impact on the proposed low-power test compression scheme. It improves the compression ratio but does not increase test application time, because increasing the fanout factor of the demultiplexers also decreases the scan depth.
This DFT architecture differs from the one in the work of Xiang et al. [48] as follows. First, the DFT architecture in [48] was proposed to reduce thermal emergencies and obtained only initial test compression results, whereas the new one combines the SLFSR and the scan forest for low-power test compression of LOC transition fault testing. Second, the gating logic in the new DFT architecture is quite different from that in [48]. Third, a novel SLFSR is proposed.
The new DFT architecture also differs from the one in another work by Xiang et al. [47] for low-power BIST. First, the gating logic is quite different from that in [47]; the demultiplexers and the gating technique provide a low-power test application scheme that does not increase test application cost. Second, the gating technique increases the test compression ratio, a feature the scheme in [47] does not have. Third, the SLFSR architecture is included in the DFT architecture as a configurable LFSR, which improves encoding capability. Fourth, the scan forest and test response compactor are constructed in the two-frame circuit model, whereas the scan forest in [47] is constructed in a single-frame circuit model.
As shown in Figure 2, the scan forest architecture [44] for LOC transition fault testing is used for the low-power test compression scheme. A pair of scan flip-flops is included in the same scan flip-flop group only if they do not have any common combinational successor in the two-frame circuit. This condition can be more restrictive than that for single stuck-at fault testing.
Each stage of the PS drives the input of a demultiplexer instead of a scan chain, and each demultiplexer drives dmux scan trees, where dmux is the fanout factor of the demultiplexers. Each scan tree [44] contains multiple scan chains and is constructed by structural analysis in the two-frame circuit. Scan flip-flops in the same group (f1, f2, ..., fg) are placed at the same level in the same scan tree. It is required that each pair of scan flip-flops in (f1, f2, ..., fg) not have any common combinational successor in the two-frame circuit.
All primary inputs are also included in the scan flip-flop grouping to simplify test encoding, although they are not directly connected to the PS. Two copies of extra scan flip-flops are inserted into the circuit for each PI, one in each frame of the circuit, as shown in Figure 3(a). An extra multiplexer is inserted for each PI to deliver different values to the PI in the launch and capture cycles, as shown in Figure 3. PIs in the first frame are grouped like regular scan flip-flops. The extra scan flip-flop attached to each PI in the second frame is connected to the scan trees and drives the PI via a buffer with one clock cycle of delay. The area overhead of the injected extra logic is trivial for practical designs.
Our method provides zero-aliasing test response compaction. As shown in Figure 2, all scan trees are driven by k separate clock signals, and all scan-out signals of the scan chains driven by the same clock signal are connected to the same XOR tree. Let fi,j be a scan flip-flop, and let (fi,1, fi,2, ..., fi,l) be the ith scan chain. Two scan chains (f1,1, f1,2, ..., f1,l) and (f2,1, f2,2, ..., f2,l) are compatible if f1,1 and f2,1, f1,2 and f2,2, ..., f1,l and f2,l, respectively, do not have any common combinational predecessor in the two-frame circuit. The scan-out signals of two compatible scan chains can be connected to the same XOR gate. A routing-aware technique like the one in the work of Xiang et al. [44] is adopted to reduce the connection overhead when establishing the scan forest for LOC transition fault testing; the connection overhead can be kept very low in this way.
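The compatibility test for two scan chains reduces to a position-by-position disjointness check. The sketch below is hypothetical (`chains_compatible` and `predecessors` are our names, and the predecessor sets would again come from two-frame structural analysis):

```python
def chains_compatible(chain1, chain2, predecessors):
    """Decide whether two equal-length scan chains may feed the same XOR gate.

    predecessors[f] is the set of combinational predecessors of flip-flop f
    in the two-frame circuit (assumed precomputed). The chains are compatible
    when the flip-flops at each corresponding level share no predecessor, so
    a single fault can never produce canceling errors at the XOR input.
    """
    return all(predecessors[a].isdisjoint(predecessors[b])
               for a, b in zip(chain1, chain2))
```

Because compatibility must hold at every level, a single shared predecessor at any position rules the pair out.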
The fanout factor dmux of the demultiplexers is selected for the gating technique, and it can impact test response compaction [44], the compression ratio, and test power. Let dmux be 16, and let the group size for the scan trees be 10; then each scan-in pin drives 160 scan chains. Much shorter scan chains can therefore be used [44] with the proposed low-power test compression DFT architecture.
Increasing dmux does not increase test application cost with the new low-power test compression scheme; on the contrary, it reduces the maximum depth of the scan trees, so the test application cost is less than that of the scan chain designed circuit. As shown in Figure 2, the size of the LFSR could become very large if it were set according to the maximum number of care bits, because a few vectors can have a large number of care bits, which would significantly increase the test data volume. This problem is solved by injecting a small number of extra variables into the LFSR or ring generator [32] instead of keeping a large seed for each vector.
There is still considerable room to reduce the total number of care bits of the tests generated for scan chain based designs. The scan tree architecture can significantly reduce both the total number of care bits and the maximum number of care bits of the tests; it has been found that the maximum number of care bits can be reduced tens of times for single stuck-at tests using the scan tree architecture compared with scan chain designs.
The size of the LFSR can be determined as follows. A primitive polynomial, together with a number of extra variables, is selected such that the LFSR established by the selected primitive polynomial and the injected extra variables can encode all deterministic tests generated by the compact test generator [45]. Tests generated by any other test generator can also be used.
A small number of extra pins are required to control the demultiplexers, as shown in Figure 2, to implement the gating technique for low-power BIST; all demultiplexers in Figure 2 can share the same extra pins, and these pins can be connected to an extra register to reduce the pin overhead. As shown in Figure 2, all scan chains in the same scan tree are selected into the same subset of scan chains, which are driven by the same clock signal. Some extra variables are injected, just like in EDT [32]. We propose a new procedure to select the size of the LFSR, the primitive polynomial, and the number of external scan-in pins (the number of extra variables injected into the SLFSR) so as to minimize the amount of deterministic test data. The SLFSR encodes all deterministic tests with a small number of primitive polynomials and only a few external scan-in pins.
A primitive polynomial is selected for each test as presented in Figure 1, where one or only a small number of primitive polynomials is enough. The LFSR with no extra variables is considered first. If the SLFSR-based pseudo-random test generator cannot encode the deterministic test, we consider the case when a single extra variable is injected; if the deterministic test still cannot be encoded, our method considers two extra variables, and so on, until the number of extra variables exceeds a given threshold. If the SLFSR-based decompressor still cannot encode the test, we consider the second primitive polynomial. Our method keeps a limited number of primitive polynomials for the decompressor and the compression-aware test generator; it is no more than 8 for all experimental results in this article, and in many cases a single primitive polynomial is enough to encode all tests. If the test cannot be encoded by any of the kept primitive polynomials, the test cube is aborted, and the target fault is put back into the unprocessed fault list.
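The polynomial and extra-variable search just described can be sketched as a small nested loop. The GF(2) solvability check itself (a linear-system solve over the seed and injected variables) is abstracted behind an `encodable` callback here; all names are ours.

```python
def select_encoding(test_cube, polynomials, max_extra, encodable):
    """Search for a (primitive polynomial, extra-variable count) pair that
    encodes a test cube, mirroring the procedure in the text: for each kept
    polynomial, try 0, 1, 2, ... extra variables up to max_extra; return
    the first working configuration, or None when the cube must be aborted
    and its target fault returned to the unprocessed fault list.

    encodable(poly, n_extra, cube) is an assumed GF(2) seed-solver callback.
    """
    for poly in polynomials:
        for n_extra in range(max_extra + 1):
            if encodable(poly, n_extra, test_cube):
                return poly, n_extra
    return None
```

Trying smaller extra-variable counts first keeps the compressed data per test minimal, since each injected variable costs one bit per shift cycle.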
Once a test has been found encodable, fault simulation is started for the fully specified test; this process continues until all faults have been handled. Note that the number of extra variables injected into the SLFSR is equal to the number of external scan-in pins. The CPU time to select a primitive polynomial with the chosen number of injected extra variables, and the CPU time to encode all deterministic tests, are trivial compared to the ATPG time. The number of external scan-in pins is changeable in the new method by simply reconfiguring it; in all experimental results of this article, it is no more than 2 for all circuits presented in Section 6.
The proposed DFT architecture can also be applied to single stuck-at tests. However, the scan forest and test response compactor must be revised according to the methods in TC2007 and the scan forest for LOC transition fault testing. Generally, the group size of the scan forest for stuck-at testing can be bigger than that for LOC transition fault testing, and it is also easier to construct the structure-based response compactor for single stuck-at fault testing.

THE SLFSR
The size of the LFSR and the primitive polynomial that establishes it are configurable, as stated previously. However, to conform to the general flow, ATPG is performed after all DFT logic has been inserted into the circuit under test. We must therefore propose a new LFSR, inserted before ATPG, that implements all LFSR configurations corresponding to the kept primitive polynomials. This is the most important motivation for the SLFSR. We call it a software-defined LFSR because the proposed SLFSR implements configurable LFSRs of various sizes and with different primitive polynomials simply by delivering separate control vectors.
The proposed SLFSR provides flexibility in both the size of the LFSR and the primitive polynomial that establishes it. The final architecture of the SLFSR is determined by the given extra vector; in other words, different control vectors establish different LFSRs. As shown in Figure 4(a), each stage of the SLFSR is connected to one input of a separate multiplexer whose other input is a constant 0. The size of the SLFSR is equal to the degree of the largest primitive polynomial.
The control inputs of the multiplexers are connected to different bits of the extra register, whose size is the same as the number of stages of the SLFSR. As shown in Figures 4(a) and 4(b), the output di of a multiplexer selects the stage input ai when ci is 1; when ci is set to 0, the output di selects the constant 0, and the primitive polynomial that establishes the LFSR has a zero coefficient for this stage. Therefore, the structure presented in Figure 4(a) can implement LFSRs with any primitive polynomial and of various sizes. Figure 4(b) presents the simplified version of the SLFSR, where the multiplexer and the constant 0 are replaced by a two-input AND gate: one of its inputs is the control variable ci, and the other is the stage variable ai. The coefficient of the primitive polynomial for the corresponding stage is nonzero when the corresponding bit ci of the control vector loaded into the extra register is 1; otherwise, a constant 0 is fed to the XOR tree and the variable ai has no impact on the feedback function. As shown in Figure 4(b), the n-stage SLFSR can implement any LFSR established by any m-degree primitive polynomial with m ≤ n. In any case, the proposed SLFSR is inserted into the circuit before test generation, together with all other DFT logic such as the PS, the scan forest, and the test response compactor. The control vectors delivered into the extra register R implement the configurations of the kept primitive polynomials that encode all deterministic tests. So far, we have found that all deterministic tests can be encoded by the SLFSR established with only a few primitive polynomials.
Let us consider the proposed SLFSR with 10 stages. Assume that the primitive polynomial is x^10 + x^3 + 1; the control vector 10000001001 is shifted into the extra shift register as shown in Figure 5(a). To configure the primitive polynomial x^10 + x^9 + x^8 + x^6 + x^4 + x^3 + 1, the control vector is set to 10011010111 as shown in Figure 5(b). The proposed SLFSR can also establish an LFSR of eight stages, x^8 + x^6 + x^5 + x^4 + 1, with the control vector 10001110100 as shown in Figure 5(c).
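The gated feedback of the AND-gate SLFSR can be emulated in a few lines. The sketch below is a behavioral model with our own bit-ordering convention, not the paper's exact figure; the period check confirms that a maximal-length polynomial such as x^10 + x^3 + 1 cycles through all 2^10 - 1 = 1023 nonzero states, while a non-primitive tap setting does not.

```python
def slfsr_step(state, taps, n):
    """One shift of an n-stage SLFSR. `taps` plays the role of the control
    vector held in the extra register: only stages whose control bit is 1
    reach the XOR feedback network (the AND-gate version of the SLFSR), so
    changing `taps` reconfigures the feedback polynomial in software."""
    feedback = bin(state & taps).count("1") & 1  # parity of the gated stages
    return ((state << 1) | feedback) & ((1 << n) - 1)

def lfsr_period(taps, n, seed=1):
    """Number of shifts until the state sequence returns to `seed`."""
    state, steps = slfsr_step(seed, taps, n), 1
    while state != seed:
        state = slfsr_step(state, taps, n)
        steps += 1
    return steps
```

In this convention the taps for x^10 + x^3 + 1 sit at stage indices 9 and 2, and any nonzero seed traverses the full maximal-length sequence.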
Figure 6(a) presents an alternative implementation of the SLFSR, where the AND gate is replaced by a NOR gate. Each stage is connected to a NOT gate whose output is connected to the NOR gate, as shown in Figure 6(a). To establish the LFSR presented in Figure 5(b), the control vector is the bitwise NOT of the previous one, as shown in Figure 6(b).
The proposed SLFSR can have a number of interesting applications: (1) pseudo-random pattern generation with the SLFSR, (2) pseudo-random pattern generation for LOC delay testing with the SLFSR, and (3) reseeding for LOC transition fault deterministic BIST. Our method selects primitive polynomials whose degree is no less than 20; this setting is flexible, and for a smaller circuit it can be set even smaller.
The reseeding technique stated previously uses different primitive polynomials to encode tests, which improves encodability. Implementing the reseeding technique is not complex: the corresponding extra vector is simply loaded into R, as shown in Figure 3.
An efficient procedure is used to generate primitive polynomials of any desired degree, and our method selects a small number of primitive polynomials of different sizes. As stated earlier, each test is attached to a primitive polynomial that encodes it. When no selected polynomial encodes a test cube, the cube is aborted and the target fault is put back into the unprocessed fault list; that fault can later be selected as the target fault again, be covered by fault simulation, or be selected as a secondary fault of another target fault.
The test data volume of a compressed test is the sum of the seed bits and the total bits of extra variables. It is better to use LFSRs of smaller sizes, which produce a smaller compressed test data volume. The size of the LFSR is no more than 32 for all experimental results in this article; the tool we used to generate primitive polynomials can handle polynomials only up to 128 stages.
When applying tests to the circuit, all tests are ordered according to their extra control vectors. Our method uses at most eight primitive polynomials; that is, at most eight extra vectors are required, each of which has the same number of bits as the size of the SLFSR. This extra data is trivial compared with the total test data volume.

LOW-POWER TEST COMPRESSION
Let us consider a sequential linear decompressor, such as EDT [32], when no gating technique is used. Let L, b_j, v, and S be the size of the LFSR, the total number of care bits for the scan flip-flops in the last j levels, the number of extra variables injected into the SLFSR per shift cycle (equal to the number of external scan-in pins), and the total number of care bits for the current test, respectively. Let V_j be the total number of injected extra variables after j shift cycles. The following conditions are necessary to encode a deterministic test [19]: (a) L + d·v ≥ S, and (b) b_j ≤ L + V_j for j ∈ {1, 2, . . ., d}.
Condition (a) states that the sum of the size of the LFSR and the total number of injected extra variables must be no less than the total number of care bits. For condition (b), all d inequalities must be satisfied for j ∈ {1, 2, . . ., d}, where d is the chain depth.
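These conditions are easy to check programmatically. The sketch below (a hypothetical helper, assuming v extra variables are injected on every shift cycle so that V_j = j·v) tests whether a care-bit profile satisfies conditions (a) and (b):

```python
def encodable(L, v, d, b):
    """Necessary conditions for encoding a deterministic test [19].
    L: LFSR size, v: extra variables injected per cycle, d: chain depth,
    b[j-1]: care bits for the scan flip-flops in the last j levels (b_j)."""
    S = b[d - 1]                      # b_d covers every care bit of the test
    if L + d * v < S:                 # condition (a)
        return False
    for j in range(1, d + 1):
        if b[j - 1] > L + j * v:      # condition (b) with V_j = j * v
            return False
    return True

print(encodable(8, 1, 4, [2, 4, 6, 8]))   # True: both conditions hold
print(encodable(2, 1, 3, [5, 6, 7]))      # False: 2 + 3*1 < 7 violates (a)
```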
The distribution of care bits can have an impact on the final test data volume. When many care bits fall on the scan flip-flops close to the scan-out signals, the size of the LFSR, the number of external scan-in pins, or the size of the SLFSR may have to be larger, which can make the test data volume very high. Therefore, it is important to propose a low-power test compression scheme that also significantly improves the compression ratio. As shown in Figure 2, the demultiplexers and gating logic are used not only for power reduction but also for improving encoding capability. The fanout factor of the demultiplexers (dmux) is set to the same value k for all scan-in pins; it has an impact on the encoding capability of the SLFSR established by the selected primitive polynomials.
A seed is loaded into the LFSR. Test data are loaded into the first subset of scan trees while extra variables are injected into the LFSR. The first group of scan trees is disabled after it has received its test data; simultaneously, the second group of scan trees is activated while all other scan trees remain disabled. Test data based on the resulting state of the LFSR are loaded into the second subset of scan trees as new values of the extra variables are injected into the LFSR; the following d scan shift cycles, which shift the deterministic test data into the second subset of scan trees, start from the current LFSR state. This process continues until all scan trees have received their test data.
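The group-at-a-time shift procedure can be sketched behaviourally. In this simplified model (illustrative only; scan trees are plain bit lists, and every tree in the active group receives the same decompressed bit, as in a scan forest), only one group of trees toggles during each shift phase:

```python
def shift_in_groups(groups, streams, depth):
    """groups[g]: list of scan trees (bit lists) driven by clock g;
    streams[g][cycle]: bit decompressed from the SLFSR for group g.
    Disabled groups are gated, so their contents never change."""
    for g, trees in enumerate(groups):   # activate one group at a time
        for cycle in range(depth):
            bit = streams[g][cycle]
            for tree in trees:           # scan forest: same bit to every tree
                tree.pop()               # bit nearest scan-out leaves the tree
                tree.insert(0, bit)      # decompressed bit enters at scan-in
    return groups

loaded = shift_in_groups([[[0, 0]], [[1, 1]]], [[1, 0], [0, 1]], 2)
print(loaded)  # [[[0, 1]], [[1, 0]]]
```

Because the gated groups hold their values, shift power is roughly divided by the number of groups relative to shifting all trees at once.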
The external scan-in pins can load constant 0 after enough extra variables have been injected. Our method activates all scan flip-flops in the launch cycle when applying the test data at the primary inputs, which is followed by a capture cycle.
Each clock signal for the gating logic drives a subset of scan trees. An effective seed encoding scheme is used to reduce the storage requirements of the tests. The new method reduces only shift power, not test power in the launch and capture cycles. Test power in the launch and capture cycles can be reduced by using a low-power test generator like that in the work of Wen et al. [42]; however, we prefer to reduce capture power with a new DFT scheme, and more details on low-capture-power design are omitted for simplicity. The scan outputs of all scan chains driven by the same clock signal are connected to the same XOR network to compact test responses during the low-power test application phase. The test responses of the previous test are shifted out in only a few clock cycles while the next test is shifted in.
The reasons test data can be compressed better with the proposed method are as follows. First, the number of care bits for a test of a scan-tree designed circuit can be far less than that of a scan-chain designed circuit. Second, the proposed low-power test application scheme avoids unnecessary extra-variable injection.
Test data volume for a test includes two portions: (1) the seed kept in the SLFSR and (2) the total number of injected extra variables. The total number of extra variables is T = v · d · l, where v, d, and l are the number of extra pins injected into the LFSR, the chain depth, and the number of initial rounds of data shifts that must inject extra variables, respectively.
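As a quick illustration (with hypothetical parameter values, not figures from the experiments), the compressed volume for one test is the seed length plus T:

```python
def compressed_bits_per_test(L, v, d, l):
    """Seed (L bits, the SLFSR state) plus T = v * d * l extra variables
    injected during the first l rounds of data shifts."""
    return L + v * d * l

print(compressed_bits_per_test(32, 2, 50, 3))  # 32 + 2*50*3 = 332 bits
```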
The compression ratio CR is defined as follows. As presented in Equation (1), the chain pattern count represents the test pattern count for the circuit designed with scan chains, and the compressed test bits stand for the test data volume under the new test compression scheme Dcompress. The metric CR differs from previous test compression metrics in that it accounts for the test pattern counts before and after test compression. Usually, the pattern count increases slightly for the scan-tree designed circuit, so CR reflects the practical performance of a test compression technique more faithfully.

EXPERIMENTAL RESULTS
The proposed method has been implemented and evaluated on a Dell Precision Tower 7910 workstation. A very small number of scan-in pins (SI) and external scan-in pins (usually no more than 2) was used, making the size of the PS very small; in other words, the area overhead can be reduced significantly. The compact test generator for LOC transition fault testing developed in our group [45] was used as the baseline ATPG tool to generate tests for all transition faults. Dynamic and static test compaction were used in the ATPG tool.
Table 1 shows the statistics of the benchmark circuits used in the experiments. We used the ISCAS89 circuit s38417 [5]; the largest ITC99 circuit b19 [12]; the IWLS2005 circuits wb_conmax, usb_funct, pci_bridge, des_perf, ethernet, and vga_lcd; and four open cores netcard, leon3mp, leon2, and leon3-avnet-3s1500 [1]. Another, larger circuit named circuit1, with more than 10M gates and more than 1M flip-flops, is used to evaluate the proposed method. The columns labeled FFs and gates show the numbers of flip-flops and gates in a circuit, respectively. The columns PIs and POs present the numbers of primary inputs and primary outputs, respectively. The column Org. (bits) shows the original test data volume of the multiple-scan designed circuits.
The scan forest architecture and the proposed DFT architecture make the scan chains much shorter, which also has a great impact on the test response compactor. Let chain design pattern count, FFs, and PIs represent the number of tests for the scan-chain designed circuit, the number of scan flip-flops, and the number of primary inputs, respectively. The test compression ratio (reduction times) can be estimated as follows. Test application cost for the multiple scan-chain designed circuits is presented in Equation (2), and test application cost for the proposed low-power test compression scheme Dcompress is evaluated by Equation (3). In Equations (2) and (3), D_1 and D_2 represent the scan chain depth for the conventional scan chain design and the proposed low-power test compression DFT architecture, respectively, and dmux stands for the fanout factor of the demultiplexers in the DFT architecture as shown in Figure 2. The proposed gating technique does not increase test application cost because the scan chain depth decreases proportionally as dmux increases. With the number of internal scan chains set to the same value as for Dcompress, TA_1 is usually far more than TA_2 when a scan forest is established, no matter how the fanout factor dmux of the demultiplexers in Figure 2 is set.
Table 2 shows the performance of the proposed low-power test compression scheme, Dcompress. The column Sequ. Linear presents the test compression performance of a conventional sequential-linear test compression method, which we implemented following an existing approach. The test sets for the previous method were generated with the same baseline ATPG tool developed in our group; the tests for Sequ. Linear were generated using a constrained dynamic test compaction technique.
The columns FC, vec, vec1, PER, CPU (s), and R_cpu present the fault coverage, the number of tests for the DFT circuit with the decompressor, the test pattern count of the original scan-designed circuit, the pattern expansion ratio for Sequ. Linear and Dcompress, the ATPG runtime (seconds) of the circuit with the test compression logic, and the CPU time increase ratio, respectively.
Dcompress uses a dynamic test compaction scheme without any constraint; therefore, the numbers of care bits of its tests vary widely. The constrained dynamic test compaction scheme of the Sequ. Linear method generates tests with closer numbers of care bits, and its final compressed test data volume is far more than that of Dcompress. The previous method requires separate ATPG runs for different scan configurations (the numbers of external scan pins and internal scan-in pins must be predetermined); the proposed Dcompress, however, provides configurable external scan-in pins. In other words, the number of external scan-in pins can differ for the same test set of a single circuit.
The average pattern expansion ratios PER for Sequ. Linear and Dcompress are 1.85 and 0.81, respectively. In other words, the size of the test set for Sequ. Linear almost doubles for all circuits, whereas the test set size for Dcompress-designed circuits decreases by about 20% on average. For the four largest open cores netcard, leon3mp, leon2, and leon3-avnet-3s1500, the new method generates very similar test set sizes.
The column CR in Table 3 presents the compression ratio (times) estimated by Equation (1). More than 670× test data compression is obtained for circuit netcard with Dcompress. For circuit leon3mp, increasing the number of internal scan-in pins does not necessarily improve the compression ratio, whereas for circuit leon2 it consistently does: according to the results in Table 3, the compression ratio reaches 844.42 when the number of internal scan-in pins is set to 135. The compression ratio for circuit leon3-avnet-3s1500 increases with the parameter SI/ext.; up to 916.32× compression is obtained for this largest open core. Dcompress obtains the lowest CR, 19.09×, for circuit s38417.
For the largest circuit, with more than 1M scan flip-flops and more than 10M gates, we ran the ATPG and DFT tools twice for two different scan configurations, which took more than half a month. Increasing the parameter SI/ext. does not increase the pattern count, while the compression ratio is significantly improved.
The scan flip-flop groups of the LOC delay testing scan forest of des are quite large (around 100), which makes the test data volume for the scan-forest designed circuit quite small. For circuit b19, our method still reaches up to 71.56× test data compression, while it derives up to 486.17× compression for circuit vga. In all cases, Dcompress obtains a far better test compression ratio than Sequ. Linear, even when the parameter SI/ext. for Sequ. Linear is far bigger than that for Dcompress. Pattern expansion exists for both methods; however, the pattern increase rate (times) for Sequ. Linear is greater than that of Dcompress for all circuits. Table 2 shows several circuits with a pattern expansion rate below 1 for Dcompress: usb, pci, ethernet, and netcard. The number of tests for the scan-forest designed circuit [44] increases only slightly compared with that of the scan-chain designed circuit. The average pattern increase rate for Sequ. Linear is 1.85, far more than the 0.81 of Dcompress.
The decompressor also increases the CPU time of the ATPG. The CPU time increase rates of both methods are presented in Table 2. Unlike the pattern increase rate, for several circuits Dcompress produces a greater CPU time increase rate than the previous method; however, the average CPU time increase rate of the previous method (2.40) is still greater than that of Dcompress (0.81).
The compressed data for each test include the seed and the total bits of the extra variables delivered to the SLFSR via the external scan-in pins. The previous methods used a ring generator to compress test data with injected external scan-in pins.
Table 4 presents a test data volume comparison with the Sequ. Linear test compression technique when the scan configurations are the same as those in Table 3. The test data volume for the four bigger open cores netcard, leon3mp, leon2, and leon3-avnet-3s1500 can be compressed to less than 10M with Dcompress, and the test data volume of Dcompress is about 44M for the largest circuit, circuit1. The previous method produces far more data, even though a far bigger SI/ext. is set for all circuits. The column R (times) in Table 4 shows the test data volume reduction of Dcompress relative to the previous sequential-linear test compression method.
Table 5 presents a performance comparison of the proposed low-power test compression approach for LOC transition fault testing and the existing Sequ. Linear method with respect to average power (mW), test application cost (cycles), and area overhead (AO, percentage). The column TAR presents test application cost reduction ratios (times) for Dcompress compared with Sequ. Linear. Up to 33.36× test power reduction can be obtained. Equation (4) presents the area overhead of the proposed method with respect to the original circuit with no testability features; the results on area overhead were obtained with the 65-nm TSMC cell library, and the area overhead of the proposed DFT architecture is quite low. The supply voltage is set to 1.5V, and the test application frequency is set to 200MHz.
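Equation (4) defines the area overhead as the relative area increase of the DFT-inserted design over the original circuit. A one-line check, with purely illustrative area values:

```python
def area_overhead(area_new, area_orig):
    """AO = (area_new - area_orig) / area_orig * 100 (Equation (4))."""
    return (area_new - area_orig) / area_orig * 100.0

# Hypothetical areas: a design that grows from 100 to 150 units
print(area_overhead(150.0, 100.0))  # 50.0 (% overhead)
```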
The area overhead AO of the proposed method is presented in the column Dcompress of Table 5, which also presents the performance of the proposed method on power reduction (in milliwatts); we consider only average power. The columns dmux, power (mW), and rate (times) show the fanout factor of the demultiplexers for the new method, the test power for both methods, and the test power reduction rate (times), respectively. The proposed low-power test compression scheme reduces test power by up to 33.4×. Note that the numbers of internal scan-in pins (SI) are set very low, to 100 and 20 for the larger benchmark circuits in Sequ. Linear and Dcompress, respectively. The columns TA_1 and TA_2 present the test application cost for Sequ. Linear and Dcompress, respectively; up to 43.2× test application cost reduction is obtained.

CONCLUSION
Test data volume for LOC delay testing is far more than that of stuck-at fault testing, which makes test compression for delay testing especially important and different from conventional approaches for stuck-at fault testing. A low-power test compression scheme called Dcompress for LOC transition fault testing was proposed by using a novel software-defined DFT architecture, a new low-power test application approach, and a new deterministic data encoding scheme.
The SLFSR was proposed to implement test data encoding by simply delivering an extra vector, providing LFSRs of different sizes configured with different primitive polynomials. The experimental results show that test data volume can be compressed up to 6298×, test power reduced up to 33.36×, and test application time reduced up to 43.2×, even though the baseline test data are well compacted by the test generator [45,47].

Fig. 1. The flowchart of design for low-power test compression, ATPG, and low-power test application.

Fig. 2. The general DFT architecture of low-power test compression for LOC transition fault testing.
(b). The selection signal for all multiplexers injected into the PIs is connected to the output of a two-input AND gate. One input is connected to the regular selection signal for all scan flip-flops via an inverter. The other input of the two-input AND gate is an extra pin e, which controls the PIs in the launch cycle when e = 1 and in the capture cycle when e = 0. All extra scan flip-flops attached to the

Fig. 3. Handling PIs to compress tests for LOC transition fault testing. (a) Two extra scan flip-flops for each PI. (b) The DFT architecture to implement (a).
, there are k clock signals. Each of the clock signals drives a scan tree that fans out from each demultiplexer. The proposed gating technique does not increase test application cost compared with the scan chain based designs; rather, it reduces test application time. The reasons are the scan forest, the general DFT architecture, and the new test application scheme, and the experimental results conform to this statement. There is no fault coverage loss introduced by the gating logic because all scan flip-flops capture test responses at the same time. The same values of the extra variables are injected into the LFSR for the different subsets of scan trees driven by the different clock signals. The number of bits of the compressed test data of a test is equal to the total number of bits of the injected extra variables, e · d′ with d′ ≤ d, where d is the maximum depth of the scan trees.
(close to the scan-out signals), the number of extra variables injected into the
ACM Transactions on Design Automation of Electronic Systems, Vol. 29, No. 1, Article 7. Pub. date: November 2023.

Table 1. Statistics of the Circuits

Table 2. Evaluation of the Test Sets of the Proposed Test Compression Scheme for LOC Transition Fault Testing

Table 3. Performance of the Proposed Test Compression Scheme for Test Sets of LOC Transition Fault Testing

Table 4. Test Data Volume Comparison of LOC Transition Fault Testing

Table 5. Test Power and Test Application Cost Reduction, and Area Overhead
The number of internal scan-in pins for Dcompress is far less for the last three circuits.
AO = (area of the new circuit − area of the original circuit) / (area of the original circuit) × 100%