MoiréVision: A Generalized Moiré-based Mechanism for 6-DoF Motion Sensing

Ultra-high precision motion sensing leveraging computer vision (CV) is a key technology in many high-precision AR/VR applications such as precise industrial manufacturing and image-guided surgery, yet conventional CV can be outperformed by moiré-based sensing mechanisms, thanks to the moiré pattern's high sensitivity to six-degrees-of-freedom (6-DoF) pose changes. Unfortunately, existing moiré-based solutions, in their infancy, cannot deal with the complicated curvilinear moiré patterns caused by various perspective angles. In this paper, we propose a generalized moiré-based mechanism, MoiréVision, towards practical adoption; it relies on high-frequency gratings as visual markers to help extract fine-grained feature points for ultra-high precision motion sensing. As the foundation of general moiré-based sensing, we propose a formulation to characterize the "uncontrolled" curvilinear moiré patterns that arise in practical scenarios. To deal with the problem of moiré feature interference in practice, we propose a Gabor-based algorithm to separate overlapped curvilinear moiré patterns from two dimensions. Furthermore, to extract fine-grained feature points for high-precision motion sensing, we propose a bending function-based model and a resolution-enhanced strategy to reconstruct the detailed texture of moiré markers and extract moiré feature points at the sub-pixel level. Extensive experimental results show that MoiréVision greatly enhances the usability and generalizability of moiré-based sensing systems in real-world applications.


INTRODUCTION
Ultra-high precision 6-DoF motion sensing has become a key technology in many advanced and sophisticated AR/VR applications [2], such as precise industrial manufacturing [10,37,49] and image-guided surgery [5,36,48]. For example, in fields like industrial production, accurately determining the position and orientation of objects with millimeter or even sub-millimeter precision can ensure optimal functionality, seamless integration, and enhanced operational safety [19,23]. In fields like image-guided surgery, where precision is paramount, highly accurate motion sensing ensures precise instrument tracking and operation, enabling surgeons to perform intricate procedures with confidence [43]. However, these high-precision scenarios are typically equipped with clear and uniform backgrounds to ensure precise operations [28,44]. Consequently, such scenes often fail to provide sufficient and reliable feature points for natural feature-based methods [4,13,16,17,21,34,35] to realize precise motion sensing. In contrast, visual marker-based methods [8,14,15,18,25,47] harness markers with distinctive patterns or codes, enabling them to offer stable and high-precision motion sensing for these scenarios.
Conventional CV-based approaches face a serious challenge in the accuracy of 6-DoF motion sensing, imposed by the crucial bottleneck of the finite resolution of cameras. Since the real-life 3-dimensional (3D) world (including the markers) has infinite resolution but that of a camera is always finite, many fine-grained features in the 3D world are lost due to quantization, as shown in Figure 1. These feature losses may significantly affect the detection of visual markers and hence degrade the motion detection accuracy. To improve upon conventional visual markers, methods based on moiré markers (hereafter MoiréCode) emerge as the preferred choice due to the moiré pattern's high sensitivity to 6-DoF pose changes and robustness to environmental interference, as shown in Figure 1. Specifically, the moiré pattern can magnify micro pose changes by 5.13∼11 times [27] compared with conventional marker-based methods. However, the moiré-based sensing mechanism for pose estimation is still in its infancy. Most existing works [7,39,40,46] can only be used for sensing either orientation or position, failing to achieve comprehensive 6-DoF sensing. Furthermore, the current methods capable of achieving 6-DoF motion sensing either require a meticulously customized dual-layer printed glass wafer [31], or are confined to interactions with digital screens [27]. This undoubtedly impedes the widespread adoption of the moiré-based sensing mechanism.
Realizing the vision of generalized moiré-based sensing faces two major challenges. The first challenge is that the moiré pattern becomes "uncontrolled" in practice. Existing moiré-based systems utilize linear moiré patterns with clear spectrum features, as illustrated in Figure 3. However, in general cases where there is a perspective angle between two grating planes in 3D space, as shown in Figure 2(a), the superposed moiré pattern becomes "uncontrolled" due to the uneven perspective variation in one of the layers, making it difficult to extract meaningful features. The second challenge is the mutual interference of moiré features between two dimensions in practical scenarios. In fact, a curvilinear moiré pattern in one dimension alone is sufficient to cause fuzzy features in the frequency domain, as the curvilinear pattern comprises frequency components from various angles. In the case of the 2D Color Filter Array (CFA) of a camera, curvilinear moiré patterns can be generated in both dimensions, leading to mutual interference of moiré features, as shown in Figure 2(b).
To tackle these challenges, we propose MoiréVision to realize fine-grained feature point extraction for 6-DoF motion sensing based on curvilinear moiré patterns in practical scenarios, as shown in Figure 1. We design a MoiréCode with high-frequency gratings as a visual marker to produce moiré patterns for fine-grained feature extraction. Different from the complicated dual-layer markers used in existing moiré-based systems [31,46], the lightweight single-layer MoiréCode with two-dimensional grating patterns can be easily printed on paper or any surface. To deal with the feature interference between two dimensions, we adopt the concept of "fingerprint separation" [9] to separate overlapped curvilinear moiré patterns from two dimensions. To accurately characterize the curvilinear moiré patterns captured by a camera from any perspective angle, we propose a general formulation that composites the profile and geometric layout of curvilinear gratings to depict their characteristics. With this general formulation, we build a bending function-based model to deduce the geometric layout of the captured MoiréCode, and then we perform super-resolution on it to derive the sub-pixel moiré feature points for fine-grained 6-DoF motion sensing. We make three key contributions in this paper.
• We are the first to address the limited usability and generalizability of moiré patterns in practical scenarios. We propose a novel formulation for curvilinear gratings, laying the foundation for developing general moiré-based sensing mechanisms.
• We propose a Gabor-based algorithm to separate overlapped curvilinear moiré patterns across two dimensions, along with a bending function-based model and a resolution-enhanced strategy to reconstruct the detailed texture of the MoiréCode and extract moiré feature points at the sub-pixel level.
• We implement a MoiréVision prototype and evaluate it extensively. The promising results show that MoiréVision greatly enhances the usability and generalizability of moiré-based sensing systems, enabling fine-grained moiré feature point extraction without being constrained by distance and angles.

In the following, we investigate the background and key issues of existing moiré-based systems in Section 2, which serves as the motivation for our subsequent design. We present a generalized formulation for curvilinear moiré patterns in Section 3. Accordingly, we elaborate upon our system design in Section 4. An in-depth performance evaluation is reported in Section 5, followed by a literature review in Section 6. Finally, we conclude our paper in Section 7.

BACKGROUND AND MOTIVATIONS
In this section, we lay the background for the moiré-based sensing mechanism and analyze the defects of existing methods, i.e., their inability to achieve 6-DoF motion sensing in practical scenarios. To address this critical issue, we investigate the problems posed by perspective angles in practical situations, including "uncontrolled" moiré patterns, their fuzzy features, and mutual interference across dimensions, thereby elucidating the key technical challenges that MoiréVision needs to address.

Primers for Moiré-based Sensing
Moiré-based sensing methods have recently been proposed for ultra-high-precision motion sensing due to their distinguished magnification effect for subtle changes. Compared with traditional marker-based methods, moiré-based systems prevail with superior position and angle estimation accuracy. Existing moiré-based methods for sensing the camera position and orientation can be classified into customized solutions [7,39,46] and CFA-based solutions [27].
The basic idea of the customized solutions [7,39,46] is to design a dual-layer apparatus to generate artificial moiré patterns, as shown in Figure 3(a). Due to a certain distance between the two periodic grating layers, when the camera's position changes, the projections of the two layers on the camera plane undergo a relative displacement, leading to changes in the moiré pattern's period and phase. However, these approaches require intricate dual-layer moiré markers to achieve moiré-based sensing, such as the combination of 3D-printed grids and an iPad [46], a customized lens array with printed stripes [39,40], or meticulously customized double-sided glass wafers [31]. The complexity of these moiré markers undeniably hampers the generalized adoption of moiré-based sensing.
To further alleviate the burden of designing moiré markers, subsequent work [27] proposes a CFA-based method, where the CFA in the camera is used as one of the two layers to produce the moiré pattern, as shown in Figure 3(b). In this way, variations in the camera's pose, i.e., changes in the CFA, can significantly affect the morphology of the moiré pattern, enabling estimation of the camera's 6-DoF pose. However, the existing CFA-based work [27] is only suitable for interaction with a large screen, and the perspective angle during interaction is constrained within 15°. Precisely because moiré patterns become uncontrolled and curvilinear under large perspective angles, existing methods cannot deal with these irregular moiré patterns to extract effective features. Therefore, it is necessary to study the impact of the perspective phenomenon on moiré patterns to improve the usability and generalizability of the moiré-based sensing mechanism.

Uncontrolled Moiré Pattern
A moiré pattern is generated by the superimposition of two grating layers with close spatial frequencies. For customized moiré patterns, the two layers with linear and periodic gratings are manually deployed in parallel, as shown in Figure 3, which causes the superimposed moiré patterns to be linear and periodic. To realize moiré-based sensing in a more lightweight and generalized manner, CFA-based methods directly leverage the CFA as one of the layers to generate moiré patterns. Consequently, there is a perspective angle between the CFA and the MoiréCode, as shown in Figure 2(a). In the camera's coordinate system, the CFA remains constant, while the projected MoiréCode layer undergoes deformation due to the perspective angle relative to the CFA plane. Even a slight distortion in the projected MoiréCode can lead to a pronounced distortion in the final superposed moiré pattern. Therefore, in practical interaction scenarios, the occurrence of curvilinear moiré patterns is common and needs to be handled.
To further explore the reasons for the moiré pattern's curvilinear deformation, we analyze the relationship among the frequency vectors of the moiré pattern, the CFA, and the projected MoiréCode. According to the principle of moiré pattern [6], the moiré pattern's frequency vector $\mathbf{f}_m$ can be derived as the difference between the CFA's frequency vector $\mathbf{f}_c$ and the projected MoiréCode's frequency vector $\mathbf{f}_p$, i.e.:

$$\mathbf{f}_m = \mathbf{f}_c - \mathbf{f}_p. \tag{1}$$

We showcase the frequency vectors of gratings in Figure 4.
Notably, the direction of a frequency vector is orthogonal to the stripes' inherent direction, indicating the propagation direction of the grating. Due to the uneven changes caused by perspective, the frequency vector $\mathbf{f}_p$ of the projected MoiréCode varies unevenly across the image, in both magnitude and direction. Accordingly, the frequency vector $\mathbf{f}_m$ of the moiré pattern also changes unevenly in magnitude and direction, leading to curvilinear patterns.
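To make Eq. (1) concrete, the following minimal Python sketch computes the moiré frequency vector, period, and direction for an illustrative pair of frequency vectors; all numeric values are assumptions chosen for illustration rather than measured device parameters.

```python
import numpy as np

# Eq. (1): the moire frequency vector is the difference between the CFA's
# and the projected MoireCode's frequency vectors (cycles per pixel).
f_cfa = np.array([0.50, 0.00])    # CFA grating: 2-pixel period, propagating along x
f_code = np.array([0.46, 0.04])   # projected MoireCode, slightly detuned by perspective

f_moire = f_cfa - f_code
period = 1.0 / np.linalg.norm(f_moire)                  # moire period in pixels
angle = np.degrees(np.arctan2(f_moire[1], f_moire[0]))  # propagation direction

print(f"moire frequency vector: {f_moire}")
print(f"moire period: {period:.1f} px, direction: {angle:.1f} deg")
# A small frequency difference yields a long-period, clearly visible moire
# pattern, which is why the two spatial frequencies must be close.
```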

Intra/Inter-Dimension Interference
The presence of perspective introduces two types of interference in extracting effective features from curvilinear moiré patterns: intra-dimension and inter-dimension interference. "Intra-dimension" refers to the impact of perspective angles on one-dimensional (1D) gratings, while "inter-dimension" refers to the interference between two dimensions. Here, we use a 1D grating to represent one set of stripes with the same propagation direction, and a 2D grating to represent two sets of stripes with mutually orthogonal propagation directions.

Fuzzy Features within Each Dimension.
For linear and periodic moiré patterns as shown in Figure 3, the features of moiré patterns, including the period and propagation direction, are stable across the entire image. Therefore, we can easily extract moiré features based on conventional image-domain processing or frequency-domain analysis. In contrast, for a curvilinear moiré pattern, the period and direction vary at each pixel. It is hard to derive such uneven moiré features via straightforward image-domain processing due to the high cost of calculating features for each pixel, and image-domain noise may significantly affect the feature extraction. Frequency-domain analysis also struggles to extract effective features: as shown in Figure 3(a) and Figure 3(b), the frequency vectors of regular moiré patterns present clear impulse points or clusters in their spectra, whereas the frequency vectors of a curvilinear moiré pattern are fuzzy and cluttered in its spectrum, as shown in Figure 4. Therefore, it is hard to extract effective features from curvilinear moiré patterns based on conventional image processing and spectrum analysis methods.

Mutual Interference across Dimensions.
In addition to the fuzzy features within each dimension, a worse problem is the overlapping across the two dimensions. As illustrated in Figure 2(a), even 1D gratings are sufficient to generate curvilinear moiré patterns with fuzzy features.
For CFA-based moiré patterns, however, each dimension of the CFA can produce a curvilinear moiré pattern, leading to 2D fuzzy features as shown in Figure 2(b). Different from linear periodic moiré patterns, whose two-dimensional features can be clearly separated in the spectrum, the overlapping of curvilinear moiré patterns further increases the difficulty of feature extraction. Therefore, in practical scenarios, to extract effective moiré features, we need to first precisely separate the curvilinear moiré patterns of the two dimensions, and then deal with the fuzzy features of each dimension.

Infeasibility of Deep Learning Model
An intuitive idea for realizing moiré-based motion sensing is to feed moiré patterns into deep learning models to extract finer-grained feature points or to directly output high-precision pose detection results. However, this idea is infeasible, primarily for the following reasons: 1) Moiré patterns may not arise in practical scenarios; detectable moiré patterns are only generated when the frequencies of the CFA and the projected MoiréCode match under limited angles and distances. Therefore, it is challenging to construct effective and comprehensive training datasets. 2) Moiré pattern features heavily depend on specific device parameters. When different cameras and MoiréCodes are employed, the moiré patterns corresponding to the same pose are totally different. Consequently, the model would face severe generalizability problems. 3) Most importantly, the correlation between moiré pattern features and precise poses is extremely weak. The features that truly reflect poses do not directly manifest in the surface features of moiré patterns, but rather reside within the captured MoiréCode. These effective features undergo significant nonlinear deformations when combined with CFAs, making it difficult to establish correlations with feature points or poses. Therefore, an in-depth investigation into the mechanics of curvilinear moiré patterns and the extraction of essential features is imperative.

MODELING CURVILINEAR GRATING
Based on the understanding of curvilinear moiré patterns in Section 2, we now propose a composite formulation for curvilinear gratings according to their profile and geometric layout functions.This formulation can be applied to pervasive moiré patterns that occur in practical moiré-based sensing applications.

Formulation of Curvilinear Grating
For curvilinear gratings, we use $g(x, y)$ to denote the intensity at location $(x, y)$. Any curvilinear grating $g(x, y)$ obtained by bending straight periodic gratings can be expressed as the following composite function [6]:

$$g(x, y) = p\big(\phi(x, y)\big), \tag{2}$$

where $p(t)$ represents the periodic profile, i.e., the transition behavior of intensity values across each of the curvilinear stripes, while $\phi = \phi(x, y)$ is the bending function that defines how the curvilinear stripes bend.
Typical periodic profiles $p(t)$ include cosine and square waves, which correspond to gradually-varied and sharply-varied intensity changes, respectively. Without loss of generality, we set the periodic profile as a cosine curve, i.e., $p(t) = \frac{1}{2} + \frac{1}{2}\cos(2\pi f t)$. We choose this efficient cosine profile rather than more complicated functions, e.g., a square wave, because the detailed transition behavior across stripes does not influence the primary layout of the grating or the superimposed moiré patterns. Besides, for convenience, we set the frequency $f$ to 1 to obtain the normalized periodic profile, so that the actual frequency of the grating is contained in its bending function $\phi(x, y)$ and any curvilinear grating can be written as:

$$g(x, y) = \frac{1}{2} + \frac{1}{2}\cos\big(2\pi\,\phi(x, y)\big). \tag{3}$$

We showcase a typical curvilinear grating expressed by a periodic profile $p(t)$ and a bending function $\phi(x, y)$ in Figure 5. We can observe that the bending function $\phi(x, y)$ depicts the curvature of the composite grating, while the periodic profile $p(t)$ shapes the bending-function surface into periodic gratings. In particular, areas with larger gradients on the surface result in higher grating frequencies when combined with the periodic profile. Therefore, for any curvilinear grating, we can deduce its bending function from the frequency distribution in the image.
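As a concrete illustration of Eqs. (2) and (3), the Python sketch below synthesizes a curvilinear grating from the normalized cosine profile and a hypothetical bending function; the quadratic warp and image size are assumptions chosen only to make the bending visible.

```python
import numpy as np

def cosine_profile(t):
    # Normalized periodic profile of Eq. (3): p(t) = 1/2 + 1/2*cos(2*pi*t)
    return 0.5 + 0.5 * np.cos(2 * np.pi * t)

h, w = 256, 256
y, x = np.mgrid[0:h, 0:w].astype(float)

# Hypothetical bending function: a base linear grating of frequency f0 plus
# a mild quadratic warp. Larger local gradients of phi yield higher local
# grating frequency, as noted above.
f0 = 0.05                                  # base frequency, cycles per pixel
phi = f0 * x + 2e-4 * (y - h / 2) ** 2
g = cosine_profile(phi)                    # g(x, y) = p(phi(x, y)), Eq. (2)
```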

Deriving Projected MoiréCode
Based on the above understanding, for curvilinear moiré patterns, we can characterize their features by calculating their bending functions. However, as shown in Figure 4, the features present in the captured moiré patterns lack intuitive and effective meaning for the camera's motion sensing. Instead, the information describing the camera's 6-DoF pose relative to the MoiréCode is embedded in the perspective distortion of the projected MoiréCode. Therefore, to realize camera-based motion sensing, we need to solve for the projected MoiréCode from the captured moiré patterns and then derive meaningful perspective information from the reconstructed MoiréCode.
The basic idea is that, as the moiré pattern is generated by the constant CFA layer and the projected MoiréCode layer, we can backward derive the representation of the projected MoiréCode as long as we can formalize the captured moiré pattern and the CFA. Specifically, we first formalize the two original curvilinear gratings, the CFA and the projected MoiréCode, as $g_c(x, y) = p_c(\phi_c(x, y))$ and $g_p(x, y) = p_p(\phi_p(x, y))$, respectively. Then, according to the moiré theory [6], the periodic profile $p_m(t)$ and bending function $\phi_m(x, y)$ of the superimposed moiré pattern $m(x, y)$ can be expressed as:

$$p_m(t) = p_c(t) * p_p(t), \quad \phi_m(x, y) = \phi_c(x, y) - \phi_p(x, y), \tag{4}$$

where $*$ represents the T-convolution [6]. Therefore, we can derive the bending function $\phi_p(x, y)$ of the projected MoiréCode as:

$$\phi_p(x, y) = \phi_c(x, y) - \phi_m(x, y). \tag{5}$$

Combining with the periodic profile $p_p(t) = p_c(t) = \frac{1}{2} + \frac{1}{2}\cos(2\pi t)$, we can finally obtain the formalization of the projected MoiréCode, i.e., the detailed texture of the MoiréCode that cannot be captured in original images due to limited resolution.

DESIGN OF MOIRÉVISION
This section elaborates the system design of MoiréVision with the following key components. 1) Moiré Feature Extraction: MoiréVision takes the captured MoiréCode images as input to extract the moiré area and the enhanced moiré pattern. 2) Mixed Feature Separation: Based on the extracted moiré pattern, a Gabor-based block filtering method is proposed to deal with the cross-dimension feature interference of curvilinear moiré patterns. 3) Reconstruction of MoiréCode: Based on the formulation of curvilinear gratings proposed in Section 3, a bending function-based method is designed to derive the detailed bending layout of the captured MoiréCode, which represents the stripe distribution. 4) Moiré Feature Point Extraction: A super-resolution strategy for the bending function is proposed to reconstruct the detailed texture of the captured MoiréCode, so as to extract fine-grained moiré feature points for precise motion sensing. Particularly, the shape and frequency of the MoiréCode can be adjusted to support applications with different demands. Specifically, the moiré pattern's frequency equals the frequency difference between the projected MoiréCode and the camera's CFA, while the MoiréCode's projected frequency is proportional to the distance and the MoiréCode's spatial frequency. Therefore, decreasing the MoiréCode's spatial frequency or increasing the camera's resolution are both viable ways to reduce the frequency of the moiré pattern so as to support farther working distances. Besides, if the application does not require specific orientation detection, the encoding of the MoiréCode can also be designed as concentric circles so that moiré patterns can be generated at any orientation to sense micro deformations [33].

Extraction of Moiré Area.
For each frame captured by the camera containing a MoiréCode, we first need to extract the region of moiré pattern (ROM), as shown in Figure 6. Specifically, we perform quadrangle detection to extract the region of the MoiréCode and record the coordinates of its four vertices. Based on the four vertices, we further extract an inner region within the quadrangle, forming a regular rectangle, which facilitates the subsequent processing and the calculation of the moiré pattern's bending function. For the inner region, we select the red channel of the image to obtain a clear skeleton of the moiré pattern, since the contrast of the green and blue channels is weak. The final bending function results calculated on the inner region can then be extended to the entire ROM area.
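A minimal OpenCV sketch of this step is shown below. The Otsu thresholding, the contour-based quadrangle detection, and the 15% inner margin are illustrative assumptions of ours, not the exact parameters of the system.

```python
import cv2
import numpy as np

def extract_rom(frame):
    """Sketch of ROM extraction: detect the MoireCode quadrangle, then
    return its four vertices and the red channel of an inner rectangle."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    for cnt in sorted(contours, key=cv2.contourArea, reverse=True):
        approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
        if len(approx) == 4:                    # quadrangle = candidate MoireCode region
            vertices = approx.reshape(4, 2)
            x, y, w, h = cv2.boundingRect(approx)
            # shrink to an inner axis-aligned rectangle for stable processing
            mx, my = int(0.15 * w), int(0.15 * h)
            inner = frame[y + my: y + h - my, x + mx: x + w - mx]
            return vertices, inner[:, :, 2]     # red channel (BGR order)
    return None, None
```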

Mixed Feature Separation
According to Section 2.3.2, each dimension of the CFA can produce a curvilinear moiré pattern containing individual perspective information. In terms of the 2D MoiréCode, each dimension of the MoiréCode forms a curvilinear moiré grating with one of the dimensions of the CFA, and these two gratings usually overlap in captured images. Therefore, we need to separate the two curvilinear patterns and extract effective features from each of them. For the case of a small perspective angle, the two gratings are mainly horizontal and vertical splines. We can easily separate them via 2D Fast Fourier Transform (FFT) analysis, since the frequency directions of these two gratings are nearly orthogonal to each other, as shown in Figure 3(b)'s spectrum. However, when perspective severely affects the moiré patterns, the two gratings may be severely warped. It is then infeasible to separate them by directly performing FFT analysis, since the two gratings' frequency directions in the spectrum are diffused and overlapped.
We observe that while the gratings in each dimension may be severely warped across the entire inner region, they are relatively stable within each small block. Hence, we adopt a local block-based approach involving frequency direction extraction, frequency direction classification, and the construction of local Gabor filters [11] for separation. Firstly, we partition the inner region image into $N \times N$ blocks, as shown in Figure 7. We illustrate the operation for each individual block in the first row of Figure 7, and demonstrate the workflow of the entire algorithm in the second row of Figure 7. For each block, we perform a 2D FFT and extract the two main frequency vectors from its spectrum, corresponding to the two dimensions of gratings, respectively. Based on the frequency vector of each dimension, including its magnitude and orientation, we can construct Gabor filters to filter features along specific orientations, resulting in the target grating for each dimension, as sketched below. Note that each block yields two frequency vectors, and filtering with these two vectors separately produces two individual gratings with different orientations. For each individual block, it is challenging to directly determine which dimension each of the two vectors belongs to. Here, we show the initial filtering results with random classification in Figure 7(b). We find that it is essential to consider the contextual information of the entire image to classify the two vectors with different orientations into their respective dimensions.
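The per-block operations can be sketched as follows: a 2D FFT to pick the two dominant frequency vectors of a block, and a Gabor kernel tuned to one vector's magnitude and orientation. The half-plane peak picking, suppression radius, and kernel parameters are illustrative assumptions.

```python
import numpy as np
import cv2

def block_frequency_vectors(block):
    """Return the two dominant frequency vectors of one block via 2D FFT,
    in cycles per pixel; a sketch using simple spectral peak picking."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(block - block.mean())))
    h, w = spec.shape
    cy, cx = h // 2, w // 2
    spec[:, :cx] = 0                            # keep one half-plane (conjugate symmetry)
    spec[cy - 2:cy + 3, cx - 2:cx + 3] = 0      # suppress residual low-frequency leakage
    peaks = []
    for _ in range(2):                          # two gratings -> two spectral peaks
        iy, ix = np.unravel_index(np.argmax(spec), spec.shape)
        peaks.append(((ix - cx) / w, (iy - cy) / h))
        spec[max(iy - 2, 0):iy + 3, max(ix - 2, 0):ix + 3] = 0
    return peaks

def gabor_filter_block(block, fvec):
    """Filter one block along the orientation/frequency given by fvec."""
    fx, fy = fvec
    mag = np.hypot(fx, fy)
    theta = np.arctan2(fy, fx)                  # normal to the stripes
    kern = cv2.getGaborKernel((31, 31), sigma=4.0, theta=theta,
                              lambd=1.0 / max(mag, 1e-6), gamma=1.0, psi=0)
    return cv2.filter2D(block.astype(np.float32), -1, kern)
```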
Specifically, we adopt a relaxation-labeling-based iterative approach [9,29,32] to classify the two vectors of each block. Relaxation labeling starts with an initial assignment of labels, i.e., the two dimensions, to the vectors and then updates the labels based on the compatibility between vectors and a predefined compatibility matrix [30]. In our relaxation labeling task, we use the similarity between adjacent vectors to build the compatibility matrix, i.e., the more similar the magnitude and orientation of the vectors, the higher the compatibility value. The algorithm aims to achieve label consistency among all the vectors by adjusting vectors' labels to be more in agreement with the labels of their neighbors and the compatibility matrix. This process continues until the labels reach a stable state, resulting in vectors being separated into distinct categories based on their similarity and agreement with each other. Based on the classification results, for each dimension, we create local Gabor filters [1] based on the magnitude and orientation of each block's frequency vector, resulting in the separated gratings of the two dimensions, as shown in Figure 7(d).
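The sketch below shows a simplified relaxation-labeling pass under our assumptions: per-block frequency vectors stored in an N × N × 2 × 2 array, absolute cosine similarity as the compatibility measure, and a 4-neighborhood; the actual compatibility matrix of [30] is richer than this.

```python
import numpy as np

def relax_labels(fvecs, iters=50):
    """Assign each block's two frequency vectors to the two grating
    dimensions. fvecs has shape (N, N, 2, 2): an N x N grid of blocks,
    each holding two (fx, fy) vectors."""
    n = fvecs.shape[0]
    labels = np.zeros((n, n), dtype=int)   # which vector plays "dimension 0"

    def sim(u, v):  # compatibility: similar orientation -> high value
        return abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

    for _ in range(iters):
        changed = False
        for i in range(n):
            for j in range(n):
                score = [0.0, 0.0]
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n:
                        ref = fvecs[a, b, labels[a, b]]   # neighbor's dim-0 vector
                        for k in (0, 1):
                            score[k] += sim(fvecs[i, j, k], ref)
                new = int(np.argmax(score))
                changed |= (new != labels[i, j])
                labels[i, j] = new
        if not changed:
            break                                          # stable labeling reached
    return labels
```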
Note that the number of grating periods within each block should be moderate. If the block size is too large, the gratings in each block may exhibit significant warping. Conversely, if the block size is too small, the number of grating periods contained in each block decreases, leading to increased error in frequency vector extraction. Therefore, we determine the block size based on the overall frequency of the entire image. The block size starts at 32 and increases in steps of 16; when the average frequency value over all blocks exceeds the frequency threshold of 5, we select the current block size.
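A direct transcription of this search might look like the following, reusing block_frequency_vectors from the sketch above; interpreting the "average frequency value" as grating periods per block edge is our assumption.

```python
import numpy as np

def choose_block_size(image, start=32, step=16, period_threshold=5.0):
    """Grow the block size until blocks contain enough grating periods
    on average; a sketch of the search described above."""
    size = start
    while size <= min(image.shape):
        periods = []
        for by in range(0, image.shape[0] - size + 1, size):
            for bx in range(0, image.shape[1] - size + 1, size):
                block = image[by:by + size, bx:bx + size]
                for fx, fy in block_frequency_vectors(block):  # from the sketch above
                    periods.append(np.hypot(fx, fy) * size)    # cycles per block edge
        if np.mean(periods) > period_threshold:
            return size
        size += step
    return size - step   # fall back to the largest size tried
```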

Reconstruction of MoiréCode
In this section, we aim to restore the fine-grained projected MoiréCode on the camera's CFA plane, i.e., to reconstruct the sub-pixel-level details of the captured MoiréCode in original images. This allows us to extract fine-grained feature points and thereby improve the precision of 6-DoF motion sensing. The feature points extracted after reconstruction through MoiréVision may not even exist in the original image, as the limited resolution makes it difficult to capture the fine details of the MoiréCode projected on the CFA.
Based on the formulation of curvilinear gratings and the relationship among the moiré pattern, the CFA, and the projected MoiréCode in Section 3, we can restore the captured MoiréCode by deriving its bending function from the moiré pattern and then recombining it with its periodic profile. Therefore, we separately utilize the moiré gratings of each dimension to compute the MoiréCode's corresponding bending function and the fine-grained gratings after combining the periodic profile, and then merge the results of the two dimensions.

Bending Function Calculation of Captured MoiréCode.
According to Eq. (5), as long as we can obtain the bending functions of the moiré pattern and the CFA, we can derive the bending function of the MoiréCode grating projected on the CFA plane, i.e., the captured MoiréCode in the image. The CFA's bending function $\phi_c(x, y)$ is easy to calculate based on the prior parameters of the CFA. Therefore, the key problem is to derive the moiré pattern's bending function $\phi_m(x, y)$.
To normalize the curvilinear grating, we define the intensity range of each curvilinear grating from 0 to 1. Besides, we select the cosine wave as the periodic profile for computational convenience. Therefore, the periodic profile $p_m(t)$ of the moiré pattern can be set as $\frac{1}{2} + \frac{1}{2}\cos(2\pi t)$. (According to Eq. (4), the theoretical result of the moiré pattern's periodic profile is $\frac{1}{4} + \frac{1}{8}\cos(2\pi t)$ when we set the periodic profiles of the CFA and projected MoiréCode both as $\frac{1}{2} + \frac{1}{2}\cos(2\pi t)$; here, we normalize the intensity of the moiré pattern from $[\frac{1}{8}, \frac{3}{8}]$ to $[0, 1]$ because higher intensity benefits moiré pattern extraction and does not affect subsequent feature point extraction.) The moiré pattern can thus be expressed as:

$$m(x, y) = \frac{1}{2} + \frac{1}{2}\cos\big(2\pi\,\phi_m(x, y)\big).$$

Re-examining this formula, we find that the bending function is actually part of the moiré pattern's phase $2\pi\,\phi_m(x, y)$. Therefore, the calculation of the bending function is converted into calculating the phase of the moiré pattern. Based on this understanding, we leverage the Hilbert transform [20] to calculate the overall 2D phase of the extracted moiré pattern. The Hilbert transform is a mathematical operation that transforms a real-valued signal into an analytic (complex-valued) signal, so we can derive the phase information from the analytic signal's phase angle at each pixel. Specifically, the moiré pattern can be regarded as an extension of a time-domain signal over a 2D image, where the amplitude of the time-domain signal at each moment corresponds to the intensity of each pixel. Accordingly, each column of the moiré image can be seen as a 1D time-domain signal. For each column $c(t)$, we perform the Hilbert transform to obtain an analytic signal $c_a(t)$ that contains the real signal and an imaginary signal with a phase difference of 90 degrees. To get the phase information, we calculate the instantaneous angles of the complex signal $c_a(t)$ to derive the phase for each column of the moiré image. Most importantly, the bending function is supposed to be a continuous curved surface, so we further perform an unwrap operation along the rows of the moiré image to eliminate wrinkles in the phase surface, as shown in Figure 8. According to Eq. (3), we can then obtain the final bending function of the moiré pattern by dividing the 2D phase by $2\pi$.
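With SciPy, the column-wise Hilbert transform and phase unwrapping can be sketched in a few lines; unwrapping along both axes is our simplification of the per-column phase computation plus row-wise unwrapping described above.

```python
import numpy as np
from scipy.signal import hilbert

def moire_bending_function(moire):
    """Recover phi_m(x, y) from a single-dimension moire image: treat each
    column as a 1D signal, take its analytic signal via the Hilbert
    transform, unwrap the phase, and divide by 2*pi (Eq. (3))."""
    analytic = hilbert(moire, axis=0)   # analytic signal, column by column
    phase = np.angle(analytic)          # instantaneous phase per pixel
    phase = np.unwrap(phase, axis=0)    # continuity within each column
    phase = np.unwrap(phase, axis=1)    # remove wrinkles across columns
    return phase / (2 * np.pi)
```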
Since the horizontal and vertical gratings of the CFA can be directly defined by the linear bending functions $\phi_{c,x}(x, y) = f_c x$ and $\phi_{c,y}(x, y) = f_c y$, where $f_c$ is the CFA's spatial frequency, we can finally obtain the bending function of the captured MoiréCode according to Eq. (5). Combining it with the periodic profile $p_p(t)$ of the projected MoiréCode, we can finally reconstruct the curvilinear gratings of the captured MoiréCode.
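A compact sketch of this derivation for the horizontal dimension is given below; it also includes the bilinear upsampling of the bending function, anticipating the resolution-enhancement strategy of Section 4.4. The CFA frequency f_c and the 8x upscale factor are assumptions.

```python
import numpy as np
import cv2

def reconstruct_moirecode(phi_m, f_c, upscale=8):
    """Sketch of Eq. (5) plus resolution enhancement for one dimension:
    derive the captured MoireCode's bending function from the moire
    pattern's, upsample it, and re-apply the cosine profile."""
    h, w = phi_m.shape
    _, x = np.mgrid[0:h, 0:w].astype(float)
    phi_c = f_c * x                   # CFA's linear bending function (horizontal)
    phi_p = phi_c - phi_m             # Eq. (5): phi_p = phi_c - phi_m

    # Super-resolve the smooth bending function, not the aliased grating itself.
    phi_hi = cv2.resize(phi_p.astype(np.float32), (w * upscale, h * upscale),
                        interpolation=cv2.INTER_LINEAR)
    return 0.5 + 0.5 * np.cos(2 * np.pi * phi_hi)   # fine-grained grating
```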

Moiré Feature Point Extraction
Resolution Enhancement Strategy.
However, due to the camera's limited resolution, i.e., the limited sampling frequency $f_c$ of the CFA, the MoiréCode reconstructed by combining the bending function $\phi_p(x, y)$ and periodic profile $p_p(t)$ still presents blurry, coarse gratings, as shown in Figure 9(b). Therefore, to obtain sub-pixel feature points of the captured MoiréCode, we need to enhance the resolution of the reconstructed grating. Obviously, it is in vain to directly perform super-resolution on the reconstructed MoiréCode grating, because such a low resolution is not enough to restore the trend of the grating profile. Nevertheless, we have defined a general formulation of curvilinear gratings in Eq. (3), so we only need to perform super-resolution on the calculated bending function $\phi_p(x, y)$ to obtain the fine-grained grating of the captured MoiréCode. As shown in Figure 9(c), we perform bi-linear upsampling on the bending function $\phi_p(x, y)$ of each dimension and apply the original periodic profile $p_p(t)$ to the resolution-enhanced bending function $\phi'_p(x, y)$ to derive the fine-grained grating of the captured MoiréCode. We zoom in on the reconstructed fine-grained captured MoiréCode in Figure 9(d). It is worth noting that Figure 9(d) is at a sub-pixel level, where the actual pixel-level coordinates correspond to one-eighth of the current coordinate values. Finally, we merge the two dimensions to obtain the 2D resolution-enhanced gratings of the captured MoiréCode in Figure 9(e), so that we can extract feature points at the sub-pixel level for motion sensing.

Fine-grained Feature Point Extraction.
Based on the reconstructed fine-grained MoiréCode, we are able to extract fine-grained feature points for ultra-high precision motion sensing. Specifically, we extend the resolution-enhanced bending function $\phi'_p(x, y)$ obtained from the inner area to the entire ROM in Figure 6 by fitting the 2D bending surface $\phi'_p(x, y)$. After combining it with the periodic profile $p_p(t)$, we obtain the fine-grained texture of the captured MoiréCode within the entire ROM. For the local area around the ROM's four vertices, we extract sub-pixel-level moiré feature points by identifying detailed vertices closest to the ROM's pixel-level vertices, according to the known contour and frequency parameters of the MoiréCode. We show the extracted moiré feature points at the sub-pixel level and traditional CV feature points at the pixel level in Figure 10 for comparison. We use semi-transparent red squares to display the traditional pixel-level CV feature points, while the green squares represent the fine-grained moiré feature points extracted by MoiréVision after reconstructing the captured MoiréCode in original images. It can be observed that MoiréVision can restore the detailed texture of the MoiréCode (which cannot be captured in original images due to the camera's limited resolution) and extract finer feature points at the sub-pixel level for high-precision motion sensing. Based on such fine-grained moiré feature points, extensive motion sensing applications can be realized. Notably, in this paper, we mainly focus on how to reconstruct the fine-grained projected MoiréCode and extract fine-grained moiré feature points that traditional CV methods cannot detect. Extensive applications for high-precision motion sensing achieved through the fine-grained MoiréCode and innovative moiré feature points are left for the extended development of the current proposal. Here, we take the camera's pose estimation as an example to demonstrate motion sensing based on MoiréVision. Specifically, we employ traditional Perspective-n-Point (PnP) algorithms [12,42] to calculate the camera's 6-DoF pose relative to the MoiréCode. We extract 3D feature points from the MoiréCode's corner points in 3D space, while the 2D feature points are exactly the moiré feature points extracted from the reconstructed fine-grained MoiréCode. By combining the calibrated camera intrinsic parameters, we can establish the mapping between the 2D and 3D feature points, thus enabling the camera's pose estimation relative to the MoiréCode. Without loss of generality, as the complexity of the MoiréCode contour increases, we can extract more moiré feature points, thereby further enhancing the accuracy of motion sensing.
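As a minimal illustration of this PnP step with OpenCV, the sketch below recovers the camera pose from four hypothetical sub-pixel moiré feature points; the corner coordinates, intrinsic matrix K, and zero distortion are placeholders rather than calibrated values.

```python
import numpy as np
import cv2

# 3D corner points of a 16 cm x 16 cm MoireCode in its own frame (meters).
object_pts = np.array([[0, 0, 0], [0.16, 0, 0],
                       [0.16, 0.16, 0], [0, 0.16, 0]], dtype=np.float64)
# Sub-pixel moire feature points in the image (placeholder values).
image_pts = np.array([[412.37, 310.52], [958.81, 305.18],
                      [963.44, 852.90], [407.95, 848.61]], dtype=np.float64)
K = np.array([[1500.0, 0, 960.0], [0, 1500.0, 540.0], [0, 0, 1]])  # assumed intrinsics
dist = np.zeros(5)                                                  # assume no distortion

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)   # camera rotation relative to the MoireCode
```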

Extending Working Space of MoiréVision.
According to Eq. (1), moiré patterns are visible only when the magnitude of the moiré frequency vector $\mathbf{f}_m$ is small. This requires that the magnitudes and directions of the CFA's and projected MoiréCode's frequency vectors $\mathbf{f}_c$ and $\mathbf{f}_p$ are both similar. Specifically, the distance between the camera and the MoiréCode needs to ensure that the projected MoiréCode's frequency is close to the CFA's frequency, and the rotation angle between the CFA plane and the MoiréCode plane cannot be too large. For instance, when the angle between the frequency vectors $\mathbf{f}_c$ and $\mathbf{f}_p$ is 90°, the magnitude of the generated $\mathbf{f}_m$, i.e., $\mathbf{f}_c - \mathbf{f}_p$, can even exceed the magnitudes of the original $\mathbf{f}_c$ and $\mathbf{f}_p$, resulting in invisible moiré patterns with ultra-high frequency. Therefore, in this subsection, we address the limitations in both distance and rotation angle of the camera around its optical axis.
In contrast to the approach in [27], we extend the thumbnail method from simple frequency value reconstruction to the reconstruction of the entire 2D curvilinear moiré pattern. The fundamental idea of the thumbnail method is to superimpose an artificial grating ($\mathbf{f}_a$) onto high-frequency moiré patterns ($\mathbf{f}_m$) to generate detectable low-frequency thumbnail moiré patterns ($\mathbf{f}_t$), i.e., $\mathbf{f}_t = \mathbf{f}_m - \mathbf{f}_a$. The frequency of $\mathbf{f}_a$ is determined by the thumbnail ratio $s$ and the number of pixels along the edge of the moiré image. Specifically, we downsample the original moiré pattern image to generate thumbnails and use the Tenengrad function with the Sobel operator [45] to evaluate the sharpness of the thumbnails to determine the thumbnail ratio $s$. For the curvilinear moiré pattern, the following relationship holds between its bending function $\phi_m(x, y)$ and the bending functions $\phi_t(x, y)$ and $\phi_a(x, y)$ of the thumbnail and artificial gratings:

$$\phi_t(x, y) = \phi_m(x, y) - \phi_a(x, y). \tag{6}$$

Based on the 2D feature separation method described in Section 4.2, we can separately solve for the bending function $\phi_t(x, y)$ of each dimension of the thumbnail moiré patterns. Then, according to Eq. (6), we can further deduce $\phi_m(x, y)$ from our artificially customized grating's bending function $\phi_a(x, y)$. In particular, the bending functions of the artificial grating for the two dimensions are $\phi_{a,x}(x, y) = f_a x$ and $\phi_{a,y}(x, y) = f_a y$, respectively. Based on the above method, we can reconstruct each dimension of the curvilinear moiré patterns, thereby enhancing the generalizability of MoiréVision in both distance and rotation angle. The remaining reconstruction of the projected MoiréCode and moiré feature point extraction are the same as in Section 4.3 and Section 4.4.
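The thumbnail-ratio selection can be sketched as follows: downsample the moiré image at candidate ratios and keep the sharpest thumbnail under the Tenengrad measure computed with the Sobel operator. The candidate ratio set and stride-based downsampling are our assumptions.

```python
import numpy as np
import cv2

def tenengrad(img):
    """Tenengrad sharpness: mean squared gradient magnitude via Sobel."""
    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
    return np.mean(gx ** 2 + gy ** 2)

def best_thumbnail(moire, ratios=(2, 3, 4, 5, 6, 8)):
    """Downsample the high-frequency moire image at candidate ratios and
    keep the sharpest thumbnail; a sketch of the ratio selection."""
    best = max(ratios, key=lambda s: tenengrad(moire[::s, ::s]))
    return best, moire[::best, ::best]
```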

PERFORMANCE EVALUATION

Implementation and Methodology
Implementation: We implement and evaluate MoiréVision in practical scenarios to show its performance in accuracy and generalizability. Figure 11 illustrates the experimental setup. Our system is built with cameras embedded in smartphones and a printed MoiréCode. Based on MoiréVision, we can extract fine-grained moiré feature points in captured images and realize ultra-high precision motion sensing. For data collection, we record five 15-second dynamic videos with a total of 2250 frames by changing the camera's position and orientation for each experimental setting.
Ground Truth: To obtain the ground truth of the MoiréCode's and camera's positions and orientations, we attach three and four motion capture markers on the camera and the MoiréCode, respectively, and leverage OptiTrack [3] to capture the precise coordinates of these markers, as shown in Figure 11. We build OptiTrack's coordinate system based on a fixed corner of a table and regard it as the world coordinate system. Based on the collected coordinates of the motion capture markers, we can derive the rotation matrix $\mathbf{R}_{mw}$ and translation vector $\mathbf{T}_{mw}$ from the MoiréCode to the world coordinate system, and the rotation matrix $\mathbf{R}_{cw}$ and translation vector $\mathbf{T}_{cw}$ from the camera to the world coordinate system. Then we can calculate the rotation matrix $\mathbf{R}_{cm}$ and translation vector $\mathbf{T}_{cm}$ from the camera's to the MoiréCode's coordinate system.
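A small sketch of this ground-truth computation is given below, assuming the common convention p_world = R · p_local + T; the subscripts follow the notation above (mw: MoiréCode to world, cw: camera to world, cm: camera to MoiréCode).

```python
import numpy as np

def relative_pose(R_mw, T_mw, R_cw, T_cw):
    """Camera-to-MoireCode pose from two world-frame poses.
    Derivation: p_m = R_mw^T (p_w - T_mw) and p_w = R_cw p_c + T_cw."""
    R_cm = R_mw.T @ R_cw              # compose rotations into the MoireCode frame
    T_cm = R_mw.T @ (T_cw - T_mw)     # translate, then rotate into the MoireCode frame
    return R_cm, T_cm
```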
Setup: 1) Comparisons: We compare MoiréVision with the state-of-the-art CV-based solution ArUco [26] by adding four ArUco markers at the four corners of the MoiréCode. 2) Distance: To demonstrate the generalizability of MoiréVision in practical scenarios, we evaluate the system at a distance range of 100∼400 cm. 3) Perspective angle: To demonstrate the generalizability of MoiréVision under various camera poses relative to the MoiréCode, we change the perspective angle from 0 to 60° and rotate the camera along its optical axis from 0 to 90°. 4) Camera type: To evaluate the impact of different CFAs, we select five smartphones with different camera parameters, including iPhone X, Xiaomi Mi 11, Xiaomi Mi Note 3, Samsung S20, and Sony Xperia. 5) Brightness: We evaluate the impact of different lighting conditions on MoiréVision by adjusting the light intensity from 88 to 1019 lux. 6) Moving speed: We evaluate the performance of MoiréVision in motion scenarios by setting the moving speed of the camera from 5 cm/s to 20 cm/s.
Metrics: To evaluate the performance of MoiréVision in motion sensing, we leverage the 6-DoF pose estimation results to assess the accuracy of moiré feature points. The metrics we use include rotation error and translation error. Rotation error refers to the difference between the rotation matrix $\mathbf{R}_{cm}$ calculated by MoiréVision and the ground truth collected by OptiTrack. We convert the rotation matrix to Euler angles to represent the detailed rotation errors along the three axes, and use the mean value of the Euler angles to express the overall rotation error. Translation error refers to the difference between the translation vector $\mathbf{T}_{cm}$ calculated by MoiréVision and the ground truth. The translation error describes the detailed errors along the $x$, $y$, and $z$ axes, and we also use the mean value over the three axes to describe the overall translation error.

Comparison with CV-based Method.
To show the performance of MoiréVision compared with conventional CV-based methods, we leverage the advanced marker-based toolkit ArUco to provide the baseline 6-DoF pose estimation results. In particular, we select the default camera of the Samsung Galaxy S20 and record videos for three kinds of MoiréCodes with different sizes and periods, as shown in Figure 11. We calculate the mean error for these MoiréCodes and show the detailed rotation and translation errors along the three axes in Figure 12. It is observed that MoiréVision achieves comparable and even better performance than the conventional CV-based method. The average rotation error is about 0.77° and the average translation error is about 5.17 mm. It is worth noting that the ArUco toolkit is able to leverage multiple feature points on ArUco markers to solve the camera pose, whereas MoiréVision only uses the basic feature points of four vertices for its calculation. To be sure, modifying the profile and pattern of the MoiréCode to extract more moiré feature points would help MoiréVision achieve even better performance. Besides, the rotation and translation errors along certain axes are obviously higher than along the other axes for both ArUco and MoiréVision, because both methods share the same calibrated intrinsic parameters of the camera. Therefore, camera calibration still has a significant impact on the results.

Distance.
To evaluate the working distance of MoiréVision, we use the default device, a Samsung Galaxy S20, and select two MoiréCodes with periods of 1.2 mm and 2.4 mm and side lengths of 16 cm and 28 cm, respectively. For these two MoiréCodes, we record videos at distance ranges of 1.0∼2.2 m and 2.8∼4.0 m, respectively. The corresponding results, including the overall translation error and rotation error, are reported in Figure 13. In particular, we mark the trends of the mean error with red and blue lines, respectively. It can be observed that the translation error increases with distance while the rotation error remains relatively stable. Furthermore, there is a difference in the interaction distance supported by these two MoiréCodes. This is because, for a fixed camera, as the period of the MoiréCode increases, a greater projection distance is required for the fixed CFA to match the lower frequency of the projected MoiréCode. As analyzed in Section 4.1.1, reducing the period of the MoiréCode or increasing the resolution of the camera can both increase the interaction distance. Therefore, we can determine the period of the MoiréCode and select the camera model based on specific applications' interaction demands. Additionally, we can employ the thumbnail-based method to extend the interaction distance with fixed devices.

Orientation.
To evaluate the generalization of MoiréVision, we conduct experiments on two types of orientation changes: perspective angles and the camera's rotation angles around its optical axis, both of which can lead to the disappearance of moiré patterns. The detailed detection results are shown in Figure 14. Specifically, we set the perspective angle from 0 to 60°, because an excessively large perspective angle, a common issue among all CV-based methods, leads to significant distortion in the aspect ratio of the MoiréCode, making it challenging to detect effective features from the MoiréCode. Besides, we set the camera's rotation angle around the optical axis, i.e., the roll angle of the camera, from 0 to 90°. This covers any possible rotation state between the two layers of the CFA and MoiréCode, since they both consist of 2D gratings with a direction difference of 90°. For the perspective angle, MoiréVision exhibits excellent performance in the range of 0∼50°, while the translation error significantly increases at 60°. This is because excessively large perspective angles result in only one dimension of moiré patterns being clear and detectable, leading to significant errors in 2D moiré feature point detection. Regarding the roll angle, as both the CFA and MoiréCode are 2D gratings, moiré patterns at roll angles of 0° and 90° are similar. From 30° to 60°, moiré patterns are hard to detect and require the thumbnail-based method to reconstruct, leading to increased translation errors. Nonetheless, MoiréVision still achieves relatively stable performance, demonstrating its usability and generalization in practical scenarios.

Light Intensity.
To evaluate the performance of MoiréVision under different lighting conditions, we vary the ambient brightness around the MoiréCode from 88 to 1468 lux. The corresponding experimental results are shown in Figure 15(a). We can observe that as the illumination intensity increases, the performance of MoiréVision continuously improves. However, when the scene is overexposed, such as at 1468 lux, the rotation error significantly increases. Overall, MoiréVision achieves relatively optimal performance when the illumination intensity in practical scenarios is around 400 to 1000 lux.

Movement Speed.
To assess the performance of MoiréVision in dynamic scenarios, we move the camera at three different speeds and record corresponding videos of the MoiréCode. The experimental results are reported in Figure 15(b). It is worth mentioning that our system is designed to amplify and perceive micro-motions in advanced AR/VR applications, such as precise industrial manufacturing and image-guided surgery. The micro-level operations involved in these target scenarios do not require high-speed and large-scale interactions. Based on the experimental results, we observe that at normal moving speeds, relative motions between the camera and target objects in our concerned scenarios do not significantly impact the performance of MoiréVision. Besides, the camera is able to capture clear moiré patterns in dynamic scenarios.

Device Type.
We select five different cameras and configure the light intensity to be dim (about 80 lux), moderate (about 500 lux), and bright (about 1000 lux), respectively. The specific rotation and translation errors are reported in Figure 16. Importantly, we ensure that the ISO values of the cameras remain consistent across the three light intensity settings, thus preventing adaptive adjustments of image brightness by the cameras. It can be observed that some devices, such as the Mi 11, Samsung S20, and Sony Xperia, are less affected by varying light conditions. In contrast, the performance of the Mi Note 3 notably declines as the light intensity decreases.

Case Study
In this section, we present the ultra-high precision of MoiréVision through a case study of micro vibration. Specifically, as illustrated in Figure 17, we attach a 5 cm × 5 cm MoiréCode onto a target vibrating object. The vibration is generated by a speaker, and the audio frequency played by the speaker is 13 Hz. To generate vibrations with different amplitudes, we adjust the speaker volume to 100%, 80%, and 60%. To obtain the ground truth for these micro vibrations, we utilize a laser vibrometer with a precision of 1 µm; the measured amplitudes of the three vibrations are 40, 32, and 17 µm, respectively. Meanwhile, we adhere four OptiTrack markers to the back of the object to capture the vibration. It is worth noting that such micro amplitudes cannot be captured by traditional visual methods when the camera is at a far distance from the MoiréCode, because the actual amplitude projected in the image spans less than one pixel width. However, the low-frequency characteristics of the moiré pattern enable it to magnify micro motion changes. As a result, in the videos captured in this experiment, the physical vibrations of the target object itself are difficult to discern, yet the jitter in the moiré patterns is significant.

Micro Vibration Detection.
We illustrate the detected vibration waveforms in Figure 18, including the vibration changes in translation and rotation in Figure 18(a) and Figure 18(c). Additionally, we perform the Fast Fourier Transform (FFT) on these two waveforms to observe their spectral distributions in Figure 18(b) and Figure 18(d). There is an obvious 13 Hz frequency component in the spectrum of the translation vibration, indicating that MoiréVision is capable of sensing micro vibrations at the micrometer level. In contrast, the spectrum of the rotation vibration cannot effectively reveal the actual vibration frequency. This is because the vibrations generated by the speaker are along the horizontal direction, exerting minimal influence on rotation.

Impact of Vibration Amplitude.
We further investigate the impact of the three vibration amplitudes on MoiréVision's motion sensing results. For the translation changes, we compare the results of position detection of the target object and vibration translation at the detected position, as shown in Figure 19(a). The results for posture detection and vibration rotation are reported in Figure 19(b). Notably, the detection ratio represents the ratio of MoiréVision's detection results to the ground truth, where data exceeding 100% indicates that MoiréVision's detected results exceed the ground truth. We can observe that for all three amplitudes, whether in position or posture, MoiréVision's detection results are nearly consistent with the ground truth, achieving accuracy over 99%. The detection results for vibration translation account for 68%, 78%, and 120% of the ground truth for the three amplitudes, respectively, while the detection results for vibration rotation even exceed the ground truth by more than double. This indicates that MoiréVision is limited in precisely reconstructing a micro vibration's amplitude and direction. Nonetheless, MoiréVision enables sensitive perception of micrometer-level vibrations that traditional visual methods cannot detect.
RELATED WORK
Traditional vision-based motion sensing: Vision-based methods for motion sensing mainly include natural feature-based methods [4,13,16,17,21,34,35] and visual marker-based methods [8,14,15,18,25,47]. Natural feature-based methods utilize inherent features in scenes, such as corners, edges, and textures, to estimate the movement of objects, instead of using predefined markers or codes. However, relying on such inherent scene features makes them susceptible to scene complexity, lighting conditions, and occlusions. Visual marker-based motion sensing involves the use of specific markers or patterns that are strategically attached to objects or deployed in the target environment. Although this technology is more accurate and robust than methods based on natural features, achieving ultra-high precision sensing of subtle motion still requires expensive high-resolution camera devices. Furthermore, the challenge of vision-based solutions lies in perceiving subtle motions, especially when the target is in front of distant markers. In contrast, MoiréVision leverages the capability of magnifying detailed texture features through moiré patterns, achieving both ultra-high precision and robustness.
Moiré-based solutions: Existing moiré-based methods for sensing the camera position and orientation can be classified into customized solutions [7,39,46] and CFA-based solutions [27]. The basic idea of the customized solutions [7,39,46] is to design a dual-layer apparatus to generate artificial moiré patterns, while CFA-based solutions [27] leverage the camera's CFA as one of the two layers to produce the moiré pattern. Xiao et al. [46] propose MoiréBoard to achieve 3-DoF position tracking based on moiré patterns produced by a customized device composed of 3D-printed grids and an LED display. However, the customized dual-layer device is not widely applicable to advanced industrial production or medical environments. Ning et al. [27] propose MoiréPose to achieve 6-DoF pose estimation for interactions with outdoor large LED screens. However, their approach is limited to specific camera-to-screen scenarios where the CFA plane is nearly parallel to the interactive screen, ensuring the effectiveness of the moiré pattern features only in limited scenarios. Qiu et al. [31] propose MoiréTag to achieve angular measurement and tracking based on moiré patterns produced by binary structures printed on both sides of a sophisticated glass wafer. Although this method can handle a perspective angle between the MoiréTag and the camera, it still requires a complex customized device. Different from the above works, MoiréVision greatly enhances the usability and generalizability of moiré-based sensing systems and is capable of sensing 6-DoF motion between the camera and markers with ultra-high precision.

CONCLUSION
In this paper, we propose a generalized moiré-based mechanism, MoiréVision, aiming to address the limited usability and generalizability of moiré patterns in practical scenarios. We propose a novel formulation to characterize curvilinear moiré patterns and a Gabor-filter-inspired algorithm to deal with the intra/inter-dimension interference of curvilinear moiré features. Moreover, we introduce a bending function-based model and a super-resolution strategy to extract fine-grained moiré feature points for high-precision motion sensing. We implement a MoiréVision prototype and evaluate it extensively with experiments and case studies. Specifically, MoiréVision achieves ultra-high precision motion sensing with a translation error of 5.17 mm and a rotation error of 0.77°, enabling fine-grained moiré feature point extraction without being constrained by distance and angles.

Figure 6: Moiré pattern extraction. (a) Original image. (b) Extraction of ROM. (c) Inner area.
Figure 10: Fine-grained moiré feature points. (a) Four vertices with blurry features at pixel level: coarse CV feature points at pixel level vs. precise moiré feature points at sub-pixel level. (b) Reconstructed MoiréCode and fine-grained moiré feature points.

Figure 13: (a) MoiréCode period: 1.2 mm. (b) MoiréCode period: 2.4 mm.

Figure 14: (a) Perspective. (b) Roll angle.

Figure 15: Impact of lightness and movement speed.
