Comparison of computational complexity of neural network models for detection of steganographic algorithms

The purpose of this paper is to compare the computational complexity of the steganalytic method previously developed and presented by the authors, which detects three steganographic algorithms (J-Uniward, nsF5, and UERD), against other solutions found during a literature survey. The article briefly describes each analyzed steganographic algorithm along with all compared solutions, gives their characteristic points, and lists the model specifications. Moreover, the metric used to compare computational complexity, namely the number of parameters in a given neural network, is presented. Since none of the compared articles provided this value, an algorithm is given by which it can be calculated for each analyzed model. The comparison of the results and the final conclusions are presented to explain the distinctions between the obtained outcomes and to identify the factors that could have affected such performance.


INTRODUCTION
The kinds of menaces that may be encountered on the Internet are constantly evolving, and one of them is image steganography. Steganography is a technique of hiding a particular message in an unencrypted medium in such a way that no changes in that medium can be noticed. This means that the content and even the existence of the message are both hidden. Furthermore, image steganography means that the hidden message does not affect the visual layer of the photo itself, which remains unaltered. It emerges as a potential threat because it may be used to leak sensitive data from, e.g., workplaces or websites. For this reason, this topic became our focus, and a series of articles [8, 9] emerged from it. In that work, a deep neural network model without splicing layers was proposed for detecting the existence of image steganography. One of the main assumptions of this solution is its potentially lower computational complexity compared to competing solutions. Thus, the main focus of this article is to analyse how the previously prepared neural network model compares in terms of performance to other models dedicated to the detection of similar threats.

STEGANOGRAPHIC METHODS DEDICATED FOR JPEG FILES
A couple of steganographic algorithms make it possible to hide information in photos without revealing that there is something extra in them. Three of the most popular algorithms were chosen for the study: J-Uniward, nsF5, and UERD. They are presented in the consecutive chapters.

J-Uniward
The first analyzed algorithm was J-Uniward [6], whose acronym expands to JPEG Universal Wavelet Relative Distortion. It is a version of the UNIWARD scheme adapted to JPEG compression, whose main idea is to model the steganographic distortion introduced by embedding additional information. The purpose of the encoder is to modify the input information in a way that minimizes distortions or hides them in areas that are more difficult to detect. To find smooth regions and clean edges, which are easy to detect, relative changes of coefficient values are calculated based on a directional wavelet filter-bank decomposition. In this way, the encoder identifies the places that are prone to detection, and thus this method is very effective at hiding data without drawing notice. This algorithm uses the Syndrome Trellis Coding (STC) [3] mechanism, which determines how the DCT coefficients should be changed. Its operation is described in the section on the nsF5 algorithm.

nsF5
The nsF5 [4] algorithm introduces additional information into an image by modifying the least significant bits of the non-zero AC coefficients of the DCT transform of a given JPEG image. The whole process of hiding information is based on so-called syndromes and their encoding. A syndrome is a particular matrix that acts as an encryption or decryption key. Having a message m consisting of p bits and n non-zero AC DCT coefficients available, we need to find a vector y satisfying the relation

D · y = m (mod 2),

where D is a binary matrix of dimension p × n – this is our syndrome. The data embedding algorithm must find a solution to this equation that requires no modification of the bits of zero-valued coefficients. The algorithm must also minimize the Hamming distance between the modified and unmodified least-significant-bit vectors. This is an example of simple syndrome coding. Still, there are also more sophisticated methods, such as the STC mentioned above, where a parity-check matrix is used instead of D, and the y vector is found as a path through a trellis built from that matrix.
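The simple syndrome-coding step described above can be illustrated with a toy sketch (the matrix, cover bits, and message below are hypothetical examples, not the paper's implementation): given the syndrome matrix D and a message m, the encoder searches for the vector y that satisfies D · y = m (mod 2) while flipping as few bits of the cover LSB vector x as possible.

```python
# Toy illustration of syndrome coding (hypothetical values, brute-force
# search -- only feasible for tiny examples, shown for clarity).
from itertools import combinations

def syndrome(D, y):
    """Compute D * y mod 2 for a binary matrix D and bit vector y."""
    return [sum(d * b for d, b in zip(row, y)) % 2 for row in D]

def embed(D, x, m):
    """Return the y closest to x (fewest flipped bits) whose syndrome is m."""
    n = len(x)
    for k in range(n + 1):                 # try 0 flips, then 1 flip, ...
        for idx in combinations(range(n), k):
            y = x[:]
            for i in idx:
                y[i] ^= 1                  # flip the selected LSBs
            if syndrome(D, y) == m:
                return y
    return None

# Hypothetical 2x4 syndrome matrix: embeds a 2-bit message in 4 cover bits.
D = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
x = [1, 1, 0, 1]   # LSBs of the non-zero AC DCT coefficients
m = [1, 1]         # message bits to hide
y = embed(D, x, m)
print(y)           # the recipient recovers m simply as D * y mod 2
```

Note that a single flip suffices here; real embedders such as STC achieve near-minimal distortion without the exhaustive search used in this sketch.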

UERD
The last analyzed algorithm is UERD [5], which stands for Uniform Embedding Revisited Distortion. Its goal is to reduce the probability of detecting steganographically hidden information by minimizing changes to the statistical parameters of the image. This is implemented by analyzing the DCT coefficients of individual modes as well as entire DCT blocks together with their neighbors. The next step is to assess whether a region can be considered noisy and whether embedding new information affects statistical features such as the histogram of the image. Areas that are statistically predictable and pose a risk of revealing the hidden information are marked as "wet." When embedding information, the use of DC modes or zero DCT coefficients is not excluded, on the grounds that their statistical profiles allow information to be safely hidden. The algorithm itself tries to spread out the statistical changes resulting from embedding new data. Here, as in the previous two algorithms, the STC mechanism is used to hide the information.

RELATED STEGANALYTIC METHODS FOR JPEG STEGANOGRAPHY
Five different neural network models were selected for the study. Each of them performs the same task, the detection of steganographic algorithms, using a different model. Most of them focus on J-Uniward detection, but some also address other methods of hiding data. Each of the discussed articles, as well as the author's model, is based on research conducted on the BossBase [1] collection, which contains ten thousand black-and-white photos of 512x512 pixels each. Depending on the model, the images were in some cases downscaled by half, as will be shown in the following chapters. The following subsections briefly discuss each of these solutions.

OneHot model
In article [13], the authors prepared a neural network model consisting of three convolution layers. The solution consists of two Conv-Normalization-Dense sections; the first section is hybrid, with two parallel Conv-Normalization paths combined into one resulting path. By using only two C-N-D sections, it is the least computationally complex solution of this type. The authors focused on the detection of the J-Uniward and nsF5 algorithms.

ResDet
Article [7] focuses on the detection of only the steganographic algorithm that is hardest to detect, namely J-Uniward. Here, the authors proposed three alternating sections throughout the model. The first consists of two Convolution-Normalization-Dense sequences, with a parallel Convolution-Normalization sequence whose result is merged in before the last Dense layer. The second section, in turn, consists of a single Convolution-Normalization-Dense sequence with a feedback connection attached at the end. This model is already more complicated than the first solution.

DenseNet
Paper [11] focuses, like the previous article, on J-Uniward. The authors proposed a model built from six Convolution-Normalization-Dense sequences. Two C-N-D sequences are used at the beginning, and at their end the data from the primary input are merged back in with an addition operation. The next two sections are implemented identically to the first step, again with an addition operation from the beginning of these sections. The last two C-N-D sections are joined directly, and their result is the final output of the whole model.

JPEG-Phase-Aware CNN
Article [2] proposes two network architectures: one can have an unlimited number of convolution layers for specific needs, while the other has six main convolution layers. To compare these networks fairly, the six-layer option was chosen. The selected version has six Convolution-Normalization-Dense sections. The first two use the TanH activation function, and the other four use ReLU on the Dense layer. In addition, the first two sections are followed by a reduction of the analyzed window of the image, and the following sections are followed by pooling. The last element of the whole model is a linear classifier based on Softmax. As can be noticed from this description, this is the most complicated model taken for comparison. The authors of this article also focused their research on the J-Uniward algorithm.

Solutions based on ImageNet
In the work [12], the authors used existing, pre-trained neural networks designed mainly for object detection in images. The tests presented in that paper were performed on all the steganographic algorithms addressed in our research, namely J-Uniward, nsF5, and UERD. The researchers mainly used neural networks from the ImageNet family, which are well documented on the Internet. For this case alone it was therefore not necessary to calculate the number of parameters; it was enough to take the values provided by their creators. For comparison purposes, the option with 55 million parameters was assumed. The next chapter describes our own neural network-based solution.

AUTHOR'S JPEG STEGANOGRAPHY DETECTION METHOD
The model obtained during the preceding research differs from the models presented in other articles. The literature mainly uses models based on the Convolution-Normalization-Dense scheme, meaning that a given model consists of several sections that include three layers: a convolution layer, then a normalization layer, and finally a perceptron layer, usually based on the ReLU activation function. In the author's solution, only the perceptron and normalization layers of this scheme were used; the convolution layer was replaced by feature extraction using the DCTR methodology. Figure 1 shows the concept of this solution. Next, a study was conducted to determine which neural network architecture would be most effective. The research showed that a network with five layers (three perceptron and two normalization layers) was the most efficient in terms of accuracy and performance. The research also showed that feature extraction using DCTR gave significantly better results than GFR and PHARM. A diagram of the neural network prepared in this way is shown in Figure 2. The next step was calculating the number of parameters for this network configuration; this information is needed later to compare the computational complexity with other models. The DCTR parameterization yielded 8,000 features at the input of the neural network, which was crucial when calculating the number of parameters. The neural network itself in this configuration had exactly 2,037,951 parameters. The following section describes the exact algorithm for calculating the number of parameters for the models from the literature.

METHODOLOGY OF THE COMPARISON
The model created during the work on article [9] was prepared for detecting steganographic algorithms such as J-Uniward, UERD, and nsF5. To compare the computational complexity of the models effectively, five different solutions that also focus on the above-mentioned algorithms were selected for comparison. Another aspect that had to be determined was the computational-complexity metric by which to compare the models with each other. Unfortunately, the analyzed articles did not reveal any information on computational complexity, so these values had to be calculated for each model based solely on the data and diagrams attached to the articles in question. Taking into account the information available in these papers, the number of parameters of a given neural network was chosen as the main factor determining the computational complexity. Unfortunately, not all the necessary information was included in the analyzed works, so some assumptions had to be made. The exact assumptions are presented in Chapter 4.

Method of calculating the number of parameters for a given model
As mentioned earlier, the authors of the methods presented in Section 3 did not include clear information on how many parameters their neural networks had. In order to calculate the number of parameters in each neural network, it was necessary to first divide it into its component layers and calculate the parameters iteratively, step by step, going from the beginning of the network to the end. At the very beginning, it was necessary to obtain information about the size of the image fed into the network. Once this information was obtained, it was possible to move on to calculating the number of parameters of the first layer. Depending on the layer type, there is a distinct formula for calculating this value. Each layer type is discussed in the following subsections, and the exact assumptions and calculations are presented in Section 7.

Calculating number of parameters for convolutional layer
The convolution layer was the most complicated layer for which to determine the number of parameters. To retrieve that value, it is necessary to have information about the filter used in a given convolution process, that is, its height and width and its bias offset, if any. The next pieces of needed information are the number of filters in the preceding layer and the number of filters in the current layer. Having all this information, the following formula can be used:

PC = (W × H × PL + 1) × CL

where:
PC – number of parameters for the convolution layer
W – width of the convolutional filter
H – height of the convolutional filter
S – stride of the convolutional filter (it affects the output size, not the parameter count)
PL – number of filters in the previous layer
CL – number of filters in the current layer
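Assuming the standard convention that each output filter consists of one W × H × PL kernel plus a single bias term, the formula can be sketched as follows (the filter sizes in the example are hypothetical):

```python
def conv_params(w, h, prev_filters, cur_filters):
    """Parameters of a convolutional layer: one (w x h x prev_filters)
    kernel plus one bias term per output filter. The stride changes the
    output size, not the parameter count."""
    return (w * h * prev_filters + 1) * cur_filters

# e.g. a hypothetical 5x5 convolution mapping 16 filters to 32:
print(conv_params(5, 5, 16, 32))  # (5*5*16 + 1) * 32 = 12832
```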

Calculating number of parameters for normalization layer
The normalization layer was the easiest layer for which to calculate the number of parameters. To do this, it is only necessary to know the number of filters in the previous layer. The formula for calculating the number of parameters is as follows:

PN = 2 × P

where:
PN – number of parameters for the normalization layer
P – number of filters in the previous layer
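This can be sketched in one line, assuming only the learnable scale and shift per filter are counted (note that frameworks which also store the moving mean and variance would report four values per filter instead of two):

```python
def norm_params(prev_filters):
    """Parameters of a normalization layer: one learnable scale and one
    learnable shift per incoming filter (moving statistics not counted)."""
    return 2 * prev_filters

print(norm_params(64))  # 128
```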

Calculating number of parameters for perceptron layer
The perceptron layer was relatively simple in terms of calculating the number of parameters. For this layer, it was enough to know the number of neurons in the given layer and the number of filters or neurons in the preceding layer. The formula for calculating the number of parameters is as follows:

PP = (P + 1) × C

where:
PP – number of parameters for the perceptron layer
P – number of filters in the previous layer
C – number of neurons in the current layer
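As a sketch, the formula counts one weight per input plus one bias for each neuron; the example applies it to the author's first dense layer, which takes the 8,000 DCTR features into a 250-neuron layer:

```python
def dense_params(prev, cur):
    """Parameters of a perceptron (dense) layer: one weight per input
    plus one bias, for each neuron of the current layer."""
    return (prev + 1) * cur

# The author's first dense layer: 8000 DCTR features into 250 neurons.
print(dense_params(8000, 250))  # (8000 + 1) * 250 = 2000250
```

As the example shows, this first fully connected layer already accounts for the bulk of the roughly two million parameters of the author's network.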

COMPARISON RESULTS
After analyzing the articles, it unfortunately became clear that some information needed to calculate the number of parameters on each layer was missing. None of the articles specified the number of neurons in the perceptron layers, so in order for the comparison to be meaningful, it was assumed that each model has 250 neurons there, identical to the number used in the author's model. In addition, the OneHot model did not provide the number of filters on the convolutional layer; after a literature analysis, the most common value turned out to be 32, and this was assumed in this particular situation. The size of the image fed into the neural network was also important information. The models based on OneHot and DenseNet operated on images of 256 by 256 pixels, while the other two operated on images of 512 by 512 pixels. Based on these assumptions and the information obtained at this stage, it was possible to calculate the number of parameters for each model with a high degree of accuracy. Table 1 shows the obtained results. The parameter values for the ImageNet family and the author's solution have also been added to the table. Because the author's model additionally requires feature extraction using the DCTR algorithm, which can be simplified to convolving the entire analyzed image with an 8x8 filter and then building a histogram from the result, this can be translated into the use of convolution and perceptron layers of a neural network. Applying the previously defined schemes for calculating parameters with an input photo of 512x512 pixels gives an additional 28,603 parameters, which are highlighted separately in the table.
As can be seen, the author's solution had the smallest number of parameters. It can also be seen that the number of parameters grows in proportion to the complexity of a given neural network, which suggests that the method for calculating the number of parameters is correct. This is further confirmed by the fact that the more convolutional layers a given neural network has, the more its number of parameters increases.
It is also worth relating the reported effectiveness of each model to the number of parameters used. All authors of the analyzed articles reported a detection error, calculated from the false-alarm and missed-detection probabilities, as proposed in [10].
For the OneHot solution, a detection error of 8-10 was obtained for J-Uniward and 3.5 for nsF5. ResDet scored 0.89-0.95, DenseNet scored 9.75, and the JPEG-Phase-Aware CNN model scored 8.70 for J-Uniward. Models from the ImageNet family scored 4-5.13 for J-Uniward, 5.03-6.08 for UERD, and from 3.14 to as high as 31.47 for nsF5. As can be seen from these results, more parameters do not guarantee better detection performance. The author's solution achieved accuracies of 83.1% for J-Uniward, 88.7% for nsF5, and 84.1% for UERD. Unfortunately, these are different measures of model performance, and it is difficult to compare them with each other directly.

CONCLUSIONS
The purpose of this work was to compare the complexity of various neural network models existing in the literature with the author's solution detailed in previous articles. The number of parameters of each network was chosen as the metric for comparing these models. After calculating the required values, it turned out that the author's solution was the least resource-intensive of all the analyzed solutions. Of course, this method has a disadvantage: the cost of DCTR feature extraction cannot be estimated with full accuracy, so some assumptions had to be made to compare the computational complexity of the methods at a certain level of abstraction. Besides, the feature extraction itself can take place much earlier, for example on the already pre-processed image still on the user's end device. Therefore, it cannot be compared directly to the other solutions, because it does not necessarily put a computational burden on the specific machine responsible for creating the model in question. The model creation itself can be done on more powerful machines, but the more complex the model is, the more resources are needed to use it on the end device. Such a device could be a cell phone, which still has fewer capabilities than specialized large computers. There is also the problem of excessive energy consumption, which is becoming an increasingly important factor in software development these days. This type of paradigm could be explored in further research on this topic.

Table 1 :
Amount of parameters for specific models