ABSTRACT
Digital forensics faces several challenges in examining and analyzing data because of the increasing range of technologies at people's disposal. Investigators often have to process and analyze many systems manually (e.g. PC, laptop, smartphone) in a single case. Unfortunately, current tools such as FTK and EnCase have a limited ability to automate the search for evidence. As a result, a heavy burden is placed on the investigator to both find and analyze evidential artifacts in a heterogeneous environment. This paper proposes a clustering approach based on the Fuzzy C-Means (FCM) and K-means algorithms to identify evidential files and isolate unrelated files based on their metadata. A series of experiments using heterogeneous real-life forensic cases was conducted to evaluate the approach. Within each case, various categories of metadata were created based on file systems and applications. The results showed that clustering based on file-system metadata performed best, grouping the evidential artifacts within only five clusters. With small configurations of both FCM and K-means, those five clusters contained 100% of the evidential artifacts and less than 16% of the non-evidential artifacts across all cases, meaning that the investigator avoids analyzing 84% of the benign files. For application metadata, the proportion of evidence captured was more than 97%, but the proportion of benign files was also relatively high with small configurations. With a large configuration, however, the proportion of benign files fell below 10%. Successfully prioritizing large proportions of evidence while reducing the volume of benign files to be analyzed reduces both the time taken and the cognitive load on the investigator.
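To make the idea concrete, the following is a minimal sketch of Fuzzy C-Means over numeric metadata vectors, together with a helper that computes the two proportions reported above (share of evidential and of benign files falling inside the top clusters). It is not the authors' implementation: the function names, the fuzzifier `m=2.0`, the toy feature vectors, and the choice to rank clusters by their count of known evidential files (possible only when ground truth is available, as in an evaluation) are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
    """Plain FCM sketch. Returns (centers, U), where U has shape (n_files, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)           # memberships of each file sum to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every file to every cluster center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                   # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def top_cluster_coverage(labels, is_evidence, k=5):
    """Share of evidential and benign files inside the k clusters richest in evidence.

    Assumption: clusters are ranked using ground-truth evidence labels,
    which is only possible when evaluating against a solved case.
    """
    labels = np.asarray(labels)
    is_evidence = np.asarray(is_evidence, dtype=bool)
    clusters = np.unique(labels)
    ev_counts = np.array([is_evidence[labels == c].sum() for c in clusters])
    top = clusters[np.argsort(-ev_counts, kind="stable")[:k]]
    selected = np.isin(labels, top)
    ev_share = is_evidence[selected].sum() / max(is_evidence.sum(), 1)
    benign_share = (~is_evidence)[selected].sum() / max((~is_evidence).sum(), 1)
    return ev_share, benign_share

# Hypothetical, already-normalised metadata vectors (e.g. size, timestamps).
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, U = fuzzy_c_means(X, c=2)
hard_labels = U.argmax(axis=1)                  # harden fuzzy memberships
print(top_cluster_coverage(hard_labels, [1, 1, 1, 0, 0, 0], k=1))
```

A K-means variant would replace the soft membership update with a hard nearest-center assignment; the coverage helper applies unchanged to either set of labels.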