No abstract available.
Safely delegating data mining tasks
Data mining is playing an important role in decision making for business activities and governmental administration. Since many organizations or their divisions do not possess the in-house expertise and infrastructure for data mining, it is beneficial ...
Data mining methodological weaknesses and suggested fixes
Predictive accuracy claims should give explicit descriptions of the steps followed, with access to the code used. This allows referees and readers to check for common traps, and to repeat the same steps on other data. Feature selection and/or model ...
Accuracy estimation with clustered dataset
If the dataset available to machine learning results from cluster sampling (e.g. patients from a sample of hospital wards), the usual cross-validation error rate estimate can lead to biased and misleading results. An adapted cross-validation is ...
Towards automated record linkage
The field of Record Linkage is concerned with identifying records from one or more datasets which refer to the same underlying entities. Where entity-unique identifiers are not available and errors occur, the process is non-trivial. Many techniques ...
A comparative study of classification methods for microarray data analysis
In response to the rapid development of DNA Microarray technology, many classification methods have been used for Microarray classification. SVMs, decision trees, Bagging, Boosting and Random Forest are commonly used methods. In this paper, we conduct ...
Data mining in conceptualising active ageing
The concept of older adults contributing to society in a meaningful way has been termed 'active ageing'. We present applications of data mining techniques on the active ageing data collected via a survey of older Australians on a wide range of social and ...
Analysis of breast feeding data using data mining methods
The purpose of this study is to demonstrate the benefit of using common data mining techniques on survey data where statistical analysis is routinely applied. The statistical survey is commonly used to collect quantitative information about an item in a ...
Using a kernel-based approach to visualize integrated chronic fatigue syndrome datasets
We describe the use of a kernel-based approach using the Laplacian matrix to visualize an integrated Chronic Fatigue Syndrome dataset comprising symptom and fatigue questionnaire and patient classification data, complete blood evaluation data and ...
Analyzing harmonic monitoring data using data mining
Harmonic monitoring has become an important tool for harmonic management in distribution systems. A comprehensive harmonic monitoring program has been designed and implemented on a typical electrical MV distribution system in Australia. The monitoring ...
Discover knowledge from distribution maps using Bayesian networks
This paper applies a Bayesian network to model multi-criteria distribution maps and to discover knowledge contained in spatial data. The procedure consists of three steps: preprocessing map data, training the Bayesian network model using distribution ...
Data mining for lifetime prediction of metallic components
The ability to accurately predict the lifetime of building components is crucial to optimizing building design, material selection and scheduling of required maintenance. This paper discusses a number of possible data mining methods that can be applied ...
Integrated scoring for spelling error correction, abbreviation expansion and case restoration in dirty text
An increasing number of language and speech applications are gearing towards the use of texts from online sources as input. Despite this rise, little work can be found on integrated approaches for cleaning dirty texts from online ...
A study of local and global thresholding techniques in text categorization
Feature Filtering is an approach that is widely used for dimensionality reduction in text categorization. In this approach feature scoring methods are used to evaluate features leading to selection. Thresholding is then applied to select the highest ...
A characterization of WordNet features in Boolean models for text classification
Supervised text classification is the task of automatically assigning a category label to a previously unlabeled text document. We start with a collection of pre-labeled examples whose assigned categories are used to build a predictive model for each ...
Weighted kernel model for text categorization
The traditional bag-of-words model and the more recent word-sequence kernel are two well-known techniques in the field of text categorization. The bag-of-words representation neglects word order, which could result in less computation accuracy for some types of ...
Visualization of attractive and repulsive zones between variables
This paper presents a preprocessing step in mining association rules which uses tables to summarize synthetically the way variables interact by highlighting any zones which are attractive. Attractive zones are those which guarantee that potentially ...
On the optimal working set size in serial and parallel support vector machine learning with the decomposition algorithm
The support vector machine (SVM) is a well-established and accurate supervised learning method for the classification of data in various application fields. The statistical learning task - the so-called training - can be formulated as a quadratic ...
Marking time in sequence mining
Sequence mining is often conducted over static and temporal datasets as well as over collections of events (episodes). More recently, there has also been a focus on the mining of streaming data. However, while many sequences are associated with absolute ...
Discovering debtor patterns of Centrelink customers
Data mining is currently becoming an increasingly hot research field, but a large gap still remains between the research of data mining and its application in real-world business. As one of the largest data users in Australia, Centrelink has huge volume ...
What types of events provide the strongest evidence that the stock market is affected by company specific news?
The efficient market hypothesis states that an efficient market immediately incorporates all available information into the price of the traded entity. It is well established that the stock market is not an efficient market as it consists of numerous ...
Investigating the size and value effect in determining performance of Australian listed companies: a neural network approach
This paper explores the size and value effect in influencing performance of individual companies using backpropagation neural networks. According to existing theory, companies with small market capitalization and high book to market ratios have a ...
Extraction of flat and nested data records from web pages
This paper studies the problem of identification and extraction of flat and nested data records from a given web page. With the explosive growth of information sources available on the World Wide Web, it has become increasingly difficult to ...
Tracking the changes of dynamic web pages in the existence of URL rewriting
Crawlers in a knowledge management system need to collect and archive documents from websites, and also track the change status of these documents. However, the existence of URL rewriting mechanisms raises a page tracking problem since the URLs of a pair ...
A framework of combining Markov model with association rules for predicting web page accesses
The importance of predicting Web users' behaviour and their next movement has been recognised and discussed by many researchers lately. Association rules and Markov models are the most commonly used approaches for this type of prediction. Association ...
Modeling spread of ideas in online social networks
Internet based online social networks collectively facilitate the spread of ideas. Hence, to understand how social networks evolve as a function of time, it is critical to learn the relationship between the information dissemination pathways or flows ...


