Abstract
Understanding how construction environments and building characteristics affect the housing unit price is important for real estate investors. However, the complex feature-influence of construction environments and building characteristics is difficult to identify, and the heterogeneity of real estate is hard to alleviate. In this article, we propose a framework named Adaptive Weighted Finite Mixture Model that identifies the feature-influence while alleviating the ill effects of heterogeneity. With this framework, we can predict the housing unit price from the corresponding features. In addition, we discover that the feature-influence is reflected in the dissimilarity among similar cities. Specifically, we adaptively learn the weights of features to identify the feature-influence, and we model the estimation of the housing unit price with the feature-influence as a finite mixture model. We use the Principal Component Analysis algorithm to obtain a low-dimensional representation of housing features. This low-dimensional representation reduces the computational cost of model learning and avoids potentially inverting a singular matrix while learning the model parameters, which are estimated by the Expectation Maximization algorithm. To avoid a blind search for the number of latent groups used in the proposed framework, we use the pre-clustering result to guide the search range of the latent group number. We conduct extensive experiments on three real datasets from Shenyang, Changchun, and Harbin, three provincial capital cities with similar geography, economies, and cultures. The experimental results demonstrate the effectiveness of the proposed framework.
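The abstract does not state the model's equations, so the following is only a minimal sketch under explicit assumptions, not the authors' method. It assumes each latent group is a linear regression of unit price on PCA-reduced features with Gaussian noise, and fits the mixture by Expectation Maximization. The adaptive feature weights are not modeled, and the function names (`pca_reduce`, `fit_mixture_of_regressions`) and the synthetic data are hypothetical. The demo builds six nearly collinear features, for which the raw normal-equation matrix is close to singular; projecting onto two principal components first keeps each per-group least-squares system small and well conditioned.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto its top principal components (SVD of centered data)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


def fit_mixture_of_regressions(Z, y, n_groups, n_iter=200, seed=0):
    """EM for a mixture of linear regressions: y | group k ~ N([1, z] @ beta_k, sigma2_k).

    Hypothetical sketch. Returns per-group coefficients, noise variances,
    mixing weights, posterior group memberships, and the final log-likelihood.
    """
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    Z1 = np.column_stack([np.ones(n), Z])  # prepend an intercept column
    d = Z1.shape[1]
    resp = rng.dirichlet(np.ones(n_groups), size=n)  # random soft initial memberships
    betas = np.zeros((n_groups, d))
    sigma2 = np.ones(n_groups)
    pis = np.full(n_groups, 1.0 / n_groups)
    for _ in range(n_iter):
        # M-step: weighted least squares and noise variance for each latent group.
        for k in range(n_groups):
            w = resp[:, k]
            n_k = w.sum() + 1e-12
            # A tiny ridge term keeps the d x d system solvable if it is near-singular.
            A = Z1.T @ (w[:, None] * Z1) + 1e-8 * np.eye(d)
            betas[k] = np.linalg.solve(A, Z1.T @ (w * y))
            resid = y - Z1 @ betas[k]
            sigma2[k] = max(w @ resid**2 / n_k, 1e-8)
            pis[k] = max(n_k / n, 1e-12)
        # E-step: log of pi_k * N(y_i | mean_ik, sigma2_k), normalized per sample.
        log_p = np.empty((n, n_groups))
        for k in range(n_groups):
            resid = y - Z1 @ betas[k]
            log_p[:, k] = (np.log(pis[k]) - 0.5 * np.log(2 * np.pi * sigma2[k])
                           - 0.5 * resid**2 / sigma2[k])
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
    return betas, sigma2, pis, resp, float(log_norm.sum())


if __name__ == "__main__":
    # Synthetic stand-in for housing features: six features driven by two hidden
    # factors, so X.T @ X is nearly singular (effective rank about 2).
    rng = np.random.default_rng(1)
    factors = rng.normal(size=(400, 2))
    X = factors @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(400, 6))
    group = rng.integers(0, 2, size=400)  # hypothetical latent group labels
    signal = factors[:, 0] + 0.5 * factors[:, 1]
    y = np.where(group == 0, 3.0 + 2.0 * signal, -3.0 - 2.0 * signal)
    y = y + rng.normal(scale=0.1, size=400)
    Z = pca_reduce(X, n_components=2)  # 6 collinear features -> 2 components
    for K in (1, 2, 3):
        *_, loglik = fit_mixture_of_regressions(Z, y, n_groups=K)
        print(K, round(loglik, 1))
```

Here the number of latent groups `n_groups` is simply an input. Following the idea in the abstract, one would try only values in a range suggested by a pre-clustering step (for example, k-means) and compare the fits with a penalized criterion such as BIC rather than the raw log-likelihood, which tends to grow with the number of groups.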
Adaptive Weighted Finite Mixture Model: Identifying the Feature-Influence of Real Estate