Fei Chiang

Bibliometrics: publication history
Average citations per article: 9.18
Citation count: 156
Publication count: 17
Publication years: 2007-2017
Available for download: 10
Average downloads per article: 331.10
Downloads (cumulative): 3,311
Downloads (12 Months): 488
Downloads (6 Weeks): 73

18 results found

Result 1 – 18 of 18

1 published by ACM
April 2018 Journal of Data and Information Quality (JDIQ) - Challenge Paper, Experience Paper and Research Paper: Volume 9 Issue 4, May 2018
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 18,   Downloads (12 Months): 112,   Downloads (Overall): 112

Full text available: PDF
Data quality has become a pervasive challenge for organizations as they wrangle with large, heterogeneous datasets to extract value. Given the proliferation of sensitive and confidential information, it is crucial to consider data privacy concerns during the data cleaning process. For example, in medical database applications, varying levels of privacy ...
Keywords: Data quality, data cleaning, data privacy

2 published by ACM
November 2017 CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 12,   Downloads (12 Months): 90,   Downloads (Overall): 90

Full text available: PDF
Functional Dependencies (FDs) define attribute relationships based on syntactic equality, and, when used in data cleaning, they erroneously label syntactically different but semantically equivalent values as errors. We enhance dependency-based data cleaning with Ontology Functional Dependencies (OFDs), which express semantic attribute relationships such as synonyms and is-a hierarchies defined by ...
Keywords: data cleaning, dependency discovery, functional dependency, ontology functional dependency
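The abstract's core contrast, that an FD compares values by syntactic equality while an OFD can treat synonyms as equivalent, can be illustrated with a toy check. This is a minimal sketch under stated assumptions: the relation (country code, country name) and the synonym table are hypothetical, and each value is assumed to map to a single canonical concept. It is a simplification for illustration, not the paper's OFD semantics or its discovery algorithm.

```python
from collections import defaultdict

# Hypothetical toy relation: (country_code, country_name).
ROWS = [
    ("US", "United States"),
    ("US", "USA"),
    ("US", "America"),
    ("CA", "Canada"),
]

# Hypothetical synonym ontology: each surface form maps to one canonical concept.
SYNONYMS = {
    "United States": "United States",
    "USA": "United States",
    "America": "United States",
    "Canada": "Canada",
}


def fd_violations(rows):
    """Plain FD check (code -> name): flag codes whose names differ syntactically."""
    groups = defaultdict(set)
    for lhs, rhs in rows:
        groups[lhs].add(rhs)
    return sorted(lhs for lhs, values in groups.items() if len(values) > 1)


def synonym_ofd_violations(rows, synonyms):
    """Synonym-aware check: flag codes whose names map to more than one concept.

    Simplifying assumption: values missing from the ontology are compared as-is.
    """
    groups = defaultdict(set)
    for lhs, rhs in rows:
        groups[lhs].add(synonyms.get(rhs, rhs))
    return sorted(lhs for lhs, concepts in groups.items() if len(concepts) > 1)


print(fd_violations(ROWS))                     # ['US']
print(synonym_ofd_violations(ROWS, SYNONYMS))  # []
```

The plain FD reports "US" as a violation because "United States", "USA", and "America" are different strings; once they are mapped to one concept, the dependency holds, which is the kind of false error the abstract describes.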

3
November 2017 CASCON '17: Proceedings of the 27th Annual International Conference on Computer Science and Software Engineering
Publisher: IBM Corp.
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 3,   Downloads (12 Months): 11,   Downloads (Overall): 11

Deduplication is a costly and tedious task that involves identifying duplicate records in a dataset. High duplication rates lead to poor data quality, where data ambiguity occurs as to whether two records refer to the same entity. Existing deduplication techniques compare a set of attribute values, and verify whether given ...
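The abstract notes that existing deduplication techniques compare a set of attribute values. A minimal sketch of that generic baseline follows, assuming hypothetical records, Python's `difflib.SequenceMatcher` as the per-attribute similarity, a simple average across attributes, and an arbitrary 0.8 threshold. It is a common pairwise approach shown for illustration, not the technique proposed in this paper.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records; the first two plausibly describe the same person.
RECORDS = [
    {"name": "John Smith", "city": "Toronto"},
    {"name": "Jon Smith", "city": "Toronto"},
    {"name": "Mary Jones", "city": "Hamilton"},
]


def attribute_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(records, attributes, threshold=0.8):
    """Return index pairs whose average attribute similarity meets the threshold.

    Every pair of records is compared, so the cost grows quadratically with
    the number of records.
    """
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        scores = [attribute_similarity(r1[attr], r2[attr]) for attr in attributes]
        if sum(scores) / len(scores) >= threshold:
            pairs.append((i, j))
    return pairs


print(candidate_duplicates(RECORDS, ["name", "city"]))  # [(0, 1)]
```

The all-pairs comparison is one reason deduplication is expensive on large datasets; practical systems usually add blocking or indexing to avoid comparing every pair.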

4 published by ACM
October 2016 CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 5,   Downloads (12 Months): 55,   Downloads (Overall): 156

Full text available: PDF
Poor data quality has become a persistent challenge for organizations as data continues to grow in complexity and size. Existing data cleaning solutions focus on identifying repairs to the data to minimize either a cost function or the number of updates. These techniques, however, fail to consider underlying data privacy ...
Keywords: constraint based cleaning, data quality, information disclosure

5 published by ACM
August 2016 Journal of Data and Information Quality (JDIQ) - Research Paper, Challenge Papers and Experience Paper: Volume 7 Issue 3, September 2016
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 6,   Downloads (12 Months): 52,   Downloads (Overall): 223

Full text available: PDF
Integrity constraints play an important role in data design. However, in an operational database, they may not be enforced for many reasons. Hence, over time, data may become inconsistent with respect to the constraints. To manage this, several approaches have proposed techniques to repair the data by finding minimal or ...
Keywords: Data quality, data repair, constraint repair

6
January 2016 Transactions on Computational Collective Intelligence XXI - Volume 9630
Publisher: Springer-Verlag New York, Inc.
Bibliometrics:
Citation Count: 0

Online product search engines, such as Google and Yahoo Shopping, rely on having extensive and complete product information to return accurate and timely search results. Given the expanding scope of products and updates to existing products, automated techniques are needed to ensure the underlying product dictionaries remain current and complete. ...
Keywords: Dictionaries, Information extraction, Clustering

7
January 2016 Transactions on Computational Collective Intelligence XXI - Volume 9630
Publisher: Springer-Verlag
Bibliometrics:
Citation Count: 0

Online product search engines, such as Google and Yahoo Shopping, rely on having extensive and complete product information to return accurate and timely search results. Given the expanding scope of products and updates to existing products, automated techniques are needed to ensure the underlying product dictionaries remain current and complete. ...
Keywords: Clustering, Dictionaries, Information extraction

8
December 2015 Proceedings of the VLDB Endowment: Volume 9 Issue 4, December 2015
Publisher: VLDB Endowment
Bibliometrics:
Citation Count: 7
Downloads (6 Weeks): 9,   Downloads (12 Months): 52,   Downloads (Overall): 150

Full text available: PDF
Quantitative data cleaning relies on the use of statistical methods to identify and repair data quality problems, while logical data cleaning tackles the same problems using various forms of logical reasoning over declarative dependencies. Each of these approaches has its strengths: the logical approach is able to capture subtle data ...

9 published by ACM
November 2014 CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 6,   Downloads (12 Months): 11,   Downloads (Overall): 106

Full text available: PDF
We present CONDOR, a tool for managing constraints towards improved data quality. As increasing amounts of heterogeneous data are being generated, integrity constraints are the primary tool for enforcing data integrity. It is essential that an accurate and up-to-date set of constraints exist to validate that the correct application semantics ...
Keywords: constraint repair, data cleaning, data repair, data quality

10
April 2012 ICDE '12: Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 1

An attribute dictionary is a set of attributes together with a set of common values of each attribute. Such dictionaries are valuable in understanding unstructured or loosely structured textual descriptions of entity collections, such as product catalogs. Dictionaries provide the supervised data for learning product or entity descriptions. In this ...

11 published by ACM
February 2012 iConference '12: Proceedings of the 2012 iConference
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 0,   Downloads (12 Months): 2,   Downloads (Overall): 80

Full text available: PDF
Shopping online has become a prolific activity as the number of online vendors and consumers continues to rise each year. In 2009, almost $15 billion in goods and services were ordered online by Canadians [1]. About 53% of these consumers 'window shop' by doing product research before actually making a ...

12
January 2012
Bibliometrics:
Citation Count: 0

Although integrity constraints are the primary means for enforcing data integrity, there are cases in which they are not defined or are not strictly enforced. This leads to inconsistencies in the data, causing poor data quality. In this thesis, we leverage the power of constraints to improve data quality. To ...

13
April 2011 ICDE '11: Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 12

Integrity constraints play an important role in data design. However, in an operational database, they may not be enforced for many reasons. Hence, over time, data may become inconsistent with respect to the constraints. To manage this, several approaches have proposed techniques to repair the data by finding minimal or ...

14
August 2009 Proceedings of the VLDB Endowment: Volume 2 Issue 1, August 2009
Publisher: VLDB Endowment
Bibliometrics:
Citation Count: 37
Downloads (6 Weeks): 0,   Downloads (12 Months): 28,   Downloads (Overall): 562

Full text available: PDF
The presence of duplicate records is a major data quality concern in large databases. To detect duplicates, entity resolution, also known as duplicate detection or record linkage, is used as a part of the data cleaning process to identify records that potentially refer to the same real-world entity. We present ...

15
August 2008 Proceedings of the VLDB Endowment: Volume 1 Issue 1, August 2008
Publisher: VLDB Endowment
Bibliometrics:
Citation Count: 56
Downloads (6 Weeks): 9,   Downloads (12 Months): 65,   Downloads (Overall): 1,115

Full text available: PDF
Dirty data is a serious problem for businesses, leading to incorrect decision making, inefficient daily operations, and ultimately wasted time and money. Dirty data often arises when domain constraints and business rules, meant to preserve data consistency and accuracy, are enforced incompletely or not at all in application code. ...

16 published by ACM
June 2008 SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 2,   Downloads (12 Months): 5,   Downloads (Overall): 351

Full text available: PDF
XML database systems are expected to handle increasingly complex queries over increasingly large and highly structured XML databases. An important problem that needs to be solved for these systems is how to choose the best set of indexes for a given workload. We have developed an XML Index Advisor that ...
Keywords: automatic physical database design, index advisor, xml databases

17
April 2008 ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 3

XML database systems are expected to handle increasingly complex queries over increasingly large and highly structured XML databases. An important problem that needs to be solved for these systems is how to choose the best set of indexes for a given workload. In this paper, we present an XML Index ...

18
September 2007 VLDB '07: Proceedings of the 33rd international conference on Very large data bases
Publisher: VLDB Endowment
Bibliometrics:
Citation Count: 26
Downloads (6 Weeks): 2,   Downloads (12 Months): 8,   Downloads (Overall): 341

Full text available: PDF
The popularity of blogs has been increasing dramatically over the last couple of years. As topics evolve in the blogosphere, keywords align together and form the heart of various stories. Intuitively we expect that in certain contexts, when there is a lot of discussion on a specific topic or event, ...



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.