Overview of MERA: An Architecture to Perform Record Linkage in Music-Related Databases

  1. Fernández-Álvarez, Daniel 1
  2. Labra Gayo, José Emilio 1
  3. Gayo-Avello, Daniel 1
  4. Ordoñez de Pablos, Patricia 1
  1. 1 Universidad de Oviedo, Oviedo, Spain
     ROR https://ror.org/006gksa02

Book:
Semantic Web Science and Real-World Applications
  1. Miltiadis D. Lytras (ed.)
  2. Naif Aljohani (ed.)
  3. Ernesto Damiani (ed.)
  4. Kwok Tai Chui (ed.)

Publisher: IGI Global

ISSN: 2328-2762 2328-2754

ISBN: 9781522571865 9781522571872

Year of publication: 2019

Pages: 219-245

Type: Book chapter

DOI: 10.4018/978-1-5225-7186-5.CH009

Abstract

The proliferation of large databases with potentially repeated entities across the World Wide Web has driven widespread interest in methods for detecting duplicate entries. Because the data are heterogeneous, general-purpose approaches may perform poorly in scenarios with distinguishing features. In this paper, we analyze the particularities of music-related databases and describe the Musical Entities Reconciliation Architecture (MERA). MERA is an architecture for matching the entries of two sources, and it allows extra support sources to be used to improve the results. It makes use of semantic web technologies and can adapt the matching process to the nature of each field in each database. We have implemented a prototype of MERA and compared it with a well-known music-specialized search engine. Our prototype outperforms the selected baseline in terms of accuracy.
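
The idea of adapting the matching process to the nature of each field can be illustrated with a minimal sketch. This is not MERA's implementation, which the abstract does not detail. The field names, comparison functions, weights, and threshold below are all hypothetical, and plain string similarity from Python's standard library stands in for whatever techniques the prototype actually uses.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def text_similarity(a, b):
    """Similarity in [0, 1] for free-text fields such as titles or names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def exact_match(a, b):
    """Strict comparison for fields where near misses carry no meaning."""
    return 1.0 if normalize(a) == normalize(b) else 0.0


# Hypothetical per-field configuration: (comparison function, weight).
FIELD_CONFIG = {
    "title": (text_similarity, 0.5),
    "artist": (text_similarity, 0.4),
    "year": (exact_match, 0.1),
}


def record_score(rec_a, rec_b, config=FIELD_CONFIG):
    """Weighted average of per-field similarities over fields present in both records."""
    total, weight_sum = 0.0, 0.0
    for field, (compare, weight) in config.items():
        if field in rec_a and field in rec_b:
            total += weight * compare(rec_a[field], rec_b[field])
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def best_match(record, candidates, threshold=0.8):
    """Return (candidate, score) for the top candidate, or (None, score) if below threshold."""
    scored = [(record_score(record, c), c) for c in candidates]
    if not scored:
        return None, 0.0
    score, candidate = max(scored, key=lambda pair: pair[0])
    return (candidate, score) if score >= threshold else (None, score)
```

Calling `best_match(record, candidates)` with dictionaries that share the keys `title`, `artist`, and `year` returns the highest-scoring candidate together with its score. Changing `FIELD_CONFIG` is the only step needed to treat a field differently, which is the design point the sketch is meant to show.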
