Big Code: New Opportunities for Improving Software Construction

  1. Francisco Ortín 1
  2. Javier Escalada 1
  3. Oscar Rodríguez-Prieto 1
  1. 1 Universidad de Oviedo
    Oviedo, Spain

    ROR https://ror.org/006gksa02

Journal:
Journal of Software

ISSN: 1796-217X

Year of publication: 2016

Volume: 11

Issue: 11

Pages: 1083-1088

Type: Article

DOI: 10.17706/JSW.11.11.1083-1088 (open access)

Abstract

An emerging research topic called big code has recently appeared. Big code is based on the idea that open source code repositories can be used to create new kinds of programming tools and services to improve software reliability and construction. We discuss different fields of application of big code, and the key issues in implementing tools aimed at improving software construction following this approach. We describe the existing works that have already used this idea to build tools for vulnerability detection, software deobfuscation, automatic code completion for API usage, and efficient querying using detailed source-code information. Then, we propose different fields of application and the key issues found. We identify eight different fields where big code may be applied, and describe different examples for each field. We also detect seven different issues that must be tackled when creating tools based on the big code approach.

Funding information

This work has been funded by the European Union, through the European Regional Development Funds (ERDF); and the Principality of Asturias, through its Science, Technology and Innovation Plan (grant GRUPIN14-100).

References

  • Doll, B. (2013). 10 million repositories. GitHub. Retrieved from https://github.com/blog/1724-10-million-repositories.
  • Defense Advanced Research Projects Agency. (2014). MUSE envisions mining "big code" to improve software reliability and construction. Retrieved from http://www.darpa.mil/news-events/2014-03-06a
  • Raychev, V., Vechev, M., & Krause, A. (2015). Predicting program properties from "big code". Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (pp. 111-124).
  • Karaivanov, S., Raychev, V., & Vechev, M. (2014). Phrase-based statistical translation of programming languages. Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, Onward! (pp. 173-184). ACM, New York, NY, USA.
  • Yamaguchi, F., Golde, N., Arp, D., & Rieck, K. (2014). Modeling and discovering vulnerabilities with code property graphs. Proceedings of the 2014 IEEE Symposium on Security and Privacy, SP'14 (pp. 590-604). IEEE Computer Society, Washington, DC, USA.
  • Yamaguchi, F., Lottmann, M., & Rieck, K. (2012). Generalized vulnerability extrapolation using abstract syntax trees. Proceedings of the 28th Annual Computer Security Applications Conference, ACSAC'12 (pp. 359-368). ACM, New York, NY, USA.
  • Yamaguchi, F., Wressnegger, C., Gascon, H., & Rieck, K. (2013). Chucky: Exposing missing checks in source code for vulnerability discovery. Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS'13 (pp. 499-510). ACM, New York, NY, USA.
  • Urma, R., & Mycroft, A. (2015). Source-code queries with graph databases - with application to programming language usage and evolution. Science of Computer Programming, 97, 127-134.
  • Bloch, J. (2008). Effective Java (The Java Series) (2nd Edition). Prentice Hall PTR, Upper Saddle River, NJ, USA.
  • Raychev, V., Vechev, M., & Yahav, E. (2014). Code completion with statistical language models. Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI'14 (pp. 419-428). ACM, New York, NY, USA.
  • Escalada, J., & Ortin, F. (2014). An adaptable infrastructure to generate training datasets for decompilation issues. New Perspectives in Information Systems and Technologies, Springer International Publishing, pp. 85-94.
  • Takikawa, A., Feltey, D., Greenman, B., New, M. S., Vitek, J., & Felleisen, M. (2016). Is sound gradual typing dead? Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL'16 (pp. 456-468). ACM, New York, NY, USA.
  • CERT, Carnegie Mellon University. (2016). Java coding guidelines. Retrieved from https://www.securecoding.cert.org/confluence/display/java/Java+Coding+Guidelines.
  • Kite, Your program copilot. (2016). Retrieved from https://kite.com.