My general research interest is in the realm of Empirical Software Engineering. In particular, I am experienced in deep learning, data analysis, data mining, data gathering, statistical modeling, machine learning, metric design, and natural language processing. I have extensive experience in gathering data using R and Python to build both predictive and explanatory statistical models, and I am an adept programmer in multiple languages.
Modern software development is increasingly collaborative. Open Source Software (OSS) projects are the bellwether; they support dynamic teams with tools for code sharing, communication, and issue tracking. The success of an OSS project relies on team communication. For example, in issue discussions, individuals rely on rhetoric to argue their position, but also maintain technical relevancy. Rhetoric and technical language are on opposite ends of a language complexity spectrum: the former is stylistically natural; the latter is terse and concise. Issue discussions embody this duality, as developers use rhetoric to describe technical issues. The style mix in any discussion can define group culture and affect performance; e.g., issue resolution times may be longer if discussion is imprecise. Using GitHub, we studied issue discussions to understand whether project-specific language differences exist, and to what extent users conform to a language norm. We built project-specific and overall GitHub language models to study the effect of perceived language complexity on multiple responses. We found that experienced users conform to project-specific language norms, popular individuals use overall GitHub language rather than project-specific language, and conformance to project-specific language norms reduces issue resolution times. We also provide a tool to calculate project-specific perceived language complexity.
Open Source Software (OSS) supports dynamic teams across a wide variety of social and technical backgrounds. OSS project success relies on crowd contributions: although a small number of developers are the primary contributors, minority contributors are also called on for tasks such as issue identification, documentation, and bug fixing. It is therefore important to know who can help, who can be trusted with important task-related duties, and why.
In this paper, we argue that @-mentions in GitHub issues and pull request discussions can be appropriately used as signals of trust. We built overall and project-specific predictive future trust models of @-mentions to capture the determinants of trust in each of two hundred projects, and to understand if and how those determinants differ between projects. We found that visibility, expertise, and productivity are associated with an increase in trust, while responsiveness is not, when controlling for confounds. We also found that even though project-specific differences exist in the trust models, the overall model can be used for cross-project prediction, indicating its GitHub-wide viability and utility.
Invited to present at the International Symposium on the Foundations of Software Engineering (FSE) 2016.
Open Source Software projects are communities in which people "learn the ropes" from each other. The social and technical activities of developers evolve together, and as developers link to each other they become organized in a network of changing socio-technical connections. Traces of those activities, or behaviors, are typically visible to all, in project repositories and through communication between developers. Thus, in principle, it may be possible to study those traces to tell which of the observable socio-technical behaviors of developers in these projects are responsible for the formation of persistent links between them. It may also be possible to tell the extent to which links participate in the spread of potential behavioral influences.
Since OSS projects change in both social and technical activity over time, static approaches that either ignore time or simplify it to a few slices are frequently inadequate for studying these networks. On the other hand, ad-hoc dynamic approaches are often only loosely supported by theory and can yield misleading findings. Here we adapt stochastic actor-oriented models from social network analysis. These models enable the study of the interplay between behavior, influence, and network architecture for dynamic networks in a statistically sound way.
We apply the stochastic actor-oriented models in case studies of two Apache Software Foundation projects, and study code ownership and developer productivity as behaviors. For those, we find evidence of significant social selection effects (homophily) in both projects, but in different directions. However, we find no evidence for the spread (social influence) of either code ownership or developer productivity behaviors through the networks.
Data and scripts used in this work can be found here.
Best Paper Nominee.
Programming is knowledge intensive. While it is well understood that programmers spend lots of time looking for information, with few exceptions, there is a significant lack of data on what information they seek, and why. Modern platforms, like Android, comprise complex APIs that often perplex programmers. We ask: which elements are confusing, and why? Increasingly, when programmers need answers, they turn to StackOverflow. This provides a novel opportunity. There are a vast number of applications for Android devices, which can be readily analyzed, and many traces of interactions on StackOverflow. These provide a complementary perspective on using and asking, and allow the two phenomena to be studied together. How does the market demand for the USE of an API drive the market for knowledge about it? Here, we analyze data from Android applications and StackOverflow together, to find out what it is that programmers want to know and why.
Graduate student researcher in the DECAL lab at UC Davis. Teaching assistant for multiple courses, including software engineering and introduction to object-oriented programming (C++).
Worked towards various solutions for cybersecurity-related problems. Published a paper accepted to ICNC 2018.
Wrote front-end code in ActionScript to interface with wireless devices using a proprietary packet structure.