This paper presents a novel study exploring the capabilities of ChatGPT, a state-of-the-art LLM, in static analysis tasks, namely static bug detection and false-positive warning removal. In our evaluation, we focus on two typical and critical bug types targeted by static bug detection, i.e., Null Dereference and Resource Leak, as our subjects. We employ Infer, a well-established static analyzer, to aid the collection of these two bug types from 10 open-source projects; the resulting dataset contains 222 instances of Null Dereference bugs and 46 instances of Resource Leak bugs. Our study demonstrates that ChatGPT can achieve remarkable performance in both tasks. In static bug detection, ChatGPT achieves accuracy and precision of up to 68.37% and 63.76% for detecting Null Dereference bugs and 76.95% and 82.73% for detecting Resource Leak bugs, improving the precision of the current leading bug detector, Infer, by 12.86% and 43.13%, respectively. For false-positive warning removal, ChatGPT reaches a precision of up to 93.88% for Null Dereference bugs and 63.33% for Resource Leak bugs, surpassing existing state-of-the-art false-positive warning removal tools.
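To give a flavor of the detection task the paper evaluates, the sketch below asks ChatGPT whether a small Java snippet contains either of the two studied bug types. It is a minimal sketch assuming the OpenAI Python client; the embedded snippet and prompt wording are simplified illustrations, not the paper's actual prompt design or dataset.

```python
# Minimal sketch of LLM-based static bug detection, assuming the
# OpenAI Python client; the prompt below is a simplified stand-in,
# not the prompt design evaluated in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JAVA_SNIPPET = """
File f = getConfigFile();  // may return null
BufferedReader r = new BufferedReader(new FileReader(f));  // possible Null Dereference
String line = r.readLine();  // 'r' is never closed: possible Resource Leak
"""

prompt = (
    "Does the following Java code contain a Null Dereference or a "
    "Resource Leak bug? Answer YES or NO for each, with one sentence "
    "of justification.\n\n" + JAVA_SNIPPET
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```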
Journal
History-Driven Fuzzing for Deep Learning Libraries
Nima Shiri Harzevili, Mohammad Mahdi Mohajer, Moshi Wei, Hung Viet Pham, and Song Wang
ACM Transactions on Software Engineering and Methodology (TOSEM 2024), Aug 2024
Recently, many Deep Learning (DL) fuzzers have been proposed for API-level testing of DL libraries. However, they either perform unguided input generation (e.g., not considering the relationship between API arguments when generating inputs) or support only a limited set of corner-case test inputs. Furthermore, many developer APIs crucial for library development remain untested: unlike end-user APIs, they are typically not well documented and lack clear usage guidelines, making them a more challenging target for automated testing. To fill this gap, we propose a novel fuzzer named Orion, which combines guided test input generation and corner-case test input generation based on a set of fuzzing heuristic rules constructed from historical data known to trigger critical issues in the underlying implementation of DL APIs. To extract the fuzzing heuristic rules, we first conduct an empirical study of the root causes of 376 vulnerabilities in two of the most popular DL libraries, PyTorch and TensorFlow. We then construct the fuzzing heuristic rules from the root causes of these historical vulnerabilities. Using these rules, Orion generates corner-case test inputs for API-level fuzzing. In addition, we extend the seed collection of existing studies to include test inputs for developer APIs. Our evaluation shows that Orion reports 135 vulnerabilities in the latest releases of TensorFlow and PyTorch, 76 of which were confirmed by the library developers. Among the 76 confirmed vulnerabilities, 69 were previously unknown and 7 have already been fixed; the rest are awaiting further confirmation. For end-user APIs, Orion detected 45.58% and 90% more vulnerabilities in TensorFlow and PyTorch, respectively, than the state-of-the-art conventional fuzzer, DeepRel. Compared to the state-of-the-art LLM-based DL fuzzer, AtlasFuzz, Orion detected 13.63% more vulnerabilities in TensorFlow and 18.42% more in PyTorch. Regarding developer APIs, Orion stands out by detecting 117% more vulnerabilities in TensorFlow and 100% more in PyTorch than the most relevant fuzzer designed for developer APIs, FreeFuzz.
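To illustrate the general idea of heuristic-rule-driven corner-case generation (the spirit of Orion's rules, not the actual rule set extracted in the paper), the sketch below perturbs one argument of a PyTorch API with extreme values of the kind historically linked to crashes, and records Python-level rejections. The specific values and target API are illustrative assumptions.

```python
# Illustrative corner-case fuzzing harness in the spirit of
# history-driven heuristic rules; the extreme values below are
# generic examples, not Orion's actual rules.
import torch

# Argument values of the kind historically linked to crashes in
# DL libraries: zero, negative, and huge dimensions.
CORNER_CASE_DIMS = [0, -1, -2**31, 2**63 - 1]

def fuzz_zeros(dims):
    """Call torch.zeros with each corner-case dimension and log the outcome."""
    for d in dims:
        try:
            t = torch.zeros(d)
            print(f"dim={d}: ok, shape={tuple(t.shape)}")
        except (RuntimeError, OverflowError, ValueError) as e:
            # Python-level exceptions are graceful rejections; a hard
            # process crash here would indicate a potential vulnerability.
            print(f"dim={d}: rejected ({type(e).__name__})")

if __name__ == "__main__":
    fuzz_zeros(CORNER_CASE_DIMS)
```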
Workshop
Fairness Analysis of Machine Learning-Based Code Reviewer Recommendation
Mohammad Mahdi Mohajer, Alvine Boaye Belle, Nima Shiri Harzevili, Junjie Wang, Hadi Hemmati, Song Wang, and Zhen Ming Jiang
5th International Workshop on Algorithmic Bias in Search and Recommendation (Bias@SIGIR2024), Aug 2024
Assessing the Impact of GPT-4 Turbo in Generating Defeaters for Assurance Cases
Kimya Khakzad Shahandashti, Mithila Sivakumar, Mohammad Mahdi Mohajer, Alvine Boaye Belle, Song Wang, and Timothy Lethbridge
In Proceedings of the 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (FORGE 2024), Lisbon, Portugal, Aug 2024
Assurance cases (ACs) are structured arguments that support verifying that a system correctly implements its non-functional requirements (e.g., safety, security), thereby helping prevent system failures, which may result in catastrophic outcomes (e.g., loss of lives). ACs support the certification of systems in compliance with industrial standards, e.g., DO-178C and ISO 26262. Identifying defeaters, i.e., arguments that challenge these ACs, is crucial for enhancing ACs’ robustness and confidence. To automatically support that task, we propose a novel approach that explores the potential of GPT-4 Turbo, an advanced Large Language Model (LLM) developed by OpenAI, in identifying defeaters within ACs formalized using the Eliminative Argumentation (EA) notation. Our preliminary evaluation assesses the model’s ability to comprehend and generate arguments in this context, and the results show that GPT-4 Turbo is highly proficient in EA notation and can generate different types of defeaters.
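A hedged sketch of the kind of interaction the study evaluates: prompting GPT-4 Turbo to propose defeaters for a small claim. The claim and prompt wording are illustrative assumptions, not the study's actual Eliminative Argumentation materials.

```python
# Minimal sketch of defeater generation with GPT-4 Turbo via the
# OpenAI Python client; the claim and prompt are illustrative,
# not the study's actual EA-formalized assurance case.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CLAIM = "C1: The braking subsystem responds within 50 ms under all load conditions."

prompt = (
    "You are assisting with an assurance case expressed in Eliminative "
    "Argumentation (EA) notation. List three plausible defeaters "
    "(rebutting, undermining, or undercutting) for the claim below.\n\n"
    + CLAIM
)

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```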