15 research outputs found

    SBFT Tool Competition 2024 -- Python Test Case Generation Track

    Test case generation (TCG) for Python poses distinctive challenges due to the language's dynamic nature and the absence of strict type information. Previous research has successfully explored automated unit TCG for Python, with solutions outperforming random test generation methods. Nevertheless, fundamental issues persist, hindering the practical adoption of existing test case generators. To address these challenges, we report on the organization, challenges, and results of the first edition of the Python Testing Competition. Four tools, namely UTBotPython, Klara, Hypothesis Ghostwriter, and Pynguin, were executed on a benchmark set consisting of 35 Python source files sampled from 7 open-source Python projects for a time budget of 400 seconds. We considered one configuration of each tool for each test subject and evaluated the tools' effectiveness in terms of code and mutation coverage. This paper describes our methodology, the analysis of the results together with the competing tools, and the challenges faced while running the competition experiments. Comment: 4 pages, to appear in the Proceedings of the 17th International Workshop on Search-Based and Fuzz Testing (SBFT@ICSE 2024).
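
    As a rough illustration of the coverage dimension described above, here is a minimal sketch of running a generated test module under coverage.py; the module names (example.py, test_example.py) are hypothetical and this is not the competition's actual infrastructure. Mutation coverage would additionally require a mutation testing tool (e.g., something like mutmut), which is outside this sketch.

```python
# Minimal sketch (hypothetical module names; not the competition pipeline):
# run a generated test module under coverage.py and report branch coverage,
# the kind of "code coverage" signal used to compare the tools.
import coverage
import pytest

cov = coverage.Coverage(branch=True, source=["example"])
cov.start()
pytest.main(["-q", "test_example.py"])  # execute the generated tests
cov.stop()
cov.save()
cov.report(show_missing=True)  # prints line/branch coverage for example.py
```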

    se2p/pynguin: Pynguin 0.35.0

    - Fix TypeError bug in instrumentation of bytecode (see GitHub PR #51)
    - Add a dump method for type-information statistics
    - Fix handling of aliased modules (see GitHub issue #57)
    - Fix method-signature handling for C extensions (see GitHub issue #59)
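
    The C-extension fix in the last item touches a general Python limitation: inspect.signature cannot recover parameter information for every C-implemented callable. The snippet below is a generic illustration of that failure mode and a defensive fallback, not Pynguin's actual code.

```python
# Generic illustration (not Pynguin's implementation): signature inspection
# of C-implemented callables can fail, so a test generator needs a fallback.
import inspect

def safe_signature(func):
    """Return the signature if Python can recover one, otherwise None."""
    try:
        return inspect.signature(func)
    except (ValueError, TypeError):  # raised when no signature is exposed
        return None

print(safe_signature(len))  # works: len exposes a text signature
print(safe_signature(min))  # None on CPython: min has no single signature
```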

    se2p/pynguin: Pynguin 0.36.0

    - Remove unused code
    - Fix ruff warnings
    - Add sequence variable for type-evolution tracking
    - Add CLI options to ignore methods and modules from analysis (see https://github.com/se2p/pynguin/issues/62)
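
    For context on the CLI mentioned in the last item, a minimal way to drive Pynguin from Python is sketched below. The --project-path, --module-name, and --output-path options are part of Pynguin's documented interface; the project layout, module name, and environment-variable handling are assumptions for illustration, and the newer ignore options are not shown (see the linked issue and the documentation for their exact names).

```python
# Minimal sketch of invoking Pynguin on one module (paths and module name
# are hypothetical; consult the Pynguin documentation for the full option
# list, including the options for ignoring methods and modules).
import os
import subprocess

env = dict(os.environ)
# Pynguin executes the code under test, so it requires an explicit opt-in.
env.setdefault("PYNGUIN_DANGER_AWARE", "1")

subprocess.run(
    [
        "pynguin",
        "--project-path", "./project",    # root of the code under test
        "--module-name", "example",       # module to generate tests for
        "--output-path", "./generated",   # where generated test files go
    ],
    env=env,
    check=True,
)
```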

    An empirical study of automated unit test generation for Python

    Various mature automated test generation tools exist for statically typed programming languages such as Java. Automatically generating unit tests for dynamically typed programming languages such as Python, however, is substantially more difficult due to the dynamic nature of these languages as well as the lack of type information. Our Pynguin framework provides automated unit test generation for Python. In this paper, we extend our previous work on Pynguin to support more aspects of the Python language and study a larger variety of well-established state-of-the-art test-generation algorithms, namely DynaMOSA, MIO, and MOSA. Furthermore, we improved our Pynguin tool to generate regression assertions, whose quality we also evaluate. Our experiments confirm that evolutionary algorithms can outperform random test generation also in the context of Python, and, similar to the Java world, DynaMOSA yields the highest coverage results. However, our results also demonstrate that there are still fundamental remaining issues, such as inferring type information for code without this information, which currently limit the effectiveness of test generation for Python.
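
    The regression assertions mentioned above follow a simple idea: execute a generated call, observe the result, and pin that observation as an assertion for future runs. The sketch below is a conceptual illustration of that idea with a hypothetical clamp function, not Pynguin's implementation.

```python
# Conceptual sketch of regression-assertion generation (not Pynguin's code):
# execute a call, record the observed value, and emit a test that asserts it.
def make_regression_test(func, *args):
    observed = func(*args)  # observed behaviour of the current version
    arg_list = ", ".join(repr(a) for a in args)
    return (
        f"def test_{func.__name__}_regression():\n"
        f"    assert {func.__name__}({arg_list}) == {observed!r}\n"
    )

# Hypothetical function under test.
def clamp(x, lo, hi):
    return max(lo, min(x, hi))

print(make_regression_test(clamp, 5, 0, 3))
# def test_clamp_regression():
#     assert clamp(5, 0, 3) == 3
```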

    An Empirical Study of Flaky Tests in Python

    Tests that cause spurious failures without any code changes, i.e., flaky tests, hamper regression testing, increase maintenance costs, may shadow real bugs, and decrease trust in tests. While the prevalence and importance of flakiness are well established, prior research focused on Java projects, thus raising the question of how the findings generalize. In order to provide a better understanding of the role of flakiness in software development beyond Java, we empirically study the prevalence, causes, and degree of flakiness within software written in Python, one of the currently most popular programming languages. For this, we sampled 22,352 open-source projects from the popular PyPI package index and analyzed their 876,186 test cases for flakiness. Our investigation suggests that flakiness is equally prevalent in Python as it is in Java. The reasons, however, are different: order dependency is a much more dominant problem in Python, causing 59% of the 7,571 flaky tests in our dataset. Another 28% were caused by test infrastructure problems, which represent a previously undocumented cause of flakiness. The remaining 13% can mostly be attributed to the use of network and randomness APIs by the projects, which is indicative of the type of software commonly written in Python. Our data also suggests that finding flaky tests requires more runs than are often done in the literature: a 95% confidence that a passing test case is not flaky would, on average, require 170 reruns. Comment: 11 pages, to be published in the Proceedings of the IEEE International Conference on Software Testing, Verification and Validation (ICST 2021).
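
    The rerun figure at the end of the abstract can be made concrete with a simple calculation. The sketch below assumes independent reruns with a fixed per-run failure probability, which is a simplification of the paper's empirical estimate; the 170-rerun average is the paper's reported result, not an output of this formula for any particular p.

```python
# Sketch of the rerun arithmetic (simplifying assumption: independent reruns,
# a flaky test failing each run with probability p). To be 95% confident that
# a test which passed n consecutive runs is not flaky:
#   (1 - p)**n <= 0.05  =>  n >= log(0.05) / log(1 - p)
import math

def reruns_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n with (1 - p)**n <= 1 - confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# Example: a 2% per-run failure probability already requires 149 clean reruns,
# in the same ballpark as the 170-rerun average reported in the paper.
print(reruns_needed(0.02))  # -> 149
```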

    An Empirical Study of Flaky Tests in Python

    This is a summary of our work presented at the International Conference on Software Testing 2021 [Gr21b]. Tests that cause spurious failures without code changes, i.e., flaky tests, hamper regression testing and decrease trust in tests. While the prevalence and importance of flakiness are well established, prior research focused on Java projects, raising questions about generalizability. To provide a better understanding of flakiness, we empirically study the prevalence, causes, and degree of flakiness within 22,352 Python projects containing 876,186 tests. We found flakiness to be equally prevalent in Python as in Java. The reasons, however, are different: order dependency is a dominant problem, causing 59% of the 7,571 flaky tests we found. Another 28% were caused by test infrastructure problems, a previously less-considered cause of flakiness. The remaining 13% can mostly be attributed to the use of network and randomness APIs. Unveiling flaky tests also requires more runs than often assumed: a 95% confidence that a passing test is not flaky would, on average, require 170 reruns. Additionally, through our investigations, we created a large dataset of flaky tests that other researchers have already started building on [MM21; Ni21].
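
    Since order dependency is the dominant cause reported here, a minimal hypothetical example of such a test is sketched below; it is illustrative only and not taken from the studied projects.

```python
# Hypothetical order-dependent flaky test (illustration only): the second
# test passes only when the first one has already populated shared state.
_cache = {}

def test_populates_cache():
    _cache["answer"] = 42
    assert _cache["answer"] == 42

def test_reads_cache():
    # Fails when run in isolation or before test_populates_cache,
    # passes when the suite runs in the usual order.
    assert _cache.get("answer") == 42
```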
