* [pve-devel] [RFC] towards automated integration testing
From: Lukas Wagner @ 2023-10-13 13:33 UTC
To: Proxmox VE development discussion

Hello,

I am currently doing the groundwork that should eventually enable us to
write automated integration tests for our products.

Part of that endeavor will be to write a custom test runner, which will
- set up a specified test environment
- execute test cases in that environment
- create some sort of test report

What follows is a description of how that test runner would roughly work.
The main point is to get some feedback on some of the ideas/approaches
before I start with the actual implementation. Let me know what you think!

## Introduction

The goal is to establish a framework that allows us to write automated
integration tests for our products. These tests are intended to run in the
following situations:
- When new packages are uploaded to the staging repos (by triggering a
  test run from repoman, or similar)
- Later, these tests could also be run when patch series are posted to our
  mailing lists. This requires a mechanism to automatically discover,
  fetch and build patches, which will be a separate, follow-up project.
- Additionally, it should be easy to run these integration tests locally
  on a developer's workstation in order to write new test cases, as well
  as to troubleshoot and debug existing test cases. The local test
  environment should match the one being used for automated testing as
  closely as possible.

As a main mode of operation, the Systems under Test (SUTs) will be
virtualized on top of a Proxmox VE node.

This has the following benefits:
- it is easy to create various test setups (fixtures), including but not
  limited to single Proxmox VE nodes, clusters, Backup servers and
  auxiliary services (e.g. an LDAP server for testing LDAP authentication)
- these test setups can easily be brought to a well-defined state:
  cloning from a template/restoring a backup/rolling back to a snapshot
- it makes it easy to run the integration tests on a developer's
  workstation in an identical configuration

For the sake of completeness, some of the drawbacks of not running the
tests on bare metal:
- Might be unable to detect regressions that only occur on real hardware

In theory, the test runner would also be able to drive tests on real
hardware, but of course with some limitations (harder to have a
predictable, reproducible environment, etc.)

## Terminology
- Template: A backup/VM template that can be instantiated by the test
  runner
- Test Case: Some script/executable executed by the test runner; success
  is determined via exit code.
- Fixture: Description of a test setup (e.g. which templates are needed,
  additional setup steps to run, etc.)

## Approach
Test writers write template, fixture and test case definitions in
declarative configuration files (most likely TOML). The test case
references a test executable/script, which performs the actual test.

The test script is executed by the test runner; the test outcome is
determined by the exit code of the script. Test scripts could be written
in any language, e.g. they could be Perl scripts that use the official
`libpve-apiclient-perl` to test-drive the SUTs.
If we notice any emerging patterns, we could write additional helper libs
that reduce the amount of boilerplate in test scripts.
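
To give a rough idea of what such a test script could look like, here is a
purely illustrative sketch. The environment variable names are placeholders,
and whether the script would use `libpve-apiclient-perl` exactly like this
is not set in stone:

```perl
#!/usr/bin/perl

use strict;
use warnings;

use PVE::APIClient::LWP; # from libpve-apiclient-perl

# The test runner would pass connection details for the instantiated
# templates via env vars - the variable names used here are placeholders.
my $host     = $ENV{TEST_PVE_HOST}     // die "no host given\n";
my $password = $ENV{TEST_PVE_PASSWORD} // die "no password given\n";

# Note: certificate fingerprint handling is omitted here for brevity.
my $api = PVE::APIClient::LWP->new(
    username => 'root@pam',
    password => $password,
    host     => $host,
);

# Example check: the LDAP realm set up by the fixture should exist.
my $realms = eval { $api->get('/access/domains', {}) };
if ($@ || !grep { ($_->{type} // '') eq 'ldap' } @$realms) {
    print STDERR "expected LDAP realm not found: $@\n";
    exit 1; # non-zero exit code -> test case failed
}

exit 0; # exit code 0 -> test case passed
```
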
In essence, the test runner would do the following:
- Group test cases by fixture
- For every fixture:
  - Instantiate needed templates from their backup snapshot
  - Start VMs
  - Run any specified `setup-hooks` (update system, deploy packages, etc.)
  - Take a snapshot, including RAM
  - For every test case using that fixture:
    - Run the test case (execute test executable, check exit code)
    - Roll back to the snapshot (iff `rollback = true` for that template)
  - Destroy test instances (or at least those which are not needed by
    other fixtures)

In the beginning, the test scripts would primarily drive the Systems under
Test (SUTs) via their API. However, the system would also offer the
flexibility for us to venture into the realm of automated GUI testing at
some point (e.g. using Selenium) - without having to change the overall
test architecture.

## Mock Test Runner Config

Besides the actual test scripts, test writers would write test
configuration. Based on the current requirements and the approach that
I have chosen, an example config *could* look like the following.
These would likely be split into multiple files/folders (e.g. to group
test case definitions and the test scripts logically).

```toml
[template.pve-default]
# Backup image to restore from, in this case this would be a previously
# set up PVE installation
restore = '...'
# Used to check if the node booted successfully, also made available to
# hook scripts, in case they need to SSH in to set up things.
ip = "10.0.0.1"
# Define credentials in a separate file - most templates could use a
# default password/SSH key/API token etc.
credentials = "default"
# Update to latest packages, install test .debs
# credentials are passed via env var
# Maybe this could also be ansible playbooks, if the need arises.
setup-hooks = [
    "update.sh",
]
# Take snapshot after setup-hook, roll back after each test case
rollback = true


[template.ldap-server]
# Backup image to restore from
restore = '...'
credentials = "default"
ip = "10.0.0.3"
# No need to roll back in between test cases, there won't be any changes
rollback = false


# Example fixture. They can be used by multiple testcases.
[fixture.pve-with-ldap-server]
# Maybe one could specify additional setup-hooks here as well, in case
# one wants a 'per-fixture' setup? So that we can reduce the number of
# base images?
templates = [
    'pve-default',
    'ldap-server',
]


# testcases.toml (might be split to multiple files/folders?)
[testcase.test-ldap-realms]
fixture = 'pve-with-ldap-server'

# - return code is checked to determine test case success
# - stderr/stdout is captured for the final test report
# - some data is passed via env var:
#   - name of the test case
#   - template configuration (IPs, credentials, etc.)
#   - ...
test-exec = './test-ldap-realms.pl'
# Consider the test as failed if the test script does not finish fast enough
test-timeout = 60
# Additional params for the test script, allowing for parameterized
# tests.
# Could also turn this into an array and loop over the values, in
# order to create multiple test cases from the same definition.
test-params = { foo = "bar" }

# Second test case, using the same fixture
[testcase.test-ldap-something-else]
fixture = 'pve-with-ldap-server'
test-exec = './test-ldap-something-else.pl'
```

--
- Lukas

* Re: [pve-devel] [RFC] towards automated integration testing
From: Stefan Hanreich @ 2023-10-16 11:20 UTC
To: Proxmox VE development discussion, Lukas Wagner

On 10/13/23 15:33, Lukas Wagner wrote:
> - Additionally, it should be easy to run these integration tests locally
>   on a developer's workstation in order to write new test cases, as well
>   as to troubleshoot and debug existing test cases. The local test
>   environment should match the one being used for automated testing as
>   closely as possible.

This would also include sharing those fixture templates somewhere, do you
already have an idea on how to accomplish this? PBS sounds like a good
option for this if I'm not missing something.

> As a main mode of operation, the Systems under Test (SUTs)
> will be virtualized on top of a Proxmox VE node.
>
> This has the following benefits:
> - it is easy to create various test setups (fixtures), including but not
>   limited to single Proxmox VE nodes, clusters, Backup servers and
>   auxiliary services (e.g. an LDAP server for testing LDAP
>   authentication)

I can imagine having to set up VMs inside the test setup as well for doing
various tests. Doing this manually every time could be quite cumbersome /
hard to automate. Do you have a mechanism in mind to deploy VMs inside the
test system as well? Again, PBS could be an interesting option for this
imo.

> In theory, the test runner would also be able to drive tests on real
> hardware, but of course with some limitations (harder to have a
> predictable, reproducible environment, etc.)

Maybe utilizing Aaron's installer for setting up those test systems could
at least produce somewhat identical setups? Although it is really hard
managing systems with different storage types, network cards, ...

I've seen GitLab using tags for runners that specify certain capabilities
of systems. Maybe we could also introduce something like that here for
different bare-metal systems? E.g. a test case specifies it needs a system
with tag `ZFS` and then you can run / skip the respective test case on
that system. Managing those tags can introduce quite a lot of churn
though, so I'm not sure if this would be a good idea.

> The test script is executed by the test runner; the test outcome is
> determined by the exit code of the script. Test scripts could be written

Are you considering capturing output as well? That would make sense when
using assertions at least, so in case of failures developers have a
starting point for debugging.

Would it make sense to allow specifying an expected exit code for tests
that actually should fail - or do you consider this something that should
be handled by the test script?

I've refrained from talking about the toml files too much since it's
probably too early to say something about that, but they look good so far
from my pov.

In general this sounds like quite the exciting feature and the RFC looks
very promising already.

Kind Regards
Stefan

* Re: [pve-devel] [RFC] towards automated integration testing
From: Lukas Wagner @ 2023-10-16 15:18 UTC
To: Stefan Hanreich, Proxmox VE development discussion

Thank you for the feedback!

On 10/16/23 13:20, Stefan Hanreich wrote:
> On 10/13/23 15:33, Lukas Wagner wrote:
>
>> - Additionally, it should be easy to run these integration tests locally
>>   on a developer's workstation in order to write new test cases, as well
>>   as to troubleshoot and debug existing test cases. The local test
>>   environment should match the one being used for automated testing as
>>   closely as possible.
>
> This would also include sharing those fixture templates somewhere, do you
> already have an idea on how to accomplish this? PBS sounds like a good
> option for this if I'm not missing something.

Yes, these templates could be stored on some shared storage, e.g. a PBS
instance, or they could also be distributed via a .deb/multiple .debs (not
sure if that is a good idea, since these would become huge pretty
quickly).

It could also be a two-step process: Use one command to get the latest
test templates, restoring them from a remote backup and converting them to
a local VM template. When executing tests, the test runner could then use
linked clones, speeding up the test setup time quite a bit.

All in all, these templates that can be used in test fixtures should be:
- easily obtainable for developers, in order to have a fully functional
  test setup on their workstation
- easily updateable (e.g. installing the latest packages, so that the
  setup-hook does not need to fetch a boatload of packages every time)

>> As a main mode of operation, the Systems under Test (SUTs)
>> will be virtualized on top of a Proxmox VE node.
>>
>> This has the following benefits:
>> - it is easy to create various test setups (fixtures), including but not
>>   limited to single Proxmox VE nodes, clusters, Backup servers and
>>   auxiliary services (e.g. an LDAP server for testing LDAP
>>   authentication)
>
> I can imagine having to set up VMs inside the test setup as well for
> doing various tests. Doing this manually every time could be quite
> cumbersome / hard to automate. Do you have a mechanism in mind to deploy
> VMs inside the test system as well? Again, PBS could be an interesting
> option for this imo.

Several options come to mind. We could use a virtualized PBS instance with
a datastore containing the VM backup as part of the fixture. We could use
some external backup store (so the same 'source' as for the templates
themselves) - however, that means that the systems under test must have
network access to that. We could also think about using iPXE to boot test
VMs, with the boot image either provided by some template from the
fixture, or by some external server. For both approaches, the 'as part of
the fixture' variants seem a bit nicer, as they are more self-contained.

Also, the vmdb2 thingy that Thomas mentioned might be interesting for
this - I've only glanced at it so far though.

As of now it seems that this question will not influence the design of the
test runner much, so it can probably be postponed to a later stage.

>> In theory, the test runner would also be able to drive tests on real
>> hardware, but of course with some limitations (harder to have a
>> predictable, reproducible environment, etc.)
> Maybe utilizing Aaron's installer for setting up those test systems
> could at least produce somewhat identical setups? Although it is really
> hard managing systems with different storage types, network cards, ...

In general, my biggest concern with 'bare-metal' tests - and to be
precise, that does not really have anything to do with being 'bare-metal',
but more with testing on something that is harder to roll back into a
clean state that can be used for the next test execution - is that I'm
afraid that a setup like this could become quite brittle and a maintenance
burden. At some point, a test execution might leave something in an
unclean state (e.g. due to a crashed test or missing something during
cleanup), tripping up the following test job.

As an example from personal experience: One test run might test new
packages which introduce a new flag in a configuration file. If that flag
is not cleaned up afterwards, another test job testing other packages
might fail because it now has to deal with an 'unknown' configuration key.

Maybe ZFS snapshots could help with that, but I'm not sure how that would
work in practice (e.g. due to the kernel being stored on the EFI
partition).

The automated installer *could* certainly help here - however, right now I
don't want to extend the scope of this project too much. Also, there is
the question of whether the installation should be refreshed after every
single test run, increasing the test cycle time/resource consumption quite
a bit - or only if 'something' breaks?

That being said, it might also make sense to be able to run the tests (or
more likely, a subset of them, since some will inherently require a
fixture) against an arbitrary PVE instance that is under full control of a
developer (e.g. a development VM, or, if feeling adventurous, the
workstation itself). If this is possible, then these tests could be the
fastest way to get feedback while developing, since there is no need to
instantiate a template, update, deploy, etc.

In this case, the test runner's job would only be to run the test scripts,
without managing fixtures/etc., and then to report the results back to the
developer. Essentially, as Thomas already mentioned, one approach to do
this would be to decouple the 'fixture setup' and 'test case execution'
parts as much as possible. How that will look in practice will be part of
further research.

> I've seen GitLab using tags for runners that specify certain
> capabilities of systems. Maybe we could also introduce something like
> that here for different bare-metal systems? E.g. a test case specifies
> it needs a system with tag `ZFS` and then you can run / skip the
> respective test case on that system. Managing those tags can introduce
> quite a lot of churn though, so I'm not sure if this would be a good
> idea.

I have thought about a tag system as well - not necessarily for test
runners, but for test cases. E.g. you could tag tests for the
authentication system with 'auth' - because at least for the local
development cycle it might not make much sense to run tests for clusters,
ceph, etc. while working on the authentication system. The 'tags' to be
executed might then simply be passed to the test runner.

These tags could also be used to mark the subset of 'simple' test cases
that don't need a special test fixture, as described above... This could
also be extended to a full 'predicate-like' system as Thomas described.

>> The test script is executed by the test runner; the test outcome is
>> determined by the exit code of the script.
>> Test scripts could be written

> Are you considering capturing output as well? That would make sense when
> using assertions at least, so in case of failures developers have a
> starting point for debugging.

Yup, I'd capture stdout/stderr from all test executables/scripts and
include it in the final test report. Test output is indeed very useful
when determining *why* something went wrong.

> Would it make sense to allow specifying an expected exit code for tests
> that actually should fail - or do you consider this something that
> should be handled by the test script?

I guess that's a matter of taste. Personally, I'd keep the contract
between test runner and test script simple and say 0 == success,
everything else is a failure. If there are any test cases that expect a
failure of some API call, then the script should 'translate' the exit
code. If we discover that specifying an expected exit code actually makes
things easier for us, then adding it should be rather trivial - and easier
than ripping it out the other way round.

> I've refrained from talking about the toml files too much since it's
> probably too early to say something about that, but they look good so
> far from my pov.
>
> In general this sounds like quite the exciting feature and the RFC looks
> very promising already.

Thanks for your feedback!

> Kind Regards
> Stefan

--
- Lukas

* Re: [pve-devel] [RFC] towards automated integration testing
From: Thomas Lamprecht @ 2023-10-17 7:34 UTC
To: Proxmox VE development discussion, Lukas Wagner, Stefan Hanreich

On 16/10/2023 17:18, Lukas Wagner wrote:
> On 10/16/23 13:20, Stefan Hanreich wrote:
>> I can imagine having to set up VMs inside the test setup as well for
>> doing various tests. Doing this manually every time could be quite
>> cumbersome / hard to automate. Do you have a mechanism in mind to
>> deploy VMs inside the test system as well? Again, PBS could be an
>> interesting option for this imo.
>
> Several options come to mind. We could use a virtualized PBS instance
> with a datastore containing the VM backup as part of the fixture. We
> could use some external backup store (so the same 'source' as for the
> templates themselves) - however, that means that the systems under test
> must have network access to that. We could also think about using iPXE
> to boot test VMs, with the boot image either provided by some template
> from the fixture, or by some external server. For both approaches, the
> 'as part of the fixture' variants seem a bit nicer, as they are more
> self-contained.

What about the following approach: the tests state that they need one or
more VMs with certain properties, i.e., something like "none" (don't
care), "ostype=win*", "memory>=10G" or the like (we can start out easy
w.r.t. supported comparison features; as long as the base system is there,
it can be extended relatively easily later on).

Then, on a run of a test, first all those asset-dependencies are
collected. Then, depending on further config, they can either get newly
created or be selected from existing candidates on the target test-host
system.

In general, the test system can add a specific tag (like "test-asset") to
such virtual guests by default, and also add that as an implicit property
condition (if no explicit tag-condition is already present) when searching
for existing assets. This way one can re-use guests, be it because they
exist due to running on a bare-metal system that won't get rolled back, or
even in some virtual system that gets rolled back to a state that already
has the virtual-guest test-assets configured, and thus can also reduce the
time required to set up a clean environment by a lot, benefiting both use
cases.

Extra config and/or command line knobs can then force re-creation of all,
or some assets of, a test, or set the base search path for images. Here
it's probably enough to have some simpler, definitively wanted ones to
provide the core infra for how to add other, maybe more complex, knobs
more easily in the future (creating new things is IMO always harder than
extending existing ones, at least if non-trivial).

> Also, the vmdb2 thingy that Thomas mentioned might be interesting for

Because I stumbled upon it today, systemd's mkosi tool could also be
interesting here:

https://github.com/systemd/mkosi
https://github.com/systemd/mkosi/blob/main/mkosi/resources/mkosi.md

> this - I've only glanced at it so far though.
>
> As of now it seems that this question will not influence the design of
> the test runner much, so it can probably be postponed to a later
> stage.

Not of the runner itself, but of all the setup stuff for it, so I'd at
least try to keep it in mind - the above features might not be that much
work, but would create lots of flexibility to allow devs to use it more
easily for declarative reproduction attempts of bugs too. At least I see
it as a big mental roadblock if I have to set up specific environments for
using such tools and cannot just re-use my existing ones 1:1.

>>> In theory, the test runner would also be able to drive tests on real
>>> hardware, but of course with some limitations (harder to have a
>>> predictable, reproducible environment, etc.)
>>
>> Maybe utilizing Aaron's installer for setting up those test systems
>> could at least produce somewhat identical setups? Although it is
>> really hard managing systems with different storage types, network
>> cards, ...
>
> In general, my biggest concern with 'bare-metal' tests - and to be
> precise, that does not really have anything to do with being
> 'bare-metal', but more with testing on something that is harder to roll
> back into a clean state that can be used for the next test execution -
> is that I'm afraid that a setup like this could become quite brittle
> and a maintenance burden.

I don't see that as an issue, just as two separate things: one is
regression testing in clean states, where we can turn up reporting of
test-failures to the max, and the other is integration testing, where we
don't report widely but only allow some way to see a list of issues for
admins to decide on. Bugs in the test system or configuration issues
breaking idempotency assumptions can then be fixed; other issues that are
not visible in those clean-room tests can become visible. I see no reason
why both cannot co-exist and have equivalent priority/focus.

New tests can be checked for basic idempotency by running them twice, with
the second run not doing any rollback.

>> I've seen GitLab using tags for runners that specify certain
>> capabilities of systems. Maybe we could also introduce something like
>> that here for different bare-metal systems? E.g. a test case
>> specifies it needs a system with tag `ZFS` and then you can run /
>> skip the respective test case on that system. Managing those tags can
>> introduce quite a lot of churn though, so I'm not sure if this would
>> be a good idea.
>
> I have thought about a tag system as well - not necessarily for test
> runners, but for test cases. E.g. you could tag tests for the
> authentication system with 'auth' - because at least for the local
> development cycle it might not make much sense to run tests for
> clusters, ceph, etc. while working on the authentication system.

Yes, I thought about something like that too: a known set of tags (i.e., a
centrally managed set, and bail, or at least warn, if a test uses an
unknown one) - having test runs be filtered by their use classes, like
"migration" or "windows" or your "auth" example, would be definitively
nice.

>>> The test script is executed by the test runner; the test outcome is
>>> determined by the exit code of the script. Test scripts could be
>>> written
>> Are you considering capturing output as well? That would make sense
>> when using assertions at least, so in case of failures developers
>> have a starting point for debugging.
> Yup, I'd capture stdout/stderr from all test executables/scripts and
> include it in the final test report.

I guess there would be an (optional) notification to a set of addresses,
passed to the test system via CLI/config by the tester (a human on manual
tests, or derived from changes and maintainers for automated tests), and
that would only contain a summary and link/point to the full report that
provides the longer outputs of the test harness and possibly system logs.

> Test output is indeed very useful when determining *why* something
> went wrong.

Journalctl of all nodes that took part in a test might be useful too.

>> Would it make sense to allow specifying an expected exit code for
>> tests that actually should fail - or do you consider this something
>> that should be handled by the test script?
>
> I guess that's a matter of taste. Personally, I'd keep the contract
> between test runner and test script simple and say 0 == success,
> everything else is a failure. If there are any test cases that expect
> a failure of some API call, then the script should 'translate' the
> exit code.

W.r.t. the exit code I find that fine, but maybe we want to allow passing
a more formal result text back. We can always extend this by just using
some special files that the test script writes to, or something like that,
in the future - here, starting out with simply checking the exit code
seems fine enough to me.

* Re: [pve-devel] [RFC] towards automated integration testing
From: Thomas Lamprecht @ 2023-10-16 13:57 UTC
To: Proxmox VE development discussion, Lukas Wagner

A few things, most of which we talked about off-list already anyway.

We should eye if we can integrate existing regression testing in there
too, i.e.:

- The QEMU autotest that Stefan Reiter started and Fiona still uses; here
  we should drop the in-git tracked backup that the test VM is restored
  from (replace it with something like a vmdb2 [0] managed Debian image
  that gets generated on demand), replace some hard-coded configs with a
  simple config and make it public.

  [0]: https://vmdb2.liw.fi/

- The Selenium-based end-to-end tests which we also use to generate most
  screenshots with (they can run headless too). Here we also need a few
  clean-ups, but not that many, and make the repo public.

On 13/10/2023 15:33, Lukas Wagner wrote:
> I am currently doing the groundwork that should eventually enable us
> to write automated integration tests for our products.
>
> Part of that endeavor will be to write a custom test runner, which will
> - set up a specified test environment
> - execute test cases in that environment

This should be decoupled from all else, so that I can run it on any
existing installation, bare-metal or not. This allows devs to use it in
their existing setups with almost no change required. We can then also add
it to our existing Buildbot instance relatively easily, so it would be
worth doing even if we might deprecate Buildbot in the future (for what
little it can do, it could be simpler).

> - create some sort of test report

As Stefan mentioned, test output can be good to have. Our Buildbot
instance provides that, and while I don't look at them in 99% of the
builds, when I need to, it's worth *a lot*.

> ## Introduction
>
> The goal is to establish a framework that allows us to write
> automated integration tests for our products.
> These tests are intended to run in the following situations:
> - When new packages are uploaded to the staging repos (by triggering
>   a test run from repoman, or similar)

*debian repos, as we could also trigger some when git commits are pushed,
just like we do now through Buildbot. Doing so is IMO nice as it will
catch issues before a package was bumped, but is still quite a bit simpler
to implement than an "apply patch from list to git repos" thing from the
next point - it could still act as a preparation for that, though.

> - Later, these tests could also be run when patch series are posted to
>   our mailing lists. This requires a mechanism to automatically
>   discover, fetch and build patches, which will be a separate,
>   follow-up project.
>
> As a main mode of operation, the Systems under Test (SUTs)
> will be virtualized on top of a Proxmox VE node.

For the fully-automated test system this can be OK as the primary mode, as
it indeed makes things like going back to an older software state much
easier.

But, if we decouple the test harness and running them from that more
automated system, we can also run the harness periodically on our
bare-metal test servers.

> ## Terminology
> - Template: A backup/VM template that can be instantiated by the test
>   runner

I.e., the base of the test host?
I'd call this test-host, template is a bit too overloaded/generic and
might focus too much on the virtual test environment.

Or is this some part that takes place in the test, i.e., a generalization
of product to test and supplementary tool/app that helps on that test?
Hmm, could work out OK, and we should be able to specialize stuff
relatively easily later on too, if wanted.

> - Test Case: Some script/executable executed by the test runner; success
>   is determined via exit code.
> - Fixture: Description of a test setup (e.g. which templates are needed,
>   additional setup steps to run, etc.)
>
> ## Approach
> Test writers write template, fixture and test case definitions in
> declarative configuration files (most likely TOML). The test case
> references a test executable/script, which performs the actual test.
>
> The test script is executed by the test runner; the test outcome is
> determined by the exit code of the script. Test scripts could be written
> in any language, e.g. they could be Perl scripts that use the official
> `libpve-apiclient-perl` to test-drive the SUTs.
> If we notice any emerging patterns, we could write additional helper
> libs that reduce the amount of boilerplate in test scripts.
>
> In essence, the test runner would do the following:
> - Group test cases by fixture
> - For every fixture:
>   - Instantiate needed templates from their backup snapshot

Should be optional, possibly a default-on boolean option.

>   - Start VMs

Same.

>   - Run any specified `setup-hooks` (update system, deploy packages,
>     etc.)

Should be as idempotent as possible.

>   - Take a snapshot, including RAM

Should be optional (as in, don't care if it cannot be done, e.g., on bare
metal).

>   - For every test case using that fixture:
>     - Run the test case (execute test executable, check exit code)
>     - Roll back to the snapshot (iff `rollback = true` for that template)
>   - Destroy test instances (or at least those which are not needed by
>     other fixtures)

Might be optional for L1 hosts; L2 test VMs might be a separate switch.

> In the beginning, the test scripts would primarily drive the Systems
> under Test (SUTs) via their API. However, the system would also offer
> the flexibility for us to venture into the realm of automated GUI
> testing at some point (e.g. using Selenium) - without having to
> change the overall test architecture.

Our existing Selenium-based UI tests simply use the API to create the
stuff they need, if it does not exist yet, and sometimes also remove some
of it. They use some special ranges or values to avoid most conflicts with
real systems, allowing one to point them at existing (production) systems
without problems. IMO this has a big value, and I actually added a bit of
resiliency, as I find having to set up clean states a bit annoying and,
for one of the main use cases of that tooling, creating screenshots, too
sterile.

But always starting out from a very clean state is IMO not only "ugly" for
screenshots, it can also sometimes mask issues that tests can run into on
systems with a longer uptime and the "organic mess" that comes from
long-term maintenance. In practice one naturally wants both, starting from
a clean state and from an existing one; both have their advantages and
disadvantages. Messy systems also might have more false positives on
regression tracking, though.

> ## Mock Test Runner Config
>
> Besides the actual test scripts, test writers would write test
> configuration. Based on the current requirements and the approach that
> I have chosen, an example config *could* look like the following.
> These would likely be split into multiple files/folders (e.g. to group
> test case definitions and the test scripts logically).
>
> ```toml
> [template.pve-default]
> # Backup image to restore from, in this case this would be a previously
> # set up PVE installation
> restore = '...'
> # Used to check if the node booted successfully, also made available to
> # hook scripts, in case they need to SSH in to set up things.
> ip = "10.0.0.1"
> # Define credentials in a separate file - most templates could use a
> # default password/SSH key/API token etc.
> credentials = "default"
> # Update to latest packages, install test .debs
> # credentials are passed via env var
> # Maybe this could also be ansible playbooks, if the need arises.

FWIW, one could also define a config-deployment-system, like
- none (already set up)
- cloudinit
- QGA
but that can be added later on too.

> setup-hooks = [
>     "update.sh",
> ]
> # Take snapshot after setup-hook, roll back after each test case
> rollback = true
>
>
> [template.ldap-server]
> # Backup image to restore from
> restore = '...'
> credentials = "default"
> ip = "10.0.0.3"
> # No need to roll back in between test cases, there won't be any changes
> rollback = false
>
>
> # Example fixture. They can be used by multiple testcases.
> [fixture.pve-with-ldap-server]
> # Maybe one could specify additional setup-hooks here as well, in case
> # one wants a 'per-fixture' setup? So that we can reduce the number of
> # base images?
> templates = [
>     'pve-default',
>     'ldap-server',
> ]
>
>
> # testcases.toml (might be split to multiple files/folders?)

Maybe some sort of predicates could also be nice (even if not there from
the start), like placing a condition under which a test is skipped if it
is not met, e.g. the existence of a ZFS storage or something like that
(see the rough sketch at the end of this mail). While those seem like
details, having a general (simple) dependency and, so to say,
anti-dependency system might influence the overall design more.

> [testcase.test-ldap-realms]
> fixture = 'pve-with-ldap-server'
>
> # - return code is checked to determine test case success
> # - stderr/stdout is captured for the final test report
> # - some data is passed via env var:
> #   - name of the test case
> #   - template configuration (IPs, credentials, etc.)
> #   - ...
> test-exec = './test-ldap-realms.pl'
> # Consider the test as failed if the test script does not finish fast enough
> test-timeout = 60
> # Additional params for the test script, allowing for parameterized
> # tests.
> # Could also turn this into an array and loop over the values, in
> # order to create multiple test cases from the same definition.
> test-params = { foo = "bar" }
>
> # Second test case, using the same fixture
> [testcase.test-ldap-something-else]
> fixture = 'pve-with-ldap-server'
> test-exec = './test-ldap-something-else.pl'
>
> ```

Is the order of test-cases guaranteed by toml parsing, or how are intra-
fixture dependencies ensured?

Anyway, the most important thing is to start out here, so I don't want to
block anything based on minorish stuff. The most important thing for me is
that the following parts are decoupled and ideally shippable as a separate
debian package each:

- the parts that manage automated testing, including how the test host
  base system is set up (the latter could even be its own thing)
- running the tests itself, including some helper modules/scripts
- the test definitions

As then we can run them anywhere easily and extend, or possibly even
rework some parts independently, if ever needed.
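
PS: to illustrate the predicate idea from above, such a skip-condition
could look roughly like the following in a test case definition. This is
just a sketch - the `requires` key, its value and the test case name are
completely made up:

```toml
[testcase.test-zfs-snapshot-rollback]
fixture = 'pve-default'
# made-up syntax: skip this test case on hosts that do not provide
# a ZFS storage
requires = [ 'storage:zfs' ]
test-exec = './test-zfs-snapshot-rollback.pl'
```
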
- Thomas

* Re: [pve-devel] [RFC] towards automated integration testing
From: Lukas Wagner @ 2023-10-16 15:33 UTC
To: Thomas Lamprecht, Proxmox VE development discussion

Thanks for the summary from our discussion and the additional feedback!

On 10/16/23 15:57, Thomas Lamprecht wrote:
>> - create some sort of test report
>
> As Stefan mentioned, test output can be good to have. Our Buildbot
> instance provides that, and while I don't look at them in 99% of the
> builds, when I need to, it's worth *a lot*.

Agreed, test output is always valuable and will definitely be captured.

>> ## Introduction
>>
>> The goal is to establish a framework that allows us to write
>> automated integration tests for our products.
>> These tests are intended to run in the following situations:
>> - When new packages are uploaded to the staging repos (by triggering
>>   a test run from repoman, or similar)
>
> *debian repos, as we could also trigger some when git commits are
> pushed, just like we do now through Buildbot. Doing so is IMO nice as it
> will catch issues before a package was bumped, but is still quite a bit
> simpler to implement than an "apply patch from list to git repos" thing
> from the next point - it could still act as a preparation for that,
> though.
>
>> - Later, these tests could also be run when patch series are posted to
>>   our mailing lists. This requires a mechanism to automatically
>>   discover, fetch and build patches, which will be a separate,
>>   follow-up project.
>>
>> As a main mode of operation, the Systems under Test (SUTs)
>> will be virtualized on top of a Proxmox VE node.
>
> For the fully-automated test system this can be OK as the primary mode,
> as it indeed makes things like going back to an older software state
> much easier.
>
> But, if we decouple the test harness and running them from that more
> automated system, we can also run the harness periodically on our
> bare-metal test servers.
>
>> ## Terminology
>> - Template: A backup/VM template that can be instantiated by the test
>>   runner
>
> I.e., the base of the test host? I'd call this test-host, template is a
> bit too overloaded/generic and might focus too much on the virtual test
> environment.

True, 'template' is a bit overloaded.

> Or is this some part that takes place in the test, i.e., a
> generalization of product to test and supplementary tool/app that helps
> on that test?

It was intended to be a 'general VM/CT base thingy' that can be
instantiated and managed by the test runner, so either a PVE/PBS/PMG base
installation, or some auxiliary resource, e.g. a Debian VM with an
already-set-up LDAP server.

I'll see if I can find good terms with the newly added focus on bare-metal
testing / the decoupling between environment setup and test execution.

> Is the order of test-cases guaranteed by toml parsing, or how are intra-
> fixture dependencies ensured?

Good point. With rollbacks in between test cases it probably does not
matter much, but on 'real hardware' with no rollback this could definitely
be a concern.
A super simple thing that could just work fine is ordering test execution
by testcase-names, sorted alphabetically.
Ideally you'd write test cases that do not depend on each other in any
way, and *if* you ever find yourself in the situation where you *need*
some ordering, you could just encode the order in the test-case name by
adding an integer prefix - similar to how you would name config files in
/etc/sysctl.d/*, for instance.

--
- Lukas

* Re: [pve-devel] [RFC] towards automated integration testing
From: Thomas Lamprecht @ 2023-10-17 6:35 UTC
To: Lukas Wagner, Proxmox VE development discussion

On 16/10/2023 17:33, Lukas Wagner wrote:
>> Or is this some part that takes place in the test, i.e., a
>> generalization of product to test and supplementary tool/app that helps
>> on that test?
>
> It was intended to be a 'general VM/CT base thingy' that can be
> instantiated and managed by the test runner, so either a PVE/PBS/PMG
> base installation, or some auxiliary resource, e.g. a Debian VM with
> an already-set-up LDAP server.
>
> I'll see if I can find good terms with the newly added focus on
> bare-metal testing / the decoupling between environment setup and test
> execution.

Hmm, yeah, OK, having some additional info on top of "template", like e.g.
"system-template" or "app-template", could already be slightly better
then.

While these are details, they are IMO still important for the overall
future direction: I'd possibly split "restore" into "source-type" and
"source", where the "source-type" can be e.g. "disk-image" for a qcow2 or
the like to work on directly, or "backup-image" for your backup restore
process, or some type for bootstrap tools like debootstrap or the
VM-specific vmdb2.

Also, having re-use be configurable, i.e., whether the app-template
instance is destroyed after some test run is done. For that, write some
simple info mapping instantiated templates to other identifiers (VMID, IP,
...) to e.g. /var/cache/ (or some XDG_ directory, to also cater to any
users running this as non-root).

Again, this can be classified as details, but it is IMO important for the
direction this is going, and not too much work, so it should at least be
on the radar.

>> Is the order of test-cases guaranteed by toml parsing, or how are intra-
>> fixture dependencies ensured?
>
> Good point. With rollbacks in between test cases it probably does not
> matter much, but on 'real hardware' with no rollback this could
> definitely be a concern.
> A super simple thing that could just work fine is ordering test
> execution by testcase-names, sorted alphabetically. Ideally you'd write
> test cases that do not depend on each other in any way, and *if* you
> ever find yourself in the situation where you *need* some ordering, you
> could just encode the order in the test-case name by adding an integer
> prefix - similar to how you would name config files in /etc/sysctl.d/*,
> for instance.

While it can be OK to leave that for later, encoding such things in names
is IMO brittle and hard to manage with more than a handful of tests, and
we hopefully get lots more ;-)

From the top of my head, I'd rather do some attribute-based dependency
annotation, so that single tests or a whole fixture can depend on other
single tests or on a whole fixture.

* Re: [pve-devel] [RFC] towards automated integration testing
From: Lukas Wagner @ 2023-10-17 12:33 UTC
To: Thomas Lamprecht, Proxmox VE development discussion

On 10/17/23 08:35, Thomas Lamprecht wrote:
>>> Is the order of test-cases guaranteed by toml parsing, or how are
>>> intra-fixture dependencies ensured?
>>
>> Good point. With rollbacks in between test cases it probably does not
>> matter much, but on 'real hardware' with no rollback this could
>> definitely be a concern.
>> A super simple thing that could just work fine is ordering test
>> execution by testcase-names, sorted alphabetically. Ideally you'd write
>> test cases that do not depend on each other in any way, and *if* you
>> ever find yourself in the situation where you *need* some ordering, you
>> could just encode the order in the test-case name by adding an integer
>> prefix - similar to how you would name config files in /etc/sysctl.d/*,
>> for instance.
>
> While it can be OK to leave that for later, encoding such things
> in names is IMO brittle and hard to manage with more than a handful
> of tests, and we hopefully get lots more ;-)
>
> From the top of my head, I'd rather do some attribute-based dependency
> annotation, so that single tests or a whole fixture can depend on other
> single tests or on a whole fixture.

The more thought I spend on it, the more I believe that inter-testcase
dependencies should be avoided as much as possible. In unit testing,
(hidden) dependencies between tests are in my experience the no. 1 cause
of flaky tests, and I see no reason why this would not also apply to
end-to-end integration testing.

I'd suggest to only allow test cases to depend on fixtures. The fixtures
themselves could have setup/teardown hooks that allow setting up and
cleaning up a test scenario. If needed, we could also have something like
'fixture inheritance', where a fixture can 'extend' another, supplying
additional setup/teardown.
Example: the 'outermost' or 'parent' fixture might define that we want a
'basic PVE installation' with the latest .debs deployed, while another
fixture that inherits from that one might set up a storage of a certain
type, useful for all tests that require that specific type of storage.

On the other hand, instead of inheritance, a 'role/trait'-based system
might also work (composition >>> inheritance, after all) - and maybe that
also aligns better with the 'properties' mentioned in your other mail (I
mean this here: "ostype=win*", "memory>=10G").

This is essentially a very similar pattern as in numerous other testing
frameworks (xUnit, pytest, etc.); I think it makes sense to build upon
this battle-proven approach.

Regarding execution order, I'd now even suggest the polar opposite of my
prior idea. Instead of enforcing some execution order, we could also
actively shuffle the execution order from run to run, at least for tests
using the same fixture. The seed used for the RNG should be put into the
test report and could also be provided via a flag to the test runner, in
case we need to repeat a specific test sequence. In that way, the runner
would actively help us to hunt down hidden inter-TC dependencies, making
our test suite hopefully less brittle and more robust in the long term.

Anyway, lots of details to figure out. Thanks again for your input.

--
- Lukas

* Re: [pve-devel] [RFC] towards automated integration testing
From: Thomas Lamprecht @ 2023-10-17 16:28 UTC
To: Lukas Wagner, Proxmox VE development discussion

On 17/10/2023 14:33, Lukas Wagner wrote:
> On 10/17/23 08:35, Thomas Lamprecht wrote:
>> From the top of my head, I'd rather do some attribute-based dependency
>> annotation, so that single tests or a whole fixture can depend on other
>> single tests or on a whole fixture.
>
> The more thought I spend on it, the more I believe that inter-testcase
> dependencies should be avoided as much as possible. In unit testing,
> (hidden)

We don't plan unit testing here though, and the dependencies I proposed
are the contrary of hidden, rather explicitly annotated ones.

> dependencies between tests are in my experience the no. 1 cause of
> flaky tests, and I see no reason why this would not also apply to
> end-to-end integration testing.

Any source on that being the no. 1 source of flaky tests? IMO that should
not make any difference, in the end you just allow better reuse through
composition of other tests (e.g., migration builds upon the clustering
*setup*, not its tests; if I just want to run migration, I can do the
clustering setup without executing its tests).

Not providing that could also mean that one has to move all logic into the
test-script, resulting in a single test per "fixture", reducing the
granularity and parallelism of some running tests.

I also think that

> I'd suggest to only allow test cases to depend on fixtures. The fixtures
> themselves could have setup/teardown hooks that allow setting up and
> cleaning up a test scenario. If needed, we could also have something
> like 'fixture inheritance', where a fixture can 'extend' another,
> supplying additional setup/teardown.
> Example: the 'outermost' or 'parent' fixture might define that we
> want a 'basic PVE installation' with the latest .debs deployed,
> while another fixture that inherits from that one might set up a
> storage of a certain type, useful for all tests that require that
> specific type of storage.

Maybe our disagreement stems mostly from different design pictures in our
heads. I am probably a bit less fixed (heh) on the fixtures, or at least
the naming of that term, and might use "test system", or "intra-test
system", where for your design plan "fixture" would be the better word.

> On the other hand, instead of inheritance, a 'role/trait'-based system
> might also work (composition >>> inheritance, after all) - and
> maybe that also aligns better with the 'properties' mentioned in
> your other mail (I mean this here: "ostype=win*", "memory>=10G").
>
> This is essentially a very similar pattern as in numerous other testing
> frameworks (xUnit, pytest, etc.); I think it makes sense to
> build upon this battle-proven approach.

Those are all unit testing tools though, which we use already in the
sources, and IIRC those do not really provide what we need here. While
starting out simple(r) and avoiding too much complexity certainly has its
merits, I don't think we should try to draw/align too many parallels with
those tools here for us.

> Regarding execution order, I'd now even suggest the polar opposite of my
> prior idea. Instead of enforcing some execution order, we could also
> actively shuffle the execution order from run to run, at least for tests
> using the same fixture.
> The seed used for the RNG should be put into the test report and could
> also be provided via a flag to the test runner, in case we need to
> repeat a specific test sequence.

Hmm, this also has a chance to make tests flaky and get a bit annoying,
like Perl's hash scrambling, but it's not a bad idea. I'd just not do that
by default on the "armed" test system that builds on package/git/patch
updates, but possibly in addition, with reporting turned off, like the
double tests for idempotency-checking I wrote about in my previous
message.

> In that way, the runner would actively help us to hunt down
> hidden inter-TC dependencies, making our test suite hopefully less
> brittle and more robust in the long term.

Agreed, but as mentioned above, I'd not enable it by default on the
dev-facing automated systems, but possibly for manual runs from devs and a
separate "test-test-system" ^^

In summary, the most important point for me is a test system that is
decoupled from the automation system that can manage it, ideally such that
I can decide relatively flexibly on manual runs. IMO that should not be
too much work, and it guarantees clean-cut APIs from which future
development, or integration, surely will benefit too.

The rest is possibly hard to determine clearly at this stage, as it's easy
(at least for me) to get lost in different understandings of terms and
design perception, but hard to convey those very clearly for "pipe
dreams". So at this stage I'll hold off on adding more discussion churn
until there's something more concrete that I can grasp on my terms
(through reading/writing code), but that should not deter others from
still giving input at this stage.

Thanks for your work on this.

- Thomas

* Re: [pve-devel] [RFC] towards automated integration testing
From: Lukas Wagner @ 2023-10-18 8:43 UTC
To: Thomas Lamprecht, Proxmox VE development discussion

On 10/17/23 18:28, Thomas Lamprecht wrote:
> On 17/10/2023 14:33, Lukas Wagner wrote:
>> On 10/17/23 08:35, Thomas Lamprecht wrote:
>>> From the top of my head, I'd rather do some attribute-based dependency
>>> annotation, so that single tests or a whole fixture can depend on
>>> other single tests or on a whole fixture.
>>
>> The more thought I spend on it, the more I believe that inter-testcase
>> dependencies should be avoided as much as possible. In unit testing,
>> (hidden)
>
> We don't plan unit testing here though, and the dependencies I proposed
> are the contrary of hidden, rather explicitly annotated ones.
>
>> dependencies between tests are in my experience the no. 1 cause of
>> flaky tests, and I see no reason why this would not also apply to
>> end-to-end integration testing.
>
> Any source on that being the no. 1 source of flaky tests? IMO that
> should not make any difference, in the end you just allow better

Of course I don't have bullet-proof evidence for the 'no. 1' claim, it's
just my personal experience, which comes partly from a former job (where I
was coincidentally also responsible for setting up automated testing ;) -
there it was for a firmware project), and partly from the work I did for
my master's thesis (which was also in the broader area of software
testing).

I would say it's just the consequence of having multiple test cases
manipulating a shared, stateful entity, be it directly or indirectly via
side effects. Things get even more difficult and messy if concurrent test
execution enters the picture ;)

> reuse through composition of other tests (e.g., migration builds upon
> the clustering *setup*, not its tests; if I just want to run migration,
> I can do the clustering setup without executing its tests).
>
> Not providing that could also mean that one has to move all logic into
> the test-script, resulting in a single test per "fixture", reducing the
> granularity and parallelism of some running tests.
>
> I also think that
>
>> I'd suggest to only allow test cases to depend on fixtures. The
>> fixtures themselves could have setup/teardown hooks that allow setting
>> up and cleaning up a test scenario. If needed, we could also have
>> something like 'fixture inheritance', where a fixture can 'extend'
>> another, supplying additional setup/teardown.
>> Example: the 'outermost' or 'parent' fixture might define that we
>> want a 'basic PVE installation' with the latest .debs deployed,
>> while another fixture that inherits from that one might set up a
>> storage of a certain type, useful for all tests that require that
>> specific type of storage.
>
> Maybe our disagreement stems mostly from different design pictures in
> our heads. I am probably a bit less fixed (heh) on the fixtures, or at
> least the naming of that term, and might use "test system", or
> "intra-test system", where for your design plan "fixture" would be the
> better word.

I think it's mostly a terminology problem. In my previous definition of
'fixture' I was maybe too fixated (heh) on it being 'the test
infrastructure/VMs that must be set up/instantiated'.
Maybe it helps to think about it more generally as 'common setup/cleanup
steps for a set of test cases', which *might* include setting up test
infra (although I have not figured out a good way how that would be
modeled with the desired decoupling between test runner and
test-VM-setup-thingy).

>> On the other hand, instead of inheritance, a 'role/trait'-based system
>> might also work (composition >>> inheritance, after all) - and maybe
>> that also aligns better with the 'properties' mentioned in your other
>> mail (I mean this here: "ostype=win*", "memory>=10G").
>>
>> This is essentially a very similar pattern as in numerous other testing
>> frameworks (xUnit, pytest, etc.); I think it makes sense to build upon
>> this battle-proven approach.
>
> Those are all unit testing tools though, which we use already in the
> sources, and IIRC those do not really provide what we need here. While
> starting out simple(r) and avoiding too much complexity certainly has
> its merits, I don't think we should try to draw/align too many parallels
> with those tools here for us.
>
> In summary, the most important point for me is a test system that is
> decoupled from the automation system that can manage it, ideally such
> that I can decide relatively flexibly on manual runs. IMO that should
> not be too much work, and it guarantees clean-cut APIs from which future
> development, or integration, surely will benefit too.
>
> The rest is possibly hard to determine clearly at this stage, as it's
> easy (at least for me) to get lost in different understandings of terms
> and design perception, but hard to convey those very clearly for "pipe
> dreams". So at this stage I'll hold off on adding more discussion churn
> until there's something more concrete that I can grasp on my terms
> (through reading/writing code), but that should not deter others from
> still giving input at this stage.

Agreed. I think we agree on the most important requirements/aspects of
this project, and that's a good foundation for my upcoming efforts.

At this point, the best move forward for me is to start experimenting with
some ideas and start with the actual implementation. When I have something
concrete to show, be it a prototype or some sort of minimum viable
product, it will be much easier to discuss any further details and design
aspects.

Thanks!

--
- Lukas