Designing Scalable Tests for Feature Flags

Feature flags introduce a layer of dynamic behavior in applications, enabling toggled changes without redeployment. While they empower development and experimentation, they also bring unique challenges to testing. Designing tests around feature flags requires recognizing that one size does not fit all: different scenarios demand different strategies. In this post, we explore a range of approaches to help maintain adaptable and scalable test suites.


Context

Working with feature flags often starts with creating separate tests for each flag state. While this approach works, exploring more streamlined and scalable strategies can lead to better results. Feature flags add layers of complexity to workflows, making adaptable and creative testing strategies essential. By leveraging flexible approaches, we can maintain clarity, minimize redundancy, and manage growing complexity while ensuring high-quality tests.

This blog post explores strategies like parameterized tests, feature logic handlers, subclassing, and leveraging pytest fixtures to optimize test execution and design for feature flags. These strategies are adaptable to various workflows and scenarios, ensuring they are applied effectively.

To illustrate these strategies, I have used examples from Qxf2’s ACC Model App, an in-house React application designed to manage Attributes, Components, and Capabilities (ACC) in software projects. Some strategies are drawn from the tests I developed for feature flags implemented in this app, using Qxf2’s Page Object Model framework.


Strategies

In this section, I will outline the various strategies I explored and applied to effectively test feature flags for different scenarios. While the example tests demonstrate key concepts, they are simplified for clarity and do not represent complete, production-ready code. Additionally, these snippets don’t reflect Qxf2’s exact coding practices but are tailored to illustrate the strategies discussed.

1. Separate Tests for Each Flag State

Creating separate tests for each feature flag state is a reliable strategy for ensuring clarity and thoroughness. Isolating the application’s behavior when a flag is enabled or disabled reduces ambiguity. This approach works well when feature flags introduce significant changes to the UI or workflows.

Example: Home Page Redesign
In the ACC Model app, the home page displays two authentication buttons, Register and Login, in two locations: the middle of the page and the navigation bar. The home page redesign introduced a feature flag to control the visibility of these buttons.

Feature Flag disabled: Users directly interact with the buttons on the home page or navigation bar.
Feature Flag enabled: The redesign removes the buttons from their original positions and introduces a personalized greeting section on the home page. If the system detects a returning user, it dynamically displays their name in the greeting. New users see the ‘Get Started’ link in the navigation bar.

Below are test snippets that illustrate how to validate the different feature flag states.
Each method used in the tests is a page-object method that encapsulates the actions and validations required for that step. These methods interact with the page elements to perform checks and log the results accordingly.
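For instance, one of the page-object methods behind these checks might look like the following simplified sketch. The locator names and helper methods here are assumptions modeled on the framework conventions shown later in this post, not the app's actual code.

@Wrapit._exceptionHandler
def verify_auth_buttons_presence(self):
    "Check that both the 'Register' and 'Login' buttons are visible."
    # locators.register_button and locators.login_button are hypothetical names
    register_present = self.check_element_present(locators.register_button)
    login_present = self.check_element_present(locators.login_button)
    result_flag = register_present and login_present
    self.conditional_write(
        result_flag,
        positive="Both 'Register' and 'Login' buttons are visible.",
        negative="One or both of the 'Register' and 'Login' buttons are missing.",
        level='debug'
    )
    return result_flag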

Test for Feature Flag OFF
def test_acc_model_home_page(test_obj):
    "Validate the home page of the ACC Model application."
    try:
        test_obj = PageFactory.get_page_object("acc model home page", base_url=test_obj.base_url)
 
        # Step 1: Verify the presence of 'Register' and 'Login' buttons
        result_flag = test_obj.verify_auth_buttons_presence()
        test_obj.log_result(result_flag,
                            positive="'Register' and 'Login' buttons are present as expected.",
                            negative="'Register' or 'Login' buttons are missing unexpectedly.")
 
        # Step 2: Verify the 'Register' functionality
        result_flag = test_obj.verify_register_flow()
        test_obj.log_result(result_flag,
                            positive="Successfully initiated the 'Register' flow.",
                            negative="Failed to initiate the 'Register' flow.")
 
        # Step 3: Verify the 'Login' functionality
        result_flag = test_obj.verify_login_flow()
        test_obj.log_result(result_flag,
                            positive="Successfully initiated the 'Login' flow.",
                            negative="Failed to initiate the 'Login' flow.")
 
        # Output the results.
        test_obj.write_test_summary()
 
        ...
Test for Feature Flag ON
def test_redesigned_acc_model_home_page(test_obj, setup_flag):
    "Validate the redesigned home page of the ACC Model application with feature flag enabled."
    try:
        test_obj = PageFactory.get_page_object("acc model home page", base_url=test_obj.base_url)
 
        # Step 1: Verify the absence of 'Register' and 'Login' buttons (as they are removed in the redesign)
        result_flag = test_obj.verify_auth_buttons_absence()
        test_obj.log_result(result_flag,
                            positive="'Register' and 'Login' buttons are absent as expected.",
                            negative="'Register' or 'Login' buttons are unexpectedly present.")
 
        # Step 2: Verify the personalized greeting is displayed for a returning user
        result_flag = test_obj.verify_personalized_greeting()
        test_obj.log_result(result_flag,
                            positive="Personalized greeting is displayed for returning user.",
                            negative="Personalized greeting is missing for returning user.")
 
        # Step 3: Verify the "Get Started" link is present for new users
        result_flag = test_obj.verify_get_started_link_presence()
        test_obj.log_result(result_flag,
                            positive="'Get Started' link is present for new users as expected.",
                            negative="'Get Started' link is missing for new users.")
 
        # Step 4: Check for other relevant changes specific to the redesign
        result_flag = test_obj.verify_navigation_bar_update()
        test_obj.log_result(result_flag,
                            positive="Navigation bar updates are correctly reflected.",
                            negative="Navigation bar updates are not as expected.")
 
        # Output the results.
        test_obj.write_test_summary()
 
        ...

This is generally how tests are structured to validate different feature flag states. In practice, we organize these tests into separate functions within a module, each targeting specific scenarios. This modular approach helps maintain clarity and simplifies maintenance. However, relying solely on this strategy may not always be the best option. It can lead to redundant tests and reduced efficiency, especially when changes are minor. Testers should evaluate whether splitting tests adds value or just increases maintenance effort.

2. Parameterized Tests

So far, we have explored separate tests for different feature flag states. Another effective strategy is to use parameterized tests, which allow us to toggle flag states within a single test. This approach is particularly useful when the underlying workflow stays consistent even though the UI changes significantly. When a redesign primarily changes how web elements like buttons and other components are displayed, we can conditionally select locators and elements based on the flag state.

Example: Edit User Redesign on the Manage Users Page
In the ACC Model app, one of the pages lists the Users of the app with Edit and Delete buttons for each row. A feature flag controls the redesign of the Edit User functionality.

Feature Flag disabled: Clicking Edit opens a modal pop-up, where the user can update the email address and then save the changes.
Feature Flag enabled: Clicking Edit now makes the row inline editable, allowing the user to update the email address directly within the row and save the changes.

The overall flow remains the same: navigate to the page, edit a user’s email, and save changes. However, the UI interactions differ based on the flag state, such as how editing is initiated and saved.

Test Code

Since the redesign mostly changes which web elements appear, we can dynamically select locators in the page objects based on the flag state.

class ACCModelUsersPage(Web_App_Helper):
    "Page object for the Users page of the ACC Model Application"
 
    # common locators
    username_login = locators.username_login
    password_login = locators.password_login
    login_button = locators.login_button
 
    def set_locators(self, feature_flag_state):
        """
        Set locators dynamically based on feature flag state.
        """
        if feature_flag_state:
            self.edit_button = locators.edit_button_inline
            self.save_button = locators.save_button_inline
            self.email_field = locators.email_field_inline
        else:
            self.edit_button = locators.edit_button
            self.save_button = locators.save_button_edit_form
            self.email_field = locators.email_edit_form

Importantly, while the locators differ, the actual page object methods remain common, minimizing code duplication.

@Wrapit._exceptionHandler
def click_edit_button(self):
    """
    Click the Edit button on the user row.
    """
    result_flag = self.click_element(self.edit_button)
    self.conditional_write(
        result_flag,
        positive="Clicked the Edit button successfully.",
        negative="Failed to click the Edit button.",
        level='debug'
    )
    return result_flag
 
...

We then write the tests to validate the core functionality without duplication. Using pytest.mark.parametrize, we can dynamically toggle the flag state and test both scenarios.

@pytest.mark.parametrize("feature_flag_state", [False, True])
@pytest.mark.GUI
def test_manage_users_page(test_obj, feature_flag_state):
    "Test the manage users page with or without the feature flag"
    try:
        test_obj = PageFactory.get_page_object("manage users page", base_url=test_obj.base_url)
 
        test_obj.set_locators(feature_flag_state)
 
        result_flag = test_obj.login(conf.name, conf.password)
        test_obj.log_result(result_flag,
                            positive="Logged in successfully.",
                            negative="Failed to login."
                            )
 
        result_flag = test_obj.click_on_users_link()
        test_obj.log_result(
            result_flag,
            positive="Successfully navigated to the Manage Users page.",
            negative="Failed to navigate to the Manage Users page.",
        )
 
        result_flag = test_obj.click_edit_button()
        test_obj.log_result(result_flag,
                            positive="Clicked on the Edit button.",
                            negative="Failed to click on the Edit button."
                            )
 
        result_flag = test_obj.update_email()
        test_obj.log_result(result_flag,
                            positive="Updated the email in the Edit form.",
                            negative="Failed to update the email in the Edit form."
                            )
 
        result_flag = test_obj.click_save_button()
        test_obj.log_result(result_flag,
                            positive="Clicked on the Save button.",
                            negative="Failed to click on the Save button."
                            )
 
        test_obj.write_test_summary()
 
        ...

3. Subclassing Page Objects

When feature flags result in different UI versions or functionalities, it’s important to structure tests in a way that can easily adapt to these changes. A good approach lets us reuse shared functionality while isolating feature-specific differences. Subclassing page objects offers a practical solution to this, striking a balance between flexibility and maintainability.

Example: Differentiated UI Features on the Manage Users Page
Extending the earlier use case of the Manage Users page redesign, this scenario highlights specific functionalities that differ between the old and new UIs.

Feature Flag disabled: Supports the sorting of users
Feature Flag enabled: Replaces sorting with a filtering feature

Here, the core workflow (e.g., navigating to the page, editing user details) remains the same, but feature-specific functionality like sorting or filtering requires distinct handling. We can address this by subclassing UI-specific functionality. A base class handles the shared functionality, while subclasses manage the UI-specific features.

Test Code

Base Class: Contains shared functionality (e.g., login, navigation, common actions).

class ACCModelUsersPage(Web_App_Helper):
    "Page object for the Users page of the ACC Model Application"
 
    def login(self, username, password):
        # Logic for entering username, password, and clicking login
        return result_flag

Subclass for Old UI: Handles sorting functionality.

class ACCModelUsersOldUIPage(ACCModelUsersPage):
    """
    Page object for the old UI version of the Users page in the ACC Model Application.
    """
    def sort_users(self):
        # Logic for sorting users
        return result_flag

Subclass for New UI: Handles filtering functionality.

class ACCModelUsersNewUIPage(ACCModelUsersPage):
    """
    Page object for the new UI version of the Users page in the ACC Model Application.
    """
    def filter_users(self):
        # Logic for filtering users
        return result_flag

Dynamically Mapping Page Objects

After defining the necessary subclasses, we use PageFactory to dynamically select and assign the correct page object based on the feature flag state.

class PageFactory():
    "PageFactory uses the factory design pattern."
    @staticmethod
    def get_page_object(page_name, feature_flag_state=None, base_url=url_conf.ui_base_url):
        "Return the appropriate page object based on page_name"
        test_obj = None
        page_name = page_name.lower()
        if page_name in ["zero","zero page","agent zero"]:
            from page_objects.zero_page import Zero_Page
            test_obj = Zero_Page(base_url=base_url)
        elif page_name == "manage users page":
            if feature_flag_state:
                from page_objects.examples.acc_model_app.users_page import ACCModelUsersNewUIPage
                test_obj = ACCModelUsersNewUIPage(base_url)
            else:
                from page_objects.examples.acc_model_app.users_page import ACCModelUsersOldUIPage
                test_obj = ACCModelUsersOldUIPage(base_url)
        return test_obj

In the base class, we define a method to handle feature-specific actions. Depending on the feature flag state, this method delegates to the appropriate functionality such as filtering users or sorting users (for this example). The core logic remains the same across feature variations, but the specifics are isolated in the subclasses.

class ACCModelUsersPage(Web_App_Helper):
    "Page object for the Users page of the ACC Model Application"
 
    def set_feature_flag(self, feature_flag_state):
        "Set the feature flag state"
        self.feature_flag_state = feature_flag_state
 
    def perform_feature_specific_action(self):
        """
        Decide what action to perform based on the feature flag state.
        """
        if self.feature_flag_state:
            if hasattr(self, 'filter_users'):
                return self.filter_users()
            else:
                raise NotImplementedError('Subclasses must define filter_users method')
        else:
            if hasattr(self, 'sort_users'):
                return self.sort_users()
            else:
                raise NotImplementedError('Subclasses must define sort_users method')

In the test, we pass the feature_flag_state to dynamically load the correct page object and perform the relevant actions. The test code stays focused on executing the common logic, while the feature-specific actions are handled based on the feature flag.

@pytest.mark.parametrize("feature_flag_state", [False, True])
@pytest.mark.GUI
def test_manage_users_page(test_obj, feature_flag_state):
    "Test the manage users page with or without the feature flag"
    try:
        # Create a test object.
        test_obj = PageFactory.get_page_object("manage users page", feature_flag_state, base_url=test_obj.base_url)
        test_obj.set_feature_flag(feature_flag_state)
        # Locators specific to each UI version are defined within the subclasses
 
        # Perform common actions
        result_flag = test_obj.login(conf.name, conf.password)
        test_obj.log_result(result_flag,
                            positive="Logged in successfully.",
                            negative="Failed to login."
                            )
 
        # Other common actions
 
        # Execute feature-specific methods
        result_flag = test_obj.perform_feature_specific_action()
        test_obj.log_result(result_flag,
                            positive="Successfully performed feature specific actions.",
                            negative="Failed to perform feature specific actions."
                            )
 
        test_obj.write_test_summary()
 
        ...

This approach allows us to reuse common functionality, such as login and navigation, while isolating feature-specific actions like sorting or filtering.

Both subclassing and parameterized tests effectively manage UI changes driven by feature flags, particularly when core functionality remains consistent despite UI variations. By isolating feature-specific logic, these approaches preserve shared functionality across versions, keeping tests organized and reusable.

However, as feature flags grow, managing subclasses and variations becomes complex, potentially leading to tightly coupled test logic. To address this, use external configuration files to dynamically map flags to behaviors, minimizing duplication and simplifying maintenance. For multi-variant flags, implement dynamic method selection in the base class to handle specific actions based on flag states. Proper test suite design is essential—breaking tests into smaller, reusable modules ensures adaptability to different flag states while keeping tests clean, scalable, and maintainable, even as complexity increases.
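As a rough sketch of that idea, the base class can look up the right action for each flag from a mapping (which could itself live in an external JSON or YAML config file) instead of hard-coded if/else branches. The flag name and method names below are illustrative, and the sketch assumes feature_flag_state was set via set_feature_flag as shown earlier.

# Illustrative mapping of flag variants to page-object method names.
# In practice this mapping could be loaded from an external config file.
FLAG_ACTION_MAP = {
    "users_page_redesign": {
        True: "filter_users",
        False: "sort_users",
    }
}

class ACCModelUsersPage(Web_App_Helper):
    "Page object for the Users page of the ACC Model Application"

    def perform_feature_specific_action(self, flag_name="users_page_redesign"):
        "Dispatch to the method mapped to the current flag state."
        method_name = FLAG_ACTION_MAP[flag_name][self.feature_flag_state]
        action = getattr(self, method_name, None)
        if action is None:
            raise NotImplementedError(f"Subclasses must define {method_name}")
        return action()

With this structure, supporting a new flag or a multi-variant flag means adding an entry to the mapping rather than another branch in the page object.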

Managing Feature Flag States with Fixtures

So far, we have explored strategies for designing tests around feature flags. Managing feature flags depends on how they are implemented. In staging or test environments, where testers can control feature flags through an endpoint, REST API, or similar mechanism, fixtures provide an effective solution. They dynamically configure the test environment based on the flag state, centralizing setup and teardown logic. By toggling the flag within the fixture, testers can precisely control the application’s behavior for each test case.

Below is an example of designing fixtures to streamline test management when feature flags are programmatically accessible. If your flags are managed through a configuration file, the implementation would differ. This example focuses on demonstrating the efficiency of fixtures, though the specifics depend on your application’s feature flag setup.
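For instance, if the application reads its flags from a JSON file at startup, a fixture might rewrite that file before each test instead of calling an API. This is a hypothetical sketch; the file path and format are assumptions.

import json
import pytest

FLAG_CONFIG_PATH = "config/feature_flags.json"  # hypothetical location

@pytest.fixture(scope="function")
def set_flag_in_config(request):
    "Write the desired flag state into the app's flag config file."
    # Used with indirect parameterization so request.param carries the state
    with open(FLAG_CONFIG_PATH, "w") as config_file:
        json.dump({"hideAuthButtons": request.param}, config_file)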

1. Initializing the Flag Manager

We define a fixture to initialize the flag manager, which could be self-hosted or a third-party service like LaunchDarkly. In this example, I have used LaunchDarklyFlagManager.

@pytest.fixture(scope="module")
def flag_manager(): 
    """Initialize the LaunchDarklyFlagManager instance for the test module."""
    return LaunchDarklyFlagManager()

This initializes an instance of the flag manager at the module level, allowing it to be reused across all tests in the module. The LaunchDarklyFlagManager class would include methods for setting and retrieving feature flag values.
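For illustration, a minimal flag manager might wrap LaunchDarkly’s REST API roughly as below. Treat this as a sketch: verify the endpoint and payload against the LaunchDarkly documentation, and note that the project key, environment key, and status-code check are assumptions.

import requests

class LaunchDarklyFlagManager:
    "Thin wrapper around LaunchDarkly's REST API for toggling flags."
    # 'my-project' is a placeholder project key
    BASE_URL = "https://app.launchdarkly.com/api/v2/flags/my-project"

    def __init__(self, api_key, environment="test"):
        self.environment = environment
        self.headers = {
            "Authorization": api_key,
            "Content-Type": "application/json; domain-model=launchdarkly.semanticpatch",
        }

    def set_flag(self, flag_key, flag_state):
        "Turn a flag on or off using a semantic patch instruction."
        instruction = "turnFlagOn" if flag_state == "ON" else "turnFlagOff"
        payload = {
            "environmentKey": self.environment,
            "instructions": [{"kind": instruction}],
        }
        response = requests.patch(f"{self.BASE_URL}/{flag_key}",
                                  headers=self.headers, json=payload)
        return response.status_code == 200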

2. Setting the Initial State of the Flag

We then define a fixture to ensure that every test starts with a predefined flag state, regardless of the current flag value. This avoids test flakiness caused by leftover states from other tests.

@pytest.fixture(scope="function")
def set_flag_initial_state(flag_manager): 
    """Set feature flags before running a test."""
    flag_manager.set_flag("hideAuthButtons", "OFF")

The scope of this fixture is set to function so it resets the flag state before each test. This ensures isolation, as each test starts with the same known state.

3. Toggling the Feature Flag

Next, we define a fixture to toggle the feature flag dynamically. This fixture accepts the flag state as a parameter using request.param.

@pytest.fixture(scope="function")
def setup_flag(request, flag_manager, set_flag_initial_state):
    """Dynamically set feature flag states for a test."""
    flag_state = request.param
    flag_manager.set_flag("hideAuthButtons", flag_state)

Here, setup_flag depends on set_flag_initial_state, ensuring the flag always starts from the initial state before being toggled. The request.param allows us to pass the desired flag state from the test, making it easy to validate both states of the feature flag.
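If you also want to restore the flag after each test rather than only before it, a yield-based variant of the same fixture keeps the teardown in one place:

@pytest.fixture(scope="function")
def setup_flag(request, flag_manager, set_flag_initial_state):
    """Set the flag for a test, then restore the default state afterwards."""
    flag_manager.set_flag("hideAuthButtons", request.param)
    yield request.param
    # Teardown: reset so the next test starts from a known state
    flag_manager.set_flag("hideAuthButtons", "OFF")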

4. Writing the Parameterized Test

Finally, we use pytest.mark.parametrize to define test cases for different flag states.

@pytest.mark.parametrize("setup_flag", ["OFF", "ON"], indirect=True)
@pytest.mark.GUI
def test_manage_users_page(test_obj, setup_flag):
    """Test the manage users page with different feature flag states."""

Here, indirect parameterization passes each value ("OFF" and "ON") to the setup_flag fixture, where it arrives as request.param, enabling the test to validate both the enabled and disabled states of the feature flag.

If testers don’t have direct access to control feature flags, collaboration with the development team becomes essential. Developers can help create a test environment with predefined flag states or introduce a mechanism, such as environment-specific configurations, that allows testers to validate different scenarios. Alternatively, testers can simulate flag states by mocking feature flag behavior within the tests themselves, provided the application architecture supports such flexibility. This ensures that the testing process remains thorough and reliable, even without direct flag management access.
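As a sketch of that last option, pytest’s monkeypatch fixture can stand in for a real flag service, assuming the application exposes a flag-lookup helper that tests can patch. The module and function names below are hypothetical.

import pytest

@pytest.mark.parametrize("mocked_flag_state", [False, True])
def test_home_page_with_mocked_flag(monkeypatch, test_obj, mocked_flag_state):
    "Simulate both flag states without a real flag service."
    # 'app.feature_flags.is_enabled' is a hypothetical lookup helper
    monkeypatch.setattr("app.feature_flags.is_enabled",
                        lambda flag_name: mocked_flag_state)
    # ... proceed with the usual page-object checks for this flag state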


Handling Retired Feature Flags

When a feature flag is retired, testers need to update the test cases and clean up code related to the flag. However, it’s essential to collaborate closely with developers during this process. Testers must understand the timeline for flag retirement so they can update tests before the flag is fully retired. Without this information, tests may fail as flag-dependent code paths are merged or removed.

By working with developers, testers can ensure they are prepared for the transition, updating tests to align with the new, unified implementation. This collaboration minimizes disruptions and ensures the tests remain accurate and reliable throughout the process.


Conclusion

In summary, testing with feature flags can add complexity, but with the right strategies, it’s possible to manage it efficiently. By carefully selecting and adapting testing techniques, testers can ensure they maintain clear, reliable tests, even as the number of flags grows or becomes more dynamic. Collaboration with developers plays a key role, especially when managing more complex scenarios or ensuring flag-dependent behavior is properly handled. This combination of effective strategies and collaboration ensures that feature flags don’t compromise test quality, but rather enhance the flexibility of your testing process.


Take your testing to the next level with Qxf2

Qxf2 has been helping startups navigate complex testing challenges since 2013. Our deep expertise in testing strategies, including advanced topics like feature flags, ensures your releases are faster, safer, and more controlled. If you’re looking for a QA partner that understands the nuances of modern development workflows, explore our specialized QA services for startups.

