Disclaimer – code post, hence, English only.

In a previous post, I mentioned that page objects are not enough, and introduced the concept I call “flows” (I can’t take credit for the name – it was already in use when I first got to work). The idea, basically, is to add a layer between the tests and the page objects that is responsible for complex operations that are out of scope for a single page object.
However, flows have some limitations, and while they are better than using raw page objects, they can still be inconvenient, to the point where we said “There must be a better way to do this”.
The main problems we were seeing were: 
  • We used static flows, which meant that we were sending a ton of parameters to each flow. This suffers from readability and usability issues similar to those of telescoping constructors.
    Essentially, a call to a flow would look like this:

    Where the function signature is:

    public static checkout(WebDriver driver, Reporter reporter, String baseUrl, CardDetails cardDetails, SHIPPING_OPTIONS shippingOptions, String discountCoupon, Boolean shouldAbortPurchase){...}
  • Code duplication
    Every now and then we wanted to do “this flow, only change this little thing”, and this tiny change, in the middle of the flow, forced us to create another flow (or, when we were lazy, to add another parameter to the flow, which would be null in 95% of the calls, as in the previous code snippet).
  • Multiple functions doing the same business action. This is partially connected to the previous point: using the naïve version of flows, you end up with many flows that are similar in terms of business logic but different in implementation (for instance: “purchase a book and register” is similar to “purchase a book with a registered user” and “purchase a book without registering”).
  • Ugly tests.
    Since we had several flavors in our system, writing a test that will simply perform a single purchase looked like this:
    if (isCardSmsEnabled) {
        SMSFlows.checkout(driver, reporter, baseUrl, preferredPhoneNumber, otp, null);
    } else if (isPurchaseWithPassword) {
        NonRegisteredFlows.checkout(driver, reporter, baseUrl, cardDetails, null, null, false);
    }

    Sure, we could put this logic in a separate method, except that then we would have doubled the number of parameters that should be null (why would an SMS-enabled purchase need a password?)

  • Refactoring is painful.
    When we began, we made the mistake of sending username & password as strings. At some point it didn’t work anymore and we moved to sending a “User” object. Now I had to go over each and every flow and change it to support this new behavior – and it took me 3 days of knowing that I could have been doing something useful with my time instead of this partial refactoring. Why partial? Because I didn’t go and change all of the calls to the flows. If there were more than 5 calls to a flow with username and password, I just left them lying around, which I could do only by overloading the methods to support both kinds of call.
So we clearly needed to find a solution. We wanted this solution to be simple to use, flexible and as future-proof as we could get it. Each of these properties came from a pain we experienced:

  • Simple to use – This includes both not having a large number of parameters and not having to worry about the different flavors in our system. It should be “write once, run with all configurations”. The pain here was what is described in the 1st and 4th bullet above. 
  • Flexible – we should be able to change the “default” behavior in a test without too much of a fuss and without causing ripples that will affect other tests. The story behind this was the one that made us realize our flows solution was not good enough anymore. We had a new timeout feature: if a purchase was not completed within a certain time from its start, end the session and fail the purchase. Now, imagine the situation we were in: the flows didn’t have any notion of “wait”, and each flow passed through anywhere between 1 and 4 screens on which we wanted to wait for a while before submitting the page. In the flows world, the choice was between sending a complex “sleep in step X for Y seconds” parameter or duplicating the flows to create “TimeoutFlows” that would have a method for each of the waiting places. Either way – Yuck!
  • Future proof – The idea here is to avoid two kinds of problems. First, when the application changes in a place shared between multiple flows, we want to make the fix in one place only, and not in each flow. Second, we wanted to lower the cost of refactoring, even if we change a method signature.
The solution we came up with is simple to describe but complex to implement, as what we did was take the awful complexity that is part of our product’s business logic and hide it elsewhere.
It has the following parts:

  1. Commands – each step that we consider to be a single action (clicking “next”, filling a form, validating a value against the database, you name it) is encapsulated within a command. You can check the command design pattern in Wikipedia, but the general idea is that it doesn’t matter what lies beneath the surface, externally, there is only an “execute” method that is exposed. We cheated a bit and have two methods (“run” and “runCancel”), but the idea that every command exposes the same interface still stands.
    public class FillPasswordCommand implements ITestCommand {
        WebDriver driver;
        PasswordContext passwordContext;
        IStepResult result;

        public FillPasswordCommand(WebDriver driver, ITestContext context, IStepResult result) {
            this.driver = driver;
            // see explanation for this below: the combined context is cast
            // to the part of it that this command needs
            this.passwordContext = (PasswordContext) context;
            this.result = result;
        }

        public IStepResult run() {
            PasswordPage page = new PasswordPage(driver);
            // skipped some verifications to keep the example short
            result.addScreenShot(driver, "after filling password");
            return result;
        }
    }
  2. Context chameleon objects – This part is a bit odd, and the reasoning behind it is that we had the following conflicting requirements:
    1. The context should contain every bit of information that any command might need, now or in the future
    2. The context object shall not have too many methods (the idea is to utilize efficiently the IDE auto-complete functionality, which won’t be very helpful if you have over 100 methods)
    3. There will be only one context object that will be used to create multiple commands. 

    The solution we came up with was to have multiple context objects, and then combine them all into some sort of a Megazord (for those who have failed the age\culture test – a Megazord is the giant robot resulting from combining the Power Rangers’ personal robots together). The object can then be cast to represent any of the underlying objects. So we might have “IUserRegistrationContext”, “IDbConnectionData” and “IPurchaseContext” all bundled together. As the code itself is neither short nor self-explanatory, I won’t include a code sample, but the idea of what we did is as follows: for each type of context we wanted, we created an interface (the “I” at the beginning marks “interface”). Then, when we want to merge a couple of these, we create a Java proxy object that answers for all interfaces implemented by both contexts. The InvocationHandler just holds the two contexts and redirects the calls appropriately. This was also the first time I looked at a Java code example and did not understand what I was reading (see note 1 at the end of the post). All of the contexts are initialized in the setup method of our base test, and each specific test needs only to change the context values that matter to it (plus, the defaults match most of the use cases, so a test won’t need to change many parameters).

  3. Commands runner – This entity holds a chain of commands and is responsible for running them one after the other. If I want to change something (say, click “cancel” after the 3rd screen I see), the runner is the one responsible for doing that for me. I want to sleep 2 seconds between commands? The runner again. I want to add a specific command somewhere in between the existing commands? I will have to do this before calling the runner to execute the steps.
  4. The Flow Factory – Remember that I said that we hid all of the complexity in another place? Well, this is that other place, or at least most of it. This part returns a runner with the chain of commands built inside.
    Here we read the context objects, build new command instances in the right order and return them to the test. Since the logic it encapsulates is complex, we have broken it into ~5 different classes, just to keep things readable.
    How complex is it? Well, when we started, we created a decision chart. It now has some additional nodes that make it just a bit more fun. (It is redacted, since I don’t know how much I can share, so I left the interesting questions out, but the decision tree structure remained to illustrate the inherent difficulty).
    Using the factory, on the other hand, is really simple. Here’s an example:
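A hypothetical sketch of how commands, a runner and a factory could fit together, and of what a test's call to the factory might look like. Every class and method name below (TestCommand, NamedCommand, PurchaseContext, CommandRunner, FlowFactory, cancelAtStep and so on) is an assumption made for illustration, not the original code, and the commands only return log lines instead of driving a browser:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the command interface described above.
interface TestCommand {
    String run();        // performs the step and returns a short log line
    String runCancel();  // aborts the purchase at this step instead
}

// A trivial command that only records what it would have done.
class NamedCommand implements TestCommand {
    private final String name;
    NamedCommand(String name) { this.name = name; }
    public String run() { return "run " + name; }
    public String runCancel() { return "cancel at " + name; }
}

// Assumed shape of a context that the factory reads.
class PurchaseContext {
    enum Challenge { PASSWORD, SMS }
    Challenge challenge = Challenge.PASSWORD; // default that fits most tests
}

class CommandRunner {
    private final List<TestCommand> commands;
    private int cancelAt = -1; // index of the step to cancel at; -1 means never

    CommandRunner(List<TestCommand> commands) { this.commands = new ArrayList<>(commands); }

    // Lets a test splice a step into the chain before executing it.
    void insert(int index, TestCommand command) { commands.add(index, command); }

    void cancelAtStep(int index) { cancelAt = index; }

    List<String> execute() {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < commands.size(); i++) {
            if (i == cancelAt) {
                log.add(commands.get(i).runCancel());
                break; // nothing runs after a cancel
            }
            log.add(commands.get(i).run());
        }
        return log;
    }
}

class FlowFactory {
    // Reads the context and builds the chain in the right order.
    static CommandRunner purchase(PurchaseContext context) {
        List<TestCommand> chain = new ArrayList<>();
        chain.add(new NamedCommand("select item"));
        chain.add(new NamedCommand("fill card details"));
        switch (context.challenge) {
            case SMS:
                chain.add(new NamedCommand("fill one-time code"));
                break;
            case PASSWORD:
                chain.add(new NamedCommand("fill password"));
                break;
        }
        chain.add(new NamedCommand("confirm purchase"));
        return new CommandRunner(chain);
    }
}

class CommandChainDemo {
    public static void main(String[] args) {
        PurchaseContext context = new PurchaseContext();
        context.challenge = PurchaseContext.Challenge.SMS;

        CommandRunner runner = FlowFactory.purchase(context);
        runner.cancelAtStep(2); // abort on the third step
        runner.execute().forEach(System.out::println);
    }
}
```

In this sketch the test itself only touches the context and the runner; the decision about which steps make up the chain lives entirely in the factory, so supporting a new challenge type would mean adding one case to the switch.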


So, a short summary of the test-commands is that the tests are sending context to a factory in order to get a runner that will execute the required actions. And that’s it.
What did we gain from this construct?
Well, quite a bit:

  1. The test does not call a method with fifty parameters, out of which half are null. We could have gained most of that by using non-static flows, but I feel this works better also in this aspect. 
  2. All of our tests are now configuration oblivious (to the extent that the business logic does not change according to those configurations) – we don’t have to worry about getting the correct flow. 
  3. We have the ability to intervene in the middle of a chain without creating a new flow – so no code duplication. 
  4. Adding new behaviors is actually easier & faster – since every part of the chain creation is isolated, we don’t need to create the whole flow from scratch (or, as was common – copy, paste & edit), we can just add the needed code at the right point. For example, when we added a new challenge (we had password & SMS, we wanted to add another one), all that it took was to add the code that deals with the new screens to the switch statement dealing with the challenge type. And did I mention that all of our tests now supported this new challenge? This is really the point where I wanted to shout “presto!”
  5. Tests shrank to a third of their length. Not “by a third”, to a third. It also enables us to focus our attention on the important things being developed, instead of making sure our tests are compatible with the multitude of flavors our product has.

As you can see – there’s quite a lot of work to get to the point where the commands are working, and it might not be intuitive at first. It has some advantages over the flows implementation, but those advantages do not always outweigh the drawbacks of high initial cost. So, when to use what?
If your application has a small number of atomic actions (by that I mean “things a user would consider as a single action”), and they are strongly distinguished from one another – flows are probably OK for you. If, however, there are a lot of similar actions or they change rapidly – commands are probably better. Currently, we are considering a slightly different approach for dealing with situations where a user performs more than one action (“go and buy something” is one action, but “check user history, then unlock the user account and reset the password” are three separate actions). The concept of the commands will probably stay, but we are considering replacing the factory with a builder. The difference is that in a builder we could do something like
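A builder-based test might chain the three actions mentioned above roughly as follows. This is a hypothetical sketch: ActionChainBuilder and its method names are invented for illustration, and plain strings stand in for the command objects a real builder would collect:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder; each method appends one user-level action to the chain.
class ActionChainBuilder {
    private final List<String> steps = new ArrayList<>();

    ActionChainBuilder checkUserHistory() { steps.add("check user history"); return this; }
    ActionChainBuilder unlockAccount()    { steps.add("unlock account");     return this; }
    ActionChainBuilder resetPassword()    { steps.add("reset password");     return this; }

    // A fuller version would return a runner holding command objects;
    // strings keep the sketch short.
    List<String> build() { return new ArrayList<>(steps); }
}

class BuilderDemo {
    public static void main(String[] args) {
        List<String> chain = new ActionChainBuilder()
                .checkUserHistory()
                .unlockAccount()
                .resetPassword()
                .build();
        chain.forEach(System.out::println);
    }
}
```

The point of the design is that the test, rather than the factory, decides which actions appear and in what order, while each action still hides its own steps.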


But we’ll have to wait for a trigger to start working on that – implementing such a solution would not be quick, and just like everyone else, we have more improvements we want to make than time to implement them.
I hope you’ll find this idea useful. If there are any questions (I did try to explain what we do with the commands, but I feel it might not be as straightforward as I think it is) – don’t hesitate to ask.

1  Reflection in Java, and proxies in particular, can be a bit confusing when you first encounter them. If you are as confused as I was, all you need to know is that a Java proxy has two parts: a list of interfaces that it is faking and for which it will answer true to “instanceof” queries, and an invocation handler, which is the part that is responsible for actually doing something when a method is called. It can be as simple as just returning a null value, adding a delay or counting the number of times each method was invoked for this specific object, or it could be as complex as you would like it to be.
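To make the two parts concrete, here is a minimal sketch of merging two contexts with java.lang.reflect.Proxy. The interfaces, their methods and the ContextMerger helper are assumptions made up for this example (the real context interfaces are not shown in this post), and it handles only two contexts:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical context interfaces and implementations.
interface IUserContext { String getUserName(); }
interface IPurchaseContext { int getAmount(); }

class UserContext implements IUserContext {
    public String getUserName() { return "alice"; }
}

class PurchaseContextImpl implements IPurchaseContext {
    public int getAmount() { return 42; }
}

class ContextMerger {
    // Returns one object that answers for both interfaces.
    static Object merge(IUserContext user, IPurchaseContext purchase) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Redirect each call to the context whose type declares the method.
            // Methods declared on Object (toString, hashCode, equals) go to the user context.
            Object target = method.getDeclaringClass().isInstance(user) ? user : purchase;
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the exception thrown by the real method
            }
        };
        return Proxy.newProxyInstance(
                ContextMerger.class.getClassLoader(),
                new Class<?>[] { IUserContext.class, IPurchaseContext.class },
                handler);
    }
}

class ProxyDemo {
    public static void main(String[] args) {
        Object context = ContextMerger.merge(new UserContext(), new PurchaseContextImpl());
        // The same object can be cast to either interface.
        System.out.println(context instanceof IUserContext);
        System.out.println(((IUserContext) context).getUserName());
        System.out.println(((IPurchaseContext) context).getAmount());
    }
}
```

A command that needs only the purchase data would cast the merged object to IPurchaseContext and never see the user-related methods, which keeps auto-complete short.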