Tuesday, April 19, 2011

Notes from DDD & CQRS training - day 2

That's when things got really interesting. The topics covered were more or less the same as in the 6.5-hour video from one of Greg's previous trainings, which I had watched some time before the training during a long, lonely night at a hotel in Germany, but I was still listening at 100% attention. Without further babble, here go my notes:

Read model

  • simple, hard to screw up
  • to be done by low value/junior developers
  • can be outsourced
    • if you can't find a better thing to outsource
  • uses whatever model is appropriate
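Because the read side is just screen-shaped data, it can be sketched as a DTO plus a keyed lookup; a minimal illustration (a Java sketch, all names are mine, not from the training):

```java
import java.util.HashMap;
import java.util.Map;

// A flat DTO shaped exactly like the screen that displays it -- no domain logic.
class InventoryItemListDto {
    final String id;
    final String name;
    final int currentCount;
    InventoryItemListDto(String id, String name, int currentCount) {
        this.id = id;
        this.name = name;
        this.currentCount = currentCount;
    }
}

// The whole "read model": dumb keyed lookups, very hard to screw up.
class ReadModelFacade {
    private final Map<String, InventoryItemListDto> byId = new HashMap<>();
    void put(InventoryItemListDto dto) { byId.put(dto.id, dto); }
    InventoryItemListDto getInventoryItem(String id) { return byId.get(id); }
}
```

There is nothing here a junior developer could get wrong, which is exactly the point.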
Command handlers
public interface Consumes<T> where T : Message {
  // T is a command (in case of command handlers)
  // or an event (in case of event handlers/projections)
  void consume(T message);
}
public interface Message {
  // just a marker interface
}
  • they are the application services in CQRS-based systems, the external edge of the domain model
  • should contain no logic, not even a simple if statement
  • can implement cross-cutting concerns
    • logging
    • transactions
    • authentication
    • authorization
    • batch commands
    • exception handling
    • merging
  • handle cross-cutting concerns not directly in the same class that invokes aggregate method, but using composition (think decorator pattern).
class DeactivateInventoryItemCommandHandler : 
  Consumes<DeactivateInventoryItemCommand> {
  /* constructor-injected repository */
  void consume(DeactivateInventoryItemCommand msg) {
    var item = repository.getById(msg.id);
    item.deactivate(msg.comment);
    // this makes batch cmd processing impossible
    repository.save(item); 
  }
}

class LoggingHandler<T> : Consumes<T> {
  public LoggingHandler(Consumes<T> next) {
    this.next = next;
  }
  public void consume(T message){
    Logger.write("received message:" + message);
    next.consume(message);
  }
}

var handler = new LoggingHandler<DeactivateInventoryItemCommand>(
  new DeactivateInventoryItemCommandHandler(
    repo)); // yay, we've got logging!

class AuthorizingHandler<T> : Consumes<T> {
  AuthorizingHandler(Consumes<T> next) {...}
  void consume(T message) {
    // check authorization, then:
    next.consume(message);
  }
}
  • make command handler wrapping automatic with reflection:
[RequiresPermission("admin")]
class DeactivateInventoryItemCommandHandler : 
  Consumes<DeactivateInventoryItemCommand> {....}

  • we can make our code our configuration
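A sketch of how such attribute-driven wrapping could work via reflection (Java annotations standing in for .NET attributes; all names are my own invention):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.function.Consumer;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RequiresPermission {
    String value();
}

@RequiresPermission("admin")
class DeactivateHandler implements Consumer<String> {
    public void accept(String command) { /* call the aggregate here */ }
}

class HandlerWiring {
    // Inspect the handler class; wrap it in an authorizing decorator only
    // when the annotation is present. This is "code as configuration".
    static <T> Consumer<T> wire(Consumer<T> handler, String currentRole) {
        RequiresPermission rp =
            handler.getClass().getAnnotation(RequiresPermission.class);
        if (rp == null) return handler;
        return command -> {
            if (!rp.value().equals(currentRole))
                throw new SecurityException("requires role: " + rp.value());
            handler.accept(command);
        };
    }
}
```

A container would typically scan all handler classes at startup and apply `wire` to each, so no handler ever gets registered unwrapped.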

  • the above is equivalent to doing functional composition (with interfaces). It could also be done explicitly:
  • // let's have a lambda:
    return x => DeactivateInventoryItemCommandHandler(
      new TestRepository<InventoryItem>(), x);
      // this is DI in a functional language,
      // using function currying - that's so cool!

    public void DeactivateInventoryItemCommandHandler(
      Repository<InventoryItem> repo,
      DeactivateInventoryItemCommand msg) {...}
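The explicit functional version could also be sketched in Java with lambdas; partial application of the repository plays the role of dependency injection, and cross-cutting concerns wrap the function the same way the decorators above do (all names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

class FunctionalHandlers {
    // The "handler" is just a function of (repository, command);
    // a List<String> stands in for a real repository here.
    static final BiConsumer<List<String>, String> deactivate =
        (repo, cmd) -> repo.add("deactivated: " + cmd);

    // "DI in a functional language": partially apply the repository,
    // returning a one-argument handler -- currying by hand.
    static Consumer<String> inject(List<String> repo) {
        return cmd -> deactivate.accept(repo, cmd);
    }

    // Cross-cutting concerns compose the same way: wrap the function.
    static Consumer<String> withLogging(Consumer<String> next, List<String> log) {
        return cmd -> { log.add("received: " + cmd); next.accept(cmd); };
    }
}
```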
    Projections
    • consume many events to update a view model
    • an important explicit concept
    • will have multiple methods, each handling another type of event
    • are in 1-to-1 relation with tables (sometimes, but rarely, 1-N)
    class InventoryItemCurrentCountProjection :
      Consumes<InventoryItemDeactivated/*Event*/>,
      Consumes<InventoryItemCreated>, ...
      // more events needed to update the view model
      // (implementing one generic interface twice can't directly translate to Java :( )
    {
      void consume(InventoryItemDeactivated message) {
        // do sth
      }
      void consume(InventoryItemCreated message) {
        // do sth else
      }
    }
    BOOK TO READ: The Little LISPer


    CQRS can be done using a single data store for writes & reads - like building the read model from SQL views. But we can drive the specialization of the write & read sides even further; after all, they've got totally different characteristics.
    And reports run on a 3NF database are so sloooow.
    Enter:
    Events
    • verbs in past tense - they are things that have already happened, actions completed in the past (think passé composé)
    • listeners can disagree with them but can't say NO
      • can only compensate
    • can be used for synchronizing multiple different models
    So, we'll have our domain model implemented with (n)Hibernate emit events so that we can have our beloved 3NF database and denormalize into multiple read models (to get near infinite scalability)? Just having a 2PC transaction between the write db and a queue?

    Nope. This is guaranteed to fail.
    Why? 
    • ORM creates series of deltas
    • we have to prove that Δ(Hibernate) = Δ(events) - not easy
    • models can get out of sync in case of a bug
      • such problems can be hard to spot
      • impossible to fix data model broken this way
    So, what shall we do?
    • get rid of the ORM so our events are our only source of truth
    • we can have projections populating our 3NF model
      • but is it worth the costs and increased size of our code base?
    • this is the poison pill architecture
      • will get you to event sourcing
      • getting rid of 3NF model will let you get rid of 
        • your DBA freaks
        • costs of DB licences (business will like it!)
    Event sourcing - at last!
    • in functional programming terms: current state = left fold of past behaviours
    • existing business systems (systems of problem domain, not necessarily computer systems, think: accounting) use history, not current state (like bank account balance)
    • deriving state from events allows us to change implementation of domain model easily, without affecting object persistence = disconnects the domain model from storage
    • events cannot be vetoed but we can compensate:
      • partial compensating actions (difficult, we don't want to go this way)
      • full compensating actions (accounting people - and developers! - prefer it)
        • compensate the whole transaction & add another one, correct
    • ES gives you an additive, append-only behavioural model
    • we don't lose any information we would lose with a structural model
      • we can build any structural model from our events
      • the event log lets you see the system as it was at any point in time
        • this means you can go back in time
        • which is extremely valuable for debugging!
      • you can re-run everything that the system has ever done on latest (or any) version of software
    • when using MongoDB or Cassandra (or sth similar) aggregates can become documents you append events to
    • user's intention should be carried through from commands to events
    • events are not equal to commands, even if from implementation point of view they might be identical
    • events can be enriched with results of business operations (authorization code of credit card operations, sales tax etc)
      • this prevents duplication of business logic between various places
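The "current state = left fold of past behaviours" idea fits in a few lines (a toy sketch; the integer "count changed" events are my own stand-in for real event objects):

```java
import java.util.List;

class FoldExample {
    // Events: positive = items checked in, negative = items removed.
    // Folding the full history from the left yields the current state.
    static int currentCount(List<Integer> countChangedEvents) {
        return countChangedEvents.stream().reduce(0, Integer::sum);
    }
}
```

Folding only a prefix of the list gives the state at any earlier point in time, which is exactly the "going back in time" property above.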
    Event sourced aggregates
    • a base class is OK
    • important methods:
      • applyChange(event)
        • calls an event handling method of the aggregate (apply) for the concrete event type
        • registers events that have happened if it's a new event
      • public methods defining the business interface of the aggregate
        • business logic, conditionals live here
      • private methods defined in concrete aggregate classes handling events
        • no conditionals
        • only setting data
      • loadFromHistory(IEnumerable<Event>)
        • accepts an event stream to restore the aggregate from the history
        • calls the apply method for each event (that's why those methods don't have behaviour, only set the data)
    • repository
      • saves only uncommitted changes of the aggregate and marks them as committed in the aggregate (clears the uncommitted events list)
    • a command makes aggregate produce 0..N events
    • you need a unit-of-work to support batch command processing
      • if you don't need it an explicit call to repository.save() in your event handler should be ok
      • the UOW could be configured to accept events from only 1 aggregate by default; changing that setting allows batch processing
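The methods listed above could be sketched like this (a toy Java version using string events; a real implementation would dispatch on typed event classes):

```java
import java.util.ArrayList;
import java.util.List;

abstract class AggregateRoot {
    private final List<Object> uncommitted = new ArrayList<>();

    // Calls the aggregate's apply(...) for the event, and records the
    // event as uncommitted only when it is new (not a replay).
    protected void applyChange(Object event, boolean isNew) {
        apply(event);
        if (isNew) uncommitted.add(event);
    }
    protected abstract void apply(Object event);

    void loadFromHistory(Iterable<Object> history) {
        for (Object e : history) applyChange(e, false); // replay: no re-record
    }
    List<Object> getUncommittedChanges() { return uncommitted; }
    void markChangesAsCommitted() { uncommitted.clear(); }
}

// Concrete aggregate: public methods hold the business logic (conditionals);
// apply(...) only sets data, so replay can never fail a business rule.
class InventoryItem extends AggregateRoot {
    private boolean active;

    void create() { applyChange("Created", true); }
    void deactivate() {
        if (!active) throw new IllegalStateException("already deactivated");
        applyChange("Deactivated", true);
    }
    protected void apply(Object event) {
        if (event.equals("Created")) active = true;
        else if (event.equals("Deactivated")) active = false;
    }
}
```

The repository would call `getUncommittedChanges()` on save and then `markChangesAsCommitted()`, matching the bullet above.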
    What if our aggregates have so many events that restoring aggregate state from them becomes a serious performance problem?
    Rolling snapshots
    • event log changes: [1,2,3,4,5,6,7] becomes [1,2,3,4,5, snapshot, 6, 7]
    • don't use direct serialization of aggregates
    • build snapshots in a separate snapshotter process
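Restoring from a rolling snapshot then means folding only the events recorded after it (a toy sketch; the memento shape and integer events are my own simplification):

```java
class Snapshot {
    final int count;    // memento of aggregate state
    final int version;  // number of the last event included in the snapshot
    Snapshot(int count, int version) { this.count = count; this.version = version; }
}

class SnapshotRestore {
    // Restore = start from the snapshot's state, then fold only the newer
    // events, instead of replaying the whole log from event 1.
    static int restore(Snapshot snapshot, java.util.List<Integer> eventsAfterSnapshot) {
        int count = snapshot.count;
        for (int delta : eventsAfterSnapshot) count += delta;
        return count;
    }
}
```

Because the snapshot is a memento rather than a serialized aggregate, its shape can be versioned separately from the domain model.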
    Testing with Event sourcing

    • DDD testing - no asserting against getters, just the behaviour
    • an example scenario:

    public class when_deactivating_a_deactivated_inventory_item :
      AggregateSpecification<InventoryItem> {
      public override IEnumerable given() {
        yield return New.inventoryItemCreatedWithId(5);
        yield return New.inventoryItemDeactivatedWithId(5);
      }
      public override void when() {
        aggregate.Deactivate();
      }
      [Then]
      public void an_invalid_argument_exception_is_thrown() {
        Assert.isType<ArgumentException>(caught);
      }
      [Then]
      public void no_events_are_produced() {
        Assert.isEmpty(events);
      }
    }
    • there's no magic in it, the base test class is dead simple:
    public abstract class AggregateSpecification<T>
      where T : AggregateRoot, new() {
      public abstract IEnumerable given();
      public abstract void when();
      protected T aggregate;
      protected Exception caught;
      protected List events;

      [Setup]
      public void Setup() {
        try {
          aggregate = new T();
          aggregate.loadFromHistory(given());
          when();
          events = new List(
            aggregate.getUncommittedChanges());
        } catch (Exception ex) {
          caught = ex;
        }
      }
    }
    • documentation can be generated from those tests
    • or: write the tests in natural language and generate test classes from them
      • then you (and business people) can see your progress as you make test cases pass
      • such tests can be used as communication tool
      • generate such docs in html or whatever format on every CI build so that business can see them at any time
      • override toString() on every event to get human-readable output that can be used in such tests
    • personal note: this is f*cking awesome!
    • we could also do it like:
    public class when_deactivating_a_deactivated_inventory_item :
      AggregateSpecification<InventoryItem> {

      public override IEnumerable given() {
        yield return New.inventoryItemCreated.WithId(5);
        yield return New.inventoryItemDeactivated.WithId(5);
      }

      public override Command when() { // difference here!
        return New.DeactivateInventoryItem.WithId(5);
      }

      [Then]
      public void an_invalid_argument_exception_is_thrown() {
        Assert.isType<ArgumentException>(caught);
      }
      [Then]
      public void no_events_are_produced() {
        Assert.isEmpty(events);
      }
    }
    • entire testing can be expressed with events & commands
    • or maybe have a DSL to express those tests in platform-independent way? like:
    <Given>
      <!-- events serialized to XML -->
    </Given>
    <When>
      <!-- command serialized to XML -->
    </When>
    <Expect>
      <!-- assertions expressed in XML -->
    </Expect>
    • then get (for example) Ruby to make it even nicer to look at
    • HINT: give business people a comfortable way to share their knowledge to development team
    • calling a method of an object == sending a message to an object
    • refucktoring
      • changing a test is making a _new_ test
    • versioning -> new tests with new versions
    • HINT: you don't want your devs understand the framework - you want them to understand the CONCEPT
    Building an event store on top of a SQL db

    • there's a detailed explanation available at cqrsinfo.com
    • RDBMS provides transactions out-of-the-box
    • with multiple event stores you can only guarantee events ordering within aggregate boundary (you can do global ordering with single event store)
    • metadata can be stored along with events (server the event originated in, security context, user, timestamp etc)
    • uses optimistic concurrency and carries the version between server & client
    • for storing events a stored procedure is recommended to avoid multiple server-db roundtrips:
    BEGIN
      var s = SELECT CurrentVersion FROM Aggregates
        WHERE AggregateId = @1
      if (s == null) {
        s = 0
        INSERT INTO Aggregates ....
      }
      if (s != expectedVersion)
        throw new ConcurrencyException();
      foreach (event e) {
        s++
        INSERT INTO EventLog ...
      }
      UPDATE Aggregates SET CurrentVersion = s
    END
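The same version check can be sketched as an in-memory store, which makes the optimistic-concurrency behaviour easy to see (a Java sketch; all names are mine, not from the training):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConcurrencyException extends RuntimeException {}

// In-memory equivalent of the stored procedure above: check the aggregate's
// current version against the caller's expected version, then append the
// events and bump the version -- all under one lock (the SQL version gets
// this atomicity from the database transaction instead).
class InMemoryEventStore {
    private final Map<String, Integer> versions = new HashMap<>();
    private final Map<String, List<Object>> streams = new HashMap<>();

    synchronized void saveEvents(String aggregateId, List<Object> events,
                                 int expectedVersion) {
        int current = versions.getOrDefault(aggregateId, 0);
        if (current != expectedVersion) throw new ConcurrencyException();
        streams.computeIfAbsent(aggregateId, id -> new ArrayList<>())
               .addAll(events);
        versions.put(aggregateId, current + events.size());
    }
    List<Object> getEventsFor(String aggregateId) {
        return streams.getOrDefault(aggregateId, List.of());
    }
}
```

The version travels with the aggregate between client & server, so a stale writer fails loudly instead of silently overwriting.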
    • snapshotting 
      • brings in another table ( to avoid concurrency exceptions with the servers writing real events into the store)
      • snapshotting is done asynchronously by a snapshotter
      • when to snapshot? when we've got a certain number of events not included in last snapshot (different aggregates can have different snapshotting rules depending on the type etc)
      • snapshots are NOT a necessity for most systems, only a heuristic brought in when we need a performance boost on the write side
      • snapshots can be versioned differently from the domain model (thanks to using the Memento pattern for snapshots)
      • don't do snapshots by default
    • event store is a queue (little mutant freak database-queue hybrid baby)
    CQRS vs CAP theorem
    • CQRS doesn't break the CAP theorem
    • we don't get all 3 properties at the same time
    • domain (write) side needs C and A
    • read model needs A and P
    • events, commands and DTOs are very strong boundaries allowing us to specialize within them

    CQRS from business perspective
    • are all developers created equal?
      • if your answer is yes you're just wrong
      • if your answer is no - then why do the same people work on the domain, UI & data storage?
    • how much time (%) do you really spend working with your domain? 25%? 30%? 40%?
    • reasons to create systems in private sector:
      • make money
      • save money
      • manage risk (let's have it just in case)
    • organizations with high level of maturity can have bigger teams
    • CQRS can get more people into the team (up to 2.5x) without decreasing maturity level
      • people can work in 3 teams independent of each other
      • communication between teams is low
      • create schema (think XSD) describing DTOs, commands & events in the estimation phase
        • if you can't - what the hell are you estimating?!
    • don't allow features to cross iteration boundaries - we want working system, not components, at the end of the iteration
      • keep teams working on the same feature at the same time
    • when working with UI you can mock out read model & commands endpoint
      • same with developing other parts of the system
    • there are 4 parts of every task/story: domain, GUI, read model, integration
    Moving to CQRS+ES architecture
    • one aggregate at a time
    • ask yourself: how are we gonna kill the system?
      • when an ES-based system dies, the event log is all that is left behind
        • you can migrate from ES to different system by creating a projection matching target data model
    CQRS vs stereotypical architecture
    • CQRS
      • writing to read side sucks
      • reading is easy
    • stereotypical architecture
      • writes are easy
      • queries suck
    • we're making a trade-off
    • both architectures produce an eventually consistent system
    • what about integration?
      • a CQRS+ES system has an integration model built-in!
        • our read model is in fact integrating with our domain
        • so we have actually tested our integration model!
        • we have a nice, push integration model
          • not an ugly, pull model
    === END OF DAY 2 ===

    That's it for now, advanced topics coming next in notes from day 3.

    Nighty-night!
