wtorek, 12 lipca 2011

FizzBuzz of Doom

Today a colleague had to interview a bunch of students (on their 3rd or 4th year at the university I believe) wanting to have their internship at our company. Just before he went to talk to them I mentioned the FizzBuzz problem (huh?) as a nice example of simple screening questions and he decided to ask it the candidates.
When I first heard about it I couldn't really believe this could be a problem for anyone (I suppose I was on my 3rd year of studies at the time). Today I got a hard proof I was terribly wrong. Out of 6 students only 2 gave an acceptable answer to the problem.

Ouch! I don't really know what to think about it. Thought that FizzBuzz can actually be a problem for CS students makes me feel uneasy. Or maybe there's nothing to worry about? Time will show

wtorek, 5 lipca 2011

Coding kata

Before yesterday I have done a coding kata (what is it? a short programming exercise that you will ideally perform daily, write some code and throw it away - for more detailed explanation see here) and decided to start doing it daily. It was Roy Osherove's TDD kata 1 .On Sunday I've done it in Java, took be about 20 minutes to complete it (including the advanced part). Yesterday I did it again, this time in Python.
I have learned a bit of Python while studying, done a couple of simple networking apps in it (curses-based IM, Tkinter mail client etc), but that's where my adventure with this fine language ended. Since then I've used it only for small scripts to handle boring tasks, and eventually my knowledge of it has faded.
The basic scope of the String Calculator kata took me 1h to complete! I felt quite ashamed of my performance so decided to re-learn Python. There's a couple of different katas I want to try (including Uncle Bob's Bowling Kata), and have downloaded JetBrains' PyCharm IDE to get a decent tooling support (especially documentation, as I have forgotten most of the standard Python library functions). The experience I had with it yesterday was satisfying and I think I will continue evaluating it and learning Python in the process.
Hopefully I have enough self-discipline to do it!

wtorek, 28 czerwca 2011

My CQRS training

I've run an internal CQRS training for my team mates. Took 2 hours but we didn't manage to go through all topics I wanted to present. Hallway & canteen to the rescue - had another hour talking about CQRS and event sourcing. Here are my slides: http://bit.ly/kADhjZ.


I suppose it's now time for a real world CQRS+ES project, huh? :D

środa, 20 kwietnia 2011

Notes from DDD & CQRS training - day 3

Here's the last part of my notes from Greg Young's class in Poland. Enjoy!

Push vs Pull integration
Pull is:
Accounting <-- getAccountBalance() --- Sales
           -------- balance --------->
Push is:
Accounting -- AccountBalanceChanged--> Sales



  • With pull model we get tight coupling of the systems. With push systems are loosely coupled. System receiving events uses denormalizers to build whatever structural model is needs.
  • Why is push generally better than pull?
    • What if we put Accounting system in Poland and Sales in South Africa? 
      • performance will suck when using pull integration model
      • performance won't be affected as Sales will build it's own view model that can be queried without calling Accounting system
    • weakest link antipattern will hurt systems using pull integration
    • web services (think -> pull) cause Bounded Contexts boundaries to blur - my team needs to understand how other applications look at my system
    • push reduces coupling between project teams - we don't have to wait for other teams to implement their functionality
    • doing push means that we don't pollute our system with concepts of other systems
    • replacing a system with a new one
      • hard in PULL model (have to support how everyone sees our system)
      • easy in PUSH (have to support only events)
    • push keeps us from having a huge, messy canonical model
  • With push integration we apply the same pattern we did for aggregates - reducing coupling through redundancy
  • When can pull be beneficial?
    • when complex calculations must be performed on the data and we don't want to put such logic in every system
    • data from other system is vital for the business
    • it's hard to emulate PUSH with an adapter on top of another system
  • out of events coming from other systems we can build any possible structural model we need
  • a system publishes a language other systems can listen to
  • PUSH should be the default integration model
  • we can degrade our SLAs in order to achieve higher uptime
    • it's better to degrade SLA that being down
    • having errors is often better than being down
    • we introduce eventual consistency
      • if risk goes too high because of stale data the business can hit the red button to bring the system down                 
  • people are afraid of push integration because they are control freaks
    • they like to have a central point that manages everything
  • sending heartbeat messages ("hey, i'm still alive") to let other systems know that we're running fine so that they can act accordingly in case we are down
  • with push we can do remote calculations without pissing off the users
  • push makes eventual consistency explicit (we still have it implicit in PULL but prefer not to think about it)
  • doing push == applying OO principles between systems


Versioning "is dead simple"

  • wouldn't it be easy if we only added things?
  • Let's consider version 1: 
class InventoryItem {
  void deactivate() {// ...
    Apply(new ItemDeactivated(id);
  }
}
class InventoryItemDeactivated:Event{
  public readonly Guid id;
  InventoryItemDeactivated(Guid id){...}
}
  • We'll move to version 2:
// don't change existing event!
class InventoryItemDeactivated:Event{
  public readonly Guid id;
  InventoryItemDeactivated(Guid id){...}
}
// instead just copy & paste & rename:
public class InventoryItemDeactivated_V2:Event{
  public final Guid id;
  public final String comment;
  InventoryItemDeactivated_V2(Guid id,String comment)

  {...}
}
class InventoryItem {
  void deactivate(String comment) {
    if(comment.isNull()) 
      throw new ArgNullEx();
    Apply(new ItemDeactivated_V2(id, comment);
    // ...
  
}
}

  • copy & paste the apply() method to handle V2 event
    • but - as no business logic needs the comment so we don't event copy it into an aggregate
  • what about V57? gets a little dirty...
  • new version of event is convertable from the old version of event
    • if i can't transform v1 to v2 it's not the same event type!!!!
    • new fields get default value in case of old version events
    • let's have a method that converts event to newer version
static InventoryItemDeactivatedEvent_V2 convert(
  InventoryItemDeactivatedEvent e){
  return new InventoryItemDeactivatedEvent_V2(e.id, "BEFORE COMMENTS");
  // or another default value
}
    • now we can delete code that deals with old versions of events
  • we have to version our commands with exactly the same pattern
class DeactivateInventoryItem:Command{
  public final Guid itemId;
  public final int originalVersion;
  // constructor...
}
class DeactivateInventoryItem_V2:Command{
  final Guid itemId;
  public final int originalVersion;
  public final String comment;
  // constructor
}
//let's jump into command handler:
[Depreciated("13/04/2011")]
public void handle(DeactivateInventoryItem m) {
  var item = repo.getById(m.id);
  item.deactivate("");
}
public void handle(DeactivateInventoryItem_V2 m) {
  var item =  repo.getById(m.id);
  item.deactivate(m.comment);
}

  • we don't need any support for versioning in our serialization infrastructure
  • generally we keep 2-3 versions of a command and delete old versions(both handler and command) after some time
    • "how many test you web pages with IE4? why? don't you wanna support them?" 
  • keeping multiple versions running concurrently lets the clients do the transition
  • we never change events!! 
    • we add a new event
    • a deleting change example: v3 without the comment:
class InventoryItemDeactivated_V3:Event {
  public final Guid id;
  // removed: public final String comment;
  InventoryItemDeactivated_V3(Guid id){...}
}

//in the convert() function just don't copy the comment!

  • snapshots (using memento pattern):
    • do it like commands - add a new handling method and keep it until it's no longer needed, then delete it
  • to prevent events & commands from being changed
    • don't write them, generate them from XSD
    • use some tool to detect changes made to XSD and reject checkins
  • bigger problem: we realize that our aggregate boundaries were wrong, what's now?
    • write a little script to break events apart: 
    • build the original aggregate, build a new aggregate from it and save it (keep the reference (id) to the old aggregate)
    • this is annoying task but doesn't happen very often
    • keeping the reference to original aggreagate help other systems integrated in PUSH way (like our read model?) keep their model intact
  • prefer flat events over those containing little data objects - this is a trade-off between coupling and duplication 
    • it's harder to measure coupling than duplication so normally we don't see those problems
    • most of the time we introduce coupling to avoid duplication because duplication is easier to spot
    • flat events don't have problems when a data object definition changes (how would we version that?)

Merging


  • how to get optimal level concurrency?
    • merging prevents most of the problems with optimistic concurrency

public class MergingHandler : Consumes {
  public  MergingHandler(Consumes next) {...}
  public void consume(T message) {
    var commit = eventStore.getEventsSinceVersion(
      message.AggregateId,message.ExpectedVersion); 
    foreach(var e in commit) {
      if(conflictsWith(message,e))
        throw new RealConcurrencyEx();
    }
    next.handle(message);
  }
}

  • doesn't comparing commands to events seem wrong?
    • duplicates the business logic from the domain (aggregate)

// following code assumes usage of UOW
public class MergingHandler : Consumes {
  public  MergingHandler(Consumes next) {...}
  public void consume(T message) {
    var commit = eventStore.getEventsSinceVersion(
      message.AggregateId,message.ExpectedVersion);
      next.handle(message);
      foreach(var e in commit) {
        foreach(var attempted in UnitOfWork.Current.PeakAll()) {
        // events that have been created by the aggregate during the operation
          if(conflictsWith(attempted,e))
            throw new RealConcurrencyEx();
        }
    }
  }
}


  • we can often have general rules for generic conflict detection, like:
    • events of same type tend to conflict
  • unfortunately, the above example still misses an important thing...

public class MergingHandler : Consumes {
  public  MergingHandler(Consumes next) {...}
  public void consume(T message) {
    try {
    BEGIN:
      var commit = eventStore.getEventsSinceVersion(
        message.AggregateId,message.ExpectedVersion); 
      next.handle(message);
      foreach(var e in commit) {
        foreach(var attempted in UnitOfWork.Current.PeakAll()) {
          if(conflictsWith(attempted,e))
            throw new RealConcurrencyEx();
        }
      }
      //normally that would be in another cmd handler:
      UnitOfWork.current.commit();
    }catch(ConcurrencyException e) {
      goto BEGIN; // don't do that in production :)
    }
  }
}

  • this is simple because we store events - try doing it on sql database with current state data!
  • in case of conflict rules that are not generic but domain-specific we usually add a conflictsWith(Event another) method on the event


Eventual consistency


  • don't ask experts: "does the data needs to be eventuall consistent?"
    • ask: "is it ok to have data that is X time old"

NEVER USE WORD "INCONSISTENT" WITH BUSINESS PERSON. SAY "OLD", "STALE" ETC

  • for business people inconsistent=wrong
  • how to get around problems with eventual consistency:
    • easy thing: "your comment is waiting for moderation"
    • last thing to do when everything else fails: fake the changes in the client. make it look like things have happened for the user making the changes
    • UI design & correct user's expectations
      • educate the user: 
        • tell them that sometimes software takes a second to think about what it's doing.
        • if the data is not there immediately, wait 2 seconds and press F5. 
        • if it's still not there immediately call tech support
        • after 1st week users get the point and will wait a bit longer if required
        • "they'are not all idiots"
      • use task-based UIs to make system look consistent (maximize time between sending commands and issuing a query on the client)
  • do we have to handle everything in the same pipe? maybe we can high- and low-priority pipes for different things in the system?


  • Set-based validation
    • what about validating that all usernames must be unique?
    • we only have consistency within a single AR
      • do we want to an AllUsers aggregate? erm, maybe not...
      1. ask: how bad is if two users get created with same username withing 500ms of each other? 
      2. we can see that something is wrong in an event handler (not a part of read model) and for example send an email?
      3. if we don't trust our clients we can put a validating layer on top of command endpoint checking the constraints in the read layer (but anyway - if the don't behave well they just get bad user experience)
    • more often than not if you ask about this topic you'll get redirected to this post
    • REMEMBER: solve problems in a business-centric way
Never going down (the write side)
  • put a queue in front of the command handlers
    • traffic spikes won't overload the system
    • but we can't ACK/NACK the command - we say we accepted the command and assume it will work
      • client has to be "pretty damn certain that the command won't fail"
      • might want to provide some minimal validation just before putting cmd into the queue
    • most people just don't need such architecture, but one-way command pattern is extermaly valuable when they do
  • most message-oriented middleware isn't service bus
  • point-to-point == observer pattern
    • easy, great choice with only a few of queues to set up
    • gets complex with many connections, not scalable in this case
  • hub & spoke - middle-man observer pattern
    • we end up buying tibco or biztalk and start putting a lot of logic into it (workflows ...) and it quickly becomes a tangled mess
    • watching messages flow within organization is easy (debugging too)
    • single point of failure - when hub is down everything is down
  • service bus
    • we distribute the routing information
    • single point of failure no longer exists
    • can be hard to manage from network perspective
    • is a gross overkill in most cases
    • debugging message flows becomes a pain
    • extra features offered by service buses cause lots of logic to be put into transport
  • a bit of humour: IP over Avian Carriers
    • big lol but...
    • "never underestimate the throughput of a truck full of DVDs - highly latent, huge bandwidth"

Sagas


  • what is a saga?
    • long-running business process? "long" can mean different things ;)
    • something that spans multiple transaction boundaries and ensures a process of getting back to a known good state if we fail in one of the transactions
  • got some hand-made drawings but don't feel like trying to re-create them in GIMP. why can't I find on Linux something as easy to use as M$ Paint?)
  • most companies get their competitive advantage not from a single system but from a bunch of interoperating systems
  • we need a facilitator instead of a bunch of business experts from specific domains
    • the PHBs in suits talking about kanban & lean (process optimization person - we don't want to act as one in this situation)
  • sagas do not contain business logic
  • set up a set of dependencies: 
    • who 
    • needs 
    • what 
    • when?
  • sagas move data to the right place at the right time for someone else to do the job
  • saga always starts in response to a single event coming out of domain model
  • choreographs the process and makes sure we reach the end
  • use a correlation id to know which events are related
    • most of the cases it's a part of the message. 
    • we might have multiple correlation ids.
  • sagas are state machines 
    • but we don't have implement it as one (few people think in state machines)
  • between events saga goes to sleep ( join calculus (think: wait, Future etc, continuations))
  • saga does the routing logic
    • it does not create data, just routes it between systems
  • some things have to happen before some amount of time passes
    • like in the movie Memento
    • no long term memory, have someone else providing information
    • use alarm clock for that - pass it a message that is an envelope for the message saga will send (?)
    • we want to avoid having state if possible, it should appear when we need it
  • types of sagas:
    • request-response based sagas
    • document based sagas
  • commands & events from individual systems become (are starting point for ) ubiquitous language
  • a saga often starts another saga (for example for handling rollbacks)
  • dashboards might be easily created from sagas data store (select * from sagastate ...)
  • if such a process is really important for our business why don't we model it (explicitly)?
  • sagas are extremally easy to test
    • small DSL for describing sagas
      • prove that you always exit correctly
      • generate all possible paths to exit
  • document oriented process
    • like with paper documents multiple persons use & fill with more info
    • most processes we try to implement has already been done before computers, on paper
    • but we forgot how we did it (and do the analysis again)
    • document based sagas are what you need in such cases
    • in case of big documents we don't send the whole document back and forth, we set up some storage for them and only send the links
  • RULE OF THUMB FOR VERSIONING SAGAS
    • when i release a new version all sagas already running stay in old version, all new will be run in new version (unless we've found a really bad bug in old implementation)
    • changing running sagas is dangerous and should be avoided
    • this rule makes versioning simple


Scaling writes


  • we only guarantee CA out from CAP on the write side so we can't partition it
  • we can do real-time systems with CQRS
  • stereotypical architecture: single db, multiple app servers with load balancer in front
    • pros
      • fault tolerance
      • can do software upgrade without going down
      • knowledge about it is widespread
    • cons
      • app servers must be stateless!
      • can't be scaled (just buy a bigger database)
      • database remains a single point of failure
      • database might be a performance bottleneck
    • it's good but has limitations
  • let's replace the database with a event store!
    • there's no functional difference between this solution and previous one
    • loading aggregates on each request increases latency
  • we might split event store into multiple stores, based on aggregate ID (sharding)
    • this can (theoretically) go as far as having a single event store per aggregate
    • problem happens when one of the datastores goes down
      • we could multiply them with a master-slave pattern
      • but: each slave increases latency
    • this allows scaling out our event store
  • in order to reduce latency we can switch from stateless to statefull app servers
    • we have a message router (with fast, in-memory routing info) which knows which aggregate resides in each app server
    • loaded aggregate stays in memory of the app server
    • over time event store becomes write-only
    • when a app server goes down message router must distribute it's job among other servers
      • this can cause latency spike unacceptable for some real-time systems
  • to solve the problem we can use a warm replica 
    • just as in previous example but:
      • when message is routed to a server another server is told to shadow the aggregate that the message was directed to
        • shadowing server loads the AR and subscribes to it's events
      • events are delivered to shadowing systems by a publisher
        • stays ~100ms behind original write
        • can use UDP multicast for publishing events
      • when a server goes down shadowing server is only 100ms behind it and requires small operation to catch up with current state
      • this greatly reduces the latency spike when a server is going down
      • but...
      • we can get rid of the spike completely!
      • when shadowing server receives first command it can act as if it was up-to-date
        • but still listen to events from event store!
        • until it gets events it created itself it tries to merge
          • same code as regular events merging!
        • when it does get its own events it unsubscribes
      • many businesses will accept the risk of possible merging problems to avoid latency spikes
    • with this architecture there are no more reads from the event store!


Occasionally connected systems
My notes here are barely readable drawings on paper with some (even less readable) text here and there. Will unfortunately have to skip it (I'm certainly NOT doing those drawings in GIMP!) but...
Greg already had a presentation on this subject recorded. It covers the same topics (watched it few days before the class).

The interesting thing here is the conclusion: CQRS is nothing else as plain, old, good MVC (as initially done in Smalltalk) brought to architectural level.
None of these ideas are new.
Isn't it cool?
The important lesson is:
Review what you have already done.

== END OF DAY 3 ===
and unfortunately of the whole training. A pity, I wouldn't mind at all spending few more days attending to such a great class! Thanks a lot for it, Greg!

wtorek, 19 kwietnia 2011

Notes from DDD & CQRS training - day 2

That's when things got really interesting. The topics covered were more or less the same as in the 6.5h video from one of Greg's previous trainings which I had watched some time before the training during a long, lonely night at a hotel in Germany, but still I was listening at 100% attention. Without further babble, here go my notes:

Read model

  • simple, hard to screw up
  • to be done by low value/junior developers
  • can be outsourced
    • can't find better thing to OS
  • uses whatever model is appropriate
Command handlers
public interface Consumes<T> where T:Message{ 
  //T would be a command(in case of command handlers) 
  //or an event in case of event handlers/projections
  void consume(T);
}
public interface Message {
  // just a marker interface
}
  • they are the application services in CQRS-based systems, the external edge of the domain model
  • should contain no logic, not even a simple if statement
  • can implement cross-cutting concerns
    • logging
    • transactions
    • authentication
    • authorization
    • batch commands
    • exceptions handling
    • merging
  • handle cross-cutting concerns not directly in the same class that invokes aggregate method, but using composition (think decorator pattern).
class DeactivateInventoryItemCommandHandler : 
  Consumes<DeactivateInventoryItemCommand> {
  /* constructor-injected repository */
  void consume(DeactivateInventoryItemCommand msg) {
    var item = repository.getById(msg.id);
    item.deactivate(msg.comment);
    // this makes batch cmd processing impossible
    repository.save(item); 
  }
}

class LoggingHandler<T> : Consumes<T> {
  public LoggingHandler(Consumes<T> next) {
    this.next = next;
  }
  public void consume(T message){
    Logger.write("received message:" + message);
    next.consume(message);
  }
}

var handler = new LoggingHandler(
  new DeactivateInventoryItemCommandHandler(
    repo));// yay, we've got logging!

class AuthorizingHandler : Consumes<T>{
  AuthorizingHandler(Consumes<T> next){...}
  void consume(T message){
    // check authorization then do:
    next.consume(message);
  }
}
  • make command handler wrapping automatic with reflection:
[RequiresPermission("admin")]
class DeactivateInventoryItemCommandHandler : 
  Consumes<DeactivateInventoryItemCommand> {....}

  • we can make our code our configuration




  • above is equal to doing functional composition (with interfaces). it could also be done explicitly:



  • // let's have a lambda:
    return x => DeactivateInventoryItemCommandHandler(
      new TestRepository<InventoryItem>(), x); 
      // this is DI in functional language
      // using function currying - that's so cool!

    public void DeactivateInventoryItemCommandHandler
      (Repository<InventoryItem repo,
       DeactivateInventoryItemCommand) {...}
    Projections
    • consume many events to update a view model
    • an important explicit concept
    • will have multiple methods, each handling another type of event
    • are in 1-to-1 relation with tables (sometimes, but rarely, 1-N)
    class InventoryItemCurrentCountProjection : 
      Consumes<InventoryItemDeactivated/*Event*/>, 
      Consumes<IventoryItemCreated> , ... 
    // more events needed to update the view model 
            // can't directly translate to Java :(
    {
    void consume(InventoryItemDeactivated message) {
    // do sth
    }
    void consume(IventoryItemCreated message) {
    // do sth else
    }
    }
    BOOK TO READ: The little LISPer


    CQRS can be done using a single data store for writes & reads. Like building the read model based on SQL views. But we can drive the specialization of write & read side even further. Finally, they've got totally different characteristics.
    And reports run on 3NF database are so sloooow.
    Enter:
    Events
    • verbs in past tense - they are things that have already happened, actions completed in the past (think passé composé)
    • listeners can disagree with them but can't say NO
      • can only compensate
    • can be used for synchronizing multiple different models
    So, we'll have our domain model implemented with (n)Hibernate emit events so that we can have our beloved 3NF database and denormalize into multiple read models (to get near infinite scalability)? Just having a 2PC transaction between the write db and a queue?

    Nope. This is guaranteed to fail.
    Why? 
    • ORM creates series of deltas
    • we have to prove that Δ(Hibernate) = Δ(events) - not easy
    • models can get out of sync in case of a bug
      • such problems can be hard to spot
      • impossible to fix data model broken this way
    So, what shall we do?
    • get rid of the ORM so our events are our only source of truth
    • we can have projections populating our 3NF model
      • but is it worth the costs and increased size of our code base?
    • this is the poison pill architecture
      • will get you to event sourcing
      • getting rid of 3NF model will let you get rid of 
        • your DBA freaks
        • costs of DB licences (business will like it!)
    Event sourcing At last!
    • in functional programming terms: current state = left fold of past behaviours
    • existing business systems (systems of problem domain, not necessarily computer systems, think: accounting) use history, not current state (like bank account balance)
    • deriving state from events allows us to change implementation of domain model easily, without affecting object persistence = disconnects the domain model from storage
    • events cannot be vetoed but we can compensate:
      • partial compensating actions (difficult, we don't want to go this way)
      • full compensating actions (accounting people - and developers! - prefer it)
        • compensate the whole transaction & add another one, correct
    • ES gives you an additive (append)-only behavioural model
    • we don't loose any information we would loose with structural model
      • we can build any structural model from our events
      • event log let's you see the system as it was at any point of time
        • this means you can go back in time
        • which is extremally valuable for debugging!
      • you can re-run everything that the system has ever done on latest (or any) version of software
    • when using MongoDB or Cassandra (or sth similar) aggregates can become documents you append events to
    • user's intention should be carrier through from commands to events
    • events are not equal to commands, even if from implementation point of view they might be identical
    • events can be enriched with results of business operations (authorization code of credit card operations, sales tax etc)
      • this prevents duplication of business logic between various places
    Event sourced aggregates
    • a base class is OK
    • important methods:
      • applyChange(event)
        • calls an event handling method of the aggregate (apply) for the concrete event type
        • registers events that have happened if it's a new event
      • public methods defining the business interface of the aggregate
        • business logic, conditionals live here
      • private methods defined in concrete aggregate classes handling events
        • no conditionals
        • only setting data
      • loadFromHistory(IEnumerable<Event>
        • accepts an event stream to restore the aggregate from the history
        • calls the apply method for each event (that's why those methods don't have behaviour, only set the data)
    • repository
      • saves only uncommitted changes of the aggregate and marks them as committed in the aggregate (clears the uncommitted events list)
    • a command makes aggregate produce 0..N events
    • you need a unit-of-work to support batch command processing
      • if you don't need it an explicit call to repository.save() in your event handler should be ok
      • UOW could be configured to accept events from only 1 aggregate and changing that setting to allow batch processing
    What if our aggregates have so many events that restoring aggregate state from them becomes a serious performance problem?
    Rolling snapshots
    • event log changes: [1,2,3,4,5,6,7] becomes [1,2,3,4,5, snapshot, 6, 7]
    • don't use direct serialization of aggregates
    • build snapshots in a separate snapshotter process
    Testing with Event sourcing

    • DDD testing - no asserting against getters, just the behaviour
    • an example scenario:

    public class when_deactivating_an_deactivated_inventory_item :
      AggregateSpecification {
      public IEnumerable given()  {
        yield return New.inventoryItemCreatedWithId(5);
        yield return New.inventoryItemDeactivatedWithId(5);
      }
      public override void when() {
        aggregate.Deactivate();
      }
      [Then]
      public void an_invalid_argument_exception_is_thrown() {
        Assert.isType{thrown};
      }
      [Then]
      public void no_events_are_produced() {
        Assert.isEmpty(events);
      }
    }
    • there's no magic in it, the base test class is dead simple:
    public abstract class AggregateSpecification 
      where T:AggregateRoot {
      public abstract IEnumerable Given();
      public abstract void When();
      protected T aggregate;
      protected Exception caught;
      protected List events;

      [Setup]
      public void Setup() {
        try {
          aggregate = new T();
          aggregate.loadFromHistory(given);
          When();
          events = new List(
            aggregate.getUncommittedChanges());
        } catch (Exception ex) {
          caught = ex;
        }
      }
    }
    • documentation can be generated from those tests
    • or: write the tests in natural language and generate test classes from them
      • then you (and business people) can see your progress as you make test cases pass
      • such tests can be used as communication tool
      • generate such docs in html or whatever format on every CI build so that business can see them at any time
      • override toString() on every event to get human-readable output that can be used in such tests
    • personal note: this is f*cking awesome!
    • we could also do it like:
    public class when_deactivating_an_deactivated_inventory_item :
      AggregateSpecification {


      public IEnumerable given()  {
        yield return New.inventoryItemCreated.WithId(5);
        yield return New.inventoryItemDeactivated.WithId(5);
      }

      public override Command when() {// difference here!
        return New.DeactivateInventoryItem.WithId(5);
      }

      [Then]
      public void an_invalid_argument_exception_is_thrown() {
        Assert.isType{thrown};
      }
      [Then]
      public void no_events_are_produced() {
        Assert.isEmpty(events);
      }
    }
    • entire testing can be expressed with events & commands
    • or maybe have a DSL to express those tests in platform-independent way? like:
    <Given>
    <!-- events serialized to XML -->
    </Given>
    <When>
    <!-- command serialized to XML -->
    </When>
    <Expect>
    <!-- assertions expressed in XML -->
    </Expect
    >
    • then get (for example) Ruby to make it ever nicer to look at
    • HINT: give business people a comfortable way to share their knowledge to development team
    • calling a method of an object == sending a message to an object
    • refucktoring
      • changing a test is making a _new_ test
    • versioning -> new tests with new versions
    • HINT: you don't want your devs understand the framework - you want them to understand the CONCEPT
    Building an event store on top of a SQL db

    • there's a detailed explanation available at cqrsinfo.com
    • RDBMS provides transactions out-of-the-box
    • with multiple event stores you can only guarantee events ordering within aggregate boundary (you can do global ordering with single event store)
    • metadata can be stored along with events (server the event originated in, security context, user, timestamp etc)
    • uses optimistic concurrency and carries the version between server & client
    • for storing events a stored procedure is recommended to avoid multiple server-db roundtrips:
    BEGIN
      var s = select currentversion from aggregate 
        where aggregateid = @1
      if(s==null)
        s = 0
        INSERT INTO AGGREGATES ....
      if( s != expectedVersion)
        throw new ConcurrencyException();
      foreach(event e)
        s++
        INSERT INTO EVENTLOG ...
      update aggregate set currentversion = s
    END
    • snapshotting 
      • brings in another table ( to avoid concurrency exceptions with the servers writing real events into the store)
      • snapshotting is done asynchronously by a snapshotter
      • when to snapshot? when we've got a certain number of events not included in last snapshot (different aggregates can have different snapshotting rules depending on the type etc)
      • snapshots are NOT necessity for most systems, only a heuristic brought in when we need performance boost on the write side
      • snaphots can be versioned differently from domain model (thanks to usage of Memento pattern for snapshots)
      • don't do snapshots by default
    • event store is a queue (little mutant freak database-queue hybrid baby)
    CQRS vs CAP theorem
    • CQRS doesn't break the CAP theorem
    • we don't get all 3 properties at the same time
    • domain (write) side needs C and A
    • read model needs A and P
    events, commands and dto are very strong boundaries allowing us to specialize within them

    CQRS from business perspective
    • are all developers created equal?
      • if your answer is yes you're just wrong
      • if your answer is no - why same people work on domain, UI & data storage?
    • how much time (%) do you really spend working with your domain? 25%? 30%? 40%?
    • reasons to create systems in private sector:
      • make money
      • save money
      • manage risk (let's have it just in case)
    • organizations with high level of maturity can have bigger teams
    • CQRS can get more people into the team (up to 2.5x) without decreasing maturity level
      • people can work in 3 teams independent of each other
      • communication between teams is low
      • create schema (think XSD) describing DTOs, commands & events in the estimation phase
        • if you can't - what the hell are you estimating?!
    • don't allow features to cross iteration boundaries - we want working system, not components, at the end of the iteration
      • keep teams working on the same feature at the same time
    • when working with UI you can mock out read model & commands endpoint
      • same with developing other parts of the system
    • there are 4 parts of every task/story: domain, GUI, read model, integration
    Moving to CQRS+ES architecture
    • one aggregate at a time
    • ask yourself: how are we gonna kill the system?
      • when ES-based system dies the events log is all that is left behind
        • you can migrate from ES to different system by creating a projection matching target data model
    CQRS vs stereotypical architecture
    • CQRS
      • writing to read side sucks
      • reading is easy
    • stereotypical architecture
      • writes are easy
      • queries suck
    • we're making a trade-off
    • both architectures produce an eventually consistent system
    • what about integration?
      • CQRS+ES system has integration model build-in!
        • our read model is in fact integrating with our domain
        • so we have actually tested our integration model!
        • we have a nice, push integration model
          • not an ugly, pull model
    === END OF DAY 2 ===

    That's it for now, advanced topics coming next in notes from day 3.

    Nighty-night!

    poniedziałek, 18 kwietnia 2011

    Notes from DDD & CQRS training - day 1

    My notes from the 1st day (11/04/2011) of the DDD/CQRS training by Greg Young, just as I took them - very little post-processing applied so it might be of little help for anyone but me (or maybe other participants of the training).

    UIs:

    • CRUD (these suck)
    • task-based (users like those)
    Aggregate:
    • group of object we treat together as a whole
    • affect only a single aggregate - that lets you avoid distributed transactions (think horizontal partitioning/sharding)
    • put the method next to the state it operates on is
    • denormalization helps to get the design right
    Booksto read:
    • Streamlined object modelling 
      • time interval object
      • make implicit explicit
    • Object-oriented software construction 2nd edition by Bertrand Meyer
      • describes CQS 
    saving two objects = bad

    • business doesn't care about consistency
    • breaking bidirectional relationships
      • ask: do those things need to be consistent? 
      • drop consistency of invariant
    • domain model != data model
    • if needed, a Domain Service can ensure consistency (this should really be used only as a last resort!)
    • collection of Transaction objects can have a domain meaning
    • AggregateRoot (AR) name makes sense for the entire aggregate
    • too much magic is bad (think ORM)
    • between aggregates use soft links (IDs) instead of references
    TIP: keeping track of Optimistic Concurrency Exceptions makes an interesting statistic

    EXERCISE: test-drive Probability value object class with methods like combine(Probability), not() etc, encapsulating a Java's BigDecimal (.NET's Decimal?). The tricky part: you can't have any kind of accessor methods to expose the internal state. What do you test first?

    And now... suppose that standard BigDecimal implementation is too slow for your system. You have to change the implementation of the Probability class but retain the API. How many tests do you have to change?

    personal note: this turned out to be an easy, yet an interesting exercise. Funny, how it changes the way you write code when you don't have those evil getters around. I really, really liked it!

    Repository:
    • Evans: works on aggregates, provides domain language to persistence infrastructure
    • Fowler: purely technical stuff

    • make contracts in the domain:
      • as narrow as possible
      • as explicit as possible
    • this will lower the conceptual coupling
    Service:
    • any piece of procedural code
    • can be:
      • infrastructure
      • domain
      • application
    • but it the end they are all facades
    • if you do things right you might never need services
    • interface segregation
      • single method interfaces
      • role interfaces
    Hexagonal architecture - ports & adapters

    TIP:
     why not check check-ins for illegal dependencies (like domain depending on something else) and reject those that don't follow the rules?

    On SOLID principles:
    • they are just heuristics
    • don't try to stick to them no matter the cost (duplication sometimes can be a good thing!)
    EXAMPLE: When not to adhere to Interface Segregation?
    • when all methods go the the same source
      class Stream implements ICanSeek,ICanRead,ICanWrite
      // client code:
      void DoesSomething(ICanSeek seeker, ICanRead reader) {
        seeker.seekTo(0);
        while (var x = reader.read() != null) {
        /...
        }
      }
      ICanSeekRead extends ICanSeek,ICanRead != Foo implements ICanSeek, ICanRead
      • DI/IoC:
        • ServiceLocator is totally OK when resolving things at the same layer
        • about injecting into entities: most dependencies match the lifecycle of methods, not objects 
      void Submit(ISearchDriverLicences s) {
        s.searchFor("something");
      }
      void F() {//coupling from F to G
        G.Something();
      }
      interface ISomething{
        something();
      }
      void F(ISomething s) { 
        s.something();
      }
      class ISomethingImpl:ISomething {
        // sometimes DI is too much:
        void something(){
          Console.WriteLine("Hello world");
        }
      }
      • we're overusing tools, frameworks 
        frameworks pollute our brains

        Back to services:

        • ApplicationServices 
          • should be role interfaces, one for every use case of the system
          • you should have no business logic in them (not even an if statement!)
        isValid() antipattern
        • pure evil!
        • don't do that!
        • causes GIGO (Garbage In, Garbage Out)
        • entities end up being in one of 3 possible states:
          • valid
          • invalid
          • have no frakking clue
        • encapsulation is about protecting state - don't let people jam it!
        Specification
        • (wikipedia)
        • predicate logic (think Prolog)
        • might need getters (protected/internal) exposed
          • but people will start using them as soon as they see them
        • composite specification
        public class AService {
          AService( IEnumerable<Specification<Customer>>  
            rules){}

          void deactivate(Customer c) {
            if(!rules.areAllValid(c) { 
              throw new IllegalArgException(); 
            }
            ...
          }
        }

        === END OF DAY 1 ===
            
        Unfortunately, those notes don't show how absolutely awesome the training was. Really got my eyes wide open on many issues that I was somehow missing before.

        On a related note - people really do use functional programming in real-world applications! After hearing that from Greg I started learning Clojure. I had a Prolog & Haskell course back at the university. I didn't like the Prolog part but really enjoyed writing minimized code in Haskell. Now I just have to find some time to refresh by skills at functional programming. Or better - find some use for it so I can justify re-learning it at work ;)

        DDD & CQRS training

        Last week I participated to a great training by Greg Young in Kraków. I was amazed how much knowledge can get stuffed into my brain in mere 3 days. Prior to the training I have watched a couple of Greg's presentations (yeah, including the 6.5h-long video from a CQRS training), read blogs, followed the DDD/CQRS news group but still found myself sitting down as if I were hypnotized for 3x8 hours. Not once during the training Greg's answer let me think that he was talking about something he was unsure of. I find it annoying that some people (read: consultants) give talks about things they don't have any experience with.

        I suppose I won't waste any more time trying to duplicate Piotr's blog entry about the training and will start re-writting my notes immediately.


        Damn, just found the feedback form Greg asked us to fill in for him. The blog entry will have to wait for a moment.

        Initial babble

        The time has just come for me to write my first blog entry ever. Okay, maybe not write - publish. Has already written a few but never got to finish & publish one.

        On my quest to pretend that I'm more of a social creature that I really am, I also set up a Twitter account In fact it was Greg Young who pushed me to take this desperate step. He found super weird that no one was tweeting during IT conferences in Poland. Maybe some people just don't want to miss their only chance to actually have a face2face chat with real people? Dunno, I just can't imagine wasting my time on Twitter while attending Greg's great DDD&CQRS training (will- hopefully- write more about it in the days to come). I suppose some people (like myself) prefer to stay quiet until they've got something interesting to say. I prefer doing that than talking aloud about things I really got no idea of.

        Oh wait, I think I got side-tracked so will get back on the track and finish this short post. I hope I will have some interesting stuff to share with people on this blog. Don't really want to write HelloWorld-style entries but I suppose it could be useful for myself while learning a new technology or another evil Java framework. Will start with my notes from the training mentioned above, that will give me a change to re-read and re-think it. And will make it more durable than the paper I scribbled it on. Hopefully one day someone will find it worth reading. 

        This entry makes no sense whatsoever but I will publish it nonetheless, just to get myself started with blogging. 

        Stay tuned for more, I sincerely hope it will come.