MAD's adventures with software: 2011

Tuesday, 12 July 2011

FizzBuzz of Doom

Today a colleague had to interview a bunch of students (in their 3rd or 4th year at university, I believe) who wanted to do an internship at our company. Just before he went to talk to them I mentioned the FizzBuzz problem (huh?) as a nice example of a simple screening question, and he decided to ask the candidates to solve it.
When I first heard about it I couldn't really believe this could be a problem for anyone (I suppose I was in my 3rd year of studies at the time). Today I got hard proof that I was terribly wrong. Out of 6 students only 2 gave an acceptable answer to the problem.

Ouch! I don't really know what to think about it. The thought that FizzBuzz can actually be a problem for CS students makes me feel uneasy. Or maybe there's nothing to worry about? Time will tell.
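For reference, the commonly quoted version of FizzBuzz asks you to print the numbers from 1 to 100, replacing multiples of 3 with "Fizz", multiples of 5 with "Buzz" and multiples of both with "FizzBuzz". The exact wording used in the interview isn't recorded here, so treat this as a sketch of the usual statement and not of what the candidates were actually given:

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:          # divisible by both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    for i in range(1, 101):
        print(fizzbuzz(i))
```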

Tuesday, 5 July 2011

Coding kata

The day before yesterday I did a coding kata (what is it? a short programming exercise that you ideally perform daily: write some code and throw it away - for a more detailed explanation see here) and decided to start doing one every day. It was Roy Osherove's TDD Kata 1. On Sunday I did it in Java; it took me about 20 minutes to complete (including the advanced part). Yesterday I did it again, this time in Python.
I learned a bit of Python while studying and wrote a couple of simple networking apps in it (a curses-based IM, a Tkinter mail client etc.), but that's where my adventure with this fine language ended. Since then I've used it only for small scripts to handle boring tasks, and eventually my knowledge of it faded.
The basic scope of the String Calculator kata took me 1h to complete! I felt quite ashamed of my performance, so I decided to re-learn Python. There are a couple of different katas I want to try (including Uncle Bob's Bowling Kata), and I have downloaded JetBrains' PyCharm IDE to get decent tooling support (especially documentation, as I have forgotten most of the standard Python library functions). The experience I had with it yesterday was satisfying and I think I will continue evaluating it and learning Python in the process.
Hopefully I have enough self-discipline to do it!

Tuesday, 28 June 2011

My CQRS training

I've run an internal CQRS training for my teammates. It took 2 hours but we didn't manage to go through all the topics I wanted to present. Hallway & canteen to the rescue - we had another hour talking about CQRS and event sourcing. Here are my slides: http://bit.ly/kADhjZ.

I suppose it's now time for a real world CQRS+ES project, huh? :D

Wednesday, 20 April 2011

Notes from DDD & CQRS training - day 3

Here's the last part of my notes from Greg Young's class in Poland. Enjoy!

Push vs Pull integration
Pull is:
Accounting <-- getAccountBalance() --- Sales
-------- balance --------->
Push is:
Accounting -- AccountBalanceChanged--> Sales

With the pull model we get tight coupling between the systems. With push, the systems are loosely coupled. The system receiving events uses denormalizers to build whatever structural model it needs.
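A minimal sketch of what such a denormalizer on the receiving side could look like. The event fields and the `SalesBalanceDenormalizer` class are assumptions made for illustration; only the `AccountBalanceChanged` event name comes from the diagram above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AccountBalanceChanged:
    """Event published by Accounting (field names are assumed)."""
    account_id: str
    new_balance: Decimal


class SalesBalanceDenormalizer:
    """Lives inside Sales; keeps a local, query-ready copy of balances."""

    def __init__(self) -> None:
        self._balances: dict[str, Decimal] = {}

    def handle(self, event: AccountBalanceChanged) -> None:
        # Redundant copy of Accounting's data, updated on every pushed event.
        self._balances[event.account_id] = event.new_balance

    def balance_of(self, account_id: str) -> Decimal:
        # Answered locally: no remote call to Accounting is needed.
        return self._balances.get(account_id, Decimal("0"))


view = SalesBalanceDenormalizer()
view.handle(AccountBalanceChanged("acc-1", Decimal("100.00")))
view.handle(AccountBalanceChanged("acc-1", Decimal("75.50")))
print(view.balance_of("acc-1"))  # 75.50
```

Sales never calls Accounting here; if Accounting is slow or down, the local view is merely stale.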
Why is push generally better than pull?

What if we put Accounting system in Poland and Sales in South Africa?

with pull integration performance will suck
with push performance won't be affected, as Sales will build its own view model that can be queried without calling the Accounting system

weakest link antipattern will hurt systems using pull integration
web services (think -> pull) cause Bounded Contexts boundaries to blur - my team needs to understand how other applications look at my system
push reduces coupling between project teams - we don't have to wait for other teams to implement their functionality
doing push means that we don't pollute our system with concepts of other systems
replacing a system with a new one

hard in PULL model (have to support how everyone sees our system)
easy in PUSH (have to support only events)

push keeps us from having a huge, messy canonical model

With push integration we apply the same pattern we did for aggregates - reducing coupling through redundancy
When can pull be beneficial?

when complex calculations must be performed on the data and we don't want to put such logic in every system
data from other system is vital for the business
it's hard to emulate PUSH with an adapter on top of another system

out of events coming from other systems we can build any possible structural model we need
a system publishes a language other systems can listen to
PUSH should be the default integration model
we can degrade our SLAs in order to achieve higher uptime

it's better to degrade the SLA than to be down
having errors is often better than being down
we introduce eventual consistency

if risk goes too high because of stale data the business can hit the red button to bring the system down

people are afraid of push integration because they are control freaks

they like to have a central point that manages everything

sending heartbeat messages ("hey, i'm still alive") to let other systems know that we're running fine so that they can act accordingly in case we are down
with push we can do remote calculations without pissing off the users
push makes eventual consistency explicit (we still have it implicit in PULL but prefer not to think about it)
doing push == applying OO principles between systems

Versioning "is dead simple"

wouldn't it be easy if we only added things?
Let's consider version 1:

class InventoryItem {
  void deactivate() {
    // ...
    Apply(new InventoryItemDeactivated(id));
  }
}
class InventoryItemDeactivated : Event {
  public readonly Guid id;
  InventoryItemDeactivated(Guid id) {...}
}

We'll move to version 2:

// don't change the existing event!
class InventoryItemDeactivated : Event {
  public readonly Guid id;
  InventoryItemDeactivated(Guid id) {...}
}
// instead just copy & paste & rename:
class InventoryItemDeactivated_V2 : Event {
  public readonly Guid id;
  public readonly string comment;
  InventoryItemDeactivated_V2(Guid id, string comment) {...}
}
class InventoryItem {
  void deactivate(string comment) {
    if (comment == null)
      throw new ArgumentNullException("comment");
    Apply(new InventoryItemDeactivated_V2(id, comment));
    // ...
  }
}

copy & paste the apply() method to handle V2 event

but since no business logic needs the comment, we don't even copy it into the aggregate

what about V57? gets a little dirty...
a new version of an event must be convertible from the old version of that event

if i can't transform v1 to v2 it's not the same event type!!!!
new fields get default value in case of old version events
let's have a method that converts event to newer version

static InventoryItemDeactivated_V2 convert(InventoryItemDeactivated e) {
  // or another default value
  return new InventoryItemDeactivated_V2(e.id, "BEFORE COMMENTS");
}

now we can delete code that deals with old versions of events
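To make the conversion idea concrete, here is a small Python sketch. It assumes events are upgraded while an aggregate is being loaded, so the aggregate only needs an apply method for the newest version. The `upgrade` helper and the snake-case names are hypothetical and not taken from the class.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class InventoryItemDeactivated:        # version 1, never edited again
    id: UUID


@dataclass(frozen=True)
class InventoryItemDeactivatedV2:      # version 2 adds a comment
    id: UUID
    comment: str


def upgrade(event):
    """Convert any stored event to its newest version (hypothetical helper)."""
    if isinstance(event, InventoryItemDeactivated):
        # Old events get a default value for the new field.
        return InventoryItemDeactivatedV2(event.id, "BEFORE COMMENTS")
    return event


class InventoryItem:
    def __init__(self) -> None:
        self.active = True

    def load_from_history(self, events) -> None:
        for stored in events:
            self.apply(upgrade(stored))   # the aggregate only ever sees V2

    def apply(self, event: InventoryItemDeactivatedV2) -> None:
        self.active = False               # no business logic needs the comment


item = InventoryItem()
item.load_from_history([InventoryItemDeactivated(uuid4())])
print(item.active)  # False
```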

we have to version our commands with exactly the same pattern

class DeactivateInventoryItem : Command {
  public readonly Guid itemId;
  public readonly int originalVersion;
  // constructor...
}
class DeactivateInventoryItem_V2 : Command {
  public readonly Guid itemId;
  public readonly int originalVersion;
  public readonly string comment;
  // constructor
}

// let's jump into the command handler:
[Obsolete("deprecated since 13/04/2011")]
public void handle(DeactivateInventoryItem m) {
  var item = repo.getById(m.itemId);
  item.deactivate("");
}
public void handle(DeactivateInventoryItem_V2 m) {
  var item = repo.getById(m.itemId);
  item.deactivate(m.comment);
}

we don't need any support for versioning in our serialization infrastructure
generally we keep 2-3 versions of a command and delete old versions (both handler and command) after some time

"how many of you test your web pages with IE4? why? don't you wanna support them?"

keeping multiple versions running concurrently lets the clients do the transition
we never change events!!

we add a new event
a deleting change example: v3 without the comment:

class InventoryItemDeactivated_V3 : Event {
  public readonly Guid id;
  // removed: public readonly string comment;
  InventoryItemDeactivated_V3(Guid id) {...}
}

// in the convert() function just don't copy the comment!

snapshots (using memento pattern):

do it like commands - add a new handling method and keep it until it's no longer needed, then delete it

to prevent events & commands from being changed

don't write them, generate them from XSD
use some tool to detect changes made to XSD and reject checkins

bigger problem: we realize that our aggregate boundaries were wrong, what now?

write a little script to break events apart:
build the original aggregate, build a new aggregate from it and save it (keep the reference (id) to the old aggregate)
this is an annoying task but it doesn't happen very often
keeping the reference to the original aggregate helps other systems integrated the PUSH way (like our read model?) keep their models intact

prefer flat events over those containing little data objects - this is a trade-off between coupling and duplication

it's harder to measure coupling than duplication so normally we don't see those problems
most of the time we introduce coupling to avoid duplication because duplication is easier to spot
flat events don't have problems when a data object definition changes (how would we version that?)

Merging

how to get an optimal level of concurrency?

merging prevents most of the problems with optimistic concurrency

public class MergingHandler<T> : Consumes<T> where T : Command {
  public MergingHandler(Consumes<T> next) {...}
  public void consume(T message) {
    var commit = eventStore.getEventsSinceVersion(
        message.AggregateId, message.ExpectedVersion);
    foreach (var e in commit) {
      if (conflictsWith(message, e))
        throw new RealConcurrencyEx();
    }
    next.consume(message);
  }
}

doesn't comparing commands to events seem wrong?

duplicates the business logic from the domain (aggregate)

// following code assumes usage of UOW
public class MergingHandler<T> : Consumes<T> where T : Command {
  public MergingHandler(Consumes<T> next) {...}
  public void consume(T message) {
    var commit = eventStore.getEventsSinceVersion(
        message.AggregateId, message.ExpectedVersion);
    next.consume(message);
    foreach (var e in commit) {
      // events that have been created by the aggregate during the operation
      foreach (var attempted in UnitOfWork.Current.PeekAll()) {
        if (conflictsWith(attempted, e))
          throw new RealConcurrencyEx();
      }
    }
  }
}

we can often have general rules for generic conflict detection, like:

events of same type tend to conflict

unfortunately, the above example still misses an important thing...

public class MergingHandler<T> : Consumes<T> where T : Command {
  public MergingHandler(Consumes<T> next) {...}
  public void consume(T message) {
    BEGIN:
    try {
      var commit = eventStore.getEventsSinceVersion(
          message.AggregateId, message.ExpectedVersion);
      next.consume(message);
      foreach (var e in commit) {
        foreach (var attempted in UnitOfWork.Current.PeekAll()) {
          if (conflictsWith(attempted, e))
            throw new RealConcurrencyEx();
        }
      }
      // normally that would be in another cmd handler:
      UnitOfWork.Current.commit();
    } catch (ConcurrencyException) {
      goto BEGIN; // don't do that in production :)
    }
  }
}

this is simple because we store events - try doing it on an SQL database with current-state data!
in case of conflict rules that are not generic but domain-specific, we usually add a conflictsWith(Event another) method on the event
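The merging idea can be sketched in a few lines of Python. Everything here is a simplification: events are plain `(type_name, payload)` tuples, the store is an in-memory dict, the event names are invented, and the only conflict rule is the generic "events of the same type conflict" one mentioned above.

```python
class RealConcurrencyError(Exception):
    """Raised when concurrent changes genuinely conflict."""


class InMemoryEventStore:
    """Toy store: one ordered list of (type_name, payload) tuples per aggregate."""

    def __init__(self):
        self._streams = {}

    def events_since(self, aggregate_id, version):
        return self._streams.get(aggregate_id, [])[version:]

    def append(self, aggregate_id, events):
        self._streams.setdefault(aggregate_id, []).extend(events)


def conflicts_with(attempted, committed):
    # Generic rule from the notes: events of the same type tend to conflict.
    return attempted[0] == committed[0]


def merge_and_commit(store, aggregate_id, expected_version, attempted_events):
    """Commit attempted_events unless they conflict with what was written
    after expected_version (the version the client last saw)."""
    for committed in store.events_since(aggregate_id, expected_version):
        for attempted in attempted_events:
            if conflicts_with(attempted, committed):
                raise RealConcurrencyError(attempted[0])
    store.append(aggregate_id, attempted_events)


store = InMemoryEventStore()
store.append("item-1", [("ItemCreated", {}), ("ItemRenamed", {"name": "A"})])

# A client that only saw version 1 deactivates the item: no conflict, merged.
merge_and_commit(store, "item-1", 1, [("ItemDeactivated", {})])

# Another stale client renames the item: conflicts with the earlier rename.
try:
    merge_and_commit(store, "item-1", 1, [("ItemRenamed", {"name": "B"})])
except RealConcurrencyError as err:
    print("conflict on", err)  # conflict on ItemRenamed
```

With plain optimistic concurrency both stale writes would have been rejected; with merging only the one that really conflicts is.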

Eventual consistency

don't ask experts: "does the data need to be eventually consistent?"

ask: "is it ok to have data that is X time old"

NEVER USE WORD "INCONSISTENT" WITH BUSINESS PERSON. SAY "OLD", "STALE" ETC

for business people inconsistent=wrong
how to get around problems with eventual consistency:

easy thing: "your comment is waiting for moderation"
last thing to do when everything else fails: fake the changes in the client. make it look like things have happened for the user making the changes
UI design & correct user's expectations

educate the user:

tell them that sometimes software takes a second to think about what it's doing.
if the data is not there immediately, wait 2 seconds and press F5.
if it's still not there, immediately call tech support
after the 1st week users get the point and will wait a bit longer if required
"they're not all idiots"

use task-based UIs to make system look consistent (maximize time between sending commands and issuing a query on the client)

do we have to handle everything in the same pipe? maybe we can have high- and low-priority pipes for different things in the system?

Set-based validation

what about validating that all usernames must be unique?
we only have consistency within a single AR

do we want an AllUsers aggregate? erm, maybe not...

ask: how bad is it if two users get created with the same username within 500ms of each other?
we can see that something is wrong in an event handler (not a part of read model) and for example send an email?
if we don't trust our clients we can put a validating layer on top of the command endpoint, checking the constraints in the read layer (but anyway - if they don't behave well they just get a bad user experience)

more often than not if you ask about this topic you'll get redirected to this post
REMEMBER: solve problems in a business-centric way

Never going down (the write side)

put a queue in front of the command handlers

traffic spikes won't overload the system
but we can't ACK/NACK the command - we say we accepted the command and assume it will work

client has to be "pretty damn certain that the command won't fail"
might want to provide some minimal validation just before putting cmd into the queue

most people just don't need such an architecture, but the one-way command pattern is extremely valuable when they do
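A rough sketch of the one-way command pattern, using Python's standard `queue` module as a stand-in for real messaging middleware. The `accept` and `drain` functions and the `item_id` field are made up for this example.

```python
import queue


class CommandRejected(Exception):
    """Raised synchronously, before the command is ever queued."""


def accept(command_queue: queue.Queue, command: dict) -> None:
    # Minimal validation only; real business rules run later, in the handler.
    if not command.get("item_id"):
        raise CommandRejected("item_id is required")
    # One-way: the caller learns the command was accepted, not its outcome.
    command_queue.put(command)


def drain(command_queue: queue.Queue, handler) -> int:
    """Worker loop body: process whatever is queued at the handler's own pace."""
    handled = 0
    while True:
        try:
            command = command_queue.get_nowait()
        except queue.Empty:
            return handled
        handler(command)
        handled += 1


commands = queue.Queue()
accept(commands, {"item_id": "item-1"})
accept(commands, {"item_id": "item-2"})
processed = []
print(drain(commands, processed.append))  # 2
```

A traffic spike only makes the queue longer; the handlers keep working at the rate they can sustain.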

most message-oriented middleware isn't a service bus
point-to-point == observer pattern

easy, great choice with only a few queues to set up
gets complex with many connections, not scalable in this case

hub & spoke - middle-man observer pattern

we end up buying tibco or biztalk and start putting a lot of logic into it (workflows ...) and it quickly becomes a tangled mess
watching messages flow within organization is easy (debugging too)
single point of failure - when hub is down everything is down

service bus

we distribute the routing information
single point of failure no longer exists
can be hard to manage from network perspective
is a gross overkill in most cases
debugging message flows becomes a pain
extra features offered by service buses cause lots of logic to be put into transport

a bit of humour: IP over Avian Carriers

big lol but...
"never underestimate the throughput of a truck full of DVDs - highly latent, huge bandwidth"

Sagas

what is a saga?

long-running business process? "long" can mean different things ;)
something that spans multiple transaction boundaries and ensures a process of getting back to a known good state if we fail in one of the transactions

(got some hand-made drawings but don't feel like trying to re-create them in GIMP. why can't I find something on Linux as easy to use as M$ Paint?)
most companies get their competitive advantage not from a single system but from a bunch of interoperating systems
we need a facilitator instead of a bunch of business experts from specific domains

the PHBs in suits talking about kanban & lean (process optimization person - we don't want to act as one in this situation)

sagas do not contain business logic
set up a set of dependencies:

who
needs
what
when?

sagas move data to the right place at the right time for someone else to do the job
saga always starts in response to a single event coming out of domain model
choreographs the process and makes sure we reach the end
use a correlation id to know which events are related

in most cases it's a part of the message.
we might have multiple correlation ids.

sagas are state machines

but we don't have to implement it as one (few people think in state machines)

between events saga goes to sleep ( join calculus (think: wait, Future etc, continuations))
saga does the routing logic

it does not create data, just routes it between systems

some things have to happen before some amount of time passes

like in the movie Memento
no long term memory, have someone else providing information
use alarm clock for that - pass it a message that is an envelope for the message saga will send (?)
we want to avoid having state if possible, it should appear when we need it

types of sagas:

request-response based sagas
document based sagas

commands & events from individual systems become (are starting point for ) ubiquitous language
a saga often starts another saga (for example for handling rollbacks)
dashboards might be easily created from sagas data store (select * from sagastate ...)
if such a process is really important for our business why don't we model it (explicitly)?
sagas are extremely easy to test

small DSL for describing sagas

prove that you always exit correctly
generate all possible paths to exit
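As an illustration of "sagas are state machines" and of routing by correlation id, here is a toy saga in Python. The order/payment/shipping scenario, the event names and the classes are all invented for the example; the timeout event stands in for the "alarm clock" message.

```python
from dataclasses import dataclass, field


@dataclass
class OrderSaga:
    """State for one running process, keyed by a correlation id."""
    correlation_id: str
    state: str = "awaiting_payment"
    outbox: list = field(default_factory=list)   # commands to send elsewhere

    def on_payment_received(self) -> None:
        if self.state == "awaiting_payment":
            self.state = "awaiting_shipment"
            self.outbox.append(("ShipOrder", self.correlation_id))

    def on_order_shipped(self) -> None:
        if self.state == "awaiting_shipment":
            self.state = "completed"

    def on_payment_timeout(self) -> None:
        # Delivered by the "alarm clock"; ignored if payment already arrived.
        if self.state == "awaiting_payment":
            self.state = "cancelled"
            self.outbox.append(("CancelOrder", self.correlation_id))


class SagaRouter:
    """Starts a saga on OrderPlaced and routes later events by correlation id."""

    def __init__(self) -> None:
        self.sagas: dict[str, OrderSaga] = {}

    def handle(self, event_type: str, correlation_id: str) -> None:
        if event_type == "OrderPlaced":
            self.sagas[correlation_id] = OrderSaga(correlation_id)
            return
        saga = self.sagas[correlation_id]
        {
            "PaymentReceived": saga.on_payment_received,
            "OrderShipped": saga.on_order_shipped,
            "PaymentTimedOut": saga.on_payment_timeout,
        }[event_type]()


router = SagaRouter()
for event in ("OrderPlaced", "PaymentReceived", "PaymentTimedOut", "OrderShipped"):
    router.handle(event, "order-42")
print(router.sagas["order-42"].state)   # completed
print(router.sagas["order-42"].outbox)  # [('ShipOrder', 'order-42')]
```

Note that the saga only routes: it decides who needs what and when, while the actual work happens in the systems that receive `ShipOrder` or `CancelOrder`. Testing it means feeding events in and checking the outbox.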

document oriented process

like with paper documents, multiple people use them & fill them in with more info
most processes we try to implement have already been done before computers, on paper
but we forgot how we did it (and do the analysis again)
document based sagas are what you need in such cases
in case of big documents we don't send the whole document back and forth, we set up some storage for them and only send the links

RULE OF THUMB FOR VERSIONING SAGAS

when i release a new version all sagas already running stay in old version, all new will be run in new version (unless we've found a really bad bug in old implementation)
changing running sagas is dangerous and should be avoided
this rule makes versioning simple

Scaling writes

we only guarantee CA out of CAP on the write side, so we can't partition it
we can do real-time systems with CQRS
stereotypical architecture: single db, multiple app servers with load balancer in front

pros

fault tolerance
can do software upgrade without going down
knowledge about it is widespread

cons

app servers must be stateless!
can't be scaled (just buy a bigger database)
database remains a single point of failure
database might be a performance bottleneck

it's good but has limitations

let's replace the database with an event store!

there's no functional difference between this solution and previous one
loading aggregates on each request increases latency

we might split event store into multiple stores, based on aggregate ID (sharding)

this can (theoretically) go as far as having a single event store per aggregate
problem happens when one of the datastores goes down

we could multiply them with a master-slave pattern
but: each slave increases latency

this allows scaling out our event store
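Sharding by aggregate ID can be sketched as follows. Plain dicts stand in for the individual event stores, and the hashing scheme is just one possible choice and not something prescribed in the class.

```python
import hashlib


class ShardedEventStore:
    """Routes each aggregate's events to one of N independent stores."""

    def __init__(self, shard_count: int) -> None:
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, aggregate_id: str) -> dict:
        # A stable hash (not Python's per-process randomized hash()) so the
        # same aggregate always lands on the same shard.
        digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def append(self, aggregate_id: str, event: object) -> None:
        self._shard_for(aggregate_id).setdefault(aggregate_id, []).append(event)

    def load(self, aggregate_id: str) -> list:
        return list(self._shard_for(aggregate_id).get(aggregate_id, []))


store = ShardedEventStore(shard_count=4)
store.append("item-1", "InventoryItemCreated")
store.append("item-1", "InventoryItemDeactivated")
print(store.load("item-1"))  # ['InventoryItemCreated', 'InventoryItemDeactivated']
```

Because an aggregate is the consistency boundary, all of its events live in exactly one shard and no cross-shard transaction is needed.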

in order to reduce latency we can switch from stateless to stateful app servers

we have a message router (with fast, in-memory routing info) which knows which aggregate resides in each app server
loaded aggregate stays in memory of the app server
over time event store becomes write-only
when an app server goes down the message router must distribute its job among the other servers

this can cause latency spike unacceptable for some real-time systems

to solve the problem we can use a warm replica

just as in previous example but:

when message is routed to a server another server is told to shadow the aggregate that the message was directed to

shadowing server loads the AR and subscribes to its events

events are delivered to shadowing systems by a publisher

stays ~100ms behind original write
can use UDP multicast for publishing events

when a server goes down shadowing server is only 100ms behind it and requires small operation to catch up with current state
this greatly reduces the latency spike when a server is going down
but...
we can get rid of the spike completely!
when shadowing server receives first command it can act as if it was up-to-date

but still listen to events from event store!
until it gets events it created itself it tries to merge

same code as regular events merging!

when it does get its own events it unsubscribes

many businesses will accept the risk of possible merging problems to avoid latency spikes

with this architecture there are no more reads from the event store!

Occasionally connected systems
My notes here are barely readable drawings on paper with some (even less readable) text here and there. I will unfortunately have to skip them (I'm certainly NOT doing those drawings in GIMP!) but...
Greg already has a recorded presentation on this subject. It covers the same topics (I watched it a few days before the class).

The interesting thing here is the conclusion: CQRS is nothing more than plain good old MVC (as initially done in Smalltalk) brought to the architectural level.
None of these ideas are new.
Isn't it cool?
The important lesson is:
Review what you have already done.

== END OF DAY 3 ===
and unfortunately of the whole training. A pity, I wouldn't mind at all spending a few more days attending such a great class! Thanks a lot for it, Greg!

Tuesday, 19 April 2011

Notes from DDD & CQRS training - day 2

That's when things got really interesting. The topics covered were more or less the same as in the 6.5h video from one of Greg's previous trainings, which I had watched some time before the training during a long, lonely night at a hotel in Germany, but I was still listening with 100% attention. Without further babble, here are my notes:

Read model

simple, hard to screw up
to be done by low value/junior developers
can be outsourced

can't find a better thing to outsource

uses whatever model is appropriate

Command handlers

public interface Consumes<T> where T : Message {
  // T would be a command (in case of command handlers)
  // or an event (in case of event handlers/projections)
  void consume(T message);
}

public interface Message {
  // just a marker interface
}

they are the application services in CQRS-based systems, the external edge of the domain model
should contain no logic, not even a simple if statement
can implement cross-cutting concerns

logging
transactions
authentication
authorization
batch commands
exception handling
merging

handle cross-cutting concerns not directly in the same class that invokes aggregate method, but using composition (think decorator pattern).

class DeactivateInventoryItemCommandHandler :
    Consumes<DeactivateInventoryItemCommand> {
  /* constructor-injected repository */
  public void consume(DeactivateInventoryItemCommand msg) {
    var item = repository.getById(msg.id);
    item.deactivate(msg.comment);
    // this makes batch cmd processing impossible
    repository.save(item);
  }
}

class LoggingHandler<T> : Consumes<T> where T : Message {
  private readonly Consumes<T> next;
  public LoggingHandler(Consumes<T> next) {
    this.next = next;
  }
  public void consume(T message) {
    Logger.write("received message:" + message);
    next.consume(message);
  }
}

var handler = new LoggingHandler<DeactivateInventoryItemCommand>(
    new DeactivateInventoryItemCommandHandler(repo)); // yay, we've got logging!

class AuthorizingHandler<T> : Consumes<T> where T : Message {
  public AuthorizingHandler(Consumes<T> next) {...}
  public void consume(T message) {
    // check authorization then do:
    next.consume(message);
  }
}
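The same composition, written as a runnable Python sketch. A dict stands in for the repository and domain model, messages are plain dicts, and the `roles` field used for authorization is an assumption of this example.

```python
class DeactivateItemHandler:
    """Innermost handler: only calls the domain (a dict stands in for it)."""

    def __init__(self, items: dict) -> None:
        self.items = items

    def consume(self, message: dict) -> None:
        self.items[message["item_id"]] = "deactivated"


class LoggingHandler:
    def __init__(self, next_handler, log: list) -> None:
        self.next_handler = next_handler
        self.log = log

    def consume(self, message: dict) -> None:
        self.log.append(f"received message: {message}")
        self.next_handler.consume(message)


class AuthorizingHandler:
    def __init__(self, next_handler, required_role: str) -> None:
        self.next_handler = next_handler
        self.required_role = required_role

    def consume(self, message: dict) -> None:
        if self.required_role not in message.get("roles", ()):
            raise PermissionError(f"{self.required_role} role required")
        self.next_handler.consume(message)


items, log = {"item-1": "active"}, []
handler = LoggingHandler(
    AuthorizingHandler(DeactivateItemHandler(items), "admin"), log)
handler.consume({"item_id": "item-1", "roles": ["admin"]})
print(items["item-1"], len(log))  # deactivated 1
```

Each cross-cutting concern is one small wrapper, and the innermost handler stays free of any logic beyond calling the domain.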

make command handler wrapping automatic with reflection:

[RequiresPermission("admin")]
class DeactivateInventoryItemCommandHandler :
    Consumes<DeactivateInventoryItemCommand> {....}

we can make our code our configuration

above is equal to doing functional composition (with interfaces). it could also be done explicitly:

// let's have a lambda:
return x => DeactivateInventoryItemCommandHandler(
    new TestRepository<InventoryItem>(), x);
// this is DI in functional language
// using function currying - that's so cool!

public void DeactivateInventoryItemCommandHandler(
    Repository<InventoryItem> repo,
    DeactivateInventoryItemCommand cmd) {...}
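In Python the "dependency injection by currying" trick can be approximated with `functools.partial` (strictly speaking this is partial application, not currying). The dict-based repository is only a stand-in.

```python
from functools import partial


def deactivate_inventory_item(repository: dict, command: dict) -> None:
    """Handler as a plain function; its dependency is the first argument."""
    repository[command["item_id"]]["active"] = False


test_repository = {"item-1": {"active": True}}

# "Inject" the repository by fixing the first argument up front.
handler = partial(deactivate_inventory_item, test_repository)

handler({"item_id": "item-1"})
print(test_repository["item-1"]["active"])  # False
```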

Projections

consume many events to update a view model
an important explicit concept
will have multiple methods, each handling another type of event
are in 1-to-1 relation with tables (sometimes, but rarely, 1-N)

class InventoryItemCurrentCountProjection :
    Consumes<InventoryItemDeactivated /*Event*/>,
    Consumes<InventoryItemCreated> // , ...
    // more events needed to update the view model
    // can't directly translate to Java :(
{
  public void consume(InventoryItemDeactivated message) {
    // do sth
  }
  public void consume(InventoryItemCreated message) {
    // do sth else
  }
}
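For comparison, a projection with one handling method per event type can be written in Python with `functools.singledispatchmethod`. The view model here (a single counter of active items) and the class name are invented for the example.

```python
from dataclasses import dataclass
from functools import singledispatchmethod


@dataclass(frozen=True)
class InventoryItemCreated:
    id: str


@dataclass(frozen=True)
class InventoryItemDeactivated:
    id: str


class ActiveItemCountProjection:
    """Keeps one denormalized number up to date: how many items are active."""

    def __init__(self) -> None:
        self.active_count = 0

    @singledispatchmethod
    def consume(self, event) -> None:
        pass  # events this view does not care about are ignored

    @consume.register
    def _(self, event: InventoryItemCreated) -> None:
        self.active_count += 1

    @consume.register
    def _(self, event: InventoryItemDeactivated) -> None:
        self.active_count -= 1


projection = ActiveItemCountProjection()
for event in (InventoryItemCreated("a"),
              InventoryItemCreated("b"),
              InventoryItemDeactivated("a")):
    projection.consume(event)
print(projection.active_count)  # 1
```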

BOOK TO READ: The little LISPer

CQRS can be done using a single data store for writes & reads, for example by building the read model from SQL views. But we can drive the specialization of the write & read sides even further. After all, they have totally different characteristics.

And reports run on a 3NF database are so sloooow.

Enter:

Events

verbs in past tense - they are things that have already happened, actions completed in the past (think passé composé)
listeners can disagree with them but can't say NO

can only compensate

can be used for synchronizing multiple different models

So, we'll have our domain model implemented with (n)Hibernate emit events, so that we can keep our beloved 3NF database and denormalize into multiple read models (to get near-infinite scalability)? Just with a 2PC transaction between the write db and a queue?

Nope. This is guaranteed to fail.

Why?

ORM creates series of deltas
we have to prove that Δ(Hibernate) = Δ(events) - not easy
models can get out of sync in case of a bug

such problems can be hard to spot
impossible to fix data model broken this way

So, what shall we do?

get rid of the ORM so our events are our only source of truth
we can have projections populating our 3NF model

but is it worth the costs and increased size of our code base?

this is the poison pill architecture

will get you to event sourcing
getting rid of 3NF model will let you get rid of

your DBA freaks
costs of DB licences (business will like it!)

Event sourcing At last!

in functional programming terms: current state = left fold of past behaviours
existing business systems (systems of problem domain, not necessarily computer systems, think: accounting) use history, not current state (like bank account balance)
deriving state from events allows us to change implementation of domain model easily, without affecting object persistence = disconnects the domain model from storage
events cannot be vetoed but we can compensate:

partial compensating actions (difficult, we don't want to go this way)
full compensating actions (accounting people - and developers! - prefer it)

compensate the whole transaction & add another one, correct

ES gives you an additive (append)-only behavioural model
we don't lose any information that we would lose with a structural model

we can build any structural model from our events
event log lets you see the system as it was at any point in time

this means you can go back in time
which is extremely valuable for debugging!

you can re-run everything that the system has ever done on latest (or any) version of software

when using MongoDB or Cassandra (or sth similar) aggregates can become documents you append events to
user's intention should be carried through from commands to events
events are not equal to commands, even if from implementation point of view they might be identical
events can be enriched with results of business operations (authorization code of credit card operations, sales tax etc)

this prevents duplication of business logic between various places
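"Current state = left fold of past behaviours" translates almost literally into code. The sketch below uses a bank account, as in the accounting analogy above; the `Deposited` and `Withdrawn` events are made up for the example.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


def apply(balance: int, event) -> int:
    """Pure transition function: (state, event) -> new state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    return balance


history = [Deposited(100), Withdrawn(30), Deposited(5)]
current = reduce(apply, history, 0)                  # left fold over all events
as_of_second_event = reduce(apply, history[:2], 0)   # "going back in time"
print(current, as_of_second_event)  # 75 70
```

Folding over a prefix of the history gives the state as it was at that point, which is what makes replay and time-travel debugging possible.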

Event sourced aggregates

a base class is OK
important methods:

applyChange(event)

calls an event handling method of the aggregate (apply) for the concrete event type
registers events that have happened if it's a new event

public methods defining the business interface of the aggregate

business logic, conditionals live here

private methods defined in concrete aggregate classes handling events

no conditionals
only setting data

loadFromHistory(IEnumerable<Event>)

accepts an event stream to restore the aggregate from the history
calls the apply method for each event (that's why those methods don't have behaviour, only set the data)

repository

saves only uncommitted changes of the aggregate and marks them as committed in the aggregate (clears the uncommitted events list)

a command makes aggregate produce 0..N events
you need a unit-of-work to support batch command processing

if you don't need it, an explicit call to repository.save() in your command handler should be OK
the UOW could be configured to accept events from only 1 aggregate by default, with that setting changed to allow batch processing
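Putting the pieces listed above together, a minimal event-sourced aggregate and repository might look like this in Python. It is a sketch under simplifying assumptions: events are dispatched to `_apply_<EventClassName>` methods by naming convention, and the repository keeps streams in a dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryItemCreated:
    id: str


@dataclass(frozen=True)
class InventoryItemDeactivated:
    id: str


class AggregateRoot:
    def __init__(self) -> None:
        self._uncommitted = []

    def apply_change(self, event, is_new: bool = True) -> None:
        # Dispatch by naming convention, e.g. _apply_InventoryItemCreated.
        getattr(self, "_apply_" + type(event).__name__)(event)
        if is_new:
            self._uncommitted.append(event)

    def load_from_history(self, events) -> None:
        for event in events:
            self.apply_change(event, is_new=False)

    def uncommitted_changes(self) -> list:
        return list(self._uncommitted)

    def mark_changes_as_committed(self) -> None:
        self._uncommitted.clear()


class InventoryItem(AggregateRoot):
    def __init__(self) -> None:
        super().__init__()
        self.id = None
        self.active = False

    # Business interface: conditionals and rules live here.
    def deactivate(self) -> None:
        if not self.active:
            raise ValueError("item is already deactivated")
        self.apply_change(InventoryItemDeactivated(self.id))

    # Event handlers: no conditionals, only setting data.
    def _apply_InventoryItemCreated(self, event) -> None:
        self.id, self.active = event.id, True

    def _apply_InventoryItemDeactivated(self, event) -> None:
        self.active = False


class Repository:
    def __init__(self) -> None:
        self._streams = {}

    def save(self, aggregate) -> None:
        # Persist only the new events, then mark them as committed.
        self._streams.setdefault(aggregate.id, []).extend(
            aggregate.uncommitted_changes())
        aggregate.mark_changes_as_committed()

    def get_by_id(self, aggregate_id) -> InventoryItem:
        item = InventoryItem()
        item.load_from_history(self._streams.get(aggregate_id, []))
        return item


item = InventoryItem()
item.load_from_history([InventoryItemCreated("item-1")])
item.deactivate()
print(item.uncommitted_changes())  # [InventoryItemDeactivated(id='item-1')]
```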

What if our aggregates have so many events that restoring aggregate state from them becomes a serious performance problem?

Rolling snapshots

event log changes: [1,2,3,4,5,6,7] becomes [1,2,3,4,5, snapshot, 6, 7]
don't use direct serialization of aggregates

use Memento pattern
snapshots can be versioned

build snapshots in a separate snapshotter process
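A small sketch of rolling snapshots with a memento. The `Counter` aggregate and its integer "events" are toy stand-ins; the point is that the snapshot is a separate plain-data object and that loading replays only the events recorded after it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterMemento:
    """Snapshot payload: plain data, deliberately separate from the aggregate."""
    version: int       # number of events already folded into `value`
    value: int


class Counter:
    """Toy aggregate whose events are just integer increments."""

    def __init__(self) -> None:
        self.version = 0
        self.value = 0

    def apply(self, increment: int) -> None:
        self.value += increment
        self.version += 1

    def to_memento(self) -> CounterMemento:
        return CounterMemento(self.version, self.value)

    @classmethod
    def from_memento(cls, memento: CounterMemento) -> "Counter":
        counter = cls()
        counter.version, counter.value = memento.version, memento.value
        return counter


def load(events, snapshot=None):
    """Rebuild from the snapshot (if any) plus only the events after it."""
    aggregate = Counter() if snapshot is None else Counter.from_memento(snapshot)
    for increment in events[aggregate.version:]:
        aggregate.apply(increment)
    return aggregate


events = [1, 2, 3, 4, 5]
snapshot = load(events).to_memento()     # what a snapshotter process would store
events += [6, 7]
restored = load(events, snapshot)        # replays only events 6 and 7
print(restored.value, restored.version)  # 28 7
```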

Testing with Event sourcing

DDD testing - no asserting against getters, just the behaviour

an example scenario:

public class when_deactivating_an_deactivated_inventory_item :
    AggregateSpecification<InventoryItem> {
  public override IEnumerable<Event> given() {
    yield return New.inventoryItemCreatedWithId(5);
    yield return New.inventoryItemDeactivatedWithId(5);
  }
  public override void when() {
    aggregate.deactivate();
  }
  [Then]
  public void an_invalid_argument_exception_is_thrown() {
    Assert.isType<InvalidArgumentException>(caught);
  }
  [Then]
  public void no_events_are_produced() {
    Assert.isEmpty(events);
  }
}

there's no magic in it, the base test class is dead simple:

public abstract class AggregateSpecification<T>
    where T : AggregateRoot, new() {
  public abstract IEnumerable<Event> given();
  public abstract void when();
  protected T aggregate;
  protected Exception caught;
  protected List<Event> events = new List<Event>();

  [Setup]
  public void Setup() {
    try {
      aggregate = new T();
      aggregate.loadFromHistory(given());
      when();
      events = new List<Event>(
          aggregate.getUncommittedChanges());
    } catch (Exception ex) {
      caught = ex;
    }
  }
}

documentation can be generated from those tests

or: write the tests in natural language and generate test classes from them

then you (and business people) can see your progress as you make test cases pass
such tests can be used as communication tool
generate such docs in html or whatever format on every CI build so that business can see them at any time
override toString() on every event to get human-readable output that can be used in such tests

personal note: this is f*cking awesome!

we could also do it like:

public class when_deactivating_an_deactivated_inventory_item :
    AggregateSpecification<InventoryItem> {
  public override IEnumerable<Event> given() {
    yield return New.inventoryItemCreated.WithId(5);
    yield return New.inventoryItemDeactivated.WithId(5);
  }
  public override Command when() { // difference here!
    return New.DeactivateInventoryItem.WithId(5);
  }
  [Then]
  public void an_invalid_argument_exception_is_thrown() {
    Assert.isType<InvalidArgumentException>(caught);
  }
  [Then]
  public void no_events_are_produced() {
    Assert.isEmpty(events);
  }
}

entire testing can be expressed with events & commands
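The given/when/then shape is easy to reproduce in any language. Below is a compact Python sketch; the `run_spec` helper and the simplified `InventoryItem` are assumptions of this example and not the base class from the training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemCreated:
    id: int


@dataclass(frozen=True)
class ItemDeactivated:
    id: int


class InventoryItem:
    def __init__(self) -> None:
        self.id = None
        self.active = False
        self.changes = []          # events produced since loading

    def load_from_history(self, events) -> None:
        for event in events:
            self._apply(event)

    def deactivate(self) -> None:
        if not self.active:
            raise ValueError("already deactivated")
        event = ItemDeactivated(self.id)
        self._apply(event)
        self.changes.append(event)

    def _apply(self, event) -> None:
        self.id = event.id
        self.active = isinstance(event, ItemCreated)


def run_spec(given, when):
    """Given past events, when an action runs, return (new events, exception)."""
    aggregate = InventoryItem()
    aggregate.load_from_history(given)
    try:
        when(aggregate)
    except Exception as error:
        return [], error
    return aggregate.changes, None


events, caught = run_spec(
    given=[ItemCreated(5), ItemDeactivated(5)],
    when=lambda item: item.deactivate(),
)
print(events, type(caught).__name__)  # [] ValueError
```

The assertions only look at produced events and raised exceptions, never at getters on the aggregate.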

or maybe have a DSL to express those tests in a platform-independent way? like:

<Given>
</Given>
<When>
</When>
<Expect>
</Expect>

then get (for example) Ruby to make it even nicer to look at

HINT: give business people a comfortable way to share their knowledge with the development team

calling a method of an object == sending a message to an object

refucktoring

changing a test is making a _new_ test

versioning -> new tests with new versions

HINT: you don't want your devs to understand the framework - you want them to understand the CONCEPT

Building an event store on top of a SQL db

there's a detailed explanation available at cqrsinfo.com

RDBMS provides transactions out-of-the-box

with multiple event stores you can only guarantee events ordering within aggregate boundary (you can do global ordering with single event store)

metadata can be stored along with events (server the event originated in, security context, user, timestamp etc)

uses optimistic concurrency and carries the version between server & client

for storing events a stored procedure is recommended to avoid multiple server-db roundtrips:

BEGIN
    var s = select currentversion from aggregate
            where aggregateid = @1
    if (s == null)
        s = 0
        INSERT INTO AGGREGATES ....
    if (s != expectedVersion)
        throw new ConcurrencyException();
    foreach (event e)
        s++
        INSERT INTO EVENTLOG ...
    update aggregate set currentversion = s
END
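
The same version check can be sketched in Java against an in-memory store. This is only an illustration of the optimistic concurrency rule: the two maps stand in for the AGGREGATES and EVENTLOG tables, `synchronized` stands in for the database transaction, and all names are assumptions rather than anything shown in the training.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConcurrencyException extends RuntimeException {
    ConcurrencyException(String message) {
        super(message);
    }
}

// In-memory stand-in for the two tables used by the stored procedure.
class InMemoryEventStore {
    private final Map<String, Integer> currentVersion = new HashMap<>();
    private final Map<String, List<String>> eventLog = new HashMap<>();

    // Appends events only if the caller saw the latest version of the aggregate.
    synchronized int append(String aggregateId, int expectedVersion, List<String> events) {
        // An unknown aggregate starts at version 0, as in the pseudocode.
        int version = currentVersion.getOrDefault(aggregateId, 0);
        if (version != expectedVersion) {
            throw new ConcurrencyException(
                    "expected version " + expectedVersion + " but was " + version);
        }
        List<String> log = eventLog.computeIfAbsent(aggregateId, id -> new ArrayList<>());
        for (String event : events) {
            version++;          // every stored event bumps the version by one
            log.add(event);
        }
        currentVersion.put(aggregateId, version);
        return version;
    }

    synchronized List<String> eventsFor(String aggregateId) {
        return List.copyOf(eventLog.getOrDefault(aggregateId, List.of()));
    }
}
```

A client that loaded the aggregate at version 2 passes 2 as `expectedVersion`; if another writer got there first, the append is rejected and nothing is stored.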

snapshotting

brings in another table (to avoid concurrency exceptions with the servers writing real events into the store)
snapshotting is done asynchronously by a snapshotter
when to snapshot? when we've got a certain number of events not included in last snapshot (different aggregates can have different snapshotting rules depending on the type etc)
snapshots are NOT a necessity for most systems, only a heuristic brought in when we need a performance boost on the write side
snapshots can be versioned differently from the domain model (thanks to usage of the Memento pattern for snapshots)
don't do snapshots by default
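
A rough Java sketch of the two ideas above: a threshold rule for deciding when to snapshot, and rebuilding state from a snapshot plus only the events recorded after it. The "aggregate" here is just a running total and the events are plain numbers; both are simplifying assumptions, as are all the names.

```java
import java.util.List;

// Memento: a versioned copy of the state, kept separate from the domain class.
record Snapshot(int version, long total) {}

class Snapshotting {
    // Hypothetical rule: snapshot once this many events are not covered yet.
    static boolean shouldSnapshot(int currentVersion, int lastSnapshotVersion, int threshold) {
        return currentVersion - lastSnapshotVersion >= threshold;
    }

    // Rebuilds state from a snapshot plus the events stored after it.
    // Event number n sits at index n - 1, so replay starts at index from.version().
    static Snapshot replay(Snapshot from, List<Long> allEvents) {
        long total = from.total();
        for (int i = from.version(); i < allEvents.size(); i++) {
            total += allEvents.get(i);
        }
        return new Snapshot(allEvents.size(), total);
    }
}
```

Replaying from an empty snapshot and replaying from a later one must give the same result; the snapshot only shortens the replay.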

event store is a queue (little mutant freak database-queue hybrid baby)

CQRS vs CAP theorem

CQRS doesn't break the CAP theorem
we don't get all 3 properties at the same time
domain (write) side needs C and A
read model needs A and P

events, commands and DTOs are very strong boundaries allowing us to specialize within them

CQRS from business perspective

are all developers created equal?

if your answer is yes you're just wrong
if your answer is no - why do the same people work on domain, UI & data storage?

how much time (%) do you really spend working with your domain? 25%? 30%? 40%?
reasons to create systems in private sector:

make money
save money
manage risk (let's have it just in case)

organizations with a high level of maturity can have bigger teams
CQRS can get more people into the team (up to 2.5x) without decreasing maturity level

people can work in 3 teams independent of each other
communication between teams is low
create schema (think XSD) describing DTOs, commands & events in the estimation phase

if you can't - what the hell are you estimating?!

don't allow features to cross iteration boundaries - we want a working system, not components, at the end of the iteration

keep teams working on the same feature at the same time

when working with UI you can mock out read model & commands endpoint

same with developing other parts of the system

there are 4 parts of every task/story: domain, GUI, read model, integration

Moving to CQRS+ES architecture

one aggregate at a time
ask yourself: how are we gonna kill the system?

when an ES-based system dies, the event log is all that is left behind

you can migrate from ES to different system by creating a projection matching target data model
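
A small Java sketch of such a projection, assuming a hypothetical target model that is just a table of active items keyed by id. The event records and the `ActiveItemsProjection` class are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical events found in the log of the old system.
record ItemCreated(int id, String name) {}
record ItemRenamed(int id, String name) {}
record ItemDeactivated(int id) {}

class ActiveItemsProjection {
    // Folds the whole event log into an id -> name table of active items.
    static Map<Integer, String> project(List<Object> events) {
        Map<Integer, String> rows = new LinkedHashMap<>();
        for (Object event : events) {
            if (event instanceof ItemCreated e) {
                rows.put(e.id(), e.name());
            } else if (event instanceof ItemRenamed e) {
                rows.replace(e.id(), e.name());   // only touches existing rows
            } else if (event instanceof ItemDeactivated e) {
                rows.remove(e.id());
            }
        }
        return rows;
    }
}
```

The resulting map is what would be written to the target system; a different target data model only needs a different fold over the same log.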

CQRS vs stereotypical architecture

CQRS

writing to read side sucks
reading is easy

stereotypical architecture

writes are easy
queries suck

we're making a trade-off
both architectures produce an eventually consistent system
what about integration?

a CQRS+ES system has an integration model built-in!

our read model is in fact integrating with our domain
so we have actually tested our integration model!
we have a nice, push integration model

not an ugly, pull model

=== END OF DAY 2 ===

That's it for now, advanced topics coming next in notes from day 3.

Nighty-night!

Monday, 18 April 2011

Notes from DDD & CQRS training - day 1

My notes from the 1st day (11/04/2011) of the DDD/CQRS training by Greg Young, just as I took them - very little post-processing applied so it might be of little help for anyone but me (or maybe other participants of the training).

UIs:

CRUD (these suck)
task-based (users like those)

Aggregate:

group of objects we treat together as a whole
affect only a single aggregate - that lets you avoid distributed transactions (think horizontal partitioning/sharding)
put the method next to the state it operates on
denormalization helps to get the design right

Books to read:

Streamlined object modelling

time interval object
make implicit explicit

Object-oriented software construction 2nd edition by Bertrand Meyer

describes CQS

saving two objects = bad

business doesn't care about consistency
breaking bidirectional relationships

ask: do those things need to be consistent?
drop consistency of invariant

domain model != data model
if needed, a Domain Service can ensure consistency (this should really be used only as a last resort!)
collection of Transaction objects can have a domain meaning
AggregateRoot (AR) name makes sense for the entire aggregate
too much magic is bad (think ORM)
between aggregates use soft links (IDs) instead of references

TIP: keeping track of Optimistic Concurrency Exceptions makes an interesting statistic

EXERCISE: test-drive a Probability value object class with methods like combine(Probability), not() etc, encapsulating Java's BigDecimal (.NET's Decimal?). The tricky part: you can't have any kind of accessor methods to expose the internal state. What do you test first?

And now... suppose that standard BigDecimal implementation is too slow for your system. You have to change the implementation of the Probability class but retain the API. How many tests do you have to change?

personal note: this turned out to be an easy, yet an interesting exercise. Funny, how it changes the way you write code when you don't have those evil getters around. I really, really liked it!
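
One possible shape of such a class in Java, as a sketch rather than the solution from the training. With no accessors, value equality is the only thing a test can observe, so `equals` is a natural first test. Treating `combine` as multiplication (independent events) is my assumption.

```java
import java.math.BigDecimal;

// Value object with no getters: behaviour and equality are the whole API.
final class Probability {
    private final BigDecimal value;

    Probability(String value) {
        this(new BigDecimal(value));
    }

    private Probability(BigDecimal value) {
        if (value.signum() < 0 || value.compareTo(BigDecimal.ONE) > 0) {
            throw new IllegalArgumentException("probability must be within [0, 1]: " + value);
        }
        this.value = value;
    }

    // Probability that both independent events happen.
    Probability combine(Probability other) {
        return new Probability(value.multiply(other.value));
    }

    // Probability that the event does not happen.
    Probability not() {
        return new Probability(BigDecimal.ONE.subtract(value));
    }

    @Override
    public boolean equals(Object o) {
        // compareTo ignores scale, so 0.7 equals 0.70
        return o instanceof Probability && value.compareTo(((Probability) o).value) == 0;
    }

    @Override
    public int hashCode() {
        return value.stripTrailingZeros().hashCode();
    }
}
```

Tests written only against `combine`, `not` and `equals` do not mention BigDecimal at all, so swapping the internal representation should not require changing any of them.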

Repository:

Evans: works on aggregates, provides domain language to persistence infrastructure
Fowler: purely technical stuff

make contracts in the domain:

as narrow as possible
as explicit as possible

this will lower the conceptual coupling

Service:

any piece of procedural code
can be:

infrastructure
domain
application

but in the end they are all facades
if you do things right you might never need services
interface segregation

single method interfaces
role interfaces

Hexagonal architecture - ports & adapters

TIP: why not check check-ins for illegal dependencies (like domain depending on something else) and reject those that don't follow the rules?

On SOLID principles:

they are just heuristics
don't try to stick to them no matter the cost (duplication sometimes can be a good thing!)

EXAMPLE: When not to adhere to Interface Segregation?

when all methods go to the same source

class Stream implements ICanSeek, ICanRead, ICanWrite

// client code:
void DoesSomething(ICanSeek seeker, ICanRead reader) {
    seeker.seekTo(0);
    var x = reader.read();
    while (x != null) {
        // ...
        x = reader.read();
    }
}

ICanSeekRead extends ICanSeek, ICanRead != Foo implements ICanSeek, ICanRead

DI/IoC:

ServiceLocator is totally OK when resolving things at the same layer
about injecting into entities: most dependencies match the lifecycle of methods, not objects

void Submit(ISearchDriverLicences s) {

s.searchFor("something");

}

in functional programming DI can be implemented with function currying
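
A tiny Java illustration of that remark, using `java.util.function.Function` as a stand-in for a curried function. The `submit` function and the fake search are hypothetical: the first argument is the dependency, the second is the actual input.

```java
import java.util.function.Function;

class CurriedDi {
    // submit: dependency -> (query -> result)
    // Supplying the first argument "injects" the search function.
    static Function<Function<String, String>, Function<String, String>> submit() {
        return search -> query -> "submitted: " + search.apply(query);
    }
}
```

Applying `submit()` to a search implementation once (for example at application start-up) gives a plain `Function<String, String>` that callers use without knowing where the dependency came from.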

void F() { // coupling from F to G
    G.Something();
}

interface ISomething {
    void something();
}

void F(ISomething s) {
    s.something();
}

// sometimes DI is too much:
class ISomethingImpl : ISomething {
    void something() {
        Console.WriteLine("Hello world");
    }
}

we're overusing tools, frameworks

frameworks pollute our brains

Back to services:

ApplicationServices

should be role interfaces, one for every use case of the system
you should have no business logic in them (not even an if statement!)

isValid() antipattern

pure evil!
don't do that!
causes GIGO (Garbage In, Garbage Out)
entities end up being in one of 3 possible states:

valid
invalid
have no frakking clue

encapsulation is about protecting state - don't let people jam it!
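
The alternative to isValid() can be sketched in Java as an object that refuses to exist in an invalid state. `EmailAddress` and its deliberately naive check are made up for the illustration.

```java
// Validation happens once, in the constructor, so every instance is valid.
final class EmailAddress {
    private final String value;

    EmailAddress(String value) {
        // Naive rule, only for the example: there must be an '@' somewhere.
        if (value == null || !value.contains("@")) {
            throw new IllegalArgumentException("not an e-mail address: " + value);
        }
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Code that receives an `EmailAddress` never has to ask whether it is valid, which removes the "no clue" state entirely.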

Specification

(wikipedia)
predicate logic (think Prolog)
might need getters (protected/internal) exposed

but people will start using them as soon as they see them

composite specification

public class AService {

    private IEnumerable<Specification<Customer>> rules;

    AService(IEnumerable<Specification<Customer>> rules) {
        this.rules = rules;
    }

    void deactivate(Customer c) {
        if (!rules.areAllValid(c)) {
            throw new IllegalArgException();
        }
        ...
    }
}
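
A self-contained Java sketch of a composite specification, assuming a hypothetical `Customer` record. The `and`, `not` and `allOf` helpers are my own naming; `allOf` plays the role of the `areAllValid` call in the notes.

```java
import java.util.List;

// Hypothetical entity for the example.
record Customer(String name, boolean active, int openOrders) {}

// A specification is a named predicate that can be combined with others.
interface Specification<T> {
    boolean isSatisfiedBy(T candidate);

    default Specification<T> and(Specification<T> other) {
        return c -> isSatisfiedBy(c) && other.isSatisfiedBy(c);
    }

    default Specification<T> not() {
        return c -> !isSatisfiedBy(c);
    }

    // Composite: satisfied only when every rule in the list is satisfied.
    static <T> Specification<T> allOf(List<Specification<T>> rules) {
        return c -> rules.stream().allMatch(rule -> rule.isSatisfiedBy(c));
    }
}
```

A service can then take a single `Specification<Customer>` built with `allOf(...)` and throw when it is not satisfied, without knowing which individual rules are inside.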

=== END OF DAY 1 ===

Unfortunately, those notes don't show how absolutely awesome the training was. Really got my eyes wide open on many issues that I was somehow missing before.

On a related note - people really do use functional programming in real-world applications! After hearing that from Greg I started learning Clojure. I had a Prolog & Haskell course back at the university. I didn't like the Prolog part but really enjoyed writing minimized code in Haskell. Now I just have to find some time to refresh my functional programming skills. Or better - find some use for it so I can justify re-learning it at work ;)

DDD & CQRS training

Last week I participated in a great training by Greg Young in Kraków. I was amazed how much knowledge can get stuffed into my brain in a mere 3 days. Prior to the training I had watched a couple of Greg's presentations (yeah, including the 6.5h-long video from a CQRS training), read blogs and followed the DDD/CQRS news group, but I still found myself sitting there as if hypnotized for 3x8 hours. Not once during the training did Greg's answers make me think that he was talking about something he was unsure of. I find it annoying that some people (read: consultants) give talks about things they don't have any experience with.

I suppose I won't waste any more time trying to duplicate Piotr's blog entry about the training and will start rewriting my notes immediately.

Damn, just found the feedback form Greg asked us to fill in for him. The blog entry will have to wait for a moment.

Initial babble

The time has just come for me to write my first blog entry ever. Okay, maybe not write - publish. I have already written a few but never got to finish & publish one.

On my quest to pretend that I'm more of a social creature than I really am, I also set up a Twitter account. In fact it was Greg Young who pushed me to take this desperate step. He found it super weird that no one was tweeting during IT conferences in Poland. Maybe some people just don't want to miss their only chance to actually have a face2face chat with real people? Dunno, I just can't imagine wasting my time on Twitter while attending Greg's great DDD&CQRS training (I will, hopefully, write more about it in the days to come). I suppose some people (like myself) prefer to stay quiet until they've got something interesting to say. I prefer that to talking aloud about things I really have no idea about.

Oh wait, I think I got side-tracked, so I will get back on track and finish this short post. I hope I will have some interesting stuff to share with people on this blog. I don't really want to write HelloWorld-style entries but I suppose they could be useful for myself while learning a new technology or another evil Java framework. I will start with my notes from the training mentioned above; that will give me a chance to re-read and re-think them. And it will make them more durable than the paper I scribbled them on. Hopefully one day someone will find them worth reading.

This entry makes no sense whatsoever but I will publish it nonetheless, just to get myself started with blogging.

Stay tuned for more, I sincerely hope it will come.