Incompatible AVRO schema in Schema Registry

My company uses Apache Kafka as the spine for its next-generation architecture. Kafka is a distributed append-only log that can be used as a pub-sub mechanism. We use Kafka to publish events once business processes have completed successfully, allowing a high degree of decoupling between producers and consumers.

These events are encoded using Avro schemas. Avro is a binary serialization format that enables a compact representation of data, much more compact than, for instance, JSON. Given the high volume of events we publish to Kafka, using a compact format is critical.

In combination with Avro we use Confluent’s Schema Registry to manage our schemas. The registry provides a RESTful API to store and retrieve schemas.

Compatibility modes

The Schema Registry can control which schemas get registered, ensuring a certain level of compatibility between existing and new schemas. This compatibility can be set to one of the following four modes:

  • BACKWARD: a new schema is allowed if it can be used to read all data ever published into the corresponding topic.
  • FORWARD: a new schema is allowed if it can be used to write data that all previous schemas would be able to read.
  • FULL: a new schema is allowed if it fulfils both of the conditions above.
  • NONE: a schema is allowed as long as it is valid Avro.
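To make the BACKWARD rule a bit more concrete, here is a minimal sketch of one part of the check: a field added in the new schema needs a default, otherwise old data (which lacks that field) cannot be read with it. The function name and the OrderPlaced schemas are made up for illustration, and this is not how the Registry implements the check; real Avro schema resolution also looks at types, aliases, unions and more.

```python
def fields_missing_defaults(old_schema, new_schema):
    """Return names of fields added in new_schema that have no default.

    Toy check only: it ignores type changes, aliases, unions and nested records.
    """
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]


# Hypothetical schemas, written as plain Python dicts.
old = {
    "type": "record",
    "name": "OrderPlaced",
    "fields": [{"name": "id", "type": "string"}],
}
new_with_default = {
    "type": "record",
    "name": "OrderPlaced",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "currency", "type": "string", "default": "EUR"},
    ],
}
new_without_default = {
    "type": "record",
    "name": "OrderPlaced",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "currency", "type": "string"},
    ],
}

# An empty list means this particular rule is satisfied.
print(fields_missing_defaults(old, new_with_default))
print(fields_missing_defaults(old, new_without_default))
```

The first call reports no problems, while the second one flags the currency field, which is the kind of change BACKWARD mode would reject.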

By default, Schema Registry uses BACKWARD compatibility, which is most likely your preferred option in a production environment, unless you want to have a hard time with consumers that cannot understand events published with a newer, incompatible version of the schema.

Incompatible schemas

During development it is perfectly fine to replace a schema with another one that is incompatible. However, with its default setting, Schema Registry will refuse to update an existing schema to an incompatible newer version.

Fortunately, Schema Registry offers a complete API that allows you not only to register and retrieve schemas, but also to change some of its configuration. More specifically, it offers a /config endpoint that accepts a PUT with a new value for the compatibility setting.

The following command would change the compatibility setting to NONE for all schemas in the Registry:

curl -X PUT http://your-schema-registry-address/config \
     -d '{"compatibility": "NONE"}' \
     -H "Content-Type: application/json"

This way, the next registration will be allowed by the Registry as long as the newer schema is valid Avro. The setting can also be changed for a specific subject only, simply by appending its name to the path (i.e., /config/subject-name).

Once the incompatible schema has been registered, the setting should be set back to a more cautious value.
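The whole sequence (relax the setting for one subject, register the schema, put the setting back) could be scripted roughly as follows. This is a sketch under a few assumptions: the helper names are mine, the registry address is the same placeholder used in the curl command, BACKWARD is assumed to be the value you want to restore, and the schema is registered by POSTing to /subjects/subject-name/versions.

```python
import json
import urllib.parse
import urllib.request

REGISTRY_URL = "http://your-schema-registry-address"  # placeholder address
HEADERS = {"Content-Type": "application/json"}


def config_request(level, subject):
    """Build a PUT /config/<subject> request that sets the compatibility level."""
    url = REGISTRY_URL + "/config/" + urllib.parse.quote(subject, safe="")
    body = json.dumps({"compatibility": level}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT", headers=HEADERS)


def register_request(subject, schema):
    """Build a POST /subjects/<subject>/versions request for a new schema."""
    url = (REGISTRY_URL + "/subjects/"
           + urllib.parse.quote(subject, safe="") + "/versions")
    # The schema travels as a JSON string inside the JSON body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST", headers=HEADERS)


def register_incompatible(subject, schema, send=urllib.request.urlopen,
                          restore_to="BACKWARD"):
    """Relax compatibility, register the schema, then always restore the setting."""
    send(config_request("NONE", subject))
    try:
        send(register_request(subject, schema))
    finally:
        # Runs even if the registration fails, so NONE is never left behind.
        send(config_request(restore_to, subject))
```

Passing a different send function (for instance one that only logs the requests) makes it easy to do a dry run before pointing the script at a real registry.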

Summary

The combination of Kafka, Avro and Schema Registry is a great way to store your events in a very compact form while still retaining the ability to evolve the corresponding schemas.

However, some of the limitations that the Schema Registry imposes make less sense in a development environment. On some occasions, being able to make incompatible changes in a simple way is both necessary and advisable.

The Schema Registry API allows changing the compatibility setting to accept schemas that, otherwise, would be rejected.

FAQ: Story points

Story points have been around for a long time, but there are still way too many misunderstandings about them. Below I’m going to try to shed some light on the most common doubts.

 

What are Story Points?
They are a way to measure the effort necessary to implement a story, where a story is some requirement that an Agile team is going to convert into working software.

 

How do they work?
You pick a scale of values and define a baseline (a really simple story that you consider to be worth 1 point), and then you estimate everything relative to that baseline story. If a story requires the same or less effort than your baseline, you give it 1 point. If it is roughly twice as difficult, you assign 2 points. The values in the scale have to be spaced far enough apart that you don’t try to estimate “too precisely”. That is why many teams choose the Fibonacci series as their scale (1, 2, 3, 5, 8, etc.).

 

Wait a minute, what do you mean by “don’t try to estimate too precisely”? And why not just estimate using time?
I mean exactly that. When you use this technique, you are implicitly recognising that you can’t provide meaningful estimations with the level of detail that a time estimation requires. In plain English, you recognise that your time estimations are not accurate and therefore don’t have any value.

Instead, you use a more high-level, less precise measure like story points. Even if it is less precise than a time-based estimation, it is more valuable because it’s more stable and, over time, it will be more helpful to forecast team and project progress.

 

Is effort all I have to take into account when estimating with story points?
Not necessarily, although it is the most important bit. Other things that you may consider are:

  • How clear are the requirements and acceptance criteria in the story?
  • Does it look like there may be many technical or business unknowns that will be discovered during the implementation phase?
  • Is there any technical risk? For example, are you using a technology for the first time?

The more question marks around the story, the higher the number of story points.

 

Can I sum story points?

No, you can’t. They don’t represent numbers, they represent buckets. That means that, when you have a story that is the same or less effort than your baseline, you put it in the 1-point bucket. When it’s the same or less than twice the effort of your baseline, you throw it into the 2-point bucket, etc. You get the point.

Also, quite often the amount of time required to implement a 3-point story will be much more than 50% greater than that of a 2-point story. There is no linearity, not to mention that the higher the bucket, the wilder the oscillation in implementation time (which makes sense, because the risk is higher too).
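If it helps, the bucket idea can be written down as a tiny function: take your gut feeling of how big a story is relative to the baseline and drop it into the smallest bucket that fits. This is just a sketch; the scale (capped at 13 here) and the function name are my own choices, and the input is a rough guess, not a measurement.

```python
import bisect

# Assumed scale for the example: Fibonacci-like values, capped at 13.
SCALE = [1, 2, 3, 5, 8, 13]


def bucket(relative_effort):
    """Return the smallest bucket that is >= the effort relative to the baseline."""
    index = bisect.bisect_left(SCALE, relative_effort)
    if index == len(SCALE):
        raise ValueError("bigger than the largest bucket: consider splitting the story")
    return SCALE[index]


print(bucket(0.5))  # less effort than the baseline still lands in the 1-point bucket
print(bucket(2))    # about twice the baseline
print(bucket(4))    # there is no 4 on the scale, so it rounds up to the next bucket
```

Note how anything between 3 and 5 times the baseline ends up with the same value, which is exactly why the resulting numbers don’t behave like precise quantities.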

 

Are Story Points the only way to measure stories and forecast?

No, there are other metrics. T-shirt sizes are quite common too. Some people also consider using “ideal days”. This one is, more or less, a representation of how much work you can do in a perfect day, without meetings, without distractions and without any other problem. Then you assign those ideal days to stories and, if you’re working in sprints, over time you can measure how many actual ideal days your team has per sprint.

 

Do I have to use Story Points if I do Scrum?

Not at all. If you check the Scrum.org Scrum Guide, story points aren’t mentioned anywhere. That makes perfect sense because, contrary to what many people think, Scrum is quite a loose framework (not a process) that you have to fill in with your own practices to come up with a development process. Actually, years ago the Guide didn’t even mention estimations. It just mentioned that your backlog should be ordered, and it was up to the Product Owner to discover what that order should be.

 

Why should I use Story Points then?

You shouldn’t if you don’t know why you would use them. And you would use them if you want to provide some forecasting regarding your project. Basically, being able to answer the question: “when is this going to be done?”. Story Points help you answer that question because, over time, you get some sense of how many points you can deliver per unit of time, where that unit of time is usually your sprint size in weeks. Based on that, you can be reasonably confident about how many stories you can get done and when, on a relatively close time horizon. Don’t try to estimate a massive project using story points before even starting it; it won’t work. You won’t have enough understanding of the project, the stakeholders and the technology, and your estimations will have zero value.
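As a rough illustration of that forecasting arithmetic, here is a small sketch that turns the points delivered in recent sprints into a range of sprints needed for the remaining work. The function name and all the numbers are invented for the example, and the result should be read as a ballpark for the near future, nothing more.

```python
import math


def forecast_sprints(remaining_points, past_velocities):
    """Return (optimistic, average, pessimistic) number of sprints still needed."""
    if not past_velocities or min(past_velocities) <= 0:
        raise ValueError("need at least one past sprint with points delivered")
    best = max(past_velocities)
    worst = min(past_velocities)
    average = sum(past_velocities) / len(past_velocities)
    # Round up: a partially used sprint still has to happen.
    return (
        math.ceil(remaining_points / best),
        math.ceil(remaining_points / average),
        math.ceil(remaining_points / worst),
    )


# Hypothetical team: 60 points left, last three sprints delivered 18, 22 and 20.
print(forecast_sprints(60, [18, 22, 20]))  # (3, 3, 4)
```

Giving a range instead of a single number keeps the answer honest about how much the delivered points vary from sprint to sprint.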

 

Why should I estimate in the first place?

Well, if you are a developer, estimating doesn’t add any value to you; zero. You just want to get a list of things to do and nail them, and you don’t need to communicate in advance when they’ll be done, right? However, some people would argue that part of being a professional engineer includes providing meaningful estimations regarding delivery of software to the rest of the business. In better words than mine:

Avoiding responsibility for estimates is another way of saying, “I’m not ready to be relied upon for building critical pieces of infrastructure.” All businesses rely on estimates, and all engineers working on a project are involved in Joint Activity, which means that they have a responsibility to others to make themselves interpredictable. In general, mature engineers are comfortable with working within some nonzero amount of uncertainty and risk.

So man up and come up with some respectable estimations that you’re willing to commit to.

 

Should Management measure a team’s productivity using Story Points?

NEVER. That is one of the biggest mistakes that can be made. If you do so, you’re going to make two mistakes in one:

  • You will ruin story points as a tool to estimate. Eventually, every human being tends to game the rules of any system, even unconsciously. If you measure people’s productivity with points, they will just inflate their estimations to make it look like more points are delivered per sprint and, therefore, like the team is doing more. Wrong and useless.
  • You’ll miss the opportunity to use a proper and useful measure, like business value. I’m not saying that business value is easy to measure, but it is definitely worth trying instead of measuring something that is completely irrelevant and easy to game.

 

What’s the difference with Planning Poker?

Planning Poker is just an estimation technique, not an estimation measure. You use Planning Poker as a way to take advantage of the “Wisdom of Crowds”. Planning Poker is useful because:

  • Estimations are done and presented without knowing other members’ opinion. Therefore more junior/shy members won’t be influenced by estimations presented by senior/stronger players.
  • If estimations don’t match, a healthy debate is triggered where those who chose the biggest and smallest numbers bring more information into the discussion. That improves the final estimation and also helps the whole team benefit from the insights of each member.
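A round of Planning Poker could be modelled roughly like this: everybody votes in secret, the cards are revealed at once and, if they differ, the people with the lowest and highest cards are asked to explain their reasoning before voting again. The function and the names in the example are hypothetical; teams run this in many slightly different ways.

```python
def reveal(votes):
    """Reveal a round of secret votes.

    votes maps each team member to the card they chose. Returns the agreed
    value if everybody matched, otherwise the members holding the lowest and
    highest cards, who should explain their reasoning before a new round.
    """
    if not votes:
        raise ValueError("nobody voted")
    low, high = min(votes.values()), max(votes.values())
    if low == high:
        return {"consensus": low, "explain": []}
    outliers = sorted(name for name, card in votes.items() if card in (low, high))
    return {"consensus": None, "explain": outliers}


print(reveal({"ana": 3, "bo": 3, "cy": 3}))
print(reveal({"ana": 2, "bo": 8, "cy": 3}))
```

In the second round of the example there is no consensus, so ana and bo would be the ones to kick off the debate.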

 

Is that all?

Not really. There are many other interesting things on this topic, like trying to correlate points with time (bad idea IMHO), what a good scale for points should look like, what to do if you realize after implementing a story that it was over- or underestimated, how to manage scope creep, etc. Maybe for another day.