A bit about me
- I'm from Poland (Europe)
- 9 years of commercial experience in IT
- 7 years with Ruby and Rails
- I find Ruby quite useful
- I love open-source
- Interested in quality-assurance automation tools
- Running a blog mostly about Ruby related stuff
Please notify me if...
- I speak too fast
- I should repeat something
- I should explain something better
- You have any questions
What is Apache Kafka?
- Kafka is a high-throughput distributed messaging system
- Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization
- It can be elastically and transparently expanded without downtime
- It provides broadcasting to many applications
- It allows you to build event-based systems
Who uses Apache Kafka?
- LinkedIn
- Yahoo
- Twitter
- Netflix
- Square
- Spotify
- Pinterest
- Uber
- Tumblr
- Cisco
- Foursquare
- Shopify
- Oracle
- Urban Airship
- OVH
- And many more...
What is Karafka?
- Karafka = Kafka + Ruby => KaR(uby)afka
- It is a microframework
- It was designed to simplify Kafka based applications development
- It allows developers to build "Rails like" apps that consume and produce messages
Why did we develop Karafka?
- We needed a tool that would allow us to build applications faster
- We needed a tool that would allow us to process messages faster
- We needed a tool that would allow us to handle events and messages from many sources and process them the same way
- Because a single message can be automatically delivered to many Karafka applications
Why even bother with messaging when there is HTTP and REST?
- HTTP does not provide broadcasting
- We often need to trigger many actions based on a single event
- We don't want to maintain internal API clients
- With a message broker you can replace microservices transparently
- You can obtain better microservices isolation
- Because you can create new microservices that use multiple different events from many sources
It really is about messaging
Real life is asynchronous
Microservices without broadcasting
Without a broker you need to add code to both ends of your SOA system
Microservices with broadcasting
With a broker, all you need to know is the topic you want to listen on and the message format
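The broadcasting idea can be illustrated with a toy in-memory broker. This is only a sketch: `ToyBroker` and the `:video_created` payload are hypothetical names made up for illustration, and nothing here talks to Kafka or uses Karafka. It shows that the producer publishes once and every subscriber of the topic receives the message, knowing only the topic name and the JSON format.

```ruby
require 'json'

# Hypothetical toy broker for illustration only (not Kafka, not Karafka)
class ToyBroker
  def initialize
    # Topic name => list of subscriber blocks
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  # A consumer only needs to know the topic name and the message format
  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  # The producer does not know who (if anyone) is listening
  def publish(topic, message)
    @subscribers[topic].each { |handler| handler.call(message) }
  end
end

broker = ToyBroker.new
titles_for_email = []
ids_for_stats = []

# Two independent "applications" listen on the same topic
broker.subscribe(:video_created) { |raw| titles_for_email << JSON.parse(raw)['title'] }
broker.subscribe(:video_created) { |raw| ids_for_stats << JSON.parse(raw)['id'] }

# A single published message reaches both of them
broker.publish(:video_created, JSON.generate(id: 1, title: 'Intro'))
```

Adding a third consumer means one more `subscribe` call; the publishing side stays untouched.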
Karafka builds on tools that are already well known
- Ruby-Kafka
- Celluloid to introduce sockets clustering inside threads
- Sidekiq to support background data processing
- Rails app structure concept for bigger apps
- Sinatra app structure concept for small apps
Karafka ecosystem
Each part can be used independently
- Karafka Framework - Engine to process incoming messages
- WaterDrop - Ruby-Kafka based library for outgoing messages
- Worker Glass - Worker wrapper that provides an optional timeout and an after-failure hook (reentrancy)
Karafka framework components
Apart from the implementation details, Karafka consists of a few logical parts:
- Messages Consumer (Karafka::Connection::Consumer)
- Router (Karafka::Routing::Router)
- Base Controller (Karafka::BaseController)
- Base Worker (Karafka::BaseWorker)
- CLI (Karafka::Cli)
How can I start using it?
# Gemfile
source 'https://rubygems.org'
gem 'karafka', github: 'karafka/karafka'
bundle install
bundle exec karafka install
Then open app.rb and update configuration settings
All the configuration options are described here: github.com/karafka/karafka
Karafka conventions and features
Karafka has a routing engine similar to the Rails one (just much smaller)
App.routes.draw do
  topic :incoming_messages do
    group :composed_application
    controller Videos::DetailsController
    worker Workers::DetailsWorker
    parser Parsers::BinaryToJson
    interchanger Interchangers::Binary
  end

  # If you work with JSON data, only controller is required
  topic :new_videos do
    controller Videos::NewVideosController
  end
end
Karafka conventions and features
NewVideosController #=> NewVideosWorker
Users::PaymentsController #=> Users::PaymentsWorker
By default Karafka builds a worker class per controller based on a controller name. This will allow you to prioritize (if needed) Sidekiq workers
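The naming convention above can be sketched in plain Ruby. The `worker_name_for` helper is hypothetical and written only to illustrate the mapping; it is not Karafka's internal implementation, which may build the class differently.

```ruby
# Hypothetical helper illustrating the controller => worker naming convention.
# It swaps a trailing "Controller" for "Worker" and keeps the namespace intact.
def worker_name_for(controller_name)
  controller_name.sub(/Controller\z/, 'Worker')
end

top_level = worker_name_for('NewVideosController')
namespaced = worker_name_for('Users::PaymentsController')
```

Because each controller gets its own worker class name, each one can be given its own Sidekiq queue or priority.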
Karafka conventions and features
You can override all of the default behaviours
App.routes.draw do
  # If you work with JSON data, only controller is required
  topic :new_videos do
    controller Videos::NewVideosController
    # Instead of a default Videos::NewVideosWorker
    worker Videos::DifferentWorker
  end
end
Karafka conventions and features
Karafka controllers are simple. All you need is a #perform method that will be executed asynchronously in response to an incoming message
class CreateVideosController < Karafka::BaseController
  def perform
    Video.create!(params[:video])
  end
end
Karafka conventions and features
The #before_enqueue filter acts in a similar way to the Rails #before_action
class CreateVideosController < Karafka::BaseController
  before_enqueue -> {
    # Reject old incoming messages
    # When before_enqueue throws :abort,
    # the task won't be sent to Sidekiq
    throw(:abort) if params[:sent_at] < 1.minute.ago
  }
end
It can be used to provide first-layer data filtering. If the filter aborts, the Sidekiq task won't be scheduled
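The halting mechanism in this filter relies on Ruby's built-in catch/throw. Below is a minimal plain-Ruby sketch of that idea; `enqueue_if_allowed` and `reject_old` are hypothetical names, and Karafka's own callback handling may work differently.

```ruby
# Hypothetical mini filter runner showing how throw(:abort) can halt processing
def enqueue_if_allowed(message, filter)
  aborted = true
  catch(:abort) do
    filter.call(message)
    aborted = false # Reached only when the filter did not throw
  end
  aborted ? :skipped : :enqueued
end

# Reject messages that were sent more than 60 seconds ago
reject_old = lambda do |message|
  throw(:abort) if message[:sent_at] < Time.now - 60
end

fresh_result = enqueue_if_allowed({ sent_at: Time.now }, reject_old)
stale_result = enqueue_if_allowed({ sent_at: Time.now - 120 }, reject_old)
```

With this shape, a fresh message would be enqueued and a two-minute-old one would be skipped before any background job is created.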
Karafka conventions and features
There are also a few useful CLI commands available:
bundle exec karafka [COMMAND]
console # Start the Karafka console (short-cut alias: "c")
flow # Print application data flow (incoming => outgoing)
help # Describe available commands or one specific command
info # Print configuration details and other options
install # Install all required things for Karafka application
routes # Print out all defined routes in alphabetical order
server # Start the Karafka server (short-cut alias: "s")
worker # Start the Karafka Sidekiq worker (short-cut alias: "w")
Karafka performance
- Is strongly dependent on what you do in your code
- Redis performance (for Sidekiq) is a factor as well
- Message size is a factor
- Single process can handle around 30 000 messages/sec
- Less than 1 ms to send a message in the slowest (secure) mode (Kafka request.required.acks = -1)
- Less than 1/10 of a ms to send a message in the 0 mode (Kafka request.required.acks = 0)
Karafka framework scalability
Each scaling strategy targets a different problem
Scaling strategies can be combined
The following strategies are available:
- Scaling using multiple Karafka threads
- Scaling using Kafka partitions
- Scaling using Karafka process clustering (in progress)
Scaling using multiple threads
- Good when you have multiple topics that are not 100% utilized
- Good when you want to provide parallelism but still have a single process running
- Generally the easiest way to have multiple controllers listening at the same time
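The "one listener per topic inside a single process" idea can be sketched with plain Ruby threads and queues. This is an assumption-laden illustration: Karafka itself relies on Celluloid rather than raw threads, and the topic names and the `nil` stop signal are made up for this example.

```ruby
# One in-memory queue per topic stands in for a Kafka topic (illustration only)
queues = { videos: Queue.new, users: Queue.new }
results = Queue.new

# One listener thread per topic, all inside a single process
threads = queues.map do |topic, queue|
  Thread.new do
    # A nil message is used here as a stop signal
    while (message = queue.pop)
      results << [topic, message]
    end
  end
end

queues[:videos] << 'v1'
queues[:users] << 'u1'

# Ask every listener to stop, then wait for all of them
queues.each_value { |queue| queue << nil }
threads.each(&:join)

processed = []
processed << results.pop until results.empty?
```

Each topic is drained independently, so a quiet topic does not block a busy one.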
Scaling using Kafka partitions
- Topic partition is the unit of parallelism in Kafka
- Partitions are an answer to heavy duty topics
- Karafka processes automatically rebalance between available partitions
- Karafka requires topic partitioning when you want to handle more than 30 000 messages per second per topic
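To picture how partitions spread across processes, here is a simplified round-robin assignment in plain Ruby. It is only an illustration: `assign_partitions` is a hypothetical helper, and the real rebalancing is coordinated by Kafka and may use a different strategy.

```ruby
# Simplified round-robin partition assignment (illustration only, not Kafka's assignor)
def assign_partitions(partitions, consumers)
  assignment = consumers.each_with_object({}) { |consumer, hash| hash[consumer] = [] }
  partitions.each_with_index do |partition, index|
    assignment[consumers[index % consumers.size]] << partition
  end
  assignment
end

partitions = [0, 1, 2, 3, 4, 5]

# Two processes share six partitions...
two_processes = assign_partitions(partitions, %w[karafka-1 karafka-2])
# ...and when a third process joins, the partitions are spread again
three_processes = assign_partitions(partitions, %w[karafka-1 karafka-2 karafka-3])
```

Since a partition is the unit of parallelism, adding processes beyond the partition count would leave the extra ones idle in this model.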
Scaling using Karafka processes clustering
- Single Karafka process can handle up to 30 000 messages per second (total)
- It means that the bigger your application is, the slower it gets (per controller)
- Thanks to process clustering, each Karafka process will listen only to a selected part of topics
- That way, with a 10-process cluster, we can increase throughput to more than 300 000 messages per second
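The arithmetic behind these numbers can be written down as a back-of-the-envelope sketch. It assumes the 30 000 messages/sec figure quoted above, an even split of load between topics, and linear scaling across processes; the helper names are hypothetical and real-world results will depend on your code and infrastructure.

```ruby
# Per-process figure quoted on the slides (messages per second)
PER_PROCESS_LIMIT = 30_000

# Rough share per topic when one process listens to many topics (assumes an even split)
def per_topic_share(topics_count, limit = PER_PROCESS_LIMIT)
  limit / topics_count
end

# Rough ceiling for a cluster (assumes linear scaling across processes)
def cluster_ceiling(processes, limit = PER_PROCESS_LIMIT)
  processes * limit
end

single_process_share = per_topic_share(10) # one process listening to 10 topics
ten_process_ceiling = cluster_ceiling(10)  # ten processes splitting the topics
```

Under these assumptions, ten topics on one process get about 3 000 messages/sec each, while ten processes reach roughly 300 000 messages/sec in total.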
I want to integrate it with my Rails/Sinatra app
- The best approach is to start generating messages from your current applications via WaterDrop
- With WaterDrop you can tell your Karafka apps what your other Ruby components are doing
def create
  video = Video.create!(params[:video])
  WaterDrop::Message.new(:video_created, video.to_json).send!
  respond_with video
end
- Once you start sending messages, you can extract functionalities and responsibilities and move them to Karafka based applications
WANT TO CONTRIBUTE?
github.com/karafka
- The more people star it, the more people use it
- The more people use it, the more people star it
- There are many issues you can help us fix
- We use Code Climate and Travis with many QA tools to maintain the quality
Karafka – Ruby framework for building Kafka message based applications
Maciej Mensfeld
twitter: @maciejmensfeld
www: mensfeld.pl
e-mail: maciej@mensfeld.pl