Finding Tech

Always find what I found
Posts tagged "Aggregation"

Sentry[https://github.com/dcramer/sentry]

Sentry is a realtime event logging and aggregation platform. It specializes in monitoring errors and extracting all the information needed to do a proper post-mortem without any of the hassle of the standard user feedback loop.

Sentry is also available as a hosted service at https://www.getsentry.com

Flume[https://cwiki.apache.org/FLUME/]

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple, extensible data model that allows for online analytic applications.

Flume is open source under the Apache License, Version 2.0.

ql.io[http://ql.io]

A declarative, evented, data-retrieval and aggregation gateway for HTTP APIs. Through ql.io, we want to help application developers increase engineering clock speed and improve end user experience. ql.io can reduce the number of lines of code required to call multiple HTTP APIs while simultaneously bringing down network latency and bandwidth usage in certain use cases.

ql.io consists of a domain-specific language inspired by SQL and JSON, and a Node.js-based runtime to process scripts written in that language. Check out ql.io on GitHub for the source and http://ql.io for demos, examples, and docs.

Why ql.io?

HTTP based APIs – some call them services – are an integral part of eBay’s architecture. This is true not just for eBay, but for most companies that use the Web for content and information delivery. Within eBay’s platform engineering group, we noticed several pain points for application developers attempting to get the data they need from APIs:

  • Most use cases require accessing multiple APIs – which involves making several network round trips.
  • Often those API requests have interdependencies – which requires programmatic orchestration of HTTP requests – making some requests in parallel and some in sequence to satisfy the dependencies and yet keep the overall latency low.
  • APIs are not always consistent as they evolve based on the API producers’ needs – which makes code noisier in order to normalize inconsistencies.
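To make these pain points concrete, here is a minimal Python sketch of the hand-written orchestration the list describes: one call that must finish first, a set of dependent calls that can run in parallel, and a normalization step for inconsistent field names. The three fetch_* functions, their fields, and the watchlist scenario are hypothetical stand-ins that simulate latency with a short sleep; they are not real eBay APIs and this is not ql.io code.

```python
import asyncio

# Hypothetical stand-ins for HTTP API calls; a real client would do network I/O.
async def fetch_user(user_id):
    await asyncio.sleep(0.01)
    return {"id": user_id, "watchlist": [101, 102]}

async def fetch_item(item_id):
    await asyncio.sleep(0.01)
    return {"itemId": item_id, "title": f"Item {item_id}", "price": 9.99}

async def fetch_shipping(item_id):
    await asyncio.sleep(0.01)
    return {"item_id": item_id, "days": 3}

async def watchlist_view(user_id):
    # Sequential step: the item IDs are only known once the user call returns.
    user = await fetch_user(user_id)
    ids = user["watchlist"]

    # Parallel step (fork): item and shipping lookups do not depend on each other.
    items, shipping = await asyncio.gather(
        asyncio.gather(*(fetch_item(i) for i in ids)),
        asyncio.gather(*(fetch_shipping(i) for i in ids)),
    )

    # Join and normalize: the two APIs spell the key differently
    # ("itemId" vs "item_id"), so the caller has to reconcile them.
    days = {s["item_id"]: s["days"] for s in shipping}
    return [
        {"id": it["itemId"], "title": it["title"], "days": days[it["itemId"]]}
        for it in items
    ]

result = asyncio.run(watchlist_view(7))
print(result)
```

Even in this toy case, most of the code is plumbing (ordering, fan-out, key reconciliation) rather than the view the application actually wants.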

We found that these issues have two critical impacts: engineering clock speed and end user experience.

  • Engineering clocks slow down because developers need to account for dependencies between API calls, and to arrange those calls to optimize overall latency. Implementing orchestration logic involves multi-threaded fork-join code, leads to code bloat, and distracts from the main business use case that the developer is striving to support.
  • End user experience suffers due to high bandwidth usage as well as the latency caused by the number of requests and the processing overhead of non-optimized responses from APIs.

The goal of ql.io is to ease both pain points:

  • By using a SQL- and JSON-inspired DSL to declare API calls, their interdependencies, forks and joins, and projections, you can cut down the number of lines of code from hundreds of lines to a few, and the development time from entire sprints to mere hours. Using this language, you can create new consumer-centric interfaces that are optimized for your application’s requirements.
  • You can deploy ql.io as an HTTP gateway between client applications and API servers so that ql.io can process and condense the data to just the fields that the client needs. This helps reduce the number of requests that the client needs to make as well as the amount of data transported to clients.
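The condensing step in the second point can be illustrated with a small Python sketch. It assumes a hypothetical upstream response and a hypothetical project helper; it shows only the idea of projecting a response down to the fields a client asks for, not how ql.io implements it.

```python
import json

def project(record, fields):
    """Keep only the requested fields that are present in the record."""
    return {name: record[name] for name in fields if name in record}

# Hypothetical upstream API response with fields the client never displays.
full_response = {
    "itemId": 101,
    "title": "Item 101",
    "description": "A long description the client never displays. " * 20,
    "seller": {"name": "example-seller", "rating": 4.8},
}

condensed = project(full_response, ["itemId", "title"])

# Compare serialized sizes to see the effect on the bytes sent to the client.
full_size = len(json.dumps(full_response))
condensed_size = len(json.dumps(condensed))
print(condensed, full_size, condensed_size)
```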

Scribe[https://github.com/facebook/scribe]

Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups. If the central scribe server isn’t available, the local scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central scribe server(s) can write the messages to the files that are their final destination, typically on an NFS filer or a distributed filesystem, or send them to another layer of scribe servers.
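The store-and-forward behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Scribe's implementation: CentralServer and LocalServer are hypothetical classes, and an in-memory deque stands in for the file on local disk.

```python
from collections import deque

class CentralServer:
    """Hypothetical central aggregator that can be switched offline."""
    def __init__(self):
        self.available = True
        self.received = []

    def send(self, batch):
        if not self.available:
            raise ConnectionError("central server unavailable")
        self.received.extend(batch)

class LocalServer:
    """Hypothetical per-node server that buffers while the central one is down."""
    def __init__(self, central):
        self.central = central
        self.buffer = deque()  # stands in for the file on local disk

    def log(self, category, message):
        self.buffer.append((category, message))
        self.flush()

    def flush(self):
        if not self.buffer:
            return
        try:
            self.central.send(list(self.buffer))
        except ConnectionError:
            return  # keep everything buffered until the central server recovers
        self.buffer.clear()

central = CentralServer()
local = LocalServer(central)

local.log("web", "a")      # delivered immediately
central.available = False
local.log("web", "b")      # buffered locally
local.log("web", "c")      # buffered locally
central.available = True
local.flush()              # buffered messages are sent in their original order
```

Because the buffer is drained only after a successful send, a failed attempt leaves the pending messages in place for the next retry.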

Scribe is unique in that clients log entries consisting of two strings, a category and a message. The category is a high level description of the intended destination of the message and can have a specific configuration in the scribe server, which allows data stores to be moved by changing the scribe configuration instead of client code. The server also allows for configurations based on category prefix, and a default configuration that can insert the category name in the file path. Flexibility and extensibility are provided through the “store” abstraction. Stores are loaded dynamically based on a configuration file, and can be changed at runtime without stopping the server. Stores are implemented as a class hierarchy, and stores can contain other stores. This allows a user to chain features together in different orders and combinations by changing only the configuration.
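A rough Python sketch of category-based routing and nested stores follows. All class names (ListStore, FanOutStore, Router) are hypothetical, the configuration is plain Python rather than Scribe's configuration file format, and the "longest matching prefix wins" rule is an assumption made for the example.

```python
class ListStore:
    """Leaf store: keeps messages in a list (stands in for a file destination)."""
    def __init__(self):
        self.messages = []

    def handle(self, category, message):
        self.messages.append((category, message))

class FanOutStore:
    """Composite store: a store that contains other stores and forwards to each."""
    def __init__(self, *children):
        self.children = children

    def handle(self, category, message):
        for child in self.children:
            child.handle(category, message)

class Router:
    """Pick a store by exact category, then by prefix, then fall back to a default."""
    def __init__(self, exact, prefixes, default):
        self.exact = exact
        self.prefixes = prefixes
        self.default = default

    def store_for(self, category):
        if category in self.exact:
            return self.exact[category]
        matches = [p for p in self.prefixes if category.startswith(p)]
        if matches:
            # Assumption for this sketch: the longest matching prefix wins.
            return self.prefixes[max(matches, key=len)]
        return self.default

    def log(self, category, message):
        self.store_for(category).handle(category, message)

ads, audit, web, misc = ListStore(), ListStore(), ListStore(), ListStore()
router = Router(
    exact={"ads": FanOutStore(ads, audit)},  # one category, two destinations
    prefixes={"web_": web},
    default=misc,
)

router.log("ads", "click")
router.log("web_frontend", "GET /")
router.log("cron", "done")
```

Clients only ever supply a category and a message; moving "ads" to a different destination means changing the Router arguments, not the calling code.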