Contact: sandbox-developers at

Source code

Source browser is available at;a=summary

Octopus overview

Octopus is a media engine for controlling audio and video streams. Media streams can be local files or actual streams over the network.

Octopus provides a higher-level API that end-user applications use to manage multimedia content. Target applications include, e.g., media players and voice and video call applications. Octopus itself runs as a background service that several applications can use at the same time. The client API is currently a D-Bus API; in the next phase an OpenMAX AL API will also be provided.

For media content operations Octopus uses either GStreamer or OpenMAX IL components.

Implementation status

  • Basic audio and video playback tested on several embedded devices.
  • Includes a simple GTK based player UI.
  • D-Bus API implementation is nearly complete. Missing things:
    • Endpoint descriptions
    • Advanced video output management
    • Global volume and mute
  • Routing multiple sources to the same destination or through a common component does not work
  • Some bugs and memory leaks may remain. Should be tested with valgrind.
  • Native OpenMAX backend is rather limited and can currently only play MP3 files
  • Selecting variants of the same backend (such as native gstreamer / gstomx) requires messing with the installed .ocd files

Technical documentation

Octopus is a know-it-all media handler. It provides an interface for programs to request playback from a source and control the destination of the media content. For example, one can take a local file as a source, and set the destination to local screen or to a remote destination (a contact for example).

Media sources

The media engine will support the following media sources (in priority order):

  • Local media files
    • Described by URI
    • GNOME-VFS elements handle these
  • Local media inputs
    • Microphone
    • Video camera
  • RTP streams
    • Described by SDP
  • Remote media
    • Described by URI
    • GNOME-VFS handles at least some of these

Media destinations

The media engine will support the following media destinations (in priority order):

  • Local media outputs
    • Speakers/headphones
      • Described by HAL IDs
    • Local screen
      • Described by XID (Note: big issue, how do we manage this sensibly? XVideo + composite = trouble?)
  • RTP streams
    • Described by SDP
  • Contacts
    • Translates to RTP streams under the hood

Media engine architecture

Octopus architecture

Description of key components:

  • OctopusServer is the toplevel class and handles communications with the outside through D-Bus
  • Backend contains the logic for building and managing routes. There are separate backend classes for each media framework
  • Component definition files (not shown) are used to inform the backend of available components and their properties
  • Routes represent individual media pipelines
  • Modules can be used to provide additional functionality for backends (such as the QoS manager of the OMP project)

Component definitions

To make adding new components easy, component definition files are used. These are flat text files that describe the framework elements making up each component. Each backend has a prefix and loads all files that match the glob "<prefix>*.ocd". The following is an example of a typical source component:

   component src_gnomevfs
      element gnomevfssrc
         source local-file-gnomevfs
      element typefind
      dynamic

The first line starts the component definition and gives it a name. The name is of no consequence and is only used for debugging output. By convention component names have a prefix describing their general function (such as src, demux, dec, sink).

The second line specifies that the component starts with an element of type gnomevfssrc. How the name is interpreted is backend-specific: the GStreamer and OpenMAX backends use it directly as a framework element name. Indentation is not mandatory but makes the file easier to read.

The third line assigns an endpoint type (source) and name (local-file-gnomevfs) to the previous element. Endpoint names can be anything, but by convention they contain a prefix describing the function of the endpoint (local-file) and a part that makes the name unique. When looking for endpoints, prefix matching is used, so a request for local-file matches this endpoint.

The fourth line specifies another element for the component. Multiple elements in a component are always linked together - it is an error if the linking is impossible. The typefind element is necessary to find out what kind of media comes from the source.

The fifth and last line flags the component as dynamic. This tells the pipeline builder that it is acceptable to stop here even if the requested destination has not been reached, because the pipeline must process some data before building can continue. A backend may place additional restrictions on the use of this flag. For the GStreamer backend, the last element of the pipeline should be either typefind or something with dynamic pads (such as a demuxer).

There are two additional keywords that are recognized in the component definitions:

parameter <name> <value> - sets a parameter for an element. The GStreamer backend maps these directly to GObject properties.

priority <value> - sets component priority. Components with a higher (more positive) priority value are considered before those that have a lower value.
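The format described above is simple enough to read with a few lines of code. The following is a minimal sketch of such a reader, not the parser Octopus actually uses. It assumes that the dynamic flag is spelled `dynamic`, that a `destination` keyword exists as the counterpart of `source`, that priorities are integers defaulting to 0, and that endpoint and parameter lines apply to the most recent element. The `sink_example` component and its `examplesink` element are hypothetical and exist only to exercise the `parameter` and `priority` keywords.

```python
from dataclasses import dataclass, field
from typing import Optional

# Keywords that take at least one argument (the flag keyword takes none).
NEEDS_ARGUMENT = {"component", "element", "source", "destination",
                  "parameter", "priority"}


@dataclass
class Element:
    name: str
    endpoint: Optional[tuple] = None  # (endpoint type, endpoint name)
    parameters: dict = field(default_factory=dict)


@dataclass
class Component:
    name: str
    elements: list = field(default_factory=list)
    dynamic: bool = False
    priority: int = 0  # assumed default


def parse_ocd(text):
    """Parse component definitions; indentation is ignored."""
    components = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        words = raw.split()
        if not words:
            continue  # skip blank lines
        keyword, args = words[0], words[1:]
        if keyword in NEEDS_ARGUMENT and not args:
            raise ValueError(f"line {lineno}: '{keyword}' needs an argument")
        if keyword == "component":
            components.append(Component(name=args[0]))
            continue
        if not components:
            raise ValueError(f"line {lineno}: '{keyword}' outside a component")
        component = components[-1]
        if keyword == "element":
            component.elements.append(Element(name=args[0]))
        elif keyword == "dynamic":  # assumed spelling of the flag
            component.dynamic = True
        elif keyword == "priority":
            component.priority = int(args[0])
        elif keyword in ("source", "destination", "parameter"):
            if not component.elements:
                raise ValueError(f"line {lineno}: '{keyword}' before any element")
            element = component.elements[-1]
            if keyword == "parameter":
                element.parameters[args[0]] = " ".join(args[1:])
            else:
                element.endpoint = (keyword, args[0])
        else:
            raise ValueError(f"line {lineno}: unknown keyword '{keyword}'")
    return components


EXAMPLE = """\
component src_gnomevfs
   element gnomevfssrc
      source local-file-gnomevfs
   element typefind
   dynamic

component sink_example
   element examplesink
      parameter sync true
      destination example-output
   priority 10
"""

components = parse_ocd(EXAMPLE)
# Components with a higher priority value are considered first.
by_priority = sorted(components, key=lambda c: c.priority, reverse=True)
print([c.name for c in by_priority])
```

Keeping the parser independent of any media framework leaves the interpretation of element names to the backend, which matches the backend-specific behaviour described above.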

Building pipelines

Each backend implements its own logic for building a media pipeline. Generally the process should go as follows:

  1. Select a source component according to the endpoint name. The source URI may not be available at this point.
  2. Try to build a route to each of the destinations. If the destination or a dynamic component is reached, accept the (partial) route and return.
  3. If the build attempt ended with a dynamic component, wait until playback starts.
  4. When more information about the media is available, go back to step 2.
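The four steps above can be sketched as a small search. This is an illustration under several assumptions and not the logic of any real backend: linking is modelled with hypothetical media tags (`accepts` / `produces`), only one destination is handled, the destination endpoint name `audio-out-default` and the decoder components are invented for the example, and a depth-first search that tries higher-priority components first stands in for whatever strategy a backend really uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Comp:
    name: str
    accepts: Optional[str]    # media tag consumed; None for a source
    produces: Optional[str]   # media tag produced; None for a sink or if unknown
    endpoint: Optional[str] = None
    dynamic: bool = False
    priority: int = 0


def select_source(components, request):
    """Step 1: prefix-match the endpoint name, preferring higher priority."""
    matches = [c for c in components
               if c.accepts is None and c.endpoint
               and c.endpoint.startswith(request)]
    return max(matches, key=lambda c: c.priority, default=None)


def build(components, route, destination, media):
    """Step 2: extend the route until the destination or a dynamic stop.

    `media` is the tag leaving the end of the route, or None if not yet known.
    Returns (route, "complete" | "waiting") or (None, "failed").
    """
    last = route[-1]
    if (last.accepts is not None and last.endpoint
            and last.endpoint.startswith(destination)):
        return route, "complete"
    if media is None:
        # Step 3: only a dynamic component may end an unfinished route.
        return (route, "waiting") if last.dynamic else (None, "failed")
    candidates = sorted(
        (c for c in components if c.accepts == media and c not in route),
        key=lambda c: c.priority, reverse=True)
    for candidate in candidates:
        result, status = build(components, route + [candidate],
                               destination, candidate.produces)
        if result is not None:
            return result, status
    return None, "failed"


COMPONENTS = [
    Comp("src_gnomevfs", None, None, endpoint="local-file-gnomevfs", dynamic=True),
    Comp("dec_mp3_sw", "audio/mpeg", "audio/raw"),
    Comp("dec_mp3_hw", "audio/mpeg", "audio/raw", priority=10),
    Comp("sink_audio", "audio/raw", None, endpoint="audio-out-default"),
]

source = select_source(COMPONENTS, "local-file")
# First attempt stops at the dynamic source component.
waiting_route, waiting_status = build(COMPONENTS, [source], "audio-out", None)
# Step 4: the media type is now known, so go back to step 2.
route, status = build(COMPONENTS, waiting_route, "audio-out", "audio/mpeg")
print(waiting_status, [c.name for c in route], status)
```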

D-Bus interface

The D-Bus interface should have the following abilities:

  • List sources
  • List destinations
  • Query active source->dest mapping
  • Apply source->dest mapping
  • Signal playback status
  • Interrupt requests
    • lower volume / pause / stop playback due to events

The current interface is fully described in the Octopus D-Bus API XML.

The endpoints (sources and destinations) can be queried either all at once or separately. Endpoints are identified by arbitrary strings. To make them somewhat predictable, there should be a predefined set of names that have known characteristics but can be provided by any module.

    <method name="GetEndpoints">
      <arg type="as" direction="out" name="sources" />
      <arg type="as" direction="out" name="destinations" />
    </method>
    <method name="GetSources">
      <arg type="as" direction="out" name="sources" />
    </method>
    <method name="GetDestinations">
      <arg type="as" direction="out" name="destinations" />
    </method>
    <method name="GetDescriptions">
      <arg type="as" direction="in" name="endpoints" />
      <arg type="as" direction="out" name="descriptions" />
    </method>
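To see what these declarations mean for a caller, the input and output type signatures of each method can be derived mechanically from the introspection data. The sketch below does this with the Python standard library. The `<node>` and `<interface>` wrappers and the interface name `org.example.Octopus` are assumptions added to make the fragment well-formed; the real interface name is not given here.

```python
import xml.etree.ElementTree as ET

# Two of the methods above, wrapped so that the fragment is well-formed XML.
INTROSPECTION = """\
<node>
  <interface name="org.example.Octopus">  <!-- hypothetical interface name -->
    <method name="GetEndpoints">
      <arg type="as" direction="out" name="sources" />
      <arg type="as" direction="out" name="destinations" />
    </method>
    <method name="GetDescriptions">
      <arg type="as" direction="in" name="endpoints" />
      <arg type="as" direction="out" name="descriptions" />
    </method>
  </interface>
</node>
"""


def method_signatures(xml_text):
    """Map each method name to (input signature, output signature)."""
    signatures = {}
    for method in ET.fromstring(xml_text).iter("method"):
        args = method.findall("arg")
        # For method arguments, a missing direction attribute means "in".
        ins = "".join(a.get("type") for a in args
                      if a.get("direction", "in") == "in")
        outs = "".join(a.get("type") for a in args
                       if a.get("direction") == "out")
        signatures[method.get("name")] = (ins, outs)
    return signatures


signatures = method_signatures(INTROSPECTION)
for name, (ins, outs) in signatures.items():
    print(f"{name}: in='{ins}' out='{outs}'")
```

The type code `as` denotes an array of strings, so GetEndpoints takes no arguments and returns two string arrays.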

A mapping specifies the endpoints to use for a route:

    <method name="SetMapping">
      <arg type="u" direction="in" name="id" />
      <arg type="as" direction="in" name="sources" />
      <arg type="as" direction="in" name="destinations" />
    </method>
    <method name="GetMapping">
      <arg type="u" direction="in" name="id" />
      <arg type="as" direction="out" name="sources" />
      <arg type="as" direction="out" name="destinations" />
    </method>

Endpoints may have URIs that describe their data sources or destinations:

    <method name="SetEndpointURI">
      <arg type="u" direction="in" name="id" />
      <arg type="s" direction="in" name="endpoint" />
      <arg type="s" direction="in" name="uri" />
    </method>
    <method name="GetEndpointURI">
      <arg type="u" direction="in" name="id" />
      <arg type="s" direction="in" name="endpoint" />
      <arg type="s" direction="out" name="uri" />
    </method>
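The mapping and URI calls share the same id argument, so a client would typically set the mapping first and then attach a URI to one of the mapped endpoints. The sketch below models that bookkeeping with an in-memory stand-in so the call order can be tried without a running server. The class `InMemoryRoutes` is hypothetical, how ids are allocated is not modelled, and the rule that a URI can only be set on an endpoint already present in the mapping is an assumption of this sketch rather than documented server behaviour.

```python
class InMemoryRoutes:
    """Hypothetical stand-in that mirrors the four method signatures above."""

    def __init__(self):
        self._mappings = {}   # id -> (sources, destinations)
        self._uris = {}       # (id, endpoint) -> uri

    def SetMapping(self, route_id, sources, destinations):
        self._mappings[route_id] = (list(sources), list(destinations))

    def GetMapping(self, route_id):
        sources, destinations = self._mappings[route_id]
        return list(sources), list(destinations)

    def SetEndpointURI(self, route_id, endpoint, uri):
        sources, destinations = self._mappings[route_id]
        # Assumption: the endpoint must already be part of the mapping.
        if endpoint not in sources + destinations:
            raise KeyError(f"endpoint {endpoint!r} is not mapped for id {route_id}")
        self._uris[(route_id, endpoint)] = uri

    def GetEndpointURI(self, route_id, endpoint):
        # Assumption: an endpoint without a URI reports an empty string.
        return self._uris.get((route_id, endpoint), "")


routes = InMemoryRoutes()
routes.SetMapping(1, ["local-file"], ["audio-out"])   # endpoint names are examples
routes.SetEndpointURI(1, "local-file", "file:///tmp/example.mp3")
print(routes.GetMapping(1), routes.GetEndpointURI(1, "local-file"))
```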

As a plan B for automatic mapping, the interface includes overriding of the whole pipeline (with an XML description):

FIXME: the description format is currently backend-specific

    <method name="ForcePipeline">
      <arg type="s" direction="in" name="pipeline_desc" />
    </method>

Video output management

If the world were a happy place, every hardware vendor would provide accelerated video output through the XVideo extension in their X.Org driver.

The reality is, however, that virtually no XVideo-enabled drivers currently exist in the world of embedded Linux. Often the support is only at the framebuffer level, and any OpenMAX or GStreamer elements written for specific hardware use the framebuffer directly. Thus we cannot limit our video playback scenario to support just XVideo / plain X surfaces.

Octopus is designed to allow very hardware-oriented modules to be used, so we also need a generic way to control the placement of video on screen. This raises two problematic issues:

  • The abstraction between XVideo surface vs direct framebuffer surface
  • The D-Bus layer abstraction

To maximize reusability of the API, and since it is a workable abstraction, we are going to follow the concept defined in the OpenMAX AL specification (section 4.6):

  • Display - corresponds to the whole screen area
  • Window - corresponds to a display surface available for a window
  • Region - defines where and at what size the video is shown inside the window

We expose the abstraction in the route interface so that the sink elements can do whatever is necessary to place the video at the desired location, and in the D-Bus interface so that client applications can control the location of the destination more naturally. The technical difficulties of syncing the client-side location of the video with the actual implementation-specific video output are acknowledged as a limitation of the system.

The D-Bus API exposes only platform-specific ways to define the display. Abstracting the platform at this level is not considered meaningful, since there is no direct way to pass pointers through the D-Bus interface.

The display and window are tied to an Octopus ID, so at least in theory we could have multiple displays playing video at the same time. For X11, the display is provided as a string (usually acquired with XDisplayName() by the client) and the window id is passed as-is (since the X11 window entities are conveniently just numerical IDs).

    <method name="SetX11Display">
      <arg type="u" direction="in" name="id" />
      <arg type="s" direction="in" name="endpoint" />
      <arg type="s" direction="in" name="display" />
      <arg type="u" direction="in" name="window" />
    </method>

The call is not mandatory. The server uses the default display (i.e. the one named in the DISPLAY environment variable) if none has been specified (or if it is given as an empty string). If the call has not been made, the server also creates and manages a window (if required and possible). The default region is the size of the display area, so omitting the SetX11Display and SetRegion calls should result in fullscreen video playback.

We also adopt the XADisplayRegionSettings model to define the characteristics of the video output. If the client has provided the window handle, the server never modifies it. In this scenario, the server only tries to adapt the region size (if applicable) when the SetRegion call is made. If the server is managing the window, it tries to resize and relocate the window to match the region that was set. This ensures that the server-managed window does not obscure any non-video content outside the region. The server may also decide not to create the native window when a reliable overlay implementation is available. In any case, the media server does not handle or block input events on the video area. The application requesting the region is responsible for handling input events on the region. No guarantees are made that the region can be overlapped by other windows in the display (even when the application is managing the window itself), so it is strongly recommended to make sure that playback is suspended or stopped in such cases.
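The defaulting rules from the last two paragraphs can be summarised in a short decision function. This is a sketch of the described behaviour, not server code: `Region`, `OutputPlan` and `plan_output` are hypothetical names, the region fields are assumed to be edge coordinates in pixels, and a missing client window is represented by `None` because the value a client passes to mean "no window" is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    top: int
    bottom: int
    left: int
    right: int


@dataclass(frozen=True)
class OutputPlan:
    display: Optional[str]   # X11 display string to open
    window: Optional[int]    # client window id, or None if the server creates one
    server_managed: bool     # True if the server owns the window
    region: Region
    resize_window: bool      # only a server-managed window is ever resized


def plan_output(environ, screen_width, screen_height,
                display="", window=None, region=None):
    """Apply the defaults described above to an optional client request."""
    # An empty or missing display falls back to the DISPLAY variable.
    chosen_display = display or environ.get("DISPLAY")
    # Without a client-provided window the server manages one itself.
    server_managed = window is None
    # The default region covers the whole display area (fullscreen).
    chosen_region = region or Region(top=0, bottom=screen_height,
                                     left=0, right=screen_width)
    return OutputPlan(display=chosen_display, window=window,
                      server_managed=server_managed, region=chosen_region,
                      resize_window=server_managed)


# No SetX11Display or SetRegion call: fullscreen on the default display.
default_plan = plan_output({"DISPLAY": ":0"}, 800, 480)
# Client-provided window: the server leaves the window itself untouched.
client_plan = plan_output({"DISPLAY": ":0"}, 800, 480, display=":1",
                          window=0x1234, region=Region(10, 110, 20, 220))
print(default_plan)
print(client_plan)
```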

Note that the following are not yet implemented (10.12.2008).

    <method name="GetRegionSettings">
      <arg type="u" direction="in" name="id" />
      <arg type="s" direction="in" name="endpoint" />
      <arg type="i" direction="out" name="top" />
      <arg type="i" direction="out" name="bottom" />
      <arg type="i" direction="out" name="left" />
      <arg type="i" direction="out" name="right" />
      <arg type="u" direction="out" name="alpha" />
      <arg type="u" direction="out" name="depth" />
      <arg type="b" direction="out" name="chromakey_enabled" />
      <arg type="u" direction="out" name="chromakey" />
    </method>

    <method name="SetRegion">
      <arg type="u" direction="in" name="id" />
      <arg type="s" direction="in" name="endpoint" />
      <arg type="i" direction="in" name="top" />
      <arg type="i" direction="in" name="bottom" />
      <arg type="i" direction="in" name="left" />
      <arg type="i" direction="in" name="right" />
    </method>

    <method name="SetAlpha">
      <arg type="u" direction="in" name="id" />
      <arg type="s" direction="in" name="endpoint" />
      <arg type="u" direction="in" name="alpha" />
    </method>

    <method name="SetDepth">
      <arg type="u" direction="in" name="id" />
      <arg type="s" direction="in" name="endpoint" />
      <arg type="u" direction="in" name="depth" />
    </method>

    <method name="SetChromaKey">
      <arg type="u" direction="in" name="id" />
      <arg type="s" direction="in" name="endpoint" />
      <arg type="b" direction="in" name="enabled" />
      <arg type="u" direction="in" name="chromakey" />
    </method>