Archive for the ‘Graphics’ Category

How can I make my ARM device fast?

Wednesday, March 2nd, 2011

In a previous article, I described common performance pitfalls that ARM devices typically succumb to.  Here, I will lay out how to avoid most of those problems.

Tip 1: Give your designers representative hardware to test on.

The latest ARM hardware has roughly the same performance as a good netbook.  So give your UI designers netbooks or nettops (based on Atom, ION or AMD Fusion), if not as their main workstation, then as a performance-appropriate test platform.  They aren’t very expensive and won’t take up much desk space, and can usually be multiplexed into their existing keyboard, mouse and monitor.

This will encourage them to write efficient software in the first place, so do this as soon as possible when setting up the project.

Even better is to give them the real hardware to try out, but a netbook has the advantage of being usable in a desktop-like way, so it can be used for debugging.

Tip 2: Understand, before specifying hardware, what you need it to do.

Do you want simple 2D graphics?  Video?  Alpha blending (and is it premultiplied or not)?  Alpha blending on top of video (which probably requires “textured video”)?  3D?  Two-way video?  High definition?  Dual displays?  Touch gestures – with or without stylus?  ClearType?  More than one of these simultaneously? (more…)

Why is my ARM device slow?

Wednesday, March 2nd, 2011

At Movial, we are often asked to help customers achieve their dream user interface and multimedia experience with some selected hardware and/or software platform. Usually we are happy to provide our expertise, but all too often those of us at the coalface issue a collective groan when we are told what is expected of us.

The task often proves difficult due to mismatches between the hardware’s capabilities or design, the software toolkit, and the expectations of what can be achieved with them. This article will explain some of the difficulties we encounter and how customers and managers – that’s you – can help to maximise the likelihood of success.

The first and most important thing to remember is that embedded or mobile devices do not have the same capabilities as the PCs that you may be used to on your desk or at home.

Typical PCs have enormously powerful CPUs and graphics processors, linked by high-bandwidth buses, and can therefore get away with a lot of brute-force-and-ignorance in the software department. This ability comes at the cost of extremely high power consumption, which can easily reach into the hundreds of watts.

But this is unacceptable in the mobile domain, where a pocket-sized battery is often required to last for days. Hardware manufacturers therefore work hard to produce highly efficient processors inside a one-watt power envelope. The latest OMAP4 and Tegra2 chips are roughly as fast as a good netbook. Obviously, older chips – which may be cheaper – will have even less performance.

This all means that for a good user experience, the available hardware must be used to maximum efficiency, and special features of the hardware must match up to what is required. When this is not the case, your device will be slow. (more…)

Baby steps towards RandR goodies for xf86-video-omapfb

Tuesday, November 9th, 2010

For a long while I’ve been wanting to implement RandR support for xf86-video-omapfb, but never really got around it. Fortunately, Movial came to the rescue and allowed me to kickstart the RandR code on work time, yay for Movial!

The main question (which is always ‘why’) can be answered with the planned feature list and their use-cases:

  • Display mode switching: “real” and pixel doubling (scaled) modes
  • Rotation: orientation response to accelerometer data for example
  • Screen migration (cloning): HDMI output in your device? Flick of a switch and you are on it.
  • Screen expanding: Merge multiple screens into one. This one is a bit iffy, current kernel drivers don’t seem to support it, but…

The ‘how’ is too large topic for this post, but the general idea is to take advantage of the API the revised omapfb kernel driver (big thanks to Tomi Valkeinen) offers and manipulate the OMAP overlays to scan the framebuffer to different outputs.

I will post updates when the support reaches milestones, but here’s a “teaser” (ogg/theora, 860KiB, on youtube too) showing a Blaze board obey a script that enables/disables the different displays in a sequence with the xrandr tool. Blaze has two LCD’s, HDMI port and even an integrated DLP Pico projector though sadly I couldn’t get that one to work for the demonstration.

The current code is available from the git repository, master branch. As usual, testing and patches are welcomed (though don’t expect much from the testing part just yet).

Pixman gets NEON support

Friday, June 12th, 2009

I’ve been working on NEON fastpaths for Pixman lately, and as I write, these are being pushed upstream, hopefully in time for Pixman’s next stable release.  They complement some work already done in this area by engineers at ARM.  Some ARM hardware does use 32-bit framebuffers, but hardware constraints still seem tight enough that 16-bit framebuffers are still common.  So while the ARM guys focused mostly on 32-bit framebuffers and some internal operations, we focused firmly on 16-bit framebuffers.

For those who don’t know, Pixman is a backend library shared by Cairo and X.org, which takes care of various basic 2D graphics operations when there isn’t any specific GPU support for them.  It gets pretty heavy use if you use the XRender protocol on a bare framebuffer, for example.  So optimising Pixman for the latest ARM developments will make Gecko faster, as well as any of those “fancy” compositing window managers which are all the rage these days.

Now the following operations are accelerated, all on RGB565 framebuffers (which may or may not be cached):

  • Flat rectangular fills.  (These also work on other framebuffer formats.)
  • Copying 16-bit images around.
  • Converting 24-bit xRGB images (eg. a decoded JPEG) into the framebuffer format.
  • Flat translucent rectangles.
  • Compositing 32-bit ARGB images (eg. a decoded PNG).
  • Glyphs and strings thereof (8-bit alpha masks, with an overall colour that might be translucent).

Most of the listed operations are now at least twice as fast as they were without NEON, and many come within spitting distance of available memory bandwidth on typical ARMv7 hardware.  Using a benchmark of common operations (as issued by a common Web browser visiting a popular news portal), we measured an overall doubling in performance, despite the most common drawing operations being extremely tiny and therefore difficult to optimise.

In some cases on a more synthetic benchmark, the throughput is vastly greater than that, at least when running on an uncached framebuffer (which tends to hurt generic code very badly).  The main performance techniques were to read from the framebuffer in big chunks (where required), preload source data into the cache, and then process data in decent-sized chunks per loop iteration.  This essentially removes the performance advantage of a “shadowed framebuffer”, so you can now sensibly save memory by turning it off.

We also found some opportunities for reducing per-request overhead in both Pixman and X.org.  Hopefully these improvements will also be integrated upstream in the near future.