Pixman gets NEON support

I’ve been working on NEON fastpaths for Pixman lately, and as I write, these are being pushed upstream, hopefully in time for Pixman’s next stable release.  They complement some work already done in this area by engineers at ARM.  Some ARM hardware does use 32-bit framebuffers, but hardware constraints seem tight enough that 16-bit framebuffers remain common.  So while the ARM guys focused mostly on 32-bit framebuffers and some internal operations, we focused firmly on 16-bit framebuffers.

For those who don’t know, Pixman is a backend library shared by Cairo and X.org, which takes care of various basic 2D graphics operations when there isn’t any specific GPU support for them.  It gets pretty heavy use if you use the XRender protocol on a bare framebuffer, for example.  So optimising Pixman for the latest ARM developments will make Gecko faster, as well as any of those “fancy” compositing window managers which are all the rage these days.

Now the following operations are accelerated, all on RGB565 framebuffers (which may or may not be cached):

  • Flat rectangular fills.  (These also work on other framebuffer formats.)
  • Copying 16-bit images around.
  • Converting 24-bit xRGB images (e.g. a decoded JPEG) into the framebuffer format.
  • Flat translucent rectangles.
  • Compositing 32-bit ARGB images (e.g. a decoded PNG).
  • Glyphs and strings thereof (8-bit alpha masks, with an overall colour that might be translucent).
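
For readers who haven't met these formats, here is a rough scalar sketch in plain C of what two of the operations above compute per pixel: packing xRGB into RGB565, and compositing an ARGB pixel over an RGB565 destination.  It is not the NEON code and not Pixman's API.  All function names are made up for illustration, and the blend assumes premultiplied alpha, which is the convention Cairo uses for ARGB32.

```c
#include <stdint.h>

/* Hypothetical helper: pack 0x00RRGGBB into RGB565 by dropping low bits. */
uint16_t pack_rgb565(uint32_t xrgb)
{
    uint32_t r = (xrgb >> 16) & 0xFF;
    uint32_t g = (xrgb >> 8) & 0xFF;
    uint32_t b = xrgb & 0xFF;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Hypothetical helper: expand RGB565 to 0x00RRGGBB, replicating the
 * top bits of each channel into the vacated low bits. */
uint32_t unpack_rgb565(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1F;
    uint32_t g = (p >> 5) & 0x3F;
    uint32_t b = p & 0x1F;
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return (r << 16) | (g << 8) | b;
}

/* Rounded x * a / 255 for 8-bit values, without a division. */
static uint32_t mul_un8(uint32_t x, uint32_t a)
{
    uint32_t t = x * a + 0x80;
    return (t + (t >> 8)) >> 8;
}

/* Add two 8-bit values, clamping at 255. */
static uint32_t add_sat_un8(uint32_t x, uint32_t y)
{
    uint32_t s = x + y;
    return s > 255 ? 255 : s;
}

/* "Over" compositing of one ARGB pixel onto one RGB565 pixel.
 * Assumption: the source colour is premultiplied by its alpha. */
uint16_t over_argb_on_rgb565(uint32_t argb, uint16_t dst)
{
    uint32_t inv_a = 255 - (argb >> 24);
    uint32_t d = unpack_rgb565(dst);
    uint32_t r = add_sat_un8((argb >> 16) & 0xFF, mul_un8((d >> 16) & 0xFF, inv_a));
    uint32_t g = add_sat_un8((argb >> 8) & 0xFF, mul_un8((d >> 8) & 0xFF, inv_a));
    uint32_t b = add_sat_un8(argb & 0xFF, mul_un8(d & 0xFF, inv_a));
    return pack_rgb565((r << 16) | (g << 8) | b);
}
```

Note that the blend has to read the destination pixel back, which is why compositing is so much more sensitive to framebuffer read speed than a plain fill or copy.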

Most of the listed operations are now at least twice as fast as they were without NEON, and many come within spitting distance of available memory bandwidth on typical ARMv7 hardware.  Using a benchmark of common operations (as issued by a common Web browser visiting a popular news portal), we measured an overall doubling in performance, despite the most common drawing operations being extremely tiny and therefore difficult to optimise.

In some cases on a more synthetic benchmark, the throughput is vastly greater than that, at least when running on an uncached framebuffer (which tends to hurt generic code very badly).  The main performance techniques were to read from the framebuffer in big chunks (where required), preload source data into the cache, and then process data in decent-sized chunks per loop iteration.  This essentially removes the performance advantage of a “shadowed framebuffer”, so you can now sensibly save memory by turning it off.
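
To give a flavour of the loop structure, here is a minimal sketch in portable C rather than NEON.  It converts one scanline of xRGB to RGB565, requesting a prefetch of source data some way ahead and handling a fixed-size chunk of pixels per iteration.  The function name, the chunk size of 8 and the prefetch distance of 64 pixels are all assumptions picked for illustration, not the values used in the real fastpaths, and `__builtin_prefetch` is a GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_READ(p) ((void)0)   /* no-op on other compilers */
#endif

#define CHUNK_PIXELS   8    /* pixels handled per loop iteration (assumed) */
#define PREFETCH_AHEAD 64   /* how far ahead to prefetch, in pixels (assumed) */

/* Hypothetical helper: pack 0x00RRGGBB into RGB565 by dropping low bits. */
uint16_t pack_rgb565(uint32_t xrgb)
{
    uint32_t r = (xrgb >> 16) & 0xFF;
    uint32_t g = (xrgb >> 8) & 0xFF;
    uint32_t b = xrgb & 0xFF;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert `width` pixels from src (xRGB) to dst (RGB565). */
void convert_line_xrgb_to_rgb565(uint16_t *dst, const uint32_t *src, size_t width)
{
    size_t x = 0;

    /* Main loop: whole chunks, with source data requested well ahead. */
    for (; x + CHUNK_PIXELS <= width; x += CHUNK_PIXELS) {
        if (x + PREFETCH_AHEAD < width)
            PREFETCH_READ(src + x + PREFETCH_AHEAD);

        /* Fixed trip count, so the compiler can unroll or vectorise. */
        for (size_t i = 0; i < CHUNK_PIXELS; i++)
            dst[x + i] = pack_rgb565(src[x + i]);
    }

    /* Tail: the leftover pixels that do not fill a whole chunk. */
    for (; x < width; x++)
        dst[x] = pack_rgb565(src[x]);
}
```

In the NEON versions the inner chunk is handled by a few vector instructions rather than a scalar loop, but the shape is the same: prefetch ahead, process a block, then mop up the tail.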

We also found some opportunities for reducing per-request overhead in both Pixman and X.org.  Hopefully these improvements will also be integrated upstream in the near future.

3 Responses to “Pixman gets NEON support”

  1. John Stowers Says:

    Excellent to see.

    Because the re-factored pixman seems to be slower than the older one, do you expect to ship the NEON support on top of an older version? Are you confident that the performance regressions introduced during the re-factor can be removed?

  2. Jonathan Morton Says:

    I think that for most common cases, the new blitters more than compensate for the extra overhead. I also suspect that a malloc-removal patch that’s been integrated will help to compensate.

    And yes, I do think that the overhead can be reduced further without losing the maintenance benefits of the refactored version. Repackaging the blitter parameters in a struct and passing that by-reference, for example, would keep a lot of things off the stack and avoid having to copy them around.
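
    As a rough illustration of the struct idea, here is a sketch in C.  The type `blit_params_t`, its fields and the function `blit_copy_rgb565` are hypothetical and are not Pixman's actual types or signatures; strides are assumed to be counted in pixels.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block: filled in once by the caller and then
 * handed down by pointer instead of as a long argument list. */
typedef struct {
    uint16_t       *dst;         /* top-left destination pixel */
    const uint16_t *src;         /* top-left source pixel */
    int             dst_stride;  /* destination row pitch, in pixels */
    int             src_stride;  /* source row pitch, in pixels */
    int             width;       /* pixels per row to copy */
    int             height;      /* number of rows to copy */
} blit_params_t;

/* Copy a rectangle of RGB565 pixels described by *p. */
void blit_copy_rgb565(const blit_params_t *p)
{
    for (int y = 0; y < p->height; y++) {
        memcpy(p->dst + (size_t)y * (size_t)p->dst_stride,
               p->src + (size_t)y * (size_t)p->src_stride,
               (size_t)p->width * sizeof(uint16_t));
    }
}
```

    Each layer between the request and the blitter then passes a single pointer along, rather than copying every argument onto the stack again.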

  3. Aulia Says:

    Hey, Donald! Is it me or have you suddenly disappeared? Your website is reaaaally great and inspiring and has helped me a lot with my 3d programming. I wanted to write you an email and thank you for your trouble putting these awesome tutorials together for people like me, but figured you’re probably a busy man, so I’d not bother you directly through email and rather write a comment here. Then again, you’re most probably getting emailed for every comment on this blog though, but nevermind that. Once again, a BIG thank you for all your work which has truly inspired me to do things, which years ago, I wouldn’t even dream about. I really hope you haven’t stopped with the tutorials and am really hoping I’ll see a GLSL Normal/Bump mapping tutorial some time this year. That would be OMEGA-awesome ;} Anyway, thanks again, Audrius.
