Stop saying mobile web apps are slow
This morning I came across a post by Drew Crawford entitled “Why mobile web apps are slow.” It’s an interesting read, with some good information included. But what struck me is that the article content doesn’t really address the topic of the title.
2) ARM is slower than x86. Drew says “10 times slower.”
First, I’d like to address those points. Then I’ll dive into my thoughts on performance of mobile web apps.
How much slower is it today?
Another factor is what you’re doing. A lot of JS operations already execute at near-native speeds on a good JIT engine (so, not in an iOS app). If you don’t do many allocations, you may never feel the impact of garbage collection. On the other hand, if you’re constantly creating new objects instead of reusing from an object pool, you may hit them frequently and performance will certainly be affected.
That said, is five times slower a reasonable approximation? Sure, why not. I think it’s wrong for a lot of cases, but I’m reasonably okay going with that for the purposes of this discussion.
Is ARM really 10 times slower than x86?
Of course not, but this is the wrong question. How about: Is the sort of mobile chip you find in a tablet or smartphone 10 times slower than the sort of chip you find in an Ultrabook? Perhaps.
ARM chip speeds vary hugely. They’re also affected by the type of workload you give them in drastically different ways from the Core i5 in the Surface Pro I’m typing this on. For example, the Tegra 3 in the Surface RT is a quad core chip and runs at roughly 1.3Ghz. Sort of. It can’t actually run four cores at that speed for very long, because it has a target power and thermal profile which won’t allow it. For short bursts it can use all that power, but for sustained work the actual operating frequency and number of active cores is constantly being throttled to allow the thing to run all day and not require a fan.
But the same is true of an Atom SoC. In fact, a lot of the learning that Intel has been doing lately has been along those lines. That is, building chips that can scale their power usage and thermal output a greal deal and very, very quickly. This is how you get the most bang for your buck (/watt/degree) in the mobile world.
This complicates comparisons between a mobile SoC and an ultrabook or desktop CPU. Combine this with the fact that ARM SoCs themselves vary greatly in computational power (i.e. the Tegra 3 is a relatively slow chip, especially when compared against the latest iPad chips). Some of those ARM chips outperform the latest Intel Atom SoCs significantly. Maybe not in all workloads, but surely in some. I don’t think Intel is going to magically change that with Bay Trail. Rather, I think they’ll have roughly comparable performance. Certainly they will not magically be 10x faster by virtue of being x86! Intel may gain some ground using its manufacturing muscle and ability to die-shrink faster, but that isn’t going to yield the sort of improvement Drew seems to expect. And I’m skeptical it will be enough for Intel to win over big ARM customers. I could expand on this, but my thoughts of the future of ARM and Intel SoCs is a topic for another post
Discussing mobile web app performance
My first step will be to define a few terms and scope the discussion. For the remainder of the post I will assume the following:
Platforms in scope for this discussion are iOS, Windows 8, Windows Phone, and Android (where I have the least amount of knowledge to work from at this time).
Performance refers to the ability to provide the desired functionality with what users perceive to be a responsive, “fast and fluid” experience. In particular, one which is indistinguishable from an equivalent native application.
Challenges for mobile web app performance
Code and resource delivery
The simplest way to put a “web app” into an app store on a mobile platform is to simply wrap a web page in a WebView control. This is a common practice on every mobile platform, and really provides the worst case example for this sort of performance discussion. If your app’s code, markup, and/or resources (i.e. images) need to be downloaded before they can even begin being parsed or executed, then you’ve already lost a very important performance battle. Even if you assume ideal network conditions, and have condensed your first page download size to something very small (say, 30kb – pretty optimistic!), you almost certainly have a best-case latency in the 100-200ms range. That’s just to get your code on the box. And that’s assuming you have effective geo-distribution on your server-side, perhaps via a CDN. Realistically, you’re dealing with a larger download and you’re going to have a lot of users on slow cellular or coffee shop connections, where this latency can be measured in seconds.
It gets worse still if you take a naïve approach and make many HTTP requests, each with measurable overhead, versus bundling them effectively for your needs. Or have to hit a data center on the other side of the continent (or world).
If you’re going to compare a mobile web app to a “native” mobile app, you need to play fair. I don’t know if Facebook’s old app downloaded all or most of its HTML, CSS, and JS on launch, or what caching implementation it may have used, but you probably wouldn’t built a native app which downloads its compiled code or layout information from a web server. So why assume you have to build your “web app” that way?
Some platforms make building an app with installed HTML, CSS, and JS very easy and efficient. Okay, one platform does (hint: it’s the one I helped build). But on non-Windows platforms (or Windows Phone) you can certainly make it work. And if you’re serious about performance, you will.
Platform startup optimizations
Assuming your app’s package includes your code and at least the majority of your HTML, CSS, and static resources, your next challenge is optimizing what happens when the user clicks on your app’s icon or tile, especially the first time they do it. Each platform has its own optimizations for general app start-up. Whether it’s optimizing disk layout or prefetching commonly used app files, or providing splash screen / placeholder screenshots / mark-up based skeleton UIs / fast suspend and resume, each platform strives to make its native apps feel like they’re always ready to go.
The result of those platform optimizations mean that even on a particularly slow Tegra 3 machine like the Surface RT, web apps can start just as quickly as native C++ apps (and in my experience, slightly more quickly than .NET based apps). Yeah, they’re limited by the hardware (both CPU and I/O), but that “5x slower” margin Drew mentioned simply doesn’t exist there. They’re all equally fast on a fast chip or equally slow on a slow one
The next challenge mobile web apps face is the threading model they’re subject to, which evolved from the very earliest days of the web and carries significant legacy with it.
In a traditional web runtime/browser environment, you have one “big” thread where nearly everything happens. The main exception is I/O, which (thankfully) has pretty much always been asynchronous in the web world. So your requests to the network or disk (whether via XHR, src tag for an iframe, image, or script tag, platform API, etc) will not block your UI. But what will? Pretty much everything else.
Not long ago, web browsers did all of these things on a single UI thread:
- Markup parsing
- DOM operations
- Input processing
- Garbage collection
- Misc browser overhead (navigation work, building objects like script contexts, etc).
- Probably a bunch more I can’t think of right now.
The list in the previous section includes a bullet for “input processing.” Timely processing of input is critical to providing a responsive and fluid user experience. This has never been more true than in the world of touch computing. Delays that are imperceptible to mouse users can completely ruin your touch experience. And user expectations for operations like panning a page with their finger are exceedingly high.
To address this, mobile platforms go to great lengths to prioritize input, particularly touch input. I’m obviously most well-versed in how Windows does it. But I believe I understand the basics of Apple’s implementation, based on things I’ve read over the years and my own experiences using an iPad 1 and more recently an iPad Mini. However, I am very happy to learn additional details or corrections (or confirmations) for anything I say here about Apple’s implementation. So if you’re an expert on their input handling system, please comment below!
My understanding of iOS’s input system is that it’s message-based, and like traditional Windows app have been for ages, those messages are delivered to the app’s UI thread. To prioritize touch input, iOS uses a message filtering and prioritization system, where delivery of most message types is suspended when a touch input message (i.e. “finger down”) is received, and kept that way until the input operation has ended (“finger up”).
For a native developer, you get some control over this behavior, and as I understand it, you can decide which work your UI thread will and won’t do while the user is interacting with your UI. What I do not know is whether the system has a mechanism to interrupt your UI thread when a touch interaction happens, or whether there’s any support for handling input on a separate thread. Based on the behaviors I’ve seen from using apps (including Safari), I suspect the answer to both is no. Questions I see on StackOverflow about touch input processing on iOS support this. I could be wrong though, so feel free to correct me!
I don’t know if other plaforms have similar mechanisms. What I have observed is that manipulations in Safari on iOS are obviously not fully independent, as the UI thread freezes while you’re panning a web page (and if you get to an unrendered part of the page with the checkerboard pattern, it will stay there until you release your finger). So if Safari doesn’t do independent panning, I think it’s unlikely an app using WebView could.
Animation and Composition
Of course, handling input processing on another thread isn’t that useful if you can’t update the view for the user in response to it. So Windows 8 makes uses of a separate composition and animation thread using a technology called DirectComposition. The web runtime on Windows uses this hand-in-hand with DirectManipulation to provide a completely off-UI-thread manipulation experience.
For other animations, you need to take care to stick to “independent animations” whenever possible. For Windows 8, this means you’re best off sticking to CSS3 animations and transitions over CSS3 transform properties (i.e. use translate(x, y) rather than animating the “left” and “top” properties). The built-in WinJS animations do a good (though in a few cases not ideal) job of using independent animations effectively, and for many apps they’re all you need.
For more about independent composition and animation in IE / Windows 8 apps, read this article on MSDN.
I suspect on other platforms you should follow similar rules to get the best animation performance, but I’d love to learn more from experts at iOS or Android development.
Better yet, as I understand it, each web worker gets its own script context and heap, thus its own garbage collector. So if you allocate a lot of objects on a background thread, then only pass small output data to the UI thread, you can help minimize the impact of garbage collection on your UI thread.
DOM, layout, and rendering performance
Another factor for performance of web apps is the overhead and general performance characteristics of any libraries you use. A library can be a bunch of reusable JS code (like JQuery or WinJS), or a native platform exposed to (WinRT, PhoneGap, etc). In either case, the library itself and your usage of it will affect performance.
One example is data binding. WinJS provides a handy data binding implementation, which I’ve made extensive use of. But for some situations, its performance overhead can start to be noticeable, and you probably have options for doing something more efficient for your needs.
Another factor is whether the library you’re using is tuned for the platform/runtime you’re using it on. WinJS is obviously well-tuned to IE on Windows. But others may not be, particularly if they are designed to work with outdated versions of IE.
Here’s a slide from Gaurav’s deck (which I’ve annotated slightly) showing the difference in his example app just from reusing objects versus allocating new ones unnecessarily:
Re-answering why GC is a bigger problem on iOS
A large section toward the end of Drew’s post asks why, if GC is so bad that Apple can’t offer it to iOS developers, how does Windows Phone get its buttery smooth reputation when it uses an (almost) exclusively GC platform?
Well, take what we discussed above and assume similar mechanisms are in place on Windows Phone and its .NET implementation. I don’t actually know as many technical details here, but I assume the strategy is similar to Windows 8.
On Windows 8, a GC pass while you’re panning the view does not cause a stutter. On Android, apparently it does. Or maybe recent versions fixed that (I haven’t been able to figure that out – anyone know?). Neither the thread handling the input nor the thread doing the scroll animation ever runs any GC code (and the impact on CPU resources doesn’t really affect it, as the heavy lifting for the animation is done by the GPU). Yes, you can potentially scroll to a region the UI thread hasn’t yet rendered. But since the scrolling operation is fully independent, the UI thread shouldn’t have too hard of a time keeping up (and in practice, does not), and it can update the surface while you’re still panning it. So if a GC happens while panning, maybe you see some blank list items for an instant, but you don’t get a stutter.
What Drew should call his post
Will it be fast as-is on iOS? Maybe not. A recent iPad (heck, maybe even an iPad 2) has a beefier CPU, so there’s hope. But since the platform is in some ways just not optimized at all for the web (i.e. no independent manipulations), and in other ways actively hostile to web apps (i.e. crippled JS engine outside Safari), maybe I’ll be forced to go native there. That sucks. But that’s no one’s fault but Apple’s.