Wednesday, January 29, 2014

Bug Chasing in Google Play Books

What I'm about to say concerns the epub rendering engine used by Google Play Books, the Californian tech-behemoth's first major attempt to break into the world of ebook retail. It will explain why I consider Google Play Books to be easily the weirdest e-reader available on the market today. As anyone who has worked with the standards-averse, held-together-with-gaffa-tape world of e-publishing will testify, competition for the title of "weirdest epub rendering engine" is always fierce, so this is quite a claim.

I think my findings back it up though.

---

Most e-readers ruin your books by not recognising certain CSS declarations, overriding them with their own defaults, or by implementing your CSS in a freakishly non-standard way – not so Google Play Books. The part of Google Play Books that handles CSS stylesheets – presumably forked from the Chrome browser – seems to be excellent: it can understand complex pseudo-class selectors and parse combinations of pseudo-class and pseudo-element selectors with ease. The problem comes from the way that it handles the HTML framework onto which that CSS is applied.

This first became apparent to me when I loaded one of the books I was working on into Google Play Books. This book had drop caps on the opening body-text paragraphs of each chapter. These were identified using an HTML class (p.first) and a pseudo-element selector (::first-letter). I did it this way because it allowed swanky modern systems like iBooks and Readium to display drop caps, but phrased it in such a way that Adobe Digital Editions and similar readers (which always render drop caps wrong) would ignore it (pseudo-elements mean nothing to them).
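For reference, a rule of the kind described above might look something like the sketch below. The selector is the one from the book; the property values are illustrative assumptions, not the actual stylesheet.

```css
/* Drop cap on the opening paragraph of each chapter.
   Selector as described in the text; the declarations are assumed values
   chosen only to show the general shape of a drop-cap rule. */
p.first::first-letter {
  float: left;          /* let the body text wrap around the enlarged letter */
  font-size: 3em;       /* roughly three lines tall (assumption) */
  line-height: 0.8;     /* stop the big letter pushing the first line down */
  padding-right: 0.08em;
}
```

The trick relies on standard CSS error handling: a reader that cannot parse the selector is supposed to throw away the whole rule, so nothing half-applied leaks through.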

When I loaded this book into Google Play Books I noticed something odd. In addition to the drop cap on the first paragraph (which rendered very nicely), it added a drop cap to the first letter of the following page (the page break having fallen halfway through the first paragraph). This seemed to imply that Google Play Books was altering my HTML in real time (it reacted to changes in font-size and line-height that moved the page break), adding in a hard paragraph break on either side of the page break.

This was weird, but I just thought “meh, I've seen weirder” and changed the selector to counter this odd habit. The drop-cap selector now said div.text>p:first-of-type::first-letter – selecting the first letter of the first <p> sitting directly inside the div.text container. I figured this would stop it from applying the drop cap to the second, artificial <p class="first">.
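Spelled out as a rule, the revised selector would look roughly like this. The markup in the comment is an assumed shape for the chapter file, and the declarations are the same illustrative values as any ordinary drop-cap rule.

```css
/* Assumed chapter markup (hypothetical, for illustration only):

   <div class="text">
     <p class="first">Opening paragraph...</p>
     <p>Second paragraph...</p>
   </div>

   "div.text > p" requires the paragraph to be a direct child of div.text.
   ":first-of-type" then keeps only the first <p> among its siblings, so a
   second <p class="first"> inside the same container would not match. */
div.text > p:first-of-type::first-letter {
  float: left;
  font-size: 3em;       /* illustrative value */
  line-height: 0.8;     /* illustrative value */
}
```

In an unmodified document this can match at most one paragraph per div.text, which is what makes it a useful probe.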

When I ran this code in Readium, it worked fine. When I ran it in Google Play Books, however, something really strange happened. The text sprouted drop caps everywhere – not only on the first paragraph of the chapter, but also on the first paragraph of each page and the first paragraph after each nested <div>. This seemed to imply that Google Play Books was closing and re-opening the body text container at the end of each page and whenever the flow of text was interrupted by something.

Intrigued, I added another layer to my selector. I changed it to body>div.text>p:first-of-type::first-letter. This absurdly convoluted selector should, in theory, have selected only the first letter of the first paragraph of the div.text that sits directly inside the body of the HTML document. What it actually did was select the first letter of each page.

This seems to imply that in order to render a book, Google Play Books takes the content from your epub and pastes it into an individual HTML document for each page. To make it even stranger, in order to work out where to put the page breaks it must have to apply the CSS to the HTML first, then work out where the page breaks will fall, then chop up the HTML into individual documents and re-apply the CSS. Only after it has gone through all that can it render the page.
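To make the guess concrete, here is the selector from the last experiment alongside a sketch of what each generated page would have to look like for it to behave this way. The per-page markup in the comment is pure conjecture on my part, not something observed inside the app.

```css
/* Hypothetical document generated for ONE page (conjecture, not observed):

   <html>
     <body>
       <div class="text">
         <p class="first">...tail of a paragraph cut by the page break...</p>
         <p>Next paragraph on this page...</p>
       </div>
     </body>
   </html>

   If every page is its own document like this, every page has its own
   <body>, its own div.text, and its own first <p>, so the rule below
   matches once per page instead of once per chapter. */
body > div.text > p:first-of-type::first-letter {
  float: left;
  font-size: 3em;       /* illustrative value */
  line-height: 0.8;     /* illustrative value */
}
```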

This seems absurd, but it would explain some things about the odd behaviour of the Google Play Books app. First, if you adjust the font size or line height, the screen goes blank and the whole thing has to reload before it can display the changes. This isn't something that I've seen any other e-reader do, but it would make sense if it was having to regenerate a set of HTML files. Secondly, even simple books take a long time to load. The same book takes longer to load in Google Play Books – running on a brand-new Android tablet – than it does on a first-generation iPad. Finally, there's the strange way that Google Play Books has to pause to load every six pages or so. There’s no way I know of to download and view a fragment of an HTML document, so logically there should only be a loading screen at the beginning of each chapter.

Unfortunately the only way to know exactly what it’s doing would be to break it open and rummage through the source code – a task I have absolutely no idea how to do (I'm just an editor who knows a bit of CSS).