There are two types of image file in computing: raster and vector. Each type stores pictorial information in a radically different way, and there is a further difference between the two principal forms of raster image representation, all of which has a bearing on the way in which e-book producers achieve the highest image quality for the smallest file size.
This appendix considers the differences between the vector and raster approaches, before exploring the difference between the two principal raster-image file-formats, and explains the trade-offs in image quality and file size that those formats offer.
A raster image presents pictorial information as a two-dimensional array of points called pixels (‘picture elements’ – the term first appeared publicly in 1965). The term ‘raster’ comes from the German ‘Raster’, meaning ‘screen or frame’, which comes in turn from the Latin ‘rastrum’, meaning ‘rake’.
You will be very familiar with raster displays and images. An old-style cathode-ray-tube TV screen and the modern flat-screens in TV and computer displays are all raster devices, and most of the static images you see in web sites are raster graphics too. Indeed, the display device on which you are reading this guide now is an example of raster technology.
A vector image (from the Latin ‘vehere’, ‘to carry’, the origin of words such as ‘convey’ and ‘vehicle’) appears as a two-dimensional grid of pixels only once it has been rendered; user agents and vector-image files represent it internally as collections of linked points and other graphical ‘primitives’.
For example, to represent a triangle, a vector editor models only the three points that represent the shape's vertices, instead of maintaining an internal map of each pixel that makes up every part of the triangle's image and its background. When rendering such a shape on screen, a user agent simply draws a line between each of the points, and this yields two clear and substantial advantages over the raster approach.
Greater Efficiency. To store the image of the triangle above as a raster image of 1000 pixels square, you need to store the information describing one million pixels (1000 × 1000). With a simple, ‘bit-map’ file format, where just one byte is used to store the information that describes each pixel, this requires one megabyte of space on a given storage device (all for one piffling triangle). In contrast, a vector-based file will store only the information representing the triangle's three vertices, which, in principle, consumes just a few score bytes comprising the X and Y coordinates of the points in question.
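To make the arithmetic concrete, here is a short Python sketch. The vertex coordinates and the 32-bit-integer assumption are purely illustrative, not taken from any particular file format:

```python
# Storage arithmetic for the triangle example above. The raster case assumes
# a simple one-byte-per-pixel bit-map; the vector case assumes three (x, y)
# vertices stored as 32-bit integers.

width, height = 1000, 1000
bytes_per_pixel = 1

raster_bytes = width * height * bytes_per_pixel

vertices = [(100, 900), (500, 100), (900, 900)]   # hypothetical triangle
bytes_per_coordinate = 4
vector_bytes = len(vertices) * 2 * bytes_per_coordinate

print(raster_bytes)   # 1000000 bytes -- a megabyte for one piffling triangle
print(vector_bytes)   # 24 bytes, before any structural overhead

# The raster figure grows with the square of the image's side length:
for side in (1000, 2000, 4000):
    print(side, side * side * bytes_per_pixel)
```

The final loop shows the quadratic growth in the raster case: doubling the side length quadruples the storage required, while the vector representation is unchanged.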
This significant saving in space is of great importance to e-book publishers where a given work contains a wealth of imagery. This is because, as the discussion of pricing in Chapter One points out, the 70% revenue model comes at the cost of paying Amazon a delivery fee that is proportional to a given book's file-size, and so choosing the 70% option renders the minimisation of image file-size critical. Given this, you should choose to render image content using the vector approach whenever possible.
Unlimited Scaling. You can scale up vector-based images without degradation. That is, if you take a raster image that is 1000 pixels square, then it may look entirely acceptable printed on paper at a resolution of 100 pixels per centimetre (an image of 10 centimetres on a side – the width of your hand, roughly). However, if you blow that image up to print it on paper that is around the size of that used in advertising hoardings (say, 10 metres square), then each pixel will now be a full centimetre wide. In our triangle example, the lines of the shape would have an obvious ‘stepped’ or jagged appearance, which would be unacceptable; you can see an example of this in the blown-up section of the image of Saturn above.
In contrast, scaling-up a vector image simply increases the distance between the points that make up the image elements. The image renderer then simply re-draws the lines connecting the points, where those lines will be displayed at the native resolution of the medium in question (be it a print or electronic technology – the substrate is immaterial). This means that any stepped appearance that the observer sees stems solely from the resolution of the medium, not from the image itself, and so, in very high-resolution media, no stepping will be visible.
It follows from the second point that you could produce a print version of an e-book in which the vector images would render beautifully on the highest-quality paper at the highest possible resolution, all without further intervention on your part. That is, you would not need to substitute very high-resolution equivalents for your vector images in order to achieve professional results – a substitution that the raster content in the work might well require, and which would be tedious and time-consuming at the least.
This is why e-book retailers now demand that you upload a raster-based promotional image of your book's cover of relatively high definition (along with the book file itself). If your book hits the big time, the retailer may want to do some equally big-time marketing, and a low-resolution promotional image would not help matters in this case.
Note that this exposition may cause you to think that vector images are viable only in the case of technical drawings, cartoon-like imagery and rather sterile, non-photographic diagrams, but this is not the case. The lettering and quill-pen motif in the cover of the e-book version of this guide were generated using the vector approach, and truly skilled practitioners can create photo-realistic images of great and subtle beauty. Moreover, vector images can contain raster elements too if needed (the terracotta background to this guide's cover being an example again), thus giving the best of both worlds, and the image of Saturn above is an example of this. To wit, it is an SVG image, where the planet, the blown-up inset, and the background comprise the raster element, while the circles and lines that delimit the inset comprise the vector element.
Turning now to the ways in which raster information is stored, a raster-image file, like any other kind of persistent information-object, comprises a sequence of symbols. These correspond to the image information itself, along with symbols that are not visible in the rendered image, but which tell the renderer in question how to work with the information at hand.
It is possible to store the image information as-is, such that each pixel is represented directly by a corresponding symbolic value. That is, each pixel in the image ‘maps’ to a small block of information in the file. This is a simple, literal ‘bit map’ representation, and the size of such files grows in direct proportion to an increase in the number of pixels. As the previous section points out, a 1000 × 1000 pixel image stored using a simple encoding requires one million bytes of storage space, while an image of 2000 × 2000 pixels will require four million bytes, and one of 4000 × 4000 pixels will require sixteen million.
Without a better approach, using such a file format in image-rich web sites would cause pages to load relatively slowly because it takes time for information to traverse the Internet from the server to the user's browser – the greater the size of the image files in a site, the longer it will take to load. In the case of image-rich e-books, many large image-files increase the size of the deliverable dramatically.
To the uninitiated, this situation might appear intractable, and would appear to give publishers of image-rich e-books little choice but to go with the 35% revenue option, but this is not the case. If you consider the simple image of a triangle used in the first section above, it is clear that much of the information in a raster version of that graphic is simply repeated.
That is, if we assume a plain background to the image, the majority of its pixels must have exactly the same colour, luminance and other values, yet the repetition of this information in a file is surely redundant – why state the same thing multiple times? Fortunately, mathematical techniques exist that allow us to exploit that repetition, reducing the volume of redundant information, and thus reducing file size.
This is referred to as ‘data compression’, as it minimises the size of a block of information, while preserving its semantics, and there are two approaches to its implementation in graphical processing:
Lossless compression. This allows reproduction of a compressed image on a display device without degradation, meaning that, in the example of the triangle graphic, the image will look exactly the same on decompression as it did before compression.
Lossy compression. This can yield great reductions in size, and while this can incur some loss of image quality, it need not be so excessive as to make the image unacceptable.
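The lossless case can be illustrated with a toy run-length encoder in Python. This is not how PNG actually works (PNG uses the DEFLATE algorithm), but the principle is the same: runs of identical pixels collapse into a compact form, and decompression restores the original exactly.

```python
def rle_encode(pixels):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original pixel sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# One scan line crossing the triangle: a long run of white background,
# a short run of black 'ink', then white again.
line = [255] * 400 + [0] * 50 + [255] * 550

encoded = rle_encode(line)
assert rle_decode(encoded) == line      # lossless: an exact round trip
print(len(line), len(encoded))          # 1000 pixels collapse to 3 runs
```

One thousand pixels collapse to three value-count pairs, and decoding reproduces the scan line byte-for-byte, which is precisely what ‘lossless’ means.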
There are a number of lossless and lossy compression formats. The most commonly used lossless scheme on the Web is Portable Network Graphics, or PNG, and a lossy format that is very common in web sites and in photography is JPG (which stands for ‘Joint Photographic Experts Group’). The next two sections consider these in turn.
Without going into technical details, the PNG format is best suited to images where continuous gradation is sparse or non-existent. This means it is ill suited to photographic images, but is very well suited to cartoon-style images, and line-art. Technical diagrams such as the flow charts in this guide are ideal PNG candidates (although those are SVG images) because the image is composed of boxes, lines and sharply defined symbols, set against a uniform background.
There is a version of the format called ‘APNG’ that supports animation, as the older GIF encoding does (the use of which on the Web declined with the advent of PNG). However, not all browsers support animated PNGs, and the Kindle devices do not support them at all (the slow refresh rate of e-ink displays is not suited to animated images).
PNG also supports transparency, but e-book producers should be aware here that the KindleGen tool converts all PNG images in a book into JPGs. This is why KindleGen takes considerable time in converting EPUBs that contain a great many PNG images, whereas books that comprise text almost exclusively take just seconds. Moreover, the JPG format does not support transparency, which means that any transparency in a given book's PNG images will be lost. The end of this chapter discusses this point.
The second, common form of raster-image compression – the JPG standard – is a lossy compression technique, meaning that a typical, decompressed JPG image will be less faithful to the original uncompressed version. The degree to which fidelity is lost depends on the degree of compression that you choose, where highly compressed images are very small, but are photographically sub-standard, whereas image files that retain high photographic quality are larger (although are still smaller than the uncompressed original).
Nevertheless, under the right conditions, and with appropriate decisions on your part, JPG can deliver excellent compression ratios with very high fidelity, and it is up to you to choose the trade-off that maximises the production quality of your book, while minimising the size of the resulting book-file, thus saving you money under Amazon's 70% payment option.
There is no convenient mathematical formula that will determine the sweet spot in the size-quality trade-off with JPGs. This is because the aesthetic quality of an image is subjective, which means in turn that, in a book that contains many images, you will have to spend time experimenting until you find the correct balance. This is one of the reasons that image-rich e-books take longer to produce (beyond the actual writing process): they require many more decisions, and so cause you to spend more time in whatever graphical editor you are using.
Considerable guidance is to be had, however, in understanding a little of how the format works. To grasp the essence of JPG, consider the image of the grey sphere you see here in the sidebar, and imagine how the brightness of a line of pixels varies from left to right, within the area of the thin rectangular strip formed by the dotted lines.
That variation can be plotted on a graph, which you see in the lower part of the diagram. If you were to record the height of the line for each pixel along the horizontal axis, you would have the basis of the simple, monochrome bit-map format considered above, with all of its redundancy. However, it is possible to apply a mathematical technique that will generate a set of numbers that ‘encode’ the line's undulation in the graph.
The nature of this encoding is related very strongly to the origin of the characteristic sound, or ‘timbre’, of a given class of musical instrument. If you pluck an un-fretted guitar string, it will appear to vibrate at a single frequency called the fundamental, but it will really be vibrating on that and many other frequencies that are higher than the fundamental. These higher frequencies are called ‘harmonics’, where each has a particular strength or ‘amplitude’, and it is the combination of all those frequencies at their respective amplitudes that makes a guitar sound like a guitar. In contrast, a violin, for example, sounds like a violin because the relative amplitudes of the harmonics concerned are different to those of a guitar.
By the same principle, you can think of the undulating line in the graph formed from sampling the sphere above as a ‘waveform’, and the set of numbers that encode that waveform as the set of ‘harmonics’ (or ‘coefficients’, more accurately), that go to make up its characteristic shape. Given that they are numbers, you can store them in a file, and then use them when you read the file subsequently to reconstruct the original waveform.
However, while it is possible to decompose a given waveform into a great many ‘harmonics’, those at very high frequencies contribute little to the overall shape of that waveform. This means in turn that, by discarding the very high frequency coefficients, and storing only those of lower frequency, you can store a representation of a waveform in considerably less space than that which a literal representation requires. In general, you will still be able to reconstruct the original waveform with acceptable fidelity.
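The idea can be sketched numerically with a one-dimensional discrete cosine transform, the transform at the heart of JPG (JPG applies it in two dimensions to 8 × 8 pixel blocks; this sketch uses one dimension and plain Python for clarity). A smooth ‘brightness’ curve, of the kind sampled from the sphere above, is decomposed into 64 coefficients, all but the first eight are discarded, and the curve is then reconstructed with barely any error:

```python
import math

def dct(x):
    """Forward DCT-II: decompose samples into frequency coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse (scaled DCT-III): rebuild samples from coefficients."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (n + 0.5) / N)
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

N = 64
# A smooth 'brightness across the sphere' waveform: a gentle hump.
smooth = [math.exp(-((n - N / 2) / 16) ** 2) for n in range(N)]

coeffs = dct(smooth)
kept = coeffs[:8] + [0.0] * (N - 8)   # discard the 56 high-frequency terms

rebuilt = idct(kept)
error = max(abs(a - b) for a, b in zip(smooth, rebuilt))
print(error)   # tiny: the smooth curve survives an 8-fold truncation
```

Eight numbers stand in for sixty-four samples, yet the reconstruction is visually indistinguishable from the original: that discarding of high-frequency coefficients is the space-saving step.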
This is how JPG encoding achieves compression. It decomposes the pixel-to-pixel variance across an image into a set of coefficients that, just like harmonics in acoustical analysis, can be used to regenerate the original image subsequently. By recording only the lower frequency coefficients, however, it saves a great deal of space while retaining acceptable fidelity. However, you should note the qualifier ‘in general’ above. Consider the next diagram.
This shows the waveform generated by sampling the figure along a given line of pixels, just as in the previous diagram, and you can see that the line exhibits sharp discontinuities as the pixels vary from white to black. This issue is pivotal in JPG encoding because the only way to reconstruct such a waveform with acceptable fidelity is to use many of the higher-frequency coefficients. This is why you have a choice of compression ratio when exporting an image from, say, GIMP or Photoshop. Higher compression discards a greater number of the high-frequency coefficients, yielding low fidelity, while lower compression preserves a greater number of those coefficients, giving greater fidelity but a larger image file.
This, in turn, is why you see artefacts in overly compressed JPG images, such as ‘edge haloes’. In these cases, the compression ratio is too high, and so the image file represents sharp discontinuities such as the edges of objects inadequately. The renderer must therefore make some form of compromise when displaying the relevant points in the image, and that compromise is the origin of the visual irregularities that it generates.
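A small, self-contained Python sketch makes this concrete (again using a hand-rolled one-dimensional discrete cosine transform, the transform underlying JPG). An abrupt black-to-white step is decomposed into 64 coefficients, all but the first eight are discarded, and the reconstruction then misses the edge badly; the over- and under-shoot near the discontinuity is precisely the origin of ‘edge haloes’:

```python
import math

def dct(x):
    """Forward DCT-II: decompose samples into frequency coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse (scaled DCT-III): rebuild samples from coefficients."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (n + 0.5) / N)
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

N = 64
# An abrupt black-to-white edge: 32 black pixels, then 32 white ones.
step = [0.0] * (N // 2) + [1.0] * (N // 2)

kept = dct(step)[:8] + [0.0] * (N - 8)  # keep only 8 of the 64 coefficients
rebuilt = idct(kept)

error = max(abs(a - b) for a, b in zip(step, rebuilt))
print(error)   # large: the truncated series cannot follow the sharp edge
```

Where the smooth gradation of a photograph survives this truncation almost untouched, the hard edge does not, which is exactly why line-art fares badly under heavy JPG compression.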
Conversely, images with smooth changes in colour etc. do not require the very high-frequency coefficients, which is why you can use a high degree of compression and still preserve acceptable fidelity. The image in the sidebar is a JPG image generated by exporting a PNG from GIMP with a quality factor of six. The PNG is 14kB in size, but the JPG is only 2kB, an impressive gain in compression, but the image suffers from unacceptably prominent edge haloes.
From the points in the sections above, certain implications follow when making image-related decisions in the production of an e-book, some of which you will have read before. First, you should use JPG for photographic images and any other graphics where smooth, gradual transitions abound. Moreover, the smoother and more gradual the gradations in an image, the higher the compression ratio you can use while preserving acceptable fidelity. Conversely, the sharper the edges in an image, the lower the compression ratio that will be available to you, and the larger the file you will generate.
PNG, on the other hand, is not favourable for photographic images because that compression format excels only when the image contains many areas of uniform colour (like a single, consistent background) and sharp discontinuities like lines and the edges of shapes. An additional factor, normally, would be that you should choose PNG when you need partial or complete transparency in parts of an image, but that point does not hold for books targeted at the Kindle brand, given that KindleGen converts all ‘explicit’ (see below) PNGs to JPGs.
Without alternatives, this could be a serious issue for producers who wish to use styling and layout that capitalise upon transparency, and so this limitation acts to retard the development of e-books as a literary medium. Additionally, some Kindle user agents support the inversion of the display colours to give white text on a black background, or to set a sepia background. Yet changing this setting does not affect the background colour of images because they are always JPGs, which retain a white background that does a book no favours.
If the Kindle brand supported PNGs natively, and thus supported their transparency, their renderers could simply invert the colours of the non-transparent areas (for the white-to-black background switch), and leave those areas untouched for the sepia setting. Most colour images would fail under this scheme, but monochrome line-art would survive the transformation, thus preserving the dignity of the book in question.
It is unclear why Kindles do not support PNGs natively. It seems unlikely, for example, that JPGs yield better compression in the general case at a good quality setting. To wit: a comparison of the file sizes for the larger box-model diagram in Chapter Five, Layout, yields the figures you see here in the sidebar, where the JPG was generated from the PNG with a quality factor of 90. It shows that SVG beats PNG and JPG hands-down, and that (in this case) the PNG is actually slightly smaller than the JPG. Certainly, a lower quality-factor (i.e. a higher compression ratio) would yield a smaller JPG, but the rendering of the lines in that diagram, of which there are many, would suffer.
This means that when KindleGen converts a book's PNGs to JPGs, it must use a low compression ratio, so a space-saving quest on Amazon's part cannot be the answer. This is why Step F, Stage Three in Chapter Seven, Production, Insert <img> Elements, directs you to convert all your PNGs to JPGs prior to inclusion in the EPUB file that you feed to KindleGen, as you then retain control over the compression ratio. This, in turn, may give you some additional edge when selecting a trade-off between content-richness and transmission-overhead when selling under Amazon's 70% revenue model.
However, do remember that transparency with SVG images is entirely possible, including opacity gradients, and given that SVG is the clear winner in the size stakes, choosing SVG over either of the raster formats renders KindleGen's PNG to JPG conversion academic.
It seems that while KindleGen converts ‘explicit’ PNGs to JPGs, it does not do this with PNG images that are embedded in an SVG image, as is the case with the image of Saturn above. If you embed rather than link a PNG image within an SVG file, the image editor will convert the string of symbols that comprise the PNG file into a textual equivalent that it saves as an integral part of that SVG file. When an EPUB containing that SVG image is fed to KindleGen, that utility does not ‘reach in’ to the SVG file, and recode the PNG information such that it becomes a JPG, nor does it recode it to remove any transparency information.
As a demonstration, and assuming that you are reading this on a Kindle user agent, consider the image in the sidebar. It is an SVG image that contains two textual elements, which have a colour of #808080 (mid-grey on any display technology). In the Z-axis, i.e. the stacking order of elements coming ‘out’ of your display, they are positioned underneath the two other elements in the image. One is an SVG element that possesses a gradient fill that goes from opaque black on the left to complete transparency on the right. The other is a PNG image of that element that was exported from Inkscape and then re-imported and positioned where you see it, over the text that says ‘PNG Transparency’.
It is clear from this that the Kindle supports transparency in PNGs as well as native SVG transparency. If not, the lower gradient would go from black to opaque white on the right-hand side, and the words ‘PNG Transparency’ would not be visible.
Two interesting points flow from this:
It deepens the mystery over why KindleGen converts stand-alone PNGs to JPGs. Clearly, there is a PNG-renderer of some form in Kindle software, so why can this not be invoked for stand-alone PNGs?
If you do want to use PNG transparency in your e-books, you need only create an SVG file that acts as nothing more than a wrapper for the PNG in question. It need have no elements other than an embedded (note: not linked) copy of the PNG image, and you should set the SVG canvas dimensions to exactly the dimensions of the PNG image. Problem solved.
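That wrapper technique can be sketched in Python using only the standard library. The helper names here (make_png, png_dimensions, svg_wrapper) are my own for illustration; in practice you would read your own PNG file from disk rather than build one from scratch, and most image editors (Inkscape, for example) can produce the same embedded-PNG SVG for you. The sketch builds a tiny RGBA PNG so that it is self-contained, reads the width and height from the PNG's IHDR chunk, and emits an SVG whose canvas matches the PNG exactly, with the PNG embedded as a base64 data URI rather than linked:

```python
import base64
import struct
import zlib

def chunk(tag, data):
    """Assemble one PNG chunk: length, tag, data, CRC-32."""
    return (struct.pack(">I", len(data)) + tag + data
            + struct.pack(">I", zlib.crc32(tag + data) & 0xFFFFFFFF))

def make_png(width, height, rgba):
    """Build a minimal 8-bit RGBA PNG from raw pixel bytes (for the demo)."""
    raw = b"".join(b"\x00" + rgba[y * width * 4:(y + 1) * width * 4]
                   for y in range(height))   # filter byte 0 per scan line
    return (b"\x89PNG\r\n\x1a\n"
            + chunk(b"IHDR", struct.pack(">IIBBBBB",
                                         width, height, 8, 6, 0, 0, 0))
            + chunk(b"IDAT", zlib.compress(raw))
            + chunk(b"IEND", b""))

def png_dimensions(png_bytes):
    """Width and height are big-endian uint32s at offsets 16 and 20 (IHDR)."""
    return struct.unpack(">II", png_bytes[16:24])

def svg_wrapper(png_bytes):
    """Wrap a PNG in an SVG whose canvas matches the PNG's dimensions."""
    w, h = png_dimensions(png_bytes)
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'xmlns:xlink="http://www.w3.org/1999/xlink" '
            f'width="{w}" height="{h}" viewBox="0 0 {w} {h}">'
            f'<image width="{w}" height="{h}" '
            f'xlink:href="data:image/png;base64,{b64}"/></svg>')

# Demonstration: a 2 x 2 PNG with one semi-transparent row, wrapped in SVG.
demo_png = make_png(2, 2, bytes([0, 0, 0, 255] * 2 + [0, 0, 0, 128] * 2))
demo_svg = svg_wrapper(demo_png)
print(demo_svg[:80])
```

Because the PNG travels inside the SVG as text rather than as a separate file, KindleGen never sees an ‘explicit’ PNG to convert, and the transparency survives.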