Copyright © Dodeca Technologies Ltd. 2016

Structure

You need only go ankle-deep into the world of HTML to format the majority of e-books, and this chapter shows you, in simple stepwise fashion, how to structure your content using that language, prior to applying any stylistic touches you may desire, which is the subject of the next chapter.

Note that this chapter does not present an exhaustive treatment of HTML, as many of its features never see use in e-book formatting, and so, should you wish to learn more (say for constructing your own web site – see Chapter Eight), you should consult a more extensive tutorial.

Basic Structure

One of the principal ways of organising prose is to break it into paragraphs, two plain-text examples of which are presented in the first part of the following example, where the second part shows how that text would appear in a web browser in the absence of any further treatment.

Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex ea commodo consequat.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

As you can see, the user agent (the browser that you are using to read this text now) elides the line-breaks that are present in the source, such that it presents only a monolithic block of content. That is, attempting to impose structural formatting by inserting extra spaces and line breaks between words or paragraphs will fail. Instead, when using HTML to prepare your book for publication, you should wrap each paragraph with special information that does not appear when the human reader consumes the content, but which indicates to the user agent how it should be structured. The next example demonstrates this:

<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua.
</p>

<p>
Ut enim ad minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex ea commodo consequat.
</p>

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

The <p> and </p> constructions are known as ‘tags’. Specifically the <p> is a ‘start tag’, and the </p> is an ‘end tag’, and the start and end tag, and the text that they enclose, form collectively what is called an ‘element’.

To explain: ‘p’ stands for paragraph, and a <p> element tells the user agent to lay out vertically each block of text that is wrapped with such tags, and to place a gap between anything above and below them. It is possible to change the extent of that gap – see the next chapter for more information. Note that, as with all such encodings and formats, you cannot be lax and leave things out, and you must code your tags in the form in which you see them in the examples here. Moreover, if your code does not behave as you expect, you should consult Appendix G, Troubleshooting.

A few additional points before we explore HTML in greater depth:

1. Mark-Up. You may encounter from time to time the term ‘mark-up’ (or ‘markup’) in discussions about HTML, which reflects the idea of adding such formatting commands to a piece of otherwise unadorned content (and hence ‘HTML’ stands for ‘Hyper Text Mark-up Language’).

2. Rigorous Layout. Note the formal layout in the code above. This is not merely cosmetic, and Appendix G explains the very sound reasons for giving code such strong visual structure.

3. Forcing Line-Breaks. You can force a break in a run of text by inserting the following tag: <br/> (note the position of the trailing slash, which a section below considers). Here, ‘br’ means ‘break’, although there are relatively few situations where you would use such a tag. That is, it is far better to impose your formatting using <p> and other tags rather than <br/>, as you will then retain the ability to adjust the distance between elements when the user agent renders them (the subject of the next chapter).

4. Comments. It is of value at times to be able to insert text that does not appear when the wider body of content is rendered, as we can then leave notes in the code that assist us in the production process. For example, you might need to mark a spot that needs further attention, but where you must defer the work required until later. In this case, you might put ‘Come back to this bit’ into the code, prepending and appending that note with a special symbol sequence, as the next example shows:

<!--

Come back to this bit

-->

When used together, the ‘<!--’ and ‘-->’ sequences, and the content between them, form a ‘comment’, which user agents ignore, so that it remains invisible when the content is rendered. Note here that if you start a comment using ‘<!--’ then you must terminate it at some point with ‘-->’, otherwise the user agent will treat everything that follows, up to the end of the file, as part of the comment.

Note also that it is useful to adopt a convention in marking places that require further work, by inserting, say, three hashes into a comment. It then becomes a simple matter to track down all such locations by searching simply for ‘###’ in your text editor.

You can also use lines of equals signs as comments (not dashes – they confuse the user agent), which serve admirably in splitting up tracts of code into logical sub-sections. For example, the HTML that implements the end of the previous section and the start of this section looks like this:

   </p>
</div>

<!--== Basic Structure =============================================================-->

<div class = "Section" id = "Basic_Structure">
   <h2>Basic Structure</h2>
   <p>
   One of the principal ways of organising prose is to break it into paragraphs,
   two plain-text examples of which are presented in the first part of the
   following example,

Finally, you can use comment syntax to disable sections of code temporarily when, say, you are trying to diagnose a particularly obscure problem. The final appendix, Troubleshooting, considers this technique in more detail.
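As a brief illustration of the two uses of comments described above, the following sketch marks a spot for later attention using the ‘###’ convention, and disables a paragraph temporarily. The wording of the notes and the placeholder text are illustrative only.

```html
<!-- ### Come back to this bit -->

<p>
This paragraph renders as normal.
</p>

<!-- Disabled temporarily while diagnosing a problem
<p>
This paragraph does not render while the comment surrounds it.
</p>
-->
```

Bear in mind that comments cannot be nested, so a region that you disable in this way should not itself contain a comment, as the first ‘-->’ that the user agent encounters ends the comment.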

With these points established, let us now explore HTML a little more formally.

Page Anatomy

The simple example of paragraphs shown in the previous section only scratches the surface. In fact, all valid HTML files have a higher-level structure above that of individual paragraphs, so consider the next example:

<!DOCTYPE html>
<html>
   <head>
      <meta charset = 'UTF-8'/>
      <title>Lorem Ipsum</title>
   </head>

   <body>
   </body>

</html>

Aside from the ‘meta’ tag, which is optional, this is the smallest syntactically valid HTML file that is possible. The rules state that every HTML page must start with the (odd-looking) <!DOCTYPE html> at the top (which tells the user agent that you are using a modern form of HTML). It must also contain an <html></html> element, within which there should be a <head></head> element and a <body></body> element.

Moreover, the head element must contain a <title></title> element, and that must carry content of some kind, hence the ‘Lorem Ipsum’ you see above. Your HTML code is not valid if it deviates from this, and even though e-readers do not display the contents of the <title> element, you must include something there, as invalidity on the part of your code impinges on its acceptability to some e-book retailers, as the previous chapter points out.

Note too that the <head> and <body> elements are known as ‘child’ elements, while the <html> element is their ‘parent’ (or the ‘document’ element), because their relationship is hierarchical. This notion of hierarchy extends to an arbitrary depth, such that the elements that form the actual words and pictures of a given page are always children, directly or indirectly, of the <body> element (and no other). Moreover, some of those elements can contain their own children (which can contain their own children, and so on).
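To make the idea of hierarchy concrete, the following sketch annotates a small page with comments that state each element's place in the structure. The content is placeholder text, and the <b> element (which a later section covers) appears only to show a child of a child: it sits inside a <p> element, and is therefore an indirect child of <body>.

```html
<html>                               <!-- parent of <head> and <body> -->
   <head>                            <!-- child of <html>             -->
      <title>Lorem Ipsum</title>     <!-- child of <head>             -->
   </head>

   <body>                            <!-- child of <html>             -->
      <p>                            <!-- child of <body>             -->
      Lorem <b>ipsum</b> dolor
      </p>
   </body>
</html>
```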

Finally, and if you are wondering about the purpose of the <meta> element, this informs the user agent about what is called the ‘character encoding’ of the file. This is a rather technical matter, about which you can forget in most circumstances, but you should consult Appendix F, Unicode and All That, should you need an explanation. Whatever the case, however, and while it is optional, it is important to include it in any HTML that you generate as it precludes a certain class of problem, so just include it as a matter of course.

Given these points, and to combine the examples we have seen so far, the following is a valid piece of HTML that contains some content:

<!DOCTYPE html>
<html>
   <head>
      <meta charset = 'UTF-8'/>
      <title>Lorem Ipsum</title>
   </head>

   <body>
      <p>
      Lorem ipsum dolor sit amet, consectetur
      adipiscing elit, sed do eiusmod tempor
      incididunt ut labore et dolore magna aliqua.
      </p>

      <p>
      Ut enim ad minim veniam, quis nostrud
      exercitation ullamco laboris nisi ut aliquip
      ex ea commodo consequat.
      </p>
   </body>

</html>

This would render no differently to the initial example given above, and so you may wonder why the <head> element is even necessary. It is true that it serves no explicit purpose in this simple example, but the rules require it, and it does perform an important role in more complex scenarios, which the next chapter makes clear.

Headings

To display titles and subtitles, HTML provides the <h1> to <h6> series of tags, and the next example shows the full set and the way in which they render.

<h1>Biggest </h1>
<h2>Smaller </h2>
<h3>Smaller </h3>
<h4>Smaller </h4>
<h5>Smaller </h5>
<h6>Smallest</h6>

Biggest

Smaller

Smaller

Smaller

Smaller
Smallest

The ‘h’ in these tags stands for ‘heading’, and the number that follows determines the default size of the text. There are only six levels, but this should be sufficient for any need, and do note that CSS (considered in the next two chapters) gives you full control over their size and style should you need to override the defaults that user agents employ.
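By way of illustration, the following sketch shows how headings and paragraphs might combine within a chapter. It assumes, purely for the sake of example, that the chapter title uses <h1> and that each section title uses <h2>; the text itself is placeholder content.

```html
<h1>Lorem Ipsum</h1>

<h2>Dolor Sit Amet</h2>
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
</p>

<h2>Ut Enim</h2>
<p>
Ut enim ad minim veniam, quis nostrud exercitation ullamco.
</p>
```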

Text Effects

Beyond the basics demonstrated above, you will need at times to render text such that it is, for example, italicised or emboldened. Imagine that you wish to stress the word ‘ipsum’ in the phrase given below, by italicising it.

Lorem ipsum dolor

To achieve this, you wrap the word in a pair of <i></i> tags (where ‘i’ stands for ‘italics’), as the next example shows:

Lorem <i>ipsum</i> dolor
Lorem ipsum dolor

Similarly, if you wish to embolden a word or phrase, you should wrap it in a pair of <b></b> tags, where the ‘b’ stands for ‘bold’:

Lorem <b>ipsum</b> dolor
Lorem ipsum dolor
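Should you need text that is both emboldened and italicised, one approach is to nest one element inside the other, as the following sketch suggests. The point to observe is that the inner element closes before the outer one.

```html
<!-- The <i> element opens last, and so closes first -->
Lorem <b><i>ipsum</i></b> dolor
```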

Note that, while HTML furnishes the <u> and <s> tags, which cause the rendering of their content as underscored and struck-through text respectively, you cannot use them in e-books. This is because the EPUB standard proscribes them, and you should see the Textual Emphasis section in Chapter Four for an explanation of this issue, and for an alternative approach that the EPUB standard does sanction.

Finally in this section: technical and scientific e-books would be seriously compromised without the ability to communicate information such as 2⁸ and H₂O, and HTML caters for such requirements with the <sup> and <sub> tags (short for ‘superscript’ and ‘subscript’ respectively). To render the two examples in the previous sentence, the mark-up for that content looks like this:

Finally in this section: technical and scientific
e-books would be seriously compromised without the
ability to communicate information such as
2<sup>8</sup> and H<sub>2</sub>O and HTML caters
for such...
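Two further illustrations of the same tags follow, chosen here only as familiar examples: the first should render with a raised ‘2’ after ‘mc’, and the second with a lowered ‘2’ after ‘CO’.

```html
<p>
E = mc<sup>2</sup>
</p>

<p>
CO<sub>2</sub>
</p>
```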

At this point in this chapter, you have seen enough HTML to format the great majority of e-books. Accordingly, and if your book is typical in that it contains only paragraphs, titles, subtitles and links (in the form of the table of contents) then you need read no further, and you can proceed to the next chapter, Style, which considers the control of your content's appearance. If your content needs more HTML than covered so far, however, the remaining sections in this chapter will show you some more advanced constructions.

Images

You may wish to include images in your book (other than the cover, which is mandatory); indeed, it may be a picture book of some kind, where images predominate, in which case it may contain only a modicum of text. Emplacing an image within a chapter is simple, as the following example illustrates:

<img src = 'MyPicture.png'/>

This code constitutes an image element, and it differs from the elements demonstrated above in that it is an ‘empty element’, which is to say that it has no closing tag but ‘closes itself’ by means of the trailing slash. This means that the following form is incorrect:

<img src = 'MyPicture.png'></img>

This is because an <img> element does not ‘contain’ any content. Such elements go by the term ‘replaced element’ because, unlike <p> elements, they do no more than tell the user agent that an image should appear at that point in the rendered content. That is, <img> elements do not represent content intrinsically, but act as proxies for content that is external to the file in question, where the value of the ‘src’ attribute tells the user agent where to find that content.

In the example above, that attribute states simply that the image is called MyPicture.png, and implies that the image file is in the same directory (folder) as the HTML file, but do note that it is a very good idea to have a special sub-directory that contains all the images for your book. Indeed, this text assumes that you will use Sigil to create your EPUB file, and so you should create a directory structure that conforms precisely to the diagram you see here, and should segregate accordingly the files that comprise your book within that structure (although you can omit the Fonts directory if your book does not employ custom typefaces).

This is because this structure replicates the way that Sigil organises things internally within a given EPUB, and so adopting it will ensure that your images load correctly when you test your chapters in both a web browser and an e-reader. Failure, on the other hand, to use this structure will force you to edit the value of every src attribute once you have imported your content into Sigil – an extremely tedious and time-consuming task when you are dealing with many images.

It follows that you should always prepend the values for your src attributes in any image elements with ‘../Images/’, as this will allow user agents to locate a given image. Given that, and to develop the example above, your src attribute-values should conform to the following pattern:

<img src = '../Images/MyPicture.png'/>

…where the two dots mean ‘go up a level in the directory structure’, and the slash that follows means ‘then descend into the Images directory’.

With these points in hand, remember that there are two forms of image: raster and vector, where these terms refer to the method by which pictorial information is represented (and Appendix B explores the differences between them). Given this, and if you wish to use SVG images in your content, the mechanism is the same as for raster images, which is that you use an <img> element in just the same way.

To use a simple triangle as an example, where the SVG file in question is called ‘Triangle.svg’, the HTML for emplacing that image within a given chapter is as follows:

<img src = '../Images/Triangle.svg'/>

Note here that Amazon's documentation states currently that you can also include SVG images using HTML's <object> and <embed> tags. This is false, as any attempt to use those elements in an e-book will cause Amazon's Kindle Previewer (and the KindleGen tool that it runs internally) to ignore those tags, and to emit a warning message about unrecognised tags. The resulting KF8 file does not render any image that you attempt to include in this way, although the EPUB version of the same book does render the image. This means that the only way to get SVG images into content that is targeted at Kindle devices is by using <img> elements.

Five final points in this section:

1. Images as well as text can be links. All you do is wrap a given <img> element in an anchor element, exactly as shown above for textual links. A click/touch on the image in question will then take the reader to the appropriate location.

2. Appendix E, Working with SVG, considers the idea of clickable SVG elements, where elements within an SVG image can be made to act as links that connect to different parts of a book's content.

3. The distinction between replaced and non-replaced elements has some bearing on their behaviour within the flow of a given piece of content, and Chapter Five, Layout, considers the implications here. This issue need not concern you unless you have a somewhat advanced vision for your book's styling and layout.

4. In web development, image elements must possess an alt attribute, the absence of which renders invalid the HTML of which they are a part. The value of that attribute should be a human-readable word or phrase that describes the content within the image in question, the idea being that, if a given network connection is slow, the text can act as a stand-in until the browser manages to retrieve the image data. Additionally, speech synthesisers can (in principle) read the content of the alt attribute, thus making the web page in question accessible to the visually impaired.

Clearly, the idea of an e-reader being unable to load a given graphic should not apply, as the graphic in question should form a part of the book file itself. Moreover, text-to-speech enabled e-readers like the Kindle Touch just skip to the next block of prose without reading the value of the alt attribute. Despite these points, however, you should give a meaningful and appropriate value for that attribute in any <img> elements you emplace within your content in case some future customer buys your book and consumes it via a yet-to-be and more-accommodating user agent.

5. Currently, Amazon's ‘Look Inside’ feature, which allows customers to sample the first 10% of an e-book, supports neither SVG nor custom typefaces. This is a serious deficiency, as that 10% is your chance to really show off your book to the potential purchaser, and so Amazon does you no favours in this respect (nor itself, indirectly). Sadly, the only way around the SVG problem is to use JPG equivalents in the first 10% of your book, and to use SVG (if that is your desire) in the following 90%.

This is the approach taken with the e-book version of this guide, where the revenue difference chart in the pricing section in the first chapter is a JPG image (which renders beautifully as an SVG), as is the master production flowchart presented in that chapter.

In the case of custom typefaces, an alternative is to render the text in question within a vector- or raster-graphics editor, and then to export the image as a JPG. You can then use an image element in place of the native text at the appropriate point(s) in the first 10% of your book.

None of this is satisfactory, but, as with many things that are Kindle-related, we just have to live with it.
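To draw the first and fourth points above together, the following sketch shows an image element that carries an alt attribute, followed by an image that acts as a link. The descriptive text and the link target are assumptions made for illustration (the target borrows the ‘Basic_Structure’ id seen in an earlier example), so you should substitute values that suit your own content.

```html
<!-- An image with a human-readable description -->
<img src = '../Images/MyPicture.png' alt = 'A short description of the picture'/>

<!-- An image wrapped in an anchor element, so that it acts as a link -->
<a href = '#Basic_Structure'>
   <img src = '../Images/Triangle.svg' alt = 'A simple triangle'/>
</a>
```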

Lists

It is common for books to present information in the form of lists, and HTML supports this requirement in the form of list elements, which come in two flavours, unordered and ordered.

An unordered list simply presents the list items with some form of mark or symbol, whereas an ordered list presents the items in a numbered fashion. The next example shows the code for an unordered list.

<ul>
   <li>Lorem</li>
   <li>Ipsum</li>
   <li>Dolor</li>
</ul>
  • Lorem
  • Ipsum
  • Dolor

Here, the <ul></ul> pair tell the user agent that the element is an unordered list (hence ‘ul’), and a given <li></li> pair denotes a list item.

Alternatively, you can create an ordered list by replacing the ‘u’ in the <ul></ul> tags with an ‘o’, which stands for ‘ordered’. To reprise the previous example in kind, the HTML for an ordered list, and the way such an element renders, looks like this:

<ol>
   <li>Lorem</li>
   <li>Ipsum</li>
   <li>Dolor</li>
</ol>  
  1. Lorem
  2. Ipsum
  3. Dolor

Do not worry for now if the style of the bullet point for an unordered list is not as you wish, nor if the numbering style in an ordered list does not fit your goals (you may desire, say, Roman numerals), as the next chapter, Style, explains how to manipulate things in that respect.

Tables

In addition to lists, some books need to present information in a tabular format, and HTML supports this requirement with a set of dedicated elements. The following example gives a simple piece of code that renders a table:

<table>
   <tr> <td>A1</td> <td>A2</td> <td>A3</td> </tr>
   <tr> <td>B1</td> <td>B2</td> <td>B3</td> </tr>
   <tr> <td>C1</td> <td>C2</td> <td>C3</td> </tr>
</table>  
A1 A2 A3
B1 B2 B3
C1 C2 C3

The <table></table> pair delimit a table element, whereas <tr></tr> pairs delimit rows within that table, and <td></td> pairs (which stand for ‘Table Data’) delimit the information held at a given row-column location.

Note again the regular layout of the code; it is a very good idea to observe the same layout regimen when constructing your own table code, as larger tables can be decidedly unwieldy, which makes it very easy to make mistakes. Such mistakes include forgetting to add an end </tr> or </td>, and giving a table a different number of cells in different rows (without a corresponding colspan or rowspan attribute, see below). This is one point where the value of using a proper stand-alone text editor (with, one hopes, a columnar selection feature), rather than your word processor or tools such as Sigil and Calibre, comes swiftly to the fore.

Note also that, while a validator will trap problems such as too few/many </td>s, the ‘shove it in anyhow and let the validator find the problems’ attitude is the approach of the rank amateur, as you will still have to locate the offending run of code, at which point the value of good layout will become undeniably apparent.

Tables are very versatile, and you can make a given cell span a number of columns and/or rows. The following alters the first row of the example given above to demonstrate the use of the ‘colspan’ attribute:

<table>

   <tr> <td colspan = '2'>A1/A2</td>             <td>A3</td> </tr>
   <tr> <td              >B1   </td> <td>B2</td> <td>B3</td> </tr>
   <tr> <td              >C1   </td> <td>C2</td> <td>C3</td> </tr>

</table> 
A1/A2 A3
B1 B2 B3
C1 C2 C3

This code tells the user agent to make the top-left cell span two columns, hence the colspan attribute (‘COLumn SPAN’) that you see, which must have an integer value greater than zero. Note again the layout used here, and observe how the trailing angle bracket in the first <td> elements of the second and third rows is lined up with the trailing bracket of the top left <td> element. You are entirely free to add superfluous whitespace in this fashion, and it is a good idea to avail yourself of this latitude, as it yields good code layout, which is our front-line defence against confusion.

To complement colspan, the ‘rowspan’ attribute allows you to make a cell occupy an arbitrary number of rows (up to the total number of rows in the table in question). Consider the following:

<table>

   <tr> <td rowspan = '2'>A1/B1</td> <td>A2</td> <td>A3</td> </tr>
   <tr>                              <td>B2</td> <td>B3</td> </tr>
   <tr> <td              >C1   </td> <td>C2</td> <td>C3</td> </tr>

</table> 
A1/B1 A2 A3
B2 B3
C1 C2 C3

This provides a route to labelling a set of rows, or all rows in a table, and Appendix C, Advanced Styling & Layout, shows how to style such table cells accordingly.

If you wish to see a real application of the rowspan attribute, take a look at the greyshades table given in the section on colour in Chapter Six, Composition. The first two columns have 16 rows, but the third comprises a single cell that spans all 16 rows, and an SVG graphic fills the vertically-extended cell that results.

Beyond these principles, and if you wish to tell the user agent that the first row of cells contains column headings rather than actual table data, you can use the <th> element. For example, the following:

<table>

   <tr> <th>Col 1</th> <th>Col 2</th> <th>Col 3</th> </tr>

   <tr> <td>A1   </td> <td>A2   </td> <td>A3   </td> </tr>
   <tr> <td>B1   </td> <td>B2   </td> <td>B3   </td> </tr>
   <tr> <td>C1   </td> <td>C2   </td> <td>C3   </td> </tr>

</table>  
Col 1 Col 2 Col 3
A1 A2 A3
B1 B2 B3
C1 C2 C3

…shows that the user agent, in the absence of any explicit styling information that you provide, renders <th> elements in the bold version of whatever the default typeface may be, and centres the headings in their respective cells.

You can also give a table a caption, using the <caption> element. For example:

<table>

   <caption>Lorem Ipsum</caption>

   <tr> <td>A1</td> <td>A2</td> <td>A3</td> </tr>
   <tr> <td>B1</td> <td>B2</td> <td>B3</td> </tr>
   <tr> <td>C1</td> <td>C2</td> <td>C3</td> </tr>

</table>
Lorem Ipsum
A1 A2 A3
B1 B2 B3
C1 C2 C3

Note too that there are other elements that you can use within <table> elements, such as <tfoot> and <col> and <colgroup>, the latter of which are of great value when styling columns of values in a particular way. See the coverage of this in Appendix C, Advanced Styling & Layout.
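The features covered in this section can work together in a single table. The following is a minimal sketch that combines a caption, a row of column headings and a cell that spans two columns; the cell contents are placeholders, and the layout follows the conventions used in the examples above.

```html
<table>

   <caption>Lorem Ipsum</caption>

   <tr> <th>Col 1</th> <th>Col 2</th> <th>Col 3</th> </tr>

   <tr> <td colspan = '2'>A1/A2</td>             <td>A3</td> </tr>
   <tr> <td              >B1   </td> <td>B2</td> <td>B3</td> </tr>

</table>
```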

Two final points in this section:

1. Clickable Tables. Anchor elements (i.e. links) are not restricted to encapsulating only text or images, as they can contain a <table> element, which means that a click or touch on such a table will take the reader to another point in the content, just as if it were a textual link.

2.

Character References

Plainly, there are many alphabets in the world, yet only a relatively small number of keys on a computer keyboard. How then does one write a French romance, where the two protagonists meet in a swish café, when you do not have an ‘e-acute’ key?

Equally, that Asian thriller of yours includes as a critical plot device a document written in Mongolian that holds the clue to the mystery. You need to show your reader the cryptic message that your protagonist has discovered, but where-oh-where is the wretched ‘Mongolian end of chapter’ key? Yes, written Mongolian really does have an end of chapter symbol, and it is this:

HTML surmounts this problem in two equivalent ways, where the first comprises the concept of ‘character entities’. Do not let this term worry you (it relates to the arcana of HTML's underlying semantic model); all you need to know is that a character entity is a sequence of letters that acts as a stand-in or mnemonic for a given character, and which must start with an ampersand and must end with a semi-colon. However, not all e-readers recognise these. Moreover, the set of character entity mnemonics that HTML defines is relatively limited, yet you may need to render a particularly unusual character (to western eyes) such as the Mongolian end-of-chapter symbol above.

The alternative therefore is to use a ‘numeric character-entity reference’ (another scary term for a simple idea). This employs the same approach as a named character entity, except you place a hash after the ampersand, after which you identify the character you desire by stipulating a unique number.

That number is known as the character's ‘Unicode Codepoint’ (again, a seemingly fearsome but harmless term), and you should consult Appendix F, Unicode and All That, if you need a detailed understanding. The essential idea, however, is that Unicode assigns a unique number (a codepoint) to each character in almost all the writing systems of the world, a number that you should cite when you want a given character to appear at a given location in your content.

The following shows this idea in action, where the value 233 is the codepoint for the e-acute character as it appears in the code that generates the first paragraph of this section:

...two protagonists meet
in a swish caf&#233;,
when you do not have an
e-acute key?    
...two protagonists meet in a swish café, when you do not have an e-acute key?

With that in hand, the table you see below gives some of the codepoints that are common to e-books, citing their values in decimal:

Codepoint  Glyph  Name
8220       “      left double quotation mark
8221       ”      right double quotation mark
8216       ‘      left single quotation mark
8217       ’      right single quotation mark
39         '      apostrophe
60         <      less-than sign
62         >      greater-than sign
38         &      ampersand
8230       …      horizontal ellipsis
169        ©      copyright sign
174        ®      registered sign
8482       ™      trademark sign
163        £      pound sign
8364       €      euro sign
8211       –      en dash
8212       —      em dash

If you need something that is not in this list, then you should consult a Unicode reference, and the following search terms should be useful in this respect:

unicode character table

Such references usually cite codepoints in decimal and hexadecimal (or base-16), which again need be no cause for concern even though it sounds dreadfully mathematical, and if you need a grounding here, Appendix F, Unicode and All That, gives a full explanation of the relevant concepts. The essential point, however, is that stipulating a character reference using hexadecimal rather than decimal requires an additional ‘x’ after the ampersand and hash.

Given that, and to reprise the example above, the following shows the use of the hexadecimal number E9 (as unsettling as the concept of 'E' being a number may seem), which is the codepoint for the e-acute character:

...two protagonists meet
in a swish caf&#xE9;,
when you do not have an
e-acute key?    
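To set the forms side by side, the following sketch codes the same word three times: with the named entity for e-acute (‘eacute’, one of the mnemonics that HTML defines), with decimal references, and with hexadecimal references. The quotation marks use codepoints 8220 and 8221, which are 201C and 201D in hexadecimal. Given the point made above that not all e-readers recognise named entities, the first form is shown only for comparison.

```html
<!-- Named entity (not recognised by all e-readers) -->
<p>
Caf&eacute;
</p>

<!-- Decimal references -->
<p>
&#8220;Caf&#233;&#8221;
</p>

<!-- Hexadecimal references -->
<p>
&#x201C;Caf&#xE9;&#x201D;
</p>
```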

You will notice that the table above includes entity references for the <, > and & characters. These may seem redundant to you, as it would seem that all keyboards have keys for these, but they exist because a piece of content that reads:

Laurel & Hardy were a comedy duo.

…is invalid when coded as-is in a piece of HTML because user agents read the ampersand as the beginning of a character entity reference, which it is not. Similarly, the following:

100 >  99
 99 < 100

…says simply, and in mathematical notation, that one hundred is greater than ninety-nine, and that ninety-nine is less than one hundred. However, coding this directly into a tract of HTML will cause the user agent to see the < and > symbols as denoting the beginning and end respectively of a tag, which of course they are not.

Web browsers are very forgiving in the face of these coding solecisms, as it is easy for them to see subsequently that they are dealing only with ordinary content, so usually you just see the character as-is. However, the code is invalid, which, as noted in the previous chapter, can cause problems further down the line. Hence, when you wish to render those symbols as part of your content, you should use the numeric reference equivalents given in the table above. This means that the code that renders the two examples above looks like this:

Laurel &#38; Hardy were
a comedy duo
100 &#62;  99
 99 &#60; 100

Preformatted Text

As pointed out initially above, user agents elide extra spaces and line-breaks in HTML files, and while this is fine for normal prose, there are times when you do not want this feature. For example, this guide contains numerous HTML and CSS examples where the layout of the code is critical to your understanding. Another example is e-books that contain poetry, where you need to control the layout of the text precisely.

In addressing this, you could try using the positioning and layout features of CSS, but this would be tedious, error prone and unreliable. Alternatively, you could subvert HTML's support for tables to achieve the layout you desire, but this too would prove dreadfully tedious and inflexible, and would not always give you what you wanted. Thankfully, HTML provides the <pre> element, where ‘pre’ stands for ‘preformatted’, which tells the user agent to preserve the textual layout when rendering the content.

Consider the following example:

<pre>

   Indented by three spaces
      Indented by six spaces
         Indented by nine spaces

</pre>
   Indented by three spaces
      Indented by six spaces
         Indented by nine spaces

This gives authors of technical treatises exactly what they need, and proffers also an alternative to tabulating information by tussling with a mass of <table>, <tr> and <td> elements. That is, to present a simple table of information, you could lay out the table's content in your editor as you wish it to appear in the user agent, and then wrap that content with a <pre>...</pre> pair.
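As a sketch of that second use, the following lays out a small table of information using nothing more than spaces within a <pre> element. The column headings and values are invented for illustration, and the alignment relies on the user agent rendering the content in a fixed-width typeface, which is the usual default for <pre> elements.

```html
<pre>

   Item      Qty    Price
   Lorem       1     2.50
   Ipsum      10    12.00
   Dolor     100    99.99

</pre>
```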

Do note here, however, that there is a defect in at least some Kindles, where lines that are longer than the lateral extent of the <pre> element of which they are a part, wrap around on to the line below. This should not occur, but there is a fix for this, which the next chapter considers.