XML

XML and its offspring.

XML

Some notes on the basics of XML.
It is an old and weird way to write configuration files, define standards, etc.
I hate it, to be honest.

Books & References

XML Schema: The W3C's Object-Oriented Descriptions for XML

Overall Structure

<?xml version="1.0"?>
<!--The XML declaration above, if present, must be the very first thing in the file-->
<!--Every schema must declare the XML Schema namespace-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!--some simple types may be derived-->
    <xs:simpleType name="aSimpleTypeName">
    ...
    </xs:simpleType>
    <!--some complex types may be defined-->
    <!--elements would be defined-->
    <xs:element name="anElementName" type="xs:aTypeName"/>
  ...
</xs:schema>

Derive a simple type

The 'Data Type': simple datatypes describe the content of a text node or an attribute value.

<!--By adding restrictions to predefined simple type-->
<xs:simpleType name="typeName">
  <xs:restriction base="xs:baseTypeName">
    <!--every added restriction is called a 'facet'-->
    <!--enumeration-->
    <xs:enumeration value="val1"/>
    <xs:enumeration value="val2"/>
    ...
    <!--range-->
    <xs:maxExclusive value="max_val_e"/>
    <xs:minInclusive value="min_val_i"/>
    <!--pattern-->
    <xs:pattern value="some_regex"/>
  </xs:restriction>
</xs:simpleType>

<!--By list-->
<!--with predefined type elements-->
<xs:simpleType name="aList">
  <xs:list itemType="xs:baseType"/>
</xs:simpleType>

<!--by union-->
<xs:simpleType name="integerOrDate">
  <xs:union memberTypes="xs:integer xs:date"/>
</xs:simpleType>
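As a concrete sketch of the three derivation methods, the schema below defines a restricted integer, a list of it, and a union. The type and element names (percentage, percentageList, naToken, percentageOrNA, scores) are made up for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- by restriction: an integer limited to the range 0..100 -->
  <xs:simpleType name="percentage">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- by list: whitespace-separated percentages, e.g. "10 55 100" -->
  <xs:simpleType name="percentageList">
    <xs:list itemType="percentage"/>
  </xs:simpleType>
  <!-- helper type for the union: the single literal token "n/a" -->
  <xs:simpleType name="naToken">
    <xs:restriction base="xs:token">
      <xs:enumeration value="n/a"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- by union: either a percentage or "n/a" -->
  <xs:simpleType name="percentageOrNA">
    <xs:union memberTypes="percentage naToken"/>
  </xs:simpleType>
  <xs:element name="scores" type="percentageList"/>
</xs:schema>
```

With this schema, <scores>10 55 100</scores> is valid, while <scores>10 101</scores> is not, because 101 violates the maxInclusive facet.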

Define a complex type

The 'Data Structure': complex types describe the markup structure. They use simple datatypes to describe their leaf element nodes and attribute values, but have no other links with simple datatypes.

Types of content models

Content model    Empty   Simple   Complex   Mixed
Child elements   No      No       Yes       Yes
Child text       No      Yes      No        Yes
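The four content models are easiest to tell apart in an instance document. A small sketch, with made-up element names and values:

```xml
<examples>
  <!-- empty: no child elements and no text (attributes are still allowed) -->
  <pagebreak id="p12"/>
  <!-- simple: text only -->
  <title>Norwegian Wood</title>
  <!-- complex: child elements only -->
  <author>
    <name>Haruki Murakami</name>
  </author>
  <!-- mixed: text interleaved with child elements -->
  <note>First published in <year>1987</year> in Japan.</note>
</examples>
```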
Simple Content Model

A simple content model means the element has only text nodes and attributes; it is the model closest to simple types.
The operation of adding attributes to a simple type to create a simple content complex type is called an extension of the simple type.

<!--create a simple content complex type-->
<!--anonymous-->
<xs:element name="anElementName">
  <!--start defining an anonymous complex type-->
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="aSimpleTypeName">
        <xs:attribute ref="anAttributeName"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <!--end of type defining-->
</xs:element>
<!--global-->
<xs:complexType name="aComplexTypeName">
   <xs:simpleContent>
     <xs:extension base="aSimpleTypeName">
       <xs:attribute ref="anAttributeName"/>
     </xs:extension>
   </xs:simpleContent>
 </xs:complexType>



<!--derivation-->
<!--by extension : add an attribute-->
<xs:complexType name="secondComplexTypeName"> 
    <xs:simpleContent>
      <xs:extension base="aComplexTypeName">
        <xs:attribute ref="secondAttributeName"/>
      </xs:extension>
    </xs:simpleContent>
</xs:complexType>
<!--by restriction-->
<xs:complexType name="thirdComplexTypeName"> 
    <xs:simpleContent>
      <xs:restriction base="secondComplexTypeName">
        <xs:pattern value=".*"/>
        <!--remove an attribute-->
        <xs:attribute ref="secondAttributeName" use="prohibited"/>
      </xs:restriction>
    </xs:simpleContent>
</xs:complexType>
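To make the extension step concrete, here is a minimal sketch of a simple content complex type. The price element and its currency attribute are hypothetical, and the attribute is declared locally with name and type instead of through a ref to a global attribute.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price">
    <xs:complexType>
      <xs:simpleContent>
        <!-- the text content of the element is an xs:decimal -->
        <xs:extension base="xs:decimal">
          <!-- the extension adds one required attribute -->
          <xs:attribute name="currency" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A matching instance would be <price currency="EUR">12.50</price>.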

Complex Content Model

Complex content is created by defining the list (and order) of an element's child elements and attributes, using one of three compositors:
xs:sequence to define an ordered list of particles
xs:choice to define a choice of one particle among several
xs:all to define an unordered list of particles

<xs:element name="author">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="born"/>
      <xs:element ref="dead" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute ref="id"/>
  </xs:complexType>
</xs:element>
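The example above only uses xs:sequence. A sketch of the other two compositors, with hypothetical element names (contact, email, phone, dimensions, width, height):

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- xs:choice: a contact holds either an email or a phone, not both -->
  <xs:element name="contact">
    <xs:complexType>
      <xs:choice>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <!-- xs:all: width and height must both appear, in any order -->
  <xs:element name="dimensions">
    <xs:complexType>
      <xs:all>
        <xs:element name="width" type="xs:decimal"/>
        <xs:element name="height" type="xs:decimal"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```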


Inclusion

Schema inclusions must be top-level elements, children of the xs:schema element. Their effect is to include all the top-level declarations of the included schema (which doesn’t need to be a complete schema).

<xs:include schemaLocation="simple-types.xsd"/>
<xs:include schemaLocation="complex-types.xsd"/>
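A sketch of how an including schema might use a definition from an included file. The type name isbnType is an assumption: it stands for any top-level simple type defined in simple-types.xsd.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- xs:include comes first, before the schema's own declarations -->
  <xs:include schemaLocation="simple-types.xsd"/>
  <!-- isbnType is assumed to be declared at top level in simple-types.xsd -->
  <xs:element name="isbn" type="isbnType"/>
</xs:schema>
```

Note that xs:include only works when the included schema has the same target namespace as the including one, or none at all.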

Namespace Declaration

This works a bit like an import in OOP languages: it binds a vocabulary to a prefix, or makes it the default.

<!--without prefix-->
<library xmlns="http://dyomedea.com/ns/library">
<!--with a prefix-->
<lib:library xmlns:lib="http://dyomedea.com/ns/library">

The namespace URI doesn't necessarily resolve to any web page! It is just a universal identifier for a shared specification that any client application is required to recognize and obey. The specification only defines the abstract meanings and datatypes of the namespace's vocabulary; validation and parsing are done entirely on the client side.
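The two declaration styles are interchangeable. In the sketch below, the two documents are equivalent to a namespace-aware parser; the book child element and its id attribute are made up for illustration.

```xml
<!-- document 1: default namespace, applies to library and its unprefixed descendants -->
<library xmlns="http://dyomedea.com/ns/library">
  <book id="b1"/>
</library>

<!-- document 2: the same namespace bound to the prefix lib -->
<lib:library xmlns:lib="http://dyomedea.com/ns/library">
  <lib:book id="b1"/>
</lib:library>
```

The prefix itself carries no meaning; only the URI it is bound to identifies the vocabulary.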

EPUB 3

Links

EPUB 3 Community Group
EPUB 3.2 Specification
OCF specification
Structural Semantic Vocabulary

Books

Matt Garrish – What Is EPUB 3 (2011)
Matt Garrish, Markus Gylling – EPUB 3 Best Practices (2013)

1 Package Document

The package document answers the questions a reading system asks when it opens an EPUB:

Which EPUB is this (identifiers)?
What names is it known by (titles)?
Does it use any vocabularies I don’t necessarily understand (prefixes)?
What language does it use?
What are all the things in the box (manifest)?
Which one is the cover image, and do any of them contain MathML or SVG or scripting (manifest item properties)?
In what order should I present the content (spine), and how can a user navigate this EPUB (the nav document)?
Are there resources I need to link to (link)?
Are there any media objects I’m not designed by default to handle (bindings)?

1.1 The Package Document Structure

<?xml version="1.0"?> 
<package version="3.0" xmlns="http://www.idpf.org/2007/opf" ...  >
   <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      ...
   </metadata>
   <manifest>
      ...
   </manifest>
   <spine>
      ...
   </spine>
</package>

Other optional attributes of package:

  • xml:lang
    The language of the package document (not necessarily the same as the publication!)

  • dir
    The text directionality of the package document: left-to-right (ltr) or right-to-left (rtl)
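A sketch of a package element carrying both optional attributes. The values en and ltr are only examples, and the id pub-identifier is assumed to match a dc:identifier in the metadata.

```xml
<?xml version="1.0"?>
<package version="3.0"
         xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="pub-identifier"
         xml:lang="en"
         dir="ltr">
  <!-- metadata, manifest and spine go here -->
</package>
```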

1.2 Namespaces

1.2.1 DCMES

EPUB 3 uses the Dublin Core Metadata Element Set (DCMES), commonly referred to as “Dublin Core”, for much of its required and optional metadata.

DCMES is widely used as a basic framework for metadata of all sorts, from publication metadata to metadata for media like movies, audio, and images.

1.2.2 Default Vocabulary

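Terms from the default vocabulary need no prefix. They can be used on these package document elements:

meta – The workhorse of EPUB 3 metadata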
link – Enabling the inclusion of external resources
item – Providing metadata about each item in the manifest
itemref – For metadata associated with items in the spine

1.2.3 The Reserved Vocabularies

The reserved vocabularies provide commonly used sets of terms that can be used, with the proper prefix, without requiring those prefixes to be declared in the EPUB. The reading system is supposed to know where to find authoritative documentation of these vocabularies.

dcterms – A richer but more restrictive counterpart to DCMES in Dublin Core, designed to enable “linked data”
marc – A vocabulary commonly used by libraries for bibliographic metadata
media – The vocabulary on which EPUB 3’s Media Overlays specification depends
onix – The vocabulary used for book supply chain metadata

Still, it is clearer to declare everything explicitly:

<package xmlns="http://www.idpf.org/2007/opf" 
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         prefix="rendition: http://www.idpf.org/vocab/rendition/#" >

1.2.4 Other Vocabularies

To use any of these other vocabularies, their terms must include a prefix (similar to how namespaces work), and each such prefix used in an EPUB must be declared in the prefix attribute of the package element, which is the root container of the package document.

xmp
The Extensible Metadata Platform, widely used for metadata about images and other media:

prefix="xmp: http://ns.adobe.com/xap/1.0/"

prism
The very rich vocabulary used for magazine and other publication metadata:

prefix="prism: http://prismstandard.org/namespaces/basic/3.0/"

custom
A proprietary metadata scheme used by a publisher:

prefix="TimeInc: http://www.timeinc.com/PRISM/2.1/"
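Several prefixes can share one prefix attribute, as whitespace-separated "prefix: IRI" pairs. A sketch under that assumption, reusing the IRIs above; the prism:volume property value is made up for illustration.

```xml
<?xml version="1.0"?>
<package version="3.0"
         xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="pub-identifier"
         prefix="xmp: http://ns.adobe.com/xap/1.0/
                 prism: http://prismstandard.org/namespaces/basic/3.0/">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- a term from a declared vocabulary is written as prefix:term -->
    <meta property="prism:volume">12</meta>
    <!-- required metadata omitted -->
  </metadata>
  <!-- manifest and spine omitted -->
</package>
```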

1.3 The metadata Element

XML rules require that you declare the Dublin Core namespace in order to use the elements. This declaration is typically added to the metadata element, but can also be added to the root package element. For example:

<!--declaration of Dublin Core namespace-->
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">

Required elements:

<package ... unique-identifier="pub-identifier" >
...
<dc:identifier id="pub-identifier">urn:isbn:1223456789</dc:identifier>
<dc:title id="pub-title">title</dc:title>
<dc:language id="pub-language">en</dc:language>
<meta property="dcterms:modified">timestamp</meta>
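Put together as a well-formed whole, the required metadata might look like the sketch below. The UUID, title and timestamp are placeholder values; the timestamp follows the CCYY-MM-DDThh:mm:ssZ form used for dcterms:modified.

```xml
<?xml version="1.0"?>
<package version="3.0"
         xmlns="http://www.idpf.org/2007/opf"
         unique-identifier="pub-identifier">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- the id here must match unique-identifier on the package element -->
    <dc:identifier id="pub-identifier">urn:uuid:123e4567-e89b-12d3-a456-426614174000</dc:identifier>
    <dc:title id="pub-title">An Example Title</dc:title>
    <dc:language id="pub-language">en</dc:language>
    <meta property="dcterms:modified">2020-01-01T00:00:00Z</meta>
  </metadata>
  <!-- manifest and spine omitted -->
</package>
```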

Optional but useful:

<!--author-->
<dc:creator id="author">Bill Kasdorf</dc:creator>
<meta refines="#author" property="role" scheme="marc:relators">aut</meta>
<!--date in standard xs:dateTime format-->
<dc:date>2000-01-01T00:00:00Z</dc:date>
<!--source publication-->
<dc:source id="src-id">urn:isbn:9780375704024</dc:source>

1.3.1 The meta Elements

meta elements are contained in the metadata element, the first child of the package element, and from that central location they serve as a hub for metadata about the EPUB, its resources, its content documents, and even locations within the content documents.

<meta refines="#creator"
property="alternate-script"
xml:lang="ja">
  村上 春樹
</meta>
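The refines attribute only makes sense together with the element it points at. A sketch showing both sides; the romanized name and the file-as value are illustrative, and the fragment is assumed to sit inside a package element that declares the OPF default namespace.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- the id is what refines="#creator" points at -->
  <dc:creator id="creator">Haruki Murakami</dc:creator>
  <!-- the same name in its original script -->
  <meta refines="#creator" property="alternate-script" xml:lang="ja">村上 春樹</meta>
  <!-- the form to use when sorting -->
  <meta refines="#creator" property="file-as">Murakami, Haruki</meta>
</metadata>
```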
1.3.2 Example 1

meta elements with property values from the dcterms vocabulary may act as counterparts of dc-prefixed elements.

<dc:identifier id="pub-identifier">9781491991732</dc:identifier>
<meta id="meta-identifier" property="dcterms:identifier">9781491991732</meta>

<dc:title id="pub-title">Flask Web Development</dc:title>
<meta property="dcterms:title" id="meta-title">Flask Web Development</meta>

<dc:language id="pub-language">en</dc:language>
<meta property="dcterms:language" id="meta-language">en</meta>

<meta property="dcterms:modified">2018-03-05T19:09:05Z</meta>

<dc:publisher>O'Reilly Media, Inc.</dc:publisher>
<meta property="dcterms:publisher">O'Reilly Media, Inc.</meta>
    
<dc:date>2018-03-05</dc:date>
<meta property="dcterms:date">2018-03-05</meta>

<dc:description>...</dc:description>
<meta property="dcterms:description">
...
</meta>

<dc:creator>Miguel Grinberg</dc:creator>
<meta property="dcterms:creator">Miguel Grinberg</meta>

<meta name="cover" content="cover-image" />
<meta property="ibooks:specified-fonts">true</meta>
<meta xmlns="" name="BNContentKind" content="" />

1.3.3 Example 2

Template from AozoraEpub3: refines, file-as, etc.

<dc:title id="title">${title}</dc:title>
<meta refines="#title" property="dcterms:title">${title}</meta>
<meta refines="#title" property="file-as">
${titleAs}
</meta>

<dc:creator id="creator">${creator}</dc:creator>
<meta refines="#creator" property="dcterms:creator">
${creator}
</meta>
<meta refines="#creator" property="file-as">
${creatorAs}
</meta>

<dc:publisher id="publisher">${publisher}</dc:publisher>
<meta refines="#publisher" property="dcterms:publisher">
${publisher}
</meta>

<dc:language id="pub-lang">ja</dc:language>
<meta refines="#pub-lang" property="dcterms:language">ja</meta>
        
<dc:identifier id="pub-id">
urn:uuid:${identifier}
</dc:identifier>
<meta refines="#pub-id" property="dcterms:identifier">
urn:uuid:${identifier}
</meta>

<meta property="dcterms:modified">${modified}</meta>
<meta name="cover" content="img${coverImage.Id}"/>

<!--comics only-->
<!--setting pre-paginated content-->
<meta property="rendition:layout">pre-paginated</meta>
<meta name="fixed-layout" content="true"/>
<meta name="book-type" content="comic"/>
<meta name="zero-gutter" content="true"/>
<meta name="zero-margin" content="true"/>
<meta name="primary-writing-mode" content="horizontal-rl"/>
<meta name="RegionMagnification" content="false"/>
<meta name="orientation-lock" content="none"/>
<!--<meta name="original-resolution" content="844x1200"/>-->

1.3.4 Multiple Titles

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
   ...
   <dc:title id="t1">A Dictionary of Modern English Usage</dc:title>
   <meta refines="#t1" property="title-type">main</meta>
   <dc:title id="t2">First Edition</dc:title>
   <meta refines="#t2" property="title-type">edition</meta>
   <dc:title id="t3">Fowler’s</dc:title>
   <meta refines="#t3" property="title-type">short</meta>
   ...
</metadata>

1.4 Manifest and Spine

They constitute the “packing list” and a key aspect of the “assembly instructions” that make an EPUB so much more than just a “website in a box.” The manifest documents all of the individual resources that together constitute the EPUB, and the spine provides a default reading order by which those resources may be presented to a user.

1.4.1 Manifest

id

  • This is essential, so that each constituent part of the EPUB can be uniquely identified. It is what enables that versatile meta element to provide metadata associated with specific items, via its refines attribute.

href

  • An internationalized resource identifier (IRI) specifying the location of the resource. Resource names should be restricted to the ASCII character set.

media-type

  • The MIME media type that specifies the type and format of the resource.

Required:

<item id="chapter01"
      href="xhtml/c01.xhtml"
      media-type="application/xhtml+xml"/>

Example:


<!--ncx nav for EPUB2-->
<item id="toc.ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />

<!--css-->
<item id="epub-css" href="epub.css" media-type="text/css" />

<!--embedded fonts-->
<item id="epub.embedded.asset.1" 
      href="DejaVuSans-Bold.otf" 
      media-type="application/vnd.ms-opentype" />
      
<!--cover page-->
<item id="cover" href="cover.html" media-type="application/xhtml+xml" />

<!--cover-image-->
<item id="cover-image" 
      properties="cover-image" 
      href="assets/cover.png" 
      media-type="image/png" />
      
<!--normal image-->
<item id="img-idm140583869049392" 
      href="assets/fwd2_0201.png" 
      media-type="image/png" />
      
<!--copyright page-->
<item id="copyright-page-idm1" 
      href="copyright-page01.html" 
      media-type="application/xhtml+xml" />
      
<!--nav doc for EPUB 3-->
<item id="toc-idm140583871891600" 
      href="toc01.html" 
      media-type="application/xhtml+xml" 
      properties="nav" />
      
<!--content pages-->
<item id="preface-idm140583872047376" 
      href="preface01.html" 
      media-type="application/xhtml+xml" />
<item id="part-idm140583872046672" 
      href="part01.html" 
      media-type="application/xhtml+xml" />
...

1.4.2 Spine

The spine is required to list only those components that are not referenced by other components (primary content).

Each itemref may carry an optional linear attribute that specifies whether the item is primary (yes, the default) or auxiliary (no).
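A sketch of the linear attribute in use. The idref values are hypothetical and would each have to match the id of a manifest item.

```xml
<spine>
  <!-- primary content: linear defaults to "yes" -->
  <itemref idref="chapter01"/>
  <itemref idref="chapter02"/>
  <!-- auxiliary content such as an answer key: a reading system may keep it out of the main reading flow -->
  <itemref idref="answers" linear="no"/>
</spine>
```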

The spine may also contain a toc attribute that identifies an NCX file. The NCX is how EPUB 2 specified navigation; it is superseded in EPUB 3 by the Navigation Document (nav, an XHTML document).
In order to enable an EPUB 3 to be rendered by an EPUB 2 reading system, it has to include an NCX even though it also has to have a nav to conform to EPUB 3. Until EPUB 2 reading systems have become obsolete (no sign of that in the near future), publishers generally need to include both.

<package ...>
   ...
   <manifest>
      ...
      <!--nav doc for EPUB3-->
      <item id="nav"
            href="nav.xhtml"
            media-type="application/xhtml+xml"
            properties="nav"/>
      <!--ncx file for EPUB2-->      
      <item href="toc.ncx"
            id="ncx"
            media-type="application/x-dtbncx+xml"/>
      ...
   </manifest>
   <!--the toc attribute is an xs:IDREF that points to the id of the NCX manifest item-->
   <spine toc="ncx">
      ...
   </spine>
</package>

Example:

<itemref idref="cover" />
<itemref idref="titlepage-id" />
<itemref idref="copyright-page-id" />
<itemref idref="dedication-id" />
<itemref idref="preface-id" />
<itemref idref="part-id" />
<itemref idref="chapter-id" />
...

1.4.3 Page Rendering Direction

<!--pages progress right to left-->
<spine page-progression-direction="rtl" toc="ncx">
<!--pages progress left to right-->
<spine page-progression-direction="ltr" toc="ncx">