CMT 3315 Advanced Web Technologies: SIXTH TASK

1. Discuss the roles of entities in XML, giving examples of what problems can occur when they are not used properly.

Entities in XML are usually described as units of storage that serve as the building blocks of XML documents. They can usually be identified by their unique names and by their function of holding document content. Most entities have names, but two do not: the document entity and the external DTD. The external DTD (Document Type Definition) describes the structure and format of documents written in a particular XML language; it is optional, so it is used only if one exists. The document entity is discussed in more depth below. Most entities are declared in an entity declaration, which has to appear before the entity can be used in the document. Entities can store several types of data, which makes them extremely flexible.

The Document Entity

This is the top-level entity for any document and is extremely important, as it functions as the storage container for the complete document. It is unique and nameless, so unlike other entities it is never declared or referenced by name; it is simply where the XML processor starts reading, and it contains the document type declaration in which other entities are declared. The document entity is broken down into sub-entities, which may be broken down further, and this process continues until only content remains. Each entity declaration should have a unique entity name and some form of data associated with that name.

Examples of data that could be referenced as entities:

Strings of text
External files containing binary data or text data
A segment of the Document Type Definition

There are two types of entities used in XML documents, parsed and unparsed entities:

Parsed Entities (containing text data)

These are entities that contain XML data that is processed by an XML application; there are two basic types of parsed entities:

General Entities

This kind of entity is useful when text or document data is to be reused. An example is the copyright information on a website, which may be placed on every page of that website. A general entity is declared with a general entity declaration. The declaration must be placed in the DTD and can go in either the internal or the external DTD.

The internal DTD would be used for, say, a single document, and the external DTD if the declaration is to be shared across documents, e.g. the copyright notice placed on multiple web pages.
Once declared, the entity can be used anywhere in the content of the document.
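
As a minimal sketch of the reuse described above, the following Python snippet parses a small document whose internal DTD declares a general entity and references it twice. The entity name copyright, the element names and the notice text are made up for illustration; the standard-library parser used here does not validate, but it does expand entities declared in the internal DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity name, element names and notice text
# are illustrative assumptions, not taken from a real site.
DOCUMENT = """<!DOCTYPE site [
  <!ENTITY copyright "Copyright 2011 Example Site">
]>
<site>
  <page><footer>&copyright;</footer></page>
  <page><footer>&copyright;</footer></page>
</site>"""

root = ET.fromstring(DOCUMENT)

# Each reference is replaced by the declared text when the document is parsed.
footers = [footer.text for footer in root.iter("footer")]
print(footers)
```

Changing the text in the single declaration would change every footer, which is the point of using a general entity for shared content.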

Parameter Entities

This is the other type of parsed entity. It can be used only within a DTD, to store common pieces of declarations. This type of entity must be declared before being used in a DTD and is useful when the DTD is fairly big with repeating declarations.

Unparsed Entities (containing text or binary data)

These types of entities are not processed by XML applications, so their content cannot be embedded straight into the document like that of a parsed entity. As they cannot be parsed, it is essential to supply helper information that allows the entity to be processed. A helper can be a number of things, such as a stand-alone application or a plug-in. This helper information is declared in notations, so called because they direct the XML application to the helper that can process the unparsed entity.
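
The relationship between an unparsed entity and its notation can be sketched with Python's standard minidom module. The notation name gif, the entity name logo and the file name logo.gif are hypothetical; the snippet only shows that the parser records the declarations and leaves the actual processing to whatever helper the notation points to.

```python
from xml.dom import minidom

# Hypothetical document: the notation name, entity name and file name
# are illustrative assumptions.
DOCUMENT = """<!DOCTYPE page [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ATTLIST img src ENTITY #REQUIRED>
]>
<page><img src="logo"/></page>"""

doc = minidom.parseString(DOCUMENT)

# The parser records both declarations but never reads logo.gif itself.
entity = doc.doctype.entities.getNamedItem("logo")
notation = doc.doctype.notations.getNamedItem("gif")

print(entity.systemId, entity.notationName)
print(notation.systemId)
```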

Predefined Entities

Character                     Entity

Less than symbol (<)          &lt;
Greater than symbol (>)       &gt;
Quote symbol (")              &quot;
Apostrophe symbol (')         &apos;
Ampersand symbol (&)          &amp;

All entities have to be declared before being used in a document, apart from the five predefined entities listed above, which can be used without a declaration.
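
The snippet below is a small sketch, using Python's standard library, of what goes wrong when these rules are ignored. The sample text and the helper function parses are made up for illustration. It escapes text with the predefined entities, then shows that unescaped special characters and a reference to an undeclared entity are both rejected by the parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = 'Tom & Jerry say "1 < 2"'

# escape() always handles &, < and >; quotes are added via the extra mapping.
safe_text = escape(raw_text, {'"': "&quot;", "'": "&apos;"})

# The escaped text parses cleanly and round-trips to the original string.
parsed = ET.fromstring(f"<note>{safe_text}</note>").text


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


unescaped_ok = parses(f"<note>{raw_text}</note>")    # bare & and < in content
undeclared_ok = parses("<note>&copyright;</note>")   # entity never declared

print(safe_text, unescaped_ok, undeclared_ok)
```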

2. Review the correct use of character sets in XML, discussing the advantages and disadvantages of different sets, with examples. How would you select your character sets to present Chinese characters?

Character sets determine which characters are permitted in an XML document, and they can be encoded in various ways to suit different human languages. XML supports numerous character encoding schemes, all of which are based on the Unicode text standard. The UCS (Universal Character Set), which embraces most of the world's writing systems, is an ISO standard that uses multi-octet characters. This causes plenty of compatibility issues, often making it incompatible with many current protocols and applications. The UCS Transformation Formats (UTF) are the standards that resolve these compatibility issues.

ASCII

ASCII is a popular character set used worldwide, with each character represented by a character encoding value. This means that each character is given a unique value by which it can be identified. Pure ASCII uses a 7-bit encoding system, which allows 128 different values.

This has been extended by ANSI to an 8-bit encoding system, which allows the full range of 256 values in a byte.

Figure 26-2 - ASCII Character Set (character values 00-127)

http://docstore.mik.ua/orelly/xml/xmlnut/ch26_01.htm

Unicode

The Unicode text standard indicates which characters are permitted in a text document and includes characters from all over the globe. The character encoding scheme for an XML document is stated in the encoding declaration, which must appear in the XML declaration. It looks like an attribute, for example encoding="UTF-8".

UTF-8 is the default scheme for XML and, along with UTF-16, is widely supported by XML applications. UTF-8 represents text in 8-bit units, using one to four of them per character, while UTF-16 uses one or two 16-bit units. UTF-16 has the advantage of being more compact for scripts such as Chinese, Japanese and Korean, where most characters take two bytes instead of the three that UTF-8 needs; its disadvantage is that it uses more memory for text that is mostly ASCII. UTF-8 is compatible with 7-bit ASCII and is more universal, so it is often seen as the best character set for XML. The reason is its broad tool support compared with other character sets, in tools like Eclipse, Notepad, jEdit etc. The W3C has also recommended this character set as the ideal choice.
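
The size trade-off between the two encodings can be checked with a short Python sketch. The two sample characters are chosen only for illustration; the snippet counts how many bytes each encoding needs for an ASCII letter and for a Chinese character, and confirms that UTF-8 leaves ASCII text unchanged.

```python
# Sample characters chosen for illustration only.
samples = {"A": "Latin letter", "人": "Chinese character U+4EBA"}

sizes = {
    char: {
        "utf-8": len(char.encode("utf-8")),
        # "utf-16-be" omits the 2-byte byte order mark that plain "utf-16" adds
        "utf-16": len(char.encode("utf-16-be")),
    }
    for char in samples
}

for char, description in samples.items():
    print(description, sizes[char])

# UTF-8 leaves pure ASCII text byte-for-byte unchanged.
ascii_compatible = "XML".encode("utf-8") == "XML".encode("ascii")
```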

The main disadvantage of having many character sets is the compatibility issues that can arise; however, the advantages they bring outweigh this disadvantage. There are many character sets for different needs, but I have gathered that the default one, UTF-8, seems to be the most popular.

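To present Chinese characters, you would either declare an encoding that can represent them directly, such as UTF-8 or UTF-16, or write each character as a hexadecimal character reference, which works under any declared encoding. The example below declares UTF-8 at the beginning of the XML document and shows the character 人 both literally and as the reference &#x4EBA; (ISO-8859-1 could not hold the literal characters):

<?xml version="1.0" encoding="UTF-8"?>

<decl> Declaration: 人 &#x4EBA; 生 而 自 </decl>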

Unicode contains a substantial number of Chinese, Japanese and Korean characters, and UTF-16 can store most of them in two bytes each.
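
As a hedged illustration of the two options for Chinese text, the Python sketch below parses the character 人 once as a hexadecimal character reference in a document declared as ISO-8859-1 and once as a literal character in a UTF-8 document. The decl element name follows the example above; everything else is an assumption made for the demonstration.

```python
import xml.etree.ElementTree as ET

# The same element written two ways.
as_reference = b'<?xml version="1.0" encoding="ISO-8859-1"?><decl>&#x4EBA;</decl>'
as_literal = '<?xml version="1.0" encoding="UTF-8"?><decl>人</decl>'.encode("utf-8")

text_from_reference = ET.fromstring(as_reference).text
text_from_literal = ET.fromstring(as_literal).text

# Serialising to ISO-8859-1 forces the character back into a numeric reference,
# because that encoding has no byte for it.
latin1_bytes = ET.tostring(ET.fromstring(as_literal), encoding="ISO-8859-1")
print(latin1_bytes)
```

Both routes give the application the same character; the difference is only in how the bytes on disk look and how readable the source is to a human editor.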

3. Describe and discuss the logical modelling of data in the context of XML.

The Document Object Model (DOM) is a standard established by the World Wide Web Consortium (W3C). It is a standard method of exposing the elements in a document as a data structure in any suitable programming language, and it can also be viewed as a collection of programming interfaces. In XML, the DOM provides access to an entire document. This is done by a DOM parser, a program that reads the XML source and builds a data structure that is accessible from your application. The data structure can also be written back out as XML, allowing changes made to it to be saved. The functionality a DOM offers depends on the implementation and, as there are multiple levels of the DOM, it also varies by level.
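
The read, change and write-back cycle described above can be sketched with minidom, the DOM implementation in Python's standard library. The element names and values are borrowed from the CD collection example used elsewhere in this post; this is one possible illustration rather than the only way to use a DOM parser.

```python
from xml.dom import minidom

source = (
    "<cdcollection><title>Unthinkable</title>"
    "<artist>Alicia Keys</artist></cdcollection>"
)

# The DOM parser reads the whole document into an in-memory tree.
doc = minidom.parseString(source)

# Change the data structure: add an <album> element to the collection.
album = doc.createElement("album")
album.appendChild(doc.createTextNode("The Element of Freedom"))
doc.documentElement.appendChild(album)

# Write the modified structure back out as XML.
result = doc.documentElement.toxml()
print(result)
```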

DOM Level 0 : The original Netscape/IE functionalities
                           Not a W3C Recommendation, but was used as a basis

DOM Level 1 : (Latest edition September 2000) Fundamental DOM Objects (not fully
                           backward compatible with Level 0!)

DOM Level 2 : (November 2000) Access to stylesheets, handling namespaces; also
                           includes an event model

DOM Level 3 : (In preparation) Handling XML schemas (including on-the-fly validation),
                           XPath-based selections

http://www.w3.org/Consortium/Offices/Presentations/XMLFamilyOverview/13.html

As the DOM is a set of programming interfaces, looking more closely at it reveals the basic interface, called Node. Taking an XML document as an example, the DOM represents the document as a tree structure made up of nodes. Nodes are extremely important in XML documents, as they capture the relationship between parent and child elements. Whatever an item in a DOM tree represents, it is still a node, and the top node in a DOM tree is called the root.

DOM Interfaces (frequently used ones)

The Node Interface
The Element Interface
The Document Interface
The Attr Interface
The NodeList Interface
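
To make the interfaces listed above more concrete, here is a minimal sketch using Python's minidom. The genre attribute is a hypothetical addition, included only so that there is an Attr node to look at, and the outline helper is an illustrative function rather than part of the DOM.

```python
from xml.dom import minidom, Node

# 'genre' is a hypothetical attribute added only to show the Attr interface.
doc = minidom.parseString(
    '<cdcollection genre="soul"><title>Unthinkable</title>'
    "<artist>Alicia Keys</artist></cdcollection>"
)

root = doc.documentElement               # Element interface
genre = root.getAttributeNode("genre")   # Attr interface
children = root.childNodes               # NodeList interface


def outline(node, depth=0):
    """Return one indented line per node, using only Node interface members."""
    label = node.nodeValue if node.nodeType == Node.TEXT_NODE else node.nodeName
    lines = ["  " * depth + label]
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(doc)))
```

The Document node sits at the top of the tree, elements and text hang beneath it as child nodes, and attributes are reached through their owning element rather than through childNodes.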

4. What is the role of namespaces in XML? What benefits does the use of namespaces confer and what problems may be avoided?

Namespaces were introduced to avoid dispute and confusion between identically named attributes and elements in XML. They solve the issue of elements and attributes that have identical names but are not related, which cause confusion and problems for XML applications. XML applications are unable to judge the difference in context when elements share their names with elements from other markup languages.

Namespaces in XML introduce uniqueness to avoid confusion between clashing element and attribute names in XML documents; each namespace is a collection of element and attribute names that is used to get rid of conflicts. Namespaces are scoped: a namespace declared on a node affects only that element and the nodes inside it. If a namespace is declared on the root element, it becomes a global namespace and applies to the entire XML document.

Example....
An example is the CD collection I've referred to in various blog posts. If I decided to have a CD collection and a video collection, both would have shared elements and attributes, as shown below:

<cdcollection>
<title> Unthinkable </title>
<artist> Alicia Keys </artist>
<album>The Element of Freedom </album>
.......................................
.......................................
</cdcollection>

<videocollection>
<title>Speechless</title>
<artist>Ciara</artist>
<format>HD</format>
....................................
....................................
</videocollection>

An XML application will not understand the difference between the two, so I would require a namespace for each markup language. In the example above, two elements, title and artist, appear in both the CD and the video collection. When naming namespaces, it's important to remember that they are identifiers for elements and attributes, so they should have unique names. To avoid confusion with naming, Uniform Resource Identifiers (URIs) are used as the naming scheme; they tend to reference physical resources on the Internet, which makes uniqueness very likely. Besides URLs, a URI can also be a URN (Uniform Resource Name), a unique, location-independent name that can map to a single URL or multiple URLs.
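
The clash can be seen in a short Python sketch. The catalogue wrapper element is a hypothetical addition so that both collections fit in one well-formed document; the rest comes from the example above. An application that looks up elements by name alone gets CD and video titles mixed together.

```python
import xml.etree.ElementTree as ET

# <catalogue> is a hypothetical wrapper so both collections fit in one document.
DOCUMENT = """<catalogue>
  <cdcollection><title>Unthinkable</title><artist>Alicia Keys</artist></cdcollection>
  <videocollection><title>Speechless</title><artist>Ciara</artist></videocollection>
</catalogue>"""

root = ET.fromstring(DOCUMENT)

# Searching by name alone cannot tell a CD title from a video title.
all_titles = [title.text for title in root.iter("title")]
print(all_titles)
```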

The declaration of a namespace looks similar to an attribute of the element, and should be placed on the root element if it's to be applied to the entire XML document. I have incorporated namespaces into the above example:

<cd:cdcat xmlns:cd="urn:xmlns:musiccatologue.com:cdcollection">
<cd:cdcollection>
<cd:title> Unthinkable </cd:title>
<cd:artist> Alicia Keys </cd:artist>
<cd:album> The Element of Freedom </cd:album>
....................................
....................................
</cd:cdcollection>
</cd:cdcat>

<vd:vdcat xmlns:vd="urn:xmlns:hiphopvideos.com:videocollection">
<vd:videocollection>
<vd:title> Speechless </vd:title>
<vd:artist> Ciara </vd:artist>
<vd:format> HD </vd:format>
....................................
....................................
</vd:videocollection>
</vd:vdcat>

Namespaces are not compulsory, meaning they are not required for all XML documents.
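
The following Python sketch shows how a namespace-aware parser keeps the two vocabularies apart. It reuses the two namespace names from the example above, written without spaces, and wraps both collections in a hypothetical catalogue element so they can live in one document. The prefixes dictionary is simply how ElementTree is told which prefix stands for which namespace in a query.

```python
import xml.etree.ElementTree as ET

CD_NS = "urn:xmlns:musiccatologue.com:cdcollection"
VD_NS = "urn:xmlns:hiphopvideos.com:videocollection"

# <catalogue> is a hypothetical wrapper so both vocabularies share one document.
DOCUMENT = f"""<catalogue xmlns:cd="{CD_NS}" xmlns:vd="{VD_NS}">
  <cd:cdcollection><cd:title>Unthinkable</cd:title></cd:cdcollection>
  <vd:videocollection><vd:title>Speechless</vd:title></vd:videocollection>
</catalogue>"""

root = ET.fromstring(DOCUMENT)
prefixes = {"cd": CD_NS, "vd": VD_NS}

# The same local name "title" now resolves to two different qualified names.
cd_titles = [t.text for t in root.findall(".//cd:title", prefixes)]
video_titles = [t.text for t in root.findall(".//vd:title", prefixes)]

# Internally each name is stored with its namespace, not its prefix.
first_tag = root[0][0].tag
print(cd_titles, video_titles, first_tag)
```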

5. What is XPath and what does it do? Evaluate the strengths and weaknesses of this concept.

XPath became a W3C Recommendation on the 16th of November 1999. It is designed for use with XML documents and is an important element of document addressing in XSLT. Unlike its XSL counterparts, XPath is not implemented as an XML language; its main function is to act as a syntax for defining parts of an XML document, and it also hosts a library of standard functions. The word path in the name reflects the concept: XPath uses path expressions to navigate around XML documents.
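
Path expressions can be tried out with Python's ElementTree, which supports only a limited subset of XPath, so this is a sketch of the idea rather than a full XPath engine. The catalogue wrapper element is a hypothetical addition; the collections reuse the CD and video data from earlier in this post.

```python
import xml.etree.ElementTree as ET

# <catalogue> is a hypothetical wrapper element added for illustration.
DOCUMENT = """<catalogue>
  <cdcollection><title>Unthinkable</title><artist>Alicia Keys</artist></cdcollection>
  <videocollection><title>Speechless</title><artist>Ciara</artist><format>HD</format></videocollection>
</catalogue>"""

root = ET.fromstring(DOCUMENT)

# Child steps: titles that sit directly inside a cdcollection.
cd_titles = [e.text for e in root.findall("./cdcollection/title")]

# Descendant step: every artist, at any depth.
artists = [e.text for e in root.findall(".//artist")]

# Predicate: collections that contain a <format> child whose text is HD.
hd_collections = [e.tag for e in root.findall("./*[format='HD']")]

print(cd_titles, artists, hd_collections)
```

Each query is a single compact string, which is what makes path expressions easy to embed in programs and stylesheets.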

XPath Nodes

Element
Attribute
Text
Namespace
Processing-Instruction
Comment
Document nodes

Strengths

Simple Expressions and compact queries
Query strings take little effort to embed into programs, scripts, XML etc.
Nodes can be identified uniquely in an XML document
Powerful and quick parsing of queries
Simple to use and simple syntax
Recommended by the W3C

Weaknesses

The main weakness is that it shares the disadvantages of the widely criticised Document Object Model (DOM)
Processing and resource time
Compatibility issues
Issues may arise with complex expressions or queries


Friday, 4 February 2011
