Writing

Jun 24: Moving towards custom XML note formats

Document formats are evolving. New formats are continuously being created and abandoned, leaving behind an incredible amount of stagnant data. Why should I have to learn how to deal with every popular trend when I just want to store my thoughts?

XML is extremely powerful as it can express semantic ideas in a form that can be easily transformed into common formats. This enables people to write data for specific applications and use it later in a variety of circumstances. By combining transformations, data formats can be converted to any desired representation.

Last term I designed a custom format for exam revision notes, converting the erratic slides into a uniform format beautifully typeset in HTML and CSS. These were converted from XML using a short XSL transformation supported by most modern browsers. This meant that each set of notes was visible on my Eee PC (Ubuntu), desktop (Windows 7) and at the Cambridge Computer Lab (Windows XP/Linux). This is an efficient setup that anyone can emulate.

An example section

This is the general format of the document:

<section>
	<title>Natural language interfaces and dialogue systems</title>
	<p><defn><for>Natural Language User Interfaces</for> (LUI) are <is>a type of computer human interface where linguistic phenomena such as verbs, phrases and clauses act as UI controls for creating, selecting and modifying data in software applications</is>.</defn>
		<resource author="Wikipedia" url="http://en.wikipedia.org/wiki/Natural_language_user_interface">Natural language user interface</resource>
	</p>
</section>

My aim is to refine this portable document format to write lecture notes from the very beginning of third year Computer Science. Using XML allows an XSLT transformation between the old and new versions.

Required features

  • Hierarchical textual content
  • ASCIIMathML to convert inline LaTeX mathematics to MathML that can be rendered in documents with MIME type application/xhtml+xml
  • Automatic hyphenation to increase readability using Hyphenator.js
  • References to external documents using URLs
  • Tables, lists (using HTML syntax)
  • Syntax highlighting for common languages, provided by Gorbatchev's SyntaxHighlighter

Additional extras include

  • Rich graphics from external files (unfortunately inline graphics add too much complexity to the document)
  • Internal references, like "see also", to link related topics
  • A section for the syllabus and links to where each topic is explained (contents)

Changes from the last version

The first version was based on a hierarchy of topic, section and subsection, with titles as attributes. A little research into existing XML formats like DocBook would have settled many of the design decisions. Why reinvent when teams of intelligent people have already come up with a better solution? Using different element names for each level in the hierarchy was a mistake: it hampered refactoring the XML document.

The first version used elements like seealso, result and source to signify external references. These will be unified into a single resource element with multiple uses. Elsewhere there will be an emphasis on using element content (character data) over attribute values to represent information, so title will no longer be an attribute.
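The move from the old hierarchy to the new one can be sketched in a few lines. This is a Python stand-in for the XSLT transformation rather than the stylesheet itself, and the sample document is hypothetical: only the element names topic, section and subsection and the title attribute come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical first-version notes. The nesting and the text are
# assumptions made for illustration; only the element and attribute
# names are taken from the post.
OLD_NOTES = """
<topic title="Natural language processing">
  <section title="Dialogue systems">
    <subsection title="Interfaces">
      <p>Notes go here.</p>
    </subsection>
  </section>
</topic>
"""

# Element names that marked a level of the hierarchy in the old format.
LEVELS = {"topic", "section", "subsection"}


def migrate(old_xml: str) -> ET.Element:
    """Rename every hierarchy level to <section> and turn each title
    attribute into a <title> child element."""
    root = ET.fromstring(old_xml)
    # Copy the element list first so the tree can be edited safely.
    for elem in list(root.iter()):
        if elem.tag not in LEVELS:
            continue
        elem.tag = "section"
        title_text = elem.attrib.pop("title", None)
        if title_text is not None:
            title = ET.Element("title")
            title.text = title_text
            elem.insert(0, title)
    return root


if __name__ == "__main__":
    print(ET.tostring(migrate(OLD_NOTES), encoding="unicode"))
```

Because every level ends up as a plain section, moving a block of notes up or down the hierarchy no longer means renaming its elements.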

Colloquialisms should be used to repeat key sections explaining underlying concepts. These can be represented using a handwriting font and rendered in collapsible regions in the transformed XHTML.

Definitions must fit into document prose while being structured semantically to summarise into a definition section. A niche feature this would support is exporting question/answer pairs to SuperMemo using a supermemo.xsl transformation that only applies to definitions.
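As a rough illustration of why the semantic markup helps, the sketch below pulls question/answer pairs out of the defn, for and is elements shown in the example section. It is written in Python rather than XSLT, and the function name definition_cards is made up for this example. It does not attempt SuperMemo's import format, which is not covered here.

```python
import xml.etree.ElementTree as ET

# The example section from earlier in the post, with the resource
# element left out for brevity.
NOTES = """<section>
	<title>Natural language interfaces and dialogue systems</title>
	<p><defn><for>Natural Language User Interfaces</for> (LUI) are <is>a type of computer human interface where linguistic phenomena such as verbs, phrases and clauses act as UI controls for creating, selecting and modifying data in software applications</is>.</defn></p>
</section>"""


def definition_cards(xml_text: str) -> list[tuple[str, str]]:
    """Return (term, meaning) pairs for every complete <defn> element."""
    root = ET.fromstring(xml_text)
    cards = []
    for defn in root.iter("defn"):
        term = defn.find("for")
        meaning = defn.find("is")
        # Skip definitions that are missing either half.
        if term is None or meaning is None:
            continue
        cards.append((
            "".join(term.itertext()).strip(),
            "".join(meaning.itertext()).strip(),
        ))
    return cards


if __name__ == "__main__":
    for question, answer in definition_cards(NOTES):
        print("Q:", question)
        print("A:", answer)
```

The surrounding prose ("(LUI) are", the full stop) stays in the document for reading, while the extraction only sees the two marked-up halves.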

Posted June 24, 2010 in Writing-Development; Feed

Aug 17: How to stop XSLT condensing explicit end tags

The .NET Framework can be unintuitive, but when tethered to Sitecore it becomes a strange and unpredictable beast (as I am sure you know if you work with it). One issue I have had is stopping the XSL renderings from creating invalid short tags when the output method is set to XML.

My previous solution to this was to use <![CDATA[]]> at these points to imply that the field has a value, even though it is an empty value. However, after upgrading the .NET Framework this no longer works. Instead we have to insert content to ensure the XML Writer does not use short tags.

The snippet <xsl:comment>*</xsl:comment> works reasonably well and can be used to insert <!--*--> at these sections. Although it is not ideal, it is very useful for clearing div tags that should not have any renderable content.
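The effect is not specific to .NET. As a loose analogy only (this is Python's standard library serializer, not the Sitecore XSL pipeline, and the class name clear is invented), an empty element is written as a short tag, while a comment child forces an explicit end tag:

```python
import xml.etree.ElementTree as ET


def serialize_div(with_comment: bool) -> str:
    """Serialize an otherwise empty div, optionally with a comment child."""
    div = ET.Element("div", {"class": "clear"})
    if with_comment:
        # Plays the same role as <xsl:comment>*</xsl:comment>.
        div.append(ET.Comment("*"))
    return ET.tostring(div, encoding="unicode")


if __name__ == "__main__":
    print(serialize_div(False))  # short tag
    print(serialize_div(True))   # explicit end tag around <!--*-->
```

The comment counts as content for the serializer but renders as nothing in the browser, which is why it suits clearing divs.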

Posted August 17, 2009 in Writing-Development; Feed

Mar 25: ImgBurn saves the day!

Just found an application called ImgBurn after trying for literally an hour to find an Open Source or Freeware application to burn Cue/Bin images. It is a saving grace, and the slightly dilapidated website put a huge smile on my face. What a kind guy for doing all this development.

It is a brilliant application:

ImgBurn supports a wide range of image file formats - including BIN, CUE, DI, DVD, GI, IMG, ISO, MDS, NRG and PDI.

It can burn Audio CD’s from any file type supported via DirectShow / ACM - including AAC, APE, FLAC, M4A, MP3, MP4, MPC, OGG, PCM, WAV, WMA and WV.

The burn just finished successfully; however, the application then played some kind of ridiculous Glockenspiel tune, a characteristic requirement of freeware. It has performed so far above expectations that I will ignore this minor grievance and rate it triple-A.

When I finally earn back my overdraft this summer I will donate to keep this project going; he only asks for $2.00 after all! I can now finally get on with the finer points of the Computational Mathematics coursework.

Posted March 25, 2009 in Writing-Software

Mar 19: Journal feed generation, RSS and Atom

For the past few hours I have been trying to integrate a feed generation library into my new blog system. After a couple of minutes searching on Google I came across Anis uddin Ahmad’s PHP Universal Feed Generator. This feed generator can create Atom, RSS 1.0 and RSS 2.0 feeds.

There were a few mistakes in his program. To begin with, escaping data in the XML output using the PHP function htmlentities is incorrect because it generates named HTML entities that XML does not define. Instead you should allow native UTF-8 characters to remain in the XML document and only escape the small subset of characters that XML reserves. I fixed other niggles, including making a method of FeedWriter public static.
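To show the escaping rule concretely, here is a minimal sketch in Python rather than PHP (the helper names xml_text and xml_attr are invented for this example, and this is not the patch applied to the feed generator). Only the characters XML reserves are escaped, and accented characters pass through as UTF-8:

```python
from xml.sax.saxutils import escape


def xml_text(value: str) -> str:
    """Escape text for element content: only &, < and > are replaced."""
    return escape(value)


def xml_attr(value: str) -> str:
    """Escape text for an attribute value: quotes are replaced as well."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})


if __name__ == "__main__":
    # The accented character is left alone; an HTML-style escaper would
    # emit &eacute;, which an XML parser rejects as an undefined entity.
    print(xml_text("Café & <b>Bar</b>"))
    print(xml_attr('say "hi"'))
```

The feed itself then needs to declare UTF-8 as its encoding so that readers decode the untouched characters correctly.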

I want to write my own feed generator and release it to the world. I wrote the feed generator for Mind & Soul when I was working for Premier Media Group. With some of the knowledge gained from my first year computer science course I could do a far better job. I will now take any excuse to try out my new skills!

Continue reading the full article

Posted March 19, 2009 in Writing-Development

Dec 15: Homepage refresh for Christmas 08

A total homepage refresh has taken place. Instead of using third-party software I have written an article system from scratch. It is designed to take plain HTML files and deliver them through an interactive categorised journal.

Most striking is the vivid new design, a massive contrast to previous efforts. Instead of a combination of white and one other colour, this design is based around a maroon shade. It was tough getting this to look good but I think it is an improvement over the old design.

Another departure is removing a lot of the user-generated content. Instead of allowing people to comment directly and leave guestbook entries, they can simply e-mail me. It is my site after all.

Quick warning before you read on: this article becomes CompSci!

Continue reading the full article

Posted December 15, 2008 in Writing-Development
