Articles

Aug 15: Decentralised XML storage

Over the past couple of years I have been experimenting with an XML storage system for displaying articles. I first wrote about it during my 2008 homepage refresh. The last couple of days have seen a major overhaul, converting from a centralised XML database to dispersed XML files that accompany each entry. The article directory is parsed and these XML files are collated and serialised by PHP.

Having a decentralised storage mechanism allows greater integration with the filesystem. It removes the need to update and overwrite the XML database after every change. Now a new article can be uploaded to a directory using FTP and automatically deployed on a cache refresh.

Example XML info file

The XML format allows content to be mapped to different output types. This webpage is delivered as XHTML, so the converter selects PHP Markdown as the class library that performs the mapping.

<?xml version="1.0" encoding="UTF-8"?>
<article>
    <title>Decentralised XML storage</title>
    <created>Sun, 15 Aug 2010 00:07:05 +0100</created>
    <modified>Sun, 15 Aug 2010 00:07:05 +0100</modified>
    <content format="markdown" src="decentralisedxmlstorage.mdml"/>
    <category>Writing-Development-Flux, Flux</category>
</article>
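The collation step can be sketched roughly as follows. The site itself uses PHP, so this Python version is only an illustration. The `info.xml` filename, the one-directory-per-article layout and the `collate_articles` helper are all assumptions, not the actual implementation.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from pathlib import Path


def collate_articles(root_dir, info_name="info.xml"):
    """Collect every article info file under root_dir, newest first.

    info_name and the directory layout are hypothetical.
    """
    articles = []
    for info_path in Path(root_dir).rglob(info_name):
        article = ET.parse(info_path).getroot()
        content = article.find("content")
        articles.append({
            "title": article.findtext("title"),
            # The dates in the info file use the RFC 2822 format.
            "created": parsedate_to_datetime(article.findtext("created")),
            "format": content.get("format"),
            # The content file is assumed to sit next to the info file.
            "src": info_path.parent / content.get("src"),
        })
    return sorted(articles, key=lambda a: a["created"], reverse=True)
```

The sorted list is what would then be serialised into the cache.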

How to detect changes

In many computer systems a dirty bit is set to signify that modifications have taken place and the cache line should be refreshed. Deleting the cache is a simple solution, forcing the directories to be re-indexed. This is an appropriate solution for up to 1000 articles. Beyond that scale a more efficient solution would detect filesystem modification times. Is it sensible to trust system time? I would rather not go there.
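For comparison, the modification-time approach might look something like the sketch below. It is written in Python rather than the site's PHP, and the `cache_is_stale` helper and `info.xml` filename are assumptions made for illustration.

```python
from pathlib import Path


def cache_is_stale(article_dir, cache_file, info_name="info.xml"):
    """Return True when any info file is newer than the cache file."""
    cache_path = Path(cache_file)
    if not cache_path.exists():
        # A deleted cache always forces a rebuild.
        return True
    cache_mtime = cache_path.stat().st_mtime
    return any(
        info.stat().st_mtime > cache_mtime
        for info in Path(article_dir).rglob(info_name)
    )
```

This check relies on the system clock and does not notice deleted articles, which is part of why deleting the cache remains the simpler option.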

Major modifications

The decentralisation is a major step towards a class library that can be used by other developers to deliver high quality content in less time. A single file management interface is the final stage before release.

Posted August 15, 2010 in Writing-Development-Flux, Feed

Jun 24: Moving towards custom XML note formats

Document formats are evolving. New formats are continuously being created and destroyed, leaving behind an incredible amount of stagnant data. Why should I have to learn how to deal with every popular trend when I just want to store my thoughts?

XML is extremely powerful as it can express semantic ideas in a form that can be easily transformed into common formats. This enables people to write data for specific applications and use it later in a variety of circumstances. By combining transformations, data can be converted to any desired representation.

Last term I designed a custom format for exam revision notes, converting the erratic slides into a uniform format beautifully typeset in HTML and CSS. These were converted from XML using a short XSL transformation supported by most modern browsers. This meant that each set of notes was visible on my Eee PC (Ubuntu), desktop (Windows 7) and at the Cambridge Computer Lab (Windows XP/Linux). This is an efficient setup that anyone can emulate.

An example section

This is the general format of the document:

<section>
	<title>Natural language interfaces and dialogue systems</title>
	<p><defn><for>Natural Language User Interfaces</for> 
	(LUI) are <is>a type of computer human interface where linguistic 
	phenomena such as verbs, phrases and clauses act as UI controls for creating, 
	selecting and modifying data in software applications</is>.</defn>
		<resource author="Wikipedia" 
			url="http://en.wikipedia.org/wiki/Natural_language_user_interface">
			Natural language user interface
		</resource>
	</p>
</section>

My aim is to refine this portable document format to write lecture notes from the very beginning of third year Computer Science. Using XML allows an XSLT transformation between the old and new versions.

Required features

  • Hierarchical textual content
  • ASCIIMathML to convert inline LaTeX mathematics to MathML that can be rendered in documents with MIME type application/xhtml+xml
  • Automatic hyphenation to increase readability using Hyphenator.js
  • References to external documents using URLs
  • Tables, lists (using HTML syntax)
  • Syntax highlighting for common languages, provided by Gorbatchev's SyntaxHighlighter

Additional extras include

  • Rich graphics from external files (unfortunately inline graphics add too much complexity to the document)
  • Internal references like see also to link related topics
  • A section for the syllabus and links to where each topic is explained (contents)

Changes from the last version

The first version was based on the hierarchy of topic, section, subsection with titles as attributes. A little research into existing XML formats like DocBook can settle many of the design decisions. Why reinvent when teams of intelligent people have already come up with a better solution? Using different element names for levels in the hierarchy was a mistake. It hampered refactoring the XML document.
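One way to see the benefit of a single element name: the heading level can be derived from nesting depth, so a section can be moved up or down the tree without renaming anything. The sketch below is in Python and assumes a nested `section` element with a child `title`, as in the example above. The `render_sections` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_sections(section, depth=1):
    """Render nested <section> elements as HTML headings by depth."""
    level = min(depth, 6)  # HTML only defines h1 to h6
    title = section.findtext("title", default="")
    lines = [f"<h{level}>{escape(title)}</h{level}>"]
    for child in section.findall("section"):
        lines.extend(render_sections(child, depth + 1))
    return lines
```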

Using elements like seealso, result and source signified external references in the first version. These will be unified into a resource element with multiple uses. In other respects there will be an emphasis on using character data (element content) over attribute values to represent information (title will no longer be an attribute).

Colloquialisms should be used to repeat key sections explaining underlying concepts. These can be represented using a handwriting font and rendered in collapsible regions in the transformed XHTML.

Definitions must fit into document prose while being structured semantically to summarise into a definition section. A niche feature this would support is exporting question/answer pairs to SuperMemo using a supermemo.xsl transformation that only applies to definitions.
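As a rough illustration of the definition export, the sketch below pulls question/answer pairs out of the `defn`, `for` and `is` elements used in the example section. It is written in Python as a stand-in for the planned XSL transformation. The `extract_definitions` helper is an assumption, and the output is a plain list of pairs, not SuperMemo's import format.

```python
import xml.etree.ElementTree as ET


def _text(element):
    """Flatten an element's text and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def extract_definitions(xml_source):
    """Return (term, definition) pairs for every <defn> in the notes."""
    root = ET.fromstring(xml_source)
    pairs = []
    for defn in root.iter("defn"):
        term, meaning = defn.find("for"), defn.find("is")
        if term is not None and meaning is not None:
            pairs.append((_text(term), _text(meaning)))
    return pairs
```

Because the definitions stay inline in the prose, the same markup serves both the readable notes and the extracted summary.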

Posted June 24, 2010 in Writing-Development; Feed

May 20: PT OnDemand (fitness)

Screenshot of the PT OnDemand homepage

PT OnDemand offers bespoke personal training videos to a global audience. Users can pay for a series of workout videos using PayPal and track their progress over time using a workout tracker.

I was brought in after the previous developer failed to deliver the requirements. The project required a complete rewrite. I created unique features from scratch while using open source libraries to handle common elements like email validation with the EmailAddressValidator class and controller/view separation using Savant3 native-PHP templates.

PayPal integration was important to enable videos to be purchased. My system is a succinct object-orientated version of their name-value pair examples. Unfortunately PayPal provided verbose functions that were intrinsically linked with their examples. I halved their example code using inbuilt functions like http_build_query.
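The idea of building the name-value pair request with a built-in encoder can be sketched as follows. This is Python, where `urllib.parse.urlencode` plays the role that `http_build_query` plays in PHP. The `build_nvp_request` helper is hypothetical, and any field names passed to it should be checked against PayPal's own documentation.

```python
from urllib.parse import urlencode


def build_nvp_request(method, credentials, **fields):
    """Encode a name-value pair request body as a URL-encoded string."""
    # Merge the method name, the credentials and any call-specific fields.
    params = {"METHOD": method, **credentials, **fields}
    return urlencode(params)
```

Leaving the escaping to the standard library is what removes most of the hand-written string concatenation.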

The videos had to be available and fast to download from all target countries. Amazon CloudFront provides world-wide performance with reduced start-up costs compared to excellent but expensive CDNs like Edgecast.

Posted May 20, 2010 in Portfolio

Mar 19: Solar Empire (game)

Solar Empire took up a large part of my childhood. It was brilliant, if inadvertent, training in software development and team projects. While I was an active developer we had over 200 active players producing over 50,000 hits per day; cumulatively, our distributions have been downloaded over 40,000 times. At the time it was an active, rewarding community, but the central developers eventually left for bigger things. The project is on hold until I find a team of competent developers who want to continue development in their spare time.

System Wars game map
Star system map
System Wars overview screen
Ship/planet listing

Continue reading the full article

Posted March 19, 2010 in Portfolio

Mar 19: GoodScripts (script archive)

Screenshot of GoodScripts script search

GoodScripts was a major project that began as my GCSE ICT coursework. It developed into an excuse to spend countless hours learning PHP and web layout skills, creating a script storage system like HotScripts and Scripts.com. The project was successful — it worked — but I stopped developing it after I realised that it was not a viable competitor to these existing resources.

Continue reading the full article

Posted March 19, 2010 in Portfolio
