Michael J.A. Clark
Michael Clark is a Computer Science student from England providing freelance programming and design when not studying at Cambridge. Skills: C#, Sitecore, PHP, XHTML, CSS, AS3, Java, ML, F#.

Contact details

Email
mjac@mjac.co.uk
Skype
mjacdotuk
Twitter
mjacuk

Articles tagged features

Using Chrome’s Experimental Speech API to create Tuenti Voice Control

Tuenti Voice Control is a proof-of-concept that allows users to browse Tuenti with their voice instead of using a mouse or keyboard. It was created for HackMeUp 15, a 24-hour code competition held between Tuenti engineers every quarter, and uses the Experimental Speech API available in Google Chrome since 2011. Tuenti featured this article on their developer blog.

Ismael Gonzalez and I recorded a video that demonstrates browsing Tuenti tabs, going to specific profiles and starting chats:

(see it here, embed coming)

After creating a Chrome plugin that communicates speech-to-text data to the website, we spent the remaining three hours adding commands related to Tuenti. By the deadline we could:

  • Access top-level pages on the site like “mensajes” and “salir”
  • Target specific friends with “chat” or “perfil” followed by their name -- users can also go directly to Jose’s profile by speaking “perfil jose” or be more specific with “perfil jose manuel”
  • Write speech directly to the chat conversation and send the message -- it is possible to begin chatting with Natalia with “chat natalia” and output any following text to the screen “hola natalia como estas”

Plugin architecture

Chrome’s Experimental Speech API implements a subset of the features detailed in the W3C Recommendation for Speech Grammar (March 2004) and allows extensions to start speech recognition and retrieve the captured text. To use experimental extension APIs, you must start Chrome with the command line option --enable-experimental-extension-apis.

Google Chrome extensions are composed of HTML pages with specific functions. We use a single content script to capture events from the browser and send requests to a background page:

window.addEventListener("speechstart", function(e) {
    chrome.extension.sendRequest('speechstart', function(response) {
        triggerSimpleEvent('speechstarted');
    });
});

This background page is able to access the experimental API and start speech recognition:

chrome.experimental.speechInput.start({
    language: 'ES_es'
}, function () {
    if (chrome.extension.lastError) {
        console.debug("Couldn't start speech input: "
            + chrome.extension.lastError.message);
    }
});

The background page then communicates the result to the content script via an asynchronous request.

// Target active tab
chrome.tabs.getSelected(null, function (tab) {
    chrome.tabs.sendRequest(tab.id, {
        success: true,
        result: result
    }, function (response) {
        // Handle request callback
    });
});

If recognition has been successful, the content script appends a JSON-serialized version of the speech data array to the DOM and fires a ‘speechresult’ event.

chrome.extension.onRequest.addListener(
    function(request, sender, sendResponse) {
        var voice = document.getElementById('voice');
        voice.setAttribute('success', request.success ? 'true' : '');
        voice.setAttribute('data', JSON.stringify(request.success 
            ? request.result.hypotheses : []));
        triggerSimpleEvent('speechresult');
    }
);

Serialization is required because the content script and the underlying website run in different JavaScript contexts, so objects cannot be shared between them.
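The page-side half of this hand-off is not shown above. A minimal sketch of it, assuming each hypothesis is an object with utterance and confidence fields and that the page reads the same 'voice' element (the function name readSpeechResult is hypothetical), might look like this:

```javascript
// Page-side sketch: read the hypotheses that the content script serialized
// onto the shared DOM element. The { utterance, confidence } shape of each
// hypothesis is an assumption made for illustration.
function readSpeechResult(voiceElement) {
    // Attributes are always strings, so success is either '' or 'true'
    if (voiceElement.getAttribute('success') !== 'true') {
        return [];
    }
    try {
        var parsed = JSON.parse(voiceElement.getAttribute('data'));
        // A missing attribute parses to null, so only accept real arrays
        return Array.isArray(parsed) ? parsed : [];
    } catch (e) {
        // Malformed data is treated as "nothing recognized"
        return [];
    }
}

// In the page this would be wired to the custom event, for example:
// window.addEventListener('speechresult', function () {
//     var hypotheses = readSpeechResult(document.getElementById('voice'));
// });
```

Passing the element in, rather than looking it up inside the function, keeps the parsing logic testable outside the browser.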

Processing the speech result

The W3C recommendation includes a method for specifying a grammar. This is crucial for achieving high accuracy and precision in a speech recognition system, because error rates decrease as the vocabulary size shrinks: the digits 0-9 can be recognized without error, but vocabulary sizes of 200, 5000 or 100000 can have error rates of 3%, 7% or 45% respectively. After experimentation we found that custom grammars were not implemented in Chrome as of December 2011, and that Google returns any set of words from its dictionary.

  1. We solved this issue by converting recognized text to a bag-of-words and calculating the probability of a user wanting to perform an action on a friend based on the number of occurrences of words related to that action/user pair.
  2. Words were normalized and used to access a hash map that maps words to friends who have that name or commands that are referenced by that word.
  3. The score of each matching friend or command was then increased by a value weighted by the confidence factor returned by Google and the index of the result.
  4. The action/friend pair with the highest cumulative weight was performed.
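The steps above can be sketched roughly as follows. This is an illustration rather than the code written at the hackathon: the index layout, the 'command:' and 'friend:' key prefixes, and the confidence / (rank + 1) weighting are all assumptions.

```javascript
// Strip case and accents so that "José" and "jose" map to the same key
function normalize(word) {
    return word.toLowerCase().normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// index maps a normalized word to the commands or friends it refers to,
// e.g. { perfil: ['command:perfil'], jose: ['friend:jose-manuel'] }
// (a hypothetical layout).
function scoreHypotheses(hypotheses, index) {
    var scores = {};
    hypotheses.forEach(function (hypothesis, rank) {
        // Earlier results count for more; this exact weighting is assumed
        var weight = hypothesis.confidence / (rank + 1);
        hypothesis.utterance.split(/\s+/).forEach(function (word) {
            var key = normalize(word);
            if (!Object.prototype.hasOwnProperty.call(index, key)) {
                return; // word is not related to any command or friend
            }
            index[key].forEach(function (target) {
                scores[target] = (scores[target] || 0) + weight;
            });
        });
    });
    return scores;
}

// Pick the highest-scoring target whose key starts with the given prefix
function bestTarget(scores, prefix) {
    var best = null;
    Object.keys(scores).forEach(function (target) {
        if (target.indexOf(prefix) === 0 &&
                (best === null || scores[target] > scores[best])) {
            best = target;
        }
    });
    return best;
}
```

Calling bestTarget(scores, 'command:') and bestTarget(scores, 'friend:') on the same scores object gives the action and the friend to apply it to. Because the words are treated as a bag, word order within the utterance is ignored.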

This approach worked flawlessly when the words present in the text returned by Google corresponded to a valid action/friend pair. This is helped by speaking clearly and using a high-quality noise-cancelling microphone (Apple MacBook Pro) to ensure that the speech recognizer can detect the beginning and end of the command.

Conclusions

Before starting the project we did not know whether it would be possible to start speech recognition, especially with a single key, let alone recognize commands. It was, and we think that such techniques can provide a better web experience. For this to happen, Google, other browser makers and the W3C must work together to provide a stable API that can be used by all websites without extensions.

Thanks for reading, please add your comments.

The Dial — digitising a classic Cambridge creative writing magazine

The Dial is a creative writing magazine for students at Cambridge University. It initially ran from 1906-1953 and was reborn in 2008 when students at Queens' College gained funding to bring it back. It aims to "give space to the new, original and tough work which is the essence of student writing at Cambridge".

Back in early 2010 I contacted the editor Florence Privett with the promise of free technical labour (PHP, XHTML, the standard) and we had some great chats about design strategies. A poetry magazine has to be content-centric. The crux is providing enough highlight to individual poems, while retaining some form of navigation system. I contacted Robert Leadbetter, a contemporary of mine, to finish the concept. The final result can be seen at thedial.org.uk and the resource will improve over time as past issues are added.

The interface

My aim was to start with a typographic layout and then add flourishes of emphasis afterwards. This initial design was based on the styling provided in the Michaelmas 2008 edition of The Dial: Lizzie Robinson elegantly separates the different content types with unobtrusive line marks.

Initial concepts sent to Florence

One criticism of my first approach was the amount of space given to navigation. With the poem Jessica, above, the navigation and poetry have the same total area. It is preferable that the text is the focal point. In addition, italics and size changes could be used to distinguish between body and header content, providing a subtle but clear separation.

Final design of The Dial magazine

This final design was cut up and placed online. Personally, I like the concise multiline navigation menu at the top.

Behind the scenes

Content is published by uploading text files using FTP and full metadata is provided with a fixed filename/directory format that contains IDs and indexes. A small page is then provided containing controls to regenerate the site with the latest content. The suffix "-draft" can be used to hide years, issues or individual articles from the navigation interface.
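The "-draft" rule can be illustrated with a small filter. This is only a sketch: the year/issue/article.txt path layout and both function names are hypothetical, since the exact filename format is not given here.

```javascript
// Returns true if any path segment (year, issue or article) ends in "-draft".
// The "year/issue/article.txt" layout used in the comments is hypothetical.
function isHidden(path) {
    return path.split('/').some(function (segment) {
        // Drop a file extension first so "04-other-draft.txt" still matches
        var name = segment.replace(/\.[^.]+$/, '');
        return /-draft$/.test(name);
    });
}

// Keep only the content that should appear in the navigation interface
function visibleArticles(paths) {
    return paths.filter(function (path) {
        return !isHidden(path);
    });
}
```

Checking every segment means that marking a year or an issue as a draft hides everything beneath it, without renaming the individual articles.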

Many poems have precise formatting requirements, where the spacing between lines and characters is important. This issue is compounded as character widths vary between most fonts. The solution is to set the font for the body text in concrete (we went for Georgia) and transcribe poems from source PDFs into a text editor using that fixed font; the online content will then match the text editor of the less-technical transcriber.

And it would not be possible without…

The Student Run Computing Facility (SRCF) at Cambridge graciously provide hosting and administration for student societies. They also have the best customer service out of any organisation I have ever dealt with, providing incomprehensibly fast 2 minute email responses. A toast to them.

Google APM Workshop at the Cambridge Computer Lab

Today I attended a Google Product Workshop, with Associate Product Managers Kenny and Emma, at Cambridge University. It lasted three hours and was composed of an introduction to Google and the APM position, followed by an interactive group-based workshop for the fifteen attendees.

Google are hiring at the moment and now is the time to apply. Unfortunately since this news arose they have received around 75000 CVs per week. Find a backdoor! Most of the APM positions are in Zurich, a beautiful but expensive city (adequate compensation provided).

They introduced Google and the position

The Associate Product Manager Program is an elite two-year rotational program, consisting of two one-year rotations, designed for top recent computer science graduates who are interested in exploring product development and leadership opportunities. This select group is given broad responsibilities, generous access to resources, visibility into Google's executive team and many opportunities to grow within the organization.

They take around 6 new associate product managers in EMEA (Europe, the Middle East and Africa) every year. This is not a fixed limit, but completely down to the quality of applicants. It is Marissa Mayer's baby, fixing problems with old product managers just hiring people like themselves, while spotting malleable young talent.

Aim of the interaction session

The Googlers then presented an interactive task. We were split into three groups, with the aim being to create a Shopping List product for a specific market segment. These were:

  1. Wedding Planners
  2. Fashionistas
  3. Kids

Shopping for kids

My group targeted kids and our final idea was to create a process based on interactive physical devices like tablets (iPad):

  1. Suppliers are aggregated, with products sorted into child categories (Books, School, Sport), providing the possibility of charging a small percentage for the silent referral
  2. Parents control category, time and budget constraints (you have £20 to spend on books or tennis in the next 2 hours)
  3. The child is provided with the device and they drill-down, selecting products that are within the budget constraints
  4. Each product has educational annotations sourced directly from Wikipedia; prices are not shown, and the interface instead indicates whether the product would be an acceptable combination with the currently selected entries
  5. If the parent selects that the choices require review, the child then passes the choices back to be checked before payment/shipping; this is optional and provides the child an opportunity to learn about online shopping before they have to consider card payments and exact values

I was impressed by the ability of our group to collate differing viewpoints onto a whiteboard before concentrating on a clear aim to develop. Matej usefully organised use-cases for parents and children onto the board and these diagrams guided the user interface drawings. Anyone know a tried and tested method for these situations (there must be a book)?

The presentations

From our team of five, two left before the presentation, and accordingly it became an opportunity to refine my presentation skills. Jen and Stojan demanded that I had to take the lead as the native English speaker but fortunately we all chipped in together when the time came. There was a great team spirit and we shared the more inventive ideas unique to this product, instead of discussing universal technical considerations.

The other two teams had interesting concepts:

  1. Team Fashionista created an annotated-map interface where boutiques are displayed around your current location (iPhone)
  2. Team Wedding List used a text-search interface organised into wedding-related categories with budget/priority annotations (web)

My questions to Google

What happens if deadlines are missed? Understand why the project overran and form a new strategy for completing the product. Remove features or extend the deadline, depending on how time-critical the launch is: shopping products depend on the holiday season and may need heavy feature cuts to meet tough deadlines.

Do you use academic research to guide product development (HCI)? Google have breathtaking amounts of data (BigTable) and it is all available for use by internal projects. They use this data in conjunction with user statistics to guide product development instead of relying on research papers (academia progresses slowly in comparison). Products are developed iteratively with refinements based on user interaction testing.

The experience was valuable and enjoyable.

Social Backup – data redundancy on trusted machines

Social Backup is my third year project at the University of Cambridge. It provides peer-to-peer backup between trusted peers (you and your friends share storage).

Commercial data storage

Dropbox has revolutionised off-site storage by providing a simple interface to black box data storage. There are a number of inherent disadvantages to using commercial services:

  • Freemium business models may result in data loss and low reliability
  • Companies are at liberty to remove free accounts and increase fees
  • Business catastrophes like liquidation could cause data loss
  • Users have no control over the encryption keys (Amazon S3 servers)

The Internet is becoming distributed

As Internet connectivity improves, the popularity of distributed systems such as BitTorrent (file transfer) and, more recently, Diaspora (social networking) increases. There is a trend toward ownership of data rather than subscribing to proprietary services, and also peer-to-peer data transmission rather than routing all information through a single point of failure.

Such systems have increased privacy, reliability and failure tolerance, often at the expense of inconsistent state across the distributed system.

My project Social Backup

Social Backup provides each user with data redundancy to restore important information if local failures cause data loss. Redundant data is spread across a set of trusted peers but kept under encryption so that only the original owner can access it. The social aspect promotes fairness and discriminates against free riders: a user who provides 100MB to 10 users will expect a similar allocation from these peers. This user can then use these allocations for repeated storage of small files or for chunking larger articles across the entire space.
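To make the chunking idea concrete, here is a minimal sketch of spreading one file across the space granted by peers. It is not the project's implementation: the function name, the peer names, the round-robin placement and the chunk size are all assumptions, and encryption and replication are left out.

```javascript
// Plan which peer stores each chunk of a file.
// allocations: [{ peer: 'alice', bytes: 4 }, ...] is the space each trusted
// peer has granted (hypothetical names and sizes).
// Returns a list of { peer, offset, length }, or null if the file does not fit.
function planChunks(fileSize, chunkSize, allocations) {
    // Copy the allocations so the caller's objects are not modified
    var free = allocations.map(function (a) {
        return { peer: a.peer, bytes: a.bytes };
    });
    var plan = [];
    var offset = 0;
    var i = 0;
    while (offset < fileSize) {
        var length = Math.min(chunkSize, fileSize - offset);
        // Round-robin: skip peers that cannot hold this chunk
        var tried = 0;
        while (tried < free.length && free[i].bytes < length) {
            i = (i + 1) % free.length;
            tried++;
        }
        if (tried === free.length) {
            return null; // no peer has enough space left
        }
        free[i].bytes -= length;
        plan.push({ peer: free[i].peer, offset: offset, length: length });
        offset += length;
        i = (i + 1) % free.length;
    }
    return plan;
}
```

Each chunk is placed exactly once in this sketch; storing every chunk with several peers would add the redundancy needed to survive a peer going offline.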

My project aims to show that Social Backup is a viable alternative to physical backup and cloud services.

Fixing faulty wireless Ubuntu 10.10, Asus Eee PC 901

After upgrading my Asus Eee PC 901 to Ubuntu Netbook Edition 10.10 I was shocked to find that the wireless card stopped working. Problems included:

  • Failing to detect wireless networks
  • Random cutouts after connection
  • Unstable file transfer
  • Very slow download speed (at one point 500 B/sec)

This problem affects all Eee PC models with an rt2860sta wireless card; it is also likely that similar RaLink models are affected, for instance the Eee PC 1000HE among others.

Information about networking capabilities can be found using lspci | grep -i ralink and lsmod | grep '^rt'.

A working solution

Install the package linux-backports-modules-wireless-maverick-generic using Aptitude. Then run this bash code:

echo "
blacklist rt2800lib
blacklist rt2800pci
blacklist rt2x00usb
" | sudo tee -a /etc/modprobe.d/blacklist.conf

This blacklists the faulty Linux kernel modules so that they are no longer loaded, and restores wireless connection functionality. According to the forum post below, problems with WEP networks can be solved by adding rt2x00lib and rt2x00pci back in. Look at these sources if you want more information.

Useful sources
