Paul Hollands: June 2005 Archives


June 28, 2005

This is quite Star Trek.

Perl.com: Data Munging with Sprog

Posted by pj at 08:53 PM

June 23, 2005

SKOS - An RDF schema for thesaural representations

XML.com: Introducing SKOS

Posted by pj at 10:07 PM

June 22, 2005

Re-working the funding opportunities RSS module

I've spent the last week and a half re-working the funding opportunities RSS module so that its syntax fits the rdf:nodeID way of doing things, on the recommendation of Libby Miller. As part of this work for the DEL project I have:

- produced an RDFS definition for the module: http://www.medev.ac.uk/interoperability/rss/1.0/modules/fundops/rss1.0fundopsmodule

- produced a new example document: http://www.medev.ac.uk/fundops_example.rdf

- written some XSLT to transform the comments and specs in the RDFS into HTML:
http://www.medev.ac.uk/rdfs2html.xsl

- written the whole thing up on the MEDEV site:
http://www.medev.ac.uk/interoperability/rss/1.0/modules/fundops/0.2/
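The post's own tool for the RDFS-to-HTML step is an XSLT stylesheet (rdfs2html.xsl). As a swapped-in illustration of the same idea, here is a minimal Python sketch that pulls each class's and property's rdfs:label and rdfs:comment out of an RDF/XML schema and renders them as an HTML definition list. The sample schema, its example.org namespace, the term names and the rdfs_to_html function are hypothetical placeholders, not the actual fund ops vocabulary.

```python
import html
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical sample schema: the namespace and term names are placeholders.
SAMPLE_RDFS = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://example.org/fundops#Opportunity">
    <rdfs:label>Opportunity</rdfs:label>
    <rdfs:comment>A funding opportunity.</rdfs:comment>
  </rdfs:Class>
  <rdf:Property rdf:about="http://example.org/fundops#deadline">
    <rdfs:label>deadline</rdfs:label>
    <rdfs:comment>Closing date for applications &amp; bids.</rdfs:comment>
  </rdf:Property>
</rdf:RDF>"""


def rdfs_to_html(rdf_xml: str) -> str:
    """Render each described term's label and comment as an HTML <dl>."""
    root = ET.fromstring(rdf_xml)
    rows = []
    for term in root:  # each top-level child describes one class or property
        about = term.get(f"{{{RDF}}}about", "")
        label = term.findtext(f"{{{RDFS}}}label", default=about)
        comment = term.findtext(f"{{{RDFS}}}comment", default="")
        rows.append(
            f'  <dt><a href="{html.escape(about)}">{html.escape(label)}</a></dt>\n'
            f"  <dd>{html.escape(comment)}</dd>"
        )
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"


print(rdfs_to_html(SAMPLE_RDFS))
```

Escaping every label and comment on output matters because schema comments routinely contain characters such as "&" that would otherwise break the generated HTML.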

Posted by pj at 03:44 PM

June 15, 2005

More RDF play things for Eclipse

A suite of RDF-related plug-ins for Eclipse:

RDFX

Posted by pj at 03:03 PM

Eclipse and Jython

Apparently there are Jython plug-ins for Eclipse too:

Eclipse Integration

Posted by pj at 01:09 PM

June 14, 2005

Eclipse and SWeDE

I was looking for an IDE for Java development on Mac OS X yesterday and came across Eclipse:

- http://www.eclipse.org/

Had a tinker with it and it was pretty good. This morning I needed an RDFS or OWL editor to start designing the funding opportunities schema, and discovered SWeDE:

- http://projects.semwebcentral.org/projects/owl-eclipse/

It's a plug-in for Eclipse! Hurrah!

Posted by pj at 03:39 PM

June 08, 2005

Nodemap for the new website

I've produced the first page of a nodemap for the new website:

- nodemap page 1

- resources section nodemap

- activities section nodemap

Posted by pj at 04:37 PM

Wireframes for the new website

The wireframes for the new web site are here:

- homepage

- other site pages

Posted by pj at 01:06 PM

June 03, 2005

Website workpackages - Draft v 4

WP 1 - Joint MEDEV and HS&P portal site located at www.health.ac.uk

1.1 Determine portal site functional requirements (SSADM?)

1.2 Produce a node map (site map) on paper

1.3 Produce a wireframe for the home page and one for other types of pages

1.4 Investigate options for combining / aggregating multiple RSS feeds into one and implement findings

1.5 Investigate options for allowing personalized portal views

1.6 Finalize and implement the outcomes of the portal navigation WP (WP 3)

1.7 Finalize and implement the outcomes of the portal branding WP

1.8 Finalize wireframes and content and produce structural layout CSS for home and other page types

1.9 Implement final site

WP 2 - New MEDEV site located at www.medev.ac.uk

2.1 Determine site functional requirements (SSADM?)

2.2 Produce a node map (site map) on paper

2.3 Produce a wireframe for the home page and one for other types of pages

2.4 Investigate options for allowing personalized portal views

2.5 Finalize and implement the outcomes of the navigation WP

2.6 Finalize and implement the outcomes of the branding WP

2.7 Finalize wireframes and content and produce structural layout CSS for home and other page types

2.8 Produce a low-fi styled site mock up to test content combinations

2.9 Implement final site

WP 3 - Portal navigation design

3.1 Determine landmark navigation and courtesy navigation menu content and information architecture informed by WP 1.2

3.2 Investigate using MovableType categories as contextual navigation elements informed by WP 1.2 and implement findings

3.3 Investigate approaches for wayfinding (breadcrumb trail generation) and implement findings

WP 4 - MEDEV site navigation design

4.1 Determine landmark navigation menu content and information architecture informed by WP 2.2

4.2 Investigate using MovableType categories as contextual navigation elements informed by WP 2.2 and implement findings

4.3 Investigate approaches for breadcrumb trail generation and implement findings

WP 5 - Portal site visual design and branding

5.1 Commission a new design and brand for the HEALTH portal site

5.2 Finalise branding and produce and implement CSS

WP 6 - MEDEV site visual design and branding

6.1 Commission a new design and brand for the MEDEV site

6.2 Finalise branding and produce and implement CSS

WP 7 - Generic web form handler project

7.1 SSADM evaluation of requirements

7.2 Set of generic form content checks (dates, URLs, email addresses, MS special chars, post codes, HTML etc.) specified and implemented

7.3 Generic solution developed (possibly using Python Archetypes)

7.4 New table structures developed for all content currently managed in MySQL

7.5 Forms developed using new tools and new DB structures for all content currently managed in MySQL

WP 8 - RSS and iCalendar feeds

8.1 Develop new events RSS feeds based upon the new table structure and the RSS 1.0 Event module

8.2 Develop new funding opportunities RSS feeds based upon the new fund ops RSS module defined as part of the DEL work

8.3 Refine iCalendar view of events

8.4 Determine which other RSS feeds to include on the site and incorporate using Feed2JS or similar

8.5 Hack Feed2JS or MagpieRSS to handle the new RSS formats

WP 9 - MovableType work

9.1 Investigate the use of MT for classification of resources informed by WP 4.2 and implement findings

9.2 Hack MT QuickPost functions or use the Atom API to ensure the URL of the posted resource is available as an access key for navigation purposes

WP 10 - Portal / MEDEV site and resource searching

10.1 Investigate the functional requirements for site searching in all areas

10.2 Investigate whether the OpenObjects search engine will fulfil the requirements generated from 10.1

10.3 Implement findings

WP 11 - Porting content over to the new site

11.1 Audit and prioritise the porting of key website sections; candidates include:

- Workshop bookings and publication
- Contact details administration and user checking
- Group management
- Workshop proposals
- Mini-project proposals
- Publication requests
- Newsletter publication process
- Uploading resources
- FAQs, glossary, projects, special reports, best practice

11.2 Implement new content using outcomes of WP 7

WP 12 - MEDEV site information architecture

12.1 Based upon the outcomes of 11.1, 2.3, WP 4, WP 8 and WP 9, develop a new information architecture for the site, with particular focus on resources (using 'set pattern' for navigation)

Posted by pj at 01:02 PM

June 02, 2005

Auditing and classifying content on an existing web-site for improved access

Website information architecture is the business of deciding where each document sits within a website, what links to it and what it links to. It is about determining what the context of the information should be.

It is also about the navigational elements of a site and the topics or subject headings documents are listed under.

According to the World Wide Web Consortium's good practice guidelines, the location of each resource within its parent site should be reflected in its URL or web address. In other words, the way your website is organised into hierarchies of documents should reflect the nature of the information published therein.

A news item about health should sit with other items of the same topic in the relevant section, giving a URL perhaps like the following:

http://mysite.org/news/topics/health/cold_cures.html

Further, each resource should have its own unique URL which should ideally be readable by humans who may need to remember or re-type it:

http://www.w3.org/QA/Tips/readable-uri

This worked well, and was relatively straightforward to achieve, when websites were collections of HTML files in a series of nested directories or folders on a web server. It is also still possible with modern content management systems that replicate the folder-and-file model. Things are less easy with database-driven websites that rely on session IDs and so forth.

The BBC News website is a good example of how this can be done well:

http://news.bbc.co.uk
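To make the convention concrete, here is a small Python sketch that derives a readable, hierarchical URL from a document's section path and title, reproducing the cold_cures example above. The slug rules (lowercase, ASCII only, words joined with underscores) and the slugify / document_url helpers are assumptions for illustration, not a W3C requirement.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Turn a title into a short, human-readable URL segment."""
    # Strip accents, keep plain ASCII letters and digits, join words with underscores.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "_".join(words)


def document_url(base: str, sections: list[str], title: str, ext: str = ".html") -> str:
    """Build base/section/.../slug.ext so the URL mirrors the site hierarchy."""
    segments = [slugify(s) for s in sections] + [slugify(title) + ext]
    return base.rstrip("/") + "/" + "/".join(segments)


print(document_url("http://mysite.org", ["News", "Topics", "Health"], "Cold Cures"))
# → http://mysite.org/news/topics/health/cold_cures.html
```

Generating URLs from the section hierarchy, rather than typing them by hand, is one way to stop ad hoc locations creeping in as the site grows.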

It is my experience, however, that no matter how well intentioned and clear the rules for organising information on a site are at the start of its life, ad hoc decisions about the location of documents, made over time in the rush to publish and meet deadlines, result in a slow but steady degradation in the quality of the information architecture.

Also, the larger the site, the worse things get.

The MEDEV website <http://www.medev.ac.uk/> has reached the point where the organisation of information on the site has become unwieldy.

We are in the process of designing and building a new website for the subject centre, and are therefore taking the opportunity to audit the information we currently publish, so that the new information architecture reflects what is actually on the site rather than our having to shoehorn documents into an increasingly baroque information architecture as best we can.

A major inconsistency has developed between the actual arrangement of information at the folder level of the site and the way this is presented to the user in what is known as the landmark navigation menu at the top right of each page.

The landmark menu should appear consistently on each page and allow the user to navigate easily between the major sections of the site.

You will notice that there is a Resources menu option which leads the user to the http://www.medev.ac.uk/resources/ section of the site.

There are also links to the main News, Events, Discussion, Funding and Links sections of the site, which should correspond to the main sections of the website's information architecture: the top-level folders, if you will.

You would therefore expect the News section to have the following URL:

http://www.medev.ac.uk/news/

In fact, the News section is in a sub-folder of the Resources section:

http://www.medev.ac.uk/resources/news/

This is the same for the Events, Discussion, Funding and Links sections of the site.

Given that there are now between two and three thousand resources published on the site and somewhere around seventy-two thousand links between those documents, making any changes to the information architecture is a major undertaking if we want to move large sections around without breaking huge swathes of links.

To re-organise what information sits where, we need to get a grip on the real scope of the information now published on our site. To facilitate this, I have begun an audit of the information we currently publish there.

To determine the context of every document (including those generated from databases), where it is linked from and what it links to, I have written a web crawler, or robot, script. It crawls our own website, performing a number of different tasks as it goes:

1. It turns the HTML of each document into plain text and then inserts that text into a MySQL table so that we can quickly do free text searching across everything published on our site.

2. It gathers lists of all the links contained in that document and inserts that information into another table so that by searching with URLs we can quickly determine all the places where a particular document is linked from, in case we wish to change its URL as part of our re-organisation.

3. It also gathers information about the types of links (internal, external, mailto and so forth) and the content-type of each document (whether it is an HTML document, a PDF, a Word document and so forth.)

4. It produces a separate list of links to documents external to our site and checks that they are still active and not broken, logging the HTTP response code for each URL in our database to aid fixing of broken links.
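The four crawler tasks can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not the author's robot: it uses SQLite in place of MySQL, the table names (pages, links) and function names (PageParser, link_type, crawl, check_status) are hypothetical, and the page fetcher is passed in as a function so the crawl can be exercised without network access. check_status assumes the target server answers HEAD requests.

```python
import sqlite3
import urllib.error
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Collect the visible text and the href targets of one HTML page."""

    def __init__(self):
        super().__init__()
        self.text, self.hrefs, self._skip = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def link_type(href, site_host):
    """Classify a link as mailto, internal, external or other."""
    parts = urlparse(href)
    if parts.scheme == "mailto":
        return "mailto"
    if parts.scheme in ("http", "https"):
        return "internal" if parts.netloc == site_host else "external"
    return "other"


def crawl(start_url, fetch, db):
    """Breadth-first crawl of one site; fetch(url) returns (content_type, body)."""
    host = urlparse(start_url).netloc
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, content_type TEXT, body_text TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS links (source TEXT, target TEXT, kind TEXT)")
    queue, seen = deque([start_url]), {start_url}
    while queue:
        url = queue.popleft()
        content_type, body = fetch(url)
        text = ""
        if content_type.startswith("text/html"):
            parser = PageParser()
            parser.feed(body)
            text = " ".join(parser.text)                # task 1: plain text for free-text search
            for href in parser.hrefs:                   # task 2: record every outgoing link
                target = urldefrag(urljoin(url, href))[0]
                kind = link_type(target, host)          # task 3: internal / external / mailto
                db.execute("INSERT INTO links VALUES (?, ?, ?)", (url, target, kind))
                if kind == "internal" and target not in seen:
                    seen.add(target)
                    queue.append(target)
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", (url, content_type, text))
    db.commit()


def check_status(url, timeout=10):
    """Task 4: return the HTTP status code for an external URL (None if unreachable)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return None


# Usage with a hypothetical in-memory stand-in for the website (no network needed).
SITE = {
    "http://www.example.org/": ("text/html", '<p>Home</p><a href="/resources/news/">News</a>'
                                '<a href="mailto:pj@example.org">Mail</a>'),
    "http://www.example.org/resources/news/": ("text/html", '<p>News</p><a href="http://other.org/x">X</a>'),
}
db = sqlite3.connect(":memory:")
crawl("http://www.example.org/", lambda u: SITE[u], db)
# Every page that links to the news section, i.e. what breaks if its URL changes.
print(db.execute("SELECT source FROM links WHERE target = ?",
                 ("http://www.example.org/resources/news/",)).fetchall())
# → [('http://www.example.org/',)]
```

Keeping the links table separate from the page text is what makes the "where is this linked from?" question a single query when a section is about to move.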

Knowing the full scope of the site is very powerful: we can now determine which sections contain the largest number of resources, which will help us prioritise which sections to move onto the new site first and suggest what our new bottom-up information architecture might look like. We can also tell more easily which sections are redundant or anachronistic, so that these can be archived off on the old site.
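Once the crawl results are in a database, sizing each section is a simple aggregation. The sketch below is illustrative only: it groups URLs by their leading folder names and counts them. The sample URLs and the section_of / resources_per_section helpers are hypothetical; a real audit would read the URL list from the crawler's database instead.

```python
from collections import Counter
from urllib.parse import urlparse


def section_of(url: str, depth: int = 1) -> str:
    """Return the first `depth` folder names of a URL, e.g. 'resources'."""
    # Drop the final path segment (the file name, or '' for a trailing slash).
    folders = [s for s in urlparse(url).path.split("/")[:-1] if s]
    return "/".join(folders[:depth]) or "(root)"


def resources_per_section(urls, depth=1):
    """Count crawled documents per section, largest sections first."""
    return Counter(section_of(u, depth) for u in urls).most_common()


# Hypothetical sample of crawled URLs.
crawled = [
    "http://www.example.org/",
    "http://www.example.org/resources/news/a.html",
    "http://www.example.org/resources/news/b.html",
    "http://www.example.org/resources/events/c.html",
    "http://www.example.org/about/staff.html",
]
print(resources_per_section(crawled))
# → [('resources', 3), ('(root)', 1), ('about', 1)]
```

Raising depth to 2 splits a top-level folder into its sub-sections (resources/news, resources/events and so on), which is the level at which the News-under-Resources mismatch shows up.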

Having a full-text index of the whole site also makes the second step in this process much easier. I'm referring to the job of classifying each resource on the new site according to a given set of subject headings or topics.

Essentially we are talking about cataloguing all of the resources on our current site, to produce metadata for each. This metadata could form the basis of a very simplistic ontology layer, a browsable tree of subject based navigation to ease access to all of the information held on our site for our users.

In the parlance of services like del.icio.us and Flickr, we will be tagging our resources: basically, categorising them.

We have a number of ready-made vocabularies that we use for cataloguing and categorisation within the subject centre, including MeSH (Medical Subject Headings) and the Higher Education Academy Pedagogy and Policy themes, but our topics vocabulary also needs to be adaptable without breaking anything or requiring major re-classification.

If we can develop a robust set of classifications for our resources then this can form a major new element of our information architecture.

Given that we now have an index of all the text on the site, once we have our vocabulary figured out it might be possible to do some of the categorisation automatically. There are some very exciting developments in the area of auto-classification using Bayesian algorithms, and there are tools for filtering email spam using these techniques which might be re-tasked for this purpose:

http://spambayes.sourceforge.net/
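To give a feel for how Bayesian classification might be re-tasked from spam filtering to subject headings, here is a minimal multinomial naive Bayes tagger in Python. It is a generic textbook technique, not SpamBayes's own algorithm (which combines token probabilities differently), and the NaiveBayesTagger class, topic labels and training snippets are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # topic -> word frequencies
        self.doc_counts = Counter()              # topic -> number of training documents
        self.vocab = set()

    def train(self, text: str, topic: str) -> None:
        words = tokenize(text)
        self.word_counts[topic].update(words)
        self.doc_counts[topic] += 1
        self.vocab.update(words)

    def classify(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_topic, best_score = None, -math.inf
        for topic, counts in self.word_counts.items():
            # log P(topic) + sum of log P(word | topic); +1 smoothing covers unseen words.
            score = math.log(self.doc_counts[topic] / total_docs)
            denominator = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


tagger = NaiveBayesTagger()
tagger.train("objective structured clinical examination marking scheme", "Assessment")
tagger.train("exam question bank and assessment criteria", "Assessment")
tagger.train("problem based learning tutor guide for small groups", "Teaching methods")
tagger.train("lecture and small group teaching techniques", "Teaching methods")
print(tagger.classify("marking criteria for the clinical exam"))
# → Assessment
```

In practice the hand-catalogued resources would be the training set, and the tagger's output would be a suggestion for a cataloguer to confirm rather than a final classification.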

Nevertheless, the bulk of the work will probably need to be done by hand, and certainly so for new resources added to the site after it has gone live. The cataloguing process therefore needs to be quick and easy, taking up the minimum of time while still capturing metadata of suitable quality.

Fortunately, my colleagues and I are already very familiar with software that allows the "cataloguing" and classification of resources quickly and effectively: we use MovableType for blogging useful resources.

Using the MT QuickPost facility, we can describe, classify and quickly add to our blog a basic entry for the resource currently open in our browser window. MT also stores all the entries, as well as the categories available to classify each resource, in a MySQL database.

On a vanilla MT blog page, the classification terms form a key part of the navigation for users and are the basis for the whole blog information architecture.

Because MT is based on MySQL and offers a good deal of flexibility in publishing chunks of blog content, it would be trivial to incorporate this menu / hierarchy of category terms into the information architecture of our new site.

Further, MT includes lots of options for syndication in formats such as RSS and Atom. We intend to use the blog's RSS feed to form a 'Latest additions to this site' information pod on the new site's home page.
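As a rough sketch of such a pod, the Python below reads a simplified RSS 1.0 (RDF) feed of the kind MT can publish and renders the newest few items as an HTML list. The sample feed, the latest-additions class name and the latest_additions_pod function are hypothetical; the real pod could equally be built with an existing feed tool.

```python
import html
import xml.etree.ElementTree as ET

RSS = "{http://purl.org/rss/1.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical, simplified RSS 1.0 feed (no rdf:Seq of items, for brevity).
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://www.example.org/blog/">
    <title>Latest additions</title>
    <link>http://www.example.org/blog/</link>
    <description>Hypothetical sample feed</description>
  </channel>
  <item rdf:about="http://www.example.org/blog/a.html">
    <title>Older resource</title>
    <link>http://www.example.org/blog/a.html</link>
    <dc:date>2005-05-30T10:00:00+00:00</dc:date>
  </item>
  <item rdf:about="http://www.example.org/blog/b.html">
    <title>Newer resource</title>
    <link>http://www.example.org/blog/b.html</link>
    <dc:date>2005-06-01T16:30:54+00:00</dc:date>
  </item>
</rdf:RDF>"""


def latest_additions_pod(feed_xml: str, limit: int = 5) -> str:
    """Render the newest `limit` items of an RSS 1.0 feed as an HTML list."""
    root = ET.fromstring(feed_xml)
    items = root.findall(RSS + "item")
    # ISO 8601 dc:date strings sort chronologically as text when they share a timezone offset.
    items.sort(key=lambda item: item.findtext(DC + "date", default=""), reverse=True)
    entries = []
    for item in items[:limit]:
        link = html.escape(item.findtext(RSS + "link", default=""))
        title = html.escape(item.findtext(RSS + "title", default="(untitled)"))
        entries.append(f'  <li><a href="{link}">{title}</a></li>')
    return '<div class="latest-additions">\n<ul>\n' + "\n".join(entries) + "\n</ul>\n</div>"


print(latest_additions_pod(SAMPLE_FEED, limit=1))
```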

We could also use the MT facilities to allow comments and trackbacks about resources on our site.

Finally, MT produces Dublin Core metadata, expressed as RDF, for each entry added:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description
      rdf:about="http://minnesota.ncl.ac.uk/fuzzybuckets/archives/2005/06/article_about_u.html"
      trackback:ping="http://minnesota.ncl.ac.uk/cgi-bin/moveabletype/mt-tb.cgi/1146"
      dc:title="Article about using patterns to improve web navigation"
      dc:identifier="http://minnesota.ncl.ac.uk/fuzzybuckets/archives/2005/06/article_about_u.html"
      dc:subject="Navigation design"
      dc:description="Improving Web Information Systems with Navigational Patterns..."
      dc:creator="pj"
      dc:date="2005-06-01T16:30:54+00:00" />
</rdf:RDF>
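Because this RDF is plain XML, harvesting it into a catalogue record is straightforward. The Python sketch below reads the dc: attributes of each rdf:Description, using the entry above as sample input, into a dictionary with only ElementTree. The dublin_core_records function is hypothetical, and it assumes MT's attribute-style rdf:Description as shown, not every possible RDF/XML serialisation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

MT_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description
      rdf:about="http://minnesota.ncl.ac.uk/fuzzybuckets/archives/2005/06/article_about_u.html"
      trackback:ping="http://minnesota.ncl.ac.uk/cgi-bin/moveabletype/mt-tb.cgi/1146"
      dc:title="Article about using patterns to improve web navigation"
      dc:identifier="http://minnesota.ncl.ac.uk/fuzzybuckets/archives/2005/06/article_about_u.html"
      dc:subject="Navigation design"
      dc:description="Improving Web Information Systems with Navigational Patterns..."
      dc:creator="pj"
      dc:date="2005-06-01T16:30:54+00:00" />
</rdf:RDF>"""


def dublin_core_records(rdf_xml: str) -> list[dict]:
    """Return one dict of Dublin Core fields per rdf:Description element."""
    root = ET.fromstring(rdf_xml)
    records = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        record = {"about": desc.get(f"{{{RDF_NS}}}about")}
        for name, value in desc.attrib.items():
            # ElementTree exposes namespaced attributes as "{namespace}localname".
            if name.startswith(f"{{{DC_NS}}}"):
                record[name[len(DC_NS) + 2:]] = value
        records.append(record)
    return records


record = dublin_core_records(MT_RDF)[0]
print(record["subject"], record["date"])
# → Navigation design 2005-06-01T16:30:54+00:00
```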

Posted by pj at 12:44 PM

June 01, 2005

Article about using patterns to improve web navigation

Improving Web Information Systems with Navigational Patterns

Posted by pj at 04:30 PM

Website workpackages - third draft

Workpackage 1 - Joint MEDEV / HS&P portal demonstrator page at http://www.health.ac.uk

- Determine portal page functional requirements

- Produce a non-styled mock-up to test content combinations

- Produce a wireframe for the page

- Finalise and implement outcomes of navigation WP

- Finalise content and wireframe and produce CSS

- Investigate combining / aggregating multiple RSS feeds into one

- Investigate options for allowing personalized views

- Implement final demonstrator page

Workpackage 2 - New MEDEV website

- Determine functional requirements

- Produce a non-styled mock-up to test content combinations

- Produce a wireframe for the page

- Finalise and implement outcomes of navigation WP

- Produce a node / site map on paper

- Finalise content and wireframe and produce CSS

- Implement final site

Workpackage 3 - Navigation design

- Develop and model demonstrator navigation elements (such as breadcrumbs) for portal page

- Develop and model final MEDEV site navigation

- Implement navigational elements

Workpackage 4 - MEDEV site visual design and branding

- Commission a new design and brand for the website

- Finalise branding for MEDEV website

- Request all necessary logos from the Academy

- Produce CSS to implement MEDEV site branding

Workpackage 5 - RSS and iCalendar feeds

- Finalise events database table structure

- Develop new events RSS feeds using the Event module

- Develop new fund ops RSS feeds using new schema

- Refine iCalendar view of events

- Build a MovableType 'Latest additions' consumer pod into the MEDEV homepage

- Forms for event information entry to include appropriate publishing and editorial controls using outcomes of WP 11

- Forms for fund ops information entry to include appropriate publishing and editorial controls using outcomes of WP 11

- Include feeds from mini-projects RSS and a number of blogs (to be determined)

- Hack Feed2JS or MagpieRSS tools to handle the new RSS formats

Workpackage 6 - MovableType work

- Investigate the strengths and threats of using MT for classification compared with a home-grown DB approach

- Implement a MovableType blog for 'Latest additions' blogging and feed (to double for classification of resources too)

Workpackage 7 - Portal / MEDEV site and resource searching

- Investigate the functional requirements for site searching in all areas

- Investigate use of OpenObjects search engine

- Implement according to findings of the above

Workpackage 8 - DTML to ZPT and / or Plone porting

- Generic form builder and manager DTML

- Workshop proposals

- Mini-project proposals

- Publication requests

- Workshop publishing

- Workshop bookings

- Contact details administration

- Contact details user checking

- Newsletter publication process

- Uploading of resources for mini-projects (MPs) and workshops (WSs)

- FAQ, glossary, project reports and websites, special reports and best practice

Workpackage 9 - XSLT based RSS to HTML conversion

- Develop Zope / Python based tools for running XSLT transformations on the various flavours of RSS feeds (including new modules) we will be using

- Develop XSL stylesheets for transformations

Workpackage 10 - HEALTH site visual design and branding

- Commission a new design and brand for the website

- Agree designs with HS&P

- Finalise branding for HEALTH website

- Request all necessary logos from the Academy

- Produce CSS to implement HEALTH site branding

Workpackage 11 - Generic web form handler project

- Paul D and Tony's team to do an SSADM evaluation of requirements

- Set of generic form content checks defined and implemented

- Generic solution developed, possibly using Python Archetypes

- Forms developed using new tools and new DB structures for all content currently managed in MySQL

Posted by pj at 03:12 PM