Friday 30 November 2007

JISC Annual Review

JISC's Annual Review includes a section on eResearch (http://www.jisc.ac.uk/aboutus/annualreview/2007/eresearch.aspx). Good to see mention of some of my projects: NaCTeM, ASSERT, GEMS2, GEESE, SAW-GEO, SEE-GEO :-) There's also a podcast with John Wood, JSR Chair.

The Argo project

Mentioned on the BCS-KIDDM list this week, the ARGO project is interesting not only for the work it's doing in collecting vital measurements from our oceans, but also in the way it manages its data. The real-time data are freely offered to anyone and can be downloaded from the web. The project involves around 30 countries including the UK (managed by the Met Office, with partners the National Oceanography Centre, the British Oceanographic Data Centre and the UK Hydrographic Office). UK involvement is jointly funded by Defra, MoD and NERC.
http://www.argo.net/

Tuesday 27 November 2007

Geospatial Knowledge Infrastructures Workshop

I managed to catch some of the Geospatial Knowledge Infrastructures Workshop today, part of the eSI programme and hosted by the Welsh eScience Centre. Here are my quick notes...


Rob Lemmens from the International Institute for Geo-Information Science and Earth Observation talked about end-user tools. He contrasted two approaches: corporate/national Spatial Data Infrastructures (SDIs), which are centralised, and Web 2.0, which is community driven. SDIs are based on stricter rules for annotation, and their accuracy tends to be higher than that of Web 2.0 tools, although this is changing. Rob outlined the need for a semantic interoperability framework (a combination of ontologies, their relationships, and methods for ontology-based description of information sources - data sets, services etc.) and a semantic interoperability infrastructure (comprising the framework plus the tools to maintain and use it, as well as the information sources produced within it). His presentation also included a good slide outlining the characteristics of an ontology, and a demonstration of ontology visualisation (the same tool ASSERT is using for clustering?). Rob concluded by summarising what the geospatial community can learn and take from Web 2.0: tagging/tag clouds, tools for building ontologies (community tagging, e.g. Google Image Labeler) and instant feedback (e.g. password strength bars when choosing a new password) - though on the negative side, community-driven tagging can lead to weak semantics. Rob suggests combining the best of both the SDI and Web 2.0 worlds - mapping the SDI and Web 2.0 ontologies to create dynamic annotations of geo sources, thus improving discovery.
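That tag-to-concept mapping idea is easy to picture in code. Here's a small sketch with invented tags and concept URIs - none of this comes from Rob's talk, it's just to illustrate the principle:

```python
# Minimal sketch (not from the talk): mapping free-form Web 2.0 tags onto
# SDI ontology concepts so that community tags can drive formal annotations.
# The concept URIs and tag vocabulary here are made up for illustration.

TAG_TO_CONCEPT = {
    "river":    "http://example.org/sdi-ontology#Watercourse",
    "stream":   "http://example.org/sdi-ontology#Watercourse",
    "footpath": "http://example.org/sdi-ontology#Path",
    "woodland": "http://example.org/sdi-ontology#Forest",
}

def annotate(resource_id, tags):
    """Return (resource, concept) pairs for tags we can ground in the ontology;
    unknown tags are kept separately so the weak semantics are at least visible."""
    grounded, unmatched = [], []
    for tag in tags:
        concept = TAG_TO_CONCEPT.get(tag.lower())
        if concept:
            grounded.append((resource_id, concept))
        else:
            unmatched.append(tag)
    return grounded, unmatched

if __name__ == "__main__":
    print(annotate("geo-source-42", ["River", "woodland", "picnic"]))
```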



Ulrich Bugel from the Fraunhofer Institut IITB presented on ontology-based discovery and annotation of resources in geospatial applications. Ulrich talked about the ORCHESTRA project (http://www.eu-orchestra.org/), which aims to design and implement an open service-oriented architecture to improve interoperability in a risk management setting (e.g. how big is the risk of a forest fire in a certain region of the Pyrenees in a given season?). Such a question has spatial references (cross-border, cross-administration), temporal references (time series and prognostics), a thematic reference (forest fire) and a conceptual reference (what is risk?). ORCHESTRA will build a service network to address these sorts of questions. Interoperability is discussed at three levels: syntactic (encodings), structural (schemas, interfaces) and semantic (meaning). The project has produced the Reference Model for the ORCHESTRA Architecture (RM-OA), drawing on standards from OGC, OASIS, W3C, ISO 191xx and ISO RM-ODP; after many iterations the Reference Model achieved Best Practice status at OGC. The ORCHESTRA Architecture comprises a number of semantic services: an Annotation Service, which automatically generates meta-information from sources and relates it to elements of an ontology; an Ontology Access Service, enabling high-level access and queries to ontologies; a Knowledge Base Service; and a Semantic Catalogue Service.
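To make the division of labour between those semantic services a little more concrete, here is a rough sketch of what their interfaces might look like; the method names and signatures are my own shorthand, not the RM-OA's actual service specifications:

```python
# Rough sketch of the division of labour between the semantic services
# described above. These interfaces are my own shorthand for the idea,
# not the actual RM-OA service specifications.
from abc import ABC, abstractmethod

class AnnotationService(ABC):
    @abstractmethod
    def annotate(self, source_uri: str) -> dict:
        """Generate meta-information for a source and relate it to ontology elements."""

class OntologyAccessService(ABC):
    @abstractmethod
    def query(self, ontology_uri: str, expression: str) -> list:
        """Provide high-level access to, and queries over, an ontology."""

class SemanticCatalogueService(ABC):
    @abstractmethod
    def discover(self, concept_uri: str) -> list:
        """Find registered sources whose annotations match a given concept."""
```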



Ian Holt from Ordnance Survey presented on geospatial semantics research at OS. Unsurprisingly, OS has one of the largest geospatial databases, with 400 million features and over 2000 concepts. The benefits of semantics research - to OS and to its customers - include quality control and better classification; semantic web enablement, semi-automated data integration, and data and product repurposing; and data mining. OS has developed a topographic domain ontology which provides a framework for specifying content (www.ordnancesurvey.co.uk/ontology). Ontologies have been developed for hydrology, administrative geography, and buildings and places, with work underway on addresses, settlements and land forms, plus supporting modules on mereology, spatial relations and network topology. Ian distinguished the conceptual ontology (knowledge represented in a form understandable by people) from the computational ontology (knowledge represented in a form understandable by computers). A controlled natural language called Rabbit has been developed - structured English, compilable to OWL - and OS is also part of the OWL 1.1 task force developing a controlled natural language syntax. A project with Leeds University is developing a plug-in for Protégé which accepts natural language descriptions and, in the back end, translates them into an OWL model; the first release is scheduled for December, with a further release planned for March 08. Ian also talked about experimental work to semantically describe gazetteers - an RDF version (downloadable?) to represent the data and an OWL ontology to describe the concepts. This work covers administrative regions, with work underway to include cities etc. Through this work OS has run into some problems with RDF - e.g. it may degrade performance (they have over 10 billion triples), and how much of it is really needed? Ian described some work on semantic data integration, e.g. "find all addresses with a taxable value over £500,000 in Southampton", looking at how to merge ontologies (i.e. creating another ontology rather than interoperating between the two). Some lessons learned: ontologies are never perfect and can't offer complete descriptions of any domain; automatic tools are used as far as possible. Ian also described work on linking ontologies to databases using D2RQ, which maps SPARQL queries to SQL, creating "virtual" RDF. Conclusions: domain experts need to be at the centre of the process, and technology transfer is difficult - the benefits of semantics in products and applications must be made clear.
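The Southampton example translates quite naturally into SPARQL over a D2RQ endpoint. A sketch of the idea, assuming a hypothetical endpoint URL and invented namespace and property names (OS's actual ontology and data model will differ):

```python
# Sketch only: the sort of SPARQL query the Southampton example implies, run
# against a D2RQ endpoint that exposes the relational data as "virtual" RDF.
# The endpoint URL, namespace and property names are invented for illustration.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/d2rq/sparql")  # hypothetical endpoint
sparql.setQuery("""
    PREFIX ex: <http://example.org/address-ontology#>
    SELECT ?address ?value WHERE {
        ?address a ex:Address ;
                 ex:inSettlement "Southampton" ;
                 ex:taxableValue ?value .
        FILTER (?value > 500000)
    }
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["address"]["value"], row["value"]["value"])
```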


Alun Preece from Cardiff University presented on an ontology-based approach to assigning sensors to tasks. The idea is to bridge the gap between people out in the field who need to make decisions (e.g. in disaster management) and the data/information produced by networks of sensors and other sources. Issues tackled include data orchestration (determine, locate and characterise the resources required), reactive source deployment (repurpose, move and redeploy resources) and push/pull data delivery. The approach is ontology-centric and involves semantic matchmaking. Proof-of-concept work includes the SAM (Sensor Assignment for Missions) software prototype and its integration with a sensor network. This work is funded by the US and UK to support military applications - intelligence, surveillance and reconnaissance (ISR) requirements. Ontologies are used to specify the ISR requirements of a mission (e.g. night surveillance, intruder detection) and the ISR capabilities provided by different asset types, and semantic reasoning compares mission requirements with capabilities to decide whether the requirements are satisfied. For example, if a mission requires Unmanned Aerial Vehicles (UAVs), the ontology would specify the different types of UAV and the requirements of the mission (e.g. high altitude to fly above weather, endurance), and semantic matchmaking (exact, subsuming, overlapping, disjoint) then leads to a preferred choice. The project has engaged with domain experts to get the information into the ontology and to share conceptualisations. Alun showed the Mission and Means Framework Ontology, a high-level ontology which is fleshed out with more specific concepts.
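The matchmaking step is the easiest part to illustrate. A toy version of the four match categories using plain sets - the real SAM prototype reasons over ontologies, so treat this purely as an illustration of the idea:

```python
# Toy illustration (not the SAM prototype): classifying how well an asset's
# capabilities match a mission's ISR requirements, using the four match
# categories mentioned above. Real matchmaking uses ontology reasoning, not sets.

def match(required: set, provided: set) -> str:
    if required == provided:
        return "exact"
    if required <= provided:
        return "subsuming"      # asset provides everything required, and more
    if required & provided:
        return "overlapping"    # only partial cover
    return "disjoint"

mission = {"night_surveillance", "high_altitude", "long_endurance"}
uav_a   = {"night_surveillance", "high_altitude", "long_endurance", "intruder_detection"}
uav_b   = {"night_surveillance"}

print(match(mission, uav_a))  # subsuming -> preferred choice
print(match(mission, uav_b))  # overlapping
```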

Slides from the workshop will be uploaded to http://www.nesc.ac.uk/action/esi/contribution.cfm?Title=832

Wednesday 21 November 2007

Comms planning for projects

A really helpful post from Alice Gugan, to help projects manage their communications effectively - thanks Alice!
http://lawrie.jiscinvolve.org/2007/11/16/seven-steps-to-developing-an-effective-communications-plan/

Gnizr

Have come across this twice this week, both times in a geospatial context. The Geospatial Semantic Web blog describes Gnizr, a new open source application http://www.geospatialsemanticweb.com/2007/11/16/gnizr-open-source. One of my projects is considering using Gnizr, possibly with its sister application KnowledgeSmarts, which I don't think is open source. (More info at http://code.google.com/p/gnizr/ and http://www.imagemattersllc.com/products/gnizr.php)

Breaking down digital barriers - report and case studies

Thanks to docuticker (http://www.docuticker.com/) for pointing out a report on Breaking Down Digital Barriers (http://www.docuticker.com/?p=17844). One of the three case studies looks at interoperability issues with mashups:
"Most clearly among our three case studies, the area of Web services demonstrates the manner in which interoperability can stimulate large-scale innovation."

UPGRADE journal for informatics

Just came across a reference to UPGRADE (http://www.upgrade-cepis.org/index.html) via the BCS. Recent issues cover the semantic web, web services and next-generation search. I've only had time for a quick look so far, but it seems interesting reading.

Friday 16 November 2007

Chris Date Lecture @ NeSC

The latest NeSC newsletter http://www.nesc.ac.uk/news/newsletter/November07.pdf features a report on the Chris Date Lecture last month. Highlights:

"Bill Pike (Pacific Northwest National Laboratory), in his presentation on integrating knowledge models into the scientific analysis process [...] described the challenge of trying to capture scientific knowledge as it is created, with workflow models that describe the process of discovery. In this way, the knowledge of what was discovered can be connected with
the knowledge of how the discovery was made."

"If future generations of scientists are to understand the work of the present, we have to make sure they have access to the processes by which our knowledge is being formed. The big problem is that, if you include all the information about all the people, organisations, tools, resources and situations that feed into a particular piece of knowledge, the sheer quantity of data will rapidly become overwhelming. We need to find ways to filter this knowledge to create sensible structures... "

"One method for explicitly representing knowledge was presented by Alberto Canas (Institute for Human and Machine Cognition). The concept maps that he discussed are less ambiguous than natural language, but not as formal as symbolic logic. Designed to be read by humans, not machines, they have proved useful for finding holes and misconceptions in knowledge, and for understanding how an expert thinks. These maps are composed of concepts joined up by linking phrases to form propositions: the logical structure expressed in these linking phrases is what distinguishes concept maps from similar-looking, but less structured descriptions such as "mind maps". "

Skills required to support SOA and integrated services

From Computing 15 November 2007, article "Fitting the skills together" by Jim Mortleman:
"..analyst Gartner predicts that four out of five companies will have taken the SOA route by 2010...SOA involves a fundamental change to the way firms think about IT - namely, as a series of interoperable business services, rather than as discrete IT systems."

The article also quotes Nick Masterton-Jones, IT Director of Vocalink: "I think SCA is something we're going to see a lot more of in the coming three years." SCA is Service Component Architecture, "an open SOA promoted by major Java vendors to bridge the gap between people who understand the business domain and people who understand system design".

Monday 12 November 2007

Data sharing

Useful links:
OECD Principles (2007): http://www.oecd.org/document/55/0,3343,en_2649_37417_38500791_1_1_1_37417,00.html

RIN's Stewardship of Digital Research Data (2007): http://www.rin.ac.uk/data-principles

MRC's Guidelines on data sharing: http://www.mrc.ac.uk/PolicyGuidance/EthicsAndGovernance/DataSharing/PolicyonDataSharingandPreservation/index.htm

BBSRC's Guidelines on data sharing: http://www.bbsrc.ac.uk/support/guidelines/datasharing/context.html


Plus some interesting outputs from JISC-funded projects:


There's a lot in these links about the wider context; how things look now; barriers to data sharing (e.g. trust, IPR, time); discussion on possible solutions (e.g. social software models, reward/recognition, mandates).

Semantic Web?

New Scientist has an interesting article, "'Semantic' website promises to organise your life" (http://technology.newscientist.com/channel/tech/dn12903-semantic-website-promises-to-organise-your-elife.html?feedId=online-news_rss20) which talks about Twine, currently in beta testing. Twine uses annotation and natural language processing. It'll be interesting to see how it works when it's released. The article mentions other semantic services in development: Powerset, True Knowledge and Freebase.

OGSA-DAI 3.0

NeSC's October newsletter (http://www.nesc.ac.uk/news/newsletter/October07.pdf) features a news item announcing the release of OGSA-DAI 3.0. It includes a clear description of why OGSA-DAI is needed and what it does - one of the simplest explanations I've seen! It also uses SEE-GEO as an example of how OGSA-DAI is being used in the community.

Friday 9 November 2007

OGF21 - interesting snippets

This time round OGF featured a Semantic Grid theme, which I think was missing or limited last time as the Manchester event clashed with the WWW conference in Banff. There were a number of presentations on Web 2.0 including:

And of course not forgetting the geospatial stuff...
http://www.gridforum.org/gf/event_schedule/index.php?id=960

OGF and OGC MoU

At OGF21, the OGF and OGC signed a Memorandum of Understanding. According to the OGF's Grid Connections Newsletter (Nov 07), the goals of the collaboration will include:
"1.) Integrate OGC's Web Processing Service (WPS) with a range of "back-end" processing environments to enable large-scale processing. The WPS could also be used as a front-end to interface to multiple grid infrastructures, such as TeraGrid, NAREGI, EGEE, and the UK's National Grid Service. This would be an application driver for both grid and data interoperability issues.
2.) Integration of WPS with workflow management tools. OGF’s SAGA draft standard is where multiple WPS calls could be managed.
3.) Integration of OGC Federated Catalogues/Data Repositories with grid data movement tools. OGF’s GridFTP is one possibility that supports secure, third-party transfers that are useful when moving data from a repository to a remote service.
However, the real goal is not just to do science, but to greatly enhance things like operational hurricane forecasting, location-based services, and anything to do with putting data on a map. WPS is just a starting point for the collaboration. As the two organizations engage and build mutual understanding of technical requirements and approaches, many other things will be possible. "
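Point 1 is the easiest to picture at the protocol level. Here's a rough sketch of driving a WPS front end over plain HTTP from Python; the endpoint URL and the process identifier are placeholders of my own, not anything from the MoU:

```python
# A minimal sketch of what "WPS as a front end" looks like at the HTTP level.
# The endpoint URL and process identifier are placeholders; real requests would
# follow the OGC WPS 1.0.0 spec and the target grid's own conventions.
from urllib.parse import urlencode
from urllib.request import urlopen

WPS_ENDPOINT = "http://example.org/wps"  # hypothetical service

def wps_get(request, **extra):
    params = {"service": "WPS", "version": "1.0.0", "request": request, **extra}
    with urlopen(WPS_ENDPOINT + "?" + urlencode(params)) as resp:
        return resp.read().decode("utf-8")

# Discover what the service offers, then describe one (made-up) process.
capabilities = wps_get("GetCapabilities")
description  = wps_get("DescribeProcess", identifier="example:HurricaneTrack")
```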

Thursday 8 November 2007

Agile development

One of my new projects is proposing to use agile development techniques, as opposed to traditional linear project management. This is something that's been talked about by my JISC colleagues before but I haven't looked too deeply into it - so this has been a good opportunity to talk to colleagues about their experiences.



Key issues and lessons


  • Projects should use wikis/websites to enable tracking of work through the development lifecycle

  • Be prepared to adapt templates for project documentation

  • As a Programme Manager, you may need more regular/frequent engagement with projects - the 6-monthly progress report is not going to be sufficient



Useful links (in no particular order)

http://www-306.ibm.com/software/rational/agile/

http://apln.org/

http://pmdoi.org/

LIFE Project: model for lifecycle costing of digital objects

Recently posted to JISC-REPOSITORIES:
"The LIFE Project has recently published a revised model for lifecycle costing of digital objects." The project team is looking for comments via the project blog.
More info at:
http://eprints.ucl.ac.uk/archive/00004831/

Kings - Centre for eResearch

Thanks to Ann for posting about this to the JISC-ERESEARCH list:

"King's College London is pleased to announce the establishment of the KCL Centre for e-Research. Based in Information Systems and Services, the Centre will lead on building an e-research environment and data management infrastructure at King's, seeking to harness the potential of IT to enhance research and teaching practice across the College. The Centre also has a remit to make a significant contribution to national, European and international agendas for e-research, and in particular to carry forward in a new context the work of the AHDS across the arts and humanities.
Planning for the new Centre began on 1st October 2007 and a major launch event is planned for Spring 2008. Further information and news about the Centre and its activities will be released over the coming months."

Wednesday 7 November 2007

Info World Review : interesting news items

Information World Review (Nov 07) features some interesting items...

- a news item, Search and aggregators set to dominate, on the recent Outsell Information Industry Outlook report:
"Watson Healy said 2008 would be 'year of the wiki', with Web 2.0 technology replacing complex portals and knowledge management, and that 'a critical mass of information professionals would take charge of wikis, blogs or other 2.0 technologies on behalf of their organisations".

- an item, PubMed recasts rules for open access re-use, on the new guidelines recently agreed by the UK PubMed Central Publishers Panel:
"Under the terms of the statement of principles, open access (OA) published articles can be copied and the text data mined for further research, as long as the original author is fully attributed".

Cool mashup: migrating swans

This is a nice example of a mashup, using Google Maps to track the migration of swans from Iceland to the UK and Ireland. You can follow their flights on the map (though it's hard to get the right scale) and keep up to date via RSS feeds.
http://www.wwt.org.uk/superwhooper/573/super_whooper.html
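For anyone curious about the plumbing behind a mashup like this, here's a rough sketch: pulling positions from a (hypothetical) GeoRSS feed into a list of points that a Google Maps page could plot. The feed URL and element names are assumptions, not taken from the WWT site.

```python
# Rough sketch of the mashup's ingredients: parse a hypothetical GeoRSS feed of
# swan positions and turn it into points a Google Maps page could plot.
import feedparser

feed = feedparser.parse("http://example.org/superwhooper/positions.rss")  # placeholder URL
points = []
for entry in feed.entries:
    if "georss_point" in entry:                # feedparser exposes <georss:point> this way
        lat, lon = map(float, entry.georss_point.split())
        points.append({"title": entry.title, "lat": lat, "lon": lon})

print(points[:3])
```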

Tuesday 6 November 2007

Business and Community Engagement at JISC

http://www.jisc.ac.uk/whatwedo/themes/business_community_engagement.aspx

There've been several publications from this programme of work recently:

  • User needs study: How JISC could support Business and Community Engagement
  • Evaluation report: JISC Services and the third stream
  • Final report: Study of Customer Relationship Management issues in UK HE institutions
  • Study: The use of publicly-funded infrastructure, services, and intellectual property for BCE
  • Business and Community Engagement: An overview of JISC activities

Friday 2 November 2007

Google launches OpenSocial

http://news.bbc.co.uk/1/hi/technology/7070815.stm
"Google has launched a system that will allow developers to create applications for a variety of social networks."

Latest Ariadne : NaCTeM, repositories and KIDDM

http://www.ariadne.ac.uk/issue53/

Good to see NaCTeM :-) A good overview of the current services and a run-through of their roadmap:
"NaCTeM's text mining tools and services offer numerous benefits to a wide range of users. These range from considerable reductions in time and effort for finding and linking pertinent information from large scale textual resources, to customised solutions in semantic data analysis and knowledge management. Enhancing metadata is one of the important benefits of deploying text mining services. TM is being used for subject classification, creation of taxonomies, controlled vocabularies, ontology building and Semantic Web activities. As NaCTeM enters into its second phase we are aiming for improved levels of collaboration with Semantic Grid and Digital Library initiatives and contributions to bridging the gap between the library world and the e-Science world through an improved facility for constructing metadata descriptions from textual descriptions via TM."

Other interesting snippets:
  • SURFshare programme covering the research lifecycle http://www.surffoundation.nl/smartsite.dws?ch=ENG&id=5463

  • a discussion on the use of Google as a repository: "Repositories, libraries and Google complement each other in helping to provide a broad range of services to information seekers. This union begins with an effective advocacy campaign to boost repository content; here it is described, stored and managed; search engines, like Google, can then locate and present items in response to a search request. Relying on Google to provide search and discovery of this hidden material misses out a valuable step, that of making it available in the first instance. That is why university libraries need Google and Google needs university libraries."

  • feedback from the ECDL conference, including a workshop on a European repository ecology, featuring a neat diagram showing how presentations are disseminated after a conference using a mix of Web 2.0, repositories and journals http://www.ariadne.ac.uk/issue53/ecdl-2007-rpt/#10