Friday, 14 December 2007

New Innovation and Knowledge Centres to be set up

Computing (13 Dec) reports on the news that 2 new research hubs (Innovation and Knowledge Centres) are to be set up with £19m funding, including £5m from the Technology Strategy Board. Universities are being invited to bid to host the centres - successful bids are to be announced Sept 08. Their purpose will be "to combine academic and business design facilities with the aim of promoting the commercial exploitation of new developments". There are already 2 pilot IKCs in Cambridge and North Wales.

Tony Hey's vision for eResearch

December's NeSC News features a write-up of a lecture given by Tony Hey in which he talks about the future direction of eResearch:

"Research today is data-intensive, compute-intensive, collaborative and multidisciplinary. Researchers are becoming "extreme information workers", looking for subtle signals in great volumes of data. Technologies are emerging that enable a step change in handling scientific data: visualisation, analysis and processing, and also data management and preservation.

Where we have failed, so far, is in making it as easy to use the technology for handling these challenges as it is to use the web. Some communities, like particle physicists, are quite happy with technically complex middleware, but most are unwilling to work with these sorts of tools.
The new technologies of the social web may be the key to empowering researchers in the new data-centric world. It’s already happening in some communities. [...]

Beyond the lab, academic publishing is on the verge of a revolution. [...]

In the new world of e-Research, everything is connected in the cloud – the web-based applications and data stores out there in the internet. There will increasingly be services and tools in the cloud, accessed through simple interfaces via a web browser. [...]"

BCS project management articles

A couple of interesting articles on the BCS site:

Art or science? describing the key skills and competencies a project manager needs to succeed e.g. managing uncertainty, managing expectations

What an elephant is like on making the most of project meetings

CNI program 07-08

CNI (Coalition for Networked Information) in the US launched its program plan for 2007-08 at its recent meeting. There are several references to eResearch:

"There is a renewed focus on campus infrastructure to support research programs. Developments include: policy, technical and economic influences that are leading to a partial re-centralization of computing functions; radically new high performance network and distributed computing technologies; a rethinking of storage functionality and economics; requirements for long-term data management, curation and preservation; and growing faculty demands for informatics support services. An additional dimension of these needs involves information and technology intensive collaborations among groups at multiple campuses (sometimes characterized as collaboratories or virtual organizations). Complementing the organizationally oriented work on e-research already described, CNI is also concerned with the institutional and cross-institutional development of technical infrastructure, with a particular focus on large-scale storage and data management, and on collaboration tools and environments."

The idea of an Executive Roundtable is an interesting way to engage senior stakeholders:
"The Executive Roundtable assembles executive teams (usually the chief librarian and chief information technology officer) from about ten institutions for a focused two-to-three hour discussion of a specific topic of interest on the morning of the first day of the Task Force meeting. Past topics have included institutional repositories, learning management system strategies, identity management, learning spaces, funding innovation, and infrastructure to support research, which brought together vice presidents or vice provosts of research, in addition to the usual Roundtable organizational representatives from libraries and information technology."

"In the 2007-2008 program year CNI will continue to engage e-research developments both in the sciences and the humanities. The US National Science Foundation is launching major programs addressing data curation (the DataNet initiative, and also the Community-based Data Interoperability Networks program), and we will be highlighting these in our Task Force meetings."

"CNI is concerned with questions about availability of data related to scholarly work, and has been engaged in a number of discussions around open access, open science, and open data as they relate to this question, as well as discussions about disciplinary norms for data sharing. We will also continue to explore and document the ways in which data and computationally intensive scholarship are altering the nature of scholarly communication; the issues here include the legal and technical barriers to large-scale text and data mining; appropriate organizational, policy and technical strategies for linking articles and underlying data; and ways to construct scholarly works that are amenable to various combinations of human and machine use."

"As part of our ongoing exploration of the institutional implications of the emergence of e-science and e-research, we will continue to look at organizational and staffing questions. These include: how to appropriately combine and balance centralized and departmental support resources to most effectively support faculty and students; new information technology/library collaborations required by the e-research environment; and the staffing needs of data curation programs. In this endeavor we will work closely with ARL, where an e-science task force has recently mapped out a number of similar questions from a library perspective, and with the EDUCAUSE Cyberinfrastructure Task Force."

Thursday, 13 December 2007

The structured web

Interesting post on Alex Iskold's blog: looking at how the web will evolve to handle structured information, thus paving the way for the Semantic Web.

Access to raw data

Interesting post on the Open Knowledge Foundation blog...Give Us the Data Raw, and Give it to Us Now


The World Economic Forum announced its list of Technology Pioneers 2008, including Garlik, which is "the first company to develop a web-scale commercial application of semantic technology"; Wikimedia; and Imaginatik, which is developing collaborative spaces to explore ideas.

The Andrew W Mellon Foundation announced its Awards for Technology Collaboration, which "honor not‐for‐profit organizations for leadership in the collaborative development of open source software tools with application to scholarship in the arts and humanities, as well as cultural-heritage not‐for‐profit activities".

Monday, 10 December 2007

CNI task force meeting

The Coalition for Networked Information (CNI) task force meeting is going on this week. Several projects dealing with data are being discussed. One which is particularly relevant to the work of the eResearch team is Data-Cyberinfrastructure Collaboration at the University of California, San Diego:

"At the University of California, San Diego (UCSD), the University Libraries are [...] working collaboratively with the San Diego Supercomputer Center to build an intersect of personnel, expertise, and services to provide long-term preservation of and access to research data that enables domain scientists and researchers to carry-out longitudinal complex data analysis to support interdisciplinary research. This critical partnership is providing new opportunities to the UCSD community and when linked with opportunities being developed for a University of California (UC) system-wide grid service platform, it will truly transform the way discovery and access intersect at UCSD and within the UC system."

JISC podcast with Professor John Wood

JISC's Annual Review features a podcast with Professor John Wood on the work of the Sub-Committee for the Support for Research (JSR).

Professor Wood talks about the current work of the JSR to develop a high level strategy to deliver real results, focusing on fewer, bigger projects rather than many smaller projects. The data deluge is a key concern: the amount of data generated by research is expected to rise almost exponentially. There are implications for institutions, not least the costs involved. Professor Wood described a move from libraries of physical materials to virtual data stores. Some of the areas needing clarification are: getting the middleware right; agreeing approaches to metadata; and linking datasets effectively. Professor Wood is engaged with discussions at an EU level but feels one of the key roles of JSR is to communicate the urgency of the data deluge problem.

Alongside the work of JSR, JISC is engaging with Research Councils on the infrastructure needed to support research. Professor Wood also chairs JISC Scholarly Communications group which is now looking at various media and how these may be linked in a holistic way to support researchers. From an institutional perspective, the impact of JSR (and indeed sometimes JISC) is somewhat hidden from researchers. They will have heard of, maybe even JISCmail but may be unfamiliar with JISC itself.

Regarding the future of JSR, Professor Wood sees a need to focus on larger projects, quoting the examples of the Digital Curation Centre and the National Centre for Text Mining, now starting to show results. It is vital to look at what researchers need, otherwise there is a risk of different groups adopting different approaches. There is also a need to engage on an international level to ensure interoperability, thus enabling international collaboration.

Professor Wood explains the need to look ahead 10 years in order to develop a vision. He outlines 4 issues in particular which JSR must tackle:
  • what sort of middleware should we support as standard?
  • what software development do we need to maximise the infrastructure we have?
  • what are the priorities for tackling data storage and supporting/sustaining repositories?
  • what training is required to enable research communities to understand what is available?

Knowledge Discovery Resources - Marcus Zillman

Just came across this whilst catching up on Bloglines...

Thursday, 6 December 2007

Networking for researchers

Research Information (Dec/Jan) also has a nice article on resources for biomedical researchers, including mention of a life sciences social networking site to be launched in January - it's a prepopulated network, sourced from NIH data, containing over 1.4 million profiles. It'll be an interesting one to watch...

Mandate for access to NIH research

Research Information (Dec 07/Jan 08) carries the story "US Senate approves mandate for access to NIH Research". The mandate will allow NIH to require, rather than request, researchers to make their NIH-funded research outputs publicly available via PubMed Central. Currently, less than 5% is deposited voluntarily.

LIFE - digital lifecycle cost study

LIFE, a collaboration of British Library and University College London funded by JISC, started its second phase this year. The first phase "examined the life cycles of key digital collections at UCL and the British Library and established the individual stages in the cycle. These stages were then costed to show the full financial commitment of collecting digital materials over the long term". The second phase, which will end next August, "aims to refine the LIFE methodology adding four new exemplar case studies to further build upon LIFE1"

Wednesday, 5 December 2007

JISC Information Governance Gateway

JISC Information Governance Gateway now launched - useful resource for institutions

Tuesday, 4 December 2007

Email : outsource or inhouse?

Thanks to ResourceShelf for pointing out this story:
When E-Mail Is Outsourced
This looks at some of the issues institutions are now facing (although US focused) in deciding how to move ahead with email and other services. Microsoft and Google have both marketed to the higher education sector and offer the benefits of integration. But the choice facing institutions is not simple and raises a number of questions relating to:
  • cost/value
  • role of IT services
  • privacy and ownership of data
  • advertising
  • the value of an ".edu" or "" email address
  • capacity to innovate
  • support required
  • ability to influence priorities for development.
Relates to a story on BBC News earlier this year and to the move towards "software as services" mentioned in my posting on 11th October.

Models of sustainability workshop

Yesterday, I went along to the workshop on Models of Sustainability, jointly organised by JISC and the Oxford eResearch Centre. It was an interesting workshop, with a keynote by Malcolm Read followed by presentations from eScience projects. Here are a few of the issues and ideas discussed:
  • Have eScience Programme outputs reached the level of sustainability needed? How long is a reasonable length of time to expect a step change?
  • Can we have generic tools given that research itself is not generic?
  • How much tinkering of tools and software do researchers actually want to do?
  • Although we have a culture of sharing software, there isn't the same culture of sharing data (file sharing does not equal data sharing!). The problem is not technological but cultural
  • In time, will current students bring their social networking skills into research?
  • What can be learned from the Athens to Shibboleth move in terms of running a development programme to inform a production programme?
  • Sustainability = continuing efforts, changing culture, reducing duplication, encouraging sharing and discussion, open collaboration. Must not forget the broader sustainability agenda (e.g. HEFCE shared services programme)
  • The software engineering needs to be sound and built on a solid framework. Academia is perhaps not geared to developing robust software and middleware; funding agencies generally haven't funded software development; career progression reward for those developing software is difficult; staff recruitment and retention issues; sustainability not even on radar screen of many HEIs and most academics
  • One option is the spinoff company - in this instance, it is important to establish trust between company and university. Takes time to get technology to market. The DTI/Technology Strategy Board follow on is a great bridge. Keep the team together as far as possible
  • The team needs a mix of scientific, domain, financial and business experience
  • Sustainability depends on users but need to promote long term view (vision of integrating compute, data and collaboration is not easy for researchers in a hurry with a short term view); new ways of working take researchers out of their comfort zones
  • If you want to continue to innovate, maintaining what you have becomes more difficult – issues of scalability, competition for support. There is a tension between maintaining an infrastructure and innovating
  • Sustainability lessons – work with user community; constantly innovate and deliver; develop modular easy to use software; strong promotion, personal ownership; vision - many new ideas
  • Innovation has 2 strands – new technical capability, new science communities
  • Is there a role for Full Economic Costing in sustainability?
  • Need to get across that software and data are facilities and therefore need managing
  • What is the role of institutions in helping to sustain a project?

Slides will be available from

Friday, 30 November 2007

JISC Annual Review

JISC's Annual Review includes a section on eResearch. Good to see mention of some of my projects: NaCTeM, ASSERT, GEMS2, GEESE, SAW-GEO, SEE-GEO :-) There's also a podcast with John Wood, JSR Chair.

The Argo project

Mentioned on the BCS-KIDDM list this week, the ARGO project is interesting not only for the work it's doing in measuring vital information on our oceans, but in the way it is managing data. The real-time data are freely offered to anyone and can be downloaded from the web. The project involves around 30 countries including the UK (managed by the Met Office, with partners: the National Oceanography Centre, the British Oceanographic Data Centre and the UK Hydrographic Office). UK involvement is jointly funded by Defra, MoD and NERC.

Tuesday, 27 November 2007

Geospatial Knowledge Infrastructures Workshop

I managed to catch some of the Geospatial Knowledge Infrastructures Workshop today, part of the eSI programme and hosted by the Welsh eScience Centre. Here are my quick notes...

Rob Lemmens from the International Institute for Geo-Information Science and Earth Observation talked about end-user tools. He contrasted the centralised approach of corporate/national Spatial Data Infrastructures (SDIs) with the community-driven approach of Web 2.0. SDIs are based on stricter rules for annotation, and their accuracy tends to be higher than that of Web 2.0 tools, although this is changing. Rob outlined the need for a semantic interoperability framework (a combination of ontologies, their relationships and methods for ontology-based description of information sources - data sets, services etc) and a semantic interoperability infrastructure (comprising the framework and the tools to maintain and use it, as well as the information sources produced within it). Rob's presentation also included a slide outlining the characteristics of an ontology, which was a good representation, and a demonstration of ontology visualisation (same tool which ASSERT is using for clustering?). Rob concluded by summarising what the geospatial community can learn and take from Web 2.0, for example tagging/tag clouds, tools for building ontologies (community tagging e.g. Google Image Labeller) and instant feedback (e.g. password strength bars when selecting a new password). On the negative side, community-driven tagging can lead to weak semantics. Rob suggests combining the best of both SDI and Web 2.0 worlds: map the SDI and Web 2.0 ontologies to create dynamic annotations of geo sources, thus improving discovery.

Ulrich Bugel from Fraunhofer Institut IITB presented on ontology based discovery and annotation of resources in geospatial applications. Ulrich talked about the ORCHESTRA project, which aims to design and implement an open service-oriented architecture to improve interoperability in a risk management setting (e.g. how big is the risk of a forest fire in a certain region of the Pyrenees in a given season?). This question has spatial references (cross-border, cross-administration); temporal references (time series and prognostics); thematic reference (forest fire); and conceptual reference (what is risk?). ORCHESTRA will build a service network to address these sorts of question. Interoperability is discussed on 3 levels - syntactic (encodings), structural (schemas, interfaces), semantic (meaning). The project has produced the Reference Model for the ORCHESTRA Architecture (RM-OA), drawing on standards from OGC, OASIS, W3C, ISO 191xx, ISO RM-ODP. The Reference Model went through many iterations, which led to Best Practice status at OGC. The ORCHESTRA Architecture comprises a number of semantic services: the Annotation Service, which automatically generates meta-information from sources and relates it to elements of an ontology; the Ontology Access Service, which enables high-level access and queries to ontologies; the Knowledge Base Service; and the Semantic Catalogue Service.

Ian Holt from Ordnance Survey presented on geospatial semantics research at OS. OS has one of the largest geospatial databases, unsurprisingly, with 400 million features and over 2000 concepts. Benefits of semantics research: quality control, better classification; semantic web enablement, semi-automated data integration, data and product repurposing; data mining - i.e. benefits to OS and to customers. OS has developed a topographic domain ontology which provides a framework for specifying content. It has developed ontologies for hydrology; administrative geography; buildings and places. It is working on addresses; settlements; and land forms. There are supporting modules on mereology, spatial relations and network topology. OS distinguishes a conceptual ontology (knowledge represented in a form understandable by people) from a computational ontology (knowledge represented in a form understandable by computers). A controlled natural language called Rabbit has been developed - structured English, compilable to OWL. OS is also part of the OWL 1.1 task force to develop a controlled natural language syntax. A project is currently underway with Leeds University to develop a plug-in for Protege, which allows natural language descriptions and, in the back end, translates them into an OWL model. The first release is scheduled for December with a further release planned for March 08. Ian also talked about experimental work to semantically describe gazetteers - an RDF version (downloadable?) to represent the data and an OWL ontology to describe the concepts. This work includes administrative regions, and work is underway to include cities etc. Through their work, OS has experienced some problems with RDF - e.g. it may degrade performance (they have >10 billion triples); how much is really needed? Ian described some work on semantic data integration e.g. "find all addresses with a taxable value over £500,000 in Southampton", so looking at how to merge ontologies (i.e. creating another ontology rather than interoperability between the two).
Ian briefly covered some lessons learned - ontologies are never perfect and can't offer complete descriptions of any domain; automatic tools are used as far as possible. Ian also described work on linking ontologies to databases using D2RQ, which maps SPARQL queries to SQL, creating "virtual" RDF. Conclusions: domain experts need to be at the centre of the process; technology transfer is difficult - benefits of semantics in products and applications must be clarified.
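
To make the data integration example a bit more concrete for myself, here is a minimal Python sketch of the "addresses with a taxable value over £500,000 in Southampton" query. It is only an illustration under my own assumptions: the triples, the identifiers (addr:1 etc.) and the predicate names (locatedIn, taxableValueGBP) are all made up, and it uses plain Python tuples rather than RDF, OWL or D2RQ. It is not how OS does it.

```python
# Hypothetical (subject, predicate, object) triples from two separate sources.
# None of this is real OS data; identifiers and predicates are invented.
ADDRESS_TRIPLES = [
    ("addr:1", "locatedIn", "Southampton"),
    ("addr:2", "locatedIn", "Southampton"),
    ("addr:3", "locatedIn", "Leeds"),
]
VALUATION_TRIPLES = [
    ("addr:1", "taxableValueGBP", 650_000),
    ("addr:2", "taxableValueGBP", 320_000),
    ("addr:3", "taxableValueGBP", 900_000),
]


def subjects_where(triples, predicate, test):
    """Return the subjects that have `predicate` with an object passing `test`."""
    return {s for s, p, o in triples if p == predicate and test(o)}


def addresses_over(city, threshold):
    """Merge both sources, then keep addresses in `city` valued above `threshold`."""
    merged = ADDRESS_TRIPLES + VALUATION_TRIPLES
    in_city = subjects_where(merged, "locatedIn", lambda o: o == city)
    over = subjects_where(merged, "taxableValueGBP", lambda o: o > threshold)
    return sorted(in_city & over)


print(addresses_over("Southampton", 500_000))
```

The sketch only works because both sources happen to share the same address identifiers; agreeing on shared concepts like that is presumably where the ontology merging comes in.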

Alun Preece from Cardiff University presented on an ontology-based approach to assigning sensors to tasks. The idea is to bridge the gap between people out in the field needing to make decisions (e.g. disaster management) and the data/information produced from networks of sensors and other sources. Issues tackled: data orchestration (determine, locate, characterise resources required); reactive source deployment (repurpose, move, redeploy resources); push/pull data delivery. The approach is ontology-centric and involves semantic matchmaking. Work so far is on a proof of concept - the SAM (Sensor Assignment for Missions) software prototype and its integration with a sensor network. This work is funded by the US and UK to support military applications - intelligence, surveillance and reconnaissance (ISR) requirements. The work uses ontologies to specify the ISR requirements of a mission (e.g. night surveillance, intruder detection) and to specify the ISR capabilities provided by different asset types. It uses semantic reasoning to compare mission requirements and capabilities and to decide if requirements are satisfied. For example, if a mission requires Unmanned Aerial Vehicles (UAV), the ontology would specify different types of UAV and the requirements of the mission (e.g. high altitude to fly above weather, endurance) and the semantic matchmaking (exact, subsuming, overlapping, disjoint) then leads to a preferred choice. The project has engaged with domain experts to get the information into the ontology and to share conceptualisations. Alun showed the Mission and Means Framework Ontology, a high-level ontology which is fleshed out with more specific concepts.
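
As a note to self on the four matchmaking outcomes, here is a rough Python sketch of how exact / subsuming / overlapping / disjoint might be told apart. It is a simplification under my own assumptions: requirements and capabilities are modelled as flat sets of made-up capability names rather than ontology concepts, "subsuming" is taken to mean the asset offers everything required plus more, and the asset names are hypothetical. It is not the SAM prototype's actual logic.

```python
# Lower rank = better match; the ordering itself is an assumption of this sketch.
RANK = {"exact": 0, "subsuming": 1, "overlapping": 2, "disjoint": 3}


def match_degree(required: frozenset, provided: frozenset) -> str:
    """Classify how an asset's capabilities relate to a mission's requirements."""
    if required == provided:
        return "exact"
    if required < provided:  # strict subset: everything required, plus extras
        return "subsuming"
    if required & provided:  # some, but not all, requirements are covered
        return "overlapping"
    return "disjoint"


def rank_assets(required: frozenset, assets: dict) -> list:
    """Return (asset, degree) pairs ordered from best to worst match."""
    scored = [(name, match_degree(required, caps)) for name, caps in assets.items()]
    return sorted(scored, key=lambda pair: (RANK[pair[1]], pair[0]))


# Hypothetical mission requirements and asset types.
mission = frozenset({"high_altitude", "long_endurance"})
assets = {
    "uav_a": frozenset({"high_altitude", "long_endurance"}),
    "uav_b": frozenset({"high_altitude", "long_endurance", "night_vision"}),
    "uav_c": frozenset({"high_altitude"}),
    "ground_sensor": frozenset({"acoustic"}),
}

for name, degree in rank_assets(mission, assets):
    print(name, degree)
```

In a real ontology the comparison would use reasoning over concept hierarchies (a "UAV" requirement matched by a more specific UAV type, say) rather than simple set membership.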

Slides from the workshop will be uploaded to

Wednesday, 21 November 2007

Comms planning for projects

A really helpful post from Alice Gugan, to help projects manage their communications effectively - thanks Alice!


Have come across this twice this week, both times in a geospatial context. The Geospatial Semantic Web blog describes Gnizr, a new open source application. One of my projects is considering using Gnizr, possibly with its sister application KnowledgeSmarts, which I don't think is open source.

Breaking down digital barriers - report and case studies

Thanks to docuticker for pointing out a report on Breaking Down Digital Barriers. One of the three case studies looks at interoperability issues with mashups:
"Most clearly among our three case studies, the area of Web services demonstrates the manner in which interoperability can stimulate large-scale innovation."

UPGRADE journal for informatics

Just came across a reference to UPGRADE via the BCS. Recent issues cover the semantic web, web services and next generation search. Only had time for a quick look so far but it seems interesting reading.

Friday, 16 November 2007

Chris Date Lecture @ NeSC

The latest NeSC newsletter features a report on the Chris Date Lecture last month. Highlights:

"Bill Pike (Pacific Northwest National Laboratory), in his presentation on integrating knowledge models into the scientific analysis process [...] described the challenge of trying to capture scientific knowledge as it is created, with workflow models that describe the process of discovery. In this way, the knowledge of what was discovered can be connected with the knowledge of how the discovery was made."

"If future generations of scientists are to understand the work of the present, we have to make sure they have access to the processes by which our knowledge is being formed. The big problem is that, if you include all the information about all the people, organisations, tools, resources and situations that feed into a particular piece of knowledge, the sheer quantity of data will rapidly become overwhelming. We need to find ways to filter this knowledge to create sensible structures... "

"One method for explicitly representing knowledge was presented by Alberto Canas (Institute for Human and Machine Cognition). The concept maps that he discussed are less ambiguous than natural language, but not as formal as symbolic logic. Designed to be read by humans, not machines, they have proved useful for finding holes and misconceptions in knowledge, and for understanding how an expert thinks. These maps are composed of concepts joined up by linking phrases to form propositions: the logical structure expressed in these linking phrases is what distinguishes concept maps from similar-looking, but less structured descriptions such as "mind maps". "
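
The "concepts joined up by linking phrases to form propositions" structure can be pictured as simple triples. A minimal Python sketch, using my own made-up class and function names (Proposition, concepts) and example propositions paraphrased from the quote above, not anything presented at the lecture:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """One edge of a concept map: two concepts joined by a linking phrase."""
    source: str
    link: str
    target: str

    def as_sentence(self) -> str:
        """Read the proposition back as a short human-readable statement."""
        return f"{self.source} {self.link} {self.target}"


def concepts(propositions) -> set:
    """Collect every concept mentioned at either end of a proposition."""
    return {p.source for p in propositions} | {p.target for p in propositions}


concept_map = [
    Proposition("concept maps", "are composed of", "concepts"),
    Proposition("concepts", "are joined by", "linking phrases"),
    Proposition("linking phrases", "form", "propositions"),
]

for p in concept_map:
    print(p.as_sentence())
print(sorted(concepts(concept_map)))
```

The explicit linking phrase on every edge is the bit a mind map lacks: each edge can be read back as a statement and checked.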

Skills required to support SOA and integrated services

From Computing 15 November 2007, article "Fitting the skills together" by Jim Mortleman:
"..analyst Gartner predicts that four out of five companies will have taken the SOA route by 2010...SOA involves a fundamental change to the way firms think about IT - namely, as a series of interoperable business services, rather than as discrete IT systems."

The article also quotes Nick Masterton-Jones, IT Director of Vocalink: "I think SCA is something we're going to see a lot more of in the coming three years". SCA is Service Component Architecture, "an open SOA promoted by major Java vendors to bridge the gap between people who understand the business domain and people who understand system design".

Monday, 12 November 2007

Data sharing

Useful links:
OECD Principles (2007):,3343,en_2649_37417_38500791_1_1_1_37417,00.html

RIN's Stewardship of Digital Research Data (2007):

MRC's Guidelines on data sharing:

BBSRC's Guidelines on data sharing:

Plus some interesting outputs from JISC-funded projects:

There's a lot in these links about the wider context; how things look now; barriers to data sharing (e.g. trust, IPR, time); discussion on possible solutions (e.g. social software models, reward/recognition, mandates).

Semantic Web?

New Scientist has an interesting article, "'Semantic' website promises to organise your life", which talks about Twine, currently in beta testing. Twine uses annotation and natural language processing. It'll be interesting to see how it works when it's released. The article mentions other semantic services in development: Powerset, True Knowledge and Freebase.


NeSC's October newsletter features a news item announcing the release of OGSA-DAI 3.0. It includes a clear description of why OGSA-DAI is needed and what it does - one of the simplest explanations I've seen! It also uses SEE-GEO as an example of how OGSA-DAI is being used in the community.

Friday, 9 November 2007

OGF21 - interesting snippets

This time round OGF featured a Semantic Grid theme which I think was either missing or limited last time round as the Manchester event clashed with the WWW conference in Banff. There were a number of presentations on Web2.0 including:

And of course not forgetting the geospatial stuff...


At OGF21, the OGF and OGC signed a Memorandum of Understanding. According to the OGF's Grid Connections Newsletter (Nov 07), the goals of the collaboration will include:
"1.) Integrate OGC's Web Processing Service (WPS) with a range of "back-end" processing environments to enable large-scale processing. The WPS could also be used as a front-end to interface to multiple grid infrastructures, such as TeraGrid, NAREGI, EGEE, and the UK's National Grid Service. This would be an application driver for both grid and data interoperability issues.
2.) Integration of WPS with workflow management tools. OGF’s SAGA draft standard is where multiple WPS calls could be managed.
3.) Integration of OGC Federated Catalogues/Data Repositories with grid data movement tools. OGF’s GridFTP is one possibility that supports secure, third-party transfers that are useful when moving data from a repository to a remote service.
However, the real goal is not just to do science, but to greatly enhance things like operational hurricane forecasting, location-based services, and anything to do with putting data on a map. WPS is just a starting point for the collaboration. As the two organizations engage and build mutual understanding of technical requirements and approaches, many other things will be possible. "

Thursday, 8 November 2007

Agile development

One of my new projects is proposing to use agile development techniques, as opposed to traditional linear project management. This is something that's been talked about by my JISC colleagues before but I haven't looked too deeply into it - so this has been a good opportunity to talk to colleagues about their experiences.

Key issues and lessons

  • Projects should use wikis/websites to enable tracking of work through the development lifecycle

  • Be prepared to adapt templates for project documentation

  • As a Programme Manager, you may need more regular/frequent engagement with projects - the 6-monthly progress report is not going to be sufficient

Useful links (in no particular order)

LIFE Project: model for lifecycle costing of digital objects

Recently posted to JISC-REPOSITORIES:
"The LIFE Project has recently published a revised model for lifecycle costing of digital objects." The project team is looking for comments via the project blog.
More info at:

Kings - Centre for eResearch

Thanks to Ann for posting about this to the JISC-ERESEARCH list:

"King's College London is pleased to announce the establishment of the KCL Centre for e-Research. Based in Information Systems and Services, the Centre will lead on building an e-research environment and data management infrastructure at King's, seeking to harness the potential of IT to enhance research and teaching practice across the College. The Centre also has a remit to make a significant contribution to national, European and international agendas for e-research, and in particular to carry forward in a new context the work of the AHDS across the arts and humanities.
Planning for the new Centre began on 1st October 2007 and a major launch event is planned for Spring 2008. Further information and news about the Centre and its activities will be released over the coming months."

Wednesday, 7 November 2007

Info World Review : interesting news items

Information World Review (Nov 07) features some interesting items...

- a news item, Search and aggregators set to dominate, on the recent Outsell Information Industry Outlook report:
"Watson Healy said 2008 would be 'year of the wiki', with Web 2.0 technology replacing complex portals and knowledge management, and that 'a critical mass of information professionals would take charge of wikis, blogs or other 2.0 technologies on behalf of their organisations'".

- an item, PubMed recasts rules for open access re-use, on the new guidelines recently agreed by the UK PubMed Central Publishers Panel:
"Under the terms of the statement of principles, open access (OA) published articles can be copied and the text data mined for further research, as long as the original author is fully attributed".

Cool mashup: migrating swans

This is a nice example of a mashup, using Google Maps to track the migration of swans from Iceland to UK/Ireland. You can follow their flights on the map (though it's hard to get the right scale) and be kept up to date with RSS feeds.

Tuesday, 6 November 2007

Business and Community Engagement at JISC

There've been several publications from this programme of work recently:

  • User needs study: How JISC could support Business and Community Engagement
  • Evaluation report: JISC Services and the third stream
  • Final report: Study of Customer Relationship Management issues in UK HE institutions
  • Study: The use of publicly-funded infrastructure, services, and intellectual property for BCE
  • Business and Community Engagement: An overview of JISC activities

Friday, 2 November 2007

Google launches OpenSocial
"Google has launched a system that will allow developers to create applications for a variety of social networks."

Latest Ariadne : NaCTeM, repositories and KIDDM

Good to see NaCTeM :-) A good overview of the current services and a run-through of their roadmap:
"NaCTeM's text mining tools and services offer numerous benefits to a wide range of users. These range from considerable reductions in time and effort for finding and linking pertinent information from large scale textual resources, to customised solutions in semantic data analysis and knowledge management. Enhancing metadata is one of the important benefits of deploying text mining services. TM is being used for subject classification, creation of taxonomies, controlled vocabularies, ontology building and Semantic Web activities. As NaCTeM enters into its second phase we are aiming for improved levels of collaboration with Semantic Grid and Digital Library initiatives and contributions to bridging the gap between the library world and the e-Science world through an improved facility for constructing metadata descriptions from textual descriptions via TM."

Other interesting snippets:
  • SURFshare programme covering the research lifecycle

  • a discussion on the use of Google as a repository : "Repositories, libraries and Google complement each other in helping to provide a broad range of services to information seekers. This union begins with an effective advocacy campaign to boost repository content; here it is described, stored and managed; search engines, like Google, can then locate and present items in response to a search request. Relying on Google to provide search and discovery of this hidden material misses out a valuable step, that of making it available in the first instance. That is why university libraries need Google and Google needs university libraries."

  • feedback from ECDL conference, including a workshop on a european repository ecology, featuring a neat diagram showing how presentations are disseminated after a conference using a mix of web2.0, repositories and journals

Monday, 29 October 2007

Data management - learning from commercial sector?

Computing (25 Oct 07) has an article on data management, featuring BAE Systems as one of a series of case studies. BAE estimates "80% of networked employees were wasting an average of 30 minutes a day retrieving information, while 60% were spending an hour or more duplicating the work of others".

The article acknowledges the cultural barriers to using/sharing data and suggests policies are put in place to establish guidelines and principles, as well as training and mentoring to help develop the collaborative and information management skills required.

One of the case studies, Denton Wilde Sapte, cautions "People are so wrapped up in the technical whizz-bangs that they forget that IT is really all about information delivery".

"Organisations are recognising that some pieces of their information have more fundamental value than other parts, although that value might not be realisable today. For certain items of information its maximum value will only be achieved at some point in the future, so companies need to invest in good archiving, storage, search and retrieval systems today" Ian Charlesworth, Ovum, quoted in the article.

NIH funds research data sharing project

"Researchers at ICPSR have been awarded a two-year grant by the National Library of Medicine, National Institutes of Health (NIH) for a project entitled, Barriers and Opportunities for Sharing Research Data. The project will investigate the extent of research data sharing in the social sciences and assess whether research data sharing is related to other aspects of the scientific process including scientific publication."

Rethinking the publication process

Interesting post from Peter Murray-Rust's blog:
Considers preprints, Creative Commons license, using non-copyrighted images, managing citations.

OGC News October 2007

Latest issue of OGC News has a couple of interesting links:

- info on and link to their Spatial Data Quality survey, which will inform the Spatial Data Quality Working Group's attempts to define a framework and grammar for the certification and communication of spatial data quality

- a slideshow demonstrating the use of OGC standards for earth observation.

Wednesday, 24 October 2007

Project Management blogs

"Semantic Web vision: where are we?"

Thanks to Alan Rector for pointing this out:
The Semantic Web Vision: Where Are We?
"The aim of this article is to present a snapshot that can capture key trends in the Semantic Web, such as application domains, tools, systems, languages and techniques being used, and a projection on when organizations will put their full-blown systems into production."

"Democratization of innovation"

Thanks to Bill St Arnaud for pointing this out in CANews - the Economist recently ran an article on democratization of innovation, focusing on the way technologies such as web 2.0 are resulting in a trend away from centralised R&D towards more distributed, participative methods:

JISC Inform

Latest issue features article on visualisation and podcast with Prof Roy Kalawsky:
Also articles on Go-Geo and on open source.

Very topical mashup

LA Fire Dept are using this mashup to show the spread of the wildfires in LA, placing of evacuation centres and other support facilities:


Been meaning to look at this for ages....
Interesting way of sharing data - also they are launching a Private version presumably if you want to be careful who you share with. Some issues re quality tho - e.g. how could you be sure of provenance?

Tuesday, 16 October 2007

Monday, 15 October 2007

Web2.0 reports

Techwatch report

Results and analysis of the Web 2.0 services survey undertaken by the SPIRE project

Guardian Online : Visualisation

ManyEyes is particularly interesting

Also a really good link to an article listing really good visualisation tools, including search - I really like the Visual Thesaurus.

BBC News : Drive advance fuels terabyte era
"A single hard drive with four terabytes of storage (4TB) could be a reality by 2011, thanks to a nanotechnology breakthrough by Japanese firm Hitachi..."

Related story in New Scientist

Friday, 12 October 2007

Gartner's Top 10 strategic technologies for 2008

Thanks to Bill St Arnaud for pointing to this on his blog:
At the Gartner Expo this week, the following were discussed as the top 10 technologies organisations can't afford to ignore...

  1. Green IT
  2. Unified communications (interesting for VRE programme)
  3. Business Process Management (to support SOA)
  4. Metadata management
  5. Virtualisation 2.0
  6. Mashups and composite applications
  7. Web platform and Web-Oriented Architecture
  8. Computing fabrics
  9. Real World Web
  10. Social software

Liz Lyon presentation on data curation

Slides from recent presentation by Liz Lyon to a NERC data management workshop

Thursday, 11 October 2007

Sainsbury Review of Science and Innovation


Computing (11 October 2007): Lisa Kelly "Web 2.0 taps the wisdom of crowds":

"Crowdsourcing is an internet-enabled upgrade of the original focus group concept, according to Dell vice president Bob Pearson". The article goes on to show how several big companies, including L'Oreal, Kimberly Clark and Dell, are using crowdsourcing to develop products.

Wikipedia entry on crowdsourcing

Software on demand

Computing (11 October 2007) this week features a story by Tom Young, Online software is in demand, which talks about new products launched recently, which "are hosted and accessed in real time rather than being installed on in-house systems". Adobe, IBM, Google, Yahoo are all either developing or releasing products, in an attempt to compete with Microsoft's dominance. The software-on-demand model offers a number of benefits around updates, licensing, virus protection, flexibility.

Tuesday, 9 October 2007


Go-Geo has a new look (press release):

Sunday, 7 October 2007


Thanks Frederique for pointing to this - Chris Mackie mentioned this in a meeting earlier in the year but I hadn't followed it up since...

From their website:

"SEASR (Software Environment for the Advancement of Scholarly Research) is being developed by the National Center for Supercomputing Applications in cooperation with the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign.

SEASR aims to:

  • assist scholars in accessing and analyzing existing large information sources more readily and with greater refinement;
  • give scholars increased portability of large information stores for on-demand computing; and
  • empower collaboration among researchers by enhancing and innovating scholarly communities’ and their resources’ virtual research environments.

How will we do it? The SEASR development team will construct software bridges to move information from the unstructured and semi-structured data world to the structured data world by leveraging two well-known research and development frameworks: NCSA’s Data-To-Knowledge (D2K) and IBM’s Unstructured Information Management Architecture (UIMA). SEASR will focus on developing, integrating, deploying, and sustaining a set of reusable and expandable software components and a supporting framework, benefiting a broad set of data-mining applications for scholars in the humanities.

SEASR’s technical goals include supporting:

  • the development of a state-of-the-art software environment for unstructured data management and analysis of digital libraries, repositories and archives, as well as educational platforms; and
  • the continued development, expansion, and maintenance of end-to-end software system: user interfaces, workflow engines, data management, analysis and visualization tools, collaborative tools, and other software integrated into a complete environment."

Internet evolution

"The site for news, analysis and opinion about the future of the Internet"

The Future of Scholarly Communication : workshop report

Report of a joint NSF/JISC workshop:

Some highlights:
  • Access to research : "Success stories, such as TREC for information retrieval research [Voorhees] or the Human Genome Project [HGP], have devoted substantial expertise to creating the necessary infrastructure and managing the datasets with a very clear understanding of how they fit the research practices in their fields."
  • Access to research : "Cyberscholarship needs superdata centers, which combine the storage and organization of vast amounts of data with substantial computing power to analyze it. Building such centers requires investment and long-term commitment on the part of an organization or discipline. While equipment can be purchased, expertise takes longer to establish. Superdata centers and the researchers who use them will need several years before they become truly effective."
  • Value-added services : "As our systems grow more sophisticated, we will see applications that support not just links between authors and papers but relationships between users, data and information repositories, and communities. What is required is a mechanism to support these relationships that leads to information exchange, adaptation and recombination – which, in itself, will constitute a new type of data repository."
  • Also a reference to the need for summarisation on p 10 referencing humanities research in particular.
The CyberScholarship roadmap includes automated metadata generation; provenance establishment; source validation; annotation tools; and contextual semantics.

NSF calls

Sustainable Digital Data Preservation and Access Network Partners (DataNet)

Cyber-Enabled Discovery and Innovation (CDI)

Friday, 5 October 2007

Participative web conference

The blog from the Participative Web conference this week has been a really good read. Richard Ackerman has posted on citizen science - "It is clear that the rapid pace of change is pushing those involved with science infrastructure to think about ways to interact with a broader public, to take advantage of the energy and creativity of the general population, promoting greater understanding of and participation in science."

Richard has also given a summary of a talk from Andrew Herbert from Microsoft - interesting points include: how sensor networks will enable real-world data to be used in simulations; how to get the right balance of skills in the research workforce.

There are also some handy guides to some of the key themes of the conference, including eScience. The blog links to transcripts as well as the webcast, and to some videos (including two relevant to eScience, by Andrew Herbert and Walter Stewart).

GRH makes the news!

Not particularly relevant to eResearch but had to include it..! Gloucestershire Royal Hospital makes the news in New Scientist:
"Brian Witcombe, a radiologist at Gloucestershire Royal NHS Trust received the Ig Nobel prize in medicine for his study of sword swallowing and its side effects."
Ig Nobel home page:

Semantic image retrieval

A story, New search tool gets the picture, describes a new search tool developed by Southampton Uni. It mentions the limitations of the tool, e.g. that it is difficult to expand and may not cope with the variety of images on the web; but also the strengths e.g. dealing with language, producing more discriminating search results.

Thursday, 4 October 2007


This week's Computing (4 Oct) mentions 5 information management technologies to watch out for in the next 3 years:

Wednesday, 3 October 2007

Role of libraries

An interesting post yesterday on Science Library Pad:
It mentions how librarians should accept that some services might be better done through technology or even by other organisations. Instead, they should focus on where they can really add value e.g. managing scientific data, curating digital information like blog posts. The post mentions a recent event organised jointly by University of Washington Libraries and Microsoft, Global Research Library 2020.

Tuesday, 2 October 2007

Tips for conference bloggers

Really useful tips on blogging from a conference


From OGF Grid Connections Oct 07 newsletter:

"In terms of user communities, OGF is pursuing a collaboration with the Open Geospatial Consortium (OGC). OGC has a suite of tools for managing and presenting geospatial data -- anything that goes on a map -- and wants very much to extend their tools with the capability for distributed resource management, i.e., grids. I should also note that there is a Web 2.0 workshop at OGF-21 that covers social networking, semantic grids, and sensors. The fact that half of all Web 2.0 services registered at are geospatially related, and that Google is sending KML through the OGC standardization process, indicates that there is a huge potential for grids in this arena."

OGF21 is later this month - worth seeing what comes out of the following workshops:

Web 2.0 - features presentations on research and commercial applications of Web 2.0 technology including HPC, Cyberinfrastructure, Semantic Research, Social Networking
Geospatial - a collaboration with the OGC, covering topics such as grid-enabling the OGC's Web Processing Service and a NSF proposal on Community-based Data Interoperability Networks
GridNet2 - highlighting the work of the UK eScience at the OGF and in related standards bodies

Monday, 1 October 2007

Blue Ribbon Task Force on Sustainable Digital Preservation and Access

"JISC is supporting an international initiative, led by US-based organisations the National Science Foundation (NSF) and the Andrew W. Mellon Foundation, to address the issue of economic sustainability in digital preservation.
A Task Force to be co-chaired by Fran Berman, director of the San Diego Supercomputer Center at the University of California and a pioneer in data ‘cyberinfrastructure’, and Brian Lavoie, an economist and research scientist with OCLC, will receive support from the Library of Congress, the National Archives and Records Administration and the Council on Library and Information Resources, along with JISC.
The Blue Ribbon Task Force on Sustainable Digital Preservation and Access is expected to meet over the next two years to gather testimony from experts in preparation for the Task Force's Final Report. Though significant progress has been made to overcome the technical challenges of achieving persistent access to digital resources, the economic challenges remain."

Friday, 28 September 2007


Interesting sites:

Mashup Dashboard

Repositories Mashup

Simile Timeline

Grid and Web2.0

Article on Bill St Arnaud's blog on grid and web2.0

NSF call for community-based data interoperability networks

"Digital data are increasingly both the products of research and the starting point for new research and education activities. The ability to re-purpose data – to use it in innovative ways and combinations not envisioned by those who created the data – requires that it be possible to find and understand data of many types and from many sources. Interoperability (the ability of two or more systems or components to exchange information and to use the information that has been exchanged) is fundamental to meeting this requirement. This NSF crosscutting program supports community efforts to provide for broad interoperability through the development of mechanisms such as robust data and metadata conventions, ontologies, and taxonomies."

Guidance on bid writing

From JISC's eLearning Focus pages :

Unbundling Windows

From BCS News -
"PCs sold in the European Union (EU) should not come with an operating system already installed on them, according to a new report. The publication created by the Globalisation Institute and submitted to the European commission (EC) suggests that it is not in the interests of consumers to keep selling systems that are bundled with Windows."

eBank report

Thanks to the UKOLN newsfeed for pointing out a new report from the eBank project - A Study of Curation and Preservation issues in the eCrystals Data Repository and proposed federation. Now in its 3rd phase, eBank is exploring preservation, curation and sustainability issues, with a view to progressing a federation of repositories in crystallography, thus ensuring data remains usable (and reusable by others).

Some of the specific issues considered in the report are:
  • Audit and certification: including a brief overview of recent work and current instruments. The recommendation is to use the lightweight version of the DRAMBORA toolkit due this month, as part of an annual self-audit cycle. It is acknowledged that short-term staff contracts and funding cycles have a detrimental effect on the organisational aspects which need to be in place to ensure sustainability. A further recommendation is to explore LOCKSS or CLOCKSS "to engage the crystallography community in the preservation of its valuable data"
  • Open Archival Information System (OAIS) standard: the report recommends that eBank develop a formal deposit, ingest, validation and dissemination policy, and that work on Representation Information look beyond the eCrystals repository to the whole crystallography domain
  • Metadata: The report recommends further exploration of provenance information as currently versioning is the only type of information stored; and also how preservation metadata can be generated, extracted and maintained automatically

Thursday, 27 September 2007

CNI news

News from the CNI points to the following interesting items this week :

CODATA Data Science journal - "Open Data for Global Science"

CT Watch Aug 07 - "The Coming Revolution in Scholarly Communications and Cyberinfrastructure"

Tuesday, 25 September 2007

UUK stats

Universities UK have published Patterns of higher education institutions in the UK: Seventh report which includes some useful figures. According to the report, in 2005/06, £3,120,606,000 was received by UK institutions via research grants and contracts, out of a total of £19,503,112,000.
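To put those two figures in proportion, here is a minimal sketch in Python that works out the share. It uses only the two amounts quoted above; the variable names are my own and nothing else is taken from the report.

```python
# Figures quoted above from the UUK report (2005/06), in pounds.
research_grants_and_contracts = 3_120_606_000
total = 19_503_112_000

# Share of the quoted total that came via research grants and contracts.
share = research_grants_and_contracts / total
print(f"Research grants and contracts: {share:.1%} of the total")
```

So research grants and contracts come to roughly 16% of the total quoted in the report.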

Also useful is their Research & Innovation Facts and Figures which includes income by subject area (the clear leader is clinical medicine) and trends in government expenditure on R&D

International Journal of Digital Earth

Just heard from the All Hands organisers about the forthcoming International Journal of Digital Earth to be launched next year by Taylor and Francis

D-Lib articles on libraries and cyberinfrastructure

In Sept/Oct 07 issue, both by Anna Gold...

Cyberinfrastructure, Data, and Libraries, Part 1 : A Cyberinfrastructure Primer for Librarians

Cyberinfrastructure, Data, and Libraries, Part 2 : Libraries and the Data Challenge: Roles and Actions for Libraries


Posted on the Geospatial Semantic Web blog yesterday, a post about Intel's Mash Maker.

I really like Gartner's Hype Cycle of Emerging Technologies quoted on the page. A quick search turned up a 2007 version but not available for free :-(

Added 15/10/07: Yahoo also have a mashup service, MapMixer.

Promoting IT as a problem-solving profession

The Sept 07 issue of IT NOW (from the British Computer Society) includes a feature on the future of the IT profession and how, given the importance of IT in all that we do, it needs to be promoted as a "problem-solving" profession right from school-age. Currently, public perception of IT is skewed towards stereotypes, and this isn't helped by the focus in schools on teaching applications (e.g. Word) rather than on the contribution IT can make to wider society. The feature gives the example of NHS CFH which, if it works, will change how the NHS operates.

Friday, 21 September 2007

The Long Tail

Can't remember how I came across this link but I think it was in someone else's blog, referring to Chris Anderson's forthcoming new book (called "Free" I think but could be wrong!).

Anyway, Random House have included excerpts of The Long Tail on the web here.

Thursday, 20 September 2007

Digital preservation - shared service across Whitehall

Also in Information World Review, a feature on the 3 year scoping exercise led by National Archives to devise a pan-government service to take on the task of digital preservation across Whitehall.

EU-funded news aggregator

Information World Review (Sept 07) includes a review of the European Media Monitor, which "works by compiling summaries of stories from across the web, which are clustered together and ranked depending on the number of articles that have been produced for a particular topic or language" [review by Daniel Griffin].
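The ranking idea in that quote can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the sample articles and the rank_topics function are hypothetical, the articles are assumed to arrive already labelled with a topic, and the review does not describe how the real service forms its clusters.

```python
from collections import Counter

# Hypothetical sample data: (topic, language) pairs standing in for
# articles that have already been clustered by topic.
articles = [
    ("climate", "en"),
    ("climate", "fr"),
    ("elections", "en"),
    ("climate", "de"),
    ("elections", "fr"),
    ("transport", "en"),
]

def rank_topics(items):
    """Return (topic, article_count) pairs, largest cluster first."""
    counts = Counter(topic for topic, _language in items)
    return counts.most_common()

ranking = rank_topics(articles)
for topic, count in ranking:
    print(f"{topic}: {count} article(s)")
```

Counting on the language field instead of the topic would give the per-language ranking the review also mentions.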

Wednesday, 19 September 2007

Recent articles

Microsoft's Office Open XML not accepted as standard

Microsoft's bid for 'open' document standard is rebuffed
Article in International Herald Tribune about Microsoft's failed attempt to get their open document format, Office Open XML, recognised as an international standard.

OS MasterMap press release

Press release
OS MasterMap goes online for universities and colleges across Britain
Tens of thousands of students, staff and researchers at universities and further education colleges across Britain have online access to the country’s most advanced digital mapping from this month....

Report from BCS KIDDM Mash-Up

On Monday, I went along to the BCS KIDDM Knowledge Mash-up - only stayed for the morning and was a bit disappointed that the day wasn't as interactive as the title suggested. The talks in the morning were quite high level too but it was interesting. Came across the BCS Geospatial Group for the first time.

Peter Murray has written up some of the day's presentations on his blog.

Conrad Taylor, introducing the day, covered issues around mark-up and tagging, referring to the difficulties of marking up audio/video and unstructured text; time constraints; and difficulties of subject classification.

Tony Rose talked about information retrieval and some of the innovative approaches out there:
  • semantic searching - as demonstrated by hakia and lexxe

  • natural language processing - as demonstrated by powerset and lexxe

  • disambiguation - as demonstrated by quintura and ask

  • assigning value to documents - as demonstrated by google

He sees the future of search as addressing the following:
  • rich media search
  • multi/cross lingual search
  • vertical search
  • search agents
  • specialised content search
  • human UI
  • social search
  • answer engines
  • personalisation
  • mobile search

Tom Khazaba from SPSS talked about their products for text and data mining and the various applications they're used for (CRM, risk analysis, crime prevention etc). He stressed that the results of text analysis have to be fitted into business processes and mentioned briefly how Credit Suisse have achieved this. He listed the keys to success for text/data mining solutions:
  • ease of use
  • supports the whole process
  • comprehensive toolkit - ie features visualisation, modelling etc so all you need is in one place
  • openness - using existing infrastructure
  • performance and scalability
  • flexible deployment
Ian Herbert, from the Health Informatics SG, talked about the recent work on SNOMED-CT and its application in NHS CFH. SNOMED-CT will allow pre-coordinate and post-coordinate searching. The main challenge has been in capturing the depth of clinical expression. Concepts have qualifiers e.g. pain has a qualifier indicating severity. There has been some work mapping to MeSH although Ian seemed to think this wasn't complete. The key challenge facing the team now is rolling out - there are few examples of its use in a real-time environment. It remains to be seen if health professionals will take well to using it during consultations - it is quite a complex system and as Ian admits "users want the biggest bang for their keystroke buck".

Dan Rickman introduced geospatial information systems. He referred to the importance of metadata and ontologies for handling the large volumes of unstructured data. In geospatial information, there is also a temporal aspect as many applications will view an area over time. He mentioned OS' work on a Digital National Framework which has several principles:
  • capture information at the highest resolution possible
  • capture information once and use many times
  • use existing proven standards etc
Dan also mentioned issues around object-based modelling. The OS has developed TOpographical IDentifiers (TOIDs) to identify every feature in Britain. He also mentioned the Basic Land and Property Unit (BLPU) which would be used to describe, for example, a hospital (which may have many buildings, "owned" by different organisations). He also talked about neogeography which has arisen from the web2.0 explosion.

Sunday, 16 September 2007