Friday, 28 March 2008

Projects addressing issues around research data

Yesterday, we had a meeting here at JISC to bring together current projects working in the field of research data. There's a lot happening and it's going to be really interesting to see what comes out of these studies.

I already mentioned an article earlier this year in Inform. Of course, much of the work stems from Liz Lyon's report from last year, Dealing with Data (see earlier post).

Tuesday, 18 March 2008

NGS - case study and ENGAGE project

The latest NGS newsletter features:

- article on ENGAGE: "Eleven groups, with research interests that include Oceanography, Biology and Chemistry, have already been interviewed. The results of the interviews will be reviewed during ENGAGE’s second phase. This phase will identify and publicise the ‘big issues’ that are hindering e-Research adoption and the ‘big wins’ that could help it. Solutions to some of the big issues will be developed and made freely available so that the entire research community will benefit. The solutions may involve the development of new software, which will make use of OMII-UK’s expertise, or may simply require the provision of more information and training. Any software that is developed will be deployed and evaluated by the community on the NGS."

- case study of using NGS in Integrative Biology to understand defibrillation of the heart: "“Using the NGS does not give time improvements when you are using sequential code, but it does give definite performance improvements.” says Dr Rodriquez. “Once you get started, using the NGS is very easy to use.”"

Building Effective Virtual Organisations

Webcasts from a recent NSF event:

DCC Curation Lifecycle Model

This recently went to consultation - not sure when the results of the consultation will come out or how much the model will change as a result. But in the meantime, I want to keep track of the links:

model :
background info :

"Innovation Nation" white paper

Thanks to James for pointing this out:

"Innovation Nation sets out the Government's aim to make the UK the best place in the world to run an innovative business or public service. It argues that innovation is essential to the UK's future prosperity and the ability to tackle major challenges like climate change.

The paper considers how Government and society respond to changes in innovation across the public, private and third sectors. Other key themes are further supporting innovative businesses and research; increasing exchanges of knowledge; boosting the supply of skilled people; supporting innovative towns and regions and promoting innovation in the public sector.

Headline commitments include [...]:
* Doubling the number of Knowledge Transfer Partnerships between businesses, universities and colleges to boost competitiveness and productivity alongside a greater exchange of innovation expertise between the private sector and Government led by DIUS and the TSB;
* Piloting of a new Specialisation and Innovation Fund to boost the capacity of further education colleges to unlock workforce talent and to support businesses in raising innovation potential;
* Expanding the network of National Skills Academies with one academy for every major sector of the economy;
* Sponsoring new Partnerships for Innovation bringing together venture capital with universities, business and other local partners to jointly develop innovative solutions to local and regional challenges. DIUS will publish a prospectus in the autumn;
* Establishing an Innovation Research Centre in partnership with the Economic and Social Research Council (ESRC), NESTA and the TSB;
* A new Annual Innovation Review to provide a comprehensive annual assessment of promoting innovation in the public and private sectors. The first of these will be published this autumn."

OGC and OASIS collaborating on standards

Press release from OGC about their collaboration with OASIS:
"The groups point to Web services as a key area of their cooperation. With the existing OGC Web Services (OWS) standards, most of the standards needed to publish, discover and use Web-resident geospatial data and services on the Web are in place. However, OWS must work in concert with other Web services standards. That's why OGC members approved the ebRIM (electronic business Registry Information Model) OASIS Standard as the preferred cataloging meta-model foundation for future application profiles of the OpenGIS® Catalog Service Web (CS-W) Standard."

Monday, 17 March 2008

New guide to geospatial resources in humanities

AHESSC has produced a very readable guide to geospatial resources and services in the humanities:

Semantic web - various

Some discussion recently about the vision of the semantic web...

Tim Berners-Lee features in The Times Online, talking about the potential of the semantic web.

Discussion on the BCS-KIDDM list referred to an earlier talk by Prof Ian Horrocks, which is a very readable intro to the concepts behind the semantic web and also refers to Manchester's work with Protege, including the pizza demo.

Yahoo have also been talking about the semantic web, in particular its application in web searching:
"In the coming weeks, we'll be releasing more detailed specifications that will describe our support of semantic web standards. Initially, we plan to support a number of microformats, including hCard, hCalendar, hReview, hAtom, and XFN. Yahoo! Search will work with the web community to evolve the vocabulary framework for embedding structured data. For starters, we plan to support vocabulary components from Dublin Core, Creative Commons, FOAF, GeoRSS, MediaRSS, and others based on feedback. And, we will support RDFa and eRDF markup to embed these into existing HTML pages. Finally, we are announcing support for the OpenSearch specification, with extensions for structured queries to deep web data sources."

Thursday, 13 March 2008

NGS and data storage

The March NeSC newsletter features a short item on NGS focusing on storage, referring to their work with SDSS:

"One example of where the NGS is assisting with large data sets is the Sloan Digital Sky Project (SDSS). It already uses the NGS in order to simultaneously access two large databases containing images of nearly 300 million celestial objects. The relational data bases in the UK and US hold over 100 parameters for each object therefore difficulties with storage and access were inevitable. Helen Xiang at the University of Portsmouth has been using the Oracle databases hosted on the NGS to store the data and recently succeeded in transferring almost 2 Terabytes of SDSS data to the NGS Oracle database in Manchester. A separate Microsoft SQL database at Portsmouth holds another 2 Terabytes of similar data and joint queries on the two databases have been successfully run. Not only did the SDSS solve the problems of data storage but they also solved the problem of a large number of users being able to access the data from wherever they were based."

Commercial Research and Higher Education Institutions

From the JISC IPR Consultancy newsletter:

"Her Majesty's Revenue's and Customs and the Charities Commission's lawyers have issued an opinion that when the purpose of a particular piece of research carried out in a University is to enhance the profitability of a third party, then in order to avoid breaking the law, that research must be run by a spin off company of the HEI. The spin off company can then covenant its profits to the parent HEI. However, if the "commercial" research is not undertaken through the spin out company, then the HEI is breaking the Charities Act. On the other hand, if the work can be shown primarily to further the HEI, and benefits to any third party can be shown to be incidental, or only a very small sum is involved, there isn't a problem. The wording the lawyers have used was "Charity trustees would be in breach of their trusts if they decided to carry out activities that were supposed to directly further the charity's objects, but which resulted in private benefits that were not legitimately incidental. If the commercial advantage was one of the main purposes of the arrangement, then the commercial research would have to be undertaken as a taxable 'non primary purpose' trading.....In case of doubt, HMRC recommends carrying out the research through a trading company." The Charities Commission will shortly be issuing advice on this matter on its web site. "

Google and data storage

Back in January, Wired reported on Google's plans for open access to research data:

"Two planned datasets are all 120 terabytes of Hubble Space Telescope data and the images from the Archimedes Palimpsest, the 10th century manuscript that inspired the Google dataset storage project"

It also refers to an earlier article by Thomas Goetz on freeing dark data, i.e. negative results, to get around the publication bias problem.


"The UK Data Archive (UKDA) is launching UKDA-store, a new research output management tool, later this year. Used to submit data deposits into the UK Data Archive, UKDA-store is to be initially released to the social science research community with the intention of extending the system to other researchers. UKDA-store will enable researchers to submit a range of digital outputs to the self-archiving repository with the right to set permissions for individual and group access, so that data can remain private (on embargo) although metadata continues to be searchable. Furthermore, data that is judged to meet the UKDA’s acquisition criteria can be formally lodged for long-term central system preservation within the UK Data Archive. [...]
UKDA-store will be formally launched at the National Centre for Research Methods Festival on 30 June 2008 in Oxford."

Cloud tools used in NY Times digitisation

A short blog item (pointed out by Bill St Arnaud) on the use of Yahoo's Hadoop and Amazon's S3 and EC2 by the New York Times - cheaper infrastructure costs but the comments seem to suggest higher or equal coding costs. Interesting.

NSF partnership with Google and IBM

The Cluster Exploratory (CluE) relationship will enable the academic research community to conduct experiments and test new theories and ideas using a large-scale, massively distributed computing cluster. NSF anticipates being able to support 10 to 15 research projects in the first year of the program, and will likely expand the number of projects in the future.

Back on my blog!

Well, I haven't been doing too well at keeping up with blogging lately - a combination of being poorly, on leave and just plain busy - but hopefully I'm getting back on track now...

Wednesday, 5 March 2008

myExperiment in Nature

Thanks to Judy for pointing this out - myExperiment gets a mention in Nature. Shame it doesn't mention it's funded by JISC but hey, we can't have everything!

"[myExperiment], funded by the UK government, lets users share workflows: the customary protocols for standardizing data, running simulations or conducting statistical analysis on large data sets. Standardized protocols for manipulating large data sets can be tweaked for specific purposes. Users can comment on their usefulness and link to other work-flows of interest. Bioinformaticians and geneticists are among those who stand to benefit most. For example, sharing a workflow for identifying biological pathways implicated in Trypanosomiasis resistance in cattle allowed another investigator to find pathways involved in sex dependence in the mouse model, says myExperiment project leader David De Roure, a computer scientist at the University of Southampton, UK. Done independently, this type of study could take two years. Such streamlining allows scientists to focus on discovery rather than drudgery, he says."