Wednesday, 30 July 2008

"Semantic Medline"

Interesting story in Information Today...

Cognition launches Semantic Medline

"...enables complex health and life science material to be rapidly and efficiently discovered with greater precision and completeness using natural language processing (NLP) technology"

I tried a quick search for "exercise and depression" just to see it working. Results are mostly relevant on the first couple of pages. It does let you select the correct meaning, e.g. of depression (feeling of sadness/hopelessness), but it still seems to bring up records referring to other meanings (e.g. ST segment depression), although I guess it's impossible to avoid that. The definitions might also be more useful if sourced from a medical dictionary, which they don't appear to be. It would be interesting to compare results using MeSH.

Given that my search retrieved over 7000 results, it would also be useful to have some options for narrowing the search, for example by suggesting additional search terms (are you interested in a particular population, e.g. postnatal?)

Monday, 28 July 2008

Mobile web

From BBC: Mobile web reaches critical mass

"The mobile web has reached a "critical mass" of users this year, according to a report by analysts Nielsen Mobile.

The US is the most tech savvy nation with nearly 40 million Americans - 16% of all US mobile users - using their handset to browse on the move.

The UK and then Italy come a close second and third in the 16 countries surveyed by the analyst firm.


'PC internet users visit more than 100 domains per month, on average,' the report said.

'By contrast, the average mobile internet user in the US visited 6.4 individual websites per month.' UK use was slightly less at 5.5 per month."

Clearly, this has implications for how to deliver content effectively ... could be a good way of delivering alerts, prompts, small chunks of quality content, bitesize e-learning...

Friday, 25 July 2008

Open Web Foundation

"an organization that will help the creation and acceptance of Open Web"

"The Open Web Foundation's goal it to provide a home for community created specs. with mentorship, resources and infrastructure. Hopefully this will help teams spend time on making the spec."

ps Thanks to Ian for pointing this out

Thursday, 24 July 2008

More..various news

  • Google launched Knol this week, taking on Wikipedia although it does take a different approach, making authors more visible than on Wikipedia, with more emphasis on authority and reputation. Individuals can contribute but I'm not clear how contributions are validated - it recommends contributors write a bio to establish credentials and you can set permissions for others to edit your "knol" - but essentially it seems to be up to the reader to judge based on the writer's credentials. It also lets writers select IPR options, defaulting to Creative Commons. A lot of the knols there now relate to health so I'd be interested to know more about their quality framework.
  • Steve Prentice from Gartner tells the BBC that the days of interacting with your computer via your mouse are numbered
  • New Scientist reports "UK to get superfast broadband by 2012" (speeds of up to 100 megabits per second)
  • CILIP Gazette 11-24 July includes a feature on the latest TFPL Connect event, exploring implications of a recent CMI report on the world of work in 2018. Delegates discussed the move towards portfolio working; the role of knowledge managers; flexible working; increasing emphasis on "alliance-building", strategic planning and political skills.
  • Central Office for Information releases guidelines on inclusion for public sector websites
  • Interesting article reporting on James Evans' research in Science, "Great minds think (too much) alike", suggesting that access to more journal literature is actually resulting in fewer citations
  • Article in Times Higher reporting on the suggestion by Bahram Bekhradnia, director of the Higher Education Policy Institute, that HEFCE's new Research Excellence Framework should be based on peer review, not solely data metrics
  • IWR reports: Nearly £10 million has been awarded to preserve low use journals for those in UK Higher Education. The new initiative, UK Research Reserve (UKRR) aims to improve access to the journal information for researchers as well as better preserve the body of work.

Tuesday, 22 July 2008

More bits and pieces of news and stuff

Friday, 18 July 2008

Various news

I'm starting to catch up with reading - here's some of the news to hit recently (ish!):
  • Microsoft buys up Powerset, in its attempt to take on Google
  • HEFCE announces 22 pilot institutions to test the new REF
  • NHS Choices selects Capita as preferred bidder
  • Google is experimenting with a Digg-like interface
  • Amazon S3 experienced a service outage on 20 July - one of the risks of relying on the cloud, I guess
  • Encyclopaedia Britannica goes wiki
  • Proquest to acquire Dialog business from Thomson Reuters
Some interesting articles came my way too...
  • "Information: lifeblood or pollution?" has some interesting thoughts about when information has value and when there is so much information that it loses its value. Jakob Nielsen is quoted: 'Information pollution is information overload taken to the extreme. It is where it stops being a burden and becomes an impediment to your ability to get your work done.' Possible solutions are rating the integrity of information and clearer provenance.
  • "International initiative licenses resources across 4 European countries", about a deal negotiated via the Knowledge Exchange with Multi-Science, ALPSP, BioOne, ScientificWorldJournal, and Wiley-Blackwell.
  • A fun way of describing the amount of data Google handles

Thursday, 17 July 2008

JISC Innovation Forum

Earlier this week, the JISC Innovation Forum took place, with the aim of bringing together projects and programmes to discuss cross-cutting themes and share experiences. I attended the theme on research data, which had 3 sessions in all, each focusing on a different aspect:

Session 1 - Legal and policy issues
This session followed the format of a debate, with Prof Charles Oppenheim arguing for the motion that institutions retain IPR and Mags McGinley arguing that IPR should be waived (with the disclaimer that neither presenter was necessarily representing their personal or institution's views).

Charles argued that institutional ownership encourages data sharing. Curation should be done by those with the necessary skills. It involves copying and can only be done effectively where the curator knows they are not infringing copyright, so the IPR needs to be owned "nearby". He also explained how publishers are developing an interest in raw data repositories and wish to own the IPR on raw as well as published data. There is a real need to discourage authors from blindly handing over the IPR on raw data. He suggested a model where the author is licensed to use and manipulate data (e.g. deposit in repository) and retains the right to intervene should they feel their reputation is under threat. The main argument focused on preventing unthinking assignment of rights to commercial publishers.

Mags suggested that curation is best done when no-one asserts IPR. There may in fact be no IPR to assert and she explained that there is often over-assertion of rights. There is in general a lot of confusion and uncertainty around IPR, which leads to poor curation. Mags suggested the only way to prevent this confusion is to waive IPR altogether. Data is now more than ever the result of collaboration, relying on multiple (and often international) sources of data, so unravelling the rights can be very difficult: there could be many, even hundreds, of owners across many jurisdictions. Mags concluded with the argument that it is easier to share data which is unencumbered by IPR issues and quoted the examples of Science Commons and CC0.

A vote at this point resulted in: 5 for the motion supporting institutional ownership; 10 against; 7 abstaining.

A lively discussion followed - here are the highlights:
  • it's important to resolve IPR issues early
  • NERC model - researchers own IPR and NERC licenses it (grant T&Cs)
  • in order to waive your right, you have to assert it first
  • curation is more than just preservation - the whole point is reuse
  • funders have a greater interest in reuse than individual researchers - also have the resources to develop skills and negotiate T&Cs/contracts
  • not just a question of rights but responsibilities too
  • issues of long-term sustainability e.g. AHDS closure
  • incentives to curate - is attribution enough?
  • what is data? covered range of data including primary data collected by researcher, derived data, published results
  • are disciplines too different?
  • duty to place publicly funded research in the public domain? use of embargoes?
  • can we rely on researchers and institutions to curate?
  • "value" of data?
  • curation doesn't necessarily follow ownership - may outsource
  • proposal to change EU law on reuse of publicly funded research - HE now exempt - focuses on ability to commercially exploit - HEIs may have to hand over research data??
And finally, we voted again: this time, 6 for the motion; 14 against; 3 abstaining.

Session 2 - Capacity and skills issues
This session looked at 4 questions:
  1. What are the current data management skills deficits and capacity building possibilities?
  2. What are the longer term requirements and implications for the research community?
  3. What is the value of and possibilities for accrediting data management training programmes?
  4. How might formal education for data management be progressed?
Highlights of discussion:
  • who are we trying to train? How do we reach them? The need for training has to appear on their "radar" - best way to reach researchers is via lab, Vice-Chancellor, Head of School or funding source.
  • training should be badged e.g. "NERC data management training"
  • "JISC" and "DCC" less meaningful to researchers
  • a need to raise awareness of the problem first
  • domain specific vs generic training
  • need to target postgrads and even undergrads to embed good practice early on
  • need to cover entire research lifecycle in training materials
  • how is info literacy delivered in institutions now? can we use this as a vehicle for raising awareness or making early steps?
  • School of Chemistry in Southampton has accredited courses which postgrads must complete - these include an element of data management
  • lack of a career path for "data scientists" is a problem
  • employers increasingly looking for Masters graduates as perceived to be better at info handling
  • new generation of students - have a sharing ethic (web2.0) but not necessarily a sense of structured data management
  • small JISC-funded study to start soon on benefits of data management/sharing
  • can we tap into records management training? a role here for InfoNet?
  • can we learn from museums sector? libraries sector?
  • Centre for eResearch at King's are developing a "Digital Asset Management" course, to run Autumn 09
  • UK Council of Research Repositories has a resource of job descriptions
  • role of data curators in knowledge transfer - amassing an evidence base for commercial exploitation
  • also a need for marketing data resources

Session 3 - Technical and infrastructure issues

This session explored the following questions:

  • what are the main infrastructure challenges in your area?
  • who is addressing them?
  • why are these bodies involved? might others do better?
  • what should be prioritised over the next 5 years?
One of the drivers for addressing technical and infrastructure issues is the sheer volume of data: instruments are generating more and more data, and the volume is growing exponentially. It must be remembered that this isn't just a problem for big science. Small datasets need to be managed too, although the problem here is more to do with the variety of data (heterogeneous) than volume. It was argued that big science has always had the problem of too much data and has to plan experiments to deal with this, e.g. the LHC at CERN disposes of a large percentage of data collected during experiments. In some areas, e.g. geospatial, data standards have emerged, but it may be a while before other areas develop their own or until existing standards become de facto standards.

Other areas touched on included:
  • the role of the academic and research library
  • roles and responsibilities for data curation
  • how can we anticipate which data will be useful in the future?
  • What is ‘just the right amount of effort’?
  • What are the selection criteria – what value might this data have in the future (who owns it, who’s going to pay for it), how much effort and money would it take to regenerate this data (e.g. do you have the equipment and skills to replicate it?)
  • not all disciplines are the same therefore one size doesn't fit all
  • what should be kept? data, methodology, workflow, protocol, background info on researcher? How much context is needed?
  • how much of this context metadata can be sourced directly e.g. from proposal?
  • issues of ownership determine what is stored and how
  • what is the purpose of retaining data - reuse or long-term storage? Should a nearline/offline storage model be used? Infrastructure for reuse may be different from that for long-term storage?
  • Should we be supporting publication of open notebook science? (and publishing of failed experiments). What about reuse/sharing if there are commercial gains?
The summing up at the end identified 4 main priority areas for JISC:
  1. within a research environment – can we facilitate data curation using the carrot of sharing systems? (IT systems in the lab)
  2. additional context beyond the metadata
  3. how do we help institutions understand their infrastructural needs
  4. what has to happen with the various dataset systems (Fedora etc.) to help them link with the library and institutional systems

Tuesday, 8 July 2008

Data, Information, Knowledge, Wisdom

I was pointed to an article by Martin Fricke (by the BCS KIDDM list) which argues that the Data-Information-Knowledge-Wisdom hierarchy is methodologically unsound. It makes for fairly heavy reading at times but has some interesting discussion about the strength of data/information/knowledge in relation to "truth":

"Information is both more extensive than data and many instances of it are logically stronger than data. Information is irreducible to data. [...] This makes knowledge and information synonymous. Knowledge and information collapse into each other"

"And the wise person must not only have wide appropriate knowledge, but they must act in accordance with the knowledge they have."

The article also mentions evidence, but in a different context to the "evidence-based practice" use - this is more related to knowledge (some discussion of whether this means "know-that" or "know-how") and wisdom.

Wednesday, 2 July 2008

Agile documentation

Interesting post on the BCS blog on agile management led me to the concept of TAGRI - They Aren't Gonna Read It.

Some interesting thoughts on how documentation should be produced - working with the customer, to provide communication not documentation for documentation's sake, writing to a good enough standard.

Some of the questions posed could apply to anyone writing any kind of documentation. Interestingly, they don't advise using templates as each system is different so will require different documentation - the thinking is that the template is resource-intensive to create; the template will ask for detail which isn't always relevant but people will attempt to write something; thus reviews take longer because there's so much more information to read through.