Showing posts with label events. Show all posts

Thursday, 17 July 2008

JISC Innovation Forum

Earlier this week, the JISC Innovation Forum took place, with the aim of bringing together projects and programmes to discuss cross-cutting themes and share experiences. I attended the theme on research data - 3 sessions in all, each focusing on a different aspect:

Session 1 - Legal and policy issues
This session took the format of a debate, with Prof Charles Oppenheim arguing for the motion that institutions should retain IPR and Mags McGinley arguing that IPR should be waived (with the disclaimer that neither presenter was necessarily representing their own or their institution's views).

Charles argued that institutional ownership encourages data sharing. Curation should be done by those with the necessary skills - curation involves copying and can only be done effectively where the curator knows they are not infringing copyright, so the IPR needs to be owned "nearby". He also explained how publishers are developing an interest in raw data repositories and wish to own the IPR on raw as well as published data. There is a real need to discourage authors from blindly handing over the IPR on raw data. He suggested a model where the author retains a licence to use and manipulate the data (e.g. deposit it in a repository) and the right to intervene should they feel their reputation is under threat. The main argument focused on preventing unthinking assignment of rights to commercial publishers.

Mags suggested that curation is best done when no-one asserts IPR. There may in fact be no IPR to assert, and she explained that there is often over-assertion of rights. There is in general a lot of confusion and uncertainty around IPR, which leads to poor curation - Mags suggested the only way to prevent this confusion is to waive IPR altogether. Data is now, more than ever, the result of collaboration relying on multiple (and often international) sources, so unravelling the rights can be very difficult - there could be many owners, even hundreds, across many jurisdictions. Mags concluded with the argument that it is easier to share data which is unencumbered by IPR issues and cited the examples of Science Commons and CC0.

A vote at this point resulted in: 5 for the motion supporting institutional ownership; 10 against; 7 abstaining.

A lively discussion followed - here are the highlights:
  • it's important to resolve IPR issues early
  • NERC model - researchers own IPR and NERC licenses it (grant T&Cs)
  • in order to waive your right, you have to assert it first
  • curation is more than just preservation - the whole point is reuse
  • funders have a greater interest in reuse than individual researchers - also have the resources to develop skills and negotiate T&Cs/contracts
  • not just a question of rights but responsibilities too
  • issues of long-term sustainability e.g. AHDS closure
  • incentives to curate - is attribution enough?
  • what is data? covered range of data including primary data collected by researcher, derived data, published results
  • are disciplines too different?
  • duty to place publicly funded research in the public domain? use of embargoes?
  • can we rely on researchers and institutions to curate?
  • "value" of data?
  • curation doesn't necessarily follow ownership - may outsource
  • proposal to change EU law on reuse of publicly funded research - HE now exempt - focuses on ability to commercially exploit - HEIs may have to hand over research data??
And finally, we voted again: this time, 6 for the motion; 14 against; 3 abstaining.

Session 2 - Capacity and skills issues
This session looked at 4 questions:
  1. What are the current data management skills deficits and capacity building possibilities?
  2. What are the longer term requirements and implications for the research community?
  3. What is the value of and possibilities for accrediting data management training programmes?
  4. How might formal education for data management be progressed?
Highlights of discussion:
  • who are we trying to train? How do we reach them? The need for training has to appear on their "radar" - the best way to reach researchers is via the lab, Vice-Chancellor, Head of School or funding source.
  • training should be badged e.g. "NERC data management training"
  • "JISC" and "DCC" less meaningful to researchers
  • a need to raise awareness of the problem first
  • domain specific vs generic training
  • need to target postgrads and even undergrads to embed good practice early on
  • need to cover entire research lifecycle in training materials
  • how is info literacy delivered in institutions now? can we use this as a vehicle for raising awareness or making early steps?
  • School of Chemistry in Southampton has accredited courses which postgrads must complete - these include an element of data management
  • lack of a career path for "data scientists" is a problem
  • employers increasingly looking for Masters graduates as perceived to be better at info handling
  • new generation of students - have a sharing ethic (web2.0) but not necessarily a sense of structured data management
  • small JISC-funded study to start soon on benefits of data management/sharing
  • can we tap into records management training? a role here for InfoNet?
  • can we learn from museums sector? libraries sector?
  • Centre for eResearch at Kings are developing "Digital Asset Management" course, to run Autumn 09
  • UK Council of Research Repositories has a resource of job descriptions
  • role of data curators in knowledge transfer - amassing an evidence base for commercial exploitation
  • also a need for marketing data resources

Session 3 - Technical and infrastructure issues

This session explored the following questions:

  • what are the main infrastructure challenges in your area?
  • who is addressing them?
  • why are these bodies involved? might others do better?
  • what should be prioritised over the next 5 years?
One of the drivers for addressing technical and infrastructure issues is the sheer volume of data – instruments are generating more and more data, and the volume is growing exponentially. It must be remembered that this isn't just a problem for big science – small datasets need to be managed too, although the problem here is more to do with the variety of data (heterogeneous) than volume. It was argued that big science has always had the problem of too much data and has to plan experiments to deal with this, e.g. the LHC at CERN disposes of a large percentage of the data collected during experiments. In some areas, e.g. geospatial, data standards have emerged, but it may be a while before other areas develop their own or until existing standards become de facto standards.

Other areas touched on included:
  • the role of the academic and research library
  • roles and responsibilities for data curation
  • how can we anticipate which data will be useful in the future?
  • What is ‘just the right amount of effort’?
  • What are the selection criteria - what value might this data have in the future (who owns it, who's going to pay for it)? How much effort and money would it take to regenerate this data (e.g. do you have the equipment and skills to replicate it)?
  • not all disciplines are the same therefore one size doesn't fit all
  • what should be kept? data, methodology, workflow, protocol, background info on researcher? How much context is needed?
  • how much of this context metadata can be sourced directly e.g. from proposal?
  • issues of ownership determine what is stored and how
  • what is the purpose of retaining data - reuse or long-term storage? Should a nearline/offline storage model be used? Infrastructure for reuse may be different from that for long-term storage?
  • Should we be supporting publication of open notebook science? (and publishing of failed experiments). What about reuse/sharing if there’s commercial gains?
The summing up at the end identified 4 main priority areas for JISC:
  1. within a research environment - can we facilitate data curation using the carrot of sharing systems (IT systems in the lab)?
  2. additional context beyond the metadata
  3. how do we help institutions understand their infrastructural needs?
  4. what has to happen with the various dataset systems (Fedora etc) to help them link with library and institutional systems?

Thursday, 26 June 2008

ISKO event on information retrieval

Went along to some of the ISKO event on information retrieval today...

Brian Vickery was up first but unfortunately, I missed most of his talk. I did catch the last few minutes though where he asked some very pertinent questions:

  • What is the case for building classifications, thesauri and taxonomies? How does this relate to the needs of Communities of Practice?
  • Are the benefits of controlled retrieval languages strong enough to justify the effort and cost of creating/maintaining/using them?
  • Is there a growing need to harmonise or match terminologies?
  • What is the future for "universal" controlled languages and general classifications/ ontologies?

Next up was Stephen Robertson, giving a researcher perspective. He pointed out that although web search engines have been very successful, other systems cannot say the same - perhaps because the extensive machine learning available to Google et al just isn't feasible for a smaller setup. Robertson mentioned some useful sources of evidence in evaluating retrieval - notably click-throughs and "dwell time" (how long a user spends somewhere before returning to the search results). There is some rich data out there but it is also "noisy".
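The two signals Robertson mentioned can be sketched in a few lines. This is only an illustration - the log format (query, clicked document, click time, return time) is invented here, not any particular search engine's schema:

```python
# Sketch: deriving dwell-time signals from a hypothetical search click log.
# Each log entry is (query, clicked_doc, click_time, return_time).
from datetime import datetime, timedelta

def dwell_times(log):
    """Seconds spent on each clicked result before returning to the results page."""
    return {doc: (back - click).total_seconds() for _, doc, click, back in log}

t0 = datetime(2008, 6, 26, 10, 0, 0)
log = [
    ("information retrieval", "doc_a", t0, t0 + timedelta(seconds=95)),
    ("information retrieval", "doc_b", t0 + timedelta(seconds=100), t0 + timedelta(seconds=103)),
]
times = dwell_times(log)
# A long dwell (doc_a) hints at relevance; a quick bounce back to the
# results page (doc_b) hints at a noisy or unhelpful click.
```

The "noisy" caveat applies directly: a short dwell might mean an irrelevant result, or simply a user who found their answer instantly.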

Last up was Ian Rowlands who talked about the implications of the Google Generation report. He started with some context - insecurity around the power of Google and Yahoo branding; devaluing of the "library" brand; the hypothesis that the younger generation is somehow different. He referred to various pieces of research including Carol Tenopir's long-standing survey of academics. The bottom line of the Google Generation report is that it is a myth - yes, there is a type of user behaviour which is comfortable online, but the Google Generation is not a homogeneous mass of people: "silver surfers" (another irritating term!) demonstrate its characteristics too, and there are "digital dissidents" among younger generations who are shunning technology. So the general message is to stop thinking of our users as fixed targets who fit some kind of stereotype. We need to understand user behaviour much better, in particular online reading - but then, how much do we really understand about how people read and absorb information in print? How can we be sure that what we learn about online reading is peculiar to the online environment and isn't just typical of reading in any format?

Rowlands also suggested that we need to help users form "mental maps" of information - typically, when you walk into a library for a print resource, you have a reasonably good image of what you are expecting to find - the same can't be said of the web. There is a message for librarians here to help create easier access to information for users e.g. through less confusing terminology. Information literacy is key but research seems to suggest that unless individuals learn from a young age, the changes possible in user behaviour are more limited. There have been studies demonstrating a correlation between information literacy and academic grades.

Rowlands finished with a plea to understand our users better - stop thinking of them as one big mass which can be served by a one size fits all solution and learn from the commercial world, where customers are segmented and can follow a number of routes to information - though, I have to say, the commercial world doesn't always get it right either and they have greater resource at their disposal.

Thursday, 19 June 2008

JISC away day : part 2

Oh dear, it's taken me a while to finish writing up the away day ... I blame it on the email backlog which was waiting for me when our away day finished.

Anyway, the most useful session (for me) was on the 2nd day - on the new JISC IPR policy. I understand this is going to appear on the JISC web site soon. It's been developed as part of the IPR consultancy. Professor Charles Oppenheim talked us through the background and the key principles behind the policy.

It was also a useful refresher of some of the issues around IPR and the implications for JISC and its funded projects. Charles referred to the 4 reports produced as part of the consultancy:

Monday, 16 June 2008

JISC Away Day part 1

Today was the first day of the annual JISC Away Day. Here are my very quickly typed up notes...

First up, Ron Cooke, the JISC Chair, gave an overview of some recent achievements and looked towards the future and JISC's role in the sector. Malcolm Read gave an overview of key challenges facing JISC and referred to recent market research (e.g. 100% of Russell Group unis have led on JISC projects but figures are lower for other institutions).

Particularly useful to hear from JISC Collections - have noted down the following to look up later: NESLI2SMP; Knowledge Exchange joint licensing; eBooks observatory; CASPER; extending licensing beyond HE (study ongoing); deals with Scottish HEIs. Also noted: JISC Publishers Action Group; paper ebook; Repositories UK; Flourish/TICTOCs as examples of the U&I programme; Emerge community; Web2Rights.

Attended a session on increasing the impact of JISC in the sector. The group discussed who we are trying to reach (funding bodies; institutions; change agents); what messages we need to get across (value for money, influencing strategy/policy); and how. I think an additional question might be when we engage with different stakeholders, depending on what we hope to achieve. Branding was a key topic, as was the need for brand management. It was agreed JISC also needs to work on improving understanding of JISC activities within the community, enabling feedback, and finding the right metrics to measure impact. Kerry mentioned that they are currently working on audience analysis to improve the web site - i.e. providing secondary routes to information. It was acknowledged that much of our information is written for experts - there needs to be a more basic level which is more contextual.

The group also discussed what is meant by impact. We need to distinguish between reach (e.g. hit on Google) and impact (affecting behaviour in the sector). What can we learn from service reviews? What can we learn from the Top Concerns work? What value does JISC add to the sector? Methods discussed included institutional visits; networks of moles/champions.

Tuesday, 27 May 2008

Presentation on VREs/MREs

Thanks to Rachel for pointing this out: Interesting presentation from the Eduserv Symposium...by David Harrison of Cardiff Uni...
http://www.eduserv.org.uk/foundation/symposium/2008/presentations/davidharrison

Future of the Internet

BCS are hosting a debate next week - sold out :-( - featuring Jonathan Zittrain and Bill Thompson, looking at appliances (e.g. iPhones, XBox) and the impact they're having. Should we be concerned that appliances stifle the ability to create new things on the Internet, or should we be more concerned about safety and security? Some discussion on one of the BCS blogs - http://www.bcs.org/server.php?show=ConBlogEntry.441

Friday, 23 May 2008

Geospatial resources use in tertiary education: shaping the future

Last week, I attended a workshop organised and run by EDINA, as part of the eFramework workpackage of the SEE-GEO project. The aim of the workshop was to inform future planning and to begin thinking about how geospatial resources might work in a future world. We were asked to look ahead around 5 years - the general consensus was that we would be seeing an evolution rather than a revolution in that time e.g. ubiquity of geo info.

Opportunities and Challenges

Social/political/economic:
  • economics of information - IPR; FoI; access and exploitation
  • what about the knowledge that doesn't lend itself to a digital format?
  • how to handle digital persona - virtual communities and alternative economies
  • divisive nature of technology - a new division of class according to access to technology? does it disenfranchise or empower?
Technological
  • standards and interoperability - impact of Google/Microsoft/Yahoo?
  • how to manage fast paced change and multiple devices
  • still a need to teach and train experts - geo experts will be needed, deeper learning for experts
  • domination of Google/Microsoft/Yahoo - driving technology but have also helped put GI in mainstream
  • data deluge
  • protection/privacy/access/reuse
  • embedding (what does embedding really mean?)

Research

  • need an underlying basic IT infrastructure (e.g. grid, visualisation, mobile) with a spatial infrastructure (e.g. spatial ontologies) overlaid on top
  • Google/Microsoft/Yahoo challenge - raises expectations; discourages sharing?; how well does it transfer to academia?
  • methodologies - lack of skills here - mashups are not research; need to develop more analytical skills in young researchers
  • data - integrity; interoperability; creation (new, repurposed); sharing
  • policy - IPR; funding; publication; RAE/REF; tracking development of information
  • collaboration - technological, social, learning with industry

Enablers

Data/Content
  • Data is currently in layers and "all over the place"
  • What will INSPIRE achieve?
  • funding for infrastructure: interoperability; storage; distribution
  • role of community generated data
  • quality and validation
  • semantic enrichment
  • where does Google/Yahoo/Microsoft fit?
  • Research Council mandates are not enforced
  • how does a researcher deposit a dataset/database?
  • depth/breadth tension
Tools/Technology
  • there is a disconnect between creator and dataset - need provenance info - data/process broker, intelligent catalogue
  • (web) services lead to fundamental changes in models of use e.g. do you need processing power alongside the data - remote processing
  • "handy" mobile needed - portable, light, multiple ports, GPS, wearable
  • sensor networks and notion of central storage
  • tools/portals enable virtual world immersion - deeper sense of telepresence
  • can we learn from games technology?
  • consolidated and converged technologies
  • collaboration and sharing - less travel?
  • different publication needs - raw data; code; published papers
Skills, knowledge, people
  • wider promotion of geo info
  • compulsory GI education
  • funders to encourage outputs to be disseminated
  • policy framework
  • repositories, portals, databases
  • need for academic level specialist support
  • career development
  • professional development
  • networks and communities of practice
Legal/policy
  • funding for methodological development e.g. spatial methods for Grid
  • copyright and intellectual property - derived data, watermarking, commercialisation
  • training - cross-disciplinary; quality
  • data and standards development - involving user communities
  • ethics - code of practice; awareness of issues; data integrity; monitoring
  • support - policy to encourage networking
  • data access policy - feasibility and extent of info in public domain
  • access/usage permissions - who has the right to grant permissions? authentication in a global context
  • collaborative support - policy to enable multi-centre, multidisciplinary, multisector, multinational activity
Social/institutional/economic
  • social software/networking tools
  • wider dissemination of metadata beyond traditional subject boundaries
  • cultural change to cite datasets
  • links between universities and schools
  • changing demography e.g. more adult learners
  • funding - different streams - staffing, content, experimentation
  • benefits - clear roles/responsibilities
  • free or pay to view infrastructure
  • alternative (i.e. to OS) providers now available
  • entrepreneurial drivers
  • REF/RAE should effectively recognise complex and hybrid digital outputs
  • institutional or subject repositories
  • nervousness about depositing material
  • support to clear confusion re IPR especially in relation to derived data
There was some discussion about the role of JISC and its Geospatial Working Group so some messages to feed back.

Also, as an aside, I talked with Dr Douglas Cawthorne from De Montfort Uni in Leicester - they are involved in a large project to map Leicester - the result will be a multilayered map, showing the current city, the Roman city, social maps, emotive maps etc and will incorporate user generated content e.g. photos. Something to watch out for...

Wednesday, 16 April 2008

JISC conference

Yesterday, the annual JISC conference took place in Birmingham - as usual, a very busy day and although I caught up with lots of people, I still managed to miss some of the people I was hoping to catch up with.

3 of my projects gave demos - 3DVisA, NaCTeM and ASSERT - and it was great to see the interest from the people attending. I went along to two parallel sessions: one on the Strategic eContent Alliance and one on rapid community building. Here are my notes from both...

The Strategic eContent Alliance aims to build a common information environment and a UK Content Framework, and to gather case studies and exemplars. The UK Content Framework will be launched in March 2009 and will incorporate:
  • standards and good practice
  • advice, support, embedding
  • policy, procedures
  • service convergence modelling
  • audit and register
  • audience analysis and modelling
  • exchange (interoperability) model development
  • business models and sustainability strategies
There are a number of change agents to achieve the vision of the SCA...
  • common licensing platforms
  • common middleware
  • digital repositories
  • digitisation
  • devolved administrations
  • service convergence
  • uk government policy review
  • funding

Globally, there are other incentives e.g.
  • service oriented architecture
  • EU initiatives
  • Google and Microsoft initiatives
  • Open Content Alliance etc
The SCA has also engaged an IPR consultancy and Naomi Korn gave a brief overview of the issues of working in such a content-rich world. Naomi pointed out that it has never been easier to access content and referred to a number of key developments and standards to be aware of:
  • Science Commons
  • Digital Libraries i2010
  • PLUS
  • ACAP
  • SPECTRUM (collections management)
  • JISC registry of electronic licences
  • Open Access Licensing initiatives
Simon Delafond from the BBC talked about the Memoryshare project which enables user-generated content to be recorded against a timeframe to create a national living archive. They plan to build on this project with the SCA to create Centuryshare to aggregate content and augment with user generated content - this will be a proof of concept project due to deliver in March 2009.

Meredith Quinn talked about the recent Ithaka report on sustainability. The paper tackles some of the cultural issues to be resolved to create the right environment for sustainability. Meredith outlined the 4 key lessons from this work:
  1. rapid cycles of innovation are needed - i.e. don't be afraid to try new ideas and to drop ideas which aren't working
  2. seek economies of scale - e.g. Time Inc required all their magazines to use the same platform - not such an easy task to achieve in the distributed nature of HE but maybe this is where shared services come in
  3. understand your unique value to your user
  4. implement layered revenue streams
The rapid community building workshop focused on the Users and Innovations programme and the Emerge community which has been set up to support the programme. Given the nature of the Web2.0 and next generation technologies this programme is dealing with, it was decided early on to adopt an agile and community-led approach. It was important to avoid imposing an understanding on the community and instead build a shared understanding across the community. So 80 institutions were brought together (some 200 individuals) face to face to start to build a community of practice - from there, the community developed further in an online environment, set up using Elgg.

The programme shared the success factors for community building:
  • bounded openness
  • heterogeneous homophily
  • mutable stability
  • sustainable development
  • adaptable model
  • structured freedom
  • multimodal identity
  • shared personal repertoires
  • serious fun
some of which are oxymorons! This is explained a little more at https://e-framework.usq.edu.au/users/wiki/UserCentredDevelopment. The approach is based on "appreciative inquiry", coined by Cooperrider and Srivastva in 1987.

It was interesting to hear their thoughts on benefits realisation which focuses on 3 strands:
  • synthesis (of learning etc)
  • capacity building
  • increased uptake
The programme is also planning to create an Emerge Bazaar where projects can "share their wares" and offer services. This will also promote a kind of IdeasForge to encourage new activities which might lead to new funded projects. The Emerge Online conference is next week from 23 to 25 April.

As for the keynote sessions, key points from Lord Puttnam's speech were that we shouldn't try to solve problems with the same kind of thinking that caused them and that we are only scratching the surface of what we can achieve with technologies therefore should be more ambitious and keep innovation high on the agenda.

It was good to hear Ron Cooke highlight the data problem: "...my nightmare is the “challenge of super-abundant data” - not just its life cycle, but its superfluity with the new, unprecedented increases of data through Web 2.0 and user-generated content, including academic publishing in real time, blogging without control, and the quality and reliability of data. I am also concerned about the demands of skills it places on us - critical assessment is needed to deal with this data."

I missed Angela Beesley from Wikia but am pleased to see someone has summarised the talk http://librariesofthefuture.jiscinvolve.org/2008/04/15/jisc-conference-closing-keynote-speech-angela-beesley/ :-)

The SCA team have blogged the conference (far better than I have!) which you can read at http://sca.jiscinvolve.org/2008/04/15/.

The conference also saw the launch of the Libraries of the Future campaign (http://www.jisc.ac.uk/whatwedo/campaigns/librariesofthefuture.aspx).

Tuesday, 19 February 2008

eInfrastructure programme meeting

Earlier this month, we organised a Programme Meeting, for new projects (funded through the Capital Programme) and existing projects. It was a great opportunity to get projects talking together and we need to think about what events will be useful in the future...

The presentations, notes and soon-to-be-uploaded audio are available online here.

Thursday, 3 January 2008

VRE 1 - lessons learned

Just found my notes from the JISC Conference 2007 and rather than lose them again thought they might be more useful here:

The session on VREs raised a number of questions:
  • is a VRE a warehouse or a federated repository?
  • should it be a social space or an organised rich space?
  • what is the right level of granularity?
  • should content be open or protected?
  • what about desktop integration?
  • how can we enable added value by researchers?
  • how is software evolving? how can we make it sustainable?

Roger Slack talked about users:
  • requirements gathering is not a one-off - it is longitudinal
  • need active involvement of all partners
  • enfranchisement - spell out benefits to users
  • ensure funding to support championing

Tuesday, 4 December 2007

Models of sustainability workshop

Yesterday, I went along to the workshop on Models of Sustainability, jointly organised by JISC and the Oxford eResearch Centre. It was an interesting workshop, with a keynote by Malcolm Read followed by presentations from eScience projects. Here are a few of the issues and ideas discussed:
  • Have eScience Programme outputs reached the level of sustainability needed? How long is a reasonable length of time to expect a step change?
  • Can we have generic tools given that research itself is not generic?
  • How much tinkering of tools and software do researchers actually want to do?
  • Although we have a culture of sharing software, there isn't the same culture of sharing data (file sharing does not equal data sharing!). The problem is not technological but cultural
  • In time, will current students bring their social networking skills into research?
  • What can be learned from Athens to Shibboleth move in terms of running a development programme to inform a production programme?
  • Sustainability = continuing efforts, changing culture, reducing duplication, encouraging sharing and discussion, open collaboration. Must not forget the broader sustainability agenda (e.g. HEFCE shared services programme)
  • The software engineering needs to be sound and built on a solid framework. Academia is perhaps not geared to developing robust software and middleware; funding agencies generally haven't funded software development; career progression reward for those developing software is difficult; there are staff recruitment and retention issues; sustainability is not even on the radar screen of many HEIs and most academics
  • One option is the spinoff company - in this instance, it is important to establish trust between company and university. It takes time to get technology to market. The DTI/Technology Strategy Board follow-on is a great bridge. Keep the team together as far as possible
  • The team needs a mix of scientific, domain, financial and business experience
  • Sustainability depends on users but need to promote long term view (vision of integrating compute, data and collaboration is not easy for researchers in a hurry with a short term view); new ways of working takes researchers out of their comfort zones
  • If you want to continue to innovate, maintaining what you have becomes more difficult – issues of scalability, competition for support. There is a tension between maintaining an infrastructure and innovating
  • Sustainability lessons – work with user community; constantly innovate and deliver; develop modular easy to use software; strong promotion, personal ownership; vision - many new ideas
  • Innovation has 2 strands – new technical capability, new science communities
  • Is there a role for Full Economic Costing in sustainability?
  • Need to get across that software and data are facilities and therefore need managing
  • What is the role of institutions in helping to sustain a project?

Slides will be available from http://www.jisc.ac.uk/whatwedo/programmes/programme_einfrastructure/modelsofsustainability

Friday, 5 October 2007

Participative web conference

The blog from the Participative Web conference this week has been a really good read. Richard Ackerman has posted on citizen science - "It is clear that the rapid pace of change is pushing those involved with science infrastructure to think about ways to interact with a broader public, to take advantage of the energy and creativity of the general population, promoting greater understanding of and participation in science."

Richard has also given a summary of a talk from Andrew Herbert from Microsoft - interesting points include: how sensor networks will enable real-world data to be used in simulations; how to get the right balance of skills in the research workforce.

There are also some handy guides to some of the key themes of the conference, including eScience. The blog links to transcripts as well as the webcast, and to some videos (including two relevant to eScience, by Andrew Herbert and Walter Stewart).

Tuesday, 2 October 2007

Tips for conference bloggers

Really useful tips on blogging from a conference
http://www.lunchoverip.com/2007/10/tips-for-confer.html

Wednesday, 19 September 2007

Report from BCS KIDDM Mash-Up

On Monday, I went along to the BCS KIDDM Knowledge Mash-up - I only stayed for the morning and was a bit disappointed that the day wasn't as interactive as the title suggested. The talks in the morning were quite high level too, but it was interesting. I came across the BCS Geospatial Group for the first time.

Peter Murray has written up some of the day's presentations on his blog.

Conrad Taylor, introducing the day, covered issues around mark-up and tagging, referring to the difficulties of marking up audio/video and unstructured text; time constraints; and difficulties of subject classification.

Tony Rose talked about information retrieval and some of the innovative approaches out there:
  • semantic searching - as demonstrated by hakia and lexxe
  • natural language processing - as demonstrated by powerset and lexxe
  • disambiguation - as demonstrated by quintura and ask
  • assigning value to documents - as demonstrated by google
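The last of these - link-based document value - is usually associated with PageRank. A minimal sketch of the idea, assuming the standard damping-factor formulation (the graph, damping value and iteration count below are illustrative, not Google's actual parameters):

```python
# Sketch of link-based document scoring in the PageRank style: a page is
# valuable if valuable pages link to it. Iterative approximation.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Everyone gets a base share; the rest is passed along outgoing links.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
# "c" ends up with the highest score: it is linked from both "a" and "b".
```

The contrast with the other approaches in the list is that this one scores documents independently of the query text, which is partly why it transferred poorly outside the web's dense link structure.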

He sees the future of search as addressing the following:
  • rich media search
  • multi/cross lingual search
  • vertical search
  • search agents
  • specialised content search
  • human UI
  • social search
  • answer engines
  • personalisation
  • mobile search

Tom Khabaza from SPSS talked about their products for text and data mining and the various applications they're used for (CRM, risk analysis, crime prevention etc). He stressed that the results of text analysis have to be fitted into business processes and mentioned briefly how Credit Suisse have achieved this. He listed the keys to success of text/data mining solutions:
  • ease of use
  • supports the whole process
  • comprehensive toolkit - ie features visualisation, modelling etc so all you need is in one place
  • openness - using existing infrastructure
  • performance and scalability
  • flexible deployment
Ian Herbert, from the Health Informatics SG, talked about the recent work on SNOMED-CT and its application in NHS CFH. SNOMED-CT will allow pre-coordinate and post-coordinate searching. The main challenge has been in capturing the depth of clinical expression. Concepts have qualifiers, e.g. pain has a qualifier indicating severity. There has been some work mapping to MeSH, although Ian seemed to think this wasn't complete. The key challenge facing the team now is rollout - there are few examples of its use in a real-time environment. It remains to be seen if health professionals will take well to using it during consultations - it is quite a complex system and, as Ian admits, "users want the biggest bang for their keystroke buck".
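The pre-/post-coordination distinction Ian described can be illustrated with a toy model. The codes and qualifier names below are invented for this sketch and are not real SNOMED-CT identifiers or syntax:

```python
# Toy illustration of pre- vs post-coordination.
# Pre-coordinated: a single concept captures the whole clinical idea.
precoordinated = {"code": "SEVERE_CHEST_PAIN"}

# Post-coordinated: a base concept refined with qualifiers at recording time,
# e.g. "pain" plus a severity qualifier, as in the talk.
postcoordinated = {
    "focus": "CHEST_PAIN",
    "qualifiers": {"severity": "SEVERE"},
}

def describe(expr):
    """Render either form of expression as a human-readable string."""
    if "qualifiers" in expr:
        quals = ", ".join(f"{k}={v}" for k, v in expr["qualifiers"].items())
        return f"{expr['focus']} ({quals})"
    return expr["code"]
```

Post-coordination keeps the terminology smaller (no need for a pre-built code for every combination) at the cost of more complex searching - which is part of why capturing the depth of clinical expression is hard.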

Dan Rickman introduced geospatial information systems. He referred to the importance of metadata and ontologies for handling the large volumes of unstructured data. In geospatial information, there is also a temporal aspect as many applications will view an area over time. He mentioned OS' work on a Digital National Framework which has several principles:
  • capture information at the highest resolution possible
  • capture information once and use many times
  • use existing proven standards etc
Dan also mentioned issues around object-based modelling. The OS has developed TOpographical IDentifiers (TOIDs) to identify every feature in Britain. He also mentioned the Basic Land and Property Unit (BLPU) which would be used to describe, for example, a hospital (which may have many buildings, "owned" by different organisations). He also talked about neogeography which has arisen from the web2.0 explosion.