Wednesday 19 September 2007

Report from BCS KIDDM Mash-Up

On Monday, I went along to the BCS KIDDM Knowledge Mash-up. I only stayed for the morning and was a bit disappointed that the day wasn't as interactive as the title suggested. The morning talks were quite high level too, but they were interesting. I also came across the BCS Geospatial Group for the first time.

Peter Murray has written up some of the day's presentations on his blog.

Conrad Taylor, introducing the day, covered issues around mark-up and tagging, referring to the difficulties of marking up audio/video and unstructured text; time constraints; and difficulties of subject classification.

Tony Rose talked about information retrieval and some of the innovative approaches out there:
  • semantic searching - as demonstrated by hakia and lexxe
  • natural language processing - as demonstrated by powerset and lexxe
  • disambiguation - as demonstrated by quintura and ask
  • assigning value to documents - as demonstrated by google

He sees the future of search as addressing the following:
  • rich media search
  • multi/cross lingual search
  • vertical search
  • search agents
  • specialised content search
  • human UI
  • social search
  • answer engines
  • personalisation
  • mobile search

Tom Khazaba from SPSS talked about their products for text and data mining and the various applications they're used for (CRM, risk analysis, crime prevention etc.). He stressed that the results of text analysis have to be fitted into business processes and mentioned briefly how Credit Suisse have achieved this. He listed the keys to success for text/data mining solutions:
  • ease of use
  • supports the whole process
  • comprehensive toolkit - i.e. it features visualisation, modelling etc., so all you need is in one place
  • openness - using existing infrastructure
  • performance and scalability
  • flexible deployment

Ian Herbert, from the Health Informatics SG, talked about the recent work on SNOMED-CT and its application in NHS CFH. SNOMED-CT will allow pre-coordinate and post-coordinate searching. The main challenge has been in capturing the depth of clinical expression. Concepts have qualifiers, e.g. pain has a qualifier indicating severity. There has been some work mapping to MeSH, although Ian seemed to think this wasn't complete. The key challenge facing the team now is roll-out - there are few examples of its use in a real-time environment. It remains to be seen whether health professionals will take well to using it during consultations - it is quite a complex system and, as Ian admits, "users want the biggest bang for their keystroke buck".
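
To make the pre-coordinate/post-coordinate distinction a bit more concrete, here is a rough sketch of my own (not something shown in the talk). It assumes a pre-coordinated term is a single ready-made concept, while a post-coordinated one is built at entry time from a focus concept plus qualifiers. The class names, the rendering format and the codes are all hypothetical placeholders, not real SNOMED-CT identifiers or syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    code: str  # hypothetical placeholder, NOT a real SNOMED-CT identifier
    term: str


@dataclass(frozen=True)
class Expression:
    focus: Concept
    qualifiers: tuple = ()  # pairs of (attribute Concept, value Concept)

    def render(self) -> str:
        """Return a readable form of the expression (illustrative format only)."""
        if not self.qualifiers:
            return self.focus.term
        parts = ", ".join(f"{attr.term} = {val.term}" for attr, val in self.qualifiers)
        return f"{self.focus.term} : {parts}"


# Hypothetical concepts for illustration.
PAIN = Concept("X-001", "pain")
SEVERITY = Concept("X-002", "severity")
SEVERE = Concept("X-003", "severe")
SEVERE_PAIN = Concept("X-004", "severe pain")

# Pre-coordinated: one existing concept already carries the full meaning.
pre = Expression(SEVERE_PAIN)

# Post-coordinated: the meaning is composed from "pain" plus a severity qualifier.
post = Expression(PAIN, ((SEVERITY, SEVERE),))


def has_focus(expr: Expression, concept: Concept) -> bool:
    """True if the expression is built on the given focus concept."""
    return expr.focus == concept


print(pre.render())
print(post.render())
```

In this toy model, a search for "pain" finds the post-coordinated record by its focus concept, whereas the pre-coordinated "severe pain" would only be found if the terminology itself links it back to "pain". That gives some feel for why capturing the depth of clinical expression is hard.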

Dan Rickman introduced geospatial information systems. He referred to the importance of metadata and ontologies for handling the large volumes of unstructured data. In geospatial information, there is also a temporal aspect, as many applications will view an area over time. He mentioned the Ordnance Survey's (OS) work on a Digital National Framework, which has several principles:
  • capture information at the highest resolution possible
  • capture information once and use many times
  • use existing proven standards etc

Dan also mentioned issues around object-based modelling. The OS has developed TOpographical IDentifiers (TOIDs) to identify every feature in Britain. He also mentioned the Basic Land and Property Unit (BLPU), which would be used to describe, for example, a hospital (which may have many buildings, "owned" by different organisations). He also talked about neogeography, which has arisen from the Web 2.0 explosion.
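
As a rough illustration of the hospital example (my own sketch, not anything presented on the day), a BLPU can be thought of as a named grouping of individually identified features, each with its own owner. The class names, field names and identifier strings below are hypothetical and do not reflect the actual OS data model or the real TOID format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    toid: str  # hypothetical placeholder, not a real TOID
    description: str
    owner: str


@dataclass
class LandPropertyUnit:
    """Toy stand-in for a BLPU: one logical site made of many features."""
    name: str
    features: list = field(default_factory=list)

    def owners(self) -> list:
        """Distinct owning organisations across the unit, sorted by name."""
        return sorted({f.owner for f in self.features})

    def toids(self) -> list:
        """Identifiers of every feature that belongs to the unit."""
        return [f.toid for f in self.features]


hospital = LandPropertyUnit(
    "Example Hospital",
    [
        Feature("toid-0001", "main building", "Trust A"),
        Feature("toid-0002", "research wing", "University B"),
        Feature("toid-0003", "car park", "Trust A"),
    ],
)

print(hospital.owners())
print(hospital.toids())
```

Each feature is captured once with its own identifier, and the unit simply refers to those features, which fits the "capture information once and use many times" principle.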
