Eponymysing Dr Dozer

Thursday, December 22, 2005

holiday

Holiday coming up. My house is a mess. Got a working shower and kitchen, the heating works and my doors lock. But most walls don't have wallpaper (some don't have plaster) and there's no carpet downstairs. Family arrive in a few hours. Joy!

Been doing my bit for Wikipedia - an article a day keeps the Register at bay - but my spelling and grammar perhaps count against my improvements to content. Perhaps the articles should be spell-checked?

Reading my way through a load of sci-fi. Couldn't finish Disaster Area - found it just a bit turgid. Like P K Dick without the art. Hey ho. Started The System of the World, so I hope to have completed that trilogy before the holiday is out.

Wednesday, November 23, 2005

Revisiting symbols

I'm taking yet another look at the bjv2 biological symbols model. This time I've sliced it differently yet again. Ambiguity represents things that are potentially ambiguous. Symbol represents things that are not. Token is a super-interface of both. Alphabets are over Symbols and have methods to obtain Ambiguities for symbols. A TokenBuffer contains tokens and is specialised as AmbiguityBuffer and SymbolBuffer. Similarly, the io tokenization stuff works with Tokens, and it may be necessary to specialise this for ambiguities and symbols.
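
As a rough picture of the slicing (these are my scribbles of the shape described above, not the real bjv2 declarations - method names and generics are guesses):

    import java.util.Set;

    // Token is the super-interface; Symbol is unambiguous, Ambiguity potentially not.
    interface Token {}
    interface Symbol extends Token {}
    interface Ambiguity extends Token {
      Set<? extends Symbol> getMatches();               // hypothetical accessor
    }

    // Alphabets are over Symbols and hand out Ambiguities for sets of symbols.
    interface Alphabet<S extends Symbol> {
      boolean contains(S symbol);
      Ambiguity getAmbiguity(Set<? extends S> symbols); // hypothetical signature
    }

    // TokenBuffer with the two specialisations.
    interface TokenBuffer<T extends Token> extends Iterable<T> {
      int length();
      T tokenAt(int index);
    }
    interface SymbolBuffer<S extends Symbol> extends TokenBuffer<S> {}
    interface AmbiguityBuffer<A extends Ambiguity> extends TokenBuffer<A> {}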

I hope that this particular way of slicing things will give a better level of control over the symbols, their storage and the extra API that particular symbol classes expose. Also, I'm hoping to be able to retro-fit basis symbols and use annotations to code-generate mapping functions that extract components of the basis symbols with full type-safety. For example, I want to be able to have an alignment with getQuery(), getSource() and getState(), where we know the classes of the tree components, and can extract a TokenBuffer for just one component of this, e.g. the state, still with type-safety.
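
The annotation-driven extraction might look like this, building on the Symbol interface from the sketch above (all names invented for illustration):

    import java.lang.annotation.*;

    // Marks a getter as one named component of a basis symbol.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Component { String value(); }

    interface DpState extends Symbol {}

    // A basis symbol for alignment columns. A code generator could read the
    // @Component annotations and emit, with full type-safety, something like
    //   SymbolBuffer<DpState> states(SymbolBuffer<AlignmentColumn> alignment)
    interface AlignmentColumn extends Symbol {
      @Component("query")  Symbol getQuery();
      @Component("source") Symbol getSource();
      @Component("state")  DpState getState();
    }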

So - now that the API is roughly in place, I need to write some apps that use it, fill in the boring implementation code and see how easy/pretty it is to use.

Tuesday, November 22, 2005

Morning Meeting

That was interesting. I spent the morning in their meetings - one for the projects and one about setting up a group to write a requirements report for a data grid.
In the first meeting we covered some stuff re. ISWC and how myGrid had coverage there. Also, Pete Li showed Taverna using some web-services from the pathport guys, and some really cool BLAST data visualisations. It was nice to see my data viewers stuff being used as it was intended.
The second meeting was all about setting up a data-grid. The data is exposed as both files and database connections. There is a need for security - both in terms of access through the API and of the service boxes themselves being hacked. They have potentially lots of data and are currently moving it by external hard disk. Sounds like the early days of the genome project.
Spoke with the e-fungi person again. We talked about the development platform/stack. I pushed for svn + maven2 - they are using eclipse, which is fine.
Next up: the OWL-DL lunch. BYO food. Looking forward to it.

Recap

Yesterday went well. Saw loads of people. Had a long chat with Sean about the owlapi. Spent some time with Daniele and Ula talking OWL. I think ComparaGrid will work. We may not even need to write it all ourselves.

Lunch with Norman was good. We spent half the time talking about microbase and half about ComparaGrid. I think we will be able to use their e-fungi group as alpha-testers for the microbase pipeline v2. He thinks my data integration stuff is ... ambitious. We will see.

Today I see Carole. Talking about Phil's grant, among other things. Let's see what comes out of this. Also in a grid meeting at 10 about some bioinf, and lunch - originally with Ian Horrocks but apparently now we are all going out for a chinese or something. Hola.

Sunday, November 20, 2005

Manchester or bust

Off to Manchester to speak to OWL people. Carole was very keen I visit, and I need to spend some time with these guys anyway to make sure that I'm not doing anything really stupid with the ComparaGrid design. They may even have some stuff lying around that I can use in the implementation.

It is a really frosty morning. The train is freezing. There's fog (probably freezing fog) stretching out to the horizon, and all the grass and trees are covered in a layer of silver. 30 min to York, and then I change for Man Pic - here's hoping the next train has some heat.

Been documenting the more finished biojava 2 APIs. Services and Cli are nearly at release-level. Some of the students in our department have been using bits of them, apparently without too much heartache. Rob from down south had some problems with services, but we got that straightened out in the end. It turned out to be a user/documentation problem.

Builders starting work on Wednesday, apparently. First thing is the heating. (You can tell that's high on my agenda right now.) I think doors will be done early too. Just got to hope that the weather doesn't burst the pipes first.

Thursday, November 10, 2005

Thursday keynote

By Daniel J Weitzner, MIT

Sociology, law, the web.

In real life, although the laws exist, we tend to mediate conflicts without recourse to the machinery of law enforcement. Privacy on the web is an issue, and is exacerbated by the ability to do data-mining.

In the past, we've addressed privacy and intrusion by controlling who has access to data, who can collect data, and who can share data. Privacy now is perhaps more about controlling how information is used.

IP protection is currently about big producers (film, music) protecting investment and maximising profit. This is inappropriate for protecting diffuse, non-profit content that only becomes worth much more when aggregated / integrated.

Compare/contrast two models: trusted third parties vs dereferencable second parties. Didn't really tackle the motivations for people and organisations to behave well. But overall, a very interesting talk.

Wednesday, November 09, 2005

Value editing and displaying annotations

It looks like we've got three lots of java5 annotations that do related things. Bjv2 has annotations for making command-line programs from beans. You annotate the bean with @App and the setters with @Option, and from this it works out what switches are legal for the app. Additionally, the main method is on the bean instance, making it easy to instantiate lots of them, e.g. for a parameter sweep or as a web-service worker.
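
In spirit, it's something like this - a toy re-creation of the idea, not the actual bjv2 annotations or option-parsing rules:

    import java.lang.annotation.*;
    import java.lang.reflect.Method;

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface App { String name(); }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface Option { String name(); }

    @App(name = "greeter")
    class Greeter {
      private String who = "world";
      @Option(name = "who") public void setWho(String who) { this.who = who; }
      public void run() { System.out.println("Hello, " + who); }
    }

    class Cli {
      public static void main(String[] args) throws Exception {
        Greeter bean = new Greeter();
        // Pair up "--switch value" arguments with @Option-annotated setters.
        for (int i = 0; i + 1 < args.length; i += 2) {
          String flag = args[i].replaceFirst("^--", "");
          for (Method m : Greeter.class.getMethods()) {
            Option opt = m.getAnnotation(Option.class);
            if (opt != null && opt.name().equals(flag)) m.invoke(bean, args[i + 1]);
          }
        }
        bean.run();
      }
    }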

Keith is doing some stuff with annotations that associate UI components (JLabel, ...) with bean properties. Then the system generates forms using this, and routes the event handlers correctly.

Jimi is generating html forms from beans using annotations that describe what html form elements to use. This has logic for converting between the form data type (usually string) and the bean data type.
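
The converter half of that might look roughly like this (all names invented; the point is the annotation carrying both the widget choice and the string-to-bean-type conversion):

    import java.lang.annotation.*;

    interface Converter { Object fromForm(String raw); }
    class StringConverter implements Converter {
      public Object fromForm(String raw) { return raw; }
    }
    class IntConverter implements Converter {
      public Object fromForm(String raw) { return Integer.valueOf(raw); }
    }

    // Names the html widget to use and how to convert the posted string.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface FormElement {
      String widget();
      Class<? extends Converter> converter() default StringConverter.class;
    }

    class Person {
      private int age;
      @FormElement(widget = "text", converter = IntConverter.class)
      public void setAge(int age) { this.age = age; }
    }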

All three of these systems appear to be doing similar things using similar tech. Once all three have been proven to work, I think it's time to see if we can rationalize some of this - re-use some interfaces, extract some design patterns and best practices, or maybe even add an extra layer of indirection and get a single core system out.

Continuum

Had a go at installing continuum on deanmoor. No joy so far - can't get it bouncing through apache2. Also, the xml-rpc bit seems to be alive, but I'm not convinced that the rest is. Something to ask Dan about on Monday.
I was hoping to get continuous integration working for all the fluxion projects, and then perhaps do some fancy things where the bits get checked out to a test server with a dummy client clicking on things and so on. Not today though :-(
Time to see if there's a bar open by the formal dinner.

SPI for renderers

Working with Rob over MSN to get renderlets for OWL glued in via an SPI. Think he gets it. At least it's starting to work a bit.

The idea ultimately is to be able to load in renderlet-containing maven resources and thus extend the rendering at run-time. However, for now we can live with several jars that get built and imported into the war.
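
The discovery side is just the standard META-INF/services pattern; a minimal sketch (the Renderlet interface here is a stand-in, not the actual one Rob and I are working with):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.*;

    interface Renderlet {
      boolean canRender(Object owlObject);
      String render(Object owlObject);
    }

    class RenderletLoader {
      // Each jar lists its implementations in META-INF/services/Renderlet,
      // one fully-qualified class name per line.
      static List<Renderlet> load(ClassLoader cl) throws Exception {
        List<Renderlet> found = new ArrayList<Renderlet>();
        Enumeration<URL> urls = cl.getResources("META-INF/services/Renderlet");
        while (urls.hasMoreElements()) {
          BufferedReader in = new BufferedReader(
              new InputStreamReader(urls.nextElement().openStream()));
          for (String line = in.readLine(); line != null; line = in.readLine()) {
            line = line.trim();
            if (line.length() > 0 && !line.startsWith("#"))
              found.add((Renderlet) Class.forName(line, true, cl).newInstance());
          }
          in.close();
        }
        return found;
      }
    }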

Wednesday afternoon - ontology mapping II

Semantic mappings from xml-schema. Seems to be about translating data from xml-schema-valid xml into an ontology. This is some sort of argument for why comparagrid shouldn't primarily rely on an xml data interchange - converting this into marked-up data is a research topic in its own right. OK - bi-directional properties make things tricky to realise as a semantically-marked-up schema. However, this can be disambiguated by adding extra tags here and there. Reification doesn't help. Overall it was encouraging, but I'm not sure how applicable it is unless you control both the schema and the ontology.

Probabilistic modelling - they got the wrong paper printed due to a versioning fuckup. D'oh! Seemed OK.

The next one was a Bayesian-based approach - I didn't follow it. Was one of the guys that looked at our poster last night and had some (constructive?) critical remarks.

Wednesday afternoon

Ontology alignment part I

Given two ontologies O1 and O2, aligning one to the other means, for each entity in O1, finding a concept in O2 with the same intended meaning - ignoring structure issues. You end up with equivalence links between the two ontologies. Compare a concept by text-comparison of the label (WordNet, etc.) and by the properties it has (including name, cardinality, ...) and then the classes on the end of the properties. Calculate some weights. Take the average (or otherwise aggregate). Threshold. Call those similar. So - how to choose the weights? They end up with a decision tree that does the job better than people. It doesn't seem to use the taxonomy - I guess looking at the decision tree rules would tell you at what level people are capturing the 'core' relationships in their domain.
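
The aggregate-and-threshold core is simple enough; a toy version (the weights, measures and cut-off here are made up - in the talk, those were exactly the part the decision tree learned):

    import java.util.HashSet;
    import java.util.Set;

    class AlignmentSketch {
      // Placeholder for a real label comparison (edit distance, WordNet, ...).
      static double labelSim(String a, String b) {
        return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
      }

      // Jaccard overlap of property names as a crude structural measure.
      static double propertySim(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<String>(a); inter.retainAll(b);
        Set<String> union = new HashSet<String>(a); union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
      }

      static boolean similar(String label1, Set<String> props1,
                             String label2, Set<String> props2) {
        double score = 0.6 * labelSim(label1, label2)
                     + 0.4 * propertySim(props1, props2);
        return score > 0.8;   // call them equivalent above the threshold
      }
    }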

The next talk makes the good point that equivalence between terms in different ontologies is less likely than subsumption between them. Example of mapping EU agric food to US grocery food. So far, these approaches have only worked because of injecting extra info, e.g. from WordNet - this kind of indicates to me that the way ontologies are being used is impoverished relative to dictionaries. Users or representation?

The next one is a string-similarity-metric-based approach. His laptop isn't talking to the projector :-( Let's hope there's no associated power cut like yesterday. Humph - not sure what this guy was doing other than fancy string matching.

Tuesday, November 08, 2005

lsid resolver and client

How hard can it be to implement an lsid resolver and lsid client? Can I do it in a single 1 1/2 hour session? Will my battery last?

OWL import mechanisms and semantics

OWL supports ontology importing using owl:import. The semantics of what entities to consider are RDF-based. All the [x owl:imports y] statements in an ontology cause y to be loaded, scanned for owl:import statements, and so on. Eventually, you have a set of processed resources (hopefully all OWL, presumably all rdf-graphs) that is closed under imports: for every y imported by a member, y is in the set too. This all seems fairly sane.
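
That closure is just a transitive walk; a sketch (OntologySource is an invented loader abstraction, not any particular OWL api):

    import java.net.URI;
    import java.util.HashSet;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Set;

    interface OntologySource {
      // The objects y of all [x owl:imports y] statements in the named ontology.
      Set<URI> importsOf(URI ontology);
    }

    class ImportsClosure {
      static Set<URI> closure(URI root, OntologySource source) {
        Set<URI> done = new HashSet<URI>();
        List<URI> todo = new LinkedList<URI>();
        todo.add(root);
        while (!todo.isEmpty()) {
          URI next = todo.remove(0);
          if (done.add(next))               // skip anything already processed
            todo.addAll(source.importsOf(next));
        }
        return done;
      }
    }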

The semantics of what happens to the importing ontology are also defined for owl:import. This is marginally less sane. If [x owl:imports y], then all the content of y is added as content of x. This is like a #include, not a java import. It makes sense that eventually you will want to create this merged data-structure for sending off to a reasoner, but it loses the information about where particular content originated.

So - things I would like from an import mechanism:

  1. Separation of the model for the ontology from the model of the imports closure as an ontology - let's call this the closure ontology.
  2. The ability to perform ad-hoc ontology merging.
  3. Keep full information about which ontology makes a particular statement.
  4. During edit, make sane choices or at least provide options for which ontology to update.
  5. Associate semantics with imports.
The semantics of imports could be leveraged in interesting ways to help the publisher do sane things, and help tools enforce this. Our experience is that a model represented as an Ontology contains concepts and relationships, but also some limited number of individuals (e.g. acting like Enum values).
  1. In OWL-DL, it makes sense to define a model in one ontology (model ontology) and then import this into several ontologies that publish individuals populating those concepts (data ontology). owl:implements could extend owl:imports, with the semantics that instances introduced in the data ontology must be instances of concepts in the model ontology.
  2. Some ontologies, for example the OWL-DL view of GO, provide lots of terms that you would like to use, but it would not make much sense to sub-type the concepts - you would just refer to the individuals defined.
  3. Hide the internal concepts of an ontology to be imported. In realistic ontology design, there are often concepts that are present for modelling reasons but which have little relevance to the user of the ontology and can be confusing. An import option could be provided to expose references to the 'public' terms, while making the 'private' terms opaque. They still exist for reasoning purposes, but are not present in the browsing / editing roles.
I think all this can be trivially supported in any particular OWL api, but it would require some extension to the current owl:import stuff.
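
For wish 3 in the first list - keeping full information about which ontology makes a statement - the trick is just to never throw the origin away when building the merged view. A sketch (the Axiom/ClosureOntology shapes are mine, not any existing api):

    import java.net.URI;
    import java.util.*;

    class Axiom {
      final String text;       // stand-in for a real axiom object
      final URI assertedBy;    // the ontology that made this statement
      Axiom(String text, URI assertedBy) { this.text = text; this.assertedBy = assertedBy; }
    }

    class ClosureOntology {
      private final Map<URI, Set<Axiom>> byOntology = new HashMap<URI, Set<Axiom>>();

      void add(URI ontology, Axiom axiom) {
        Set<Axiom> axioms = byOntology.get(ontology);
        if (axioms == null) byOntology.put(ontology, axioms = new HashSet<Axiom>());
        axioms.add(axiom);
      }

      // The merged, #include-style view you hand to a reasoner...
      Set<Axiom> all() {
        Set<Axiom> merged = new HashSet<Axiom>();
        for (Set<Axiom> s : byOntology.values()) merged.addAll(s);
        return merged;
      }

      // ...without losing where each statement came from.
      URI originOf(Axiom axiom) { return axiom.assertedBy; }
    }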

OWL api lunch

Just had lunch with Sean - one of the OWL-api authors. Talked about the resolver stuff, ontology imports and memory management issues. We didn't do show-and-tell with code, but it generally sounded positive. I don't really want to become manager for yet another project, but he was talking about writing a v2 of the API.

Visualisation

Queries: Good intro to why query languages are sucky. They present an approach using ACE to represent queries in a controlled English. There's a cost to learning ACE, but it does seem to work - better than writing SQL or looking at the semantic query language. Students seem to be able to use this system. I liked this talk. Reminded me of a role-playing game API that was backed by RDF describing what objects, actions and individuals were "live", and used this to disambiguate user input.

Looks like I am definitely not in the visualisation track...

Using triples directly in implementation: why, MVC architecture, tool suite, demo. MVC falls down with model-skew - it also tends towards early transformation of the data model into the stuff required for the viewer. But things like owl-specific APIs are there because the semantics of OWL are not that of RDF - it's just that RDF is the serialization. OK - now they've introduced mediators - maps between triples and high-level views (attributes, hierarchy, ...) - we should consider this very seriously for the OWL-api stuff. These guys have a triplestore that they claim works. At the top, the Triple20 editor and a sesame java client.
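
My reading of the mediator idea, sketched (the Triple/Mediator shapes are mine, not Triple20's actual interfaces):

    import java.util.*;

    class Triple {
      final String s, p, o;
      Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // A mediator computes one high-level view from the raw triples.
    interface Mediator<V> {
      V view(Collection<Triple> triples);
    }

    // Example: a hierarchy view, mapping each class to its direct subclasses.
    class HierarchyMediator implements Mediator<Map<String, Set<String>>> {
      public Map<String, Set<String>> view(Collection<Triple> triples) {
        Map<String, Set<String>> children = new HashMap<String, Set<String>>();
        for (Triple t : triples) {
          if ("rdfs:subClassOf".equals(t.p)) {
            Set<String> kids = children.get(t.o);
            if (kids == null) children.put(t.o, kids = new HashSet<String>());
            kids.add(t.s);
          }
        }
        return children;
      }
    }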

Tuesday morning b4 coffee

Some awards. I should probably read the papers that got awards.

Carole is giving her keynote. Big science collecting big data with virtual collaboration. Heavily slanted to the bio domain. Very up on RDF. What's crap about the attitude of the semantic web community to bio people: lack of tools, lack of understanding, expecting too high a level of geeky skills.

Visitors

My first 3 cracks at type-safe visitors all failed, in exciting and sometimes colorful ways (including one code-base that made it through javac but failed class casts at runtime!). I've come up with another way to represent visitors, with one visitor type for each tree node type. This is closer to the ML data-structure switching stuff. However, it's more verbose. On the other hand, it is vastly configurable and perhaps may even be typesafe.
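
The shape of it, on a toy tree (my illustration, not the real code - the point is that each node type gets its own visitor interface, so dispatch reads like an ML case expression):

    // One visitor interface per node type.
    interface LeafVisitor<R>   { R visitLeaf(Leaf leaf); }
    interface BranchVisitor<R> { R visitBranch(Branch branch); }

    interface Node {
      // The caller supplies one handler per node type - a typed 'case' over the tree.
      <R> R match(LeafVisitor<R> onLeaf, BranchVisitor<R> onBranch);
    }

    class Leaf implements Node {
      final int value;
      Leaf(int value) { this.value = value; }
      public <R> R match(LeafVisitor<R> onLeaf, BranchVisitor<R> onBranch) {
        return onLeaf.visitLeaf(this);
      }
    }

    class Branch implements Node {
      final Node left, right;
      Branch(Node left, Node right) { this.left = left; this.right = right; }
      public <R> R match(LeafVisitor<R> onLeaf, BranchVisitor<R> onBranch) {
        return onBranch.visitBranch(this);
      }
    }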