Editing workshop (British Fantasy Society)
10gbp (€11.54 / $12.12 / £10)

Post created with https://apps.rhiaro.co.uk/latinum
Dropping in and out of the 5th Scottish Linked Data Interest Group workshop.
+ http://datasets-satin.telecom-st-etienne.fr/aboutet/DeSN15/
Amy added http://datasets-satin.telecom-st-etienne.fr/aboutet/DeSN15/ to https://rhiaro.co.uk/bookmarks/
This is a summary of a few bits and pieces that stood out to me from ESWC2015. I haven't covered every session I attended or paper I saw, just the ones that remained with me (other people will do full summaries of all the paper sessions I'm sure, or you can refer to the programme or Fabien Gandon's closing slides which have an excellent summary). For a more 'live' overview of my view on the conference you can see everything I posted during it.
Overall, I had a great experience, met some fantastic people and absorbed lots of interesting ideas. I feel more positive about work in linked data; I'd been slacking off following the community for a while, but I've been reassured that there are plenty of practical-minded researchers out there who are doing great things, and I'll be paying more attention again henceforth. Daily swims in the sea probably didn't hurt.
The developers workshop was great, full of people positive about building tools and applications, and finding ways to make the power of linked data accessible to actual end users. The focus was on building for web developers rather than on end-user applications: most of the projects were libraries and tooling for working with linked data, as a way to bridge the gap. There was an air of frankness, with attendees keen to address problems openly, without handwaving or glossing over things that weren't working out. There was even live debugging during presentations.
Here's the program, with links to projects and repos.
I missed the final discussion session, but this was recorded and I hear it was good.
For a philosophy of the web workshop, the talks and discussions were around pretty pragmatic issues. In particular: how we can obtain true decentralisation, problems with centralised DNS and internet infrastructure, the lack of attention paid in this community to security issues, and the importance of understanding social processes and current practice for ensuring the web continues to function and that we don't "break it by accident" (Henry Thompson). These aren't things that tend to get much of a forum at conferences like ESWC, but semantic web academics, being at the forefront of a truly linked information space, should definitely be encouraged to think about the effects of our work on society, particularly on underprivileged people and minorities.
This workshop - usage analysis and the web of data - had a general focus on understanding and getting the most out of the web as we know it today, in order to shape the web we want in the future. As well as traditional paper submissions, they were also accepting submissions via blog posts, and will continue to accept articles on an ongoing basis, which is a great way to keep the discussion alive. I was gutted to miss Max van Kleek's keynote "Not in my Castle" because I got the timing wrong, but I hear it was awesome.
Harry Halpin and Francesca Bria worked on an interesting project to map social innovation projects (like hacklabs, open data initiatives, community enterprises) across Europe. It wouldn't have been strictly necessary to use linked data for this, and doing so might have actually caused the site to be pretty slow. However, it allowed them to do a bunch of interesting network analysis on the hundreds of different projects and organisations mapped and gain some insights into how to strengthen such initiatives (for example, by increasing collaboration opportunities). Also, I suspect technologies for building sites on the back of linked data have probably improved quite a bit since this work was started, so the speed issue might be easy to overcome. I paid attention cos I'm generally interested in putting stuff on maps but I'd really like to see more decentralised mapping things; projects/organisations publishing their information independently as linked data, such that they're in control of what's available, rather than having to submit their info to a centralised service.
Seyi Feyisetan from Southampton discussed different factors that affected the performance of crowd workers, by looking at features of the tasks themselves rather than the platform or rewards, when asking workers to classify entities in tweets. It was suggested that their results could be used to work out if NER on your microposts dataset would be better performed by machine, human experts or crowdworkers, depending on the contents of the dataset.
Revanthy Krishnamurthy presented about using general background knowledge and the contents of tweets to detect the location of Twitter users, as most Twitter users don't have geolocation enabled when posting. They did smart stuff like correlating mentions of events, landmarks and slang terms with physical places, but I don't remember them saying much about respecting the privacy of people who actively don't use geolocation.
The demos and poster session I thought was particularly lively. I liked that it didn't overlap with any other sessions, and breakfast cakes and fruit were distributed through the demos area. It was pretty cramped though, and possibly would have been better off earlier in the week too (it was in the morning of the last day).
I appreciate the principles and technologies behind Sarven Capadisli's Linked Research project, to the point that I implemented it for one of my own papers immediately. Encouraging web scientists to publish their research using the native web stack, using RDF to make research queryable and discoverable on a more granular level - in other words, to practice what we preach - is a worthy goal. And it was really easy to set up. Everyone should do it.
Entity annotation isn't something I know much about, but I think I understood a bit more after talking to Ricardo Usbeck about GERBIL, a tool for evaluating entity annotators. This easy to use online tool lets you compare some of the different annotators available against different types of datasets, to see which would perform the best for your particular use case, without you needing to access any of the test datasets yourself (as you often have to pay for licenses). You just plug your annotator in and leave it running, and it returns results. This also allowed them to check reported results of popular annotators and compare them according to different standards, for a more well rounded view of their capabilities. I dunno if the preceding paragraph made much sense, but that's what I got.
Had some great discussions about microformats, RDFa, schema.org, federated/decentralised social web stuff and whatnot. I was also approached by a bunch of people who knew who I was from reading my WWW2015 post o.O Which was weird, but most people seemed to like it.
Finally, this was one of the better catered conferences I've been to, so props for that :)
Just notes!
Workshop by Dr Mimo Caenepeel on Monday 22nd April.
'Critical' does not mean you have to pass judgement, or say why it's good or
bad.
Not taking things at face value.
Started with freewriting about what has particularly influenced / inspired our own research. Five minutes, not allowed to stop or edit, don't worry about quality of writing, not for anyone else to read. A good way to get ideas out of your head and start to organise your thoughts without censoring or constraining yourself.
How many pages will a review usually take up in a thesis? My policy is to write what needs to be written and stop when you're done. But apparently 20 to 30, sometimes more, is normal in sciences.
There's no consistent / right answer to 'how many publications to review'. For some people it's in the tens, for some the hundreds.
Think about how to integrate literature review into the thesis. You're unlikely to have a chapter that is just 'literature review' and no mention of the background reading elsewhere.
Good qualities for a lit review?
- Coherence (avoid fragmentation)
- Structure, clarity.
- Proof of novelty - purposeful.
A review can often be considered as an indicator of the quality of the rest of the research - demonstrating scholarship.
A good place to start:
1. Write your research question, formulated as a question.
2. Write up to five research areas that are relevant to your research
question.
3. Note some related issues/areas that will not be considered in your review.
Think about balance of content.
1. Three studies influential in your field (I couldn't answer this, I clearly
need to read more).
2. Two significant older contributions.
3. Five recent sources.
4. Two sources that have strongly influenced your thinking.
You don't need to consider all papers in the same level of detail. Decide which papers are more important / useful than others.
For some papers (important ones) you should work through these questions in
the same way every time you read something (this is 'SQ3R'):
1. Survey: What is the gist of the article? Skim the title, abstract,
introduction, conclusion and section headings. What stands out?
2. Question: Which aspects of the research are particularly relevant for your
review? Articulate some relevant questions the article might address.
3. Read: Read through the text more slowly and in more detail and highlight
key points / key words. Identify connections with other material you have
read.
4. Recall: Divide the text into manageable chunks and summarise each chunk in
a sentence.
5. Review: To what extent has the text answered the questions you formulated
earlier?
Critical reading (these seem like really useful questions to work through
whilst reading papers):
1. What is the author's central argument or main point, i.e. what does the
author want you, the reader, to accept?
2. What conclusions does the author reach?
3. What evidence does the author put forward in support of his or her
conclusions?
4. Do you think the evidence is strong enough to support the arguments and
conclusions, i.e. is the evidence relevant and far-reaching enough?
5. Does the author make any unstated assumptions about shared beliefs with
readers?
6. Can these assumptions be challenged?
7. Could the text's scientific, cultural or historical context have an effect
on the author's assumptions, the content and the way it has been presented?
See Ridley, D. The Literature Review: A step-by-step guide for students. Sage Study Skills Series. Sage Publications, 2011 (2008).
Last modified:
Just notes from a three-hour workshop about how to write an Informatics thesis, on the 16th of April.
State contributions (to knowledge) explicitly. Intro, conclusions; each chapter should have some (probably not all) contributions discussed. Be obvious; use headings.
Knowledge - background:
Evidence, well-reasoned arguments, acknowledge limitations.
Clear openings for future work. Be clear where they are.
Make it reproducible.
Short / concise. Examiners like short theses.
Introduce what's interesting and important.
When outlining thesis, look at structure of main argument, not of document.
Background material must have point. Only include as much detail as you need
to make point.
Points, eg:
Then we had five minutes to write down what our PhDs are about and what we have already found out. I wrote:
How do the futures of the Semantic Web and amateur digital content creation
fit together?
Can Semantic Web tools and technologies be used to enhance collaborative
creative partnerships and encourage fruitful outputs?
There are knowledge sharing systems and collaborative tools for scientific
fields and in education, but nothing for creative artsy things.
Attitudes towards data sharing and privacy amongst content creators are in
flux. There are lots of projects and energy around open data and
decentralised social networks that allow data to become portable and not tied
to one platform. One of TBL's visions for the Semantic Web is the dissolution
of data silos and 'walled' applications that disadvantage the user, and as
such the promotion of the 'ownership' of a user's data by the user themselves,
rather than the software or organisation that uses the data.
There are lots of reasons people make content. There are lots of reasons
people don't make content (who could / would like to).
[Notes resume]
Use backreferences; don't repeat yourself.
Info / advice
...homepages.../sgwater/resources.html
..homepages.../imurray2/teaching/writing
Style: Toward Clarity & Grace (book)
The Craft of Research (book)
When to start writing thesis?
Don't assume appendices will be read. More for extra info if needed by people trying to reproduce your work (not your examiners).
Too many direct quotes look like you don't understand and are avoiding explaining yourself.
Keep copies of web resources and cite access dates in case they change /
disappear.
Figures might be copyright if you just copy them from papers, even if you cite
them. Remake them, and put 'adapted from' as citation.
Examiners?
No grading system (ie no different levels of passed PhD). Might be external prizes if you want extra recognition.
The UK Ontology Networks Workshop took place over one day in the Informatics Forum.
There was a mix of people there; some talks were way over my head and very technical, and some talks were by people who confessed they had had to look up "ontology" that morning. And things in between.
Lazy writeup, but following are notes as I scribbled them:
John Callahan
US navy research.
Focused information integration.
Human intervention to keep predictive part on track. Tweaking.
Alan Bundy
Interaction of representation and reasoning.
Changing world so agents must evolve. How to automate? What would trigger a
need for change:
Inconsistency
Incompleteness
Inefficiency
how to diagnose which?
Interested in language and perception change.
Unsorted first order logic algorithm called Reformation. Based on standard
unification algorithm.
Allows blocking and unblocking unification.
Phil Barker
Schema.org
Cetis (JISC funded)
learning resource metadata initiative.
Big names behind schema.org.
= ontology + syntax
Big and growing ontology.
Dumbed down for people.
LRMI adds to it. W3C go through it. It's creeping, how much do the big names
actually care about stuff that's added?
don't know how Google uses it.
People should consider using it for more sophisticated search and
disambiguation.
Gill Hamilton
Doing more with library metadata. Learnt from OKFN. Had to convince people in
charge.
Dublin core, didn't like; not specific enough. Instead RDF > OWL. "We know
best how to structure our data"
Hardest was convincing marketing people that there was no commercial value. Metadata is advert to actual resource.
Enrico Motta
Traditionally top down approach. Now so many people are interacting with
semantic structures, so should involve users.
Recognise there isn't a unique or best way of doing things.
Initial study included modeling task with binary relations.
Patterns that are more or less intuitive. 4D least, 3D+1 most.
N-ary most widely used by experts.
Relationship between reasoning power and intuitiveness of writing? More creativity needed for simpler ones. (Not really sure what he's saying)
Email him for copy of study.
Chris Mellish
Ontology authoring is hard. Better ways to do it.
Controlled language input (mature tech); responsive reasoning (also mature, information as you're editing); understanding the process (beginning to understand more).
Hypotheses:
users don't know what they're doing. What if questions. Many answers, what is
relevant? Depends on context.
Authoring as dialogue.
Todo list.
Useable in the same ways as protégé.
Peter Winstanley
UN classification schemes.
Various vocabularies.
Allow development of cross mapping between government administrations.
Mostly internal currently. Moves to bring externalizing data into the 21st century.
Peter Murray-Rust
Fight for your Ontologies.
Ontologies in physical sciences. Chemists don't want ontologies. They'll sue
you.
Crystallography uses 'dictionary'. Written in CIF. 20 years to build CIF.
Compare physical sciences to government.
Every program author writes dictionaries that work for them. When different parties agree, promote to communal dictionary. Provide conventions to help disagreements.
Show a company can do it as opposed to a rabbiting academic...
Jeff Pan
Tractable ontological stream reasoning.
Need to be more efficient, scalable, as things change. Inputs from web.
Dealing with complexities: approximate OWL 2.
Dealing with frequent updates: to-add stream and to-delete stream. Truth
maintenance. Evaluation criteria.
TrOWL (trowl.eu) can be used with Protégé, also supports Jena.
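To make the add/delete stream idea concrete for myself, here's a minimal sketch in plain Python. It is not how TrOWL does it: the triple format, the single subclass rule and the function names `apply_update` and `infer_types` are all my own assumptions for illustration. It also recomputes every inference from scratch after each update, which is exactly the cost that proper truth maintenance tries to avoid.

```python
from typing import Iterable, Set, Tuple

# A fact is a (subject, predicate, object) triple of plain strings (assumed format).
Triple = Tuple[str, str, str]


def apply_update(facts: Set[Triple],
                 to_add: Iterable[Triple],
                 to_delete: Iterable[Triple]) -> Set[Triple]:
    """Apply one batch from the streams: deletions first, then additions."""
    return (facts - set(to_delete)) | set(to_add)


def infer_types(facts: Set[Triple]) -> Set[Triple]:
    """Naive closure for one rule:
    (x type C) and (C subClassOf D) implies (x type D).
    Loops until no new triple is produced."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "type":
                continue
            for c, q, d in list(inferred):
                if q == "subClassOf" and c == o and (s, "type", d) not in inferred:
                    inferred.add((s, "type", d))
                    changed = True
    return inferred


# Hypothetical data: one schema fact and one instance fact.
asserted = {("Sensor", "subClassOf", "Device"), ("s1", "type", "Sensor")}

# One update arrives: s1 is retracted, s2 is asserted.
asserted = apply_update(
    asserted,
    to_add=[("s2", "type", "Sensor")],
    to_delete=[("s1", "type", "Sensor")],
)

# Recomputing from the asserted facts drops anything that depended on s1.
closure = infer_types(asserted)
print(sorted(closure))
```

Keeping asserted facts separate from inferred ones is the point: when something is deleted, anything that was only true because of it has to go too.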
Edoardo Pignotti
Semantic web tech to support Interdisciplinary research.
ourSpaces VRE
Provenance crucial.
OPM prov ontology.
Deployed since 2009, 180 users. Comprehensive ontologies but people unwilling
to provide metadata.
paper! Edwards et al. ourSpaces.
Tom Grahame (BBC) @tfgrahame
Content arrangement on BBC sport by tagging, automatic to free up editors to
write.
LD API so systems don't need to know about each other.
Growing from simple rdfxml to more complex ontology.
Can ask much more general and much more detailed questions about sport.
Mapping incoming data is outsourced.
Lots of errors, sometimes system alerts, sometimes manual.
Working on opening the data. Maybe a dump, but licensing issues.
Ewan Klein
Mining old texts for commodities, adding place and time and putting in
structured database.
Transcriptions of customs import records.
Skos for synonyms.
Dbp concepts.
Why? Want to query.
Visualisations.
Tools? Python script.
Janice Watson
Harnessing clinical terminologies and classifications for healthcare improvements.
Bob Barr
Geographical addressing.
Addressing and address geocoding is important and broad. Not always postal,
but this not addressed (punlol) in ontologies.
Different contexts change meaning of address (for delivering, you only care
about postbox; property sale whole building).
Loads of things to address. Loads of reasons why.
Work held up as national address file is owned by royal mail and might be
sold!
Fiona McNeill
Run time extraction of data. Failure driven. Looking at extraction of specific
information.
Emergency response. Lots of data, timely sharing of data required.
From domestic level to humanitarian disasters.
How can it be automated?
Multilayered incompatibility.
Format
Terminology
Structure
...
Richard Gunn
Towards an intelligent information industry.
Elena Simperl (Soton, sociam)
Crowdsourcing ontology engineering.
CSrc: Brabham 2008.
Distribute task into smaller atomic units.
Humans validating results that are automatically detected as not accurate.
What are the costs? What resources?
Games with a purpose. Like quizzes.
Micropayments or vouchers.
MTurk. CrowdFlower.
Paper about usage of microtask crowdsourcing. ISWC 2012.
Claudia Paglieri
Ontologies in ehealth.
Enrico Motta - Rexplore
Klink algorithm mines relations between research topics.
Use this! Nope, it's not public. Uses MS Academic research.
Peter Murray-Rust
Content mining expands regular text mining.
Focus on academic stuff.
Chemical Tagger. Takes chemistry jargon and annotates it, knows actions,
conditions, molecules etc. NLP. Uses ontologies and contributes to
ontologies.
In chemistry, no need to put everything in rdf because there are already lots
of formalisms.
Proper cool PDF to sensible format conversion. Amy the kangaroo. Looking for
collaborators.
Yuan Ren
Ontology authoring in whatif project.
Reasoning with Protégé and TrOWL.
Tractable reasoning. Trowl v fast.
Notes from conversations / breakout discussions:
BBC use OWLIM triplestore.
Store all their datasets in svn. But they have reads and writes to the live
triplestore all the time.
Lots of people saying minimise owl use because of unpredictable output.
Versioning ontologies (available in owl2) in case third parties change stuff you use. You're dependent on their software engineering practices. Only good if they're ahead of the game.
IRIs, Arabic characters in ontologies!
Semantic heavy, maybe make a decision to abstract away to ids and make heavier
use of labels.
Difference between importing and using someone else's.
There's no (practically useful) software that lets you reason over stuff you haven't imported? (over HTTP?)
Build ontology from reality (data), don't start with no data.
Lode.
Problems with dbpedia URIs changing or disappearing.
Hard to visualize massive graphs. Relational, tabular much easier to understand.