Most expensive breakfast ever (Notes)
21gbp (€24.55 / $26.71 / £21)
Post created with https://apps.rhiaro.co.uk/latinum
Starting to think actually I can use a self-hosted etherpad for everything and there's no need for any other communication or note-taking applications.
Of course, I will think this until my server goes down and I lose everything.
Amy added https://hackpad.com/PrivOn-2015-1w1AVtigY92 to https://rhiaro.co.uk/bookmarks/
In reply to:
Notes post social-informatics seminar talk
methodology: iterative, inductive
ResearchGate > Academia.edu
Research in use of avatars
Where does the 'good enough' boundary lie for managing? When is it dangerous, when serendipitous? What 'matters'? There is no perfect model of social management/access control.
Easier to look at different systems doing the same thing in different ways (things of a similar species)
Profiles shaped strongly by the needs of the service providers.
Problems of their own survival are also played out in this space.
Difficulties people may have managing profiles are sometimes down to easily fixable poor design.
Interests of people fight with interests of the service.
Github
Definition of ownership
ACNE and other diseases communities on twitter - people create second accounts to participate
"epidemic communication patterns" - Robin Williams
I had two meetings with Dave Robertson, my second supervisor, about what on earth I'm doing, and here is a vague summary of my thoughts afterwards.
I came to the realisation between meetings that I need to scrap the term Amateur Creative Digital Content, because amateur doesn't really apply by its true definition and creative is too subjective anyway.
Focus on content creators, not content (so previous point doesn't matter so much anyway; maybe just need to look at existing ways people are describing types of users to make it clear who I'm concentrating on).
In terms of emphasis of the thesis, I need to make a choice between taking a cognitive science/sociology perspective and a tecchie/engineering perspective (I choose tech because that's where I'm most comfortable, but the sociology side of things is still important).
(Therefore) I need to think concretely now about technology architecture.
Not to get too hung up on the Semantic Web; the technologies are a vehicle for testing theories, rather than an end in itself (though I still think facilitating a big linked data set of this sort of data is useful in the long run for research and practical applications, I didn't labour that point).
Social machines, and how Dave's process modelling language fits in, which I think I get in theory but not practice (I'd probably have to look at a working application and code to understand really). Some of the principles may be useful further down the line, but probably not the language itself or anything.
Technology-wise, I'm not thinking about anything novel or new, but more new ways for how various Web and SW technologies are combined and applied to this domain. (?)
So maybe the novelty is in marking up various things about content creators and using this to infer information about the processes they're involved in (or want to be involved in) in order to then facilitate these processes, without (necessarily) ever explicitly representing these processes (because from the content creators' perspective, they're certainly not thinking in terms of formal representations of processes, and in many cases won't know what they're trying to make until it's done, for example).
How to represent the inferences made might be novel and exciting, but I don't know.
Hmm, I still don't think I've figured out how to evaluate... anything. Beyond comparing activities of users with magical-new-system vs without magical-new-system. And maybe, going back to the this-big-dataset-is-useful idea, by finding questions we can now ask about these kinds of communities that we couldn't before because they were so fragmented.
Last modified:
For easy navigation of my gibbering.
First impressions of Madrid (6th and 7th,
Mon 8th: Research in theory and practice, and where on earth am I?
Tue 9th: Collaborative ontology engineering and team formation
Wed 10th: Practical semantics, and human nature.
Thu 11th: Social semantics and serendipity
Everything from Friday 12th July
Final impressions of Madrid (14th, #travel)
Make a model of what is happening.
WeSenseIt - citizen water observations.
River belongs to citizens, not authorities.
Physical sensors (hard layer) are expensive and brittle.
So use people instead (soft layer, social).
Give people small sensors. Phones.
Then you just need software for information management.
Can't rely on phones.
Old people in Doncaster.
Give them easy sensors instead.
Costs about EUR 80.
Open Source & hackable.
Not expected to substitute professional sensors, but a way to crowdsource information you would never get.
In Delft
Give people flood preparation advice and record who ticks things off, to build a picture of who/how/when preparations take place.
The Floow Ltd
"Commercialises data solution for telematic insurance."
World divided 10x10m squares, sense things everywhere.
Traffic risks.
Sensors tell you people are going somewhere, not why.
That's what social media can tell you.
Monitoring development of a house fire via Twitter.
Seeing events through the eyes of the community.
Social streams:
Large music festival. Monitor geolocated messages, trends, topics and relations.
Most 'critical' events were management issues.
Developing system to warn you automatically about things to pay attention to.
Look/listen for event within 72 hours. 10 minutes to find out what it was.
- Simulation of station bombing.
Minute by minute description of event.
1.5 billion messages.
Four things when monitoring:
Big problem - people tweet crap!
People don't realise when people nearby are in danger.
Deception on social media
False crowdsourcing political support on social networks.
Smear campaigns using bots.
Bots to foster / prevent social unrest.
Identifying bots
23 behavioural features.
Feature set is open.
Recognise 90% of bots - more than humans can do.
Only a very small proportion of tweets are geolocated, so geolocation is useless.
Have to use the text.
Timestamp is not necessarily correct.
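The notes above mention 23 behavioural features for recognising bots, but don't list them. As a minimal sketch of the idea, one commonly cited signal is how regular an account's posting rhythm is; the feature, threshold, and data below are all invented for illustration, not the real system's:

```python
import statistics

def features_from_timestamps(ts):
    """Derive simple behavioural features from posting timestamps (seconds)."""
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return {
        "mean_gap": statistics.mean(gaps),
        "gap_stdev": statistics.stdev(gaps),  # low = suspiciously regular
    }

def looks_like_bot(ts, regularity_threshold=1.0):
    """Flag accounts whose posting intervals are almost perfectly uniform."""
    f = features_from_timestamps(ts)
    return f["gap_stdev"] < regularity_threshold

# A script posting every 60 seconds exactly, vs. irregular human-like gaps:
print(looks_like_bot([0, 60, 120, 180, 240]))   # very regular
print(looks_like_bot([0, 45, 300, 320, 900]))   # irregular
```

A real classifier would combine many such features (follower ratios, text entropy, client metadata) in a trained model rather than a single threshold.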
Issues in events
No infrastructure (eg. at music festivals).
Phone signal issues, phone charging issues.
Most tweets from outside event.
Conclusions
Need to convince citizens that authorities are not spying on them.
Need to convince authorities that citizens are not all criminals.
Privacy and legality issues.
Creating a company on this research would be unethical.
Need to pass the right message. Full disclosure. Non-intrusive use of tweet content.
What happens when authorities demand this technology for privacy-invading stuff.
Have to be careful with what you publish.
Always assume the bad guys have thought of what you thought of.
Always be in a situation where you can destroy your data at short notice.
Big legal barrage behind them. Know what they are/aren't allowed to do, know what they do/don't have to do.
Start leading a blameless life.
Tools and Techniques
Recommender systems
Input: Set of users + set of items + rating matrix.
Problem - given user, predict rating for an item.
In real world, recommendation matrix data is sparse.
Can use hybrid approaches.
Collaborative RS:
Knowledge-based RS:
User-based collaborative recommendation:
Item-based collaborative recommendation:
Content-based RS: item descriptions and a profile of user interests.
Don't necessarily have complete descriptions of items - just have a 0 in your vector.
Similarity between items:
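The user-based collaborative approach listed above can be sketched as: find users who rated items similarly, then predict a rating as a similarity-weighted average of their ratings. The toy rating matrix and names below are invented for illustration:

```python
import math

# user -> {item: rating}; sparse, as in the real world
ratings = {
    "alice": {"film1": 5, "film2": 3, "film3": 4},
    "bob":   {"film1": 4, "film2": 2, "film3": 5, "film4": 4},
    "carol": {"film1": 1, "film2": 5, "film4": 2},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

print(predict("alice", "film4"))  # alice never rated film4
```

Item-based collaborative recommendation is the transpose of this: compute similarity between items' rating columns instead of between users' rating rows.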
Using LOD
To mitigate lack of information/descriptions about concepts/entities.
Recommender systems are usually vertical, but LD lets you easily build a multi-domain recommender system.
To avoid noisy data, you have to filter it before feeding your RS.
Freebase.
Tiapolo
Vector space model for LOD
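One way to read "vector space model for LOD" is: represent each item as a (binary) vector over its linked-data property-value pairs, then compare items by vector similarity, which mitigates missing ratings. The DBpedia-style features below are invented for illustration:

```python
import math

# item -> set of (property, value) features, e.g. links in a LOD description
features = {
    "BladeRunner": {("director", "RidleyScott"), ("genre", "SciFi"), ("subject", "Dystopia")},
    "Alien":       {("director", "RidleyScott"), ("genre", "SciFi"), ("subject", "Space")},
    "NottingHill": {("genre", "Romance"), ("subject", "London")},
}

def item_similarity(a, b):
    """Cosine similarity between binary feature vectors (set-overlap form)."""
    fa, fb = features[a], features[b]
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / math.sqrt(len(fa) * len(fb))

print(item_similarity("BladeRunner", "Alien"))        # shared director + genre
print(item_similarity("BladeRunner", "NottingHill"))  # nothing in common
```

A fuller version would weight features (TF-IDF style) so that rare properties count for more than ubiquitous ones.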
Collaborative ontology development from Natasha Noy (sssw2012)
Stanford, Protégé.
In past 10-15 years, through collaboration with scientists (particularly biomed), ontologies have become essential.
Don't need to sell ontologies to scientists, they believe in it.
Focus on science because that's where she has experience etc.
We're not so bad at versioning ontologies, more versioning data is the problem.
Experts add stuff, curator checks quality, and publishes upcoming tasks.
Similar to open source developments, but no research to compare the two.
- Different because biomed people are paid (well).
ICD - International Classification of Diseases.
Conflict resolution:
Users expect stuff like Web 2.0 interactions, Web interface. Web Protégé:
RELEVANT.
Users (consumers?):
More data than just what you see in the media (cue my Venn diagram).
Plus, eg. paintings - lots of 'cultural baggage'.
Care more about the story than the media.
Interpretation by end users. Hopefully message that the author intended.
Meaning of combination of assets.
eg. Exhibition of artists work.
Interacting further with the media.
(SW and multimedia community need to work together).
-> Raphael Troncy on Friday - attaching semantics to multimedia on the Web.
Need mechanisms:
Workflow for multimedia applications
Heard of MPEG-7? Don't bother... it's very much from a media algorithms perspective.
Applications:
Canonical processes overview...
There's a paper.
CeWe photobook - automatic selection, sorting and ordering of photos.
Context (timestamp, tags) analysis and content (colours, edges) analysis.
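The context analysis above (timestamps) can be sketched as: sort photos chronologically and start a new event whenever the gap exceeds a threshold. This is a guess at the general technique, not CeWe's actual algorithm; filenames and times are invented:

```python
from datetime import datetime, timedelta

photos = [
    ("beach1.jpg", datetime(2012, 7, 9, 10, 0)),
    ("beach2.jpg", datetime(2012, 7, 9, 10, 5)),
    ("dinner.jpg", datetime(2012, 7, 9, 20, 30)),
]

def group_by_event(photos, gap=timedelta(hours=2)):
    """Cluster chronologically sorted photos into events by time gap."""
    ordered = sorted(photos, key=lambda p: p[1])
    events, current = [], [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[1] - prev[1] > gap:
            events.append(current)
            current = []
        current.append(cur)
    events.append(current)
    return events

print([len(e) for e in group_by_event(photos)])  # → [2, 1]
```

Content analysis (colours, edges) would then order and lay out photos within each event.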
Things from these you want to represent in your digital system (ie with LOD):
COMM - Core Ontology for Multimedia.
Premediate and construct message - human parts, she doesn't expect them to be digitised any time soon.
Using Semantics to create stories with media
Can we link media assets to existing linked data and use this to improve presentation?
How can annotations help?
Vox Populi (PhD project)
Traditionally video documentary is a set of shots decided by director/editor.
vs.
Annotating video material and showing what the user asks to see.
interviewwithamerica.com
Annotations for these documentary clips:
Automatically generated coherent story.
Vox Populi has a GUI (not for human consumption) for querying annotated video content.
User can determine subject and bias of presentation.
Documentary maker can just add in new videos and new annotations to easily generate new sequence options.
User information needs - Ana Carina Palumbo
Linked TV. Enhancing experience of watching TV. What users need to make decisions / inform opinions.
Experiment - oil worth the risk?
Published at EuroITV.
Conclusions
Questions
Hand annotations are error prone - how to validate?
Media stuff - there can be uncertainty, people don't always care.
Motivating researchers to annotate...
Make a game.
Store whole video or segments?
W3C fragment identification standards - timestamps via URLs.
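The W3C Media Fragments URI spec does this with a `#t=start,end` fragment on the media URL. A small sketch of building and parsing such URLs; the helper names are invented:

```python
from urllib.parse import urldefrag

def temporal_fragment(url, start, end):
    """Append a Media Fragments temporal clause (#t=start,end, in seconds)."""
    return f"{url}#t={start},{end}"

def parse_temporal(url):
    """Extract (start, end) seconds from a #t=start,end fragment, if present."""
    base, frag = urldefrag(url)
    if frag.startswith("t="):
        start, _, end = frag[2:].partition(",")
        return float(start), float(end or "inf")
    return None

print(parse_temporal(temporal_fragment("http://example.org/video.mp4", 10, 20)))
```

So a segment can be addressed without storing it separately: clients that understand the scheme fetch or seek to just that interval of the whole video.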
Things to investigate further!
LibRDF - linkeddata-perl for Debian by Kjetil Kjernsmo.
Rakebul Hasan, Fabien Geandon (? not sure about names, can't read my handwriting..) - Trustworthiness of inferences.
Taldea - fostering spontaneous communities.
Ghada Ben Nejma.
NERD ontology for spotting entities.
nerd.eurecom.fr
See photo:

It's a dynamic world! Ubiquitous streams and the Linked Data Web from Manfred Hauswirth (pooh42)
Streams: any time-dependent data / data that changes over time.
Has done a paper about P2P stuff.
Data silo - "natural enemy of SW scientists"
Massive exponential growth of global data.
Still have to integrate dynamic data with static data.
Multiway joins are the dominating operator. Need to be efficient.
Everything/body is a sensor.
Various research challenges:
CoAP ~= http for sensors.
Stuff about sensor networks and context - useful for Michael.
Rewrite query to spit out static and dynamic - lots of overhead.
But need to optimise between these.
Neither existing stream processing systems nor existing databases could be efficient enough.
So they built their own LD stream processing system (optimised and adapted existing database stuff).
HyperWave - didn't succeed. Didn't listen to customers and wasn't open source (license fees).
But better than hypertext was back in the day.
Performance important for success/uptake.
Just putting it on cloud infrastructure doesn't mean it scales.
To do?
World is:
... uncertain, fuzzy, contradictory.
So combine statistics and logics.
Hard to scale logical reasoning, so use statistics to shoot in the right direction.
Privacy?
Don't get hung up on approaches / labels.
Introduction to Linked Data from Mathieu d'Aquin
Linked Data = universal connections, like Lego.
Universal is why it's important.
The only problem we had during the workshop was disagreement about how to read the 'broader' and 'narrower' relations between courses. It instinctively read contrary to what (my) common sense suggested (eg. that 'arts and humanities' is broader than 'history', which some people disagreed with). A quick reference to the ontology documentation resolved that.
Semantic Web questions we couldn't ask 10 years ago from Frank Van Harmelen
Semantic Web & Web of data = a more manageable mission.
Metaweb movie - got bought by Google and incorporated into Knowledge Graph.
SW Principles:
1. Give everything a name (entities).
2. Relations form graph between things.
3. Names are addresses on the Web (so we inherit properties of Web like AAA).
This becomes Giant Global Graph. (Maybe SW should be called Giant Global Graph?)
4. Add semantics.
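The principles above reduce to a graph of triples with universal names. A minimal illustration (not any particular triplestore's API; the URIs and pattern-matching helper are invented), with None as a wildcard standing in for querying:

```python
# (subject, predicate, object) triples; names are Web addresses
triples = {
    ("http://example.org/TimBL", "invented", "http://example.org/Web"),
    ("http://example.org/Web", "type", "http://example.org/InformationSystem"),
    ("http://example.org/TimBL", "type", "http://example.org/Person"),
}

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None matches anything."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(match(p="type"))  # everything with a declared type
```

Because the names are Web addresses, the same pattern works across graphs published by different parties, which is what makes the combined graph "giant" and "global".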
Google: from just links to results, to information boxes (last May). Can't directly address the Google Knowledge Graph.
NXP (microprocessors): 26,000 products. Integrated all databases into a triplestore. Exposing a subset of the triplestore to customers.
BBC: 125 million triples. Many data sources. APIs to website. Own ontologies.
All have the same triple-layer architecture:
Raw data -> SW layer -> Output / API / UI etc.
DataGov: eg. air quality in cities, campaign money, if policies work.
Companies don't care about SW, but are using these technologies for their own IRL purposes.
These are all different types of use cases of SW technologies:
It's important that the SW graph is so big.
The LD cloud is still poorly interconnected, but it has good graph properties.
SameAs.org
Heterogeneity is unavoidable.
Socio-economic, first to market - why certain systems/ontologies get used, eg. schema.org, dbpedia.
Self-organisation.
LD cloud grew, nobody designed it.
Knowledge follows a power curve. This has an impact on mapping and reasoning, storage and indexing.
Distribution.
Web not geared for distributed SPARQL queries. Everyone pulls in all data and queries a local copy. Not very 'webby', disadvantageous. So subgraphs? Query planning? Caching? Payload priority?
Provenance.
Representation, (re)construction. Metametadata (knowledge about knowledge; uncertainty; problems with vocabs for this).
How to get from provenance to trust.
Dynamics (change).
Cool Web in 60 seconds graphic.
SW not changing this fast, but soon..
Errors and noise.
Sometimes we disagree.
Deal with by: avoid, repair or contain. Or just deal with it - allow argumentation.
Fuzzy, rough semantics - almost, maybe.
Lots of research questions. But not ones we could ask 10 years ago.
Information universe - "algorithms exist without us looking at them".
We should ask if things work in theory.
Scientists vs. engineers.
Discovering vs. building.
Says we should change our mindset from building stuff to hypothesising and falsifying.
Just notes from a three-hour workshop about how to write an Informatics thesis, on the 16th of April.
State contributions (to knowledge) explicitly. Intro, conclusions; each chapter should have some (probably not all) contributions discussed. Be obvious; use headings.
Knowledge - background:
Evidence, well-reasoned arguments, acknowledge limitations.
Clear openings for future work. Be clear where they are.
Make it reproducible.
Short / concise. Examiners like short theses.
Introduce what's interesting and important.
When outlining the thesis, look at the structure of the main argument, not of the document.
Background material must have a point. Only include as much detail as you need to make the point.
Points, eg:
Then we had five minutes to write down what our PhDs are about and what we have already found out. I wrote:
How do the futures of the Semantic Web and amateur digital content creation fit together?
Can Semantic Web tools and technologies be used to enhance collaborative creative partnerships and encourage fruitful outputs?
There are knowledge sharing systems and collaborative tools for scientific fields and in education, but nothing for creative artsy things.
Attitudes towards data sharing and privacy amongst content creators are in flux. There are lots of projects and energy around open data and decentralised social networks that allow data to become portable and not tied to one platform. One of TBL's visions for the Semantic Web is the dissolution of data silos and 'walled' applications that disadvantage the user, and as such the promotion of the 'ownership' of a user's data by the user themselves, rather than the software or organisation that uses the data.
There are lots of reasons people make content. There are lots of reasons people don't make content (who could / would like to).
[Notes resume]
Use backreferences; don't repeat yourself.
Info / advice
...homepages.../sgwater/resources.html
..homepages.../imurray2/teaching/writing
Style: Toward Clarity & Grace (book)
The Craft of Research (book)
When to start writing thesis?
Don't assume appendices will be read. More for extra info if needed by people trying to reproduce your work (not your examiners).
Too many direct quotes look like you don't understand and are avoiding explaining yourself.
Keep copies of web resources and cite access dates in case they change / disappear.
Figures might be copyright if you just copy them from papers, even if you cite them. Remake them, and put 'adapted from' as citation.
Examiners?
No grading system (ie no different levels of passed PhD). Might be external prizes if you want extra recognition.