"This is an html view on the data, for humans. Is it so hard to just make it readable?!" Realtalk from http://twitter.com/essepuntato
- Wrote some of the longest SPARQL queries I've ever seen.
- Disambiguated all the things.
- Added all the Things to DBpedia.
- Got my very own BBC/things guid.
- Corrected typos in public ontology documentation.
- Discovered rewriting a script from scratch in Python is easier than learning Scala to make a couple of small changes.
- Wrote an IRC bot script that replied in the affirmative if anyone mentioned falafel in the team channel.
- Accidentally made every artist in the triplestore owl:sameAs David Bowie.
- Categories (names of emotions)
- Dimensions (ways of describing intensity, I think)
- Modality types (the means through which the emotion was expressed, eg. face)
- Regulation attributes (response to an emotion)
- Appraisal attributes (a list of other descriptive terms; emotional metadata, if you will)
- Emotion times (start and end)
- Most recent videos
- Most viewed
- Top rated
- Most discussed
- Top favorites
- Most linked
- Recently featured
- Most responded
- Google+
- Myspace
- Tumblr
- Blogger
- deviantArt
- WordPress
- SoundCloud
- Orkut
- Flickr
- Google Play
- iTunes
- Zazzle
- CafePress
- Spreadshirt
- rhiaro.co.uk/vocab/oocc# for the ontology spec for any terms of my own (when I write it)
- rhiaro.co.uk/cc/onlinepersona/ for OnlinePersonas
- rhiaro.co.uk/cc/content/ for content, when I get that far.
- Follow the links to find more connections and/or verify ones I've already found. For common social and content sites, I can manually scrape useful information or use their APIs. For independent websites or things I haven't come across before, I shall devise some means to not ignore them altogether...
- Grab other stuff from the YouTube profile and handle it in the same way. Featured channels may link to other channels the content creator is involved with. Subscriptions and mutual friends may be a good place to go for building up the network.
- Put more into the graph than just the FOAF OnlineAccounts. Start on content..
ESWC 2015, Slovenia
☑ Attending!
RSVP
TA and Marker, Semantic Web Systems
School of Informatics, University of Edinburgh
Same as last year, but with twice as many students. Tirelessly answered student emails, made a few supplementary materials, mostly got the feedback sent on time; was nominated for a Best Teaching Award ^^ Oh, also organised a hands-on workshop this year, because I generally disagree with lectures.
Blog Post URIs
My instinct is telling me to separate a 'raw' blog post from a rendered version which includes html markup, css styles, replies from other people, a header, footer, maybe links to other pages on my website on a menu somewhere, ...etc. Because an instance of a blog post is not the same as the post mashed into an HTML page along with replies and other stuff. The replies can stand alone on the social web, so the original post should be able to.
So 'raw' post URIs are in the form https://rhiaro.co.uk/llog/unique-post-slug and the rendered versions are found at https://rhiaro.co.uk/yyyy/mm/unique-post-slug. The rendered versions have a slashy-datey URL because it makes it obvious that you can filter back to temporal aggregates by removing parts of the URL. It's just plumbing and a bit of UX++, which is partly why I don't want to use this format as the URI for the blog itself (I might change my plumbing one day).
Dereferencing the rendered URI obviously returns the rendered page in all its glory.
Dereferencing the 'raw' URI should still return a document, not redirect, because a blog post can be retrieved over HTTP. But what exactly should it return? A bunch of plain text turtle maybe; all the triples with the URI as the subject. Or minimally marked up HTML (including microformats, I suppose, and maybe RDFa). I guess I just have an arbitrary decision to make here.
So what's the relationship between the 'raw' URI and the rendered one? Maybe <raw> foaf:isPrimaryTopicOf <rendered> and <rendered> foaf:primaryTopic <raw>? (Or maybe I should coin a new renderedAt property to make the relationship clearer.)
But when people want to start sending replies from their own sites, they're automatically going to want to use the rendered URI as the subject of their in-reply-to, however they do it, which would be technically incorrect; they should be replying to the blog post itself, not this rendering of it. Or maybe this is a context in which it doesn't matter, and the 'raw' and the rendered are essentially sameAs each other.
Maybe I call the 'raw' a permalink, and make it clear on the rendered page, and hope people point their replies at the right place.
But if people do muddle the blog post and the (essentially) page-about-the-blog-post in their interactions with it, it probably doesn't matter; I can disambiguate them appropriately on my end. Maybe do some contextual reinterpretation experiments in the process (watch this space).
How do other people handle this? Well, the people who are replying to each other's blog posts from their own sites are indieweb, and they're not necessarily big on linked data or pedantic arguments about what a URI 'means'. Todo: Find out if anyone has implemented a blog or similar with LDP.
ISWC Context, Interpretation and Meaning Workshop
My paper Roles and Relationships as Context-Aware Properties on the Semantic Web was accepted to the ISWC Context, Interpretation and Meaning workshop. I'll be presenting it on the 17th of October, in Riva del Garda, Italy.
Linked Data at the BBC
I've been offered a 3-month contract at the BBC, to work as a data architect on their Linked Data Platform team in London!
The BBC uses linked data to model all of the things of importance to their audience: people, places, events, news storylines, and general concepts which are used to group things together through tags. A lot of the power is internal only right now, used by journalists for categorising creative works, but more and more audience-facing parts of the website are powered by linked data.
I'm excited about seeing how linked data works in 'the real world' (as opposed to academia), and really interested to see how various semantic web theories translate in practice.
Junior Data Architect, Linked Data Platform
BBC, London
Worked with an amazing team to boost the profits of Mr. Falafel in Shepherd's Bush and on the side helped with modelling the world as the BBC sees it, and learnt all of the corners to cut and ideologies to give up in order to develop linked data applications to improve the lives of people who don't know/care about linked data.
Key achievements include:
ARC2 SPARQL Endpoint
So Slog'd got stuck for a little while because the fast, nice-looking, somewhat magical SPARQL endpoint provided by ARC2 stopped working for no discernible reason.
I thought I'd try leaving it alone for a few weeks to see if it started working again by itself, but alas, it has not.
Everything is fine until I try to query for a specific predicate. (Specific objects or subjects are fine). The query runs, it just returns no results. I know the data is in there, because I can get it out with less specific queries. Also because I can see it all in the MySQL database on which it is based. When I left it, it was working fine.
I'm going to kill the database and set it up again.
I did this by - and oh, it was joyous - going into the database settings and appending '2' to the name of the database. I then reloaded the endpoint page, and it set everything up by itself :)
I inserted two triples, and successfully queried for a specific predicate. So, it works. I wonder what will happen if I dump all my old data back in there? (I validated the raw file with all the triples in RDF/XML, and they're fine.)
I inserted the rest: LOAD <path/to/rdf.rdf> INTO <>
Ran a test query, aaaand... it's fine.
So what the hell was wrong with my other database? Perhaps I'll never know...
Ontology of the Feels
I had an idea for a tiny wee project to do with quantified self. More on that later.
Because I'm trying to use linked data for everything, for reasons beyond the scope of this post, the first thing I did was sketch out the data I need to store in a graph structure. I need to record emotions, so I did a quick search for ontologies that represent emotions, figuring psychologists and the like must have been at this for years already.
Sure enough I found a few, but the most convincing one, the HUMAINE Emotion Annotation & Representation Language (EARL) is in XML rather than OWL.
Yay! Time to convert a well structured and useful dataset into RDF. Always a Good Thing.
EARL
EARL comes as many files, and goes beyond what I need. But it's not huge, and with a little effort (and looking some stuff up on Wikipedia) I think I can understand what's going on enough to convert the lot.
Note: There's apparently a lot of disagreement about terms and stuff in this area. Not something I'm invested in, so I'm just going to roll with this XML.
There are:
Emotional occurrences can have all of the above as properties, as well as probability and intensity. Complex emotional occurrences have times, and contain a minimum of two emotional occurrences with the above properties.
The terms are all taken from various different psychological experiments or schools of thought. There are alternative versions of some of these things from something called AIBO. Arbitrarily I'm ignoring everything prefixed AIBO for now.
Converting
I'm going through the files and writing everything relevant out, then drawing it as a graph.
First juncture: do I use all the attributes (like the list of 55 emotions) as properties (as they are demonstrated in the original XML) or use classes? Properties seems messy, and feels less extensible, even though technically I suppose it's not.
Maybe they should be properties. Except the categories: they all (or at least most) have corresponding DBpedia entries that it would be stupid not to take advantage of. But the dimensions, regulation and appraisal might be better suited to being properties, otherwise I end up with pointless identifiers or blank nodes everywhere. And nobody wants that.
I adjusted the Samples thing a bit, mostly to simplify it, and I may have got it wrong, but I think it makes sense.
Then I typed it all into WebProtege. As a result, I think quite a few things are overspecified. What do you think? Check it out: http://vocab.amy.so/earl.
ARC2 Named Graphs
Dimly aware that ARC2's triplestore uses named graphs, I decided to check where all my triples are at the moment. Turns out any inserted from php scripts are inside a graph whose URI is the file path to the script, eg. <http://localhost/testclasses.php>.
Maybe I should think about more proactively sorting my graphs.
TA and Marker, Multi-Agent Semantic Web Systems
School of Informatics, University of Edinburgh
Marked coursework where undergrad/taught masters students had to convert an open dataset to linked data and query it and stuff. It was pretty fun. Also helped students understand the course materials by email.
URIs for content creators and content
Generating unique IDs
In a centralised system, I could generate my own unique IDs by whatever means, assign them, and be done with it.
I thought briefly about trying to generate human-readable unique IDs, but this article made me decide that that will all end in tears.
Maybe for now I assume that people won't need to remember their OnlinePersona URI... Dangerous? Maybe. Maybe not. Maybe it's more likely that someone will be searched for by all the properties of their OnlinePersona, but the OnlinePersona itself doesn't matter directly. We shall see.
So on that note, Python's UUID will do. They're long and horrible. But I'll get over it.
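Minting one is basically a one-liner; sketched here with the /cc/onlinepersona/ base from my URI notes (the helper function name is my own):

```python
import uuid

BASE = "https://rhiaro.co.uk/cc/onlinepersona/"

def mint_persona_uri():
    """Mint a unique OnlinePersona URI; no central registry required."""
    return BASE + str(uuid.uuid4())

print(mint_persona_uri())  # long and horrible, but unique
```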
Power to the people
How do I persist creator and content URIs in a non-centralised, user-owned network?
People would need the option to change their URI to whatever they felt represented themselves, like their personal 'about me' page. Trying to enforce content negotiation and a Document != Person mentality here might be difficult.
Ultimately it doesn't really matter what their URI is as long as it resolves and persists, right? And if it doesn't resolve, or even disappears entirely, it's kind of rubbish, but not Web-breaking. Kind of the reason the Web still holds up, and the reason the Semantic Web is an extension of that.
Assuming a distributed, Diaspora*-esque 'Pod' structure for this network, if a user moves to another Pod and as a result must change their URI, the protocols involved essentially need to require leaving a 'forwarding address' to their new URI. Maybe, in this scenario, URIs are handled differently altogether. Separately from the Pods. You can get a URI from the Pod you just joined, or you can use your own or generate one from a provider.
How do you authenticate changing of a URI? Someone could essentially steal someone else's identity by switching out their URI... so... that can't happen.
Maybe I'm thinking too much. I might need to talk to someone smarter about this.
Network of creator profiles
I'm trying to automatically find connections between accounts on different networks - social networks, content hosting sites, other? - that are held by the same Agent. I'm starting with YouTube, because that's a good source of content creators.
Who?
I haven't figured out a way to reliably pick channels at random (and have since decided that wouldn't be a good way of doing it anyway due to the long tail of people who don't upload anything at all, let alone are 'active' content creators), so I'm starting with the 'standard feeds'. These used to, more sensibly, be called Charts. They're RSS (or Atom or JSON) feeds of statistics about content or channels. They no longer appear on the frontend of the site, but are available if you know where to look. They are mentioned in some of the API documentation, which is referred to in the YouTube Help about generating your own RSS feeds from YouTube content. The standard feeds are:
Most of these are useful in finding popular videos, which means there's a good chance the uploader has a wide network of connections within YouTube (which I can follow to get more information). Many, though, will be one-hit wonders. I've picked Top favorites as a list that intuition says will be more likely populated by videos from channels to which viewers have some kind of loyalty. These days everything you do on YouTube shows up in your friends feeds, so people may favourite a video as part of building their own identity on the service, as well as to support the content creators they love. It demonstrates an active, positive, reaction to the video. It's the content creators who produce content that is received in this way that I'm ultimately aiming to support. Most viewed, discussed, linked and responded could simply be controversial. Recently featured is some YouTube-inner-circle conspiracy, no doubt. This is all my opinion; if anyone has any better insights on these charts, please do let me know.
I'm also limiting the charts to 'this week', to get a fairly - but not too - rapid turnover of data. 'Today' might give me too many less-established one-hit wonder, viral of the moment types; longer term establishes some sort of consistent enjoyment of the video by the masses. 'All time' is a fairly unchanging list, and would mean all my research is based around Charlie Bit Me. (Although this in itself might be an interesting study of content creator evolution; the original video was aimed at close family and friends, went viral by chance, and since the parents and children involved have built a many-$ content creation empire, with sequels and merchandise and all sorts. They've easily made enough from ad revenue to put both kids through college. But that's another discussion).
Why though?
I'd like to know which other networks are most commonly linked to by active content creators. This might indicate what kinds of interactions are meaningful to them. Social networks for interacting with fans? Other content host sites for different versions of their content, or different media types? Independently run websites and portfolios? Online merchandise stores? Other people's content they want to share with their viewers (friends and collaborators)?
It might also be interesting to try to find out how often people reuse the same username across sites. And do people link to profiles on other sites that aren't their own? Either profiles they share with collaborators or friends, or just other people's profiles entirely? How can I reliably differentiate?
YouTube's provisions for external account linking
YouTube allows people to put links on their channel. They can choose up to four 'social' links to display icons for over their channel banner, plus one 'custom' link. They can also input as many custom links as they like which show up in a list in the About section of their channel.
The predefined list of 'social' links from YouTube is:
There are crucial things missing from this list, I'm sure - Bandcamp, Newgrounds, off the top of my head - but if this is what YouTube thinks its users want to connect to, then it seems like as good a place to start as any. And of course, if a chosen profile doesn't appear on this list, they can add it (labelled however they want) in the custom links section. The custom links section is also often used for listing secondary (or tertiary or group) YouTube channels, which are fairly commonly found amongst active YouTubers.
Getting these links programmatically
The YouTube API (v3) is balls when it comes to giving me information that is useful in this regard.
Scraping time!
Currently all of these links, regardless of banner, social, or custom, conveniently reside in <li>s with a class of custom-links-item. I BeautifulSouped them out. (Why I can't get this information through the API, I don't know).
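The extraction step, roughly — the class name is the one currently in YouTube's markup, so this breaks the moment they change their frontend (the sample HTML is obviously a stand-in for a fetched channel page):

```python
from bs4 import BeautifulSoup

def extract_links(html):
    """Pull account links out of a YouTube channel About page."""
    soup = BeautifulSoup(html, "html.parser")
    # Banner, 'social', and 'custom' links all sit in the same <li>s
    items = soup.find_all("li", class_="custom-links-item")
    return [a["href"] for li in items for a in li.find_all("a", href=True)]

sample = """<ul>
  <li class="custom-links-item"><a href="https://twitter.com/example">Twitter</a></li>
  <li class="custom-links-item"><a href="https://example.tumblr.com">Tumblr</a></li>
</ul>"""

print(extract_links(sample))
```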
Linked Data-ing things
So I'll use FOAF's OnlineAccount to hook all the accounts together as Linked Data, which in theory is a perfect fit. SIOC's UserAccount is also an option, but I'll keep it simple for now.
In related news, YouTube is phasing out usernames. New YouTube channels are now created directly through Google+, with a Google+ ID as the unique identifier. It's trying (to the outrage of YouTubers with any kind of branding or well-known identity) to encourage people to hook up their channels to their G+ profiles, and lose their old username. Once done, this cannot be undone. I'd still expect to be able to find out someone's username if they have one though, given the unique channel ID. The API doesn't return this. You get a channel 'title', which is just a display name. For some people (those with established branding) this will be their ye olde username, but for many - most, I suspect - it's their G+ (supposedly real) name.
It just means that for YouTube channels I have to use the gibberish long unique ID instead of a nice human readable username for the foaf:accountName. This goes against what I feel accountName means, but is compliant with the spec, so I guess I'll leave it there.
Everything else at that point is straightforward. Once the links are got and broken down into their constituent parts with urlparse, I can use rdflib to turn them into, eg:
And store them somewhere ... to be continued.
OnlinePersonae
I'll probably subclass Agent with OnlinePersona (inspired by K. Faith Lawrence's FanOnlinePersona) and have the accounts belonging to that. Eventually OnlinePersona will have more properties which it won't necessarily share with all Agents.
Note: SIOC doesn't have a notion of this type. SIOC has UserAccount, which subclasses foaf:OnlineAccount, and thus defers back to a foaf:Agent as the account holder.
Sooo... what do I use as URIs for my OnlinePersonas?
This merits a tangent in the discussion, so I'll make another post about URI issues.
URI locations
Months ago (probably) I thought it would be a good idea to make a PURL for all of my content creation ontology related stuff. I couldn't find any existing sensibly named domains that are public at purl.org... things like '/ontology' are selfishly private. So I created '/content-creation' as a (public!) top-level domain. It's still 'pending approval'. Which means I can't do anything with it. Is purl.org even looked after any more? Grumble.
(Andrei Sambra suggested I use prefix.cc to give my ontology a pretty name. Which looked briefly promising, before I realised it doesn't redirect automatically to an ontology... it's good for humans searching for vocab prefixes, but not for machines by any stretch. Mo validated my feeling that ontology URIs ought to resolve to machine- and human-readable descriptions.)
I had been going to use data.inf.ed.ac.uk as the base, but the server that pointed to melted down last month. I dunno when it'll be back. So I'll stick to something I, personally, control. At some point I might buy a more suitable domain specifically for it, but I should discuss the options with some people who know what I'm doing before making a decision by myself. Available candidates right now though include: creativecontent.info, webcontentdb.com/info, internetcontentdb.com/info.
Oh, I just found out that purl.org isn't unfailingly reliable. In that case, forget it.
So for now I'll use:
Next
Graphite
I think I'll use Chris Gutteridge's Graphite layer on top of ARC2. I've been putting off thinking about yet more libraries, but I think I'll end up implementing bits of Graphite in an effort to make ARC2 more friendly anyway, and I'm sure he's done a better job than I ever could.