blog, social media metrics, twitter

50M per day, or pushing the envelope at 600 tweets per second


Twitter is now reporting that 50 million tweets bleep through the grid every single day. It's a staggering number, about 600 per second, of which "approximately 83 tweets per second contain product or brand references (20%)" according to coverage in ReadWriteWeb. Alongside metrics reported for Facebook (60 million status updates per day) and YouTube (1 billion video views per day), I'm inclined to run for cover in anticipation of some great resounding social sonic boom.

No need to do that, however, as the metricians have yet to find proof that there is a social equivalent of the sound barrier out there to warn us of. Be that as it may, social media giants Facebook, YouTube, Twitter, and Google Buzz likely enjoy the race for traffic growth more than they know what they would do if we ever gave them a finish line. Boom! More likely the sound of a starting gun in our case than some barrier up in the sky the other side of which lie demons in waiting. The envelope these guys are pushing is no sound barrier but contains instead the big paycheck (and for the true type-A venture guy, the big payback).

Fifty million tweets a day would knock you on your ass if you were at the receiving end of that firehose. But you are! And so am I. But I, like you, am as likely tweeting myself or, if not, possibly sitting here like a monkey with my fingers in my ears, hands over my eyes, and then over my mouth. In the time that I've been writing this, and since my last tweet exactly 20 minutes ago, 720,000 tweets have blown by me and I didn't catch a single one of them.

I'm like the guy in the Memorex ad seated in some high-veneered-class black leather and chrome Corbu lounger dressed in Ray Bans and with my tie laid out behind me like a wind sock perched at the back end of some NASA Ames wind tunnel test of the tweet resistance properties of social media power users.

And the tag-line, or the alt-tag, or the tag cloud reads: "Is it live or is it Realtime?"

If I can be exposed to 50 million tweets per day and still retain my balance at the end of it, if I can withstand the shock and awe of that many messages and I'm not bleeding from the ears, eyes, and nose, and if I'm not wearing some giant camo protective suit like the guy in Hurt Locker who looks like a cross between a Transformer and the Michelin Man impersonating Arnold Schwarzenegger in Terminator, then there's something behind those numbers worth peeling back.

Fact is there's probably a lot there worth digging into. Here are some hints as to what we might find, if we had the data and the gear to mine it with. This from Socialtimes:

  • A large number of inactive twitter accounts, with around 25% users having no followers and 40% users having never sent a single Tweet.
  • Around 80% users sending fewer than ten tweets.
  • Only 17% of the registered users having sent a tweet since Dec, 2009.
  • The number of active users becoming even more engaged.

"The conclusion of RJ Metrics study was that although Twitter grew tremendously in 2009, a bulk of this growth could be attributed to power users."

Yeah so how do you like them numbers? Obviously, twitter usage stats correlate to what is perhaps a shrinking percentage of active users (somebody dig up the historical data on how many had 0 tweets and 0 followers 2 yrs ago) vis-a-vis a rapidly-rising flow of tweetage from a core set of power tweeters.

(And I'm now seeing the mental image of not a classroom but a markedly larger higher-ed environment kind of hall or auditorium far in the high back left of which is a cluster of excited-looking students yet again engaged in frantic hand-waving and displaying loss of upper-body movement described perhaps by means of words like "paroxysms" and "peripatetic." And if I press my fingertips to my temples I'm getting a strong sense that they want my attention.)

Fact is, twitter is an attention machine. And it's not always a smoothly-functioning affair. It works great if you expect little to come back. It's perfect if you just get a kick out of turning it on. Awesome if you enjoy hearing the buzz. And rocks if you like standing around with a bunch of other folks just admiring the damn thing, like a beast of engineering well-oiled and purring and all coiled up and ready to pounce like some high performance V8 on the track at Altamont.

Thing is that we don't know what kind of machine it really is. Or was, is, and is becoming. We don't exactly know who uses it, why, and for what purpose. If twitter is an engine for buzz in some circles, a motor of growth for others, a speed demon for fast-moving news cycles, a truck loaded up with discounts and offers, or just a limo with its engine on idle parked where the valet should be while you make your important appearance as it sits, a symbol of your status and overall position — numbers like 50 million don't tell us what engines those 50 million messages are spinning.

I've noticed several types of people who use and benefit from twitter. Obviously a small number of the overall population, given twitter's somewhat remedial drop-out rate. I group them into four main types, as Self-oriented, Other-oriented, Relationally-oriented, and media users. This fourth type is new, as it's not really a personality type but works as a media user type.

  • Self-oriented types can use twitter to their benefit as a soapbox. Good for punditry, for talking at more than with. Celebrities fit in here also, along with the pundits who would like to be celebrities but are not.
  • Other-oriented types, whose communication skills are a bit less self-centered and monological and who are instead more conversational. These types respond and talk to and sometimes with other people. They don't have to talk about what interests them because they often start with what somebody else says.
  • Relational types are more difficult to find on twitter, because twitter makes relational activity hard to engage in. There's multiple @replying and @naming, but no multiple DM-ing. Relational stuff, like gossip, back-channeling, mediating and triangulating good social grist rests on communication that includes and excludes members of a self-sustaining group.
  • Media-related types are those who use twitter just for broadcast. As a way to push out content like news, links, headlines. Or some micro-social version of the big media forms of these. Not as social, not as conversational, and, really, not as egotistical. Twitter as smart extension and tool or channel. (Yeah marketing types don't go kill twitter now y'hear?)

At 50 million tweets a day, twitter really is humming along. But I would really like to know who's using it and how that's going. It has helped me see the value in twitter, and also preserve my own cranial structural integrity, to sort out differences in what is posted there and in how people use it. For branding themselves, passing around bits of interest, journaling out loud, climbing social ladders, socializing hysterically like a first-timer half hung out of the sunroof of a towncar in Vegas...but with a megaphone, an octave pedal, and some doppler-canceling device whose chief function is to make sure it passes at a steady and un-diminishing pitch and volume.

I'm digging deeper into this, because twitter and its ilk are, really and truly, and for better or reflux-inducing worse, the Great Capitalist System's new mode of production. Both the distribution channel and media preference of choice for millions of new consumers. And even if at 50 million-G-force-inducing-tweets-a-day-but-nobody's-paying-attention this machine is imperfect and recall prone, it is how many of us communicate and, with that, how much of our culture surfaces and makes its waves. Relational, communicative, un-coerced and largely free of the police, twitter is just one in a family of now gangly and sometimes awkward adolescent social tools historically, inevitably destined to grow up and make the social contributions that are their civic duty.

I'd say stick around, watch, learn, and think a bit. But if you're here you probably already made that choice. It's early days, like when television comedies were radio acts with a camera. The talkies are here. Say something interesting. Keep it real. And never be afraid to draw back the curtain and ask: So what does this mean?

Related
Slideshow: Finding Signal In the Real-Time Noise by Louis Gray
Twitter Hits 50 Million Tweets Per Day; Still Dwarfed by Facebook & YouTube
Twitter Users Sending 50 Million Tweets Each Day

Social Media & Semantic Web, blog, data silos, foaf, foaf+ssl, identity 2.0, microblogging, privacy 2.0, sioc, twitter

A Flock of Twitters: Decentralized Semantic Microblogging

In my last article, Flocking To the Stream, I ended with this thought about the growing issue of social-networking fatigue:

…as the number of streams continue to increase and as the flow rate of each stream picks up, people will grow tired of having to subscribe to, having to join yet-another-stream phenomenon (YASP). Does the Web truly need additional stream providers each with their own data silos? Is there a user-centric solution to this rapidly growing, overflowing-stream issue that puts YASP to rest once and for all?

This article answers these two questions in great detail but the succinct preview version is as follows:

  1. The Web does not need additional stream providers, each of which exerts significant control over a vast number of individuals and requires its users to have a separate new user account (a new digital identity)
  2. The Web does not need additional closed data islands (data silos)
  3. The Web does need a means with which each individual can create, maintain, and control their own identity, efficiently and effectively manage stream conversations, and therefore not be beholden to a few, large data-silo stream providers
  4. The only way to accomplish point three is for the emergence of a distributed, decentralized, Open Source microblogging ecosystem that leverages the power of the Semantic Web

Table of Contents

Since some of the following may be too generic for more advanced readers, I’m providing this Table of Contents to help readers navigate to those parts in which they may have the most interest. The first four sections are a general review of the problem and solution. The rest of the article provides my detailed thoughts on this issue.

Although you should feel free to skip ahead, doing so might result in missing a crucial connection.

A Web of Damned Streams

From a user’s perspective, one of the issues with YASP is that their Web identity is strewn throughout the Web with some of their thoughts clumped in one data silo while others are deposited in another data silo. This makes it very difficult for each user to manage all their streams and associated relationships.

What happens when a new, exciting stream comes along? Users have to weigh the potential benefits of membership against the likely pain and inconvenience caused by having to create a new identity, build a new network, and manage yet another stream.

Social networks benefit from what is called user lock-in: the very real fact that, most things being equal between social networks, a user will likely decide to stick with a social network because it takes too much work to move data from one site to another. So, instead of moving their data and possibly closing their account, a user will simply open up another account at a competing social network.

Of course, this version of lock-in assumes that social networks allow the moving of, or the copying of, their members’ data from one network to a competing network. In reality, the vast majority of social networks do not even allow their members free access and control over their personal data.

The issue facing most Web 2.0 users is that they have a multitude of accounts, each with its own username and password, each associated with a specific web service, and each located in a separate, independent repository—the proverbial walled garden of disparate user data, the omnipresent data silo.

Although most of the large social networks do expose a portion of their users’ data via proprietary APIs, they do not run an open network. They guard their data closely, assuming ownership of all their users’ personal streams. It is easy to understand why this is the case. A social network’s competitive advantage is their users’ data.

The current Web is dominated by the Web-2.0 social networking meme. It is not a healthy, vibrant Web. In fact, the current Web is becoming filled with damned streams, silos whose data barely trickles out and is not openly accessible to the rest of the Web. Google Buzz, Facebook, and Twitter could almost be considered alternate Webs, their members’ data mostly disconnected from the greater Web.

From a user’s standpoint, it is even worse. Most of these fortresses have rules and regulations that make it difficult for users to freely access and use their data elsewhere. Two years ago, Robert Scoble found out this shocking fact when he tried to move his social graph from Facebook to another service.

What’s the result of all these damned data silos? The promise of the Social Web is hindered. Later I’ll discuss the difference between the Social Web and social networks.

A Flock of Twitters

Instead of people becoming more dependent on highly centralized, proprietary microblogging services like Twitter, FriendFeed, Google Buzz, and Facebook, what if users could embed microblogging capabilities into their personal websites?

I don’t mean simply tie their Twitter, Facebook, and other social media streams into their website via behind the scenes, proprietary APIs—which they can already do. I mean actually host their own microblogging platform, become their own microblogging provider.

People should be able to subscribe directly to your microblog, to you and not to one of your myriad profiles on someone’s data silo. The way it currently works is that a user interested in what you have to say not only has to join Twitter (or Facebook, or Google Buzz, etc), but they must also subscribe to your stream on that particular service.

But what if a user who was interested in what you had to say could simply subscribe to your microblog, in essence subscribe to you? What if they could pull microblogging content from your site that originated directly on your site? What if there were a flock of Twitters and not just a single, centralized Twitter?

Why Decentralized?

Whereas a flock of Twitters may seem like an interesting concept, you may wonder if there actually is a benefit to creating a decentralized, distributed microblogging platform.

Part of the original vision of the Internet was to create a distributed communications network that did not have a central point of failure. The Web added a layer that allowed anyone, in theory, the opportunity to operate their own communications platform or channel (called a website).

But today’s Web-2.0 data-siloed social networks have created a handful of massive points of communication failure in the daily lives of hundreds of millions of people.

As an example, over the past two months, Twitter has experienced increasing unreliability. In fact, on January 20, 2010, Twitter was down for 90 minutes causing an uproar in the community. Whereas this might have been a fluke, or possibly have been related to their growth rate, the cause does not really matter. What does matter is that millions of people felt lost without their connection to their network.

This illustrates another fact of Web-2.0 life: the promise of a Web where everyone had their own communications channel has been usurped. Although most people naively believe they do have their own communications channel by having a Twitter, Facebook, or LinkedIn account, in reality they are beholden to a few Web behemoths to offer them communication services.

By creating a truly decentralized and distributed microblogging platform, users can once again regain control over their Web experience and create their own communications channels. They will benefit from increased data control, data accessibility, data usability, and data security.

A final benefit to decentralized microblogging: data portability is no longer an issue when you own, host, manage, and control your own data store, at least with regard to your microblog activity. You do not have to port the data into a new silo because your data is always right where it should be, in your own silo. Your data is kept by you, managed by you, and controlled by you. You may periodically have to move your database to a new server or another web hosting firm, but that is not an issue of data portability.

Even with decentralized microblogging, there will still be data silos. The silos will just be micro silos (or solo silos) where all the data contained within each silo represents one entity and is controlled by that one entity. It is the perfect entity-to-silo ratio.

A final point. There is a theoretical limit to the number of microblog installs. It is the extant human population. Actually, it is more than that if you make allowances for the fact that businesses, governmental entities, and clubs could host and manage their own microblogs. A user, after all, does not have to be an individual person. A user can be a business.

Why Semantic?

Offering users the ability to operate their own microblogging platform is an enticing thought. But a decentralized, distributed microblogging system does not guarantee that data will be readily available and open throughout the Web.

Instead of having a few, very large closed data silos, a Web of microblogs would in essence be millions of very small closed data silos.

Why is being open important?

One of the promises of the Web in its early conception was to create a network where disparate data sources were interconnected in such a way that integration and interoperability issues went away. To accomplish that goal, data needs to be exposed.

Exposing data creates an entirely new realm of beneficial possibilities. Instead of websites being searched for matching keywords and phrases, the underlying data can be directly queried.

So, how do we open up all the micro silos? By leveraging the power of the Semantic Web.

This article will not go into a deep explanation of the Semantic Web. However, you can think about it in this broad way. Web browsers navigate hypertext; Semantic Web applications navigate hyperdata—data that is encoded with semantic markup and interconnected to other semantically-coded data in other locations. So, whereas hypertext is text linking to other text (documents), hyperdata is data linking to other data. (See 1 & 2 below)

Semantic Web applications are built using a stack of W3C technologies, in particular the Resource Description Framework (RDF) and the Web Ontology Language (OWL). The Semantic Web technology stack is particularly important, as it provides a standardized way of encoding data without the need for a central controlling authority.

When data is semantically tagged, with the underlying metadata modeled using RDF and URIs, machines can “see and understand” the content. By this, I am not referring to some type of artificial intelligence (AI) engine that can infer meaning from data.

Instead, the data that has been encoded with semantic markup (semantic metadata) becomes structured in such a way that the intent, the meaning intended by the author, is unambiguous. This is accomplished by using various ontologies (vocabularies) to tag the upper-level data with sufficient, relevant metadata that structure and meaning are added to the human-readable data.

Once data is opened up to discovery by being semantically marked up, the Web becomes a truly interconnected network.
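To make the idea of querying data directly a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the statements are stored as plain (subject, predicate, object) tuples rather than in a real RDF store, the example.org URIs are hypothetical, and the choice of FOAF, SIOC, and Dublin Core terms to describe a micropost is mine, not a fixed design.

```python
# Namespace prefixes for the vocabularies used in this sketch.
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical URIs: a person (identified by a WebID) and one micropost.
ME = "https://alice.example.org/profile#me"
POST = "https://alice.example.org/drops/42"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (ME, FOAF + "name", "Alice"),
    (POST, FOAF + "maker", ME),
    (POST, SIOC + "content", "Trying out decentralized microblogging"),
    (POST, SIOC + "topic", "https://alice.example.org/channels/semantic-web"),
    (POST, DCTERMS + "created", "2010-02-25T10:00:00Z"),
]


def match(store, s=None, p=None, o=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in store
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


# "Which resources did Alice make?" is answered from the data itself,
# not by searching page text for keywords.
made_by_me = [s for (s, _, _) in match(triples, p=FOAF + "maker", o=ME)]
```

A real deployment would more likely publish these statements as RDF and query them with SPARQL; the tuple list only shows the shape of the data and of the question being asked.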

For more information on the Semantic Web, you can start here:

  1. Henry Story’s excellent presentation Building Secure Open & Distributed Social Networks
  2. For a more detailed explanation of hyperdata, read Nova’s article, The Semantic Web, Collective Intelligence and Hyperdata
  3. For more information on the Semantic Web (definitions, RDF, and development tools), visit this link
  4. For a brief history of the Semantic Web, read James Hendler’s article, What is the Semantic Web really all about?

Since it is difficult to succinctly and accurately describe the Semantic Web in layman’s terms, I encourage you to read other sources and become well versed in the Semantic Web–its concepts, underlying technologies, and how you can participate in it.

Evolving Nova’s Stream Concept

Before I get too far into the specifics, I need to present a new interpretation of what Nova Spivack calls the Stream.

One of the powers of Nova’s Stream concept–at least in my opinion–is that it evokes the imagery of a flowing body of water. As I began gathering my thoughts for this article, it became apparent that his Stream metaphor could be expanded, could be evolved in a way that sets the table for a more meaningful discussion about decentralized semantic microblogging.

Nova describes the Stream as follows:

Just as the Web is formed of sites, pages and links, the Stream is formed of streams.

Streams are rapidly changing sequences of information around a topic. They may be microblogs, hashtags, feeds, multimedia services, or even data streams via APIs.

In my extension to his concept, I diverge somewhat from his original definition of the Stream. Instead of viewing each stream as an information flow around a particular topic, I’ve reimagined the stream as the flow of ideas from a given individual. A Stream is thus a monologue that contributes to a greater conversation.

A Drop of An Idea

In keeping with the metaphor of a flowing body of water, I envisioned a water-cycle like flow from a single idea to an ocean of open discussion. Therefore, I call my model of a decentralized microblogging ecosystem the Meta-Hydrological Model.

With that concept in mind, you can think of a single idea posted by a user as a drop. Just as a user of Twitter adds to a conversation by posting a tweet, and a user of FriendFeed or Facebook makes what is generically called a micropost, a user in this new conversation ecosystem posts a drop. So a drop is equal to a tweet is equal to a micropost.

Here is a simplified, graphical representation of the Meta-Hydrological Model (also called the Meta-Flow for short).


The aggregation of all of a given user’s Drops is that user’s Stream. Viewed in this way, if a Stream is what a single user produces, then the River is the confluence of disparate users’ Streams. I’ll describe this in more detail later.

Within each user’s Stream, ideas might coalesce into specific topics. I call these Channels (Stream Channels). Channels are Drops that are grouped under a specific topic to form substream categories.

The final part of the Meta-Hydrological Model is what I call the MicroBlogOcean (MBO). The MBO is the sum total of all microblogging activity in the global conversation ecosystem. It is all the conversations, represented by all the Rivers.

Below is a natural, visual representation of this model as seen from space.

Satellite image of the Amazon River delta from NASA's Landsat GeoCover Program

Channeling Your Stream, Seining Your River

In our hydrological metaphor, a River is the confluence of disparate users’ Streams. But it is not a passive mixing of user ideas. Instead, each user has their own unique River, a River that they assemble, that they control. In particular, a River is the aggregation of all the Streams to which a given user is subscribed. It is similar to your following list on Twitter.

With Twitter, however, there is no practical way to filter the streams of those whom you follow. You subscribe to their entire stream of consciousness. Wouldn’t it be great if you could decide what thoughts, what information you let other users send flowing down your River? Wouldn’t you like the option to grab just the content in which you are truly interested?

Whereas users could of course choose to subscribe to your entire treasure trove of thoughts, by organizing your content into Channels, you provide a means whereby your subscribers can filter out what they do not care to see. They would have the option to subscribe just to your substream(s) and not your entire Stream.
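As a rough illustration of Channel-level subscriptions, the following Python sketch filters incoming Drops against a subscriber's choices. The class and field names (Drop, Subscription, build_river) are hypothetical names of my own, and the whole thing is a simplification, assuming each Drop carries at most one Channel tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Drop:
    author: str
    text: str
    channel: Optional[str] = None  # None means the Drop is not in any Channel


@dataclass(frozen=True)
class Subscription:
    author: str
    channels: Optional[frozenset] = None  # None means the author's entire Stream


def build_river(drops, subscriptions):
    """Keep only the Drops that match one of the subscriber's subscriptions."""
    wanted = {sub.author: sub.channels for sub in subscriptions}
    river = []
    for drop in drops:
        if drop.author not in wanted:
            continue  # not following this author at all
        channels = wanted[drop.author]
        if channels is None or drop.channel in channels:
            river.append(drop)
    return river


incoming = [
    Drop("bob", "New RDF tutorial is up", channel="semantic-web"),
    Drop("bob", "Just became mayor of whateverville", channel="check-ins"),
    Drop("carol", "Lunch was great"),
    Drop("dave", "Nobody here subscribed to me"),
]
subscriptions = [
    Subscription("bob", frozenset({"semantic-web"})),  # one substream only
    Subscription("carol"),                             # the whole Stream
]
river = build_river(incoming, subscriptions)
```

Here the subscriber sees Bob's Semantic Web Drop and everything from Carol, while Bob's check-ins never reach the River.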

Why is this important?

Well, as an example, for absolutely every person I currently follow on Twitter, I don’t care who just booted whom out as the mayor of whateverville. I don’t want that drivel polluting my pleasant paddle down my River. It adds zero value to my day and provides little if any entertainment.

I also rarely need to know (nor care to know) whenever someone has just stopped by a Starbucks, or is eating at this and such restaurant 1000 miles away, or is on a treadmill listening to Kid Rock on their fancy Zune. It’s also the case for many people whom I follow that I’m not actually interested in all the serious topics about which they micropost. In effect, I actually subscribe to them only for a small subset of their shared knowledge.

Now, to be perfectly fair, I bet some of my followers would be very glad to filter out my microposts on the Semantic Web, whereas others would be happy to stop seeing my microposts about WordPress or BuddyPress. It may also be the case that no one cares at all to see any of my general thoughts that I occasionally let float down their River. I think my subscribers, my followers, should have the right to filter out what they consider to be MY drivel.

By providing a mechanism for channeling thoughts into topics, our new microblogging client would provide a better user experience. The utility of user Channels could be further improved by offering public and private Channels. A Public Channel would be visible to all and open to subscription. A Private Channel would only be available to those users who are granted access via their WebID (more on the concept of using the WebID later).
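The public/private distinction could be modeled along these lines. This is only a sketch under stated assumptions: the Channel class and its fields are hypothetical, and it assumes the WebID presented by the would-be subscriber has already been authenticated by some other step, which is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    private: bool = False
    # WebIDs that the owner has granted access to (only used when private).
    allowed_webids: set = field(default_factory=set)

    def can_subscribe(self, webid: str) -> bool:
        """Public Channels are open to all; private ones check the grant list."""
        return (not self.private) or (webid in self.allowed_webids)


# Hypothetical WebIDs, for illustration only.
FRIEND = "https://bob.example.org/profile#me"
STRANGER = "https://eve.example.org/profile#me"

public_channel = Channel("semantic-web")
family_channel = Channel("family", private=True, allowed_webids={FRIEND})
```

With this shape, revoking access is just a matter of removing a WebID from the grant list on the owner's own site.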

The MicroBlogOcean

As mentioned above, the totality of all microblogging activity is called the MicroBlogOcean (MBO). In this global conversation ecosystem, Drops are constantly being pushed to and pulled from the MBO cloud.

To provide and manage the myriad MBO services, a new type of SaaS model needs to be created. I call this software-based service a Confluence Hub. A confluence is the point where two or more bodies of water meet. Therefore, a Confluence Hub is the place where Drops sent by various users meet up, are processed, and wait for further action.

Notice User has only subscribed to User 3's Channel.

This is how the process works. A user’s client sends a Drop to the closest Confluence Hub, where an amalgamator combines incoming Drops for transmittal to all that user’s subscribers. The Drops are organized by Channels, if any, and cached. If a Confluence Hub (CH) is down, then the Drop is automatically rerouted to the next closest hub.
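The rerouting rule can be sketched in a few lines of Python. How "closest" is measured is not specified here, so the distance field below is a hypothetical stand-in (network latency, for example), and the Hub class is an illustrative name of my own.

```python
from dataclasses import dataclass


@dataclass
class Hub:
    name: str
    distance: float  # hypothetical closeness metric, e.g. latency in ms
    is_up: bool = True


def route_drop(hubs):
    """Return the closest hub that is up, or None if every hub is down."""
    for hub in sorted(hubs, key=lambda h: h.distance):
        if hub.is_up:
            return hub
    return None


hubs = [Hub("west", 40.0), Hub("east", 12.0), Hub("eu", 95.0)]
first_choice = route_drop(hubs)  # "east" is the closest hub
```

If the closest hub goes down, the same call simply falls through to the next one in distance order, which is the behavior described above.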

An aggregate is a collection of items that are gathered together from different sources. The role of the client-side aggregator then, is to poll, to query the primary Confluence Hub Server (CHS) of each user Stream to which a user is subscribed, pulling the resultant dataset into their River on a predefined, regular interval.

Only the content the User wants gets through.

So, whereas a user’s Drop is pushed to the closest, active Confluence Hub, the Drops of each user that they follow are pulled into their River from the MBO cloud.
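The push/pull split might look something like the following Python sketch. Everything here is an assumption made for illustration: the hub is an in-memory stand-in rather than a networked service, Drops are identified by a simple sequence number, and the class and method names are hypothetical.

```python
class InMemoryHub:
    """Stand-in for a Confluence Hub: caches Drops pushed by authors."""

    def __init__(self):
        self._cache = {}  # author -> list of (sequence number, text)
        self._next_seq = 1

    def push(self, author, text):
        """An author's client pushes a Drop to the hub."""
        self._cache.setdefault(author, []).append((self._next_seq, text))
        self._next_seq += 1

    def fetch(self, author, after_seq):
        """Return the author's cached Drops newer than after_seq."""
        return [d for d in self._cache.get(author, []) if d[0] > after_seq]


class Aggregator:
    """Client-side aggregator: pulls subscribed Streams into a River."""

    def __init__(self, hub, following):
        self.hub = hub
        self.last_seen = {author: 0 for author in following}
        self.river = []

    def poll_once(self):
        """One polling cycle; a real client would run this on a timer."""
        for author, seen in self.last_seen.items():
            for seq, text in self.hub.fetch(author, seen):
                self.river.append((author, text))
                self.last_seen[author] = max(self.last_seen[author], seq)


hub = InMemoryHub()
hub.push("bob", "one")
hub.push("eve", "not followed")
aggregator = Aggregator(hub, following=["bob"])
aggregator.poll_once()
```

Tracking the last sequence number seen per author keeps repeated polls from pulling the same Drop into the River twice.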

Using our hydrological-based metaphor, Drops are created and stored on each owner’s site. This means any Drops that are de facto responses to someone else’s Drop are contained within disparate sites across the Web. Whereas the user’s client would cache all incoming Drops (in their River) and the application might even have an option to save a discussion to disk, the original Drop remains located in the owner’s Stream.

The Meta-Flow concept is not a perfect analogy to a natural hydrological flow. Whereas Drops do travel to Confluence Hubs, copies of those Drops are pulled into each subscribing-user’s client to form their unique River. The MicroBlogOcean therefore contains multiple references to the same original Drop and the Rivers actually flow out of the MBO rather than into it.

Although I personally believe this hydrological-based metaphor does a sufficient job of breaking down and describing the component parts of the overall decentralized microblogging ecosystem, for purposes of user understandability, the terms may need to be replaced with a more generic, globally-recognized nomenclature. Although, what is more globally recognizable than the water cycle?

Social Web Versus Social Network

When talking about the Semantic Web, it’s important to differentiate between social networks and the Social Web. These terms are not synonyms. In fact, the Social Web is not even the sum of all social networks.

Why is this the case?

Today’s social networks are nothing more than the famous walled gardens of the Web—as was previously discussed.

With their closely-guarded data silos, social networks are not full participants in the Web, they are not participants in the interconnected data ecosystem. So, unlike an ecological web (think of a food web), the Web-based Internet is not as much of an intact web as it is a land of social network islands that punctuate an ocean of truly connected websites.

The Social Web, on the other hand, is a fully functioning and healthy ecosystem where all data is globally connected. In my view, the only way to bring to fruition the promise of the Social Web is to embrace the Semantic Web.

The term Social Semantic Web is often used to differentiate between the current social-network based Web and a truly connected Web of Data. Since I believe that the Social Web requires the Semantic Web, I view the two terms as synonyms.

What might a truly connected Social Web look like?

I use this image as a graphical representation of what an open, fully linked, global Social Web would look like (see the caption for the actual description of the image). Imagine that each end point is a user creating their Drops that freely flow down their Stream, into their River, finally ending up in the MBO cloud. Each node, the point where multiple Streams converge, would be a Confluence Hub Server.

This image is a tracing of all the Internet traffic circa late 2006. It is licensed under a Creative Commons License (by-nc-sa/1.0) and created by http://opte.org/

Where would the big social networks appear on this graph?

Twitter would be a single point in this image with a few tenuous tendrils extending out representing the limited access that Twitter allows to their data silos via their proprietary APIs. There would be no lines representing conversations between users as the totality of conversation all occurs within the walled-off Twitter space.

The same holds true for Facebook, Google Buzz, FriendFeed, LinkedIn, and many of the other social networks. The lines connecting these services would be nothing more than gossamer strands representing the brute-force pushing of limited duplicate content between these data silos.

You might be thinking that conversations regularly occur between users of these platforms. For instance, I can choose to show my latest tweets on Facebook or LinkedIn, I can choose to display my latest Facebook or LinkedIn status updates on Twitter, and so forth. But these are not conversations. They are just snapshots of conversation that are occurring within other data silos.

Anatomy of a Drop

A Drop contains more than just the visible content, more than just the human-readable layer. A Drop is a packet composed of several layers, each providing additional metadata that makes the management and discovery of data more feasible.

Content Layer: that part of the Drop that is actually intended to be seen by humans; also referred to as the droplet

Metadata Condensate: when the Drop is being assembled, different metadata layers are aggregated together, which are then deposited into a super-metadata layer. This layer encodes all the supporting data that makes extensibility, management, delivery, and discovery of the user’s Drop possible.

The Metadata Condensate layer is composed of five sub layers:

Rich-media Layer: pointers to associated audio, video, or picture files

Semantic Layer: the machine-readable, semantically-marked up metadata

Rights Layer: the granted usage rights for the Drop

Using the proposed Protocol for Implementing Open Access Data as a model, Drops, Channels, and even entire Streams could be marked with usage rights

Security Layer: WebID to tie the Drop to a specific user; whether the Drop is public or private

Stream Management Layer: unique Drop ID; time stamp; GIS metadata (location-based tagging for mobile microblogging); Channel tag for grouping Drop content (allows filtering by other users); whether Drop is to be broadcast to all, a specific user group, or to one specific user; Drop broadcast delay; Drop time decay (a finite lifespan for Drop if desired); client metadata (whether Drop was sent via Web client, desktop client, via a CHS service, etc.)
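To make the layering a little more concrete, here is a minimal sketch of a Drop as a plain data structure. It is only an illustration under my own assumptions: the class and field names (Drop, MetadataCondensate, time_decay, and so on) are hypothetical, only a subset of the Stream Management fields is modeled, and nothing here is a proposed wire format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple
import uuid


@dataclass
class MetadataCondensate:
    # Rich-media Layer: pointers to associated audio, video, or picture files
    rich_media: List[str] = field(default_factory=list)
    # Semantic Layer: machine-readable statements as (subject, predicate, object)
    semantic: List[Tuple[str, str, str]] = field(default_factory=list)
    # Rights Layer: granted usage rights (hypothetical free-text label)
    rights: str = "all-rights-reserved"
    # Security Layer: author's WebID and a public/private flag
    web_id: str = ""
    public: bool = True
    # Stream Management Layer (subset of the fields described above)
    drop_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    channel: Optional[str] = None
    time_decay: Optional[timedelta] = None  # finite lifespan, if desired


@dataclass
class Drop:
    content: str  # Content Layer, the human-readable "droplet"
    meta: MetadataCondensate

    def is_expired(self, now: datetime) -> bool:
        """True once the optional time decay has elapsed."""
        decay = self.meta.time_decay
        return decay is not None and now >= self.meta.timestamp + decay


def filter_by_channel(drops: List[Drop], channel: str) -> List[Drop]:
    """Keep only public Drops tagged with the given Channel."""
    return [d for d in drops if d.meta.public and d.meta.channel == channel]
```

A client could then call filter_by_channel on the Drops it has received to show a single Channel, and check is_expired before displaying a Drop that was given a finite lifespan.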

Semantifying the Drop addresses several key issues that hinder current microblogging platforms. First, providing a mechanism by which machine-readable metadata can be effectively and efficiently associated with Drops unlocks each micro data silo, opening it up to queries from outside services. Second, organizing, grouping, and classifying Drops into Channels allows for meaningful filtering of users' content. Third, by using a FOAF+SSL backed WebID, privacy and identity management across the MBO becomes possible.

Whereas users can still add tags (via micro and nanoformats) when composing each Drop–and maybe even some basic html markup, like the “a” link tag–the real benefit accrues from the automatic encoding of semantic metadata into the Drop.

Additional ontological encoding could occur on each Drop via a Semantic Interface Options box on the Drop composition panel.

It’s important to note that each individual user will have the right to determine how much of their microblogging content is shareable across the Web, and even with whom it can be shared. In concept, though, if a user wishes to participate in the global microblogging community, it is assumed that they will want others to see what they have to say.

This is just an initial concept of the structure of a Drop. It may be that one or more of the Metadata Condensate layers (or parts of a given layer) should be included under the Semantic Layer.

Some Technical Thoughts

This article is primarily a presentation of an initial concept. The technical details obviously need to be fleshed out. But I have ideas toward that end which I’ll present here in no particular order of importance.

User and Stream Management

How do users log in to their Streams? How do users subscribe to another person’s Stream?

By using FOAF+SSL, the microblogging ecosystem would authenticate and authorize users based on their WebIDs.

So, as an example, a single user of type foaf:Person (authenticated via their WebID and FOAF+SSL) will subscribe to, that is follow, the Streams of many other users of type foaf:Person.

Fault Tolerance and Redundancy

Redundant distribution and replication to geographically dispersed Confluence Hub Servers could provide additional fault tolerance for those Stream providers who want to ensure that their subscribers are guaranteed access to their Streams at all times. This would be very useful in crisis situations; the real-time nature of microblogging has proven extremely beneficial during several recent natural disasters.

Platform Ecosystem

My model of a decentralized semantic microblogging ecosystem (the Meta-Hydrological Model) requires three basic software components:

Personal Stream Server (PSS): the client software that a user uses to create their Stream and manage their River.

Community Stream Server (CSS): for those users who do not want to manage their own self-hosted solution, a community-based, public Stream provider is necessary. Such providers could offer the service for free or for a fee. The important issue here is that all users with an account at a Community Stream Server would be the owners of all their data, deciding how the data is used and exposed. If they wished to move their data (their Stream identity) to another server, they could easily do so. Community Stream Servers would be configured so that users could brand their identity, using their own domain names.

Confluence Hub Server (CHS): this has been discussed in more detail above. In addition to the aforementioned duties, each CHS would also be responsible for co-aggregating the realtime view of the MicroBlogOcean.

Unlike the handful of root name servers in the Domain Name System, the number of Confluence Hubs would not be limited by any authority. Anyone who meets a set of minimum requirements (hardware, software, and bandwidth) could host a CHS. Although anyone could download the CHS platform software, only those whose setup meets the minimum requirements would be able to initiate an active CHS service.

Client-Server Software Architecture

The software architecture of client and server, as well as the UI/UX, is beyond the scope of this article. Although I do have a few top-level suggestions/ideas:

  1. The software stack must use only open-source technologies
  2. Use of a graph database backend (or a similar NoSQL DB), which is better suited to modeling the graph-like nature of social networks. For more details on this comment, look for my upcoming Powering Startups to Become Smartups series.
  3. Possibly the use of a language that allows for coding of a Web-based interface as well as desktop client software (Java, Python, or Ruby to name a few). One of the drivers of growth and success for Twitter has been the development of 3rd-party desktop clients. It may make sense to offer an initial version of such a client along with the Web-based interface.

These are just kernels of an idea about possible architectural considerations.

Possible Extensions to FOAF and SIOC Ontologies

As the FOAF specification states, “FOAF documents describe the characteristics and relationships amongst friends of friends, and their friends, and the stories they tell.” In the world of social networking–especially decentralized microblogging–the concept of friend can be very nebulous.

This is why microblogging services like Twitter and Google Buzz use the term follower, and FriendFeed (owned by Facebook) uses the term subscriber. It is a one-way relationship that does not have implicit reciprocity.

In other words, just because I follow you does not imply that you follow me, that you plan on following me, or that you will ever follow me. In fact, in practically all cases, users with large followings do not know and are not even aware of the vast majority of their followers.

The FOAF concepts of “friend” and “know” are often not in tight alignment with the realities of the newer social networks. A better classification of these relationships needs to be created.

A new FOAF class of foaf:Following may be all that is needed to rectify this type mismatch. A list of all the people that a given user is following could easily be compiled by querying the system for all unique foaf:Following relationships. This list could be further broken down by social network by extending the query to include the property foaf:account. It would be equally simple to determine all of the people who are following a given user.

Addendum: Thanks to comments below from John Breslin and Alexandre Passant, who pointed out that the SIOC specification does have the sioc:follows property. So, using foaf:Person with sioc:follows could properly classify a following relationship.
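As a rough illustration of the two queries described above (who a user follows, and who follows a user), here is a small sketch that treats following relationships as plain (subject, predicate, object) triples. The WebIDs are made-up examples, the function names are hypothetical, and a real implementation would more likely run SPARQL against a triple store than filter a Python list.

```python
FOLLOWS = "sioc:follows"

# Hypothetical WebIDs, used only for illustration
ALICE = "https://alice.example/#me"
BOB = "https://bob.example/#me"
CAROL = "https://carol.example/#me"

# Each triple reads: subject follows object
TRIPLES = [
    (ALICE, FOLLOWS, BOB),
    (ALICE, FOLLOWS, CAROL),
    (CAROL, FOLLOWS, BOB),
]


def following(triples, web_id):
    """Everyone the given WebID follows."""
    return sorted({o for s, p, o in triples if p == FOLLOWS and s == web_id})


def followers(triples, web_id):
    """Everyone who follows the given WebID."""
    return sorted({s for s, p, o in triples if p == FOLLOWS and o == web_id})
```

With this data, Alice follows both Bob and Carol but has no followers of her own, which is exactly the one-way, non-reciprocal relationship that "friend" fails to capture.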

How should users of a globally decentralized semantic microblogging platform be classified?

Each user would be identified via their WebID and not their sioc:User type—which is utilized only for marking up the various accounts a user has throughout the Web of social networks.

Whereas the SIOC Core Ontology is designed for easy extendability, the emergence of decentralized microblogging may necessitate an addition to the core classes as the current classes do not fully capture the uniqueness of such a system.

Whereas discussions within traditional blogs and forums occur on the same site (within the same data silo), discussions on a decentralized microblogging cloud are not the same. The discussions occur across the cloud, across the Social Semantic Web. This then becomes an issue of classifying relationships within the Social Web and not between disparate social networks and their data silos.

Some Early Players in This Space

There are a few early players in the decentralized microblogging platform space and at least one in the open source centralized microblogging arena. It is important to note that only one of the players below is working on a decentralized semantic microblogging implementation.

  • SMOB: self described as an open, distributed Semantic MicroBlogging framework
  • 6d: self described as a decentralized social network. This is not a true microblogging platform but I thought it should be included for reference.
  • onesocialweb: an open-source application created by the Vodafone Group described as a free, open, decentralized microblogging platform
  • StatusNet: the open source, centralized microblogging platform that powers Identi.ca

Which of these is the right solution?

While all of these are encouraging entrants in the space, SMOB shows the most promise at this time as it is the only platform that is working on bringing about the Social Web through decentralized semantic microblogging.

Conclusion

It’s time to return to the original concept of the Web-based Internet—an interconnected, decentralized and distributed, open and independent cacophony of individuals who control their own Webspace, operate their own communication channel, and freely communicate with others without having to worry about a central point of failure.

The only way to build a truly open and decentralized global microblogging network is by leveraging the power of the Semantic Web. Doing so will help usher in the reality of the Social Web.

Decentralizing and individualizing Stream creation and management will help ensure that the MicroBlogOcean does not have a central point of failure and does not require a central-controlling authority. With a properly semantified and structured Stream, even efficient and effective privacy and identity management become feasible.

This article is just one drop in the bucket (yep, I had to say it). It is a first version of an evolving concept. As people provide constructive feedback and the idea gets debated, I’ll openly evolve this concept to better reflect the realities of the emerging Social Web and the technologies that will help bring it to fruition.

Entrepreneurship, Innovation, Operations, blog

For Startups, How Much Process Is Too Much?

Whether they're found in a garage or inside an established enterprise, startups struggle with decisions about process and infrastructure. The speed at which a startup can learn is its competitive advantage and the defining factor in its success. But startups can't rely on the processes and infrastructure that their established competitors use, because those "best practices" tend to kill disruptive innovation.

Still, startups develop some kind of process — whether it's disciplined, haphazard, bureaucratic or empowering — because building a great product depends on it.

They just need to balance process with innovation. Companies that insist on building a world-class infrastructure before shipping a product are doomed to "achieve failure," because they're starved of feedback for too long. I learned this lesson first hand in a previous company (read the sad story here). On the other hand, companies that take a "just do it" attitude without any process at all are also taking a major gamble. High-profile startup Friendster had first-mover advantage in the social networking space, but created openings for competitors when it could not scale to meet demand.

Finding the right balance requires an understanding of the fundamental feedback loop that powers all startups. It begins with an idea, which is translated into a product via the "build stage." When customers interact with that product, they create data, which startups harvest in the "measure stage." And, with any luck, that data will inform the company in the "learn stage," and that learning will influence the next set of ideas. This three-stage feedback loop sounds simple, but it's powerful nonetheless. It gives rise to this heuristic for evaluating any process or infrastructure change in the context of a startup:

Always choose the option that minimizes the total time through the feedback loop.

In other words, any change that accelerates learning is a win, and everything else is waste. This is very different from the trade-offs that need to be made in situations where the goal is to optimize for profit, margin, or growth.
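The heuristic can be written down as a tiny calculation. The sketch below is only an illustration: the Option class, the option names, and the day counts are all invented, and real estimates would be far fuzzier. The point is that the comparison is made on the sum of all three stages, not on any single one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    name: str
    build_days: float    # time to turn the idea into a product change
    measure_days: float  # time to collect data from customers
    learn_days: float    # time to turn that data into the next idea

    def loop_time(self) -> float:
        """Total time through one build-measure-learn cycle."""
        return self.build_days + self.measure_days + self.learn_days


def pick_fastest_loop(options: List[Option]) -> Option:
    """Choose the option that minimizes total time through the loop."""
    return min(options, key=lambda o: o.loop_time())


# Invented numbers: skipping measurement makes building faster,
# but learning without data takes much longer.
options = [
    Option("ship without instrumentation", build_days=3, measure_days=0, learn_days=14),
    Option("instrumented release", build_days=5, measure_days=2, learn_days=3),
]
best = pick_fastest_loop(options)
```

Here the instrumented release wins with a 10-day loop against 17 days, even though it is the slower option to build.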

The lean movement has been preaching waste reduction for many years, and anyone familiar with those ideas will understand how they apply here. The only difference is that instead of measuring the creation of value by our ability to produce tangible high-quality artifacts, startups measure value by validated learning about customers.

This approach clashes with classic product management and product development. The detailed specification documents that PMs demand go stale too quickly to keep up with a fast-learning team. Massive data warehousing reports used in product dev do what warehouses do well: store data. They don't promote learning, because people learn best when presented with a small number of actionable metrics. And engineers who build heavyweight architectures may design a technical triumph, but lack the agility to adapt when the goal of the system changes radically.

Every process a startup uses operates at one stage of the feedback loop. But lean startup practices have the effect of optimizing the total time through the loop. Practices that are harmful are the ones that optimize our ability to do just one of the three stages well. For example, you can build much faster if you don't "waste time" measuring. That's like suggesting you can drive faster if you close your eyes and hit the accelerator. It's true, but dangerous. The same is true for departmental structures that work like silos. They may work in large companies, but in startups they're dangerous because they encourage people to improve at their specialized job rather than maximizing learning.

Using just the right amount of process can help startups accelerate. But, for the entrepreneur starting from scratch, investments in process and infrastructure are expensive, and take time and energy away from work that directly benefits customers. Even worse, process investments can quickly become obsolete as a company grows, and management challenges evolve. Adapting a process to this ever-changing reality requires a commitment to continuous improvement and incremental investment, which will be the subject of the next post in this series.

Eric Ries is the author of StartupLessonsLearned.com and is an adviser to many startups, companies, and venture capital firms.

Managing yourself, Social Media, blog

Managing Myself: Almost Off the Twitter Fence

A couple of weeks ago, my manager asked me to sign up for Twitter. The basis: Twitter is a big source of traffic for HBR.org, and as editors we should all understand how readers reach us. No arguments there, but I still had my reservations.

The only aspect of social media I've truly embraced is the acronym MYOB (and even that I say aloud more than I text). I understand the reasons for HBR's Twitter accounts, but have struggled to understand the value of a personal account. The mash of personal and professional and serious and frivolous reminds me of multitasking — a lot of action without a lot of substance. Do I really need to add more noise to an already noisy life?

But then I came across two articles that forced me to look at "prosonal" lifestyles and 24/7 connectivity in a new way.

The first was Smart Dust? Not Quite, but We're Getting There in the New York Times. It's about microchip-equipped sensors "designed to monitor and measure not only motion, but also temperature, chemical contamination or biological changes." If scientists can ever figure out an efficient way to power them, their applications include bridges that can sense metal fatigue and tell engineers they need repairs, and fruits and vegetables that can tell grocers when they ripen and begin to spoil.

You can't help but imagine the evil twins of these good intentions, but what caught my eye social-media-wise were the last few paragraphs. They talk about a separate group of researchers who say forget about smart dust. Instead, we should focus on aggregating data through wireless devices that already exist, like cell phones. A specific example is your.flowingdata.com, a Twitter application for self-reported data on one's daily life. By assembling the data into graphs that show behavior over time, the application has the potential to help people make smarter decisions about their health: eating habits, weight, blood pressure, glucose, and sleep times. So, there I had it: a purpose to the chatter on a personal level.

The second piece, Business Friendships from Columbia Ideas at Work, cautions against overly firm boundaries between personal and professional. The authors (Paul Ingram, a professor at Columbia, and PhD candidate Xi Zou) found that not only are friendships with colleagues rewarding for their own sake, they can also benefit your career and organizational performance in a number of ways. For some cultures, this is old news: In China, professionals rely on their friendships to run firms and acknowledge that personal ties can help build trust and accountability into professional relationships. For example, finding out that a colleague comes from the same village is viewed as evidence of reliability and trustworthiness.

Another advantage I can imagine is an increased ability to empathize with your coworkers by knowing more about their lives. If you see your colleague's tweet that the "monster" under his son's bed woke him up at 3 AM, you might be less annoyed when he's late for your morning meeting.

Admittedly, these two articles are only tangentially connected to Twitter, but for someone who has read the typical pro arguments and failed to be convinced, they were just the perspective I needed. Now the only thing stopping me is rule number three for how to use social media: "You gotta create awesome, awesome content."

No pressure.

Enterprise 2.0, Web 2.0, blog

8 Guiding Principles for Pilot Programs: A Key for Enterprise 2.0

In my Implementing Enterprise 2.0 report I put Iterate and Refine at the center of the Enterprise 2.0 Implementation Framework.

[Image: Enterprise 2.0 Implementation Framework]

One of the most critical elements of this principle is the ability to establish and run effective pilot programs.

Below is an excerpt from Chapter 17 of Implementing Enterprise 2.0 on Pilots, which describes 8 guiding principles for pilot programs.

GUIDING PRINCIPLES FOR PILOTS
While there are no hard and fast rules for establishing successful pilots, eight guiding principles that should be kept in mind are:

“It is reasonably cheap and easy to get a pilot up and running to evaluate how successful a new Technology will be. Fail fast, fail cheap. Set things up as pilots and pick up the lessons.”
CIO, large property developer

1. Select fertile ground.
Pilots often establish the tone for how broader initiatives are received across the organization. Stories – both positive and negative – about the success of pilots often filter out very widely. A successful pilot can easily take on a life of its own as others hear about the benefits and actively want to apply them in their own work. Failures are often cited across the organization as reasons why related initiatives will not succeed. While you can never expect all pilots to be successful, maximize chances of success by selecting the most promising projects and the best team, and make it easy for them to identify value.

2. The pilot team is critical.
Perhaps the single most important success factor in Enterprise 2.0 pilots is the people involved. While there are a number of criteria in selecting a pilot team (for more details see below), the single most important attribute is enthusiasm. Given that these technologies require new approaches to working and communicating, uptake and resultant benefits will depend substantially on the degree of use and experimentation during the pilot.

3. Design around business applications or benefits NOT tools.
Far too often organizations decide to trial specific tools such as wikis and blogs without having a clear idea of why they are doing so. In the majority of these cases pilots fail to gain traction or result in clear benefits. Pilots should be designed either to create specific benefits such as streamlined processes and faster outcomes (which will provide clear measures for the pilot’s success), or to support an application such as project management or creating better sales forecasts.

4. Define scope but encourage experimentation.
In establishing pilot projects, a balance needs to be struck between having clarity on the intentions and scope of the pilot, and allowing experimentation that may uncover even more valuable uses and applications. A definition of pilot scope includes the immediate objectives, participants, and timeline for review. However if variations on the intended activity, or even entirely different approaches, seem to offer potential business value, these should be encouraged. Remember that experimentation is often the source of much of the value of Enterprise 2.0 implementations.

5. Design the pilot to learn useful lessons and expand.
Pilots are established with the primary intention of demonstrating value so that they can be applied more broadly across the organization. However even if the pilot is very successful, it should not necessarily be implemented in the same way for future roll-outs. And if the pilot is not seen to be successful, there may be even more useful lessons on how to improve or refine subsequent initiatives. As such, there need to be specific systems to capture lessons during and at the conclusion of the pilot.

6. Provide training and guidance.
If no training is provided on the use of a new tool, it should not be surprising if it is not used or not used well. Training can be delivered in many formats, including brief online learning sessions. It is possible to set up pilots so that usage guidelines and recommendations are provided at first login, and regularly during the trial.

7. Create visibility.
In many cases you will want pilots to be visible outside the pilot group, in order to attract participation, generate demand in the rest of the organization, and stimulate ideas for other applications. For example, providing reference materials on IT support or HR policies creates broad visibility for new approaches. However, in some cases you may choose to keep pilots less visible if there are greater risks of failure or active experimentation by a small team.

“The earlier you determine when something should be killed, the better.”
Charlie Beaver, vice-president, Booz Allen Hamilton

8. Monitor progress and cut or expand.
It is a mistake to set up a series of pilots without subsequently assessing progress. That can be easier with Enterprise 2.0 tools than with some other technologies, given the very low costs of the tools. The mantra of “fail fast, fail cheap” is immensely relevant here. Specific timeframes – usually measured in months – for the pilots need to be established at the outset. Success needs to be assessed both in terms of the initial objectives and/or any other value that has been created in the process of the project (see principle 5 above). Decisions must be made on whether to continue, expand, discontinue, or change the pilot.

Enterprise 2.0, Web 2.0, blog

8 Guiding Principles for Pilot Programs: A Key for Enterprise 2.0

In my Implementing Enterprise 2.0 report I put Iterate and Refine at the center of the Enterprise 2.0 Implementation Framework.

e2impl_framework_500w.jpg

One of the most critical elements of this principle is the ability to establish and run effective pilot programs.

Below is an excerpt from Chapter 17 of Implementing Enterprise 2.0 on Pilots, which describes 8 guiding principles for pilot programs.

GUIDING PRINCIPLES FOR PILOTS
While there are no hard and fast rules for establishing successful pilots, eight guiding principles that should be kept in mind are:

“It is reasonably cheap and easy to get a pilot up and running to evaluate how successful a new Technology will be. Fail fast, fail cheap. Set things up as pilots and pick up the lessons.”
CIO, large property developer

1. Select fertile ground.
Pilots often establish the tone for how broader initiatives are received across the organization. Stories – both positive and negative – about the success of pilots often filter out very widely. A successful pilot can easily take a life of its own as others hear about the benefits and actively want to apply them in their own work. Failures can often be referred to across the organization as reasons why related initiatives will not succeed. While you can never expect all pilots to be successful, maximize chances of success by selecting the most promising projects and the best team, and make it easy for them to identify value.

2. The pilot team is critical.
Perhaps the single most important success factor in Enterprise 2.0 pilots is the people involved. While there are a number of criteria in selecting a pilot team (for more details see below), the single most important attribute is enthusiasm. Given that these technologies require new approaches to working and communicating, uptake and resultant benefits will depend substantially on the degree of use and experimentation during the pilot.

3. Design around business applications or benefits NOT tools.
Far too often organizations decide to trial specific tools such as wikis and blogs without having a clear idea of why they are doing so. In the majority of these cases pilots fail to gain traction or result in clear benefits. Pilots should be designed either to create specific benefits such as streamlined processes and faster outcomes (which will provide clear measures for the pilot’s success), or an application such as project management or creating better sales forecasts.

4. Define scope but encourage experimentation.
In establishing pilot projects, a balance needs to be struck between having clarity on the intentions and scope of the pilot, and allowing experimentation that may uncover even more valuable uses and applications. A definition of pilot scope includes the immediate objectives, participants, and timeline for review. However if variations on the intended activity, or even entirely different approaches, seem to offer potential business value, these should be encouraged. Remember that experimentation is often the source of much of the value of Enterprise 2.0 implementations.

5. Design the pilot to learn useful lessons and expand.
Pilots are established with the primary intention of demonstrating value so that they can be applied more broadly across the organization. However even if the pilot is very successful, it should not necessarily be implemented in the same way for future roll-outs. And if the pilot is not seen to be successful, there may be even more useful lessons on how to improve or refine subsequent initiatives. As such, there need to be specific systems to capture lessons during and at the conclusion of the pilot.

6. Provide training and guidance.
If no training is provided on the use of a new tool, it should not be surprising if it is not used or used well. This can be done in many formats, including brief online learning sessions. It is possible to setup pilots so that usage guidelines and recommendations are provided at first login, and regularly during the process of the trial.

7. Create visibility.
In many cases you will want pilots to be visible outside the pilot group, in order to attract participation, generate demand in the rest of the organization, and stimulate ideas for other applications. For example providing reference materials on IT support or HR policies creates broad visibility for new approaches. However in some cases you may choose to keep pilots less visible if there are greater risks of failure or active experimentation by a small team.

“The earlier you determine when something should be killed, the better.”
Charlie Beaver, vice-president, Booz Allen Hamilton

8. Monitor progress and cut or expand.
It is a mistake to set up a series of pilots without subsequently assessing progress. That can be easier with Enterprise 2.0 tools than with some other technologies, given the very low costs of the tools. The mantra of “fail fast, fail cheap” is immensely relevant here. Specific timeframes – usually measured in months – for the pilots need to be established at the outset. Success needs to be assessed in terms of both the initial objectives and any other value that has been created in the process of the project (see principle 5 above). Decisions must be made on whether to continue, expand, discontinue, or change the pilot.

Conference Reports, Open Data, Talis, blog

Open… and Mobile?

I know what you’re thinking: “He’s going to say Data!”

Well, I might do at some point, but I was going to say “Days”. Last month, Talis flung open its doors to 30 or so folk who were interested in SPARQL, the Semantic Web and Linked … er, Data. The idea was to host an informal event for folks to learn about much of what we’ve been talking about for the past few years. We planned some talks on what it means to join up your data, what this Platform is about, and a detailed introduction to SPARQL. With the launch of data.gov.uk and many of the stories covered over in the Magazine, it seemed possible that people were starting to get interested in this whole Linked Data scene.

So, we sent out some invites and tweeted a bit, and soon had to cap the registration numbers. We filled up spaces in the January day not long after New Year, and the February day not long after the January one. March is quickly filling up too (hint). I have to admit, I wasn’t expecting this many people to express an interest so soon. Not only did people sign up, but they travelled to Birmingham through adverse weather to come and take part at both ‘Days, and we’ve had a lot of fun.

One thing that seemed to be a good idea was to ask for feedback before the event. It sounds wrong, but the point of an Open Day is to cover things that YOU’re interested in learning or exploring. So, when people registered, they were asked for their expectations and what they’d like to take away with them from such an event—aside from a T-shirt and SPARQL mug, obviously. It made it much easier to work out what we should cover, and I hope it meant that we were able to talk about the things most relevant to the people who came along.

I’d like to do it again, but slightly differently. Instead of hosting an Open Day here at Talis HQ, what if we came to you? Would you be interested in attending a Talis Platform Roadshow? What would you want us to cover? More importantly, where would you like us to go?

Comments below, or email me or tweet me.

Social Media, blog, design, technology

Four Ways of Looking at Twitter

Data visualization is cool. It's also becoming ever more useful, as the vibrant online community of data visualizers (programmers, designers, artists, and statisticians — sometimes all in one person) grows and the tools to execute their visions improve.

Jeff Clark is part of this community. He, like many data visualization enthusiasts, fell into it after being inspired by pioneer Martin Wattenberg's landmark treemap that visualized the stock market.

Clark's latest work shows much promise. He's built four engines that visualize that giant pile of data known as Twitter. All four basically search words used in tweets, then look for relationships to other words or to other Tweeters. They function in almost real time.

"Twitter is an obvious data source for lots of text information," says Clark. "It's actually proven to be a great playground for testing out data visualization ideas." Clark readily admits not all the visualizations are the product of his design genius. It's his programming skills that allow him to build engines that drive the visualizations. "I spend a fair amount of time looking at what's out there. I'll take what someone did visually and use a different data source. Twitter Spectrum was based on things people search for on Google. Chris Harrison did interesting work that looks really great and I thought, I can do something like that that's based on live data. So I brought it to Twitter."

His tools are definitely early stages, but even now, it's easy to imagine where they could be taken.

Take TwitterVenn. You enter three search terms and the app returns a Venn diagram showing how often each term is used and how often the terms overlap in a single tweet. As a bonus, it shows a small word map of the most common terms related to each search term; tweets per day for each term by itself and each combination of terms; and a recent tweet. I entered "apple, google, microsoft." Here's what I got:

twittervenn.jpg

Right away I see Apple tweets are dominating, not surprisingly. But notice the high frequency of unexpected words like "win," "free," and "capacitive" used with the term "apple." That suggests marketing (spam?) of Apple products via Twitter, e.g. "Win a free iPad...".

I was shocked at the relative infrequency of "google" tweets. In fact there were on average more tweets that included both "microsoft" and "google" than ones that just mentioned "google."
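For the curious, the counting behind a diagram like this can be sketched in a few lines of Python. This is only a guess at the general idea, not Clark's code: the function name `venn_counts`, the sample tweets, and the choice of simple case-insensitive substring matching are all assumptions made for illustration.

```python
from itertools import combinations


def venn_counts(tweets, terms):
    """Count tweets by the exact combination of search terms each one mentions.

    Hypothetical helper: matching is a plain case-insensitive substring
    test, which is an assumption, not how TwitterVenn necessarily works.
    """
    terms = [t.lower() for t in terms]

    # One counter per region of the diagram: each term alone,
    # each pair, and so on up to all terms together.
    counts = {}
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            counts[frozenset(combo)] = 0

    for text in tweets:
        lowered = text.lower()
        present = frozenset(t for t in terms if t in lowered)
        if present:  # tweets mentioning none of the terms are ignored
            counts[present] += 1
    return counts


# Made-up sample tweets, purely for illustration.
sample_tweets = [
    "Apple unveils a new tablet",
    "Google and Microsoft spar over search",
    "Win a free Apple gadget",
    "Microsoft ships an update",
]

for region, n in venn_counts(sample_tweets, ["apple", "google", "microsoft"]).items():
    print(sorted(region), n)
```

Each key is one region of the diagram, so a tweet naming both "google" and "microsoft" lands in the overlap and not in either single-term circle. That is the kind of comparison behind the observation above.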

So then I went to Twitter Spectrum, a similar tool that compares two search terms and shows which words are most commonly associated with each term and which words are most commonly used in tweets with both terms. Here's the "google, microsoft" Twitter Spectrum:

twitterspectrum.jpg

I love that the word "ugh" is dead center between Google and Microsoft. But the prominence of social media terms on the blue side versus search terms on the red side is fascinating. It looks like two armies marching at each other ready to fight different wars.
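One way to think about where a word lands on such a spectrum is as a ratio: of all the times it shows up alongside either search term, what share is with the second one? The Python sketch below works under that assumption. The name `spectrum_positions`, the 0-to-1 scale, and the sample tweets are invented for illustration, and nothing here is taken from Clark's implementation.

```python
import re
from collections import Counter


def spectrum_positions(tweets, term_a, term_b):
    """Place each co-occurring word on a 0..1 scale between two search terms.

    0.0 means the word only appears with term_a, 1.0 only with term_b,
    and 0.5 means it appears equally often with both. Hypothetical sketch.
    """
    a, b = term_a.lower(), term_b.lower()
    with_a, with_b = Counter(), Counter()

    for text in tweets:
        lowered = text.lower()
        # Distinct words in the tweet, minus the search terms themselves.
        words = set(re.findall(r"[a-z']+", lowered)) - {a, b}
        if a in lowered:
            with_a.update(words)
        if b in lowered:
            with_b.update(words)

    positions = {}
    for word in set(with_a) | set(with_b):
        total = with_a[word] + with_b[word]
        positions[word] = with_b[word] / total
    return positions


# Made-up sample tweets, purely for illustration.
sample_tweets = [
    "google search results",
    "microsoft office suite",
    "ugh google",
    "ugh microsoft",
]
print(spectrum_positions(sample_tweets, "google", "microsoft"))
```

With a scale like this, a word such as "ugh" that turns up equally with both terms sits dead center, while words tied to one company drift toward that side.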

Clark has also created TwitArcs. This one, I feel, is still a work in progress and Clark says "visually I like it but it might be the least useful so far." In this case, you type in a tweeter's handle and it returns a stream of that person's tweets with arcs that link common words between tweets (on the right) and common retweeters (on the left). Rolling your mouse over an arc highlights the last tweet in that arc. Here's a TwitArc of @timoreilly:

twitarc.jpg

Finally, the Stream Graph. Enter a search term and Clark's engine returns the frequency of the most common words found with your search term for the last 1,000 tweets. You see a literal flow of conversation. You can also highlight one term to see how its frequency changed over time and you'll see the most recent tweets that include both your search term and that highlighted term.

Sometimes 1,000 tweets with your term may span weeks. For my search term, "Tiger Woods" which I entered yesterday afternoon right after news that he'd speak publicly broke, 1,000 tweets covered about 20 minutes. Here's the "Tiger Woods" stream graph with "silence" highlighted:

streamgraph.jpg
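The data behind a stream graph is, at heart, a set of word counts sliced over time. Here is a rough Python sketch of that idea, assuming the tweets arrive oldest to newest and are split into equal-sized slices. The function name `stream_counts`, the tiny stopword list, and the sample tweets are assumptions for illustration and say nothing about how Clark's engine actually works.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "on", "is", "rt"}


def stream_counts(tweets, search_term, buckets=4, top_n=3):
    """Return {word: [count per time slice]} for the top co-occurring words.

    `tweets` is assumed to be ordered oldest to newest. Hypothetical sketch.
    """
    search_words = set(search_term.lower().split())

    # Tokenize each tweet, dropping the search term itself and stopwords.
    tokenized = []
    for text in tweets:
        words = re.findall(r"[a-z']+", text.lower())
        tokenized.append(
            [w for w in words if w not in search_words and w not in STOPWORDS]
        )

    # Pick the most common co-occurring words across the whole sample.
    totals = Counter(w for words in tokenized for w in words)
    series = {w: [0] * buckets for w, _ in totals.most_common(top_n)}

    # Assign each tweet to a slice by its position in the sequence.
    for i, words in enumerate(tokenized):
        b = i * buckets // len(tokenized)
        for w in words:
            if w in series:
                series[w][b] += 1
    return series


# Made-up sample tweets, purely for illustration.
sample_tweets = [
    "Tiger Woods silence ends",
    "Tiger Woods silence",
    "Tiger Woods speaks",
    "Tiger Woods speaks today",
]
print(stream_counts(sample_tweets, "Tiger Woods", buckets=2, top_n=2))
```

Each list of counts is one band in the graph. Plot the bands stacked over time and you get the flow of conversation, with a word like "silence" swelling or thinning from slice to slice.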

It isn't hard to imagine how this may be applicable to business. I can already see eager marketers watching the stream flow by as their commercial debuts during next year's Super Bowl.

Clark, like many data visualizers, believes we're on the front end of a revolution in information presentation. "There's a lot of work done called scientific visualization or business intelligence graphics," he says. "And it's pragmatic, trying to solve practical problems. It's all standard, a bar chart or pie. But those standard ways are not adequate when you're trying to mine a richer data space. The world is full of complex data and we're just starting to get the tools to make sense of it. We're looking for new ways of presenting data."