<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>simon button • com &#187; data</title>
	<atom:link href="http://www.simonbutton.com/category/data/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.simonbutton.com</link>
	<description>Deep into a world that interests, fascinates and never fails to surprise me!</description>
	<lastBuildDate>Wed, 01 Feb 2012 01:18:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Social Capitalism and The Culture of Data</title>
		<link>http://www.simonbutton.com/2010/06/30/social-capitalism-and-the-culture-of-data/</link>
		<comments>http://www.simonbutton.com/2010/06/30/social-capitalism-and-the-culture-of-data/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 08:01:59 +0000</pubDate>
		<dc:creator>Dan Robles</dc:creator>
				<category><![CDATA[art]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[culture]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[emotion]]></category>
		<category><![CDATA[General Information]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[Innovation]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[Politics]]></category>
		<category><![CDATA[Social Capitalism]]></category>
		<category><![CDATA[spirituality]]></category>
		<category><![CDATA[wisdom]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[Social media has also shown us what happens when the good data becomes the important information, which increases knowledge among the most people leading to increasingly effective innovation and changing the conventional wisdom about an increasing dive...]]></description>
			<content:encoded><![CDATA[<p>Social media has also shown us what happens when the good data becomes the important information, which increases knowledge among the most people leading to increasingly effective innovation and changing the conventional wisdom about an increasing diversity of subjects.  Social Capitalism will replace Market Capitalism simply because the culture is superior.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonbutton.com/2010/06/30/social-capitalism-and-the-culture-of-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sharing Data on the Web</title>
		<link>http://www.simonbutton.com/2010/02/05/sharing-data-on-the-web/</link>
		<comments>http://www.simonbutton.com/2010/02/05/sharing-data-on-the-web/#comments</comments>
		<pubDate>Thu, 04 Feb 2010 14:56:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[blog]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[Nodalities Magazine]]></category>
		<category><![CDATA[Open Data]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[&#124; This article will appear in Nodalities Magazine, Issue 9.
by Kaitlin Thaney
Program Manager of Science Commons, Creative Commons

In the emerging data web, there have been multiple efforts working towards the same broad goal of data sharing (ie., the...]]></description>
			<content:encoded><![CDATA[<p><strong>| This article will appear in Nodalities Magazine, Issue 9.</strong></p>
<p><em>by <a href="http://sciencecommons.org/about/whoweare/thaney/">Kaitlin Thaney</a></em><br />
<em>Program Manager of Science Commons, Creative Commons</em></p>
<p><a href="http://blogs.talis.com/nodalities/files/2010/02/Photo-32.jpg"><img src="http://blogs.talis.com/nodalities/files/2010/02/Photo-32-300x225.jpg" alt="Photo 32" width="300" height="225"></a></p>
<p>In the emerging data web, multiple efforts have been working towards the same broad goal of data sharing (e.g., the NeuroCommons, Linked Open Data, efforts of the World Wide Web Consortium), but they are still unevenly distributed. Our understanding of the legal, social and technical issues is increasing, but is still at a very early stage. </p>
<p>This past fall at the International Semantic Web Conference in Chantilly, VA, USA, I joined three other leading minds to lead a tutorial examining some of the legal and social frameworks for sharing data in the emerging data web, focusing on an overview of the need for access, the social issues of applying Free-Libre/Open Source (FLOSS) licenses to data, and the approach we advocate at Creative Commons to help navigate this complex space — converging on the public domain. </p>
<h2>Lessons Learned</h2>
<p>Creative Commons as an organisation works to make knowledge sharing easy, legal and scalable – with applications in the culture space (music, text, film, art), education (open educational resources, virtual textbooks), and science (biological materials transfer, data sharing, Open Access, semantic web, patents). We maintain an integrated approach, and craft policy and legal tools to lower the barriers to knowledge sharing.  </p>
<p>When it comes to data sharing, first and foremost, the information needs to be legally and technically accessible. The Open Access movement has raised awareness of this, using the Creative Commons licensing suite to unlock content, and has seen its share of qualified success. But what to do when the information you want to share and reuse falls outside the protections of copyright?</p>
<p>In short, it’s complicated. </p>
<p>This is where the discussion of legal protections for data gets murky. Knowledge is not always copyrightable – it may be easy to discern the rights associated with journal articles, but what about data, ontologies, annotations, or research statements described in triples? </p>
<p>The emergence, adoption, and use of the free-libre/open licensing regimes has allowed for remix and reuse of software code, music, film, educational resources and scientific research in a way that otherwise would be difficult to achieve. </p>
<p>The success of these licensing approaches has changed the social ethos of licensing: the traditional “all rights reserved” model is instead used to make something <em>more</em> free, rather than less.</p>
<p>But from our research, this approach is not ideal for data. The trend towards applying licenses, click-wrap agreements and other sorts of restrictions on scientific data is increasing, but with the undesired consequence of limiting the downstream use of this information, and even at times blocking interoperability. The costs are high, the terms are not always clear, nor the protections always legally sound, making it very difficult to scale for scientific uses. The result is a high barrier to entry for meaningful analysis, annotation, search, etc. on the mass of currently available data, which continues to grow exponentially, and for integrating it with the available literature. </p>
<p>We advocate an approach of converging on the public domain, and requesting behaviours often found in the various flavours of free and open licensing through norms – not a legal construct. But first, let’s take a look at some of the issues to be aware of and their social implications to furthering the goal of linked open data.</p>
<h2>Attribution v. Citation</h2>
<p>Under US Copyright law, “Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed.” Since facts are not covered by copyright, attribution – a license obligation – doesn’t seem to apply to ideas or facts either, since those rights are conditional on compliance with terms of the <em>license</em>. </p>
<p>Socially, the scholarly concept of citation is fairly well understood – credit where credit is due. It has long been viewed as an entrenched norm of good scientific practice.</p>
<p>But when it comes to the legalities of both terms and how to enact this behaviour, the devil is in the details, and the two are actually rather different when it comes to enforceability and applications / ramifications in the digital world.  </p>
<p>In a copyright license, the word “attribution” is a legal requirement, whereas citation evokes more of a club mentality and social practice. Citation on its own is not assured or enforceable in the same way, but that’s not necessarily a downside. Ask yourself this: which one is more important – legal enforcement or credit enforced through professional reputation? Attribution – a relatively narrow legal term that can affect interoperability while at the same time possibly failing to provide what you really want? Or citation – an entrenched scientific norm that asks for credit where credit is due? </p>
<h2>Implications of FLOSS toggles and directives on data sharing</h2>
<p>These issues emerge when instead of focusing on maximizing interoperability of resources, one applies a property metaphor to data. And in the digital world, that tendency can have quite limiting ramifications to future use of the information, as technology continues to outpace the social components to data sharing. </p>
<p>Misunderstanding the legalities can lead to category errors on the social level, including unintentional infringement or on the other side of the spectrum, choosing not to use the resource for fear of infringement. The intentions are often good – believing that applying a less-restrictive copyright license is ensuring the data can be freely shared, reused, and built upon. But without existing precedent or involving a legal team, these issues make for a problematic area to navigate, creating additional confusion and burdens for the users, as well as data providers. </p>
<p>Let’s look at a few examples to gain a better understanding. </p>
<p><strong>Non-Commercial</strong> – When used in the context of data, what is a commercial use of the data web? Is it the extraction of a subset, a query that may touch on the data set, hyperlinking? </p>
<p><strong>Attribution</strong> – As detailed above, the definitions of attribution and citation are often conflated. Attribution speaks to the legal requirement triggered by the use of the work. But in the case of linked open data, if one were to run a query involving 30,000 data sources (something that is happening every day at an ever decreasing cost), would they then be required to attribute the contributors for all 30,000 databases? You can see how this unintended consequence of attribution stacking could impose a very daunting task for the researcher.</p>
<p><strong>Share-Alike</strong> – This toggle specifies that any derivative product be relicensed under the same terms. In the example above of running a large query, all it would take would be one database licensed with a share-alike provision for the entire derivative work to then be under the same terms and no other license. This leads to compatibility issues.</p>
<p>There are other external mechanisms and limitations imposed by various jurisdictions and countries that can have a profound effect on data-sharing, especially in terms of international data sharing efforts. These include the <em>sui generis</em> database directive in the European Union, Crown Copyright, “sweat of the brow” and “industrious collection” limitations, trade secrets and unfair competition laws, adding another dimension of complexity to an already complex arena. </p>
<p>After convening a series of meetings, roundtables and other discussions with members of the scientific community, the need emerged for a legally accurate and simple solution, that reduced and/or eliminated the need for one to make the distinction of what’s protected. The conflict between understanding the legal issues and complexities can best be resolved by a two-fold approach:  (1) a reconstruction of the public domain and (2) the use of scientific norms to request behaviour through a non-license means. </p>
<h2>Converging on the Public Domain (+ Norms)</h2>
<p>We believe that the public domain is the best means to achieve maximum interoperability of data with the lowest imposed burdens on the user. This can be achieved through the use of a legal tool – either the Creative Commons CC0 Waiver or the Public Domain Dedication and License (PDDL) – waiving all intellectual property rights and asserting that the provider makes no claims on the data. These tools put the work as closely into the public domain as possible. </p>
<p>This approach calls for data providers to waive all rights necessary for data extraction and re-use (i.e., copyright, <em>sui generis</em> database rights, claims of unfair competition, implied contracts). It also requires the provider place no additional obligations such as copyleft or share-alike on the information, which could limit downstream use, as discussed above.</p>
<p>Science Commons also crafted the Protocol for Implementing Open Access Data – a protocol for evaluating database terms of use, in hopes of providing a unified framework for users to evaluate if any given database may be integrated with any other database.</p>
<p>The Protocol recommends one request behaviour, such as citation, through norms and terms of use rather than as a legal requirement based on copyright or contracts.</p>
<p>We are aware that different disciplines and jurisdictions call for different approaches, and this is not always a one-size-fits-all solution. With requesting behaviour through norms and terms of use rather than a legal construct, various scientific disciplines have the ability to develop their own norms for citation, allowing for legal certainty without constraining one community to the norms of another.</p>
<h2>Final Thoughts</h2>
<p>In the early days of the World Wide Web, there weren’t many free-libre licenses available, and after a debate over using GPL for the original web code, CERN chose to put it into the public domain. Getting the law out of the way was key to allow for network effects, and to the success of the Web.</p>
<p>Converge on the public domain and ensure the freedom to integrate. It’s the most scalable solution.</p>
<p><em><a href="http://creativecommons.org/licenses/by/3.0/">This work is licensed under a Creative Commons Attribution 3.0 License</a>.</em></p>
<h2>Resources</h2>
<ul>
<li><a href="http://neurocommons.org">http://neurocommons.org</a></li>
<li><a href="http://linkeddata.org">http://linkeddata.org</a></li>
<li><a href="http://www.w3.org">http://www.w3.org</a></li>
<li><a href="http://www.copyright.gov/help/faq/faq-general.htm">http://www.copyright.gov/help/faq/faq-general.htm</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.simonbutton.com/2010/02/05/sharing-data-on-the-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twitter Data Analysis: An Investor’s Perspective</title>
		<link>http://www.simonbutton.com/2009/10/06/twitter-data-analysis-an-investor%e2%80%99s-perspective/</link>
		<comments>http://www.simonbutton.com/2009/10/06/twitter-data-analysis-an-investor%e2%80%99s-perspective/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 02:20:20 +0000</pubDate>
		<dc:creator>Guest Author</dc:creator>
				<category><![CDATA[blog]]></category>
		<category><![CDATA[Company & Product Profiles]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[investor]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false"></guid>
		<description><![CDATA[
This is a guest post by Robert J. Moore, the CEO and co-founder of RJMetrics, an on-demand database analytics and business intelligence startup that helps online businesses measure, manage, and monetize better.  He was previously a venture capital anal...]]></description>
			<content:encoded><![CDATA[<p><img src="http://cache0.techcrunch.com/wp-content/uploads/2009/10/RobertJMoore-180x180.jpg" alt="RobertJMoore" title="RobertJMoore" width="180" height="180"></p>
<p><em>This is a guest post by <a href="http://www.crunchbase.com/person/robert-j-moore">Robert J. Moore</a>, the CEO and co-founder of <a href="http://www.rjmetrics.com">RJMetrics</a>, an on-demand database analytics and business intelligence startup that helps online businesses measure, manage, and monetize better.  He was previously a venture capital analyst and currently serves as an advisor to several New York startups.  Robert blogs at <a href="http://themetricsystem.rjmetrics.com/">The Metric System</a> and can be followed on Twitter at <a href="http://www.twitter.com/RJMetrics">@RJMetrics</a>.</em></p>
<p>A few weeks ago, <a href="http://www.insightpartners.com/">my former employer</a> led a <a href="http://blogs.wsj.com/deals/2009/09/24/breaking-news-twitter-to-raise-100-million-from-insight-t-rowe-price-other-investors/">$100 million investment</a> into Twitter and I must admit that I was quite jealous of my former colleagues.  Chances are they got the opportunity to do some very cool analytics on Twitter&#39;s data.</p>
<p>Rather than wonder about what I missed, I decided to figure out what I could from the outside looking in.  Using some statistical trickery, the Twitter API, and my <a href="http://www.rjmetrics.com/">RJMetrics</a> dashboard, I uncovered a ton of astonishing new information about Twitter.  Here are some highlights: </p>
<ul>
<li>Twitter&#39;s user growth is no longer accelerating.  The rate of new user acquisition has plateaued at around 8 million per month.</li>
<li>Over 14% of users don&#39;t have a single follower, and over 75% of users have 10 or fewer followers.</li>
<li>38% of users have never sent a single tweet, and over 75% of users have sent fewer than 10 tweets.</li>
<li>1 in 4 registered users tweets in any given month.</li>
<li>Once a user has tweeted once, there is a 65% chance that they will tweet again.  After that second tweet, however, the chance of a third tweet goes up to 81%.</li>
<li>If someone is still tweeting in their second week as a user, it is extremely likely that they will remain on Twitter as a long-term user.</li>
<li>Users who joined in more recent months are less likely to stop using the service and more likely to tweet more often than users from the past.</li>
</ul>
<p>Read on for some detailed charts and a deeper dive into the data.</p>
<h2>How We Did It</h2>
<p>In most cases, this kind of outside-looking-in exercise wouldn&#39;t be possible.  Twitter, however, is a special case for a few reasons:</p>
<ul>
<li>The company is pre-revenue, so its value is wrapped up in user activity and engagement</li>
<li>A Twitter user&#39;s activity data (tweets, followers, etc) is all public by default</li>
<li>Twitter&#39;s API allowed me to automatically download up to 20,000 data points per hour</li>
<li>Twitter uses auto-incrementing ID numbers (1,2,3,4…) for both users and tweets</li>
<li>The <a href="http://en.wikipedia.org/wiki/Central_limit_theorem">central limit theorem</a> tells us, among other things, that a large enough random subset of a large data set will behave like its parent set with a high degree of statistical confidence</li>
</ul>
<p>In the end, our sample size consisted of about 85,000 users and just over 3 Million tweets.  By piecing all of these things together and pulling the data into the <a href="http://www.rjmetrics.com/">RJMetrics Dashboard</a>, I was able to chart loads of information about Twitter&#39;s user base and user behavior.  I&#39;ve looked around, and this appears to be the largest public analysis of Twitter&#39;s user base online.  Enjoy!</p>
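<p>To give a rough sense of what a sample of this size can support, here is a minimal sketch of the standard margin-of-error calculation for a proportion. It assumes a simple random sample, which sampling by ID only approximates; the function name and the choice of a 95% interval (z = 1.96) are illustrative and not taken from the original analysis.</p>

```python
import math

def proportion_margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    Assumes a simple random sample of size n (an assumption, not a given).
    """
    return z * math.sqrt(p * (1 - p) / n)

# 85,000 sampled users is the figure from the article; p = 0.25 is the
# "1 in 4 users tweets in a given month" share used as an example.
margin = proportion_margin_of_error(0.25, 85_000)
print(f"+/- {margin * 100:.2f} percentage points")
```

<p>Under those assumptions, a share near 25% would carry an uncertainty of about 0.3 percentage points, which is why a sample in the tens of thousands can stand in for a much larger user base.</p>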
<h2>Number of Twitter Users</h2>
<p>This analysis leverages the fact that Twitter uses auto-incrementing ID numbers for both users and tweets.  We identified the range of IDs that were consumed by the system in any given month and the percentage of them actually tied to real Twitter accounts.  (&quot;Dead&quot; IDs are likely canceled accounts, SPAM accounts, test accounts, etc.)  In combination, these numbers give us a reliable approximation of how many new users joined Twitter each month: </p>
<p><a href="http://themetricsystem.wordpress.com/files/2009/10/newusers.jpg"><img title="NewUsers" src="http://themetricsystem.wordpress.com/files/2009/10/newusers.jpg" border="0" alt="NewUsers" width="600" height="403"></a>
</p>
<p>This shows us the exponential growth experienced by Twitter in 2009.  In Q3, this plateaus at a rate of about 8 million new users per month.  A chart of total cumulative users is below:
</p>
<p>  <a href="http://themetricsystem.wordpress.com/files/2009/10/cumulativeusers.jpg"><img title="CumulativeUsers" src="http://themetricsystem.wordpress.com/files/2009/10/cumulativeusers.jpg" border="0" alt="CumulativeUsers" width="600" height="403"></a>
</p>
<p>Hockey, anyone?  As of September 1st, <strong>the actual number of live Twitter accounts was just above 50 million</strong>.
</p>
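<p>The ID-range estimate described at the start of this section can be sketched roughly as follows. This is not the code behind the charts: the function name, the <code>is_live</code> callback and the toy data are all hypothetical, and a fixed stride stands in for the random draw so that the toy result is reproducible.</p>

```python
def estimate_new_users(first_id, last_id, sampled_ids, is_live):
    """Estimate live accounts created within an ID range.

    first_id/last_id: the IDs consumed during the month (inclusive).
    sampled_ids: the subset of IDs that were actually looked up.
    is_live: hypothetical callback returning True for a real account.
    """
    range_size = last_id - first_id + 1
    live = sum(1 for user_id in sampled_ids if is_live(user_id))
    live_fraction = live / len(sampled_ids)
    return round(range_size * live_fraction)

# Toy month: IDs 1..1000 were issued and every fourth ID is "dead".
def toy_is_live(user_id):
    return user_id % 4 != 0

sample = list(range(1, 1001, 5))  # stand-in for a random sample of 200 IDs
estimate = estimate_new_users(1, 1000, sample, toy_is_live)
print(estimate)
```

<p>In the toy data, three quarters of the sampled IDs are live, so the 1,000 issued IDs translate into an estimate of 750 new accounts.</p>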
<h2>Average Number of Followers</h2>
<p>According to the data, <strong>the average Twitter user has 42 followers</strong>.  It&#39;s interesting to see the distribution of users by the number of people following them:</p>
<p>  <a href="http://themetricsystem.wordpress.com/files/2009/10/followerspie1.jpg"><img title="FollowersPie" src="http://themetricsystem.wordpress.com/files/2009/10/followerspie1.jpg" border="0" alt="FollowersPie" width="600" height="403"></a> </p>
<p>As you can see, the vast majority of users have ten or fewer followers, and over 20% have no followers at all!   As we know, most users have been on the system for less than a year and, as shown in the chart below, the number of followers is proportional to the user&#39;s time since joining:</p>
<p>  <a href="http://themetricsystem.wordpress.com/files/2009/10/avgfollowers.jpg"><img title="AvgFollowers" src="http://themetricsystem.wordpress.com/files/2009/10/avgfollowers.jpg" border="0" alt="AvgFollowers" width="600" height="403"></a>
</p>
<h2>Number of Tweets</h2>
<p>It&#39;s also interesting to look at the number of status updates, or &quot;tweets&quot; made by the average user.  Obviously, the number of tweets from any given user grows over time (per the trend shown in the chart below): </p>
<p><a href="http://themetricsystem.wordpress.com/files/2009/10/updatesjoindate.jpg"><img title="UpdatesJoinDate" src="http://themetricsystem.wordpress.com/files/2009/10/updatesjoindate.jpg" border="0" alt="UpdatesJoinDate" width="600" height="403"></a></p>
<p>When we look at the distribution of tweets by user, we see a very surprising trend: <strong>over 75% of all Twitter users have tweeted fewer than ten times</strong>.</p>
<p><a href="http://themetricsystem.wordpress.com/files/2009/10/updatespie.jpg"><img title="UpdatesPie" src="http://themetricsystem.wordpress.com/files/2009/10/updatespie.jpg" border="0" alt="UpdatesPie" width="600" height="403"></a>
</p>
<h2>&quot;Protected&quot; (Private) Twitter Profiles</h2>
<p>Before moving onto analyses at the tweet level, it&#39;s important to note that some of the users we identified have &quot;protected&quot; their tweets, meaning we were able to see how many followers they had and how many times they had tweeted, but were unable to download specific tweets (and, more importantly, tweet times).</p>
<p>The chart below shows how many users in our data set are &quot;protected&quot; by the month they joined.  The overall number sits around 10% (and dropping): </p>
<p><a href="http://themetricsystem.wordpress.com/files/2009/10/protectedaccounts.jpg"><img title="ProtectedAccounts" src="http://themetricsystem.wordpress.com/files/2009/10/protectedaccounts.jpg" border="0" alt="ProtectedAccounts" width="600" height="403"></a> </p>
<p>Also interesting is how &quot;protected&quot; Twitter users differ from public users.  As shown in the charts below, protected users tend to tweet far more often, but have far fewer followers:</p>
<p> <a href="http://themetricsystem.wordpress.com/files/2009/10/avgupdates-protected.jpg"><img title="AvgUpdates-protected" src="http://themetricsystem.wordpress.com/files/2009/10/avgupdates-protected.jpg" border="0" alt="AvgUpdates-protected" width="300" height="303"></a><a href="http://themetricsystem.wordpress.com/files/2009/10/avgfollowers-protected.jpg"><img title="AvgFollowers-protected" src="http://themetricsystem.wordpress.com/files/2009/10/avgfollowers-protected.jpg" border="0" alt="AvgFollowers-protected" width="300" height="303"></a>
</p>
<h2>Power Users</h2>
<p>Another limitation of the API is that it can only return the 3,200 most recent tweets for any given user.  This is obviously not a big deal for most users, but there are some users out there who have passed that mark.  Our sample data set showed that less than 0.02% of Twitter users have sent more than 3,200 tweets.  These users will have incomplete data sets in our study, but the population is so small that they should not have any meaningful impact on our conclusions.</p>
<h2>Tweets by Source</h2>
<p>It&#39;s interesting to see how different tweeting methods have risen up over time.  Below I show the most popular methods and what percent of Twitter traffic came through them each month since 2007:</p>
<p>  <a href="http://themetricsystem.wordpress.com/files/2009/10/tweetsbysource4.jpg"><img title="TweetsbySource" src="http://themetricsystem.wordpress.com/files/2009/10/tweetsbysource4.jpg" border="0" alt="TweetsbySource" width="600" height="403"></a>
</p>
<p>The web clearly dominates this list.  Let&#39;s exclude it to get a closer look at which other sources are driving tweets:
</p>
<p style="text-align:center"><a href="http://themetricsystem.wordpress.com/files/2009/10/tweetsbysourcenoweb.jpg"><img style="border:0 none" title="tweetsbysourcenoweb" src="http://themetricsystem.wordpress.com/files/2009/10/tweetsbysourcenoweb.jpg" alt="tweetsbysourcenoweb" width="600" height="403"></a></p>
<p>Twitterrific has clearly seen better days, and text messages (txt) have been declining as a channel, as well.  Meanwhile, TweetDeck appears to be aggressively gobbling up market share.</p>
<h2>Time Between Tweets</h2>
<p>Since we know the timestamp of every tweet in our sample data set, we can study the time between tweets and the recency of tweets from the userbase.</p>
<p>Remarkably, <strong>the average time between any two tweets from the same user is exactly 24 hours</strong>.</p>
<p>The chart below shows the average amount of time between tweets for a user&#39;s first ten tweets (when applicable).  The x-axis contains the time of the tweet in question, and the value is the average amount of time since the previous tweet.</p>
<p style="text-align:center"><a href="http://themetricsystem.wordpress.com/files/2009/10/timesinceprevioustweet.jpg"><img style="border:0 none" title="TimeSincePreviousTweet" src="http://themetricsystem.wordpress.com/files/2009/10/timesinceprevioustweet.jpg" alt="TimeSincePreviousTweet" width="600" height="403"></a></p>
<p>Surprisingly, the time between Tweets actually drops as users do more tweeting.  However, this could be biased by the fact that most users have tweeted fewer than ten times.  To clear things up, let&#39;s look at the average time between tweets based on how many times the user has tweeted:</p>
<p style="text-align:center"><a href="http://themetricsystem.wordpress.com/files/2009/10/tbtusage.jpg"><img style="border:0 none" title="TBTUsage" src="http://themetricsystem.wordpress.com/files/2009/10/tbtusage.jpg" alt="TBTUsage" width="600" height="403"></a></p>
<p>Indeed, as you might expect, users who send more tweets also tweet more frequently, and the dropoff is quite significant.</p>
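<p>For readers who want to reproduce this kind of measurement on their own data, a minimal sketch of the per-user "time between tweets" calculation might look like this. The data layout (a dictionary of timestamps per user) and the function names are assumptions made for illustration, not the RJMetrics implementation.</p>

```python
from datetime import datetime
from statistics import mean

def gaps_in_hours(timestamps):
    """Hours between consecutive tweets, after sorting by time."""
    ordered = sorted(timestamps)
    return [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(ordered, ordered[1:])]

def average_gap_by_user(tweets_by_user):
    """Average gap per user; users with a single tweet have no gap and are skipped."""
    result = {}
    for user, timestamps in tweets_by_user.items():
        gaps = gaps_in_hours(timestamps)
        if gaps:
            result[user] = mean(gaps)
    return result

# Hypothetical toy data: one casual user, one heavy user, one single-tweet user.
toy = {
    "casual": [datetime(2009, 9, 1, 12), datetime(2009, 9, 3, 12)],
    "heavy": [datetime(2009, 9, 1, 8), datetime(2009, 9, 1, 14), datetime(2009, 9, 1, 20)],
    "silent": [datetime(2009, 9, 1, 9)],
}
average_gaps = average_gap_by_user(toy)
print(average_gaps)
```

<p>Grouping these per-user averages by each user's total tweet count would give the second chart in this section.</p>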
<h2>Probability of Incremental Tweets</h2>
<p>Since there is such a huge dropoff in tweeting activity up until the 10 tweets mark, we thought it might be interesting to look at the &quot;probability of an incremental tweet&quot; based on how many tweets a given user has completed.  This can be calculated with just a few clicks in <a href="http://www.rjmetrics.com/">RJMetrics</a>:</p>
<p style="text-align:center"><a href="http://themetricsystem.wordpress.com/files/2009/10/probinc.jpg"><img style="border:0 none" title="ProbInc" src="http://themetricsystem.wordpress.com/files/2009/10/probinc.jpg" alt="ProbInc" width="600" height="403"></a></p>
<p>As you might expect, with every Tweet a user performs, their chance of tweeting again goes up.</p>
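<p>One plausible way to compute the "probability of an incremental tweet" is as a conditional share: of the users who have sent at least n tweets, what fraction went on to send at least n + 1? The sketch below assumes that definition and uses made-up tweet counts; it is an illustration, not the calculation performed inside RJMetrics.</p>

```python
def incremental_tweet_probability(tweet_counts, max_n):
    """For n = 1..max_n, share of users with at least n tweets who also have n + 1."""
    probabilities = {}
    for n in range(1, max_n + 1):
        reached_n = sum(1 for count in tweet_counts if count >= n)
        reached_next = sum(1 for count in tweet_counts if count >= n + 1)
        if reached_n:  # skip n if nobody got that far
            probabilities[n] = reached_next / reached_n
    return probabilities

# Hypothetical lifetime tweet counts for eight users.
toy_counts = [0, 0, 1, 1, 2, 3, 5, 8]
probabilities = incremental_tweet_probability(toy_counts, 3)
for n, p in probabilities.items():
    print(f"after tweet {n}: {p:.0%} chance of another")
```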
<h2>Active Tweeters</h2>
<p>We know that Twitter has 50 million registered users, but we also know that the vast majority of them have tweeted fewer than ten times.  Let&#39;s investigate just how many of these registered users are actually actively tweeting.</p>
<p>Using our tweet data, we can identify what percent of the user base sent out at least one tweet in any given month.  This &quot;unique tweeters&quot; statistic is charted below (to get a fair statistic we excluded protected accounts from our denominator):</p>
<p style="text-align:center"><a href="http://themetricsystem.wordpress.com/files/2009/10/percenttweeting1.jpg"><img style="border:0 none" title="PercentTweeting" src="http://themetricsystem.wordpress.com/files/2009/10/percenttweeting1.jpg" alt="PercentTweeting" width="600" height="403"></a></p>
<p>The number seems to hover in the 25% range.  In other words, <strong>only about 1 in 4 registered users is actually tweeting in any given month</strong>.  (Although it&#39;s worth noting that some users may only be using Twitter to read others&#39; tweets, meaning they are not full-fledged &quot;zombie&quot; accounts.)</p>
<p>Notice the bump in early 2009, right around the time when new user growth began to accelerate aggressively.  This suggests the obvious: on average, a newer user is more likely to tweet than an older user.  When new user growth exploded in early 2009, the concentration of new users became denser, driving this average up.  To illustrate this (and get a better look at how users behave over their lifetime), we turn to cohort analysis.</p>
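<p>The "unique tweeters" statistic from this section can be sketched as below. The input shapes (tweets as user and month pairs, a join month per user, a set of protected accounts) are assumptions chosen for the example; months are written as <code>YYYY-MM</code> strings so that plain string comparison orders them correctly.</p>

```python
from collections import defaultdict

def monthly_active_share(tweets, joined_month, protected):
    """Share of registered, non-protected users who tweeted in each month.

    tweets: iterable of (user, "YYYY-MM") pairs.
    joined_month: dict mapping user to the "YYYY-MM" month they joined.
    protected: set of users excluded from numerator and denominator.
    """
    public_users = {user: month for user, month in joined_month.items()
                    if user not in protected}
    active = defaultdict(set)
    for user, month in tweets:
        if user in public_users:
            active[month].add(user)
    share = {}
    for month, users in active.items():
        # Denominator: public users who had already joined by this month.
        registered = sum(1 for joined in public_users.values() if month >= joined)
        share[month] = len(users) / registered
    return share

# Hypothetical toy data.
joined = {"a": "2009-01", "b": "2009-01", "c": "2009-02", "d": "2009-02", "e": "2009-01"}
tweets = [("a", "2009-01"), ("a", "2009-02"), ("c", "2009-02"),
          ("e", "2009-02"), ("a", "2009-02")]
active_share = monthly_active_share(tweets, joined, protected={"e"})
print(active_share)
```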
<h2>Cohort Analysis</h2>
<p>A <a href="http://themetricsystem.rjmetrics.com/2009/09/09/cohort-analysis-in-rjmetrics/">cohort analysis</a> is a great way to look at user behavior and loyalty over time.  Each line in the chart below represents a different &quot;cohort&quot; of Twitter users based on the month they joined (we chose 7 cohorts from different time periods to avoid clutter).  In the chart below, we monitor what percent of the users in each cohort come back to tweet again in each month after having tweeted in the first month.  Obviously, month 1 is 100% by definition:</p>
<p style="text-align:center"><a href="http://themetricsystem.wordpress.com/files/2009/10/monthlycohort.jpg"><img style="border:0 none" title="MonthlyCohort" src="http://themetricsystem.wordpress.com/files/2009/10/monthlycohort.jpg" alt="MonthlyCohort" width="600" height="403"></a></p>
<p>This is quite a telling chart:</p>
<ul>
<li>There is an expected usage dropoff in month 2, but after that point <strong>usage holds predictably steady</strong>.  This is great news for anyone trying to forecast user activity early on in a new user&#39;s lifetime.</li>
<li>The newer cohorts, despite being significantly larger in size, actually consist of more loyal users.  The two highest lines are also the two most recent, meaning that <strong>users who joined in 2009 are actually more likely to keep tweeting after their first month than those who joined in the same month in 2008</strong>.</li>
</ul>
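<p>The retention calculation behind a monthly cohort chart like this one can be sketched roughly as follows.  This is a minimal illustration, not the code used for the analysis: the function <code>monthly_retention</code>, its input shapes and the sample users are all hypothetical, and it assumes a cohort is defined by the month in which a user first tweeted.</p>

```python
from collections import defaultdict


def monthly_retention(first_month, active_months):
    """Return {cohort: [share of the cohort tweeting in month 1, 2, ...]}.

    first_month:   {user: index of the month in which the user first tweeted}
    active_months: {user: set of month indexes in which the user tweeted}
    Both inputs are hypothetical stand-ins for real activity data.
    """
    # Group users into cohorts by their first month.
    cohorts = defaultdict(list)
    for user, joined in first_month.items():
        cohorts[joined].append(user)

    last_month = max(max(months) for months in active_months.values())

    retention = {}
    for joined, users in sorted(cohorts.items()):
        shares = []
        for month in range(joined, last_month + 1):
            active = sum(1 for user in users if month in active_months[user])
            shares.append(active / len(users))
        retention[joined] = shares
    return retention


# Made-up sample: four users join in month 0, two in month 1.
first_month = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1}
active_months = {
    "a": {0, 1, 2},
    "b": {0, 2},
    "c": {0},
    "d": {0},
    "e": {1, 2},
    "f": {1},
}

retention = monthly_retention(first_month, active_months)
for cohort, shares in retention.items():
    print(cohort, [f"{share:.0%}" for share in shares])
```

<p>The first entry of every list is 100% because each user in a cohort tweeted in that cohort's first month, which matches the way month 1 is defined in the chart.</p>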
<p>Since the dropoff in Month 2 is quite pronounced, let&#39;s zoom in on weekly cohorts to see how usage drops off at the weekly level:</p>
<p style="text-align:center"><a href="http://themetricsystem.wordpress.com/files/2009/10/weeklycohort.jpg"><img style="border:0 none" title="WeeklyCohort" src="http://themetricsystem.wordpress.com/files/2009/10/weeklycohort.jpg" alt="WeeklyCohort" width="550" height="403"></a></p>
<p>We see a similar pattern here, although more recent cohorts don&#39;t stand out as much as in the monthly analysis.  Again, however, after the dropoff in the second period, usage doesn&#39;t seem to decline further as time goes on.  <strong>This means that by the second week of a cohort&#39;s lifetime, Twitter can reliably predict its users&#39; future behavior as a group.</strong></p>
<p>Another interesting cohort analysis looks at how many tweets a cohort makes each month after joining.  This metric incorporates both the dropoff in usage from the users who churn after the first month and the uptick in activity from users who stay on the platform:</p>
<p style="text-align:center"><a href="http://themetricsystem.wordpress.com/files/2009/10/tweetcohorts.jpg"><img style="border:0 none" title="TweetCohorts" src="http://themetricsystem.wordpress.com/files/2009/10/tweetcohorts.jpg" alt="TweetCohorts" width="600" height="403"></a></p>
<p>Wow!  This is a remarkable image.  Despite the massive dropoff in users after the first month, the tweeting activity from the users who are left is so voluminous that each cohort&#39;s &quot;tweets per month&quot; averages over 100% of its first-month volume (and, as before, the more recent cohorts are the more loyal)!</p>
<p>In other words, the users who stick around actually tweet so frequently (and at such a rapid pace compared to their first month) that they more than make up for the lost activity of those who churned after the first month.  This is a very powerful and unexpected statistic.</p>
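<p>To see how a cohort can lose most of its users and still exceed 100% of its first-month tweet volume, consider the small sketch below.  The function <code>tweet_volume_index</code> and all of the numbers are hypothetical and chosen only to make the arithmetic visible; they are not the figures behind the chart.</p>

```python
def tweet_volume_index(active_users, tweets_per_active_user):
    """Each month's cohort tweet volume as a multiple of the first month's.

    active_users:           users from the cohort who tweeted in each month
    tweets_per_active_user: average tweets per active user in each month
    """
    volumes = [
        users * tweets
        for users, tweets in zip(active_users, tweets_per_active_user)
    ]
    baseline = volumes[0]
    return [volume / baseline for volume in volumes]


# Made-up cohort: 100 users tweet 5 times each in month 1.
# Only 30 remain afterwards, but they tweet 20 to 25 times per month.
active_users = [100, 30, 30]
tweets_per_active_user = [5, 20, 25]

index = tweet_volume_index(active_users, tweets_per_active_user)
print([f"{value:.0%}" for value in index])
```

<p>In this made-up case, 70% of the cohort churns, yet the index stays above 100% because the remaining users tweet four to five times as often as the average user did in month 1.</p>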
<h2>Conclusion</h2>
<p>Everyone has their own feelings about Twitter&#39;s <a href="http://www.techcrunch.com/2009/09/16/twitter-closing-new-venture-round-with-1-billion-valuation/">reported</a> $1 billion valuation.  I hope this article gave you a taste of what its new investors likely considered before coming up with that number.</p>
<p>To learn more about RJMetrics and our original blog posts, including the <a href="http://themetricsystem.rjmetrics.com/2009/05/26/business-intelligence-rap-video/">business intelligence rap</a> and our <a href="http://themetricsystem.rjmetrics.com/2009/07/21/how-to-get-twitter-followers-the-definitive-guide/">Twitter followers guide</a>, check out <a href="http://www.rjmetrics.com">our website</a> and follow us on Twitter <a href="http://www.twitter.com/RJMetrics">@RJMetrics</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.simonbutton.com/2009/10/06/twitter-data-analysis-an-investor%e2%80%99s-perspective/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

