#ANZCA14 hashtag analysis

Summary:

As Travis Holland requested, I’ve made this post to pool all my little bits of data analysis of the official ANZCA conference hashtag, “#anzca14”. As far as ‘remarkable findings’ go, I doubt there’s anything of real interest, although some people may be interested in reading about themselves 🙂

In terms of tweet publication times, the scraper I used defaults to GMT, and it reports two different times, separated by an hour. I don’t (yet) know an easy way to convert the times, other than just winging it as I write. I’ve excluded my own tweets from the results. I also want to give a shout-out to Johnathan Hutchinson for showing me this method.
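For anyone wanting to shift the GMT times rather than wing it, here is a minimal Python sketch. It assumes the conference ran on Australian Eastern Standard Time (a fixed UTC+10 offset, since July is outside daylight saving), and the timestamp format and the `gmt_to_aest` helper are my own placeholders rather than anything the scraper actually produces.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local conference time is AEST, a fixed UTC+10 offset in July.
AEST = timezone(timedelta(hours=10), name="AEST")


def gmt_to_aest(stamp: str, fmt: str = "%Y-%m-%d %H:%M") -> datetime:
    """Parse a GMT timestamp string and return it as an AEST datetime.

    The default format is hypothetical; adjust it to match the scraper's output.
    """
    as_gmt = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    return as_gmt.astimezone(AEST)


# Hypothetical timestamp, not taken from the collected tweets.
local = gmt_to_aest("2014-07-08 23:30")
print(local.strftime("%a %d %b %Y, %I:%M%p %Z"))
```

A late-evening GMT tweet lands on the following morning in local time, which is worth keeping in mind when reading the daily peaks.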

 

Here goes with the analysis:

#ANZCA14 was the official hashtag for the conference. I collected tweets from the 6th to the 12th of July GMT, which covers a smattering of chatter a couple of days before the pre-conference, right up to the post-conference morning well-wishers.

Our first poster was Cinnamon Mind, whose IRL name is likely different, and whom I don’t think I met. They posted at 10:36am on Sunday the 6th.

The last post that I collected came from John Tebbut, at 10:14am today, Saturday the 12th.

 

In the period covered – almost exactly 6 days, or 144 hours – a number of different accounts contributed to the #ANZCA14 hashtag, including many of our keynote speakers, the attendees, and what I think were the chefs at the conference dinner.

Number of unique tweets: 1877

Number of authors: 161

Averaging: 11.66 per person

We can see how this was distributed over the days:

[Chart: #ANZCA14 tweets over time]

Beyond the small wobbles at the start (pre-conference hype), the first peak is the postgraduate preconference, as organised by Emily van der Nagel and me. We can see that the most tweets occurred at the second peak – the first day of the conference. The second day’s tweets were spread more evenly throughout the day, while the last day never reached the heights of the other days but stretched on a fair bit longer. I think we can attribute this to people having to catch flights out of town, and then trying to organise more and more people towards post-conference drinks. There’s a very tiny ripple at the end, which I should note is mainly people footnoting things they meant to say during the conference, or wishing people well (or posting long-winded, uninteresting SNA analyses of conferences, amirite?).

Doing some fairly crude arithmetic, which is in no way statistical, we were getting a little under a tweet per minute during the conference hours themselves. I’ve had trouble getting processing tools to work with any ‘time’ metadata, so the figure is simply based on the number of conference days (4) and the number of conference hours (8 per day, plus 4 overall for the associated conference activities), or 36 hours in total. This comes in at about 52 tweets per hour.
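The back-of-envelope sum can be written out as below. This is only a sketch of the crude arithmetic, and it carries the same simplifying assumption: every collected tweet is treated as though it fell within conference hours, which overstates the rate a little.

```python
# Figures taken from the post; the variable names are just for illustration.
TOTAL_TWEETS = 1877
CONFERENCE_DAYS = 4
HOURS_PER_DAY = 8
EXTRA_HOURS = 4  # associated conference activities

conference_hours = CONFERENCE_DAYS * HOURS_PER_DAY + EXTRA_HOURS
tweets_per_hour = TOTAL_TWEETS / conference_hours
tweets_per_minute = tweets_per_hour / 60

print(f"{conference_hours} hours")
print(f"{tweets_per_hour:.1f} tweets per hour")
print(f"{tweets_per_minute:.2f} tweets per minute")
```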

Median tweet value: 1

For those with less statistics than me, the median is the value at the half-way point when everything is lined up in order. It’s not the same as the average, which simply smooths everything out. The average represents how much each person would have contributed if everyone had contributed equally. The median, we might say, is the value that the typical person experiences. We might think of it this way: the average wealth of Australians could be calculated as the GDP divided by the population, but this would not give an indication of the experiences of most people. The wealth of the 1% would pull the figure far away from the economic experiences of the most impoverished in the country. When the median is 1, it indicates that at least half of the people who contributed only contributed once. This is different from the average, which would say that people, on average, tweeted about 12 times. That oversimplifies things, and would suggest that all 161 tweeters had equal investment in contributing, which was clearly not the case. Indeed, the median speaks to the fact that the ‘long tail’ was firmly in effect:

[Chart: number of tweets per author, showing the long tail]
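To make the gap between average and median concrete, here is a small Python sketch. The per-author counts are made up to mimic a long tail (a few heavy tweeters and many one-offs); they are not the real #ANZCA14 data.

```python
from statistics import mean, median

# Hypothetical per-author tweet counts: a few heavy tweeters, many one-offs.
counts = [131, 88, 40, 12, 3, 1, 1, 1, 1, 1, 1]

average = mean(counts)
middle = median(counts)

print(f"authors: {len(counts)}")
print(f"average: {average:.2f}")
print(f"median:  {middle}")
```

The handful of big numbers drags the average well up, while the median stays at the value that more than half of the authors actually have.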

Our 10 top tweeters and number of tweets were:

Melissainau 131

ElizabHk 123

Reporting4Work 120

myspaceghost 109

MattLoads 93

aaron_humphrey 88

jessamy_sesame 88

travisaholland 83

ggoggin 78

anurbanheart 73
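As a rough check on how top-heavy the hashtag was, the sketch below adds up the ten counts listed above and compares them with the 1877 tweets collected. On those figures the top ten account for a little over half of everything posted. The variable names are mine.

```python
# Counts copied from the top-ten list above.
top_ten = {
    "Melissainau": 131,
    "ElizabHk": 123,
    "Reporting4Work": 120,
    "myspaceghost": 109,
    "MattLoads": 93,
    "aaron_humphrey": 88,
    "jessamy_sesame": 88,
    "travisaholland": 83,
    "ggoggin": 78,
    "anurbanheart": 73,
}
TOTAL_TWEETS = 1877

top_ten_total = sum(top_ten.values())
share = top_ten_total / TOTAL_TWEETS

print(f"top ten posted {top_ten_total} of {TOTAL_TWEETS} tweets ({share:.1%})")
```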

 

Miscellaneous stats:

These are going to be facts (factoids) that make you go “hunh”, rather than “wow!”

 

1) Our biggest retweeter by sheer volume was ElizabHk, for whom 46% of tweets were retweets.

 

2) As I already mentioned on Twitter, our most central scholar was Jason Farman – he eclipsed all other users by being in interaction with over 160 other tweeters during the conference. Next was Johnathan Hutchinson with 74.

 

3) AShieldsDobson was the most generous retweeter, with 100% of their tweets being retweets.

 

4) (Probably) the most retweeted tweet was this one, by Edwina Throsby:

I haven’t 100% worked out how to get the data about RTs out of the program I’m using, but it was the only thing I saw that had more than 3 retweets.

 

5) At least one other hashtag seemed to get a small amount of use: I scraped #anzca2014 and found 8 tweets from early in the conference, from people who either bowed out at that point (1 person) or got on the #ANZCA14 gravy train.

Only 58 tweets received replies – I’d initially described this as a non-dialogue. As Rowan Wilken noted, that’s probably because we’re not intending Twitter to be the site of ‘real’ dialogue, but rather using it as a means of broadcasting the panels we had observed. Anthony McCosker added that the metadata is perhaps not indicative of how discussions were unfolding anyway. I agree with both of them, and believe that a lot of the dialogic communication was happening face to face.

 

Visual analysis with Gephi:

I’ve used Gephi, following some blog posts I found, to create a visual representation of the connections between identities and hashtags. These have become the ‘nodes’ for my graph data. This analysis is not to do with individual volume, but rather with individual connectivity – i.e. the level of engagement between a user and their community. This is represented by the degree to which they include others in their postings, either by mentioning them or by mentioning a hashtag.
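To show what “nodes” and connections mean in practice, here is a minimal Python sketch that turns tweets into a weighted edge list, linking each author to the accounts and hashtags they mention. It is not the method I followed from the blog posts; the function names, the sample tweets, and the simple regular expressions are all assumptions for illustration.

```python
import re
from collections import Counter

# Simplified patterns; real Twitter handles and hashtags have more rules.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


def tweet_edges(author, text):
    """Yield (source, target) pairs from an author to each mention and hashtag."""
    for name in MENTION.findall(text):
        yield author.lower(), name.lower()
    for tag in HASHTAG.findall(text):
        yield author.lower(), "#" + tag.lower()


def build_edge_weights(tweets):
    """Count how many times each (source, target) pair occurs across all tweets."""
    weights = Counter()
    for author, text in tweets:
        weights.update(tweet_edges(author, text))
    return weights


# Hypothetical tweets, not taken from the collected data.
sample = [
    ("alice", "Great keynote from @bob #anzca14"),
    ("carol", "Agreed @bob! #anzca14"),
    ("alice", "More from @bob #anzca14"),
]

edges = build_edge_weights(sample)
print("Source,Target,Weight")
for (source, target), weight in sorted(edges.items()):
    print(f"{source},{target},{weight}")
```

Printing the pairs as Source, Target, Weight rows gives a table that could be saved as a CSV and loaded into a graph tool as an edge list.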

The algorithm that I used initially was based on proximity. The number of times a topic is referenced influences how far apart the nodes are pushed. The more central nodes become the ones which have the most connections, and more connections between any two nodes pull those nodes closer together. Because the central userbase was fairly small, the outliers had a large influence on the distribution of the nodes. Because the spatial distribution was weighted, the operation could only perform so many iterations before it stopped expanding. As such, my initial graph looked like this:

ForceAtlas

 

You might be able to spot the outliers on the far left, and the duo of nodes on the far right of the image.

Trying again with a Fruchterman-Rheingold algorithm left me with a distribution that attempted to (I think) centralise the individuals with the most connections – i.e. putting them equidistant from the people they communicated with. I then applied a label weight feature to indicate the number of connections each individual had to their counterparts. The bigger the text, the more times that individual connected to other people.

Weighted labels

You can open the images up in other windows if you want. They’re 1024×1024.
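The idea behind the weighted labels can be sketched in a few lines of Python: count how many distinct neighbours each node has, then scale the label size between a smallest and largest value. Gephi does this internally, and I don’t know its exact formula, so the linear scaling, the function names, and the sample edges here are assumptions.

```python
from collections import defaultdict


def degrees(edges):
    """Return the number of distinct neighbours for every node in the edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(others) for node, others in neighbours.items()}


def label_sizes(degree_by_node, smallest=8.0, largest=32.0):
    """Linearly map each node's degree onto a label size (an assumed scaling)."""
    low = min(degree_by_node.values())
    high = max(degree_by_node.values())
    span = (high - low) or 1  # avoid dividing by zero when all degrees match
    return {
        node: smallest + (degree - low) / span * (largest - smallest)
        for node, degree in degree_by_node.items()
    }


# Hypothetical edges, not taken from the collected data.
sample_edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("dave", "bob"),
    ("alice", "#anzca14"),
]

deg = degrees(sample_edges)
sizes = label_sizes(deg)
for node in sorted(sizes, key=sizes.get, reverse=True):
    print(f"{node}: degree {deg[node]}, label size {sizes[node]:.0f}")
```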

Observations:

We’re a communications studies association at a social-media-themed conference, so we might assume that there’s a fair amount of social media usage. There may well have been plenty of other communication going on over other channels. I’ve used PiratePad in the past for my collaborative conference notes, and Facebook, Xabber GChat, and email have been used as similar communicative channels at the conferences I’ve been to. I would say, though, that while there’s a fair amount of chatter, I don’t know to what extent there’s much reach outside a central pool of tweeters. Several people were paying attention from outside the conference, so there was at least some reach beyond a closed bubble, but I think that’s the limit of what this piece of SNA can tell us about #ANZCA14.

Reflections:

I guess it was fun to have a look at code again, and to manage data. It appeals, in that it’s fun to do, but I find it hard to draw conclusions that I find interesting from the data. I think the results and conclusions to be claimed from the subject matter are fairly limited, but that’s primarily due to my approach (i.e. I’m not approaching this with a set methodology, but rather just an underinformed method; I also don’t have a particular conclusion that I’m looking for beyond the connections themselves). Happy enough with theory stuff for now, academically speaking, but I’d like to try and find sites where I can apply SNA in a more critical realm. Happy to take feedback and critique on Twitter @robbiefordyce


2 thoughts on “#ANZCA14 hashtag analysis”

  1. Thanks for sharing this Robbie!

    What’s the difference between factoids #1 and #3 (“biggest retweeter” and “most generous retweeter”)?

    1. Biggest retweeter == sheer volume of RTs
      Most generous == highest proportion of tweets as RTs. (In this case, 100%)

      Generous is, admittedly, not the best term to describe this quotient, as it makes assumptions about intentions.
