One of the slightly tricky things to do is to turn the tabulated data in TAGS into usable graph data.
While you can just copy and paste data out of TAGS into a spreadsheet, that won’t give you a list of paired users having a conversation. It gives you one user per row alongside what they said. That text may mention other users, but it isn’t separated out in a way a graphing program will understand. We need pairs of users (each user is a ‘node’, and each pair forms an ‘edge’) so that we can show and map relationships. Any extra information in the file can stop the graphing software from recognising those relationships properly.
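To make the node/edge distinction concrete, here is a minimal Python sketch of the shape we are aiming for. Each row is one pair (an edge), and the nodes are simply every distinct name that appears at either end. The usernames and hashtag are invented examples.

```python
# Each tuple is one edge: (source, target). Names are invented examples.
edges = [
    ("@alice", "@bob"),
    ("@alice", "#digitalhumanities"),
    ("@bob", "@alice"),
]

# The nodes are every distinct name appearing at either end of an edge.
nodes = {name for edge in edges for name in edge}

print(len(nodes), "nodes,", len(edges), "edges")
```

A graphing program reads a two-column file like this and works out the nodes for itself, which is why the rest of this walkthrough aims for exactly two columns.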
We’re going to fix that here.
Create a usable CSV (comma-separated values) file
- Open your TAGS search (the Google Sheet)
- Go to its ‘Archive’ tab
- Go to the menu item: File>Download As> ‘CSV’
- Save the file somewhere convenient.
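If you prefer to check the download with a script, here is a minimal Python sketch that reads the CSV and keeps only the two columns this walkthrough uses. The header names `from_user` and `text` are an assumption about the export, so open your file and check its first row. The sample data is invented.

```python
import csv
import io

# Assumed header names in the TAGS archive export; adjust if yours differ.
REQUIRED_COLUMNS = ("from_user", "text")


def load_tags_rows(f):
    """Read a TAGS CSV export and keep only the columns used later."""
    reader = csv.DictReader(f)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    # Drop every other column, as the OpenRefine steps below do.
    return [{c: row[c] for c in REQUIRED_COLUMNS} for row in reader]


# A tiny in-memory sample stands in for the real download.
sample = io.StringIO(
    "id_str,from_user,text,created_at\n"
    "1,Alice,Hi @Bob see #DH,Mon\n"
)
rows = load_tags_rows(sample)
print(rows)
```

With a real file you would pass `open("archive.csv", newline="", encoding="utf-8")` instead of the sample, where `archive.csv` is a hypothetical name for wherever you saved it.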
At this point, we are going to use a program called OpenRefine, available here. OpenRefine runs on your own computer but you work with it through your web browser, and it can also import data from your Google account.
Once you’ve installed it and started it up, it will open in a new browser window.
Import your CSV into OpenRefine
- Go to ‘Create Project’.
- Select ‘This Computer’.
- Find and select the CSV file you saved above
- Click ‘Next’, but do not save the project just yet
- Once the preview has loaded, click the option at the bottom to identify the file as a CSV file
- ‘Create project’
OpenRefine lets you reshape this project so that the information can be used in a graphing program.
Strip out unnecessary information
- From the ‘All’ column, click the drop-down menu and choose “Reorder/Remove Columns”
- Drag every column except “From_user” and “Text” over to the right-hand side, which removes them
- From the “Text” column, select Edit cells>Common transformations>To lowercase.
- Then, from the “Text” column a second time, select Edit column>Add column based on this column.
- In the window that pops up, name your column ‘target’ and replace the Expression text (it should just say ‘value’) with the following:
filter(value.split(/[^a-z0-9-_@#]/),i,or(i.startsWith("#"),i.startsWith("@"))).join(",")
(For the curious: this combines a ‘regex’, or regular expression, with GREL, OpenRefine’s built-in expression language. It splits the tweet into words wherever it finds a character that is not a lowercase letter, digit, hyphen, underscore, @ or #, keeps only the words that start with # or @, and joins them back together with commas, thus grabbing all hashtags and usernames. Because the pattern only allows lowercase letters, it will only work on lowercase text, so if you’re having trouble, make sure you converted the “Text” column to lowercase first.)
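For comparison, here is a rough Python equivalent of that GREL expression, a sketch rather than what OpenRefine runs internally. Unlike the GREL version it lowercases the text itself, and the hyphen is moved to the end of the character class so Python reads it as a literal hyphen. `extract_targets` is a hypothetical name.

```python
import re

# Split on any character that is not a lowercase letter, digit,
# underscore, @, # or hyphen (hyphen last, so it is taken literally).
SPLIT_PATTERN = re.compile(r"[^a-z0-9_@#-]")


def extract_targets(text):
    """Return the hashtags and @mentions in a tweet, joined by commas."""
    words = SPLIT_PATTERN.split(text.lower())  # lowercase first, as in the step above
    # Empty strings from adjacent separators never start with # or @, so they drop out.
    return ",".join(w for w in words if w.startswith(("#", "@")))


print(extract_targets("RT @Bob: loving #DH and #OpenRefine!"))
```

Without the `.lower()` call, uppercase letters would count as split points, which is why the GREL version depends on the lowercase step.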
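- From the “From_user” column, select Edit column>Add column based on this column.
- Name your new column ‘source’ and add the following to the Expression box (lowercasing the username keeps it consistent with the lowercase ‘target’ column, so the same person isn’t treated as two different nodes):
'@' + value.toLowercase()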
- Finally, reorder your columns again to remove “Text” and “From_user”, leaving ‘source’ on the left and ‘target’ on the right.
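The ‘source’ step and the final clean-up can be sketched in Python as below. It assumes each row is a dictionary that already holds a `target` string from the previous step. `finish_columns` is a hypothetical helper, and the key names mirror the column names used above.

```python
def finish_columns(row):
    """Add 'source' and keep only the 'source' and 'target' columns."""
    # Like '@' + value, but lowercased to match the lowercase targets.
    source = "@" + row["from_user"].lower()
    # Leaving out 'from_user' and 'text' mirrors the final remove step.
    return {"source": source, "target": row["target"]}


row = {"from_user": "Alice", "text": "hi @bob see #dh", "target": "@bob,#dh"}
print(finish_columns(row))
```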
Splitting, reconciling, exporting
- In the new ‘target’ column, select Edit cells>Split multi-valued cells and click ‘OK’ (the separator in the dialogue box should simply be a comma).
- In ‘source’, select Edit cells>Fill down. Splitting leaves the ‘source’ cell blank on each new row, and this copies the username back into them.
- You now have a set of users on the left and, on the right, the hashtags and users they messaged, one per row. Export this as a new CSV file, and give it a different name from before.
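The split, fill-down and export steps can be sketched in Python as follows. This is a minimal sketch, assuming rows are dictionaries with `source` and `target` keys. The function names are hypothetical, and the blank cells on new rows imitate what OpenRefine’s split leaves behind.

```python
import csv
import io


def split_multi_valued(rows, column="target", sep=","):
    """Give each comma-separated value its own row.

    Only the first new row keeps the other cells; the rest are blank.
    """
    out = []
    for row in rows:
        for i, value in enumerate(row[column].split(sep)):
            new_row = dict(row) if i == 0 else {k: "" for k in row}
            new_row[column] = value
            out.append(new_row)
    return out


def fill_down(rows, column="source"):
    """Copy the last non-blank value into the blank cells below it."""
    last = ""
    for row in rows:
        if row[column]:
            last = row[column]
        else:
            row[column] = last
    return rows


def write_edges(rows, f):
    """Write a two-column source,target CSV."""
    writer = csv.DictWriter(f, fieldnames=["source", "target"])
    writer.writeheader()
    writer.writerows(rows)


rows = [{"source": "@alice", "target": "@bob,#dh"},
        {"source": "@bob", "target": "@alice"}]
edges = fill_down(split_multi_valued(rows))
buf = io.StringIO()
write_edges(edges, buf)
print(buf.getvalue())
```

Tweets with no hashtags or mentions end up with an empty ‘target’, so you may want to drop those rows before graphing.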
You should now have a usable file. Hold on to it, and we’ll make use of it in the visualisation phase of the workshop.