For my MSc thesis research, I’m working with a bunch of data collected through online discussions during a blended course. Part of the discussions took place using Blackboard’s discussion board feature, part took place on students’ blogs. One of the things I need to do is to document how the discussions played out, to try and tease out any differences between the two venues. I’ll be using the Community of Inquiry model to describe the social/teaching/cognitive components of posts, but I’ve been wanting to describe the flow of discussion as well. How do the discussions occur? Are there patterns of activity, in time or size of responses? I’ve been struggling with how to document these. In my thesis, it’s really just a glorified case study, so I’ve had to constantly force myself to stop thinking of it as controlled experimental data. What I’m doing is describing the activity within a single course, in 2 venues of online discussion.

I had a bit of an epiphany this afternoon, while working through some preliminary work to prep for CoI coding. I thought about Hans Rosling’s statistical visualizations and how he was able to incorporate several axes of data into a graph by using size, colour, shape, etc…

And then it hit me – it would be relatively straightforward to apply that approach to the data documenting an online discussion. The timestamp data is there. The info about the individual is there. Basic “demographic” data is there (number of words, types of things included – images, links, attachments, media, etc…), and if I combine those, I get something like this:

On this rough mockup visualization, time is the vertical axis, transformed into a simple “number of days” integer. The horizontal axis is “threads of discussion.” This displays the discussion in a “FAQ” discussion board used in the course. There were 9 primary threads (plus one forked thread).

Each circle represents a post. The size of the circle represents the number of words in a post or response – in this mockup, I just did a simple conversion where the number of words directly translated into the width of the circle (a post with 100 words is 1.00″, a post with 50 words is 0.50″, a post with 150 words is 1.50″ etc…). The colour of the circle indicates the person who posted it. White circles are the instructor. Black circles are anonymous students (who did not provide consent to participate in the research, so the content of their posts was deleted from my working archive). Other colours indicate individual students.
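The two conversions described so far (timestamp to a whole "number of days", and wordcount to circle width) are simple enough to sketch in Python. This is only an illustration of the mapping, not the tool used to draw the mockup. The function names and the `course_start` parameter are assumptions, and the only figure taken from the mockup is the 100 words per inch ratio.

```python
from datetime import datetime

# Ratio taken from the mockup: a 100-word post is drawn 1.00 inch wide.
WORDS_PER_INCH = 100


def day_number(posted, course_start):
    """Whole days between the (hypothetical) course start date and the post."""
    return (posted.date() - course_start.date()).days


def circle_width_inches(word_count):
    """Literal conversion used in the mockup: width grows linearly with words."""
    return word_count / WORDS_PER_INCH


if __name__ == "__main__":
    # Placeholder dates, purely to show the call shape.
    start = datetime(2011, 1, 10, 8, 0)
    post = datetime(2011, 1, 13, 21, 30)
    print(day_number(post, start), circle_width_inches(150))
```

Using calendar dates rather than elapsed hours keeps every post from the same day on the same horizontal line, which matches the "simple integer" axis described above.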

This is a very rough mockup. I’m hoping to refine it a bit more, to include a way to represent the CoI coding for each message – an indicator of the relative social/cognitive/teaching aspect of the post, as well as a way to indicate other interesting things about a post (how many images/links/attachments/embedded media were included? etc…)

# Problems with the mockup:

1. It’s messy when posts occur close together. Overlap makes the circles obscure each other.
2. The literal translation of wordcount to size means larger posts overwhelm the other posts in the diagram, in a way that over-represents the difference as seen in the actual discussion (a post that is 5x the size of another post doesn’t necessarily drown out the other posts, but it is given prominent emphasis in the diagram…)
3. Forking of threads could get confusing – how to best indicate the branch points? I tried with a dotted line, but it’s unclear which post/circle it originates from…
4. Threads that are displayed beside each other may not be directly related, but they may appear to be intertwined because of the overlap of circles (a large post in thread 6 overlaps threads 5 and 7, etc…)
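Problem 2 can be put in numbers. When wordcount drives the width of a circle, the inked area grows with the square of the wordcount, so a post with 5x the words covers 25x the area. The sketch below shows that arithmetic, along with one possible alternative (an assumption here, not what the mockup does) where the area, rather than the width, is proportional to wordcount. Function names are made up for illustration.

```python
import math

WORDS_PER_INCH = 100  # mockup ratio: 100 words -> 1.00 inch wide


def circle_area(width):
    """Area of a circle given its width (diameter)."""
    return math.pi * (width / 2) ** 2


def width_linear(words):
    """Mockup behaviour: width proportional to wordcount."""
    return words / WORDS_PER_INCH


def width_area_scaled(words):
    """Alternative (assumed): area proportional to wordcount.

    A 100-word post is still 1.00 inch wide, but width grows
    with the square root of the wordcount.
    """
    return math.sqrt(words / WORDS_PER_INCH)


def area_ratio(words_a, words_b, width_fn):
    """How many times more area post A covers than post B."""
    return circle_area(width_fn(words_a)) / circle_area(width_fn(words_b))


if __name__ == "__main__":
    print(area_ratio(500, 100, width_linear))       # about 25
    print(area_ratio(500, 100, width_area_scaled))  # about 5
```

With area scaling, a 500-word post is drawn about 2.24 inches wide instead of 5 inches, which would also reduce (but not remove) the overlap between neighbouring threads.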

I’d like to extend the mockup, after figuring out ways to get around these issues, to show all posts in all discussions in the entire course. It should be interesting to see the temporal overlap between discussions, and see some data about patterns of interaction from participants across the entire thing – does a given participant start most threads? do they respond with giant posts? do they stay in one CoI aspect, or do they cover the whole thing? etc…

I would love to see a large visualization, with vertical lanes for each thread in an entire course, across all venues of online discussion, with posts displayed as shown above, and with the CoI coding indicated. What better way to compare activity across discussions in a course?

It strikes me that this visualization is extremely simple – perhaps too simple? perhaps so obvious in hindsight that someone else has already come up with a solution? Scott Leslie sent me a link to Boardtracker, which looks extremely interesting, but it looks like it’s strictly based on time and not threads, and doesn’t appear to handle representing individual contributions. Also, it appears to be under construction…

update: I was thinking about the overly-large-circle problem, and wondered what the diagram would look like if it was laid out more like an autoradiogram, with opacity of a block indicating the “size” of a contribution, and symbols overlaid to represent data like contributor and potentially coding info…

Size of contribution (wordcount) is the opacity of each block. The coloured circle represents the contributor (white is instructor, black is anonymous, etc…) This representation makes it harder to see at a glance, but probably displays the conversation more accurately.
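For the opacity version, the wordcount has to be squeezed into a 0 to 1 range. A minimal sketch of one way to do that, assuming a linear scale against the longest post in the data set and a small minimum opacity so very short posts stay visible. Both the `floor` value and the function name are assumptions, not the settings used in the mockup.

```python
def block_opacity(word_count, max_words, floor=0.1):
    """Map a wordcount onto an opacity between `floor` and 1.0.

    max_words is the wordcount that should render fully opaque
    (for example, the longest post in the data set). Longer posts
    are clamped to 1.0.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0.0 <= floor < 1.0:
        raise ValueError("floor must be in [0, 1)")
    fraction = min(max(word_count, 0), max_words) / max_words
    return floor + (1.0 - floor) * fraction


if __name__ == "__main__":
    for words in (0, 50, 100, 200):
        print(words, round(block_opacity(words, max_words=200), 3))
```

Because every block is the same size, this avoids the overlap problem entirely, at the cost of making differences in length harder to judge by eye.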

update 2: working in some of Tim’s suggestions via his comment, I came up with this version. It’s a little closer to Rosling’s work. Now, I need to figure out how to indicate the CoI coding for each post…

update 3: I put all of the metadata from the Blackboard discussions, and one WordPress site, into OmniGraphSketcher to see what it would look like. Some interesting things become apparent:

Blackboard posts (and responses) are circles, WordPress posts (and comments) are diamonds. At a glance, discussion board interactions appear to be briefer – fewer words – and more immediate (posts usually occur within a few days, and then stop). Blog posts appear to be longer (more words), and extend conversation over a longer period – with several days being common between post and comment. The WordPress blog posts also appear to have elicited longer responses via comments (at least in the first WordPress site I entered data for…)
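The "at a glance" comparison could be backed up with a couple of summary numbers per venue. The sketch below assumes each post has been reduced to a small record with hypothetical field names (`venue`, `words`, `day`, and `parent_day` for the post being replied to). It is not tied to any export format from Blackboard or WordPress.

```python
from collections import defaultdict
from statistics import median


def summarize_by_venue(posts):
    """Median wordcount and median reply delay (in days) per venue.

    posts: iterable of dicts with keys 'venue', 'words', 'day' and
    'parent_day' (None for the post that starts a thread).
    """
    words = defaultdict(list)
    delays = defaultdict(list)
    for post in posts:
        words[post["venue"]].append(post["words"])
        if post["parent_day"] is not None:
            delays[post["venue"]].append(post["day"] - post["parent_day"])

    summary = {}
    for venue, counts in words.items():
        venue_delays = delays.get(venue, [])
        summary[venue] = {
            "posts": len(counts),
            "median_words": median(counts),
            # None when a venue has thread starters but no replies
            "median_reply_delay_days": median(venue_delays) if venue_delays else None,
        }
    return summary
```

Medians are used rather than means so that a single very long post does not dominate the summary, which is the same distortion the circle sizes suffer from.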

# Visualization tools that may be useful:

• SNAPP – works with major LMS applications, but appears to not like our old version of Blackboard (Bb8), and doesn’t grok WordPress, so couldn’t be used to visualize my entire data set.
• Meerkat – sounds like it might support custom data imports. I’ve signed up for an account so I can try it out.
• AGNA
• DiscoverText

## 15 replies on “on visualizing online discussions”

1. Trever says:

Size of circles — try mapping words to area instead of diameter?

1. interesting. that’d help mitigate the wordcount exaggerated influence, for sure. good idea! Jon Becker suggested a 3D visualization, which could add height as another dimension. But would add another way to obscure data behind other stuff, too… 🙂

2. DijutalTim says:

A few questions:

1. Are you thinking in terms of an interactive or dynamic visualization or will it be static?
2. Is the ultimate focus of the visualization something that could be applied to this problem space or just to tell the story of your specific data set?
1. Selfishly, I need to visualize my (static) data. The primary use will be as supporting media in a thesis, so print-based rather than interactive…

I was looking at the SNAPP tool for visualizing discussions, but it seems to work only with key LMS applications like Blackboard, D2L etc…, and doesn’t work with our version of Bb (or with WordPress).

1. DijutalTim says:

My suggestions (preliminary):

1. Plot the time on the horizontal axis – I think one of the stories to tell is the length of the conversation (which I think each thread represents). It seems to me that they’re relatively short — which then leads to the question of whether that’s simply a function of our pedagogical methods or whether the tools themselves encourage shorter conversations. One problem with using size to represent post size is that it suggests a longer discussion as well as the problems you’ve identified. Since time is the linear element here, it should be the horizontal axis to impute continuity.

2. I don’t think you need to make threads a discrete axis. Using a line to connect data plots or using an area chart might better represent the nature of the threads (which seem like pulses over time by my reading of the prototype)

3. The second visualization loses the story of the magnitude of the contribution of the instructor vs. the students. I think if there were one or two students that drowned out the conversation vis-à-vis the rest of the class, this might be interesting, but as it stands, the real story seems to be the instructor vs. the students, so you might want to consider just representing the students as an aggregate.

3. I like the suggestion of using area rather than diameter to reflect post length as that will make it more manageable. Another idea – the width of threads doesn’t need to be standard, does it, so they could adjust to the widest (longest) post.

My \$0.02: your first effort is by far the best one. One of the main reasons I like it is that it reads downwards, like discussion forums do. Number 2 is less successful but still captures some of it, whereas for me, number 3 does not really work. Size and word count work well together visually, and this totally loses that.

Is diagram 3 correct – it seems to show one of the posts from day 1 leading to a discussion, whereas I read the other two visualizations as saying these did not spawn discussion.

I assume you are looking to do two visualizations, one for the CMS-based discussions and one for the blogs, or are both going to be done on the same graph?

1. yeah. of the 3 day-1 posts by the instructor, only 1 of the posts triggered responses. (and half of the responses to that post were by a student who didn’t provide consent, so I had to use the null-post marker to at least make use of the metadata about their posts: date, position in thread)

4. Any reason you would not want to use some SNA tools to plot the conversation, theme or genre flows between the various participants, too?

Have you looked at some of the free SNA tools like Agna, D’Arcy? Not really sophisticated in presentation, but good at what it does.

1. The main reason was data compatibility – tools seem to work with one kind of data (LMS or blogs) but not both. I need to be able to compare activity in discussions taking place in both Bb8 and WordPress.

Looks like AGNA is mostly for visualizing connections within a social network – weighted links between individuals, rather than modelling the activity of a discussion. I’m looking for a way to represent the transactions of a discussion, to show patterns of activity.

2. although, holy crap! that Network Analysis tool will come in handy for describing cohesiveness of a community on each platform. awesome!

1. You might also want to look at Discovertext.com for its ability to thematically tag interactions based on transaction types that you define. Can ingest RSS, tweets, and even YouTube comments. Text is a breeze.

I see what you’re trying to do by modeling the Rosling look. There’s got to be a multi-dimensional modeling tool that could work with categories you define.

d.

5. I just added another mockup. This one includes the metadata from all of the Blackboard discussion boards in my data, as well as one of the 8 WordPress sites. This could get messy…

6. And, this is why we call it Piled high and deep.
Quality isn’t something that is measured in word count, or numbers of comments, or anything else.
Did General McAuliffe need more words to respond to the Germans? “Nuts” did it.
Einstein said a lot with E=MCsquared.
I’d be more interested in how many people were engaged- and not how many commented.
Google has been using “reputation” for rankings- as well as numbers of visitors plus heuristic data for ranking for a long time.
Of course- all these charts and big words confuse me….
Is the number of discussions on these platforms mainly generated by a failure of the teacher to properly explain the subject, forcing the students to have a discussion?

1. yeah. another layer of the visualization will represent the social/teaching/cognitive presence indicated in each post. not “quality” either, but gets to what happened, rather than straight wordcount.

the problem is I’ll have so much data describing each post – the basic metadata, the coded presence data, conversation flow data, etc… that it’ll be hard to visualize it meaningfully. non-trivial…