According to some posts I saw, there is a paper on Mastodon that scraped public posts. Anyone know what it is called or where to find it?

It's also nice that all the conclusions in the paper are wrong because they start with a mistaken premise that content warnings mean that a post is "inappropriate".

@jaranta Thanks for the actual paper link, let's see it. :P I'll look at the ethics once I've read it, but that title, with its "inappropriate", is hilarious in itself from a methods perspective: this is why you don't rely on automated data collection without first having a living human enter the context and look around.

@werekat It's as if you need to understand the topic you're researching.

@jaranta @werekat Unless they just wanted a specific conclusion, and were just looking for an easy way to get their paydirt.

@ansugeisler Doesn't really smell like "we needed to get this result" to me - at the very least I can't imagine a practical goal for that particular result. So my money's on "people who know how to write data crawlers, but not do social science".

@werekat Well, an article that says "That strange social network that grabbed some headlines a while ago but isn't Commercial is bad, actually" is an article that will likely get some headlines, which in turn is what some researchers think will help them in their moribund careers (because they have no scruples and don't understand science).

@jaranta Yes. I only read the abstract but this is painfully clear.

@jaranta @socrates Are any of the paper authors active Mastodon users on or any other instance?

@drbjork @jaranta Not that I'm aware of

And based on how well they seem to have understood CWs on the Fediverse, I doubt that any of them used Mastodon at all, ever.

@socrates @jaranta As far as I understand it is a conference paper. I really hope it wouldn't have passed peer review...

@drbjork @socrates @jaranta I think conference papers still count as publications in CS.

@mplouffe @drbjork Yeah, conference papers are considered important publications in many fields and go through the same (or similar) review as journal papers.

@jaranta @drbjork I kinda wish it was like that in my fields. I've got a bunch of conference papers that are functionally worthless for promotion.

At the same time, I've seen some absolute crap get presented at conferences because all they require is a 250-word abstract and the author didn't follow through with the project after acceptance.

@mplouffe @drbjork I visit both kinds of conferences and both have their merits and drawbacks. The abstract-only presentations tend to be good for promotion and require less work, but you do tend to end up with some stuff that is just crap.

@jaranta @drbjork It's been a while since I've been to a paper-only conference. Submission deadlines never seem to match what I feel like working on, and I avoid conferences if I'm not presenting. 🙃

@jaranta Well, it's all written by computer scientists, so would you expect the ethical component of the paper to actually be meaningful?

@jaranta They anonymized the data, right ... and those were public posts.

What am I missing?

@qcat It's not possible to anonymise textual data so that the original text is impossible to find afterwards, meaning that it is impossible to anonymise textual data. There are methodological work-arounds that are used in internet research, but these computer scientists probably did not know about them.

@jaranta Yeah, they said they anonymised the user data, but if nobody knows who posted what, then how relevant is the content? Also, I don't see them using any explicit text in their analysis. Of course if they keep that dataset lying around (or even worse, the non-anonymised version) and provide it to third parties, that would indeed be problematic.

@qcat They published the dataset. It's since been taken down.

@jaranta Ah, ok. And the implication is that anyone could find the original posts, even without knowing the user or instance, by searching through the instance list they used. That's the concern, I guess?

Scholar Social