New people to the , particularly those who are academics or journalists:

know that unlike Twitter, almost all people here *do not* consent to having their posts quoted without prior permission.

Do not scrap a bunch of posts, or re-embed them in articles, without getting consent.

Want to study the or write about it?

Ask to talk to people. Many will talk to you. They have voices and know how to use them.

oops, scrape, not scrap. feel free to scrap posts, particularly mine

I sort of scrape, but the data is very heavily mangled, because my goal is not to actually store the text itself but only its structure, which is used later to generate brand new text.
as you can see in the example here...


@logan Yeah, I guess my concern is more direct quoting exact people/accounts without consent.

I do know there is leeriness about scraping -- there was a paper a few years back that raised hackles for scraping Mastodon, but I think they failed to anonymize well...

i don't really have to anonymize the data, thankfully, because it is very scrambled. at most i remove the @ mentions, but as you can see...
you can't really get anything useful from this.

@logan I do see some pretty specific URLs leading to accounts in there. What about those?

they're not specific. they just happen to come into my bot's timeline.
the bot considers them words, but if needed i can add them to what i weed out.

@logan @robertwgehl Please don't scrape anyway!

I do not consent to my posts being shoveled into some Giant ML Model or whatever it is you're doing.

Scrape your /own/ posts.

it's not really scraping though, it's only what the bot can see on its own timeline.

it's basically like me looking at my current timeline

@logan @robertwgehl Just because it's limited to the instances you federate with doesn't make it not scraping.

indeed it is scraping, but the content of the posts is not really the goal of the bot collecting this data.

it only needs the structure.

like with "how are you":
the important data is that "are" comes after "how".
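
For readers unfamiliar with the idea, here is a minimal sketch of keeping only word order, as in the "how are you" example above. It is not the bot's actual code; the function name `record_successors` and the table layout are assumptions made for illustration.

```python
from collections import defaultdict


def record_successors(text, table=None):
    """Record, for each word, which words were seen directly after it.

    Hypothetical helper: only the word-to-next-word relation is kept,
    not the original sentence.
    """
    if table is None:
        table = defaultdict(list)
    words = text.split(" ")
    # Pair every word with the word that follows it.
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


table = record_successors("how are you")
print(dict(table))  # {'how': ['are'], 'are': ['you']}
```

Note that even in this form the individual words still come from other people's posts, which is the concern raised in the replies.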

@logan @robertwgehl Still, please don't!

I don't want to be part of your research project without being asked.

it's not a research project. it's a chat bot.

but i do understand your concern.

@logan @robertwgehl Research project, chat bot, whatever, /ask first/.

A chat bot is actually significantly creepier, so uh, yeah no.

well, i can try to figure out how to get a list of people that my bot follows.

this is very early development. this data that i am collecting will likely just be deleted anyways as i adjust how it reads it.

so now i have 2 things to implement for the next adjustments.

scrub urls and figure out how to get a list of people the bot follows.
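
A rough sketch of the URL and @ mention scrubbing mentioned above, assuming posts arrive as plain strings. The patterns and the name `scrub` are illustrative assumptions, not the bot's real code, and the regular expressions are deliberately simple, so they will miss some edge cases.

```python
import re

# Assumed patterns: http(s) links, and @user or @user@instance mentions.
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+(?:@[\w.-]+)?")


def scrub(text):
    """Remove URLs and @ mentions, then collapse leftover whitespace."""
    text = URL_PATTERN.sub("", text)
    text = MENTION_PATTERN.sub("", text)
    return " ".join(text.split())


print(scrub("@logan see https://example.social/@someone for details"))
# see for details
```

URLs are removed first so that an account link such as `https://example.social/@someone` disappears as a whole instead of leaving a partial address behind.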

@logan @robertwgehl Maybe make it so a) it only scrapes from people it follows, and b) only follows people who follow it. Then you have to opt in.

@logan @robertwgehl Sure, you won't get as much data.

But the important thing is consent.
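
The opt-in rule suggested above (only read posts from accounts the bot follows, and only follow accounts that follow the bot) could look roughly like this. The post dictionaries, the `author` field, and the account names are hypothetical placeholders, not any real API's data format.

```python
def opted_in_posts(posts, followers, following):
    """Keep only posts whose author both follows the bot and is followed by it.

    posts: list of dicts with hypothetical "author" and "text" keys.
    followers: set of accounts that follow the bot.
    following: set of accounts the bot follows.
    """
    allowed = followers & following  # mutual follows count as opt-in
    return [post for post in posts if post["author"] in allowed]


posts = [
    {"author": "alice", "text": "how are you"},
    {"author": "bob", "text": "please do not collect this"},
]
print(opted_in_posts(posts, followers={"alice"}, following={"alice", "carol"}))
# [{'author': 'alice', 'text': 'how are you'}]
```

Under this sketch, unfollowing the bot removes an account from `followers`, so its later posts are no longer collected.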

@logan @IceWolf I think the issue is that many people here know the history of data gathering and how such practices come back to harm them.

You're not going to be able to escape the heritage of the Emotional Contagion study or Cambridge Analytica et al unless you behave differently here.

Gaining informed consent is the first task.

@logan @robertwgehl "No description provided" Well that's ridiculously unhelpful.

@logan @robertwgehl Wait, do you mean home - so only who the bot is /following/ - or public TLs?

(and please don't make it a followbot, if the former)

(well, if the latter too, but whatever.)

the tricky part with that is i would still have to figure out how to read that information.

I don't go to other people's accounts and scan them for details. i only use what happens to come across the global timeline (at least that is what the bot currently does).

once i set up the following limitation (i was probably going to anyways at some point, this post just kind of sped me up on it),
i can just ask people to go follow my bot if they don't mind having their posts gathered and ripped apart for their individual words. the alternative is that i would have to manually input the data, which goes against the point of a self-learning chat bot.

all the data it collects is smashed into 1 large array of words, not even entire sentences, and then it randomly chooses words from this list.

a word in this case is anything separated by a space character.

don't worry though, i'm currently implementing limitations so it only works with people that follow it.
this will take some time because i'm also implementing a few other things i neglected the first time i structured it.
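
As an illustration of the "one large array of words" approach described above, here is a small sketch that splits posts on the space character and picks words at random. The names `collect_words` and `babble` are made up for this example, and the real bot may work differently.

```python
import random


def collect_words(posts):
    """Flatten posts into one list of words, splitting only on spaces."""
    words = []
    for text in posts:
        # Skip the empty strings produced by repeated spaces.
        words.extend(word for word in text.split(" ") if word)
    return words


def babble(words, length, rng=random):
    """Build new text by choosing `length` words at random from the list."""
    return " ".join(rng.choice(words) for _ in range(length))


words = collect_words(["how are you", "fine  thanks"])
print(words)  # ['how', 'are', 'you', 'fine', 'thanks']
print(babble(words, 4))  # four randomly chosen words; output varies per run
```

Because only a flat list is kept, sentence boundaries and authorship are lost, but the vocabulary is still drawn from the collected posts.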

@robertwgehl Thanks for the reminder for me to add "All public or unlisted posts on this profile may be cited, embedded or otherwise shared freely, unless the intent is for me to be harassed." to my bio.

