Hard problems in social media archiving (alexwlchan.net)
47 points by surprisetalk 12 hours ago | 11 comments



Good that they actually raise the question of users not wanting to be archived. I think the semi-ephemerality of channel-based systems like Discord is increasingly popular partly because of various sorts of "cancel wars": well- or ill-intentioned capture and use of posts out of context.

Semi-ephemeral is the wrong way to view Discord et al. It's only that way if you actively remove your messages after a while, but they have made it as difficult as possible.

Though the fact that it's hard to archive too perhaps makes it more ephemeral on the whole, since few people will have a backup by the time you do get around to removing it.


Agreed. This is mostly mitigated by using a bot to auto-delete all messages in a channel based on time since the post. I've switched to this method to be more ephemeral and to have less concern about retention in perpetuity under Discord's egregious privacy policy.

Here is the bot I am leveraging: https://eazyautodelete.xyz/en
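
For anyone wondering what "based on time since the post" amounts to, here is a minimal sketch of the selection step only. It is not how the linked bot works and it does not talk to Discord; the message dicts, the `posted_at` key, and the function name `expired_messages` are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def expired_messages(messages, max_age, now=None):
    """Return the messages whose age exceeds max_age.

    messages is a hypothetical list of dicts with a timezone-aware
    'posted_at' datetime; a real bot would get these from the chat API.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # A message is due for deletion once it is strictly older than max_age.
    return [m for m in messages if now - m["posted_at"] > max_age]
```

A bot would run something like this on a schedule and then ask the platform to delete whatever comes back.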


That seems to require having control of the Discord server, and can't be done by individual users. A hostile moderator can also kick/ban you, which stops you from deleting things.

HTTP is not designed for mirroring.

FTP was easy to mirror with "lftp> mirror -p".

Easy mirroring and archive-level maintenance (let's say the network always maintains 3 copies at minimum) should be built into the "social media" protocols.
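
A rough sketch of what "always maintain 3 copies" could look like as a repair planner, assuming some part of the network knows which nodes hold which items. No existing protocol is being described here; `plan_repairs`, `holders`, and `nodes` are hypothetical names.

```python
def plan_repairs(holders, nodes, min_copies=3):
    """Return {item: [nodes that should fetch a copy]} for under-replicated items.

    holders: hypothetical dict mapping an item id to the set of nodes holding it.
    nodes:   the node ids currently reachable.
    """
    reachable = set(nodes)
    plan = {}
    for item, current in holders.items():
        # Copies on unreachable nodes do not count toward the minimum.
        live = set(current) & reachable
        missing = min_copies - len(live)
        if missing <= 0:
            continue
        # Pick new holders deterministically from nodes without a copy.
        candidates = sorted(reachable - live)
        plan[item] = candidates[:missing]
    return plan
```

If fewer than `min_copies` nodes are reachable in total, the plan simply lists every node that can still take a copy.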


Would it make sense to archive every word every person ever speaks? At what point does archiving everything people do constrain their ability to live freely in the present?

Despite being in written form (decreasingly so), social media feels more like a private conversation in a public space - and like all such conversations, it deserves the right to decay, so that we do not all become prisoners of the dumbest thing we ever said.

The transformative work of curation - choosing which pieces to save, to turn into books, diary entries, or blog posts that record context for posterity - is a valid part of how archivists build the corpus of history. Harvesting all the raw data simply because we can is a dangerous road.


>social media feels more like a private conversation in a public space

Then it wasn't very smart of you to post something publicly.

You should be able to choose what things you want public or private, but if you intend something to be public, I don't think you should be able to delete it. You can make amendments to what you say, but you shouldn't be granted the ability to choose what other people remember. Otherwise, private conversations or just saying nothing are alternatives.

>At what point does archiving everything people do constrain their ability to live freely in the present?

The point at which information you intended to post privately is made public without your permission.


Why would you want to archive social media? It's always been slop and now it's increasingly AI slop.

We should really go back to people hosting their own websites when they want to share something publicly. Just plain HTML like in 1995.


For better or for worse, there are a lot of "important" discussions that happen on Twitter, and a lot of "documentation" that has unfortunately been assembled in Discord channels. You can put your content on your website, but that's not going to make everyone else stop taking the easy way out.

Well, if only we could still archive full Instagram profiles, for example ...

TL;DR: The actual (formally) hard problems:

  Defining archive boundaries in a dense social graph (graph traversal + stopping criteria without exploding scope)
  Entity resolution across pseudonymous accounts 
  Reconstructing opaque ranking algorithms from outputs
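
To make the first item concrete, here is a minimal sketch of a breadth-first traversal with two stopping criteria, a hop limit and a total account budget. It assumes the social graph is already available as an in-memory adjacency mapping, which is a stand-in for whatever API or export you actually have; `bounded_crawl`, `max_hops`, and `max_accounts` are made-up names.

```python
from collections import deque

def bounded_crawl(graph, seed, max_hops=2, max_accounts=1000):
    """Return accounts reachable from seed within max_hops, capped at max_accounts.

    graph is a hypothetical dict mapping an account to the accounts it links to.
    """
    seen = {seed}
    order = [seed]
    queue = deque([(seed, 0)])
    while queue:
        account, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop limit reached: do not expand this account
        for neighbour in graph.get(account, ()):
            if neighbour in seen:
                continue
            if len(order) >= max_accounts:
                return order  # budget exhausted: stop before scope explodes
            seen.add(neighbour)
            order.append(neighbour)
            queue.append((neighbour, hops + 1))
    return order
```

Choosing the hop limit and the budget is the hard part; the traversal itself is the easy bit.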


