• 1 Post
  • 9 Comments
Joined 3 years ago
Cake day: July 2nd, 2023


  • I’ll take a stab at the question, but first I’ll need to lay out some background.

    When an adversarial network is blocking connections to the Signal servers, the Signal app will not function. Outbound messages will still be encrypted, but they can’t be delivered to their intended destination. The remedy is to use a proxy, which is a server that isn’t blocked by the adversarial network and which will act as a relay, forwarding all packets to the Signal servers. The proxy cannot decrypt any of the messages, and a malicious proxy is no worse than blocking access to the Signal servers directly. A Signal proxy specifically forwards only to/from the Signal servers; this is not an open proxy.

    The Signal TLS Proxy repo contains a Docker Compose file, which will launch Nginx as a reverse proxy. When a Signal app connects to the proxy at port 80 or 443, the proxy will – in the background – open a connection to the Signal servers. That’s basically all it does. They presumably packaged the proxy as a Docker Compose file because that’s fairly easy for most people to set up.

    But now, in your situation, you already have a reverse proxy for your self-hosting stack. While you could run Signal’s reverse proxy in the background and then have your main reverse proxy forward to that one, it would make more sense to configure your main reverse proxy to directly do what the Signal reverse proxy would do.

    That is, when your main proxy sees a connection for one of the dozen Signal subdomains, it should reverse proxy to the corresponding Signal server. Normally, for the rest of your self-hosting arrangement, the reverse proxy would target some container running on your LAN. But in this specific case, the target is actually out on the public Internet. So the original connection comes in from the Internet, and the target is somewhere out there too. Your reverse proxy is simply a relay station, as sketched below.
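    A minimal sketch of what that relaying could look like in raw Nginx, assuming the stream module with ssl_preread (which routes on the SNI name in the TLS ClientHello without decrypting anything). The subdomains shown are illustrative; the authoritative list lives in Signal’s repo:

    ```nginx
    stream {
        # Choose an upstream from the SNI name, without terminating TLS.
        map $ssl_preread_server_name $signal_upstream {
            chat.signal.org     chat.signal.org:443;
            storage.signal.org  storage.signal.org:443;
            cdn.signal.org      cdn.signal.org:443;
            cdn2.signal.org     cdn2.signal.org:443;
            # ...the remaining Signal subdomains, per the official repo...
            default             "";  # anything else gets refused
        }

        server {
            listen 443;
            ssl_preread on;               # peek at the ClientHello only
            resolver 9.9.9.9;             # required: upstreams resolve at run time
            proxy_pass $signal_upstream;  # relay the still-encrypted bytes
        }
    }
    ```

    How to slot this into Nginx Proxy Manager (eg via a custom config include) is a separate exercise; the point is only the shape of the SNI-based relay.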

    There is nothing particularly special about Signal choosing to use Nginx in reverse proxy mode in that repo. But as it happens, you are already using Nginx Proxy Manager, so it’s reasonable to try porting Signal’s configuration file so that it runs natively within your Nginx Proxy Manager.

    What happens if Signal updates that repo to include a new subdomain? Well, you wouldn’t receive that update unless you specifically checked for it and then updated your proxy configuration. So that’s one downside.

    But seeing as the Signal app demands ports 80 and 443, and you already use those ports for your reverse proxy, there is no way to avoid teaching your main reverse proxy the dozen subdomains. It cannot hand the packets off to the Signal reverse proxy if it cannot even identify that traffic.



  • There can be, although some parts – like a kernel’s context-switching logic – may still need to be written in assembly (which is imperative, because that’s ultimately what most CPUs execute). But C has similar restrictions, like how it is impossible to start running a C function without first initializing the stack. Exception: some CPUs (eg Cortex-M) have a specialized hardware mechanism to initialize the stack, as sketched below.
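    To illustrate that Cortex-M exception: at reset, the hardware itself loads SP from vector-table entry 0 and PC from entry 1, so execution can begin directly in C with no startup assembly. A minimal sketch, assuming common (but not mandated) linker-script conventions for the _estack symbol and the .isr_vector section name:

    ```c
    extern unsigned int _estack;   /* top of RAM, defined in the linker script */

    void Reset_Handler(void)
    {
        /* First code to run after reset. A real handler would copy .data
           from flash, zero .bss, and then call main(). */
        for (;;) { }
    }

    __attribute__((section(".isr_vector")))
    const void *vector_table[] = {
        (void *)&_estack,       /* entry 0: initial stack pointer, loaded by hardware */
        (void *)Reset_Handler,  /* entry 1: reset vector, the first code executed */
    };
    ```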

    As for why C: it’s a low-level language that maps well to most CPUs’ native assembly languages. If instead we had stack-based CPUs – eg Lisp Machines or a real Java Machine – then we’d probably be using other languages to write an OS for those systems.


  • The other commenters correctly opined that encryption at rest should mean you could avoid encryption in memory.

    But I wanted to expand on this:

    I really don’t see a way around this, to make the string searchable the hashing needs to be predictable.

    I mean, there are probabilistic data structures, where something like a Bloom filter will produce one of two answers: definitely not in the set, or possibly in the set. In the context of search tokens, if you had a Bloom filter, you could quickly assess whether a message definitely does not contain a search keyword, or whether it might contain it.

    A suitably sized Bloom filter – possibly different lengths based on the associated message size – would provide search coverage for that message, at least until you have to actually access and decrypt the message to fully search it. But it’s certainly a valid technique to get a quick, cursory result.
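    A minimal sketch of the idea in C (illustrative only: 1024 bits and three hashes are arbitrary choices here; a real design would size the filter per message and per target false-positive rate):

    ```c
    #include <stdint.h>
    #include <stdio.h>

    #define BLOOM_BITS 1024u
    #define BLOOM_HASHES 3u

    static uint8_t filter[BLOOM_BITS / 8];  /* zero-initialized bit array */

    /* FNV-1a, mixed with a seed so one function yields several "independent" hashes. */
    static uint32_t fnv1a(const char *s, uint32_t seed)
    {
        uint32_t h = 2166136261u ^ seed;
        while (*s) {
            h ^= (uint8_t)*s++;
            h *= 16777619u;
        }
        return h;
    }

    static void bloom_add(const char *word)
    {
        for (uint32_t k = 0; k < BLOOM_HASHES; k++) {
            uint32_t bit = fnv1a(word, k) % BLOOM_BITS;
            filter[bit / 8] |= (uint8_t)(1u << (bit % 8));
        }
    }

    /* 0 = definitely not present; 1 = possibly present (false positives allowed). */
    static int bloom_maybe_contains(const char *word)
    {
        for (uint32_t k = 0; k < BLOOM_HASHES; k++) {
            uint32_t bit = fnv1a(word, k) % BLOOM_BITS;
            if (!(filter[bit / 8] & (1u << (bit % 8))))
                return 0;
        }
        return 1;
    }

    int main(void)
    {
        bloom_add("meeting");
        bloom_add("tuesday");
        printf("%d\n", bloom_maybe_contains("meeting")); /* 1: decrypt and search fully */
        printf("%d\n", bloom_maybe_contains("friday"));  /* 0: skip this message */
        return 0;
    }
    ```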

    Though I think perhaps just having the messages in memory unencrypted would be easier, so long as memory isn’t part of the attack surface.


  • Upvoting because the FAQ genuinely is worthwhile to read, and answers the question I had in mind:

    7.9 Why not just use a subset of HTTP and HTML?

    I don’t agree with their answer, though, because if the rough, overall Gemini experience:

    is roughly equivalent to HTTP where the only request method is “GET”, the only request header is “Host” and the only response header is “Content-type”, plus HTML where the only tags are <p>, <pre>, <a>, <h1> through <h3>, <ul> and <li> and <blockquote>

    then it stands to reason – per https://xkcd.com/927/ – to do exactly that, rather than devise new protocol, client, and server software. Beyond that, some of their points have few or no legs to stand on.
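    To make that concrete, a full request/response in such a hypothetical subset would look like little more than this (example.org and the paths are placeholders):

    ```
    GET /notes HTTP/1.1
    Host: example.org

    HTTP/1.1 200 OK
    Content-Type: text/html

    <h1>Notes</h1>
    <p>Plain hypertext: headings, paragraphs, lists, links, and nothing else.</p>
    <a href="/archive">older notes</a>
    ```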

    The problem is that deciding upon a strictly limited subset of HTTP and HTML, slapping a label on it and calling it a day would do almost nothing to create a clearly demarcated space where people can go to consume only that kind of content in only that kind of way.

    Initially, my reply was going to make a comparison to the impossibility of judging a book by its cover, since that’s what users already do when faced with visiting a sketchy-looking URL. But I actually think their assertion is a straw man, because no one has suggested that we should stop immediately after such a protocol has been decided. Very clearly, the Gemini project also has client software to go with its protocol.

    But the challenge of identifying a space is, quite frankly, still a problem with no general solution. Yes, sure, here on the Fediverse we have the ActivityPub protocol, which necessarily constrains what interactions can exist, in the same way that ATProto also constrains what can exist. But even the most set-in-stone protocol (eg DICT) can be used in new and interesting ways, so I find it deeply flawed that they believe they have categorically enumerated all possible ways to use the Gemini protocol. The implication is that users will never be surprised in future by what the protocol enables, and that just sounds ahistorical.

    It’s very tedious to verify that a website claiming to use only the subset actually does, as many of the features we want to avoid are invisible (but not harmless!) to the user.

    I’m failing to see how this pans out, because, seeing as the web is predominantly client-side (barring server-side tracking of IP addresses, etc), it should be fairly obvious when a non-subset website is doing something that the subset protocol does not allow. Even if it’s a lie-in-wait function, why would subset-compliant client software honor it?

    When it becomes obvious that a website is not compliant with the subset, a well-behaved client should stop interacting with the website, because it has violated the protocol and cannot be trusted going forward. Add it to an internal list of do-not-connect and inform the user.
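    That policy is simple enough to sketch in C (hypothetical, and assuming – per the quoted subset – that Content-Type is the only response header a server may send):

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <strings.h>  /* strncasecmp (POSIX) */

    /* Return true only if every response header is one the subset allows.
       On false, a well-behaved client drops the connection and adds the
       host to its do-not-connect list. */
    static bool headers_are_compliant(const char *headers[], size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (strncasecmp(headers[i], "Content-Type:", 13) != 0)
                return false;  /* Set-Cookie, ETag, anything else: a violation */
        }
        return true;
    }

    int main(void)
    {
        const char *ok[]  = { "Content-Type: text/html" };
        const char *bad[] = { "Content-Type: text/html", "Set-Cookie: id=42" };

        printf("%d\n", headers_are_compliant(ok, 1));   /* 1: keep rendering */
        printf("%d\n", headers_are_compliant(bad, 2));  /* 0: disconnect, blocklist */
        return 0;
    }
    ```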

    It’s difficult or even impossible to deactivate support for all the unwanted features in mainstream browsers, so if somebody breaks the rules you’ll pay the consequences.

    And yet, Firefox forks are spawning left and right due to Mozilla’s AI ambitions.

    Ok, that’s a bit blithe, but I do recognize that the web engines within browsers are now incredibly complex. Even so, I can’t accept the idea that we cannot extricate the unneeded sections of a rendering engine and leave behind the functionality needed to display a subset of HTML via HTTP, until someone shows why that is the case.

    Complexity begets complexity, whereas this would be an exercise in removing complexity. It should be easier than writing new code for a new protocol.

    Writing a dumbed down web browser which gracefully ignores all the unwanted features is much harder than writing a Gemini client from scratch.

    Once again, don’t do that! If a subset browser finds even one violation of the subset protocol, it should halt. That server is being malicious. Why would any client try to continue?

    The error handling of a privacy-respecting protocol that is a subset of HTML and HTTP would – in almost all cases – be to assume the server is malicious and disconnect. A violation is a betrayal of the highest order, and there is no such thing as a “graceful” betrayal, so we don’t try to handle that situation.

    Even if you did it, you’d have a very difficult time discovering the minuscule fraction of websites it could render.

    Is this about using the subset browser to look at regular port-80 web servers? Or is this about content discovery? Only the latter has a semblance of logic behind it, but that too is an unsolved problem to this day.

    Famously, YouTube and Spotify are drivers of content discovery, based in part on algorithms that optimize for keeping users on those platforms. The Fediverse, by contrast, eschews centralized algorithms and simply doesn’t have one. And in spite of that, people find communities. They find people, hashtags, images, and media. Is it probably slower than if an algorithm could find these for the user’s convenience? Yes, very likely.

    But that’s the rub: no one knows what they don’t know. They cannot discover what they don’t even imagine could exist. That remains the case, whether the Gemini protocol is there or not. So I’m still not seeing why this is a disadvantage against an HTTP/HTML subset.

    Alternative, simple-by-design protocols like Gopher and Gemini create alternative, simple-by-design spaces with obvious boundaries and hard restrictions.

    ActivityPub does the same, but is constructed atop HTTP, while being extensible enough to replace, like for like, any existing social media platform – and some we haven’t even thought of yet – while also creating hard and obvious boundaries that foment a unique community unlike any other social media platform.

    The assertion that only simple protocols can foster community spaces is belied by ActivityPub’s success; ActivityPub is not exactly a simple protocol either. And this does not address why stripping down HTML/HTTP wouldn’t also do the same.

    You can do all this with a client you wrote yourself, so you know you can trust it.

    I sure as heck do not trust the TFTP client I wrote at uni, and that didn’t even have an encryption layer. The idea that every user will write their own encryption layer to implement the mandatory encryption for Gemini protocol is farcical.

    It’s a very different, much more liberating and much more empowering experience than trying to carve out a tiny, invisible sub-sub-sub-sub-space of the web.

    So too would browsing a subset of HTML/HTTP using a browser that only implements that subset. We know this because, if you’re reading this right now, you’re either viewing this comment through a web browser frontend for Lemmy or using an ActivityPub client of some description. And it is liberating! Here we all are, in this sub-sub-sub-sub-space of the Internet, hanging out and commenting about protocols and design.

    But that doesn’t mean we can’t adapt already-proven, well-defined protocols into a subset that matches an earlier vision of the internet, while achieving the same result.




  • Tbf, can’t the other party mess it up with signal too?

    Yes, but this is where threat modeling comes into play. Grossly simplified, developing a threat model means assessing what sort of attackers you reasonably expect to make an attempt on you. For some people, their greatest concern is their conservative parents finding out that they’re on birth control. Others might be journalists trying to maintain the confidentiality of an informant against a rogue sheriff’s department in rural America. Yet others face the risk of a nation-state’s intelligence service trying to find their location while in exile.

    Each of these users has different potential attackers. Signal is well suited for the first two, and only alright against the third. After all, if the CIA or Mossad is following someone around IRL, there are other ways to crack their communications.

    What Signal specifically offers is confidentiality in transit, meaning that all ISPs, WiFi networks, CDNs, VPNs, script kiddies with Wireshark, and network admins in the path of a Signal convo cannot see the contents of those messages.

    Can the messages be captured at the endpoints? Yes! Someone could be standing right behind you, taking photos of your screen. Can the size or metadata of each message reveal the type of message (eg text, photo, video)? Yes, but that’s akin to feeling the shape of an envelope. Only through additional context can the contents be known (eg a parcel in the shape of a guitar case).

    Signal also benefits from the network effect, because someone trying to get away from an abusive SO has plausible deniability if they download Signal on their phone (“all my friends are on Signal” or “the doctor said it’s more secure than email”). Or a whistleblower can send a message to a journalist who included their Signal username in a printed newspaper. The best place to hide a tree is in a forest. We protect us.

    My main issue for signal is (mostly iPhone users) download it “just for protests” (ffs) and then delete it, but don’t relinquish their acct, so when I text them using signal it dies in limbo as they either deleted the app or never check it and don’t allow notifs

    Alas, this is an issue with all messaging apps, if people delete the app without closing their account. I’m not sure if there’s anything Signal can do about this, but the base guarantees still hold: either the message is securely delivered to their app, or it never gets seen. But the confidentiality should always be maintained.

    I’m glossing over a lot of cryptographic guarantees, but for one-to-one or small-group private messaging, Signal is the best mainstream app at the moment. For secure group messaging, like organizing hundreds of people for a protest, that is still up for grabs, because even if an app was 100% secure, any one of those persons can leak the message to an attacker. More participants means more potential for leaks.