


  • The point I’m making is that bash is optimized for quickly writing throwaway code. It doesn’t matter if the code blows up in some case other than the one you’re using it for. You don’t need to handle edge cases that don’t apply to the one time you will run the code. I write lots of bash code that doesn’t handle a bunch of edge cases, because for my one-off use, those edge cases don’t arise. Similarly, if an LLM generates code that misses some edge case, but that case will never arise, that may not be a problem.

    EDIT: I think maybe that you’re misunderstanding me as saying “all bash code is throwaway”, which isn’t true. I’m just using it as an example where throwaway code is a very common, substantial use case.


  • I don’t know: it’s not just the outputs posing a risk, but also the tools themselves

    Yeah, that’s true. Poisoning the training corpus of models is at least a potential risk. There’s a whole field of AI security stuff out there now aimed at LLM security.

    it shouldn’t require additional tools, checking for such common flaws.

    Well, we are using them today for human programmers, so… :-)



  • Security is where the gap shows most clearly

    So, this is an area where I’m also pretty skeptical. It might be possible to address some of the security issues by making minor shifts away from a pure-LLM system. There are (conventional) security code-analysis tools out there, stuff like Coverity. Like, maybe if one says “all of the code coming out of this LLM gets rammed through a series of security-analysis tools”, you catch enough to bring the security flaws down to a tolerable level.

    One item that they highlight is the problem of API keys being committed. I’d bet that there’s already software that will run on git-commit hooks that will try to red-flag those, for example. Yes, in theory an LLM could embed them into code in some sort of obfuscated form that slips through, but I bet that it’s reasonable to have heuristics that can catch most of that, that will be good-enough, and that such software isn’t terribly difficult to write.
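
    As a rough illustration of the kind of heuristic I have in mind (a minimal sketch, not a replacement for a real scanner; the key patterns here are just a couple of well-known formats):

        # Hypothetical .git/hooks/pre-commit helper: scan staged files for
        # strings that look like credentials and refuse the commit if any match.
        import re
        import subprocess
        import sys

        # A couple of well-known key formats; a real tool adds many more, plus entropy checks.
        PATTERNS = {
            "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
            "private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
        }

        def staged_files():
            out = subprocess.run(
                ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
                capture_output=True, text=True, check=True,
            )
            return [line for line in out.stdout.splitlines() if line]

        def main():
            findings = []
            for path in staged_files():
                try:
                    text = open(path, errors="replace").read()
                except OSError:
                    continue
                for name, pattern in PATTERNS.items():
                    if pattern.search(text):
                        findings.append(f"{path}: looks like it contains a {name}")
            if findings:
                print("Refusing to commit; possible secrets found:", *findings, sep="\n  ")
                sys.exit(1)

        if __name__ == "__main__":
            main()

    Something like that would miss anything obfuscated, which is why the dedicated tools layer on entropy checks and provider-specific detectors, but it’s the sort of cheap heuristic that catches the common case.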

    But in general, I think that LLMs and image diffusion models are, in their present form, more useful for generating output that a human will consume than output that a CPU will consume. CPUs are not tolerant of errors in programming languages. Humans often just need an approximately-right answer to cue our brains, which themselves have the right information to construct the desired mental state. An oil painting isn’t a perfect rendition of the real world, but it’s good enough, because it can hint at what the artist wanted to convey by cuing up the appropriate information about the world that we already have in our heads.

    This Monet isn’t a perfect rendition of the world. But because we have knowledge in our brain about what the real world looks like, there’s enough information in the painting to cue up the right things in our head to let us construct a mental image.

    Ditto for rough concept art. Similarly, a diffusion model can get an image approximately right — some errors often just aren’t all that big a deal.

    But a lot of what one is producing when programming is going to be consumed by a CPU that doesn’t work the way that a human brain does. A significant error rate isn’t good enough; the CPU isn’t going to patch over flaws and errors itself using its knowledge of what the program should do.

    EDIT:

    I’d bet that there’s already software that will run on git-commit hooks that will try to red-flag those, for example.

    Yes. Here are instructions for setting up trufflehog to run on git pre-commit hooks to do just that.

    EDIT2: Though you’d need to disable this trufflehog functionality and have some out-of-band method for flagging false positives, or an LLM could learn to bypass the security-auditing code by being trained on code that overrides false positives:

    Add trufflehog:ignore comments on lines with known false positives or risk-accepted findings


  • I keep seeing the “it’s good for prototyping” argument they post here, in real life.

    There are real cases where bugs aren’t a huge deal.

    Take shell scripts. Bash is designed to make it really fast to write throwaway, often one-line software that can accomplish a lot with minimal time.

    Bash is not, as a programming language, very optimized for catching corner cases, or writing highly-secure code, or highly-maintainable code. The great majority of bash code that I have written is throwaway code, stuff that I will use once and not even bother to save. It doesn’t have to handle all situations or be hardened. It just has to fill that niche of code that can be written really quickly. But that doesn’t mean that it’s not valuable. I can imagine generated code with some bugs not being such a huge problem there. If it runs once and appears to work for the inputs in that particular scenario, that may be totally fine.

    Or, take test code. I’m not going to spend a lot of time making test code perfect. If it fails, it’s probably not the end of the world. There are invariably cases that I won’t have written test code for. “Good enough” is often just fine there.

    And down the line, it might be possible to generate descriptions of commits for someone browsing code, instead of (or in addition to) human-written commit messages.

    I still feel like I’m stretching, though. Like…I feel like what people are envisioning is some kind of self-improving AI software package, or just letting an LLM go and having it pump out a new version of Microsoft Office. And I’m deeply skeptical that we’re going to get there just on the back of LLMs. I think that we’re going to need more-sophisticated AI systems.

    I remember working on one large, multithreaded codebase where a developer who wasn’t familiar with, or wasn’t following, the thread-safety constraints would create an absolute maintenance nightmare for others: you’d spend way more time tracking down and fixing the breakages they induced than they saved by not coming up to speed on the constraints their code needed to conform to. And the existing code-generation systems just aren’t in a great position to come up to speed on those constraints. Part of what a programmer does when writing code is to look at the human-language requirements, notice that there are undefined cases, and either go back and clarify the requirement with the user or use real-world knowledge to make reasonable calls. Training an LLM to map from an English-language description to code creates a system that just doesn’t have the capability to do that sort of thing.

    But, hey, we’ll see.



  • There was a famous bug that made it into Windows 95 and 98: a tick-counter bug that caused the system to crash after about a month of uptime. It was in there so long because there were so many other bugs causing stability problems that it wasn’t obvious.

    I will say that classic MacOS, which is what Apple was shipping at the time, was also pretty unstable. Personal computer stability improved a lot in the early 2000s, when Mac OS X came out and Microsoft shifted consumers onto a Windows-NT-based OS.

    EDIT:

    https://www.cnet.com/culture/windows-may-crash-after-49-7-days/

    A bizarre and probably obscure bug will crash some Windows computers after about a month and a half of use.

    The problem, which affects both Microsoft Windows 95 and 98 operating systems, was confirmed by the company in an alert to its users last week.

    “After exactly 49.7 days of continuous operation, your Windows 95-based computer may stop responding,” Microsoft warned its users, without much further explanation. The problem is apparently caused by a timing algorithm, according to the company.
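
    That 49.7-day figure lines up with a 32-bit millisecond tick counter wrapping around, which is the explanation I’ve seen for the bug. Quick sanity check:

        # 2^32 milliseconds is where a 32-bit millisecond tick counter wraps back to zero
        wrap_ms = 2 ** 32
        print(wrap_ms / (1000 * 60 * 60 * 24))   # ~49.71 days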




  • If databases are involved they usually offer some method of dumping all data to some kind of text file. Usually relying on their binary data is not recommended.

    It’s not so much text or binary. It’s because a normal backup program that just treats a live database file as a file to back up is liable to have the DBMS software write to the database while it’s being backed up, resulting in a backed-up file that’s a mix of old and new versions, and may be corrupt.

    Either:

    1. The DBMS needs to have a way to create a dump — possibly triggered by the backup software, if it’s aware of the DBMS — that won’t change during the backup

    or:

    2. One needs to have filesystem-level support to grab an atomic snapshot (e.g. one takes an atomic snapshot using something like btrfs and then backs up the snapshot rather than the live filesystem). This avoids the issue of the database file changing while the backup runs.

    In general, if this is a concern, I’d tend to favor #2 as an option, because it’s an all-in-one solution that deals with all of the problems of files changing while being backed up: DBMSes are just a particularly thorny example of that.
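
    As a rough sketch of what #2 can look like in practice, assuming a btrfs subvolume (the paths and backup destination here are made up for illustration):

        import subprocess

        SUBVOL = "/srv/data"                    # hypothetical btrfs subvolume holding the DB files
        SNAP = "/srv/.snapshots/backup-tmp"     # hypothetical snapshot path
        DEST = "backup-host:/backups/data/"     # hypothetical rsync destination

        def run(cmd):
            print("+", " ".join(cmd))
            subprocess.run(cmd, check=True)

        # 1. Take a read-only, atomic snapshot of the live subvolume.
        run(["btrfs", "subvolume", "snapshot", "-r", SUBVOL, SNAP])
        try:
            # 2. Back up the frozen snapshot; the live database can keep writing meanwhile.
            run(["rsync", "-a", "--delete", SNAP + "/", DEST])
        finally:
            # 3. Drop the snapshot whether or not the backup succeeded.
            run(["btrfs", "subvolume", "delete", SNAP])

    Option #1 would instead have the backup job ask the DBMS for a consistent dump first (something like pg_dump, for PostgreSQL) and back up the dump file rather than the live database files.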

    Full disclosure: I mostly use ext4 myself, rather than btrfs. But I also don’t run live DBMSes.

    EDIT: Plus, #2 also provides consistency across different files on the filesystem, though that’s usually less critical. Like, you won’t run into a situation where software on your computer updates File A, then does a sync(), then updates File B, but your backup program grabs the new version of File B and the old version of File A. Absent help from the filesystem, your backup program won’t know where write barriers spanning different files are happening.

    In practice, that’s not usually a huge issue, since fewer software packages are gonna be impacted by this than by write ordering internal to a single file, but it is permissible for a program, under Unix filesystem semantics, to expect that write order to persist and to kerplode if it doesn’t…and a traditional backup won’t preserve it the way that a backup with help from the filesystem can.
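
    To make that concrete, here’s the sort of pattern a program is allowed to rely on (hypothetical file names):

        import os

        # Write the new state to File A and make it durable first...
        with open("data.new", "w") as f:
            f.write("...lots of new state...")
            f.flush()
            os.fsync(f.fileno())

        # ...and only then update File B, a small pointer saying data.new is valid.
        with open("current.txt", "w") as f:
            f.write("data.new\n")

        # A file-by-file backup can copy current.txt after the update but data.new
        # before it, breaking the ordering; a snapshot taken at a single instant can't.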



  • Aside from the issues that the article notes, I am skeptical that this is gonna work from a business protection standpoint, if protectionism is the state’s goal.

    Say the issue is that VW cannot make EVs in Europe competitive with BYD in China. Labor or environmental regulation costs drive their costs up. Say BYD can make a car for 75% of what VW can.

    The EU says “okay” and sticks a price floor in the EU Customs Union at whatever VW is selling at.

    Ah, now the playing field is level! Or…is it?

    Yes, BYD can’t undercut VW on prices now. But it still has that cost advantage. What BYD is going to do is to spend that extra money on adding more desirable features to their car. So now, at the price floor, you’re going to have a lackluster VW model and a really spiffy BYD model. Okay, maybe VW can ride on their reputation for a bit, but at some point, if BYD is consistently putting out better cars at a given price point, that’s gonna affect consumer views.

    What I expect it will do is reduce the number of people buying EVs at the lower end of the market, where VW wasn’t making an entrant, so now those consumers will probably be getting a conventional vehicle instead of an EV.


  • I think that the problem will be if software comes out that doesn’t target home PCs. That’s not impossible. I mean, that happens today with Web services. Closed-weight AI models aren’t going to be released to run on your home computer. I don’t use Office 365, but I understand that at least some of it is a cloud service.

    Like, say the developer of Video Game X says “I don’t want to target a ton of different pieces of hardware. I want to tune for a single one. I don’t want to target multiple OSes. I’m tired of people pirating my software. I can reduce cheating. I’m just going to release for a single cloud platform.”

    Nobody is going to take your hardware away. And you can probably keep running Linux or whatever. But…not all the new software you want to use may be something that you can run locally, if it isn’t released for your platform. Maybe you’ll use some kind of thin-client software — think telnet, ssh, RDP, VNC, etc for past iterations of this — to use that software remotely on your Thinkpad. But…can’t run it yourself.

    If it happens, I think that that’s what you’d see. More and more software would just be available only to run remotely. Phones and PCs would still exist, but they’d increasingly run a thin client, not run software locally. Same way a lot of software migrated to web services that we use with a Web browser, but with a protocol and software more aimed at low-latency, high-bandwidth use. Nobody would ban existing local software, but a lot of it would stagnate. A lot of new and exciting stuff would only be available as an online service. More and more people would buy computers that are only really suitable for use as a thin client — fewer resources, closer to a smartphone than what we conventionally think of as a computer.

    EDIT: I’d add that this is basically the scenario that the AGPL is aimed at dealing with. The concern was that people would just run open-source software as a service. They could build on that base and make their own improvements, but they’d never release binaries to end users, so they wouldn’t hit the traditional GPL’s obligation to release source to anyone who gets the binary. The AGPL extends that obligation to people who merely use the software over a network.