samiv 7 days ago

The most crucial thing that I've seen over the years is that most developers are simply afraid of bringing the application down on bugs.

They conflate error handling with writing code for bugs, and this leads to a proliferation of issues and second-, third- and higher-order issues where the code fails because it already encountered a BUG but execution was left to continue.

What do I mean in practice? Practical example:

I program mostly in C and C++ and I often see code like this

   if (some_pointer) { ... }
and the context of the code is such that some_pointer being a NULL pointer is in fact not allowed and is thus a BUG. The right thing to do would be to ABORT the process execution immediately, but instead the programmer turned it into a logical condition (probably because they were taught to check their pointers).

This has the side effect that:

  - The pre-condition that some_pointer may not be null is now lost. Reading the code it looks like this condition IS allowed. 
  - The code is allowed to continue after it has logically bugged out. Your 1+1 = 2 premise no longer holds. This will lead to second-order bugs later on, because the BUG let the program continue executing in a buggy condition. False reporting will likely happen.
The better way to write this code is:

  ASSERT(some_pointer); 
Where ASSERT is an unconditional check that will always (regardless of your build config) abort your process gracefully and produce a) stack trace b) core dump file.
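
For readers outside C/C++, here is a rough Go sketch of the same idea. The names `assert`, `config` and `describe` are made up for illustration and are not a standard API. The check is an ordinary function, so no build configuration can compile it out. As far as I know, an unrecovered panic in Go prints a stack trace and exits, and running with GOTRACEBACK=crash makes the runtime crash in an OS-specific way that can also leave a core dump.

```go
package main

import "fmt"

// config is a hypothetical type used only for this illustration.
type config struct{ name string }

// assert panics when cond is false. It is a plain function, so it stays
// active in every build; nothing strips it from release binaries.
func assert(cond bool, format string, args ...any) {
	if !cond {
		panic("assertion failed: " + fmt.Sprintf(format, args...))
	}
}

// describe requires a non-nil cfg. A nil cfg is a caller bug, so it is
// asserted rather than turned into an "if cfg != nil" branch.
func describe(cfg *config) string {
	assert(cfg != nil, "describe: cfg must not be nil")
	return "config " + cfg.name
}

func main() {
	fmt.Println(describe(&config{name: "prod"}))
}
```

Reading `describe`, the precondition is now visible at the top of the function instead of being hidden behind a branch that looks like legitimate logic.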

My advice is:

If your environment is such that you can immediately abort your process when you hit a BUG, do so. In the long run this will help with post-mortem diagnosis and fixing of bugs and will result in a more robust and better quality code base.

  • texuf 7 days ago

    If you're validating parameters that originate from your program (messages, user input, events, etc), ASSERT and ASSERT often. If you're handling parameters that originate from somewhere else (response from server, request from client, loading a file, etc) - you model every possible version of the data and handle all valid and invalid states.

    Why? When you or your coworkers are adding code, the stricter you make your code, the fewer permutations you have to test, the fewer bugs you will have. But, you can't enforce an invariant on a data source that you don't control.
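
    A small Go sketch of that split, under the assumption that a port number arrives as external text and is then passed around internally. `parsePort` and `listenAddr` are hypothetical names. The external boundary returns errors for every bad shape of input; the internal function treats a bad value as a caller bug and panics.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort handles data from outside the program: every invalid form is
// an expected condition and comes back as an error, never as a panic.
func parsePort(raw string) (int, error) {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("port %q is not a number: %w", raw, err)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("port %d is out of range 1-65535", n)
	}
	return n, nil
}

// listenAddr is only meant to be called with a port that parsePort has
// already accepted, so an out-of-range value here is a bug in our own code.
func listenAddr(port int) string {
	if port < 1 || port > 65535 {
		panic(fmt.Sprintf("listenAddr: invalid port %d (caller bug)", port))
	}
	return fmt.Sprintf(":%d", port)
}

func main() {
	port, err := parsePort("8080")
	if err != nil {
		fmt.Println("bad input:", err)
		return
	}
	fmt.Println(listenAddr(port))
}
```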

    • samiv 7 days ago

      Yes of course the key here is to understand the difference between BUGS and logical (error) conditions.

      If I write an image processing application failing to process an image .png when:

        - user doesn't have permission to the file
        - file is actually not a file
        - file is actually not an image
        - file contains a corrupt image
        etc.
      
      are all logical conditions that the application needs to be able to handle.

      The difference is that from the software correctness perspective none of these are errors. In the software they're just logical conditions and they are only errors to the USER.

      BUGS are errors in the software.

      (People often get confused because the term "error" without more context doesn't adequately distinguish between an error condition experienced by the user when using the software and errors in the program itself.)

    • keybored 7 days ago

      > But, you can't enforce an invariant on a data source that you don't control.

      This is obvious.

  • j1elo 6 days ago

    Please, no!

    I worked with a 3rd party library that had this mentality. "A bug is a bug so the assert fails and thus the code is now in an unknown state thus The right thing to do would be to ABORT the process execution immediately". Oh my.

    Just do "if (pointer)" and when that fails, error out from the smallest context possible that applies to that pointer, and nothing more than that. I.e. the real BEST thing to do is to abort the current connection. To skip the current file with an error. To fail writing that piece of memory. Whatever. But never abort (unless maybe in debug builds).

    The end result of this library was that we had a WebRTC server handling 100s of simultaneous video calls, and then when a single new user tripped up during connection and went through a bogus code path, the library would decide "oh something is not as I expected so I'll abort, of course!" and the whole production server was brought down with it.

    That kind of behavior does not help achieving high production quality and providing robust and reliable services.

    We ended up removing the library's runtime assertions, which meant that connections that would bug the library code would just end up failing with an error somewhere else, that could be used to just discard the attempt and try again. All in all, numbers showed it was a huge positive in stability for the service.

    • Mawr 12 hours ago

      Oh my indeed.

      > That kind of behavior does not help achieving high production quality and providing robust and reliable services.

      Right, so since it's the messenger (assertion) that brought news of defeat in battle, you kill the messenger instead of trying to win the next battle.

      Your problem is that the entire business seems to hinge on a single point of failure - if the server binary dies, you lose customers. This has nothing to do with what some library does or does not do, it's just a horribly designed system.

      That crash should have made you all stop and think real hard about what you're doing because clearly it's not working:

      1. Don't write bugs in the first place. No I'm not kidding. Your tests clearly suck so fix them, make sure you do proper PR reviews, make sure to leverage your type system to the fullest, use all available tooling, etc.

      2. Design your system to be resilient to the server binary dying because I assure you, there's nothing you can do to prevent it from happening for one reason or another.

    • mcdeltat 6 days ago

      I think the original comment was more directed to scenarios where the precondition fail is let slide, NOTHING happens, and that's not the desired behaviour, so you have a bug. Such code exists too often and it's just poor quality.

      If there's anything I've learnt about error handling, it's that it must be approached with very careful consideration of what you want the app to do (conceptually) when the error occurs. Sometimes that's crash the program, sometimes it's throw an error, sometimes it's log and move on. The issue comes when devs don't want to think about this, whereby the simplest solution is absorb any error and forget about it.

      • mikeschinkel 5 days ago

        Agree with your assumption, but then I also know that developers often read recommendations and convert them into dogma, ignoring the specifics for when those recommendations should be applied.

        Early in my career I taught programmers and I was horrified to come back to one client and find they followed my advice, but not where it applied meaning that what they did was actively harmful.

        Ever since that, I realized when a developer makes a recommendation they need to be very explicit about when it applies otherwise they are doing more harm than good and hopefully someone will publicly challenge their recommendation to bring their unstated qualifications to light.

    • samiv 4 days ago

      So the library was full of bugs.

      (IMHO) The right thing to do would be to

        - Fix the actual bugs
        - Provide a supervisor and isolation as a protection mechanism and for resiliency.
      
      What you've now done is to essentially just hammer it quiet while sweeping all the issues under the rug and pretend everything is great when the library is actually in ill defined state. How's that possibly any better? You're probably experiencing hard to detect bugs, occasional runtime corruptions and all the fun stuff now that you'll likely never be able to fix :)

      "That kind of behavior does not help achieving high production quality and providing robust and reliable services."

      It absolutely does.

      When bugs are obvious and caught early they're easier to fix and this leads to higher quality more robust and reliable service.

      Pretending that everything is good is never the solution.

      • j1elo 4 days ago

        I agree with you. But with a product trying to gain a good name and its first loyal customers, a crash that brings down the complete service is a no-go, no matter the excellence in software development that one seeks theoretically.

        Bugs won't be fixed either if the early customers fly away.

        In practice, ignoring the bugs meant better numbers. Only the user connections affected by those bugs would fail, which then a retry system transparently solved. This is how the library should have worked to begin with.

        Bugs are always going to exist, it's the first law of software, that there is no such thing as software without bugs. So no, I don't believe aborting a whole application is ever an acceptable behavior for a library. Do the if(pointer) else return error, not the assert(pointer) else abort.

        • samiv 4 days ago

          I'm sure you made the best judgement call given your circumstances. Of course from business perspective software quality is irrelevant. It's best achieved by letting the PR/marketing team take care of it.

          However, the problem is that when you start with this route that you allow buggy/incorrect code to continue running you cannot reason about your program anymore. You cannot make any smart decisions when the program is allowed to continue after hitting a bug.

          If I call function "foobar()" and I assume it's buggy but it continues and leaves my program in a bad state what should I do? How can I determine that the result that foobar() produced is garbage?

          So maybe foobar() returns a bool/success/flag value that indicates that it bugged out. But then what? Maybe I want to log this as an error but what if the logging function also has bugs in it? So maybe the logging function didn't work as intended because I made a bug when I called it while trying to deal with the bug that happened inside foobar(). How do I propagate this error correctly without introducing more bugs to my callers, who then must all do something?

          The fact is most programmers can barely get the "happy path" right. Even normal logical error conditions ("file not found") cause plenty of software to fail because it cannot do proper error handling and propagation. So if you let incorrect code keep on executing, nothing good will ever come out of it and there's 0 chance anyone will be able to write correct code on top of incorrect code.

          The point is once you start writing code to "deal with" incorrect logic it's like trying to do math after a division by zero. None of the rules apply, none of the logic applies. Your program state is random garbage.

          All these problems disappear when you make the simple rule. You don't write code for bugs. Simply just abort.

          Mind you from the library perspective what is your bug might not be a bug from my perspective. In other words if I provide a library for random 3rd parties I can assume they will use it wrong. Therefore their buggy code is what I must expect and return some error value etc. But if I'm calling code that I wrote from my own code I don't write a single line of code for bugs. I simply assert and abort.

          And to your last point of bugs always existing. Yes I agree, and the best we can do is to squash them as soon as possible and make them loud and clear and as easy to debug as possible (i.e. direct callstack / core dump). Not doing this does not fix them but simply makes them harder to fix.

    • tsss 5 days ago

      Any sane person will throw an exception, which, when unhandled, will crash the program.

  • adonovan 6 days ago

    I’m a big fan of assertion and rigorous preconditions but there are times when a failure of some invariant in a minor subsystem should not be allowed to crash the entire process, especially if the context makes it easy to return an error.

    In our project (the language server for Go) we have gotten tremendous value from telemetry: return an error, but report home the 1-bit fact that the assertion has failed. Often that fact is enough to figure out why; other times it is necessary to refine the assertion into two or more (in a later release) to get another bit or two of information about the nature of the failure.
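
    A minimal Go sketch of that "return an error, but count the failure" pattern. This is not the language server's actual code: `bugf`, `failureCount`, `errInvariant` and `mean` are hypothetical names, and the in-process counter merely stands in for whatever telemetry backend a real project reports to.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// assertionFailures stands in for a real telemetry counter (assumption).
var assertionFailures int64

// errInvariant marks errors that come from a failed assertion.
var errInvariant = errors.New("internal invariant violated")

// bugf records that an assertion failed and returns an ordinary error,
// so the caller can fail one operation instead of the whole process.
func bugf(format string, args ...any) error {
	atomic.AddInt64(&assertionFailures, 1)
	return fmt.Errorf("%w: %s", errInvariant, fmt.Sprintf(format, args...))
}

// failureCount reports how many assertions have failed so far.
func failureCount() int64 { return atomic.LoadInt64(&assertionFailures) }

// mean assumes callers never pass an empty slice; if one does, that is
// a bug, which is counted and surfaced as an error rather than a crash.
func mean(xs []float64) (float64, error) {
	if len(xs) == 0 {
		return 0, bugf("mean: empty input")
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs)), nil
}

func main() {
	_, err := mean(nil)
	fmt.Println(err, "| failures so far:", failureCount())
}
```

    Splitting one `bugf` call site into two with different messages is the "extra bit of information" step described above.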

  • rikthevik 6 days ago

    > The code is allowed to continue after it has logically bugged out.

    I'm a big fan of asserting preconditions and making it clear that we are getting into a bad place. I would rather dig through Sentry for an AssertionError than propagate a bad state and have to fix mangled data after the fact. If the AssertionError means that we mishandled valid user input, no problem, we'll go fix it.

    A few times in my career I've had to ask, "okay, how long has this bug been quietly mangling user data?" and it's not a fun place to be.

    Side note: I've never understood the convention of removing asserts in production builds. It seems like removing the seatbelts from the car before the race just to save a few pounds.

    • zarzavat 6 days ago

      > Side note: I've never understood the convention of removing asserts in production builds. It seems like removing the seatbelts from the car before the race just to save a few pounds.

      Once upon a time computers were slow and every cycle mattered. Assertions were compiled out of the build by necessity. Better a crash once in a while than the program hardly running because it was so slow.

      • drzaiusx11 6 days ago

        Case in point: When I was stuck inside a "big ball of perl" codebase that heavily used assertions for method input validation, I generated a flame graph of where time was spent in the codebase and it turned out it was assertions all the way down. Since only a small percentage of inputs came from external/unvalidated sources (user input etc) it was fine to remove the vast majority of them outside of the development environment. So we turned them into no-ops in prod and had a significant performance improvement.

        • zbentley 4 days ago

          Wouldn’t happen to have been a big healthcare company in Boston whose assert-anything function to validate that functions were called with correct signatures was called AssertFields?

  • jayd16 7 days ago

    Like everything in life, it depends.

    If this is some inconsequential part of the codebase it might be better to limp on than to completely stop anyone, user or fellow dev, from running the app at all.

    Said another way, graceful degradation is a thing.

    • gf000 7 days ago

      I think this is precisely why exceptions model particularly well - well - exceptional situations.

      They let you install barriers, and you can safely fail up until that point, disallowing the program from entering cursed states, all the while a user can be returned a readable error message.

      In fact, I would be interested in more research into transactions/transactional memory.

    • samiv 7 days ago

      How do you gracefully degrade when your program is in a buggy state and you no longer know what data is valid, what is garbage and what conditions hold ?

      If I told you to write a function that takes a chunk of customer JSON data but I told you that the data was produced / processed by some code that is buggy and it might have corrupted the data and your job is to write a function that works on that data how would you do it?

      Now your answer is likely to be "just checksum it", but what if I told you that the functions that compute the checksums sometimes go off the rails in buggy branches and produce incorrect checksums.

      Then what?

      In a sane world your software is always in a well-defined state. This means buggy conditions cannot be allowed to execute. If you don't honor this you have no chance of a correct program.

      • gf000 7 days ago

        Contrary to people's dislike of OOP, I think it pretty well solves the problem.

        You have objects, and calling a method on it may fail with an exception. If the method throws an exception, it itself is responsible for leaving behind a sane state, but due to encapsulation it is a feasible task.

        (Of course global state may still end up in illegal states, but if the program architecture is carefully designed and implemented it can be largely mitigated)

      • ndriscoll 7 days ago

        Why not bring down the entire server if you detect an error condition in your application? You build things in a way where a job or request has isolated resources, and if you detect an error, you abort the job and free those resources, but continue processing other jobs. Operating systems do this through processes with different memory maps. Applications can do it through things like arenas or garbage collection.

        • layer8 7 days ago

          It may be okay in a server, but (for example) not in a desktop application. The issue, then, is that most code lives (or should live) in library-like modules that are agnostic of which kind of application context they are running in. In other words, you can’t just abort in library code, because the library might be used in application contexts for which this is not acceptable. And arguably almost all important code should be a library.

          Exception mechanisms let the calling context control how to proceed. Deferring to that control and doing some cleanup during stack unwinding virtually never causes serious issues in practice.

          • ndriscoll 7 days ago

            What I meant was that if you follow the logic of "computer is in an unknown state. Stop processing everything", then why not continue that to the entire server (operating system, hypervisor, etc.)? Obviously it's not okay in almost any context. Instead, assuming you have something more complicated than a CLI script that's going to immediately exit anyway, you should be handling those sorts of conditions and allowing your event loop/main thread to continue.

  • bluepizza 7 days ago

    I think the issue is that bringing the application down might mean cutting short concurrent ongoing requests, especially requests that will result in data mutation of some sort.

    Otherwise, some situations simply don't warrant a full shutdown, and it might be okay to run the application in degraded mode.

    • samiv 7 days ago

      "I think the issue is that bringing the application down might mean cutting short concurrent ongoing requests, especially requests that will result in data mutation of some sort."

      Yes but what is worse is silently corrupting the data or the state because of running in buggy state.

      • jayd16 7 days ago

        This is a false choice.

        • int_19h 7 days ago

          If you don't know why a thing that's supposed to never be null ended up being null, you don't know what the state of your app is.

          If you don't know what the state of your app is, how do you prevent data corruption or logical errors in further execution?

          • jayd16 7 days ago

            > If you don't know what the state of your app is, how do you prevent data corruption or logical errors in further execution?

            There are a lot of patterns for this. It's perfectly fine and often desirable to scope the blast radius of an error short of crashing everything.

            OSes shouldn't crash because a process had an error. Servers shouldn't crash because a request had an error. Missing textures shouldn't crash your game. Cars shouldn't crash because the infotainment system had an error.

            • int_19h 7 days ago

              If you can actually isolate state well enough, and code every isolated component in a way that assumes that all state external to it is untrusted, sure.

              How often do you see code written this way?

              • ndriscoll 6 days ago

                This is basically all code I've worked on. You have a parsing/validation layer that passes data to your logic layer. I could imagine it working less well for something like a game where your state lives longer than 2 ms and an external database is too slow, but for application servers that manipulate database entries or whatever it's completely normal.

                In most real-world application programming languages (i.e. not C and C++), you don't really have the ability to access arbitrary memory, so if you know you never gave task B a reference to task A or its resources, then you know task B couldn't possibly interfere with task A. It's not dissimilar to two processes being unable to interfere with each other when they have different logical address spaces. If B does something odd, you just abort it and continue with A. In something like an application server, it is completely normal for requests to have minimal shared state internal to the application (e.g. a connection pool might be the only shared object, and has a relatively small boundary that doesn't allow its clients to directly manipulate its own internals).

          • victorbjorklund 7 days ago

            You can "drop" that request which fails instead of crashing the whole app (and dropping all other requests too).

            • kstrauser 7 days ago

              Sure. You wouldn't want a webserver to crash if someone sends a malformed request.

              I'd have to think long and hard about each individual case of running in degraded mode though. Sometimes that's appropriate: an OS kernel should keep going if someone unplugs a keyboard. Other times it's not: it may be better for a database to fail than to return the wrong set of rows because of a storage error.

            • buttercraft 7 days ago

              That's exactly what the attacker wants you to do after their exploit runs: ignore the warning signs.

              • ndriscoll 7 days ago

                You don't ignore it. You track errors. What you don't do is crash the server for all users, giving an attacker an easy way to DoS you.

                • buttercraft 7 days ago

                  A DoS might be the better option vs. say, data exfiltration.

                  • ndriscoll 7 days ago

                    Most bugs aren't going to create any risk for data exfiltration. In most real application servers (which are very rarely written in C or C++ these days), requests are almost completely isolated from each other except to the extent that they interact with a database. If you detect a bug in one request, you just abort the one request, and there's likely no way it could affect others.

                    This is part of why something like Rust is usable at all; in the real world a lot of logic has straightforward, linear lifecycles. To the extent that it doesn't, you can push the long-lived state into something like an external database, and now your application has straightforward lifecycles again where the goal of a task is to produce commands to manipulate the database and then exit.

                    • buttercraft 6 days ago

                      Sure, but I was talking about an individual process. If you don't know what state it's in, you simply can't trust it to run anymore. That's all.

                      • ndriscoll 6 days ago

                        Except you usually can because the state isn't completely unknown. You might not expect some field in a structure to be null, but you still know for example that there's no way for one request to have a reference to another, so you just abort the one request and continue.

                        • buttercraft 4 days ago

                          No, if you have been compromised, you cannot make these assumptions.

              • joesb 7 days ago

                And what does DOS attacker want you to do? Not crashing the whole service to deny others of the service?

                • buttercraft 6 days ago

                  That is a valid tradeoff in many situations, yes.

          • buttercraft 7 days ago

            > If you don't know what the state of your app is, how do you prevent data corruption or logical errors in further execution?

            Even worse, you might be in an unknown state because someone is trying to exploit a vulnerability.

            • jayd16 7 days ago

              If you crash then you've handed them a denial of service vulnerability.

              • buttercraft 6 days ago

                That's an issue handled higher up the stack with process isolation etc. It's still not ok to continue running a process that is in an unknown state.

  • crabbone 7 days ago

    I don't agree with any of this.

    First of all, this results in unintelligible errors. Linux is famous for abysmal error reporting, where no matter what the problem really is, you get something like ENOENT, no context, no explanation. Errors need to propagate upwards and allow the handling code to reinterpret them in the context of the work it was doing. Otherwise, for the user they are either meaningless or dangerous.

    Secondly, any particular function that encounters an unexpected condition doesn't have a "moral right" to terminate the entire program (who knows how many layers there are on top of what this particular function does?) Perhaps the fact that a function cannot handle a particular condition is entirely expected, and the level above this function is completely prepared to deal with the problem: insufficient permissions to access the file -- ask user to elevate permission; configuration file is missing in ~/.config? -- perhaps it's in /etc/? cannot navigate to URL? -- perhaps the user needs to connect to Wi-Fi network? And so on.

    What I do see in practice, is that programmers are usually incapable of describing errors in a useful way, and are very reluctant to write code that automates error recovery, even if it's entirely within reach. I think, the reason for this is that the acceptance criteria for code usually emphasizes the "good path", and because usually multiple bad things can happen down the "bad path", it becomes cumbersome and tiresome to describe and respond to the bad things, and then it's seldom done.

    • dvektor 6 days ago

      yup. we have definitely all gotten an ENOENT or EIO before with no context.

  • jminter 6 days ago

    Absolutely. An intermediate path in Go is to recover any panics on your goroutines: in this case a nil dereference panic may cause the death of the goroutine but not the whole application.

    An example where this can be useful is in HTTP request handling: a single request might fail but the others can keep going -- but there are plenty of other use cases too.

    The panic recovery code can log for further investigation, as well as in the HTTP case for example probably returning a 500 to the caller if wanted.

    There are of course plenty of valid reasons not to take an approach like this too, but in some circumstances it can be useful.
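
    A sketch of what that can look like for HTTP, using only the standard library. `recoverMiddleware`, `greet`, `lookup` and `statusFor` are made-up names, and `greet` contains a deliberate nil-dereference bug. Note that net/http already recovers panics from handlers on its own and drops the connection; an explicit wrapper like this mainly lets you log in your own format and send a 500.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

type user struct{ name string }

// lookup is a hypothetical data source; it returns nil for unknown IDs.
func lookup(id string) *user {
	if id == "1" {
		return &user{name: "ada"}
	}
	return nil
}

// greet has a deliberate bug: it never checks for a nil user, so an
// unknown ID causes a nil-dereference panic.
func greet(w http.ResponseWriter, r *http.Request) {
	u := lookup(r.URL.Query().Get("id"))
	fmt.Fprintf(w, "hello, %s\n", u.name)
}

// recoverMiddleware turns a panic in one request into a logged 500,
// leaving the process and all other requests running.
func recoverMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				log.Printf("panic serving %s: %v", r.URL, rec)
				http.Error(w, "internal server error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-memory request through the wrapped handler
// and returns the response status code.
func statusFor(target string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, target, nil)
	recoverMiddleware(http.HandlerFunc(greet)).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	// The middle request hits the bug; the ones around it are unaffected.
	fmt.Println(statusFor("/?id=1"), statusFor("/?id=2"), statusFor("/?id=1")) // 200 500 200
}
```

    This only contains the damage when the panicking request shares no mutable state with others, which is exactly the point argued elsewhere in this thread.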

  • dehrmann 7 days ago

    This is often called offensive programming.

    • samiv 7 days ago

      Hot damn, I never heard of this term before but yeah that's exactly what it is.

      TIL, thanks.

      • layer8 7 days ago

        Paradoxically, it is still a subset of defensive programming.

  • kstrauser 7 days ago

    I largely agree. If it came to pass that the precondition fails, there's a bug somewhere and this code just hides it. At the very least, that should go to an error log that someone actually sees.

    I'm writing a Rust project right now where I deliberately put almost no error handling in the core of the code apart from the bits accepting user input. In Rust speak, I use .unwrap() all over the place when fetching a mandatory row from the DB or loading config files or opening a network connection to listen on or writing to stdout. If any of those things fail, there's not a thing I can plausibly do to recover from it in this context. I suppose I could write code like

      if let Ok(cfg) = load_config() {
        println!("Loaded the config without failing!");
        Ok(cfg)
      }
      else {
        eprintln!("Oh no! Couldn't load the config file!");
        Err("Couldn't load the config file")
      }
    
    and make the program exit if it returns an error, but that's just adding noise around:

      return load_config().unwrap();
    
    The only advantage is that the error message is more gentle, at the expense of adding a bunch of code and potentially hiding the underlying error message from the user so that they could fix it.

    I think Python also gets that right, where it's common to raise exceptions when exceptional things happen, and only ever handle the exceptions you can actually do something about. In 99.999% of projects, what are you actually going to do at the application level to properly deal with an OOM or disk full error? Nothing. It's almost always better to just crash and let the OS / daemon manager / top level event loop log that something bad happened and schedule a retry.

    • samiv 7 days ago

      The whole story is 3-fold. We have

        - errors in the software itself, aka BUGS
        - logical conditions that are expected part of the program execution flow and expected state. some of these might be error conditions but only for the *user*. In other words they're not errors in the software itself.
        - unexpected failures where none of the above applies. typically only when some OS resource allocation fails, failed to allocate memory, socket, mutex etc and the reason is not because the programmer called the API wrong.
      
      
      In the first category we're dealing with BUGS and when I advocate asserting and terminating the process that only really applies to BUG conditions. If you let an application continue in a buggy state then you cannot logically reason about it anymore.

      The logical conditions are the typical cases for example "file not found" or whatever. User tries to use the software but there's a problem. The application needs to deal with these but from the software correctness perspective there's no error. The error is only what the user perceives. When your browser prints "404" or "no internet connection" the software works correctly. The error is only from the user perspective.

      Finally the last category are those unexpected situations where something that should not fail does. It is quite tricky to get these right. Is straight up exiting the right choice? Maybe the system will have more resources later if you just back off and try again. Personally in C++ projects my strategy is to employ exceptions and let the callstack unwind to the UI level, inform the user and then just return to the event loop. Of course the real trick is to keep the program state such that it's in some well defined state and not in a BUGGY state ;-)

      • joesb 7 days ago

        When a process is used to serve multiple requests, I don't think you need to let the whole process terminate just because there is a bug dealing with a single request. Just because we can not reason about the current request does not mean the only way to get to the clean state for other requests is to terminate the whole process.

      • kstrauser 7 days ago

        That sounds about right to me. Worry about the things you can fix and don't worry about the things outside your control.

    • veidelis 7 days ago

      Makes sense. Better to unwrap via .expect("msg"), though.

      • kstrauser 7 days ago

        That's a good callout, but I do that if and when I can add extra meaningful context.

        From a user's POV, "I already know what file not found means. You don't have to explain it to me again in your own words."

        • Zacru 6 days ago

          The thing I wish more error messages did was tell me exactly which file was not found.

          • ziml77 6 days ago

            This is something really annoying about simple error codes. Sure they're lightweight but how the hell am I supposed to know the problem with my input when all the error information I get is "The parameter is incorrect"? I've actually had cases where I disassembled Windows system libraries to track down the exact validation that was failing.

  • monlockandkey 6 days ago

    Asserts are only available in debug compile mode.

gregwebs 7 days ago

There's a lot of comments here that seem overly critical. The author came up with solutions to extend Go's errors to meet their needs and shared that with the world- thank you.

I have been solving all the same problems and providing libraries that allow for more flexibility so that users can come up with approaches that best meet their needs. I am finally polishing the libraries and starting to write about them:

https://blog.gregweber.info/blog/go-errors-library/ (errors with stack traces and metadata)

https://github.com/gregwebs/errcode (adding codes to errors- working on improving docs and writing about this now).

jaza 7 days ago

Feels like OP is basically implementing exceptions and exception handling at the application level. If this is what you want, then why not just switch to one of the many other languages that has exceptions built in at the language level?

  • KingOfCoders 7 days ago

    I think they use too many sentinel errors [0] I have been doing Java for two decades, and I thought you need to handle individual errors by type. Using Go, I've learned from the code I write, 90%+ of errors I don't need to handle individually, or I can't do anything except bubble an error up. There is the rare case (10%) when a file does not exist, and I try to read an alternative one and I don't bubble up an error.

    For customer support I also found it much easier, instead of an error number, print a UUID that customers can give to support, and that UUID (Request ID) then can be found in the logs to find out what happened by developers.

    [0]:https://dave.cheney.net/2016/04/27/dont-just-check-errors-ha...

    • gf000 7 days ago

      > Using Go, I've learned from the code I write, 90%+ of errors I don't need to handle individually, or I can't do anything except bubble an error up

      So... exceptions are better, because they would do the correct thing by default in the majority of cases?

      • ceving 7 days ago

        Exceptions are easier for the programmer. The programmer has to write less and they clutter the code less. But exceptions require stack traces. An exception without a stack trace is useless. The problem with stack traces is: they are hard to read for non-programmers.

        On the other side Go's errors are more work for the programmer and they clutter the code. But if you consistently wrap errors in Go, you do not need stack traces any more. And the advantage of wrapped errors with descriptive error messages is: they are much easier to read for non-programmers.

        If you want to please the dev-team: use exceptions and stack traces. If you want to please the op-team: use wrapped errors with descriptive messages.

        • ndriscoll 7 days ago

          Messages and stack traces in the error are orthogonal to errors-as-values vs. exceptions for control flow. You could have `throw Exception("error fooing the bar", ctx)`. You could also `return error("error fooing the bar", ctx, stacktrace())`. Stack traces are also occasionally useful but not really necessary most of the time IME.

          Go's error handling is annoying because it requires boilerplate to make structured errors and gives you string formatting as the default path for easy-to-create error values. And the whole using a product instead of a sum thing of course. And no good story for exception-like behavior across goroutines. And you still need to deal with panics anyway for things like nil pointers or invalid array offsets.

        • gf000 7 days ago

          Go messages are harder for both devs and users to read. Grepping for an error message in a codebase is a special hell.

          Besides, it's quite trivial to simply return the exception's getMessage in a popup for an okay-ish error message (but writing a stacktrace prettifier that writes out the caused by exception's message as well is trivial, and you can install exception handlers at an appropriate level, unlike the inextensibility of error values)

        • prirun 7 days ago

          I tend to use "catch and re-raise with context" in Python so that unexpected errors can be wrapped with a context message for debugging and for users, then passed to higher levels to generate a stack trace with context.

          For situations where an unexpected error is retried, eg, accessing some network service, unexpected errors have a compressed stack trace string included with the context error message. The compressed stack trace has the program commit id, Python source file names (not pathnames) and line numbers strung together, and a context error message, like:

          [#3271 a 25 b 75 c 14] Error accessing server xyz; http status 525

          Then the user gets an idea of what went wrong, doesn't get overwhelmed with a lot of irrelevant (to them) debugging info, and if the error is reported, it's easy to tell what version of the program is running and exactly where and usually why the error occurred.

          One of the big reasons I haven't switched from Python to Go for HashBackup (I'm the author) is that while I'd love to have a code speed-up, I can't stomach the work involved to add 'if err return err("blah")' after most lines of existing code. It would hugely (IMO) bloat the existing codebase.

      • bccdee 7 days ago

        When there's an exceptional case, it's better to handle that explicitly. I think Rust does that best with its single-character ? operator, but I don't want exceptions invisibly breaking out of control flow unless I give them permission to. `if err != nil` is a fair enough way of doing that.

        • int_19h 7 days ago

          It's not a good way to do this because it doesn't force you to either handle or propagate.

          • bccdee 7 days ago

            Yeah there are linters that force you not to implicitly discard errors, but that should really be a compiler error. Still, that's not a problem inherent to Go's error-handling model.

      • prisenco 7 days ago

        Better is subjective, but I prefer errors as return values because then the function signature states whether an error has to be handled or not. Exceptions can be forgotten about, but returned errors have to be explicitly ignored.

        • gf000 7 days ago

          That's an independent problem - checked exceptions (and the even better effect types) are part of the method signature.

          • prisenco 7 days ago

            Checked exceptions feels six of one half dozen of the other to me.

      • lordofgibbons 7 days ago

        Unless you forget to catch the right type of exception. Then all hell breaks loose.

        • bigstrat2003 7 days ago

          People bitch about checked exceptions in Java but this is precisely why I think they're a great idea. You can't forget to catch the right type of exception.

          • LinXitoW 7 days ago

            The biggest issue with checked exceptions in modern Java is that even the Java makers themselves have abandoned them. They don't work well with any of the fancy features, like Streams.

            Checked Exceptions are nothing but errors as return values plus some syntactic sugar to support the most common response to errors, bubbling.

            • ndriscoll 7 days ago

              Scala's zio library basically gives you checked exceptions that work with things like type inference, streams, async operations, and everything else.

            • gf000 7 days ago

              > They don't work well with any of the fancy features, like Streams.

              Because that would require effect types, which is quite advanced/at a research level currently.

              • layer8 7 days ago

                All it would require is more support for sum types and variadic type parameters, and maybe fix some hiccups in the existing type inference. You can already write a Stream-like API that supports up to a fixed number of exception types (it’s just a bit annoying to write). The main issue at present is that you can’t do it for an open-ended number of exception types and abstract over the concrete set of types.

                • gf000 6 days ago

                  The throws clause would require union types, not sum types though (you can observe it in the catch part of a try catch, e.g. `catch ExceptionA | ExceptionB`). But Java can't support unions elsewhere, it will have to be replaced by the two exceptions' common supertype.

                  • layer8 6 days ago

                    I was subsuming union types under sum types here, maybe a bit imprecisely.

                    The following already works in Java (and has for a long time):

                        interface F<X extends Exception, Y extends Exception>
                        {
                            void f(int n) throws X, Y;
                        }
                            
                        void g() throws IOException, SQLException
                        {
                            F<IOException, SQLException> f = n ->
                            {
                                if (n > 0) throw new IOException();
                                else throw new SQLException();
                            };
                            
                            h(f, 0);
                        }
                        
                        <X extends Exception, Y extends Exception>
                        void h(F<X, Y> f, int n) throws X, Y
                        {
                            f.f(n);
                        }
                    
                    We merely want for F and h to be able to work for any number of exception types. We don't need the ability to declare variables of type X | Y for that.

                    Of course, it would be nice not having to write IOException, SQLException multiple times in g, and instead have some shortcut for it, but that's not strictly necessary.

                    The main problem currently is that you have to define F1, F2, F3,... as well as h1, h2, h3,... to cover different numbers of exception types, instead of having just a single definition that would abstract over the number of exception types.

          • lomnakkus 7 days ago

            No, but you can easily end up missing some because somebody wrapped them in some sub-type of RuntimeException because they were forced(!) to. This happens all the time because the variance on throws clauses is at odds with the variance of method signatures (well, implementations, really -- see below).

            A new implementation of a ThingDoer usually needs to do something more/different from a StandardThingDoer... and so may need to throw more types of exceptions. So you end up having to wrap exceptions ... but now they don't get caught by, say, catch(IOException exc). If you're lucky you own the ThingDoer interface, but now you have a different problem: It's only JDBCThingDoer which can throw SQLException, so why does code which only uses a StandardThingDoer (via the ThingDoer interface) need to concern itself with SQLException?

            Checked exceptions in Java are worse than useless -- they actively make things worse than if there were only unchecked exceptions. (Because they sometimes force the unavoidable wrapping -- which every place where exceptions are caught needs to deal with somehow... with no help from the standard "catch" syntax.)

            • iainmerrick 7 days ago

              One thing you can do in Java is parameterise your interface on the exception type. That way, if the implementation finds it needs to handle some random exception, you can expose that through the interface -- e.g. "class JDBCThingDoer implements ThingDoer<SQLException>". Helper classes and functions can work with the generic type, e.g. "<E> ThingDoer<E> thingDoerLoggingWrapper(ThingDoer<E> impl)".

              I think this works really well to keep a codebase with checked exceptions tractable. I've always been surprised that I never saw it used very often. Anyone have any experience using that style?

              I guess it's not very relevant any more because checked exceptions are sadly out of fashion everywhere. I haven't done any serious Java for a while so I'm not on top of current trends there.

              • int_19h 7 days ago

                How do you handle the situation where the code might need to throw (pre-existing) exceptions that don't share a useful base class?

                • iainmerrick 7 days ago

                  I don’t remember! Possibly that’s one of the cases where it doesn’t work out.

                  Of course, if you had proper sum types, that situation wouldn’t be a problem.

                  • gf000 6 days ago

                    Java has proper sum types, but what one needs here is union types. They are not the same, sum types are labeled and are disjoint.

                    You want `MyException | ThirdPartyException` here, though.

                    • iainmerrick 6 days ago

                      You're right, I had it backwards! Thanks for the correction.

                      Now I'm wondering, could that actually be all that's needed to rescue checked exceptions? Is there any language that has that combination of features?

                      • int_19h 6 days ago

                        Back when Java didn't have lambdas, one of the more advanced lambda proposals (http://www.javac.info/closures-v06a.html) had this exact thing for this exact reason.

                        Unfortunately, this take on lambdas was deemed too complicated, and so we got the present system which doesn't really try to deal with this use case at all.

                      • gf000 6 days ago

                        Well, Scala has union types, but it doesn't do checked exceptions per se (but it does have a very advanced type system so a similar structure can be easily encoded). I think checked exceptions are pretty rare, so I don't know.. probably some research language (but they often go the extra mile towards effect types)

              • lomnakkus 7 days ago

                In a former life I worked with a codebase that used that style. Let's just say it isn't enough.

                • iainmerrick 7 days ago

                  Can you remember what sort of problems you were hitting?

          • evantbyrne 7 days ago

            That would be true if not for Java making the critical mistake of excluding RuntimeException from method definitions, so in-practice people just extend RuntimeException to keep their methods looking "clean".

            • bbatha 7 days ago

              Or are forced to because they want to use generics or lambdas.

              • gf000 7 days ago

                Both work with checked exceptions.

                • int_19h 7 days ago

                  The problem is that there's no way to specify an exception specification like "I propagate everything that this lambda throws" (or, for generics, "that method M of class C throws").

                  • layer8 7 days ago

                    No, but you can have an interface like

                        interface Func<P, R, X extends Exception>
                        {
                            R func(P param) throws X;
                        }
                    
                    or the same with more than one exception type, and convert your lambda to that. This works. The only problem is that you can’t abstract over an arbitrary number of exception types.

                    In principle, one could imagine a syntax for variadic type parameters like

                        interface Func<P..., R, X... extends Exception>
                        {
                            R func(P... params) throws X...;
                        }
                    
                    that would solve that problem.

                    • erik_seaberg 6 days ago

                      Will the compiler infer that a lambda or a method ref implements Func with its exception type param, or do you have to rewrite call sites?

                      • layer8 6 days ago

                        The compiler already infers that in current Java for one checked exception type, and also for several exception types at least in some cases (the latter seems to be a little more buggy in the current implementation).

          • pjmlp 7 days ago

            Additional info, they predate Java, having made an appearance in CLU, Modula-3 and C++, before Java was invented.

            I miss them in other languages every time I need to track down an unhandled exception in a production server.

            • Tempest1981 7 days ago

              >> People bitch about checked exceptions

              > they predate Java, having made an appearance in CLU, Modula-3 and C++

              Checked exceptions in C++? Can you force/require the call chain to catch an exception in C++? At compile time?

              • pjmlp 7 days ago

                That was part of the idea behind them yes, as many things in WG21 design process, reality worked out differently, and they are no longer part of ISO C++ since C++17.

                Although some want to reuse the syntax for value type exceptions, if that proposal ever moves forward, which seems unlikely.

          • Fire-Dragon-DoL 7 days ago

            The problem is that a checked exception makes sense only at a relatively high level of the app, but they are used extensively at a low level

          • piva00 7 days ago

            My main gripe with checked exceptions is they create a whole other possible code path on each `catch` clause. I tend to keep checked exceptions to the absolute minimum where they actually make sense, all the rest are RuntimeExceptions that should bubble up the stack.

            • KarlKode 7 days ago

              That's kind of how you do it in go. Either:

              1. Bubble up error (as is/wrapped/different error).
              2. Handle error & have a (possibly complex) new code path.

              There's also the panic/recover that sometimes is misused to emulate exceptions.

            • LinXitoW 7 days ago

              But so would every single other method to react to different types of errors, no?

              In something like go, you're even required to create the separate code path for EVERY SINGLE erroring line, even if your intention is simply to bubble it up.

            • layer8 7 days ago

              They don’t create any other code paths than RuntimeExceptions.

        • xienze 7 days ago

          You may be thinking a bit too much about what happens in _Go_ when you forget to check for an error response from a function -- the current function continues on with (probably) incorrect/nil values being fed to subsequent code. In Java when an uncaught exception is thrown, the exception makes its way back up the call stack until it's finally caught, meaning subsequent code is _not_ executed. It's actually a very orderly termination. In any Java web framework (Spring et al) there's always a centralized point at which exceptions are caught and either built-in or user-specified code is used to translate the error to an HTTP response.

          This makes for much more pleasant code that is mostly only concerned with the happy path, e.g., my REST endpoint doesn't have to care if an exception is thrown from the DAO layer as the REST endpoint will simply terminate right then and there and the framework will map the exception to a 500 error. Why anyone would prefer Go's `if err != nil {}` error handling that must be added All. Over. The. Place. at every single level of the application is beyond me.

          • LinXitoW 7 days ago

            My slightly snarky take is that liking Go is simply a defensive reaction to one too many AbstractFactoryBeanFactory. Too many abstractions overloaded their "abstraction-insulin", so now they can only handle minute amounts of abstraction.

            • gf000 7 days ago

              I liked your other comment's take with the monads better :P

        • Yoric 7 days ago

          Is it time to brag about Rust error-handling or should we wait a little?

          • gf000 7 days ago

            It's certainly better than Go's (Go's is barely better than C's and that's quite a low bar), but I don't think that sum types are the global optimum.

            Exceptions are arguably better from certain aspects, e.g. defaulting to bubbling up, covering as small or wide range as needed (via try-catch blocks), and auto-unwrapping without plus syntax. So when languages with proper effect types come into mainstream we might reach a higher optimum.

            • LinXitoW 7 days ago

              Maybe I'm too pessimistic, but Rust style error handling feels like the global optimum under the constraint that the average developer understands it.

              Go is a language that exists purely because people saw Monads on the horizon and, in their panic, went back to monke, programming wise. Rust error handling is something that even many Go fans have said is a good abstraction.

            • Yoric 7 days ago

              No, sum types are certainly not a global optimum. But they remain the best error-handling mechanism that I've used professionally so far.

              Effect types (and effect handlers) are very nice, but they come with their own complexities. We'll see if some mainstream language manages to make them popular.

          • KingOfCoders 6 days ago

            After a decade of Scala and Rust I no longer believe in monads and prefer the way Go does error handling.

              for a <- a()
                  b <- b() {
                  return a + b
              }
              
            
            looks nice but only by hiding error handling. Today I like looking at code and see the error handling. With monads you end up with monad stacks and transformers which introduce their own failure states.

            • Yoric 4 days ago

              I haven't played with Scala in a while, but Rust's error-handling is not syntactically monadic. If you look at the code, you don't see combinators, you see either error propagation or error handling.

        • LinXitoW 7 days ago

          Which Go doesn't fix either, because their errors are all just "error", aka you can also forget to catch the right type of error.

          If only there was a way to combine optimizing the default path (bubbling), and still provide information on what errors exactly could happen. Something like a "?" operator and a Result monad...
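
          A short Go sketch of the first point, using made-up types (NotFoundError, PermissionError). The signature of lookup only says "error", so nothing stops classify from forgetting one of the concrete types; the code compiles without complaint.

```go
package main

import (
	"errors"
	"fmt"
)

type NotFoundError struct{ Key string }

func (e *NotFoundError) Error() string { return "not found: " + e.Key }

type PermissionError struct{ User string }

func (e *PermissionError) Error() string { return "permission denied for " + e.User }

// lookup's signature only says "error"; the two concrete types it can
// return are invisible to the compiler and to callers.
func lookup(key, user string) error {
	if user != "admin" {
		return &PermissionError{User: user}
	}
	if key != "known" {
		return &NotFoundError{Key: key}
	}
	return nil
}

// classify handles NotFoundError but "forgets" PermissionError.
// Nothing in the language flags the omission.
func classify(err error) string {
	var nf *NotFoundError
	switch {
	case err == nil:
		return "ok"
	case errors.As(err, &nf):
		return "missing key " + nf.Key
	default:
		return "unhandled: " + err.Error()
	}
}

func main() {
	fmt.Println(classify(lookup("other", "admin"))) // missing key other
	fmt.Println(classify(lookup("known", "guest"))) // unhandled: permission denied for guest
}
```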

      • KingOfCoders 6 days ago

        Exceptions solve the 10% better with the tradeoff of their inflexibility for the other 90% of cases.

    • RVuRnvbM2e 7 days ago

      This was exactly my train of thought. I even went looking for Dave's blog post about it before I saw your comment. :D

  • oefrha 7 days ago

    No, TFA is mostly about making errors consistent in a large application, while exception (vs error as standard return value) is largely about easier bubbling, which is one thing TFA hardly talked about (maybe I missed it, I only skimmed the article). In fact it spends a lot of energy on wrapping which is the opposite of automatic bubbling provided by exceptions by default. Throwing random, inconsistent module/package/whatever-specific exceptions from everywhere causes most of the same problems described in TFA.

    I feel like all the canned comments saying TFA is about implementing exceptions / ADT result type are from people who didn’t read the article and just want to repeat all the cliche on the topic (for easy karma? No idea what’s the point).

  • zeroxfe 7 days ago

    That's not how I read it. It's more about having a consistent approach to managing error types in large code bases. This is a common problem with exception-based languages too.

  • ori_b 7 days ago

    How so? This is about how errors are defined, not how they're propagated through the application. Feels like you didn't actually read what was being done by the OP.

    • retrodaredevil 7 days ago

      Exceptions have a hierarchical nature to them in most languages, or at least have some sort of identity to them. You're correct that the author doesn't try to change the way errors are propagated, but you can see similarities between what the author is creating themselves, and what already exists in languages with exceptions.

sethammons 7 days ago

The concepts aren't wrong (structured logs from structured errors), but I find this code to be very un-go-like and there are obvious signs of trying to write java in go (iFace, structs with one property "because everything needs to be contained in an object", and others).

Return "error" and not a custom type "mypkg.Error" - you run into more nil interface pointer problems and you are breaking an idiom.
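
The nil interface pointer problem can be reproduced in a few lines. This is a sketch with a hypothetical MyError type, not the article's code: a function that returns the concrete pointer type hands back a nil *MyError, and as soon as that value is converted to the error interface it no longer compares equal to nil.

```go
package main

import "fmt"

type MyError struct{ Msg string }

func (e *MyError) Error() string { return e.Msg }

// validate returns the concrete pointer type instead of error.
func validate(ok bool) *MyError {
	if ok {
		return nil // a nil *MyError
	}
	return &MyError{Msg: "invalid"}
}

// runBroken converts the nil *MyError into an error interface value.
// The interface now holds (type=*MyError, value=nil), which is != nil.
func runBroken() error {
	return validate(true)
}

// runFixed checks the concrete pointer first and returns an untyped nil.
func runFixed() error {
	if err := validate(true); err != nil {
		return err
	}
	return nil
}

func main() {
	fmt.Println(runBroken() != nil) // true: the caller sees an "error" on success
	fmt.Println(runFixed() != nil)  // false
}
```

Declaring validate to return error in the first place avoids the trap entirely.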

Let me provide a counter example for helping create structured logs from structured errors that I wrote up that is much more idiomatic if not more narrowly focused:

https://github.com/sethgrid/kverr

As in the article, if you want to attach "username: foo", this package lets you return kverr.New(err, "username", foo, ...), and then extract a slice or map later for logging like logger.WithArgs(YoinkArgs(err)...).

rednafi 7 days ago

Go’s error handling is still cumbersome and lacking. I love writing Go but I don’t want to ever adopt anything like this. It’s bending over backwards to achieve something sum types provide and this pattern is a mess.

  • KingOfCoders 7 days ago

    I thought so too, after years with Scala and Rust. Now I think (X, error) is fine, indeed I think it is great for its simplicity. I might want to have a safe assignment

       // x() (X,error)
       x != x() 
       // x is X
       // return on error
    
    But the urge is not very high.

    • dontlaugh 7 days ago

      The problem is indeed composition. How do I chain 3 calls that short-circuit on the first error? In Go that's verbose in the extreme. With exceptions it's easy to miss an error. Sum type errors have neither problem.
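
      For reference, this is roughly what chaining three fallible calls looks like in plain Go. The functions parse, checkPositive and double are hypothetical stand-ins; the point is the three near-identical checks in pipeline.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

func parse(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("parse %q: %w", s, err)
	}
	return n, nil
}

func checkPositive(n int) (int, error) {
	if n <= 0 {
		return 0, errors.New("value must be positive")
	}
	return n, nil
}

func double(n int) (int, error) { return n * 2, nil }

// pipeline chains the three calls and stops at the first error.
// Each step needs its own explicit check.
func pipeline(s string) (int, error) {
	n, err := parse(s)
	if err != nil {
		return 0, err
	}
	n, err = checkPositive(n)
	if err != nil {
		return 0, err
	}
	n, err = double(n)
	if err != nil {
		return 0, err
	}
	return n, nil
}

func main() {
	fmt.Println(pipeline("21")) // 42 <nil>
	fmt.Println(pipeline("-1")) // 0 value must be positive
}
```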

      • KingOfCoders 6 days ago

        If you want to chain 3 calls and short circuit on the first error, don't use Go. I like explicit code without magic that I can't see.

           for a <- a()
               b <- b() {
               ...
           }
        
        I don't know what is happening there. Is it summing up errors? Is it short circuiting? Does it have error handling? Is it async? Is it a monad stack with transformers? That code could mean anything. Good luck coming back to that code after six months. I think the Sum type solution focuses on the happy path, the Go solution assumes you need to focus on the things that go wrong.

        Additionally you have that nasty dependency of the Result type of a() to b() to make it work. And I've spent hours creating the right Transformer stack to compose more than two monad types like Result, Future and IO.

    • KingOfCoders 4 days ago

      The problem here is that if your style is not to return the error, but wrap it, you need a way to wrap it.

      You could use special syntax

         x != x() ~ "x failed"
      
      that would wrap the error and return it,

         x, err := x() 
         if err != nil {
            return fmt.Errorf("x failed: %w", err)
         }
      
      is more verbose but less so compared to the "just return error" case.

      If you want to get rid of that, you need to have some auto conversion magic. This is the way Rust does it, but it adds magic, you can't see the error handling, Rust needs (slow compiling) macros for this etc.

  • solumunus 7 days ago

    I would be all over Go with a better type system or exceptions.

    • thiht 7 days ago

      If Go ever adds exceptions[1] as an error handling mechanism, I'm out. Value errors are far superior to exceptions, even in their current state in Go.

      [1]: assuming panics are not an error handling mechanism but a recovery mechanism

      • LinXitoW 7 days ago

        Checked Exceptions are nothing but errors as values with some syntactic sugar for the most common use case (bubbling up the error).

        Go's version of value errors is just micrometers ahead of C style error codes. In both cases you get told "there could be an error", the error is a value of one single type (error/int), and you have to manually find out which different errors this value could represent.

        If you want to know what you're missing, check out Rust's error handling.

        • thiht 7 days ago

          I love Rust error handling and wish it would have been the default in Go (too late to shoehorn it), but it has nothing to do with exceptions.

      • jdbxhdd 7 days ago

        Panics can be values and errors don't have to be values in go?

        I think you are mixing up concepts here

        • thiht 7 days ago

          I’m not talking about the underlying model, I’m talking about the control flow. What I mean is errors are explicit values belonging to the signature of the functions.

    • gf000 7 days ago

      C#, OCaml, Java, Scala, Kotlin all fulfill these requirements, while targeting the same niche.

      • LinXitoW 7 days ago

        Go has insanely good tooling and very fast single binary compiling.

        While all these languages (afaik) can reach similar levels of functionality (GraalVM e.g.), it's more work. As much as I hate the language Go, I can't deny how braindead simple it is to just make a tool with it. I don't need to choose a build tool, or a runtime version, there's a library for everything and most developers with more than a room temp IQ can immediately start working on it.

        The only other language that currently comes close is Rust. If only they had stuck to using a GC, I'd be in heaven.

        • Capricorn2481 7 days ago

          Is this that different from languages like Dotnet? I think one line in your config can make it Aot compiled.

      • solumunus 7 days ago

        Yes there are indeed lots of languages in existence.

vrosas 7 days ago

Trying to shoehorn code errors into HTTP errors is a prime example of conflating two very different things because sometimes they look similar. Let different things be different, I like to say. You either let your HTTP handlers do their own error-to-http-code management or you end up with a massive switch statement trying to map them all, or whatever monstrosity this approach is.

Also the entire problem of the OP would go away if they just implemented opentelemetry tracing to their logs.
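
A minimal sketch of the "handlers do their own error-to-http-code management" option, in Go. ErrUserNotFound, findUser and userHandler are invented for illustration. The domain function knows nothing about HTTP; the handler alone decides that a bad request is a 400, a missing user is a 404, and anything unexpected is a 500.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ErrUserNotFound is a hypothetical domain error.
var ErrUserNotFound = errors.New("user not found")

// findUser is domain code: it returns plain errors and never mentions HTTP.
func findUser(id string) (string, error) {
	if id != "1" {
		return "", fmt.Errorf("find user %q: %w", id, ErrUserNotFound)
	}
	return "alice", nil
}

// userHandler owns the translation from domain errors to status codes.
func userHandler(w http.ResponseWriter, r *http.Request) {
	id := r.URL.Query().Get("id")
	if id == "" {
		// Validate the request before touching business logic.
		http.Error(w, "missing id parameter", http.StatusBadRequest)
		return
	}
	name, err := findUser(id)
	switch {
	case errors.Is(err, ErrUserNotFound):
		http.Error(w, "no such user", http.StatusNotFound)
	case err != nil:
		http.Error(w, "internal server error", http.StatusInternalServerError)
	default:
		fmt.Fprintln(w, name)
	}
}

// status sends one in-memory request to the handler and returns the code.
func status(target string) int {
	rec := httptest.NewRecorder()
	userHandler(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(status("/user?id=1"), status("/user?id=2"), status("/user")) // 200 404 400
}
```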

  • pluto_modadic 7 days ago

    ah, yes, completely separate.

    HTTP code: 200 ok

    Body: {"error":"internal server error"}

    • bb88 7 days ago

      My favorite example of this was renaming a 500 error due to an unhandled exception to a 400 error to make it look like it was the error of the caller. Management was also possibly tracking 500 errors too, so the 400 could also have been gaming the system.

      In some mental models, though, it did make sense. Particularly the one that went, "Well, we never would have errored, if you never called us!"

      • xmprt 7 days ago

        It's somewhat fair though. If there's a case that would cause errors for the system and it's a case that you're not supposed to handle, then a 400 error sounds perfect for that case. For example, if you have a service and it panics/returns 500 when you pass in an empty user id, then you could instead return a 400 before you hit the panic and all is good.

        • bb88 7 days ago

          Normally you should attempt to find all the corner cases and present the errors to the user -- before processing the request. If you can't do this, it's time to rethink how your api works. A good api is simple to use and simple to write.

          It also simplifies your business logic in that all the possible user defined idiocies are caught before your business logic actually processes the request.

          Some frameworks do this better than others. And rather than documentation, I tend to prefer comprehensive error messages.

          • fmbb 7 days ago

            > Normally you should attempt to find all the corner cases and present the errors to the user -- before processing the request.

            That is what they are suggesting. You check the request and return 400 if it’s bad.

            • bb88 7 days ago

              One example of a 500 error is a null pointer error. Was it a bad request or a logic error? One is your problem, the other is not. Just returning a 400 hides that issue. Validating the payload before processing it simplifies the issue for everyone involved.

              A 500 error should be your problem with a stack trace in the log. A 400 error should provide enough description to tell the user it's theirs and how to fix it.

              Just recoding a 500 as a 400 because of a null pointer error would get noticed and flagged in a code review.

              • bdangubic 7 days ago

                400 - you fucked up

                500 - we fucked up

    • maccard 7 days ago

      Think about what the client code looks like to handle this and the alternative, particularly if you’re implementing an sdk and the api is an implementation detail. I’m not saying I would choose this path, but it certainly reduces the amount of code on both sides that you have to write.

    • ranger207 7 days ago

      If HTTP is your API's transport layer, then HTTP errors should be related to problems with the transport layer and not to the API itself. Is the internal server error caused by a bad HTTP request or a bad API request?

    • saghm 7 days ago

      Honestly, my controversial take is that for APIs, it would be cleaner to not use any HTTP status codes other than 200 and have all of the semantics in the body of the response. I'm sure someone smarter than me will jump in and explain why this wouldn't work in practice, but it just feels like application semantics are leaking from a much more natural location in the body of the response. I feel similarly about HTTP request methods other than POST in APIs; between the endpoint route and the body, there should be more than enough room to express the difference between POST, PATCH, and DELETE without needing them to be encoded as separate HTTP methods.

      • turbojet1321 7 days ago

        I'm sympathetic, but this can have issues if you want your API to be used by anything other than your own client, including stuff like logging middleware. A lot of tools inherently support/understand HTTP status codes, so building on top of that can make integration a lot easier.

        We, very roughly, do it like this:

        - 200: all good

        - 401: we don't know who you are

        - 403: you're not allowed to do that

        - 400: something's wrong and you can fix it

        - 500: something's wrong and you can't fix it

        Each response (other than 401) includes a json blob with details that our UI can do something with, but any other consumer of the API or HTTP traffic still knows roughly what's going on.
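
        For illustration, here is roughly how that mapping could look in Go. This is only a sketch: the sentinel error names are hypothetical and not from our codebase.

        ```go
        package main

        import (
        	"errors"
        	"fmt"
        	"net/http"
        )

        // Hypothetical sentinel errors, named for illustration only.
        var (
        	ErrUnauthenticated = errors.New("unauthenticated")
        	ErrForbidden       = errors.New("forbidden")
        	ErrBadInput        = errors.New("bad input")
        )

        // statusFor maps an error to one of the five codes listed above.
        // Anything it does not recognise falls through to 500.
        func statusFor(err error) int {
        	switch {
        	case err == nil:
        		return http.StatusOK
        	case errors.Is(err, ErrUnauthenticated):
        		return http.StatusUnauthorized
        	case errors.Is(err, ErrForbidden):
        		return http.StatusForbidden
        	case errors.Is(err, ErrBadInput):
        		return http.StatusBadRequest
        	default:
        		return http.StatusInternalServerError
        	}
        }

        func main() {
        	fmt.Println(statusFor(nil))
        	fmt.Println(statusFor(fmt.Errorf("update profile: %w", ErrForbidden)))
        	fmt.Println(statusFor(errors.New("disk on fire")))
        }
        ```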

        I've worked in places where we really sweated on getting the perfect HTTP status codes, and I'm not sure it added much benefit.

        On POST - I find myself doing logical GETs with POST a lot, because the endpoint requires more information than can be conveyed in URL params. It makes me feel dirty, and it's obviously not RESTful but you know - sometimes you just have to get things done.

        • brabel 7 days ago

          You've just described basically everything a dev needs to know to implement HTTP APIs that report status codes properly, yet some people still seem to think it's oh so complicated. What has gone wrong?

          • turbojet1321 7 days ago

            I can understand how people might look at the full list of status codes and think it's all too hard, but yes, once you realize that there are only a handful you need most of the time it all becomes pretty simple.

            • saghm 7 days ago

              Sure, but the problem in my opinion is that while the handful that you pick is totally reasonable, someone else might pick a slightly different handful that's just as reasonable. If I want to use a new API and delete a user, how do I know if it uses DELETE or POST, and if it will return 401 or 403? At best, you'll be able to skim through the documentation more quickly due to having encountered similar conventions before, but nothing stops that from happening in terms of request and response bodies either.

              The fact that existing tooling relies on some of these conventions is probably a good enough reason to do things this way, but it's not obvious to me that this is because it's actually better rather than inertia. Conventions could be developed around the body of requests as well, and at least to me, it doesn't seem obvious that the amount of information conveyed at the HTTP method/response status layer was necessary to try to separate from the semantics of the request and response bodies. I'm sure that a part of that was due to HTTP supporting different content types for payloads, but nowadays it seems like quite a lot of the common alternatives to JSON APIs were designed not to even use HTTP (GraphQL, gRPC, etc.), which I'd argue is evidence that HTTP isn't necessarily being used as well for APIs as some people would like.

              To make something explicit that I've been alluding to, everything I've said is about using APIs in HTTP, not HTTP in the context of viewing webpages in a browser. It really seems like a lot of the complications in HTTP are due to it trying to be sufficient for both browsers and APIs, and in my opinion this comes mostly at the expense of the latter.

              • bvrmn 7 days ago

                It's quite unclear what your point is. HTTP APIs should have a minimal status code set. Parent described it perfectly. It's simple, practical (especially from a monitoring perspective) and doesn't interfere with the service domain.

                It seems you have some alternative in mind but it wasn't presented.

                • saghm 7 days ago

                  I don't consider what the parent comment listed as "minimal". The alternative I described is literally in my initial comment; using only 200 for APIs is "minimal".

                  • bvrmn 6 days ago

                    Only 200 is detrimental for monitoring. You have to parse the response body to classify response types. HTTP status codes are a cheap and already existing way to get insights into service behavior.

                  • turbojet1321 7 days ago

                    It's minimal if you want to integrate with anything that understands HTTP status codes.

          • kdmtctl 7 days ago

            Need an AI playground to paste error responses and fix the code.

        • Izkata 7 days ago

          > Each response (other than 401) includes a json blob with details

          ...until you discover an intermediate layer strips the body of certain error codes. Like Apache, which IIRC includes all 5xx and most 4xx.

      • bilbo-b-baggins 7 days ago

        Go ahead, try to implement something like cross-origin requests or multipart-encoded form uploads just using the body semantics you described. I’ll wait.

        Also that is not a controversial take. It is at best a naive or inexperienced take.

        • saghm 7 days ago

          Both of those happen in the context of web browsing rather than existing in APIs in a vacuum; I'd argue that there's absolutely no reason why the mechanism used to request a webpage from a browser needed to be identical to the mechanism used for the webpage to perform those actions dynamically, which is pretty much my whole point: it doesn't seem obvious to me that it's useful to encode all of that information in an API that isn't also being used to serve webpages. If you are serving webpages, then it makes sense to use semantics that help with that, but I can't imagine I'm the only one who's had to deal with bikeshedding around this sort of thing in APIs that literally only are used for backends.

          • yxhuvud 7 days ago

            Multipart messages definitely happen in APIs as well, if you are handling blobs that are potentially pretty big.

      • omcnoe 7 days ago

        There are a lot of useful network monitoring tools that can analyze HTTP response codes out of the box. They can't do this for your custom application error format. You don't have to go crazy with it, but supporting at least 200/400/500 makes it so much easier to monitor the health of your services.

      • LouisSayers 7 days ago

        I like to find a middle ground.

        I use http status codes to encode how the _request_ was handled, not necessarily the data within the request.

        A 400 if you send mangled JSON, but a 200 if the request was valid but does not pass business validation rules.

        Inside the 200 response is structured JSON that also has a status that is relevant at the application level.

        Otherwise, how can you tell, for example, if a 404 response is because the endpoint doesn't exist, or because the item requested at the endpoint doesn't exist?

        I believe it's important to have a separation between what is happening at the API level vs Application, and this approach caters for both.

        • bvrmn 7 days ago

          > A 400 if you send mangled JSON, but a 200 if the request was valid but does not pass business validation rules.

          What about an empty required field in the JSON? Is that still mangled, or is it already business logic?

          • LouisSayers 7 days ago

            As it's not to do with the http request and the body was able to be parsed, in my book that'd be classified as being at the application level, so results in a 200 status with a JSON response detailing the issue

            200 OK {status: "failed", errors: ["field X is required"]}

            How you deal with this on the application side, what JSON statuses you have etc is up to you.

            • bvrmn 6 days ago

              It's a client error and it's highly beneficial to make it a 400 for monitoring purposes. You want to see when your FE or mobile devs have deployed a faulty app.

              • LouisSayers 6 days ago

                That depends on how you set up and do your monitoring. Not every failure needs to be indicated by an HTTP status code.

                For example, on a server I'm working on there are helper functions that generate different types of responses. Responding in certain ways will produce a 200, but will also log a warning or error.

                On the client side, you can create request helpers that all requests go through and that can resolve requests appropriately, rendering error messages to the user etc.

                The main thing is to have a well defined, consistent approach.

                • bvrmn 6 days ago

                  It's easier to do nothing (use the already available status codes) than to do something, isn't it? Developers are awful at following a consistent approach.

      • int_19h 7 days ago

        One reason for using HTTP verbs is to distinguish between queries and updates, and for the latter, between idempotent and non-idempotent updates. This in turn makes it possible to do things like automatically retry queries on network errors or cache responses where it is safe to do so.
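
        As a sketch of what that buys a client, a retry helper can decide from the method alone. The function name is hypothetical; the method classification follows the HTTP semantics spec (RFC 9110), where GET, HEAD, OPTIONS and TRACE are safe and PUT and DELETE are idempotent:

        ```go
        package main

        import (
        	"fmt"
        	"net/http"
        )

        // safeToRetry reports whether a request with the given method can be
        // resent automatically after a network error. POST and PATCH are not
        // guaranteed to be idempotent, so they are never retried blindly.
        func safeToRetry(method string) bool {
        	switch method {
        	case http.MethodGet, http.MethodHead, http.MethodOptions, http.MethodTrace,
        		http.MethodPut, http.MethodDelete:
        		return true
        	}
        	return false
        }

        func main() {
        	for _, m := range []string{http.MethodGet, http.MethodPut, http.MethodPost} {
        		fmt.Println(m, safeToRetry(m))
        	}
        }
        ```

        If everything is a POST, this information has to be reinvented in the body or in documentation.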

      • kdmtctl 7 days ago

        Anecdotally, the color codes make life much easier when debugging a new API. You instantly see that something is wrong. If everything is green you don't realize that something is wrong until you carefully read a uniquely structured custom response. Saves a lot of effort.

      • JodieBenitez 7 days ago

        > Honestly, my controversial take is that for APIs, it would be cleaner to not use any HTTP status codes other than 200 and have all of the semantics in the body of the response.

        We've been doing that for 20 years with json-rpc 1.0

            --> { "method": "echo", "params": ["Hello JSON-RPC"], "id":1}
            <-- { "result": "Hello JSON-RPC", "error": null, "id": 1}
        
        In this context, HTTP is just the transport and HTTP errors are only transport errors.

        Yes, you throw away lots of HTTP goodies with that, but there are many situations where it makes more sense than some half-assed ReSTish API. YMMV.

      • tayo42 7 days ago

        You're kind of describing things like Thrift and other RPC servers?

        • saghm 7 days ago

          Possibly. I'm not sure why it should require switching to an entirely different protocol though; my point is that making an API that only uses POST and always returns 200 is something that already works in HTTP, and I have trouble understanding why that isn't enough for pretty much everything.

          • necovek 7 days ago

            You lose some benefits of features already implemented by existing HTTP clients (caching, redirection, authorization and authentication, cross-origin protections, understanding the nature of the error to know that this request has failed and you need to try another one...).

            It is certainly not comprehensive, but it's right there and it works.

            Moving to your own solution means that you have to reimplement all of this in every client.

            • Joker_vD 7 days ago

              > understanding the nature of the error to know that this request has failed and you need to try another one...

              Please elaborate. In my experience, most HTTP client libraries do not automatically retry any requests, and thank goodness for that, since they don't, and can't, know whether such retries are safe or even needed.

              > redirection

              An example of a service where, at the higher business logic level, it makes sense to force the underlying HTTP transport level to emit a 301/302 response would be appreciated. In my experience, this stuff is usually handled in the load-balancing proxy in front of the actual service, so it's similar to QoS and network management stuff: the application does not care about it, it just uses TCP.

              • necovek 6 days ago

                They don't retry on errors but they know it is an error. Eg. imagine a shell script using curl or wget and trying multiple URLs as a health check (eg. on different round-robin IPs). Without these "generic" HTTP tools knowing that this is a "failure", you would need to implement custom parsing for any case like this instead of relying on the defined "error" and "success" behaviour.

                The same holds true if you are using any programming library: there is a plethora of handlers for HTTP errors.

                As for redirection, a common example is offering downloads through S3 using pre-signed URLs (you share a URL with your own domain, but after auth redirect to a pre-signed S3 URL for direct download or upload).

          • lll-o-lll 7 days ago

            You are thinking like a developer, but there is a world of networking as well. Between your client and server will be various bits of hardware that cannot speak the language you invent. 200, 401, 500 — these are not for the use of the application developer — but rather the infrastructure engineer.

          • tayo42 7 days ago

            You need some kind of structured way to describe the action to take, what the result is or what the error is, so the client and server can actually parse the data. That's the protocol, whether it's something formal like RPC libraries, or "REST"-ish or w/e.

            json-rpc over HTTP is probably what you're describing; maybe GraphQL too, if you squint enough.

          • kaoD 7 days ago

            Something being "enough" doesn't mean it's optimal. There's a huge stack of tools that speak HTTP semantics out of the box; including the user agent, i.e. the browser (and others), but also stuff like monitoring tools, proxies, CORS, automation tools, web scrapers...

            You don't need to reinvent HTTP semantics when HTTP is already there, standard, doing the right thing, compatible with millions of programs all across the stack, out of the box.

            HTTP is so well designed it almost makes me angry when people try to sidestep it and inevitably end up causing pain in the future due to some subtle semantic detail that HTTP does right and they didn't even think to reimplement.

            And the only solution to such issues (as they arise, and they will) is to slowly reimplement HTTP across the whole stack: oh, you need to monitor your internal server errors? Now you have to configure your monitoring tool (or create your own) to inspect all your response bodies (no matter how huge) and parse their JSON (no matter how irrelevant) instead of just monitoring the status code in the response header and easily ignore the expensive body parsing.

            Even worse when people go all the way. If we don't need status codes, why do we need URLs at all? Just POST everything to /api/rpc with an `operation` payload. Congrats, none of your monitoring tools can easily calculate request rates by operation without some application-specific configuration (I wish this was a made up scenario).

            Just use HTTP ffs. You'd need a very good reason not to use it.

      • lmm 7 days ago

        Yeah, that's usually the pragmatic thing to do. Facebook does that with their API, for example.

        4xx or 5xx gets you the default HTTP handling for that kind of error. Occasionally - especially in small examples - that default handling does what you want and saves you duplicating a lot of work. More often it gets in your way.

        I'd compare it to browser default styling - in small examples it sounds useful, but in a decent-sized site you just end up having to do a "CSS reset" to get it out of the way before you do your styling.

      • evnix 7 days ago

        This is the way to go; it pretty much solves the "404: resource not found or route not found?" ambiguity. But you will get laughed at by so-called architectural dogmatists. Remember we aren't really doing REST, it's just RPC and let's call it that.

        Shoehorning HTTP protocol error codes as application error codes, drinking the Kool-Aid and calling it best practice is beyond bizarre.

        • marcus_holmes 7 days ago

          Agree. "200 - successfully failed to do the thing" is valid and useful.

          500 is "failed to do anything at all"

  • bedobi 7 days ago

    The sane thing to do is to let lower layer functions return

        Either<Error, Foo>
    
    then, in the HTTP layer, your endpoint code just looks like

        return fooService
                    .getFoo(
                        fooId
                    ) //Either<Error, Foo>
                    .fold({ //left (error) case
                        when (it) {
                            is GetFooErrors.NotAuthenticated -> { Response.status(401).build() }
                            is GetFooErrors.InvalidFooId -> { Response.status(400).build() }
                            is GetFooErrors.Other -> { Response.status(500).build() }
                        }
                    }, { //right case
                       Response.ok(it)
                    })
    
    the benefits of this are hard to overstate.

    * Errors are clearly enumerated in a single place.
    * Errors are clearly separated from but connected to HTTP, in the appropriate layer (the HTTP layer). Developers can tell from a glance at the resource method what the endpoint will return for each outcome, in context.
    * Errors are guaranteed to be exhaustively mapped because Kotlin enforces that sealed classes are exhaustively mapped at compile time. So a 500 resulting from forgetting to catch a ReusedPasswordException is impossible, and if new errors are added without being mapped to HTTP responses, the compiler will let us know.

    It beats exceptions, it beats Go's shitty error handling.

  • rollulus 7 days ago

    Mapping errors to codes in the HTTP handler is the true path. It’s the only place that has the context and knowledge about semantics. In one endpoint, if something is not found it can be a proper 404, if its existence is truly optional. In another endpoint the absence might very well qualify as a 500.
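
    A small sketch of what I mean, in Go. The endpoints and the `ErrNotFound` sentinel are hypothetical; the point is that the same low-level error gets a different status depending on which handler sees it:

    ```go
    package main

    import (
    	"errors"
    	"fmt"
    	"net/http"
    )

    // ErrNotFound is a hypothetical domain error returned by lower layers.
    var ErrNotFound = errors.New("not found")

    // avatarStatus: an avatar is optional, so a missing one is a proper 404.
    func avatarStatus(err error) int {
    	switch {
    	case err == nil:
    		return http.StatusOK
    	case errors.Is(err, ErrNotFound):
    		return http.StatusNotFound
    	default:
    		return http.StatusInternalServerError
    	}
    }

    // billingPlanStatus: every account is assumed to have a billing plan,
    // so a missing one means our data is broken and that is a 500.
    func billingPlanStatus(err error) int {
    	if err == nil {
    		return http.StatusOK
    	}
    	return http.StatusInternalServerError
    }

    func main() {
    	err := fmt.Errorf("lookup: %w", ErrNotFound)
    	fmt.Println(avatarStatus(err), billingPlanStatus(err))
    }
    ```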

    • Groxx 6 days ago

      100% this.

      Deciding on end-user error handling at a low level means making assumptions that cannot be known at that level. The caller decides how something is going to be handled and presented, not the callee, or you inevitably miss them in important places and silently miscategorize stuff. Far better to have that scenario lead to a 500 (unmapped error, unknown problem) so it can be fixed.

    • bvrmn 7 days ago

      404 is quite an ominous thing. 404 because route is not found or entity not found. God bless your monitoring.

      • rounce 7 days ago

        422 is frequently used for this case despite being part of the WebDAV extensions.

rollulus 7 days ago

Ah, a God error package that has all seeing knowledge of the domain around it. What a monstrosity.

  • mrkeen 7 days ago

    It's not the worst idea for an organisation to centralise stuff that needs to be centralised.

    Like defining protobuf schemas, it's no good if each team does its own thing.

  • joesb 7 days ago

    Either the package does or the developers do. And only one of them has compilation checks.

nikolayasdf123 7 days ago

> centralized system [... for errors]

I don't think this will scale. Errors are part of the API (especially given the Go mantra that errors are values, https://go.dev/blog/errors-are-values, this is ever more prominent), and each API is the responsibility of a service.

So unless you deal with the infrastructure or standards/protocols layer (say you define what HTTP 500 means, or a common pattern for URL paths in your API), better not to couple all services. Those standards are very minimal and primitive so that they work for everything, which is the opposite of what you're doing here by aggregating all the specifics into a single place.

winstonp 7 days ago

I agree Go error handling is suboptimal, but this is simply not the right approach. This essentially turns error handling into a whole other language, almost like how Ginkgo is a separate language for handling tests.

  • wruza 7 days ago

    And most languages are lacking this useful error language. You can’t speak if you have no language, so having it must be a good thing.

    The only questionable thing here is that this framework is not a part of the main language still, which means near zero adoption. But that train has sailed.

ThePhysicist 7 days ago

I think that's overkill, most of the time I just bubble errors up and I have very few cases where the error handling depends on the type of error. I guess it's because I don't use errors for things that are recoverable and try to fix them instead inside the given function. An example given here in the thread is reading from a file and if it doesn't work try a backup. Rather than having a function that reads from a file and returns a bunch of different errors I'd just make one with a list argument and then handle the I/O errors inside, and return an "unrecoverable" error otherwise.

For adding context, %w is good enough I find, though as I said I only very sparingly use errors.Is(...). Go isn't a language that's designed around rich error or exception types, and I don't think you should use it like that.

  • Yoric 7 days ago

    Well, yes, if you're just using errors as error messages, you only need strings and %w. That's usually good enough if you're writing an application.

    However, if you're writing a library, chances are that your users want to catch the errors, find out whether the call failed because, say, the remote API is down or because the password is wrong.

    Or if you're writing an API, you probably want to return different error codes. If your errors are bubbling, you'll need to somehow `errors.Is`/`errors.As` somewhere.
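
    To make that concrete, here is a sketch with a made-up library call (`login`, `ErrBadPassword` and `APIDownError` are all hypothetical). The errors bubble up wrapped with `%w`, and the caller still gets to branch on the cause:

    ```go
    package main

    import (
    	"errors"
    	"fmt"
    )

    // ErrBadPassword and APIDownError are hypothetical errors a library might export.
    var ErrBadPassword = errors.New("bad password")

    type APIDownError struct{ Host string }

    func (e *APIDownError) Error() string { return "remote API unreachable: " + e.Host }

    // login stands in for a library call that bubbles up wrapped errors.
    func login(user, pass string) error {
    	switch pass {
    	case "":
    		return fmt.Errorf("login %q: %w", user, ErrBadPassword)
    	case "offline":
    		return fmt.Errorf("login %q: %w", user, &APIDownError{Host: "auth.example.com"})
    	}
    	return nil
    }

    // describe shows how a caller can branch on the cause despite the wrapping.
    func describe(err error) string {
    	var down *APIDownError
    	switch {
    	case err == nil:
    		return "ok"
    	case errors.Is(err, ErrBadPassword):
    		return "ask for the password again"
    	case errors.As(err, &down):
    		return "retry later, " + down.Host + " is down"
    	default:
    		return "unknown failure"
    	}
    }

    func main() {
    	fmt.Println(describe(login("ann", "")))
    	fmt.Println(describe(login("ann", "offline")))
    	fmt.Println(describe(login("ann", "hunter2")))
    }
    ```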

  • eptcyka 7 days ago

    Yea, but like, when making an HTTP request, a timeout is significantly different from a failure to open a socket from a failure to resolve the hostname from a 429 error. And often it is up to the caller to decide how to handle those situations.
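
    Roughly like this, as a sketch. The `classify` helper and its labels are made up, and it assumes `resp` is non-nil whenever `err` is nil; the error types are the real ones from the `net` package, which `http.Client` errors wrap:

    ```go
    package main

    import (
    	"context"
    	"errors"
    	"fmt"
    	"net"
    	"net/http"
    )

    // classify sorts the outcome of an HTTP call into the cases above so the
    // caller can decide what to do with each. DNS failures are checked first
    // because *net.DNSError also satisfies net.Error.
    func classify(resp *http.Response, err error) string {
    	var dnsErr *net.DNSError
    	var netErr net.Error
    	var opErr *net.OpError
    	switch {
    	case errors.As(err, &dnsErr):
    		return "dns"
    	case errors.As(err, &netErr) && netErr.Timeout():
    		return "timeout"
    	case errors.As(err, &opErr) && opErr.Op == "dial":
    		return "connect"
    	case err != nil:
    		return "other"
    	case resp.StatusCode == http.StatusTooManyRequests:
    		return "rate-limited"
    	default:
    		return "ok"
    	}
    }

    func main() {
    	fmt.Println(classify(nil, &net.DNSError{Name: "nope.invalid", Err: "no such host"}))
    	fmt.Println(classify(nil, context.DeadlineExceeded))
    	fmt.Println(classify(nil, &net.OpError{Op: "dial", Err: errors.New("connection refused")}))
    	fmt.Println(classify(&http.Response{StatusCode: 429}, nil))
    }
    ```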

Animats 7 days ago

Is this just someone's proposal, or a formal addition to Go, or what?

"All errors must implement the Error interface." That's a step forward.

Rust really has the same error handling as Go - return an error status. But the syntax is cleaner. Rust thrashed around with errors at first. Then things sort of settled down. At this point, everybody uses Result<UsefulValue, Error>, but "Error" is just a trait that doesn't require much information. And "?" for propagating errors upwards is a huge convenience.

It's probably too late to retrofit "Result" and "?" into Go libraries, although they'd fit the language.

  • gf000 7 days ago

    > Rust really has the same error handling as Go

    Not at all. Rust has proper sum types, that it can return just like anything else in the language, while Go has a special cased error return slot (one may be tempted to call it an ugly hack), and it can return a value on both, which it does in some standard library calls.

    • margalabargala 7 days ago

      Not at all. Go has an error type, and Go functions have the ability to return zero, one, two, or more items, ordered however the developer likes. An error may be among those, as desired, and populated as desired.

      Some software also writes to both STDOUT and STDERR.

      • gf000 7 days ago

        I know, special cased may have been better worded as "just a convention". My point is, this is not much different than using a thread-local variable, like errno, and adds useless confusion - your return values represent n*m values, while there are only n+m cases with proper error semantics.

        Re STDERR: but shells don't decide whether a program execution failed on having written to STDERR, but by the returned singular error code.

        • margalabargala 7 days ago

          I agree with everything you've written in this comment.

          I'd like to split a hair here and say, this is a "Go's standard library" problem, and not a "Go language" problem.

          Good API design for a software package should have proper error semantics.

          Good API design for a language, allows for flexibility in actual implementation, alongside standards that say "you SHOULD do this".

          • gf000 7 days ago

            Disagree. This level of convention is inseparable from the language.

            Not doing the conventional error return in Go would be akin to using a Result sum type in reverse, putting the success value into the Error case.

  • arp242 7 days ago

    One of the issues in Go is that if all you ever do is "if err != nil { return err }", you will quickly run into problems because you will have errors like "open foo: no such file or directory" or "sql: no rows in result set" without a clue where that error came from. Sometimes that's obvious, often it's not.
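
    For reference, the usual fix on the Go side is to wrap with `fmt.Errorf` and `%w` at each level where you know something the caller doesn't. A sketch with a hypothetical `loadProfile` function:

    ```go
    package main

    import (
    	"errors"
    	"fmt"
    	"os"
    )

    // loadProfile is a hypothetical function; the point is the fmt.Errorf wrapping.
    func loadProfile(userID int, path string) ([]byte, error) {
    	data, err := os.ReadFile(path)
    	if err != nil {
    		// Say what we were doing; %w keeps the original error inspectable.
    		return nil, fmt.Errorf("load profile for user %d: %w", userID, err)
    	}
    	return data, nil
    }

    func main() {
    	_, err := loadProfile(7, "/nonexistent/profile.json")
    	fmt.Println(err)
    	// The cause is still reachable through the wrapping.
    	fmt.Println(errors.Is(err, os.ErrNotExist))
    }
    ```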

    I'm not sure how Rust handles that? But it's more than just "propagate errors", but more like "propagate errors with the appropriate context for this specific error".

    • bbatha 7 days ago

      Rust uses the `?` operator to convert between error types, which allows users and libraries to hook into the error before it's returned.

      There are a number of helper libraries, such as `anyhow`, that provide an extended type-erased error type that can attach a real stack trace to the error. These helper libraries also provide ways to attach extra metadata to the error, so you can do things like `returns_a_result().context("couldn't do it")?` to quickly annotate the error. The standard library is adding support for this through a `context.Value`-like API on the Error trait. The std lib `Error` trait also has functions to find the cause of the error and traverse a collected chain of errors, very similar to Go's `errors.Cause` API.

      Rust also has a number of libraries for making specific error types like `thiserror` which can help generate error enums with the implementations required to carry backtraces, context and causes.

      • kbolino 7 days ago

        Yep, if you want wrapped errors in Rust, you use the anyhow crate. It leans heavily into dyn so has some performance tradeoffs, but it's roughly the same performance-wise as Go's error interface (which also uses a vtable under the hood).

        • kibwen 7 days ago

          Though using a dynamic error in Rust should only impose an allocation cost on the error path, and I presume Go is the same.

Cloudef 7 days ago

Posts like these remind me how Go really has nothing going for it apart from goroutines and channels. It's an awkward mix of low level and high level with C-like influence, which is weird considering it's a GC language.

binary132 7 days ago

I have been seeing this pattern repeated over and over since I started using Go in 2014 where people think they should be “building my favorite missing feature” — whether that’s futures, generics, structural processes, OTP, version managers, package managers, or now apparently exceptions. I always get the sense that the authors think they’ve done something cool and helpful when in the first place if they had simply put more effort into comprehending the simple “Go way” it wouldn’t have been necessary at all, and the needed functionality would have fallen out of the design.

  • jdbxhdd 7 days ago

    You realize that half of the features you are listing are now in Go, while missing in the beginning, exactly because people were missing them and Go simply did not offer a sane way to work around the missing features?

    I'm also quite sure that Go will provide a more sane way to handle errors in the not so far future, since it's continuously at the top of people's complaints

    • binary132 7 days ago

      your comment exemplifies the mentality, yes, and unfortunately it has now been adopted by project leadership, so I’m sure you are quite right that more “missing features” will get baked into the language soon :)

      • int_19h 7 days ago

        It's far better to have those features well-designed and baked into the language once, than to have them constantly poorly redesigned and baked into every other Go app.

        • binary132 7 days ago

          nobody would ever use this argument for the design of C. It’s good for C to stay lean and simple while communities using C (please let’s not with this imaginary monolithic “The Community”) are free to try things and offer competing solutions that others are free to ignore.

          kitchen sink languages are bad. Justifying them with “well the community is bad, so we need the bad thing to be mainlined” is maybe worse

          • int_19h 7 days ago

            C is legacy tech on life support.

            By Go standard, all other languages are "kitchen sink". Conversely, I would argue that basics like decent error handling are not in any meaningful sense a "kitchen sink" thing.

            • binary132 7 days ago

              C is still #4 on TIOBE, right behind Java, so that is not at all true.

              Go is good because it’s not like the other languages.

              It should stay not being like them, not try to be more like them.

tzahifadida 7 days ago

I arrived at a similar conclusion. I come from Java, where you have exceptions with try/catch clauses and declare them in function signatures. That works fairly well there, but it is very difficult to do and not idiomatic in Golang.

Therefore, I created a simple rule. If you do not yet know what this error means to the user, let it stay a fmt.Errorf("xx: %w", err). If you do, wrap it in your own custom ServerError struct and return that type from now on. Do not change the meaning of ServerError even if you wrap the error with another ServerError.

  • RVuRnvbM2e 7 days ago

    It is telling that you come from Java with this opinion. OP's approach is certainly not idiomatic Go.

    • wruza 7 days ago

      Idiomatic here means no idiom suggested really. So yeah, non-idiomatic.

wruza 7 days ago

When I thought about errors/exceptions, I basically came to the same conclusion. To reiterate or add to tfa: standard formulations, expected vs. happened, reasonable context visible in logs, error trees, automatic http/etc codes, tidy client messages in prod, reasonable distinction between: unexpected, semi-normal, programming error, likely fatal.

Not sure why most (all?) programming languages have such poor support for errors. Coding may feel like 2024, but error handling like 1980. Anyone with 2-5 years of any programming experience (in a field where errors do happen and they choose to handle them) will come to similar ideas.

Also the fact that try {} and catch/finally {} are always three different scopes is just idiotic. It should be try {catch{} finally{}}, what in the cargo cult that {}{}{} is? Everyone copies it blindly from grammar to grammar.

rollulus 7 days ago

This approach is so bad, I don't even know where to start. But it's all symptoms of their, sorry, incompetence. Take the loadCredentials example on top. If os.ReadFile cannot find the file, it returns an error with string representation: "open cred.json: no such file or directory". This comes straight from the std lib as it is, a great error. What does the errors.Is(err, os.ErrNotExist) do: prepend "file not found" to it, rendering: "file not found: open cred.json: no such file or directory". So this adds exactly nothing. The next if will prepend "failed to read file" to it, again, adding nothing as well. The two errors checks should be replaced by one if statement, optionally wrapping it with a context string but I cannot think of any use. Then the next step, error handling of verifyCredentials. I can only guess what it does, but assume that it returns an "username 'foo bar' cannot contain spaces" error. Does prepending "invalid credentials" help anything? Nope, so the whole if can be removed as well. No surprise your errors get clunky if you make them clunky.

I have more pressing things to do than dissect this article line by line, but suffice it to say that I feel sorry for newcomers to the language that an article like this is so high on HN. Back in the days there was just Dave Cheney's material to read [1], and it was excellent. It's unfortunately outdated in certain regards (e.g. with new Is/As functionality in the errors package for inspection and the %w formatting directive in fmt.Errorf) but it's still an excellent article.

[1]: https://dave.cheney.net/2016/04/27/dont-just-check-errors-ha...

  • vl 7 days ago

    >it returns an error with string representation: "open cred.json: no such file or directory". This comes straight from the std lib as it is, a great error.

    It’s a terrible error. It’s not structured, so you can’t aggregate it effectively in logs, on top of that it leaks potential secret, so you can’t return it from RPC handler.

    • rollulus 7 days ago

      The string representation is obviously not structured, because it's a string representation and strings are scalars. The typed representation is structured, which you can put into your structured logs as you'd like, omitting sensitive information where needed.

      • zaptheimpaler 7 days ago

        The "typed" representation is an `error` type that a million other methods use with a single method that returns the string:

            type error interface {
              Error() string
            }
        
        It's the same unstructured goslop.

        • philosopher1234 6 days ago

          This error is backed by a struct with error codes and so on.
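
          For reference, a small sketch of how that struct can be reached. It relies only on the standard library: os.ReadFile failures are *fs.PathError values, and errors.As / errors.Is inspect them without parsing the string. The describe helper is a hypothetical name, and cred.json is assumed not to exist.

          ```go
          package main

          import (
          	"errors"
          	"fmt"
          	"io/fs"
          	"os"
          )

          // describe pulls the structured fields out of an error returned by the
          // os package instead of parsing its string form.
          func describe(err error) (op, path string, notExist bool) {
          	var pathErr *fs.PathError
          	if errors.As(err, &pathErr) {
          		op, path = pathErr.Op, pathErr.Path
          	}
          	return op, path, errors.Is(err, fs.ErrNotExist)
          }

          func main() {
          	_, err := os.ReadFile("cred.json") // assumed to be missing
          	op, path, notExist := describe(err)
          	// These fields can go into structured logs; the path can be
          	// omitted there if it is considered sensitive.
          	fmt.Printf("op=%q path=%q notExist=%v\n", op, path, notExist)
          }
          ```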

  • franticgecko3 7 days ago

    I'm worried readers of this article will be horrified and believe this kind of DIY error handling is necessary in Go.

    The author has attempted to fix their unidiomatic error handling with an even more unidiomatic error framework.

    New Go users: most of the time returning an error without checking its value or adding extra context is the right thing to do

    • mattgreenrocks 7 days ago

      > New Go users: most of the time returning an error without checking its value or adding extra context is the right thing to do

      Thank you.

      Feels like Go is having its Java moment: lots of people started using it, so questions of practice arise despite the language aiming at simplicity, leading to the proliferation of questionable advice by people who can't recognize it as such. The next phase of this is the belief that the std library is somehow inadequate even for tiny prototypes because people have it beaten over their heads that "everybody" uses SuperUltraLogger now, so it becomes orthodox to pull that dependency in without questioning it.

      After a bunch of iterations of this cycle, you're now far away from simplicity the language was meant to create. And the users created this situation.

      • int_19h 7 days ago

        Go is having a Go moment: lots of people using it are realizing that other programming languages have all that complexity for a reason, and that "aiming at simplicity" by aggressively removing or ignoring well-established language features often results in more complicated code that's easier to get wrong and harder to reason about.

    • vultour 7 days ago

      From my experience this is not the case. If you error out 7 functions deep and only return the original error there's no chance you're figuring out where it happened. Adding context on several levels is basically a simplified stack trace which lets you quickly find the source of the error.

      • richbell 7 days ago

        I agree; I've wasted countless hours troubleshooting errors returned in complex Go applications. The original error is not sufficient.

      • mrj 7 days ago

        I inherited a codebase with the same problem. After a few debugging sessions where it wasn't clear where the error was coming from, I decided the root problem was that we didn't have stack traces.

        Fortunately, the code was already using zap and it had a method for doing exactly that:

        zap.AddStacktrace(zap.LevelEnablerFunc(func(lvl zapcore.Level) bool { return lvl >= zapcore.InfoLevel }))

        Because most of the time if there's an error, you'd likely want to log it out. Much of the code was doing this already, so it made sense to ensure we had good stack traces.

        There's overhead to this, but in our codebase there was a dearth of logging so it didn't matter much. Now when things are captured we know exactly where it happened without having to do what the post is doing manually... adding stack info.

      • mplanchard 7 days ago

        We actually went through the same realization when we started writing Rust a few years ago. The `thiserror` crate makes it easy to just wrap and return an error from some third-party library, like:

            #[derive(Debug, thiserror::Error)]
            enum MyError {
              #[error(transparent)]
              ThirdPartyError(#[from] third_party::Error)
            }
        
        Since it derives a `From` implementation, you can use it as easily as:

            fn some_function() -> Result<(), MyError> {
              // `?` converts third_party::Error into MyError via the derived From impl
              third_party::do_thing()?;
              Ok(())
            }
        
        But if that's happening somewhere deep in your application and you call that function from more than one place, good luck figuring out what it is! You wind up with an error log like `third_party thing failed` and that's it.

        Generally, we now use structured error types with context fields, which adds some verbosity as specifying a context becomes required, but it's a lot more useful in error logs. Our approach was significantly inspired by this post from Sabrina Jewson: https://sabrinajewson.org/blog/errors

      • tetha 7 days ago

        It's not a binary decision though. Just because the article arrives at overkill for most things in my opinion doesn't mean sentinel errors or wrapping errors in custom types should be avoided at all costs in all situations.

        In my experience, it's good and healthy to introduce this additional context on the boundaries of more complex systems (like a database, or something accessing an external API and such), especially if other code wants to behave differently based on the errors returned (using errors.Is/errors.As).

        But it's completely unnecessary for every single plumbing function to start inspecting and wrapping all errors it encounters, especially if it cannot make a decision on these errors or provide better context.

    • mariusor 7 days ago

      Do you maybe have any constructive advice for people that need to return errors that demand different behaviour from the calling code?

      I gave an example higher in the thread: if searching for the entity that owns the creds.json files fails, we want to return a 404 HTTP error, but if creds.json itself is missing, we want a 401 HTTP error. What would be the idiomatic way of achieving this in your opinion?

      • tetha 7 days ago

        With some of these examples, I'd change the API of the lower-level methods. Instead of a (Credentials, err) and the err is a NotFound sometimes, I'd rather make it a (*Credentials, bool, err) so you can have a (creds, found, err), and err would be used for actual errors like "File not found"/"File unreadable"/...
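
        A rough sketch of that signature, under assumptions: the Credentials type, the findCredentials name, and a creds.json that maps owner names to credentials are all hypothetical. File-level failures travel in err, while "this owner has no entry" is reported through found.

        ```go
        package main

        import (
        	"encoding/json"
        	"fmt"
        	"os"
        )

        // Credentials is a hypothetical payload type.
        type Credentials struct {
        	Username string `json:"username"`
        	Password string `json:"password"`
        }

        // findCredentials returns (creds, found, err). err is reserved for real
        // failures (file missing, unreadable, malformed); a missing entry for
        // owner is not an error and is reported as found == false.
        func findCredentials(path, owner string) (*Credentials, bool, error) {
        	data, err := os.ReadFile(path)
        	if err != nil {
        		return nil, false, fmt.Errorf("read credentials: %w", err)
        	}
        	var all map[string]Credentials
        	if err := json.Unmarshal(data, &all); err != nil {
        		return nil, false, fmt.Errorf("parse credentials: %w", err)
        	}
        	c, ok := all[owner]
        	if !ok {
        		return nil, false, nil
        	}
        	return &c, true, nil
        }

        func main() {
        	creds, found, err := findCredentials("creds.json", "alice")
        	switch {
        	case err != nil:
        		fmt.Println("error:", err)
        	case !found:
        		fmt.Println("no credentials for alice")
        	default:
        		fmt.Println("loaded credentials for", creds.Username)
        	}
        }
        ```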

        But other than that, there is nothing wrong with having sentinel errors or custom error types on your subsystem / module boundaries, like ErrCredentialsNotFetched, ErrUserNotFound, ErrFileInvalid and such. That's just good abstraction.

        The main worry is: How many errors do you actually need, and how many functions need to mess about with the errors going around? More error types mean harder maintenance in the future because code will rely on those. Many plumbing or workflow functions probably should just hand the errors upwards because they can't do much about it anyways.

        A lot of the details in the errors of the article very much feel like business logic and API design is getting conflated with the error framework.

        Is "Cannot edit a whatsapp message template more than 24 hours" or "the users account is locked" really an error like "cannot open creds.json: permission denied" or "cannot query database: connection refused"? You can create working code like that, but I can also use exceptions for control flow. I'd expect these things to come from some OpenAPI spec and some controller-code make this decision in an if statement.

      • sethammons 7 days ago

        Use errors.Is to compare the returned err to mypkg.ErrOwnerNotExists and mypkg.ErrMissingConfig, and the handler decides which status code is appropriate

        • mariusor 7 days ago

          Cool, but errors.Is what? In my case both would come back as os.ErrNotExist errors because both are files on the disk.

          I think that the original dismissal I replied to, might not have taken into account some of the complexities that OP most likely has given thought to and made decisions accordingly. Among those there's the need to extract or append the additional information OP seems to require (request id, tracking information, etc). Maybe it can be done all at the top level, but maybe not, maybe some come from deeper in the stack and need to be passed upwards.

          • sethammons 7 days ago

            no no no; do not return os.ErrNotExist in both cases. The function needs to handle os.ErrNotExist and then return mypkg.ErrOwnerNotExists or mypkg.ErrMissingConfig (or whatever names) depending on the state in the function.

            The os.ErrNotExist error is an implementation detail that is not important to callers. Callers shouldn't care about files on disk as that is leaking abstraction info. What if the function decides to move those configs to s3? Then callers have to update to handle s3 errors? No way. Return errors specific to your function that abstract the underlying implementation.

            Edit: here is some sample code https://go.dev/play/p/vFnx_v8NBDf

            Second edit: same code, but leveraging my other comment's kverr package to propagate context like kv pairs up the stack for logging: https://go.dev/play/p/pSk3s0Roysm
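
            Not a copy of the playground links above, just a self-contained sketch of the same idea. Assumptions: the on-disk layout (one directory per owner containing creds.json), the loadCreds and statusFor helpers, and the exact error names are all hypothetical.

            ```go
            package main

            import (
            	"errors"
            	"fmt"
            	"io/fs"
            	"net/http"
            	"os"
            	"path/filepath"
            )

            // Sentinel errors exposed to callers (hypothetical names).
            var (
            	ErrOwnerNotExists = errors.New("owner does not exist")
            	ErrMissingConfig  = errors.New("credentials config is missing")
            )

            // loadCreds translates filesystem errors into package-level errors so
            // that callers never need to know the data lives on disk.
            func loadCreds(root, owner string) ([]byte, error) {
            	dir := filepath.Join(root, owner)
            	if _, err := os.Stat(dir); err != nil {
            		if errors.Is(err, fs.ErrNotExist) {
            			return nil, fmt.Errorf("%w: %s", ErrOwnerNotExists, owner)
            		}
            		return nil, fmt.Errorf("stat owner: %w", err)
            	}
            	data, err := os.ReadFile(filepath.Join(dir, "creds.json"))
            	if err != nil {
            		if errors.Is(err, fs.ErrNotExist) {
            			return nil, fmt.Errorf("%w: %s", ErrMissingConfig, owner)
            		}
            		return nil, fmt.Errorf("read creds: %w", err)
            	}
            	return data, nil
            }

            // statusFor is what a handler could use to pick the response code.
            func statusFor(err error) int {
            	switch {
            	case err == nil:
            		return http.StatusOK
            	case errors.Is(err, ErrOwnerNotExists):
            		return http.StatusNotFound
            	case errors.Is(err, ErrMissingConfig):
            		return http.StatusUnauthorized
            	default:
            		return http.StatusInternalServerError
            	}
            }

            func main() {
            	_, err := loadCreds(os.TempDir(), "no-such-owner")
            	fmt.Println(statusFor(err), err)
            }
            ```

            If the storage moves off disk later, only loadCreds changes; statusFor and its callers keep matching on the same two sentinels.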

            • mariusor 7 days ago

              Exactly, and that's what OP argues for, albeit in a very complex manner.

              Distilling their implementation to the basics, that's what we get: typed errors that wrap the Go standard library's ones with custom logic. Frankly I doubt that the API your library exposes (kv maps) is better than OP's typed structs. Maybe their main issue is relying on stuffing all error types in the same module, instead of having each independent app coming up with their own, but probably that's because they need the behaviour for handling those errors at the top of the calling stack to be uniform and to have only one implementation.

              A quick back of the napkin list for what an error needs to contain to be useful in a post execution debugging context would be:

              * calling stack

              * traceability info like (request id, trace id, etc)

              * data for the handling code to make meaningful distinction about how to handle the error

              I think your library could be used for the last two, but I don't know how you store calling stack in kv pairs without some serious handwaving. Also kv is unreliable because it's not compile time checked to match at both ends.

              • sethammons 7 days ago

                I'm not saying use kverr for explicit error handling (like, you could, but that is non ideal), use kverr as a context bag of data you want to capture in a log. If you programmatically are routing with untyped string data, I agree, unreliable

  • tetha 7 days ago

    > No surprise your errors get clunky if you make them clunky.

    From a user perspective, good errors in Go make me think of Perl's croak/carp. Croak and carp gave you a stacktrace of your error, but it cut out all the module-internal calls and left you with the function calls across module boundaries. Very useful - enough so that Java discovered it again later on.

    Personally, I wouldn't wrap the errors in loadCredentials at all. I'd just wrap the result of this method into an fmt.Errorf("failed to load credentials: %w", err). This way the user knows the context the error happened in, and then we have to cross our fingers the error returned by this is good enough.

    But something like "application startup failed: failed to load credentials: open cred.json: no such file or directory" is a very nice error message from an application. Just enough context to know what's going on, but no 1200 line stacktrace to sift through.
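
    A short sketch of how such a message could come about, assuming a hypothetical startup function and a loadCredentials that returns the os error untouched. The trailing part of the message comes from the operating system, so its exact wording varies by platform.

    ```go
    package main

    import (
    	"fmt"
    	"os"
    )

    // loadCredentials returns the os error as-is: it already names the
    // operation and the file.
    func loadCredentials(path string) ([]byte, error) {
    	return os.ReadFile(path)
    }

    // startup adds one layer of context at the call site.
    func startup(credPath string) error {
    	if _, err := loadCredentials(credPath); err != nil {
    		return fmt.Errorf("failed to load credentials: %w", err)
    	}
    	return nil
    }

    func main() {
    	if err := startup("cred.json"); err != nil {
    		// One more layer at the top gives the full chain of contexts.
    		fmt.Println(fmt.Errorf("application startup failed: %w", err))
    	}
    }
    ```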

  • mariusor 7 days ago

    As someone that ended up implementing something very similar to TFA, I'd like to ask in which way can you pass errors from 3 layers deep in your stack to the top layer and maintain context?

    Ie, when I can't find cred.json I want to return a 401 error, but when I can't find the entity cred.json is supposed to be owned by I want to return 404. How can one "not incompetent" Go developer solve this and distinguish between the two errors?

ikiris 7 days ago

The fact that this code also has gorm in it in one of the examples is neither supportive of the proposal’s fit for the language, nor really surprising.

mukunda_johnson 7 days ago

Adding error checks everywhere when you don't care about them is one of the ugliest things about Go.

What I do is have a utility package that lets me panic on most errors, so I can recover in a generalized handler.

x, err := doathing()

Catch(err, "didn't do the thing")

The majority of error handling is "the operation failed, so cancel the request." Sure there are places where the error matters and you can divert course, but that is far from the majority of cases.
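
A rough sketch of what such a utility might look like. This is an assumption about the shape, not the commenter's actual package: Catch, the caught wrapper type, and handle are hypothetical. Wrapping the error in a private type lets the recover site tell a deliberate Catch apart from a genuine bug such as a nil dereference, which is re-panicked.

```go
package main

import (
	"errors"
	"fmt"
)

// caught marks panics raised by Catch so they can be told apart from bugs.
type caught struct{ err error }

// Catch panics with a wrapped error when err is non-nil.
func Catch(err error, msg string) {
	if err != nil {
		panic(caught{fmt.Errorf("%s: %w", msg, err)})
	}
}

// handle runs one request and converts a Catch panic back into an error.
// Any other panic is re-raised unchanged.
func handle(request func()) (err error) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		c, ok := r.(caught)
		if !ok {
			panic(r)
		}
		err = c.err
	}()
	request()
	return nil
}

// doathing is a stand-in for an operation that fails.
func doathing() (int, error) { return 0, errors.New("disk on fire") }

func main() {
	err := handle(func() {
		x, err := doathing()
		Catch(err, "didn't do the thing")
		fmt.Println("got", x) // not reached when doathing fails
	})
	fmt.Println("request cancelled:", err)
}
```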

  • kbolino 7 days ago

    I don't agree, but having said that, this feels like an entirely predictable/justifiable perspective to hold, given the terrible design of net/http in the standard library. Of course it feels easier to just panic, it's not like you can return an error from a handler. There is so much compatibility baggage from Go 1.0 in that package, that doing the right thing (contexts, errors, etc.) is so much harder than it should be, and most people end up doing the wrong thing because it's more ergonomic.

    • mukunda_johnson 6 days ago

      I usually use Echo which does have an error to return from handlers, but I don't think it's necessarily the wrong thing unless you're writing a library. I used to avoid panics with the same mindset that they aren't supposed to be used like exceptions, but I've found that panics are a clean way to handle a bulk of error cases that are "log and retreat", centralizing the process with some syntactic sugar to not have to check err != nil everywhere. More of my thoughts here if any are curious: https://blog.mukunda.com/cat/2022/dont-be-afraid-to-panic.tx...

      I think one thing that could help if the codebase wants to avoid regular panics is more syntactic sugar to help error bubbling, like Rust has.

revskill 7 days ago

Too much writing and lack of diagramming is a sign of digging through the rabbit hole.

omeid2 7 days ago

This is a cry for sum types.

AceCream 7 days ago

type xError struct { msg message, stack: callers(), }

is this legit in go?

bilbo-b-baggins 7 days ago

Bro got dragged so hard in the comments he took his site down. Oof.

  • jatins 7 days ago

    I mean their intentions are good but if I worked at a place that made me use that error package I'd not have a good time

    In general with golang, if something is not idiomatic Go then don't try too hard to fit constructs from other languages into it. Even the use of lodash like packages feels awkward in Go

  • asabla 7 days ago

    more like hug of death from HN users. Since the site is back up and working again