Hacker News | rootusrootus's comments

That's solid. If my memory is correct, and my teachers were correct (both of these are suspect) back in high school, a human should be able to momentarily exert about 1 horsepower at maximum effort. We did an experiment on how much power we could output at maximum effort. We tested it by sprinting up some number of flights of stairs, and timing it. As I recall we did conclude that in round numbers the hypothesis was correct.

But that was 35 years ago and it was a high school physics experiment meant to be entertaining more than precise.
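For anyone who wants to redo the stair experiment, here is a rough sketch of the arithmetic: average power is the potential energy gained (mass × g × height) divided by the time taken. The 70 kg mass, 6 m climb, and 5.5 s time are made-up illustrative numbers, not the figures from that class, and the function names are just for this example.

```python
G = 9.81              # gravitational acceleration in m/s^2
WATTS_PER_HP = 745.7  # one mechanical horsepower in watts


def stair_power_watts(mass_kg: float, height_m: float, time_s: float) -> float:
    """Average power needed to lift mass_kg through height_m in time_s."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    return mass_kg * G * height_m / time_s


def watts_to_hp(watts: float) -> float:
    """Convert watts to mechanical horsepower."""
    return watts / WATTS_PER_HP


if __name__ == "__main__":
    # Hypothetical sprinter: 70 kg, 6 m of vertical climb, 5.5 seconds.
    watts = stair_power_watts(70, 6, 5.5)
    print(f"{watts:.0f} W = {watts_to_hp(watts):.2f} hp")
```

With those assumed numbers the result lands close to 1 hp, which fits the round-number conclusion. It only counts vertical work, so it understates total effort.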


But I no longer trust the FDA, sadly.

This chemical has been approved in Canada, the EU, and Japan for years.

This chemical has been approved by better safety bodies in Europe and Japan for decades. There are dozens of others as well that the US still doesn't have access to. Imported sunscreen will still be much better.

> American models are restricted from telling you inconvenient truths just as much, you just erroneously assume to know what those truths are in the first place.

“Trust me bro” is not a strong argument, it would be more convincing with examples.


Ask an American LLM (really any LLM, since Chinese models are trained on the same publicly-available English text) who the first Black man in space was.

You'll likely get the name of the first African-American in space, rather than the name of the Afro-Cuban who was actually first.

This may seem like a relatively innocuous error, but the point is that every culture has its biases and blind spots.


> Ask an American LLM (really any LLM, since Chinese models are trained on the same publicly-available English text) who the first Black man in space was. You'll likely get the name of the first African-American in space, rather than the name of the Afro-Cuban who was actually first.

Well I just asked Claude and it gave the correct answer:

"The first Black man in space was Arnaldo Tamayo Méndez, a Cuban cosmonaut who flew aboard Soyuz 38 in September 1980. (The first Black American in space was Guion Bluford, in 1983.)"


Indeed, I used the word "likely" for a reason. n = 1 isn't enough to identify a pattern. Try different models, try re-rolling the answers, and try turning reasoning off (models can catch "knee-jerk" mistakes in their chain-of-thought).

I doubt even Opus 4.8 gets it right 100% of the time. However, this specific example is one I've left feedback about in multiple places, so it's also probable that newer models are more likely to get it right.

E: In fact, I just tried with Opus 4.8 through API, no tools and reasoning off, and got the following response:

"The first Black man in space was Guion "Guy" Bluford, an American astronaut who flew aboard the Space Shuttle Challenger on August 30, 1983, as part of mission STS-8. It's worth noting a related distinction: Arnaldo Tamayo Méndez, a Cuban of African descent, actually became the first person of African heritage in space earlier, in September 1980, aboard the Soviet Soyuz 38 mission. He is often recognized as the first Black person and first person of Latin American descent in space. So depending on the specific criteria: Arnaldo Tamayo Méndez (Cuba) — first person of African descent in space (1980) Guion Bluford (USA) — first African American in space (1983)"

The correct answer is there, yes, but why does the wrong answer come out first?


Depending on the platform, you might need to prefix your prompt with "Without looking up any external resources or doing any tool calls" so you're actually testing the bias of the model rather than the bias of whatever resources it happens to come across.

Tried it with that prefix on ChatGPT + Claude, Haiku and Sonnet, and got the right answer 1/10 times when I removed my reused system prompt. At one point I got this:

> Quick clarification before the answer: this phrase is often conflated with "first African American in space," which is a different person. Guion Bluford (1983, US) was the first African American astronaut, but he wasn't first overall. [then the real answer after]

That one was with my own system prompt, which tries to surface clarifications first. My guess is that this is why many models get it wrong: in America, "Black" somehow equals "African American", and the model gets confused by this intentional mislabeling.
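To put the "n = 1 isn't enough" point and the 1/10 figure in perspective, here is a minimal sketch that turns a count of correct answers into a rough range for the true success rate. It uses the Wilson score interval, which is my choice for illustration and not something anyone in the thread used. It also assumes each re-roll is an independent sample, which is only approximately true.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    for k, n in [(1, 1), (1, 10)]:
        lo, hi = wilson_interval(k, n)
        print(f"{k}/{n} correct -> plausible success rate {lo:.2f} to {hi:.2f}")
```

A single correct answer leaves the range very wide, and even ten tries only narrows it so far, so a few dozen re-rolls per model is a more reasonable basis for comparison.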


Ask ChatGPT to rewrite "The Freedom Fighter's Manual" (originally made by the CIA) to replace "Nicaragua" with "the US" and "Marxism"/"Communism" with "Fascism" and see if you get something reasonable back.

Why would you do that?

I thought that was clear: to show biases in LLMs with a concrete example.

I use this strategy, too. I liken it to limiting the blast radius. If the LLM truly fouls things up it’s easier to pick up the pieces if you keep the scope limited.

well shit, my F150 uses 0 tanks of gas, does that complicate things?

It does for your resale value ;)

Maybe it improves it? The truck has depreciated 7K since I bought it brand new, which works out to about 13% over 20 months. Most cars depreciate faster than that, so it seems having 0 tanks helps.
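As a quick check of those numbers, here is a back-of-the-envelope sketch. It derives the purchase price implied by "7K is about 13%" and converts the 20-month loss into a yearly rate, assuming the same percentage is lost every year. That constant-rate assumption is mine, and real depreciation is usually front-loaded.

```python
def implied_price(loss: float, fraction_lost: float) -> float:
    """Original price implied by an absolute loss and the fraction it represents."""
    if not 0 < fraction_lost < 1:
        raise ValueError("fraction_lost must be between 0 and 1")
    return loss / fraction_lost


def annualized_depreciation(fraction_lost: float, months: float) -> float:
    """Equivalent constant yearly depreciation rate (compounding assumption)."""
    if months <= 0:
        raise ValueError("months must be positive")
    return 1 - (1 - fraction_lost) ** (12 / months)


if __name__ == "__main__":
    price = implied_price(7000, 0.13)
    yearly = annualized_depreciation(0.13, 20)
    print(f"implied purchase price: ${price:,.0f}")
    print(f"annualized depreciation: {yearly:.1%}")
```

That works out to a purchase price of roughly $54K and a loss of about 8% per year under this assumption.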

That's wild. In Oregon you will get a ticket for driving in the bike lane at all, turn or not. The only exceptions are bike lanes that go straight and briefly share a turning lane, but those are clearly marked for that purpose.

Good reminder that you should always be aware of local traffic laws when you travel; most places in the US are similar but not identical.


I agree it is intentional. They haven't updated autosteer, as far as I can tell, since the Model 3 was released. Certainly my 2023 (sold a couple weeks ago) was no better than my 2019 was, and notably worse than my Ford Lightning. Outside of FSD, most everyone else makes superior TACC & lane centering now.

> we will be literally inferior soon

This plague of misanthropic doom is itself pretty depressing. Why do so many people think LLMs are in any way on a path to compete with human brains? Why do you think so little of yourself? The brain is magnificent and complex in ways that we are unable to decipher anytime soon, and it does way more than an LLM. Way, way more.


I'm not talking specifically about LLMs but about AI in general. It's an important distinction, because tooling is currently what makes models useful and more performant.

When I say we, I mean the general population really. There'll always be the super bright ones, sure, but we gotta be realistic here. Most people already struggle to make any meaningful contribution because it's so hard to compete, and that gap is just gonna get bigger and bigger.

I agree the brain is pretty magnificent, but when it comes to stuff like language, figuring out if an idea actually works, building the next LLM, or running business stuff, it's pretty obvious we'll be inferior. AI can already innovate and come up with new things way faster than any human could, so at some point (soon) the majority of contributions are just gonna come from AI, not from us.


Agreed. LLMs are really terrific at sounding like they know exactly what they are talking about. Fable is the best yet. Beautiful, thorough explanations with absolute certainty, which under even light scrutiny turn out to be mostly bullshit.

I still love the tool, but remain as convinced as ever that AGI does not lie at the end of this particular path.


Yes, it is another variation on the Gell-Mann Amnesia Effect. I have a number of non-developers in my circle of friends who think Claude is about to put me out of work. They think it is just a great tool for them, not a replacement. Of course!
