Front page + top level comments only
Chromium still seems to be working on support, based on https://cr-status.appspot.com/feature/5149560434589696, so maybe it'll be useful soon? That page indicates that they're still discussing certain parts of the spec.
grid-template-rows: masonry;
is going to be outdated then?
If you try to go left-to-right, you will quickly realize that at the end of each "line" it is really difficult to know where the next line starts. It is easy to accidentally start again on the same line (and inspect the same elements), or skip one accidentally. Then navigating through the elements one by one requires a considerable amount of cognitive effort, your eyes bounce up and down constantly, and you end up inspecting the same elements multiple times.
If you try to go top-to-bottom, lane by lane, you will then realize that the page also has infinite scroll and you will never go past the first lane.
Hypermedia suffers because these marketing companies waste time on making sure they can build Pinterest in 10 LoC instead of fixing actual long running hypermedia domains.
Personally, I use an 11-year-old machine and have had to add userscript hacks to certain major Web sites to work around bugs in CSS grid (not the "lanes" described here).
At least new JavaScript features can be "polyfilled" or whatever. Maybe sites could check for CSS feature support too? But they seem not to.
For example, the demo page linked in the article fails pretty unusably for me. All the images take up nearly the full viewport width.
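For what it's worth, that kind of feature check is doable in plain CSS with @supports; a minimal sketch (the .masonry selector and the column fallback are made up for illustration, not from the article):

    .masonry {
      display: grid;
      grid-template-columns: repeat(auto-fill, minmax(200px, 1fr));
      /* Ignored by browsers that don't support it: */
      grid-template-rows: masonry;
    }

    /* Fallback for browsers without masonry support: multi-column flow. */
    @supports not (grid-template-rows: masonry) {
      .masonry {
        display: block;
        columns: 200px;
      }
    }

The caveat is that the columns fallback changes the reading order (top-to-bottom per column), which ties right back into the navigation complaints above.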
> can someone help folks at Mistral find more weak baselines to add here? since they can't stomach comparing with SoTA....
> (in case y'all wanna fix it: Chandra, dots.ocr, olmOCR, MinerU, Monkey OCR, and PaddleOCR are a good start)
- paddleOCR-VL
- olmOCR-2
- chandra
- dots.ocr
I find it a pity that there aren't many leaderboards or arenas for OCR and CV, or for the providers hosting those models. They're neglected on both Artificial Analysis and OpenRouter.
I don’t know how they can make this statement with a 79% accuracy rate. For any serious use case, that is an unacceptable number.
I work with scientific journals, and issues like 2.9+0.5 being misread as 29+0.5 are something we regularly run into, which means we can never fully trust automated processes and require human verification at every step.
EDIT: you can try it yourself for free at https://console.mistral.ai/build/document-ai/ocr-playground once you create a developer account! Fingers crossed to see how well it works for my use case.
It seems like the EU in general should be heavily invested in Mistral's development, but it doesn't seem like they are.
I've got some foreign artbooks that I would like to get translated. The translations would need to be in place since the placement of the text relative to the pictures around it is fairly important. I took a look at some paid options online, but they seemed to choke - mostly because of the non-standard text placements and all.
The best solution I could come up with is using Google Lens to overlay a translation while I go through the books, but holding a camera/tablet up to my screen isn't very comfortable. Chrome has Lens built in, but (IIRC) I still need to manually select sections for it to translate - it's not as easy to use as just holding my phone up.
Anyone know of any progress towards in-place OCR/translations?
Regular Gemini Thinking can actually get 70-80% of the documents correct, apart from lots of mistakes on given names. ChatGPT maybe understands like 50-60%.
This Mistral model butchered the whole text; literally not a word was usable, to the point that I think I'm doing something wrong.
The test document: https://files.fm/u/3hduyg65a5
We’ve done some fairly extensive testing internally recently and found that Garage is somewhat easier to deploy than our existing MinIO setup, but it is not as performant at high throughput. IIRC we could push about 5 gigabits of (not small) GET requests out of it, but something kept it from reaching the 20-25 gigabits (on a 25G NIC) that MinIO could reach (as well as 50k STAT requests/s, across 10 nodes).
I don’t begrudge it that. I get the impression that Garage isn’t necessarily focussed on this kind of use case.
---
In addition:
Next time we come to this we are going to look at RustFS [1], as well as Ceph/Rook [2].
We can see we're going to have to move away from MinIO in the foreseeable future. My hope is that the alternatives get a boost of interest given the direction MinIO is now taking.
[0]: https://news.ycombinator.com/item?id=46140342
[1]: https://rustfs.com/
[2]: https://rook.io/
Does anyone know a good open-source S3 alternative that's easily extendable with custom storage backends?
For example, AWS offers IA and Glacier in addition to the defaults.
> For the metadata storage, Garage does not do checksumming and integrity verification on its own, so it is better to use a robust filesystem such as BTRFS or ZFS. Users have reported that when using the LMDB database engine (the default), database files have a tendency of becoming corrupted after an unclean shutdown (e.g. a power outage), so you should take regular snapshots to be able to recover from such a situation.
It seems like you can also use SQLite, but a default database engine that isn't robust against power failure or crashes seems surprising to me.
https://www.repoflow.io/blog/benchmarking-self-hosted-s3-com... was useful.
RustFS also looks interesting but for entirely non-technical reasons we had to exclude it.
Anyone have any advice for swapping this in for Minio?
Previously I used LocalStack S3, but ultimately didn't like that persistence isn't available in the OSS version. MinIO OSS is apparently no longer maintained? Also looked at SeaweedFS and RustFS, but from a quick read into them this one was the easiest to set up.
Garage looks really nice: I've evaluated it with test code and benchmarks and it looks like a winner. Also, very straightforward deployment (self contained executable) and good docs.
But no tags on objects is a pretty big gap, and I had to shelve it. If Garage folk see this: please think on this. You obviously have the talent to make a killer application, but tags are table stakes in the "cloud" API world.
It's a really cool system for hyper converged architecture where storage requests can pull data from the local machine and only hit the network when needed.
this is the reliability question no?
In particular, I don't love it when an article attacks a best practice as a cheap gotcha:
"and this time it was super easy! After some basic reversing of the Tapo Android app, I found out that TP-Link have their entire firmware repository in an open S3 bucket. No authentication required. So, you can list and download every version of every firmware they’ve ever released for any device they ever produced"
That is a good thing - don't encourage security through obscurity! The impact of an article like this is as likely to get management to prescribe a ham-handed mandate to lock down firmware as it is to get them to properly upgrade their security practices.
This page[1] lists the C200 as last having a firmware update in October, but also lists the latest version as 1.4.4 while the article lists 1.4.2. It seems like they have pushed other updates in this time, but not these security fixes.
[1] https://community.tp-link.com/us/smart-home/kb/detail/412852
For anyone concerned about their TP-Link cameras, consider:
1. Disable UPnP on your router
2. Use VLANs to isolate IoT devices
3. Block all outbound traffic except specific required endpoints
4. Consider replacing stock firmware with open alternatives when available
5. Regularly check for firmware updates (though as this article shows, updates can be slow)
The hardcoded keys issue is particularly troubling because it means these vulnerabilities persist across the entire product line. Thanks for the detailed writeup - this kind of research is invaluable for the security community.
I assume any Wi-Fi camera under $150 has basically the same problems. I guess the only way to run a security camera where you don't have Ethernet is to use a non-proprietary Wi-Fi <-> 1000BASE-T adapter. Probably only something homebuilt based on a single board computer and running basically stock Linux/BSD meets that requirement.
How does this happen? Doesn’t pretty much every ISP give a router with their modem? How do people manage this?
Is it wrong to judge people for their choice of ai providers?
(Phones is one notable exception. I need contactless payments to work.)
unzip zbsm.zip
Archive: zbsm.zip
inflating: 0
error: invalid zip file with overlapped components (possible zip bomb)
This seems to have been done in a patch to address https://nvd.nist.gov/vuln/detail/cve-2019-13232
https://sources.debian.org/patches/unzip/6.0-29/23-cve-2019-...
Like bomb the CPU time instead of memory.
Someone shared a link to that site in a conversation earlier this year on HN. For a long time now, I've had a gzip bomb sitting on my server that I provide to people who make certain categories of malicious calls, such as attempts to log in to WordPress on a site not using WordPress. That post got me thinking about alternative types of bombs, particularly as newer compression standards have become ubiquitous and supported in browsers and HTTP clients.
I spent some time experimenting with brotli as a compression bomb to serve to malicious actors: https://paulgraydon.co.uk/posts/2025-07-28-compression-bomb/
Unfortunately, as best as I can see, malicious actors are all using clients that only accept gzip, rather than brotli'd contents, and I'm the only one to have ever triggered the bomb when I was doing the initial setup!
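In case anyone wants to reproduce the experiment, generating the payload is only a few lines; a sketch assuming the Python brotli package (sizes and filenames here are arbitrary):

    # Build a small .br file that expands to ~256 MiB of zeros when decoded.
    import brotli

    DECODED_SIZE = 256 * 1024 * 1024  # what the client will try to allocate

    # Highly repetitive input compresses extremely well at the highest quality.
    compressed = brotli.compress(b"\x00" * DECODED_SIZE, quality=11)

    with open("bomb.br", "wb") as f:
        f.write(compressed)

    print(f"{len(compressed)} bytes on disk -> {DECODED_SIZE} bytes decoded")

You'd serve it with a Content-Encoding: br header to clients that advertise br in Accept-Encoding, which matches the observation above that most bots only ever send gzip.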
According to a possibly apocryphal story from the premiere performance, a woman was heard shouting that Ravel was mad. When told about this, Ravel is said to have remarked that she had understood the piece.
https://en.wikipedia.org/wiki/Bol%C3%A9ro
Ah, it's called the Commodordion: https://linusakesson.net/commodordion/index.php
I guess there's a C64 "executable" that he's made available but no source so I don't know what the exact keymapping is. I did find a few different resources that show the layout in action [2] [3].
[0] https://www.youtube.com/watch?v=xwsZ41pA_Vo&t=58s
[1] https://en.wikipedia.org/wiki/Chromatic_button_accordion
[2] https://okathira-dev.github.io/client-web-api-sandbox/button...
That's the most important number in stores like this one.
I am part of the LOAD "*", 8, 1 generation, and this is really freaking cool.
One of the funniest things in the video is the variety of neck tie configurations, one for each part.
The photo of "the automaton" appears to be a melamine white particleboard panel.
That's such a good idea with this old equipment. And you can see that the guy tried hard not to laugh. And surprisingly, the arrangement sounds great. Hilarious.
Those disc drive sounds are so cool
Apparently, during a recent review, they decided this counted as fraud and banned my account. As a result, I can no longer log in and lost access to all my Kindle e-books. They also remotely wiped my Kindle, so my entire library is gone. I appealed the decision, but I’ve been waiting for over six months with no resolution.
Unfortunately, it seems like this will be chosen by the publisher, so of course most of the books probably won't be downloadable at all, and Amazon can now point their finger at the publisher instead of taking the blame themselves. Publishers were probably always the reason behind the move, but at least now Amazon has someone else to blame, which I guess is great for them.
This isn't announcing that PDFs and EPUBs are now available for everything that was DRM-free; this is announcing that they will _permit_ PDFs and EPUBs to be made available.
I wonder how many books are actually DRM-free and are going to be affected by this change. I suspect relatively few, but I would be happy to be wrong
The internet "allows" ePub and PDF downloads for ALL books. Adjust yourselves accordingly.
Fool me once..
Not to mention the spying they'll do - Whatcha reading?
I do backups, but better safe than sorry.
These days I don’t buy from them, but I do the same with Kobo, which is a better company to begin with.
Same with Google etc. Just look how bad YouTube has gotten. I try to find a video xyz, using the search term xyz, and after about 5 results, random videos show up. That is not a "search", that is propaganda and an attempt to retain people on the platform - but I am already on the platform, playing BACKGROUND MUSIC from some DJs. Why is Google wasting my time when I want to FIND something? What is even worse is that this leaked into the search engine too. Google has deliberately ruined the search engine over the last few years.
I don't let those laws (corporate opinions) degrade my quality of life.
So the real question is: how is Amazon going to enshittify DRM-free books? Are they trying to wipe out Gutenberg, Standard Ebooks, etc.?
Are they trying to be the YouTube of DRM-free? The place where everyone goes, and that becomes crap due to updated Ts&Cs - inserting ads or charges?
DRM-free is a precondition for me buying digital books personally. Practically no major digital bookstore offers it.
I still buy physical media from them once a year (in November) when the rest of the world can't compete on availability or price. Yes, I recognise the hypocrisy of said actions and minimise it as much as possible. Non-US based. Many physical media producers (e.g. Disney) no longer produce stuff for our 'region'.
If anyone else is more familiar with Go (I only really do Rust): is there no solution for preventing stack smashing on goroutines? https://github.com/mullvad/mullvadvpn-app/pull/7728 I understand that goroutines have a smaller stack size (the whole green-thread problem), but there's no way to fix this?
But my app’s WireGuard is natively implemented by the FD.io VPP plugin, so it’s based on C.
I tried downloading their Android app, but it's not generally usable for people who host their own WireGuard, which is fair enough.
Probably naively, I'm thinking:
- diversity: good
- doubling the attack surface: real bad
What do the security folks out there think of the topic?
As someone who is a huge IDE fan, I vastly prefer the experience from Codex CLI compared to having that built into my IDE, which I customize for my general purposes. The fact it's a fork of VSCode (or whatever) will make me never use it. I wonder if they bet wrong.
But that's just usability and preference. When the SOTA model makers give out tokens for substantially less than public API cost, how in the world is Cursor going to stay competitive? The moat just isn't there (in fact I would argue it's non-existent).
Personally, I work on Graphite for two reasons. 1) I love working with kind, smart, intense teammates. I want to be surrounded by folks who I look up to and who energize me. 2) I want to build bleeding-edge dev tools that move the whole industry forward. I have so much respect for all y’all across the world, and nothing makes me happier than getting to create better tooling for y’all to engineer with. Graphite is very much the combination of these two passions: human collaboration and dev tools.
Joining Cursor accelerates both these goals. I get to work with the same team I love, a new bunch of wonderful people, and get to keep recruiting as fast as possible. I also get to keep shipping amazing code collaboration tooling to the industry - but now with more resourcing and expertise. We get to be more ambitious with our visions and timelines, and pull the future forward.
I wouldn’t do this if I didn’t think the Cursor team were standup people with high character and kindness. I wouldn’t do this if I thought it meant compromising our vision of building a better generation of code collaboration tooling. I wouldn’t do it if I thought it wouldn’t be insanely fun and exciting. But it seems to be all those things, so we’re plunging forward with excitement and open hearts!
Is it market share? Because I don't know who has a bigger user base than Cursor.
The idea is to hook into Bitbucket PR webhooks so that whenever a PR is raised on any repo, Jenkins spins up an isolated job that acts as an automated code reviewer. That job would pull the base branch and the feature branch, compute the diff, and use that as input for an AI-based review step. The prompt would ask the reviewer to behave like a senior engineer or architect, follow common industry review standards, and return structured feedback - explicitly separating must-have issues from nice-to-have improvements.
The output would be generated as markdown and posted back to the PR, either as a comment or some attached artifact, so it’s visible alongside human review. The intent isn’t to replace human reviewers, but to catch obvious issues early and reduce review load.
What I’m unsure about is whether diff-only context is actually sufficient for meaningful reviews, or if this becomes misleading without deeper repo and architectural awareness. I’m also concerned about failure modes - for example, noisy or overconfident comments, review fatigue, or teams starting to trust automated feedback more than they should.
If you’ve tried something like this with Bitbucket/Jenkins, or think this is fundamentally a bad idea, I’d really like to hear why. I’m especially interested in practical lessons.
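For the mechanical part, the Jenkins job doesn't need much; a rough sketch of the reviewer step, assuming a generic OpenAI-compatible chat endpoint and the Bitbucket Cloud 2.0 PR-comments API (the env var names and model name are placeholders):

    import os
    import subprocess
    import requests

    def get_diff(base: str, head: str) -> str:
        # Diff of the feature branch against the merge base with the target branch.
        return subprocess.run(
            ["git", "diff", f"origin/{base}...origin/{head}"],
            capture_output=True, text=True, check=True,
        ).stdout

    def review(diff: str) -> str:
        prompt = (
            "Act as a senior engineer reviewing this pull request diff. "
            "Return markdown with two sections: 'Must fix' and 'Nice to have'.\n\n"
            + diff[:60_000]  # crude truncation; real code should chunk large diffs
        )
        resp = requests.post(
            os.environ["LLM_API_URL"],  # e.g. an OpenAI-compatible /chat/completions
            headers={"Authorization": f"Bearer {os.environ['LLM_API_KEY']}"},
            json={"model": "some-model", "messages": [{"role": "user", "content": prompt}]},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    def post_comment(workspace: str, repo: str, pr_id: int, markdown: str) -> None:
        url = (f"https://api.bitbucket.org/2.0/repositories/"
               f"{workspace}/{repo}/pullrequests/{pr_id}/comments")
        requests.post(
            url,
            auth=(os.environ["BB_USER"], os.environ["BB_APP_PASSWORD"]),
            json={"content": {"raw": markdown}},
            timeout=30,
        ).raise_for_status()

The diff-only-context concern is real, though; one mitigation is to also pass the full files touched by the diff (or a repo map) so the model sees surrounding code, at the cost of a much larger prompt.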
what does graphite have to do with code review?
> After bringing features of Supermaven to Cursor Tab, we now recommend any existing VS Code users to migrate to Cursor.
Supermaven was acquired by Cursor and sunset after 1 year.
for anyone else looking for a replacement, git spice and jujutsu are both fantastic
Huge fans of their work @ GitStart!
Then Cursor takes on GitHub for the control of the repo.
Looks bad: https://forum.cursor.com/t/font-on-the-website-looks-weird/1...
- Hunter @ Ellipsis
But what got me was that the tipster who blew the case wide open is reportedly a homeless Brown graduate who lived in the basement of the engineering building (a la the South Korean film Parasite). It made me sad, but also not surprised: that building does have a single-occupancy bathroom with showers, and no keycard access was needed until 7pm in the evening.
So it made sense to me that he or she would've used that building for shelter and comfort. It also didn't boggle my mind that a Brown grad (from the picture, the tipster looked like an artistic Brown student rather than the careerist type) would be homeless, given that I've known many classmates with a certain personality, brilliant but also idealistic/uncompromising, that unfortunately made them brittle in a society that rewards conformity, settling, and stability.
I can't get over the fact that two Brown students who have presumably fallen by the wayside of society chose two different paths: (1) the homeless guy who still perseveres, even in the basement of Barus & Holley for 15 years after his 2010 graduation, a la Parasite, yet has the situational awareness and rises to the occasion to give the biggest tip to the Providence Police; (2) the other guy who harbors so much resentment over the course of 25 years that he plans a trip from Florida to gun down innocent kids who are 18 and 19, and a classmate from when they were both 18 and 19 years old.
https://www.fastcompany.com/91463942/sequoia-shaun-maguire-b...
I think it's the biggest response I've personally seen since the Boston Marathon Bombing.
Anyone have the Reddit link? (I wonder why the article doesn't include it)
This is the first model from a major AI research lab (the people behind Qwen Image, which is basically the SOTA open image diffusion model) with those capabilities, afaik.
The difference in timing for this submission (16 hours ago) is because that's when the research/academic paper got released—as opposed to the inference code and model weights, which just got released 5 hours ago.
---
Technically there's another difference, but this mostly matters for people who are interested in AI research or AI training. From their abstract: “[we introduce] a Multi-stage Training strategy to adapt a pretrained image generation model into a multilayer image decomposer,” which seems to imply that you can adapt a current (but different) image model to understand layers, along with a pipeline to obtain the training data from Photoshop .PSD files.
- Paper page: https://huggingface.co/papers/2512.15603
- Model page: https://huggingface.co/Qwen/Qwen-Image-Layered
- Quantized model page: https://huggingface.co/QuantStack/Qwen-Image-Layered-GGUF
- Blog URL: https://qwenlm.github.io/blog/qwen-image-layered/ (404 at the time of writing this comment, but it'll probably release soon)
- GitHub page: https://github.com/QwenLM/Qwen-Image-Layered
If you set, for example, a layer count of 5, will it determine what is on each layer, or do I need to prompt that?
And I assume you need enough VRAM because each layer will be effectively a whole image in pixel or latent space… so if I have a 1MP image, and 5 layers I would likely need to be able to fit a 5MP image in VRAM?
Or can this be done in multiple steps, where I wouldn't need all 5 layers in active VRAM, and the assembly is another step at the end after generating one layer at a time?
L1 cache reference 2,000,000,000 ops/sec
L2 cache reference 333,333,333 ops/sec
Branch mispredict 200,000,000 ops/sec
Mutex lock/unlock (uncontended) 66,666,667 ops/sec
Main memory reference 20,000,000 ops/sec
Compress 1K bytes with Snappy 1,000,000 ops/sec
Read 4KB from SSD 50,000 ops/sec
Round trip within same datacenter 20,000 ops/sec
Read 1MB sequentially from memory 15,625 ops/sec
Read 1MB over 100 Gbps network 10,000 ops/sec
Read 1MB from SSD 1,000 ops/sec
Disk seek 200 ops/sec
Read 1MB sequentially from disk 100 ops/sec
Send packet CA->Netherlands->CA 7 ops/sec
docs/demos: https://beyondloom.com/decker/pdf.html
browsable source: https://github.com/JohnEarnest/Decker/blob/main/examples/dec...
My page layout code was like 50 lines of code. And I remember thinking... OK they already wrote 8,000 lines of code... They couldn't have added 50 more?!
400 lines though. Respect. I will take a proper look at this when I recover from burnout :)
https://github.com/bubkoo/html-to-image
It's probably the most impressive and seamless experience I've had with converting HTML to pdfs/images so I just wanted to sing its praises here
I think if you had a markdown->PDF function included, where I can send in markdown and get a PDF back, that would cover quite a lot of needs and would be useful.
https://doc.rust-lang.org/beta/unstable-book/language-featur...
- Drop does something, like close a file or release a lock, or
- x and y don't have Send and/or Sync, and you have an await point in the function or are doing multi-threaded stuff
This is why you should almost always use std::sync::Mutex rather than tokio::sync::Mutex. std's MutexGuard isn't Send, so the compiler will complain if you hold it across an await in a spawned task. Usually you don't want mutexes held across an await.
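A minimal sketch of what the compiler actually catches (the key detail is that std's MutexGuard is !Send, so a future holding one across .await can't be handed to tokio::spawn, which requires Send):

    use std::sync::Mutex;

    async fn do_io() {}

    // Holding the std guard across an .await makes this future !Send, so
    // tokio::spawn(bad(&m)) would be rejected at compile time -- usually what
    // you want, since parking a blocking mutex across an await invites deadlock.
    async fn bad(m: &Mutex<i32>) {
        let guard = m.lock().unwrap();
        do_io().await;
        println!("{}", *guard);
    }

    // The fix: end the guard's scope before awaiting.
    async fn good(m: &Mutex<i32>) {
        {
            let mut guard = m.lock().unwrap();
            *guard += 1;
        } // guard dropped here
        do_io().await;
    }

    fn main() {
        let m = Mutex::new(0);
        // Futures are lazy; constructing them here just shows both forms compile.
        let _ = (bad(&m), good(&m));
    }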
let mut data = foo(); data.mutate(); let data = data;
May be preferable for short snippets where adding braces, the yielded expression, and indentation is more noise than it's worth.
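For comparison, the block-expression form of that same snippet (with a throwaway Data type just so it compiles):

    struct Data(Vec<i32>);
    impl Data {
        fn mutate(&mut self) {
            self.0.push(1);
        }
    }
    fn foo() -> Data {
        Data(Vec::new())
    }

    fn main() {
        // Shadowing re-bind, as in the parent comment:
        let mut data = foo();
        data.mutate();
        let data = data; // immutable from here on

        // Equivalent block-expression form:
        let data2 = {
            let mut d = foo();
            d.mutate();
            d // the block evaluates to the finished, now-immutable value
        };

        let _ = (data, data2);
    }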
I typically use closures to do this in other languages, but the syntax is always so cumbersome. You get the "dog balls" that Douglas Crockford always called them:
```
const config = (() => {
    const raw_data = ...
    ...
    return compiled;
})();
const result = config.whatever;
// carry on
return result;
```
Really wish blocks were expressions in more languages.
That last example is probably my biggest use of it because I hate having variables being unnecessarily mutable.
Also in Kotlin, Scala, and nim.
The second example "erasure of mutability" makes more sense. But this effectively makes it a Rust-specific pattern.
It barely adds any functionality but it's useful for readability because of the same reasons in the OP.
It helps because I've been bitten by code that did this:
setup_a = some_stuff
setup_b = some_more_stuff
i_think_this_is_setup = even_more_stuff
the_thing = run_setup(setup_a, setup_b, i_think_this_is_setup)
That's all fine until later on when, probably in some obscure loop, `i_think_this_is_setup` is used without you noticing. Instead, doing something like this tells the reader that it will be used again:
i_think_this_is_setup = even_more_stuff
the_thing = begin
setup_a = some_stuff
setup_b = some_more_stuff
run_setup(setup_a, setup_b, i_think_this_is_setup)
end
I now don't have to mentally keep track of what `setup_a` or `setup_b` are anymore and, since the writer made a conscious effort not to put it in the block, you will take an extra look for it in the outer scope.
It's used all throughout the Linux kernel and is useful for macros.
Try this out, you can actually (technically) assign a variable to `continue` like:
let x = continue;
Funnily enough, one of the few things that is definitely always a statement is the `let` statement! Except, you also have `let` expressions, which are technically different, so I guess that's not really a difference at all.
In the example given, I would have preferred to extract to a method - what if I want to load the config from somewhere else? And perhaps the specifics of stripping comments could have been extracted to a more semantically aptly named post-processing method.
I see the argument that when extracted to a function, you don't need to go hunting for it. But if we look at the example with the block, I still see a bunch of detail about how to load the config, and then several lines using it. What's more important in that context - the specifics of loading the config, or the specifics of how requests are formed using the loaded config?
The fact that you need to explain what’s happening with comments is a smell. Properly named variables and methods would obviate the need for the comments and would introduce semantic meaning thru names.
I think blocks are useful when you are referencing a lot of local variables and they have fairly localized meaning within the method. For example, you can write a block to capture a bunch of values for logging context - then you can call that block in every log line to get a logging context based on current method state. It totally beats extracting a logging-context method that consumes many variables and is unlikely to be reused outside of the calling method, and yet you get delayed evaluation and a single point of definition for it.
So yes to the pattern, but needs a better example.
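A tiny sketch of that logging-context idea as a closure (Rust here to match the rest of the thread; all names are made up):

    fn handle_request(user_id: u64, attempt: u32) {
        let items = vec!["a", "b", "c"];

        // Captures the locals it needs; evaluated fresh at each call site, so
        // every log line reflects the current state without a separate method.
        let log_ctx = || format!("user_id={user_id} attempt={attempt} items={}", items.len());

        println!("starting: {}", log_ctx());
        // ... do the actual work ...
        println!("finished: {}", log_ctx());
    }

    fn main() {
        handle_request(42, 1);
    }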
Voluntary use: I know this one. It’s a pattern now.
One of the big benefits of both the single-run (AIGFS) and ensemble (AIGEFS) models is the speed and reduced computation time required. Weather modeling is hard, and these models should be used as complementary to deterministic models, as they all have their own strengths and weaknesses. They run at the same 0.25-degree resolution as the ECMWF AIFS models, which were introduced earlier this year and have been successful[4].
Edit: The Spring 2025 forecasting experiment results are available here[6].
[1] https://www.weatherbell.com/
[2] https://www.youtube.com/watch?v=47HDk2BQMjU
[3] https://www.youtube.com/watch?v=DCQBgU0pPME
[4] https://www.ecmwf.int/en/forecasts/dataset/aifs-machine-lear...
[5] https://www.tropicaltidbits.com/analysis/models/
[6] https://repository.library.noaa.gov/view/noaa/71354/noaa_713...
https://www.nco.ncep.noaa.gov/pmb/products/gens/
https://www.emc.ncep.noaa.gov/emc/pages/numerical_forecast_s...
I understand that aviation safety is certainly a primary concern for NWS/NOAA but ground level forecasts are also very important for public safety.
A quick search didn't turn up anything about the model's skill or resolution, though I'm sure the data exists.
This makes me skeptical that it isn’t just politicized Trumpian nonsense.
My requirements are: suspend/resume, being able to drive a 5K monitor over USB-C, wifi.
I found https://wiki.freebsd.org/Laptops but I don't know how up-to-date it is.
I would love to see a FreeBSD Workstation edition akin to like Fedora or Ubuntu where things just work (mostly).
Wayland took too long. We’re still stuck on Gtk. KDE Plasma team is making moves. I just want a nice, BSD, desktop experience without all the enshitification of copilot or Apple knowing what’s best for me.
It always surprises me that this isn't obvious to everyone. If AI wrote 100% of the code that I do at work, I wouldn't get any more work done because writing the code is usually the easy part.
> Everyone’s heard the line: “AI will write all the code; engineering as you know it is finished... The Bun acquisition blows a hole in that story.”
But what the article actually discusses and demonstrates by the end is that the aspects of engineering beyond writing the code are where the value of human engineers lies at this point. To me that doesn't seem like an example of a revealed preference in this case. If you take it back to the first part of the original quote above, it's just a different wording: AI is the code writer and engineering is something different.
I think what the article really means to drive against is the claim/conclusion "because AI can generate lots of code, we don't need any type of engineer", but that's just not what the quote they chose to set out against is saying. Without changing that claim, the acquisition of Bun is not really a counterexample; Bun had just already changed the way they do engineering, so the AI wrote the code and the engineers did the other things.
Clever pitch. Don't alienate all the people who've hitched their wagons to AI, but push valuing highly-skilled ICs as an actionable leadership insight.
Incidentally, strategy and risk management sound like a pay grade bump may be due.
Technically, there’s still a horse buggy whip market, an abacus market, and probably anything else you think technology consumed. It’s just a minuscule fraction of what it once was.
I don’t know why the acquisition happened, or what the plans are. But it did happen, and for this we don’t have to suspend disbelief. I don’t doubt Anthropic has plans that they would rather not divulge. This isn’t a big stretch of imagination, either.
We will see how things play out, but people are definitely being displaced by AI software doing work, and people are productive with it. I know I am. The user counts of Claude Code, Gemini, and ChatGPT don’t lie, so let’s not kid ourselves.
> Everyone’s heard the line: “AI will write all the code; engineering as you know it is finished.”
Software engineering pre-LLMs will never, ever come back. Lots of folks are not understanding that. What we're doing at the end of 2025 looks so much different than what we were doing at the end of 2024. Engineering as we knew it a year or two ago will never return.
This argument requires us to believe that AI will just asymptote and not get materially better.
Five years from now, I don't think anyone will make these kinds of acquisitions anymore.
First of all, hello Hacker News :)
Many of the comments seem to address the design of key hashing. The reason for using hashed keys inside B-tree nodes instead of the string keys directly is threefold:
1) The implementation is simplified.
2) When performing a lookup, it is faster to compare fixed-sized elements than it is to do variable length string comparison.
3) The key length is unlimited.
I should say the documentation page is out of date regarding hash collisions. The format now supports probing thanks to a PR merged yesterday. So inserting colliding keys will actually work.
It is true that databases and other formats do store string keys directly in the nodes. However as a memory format, runtime performance is very important. There is no disk or IO latency to 'hide behind'.
Right now the hash function used is DJB2. It has the interesting property of somewhat preserving the lexicographical ordering of the key names. So hashes for keys like "item_0001", "item_0002" and "item_0003" are actually more likely to also be placed sequentially inside the B-tree nodes. This can be useful when doing a sequential scan on the semantic key names, otherwise you are doing a lot more random access. Also DJB2 is so simple that it can be calculated entirely by the C preprocessor at compile time, so you are not actually paying the runtime cost of hashing.
We will be doing a lot more testing before DJB2 is finalized in the spec, but might later end up with a 'better' hash function such as XXH32.
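For reference, the classic DJB2 loop in plain C (the well-known additive variant; the spec may pin down details like width and seed differently):

    #include <stdio.h>

    /* DJB2: hash = hash * 33 + c, seeded with 5381. */
    static unsigned long djb2(const unsigned char *s) {
        unsigned long hash = 5381;
        int c;
        while ((c = *s++) != 0)
            hash = ((hash << 5) + hash) + (unsigned long)c; /* hash * 33 + c */
        return hash;
    }

    int main(void) {
        /* Keys differing only in the last character hash to adjacent values,
           which is the ordering property described above. */
        printf("%lu\n", djb2((const unsigned char *)"item_0001"));
        printf("%lu\n", djb2((const unsigned char *)"item_0002"));
        return 0;
    }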
Finally, TRON/Lite³ compared to other binary JSON formats (BSON, MsgPack, CBOR, Amazon Ion) is different in that:
1) none of the formats mentioned provide direct zero-copy indexed access to the data
2) none of the formats mentioned allow for partial mutation of the data without rewriting most of the document
This last point 2) is especially significant. For example, JSONB in Postgres is immutable. When replacing or inserting one specific value inside an object or array, with JSONB you will rewrite the entire document as a result of this, even if it is several megabytes large. If you are performing frequent updates inside JSONB documents, this will cause severe write amplification. This is the case for all current Postgres versions.
TRON/Lite³ is designed to blur the line between memory and serialization format.
Perhaps I should have posted this URI instead: https://lite3.io/design_and_limitations.html
Lite^3 deserves to be noticed by HN. u/eliasdejong (the author) posted it 23 days ago but it didn't get very far. I'm hoping this time it gets noticed.
Apache Arrow is trying to do something similar, using Flatbuffer to serialize with zero-copy and zero-parse semantics, and an index structure built on top of that.
Would love to see comparisons with Arrow
The overridden space is never recovered, causing buffer size to grow indefinitely.
Is the garbage at least zeroed? Otherwise it seems like it could "leak" overwritten values when sending whole buffers via memcpy.
Don't get me wrong, I find this type of data structure interesting and useful, but it's misleading to call it "serialization", unless my understanding is wrong.
It's just dishonest.
It prompted Laurenz to submit the documentation patch that is cited in the article. In the discussion of the patch itself, people seem to conclude that it's a good improvement to the docs, but that the behaviour itself is a bit of a footgun. [2]
[1]: https://stackoverflow.com/questions/73951604/autovacuum-and-...
[2]: https://www.postgresql.org/message-id/Y8cQJIMFAe7QT73/%40mom...
(plus an interesting discussion in the comments of that post on how the query planner chose a certain row estimate in the specific case that Laurenz shared!)
The other thing I'll add is that we still haven't figured out:
1. An optimal ANALYZE schedule here on parent partitions; we're opting to over-analyze rather than under-analyze at the moment, because it seems like our query distribution might change quite often.
2. Whether double-partitioned tables (we have some tables partitioned by time series first, and an enum value second) need analyze on the intermediate tables, or whether the top-level parent and bottom-level child tables are enough. So far just the top-level and leaf tables seem good enough.
> The model will respond with a JSON object that strictly follows your schema
Gemini is listed as a model supporting structured output, and yet its fail rate is 0.39% (Gemini 2.0 Flash)!! I get that structured output has a high performance cost, but advertising it as supported when in reality it's not is a massive red flag.
Worse yet, response healing only fixes JSON syntax errors, not schema adherence. This is only mentioned at the end of the article, which people are clearly not going to read.
WTF
If part of my system can't even manage to output JSON reliably, it needs way more "healing" than syntax munging. This comes across as naive.
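For what it's worth, checking schema adherence (not just syntax) and re-prompting is cheap to bolt on; a rough sketch assuming the jsonschema package and a caller-supplied call_model function:

    import json
    import jsonschema

    def structured_call(call_model, prompt: str, schema: dict, max_retries: int = 2) -> dict:
        """Ask the model for JSON, then verify it against the schema, not just the parser."""
        last_error = None
        for _ in range(max_retries + 1):
            request = prompt if last_error is None else (
                f"{prompt}\n\nYour previous reply was rejected: {last_error}\n"
                "Return only JSON that matches the schema."
            )
            raw = call_model(request)
            try:
                obj = json.loads(raw)             # syntax
                jsonschema.validate(obj, schema)  # schema adherence
                return obj
            except (json.JSONDecodeError, jsonschema.ValidationError) as e:
                last_error = str(e)
        raise ValueError(f"No schema-valid JSON after retries: {last_error}")

Whether retrying is acceptable depends on your cost and latency budget, but at least the failure gets detected instead of silently passed downstream.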
Isn't this exactly how we got weird html parsing logic in the first place, with "autohealing" logic for mismatched closing tags or quotes?
I don't like this future we're going towards where we have to trick our software (which we can no longer understand the workings of) into doing what we tell it to by asking it nicely, or by putting another black box on the end to "fix" the output. This is the opposite of engineering. This is negotiation with a genie trapped in silicon.
The content of your posts is really insightful and interesting, but it feels like junk quality because of the way LLMs write blog posts.
What was your prompt?
Maybe people got used to computers being unreliable and unpredictable as the UIs we shipped became more distracting, less learnable, always shifting and hiding information, popping up suggestions and displaying non-deterministic-seeming behavior. We trained users to treat their devices like unruly animals that they can never quite trust. So now the idea of a machine that embodies a more clever (but still unreliable) animal to wrangle sounds like a clear upgrade.
But as someone who's spent an inordinate amount of time tweaking and tuning his computing environment to prune out flakey components and fine-tune bindings and navigation, the idea of integrating a tool into my workflow that does amazing things but fails utterly even 1% of the time sounds like a nightmare, a sort of perpetual torture of low-grade anxiety.
If we consider that the real majors move about 400k-500k passengers/day, let's be really optimistic and say that passengers check their booking 6 times a day for the week before they fly. That's around 250 requests/sec.
Anyone know about the consumer facing tech stacks at airlines these days? Seems unlikely that they'd have databases that would auto scale 400x...
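The back-of-the-envelope math behind that figure, using the same optimistic assumptions:

    passengers_per_day = 500_000   # upper end of the 400k-500k range
    window_days = 7                # checking during the week before the flight
    checks_per_day = 6

    requests_per_day = passengers_per_day * window_days * checks_per_day
    print(requests_per_day / 86_400)  # ~243 requests/sec, i.e. roughly 250/sec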
Sounds like no bug bounty?
It's great if OP is happy with the outcome, but it's so infuriating that companies are allowed to leak everyone's data with zero accountability and rely on the kindness of security researchers to do free work to notify them.
I wish there was a law that assigned a dollar value to different types of PII leaks and fined the organization that amount with some percentage going to the whistleblower. So a security researcher could approach a vendor and say, "Hi! I discovered vulnerabilities in your system that would result in a $500k fine for you. For $400k, I'll disclose it to you privately, or you can turn me down and I'll receive $250k from your fines."
The space of all possible PRLs is about 2 billion; I can imagine a really big airline moving that many passengers.
The "issue" is that they're returning the entire PNR dataset to the front-end in the first place. He doesn't detail how they fixed it, but there's no reason in the world that this entire dataset should be dumped into Javascript. I got into pretty heated arguments with folks about this at Travelocity and this shit is exactly why I was so adamant.
(unfortunately, I feel like AI was overused in authoring the writeup)
From the "Medical Evidence" section, it seems I'm not missing much.
Edit - a couple of other things possibly helped around the same time, so I'm not sure if I ever isolated the effect of breathing. But it definitely felt like it was a significant part of it.
I think the major part of what makes it useful is just adding resistance for breathing. It helps to train the breathing muscles, just like any other resistance training.
I’m also sold on his take on "vibe coding" leading to ephemeral software; the idea of spinning up a custom, one-off tokenizer or app just to debug a single issue, and then deleting it, feels like a real shift.
The idea of jaggedicity seems useful to advancing epistemology. If we could identify the domains that have useful data that we fail to extract, we could fill those holes and eventually become a general intelligence ourselves. The task may be as hard as making a list of your blind spots. But now we have an alien intelligence with an outside perspective. While making AI less jagged it might return the favor.
If we keep inventing different kinds of intelligence the sum of the splats may eventually become well rounded.
What is he referring to here? Is nano banana not just an image gen model? Is it because it's an LLM-based one, and not diffusion?
Karpathy hints at one major capability unlock being UI generation, so instead of interacting with text the AI can present different interfaces depending on the kind of problem. That seems like a severely underexplored problem domain so far. Who are the key figures innovating in this space so far?
In the most recent Demis interview, he suggests that one of the key problems that must be solved is online / continuous learning.
Aside from that, another major issue is probably reducing hallucinations and increasing reliability. Ideally you should be able to deploy an LLM to work on a problem domain, and if it encounters an unexpected scenario it reaches out to you in order to figure out what to do. But for standard problems it should function reliably 100% of the time.
“Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu.”
This is really fascinating. As someone who reads a lot of history and historical fiction I think this is really intriguing. Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.
Hell yeah, sold, let’s go…
> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
Oh. By “imagine you could interview…” they didn’t mean me.
Einstein’s paper “On the Electrodynamics of Moving Bodies” introducing special relativity was published in 1905. His work on general relativity was published 10 years later, in 1915. The earliest knowledge cutoff of these models is 1913, in between the relativity papers.
The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.
Yes!
>We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
Noooooo!
So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?
Playing with the science and technical ideas of the time would be amazing - like where you know some later physicist found an exception to a theory, questioning the model's assumptions and seeing how a model of that time might defend itself, etc.
On one hand it says it's trained on,
> 80B tokens of historical data up to knowledge-cutoffs ∈ 1913, 1929, 1933, 1939, 1946, using a curated dataset of 600B tokens of time-stamped text.
Literally that includes Homer, the oldest Chinese texts, Sanskrit, Egyptian, etc., up to 1913. Even if limited to European texts (all examples are about Europe), it would include the ancient Greeks, Romans, etc., Scholastics, Charlemagne, .... all up to present day.
But on the other hand, they seem to say it represents the 1913 viewpoint; for example,
> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.
> When you ask Ranke-4B-1913 about "the gravest dangers to peace," it responds from the perspective of 1913—identifying Balkan tensions or Austro-German ambitions—because that's what the newspapers and books from the period up to 1913 discussed.
People in 1913 of course would be heavily biased toward recent information. Otherwise, the greatest threat to peace might be Hannibal or Napoleon or Viking coastal raids or holy wars. How do they accomplish a 1913 perspective?
You can’t; it is impossible. That will always be an issue as long as these models are black boxes and trained the way they are. So maybe you can use this for role playing, but I wouldn’t trust a word it says.
For example prompt the 1913 model to try and “Invent a new theory of gravity that doesn’t conflict with special relativity”
Would it be able to eventually get to GR? If not, could finding out why not illuminate important weaknesses.
Of course, if it fails, the counterpoint will be "you just need more training data", but still - I would love to play with this.
We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).
So they are chat tuning; I wonder what “minimizing interference with normative judgements” really amounts to and how objective it is.
“The model clearly shows that Alexander Hamilton & Monroe were much more in agreement on topic X, putting the common textualist interpretation of it and Supreme Court rulings on a now specious interpretation null and void!”
Given this is coming out of Zurich I hope they're using everything, but for now I can only assume.
Still, I'm extremely excited to see this project come to fruition!
But reading the outputs here, it would appear that quality has won out over quantity after all!
Because it will perform token completion driven by weights coming from training data newer than 1913 with no way to turn that off.
It can't be asked to pretend that it wasn't trained on documents that didn't exist in 1913.
The LLM cannot reprogram its own weights to remove the influence of selected materials; that kind of introspection is not there.
Not to mention that many documents are either undated, or carry secondary dates, like the dates of their own creation rather than the creation of the ideas they contain.
Human minds don't have a time stamp on everything they know, either. If I ask someone, "talk to me using nothing but the vocabulary you knew on your fifteenth birthday", they couldn't do it. Either they would comply by using some ridiculously conservative vocabulary of words that a five-year-old would know, or else they will accidentally use words they didn't in fact know at fifteen. For some words you know where you got them from by association with learning events. Others, you don't remember; they are not attached to a time.
Or: solve this problem using nothing but the knowledge and skills you had on January 1st, 2001.
> GPT-5 knows how the story ends
No, it doesn't. It has no concept of story. GPT-5 is built on texts which contain the story ending, and GPT-5 cannot refrain from predicting tokens across those texts due to their imprint in its weights. That's all there is to it.
The LLM doesn't know an ass from a hole in the ground. If there are texts which discuss and distinguish asses from holes in the ground, it can write similar texts, which look like the work of someone learned in the area of asses and holes in the ground. Writing similar texts is not knowing and understanding.
Imagine speaking with a person from Shakespeare's time, or with Mickiewicz (for Polish).
I guess there is not so much text from that time though...
Really good point that I don't think I would've considered on my own. Easy to take for granted how easy it is to share information (for better or worse) now, but pre-1913 there were far more structural and societal barriers to doing the same.
It would be nice to go back substantially further, though you don't have to go too far back before the commoner becomes voiceless in history and we just get a bunch of politics and academia. Great job; I look forward to testing it out.
Also wonder if I'm responsible enough to have access to such a model...
It would be fascinating to try it with other constraints, like only from sources known to be women, men, Christian, Muslim, young, old, etc.
I don't mind the experimentation. I'm curious about where someone has found an application of it.
What is the value of such a broad, generic viewpoint? What does it represent? What is it evidence of? The answer to both seems to be 'nothing'.
But few know that the Renaissance was written in Latin — and has barely been translated. Less than 3% of <1700 books have been translated—and less than 30% have ever been scanned.
I’m working on a project to change that. Research blog at www.SecondRenaissance.ai — we are starting by scanning and translating thousands of books at the Embassy of the Free Mind in Amsterdam, a UNESCO-recognized rare book library.
We want to make ancient texts accessible to people and AI.
If this work resonates with you, please do reach out: Derek@ancientwisdomtrust.org
Can't wait to use this so I can double check before I hit 88 miles per hour that it's really what I want to do
I’d love to use this as a base for a math model. Let’s see how far it can get through the last 100 years of solved problems
The idea of training such a model is really a great one, but not releasing it because someone might be offended by the output is just stupid beyond belief.
> Our data comes from more than 20 open-source datasets of historical books and newspapers. ... We currently do not deduplicate the data. The reason is that if documents show up in multiple datasets, they also had greater circulation historically. By leaving these duplicates in the data, we expect the model will be more strongly influenced by documents of greater historical importance.
I found these claims contradictory. Many books that modern readers consider historically significant had only niche circulation at the time of publishing. A quick inquiry likely points to later works by Nietzsche and to Marx's Das Kapital. They're likely subject to this duplication, which would influence the model's responses as if those works had been widely known at the time.
I'd love to see the output from different models trained on pre-1905 data about special/general relativity ideas. It would be interesting to see what kind of evidence would persuade them of new kinds of science, or to see if you could have them 'prove' it by devising experiments and then giving them simulated data from the experiments to lead them along the correct sequence of steps to come to a novel (to them) conclusion.
"Give me an LLM from 1928."
etc.
Moreover, the prose sounds too modern. It seems the base model was trained on a contemporary corpus. Like 30% something modern, 70% Victorian content.
Even with half a dozen samples it doesn't seem distinct enough to represent the era they claim.
You could RAG-feed this model the facts of WWII, and it would technically "know" about Hitler. But it wouldn't share the modern sentiment or gravity. In its latent space, the vector for "Hitler" has no semantic proximity to "Evil".
Provide it with the closed captions and other timestamped data like scenes and character summaries (all that is currently known but no more) up to the current time, and it won't reveal any spoilers, just fill you in on what you didn't pick up or remember.
It makes me think of the Book Of Ember, the possibility of chopping things out very deliberately. Maybe creating something that could wonder at its own existence, discovering well beyond what it could know. And then of course forgetting it immediately, which is also a well-worn trope in speculative fiction.
There is just not enough available material from previous decades to trust that the LLM will learn to a comparable degree.
Think about it this way, a human in the early 1900s and today are pretty much the same but just in different environments with different information.
An LLM trained on 1/1000 the amount of data is just at a fundamentally different stage of convergence.
How can this thing possibly be even remotely coherent with just fine tuning amounts of data used for pretraining?
May be too small a corpus, but I would like that very much anyhow
“You are a literary rake. Write a story about an unchaperoned lady whose ankle you glimpse.”
oh COME ON... "AI safety" is getting out of hand.
Was looking at modifying outgoing requests via proxy and wondering whether that's harming caching. Common coding tools presumably have a shared prompt across all their installs so universal cache would save a lot
So if I were running a provider I would be caching popular prefixes for questions across all users. There must be so many questions that start 'what is' or 'who was' etc?
Also, can subsequences in the prompt be cached and reused? Or is it only prefixes? I mean, can you cache popular phrases that might appear in the middle of the prompt and reuse that somehow rather than needing to iterate through them token by token? E.g. must be lots of times that "and then tell me what" appears in the middle of a prompt?
It's a pain having to tell Copilot "Open in pages mode" each time it's launched, and then after processing a batch of files run into:
https://old.reddit.com/r/Copilot/comments/1po2cuf/daily_limi...
Even just moving it to the bottom helped move a lot of our usage into cache.
Probably went from something like 30-50% cached tokens to 50-70%.
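For anyone trying the same thing: the whole trick is keeping the big, stable parts of the prompt as an exact byte-identical prefix and pushing anything volatile (timestamps, retrieved context, user data) to the end; a sketch with an OpenAI-style messages list (the constants and field layout are illustrative):

    from datetime import datetime, timezone

    # Large, rarely-changing parts of the prompt; identical bytes on every request.
    SYSTEM_PROMPT = "..."       # instructions, style guide, etc.
    TOOL_DESCRIPTIONS = "..."   # tool/function docs

    def build_messages(user_question: str, retrieved_docs: list[str]) -> list[dict]:
        # Stable prefix first -> eligible for the provider's prefix cache.
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "system", "content": TOOL_DESCRIPTIONS},
        ]
        # Volatile content last, so it only invalidates the tail of the prompt.
        messages.append({
            "role": "user",
            "content": (
                "\n\n".join(retrieved_docs)
                + f"\n\nCurrent time: {datetime.now(timezone.utc).isoformat()}"
                + f"\n\nQuestion: {user_question}"
            ),
        })
        return messages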
https://t3.chat/share/j2tnfwwful https://t3.chat/share/k1xhgisrw1
[see https://news.ycombinator.com/item?id=45988611 for explanation]