Architectural choices are interesting to talk about, but I think most people reading this won't have any context to compare against, me included. How does this compare to e.g. the architecture of V8? What benefits do these choices give when compared against other engines? Etc, reading through the list it's easy to nod along, but it's hard to actually have an intuition about whether these are good choices or not.
It reads like an experimental approach because someone decided to will it into existence. That and to see if they can achieve better performance because of the architectural choices.
> Luckily, we do have an idea, a new spin on the ECMAScript specification. The starting point is data-oriented design (...)
> So, when you read a cache line you should aim for the entire cache line to be used. The best data structure in the world, bar none, is the humble vector (...)
> So what we want to explore is then: What sort of an engine do you get when almost everything is a vector or an index into a vector, and data structures are optimised for cache line usage? Join us in finding out (...)
The impetus for the engine design is indeed, as you say, "someone decided to will it into existence."
A friend of mine who works in the gaming industry told me about the Entity Component System architecture and I thought: Hey, wouldn't that work for a JavaScript engine? So I decided to find out.
Nova itself has already been created at that point and I was part of the project, but it was little more than a README. I then started to push it towards my vision, and the rest is not-quite-history.
A friend of mine who works in the gaming industry told me about the Entity Component System architecture and I thought: Hey, wouldn't that work for a JavaScript engine?
That was the first thing I thought of when I saw your description. But the reason ECS works well is cache coherence. (Why) would a general-purpose runtime environment like a JS engine benefit from ECS? Or alternatively, have you seen performance improvements as a result?
I guess the opposite could also be asked: Why would a game benefit from ECS? A player in the game can do basically anything, there's no guarantee that things are always perfectly accessed in a linear order.
It comes down to statistics: Large data sets in a general-purpose runtime environment are still created through parsing or looping, and they are consumed by looping. A human can manually create small data sets of entirely heterogenous data, but anything more than a 100 items is already unlikely.
Finally, the garbage collector is a kind of "System" in the ECS sense. So even if the JavaScript code has managed to create very nonlinear data sets, the garbage collector will still enjoy benefits. (Tracing the data is still "pointer chasing" but when tracing we don't need to trace in the data order but can instead gather a collection of heap references we've seen, sort them in order and then trace them.)
> Why would a game benefit from ECS? A player in the game can do basically anything, there's no guarantee that things are always perfectly accessed in a linear order.
There's actually a guarantee that things are mostly going to be accessed in a linear order because player actions don't matter to the execution of the simulation. The whole simulation is run at 1/FPS intervals across the whole set of entities, regardless of player input (or lack thereof).
In an ECS the whole World is run by Systems, which operate on Components. This is why cache locality works there: when the Movement System is acting, it's operating on the Position Component for all (or at least many) Entities, so linear array access pattern is very favorable. Any other component in your cache is going to be unused until the next system runs (and then the Position Component will become the useless data in cache). That's why you'd rather have an array of Components in cache instead of an array of Entities.
This access pattern is very suitable for games because the simulation is running continuously in an infinite loop (the game loop) consisting of even more loops (the Systems running), but not so much for general purpose computation where access patterns are mostly random. (EDIT: or rather, local to each "entity".)
It is very true that a general purpose computation can theoretically do anything and mess linearity of access patterns entirely up. But in practice programs do most of their work in very linear fashion. It's not by chance that eg. V8 will try to write objects parsed from a JSON array of objects one right after the other. So in a sense we can say that the JavaScript program itself becomes the System with a capital S.
That is not to say that Nova's heap vectors will necessarily make sense: The two big possible stumbling blocks are 1) growing of heap vectors possibly taking too long, and 2) compacting of heap vectors during GC taking too long.
The first point basically comes down to the fact that, at present, each heap vector is truly a single Rust Vec. When it can no longer fit all the heap data into it, it needs to reallocate. Imagine you have 2 billion ordinary objects, and suddenly the ordinary objects vector needs to reallocate: This will cause horrible stalls in the VM. This can be mitigated at the cost of splitting each heap vector into chunks, but this of course comes at the cost of an extra indirection and some lack of linearity in the memory layout.
The second point is more or less a repeat of the first: Imagine you have 2 billion ordinary objects, and suddenly a single one at the beginning of the vector is removed by GC: The GC has to now move every object remaining in the vector down a step to make the vector dense again. This is something that I cannot really do anything about: I can make this less frequent by introducing a "minor GC" but eventually a "major GC" must happen and something like this can then be experienced. I can only hope that this sort of things are rare.
The alternative would be to do a "swap to tail", so the last item in the vector is moved to take the removed item's place. But that then means that linear access is no longer guaranteed. It also plays havoc on how our GC is implemented but that's kind of a side point.
Software engineering is architecture is full of trade-offs :) I'm just hoping that the ones we've made will prove to make sense.
> It's not by chance that eg. V8 will try to write objects parsed from a JSON array of objects one right after the other.
Yes, but note this is still a different pattern of access (array of "entities"). V8 does this because it assumes that e.g. `foo.name` is very likely going to be accessed along with `foo.lastName` (which is likely the 99% case for general computing) whereas ECS assumes `foo.name` is very likely going to be accessed along with `foo2.name`, `foo3.name`, ..., `fooN.name` (which is the 99% case for videogame timestep loops).
> Software engineering is architecture is full of trade-offs :) I'm just hoping that the ones we've made will prove to make sense.
To clarify: my comment is not a criticism of Nova's design decisions. I was only trying to clarify the answer to "Why would a game benefit from ECS?" for those foreign to ECS's existential motive.
I'm sure Nova's tradeoffs make sense for some workloads and I wish you the best!
> Yes, but note this is still a different pattern of access (array of "entities").
I was referring to the `[foo, foo2, foo3]` objects themselves; V8 does use an "cache local" placement for those so you'll find them laid out in memory as:
For what it's worth, I am interested in laying object properties out in an ECS like manner in Nova, so the properties would be laid out as `[foo.name, foo2.name, foo3.name, ...]`, but currently the properties are laid out similarly to V8, `[foo.name, foo.lastName]`. The only difference is that we do not have "in object properties".
That being said: I am obviously biased, but I do wonder if an ECS-like layout wouldn't be nearly universally beneficial. Thinking of the `foo.name` and `foo.lastName` access: If those are on the same cache line then accessing the two only reads one cache line. This is nice. But if there are more properties in the objects (and there often are), then those will pollute the cache. If you do this access once, it doesn't matter. If you do this a million times, now the cache pollution becomes a real issue: In Node.js even the optimal case would be that you read read 625,000 cache lines worth of data, only to discard 250,000 cache lines of it.
If instead we use an ECS-like layout, then accessing these two properties reads two 10100cache lines: That's bad, but on the other hand if this happens once then it won't even make a blip on the screen. If a million of these accesses are done, you could think that we'd suddenly be slow as molasses but now the ECS-like layout is probably going to help you: You're more likely reading the next `name` and `lastName` property values on each access. If you have it bad and only half of the property data you read is actually the `name` and `lastName` properties you want, then you read 750,000 cache lines and lose out to the traditional engine by 100,000 cache lines. If you get 67% "hit rate" then you break even. And that's comparing to the case where the objects only contain `name` and `lastName` and nothing more.
It of course all comes down to statistics but... I'm very interested in the potential benefits here :)
Again, thank you for your comments, I've enjoyed discussing and pondering this <3
That would be one way; it would offer the best theoretical memory layout for well-behaved programs. But, I expect it to be very painful to work with and to come with some performance penalties in mixed use cases due to the extra indirection required.
No, my thinking is that properties would be stored into tables based on their size class: All objects that have 4-7 properties are in the same table, and all of their first property would be in the same slice, second property in another etc.
That sounds to me like you'll end up getting little benefit from ECS then. Let's say the JavaScript program is iterating over a hundred thousand instances of some Foo class which happens to have 6 properties. You'd ideally want good use of cache, but if your object vector that has all of the Foo objects also happens to have all sorts of instances of other types that have the same field count mixed in there, then you're going to spend a lot of time skipping over those unrelated objects and refreshing the cache.
I know that ECS is treated as a silver bullet by a lot of people, but my experience is that it really only works well when the data you're working with is statically typed so that you can actually partition into arrays where each array does represent a single meaningful type.
That is definitely possible: If it turns out to not make sense then I can at least always go back go the current system where all properties of an object as stored in a single array.
It's not as ECS'ssy as one would hope but it's at least proven technology :D
Virtual alloc your vectors so you can add more backing memory without having to modify the addresses of existing items. Compaction can reap only empty pages but you’ll still need some moving to avoid over fragmentation.
Yeah, virtual alloc for the Vec backing memory is something I hope to do _one day_. It's not a very pressing concern however, as it requires going much lower in the stack.
I would have to keep a free list per vector, and there's a lot of those, and it would defeat the point of keeping data packed and temporally colocated: I want to ensure that data was created together stays together and only ever gets more compact as it grows older.
Filling in empty slots would mean that likely unrelated data comes and pollutes the cache for the old data :(
I get your point but a few gaps here and there likely don't matter at all for performance. At least it's a lot better than making everything super compact all the time. Assuming you are splitting vectors at some points into chunks: In such a world you could choose to get rid of chunks with a lot of gaps and move the remaining entries into other chunks. At that point you really have a regular GC.
And the free list could be stored in the vector itself. E.g. if an entry is empty it would store the pointer to the next free entry. So all you need is a single head/tail index for each vector.
I also wonder how you handle pointers which could point into one of many vectors. E.g. a field could easily point either to an object or an array. Do you plan to pack this vector id into the 32-bit value? If so wouldn't there be a lot of dispatch like this as well:
A few gaps won't matter, and that to me speaks of a split between major and minor GC making sense. However, I'm not really sold on that meaning a free list making sense. For one, if I split the heap vectors into single value parts, then holding free slot data in any of them will become somewhat complicated. Hence at least for the foreseeable future I'm 100% in on the compacted heap vectors idea :) Time will tell if the aggressive compacting makes sense or not.
Our JavaScript Values are the full 8 bytes (yes, this is large and it pains me, but it does give us all integers on stack, most doubles on stack, and up to 7 bytes of string data on stack), so a field that can point to any kind of object stores a byte tag and a u32 index. I might pack this down to 1+3 bytes or so, at the cost of supporting smaller maximum number of objects in the engine. JS Value itself would still probably remain 8 bytes because of the stack data benefits.
There is indeed dynamic dispatching through match statements, though it generally happens at the specification method level. Eg. A specification method to get a property from an object will match on the tag and then dispatch to a concrete method with the index as parameter. The indexes are typed as well, so from this point on we statically know we're dealing with eg. an Array.
So there is dynamic dispatch yes, but we try to eliminate it at the earliest opportunity. We probably still have more of that than a traditional engine would have though: A traditional engine will keep the tag on the heap and there is some dynamic dispatch done based on that, but at least your data lookup isn't based on dispatch.
I think you're on to something (important). Decomposing structs into separate arrays (heaps) is becoming a thing. eg Rust and others are introducing language features to manually (explicitly) do so. It could be cool if the runtime just handled it.
I stumbled across a new research language with new syntax for just this purpose, to better express iteration and lambdas. IIRC.
Sorry, I was looking for something else (got nerdsniped by u/hinkley's mention of Erlang's "set-theoric types" ), and didn't bookmark it. If I find it again, I'll forward the link.
Maybe someone else here knows what I'm talking about.
V8 could probably implement the backing object "trick" with some trouble. I'm half-hoping that Nova will show it to be worth their while and that they will eventually do it. It will be a major refactoring of the engine, however.
The heap vector "trick" is basically impossible, I believe. It wouldn't be a refactoring so much as it would be a complete rewrite of the engine. The entirety of V8 assumes it deals in pointers, and all of that would need to change to using indexes instead. I will eat my hat if they do it. Without heap vectors they can still split object data apart using pointer-keyed hash maps, so maybe they could take advantage of some of the ideas still.
V8 does offer ways to run code without optimisations, which we can use for a more apples-to-apples comparison. The most important optimisation that Nova really needs before any big performance comparisons become meaningful is property access inline caching, which requires implementing object shapes.
I'd say that once object shapes are done, then limited performance comparisons can probably be made, especially if V8's JIT is disabled.
Yup, and to a degree the whole "heap references as indexes" idea was inspired by pointer compression. Not in a direct sense of "hey, look at that, what if I took it a step further?" but as I was thinking of the indexes I realised that it looks a lot like pointer compression, and that made me think it is a viable idea.
Not really: In my daydreams Nova becomes the premier JS engine in the world and takes the crown from V8. If V8 went all in and basically just copied all of Nova... I'd probably still develop Nova, as I don't want to work with C++ that much.
If V8 copied all of Nova AND adopted Rust, I might consider laying Nova to rest and going into V8 development. But I'd probably also be really angry at V8 just taking all of Nova's good ideas and peddling them off as their own without crediting Nova. So probably I'd still keep developing Nova while stewing in my anger and inability to do anything about it :)
I hope Nova can be a spark that ignites the JavaScript world into a bit of a renaissance with some of its ideas, but the point is not to burn bright and burn out. The point is to burn bright and stay lit.
Optimal memory layout isn't something you can know in advance. The optimal way to lay out objects in memory is exactly in the order that they will be accessed, but the runtime doesn't know that until after the user program has accessed them.
And if the program accesses a set of objects in different orders at different times, there is no one optimal layout.
I did ask Lars Bak once if they spent a lot of time thinking about cache usage and organizing objects in memory to take best advantage of it and, if I recall correctly, his answer was basically "no". They definitely think about it terms of packing objects into small amounts of memory. But in a dynamically typed language like JavaScript where every property is a reference to some other object elsewhere in memory, using the cache well is just profoundly hard.
Hell, it's hard even in Java where at least you do know the set of fields any given class has.
Isn't data-oriented design about organizing the data in a way that reflects the most common access patterns of the program?
The approach of placing all numbers in a big number vector, all Arrays into a big Array vector, and so on, would be "data-oriented design" if it actually reflects the most common access patterns. So, is it the case that when you read a number you also want all those other numbers that come together with it in the cache line? Is that the case for Arrays? For DataViews? In other words, does this approach to allocating memory reflect the most common data access patterns in JavaScript programs?
I'm not saying it's a bad approach, and I'm not even trying to imply that it's not DOD, I'm genuinely asking.
Edit: maybe a better question is: does it reflect the most common data access patterns of a JavaScript Engine?
Excellent question: In a theoretical sense the answer would be that we cannot know since it depends on the JavaScript being run. But: In practice I think that is indeed the case. Especially for the more common an object is, the likelier it is that it is used in conjunction with others around it. At the same time, the more important their memory placement becomes.
eg. Say you have a JS programs that has about a 100 DataViews: I'd say it's unlikely these are used in conjunction with others very often, but they're also only a small part of the program, so their placement is mostly whatever.
Now what if that number is a million instead? Now I'm betting they're mostly all created together, used together, and that their placement is critical to the program's performance.
So, I'm betting that making random memory access performance worse while guaranteeing that data created together stays together and improving linear memory performance will be an overall win.
Whether this is true data-oriented design is then in the eye of the beholder: Maybe someone will think I'm wrong, my assumptions are wrong, and I'm thus not doing things in a data-oriented way.
That's a good point. The "Internals of Nova" blog posts do a bit more explicit comparisons to V8.
In V8, and other production engines AFAIK, objects are variable-sized monoliths: All of their statically known data is contained in one slab. This means that for example in Node.js an empty ArrayBuffer is 96 bytes in size (IIRC).
Basically, they implement the ECMASCript specification defined inheritance chain using object-oriented class inheritance.
1. All data in V8 is allocated into one of many heap parts: Usually new data goes into a nursery space, and if it does not get GC'd it moves to the old space. Relative position of data isn't really guaranteed at this point.
2. All heap references in V8 are true pointers or, if pointer compression is used, offsets from the heap base.
3. All objects in V8 include all the data needed for them to act as objects, and all of their data is stored in a single allocation (with the exception of properties, with some exceptions). The more specialised an object is, say an ArrayBuffer, Uint8Array, or a DataView, the bigger it has to be as the specialisation requires more data to be stored.
This is a great idea! I had thought about doing this with a lisp interpreter. I had identified a few key advantages:
- homogenous allocation means no alignment gaps
- linear access win in garbage collection
- indices smaller than pointers
- type discriminated index can save some size
I haven’t verified whether those actually work out in the details. I’ll read your blog article.
Don’t bother with these comments immediately comparing it to V8 (a multi billion dollar venture). I don’t know how many creative projects they’ve done before.
You may be be interested in looking at Fabrice Bellard’s JS engine for ideas.
I actually made a Lisp interpreter in Zig a couple of years ago that has each object type in a separate heap array. In fact each field of each object type has its own array: every CDR is in one contiguous array. This was mostly for fun and to experiment with data-driven techniques using Zig metaprogramming. The code turned out relatively clean and simple.
It also has delimited continuation control, compiles to WebAssembly, and hooks promises into the continuation system, among some other pretty cool features!
Well I'll be damned! That sounds very much like what I want Nova to eventually be :) We don't have fields split apart at present, mostly because Rust doesn't make that quite as easy as I would want to. Otherwise it sounds like it's very much all the same, in a good way.
I'll definitely be taking a look at wisp, thank you very much for the link! If you ever have the time, I'd love seeing a comparison of this sort of engine design against a more traditional one.
I've flipped that idea around in a few of my own language designs, where pairs are the central feature and lists are just pairs with pair cdrs. Works fine from what I can see.
Oh yeah, continuation pointers also have their own array like every other field kind, which should have similar benefits as list traversal but for continuation copying... It's a really interesting design area, I think. Zig makes it easy!
That's how I got interested in this kind of memory layout in the first place. I wanted a nice Lisp for WebAssembly and had recently gotten into Zig. When I started defining the word structure I remembered Andy Kelley's talk about using data-oriented design to make the Zig compiler fast, so I thought I'd try it, and the more I thought about it the more reasonable it seemed.
There are like a dozen object types with different growing multiarrays. Words are 32 bit with 1 for GC state and 27 for index and the rest are the type tag. Ints are 28 bits. Byte arrays have their own heap too, as well as general 32 bit vectors.
Thank you for the encouragement! Avoiding alignment gaps is indeed pretty great: I have a vision of packing Arrays into 9 bytes split over two or three cache lines.
On typed indexes: If we accept only about 2^24 possible index values then we could use a 32 bit integer for our Values, or at least for Objects (if we want to keep 7 bytes worth of stack data, which is pretty hard to pass on).
I love the comments comparing Nova to V8: That's what I want to aim for after all :) I'm not sure I've heard of Fabrice Bellard's JS engine, thanks, I'll take a look!
Isn't data oriented design driven by knowing what your data accesses look like? In your engine, you're building as if you're assuming that common data access will be linear access over objects of the same type. Why?
Yeah, know your data and how it is used. I assume that data access is mostly linear because of a few reasons:
1. All performance issues arise in loops: I at least have never seen a performance problem that could be explained by a single thing happening once. It is always a particular thing happening over and over again.
2. All loops deal with collections of data, and the collections are usually created either created manually by a human being, or are created through parsing or looping many at a time.
3. A human being can manually create a collection of maybe a hundred items manually before they get bored and stop. A collection created this way may contain data from all over the place, with data access over it being nonlinear.
4. A collection created through parsing or looping will create its data in a mostly linear fashion. Accessing the data will then also be linear.
There are definitely cases where nonlinear collections exist, but these are usually either small or are created from smaller sets of linear data. eg. Think of dragging 10 lists of 1000 items to form a list of 10000 items. The entire 10000 items aren't going to be located linearly, but every 1000 items will be.
So in effect, I'm betting that most hot loops do deal with linear access over objects and that loops that work over nonlinear access are not particularly hot.
I recommend reading /Don’t Stop the BIBOP: Flexible and Efficient Storage Management for Dynamically Typed Languages/ (1994)[1]. "BIBOP" stands for "Big Bag of Pages."
The aim is absolutely to implement the entire ECMAScript specification. Progress has slowed down recently, as I've been both busy with other things and tied up in making the engine work with interleaved GC.
A secondary aim is to have a bunch of feature flags that allows the engine to drop out support for specification parts that a particular embedder doesn't care about. That obviously fights with the "implement the entire ECMAScript specification" goal, but I just hate indexed property getters and setters with a passion and want to see them gone wherever I go.
Boa is a great project and I believe it is being used in some production systems. I've met and exchanged some ideas with the main developer, Jason Williams, and even received the greatest praise that I could imagine: Boa will (or did?) take some inspiration from Nova on its GC refactoring. Nova has also copied (with proper attribution of course) a few minor parts from Boa, like whitespace skipping code for some spec abstract operations.
I highly recommend keeping an eye out and using Boa if you have the chance.
"Numbers go into the numbers vector" is unusual - typically JS engines use either NaN-boxing or inline small integers (e.g. v8 SMI). I suppose this means that a simple `this.count += 1` will always allocate.
Have you considered using NaN-boxing? Also, are the type-specific vectors compacted by the GC, or do they maintain a free list?
We do have all safe integers inline (and most doubles too).
I answered about NaN boxing somewhere here but basically, we get quite a bit of mileage from our tagged union / enum / ADT based Value, so I don't think I'd change to NaN boxing now even if I could.
Do you have some specific application profile in mind?
Sounds like this approach could be useful for games that embed a scripting engine. In that context it might be interesting to eventually see some benchmarks against usual suspects of game scripting like Lua.
The plan is to eventually get to full ECMAScript specification compatibility, and who knows if that would then bring us to eg. the Servo browser or Deno JS runtime.
In the short term, I am interested in one-shot script running scenarios where only very limited JavaScript type are needed. The engine already has a bunch of feature flags that can be turned off to disable things like ArrayBuffers and other "complex" features. I have a work-related system in mind where only JSON based types are needed, and garbage collection isn't really necessary: The code could be run once and afterwards the system could be wiped down to the initial state and re-run.
I also have half-a-mind to try running Nova on an STM32 board. But that could be called a hobby project within a hobby project :)
What exactly is meant by the word "kind" here when you say kind-specific vectors?
If I have `function X(a) { this.a = a; }` and then `function Y(b) { this.b = b; }` does that mean `new X(1)` and `new Y(2)` are considered objects of different kinds?
And what about creating objects with literals: are `{a: 1}` and `{b: 2}` considered objects of different kinds?
They're specification wise, statically separate kind of objects: A class constructor is special and requires more "internal slots" so they live in their own vector, as do ArrayBuffers, TypedArrays, Arrays, DataViews, Maps, Sets, ...
But objects that have different shapes do not end up in their own vectors, since the shape is a dynamic property.
I’m wondering how this interacts with the “young objects mostly die” assumption of a generational garbage collector. It seems like using an arena for the young generation might work better for some programs, while an ECS-like scheme works better for other programs.
Thank you for asking! I've not implemented and thus haven't proved this in action yet, but my thinking is that this interacts very well indeed: Each heap vector can designate an index that marks the beginning of the young generation. We don't need separate old and new spaces, instead promotion is just the act of moving the young generation beginning index up.
Side note: I have a corollary on the "most objects die young" that is very much at the heart of Nova: Most objects live together. If they are created at the same time, then they're likely also used together. Hence why I don't swap items around in the heap vectors, or use a free list for allocation: It would mess up the temporal order of items in the vectors, leading to less chances at useful cache line sharing.
Don’t you need to move the surviving young generation objects after the ones they’re surrounded by die? Otherwise the array is going to end up rather sparse, with a lot of unused array entries.
Without either a free list or compaction, I don’t really see how you’re collecting garbage at all.
Yes, I do need to compact the young generation during GC. Eg. Let's say I have the young generation starting at index 1000 and I do GC with 1100 items, with 10 items surviving. I'll have to compact the remaining 10 items to the 1000..1010 span of the vector, after which I can also decide to promote the bottom two young generation indexes to the old generation, making the next young generation start index 1002.
Nova only has a bytecode compiler and interpreter. I do not plan on trying my hand at JIT compiling any time in the future. In this I am a follower of Ladybird's Andreas Kling and hope that JIT will not become necessary.
I'm curious why you think JIT will not become necessary. My impression was that optimizing JIT compilers will basically always be multiple times faster than an interpreter.
I'm mostly just hoping it won't become necessary, though that is perhaps a vain hope.
The reasoning is that, according to my interpretation of talking with some folks working on JSC and SM, property lookup inline caching is the most important performance optimisation bar none. JIT compiling is an improvement on top, definitely, but it is not an massive step change.
Safari browser has a no-JIT mode that is fairly widely in use, and it is apparently fast enough that you don't really notice the change. Ladybird browser's LibJS has no JIT compiler, yet LibJS isn't really unbearably slow: The browser's biggest performance woes come from the browser around it and especially from having the simplest possible drawing algorithm possible.
From a "personal" experience, while the test262 compliance test set is no performance benchmark, Nova is for some reason consistently at the very top of the runtime list over at https://test262.fyi/#. This is of course partially just because we're really quick to do a controlled panic if an unsupported code path is called, and the remaining part is because the code is run so little that JIT doesn't get to kick in. Still, this meaningless number gives me some measure of hope: We're consistently 3 times as fast as V8 after all :)
Nope. It rather explains the memory ownership of a JavaScript heap in a way that Rust understands: The heap owns all data in the heap, and JavaScript objects holding "references" to other objects do not imply memory ownership in the sense that Rust understands it.
So, safery properties are not being silenced: The indexes definitely _are_ Rust wise unreliable where a pointer wouldn't be so bounds checks need to be done. But memory safety is not under threat here.
This does mean that we have to take care of garbage collection ourselves, Rust will not do that for us, but that was the case anyhow since Rust doesn't have a garbage collector we could use (thank heavens). If we make mistakes here, it will lead to the JavaScript heap being corrupted from the JS code point of view but from the engine point of view the memory is still fully safe: The worst thing that can happen is a panic from out of bounds vector indexing.
I think the point is that "memory safety", understood broadly, goes beyond that. You still have the notion of dangling pointers and such with indices, and while it doesn't allow for e.g. stack corruption, it still exposes many similar classes of bugs - for example, a "dangling" index to a deallocated object can potentially allow the code to read or write into a completely different object than it was originally pointing to. Hence, silent logic errors, potentially exploitable as well.
Yeah, sure in that sense this partially turns off Rust's safety features. That being said: A big part of making the engine be safe for interleaved GC is about using ZST's with lifetimes held inside them to bind any JavaScript Values when they appear on the stack and getting back parts of those security guarantees.
We can still make mistakes, especially in the garbage collector, but that is again somewhat helped by code-sharing and coding conventions enabled by Rust ie. using destructuring in GC to make sure we don't forget any part of the heap data. (Coding conventions are not a solution, they are a mitigation at most. I _can_ write the heap GC as a map from one heap data of 'old lifetime to 'new, but that leads to worse code generation than mutating a 'static lifetime heap data :( )
I don't know what "security safety" is so I must've gotten confused. If you mean type safety, then we do make sure to stay on top of that: Our JS Value is an enum that contains either stack data or a typed index that corresponds to the tag. So the Array variant holds an Array index etc. So it is not possible to take type of index and turn it into another type of index without transmute.
If you refer to referential safety, so that your reference to object X still refers to X later on, then that is indeed something we "lose" because we need to implement GC ourselves. But that wouldn't actually really meaningfully change with using pointers either, as updating pointers after a move would need to be done manually as well.
Using references is right out because we cannot explain the JavaScript memory ownership model to Rust: The two are simply not compatible. There are of course safe GC crates that give you reference APIs but they do the pointer updating manually on the inside (if they have moving GC anyway), so the situation doesn't meaningfully change.
The biggest obstacle right now is that for any reasonably big benchmark, Nova will never finish as the GC cannot be run while JavaScript is running and in a big benchmark JS is always running.
I've started a large-scale work to make the engine safe for interleaved garbage collection, but it's a ton of work and will take some time unfortunately. Once it is done, I will start doing benchmarks and seeing what takes time and where.
From small-scale benchmarks I already know that our JS Value comparisons take too much time, our object property lookups are really expensive on larger objects (as it's a simple linear search), and our String interning is very slow (as it too is a dumb-as-rocks linear search).
I hope so too :) I've contributed some minor bits of code to V8 and then worked on Nova for a year or two, but I'm still wet behind the ears compared to those folks. Any and all comments I can get from them is a blessing.
If I am not mistaken there are a few `TODO`s sprinkled all over code relating to function calls about implementing TCO. Shouldn't be too hard from what I can remember when I last looked over those parts of the code.
I'm coming at this with no real JavaScript implementation knowledge, so these comments might not even make sense...
The data sorting seems quite cleanly at first, but as I think more about it I don't quite get it. I guess you are saving a bit of space by segmenting by type... in another approach you might have the type on the pointer, and the pointer can point to anything, and so it's potentially a bit longer than having a type and pointer(/index) that points into a smaller portion of memory specific to that type. But enough to matter?
"No, pointers we do not want and cannot have, so the only real option is to use indexes. Indexes have a lot of benefits: They are small, work exceedingly well together with our heap vectors, enable using the same value to index into multiple heap vectors (or slices of the same heap vector), perform a form of pointer compression automatically, and offer great protection from safety vulnerabilities as reinterpreting an index as a different type changes both the type and the memory it indexes into."
That all just sounds like a pointer to me? The last case also seems like a security hole, not protection.
"Not all objects are the same: They differ in their usage and their capabilities. An object-oriented reading of JavaScript objects' capabilities and the ECMAScript specification would give you a clear and simple inheritance graph where the ordinary object is the base object class, and Arrays, DataViews, Maps, and others inherit from that. Not all objects are the same: They differ in their usage and their capabilities. An object-oriented reading of JavaScript objects' capabilities and the ECMAScript specification would give you a clear and simple inheritance graph where the ordinary object is the base object class, and Arrays, DataViews, Maps, and others inherit from that."
It seems like you are special-casing a specific set of object types (like Array), which is very justifiable. So sure.
"This is somewhat more of an aim for the future instead of current reality, but allow me to give some easy examples: The ArrayBuffer object in ECMAScript supports allocating up to 2^53 bytes of data. Most engines only allow a tad bit over 2^32 bytes but nevertheless, the fact of the matter is that you need more than 4 bytes to store that byte value. As a result, ArrayBuffer itself but also DataView and all the various TypedArray variants like Uint8Array must carry within them 8 byte data fields for byte offset, byte length, and even array length. Now ask yourself, how often do you deal with ArrayBuffers larger than 4 GiB? Not very often, obviously."
I'm guessing this is leading to a decision many languages have made about numbers and strings, where there's special types for small numbers and short strings (exposed only in the implementation). Or even more special types, where the pointers become values.
Also I can see a benefit to keeping track of "normal" Arrays and whatnot, so some of JavaScripts weird-but-not-usually-used behavior can be isolated, and normal behavior fast-tracked.
"In Nova we aim to split objects into parts to ensure that computationally unconnected parts are also stored separately in memory"
But this I don't get. If you are splitting things by type, how can you cluster them by how they are related? An object like {a: 1, b: 2} is an object with two strings and two numbers, presumably spread out over three different type-specific heaps?
Hey, thank you for the comment! I'll try to answer as best I can.
A pointer is 64 bits, though carrying much less useful payload than that. A JavaScript engine only rarely deals with more than 4 GiB of memory, so a 32 bit integer would be enough to index the entire memory needed. If you turn that though into indexes, a 32 bit index can speak of 4 billion separate items: Most programs never have that many distinct heap items alive at the same time. Note that this index doesn't now really correspond to indexable memory so we're no longer bound by the 4 GiB limit.
We actually do keep the 64 bit Value though! We just use the massive amounts of data to store a lot of data on the stack, avoiding heap allocations altogether.
> That just sounds like a pointer.
A pointer points to one place and one place only: An index can points to as many places as there are "parallel vectors" associated with it. eg. Think of a table: A row index refers to as many cells as there are columns, whereas a cell pointer only identifies one cell.
> The last case also seems like a security hole, not protection.
Usually JS engines don't consider the JS-accessible contents of the JS heap itself part of the threat model: Any object in the heap is liable to be leaked by the JS code running in the engine anyway. eg. V8's object placement is fairly static and easy to exploit. The important thing for safety is to avoid type confusion which can be used to create read/write primitives to punch out of the sandbox. So; an attacker can freely read through the heap data by creating heap indexes out of thin air but they cannot use that to reinterpret one type of data as another type and then feed that back to the engine to cause it to misbehave.
> But this I don't get. If you are splitting things by type, how can you cluster them by how they are related? An object like {a: 1, b: 2} is an object with two strings and two numbers, presumably spread out over three different type-specific heaps?
Yes, this would split into the ordinary object vector, and the object property vector. If the keys were longer they'd end up in the strings vector and if the values were heap allocated doubles then they'd end up in yet another vector. Looking at it one thing at a time, it is split here and there.
That being said, this doesn't really much change from how traditional engines do it: Strings are not going to be near the objects that use them as keys, nor are heap numbers, and (added) properties also go into a separate backing store which is likely not next to the object. Worst of all, even if all of these were next to the object, they'd span multiple cache lines and wouldn't really benefit from being close to each other as they're pointer chased and thus wouldn't get much guarantees of prefetching.
When you look at multiple objects, however, then you'll see that Nova's object data is still found in those 4 vectors, whereas the traditional engine design... It may have tried it's best to keep the data together but it's probably still spread out here and there. And you're loading all unnecessary stuff like the elements pointer (for indexed properties) and any other inline properties etc. together with the properties that you actually wanted to read.
Sorry, this ended up a bit disjointed. Let me know if you have more questions! Thanks.
Fun coincidence that you started this project, I've had this exact same idea brewing for a few years, but did not bite the bullet yet :D
Have you considered using Bevy as a base ECS as they have an automatic archetype (shape) handling in the library? This was essentially my original idea, to implement a JS runtime on top of Bevy. (And over the years slap together a browser after the JS starts working)
I have not considered Bevy, no. I sort of assumed that it wouldn't be easy to adapt to (thinking that it is more of a game engine), though it might've well been an excellent option.
I _have_ thought about using Bevy as a rendering engine for some beautiful heap access animations. Imagine rows of little boxes, each row a heap vector and each box an item in it: The boxes blink as their memory is accessed. Oh what a sight it would be.
Yeah, Bevy is super, super modular and isn't really a game engine, it's just a general ECS framework that has lots of batteries made for it so you can make a game with it, but you don't have to.
Apologies for an insubstantial complaint, but there are just too many things called “Nova” (or its Greek counterpart, “Neo”). I get it, it’s new, but isn’t there anything more specific or unique to reflect in the project name?
Heh, no apology needed. It's definitely an overused name. We bikeshed the name a good few years ago and came to "Nova" from "Supernova" because space is, like, cool. Rebranding would feel weird now :)
I console myself with the knowledge that most engine names are unknown anyway, and even if known they are still unsearchable (looking at you two, V8 and JSC!)
Architectural choices are interesting to talk about, but I think most people reading this won't have any context to compare against, me included. How does this compare to e.g. the architecture of V8? What benefits do these choices give when compared against other engines? Etc, reading through the list it's easy to nod along, but it's hard to actually have an intuition about whether these are good choices or not.
They seem to have a blog post on that: https://trynova.dev/blog/why-build-a-js-engine
It reads like an experimental approach because someone decided to will it into existence. That and to see if they can achieve better performance because of the architectural choices.
> Luckily, we do have an idea, a new spin on the ECMAScript specification. The starting point is data-oriented design (...)
> So, when you read a cache line you should aim for the entire cache line to be used. The best data structure in the world, bar none, is the humble vector (...)
> So what we want to explore is then: What sort of an engine do you get when almost everything is a vector or an index into a vector, and data structures are optimised for cache line usage? Join us in finding out (...)
The impetus for the engine design is indeed, as you say, "someone decided to will it into existence."
A friend of mine who works in the gaming industry told me about the Entity Component System architecture and I thought: Hey, wouldn't that work for a JavaScript engine? So I decided to find out.
Nova itself has already been created at that point and I was part of the project, but it was little more than a README. I then started to push it towards my vision, and the rest is not-quite-history.
A friend of mine who works in the gaming industry told me about the Entity Component System architecture and I thought: Hey, wouldn't that work for a JavaScript engine?
That was the first thing I thought of when I saw your description. But the reason ECS works well is cache coherence. (Why) would a general-purpose runtime environment like a JS engine benefit from ECS? Or alternatively, have you seen performance improvements as a result?
I guess the opposite could also be asked: Why would a game benefit from ECS? A player in the game can do basically anything, there's no guarantee that things are always perfectly accessed in a linear order.
It comes down to statistics: Large data sets in a general-purpose runtime environment are still created through parsing or looping, and they are consumed by looping. A human can manually create small data sets of entirely heterogenous data, but anything more than a 100 items is already unlikely.
Finally, the garbage collector is a kind of "System" in the ECS sense. So even if the JavaScript code has managed to create very nonlinear data sets, the garbage collector will still enjoy benefits. (Tracing the data is still "pointer chasing" but when tracing we don't need to trace in the data order but can instead gather a collection of heap references we've seen, sort them in order and then trace them.)
> Why would a game benefit from ECS? A player in the game can do basically anything, there's no guarantee that things are always perfectly accessed in a linear order.
There's actually a guarantee that things are mostly going to be accessed in a linear order because player actions don't matter to the execution of the simulation. The whole simulation is run at 1/FPS intervals across the whole set of entities, regardless of player input (or lack thereof).
In an ECS the whole World is run by Systems, which operate on Components. This is why cache locality works there: when the Movement System is acting, it's operating on the Position Component for all (or at least many) Entities, so linear array access pattern is very favorable. Any other component in your cache is going to be unused until the next system runs (and then the Position Component will become the useless data in cache). That's why you'd rather have an array of Components in cache instead of an array of Entities.
This access pattern is very suitable for games because the simulation is running continuously in an infinite loop (the game loop) consisting of even more loops (the Systems running), but not so much for general purpose computation where access patterns are mostly random. (EDIT: or rather, local to each "entity".)
It is very true that a general purpose computation can theoretically do anything and mess linearity of access patterns entirely up. But in practice programs do most of their work in very linear fashion. It's not by chance that eg. V8 will try to write objects parsed from a JSON array of objects one right after the other. So in a sense we can say that the JavaScript program itself becomes the System with a capital S.
That is not to say that Nova's heap vectors will necessarily make sense: The two big possible stumbling blocks are 1) growing of heap vectors possibly taking too long, and 2) compacting of heap vectors during GC taking too long.
The first point basically comes down to the fact that, at present, each heap vector is truly a single Rust Vec. When it can no longer fit all the heap data into it, it needs to reallocate. Imagine you have 2 billion ordinary objects, and suddenly the ordinary objects vector needs to reallocate: This will cause horrible stalls in the VM. This can be mitigated at the cost of splitting each heap vector into chunks, but this of course comes at the cost of an extra indirection and some lack of linearity in the memory layout.
The second point is more or less a repeat of the first: Imagine you have 2 billion ordinary objects, and suddenly a single one at the beginning of the vector is removed by GC: The GC has to now move every object remaining in the vector down a step to make the vector dense again. This is something that I cannot really do anything about: I can make this less frequent by introducing a "minor GC" but eventually a "major GC" must happen and something like this can then be experienced. I can only hope that this sort of things are rare.
The alternative would be to do a "swap to tail", so the last item in the vector is moved to take the removed item's place. But that then means that linear access is no longer guaranteed. It also plays havoc on how our GC is implemented but that's kind of a side point.
Software engineering is architecture is full of trade-offs :) I'm just hoping that the ones we've made will prove to make sense.
> It's not by chance that eg. V8 will try to write objects parsed from a JSON array of objects one right after the other.
Yes, but note this is still a different pattern of access (array of "entities"). V8 does this because it assumes that e.g. `foo.name` is very likely going to be accessed along with `foo.lastName` (which is likely the 99% case for general computing) whereas ECS assumes `foo.name` is very likely going to be accessed along with `foo2.name`, `foo3.name`, ..., `fooN.name` (which is the 99% case for videogame timestep loops).
> Software engineering is architecture is full of trade-offs :) I'm just hoping that the ones we've made will prove to make sense.
To clarify: my comment is not a criticism of Nova's design decisions. I was only trying to clarify the answer to "Why would a game benefit from ECS?" for those foreign to ECS's existential motive.
I'm sure Nova's tradeoffs make sense for some workloads and I wish you the best!
Thank you very much for your well-wishes <3
> Yes, but note this is still a different pattern of access (array of "entities").
I was referring to the `[foo, foo2, foo3]` objects themselves; V8 does use an "cache local" placement for those so you'll find them laid out in memory as:
> [foo_proto, foo_elems, foo_props, foo_name, foo_lastName, foo2_proto, foo2_elems, foo2_props, foo2_name, foo2_lastName, ...]
For what it's worth, I am interested in laying object properties out in an ECS like manner in Nova, so the properties would be laid out as `[foo.name, foo2.name, foo3.name, ...]`, but currently the properties are laid out similarly to V8, `[foo.name, foo.lastName]`. The only difference is that we do not have "in object properties".
That being said: I am obviously biased, but I do wonder if an ECS-like layout wouldn't be nearly universally beneficial. Thinking of the `foo.name` and `foo.lastName` access: If those are on the same cache line then accessing the two only reads one cache line. This is nice. But if there are more properties in the objects (and there often are), then those will pollute the cache. If you do this access once, it doesn't matter. If you do this a million times, now the cache pollution becomes a real issue: In Node.js even the optimal case would be that you read read 625,000 cache lines worth of data, only to discard 250,000 cache lines of it.
If instead we use an ECS-like layout, then accessing these two properties reads two 10100cache lines: That's bad, but on the other hand if this happens once then it won't even make a blip on the screen. If a million of these accesses are done, you could think that we'd suddenly be slow as molasses but now the ECS-like layout is probably going to help you: You're more likely reading the next `name` and `lastName` property values on each access. If you have it bad and only half of the property data you read is actually the `name` and `lastName` properties you want, then you read 750,000 cache lines and lose out to the traditional engine by 100,000 cache lines. If you get 67% "hit rate" then you break even. And that's comparing to the case where the objects only contain `name` and `lastName` and nothing more.
It of course all comes down to statistics but... I'm very interested in the potential benefits here :)
Again, thank you for your comments, I've enjoyed discussing and pondering this <3
Would this mean that each shape/structure/map get its own vector for each field in order for this to work?
That would be one way; it would offer the best theoretical memory layout for well-behaved programs. But, I expect it to be very painful to work with and to come with some performance penalties in mixed use cases due to the extra indirection required.
No, my thinking is that properties would be stored into tables based on their size class: All objects that have 4-7 properties are in the same table, and all of their first property would be in the same slice, second property in another etc.
That sounds to me like you'll end up getting little benefit from ECS then. Let's say the JavaScript program is iterating over a hundred thousand instances of some Foo class which happens to have 6 properties. You'd ideally want good use of cache, but if your object vector that has all of the Foo objects also happens to have all sorts of instances of other types that have the same field count mixed in there, then you're going to spend a lot of time skipping over those unrelated objects and refreshing the cache.
I know that ECS is treated as a silver bullet by a lot of people, but my experience is that it really only works well when the data you're working with is statically typed so that you can actually partition into arrays where each array does represent a single meaningful type.
That is definitely possible: If it turns out to not make sense then I can at least always go back go the current system where all properties of an object as stored in a single array.
It's not as ECS'ssy as one would hope but it's at least proven technology :D
Virtual alloc your vectors so you can add more backing memory without having to modify the addresses of existing items. Compaction can reap only empty pages but you’ll still need some moving to avoid over fragmentation.
Yeah, virtual alloc for the Vec backing memory is something I hope to do _one day_. It's not a very pressing concern however, as it requires going much lower in the stack.
> The GC has to now move every object remaining in the vector down a step to make the vector dense again
Is there something which forces you to compact everything here? Or could you do what most GCs do and track that free entry in a free list?
I would have to keep a free list per vector, and there's a lot of those, and it would defeat the point of keeping data packed and temporally colocated: I want to ensure that data was created together stays together and only ever gets more compact as it grows older.
Filling in empty slots would mean that likely unrelated data comes and pollutes the cache for the old data :(
I see. Thanks for the answers btw!
I get your point but a few gaps here and there likely don't matter at all for performance. At least it's a lot better than making everything super compact all the time. Assuming you are splitting vectors at some points into chunks: In such a world you could choose to get rid of chunks with a lot of gaps and move the remaining entries into other chunks. At that point you really have a regular GC.
And the free list could be stored in the vector itself. E.g. if an entry is empty it would store the pointer to the next free entry. So all you need is a single head/tail index for each vector.
I also wonder how you handle pointers which could point into one of many vectors. E.g. a field could easily point either to an object or an array. Do you plan to pack this vector id into the 32-bit value? If so wouldn't there be a lot of dispatch like this as well:
if (index & VECTOR_ID_MASK == OBJECTS_VECTOR_ID) { return objects[index&VECTOR_INDEX_MASK]; } else { .. }
I hope it's clear what I mean with this.
Thank you for the discussion!
A few gaps won't matter, and that to me speaks of a split between major and minor GC making sense. However, I'm not really sold on that meaning a free list making sense. For one, if I split the heap vectors into single value parts, then holding free slot data in any of them will become somewhat complicated. Hence at least for the foreseeable future I'm 100% in on the compacted heap vectors idea :) Time will tell if the aggressive compacting makes sense or not.
Our JavaScript Values are the full 8 bytes (yes, this is large and it pains me, but it does give us all integers on stack, most doubles on stack, and up to 7 bytes of string data on stack), so a field that can point to any kind of object stores a byte tag and a u32 index. I might pack this down to 1+3 bytes or so, at the cost of supporting smaller maximum number of objects in the engine. JS Value itself would still probably remain 8 bytes because of the stack data benefits.
There is indeed dynamic dispatching through match statements, though it generally happens at the specification method level. Eg. A specification method to get a property from an object will match on the tag and then dispatch to a concrete method with the index as parameter. The indexes are typed as well, so from this point on we statically know we're dealing with eg. an Array.
So there is dynamic dispatch yes, but we try to eliminate it at the earliest opportunity. We probably still have more of that than a traditional engine would have though: A traditional engine will keep the tag on the heap and there is some dynamic dispatch done based on that, but at least your data lookup isn't based on dispatch.
I think you're on to something (important). Decomposing structs into separate arrays (heaps) is becoming a thing. eg Rust and others are introducing language features to manually (explicitly) do so. It could be cool if the runtime just handled it.
I stumbled across a new research language with new syntax for just this purpose, to better express iteration and lambdas. IIRC.
Sorry, I was looking for something else (got nerdsniped by u/hinkley's mention of Erlang's "set-theoric types" ), and didn't bookmark it. If I find it again, I'll forward the link.
Maybe someone else here knows what I'm talking about.
Wait, is there an RFC for Rust to support SOA?
This is cool, but I'm wondering
(1) Why doesn't V8, whose whole point is performance, lay out memory in an optimal way?
(2) Will Nova need to also implement all of V8's other optimizations, to see if Nova's layout makes any significant difference?
V8 could probably implement the backing object "trick" with some trouble. I'm half-hoping that Nova will show it to be worth their while and that they will eventually do it. It will be a major refactoring of the engine, however.
The heap vector "trick" is basically impossible, I believe. It wouldn't be a refactoring so much as it would be a complete rewrite of the engine. The entirety of V8 assumes it deals in pointers, and all of that would need to change to using indexes instead. I will eat my hat if they do it. Without heap vectors they can still split object data apart using pointer-keyed hash maps, so maybe they could take advantage of some of the ideas still.
V8 does offer ways to run code without optimisations, which we can use for a more apples-to-apples comparison. The most important optimisation that Nova really needs before any big performance comparisons become meaningful is property access inline caching, which requires implementing object shapes.
I'd say that once object shapes are done, then limited performance comparisons can probably be made, especially if V8's JIT is disabled.
To be fair, pointer compression is morally and memory-wise similar to indexing a vector.
Yup, and to a degree the whole "heap references as indexes" idea was inspired by pointer compression. Not in a direct sense of "hey, look at that, what if I took it a step further?" but as I was thinking of the indexes I realised that it looks a lot like pointer compression, and that made me think it is a viable idea.
So is the whole point of this project to convince V8 to adopt a particular optimization?
Not really: In my daydreams Nova becomes the premier JS engine in the world and takes the crown from V8. If V8 went all in and basically just copied all of Nova... I'd probably still develop Nova, as I don't want to work with C++ that much.
If V8 copied all of Nova AND adopted Rust, I might consider laying Nova to rest and going into V8 development. But I'd probably also be really angry at V8 just taking all of Nova's good ideas and peddling them off as their own without crediting Nova. So probably I'd still keep developing Nova while stewing in my anger and inability to do anything about it :)
I hope Nova can be a spark that ignites the JavaScript world into a bit of a renaissance with some of its ideas, but the point is not to burn bright and burn out. The point is to burn bright and stay lit.
> But I'd probably also be really angry at V8 just taking all of Nova's good ideas and peddling them off as their own without crediting Nova.
Who knows, maybe they'd even give you credit (while still taking the idea)?
It could definitely happen. It would be a hard decision for me then :)
Optimal memory layout isn't something you can know in advance. The optimal way to lay out objects in memory is exactly in the order that they will be accessed, but the runtime doesn't know that until after the user program has accessed them.
And if the program accesses a set of objects in different orders at different times, there is no one optimal layout.
I did ask Lars Bak once if they spent a lot of time thinking about cache usage and organizing objects in memory to take best advantage of it and, if I recall correctly, his answer was basically "no". They definitely think about it terms of packing objects into small amounts of memory. But in a dynamically typed language like JavaScript where every property is a reference to some other object elsewhere in memory, using the cache well is just profoundly hard.
Hell, it's hard even in Java where at least you do know the set of fields any given class has.
1 it takes time and effort to make major architectural changes.
Certain design choices made for other reasons may conflict.
Isn't data-oriented design about organizing the data in a way that reflects the most common access patterns of the program? The approach of placing all numbers in a big number vector, all Arrays into a big Array vector, and so on, would be "data-oriented design" if it actually reflects the most common access patterns. So, is it the case that when you read a number you also want all those other numbers that come together with it in the cache line? Is that the case for Arrays? For DataViews? In other words, does this approach to allocating memory reflect the most common data access patterns in JavaScript programs? I'm not saying it's a bad approach, and I'm not even trying to imply that it's not DOD, I'm genuinely asking.
Edit: maybe a better question is: does it reflect the most common data access patterns of a JavaScript Engine?
Excellent question: In a theoretical sense the answer would be that we cannot know since it depends on the JavaScript being run. But: In practice I think that is indeed the case. Especially for the more common an object is, the likelier it is that it is used in conjunction with others around it. At the same time, the more important their memory placement becomes.
eg. Say you have a JS programs that has about a 100 DataViews: I'd say it's unlikely these are used in conjunction with others very often, but they're also only a small part of the program, so their placement is mostly whatever.
Now what if that number is a million instead? Now I'm betting they're mostly all created together, used together, and that their placement is critical to the program's performance.
So, I'm betting that making random memory access performance worse while guaranteeing that data created together stays together and improving linear memory performance will be an overall win.
Whether this is true data-oriented design is then in the eye of the beholder: Maybe someone will think I'm wrong, my assumptions are wrong, and I'm thus not doing things in a data-oriented way.
That's a good point. The "Internals of Nova" blog posts do a bit more explicit comparisons to V8.
In V8, and other production engines AFAIK, objects are variable-sized monoliths: All of their statically known data is contained in one slab. This means that for example in Node.js an empty ArrayBuffer is 96 bytes in size (IIRC).
Basically, they implement the ECMASCript specification defined inheritance chain using object-oriented class inheritance.
https://www.youtube.com/watch?v=5olgPdqKZ84
A short comparison to how V8 does these things:
1. All data in V8 is allocated into one of many heap parts: Usually new data goes into a nursery space, and if it does not get GC'd it moves to the old space. Relative position of data isn't really guaranteed at this point.
2. All heap references in V8 are true pointers or, if pointer compression is used, offsets from the heap base.
3. All objects in V8 include all the data needed for them to act as objects, and all of their data is stored in a single allocation (with the exception of properties, with some exceptions). The more specialised an object is, say an ArrayBuffer, Uint8Array, or a DataView, the bigger it has to be as the specialisation requires more data to be stored.
This is a great idea! I had thought about doing this with a lisp interpreter. I had identified a few key advantages:
- homogenous allocation means no alignment gaps - linear access win in garbage collection - indices smaller than pointers - type discriminated index can save some size
I haven’t verified whether those actually work out in the details. I’ll read your blog article.
Don’t bother with these comments immediately comparing it to V8 (a multi billion dollar venture). I don’t know how many creative projects they’ve done before.
You may be be interested in looking at Fabrice Bellard’s JS engine for ideas.
I actually made a Lisp interpreter in Zig a couple of years ago that has each object type in a separate heap array. In fact each field of each object type has its own array: every CDR is in one contiguous array. This was mostly for fun and to experiment with data-driven techniques using Zig metaprogramming. The code turned out relatively clean and simple.
https://github.com/mbrock/wisp
GC is stop© which as a side effect compacts each of those arrays and improves locality. I think most lists should end up having their CDRs next to each other in memory making iteration very cache friendly. But I didn't verify any performance qualities, beyond making it efficient enough for basic use.
It also has delimited continuation control, compiles to WebAssembly, and hooks promises into the continuation system, among some other pretty cool features!
Well I'll be damned! That sounds very much like what I want Nova to eventually be :) We don't have fields split apart at present, mostly because Rust doesn't make that quite as easy as I would want to. Otherwise it sounds like it's very much all the same, in a good way.
I'll definitely be taking a look at wisp, thank you very much for the link! If you ever have the time, I'd love seeing a comparison of this sort of engine design against a more traditional one.
Sorry, what is "CDR" in this context though?
Quick reply to the cdr thing: car/cdr are old Lisp names for the head/tail fields of linked list cells! :)
Ah, of course!
Yes. the right thing to do is to treat a list as a general case and other uses of cons as special case
I've flipped that idea around in a few of my own language designs, where pairs are the central feature and lists are just pairs with pair cdrs. Works fine from what I can see.
Yes pairs is the 1980s lisp design, but it’s not good for modern caches. Both obviously work.
Oh yeah, continuation pointers also have their own array like every other field kind, which should have similar benefits as list traversal but for continuation copying... It's a really interesting design area, I think. Zig makes it easy!
Yeah, I'm quite envious of the MultiArrayList or whatever it was that Zig has: If only Rust had that sort of a type built-in <3
That's how I got interested in this kind of memory layout in the first place. I wanted a nice Lisp for WebAssembly and had recently gotten into Zig. When I started defining the word structure I remembered Andy Kelley's talk about using data-oriented design to make the Zig compiler fast, so I thought I'd try it, and the more I thought about it the more reasonable it seemed.
There are like a dozen object types with different growing multiarrays. Words are 32 bit with 1 for GC state and 27 for index and the rest are the type tag. Ints are 28 bits. Byte arrays have their own heap too, as well as general 32 bit vectors.
Thank you for the encouragement! Avoiding alignment gaps is indeed pretty great: I have a vision of packing Arrays into 9 bytes split over two or three cache lines.
On typed indexes: If we accept only about 2^24 possible index values then we could use a 32 bit integer for our Values, or at least for Objects (if we want to keep 7 bytes worth of stack data, which is pretty hard to pass on).
I love the comments comparing Nova to V8: That's what I want to aim for after all :) I'm not sure I've heard of Fabrice Bellard's JS engine, thanks, I'll take a look!
Your blog mentions QuickJS, which I believe is the mentioned engine by Fabrice
Oops :D
Isn't data oriented design driven by knowing what your data accesses look like? In your engine, you're building as if you're assuming that common data access will be linear access over objects of the same type. Why?
Yeah, know your data and how it is used. I assume that data access is mostly linear because of a few reasons:
1. All performance issues arise in loops: I at least have never seen a performance problem that could be explained by a single thing happening once. It is always a particular thing happening over and over again.
2. All loops deal with collections of data, and the collections are usually created either created manually by a human being, or are created through parsing or looping many at a time.
3. A human being can manually create a collection of maybe a hundred items manually before they get bored and stop. A collection created this way may contain data from all over the place, with data access over it being nonlinear.
4. A collection created through parsing or looping will create its data in a mostly linear fashion. Accessing the data will then also be linear.
There are definitely cases where nonlinear collections exist, but these are usually either small or are created from smaller sets of linear data. eg. Think of dragging 10 lists of 1000 items to form a list of 10000 items. The entire 10000 items aren't going to be located linearly, but every 1000 items will be.
So in effect, I'm betting that most hot loops do deal with linear access over objects and that loops that work over nonlinear access are not particularly hot.
Makes sense when you put it like that, thanks very much for explaining your thought process.
I recommend reading /Don’t Stop the BIBOP: Flexible and Efficient Storage Management for Dynamically Typed Languages/ (1994)[1]. "BIBOP" stands for "Big Bag of Pages."
1. https://legacy.cs.indiana.edu/~dyb/pubs/bibop.pdf
Thank you, I'll put it on the reading list!
Is this an experimental only JS engine or do you aim to implement the entire ECMAscript specification?
I have been following the Rust Boa project, but I think that it isn't production ready, yet. https://github.com/boa-dev/boa
The aim is absolutely to implement the entire ECMAScript specification. Progress has slowed down recently, as I've been both busy with other things and tied up in making the engine work with interleaved GC.
A secondary aim is to have a bunch of feature flags that allows the engine to drop out support for specification parts that a particular embedder doesn't care about. That obviously fights with the "implement the entire ECMAScript specification" goal, but I just hate indexed property getters and setters with a passion and want to see them gone wherever I go.
Boa is a great project and I believe it is being used in some production systems. I've met and exchanged some ideas with the main developer, Jason Williams, and even received the greatest praise that I could imagine: Boa will (or did?) take some inspiration from Nova on its GC refactoring. Nova has also copied (with proper attribution of course) a few minor parts from Boa, like whitespace skipping code for some spec abstract operations.
I highly recommend keeping an eye out and using Boa if you have the chance.
"Numbers go into the numbers vector" is unusual - typically JS engines use either NaN-boxing or inline small integers (e.g. v8 SMI). I suppose this means that a simple `this.count += 1` will always allocate.
Have you considered using NaN-boxing? Also, are the type-specific vectors compacted by the GC, or do they maintain a free list?
We do have all safe integers inline (and most doubles too).
I answered about NaN boxing somewhere here but basically, we get quite a bit of mileage from our tagged union / enum / ADT based Value, so I don't think I'd change to NaN boxing now even if I could.
Do you have some specific application profile in mind?
Sounds like this approach could be useful for games that embed a scripting engine. In that context it might be interesting to eventually see some benchmarks against usual suspects of game scripting like Lua.
The plan is to eventually get to full ECMAScript specification compatibility, and who knows if that would then bring us to eg. the Servo browser or Deno JS runtime.
In the short term, I am interested in one-shot script running scenarios where only very limited JavaScript type are needed. The engine already has a bunch of feature flags that can be turned off to disable things like ArrayBuffers and other "complex" features. I have a work-related system in mind where only JSON based types are needed, and garbage collection isn't really necessary: The code could be run once and afterwards the system could be wiped down to the initial state and re-run.
I also have half-a-mind to try running Nova on an STM32 board. But that could be called a hobby project within a hobby project :)
What exactly is meant by the word "kind" here when you say kind-specific vectors?
If I have `function X(a) { this.a = a; }` and then `function Y(b) { this.b = b; }` does that mean `new X(1)` and `new Y(2)` are considered objects of different kinds?
And what about creating objects with literals: are `{a: 1}` and `{b: 2}` considered objects of different kinds?
They're specification wise, statically separate kind of objects: A class constructor is special and requires more "internal slots" so they live in their own vector, as do ArrayBuffers, TypedArrays, Arrays, DataViews, Maps, Sets, ...
But objects that have different shapes do not end up in their own vectors, since the shape is a dynamic property.
I’m wondering how this interacts with the “young objects mostly die” assumption of a generational garbage collector. It seems like using an arena for the young generation might work better for some programs, while an ECS-like scheme works better for other programs.
Thank you for asking! I've not implemented and thus haven't proved this in action yet, but my thinking is that this interacts very well indeed: Each heap vector can designate an index that marks the beginning of the young generation. We don't need separate old and new spaces, instead promotion is just the act of moving the young generation beginning index up.
Side note: I have a corollary on the "most objects die young" that is very much at the heart of Nova: Most objects live together. If they are created at the same time, then they're likely also used together. Hence why I don't swap items around in the heap vectors, or use a free list for allocation: It would mess up the temporal order of items in the vectors, leading to less chances at useful cache line sharing.
Don’t you need to move the surviving young generation objects after the ones they’re surrounded by die? Otherwise the array is going to end up rather sparse, with a lot of unused array entries.
Without either a free list or compaction, I don’t really see how you’re collecting garbage at all.
Yes, I do need to compact the young generation during GC. Eg. Let's say I have the young generation starting at index 1000 and I do GC with 1100 items, with 10 items surviving. I'll have to compact the remaining 10 items to the 1000..1010 span of the vector, after which I can also decide to promote the bottom two young generation indexes to the old generation, making the next young generation start index 1002.
Does Nova include a JIT compiler? Or just an interpreter?
Nova only has a bytecode compiler and interpreter. I do not plan on trying my hand at JIT compiling any time in the future. In this I am a follower of Ladybird's Andreas Kling and hope that JIT will not become necessary.
I'm curious why you think JIT will not become necessary. My impression was that optimizing JIT compilers will basically always be multiple times faster than an interpreter.
I'm mostly just hoping it won't become necessary, though that is perhaps a vain hope.
The reasoning is that, according to my interpretation of talking with some folks working on JSC and SM, property lookup inline caching is the most important performance optimisation bar none. JIT compiling is an improvement on top, definitely, but it is not an massive step change.
Safari browser has a no-JIT mode that is fairly widely in use, and it is apparently fast enough that you don't really notice the change. Ladybird browser's LibJS has no JIT compiler, yet LibJS isn't really unbearably slow: The browser's biggest performance woes come from the browser around it and especially from having the simplest possible drawing algorithm possible.
From a "personal" experience, while the test262 compliance test set is no performance benchmark, Nova is for some reason consistently at the very top of the runtime list over at https://test262.fyi/#. This is of course partially just because we're really quick to do a controlled panic if an unsupported code path is called, and the remaining part is because the code is run so little that JIT doesn't get to kick in. Still, this meaningless number gives me some measure of hope: We're consistently 3 times as fast as V8 after all :)
doesn't using these sorts of data structures turn the security safety properties of rust into silent logic errors?
Nope. It rather explains the memory ownership of a JavaScript heap in a way that Rust understands: The heap owns all data in the heap, and JavaScript objects holding "references" to other objects do not imply memory ownership in the sense that Rust understands it.
So, safery properties are not being silenced: The indexes definitely _are_ Rust wise unreliable where a pointer wouldn't be so bounds checks need to be done. But memory safety is not under threat here.
This does mean that we have to take care of garbage collection ourselves, Rust will not do that for us, but that was the case anyhow since Rust doesn't have a garbage collector we could use (thank heavens). If we make mistakes here, it will lead to the JavaScript heap being corrupted from the JS code point of view but from the engine point of view the memory is still fully safe: The worst thing that can happen is a panic from out of bounds vector indexing.
I think the point is that "memory safety", understood broadly, goes beyond that. You still have the notion of dangling pointers and such with indices, and while it doesn't allow for e.g. stack corruption, it still exposes many similar classes of bugs - for example, a "dangling" index to a deallocated object can potentially allow the code to read or write into a completely different object than it was originally pointing to. Hence, silent logic errors, potentially exploitable as well.
Yeah, sure in that sense this partially turns off Rust's safety features. That being said: A big part of making the engine be safe for interleaved GC is about using ZST's with lifetimes held inside them to bind any JavaScript Values when they appear on the stack and getting back parts of those security guarantees.
We can still make mistakes, especially in the garbage collector, but that is again somewhat helped by code-sharing and coding conventions enabled by Rust ie. using destructuring in GC to make sure we don't forget any part of the heap data. (Coding conventions are not a solution, they are a mitigation at most. I _can_ write the heap GC as a map from one heap data of 'old lifetime to 'new, but that leads to worse code generation than mutating a 'static lifetime heap data :( )
> But memory safety is not under threat here.
Note I did not say memory safety. I said security safety.
I don't know what "security safety" is so I must've gotten confused. If you mean type safety, then we do make sure to stay on top of that: Our JS Value is an enum that contains either stack data or a typed index that corresponds to the tag. So the Array variant holds an Array index etc. So it is not possible to take type of index and turn it into another type of index without transmute.
If you refer to referential safety, so that your reference to object X still refers to X later on, then that is indeed something we "lose" because we need to implement GC ourselves. But that wouldn't actually really meaningfully change with using pointers either, as updating pointers after a move would need to be done manually as well.
Using references is right out because we cannot explain the JavaScript memory ownership model to Rust: The two are simply not compatible. There are of course safe GC crates that give you reference APIs but they do the pointer updating manually on the inside (if they have moving GC anyway), so the situation doesn't meaningfully change.
Wild, interested to see how Nova would perform on a benchmark suite.
At the moment: Poorly! :D
The biggest obstacle right now is that for any reasonably big benchmark, Nova will never finish as the GC cannot be run while JavaScript is running and in a big benchmark JS is always running.
I've started a large-scale work to make the engine safe for interleaved garbage collection, but it's a ton of work and will take some time unfortunately. Once it is done, I will start doing benchmarks and seeing what takes time and where.
From small-scale benchmarks I already know that our JS Value comparisons take too much time, our object property lookups are really expensive on larger objects (as it's a simple linear search), and our String interning is very slow (as it too is a dumb-as-rocks linear search).
Well, Devs of V8, Spidermonkey, Webkit and GraalJS are all on HN. Hopefully they see this and all chime in.
I hope so too :) I've contributed some minor bits of code to V8 and then worked on Nova for a year or two, but I'm still wet behind the ears compared to those folks. Any and all comments I can get from them is a blessing.
Do you plan on supporting TCO? I was disappointed to learn a few years ago that V8 wouldn't, on the grounds that, IIRC, it would confuse developers.
True tail call recursion and lazy evaluation would enable truly functional JS.
It is the plans, since it is in the ECMAScript specification... It might actually be fairly easy now that I think about it?
If I am not mistaken there are a few `TODO`s sprinkled all over code relating to function calls about implementing TCO. Shouldn't be too hard from what I can remember when I last looked over those parts of the code.
I'm coming at this with no real JavaScript implementation knowledge, so these comments might not even make sense...
The data sorting seems quite cleanly at first, but as I think more about it I don't quite get it. I guess you are saving a bit of space by segmenting by type... in another approach you might have the type on the pointer, and the pointer can point to anything, and so it's potentially a bit longer than having a type and pointer(/index) that points into a smaller portion of memory specific to that type. But enough to matter?
"No, pointers we do not want and cannot have, so the only real option is to use indexes. Indexes have a lot of benefits: They are small, work exceedingly well together with our heap vectors, enable using the same value to index into multiple heap vectors (or slices of the same heap vector), perform a form of pointer compression automatically, and offer great protection from safety vulnerabilities as reinterpreting an index as a different type changes both the type and the memory it indexes into."
That all just sounds like a pointer to me? The last case also seems like a security hole, not protection.
"Not all objects are the same: They differ in their usage and their capabilities. An object-oriented reading of JavaScript objects' capabilities and the ECMAScript specification would give you a clear and simple inheritance graph where the ordinary object is the base object class, and Arrays, DataViews, Maps, and others inherit from that. Not all objects are the same: They differ in their usage and their capabilities. An object-oriented reading of JavaScript objects' capabilities and the ECMAScript specification would give you a clear and simple inheritance graph where the ordinary object is the base object class, and Arrays, DataViews, Maps, and others inherit from that."
It seems like you are special-casing a specific set of object types (like Array), which is very justifiable. So sure.
"This is somewhat more of an aim for the future instead of current reality, but allow me to give some easy examples: The ArrayBuffer object in ECMAScript supports allocating up to 2^53 bytes of data. Most engines only allow a tad bit over 2^32 bytes but nevertheless, the fact of the matter is that you need more than 4 bytes to store that byte value. As a result, ArrayBuffer itself but also DataView and all the various TypedArray variants like Uint8Array must carry within them 8 byte data fields for byte offset, byte length, and even array length. Now ask yourself, how often do you deal with ArrayBuffers larger than 4 GiB? Not very often, obviously."
I'm guessing this is leading to a decision many languages have made about numbers and strings, where there's special types for small numbers and short strings (exposed only in the implementation). Or even more special types, where the pointers become values.
Also I can see a benefit to keeping track of "normal" Arrays and whatnot, so some of JavaScripts weird-but-not-usually-used behavior can be isolated, and normal behavior fast-tracked.
"In Nova we aim to split objects into parts to ensure that computationally unconnected parts are also stored separately in memory"
But this I don't get. If you are splitting things by type, how can you cluster them by how they are related? An object like {a: 1, b: 2} is an object with two strings and two numbers, presumably spread out over three different type-specific heaps?
Hey, thank you for the comment! I'll try to answer as best I can.
A pointer is 64 bits, though carrying much less useful payload than that. A JavaScript engine only rarely deals with more than 4 GiB of memory, so a 32 bit integer would be enough to index the entire memory needed. If you turn that though into indexes, a 32 bit index can speak of 4 billion separate items: Most programs never have that many distinct heap items alive at the same time. Note that this index doesn't now really correspond to indexable memory so we're no longer bound by the 4 GiB limit.
We actually do keep the 64 bit Value though! We just use the massive amounts of data to store a lot of data on the stack, avoiding heap allocations altogether.
> That just sounds like a pointer.
A pointer points to one place and one place only: An index can points to as many places as there are "parallel vectors" associated with it. eg. Think of a table: A row index refers to as many cells as there are columns, whereas a cell pointer only identifies one cell.
> The last case also seems like a security hole, not protection.
Usually JS engines don't consider the JS-accessible contents of the JS heap itself part of the threat model: Any object in the heap is liable to be leaked by the JS code running in the engine anyway. eg. V8's object placement is fairly static and easy to exploit. The important thing for safety is to avoid type confusion which can be used to create read/write primitives to punch out of the sandbox. So; an attacker can freely read through the heap data by creating heap indexes out of thin air but they cannot use that to reinterpret one type of data as another type and then feed that back to the engine to cause it to misbehave.
> But this I don't get. If you are splitting things by type, how can you cluster them by how they are related? An object like {a: 1, b: 2} is an object with two strings and two numbers, presumably spread out over three different type-specific heaps?
Yes, this would split into the ordinary object vector, and the object property vector. If the keys were longer they'd end up in the strings vector and if the values were heap allocated doubles then they'd end up in yet another vector. Looking at it one thing at a time, it is split here and there.
That being said, this doesn't really much change from how traditional engines do it: Strings are not going to be near the objects that use them as keys, nor are heap numbers, and (added) properties also go into a separate backing store which is likely not next to the object. Worst of all, even if all of these were next to the object, they'd span multiple cache lines and wouldn't really benefit from being close to each other as they're pointer chased and thus wouldn't get much guarantees of prefetching.
When you look at multiple objects, however, then you'll see that Nova's object data is still found in those 4 vectors, whereas the traditional engine design... It may have tried it's best to keep the data together but it's probably still spread out here and there. And you're loading all unnecessary stuff like the elements pointer (for indexed properties) and any other inline properties etc. together with the properties that you actually wanted to read.
Sorry, this ended up a bit disjointed. Let me know if you have more questions! Thanks.
Obligatory "Torille!" as a fellow Finn.
Fun coincidence that you started this project, I've had this exact same idea brewing for a few years, but did not bite the bullet yet :D
Have you considered using Bevy as a base ECS as they have an automatic archetype (shape) handling in the library? This was essentially my original idea, to implement a JS runtime on top of Bevy. (And over the years slap together a browser after the JS starts working)
Torille!
I have not considered Bevy, no. I sort of assumed that it wouldn't be easy to adapt to (thinking that it is more of a game engine), though it might've well been an excellent option.
I _have_ thought about using Bevy as a rendering engine for some beautiful heap access animations. Imagine rows of little boxes, each row a heap vector and each box an item in it: The boxes blink as their memory is accessed. Oh what a sight it would be.
Yeah, Bevy is super, super modular and isn't really a game engine, it's just a general ECS framework that has lots of batteries made for it so you can make a game with it, but you don't have to.
It's so gd versatile so people have done cool weird stuff with it: https://www.nikl.me/blog/2024/bevy_ecs_as_data_layer_in_lept...
Apologies for an insubstantial complaint, but there are just too many things called “Nova” (or its Greek counterpart, “Neo”). I get it, it’s new, but isn’t there anything more specific or unique to reflect in the project name?
Heh, no apology needed. It's definitely an overused name. We bikeshed the name a good few years ago and came to "Nova" from "Supernova" because space is, like, cool. Rebranding would feel weird now :)
I console myself with the knowledge that most engine names are unknown anyway, and even if known they are still unsearchable (looking at you two, V8 and JSC!)
Absolutely overused but allows for cool puns