Writing LINQ in JavaScript using Memory-Efficient Generators

As someone very active in both .NET and the JavaScript community, I like to take lessons learned from one environment and apply it to the other. This blog will apply lessons learned from .NET's LINQ to JavaScript collections.

Introduction

We have come a long way in JavaScript when it comes to dealing with collections. Before 2015 we had to write stuff like this:

var coll = [5,7,8,65,555,3],
 filtered = [],
 projected = [],
 i;

// only dividable by 5
for (i = 0; i < coll.length; i++) { 
  if(i % 5 === 0){
    filtered.push(coll[i]);
  }
}

// add taxes 
for (i = 0; i < filtered.length; i++) { 
    projected.push(filtered[i] * 1.21);
}

//print
for (i = 0; i < projected.length; i++) { 
  console.log(projected[i]);
}

All this is doing is filtering out some unwanted elements and then projecting the remaining values by multiplying them with a constant.

The equivalent in SQL would be:

SELECT [Number] * 1.21
FROM Numbers
WHERE [Number] % 5 = 0

It's a sad day when JavaScript has to catch up with a 45-year-old language.

Fortunately, since ECMAScript 2015, we can write it like this:

let coll = [5,7,8,65,555,3];

let result = coll
  .filter(n => n % 5 === 0)
  .map(n => n * 1.21);

for	(let price of result){
  console.log(price);
}

However, there is still a lot of room for improvement. First of all, every operator creates a new collection.

/collection per operator.png

Depending on the size of the collection and how many operations are applied, this leads to wasted memory.

Chrome eats RAM

Second, who says we need the full collection anyway? With paging, people are usually only interested in the first few pages.

The Best Place to Hide a Dead Body is Page Two of Google search results - Unknown

So, how to solve this issue? This is where generators come in.

Generators

Here is a brief explanation about generators. If you know what generator functions are, you can skip this part.

A generator function looks something like this:

function* getPrime() {
  yield 2;
  yield 3;
  yield 5;
  yield 7;
}

Notice the * and the yield keyword. This function returns a collection. But it's not an array, it's a Generator. The main difference between an Array and a Generator is how its iterator works. With an array, the iterator walks through memory. When you request the next item, it simply reads a value from memory. With generators, the iterator walks through code. When you request the next item, it will execute code until it reaches a yield statement and then it returns that value.

Let's take a look at the execution of the previous function:

1. let gen = getPrime();
2. 
3. console.log(gen.next().value); //writes 2
4. console.log(gen.next().value); //writes 3

Calling the getPrime function immediately returns a Generator. However, nothing in the function body was executed. Only when line 3 is called, the function will start to execute until it reaches yield 2. Then the value 2 is handed to console.log. On line 4, we request the next item. The function will pick up where it left off and continue until it reaches yield 3. This is also known as lazy evaluation.

Notice that this result was never stored. It just fetches and processes the values one by one. This makes generators very suitable for pipelined processing. Also notice that 5 and 7 where never reached and no excessive work was done. This means that generators can be used in combination with potentially infinite lists, like all prime numbers.

For more information about Generators, you can take a look at the MDN documentation.

Building LINQ

So, let's use these generators to make more efficient operators for our collections.

The goal is to get something like this:

let result = coll
  .where(...)
  .select(...);

  
for (let item of result){
  console.log(item);
} 

But this time the operators don't create temporary collections but instead support lazy evaluation by doing pipelined processing. The for loop asks for the next item, which the select function asks to the where function, that the where retrieves from the original collection.

pipelined processing

Let's take a look at a first implementation:

version 1

function* where(collection, test) {
  for (let value of collection) {
    if(test(value)){
      yield value;
    }
  }
}

function* select(collection, transform) {
  for (let value of collection) {
      yield transform(value);
  }
}

let coll = [5,7,8,65,555,3];

let filtered = where(coll, n => n % 5 === 0);
let mapped = select(filtered, n => n * 1.21);

for	(let price of mapped){
  console.log(price);
}

The where function is basically copy/pasted from the very first code block. Except that any filter can be applied, and that it's now a generator function. The same thing applies to the select function.

No filtering or transformation is done until we reach the for loop at the end of the script. Then, each value in coll will be sent through the pipeline individually.

Great, that's exactly what we wanted. But maybe we can write it in a more natural way.

version 2

Array.prototype.where = function* (collection, test) {
  for (let value of collection) {
    if(test(value)){
      yield value;
    }
  }
}

Array.prototype.select = function* (collection, transform) {
  for (let value of collection) {
      yield transform(value);
  }
}

let coll = [5,7,8,65,555,3];

let result = coll
  .where(n => n % 5 === 0)
  .select(n => n * 1.21);

for	(let price of result){
  console.log(price);
} 

We've set the functions on the prototype of Array. Looks a lot better right? There is only one problem... It doesn't work. The reason is simple: function* does not return an Array it returns a Generator object. So, the select function is not found on the output of the where function.

version 3

Well, that should be easy to solve, right? You just put it on the Generator object instead. Two problems with that.

  • The Generator object is not exposed by JavaScript
  • The first Array will have to be converted to a Generator before you can start chaining the operators

We can get a reference to the Generator object by using a little hack though.

const Generator = Object.getPrototypeOf(function* () {});

Array.prototype.toGenerator = function* () {
  for (let value of this) {
      yield value;
  }
}

Generator.prototype.where = function* (test) {
  for (let value of this) {
    if(test(value)){
      yield value;
    }
  }
};

Generator.prototype.select = function* (transform) {
  for (let value of this) {
      yield transform(value);
  }
};

let coll = [5,7,8,65,555,3];

let result = coll
  .toGenerator()
  .where(n => n % 5 === 0)
  .select(n => n * 1.21);

for (let price of result){
  console.log(price);
} 

It works! But it feels a little bit dirty.

There is in fact a little improvement we can make. We can replace the toGenerator function with the following:

Array.prototype.toGenerator = function* () {
  yield* this;
}

yield* is normally used to delegate to another generator function but can also be applied to normal arrays.

But still, it feels a little bit dirty.

this feels dirty

final version

To avoid our dirty little hack, we can make use of higher level functions:

function where(test){
  return function* (coll) {
    for(let value of coll){
      if(test(value)){
        yield value;
      }
    }
  }
}

function select(transform){
  return function* (coll) {
    for(let value of coll){
      yield transform(value);
    }
  }
}

Array.prototype.pipe = function(...generators) {
  let chain = this;
  for(let gen of generators) {
    chain = gen(chain);
  }
  return chain;
}

let coll = [5,7,8,65,555,3];


let result = coll.pipe(
  where(n => n % 5 === 0),
  select(n => n * 1.21)
);

for (let price of result){
  console.log(price);
} 

Here, the where and select functions create a generator function. The pipe function chains these generators together without using the Generator's prototype.

Basically you get the following:

let result = selectGen(whereGen(coll))

As you might have noticed, this looks a lot like RxJS's syntax. That's no coincidence, we've basically built the synchronous version of RxJS here. The only difference is that ours is a pull mechanism, and not a push mechanism.

That's it! I hope you learned something. Leave a comment with any thoughts or remarks.