As someone very active in both .NET and the JavaScript community, I like to take lessons learned from one environment and apply it to the other. This blog will apply lessons learned from .NET's LINQ to JavaScript collections.
Introduction
We have come a long way in JavaScript when it comes to dealing with collections. Before 2015 we had to write stuff like this:
var coll = [5,7,8,65,555,3],
filtered = [],
projected = [],
i;
// only dividable by 5
for (i = 0; i < coll.length; i++) {
if(i % 5 === 0){
filtered.push(coll[i]);
}
}
// add taxes
for (i = 0; i < filtered.length; i++) {
projected.push(filtered[i] * 1.21);
}
//print
for (i = 0; i < projected.length; i++) {
console.log(projected[i]);
}
All this is doing is filtering out some unwanted elements and then projecting the remaining values by multiplying them with a constant.
The equivalent in SQL would be:
SELECT [Number] * 1.21
FROM Numbers
WHERE [Number] % 5 = 0
It's a sad day when JavaScript has to catch up with a 45-year-old language.
Fortunately, since ECMAScript 2015, we can write it like this:
let coll = [5,7,8,65,555,3];
let result = coll
.filter(n => n % 5 === 0)
.map(n => n * 1.21);
for (let price of result){
console.log(price);
}
However, there is still a lot of room for improvement. First of all, every operator creates a new collection.
Depending on the size of the collection and how many operations are applied, this leads to wasted memory.
Second, who says we need the full collection anyway? With paging, people are usually only interested in the first few pages.
The Best Place to Hide a Dead Body is Page Two of Google search results - Unknown
So, how to solve this issue? This is where generators come in.
Generators
Here is a brief explanation about generators. If you know what generator functions are, you can skip this part.
A generator function looks something like this:
function* getPrime() {
yield 2;
yield 3;
yield 5;
yield 7;
}
Notice the *
and the yield
keyword. This function returns a collection. But it's not an array, it's a Generator. The main difference between an Array and a Generator is how its iterator works.
With an array, the iterator walks through memory. When you request the next item, it simply reads a value from memory. With generators, the iterator walks through code. When you request the next item, it will execute code until it reaches a yield
statement and then it returns that value.
Let's take a look at the execution of the previous function:
1. let gen = getPrime();
2.
3. console.log(gen.next().value); //writes 2
4. console.log(gen.next().value); //writes 3
Calling the getPrime
function immediately returns a Generator. However, nothing in the function body was executed. Only when line 3 is called, the function will start to execute until it reaches yield 2
. Then the value 2
is handed to console.log
. On line 4, we request the next item. The function will pick up where it left off and continue until it reaches yield 3
. This is also known as lazy evaluation.
Notice that this result was never stored. It just fetches and processes the values one by one. This makes generators very suitable for pipelined processing. Also notice that 5 and 7 where never reached and no excessive work was done. This means that generators can be used in combination with potentially infinite lists, like all prime numbers.
For more information about Generators, you can take a look at the MDN documentation.
Building LINQ
So, let's use these generators to make more efficient operators for our collections.
The goal is to get something like this:
let result = coll
.where(...)
.select(...);
for (let item of result){
console.log(item);
}
But this time the operators don't create temporary collections but instead support lazy evaluation by doing pipelined processing. The for loop
asks for the next item, which the select
function asks to the where
function, that the where
retrieves from the original collection.
Let's take a look at a first implementation:
version 1
function* where(collection, test) {
for (let value of collection) {
if(test(value)){
yield value;
}
}
}
function* select(collection, transform) {
for (let value of collection) {
yield transform(value);
}
}
let coll = [5,7,8,65,555,3];
let filtered = where(coll, n => n % 5 === 0);
let mapped = select(filtered, n => n * 1.21);
for (let price of mapped){
console.log(price);
}
The where
function is basically copy/pasted from the very first code block. Except that any filter can be applied, and that it's now a generator function. The same thing applies to the select
function.
No filtering or transformation is done until we reach the for loop
at the end of the script. Then, each value in coll
will be sent through the pipeline individually.
Great, that's exactly what we wanted. But maybe we can write it in a more natural way.
version 2
Array.prototype.where = function* (collection, test) {
for (let value of collection) {
if(test(value)){
yield value;
}
}
}
Array.prototype.select = function* (collection, transform) {
for (let value of collection) {
yield transform(value);
}
}
let coll = [5,7,8,65,555,3];
let result = coll
.where(n => n % 5 === 0)
.select(n => n * 1.21);
for (let price of result){
console.log(price);
}
We've set the functions on the prototype of Array
. Looks a lot better right? There is only one problem... It doesn't work. The reason is simple: function*
does not return an Array
it returns a Generator
object. So, the select
function is not found on the output of the where
function.
version 3
Well, that should be easy to solve, right? You just put it on the Generator object instead. Two problems with that.
- The
Generator
object is not exposed by JavaScript
- The first
Array
will have to be converted to a Generator
before you can start chaining the operators
We can get a reference to the Generator object by using a little hack though.
const Generator = Object.getPrototypeOf(function* () {});
Array.prototype.toGenerator = function* () {
for (let value of this) {
yield value;
}
}
Generator.prototype.where = function* (test) {
for (let value of this) {
if(test(value)){
yield value;
}
}
};
Generator.prototype.select = function* (transform) {
for (let value of this) {
yield transform(value);
}
};
let coll = [5,7,8,65,555,3];
let result = coll
.toGenerator()
.where(n => n % 5 === 0)
.select(n => n * 1.21);
for (let price of result){
console.log(price);
}
It works! But it feels a little bit dirty.
There is in fact a little improvement we can make. We can replace the toGenerator
function with the following:
Array.prototype.toGenerator = function* () {
yield* this;
}
yield*
is normally used to delegate to another generator function but can also be applied to normal arrays.
But still, it feels a little bit dirty.
final version
To avoid our dirty little hack, we can make use of higher level functions:
function where(test){
return function* (coll) {
for(let value of coll){
if(test(value)){
yield value;
}
}
}
}
function select(transform){
return function* (coll) {
for(let value of coll){
yield transform(value);
}
}
}
Array.prototype.pipe = function(...generators) {
let chain = this;
for(let gen of generators) {
chain = gen(chain);
}
return chain;
}
let coll = [5,7,8,65,555,3];
let result = coll.pipe(
where(n => n % 5 === 0),
select(n => n * 1.21)
);
for (let price of result){
console.log(price);
}
Here, the where
and select
functions create a generator function. The pipe
function chains these generators together without using the Generator's prototype.
Basically you get the following:
let result = selectGen(whereGen(coll))
As you might have noticed, this looks a lot like RxJS's syntax. That's no coincidence, we've basically built the synchronous version of RxJS here. The only difference is that ours is a pull mechanism, and not a push mechanism.
That's it! I hope you learned something. Leave a comment with any thoughts or remarks.