lapsu.tv
VADDR: Using the dark side of the force to defeat MVC

Let’s assume, for the sake of this post, that you accept that MVC as a web pattern is well past its due date. What, then, can replace it? The answer, of course, is VADDR. Don’t worry, it’s not at all related to any Sith Lords you may have heard about.

The primary goals VADDR wants you to achieve are security and testability. Rather than being defined by how the functions and code are arranged, VADDR is defined by the phases that data passes through while processing a web request. Once the phases of data are known, how to arrange the transitions between phases becomes obvious.

These are the 4 data phases in the VADDR pattern:

  • VA - Validated/Authenticated Input
  • D - Database Records
  • D - Domain Objects
  • R - Render Objects

Validated/Authenticated Input

That MVC has nothing to say about input validation is reason enough to disregard it as a design pattern relevant to the web.

With VADDR, there is one object per request for Validated and Authenticated Input. Doing the work to create it is the first step in processing a request.

Any URL or POST params and even cookies must be validated first. Upon validation, they are used to create an object that is typed specifically for each request. Subsequent transitions only have access to this object. Because it holds nothing but validated input, the application code and tests can disregard the complexities of the HTTP context.

Also, the validation phase is where the current user’s session and CSRF token should be verified. Any failures to generate this object should cause the app to stop and send a 4xx error. By generating the Validated/Authenticated Input as the first phase of data, each request reminds the developer about the primary goal of security.
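
Here is a minimal sketch of what that first phase might look like in Python. Everything named here is made up for illustration: the "view message" request, the ViewMessageInput type, the InvalidRequest exception, and the two plain dicts standing in for a session store and a CSRF token store. A real app would plug in its own framework and storage.

```python
import hmac
from dataclasses import dataclass
from typing import Mapping


class InvalidRequest(Exception):
    """Hypothetical error type: validation failed, respond with a 4xx status."""

    def __init__(self, status: int, reason: str) -> None:
        super().__init__(reason)
        self.status = status


@dataclass(frozen=True)
class ViewMessageInput:
    """Validated/Authenticated Input for one hypothetical request type."""
    user_id: int
    message_id: int


def parse_view_message(
    params: Mapping[str, str],
    cookies: Mapping[str, str],
    sessions: Mapping[str, int],     # session token -> user id (stand-in store)
    csrf_tokens: Mapping[str, str],  # session token -> expected CSRF token
) -> ViewMessageInput:
    # Authenticate first: an unknown session never reaches the app code.
    session = cookies.get("session", "")
    user_id = sessions.get(session)
    if user_id is None:
        raise InvalidRequest(401, "unknown session")

    # Verify the CSRF token with a constant-time comparison.
    expected = csrf_tokens.get(session)
    supplied = params.get("csrf", "")
    if expected is None or not hmac.compare_digest(
        supplied.encode("utf-8"), expected.encode("utf-8")
    ):
        raise InvalidRequest(403, "bad CSRF token")

    # Validate the remaining params and convert them to real types.
    raw_id = params.get("message_id", "")
    if not (raw_id.isascii() and raw_id.isdigit()):
        raise InvalidRequest(400, "message_id must be a non-negative integer")

    return ViewMessageInput(user_id=user_id, message_id=int(raw_id))
```

Everything after this function only ever sees a ViewMessageInput, so tests for the later phases can build one directly without faking an HTTP request.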

Database Records

These are most similar to the M in MVC: objects that model exactly what lives in the database. The “database” can be an RDBMS, a NoSQL document store or even a local or remote REST endpoint. The point is, this type of object represents what comes from the data source, before any processing or logic is applied to it.

While I/O is necessary to fetch Database Records or write them to persistent storage, transitions between Database Records, Domain Objects and Render Objects should be pure. This helps make the most important parts of the program easily testable, another of a developer’s primary goals.

Domain Objects

In MVC, Database Records, Domain Objects and Render Objects are often merged into a single type. When the single object doesn’t exactly fit its application, code complexity ensues. If they are broken up, it’s usually in an ad-hoc fashion that varies from one component to the next even within the same application.

With VADDR, Domain Objects are a distinctly separate idea. Domain Objects represent the domain model of the application regardless of how things are normalized and structured in the database. For example, a Domain Object may aggregate some data from a parent record into a child record as a single object.
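
As a rough sketch of that aggregation, assume two hypothetical tables, authors and comments, with one record type each. The transition to the Domain Object is a pure function: no I/O, just records in and a domain object out. All of the type and field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorRecord:
    """Database Record: one row of a hypothetical `authors` table."""
    id: int
    name: str


@dataclass(frozen=True)
class CommentRecord:
    """Database Record: one row of a hypothetical `comments` table."""
    id: int
    author_id: int
    body: str


@dataclass(frozen=True)
class Comment:
    """Domain Object: a comment with its author's name folded in."""
    id: int
    author_name: str
    body: str


def to_comment(comment: CommentRecord, author: AuthorRecord) -> Comment:
    """Pure transition from Database Records to a Domain Object."""
    if comment.author_id != author.id:
        raise ValueError("author record does not belong to this comment")
    return Comment(id=comment.id, author_name=author.name, body=comment.body)
```

Because to_comment touches no database, a test only needs to construct two records and compare the result.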

Render Objects

The final phase is for passing data to templates, JSON or other output code. These objects should model the display layout to make front end rendering faster and less complex.

One example might be a HeaderRenderObject. It includes the user’s real name, counts of new messages and any other relevant info meant to be displayed in the header. User info not displayed in the header is excluded from this object as it’s targeted specifically for this case.
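
A HeaderRenderObject along those lines could be sketched like this. The User domain object and its fields are assumptions made up for the example. The point is only that the render object carries what the header shows and nothing else.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical Domain Object with more fields than the header needs."""
    id: int
    real_name: str
    email: str
    password_hash: str


@dataclass(frozen=True)
class HeaderRenderObject:
    """Render Object: exactly the data the header displays."""
    real_name: str
    new_message_count: int


def to_header(user: User, new_message_count: int) -> HeaderRenderObject:
    """Pure transition from a Domain Object to a Render Object."""
    return HeaderRenderObject(
        real_name=user.real_name,
        new_message_count=new_message_count,
    )


def header_context(header: HeaderRenderObject) -> dict:
    """Plain dict suitable for handing to a template or a JSON encoder."""
    return asdict(header)
```

The email and password hash never make it into the template context, so a sloppy template can't leak them.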

In some cases of VADDR, especially for small and/or new apps, there may be no differences between a database record, a domain object and a render object. In this case, there is simply no transformation necessary, but it’s still useful to think about there being 3 applications for that one object. If and when the domain or display start to drift, VADDR expects a new type to be created for the necessary phase, so that the data stays tuned specifically for its purpose. The key to simple code is appropriately structured data.

VADDR: A New Hope

As a pattern, VADDR aims to provide specific guidance on how to phase data so that programs are easier to write securely and maintain with testing. And once the data is phased, designs for simple, testable transformation functions should readily reveal themselves. If this sounds good, give it a try. If not, how would you lay out the phases of data? Write it up, or even better, code it up and post what you come up with.

Brick and Mortar Fail

I tried and failed to use real life to buy a dishwasher yesterday. Started at Sears and found a salesman uninformed about their limited selection. Moved on to Home Depot and found a salesman misinformed about their even more limited selection. Once again I turned to the internet for its ability to aggregate data from disparate sources and found success at AJ Madison. Despite selling old school technology like dishwashers and refrigerators, AJ Madison has the best e-commerce site I’ve used for comparison shopping. Only NewEgg is comparable.

Y64 > Base64

Ever tried to put base64 in a URL? You can’t (w/o special encoding) because whatever genius came up w/ base64 (which admittedly is pretty clever) thought it would be fun to make the slash part of the encoded format. Wait, what? Yes, that’s right. The directory delimiter is part of the format. I’m guessing the inventor of Base64 was an MS-DOS user.

But, fear not. Some observant fellow over at Yahoo! also noticed this and came up with Y64. It’s a variant of Base64 that replaces the 3 URL-unsafe characters in base64 (‘+’, ‘/’, ‘=’) with URL-friendly characters like ‘.’, ‘_’ and ‘-’.

Now you can convert to Y64, toss it into a URL and happily watch it work w/o having to do any silly URL-encoding.
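
Here is a small sketch of the conversion in Python, layered on the standard base64 module. It assumes the substitutions pair up in the order listed above (‘+’ to ‘.’, ‘/’ to ‘_’, ‘=’ to ‘-’). The function names y64_encode and y64_decode are my own, not part of any library.

```python
import base64

# Assumed pairing: '+' -> '.', '/' -> '_', '=' -> '-'
_TO_Y64 = str.maketrans("+/=", "._-")
_FROM_Y64 = str.maketrans("._-", "+/=")


def y64_encode(data: bytes) -> str:
    """Standard base64, then swap the three URL-unsafe characters."""
    return base64.b64encode(data).decode("ascii").translate(_TO_Y64)


def y64_decode(text: str) -> bytes:
    """Undo the character swap, then decode as standard base64."""
    return base64.b64decode(text.translate(_FROM_Y64))


print(y64_encode(b"\xfb\xff"))  # plain base64 would give "+/8=", this gives "._8-"
```

Python also ships base64.urlsafe_b64encode, but that is a different variant: it swaps in ‘-’ and ‘_’ and leaves the ‘=’ padding alone.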

In the cloud? Try above the clouds!

Facebook’s new Timeline feature is designed basically to record all human events. Presumably forever. Because otherwise Facebook would get boring. It did get me thinking though. How long before the physical space required by the storage devices recording all human events actually surpasses the physical space required by presently occurring events? Probably later than decades and sooner than millennia, so let’s say it will happen in centuries.

This brings me to my point: datacenters in SPACE! Sure, you’re skeptical, but what do datacenters need? Plenty of electricity, bandwidth and room, and a dry, cold environment.

The sun’s up there, so we’ve got solar powered electricity covered. Bandwidth is the reason most of our satellites are already up there, so we know we can do that. Of course there’s plenty of room. They don’t call it SPACE for nothing! And, if you haven’t been to space, I can assure you, air conditioning in space will not be necessary (which further reduces our electricity needs). In fact, space is cold enough that superconductor based chips would be as easy to operate as normal chips so our programs would be absolutely flying!

Given the look on your face, I can see you’re still not convinced. That’s ok. Eventually, when datacenters are as prevalent as suburban strip malls, you’ll come around. And it will be that much handier to travel in space once our websites are already running up there.

The Inheritance Trap

Anyone who’s ever done anything with object oriented programming has at least experimented with inheritance. It’s great. It makes so many things easy. Until you’re trying to figure out wtf is going on in some code you didn’t write. Then you have to trace through the entire inheritance hierarchy and try to decipher which version of a virtual function is being called.

This happens when we use inheritance to reuse code. Of course, the way to avoid this is to get code reuse through composition instead.

The problem is, given a set of code, it’s so easy to make a subclass and override a particular bit of functionality. Certainly easier than moving that bit of code into a composable component and changing everything to use it that way.

This is what makes object oriented inheritance a trap. It’s so tempting at first and so frequently turns painful by the end. Our solution is two-fold. First, resist the temptation to use inheritance even when it’s available. And second, stop adding inheritance to new tools. As long as inheritance is around, it will continue luring developers into its trap.
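
To make the composition route concrete, here is a tiny sketch. The Report class and the two formatter functions are invented for the example. Instead of subclassing Report and overriding a formatting method, the varying bit is passed in as a component.

```python
from typing import Callable, Mapping


class Report:
    """The formatting behavior is handed in, not overridden in a subclass."""

    def __init__(self, format_line: Callable[[str, int], str]) -> None:
        self._format_line = format_line

    def render(self, totals: Mapping[str, int]) -> str:
        return "\n".join(
            self._format_line(name, value) for name, value in totals.items()
        )


def plain_line(name: str, value: int) -> str:
    return f"{name}: {value}"


def csv_line(name: str, value: int) -> str:
    return f"{name},{value}"


plain_report = Report(plain_line)
csv_report = Report(csv_line)
```

When reading csv_report.render(...), the code that runs is whatever was passed to the constructor. There is no hierarchy to walk to find out which override wins.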

A conversation with OO

OO: Everything’s in a class now, it’s all better.
Me: Which class?
OO: I dunno. One of them. Just search through all the virtual methods until you find something that might be getting executed.

clofresh:

Jobs. 

Developers come in version 1 and 2 also

It’s now more than a week after Surge and the thing I still find the most thought provoking was Kate Matsudaira’s idea of 2 distinct types of developers working in two modes: version 1 and version 2.

Version 1 developers are great prototypers. They get a lot of work done fast but might leave a few bits undone. Version 2 developers are more deliberate. They build a system to be solid and scalable, but they’re slow. If this doesn’t resonate as true, you must’ve been working with different developers than I have.

Matsudaira proposed that these 2 types of developers don’t collaborate very well when on the same projects. The v2 devs feel the v1 devs are doing a sloppy job and the v1 devs feel the v2 devs are slow and wasting time on too many tests and YAGNI work. Matsudaira suggested segregation as the solution. The v1 people work on prototyping projects and the v2 people come behind and solidify the most promising products and features.

On the one hand, this makes total sense. On the other hand, as one who perhaps leans toward the v2 end of the spectrum, I wouldn’t want to be shut out of the more exploratory types of work that would seem to be reserved for v1 devs. Maybe I just need to better recognize when prototyping is called for, put my v2 sensibilities to the side and embrace my v1 self.

NoSQL is Right… about no SQL

If the NoSQL DBs are right about anything, it’s that SQL is not worth implementing. SQL is great for poking around a database but it has no place in a live website.

SQL’s problem is that it’s lossy. It loses any information the programmer might have about how a query should be executed. We spend all day programming every other part of a website, but when it comes to the database, DB vendors have us convinced, “Hands off, you can’t be of any help here.” In a website, we want predictable performance, and a lossy language like SQL does not lead to predictability.

Instead we’re left to depend on query planners and optimizers to find the right way to run the query and just hope the result isn’t terrible. Most of the time it isn’t.

What if I want to say this to my database? “Database, do this query using this index on these columns. If the index isn’t there, do me a favor and just fail that query rather than taking down the whole database trying to execute it without the index. Thanks Database.” SQL can’t provide this control, but it would help in the web environment.
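
To show the kind of interface I mean, here is a toy in-memory sketch. It is not the API of any real database. The Table class, find_using_index and MissingIndex are all hypothetical names. The caller names the index, and the query fails fast if that index doesn’t exist.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]


class MissingIndex(Exception):
    """Raised instead of silently falling back to a full scan."""


class Table:
    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows: List[Row] = list(rows)
        # column names -> {key values -> matching rows}
        self._indexes: Dict[Tuple[str, ...], Dict[Tuple[Any, ...], List[Row]]] = {}

    def create_index(self, *columns: str) -> None:
        index: Dict[Tuple[Any, ...], List[Row]] = {}
        for row in self._rows:
            key = tuple(row[c] for c in columns)
            index.setdefault(key, []).append(row)
        self._indexes[columns] = index

    def find_using_index(self, columns: Sequence[str], key: Sequence[Any]) -> List[Row]:
        """Run the lookup through the named index, or refuse to run at all."""
        index = self._indexes.get(tuple(columns))
        if index is None:
            raise MissingIndex(f"no index on {tuple(columns)}")
        return list(index.get(tuple(key), []))


users = Table([
    {"id": 1, "city": "NYC", "name": "Ann"},
    {"id": 2, "city": "SF", "name": "Bob"},
])
users.create_index("city")
```

Calling users.find_using_index(["city"], ["NYC"]) goes straight to the index, while users.find_using_index(["name"], ["Ann"]) raises MissingIndex rather than scanning every row.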

There’s no reason that most NoSQL databases couldn’t implement a SQL interface to their data, but they’re right not to. It’s fine for the interface to require some database-specific programming, especially if it comes with reduced risk of performance problems. We’re programmers, we can handle it.

My new database

At the bar the other night, Etienne asked Dan and me, “If you were starting a new web application tomorrow, which of the current databases would you use?” Dan works for 10gen, makers of MongoDB, so his answer is fine, but obviously biased. My answer: none of them.

All of the current set of databases have significant shortcomings:

  • PostgreSQL is cool, but lacks (stable) replication
  • MySQL is cool, but you have to code your own sharding to scale it
  • Riak seems to have a good storage & sharding model, but the interface is primitive and painful to work with
  • MongoDB has a great interface, but its ad-hoc sharding model & stories of data loss make it suspect as a system of record

If you’re wondering why there’s no mention of Oracle, stop reading and close the browser window now. You’re not invited to read the rest of this.

Instead, I would invent my own perfect database. At the core of my magical, utopian database is what I will call a graph-oriented star-schema. Star-schemas are traditionally used for data warehouses, but I believe they could be successfully applied to a graph-oriented database as well. A star schema has dimension tables and fact tables. A dimension contains the spectrum of values that a column can hold and the database would have special support for them. Dimensions can be scalar values or tuples of scalars.

Fact tables would have 3 types of columns:

  • dimensions (fully indexed)
  • json-ish structured data
  • blob

Fact tables are related to each other through the dimensions. Here’s where the “graph” part comes in. The storage and indexing are optimized not for aggregated queries but for navigating from one fact table to the next via shared dimensions.
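
Since this database doesn’t exist, the best I can offer is a toy in-memory sketch of the idea. Every name here (FactTable, insert, lookup, navigate) is hypothetical. Each fact carries its dimensions, a JSON-ish document and a blob, every dimension is indexed, and navigation hops from a fact in one table to the facts in another that share a dimension value.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional

Fact = Dict[str, Any]


class FactTable:
    def __init__(self, name: str, dimensions: Iterable[str]) -> None:
        self.name = name
        self.dimensions = tuple(dimensions)
        # One index per dimension: dimension value -> facts holding it.
        self._by_dim = {d: defaultdict(list) for d in self.dimensions}

    def insert(self, dims: Dict[str, Any], doc: Optional[dict] = None,
               blob: bytes = b"") -> Fact:
        fact = {"dims": dict(dims), "doc": doc or {}, "blob": blob}
        for d in self.dimensions:
            self._by_dim[d][dims[d]].append(fact)
        return fact

    def lookup(self, dimension: str, value: Any) -> List[Fact]:
        return list(self._by_dim[dimension].get(value, []))


def navigate(fact: Fact, target: FactTable, dimension: str) -> List[Fact]:
    """Follow a shared dimension from one fact into another fact table."""
    return target.lookup(dimension, fact["dims"][dimension])


orders = FactTable("orders", ["customer", "day"])
shipments = FactTable("shipments", ["customer", "carrier"])

order = orders.insert({"customer": "c1", "day": "2011-10-01"}, doc={"total": 30})
shipments.insert({"customer": "c1", "carrier": "ups"}, doc={"boxes": 2})
shipments.insert({"customer": "c2", "carrier": "usps"}, doc={"boxes": 1})

related = navigate(order, shipments, "customer")
```

Here related holds the one shipment for customer c1. The orders and shipments tables know nothing about each other beyond the customer dimension they share.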

MapReduce is available for large queries and runs in Lua or JavaScript or some other embedded language. The code is installed directly on the database servers like old school stored procedures. The database should have facilities built in for testing these functions against mock datasets.

And of course it should be linearly scalable and resilient to losing individual nodes. Fact tables are automatically sharded across nodes, probably by a specified dimension. Dimensions are probably stored on every node, but maybe an optimization is that dimensions are stored only on the nodes where related facts are stored.
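
Sharding by a specified dimension could start out as simply as the sketch below. The shard_for function is hypothetical, and plain hash-then-modulo is just the easiest placement rule to show. It moves most keys whenever the node count changes, so a real system would want something like consistent hashing instead.

```python
import hashlib


def shard_for(dimension_value: str, node_count: int) -> int:
    """Map a dimension value to a node number in [0, node_count)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Use a stable hash: Python's built-in hash() varies between processes.
    digest = hashlib.sha256(dimension_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count
```

Every fact with the same value for the sharding dimension lands on the same node, so a lookup by that dimension only has to ask one node.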

Sounds good amiright? If you give me $1 million in funding, I’d be happy to build it for you. I’ll even let you name it!