Is Nginx a valid replacement for Apache?


I came across this post which, quite frankly, I consider to be complete and utter FUD. The author clearly at least tried Nginx or perused the documentation since he's got at least a passing familiarity with the software, but he seems to have missed the boat overall. I'm going to address his points one-by-one (at least as much as possible).

[Nginx] behaves in inexplicable ways for different browsers.

This is complete and utter FUD. Browsers behave in inexplicable and different ways given the same content, but all Nginx does is serve content. It doesn't create content, so this claim is patently ridiculous.

The primary documentation is in Russian.

More FUD. It is true that Igor (the author of Nginx) is Russian and writes his documentation in Russian (surprise!). However, every bit of that documentation has been translated to English (translation efforts into several other languages are ongoing as well, but I cannot attest to their completeness). The English documentation has existed for over two years (and is linked to from the "primary" documentation), so there's little excuse for not being aware of it.

Nginx does not support .htaccess files.

Absolutely true. It's considered a feature. .htaccess files are certainly convenient, but they also introduce a serious performance penalty: if you enable them, Apache must check every directory on every page request to see whether it contains an .htaccess file and whether that file has changed since it was last read. One of Nginx's claims to fame is that it is much faster than Apache. One of the ways it achieves this is by not doing things that add lots of overhead to every request when there are faster ways of achieving the same thing. Further, .htaccess files are a potential attack vector for hackers, although this isn't the main reason Nginx doesn't support them.
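To make that per-request cost concrete, here is a rough Python sketch of the files a server with .htaccess overrides enabled would have to probe for a single request. The document root, the request path and the function name are made up for illustration, and the model (one probe per directory from the document root down) is a simplification, not Apache's actual implementation.

```python
from pathlib import PurePosixPath


def htaccess_candidates(doc_root: str, request_path: str) -> list[str]:
    """List every .htaccess path probed for one request (simplified model).

    Assumption: one probe per directory, starting at the document root and
    ending at the directory that contains the requested file.
    """
    root = PurePosixPath(doc_root)
    # Directories between the document root and the requested file.
    parts = PurePosixPath(request_path.lstrip("/")).parent.parts
    probes = [str(root / ".htaccess")]
    current = root
    for part in parts:
        current = current / part
        probes.append(str(current / ".htaccess"))
    return probes


if __name__ == "__main__":
    # Hypothetical request for a small image nested three directories deep.
    for probe in htaccess_candidates("/var/www", "/blog/2008/images/logo.png"):
        print(probe)
```

Under this model, a request for one small image three directories deep costs four extra filesystem lookups before any content is served, and the same lookups repeat for every request.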

Nginx requires you to have apache support tools lying around to do stuff.

Sort of true, but ultimately false. It's true that Nginx does not ship with its own version of htpasswd. It could (and such a program is trivial to write), but for whatever reason it doesn't (perhaps the assumption is that most people have Apache installed, so it would be redundant). But of course the web is full of such tools, so you don't really need to install Apache just for this one tool. Also, the author says "tools", implying that something besides htpasswd is missing, which is highly misleading. There's nothing else missing.
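As a rough illustration of how little such a tool needs to do, here is a minimal Python sketch that emits one password-file line. It assumes an Nginx version whose auth_basic_user_file accepts the RFC 2307 style "{SHA}" scheme; the user name and password are placeholders and the function name is mine. Note that "{SHA}" is an unsalted SHA-1 hash, so this shows the file format and is not a recommendation for production use.

```python
import base64
import hashlib


def htpasswd_sha_line(user: str, password: str) -> str:
    """Build one 'user:{SHA}<base64 of SHA-1 digest>' password-file line.

    Assumption: the server reading the file understands the {SHA} scheme.
    The hash is unsalted, so treat this as a format demo only.
    """
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{user}:{{SHA}}{encoded}"


if __name__ == "__main__":
    # Placeholder credentials; append the output to your password file.
    print(htpasswd_sha_line("alice", "correct horse battery staple"))
```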

Nginx doesn't actually do anything beyond serve static HTML and binary assets... which is to say, it doesn't run php or perl or any of the other P's that you might find in the LAMP stack. What it does is take requests and proxies them to other servers that do know how to execute that code.

Once again, sort of true but highly misleading. The author suggests that all Nginx can do is proxy requests to another HTTP server. This is patently false. Nginx can use HTTP proxying or FastCGI to talk to backend applications. What it does not do is embed languages into the webserver or support CGI. There is a project to embed Python into Nginx (mod_wsgi), but it's not widely used, as most Nginx users consider this separation of concerns a good thing. With mod_php or mod_perl, it can be a real pain to debug things like memory leaks because the programming language's interpreter runs within the Apache process. Apache's mod_language tools make for easy deployment but painful debugging. Also, unless you want to restart the entire web server anytime you make a change, you must use .htaccess files (whose drawbacks were outlined above). CGI isn't supported simply because, as Igor says, "if you use CGI then you don't care about performance and you should just use Apache in that case". In any case, the difference between how Nginx handles dynamic content and how Apache does it boils down to how they pass the request along to another process (binary API vs HTTP/FastCGI).

The author sums up with this:

Finally, I am left with the question why? The ostensible reason is that it's faster and can therefore handle more requests. Even if we accept that as true (grumble, grumble), it only accomplishes that speed by passing the buck off to other servers. When you find a non-responsive site it's not because the static assets like images and HTML text are being served slowly... it's because the dynamic content generated by php/perl/python/ruby/whatever and the underly database from which the data is drawn cannot keep up. Nginx suffers that same failing... while requiring just as many resources because you now have to run so many different servers for each of the languages you want to code it.

Again, the author displays his complete and utter lack of understanding of a web stack and the difference between how Apache handles a request and how Nginx does it. Apache is a process-based server (threads being a type of process), whereas Nginx is asynchronous. Threaded programs can (but usually don't) perform as well as async programs if you are only measuring, say, requests per second under a small to average load. What they don't do nearly as well is scale. The reason threaded programs don't scale as well is that they typically launch a separate thread for each incoming request. Not only is there latency introduced as the thread is spawned (this can be addressed to some degree with thread pools, but that introduces other issues), but the worst part is that a thread consumes a significant amount of RAM. On 32 bit Linux, this amount is typically 2MB per thread (the default stack size). That's 2MB per concurrent request, even if you are serving a 2K page of static HTML or a small image. The thread can also allocate more RAM if it needs to, but let's assume it doesn't for simplicity.
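For a feel of the asynchronous model, here is a small Python sketch using asyncio. It is only an analogy for an event-driven server, not how Nginx is implemented; the handler, the 0.1 second wait and the request count are assumptions chosen for illustration. Each simulated request waits without blocking a thread, so the event loop keeps all of them in flight at once.

```python
import asyncio
import threading


async def handle_request(request_id: int) -> int:
    # Stand-in for waiting on a slow client or disk; no thread is blocked.
    await asyncio.sleep(0.1)
    return request_id


async def serve(concurrent_requests: int) -> tuple[list[int], int]:
    # Start every request at once and let the event loop interleave them.
    results = await asyncio.gather(
        *(handle_request(i) for i in range(concurrent_requests))
    )
    # Report how many OS threads exist while the requests were handled.
    return list(results), threading.active_count()


if __name__ == "__main__":
    results, threads = asyncio.run(serve(1000))
    print(f"handled {len(results)} requests using {threads} thread(s)")
```

Run as a plain script, this should report that the thread count did not grow with the number of requests, which is the point: concurrency costs a small amount of bookkeeping per request rather than a thread stack per request.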

Let's say your server has 1GB of RAM and we ignore the RAM used by the kernel, Apache itself, PHP/Perl/Python/etc and all the other system processes. How many concurrent requests can you handle? Pretty easy: 1024MB/2MB = 512. Now for most sites, that's a pretty decent amount of traffic. Of course, we've consumed all the memory to do this, so if we get 513 requests we're now forced into swap. In reality, due to the other software running on the system, we'd have much less available RAM to start with and we'd be forced into swap much, much sooner. Realistically, a 1GB Apache server could probably only handle around 100-200 simultaneous requests (much depends on the size of the response). Note that this is not the same as requests per second: with sub-second response times, 100-200 requests per second might only amount to 10-20 simultaneous requests.
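The arithmetic above can be checked with a few lines of Python. The 1024MB and 2MB figures come from the text; the 0.1 second response time is an assumption I picked to reproduce the 10-20 figure, and the function names are mine. The second function is Little's law: average concurrency equals arrival rate multiplied by the average time each request spends in the system.

```python
def max_threads_by_memory(ram_mb: float, stack_mb: float) -> int:
    """Upper bound on threads if each one reserves a full stack."""
    return int(ram_mb // stack_mb)


def concurrent_requests(requests_per_second: float,
                        seconds_per_request: float) -> float:
    """Little's law: average number of requests in flight."""
    return requests_per_second * seconds_per_request


if __name__ == "__main__":
    # 1GB of RAM and a 2MB default stack, as in the text.
    print(max_threads_by_memory(1024, 2))
    # Assumed 0.1 s response time at 100 and 200 requests per second.
    for rate in (100, 200):
        print(rate, round(concurrent_requests(rate, 0.1), 1))
```

The same formula shows why slow responses hurt: at 200 requests per second, a response time of 2 seconds instead of 0.1 means roughly 400 requests in flight, which is already close to the 512 thread ceiling.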

The author's claim that site slowness isn't caused by static files is probably true for 90% of the web. Of course, for sites that serve large files this is absolutely false. The longer a request takes to process, the more likely it is we'll start seeing a higher number of concurrent requests (since new requests come in faster than we can respond to them). This is why sites like youtube.com and torrentspy.com have switched to Nginx for serving static content. A single Nginx server can easily handle as many concurrent connections as the operating system limits allow (rather than being bound by memory limits). You don't need to be youtube.com to bring an Apache server to its knees if you serve many files that are more than a few kilobytes each. A single Nginx server can easily replace a dozen similarly configured Apache servers for handling static content.

So what if you aren't serving big files? What difference does this make to you? Well, as a result of its asynchronous approach, Nginx also has the benefit of utilizing far less CPU. It doesn't take much traffic to drive Apache up into the 70-100% CPU utilization range. If you can manage to drive Nginx past 20% CPU then you don't need to read this, as you certainly already have a team of scalability experts who can tell you the pitfalls of threading.

The claim that Nginx is faster because it "hands off" processing to some other server is simply ludicrous. Apache does the same thing. The only difference is the protocol they use to communicate with that other process. Apache uses a binary API to pass information to PHP, Perl, Python, etc. Nginx uses HTTP or FastCGI. The processing of dynamic content is still done by the other process, not Apache or Nginx. Further, because Nginx uses less CPU and RAM, there are more resources available for that other process to use to get its job done. Frankly, this assertion left me flabbergasted. It's actually really difficult for me to believe that someone could have such a poor understanding of such a simple software stack.

Finally, I'm going to address the author's grudging allowance that Nginx might be faster ("grumble, grumble") for static content by simply asserting that Nginx stomps an absolute mudhole in Apache when it comes to overall performance. We aren't discussing a few KB/s faster or a few requests/second faster, we are discussing hundreds to thousands of requests per second faster for static content (depending on your hardware), all while using a fraction of the CPU and RAM Apache does before it fails.

Anyway. I'm going to try to forget I read this pile of rubbish. If the author wants to believe that Apache is better, so be it. I just find it unfortunate that a certain percentage of clueless people will find the article informative.


