Thank you, Orbitz, for posting [Erlang vs.] Java and Threads (Jetty):
The basic idea is, instead of using 1 thread per connection, since connections can last a while, they use 1 thread per request that a connection has. The hope being, a connection will idle most of the time and only send requests once in a while. The problem that they ran into is, a piece of software is using a request timeout to poll for data. So requests are now sticking around for a long time, so they have all these active threads that they don’t want. So to deal with this, they use a concept of continuations so the thread can die but the request can still hang around, and then once it’s ready to be processed a thread is created again and the request is handled. So having all these requests hanging around that aren’t doing anything is no longer a problem.
Well, this begs the question, why are you using a dynamic number of threads in the first place if you are going to have to limit how many you can even make? If the problem, in the first place, is they have too many threads running, then their solution works only for idle threads, doesn’t it? Being forced to push some of the requests to a continuation means they have applied some artificial limit to the number of threads which can be run. What happens, then, when the number of valid active requests exceeds this limit? What then? Push active requests to a continuation and get to them when you have time? Simply don’t let the new requests get handled? If they want to use threads to solve their problem, then putting a limit on them seems to make the choice of threads not a good one. To poorly paraphrase Joe Armstrong, are they also going to put a limit on the number of objects they can use? If threads are integral to solving your problem, then it seems as though you are limiting how well you can solve the problem.
This also got me thinking about other issues involving threading in non-concurrent orientated languages. Using a COL (Concurrent Orientated Language) all the time would be nice (and I hope that is what the future holds for us). But today, I don’t think it is always practical. We can’t use Erlang or Mozart or Concurrent ML for every problem due to various limiting factors. But by the same token, using threads in a non-COL sometimes makes the solution to a problem a bit easier to work with. At the very least, making use of multiple processors sounds like a decent argument. But writing code in, say, Java, as if it was Erlang does not work out. I think the best one can hope to do is a static number of threads. Spawning and destroying threads dynamically in a non-COL can be fairly expensive in the long run and you have to avoid situations where you start up too many threads. I think having a static number of threads in a pool or with each doing a specific task is somewhat the “best of both worlds”. You get your concurrency and you, hopefully, avoid situations like Jetty is running into. As far as communication between the threads is concerned, I think message passing is the best one can hope for. The main reason I think one should use message passing in these non-COLs is, it forces all of the synchronization to happen in one localized place. You can, hopefully, avoid deadlocks this way. And if there is an error in your synchronization, you can fix it in one spot and it is fixed everywhere. As opposed to having things synchronized all over the code, where god knows where you may have made an error.
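To make the “static number of threads plus message passing” idea concrete, here is a minimal Java sketch. It is only an illustration under my own assumptions: the class `StaticWorkerPool`, the `STOP` sentinel, and the `handled:` reply format are made-up names, and none of this is how Jetty actually works. The point is that the two queues are the only place where threads synchronize.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: class and method names are made up for this sketch.
final class StaticWorkerPool {
    // Reserved message that tells one worker to stop.
    private static final String STOP = "__stop__";

    // The two queues are the only state the threads share.
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> outbox = new LinkedBlockingQueue<>();
    private final List<Thread> workers = new ArrayList<>();

    StaticWorkerPool(int workerCount) {
        // The thread count is fixed up front; nothing is spawned per request.
        for (int i = 0; i < workerCount; i++) {
            Thread t = new Thread(this::workLoop, "worker-" + i);
            workers.add(t);
            t.start();
        }
    }

    private void workLoop() {
        try {
            while (true) {
                String request = inbox.take();     // blocks while there is no work
                if (STOP.equals(request)) {
                    return;
                }
                outbox.put("handled:" + request);  // reply with a message, not a shared field
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Sends every request, collects one reply per request, then stops the workers.
    // Replies are sorted because workers may finish in any order.
    static List<String> demo(int workerCount, List<String> requests) {
        StaticWorkerPool pool = new StaticWorkerPool(workerCount);
        List<String> replies = new ArrayList<>();
        try {
            for (String r : requests) {
                pool.inbox.put(r);
            }
            for (int i = 0; i < requests.size(); i++) {
                replies.add(pool.outbox.take());
            }
            for (int i = 0; i < workerCount; i++) {
                pool.inbox.put(STOP);              // one stop message per worker
            }
            for (Thread t : pool.workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        Collections.sort(replies);
        return replies;
    }
}
```

Calling `StaticWorkerPool.demo(3, List.of("a", "b"))` runs two requests through three long-lived workers. If the synchronization were ever wrong, the queues would be the one spot to look at.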
…although it seems not all his readers agree on what he meant by “concurrent oriented languages”.
I strongly concur that languages *such as* Erlang (I’m saying *such as* because Erlang got the concept right, and other languages/platforms/technologies may follow) will lead the way into the future, or at least make the transition easier.
What the hell is Erlang anyway? Well:
Joe Armstrong had fault tolerance in mind when he designed and implemented the Erlang programming language in 1986, and he was subsequently the chief software architect of the project which produced Erlang/OTP, a development environment for building distributed real-time high-availability systems. More recently Joe wrote Programming Erlang: Software for a Concurrent World. He currently works for Ericsson AB where he uses Erlang to build highly fault-tolerant switching systems.
Erlang is a concurrent functional programming language. Basically there are two models of concurrency:
- Shared state concurrency
- Message passing concurrency
Virtually all languages use shared state concurrency. This is very difficult and leads to terrible problems when you handle failure and scale up the system.
Erlang uses pure message passing concurrency. Very few languages do this. Making things scalable and fault-tolerant is relatively easy.
Erlang is built on the ideas of:
- Share nothing. (Processes cannot share data in any way. Actually, this is not 100% true; there are some small exceptions.)
- Pure message passing. (Copy all data you need in the messages, no dangling pointers.)
- Crash detection and recovery. (Things will crash, so the best thing to do is let them crash and recover afterwards.)
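For readers coming from Java, the three ideas above can be roughly imitated with a thread, a mailbox queue, and a supervising loop. This is a hedged sketch, not Erlang semantics: `LetItCrashSketch`, `runSupervised`, and the `STOP` value are hypothetical names of mine, and a Java thread is far heavier than an Erlang process. The worker shares nothing but its mailbox, receives immutable `Integer` messages, and is simply restarted when it crashes.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: all names here are made up for this sketch.
final class LetItCrashSketch {
    // Reserved message that tells the worker to stop cleanly.
    static final int STOP = Integer.MIN_VALUE;

    // Runs a worker on the mailbox and restarts it each time it crashes.
    // Returns how many restarts were needed.
    static int runSupervised(BlockingQueue<Integer> mailbox, List<Integer> results) {
        int restarts = 0;
        while (true) {
            AtomicBoolean crashed = new AtomicBoolean(false);
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        int msg = mailbox.take();
                        if (msg == STOP) {
                            return;
                        }
                        if (msg < 0) {
                            // No defensive handling: let the worker crash.
                            throw new IllegalArgumentException("bad message: " + msg);
                        }
                        results.add(msg * 2);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Crash detection: the supervisor only records that the worker died.
            worker.setUncaughtExceptionHandler((t, e) -> crashed.set(true));
            worker.start();
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return restarts;
            }
            if (!crashed.get()) {
                return restarts;   // clean stop
            }
            restarts++;            // recovery: start a fresh worker on the same mailbox
        }
    }
}
```

Note that the message that caused the crash is dropped, and the fresh worker carries on with whatever is still in the mailbox. Only one worker runs at a time, so the plain results list is never touched by two threads at once.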
Erlang processes are very lightweight (lighter than threads) and the Erlang system supports hundreds of thousands of processes.
It was designed to build highly fault-tolerant systems. Ericsson has managed to achieve nine 9’s reliability [99.9999999%] using Erlang in a product called the AXD301. [Editor’s Note: According to Philip Wadler, the AXD301 has 1.7 million lines of Erlang, making it the largest functional program ever written.]
While people talk about 16, 32, or 64 bits and limit their “stuff” (whatever it is: threads, objects, RAM, …) accordingly, in Erlang there is no such hard limit in the programming model.
An Erlang system can grow as big as it needs to, provided you give it *enough resources*. That means the *same* Erlang program can run on 1 node on a single workstation, or on 1,000 servers spread across different buildings (or continents). The programmer doesn’t have to care.
How much RAM is available? How many sockets can be open? Such limits don’t depend on the programmer, and hopefully the programmer won’t need to care about them. The one who will care is whoever deploys and runs the Erlang program.
Most people still think of programming (and worse, think of Erlang) in terms of procedural languages, with things such as threading, or a threading framework, built on top.
Erlang, on the other hand, is a sort of kernel (which is why it’s called a VM: not simply an interpreter but a real VM that manages processes the way an OS manages OS processes). Code runs inside lightweight processes. A process may run in its own Erlang VM node, in a different VM node on the same server, or on another server. The program doesn’t really care that much (it can care, but it doesn’t have to use a “distributed framework” the way other languages do).
More information about this exciting language:
Update: Some frameworks, in particular message queueing systems (e.g. Microsoft’s and Sun Java’s), I think got it right… but on a more complicated, heavyweight level. Erlang/OTP is, under the hood, a message queueing system, but much lighter on the CPU… and much lighter on the programmer’s brain. 😉
Update 2: As of now I still don’t know what OTP stands for 😉