OSS development, a continuous lesson in humility
Yesterday I recieved a bug report on NEST, my .net client for elasticsearch. Now I've received many bug reports in the last 2 years and on several occasions these bug reports made me feel humiliated to the bone.
Yes, I wrote that and yes, that's pretty stupid
Now I am only human, my first reaction when I receive a pretty severe bug report is: Surely this can't be true and Come on! I'm not that stupid! but 90% of the time I investigate and can only conclude: Yes it IS true and Yes I AM that stupid.
Yesterday's bug report, was a prime example of this.
Async requests are not really async
------
Hi,
I noticed that SearchAsync is making synchronous calls in reality.
I think the fault might be with this code (DoAsyncRequest):
if (tcs.Task != null && tcs.Task.Result != null)
As it accesses the Task's Result, we wait for the task to complete...
Fuck me thats pretty severe! Surely this can't be true and Come on! I'm not that stupid! Looking at the code though:
var tcs = new TaskCompletionSource<ConnectionStatus>();
this.Iterate(this._AsyncSteps(request, tcs, data), tcs);
if (tcs.Task != null && tcs.Task.Result != null)
{
tcs.Task.Result.Request = data;
tcs.Task.Result.RequestUrl = request.RequestUri.ToString();
tcs.Task.Result.RequestMethod = request.Method;
}
return tcs.Task;
Damnit! Yes it IS true and Yes I AM that stupid. Fair enough the fix seems easy enough, return a continuation on the task instead.
var tcs = new TaskCompletionSource<ConnectionStatus>();
this.Iterate(this._AsyncSteps(request, tcs, data), tcs);
return tcs.Task.ContinueWith(t =>
{
cs.Task.Result.Request = data;
tcs.Task.Result.RequestUrl = request.RequestUri.ToString();
tcs.Task.Result.RequestMethod = request.Method;
return tcs.Task.Result;
});
Nest's comes with a demo indexer that tests the troughput of the different IConnection implementations. The code itself looks like this:
var partitionedMessages = msgGenerator.Generate(numMessages).Partition(bufferSize);
Interlocked.Exchange(ref NumSent, 0);
foreach (var messages in partitionedMessages)
{
var t = client.IndexManyAsync(messages, indexName, bulkParms);
tasks.Add(t);
Interlocked.Add(ref NumSent, bufferSize);
if (NumSent % 10000 == 0)
{
Console.WriteLine("Sent {0:0,0} messages to {1}", NumSent, indexName);
}
}
Task.WaitAll(tasks.ToArray());
msgGenerator generates new Message()'s' without doing ANY IO. So I really should be seeing multiple requests being being processed at the same time.
Yet fiddlers timeline shows me something pretty horrific:
Somethings still blocking damnit.
For reference here's the throughput the test app reports:
Fuck me these are not anywhere near the throughput numbers I reported 4 months ago
HTTP: 12123 docs/s
Thrift: 17575 docs/s
Mother of regressions!
Looking at the "fixed code" again:
var tcs = new TaskCompletionSource<ConnectionStatus>();
this.Iterate(this._AsyncSteps(request, tcs, data), tcs);
return tcs.Task.ContinueWith(t =>
{
cs.Task.Result.Request = data;
tcs.Task.Result.RequestUrl = request.RequestUri.ToString();
tcs.Task.Result.RequestMethod = request.Method;
return tcs.Task.Result;
});
The TaskCompletionSource still blocks while it coordinates the async steps. Let's try that again:
var tcs = new TaskCompletionSource<ConnectionStatus>();
if (!this._ResourceLock.WaitOne(timeout))
{
var m = "Could not start the operation before the timeout of " + timeout + "ms completed while waiting for the semaphore";
tcs.SetResult(new ConnectionStatus(new TimeoutException(m)));
return tcs.Task;
}
try
{
return Task.Factory.StartNew(() =>
{
this.Iterate(this._AsyncSteps(request, tcs, data), tcs);
return tcs.Task.Result;
}, TaskCreationOptions.LongRunning);
}
finally
{
this._ResourceLock.Release();
}
Here I explicitly start a new Task and on top of that moved the semaphore out of _AsyncSteps() so we delay the creation of Task<T>'s to the very last second. The TaskCreationOptions.LongRunning is also key here to prevent threadpool starvation! The TPL will prefer to really start a new Thread for your task not take one from the threadpool.
Let's see how we are doing now:
Yes! that's more like it!
Nice, we actually improved throughput from 12k 4 months ago to 17k now. Awesome!
This is why developing in the open is so awesome, it keeps you humble and grounded benefiting both your sanity and the quality of the project.
These fixes are pushed to nuget and in Nest 0.9.20.0
Already 5 comments
Mpdreamzundefined
Thanks for the kind words! Mostly because Iwanted to be in full control. Most of the code was written back when nest still supported .net 3.5. Somewhere last year i bumped to 4.0 but that still left HttpClient out of the picture, the other wrapper WebClient async stuff is event based which I personally think is more complicated then HttpWebRequest Begin_/End_ pattern.
It might be fun excersize to make a Nest.Connection.HttpClient45 project that has an IConnection that uses HttpClient though :)
inhiesundefined
I just wanted to say I've missed you man, and hope you are doing well! I remember working with you was what really got me started programming <3
-Insert Name Here
Comments on this post are linked to this post's issue page on GitHub
doronyundefined
Thanks for the quick resolution of the issue! I love your library and it's awesome you maintain it so rigorously :)
I wanted to ask, though, why are you implementing your own synchronization code, instead of relying on built-in stuff like HttpClient.PostAsync? Won't that make your code simpler and easier to maintain?