localghost.io_

OSS development, a continuous lesson in humility

Yesterday I recieved a bug report on NEST, my .net client for elasticsearch. Now I've received many bug reports in the last 2 years and on several occasions these bug reports made me feel humiliated to the bone.

Yes, I wrote that and yes, that's pretty stupid

Now I am only human, my first reaction when I receive a pretty severe bug report is: Surely this can't be true and Come on! I'm not that stupid! but 90% of the time I investigate and can only conclude: Yes it IS true and Yes I AM that stupid.

Yesterday's bug report, was a prime example of this.

Async requests are not really async
------
Hi,
I noticed that SearchAsync is making synchronous calls in reality.
I think the fault might be with this code (DoAsyncRequest):

if (tcs.Task != null && tcs.Task.Result != null)

As it accesses the Task's Result, we wait for the task to complete...

Fuck me thats pretty severe! Surely this can't be true and Come on! I'm not that stupid! Looking at the code though:

var tcs = new TaskCompletionSource<ConnectionStatus>();
this.Iterate(this._AsyncSteps(request, tcs, data), tcs);
if (tcs.Task != null && tcs.Task.Result != null)
{
    tcs.Task.Result.Request = data;
    tcs.Task.Result.RequestUrl = request.RequestUri.ToString();
    tcs.Task.Result.RequestMethod = request.Method;
}
return tcs.Task;

Damnit! Yes it IS true and Yes I AM that stupid. Fair enough the fix seems easy enough, return a continuation on the task instead.

var tcs = new TaskCompletionSource<ConnectionStatus>();
this.Iterate(this._AsyncSteps(request, tcs, data), tcs);
return tcs.Task.ContinueWith(t =>
{
    cs.Task.Result.Request = data;
    tcs.Task.Result.RequestUrl = request.RequestUri.ToString();
    tcs.Task.Result.RequestMethod = request.Method;
    return tcs.Task.Result;
});

Nest's comes with a demo indexer that tests the troughput of the different IConnection implementations. The code itself looks like this:

var partitionedMessages = msgGenerator.Generate(numMessages).Partition(bufferSize);
Interlocked.Exchange(ref NumSent, 0);
foreach (var messages in partitionedMessages)
{
    var t = client.IndexManyAsync(messages, indexName, bulkParms);
    tasks.Add(t);

    Interlocked.Add(ref NumSent, bufferSize);
    if (NumSent % 10000 == 0)
    {
        Console.WriteLine("Sent {0:0,0} messages to {1}", NumSent, indexName);
    }
}
Task.WaitAll(tasks.ToArray());

msgGenerator generates new Message()'s' without doing ANY IO. So I really should be seeing multiple requests being being processed at the same time.

Yet fiddlers timeline shows me something pretty horrific:

fiddler timeline first run

Somethings still blocking damnit.

For reference here's the throughput the test app reports:

consoleoutputfirstrun

Fuck me these are not anywhere near the throughput numbers I reported 4 months ago

HTTP: 12123 docs/s
Thrift: 17575 docs/s

Mother of regressions!

Looking at the "fixed code" again:

var tcs = new TaskCompletionSource<ConnectionStatus>();
this.Iterate(this._AsyncSteps(request, tcs, data), tcs);
return tcs.Task.ContinueWith(t =>
{
    cs.Task.Result.Request = data;
    tcs.Task.Result.RequestUrl = request.RequestUri.ToString();
    tcs.Task.Result.RequestMethod = request.Method;
    return tcs.Task.Result;
});

The TaskCompletionSource still blocks while it coordinates the async steps. Let's try that again:

var tcs = new TaskCompletionSource<ConnectionStatus>();
if (!this._ResourceLock.WaitOne(timeout))
{
    var m = "Could not start the operation before the timeout of " + timeout + "ms completed while waiting for the semaphore";
    tcs.SetResult(new ConnectionStatus(new TimeoutException(m)));
    return tcs.Task;
}
try
{
    return Task.Factory.StartNew(() =>
    {
        this.Iterate(this._AsyncSteps(request, tcs, data), tcs);
        return tcs.Task.Result;
    }, TaskCreationOptions.LongRunning);
}
finally
{
    this._ResourceLock.Release();
}

Here I explicitly start a new Task and on top of that moved the semaphore out of _AsyncSteps() so we delay the creation of Task<T>'s to the very last second. The TaskCreationOptions.LongRunning is also key here to prevent threadpool starvation! The TPL will prefer to really start a new Thread for your task not take one from the threadpool.

Let's see how we are doing now:

fiddler timeline second run

Yes! that's more like it!

consoleoutputsecondrun

Nice, we actually improved throughput from 12k 4 months ago to 17k now. Awesome!

This is why developing in the open is so awesome, it keeps you humble and grounded benefiting both your sanity and the quality of the project.

These fixes are pushed to nuget and in Nest 0.9.20.0

Already 5 comments

doronyundefined

Thanks for the quick resolution of the issue! I love your library and it's awesome you maintain it so rigorously :)
I wanted to ask, though, why are you implementing your own synchronization code, instead of relying on built-in stuff like HttpClient.PostAsync? Won't that make your code simpler and easier to maintain?

Mpdreamzundefined

Thanks for the kind words! Mostly because Iwanted to be in full control. Most of the code was written back when nest still supported .net 3.5. Somewhere last year i bumped to 4.0 but that still left HttpClient out of the picture, the other wrapper WebClient async stuff is event based which I personally think is more complicated then HttpWebRequest Begin_/End_ pattern.

It might be fun excersize to make a Nest.Connection.HttpClient45 project that has an IConnection that uses HttpClient though :)

inhiesundefined

I just wanted to say I've missed you man, and hope you are doing well! I remember working with you was what really got me started programming <3

-Insert Name Here

Mpdreamzundefined

@inhies wow, how awesome to hear from you man! So many fond memories of hacking on things over IRC with you 😄
We should catch up virtually, I don't really hang on IRC anymore but suggest a spot!

Super curious to hear where live took you this past decade :)

inhiesundefined

@Mpdreamz I'm so sorry for not getting back with you! Do you use Discord at all? I'm always on there now. IRC got too clunky for me these days, discord is modern IRC++ :)

Comments on this post are linked to this post's issue page on GitHub

<< back to index

OSS development, a continuous lesson in humility

Written by Mpdreamz