Threading.Timer - strange behaviour

Hello,

I have measured that System.Threading.Timer has a very strange scheduling behaviour on the G400.
Test setup: one thread runs at 100% CPU, toggling a GPIO (blue trace). This lets us see when the scheduler runs - every 20ms a gap appears in the trace.

I start 5 timers, each scheduled with a 100ms period, i.e.
T = 100,100,100,100,100
When they fire, they toggle a second GPIO once (magenta trace).
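For reference, the test harness is essentially the following (a simplified sketch - the pin assignments are placeholders for whatever is wired to the scope):

```csharp
using System.Threading;
using Microsoft.SPOT.Hardware;

public class TimerTest
{
    // Pin choices are placeholders - use whatever pins are wired to the scope.
    static readonly OutputPort BusyPin = new OutputPort(Cpu.Pin.GPIO_Pin0, false);
    static readonly OutputPort TimerPin = new OutputPort(Cpu.Pin.GPIO_Pin1, false);

    static bool _busyState;
    static bool _timerState;

    public static void Main()
    {
        // 100% CPU thread toggling a GPIO (blue trace): the ~20ms gaps show
        // when the scheduler takes the CPU away from it.
        new Thread(BusyLoop).Start();

        // Five timers, each with a 100ms period; every callback toggles a
        // second GPIO once (magenta trace) so each firing is visible.
        var timers = new Timer[5];
        for (int i = 0; i < timers.Length; i++)
            timers[i] = new Timer(OnTimer, null, 100, 100);

        Thread.Sleep(Timeout.Infinite);
    }

    static void BusyLoop()
    {
        while (true)
        {
            _busyState = !_busyState;
            BusyPin.Write(_busyState);
        }
    }

    static void OnTimer(object state)
    {
        _timerState = !_timerState;
        TimerPin.Write(_timerState);
    }
}
```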

As expected, on every fifth run of the scheduler, all five timers fire one after the other.

However, this is what happens if I reduce one timer’s period slightly, i.e.
T = 100,100,100,100,80
You can see that the timers now fire exactly one per scheduling event.

I have tried various combinations of timers and periods, and have found that you CANNOT exceed 1 timer elapsing every 20ms, or your timing is hopelessly lost. In fact, events are silently discarded.
E.g.:
T=40,40 works (0,2,0,2) and 40,20 does not. Should be 1,2,1,2 but is 1,1,1,1 (50% get lost!)
T=60,60,60 (0,0,3,0,0,3) works and 60,60,40 does not. Should be 0,1,2,1,0,3… but is 1,1,1,1

I don’t understand this restriction; it seems really arbitrary that the scheduler is willing to fire more than one timer per scheduling event in some cases, but not in others.

I tried to find the limit, and found that the number of timers fired per scheduling event cannot grow indefinitely - in fact 5 timers elapsing seems to be the maximum permitted per scheduling event.
If you try 10 timers, all with T = 200, you in fact get 5 fired simultaneously, then 1 each for the next 5 scheduling events. This is even more bizarre than I might have expected (5 at once, then 5 spread out).

I feel pretty strongly that this is a bug and not a design feature. When I schedule things, I expect my framework to make a best effort to achieve them, instead of silently giving up on my timing requirements and firing fewer events than I had requested.

Can I get some insight here please?
Thank you,
William

To be clear, the change I would propose is the following:

Current implementation: MAX 5 timers per scheduling event, with an AVERAGE of at most 1 per scheduling event.
Proposed implementation: MAX 5 timers per scheduling event, with an AVERAGE of up to 5 per scheduling event.

I think this follows the original intent of limiting the CPU time given to timers, while keeping consistent timing until the actual system “limit” of 5 per scheduling event is reached.

so, netmf is “open” now, right? Only people actively developing it are… well…

so, you could have a crack at it. You couldn’t for your G400 though. And GHI are building TinyCLR OS, so they aren’t going to incorporate some massive change into netmf when they have another stream to bring life to the platform.

If GHI isn’t actively developing the source for the G400, and the rest of netmf is “open”, when will GHI consider opening the G400 source - so those of us with products based on this platform can continue to use them?

Alternatively, what does the TinyCLR threading model look like for timers - does it exhibit the same behaviour, or can I look forward to TinyCLR on my G400 soon?

Have you tried adding a Thread.Sleep(0)?

The G400 is in full production and can be used with NETMF and TinyCLR. Neither is a real-time operating system; however, we offered RLP, and in TinyCLR we now offer even better native extensions. Through native code you can add real-time, speedy extensions to handle specific tasks.

What are you trying to accomplish?

@williamg, the behaviour of .NET timers is non-deterministic.

An 80ms period doesn’t mean “fire every 80ms”; it means “wait at least 80ms”. So if you have a timer set to 80ms, it could theoretically go off 10,000ms later!


I have experimented with Thread.Sleep(0). It does force a context switch, as expected.
Can you explain how it would help in this case?
I’m not asking the scheduler to run at a granularity of <20ms; I’m asking it to fire all the timers which have elapsed, instead of choosing to fire exactly one timer per scheduling event in this failure mode.

I suppose your point is that if the scheduler ran more often, it would have more time slots in which to fire timers one at a time. That of course isn’t feasible for a real-world application where I have several threads running together; IMHO it’s not realistic to sprinkle Thread.Sleep(0) calls throughout the code hoping to trick the scheduler into doing the right thing.
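For clarity, this is the kind of change being suggested - an explicit yield point in every busy loop (just a sketch; DoSomeWork is a stand-in for the real per-iteration work):

```csharp
using System.Threading;

public class BusyWorker
{
    // What "add a Thread.Sleep(0)" amounts to: an explicit yield point in each
    // busy loop. Sleep(0) gives up the rest of this thread's quantum so the
    // scheduler can run sooner - but every hot loop has to do it.
    public static void Run()
    {
        while (true)
        {
            DoSomeWork();     // stand-in for the real per-iteration work
            Thread.Sleep(0);  // force a context switch back to the scheduler
        }
    }

    static void DoSomeWork() { /* application work goes here */ }
}
```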

Hi Gus, I feel like my needs are pretty basic - I want timers to elapse approximately when they were asked to, and to fire at the correct average rate, without discarding events arbitrarily.

I do understand this is not an RTOS (I come from FreeRTOS land), so I don’t expect guaranteed on-time execution. I simply expect that timers don’t change their firing behaviour based on how many I create and their relative timing.
I can accept a maximum number of fired timers per tick; I understand the cost of context switching. However, the limit on the average number of fired timers per tick seems very arbitrary, and is extremely limiting.

As I indicated before, e.g.:
1 timer @ 60ms = okay (0 0 1)
2 timers @ 60ms = okay (0 0 2)
3 timers @ 60ms = okay (0 0 3)
4 timers @ 60ms = fail - the timers fire in pseudo-random order, one every 20ms - meaning that every 60ms, one of the 4 timers never gets fired. Ever.
I have shown the system will choose to fire up to 5 timers in one scheduler tick - but in this case, for some seemingly arbitrary reason, it refuses to fire 4 timers in one scheduler tick.
To me, any choice would have been acceptable other than ‘just throw away timer events’, when clearly the system had the resources available to service them.

For my actual application, I will be running many parallel state machines which will be woken periodically to perform a brief function and go back to sleep (AutoResetEvent etc.). I was going to wake them with timers - but since I will have more than 5 timers, I am guaranteed that the timer system will not work for me in its current implementation.
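The pattern I had in mind is roughly the following (only a sketch with placeholder names - note that it needs one timer per state machine, which is exactly where the limit bites):

```csharp
using System.Threading;

public class PeriodicStateMachine
{
    readonly AutoResetEvent _wake = new AutoResetEvent(false);
    readonly Timer _timer;   // kept in a field so the timer is not garbage collected

    public PeriodicStateMachine(int periodMs)
    {
        // The timer callback only signals the event; the actual work happens
        // on the state machine's own thread, which sleeps in between.
        _timer = new Timer(delegate { _wake.Set(); }, null, periodMs, periodMs);
        new Thread(Run).Start();
    }

    void Run()
    {
        while (true)
        {
            _wake.WaitOne();   // sleep until the timer signals us
            Step();            // brief step of the state machine, then back to sleep
        }
    }

    void Step() { /* per-wakeup work for this state machine goes here */ }
}
```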

Sorry John, this is not correct.
It does not mean ‘wait at least’; it means ‘wait on average’.
I set a timer to run every 35ms. Of course this is impossible, since the scheduler only runs every 20ms.
Your assertion implies it would wait 40ms every time (the next scheduler tick at or after 35ms). This is not true, and here is the proof:

The timer fires at times:
0, 19.4, 58.2, 97, 136, 175, 194, 233, …

This gives time deltas of:
19.4, 38.8, 38.8, 39, 39, 19, 39, …

What is the average of this repeating pattern (19.4, 38.8, 38.8, 39, 39)? 19.4 + 38.8 + 38.8 + 39 + 39 = 175, and 175 / 5 = exactly 35.

Whether by definition or by accident, it’s unlikely that you’ll see a change in this behavior without creating your own modified firmware, which we already know isn’t possible for GHI’s legacy G400 firmware. Even if GHI wanted to change this, it would change legacy behavior, and while it would fix your issue, it would probably break other folks whose working code relies on the legacy expectations. In other words, probably ain’t happening.

So, an alternative is to write code in the style of an animation loop (which is what I have done to solve this). You maintain a list of tasks sorted by next run time (soonest first). You have a main loop that loops tightly, captures the current time, and examines the first task on the list. If that task’s expected run time has arrived, run its delegate and have it return the next desired run time. As you run each task, add it to a second list sorted by its next expected run time. Continue to walk down the list until you hit a task whose run time has not arrived, or the end of the list. At that point, merge the two lists. Then check the first item in the merged list and either start running again or sleep until that task’s expected run time.

What this does is ensure that every ‘ready’ task runs, while minimizing CPU and energy use during idle times. It can also detect when the set of ready tasks starts to go over budget (e.g., all tasks are always ready because their total run time is too large).

This is the core of the Sagitta (previously Verdant) runtime for drivers and agents. This code will be available as open source late in December, and I can post the essential bits here if desired.
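In the meantime, here is a rough sketch of the shape of that loop. This is not the Sagitta code - just an illustration with made-up names, using ArrayList and no error handling:

```csharp
using System;
using System.Collections;
using System.Threading;

// A task does a small piece of work and returns the next time it wants to run.
public delegate DateTime ScheduledTask(DateTime now);

public class CooperativeScheduler
{
    class Entry { public DateTime Due; public ScheduledTask Task; }

    // Pending tasks, kept sorted by due time, soonest first.
    readonly ArrayList _pending = new ArrayList();

    public void Add(ScheduledTask task, DateTime firstRun)
    {
        InsertSorted(_pending, new Entry { Due = firstRun, Task = task });
    }

    public void Run()
    {
        var ran = new ArrayList();   // tasks executed this pass, sorted by their next due time

        while (true)
        {
            DateTime now = DateTime.UtcNow;

            // Run every task at the head of the list whose due time has arrived.
            while (_pending.Count > 0 && ((Entry)_pending[0]).Due <= now)
            {
                var entry = (Entry)_pending[0];
                _pending.RemoveAt(0);
                DateTime next = entry.Task(now);    // the task reports when it wants to run next
                InsertSorted(ran, new Entry { Due = next, Task = entry.Task });
            }

            // Merge the just-run tasks back into the pending list.
            foreach (Entry r in ran) InsertSorted(_pending, r);
            ran.Clear();

            // Sleep until the soonest task is due; idle briefly if nothing is scheduled.
            int waitMs = 50;
            if (_pending.Count > 0)
            {
                TimeSpan until = ((Entry)_pending[0]).Due - DateTime.UtcNow;
                waitMs = (int)(until.Ticks / TimeSpan.TicksPerMillisecond);
            }
            if (waitMs > 0) Thread.Sleep(waitMs);
        }
    }

    static void InsertSorted(ArrayList list, Entry entry)
    {
        int i = 0;
        while (i < list.Count && ((Entry)list[i]).Due <= entry.Due) i++;
        list.Insert(i, entry);
    }
}
```

To use it, you Add() each task with its first run time and call Run() from your main thread; each task’s delegate does its bit of work and returns the next time it wants to run.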


Thank you for this. It seems a good alternative to Timers.
Are you using this for every thread in the system? I.e. you are not using Timers or Threads, but have instead constructed your own scheduler ‘on top of’ the existing system?
Or is this one Thread which runs all your scheduled events, and other Threads run in parallel?
I could see both solutions working; I’m just curious which you chose.

I actually avoid threads as much as possible, so as to avoid the cost of context switches and the additional non-determinism they introduce into execution times. Yes, this is a scheduler on top of the interpreter’s scheduler, and it does not require timers - just sleep. Overall, performance can be higher than with thread-based solutions.

This design allows you to schedule operations (sense, decide, act; uploads; etc.) and run them completely single-threaded. Since MCUs are single-core and netmf is a single-threaded interpreter, this eliminates a lot of the execution overhead that the syntactic sugar of ‘Threads’ creates.

And to answer more concisely: this is one thread, and there are no others. Even device-driver polling derives from this scheduler, though breaking that rule doesn’t keep it from working - it just adds cost and makes things less deterministic.

This is something we should investigate.