BuiltInEthernet issue

Back to your original post… Did you mean it does not happen with EMX?

That’s correct.

Check your power supply and regulator.

What firmware version are you running on the EMX and the G120E?

Roughly how long does it take for the G120E to stop responding to pings, or to respond more slowly? How long does it take for the G120E to return to normal behavior, if it does? Does it appear to be cyclical at all? Like working for 2 minutes, bad for 1, good for 2, bad for 1, etc.

Was this reproduced on a second G120E and also a second PC?

Firmware version: 4.3

It fails to respond to one ping roughly every 1.5 - 2 minutes while this application is running:
// Usings assumed for NETMF 4.3 with the GHI libraries.
using System.Threading;
using Microsoft.SPOT.Hardware;
using GHI.Networking;

namespace MCA_IV
{
    public static class Program
    {
        static EthernetBuiltIn Ethernet = new EthernetBuiltIn();
        private static PWM _pwmHeartbeat;
        static double _PWMTimer1Freq = 100000; // PWM frequency in Hz

        public static void Main()
        {
            _pwmHeartbeat = new PWM(GHI.Pins.G120E.PwmOutput.P3_26, _PWMTimer1Freq, 0.50, false);
            Ethernet.Open();
            Ethernet.EnableStaticIP("172.16.2.14", "255.255.255.0", "1.0.0.0");

            // Start and stop the heartbeat PWM, alternating every 10 ms.
            while (true)
            {
                Thread.Sleep(10);
                _pwmHeartbeat.Start();
                Thread.Sleep(10);
                _pwmHeartbeat.Stop();
            }
        }
    }
}

but sometimes it times out sooner. It only misses one ping and then goes back to normal.
I can’t really tell if it is cyclical. With my real application it happens much more frequently. It gets worse when I turn on the UI thread, which runs every 10 ms and refreshes the screen.
I can reproduce it on other G120Es. It does not happen on the EMX at all.
It also happens with two G120Es connected to each other. I have not tried another PC.

What is the command you’re executing for ping? Is it firmware 4.3.8.1 for both EMX and G120E?

It sounds like the ping packet is just lost, so no amount of timeout will work. I’d use Wireshark or a similar tool to examine the ethernet traffic and look for the ping packets to see if anything stands out.

The command for ping: ping -t 172.16.2.14

Yes, the firmware for both the EMX and the G120E is 4.3.8.1.

I will look at the ethernet traffic and let you know if I see anything abnormal.
Thanks.

@John

Would starting and stopping PWM, or updating a screen every ten milliseconds, cause interrupts to be masked, resulting in the loss of ICMP messages? A race condition?

@ssalmi
Why update the screen every 10 ms? Do you need a refresh rate of 100 frames per second?

Losing an ICMP packet and then recovering is not a major problem. I thought that once the packet was lost, the device stopped responding. Your heartbeat processing should handle an occasional lost heartbeat.

On a busy network, yes, it is possible. There is some internal buffering, but I’m not sure how much data it can hold.

Or a very busy device?

John,
Now that I think about it, sometimes, instead of a ping timeout, it took 2-3 seconds to answer the ping. Do you still think the packets are getting lost?

Mike,

I’ve increased the screen refresh interval to 100 ms, but the problem still exists.
The 2-3 minute interval between missed pings is for the very simple code. Misses are much more frequent in my original application, where ping responses either take seconds or time out.
In the application, one G120E sends a ping every 5 seconds, and if it does not receive 3 consecutive ping responses it assumes the other G120E has lost communication or is not present.
Although the G120E is present, this problem (3 consecutive ping timeouts) occurs about every 5 minutes.
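
For readers following along, the presence check described here can be sketched as a consecutive-miss counter. This is only an illustration in plain desktop C#, not the poster's actual code: `HeartbeatMonitor`, `Report`, and the sample reply sequence are hypothetical names and data, and on NETMF `Console.WriteLine` would need to be swapped for `Debug.Print`.

```csharp
using System;

// Hypothetical helper: tracks consecutive missed ping replies.
public class HeartbeatMonitor
{
    private readonly int _maxMisses;
    private int _consecutiveMisses;

    public HeartbeatMonitor(int maxMisses)
    {
        if (maxMisses < 1) throw new ArgumentOutOfRangeException("maxMisses");
        _maxMisses = maxMisses;
    }

    // Call once per ping attempt. Returns true while the peer is still
    // considered present, false once maxMisses replies in a row were missed.
    public bool Report(bool replyReceived)
    {
        _consecutiveMisses = replyReceived ? 0 : _consecutiveMisses + 1;
        return _consecutiveMisses < _maxMisses;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Threshold of 3, matching the scheme described in the post.
        var monitor = new HeartbeatMonitor(3);

        // Made-up reply sequence: one ping attempt per element.
        bool[] replies = { true, false, false, true, false, false, false };
        foreach (bool reply in replies)
        {
            Console.WriteLine(monitor.Report(reply) ? "present" : "lost");
        }
    }
}
```

With the sample sequence, only the final attempt reports "lost", because a single received reply resets the counter. Raising the threshold tolerates more stray misses but delays detection of a peer that really is down by one ping interval per extra miss.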

Do you have a GHI board that you use to test? Just to rule out the custom hardware.

No, unfortunately I do not have a GHI board, but the same custom board is used for both the EMX and the G120E.

To clarify: is it only ever just late replies to pings or is there sometimes never a reply?

Load on the device can certainly cause degraded network performance, but we’re not doing much in this situation. Given that the problem is exacerbated in your full application it does seem like the device is doing too much to respond quickly.

Load on the network can also cause degraded performance and it’s possible the G120 firmware isn’t as resilient to that as the EMX firmware. Is the PC connected to another network (perhaps via Wi-Fi or another ethernet connection) or is it just connected to the device?

Were you able to get the Wireshark trace?

I set the ping timeout on my PC to 10 seconds and I still get timeouts. Sometimes, in addition to timeouts, I see long response times of 2-3 seconds.
Here are some of the images from the Wireshark trace:


Here it looks like there was no response.

In the next image the response was sent to the wrong address:


My PC is directly connected to the G120E through Ethernet. There is no other Ethernet connection, but the PC is connected to another network over Wi-Fi. However, I turned Wi-Fi off and it didn’t fix it.

I do understand the point, but why does the same code (both the small test and the real application) not affect the EMX? Is the device doing something in the background that we can’t see?

Thanks for the trace captures. There is other stuff going on in the firmware beyond your application like interrupts and timers. Since the EMX apparently doesn’t show the issue, there is likely a difference in their firmwares (they do not share the same codebase). For what it’s worth, we are able to reproduce delayed ping response here, but not the lost response (which appears to go to the wrong address?).

While neither case is ideal, is it possible to tweak the timeout window or count in your full application such that you allow for an errant delayed or missed ping?

John,
Thanks for your reply.
I spent some time yesterday and today trying to tweak the application, but the rate of ping timeouts is so high that increasing the number of ignored timeouts would defeat the purpose of the check (monitoring the other system). If we go too long without any ping, we won’t know whether the system is shut down, is in error mode, or is just not responding.

The issue does seem to scale with the load on the system. Is there some tight inner loop that you can break up? Perhaps making things more event driven? If the system is allowed to be idle for some periods that may help. Of course, this depends on your application.

If only to test, in whatever inner loops you have, play around with adding some sleeps of increasing duration up to a few hundred milliseconds to see if the ping issue is reduced, if possible.
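
The sleep experiment suggested above could be run as a simple sweep. This is a rough sketch in plain C#, under the assumption that the application has one main inner loop: `DoOneIteration` is a hypothetical stand-in for that loop body, and the sleep values and the 60-second run per step are arbitrary. On NETMF, `Console.WriteLine` would be replaced with `Debug.Print`.

```csharp
using System;
using System.Threading;

public static class SleepSweep
{
    // Hypothetical stand-in for one pass of the application's inner loop.
    static void DoOneIteration()
    {
    }

    // Runs the loop for runSeconds with a fixed sleep after every iteration.
    static void RunWithSleep(int sleepMs, int runSeconds)
    {
        DateTime end = DateTime.UtcNow.AddSeconds(runSeconds);
        while (DateTime.UtcNow < end)
        {
            DoOneIteration();
            Thread.Sleep(sleepMs); // give the system some idle time each pass
        }
    }

    public static void Main()
    {
        // Sleep durations to try, in milliseconds (arbitrary example values).
        int[] sleepsMs = { 10, 50, 100, 200, 300 };
        foreach (int sleepMs in sleepsMs)
        {
            Console.WriteLine("Sleep " + sleepMs + " ms: count missed pings for the next 60 s");
            RunWithSleep(sleepMs, 60);
        }
    }
}
```

While each step runs, keep `ping -t 172.16.2.14` going on the PC and note how many replies are late or missing. If the miss rate drops as the sleep grows, that points toward load on the device.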

Hello,
We ran into a similar issue on our project a year ago - Remove socket's supply with c# code - #13 by przemo
Network communication was not reliable; there were many retransmissions and lost packets.
We tried really hard to solve it, but with no result.
So my opinion is that a G120E driving a 4.3" TFT display is not a combination that you can use for a commercial network product.
