After I released the ESP8266-based OpenGarage, a weird bug kept bugging me: a small fraction of the controllers (let’s say 1 out of 10) would crash on restart, which leads to an infinite reboot cycle. Once this happens, there seems to be no way to stop it until you reflash the controller with NodeMCU firmware and then flash the OpenGarage firmware back. It’s completely crazy. First, the issue is hard to reproduce: most controllers work just fine, but once in a while one goes nuts. Second, upon crashing you can check the stack dump, but the information is basically gibberish and does not help debugging at all. Third, just reflashing the OpenGarage firmware itself is not sufficient: something seems to have been permanently altered in the flash memory, such that you have to flash a completely different firmware (NodeMCU, for example) to stop it. Finally, even this is not a permanent fix: if you keep rebooting the controller, the issue will eventually happen again.

For the longest time I thought there was a bug in the OpenGarage firmware code that caused this. But the symptoms above defy any standard reasoning or explanation. This kept bugging me until yesterday, when I finally sat down and ran a number of reproducible tests that helped me figure out the cause. It turns out this has to do partly with ESP8266 hardware quality and partly with software. The short story is that every time the ESP8266 restarts, the firmware (more specifically, the ESP8266 Arduino library) writes the current WiFi SSID and password to flash memory (along with other flash memory operations such as reading config files). Over time this may cause corruption at some point (possibly due to flash memory quality). Once this happens, the controller keeps crashing on restart, and the only way to stop it is to erase the corrupted region (that’s what reflashing NodeMCU firmware accomplishes).

In fact, this issue has already been reported in the ESP8266 Arduino forum. I wish I had discovered this earlier. The solution is actually quite straightforward: simply add WiFi.persistent(false); right at the beginning of the code, which stops the library from writing the SSID and password to flash memory every time. That write is unnecessary anyway, since the OpenGarage firmware already stores the SSID and password in its config file.
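
To show where the call goes, here is a minimal sketch. It is not the actual OpenGarage source: the credentials are placeholders, the retry count is an arbitrary choice, and the FakeWiFi block in the #else branch is a hypothetical stand-in that exists only so the call order can be checked on a PC without the board. On a real build only the #include <ESP8266WiFi.h> branch is used.

```cpp
#ifdef ARDUINO
#include <ESP8266WiFi.h>  // ESP8266 Arduino core
#else
// Hypothetical host-side stand-in (NOT the real library). It only records
// the order of calls so the sketch can be compiled and checked off-device.
#include <cassert>
#include <string>
#include <vector>
enum { WIFI_STA = 1 };
enum { WL_CONNECTED = 3 };
struct FakeWiFi {
  std::vector<std::string> calls;
  void persistent(bool on) {
    calls.push_back(on ? "persistent(true)" : "persistent(false)");
  }
  void mode(int) { calls.push_back("mode"); }
  void begin(const char*, const char*) { calls.push_back("begin"); }
  int status() { return WL_CONNECTED; }  // pretend we connect immediately
};
static FakeWiFi WiFi;
inline void delay(unsigned long) {}
#endif

// Placeholder credentials. OpenGarage loads the real ones from its config file.
static const char* kSsid = "your-ssid";
static const char* kPass = "your-password";

void setup() {
  // Call this before any other WiFi function, so that mode() and begin()
  // do not save the credentials to flash.
  WiFi.persistent(false);

  WiFi.mode(WIFI_STA);
  WiFi.begin(kSsid, kPass);

  // Wait up to about 20 seconds for the connection (arbitrary limit).
  for (int i = 0; i < 40 && WiFi.status() != WL_CONNECTED; i++) {
    delay(500);
  }
}

void loop() {}
```

The important part is only the ordering: WiFi.persistent(false) comes before WiFi.mode() and WiFi.begin(), since those are the calls that would otherwise store the settings.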

This simple trick seems to have magically fixed the annoying bug. To be fair, I believe this bug only happens when you manage the WiFi SSID and password separately in a config file stored in flash memory; if the SSID and password are hardcoded into the firmware, the issue wouldn’t happen. In any case, I wrote this blog post to document the issue and the fix, and I hope it will be useful for others who encounter similar issues with the ESP8266.

6 Responses to “ESP8266 Reboot Cycled Caused by Flash Memory Corruption — Fixed!”

  1. Sebastian says:

    Great, thank you for the post. I had a similar issue where the ESP8266 (WeMos D1 mini) would boot up and work well, but then crash at the instant that the WiFi.begin("hardcodedSSID", "hardcodedPW") function is called.

    Setting WiFi.persistent(false) fixed this.

    Without setting WiFi.persistent(false), the ESP apparently tries to read out the corrupted persistent WiFi credentials first, even though I supplied a hardcoded SSID and PW in the *.ino file.

  2. Don says:

    Is it the error where you see a boot mode:(3,6) along with a stack dump continuously? If so, I got it… 🙁

    loading config file…ok
    server started @ 80
    STA mode
    MDNS registered
    set time server

    Soft WDT reset

    ctx: cont
    sp: 3fff0b60 end: 3fff0e80 offset: 01b0

    3fff0d10: 3fff0f44 00000035 3fff310c 4020b458
    3fff0d20: 4022c491 00000000 00000000 4020b4b8
    3fff0d30: 01000100 3ffefb40 3ffefbe0 4020d31a
    3fff0d40: 4022c55b 00000000 3ffeeec0 00000000
    3fff0d50: 00000000 3fff0d80 00000000 3fff0d4c
    3fff0d60: 3ffeeec0 4022c7e3 3fff0db0 3ffe8da7
    3fff0d70: 4022c870 3ffe8d9a 3fff0db0 3ffe8da7
    3fff0d80: 00000000 4022c858 00000000 401004d8
    3fff0d90: 00000000 40105a30 4023120e 00000000
    3fff0da0: 00000000 40105995 00000000 3ffefb50
    3fff0db0: 3ffefb54 40231288 3ffefd8c 4020e095
    3fff0dc0: 40242460 fffffffb ffff8000 0000003a
    3fff0dd0: 3fff30a0 00000000 3ffefd8c 4020e0c0
    3fff0de0: 40201424 3ffefb58 3ffefd8c 3fff22e4
    3fff0df0: 3ffef924 3ffefb58 3ffefb98 4020cea9
    3fff0e00: 6401a8c0 00ffffff 0101a8c0 3fff22e4
    3fff0e10: 3ffef924 3ffefb40 3ffefbe0 4020dfb0
    3fff0e20: 3ffe9030 6401a8c0 40210854 40210840
    3fff0e30: 4020cbdc 3ffefbe0 40210854 40210840
    3fff0e40: feefeffe feefeffe feefeffe 3ffefe58
    3fff0e50: 3fffdad0 00000000 3ffefe50 40206dbc
    3fff0e60: feefeffe feefeffe feefeffe 40210298
    3fff0e70: feefeffe feefeffe 3ffefe60 40100718

    ets Jan 8 2013,rst cause:2, boot mode:(3,6)

    load 0x4010f000, len 1264, room 16
    tail 0
    chksum 0x0f
    csum 0x0f

    • AlfArv says:

      It looks more like a watchdog timer (WDT) reset caused by an endless loop in the sketch.

      • ray says:

        Actually, it turns out that adding WiFi.persistent(false); at the beginning of the begin function fixes the issue. This has been documented on the ESP8266 GitHub site.

  3. Jody Ono says:

    My OpenGarage unit reboots every 60 seconds, regardless of reflashing with NodeMCU and then og_1.0.4. Do I need to do a full build with WiFi.persistent(false) and then reflash?