Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heap leak on tcp socket closing (IDFGH-8173) #9670

Open
3 tasks done
ifedotov1984 opened this issue Aug 29, 2022 · 4 comments
Open
3 tasks done

Heap leak on tcp socket closing (IDFGH-8173) #9670

ifedotov1984 opened this issue Aug 29, 2022 · 4 comments
Assignees
Labels
Status: Opened Issue is new

Comments

@ifedotov1984
Copy link

ifedotov1984 commented Aug 29, 2022

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

When frequently open and close tcp sockets, anytime there are heap leakage by sizeof(struct tcp_pcb).
This is because the socket remains in the state TIME-WAIT.

Must be: When allocating a new socket, if the value is exceeded CONFIG_LWIP_MAX_ACTIVE_TCP (MEMP_NUM_TCP_PCB), the oldest socket in the state TIME-WAIT must be released.
Code from lwip:

tcp_alloc(u8_t prio, struct tcp_pcb_listen *tpcb)
{
  struct tcp_pcb *pcb;

  LWIP_ASSERT_CORE_LOCKED();

  pcb = (struct tcp_pcb *)memp_malloc(MEMP_TCP_PCB);
  if (pcb == NULL) {

    /* Try to send FIN for all pcbs stuck in TF_CLOSEPEND first */
    tcp_handle_closepend();

    /* Try killing oldest connection in TIME-WAIT. */
    LWIP_DEBUGF(TCP_DEBUG, ("tcp_alloc: killing off oldest TIME-WAIT connection\n"));
    tcp_kill_timewait();
    /* Try to allocate a tcp_pcb again. */
    pcb = (struct tcp_pcb *)memp_malloc(MEMP_TCP_PCB);
    if (pcb == NULL) {
      /* Try killing oldest connection in LAST-ACK (these wouldn't go to TIME-WAIT). */
      LWIP_DEBUGF(TCP_DEBUG, ("tcp_alloc: killing off oldest LAST-ACK connection\n"));
      tcp_kill_state(LAST_ACK);
      /* Try to allocate a tcp_pcb again. */
      pcb = (struct tcp_pcb *)memp_malloc(MEMP_TCP_PCB);
      if (pcb == NULL) {
        /* Try killing oldest connection in CLOSING. */
        LWIP_DEBUGF(TCP_DEBUG, ("tcp_alloc: killing off oldest CLOSING connection\n"));
        tcp_kill_state(CLOSING);
        /* Try to allocate a tcp_pcb again. */
        pcb = (struct tcp_pcb *)memp_malloc(MEMP_TCP_PCB);
        if (pcb == NULL) {
          /* Try killing oldest active connection with lower priority than the new one. */
          LWIP_DEBUGF(TCP_DEBUG, ("tcp_alloc: killing oldest connection with prio lower than %d\n", prio));
          tcp_kill_prio(prio);
          /* Try to allocate a tcp_pcb again. */
          pcb = (struct tcp_pcb *)memp_malloc(MEMP_TCP_PCB);
          if (pcb != NULL) {
            /* adjust err stats: memp_malloc failed multiple times before */
            MEMP_STATS_DEC(err, MEMP_TCP_PCB);
          }

What really: if MEMP_MEM_MALLOC=1 (by default), the value of CONFIG_LWIP_MAX_ACTIVE_TCP (MEMP_NUM_TCP_PCB) does not matter (not used in memp functions), and new sockets will be placed any time when calling tcp_alloc until the heap is zeroed out. And tcp_kill_timewait(); will call only when all ESP heap will be close to 0 (then memp_malloc(MEMP_TCP_PCB) will return NULL on allocation fails)
Setting MEMP_MEM_MALLOC=0 solves this issue, but there appears runtime issue with MEMP_NUM_SYS_TIMEOUT value (increasing this value solves this problem).

@espressif-bot espressif-bot added the Status: Opened Issue is new label Aug 29, 2022
@github-actions github-actions bot changed the title Heap leak on tcp socket closing Heap leak on tcp socket closing (IDFGH-8173) Aug 29, 2022
@shelacek
Copy link

There is a pull request targeting the same issue: espressif/esp-lwip#63.

@projectgus
Copy link
Contributor

projectgus commented Oct 1, 2024

Just wanted to comment that we've also had reports of problems from too many sockets in TIME-WAIT, which this bug enables. If connections are short-lived and rapid then you can easily end up with hundreds of sockets in TIME-WAIT state, rather than limited to CONFIG_LWIP_MAX_ACTIVE_TCP (MEMP_NUM_TCP_PCB)

@david-cermak
Copy link
Collaborator

Hi Angus,
Thank you for bringing this up again, and I apologize for the delay in addressing it.

I can confirm that the current lwIP doesn't properly limit the number of active PCBs. Here's a preliminary patch that should resolve this, we'll fix this soon. I'm sorry the inconvenience.
bugfix-lwip-fix-tcp-pcb-allocation-max-num-limition.txt

@projectgus
Copy link
Contributor

Hi @david-cermak, nice to hear from you again!

Appreciate the update and the patch, I know you folks are always busy.

For MicroPython's specific purposes we have our own patch now (PR is linked above), which basically uses a linker wrap to do the same thing as your patch - so we're good until this gets fixed upstream. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Opened Issue is new
Projects
None yet
Development

No branches or pull requests

5 participants