Skip to content

Performance Pix

This document is for diagnosing performance problems using the HTTP communication APIs of the Pix module of Dinamo.

Info

This flowchart is intended for solving performance problems. For architectural guidelines, see the topic Pix.

The initial basic diagnosis can be made using the flowchart below.

---
title: Fluxo para diagnóstico
---

%%{ init: { 'flowchart': { 'curve': 'basis' } } }%%

flowchart TD
    inicio((Lentidão nas<br>requisições<br>Pix HTTP))

    high_qty_session_?{A<br>quantidade<br>de sessões<br>abertas com o HSM<br>é maior que o<br>esperado ?}
    check_app_is_closing[Verificar se a aplicação<br>está fazendo o fechamento<br>das sessões após o seu<br>uso, inclusive nos casos<br>de erro]
    check_bacen_comm[Verificar se as chamadas<br>ao bacen estão<br>recebendo falhas. Em<br>caso de falha o Bacen<br>pode fazer um delay antes<br>de retornar um erro. Casos<br> de erro comuns: exceder o<br>limite de conexões e fila vazia.]

    qty_session_still_high_?{O número<br>de sessões<br>com o HSM<br>continua<br>alto ?}

    sessions_still_slow_?{A conexão<br>continua<br>lenta ?}

    check_network[Verificar se há<br>lenditão ou<br>instabilidade<br>conhecida na rede]
    network_still_slow_?{A conexão<br>continua<br>lenta ?}

    check_bacen[Verificar se há<br>lentidão com<br>o Bacen. Entrar<br>em contato ou<br>verificar se há<br>comunicados.]
    bacen_still_slow_?{A conexão<br>continua<br>lenta ?}

    check_Logs[Analisar logs do<br>client do HSM em<br>depuração. Ver<br>troubleshoot<br>específico.]
    logs_still_slow_?{A conexão<br>continua<br>lenta ?}

    suporte((Acionar<br>o suporte<br>do HSM))
    resolvido((Resolvido))

    inicio --> high_qty_session_?
    high_qty_session_? -->|Sim| check_app_is_closing
    high_qty_session_? -->|Não| check_bacen_comm

    check_app_is_closing --> qty_session_still_high_?
    qty_session_still_high_? -->|Sim| suporte
    qty_session_still_high_? -->|Não| resolvido

    check_bacen_comm --> sessions_still_slow_?
    sessions_still_slow_? -->|Não| resolvido
    sessions_still_slow_? -->|Sim| check_network

    check_network --> network_still_slow_?
    network_still_slow_? -->|Não| resolvido
    network_still_slow_? -->|Sim| check_bacen

    check_bacen --> bacen_still_slow_?
    bacen_still_slow_? -->|Não| resolvido
    bacen_still_slow_? -->|Sim| check_Logs

    check_Logs --> logs_still_slow_?
    logs_still_slow_? -->|Não| resolvido
    logs_still_slow_? ---->|Sim| suporte

Client Logs

1. Generate the logs of the HSM client when debugging

Enable the logs of the HSM client library in debugging and with flush enabled.

  1. Set the following environment variable with the destination folder. Remember to set a destination folder without the file name. For more details on how to enable logs, see the Log topic.

    HSM_LOG_DIR=/pasta_de_destino/
    

  2. Restart the application and retest.

  3. In the destination folder for the logs, open the log file generated. It should have the name tacndlib_xxxx.log, where xxxx will be the PID (Process ID) of the application.

2. Identify problematic operations

  1. Identify the operations considered slow. You can see the total time of the Pix HTTP operation in the total_time_ms item of the LogDebugRequestInfo log line.

    2021/02/03 19:06:21 144E0000 0000008C 408e0e2f debug LogDebugRequestInfo fpos="m" name_lookup_time_ms=0.000047 connect_time_ms=0.000227 appconnect_time_ms=5.000350 pretransfer_time_ms=5.000403 starttransfer_time_ms=5.000554 total_time_ms=6.000215 redirect_time_ms=0.000000

3. Analyze the data found

With the logs of problem operations identified, it is possible to see the points within the operation where the most time was spent.

The operation time logs have counters in milliseconds that mark the time between the start of the operation and the time of the marker. For example: connect_time_ms is the time from the start of the operation to the end of the connection to the remote server. appconnect_time_ms is the time from the start of the operation to the end of the TLS handshake. To find out the time taken by appconnect_time_ms, simply subtract the time taken by connect_time_ms from the time taken by appconnect_time_ms. So: TLS handshake time = appconnect_time_ms - connect_time_ms.

Some of the most common points to check.

  • High weather in appconnect_time_ms
    • When you first open a session, you may see a longer time in appconnect_time_ms. This is because the TLS handshake and the use of the key in the HSM take place at this point, which causes a longer time. In subsequent operations with the same session, this time should be low, because the tunnel is already established and will be reused from now on.
    • If this time is extremely high, check the quality of the connection to the HSM.
    • This is the only point in the operation where there is communication with the HSM.
    • If the following operations continue for a longer time, check that the session parameters are enabled correctly; for details see the topic Sessions.
    • If the time is still high, check that the session is not being reused. Session reuse is an important performance factor.
  • High weather in starttransfer_time
    • This counter marks the time from the start of the operation until the first byte is received. It can indicate slow communication with the remote server.
    • Some possible causes are network slowness, slow processing of the remote server, etc.
    • In the event of an error, BACEN may wait for a delay before sending a response. These are usually well-defined times. It is possible to see cases of waiting 5 seconds at this point.

Other points can be checked to debug the connections. The table below describes the other information available in the logs.

Given Description
name_lookup_time_ms Time elapsed between the start of the request and the end of the name resolution. Time in milliseconds.
connect_time_ms Elapsed time between the start of the request and the end of the connection with the remote host (or proxy). If there is redirection, the time is added. Time in milliseconds.
appconnect_time_ms Elapsed time between the start of the request and the end of the SSL/SSH handshake with the remote host. Time in milliseconds.
pretransfer_time_ms Time elapsed between the start of the request and the moment just before the data is sent. If there is redirection, the time is added. Time in milliseconds.
starttransfer_time_ms Time elapsed between the start of the request until the first byte is received. If there is redirection, the time is added. Time in milliseconds.
total_time_ms Total request time. If there is redirection, the time is added. Time in milliseconds.
redirect_time_ms Total time taken for redirects. Time in milliseconds.
content_length_upload Content-length when uploading data. Retrieved from the Content-Length field. Gets -1 if the length is not known. Size in bytes.
content_length_download Size of the content-length when downloading the data. Retrieved from the Content-Length field. Gets -1 if the length is not known. Size in bytes.
upload_speed Average speed of the complete upload. Speed in bytes/second.
download_speed Average speed of the complete download. Speed in bytes/second.
download_size Total size of the download payload (body). Metadata and headers are not included in this count. Size in bytes.
upload_size Total size of the upload payload (body). Size in bytes.
redirect_count Count of redirects made.
num_connects Count of connections created to complete the previous transfer.
request_size Total size of the request sent. Size in bytes.
received_header_size Total size of all headers received. Size in bytes.
http_response_code HTTP response code for the request. It will be zero if no response code is received.