Performance Pix
This document is for diagnosing performance problems using the HTTP communication APIs of the Pix module of Dinamo.
Info
This flowchart is intended for solving performance problems. For architectural guidelines, see the topic Pix.
The initial basic diagnosis can be made using the flowchart below.
---
title: Fluxo para diagnóstico
---
%%{ init: { 'flowchart': { 'curve': 'basis' } } }%%
flowchart TD
inicio((Lentidão nas<br>requisições<br>Pix HTTP))
high_qty_session_?{A<br>quantidade<br>de sessões<br>abertas com o HSM<br>é maior que o<br>esperado ?}
check_app_is_closing[Verificar se a aplicação<br>está fazendo o fechamento<br>das sessões após o seu<br>uso, inclusive nos casos<br>de erro]
check_bacen_comm[Verificar se as chamadas<br>ao bacen estão<br>recebendo falhas. Em<br>caso de falha o Bacen<br>pode fazer um delay antes<br>de retornar um erro. Casos<br> de erro comuns: exceder o<br>limite de conexões e fila vazia.]
qty_session_still_high_?{O número<br>de sessões<br>com o HSM<br>continua<br>alto ?}
sessions_still_slow_?{A conexão<br>continua<br>lenta ?}
check_network[Verificar se há<br>lenditão ou<br>instabilidade<br>conhecida na rede]
network_still_slow_?{A conexão<br>continua<br>lenta ?}
check_bacen[Verificar se há<br>lentidão com<br>o Bacen. Entrar<br>em contato ou<br>verificar se há<br>comunicados.]
bacen_still_slow_?{A conexão<br>continua<br>lenta ?}
check_Logs[Analisar logs do<br>client do HSM em<br>depuração. Ver<br>troubleshoot<br>específico.]
logs_still_slow_?{A conexão<br>continua<br>lenta ?}
suporte((Acionar<br>o suporte<br>do HSM))
resolvido((Resolvido))
inicio --> high_qty_session_?
high_qty_session_? -->|Sim| check_app_is_closing
high_qty_session_? -->|Não| check_bacen_comm
check_app_is_closing --> qty_session_still_high_?
qty_session_still_high_? -->|Sim| suporte
qty_session_still_high_? -->|Não| resolvido
check_bacen_comm --> sessions_still_slow_?
sessions_still_slow_? -->|Não| resolvido
sessions_still_slow_? -->|Sim| check_network
check_network --> network_still_slow_?
network_still_slow_? -->|Não| resolvido
network_still_slow_? -->|Sim| check_bacen
check_bacen --> bacen_still_slow_?
bacen_still_slow_? -->|Não| resolvido
bacen_still_slow_? -->|Sim| check_Logs
check_Logs --> logs_still_slow_?
logs_still_slow_? -->|Não| resolvido
logs_still_slow_? ---->|Sim| suporte
Client Logs
1. Generate the logs of the HSM client when debugging
Enable the logs of the HSM client library in debugging and with flush enabled.
-
Set the following environment variable with the destination folder. Remember to set a destination folder without the file name. For more details on how to enable logs, see the Log topic.
HSM_LOG_DIR=/pasta_de_destino/
-
Restart the application and retest.
-
In the destination folder for the logs, open the log file generated. It should have the name tacndlib_xxxx.log, where xxxx will be the PID (Process ID) of the application.
2. Identify problematic operations
-
Identify the operations considered slow. You can see the total time of the Pix HTTP operation in the total_time_ms item of the LogDebugRequestInfo log line.
2021/02/03 19:06:21 144E0000 0000008C 408e0e2f debug LogDebugRequestInfo fpos="m" name_lookup_time_ms=0.000047 connect_time_ms=0.000227 appconnect_time_ms=5.000350 pretransfer_time_ms=5.000403 starttransfer_time_ms=5.000554 total_time_ms=6.000215 redirect_time_ms=0.000000
3. Analyze the data found
With the logs of problem operations identified, it is possible to see the points within the operation where the most time was spent.
The operation time logs have counters in milliseconds that mark the time between the start of the operation and the time of the marker. For example: connect_time_ms is the time from the start of the operation to the end of the connection to the remote server. appconnect_time_ms is the time from the start of the operation to the end of the TLS handshake. To find out the time taken by appconnect_time_ms, simply subtract the time taken by connect_time_ms from the time taken by appconnect_time_ms. So: TLS handshake time = appconnect_time_ms - connect_time_ms.
Some of the most common points to check.
- High weather in appconnect_time_ms
- When you first open a session, you may see a longer time in appconnect_time_ms. This is because the TLS handshake and the use of the key in the HSM take place at this point, which causes a longer time. In subsequent operations with the same session, this time should be low, because the tunnel is already established and will be reused from now on.
- If this time is extremely high, check the quality of the connection to the HSM.
- This is the only point in the operation where there is communication with the HSM.
- If the following operations continue for a longer time, check that the session parameters are enabled correctly; for details see the topic Sessions.
- If the time is still high, check that the session is not being reused. Session reuse is an important performance factor.
- High weather in starttransfer_time
- This counter marks the time from the start of the operation until the first byte is received. It can indicate slow communication with the remote server.
- Some possible causes are network slowness, slow processing of the remote server, etc.
- In the event of an error, BACEN may wait for a delay before sending a response. These are usually well-defined times. It is possible to see cases of waiting 5 seconds at this point.
Other points can be checked to debug the connections. The table below describes the other information available in the logs.
Given | Description |
---|---|
name_lookup_time_ms |
Time elapsed between the start of the request and the end of the name resolution. Time in milliseconds. |
connect_time_ms |
Elapsed time between the start of the request and the end of the connection with the remote host (or proxy). If there is redirection, the time is added. Time in milliseconds. |
appconnect_time_ms |
Elapsed time between the start of the request and the end of the SSL/SSH handshake with the remote host. Time in milliseconds. |
pretransfer_time_ms |
Time elapsed between the start of the request and the moment just before the data is sent. If there is redirection, the time is added. Time in milliseconds. |
starttransfer_time_ms |
Time elapsed between the start of the request until the first byte is received. If there is redirection, the time is added. Time in milliseconds. |
total_time_ms |
Total request time. If there is redirection, the time is added. Time in milliseconds. |
redirect_time_ms |
Total time taken for redirects. Time in milliseconds. |
content_length_upload |
Content-length when uploading data. Retrieved from the Content-Length field. Gets -1 if the length is not known. Size in bytes. |
content_length_download |
Size of the content-length when downloading the data. Retrieved from the Content-Length field. Gets -1 if the length is not known. Size in bytes. |
upload_speed |
Average speed of the complete upload. Speed in bytes/second. |
download_speed |
Average speed of the complete download. Speed in bytes/second. |
download_size |
Total size of the download payload (body). Metadata and headers are not included in this count. Size in bytes. |
upload_size |
Total size of the upload payload (body). Size in bytes. |
redirect_count |
Count of redirects made. |
num_connects |
Count of connections created to complete the previous transfer. |
request_size |
Total size of the request sent. Size in bytes. |
received_header_size |
Total size of all headers received. Size in bytes. |
http_response_code |
HTTP response code for the request. It will be zero if no response code is received. |