Today, one of the MySQL database server response time has been increased within the pool of servers and noticed it was returning the query in 0.20 secs (randomly) as opposed to 0.0 secs within the LAN to the web server and was able to reproduce using MySQL command line tool as well.
First step is to isolate the slow response by finding the source of culprit, either it could be database server or app server or network components between the two systems.
When the same query was executed locally within the database server over socket or using TCP, the query runs in 0.0 secs. Query profile output looked identical in terms of timings for both 0 secs to 0.20 secs execution; which rules out database is executing the query in-time.
Then comes, network troubleshooting to find where it is actually spending 0.20 secs. There are number tools that comes handy to debug any network related issues (errors, dropped packets, congestion, overruns etc.).
My favorite network-diagnostic tools that always comes handy when you need it.
The stats from ifconfig, ethtool, sar, mtr, disabling multi-cast etc. did not give any picture of what is happening; nor in addressing the problem.
Finally when tshark output was compared by capturing tcp packets, found that the packets were re-transmitted due to no ACK back or loss of packets by MySQL server; which explains why it is taking that long to receive query results.
Here is how tshark helped to find the problem:
Lets say, MySQL is running on 10.1.1.1 on port 3306 and web server on 10.2.2.2.
Open 2 windows in webserver (10.2.2.2)
Window 1:
Run tshark and capture the network packets on port 3306
1 2 3 4 5 6 7 |
sudo tshark -i eth0 tcp port 3306 and host 10.1.1.1 or you can also run from mysql server by filtering on webserver ip sudo tshark -i eth0 tcp port 3306 and host 10.2.2.2 |
Window 2:
Run the query that reproduces the problem (0.20 secs and also 0 secs)
1 2 3 |
mysql “h10.1.1.1 “P3306 “e œselect now(); |
Here is the output from tshark for 0 secs query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
0.000000 10.1.1.1 -> 10.2.2.2 TCP 39157 > mysql [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=40055523 TSER=0 WS=9 tcp.srcport == 39157 0.000173 10.2.2.2 -> 10.1.1.1 TCP mysql > 39157 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1380 TSV=39648809 TSER=40055523 WS=7 tcp.srcport == 3306 0.000202 10.1.1.1 -> 10.2.2.2 TCP 39157 > mysql [ACK] Seq=1 Ack=1 Win=6144 Len=0 TSV=40055523 TSER=39648809 tcp.srcport == 39157 0.000499 10.2.2.2 -> 10.1.1.1 MySQL Server Greeting proto=10 version=5.1.57-rel12.8-log tcp.srcport == 3306 0.000566 10.1.1.1 -> 10.2.2.2 TCP 39157 > mysql [ACK] Seq=1 Ack=69 Win=6144 Len=0 TSV=40055523 TSER=39648809 tcp.srcport == 39157 0.000945 10.1.1.1 -> 10.2.2.2 MySQL Login Request user=root tcp.srcport == 39157 0.001045 10.2.2.2 -> 10.1.1.1 TCP mysql > 39157 [ACK] Seq=69 Ack=63 Win=5888 Len=0 TSV=39648809 TSER=40055523 tcp.srcport == 3306 0.001066 10.2.2.2 -> 10.1.1.1 MySQL Response OK tcp.srcport == 3306 0.001110 10.1.1.1 -> 10.2.2.2 MySQL Request Query tcp.srcport == 39157 0.001329 10.2.2.2 -> 10.1.1.1 MySQL Response tcp.srcport == 3306 0.001486 10.1.1.1 -> 10.2.2.2 MySQL Request Query tcp.srcport == 39157 0.001743 10.2.2.2 -> 10.1.1.1 MySQL Response TABULAR tcp.srcport == 3306 0.001755 10.2.2.2 -> 10.1.1.1 MySQL Response tcp.srcport == 3306 0.001760 10.2.2.2 -> 10.1.1.1 MySQL Response tcp.srcport == 3306 0.001763 10.2.2.2 -> 10.1.1.1 MySQL Response tcp.srcport == 3306 0.001788 10.1.1.1 -> 10.2.2.2 TCP 39157 > mysql [ACK] Seq=188 Ack=2929 Win=11776 Len=0 TSV=40055524 TSER=39648809 tcp.srcport == 39157 0.001800 10.1.1.1 -> 10.2.2.2 TCP 39157 > mysql [ACK] Seq=188 Ack=5217 Win=16896 Len=0 TSV=40055524 TSER=39648809 tcp.srcport == 39157 0.002058 10.1.1.1 -> 10.2.2.2 MySQL Request Quit tcp.srcport == 39157 0.002070 10.1.1.1 -> 10.2.2.2 TCP 39157 > mysql [FIN, ACK] Seq=193 Ack=5217 Win=16896 Len=0 TSV=40055524 TSER=39648809 tcp.srcport == 39157 0.002196 10.2.2.2 -> 10.1.1.1 TCP mysql > 39157 [FIN, ACK] Seq=5217 Ack=194 Win=5888 Len=0 TSV=39648809 TSER=40055524 tcp.srcport == 3306 0.002218 10.1.1.1 -> 10.2.2.2 TCP 39157 > mysql [ACK] Seq=194 Ack=5218 Win=16896 Len=0 TSV=40055524 TSER=39648809 tcp.srcport == 39157 |
But for the same query when it took 0.20 secs; here is the output from tshark:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
0.000000 10.2.2.2 -> 10.1.1.1 TCP 49081 > mysql [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=40033975 TSER=0 WS=9 tcp.srcport == 49081 0.000174 10.1.1.1 -> 10.2.2.2 TCP mysql > 49081 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1380 TSV=39569042 TSER=40033975 WS=7 tcp.srcport == 3306 0.000202 10.2.2.2 -> 10.1.1.1 TCP 49081 > mysql [ACK] Seq=1 Ack=1 Win=6144 Len=0 TSV=40033975 TSER=39569042 tcp.srcport == 49081 0.000543 10.1.1.1 -> 10.2.2.2 MySQL Server Greeting proto=10 version=5.1.57-rel12.8-log tcp.srcport == 3306 0.000604 10.2.2.2 -> 10.1.1.1 TCP 49081 > mysql [ACK] Seq=1 Ack=69 Win=6144 Len=0 TSV=40033975 TSER=39569042 tcp.srcport == 49081 0.000998 10.2.2.2 -> 10.1.1.1 MySQL Login Request user=root tcp.srcport == 49081 0.001136 10.1.1.1 -> 10.2.2.2 TCP mysql > 49081 [ACK] Seq=69 Ack=63 Win=5888 Len=0 TSV=39569042 TSER=40033975 tcp.srcport == 3306 0.001150 10.1.1.1 -> 10.2.2.2 MySQL Response OK tcp.srcport == 3306 0.001218 10.2.2.2 -> 10.1.1.1 MySQL Request Query tcp.srcport == 49081 0.001461 10.1.1.1 -> 10.2.2.2 MySQL Response tcp.srcport == 3306 0.001615 10.2.2.2 -> 10.1.1.1 MySQL Request Query tcp.srcport == 49081 0.001881 10.1.1.1 -> 10.2.2.2 MySQL Response TABULAR tcp.srcport == 3306 0.001896 10.1.1.1 -> 10.2.2.2 MySQL Response tcp.srcport == 3306 0.001900 10.1.1.1 -> 10.2.2.2 TCP [TCP Previous segment lost] [TCP segment of a reassembled PDU] tcp.srcport == 3306 0.001947 10.2.2.2 -> 10.1.1.1 TCP 49081 > mysql [ACK] Seq=188 Ack=2929 Win=11776 Len=0 TSV=40033975 TSER=39569042 tcp.srcport == 49081 0.001956 10.2.2.2 -> 10.1.1.1 TCP [TCP Dup ACK 15#1] 49081 > mysql [ACK] Seq=188 Ack=2929 Win=11776 Len=0 TSV=40033975 TSER=39569042 SLE=4297 SRE=5217 tcp.srcport == 49081 0.205367 10.1.1.1 -> 10.2.2.2 MySQL [TCP Retransmission] Response tcp.srcport == 3306 0.205425 10.2.2.2 -> 10.1.1.1 TCP 49081 > mysql [ACK] Seq=188 Ack=5217 Win=14336 Len=0 TSV=40033995 TSER=39569063 tcp.srcport == 49081 0.205669 10.2.2.2 -> 10.1.1.1 MySQL Request Quit tcp.srcport == 49081 0.205680 10.2.2.2 -> 10.1.1.1 TCP 49081 > mysql [FIN, ACK] Seq=193 Ack=5217 Win=14336 Len=0 TSV=40033995 TSER=39569063 tcp.srcport == 49081 0.205809 10.1.1.1 -> 10.2.2.2 TCP mysql > 49081 [FIN, ACK] Seq=5217 Ack=194 Win=5888 Len=0 TSV=39569063 TSER=40033995 tcp.srcport == 3306 0.205831 10.2.2.2 -> 10.1.1.1 TCP 49081 > mysql [ACK] Seq=194 Ack=5218 Win=14336 Len=0 TSV=40033995 TSER=39569063 tcp.srcport == 49081 |
Line 14 & 17 (TCP Previous segment lost and TCP Retransmission) indicates the re-submission of packets as its been lost (no ACK). For more information on interpretation, check wireshark page.
Now that we identified the problem is in the network layer, it should be easy to track by looking at various pieces like network switch, cable etc. by ensuring no loose cable or NIC or we have the latest network drivers and firmware.
In this case, network switch was identified as the root cause and replacing it fixed all the problems.
New #mysql planet post : Troubleshooting database network problems using tshark https://t.co/6ZDHzrgJ
Output with 0 secs and 0.20 secs is the same.
I understand what is shown here, but please update the output for 0 secs.
oops, blunder, just fixed it
Troubleshooting Database Network Problems Using tshark https://t.co/EX8MHskf
Troubleshooting Database Network Problems Using tshark https://t.co/hjvnFeV3
https://t.co/4KhBVYID: Troubleshooting Database Network Problems Using tshark https://t.co/4KhBVYID https://t.co/4KhBVYID
“Venu Anuganti Blog » Troubleshooting Database Network Problems Using tshark†https://t.co/JFnj6jhj