During the past week, we’ve had intermittent internet performance problems here at the ADM Schools. Now that we believe that we have them mostly ironed out, I thought that some of you might be interested in a detailed account of what went wrong, and what steps we took to diagnose and fix the problems.
- November 5th, 2014: We receive scattered reports of internet problems district-wide. We attempt to replicate the problems and are able to do so. We rule out a wireless network issue, as the problems occur on wired connections as well. That evening, we perform a firewall firmware update to try to address the issue.
- November 6th, 2014: Problems remain after the firewall update, and our testing indicates that the problem is not within our network. We report the problem to Heartland AEA – through which most area schools’ internet connections are aggregated – and the Iowa Communications Network (ICN), which is technically our internet service provider.
- November 7th, 2014: Heartland AEA reports that a Denial of Service attack on the Audubon Schools is the likely cause of the problems, and expects to have the problem resolved by Friday afternoon.
- November 10th, 2014: The problems remain on Monday morning. We contact the AEA and ICN.
- November 11th, 2014: With the problems still present, we again attempt to verify that the problem is not internal. We verify with firewall logs that we are not blocking access to any needed services, and restart the firewall, wireless controller, and core switches. All testing indicates that the problem is external. Further, while our performance is consistently poor, internet speed tests indicate that we are getting 180 Mbps down and 180 Mbps up, which is our expected value and more than enough for adequate performance. This indicates to us that there is likely either a DNS issue at the ICN or an issue with web filtering at the AEA. A quick test with Google’s public DNS server (8.8.8.8) indicates that a DNS problem is unlikely.
- November 12th, 2014: The AEA acknowledges that the problem is likely with the firewall through which all area school district internet connections are aggregated. We work directly with AEA technicians to isolate the problem to the firewall. We do this by testing internet performance, then testing again after our service is set to bypass the AEA firewall. With the firewall bypassed, our performance problems vanish. Based upon this information, the AEA works with an engineer from iBoss, determines that something is not working properly in the AEA’s iBoss appliance, and applies a firmware update to fix the problem.
- November 13th, 2014 – Morning: During the morning, problems largely appeared to have vanished. Few internet performance problems were reported, and those problems appeared to be traceable to the destination websites rather than any issues with our network or with the AEA/ICN internet service.
- November 13th, 2014 – ICN DNS Change: In what is likely a poor choice of timing, the ICN modified its DNS configuration statewide at approximately 2 PM. DNS provides name resolution – it tells the computer which numeric IP address http://www.google.com corresponds to, for instance – and is necessary for computers to access websites and other online services. This DNS modification resulted in significant internet performance disruption statewide between about 2:00 and 2:30 PM. Below, you can see Heartland AEA’s internet traffic graph that shows internet utilization by all area districts. Note the massive dropoff in traffic at the time of the ICN DNS change.
- November 13th, 2014 – Our Mistake: While we immediately began to diagnose our service disruption due to the DNS change noted above, we weren’t too concerned about it, because we knew that it coincided with this DNS switch. Further, as you can see in the graph above, we knew that the issue was affecting many districts, not just us. We became concerned, however, when 2:30 arrived, other districts’ performance had gone back to normal, and we were still seeing site disruptions. Specifically, we were having trouble accessing Infinite Campus sites, as well as the ADM Schools website and a number of internally-hosted sites. We began to investigate a number of possibilities, including DNS replication from our DNS servers to the ICN, functionality of our internal DNS, and firewall permissions and potential deny entries. Eventually, we found the culprit. In dealing with a totally unrelated issue involving our IP phone system this morning, we had cloned a firewall entry that allowed access to sites on our own domain (adm.k12.ia.us) from both within and outside our network, and modified the clone to fit our IP phone service needs. Unfortunately, the clone did not complete successfully, so the modification was made to the original DNS rule, thus effectively removing the rule from the server. This seemingly unrelated maintenance task caused a significant problem when the DNS change was made, causing addresses on our domain (such as campus.adm.k12.ia.us and moodle.adm.k12.ia.us) to be inaccessible from on campus. The timing of this DNS access problem – nearly coinciding with a major DNS change at the ICN – was purely coincidental. Further, the timing of the DNS problems in general was not related at all to the internet troubles caused by the AEA’s iBoss device.
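For the technically curious: much of the week came down to one question, namely whether a site was failing at the name-lookup step (DNS) or at the connection step that follows it. The short Python sketch below shows one way that distinction could be checked from a single workstation. It is only an illustration, not a record of the exact tools involved; the function names (`probe`, `failed_stage`) are made up for this example, and it assumes the site answers on TCP port 80.

```python
import socket
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def resolve(hostname):
    """Resolve hostname with the system resolver; return the first IPv4 address."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return infos[0][4][0]


def connect(ip, port=80, timeout=5.0):
    """Open and immediately close a TCP connection to ip:port (port 80 is an assumption)."""
    with socket.create_connection((ip, port), timeout=timeout):
        return True


def probe(hostname, resolve_fn=resolve, connect_fn=connect):
    """Time the DNS step and the connection step separately for one host."""
    report = {"host": hostname, "dns_s": None, "connect_s": None, "error": None}
    try:
        ip, report["dns_s"] = timed(resolve_fn, hostname)
        _, report["connect_s"] = timed(connect_fn, ip)
    except OSError as exc:  # DNS failures, refusals, and timeouts are all OSError subclasses
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


def failed_stage(report):
    """Return None on success, otherwise 'dns' or 'connect' for the step that failed."""
    if report["error"] is None:
        return None
    return "dns" if report["dns_s"] is None else "connect"
```

Running `probe("campus.adm.k12.ia.us")` from an on-campus machine and again from off campus, then comparing `failed_stage` for each result, would give a rough idea of whether a name is failing to resolve or resolving fine but unreachable. A slow but successful `dns_s` next to a fast `connect_s` would likewise point toward name resolution rather than the connection itself.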
Troubleshooting internet problems can be tremendously time-consuming and frustratingly slow to unfold. Over the past week, we’ve had three separate internet service issues – two external and one internal – and each had a separate, unrelated cause. We are hopeful that at this point our internet problems have been resolved, and that network operations will return to normal. That said, we continue to work on two reports of internet performance problems from today that remain unresolved, and could relate to our network or to the destination sites. As always, we will continue to monitor our internet connectivity, and ask that staff members report internet problems as they occur via admsupport.adm.k12.ia.us.
Cheers, and happy web surfing!