- Wed, 15 Aug 2018
- 22:15:50 UTC
Yesterday Intel announced a new set of vulnerabilities known as L1 Terminal Fault, or Foreshadow. It was discovered that memory present in the L1 data cache of an Intel CPU core may be exposed to a malicious process executing on the same core.
Software vendors are working on updates to mitigate this vulnerability. We will update our systems as soon as the patches become available.
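On Linux, once a patched kernel is running, the mitigation status is reported through the standard sysfs vulnerabilities interface. A minimal sketch for checking it (kernels that predate the patch simply do not expose the file, which the sketch treats as "status unknown"):

```python
from pathlib import Path

def l1tf_status(sysfs="/sys/devices/system/cpu/vulnerabilities/l1tf"):
    """Return the kernel's reported L1TF mitigation status, or None if
    the sysfs entry is absent (unpatched kernel, or a non-Linux host)."""
    p = Path(sysfs)
    return p.read_text().strip() if p.exists() else None

status = l1tf_status()
print(status if status is not None else "l1tf sysfs entry not present")
```

On a patched kernel this typically prints a line beginning with "Mitigation:"; the exact wording depends on the kernel version and whether SMT is enabled.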
- Wed, 29 Aug 2018
- 15:52:49 UTC
We have been updating all our servers with the patches released by the different operating system providers.
[Resolved] Intel processors security issue
This incident lasted 1 week, 6 days, 17 hours, and 36 minutes.
- Tue, 7 Aug 2018
- 21:59:23 UTC
We are investigating elevated packet loss in our Vancouver network.
- 22:20:11 UTC
Cogeco Peer 1 has confirmed they are investigating an issue in their network. We are waiting for an update and will post details as soon as we receive them.
- 22:32:49 UTC
While we wait for an update from Cogeco Peer 1, we can confirm the network is back to normal at this moment. We continue monitoring this issue and will provide an update as soon as we have more information.
- Wed, 8 Aug 2018
- 02:15:20 UTC
This incident has been resolved. Cogeco Peer 1 will provide a Reason For Outage report within the next 2-3 business days, as engineers are still investigating the root cause of the issue. We will publish a postmortem report with all the details as soon as we receive them.
- Sun, 30 Sep 2018
- 14:30:25 UTC
On August 7, 2018, at 1:26pm PDT, the Network Operations Center (NOC) began receiving monitoring alerts for devices missing polls in the Vancouver Data Center. The networking team identified a potential issue with an aggregate switch and implemented a reroute of traffic through a redundant aggregate switch. This resolved the majority of the issues. The team continued their investigation and determined that a layer 2 traffic loop was occurring through a segment of the network. Once this had been identified, mitigating actions were implemented to normalize the network.
As a result of the incident on August 7, 2018, customers would have experienced varying degrees of connectivity issues. The incident was caused by an improper configuration of a customer's new L2 circuit solution, which came to light when the customer initiated their connectivity. The NOC's initial investigation focused on the 16th floor aggregate switch as a possible cause, so the decision was made to reroute traffic through the redundant aggregate switch on the 21st floor. This resolved the majority of the reported customer connectivity issues. The networking team then continued their investigation and determined that a layer 2 traffic loop was occurring through a segment of the network. Once this had been identified as the cause, it was mitigated by deactivating the associated layer 2 tunnels, restoring connectivity for the remaining affected customers.
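The failure mode described above, a loop in the layer 2 topology, can be illustrated with a small cycle-detection sketch. The switch names below are hypothetical, and in practice loops are prevented by protocols such as Spanning Tree rather than application code; the sketch only shows why a single redundant path closes a loop:

```python
from collections import defaultdict

def find_l2_loop(links):
    """Detect a cycle in an undirected graph of switch-to-switch links.
    A cycle at layer 2, with no protocol blocking one of the ports,
    lets broadcast frames circulate indefinitely (a broadcast storm)."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)

    seen = set()

    def dfs(node, parent):
        seen.add(node)
        for nxt in adj[node]:
            if nxt == parent:
                continue  # don't treat the link we arrived on as a cycle
            if nxt in seen or dfs(nxt, node):
                return True
        return False

    return any(dfs(n, None) for n in adj if n not in seen)

# A new circuit adds a second path between the aggregate switches,
# closing a loop (illustrative names, not the real device inventory):
links = [("agg-16", "agg-21"), ("agg-21", "cust-sw"), ("cust-sw", "agg-16")]
print(find_l2_loop(links))  # True: a loop exists
```

Removing any one of the three links (the equivalent of deactivating the layer 2 tunnels) breaks the cycle and the function returns False.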
PDT Time Zone:
• 13:26:00 - NOC receives alerts for network devices missing monitoring polls. Investigate possible issue with van-hc16e-agg-1.
• 13:29:00 - Devices begin responding to polls. NOC receives one customer report of brief connectivity issue during the event.
• 14:37:00 - NOC receives additional alerts for network devices missing monitoring polls.
• 14:45:00 - Attempts to access van-hc16e-agg-1 are unsuccessful, so a local console is attached. A reboot is evaluated as a way to normalize the network disruption.
• 15:06:00 - Issue is escalated to Network Development Engineering for further investigation.
• 15:13:00 - Traffic for downstream devices is migrated from van-hc16e-agg-1 to van-21e-agg-1 by failing over the Redundant Trunk Group (RTG), in preparation for the possible switch reboot. This action resolved the majority of reported customer connectivity issues.
• 16:16:00 - Further investigation identifies a layer 2 traffic loop caused by a newly provisioned network solution. The layer 2 network solution is deactivated and the remaining affected customers report that connectivity is fully restored.
• 16:30:00 - Previous changes to RTG are reverted.
Downstream customer traffic was rerouted from van-hc16e-agg-1 to van-hc21e-agg-1, which resolved connectivity for the majority of affected customers. The remaining customers' issues were resolved when the layer 2 tunnels were deactivated.
In the future, the networking team will lab-test these types of customized network solutions prior to deployment into production. Network design configurations will also be peer-reviewed to ensure accuracy and optimization before provisioning steps are taken.
[Resolved] Vancouver network packet loss
This incident lasted 4 hours and 15 minutes.
- Sun, 5 Aug 2018
- 04:48:38 UTC
We are currently investigating this issue.
- 04:59:53 UTC
A datacenter-wide power outage caused downtime in our Toronto facility at approximately 12:26 AM EST. Power has been restored and our team is working to bring all services back online.
- 06:19:40 UTC
We are still working to restore service to affected customers in our Toronto facility. A network issue caused by the outage is preventing us from accessing many of the servers (we know they are up, just not reachable).
A datacenter technician will work with us to troubleshoot the network problem within the next hour.
We will continue posting updates as we continue restoring services in Toronto.
- 08:14:00 UTC
We continue working to restore services in Toronto. The network issue has been identified and our team is working with on-site technicians to solve the problem.
- 09:45:38 UTC
All services have been restored in our Toronto datacenter. At this moment, all customers' applications have been started and are running. Our team will continue working to ensure all applications are working properly. We will post further updates as we know more about the root causes of this power outage.
Thanks to all our customers for the incredible patience and trust in our team during these hours of downtime.
- 14:55:01 UTC
Our service remains stable in Toronto. If you are having any issues with your sites please contact us and we will investigate immediately.
- Mon, 6 Aug 2018
- 01:02:19 UTC
This incident has been resolved. We will publish a postmortem report as soon as we receive the detailed Reasons For Outage (RFO) report from Cogeco Peer 1.
[Resolved] Outage in Toronto
This incident lasted 20 hours and 13 minutes.