We had a BGP incident on the backbone that affected many of OVH's main backbone routers between 5:30 and 6:00. Everything is currently back to normal. We are investigating the origin of the problem.
Date: 2010-07-12 10:41:24 UTC In total, OVH was isolated from the Internet twice this morning:
30 minutes (Jul 12 05:31:54 / Jul 12 06:01:31)
25 minutes (Jul 12 06:45:13 / Jul 12 07:10:40)
Date: 2010-07-12 10:40:35 UTC The server that aggregates the scan alerts
filled up the disk space on one of its partitions.
/dev/md0 71679728 71679728 0 100% /home.2
We are checking why so much was suddenly recorded.
The scripts that push the access-lists
to the routers were expected to handle this case:
7380 + Jul 12 05:02:11 root ( 1) antiscan /home/antiscan/check2router.pl
7381 N + Jul 12 05:02:18 root ( 1) antiscan /home/antiscan/check2router.pl
7382 N + Jul 12 05:02:25 root ( 1) antiscan /home/antiscan/check2router.pl
7383 N + Jul 12 05:02:32 root ( 1) antiscan /home/antiscan/check2router.pl
7384 N + Jul 12 05:02:39 root ( 1) antiscan /home/antiscan/check2router.pl
writing problem /home/antiscan//access-list/access-list-ovh.1278903731
writing problem /home/antiscan//access-list/access-list-route.1278903738
writing problem /home/antiscan//access-list/access-list-route.1278903745
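The "writing problem" lines above correspond to files that could not be written properly on the full partition. One common way to make sure a reader never sees a half-written file is to write to a temporary file in the same directory and rename it into place only once the write has succeeded. The following is a minimal sketch in Python, not the antiscan code itself (which is Perl and is not shown in this report); the function name `write_atomically` is hypothetical.

```python
import os
import tempfile
from typing import Sequence


def write_atomically(path: str, lines: Sequence[str]) -> None:
    """Write lines to path so that readers see either the previous
    file or the complete new one, never a partial file.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(path) or "."
    # The temporary file lives in the same directory so that the final
    # rename stays on one filesystem, where it is atomic on POSIX systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("".join(line + "\n" for line in lines))
            tmp.flush()
            # Force the data to disk; a full partition surfaces here as
            # an OSError instead of leaving a truncated file behind.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, drop the temporary file and leave the final
        # path untouched, then report the error to the caller.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern, a consumer that only reads the final path either finds the last complete access-list or none at all, so a failed write cannot feed a truncated list into a later "diff".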
The problem is that another script picked up the partially written
information, computed the "diff" and modified the access-lists
on the routers. We also have a safeguard, a "permit ip any any" entry,
which apparently was not added automatically to the output sent
to the routers.
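The safeguard described above can be enforced as a check that runs before anything is sent to a router: if the generated list does not end with the catch-all permit, the push is refused. The following Python sketch illustrates the idea under stated assumptions; it is not the actual OVH script. The function name `is_safe_to_push` and the sample entries (which use documentation addresses) are hypothetical, and it assumes each entry is one line of text whose final entry must end with `permit ip any any`.

```python
SAFEGUARD = "permit ip any any"


def is_safe_to_push(acl_lines):
    """Return True only if the last non-empty entry ends with the
    catch-all permit. Hypothetical check for illustration only."""
    entries = [line.strip() for line in acl_lines if line.strip()]
    # An empty list or a list cut off before the safeguard is rejected.
    return bool(entries) and entries[-1].endswith(SAFEGUARD)


# Hypothetical sample data: a complete list and a truncated one.
complete = ["deny ip host 192.0.2.10 any", "permit ip any any"]
truncated = ["deny ip host 192.0.2.10 any", "deny ip host 192.0."]

print(is_safe_to_push(complete))   # True
print(is_safe_to_push(truncated))  # False
print(is_safe_to_push([]))         # False
```

A list truncated by a failed write loses its final line first, so checking the last entry catches exactly the case where the remaining deny entries would otherwise block all traffic.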
As a result, OVH was isolated from the Internet at
Jul 12 05:31:54
The system corrected the access-lists at
Jul 12 06:01:31
so that OVH was reachable from the Internet again.
At that point we started looking into the origin of the problem, but we did not
have much time to fix it ... because at
Jul 12 06:45:13
the system isolated OVH from the Internet again.
We had to come to the office to connect to the
internal network and remove the access-lists from 4 main
routers in Paris.
Jul 12 07:10:40
the situation was fixed.
Jul 12 07:15:43
the access-lists were completed on the other routers
so that everything works again on the backbone.
The situation has stabilised. We are going through
the logs to understand the sequence of events, and will then fix
the scripts so that they handle this type of problem.
Date: 2010-07-12 06:40:09 UTC The antiscan system is the origin of the problem.