Early in the morning of 10 Oct 2009, one of our servers became inaccessible and our monitoring system kept alerting us. It looked like something had gone wrong, and yup, I was right. I called the DC guy and asked him to check that server's console, and he saw kernel panic messages there, yuck…. This server hosts about 20 VPSes, so all of those VPSes were inaccessible too. There was nothing we could do except a hard reboot, so I asked the DC guy to do it, and a few minutes later the server was back to reality from its deep coma. But when I sshed to the server, I found its load was about 20. ALAMAK!!!! “Calm down…”, I told myself several times.
I found that the software RAID we use on that server was resynchronizing across all the disks in the array, and it was very, very, super duper slow, only about 4.5KB/sec!!!! Yuck…. I also tried to bring up one of the VPSes hosted on that server, but it always gave me “cannot lock VE”. There was no other option: I needed to turn off automatic startup of the Virtuozzo service at boot
#chkconfig --level 345 vz off
and initiated “shutdown -r now”. It did not work, damn!!…. I tried “shutdown -r -f -n now”, which also told me the server was shutting down, so I waited 3 minutes…., 10 minutes…., 25 minutes…. yuck, enough. I called the data center and asked them to repeat the hard reboot sequence. After the reboot, I sshed to the server and checked the RAID synchronization using more /proc/mdstat
———————————————————————————————————–
Personalities : [raid1]
md0 : active raid1 sdb2[1] sda2[0]
4192896 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
479998016 blocks [2/2] [UU]
[>....................] resync = 0.7% (3745728/479998016) finish=139.4min speed=56924K/sec
md1 : active raid1 sdb1[1] sda1[0]
4192832 blocks [2/2] [UU]
———————————————————————————————————–
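If you would rather not read the whole of /proc/mdstat every time, the resync line can be pulled out with a bit of awk. This is only a minimal sketch: the function name md_resync_status is my own invention, and it assumes the mdstat layout shown above (an "mdN :" line followed later by a "resync =" or "recovery =" progress line).

```shell
#!/bin/sh
# Hypothetical helper: print "array percent finish speed" for every md array
# that is currently resyncing or recovering.
# $1: path to an mdstat-format file (defaults to /proc/mdstat).
md_resync_status() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }            # remember the current array name
        /(resync|recovery) *=/ {              # progress line of that array
            pct = ""; eta = ""; spd = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) eta = substr($i, 8)
                if ($i ~ /^speed=/)  spd = substr($i, 7)
            }
            print dev, pct, eta, spd
        }
    ' "${1:-/proc/mdstat}"
}
```

Fed the output above, md_resync_status prints "md2 0.7% 139.4min 56924K/sec", and it prints nothing once no array is resyncing.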
Hmmmm… much, much better…. After the RAID synchronization completed, I brought up the VZ service and all the VPSes were back up, serving their “Masters”. Fiuuuh….
Unfortunately, that did not solve the problem completely. Some customers who have access to the VZ/Power Control Panel complained that they had problems accessing the Plesk Control Panel (port 8443): login was successful, but they always got this error message,
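Waiting for the resync and then starting VZ by hand can also be scripted. The sketch below is just one way to do it, not what I actually ran that morning: the function names and the MDSTAT and POLL_SECONDS variables are made up for this example, and it assumes a resync shows up in /proc/mdstat as a "resync =" or "recovery =" line, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) while any array in the given
# mdstat-format file is still resyncing or recovering.
md_is_resyncing() {
    grep -Eq '(resync|recovery) *=' "${1:-/proc/mdstat}"
}

# Hypothetical helper: poll until no resync is in progress, then run the
# command given as arguments.
# MDSTAT and POLL_SECONDS are assumed variable names with sane defaults.
wait_for_resync_then() {
    while md_is_resyncing "${MDSTAT:-/proc/mdstat}"; do
        sleep "${POLL_SECONDS:-60}"
    done
    "$@"
}

# Example (run as root): start Virtuozzo only after the arrays are in sync.
# wait_for_resync_then service vz start
```

The service name vz is the same one used with chkconfig earlier; the command to run is passed in as arguments, so the same loop can be tried with a harmless echo first.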
“Plesk is not allowed”
A quick investigation found that this only affected customers using Virtuozzo offline management. With offline management turned off, there is no issue accessing the Plesk Control Panel, but they lose the ability to reboot or shut down the VPS remotely from the VZ/Power Control Panel. Here is how to rectify the issue, step by step.
Check the vzagent status and restart it
#vzagent_ctl status
the result should look similar to this
vzagent (pid 28742 21104 20602 12985 12980 11195 11071 10749 10571 10375 10374 9214 9213 9206 9205 9203 9201 9200 9199 9198 9189 9185 9184 9182 9181 9180 9179 9178 9177 9176 9175 9174 9173 9172 9171 9170 9169 9168 9167 9166 9165 9164 9163 9162 9161 9160 9159 9158 9156 9155 9154 9153 9152 9151 7146 2461) is running…
and try to restart it
#vzagent_ctl restart
and if the problem persists with offline management still active, try restarting the Service Container,
#vzctl restart 1
Otherwise you need to reinstall the Service Container (VE 1). Fortunately, in my case there was no need to reinstall it.
References:
http://forum.parallels.com/showthread.php?t=84022
http://kb.parallels.com/en/659