The Go Blog

KVM Virtualization on CentOS 6.x

bantana
6 September 2014

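KVM System Baseline Specification

KVM has been ready for commercial use since Red Hat Enterprise Linux 6.0. This specification provides guidance for RHEL 6, CentOS 6, and later releases.

We first implement a simple model and then deepen our understanding of virtualization step by step:

Implementation path: single model --> cluster model --> cloud model. Each model adds more features and a different level of designed-in capability, such as automated deployment and high availability. The higher you go, the more the implementation depends on understanding the lower layers; without that understanding it is basically unmaintainable.

Keep design ability and execution ability apart: a good design does not guarantee good execution. Simply put, every layer of the design needs the corresponding preconditions for execution; if those conditions cannot be met, the design basically cannot go into production operation.

The simplest model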

.       10.10.21.70 (lan ip)
            |                                                    | --> vm1-nginx1 ( wan ip /lan ip )
   LAN --> eth0 --> |                                            |
                    | --> br0 --> |                              | --> vm1-tomcat1 (lan ip)
                    |             |    |-------------------|     |
                    |             |--> | Centos KVM Master | --> | --> vm1-redis (lan ip)
                    | --> br1 --> |    |-------------------|     |
   WAN --> eth1 --> |                                            | --> vm1-netty (lan ip)
            |                                                    |
        123.1.2.3 (wan ip)                                       | --> vm1-mysql (lan ip)

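Brief explanation: the CentOS KVM Master uses two network cards, eth0 for the LAN and eth1 for the WAN.

br0 bridges eth0
br1 bridges eth1

The guest VMs are vm1-* (vm1-nginx1, vm1-tomcat1, ...).

Network basics

Install CentOS 6 with the virtual host (vhost) type: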

yum groupinstall "Virtualization Platform"

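Default network interface list

eth0
eth1
lo
virbr0 (to be removed)
sit0 (to be removed)

Cleaning up unneeded network configuration

Removing sit0

sit0 is the IPv6-in-IPv4 tunnel device. We don't need it, so we remove it; some virtualization management platforms also require IPv6 to be disabled.

Disable the ipv6 module in /etc/modprobe.d/dist.conf: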

install ipv6 /bin/true

chkconfig ip6tables off

In /etc/sysconfig/network:

NETWORKING_IPV6=no
IPV6INIT=no

Then reboot and confirm that the ipv6 module is no longer loaded with lsmod | grep -i ipv6.

If you don't want to reboot CentOS 6, you may need to run these commands instead:

ifconfig sit0 down
service ip6tables stop
/etc/init.d/network restart
rmmod ipv6

Removing virbr0

virbr0 is used for the desktop-style NAT networking model. We don't need it, so let's destroy it:

# virsh net-list
   default

# virsh net-destroy default

# virsh net-undefine default

Removing network entries caused by failed DHCP

If netstat -rn shows 169.254.x.x or other unknown network addresses, the cause is related to the DHCP client.

To clean it up:

In /etc/sysconfig/network:

NOZEROCONF=yes

Then run:

/etc/init.d/network restart

Now your network is clean. Check the network routing table:

netstat -rn
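
As a rough aid for the check above, here is a minimal sketch that filters a routing table for leftover 169.254.x.x entries. The function name find_zeroconf_routes and the sample table are made up for illustration; on a real host you would pipe netstat -rn into the function instead.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -rn` output on stdin and print the
# destination and interface of every 169.254.x.x (zeroconf) route.
find_zeroconf_routes() {
    awk '$1 ~ /^169\.254\./ { print $1, $NF }'
}

# Made-up sample standing in for real `netstat -rn` output.
sample='Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.5.0     0.0.0.0         255.255.255.0   U         0 0          0 br1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 br1'

printf '%s\n' "$sample" | find_zeroconf_routes
```

Empty output means no zeroconf route is left in the table.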

KVM Bridged Network Configuration

With bridged networking you can share an actual network device with KVM machines. This is required for servers with multiple network cards and gives you good performance. You can choose to put multiple segments into one bridged network or to divide it into different networks interconnected by routers.

We need to use bridge network devices in the production environment.

Now we have these network devices:

eth0 eth1 lo

We need to add a bridge0 network device to bridge eth0:

cd /etc/sysconfig/network-scripts/

Edit ifcfg-eth0:

DEVICE="eth0"
HWADDR=xxxxxxx
ONBOOT="yes"
BRIDGE="bridge0"

Add ifcfg-bridge0:

DEVICE="bridge0"
# Important: "Bridge" is case-sensitive; the first "B" must be uppercase.
TYPE="Bridge"
ONBOOT="yes"
DELAY="0"

Then run:

/etc/init.d/network restart

As you can see, both interfaces have the same MAC address:

bridge0 Link encap:Ethernet HWaddr F0:DE:F1:17:FD:EB
    inet addr:192.168.10.9 Bcast:192.168.10.255 Mask:255.255.255.0

eth0 Link encap:Ethernet HWaddr F0:DE:F1:17:FD:EB
    inet6 addr: fe80::f2de:f1ff:fe17:fdeb/64 Scope:Link
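
The ifcfg-eth0 and ifcfg-bridge0 pair can also be generated by a small script. This is only a sketch under assumptions: the function name write_bridge_cfg is hypothetical, the files go to a temporary staging directory rather than the live network-scripts directory, and the HWADDR line is left out, so add it back if your ifcfg-eth0 carries one.

```shell
#!/bin/sh
# Hypothetical helper: write ifcfg files for a NIC and the bridge that
# enslaves it into a staging directory (not the live network-scripts dir).
write_bridge_cfg() {
    dir=$1 nic=$2 bridge=$3
    mkdir -p "$dir" || return 1
    # Physical NIC: only points at the bridge it belongs to.
    cat > "$dir/ifcfg-$nic" <<EOF
DEVICE="$nic"
ONBOOT="yes"
BRIDGE="$bridge"
EOF
    # Bridge device: TYPE must be spelled "Bridge" with an uppercase B.
    cat > "$dir/ifcfg-$bridge" <<EOF
DEVICE="$bridge"
TYPE="Bridge"
ONBOOT="yes"
DELAY="0"
EOF
}

staging=$(mktemp -d)
write_bridge_cfg "$staging" eth0 bridge0
cat "$staging/ifcfg-eth0" "$staging/ifcfg-bridge0"
```

Review the staged files before copying them into /etc/sysconfig/network-scripts/ and restarting the network.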

Check that it works:

# brctl show

bridge name bridge id          STP enabled interfaces
br0         8000.782bcb18c2db  no           eth0
br1         8000.782bcb18c2dc  no           eth1

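Network capacity planning and high availability:

When using KVM, the hard limits need to be planned for in real deployments.

The physical bandwidth of a bridge cannot exceed the limit of the underlying NIC, but virtio lets traffic between VMs on the same host approach bus bandwidth. In production we usually choose e1000 or virtio as the network driver type for KVM guests; this conclusion is based on stability testing. Later we will work past these limits step by step with the cluster and cloud models.

Bonding can provide about 1.5 times the bandwidth while adding redundancy across two physical ports. Combined with VLANs, it can also provide redundancy across separate links to two switches.

bond0:

model 1: port redundancy model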

|------switch1----|
| port1   | port2 |
|---------|-------|
    |         |
|----eth0---eth1--|
|       bond0     |
|-----------------|

model 2: switch redundancy model

|-------------------- vlan8 -----------------|
| switch1 port12          switch2 port2      |
|-----------|---------------------|----------|
            |                     |
|--------- eth0 ---------------- eth1 -------|
|-------------------- bond0 -----------------|

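Performance tuning is covered separately later.

Routing table baseline specification

We recommend putting the default gateway in this file: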

/etc/sysconfig/network

GATEWAY=x.x.x.x

Maintain the other static routes in:

/etc/sysconfig/network-scripts/route-br?

rm /etc/sysconfig/network-scripts/route-eth?

Syntax: <ADDRESS>/<MASK BITS> via <GATEWAY> dev <eth#>

Example:

192.168.4.0/24 via 192.168.5.1 dev br0

If you are not comfortable calculating CIDR prefixes, you can use the following format instead:

GATEWAY0=192.168.0.1
NETMASK0=255.240.0.0
ADDRESS0=172.16.0.0

GATEWAY1=192.168.0.1
NETMASK1=255.0.0.0
ADDRESS1=10.0.0.0
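
If you want to move from the netmask form to the CIDR form, the prefix length can be computed mechanically. The sketch below uses a hypothetical function name, mask_to_prefix, and assumes a well-formed, contiguous netmask; it does not reject odd masks such as 255.0.255.0.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted netmask to a CIDR prefix length.
mask_to_prefix() {
    bits=0
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the netmask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        case $octet in
            255) bits=$((bits + 8)) ;;
            254) bits=$((bits + 7)) ;;
            252) bits=$((bits + 6)) ;;
            248) bits=$((bits + 5)) ;;
            240) bits=$((bits + 4)) ;;
            224) bits=$((bits + 3)) ;;
            192) bits=$((bits + 2)) ;;
            128) bits=$((bits + 1)) ;;
            0) ;;
            *) echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$bits"
}

mask_to_prefix 255.240.0.0   # prints 12
mask_to_prefix 255.0.0.0     # prints 8
```

With these values, the two entries above correspond to 172.16.0.0/12 via 192.168.0.1 and 10.0.0.0/8 via 192.168.0.1.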

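The first form is recommended because it reads closer to natural language; since both go in the same file, the second form also works.

We recommend not setting a default gateway in ifcfg-eth? or ifcfg-br?. After maintaining the configuration with setup, clean this up by hand.

Otherwise the multiple gateways in the eth? files can easily overwrite the default route. When the default gateway is configured in several places, you cannot tell which ifup caused the problem, and nothing makes it obvious. Red Hat setuptools provides a third-party package to fix this problem; CentOS does not.

Restart the network:

# service network restart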

examples:

# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.5.0     0.0.0.0         255.255.255.0   U         0 0          0 br1
192.168.4.0     192.168.5.1     255.255.255.0   UG        0 0          0 br1
192.168.3.0     192.168.5.1     255.255.255.0   UG        0 0          0 br1
192.168.2.0     192.168.5.1     255.255.255.0   UG        0 0          0 br1
192.168.1.0     192.168.5.1     255.255.255.0   UG        0 0          0 br1
219.237.242.0   0.0.0.0         255.255.255.0   U         0 0          0 br0
0.0.0.0         219.237.242.254 0.0.0.0         UG        0 0          0 br0

Recommended form:

[root@test3 network-scripts]# cat route-br1

192.168.4.0/24 via 192.168.5.1 dev br1
192.168.3.0/24 via 192.168.5.1 dev br1
192.168.2.0/24 via 192.168.5.1 dev br1
192.168.1.0/24 via 192.168.5.1 dev br1

tip:

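Do not list the local 192.168.5.0/24 segment. Hosts on the same segment are reached directly by default, which is already declared through BROADCAST in ifcfg-br1; listing it anyway produces an error that is ignored with a notice. The network checks in some Linux versions are not strict enough and let you add a declaration such as 192.168.5/24 gw 192.168.5.1. That wrong setting sends packets through the router when they should be switched at layer 2.

This format is the one defined by ip route.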
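
A route-br? file in this form can be generated from a list of networks. The following is a minimal sketch; the function name route_lines and its argument order are assumptions made for illustration. It skips the directly connected network, as the tip advises.

```shell
#!/bin/sh
# Hypothetical helper: print route-<dev> lines for the given networks,
# skipping the directly connected (local) network.
route_lines() {
    gw=$1 dev=$2 local_net=$3
    shift 3
    for net in "$@"; do
        # The local segment is reached at layer 2 and must not be listed.
        if [ "$net" != "$local_net" ]; then
            echo "$net via $gw dev $dev"
        fi
    done
}

route_lines 192.168.5.1 br1 192.168.5.0/24 \
    192.168.5.0/24 192.168.4.0/24 192.168.3.0/24 192.168.2.0/24 192.168.1.0/24
```

Redirect the output to a scratch file first and compare it with the existing route-br1 before replacing anything.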

Preparing a network installation server:

This is very easy:

yum install httpd
mkdir /var/www/html/centos6
mount -o loop centos6.iso /mnt
cp -rf /mnt/* /var/www/html/centos6/

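This gives us an install URL: http://host-ip/centos6/. You may need to check your iptables rules. I don't like iptables, so I turn it off:

service iptables stop; chkconfig iptables off

You can also use kickstart for PXE network installation, with ks scripts providing unattended automatic installation and deployment. Combining a PXE server, a DHCP server, and kickstart makes it possible to rack a machine, power it on, and have it install itself. This is very attractive for automation: when you have many machines to install and recover, later on they only need to be powered on to be deployed automatically.

The basic idea is as follows:

Use the DHCP configuration to pin each MAC address to a hostname so that every machine is uniquely identified; this requires creating and managing a MAC address table. Use the PXE ks script to install the specified system environment automatically, and use the %post section of the ks script to rewrite system configuration files to match the environment you need. Use the netbackup mechanism for fast recovery. When one person has to maintain hundreds of machines, this saves a great deal of time.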
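
The MAC address table can drive the DHCP configuration directly. As a sketch only, assuming an ISC dhcpd server and a plain-text table with one "hostname MAC IP" entry per line (the table format, the function name mac_table_to_dhcpd, and the MAC addresses below are all made up), host blocks could be generated like this:

```shell
#!/bin/sh
# Hypothetical helper: turn "hostname mac ip" lines on stdin into
# ISC dhcpd host blocks that pin each MAC to a fixed address and name.
mac_table_to_dhcpd() {
    while read -r name mac ip; do
        # Skip blank lines and comments in the table.
        case $name in ''|'#'*) continue ;; esac
        printf 'host %s {\n' "$name"
        printf '    hardware ethernet %s;\n' "$mac"
        printf '    fixed-address %s;\n' "$ip"
        printf '    option host-name "%s";\n' "$name"
        printf '}\n'
    done
}

mac_table_to_dhcpd <<'EOF'
# hostname  MAC address        IP (all values are made-up examples)
source52    52:54:00:00:00:52  10.8.2.52
source53    52:54:00:00:00:53  10.8.2.53
EOF
```

Keeping the table as the single source of truth means a new machine only needs one added line before it is racked.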

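libvirt environment:

/etc/libvirt and /var/lib/libvirt/
/etc/libvirt holds libvirt's environment and configuration files.

/var/lib/libvirt is where we usually store images; another location can be used instead. In the cluster and cloud models we usually map it to GFS2 or iSCSI storage, which enables live migration and integrates with the HA features of Red Hat's cluster suite.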

/etc/libvirt/
├── libvirtd.conf
├── lxc.conf
├── nwfilter
│   ├── allow-arp.xml
│   ├── allow-dhcp-server.xml
│   ├── allow-dhcp.xml
│   ├── allow-incoming-ipv4.xml
│   ├── allow-ipv4.xml
│   ├── clean-traffic.xml
│   ├── no-arp-spoofing.xml
│   ├── no-ip-multicast.xml
│   ├── no-ip-spoofing.xml
│   ├── no-mac-broadcast.xml
│   ├── no-mac-spoofing.xml
│   ├── no-other-l2-traffic.xml
│   ├── no-other-rarp-traffic.xml
│   ├── qemu-announce-self-rarp.xml
│   └── qemu-announce-self.xml
├── qemu
│   ├── api1.xml
│   ├── api2.xml
│   ├── api3.xml
│   ├── api4.xml
│   ├── api5.xml
│   ├── autostart
│   └── networks
│       └── autostart
├── qemu.conf
└── storage
    ├── autostart
    │   └── default.xml -> /etc/libvirt/storage/default.xml
    └── default.xml
7 directories, 25 files

[root@localhost etc]# tree /var/lib/libvirt/

/var/lib/libvirt/
├── boot
├── images
│   ├── api1.img
│   ├── api2.img
│   ├── api3.img
│   ├── api4.img
│   ├── api5.img
│   └── api6.img
├── lxc
├── network
└── qemu
    ├── save
    └── snapshot
7 directories, 6 files

Common libvirt commands:

[root@vm32 ~]# virt-
virt-clone          virt-host-validate  virt-install        virt-pki-validate   virt-viewer
virt-convert        virt-image          virt-manager        virt-top            virt-xml-validate

We mostly use just two: virsh and virt-install.

# virsh help
# virt-install --help

virsh list

[root@vm32 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 9     ganglia                        running
 10    varnishm3                      running
 12    varnishm5                      running
 13    inginxM                        running

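Basic virt-install usage

# virt-install --prompt

Installing and configuring over VNC

# virt-install --prompt --vnc --vnclisten=192.168.10.9 --vncport=5904

Enter the basic parameters at the prompts. Afterwards you can change the XML configuration with virsh edit domainid, or edit /etc/libvirt/qemu/domain_id.xml directly.

Getting help

# virt-install --help or man virt-install

Basic examples:

If disk performance matters, use --nonsparse to allocate the disk fully up front.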

vnc:

# [root@localhost ~]# virt-install -n api7 -r 512 --vcpus=1 -v -l http://192.168.10.5/centos6/ --disk /var/lib/libvirt/images/api7.img,size=8 -w bridge:br0 --vnc --vncport=5907 --vnclisten=192.168.10.9

# Starting install...
# Retrieving file vmlinuz... | 7.2 MB 00:00 ...
# Retrieving file initrd.img... | 57 MB 00:02 ...
#
# Allocating 'api7.img' | 8.0 GB 00:00
# Creating domain... | 0 B 00:00
# Cannot open display:
# Run 'virt-viewer --help' to see a full list of available command line options
# Domain installation still in progress. You can reconnect to the console to complete the installation process.
# Connect with a VNC client to 192.168.10.9:5907
# timezone: uncheck UTC, Asia/Shanghai
# virsh list --all
# vnc client 192.168.10.9:5907
#

console and kickstart:

# virt-install -n source52 -r 8192 --vcpus=4 -v --disk /dev/VolGroup/source52 --graphics none --location=http://10.8.2.241/ -w bridge=br1,model=virtio --extra-args="ks=http://10.8.2.225:81/install/source52.cfg ip=10.8.2.52 netmask=255.255.255.0 console=tty0 console=ttyS0,115200"

kickstart file:

[root@vm32 ~]# curl http://10.8.2.225:81/install/source52.cfg
# Kickstart file automatically generated by anaconda.
#
# #version=DEVEL
# install
# url --url=http://10.8.2.241/
# lang en_US.UTF-8
# keyboard us
# network --onboot yes --device eth0 --bootproto static --ip 10.8.2.52 --netmask 255.255.255.0 --gateway 10.8.2.247 --nameserver 202.106.0.20 --noipv6 --hostname source52
#
# rootpw  --iscrypted $1$Jers269u$A35hMam.swv6h44aQYE0k.
# firewall --disabled
# authconfig --enableshadow --passalgo=sha512
# selinux --disabled
# timezone Asia/Shanghai
# bootloader --location=mbr --driveorder=sda
# # The following is the partition information you requested
# # Note that any partitions you deleted are not expressed
# # here so unless you clear all partitions first, this is
# # not guaranteed to work
# clearpart --all --drives=sda --initlabel
# part pv.nginxM --grow --size=12000
# volgroup VolGroup --pesize=4096 pv.nginxM
# logvol / --fstype=ext4 --name=lv_root --vgname=VolGroup --grow --size=10240 --maxsize=120000
# logvol swap --name=lv_swap --vgname=VolGroup --size=16000
#
# part /boot --fstype=ext4 --size=200
#
# repo --name="CentOS"  --baseurl=http://10.8.2.241/ --cost=100
# reboot
# %packages --nobase
# @core
# ntpdate
# apr
# openssh-clients
#
# %post
# /sbin/chkconfig --level 2345 iptables off
# /sbin/chkconfig --level 2345 ip6tables off
#
# echo "NOZEROCONF=yes" >> /etc/sysconfig/network
# echo "NETWORKING_IPV6=no" >> /etc/sysconfig/network
# echo "IPV6INIT=no" >> /etc/sysconfig/network
#
#
# %end
#
