[ t h e f r a g g l e . c o m ]

technology, photography and anything else that springs to mind.

xen vcpu pinning defaults aren’t ideal

I noticed an oddity the other day with a xen Domain0 host we have. There’s a cron job that verifies the RPM database and the RPMs installed on the system. For some reason this job failed but left its process running, spinning around trying to do its job. I really ought to have set up a “process count” check in the nagios monitoring we have here, but I didn’t have one at the time, so I didn’t pick it up for a few days. Whilst this was all going on, the Domain0 got pretty busy and started using time on the other CPUs as well as its main one (CPU 0), which is used by nothing but the Domain0.

You can see this from the list below of the vcpu resources used by a xen server currently:

[root@somedomain0 ~]# xm vcpu-list
Name                              ID VCPUs   CPU State   Time(s) CPU Affinity
Domain-0                           0     0     0   -b-  1535018.3 0
Domain-0                           0     1     1   -b-   139549.6 1
Domain-0                           0     2     2   -b-   943651.0 2
Domain-0                           0     3     3   -b-    53883.4 3
Domain-0                           0     4     4   -b-   336268.9 4
Domain-0                           0     5     5   -b-    65240.1 5
Domain-0                           0     6     6   -b-    42854.6 6
Domain-0                           0     7     7   r--    67960.9 7
domain1                            4     0     2   r--  1791844.4 1-2
domain1                            4     1     1   r--  1619120.1 1-2
domain2                            5     0     3   -b-   511300.0 3-5
domain2                            5     1     3   -b-   456253.1 3-5
domain2                            5     2     5   -b-   456516.1 3-5
domain3                            6     0     6   -b-   166344.6 6-7
domain3                            6     1     7   -b-   137435.2 6-7

You’ll see that Domain-0, the control domain, has a VCPU pinned to every physical CPU, including the ones that should only be used by the guests.
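If you wanted to spot this overlap without eyeballing the list, something like the following Python sketch could do it. It assumes the column layout shown above with numeric affinities; the function names and the trimmed-down sample are made up for illustration, not part of any xen tooling.

```python
def parse_affinity(spec):
    """Expand an affinity string such as '3-5' or '0,2-3' into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def dom0_overlaps(vcpu_list_output):
    """Return the physical CPUs that both Domain-0 and at least one guest may run on."""
    dom0, guests = set(), set()
    for line in vcpu_list_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is too short to be a VCPU row.
        if len(fields) < 7 or fields[0] == "Name":
            continue
        try:
            cpus = parse_affinity(fields[-1])
        except ValueError:
            # Affinity is not numeric (for example "any cpu"); ignored in this sketch.
            continue
        if fields[0] == "Domain-0":
            dom0 |= cpus
        else:
            guests |= cpus
    return sorted(dom0 & guests)


# Hypothetical sample, trimmed from the listing above.
SAMPLE = """\
Name       ID VCPUs CPU State   Time(s) CPU Affinity
Domain-0    0     0   0   -b-  1535018.3 0
Domain-0    0     1   1   -b-   139549.6 1
domain1     4     0   2   r--  1791844.4 1-2
"""

print(dom0_overlaps(SAMPLE))
```

An empty result would mean the Domain0 and the guests aren’t sharing any physical CPUs.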

This isn’t ideal. Normally a vmstat on a guest looks quite healthy, with the “steal %” value sitting at 0, but in this situation it starts to creep up. Steal is time when a guest VCPU was ready to run but the physical CPU was given to something else; here that something is the Domain0 taking CPU time and interrupting whatever is happening on the DomainU side.
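For keeping an eye on steal without staring at vmstat, here is a rough Python sketch that works the percentage out from two readings of /proc/stat on a Linux guest. It assumes the usual field order on the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the numbers in the example are invented for illustration.

```python
import time


def cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def steal_percent(before, after):
    """Percentage of CPU time between two samples that was stolen by the hypervisor."""
    deltas = [b - a for a, b in zip(before, after)]
    if len(deltas) < 8:
        return 0.0  # kernel too old to report steal
    # Only the first eight fields are summed; guest time is already part of user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def sample_steal(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, 'interval' seconds apart, and return the steal percentage."""
    with open(path) as handle:
        before = cpu_times(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = cpu_times(handle.read())
    return steal_percent(before, after)


# Made-up counters: 1000 ticks pass between the samples, 130 of them stolen.
BEFORE = cpu_times("cpu  100 0 50 800 10 5 5 30")
AFTER = cpu_times("cpu  200 0 100 1500 20 10 10 160")
print(steal_percent(BEFORE, AFTER))
```

On a real guest you would call sample_steal() from whatever check you already run; a value that sits well above 0 is the hint that something else is eating the guest’s CPUs.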

There is a vcpu-pin action available within the xm command, but it isn’t ideal to use on a live server. What I found best was to change the boot configuration for the Domain0 from the following:

title Enterprise Linux (2.6.18-128.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-128.el5
module /vmlinuz-2.6.18-128.el5xen ro root=/dev/vg01/root console=tty0 rhgb quiet
module /initrd-2.6.18-128.el5xen.img

To the following:

title Enterprise Linux (2.6.18-128.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-128.el5 dom0_max_vcpus=1
module /vmlinuz-2.6.18-128.el5xen ro root=/dev/vg01/root console=tty0 rhgb quiet
module /initrd-2.6.18-128.el5xen.img

You’ll notice the option dom0_max_vcpus=1. This limits the Domain0 to a single VCPU, which on this host ends up on the first physical CPU (CPU 0).

You’ll see a difference in the vcpu-list afterwards like this:

[root@somedomain0 ~]# xm vcpu-list
Name                              ID VCPUs   CPU State   Time(s) CPU Affinity
Domain-0                           0     0     0   r--       54.0 0
domain1                            3     0     7   -b-        3.2 6-7
domain1                            3     1     6   -b-        3.0 6-7
domain2                            1     0     1   -b-       10.3 1-2
domain2                            1     1     2   -b-        2.9 1-2
domain3                            2     0     3   -b-        3.7 3-5
domain3                            2     1     4   -b-        2.5 3-5
domain3                            2     2     5   -b-        0.9 3-5

It’s worth noting that you can also limit this on the fly, by using the following command (the arguments are the domain, the VCPU and the physical CPU to pin it to):

xm vcpu-pin Domain-0 0 0

This can be useful if you can’t get the downtime for a box and its guests.

September 1, 2009 at 11:11 pm Comments (0)

TanTan flickr photo gallery broken after upgrade to Wordpress 2.3

So I realised that the admin part of the Flickr photo gallery plugin I use (which is at http://www.thefraggle.com/flickr/) was a bit broken after upgrading WordPress to v2.3.

It appeared to work with the configuration I had set previously. However, when on the options page for the plugin, it was complaining of not being able to find a standard wordpress include.

A quick google search showed me a couple of resources that told me where I needed to make a slight alteration to the source of my flickr plugin …

In the file “/path/to/your/wordpress-install/wp-content/plugins/silaspartners/flickr/admin-options-load.php”, the line

    require_once(dirname(__FILE__).$tmpPath.'/wp-admin/admin-db.php');

needs to change to

    require_once(dirname(__FILE__).$tmpPath.'/wp-admin/includes/user.php');

Once that’s done, the plugin works a treat!

October 27, 2007 at 7:30 pm Comments (0)

Virtual domains in exim4

I’ve been using exim for a while with virtual domain support, and thought it best to document what I did somewhere.

For a long time I wondered how I might actually support virtual domains in exim 4 and held off by just dumping all mail from all domains into my mailbox (how gosh darn lazy is that).

I finally got bothered enough to sort it out, found a lot of easy to follow help on the internets (google), and came up with the following additions to my exim4.conf:


domainlist localdomains = dsearch;/etc/exim4/virtual : @ : localhost

and in the routers section

begin routers
...
vdom_aliases:
driver = redirect
allow_defer
allow_fail
domains = dsearch;/etc/exim4/virtual
data = ${expand:${lookup{$local_part}lsearch*@{/etc/exim4/virtual/$domain}}}
retry_use_local_part
pipe_transport = address_pipe
file_transport = address_file
no_more

As you can see there is a directory called /etc/exim4/virtual, which contains several files, each of which defines the aliases for one domain. An example file in that directory, named thefraggle.com, could look like:


* : :fail:
chris : chris@localhost

As you can see this looks pretty similar to the sendmail aliases file, but requires no rebuilding (if you have used sendmail at some point, you’ll know that you need to issue a “newaliases” command after every change).
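To make the lookup order a bit more concrete, here is a small Python sketch of how I understand the lsearch*@ lookup to behave for a key like $local_part: try the exact local part first, then fall back to the “*” entry. It is a simplification rather than exim’s actual code (no continuation lines, no quoting, and the case-insensitive matching is my reading of the exim documentation), and the function names are made up.

```python
def parse_aliases(text):
    """Parse 'key : value' lines from a virtual domain file into a dict."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, separator, value = line.partition(":")
        if not separator:
            continue  # not an alias line
        # A linear search returns the first match, so keep the first entry per key.
        aliases.setdefault(key.strip().lower(), value.strip())
    return aliases


def lookup(aliases, local_part):
    """Return the data for local_part, falling back to the '*' default entry."""
    key = local_part.lower()
    if key in aliases:
        return aliases[key]
    return aliases.get("*")


# The example file from above.
THEFRAGGLE = """\
* : :fail:
chris : chris@localhost
"""

aliases = parse_aliases(THEFRAGGLE)
print(lookup(aliases, "chris"))
print(lookup(aliases, "nobody"))
```

So mail for chris gets redirected to the local mailbox, and anything else at that domain hits the “*” entry and is failed.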

Anyway, hopefully that wasn’t too painful. Any improvements, suggestions and other observations are welcome!

June 14, 2007 at 6:59 pm Comment (1)

the number of servers you run …

Well, reading popey’s blog entry on how many servers he has, and what he uses them for, made me feel a bit better than I did previously about running more than one server of my own for personal use. I only have three servers, a workstation and a laptop; maybe I don’t waste as much electricity as I thought I did :)

  • etch.thefraggle.com – Debian etch, xen vps from bitfolk; general webserver for www.thefraggle.com, and master mail server.
  • sarge.thefraggle.com – Debian etch, xen vps from bitfolk; runs IRCds for blitzed.org and nixhelp.org, and is a tertiary mail exchanger.
  • beastie.thefraggle.com – FreeBSD-6.2-stable on an old p2 400mhz 128mb ram; used to run an ircd for nixhelp, and thefraggle.com website, but now has been retired to being a development machine and tertiary mail exchanger.
  • laptop – centrino duo 1.7ghz 1gb ram; work laptop with winxp / debian etch for work stuff

There’s actually another box there, my dad’s p4 3ghz, that I have pretty much nicked off him for day to day internetting :) . I suppose the fact that I have three servers kind of means I am pretty geeky?

Would be interesting if anyone reading this also commented with what they use :) .

April 13, 2007 at 5:34 pm Comments (0)

always check the disk free!

Came up against the strangest problem the other day, which in the end made it blatantly clear that the simplest checks, things like disk space, should always be done first.

A server I have access to uses LDAP for user info and Kerberos5 for realm authentication. It was reported that this server wasn’t letting anyone log in via ssh, and the only way that I was able to log in was via the console connection for the box (so essentially the only way to connect was locally).

I was able to prove that LDAP lookups were working, by simply id’ing on user accounts I knew to not exist locally which were stored in LDAP. I was also able to init a kerberos ticket when logged in, and login as ldap/krb5 users “locally”.

After a while of faffing about, enabling debug logging on sshd and so on, it dawned on me to check the disk space, thanks to an odd I/O moan in the sshd debug log. Lo and behold, the partition holding the kerberos key cache for ssh was completely full!

It goes to show that even simple checks like that, which sometimes seem noddy, should always be done!
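For what it’s worth, a check like that is only a few lines of Python. This is just a sketch using shutil.disk_usage from the standard library; the 95% threshold and the function names are my own picks for illustration, and the result can differ a little from what df reports because of reserved blocks.

```python
import shutil


def percent_used(path):
    """Return how full the filesystem holding 'path' is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report a size of zero
    return 100.0 * usage.used / usage.total


def full_filesystems(paths, threshold=95.0):
    """Return the paths whose filesystem is at or above the threshold."""
    return [path for path in paths if percent_used(path) >= threshold]


# Hypothetical usage: check the root filesystem; add whichever mount points matter on your box.
print(full_filesystems(["/"]))
```

Hooking something like this into the monitoring you already have is the easy way to make sure the noddy check actually gets done.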

April 12, 2007 at 9:27 pm Comments (0)
