Matt
« on: July 11, 2011, 10:12:19 AM »
Still experiencing slowness on the server. I have sorted the initial issue, where the system was slow because WordPress was trying to go through a proxy to check for updates. Now it works fine for 1 user, possibly up to around 5, and then it goes to 8-10 seconds to load a page. I've tried a load test using some software, and it replicates exactly the issue I have seen when multiple users log in. I also ran the ab testing tool, and again the site was slow when I was using it during the test. Here is the output:

-f protocol      Specify SSL/TLS protocol (SSL2, SSL3, TLS1, or ALL)
dash@dashboard:~$ ab -n 1000 -c 10 http://10.50.4.100/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.50.4.100 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software:        Apache/2.2.17
Server Hostname:        10.50.4.100
Server Port:            80

Document Path:          /
Document Length:        2 bytes

Concurrency Level:      10
Time taken for tests:   195.414 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Non-2xx responses:      1000
Total transferred:      334000 bytes
HTML transferred:       2000 bytes
Requests per second:    5.12 [#/sec] (mean)
Time per request:       1954.138 [ms] (mean)
Time per request:       195.414 [ms] (mean, across all concurrent requests)
Transfer rate:          1.67 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:   566 1950 524.6   1954    5972
Waiting:      566 1949 524.4   1954    5972
Total:        566 1950 524.7   1954    5972

Percentage of the requests served within a certain time (ms)
  50%   1954
  66%   2114
  75%   2199
  80%   2256
  90%   2420
  95%   2651
  98%   2961
  99%   3834
 100%   5972 (longest request)
dash@dashboard:~$

Can anyone give me some help on where to look and what to look for?
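
For anyone reading the summary block, here is a rough sketch of how ab's headline figures follow from its raw counters. The function names are made up for illustration, the inputs are the numbers from the run above, and awk is assumed to be available for the floating-point division.

```shell
#!/bin/sh
# Recompute ab's headline figures from its raw counters.
# Function names are hypothetical helpers, not part of ab.

# requests_per_second COMPLETE_REQUESTS SECONDS
requests_per_second() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", n / t }'
}

# mean_ms_per_request CONCURRENCY COMPLETE_REQUESTS SECONDS
# Mean wait seen by each simulated client, rounded to whole milliseconds.
mean_ms_per_request() {
    awk -v c="$1" -v n="$2" -v t="$3" 'BEGIN { printf "%.0f\n", c * t * 1000 / n }'
}

requests_per_second 1000 195.414      # prints 5.12
mean_ms_per_request 10 1000 195.414   # prints 1954
```

The first "Time per request" line is the wait seen by each of the 10 simulated clients. Dividing it by the concurrency level gives the second, "across all concurrent requests" figure.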
Matt
« Reply #1 on: July 11, 2011, 11:03:07 AM »
This looks like it's topping out? It's a proper quad-core server :/ NB: this is running a test simulating 20 users. With 1 user, CPU use is around 20%.
« Last Edit: July 11, 2011, 11:10:10 AM by Matt »
JasonD
« Reply #2 on: July 11, 2011, 12:46:42 PM »
Read your ab output before you post it next time:
Document Length:        2 bytes
Non-2xx responses:      1000
Although I have no idea why it takes so long to fail. When you get it working with one request you can try again.
Where has all your memory gone? Type > in top (to move the sort order to %MEM). 4GB *should* be enough (of the 8GB of RAM here, with no swap, it is only using 1.5GB, with 6.5GB of disk cache). Why so much swap? And is it on some nice simple single local physical device/logical volume, or something fancy (RAID/encryption/etc.)?
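
As a hedged sketch of the check described above: ab only prints a "Non-2xx responses" line when at least one response had a status outside the 200 range, so a small filter can flag a run that never reached the real page. The name ab_looks_valid is made up for illustration.

```shell
#!/bin/sh
# ab_looks_valid: read ab output on stdin.
# Succeeds (exit 0) if no non-2xx responses were reported, fails (exit 1) otherwise.
# Hypothetical helper name; it relies only on the "Non-2xx responses:" line
# that ab prints when some responses were not 2xx.
ab_looks_valid() {
    awk '/^Non-2xx responses:/ { bad = $3 } END { exit (bad > 0) ? 1 : 0 }'
}

# Example with two lines taken from the run posted above:
printf 'Complete requests: 1000\nNon-2xx responses: 1000\n' | ab_looks_valid \
    || echo "benchmark hit error or redirect responses, not the real page"
```

In practice the output of a real run would be piped in, for example `ab -n 100 -c 10 http://10.50.4.100/ | ab_looks_valid`.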
Matt
« Reply #3 on: July 11, 2011, 01:35:45 PM »
Ah, sorry. It failed a couple of times before, so I thought it had been successful that time because it ran through the whole process.
Ubuntu is hosted on an HP ProLiant DL380 with three 72GB SCSI drives. I don't recall it doing anything like putting the drives in RAID. Can I run a command to check?
Afraid I'm not up on what you mean by swap.
Sorry. As I have said, this is the first time I have used Ubuntu, or even attempted to do this kind of thing, and those who were supposed to help me with this part of the build have simply sent one-line emails.
Matt
« Reply #4 on: July 11, 2011, 01:48:54 PM »
I did the memory sort as you suggested, and got a few more lines from the top command:
Matt
« Reply #5 on: July 11, 2011, 02:49:49 PM »
After a while of being slow I get this in the error logs:
[Mon Jul 11 15:38:15 2011] [error] child process 1785 still did not exit, sending a SIGKILL
[Mon Jul 11 15:38:15 2011] [error] child process 1786 still did not exit, sending a SIGKILL
[Mon Jul 11 15:38:15 2011] [error] child process 1787 still did not exit, sending a SIGKILL
[Mon Jul 11 15:38:15 2011] [error] child process 1788 still did not exit, sending a SIGKILL
[Mon Jul 11 15:38:15 2011] [error] child process 1789 still did not exit, sending a SIGKILL
[Mon Jul 11 15:38:16 2011] [notice] caught SIGTERM, shutting down
JasonD
« Reply #6 on: July 11, 2011, 03:54:01 PM »
Swap space: I think Windows calls it a page file. It is similar enough that I won't explain any further, other than to add that Linux will copy things to swap ahead of time (so don't pay too much attention to the used value in top) so that it can find some physical memory when it needs to. Windows may do this too, I don't know. Swap is usually, but not always, on a dedicated partition or logical volume.
grep swap /etc/fstab
will tell you where yours is.
RAID done properly should be fast. I once encountered a server with some swap on an encrypted disk (if you have an encrypted disk, encrypted swap makes sense), but the implementation was slow, and turning off swap (swapoff) was an instant improvement.
I don't know how Ubuntu allocates swap space by default. 1.5 x physical memory is ancient advice that never went away, and may be a 'better too much than not enough' default. If it were a desktop and you wanted hibernation then you would add swap space >= physical memory (I think it still works like that). I'm assuming you don't plan to hibernate a server.
If you ever used 6GB of swap you would be waiting all day on disk IO while it shuffles stuff back and forth, which may be what is happening. Adding more physical RAM may improve things: you'll be able to serve 10 users instead of 5.
With insufficient or no swap space, or once all swap space is used, and when all physical memory has been allocated and the disk cache has been thrown out, the kernel will start killing off processes. I'm not sure how it decides what to kill off first.
It is almost certainly stuck in an infinite loop of some kind, although it is odd that it doesn't happen all the time with only one user.
Matt
« Reply #7 on: July 12, 2011, 08:16:33 AM »
This is the output from that command:
dash@10.50.4.100's password:
Welcome to Ubuntu 11.04 (GNU/Linux 2.6.38-8-server x86_64)

 * Documentation: http://www.ubuntu.com/server/doc
Last login: Tue Jul 12 09:00:43 2011 from 10.50.4.1
dash@dashboard:~$ grep swap /etc/fstab
/dev/mapper/dashboard-swap_1 none swap sw 0 0
dash@dashboard:~$

I thought it might be like the pagefile. Interestingly, this is idle:
Tasks: 241 total,   1 running, 240 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.1%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3990672k total,  3941460k used,    49212k free,    15588k buffers
Swap:  5804028k total,  2970388k used,  2833640k free,   145636k cached

Is that not saying the memory is totally in use even when idle?
JasonD
« Reply #8 on: July 12, 2011, 09:18:14 AM »
Quote: /dev/mapper/dashboard-swap_1 none swap sw 0 0
sudo lvdisplay --maps dashboard/swap_1

Quote: is that not saying the memory is totaly in use even when idle?
Almost. You have to subtract the cached value from the mem used value. Or run
free -m
The -/+ buffers/cache line is easier.
Out of interest, what difference does stopping and restarting apache make?
free -m
sudo /etc/init.d/apache2 stop
free -m
sudo /etc/init.d/apache2 start
free -m
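
A minimal sketch of that subtraction, assuming the older free column layout used in this thread (total, used, free, shared, buffers, cached). Newer versions of free arrange the columns differently, so the field numbers would need adjusting there. The helper name apps_used is made up for illustration.

```shell
#!/bin/sh
# apps_used: read a free-style "Mem:" line on stdin and print
# used - buffers - cached, i.e. memory held by processes rather than by cache.
# Assumed column order (older procps): Mem: total used free shared buffers cached
apps_used() {
    awk '$1 == "Mem:" { print $3 - $6 - $7 }'
}

# Figures (in kB) from the idle top output posted earlier in the thread;
# top does not show "shared", so 0 is filled in for that column.
printf 'Mem: 3990672 3941460 49212 0 15588 145636\n' | apps_used   # prints 3780236
```

On a live system the input would be `free -m | apps_used`. For the idle reading above the result is roughly 3.6GB, so on that occasion nearly all of the memory really was held by processes and not by disk cache.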
Matt
« Reply #9 on: July 12, 2011, 09:57:24 AM »
I'll try that stuff in a sec. I just created a new DB, as I had installed a load of WP plugins and then removed them, so I wanted to see if a fresh DB with only the plugins I need made a difference. Seems to have solved the memory issue, I think?
top - 10:45:49 up 20:29,  3 users,  load average: 13.04, 7.88, 3.62
Tasks: 109 total,  11 running,  98 sleeping,   0 stopped,   0 zombie
Cpu(s): 91.0%us,  4.5%sy,  0.0%ni,  3.7%id,  0.0%wa,  0.0%hi,  0.8%si,  0.0%st
Mem:   3990672k total,  1048072k used,  2942600k free,    14100k buffers
Swap:  5804028k total,    34040k used,  5769988k free,   125204k cached

Seems the CPU is the same (I can see it grow as soon as a user starts browsing the site).
Matt
« Reply #10 on: July 12, 2011, 10:05:41 AM »
  --- Logical volume ---
  LV Name                /dev/dashboard/swap_1
  VG Name                dashboard
  LV UUID                riglAA-Asps-xhqG-anqA-OO4l-W0Yi-PXNIRq
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                5.54 GiB
  Current LE             1417
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           251:1

  --- Segments ---
  Logical extent 0 to 1416:
    Type                linear
    Physical volume     /dev/cciss/c0d0p5
    Physical extents    33248 to 34664

dash@dashboard:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          3897       1305       2592          0         34        295
-/+ buffers/cache:        974       2922
Swap:         5667         33       5634
dash@dashboard:~$ ^C
dash@dashboard:~$ sudo /etc/init.d/apache2 stop
 * Stopping web server apache2
 ... waiting                                                      [ OK ]
dash@dashboard:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          3897        524       3373          0         34        296
-/+ buffers/cache:        193       3704
Swap:         5667         33       5634
dash@dashboard:~$ sudo /etc/init.d/apache2 start
 * Starting web server apache2
Apache needs to decrypt your SSL Keys for dashboard.wolverley.local:443 (RSA)
Please enter passphrase:
                                                                  [ OK ]
dash@dashboard:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          3897        970       2926          0         34        296
-/+ buffers/cache:        639       3257
Swap:         5667         33       5634
dash@dashboard:~$
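
Reading the three "-/+ buffers/cache" used figures side by side gives a rough idea of what Apache itself was holding. This is only a sketch: it assumes nothing else started or stopped between the readings, and the helper name is made up for illustration.

```shell
#!/bin/sh
# apache_footprint_mb USED_WITH_APACHE_MB USED_WITHOUT_APACHE_MB
# Difference between two "-/+ buffers/cache" used values, in MB.
# Assumption: the whole difference is attributed to Apache and its children.
apache_footprint_mb() {
    echo $(( $1 - $2 ))
}

apache_footprint_mb 974 193   # after running for a while: prints 781
apache_footprint_mb 639 193   # freshly restarted:         prints 446
```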
JasonD
« Reply #11 on: July 12, 2011, 11:57:15 AM »
Quote: with 3 72GB SCSI Drives, I dont recall it doing anything like putting the drives in RAID - can I run a command to check?
It would make sense that 3 SCSI drives on a SCSI controller (the cciss in the device name) would be set up as RAID 5. The physical extents of the LV put it towards the end of the usable space. But assuming there is no noticeable degradation of performance anywhere else, we can assume the disks and the RAID are working fine. Other than maybe being oversized, I think we can ignore the swap space as a source of problems.
Matt
« Reply #12 on: July 12, 2011, 12:54:25 PM »
Cool! A step closer. I think it is something to do with apache2 or the processor. Can I reset either, or get a vanilla httpd.conf or something?
JasonD
« Reply #13 on: July 12, 2011, 01:10:35 PM »
The only way I can provoke similar behaviour from Apache is with a deliberately bad script. Deliberately wasting memory will still hit memory_limit, and wasting time eating CPU in a loop still hits max_execution_time. What does work is eating as much memory as I can and then being blocked waiting on a locked file:

<?php
// use 50M of memory for no good reason
$x = str_repeat('x', 52428800);
$f = fopen('/tmp/lock-test', 'w');
if (flock($f, LOCK_EX)) { // wait here
    fwrite($f, 'zzz');
}
fclose($f);
echo 'done';

With another running process keeping the file locked, this does not hit max_execution_time (the timer is effectively paused), but it doesn't use any CPU time either.
Have a look at your phpinfo(), specifically max_execution_time and memory_limit. I suspect it may be a script doing something wrong rather than an Apache configuration problem, but if the above values are not ridiculously high (or zero), then at this stage I'm running out of ideas and reinstalling can't make things any worse. I know I did say before that reinstalling rarely helps. I didn't say never.
sudo apt-get purge apache2*
and then go and clear out /etc/apache2 if it didn't already. Then
sudo apt-get install libapache2-mod-php5
which should grab the other apache2 bits as dependencies.
Matt
« Reply #15 on: July 12, 2011, 01:32:21 PM »
Also just turned off all plugins and ran the same test: CPU still at 97%+. Also disabled my theme and switched to the new default one.
Matt
« Reply #16 on: July 12, 2011, 02:03:01 PM »
Gah.
Still doing the same. I'm on the verge of giving up now. Is Ubuntu the one, or should I stick to Windows? Should I try to reinstall? God knows what it is, but I'm now running a default WP install, on a fresh DB with no plugins, and have just reinstalled apache2.
I'm proper stuck!
JasonD
« Reply #17 on: July 12, 2011, 03:10:32 PM »
The only significant difference between your phpinfo and mine was that I didn't have ldap and mcrypt; installing them didn't make any difference.
I figured I should actually install WordPress to see if it is just you... Now, I haven't touched WordPress for years, and even then only briefly, so I have no reference for how fast or slow it should be. So having done nothing but follow the 5 minute install instructions, logged in and clicked '1 post' through to wp-admin/edit.php, it seemed to be working. Running
ab -n 100 -c 10 -H 'Cookie: wordpress....' 'http.../wp-admin/edit.php'

Document Path:          /wp-admin/edit.php
Document Length:        33935 bytes

Concurrency Level:      10
Time taken for tests:   5.206 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      3454100 bytes
HTML transferred:       3393500 bytes
Requests per second:    19.21 [#/sec] (mean)
Time per request:       520.590 [ms] (mean)
Time per request:       52.059 [ms] (mean, across all concurrent requests)
Transfer rate:          647.95 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:   198  509 184.3    523     900
Waiting:      165  421 147.7    456     747
Total:        199  509 184.4    523     901

Percentage of the requests served within a certain time (ms)
  50%    523
  66%    607
  75%    651
  80%    676
  90%    766
  95%    844
  98%    886
  99%    901
 100%    901 (longest request)

I repeated that a couple of times and got more or less the same results. Does WordPress have some sort of throttling I'm not aware of? Whether this edit page is representative of general use I have no idea, but it is ridiculously slow. My slowest page parses a 7MB tar file at 25 requests per second. The closest page I have of superficially similar purpose handles 140 requests per second, and I don't consider that fast.
I don't have any non-Ubuntu/Debian-alike servers that I can think of off the top of my head to try. I'll see if I can spot anything stupid later.
Matt
« Reply #18 on: July 12, 2011, 03:34:31 PM »
See, it still works even at 90%+ CPU; it just takes 10+ seconds to load a file. Whilst I have no idea if WordPress is coded well or not, it's used to power some pretty massive sites. I hosted this externally on our hosting server, and didn't get anywhere near the same kind of speed issues.
JasonD
« Reply #19 on: July 12, 2011, 10:21:58 PM »
Well, it is not only Ubuntu that is slow. Arch Linux in a VM, fresh install with the minimum setup necessary:

Concurrency Level:      10
Time taken for tests:   28.025 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      3315800 bytes
HTML transferred:       3256500 bytes
Requests per second:    3.57 [#/sec] (mean)
Time per request:       2802.481 [ms] (mean)
Time per request:       280.248 [ms] (mean, across all concurrent requests)
Transfer rate:          115.54 [Kbytes/sec] received

Memory usage was not as bad (about 650M, with no swap), but still high. And for reference:
Quote: The closest page I have of superficially similar purpose handles 140 requests per second and I don't consider that fast.
The same page on Arch only handles 55 requests per second. Also, that 140 is over https; it is closer to 240 on plain http.
In both instances WordPress was installed in a clean virtual host with an empty database. The Ubuntu host is 8GB with a 2.4GHz quad core; Arch had 4GB and one 2.4GHz CPU.
Next up, Xdebug.
Matt
« Reply #20 on: July 13, 2011, 08:17:30 AM »
From my end, I'm trying it on another server we have lying around; it's a Xeon. Fallen at the first hurdle, however: I need to download again because it's the wrong version for the CPU.
Matt
« Reply #21 on: July 13, 2011, 08:40:10 AM »
When setting up the new server, do I want guided partitioning with LVM or not?
Matt
« Reply #23 on: July 13, 2011, 03:52:27 PM »
Done a complete reinstall, and the same flipping thing is happening.
Matt
« Reply #25 on: July 13, 2011, 06:57:37 PM »
Quote: I ran it with Xdebug profiler, fed to kcachegrind, and looked at some of the code.
There is no particular part that is slow. It is all sub-optimal code; in small doses it wouldn't matter. It is wasting a lot of time doing a lot that, as far as I can tell, it has no reason to do. For instance, the aforementioned edit page makes 825 calls to translate() with 381 unique messages. There must be at most 50 translatable things on that page.
It's weird though that it's fine for one user, and the CPU is not even that high; it's when there are 4 or 5+ users on it. Kinda stuck on what to do now. Could it be a server-based issue, i.e. the hardware?
Matt
« Reply #27 on: July 13, 2011, 10:05:21 PM »
Going to try on an XAMPP install tomorrow to rule out hardware. Tried 3 other servers today and couldn't get Ubuntu to work on any. Might also try on our cloud hosting account.
Matt
« Reply #28 on: July 14, 2011, 12:36:25 PM »
Is this good or bad, Jason? (Thanks for all your help, btw.) I'm wondering if it may be the software I'm using to load test. If this is not the right way to check the speed, can you recommend a good way other than getting a room full of kids to log on?

dash@thedash:~$ ab -n 100 -c 5 http://www.thedash.org.uk/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking www.thedash.org.uk (be patient).....done

Server Software:        Apache/2.2.16
Server Hostname:        www.thedash.org.uk
Server Port:            80

Document Path:          /
Document Length:        23322 bytes

Concurrency Level:      5
Time taken for tests:   41.856 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      2357800 bytes
HTML transferred:       2332200 bytes
Requests per second:    2.39 [#/sec] (mean)
Time per request:       2092.783 [ms] (mean)
Time per request:       418.557 [ms] (mean, across all concurrent requests)
Transfer rate:          55.01 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.0      0      10
Processing:  1344 2074 418.6   2016    3260
Waiting:      843 1503 346.6   1459    2482
Total:       1344 2075 418.5   2016    3260

Percentage of the requests served within a certain time (ms)
  50%   2016
  66%   2227
  75%   2327
  80%   2378
  90%   2635
  95%   2870
  98%   3200
  99%   3260
 100%   3260 (longest request)
dash@thedash:~$
Matt
« Reply #29 on: July 14, 2011, 01:25:26 PM »
Also, trying it on an XAMPP install on my Windows 7 PC did not produce the same speed issues when accessing the site, but it did cause the CPU to sit at 100%??
At a total loss as to what to do next. I would give up, but the system is due for launch in September. Staff have been adding things to it fine, but of course that's only one or two users. It was only in the last couple of demos to staff that it crashed when everyone was on it. The MaxClients limit was hit, so lifting that means it does not crash, but if 30+ kids (even if it is only in use in one of 7 IT rooms) access it at the start of the lesson, loading times would be in the region of 20+ seconds.
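
As a back-of-the-envelope sketch of that estimate: if the server sustains a fixed number of pages per second and a whole class requests a page at the same moment, the last pupil waits roughly users divided by throughput. This assumes throughput stays at the 2.39 requests per second from the earlier ab run, that everyone arrives at once, and that a page is a single request. The function name is made up for illustration.

```shell
#!/bin/sh
# worst_case_wait_s USERS PAGES_PER_SECOND
# Rough time (seconds) until the last of USERS simultaneous requests is served,
# assuming constant throughput. A simplification, not a queueing model.
worst_case_wait_s() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r }'
}

worst_case_wait_s 30 2.39   # prints 12.6
```

ab only fetches the HTML, so a real browser that also pulls images, CSS and scripts would add to that, which is consistent with the 20+ second figure above.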