Part 3 in a mini-series on developing a medium-sized web app with Scala and the Play Framework. See Part 2 here: Playing with fire 2 - the instability comes into play.
Now we have the tools to chase and kill bugs and baddies. Let's begin!
The website had always been unstable, with the Play process often becoming unresponsive. The cron job that pings the site was reporting several problems per day, each one meaning the server had timed out a request and the watchdog was restarting the process, over and over again. I was hitting the issue myself while using the website. Luckily, the restart was quick, and only 2-3 users out of the roughly 50 at that time noticed anything.
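For reference, the watchdog is nothing fancy - roughly a cron entry that curls the site with a timeout and kicks the app if the request fails. Something along these lines, where the URL and the restart script are placeholders, not my actual setup:
#every 5 minutes: ping the site, allow 30 seconds, restart the Play app on failure
*/5 * * * * curl -sf -m 30 http://www.example.com/ > /dev/null || /path/to/restart-play.sh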
On DigitalOcean, Ubuntu VMs come with no swap file configured by default. Hey, it's cheap, but guess how fast you find that out. Here's what you do for 1G of RAM - allocate 4G of swap:
#========== setup swap - DigitalOcean doesn't set up swap by default
#https://www.digitalocean.com/community/articles/how-to-add-swap-on-ubuntu-12-04
#
#check if none there:
sudo swapon -s
sudo dd if=/dev/zero of=/swapfile bs=1024 count=4096k
sudo mkswap /swapfile
sudo swapon /swapfile
#
sudo vi /etc/fstab
#add this line, so the swap file is mounted on every reboot:
/swapfile none swap sw 0 0
#
sudo chown root:root /swapfile
sudo chmod 0600 /swapfile
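To double-check that it took, the swap file should now show up in both of these:
#should list /swapfile, and free should report ~4G of swap
sudo swapon -s
free -m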
That's one down, several to go. I found that one while farting around with top, acting on the sneaking suspicion that memory had something to do with my issues... the swap counters at the top of top's output were sitting at zero.
sbt was sometimes spawning processes that locked up a lot of memory for a long time, leaving just a little for the main Play process - my VM was initially half a gig, then 1 gig, and I didn't want to spend more money. I could not understand why those processes were hanging around and grabbing so much memory. This was especially annoying since there were no changes to the source files, so sbt should not have had anything to do.
This one went away when I switched to the play dist model. I figured it out while messing with top and ps xaww | grep java: several java processes were running, and their command lines were telling - it wasn't me starting those processes, it was sbt/play itself.
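For the record, the dist model means packaging everything on the dev machine and running only a plain JVM on the server - no sbt there at all. Roughly like this for the Play 2.1-era layout; the app name is illustrative and newer Play versions put the package under target/universal instead:
#on the dev box: build a self-contained zip with the app and all its dependencies
play clean dist
#copy dist/myapp-1.0.zip to the server, then on the server:
unzip myapp-1.0.zip
cd myapp-1.0
chmod +x start
./start -Dhttp.port=9000
#(daemonize it however you prefer - nohup, upstart, etc.)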
Back to our buddy, Ubuntu. I used JMX and put in a simple page to report on memory usage and such, and noticed memory was still leaking somewhere: free memory kept going down, but not inside my Java processes. In top, the numbers didn't add up and, a couple of unhealthy beers later, I figured out it was the OS itself hogging memory in its caches, with the default settings.
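You can see the same thing without JMX, by the way - the "missing" memory shows up as buffers/cache, and the "-/+ buffers/cache" line is what applications actually have left:
free -m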
I put in this cron as root and now it is fine - look it up if you care:
# setup a cron to periodically clear the system's cached memory - Ubuntu's defaults hog it
# as root:
crontab -e
# add this line (sync first, so dirty pages can be dropped too):
0 */3 * * * sync; echo 3 > /proc/sys/vm/drop_caches
I understand this behaviour is controlled by other tunable OS parameters, but it took too long to chase those down and this partial solution solved that particular problem.
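For reference, the tunables in question are the kernel's VM cache knobs; the cleaner fix is probably something like this in /etc/sysctl.conf rather than a cron - the values are a commonly suggested starting point, not gospel:
#prefer keeping application pages in RAM over swapping them out
vm.swappiness = 10
#reclaim the dentry/inode caches more aggressively than the default of 100
vm.vfs_cache_pressure = 200
#then apply without a reboot:
sudo sysctl -p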
Lesson #5 - invest some time and set up your OS properly, before production. Cheap clouds come with cheap defaults - honestly, Rackspace Cloud VMs didn't need much tweaking, while the DigitalOcean ones required a few hours of research and setup. I will eventually post my full setup for a DigitalOcean VM, time permitting.
I also noticed that the JVM itself was not configured explicitly and was picking a maximum heap of some 1.5G from somewhere, well beyond the physical memory available - and since my issue was in effect a timeout, memory pressure looked like a prime suspect... maybe, just maybe.
Anyway, use jconsole to keep an eye on memory, put some load on a test server and tune accordingly. These settings seem to make sense in my case, for starters, on a 1G dedicated VM:
_JAVA_OPTIONS="-Xms600m -Xmx600m -XX:MaxPermSize=128m -XX:+CMSClassUnloadingEnabled"
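To actually point jconsole at a JVM on a headless VM you also need the standard remote JMX flags - the port below is arbitrary, and authentication is off, so only do this on a firewalled box or over an ssh tunnel:
_JAVA_OPTIONS="$_JAVA_OPTIONS -Dcom.sun.management.jmxremote \
 -Dcom.sun.management.jmxremote.port=9010 \
 -Dcom.sun.management.jmxremote.authenticate=false \
 -Dcom.sun.management.jmxremote.ssl=false"
#then connect from your desktop with: jconsole <server-ip>:9010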
In jconsole, the perm gen was see-sawing around 80-90M (after some of the baddies in the next installment were eliminated).
In the next installment we will catch some more interesting baddies, closer to Scala and Play... it takes so long to write these up, eh? Playing with fire 4 - serious baddies join the party.