While writing up the post on my setup for Rapid Play Framework Development, I missed one of the scripts I use.
I do keep an eye on the websites, quite obviously, and I get notified of any error via an email. When such an email arrives, however, the first thing you want to do is take a look at the latest log file.
Enter the getlog script, which will find the latest log file, zip it, copy it to my laptop and then open it with less:
# compress the current log on the server
ssh vm3.coolscala.com "cd ~/app/logs; gzip -c log.log > log.log.gz"
# copy it over, then clean up the remote archive
scp vm3.coolscala.com:app/logs/log.log.gz log.log.gz
ssh vm3.coolscala.com "cd ~/app/logs; rm log.log.gz"
# unpack and view it locally
gunzip -f log.log.gz
less log.log
The email is something you can enable in Global, like so:
/** customize some global error handling */
object Global extends WithFilters(LoggingFilter) {

  override def onError(request: RequestHeader, ex: Throwable) = {
    val m = ("ERR_onError", "Request:" + request.toString, "headers:" + request.headers, "ex:" + ex.toString).toString
    admin.SendEmail.withSession { implicit mailSession =>
      Emailer.tellRaz("ERR_onError",
        currentUser.map(_.userName).mkString, m)
      //...
    }
    super.onError(request, ex) // still show the default error page
  }
}
Sometimes you will see a flurry of emails, say while reloading the database or due to some network connectivity issue (yes, looking at you, Digital O), so I wrote a quick backoff scheme to stop my inbox from being flooded:
/** customize some global error handling */
object Global extends WithFilters(LoggingFilter) {

  // EMAIL BACKOFF stuff
  val ERR_DELTA1 = 5 * 60 * 1000      // 5 min - at most one email every 5 minutes
  val ERR_DELTA2 = 6 * 60 * 60 * 1000 // 6 hours - the window for ERR_EMAILS
  val ERR_EMAILS = 5                  // max emails sent per DELTA2

  var errEmails = 0                                          // emails sent in the current DELTA2 window
  var lastErrorTime = System.currentTimeMillis - ERR_DELTA1  // when the last error email went out
  var firstErrorTime = System.currentTimeMillis - ERR_DELTA2 // when the current DELTA2 window started
  var lastErrorCount = 0                                     // errors seen since the last email

  override def onError(request: RequestHeader, ex: Throwable) = {
    Audit.logdb("ERR_onError", "request:" + request.toString, "headers:" + request.headers, "ex:" + ex.toString)

    val m = ("ERR_onError", "Current count: " + lastErrorCount + " Request:" + request.toString, "headers:" + request.headers, "ex:" + ex.toString).toString

    if (System.currentTimeMillis - lastErrorTime >= ERR_DELTA1) {
      if (errEmails <= ERR_EMAILS || System.currentTimeMillis - firstErrorTime >= ERR_DELTA2) {
        admin.SendEmail.withSession { implicit mailSession =>
          Emailer.tellRaz("ERR_onError",
            currentUser.map(_.userName).mkString, m)

          synchronized {
            if (errEmails == ERR_EMAILS || System.currentTimeMillis - firstErrorTime >= ERR_DELTA2) {
              errEmails = 0
              firstErrorTime = lastErrorTime
            }
            errEmails = errEmails + 1
            lastErrorTime = System.currentTimeMillis()
            lastErrorCount = 0
          }
        }
      } else {
        lastErrorCount = lastErrorCount + 1 // over quota - just count it
      }
    } else {
      lastErrorCount = lastErrorCount + 1 // too soon - just count it
    }

    super.onError(request, ex) // still show the default error page
  }
}
Note that if you log a lot and don't roll over the logs often, these can get quite large...
See the entire Global here: play-simple-logging.scala.
As you can see, important events are audited in the database via Audit.logdb(...).
I use MongoDB and have a simple web page to review and clear these audited events every now and then. It is a very good habit to get into, as it can alert you to unexpected uses of the website, hacking attempts and other data integrity issues that you won't otherwise know about.
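For reference, here is a minimal sketch of what such an audit helper could look like, assuming the Casbah driver and made-up database and collection names - my actual Audit object differs, but the idea is the same:
import com.mongodb.casbah.Imports._

/** minimal sketch of an audit helper backed by MongoDB - db/collection names are made up */
object Audit {
  lazy val audits = MongoClient()("mydb")("audit")

  /** record an audit event with a code and some free-form details */
  def logdb(what: String, details: Any*) {
    audits += MongoDBObject(
      "what" -> what,
      "details" -> details.map(_.toString).mkString(" | "),
      "when" -> new java.util.Date())
  }
}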
As mentioned in my Playing with fire 2 - the instability comes into play post, I log each request via the LF.START and LF.STOP messages.
These are good for debugging whenever you have issues, and you can also use them as a simple performance monitor.
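The LoggingFilter itself is not shown in this post, but a rough sketch of how such a filter could be written (assuming the Play 2.3 Filter API - the message format is just a guess here, only the LF.START/LF.STOP names come from the post) looks like this:
import play.api.Logger
import play.api.mvc._
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

/** rough sketch of a request logging filter - logs a START/STOP pair with timing per request */
object LoggingFilter extends Filter {
  def apply(next: RequestHeader => Future[Result])(rh: RequestHeader): Future[Result] = {
    val start = System.currentTimeMillis
    Logger.debug(s"LF.START ${rh.method} ${rh.uri}")
    next(rh).map { res =>
      Logger.debug(s"LF.STOP ${rh.method} ${rh.uri} status=${res.header.status} took ${System.currentTimeMillis - start}ms")
      res
    }
  }
}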
Alternatively, you could log that particular exception and, say, the surrounding 100 lines in a separate file and place a link to this file in the email - that way you can quickly inspect the log and the details of the exception straight from the email.
I have used this technique in the past - that way, the relevant chunk of the log would be available for a while for easy inspection.
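If you want to try that, here is a minimal sketch, assuming the log lives in logs/log.log, that the app serves a public/logsnaps folder and that the host is the one used above - all made-up details:
import java.nio.file.{Files, Paths}
import scala.io.Source

/** hypothetical helper: save the tail of the log to a public file and return a link for the error email */
object LogSnapshot {
  val logFile = "logs/log.log"      // assumed log location
  val publicDir = "public/logsnaps" // assumed folder served by the app

  def snapshot(lines: Int = 100): String = {
    val src = Source.fromFile(logFile)
    val tail = try src.getLines().toList.takeRight(lines) finally src.close()
    val name = "snap-" + System.currentTimeMillis + ".txt"
    Files.write(Paths.get(publicDir, name), tail.mkString("\n").getBytes("UTF-8"))
    "http://vm3.coolscala.com/assets/logsnaps/" + name // assumed route to the public folder
  }
}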
This way you can also handle multiple servers - the getlog script doesn't quite handle the case where you run a cluster, yet.
Of course, there are more complicated options that may offer more flexibility for bigger deployments, like Splunk etc. - but for small adventures I prefer simple, localized solutions.