Wabi Sabi Software: 2013

Monday, November 18, 2013

Random observations from Amazon's re:Invent conference

I just returned from Amazon's re:Invent developers conference and had a few social observations.

a) the gender breakdown seemed to be about 99% male which is depressing

b) every presentation I saw was given on a MacBook

c) in the audiences I saw people taking notes on iPads and all manner of 7-inch tablets (it's hard to distinguish an iPad Mini from a Kindle Fire from a generic Android tablet at a distance).

d) I think I saw a couple of PC laptops during the week and I saw zero Surface tablets.

e) For the first time I did see two people wearing Google Glass, and I heard the term Glass-hole!

f) The Venetian hotel has a truly amazing recreation of St. Mark's Square…they basically built an IMAX-like curved ceiling over the entire indoor plaza, creating a realistic impression of being outdoors.

g) the technology part of the conference (the real point, after all) was of course very cool and compelling, but that's a post for another day!

Wednesday, October 2, 2013

New article at PragPub

I have another article published at PragPub. That used to be a free publication but they've had to start charging (a very small and reasonable fee) for the magazine. You can get to the paid magazine at www.swaine.com/pragpub.

They are taking a very open approach and allowing authors such as myself to provide free links to our articles. So, here is a link to a copy of that article: hAppsVsWebFinal.pdf

As always, comments are welcome.

Thursday, September 12, 2013

How to Set Up Amazon CloudWatch to Send Alarm Emails

Amazon Web Services are very powerful, but some things are harder than you might expect. Getting an email notification that something is wrong is one of those things. I originally wrote this up so that I would remember next time I needed it and then decided it might be helpful to others.

You need to set up several things to get this to work. You need a metric, which is some value that indicates whether the system is OK or not. You need an alarm that fires when that metric indicates failure. You need an SNS topic to get notified when an alarm fires. And lastly you need a subscription to that topic, which is where you specify a phone number or email address to receive the alarm notification.

Create an SNS topic from the SNS GUI (press “Create and Add”)

Give the topic a name and a display name

Create a subscription…some place to send the notification

Choose a protocol (HTTP, HTTPS, email, SMS or SQS) and the protocol-specific “endpoint”

You will receive a confirmation email, text, etc., depending on the protocol you specified, asking if you really want to create this subscription.

Take note of the “ARN” for the topic; it will be used as the alarm action below

Create a CloudWatch metric named ZkServerCount

"mon-put-data" is used both to put the data and to create the metric itself if it doesn’t already exist.

We use the command line to create the metric because CloudWatch seems to have a problem if you have defined a lot of metrics…and it's easy to create a lot of metrics. Each instance you create automatically gets half a dozen standard metrics, and those metrics live for 14 days regardless of how long your instance lives. If you create thousands of worker instances you seem to overwhelm the CloudWatch GUI. We’ve reported this as a bug and are working with Amazon on it, but in the meantime we use the command line to create metrics.

Setting up to use the command line

Download the jars from http://aws.amazon.com/developertools/2534

Set up the environment variables as described in README.TXT. In particular, you must set the AWS_CLOUDWATCH_HOME variable

Set up credentials as per the README.

Run this command, which will both create the metric and give it its first value of three: mon-put-data --metric-name ZkServerCount --namespace MyNameSpace --timestamp 2013-08-25T00:00:00Z --value 3

Create a CloudWatch metric alarm based on that metric; this alarm will fire when the metric’s value is less than 3:

bin/mon-put-metric-alarm --alarm-name zk-mon --alarm-description "Alarm upon zookeeper server failure" --metric-name ZkServerCount --namespace MyNameSpace --statistic Average --period 300 --threshold 3 --comparison-operator LessThanThreshold --evaluation-periods 1 --alarm-actions arn:aws:sns:us-east-1:abc123:ZooKeeper_failure
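To make the alarm's parameters concrete, here is a rough sketch of the check they describe: average the samples that fall within one 300-second period and compare the result to the threshold. This is a simplified model for illustration only, not CloudWatch's implementation. The class and method names are invented, and treating an empty period as "not breaching" is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of the zk-mon alarm: statistic Average,
// threshold 3, comparison LessThanThreshold, one evaluation period.
class ZkAlarmSketch {
    static final double THRESHOLD = 3.0;

    // Average of the samples reported during one period.
    static double average(List<Double> samples) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.size();
    }

    // True when the period's average is strictly below the threshold.
    // Assumption: a period with no samples is treated as not breaching.
    static boolean breaches(List<Double> samplesInPeriod) {
        return !samplesInPeriod.isEmpty() && average(samplesInPeriod) < THRESHOLD;
    }

    public static void main(String[] args) {
        // All three servers up for the whole period.
        System.out.println(breaches(Arrays.asList(3.0, 3.0, 3.0)));
        // One server dropped out for one sample.
        System.out.println(breaches(Arrays.asList(3.0, 2.0, 3.0)));
    }
}
```

With these inputs the first period does not breach and the second does, since a single sample of 2 pulls the average below 3.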

Create some code that will periodically do the equivalent of the mon-put-data command above. This looks like a lot of code to put a single value into the system, but bear in mind that this includes some one-time setup, and that you can add multiple values at a time.

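import java.util.Date;

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClient;
import com.amazonaws.services.cloudwatch.model.MetricDatum;
import com.amazonaws.services.cloudwatch.model.PutMetricDataRequest;
import com.amazonaws.services.cloudwatch.model.StandardUnit;

// ACCESS_KEY, SECRET_KEY, INSTANCE_COUNT_METRIC_NAME and FRAGMENT_NAMESPACE
// are assumed to be constants defined elsewhere in the enclosing class.
public void putMetric() {
    // One-time setup: credentials and client
    BasicAWSCredentials credentials = new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
    AmazonCloudWatchClient acwc = new AmazonCloudWatchClient(credentials);

    Double metricValue = 3.0;
    MetricDatum datum = new MetricDatum()
        .withMetricName(INSTANCE_COUNT_METRIC_NAME)
        .withTimestamp(new Date())
        .withValue(metricValue)
        .withUnit(StandardUnit.Count);

    // withMetricData takes one or more datums, so several values can go in one request
    PutMetricDataRequest putMetricDataRequest = new PutMetricDataRequest()
        .withNamespace(FRAGMENT_NAMESPACE)
        .withMetricData(datum);

    acwc.putMetricData(putMetricDataRequest);
}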
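For the "periodically" part, one option is a `ScheduledExecutorService` from the JDK. This is a minimal sketch, assuming Java 8 or later and assuming the `putMetric()` method above is handed in as a `Runnable`. The class name `MetricScheduler` is invented for illustration, and the interval is whatever the caller chooses; nothing here is required by CloudWatch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper that runs a metric-publishing task at a fixed rate.
class MetricScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    // Runs the task immediately, then once per period until stop() is called.
    ScheduledFuture<?> start(Runnable publishTask, long period, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                publishTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and carry on.
                System.err.println("metric publish failed: " + e);
            }
        };
        return executor.scheduleAtFixedRate(guarded, 0, period, unit);
    }

    // Stops the background thread so the JVM can exit.
    void stop() {
        executor.shutdownNow();
    }
}
```

A caller might write something like `new MetricScheduler().start(() -> putMetric(), 5, TimeUnit.MINUTES)`, where five minutes simply mirrors the alarm's 300-second period.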

When you are all done, it's of course a good idea to trigger the condition manually just to make sure you actually do get that text or email that you really don't want to get for real!

Saturday, April 6, 2013

Moving to Agile: estimates and promises

In transitioning a legacy team to agile, the issue of estimates vs promises, and how engineers and management interpret them, is acknowledged as key. Management often asks engineers for estimates and then treats the answers as promises. This leads to a whole set of problems which are documented in the agile literature. I often think that communication difficulties can be helped by reversing a situation, so let's try that here.

I wonder if we can help clarify the difference between estimates and promises by asking management for an estimate: “If my team gives you product X by date Y, will you give us a specific level of bonus?” Any management I’ve seen will say they can’t promise a specific bonus in the future because there are too many free variables.

To which we say “exactly”.

If instead we say “we believe we have an 80% chance of hitting date Y with feature set Z”, might we get them to say “there’s a 75% chance of getting a bonus of D dollars”? As we got closer and closer to the target release date, might we get them to reduce the uncertainty of our bonus level as we reduce the uncertainty of the feature set?

I don’t think most management would actually do this but it might help them see our estimates in a new light.

Tuesday, March 26, 2013

Is It Coding in Scala or Coding With Courage And Humility That Matters?

Scala is all the rage these days, replacing its legacy cousin Java in the hearts and minds of all the cool kids. Admitting that you still code in Java is the geek’s equivalent of saying you drive a minivan and don’t have a Twitter handle. We’ve all heard the proclamations of how much more powerful and succinct Scala is…and I’ll admit to having done some of that myself.

While not dismissing those statements, because I do in fact find Scala to be a better language, I think there is another cluster of factors at work here as well. Those factors are courage and humility.

There is a correlation between Java and large, often distributed teams. This may be due in large part to the relative age of the language. It is established and “mature”, to use a kind word. If you have a 50-100 person development shop with offices in the US as well as in some subset of China, India and Russia, you are very likely to be a Java shop. If you have maintenance or sustaining teams as well as a development team, you are likely to be a Java shop. If you spend a fair bit of your time on “process”, you are likely to be a Java shop.

Notice that I said “likely”. There are lots of counterexamples. My own company uses Java and we have only three programmers. Nonetheless, I think the preceding statements are generally true.

So what? Well, my assertion is that along with large, process-oriented and distributed comes the notion of lowest common denominator coding. As the size and geographic distribution of your team grows, so does concern about “those other programmers” being able to understand and maintain your code. That leads us to want to standardize and simplify the code. We want to make the code clear to that possibly junior coder who may be new to the project, who may never have absorbed the designs, and who may not be familiar with the code base.

This leads us to write code like:

public long calculateTheValue(long someInput, long someOtherInput) {
    long intermediateOne = someInput * getSomethingElse();
    intermediateOne += someOtherInput;
    intermediateOne = someMethod();
    // ….
    return intermediateOne;
}

There is nothing wrong with this code, and to a newbie it's certainly more accessible than:

public long calculateTheValue(long someInput, long someOtherInput) {
    return somethingElse(someInput, someMethod(someOtherInput));
}

The problem is that 10 lines of code versus 3 for every method in your system results in a sea of code where you literally can’t see the forest for the trees. I can tell what each individual line of code does but I have no idea why because I can never see more than 0.01% of the code on my screen at a time.

Some readers might protest at this point that this is all just formatting and Eclipse or Emacs could in principle convert between these two representations. To which I say: not so much.

The functional approach is all about the composition of a method from a collection of existing functions. In this approach it is clear that the new method is “just” using the existing methods. The new method has no logic per se other than using the output of other, presumably well-named and tested, functions.
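As a small illustration of that style in plain Java, consider the sketch below. The names and the arithmetic are invented for the example and are not taken from any real code base. Each helper does one thing, and the composite method contains no logic of its own beyond wiring the helpers together:

```java
// Hypothetical example: a method built purely by composing smaller, named methods.
class ShippingQuote {
    static long subtotalCents(long unitPriceCents, long quantity) {
        return unitPriceCents * quantity;
    }

    static long shippingCents(long quantity) {
        return 500 + 100 * quantity; // flat fee plus a per-item charge
    }

    static long withTaxCents(long amountCents) {
        return amountCents + amountCents / 10; // assumed 10% tax, in whole cents
    }

    // The composite: no temporaries, no branching, just the pieces combined.
    static long totalCents(long unitPriceCents, long quantity) {
        return withTaxCents(subtotalCents(unitPriceCents, quantity) + shippingCents(quantity));
    }

    public static void main(String[] args) {
        System.out.println(totalCents(1999, 3));
    }
}
```

A reader who trusts the three helpers can understand `totalCents` from its single line, which is the property the paragraph above is describing.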

In the more familiar Java approach each method is a new creation made out of whole cloth. It might do any old thing it wants. In this case freedom and creativity are to be considered bad things. Each method must be examined line by line to see what it might be doing. Let's look at a bit of open source code I’m actually currently debugging:

int to = readTimeout - clientCnxnSocket.getIdleRecv();
int timeToNextPing = (readTimeout / 2) - clientCnxnSocket.getIdleSend();
if (timeToNextPing <= 0) {
    sendPing();
    clientCnxnSocket.updateLastSend();
} else {
    if (timeToNextPing < to)
        to = timeToNextPing;
}
clientCnxnSocket.doTransport(to);

After some period of study we can see that this code has two variables related to timeouts: “to” and “readTimeout”. Based on the results of two “getIdle” calls we might send a ping, and then we mutate “to” in a couple of possible ways and then use it as a parameter to a socket call. Further investigation reveals that “to” is the length of time the socket method may spend in a blocking “select” call. Thus, “to” is related to how long we can block before sending another ping.

I’ve spent the last couple of hours trying to track down a bug in this system, and the problem is that every single line is ontologically at the same level. By that I mean that any of them could have or be a side effect, any could do something other than what's expected, and the gestalt of what this code fragment intends can only be gleaned by close study.

I’ll assert that the following code does not suffer from those flaws:

if (timeToPing()) sendPing();
clientCnxnSocket.doTransport(safeTimeToWaitForRead());

And that brings us to the humility side of the equation. It's OK to write little one-line functions that just do the one thing that their name implies. timeToPing is not a function you will put on your resume. You will not proudly show it to your coworkers. You will not tell your husband/wife/partner about the amazing bit of code you wrote today. This one-line function will sit there quietly, unnoticed…working.
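For concreteness, here is one way those helpers might look if they were derived from the fragment above. This is a sketch under assumptions, not ZooKeeper's actual code: the class name, the plain `int` fields standing in for `clientCnxnSocket.getIdleRecv()` and `getIdleSend()`, and the two extra helper names are all invented for illustration.

```java
// Hypothetical extraction of the timeout arithmetic into one-line, side-effect-free methods.
class PingTimer {
    private final int readTimeout;
    private final int idleRecv; // stand-in for clientCnxnSocket.getIdleRecv()
    private final int idleSend; // stand-in for clientCnxnSocket.getIdleSend()

    PingTimer(int readTimeout, int idleRecv, int idleSend) {
        this.readTimeout = readTimeout;
        this.idleRecv = idleRecv;
        this.idleSend = idleSend;
    }

    int timeUntilReadTimeout() { return readTimeout - idleRecv; }

    int timeUntilNextPing() { return (readTimeout / 2) - idleSend; }

    boolean timeToPing() { return timeUntilNextPing() <= 0; }

    // Mirrors the original branches: when a ping is due, wait up to the read timeout;
    // otherwise wait no longer than the time remaining before the next ping.
    int safeTimeToWaitForRead() {
        return timeToPing()
                ? timeUntilReadTimeout()
                : Math.min(timeUntilReadTimeout(), timeUntilNextPing());
    }

    public static void main(String[] args) {
        PingTimer timer = new PingTimer(10000, 1000, 2000);
        System.out.println(timer.timeToPing() + " " + timer.safeTimeToWaitForRead());
    }
}
```

None of these methods mutates anything, so each can be read, tested and trusted on its own.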

If we have the humility to write simple functions and then have the courage to combine them into composite functions without extraneous scaffolding and temporary mutable variables, then we have a chance to achieve greatness…even in a legacy language.

To be sure, there are things that are trivial in Scala that simply cannot be done in Java. I know of no way to annotate a method to indicate that it does or doesn’t ever return null. We recently changed such a method to never return null. There is, however, no way to find all the code that’s now unnecessary. Or, assuming we had made the opposite change, to find the code that was now an NPE time bomb. [1]

In Scala, of course, if your function might return a Foo but might return nothing, you return an Option[Foo]…and the function’s callers must deal with the Option[Foo] or they will not compile. This isn’t fixed in Java 7, nor will it be fixed in Java 8, 9 or 10. Null is just baked into the language.
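For comparison, the closest library-level analogue on the Java side is `java.util.Optional`, added to the standard library in Java 8. The sketch below uses invented names (`ServerRegistry`, `findLeader`). It shows callers having to unwrap the result, though nothing in the language stops a method from returning `null` instead, which is the point being made above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup that advertises "might be absent" in its return type.
class ServerRegistry {
    private final Map<String, String> leaderByCluster = new HashMap<>();

    void register(String cluster, String leaderHost) {
        leaderByCluster.put(cluster, leaderHost);
    }

    // Returns Optional.empty() rather than null when the cluster is unknown.
    Optional<String> findLeader(String cluster) {
        return Optional.ofNullable(leaderByCluster.get(cluster));
    }

    public static void main(String[] args) {
        ServerRegistry registry = new ServerRegistry();
        registry.register("zk-prod", "zk1.example.com");
        // The caller cannot get at the String without deciding what "absent" means.
        System.out.println(registry.findLeader("zk-prod").orElse("no leader"));
        System.out.println(registry.findLeader("zk-test").orElse("no leader"));
    }
}
```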

Java isn’t going to be “replaced” by any of the newer languages. It will continue to lose market share but will command a large segment of the market for the foreseeable future. It’s also clear that Java will continue to evolve and will over time gain missing features such as lambda expressions and better package structure. Other things like null and the generics system are likely to be with us to the bitter end. For good or bad, erasure and generics are part of the language now and forever. Java 7 sprinkles a bit of syntactic sugar allowing the second repeat of the type to be omitted, as in:

HashMap<String, Integer> myHashMap = new HashMap<>();

but that’s a fairly trivial improvement in this age of modern type inference languages.

What this means is that engineers working with Java need to do the best they can with a 15-year-old language. Courage and humility can help with that.

[1] Yes, we could use PMD but that just points out that there is no support for such things in the language itself.

Wednesday, January 23, 2013

Nothing but better hiring will fix things

Recently on Slashdot there has been a series of posts about the quality level of the software at various companies. They've had titles like "How can I make my team write better code?" or "How can we improve our code quality?"

I also subscribe to numerous blogs and MeetUps and User Groups where titles like "Do 'x' to improve" are fairly standard.

I might just be getting jaded, but I'm coming to the conclusion that there is one and only one thing that can improve the quality of the software you or your team produce. That one thing is better hiring. To quote Joel on Software: the only two answers after an interview are "Hell yes!" or "No".

By and large people either care passionately about the code they write ... or they don't. In over thirty years of doing this I can't recall a case where a "bad" coder read a book, went to a class, attended a conference or had a talk with management and suddenly started caring.

That's a bit depressing because I speak at conferences and have published lots of papers! On the other hand, my papers and talks have been aimed at the subset of our field that already cares and is just looking to hone its skills. Trying to convince someone that unit testing, proper naming, or coherent design is a good thing is just a waste of time. People either "got it" a long time ago or they're not going to get it. Sorry.

So, work to get your team to hire better people or go someplace that already does. I just don't see another option.