OpenPilot.org – The Next Generation Open Source UAV Autopilot

Have to come back to this. Very cool.

OpenPilot.org – The Next Generation Open Source UAV Autopilot.

No Comments

Modern Lifecycle on the Cloud OS – Brian Harry’s blog – Site Home – MSDN Blogs

Interesting release today for System Center 2012: it got a Service Pack 1. All new features tie it directly to Visual Studio and Team Foundation Server. This is a very interesting move. New features include a Keynote- or Gomez-like capability called Global Service Monitor (GSM), new Lab Management support, Windows Server 2012 support, and incident integration. VS/TFS had these features ready in Update 1, which shipped at the end of November 2012. More details via the link below.

Modern Lifecycle on the Cloud OS – Brian Harry’s blog – Site Home – MSDN Blogs.

No Comments

Are You Ready for the Cloud? Some Food for Thought.

What’s the primary objective of moving to the cloud? I submit it is the ability to tie product/application usage directly to operational costs, resulting in a cost-per-customer metric that is easily compared to revenue per customer. Total Cost of Operations for a product/application can be tracked much more easily using cloud technologies. If possible, Total Cost of Operations should be tracked in a before-and-after manner to capture both trending and point-in-time operational cost data.
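
To make the metric concrete, here is a minimal sketch of the calculation. The spend, customer, and revenue figures are invented for illustration; a real calculation would pull them from monthly cloud billing and subscription data.

    # Hypothetical figures; replace with real monthly billing and customer data.
    monthly_cloud_spend = 42_000.00        # compute, storage, network, monitoring
    monthly_active_customers = 18_500
    monthly_revenue = 96_200.00

    cost_per_customer = monthly_cloud_spend / monthly_active_customers
    revenue_per_customer = monthly_revenue / monthly_active_customers
    margin_per_customer = revenue_per_customer - cost_per_customer

    print(f"cost/customer: ${cost_per_customer:.2f}, "
          f"revenue/customer: ${revenue_per_customer:.2f}, "
          f"margin/customer: ${margin_per_customer:.2f}")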

Adoption of cloud technologies requires a modernized and decently mature release discipline. Copying files between currently running OS instances no longer works, especially if you are using auto-scaling features of the different cloud platforms. Total Cost of Operations, which includes development and testing costs for each release, can now be used to calculate a TCO (where the ‘O’ is Operations) per customer.

This brings very basic and much clearer alignment between product development/IT/operations and the revenue generating parts of organizations.

A product’s readiness to embrace the cloud is a function of:

  1. Data model. Think NoSQL, a megatrend associated with movement to the cloud. Assessing how you will manage your data in the cloud is one of the two major components of assessing cloud readiness. While going the NoSQL route is not a requirement of embracing the cloud, it is worth plotting the path to identify possible operational and financial benefits.

  2. Services orientation. Services orientation is the second major component of assessing cloud readiness. All services, all the time. Services are assembled into applications and presentation layers. Services must be well structured, discoverable, secure, and protected in the cloud. If you haven’t reached this level of maturity for your product or application, migrating to the cloud will be difficult if not impossible.

  3. Design strength and maturity. The ability to design and execute a path from current state to future state in the cloud. Does your team have the ability, willingness, and existing skill sets to map this path today, or do you require additional assistance from external specialists and SMEs? Case in point: the Netflix journey toward cloud adoption.

  4. Code currency. Currency of the application code plays a big factor in cloud readiness. If an application is more than two major releases behind in terms of language and framework versions, and/or includes components rooted in legacy languages, it will be difficult if not impossible to get it to the cloud. For new applications and products, this is not much of an issue. For enterprise applications or older products looking to make the leap, this could be a major stumbling block.

  5. Development discipline. Product teams will need to adjust to the requirements that moving to the cloud brings with it. Moving to the cloud will require greater ownership and accountability on the part of the product teams, as there is no infrastructure supplier to lean on. In the cloud, all infrastructure services are just that: services that are well known and published. You can’t run out of disk space, you can’t blame a storage array or “the network” for performance problems, and you can’t misconfigure a firewall or load balancer in the cloud.

  6. Testing discipline. Automated testing is the key to success in the cloud. You’ll see this time and again in presentations and industry discussions. Automated testing forces the testing teams to become developers of a different type. The only language that doesn’t lie is code. Test code makes testing transparent to development counterparts and management alike. A select subset of this testing code must be passed forward into operations to ensure evaluation criteria, design assumptions, and constraints are continuously validated throughout the product’s lifecycle (a minimal sketch of this idea follows the list).

  7. Operational maturity. The operations and support teams must understand the product as well as the developers and testers do. They must also change their focus from being firefighters to being assassins, eliminating the possibility of failure before it happens. The team that operates the Chaos Monkey at Netflix is a potential model for this type of operations team. The support and operations teams must become developers of yet another type, focused on operational algorithms, programmatic failure injection, detection, prevention, recovery, and constant evaluation of analytics.
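
As promised in item 6, here is a minimal sketch of what “test code passed forward into operations” could look like: a small probe, written once by the testing team, that runs both in the build pipeline and as a recurring operational check. The endpoints, latency budgets, and overall structure are illustrative assumptions, not a prescription for any particular product.

    # A hypothetical smoke check reusable in CI and as an operational probe.
    # Endpoint URLs and latency budgets below are made-up examples.
    import time
    import urllib.request

    CHECKS = [
        # (name, url, latency budget in seconds)
        ("login page", "https://example.com/login", 1.0),
        ("catalog API", "https://example.com/api/catalog/health", 0.5),
    ]

    def run_checks(checks=CHECKS):
        """Return (name, ok, detail) for each check."""
        results = []
        for name, url, budget in checks:
            start = time.monotonic()
            try:
                with urllib.request.urlopen(url, timeout=budget) as resp:
                    elapsed = time.monotonic() - start
                    ok = resp.status == 200 and elapsed <= budget
                    detail = f"status={resp.status}, latency={elapsed:.2f}s"
            except Exception as exc:  # timeout, DNS failure, HTTP error, etc.
                ok, detail = False, str(exc)
            results.append((name, ok, detail))
        return results

    if __name__ == "__main__":
        for name, ok, detail in run_checks():
            print(f"{'PASS' if ok else 'FAIL'}  {name}: {detail}")

The same checks can encode design assumptions and constraints (latency budgets, expected status codes), so operations keeps validating them long after the release ships.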

Most enterprises will have to become comfortable with multiple cloud providers based on the platforms, languages, and frameworks they currently have deployed. This raises the question of public versus private cloud providers. Based on my experience, private clouds cost only marginally less than totally private infrastructure; in some cases, they cost more. Public clouds scale better across economics, performance, and availability. Private clouds don’t fully support the evolution toward the principles and maturity levels I’ve identified above.


No Comments

Need Your MacBook Pro Battery Replaced? That’ll Be 5 Days, Sir.

If you own a MacBook Pro with Retina Display and your battery stops taking a charge or starts malfunctioning in some other manner such that it needs to be replaced, the repair could take five days or longer. A friend and colleague of mine in Austin, Texas, went to the Apple Store at the Barton Creek Mall to replace his battery and was told he would have to surrender his computer for five days while they replaced it. Apparently the battery is glued in place, so the bottom half of the computer has to be replaced when the battery is replaced. Really? Definitely a design flaw.


No Comments

Netflix in the Cloud at SV Forum

This is one of the most impressive cloud adoption examples I came across in 2012.

Netflix in the Cloud at SV Forum.

No Comments

The Role of Recovery Time in Application Availability

The following decomposition of availability related to Recovery Time Objective (RTO) was performed by Mike Antico during the course of a study we both participated in. This is an interesting analysis and food for thought for anyone responsible for the availability and performance of an online product or application with paying customers (or users who help fund the operation of a site via advertising views).

Some background for the discussion below. First, it assumes that an application has multiple sites or strings which provide mirrored functionality. Second, Recovery Point Objective (RPO) is a lower priority than Recovery Time Objective (RTO). Finally, failover, or transferring load between sites or strings, may not be automated.

In order to achieve, say, 99.99% availability where RTO is higher priority than RPO, consider the following.

Pf = Pf1 * Pf2

That is, the probability of failure of the global service is equal to the product of the probabilities of failure of the two operational sites or strings taken individually. Where,

  • Pf = The probability of failure of the global service. Failure of the global service is a loss of availability of any serviced client.
  • Pf1 = The probability of failure of string #1
  • Pf2 = The probability of failure of string #2

Basic stuff. If an application aspires to 99.99% availability, or a failure rate of .0001, and both strings are essentially identical in design, you get the following simple formula:

.0001 = Pf1 * Pf2 = Pf1^2
-or-
Pf1 = Pf2 = .01

or, restated, availability for each string needs to be 99% or better.

However, each string achieves a measured 99.5% availability each year, often better. So how can a system with such operational history at the string level have less than 99.99% availability? The answer is the Recovery Time Objective (RTO). Let’s say in a given year an application achieved 99.987% availability (68.3 outage minutes). There were 8 outages, and the time to switch clients over to the alternate string was, on average, 8.54 minutes. Four nines of availability allows for 52.56 minutes of outage per year. If each recovery had been a little less than 2 minutes faster, the application would have made the 99.99% target. This is how availability can be improved by sharpening recovery time. Consider the alternative – working for fewer outages.
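
A quick sketch of the arithmetic above, using the figures from this example (525,600 minutes in a year, 8 outages averaging 8.54 minutes of recovery each):

    # Availability budget arithmetic for the example above.
    MINUTES_PER_YEAR = 365 * 24 * 60                  # 525,600

    target = 0.9999
    allowed_outage = (1 - target) * MINUTES_PER_YEAR  # 52.56 minutes

    outages = 8
    avg_recovery = 8.54                               # minutes per outage
    actual_outage = outages * avg_recovery            # 68.32 minutes

    achieved = 1 - actual_outage / MINUTES_PER_YEAR   # ~99.987%

    # How much faster would each recovery have needed to be to hit four nines?
    needed_per_outage = (actual_outage - allowed_outage) / outages  # ~1.97 minutes
    print(f"achieved: {achieved:.5%}, "
          f"needed recovery improvement per outage: {needed_per_outage:.2f} min")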

Which is less costly: improving availability by reducing the probability of failure (fewer outages), or by improving RTO? We’ve already pointed out these two means of improving availability. Improvements in RTO are often dismissed as unimportant, but we can see that RTO can be a valuable lever. RTO usually comes up in the literature in the context of Disaster Recovery (DR), so IT thinks of RTO in that context and not in the context of availability. This is a big oversight for the industry as a whole. But let’s pursue this a little further.

Consider the infamous semi-log chart of the cost of availability:

[Chart: the cost of availability versus the availability level, plotted on a semi-log scale]

Δ1 << Δ2 – that is, small improvements in availability create disproportionately large increases in cost. The curve is exponential, so this holds at any point along it. Changes in RTO, by contrast, are roughly linear when they depend on procedural refinements or modest automation. The point is simple: RTO can improve availability dramatically, for dramatically less cost.

On RPO: a quick look at RPO matters a great deal for an honest measure of availability. If, for example, a system experiences a 1-minute outage but 30 minutes of work are lost, what’s the loss in availability? It is 30 minutes, not 1 minute! It hardly matters that the service is back up in 1 minute if you’ve lost 30 minutes of work. A comprehensive measure of operational availability needs to account for such dependencies. Making this clear feeds design and goes straight to customer satisfaction.
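
A minimal sketch of that accounting rule: charge the larger of the downtime and the lost-work window against availability, so a fast recovery does not mask a large data loss.

    def effective_outage_minutes(downtime_minutes, lost_work_minutes):
        """Count the larger of downtime and lost work against availability."""
        return max(downtime_minutes, lost_work_minutes)

    # The example above: a 1-minute outage that loses 30 minutes of work.
    print(effective_outage_minutes(1, 30))   # 30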

In this discussion we are only considering outages due to the loss of central services – not client communication outages. Recording outages only from the central systems perspective may undercount outages experienced by the client.


No Comments

Nike + ?

I started with a question: “How do I gather data about my heart rate and steps taken during the day on my iPhone 4?” I wanted to start collecting data about my daily activity to see how active (or inactive) I actually am and to see the effect of my environment on my heart rate. After a few days of sporadic research I discovered it is possible to do one or the other with various Apple devices, but it is only possible to do both with an iPod Nano 6g, the Nike + iPod kit, and a Polar Wearlink for Nike + heart rate monitor. It was a bit confusing going through all of the marketing literature, as the Nike + sensor can be used without the receiver when paired with the iPhone 4. However, the iPhone will only receive from the Nike + sensor and not the heart rate monitor. I am about to spend the next week experimenting with the capabilities of my newly acquired equipment, but I think I will find that I need two apps for the two types of data I am looking to collect: the Nike + iPod app and the native Pedometer app on the iPod Nano 6g. This may be an issue, as I don’t think I can use two apps at once on the Nano. The Nike + iPod app says it can do both, but I am not sure about the accuracy, and it requires the Nike + receiver to be plugged into the iPod when in use, which may affect battery life.

My first experiment will be to pit the two apps against each other to compare pedometer numbers (steps taken). I will be in a fairly controlled environment this week at a conference: in a hotel, on the same schedule every day, but at least with some decent walking between sessions. I will report back my results at the end of the week.

Also, the Nano has no connectivity other than through the universal connector. I’ll need to figure out how I get my data off of it. My guess is the Nike + web site has some sort of browser app that pulls it off the device directly. But what about the native Pedometer app? Hmmm, more research required.

No Comments

The Crux of the Cloud

In discussions over the past few months with Microsoft and other cloud services vendors, the issue the entire industry is grappling with right now is how to allow “untrusted code” to run in the cloud. “Untrusted code” seems to be defined as “code not developed by me,” with “me” being the operator of the cloud service. This applies to any cloud services for applications, as opposed to cloud services for host virtualization. The limitations seem to be tied to the fact that most cloud services vendors are still in the throes of migrating traditional private-instance, or “on-prem,” software to hosted, multi-tenant environments that aggregate all of the complexities and requirements of “on-prem” software into a single location. This rush to the cloud is fundamentally changing software design and architecture.

The answer isn’t web services, and it’s not better virtualization. The answer is aligning maturity levels between consumers and suppliers of cloud services. With all of the marketing tricks and schemes focused on driving adoption of cloud services distracting focus from implementation, at the end of the day it comes down to the customer and service provider being able to communicate and live with each other’s expectations, technically and operationally. Service providers need to be able to publish clearly understood technical and operational standards. Customers have to be able to understand and work with the impact these standards have on their applications and services. This drives a lot of cost into the founding business case, which is often missed.

1 Comment

SPChainGang: SharePoint 2007 & 2010 Link Analysis & Rewriting Utility

SPChainGang is a link analysis and rewriting utility for Microsoft SharePoint 2007 and SharePoint 2010. It locates, analyzes, and reports on links found in most web parts in SharePoint, but not in documents. In one mode it will rewrite links based on a path mapping defined in a SharePoint list. Based on my research, there is nothing else in the marketplace that performs these functions.

No Comments

Time To Write Again

After one and a half years of not writing, it is time to write again. My life feels empty without writing. This site will be archived and multiple blogs put up in its place.

No Comments