Steve Conover's blog
The Pivotal News Network has been going strong for six months (Pivots: talk to me if you'd like to share into the feed). Here are some highlights from May:
When starting any software project, there’s an age old argument: should we build something simple that solves our current problem or should we use an existing product that’s more complex, but more feature rich, since we know that’s where we’re going to end up in the future?
...
an oft neglected repercussion of building too much too quickly is that the extra functionality can calcify your product and make it very rigid. Releases become more complex, new features take longer to implement and bugs take longer to fix. You can find yourself a prisoner of your product, maintaining functionality and features that no one (or very few) people use. It can demoralize an engineering team, making them more and more susceptible to the nuclear option: the big rewrite.
I think the tendency to lean towards a more exhaustive solution upfront comes from a time when the effort required to change software was much higher than it is today. When systems were written in C, C++, Perl or even Java, making changes was a large undertaking. The thought of possibly throwing away chunks of code was nerve-racking. It represented a huge investment in time and money. However, with today's rapid development languages and frameworks like Ruby/Rails & Python/Django, the investment required to create something, both in time and money, is rapidly shrinking.
Jeff [Patton]’s reply shocked me:
“The Ruby community cares about building high-quality apps, but doesn’t necessarily care about shipping high-value apps.”
Jeff went on to say that the Ruby community is obsessive about craftsmanship. This is a good thing, of course. We test. We write clean code. We take the time and care to build applications that are beautiful and do what our customers ask for.
Therein lies the rub: what customers ask for is rarely what they want, and almost never what they need. As Henry Ford put it, “If I had asked what people wanted, they would have said faster horses.” Or as I put it, your customer may pay you $1000 to deliver him a knuckle sandwich, but no amount of precision or strength training is going to leave you with a happy customer.
It turns out that constructing a high-quality application is not enough – you have to conceptualize and design an application that users will actually find useful. Doing this is every bit as difficult as constructing the software, if not harder. It requires a combination of research – generating new ideas from asking questions & identifying problems – and feedback – testing out ideas you’ve created. The Ruby & Agile worlds have been primarily focused on getting user feedback, without doing the all-important research.
Weeks ago, some people in the Ubuntu community got a bit disappointed with the distribution’s core team:
We are supposed to be a community, we all use Ubuntu and contribute to it, and we deserve some respect regarding these kind of decisions. We all make Ubuntu together, or is it a big lie?
We all make Ubuntu, but we do not all make all of it. In other words, we delegate well. We have a kernel team, and they make kernel decisions. You don’t get to make kernel decisions unless you’re in that kernel team. You can file bugs and comment, and engage, but you don’t get to second-guess their decisions. We have a security team. They get to make decisions about security. You don’t get to see a lot of what they see unless you’re on that team. We have processes to help make sure we’re doing a good job of delegation, but being an open community is not the same as saying everybody has a say in everything.
- from Velocity as a Goal
From my experience having velocity as a goal doesn't make any difference to the motivation of the team which is often cited as the reason for referring to it as a target. In all the teams I've worked on people are giving their best effort anyway so they can only really have an impact on the velocity by doing one of the following:
- Working longer hours
- Cutting corners on quality (by less testing perhaps)
- Finding a smarter way of working
... In reality I haven't noticed that people on the teams I've worked on pay that much attention to whether velocity is considered a target or not. People just do their job and we pretty much always have the same velocity each week regardless.
More popular shared items:
7 LESSONS LEARNED WHILE BUILDING REDDIT TO 270 MILLION PAGE VIEWS A MONTH
"The best way to handover work is to leave a broken test for your colleague to fix."
How many lines of code does it take to create the Android OS?
The unimportance of product names (which got smacked down in the Google Reader comments)
XSS #1: There's a huge cross-site scripting hole if you use the meta refresh tag...it has a "data" attribute into which you can insert arbitrary javascript.
XSS #2: Cross-site scripting resources, from an internal mailing list:
"I've gained a new appreciation for the importance of carefully thinking through security and escaping in RoR; there's more than just h()'ing all your user entered data."
XSS vulnerabilities - http://ha.ckers.org/xss.html.
Very useful catalog of different XSS vectors. Includes some utilities to base64-, URL- and hex-encode attacks so you can test out your apps.
General OWASP wiki - http://www.owasp.org/index.php/Main_Page. Lots of useful information here. OWASP is a nonprofit group chartered to improve the security of webapps in general.
Security Guide for RoR - http://www.lulu.com/product/download/owasp-ruby-on-rails-security-guide/4489819. General guidelines and things to think about for securing RoR apps.
Loofah - http://github.com/flavorjones/loofah is supported by a fellow Pivot and provides fast and good sanitization built on Nokogiri, albeit slightly slower on short strings than brittle regular expressions. It's in production at several companies.
"Loofah excels at HTML sanitization (XSS prevention). It includes some nice HTML sanitizers, which are based on HTML5lib’s whitelist, so it most likely won’t make your codes less secure."
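To make the "more than just h()'ing" point concrete, here's a small sketch using the standard library's ERB::Util.html_escape, which is what h() boils down to. It shows that escaping neutralizes markup but leaves a javascript: URL untouched, so a user-supplied link needs its own check. The safe_url? helper and its http/https allow-list are assumptions made up for this example, not something from the guides above.

```ruby
require "erb"

# Stand-in for the h() view helper: plain HTML escaping.
def h(text)
  ERB::Util.html_escape(text)
end

# Markup gets neutralized, as expected.
h("<script>alert(1)</script>")
# => "&lt;script&gt;alert(1)&lt;/script&gt;"

# But a javascript: URL contains nothing that HTML escaping touches,
# so it survives intact inside an href attribute.
user_url = "javascript:alert(1)"
h(user_url) == user_url # => true

# Hypothetical extra check: only accept absolute http(s) links.
def safe_url?(url)
  url.match?(%r{\Ahttps?://}i)
end

def profile_link(url)
  return "" unless safe_url?(url)
  %(<a href="#{h(url)}">homepage</a>)
end
```

With this, profile_link("javascript:alert(1)") returns an empty string, while an ordinary http:// URL still renders as an escaped link.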
Happy New Year
We've pooled some Pivot shared tech news feeds and made this feedburner feed:
http://feeds.feedburner.com/pivotal-news-network
The content is in the spirit of Blabs, so we hope readers here might find it to be useful. See what you think.
by Russell Edens. He has a great take on why Erector is interesting, complete with code examples:
With erector [views] are first class plain old ruby objects. Why is this good? It gives you all the tools of inheritance and mixin's for your views. That is cool. Especially for an application with multiple views of the same underlying models. You can refactor your views into base classes that derive and render the same data in different ways. This is object oriented design for views. Nice.
I've seen object oriented view code in other languages and it leads to some very powerful re-use that all OO programmers can understand. The most ambitious of these attempts was by an HR company ...[that] created their own markup language that was object oriented. The nature of HR data is that it has very complicated rules regarding who can see what data and when. The OO design of the language allowed that to be abstracted to the base classes and a functional programmer simply focused on the problem at hand. They took it further, as all commercial enterprise applications do, and they allowed the customer to define new models and views. Those views were very easy to write with this advanced data access logic abstracted out. Their customers loved it. They wrote very advanced business applications on top of this abstraction.
Views as simple classes, methods, and objects in Ruby - perfect!
Erector Hello World:
class Hello < Erector::Widget
  def content
    html do
      head do
        title "Hello"
      end
      body do
        text "Hello, "
        b "world!"
      end
    end
  end
end
For more see the Erector user guide.
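The inheritance point from the quote (one base class, several renderings of the same model) can be illustrated without Erector at all. This is a plain-Ruby sketch: PersonView, PersonRow and PersonCard are hypothetical names, the model is just a hash, and the markup is built with string interpolation rather than Erector's DSL.

```ruby
require "cgi"

# Base view: owns field access and escaping, leaves layout to subclasses.
class PersonView
  def initialize(person)
    @person = person
  end

  def render
    raise NotImplementedError, "subclasses decide the layout"
  end

  private

  def name
    CGI.escapeHTML(@person.fetch(:name))
  end

  def email
    CGI.escapeHTML(@person.fetch(:email))
  end
end

# Same data rendered as a table row...
class PersonRow < PersonView
  def render
    "<tr><td>#{name}</td><td>#{email}</td></tr>"
  end
end

# ...and as a card, reusing the inherited accessors.
class PersonCard < PersonView
  def render
    "<div class=\"card\"><h2>#{name}</h2><p>#{email}</p></div>"
  end
end
```

PersonRow.new(name: "Ann", email: "ann@example.com").render and the PersonCard equivalent produce two different views of the same record, and the escaping lives in exactly one place.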
We made a code change and deployed to demo, and all of a sudden some of our ruby processes were eating a ton of CPU against our full dataset.
In the java world you can send a SIGQUIT to any running java process and get a thread dump. Go ahead, run a java process and kill -3 it.
You can get this in the ruby world by using xray:
sudo gem install xray
Drop this into your code:
require "xray/thread_dump_signal_handler"
Now:
kill -3 <ruby pid>
Look in the log file where you're sending stdout. You'll see something like:
=============== XRay - Done ===============
/usr/lib64/ruby/gems/1.8/gems/eventmachine-0.12.0/lib/eventmachine.rb:224:in `call'
_ /usr/lib64/ruby/gems/1.8/gems/eventmachine-0.12.0/lib/eventmachine.rb:224:in `run_machine'
_ /usr/lib64/ruby/gems/1.8/gems/eventmachine-0.12.0/lib/eventmachine.rb:224:in `run'
_ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/backends/base.rb:57:in `start'
_ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/server.rb:150:in `start'
_ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/controllers/controller.rb:80:in `start'
_ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/runner.rb:173:in `send'
_ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/runner.rb:173:in `run_command'
_ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/runner.rb:139:in `run!'
_ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/bin/thin:6
_ /usr/bin/thin:19:in `load'
_ /usr/bin/thin:19
(that's thin patiently waiting to service the next request)
A one-line code drop-in results in a powerful new inspection tool. Pretty neat.
For bonus points:
ps ax | grep "thin server" | grep -v grep | awk '{print $1}' | xargs kill -3
For more bonus points, stick this in a capistrano task and grab the thread dumps from your logs, and you'll have a cluster-wide snapshotting tool.
We kill -3'd our CPU-eating thins and discovered a directory scan problem introduced by a recent code change - totally obvious from the thread dump. Now we're nailing it down with a failing perf unit test and fixing the problem.
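If you're curious what a handler like this has to do, here's a minimal sketch of the idea in plain Ruby. It is not xray's implementation, and it assumes a Ruby where Thread#backtrace is available (1.9 and later) on a platform that has SIGQUIT; dump_threads is a name invented for the example.

```ruby
# Print the current stack of every live thread to the given IO.
def dump_threads(io = $stderr)
  Thread.list.each do |thread|
    io.puts "=== #{thread.inspect} ==="
    # backtrace can be nil if the thread finished in the meantime.
    (thread.backtrace || ["(no backtrace)"]).each do |frame|
      io.puts "  #{frame}"
    end
  end
end

# After this, `kill -3 <ruby pid>` prints a dump instead of killing the process.
Signal.trap("QUIT") { dump_threads }
```

The output lands wherever the process's stderr goes, so the same log-grabbing approach applies.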
I'm here as part of the Best Buy Remix crew, hanging out in Mashery's Circus Mashimus all weekend. Come by, have a beer, and check out Remix and other interesting API stuff if you're at SXSW.
We're in a room near the front of the convention center, not far from the Pepsi booth.
-Steve
By Steve Conover and Brian Takita
Peer-to-Patent, one of Pivotal Labs' clients, got Slashdotted last week, and we had no trouble handling the load. The site was just as responsive as it always is, and we didn't come close to having a scale problem.
Moral of the story: the technology for serving static web pages is old, boring, and extremely scalable. If you have the type of site that can be page-cached, do so aggressively, starting with the front page and any pages likely to be linked to. We got a huge payoff for the engineering time that we invested in our page-caching strategy.
Highlights:
- We moved away from Rails page-caching and developed our own "holeless cache", which uses a symlink trick (see below) to instantly and "holelessly" switch to a new version of a cached page. (The cache "hole" is the time between the expiration or purge of a cached page and the time when it's regenerated. The danger is that in that time your Mongrels can be saturated with requests - something we proved to ourselves could easily happen.)
Here's our symlink trick, using the front page as an example:
1. Have index.html point to index.html.current
2. If index.html.current is >= 20 minutes old:
   - Copy index.html.current to index.html.old
   - Point index.html to index.html.old
   - Rewrite index.html.current by asking Rails for the page (using the process method)
   - Repoint index.html back at index.html.current
3. Repeat step 2 every minute using a cron job.
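The steps above can be sketched in plain Ruby. Treat this as a sketch under assumptions rather than the project's actual routine: repoint and refresh_holeless are names made up for the example, and the block stands in for asking Rails for the page. One detail the sketch adds on its own is swapping the symlink by creating the new link under a temporary name and renaming it over the old one, so index.html never goes missing during the switch.

```ruby
require "fileutils"

# Point `link` at `target` without a gap: build the new symlink under a
# temporary name, then rename it over the existing one in a single step.
def repoint(link, target)
  tmp = "#{link}.tmp#{Process.pid}"
  File.symlink(target, tmp)
  File.rename(tmp, link)
end

# One pass of step 2. Returns true if the page was regenerated.
# The block must return the fresh HTML (a stand-in for the Rails call).
def refresh_holeless(dir, name = "index.html", max_age: 20 * 60)
  link         = File.join(dir, name)
  current      = "#{name}.current"
  old          = "#{name}.old"
  current_path = File.join(dir, current)

  if File.exist?(current_path)
    # Still fresh: nothing to do on this run.
    return false if Time.now - File.mtime(current_path) < max_age

    # Keep serving a copy of the stale page while the new one is built.
    FileUtils.cp(current_path, File.join(dir, old))
    repoint(link, old)
  end

  File.write(current_path, yield) # regenerate in place
  repoint(link, current)          # switch back to the fresh page
  true
end
```

A cron job would call refresh_holeless once a minute; most runs return false immediately because the page is still fresh.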
For cache expiration that's model-based, we make a call from the model observer class to our holeless cache routine, instead of using Rails cache sweepers. So, instead of just deleting the cached page we regenerate it in place.
It was important to write tests that proved that the HTML we generated for cached pages looked exactly the same in different "modes" (user logged in vs not, for example). This forced us to push modal decision logic out of Markaby templates and into JavaScript, meaning that view-oriented Rspec tests asserting modal differences became useless. We rewrote them as Selenium tests.
Performance/load testing: we tried several tools and approaches and found that a simple Ruby script that launches wget requests (that write to /dev/null) in many separate threads worked best for us.
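A script along those lines might look like the following. This is a rough sketch, not the script we used: hammer, its defaults, and the injectable fetch lambda are assumptions for illustration. The default fetch shells out to wget with -q (quiet) and -O /dev/null (discard the body).

```ruby
# Default fetcher: one wget request, body thrown away.
# system returns true on success, false or nil otherwise.
WGET = ->(url) { system("wget", "-q", "-O", "/dev/null", url) }

# Fire threads * requests_per_thread requests at url and report the outcome.
def hammer(url, threads: 10, requests_per_thread: 20, fetch: WGET)
  started = Time.now
  workers = Array.new(threads) do
    Thread.new do
      # Each thread tallies its own successes, so no locking is needed.
      Array.new(requests_per_thread) { fetch.call(url) }.count { |ok| ok }
    end
  end
  ok    = workers.sum(&:value) # value waits for the thread to finish
  total = threads * requests_per_thread
  { ok: ok, failed: total - ok, seconds: Time.now - started }
end
```

Calling hammer("http://localhost:3000/", threads: 50) against a page-cached URL and then against an uncached one gives a quick feel for the difference; the URL and thread count here are placeholders.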
We send down exactly one .js and one .css file. If you are sending down more than one of each of these to the browser, you have a performance problem. Fix it with asset packager.
Update: one clarification about the cron job: we deploy this "automatically" using capistrano.
