Interesting Things
- RubyMine 1.1 + Latest Mac OS X Java Upgrade + configuring RubyMine to work with Java 1.6 = Complex FAIL. We downgraded to RubyMine 1.0.5 and it works again.
Jeff is Chief Scientist at Cloudera, which helps enterprises with Hadoop implementations.
Hadoop consists of three separate modules, which are apparently in the process of being split into separate Apache projects:
I’ll just mention some of the interesting little tidbits from the presentation:
HDFS:
Some examples of Hadoop-based projects:
Hadoop @ Yahoo: 16 clusters, each cluster is 2.5PB and 1400 nodes
Cloudera maintains convenient, stable Hadoop packages – it’s all open-source – so you don’t have to go around figuring out what version of what subproject works best with others.
Testing: Hadoop has a standalone mode, which uses a single reducer in one JVM.
Jeff mentioned that they use Facebook’s Scribe for distributed logging.
And last but not least, Cloudera has a GetSatisfaction page.
Quick report from Steve Souders’ workshop at Velocity 2009. Steve is a current Googler and the author of High Performance Web Sites.
Short version: he has a brand new book out, and if you’re interested in any of the following tips you should probably buy it:
Even Faster Web Sites
Resources:
cuzillion – model your page and see how various browsers load it using Firebug’s Net tab or…
httpwatch works in IE and Firefox
pagespeed – A little like YSlow (http://developer.yahoo.com/yslow/); Steve uses a combo of YSlow and Pagespeed day-to-day. It gives you a different set of perf information, notably what % of functions in your script are actually invoked in the header, vs afterwards.
spriteme A tool that Steve developed and just released, which looks to be a major leap in css sprite-generation technology – i.e. it doesn’t just do the (easy) part where all the images get combined together. You get css help, etc.
smush.it Uses non-lossy image optimization methods to reduce the number of bytes your images take.
Some tips (I’m assuming these all get better/more elaborate treatments in his book):
For over 95% of websites, the vast majority (80-90%) of page load time is spent on the front end (i.e. only 10-20% is spent transferring the html).
Scripts block other elements from downloading. So while js is downloading and executing, nothing else can be downloaded.
Typically only 25% of js functions are called before body onLoad (pagespeed helps you see what % this is for you). So one thing to consider when optimizing is lazy-loading the other 75%.
There are tricks you can use to pull down scripts in parallel, for instance by creating script tags through document.createElement and attaching to the dom. But there are other techniques, and pitfalls for many of them in different browsers. He goes through the strategy decision tree in the new book.
Bad: stylesheet tag followed by an inline script. This stops all the parallel resource loading and forces the browser to only download the js, then continue.
Using different domains for assets. A well-known trick. Steve adds that returns diminish around 2-4 domains. Also points out that the browser doesn’t care about whether these are actually separate hosts, just that the actual names are different, so you could use a simple CNAME record to make this work with one server.
Flush the document early. Particularly header sections (some common images + html). In addition to the raw speed benefit, Google user testing shows this is very positive for user perception – they get visual feedback earlier and have a perception that it’s a “fast page”.
Note that FF 3.5+ contains an interesting new event: MozAfterPaint – a great way to see when the browser decides to repaint parts of the page. See John Resig’s post on MozAfterPaint for more.
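The asset-domains tip above can be sketched in a few lines of Ruby. This is only an illustration, not something from the workshop: the host names are hypothetical, and hashing the path with CRC32 is just one assumed way to make sure a given asset always maps to the same host, so browser caches stay effective.

```ruby
require "zlib"

# Hypothetical host names. With CNAME records they could all point at one server.
ASSET_HOSTS = %w[assets0.example.com assets1.example.com assets2.example.com].freeze

# Pick a host deterministically from the asset path, so the same asset
# always gets the same URL no matter which page references it.
def asset_url(path, hosts = ASSET_HOSTS)
  index = Zlib.crc32(path) % hosts.size
  "http://#{hosts[index]}#{path}"
end

puts asset_url("/images/logo.png")
puts asset_url("/stylesheets/site.css")
```

Keeping the list at two to four hosts matches the point about diminishing returns.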
And don’t miss stevesouders.com
A new version of Tweed is available! (v.0.9.7)
You can place a marker (will appear above tweet) on any timeline — they are saved and reloaded with the timeline. If you mark another tweet, the marker will move. To remove the marker completely, tap unmark from the tweet menu for the tweet with the marker.
There is a bug we just discovered (after we gave Palm the release for review): if you have a marker on a timeline and then tap “Load More”, the marker position is not displayed correctly. It is stored correctly, so if you leave the timeline and return to it, it is then displayed correctly. (We will fix this soon; we didn’t want to block the other features since we only discovered it this morning.)
So, we know photo upload/integration is missing.
Here’s the scoop on photo integration. There isn’t direct support for photo upload in the Palm Mojo SDK, yet.
It is definitely coming, though given the incredible demand for photo integration in Tweed (everyone wants it yesterday, or more precisely, at launch), we realize it won’t be ready soon enough.
So, we are working with Palm on alternatives and options that will deliver the ability to tweet photos from Tweed.
We understand it is frustrating, but please know it is a HUGE priority for us and we are actively working on it.
We hope to have something in the coming weeks.
Let us know your feedback either @tweed on Twitter, tweed-support@pivotallabs.com or http://tinyurl.com/satisfaction-tweed




We’ve launched a Tracker Users Group in San Francisco, and the second meetup is on June 24th at 6:30pm at the Pivotal Labs office on Market St.
This second meetup will include a demo for new users; a rundown of the new features rolled out last night and soon to come; and a discussion of Story Estimation, Point scales, and the philosophy behind how they are used in Tracker.
Click the link below to become a member of the group and RSVP. We hope to see you there!
http://www.meetup.com/San-Francisco-Pivotal-Tracker-Users-Group/calendar/10658822/
We’ve added some new features to Pivotal Tracker.
There is a new activity feed on the dashboard. The feed lets you quickly view recent events that have occurred in all your projects including new stories, comments and stories that have been accepted and rejected. You can subscribe to this activity feed using any blog reader that supports Atom. Click the Subscribe link above the feed or the feed icon in the browser address bar and your browser should handle the rest. Recent activity data is also available via the API.

Another new feature on the dashboard is a small graph that shows the number of points accepted per iteration. The current velocity for each project is also displayed. If you hover over a project, you’ll see links to some of the more commonly used project pages, including members and settings.

The project history panel should now be more readable. Event timestamps are relative now (for example, “2 hours ago”), and updates to the same story within a short period of time are bundled together. For example, if you add a new story, and immediately move it to the backlog, this will appear as one entry in the project history. You can also subscribe to a project’s history feed by clicking on the feed icon in the browser’s address bar.
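To make the bundling idea concrete, here is a minimal Ruby sketch. It is not Tracker's actual implementation: the `Event` struct, the `bundle_events` helper and the five-minute window are all assumptions made for illustration. It groups consecutive updates to the same story when they fall within the window.

```ruby
# Hypothetical event record: which story changed, when, and what happened.
Event = Struct.new(:story_id, :at, :description)

BUNDLE_WINDOW = 5 * 60 # seconds; an assumed value, not Tracker's real setting

# Group consecutive events for the same story that occur within the window.
def bundle_events(events, window = BUNDLE_WINDOW)
  events.sort_by(&:at).each_with_object([]) do |event, bundles|
    last = bundles.last
    if last && last.first.story_id == event.story_id && event.at - last.last.at <= window
      last << event      # same story, close in time: extend the current bundle
    else
      bundles << [event] # otherwise start a new history entry
    end
  end
end

base = Time.utc(2009, 6, 18, 12, 0, 0)
events = [
  Event.new(1, base, "added story"),
  Event.new(1, base + 30, "moved story to backlog"),
  Event.new(2, base + 60, "accepted story"),
  Event.new(1, base + 3600, "edited story"),
]

bundle_events(events).each do |bundle|
  puts "story #{bundle.first.story_id}: #{bundle.map(&:description).join(', ')}"
end
```

Here the add and the move to the backlog collapse into one entry, while the edit an hour later gets its own.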

To give even more visibility to the activity on your project, Tracker can now tweet project updates. Create a Twitter account for your project (or choose an existing Twitter account), and configure your Tracker project’s Twitter account settings on the Project Settings page. Remember – by default, Twitter accounts and tweets are public and searchable, so if you want to keep your project information private, make sure you enable the “protect my updates” option in your Twitter account settings.

If you select the “remember me” checkbox on the sign-in page, Tracker will do just that and you won’t need to sign in again after re-opening your browser. To clear this “remembered” state, log out or clear your browser cookies. Resetting your password will reset “remember me” on all computers where you have previously signed in.

Tracker now supports time zones, allowing you to see all dates in your local time zone, and giving all project members a consistent view of iteration boundaries. Every user has a default time zone (based on what your browser tells us), but it can be overridden on the My Profile page. Projects have time zones as well – this defaults to the time zone of the user who created the project, but it can be changed in project settings. The project’s time zone controls when iteration boundaries occur. If a project’s iterations start on Mondays, and its time zone is PST, that means new iterations will start Mondays at midnight PST, and everyone in the world will see the new iteration at that same time, even though they may be in different time zones. Someone in New York, for example, won’t see a new iteration until 3am their time.
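The New York example can be checked with a few lines of Ruby. This is just a sketch of the arithmetic, not how Tracker does it: the date is a hypothetical Monday, and fixed UTC offsets (-08:00 for PST, -05:00 for EST) stand in for real time zone rules, so daylight saving is ignored.

```ruby
# Iteration boundary for a project in PST (UTC-8):
# Monday, January 5, 2009 at midnight (a hypothetical example date).
iteration_start = Time.new(2009, 1, 5, 0, 0, 0, "-08:00")

# The same instant as seen by a project member in New York (EST, UTC-5).
new_york_view = iteration_start.getlocal("-05:00")

puts iteration_start # 2009-01-05 00:00:00 -0800
puts new_york_view   # 2009-01-05 03:00:00 -0500

# Both values describe the same moment; only the displayed wall-clock time differs.
puts iteration_start == new_york_view # true
```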

More information about what’s new is available on the Pivotal Tracker recent updates page.
We’re starting a Tracker Users Group in New York, and the first meetup is on Jun 30, at 6:30pm, at the Pivotal Labs office on Chambers St. Click the link below to become a member of the group and RSVP. We hope to see you there!
Bulk geocoding: What to do if you’re importing millions of rows into your database and you need to geocode the address for each? Geocoding services like Google or Yahoo will throttle your requests, or shut you off completely.
Some options:
Non-numeric primary keys: fact or fiction?
We have one project that has considered using non-numeric primary keys in their MySQL database. Enquiring minds want to know if this is a reasonable idea. General consensus was no:
by Russell Edens. He has a great take on why Erector is interesting, complete with code examples:
With erector [views] are first class plain old ruby objects. Why is this good? It gives you all the tools of inheritance and mixin’s for your views. That is cool. Especially for an application with multiple views of the same underlying models. You can refactor your views into base classes that derive and render the same data in different ways. This is object oriented design for views. Nice.
I’ve seen object oriented view code in other languages and it leads to some very powerful re-use that all OO programmers can understand. The most ambitious of these attempts was by an HR company …[that] created their own markup language that was object oriented. The nature of HR data is that it has very complicated rules regarding who can see what data and when. The OO design of the language allowed that to be abstracted to the base classes and a functional programmer simply focused on the problem at hand. They took it further, as all commercial enterprise applications do, and they allowed the customer to define new models and views. Those views were very easy to write with this advanced data access logic abstracted out. Their customers loved it. They wrote very advanced business applications on top of this abstraction.
Views as simple classes, methods, and objects in Ruby – perfect!
Erector Hello World:
class Hello < Erector::Widget
  def content
    html do
      head do
        title "Hello"
      end
      body do
        text "Hello, "
        b "world!"
      end
    end
  end
end
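The point in the quote about refactoring views into base classes can be sketched without any library at all. The classes below are a plain-Ruby stand-in, not Erector's API: `UserView`, `UserDetailView`, `render` and `body` are hypothetical names, and the sketch does no HTML escaping.

```ruby
# Plain-Ruby stand-in for a view class. This is NOT Erector's API.
class UserView
  def initialize(user)
    @user = user
  end

  # Shared wrapper markup lives in the base class.
  def render
    "<div>#{body}</div>"
  end

  # Subclasses override this to show the same data differently.
  def body
    @user[:name]
  end
end

# A second view of the same model, reusing the base class's wrapper and body.
class UserDetailView < UserView
  def body
    "#{super} (#{@user[:email]})"
  end
end

user = { name: "Ada", email: "ada@example.com" }
puts UserView.new(user).render       # <div>Ada</div>
puts UserDetailView.new(user).render # <div>Ada (ada@example.com)</div>
```

Two views of the same data share one wrapper, which is the kind of reuse ordinary inheritance gives you once views are objects.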
For more see the Erector user guide.