palidwor.com

June 30, 2009

Is scientific publishing about to be disrupted? (yes)

Filed under: bioinformatics — Gareth @ 1:02 pm

Nice post on scientific publishing:

Is scientific publishing about to be disrupted?

The analogy to other disruptive innovations is a good one, but it fails to emphasize how niche a lot of these disruptive technologies and companies are at first, before they get really big. They’re usually in areas the big boys think are beneath them.

If I had to point to one opportunity of this type in scientific publishing, it would be creating a mechanism (company/platform/API) for tracking and acknowledging various non-publication contributions, i.e. micro-citations.

May 22, 2009

Cost of the NSERC Science Grant Peer Review System Exceeds the Cost of Giving Every Qualified Researcher a Baseline Grant

Filed under: bioinformatics — Gareth @ 3:14 pm

Found this paper by way of Andre Vellino’s Synthese blog: “Cost of the NSERC Science Grant Peer Review System Exceeds the Cost of Giving Every Qualified Researcher a Baseline Grant”.

The article is full of great info. This has to be one of the sharpest (and snarkiest) observations:

“The word “excellence” has come to permeate Canadian grant agencies, to the point where it has become a meaningless incantation (Fig. 1). Excellence is defined ultimately as success in getting grants. Excellent scientists, with excellent students, are chosen by excellent peer reviewers, and, when money is tight, only the most excellent of the excellent get funded. Grant agencies produce lists of “excellent scientific journals,” which are the journals that their excellent peer review panel members publish in.”

It’s an excellent paper, well worth a read.

Amazon EC2/S3 physical data upload

Filed under: bioinformatics, biology, programming — Gareth @ 8:45 am

Amazon continues to incrementally improve its cloud computing services. It’s now possible to ship hard drives to Amazon for upload into their cloud. This is very useful for huge (>1 TB) data sets, which are getting more and more common in biology.

I’ve recently used EC2 extra-large instances to do analysis using the R ShortRead package. It worked pretty well, though there are platform idiosyncrasies to deal with, like the extra-large instances having no swap space configured by default. You can reallocate drive space to swap, but using it puts a big load on the processor and slows things down a lot.
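A quick way to find out whether an instance has any swap before starting a big job is to look at the `SwapTotal` line in `/proc/meminfo`. This is only a minimal sketch, assuming a Linux instance; the function names are my own, and the sample text is made up for illustration rather than copied from a real EC2 machine.

```python
def swap_total_kb(meminfo_text):
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text, or None if the line is missing."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    return None


def has_swap(meminfo_text):
    """True if the text reports a non-zero amount of swap."""
    total = swap_total_kb(meminfo_text)
    return total is not None and total > 0


def check_local_swap(path="/proc/meminfo"):
    """Read the local meminfo file (Linux only) and report whether swap is configured."""
    with open(path) as handle:
        return has_swap(handle.read())


# Made-up sample resembling an instance with no swap configured.
sample = "MemTotal:       15736360 kB\nSwapTotal:             0 kB\nSwapFree:              0 kB\n"
print(has_swap(sample))
```

On the instance itself, `check_local_swap()` does the same check against the real file.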

March 27, 2009

our Huntingtin article on March 2009 cover of PLoS Computational Biology

Filed under: bioinformatics — Gareth @ 9:27 am

Miguel put this image together with Google SketchUp. I think it’s a great way to represent protein domains.

March 26, 2009

“the emergence of eschatology as a design challenge.”

Filed under: meta — Gareth @ 9:10 am

Nice article by Clive Thompson on how online gaming worlds end.

Game designers are realizing that ending their world in a dramatically satisfying way is actually a very interesting logistical, ludogical, and emotional trick. In essence, we’re slowly seeing the emergence of eschatology as a design challenge.

The situations he describes remind me of the scene in Consider Phlebas where the Culture is about to destroy the Vavatch Orbital. I can imagine game worlds with a clear sunset date and a grand zombie battle/world war/rapture/grey goo onslaught at the end.

BioconductorBuntu

Filed under: bioinformatics — Gareth @ 8:52 am

BioconductorBuntu is a custom distribution of Ubuntu Linux that automatically installs a server-side microarray processing environment, and provides a user friendly web-based graphical user interface to many of the tools developed by the Bioconductor Project.

Good idea. Terrible name though…

March 23, 2009

dimensionality mapping terminology: tears and false neighborhoods

Filed under: bioinformatics — Gareth @ 10:53 am

I recently came across this site for the “Data Driven High-Dimensional Scaling” tool. It has some nice terminology for describing issues with dimensional mapping, which also applies to Hilbert curve mapping.

tears (as in rips, not weeping): points that are disproportionately far apart in the mapped space versus the original space.

false neighborhoods: points that are disproportionately close together in the mapped space versus the original space.

1D spaces mapped onto Hilbert curves don’t have tears, but they do have false neighborhoods.
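Both properties are easy to check numerically. Here is a minimal sketch using the standard index-to-coordinate conversion for a Hilbert curve on an n x n grid (n a power of two). The helper names `max_step` and `worst_false_neighbor` are my own, and this is not the code from the tool mentioned above.

```python
def d2xy(n, d):
    """Map position d along the Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def max_step(n):
    """Largest grid (Manhattan) distance between consecutive 1D positions; 1 means no tears."""
    pts = [d2xy(n, d) for d in range(n * n)]
    return max(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def worst_false_neighbor(n):
    """Largest 1D distance between two cells that touch on the grid."""
    index = {d2xy(n, d): d for d in range(n * n)}
    worst = 0
    for (x, y), d in index.items():
        for neighbor in ((x + 1, y), (x, y + 1)):
            if neighbor in index:
                worst = max(worst, abs(index[neighbor] - d))
    return worst


n = 8
print("max step between consecutive positions:", max_step(n))
print("worst 1D gap between touching cells:", worst_false_neighbor(n))
```

Consecutive positions always land on touching cells, so nothing gets torn apart, while some touching cells sit far apart along the original 1D sequence. Those are the false neighborhoods.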

Hilbert curve visualization publication in Bioinformatics

Filed under: bioinformatics — Gareth @ 10:47 am

I was rather surprised that the Hilbert curve tool I mentioned previously had not been published in a peer-reviewed journal as it’s novel and useful. The publication just came out:

Visualisation of genomic data with the Hilbert curve.

And it’s open access, so no paywall, hurray!

March 21, 2009

Detection of Alpha-Rod Protein Repeats Using a Neural Network and Application to Huntingtin

Filed under: bioinformatics — Gareth @ 9:08 am

Our new publication, Detection of Alpha-Rod Protein Repeats Using a Neural Network and Application to Huntingtin, is now out in PLoS Computational Biology.

A friend at work once said that “most bioinformatics publications are jokes without punchlines”: they predict something or provide some sort of tool, but then don’t follow up with the (often obvious) experiments to validate the results. This paper has something of a punchline, as yeast two-hybrid was used to test interactions of the predicted domains.

Also: there are some pretty pictures.

March 20, 2009

EC2 in BMC Bioinformatics publication

Filed under: bioinformatics — Gareth @ 12:06 pm

I think this is the first publication that describes using EC2 for bioinformatics:

Is searching full text more effective than searching abstracts?

From the publication:

EC2 is an example of a “utility computing” service, where anyone can “rent” computing cycles at a reasonable cost. For this work, EC2 provided a homogeneous computing environment that supports easy comparison of different cluster configurations. The basic unit of computing resource in EC2 is the small instance-hour, the virtualized equivalent of a processor core with 1.7 GB of memory, running for an hour. I experimented with the following configurations:
• Lucene (version 2.0), running on a single EC2 instance. Default settings “out of the box” were used for all experiments.
• Ivory (with Hadoop version 0.17.0), running on an EC2 cluster with 10 slave instances (plus 1 instance for the master). This is comparable to a cluster with 10 cores.
• Same as above, except with 20 slave instances, comparable to a cluster with 20 cores.
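To get a feel for what these configurations consume, here is a rough sketch of the instance-hour arithmetic. It assumes partial hours are rounded up to whole hours per instance, and the run length in the example is a made-up number, not a figure from the paper.

```python
import math


def instance_hours(instances, wall_clock_hours):
    """Instance-hours used by a cluster, assuming each instance's partial hour is rounded up."""
    return instances * math.ceil(wall_clock_hours)


# Instance counts from the configurations above: slaves plus one master for the Hadoop clusters.
configurations = {
    "Lucene, single instance": 1,
    "Ivory, 10 slaves + 1 master": 10 + 1,
    "Ivory, 20 slaves + 1 master": 20 + 1,
}

hypothetical_run_hours = 1.5  # illustrative only
for name, count in configurations.items():
    print(name, instance_hours(count, hypothetical_run_hours))
```

Multiplying the result by the hourly price of a small instance gives the cost of a run.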
