I have the great pleasure of being surrounded by amazing scientists. Over the years, I’ve learned quite a few things from my colleagues and from my own mistakes. One of the most important is how to develop a research project. We aren’t explicitly taught this skill as graduate students, so I want to write about it today. This post is written for first- and second-year graduate students in computational biology, and it is probably still applicable to other data-driven fields.
I’ve split up the lessons I’ve learned into 5 guidelines.
Guideline 1: Start with a question and a data-set
I find it helpful to begin a research project with a specific question and a data-set. Questions can come from anywhere. For example, your question might have been sparked by a conversation with a colleague. Or you might have applied a method to a data-set and found something unexpected. Coming up with a “good” question can be difficult for beginning graduate students, who do not yet have a sense of what is already known, what is non-trivial, and what is interesting. However, in my experience, the specific details of the initial question are less important. After looking at the data, I often find that my initial question leads me to another, more interesting question, or that my initial question doesn’t make much sense.
Guideline 2: Be in the right place for your question
Your advisor usually has unique areas of expertise and it makes the most sense if your questions align with your advisor’s expertise. For example, it doesn’t make much sense for a graduate student to study the evolution of ground squirrels if they are in a cancer lab. We have mentors for a reason! I’ve seen beginning graduate students choose research areas that are outside their advisor’s expertise. In my experience, these projects tend to progress at a slower rate.
Guideline 3: Exploratory Data Analysis
This is a very important guideline, and it should be done before developing a model. It often involves visualizing the data in many different ways. Your plots will prompt more questions, which lead to more plots, and so on: exploratory data analysis is an iterative process. Here, critical thinking is, well... critical. You need to look carefully at your plots, isolate interesting patterns, and just keep asking the data questions.
I have found tools such as RStudio, Snakemake, dplyr, and ggplot2 to be extremely helpful. Basically, these tools allow you to turn your questions into code and plots quite quickly.
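To give a flavor of what "turning a question into code" can look like, here is a minimal sketch. It is written in Python with only the standard library rather than the R tools named above, and the records, group names, and the `summarize_by_group` helper are all made up for illustration. The question it asks is simply: do the two groups look different?

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy records: (sample group, measured value). Not real data.
records = [
    ("control", 2.1), ("control", 1.9), ("control", 2.4), ("control", 2.0),
    ("treated", 3.2), ("treated", 2.9), ("treated", 9.8), ("treated", 3.1),
]


def summarize_by_group(rows):
    """Return {group: {"n", "mean", "median"}} for (group, value) rows."""
    grouped = defaultdict(list)
    for group, value in rows:
        grouped[group].append(value)
    return {
        group: {"n": len(values), "mean": mean(values), "median": median(values)}
        for group, values in grouped.items()
    }


summary = summarize_by_group(records)
for group, stats in summary.items():
    print(f"{group}: n={stats['n']} "
          f"mean={stats['mean']:.2f} median={stats['median']:.2f}")
```

In this toy example the mean and median of the "treated" group disagree noticeably, which is the kind of pattern that should prompt the next question (is one sample an outlier?) and the next plot.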
As a beginning graduate student, I did not appreciate this guideline and would sometimes want to jump directly into model building. One of my advisors (Matthew Stephens) often told me to “not put the cart before the horse”. It took me a while to appreciate the significance of this saying (I initially thought it was a British thing).
Guideline 4: Start with a simple model and build on it
Ever heard the phrase “eat an elephant one bite at a time”? Start with a simple model that fits the patterns you saw in the data (see Guideline 3). Then you can build on it.
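As one possible illustration of this guideline, the sketch below starts with the simplest model I can think of (every observation is predicted by the overall mean) and then adds a single term (a slope), checking whether the extra complexity actually reduces the error. The data are invented and the choice of a straight line is only an assumption for the example; your own "simple model" will depend on the patterns you saw.

```python
from statistics import mean

# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]


def rss(observed, predicted):
    """Residual sum of squares between observations and predictions."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Step 1: the simplest model predicts the mean of y for every point.
y_bar = mean(y)
rss_constant = rss(y, [y_bar] * len(y))

# Step 2: build on it by adding a slope (ordinary least squares line).
x_bar = mean(x)
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept = y_bar - slope * x_bar
rss_line = rss(y, [intercept + slope * xi for xi in x])

print(f"constant model RSS: {rss_constant:.3f}")
print(f"line model RSS:     {rss_line:.3f} (slope={slope:.2f})")
```

The point is the workflow rather than the particular models: each step adds one piece, and you keep it only if it explains something the previous step could not.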
Guideline 5: Talk with people
Science should not be done in isolation. I find talking to people to be crucial to the development of my projects. You do not have to talk only to senior scientists; anyone who is interested and willing to listen can help. It is especially helpful to talk to people who have previously analyzed the type of data you are studying. For example, in an ancient DNA project, the fundamental problem we were trying to solve became clearer after a discussion with my ancient DNA colleagues. Maintain good relations with your colleagues and communicate often!
There is no algorithm for research. Indeed, every research project is different. I’ve seen a project turn out quite interesting even though it violated Guidelines 3 and 4: it started with the method and not the data. However, these guidelines seem to work best for me. I am interested in other opinions too. What have you found important or helpful for doing research?