Have you ever spent months on a Machine Learning project, only to discover you never defined the “correct” problem at the start? If so, or even if not, and you are only starting out in the data science or AI field, welcome to my first Ivory Tower Note, where I will address this topic.
The term “Ivory Tower” is a metaphor for a situation in which someone is isolated from the practical realities of everyday life. In academia, the term often refers to researchers who engage deeply in theoretical pursuits and remain distant from the realities that practitioners face outside academia.
As a former researcher, I wrote a short series of posts from my old Ivory Tower notes, the notes from before the LLM era.
Scary, I know. I am writing this to manage expectations and to answer the question, “Why ever did you do things this way?” The answer: “Because no LLM told me how to do otherwise 10+ years ago.”
That’s why my notes contain “legacy” topics such as data mining, machine learning, multi-criteria decision-making, and (sometimes) human interactions, airplanes ✈️ and art.
However, whenever there is an opportunity, I will map my “old” knowledge to generative AI advances and explain how I applied it to datasets beyond the Ivory Tower.
Welcome to post #1…
How every Machine Learning and AI journey starts
It starts with a problem.
For you, this is usually “the” problem, because you need to live with it for months or, in the case of research, years.
By “the” problem, I mean the business problem you don’t fully understand or know how to solve at first.
An even worse scenario is when you think you fully understand it and know how to solve it quickly. This only creates more problems, which, again, are yours alone to solve. But more about this in the upcoming sections.
So, what is “the” problem about?
Causa: It is mostly about not managing or leveraging resources properly, whether workforce, equipment, money, or time.
Ratio: It is usually about generating business value, which can range from improved accuracy, increased productivity, cost savings, and revenue gains to faster response, decision, planning, delivery, or turnaround times.
Veritas: It is always about finding a solution that relies on, and is hidden somewhere in, the existing dataset.
Or in more than one dataset that someone has labelled as “the one” and that is waiting for you to solve the problem. Because datasets follow from, and are created out of, technical or business process logs, “there has to be a solution lying somewhere within them.”
Ah, if only it were that easy.
Without going down a different chain of thought again, the point is that you will need to:
1 — Understand the problem fully,
2 — If it is not given, find the dataset “behind” it, and
3 — Create a strategy to reach a solution that will generate business value from it.
Along this path, you will be tracked and measured, and time will not be on your side if you try to deliver a solution that solves “the universe equation.”
That’s why you will need to approach the problem methodically, drill down to smaller problems first, and focus entirely on them, because they are the root cause of the overall problem.
That’s why it’s good to learn how to…
Think like a Data Scientist.
Returning to the problem itself, let’s imagine that you are a tourist lost somewhere in a huge museum, and you want to figure out where you are. What you do next is walk to the nearest guide map on the floor, which will show your current location.
At this moment, in front of you, you see something like this:
The next thing you might tell yourself is, “I want to get to Frida Kahlo’s painting.” (Note: These are the insights you want to get.)
Because your goal is to see this one painting that brought you miles away from your home and that now hangs two floors below, you head straight to the second floor. Beforehand, you memorized the shortest path to reach your goal. (Note: This is the initial data collection and discovery phase.)
However, along the way, you run into some obstacles: the elevator is shut down for renovation, so you have to use the stairs. The museum’s paintings were rearranged just two days ago, and the guide maps didn’t reflect the changes, so the path you had in mind to get to the painting is no longer accurate.
Then you find yourself already wandering around the third floor, quietly asking yourself again, “How do I get out of this labyrinth and get to my painting faster?”
Since you don’t know the answer, you ask the museum staff on the third floor to help you out, and you start collecting new data to find the correct path to your painting. (Note: This is a new data collection and discovery phase.)
However, once you get to the second floor, you get lost again. What you do next is start noticing a pattern in how the paintings have been ordered chronologically and thematically to group artists whose styles overlap, which gives you an indication of where to go to find your painting. (Note: This is a modelling phase overlapping with an enrichment phase that draws on the dataset you collected during your school days: your art knowledge.)
Finally, after applying the pattern analysis and recalling the inputs you collected along the museum route, you arrive in front of the painting you had been planning to see since booking your flight several months ago.
What I have just described is how you approach data science and, nowadays, generative AI problems. You always start with the end goal in mind and ask yourself:
“What is the expected outcome I want or need to get from this?”
You then plan backwards from this question. The example above started with requesting holidays, booking flights, arranging accommodation, travelling to the destination, buying museum tickets, wandering around the museum, and finally seeing the painting you had been reading about for ages.
Of course, there is more to it, and this process should be approached differently if you need to solve someone else’s problem, which is a bit more complex than locating a painting in a museum.
In this case, you have to…
Ask the “good” questions.
To do this, let’s define what a good question means [1]:
A good data science question must be concrete, tractable, and answerable. Your question works well if it naturally points to a possible approach for your project. If your question is too vague to suggest what data you need, it won’t effectively guide your work.
Formulating good questions keeps you on track, so that you don’t get lost in the data you need to reach the actual problem solution and you don’t end up solving the wrong problem.
In more detail, good questions help identify gaps in reasoning, avoid faulty premises, and create alternative scenarios in case things go south (which almost always happens)👇🏼.

From the diagram above, you can see that good questions, first and foremost, need to support concrete assumptions. This means they have to be formulated in a way that makes your premises clear and ensures they can be tested without mixing up facts with opinions.
Good questions produce answers that move you closer to your goal, whether by confirming hypotheses, providing new insights, or eliminating wrong paths. They are measurable, and with this, they connect to project goals, because they are formulated with consideration of what is possible, valuable, and efficient [2].
Good questions are answerable with the available data, taking into account the current data’s relevance and limitations.
Last but not least, good questions anticipate obstacles. If one thing is certain in data science, it is uncertainty, so having backup plans for when things don’t work as expected is crucial to producing results for your project.
Let’s illustrate this with the use case of an airline company that struggles to increase its fleet availability due to unplanned technical groundings (UTG).
These unexpected maintenance events disrupt flights and cost the company significant money. Because of this, executives decided to react to the problem and call in a data scientist (you) to help them improve aircraft availability.
Now, if this were the first data science task you had ever received, you might start the investigation by asking:
“How can we eliminate all unplanned maintenance events?”
This question is an example of a wrong or “poor” one because:
- It’s not realistic: It lumps every possible defect, both small and big, into one impossible goal of “zero operational interruptions”.
- It doesn’t include a measure of success: There is no concrete metric to show progress, and if you’re not at zero, you’re at “failure.”
- It’s not data-driven: The question doesn’t cover which data is recorded before delays occur, or how aircraft unavailability is measured and reported from that data.
So, instead of this vague question, you would probably ask a set of targeted questions:
- Which aircraft (sub)system is most critical to flight disruptions?
(Concrete, specific, answerable) This question narrows down your scope, focusing on only one or two specific (sub)systems that cause most delays.
- What constitutes “critical downtime” from an operational perspective?
(Useful, ties to business goals) If the airline (or regulatory body) doesn’t define how many minutes of unscheduled downtime matter for schedule disruptions, you might waste effort fixing less urgent issues.
- Which data sources capture the root causes, and how can we fuse them?
(Manageable, narrows the scope of the project further) This clarifies which data sources you would need to find the solution to the problem.
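To make the first two targeted questions concrete, here is a minimal Python sketch. It assumes a hypothetical list of unplanned-grounding records, each tagged with the affected subsystem and the minutes of downtime it caused; the record layout, the `rank_subsystems` helper, and the 60-minute threshold for “critical downtime” are all illustrative assumptions, since in practice the airline or regulator would define that threshold.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundingEvent:
    """One unplanned technical grounding (hypothetical record layout)."""
    aircraft: str
    subsystem: str
    downtime_min: int


# Assumed threshold: downtime at or above this many minutes counts as "critical".
CRITICAL_DOWNTIME_MIN = 60


def rank_subsystems(events, threshold=CRITICAL_DOWNTIME_MIN):
    """Return (subsystem, critical_minutes) pairs, most disruptive first.

    Only events whose downtime meets the threshold are counted, so many
    short, low-impact defects do not outweigh a few costly ones.
    """
    totals = defaultdict(int)
    for event in events:
        if event.downtime_min >= threshold:
            totals[event.subsystem] += event.downtime_min
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        GroundingEvent("A1", "landing gear", 180),
        GroundingEvent("A2", "hydraulics", 45),
        GroundingEvent("A1", "hydraulics", 90),
        GroundingEvent("A3", "landing gear", 120),
        GroundingEvent("A2", "cabin", 30),
    ]
    for subsystem, minutes in rank_subsystems(sample):
        print(f"{subsystem}: {minutes} critical minutes")
```

The threshold is a parameter on purpose: a different definition of “critical downtime” can change which subsystems end up at the top, which is why that question needs an answer before any modelling starts.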
With these sharper questions, you will drill down to the real problem:
- Not all delays weigh the same in cost or impact. The “correct” data science problem is to predict critical subsystem failures that lead to operationally costly interruptions, so that maintenance crews can prioritize them.
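Reframed this way, the problem becomes a supervised prediction task with a concrete target. The following is a minimal sketch of how such a target could be built, assuming hypothetical grounding records with a day index, a set of critical subsystems (for example, the top of a ranking produced by the targeted questions), a 60-minute critical-downtime threshold, and a 7-day prediction horizon. The `Event` record, the `make_labels` helper, and all values are illustrative, not prescribed by the use case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One unplanned grounding on a given day (hypothetical record layout)."""
    aircraft: str
    day: int           # days since an arbitrary reference date
    subsystem: str
    downtime_min: int


def make_labels(observations, events, critical_subsystems,
                threshold_min=60, horizon_days=7):
    """Return a 0/1 target for each (aircraft, day) observation.

    An observation is labelled 1 if the same aircraft has a grounding in a
    critical subsystem, with downtime of at least threshold_min minutes,
    on one of the next horizon_days days (day + 1 to day + horizon_days).
    """
    labels = []
    for aircraft, day in observations:
        positive = any(
            event.aircraft == aircraft
            and event.subsystem in critical_subsystems
            and event.downtime_min >= threshold_min
            and day < event.day <= day + horizon_days
            for event in events
        )
        labels.append(int(positive))
    return labels


if __name__ == "__main__":
    # Made-up records for illustration only.
    events = [
        Event("A1", 10, "landing gear", 180),
        Event("A1", 12, "cabin", 240),
        Event("A2", 5, "landing gear", 30),
    ]
    observations = [("A1", 5), ("A1", 2), ("A2", 1), ("A1", 11)]
    print(make_labels(observations, events, critical_subsystems={"landing gear"}))
```

In this sample, only the first observation is positive: the landing-gear event on day 10 is eight days after the second observation (outside the horizon), the long cabin grounding is not in a critical subsystem, and the A2 landing-gear event falls below the threshold. The horizon and threshold are business decisions rather than modelling details; a longer horizon gives crews more time to act but may make the prediction harder.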
That’s why…
Defining the problem determines every step after.
It is the foundation upon which your data, modelling, and evaluation phases are built 👇🏼.

It means clarifying the project’s goals, constraints, and scope: you need to articulate the ultimate goal first and, apart from asking “What is the expected outcome I want or need to get from this?”, also ask:
What would success look like, and how can we measure it?
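In the airline example, one way to make “success” measurable is to tie it to fleet availability. The sketch below is an assumption-laden illustration: it defines availability as the share of scheduled aircraft-hours not lost to unplanned groundings and treats a target gain in percentage points, agreed with stakeholders, as the success criterion. The `availability` and `meets_success_criterion` helpers and the figures are hypothetical, and an airline may well measure availability differently.

```python
def availability(scheduled_hours: float, unplanned_downtime_hours: float) -> float:
    """Share of scheduled aircraft-hours not lost to unplanned groundings."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return (scheduled_hours - unplanned_downtime_hours) / scheduled_hours


def meets_success_criterion(baseline: float, current: float,
                            target_gain_pts: float) -> bool:
    """True if availability rose by at least target_gain_pts percentage points."""
    return (current - baseline) * 100 >= target_gain_pts


if __name__ == "__main__":
    # Hypothetical fleet figures for one quarter.
    baseline = availability(10_000, 600)  # before the project: 0.94
    current = availability(10_000, 400)   # after the project: 0.96
    print(meets_success_criterion(baseline, current, target_gain_pts=1.5))
```

Agreeing on the formula and the target up front turns “improve aircraft availability” into a yes/no check that stakeholders can verify.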
From there, drill down to (possible) next-level questions that you (I) have learned from the Ivory Tower days:
— History questions: “Has anyone tried to solve this before? What happened? What is still missing?”
— Context questions: “Who is affected by this problem, and how? How are they partially solving it now? Which resources, methods, and tools are they using now, and can these be reused in the new models?”
— Impact questions: “What happens if we don’t solve this? What changes if we do? Is there value we can create by default? How much will this approach cost?”
— Assumption questions: “What are we taking for granted that might not be true (especially when it comes to data and stakeholders’ ideas)?”
— ….
Then, do this in a loop and always “ask, ask again, and don’t stop asking” questions, so you can drill down and understand which data and analyses are needed and what the underlying problem is.
This is evergreen knowledge you can still apply nowadays, for example when deciding whether your problem is of a predictive or generative nature.
(More about this in another note, where I will explain how problematic it is to try to solve a problem with models that have never seen, or have never been trained on, similar problems before.)
Now, back to memory lane…
I want to add one important note: I have learned from late nights in the Ivory Tower that no amount of data or data science knowledge can save you if you’re solving the wrong problem and trying to get the solution (answer) from a question that was simply wrong and vague.
When you have a problem at hand, don’t rush into assumptions or into building models without understanding what you need to do (Festina lente).
In addition, prepare yourself for unexpected situations and do a proper investigation with your stakeholders and domain experts, because their patience will be limited, too.
With this, I want to say that the “real art” of succeeding in data projects is knowing precisely what the problem is, figuring out whether it can be solved in the first place, and then coming up with the “how” part.
You get there by learning to ask good questions.
If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute solving it.
Thanks for reading, and stay tuned for the next Ivory Tower note.
If you found this post valuable, feel free to share it with your network. 👏
Connect for more stories on Medium ✍️ and LinkedIn 🖇️.
References:
[1] DS4Humans, Backwards Design, accessed April 5, 2025, https://ds4humans.com/40_in_practice/05_backwards_design.html#defining-a-good-question
[2] Godsey, B. (2017), Think Like a Data Scientist: Tackle the data science process step-by-step, Manning Publications.