A use case where a typical enterprise would need to have data in the cloud to fully enjoy the benefits of AI

Can anyone explain this tweet?

Talking to a number of larger enterprises and vendors at AWS ReInvent and one thing is clear: “Your data is your differentiator!”

Why? When it comes to AI it’s really hard to move your data to your models. It’s expensive to move data (egress fees…). It’s complicated to move data (lineage breaks, governance policies need to be updated, etc). The path of least resistance is bringing your models and AI applications to your data (and not the other way around).

It becomes even more important to get your data in order so you can capture value from the current wave of AI. As many have said, you don't have an AI strategy without a data strategy.

To me this is very counterintuitive. For regular ML, the kind of workload you'd use Spark or Hadoop for, then obviously, yes: you may need hundreds of terabytes of data in the cloud to train a model.

But for (generative) AI, the egress fees and the data requirements are simply way lower. The cost is the API call to OpenAI or AWS Bedrock, not the egress fees or bandwidth over the internet. GPT-3 was trained on roughly 500 GB of data, which costs about $40 to retrieve from AWS.
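
A minimal sketch of the arithmetic behind that figure, assuming AWS's standard internet data-transfer-out rate of roughly $0.09/GB (the first-tier published price; actual rates vary by region and volume):

```python
# Back-of-the-envelope egress cost for a GPT-3-sized corpus.
# Assumption: ~$0.09/GB, AWS's standard first-tier egress pricing.
EGRESS_USD_PER_GB = 0.09

dataset_gb = 500  # roughly the size of GPT-3's filtered training corpus
egress_cost = dataset_gb * EGRESS_USD_PER_GB
print(f"One-time egress for {dataset_gb} GB: ${egress_cost:.2f}")  # ~$45
```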

I see no reason why an enterprise couldn't be wall-to-wall on Amazon Web Services and, for its AI needs, call OpenAI or GCP over the internet for GPT-4 or Gemini.
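
The plumbing for that cross-cloud setup is trivial. Here is a minimal sketch, assuming the current OpenAI Python SDK (v1-style client); the prompt is a made-up example:

```python
# Sketch: an app running on AWS (EC2, Lambda, etc.) calling OpenAI's
# hosted GPT-4 over the public internet. Nothing here requires the
# data or the model to live in the same cloud.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # Hypothetical prompt for illustration only.
        {"role": "user", "content": "Summarize this quarter's sales notes."},
    ],
)
print(response.choices[0].message.content)
```

The marginal cost that scales with usage here is the API call itself, not data movement.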

Obviously, I understand the contractual constraints (you can spend your AWS credits on Bedrock) and some safety concerns. But these pale in comparison with the usual reasons enterprises and startups go to the cloud: CapEx to OpEx, division of labor, serverless, hosted services, better access to SaaS and PaaS applications, scalability, cheaper development costs, etc.


He's saying it would be prohibitively expensive to sample, clean, update, transform and store proprietary data the way you have to in order to train AI models, but that this data work is critically important to the success of those models.

When it comes to the ETL and data pipelines typically required to turn some original, internal dataset into something usable in a reliable, repeatable way, the $40 it costs to retrieve the data is the wrong cost to be calculating.

You need a good data strategy to get any dataset to the point where it can confidently be used as training data, because these preliminary steps require a lot of operations and a lot of compute. These operations and this strategy are what give you confidence in the model, or at least the knowledge of what to train on in order to improve it.
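
To make "preliminary steps" concrete, here's a toy sketch (the data, column name, and threshold are entirely made up) of a couple of cleanup operations a real pipeline would chain together, at far larger scale and with many more steps (dedup, PII scrubbing, schema checks, and so on):

```python
# Toy illustration of pre-training data cleanup with pandas.
import pandas as pd

raw = pd.DataFrame({
    "text": ["Acme Q3 notes", "Acme Q3 notes", None, "ok", "Beta launch recap"],
})

clean = (
    raw.dropna(subset=["text"])                 # drop empty records
       .drop_duplicates(subset=["text"])        # deduplicate
       .loc[lambda df: df["text"].str.len() > 5]  # filter trivial rows
)
print(clean)
```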

This is fairly standard and has been known for a long time. ETL and AI models were/are being commoditized, whereas data you collect as a business is priceless.

He is also saying that it's tricky to move data around, but not AI models - and he's not wrong.
