Personal data comes up whenever we talk about privacy on the web. Databricks is a leading name in this field, with one of the most powerful data analysis systems on the market.

Databricks is a proven data processing platform used by many companies today. Databricks' machine learning models are used to process, transform and explore user data.

What is Databricks used for? Why do companies need the platform's services? Who are its employees? Dive into the heart of Big Data with us.

What is Databricks?

Whether in the USA or Europe, the various scandals linked to the processing of personal data have caused a lot of ink to flow. But what do we really know about data and its analysis? 

Creation

Databricks was developed by the creators of Apache Spark, an analysis engine used for large-scale data processing. In 2014, Spark broke a performance record by sorting 100 TB of data in 23 minutes. This feat dethroned the previous record holder, Yahoo, which had sorted the same amount of data in 72 minutes. Spark was therefore about three times faster, while using "only" 206 nodes (compared with 2,100 for Yahoo).
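To put these figures in perspective, here is a minimal sketch of the arithmetic in Python. It uses only the numbers quoted above; the variable names and the convention of 1 TB = 1,000 GB are assumptions made for illustration, and the result is a rough comparison rather than an official benchmark figure.

```python
# Figures quoted in the article for the 2014 sort record.
DATA_TB = 100           # amount of data sorted by both systems
GB_PER_TB = 1000        # assumption: decimal units (1 TB = 1,000 GB)

spark = {"minutes": 23, "nodes": 206}
yahoo = {"minutes": 72, "nodes": 2100}


def gb_per_minute_per_node(minutes: float, nodes: int) -> float:
    """Average amount of data sorted per minute by a single node."""
    return DATA_TB * GB_PER_TB / minutes / nodes


# How much faster the Spark run was (wall-clock time).
speedup = yahoo["minutes"] / spark["minutes"]

# How many times fewer machines the Spark run used.
node_ratio = yahoo["nodes"] / spark["nodes"]

# Combined effect: work done per node, per minute.
per_node_gain = (
    gb_per_minute_per_node(**spark) / gb_per_minute_per_node(**yahoo)
)

print(f"Speedup: {speedup:.2f}x")
print(f"Node reduction: {node_ratio:.1f}x")
print(f"Per-node throughput gain: {per_node_gain:.1f}x")
```

Seen this way, the "three times faster" headline understates the result: with roughly ten times fewer nodes, each machine in the Spark run handled about thirty times more data per minute.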

Databricks, co-founded by Ali Ghodsi and others, is built around Apache Spark. In their perpetual quest for innovation, the developers have chosen to offer an AI-based tool for analyzing collected data. With an interface tailored to the needs of data scientists, engineers and analysts alike, Databricks has won over a wide audience and is today used by numerous companies.

How does it work?

The Databricks platform is made up of four open source tools: Apache Spark, which handles large volumes of data; Delta Lake, an open source data storage layer; MLflow, which manages the lifecycle of pipelines and artificial intelligence applications; and Koalas, which lets data scientists analyze large volumes of data using the familiar pandas API.

The result is a comprehensive platform that lets specialists work in depth with a wide range of tools. All these functions are grouped together in a single SaaS interface.

Databricks' strength lies in its adaptability to distributed cloud environments such as Microsoft Azure, Amazon Web Services or even Google Cloud Platform. The benefits? Applications running on GPUs or CPUs are much faster. What's more, it's easier for companies to analyze large quantities of data.

What for?

The personal information you provide on the Web is a veritable goldmine for companies. They use it to offer you targeted content or personalized advertising. However, the huge amount of data on the web is difficult for companies to analyze on their own.

This is where Databricks comes in: the platform stores data from users of the Internet and other applications. Thanks to its machine learning system, the data can then be sorted for easier analysis.

Once studied, this data is used by entrepreneurs, advertisers and content creators, who can then offer targeted products tailored to their customers.

Data and controversy

Despite its obvious effectiveness, data processing does not meet with unanimous approval in the public debate. Some citizens feel their privacy has been violated, and data analysis systems are regularly singled out for criticism.

Cambridge Analytica

The scandal that brought data processing and its abuses to light is linked to the company Cambridge Analytica and American giant Facebook. The leak of personal data from over 87 million Facebook users made headlines in France and around the world.

The data processing company Cambridge Analytica was then accused of exploiting the information it had collected to influence voting intentions in favor of certain politicians. In particular, the 2016 US presidential election was called into question when it was revealed that Donald Trump's campaign committee appeared to have altered the voting intentions of certain key voters.

The scandal forced Facebook to apologize. Despite this, the social networking giant saw its stock market value fall significantly.

The Health Data Hub

The effectiveness of data analytics platforms is no longer in doubt. That's why, in 2018, French President Emmanuel Macron decided to launch the Health Data Hub, calling it an innovation "prefiguring the medicine of tomorrow". The project aims to modernize the public healthcare system by proposing new techniques based on artificial intelligence.

French health information is to be used by public research centers, but also by foreign private companies such as Microsoft. To access this information, however, these companies and research centers must request authorization from the Cnil (Commission Nationale de l'Informatique et des Libertés).

Despite this obligation, many French citizens have doubts about the confidentiality of their information and the sovereignty of the French state in the program.

A real craze

Despite the reluctance of some Internet users and specialists, data collection and analysis is a booming market and some of the biggest players in the Web and financial sectors are planning to capitalize on it.

A company valued at $38 billion

Databricks' visionary approach attracted a large number of investors to the round led by Franklin Templeton, including Amazon Web Services, CapitalG (Google's investment arm), Salesforce and Microsoft. This operation enabled the start-up to raise $1 billion at a valuation of $28 billion.

Databricks' main assets are its various innovations, the company's growing momentum and its impressive customer catalog. The company has nearly 5,000 customers and counts 40% of the Fortune 500 among them.

In August 2021, Databricks received a new round of financing led by Counterpoint Global. The round brought $1.6 billion to the company, enabling it to reach a valuation of $38 billion. Major investors included Amazon Web Services and Salesforce Ventures.

Towards an IPO?

The excellent results of the application, already used by Microsoft Azure, are pushing the company towards an IPO. In 2020, the California-based company saw profits grow from $200 million to $350 million and the various financings enabled it to validate its "vision of a data processing platform, capable of responding to different needs, including artificial intelligence" according to Ali Ghodsi.

The CEO has no intention of stopping there. Initially, the aim is to use this funding to continue the company's global expansion. A final challenge for the company before its IPO.

If successful, the startup's profits could rise dramatically. It's worth noting that the data processing market is highly promising, and is set to grow by $142 billion over the period 2020-2024, according to a study by BusinessWire.

Databricks: key information

The workforce

In 2022, data processing company Databricks had over 4,000 employees.

Sales figures

In 2022, Databricks achieved sales of over one billion dollars.

Initial public offering

To date, Databricks has not gone public. However, all the signs are green for CEO Ali Ghodsi, who is preparing an IPO in the coming year.

Business and strategic objectives

Thanks to its recent fund-raising, Databricks intends to expand even further internationally. To achieve this, the company aims to enhance the functionality of its platform and keep pace with the latest advances in AI. 

Scalability

The data processing market is booming. Specialists at BusinessWire forecast an increase of $142 billion over the period 2020-2024. Databricks is already well established in the sector and enjoys the confidence of major companies such as Google and T-Mobile, so it is not expected to encounter any problems in its expansion. The coming year will tell whether an IPO enables the company to achieve its objectives.

CEO mantra/quote

The best analytical database is a lakehouse.

 
