Welcome, and good afternoon everyone. Thanks for joining me and making your way across the busy conference.
My name is Dominika Malinowska, and I'll spend the next half an hour or so telling you a little bit more about SkyShowtime and some of the lessons I've learned along the way whilst building out a data platform.
Let me start by telling you a little bit about myself. I'm the Head of Data Engineering at SkyShowtime, having previously worked in software and data engineering across the media, fintech and legal industries. I've shared my contact details on the slide in case you like the talk and want to connect afterwards, or if you have any questions.
Can I get a quick show of hands: how many people here have heard of SkyShowtime before?
All right, a good amount. And the SkyShowtime people at the end raised their hands, so that's also a good start.
Cool. So for those of you who haven't heard of SkyShowtime: we are a streaming service available in mainland Europe, 50/50 owned by Comcast and Paramount.
We are not available in the UK, unfortunately, but you can think of us a little bit like the NOW TV of Europe, a similar type of service.
We are available in over 20 different markets, split into three groups: Northern Europe, Central and Eastern Europe, and Iberia. Our content is in 19 different languages.
That's a little bit of extra context for you which will become important later. And there may or may not be a little pop quiz at the end.
SkyShowtime was first registered as a company in November 2021. I actually wasn't around then, but I looked it up on Companies House.
It wasn't until April 2022 that the first commit was made on our main GitHub repo,
but the service itself started launching in September 2022. So less than six months after the first commit, we had the service go live in the Nordics, and the rollout continued until February 2023. All in all, we had less than six months to stand up a data platform that could provide reporting for the business on our key metrics, such as the all-important signups and content views.
So, how did we do it? How did we go from zero to a fully operating data platform in less than six months?
Well, I'm going to be honest: we actually didn't do it on our own. The great thing about being a company that's owned by some very, very successful and rich parents is that they can help us out. Comcast owns Sky in the UK. I'm sure many of you are familiar with the Sky brand, and there are actually a few Sky people here in the room who worked on the project. So they all know that we use the data processing platform that Sky provides for all of its propositions. As well as that, we were able to leverage the data platform used by Peacock, another Comcast proposition available in the US, as a bit of a lift and shift.
I will show you what our data platform architecture looks like and how it's split between Sky and SkyShowtime. You can see the line in the middle, with the Sky side on one side and, on the other, the SkyShowtime side, which we had help from Peacock on. I'll talk about both sides in a little more detail.
Let me start with the Sky part. On the very left here (I don't think my pointer works, so I'll walk over) we've got the SkyShowtime app. You can think of that as the web app and the mobile apps that are available for SkyShowtime, which are the source of the majority of our data. We've then got this box that's the data ingestion platform at Sky, which, despite being represented as a single box, is a host of microservices that sanitize and transform data.
All the data is then output into BigQuery tables, which are organized in a fairly logical way and made available to SkyShowtime.
Onto the SkyShowtime side: we take the data that Sky makes available to us. We also ingest our own data from third parties and other sources, such as our partners Mayo in the Netherlands or Movistar in Spain.
We use Airflow for our data pipeline orchestration, so the majority of our ingestion happens on a batch basis, and we use BigQuery as our data warehouse. We follow the medallion architecture, so we've got bronze, silver and gold tables. The data that we take from Sky we tend to treat as our bronze tables.
We also have our silver tables, which hold transformed, user-level data. One example of such a table could be a user table providing a daily snapshot of our subscribers: whether they're active, when they last watched a piece of content, whether they signed up on a promotion, various business metrics. And then we have our gold layer, which is the aggregate layer.
Our gold layer serves our Tableau dashboards, which are used across the business. We also have a data science team, who use Vertex AI for their machine learning models.
We also found that whilst our business users like looking at dashboards, they also want to interact with the data, and they're not particularly keen on doing so in SQL. So we are currently in the process of rolling out Amplitude to enable that self-service data capability.
And finally, we send data to our third-party CRM providers, mParticle and Braze, which are used for customer communications.
So that's our data platform. We actually built out, I would say, about 70 to 80% of it in those first six months.
And how did we do it? Well, as I mentioned earlier, we didn't actually do it from scratch. We used Sky to do the initial processing, some of the sanitizing and anonymizing, and to present that data to us. We then lifted and shifted from Peacock, which helped us design our main tables. And as well as that, we had the Peacock team on hand, who had been through a similar launch process and were there to support us, answer any questions and provide advice.
And that's it. We delivered a data platform, it worked really well, nothing ever went wrong, and we all lived happily ever after.
Well, I wouldn't actually be here if it had all gone that smoothly, the rollout had taken six months and everything had just worked.
Very early on, we encountered a number of different issues. Peacock as a service is US-based: a single country with a single language. Whereas, as you may remember from the earlier slides, we have over 20 territories and 19 languages. That instantly introduced an additional level of complexity in essentially all of our data models. We had to account for the different territories and the different languages, and our content is not available in every country at the same time. Our users also went through a very different subscription journey. Peacock from the outset had two separate tiers: a free tier with adverts, and a paid tier that was ad-free. SkyShowtime started with a single tier: you either paid and watched the content, or you didn't have access to SkyShowtime.
And that meant that our definition of who the active users are was different; we didn't have to account for any ads. So our tables and data models looked very different.
Related to localization, our content metadata is also very different. We have content in English, but we also have local content.
Our SkyShowtime app displays content in the local language of the user, which means that a single piece of content could have 19 variations of its title.
It could also be available at different times: certain seasons of a TV show would launch in one territory and then not launch in another for another six months. And yet our business would be interested in how the content is performing across all territories, both individually and combined.
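As a small, hedged illustration of what that localization implies for the data model (the mapping and the English fallback rule below are assumptions made for the sketch, not SkyShowtime's actual logic):

```python
# Sketch: resolve a content title in the user's locale, falling back to
# English. The mapping and fallback behaviour are illustrative assumptions.
localized_titles: dict[tuple[str, str], str] = {
    ("show-123", "en"): "The Example Show",
    ("show-123", "pl"): "Przykładowy Serial",
    ("show-123", "es"): "La Serie de Ejemplo",
    # ... up to 19 language variants per piece of content
}

def title_for(content_id: str, locale: str) -> str:
    """Return the local-language title, falling back to the English one."""
    return localized_titles.get(
        (content_id, locale),
        localized_titles[(content_id, "en")],
    )

print(title_for("show-123", "pl"))  # Przykładowy Serial
print(title_for("show-123", "nl"))  # no Dutch entry, falls back to English
```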
Moving on to financial data, you may have already guessed that over 20 countries also means different currencies. For our financial reports, we had to account for the different currencies, do currency conversion and make sure all the money added up, and also handle tax rules, which, unsurprisingly, differ across countries.
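A minimal sketch of that currency normalisation step, assuming a fixed set of rates purely for illustration (a real pipeline would join against a daily FX rates table):

```python
# Sketch: normalise local-currency payments to a single reporting currency.
# The rates and amounts below are made up for illustration.
from decimal import Decimal

EUR_RATES = {
    "EUR": Decimal("1.0"),
    "PLN": Decimal("0.23"),   # assumed rate, not a real quote
    "SEK": Decimal("0.088"),  # assumed rate, not a real quote
}

def to_eur(amount: Decimal, currency: str) -> Decimal:
    """Convert a local-currency amount to EUR and round to cents."""
    return (amount * EUR_RATES[currency]).quantize(Decimal("0.01"))

payments = [(Decimal("24.99"), "PLN"), (Decimal("109.00"), "SEK")]
total_eur = sum(to_eur(amount, ccy) for amount, ccy in payments)
print(total_eur)  # 15.34
```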
And finally, tech debt. It would be very naive of us to think that a data platform that had been operational for two to three years before we came along and decided to borrow some of the code was perfect in every single way. The reality was that, on top of having to adjust the models to suit our needs, we also had to work through a lot of tech debt and make sure we didn't introduce any incorrect data into our platform.
But we did it. We managed to launch the data platform in time for the launches, from September 2022 through to February 2023.
We then took a little bit of time to regroup, refactor our code and clean up where needed. But we didn't have a lot of time to do that, because our business is very ambitious and moves fast, and so we had to as well. In April 2024 we launched an advertising tier, which was swiftly followed by the launch of a premium tier in October 2024.
I won't go into detail as to why, but for a number of reasons the implementations were different, and it meant that a big chunk of our data models had to be rewritten again, to support both the advertising tier and then the premium tier, and our customers needed to be migrated onto those new tiers.
I have a sneak peek here of what the tier offering looks like now: the premium tier on top, followed by the standard tier and standard with ads. If anyone is from the SkyShowtime territories, feel free to have a look yourself and sign up.
But this time around, we didn't have Peacock to lift and shift from; Peacock hadn't been through a similar journey to ours. We also didn't have anyone to support us by answering questions or providing advice. We had to do it on our own.
And once again, I'm pleased to say that we managed to get there. We delivered the data model updates in time for our launches, which is a great story to tell the business. But reflecting back on how and why we were able to deliver in such a short period of time, there were two main pillars: our people, and the development practices we set for ourselves.
Starting with people: it wasn't just that the developers wrote the code and things worked. There was actually a lot more that went into having a successful team, and here are some of the qualities that really stood out over those first 18 months or couple of years, both during the launches and whilst we were rolling out the advertising and premium tiers. We also look for these qualities when we hire going forward.
Firstly, curiosity. Those of you who have worked with big data will find that there are about a hundred different user journeys at any time, plus different edge cases; there's always a race condition that will ruin what you expect a user journey to be. And we had people who were willing to dig into that data, compare timestamps, and really figure out what the edge cases are and how we can handle them in our code.
At the same time, our people are creative. Having the shared data platform that you saw in the first couple of slides, the architecture slides, means that we can't always do what we want. We have to work with our partners; we don't control everything. And sometimes we have to think outside the box to get things done, and that's exactly what our teams do. One good example is that one of our engineers came up with a framework to dynamically generate Airflow DAGs, which hugely reduced the amount of code that needs to be written every time we want to deploy a new data model, and actually removed a lot of errors along the way as well.
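The framework itself isn't shown in the talk, but a minimal sketch of the general config-driven DAG-generation pattern might look like this. The DAG names, the MODELS config and the SQL are all hypothetical:

```python
# Sketch of config-driven Airflow DAG generation (Airflow 2.4+ style).
# The framework described in the talk is not public; names here are invented.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

# One entry per data model; adding a model means adding config, not code.
MODELS = [
    {
        "name": "user_daily_snapshot",
        "query": "CREATE OR REPLACE TABLE silver.user_daily_snapshot AS "
                 "SELECT user_id FROM bronze.subscribers",
    },
    {
        "name": "content_views_agg",
        "query": "CREATE OR REPLACE TABLE gold.content_views_agg AS "
                 "SELECT content_id, COUNT(*) AS views FROM silver.views "
                 "GROUP BY content_id",
    },
]

for model in MODELS:
    with DAG(
        dag_id=f"build_{model['name']}",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        BigQueryInsertJobOperator(
            task_id=f"run_{model['name']}",
            configuration={
                "query": {"query": model["query"], "useLegacySql": False}
            },
        )
    # Airflow discovers DAGs by scanning module globals.
    globals()[dag.dag_id] = dag
```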
And finally, our people are pragmatic. It's great to be curious and creative and to think of the best, most out-of-the-box solutions, but we had very tight timescales and we needed to deliver on time. So sometimes the best solution was the one that worked, and we had to accept that.
Moving on to development practices: I'm sure you'll recognize all of these. They are borrowed from software engineering but very much applicable in data engineering.
We keep it simple. We don't write code just to show off our skills; the code is there to be easily understood by anyone who reads it. We have agreed coding standards and style guides.
If a piece of code isn't going to be used, it can go. We don't keep commented-out code, and we don't write code just because it may be useful in the future. Yes, we try to understand what the requirements are, what their bounds are and what future use cases may be, but we don't try to unnecessarily future-proof.
At the same time, if we know that something is going to be used more than once, we try to extract it into helper functions or Airflow operators, as in the sketch below, so that if we ever need to update it we update it in a single place and it doesn't introduce additional complexity.
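For instance, a repeated "build a partitioned table" step might be pulled into a single helper like this. The helper name, tables and partitioning rule are assumptions made for the sketch, not the team's actual code:

```python
# Sketch: shared helper so every model build applies the same partitioning
# rule; a change is then made in exactly one place. Names are illustrative.
from google.cloud import bigquery


def build_partitioned_table(
    table: str,
    select_sql: str,
    partition_field: str = "snapshot_date",
) -> None:
    """Create or replace a date-partitioned table from a SELECT statement."""
    client = bigquery.Client()
    sql = (
        f"CREATE OR REPLACE TABLE {table} "
        f"PARTITION BY {partition_field} AS {select_sql}"
    )
    client.query(sql).result()


# Every DAG calls the same helper instead of repeating the DDL boilerplate.
build_partitioned_table(
    "silver.user_daily_snapshot",
    "SELECT CURRENT_DATE() AS snapshot_date, user_id FROM bronze.subscribers",
)
```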
But despite having great people and good development practices in place, we are still working on solving quite a few challenges from those early years. Sadly, we couldn't just claim success and call it a day. We've introduced our own tech debt.
The pragmatism I talked about earlier means that sometimes we will simply accept that a solution is suboptimal and may have to go back to it later and work through it, and that's where we are. We still have some refactoring to do.
An interesting one is the operating model. By lifting and shifting from Peacock, we actually inadvertently inherited an operating model that isn't necessarily suited to our business. Peacock is a large company with a large platform, a lot of customers, a lot of data engineers and a lot of analysts who can write custom scripts and custom dashboards and support one-off insights very easily for all of their stakeholders.
We have much smaller teams, and we can't serve every single stakeholder within the business in a timely manner. So we need to do more with less, and we need other stakeholders to do more. That's why we are investing in Amplitude, to try to create more of a self-served data culture at SkyShowtime.
There's also data quality. Who isn't worried about data quality? We do what we can: we've got some alerts, and when we create new data models we try to think about what could go wrong. But we still have a long way to go. It hasn't been a priority, because when you try to roll out a data platform in six months there are certain trade-offs you have to make, and data quality was one of them. So we are looking to invest more time in data quality.
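The alerts mentioned above aren't described in detail; as a hedged example, a basic post-build freshness check might be as simple as the following (the table, column and failure behaviour are assumptions for the sketch):

```python
# Sketch: fail loudly if today's partition of a table came out empty.
# Table and column names are illustrative, not the team's actual checks.
from google.cloud import bigquery


def check_not_empty_today(table: str, date_column: str = "snapshot_date") -> None:
    """Raise if the table has no rows for today's date."""
    client = bigquery.Client()
    sql = f"SELECT COUNT(*) AS n FROM {table} WHERE {date_column} = CURRENT_DATE()"
    row_count = next(iter(client.query(sql).result())).n
    if row_count == 0:
        raise ValueError(f"Data quality check failed: {table} is empty for today")


check_not_empty_today("silver.user_daily_snapshot")
```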
And having gotten to over two years (we're actually coming up on three), it's time to start looking at our costs, particularly our platform costs. We have some suboptimal tables; again, trade-offs, and something we are now looking at more actively.
So what's next for us? Well, I think I've covered some of this already, but we're continuing to enable the business's 2025 initiatives; we've got Q4 to go, with some big promotions coming up. We're looking at 2026, what it will look like for us and the business, and what the main goals are. We're redesigning and refactoring, so we're very much dealing with that tech debt. We're trying to reduce our cloud costs, looking at some of our most costly queries and dashboards and whether there are improvements to be made there.
A big one is enabling that self-service data via Amplitude. That's both integrating with Amplitude and empowering our business users to dig into the data and learn as much as they can from it.
And finally, we're looking to invest in data quality.
So that's a lot, and it's all happening at once, and we are always busy, but it's all part of the fun.
Before I finish up, I just wanted to leave you with some takeaways.
The first one: use accelerators where possible. Greenfield projects are fun, and from an engineering perspective really fun to work on. But when you've got ambitious goals and tight timelines, there's nothing wrong with relying on your partners and looking at what blueprints are available.
Second, small decisions can have long-term impact, good or bad: whether it's introducing the right engineering standards, or inheriting an operating model that's not suitable for your business.
And finally, people really are our superpower. I wouldn't be standing here if it wasn't for our team, who went above and beyond to deliver the platform and who continue to deliver really good quality work, and enjoy doing so.
And that's all from me. Thanks everyone.
Do we have any questions?
Go ahead.
>> Okay, so the question was: what was the size of the team in the first six months? I actually joined as the first full-time SkyShowtime data engineer, but I worked with three consultant data engineers, and we slowly hired more internal engineers and phased out the consultant side.
Any more questions? Oh, I think there's a mic coming around.
>> Thank you. So, I've worked on some of the projects we spoke about earlier, and I've found that as the platform scales, the more stuff you have, the more things can go wrong, obviously. So what are some things that you've done? You mentioned the automated Airflow DAGs earlier.
>> Yeah, what are some other things you've done to speed up development and maintenance of the platform, like automated testing, monitoring, CI and so on?
>> Sure. So yeah, the framework to create Airflow DAGs has probably been the biggest piece of work we've done to enable scaling the platform. Automated testing is another one we've introduced. We've also introduced SQL query parsing at serialization time rather than at runtime for Airflow, so we know about issues before they go live. On more of a process side, we introduced a development process that all engineers follow to make sure everything is thoroughly tested, and we have style guides and engineering standards that we try to follow.
>> So, you mentioned that it took six months to develop the platform, which is an amazing feat. I was interested in how you were able to make it privacy compliant in that short amount of time. And six months was the platform, but how long did it take to drive some business value from the platform you created? I can elaborate more if you want.
>> Yeah, sure. In terms of privacy compliance, we are GCP-based, so we heavily leverage the capabilities of GCP; for example, we use role-based access for our data layers. The platform is also used internally by SkyShowtime only. And we have an SRE team that sits within Sky, who helped us stand up the infrastructure.
In terms of how we then drew business insights, or how we enabled the platform to drive business insights: our gold layer within the data warehouse served the Tableau dashboards. We have a business analysis team who were creating those Tableau dashboards and making them available to the end users. The build-out of the Tableau dashboards was happening at the same time as we were building out the silver and gold layers, so the actual business insights were available as soon as we were finished. People didn't wait for the data models to be finished before starting to build dashboards; they were being built at the same time. Hopefully that answers your question.
>> Hi. I wanted to ask: I saw that there are quite a few tools in your tech stack, and I wanted to know, when you were building everything, how did you decide which tools to use for which part of your stack? And did you go with any open-source alternatives rather than SaaS? I remember you said cost is now becoming a concern, so do you think it might be a good idea to move some things to open source? I've heard pitches all day for different products, but I just wanted to hear your take on that.
>> Yeah, sure. To be honest, we actually didn't get a choice in terms of what tools to use; that came as part of the lift and shift. All of the infrastructure is exactly what Peacock had for their data platform. And if I'm perfectly honest, thinking back on it, I'm not sure we would have chosen the same tech stack. That was possibly a bigger challenge than choosing: not having the ability to choose.
Those are some of the restrictions we have to work within as part of the hybrid model you've seen between Sky and SkyShowtime. In terms of cost saving: looking at our costs at the moment, our biggest savings will be in the data warehouse space, so we are reviewing our silver and gold tables. Are all of them still needed, given the business has moved on over nearly three years? Are people using all the dashboards? Are the dashboards built well? Are the gold tables scanning all of the data, or are they using partitions effectively? These are all questions we are currently asking ourselves, and we're refactoring where needed.
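One hedged way to run that kind of review on BigQuery is a dry run, which reports how many bytes a query would scan and quickly exposes missing partition filters (the query and table below are illustrative):

```python
# Sketch: use a BigQuery dry run to estimate a query's scan size before
# running it for real. The query and table are illustrative.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

sql = (
    "SELECT user_id FROM silver.user_daily_snapshot "
    "WHERE snapshot_date = CURRENT_DATE()"  # partition filter limits the scan
)
job = client.query(sql, job_config=job_config)
print(f"Would scan {job.total_bytes_processed / 1e9:.2f} GB")
```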
>> Cool. I think we are pretty much on time. So thanks everyone, thanks for joining me, and enjoy the rest of the conference.
Streaming Analytics & Decision Automation Theatre
Wednesday, 24th Sep, 16:00 - 16:30

What does it take to deploy a data platform serving 22 European territories? We built it in just six months, and two years on, I'm sharing the insights. This is a journey filled with challenges and discoveries. I'll be sharing the inside story of how SkyShowtime achieved this ambitious goal, from the initial vision to the realities of large-scale implementation. Expect candid insights, valuable lessons, and a glimpse into how we're leveraging data to fuel SkyShowtime's continued growth and innovation.

Dominika Malinowska
Head of Data Engineering, SkyShowtime

Dominika is Head of Data Engineering at SkyShowtime, Europe's newest streaming service. She leads a team of data engineers in building and scaling a robust, cloud-based data platform that powers critical business insights. With a background in Big Data, she brings hands-on experience in Python, Scala, and distributed data systems. Outside of work, Dominika teaches coding to beginners and is a regular speaker at tech events. She's passionate about mentoring, technology education, and exploring new tools and frameworks.