Welcome, and good afternoon everyone. Thanks for joining me and making your way across the busy conference.
My name is Dominika Malinowska, and I'll spend the next half an hour or so telling you a little bit more about SkyShowtime and some of the lessons I've learned along the way whilst building out a data platform.
Let me start by telling you a little bit about myself. I'm the Head of Data Engineering at SkyShowtime, having previously worked in software and data engineering across the media, fintech and legal industries. I've shared my contact details on the slide in case you like the talk and want to connect afterwards, or if you have any questions.
Can I get a quick show of hands: how many people here have heard of SkyShowtime before?
All right, a good amount. And the SkyShowtime people at the end raised their hands, so that's also a good start.
Cool. So for those of you who haven't heard of SkyShowtime: we are a streaming service available in mainland Europe, 50/50 owned by Comcast and Paramount.
We are not available in the UK, unfortunately, but you can think of us a little bit like the NOW TV of Europe, a similar type of service.
We are available in over 20 different markets, split into three groups: Northern Europe, Central and Eastern Europe, and Iberia. Our content is in 19 different languages.
That's a little bit of extra context for you which will become important later. And there may or may not be a little pop quiz at the end.
SkyShowtime was first registered as a company in November 2021. I actually wasn't around then, but I looked it up on Companies House.
It wasn't until April 2022 that the first commit was made on our main GitHub repo,
but the service itself started launching in September 2022. So less than six months after the first commit, we had the service go live in the Nordics, and the rollout continued until February 2023. All in all, we had less than six months to stand up a data platform that could provide reporting for the business on our key metrics, such as the all-important signups and content views.
So, how did we do it? How did we go from zero to a fully operating data platform in less than six months?
Well, I'm going to be honest: we actually didn't do it on our own. The great thing about being a company that's owned by some very, very successful and rich parents is that they can help us out. Comcast owns Sky in the UK. I'm sure many of you are familiar with the Sky brand, and there are actually a few Sky people here in the room who worked on the project. So they all know that we use the data processing platform that Sky provides for all of its propositions. As well as that, we were able to leverage the data platform used by Peacock, another Comcast proposition available in the US, as a bit of a lift and shift.
I will show you what our data platform architecture looks like and how it's split between Sky and SkyShowtime. You can see the line in the middle, with the Sky side on one side and, on the other, the SkyShowtime side, which we had help from Peacock on. I'll talk about both sides in a little more detail.
Let me start with the Sky part. On the very left here (I don't think my pointer works, so I'll walk over) we've got the SkyShowtime app. You can think of that as the web app and the mobile apps that are available for SkyShowtime, which are the source of the majority of our data. We've then got this box that's the data ingestion platform at Sky, which, despite being represented as a single box, is a host of microservices that sanitize and transform data.
All the data is then output into BigQuery tables, which are organized in a fairly logical way and made available to SkyShowtime.
Onto the SkyShowtime side: we take the data that Sky makes available to us. We also ingest our own data from third parties and other sources, such as our partners Mayo in the Netherlands or Movistar in Spain.
We use Airflow for our data pipeline orchestration, so the majority of our ingestion happens on a batch basis, and we use BigQuery as our data warehouse. We follow the medallion architecture, so we've got bronze, silver and gold tables. The data that we take from Sky we tend to treat as our bronze tables.
We also have our silver tables, which hold transformed, user-level data. One example of such a table could be a user table providing a daily snapshot of our subscribers: whether they're active, when they last watched a piece of content, whether they signed up on a promotion, various business metrics. And then we have our gold layer, which is the aggregate layer.
Our gold layer serves our Tableau dashboards, which are used across the business. We also have a data science team, who use Vertex AI for their machine learning models.
We also found that whilst our business users like looking at dashboards, they also want to interact with the data, and they're not particularly keen on doing so in SQL. So we are currently in the process of rolling out Amplitude to enable that self-service data capability.
And finally, we send data to our third-party CRM providers, mParticle and Braze, which are used for customer communications.
So that's our data platform. We actually built out, I would say, about 70 to 80% of it in those first six months.
And how did we do it? Well, as I mentioned earlier, we didn't actually do it from scratch. We used Sky to do the initial processing, some of the sanitizing and anonymizing, and to present that data to us. We then lifted and shifted from Peacock, which helped us design our main tables. And as well as that, we had the Peacock team on hand, who had been through a similar launch process and were there to support us, answer any questions and provide advice.
And that's it. We delivered a data platform, it worked really well, nothing ever went wrong, and we all lived happily ever after.
Well, I wouldn't actually be here if it had all gone that smoothly, the rollout had taken six months and everything had just worked.
Very early on, we encountered a number of different issues. Peacock as a service is US-based: a single country with a single language. Whereas, as you may remember from the earlier slides, we have over 20 territories and 19 languages. That instantly introduced an additional level of complexity in essentially all of our data models. We had to account for the different territories and the different languages, and our content is not available in every country at the same time. Our users also went through a very different subscription journey. Peacock from the outset had two separate tiers: a free tier with adverts, and a paid tier that was ad-free. SkyShowtime started with a single tier: you either paid and watched the content, or you didn't have access to SkyShowtime.
And that meant that our definition of who the active users are was different; we didn't have to account for any ads. So our tables and data models looked very different.
Related to localization, our content metadata is also very different. We have content in English, but we also have local content.
Our SkyShowtime app displays content in the local language of the user, which means that a single piece of content could have 19 variations of its title.
It could also be available at different times: certain seasons of a TV show would launch in one territory and then not launch in another for another six months. And yet our business would be interested in how the content is performing across all territories, both individually and combined.
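As a small, hedged illustration of what that localization implies for the data model (the mapping and the English fallback rule below are assumptions made for the sketch, not SkyShowtime's actual logic):

```python
# Sketch: resolve a content title in the user's locale, falling back to
# English. The mapping and fallback behaviour are illustrative assumptions.
localized_titles: dict[tuple[str, str], str] = {
    ("show-123", "en"): "The Example Show",
    ("show-123", "pl"): "Przykładowy Serial",
    ("show-123", "es"): "La Serie de Ejemplo",
    # ... up to 19 language variants per piece of content
}

def title_for(content_id: str, locale: str) -> str:
    """Return the local-language title, falling back to the English one."""
    return localized_titles.get(
        (content_id, locale),
        localized_titles[(content_id, "en")],
    )

print(title_for("show-123", "pl"))  # Przykładowy Serial
print(title_for("show-123", "nl"))  # no Dutch entry, falls back to English
```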
Moving on to financial data, you may have already guessed that over 20 countries also means different currencies. For our financial reports, we had to account for the different currencies, do currency conversion and make sure all the money added up, and also handle tax rules, which, unsurprisingly, differ across countries.
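A minimal sketch of that currency normalisation step, assuming a fixed set of rates purely for illustration (a real pipeline would join against a daily FX rates table):

```python
# Sketch: normalise local-currency payments to a single reporting currency.
# The rates and amounts below are made up for illustration.
from decimal import Decimal

EUR_RATES = {
    "EUR": Decimal("1.0"),
    "PLN": Decimal("0.23"),   # assumed rate, not a real quote
    "SEK": Decimal("0.088"),  # assumed rate, not a real quote
}

def to_eur(amount: Decimal, currency: str) -> Decimal:
    """Convert a local-currency amount to EUR and round to cents."""
    return (amount * EUR_RATES[currency]).quantize(Decimal("0.01"))

payments = [(Decimal("24.99"), "PLN"), (Decimal("109.00"), "SEK")]
total_eur = sum(to_eur(amount, ccy) for amount, ccy in payments)
print(total_eur)  # 15.34
```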
And finally, tech debt. It would be very naive of us to think that a data platform that had been operational for two to three years before we came along and decided to borrow some of the code was perfect in every single way. The reality was that, on top of having to adjust the models to suit our needs, we also had to work through a lot of tech debt and make sure we didn't introduce any incorrect data into our platform.
But we did it. We managed to launch the data platform in time for the launches, from September 2022 through to February 2023.
We then took a little bit of time to regroup, refactor our code and clean up where needed. But we didn't have a lot of time to do that, because our business is very ambitious and moves fast, and so we had to as well. In April 2024 we launched an advertising tier, which was swiftly followed by the launch of a premium tier in October 2024.
I won't go into detail as to why, but for a number of reasons the implementations were different, and it meant that a big chunk of our data models had to be rewritten again, to support both the advertising tier and then the premium tier, and our customers needed to be migrated onto those new tiers.
I have a sneak peek here of what the tier offering looks like now: the premium tier on top, followed by the standard tier and standard with ads. If anyone is from the SkyShowtime territories, feel free to have a look yourself and sign up.
But this time around, we didn't have Peacock to lift and shift from; Peacock hadn't been through a similar journey to ours. We also didn't have anyone to support us by answering questions or providing advice. We had to do it on our own.
And once again, I'm pleased to say that we managed to get there. We delivered the data model updates in time for our launches, which is a great story to tell the business. But reflecting back on how and why we were able to deliver in such a short period of time, there were two main pillars: our people, and the development practices we set for ourselves.
Starting with people: it wasn't just that the developers wrote the code and things worked. There was actually a lot more that went into having a successful team, and here are some of the qualities that really stood out over those first 18 months or couple of years, both during the launches and whilst we were rolling out the advertising and premium tiers. We also look for these qualities when we hire going forward.
Firstly, curiosity. Those of you who have worked with big data will find that there are about a hundred different user journeys at any time, plus different edge cases; there's always a race condition that will ruin what you expect a user journey to be. And we had people who were willing to dig into that data, compare timestamps, and really figure out what the edge cases are and how we can handle them in our code.
At the same time, our people are creative. Having the shared data platform that you saw in the first couple of slides, the architecture slides, means that we can't always do what we want. We have to work with our partners; we don't control everything. And sometimes we have to think outside the box to get things done, and that's exactly what our teams do. One good example is that one of our engineers came up with a framework to dynamically generate Airflow DAGs, which hugely reduced the amount of code that needs to be written every time we want to deploy a new data model, and actually removed a lot of errors along the way as well.
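The framework itself isn't shown in the talk, but a minimal sketch of the general config-driven DAG-generation pattern might look like this. The DAG names, the MODELS config and the SQL are all hypothetical:

```python
# Sketch of config-driven Airflow DAG generation (Airflow 2.4+ style).
# The framework described in the talk is not public; names here are invented.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

# One entry per data model; adding a model means adding config, not code.
MODELS = [
    {
        "name": "user_daily_snapshot",
        "query": "CREATE OR REPLACE TABLE silver.user_daily_snapshot AS "
                 "SELECT user_id FROM bronze.subscribers",
    },
    {
        "name": "content_views_agg",
        "query": "CREATE OR REPLACE TABLE gold.content_views_agg AS "
                 "SELECT content_id, COUNT(*) AS views FROM silver.views "
                 "GROUP BY content_id",
    },
]

for model in MODELS:
    with DAG(
        dag_id=f"build_{model['name']}",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        BigQueryInsertJobOperator(
            task_id=f"run_{model['name']}",
            configuration={
                "query": {"query": model["query"], "useLegacySql": False}
            },
        )
    # Airflow discovers DAGs by scanning module globals.
    globals()[dag.dag_id] = dag
```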
And finally, our people are pragmatic. It's great to be curious and creative and to think of the best, most out-of-the-box solutions, but we had very tight timescales and we needed to deliver on time. So sometimes the best solution was the one that worked, and we had to accept that.
Moving on to development practices: I'm sure you'll recognize all of these. They are borrowed from software engineering but very much applicable in data engineering.
We keep it simple. We don't write code just to show off our skills; the code is there to be easily understood by anyone who reads it. We have agreed coding standards and style guides.
If a piece of code isn't going to be used, it can go. We don't keep commented-out code, and we don't write code just because it may be useful in the future. Yes, we try to understand what the requirements are, what their bounds are and what future use cases may be, but we don't try to unnecessarily future-proof.
At the same time, if we know that something is going to be used more than once, we try to extract it into helper functions or Airflow operators, as in the sketch below, so that if we ever need to update it we update it in a single place and it doesn't introduce additional complexity.
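For instance, a repeated "build a partitioned table" step might be pulled into a single helper like this. The helper name, tables and partitioning rule are assumptions made for the sketch, not the team's actual code:

```python
# Sketch: shared helper so every model build applies the same partitioning
# rule; a change is then made in exactly one place. Names are illustrative.
from google.cloud import bigquery


def build_partitioned_table(
    table: str,
    select_sql: str,
    partition_field: str = "snapshot_date",
) -> None:
    """Create or replace a date-partitioned table from a SELECT statement."""
    client = bigquery.Client()
    sql = (
        f"CREATE OR REPLACE TABLE {table} "
        f"PARTITION BY {partition_field} AS {select_sql}"
    )
    client.query(sql).result()


# Every DAG calls the same helper instead of repeating the DDL boilerplate.
build_partitioned_table(
    "silver.user_daily_snapshot",
    "SELECT CURRENT_DATE() AS snapshot_date, user_id FROM bronze.subscribers",
)
```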
But despite having great people and good development practices in place, we are still working on solving quite a few challenges from those early years. Sadly, we couldn't just claim success and call it a day. We've introduced our own tech debt.
The pragmatism I talked about earlier means that sometimes we will simply accept that a solution is suboptimal and may have to go back to it later and work through it, and that's where we are. We still have some refactoring to do.
An interesting one is the operating model. By lifting and shifting from Peacock, we actually inadvertently inherited an operating model that isn't necessarily suited to our business. Peacock is a large company with a large platform, a lot of customers, a lot of data engineers and a lot of analysts who can write custom scripts and custom dashboards and support one-off insights very easily for all of their stakeholders.
We have much smaller teams, and we can't serve every single stakeholder within the business in a timely manner. So we need to do more with less, and we need other stakeholders to do more. That's why we are investing in Amplitude, to try to create more of a self-served data culture at SkyShowtime.
There's also data quality. Who isn't worried about data quality? We do what we can: we've got some alerts, and when we create new data models we try to think about what could go wrong. But we still have a long way to go. It hasn't been a priority, because when you try to roll out a data platform in six months there are certain trade-offs you have to make, and data quality was one of them. So we are looking to invest more time in data quality.
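The alerts mentioned above aren't described in detail; as a hedged example, a basic post-build freshness check might be as simple as the following (the table, column and failure behaviour are assumptions for the sketch):

```python
# Sketch: fail loudly if today's partition of a table came out empty.
# Table and column names are illustrative, not the team's actual checks.
from google.cloud import bigquery


def check_not_empty_today(table: str, date_column: str = "snapshot_date") -> None:
    """Raise if the table has no rows for today's date."""
    client = bigquery.Client()
    sql = f"SELECT COUNT(*) AS n FROM {table} WHERE {date_column} = CURRENT_DATE()"
    row_count = next(iter(client.query(sql).result())).n
    if row_count == 0:
        raise ValueError(f"Data quality check failed: {table} is empty for today")


check_not_empty_today("silver.user_daily_snapshot")
```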
And having gotten to over two years (we're actually coming up on three), it's time to start looking at our costs, particularly our platform costs. We have some suboptimal tables; again, trade-offs, and something we are now looking at more actively.
So what's next for us? Well, I think I've covered some of this already, but we're continuing to enable the business's 2025 initiatives; we've got Q4 to go, with some big promotions coming up. We're looking at 2026, what it will look like for us and the business, and what the main goals are. We're redesigning and refactoring, so we're very much dealing with that tech debt. We're trying to reduce our cloud costs, looking at some of our most costly queries and dashboards and whether there are improvements to be made there.
A big one is enabling that self-service data via Amplitude. That's both integrating with Amplitude and empowering our business users to dig into the data and learn as much as they can from it.
And finally, we're looking to invest in data quality.
So that's a lot, and it's all happening at once, and we are always busy, but it's all part of the fun.
Before I finish up, I just wanted to leave you with some takeaways.
The first one: use accelerators where possible. Greenfield projects are fun, and from an engineering perspective really fun to work on. But when you've got ambitious goals and tight timelines, there's nothing wrong with relying on your partners and looking at what blueprints are available.
Second, small decisions can have long-term impact, good or bad: whether it's introducing the right engineering standards, or inheriting an operating model that's not suitable for your business.
And finally, people really are our superpower. I wouldn't be standing here if it wasn't for our team, who went above and beyond to deliver the platform and who continue to deliver really good quality work, and enjoy doing so.
And that's all from me. Thanks everyone.
Do we have any questions?
Go ahead.
>> Okay, so the question was: what was the size of the team in the first six months? I actually joined as the first full-time SkyShowtime data engineer, but I worked with three consultant data engineers, and we slowly hired more internal engineers and phased out the consultant side.
Any more questions? Oh, I think there's a mic coming around.
>> Thank you. So, I've worked on some of the projects we spoke about earlier, and I've found that as the platform scales, the more stuff you have, the more things can go wrong, obviously. So what are some things that you've done? You mentioned the automated Airflow DAGs earlier.
>> Yeah, what are some other things you've done to speed up development and maintenance of the platform, like automated testing, monitoring, CI and so on?
>> Sure. So yeah, the framework to create Airflow DAGs has probably been the biggest piece of work we've done to enable scaling the platform. Automated testing is another one we've introduced. We've also introduced SQL query parsing at serialization time rather than at runtime for Airflow, so we know about issues before they go live. On more of a process side, we introduced a development process that all engineers follow to make sure everything is thoroughly tested, and we have style guides and engineering standards that we try to follow.
>> So, you mentioned that it took six months to develop the platform, which is an amazing feat. I was interested in how you were able to make it privacy compliant in that short amount of time. And six months was the platform, but how long did it take to drive some business value from the platform you created? I can elaborate more if you want.
>> Yeah, sure. In terms of privacy compliance, we are GCP-based, so we heavily leverage the capabilities of GCP; for example, we use role-based access for our data layers. The platform is also used internally by SkyShowtime only. And we have an SRE team that sits within Sky, who helped us stand up the infrastructure.
In terms of how we then drew business insights, or how we enabled the platform to drive business insights: our gold layer within the data warehouse served the Tableau dashboards. We have a business analysis team who were creating those Tableau dashboards and making them available to the end users. The build-out of the Tableau dashboards was happening at the same time as we were building out the silver and gold layers, so the actual business insights were available as soon as we were finished. People didn't wait for the data models to be finished before starting to build dashboards; they were being built at the same time. Hopefully that answers your question.
>> Hi. I wanted to ask: I saw that there are quite a few tools in your tech stack, and I wanted to know, when you were building everything, how did you decide which tools to use for which part of your stack? And did you go with any open-source alternatives rather than SaaS? I remember you said cost is now becoming a concern, so do you think it might be a good idea to move some things to open source? I've heard pitches all day for different products, but I just wanted to hear your take on that.
>> Yeah, sure. To be honest, we actually didn't get a choice in terms of what tools to use; that came as part of the lift and shift. All of the infrastructure is exactly what Peacock had for their data platform. And if I'm perfectly honest, thinking back on it, I'm not sure we would have chosen the same tech stack. That was possibly a bigger challenge than choosing: not having the ability to choose.
Those are some of the restrictions we have to work within as part of the hybrid model you've seen between Sky and SkyShowtime. In terms of cost saving: looking at our costs at the moment, our biggest savings will be in the data warehouse space, so we are reviewing our silver and gold tables. Are all of them still needed, given the business has moved on over nearly three years? Are people using all the dashboards? Are the dashboards built well? Are the gold tables scanning all of the data, or are they using partitions effectively? These are all questions we are currently asking ourselves, and we're refactoring where needed.
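One hedged way to run that kind of review on BigQuery is a dry run, which reports how many bytes a query would scan and quickly exposes missing partition filters (the query and table below are illustrative):

```python
# Sketch: use a BigQuery dry run to estimate a query's scan size before
# running it for real. The query and table are illustrative.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

sql = (
    "SELECT user_id FROM silver.user_daily_snapshot "
    "WHERE snapshot_date = CURRENT_DATE()"  # partition filter limits the scan
)
job = client.query(sql, job_config=job_config)
print(f"Would scan {job.total_bytes_processed / 1e9:.2f} GB")
```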
>> Cool. I think we are pretty much on time. So thanks everyone, thanks for joining me, and enjoy the rest of the conference.
Streaming Analytics & Decision Automation Theatre
Wednesday, 24th Sep, 16:00 - 16:30

What does it take to deploy a data platform serving 22 European territories? We built it in just six months, and two years on, I'm sharing the insights. This is a journey filled with challenges and discoveries. I'll be sharing the inside story of how SkyShowtime achieved this ambitious goal, from the initial vision to the realities of large-scale implementation. Expect candid insights, valuable lessons, and a glimpse into how we're leveraging data to fuel SkyShowtime's continued growth and innovation.

Dominika Malinowska
Head of Data Engineering, SkyShowtime

Dominika is Head of Data Engineering at SkyShowtime, Europe's newest streaming service. She leads a team of data engineers in building and scaling a robust, cloud-based data platform that powers critical business insights. With a background in Big Data, she brings hands-on experience in Python, Scala, and distributed data systems. Outside of work, Dominika teaches coding to beginners and is a regular speaker at tech events. She's passionate about mentoring, technology education, and exploring new tools and frameworks.