Today's session, we're gonna kick things off now. If you want, go ahead and put where you're coming in from in the comments; I think it's a nice way to say hello and kick things off. You obviously don't have to. Joining today is Justin Friels, who is a data engineering senior manager at Accenture. We've got lots of stuff from Justin today. He's got some really great examples, but we're gonna be talking a lot about the importance of analytics engineering, in particular on collaboration, and how Justin and the team are using Count to help them do that. In terms of some admin before we get started: this will be recorded, and we'll send out all the info afterwards, of course. And if you have any questions, please just ask at any time. I'll be keeping an eye on that, so you don't have to wait till the end; Justin's ready for questions as we go. Please don't hesitate to ask your question in the chat at any point. With that, maybe we'll just take a look at the agenda. We're gonna go through a bit of context: Justin's gonna set the scene about what his team does at Accenture, and as you can imagine, there are probably quite a few data engineering teams, so we'll get a bit of context about what his team does. Then we'll talk about how he's using Count across a variety of different use cases, and then we'll go through a live example in the canvas. And with that, Justin, I'll hand over to you. Maybe you can start with a brief intro to yourself and your team at Accenture.

Yeah. Perfect. Hey, everybody. Justin Friels. I'm at Accenture, on the data engineering side, based out of Austin, Texas. Good to see everybody on here, and thanks for joining. So we're gonna talk about a bunch of things. Just a little background on what we do at Accenture: I've been at Accenture for about a year and a half now, maybe two years. We were part of an acquisition. We were a five-hundred-person commerce agency called The Stable, and we were acquired a couple of years ago by Accenture. So now we're in the massive Accenture ecosystem of, I think, seven hundred and fifty thousand employees, something like that. We do a lot of things at Accenture on the data side. Generally, we're consolidating: we're pulling data from all the various commerce platforms all over the world. Think big-box retailers like your Walmarts, your Targets, your Amazons. We, as a company, manage advertising for products at retailers, so we do Facebook advertising, Instagram, LinkedIn, retail media networks at these various retailers, things like that. Our team is sort of in the middle of all that: how do we pull this data, how do we store it, how do we model it, how do we set it up for analytics, and then, at the end of the day, get reporting out to internal and external folks at Accenture and beyond. So that's the general background for us. From a team-size perspective, again coming from a smaller company, we've got probably thirty-ish people, I would say, on the engineering side. That spans from pure engineering (building applications, writing back to various APIs to do all the fun commerce stuff that we do) to data engineering to analytics engineering, and then analytics, reporting, BI, that sort of fun stuff. So there's a lot of us across that spectrum.
And then a subset of those people, we'll call it five-ish, are actually doing dbt work. We'll talk about how we hook Count into all of that as well. Anything else you want me to touch on, or anything I missed?

No, I think that's great. I was just about to ask about the data stack, but I think that's where you're headed next.

Cool. Yeah. So, a bunch of slides in here. I wanna start very broadly, and then we'll zoom all the way in to actually using Count and looking at some actual data models. I'll start with the tech stack. Again, we work in the commerce space, so if we just go from left to right here, this is your typical data-architecture slide showing how data moves through; it's very simplified. Actually, we can zoom in. And apologies if you can hear my typing in the background; Taylor's been showing me all the cool things in the reporting piece of Count, which I've used a little bit, but not this much.

On the left-hand side are all of our data sources. This is all the retailers, all the media platforms, any other place that we get data, whether it's client context, specific things related to them, or other platforms that we support. For the most part, we're pulling all that data through Airflow, so we write all of our own Python code to pull data from these platforms. We looked at various SaaS offerings and things like that, but because of the amount of data we have to pull, the number of clients we support, and the number of integrations we have, it just made sense for us to build our own. So all of that lives in Airflow. We're on the AWS cloud stack: we use S3 for our data lake, we use Redshift for our data warehouse, we've got dbt doing all the modeling on top of the Redshift data, and we've got Count in here connecting to all the data in Redshift as well. Then from an end-user perspective, we use Power BI for our BI layer. And everything that we do is wrapped up in our app. I just put Accenture as this box here, but we've got an app where we're embedding reporting, plus all sorts of other functionality inside of that app. Some of the data from that app also circles back around and gets pulled back into this process as well.

Just to give you a feel for what that app looks like, we call it Tower. This is a full end-to-end commerce app, if you will. This is just the reporting piece for a specific client, so what you're looking at is a Power BI dashboard embedded inside of it. But this is core to everything that we do: how we manage clients, how we manage data, all of that, and then actually reporting all that stuff back out to end users. Any questions on that tech stack or anything else?

No, I think that's good. I think it highlights the breadth that you mentioned before, but also that you have a couple of different kinds of stakeholders that you're working with. A lot of teams just work with internal stakeholders doing internal BI, but you have this additional client complexity on top of that as well.

For sure.
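To make that stack concrete: below is a minimal sketch of the kind of dbt staging model such a pipeline implies, with Airflow landing raw platform files in S3 and a Redshift Spectrum external schema declared as a dbt source. The schema, table, and column names here (spectrum_facebook, ads_daily, and so on) are hypothetical stand-ins, not the team's actual models.

```sql
-- models/staging/facebook/stg_facebook__ads_daily.sql (hypothetical)
-- Airflow lands raw files in S3; a Spectrum external schema, declared as a
-- dbt source, exposes them to Redshift without a separate load step.
select
    account_id as platform_customer_id,  -- the per-platform account ID tied to a client
    report_date,
    campaign_id,
    impressions,
    clicks,
    spend
from {{ source('spectrum_facebook', 'ads_daily') }}
```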
So I'll keep going on complexity. Like I mentioned, there are a lot of platforms that we support and pull data from. This is just an example of how we think about data from those various platforms, how those roll up into various use cases, and then use cases on top of use cases. On the left-hand side here, think about it from a media perspective. Again, we're running advertising through Facebook and various retail media networks, etcetera, so think of those as the media platforms. We have data models that relate to individual platforms, and then we also have data models on top of those that are cross-platform. If I'm selling products in various places and running media through various platforms, I wanna see all of that consolidated from a reporting standpoint. We think about that as cross-platform media. Similarly on the retail side, this is sales and inventory at various retailers and all the data that comes from those. We've got various platforms here, those also have their own data models, and we roll those up to cross-platform retail: I'm not just selling at Walmart, I'm selling at Target, Walmart, Amazon, etcetera, plus direct-to-consumer. And then you really wanna see a picture of the whole business: not only how sales and inventory are doing across various products, but how media is impacting that. Is media driving more sales? What percentage of sales is media, etcetera? So then we have cross-platform retail media. Again, this is a very simplified view of the world, but lots of platforms rolling up for various use cases.

Actually, before I get into the dbt side, a little bit more on the complexity here. We also manage data for a lot of different clients, and as you can imagine, every client is unique. While we may support, say, Facebook as a whole, not every client is running ads through Facebook, for example. So this is just an example of those various platforms that we have; different clients have different combinations of all those platforms. In our internal commerce app, we manage all of this through data integrations at a client level. This is an example from our internal app where we associate a particular platform, and what we call a platform customer ID (basically an account ID on these various platforms), with a client. This is really core to everything that we do, because we need to pull data for a specific platform for a specific client, and then from a modeling perspective we have to be able to link which data belongs to which client, and which clients have which use cases, all the way up to reporting. So again, core functionality for how we manage all this complexity.

And then just to dig a little deeper on the dbt side, this is just one screenshot. We have about nine hundred models, something like that, in dbt. Again, with all those platforms, all those models on top of those, etcetera. I think I scrolled for about twenty seconds; I had a recorded video of this and didn't know how to make it a GIF, so you get one screenshot. But you can see the level of complexity here with the models.
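As a rough illustration of the roll-up pattern described above (per-platform models feeding a cross-platform layer), a cross-platform media model often reduces to a union over per-platform models that share a column contract. The model and column names below are invented for illustration:

```sql
-- models/intermediate/int_media__cross_platform.sql (hypothetical)
-- Each per-platform model conforms to the same columns, so the
-- cross-platform layer is a union plus a platform label.
select 'facebook' as platform, client_id, report_date, impressions, clicks, spend
from {{ ref('int_facebook__media_daily') }}

union all

select 'amazon_ads' as platform, client_id, report_date, impressions, clicks, spend
from {{ ref('int_amazon_ads__media_daily') }}
```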
And then this is just an example of a specific client project that I'm currently engaged with, and you can see the number of models just for this specific client. This client doesn't really overlap with any of the normal platforms that we support; this is all more custom stuff. So a lot going on, top down, all the way down to the dbt modeling side of things. Any questions on any of that?

I guess, just to tie this in with the question of collaboration: to me, what stands out about your situation is the complexity that you're mentioning on the data side of things. You have so many different sources, so many different models that you're working with to combine these things in bespoke combinations, and then you also have so many stakeholders on the other side. How do you find that combination of factors impacts how you think about collaboration? Because I think a lot of analytics engineering teams wouldn't really think of collaboration as something that they need, or something particularly valuable to them.

Yeah, it's an interesting question, because I've been using Count for, I don't know, over a year at this point. I think I tried signing up for the beta back when it wasn't freely available and you had to be on a waitlist. So I've been using Count forever, and it's been invaluable in this case. Before Count, as a data engineer or analytics engineer, you're sitting inside of tools, and you're sitting in those tools by yourself. I've been a big user of DataGrip for years and years, and DataGrip is a SQL IDE where you can connect to multiple databases. But it's just you writing queries, one query at a time, and you're seeing the results of that query. There's also the reporting layer, where you're sitting inside of a BI tool; in our case right now, that's Power BI. I used to work at Tableau, so I'm very familiar with Tableau. But again, it's you sitting inside of a tool figuring out how to solve a problem, and then, eventually, you're sharing that with other people.

What Count has really done is completely change that. Now I'm not working by myself in a silo. I can work directly with other people who are trying to work on the same problem from a data perspective, other analytics engineers, for example. But we can also bring in end users: our stakeholders, the people who are consuming these dashboards, etcetera. We can let them in on how complex things are, where we might have to make decisions, where data might get consolidated with other data, and basically how we get to that end result. Right?

Mhmm.

And I think before being able to show people that, they don't really understand all that goes into what we do on a day-to-day basis. All they see at the end of the day is a dashboard with a number in it. And if we go back here, they're just seeing some number in a dashboard, and they go: well, I don't know where this number came from, I don't know how this number was created, I don't know if that number is right or wrong. You know?
And so just being able to show people this complexity, and then, as we get into actual use cases in Count, being able to show them that, really helps with overall understanding and trust and all that fun stuff. It's been completely game-changing to be able to share what we're working on as we're working on it, rather than this lag of "I have something I need," and then later on you get it, and it's just a black box for them.

Yeah. Well said. I like that. There's a question from the chat. And just a reminder: if you do have questions, please go ahead and drop them in the chat as we go; you don't have to wait till the end. Ginny has asked: how long do you have to support the models for these clients? Is it short term, or a multiyear span?

Yeah, that's a good question. I would say, prior to working at Accenture, our mandate essentially was: we need to provide reporting for all of our clients across all these platforms. Accenture is a much more client-specific, project-driven company. Everything is "this one client has this one project," and everything has much tighter bounds on it. There's a predetermined start and stop to a project, and we have to get rid of everything at the end of a project, including data, all those kinds of things. So I think we're working through what that looks like on the bigger scale. For the most part, we've tried to keep things simple, where something we do for one client is reproducible for other clients, sort of the pre-Accenture world, in which we were trying to make everything reproducible, everything standardized, etcetera. Custom cases come along and you've gotta deal with that, but in the Accenture world, it's basically all custom. So my answer, and I don't know if I've given one yet, is that it becomes more complex, it becomes harder, and it becomes more important to know time frames on the client side as we move forward. But I don't know, did I answer that question at all?

Yep, you did. That sounds great. Thanks for the question.

Alright, so moving on here. Basically, I went through Count and looked at how we organize projects in our Count workspace. We have a lot of data connections, and we have a project per data source. I just wanna go through some screenshots to show all the things that we're doing. These are broken down into categories: organization and understanding, collaboration, and then a kind of "other" category, prototyping and brainstorming. Like I said, we support all these platforms, so what we've done is have an individual Count project per data source. I keep using Facebook for everything, so let's say this is Facebook data. One of the first things we do is just lay out all of the tables in the database. From a dbt perspective, we have sources and staging models and intermediate models and final models, and we're trying to lay those out on the canvas so it's easy for somebody who did not write all those models, or somebody who has no concept of what's going on at all, to come in and see the data laid out for them. In this example, we're using color coding.
So in the green section are just the raw sources. Blue might be all the data models; this is more for the analytics engineer to actually dig in, going from staging to intermediate to final. In this particular case, we also have data lake sources. Depending on the platform and how we ingest that data, what we're trying to move to across the board is storing everything in a data lake. We use Redshift, so we can query those data lake files in S3 directly, and that's what this is. One of the nice things with that particular use case is that somebody on the data engineering side can much more quickly share data with an analytics engineer, because all they have to worry about is getting data into S3. Then we can expose that data here, and the analytics engineer can look at it and start to build those models before we ever touch dbt.

Oh, just real quick, sorry, if you go back: in case people are brand new to Count and have no idea what it is, I thought I'd give a very brief bit of context for this image, and we'll go to a live example in a little bit. Basically, think of Count as a kind of Miro board with SQL cells, visuals, and Python cells inside it, and they can all talk to each other. You can have stuff from BigQuery and Athena and all these different places talking to each other in the same place, and it connects with dbt as well, so you can bring in your models and run them on live data. In this case, these little white cells would be SQL cells, and the frames around them are for organizing, like Justin was just saying. And many people can be looking at the same canvas at the same time, like we will do in a minute; Justin and I are both in this one as well, as you can see on this slide. Just thought I'd do a really brief "what is this" in case people were wondering. Okay, back to you.

Awesome. Alright, so the next example is similar, with the color coding. The thing to note here is that we've got dbt tests as well; those are the ones in yellow. I'll touch more on this in a bit on the live demo side, but we do a lot of testing inside of dbt, which is awesome. The hard part is that when a test fails, there's not a ton of information about exactly which rows failed, depending on the test. But when you can actually lay out those tests and the raw data in Count, especially for the more critical stuff, then when a test fails you can go directly there, look at it, see all the data, and do some more investigation, ad hoc stuff, etcetera. So again, this adds to our documentation: sources, models, tests, etcetera. You can also see little comments here; these little bubbles are places where people have clicked on something and had a conversation about the canvas as well.
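For context on that testing point: a dbt test is just a query that is expected to return zero rows, so laying the same query out as a cell in Count means that when the test goes red, the offending rows are right there to inspect. A hypothetical singular test, using invented names:

```sql
-- tests/assert_no_negative_spend.sql (hypothetical)
-- dbt fails the test if any rows come back; the same query in a Count
-- cell shows the actual failing rows for ad hoc investigation.
select client_id, report_date, spend
from {{ ref('int_media__cross_platform') }}
where spend < 0
```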
Touching more on dbt stuff: we also have exposures in dbt. Exposures describe how the models you're building in dbt are being used in downstream systems. In this particular case, we've got a Power BI semantic model connected to some tables that we modeled in dbt, and this is just some more documentation on what that looks like. I had to take screenshots and zoom out so you can't really read anything, for privacy reasons, but this is showing a little dbt lineage graph with the exposure in orange and the models that feed into it. And here we're documenting the join types, the join keys, and again, tests that we might have. If we have an assumption that a join is one-to-one, we wanna actually test that that's true. That's stuff that can be done in dbt, but this lets us lay it out visually and then inspect it if something's wrong.

And documentation like this on an exposure, would that be just for your internal team, or does that go to stakeholders as well?

Yeah, this could be anybody working with the data at that level. It might be an analyst working in Power BI who wants to understand how the semantic model in Power BI is built, or who needs to make changes, or whatever. That might be a use case for this. It's probably a little too detailed for just a consumer of data, but it's here for conversation among the people who need to have it.

Nice.

Touching a little more on dbt lineage: this is, again, screenshots from dbt, but we actually have the tables here as well, with some metrics. One of the things I always do, and I'll touch on this a little later, is put the dates as little visuals, and then maybe the number of records or the unique key or whatever, so that you can instantly see: okay, this data is up to date, or this data is out of date, or there's some issue because I expect this number to be something and it's not. The nice thing here is just laying out the specific use case that we're working on, all the actual data for that use case, and then sticky notes and all the other fun collaboration stuff. And there are visuals inside Count, so we add those in as well.

Digging a little deeper into the data modeling side, these are a couple of screenshots that Alex, who I think is actually on the call, did. Good job, Alex. This is some more detailed data modeling; you can see it best on the right-hand side. What Alex did here is lay out all the sources, all the models on top of those, intermediate models, final models. We did all this by hand, and as we'll show here in a minute, now you can just click a button and do this, which is amazing. But we were doing this before that was possible, so it's cool to see the progress there.

Yeah, it's one of my favorites. Well done, Alex.

For sure. And then on the left, again, more organization of data models, maybe from different sources, with color coding, things like that. And this kinda shows the scope: this is a zoomed-out version, and then you can drill all the way in to see actual raw data in a single table or a single model.
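On the one-to-one join assumption mentioned a moment ago: the usual way to check it is a cardinality query like the sketch below, which can live both as a dbt test and as a cell sitting next to the join in the canvas. Table and key names are hypothetical:

```sql
-- Any row returned means product_key is duplicated in dim_product,
-- so a join on it would fan out and the 1:1 assumption is false.
select product_key, count(*) as n_rows
from {{ ref('dim_product') }}
group by product_key
having count(*) > 1
```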
Alright, moving on to collaboration stuff. This is an example of one that might be more specific to end users. In this particular case, this is data coming from Amazon. Amazon's got a lot of different parts to it, as you can imagine; we're pulling data from Seller Central, Vendor Central, and their advertising platforms. And this is laid out in a way that we can show this data to end users who wanna work with Amazon data. We always get questions like: can we do this? Or, I have a request to do this thing. And the answer is: here is all of the Amazon data that we have, in its raw format, every single table. If we have the data, then we can do whatever it is we need to do. If we don't, then we've gotta solve that problem somewhere else. So just an example of all that laid out.

One other thing that's just funny to me: I blurred this out, but on the left-hand side are all the database connections. For this particular platform, we use a third party, and they give us data in a database per client; this is just a subset of that. What this allows us to do is write a query against one database, swap the connection to another database, and see that data for the other client. So not only do we know whether we have the data at all, but also whether we have that table or that data model for that specific client. Something like this helps us do that. I think I shared the screenshot in the Slack channel at some point, and somebody said I should win an award for the number of connections.

Yeah. I don't know if you guys thought about this particular use case of tens of databases or whatever, but we certainly have that situation.

I think we only thought about a production-and-development kind of thing, where you'd wanna swap between them. I don't think we anticipated this level of complexity. So, yeah, you definitely win.

Another example here. This is a data-flow diagram, or a piece of one, again in an Amazon context. It's showing: we've got data in S3, we query that data from S3 with Redshift Spectrum, we've got Airflow in the mix, and then this shows some final models for various parts of Amazon, and then actual dashboards, just screenshots from dashboards, with links. So when you're having a conversation about how data flows from one end to the other in a specific platform context, this is a good example of that.

Yep. Obviously linking out to the dashboards themselves, which is great. And I imagine that's a really powerful thing if somebody comes back and says, oh, this isn't right, or this isn't what I expected it to be: just coming back here and finding out where something went wrong, if it actually did, or explaining what's actually happening.

Yeah, exactly. And just being able to lay it out visually and then link out to other systems. You can see Airflow here with a link; I would imagine you click on this and you're going to the Airflow app, to the code beneath that, specifically for Amazon. So just tons of power here to lay things out and then dig into details when you need to.
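For reference, the S3-to-Redshift piece of that diagram relies on Redshift Spectrum, which attaches a Glue catalog database as an external schema so files in S3 can be queried like tables. A generic sketch, with placeholder names and role ARN:

```sql
-- One-time setup: expose an S3-backed Glue database as an external schema.
create external schema if not exists amazon_lake
from data catalog
database 'amazon_raw'  -- hypothetical Glue catalog database
iam_role 'arn:aws:iam::123456789012:role/spectrum-role';

-- Files landed in S3 are now queryable like ordinary tables:
select count(*) from amazon_lake.orders;
```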
Alright, and then some other random stuff here. This is another thing done by Alex. We did a database migration for some portion of data from Snowflake to Redshift. What I wanna highlight is the layout (all these are errored out because we don't have these connections anymore, since we migrated): on the right-hand side is Redshift, on the left-hand side is Snowflake. All the Snowflake stuff is in blue, all the Redshift stuff is in red. Basically, we're laying out a table that exists over here next to the corresponding table in the other database. We even used some filtering stuff in the middle so that we could have an apples-to-apples comparison across the databases, and then we compared metrics across them. Once we do the migration, we expect all the numbers to be the same, etcetera. And there's a lot of nuance between databases and SQL syntax and all that kind of stuff, so this is really helpful for making sure that everything is working as we migrate the data.
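A sketch of that apples-to-apples pattern: run the same aggregate against each connection in its own cell, then diff the two result sets in a third cell (Count cells can be referenced from other cells; cell_a and cell_b below are hypothetical cell names, and the metric is invented):

```sql
-- Cells A (Snowflake) and B (Redshift) each run the same aggregate, e.g.:
--   select order_date, sum(sales) as total_sales
--   from analytics.fct_orders
--   group by order_date
-- A third cell then surfaces any dates where the warehouses disagree:
select
    coalesce(a.order_date, b.order_date) as order_date,
    a.total_sales as snowflake_sales,
    b.total_sales as redshift_sales
from cell_a a
full outer join cell_b b on a.order_date = b.order_date
where a.total_sales is null
   or b.total_sales is null
   or a.total_sales <> b.total_sales
```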
There's one more question from Ginny, actually. I think it's from the slide before this: I'm guessing those can be updated automatically, because it connects to dbt? Also, does it connect to Power BI? Where does the visualization come from?

Yeah, I can take part of this. It'll make more sense what's live and connected to dbt, and what isn't, when we go to the actual demo and show how the dbt integration works. In terms of visuals, these look like they're screenshots from Power BI, correct? Count can do all these visuals; you guys have to use Power BI for your client reporting and your internal stuff, but we do have other people using Count for this kind of thing as well. So you could theoretically go from source to dbt model to output in one canvas if you wanted to, or just use it for any part of that, like this.

Yep. And maybe just to add to that: you can do Python inside Count as well, so maybe there's a world where we could use Python to pull a screenshot in to drop onto the canvas, some interesting stuff like that. But for the most part, we're connected to a database through Count, so when the data is updated in the database, it's updated in Count. In this case, yeah, these are just screenshots dropped in and laid out on the canvas. Good question.

Speaking of Python, this is just another example, this time in a data engineering context. We're actually connecting to Azure, in this case, for all of the users and groups and things like that that we manage security with. We have the legacy tenant, which was the previous company, and the Accenture tenant, which is the current company. So this is us writing actual Python code in the canvas to pull data from those APIs. To your point earlier, we can go from a direct API call in Python, to data inside of a database, to visualizations, all the way through. What this helps us do is prototyping: we can really quickly pull data from an API and hand it off to an analytics engineer to start building out data models before we've ever built out the full production process in Airflow. So just another really powerful way to use Count.

Cool. And then finally, here's just your standard Miro use case, if you will. This is us brainstorming, I think on a CI/CD process incorporating dbt Cloud. So this is us talking through that: sticky notes, brainstorming. And if we had data associated with all this, it could be right here in the same canvas. So just another really powerful use case that we have and use all the time. Any questions about any of that before we move on? We'll pause for any questions from the group. Feel free to jump in in the chat, like we said.

Yeah, I think this shows such a breadth of examples, and it's really great to see it being used across a lot of different workflows. As you mentioned, it's not just a dbt thing; you're doing the data engineering prototyping in there as well. It's cool to see.

Cool. Alright, so I'm gonna jump to the canvas now. Just for orientation: there's a report mode, and then the actual canvas where all the underlying work is being done. These are the pieces you just saw in the report. Actually, I'll touch on this real quick, because I love this little feature: this is just a little menu that I was making. Everything in Count can be linked to, from outside of Count (you can send it to somebody) and also inside of Count. So this lets me organize my thoughts, and I have a little home button, too, to jump back up here. We just talked about use cases, all the stuff here, and now we'll go to the actual dbt modeling stuff. I put a sticky note in here to help me, because I can't pull the actual data we use into Count for this presentation, but Count has these demo datasets and a demo dbt project, so I'm gonna use that.

On the left, we've got a handful of sources. This is a BigQuery data source that has dbt models hooked into it; you can see that with this little dbt icon here. This is dbt Core, is that right, Taylor?

Yep. It's Core, but you can use Cloud as well.

Cool. So there are different branches on this repo; we're on main right now, with the dbt deployment for the NBA demo data. We're seeing all the different schemas inside the database and then the various tables. What I wanna look at is this final dbt model for player careers. Because this is hooked into dbt, we can see dbt metadata about it. I don't have access to this repo, but clicking this button here would go directly to this code inside of GitHub. We can see some summary information about it, and dependencies, which is really cool. So the use case here is: I don't know anything about this data model. Taylor created it, so I'm just gonna explore it and talk about how I would approach doing something like this. I'm gonna click add to canvas, and we'll start with the default, which is adding a single cell. Let me work over here for a second. What this did is drop the dbt model directly onto the canvas, so we can see the code that's inside the dbt model, with all the various refs and things, and then the actual data itself from that model. So, Taylor, maybe you wanna touch on what's in this schema, or what these tables are, for the NBA data?

Yeah. The goal for this model was to take some NBA raw data, loads of games that have been played since, I wanna say, the nineties up to a few years ago, along with some player data, and aggregate that into a player's career statistics. So it looks across all the different game stats and finds out, you know, what position they played most often, what their career points were, how many points they averaged, that kind of thing, into a summary table.
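A rough sketch of what a career-aggregation mart like the one Taylor describes might reduce to. The upstream model and column names are guesses for illustration, not the demo project's actual code:

```sql
-- Hypothetical shape of the player careers mart: one row per player,
-- aggregated up from a game-level intermediate model.
select
    player_name,
    min(season)  as first_season,
    max(season)  as last_season,
    count(*)     as games_played,
    sum(case when won_game then 1 else 0 end) as wins,
    sum(points)  as career_points,
    avg(points)  as points_per_game
from {{ ref('int_player_game_stats') }}  -- hypothetical upstream model
group by player_name
```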
Cool. So this is the final result of all that, but again, I wanna see how this is actually built, because it's a little more complicated than a single data table. This time I'm gonna go add to canvas, and I'm gonna do three upstream models. This is the final model, so I just wanna look at three layers above it. Let's take a look at what this looks like. Remember back in the picture I showed, we were doing this by hand; this is just instant now. So this goes from the staging model here, down to another staging model here, an intermediate model, a bunch of other intermediate models, all the way down to this final player careers model. As somebody who's never looked at this, I can start to understand what's going on here; it gets laid out really nicely. There's some other functionality too. Within an intermediate model, for example, there's a ref here to a staging model, so I can just click on that and go directly to it. So this is one of the staging models that's used. If we go look at another one...

I guess one thing: we had some people asking if this is running off of live data. If you were to change the code, say add a limit 10 to one of these cells, this is all live, running on the database. Just wanted to make that clear.

Yep, and it updates everything downstream as well.

Yep. Yeah. So that's really powerful. In this particular case, I'm trying to understand what's happening in a model that already exists, but I can also make a change, because maybe I'm gonna find something that's wrong with this, and then I can make those changes live and see the results all the way down, before we ever go back and change anything in dbt. Alright, let's take that and back up.

Okay. So... yeah, go ahead, sorry. Another question from Ginny: can you develop your dbt model and commit from Count?

Yes. If you click on one of those cells, on the right you'll see a commit to GitHub button. If you press that, it takes you to a code editor so you can see the changes, and you can deploy or submit a pull request from there. Good question. Sorry, back to you, Justin.

No worries. Yeah, that's a great question. Okay, so now I wanna look at this, organize it a little bit, and start to ask some basic questions. As I showed before, I like to use color coding to organize these things. I know there's a staging model here and another staging model here, so I'm gonna move this up and put these in a frame. Now they're connected, and I can move them around, color code them, etcetera. I'm just gonna make these green. Here we go. So now I know this is, like, the rawest form of the data.
Now, technically this is a dbt model connected to a source, but it's just a select star, so let's assume it's the rawest form of that data. All the rest of these are intermediate models, and then what I really care about is what's going on down here at the end. So let's do another frame here and maybe make this one blue. Okay. I don't really know what's going on here yet, but I can look at all the code and see. At the end of the day, I'm seeing we've got player names, their first and last season, how many games they played, wins and losses, all their stats, I'm assuming aggregated over the lifetime of the data that we have. And up here, we've got the actual games that were played, and who was playing in each game. That's my understanding of this so far.

One of the first things I always do is look at basic numbers. So I'm gonna add a visual, and I'm just gonna pull this out here so we're not messing with our sources. We've got a player name, and we've also got a player ID, so let's start with player name, and I'm gonna look at a count distinct. Super basic: this is just player names. Alright, so we have about twenty-four hundred players. Then I'm just copying and pasting this, and let's do the same count distinct for player ID. So we've got 2,434 player IDs and 2,407 player names. When I did this, it immediately jumped out at me: okay, what's going on here? We've got fewer player names than player IDs. Interesting.

Let's look at that in a more visual way. Let's do a bar chart here, with player names against distinct player IDs. Let me flip this and sort it, and I actually want the labels on here too, so let's do that. Taylor, you mentioned before that you can have visuals in here; this is an example where we're starting to build out some super basic visualizations, but those can be combined into a dashboard, you can add filters, all that fun stuff. Okay, so we've got some players with three player IDs, some with two, and then the bulk of them with one. So, Taylor, what was your initial thought here?

The initial hypothesis was multiple players with the same name.

Yeah, makes total sense to me. And I think when we dug into this, you got some interesting names; apparently "Christmas" is one. I don't know if there have been two of them throughout history, but maybe I'm wrong. Anyway, we're starting to get a feel for what's happening in this data, right? We're already uncovering some things that might have an impact on the actual data model. Did we account for this? If we go back down here and look at the final data model, we only have player name in it. So we might need to also add a column for player ID, to be able to do these things in some downstream system, or join to some other table, or what have you.
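The grain checks from that step, written out as plain SQL against a hypothetical staging model name:

```sql
-- Compare distinct players by name vs. by ID; a gap means some
-- names are shared by more than one player.
select
    count(distinct player_name) as n_names,  -- 2,407 in the demo
    count(distinct player_id)   as n_ids     -- 2,434 in the demo
from {{ ref('stg_player_games') }};

-- List the names that map to more than one player ID:
select player_name, count(distinct player_id) as n_ids
from {{ ref('stg_player_games') }}
group by player_name
having count(distinct player_id) > 1
order by n_ids desc;
```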
Okay, so let's check those same numbers and see if they match on the final model. Again, I'm just copying and pasting here, and then let's drag these onto the canvas. Alright, we can see this arrow: this is still connected to the staging model, but I wanna connect it to the player careers model instead, so I'm just gonna do that, and do the same thing here. Now this one gives us an error, because there is no player ID in this final model; we already mentioned that, so it's to be expected. And this is 2,407, which matches our 2,407. Right? So no issue there. What I'm really looking for here is: what's a unique row of data in that final model? Because that's gonna be important for joins and other things. And do we have any duplicated data or missing data, things like that? So on a first pass, 2,407 and 2,407 match, but we know there's this player ID thing that we've gotta deal with.

I'm not gonna go much further down that path, but you can start to see the power here. We can see that this data model, this data model, and this data model all use this staging table, so this would be the spot where we'd need to add player ID. And in this query here, which has a CTE in it, we don't have player ID. So this is where I would start to make changes, assuming our end goal is to get all the way back down to the bottom and have a new column of data here. I'll pause there, but hopefully the thought process at least makes sense.

Yeah. I've made a call for any other questions; please go ahead and type them out. But I guess one thing as well: when you're making changes like this, if you zoom out, you have another copy of this DAG over to the right. So you can have an in-progress one and your real one, and compare the changes downstream. You could copy and paste the whole DAG, make changes to the one on the left while comparing against the old version on the right, which is a really nice way to do it. And then, when you're happy with those changes, you deploy them back to GitHub, or save them to dbt.

Yep. So we can track those against each other over time as we make changes, to your point.

Yep. That's very cool. Nice.
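The change Justin is describing, sketched against hypothetical model names: carry player_id through the intermediate CTE and group on it, keeping the name only for display, so the mart's grain becomes one row per player ID rather than one row per name:

```sql
-- Before (hypothetical): grouping by name collapses distinct players
-- who happen to share a name.
--   select player_name, ... from player_games group by player_name
-- After: carry the ID through and key the aggregation on it.
with player_games as (
    select player_id, player_name, season, points
    from {{ ref('stg_player_games') }}
)
select
    player_id,
    max(player_name) as player_name,  -- display only; the grain is player_id
    min(season)      as first_season,
    max(season)      as last_season,
    sum(points)      as career_points
from player_games
group by player_id
```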
There might not be any other questions, so I guess maybe one last question from me: what's next, do you think, for your team? What's something you're thinking about doing to take this idea further, in terms of transparency, collaboration, and having a better grip on these models?

Yeah. Like I mentioned, historically we've been focused on managing data at scale across multiple clients who have similar platforms: if we build a data integration to a new platform, we can now support a whole slew of new customers. The further we get into the Accenture world, the larger the clients, the larger the projects, and the more customization needs to happen. So I think we're on that journey of figuring out how to deal with those complexities. And something like Count is really important in that case, because we can quickly involve everybody who needs to be involved. We can prototype things very, very quickly. We don't have to worry about building a full end-to-end process for a net new platform: we can take some data, export it as a CSV, drop it in the canvas, build out some basic reporting and basic data modeling very quickly, and provide value back to the client or the team that's working on the client. So to me, that's part of the future: how do we engage very quickly with much more custom client demand?

Yeah, and I assume, as you mentioned before, you're working in these tighter cycles with more rigid timelines, so being able to go quickly in that sense is gonna be really beneficial for you guys ultimately.

Yep, exactly. And we have a lot of tools. dbt has documentation, right? But the audience for logging into Airflow and seeing all the Airflow DAGs, or looking at dbt documentation, is a very specific subset of people. Whereas here, we can give raw data or reporting or modeled data or whatever to a much larger audience. And I think that's also the future: getting more people engaged with data, because it's really easy to do so. Send them a link, there's context around it, they can ask questions, and we can make changes really, really quickly. Then when we need to do that stuff at a much larger scale, or in production or whatever, we can take all the stuff we've done here, and now we have a much faster path to productionizing it in other systems and tools.

Yep, that's great. Well, cool, I don't think there are any other... oh, a question: "Not really Count-related, but quite curious how you handle data documentation in dbt, and how you expose and maintain it in Power BI, since I don't think those two connect."

Yeah. Like I mentioned, dbt has a lot of built-in documentation functionality, so everything that can be done in dbt is done in dbt: we have YAML files for all of our data models, and we have exposures with documentation on those. From a Power BI perspective, I don't know, it's harder; there's not really much for that. So again, I think Count is the glue that connects those things. We can have dbt documentation for dbt users, or the integration directly to dbt where you can look at models and all the things we showed at the end here; we can have links to the documentation, or screenshots, or whatever, in a canvas; and then you can have Power BI, the semantic model in Power BI, or the dashboards, or whatever. I think you need a place to put all that together, and the normal tool would be something like a wiki page. But being able to do this in a canvas, collaboratively, with data in there as well, which is the key missing component, to me, this is where you do it. This becomes the documentation for those other tools, and then you just link out to them if you need further detail, raw code, things like that.

Yeah. One of the things we're starting to see more of, I don't know if you saw, is the custom templates we introduced a little while ago. We have data teams creating, basically, a frame that has the custom documentation for a specific table. Then any stakeholder who wants to use that table can just add it directly from a canvas, and it gives them everything that they need.
So it can have a combination of some dbt documentation, it can explain the lineage of the table, but it can also go into, like one of the examples you showed that had "here's everything you can do with Amazon", the common things you could do with this table. And that just becomes a template that anybody can import in a few clicks. It's pretty cool.

Yeah. I don't know if my account has access to the templates, but do you wanna just show that real quick, if it's easy for you?

Yeah, I can do that. I'll let you share your screen.

High pressure to make sure I share the right tab.

Good luck.

Yeah. So you can see, here are all the templates, and then you can have these custom ones as well. I don't have that exact example in here, but you can imagine, if this is a template for a slide, or a template for something I wanna reuse, I can make it exactly what I want, and then anyone can come into my workspace and grab it. So this could have actual data in it, with all the documentation and things like that. So yeah, quite powerful.

Nice. Well, I think there are no other questions now. But if you do think of any, you'll get an email from me later today, so feel free to reply to that, and I can pass your questions along to Justin. Thank you guys for showing up. And, Justin, thank you so much for your time and the effort you put into going through all this. I hope you all found it as valuable as I did.

Yeah, I think it's been really great. I learned a lot. Thank you.

Awesome. And a quick plug: there's a Slack community, and I'm on there. If you wanna chat with me, feel free to jump on, and there are tons of people with lots of knowledge on there too. So check it out.

Yep, I'll send a link to that as well. Good reminder. Cool. Alright, I appreciate it. Thanks, everybody. Thank you. Bye.