CANedge + MATLAB: Building a Digital Twin for an EV Battery Pack

    In this webinar, we go through the process of an end-to-end digital twin project, focusing on how you can use real-world CAN bus data to drive your understanding and model development.

    Here’s what you’ll learn:

    • Data Collection: Discover how to log and decode raw CAN bus data from vehicles using the CANedge family of data loggers from CSS Electronics.
    • Data Preparation: Once you have logged data, understand the process used to prepare the data for analysis and modeling.
    • Digital Twin Development: Develop and deploy a digital twin model of an EV battery, designed to estimate the battery’s state of health.

    Gain key insights and considerations for planning your own digital twin project. Don’t miss out on this opportunity to enhance your knowledge and skills!

    Published: 24 Sep 2024

    In today's webinar, we will be talking about how to log CAN bus data and how to collect it at scale, and we will also discuss how this data can be used within MATLAB, specifically focusing on a use case of building a digital twin for an electric vehicle battery pack.

    So we will start with a brief introduction, and then I will go through the basics of CAN data logging, both how you connect to the CAN bus and how you decode CAN bus data. Then I will explain briefly how the CANedge CAN bus data logger can be used to record CAN bus data and to transfer this data to the cloud. I will also cover briefly how the data from the CANedge can be integrated with MATLAB in a number of different ways. And after that, Will from MathWorks will go through a specific application of this data within MATLAB.

    Briefly on CSS Electronics: we are a Danish company specializing in developing CAN bus data loggers as well as, for example, sensor-to-CAN-bus modules. Our products are used around the world by 5,000+ companies, and in particular we sell our products to OEMs, automotive and industrial manufacturers, and the engineers working at these companies.

    Briefly about myself, I'm one of the two co-owners at CSS Electronics, and my primary work is focused on sales and marketing but also technical sparring with customers, focusing on helping them achieve their use cases.

    And my name is Will Wilson. I'm an application engineer at the MathWorks, and I'm focused on MATLAB-based analytics, with an eye towards fleet analytics and data engineering, specifically with data logged from vehicles or machines out in the field.

    All right. So starting out, we wanted to give just a very practical introduction on how you actually record CAN bus data in the field so that we do not simply start when the data has magically appeared in the cloud. So the first step when you are looking to record CAN bus data is, how do you actually connect to the CAN bus?

    So if we jump here, you can see the very simple version of this: you will, of course, need a data logger or another tool that is able to record the CAN bus data, and then you will typically need an adapter cable in order to connect physically to the CAN bus that is relevant for your data logging.

    In the simplest case, as illustrated here, you might have an excavator or a truck where you can plug a cable directly into the diagnostic connector. In most heavy-duty vehicles, that would look like the first cable in the overview, the one we call the J1939 cable, the green one; it will be available in most heavy-duty vehicles, but not always.

    Similarly, in most cars you will find an OBD2 connector near the steering wheel, and for such use cases you can use an OBD2 adapter to connect. A similar example is found in maritime applications, where an M12 adapter will allow you to connect to the CAN bus. But in the field you will very frequently encounter applications that require other adapter cables or other connector pinouts.

    So you also have to consider that sometimes you will need to create your own custom adapter cable matching the pinout of the application. And sometimes you may instead want to use a contactless CAN bus adapter to simply record the data directly from the CAN high and CAN low wires. You do this through induction, using an adapter like the CANCrocodile adapter you see to the lower right. That will essentially allow you to record from any CAN bus, regardless of what the connector might be in the vehicle, but it can be a bit more cumbersome to install.

    Let's assume that you are now connected to a CAN bus and recording raw CAN bus frames from it. What you are looking at will look roughly like what you see on the left here. You will have timestamped data that contains information like which bus this was recorded from (if you have multiple CAN buses at play), what the ID of a given frame was, and what the data payload of that frame was.

    And obviously, in order to use this in different software tools to perform analysis and visualization, you will need to turn it into human-readable form, which is the process of decoding the CAN bus data. The end goal is to create something that looks a bit like what you see on the right-hand side where, again, you have timestamped data, but now you have what we call physical values, such as degrees, kilometers per hour, or RPM, for different CAN bus signals. You have essentially extracted the information contained in the raw CAN bus data.

    If we look at the next page, a database file that is commonly used here is called a DBC file. DBC is a standardized way of storing the information required to interpret the raw CAN bus data and decode it into useful physical values. The information you need for this is, for a given CAN ID and a specific parameter within that CAN frame: what is the bit start in the payload, what is the bit length of the signal, how do you scale it, and how do you offset it? This is illustrated here on the left for the signal called EngineSpeed within J1939.
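
    As a rough illustration of what that decoding rule amounts to, here is a minimal MATLAB sketch using the public J1939 definition of EngineSpeed (SPN 190: bytes 4-5, little-endian, 0.125 rpm/bit, offset 0); in practice, always take the start position, length, scale, and offset from your own DBC file.

        % Minimal sketch: extract one signal from a raw CAN payload using the
        % DBC-style parameters (start position, length, scale, offset).
        % EngineSpeed (J1939 SPN 190): bytes 4-5, little-endian, 0.125 rpm/bit, offset 0.
        payload = uint8([255 255 255 104 32 255 255 255]);       % 8-byte CAN data payload

        rawValue = double(payload(4)) + 256*double(payload(5));  % 16-bit little-endian value
        scale    = 0.125;                                         % rpm per bit
        offset   = 0;                                             % rpm
        engineSpeed_rpm = offset + scale*rawValue                 % = 1037 rpm for this payload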

    Assuming you have this information and assuming you store it in a DBC file, then you are essentially good to go for decoding data from the application. The challenge, as we can see from this overview here, is that a DBC file can be proprietary.

    So if you go to a vehicle from a specific manufacturer, that manufacturer may encode the data in a proprietary manner, which means that, in order to decode it, you need to get the DBC file from the manufacturer if you wish to decode 100% of the information. And the manufacturer is not always willing to share this information.

    For example, if you were to go to Volkswagen, they will most likely not provide you with the full DBC file for decoding any of their car models. However, there can be exceptions to this. If you have a manufacturer of a sensor, for example, that is connected onto a CAN bus, that manufacturer may, as part of the business model, be willing to share the DBC file because that allows for that sensor to be more easily integrated.

    And beyond this, there are also a number of common standardized protocols within the automotive and maritime industry where you can decode at least a large subset of the data from the application.

    Three examples are shown here. J1939 is a heavy-duty vehicle protocol and is used in most heavy-duty vehicles. In practice, that means that across different models and brands you can record the raw J1939 data, and you will typically experience that 60% to 80% of the data can be decoded using the standardized DBC file.

    In a similar way, the same principle applies to the NMEA 2000 protocol in maritime applications and, to some extent, to passenger cars (when we're talking non-electric vehicles) via the OBD2 protocol, where you can get some useful signals extracted.

    So that was a very brief introduction on how to record and how to decode CAN bus data. As we illustrate in this recap, the tools involved would be some form of adapter cable and a piece of hardware for recording the data, which I'll dive into in a moment. Then you will get your raw data, you will need the DBC file, and you will need software for decoding the data, which we'll also talk about briefly, MATLAB being one example of that.

    If we look at the hardware: at CSS Electronics, our primary product is called the CANedge. The CANedge is a series of CAN bus data loggers which allow you to record CAN bus data from any application to an SD card. And we sell a couple of variants of this.

    If we look at the overview here, we sell both the offline CANedge1. We sell a CANedge2 with Wi-Fi connectivity and a CANedge3 with cellular connectivity. And if we jump into the CANedge1 to start with, there are a number of features for this device that we focus on.

    We try to make it very plug and play. One example of that: if you install it in a CAN bus application, it will automatically detect the bit rate and start recording the data onto the SD card. So it's quite quick and easy to install across different applications.

    But at the same time, this product is intended for OEMs and engineers, and therefore, we focus on ensuring that the specifications of the product adhere to the requirements of these types of use cases. That includes lossless logging of thousands of frames per second, support for multiple CAN buses, multiple LIN buses, CAN FD, and high timestamp resolution.

    We try to package this in a very compact form factor, as illustrated by the picture here, and we try to make it highly configurable so that you can customize, for example, what you record, what data you might want to transmit onto the CAN bus, and other things like encryption and compression of the data.

    And finally, we try to make it possible to integrate the data from the CANedge with your preferred tool, whether that is an open source tool or a tool like, for example, MATLAB. We try to make that integration as seamless as possible.

    If you need to offload the data in a more automated fashion and at scale, we also offer the CANedge2 with Wi-Fi. This is essentially like a CANedge1: it records data to the SD card, but you are able to set up a server for receiving the data, and you are able to specify a Wi-Fi router or access point that the device should connect to in order to offload the data from the SD card to the server.

    And if we look at the next slide here, we try to illustrate how this works in practice. For example, if you have a stationary asset, it could be in a production or energy environment, you may want to connect the CANedge2 and have it continuously within range of Wi-Fi and thus the server. Your experience here will be that whenever a new log file is created, it is automatically offloaded to the cloud, and once successfully offloaded, it is deleted from the SD card. So you can imagine having many of these in the field, and they will continuously upload the data.

    At the same time, you can also use it for periodic offloading. If you have a warehouse, for example, with a Wi-Fi router in one end of the warehouse, you can have forklifts that move in and out of range. When out of range, they record to the SD card; when back in range of the Wi-Fi, they offload as much data as possible, and then they start accumulating again once they exit the Wi-Fi range.

    We also offer another product, the CANedge3, for cellular. The idea is the same, but instead of Wi-Fi, you insert your own SIM card. And as illustrated on the next slide here, that makes the CANedge3 ideal for these types of mobile applications.

    So if we jump to the next slide here, you can see that if you have, for example, vehicles on the road, where you do not necessarily have a Wi-Fi access point, then it is often preferable to offload the data via cellular instead. And you can get more or less continuous coverage, a bit like the stationary asset within Wi-Fi range.

    But you can also use the cellular connectivity, again, in assets that move in and out of coverage. A classic example is ships that may operate at sea, outside of cellular coverage, for periods of maybe weeks, accumulating data to the SD card; when they return, the data is offloaded to the cloud automatically.

    With the data being offloaded either to your local disk on your PC or to your cloud server, the next step will typically be to decode and process the data, analyze it and potentially visualize it. And we offer a number of integrations, as mentioned earlier. What I will focus on here is a specific integration we have, which is with MATLAB.

    And MATLAB offers a lot of functionality that is great for the type of audience we have, again being automotive and industrial OEM engineers. It is a very popular programming platform within this audience, and it allows users to directly load data from the CANedge via the Vehicle Network Toolbox, or alternatively to transform the data into Parquet files, as I'll explain in a moment. And MATLAB also natively supports S3, which is the type of cloud storage that the CANedge2 uploads to.

    So if we look at just a few of the examples of how we can load the data into MATLAB, the simplest way or the most direct way is to use the Vehicle Network Toolbox to load a raw file from the CANedge directly. And by raw file, we mean an unfinalized and unsorted MF4 file. So that's a specific version of the file format we have that is supported by MATLAB.
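
    As a rough sketch of that direct route, assuming the Vehicle Network Toolbox is installed (the file and DBC names are placeholders, and the exact decoding calls may differ between releases and channel-group layouts):

        % Minimal sketch: load one CANedge MF4 file and decode it with a DBC file,
        % assuming the Vehicle Network Toolbox (names below are placeholders).
        m   = mdf("00000001.MF4");                % open the MF4 log file
        raw = read(m);                            % raw CAN frames, one timetable per channel group

        db   = canDatabase("j1939.dbc");          % message/signal definitions
        msgs = canMessageTimetable(raw{1}, db);   % map raw frames to named messages
        sigs = canSignalTimetable(msgs);          % struct of per-message signal timetables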

    So this is great for single file analysis, but often, you may want to manage or analyze multiple files at a time. And here, you can instead use a datastore within MATLAB. This requires that the MF4 files have been pre-processed or finalized, as we call it, which can easily be done via a drag-and-drop routine or via an automated routine in the cloud.

    If we want to go even further and work beyond memory, MATLAB also supports loading DBC-decoded files. So, a bit like before, you can do a pre-processing step where you ensure that the MF4 files are already decoded from raw CAN bus data into physical values. When you do this, MATLAB supports what are called tall arrays, and you can load, say, gigabytes or terabytes of data and perform data processing on it even if it exceeds the memory of your PC. So this is useful for very large scale analysis.

    And an alternative route for integrating the data into MATLAB is to use Parquet files. This is useful if you do not have the Vehicle Network Toolbox or if your goal is indeed to perform very large scale analysis beyond memory. Because again, Parquet files can be loaded into tall arrays in MATLAB and data stores quite easily, and we provide a simple executable that will allow you to easily DBC decode the data into a Parquet data lake.
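
    A minimal sketch of that out-of-memory route on a DBC-decoded Parquet data lake (the S3 path and signal name below are placeholders):

        % Minimal sketch: out-of-memory analysis of a DBC-decoded Parquet data lake.
        pds = parquetDatastore("s3://my-bucket/datalake/", "IncludeSubfolders", true);
        tt  = tall(pds);                                          % lazily evaluated tall table

        avgSpeed = mean(tt.WheelBasedVehicleSpeed, "omitnan");    % hypothetical signal name
        avgSpeed = gather(avgSpeed)                               % triggers chunked evaluation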

    And as shown here on my last slide, the next one, this is also very seamlessly integrated with the cloud functionality of the CANedge, where we provide integration workflows that automate the creation of Parquet data lakes, for example. And that in turn makes it very simple for you to use MATLAB, either in the cloud or on your PC, to access this S3-based Parquet data lake to perform your analysis and data processing. So that was it on the basics of the CANedge and data processing, and now I think we will go through a specific application example.

    Thanks, Martin. I want to talk about a case study we did about building a cloud-based digital twin for this vehicle. And before I get into that, I want to motivate a little bit why a digital twin. Really, it's this evolution we're seeing across many industries, where we go from things like selling physical assets to selling logistics or uptime. Or maybe, as opposed to getting revenue from one-off sales, we're moving over to subscription-based pricing models.

    And so the idea of a digital twin is that you have a representation of a physical asset. So it's a digital model of this asset, something that's out in the field, and it's an up-to-date model that you can make decisions with and do, perhaps, predictive analysis with, depending on that specific case.

    So the specific goal here of what I set out to do was to take data from the CANedge2 that was logged from this machine in the field for about a year and a half and see what we could do with that data in terms of building a digital twin of the battery pack to ultimately predict system behavior. We're going to be specifically looking at state of health to try to understand, can we estimate state of health of a given machine?

    And this project really focused on data, specifically. We didn't have deep knowledge of the system. We didn't design and build the system ourselves. So we had to rely on the data and use that to try to inform our modeling capabilities and try to figure out, what could we learn from the data itself?

    So I want to start with the end in mind. We actually built a dashboard; here is the dashboard here. It was hosted and demonstrated on Google Cloud, but you could host this wherever you want to. This dashboard was built in MATLAB, and it's got some of the key things.

    You can imagine a fleet of assets in the field with some number of units. You have some information about the current status of the unit you're looking at and some metrics about its particular behavior at this time. And then you can look at things like state of charge, like time at level, because, perhaps, if you spend time at the extremes out here, that could cause damage to the system if you're not careful.

    If we look at the different assets, we see here that asset R30 is in degraded operation and D49 has some sort of electrical problem that we probably should take action on right away. So the idea here is that you build this sort of a dashboard to be able to understand the current state of your fleet and then take the appropriate action. Whether that's predictive maintenance or not, you're using the data to inform your next actions.

    So that's where we want to get to, and ultimately, we'll be talking about the state of health model today. That will be running in the background so that when this dashboard displays, it calls up that model, whatever the latest data is goes into that model, and you get a reflection of how that asset is behaving in the field.

    So we specifically talk about four steps when we talk about data analysis in MATLAB, and that involves accessing your data, preparing your data, modeling your data, and then ultimately deploying or sharing your model with the world. And in this AI era, if you will, everybody wants to jump right into the modeling phase, often forgetting that there is a non-trivial amount of work to be done upfront. The first two steps I pretty much classify as data engineering, the act of preparing your data for future use.

    So today, we're not going to dive in too much on the data engineering. I'll talk about some of the major steps that we took, but we're going to focus on the modeling piece. And then the dashboard is how we deployed or shared our work.

    So, in slides, just to set this up: we took the data from the CANedge that was logged to an S3 bucket, went into that S3 bucket, grabbed a handful of files, and brought those down to our local machine just to have a look and get a sense of, what does this data contain? What is it telling us? And these were MF4 files.

    So we looked at those locally. We developed some scripts, which ultimately ended up in a pipeline that allowed us to go from the raw MF4 files to do the finalizing and sorting that Martin talked about, unpack all of that time series data, decode it with the DBC file, organize it into MATLAB tables or timetables, and then ultimately write that data out to Parquet files that would be consumed for further downstream analysis work.

    So altogether, you can think of this as the data engineering pipeline. These are the steps it took to go from raw data to an analysis friendly data set. And this was done locally with a handful of files first and then at scale on the cloud, with MATLAB running on AWS. The data was in AWS, so all of the work happened completely on the cloud.
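
    A minimal sketch of what one iteration of such a pipeline could look like, assuming the Vehicle Network Toolbox for the MF4/DBC handling (the folder, DBC, and message names are placeholders, not the project's actual code):

        % Minimal sketch of the per-file step in a raw-MF4-to-Parquet pipeline.
        files = dir(fullfile("raw_mf4", "*.MF4"));
        db    = canDatabase("battery.dbc");                  % hypothetical DBC file

        for k = 1:numel(files)
            f    = fullfile(files(k).folder, files(k).name);
            raw  = read(mdf(f));                             % finalized MF4 -> raw CAN frames
            msgs = canMessageTimetable(raw{1}, db);          % decode frames against the DBC
            sigs = canSignalTimetable(msgs);                 % one timetable per message name

            % Write one decoded message's signals out to the Parquet data lake.
            [~, base] = fileparts(f);
            parquetwrite(fullfile("datalake", base + "_BMS_Status.parquet"), ...
                         timetable2table(sigs.BMS_Status));  % hypothetical message name
        end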

    I guess just kind of backing this up pictorially, so we have the raw MF4 files directly off the CANedge. They get logged to an S3 bucket. I'm calling this bucket A. With MATLAB running on the cloud, we go through the data engineering pipeline and do a bit of-- I guess there's a little bit of coalescing work, where, depending on the key cycles and how the data is logged, we'll do a little bit of grouping of the data just to make things simpler, and then ultimately, repartitioning and coalescing the data down into a temporal kind of schema, if you will.

    So I wanted the data to be in files by year and month because the analysis we were going to do, we needed to report information on a per month or perhaps per week basis, and it just made it easier to do that. So I took the opportunity to do this organization work early in the process so that downstream, we can benefit from that every time we have an idea, we want to try out a new algorithm.

    So this is at a very high level of how that data engineering pipeline played out. And then all of the data that we use for analysis that we'll talk about in a minute is this data here. So this is where we'll start consuming the data to do analysis work.

    And the reason we did this in the cloud specifically is because of the sheer scale of the files. You can do this for a handful of files or a few gigabytes, but when you get into hundreds of gigabytes or terabytes, you need to be leveraging parallel computing.

    You need to be leveraging the elasticity of the cloud because you just won't be able to keep up. You won't be able to do this on your laptop or local machine because the size of the data just gets to be too big. And of course, you can't drag all that data from the cloud down to your local machine because the time it takes to do that I/O work just becomes untenable.

    I agree. And there are also typically significant egress fees when you pull the data out of the cloud to your local PC.

    That's right.

    And you even have to pay for it every time you touch it. All right. So when I talk about data engineering, one more picture just to reinforce the idea here. I showed this as binary files, but imagine that these are your MDF or MF4 files. You get those, and the idea is that you would want to decode and prepare this data one time. For example, you may not ever need to decode CAN data more than once.

    If everybody needs the engineering data, why would you unpack it more than once? So set up these steps, do it one time, and then preserve that data in what I'm calling an engineered format in the middle, in this case Parquet files. Parquet is an Apache open source file format. It's excellent for representing tabular data, so anything you can put in a MATLAB table or timetable would be a candidate to write to Parquet. And it's just very efficient in terms of I/O and compression.

    You could, of course, write data to databases if that was a piece of your pipeline, but the idea is to get it into an analysis-friendly format as quickly as you can. And then forever after, for all your analysis, your downstream work, your dashboarding, whatever the case may be, come back to that efficiently prepared data, and you'll see gains and benefits forever after.

    All right, so let's hop over to MATLAB and take a look at how we actually did this. Here I'm going to be using a live script in MATLAB, and I'll just walk through this as we go. I'm going to show data on my hard drive just for demo purposes, but there's no reason we couldn't have data in an S3 bucket here. So if you have a fully qualified S3 URI, you could dial in directly to this.

    So if I was demoing this from my local MATLAB client on my laptop to the cloud, I could show that, or we could show MATLAB running on the cloud, talking to the S3 bucket as well. When you do this, though, this presents a new challenge, and that is to deal with credentials. And there's lots of considerations around credentials and authentication. Having credentials in code is a bad idea because, of course, you run the risk of sharing things that shouldn't be shared.

    So we have a couple of new solutions for this in MATLAB. There's what's called an env file, which is a plain text file where you would keep things like, for example, an access key or a token. All of these are obviously expired because I wouldn't be showing them to you otherwise. But you can put whatever sort of credentials you need in a plain text file and then just call it up and use it right in your MATLAB script. So it takes the credentials out of the code.

    There's also something called the MATLAB vault, which is new in R2024a and which helps you manage secrets that way too. Here, I just chose to show the simple file-based route, but you have to be thinking about how you authenticate and how you handle credentials. And of course, an env file should never be put into Git, so you'd have to make sure it was part of your .gitignore file so you didn't accidentally spill your credentials out to places where they shouldn't be.
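
    A minimal sketch of the env-file approach (loadenv was introduced in R2023a; the bucket path follows common AWS conventions and is a placeholder):

        % Minimal sketch: load credentials from a .env file instead of hard-coding them.
        loadenv(".env");    % loads e.g. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
                            % as environment variables, keeping them out of the script

        % Datastores can then point straight at the bucket; credentials are
        % picked up from the environment.
        pds = parquetDatastore("s3://my-bucket/datalake/", "IncludeSubfolders", true);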

    All right. So now that we can connect to our data, whether it's local or on the cloud, we're going to use a datastore. Martin mentioned the idea of a Parquet datastore before. A datastore, any flavor of it, is basically an object that helps you work with collections of files. It points to collections of files locally, on a network share, on the cloud, wherever they are, and it helps you tell MATLAB what data there is to read and how you're going to read it.

    Now, there are datastores for all different data types. In this case, because we're using Parquet files, we're directly using a Parquet datastore. But if I had images or CSV files, there are datastores, for example, that are tailor-made for those, as well as a full-blown, software-developer-level API for you to write a custom datastore if you have that sort of need.

    So here we're going to say this datastore is pointed at the data directory, and it knows about the files it is going to represent. The other important property I want to point out is the read size. You can tell it, in this case, how MATLAB will consume this data: one entire file at a time, or perhaps a chunk of a file at a time? And depending on how you've sliced up your data, that can help you manage how reads are happening behind the scenes.

    Now, for this example today, we're going to focus on the sixth file, which happens to be the month of November; there's no reason for me to run through all the data at once, I just want to illustrate the main idea here. And when we preview the data, what we get back is a MATLAB table-- sorry-- MATLAB timetable, where the first column is time, and then each of the columns here represents an individual sensor. And because this is a battery pack, we've got signals or channels like voltage and current and that sort of thing.

    There are on the order of 120 different channels for this data set. And one of the most important things I can suggest to you is that, for any given analysis, take the macro data set and reduce it down to just the channels you need, just the columns you need.

    This is going to do a couple of things for you. It's going to save you time in terms of I/O because you're not reading everything; you're just reading the subset of columns you care about for the given analysis you're doing. And because you're just touching less data, it's going to be faster overall.

    So here, I've just defined an array of channels that we need for our analysis. You can tell the datastore, hey, these are, in fact, the channels-- this array here-- these are the channels I would like to read. And so instead of giving you back the entire set of 123 columns, it's going to just give you back these four. And then, if we do a preview, we can just double-check ourselves and say, yeah.

    So every time the datastore runs through this giant data set and gives you back a chunk of data, it's going to be a chunk with just these four columns. And this is definitely a way you can manage and think about wrestling giant data sets down to the subsets you care about.
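
    A minimal sketch of the datastore setup just described (the folder and channel names are placeholders for the real 120-plus channels):

        % Minimal sketch: Parquet datastore restricted to the channels of interest.
        pds = parquetDatastore("datalake/", "IncludeSubfolders", true);
        pds.ReadSize = "file";                              % consume one whole file per read

        channels = ["Time" "PackVoltage" "PackCurrent" "StateOfCharge"];  % hypothetical names
        pds.SelectedVariableNames = channels;               % only these columns get read

        preview(pds)                                        % sanity check: a few rows, four columns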

    So one of the things I always like to do in the beginning is make a picture, and this is no different. In this particular example, this is the month of November. On the top plot I've got current, and on the bottom plot I've got voltage, and both of these are with respect to time on the x-axis. And as I look at the data, one of the first things I see is, oh, look, there's a big gap here.

    Now, this is a custom function I built. It just ends up calling tiledlayout and scatter, so there's no magic here. I wanted it to look a certain way, so I wrote a custom function that calls a couple of off-the-shelf plotting functions in MATLAB. And the reason I chose scatter and not just plot is because plot will automatically connect the lines for me.
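
    A minimal sketch of such a helper (the channel names are placeholders), using scatter so that gaps in time stay visible instead of being drawn through:

        % Minimal sketch: stacked current/voltage scatter plot for one timetable tt.
        function plotCurrentVoltage(tt)
            tiledlayout(2,1);
            nexttile
            scatter(tt.Time, tt.PackCurrent, 4, "filled");   % hypothetical channel name
            ylabel("Current (A)"); grid on
            nexttile
            scatter(tt.Time, tt.PackVoltage, 4, "filled");   % hypothetical channel name
            ylabel("Voltage (V)"); xlabel("Time"); grid on
        end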

    And if we zoom in here around this area, looking at the time data, I have a non-trivial chunk of time that I've got no data for. This is about three days-ish. So you have to ask yourself right away, was this just a power outage? Was this a holiday weekend? Was this an earthquake? I mean, what happened here? There's no data, and we don't really have any context.

    And if we didn't look at this and didn't understand that this was a big gap in time, we might be tempted to just take the signals and do some sort of linear interpolation. And if you just connect these dots over three days, you may have something totally different from whatever was going on at this time or at this time, with respect to rain or temperature or you name it. So you need to start to get a sense of what the data is telling you.

    Remember, we're trying to get towards this model, but before you get to modeling, you have to understand your data. And this is just a great way to do that and just exploratory plots in MATLAB and just kind of have an idea of what's going on here. So this is the first place that I start. And you use this just to try to start building up your understanding.

    So the model we talked about is getting at this idea of state of health. We want to try to represent state of health, so we chose a very simple, linear combination of resistance and capacity. I know there are more complex methods, but we chose a simple model first, with the idea being that it's probably fast-ish to get to an answer because it's linear, and hopefully easier to explain. And then we can always make it more complicated as we move on from there.

    So in order to do this, I talked to a couple of colleagues who are battery experts, and they helped instruct me on some of the calculations. We're going to look at delta V over delta I to compute what's called internal resistance, and then, to estimate the capacity, we're going to look at the integral of current over time and relate that back to state of charge.
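
    A minimal sketch of those two per-event calculations (the function below, its input conventions, and its variable names are assumptions for illustration, not the project's exact code):

        % Minimal sketch: per-event internal resistance and capacity estimate.
        % Inputs are the time (hours), current (A), voltage (V), and SOC (%) samples
        % within one quick-charge event.
        function [R_internal, C_est] = eventMetrics(t_hours, I, V, SOC)
            deltaV     = V(end) - V(1);             % voltage change across the event
            deltaI     = I(end) - I(1);             % current change across the event
            R_internal = deltaV / deltaI;           % ohms

            Ah_in    = trapz(t_hours, I);           % integral of current over the event (Ah)
            deltaSOC = SOC(end) - SOC(1);           % change in state of charge (% points)
            C_est    = Ah_in / (deltaSOC/100);      % estimated capacity (Ah)
        end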

    So that's where we're going to start as we try to implement this math on our data. In order to do this, the first thing we realized is that we needed to have what we would call well-behaved data or signals in order to perform calculations. If you have data that's very, very noisy, you don't know if your computations are correct or if they're just getting washed out in the noise.

    So here's an example of a couple of days of data. The purple represents the machine in use. So negative current is when the machine is actually being used and cycled. And then we have, up here, what we're calling a quick charge event.

    So this is when the machine gets parked, plugged in, perhaps over a lunch break, let's say, and there's some charging going on. Down here is a-- call this a constant power charging. It's more of a slower charge, typically at night. So when the day is done, park the machine, plug it in overnight.

    And we decided we needed to look at slices of data where the signal was relatively calm and didn't have a bunch of noise like this. So we chose what we're calling the quick power charge segments as the little time slices that we will apply our math to.

    So being able to label our data was key for this. We did the labeling with MATLAB; we just wrote a function. It looks at things like moving variance, a little bit of windowing, and a couple of thresholding features, and we just iterated until we got to a place where it looked consistent and made sense.

    And then, because I'm a huge fan of MATLAB tables, I use MATLAB tables for pretty much everything. For all the events, we had, in this case, three in the little plot that I showed you, and we captured the start and end time of each of the events and the total time. The reason this is important is that now, for each of these events, each of these time slices, we're going to add on columns that hold additional calculations within those time windows. And so we have a nice way to manage all of our findings in one MATLAB table.
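
    A minimal sketch of such a labeling function (the thresholds, window length, and channel names are placeholders; the real values came from iterating on the data):

        % Minimal sketch: find quick-charge events in one timetable tt and return
        % them as a table with start time, end time, and duration.
        function events = findQuickChargeEvents(tt)
            win    = minutes(5);
            isCalm = movvar(tt.PackCurrent, win, "SamplePoints", tt.Time) < 2.0;  % low variance
            isChg  = tt.PackCurrent > 20;            % sustained positive (charging) current
            mask   = isCalm & isChg;

            % Collapse the logical mask into start/end times of contiguous segments.
            d      = diff([0; double(mask); 0]);
            starts = tt.Time(d(1:end-1) ==  1);
            stops  = tt.Time(d(2:end)   == -1);
            events = table(starts, stops, stops - starts, ...
                           'VariableNames', ["StartTime" "EndTime" "TotalTime"]);
        end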

    So if we think about this computation, again, visually: if I take one time slice here and zoom in on it a little bit, the gray band is the event itself. So here we've got current, voltage, and state of charge; these are the logged signals. And this little gray band is one quick charge event, with the start and end of the quick charge event here.

    And so we looked at-- I talked about the internal resistance calculation before-- this here is delta I, this is delta V, and then we have our state of charge across the window we're looking at here. There are, I think, on the order of 300 or 400 events for the entire year and a half of data, and we're doing this exact same thing across each one of those events. I'm just showing you one to illustrate the idea.

    So when we do that, we take the integral of the current across that little gray band I talked about and look at delta SOC. Then we fit a line to that, and we use the slope of that line as our estimated capacity. Again, the linear assumption might be totally wrong, but it's at least objective, and we can move forward with this and iterate from here.
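
    A minimal sketch of that line fit, on made-up stand-in data (the real inputs would be the current and SOC samples inside one quick-charge event):

        % Minimal sketch: estimate capacity as the slope of charge throughput vs. SOC.
        t_hours  = linspace(0, 0.5, 200)';                       % half an hour of charging
        I_charge = 60*ones(size(t_hours));                       % constant 60 A charge current
        SOC      = 20 + 100*cumtrapz(t_hours, I_charge)/150;     % SOC (%) for a ~150 Ah pack

        Ah    = cumtrapz(t_hours, I_charge);                     % cumulative throughput (Ah)
        p     = polyfit(SOC/100, Ah, 1);                         % line fit: Ah vs. fractional SOC
        C_est = p(1)                                             % slope ~ estimated capacity (Ah)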

    So when we make these calculations, that event table I showed you in the beginning, now we're starting to add internal resistance for this event, state of charge for this event, and estimated capacity. We just keep moving, in order to keep things organized and get to the place where we can present our model with the data from this table.

    One other thing I like to do, maybe just by way of curiosity, is to understand, when we go into these quick charge events, what is the state of charge? I just want to understand, from a statistical perspective, what sort of trends or behaviors we are seeing. And what we see is that about 90% of the time, we're actually between 10% and 40% SOC.

    And the reason being, if we had a whole wide variation here, we would maybe have to have multiple models. We're not going to just assume there's one model for everything; maybe we need to understand this. So we're going to use just this 90%-of-observations band to further our calculations.

    And so we're basically just adding on to that table and saying, for each of the little events, each of the time slices, which SOC bin do you fall into? You can do additional reporting if you need to, if you need to report to-- I don't know-- a customer or a supplier. And I've got full traceability around how I broke the data down. This is just for completeness.
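
    A minimal sketch of that binning step (the bin edges, labels, and stand-in event table below are assumptions for illustration):

        % Minimal sketch: assign each event to a 10%-wide SOC bin.
        events = table([12.5; 27.0; 38.2], 'VariableNames', "SOCAtStart");   % SOC (%) at event start
        edges  = 0:10:100;                                                   % 10%-wide SOC bins
        labels = "SOC " + string(edges(1:end-1)) + "-" + string(edges(2:end)) + "%";
        events.SOCBin = discretize(events.SOCAtStart, edges, "categorical", labels)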

    One of the other things I was encouraged to look at is understanding how temperature plays a role in this overall work. This machine does operate outside, so what effect does ambient temperature have? We know that heat and cold do things to batteries, so how could we account for that?

    So we actually went out to NOAA, which is a public repository for weather data, and got the ambient temperature from the weather station closest to where this machine operates in the world. And then, using a MATLAB dictionary, we captured the date and the average daily temperature for each of the dates.

    We used a dictionary, which is a newer data type in MATLAB that really helps you relate keys and values, and it just made sense to use it here. So now I could take all of my data here and add the ambient temperature, in whatever units I need, for the date that the event happened. So I can do a lookup on the date and report the average temperature.
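
    A minimal sketch of that date-to-temperature lookup (dictionary requires R2022b or newer; the dates, temperatures, and table/variable names are made up for illustration):

        % Minimal sketch: look up average daily ambient temperature per event date.
        dates = datetime(2023,11,(14:16)');                  % daily keys
        tAvgC = [4.2; 3.1; 5.6];                             % average daily temperature, deg C
        tempLookup = dictionary(dates, tAvgC);

        % Stand-in event table: add the ambient temperature for each event's date.
        events = table(datetime(2023,11,[14;15;16],11,0,0), 'VariableNames', "StartTime");
        events.AmbientTempC = tempLookup(dateshift(events.StartTime, "start", "day"))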

    Now, putting this all together in context, the next visualization is looking at-- on the top, this is our estimated capacity, and then this is our internal resistance here on the bottom. The purple dots represent temperature; that's on the axis on this side. And then the bigger colored dots represent the SOC bin, whether it's 10% to 20% SOC and so on, because, again, we're thinking there might be different modalities here.

    Now, looking at this, we can say a couple of things. We start maybe with resistance. If we look when the temperature is low, the resistance is high, and when the temperature is high, the resistance is low. This is consistent with the physics of how we know it behaves. So it gives us a sense that our internal resistance calculations have merit. They might not be exactly correct, but at least they're directionally correct and not way off.

    For the capacity term we're trying to estimate, we do see some sort of seasonality here: the estimate moves along with, or is somehow correlated with, temperature. We don't know if that's totally correct. It seems like maybe it's not, but at this point we don't really know because, again, we don't have knowledge of how the battery management system behaves, how it calculates things, or even whether it has a temperature correction built into it. So this is one of those observations as we're walking through our data: what are we seeing, and what does that mean to us?

    So we're going to keep going for now and use this as our model, or as our inputs to our model. The last thing here, again just sanity checking, is looking at these internal resistance values. I take the log of the internal resistance versus 1 over temperature. And then the colored dots represent the different bins, again thinking that, perhaps, we can't just clump all this data together; maybe we need to tease it out into different groups.

    And what we learned from this is, by fitting lines to these individual SOC bins, we can see that the slopes are relatively the same. They are offset, of course, but the fact that the slopes are the same can give us confidence that we're on the right track, and we're going to attempt to use this for some sort of prediction.

    So we took each of these slopes, took the average of them, and called that the mean slope here. And then we're just using this as the next parameter into what's called the Arrhenius equation. It's going to help us, I guess, normalize our internal resistance back to a common reference temperature. We chose 23 degrees Celsius as the common point; we could have chosen anything, but we chose that because it is a rather common temperature to use.

    And then this gamma term here is what the average of the three slopes I showed you before turns out to be. When we put all this together and actually execute this equation against each of our little windows, this is our normalized internal resistance, and it can be compared to our directly estimated DCIR here.
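
    A minimal sketch of that normalization, assuming the usual Arrhenius-style form where log resistance varies linearly with 1/T in kelvin (the gamma value and measured numbers below are placeholders):

        % Minimal sketch: normalize a measured internal resistance to a 23 degC reference.
        gamma  = 3500;                        % mean slope of log(R) vs. 1/T, in kelvin
        T_ref  = 273.15 + 23;                 % common reference temperature (K)
        T_meas = 273.15 + 5;                  % ambient temperature when the event occurred (K)
        R_meas = 0.012;                       % internal resistance estimated at T_meas (ohm)

        % ln R = ln A + gamma/T  =>  R(T_ref) = R(T_meas) * exp(gamma*(1/T_ref - 1/T_meas))
        R_norm = R_meas * exp(gamma*(1/T_ref - 1/T_meas))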

    So again, this is the value of keeping everything all in one place. You can use this to very easily manage your data, then create additional plots or explain your science this way. And then, if we take a look at this box-and-whisker plot, this is looking at the distribution of these values over time. We do see that there's some sort of outlier business going on up here that probably would warrant further research, but the fact that it seems relatively well behaved gives us confidence that we're on the right track.

    And then, if we bring this all back together, all the way back to what we talked about with, hey, we want to get to this model: we had this initial simple, linear formula, and we're going to have our alpha and beta terms as our calibrated parameters. We're just setting those to equal weights right now. You can visualize our results like this. So we're looking at-- this is all of the data now for the entire year and a half.

    And we've got estimated state of health over here on the y-axis, and we've got time on the x-axis. The colored dots are just temperature. So again, we can try to understand, is there some sort of temperature dependence here? And we can see that we're able to predict or somehow map out state of health versus time for a given ambient temperature.
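
    The webinar does not spell out the exact form of that weighted combination, but a minimal sketch under the assumption that capacity and resistance enter as ratios against reference values could look like this (all numbers and the ratio form itself are assumptions for illustration):

        % Minimal sketch: simple linear state-of-health combination with equal weights.
        alpha = 0.5;  beta = 0.5;             % equal weights, to be calibrated later
        C_ref = 150;  R_ref = 0.0056;         % hypothetical beginning-of-life reference values
        C_est = 148;  R_norm = 0.0061;        % per-event estimates from the steps above (made up)

        capacityTerm   = C_est / C_ref;       % capacity relative to reference
        resistanceTerm = R_ref / R_norm;      % resistance growth penalizes health
        SOH = 100 * (alpha*capacityTerm + beta*resistanceTerm)   % state of health, %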

    So I think it was George Box, the famous statistician, who said, "All models are wrong, but some are useful." And I love that quote because I think it's really true. And you can see our model here does, in fact, suggest that we have over 100% state of health, which obviously can't happen. So this tells us that we at least have a baseline and a workflow established, but we have some work to do on the model accuracy and making sure that it's predicting realistic values.

    And so hopefully, you've seen everything from end to end here, from the raw data coming off the CANedge device into an S3 bucket, going through that data engineering pipeline, getting us to an analysis friendly format, and then working through some of the visualization and understanding aspects of the data over time, using data stores to work with very large collections of files, and then ultimately, distilling this down into what are essentially events, and then, within those events, applying specific calculations and then aggregating those over time as well.

    So that's it for the example. And I think I'll hop back to the slides real quick and just wrap it up with, if you'd like to learn more, please visit either the CSS website or MathWorks. And Martin, did you have any closing comments?

    No. Thanks for sharing the insights into your application as well. And as you said, if anybody is interested in learning more on the hardware side and integration with MATLAB using the CANedge, feel free to contact us, and otherwise contact MathWorks for further details.

    All right, thank you very much.