I am new to Git. I mainly use Git to publish releases of MATLAB packages. The packages consist of a core with some functions and folders. In addition to that I have support folders and files (with examples, documentation and some test scripts).
Before using Git I would work on a new release by copying only the core and some test scripts to a new folder. I would then update the core and the test scripts until everything worked. Then I would update the support folders and files so that they were consistent with the core, and publish the release.
My question is: how would you organize this workflow with Git?
Intuitively, I would basically keep doing the same: make a new branch, say "update_main", and copy only the core and the test scripts into that branch. Make that branch work, and then update and add the support files to that branch. Finally, merge (or simply overwrite) "main" with "update_main" and publish the new release.
This procedure ensures that the core and support files are consistent within each branch, which seems kind of important to me.
However, my suggested approach implies that I would hardly use Git for the main process, since I could just as well make a new directory, and build the new release there. Git would then only help with merging code (for example when I work on two updates at the same time).
One disadvantage of using branches in this case would be that they are not directly visible as directories, so I am not even sure that using Git branches has many benefits compared to my old approach, where I would build a new release in a new directory.
Does this make sense? Does anybody have any thoughts or advice on this?

2 comments

Ilya Gurin on 6 Jan 2022
First, welcome to Git! It's my all-time favorite productivity tool.
I'm not sure what exactly you're trying to do with your "packages." How many packages are we talking about? Do they have a common "core", with other elements that are unique to each package?
Sargondjani on 6 Jan 2022
@Ilya Gurin yeah, so far I love Git too! Just trying to get the hang of efficient workflows...
What I call a "package" is basically a toolbox (core) plus support files. So we may basically assume one package = one toolbox = one repository. Everything is unique for each toolbox (no overlap), so we can discuss as if there is only one toolbox.

Accepted Answer

Benjamin Kraus on 7 Jan 2022
Edited: Benjamin Kraus on 7 Jan 2022

Welcome to the world of version control. Git is an amazing and powerful tool, but it can take some getting used to.
I would suggest you take a step back from the specifics of your project (or even MATLAB) and just start with some Git (or even version control) basics.
The first basic is that if you find yourself copying an entire folder for a release, you are probably not using Git to its full advantage. If you are a developer working alone on a project, you can still benefit heavily from Git without ever branching or copying your folder. I assume you've done this already, but start by creating a Git repository in your code directory, then checking in all the files. My suggestion would be to start with the latest release, check that code into Git, then immediately tag the current state (using git tag). By tagging this state, you can always restore your working directory to that state (using git checkout). There is no need to manually make a copy of the folder. As long as you've committed any changes into Git, you can always use git checkout to switch to any other version of your files, all within the same directory.
As you work, whenever you complete a small chunk of work (it is up to you to decide what "small" and "chunk of work" mean), check that into Git. Every time you call "git commit", the code you've submitted is given a unique label (the commit hash), allowing you to restore that state. "git tag" is just a way to give a commit a friendly name (rather than the long, automatically generated hash). Once you are ready for a new release, use git tag again to name that specific version of the code.
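As a rough illustration of the commit, tag, and checkout cycle described above, here is a minimal sketch. The temporary directory, the file name core.m, the tag name v1.0, and the placeholder identity are all assumptions made up for the example, not part of the original advice.

```shell
set -e
repo=$(mktemp -d)                      # hypothetical stand-in for your code directory
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity for the demo
git config user.email "author@example.com"

# Check in the latest release and tag it right away
echo "function y = core(x), y = x; end" > core.m
git add core.m
git commit -q -m "Import release 1.0"
git tag v1.0                           # friendly name for this commit

# Keep working and commit a small chunk of work
echo "function y = core(x), y = 2*x; end" > core.m
git commit -q -am "Start work on the next release"

# Restore the tagged release in the same directory, then come back
git checkout -q v1.0
restored=$(cat core.m)
git checkout -q -                      # "-" returns to the previously checked-out branch
current=$(cat core.m)
```

After the first checkout the working directory holds the 1.0 file contents again; after the second it holds the newer commit, all without copying the folder.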
The only reason to have two separate folders with two different copies of your code is if you want to be able to run two versions of your code at the same time (or perhaps open them side-by-side, but there are ways to do that in Git as well). However, once your code is in Git, you shouldn't be copy/pasting an entire folder any more. You should be using "git clone", "git push", and "git pull" (and "git fetch") to create a clone of one directory in another directory and keep the two synchronized. This isn't required, but it will work best if you pick a hosting service (like GitHub or GitLab), so that each copy of your code can synchronize with that server.
Once you've got those basics down, there are a few reasons to branch, such as:
  1. You want to work on two independent features. You can create a branch for each feature, and then when the feature is done you can merge it back into the main branch.
  2. You want to apply bug fixes to a past release, without incorporating all the new features into the past release. You create a branch based on the release (you can create a new branch from a git tag), and apply the fix to just that branch.
  3. You are working in a team.
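The first two reasons can be sketched roughly as follows. The branch names feature-a and hotfix-1.0, the tag names, and the file names are hypothetical; the default branch name is read from Git rather than assumed, because it may be "main" or "master" depending on configuration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity for the demo
git config user.email "author@example.com"

echo "v1" > core.m
git add core.m
git commit -q -m "Release 1.0"
git tag v1.0
main_branch=$(git symbolic-ref --short HEAD) # "main" or "master"

echo "work in progress" > new_feature.m
git add new_feature.m
git commit -q -m "Begin next release"

# Reason 1: develop a feature on its own branch, then merge it back
git checkout -q -b feature-a
echo "feature A" > feature_a.m
git add feature_a.m
git commit -q -m "Add feature A"
git checkout -q "$main_branch"
git merge -q --no-edit feature-a

# Reason 2: fix a bug in a past release without the newer features
git checkout -q -b hotfix-1.0 v1.0           # new branch starting at the tag
echo "v1 with fix" > core.m
git commit -q -am "Fix bug in release 1.0"
git tag v1.0.1
```

The hotfix branch contains only the 1.0 files plus the fix, while the main branch keeps the newer work and the merged feature.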
When you get to that point, you may want to look at some online articles and tutorials regarding different branching models for Git. There are a ton of different articles (and opinions) on this topic. I did some very quick Google searching using search terms like "git branching models" or "git branching strategies" (and I am not endorsing any of these specific models), but to give you some specific examples of what I mean, here are some links:
This isn't required, but my personal recommendation would be to learn the command line versions of all of the above first, and get a good solid understanding of how Git works and what it means to commit, branch, tag, merge, rebase, push, and pull. Once you've done that, you can start leveraging the tools built into MATLAB to make your life easier, but it will be easier to understand those tools if you've learned the command line versions.
I hope that helps get you started!

15 comments

Sargondjani on 7 Jan 2022
@Benjamin Kraus thanks a lot! That's very useful advice! Especially the tagging thing. I am already familiar with the most common commands using some test repo (this video was very useful: https://youtu.be/RGOj5yH7evk)
But I still have two issues:
Issue 1: testing two versions of my code at the same time
The crucial thing is this:
"The only reason to have two separate folders with two different copies of your code is if you want to be able to run two versions of your code at the same time (or perhaps open them side-by-side, but there are ways to do that in git as well)."
This is exactly the reason why I would work with two directories. I want to compare two versions of my code. I saw MATLAB has the possibility to switch between Git branches, but I have the feeling it's not so easy to use for my purpose.
Let's say I have a project that uses my toolbox/repository, and I want to compare the results with two different versions of the toolbox. If I use two directories, inside the project code I can simply select which of the two versions to use by selecting the appropriate directory:
if strcmp(opt_version,'AA')
addpath C:\repo_versionA
elseif strcmp(opt_version,'BB')
addpath C:\repo_versionB
end
Also, I can have the two versions open next to each other in the Matlab code editor.
What would be the alternative way to do this with proper Git use?
Issue 2: Consistency of examples with latest version
I have quite a few examples in the repository. If I work on a new version, I want to be able to keep track of the examples that have been adjusted to the new version and which not. So the safest way to do that is to simply delete all examples, and add them only after updating them to the new version.
Is there an alternative to this? Such that I can still easily see which files have been adjusted and which not?
Benjamin Kraus on 7 Jan 2022
Issue 1:
If your goal is to just be able to arbitrarily pick which version to run, then you could do something like this:
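repoDir = 'C:\codeRepo';
addpath(repoDir)
% Check out the requested tag, branch, or commit inside the repository folder
cmd = sprintf('git -C "%s" checkout %s', repoDir, opt_version);
status = system(cmd);
assert(status == 0, 'git checkout failed')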
This just checks-out from git whatever version of the code you want to run, but only one directory is added to the path. Note this only works if you don't have to compile anything (and nothing is P-coded).
However, I agree, there are many reasons you might want to have two directories with two different revisions of your code. The complication is keeping the two directories in sync with one another: not just the code itself, but the underlying repositories. With Git, every single copy of your repository has a copy of the entire Git repository and history, so if you have two directories with code from your repository you need to do something to keep the two copies in sync with each other.
If one of the two directories is static (for example, a past release that you are not changing any more), then I guess in this case the simplest approach would be to just copy the directory. At that point it may be considered best practice to delete the ".git" directory, to prevent you from accidentally committing new changes into that directory (and effectively making independent repositories that can now drift from each other). You could also look into git commands to do this for you (I've never used it, but "git archive" looks promising).
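For the static-copy case, a minimal sketch using "git archive" might look like this. The tag name v1.0, the file name core.m, and both temporary directories are assumptions for the example; the point is only that the export contains the files of the tagged release and no ".git" directory.

```shell
set -e
repo=$(mktemp -d)                      # hypothetical working repository
snapshot=$(mktemp -d)                  # hypothetical folder for the static copy
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity for the demo
git config user.email "author@example.com"

echo "release 1.0" > core.m
git add core.m
git commit -q -m "Release 1.0"
git tag v1.0

echo "development" > core.m
git commit -q -am "Ongoing work"

# Export the tagged release as plain files, without any Git history
git archive --format=tar v1.0 | tar -x -C "$snapshot"
```

Because the exported folder has no repository of its own, there is nothing in it that could be committed to by accident.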
If you plan to actively develop in both directories, then you want to take some action to keep the Git histories in sync between the two copies. Ironically, this most likely involves a third copy of the repository. This third copy will be something called a "bare repository", which can live on your local computer but much more commonly lives on a service such as GitHub or GitLab. The first hit I found on Google for bare repositories is this: https://www.saintsjd.com/2011/01/what-is-a-bare-git-repository/
Basically, the idea is that you have two "working directories" and a single central repository. Each working directory has a copy of the repository, and when you use "git push" and "git pull" you synchronize the repository (Git history) in each working directory with the central repository. The basic process is:
  1. You start with a local git repository that exists exclusively in your working directory.
  2. You create a "bare repository", either in a separate folder, or on a hosting service like GitHub. This repository starts with no history and no files.
  3. You link your working directory with the bare repository using "git remote". Note that when you first create a new GitHub (or GitLab) repository, it prominently shows you exactly what code to run to complete this step, customized for your specific repository.
  4. You "git push" from your local repository into the new central repository.
  5. You use "git clone" to create the second working directory. When you use "git clone" the repository is already linked to the central repository that you cloned from.
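The five steps above can be sketched with a local bare repository standing in for GitHub or GitLab. The directory names workA, workB, and central.git, and the remote name origin, are assumptions chosen for the example.

```shell
set -e
base=$(mktemp -d)
cd "$base"

# Step 1: a local repository that exists only in the working directory
git init -q workA
cd workA
git config user.name "Example Author"        # placeholder identity for the demo
git config user.email "author@example.com"
echo "core" > core.m
git add core.m
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)      # "main" or "master"

# Step 2: an empty bare repository (no history, no working files)
git init -q --bare "$base/central.git"

# Step 3: link the working directory to the bare repository
git remote add origin "$base/central.git"

# Step 4: push the local history into the central repository
git push -q -u origin "$branch"

# Step 5: create the second working directory, already linked to the central one
git clone -q "$base/central.git" "$base/workB"

# Synchronize: commit in workB, push, then pull in workA
cd "$base/workB"
git config user.name "Example Author"
git config user.email "author@example.com"
echo "edited in B" >> core.m
git commit -q -am "Change made in workB"
git push -q
cd "$base/workA"
git pull -q --ff-only
```

With a hosting service, only the path in steps 2 and 3 changes: the bare repository is created on the server and the remote URL points there instead of a local folder.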
I think technically you can do this with just two working directories (synchronizing between each other), but it is much more common to use the third bare repository. This process would be something like this:
  1. You start with a local git repository that exists exclusively in your working directory.
  2. You then run "git clone" to create a new repository based on the working directory.
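A sketch of this two-directory variant, again with made-up directory names (workA, workB). Here the second directory pulls from the first; note that by default Git refuses a push into the checked-out branch of a non-bare repository, which is one practical reason the bare repository setup is more common.

```shell
set -e
base=$(mktemp -d)
cd "$base"

# Step 1: the original working directory with its own repository
git init -q workA
cd workA
git config user.name "Example Author"        # placeholder identity for the demo
git config user.email "author@example.com"
echo "core" > core.m
git add core.m
git commit -q -m "Initial commit"

# Step 2: clone it directly; workB gets workA as its "origin" remote
git clone -q "$base/workA" "$base/workB"

# Later changes in workA are fetched into workB with a pull
echo "changed in A" >> core.m
git commit -q -am "Change made in workA"
cd "$base/workB"
git pull -q --ff-only
```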
Issue 2:
Deleting the examples may be a bit extreme. The examples should be checked-in to the same repository with the code, and then you can track changes to the code along with changes to the examples.
I think if you were going to follow a formal code development process, this is basically how you should do it:
  • Every example should have an associated test that runs the example against the current code base. You can take a look at the MATLAB Unit Test Framework for how to write MATLAB based tests.
  • When you start working on a new feature (not a new release, but a new feature) you create a new git branch (a "feature branch") for that new feature. A feature can be very small (one new input argument) or large (completely rewriting an entire function), that part is up to you.
  • As you develop the feature, you run the tests before each commit. If any of them fail, you update the corresponding example immediately, so that each commit has examples that are consistent with the current code base.
  • Once the feature is done being developed, you can add any additional examples (and any corresponding tests) and commit those to the feature branch.
  • Once you are satisfied with the feature, you merge that feature branch back into the main branch.
Technically speaking, a very formal code development process will include something called "test driven development"... in other words, you update the examples and write the tests first, then you update the code until the tests pass.
The nice thing about this process is that when you are done with the feature, the examples are already done (and tested). And... be honest, how often have you discovered bugs in your code while writing examples? This process helps you fix those bugs while the code for the new feature is fresh in your memory, and prevent regressions. However, it can be a lot for a small project being developed by one person, so you can choose how much of this process to follow.
Perhaps a more casual variant of this approach would be to create a branch, develop several features, checking them in along the way, then just remember that before you merge that branch back into the main code you need to update the examples, then you just update the examples and commit those as a separate step.
Ilya Gurin on 7 Jan 2022
Once your code is committed, you should never copy the directory. If you find yourself tempted to copy the directory, stop and figure out if there's a way to solve your problem within Git. The only exception to this is if you think Git is malfunctioning and you want to debug (it's happened to me before).
Also, the distinction between "core", "support", etc. isn't relevant to version control. Consider them all project files. The idea is that another user should be able to clone the repo and run your project. You can impose any folder structure you want and Git will be fine. (It gets a little tricky if you move files later, but if you already have an established folder structure, you shouldn't have to worry about this.)
Sargondjani on 7 Jan 2022
Thanks again @Benjamin Kraus and @Ilya Gurin. I understand a lot better now what the normal approach is. And especially the code to choose a checkout/branch seems very useful!
One last question with respect to Issue 2: is there a way to "flag" files that still need updating? So in my case, can I flag all example files, so I can easily see which files still need updating to the new feature?
Sargondjani on 7 Jan 2022
Edited: Sargondjani on 7 Jan 2022
"Also, the distinction between "core", "support", etc. isn't relevant to version control. Consider them all project files. The idea is that another user should be able to clone the repo and run your project."
That is exactly why I would delete all the examples as long as they are not updated to the new feature. So I guess the standard approach would be to use a branch, and only merge it with the main once all the examples and documentation are up to date as well.
The thing is I am running this show alone, so there can be a huge time gap (months) between developing a new feature, which I already use for my own projects, and updating all the examples and the documentation, because that is a lot of work (say a full week of work, and it is not part of my main job).
Benjamin Kraus on 7 Jan 2022
I'm not aware of any mechanism built-in to git to flag a file as still needing to be updated.
One really simple idea would be to just add the following line of code to the top of every example:
error('This example has not been updated to reflect the new release yet.')
You can check-in that change instead of deleting the examples, so that the examples remain in the version control, but you have a clear way to search for and track the examples that have not been updated. Then, as you update examples, you can just delete that error line. Of course, you can use disp instead of error, or any other way of signaling the example is outdated.
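To list the examples that still carry the marker, one option is "git grep", which searches the tracked files. The sketch below assumes an examples folder with three made-up example files and reuses the error line from above as the marker.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # placeholder identity for the demo
git config user.email "author@example.com"

# Flag every example as outdated (hypothetical file names)
mkdir examples
marker="error('This example has not been updated to reflect the new release yet.')"
for name in example1 example2 example3; do
    printf '%s\n%s\n' "$marker" "disp('$name')" > "examples/$name.m"
done
git add examples
git commit -q -m "Flag all examples as outdated"

# Update one example: remove the marker line and commit
printf '%s\n' "disp('example2')" > examples/example2.m
git commit -q -am "Update example2 for the new release"

# List the tracked example files that still contain the marker
outdated=$(git grep -l -F "has not been updated" -- examples)
echo "$outdated"
```

Only example1.m and example3.m are listed, because example2.m no longer contains the marker line.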
Maybe another option that is less invasive, and perhaps even more useful: at the start or end of each example, you have a statement to the effect of "This example was last updated for release number X" or "This example was last updated on December 24, 2021 for release number Y".
This provides your end-users with a clear statement that the example may be outdated, and also serves as a reference for you, so you can quickly see which examples have not been updated.
There are ways to automate that with Git (i.e. you can have Git automatically update the date in the file whenever the file is modified), but that is probably more work than is necessary for your purposes.
If you went with the test based approach, there is a mechanism built-in to the testing framework for marking a test as not finished (or "expected to fail"). In this case, you would have a test file that runs all your examples, then instead of the error statement in the example, you would add this line of code to your test:
testCase.assumeFail('This example has not been updated to reflect the new release yet.')
This signals to the testing framework not to bother running this test because you know it will fail, but the test code remains, and the testing framework tracks "passed", "failed", and "filtered" tests, so you can keep track of tests (and therefore examples) that still need to be updated.
Sargondjani on 7 Jan 2022
Thanks again for taking your time to think with me!!
Yeah, the error note would also be an option. In fact, this made me realise one important aspect: I actually only want my public repo to show the latest release, which is completely customer-ready. So basically, I only want the public repo to show releases that have been tested and completely updated.
So I should probably do all the development in a private repo?? That would mean I need an extra repo, where I do the development, and then when it's ready for public release 'copy' the thing to the public repo.
Is that also a commonly used approach? I guess commercial software would use something like that as well? Or do they just have private branches?
Benjamin Kraus on 7 Jan 2022
The common approach is to do all development on the same repository, and then tag releases. There are built-in mechanisms on GitHub to do release tracking, so users can always download the latest release, even if the code has been updated since then. Most projects on GitHub have the latest and greatest "bleeding edge" available with all the latest commits, but it will also have a tag with the latest release so you can download the version that is stable.
The important thing to note about git is that every copy of your repository is technically a different repository, and the repositories are kept in-sync using "git push" and "git pull". If you do all your work locally and never call "git push", the remote repository will never know.
Further, technically, each copy of the repository has its own branches, but when you create branches you can tell Git whether to mirror the branches across repositories. By default branches are not mirrored, unless you specifically tell it to push the branch to the remote server.
In addition, once you call "git push", what your end-users will see depends on your merging strategy: you can merge and preserve the full history, you can "rebase", which replays your commits on top of the target branch to give a linear history, or you can "squash", which collapses all revisions into a single commit. Do a search for "merge vs. rebase" and I'm sure you can find some articles on the topic.
So, for instance, if your server (such as GitHub) has a repository, and you are doing local development, you may have dozens of local branches, but they will never be visible unless you push them to the server. You can accomplish your goal of hiding intermediate states by having release branches that are mirrored on the server and development branches that are only local, and by always squashing when merging from development branches into your release branch (which will "collapse" the history into a single commit).
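A rough sketch of that setup, assuming a squash merge ("git merge --squash") as the way to collapse the development history into one commit, with a local bare repository standing in for the server. The branch name dev, the directory names, and the commit messages are made up for the example.

```shell
set -e
base=$(mktemp -d)
cd "$base"
git init -q --bare public.git                # stand-in for the server repository
git init -q work
cd work
git config user.name "Example Author"        # placeholder identity for the demo
git config user.email "author@example.com"

echo "release 1" > core.m
git add core.m
git commit -q -m "Release 1"
release=$(git symbolic-ref --short HEAD)     # the branch that is mirrored on the server
git remote add origin "$base/public.git"
git push -q -u origin "$release"

# Development branch: never pushed, so it exists only locally
git checkout -q -b dev
echo "step 1" >> core.m
git commit -q -am "WIP 1"
echo "step 2" >> core.m
git commit -q -am "WIP 2"

# Collapse the development commits into one commit on the release branch
git checkout -q "$release"
git merge -q --squash dev
git commit -q -m "Release 2"
git push -q origin "$release"

remote_branches=$(git ls-remote --heads origin | wc -l)
remote_commits=$(git --git-dir="$base/public.git" rev-list --count "$release")
```

The server ends up with one branch and two commits ("Release 1" and "Release 2"); the intermediate "WIP" commits stay in the local dev branch.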
Benjamin Kraus on 7 Jan 2022
"So I should probably do all the development in a private repo?? That would mean I need an extra repo, where I do the development, and then when it's ready for public release 'copy' the thing to the public repo."
Just to reiterate... if you find yourself copying a repo, you are probably not leveraging Git to its full extent.
And, your repository on your local machine is already a private "clone" (which is the git way to copy a repository) of the public repository, so there is no need to create another "private" copy.
Sargondjani on 7 Jan 2022
Ok, got it. Thanks again for the advice!
Sargondjani on 8 Jan 2022
Tagging is one important feature I didn't fully understand before. When you create a tag (and push it to the remote), you can download a zip file with the whole repo at that point. This makes things a lot easier!! So now I'm convinced to put all files in my private repo.
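Creating a tag and pushing it could look roughly like this. Tags are not sent by a plain branch push, so the tag is pushed by name. The tag name v1.0, the remote name origin, and the local bare repository standing in for the hosting service are assumptions for the example.

```shell
set -e
base=$(mktemp -d)
cd "$base"
git init -q --bare remote.git                # stand-in for the hosted repository
git init -q work
cd work
git config user.name "Example Author"        # placeholder identity for the demo
git config user.email "author@example.com"

echo "release" > core.m
git add core.m
git commit -q -m "Release 1.0"
git remote add origin "$base/remote.git"
git push -q -u origin HEAD                   # push the current branch

# Create an annotated tag and push it explicitly
git tag -a v1.0 -m "Release 1.0"
git push -q origin v1.0

remote_tags=$(git ls-remote --tags origin)
```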
But I am also sure I don't want all files and folders to be in the public repo (because I want the public repo to be as clean as possible, and only include relevant things).
Is there an easy way to only take certain files from my private repo, and put them in a public repo? (Or possibly even take files/folder from several repos?).
I guess I could clone the private repo, and then use gitignore to upload to the public repo, but that's probably not the most elegant way to do it.
Sargondjani on 8 Jan 2022
Again, thanks a lot to both of you for pushing me to include all files in GitHub. I have done it, and worked the whole day on updating the code, and it works very well indeed!! I also got the hang of testing with different branches.
Because I include all files I have to work with two repos though, because not all things can be public, plus I would like my public repo to be as clean as possible.
So if anyone has any ideas on how to take only certain files/folders from my private repo, and put them in a public repo, I would be very happy. I am going to research it a bit more anyway, because I believe more people will do this.
Ilya Gurin on 10 Jan 2022
I've put some effort over the years into developing a test infrastructure. I don't use MATLAB's built-in test features; instead I rolled my own code that executes test cases and counts failures. Works for me, but YMMV. The important thing for this discussion is that I have a function called run_tests that automatically finds all the test cases and runs them in sequence. When I'm disciplined, I make sure to run run_tests before making a release. If you have sufficient tests to cover each new or modified feature, you'll know right away when everything is synced up. (This approach also works if you drop in error statements when you know your "examples" are going out of date.)
Sargondjani on 12 Jan 2022
Yeah, it's good you mention this testing thing. I also automated my testing a lot more. It helps a lot.
Bogdan Bodnarescu on 14 Jun 2022
Edited: Bogdan Bodnarescu on 14 Jun 2022
Is there a similar topic for a Simulink workflow with Git?
Until now I couldn't find a way to merge Simulink models that are modified on different branches. Merging the generated code files, the .mat files, and the other file types that Embedded Coder generates is also a nightmare.
For a simple branch merge it takes me around 1 day of work to resolve all the conflicts, and even after so much work I can still find bugs introduced by the automatic merging of the Simulink models.
Is there any way to tell Git not to do anything with some specific file types? I saw that if I configure a MATLAB Project then a .gitattributes file is generated, but the automatic merge is still triggered when using meld or modelmerge from MATLAB, so this doesn't seem to work either.
Should I make a new topic for Simulink and Git only?

Asked: 6 Jan 2022
Edited: 14 Jun 2022