Why you should use Git command line


I’m writing this for developers, testers, UX designers or anyone who interacts with source code stored in Git but hasn’t yet used the Git command line, relying instead on a GUI or integration (Tortoise, SourceTree, IntelliJ/VS integration etc.).

The Git command line is the low level program that interacts with Git repos in a terminal or PowerShell window. I know that using the terminal and typing commands is a big step for many inexperienced users, but please stick with me and hopefully I’ll convince you.

Why you should use Git command line

You will know and understand what you are doing

Most people I’ve known who are starting with Git begin with a GUI tool, something to handle Git for them. While understandable, this is a mistake, as you never see what the tool is doing or learn how Git works.

GUI tools use the same Git commands under the hood; they just hide them from you. Whether you are learning source control from scratch or are just new to Git, it will help you in the long run to understand what the basic commands are and what they do.

You will know exactly what you are changing

Source control GUI tools hide some of the complexity of using Git, but in doing so they also hide what they are actually doing. This can be as simple as pre-selecting the list of modified files for a commit, or as complex as handling a merge for you. Either way, you no longer know exactly what the tool is doing and changing.

With the Git command line you are forced to declare exactly which files you are changing, and you can see exactly how they have changed.

git status and git diff, how I love you.

It’s not that complicated

My normal day to day use of Git only needs seven commands: pull, checkout, status, diff, add, commit and push.
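A typical day with just those commands looks something like this (the branch name, file path and commit message are only placeholders):

# get the latest changes from the remote
git pull

# switch to the branch you are working on
git checkout my-feature-branch

# see which files you have changed
git status

# see exactly what changed, line by line
git diff

# stage the files you want in the commit
git add src/app.js

# record the change with a message
git commit -m "Add validation to the signup form"

# share the commit with everyone else
git push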

Truthfully, for anything more complicated I just search for it. I haven’t memorised much else and you shouldn’t have to either.

Command line is universal

The Git command line is the same on every machine and in every environment. Learn it once and you won’t have to learn it again, whether you switch editor, change language or move from Windows to Mac. The commands won’t change.

Different GUI tools use different UIs, different names for the same actions, and can even apply different low level operations for an action, e.g. automatically pulling or rebasing before a push.

Even authentication is standardised: you can set up an SSH key so you don’t need to keep entering a username/password for every operation.
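Setting up a key is a one-off job; for example (the email address is a placeholder, and the last command assumes GitHub as the host):

# generate a new key pair, accepting the default file location
ssh-keygen -t ed25519 -C "you@example.com"

# print the public key so you can paste it into your Git host's SSH settings
cat ~/.ssh/id_ed25519.pub

# test the connection once the key is registered
ssh -T git@github.com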

How to learn Git command line

It’s easier than ever to pick up and learn Git; here are a few resources to help you start:

Conclusion

I hope this has convinced you to give the Git command line a try. If not, at least you will understand why I sigh when I see you trying to fix a Git issue with your mouse.

Using private Nuget packages hosted in VSTS

Writing this as a quick guide to using your own private Nuget packages hosted in a private feed in VSTS for dotnet core. There is official documentation but I encountered enough issues that I think it’s worth documenting.

1. Install Package Manager into your VSTS

You must install the Package Manager extension into VSTS. There’s a 30 day free trial, after which you must pay.

Set up your private feed; this will be used to publish your private packages to authenticated VSTS users who need them in Visual Studio and in the VSTS builds.

2. Create your Class Library needed as a package

Create the Class Library project in Visual Studio which you need packaged.

NOTE: As of writing, in dotnet core you must create the project as a Console App and change it to a class library in project properties -> Application -> Output type, due to issues with the templates.

In project properties -> Package, set up the package metadata. Do not check “Generate Nuget package on build”; the version number will be overridden in the VSTS build definition.


3. Setup VSTS Build definition for Nuget Package

Add a new build definition for your Class Library repo/project, based on the ASP.NET Core template (which sets defaults for the project paths and build version number), then adjust the steps; rough command line equivalents are sketched after the list.

  1. Remove the Publish steps.
  2. Replace dotnet restore with a Nuget task restore (to allow using your own feed).
  3. Use a dotnet pack task to build the Nuget package with an explicit version number based on the build.
  4. Use a Nuget task to push the built package to the private feed.
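Ignoring feed authentication (which the VSTS tasks handle for you), those tasks correspond roughly to commands you could run locally; something like this, where the solution, project, feed name and version number are all illustrative:

# restore using the Nuget.config so the private feed is included
nuget restore MySolution.sln -ConfigFile Nuget.config

# pack the class library with an explicit version number from the build
dotnet pack MyLibrary/MyLibrary.csproj --configuration Release /p:PackageVersion=1.0.42 --output ./artifacts

# push the built package to the private feed
nuget push ./artifacts/MyLibrary.1.0.42.nupkg -Source "MyPrivateFeed" -ApiKey VSTS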

4. Reference your private Nuget package in another project

Add a Nuget.config file to the project that needs the private package dependency, pointing it at the private Package Manager feed.


You can also add the feed to your global Visual Studio Nuget.config, but keeping it in the project makes the feed explicit for others using the same source. You will be prompted to authenticate with VSTS the first time you build to resolve the dependency from the feed.
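If you prefer the global option, the feed can be registered from the command line instead (the feed name and URL below are illustrative, not a real feed):

# add the private VSTS feed to your user-level Nuget configuration
nuget sources add -Name "MyPrivateFeed" -Source "https://myaccount.pkgs.visualstudio.com/_packaging/MyFeed/nuget/v3/index.json"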

In the build definition for the project add a Nuget Restore step which references your private feed (standard dotnet restore will not pick up the Nuget.config or authenticate with the private feed).


Tricks and traps (as of writing 2017/07/25):

  • Standard agent queue “Hosted” does not support dotnet core; use “Hosted Visual Studio 2017” instead
  • dotnet restore does not support using a Nuget.config or authenticating with a private feed; use the Nuget restore task instead
  • Nuget pack does not support dotnet core packages; use dotnet pack with the explicit version option instead

Getting word count from template files

Recently I had to get an approximate word count of an entire site for estimating translation time. To do this I processed the template files with find and sed to strip out the HTML tags and template logic, then counted the remaining words using wc.

# from views directory

# create .out files with HTML tags stripped
find . -name '*.html' -exec sh -c 'sed -e "s/<[^>]*>//g" "$1" > "$1.out"' -- {} \;

# create .out.bout files with nunjucks/jinja2 tags stripped
find . -name '*.html.out' -exec sh -c 'sed -e "s/{[^}]*}//g" "$1" > "$1.bout"' -- {} \;

# word count
find . -name '*.html.out.bout' | xargs wc -w
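If you don’t need the per-file counts or the intermediate files, the same total can be had in one pass (using the same tag patterns as above):

# strip HTML and template tags in one pipeline and count the remaining words
find . -name '*.html' -exec cat {} + | sed -e 's/<[^>]*>//g' -e 's/{[^}]*}//g' | wc -w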


Session data is the manageable devil you know


Last year I wrote a bit of a rant post “Session data is evil” coming out of some projects which suffered from session related problems. Time and some experience trying to avoid sessions have softened my opinion, so I thought I would write a counter-point to that post.

It’s extremely hard to avoid state

People think in states. They naturally work incrementally, adding a little here, editing or removing a little there, not in large atomic chunks. This means they don’t like large, complex forms that require everything to be entered or edited at once. They also expect the little things that require state, like remembered preferences and knowing where they were in an application process. While it is possible to break down your application and avoid state, doing so pushes the complexity into your persistence and routing instead.

Session data is the most straightforward way to deal with these hidden requirements.

Dealing with sessions is a known problem

Sessions may be tricky, but there are decades of experience in dealing with them. Most web servers and frameworks have explicit tools and best practices for using them, supporting sticky sessions and external session state for multiple web servers.

Using sensible approaches, it is entirely possible to scale and handle sessions correctly.

Over engineering causes worse problems

If, in an effort to remove sessions, you add tons of persistence and routing complexity, you are just adding more code and more places for things to go wrong. A simple session-based approach is easier to maintain and understand than an over-complicated stateless one that makes explicit calls on every request. It will work fine as long as you use sessions sensibly, encapsulate them, and understand their limitations and how they will behave in production conditions.

Large scale and PaaS applications may have to be much more careful, but there are still ways to work with sessions for them.

Conclusion

Don’t abandon sessions out of fear or fashion; they are a simple and extremely common approach to managing state in a world that demands it. Just don’t shove everything in there…

The symmetrical architecture trap

Often when thinking about topics to write about I hesitate, as in retrospect what I’m saying seems obvious. But it’s very common to fall into simple patterns when you are in the thick of a project, doing something whose flaws only become apparent later, when the early chaos is over and you can think clearly.

One of these traps is symmetrical application architecture, making two similar components in your solution use the same architecture, even when they have different requirements.

A common example is a web application with a public facing external site and a more secure internal site for administration. On the surface these two components look similar: they both serve HTML and need to access and persist data, so you may initially use the same architecture for both.

Symmetric architecture diagram

However, you soon realise the external site needs to handle much more traffic than the internal one, and its data requirements are different (higher read or write load, only needing access to specific data). You can resolve this by scaling the architecture, but it’s clunky.

Symmetric architecture with external load diagram

Then you realise the internal site has more security and auditing requirements. You can resolve this with implementation changes, but it would be neater to include additional layers or services.

Symmetric architecture with internal audit security diagram

The symmetry of the architecture becomes a conceptual barrier to change: changing either component appears to introduce more complexity, but in reality the implementations are diverging anyway due to their different circumstances. Looking at the components individually, and at how they will be hosted on less abstract infrastructure diagrams, can help. It could be that your external and internal sites don’t need the same data store or layers, and changing them could save resources and simplify the implementation.

Asymmetric architecture diagram

Embracing asymmetry in your architecture early can help you break out of this mindset and prevent you hitting problems later, when your implementation workarounds start to creak.

Design – Minimalist principle of least privilege web application approach


I used this design on a recent project and wanted to write up my thoughts.

I took this approach because the project was a small scale web application with quick timescales. I’d previously worked on several projects which took a generic web-api-db pattern, even when there was no plan or ability to scale or separate the components out, so the implementation increased complexity for little gain. The principle of least privilege had also come up in some security reviews, with database permissions not really being considered early on.

I wanted to see if I could cut out the API components, which added an additional layer of mainly boilerplate code, without resorting to a monolith design. This also reduced the complexity of the infrastructure and networking. Experience with database permissions had made me aware that user/role/schema permissions can be set at a very fine grain, giving assurance that connections can be locked down to specific tables and operations (e.g. SELECT/UPDATE only, no DELETE).
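For example, assuming a MySQL store (the database, table and user names here are purely illustrative), each application’s login can be granted only the operations it needs:

# the public-facing application's login can read, insert and update, but never delete
mysql -u root -p -e "GRANT SELECT, INSERT, UPDATE ON appdb.applications TO 'public_site'@'%';"

# the internal application's login can read and update for processing, but not insert or delete
mysql -u root -p -e "GRANT SELECT, UPDATE ON appdb.applications TO 'internal_site'@'%';"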

You could take this further and go for a full microservice split, with internal/external each having a separate worker and communicating via limited exposed API endpoints, but for this project that wasn’t really necessary, and I was sick of designs dictated by patterns rather than needs.

Scenario

You have two applications:

  • external-web
    • Public site used by unauthenticated users and exposed to the internet
    • Allows users to submit application data to be processed, with a limited view of previously submitted application data
    • High risk; we don’t want users to be able to view other users’ application data or change details
    • Higher usage than internal (public facing, unpredictable traffic)
  • internal-web
    • Internal site used by authenticated users and IP restricted
    • Allows users to process applications
    • Lower risk, but still don’t want users to be able to perform actions like deleting records or submitting applications
    • Low number of active users (small team)

Proposed solution:

  • Split the data stores, so external-web and internal-web have their own stores, with external-web only holding data as long as necessary
  • Use permissions to prevent each application from doing anything other than the minimum they need on their stores (principle of least privilege)
  • Use a worker application, not exposed or directly connected to either application, to move data between the two
  • Use either a special API or function to allow external-web to query historic data with limited access, so it cannot query the entire store

Thoughts on outcome

Pros

  • Simple and low number of components (moving parts that could go wrong)
  • Low infrastructure requirements
  • Still able to scale internal/external independently
  • Less code and complexity
  • Public facing site only has access to data in transit, not large amounts of long term data

Cons

  • Public facing application has access to database (even if limited to select/updates)
  • Unable to scale external/internal API independently from sites
  • Worker unable to scale independently of external/internal
  • Lose a lot of relational integrity from copying between stores if using relational stores


Managing data store changes in containers


When creating microservices it’s common to keep their persistence stores loosely coupled, so that changes to one service do not affect others. Each service should manage its own concerns, be the owner of retrieving/updating its data, and define how and where it gets that data from.

When using a relational database as the store there is an additional problem: each release may require schema/data updates, the well known problem of database migration (schema migration/database change management).

There are a large number of tools for doing this: Flyway, Liquibase, DbUp. They allow you to define the schema/data for your service as a series of ordered migration scripts, which can be applied to your database regardless of its state, whether it is a fresh DB or an existing one with production data.

When your container service needs a relational database with a specific schema and you are performing continuous delivery, you will need to handle this problem. Traditionally it is handled separately from the service by CI, where a Jenkins/TeamCity task runs the database migration tool before the task that deploys the updated release code for the service. You will have similar problems with containers that require config changes to non-relational stores (Redis, Mongo etc.).
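For example, with a tool like Flyway the CI step boils down to running something like this before the deploy task (the connection details are placeholders):

# apply any outstanding migration scripts to the service's database
flyway -url=jdbc:mysql://db-host:3306/peopledb -user=deploy -password="$DB_PASSWORD" migrate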

This is still possible in a containerised deployment, but it has disadvantages. Your CI will need knowledge of, and a connection to, each container’s data store, and it must run the migration task for every container with a store. As the number of containers increases this adds more and more complexity to your CI, which has to be aware of all of their needs and release changes.

To prevent this, the responsibility for updating the persistence store should sit with the container itself, as part of the container’s definition, code and orchestration details. This allows the developers to define what their persistence store is and how it should be updated each release, leaving CI responsible only for deploying the latest version of the containers.

node_modules/.bin/knex migrate:latest --env development

As an example of this I created a simple People API node application and container, which has a dependency on a mysql database with people data. Using Knex for database migration, the source defines the scripts necessary to set up the database or upgrade it to the latest version. The Dockerfile startup command waits for the database to be available, then runs the migration before starting the Node application. The containers necessary for the solution and the dependency on mysql are defined and configured in the docker-compose.yml.

docker-compose up
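The startup behaviour described above (wait for the database, run the migration, then start the app) boils down to an entrypoint script roughly like this; this is a sketch only, with illustrative file, variable and environment names rather than the exact code from the repository:

#!/bin/sh
# wait until the mysql container is accepting connections
until mysqladmin ping -h "$DB_HOST" --silent; do
  echo "waiting for database..."
  sleep 2
done

# bring the schema up to the latest version defined by the migration scripts
node_modules/.bin/knex migrate:latest --env production

# start the API now the database is ready
node server.js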

For a web application example I created a People Web node application that wraps the API and displays the results as HTML. It has a docker-compose.yml that spins up containers for mysql, node-people-api (using the latest image pushed to Docker Hub) and itself. As node-people-api manages its own store inside the container, node-people-web doesn’t need any knowledge of the migration scripts to set up the mysql database.
