Introduction

One of the major new features to appear with the release of Unreal Engine 5.0 is Mass, which was showcased extensively in the Matrix/CitySample demo. Many people have asked what exactly Mass is, and why it was created. In this article I'm going to create a simple gameplay scenario and implement it first with the traditional GameFramework classes, using Actors and Components, and then in Mass.

The article is aimed at novice programmers who do not have experience of ECS patterns, or perhaps more experienced programmers who are interested to see how Epic have implemented their ECS in practical terms. Code samples here are not complete and contain code designed for brevity and ease of explanation. You can find the complete source and demo project here:

midgen/UEPerfDemos: (github.com)

The Scenario - Many Random Movers

We'll start with a simple AI-related gameplay scenario. Say we want to have 1000 entities in our game, which move around randomly.

Each entity will need the following data:

  • FVector Location - The current location of the entity
  • FVector MoveTarget - The location the entity is trying to move to
  • FVector Velocity - The velocity of the entity

Each frame we perform the following logic:

  1. If an entity is within distance X of its move target, it selects a new random move target.
  2. The velocity is calculated as the unit vector from our current location to move target, multiplied by our speed.
  3. The location is updated by adding our velocity multiplied by the frame delta time.

To facilitate the comparisons, we'll put these functions into a helper namespace so we can easily reference the logic from both standard Unreal code, and Mass.

UEPerfFunction.h

namespace UEPerf_Functions
{
    void UpdateMoveTarget(const FVector& InCurrentLocation, FVector& OutMoveTarget);
    void UpdateVelocity(const FVector& InMoveTarget, const FVector& InLocation, FVector& OutVelocity);
    void UpdateMovement(const FVector& InVelocity, const float InDeltaSeconds, FVector& OutLocation);
}

UEPerfFunctions.cpp

namespace PerfFunction_Private
{
	static constexpr float MoveCompleteRange = 50.f;
	static constexpr float MoveCompleteRange2 = MoveCompleteRange * MoveCompleteRange;
	static constexpr float MoveTargetRange = 5000.f;
	static constexpr float MoveSpeed = 1000.f;
}

void UEPerf_Functions::UpdateMoveTarget(const FVector& CurrentLocation, FVector& MoveTarget)
{
	if(FVector::DistSquared2D(CurrentLocation, MoveTarget) < PerfFunction_Private::MoveCompleteRange2)
	{
		MoveTarget.X = FMath::RandRange(-PerfFunction_Private::MoveTargetRange, PerfFunction_Private::MoveTargetRange);
		MoveTarget.Y = FMath::RandRange(-PerfFunction_Private::MoveTargetRange, PerfFunction_Private::MoveTargetRange);
	}
}

void UEPerf_Functions::UpdateVelocity(const FVector& InMoveTarget, const FVector& InLocation, FVector& OutVelocity)
{
	OutVelocity = (InMoveTarget - InLocation).GetSafeNormal2D() * PerfFunction_Private::MoveSpeed;
}

void UEPerf_Functions::UpdateMovement(const FVector& Velocity, const float DeltaSeconds, FVector& OutLocation)
{
	OutLocation += Velocity * DeltaSeconds;
}
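
To see how the three helpers fit together outside the engine, here is a minimal standalone sketch of one frame for a single entity. Vec2, the Sketch namespace and StepEntity are hypothetical stand-ins introduced for illustration (they are not part of the demo project), the constants mirror the ones above, and the random target comes from the standard library rather than FMath::RandRange.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical 2D stand-in for FVector, so the sketch builds without the engine.
struct Vec2
{
    float X;
    float Y;
};

namespace Sketch
{
    constexpr float MoveCompleteRange = 50.f;
    constexpr float MoveTargetRange = 5000.f;
    constexpr float MoveSpeed = 1000.f;

    // Step 1: pick a new random target once we are close enough to the current one.
    inline void UpdateMoveTarget(const Vec2& Location, Vec2& MoveTarget, std::mt19937& Rng)
    {
        const float DX = MoveTarget.X - Location.X;
        const float DY = MoveTarget.Y - Location.Y;
        if (DX * DX + DY * DY < MoveCompleteRange * MoveCompleteRange)
        {
            std::uniform_real_distribution<float> Dist(-MoveTargetRange, MoveTargetRange);
            MoveTarget.X = Dist(Rng);
            MoveTarget.Y = Dist(Rng);
        }
    }

    // Step 2: velocity is the unit vector towards the target, scaled by the speed.
    inline void UpdateVelocity(const Vec2& MoveTarget, const Vec2& Location, Vec2& OutVelocity)
    {
        const float DX = MoveTarget.X - Location.X;
        const float DY = MoveTarget.Y - Location.Y;
        const float Length = std::sqrt(DX * DX + DY * DY);
        // Like GetSafeNormal2D, fall back to a zero vector when the direction is too short to normalise.
        if (Length < 1e-4f)
        {
            OutVelocity = Vec2{0.f, 0.f};
            return;
        }
        OutVelocity = Vec2{DX / Length * MoveSpeed, DY / Length * MoveSpeed};
    }

    // Step 3: integrate the location over the frame.
    inline void UpdateMovement(const Vec2& Velocity, float DeltaSeconds, Vec2& InOutLocation)
    {
        InOutLocation.X += Velocity.X * DeltaSeconds;
        InOutLocation.Y += Velocity.Y * DeltaSeconds;
    }

    // One frame for one entity: the same three steps, in the same order.
    inline void StepEntity(Vec2& Location, Vec2& MoveTarget, Vec2& Velocity, float DeltaSeconds, std::mt19937& Rng)
    {
        assert(DeltaSeconds >= 0.f);
        UpdateMoveTarget(Location, MoveTarget, Rng);
        UpdateVelocity(MoveTarget, Location, Velocity);
        UpdateMovement(Velocity, DeltaSeconds, Location);
    }
}
```

With Location at the origin, MoveTarget at (1000, 0) and a 0.1 second frame, one call to StepEntity leaves the target unchanged (it is further away than MoveCompleteRange), sets Velocity to (1000, 0) and moves Location to (100, 0).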

The Unreal GameFramework Implementation

So now we implement what I guess we have to call a naive implementation. We need 1000 entities in the world so we create a basic actor and then a component that will contain the required data and tick function.

PerfActorBase.h

UCLASS(Blueprintable)
class APerfActorBase : public AActor
{
	GENERATED_BODY()

public:
	explicit APerfActorBase(const FObjectInitializer& InInitialiser);
	
	UPROPERTY(BlueprintReadOnly)
	TObjectPtr<UPerfActorComponent> PerfActorComponent;
};

PerfActorBase.cpp

namespace PerfActor_Private
{
	static const FName PerfActorComponentName = TEXT("PerfActorComponent");
}

APerfActorBase::APerfActorBase(const FObjectInitializer& InInitialiser)
	: Super(InInitialiser)
	, PerfActorComponent(CreateDefaultSubobject<UPerfActorComponent>(PerfActor_Private::PerfActorComponentName))
{
}

PerfActorComponent.h

UCLASS(ClassGroup=(Custom), meta=(BlueprintSpawnableComponent))
class UEPERFDEMOS_API UPerfActorComponent : public UActorComponent
{
	GENERATED_BODY()
public:
	UPerfActorComponent();
	virtual void TickComponent(float DeltaTime, ELevelTick TickType, FActorComponentTickFunction* ThisTickFunction) override;
private:
	virtual void BeginPlay() override;
	
	FVector Location{FVector::ZeroVector};
	FVector MoveTarget{FVector::ZeroVector};
	FVector Velocity{FVector::ZeroVector};
};

PerfActorComponent.cpp

UPerfActorComponent::UPerfActorComponent()
{
	PrimaryComponentTick.bCanEverTick = true;
}

void UPerfActorComponent::BeginPlay()
{
	Super::BeginPlay();

	// Make sure our location is synced to the parent actor
	Location = GetOwner()->GetActorLocation();
}

void UPerfActorComponent::TickComponent(float DeltaTime, ELevelTick TickType,
                                        FActorComponentTickFunction* ThisTickFunction)
{
	Super::TickComponent(DeltaTime, TickType, ThisTickFunction);

	// Each counter is declared inside its own braces so that it times only the call in that block.
	{
		SCOPE_CYCLE_COUNTER(STAT_PerfDemoUpdateMoveTarget);
		UEPerf_Functions::UpdateMoveTarget(Location, MoveTarget);
	}
	{
		SCOPE_CYCLE_COUNTER(STAT_PerfDemoUpdateVelocity);
		UEPerf_Functions::UpdateVelocity(MoveTarget, Location, Velocity);
	}
	{
		SCOPE_CYCLE_COUNTER(STAT_PerfDemoUpdateMovement);
		UEPerf_Functions::UpdateMovement(Velocity, DeltaTime, Location);
	}

	GetOwner()->SetActorLocation(Location);
}

In the tick, we call our three logic functions, UpdateMoveTarget, UpdateVelocity, and UpdateMovement, then update the actor location. We have scope cycle counters around each function to facilitate profiling.
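
SCOPE_CYCLE_COUNTER relies on scope lifetime: timing starts where the counter is declared and stops when the enclosing scope ends. The following standalone sketch illustrates that idea using only the standard library. FTimingStat, FScopedTimer and RunTimed are hypothetical names for illustration, and this is not the engine's implementation.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical accumulator: total elapsed time and the number of timed scopes.
struct FTimingStat
{
    std::chrono::nanoseconds Total{0};
    std::uint32_t CallCount = 0;
};

// Starts timing on construction and records the elapsed time on destruction,
// that is, when the enclosing scope ends.
class FScopedTimer
{
public:
    explicit FScopedTimer(FTimingStat& InStat)
        : Stat(InStat)
        , Start(std::chrono::steady_clock::now())
    {
    }

    ~FScopedTimer()
    {
        const auto Elapsed = std::chrono::steady_clock::now() - Start;
        Stat.Total += std::chrono::duration_cast<std::chrono::nanoseconds>(Elapsed);
        ++Stat.CallCount;
    }

    FScopedTimer(const FScopedTimer&) = delete;
    FScopedTimer& operator=(const FScopedTimer&) = delete;

private:
    FTimingStat& Stat;
    std::chrono::steady_clock::time_point Start;
};

// Times Work() once per call; the inner braces limit what the timer covers.
template <typename WorkType>
void RunTimed(FTimingStat& Stat, WorkType&& Work)
{
    {
        FScopedTimer Timer(Stat); // timing starts here
        Work();
    } // timing stops here, when Timer is destroyed
    // Anything placed after the closing brace would not be included in the measurement.
}
```

The design point to take away is that the position of the braces decides what gets measured: a timer declared at function scope keeps running until the function returns.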

All good! Now we just have to put 1000 of these in our scene and set them going. If you look at the demo project, I've made a simple spawner that creates them automatically at runtime, rather than having to place them manually.

Let's run the game, and see it in action! Note that in the demo project, the actors are invisible to avoid the rendering bottlenecks getting in the way. You can check they are in fact moving around by inspecting the actor transforms in the outliner while the game is running. Bring up the console and enter Stat PerfDemo to see our timings. Here are the timings on my PC.

Counter                       CallCount   Inclusive Avg
UEPerfDemo UpdateMoveTarget   1000        18ms
UEPerfDemo UpdateVelocity     1000        18ms
UEPerfDemo UpdateMovement     1000        18ms


Uh oh, we've already blown 30fps, and we're not even rendering anything.

We can have a look at our functions, but there's no obvious room for optimisation in the logic itself. Let's reduce the number of entities until we hit 30fps. On my aging PC I need to bring it down to 500 entities, which is still nowhere near good enough as we still have the rest of the game to build.

Pretty disappointing. Let's try implementing this in Mass and see what it can offer us.

The Mass Implementation

In the interests of keeping this article size manageable, we'll not go into the details of Entity Component System architectures here, rather just show in practical terms how to take the logic we created using the GameFramework and Actors, and move it into the Mass framework.

First up, we need somewhere to put our data. In Mass, we don't use Components, we use Fragments, so let's create Fragments for our MoveTarget, Velocity, and Location data:

PerfDemoMassFragments.h

USTRUCT()
struct FPerfDemoMassFragment_Location : public FMassFragment
{
	GENERATED_BODY()
	FVector Location;
};

USTRUCT()
struct FPerfDemoMassFragment_MoveTarget : public FMassFragment
{
	GENERATED_BODY()
	FVector MoveTarget;
};

USTRUCT()
struct FPerfDemoMassFragment_Velocity : public FMassFragment
{
	GENERATED_BODY()
	FVector Velocity;
};

Now, Mass allows you to create preset collections of Fragments, known as Traits, to make configuring entities a bit easier, so let's make one:

PerfDemoMassTraits.h

UCLASS(meta=(DisplayName="PerfDemoRandomMovement"))
class UMassRandomMovementTrait : public UMassEntityTraitBase
{
	GENERATED_BODY()
public:

	virtual void BuildTemplate(FMassEntityTemplateBuildContext& BuildContext, const UWorld& World) const override;
};

PerfDemoMassTraits.cpp

void UMassRandomMovementTrait::BuildTemplate(FMassEntityTemplateBuildContext& BuildContext,
                                             const UWorld& World) const
{
	BuildContext.AddFragment<FPerfDemoMassFragment_Location>();
	BuildContext.AddFragment<FPerfDemoMassFragment_MoveTarget>();
	BuildContext.AddFragment<FPerfDemoMassFragment_Velocity>();
}
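
As a rough mental model, a trait can be thought of as a reusable function that contributes a set of fragment types to an entity template. The plain C++ sketch below illustrates that idea only. FEntityTemplateSketch, BuildRandomMovementTrait and the fragment structs are hypothetical names, and the sketch does not describe how Mass stores entity templates internally.

```cpp
#include <cstddef>
#include <set>
#include <typeindex>
#include <typeinfo>

// Hypothetical plain-data fragments (stand-ins for the USTRUCTs above).
struct FLocationFragment   { float X = 0.f; float Y = 0.f; };
struct FMoveTargetFragment { float X = 0.f; float Y = 0.f; };
struct FVelocityFragment   { float X = 0.f; float Y = 0.f; };

// Hypothetical entity template: just the set of fragment types an entity is created with.
class FEntityTemplateSketch
{
public:
    template <typename FragmentType>
    void AddFragment()
    {
        // Adding the same fragment type twice has no further effect.
        FragmentTypes.insert(std::type_index(typeid(FragmentType)));
    }

    template <typename FragmentType>
    bool HasFragment() const
    {
        return FragmentTypes.count(std::type_index(typeid(FragmentType))) > 0;
    }

    std::size_t NumFragments() const { return FragmentTypes.size(); }

private:
    std::set<std::type_index> FragmentTypes;
};

// The "trait": a preset that adds all the fragments random movement needs.
inline void BuildRandomMovementTrait(FEntityTemplateSketch& Template)
{
    Template.AddFragment<FLocationFragment>();
    Template.AddFragment<FMoveTargetFragment>();
    Template.AddFragment<FVelocityFragment>();
}
```

Because the template is a set, several traits that share a fragment can be applied to the same template without duplicating it.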

Next, we need to execute our logic. In Mass, rather than implementing tick functions on Actors, we create Processors, which act on the Fragments we just created.

PerfDemoMassProcessors.h

UCLASS()
class UPerfDemoMoveTargetProcessor : public UMassProcessor
{
	GENERATED_BODY()
	UPerfDemoMoveTargetProcessor();
protected:
	virtual void ConfigureQueries() override;
	virtual void Execute(FMassEntityManager& EntityManager, FMassExecutionContext& Context) override;
	FMassEntityQuery EntityQuery;
};

UCLASS()
class UPerfDemoVelocityProcessor : public UMassProcessor
{
	GENERATED_BODY()
	UPerfDemoVelocityProcessor();
protected:
	virtual void ConfigureQueries() override;
	virtual void Execute(FMassEntityManager& EntityManager, FMassExecutionContext& Context) override;
	FMassEntityQuery EntityQuery;
};

UCLASS()
class UPerfDemoMovementProcessor : public UMassProcessor
{
	GENERATED_BODY()
	UPerfDemoMovementProcessor();
protected:
	virtual void ConfigureQueries() override;
	virtual void Execute(FMassEntityManager& EntityManager, FMassExecutionContext& Context) override;
	FMassEntityQuery EntityQuery;
};

PerfDemoMassProcessors.cpp

UPerfDemoMoveTargetProcessor::UPerfDemoMoveTargetProcessor()
	: EntityQuery(*this)
{
	ExecutionFlags = (int32)(EProcessorExecutionFlags::All);
	ExecutionOrder.ExecuteInGroup = UE::Mass::ProcessorGroupNames::Tasks;
	ExecutionOrder.ExecuteAfter.Add(UE::Mass::ProcessorGroupNames::Behavior);
	bRequiresGameThreadExecution = true;
}

void UPerfDemoMoveTargetProcessor::ConfigureQueries()
{
	EntityQuery.AddRequirement<FPerfDemoMassFragment_Location>(EMassFragmentAccess::ReadOnly);
	EntityQuery.AddRequirement<FPerfDemoMassFragment_MoveTarget>(EMassFragmentAccess::ReadWrite);
}

void UPerfDemoMoveTargetProcessor::Execute(FMassEntityManager& EntityManager, FMassExecutionContext& Context)
{
	// The lambda only needs the per-chunk execution context, so nothing is captured.
	EntityQuery.ForEachEntityChunk(EntityManager, Context, [](FMassExecutionContext& Context)
	{
		const int32 NumEntities = Context.GetNumEntities();
		const TConstArrayView<FPerfDemoMassFragment_Location> LocationList = Context.GetFragmentView<FPerfDemoMassFragment_Location>();
		const TArrayView<FPerfDemoMassFragment_MoveTarget> MoveTargetList = Context.GetMutableFragmentView<FPerfDemoMassFragment_MoveTarget>();

		for (int32 i = 0; i < NumEntities; ++i)
		{
			const FPerfDemoMassFragment_Location& Location = LocationList[i];
			FPerfDemoMassFragment_MoveTarget& MoveTarget = MoveTargetList[i];

			{
				SCOPE_CYCLE_COUNTER(STAT_PerfDemoUpdateMoveTarget_Mass);
				UEPerf_Functions::UpdateMoveTarget(Location.Location, MoveTarget.MoveTarget);
			}
		}
	});
}

//**********************************

UPerfDemoVelocityProcessor::UPerfDemoVelocityProcessor()
	: EntityQuery(*this)
{
	ExecutionFlags = (int32)(EProcessorExecutionFlags::All);
	ExecutionOrder.ExecuteInGroup = UE::Mass::ProcessorGroupNames::Tasks;
	ExecutionOrder.ExecuteAfter.Add(UE::Mass::ProcessorGroupNames::Behavior);
	bRequiresGameThreadExecution = true;
}

void UPerfDemoVelocityProcessor::ConfigureQueries()
{
	EntityQuery.AddRequirement<FPerfDemoMassFragment_MoveTarget>(EMassFragmentAccess::ReadOnly);
	EntityQuery.AddRequirement<FPerfDemoMassFragment_Location>(EMassFragmentAccess::ReadOnly);
	EntityQuery.AddRequirement<FPerfDemoMassFragment_Velocity>(EMassFragmentAccess::ReadWrite);
}

void UPerfDemoVelocityProcessor::Execute(FMassEntityManager& EntityManager, FMassExecutionContext& Context)
{
	// The lambda only needs the per-chunk execution context, so nothing is captured.
	EntityQuery.ForEachEntityChunk(EntityManager, Context, [](FMassExecutionContext& Context)
	{
		const int32 NumEntities = Context.GetNumEntities();
		const TConstArrayView<FPerfDemoMassFragment_MoveTarget> MoveTargetList = Context.GetFragmentView<FPerfDemoMassFragment_MoveTarget>();
		const TConstArrayView<FPerfDemoMassFragment_Location> LocationList = Context.GetFragmentView<FPerfDemoMassFragment_Location>();
		const TArrayView<FPerfDemoMassFragment_Velocity> VelocityList = Context.GetMutableFragmentView<FPerfDemoMassFragment_Velocity>();

		for (int32 i = 0; i < NumEntities; ++i)
		{
			const FPerfDemoMassFragment_MoveTarget& MoveTarget = MoveTargetList[i];
			const FPerfDemoMassFragment_Location& Location = LocationList[i];
			FPerfDemoMassFragment_Velocity& Velocity = VelocityList[i];

			{
				SCOPE_CYCLE_COUNTER(STAT_PerfDemoUpdateVelocity_Mass);
				UEPerf_Functions::UpdateVelocity(MoveTarget.MoveTarget, Location.Location, Velocity.Velocity);
			}
		}
	});
}

//**********************************

UPerfDemoMovementProcessor::UPerfDemoMovementProcessor()
	: EntityQuery(*this)
{
	ExecutionFlags = (int32)(EProcessorExecutionFlags::All);
	ExecutionOrder.ExecuteInGroup = UE::Mass::ProcessorGroupNames::Tasks;
	ExecutionOrder.ExecuteAfter.Add(UE::Mass::ProcessorGroupNames::Behavior);
	bRequiresGameThreadExecution = true;
}

void UPerfDemoMovementProcessor::ConfigureQueries()
{
	EntityQuery.AddRequirement<FPerfDemoMassFragment_Velocity>(EMassFragmentAccess::ReadOnly);
	EntityQuery.AddRequirement<FPerfDemoMassFragment_Location>(EMassFragmentAccess::ReadWrite);
}

void UPerfDemoMovementProcessor::Execute(FMassEntityManager& EntityManager, FMassExecutionContext& Context)
{
	// The lambda only needs the per-chunk execution context, so nothing is captured.
	EntityQuery.ForEachEntityChunk(EntityManager, Context, [](FMassExecutionContext& Context)
	{
		const int32 NumEntities = Context.GetNumEntities();
		const TConstArrayView<FPerfDemoMassFragment_Velocity> VelocityList = Context.GetFragmentView<FPerfDemoMassFragment_Velocity>();
		const TArrayView<FPerfDemoMassFragment_Location> LocationList = Context.GetMutableFragmentView<FPerfDemoMassFragment_Location>();
		const float WorldDeltaTime = Context.GetDeltaTimeSeconds();

		for (int32 i = 0; i < NumEntities; ++i)
		{
			FPerfDemoMassFragment_Location& Location = LocationList[i];
			const FPerfDemoMassFragment_Velocity& Velocity = VelocityList[i];

			{
				SCOPE_CYCLE_COUNTER(STAT_PerfDemoUpdateMovement_Mass);
				UEPerf_Functions::UpdateMovement(Velocity.Velocity, WorldDeltaTime, Location.Location);
			}
		}
	});
}

You will notice some interesting snippets of code in the processor constructors, which we'll cover in more detail in part 2 of this series. For now, just be aware that we've explicitly indicated that the processors must run on the game thread. This isn't necessary for the work we're doing in these processors, but it keeps the comparison benchmarks simple for this article. Note that as of the release of Unreal 5.1, Mass Processors default to running off the game thread.
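
The ConfigureQueries functions above declare which fragments each processor needs. Conceptually, a query then selects every group of entities whose fragment set contains all of the required fragments. The sketch below models that with bit masks in plain C++. The enum, FArchetypeSketch, FQuerySketch and CountMatchingEntities are all hypothetical names, and this is a simplified illustration rather than a description of how Mass implements queries.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical fragment identifiers, one bit each.
enum EFragmentBit : std::uint32_t
{
    Fragment_Location   = 1u << 0,
    Fragment_MoveTarget = 1u << 1,
    Fragment_Velocity   = 1u << 2,
};

// Hypothetical archetype: every entity in it has exactly this set of fragments.
struct FArchetypeSketch
{
    std::uint32_t FragmentMask = 0;
    int NumEntities = 0;
};

// Hypothetical query: the fragments a processor asked for in ConfigureQueries.
struct FQuerySketch
{
    std::uint32_t RequiredMask = 0;

    void AddRequirement(EFragmentBit Fragment) { RequiredMask |= Fragment; }

    // An archetype matches when it contains every required fragment.
    bool Matches(const FArchetypeSketch& Archetype) const
    {
        return (Archetype.FragmentMask & RequiredMask) == RequiredMask;
    }
};

// Counts the entities a processor with this query would visit.
inline int CountMatchingEntities(const FQuerySketch& Query, const std::vector<FArchetypeSketch>& Archetypes)
{
    int Total = 0;
    for (const FArchetypeSketch& Archetype : Archetypes)
    {
        if (Query.Matches(Archetype))
        {
            Total += Archetype.NumEntities;
        }
    }
    return Total;
}
```

In this model, a query requiring Velocity and Location matches an archetype that has Location, MoveTarget and Velocity, but skips one that only has Location.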

Now we're done with the code, we can fire up the editor and create the entity definition. We do this by creating a new MassEntityConfig data asset, and adding the PerfDemoRandomMovement trait we created earlier to it.

entityconfig.png

We add a Mass spawner to the level and configure it to spawn 1000 entities. We'll use an EQS spawn generator to create a grid of spawn points; the config should look something like this.

spawnerconfig.png

Now, we're ready to hit play and see what using Mass has done for us.

Counter                            CallCount   Inclusive Avg
UEPerfDemo UpdateMoveTarget_Mass   1000        0.06ms
UEPerfDemo UpdateVelocity_Mass     1000        0.06ms
UEPerfDemo UpdateMovement_Mass     1000        0.06ms

 

We're executing *exactly* the same code, exactly the same number of times, but it's taking a fraction of the time. Doing a little testing, we can hit 30fps with 100,000 entities. We've gone from having 500 entities moving around randomly in our 33ms frame, to 100,000.

Where Does The Speed Come From?

In the GameFramework implementation, the data for each entity is stored in an ActorComponent, which, like all UObjects in Unreal, is simply new-ed onto the heap wherever the allocator sees fit. When we tick all the components, each time we have to stall the CPU to fetch the component data into cache, do the calculations, go and look for the next component's data, stall the CPU again, and so on. In fact, our artificial test scenario probably runs a little faster than it would in a complete game: we only have these entities in the scene, so the chances of a cache hit are fairly high, although I did try to make the scene more representative by putting a few dummy components on each actor.
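
The access pattern described above can be sketched in plain C++ without the engine. FMoverObject, MakeMovers and TickAll are hypothetical names used only for illustration; std::unique_ptr stands in for separately heap-allocated UObjects, and the sketch makes no attempt to reproduce Unreal's allocator or tick scheduling.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ticking component; each instance is its own heap allocation.
class FMoverObject
{
public:
    virtual ~FMoverObject() = default;

    virtual void Tick(float DeltaSeconds)
    {
        LocationX += VelocityX * DeltaSeconds;
    }

    float LocationX = 0.f;
    float VelocityX = 0.f;
};

// The container only holds pointers; the objects live wherever the allocator placed them.
using FMoverList = std::vector<std::unique_ptr<FMoverObject>>;

inline FMoverList MakeMovers(std::size_t Count, float VelocityX)
{
    FMoverList Movers;
    Movers.reserve(Count);
    for (std::size_t Index = 0; Index < Count; ++Index)
    {
        auto Mover = std::make_unique<FMoverObject>();
        Mover->VelocityX = VelocityX;
        Movers.push_back(std::move(Mover));
    }
    return Movers;
}

// Every iteration follows a pointer to a separately allocated object and makes a virtual call,
// so consecutive iterations are not guaranteed to touch neighbouring memory.
inline void TickAll(FMoverList& Movers, float DeltaSeconds)
{
    for (const std::unique_ptr<FMoverObject>& Mover : Movers)
    {
        Mover->Tick(DeltaSeconds);
    }
}
```

In a small test like this the allocator may well place the objects next to each other, which is the same effect that flatters the benchmark scene; in a full game, allocations from other systems end up interleaved between them.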

In Mass, all Fragments are stored in contiguous arrays (as with most ECS patterns, some kind of chunked array, so you don't get hit with big copies as N grows). When Mass processors execute, they simply iterate over the container. As they're processing the data for entity N, the data for entities N+1, N+2 and N+3 is either already in the same cache line, or the prefetcher can have it ready in another line. We largely eliminate the cache misses and the stalls that follow while data is fetched, which is where the GameFramework implementation loses its time.
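
Here is a minimal plain C++ sketch of the chunked layout described above. All of the names and the chunk capacity of 128 are assumptions made for illustration, and this is not how Mass is implemented internally. The point is only that each chunk keeps every fragment type in its own contiguous array, and that a processor is a tight loop over those arrays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct FVec2
{
    float X;
    float Y;
};

// Arbitrary chunk size chosen for this sketch.
constexpr std::size_t ChunkCapacity = 128;

// Hypothetical chunk: one contiguous array per fragment type, for a fixed number of entities.
struct FChunkSketch
{
    std::size_t NumEntities = 0;
    std::array<FVec2, ChunkCapacity> Locations{};
    std::array<FVec2, ChunkCapacity> Velocities{};
};

class FChunkedStorageSketch
{
public:
    // Appends an entity; a new chunk is allocated only when the last one is full,
    // so existing entity data is never copied as the count grows.
    void AddEntity(const FVec2& Location, const FVec2& Velocity)
    {
        if (Chunks.empty() || Chunks.back()->NumEntities == ChunkCapacity)
        {
            Chunks.push_back(std::make_unique<FChunkSketch>());
        }
        FChunkSketch& Chunk = *Chunks.back();
        assert(Chunk.NumEntities < ChunkCapacity);
        Chunk.Locations[Chunk.NumEntities] = Location;
        Chunk.Velocities[Chunk.NumEntities] = Velocity;
        ++Chunk.NumEntities;
    }

    // Calls Func once per chunk, mirroring the shape of ForEachEntityChunk.
    template <typename FuncType>
    void ForEachChunk(FuncType&& Func)
    {
        for (const std::unique_ptr<FChunkSketch>& Chunk : Chunks)
        {
            Func(*Chunk);
        }
    }

    std::size_t NumChunks() const { return Chunks.size(); }

private:
    std::vector<std::unique_ptr<FChunkSketch>> Chunks;
};

// A "processor": a linear walk over contiguous arrays, with no per-entity pointer chasing.
inline void MovementProcessorSketch(FChunkedStorageSketch& Storage, float DeltaSeconds)
{
    Storage.ForEachChunk([DeltaSeconds](FChunkSketch& Chunk)
    {
        for (std::size_t Index = 0; Index < Chunk.NumEntities; ++Index)
        {
            Chunk.Locations[Index].X += Chunk.Velocities[Index].X * DeltaSeconds;
            Chunk.Locations[Index].Y += Chunk.Velocities[Index].Y * DeltaSeconds;
        }
    });
}
```

Adding 300 entities to this storage produces three chunks (128, 128 and 44 entities), and the processor touches each chunk's arrays front to back, which is the access pattern the hardware prefetcher handles best.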

Using the ECS pattern also brings important benefits when it comes to multithreading, which will be covered in the next part of this series.

For more detailed information on where this performance comes from, have a search for Data Oriented Design, and Entity Component Systems. The PDF What Every Programmer Should Know About Memory is a great read if you want to go into detail.

The Trade-Off

You may now be thinking, why don't we pile all our code into Mass to see these performance gains? But alas, there is a trade-off here.

In our GameFramework implementation, all the entities are Actors that exist in our world. Mass entities do not exist in the world; they live in an alternate reality that is generally referred to in Mass as the Simulation. Data in the simulation is completely separate from the UObject world that Unreal levels are built around. To understand how we bridge the gap from the simulation to the level, keep an eye out for part 2.