Unity uses a bunch of built-in markers. These markers give hints about what's going on, which is especially important when you are trying to find your bottleneck. Below is a collection of useful markers that I found in the Unity docs, in the forums, and through my own experience.
A: Profiler modules. This is a list of all the modules you can profile in your application. Use the drop-down menu at the top of this area to add and remove modules from the window.
B: Profiler controls. Use these controls to set which device to profile from and what kind of profiling Unity should perform, navigate between frames, and start recording data.
C: Frame charts. This area contains charts of each module the Profiler profiles. This area is blank when you open the Profiler for the first time, and fills with information when you start profiling your application.
D: Module details panel. The information in this area of the window changes based on the module you have selected. For instance, when you select the CPU Usage Profiler module, it contains a detailed timeline and the option to switch to a Hierarchy view. When you select the Rendering Profiler module, this area displays a list of debugging information. This area is blank when you open the Profiler for the first time, and fills with information when you start profiling your application.
Main thread base markers
PlayerLoop: Contains any samples that originate from your application’s main loop. If you target the Editor instead of Play mode while the Player is running within the Editor in active Play mode, PlayerLoop samples nest under the EditorLoop.
EditorLoop: Contains any samples that originate from the Editor’s main loop. This is only present while you profile a player in the Editor. When you target Play mode with the Profiler, EditorLoop samples show the amount of time spent rendering and running the Editor that contains the Player.
Profiler.CollectEditorStats: Contains any samples that relate to collecting statistics for different active Profiler modules.
Profiler.CollectGlobalStats: Samples nested under this marker indicate how much overhead the Player incurs when it collects the statistics of a particular module. All other child samples of Profiler.CollectEditorStats only reflect their effect in the Editor.
Script update markers
BehaviourUpdate: Contains all samples of MonoBehaviour.Update methods.
CoroutinesDelayedCalls: Contains all samples of Coroutines after their first yield.
FixedBehaviourUpdate: Contains all samples of MonoBehaviour.FixedUpdate methods.
PreLateUpdate.ScriptRunBehaviourLateUpdate: Contains all samples of MonoBehaviour.LateUpdate methods.
Update.ScriptRunBehaviourUpdate: Contains all samples of MonoBehaviour.Update and Coroutines.
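To see where the time inside one of these script markers actually goes, you can add your own markers next to the built-in ones. Here is a minimal sketch that uses the ProfilerMarker API from Unity.Profiling. The EnemySpawner class, the SpawnWave method and the marker name are hypothetical and only there for illustration.

```csharp
using Unity.Profiling;
using UnityEngine;

// Hypothetical component; class, method and marker names are illustrative only.
public class EnemySpawner : MonoBehaviour
{
    // Create the marker once and reuse it for every sample.
    static readonly ProfilerMarker s_SpawnMarker =
        new ProfilerMarker("EnemySpawner.SpawnWave");

    void Update()
    {
        // The sample shows up nested inside this script's Update sample.
        using (s_SpawnMarker.Auto())
        {
            SpawnWave();
        }
    }

    void SpawnWave()
    {
        // Placeholder for the work you want to measure.
    }
}
```

With the CPU Usage module selected, the custom sample should then appear under BehaviourUpdate in the Hierarchy view.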
Multithreading markers
Idle: Contains samples that indicate how long a worker thread is inactive. A worker thread is inactive whenever the Job System does not use it; it then goes into wait mode, where it waits on a semaphore.
JobHandle.Complete: Contains samples that indicate when a sync point on a job happened. Sync points might have a performance impact on your application and might interfere with the execution of multi-threaded job code. To make it easier to find where exactly the sync point happened, enable Call Stack recording for this sample. In the CPU Profiler module’s Timeline view you can enable Flow Events to see which jobs finished at this point.
Semaphore.WaitForSignal: A generic "Unity is waiting for something" marker. On its own it is not specific enough to identify the problem; the important information is the name of the marker that appears above it. It contains a sample that depicts a synchronization point on a thread. To find the thread it is waiting for, check the Timeline view for samples that ended shortly before this one.
WaitForJobGroupID: A Sync Fence on a JobHandle was triggered. This might lead to work stealing, which happens when a worker finishes its work and then looks at other workers’ jobs to complete. These show up as job samples executed under this marker. Jobs that were “stolen” are not necessarily the jobs that were being waited for.
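One way to keep JobHandle.Complete samples short is to schedule a job early and complete it as late as possible, so the worker threads have time to finish before the main thread needs the result. The following is a rough sketch of that pattern, not a recommendation for every case. SquareJob and DeferredJobExample are hypothetical names, and the array size is arbitrary.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical example: schedule in Update, complete in LateUpdate.
public class DeferredJobExample : MonoBehaviour
{
    struct SquareJob : IJob
    {
        public NativeArray<float> Values;

        public void Execute()
        {
            for (int i = 0; i < Values.Length; i++)
                Values[i] *= Values[i];
        }
    }

    NativeArray<float> m_Values;
    JobHandle m_Handle;

    void Update()
    {
        m_Values = new NativeArray<float>(1024, Allocator.TempJob);
        m_Handle = new SquareJob { Values = m_Values }.Schedule();
        // No Complete() here: calling it right after Schedule() would
        // create a sync point and make the main thread wait for the job.
    }

    void LateUpdate()
    {
        if (!m_Values.IsCreated)
            return;

        // The sync point happens here, after other main-thread work ran.
        m_Handle.Complete();
        // ... read m_Values here ...
        m_Values.Dispose();
    }
}
```

If the job has already finished when LateUpdate runs, the JobHandle.Complete sample should be very short.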
Rendering and VSync markers
WaitForTargetFPS: Indicates how much time your application spent waiting for the targeted FPS that Application.targetFrameRate specifies. If the sample is a subsample of Gfx.WaitForPresentOnGfxThread, it represents the amount of time that your application spent waiting for the GPU. For example, this could be time that the GPU spent waiting for the next VSync, if that is configured in QualitySettings.vSyncCount, or if vSync is enforced on your target platform. However, samples with this marker are also emitted if the GPU hasn’t finished computing the frame. [1]
Gfx.PresentFrame: Represents the time your application spent waiting for the GPU to render and present the frame, which includes waiting for VSync. Samples with the WaitForTargetFPS marker on the main thread show how much time is spent waiting for VSync.
Gfx.ProcessCommands: Contains all processing of the rendering commands on the render thread. Your application might have spent some of this processing time waiting for VSync or new commands from the main thread, which you can see from its child sample Gfx.WaitForPresentOnGfxThread.
Gfx.WaitForCommands: Indicates that the render thread was ready for new commands. If you see this marker, it might indicate a bottleneck on the main thread.
<GraphicsAPIName>.WaitForLastPresent: Samples with this marker appear when the main thread waited for the GPU to flip frame number (Time.frameCount - QualitySettings.maxQueuedFrames + 1) to the screen. This means that if QualitySettings.maxQueuedFrames is greater than one, this time is spent waiting for the GPU to flip a frame that your application requested to render in a previous main thread frame.
Gfx.WaitForRenderThread: Indicates that the main thread was waiting for the render thread to process all the commands in its command stream. Samples with this marker only appear in multithreaded rendering.
Gfx.WaitForPresent: Indicates that the main thread is ready to start rendering the next frame, but the render thread has not finished waiting for the GPU to present the frame. This might indicate that your game is GPU bound. Look at the Timeline view to see whether the render thread is simultaneously spending time in Gfx.PresentFrame. If the render thread is still spending time in Camera.Render, your game is CPU bound, for example because it spends too much time sending draw calls or textures to the GPU. [1]
Gfx.WaitForPresentOnGfxThread: Usually means that the GPU is what limits your frame rate. It is probably time to use a GPU profiler, or other means, to analyse how many vertices, pixels, large textures and so on you are asking the GPU to draw. The frame stats may also tell you that you are drawing far too much. [1]
Canvas.BuildBatch: Waiting for rendering to finish on another thread. [1]
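Several of the markers above depend on the frame pacing settings, so it helps to know what your project is actually using. The sketch below sets Application.targetFrameRate with VSync disabled and works through the frame number formula quoted for WaitForLastPresent. FramePacingExample and FrameBeingPresented are hypothetical names, and the values 60, 100 and 2 are only examples.

```csharp
using UnityEngine;

// Hypothetical helper component for inspecting frame pacing settings.
public class FramePacingExample : MonoBehaviour
{
    // Frame number the GPU flips to the screen, per the formula above.
    public static int FrameBeingPresented(int frameCount, int maxQueuedFrames)
    {
        return frameCount - maxQueuedFrames + 1;
    }

    void Start()
    {
        // Assumption: a desktop build, where targetFrameRate only takes
        // effect while vSyncCount is 0.
        QualitySettings.vSyncCount = 0;
        Application.targetFrameRate = 60;

        // Worked example: frame 100 with 2 queued frames gives 100 - 2 + 1 = 99.
        Debug.Log(FrameBeingPresented(100, 2));

        // The same calculation with the current runtime values.
        Debug.Log(FrameBeingPresented(Time.frameCount, QualitySettings.maxQueuedFrames));
    }
}
```

With these settings, time the main thread spends idling to hit 60 FPS should show up as WaitForTargetFPS rather than as a VSync wait.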
Back end scripting markers
GC.Alloc: Represents an allocation on the managed heap, which is subject to automatic garbage collection. To reduce the time your application spends on automatic garbage collection, you should minimize these types of samples.
GC.Collect: Represents samples that relate to garbage collection. Whenever Unity needs to perform garbage collection, it stops running your program code and only resumes normal execution when the garbage collector has finished all its work. Note: If you have enabled Incremental Garbage Collection, the garbage collector might not finish its work in a single frame.
Mono.JIT (Mono only): Contains samples that relate to just-in-time compilation of a scripting method. When a function is executed for the first time, Mono compiles it, and Mono.JIT represents this compilation overhead.
UnsafeUtility.Malloc: Contains samples that call UnsafeUtility.Malloc to allocate unmanaged memory. While the Garbage Collector does not track this memory, allocating memory might have a significant performance impact, which this sample shows. To investigate the source of this call, you can enable Call Stack recording for this sample.
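A common source of GC.Alloc samples is an API that returns a new array on every call. As a small sketch of how to avoid that, the example below reuses one preallocated buffer with Physics.OverlapSphereNonAlloc instead of calling Physics.OverlapSphere every frame. NearbyScanner, the buffer size of 32 and the radius of 5 are assumptions made for the example.

```csharp
using UnityEngine;

// Hypothetical component; buffer size and radius are arbitrary.
public class NearbyScanner : MonoBehaviour
{
    // Allocated once and reused, so the query below does not allocate
    // a new result array each frame.
    readonly Collider[] m_Hits = new Collider[32];

    void Update()
    {
        // Returns the number of colliders written into m_Hits.
        int count = Physics.OverlapSphereNonAlloc(transform.position, 5f, m_Hits);

        for (int i = 0; i < count; i++)
        {
            Collider hit = m_Hits[i];
            // ... react to the nearby collider here ...
        }
    }
}
```

The trade-off is that results beyond the buffer size are dropped, so the buffer has to be sized for the worst case you expect.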
Physics markers
Physics.FetchResults: Contains samples that collect the results of the physics simulation from the physics engine, such as contact streams, trigger overlaps, and joint breakage events.
Physics.Interpolation: Contains samples that measure the execution time of the Physics.Interpolation method. This method manages the interpolation of positions and rotations for all the physics objects in your application.
Physics.Processing: Contains samples that spent time waiting on the main thread until the physics simulation completed across all threads. If your application spends a lot of time in Physics.Processing but has only a few physics-related GameObjects in the Scene, it might indicate that worker threads picked up tasks from other systems through job stealing, which are then reported as physics. This is because while waiting, the main thread picks up jobs from the high priority queue.
Physics.ProcessingCloth: Contains samples that measure the execution time of the Physics.ProcessingCloth method. This method processes all cloth physics jobs. Expand this sample to show the low-level detail of the work done internally in the physics engine.
Physics.ProcessReports: Contains samples that correspond to time spent forwarding physics data to scripts via callbacks such as OnTriggerEnter. Note: These samples do not compute the data required because they have already been prepared during FetchResults.
Physics.Contacts: Contains samples that measure the execution time of Physics.Contacts. This processes OnCollisionEnter, OnCollisionExit, and OnCollisionStay events.
Physics.JointBreaks: Contains samples that measure the execution time of Physics.JointBreaks. This processes updates and messages related to broken joints.
Physics.TriggerEnterExits: Contains samples that measure the execution time of Physics.TriggerEnterExits. This processes OnTriggerEnter and OnTriggerExit events.
Physics.TriggerStays: Contains samples that measure the execution time of Physics.TriggerStays. This processes OnTriggerStay events.
Physics.Simulate: Contains samples that measure the amount of time spent working on the pre-requisites for the Physics.Simulate method. This method instructs the physics engine to run its simulation, which updates the state of the current physics.
Physics.UpdateBodies: Contains samples that update all the physics bodies’ positions and rotations. For each GameObject that has a Rigidbody component, samples with this marker read the pose from the physics engine and write it to the Transform.
Physics.UpdateCloth: Contains samples that measure the execution time of the Physics.UpdateCloth method. This method processes updates that relate to cloth and their Skinned Meshes.
Performance warnings and other markers
Animation.DestroyAnimationClip/Animation.AddClip/Animation.RemoveClip/Animation.Clone/Animation.Deactivate: Indicates that RebuildInternalState has been triggered. RebuildInternalState is an operation that goes through the list of curves for each clip in the Animation component, and then rebinds each curve to a value on a component, on a GameObject. This is a resource-intensive operation, so you should avoid calling these methods at runtime as much as possible.
Rigidbody.SetKinematic: Indicates that Unity recreated a non-convex MeshCollider for a Rigidbody.
AnimationScriptPlayable.ProcessAnimation: This is usually associated with runtime rigging, such as OverrideTransformJob or MultiAimConstraint.
AssetBundle.asset/allAssets: Indicates that Unity called the AssetBundleRequest.asset/allAssets API while the AssetBundle loading was not complete (AssetBundleRequest.isDone is false). This stalls the main thread until the loading operation completes.
AsyncUploadManager.AsyncBufferResized/AsyncUploadManager.AsyncBufferDelete: Indicates that the internal buffer for uploading data to the GPU is resized because it’s not big enough. This resizing is slow and causes spikes in CPU activity. You can avoid this warning if you can spare the memory to allocate a larger size up front. You can use the Async Upload Buffer Size setting in Quality Settings to set the default size. The AsyncUploadManager.AsyncBufferResized marker indicates the newly allocated size, which you can use as a guide for the default buffer size.
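The last two warnings can often be avoided from script. The sketch below waits for an AssetBundleRequest to finish inside a coroutine before it touches asset, and sets the upload buffer size up front through QualitySettings.asyncUploadBufferSize. BundleLoader, bundlePath and assetName are hypothetical, and the 32 MB value is only an assumption; take the real value from what the AsyncBufferResized marker reports in your project.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical loader; the fields are set in the Inspector.
public class BundleLoader : MonoBehaviour
{
    public string bundlePath;
    public string assetName;

    IEnumerator Start()
    {
        // Size in megabytes. 32 is an assumed value, not a recommendation.
        QualitySettings.asyncUploadBufferSize = 32;

        AssetBundleCreateRequest bundleRequest = AssetBundle.LoadFromFileAsync(bundlePath);
        yield return bundleRequest;

        AssetBundle bundle = bundleRequest.assetBundle;
        if (bundle == null)
        {
            Debug.LogError("Failed to load AssetBundle at " + bundlePath);
            yield break;
        }

        AssetBundleRequest assetRequest = bundle.LoadAssetAsync<GameObject>(assetName);
        // Yielding resumes the coroutine once isDone is true, so reading
        // asset afterwards does not stall the main thread.
        yield return assetRequest;

        GameObject prefab = assetRequest.asset as GameObject;
        if (prefab != null)
            Instantiate(prefab);
    }
}
```

Reading assetRequest.asset before the yield would be the pattern that triggers the AssetBundle.asset warning.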