Great work!
From your very detailed description, I am fairly sure you are not doing anything wrong in how you use image tracking and motion tracking. The drift quite possibly comes from the VIO system (ARKit/ARCore/MotionTracker).
EasyAR does not treat the image target as an anchor in motion tracking, because image tracking is a separate feature. Drift cannot be solved without lower-level fusion, which is not currently available in EasyAR.
You can try the sparse spatial map feature: generate a map of the whole arena, parent the arena to the map, and use LocalizationMode.KeepUpdate to relocalize the map continuously.
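To show why parenting the arena to the map helps, here is a minimal sketch in plain Python. It is not EasyAR code: the 2D poses, the `compose` helper, and the numbers are all made up for illustration. The idea is only that arena content stores its pose relative to the map, so each new localization result moves the content with the map and the accumulated VIO drift is corrected.

```python
import math

def compose(parent, child):
    """Compose two 2D poses (x, y, heading in radians).

    `child` is expressed in the parent's frame; the result is the child's
    pose in whatever frame `parent` is expressed in.
    """
    px, py, pt = parent
    cx, cy, ct = child
    return (
        px + math.cos(pt) * cx - math.sin(pt) * cy,
        py + math.sin(pt) * cx + math.cos(pt) * cy,
        pt + ct,
    )

# Hypothetical pose of one arena object relative to the map.
# It never changes, because the object is "parented" to the map.
ARENA_IN_MAP = (2.0, 0.0, 0.0)

def arena_in_session(map_in_session):
    """Pose of the arena object in the VIO session frame for a given map pose."""
    return compose(map_in_session, ARENA_IN_MAP)

# First localization: the map origin coincides with the session origin.
first = arena_in_session((0.0, 0.0, 0.0))

# Later the VIO session has drifted. A fresh localization reports the map at a
# shifted and slightly rotated pose, and the arena object follows it.
later = arena_in_session((0.3, -0.1, math.radians(5)))
```

In a Unity scene the same effect comes from the transform hierarchy: if the arena is a child of the map object, updating the map's pose on each localization is enough, and nothing needs to be recomputed by hand.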
Before suggesting anything further, I would like to know what problems the drift causes and how different users are involved in the game. I can think of two possible problems so far. The first is that the drifted arena no longer aligns with the image when the user comes back to the entrance. But what about away from the entrance? Are any other parts of the arena sensitive to the drift? The second is that multi-user interactions do not line up, but I am not sure whether this happens or what it looks like.