The best way to build an efficient data architecture is to take a strategic approach rather than a tactical one. Extracting value from the organization's data should be looked at first from a business point of view; the technical choices follow from that.
To build a data architecture based on Hadoop, the main piece to look at is the distributed file system, HDFS. The other pieces of the technical platform should be linked to this core as services, through an abstraction layer such as YARN. This way, we build a modular and flexible platform whose guiding principle is service orientation (Service Oriented Architecture). In fact, all the tools, whether for data processing (MapReduce, Hive, Pig, Spark) or for data access (Flume, Sqoop), can be seen as services around the data stored on HDFS.
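The idea of tools acting as services around one shared storage core can be sketched in a few lines of Python. This is only an illustration of the design, not Hadoop code: `Storage`, `ServiceLayer`, `ingest` and `word_count` are hypothetical names, the storage is an in-memory dictionary standing in for HDFS, and the service layer only dispatches calls, whereas the real YARN schedules resources across a cluster.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Storage:
    """Hypothetical stand-in for the shared storage core (HDFS in the text)."""
    files: Dict[str, List[str]] = field(default_factory=dict)

    def write(self, path: str, lines: List[str]) -> None:
        self.files[path] = list(lines)

    def read(self, path: str) -> List[str]:
        return list(self.files[path])


# A service is any callable that takes the storage core plus keyword arguments.
Service = Callable[..., object]


@dataclass
class ServiceLayer:
    """Hypothetical stand-in for the abstraction layer between tools and storage."""
    storage: Storage
    services: Dict[str, Service] = field(default_factory=dict)

    def register(self, name: str, service: Service) -> None:
        # Plugging in a new tool does not require touching the storage core.
        self.services[name] = service

    def run(self, name: str, **kwargs):
        # Every service receives the same storage core.
        return self.services[name](self.storage, **kwargs)


def ingest(storage: Storage, path: str, lines: List[str]) -> int:
    """Data access service: lands raw lines on the shared storage."""
    storage.write(path, lines)
    return len(lines)


def word_count(storage: Storage, path: str) -> Dict[str, int]:
    """Data processing service: reads from the same storage and aggregates."""
    counts: Dict[str, int] = {}
    for line in storage.read(path):
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


platform = ServiceLayer(Storage())
platform.register("ingest", ingest)
platform.register("word_count", word_count)

platform.run("ingest", path="/data/logs.txt", lines=["a b a", "b c"])
result = platform.run("word_count", path="/data/logs.txt")
print(result)
```

The point of the design is that the two services never call each other: they only share the data on the storage core, so either one can be replaced or a third one added without changing the rest.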
In the end, the big issue that remains is controlling access to the data and managing identities.