Text this: Multimodal convolutional transformer (MCT-DD): depression diagnosis through joint task analysis