Text to visual synthesis with appearance models