On names

“You are Haoyue, not Freda. Be yourself.”

Fortunately, no one has ever said these sentences to me, though I suspect some of my friends have wanted to and hesitated.

Yes, if I choose to be Haoyue, I believe kind people will learn to pronounce it correctly, though it is insane to ask people who are not native Chinese speakers to pronounce “Haoyue” so accurately that I can recognize it almost every time.

It has been extremely frustrating for me to be greeted with “hey bro” in chats by people who have only seen my name in Chinese characters: the character Hao, which means “great,” is typically used in boys’ names. Personally, I can put these unpleasant memories behind me simply by using the name Freda.

I adore and appreciate Chinese culture: I read and write poems in Chinese, and I converse with my Chinese friends in Chinese. Still, I prefer to be Freda, at least in English contexts. I have the reasons stated above for my name preference, but I believe I should not need to explain them in order for people to call me Freda.

Chinese people have become more critical of other Chinese people choosing English names for themselves, particularly in light of the current political situation, in which nationalism is becoming increasingly popular among young Chinese people. However, I believe that a person’s name should be their own choice. On the one hand, if they choose to use their name in their native language, reasonable others should support them and do their best to learn how to pronounce it. On the other hand, and equally important, if they choose to use another name, reasonable others should respect that choice rather than forcing the person to “be themselves.” It’s wonderful to encourage friends to be themselves, but preferring a name that isn’t in their native language rarely, if ever, means they’re losing themselves.

Whether I am Freda or Haoyue, I was, am, and will be myself.

Evaluating diversity in machine language generation

Yesterday I was talking with Han Shao about how to measure diversity in machine language generation. Suppose there are three systems, A, B, and C:

A generates 3 examples; each example is in a different pattern.
B generates 100 examples; each example is in one of 5 patterns, uniformly (i.e., each pattern has 20 examples).
C generates 100 examples, 96 of which are in pattern (a), while the remaining 4 are each in a different other pattern.

Which is the most diverse one? We found it difficult to answer this question quantitatively. But I have somewhat convinced myself that it can be measured as follows, using simple equations from information theory.

Let’s assume that each pattern is independent of all the others. It’s also necessary to assume that the observed empirical distribution is the true distribution that the model represents, though we should let the models generate as many examples as they can to obtain a good estimate of the true distribution.

Let P_{\Theta}(x) denote the probability that model \Theta generates pattern x. The entropy of this distribution is

H(P_\Theta ) = -\sum_x P_\Theta (x)\log P_\Theta(x)

Larger entropy typically means better diversity. We can then compute the entropy of the above three distributions:

H(P_A) = -3 \cdot \frac13 \log\frac13 = \log 3

H(P_B) = -5 \cdot \frac15 \log\frac15 = \log 5 > H(P_A)

H(P_C) = -\frac{96}{100} \log\frac{96}{100} - 4\cdot \frac{1}{100} \log\frac{1}{100} = \frac{100\log 100 - 96\log 96}{100} < \log 3 = H(P_A)
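
To check these numbers, here is a minimal Python sketch that computes the entropy of each system's empirical pattern distribution. The helper name `entropy_from_counts` and the count lists are my own illustration, and I assume the natural logarithm (nats); another base rescales every value by the same factor, so the ranking does not change.

```python
import math

def entropy_from_counts(counts):
    """Shannon entropy (in nats) of the empirical distribution
    implied by a list of per-pattern example counts."""
    total = sum(counts)
    # Patterns with zero count contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Per-pattern counts for the three hypothetical systems described above.
system_counts = {
    "A": [1, 1, 1],          # 3 examples, 3 distinct patterns
    "B": [20] * 5,           # 100 examples, 5 patterns, uniform
    "C": [96, 1, 1, 1, 1],   # 100 examples, one dominant pattern
}

entropies = {name: entropy_from_counts(c) for name, c in system_counts.items()}
for name, h in entropies.items():
    print(f"H(P_{name}) = {h:.4f} nats")
```

Under these assumptions, B comes out as the most diverse (about 1.61 nats), followed by A (about 1.10), with C far behind (about 0.22) even though it produces five distinct patterns.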