경영통계분석 과제2

Question 1

At one large midwest university, about 40% of the college seniors have a social science major. Five seniors will be selected at random. Let X denote the number that don’t have a social science major.

(a) List the probability distribution

학생을 선정할 때, 사회과학 전공이 아닌 경우를 1로, 사회과학 전공인 경우를 0으로 하면 이는 확률 0.6의 독립된 베르누이 시행으로 볼 수 있습니다.

따라서, 이를 5번 반복하는 확률변수 $X$는 $X\sim B(5,0.6)$를 따른다고 볼 수 있으며, $P(X=k)={5\choose k}0.6^k0.4^{(5-k)}$이므로 이에 따른 확률분포는 아래와 같습니다.

kX	0	1	2	3	4	5
P(X=k)	0.0102	0.0768	0.230	0.346	0.259	0.0778

library(tidyverse)

binomial <- tibble(k=c(0,1,2,3,4,5), prob=choose(5,k)*0.6^k*0.4^(5-k))
binomial

# A tibble: 6 × 2
      k   prob
  <dbl>  <dbl>
1     0 0.0102
2     1 0.0768
3     2 0.230 
4     3 0.346 
5     4 0.259 
6     5 0.0778

(b) Calculate mean and variance from the entries in the list from part (a).

평균은 3, 분산은 1.2입니다.

mean=sum(binomial$k*binomial$prob)
variance=sum(binomial$k^2*binomial$prob)-mean^2
paste(mean,variance,sep=" / ")

[1] "3 / 1.2"

(c) Calculate E(X)=np and Var(X)=np(1-p) and compare your answer with part (b).

np와 np(1-p)의 값은 각각 3, 1.2로, 평균 및 분산과 같습니다.

Question 2

If X has a normal distribution with μ = 100 and σ = 5 , find b such that

(a) P(X<b)=0.67

b=102.2

# Set parameter
mean <- 100; std <- 5
# (a) P(X<b)=0.67, Using bisection method
i <- mean-3*std;j <- mean+3*std;mid <- (i+j)/2
while(abs(pnorm(mid,mean,std)-0.67)>0.000001){
  if(pnorm(mid,mean,std)-0.67>=0){j <- mid}
  if(pnorm(mid,mean,std)-0.67<0){i <- mid}
  mid=(i+j)/2
}
paste(j,pnorm(j,mean,std),sep=" / ")

[1] "102.200012207031 / 0.670032330457297"

(b) P(X>b)=0.011

b=111.451

# (b) P(X>b)=0.011, Using bisection method
i <- mean-3*std;j <- mean+3*std;mid <- (i+j)/2
while(abs(pnorm(mid,mean,std)-(1-0.011))>0.000001){
  if(pnorm(mid,mean,std)-(1-0.011)>=0){j <- mid}
  if(pnorm(mid,mean,std)-(1-0.011)<0){i <- mid}
  mid=(i+j)/2}
paste(i,1-pnorm(i,mean,std),sep=" / ")

[1] "111.451416015625 / 0.0110024524392477"

(c) P(|X-100|<b)=0.966

b=10.601

# (c) P(|X-100|<b)=0.966, Using bisection method
i <- mean-3*std;j <- mean+3*std;mid <- (i+j)/2
while(abs(1-(1-pnorm(j,mean,std))*2-0.966)>0.000001){
  if(pnorm(mid,mean,std)-(1-(1-0.966)/2)>=0){j <- mid}
  if(pnorm(mid,mean,std)-(1-(1-0.966)/2)<0){i <- mid}
  mid=(i+j)/2
}
paste(j-100,1-(1-pnorm(j,mean,std))*2,sep=" / ")

[1] "10.6003761291504 / 0.966000298159686"

(d) P(X<110)=b

b=0.9772

# (d) P(X<110)=b, Using bisection method
b_d <- pnorm(110,mean,std); b_d

[1] 0.9772499

(e) P(X>95)=b

b=0.8413

# (e) P(X>95)=b
b_e <- 1-pnorm(95,mean,std); b_e

[1] 0.8413447

Question 3

Suppose the amount of sun block lotion in plastic bottles leaving a filling machine has a normal distribution. The bottles are labeled 300 milliliter(ml) but the actual mean is 302 ml and the standard deviation is 2 ml.

(a) What is the probability that an individual bottle will contain less than 299 ml?

로션의 양(ml)에 대한 확률변수 $X$는 $X\sim N(302,2^2)$를 따르므로, $P(X<299)\approx 6.68\%$

pnorm(299,302,2)

[1] 0.0668072

(b) If you pick up 5 bottles and check the amount of sun block lotion in each bottle, what is the probability that all of 5 bottles contain less than 299 ml? (Assume that they are independent.)

각각의 병이 확률변수 $X_i,\;i=1,2,3,4,5$라고 하면 $X_i\sim N(302,2^2)$이며 각 확률변수는 독립이므로 $P(X_i\in k_1,\;X_j\in k_2)=P(X_i\in k_1)P(X_j\in k_2)$가 성립합니다.

따라서, 모든 병이 299ml 미만일 확률 $P(X_i<299\;for\;all\;i)=\prod_{k=1}^5 P(X_i<299)\approx 0.668^5$

pnorm(299,302,2)^5

[1] 1.330811e-06

Question 4

The number of complaints per day, X, received by a cable TV distributor has the probability distribution

x	0	1	2	3
f(x)	0.4	0.3	0.1	0.2

(a) Find the expected value and the standard deviation of the number of complaints per day.

평균은 1.1, 표준편차는 $\sqrt{1.29}\approx 1.136$입니다.

complaints <- tibble(x=c(0,1,2,3), prob=c(0.4,0.3,0.1,0.2))
mean <- sum(complaints$x*complaints$prob)
vol <- sum(complaints$x^2*complaints$prob)-mean^2
std <- sqrt(vol)
paste(mean,std,sep=" / ")

[1] "1.1 / 1.13578166916005"

(b) What is the approximate probability that the distributor will receive more than 2 complaints in average during 100 days?

먼저, $X_i$를 i일 뒤에 컴플레인이 들어오는 횟수라고 한다면 $X_i$는 위 분포를 따르게 될 것입니다. 100일 뒤까지의 컴플레인이 들어오는 횟수를 $X_1,\;X_2\;,...,X_{100}$이라고 한다면, 각각의 $X_i$는 독립이고 동일한 분포(iid)를 가지는 표본입니다.

한편, 중심극한정리(Central Limit Theorem)에 따라, 표본의 크기가 충분히 크다면 해당 표본들을 정규분포로 근사시킬 수 있습니다.

\[if\;X_i\;are\;iid,\;\frac{\sum_{k=1}^{n}(X_i-\mu)}{\sqrt{n}\times\sigma}\sim N(0,1)\;for\;large\;n\]

100번의 독립시행 $X_1\sim X_{100}$은 정규근사하기 충분한 표본이므로 이를 적용하면 다음과 같습니다.

\[By\;CLT,\;\frac{\sum_{k=1}^{100}(X_i-1.1)}{\sqrt{100}\times\sqrt{1.29}}=\frac{\sum_{k=1}^{100}X_i-110}{\sqrt{129}}\sim N(0,1)\]

이제, 100일 동안의 일평균 컴플레인 횟수가 2보다 클 확률은 다음과 같습니다.

\[P[\frac{\sum_{k=1}^{100}X_i}{100}>2]=P[\frac{\sum_{k=1}^{100}X_i-110}{\sqrt{129}}=Z>\frac{90}{\sqrt{129}}]\]

해당 확률을 R을 통해 구해보면, 0에 가깝습니다.

1-pnorm(90/sqrt(129),0,1)

[1] 1.110223e-15

(c) If we observe for five days, what is the probability that the distributor will receive exactly 1 complaint only for two days?

컴플레인을 1개만 받을 확률은 0.3입니다.

하루를 컴플레인을 1개만 받는 경우를 1, 1개가 아닌 경우를 0으로만 나눈다면, 해당 시행은 확률 0.3의 베르누이 시행입니다. 이를 5일 동안 관측한다면 각각은 독립이므로 해당 분포는 $B(5,0.3)$인 이항분포입니다.

즉, 5일중 2일만 컴플레인을 정확히 1개 받을 확률은 ${5\choose2}0.3^20.7^3=30.87\%$

choose(5,2)*0.3^2*0.7^3

[1] 0.3087

(d) If we observe for 100 days, what is the approximate distribution of the number of days that the distributor will not receive any complaint?

하루에 컴플레인이 없을 확률은 0.4로, 컴플레인이 없는 경우를 1, 있는 경우를 0으로 나눈다면 해당 시행은 확률 0.4의 베르누이 시행으로 볼 수 있습니다.

이를 100일간의 표본으로 나누어 시행하면 하루단위 시행은 각각 독립이므로, 이는 확률 0.4 및 시행횟수 100의 이항분포($B(100,0.4)$)를 따르게 됩니다.

한편, 해당 이항분포는 동일한 확률을 가진 100개의 베르누이시행($B_i\sim B(1,0.4)$)으로 나누어 볼 수 있고, (b)와 동일하게 CLT를 이용한다면 아래와 같이 표현할 수 있습니다. \[\frac{\sum_{k=1}^{100}(B_i-0.4)}{\sqrt{100}\sqrt{1\times 0.4 \times 0.6}}\sim N(0,1),\;Since\;X=\sum_{k=1}^{100}B_i,\;\frac{X-40}{\sqrt{24}}\sim N(0,1)\;\Rightarrow\;X\sim N(40,24)\]

즉, 평균이 40이고 분산이 24인 정규분포로 근사할 수 있습니다.

(e) Using the approximate distribution in (d), what is the approximate probability that the number of days that the distributor will not receive any complaint is at most 30 days?

해당 확률은 $P(X\leq 30)\;for\;X\sim B(100,0.4)$과 같습니다.

위의 정규분포 근사를 활용하면 $X\sim N(40,24),\;then\;P(X\leq 30)\approx 2.06\%$

pnorm(30,40,sqrt(24))

[1] 0.02061342

Question 5

The probability that a voter will believe a rumor about a politician is 0.3.

(a) Find the probability that the first 3 voters don’t believe the rumor but the 4th voter believe it.

$0.7\times 0.7\times 0.7\times 0.3=0.1029=10.29\%$

(b) Find the probability that the exactly one person believe the rumor if 5 voters are told individually

각 유권자는 소문을 개별적으로 확인하였으므로 유권자들이 루머를 믿는 확률변수는 각각 독립입니다.

루머를 믿을 확률은 0.3이므로 루머를 믿는 경우를 1, 안믿는 경우를 0이라고 한다면 이는 확률 0.3의 베르누이 시행입니다. 이를 5번 반복하면 5명의 유권자 중 루머를 믿는 유권자의 수는 $B(5,0.3)$를 따릅니다.

정확히 한사람만 루머를 믿을 확률은, ${5\choose1}0.3^10.7^4\approx 36.02\%$

choose(5,1)*0.3*0.7^4

[1] 0.36015

Question 6

Here is the assignment of probabilities that describes the age (in years) and the gender of a randomly selected American College student.

Category	14~18	18~25	25~35	35~
Male	0.01	0.28	0.13	0.04
Femail	0.02	0.3	0.14	0.08

A college student will be selected at random. Let A=[student is Female] and B=[student is at least 25 but less than 35 years old]. Find,

(a) P(A) and P(B)

$P(A)=0.02+0.3+0.14+0.08=0.54$

$P(B)=0.13+0.14=0.27$

(b) P(A or B)

$P(A\,or\,B)=P(A\cup B)=P(A)+P(B)-P(A\cap B)=0.54+0.27-0.14=0.67$

Question 7

Show the following statement:If $X\sim t[k]$, then $X^2\sim F[1,k]$

\[X\sim t[k]\;\Leftrightarrow\;X=\frac{Z}{\sqrt(V/k)}\;for\;Z\sim N(0,1),\;V\sim \chi^2(k)\]

\[\Leftrightarrow X^2=\frac{Z^2}{V/k}=\frac{U/1}{V/k}\;for\;U=Z^2\sim \chi^2(1)\]

\[\Leftrightarrow F=X^2=\frac{U/1}{V/k}\sim F(1,k)\;for\;U\sim\chi^2(1),\;V\sim\chi^2(k)\;\;\;\square\]

# 경영통계분석 과제2 {.unnumbered} ## Question 1 At one large midwest university, about 40% of the college seniors have a social science major. Five seniors will be selected at random. Let X denote the number that **don’t** have a social science major. **(a) List the probability distribution** 학생을 선정할 때, 사회과학 전공이 아닌 경우를 1로, 사회과학 전공인 경우를 0으로 하면 이는 확률 0.6의 독립된 베르누이 시행으로 볼 수 있습니다. 따라서, 이를 5번 반복하는 확률변수 $X$는 $X\sim B(5,0.6)$를 따른다고 볼 수 있으며, $P(X=k)={5\choose k}0.6^k0.4^{(5-k)}$이므로 이에 따른 확률분포는 아래와 같습니다. |k\in X| 0| 1| 2| 3| 4| 5| |:----:|--:|--:|--:|--:|--:|--:| |P(X=k)|0.0102|0.0768|0.230|0.346|0.259|0.0778| ```{r} #| output: FALSE library(tidyverse) ``` ```{r} binomial <- tibble(k=c(0,1,2,3,4,5), prob=choose(5,k)*0.6^k*0.4^(5-k)) binomial ``` **(b) Calculate mean and variance from the entries in the list from part (a).** 평균은 3, 분산은 1.2입니다. ```{r} mean=sum(binomial$k*binomial$prob) variance=sum(binomial$k^2*binomial$prob)-mean^2 paste(mean,variance,sep=" / ") ``` **(c) Calculate E(X)=np and Var(X)=np(1-p) and compare your answer with part (b).** np와 np(1-p)의 값은 각각 3, 1.2로, 평균 및 분산과 같습니다. ## Question 2 If X has a normal distribution with μ = 100 and σ = 5 , find b such that **(a) P(X<b)=0.67** b=102.2 ```{r} # Set parameter mean <- 100; std <- 5 # (a) P(X<b)=0.67, Using bisection method i <- mean-3*std;j <- mean+3*std;mid <- (i+j)/2 while(abs(pnorm(mid,mean,std)-0.67)>0.000001){ if(pnorm(mid,mean,std)-0.67>=0){j <- mid} if(pnorm(mid,mean,std)-0.67<0){i <- mid} mid=(i+j)/2 } paste(j,pnorm(j,mean,std),sep=" / ") ``` **(b) P(X>b)=0.011** b=111.451 ```{r} # (b) P(X>b)=0.011, Using bisection method i <- mean-3*std;j <- mean+3*std;mid <- (i+j)/2 while(abs(pnorm(mid,mean,std)-(1-0.011))>0.000001){ if(pnorm(mid,mean,std)-(1-0.011)>=0){j <- mid} if(pnorm(mid,mean,std)-(1-0.011)<0){i <- mid} mid=(i+j)/2} paste(i,1-pnorm(i,mean,std),sep=" / ") ``` **(c) P(|X-100|<b)=0.966** b=10.601 ```{r} # (c) P(|X-100|<b)=0.966, Using bisection method i <- mean-3*std;j <- mean+3*std;mid <- (i+j)/2 while(abs(1-(1-pnorm(j,mean,std))*2-0.966)>0.000001){ if(pnorm(mid,mean,std)-(1-(1-0.966)/2)>=0){j <- mid} if(pnorm(mid,mean,std)-(1-(1-0.966)/2)<0){i <- mid} mid=(i+j)/2 } paste(j-100,1-(1-pnorm(j,mean,std))*2,sep=" / ") ``` **(d) P(X<110)=b** b=0.9772 ```{r} # (d) P(X<110)=b, Using bisection method b_d <- pnorm(110,mean,std); b_d ``` **(e) P(X>95)=b** b=0.8413 ```{r} # (e) P(X>95)=b b_e <- 1-pnorm(95,mean,std); b_e ``` ## Question 3 Suppose the amount of sun block lotion in plastic bottles leaving a filling machine has a normal distribution. The bottles are labeled 300 milliliter(ml) but the actual mean is 302 ml and the standard deviation is 2 ml. **(a) What is the probability that an individual bottle will contain less than 299 ml?** 로션의 양(ml)에 대한 확률변수 $X$는 $X\sim N(302,2^2)$를 따르므로, $P(X<299)\approx 6.68\%$ ```{r} pnorm(299,302,2) ``` **(b) If you pick up 5 bottles and check the amount of sun block lotion in each bottle, what is the probability that all of 5 bottles contain less than 299 ml?** (Assume that they are independent.) 각각의 병이 확률변수 $X_i,\;i=1,2,3,4,5$라고 하면 $X_i\sim N(302,2^2)$이며 각 확률변수는 독립이므로 $P(X_i\in k_1,\;X_j\in k_2)=P(X_i\in k_1)P(X_j\in k_2)$가 성립합니다. 따라서, 모든 병이 299ml 미만일 확률 $P(X_i<299\;for\;all\;i)=\prod_{k=1}^5 P(X_i<299)\approx 0.668^5$ ```{r} pnorm(299,302,2)^5 ``` ## Question 4 The number of complaints per day, X, received by a cable TV distributor has the probability distribution |x | 0| 1| 2| 3| |:--:|:--:|:--:|:--:|:--:| |f(x)|0.4|0.3|0.1|0.2| **(a) Find the expected value and the standard deviation of the number of complaints per day.** 평균은 1.1, 표준편차는 $\sqrt{1.29}\approx 1.136$입니다. ```{r} complaints <- tibble(x=c(0,1,2,3), prob=c(0.4,0.3,0.1,0.2)) mean <- sum(complaints$x*complaints$prob) vol <- sum(complaints$x^2*complaints$prob)-mean^2 std <- sqrt(vol) paste(mean,std,sep=" / ") ``` **(b) What is the approximate probability that the distributor will receive more than 2 complaints in average during 100 days?** 먼저, $X_i$를 i일 뒤에 컴플레인이 들어오는 횟수라고 한다면 $X_i$는 위 분포를 따르게 될 것입니다. 100일 뒤까지의 컴플레인이 들어오는 횟수를 $X_1,\;X_2\;,...,X_{100}$이라고 한다면, 각각의 $X_i$는 독립이고 동일한 분포(iid)를 가지는 표본입니다. 한편, 중심극한정리(Central Limit Theorem)에 따라, 표본의 크기가 충분히 크다면 해당 표본들을 정규분포로 근사시킬 수 있습니다. $$if\;X_i\;are\;iid,\;\frac{\sum_{k=1}^{n}(X_i-\mu)}{\sqrt{n}\times\sigma}\sim N(0,1)\;for\;large\;n$$ 100번의 독립시행 $X_1\sim X_{100}$은 정규근사하기 충분한 표본이므로 이를 적용하면 다음과 같습니다. $$By\;CLT,\;\frac{\sum_{k=1}^{100}(X_i-1.1)}{\sqrt{100}\times\sqrt{1.29}}=\frac{\sum_{k=1}^{100}X_i-110}{\sqrt{129}}\sim N(0,1)$$ 이제, 100일 동안의 일평균 컴플레인 횟수가 2보다 클 확률은 다음과 같습니다. $$P[\frac{\sum_{k=1}^{100}X_i}{100}>2]=P[\frac{\sum_{k=1}^{100}X_i-110}{\sqrt{129}}=Z>\frac{90}{\sqrt{129}}]$$ 해당 확률을 R을 통해 구해보면, 0에 가깝습니다. ```{r} 1-pnorm(90/sqrt(129),0,1) ``` **(c) If we observe for five days, what is the probability that the distributor will receive exactly 1 complaint only for two days?** 컴플레인을 1개만 받을 확률은 0.3입니다. 하루를 컴플레인을 1개만 받는 경우를 1, 1개가 아닌 경우를 0으로만 나눈다면, 해당 시행은 확률 0.3의 베르누이 시행입니다. 이를 5일 동안 관측한다면 각각은 독립이므로 해당 분포는 $B(5,0.3)$인 이항분포입니다. 즉, 5일중 2일만 컴플레인을 정확히 1개 받을 확률은 ${5\choose2}0.3^20.7^3=30.87\%$ ```{r} choose(5,2)*0.3^2*0.7^3 ``` **(d) If we observe for 100 days, what is the approximate distribution of the number of days that the distributor will not receive any complaint?** 하루에 컴플레인이 없을 확률은 0.4로, 컴플레인이 없는 경우를 1, 있는 경우를 0으로 나눈다면 해당 시행은 확률 0.4의 베르누이 시행으로 볼 수 있습니다. 이를 100일간의 표본으로 나누어 시행하면 하루단위 시행은 각각 독립이므로, 이는 확률 0.4 및 시행횟수 100의 이항분포($B(100,0.4)$)를 따르게 됩니다. 한편, 해당 이항분포는 동일한 확률을 가진 100개의 베르누이시행($B_i\sim B(1,0.4)$)으로 나누어 볼 수 있고, (b)와 동일하게 CLT를 이용한다면 아래와 같이 표현할 수 있습니다. $$\frac{\sum_{k=1}^{100}(B_i-0.4)}{\sqrt{100}\sqrt{1\times 0.4 \times 0.6}}\sim N(0,1),\;Since\;X=\sum_{k=1}^{100}B_i,\;\frac{X-40}{\sqrt{24}}\sim N(0,1)\;\Rightarrow\;X\sim N(40,24)$$ 즉, 평균이 40이고 분산이 24인 정규분포로 근사할 수 있습니다. **(e) Using the approximate distribution in (d), what is the approximate probability that the number of days that the distributor will not receive any complaint is at most 30 days?** 해당 확률은 $P(X\leq 30)\;for\;X\sim B(100,0.4)$과 같습니다. 위의 정규분포 근사를 활용하면 $X\sim N(40,24),\;then\;P(X\leq 30)\approx 2.06\%$ ```{r} pnorm(30,40,sqrt(24)) ``` ## Question 5 The probability that a voter will believe a rumor about a politician is 0.3. **(a) Find the probability that the first 3 voters don’t believe the rumor but the 4th voter believe it.** $0.7\times 0.7\times 0.7\times 0.3=0.1029=10.29\%$ **(b) Find the probability that the exactly one person believe the rumor if 5 voters are told individually** 각 유권자는 소문을 개별적으로 확인하였으므로 유권자들이 루머를 믿는 확률변수는 각각 독립입니다. 루머를 믿을 확률은 0.3이므로 루머를 믿는 경우를 1, 안믿는 경우를 0이라고 한다면 이는 확률 0.3의 베르누이 시행입니다. 이를 5번 반복하면 5명의 유권자 중 루머를 믿는 유권자의 수는 $B(5,0.3)$를 따릅니다. 정확히 한사람만 루머를 믿을 확률은, ${5\choose1}0.3^10.7^4\approx 36.02\%$ ```{r} choose(5,1)*0.3*0.7^4 ``` ## Question 6 Here is the assignment of probabilities that describes the age (in years) and the gender of a randomly selected American College student. |Category|14~18|18~25|25~35|35~| |:------:|:------:|:------:|:------:|:------:| |Male|0.01|0.28|0.13|0.04| |Femail|0.02|0.3|0.14|0.08| A college student will be selected at random. Let A=[student is Female] and B=[student is at least 25 but less than 35 years old]. Find, **(a) P(A) and P(B)** $P(A)=0.02+0.3+0.14+0.08=0.54$ $P(B)=0.13+0.14=0.27$ **(b) P(A or B)** $P(A\,or\,B)=P(A\cup B)=P(A)+P(B)-P(A\cap B)=0.54+0.27-0.14=0.67$ ## Question 7 **Show the following statement:If $X\sim t[k]$, then $X^2\sim F[1,k]$** $$X\sim t[k]\;\Leftrightarrow\;X=\frac{Z}{\sqrt(V/k)}\;for\;Z\sim N(0,1),\;V\sim \chi^2(k)$$ $$\Leftrightarrow X^2=\frac{Z^2}{V/k}=\frac{U/1}{V/k}\;for\;U=Z^2\sim \chi^2(1)$$ $$\Leftrightarrow F=X^2=\frac{U/1}{V/k}\sim F(1,k)\;for\;U\sim\chi^2(1),\;V\sim\chi^2(k)\;\;\;\square$$