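library( dplyr )    # provides the %>% pipe used below
library( pander )   # provides pander() for formatted tables

URL <- "https://github.com/DS4PS/cpp-524-sum-2020/blob/master/labs/data/female-np-entrepreneurs.rds?raw=true"
dat <- readRDS(gzcon(url( URL )))
head( dat )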
## gender age income edu.level years.prof.exp experience.np.create
## 1 Female 54 79669 Graduate 11-15 No
## 2 Female 62 63474 Graduate 15+ No
## 3 Female 70 27887 Graduate 15+ Yes
## 4 Male 63 63474 Graduate 15+ Yes
## 5 Female 60 170832 Graduate 15+ Yes
## 6 Female 41 69531 Graduate 6-10 Yes
## experience.np.form experience.np.other take.on.debt seed.funding
## 1 No Yes $0 No
## 2 Yes Yes $0 No
## 3 Yes Yes $0 No
## 4 No Yes $0 No
## 5 Yes Yes $0 Yes
## 6 No No $0 Yes
## most.imp.fund.source
## 1 Donations
## 2 Gov Grant
## 3 Donations
## 4 Donations
## 5 Corp Grant
## 6 Gov Grant
Comparing education levels of male and female entrepreneurs. Survey question: “What is the highest level of education you achieved?”
Tables
a <- table( dat$edu.level, dat$gender )
a %>% prop.table( margin=1 ) %>% round(2) %>% pander()
| | Female | Male |
|---|---|---|
| None | 0.47 | 0.53 |
| High School | 0.4 | 0.6 |
| Some College | 0.57 | 0.43 |
| Bachelor | 0.6 | 0.4 |
| Graduate | 0.52 | 0.48 |
Chi Square Tests
chisq.test( a, simulate.p.value = TRUE , B = 10000 )
##
## Pearson's Chi-squared test with simulated p-value (based on 10000
## replicates)
##
## data: a
## X-squared = 4.3831, df = NA, p-value = 0.3588
The chi-square p-value (0.3588) is greater than 0.05, so we cannot reject the null hypothesis that education level and gender are independent. Women make up a slightly larger share of founders at the Some College, Bachelor, and Graduate levels, but the differences are not statistically significant.
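Because the table uses `margin=1`, each row sums to 1 and shows the gender mix within one education level, not the education mix within each gender. The sketch below illustrates the distinction with a small table of hypothetical counts (the numbers and the names `counts`, `tab`, `row_props`, and `col_props` are made up for illustration and are not the survey data).

```r
# Hypothetical counts for illustration only (not the survey data)
counts <- matrix( c( 30, 20,
                     60, 40 ),
                  nrow = 2, byrow = TRUE,
                  dimnames = list( edu    = c( "Bachelor", "Graduate" ),
                                   gender = c( "Female", "Male" ) ) )
tab <- as.table( counts )

# margin = 1: proportions within each row (gender mix per education level)
row_props <- prop.table( tab, margin = 1 )

# margin = 2: proportions within each column (education mix per gender)
col_props <- prop.table( tab, margin = 2 )

round( row_props, 2 )
round( col_props, 2 )
```

In this toy table women are 60% of every row, yet the column proportions are identical for both genders, so a female share above 50% in a row does not by itself mean women are more highly educated.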
Compare work experience for male and female entrepreneurs. “Do males or females have more professional* work experience?”
b <- table( dat$years.prof.exp, dat$gender )
b %>% prop.table( margin=1 ) %>% round(2) %>% pander()
| | Female | Male |
|---|---|---|
| 0 | 0.73 | 0.27 |
| 1-2 | 0.67 | 0.33 |
| 3-5 | 0.62 | 0.38 |
| 6-10 | 0.57 | 0.43 |
| 11-15 | 0.58 | 0.42 |
| 15+ | 0.53 | 0.47 |
chisq.test( b, simulate.p.value = TRUE , B = 10000 )
##
## Pearson's Chi-squared test with simulated p-value (based on 10000
## replicates)
##
## data: b
## X-squared = 4.0086, df = NA, p-value = 0.5764
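The chi-square p-value (0.5764) is greater than 0.05, so we cannot reject the null hypothesis that professional experience and gender are independent. Women make up a larger share of founders at the lower experience levels (73% of those with no professional experience versus 53% of those with 15+ years), but this pattern is not statistically significant.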
Do females or males on average receive more seed funding?
c <- table( dat$seed.funding, dat$gender )
c %>% prop.table( margin=1 ) %>% round(2) %>% pander()
| | Female | Male |
|---|---|---|
| No | 0.55 | 0.45 |
| Yes | 0.54 | 0.46 |
chisq.test( c, simulate.p.value = TRUE , B = 10000 )
##
## Pearson's Chi-squared test with simulated p-value (based on 10000
## replicates)
##
## data: c
## X-squared = 0.033448, df = NA, p-value = 0.8558
The chi-square p-value (0.8558) is well above 0.05, so we cannot reject the null hypothesis that seed funding and gender are independent. Women account for nearly the same share of founders who received seed funding (54%) as of those who did not (55%), so the data give no evidence that either gender receives seed funding more often.
Are males or females more willing to take on debt to start a business?
d <- table( dat$take.on.debt, dat$gender )
d %>% prop.table( margin=1 ) %>% round(2) %>% pander()
| | Female | Male |
|---|---|---|
| $0 | 0.57 | 0.43 |
| $0k-$10k | 0.6 | 0.4 |
| $10k-$25k | 0.39 | 0.61 |
| $25k-$50k | 0.47 | 0.53 |
| $50k+ | 0.36 | 0.64 |
chisq.test( d, simulate.p.value = TRUE , B = 10000 )
##
## Pearson's Chi-squared test with simulated p-value (based on 10000
## replicates)
##
## data: d
## X-squared = 8.6158, df = NA, p-value = 0.06689
The chi-square p-value (0.06689) is slightly above 0.05, so we cannot reject independence at the 5% level. The table does suggest that men make up a larger share of founders in the higher debt categories ($10k and above), but this pattern falls short of statistical significance.
Do males and females differ in the funding source that is most important to their businesses?
e <- table( dat$most.imp.fund.source, dat$gender )
e %>% prop.table( margin=1 ) %>% round(2) %>% pander()
| | Female | Male |
|---|---|---|
| Donations | 0.51 | 0.49 |
| Founder | 0.56 | 0.44 |
| Earned Revenues | 0.66 | 0.34 |
| Foundation Grant | 0.59 | 0.41 |
| Gov Grant | 0.59 | 0.41 |
| Member Fees | 0.46 | 0.54 |
| Parent Org | 0.5 | 0.5 |
| Angel | 0.52 | 0.48 |
| Corp Grant | 0.67 | 0.33 |
chisq.test( e, simulate.p.value = TRUE , B = 10000 )
##
## Pearson's Chi-squared test with simulated p-value (based on 10000
## replicates)
##
## data: e
## X-squared = 8.8304, df = NA, p-value = 0.367
The chi-square p-value (0.367) is well above 0.05, so we cannot reject the null hypothesis that the most important funding source and gender are independent. The female share varies across sources (from 46% for member fees to 67% for corporate grants), but these differences are not statistically significant.
Does the average age of nonprofit founders differ by gender?
chisq.test( f, simulate.p.value = TRUE , B = 10000 )
##
## Pearson's Chi-squared test with simulated p-value (based on 10000
## replicates)
##
## data: f
## X-squared = 98.545, df = NA, p-value = 5e-04
t.test( age ~ gender, data=dat )
##
## Welch Two Sample t-test
##
## data: age by gender
## t = -3.1749, df = 589.02, p-value = 0.001577
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
## -5.031881 -1.185709
## sample estimates:
## mean in group Female mean in group Male
## 51.93948 55.04828
get_pval_from_tval <- function( tvalue, n1, n2 )
{
  df <- min( n1, n2 ) - 1   # conservative df: smaller of the treated and comparison group sizes, minus one
  pval <- 2 * pt( abs(tvalue), df=df, lower.tail=FALSE )   # two-tailed p-value
  return( pval )
}
get_pval_from_tval( tvalue=-3.1749, n1=655, n2=657 )
## [1] 0.001569263
The Welch t-test indicates that male nonprofit entrepreneurs are, on average, about 3.1 years older than female nonprofit entrepreneurs (55.0 versus 51.9 years; 95% confidence interval for the difference: 1.2 to 5.0 years; p = 0.0016). The Bonferroni-adjusted alpha is discussed in Q8.
Were the income levels of male and female entrepreneurs different at the time of inception?
chisq.test( g, simulate.p.value = TRUE , B = 10000 )
##
## Pearson's Chi-squared test with simulated p-value (based on 10000
## replicates)
##
## data: g
## X-squared = 601.88, df = NA, p-value = 0.6753
t.test( income ~ gender, data=dat )
##
## Welch Two Sample t-test
##
## data: income by gender
## t = -3.6353, df = 630.22, p-value = 0.0003003
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
## -20239.710 -6042.518
## sample estimates:
## mean in group Female mean in group Male
## 67741.83 80882.95
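The chi-square p-value (0.6753) is well above 0.05, but a chi-square test is poorly suited to a continuous variable such as income, so the t-test is the more informative comparison here. The Welch t-test shows that mean income at the time of starting a nonprofit was significantly higher for men ($80,883) than for women ($67,742), a difference of about $13,100 (95% confidence interval: $6,043 to $20,240; p = 0.0003).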
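If a chi-square test on income is still wanted, one common approach is to group the incomes into a small number of bins before tabulating, so that each cell has a reasonable count. The sketch below shows that binning step on simulated stand-in data; the data frame `sim`, the quartile bins, and the name `g_binned` are assumptions made for illustration and are not taken from the survey or the original analysis.

```r
set.seed( 1 )

# Simulated stand-in data (hypothetical; not the survey responses)
n   <- 200
sim <- data.frame( gender = rep( c( "Female", "Male" ), each = n / 2 ),
                   income = round( rlnorm( n, meanlog = 11, sdlog = 0.5 ) ) )

# Cut income into quartile bins so the contingency table is not sparse
breaks         <- quantile( sim$income, probs = seq( 0, 1, 0.25 ) )
sim$income.bin <- cut( sim$income, breaks = breaks, include.lowest = TRUE )

# 4 x 2 table of income bin by gender, then the usual chi-square test
g_binned <- table( sim$income.bin, sim$gender )
g_binned
chisq.test( g_binned )
```

Quartiles are only one possible choice of bins; fixed dollar ranges would work equally well as long as the cells are not too thin.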
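- A Bonferroni-corrected alpha helps guard against Type I errors when running several contrasts. With the seven contrasts above, the corrected threshold is 0.05 / 7 ≈ 0.007.
- Among the chi-square tests, the lowest p-value was that of age (5e-04), and it was the only one below 0.05. Among the t-tests, income had the lowest p-value (0.0003), followed by age (0.0016).
- These p-values remain below the Bonferroni-corrected threshold, so the null hypothesis of no difference is still rejected for age and, by the t-test, for income. For education, professional experience, seed funding, debt, and funding source, the male and female entrepreneurs surveyed are statistically indistinguishable.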
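The Bonferroni correction mentioned above can be checked directly with base R's `p.adjust()`. The sketch below copies the p-values reported earlier in this document; treating these seven tests as the family of contrasts, and using the t-test p-values for age and income, are assumptions of this sketch.

```r
# p-values copied from the output above: chi-square for the five categorical
# questions, Welch t-test for age and income (an assumed choice of family)
p_raw <- c( education    = 0.3588,
            experience   = 0.5764,
            seed.funding = 0.8558,
            debt         = 0.06689,
            fund.source  = 0.367,
            age          = 0.001577,
            income       = 0.0003003 )

alpha     <- 0.05
alpha_adj <- alpha / length( p_raw )   # Bonferroni-corrected threshold

# Equivalent view: inflate each p-value by the number of tests (capped at 1)
p_adj <- p.adjust( p_raw, method = "bonferroni" )

round( alpha_adj, 4 )
round( p_adj, 4 )

# Contrasts that remain significant after the correction
names( p_raw )[ p_raw < alpha_adj ]
```

Comparing the raw p-values with `alpha_adj` and comparing the adjusted p-values with the original alpha are two views of the same decision rule.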
```