# Publication Bias
![](publicationbias.jpg)
In the last chapters, we have shown you how to **pool effects** in meta-analysis, choose the **right pooling model**, assess the **heterogeneity** of your effect estimate, and determine sources of heterogeneity through **outlier, influence, and subgroup analyses**.
Nevertheless, even the most thoroughly conducted meta-analysis can only work with the **study material at hand**. An issue commonly discussed in research, however, is the **file-drawer** or **publication bias** problem, which states that a study with a high effect size **is more likely to be published** than a study with a low effect size [@rothstein2006publication].
Such **missing studies** with low effect sizes, it is assumed, never get published and therefore cannot be integrated into our meta-analysis. This leads to **publication bias**: the pooled effect we estimate in our meta-analysis might be higher than the **true effect size** because we could not consider the missing studies with lower effects, for the simple reason that they were never published.
Although this practice is gradually changing [@nelson2018psychology], whether a study is published or not heavily depends on the **statistical significance** ($p<0.05$) of its results [@dickersin2005publication]. For any sample size, a result is more likely to become **statistically significant** if its **effect size is high**. This is particularly true for **small studies**, for which very large effect sizes are needed to attain a statistically significant result.
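To illustrate this point, here is a small sketch using base R's `power.t.test()` function (the sample sizes below are arbitrary examples): it shows the standardized effect size a two-group study would need so that it has a reasonable (80%) chance of producing a significant result at $\alpha = 0.05$.
```{r,eval=FALSE}
# Effect size (Cohen's d) needed for 80% power in a two-sample t-test
power.t.test(n = 25,  power = 0.80, sig.level = 0.05)$delta  # small study: d is roughly 0.8
power.t.test(n = 250, power = 0.80, sig.level = 0.05)$delta  # large study: d is roughly 0.25
```
A small study with 25 participants per group thus needs an effect roughly three times as large as a study with 250 participants per group to have the same chance of reaching significance.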
In the [following chapter](#smallstudyeffects), we will describe the **idea behind statistical models for publication bias** in further depth. We termed these concepts and methods **small-study effect methods**, as they mostly focus on small studies. These methods assume that publication bias is primarily driven by **effect size**, and that researchers **immediately put every study in the file drawer once the results are not significant**.
Recently, it has been argued that these **assumptions may not be true**, and that publication bias is mostly caused by **significance levels** and **p-hacking** [@simonsohn2014p]. An alternative method called **P-Curve** has therefore been suggested to examine **publication bias**. We will present this method in the [last chapter](#pcurve) of this section.
```{block,type='rmdinfo'}
**Which method should I use for my meta-analysis?**
While recent research suggests that the conventional **small-study effect methods** may have substantial **limitations**, and that **P-curve** may be able to estimate the true effect with less bias [@simonsohn2014p;@simonsohn2015better;@simonsohn2014pb], please note that both methods are based on different **theoretical assumptions** about the origin of publication bias. As we cannot ultimately decide which assumption is the **"true"** one in specific research fields, and, in practice, **the true effect is unknown when doing a meta-analysis**, we argue that you may use **both methods** and compare results as **sensitivity analyses** [@harrer2019internet].
**P-curve** was developed with **full-blown experimental psychological research in mind**, in which researchers often have **high degrees of "researcher freedom"** [@simmons2011false] in deleting outliers and performing statistical tests on their data.
We argue that this looks slightly different for **clinical psychology** and the medical field, where researchers conduct **randomized controlled trials** with a clear **primary outcome**: the difference between the control and the intervention group after the treatment. While it is also true for **medicine and clinical psychology that statistical significance plays an important role**, the **effect size** of an intervention is often of greater interest, as **treatments are often compared in terms of their treatment effects** in this field. Furthermore, best practice for randomized controlled trials is to perform **intention-to-treat** analyses, in which all data collected in a trial have to be considered, giving researchers less space to "play around" with their data and perform p-hacking. While we certainly do not want to insinuate that **outcome research in clinical psychology** is free from p-hacking and bad data analysis practices, this should be seen as a **caveat** that the assumptions of the small-study effect methods may be more adequate for clinical psychology than for other fields within psychology, especially when **the risk of bias for each study is also taken into account**.
Facing this uncertainty, we think that conducting both analyses and reporting them in our research paper may be the most adequate approach until meta-scientific research gives us more certainty about which **assumption actually best reflects the field of clinical psychology**.
```
## Small-study effect methods {#smallstudyeffects}
The **small-study effect methods** we present here have been the conventional approach to publication bias for many years. Over time, various methods to assess and control for publication bias have been developed, but we will only focus on the most important ones here.
```{block,type='rmdinfo'}
**The model behind small-study effects methods**
According to Borenstein et al. [@borenstein2011], the model behind the most common small-study effect methods rests on these core **assumptions**:
1. Because they involve a large commitment of resources and time, **large studies are likely to get published**, whether the results are significant or not
2. Moderately sized studies are at **greater risk of going missing**, but with a moderate sample size even moderately sized effects are likely to become significant, which means that only some of these studies will be missing
3. Small studies are **at greatest risk** of being non-significant, and thus of going missing. Only small studies with a very large effect size become significant and will be found in the published literature.
In accordance with these assumptions, the methods we present here particularly focus **on small studies with small effect sizes, and whether they are missing**.
```
### Funnel plots
The best way to visualize whether **small studies with small effect sizes are missing** is through **funnel plots**.
We can generate a funnel plot for our `m.hksj` meta-analysis output using the `funnel()` function in `meta`.
```{r,echo=FALSE,message=FALSE}
library(meta)
load("Meta_Analysis_Data.RData")
madata <- Meta_Analysis_Data
m.hksj <- metagen(TE,
                  seTE,
                  data = madata,
                  studlab = paste(Author),
                  comb.fixed = FALSE,
                  comb.random = TRUE,
                  method.tau = "SJ",
                  hakn = TRUE,
                  prediction = TRUE,
                  sm = "SMD")
```
```{r}
funnel(m.hksj,xlab = "Hedges' g")
```
The **funnel plot** basically consists of a **funnel** and two **axes**: the y-axis shows the **standard error** $SE$ of each study, with larger studies (which thus have a smaller $SE$) plotted toward the top of the plot; the x-axis shows the **effect size** of each study.
Given our assumptions, and in the case when there is **no publication bias**, all studies would lie **symmetrically around our pooled effect size (the vertical dashed line)** within the shape of the funnel. When **publication bias is present**, we would expect the funnel to look asymmetrical, because only the small studies with a large effect size were published, **while small studies without a significant, large effect would be missing**.
We see from the plot that in the case of our meta-analysis `m.hksj`, the latter is probably true. The plot is highly asymmetrical, with precisely the small studies with low effect sizes missing in the bottom-left corner of the plot.
We can also display the name of each study using the `studlab` parameter.
```{r}
funnel(m.hksj,xlab = "g",studlab = TRUE)
```
Here, we see that the asymmetry is primarily driven by **three studies with high effect sizes but small sample sizes** in the bottom-right corner. Interestingly, two of these studies are the ones we also detected in our [outlier](#outliers) and [influence](#influenceanalyses) analyses.
An even better way to inspect the funnel plot is through **contour-enhanced funnel plots**, which help to distinguish publication bias from other forms of asymmetry [@peters2008contour]. Contour-enhanced funnel plots include colors signifying the significance level into which the effect size of each study falls. We can plot such funnels using this code:
```{r,message=FALSE,warning=FALSE,eval=FALSE}
funnel(m.hksj, xlab = "Hedges' g",
       contour = c(.95, .975, .99),
       col.contour = c("darkblue", "blue", "lightblue"))
legend(1.4, 0, c("p < 0.05", "p < 0.025", "p < 0.01"), bty = "n",
       fill = c("darkblue", "blue", "lightblue"))
```
```{r, echo=FALSE, fig.width=7}
library(png)
library(grid)
img <- readPNG("funnel.PNG")
grid.raster(img)
```
We can see in the plot that while **some studies have statistically significant effect sizes** (blue background), others do not (white background). We also see that the moderately sized studies are a mix of significant and non-significant results (with slightly more significant ones), whereas the asymmetry is much more pronounced for the small studies. This gives us a hint that publication bias might indeed be present in our analysis.
### Testing for funnel plot asymmetry using Egger's test
**Egger's test of the intercept** [@egger1997bias] quantifies the funnel plot asymmetry and performs a statistical test.
We have prepared a function called `eggers.test` for you, which can be found below. The function is a wrapper for the `metabias` function in `meta`.
Again, R doesn't know this function yet, so we have to let R learn it by **copying and pasting** the code underneath **in its entirety** into the **console** on the bottom left pane of RStudio, and then hit **Enter ⏎**.
```{r}
eggers.test <- function(data) {
  # Run Egger's test of the intercept using metabias() from the meta package
  eggers <- metabias(data)
  # Extract and round the intercept of the regression
  intercept <- as.numeric(eggers$estimate[1])
  intercept <- round(intercept, digits = 3)
  se.intercept <- eggers$estimate[2]
  # Calculate an approximate 95% confidence interval for the intercept
  lower.intercept <- round(as.numeric(intercept - 1.96 * se.intercept), digits = 2)
  higher.intercept <- round(as.numeric(intercept + 1.96 * se.intercept), digits = 2)
  ci.intercept <- paste(lower.intercept, "-", higher.intercept)
  ci.intercept <- gsub(" ", "", ci.intercept, fixed = TRUE)
  # Extract and round the p-value of the test
  intercept.pval <- round(as.numeric(eggers$p.value), digits = 5)
  # Assemble and print the results
  eggers.output <- data.frame(intercept, ci.intercept, intercept.pval)
  names(eggers.output) <- c("intercept", "95%CI", "p-value")
  print("Results of Egger's test of the intercept")
  print(eggers.output)
}
```
Now we can use the `eggers.test` function. We only have to specify our meta-analysis output `m.hksj` as the `data` the function should use.
```{r}
eggers.test(data=m.hksj)
```
The function returns the `intercept` along with its confidence interval. We can see that the `p-value` of Egger's test is **significant** ($p<0.05$), which means that there is substantial asymmetry in the funnel plot. This asymmetry could have been caused by publication bias.
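If you do not want to use the wrapper, a minimal sketch of the underlying call is to run `metabias()` from `meta` directly on the meta-analysis object (the `method.bias = "linreg"` argument requests Egger's linear regression test):
```{r,eval=FALSE}
# Egger's regression test, using metabias() from the meta package directly
metabias(m.hksj, method.bias = "linreg")
```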
### Duval & Tweedie's trim-and-fill procedure {#dant}
**Duval & Tweedie's trim-and-fill procedure** [@duval2000trim] is also based on the funnel plot and its symmetry/asymmetry. When **Egger's test is significant**, we can use this method to estimate what **the actual effect size would be had the "missing" small studies been published**. The procedure **imputes** missing studies into the funnel plot until symmetry is reached again.
```{block,type='rmdinfo'}
**The trim-and-fill procedure includes the following five steps** [@schwarzer2015meta]:
1. Estimating the number of studies in the outlying (right) part of the funnel plot
2. Removing (trimming) these effect sizes and pooling the results with the remaining effect sizes
3. This pooled effect is then taken as the center of all effect sizes
4. For each trimmed/removed study, an additional study is imputed, mirroring the effect of the study on the left side of the funnel plot
5. Pooling the results with the imputed studies and the trimmed studies included
```
The **trim-and-fill procedure** can be performed using the `trimfill` function in `meta`, specifying our meta-analysis output.
```{r}
trimfill(m.hksj)
```
We see that the procedure identified and trimmed **eight studies** (`with 8 added studies`). The overall effect estimated by the procedure is $g = 0.34$.
Let's compare this to our initial results.
```{r}
m.hksj$TE.random
```
The initial pooled effect size was $g = 0.59$, which is substantially larger than the bias-corrected effect. If we assume that **publication bias** was indeed a problem in our analysis, the **trim-and-fill procedure** suggests that our initial result **overestimated** the effect due to publication bias, and that the "true" effect when controlling for selective publication might be $g = 0.34$ rather than $g = 0.59$.
If we store the results of the `trimfill` function in an **object**, we can also create **funnel plots including the imputed studies**.
```{r}
m.hksj.trimfill<-trimfill(m.hksj)
funnel(m.hksj.trimfill,xlab = "Hedges' g")
```
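Because the output of `trimfill` is a regular `meta` object, we can also extract the bias-corrected pooled effect and its confidence interval from the stored object directly, just like we did for `m.hksj` above:
```{r,eval=FALSE}
# Bias-corrected pooled effect (random-effects model) after trim-and-fill
m.hksj.trimfill$TE.random
# Lower and upper bound of its confidence interval
m.hksj.trimfill$lower.random
m.hksj.trimfill$upper.random
```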
## P-Curve {#pcurve}
In the last chapter, we showed you how to inspect **funnel plots** and apply **Egger's test of the intercept** and **Duval & Tweedie's trim-and-fill procedure** in R.
As we have mentioned before, recent research has shown **that the assumptions of the small-study effect methods may be inaccurate in many cases**. The **Duval & Tweedie trim-and-fill procedure** in particular has been shown to be prone to providing **inaccurate effect size estimates** [@simonsohn2014pb].
**P-curve Analysis** has been proposed as an alternative way to assess publication bias and estimate the true effect behind our collected data.
P-curve assumes that publication bias is not primarily generated because researchers **do not publish non-significant results**, but because they **"play around" with their data (e.g., selectively removing outliers, choosing different outcomes, controlling for different variables) until a non-significant finding becomes significant**. This (bad) practice is called **p-hacking**, and has been shown to be extremely frequent among researchers [@head2015extent].
```{block,type='rmdinfo'}
**The idea behind P-Curve**
<iframe width="700" height="415" src="https://www.youtube.com/embed/V7pvYLZkcK4" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
```
### Preparing RStudio and our data
To use **P-curve**, we need our data in the same format as the one we used for the `metacont` function in [Chapter 3.1.1](#excel_preparation). We need to specify `Me`, `Se`, `Ne`, `Mc`, `Sc`, and `Nc`. My `metacont` data already has this format.
```{r,echo=FALSE}
library(knitr)
load("metacont_data.RData")
kable(metacont)
```
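If your own data is not yet structured this way, you can also build such a data frame by hand. The values below are purely made up and only illustrate the required column names:
```{r,eval=FALSE}
# Hypothetical example of the data structure needed for the P-curve preparation
my.metacont <- data.frame(Author = c("Study A", "Study B"),
                          Me = c(8.5, 7.2),  # mean, experimental group
                          Se = c(2.1, 1.9),  # standard deviation, experimental group
                          Ne = c(50, 45),    # sample size, experimental group
                          Mc = c(7.1, 6.9),  # mean, control group
                          Sc = c(2.0, 2.2),  # standard deviation, control group
                          Nc = c(48, 44))    # sample size, control group
```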
To use P-curve, we also have to install and load **four packages**: the `esc` package [@esc], the `compute.es` package [@del2013compute], the `stringr` package, and the `poibin` package [@hong2011poibin].
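If some of these packages are not installed on your computer yet, you can install them all at once (this only has to be done one time):
```{r,eval=FALSE}
# Install the four packages needed for the P-curve analysis (run once)
install.packages(c("compute.es", "esc", "stringr", "poibin"))
```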
```{r,message=FALSE,warning=FALSE}
library(compute.es)
library(esc)
library(stringr)
library(poibin)
```
To **prepare our data**, which we stored in the format described above, we need to use the `pcurve_dataprep` function, which we prepared for you.
Again, R doesn't know this function yet, so we have to let R learn it by **copying and pasting** the code underneath **in its entirety** into the **console** on the bottom left pane of RStudio, and then hit **Enter ⏎**.
```{r}
pcurve_dataprep <- function(data) {
  # Extract means, standard deviations and sample sizes of both groups
  Me <- as.numeric(data$Me)
  Se <- as.numeric(data$Se)
  Ne <- as.numeric(data$Ne)
  Mc <- as.numeric(data$Mc)
  Sc <- as.numeric(data$Sc)
  Nc <- as.numeric(data$Nc)
  # Calculate the standardized mean difference (d) for each study
  esc <- esc_mean_sd(grp1m = Me,
                     grp1sd = Se,
                     grp1n = Ne,
                     grp2m = Mc,
                     grp2sd = Sc,
                     grp2n = Nc,
                     es.type = "d")
  # Convert d to a correlation coefficient r using the compute.es package
  output <- des(d = esc$es, n.1 = Ne, n.2 = Nc, verbose = FALSE)
  output$r <- abs(output$r)
  # Format the results as 'r(df)=value' strings, as required by the p-curve app
  tot <- data.frame(paste("r(", output$N.total - 2, ")=", output$r))
  colnames(tot) <- c("output")
  tot$output <- gsub(" ", "", tot$output, fixed = TRUE)
  totoutput <- as.character(tot$output)
  print(tot, row.names = FALSE)
  # Save the formatted test statistics to 'input.txt' in the working directory
  write(totoutput, ncolumns = 1, file = "input.txt")
}
```
To use the `pcurve_dataprep` function, we simply have to specify our dataset. In my case, this is `metacont`.
```{r}
pcurve_dataprep(data=metacont)
```
The function converts our effect sizes into the **format required by the p-curve app** (correlation coefficients reported as `r(df)=value`), which we need to conduct the p-curve analysis.
The function also **automatically stores a .txt file (input.txt) in your working directory folder on your computer**.
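To check that the file was written correctly, you can read it back into R (assuming it was saved to your current working directory):
```{r,eval=FALSE}
# Inspect the file that pcurve_dataprep() saved to the working directory
readLines("input.txt")
```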
If you forgot where your working directory is, use the `getwd` function.
```{r}
getwd()
```
This tells you the path to your working directory, where the **input.txt** file should be stored.
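If you want the file to be stored somewhere else, you can change the working directory before running `pcurve_dataprep`. The path below is only a placeholder; replace it with a folder on your own computer (note the forward slashes):
```{r,eval=FALSE}
# Set the working directory to the folder where input.txt should be saved
setwd("C:/Users/MyName/meta-analysis")
```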
### Creating P-Curves
To create p-curves, we have to use the `pcurve_app` function. The code for this function is very long, so it is not displayed here. The code for the function has been made publicly available online by **Uri Simonsohn, Leif Nelson, and Joseph Simmons** and can be found [here](http://p-curve.com/app4/pcurve_app4.06.r).
Again, R doesn't know this function yet, so we have to let R learn it by **copying and pasting** the code underneath **in its entirety** into the **console** on the bottom left pane of RStudio, and then hit **Enter ⏎**.
```{r,echo=FALSE}
###################################################################################################################
#This is the R Code behind the p-curve app 4.06
#Last updated: 2017 11 30
#Written by Uri Simonsohn ([email protected])
#
###################################################################################################################
#The app is a single function, pcurve_app() .
#To run it you need to store statistical tests in a textfile with the format from the example, see http://p-curve.com/app4/example.txt
#Then, you just run the function on that file. For exmaple, if you save the file "example2.txt" in the folder "c:\data" then you would run:
#pcurve_app("example2.txt","c://data"). The app generates various text files with the results in table format and saves the figures
# as .PNG files on the same folder as where you put the .txt file
#
#This R Code was written specifically to run in the back-end of the online app and thus it may do things in a way that's not the most intuitive or most efficient
#for a user actually interacting with the R Code. The goal is to exactly replicate what is done on the website, thus the R Code is left intact. The website runs the exact same code
#below.
###################################################################################################################
library(stringr) #Library to process string variables (text of the entered tests)
library(poibin) #This library has the poisson binomial, the distribution of the sum of binomial with different underlying probabilities
pcurve_app=function(file1,dir1)
{
#0.1 Set up parameters
#used to compute the binomial test given that each test has a (slightly) different probability of p<.025 depending on its own noncentral parameter
#See Hong (2013) - "On computing the distribution function for the Poisson binomial distribution" Computational Statistics and Data Analysis, V59, p.41-51 - http://dx.doi.org/10.1016/j.csda.2012.10.006
setwd(dir1) #Set as Working Directory the folder on server where temporary files are saved
filek=substr(file1,1,nchar(file1)-4) #filek is the name of the file entered by the user/server, dropping the extension
#Disable scientific notation
options(scipen=999)
##############################################
#(0) CREATE A FEW FUNCTIONS
##############################################
#Function 1 - functions that find non-centrality parameter for f,chi distributions that gives some level of power
#F-test
#Note: starting with app 4.0, t() are converted to F() and Z to chisq() to avoid unnecessary repetition of code
#So we only need code to find ncp for F() and chisq()
getncp.f =function(df1,df2, power) {
error = function(ncp_est, power, x, df1,df2) pf(x, df1 = df1, df2=df2, ncp = ncp_est) - (1-power)
xc=qf(p=.95, df1=df1,df2=df2)
return(uniroot(error, c(0, 1000), x = xc, df1 = df1,df2=df2, power=power)$root) }
#chisq-test
getncp.c =function(df, power) {
xc=qchisq(p=.95, df=df)
error = function(ncp_est, power, x, df) pchisq(x, df = df, ncp = ncp_est) - (1-power)
return(uniroot(error, c(0, 1000), x = xc, df = df, power=power)$root) }
#Combine both in single function
getncp=function(family,df1,df2,power) {
if (family=="f") ncp=getncp.f(df1=df1,df2=df2,power=power)
if (family=="c") ncp=getncp.c(df=df1,power=power)
return(ncp) }
###############################################################################
#Function 2 - percent() : makes a number look like a percentage
percent <- function(x, digits = 0, format = "f", ...) {
paste(formatC(100 * x, format = format, digits = digits, ...), "%", sep = "")
}
###############################################################################
###############################################################################
#Function 3 - pbound: bound p-values and pp-values by precision of measurement to avoid errors
pbound=function(p) pmin(pmax(p,2.2e-16),1-2.2e-16)
#Function 4 - prop33(pc) - Computes % of p-values that are expected to be smaller than pc,
#for the tests submitted to p-curve, if power is 33%
prop33=function(pc)
{
#pc: critical p-value
#Overview:
#Creates a vector of the same length as the number of tests submitted to p-curve, significant and not,
# and computes the proportion of p-values expected to be smaller than {pc} given the d.f.
# and outputs the entire vector, with NA values where needed
#F-tests (& thus t-tests)
prop=ifelse(family=="f" & p<.05,1-pf(qf(1-pc,df1=df1, df2=df2),df1=df1, df2=df2, ncp=ncp33),NA)
#Chi2 (& thus Normal)
prop=ifelse(family=="c" & p<.05,1-pchisq(qchisq(1-pc,df=df1), df=df1, ncp=ncp33),prop)
#output it
prop
}
#Function 5 Stouffer test for a vector of pp-values
stouffer=function(pp) sum(qnorm(pp),na.rm=TRUE)/sqrt(sum(!is.na(pp)))
###############################################################################
#(1) PROCESS USER INPUT AND CONVERT IT INTO USABLE R DATA
###############################################################################
#(1.1) Load data
#From file;
raw = scan(file=file1,what="") #read file on server
raw=tolower(raw) #lower case
ktot=length(raw) #count studies
#Create vector that numbers studies 1 to N,includes n.s. studies
k=seq(from=1,to=length(raw))
#1.2 Parse the entered text into usable statistical results
#1.3 Create test type indicator
stat=substring(raw,1,1) #stat: t,f,z,c,r
test=ifelse(stat=="r","t",stat) #test: t,f,z,c (r-->t)
#1.4 Create family to turn t-->F and z-->chi2
family=test
family=ifelse(test=="t","f",family)
family=ifelse(test=="z","c",family)
#Note on terminology:
#Stat: t,f,c,z,r is what the user entered, t,f,c,z,r
#test: t,f,c,z is the test statistic, same as stat but with r-->t
#family: f,c converting t-->f and z-->c
#1.5 Find comma,parentheses,equal sign
par1 =str_locate(raw,"\\(")[,1] #( First parenthesis
par2 =str_locate(raw,"\\)")[,1] #) Second parenthesis
comma=str_locate(raw,",")[,1] #, comma
eq =str_locate(raw,"=")[,1] #= equal
#1.6 DF for t-tests
df=as.numeric(ifelse(test=="t",substring(raw,par1+1,par2 -1),NA)) #t(df) later assigned to df2 in F test with df1=1
#1.7 DF1 for all tests
# recall, normal=sqrt(chi(1)) so df1=1 for Normal, same f(1,df)<-t(df)
df1=as.numeric(ifelse(test=="f",substring(raw,par1+1,comma-1),NA)) #If F test, take value after comma, NA otherwise
df1=as.numeric(ifelse(test=="z",1,df1)) #If Z replace missing value with a 1
df1=as.numeric(ifelse(test=="t",1,df1)) #If t, replace missing value with a 1
df1=as.numeric(ifelse(test=="c",substring(raw,par1+1,par2 -1),df1)) #If c, replace missing value with value in ()
#1.8 DF2 for F(df1,df2) tests
df2=as.numeric(ifelse(test=="f",substring(raw,comma+1,par2-1),NA)) #F-test
df2=as.numeric(ifelse(test=="t",df,df2)) #t(df) into the df2 F test
#1.9 Take value after equal sign, the value of the test-statistic, and put it in vector "equal"
equal=abs(as.numeric(substring(raw,eq+1))) #if not a r(), take the value after the ="
#1.10 Go from "equal" (the value after = sign) to F or Chi2 value,
value=ifelse((stat=="f" | stat=="c"),equal,NA) #For F and Chi2 test, equal=value
value=ifelse(stat=="r", (equal/(sqrt((1-equal**2)/df2)))**2,value) #For correlation, first turn value (r) to t, then square t. (using t=r/sqrt(1-r**2)/DF)
value=ifelse(stat=="t", equal**2 ,value) #For t and Z, square it since t(df)**2=f(1,df) and z**2=chi(1)
value=ifelse(stat=="z", equal**2 ,value)
#1.11 Compute p-values
p=ifelse(family=="f",1-pf(value,df1=df1,df2=df2),NA)
p=ifelse(family=="c",1-pchisq(value,df=df1),p)
p=pbound(p) #Bound it to level of precision, see function 3 above
#1.12 Count studies
#ktot is all studies
ksig= sum(p<.05,na.rm=TRUE) #significant studies
khalf=sum(p<.025,na.rm=TRUE) #half p-curve studies
###############################################################################
#(2) COMPUTE PP-VALUES
##############################################################################
#2.1 Right Skew, Full p-curve
ppr=as.numeric(ifelse(p<.05,20*p,NA)) #If p<.05, ppr is 1/alpha*p-value, so 20*pvalue, otherwise missing.
ppr=pbound(ppr) #apply pbound function to avoid 0
#2.2 Right Skew, half p-curve
ppr.half=as.numeric(ifelse(p<.025,40*p,NA)) #If p<.05, ppr is 40*pvalue, otherwise missing.
ppr.half=pbound(ppr.half)
#2.3 Power of 33%
#2.3.1 NCP for f,c distributions
# NCP33 (noncentrality parameter giving each test in p-curve 33% power given the d.f. of the test)
ncp33=mapply(getncp,df1=df1,df2=df2,power=1/3,family=family) #See function 1 above
#2.3.2 Full-p-curve
#Using the ncp33 compute pp33
pp33=ifelse(family=="f" & p<.05,3*(pf(value, df1=df1, df2=df2, ncp=ncp33)-2/3),NA)
pp33=ifelse(family=="c" & p<.05,3*(pchisq(value, df=df1, ncp=ncp33)-2/3),pp33)
pp33=pbound(pp33)
#2.3.3 HALF-p-curve
#Share of p-values expected to be p<.025 if 33% power (using Function 4 from above, prop33() )
prop25=3*prop33(.025)
prop25.sig=prop25[p<.05]
#Compute pp-values for the half
pp33.half=ifelse(family=="f" & p<.025, (1/prop25)*( pf(value,df1=df1,df2=df2,ncp=ncp33)-(1-prop25)),NA)
pp33.half=ifelse(family=="c" & p<.025, (1/prop25)*(pchisq(value,df=df1, ncp=ncp33)-(1-prop25)),pp33.half)
pp33.half=pbound(pp33.half)
###############################################################################
#(3) INFERENCE - STOUFFER & BINOMIAL
##############################################################################
#3.1 Convert pp-values to Z scores, using Stouffer function above
Zppr = stouffer(ppr) #right skew - this is a Z value from Stouffer's test
Zpp33= stouffer(pp33) #33% - idem
Zppr.half= stouffer(ppr.half) #right skew, half p-curve - idem
Zpp33.half=stouffer(pp33.half) #33% skew, half p-curve - idem
#3.2 Overall p-values from Stouffer test
p.Zppr =pnorm(Zppr)
p.Zpp33=pnorm(Zpp33)
p.Zppr.half =pnorm(Zppr.half)
p.Zpp33.half=pnorm(Zpp33.half)
#3.3 Save results to file
main.results=as.numeric(c(ktot, ksig, khalf, Zppr, p.Zppr, Zpp33, p.Zpp33, Zppr.half, p.Zppr.half, Zpp33.half, p.Zpp33.half))
write(main.results, paste("STOUFFER_",filek,".txt", sep=""),sep="\n")
#3.4 BINOMIAL
#Observed share of p<.025
prop25.obs=sum(p<.025)/sum(p<.05)
#3.4.1 Flat null
binom.r=1-pbinom(q=prop25.obs*ksig- 1, p=.5, size=ksig) #The binomial in R computes the probability of x<=xo. We want prob(x>=x0) so we subtract one from x, and 1-prob()
#3.4.2 Power of 33% null
binom.33=ppoibin(kk=prop25.obs*ksig,pp=prop25[p<.05])
#syntax for ppoibin():
# kk: is the proportion of succeses, a scalar, in this case, the share of p<.025
# pp: is the probabilty of success for each attempt, the number of attempts is determined
# by the length of the vector. For example ppoibin(kk=0,pp=c(.5,.5,.5)) is .125,
# if there are three attempts, each with probability .5, the odds of getting 0 succeses is .125
# ppoibin(kk=1,pp=c(1,.75)), in turn computes the probability of getting 1 success
# when one has a 100% of success, and the other 75%, and the solution is .25, since
# the first one succeeds for sure and the second would need to fail, with 25% chance.
#3.4.3 Save binomial results
binomial=c(mean(prop25.sig), prop25.obs, binom.r, binom.33)
write(binomial, paste("BINOMIAL_",filek,".txt", sep=""),sep="\n")
#3.5 Beutify results for printing in figure
#3.5.1 Function that processes p-values and bounds when >.999 or <.0001
cleanp=function(p)
{
p.clean=round(p,4) #Round it
p.clean=substr(p.clean,2,6) #Drop the 0
p.clean=paste0("= ",p.clean)
if (p < .0001) p.clean= " < .0001"
if (p > .9999) p.clean= " > .9999"
return(p.clean)
}
#If there are zero p<.025, change Stouffer values for half-p-curve tests for "N/A" messages
if (khalf==0) {
Zppr.half ="N/A"
p.Zppr.half ="=N/A"
Zpp33.half ="N/A"
p.Zpp33.half ="=N/A"
}
#If there are more than 1 p<.025, round the Z and beutify the p-values
if (khalf>0) {
Zppr.half =round(Zppr.half,2)
Zpp33.half =round(Zpp33.half,2)
p.Zppr.half=cleanp(p.Zppr.half)
p.Zpp33.half=cleanp(p.Zpp33.half)
}
#Clean results for full test
Zppr=round(Zppr,2)
Zpp33=round(Zpp33,2)
p.Zppr=cleanp(p.Zppr)
p.Zpp33=cleanp(p.Zpp33)
binom.r=cleanp(binom.r)
binom.33=cleanp(binom.33)
################################################
#(4) POWER ESTIMATE
################################################
#4.1 Function powerfit(power_est) - Returns the Stouffer Z of observing at least as right skewed a p-curve if power=power_est
#if Z=0, power_est is the best fit (p=50%).
#if Z<0 the best fit is <power_est,
#if Z>0 the best fit is >power_est
powerfit=function(power_est)
{
#4.1.1 Get the implied non-centrality parameters (ncps) that give power_est to each test submitted to p-curve
ncp_est=mapply(getncp,df1=df1,df2=df2,power=power_est,family=family)
#4.1.2 Compute implied pp-values from those ncps_est,
pp_est=ifelse(family=="f" & p<.05,(pf(value,df1=df1,df2=df2,ncp=ncp_est)-(1-power_est))/power_est,NA)
pp_est=ifelse(family=="c" & p<.05,(pchisq(value,df=df1,ncp=ncp_est)-(1-power_est))/power_est,pp_est)
pp_est=pbound(pp_est)
#4.1.3 Aggregate pp-values for null that power=power_est via Stouffer
return(stouffer(pp_est)) #This is a z score, so powerfit is expressed as the resulting Z score.
}
#4.2 COMPUTE FIT FOR EACH POWER for 5.1%, AND THEN 6-99%, AND PLOT IT. With power=5% boundary condition lead to errors
#This becomes the diagnostic plot and gives us the best estimate, within 1%, of power.
#Create image file to contain results
png(filename=paste(filek,"_fit.png",sep=""), width=1200, height=1000, res=200)
# Fit will be evaluated at every possible value of power between 5.1% and 99% in steps of 1%, stored in fit()
fit=c() #Create empty vector
fit=abs(powerfit(.051)) #Start it eavaluting fit of 5.1% power
for (i in 6:99) fit=c(fit,abs(powerfit(i/100))) #Now do 6% to 99%
# Find the minimum
#which ith power level considered leads to best estimate
mini=match(min(fit,na.rm=TRUE),fit)
#convert that into the power level, the ith value considered is (5+ith)/100
hat=(mini+4)/100
#Plot results
#create the x-axis
x.power=seq(from=5,to=99)/100
#Draw the line
par(mar=c(5.1,8.1,4.1,2.1)) #Margins
plot(x.power,fit,xlab="Underlying Power", ylab="",ylim=c(-.15,max(fit)), main="")
#Figure title
mtext("Estimating underlying statistical power",side=3,line=1.75,cex=1.5,at=0.4)
mtext("(Plot should be V shaped, or a smooth line to 99%; else don't trust estimate)",col='red',side=3,line=.5,cex=1,at=0.4)
#Make red dot at the estimate
points(hat,min(fit,na.rm=TRUE),pch=19,col="red",cex=2)
#Put a label with the estimate value
sign="="
if (hat<.06) sign="<"
text(min(.5,max(.28,hat)),min(fit,na.rm=TRUE)-.15,paste0("Estimated Power ",sign," ",hat*100,"%"))
#Label the y-axis
mtext(c("Good","Bad"),side=2,line=3,at=c(0,max(fit)),las=1,cex=1.25,col=c("blue","red"))
mtext("Fit for observed p-curve",side=2,line=6.5,cex=1.5)
mtext("(Stouffer test for null of power in x-axis)\n|Z-score|",side=2,line=4.5,col="gray")
dev.off()
#4.3 Confidence interval for power estimate
#4.3.1 Function get.power_pct(pct)
get.power_pct =function(pct) {
#Function that finds power that gives p-value=pct for the Stouffer test
#for example, get.power_pct(.5) returns the level of power that leads to p=.5 for the stouffer test.
#half the time we would see p-curves more right skewed than the one we see, and half the time
#less right-skewed, if the true power were that get.power_pct(.5). So it is the median estimate of power
#similarliy, get.power_pct(.1) gives the 10th percentile estimate of power...
#Obtain the normalized equivalent of pct, e.g., for 5% it is -1.64, for 95% it is 1.64
z=qnorm(pct) #convert to z because powerfit() outputs a z-score.
#Quantify gap between computed p-value and desired pct
error = function(power_est, z) powerfit(power_est) - z
#Find the value of power that makes that gap zero, (root)
return(uniroot(error, c(.0501, .99),z)$root) }
#4.3.2 Boundary conditions (when the end of the ci=5% or 99% we cannot use root to find it,
#use boundary value instead)
#Boundary conditions
p.power.05=pnorm(powerfit(.051)) #Proability p-curve would be at least at right-skewed if power=.051
p.power.99=pnorm(powerfit(.99)) #Proability p-curve would be at least at right-skewed if power=.99
#4.3.3 Find lower end of ci
#Low boundary condition? If cannot reject 5% power, don't look for lower levels, use 5% as the end
if (p.power.05<=.95) power.ci.lb=.05
#High boundary condition? If we reject 99%, from below dont look for higher power, use 99% as the low end
if (p.power.99>=.95) power.ci.lb=.99
#If low bound is higher than 5.1% power and lower than 99% power, estimate it, find interior solution
if (p.power.05>.95 && p.power.99<.95) power.ci.lb=get.power_pct(.95)
#4.3.4 Higher end of CI
#If we reject 5% power from below, 5% is above the confidence interval, use 5% as the upper end of the confidence interval
if (p.power.05<=.05) power.ci.ub=.05
#If we do not reject that 99% power, don't look higher, use 99% as the higher end
if (p.power.99>=.05) power.ci.ub=.99
#If the the upper bound is between 5% and 99%, find it
if (p.power.05>.05 && p.power.99<.05) power.ci.ub=get.power_pct(.05)
#4.4 Save power fit estiate and ci
power_results=c(power.ci.lb,hat,power.ci.ub)
write(power_results, paste("POWERHAT_",filek,".txt", sep=""),sep="\n")
#Note, I use hat as the estimate of power, with powerfit(.5) we could get a more precise best fitting
#level of power than the minimum in the figure above between .051 and .99, hat, but more precision than 1% in power is not informative.
###############################################################################
#(5) MAIN FIGURE: OBSERVED P-CURVE AND EXPECTED UNDERL NULL AND 33% POWER
##############################################################################
#5.1 Green line (Expected p-curve for 33% power)
#5.1.1 Proportion of tests expected to get <01, <02...
#Uses FUNCTION 4, prop33() - see top of page
gcdf1=prop33(.01) #vector with proportion of p-values p<.01, with 33% power
gcdf2=prop33(.02) # "" p<.02, "
gcdf3=prop33(.03) # "" p<.03, "
gcdf4=prop33(.04) # "" p<.04, "
#Note: for p<.05 we know it is 33% power
#5.1.2 Now compute difference, and divide by 1/3 to get the share of significant p-values in each bin
green1=mean(gcdf1,na.rm=TRUE)*3 #Average of the vector p<.01
green2=mean(gcdf2-gcdf1,na.rm=TRUE)*3 #Difference between .02 and .01
green3=mean(gcdf3-gcdf2,na.rm=TRUE)*3 #Difference between .03 and .02
green4=mean(gcdf4-gcdf3,na.rm=TRUE)*3 #Difference between .04 and .03
green5=mean(1/3-gcdf4,na.rm=TRUE)*3 #Difference between .05 and .04
#Because we have one row per test submitted, the average is weighted average, giving each test equal weight
green=100*c(green1,green2,green3,green4,green5) #The 5 values plotted in the figure for 33% power line
#5.2 The blue line (observed p-curve)
#Put all significant p-values into bins, .01 ....05
ps=ceiling(p[p<.05]*100)/100
#Count # of tests in each bin
blue=c()
#This loop creates a vector, blue, with 5 elements, with the proportions of p=.01,p=.02...p=.05
for (i in c(.01,.02,.03,.04,.05)) blue=c(blue,sum(ps==i,na.rm=TRUE)/ksig*100)
#5.3 Red line
red=c(20,20,20,20,20)
#5.4 Make the graph
#Note: R_temp & filek are parameterS set at the beggining of the program for the location of files
png(filename=paste(filek,".png",sep=""), width=2600, height=2400, res=400)
#5.4.1 Define x-axis as p-values (.01, .02..)
x = c(.01,.02,.03,.04,.05)
#5.4.2 Plot the observed p-curve
par(mar=c(6,5.5,1.5,3))
#5.4.3 Does blue line cross over 68% in when p>.02?
#Does blue line go above 68%? If yes, move up graph so that legend is not interrupted by it
moveup=max(max(blue[2:5])-66,0) #moveup by difference between blue and 68
ylim=c(0,105+moveup)
legend.top=100+moveup
#5.4.4 Start the plot
plot(x,blue, type='l', col='dodgerblue2', main="",
lwd=2, xlab="", ylab="", xaxt="n",yaxt="n", xlim=c(0.01,0.051),
ylim=ylim, bty='L', las=1,axes=F)
#5.4.5 x-axis value labels
x_=c(".01",".02",".03",".04",".05")
axis(1,at=x,labels=x_)
#5.4.6 y-axis value labels
y_=c("0%","25%","50%","75%","100%")
y=c(0,25,50,75,100)
axis(2,at=y,labels=y_,las=1,cex.axis=1.2)
#5.4.7 Add y-axis label
mtext("Percentage of test results",font=2,side=2,line=3.85,cex=1.25)
#5.4.8 Add y-axis label
mtext("p ",font=4,side=1,line=2.3,cex=1.25)
mtext(" -value", font=2,side=1,line=2.3,cex=1.25)
#5.4.9 Add little point in actual frequencies
points(x,blue,type="p",pch=20,bg="dodgerblue2",col="dodgerblue2")
#5.4.10 Add value-labels
text(x+.00075,blue+3.5,percent(round(blue)/100),col='black', cex=.75)
#5.4.11 Add red and green lines
lines(x,red, type='l', col='firebrick2', lwd=1.5, lty=3)
lines(x,green, type='l', col='springgreen4', lwd=1.5, lty=5)
#5.4.12 Legend
#x Position of text
tab1=.017 #Labels for line at p=.023 in x-axis
tab2=tab1+.0015 #Test results and power esimates at tab1+.0015
#gaps in y positions
gap1=9 #between labels
gap2=4 #between lable and respective test (e.g., "OBserved p-curve" and "power estimate")
#Color of font for test results
font.col='gray44'
#Legend for blue line
#Put together the text for power estimate in a single variable
text.blue=paste0("Power estimate: ",percent(hat),", CI(",
percent(power.ci.lb),",",
percent(power.ci.ub),")")
#Print it
text(tab1,legend.top, adj=0,cex=.85,bquote("Observed "*italic(p)*"-curve"))
text(tab2,legend.top-gap2,adj=0,cex=.68,text.blue,col=font.col)
#Legend for red line
text.red=bquote("Tests for right-skewness: "*italic(p)*""[Full]~.(p.Zppr)*", "*italic(p)*""[Half]~.(p.Zppr.half))
#note: .() within bquote prints the value rather than the variable name
text(tab1,legend.top-gap1, adj=0,cex=.85, "Null of no effect" )
text(tab2,legend.top-gap1-gap2, adj=0,cex=.68, text.red, col=font.col )
#Legend for green line
text.green=bquote("Tests for flatness: "*italic(p)*""[Full]~.(p.Zpp33)*", "*italic(p)*""[half]~.(p.Zpp33.half)*", "*italic(p)*""[Binomial]~.(binom.33))
text(tab1,legend.top-2*gap1, adj=0,cex=.85,"Null of 33% power")
text(tab2,legend.top-2*gap1-gap2, adj=0,cex=.68,text.green,col=font.col)
#LINES in the legend:
segments(x0=tab1-.005,x1=tab1-.001,y0=legend.top,y1=legend.top, col='dodgerblue2',lty=1,lwd=1.5)
segments(x0=tab1-.005,x1=tab1-.001,y0=legend.top-gap1, y1=legend.top-gap1,col='firebrick2',lty=3,lwd=1.5)
segments(x0=tab1-.005,x1=tab1-.001,y0=legend.top-2*gap1,y1=legend.top-2*gap1,col='springgreen4',lty=2,lwd=1.5)
#Box for the legend
rect(tab1-.0065,legend.top-2*gap1-gap2-3,tab1+.032,legend.top+3,border='gray85')
#NOTE AT BOTTOM
#Number of tests in p-curve
msgx=bquote("Note: The observed "*italic(p)*"-curve includes "*.(ksig)*
" statistically significant ("*italic(p)*" < .05) results, of which "*.(khalf)*
" are "*italic(p)*" < .025.")
mtext(msgx,side=1,line=4,cex=.65,adj=0)
#Number of n.s. results entered
kns=ktot-ksig
if (kns==0) ns_msg="There were no non-significant results entered."
if (kns==1) ns_msg=bquote("There was one additional result entered but excluded from "*italic(p)*"-curve because it was "*italic(p)*" > .05.")
if (kns>1) ns_msg=bquote("There were "*.(kns)*" additional results entered but excluded from "*italic(p)*"-curve because they were "*italic(p)*" > .05.")
mtext(ns_msg,side=1,line=4.75,cex=.65,adj=0)
dev.off()
###############################################################################
#(6) SAVE PP-VALUE CALCULATIONS TO TEXT FILES
##############################################################################
#6.1 Table contains the original text enter, the corresponding p-value, the pp-values, and Z-scores for those pp-values
table_calc=data.frame(raw, p, ppr, ppr.half, pp33, pp33.half,
qnorm(ppr), qnorm(ppr.half), qnorm(pp33), qnorm(pp33.half))
#6.2 Headers
headers1=c("Entered statistic","p-value", "ppr", "ppr half", "pp33%","pp33 half",
"Z-R","Z-R half","Z-33","z-33 half")
#6.3 Put headers onto table
table_calc=setNames(table_calc,headers1)
#6.4 Save it
write.table(table_calc,sep=" ", row.names=FALSE, file=paste("Calculations_",filek,".txt", sep=""))
#6.5 Save results behind p-curve figure
headers2=c("p-value","Observed (blue)","Power 33% (Green)", "Flat (Red)")
table_figure=setNames(data.frame(x,blue,green,red),headers2)
#Save it to file
write.table(table_figure, sep=" ",row.names=FALSE, file=paste("FigNumbers_",filek,".txt", sep=""))
################################################
# 7 CUMULATIVE P-CURVES
################################################
#7.1 FUNCTION THAT RECOMPUTES OVERALL STOUFFER TEST WITHOUT (K) MOST EXTREME VALUES, ADJUSTING THE UNIFORM TO ONLY INCLUDE RANGE THAT REMAINS
dropk=function(pp,k,droplow)
{
#Syntax:
#pp: set of pp-values to analyze sensitivity to most extremes
#k: # of most extreme values to exclude
#dropsmall: 1 to drop smallest, 0 to drop largest
pp=pp[!is.na(pp)] #Drop missing values
n=length(pp) #See how many studies are left
pp=sort(pp) #Sort the pp-value from small to large
if (k==0) ppk=pp #If k=0 do nothing for nothing is being dropped
#If we are dropping low values
if (droplow==1 & k>0)
{
#Eliminate lowest k from the vector of pp-values
ppk=(pp[(1+k):n])
ppmin=min(pp[k],k/(n+1)) #Boundary used to define possible range of values after exclusion
ppk=(ppk-ppmin)/(1-ppmin) #Take the k+1 smallest pp-value up to the highest, subtract from each the boundary value, divide by the range, ~U(0,1) under the null
#This is explained in Supplement 1 of Simonsohn, Simmons Nelson, JEPG 2016 "Better p-curves" paper. See https://osf.io/mbw5g/
}
#If we are dropping high values
if (droplow==0 & k>0)
{
#Eliminate lowest k from the vector of pp-values
ppk=pp[1:(n-k)]
ppmax=max(pp[n-k+1],(n-k)/(n+1)) #Find new boundary of range
ppk=ppk/ppmax #Redefine range to make U(0,1)
}
#In case of a tie with two identical values we would have the ppk be 0 or 1, let's replace that with almost 0 and almost 1
ppk=pmax(ppk,.00001) #Adds small constant to the smallest redefined p-value, avoids problem if dropped p-value is "equal" to next highest, then that pp-value becomes 0
ppk=pmin(ppk,.99999) #Subtract small constant to the largest redefined pp-value, same reason
Z=sum(qnorm(ppk))/sqrt(n-k)
return(pnorm(Z))
} #End function dropk
#7.2 Apply function, in loop with increasing number of exclusions, to full p-curve
#Empty vectors for results
droplow.r=droplow.33=drophigh.r=drophigh.33=c()
#Loop over full p-curves
for (i in 0:(round(ksig/2)-1))
{
#Drop the lowest k studies in terms of respective overall test
#Right skew
droplow.r= c(droplow.r, dropk(pp=ppr,k=i,droplow=1))
drophigh.r=c(drophigh.r, dropk(pp=ppr,k=i,droplow=0))
#Power of 33%
droplow.33=c(droplow.33, dropk(pp=pp33,k=i,droplow=1))
drophigh.33=c(drophigh.33, dropk(pp=pp33,k=i,droplow=0))
}
#Half p-curves
if (khalf>0)
{
droplow.halfr=drophigh.halfr=c()
for (i in 0:(round(khalf/2)-1))
{
#Drop the lowest k studies in terms of respective overall test
droplow.halfr= c(droplow.halfr, dropk(pp=ppr.half,k=i,droplow=1))
drophigh.halfr=c(drophigh.halfr, dropk(pp=ppr.half,k=i,droplow=0))
} #End loop
}#End if that runs calculations only if khalf>0
#7.3 FUNCTION THAT DOES THE PLOT OF RESULTS
plotdrop=function(var,col)
{
k=length(var)
#Plot the dots
plot(0:(k-1),var,xlab="",ylab="",type="b",yaxt="n",xaxt="n",main="",
cex.main=1.15,ylim=c(0,1),col=col)
#Add marker in results with 0 drops
points(0,var[1],pch=19,cex=1.6)
#Red line at p=.05
abline(h=.05,col="red")
#Y-axis value labels
axis(2,c(.05,2:9/10),labels=c('.05','.2','.3','.4','.5','6','7','.8','.9'),las=1,cex.axis=1.5)
axis(1,c(0:(k-1)),las=1,cex.axis=1.4)
}
#7.4 RUN PLOT FUNCTION 6 TIMES
#Save what follows
png(filename=paste(filek,"_cumulative.png",sep=""), width=4000, height=4000, res=400)
#Put all graphs together
par(mfrow=c(3,2),mar=c(4,3,0,2),mgp=c(2.5,1,0),oma=c(5,14,5,1))
#Plot(1) - Right skew, drop low
plotdrop(droplow.r,col="dodgerblue2")
mtext(side=2,line=4.5,bquote(italic(P)*"-value of overall test"),font=2,cex=1.25)
mtext(side=2,line=3,"(Stouffer's method)",font=3,cex=1)
#Rigt Skew label
mtext("Right skew\n\n",line=7,side = 2,cex=1.2,las=1.25,col="dodgerblue2")
mtext(bquote("(Full "*italic(p)*"-curve)"),line=7,side = 2,cex=1,las=1,col="dodgerblue2")
#Low to high label
mtext(bquote("Drop "*italic(k)~bold("lowest")~"original "*italic(p)*"-values"),line=1,side = 3,cex=1.5,las=1)
#Plot(2) - Right skew, drop high
plotdrop(drophigh.r,col="dodgerblue2")
mtext(bquote("Drop "*italic(k)~bold("highest")~"original "*italic(p)*"-values"),line=1,side = 3,cex=1.5,las=1)
#Plot (3) - Half right skew, drop low
#If there are P<.025 results do it.
if (khalf>0)
{
plotdrop(droplow.halfr,col="blue3")
mtext(side=2,line=4.5,bquote(italic(P)*"-value of overall test"),font=2,cex=1.25)
mtext(side=2,line=3,"(Stouffer's method)",font=3,cex=1)
}
#otherwise, message saying no chart
if (khalf==0)
{
plot(1, type="n", axes=F, xlab="", ylab="")
mtext(side=1,line=-10,"No p<.025 results, so no half p-curve",font=2,cex=1.25)
}
#Half p-curve label
mtext("Right skew\n\n",line=7,side = 2,cex=1.25,las=1,col="blue3")
mtext(bquote("(Half "*italic(p)*"-curve)"),line=7,side = 2,cex=1,las=1,col="blue3")
#Plot (4) - Half right skew, drop high
plot(1, type="n", axes=F, xlab="", ylab="")
mtext(side=1,line=-10,"Graph not needed",font=2,cex=1.25)
mtext(side=1,line=-8.5,"(Half p-curve excludes high p-values)",font=3,cex=.75)
#Plot (5) Power 33%, drop low
plotdrop(drophigh.33,col="springgreen4")
#33% Skew label
mtext("33% power\n\n",line=7,side = 2,cex=1.25,las=1,col="springgreen4")
mtext(bquote("(Full "*italic(p)*"-curve)"),line=7,side = 2,cex=1,las=1,col="springgreen4")
#Labels bottom left of chart
mtext(side=2,line=4.5,bquote(italic(P)*"-value of overall test"),font=2,cex=1.25)
mtext(side=2,line=3,"(Stouffer's method)",font=3,cex=1)
mtext(side=1,line=2.5,bquote(bold("Number of tests dropped (")*bolditalic("k")*")"),cex=1.15)
#Plot (6)
plotdrop(droplow.33,col="springgreen4")
#Labels bottom right of chart
mtext(side=1,line=2.5,bquote(bold("Number of tests dropped (")*bolditalic("k")*")"),cex=1.15)
#x-axis label
#Legend (winging it for location)
op=par(usr=c(0,1,0,1),xpd=NA) #allow goin outsie of plot 6 and recalibrate dimensions to be 0-1 in x and y
legend(-.6,-.25,horiz=TRUE,pch=c(19,1),cex=1.4, legend=expression("Including all "*italic(p)*"-values","Dropping "*italic(p)*"-values"))
#so the legend is placed 60% of a chart to the left of 0 of #6, an 25% of a chart below it.
#save it
dev.off()
}
```
To use the `pcurve_app` function, we have to specify the **input.txt** file in which the data we created with the `pcurve_dataprep` function is stored. We also have to provide the function with the **path to the folder/working directory where the file is stored**.
```{block,type='rmdachtung'}
Please note that while **Windows** usually provides you with paths using **backslashes**, we need to use **forward slashes** (**/**) for our designated path.
```
```{r,eval=FALSE}